Genetic algorithms as tool for statistical analysis of high-dimensional data structures
In regression, the objective is to determine an appropriate function that reflects reality as accurately as possible while smoothing out irregularities caused by noise in the data, so that it remains easy to interpret. A popular and flexible approach for estimating the true underlying function is the additive model. One way of fitting additive models is an expansion in B-splines, which allows direct calculation of the estimators. If the number of B-splines is too large, the estimated functions become wiggly and tend to follow the observed data too closely. To avoid this overfitting problem, we use a penalization approach characterized by smoothing parameters. In this thesis, we propose the use of genetic algorithms for smoothing parameter optimization. Genetic algorithms, which are rarely applied in the field of statistics, are based on the principle that better-adapted individuals win against their competitors under equal conditions. Apart from smoothing parameter optimization, the user often faces datasets containing large numbers of explanatory variables, both relevant and irrelevant. Appropriate variable selection approaches make it possible to reduce the full set to a subset of relevant variables. We propose to address variable selection and the choice of smoothing parameters simultaneously by using genetic algorithms. Our approach is based on an appropriate combination of the genetic algorithms for smoothing parameter optimization and for variable selection.
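The idea of searching over variable subsets and smoothing parameters at the same time can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name a fitness criterion, a penalty, or the genetic operators, so the generalized cross-validation (GCV) score, the second-order difference penalty, binary tournament selection, uniform crossover, elitism, and all function names (`bspline_basis`, `gcv`, `genetic_search`) are assumptions chosen for illustration. Each individual encodes one inclusion bit and one log10 smoothing parameter per explanatory variable.

```python
import numpy as np

def bspline_basis(x, n_basis=10, degree=3):
    """B-spline design matrix on equidistant knots (Cox-de Boor recursion)."""
    lo, hi = x.min() - 1e-9, x.max() + 1e-9      # pad so every x lies strictly inside
    n_seg = n_basis - degree
    h = (hi - lo) / n_seg
    t = lo + h * np.arange(-degree, n_seg + degree + 1)   # extended knot vector
    xc = x[:, None]
    B = ((xc >= t[:-1]) & (xc < t[1:])).astype(float)     # degree-0 indicators
    for d in range(1, degree + 1):
        left = (xc - t[:-d - 1]) / (t[d:-1] - t[:-d - 1])
        right = (t[d + 1:] - xc) / (t[d + 1:] - t[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def gcv(mask, log_lam, bases, y, penalty):
    """GCV score (assumed fitness) of the penalized additive fit on the masked variables."""
    n = len(y)
    yc = y - y.mean()                             # intercept handled by centering
    idx = np.flatnonzero(mask)
    if idx.size == 0:                             # intercept-only model
        return n * np.sum(yc ** 2) / (n - 1) ** 2
    X = np.hstack([bases[j] for j in idx])
    k = penalty.shape[0]
    P = np.zeros((X.shape[1], X.shape[1]))
    for b, j in enumerate(idx):                   # one smoothing parameter per variable
        P[b * k:(b + 1) * k, b * k:(b + 1) * k] = 10.0 ** log_lam[j] * penalty
    XtX = X.T @ X
    A = XtX + P + 1e-6 * np.eye(X.shape[1])       # tiny ridge: numerical stabiliser only
    beta = np.linalg.solve(A, X.T @ yc)
    edf = 1.0 + np.trace(np.linalg.solve(A, XtX))  # effective degrees of freedom
    rss = np.sum((yc - X @ beta) ** 2)
    return n * rss / (n - edf) ** 2

def genetic_search(bases, y, penalty, pop_size=30, n_gen=25, p_mut=0.1, seed=0):
    """Evolve (inclusion mask, log10 lambda) pairs; lower GCV means fitter."""
    rng = np.random.default_rng(seed)
    p = len(bases)
    masks = rng.random((pop_size, p)) < 0.5
    lams = rng.uniform(-4, 4, (pop_size, p))
    evaluate = lambda: np.array([gcv(m, l, bases, y, penalty) for m, l in zip(masks, lams)])
    history = []
    for _ in range(n_gen):
        fit = evaluate()
        best = np.argmin(fit)
        history.append(fit[best])
        new_masks, new_lams = [masks[best].copy()], [lams[best].copy()]   # elitism
        while len(new_masks) < pop_size:
            # binary tournament: the better of two random individuals becomes a parent
            a, b = (min(rng.choice(pop_size, 2, replace=False), key=lambda i: fit[i])
                    for _ in range(2))
            cross = rng.random(p) < 0.5           # uniform crossover
            m = np.where(cross, masks[a], masks[b])
            l = np.where(cross, lams[a], lams[b])
            m = np.where(rng.random(p) < p_mut, ~m, m)                    # bit-flip mutation
            l = np.clip(l + rng.normal(0, 0.5, p) * (rng.random(p) < p_mut), -4, 4)
            new_masks.append(m)
            new_lams.append(l)
        masks, lams = np.array(new_masks), np.array(new_lams)
    fit = evaluate()
    best = np.argmin(fit)
    history.append(fit[best])
    return masks[best], lams[best], history

# Synthetic data (made up for this sketch): variables 0 and 1 matter, 2 and 3 are noise.
rng = np.random.default_rng(1)
n, p = 200, 4
X_raw = rng.random((n, p))
y = np.sin(2 * np.pi * X_raw[:, 0]) + 4 * (X_raw[:, 1] - 0.5) ** 2 + rng.normal(0, 0.3, n)
bases = [bspline_basis(X_raw[:, j]) for j in range(p)]
bases = [B - B.mean(axis=0) for B in bases]       # center each block for identifiability
D = np.diff(np.eye(10), n=2, axis=0)              # second-order difference matrix
penalty = D.T @ D
best_mask, best_log_lam, history = genetic_search(bases, y, penalty)
print("selected variables:", np.flatnonzero(best_mask))
print("log10 smoothing parameters:", np.round(best_log_lam, 2))
```

Because the best individual of each generation is copied unchanged into the next, the best GCV score recorded in `history` can never get worse. Encoding the inclusion bits and the smoothing parameters in one chromosome is what lets a single search handle both problems together; a different fitness criterion could be swapped in for `gcv` without changing the search loop.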