Bayesian regularization and model choice in structured additive regression
In regression models with a large number of potential model terms, selecting an appropriate subset of covariates and their interactions is an important challenge for data analysis. So is choosing how to represent their effects on the quantities to be estimated, for example deciding between linear and smooth nonlinear effects. The main part of this work is dedicated to the development, implementation and validation of an extension of stochastic search variable selection (SSVS) for structured additive regression models, aimed at finding and estimating appropriate and parsimonious model representations. The approach described here is the first implementation of fully Bayesian variable selection and model choice for general responses from the exponential family in generalized additive mixed models (GAMM) that is available in free and open source software. It is based on a spike-and-slab prior on the regression coefficients with an innovative multiplicative parameter expansion that induces desirable shrinkage properties. This thesis points out a possible reason why previous attempts at extending SSVS algorithms to the selection of parameter vectors have not been entirely successful, discusses the regularization properties of the novel prior structure, and investigates the sensitivity of the results to the choice of hyperparameters. It also compares the performance of the approach on real and simulated data, in a variety of scenarios, to that of established methods such as boosting, conventional generalized additive mixed models and LASSO estimation. Case studies illustrate the usefulness as well as the limitations of the approach. The second part of this work presents a method for locally adaptive function estimation for functions with spatially varying roughness properties.
An implementation of locally adaptive penalized spline smoothing is presented that uses a class of heavy-tailed shrinkage priors to estimate functional forms with highly varying curvature or discontinuities. These priors use scale mixtures of normals with locally varying exponential-gamma distributed variances for the differences of the P-spline coefficients. Extensive simulation studies for Gaussian, Poisson, and binomial responses show that the performance is competitive with that of previous approaches. Results from two applications support the conclusions of the simulation studies.
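To make the prior structure for the P-spline coefficient differences more concrete, the following is a minimal sketch of drawing coefficients from a scale mixture of normals with exponential-gamma distributed local variances. It is an illustration under stated assumptions, not the implementation from the thesis. The function name, the use of first-order differences, the rate parameterization, and the hyperparameter values `shape` and `rate` are all hypothetical choices made for this example.

```python
import numpy as np


def sample_local_scale_mixture_coefs(n_coef=40, shape=1.0, rate=1.0, seed=0):
    """Draw spline coefficients whose first differences have local variances.

    Assumed hierarchy (illustrative parameterization, not taken from the source):
        lam_j            ~ Gamma(shape, rate)
        tau2_j | lam_j   ~ Exponential(rate = lam_j)
        diff_j | tau2_j  ~ Normal(0, tau2_j)
    """
    rng = np.random.default_rng(seed)
    n_diff = n_coef - 1

    # NumPy parameterizes gamma and exponential draws by scale = 1 / rate.
    lam = rng.gamma(shape, 1.0 / rate, size=n_diff)
    tau2 = rng.exponential(1.0 / lam)

    # Each difference gets its own variance, so occasional large jumps are
    # possible while most differences stay small.
    diffs = rng.normal(0.0, np.sqrt(tau2))

    # Accumulate the differences into coefficients, fixing the first at zero.
    coefs = np.concatenate(([0.0], np.cumsum(diffs)))
    return coefs, tau2


coefs, tau2 = sample_local_scale_mixture_coefs()
print("largest local variance:", tau2.max())
print("median local variance:", np.median(tau2))
```

Because every difference has its own variance, a draw from this prior typically shows long smooth stretches interrupted by a few abrupt changes. This is the behaviour needed for functions with spatially varying roughness.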