Abstract
Motivated by the CATHGEN data, we develop a new statistical method for simultaneous variable selection and parameter estimation in generalized partly linear models for data with high-dimensional covariates. The method, referred to as the broken adaptive ridge (BAR) estimator, approximates L0-penalized regression by iteratively performing reweighted squared L2-penalized regression. The generalized partly linear model extends the generalized linear model by incorporating a non-parametric component, which allows a flexible model to capture various types of covariate effects. We employ Bernstein polynomials as the sieve space to approximate the non-parametric functions, so that the method can be implemented easily using existing R packages. Extensive simulation studies suggest that the proposed method performs better than other commonly used penalty-based variable selection methods. We apply the method to the CATHGEN data, which come from a coronary artery disease study with a binary response and which motivated our research, and obtain new findings for both the high-dimensional genetic and the low-dimensional non-genetic covariates.
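
To make the phrase "iteratively performing reweighted squared L2-penalized regression" concrete, the display below gives a minimal sketch of a BAR-type iteration and of the Bernstein sieve. The notation is assumed for illustration and is not taken from the paper: \ell_n is the log-likelihood of the generalized partly linear model, \beta = (\beta_1, ..., \beta_p) collects the regression coefficients subject to selection, \phi = (\phi_0, ..., \phi_m) collects the Bernstein coefficients of a single non-parametric function f on a covariate rescaled to [0, 1], and \xi_n, \lambda_n > 0 are tuning parameters. Penalizing only \beta, and not \phi, is also an assumption of this sketch.

```latex
% Sketch under assumed notation; not the paper's own formulation.
% Step 0: ordinary ridge fit as the starting value.
% Step k: ridge fit whose weights are the inverse squares of the previous iterate.
\begin{align*}
(\hat\beta^{(0)}, \hat\phi^{(0)})
  &= \arg\min_{\beta,\,\phi}\Big\{-2\,\ell_n(\beta,\phi)
     + \xi_n \sum_{j=1}^{p} \beta_j^{2}\Big\}, \\
(\hat\beta^{(k)}, \hat\phi^{(k)})
  &= \arg\min_{\beta,\,\phi}\Big\{-2\,\ell_n(\beta,\phi)
     + \lambda_n \sum_{j=1}^{p}
       \frac{\beta_j^{2}}{\big(\hat\beta_j^{(k-1)}\big)^{2}}\Big\},
     \qquad k \ge 1, \\
\hat\beta_{\mathrm{BAR}}
  &= \lim_{k \to \infty} \hat\beta^{(k)}.
\end{align*}

% Bernstein polynomial sieve of degree m for the non-parametric function f.
\[
f(t) \approx \sum_{l=0}^{m} \phi_l\, B_l(t; m),
\qquad
B_l(t; m) = \binom{m}{l}\, t^{l} (1 - t)^{m - l},
\qquad t \in [0, 1].
\]
```

At a fixed point of this iteration, the ratio \beta_j^2 / (\hat\beta_j^{(k-1)})^2 equals 1 for every coefficient that stays away from zero, while coefficients that shrink toward zero are penalized ever more heavily and are driven to exactly zero in the limit. The penalty therefore behaves like \lambda_n times the number of non-zero coefficients, which is the sense in which the procedure approximates L0-penalized regression. Because each step is a weighted ridge fit on an expanded design that includes the Bernstein basis columns, it can in principle be carried out with standard penalized regression software.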