Abstract
Modeling prognosis has critical implications for cancer research and clinical practice. Many studies have been conducted using genomic (and, more generally, omics) data and pathological imaging data. A handful of recent studies have also integrated the two types of data, taking advantage of their complementary signals. In this study, we further advance cancer prognosis modeling by developing a semiparametric accelerated failure time model. For the high-dimensional genomic variables, where strong interpretability is desired, we assume parametric effects. For the high-dimensional pathological imaging features, where flexible and effective modeling can be preferred over interpretability, we assume non-parametric effects. Different from many existing studies, these non-parametric effects are estimated using deep neural networks. To distinguish important genomic and pathological imaging variables from noise, we impose penalization on both the parametric and non-parametric effects. In particular, for the non-parametric effects, we apply group penalization to the first-layer weights. Asymptotic selection consistency, estimation consistency, and normality properties are rigorously established, providing a strong theoretical foundation. Computation is also examined. Simulation demonstrates competitive performance of the proposed approach. In the analysis of The Cancer Genome Atlas data on lung cancer, the proposed approach leads to satisfactory and sensible findings that differ from those of the alternatives.
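As a rough illustration of the model structure described above, one common way to write a semiparametric accelerated failure time model with penalized parametric and non-parametric components is sketched below. The notation is assumed for illustration only and is not taken from the paper: T_i is the survival time, x_i the p genomic variables, z_i the q imaging features, g_theta a deep neural network with parameters theta, W^{(1)}_{\cdot k} the first-layer weights attached to the k-th imaging feature, L_n an unspecified loss suitable for censored data, and rho a generic sparsity penalty with tuning parameters lambda_1 and lambda_2.

```latex
% Illustrative notation only; every symbol here is an assumption.
% Model: parametric genomic effects plus non-parametric imaging effects.
\[
  \log T_i = x_i^\top \beta + g_\theta(z_i) + \varepsilon_i,
  \qquad i = 1, \dots, n .
\]
% Penalized objective: individual penalty on beta, group penalty on
% the first-layer weights of the network, grouped by input feature.
\[
  \min_{\beta,\, \theta} \;
  L_n(\beta, g_\theta)
  + \lambda_1 \sum_{j=1}^{p} \rho\bigl(|\beta_j|\bigr)
  + \lambda_2 \sum_{k=1}^{q} \rho\bigl(\| W^{(1)}_{\cdot k} \|_2\bigr) .
\]
```

Under this sketch, grouping all first-layer weights attached to one imaging feature means that shrinking a whole group to zero removes that feature from the network, which is how a group penalty on the first layer can act as variable selection for the non-parametric part.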