Abstract
Dynamic treatment regimens (DTRs), in which treatment decisions are tailored to an individual patient's characteristics and evolving health status over multiple stages, have gained increasing interest in the modern era of precision medicine. Identifying the important features that drive these decisions across stages not only yields parsimonious DTRs for practical use but also enhances the reliability of learning optimal DTRs. Existing methods for learning optimal DTRs, such as Q-learning and O-learning, rely on a sequential procedure that estimates the optimal decision at each stage, working backward from the final stage. Incorporating feature selection into these methods through regularization at each stage of estimation can only identify tailoring variables that are unimportant at that particular stage; it cannot identify variables that are unimportant across all stages. As a result, false discovery errors are likely to accumulate over stages in these sequential methods. To overcome this limitation, we propose a framework, namely L1 multistage ramp loss (L1-MRL) learning, that learns the optimal decision rules and performs variable selection across all stages simultaneously. This framework uses a single multistage ramp loss to estimate the optimal DTRs for all stages. Furthermore, a group Lasso-type penalty is imposed on the coefficients of the decision rules across all stages, which enables the identification of features that are important for the decision at one or more stages. Theoretically, we show that the estimator is consistent for the optimal DTR and enjoys the oracle property. Through extensive simulation studies and an application to electronic health record (EHR) data for type 2 diabetes (T2D) patients, we demonstrate that the proposed method performs as well as or better than many existing DTR methods with variable selection capability.