Abstract
Despite the rapid advancement of machine learning algorithms, the important problem of distinguishing patients by their likelihood of mortality remains a challenge. In this paper, we investigated the degree to which incorporating a time-varying factor, length of hospitalization, could contribute to modeling mortality. A two-part modeling approach was proposed to capture potential heterogeneity over follow-up time and to evaluate the extent to which allowing a predictor based on a fixed-time event, such as baseline hospitalization, to carry a time-varying coefficient enhanced mortality prediction. A test was then conducted to assess whether the association between hospitalization and mortality diminished with a patient's continued survival. Leveraging logistic regression models and the XGBoost procedure, the findings supported the claim that baseline hospitalization is a risk factor whose importance diminishes the longer the patient survives. While simulation studies and theoretical considerations indicate that the two-part model provides deeper insight into the evolving dynamics of regression coefficients and improves the prediction accuracy of the marginal probability of mortality, its application to the empirical data that motivated this research yielded less compelling results, consistent with earlier studies. Factors such as class imbalance and the magnitude of heterogeneous effects can substantially affect the performance of the two-part model in empirical datasets.
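The core idea of letting a baseline predictor's effect decay over follow-up time can be illustrated with a minimal sketch. The snippet below is not the paper's actual two-part model: it simulates hypothetical data in which the log-odds contribution of baseline hospitalization shrinks linearly with follow-up time, then fits a logistic regression with a hospitalization-by-time interaction to recover that decay. All variable names and the data-generating process are illustrative assumptions.

```python
# Illustrative sketch, not the paper's model: a time-varying coefficient for a
# fixed-time event (baseline hospitalization) via an interaction with follow-up
# time. Data-generating values (1.5, -0.3, etc.) are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
hosp = rng.integers(0, 2, n)   # baseline hospitalization indicator (0/1)
t = rng.uniform(0, 5, n)       # follow-up time, e.g. in years

# True log-odds of mortality: the hospitalization effect starts at 1.5 and
# decays by 0.3 per unit of follow-up time.
logit = -2.0 + (1.5 - 0.3 * t) * hosp
p = 1 / (1 + np.exp(-logit))
death = rng.binomial(1, p)

# Fit with main effects plus the hospitalization x time interaction.
X = np.column_stack([hosp, t, hosp * t])
model = LogisticRegression().fit(X, death)
b_hosp, b_t, b_interact = model.coef_[0]
print(f"hosp: {b_hosp:.2f}, hosp x time: {b_interact:.2f}")
```

A negative fitted interaction coefficient corresponds to the abstract's claim: hospitalization is a risk factor whose importance diminishes the longer the patient survives.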