Abstract
Introduction
Large Cell Carcinoma (LCC) is a rare, aggressive subtype of non-small cell lung cancer characterized by rapid progression and limited treatment options. Prognostic assessment in this population has received little attention, and conventional models based on American Joint Committee on Cancer (AJCC) staging often fail to capture the complex biological behavior of LCC. Machine learning approaches can integrate diverse clinical variables, potentially improving prognostic accuracy and supporting personalized treatment strategies.

Methods
We conducted a retrospective study of 1,867 LCC patients from the Surveillance, Epidemiology, and End Results (SEER) Program database to develop and validate several machine learning-based prognostic models for overall survival (OS). Features were selected using Lasso-Cox regression and the Boruta algorithm. Model performance was assessed with time-dependent area under the curve (AUC), calibration plots, decision curve analysis, and Brier scores. Interpretability techniques, including SHapley Additive exPlanations (SHAP), partial dependence plots, and restricted cubic splines (RCS), were used to characterize the prognostic effect of key features.

Results
Among the models evaluated, the Random Survival Forest (RSF) showed the highest discrimination and robust calibration, outperforming traditional Cox regression and the other machine learning methods. In the training set, the RSF model achieved AUCs of 0.858 (95% CI: 0.838-0.878) for 3-year OS and 0.863 (95% CI: 0.840-0.886) for 5-year OS.
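The time-dependent AUC and Brier score used to evaluate the models can be sketched as below. This is a minimal pure-Python illustration under stated assumptions: it uses a cumulative/dynamic AUC (cases are deaths by horizon t, controls are patients still under observation after t) with inverse-probability-of-censoring weights (IPCW) from a Kaplan-Meier estimate of the censoring distribution, and a Graf-style IPCW Brier score. The abstract does not say which estimators or software were used, and the function names and toy data here are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, Sequence


def censoring_km(times: Sequence[float], events: Sequence[int]) -> Callable[..., float]:
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t).

    A censored record (event == 0) is treated as the 'event' here.
    The returned G(t, left=True) gives the left limit G(t-).
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    step_times, step_values = [], []
    at_risk, surv, k = len(times), 1.0, 0
    while k < len(order):
        t = times[order[k]]
        n_tied = n_cens = 0
        # Group all records tied at time t and count the censored ones.
        while k + n_tied < len(order) and times[order[k + n_tied]] == t:
            n_cens += 1 if events[order[k + n_tied]] == 0 else 0
            n_tied += 1
        if n_cens:
            surv *= 1.0 - n_cens / at_risk
        step_times.append(t)
        step_values.append(surv)
        at_risk -= n_tied
        k += n_tied

    def G(t: float, left: bool = False) -> float:
        # Index of the last step strictly before t (left limit) or at/before t.
        idx = bisect_left(step_times, t) if left else bisect_right(step_times, t)
        return step_values[idx - 1] if idx > 0 else 1.0

    return G


def cumulative_dynamic_auc(times, events, risk, t) -> float:
    """IPCW cumulative/dynamic AUC(t); higher risk score = higher predicted risk."""
    G = censoring_km(times, events)
    cases = [i for i in range(len(times)) if times[i] <= t and events[i] == 1]
    controls = [j for j in range(len(times)) if times[j] > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at horizon t")
    num = den = 0.0
    for i in cases:
        w = 1.0 / G(times[i], left=True)  # case weight 1/G(T_i-)
        for j in controls:                # control weight 1/G(t) is shared and cancels
            if risk[i] > risk[j]:
                num += w
            elif risk[i] == risk[j]:
                num += 0.5 * w
            den += w
    return num / den


def ipcw_brier(times, events, surv_at_t, t) -> float:
    """IPCW Brier score at horizon t; surv_at_t[i] is the predicted P(T_i > t)."""
    G = censoring_km(times, events)
    g_t = G(t)  # assumes t lies before the last censoring drives G to 0
    total = 0.0
    for ti, di, s in zip(times, events, surv_at_t):
        if ti <= t and di == 1:   # died by t: observed survival status is 0
            total += s ** 2 / G(ti, left=True)
        elif ti > t:              # known to survive past t: observed status is 1
            total += (1.0 - s) ** 2 / g_t
        # censored before t: contributes 0; the weights account for it
    return total / len(times)


if __name__ == "__main__":
    # Hypothetical toy data (not SEER): follow-up in years, 1 = death, 0 = censored.
    times, events = [1, 2, 3, 4], [1, 0, 1, 1]
    risk = [4, 1, 1.5, 2]        # model risk scores
    surv = [0.2, 0.7, 0.4, 0.6]  # predicted survival probability at t = 3.5
    print(cumulative_dynamic_auc(times, events, risk, 3.5))
    print(ipcw_brier(times, events, surv, 3.5))
```

The weighting matters once patients are censored: in the toy data the unweighted case/control concordance would be 0.5, while the IPCW estimate is 0.4 because the later death (after a censoring) is up-weighted. Production analyses would typically rely on an established survival-analysis library rather than this sketch.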
Key prognostic factors included tumor size, metastatic status, and treatment interventions; RCS analysis revealed a significant non-linear relationship between tumor size and survival.

Conclusions
Our machine learning framework, particularly the RSF model, shows strong predictive performance for OS in patients with LCC. An accessible, web-based platform further enhances clinical applicability, providing a data-driven approach to risk stratification and personalized treatment planning.
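The non-linear tumor-size effect reported from the RCS analysis rests on a spline expansion of the covariate. A minimal sketch of a restricted (natural) cubic spline basis follows, assuming k user-chosen knots; the abstract does not state the number or placement of knots, so the knots below are hypothetical, as are the function names. Each nonlinear term is cubic between knots and constrained to be linear below the first and above the last knot, which keeps the fitted curve stable in the tails.

```python
from typing import List, Sequence


def _pos_cube(u: float) -> float:
    """Truncated cubic (u)_+^3."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline basis [x, S_1(x), ..., S_{k-2}(x)] for k knots."""
    k = len(knots)
    if k < 3:
        raise ValueError("need at least 3 knots")
    t = sorted(knots)
    scale = (t[-1] - t[0]) ** 2  # keeps spline terms on a scale comparable to x
    basis = [float(x)]
    for j in range(k - 2):
        # Coefficients chosen so cubic and quadratic terms cancel beyond t[-1].
        a = (t[-1] - t[j]) / (t[-1] - t[-2])
        b = (t[-2] - t[j]) / (t[-1] - t[-2])
        s = _pos_cube(x - t[j]) - a * _pos_cube(x - t[-2]) + b * _pos_cube(x - t[-1])
        basis.append(s / scale)
    return basis


if __name__ == "__main__":
    knots = [10, 20, 40, 70]  # hypothetical tumor-size knots in mm
    for size in (5, 30, 80):
        print(size, rcs_basis(size, knots))
```

In a Cox model, the columns [x, S_1(x), ...] would replace the raw tumor size, and non-linearity can be assessed by testing whether the coefficients on the S_j terms are jointly zero.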