Abstract
OBJECTIVES: Anaplastic lymphoma kinase-positive anaplastic large cell lymphoma (ALCL-ALK positive) is a rare subtype of peripheral T-cell lymphoma with a generally favorable prognosis but marked clinical heterogeneity. Because research is limited and prognostic factors are complex, accurately predicting survival outcomes remains a clinical challenge. METHODS: This retrospective cohort study extracted data on 473 patients diagnosed with ALCL-ALK positive between 2016 and 2021 from the Surveillance, Epidemiology, and End Results (SEER) database. A multi-algorithm ensemble machine-learning framework was constructed. The performance of each model combination was systematically evaluated using the area under the curve (AUC) and the concordance index, and the combination of the least absolute shrinkage and selection operator (LASSO) with a Cox proportional hazards model fitted by componentwise likelihood-based boosting (CoxBoost) was selected as the optimal model. Feature selection and survival prediction were then performed, and model performance was validated using receiver operating characteristic curves and calibration plots. RESULTS: Six key prognostic variables were identified: age, marital status, Ann Arbor stage, radiotherapy, lung metastasis, and primary site. The LASSO + CoxBoost model demonstrated good discriminative ability, good calibration, and effective risk stratification in both the training and validation sets. The AUCs at 1, 3, and 5 years were 0.894, 0.829, and 0.862 in the training set and 0.733, 0.773, and 0.881 in the validation set. Kaplan-Meier survival curves showed significantly longer overall survival in the low-risk group than in the high-risk group (P < 0.05). CONCLUSION: This study is the first to construct a survival prediction model for ALCL-ALK positive using a multi-algorithm ensemble strategy. The model offers a practical tool for individualized risk assessment and may aid in optimizing clinical decision-making.
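The concordance index used above to compare model combinations can be illustrated with a minimal sketch of Harrell's definition. This is not the authors' implementation. The function name, the toy follow-up times, event indicators, and risk scores are all hypothetical and are not drawn from the SEER cohort. The sketch assumes a higher risk score means shorter expected survival and, for simplicity, ignores pairs with tied follow-up times.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       : follow-up time for each patient
    events      : 1 if death was observed, 0 if censored
    risk_scores : model output; higher is assumed to mean higher risk
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair (i, j) is comparable only if patient i had an observed
        # event strictly before patient j's follow-up time.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # model ranks the earlier death as riskier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half

    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy data (months of follow-up, event flag, risk score).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]

c_index = concordance_index(toy_times, toy_events, toy_risk)
c_index_reversed = concordance_index(toy_times, toy_events, toy_risk[::-1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk. In practice an established survival-analysis library would normally be used rather than a hand-written quadratic loop.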