Abstract
BACKGROUND/OBJECTIVES: Follicular lymphoma (FL) and marginal zone lymphoma (MZL) are low-grade B-cell lymphomas (LGBCLs) with indolent clinical courses but a lifelong risk of histologic transformation (HT) to aggressive lymphoma, particularly diffuse large B-cell lymphoma. Predicting HT is challenging because of class imbalance and the inherent complexity of time-dependent events. Existing prognostic indices address survival but do not specifically address HT risk. This study aimed to develop and validate survival-based and traditional classification machine-learning models to predict HT in FL and MZL cohorts.
METHODS: Using a multicenter retrospective dataset (n = 1068), survival models (Cox proportional hazards, Lasso-Cox, Random Survival Forest, Gradient-boosted Cox [GBM-Cox], eXtreme Gradient Boosting [XGBoost]-Cox) and classification models (Logistic regression, Lasso logistic, Random Forest, Gradient Boosting, XGBoost) were compared. The best-performing survival models (XGBoost-Cox, Lasso-Cox, and GBM-Cox) were assessed on an independent test set (n = 92). Model sensitivity was maximized using optimal binary risk cutoff points based on Youden's index.
RESULTS: Survival models showed better predictive performance than classical classifiers, with XGBoost-Cox exhibiting the highest mean accuracy (85.3%), time-dependent area under the curve (0.795), sensitivity (98%), specificity (83.9%), and concordance index (0.836). Incorporating next-generation sequencing (NGS) data improved model accuracy and specificity, indicating that genetic factors improve HT prediction. Principal component analysis revealed distinct gene mutation patterns associated with HT risk, highlighting DNA-repair genes such as TP53, BLM, and RAD50.
CONCLUSIONS: This study highlights the clinical value of survival-based machine-learning methods integrated with NGS data to personalize HT risk stratification for patients with FL and MZL.
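The Youden-based cutoff selection mentioned in METHODS can be illustrated with a minimal sketch. This is not the study's implementation: the function name `youden_cutoff`, the toy risk scores, and the toy event labels are all hypothetical, and the sketch treats HT as a simple binary outcome, ignoring censoring and follow-up time (a time-dependent variant would be needed for survival models). Youden's index is J = sensitivity + specificity - 1, and the chosen cutoff is the score threshold that maximizes J.

```python
from typing import Sequence, Tuple


def youden_cutoff(
    scores: Sequence[float], events: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    A case is labeled high risk when its score is >= cutoff.
    Hypothetical helper for illustration only; censoring is ignored.
    """
    if len(scores) != len(events):
        raise ValueError("scores and events must have the same length")
    n_pos = sum(1 for e in events if e == 1)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both event and non-event cases are required")

    best = None
    # Try every observed score as a candidate threshold.
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, e in zip(scores, events) if e == 1 and s >= cutoff)
        tn = sum(1 for s, e in zip(scores, events) if e == 0 and s < cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        # Strict '>' keeps the lowest cutoff when several tie on J.
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Toy data (made up): predicted risk scores and observed HT (1) / no HT (0).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_events = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j, sens, spec = youden_cutoff(toy_scores, toy_events)
print(f"cutoff={cutoff}, J={j:.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On ties in J, this sketch keeps the lowest threshold, which favors sensitivity over specificity; that tie-breaking rule is an assumption of the example, not something stated in the abstract.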