Abstract
BACKGROUND: Existing models quantify the risk of lymph node metastases (LNM) poorly. This study aimed to develop a machine-learning model to predict LNM in patients with T1 esophageal squamous cell carcinoma (ESCC). METHODS AND RESULTS: In this multicenter, population-based study, elastic net regression (ELR), random forest (RF), and extreme gradient boosting (XGB) models were generated, together with an ensemble model combining them. The contribution of each factor to the models was calculated. All models showed strong discriminative ability. Elastic net regression performed best, with an area under the curve (AUC) of 0.803 on external validation, whereas the NCCN guidelines identified patients with LNM with an AUC of 0.576 and the logistic model with an AUC of 0.670. The most important features were lymphatic and vascular invasion and depth of tumor invasion. CONCLUSIONS: Models built with machine-learning approaches performed excellently in estimating the likelihood of LNM in T1 ESCC.
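The AUC values above measure how well each approach ranks patients with LNM above patients without it. The following is a minimal sketch, assuming the ensemble simply averages the three models' predicted probabilities (soft voting). The abstract does not state how the models were combined. It computes AUC as the fraction of (LNM, no-LNM) patient pairs in which the patient with LNM receives the higher score, with ties counted as one half; this is the normalized Mann-Whitney U statistic. The function names `auc` and `soft_vote` and all scores are hypothetical toy values, not study data.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive case (label 1, e.g. LNM present) has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def soft_vote(*model_probs: Sequence[float]) -> list[float]:
    """Hypothetical ensemble rule (an assumption, not the study's stated method):
    average each patient's predicted LNM probability across all models."""
    if not model_probs:
        raise ValueError("need at least one model")
    n = len(model_probs[0])
    if any(len(p) != n for p in model_probs):
        raise ValueError("all models must score the same patients")
    return [sum(p[i] for p in model_probs) / len(model_probs) for i in range(n)]


if __name__ == "__main__":
    # Made-up toy data: 1 = LNM present, 0 = LNM absent.
    labels = [1, 0, 1, 0, 0]
    enet = [0.9, 0.2, 0.6, 0.4, 0.1]
    rf = [0.7, 0.3, 0.5, 0.6, 0.2]
    xgb = [0.8, 0.1, 0.7, 0.75, 0.3]
    models = [("elastic net", enet), ("RF", rf), ("XGB", xgb),
              ("ensemble", soft_vote(enet, rf, xgb))]
    for name, probs in models:
        print(f"{name}: AUC = {auc(labels, probs):.3f}")
```

Because AUC depends only on how the scores rank patients, it is unchanged by any monotone rescaling of a model's outputs. Soft voting is only one way to combine models; stacking with a meta-learner is another.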