Abstract
OBJECTIVES: Rapid discrimination between infections caused by Mycobacterium tuberculosis (MTB) and those caused by non-tuberculous mycobacteria (NTM) is crucial in clinical settings. Although their clinical and radiological features overlap, the two require markedly different therapeutic approaches and public health responses. Current laboratory methods are time-consuming and complex, underscoring the urgent need for a simple and efficient diagnostic tool to inform public health decision-making.

METHODS: Demographic, haematological and biochemical data were collected from two hospitals in Jiangsu province, China, between December 2018 and October 2024. A total of 400 patients were included in the training cohort, and a further 66 patients were used for external validation. Six machine learning models were developed using routine laboratory features, and their performance was evaluated using multiple metrics.

RESULTS: Using 49 routine laboratory features, the random forest (RF) model outperformed the other five models, achieving 82.71% accuracy in the internal cohort and 87.69% in external validation. SHapley Additive exPlanations (SHAP) analysis identified the 10 features with the greatest influence on model decisions: chloride, sodium, gender, prealbumin, high-density lipoprotein, procalcitonin, albumin, globulin, total protein and creatine. Based on these indicators, an interactive web-based tool was developed (https://mtb-ntm.streamlit.app).

DISCUSSION: The features identified by the model align with established clinical parameters and existing studies. Certain previously underestimated variables, such as chloride (Cl) and sodium (Na), exhibited substantial importance in distinguishing between MTB and NTM, offering valuable insights for the development of decision-support tools.

CONCLUSION: Routine laboratory indicators coupled with the RF model showed potential as an auxiliary diagnostic tool for discriminating MTB from NTM disease, one that could provide medical support in resource-limited and remote settings.
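The abstract reports accuracy and refers to "multiple metrics" without naming them. As a minimal sketch of how such metrics are typically derived from a binary classifier's predictions, the following computes accuracy, sensitivity, specificity, precision and F1 from paired labels. The choice of metrics, the function name `binary_metrics`, the coding of MTB as the positive class and the example labels are all assumptions made for illustration; none of them is taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary-classification metrics from paired labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts relative to the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return BinaryMetrics(
        accuracy=ratio(tp + tn, len(pairs)),
        sensitivity=sensitivity,
        specificity=ratio(tn, tn + fp),
        precision=precision,
        f1=ratio(2 * precision * sensitivity, precision + sensitivity),
    )


# Made-up labels for illustration only (assumed coding: 1 = MTB, 0 = NTM).
# These are NOT study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

Reporting sensitivity and specificity alongside accuracy matters when the two classes are unequally represented, because accuracy alone can look favourable for a model that mostly predicts the majority class.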