Abstract
BACKGROUND: Classifying the etiology of pleural effusion is a critical challenge in clinical practice. Traditional diagnostic methods rely on simple cut-off values for laboratory tests. Machine learning (ML) offers an alternative approach that can capture non-linear relationships and may improve diagnostic accuracy. METHOD: A retrospective study was conducted using data from patients diagnosed with pleural effusion. The dataset was divided into training and test sets at a ratio of 7:3, and six machine learning algorithms were implemented to diagnose the etiology of pleural effusion. Model performance was assessed by accuracy, precision, recall, F1 score and area under the receiver operating characteristic curve (AUC). Feature importance and the average predictions for age, adenosine deaminase (ADA) and lactate dehydrogenase (LDH) were analyzed, and the decision tree was visualized. RESULTS: A total of 742 patients were included (training cohort: 522; test cohort: 220), of whom 397 (53.3%) were diagnosed with malignant pleural effusion (MPE) and 253 (34.1%) with tuberculous pleural effusion (TPE). All six models performed well in the diagnosis of MPE, TPE and transudates. Extreme Gradient Boosting and Random Forest performed better in the diagnosis of MPE, with F1 scores above 0.890, while K-Nearest Neighbors and Tabular Transformer performed better in the diagnosis of TPE, with F1 scores above 0.870. ADA was identified as the most important feature. The ROC curves of the machine learning models outperformed those of conventional diagnostic thresholds. CONCLUSIONS: This study demonstrates that ML models using age, ADA and LDH can effectively classify the etiologies of pleural effusion, suggesting that ML-based approaches may enhance diagnostic decision-making.
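The per-class precision, recall and F1 scores named in the abstract can be derived from one-vs-rest counts of true positives, false positives and false negatives. The following is a minimal sketch in Python, not the study's own code: the function names `per_class_metrics` and `accuracy` are hypothetical, and the toy label lists are invented purely for illustration and do not come from the study data.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} from one-vs-rest counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative

    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]  # times this class was predicted
        actual = tp[label] + fn[label]     # times this class truly occurred
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for illustration only (NOT data from the study).
LABELS = ["MPE", "TPE", "transudate"]
y_true = ["MPE", "MPE", "MPE", "MPE", "TPE", "TPE", "TPE",
          "transudate", "transudate", "transudate"]
y_pred = ["MPE", "MPE", "MPE", "TPE", "TPE", "TPE", "MPE",
          "transudate", "transudate", "MPE"]

metrics = per_class_metrics(y_true, y_pred, LABELS)
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
for label in LABELS:
    precision, recall, f1 = metrics[label]
    print(f"{label}: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Reporting the scores per class, rather than as a single pooled number, is what allows statements such as one model performing better for MPE and another for TPE.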