Abstract
OBJECTIVE: Depression is a major global health challenge, and its multifaceted nature complicates both diagnosis and treatment. This study explores combining multiple feature selection (FS) methods with SHapley Additive exPlanations (SHAP), an explainable artificial intelligence (XAI) method, to enhance predictive accuracy in depression classification models built on large-scale national survey data. METHODS: Using microdata from the 2021 National Mental Health Survey of Korea, covering 5511 Korean adults, this study systematically evaluates how different combinations of FS methods and machine learning classifiers affect model performance, and identifies nondiagnostic socioeconomic, psychological, and lifestyle factors associated with clinically diagnosed depression. We apply diverse FS methods (e.g., ReliefF, Markov Blanket, and Information Gain) and compare their performance across 12 classifiers. RESULTS: The optimal FS method depends on classifier architecture: ReliefF performs best with Stacking (F2-score = 0.9851), whereas Markov Blanket performs best with ExtraTrees and LightGBM (F2-scores = 0.9848 and 0.9838, respectively). After excluding variables that correspond to core diagnostic criteria to avoid circularity, our analysis identifies social distress (loneliness), reluctance to seek professional help, quality-of-life measures, and physical health comorbidities as highly influential nondiagnostic predictors. CONCLUSION: Our findings advance the field by (1) systematically demonstrating that FS method effectiveness varies by classifier type, (2) providing a dual-layer XAI framework that combines FS with SHAP for comprehensive interpretability, and (3) identifying culturally specific risk factors in an underrepresented Asian population using high-quality data collected face to face.
These contributions provide methodological guidance for researchers developing interpretable depression prediction models and offer clinically actionable insights for identifying at-risk individuals in Korean populations.
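The F2-score used to rank the FS-classifier combinations is the F-beta score with beta = 2, which weights recall (missed depression cases) more heavily than precision (false alarms). As a minimal sketch, assuming binary labels and confusion-matrix counts, it can be computed as below; the function name `f_beta` and the two sets of counts are hypothetical illustrations, not values or code from this study. The example shows how F1 and F2 can rank the same two models differently.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion-matrix counts (hypothetical helper).

    F_beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP),
    which equals (1 + b^2) * P * R / (b^2 * P + R) for precision P and recall R.
    beta > 1 penalizes false negatives (FN) more than false positives (FP).
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Guard against an all-zero confusion matrix.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts for two screening models (NOT results from this study).
model_a = {"tp": 90, "fp": 20, "fn": 10}  # higher recall, lower precision
model_b = {"tp": 80, "fp": 5, "fn": 20}   # higher precision, lower recall

for name, counts in [("A", model_a), ("B", model_b)]:
    f1 = f_beta(**counts, beta=1.0)
    f2 = f_beta(**counts, beta=2.0)
    print(f"model {name}: F1={f1:.4f}  F2={f2:.4f}")
# model A: F1=0.8571  F2=0.8824
# model B: F1=0.8649  F2=0.8247
```

Under F1, model B ranks higher; under F2, model A ranks higher because it misses fewer positive cases, which is why F2 is often preferred in screening settings where false negatives are costlier.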