Abstract
This exploratory study investigated whether voice-derived acoustic features reflect depressive symptom severity and whether they carry preliminary predictive signal for distinguishing individuals with Major Depressive Disorder (MDD) from healthy controls (HC). Using the publicly available MODMA dataset (23 MDD; 29 HC), 6553 acoustic features were extracted with openSMILE. Spearman correlation and group-difference analyses identified several MFCC-derived spectral features as moderately and consistently associated with PHQ-9 scores, suggesting their potential relevance as severity-linked acoustic markers. To complement these findings, a supplementary severity-based classification (PHQ-9 ≥ 10 threshold) showed that a logistic regression model trained on the top five correlated MFCC features achieved a cross-validated AUC of 0.78 (SD = 0.15), supporting their association with clinically defined symptom burden. Four machine learning pipelines were further evaluated on an exploratory MDD vs. HC classification task. Among them, the PCA + XGBoost model showed the most stable generalization (test AUC = 0.60), although predictive performance remained limited, reflecting the small sample size and high dimensionality of the dataset. SHAP analysis highlighted MFCC-derived features as the main contributors to model decisions, lending interpretability to the models. Overall, the study presents preliminary evidence linking acoustic characteristics to depressive symptoms and outlines a reproducible analytical workflow, while underscoring the need for substantially larger and more diverse datasets to establish clinically meaningful predictive validity.
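
As a minimal sketch of the supplementary severity analysis described above, the following Python snippet illustrates a correlation-based feature ranking followed by a five-fold cross-validated logistic regression on the PHQ-9 ≥ 10 label. The function and variable names, fold count, and the synthetic placeholder data are illustrative assumptions; MODMA loading and openSMILE feature extraction are omitted.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def severity_auc(features: pd.DataFrame, phq9: pd.Series, k: int = 5):
    """Rank features by |Spearman rho| against PHQ-9, keep the top k,
    and cross-validate a logistic regression on the PHQ-9 >= 10 label."""
    rho = features.apply(lambda col: spearmanr(col, phq9)[0])
    top = rho.abs().nlargest(k).index
    y = (phq9 >= 10).astype(int)          # clinically defined symptom burden
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    aucs = cross_val_score(model, features[top], y, cv=cv, scoring="roc_auc")
    return aucs.mean(), aucs.std()

# Illustrative call with synthetic placeholders (52 speakers, 6553 features):
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(52, 6553)))
phq9 = pd.Series(rng.integers(0, 28, size=52))
print(severity_auc(X, phq9))
```

Note that ranking features on the full sample before cross-validation can leak selection information into the folds; repeating the correlation ranking inside each training fold would give a stricter estimate.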
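
Similarly, a minimal sketch of the PCA + XGBoost pipeline with SHAP attribution is given below. The retained-variance threshold, train/test split, XGBoost hyperparameters, and synthetic placeholder data are illustrative assumptions, not the study's exact settings.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(52, 6553))   # placeholder for the openSMILE feature matrix
y = rng.integers(0, 2, size=52)   # placeholder MDD (1) / HC (0) labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

scaler = StandardScaler().fit(X_tr)
pca = PCA(n_components=0.95).fit(scaler.transform(X_tr))  # keep 95% variance (assumed)
Z_tr = pca.transform(scaler.transform(X_tr))
Z_te = pca.transform(scaler.transform(X_te))

clf = xgb.XGBClassifier(n_estimators=200, max_depth=3,
                        learning_rate=0.05, eval_metric="logloss")
clf.fit(Z_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, clf.predict_proba(Z_te)[:, 1]))

# SHAP attributions are computed in PCA space; relating them back to the
# original MFCC-derived features requires projecting through the PCA loadings.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(Z_te)
```

Fitting the scaler and PCA on the training split only, as above, keeps the held-out test AUC free of preprocessing leakage, which matters given the small sample.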