Abstract
BACKGROUND: Major depressive disorder (MDD) remains challenging to diagnose because current assessment relies on subjective interviews and self-reports. Objective, technology-driven methods are increasingly needed to support clinical decision-making. Wearable point-of-view (POV) glasses, which capture both visual and auditory streams, may offer a novel solution for multimodal behavioral analysis.

OBJECTIVE: This study investigated whether features extracted from POV glasses, analyzed with machine learning, can differentiate individuals with MDD from healthy controls (HCs).

METHODS: We studied 44 patients with MDD and 41 age- and sex-matched HCs (aged 18-55 years). During semi-structured interviews, POV glasses recorded video and audio. Visual features included gaze distribution, smiling duration, eye-blink frequency, and head movements; speech features included response latency, silence ratio, and word count. Recursive feature elimination was applied for feature selection. Multiple classifiers were evaluated, and the primary model, an ExtraTrees classifier, was assessed using leave-one-out cross-validation.

RESULTS: After Bonferroni correction, smiling duration, center gaze, and happy face duration showed significant group differences. The multimodal classifier achieved an accuracy of 84.7%, a sensitivity of 90.9%, a specificity of 78%, and an F1 score of 86%.

CONCLUSIONS: POV glasses combined with machine learning captured multimodal behavioral markers that distinguished patients with MDD from controls. This low-burden, wearable approach shows promise as an objective adjunct to psychiatric assessment. Future studies should evaluate its generalizability in larger, more diverse populations and in real-world clinical settings.
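The analysis pipeline named in the abstract (recursive feature elimination, an ExtraTrees classifier, leave-one-out cross-validation, and the accuracy/sensitivity/specificity/F1 metrics) can be sketched as below. This is a minimal illustration, not the authors' code: the synthetic features, the number of input features, the number of features retained by RFE, and the tree count are all placeholder assumptions; only the group sizes (44 MDD, 41 HC) come from the abstract.

```python
# Minimal sketch of the abstract's evaluation scheme, under the
# assumptions stated above. Feature values here are random stand-ins
# for the real multimodal (visual + speech) features.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import accuracy_score, recall_score, f1_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n_mdd, n_hc, n_features = 44, 41, 12             # group sizes from the abstract; 12 is a placeholder
X = rng.normal(size=(n_mdd + n_hc, n_features))  # stand-in multimodal features
y = np.array([1] * n_mdd + [0] * n_hc)           # 1 = MDD, 0 = healthy control

# RFE sits inside the pipeline so feature selection is refit on each
# training fold, avoiding leakage into the held-out participant.
pipe = Pipeline([
    ("rfe", RFE(ExtraTreesClassifier(n_estimators=100, random_state=0),
                n_features_to_select=6)),         # 6 retained features: an assumption
    ("clf", ExtraTreesClassifier(n_estimators=100, random_state=0)),
])

# Leave-one-out cross-validation: each participant is held out once.
loo = LeaveOneOut()
preds = np.empty_like(y)
for train_idx, test_idx in loo.split(X):
    pipe.fit(X[train_idx], y[train_idx])
    preds[test_idx] = pipe.predict(X[test_idx])

print("accuracy   :", accuracy_score(y, preds))
print("sensitivity:", recall_score(y, preds, pos_label=1))  # recall on MDD class
print("specificity:", recall_score(y, preds, pos_label=0))  # recall on HC class
print("F1 score   :", f1_score(y, preds))
```

As a consistency check on the reported figures: a sensitivity of 90.9% among 44 patients implies 40 true positives, and a specificity of 78% among 41 controls implies 32 true negatives, giving an accuracy of 72/85 = 84.7% and an F1 score of 2×40/(2×40 + 9 + 4) ≈ 86%, matching the reported values.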