Multimodal Depression Detection Through Conversational Interactions with an Emotion-Aware Social Robot: Pilot Study


Abstract

BACKGROUND: Depression affects more than 300 million people worldwide and is a leading contributor to the global disease burden. Traditional diagnostic methods, such as structured clinical interviews, are reliable but impractical for frequent or large-scale screening. Self-report tools like the Patient Health Questionnaire-8 (PHQ-8) require disclosure and clinician oversight, limiting accessibility. Recent artificial intelligence-based approaches leverage multimodal behavioral cues (linguistic, acoustic, and visual) for automated depression detection but remain constrained by limited adaptability, scarce annotated data, weak emotional expression in real-world settings, and the high computational cost of deployment on socially assistive robots (SARs).

OBJECTIVE: This study introduces Depression Social Assistant Robot (DEPRESAR)-Fusion, a lightweight multimodal depression detection framework designed for natural interactions with emotion-aware SARs. The objective of this study was to enhance detection accuracy in everyday conversations while addressing the challenges of data scarcity, weak emotional cues, and computational efficiency.

METHODS: DEPRESAR-Fusion integrates acoustic, linguistic, and visual features with an emotion-aware response module powered by large language models to adapt conversational strategies dynamically. To stimulate richer emotional expression, participants were exposed to emotionally evocative videos before SAR interactions. To overcome data scarcity, we augmented training with (1) public depression-related social media corpora and (2) synthetic samples generated via large language models. The proposed multimodal fusion architecture was evaluated on benchmark clinical datasets for both binary depression classification and PHQ-8 regression tasks. Performance was compared against prior multimodal baselines using root mean square error, mean absolute error, and standard classification metrics.

RESULTS: Participants who viewed emotional stimuli before interacting with SARs exhibited significantly higher emotional expressiveness, leading to improved model performance. Regression tasks showed lower root mean square error and mean absolute error, while classification tasks achieved significantly higher accuracy than in the nonstimulus condition. DEPRESAR-Fusion outperformed prior multimodal baselines across multiple benchmark datasets, achieving state-of-the-art performance in both binary classification and PHQ-8 regression. The system maintained a lightweight architecture suitable for real-time deployment on SARs.

CONCLUSIONS: DEPRESAR-Fusion demonstrates that integrating emotion induction, data augmentation, and lightweight multimodal fusion can enable accurate and scalable depression detection in naturalistic SAR interactions. By bridging the gap between structured clinical assessments and everyday conversations, this approach highlights the potential of SAR-based systems as nonintrusive, artificial intelligence-driven tools for proactive mental health support.
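The regression metrics named in METHODS are standard and easy to reproduce. The sketch below is illustrative only (the abstract does not specify the paper's evaluation code or dataset); the example scores are hypothetical PHQ-8 totals, which range from 0 to 24, with 10 or above commonly used as the cutoff for a positive binary label in depression-detection benchmarks.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error over paired PHQ-8 score lists."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error over paired PHQ-8 score lists."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical ground-truth and predicted PHQ-8 totals (range 0-24).
true_scores = [3, 12, 7, 18]
pred_scores = [5, 10, 6, 15]

print(round(rmse(true_scores, pred_scores), 4))  # → 2.1213
print(mae(true_scores, pred_scores))             # → 2.0
```

Lower values on both metrics indicate predictions closer to the clinician-administered scores; RMSE penalizes large individual errors more heavily than MAE.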
