Predicting depression by using a novel deep learning model and video-audio-text multimodal data

Abstract

OBJECTIVE: Depression is a prevalent mental health disorder affecting millions of people. Traditional diagnostic methods rely primarily on self-reported questionnaires and clinical interviews, which can be subjective and vary significantly between individuals. This paper introduces the Integrative Multimodal Depression Detection Network (IMDD-Net), a novel deep-learning framework designed to improve the accuracy of depression evaluation by leveraging both local and global features from video, audio, and text cues.

METHODS: The IMDD-Net integrates these data streams using the Kronecker product for multimodal fusion, facilitating deep interactions between modalities. Within the audio modality, Mel-Frequency Cepstral Coefficient (MFCC) and extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) features capture local and global acoustic properties, respectively. For video data, the TimeSformer network extracts both fine-grained and broad temporal features, while the text modality uses a pre-trained BERT model to obtain comprehensive contextual information. The IMDD-Net's architecture combines these diverse data types to provide a holistic analysis of depressive symptoms.

RESULTS: Experimental results on the AVEC 2014 dataset demonstrate that the IMDD-Net achieves state-of-the-art performance in predicting Beck Depression Inventory-II (BDI-II) scores, with a Root Mean Square Error (RMSE) of 7.55 and a Mean Absolute Error (MAE) of 5.75. A classification task for identifying potentially depressed subjects achieves an accuracy of 0.79.

CONCLUSION: These results underscore the robustness and precision of the IMDD-Net and highlight the importance of integrating local and global features across multiple modalities for accurate depression prediction.
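The abstract does not specify how the Kronecker-product fusion is implemented. As a rough illustration only, a tensor-fusion-style sketch in PyTorch is shown below; the class name `KroneckerFusion`, the embedding dimensions, and the trick of appending a constant 1 to each modality vector are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn


class KroneckerFusion(nn.Module):
    """Minimal sketch (assumed, not the authors' code) of Kronecker-product
    fusion over audio, video, and text embeddings."""

    def __init__(self, d_audio: int, d_video: int, d_text: int, d_out: int = 1):
        super().__init__()
        # +1 per modality: a constant 1 is appended so the product keeps
        # unimodal and bimodal terms alongside the trimodal interactions.
        fused_dim = (d_audio + 1) * (d_video + 1) * (d_text + 1)
        self.head = nn.Linear(fused_dim, d_out)

    @staticmethod
    def _append_one(x: torch.Tensor) -> torch.Tensor:
        ones = torch.ones(x.size(0), 1, dtype=x.dtype, device=x.device)
        return torch.cat([x, ones], dim=1)

    def forward(self, audio: torch.Tensor, video: torch.Tensor,
                text: torch.Tensor) -> torch.Tensor:
        a, v, t = map(self._append_one, (audio, video, text))
        # Batched Kronecker product built from successive outer products.
        av = torch.einsum("bi,bj->bij", a, v).flatten(start_dim=1)
        avt = torch.einsum("bi,bj->bij", av, t).flatten(start_dim=1)
        return self.head(avt)  # e.g. a regressed BDI-II score


# Toy usage with random embeddings (batch of 4, 16-dim per modality):
fusion = KroneckerFusion(d_audio=16, d_video=16, d_text=16)
score = fusion(torch.randn(4, 16), torch.randn(4, 16), torch.randn(4, 16))
print(score.shape)  # torch.Size([4, 1])
```

Appending the constant 1 is a common design choice in tensor-fusion approaches: the pure Kronecker product of raw embeddings would otherwise encode only the trimodal interaction terms and discard unimodal information.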
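The local/global split of the acoustic descriptors can be reproduced with standard tooling; the sketch below uses librosa for frame-level MFCCs and the opensmile Python package for the 88 eGeMAPS functionals. The file name and parameter choices (16 kHz sampling, 13 MFCCs) are illustrative assumptions, not values reported in the paper.

```python
import librosa
import opensmile

# Local (frame-level) descriptors: MFCCs over short analysis windows.
y, sr = librosa.load("speech.wav", sr=16000)  # "speech.wav" is a placeholder path
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

# Global (utterance-level) descriptors: the 88 eGeMAPS functionals.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
egemaps = smile.process_file("speech.wav")  # pandas DataFrame, 88 columns
```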
