Multimodal Multitask Learning for Predicting Depression Severity and Suicide Risk Using Pretrained Audio and Text Embeddings: Methodology Development and Application

基于预训练音频和文本嵌入的多模态多任务学习预测抑郁症严重程度和自杀风险：方法论开发与应用

阅读：1

作者：Hu,Ya-Han,Wu,Ruei-Yan,Su,Min-Yi,Lin,I-Li,Shen,Cheng-Che

期刊：	JMIR Medical Informatics	影响因子：	3.800
时间：	2025	起止号：	2025 Oct 30;13:e66907
doi：	10.2196/66907	研究方向：	神经科学
疾病类型：	抑郁症

Abstract

BACKGROUND: Depression is a critical psychological disorder necessitating urgent assessment and treatment, given its strong association with increased suicide risk (SR). Effective management hinges on promptly identifying individuals with high depression severity (DS) and SR. While machine learning and deep learning have advanced the identification of DS and SR, research focusing on both aspects simultaneously remains limited and requires further refinement. OBJECTIVE: This study aimed to evaluate whether our proposed methods, which integrate multitask learning (MTL), multimodal learning, and transfer learning, enhance the efficacy of deep learning models in the joint classification of DS and SR. METHODS: This study proposed a multitask framework employing a multimodal fusion strategy for pretrained audio and text embeddings to concurrently assess DS and SR. Data encompassing Chinese audio recordings and clinical questionnaire scores from 100 patients with depression and 100 healthy controls were used. Preprocessed audio and text data were transformed into pretrained embeddings and integrated using concatenation and hard parameter sharing. Single-task learning (STL) models (DS and SR tasks) were evaluated with different embeddings and further compared with the MTL models. RESULTS: The STL models demonstrated exceptional DS prediction (area under the curve [AUC]=0.878) using wav2vec 2.0 combined with ERNIE-health, and SR prediction (AUC=0.876) using HuBERT combined with ERNIE-health. The MTL models significantly improved SR prediction over DS prediction, achieving the highest DS classification (AUC=0.887) with wav2vec 2.0 combined with ERNIE-health, and SR classification (AUC=0.883) with HuBERT combined with ERNIE-health. CONCLUSIONS: The findings of this study underscore the effectiveness of the proposed MTL models using specific pretrained audio and text embeddings in enhancing model performance. However, we advocate for cautious implementation of MTL to mitigate potential negative transfer effects. Our research presents a method that is both promising and effective, offering an objective approach for accurate clinical decision support in the parallel diagnosis of DS and SR.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用；引用内容仅为补充信息，不代表本站立场。

2、若认为本页面引用内容涉及侵权，请及时与本站联系，我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容，需注明“来源：[生知库]”并获得授权；使用引用内容的，需自行联系原作者获得许可。

4、投稿及合作请联系：info@biocloudy.com。