Abstract
BACKGROUND: Depression is a critical psychological disorder necessitating urgent assessment and treatment, given its strong association with increased suicide risk (SR). Effective management hinges on promptly identifying individuals with high depression severity (DS) and SR. While machine learning and deep learning have advanced the identification of DS and SR, research focusing on both aspects simultaneously remains limited and requires further refinement.
OBJECTIVE: This study aimed to evaluate whether our proposed methods, which integrate multitask learning (MTL), multimodal learning, and transfer learning, enhance the efficacy of deep learning models in the joint classification of DS and SR.
METHODS: This study proposed a multitask framework employing a multimodal fusion strategy for pretrained audio and text embeddings to concurrently assess DS and SR. Data encompassing Chinese audio recordings and clinical questionnaire scores from 100 patients with depression and 100 healthy controls were used. Preprocessed audio and text data were transformed into pretrained embeddings and integrated using concatenation and hard parameter sharing. Single-task learning (STL) models (DS and SR tasks) were evaluated with different embeddings and further compared with the MTL models.
RESULTS: The STL models demonstrated exceptional DS prediction (area under the curve [AUC]=0.878) using wav2vec 2.0 combined with ERNIE-health, and SR prediction (AUC=0.876) using HuBERT combined with ERNIE-health. The MTL models significantly improved SR prediction over DS prediction, achieving the highest DS classification (AUC=0.887) with wav2vec 2.0 combined with ERNIE-health, and SR classification (AUC=0.883) with HuBERT combined with ERNIE-health.
CONCLUSIONS: The findings of this study underscore the effectiveness of the proposed MTL models using specific pretrained audio and text embeddings in enhancing model performance. However, we advocate for cautious implementation of MTL to mitigate potential negative transfer effects. Our research presents a method that is both promising and effective, offering an objective approach for accurate clinical decision support in the parallel diagnosis of DS and SR.
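ILLUSTRATIVE SKETCH: The fusion step described in METHODS (concatenating audio and text embeddings, then sharing parameters across the two tasks) can be pictured with a minimal forward pass. This is a sketch under stated assumptions, not the study's implementation: the abstract does not give the network details, so the embedding sizes, the single shared hidden layer, the binary labels for DS and SR, the equal loss weights, and every name below (MultitaskFusionModel, joint_loss, and so on) are hypothetical. Random vectors stand in for the pretrained audio and text embeddings, and no training loop is shown.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights; a stand-in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, y):
    """Binary cross-entropy for a single prediction p and label y."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


class MultitaskFusionModel:
    """Hypothetical model: concatenation fusion + hard parameter sharing."""

    def __init__(self, audio_dim, text_dim, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        # Shared trunk: the same parameters serve both tasks.
        self.shared = init_layer(audio_dim + text_dim, hidden_dim, rng)
        # Task-specific heads (assumed binary outputs for DS and SR).
        self.ds_head = init_layer(hidden_dim, 1, rng)
        self.sr_head = init_layer(hidden_dim, 1, rng)

    def forward(self, audio_emb, text_emb):
        fused = list(audio_emb) + list(text_emb)  # fusion by concatenation
        hidden = [max(0.0, h) for h in linear(fused, *self.shared)]  # ReLU
        p_ds = sigmoid(linear(hidden, *self.ds_head)[0])
        p_sr = sigmoid(linear(hidden, *self.sr_head)[0])
        return p_ds, p_sr


def joint_loss(p_ds, p_sr, y_ds, y_sr, w_ds=1.0, w_sr=1.0):
    """Weighted sum of the two task losses (equal weights are an assumption)."""
    return w_ds * bce(p_ds, y_ds) + w_sr * bce(p_sr, y_sr)


if __name__ == "__main__":
    rng = random.Random(1)
    audio = [rng.gauss(0, 1) for _ in range(8)]  # placeholder audio embedding
    text = [rng.gauss(0, 1) for _ in range(6)]   # placeholder text embedding
    model = MultitaskFusionModel(audio_dim=8, text_dim=6)
    p_ds, p_sr = model.forward(audio, text)
    print(p_ds, p_sr, joint_loss(p_ds, p_sr, y_ds=1, y_sr=0))
```

Because both heads read the same hidden representation, a gradient from either task loss would update the shared trunk. That coupling is what lets one task help the other, and it is also the route by which the negative transfer mentioned in CONCLUSIONS can arise.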