Evaluating Generative AI in Mental Health: Systematic Review of Capabilities and Limitations

Abstract

BACKGROUND: The global shortage of mental health professionals, exacerbated by rising mental health needs after COVID-19, has stimulated growing interest in using large language models to address these challenges.
OBJECTIVES: This systematic review aims to evaluate the current capabilities of generative artificial intelligence (GenAI) models in the context of mental health applications.
METHODS: A comprehensive search across 5 databases yielded 1046 references, of which 8 studies met the inclusion criteria. Included studies were original research that used experimental designs (eg, Turing tests, sociocognitive tasks, trials, or qualitative methods), focused on GenAI models, and explicitly measured sociocognitive abilities (eg, empathy and emotional awareness), mental health outcomes, and user experience (eg, perceived trust and empathy).
RESULTS: The studies, published between 2023 and 2024, primarily evaluated models such as ChatGPT-3.5 and 4.0, Bard, and Claude in tasks such as psychoeducation, diagnosis, emotional awareness, and clinical interventions. Most studies used zero-shot prompting and relied on human evaluators to assess the AI responses with standardized rating scales or qualitative analysis. However, these methods were often insufficient to fully capture the complexity of GenAI capabilities. The reliance on single-shot prompting techniques, limited comparisons, and task-based assessments isolated from context may oversimplify GenAI's abilities and overlook the nuances of human-AI interaction, especially in clinical applications that require contextual reasoning and cultural sensitivity. The findings suggest that while GenAI models demonstrate strengths in psychoeducation and emotional awareness, their diagnostic accuracy, cultural competence, and ability to engage users emotionally remain limited. Users frequently reported concerns about trustworthiness, accuracy, and the lack of emotional engagement.
CONCLUSIONS: Future research could use more sophisticated evaluation methods, such as few-shot and chain-of-thought prompting, to uncover GenAI's potential more fully. Longitudinal studies and broader comparisons with human benchmarks are needed to explore the effects of GenAI-integrated mental health care.
