Menstrual Health Education Using a Specialized Large Language Model in India: Development and Evaluation Study of MenstLLaMA

Abstract

BACKGROUND: The quality and accessibility of menstrual health education (MHE) in low- and middle-income countries, including India, remain inadequate because of persistent challenges (eg, poverty, social stigma, and gender inequality). While community-driven initiatives have sought to raise awareness, artificial intelligence offers a scalable and efficient way to disseminate accurate information. However, existing general-purpose large language models (LLMs) are often ill-suited to this task: they tend to exhibit low accuracy, cultural insensitivity, and overly complex responses. To address these limitations, we developed MenstLLaMA, a specialized LLM tailored to the Indian context and designed to deliver MHE empathetically, supportively, and accessibly.

OBJECTIVE: We aimed to develop and evaluate MenstLLaMA, a specialized LLM tailored to deliver accurate, culturally sensitive MHE, and to assess its effectiveness in comparison with existing general-purpose models.

METHODS: We curated MENST, a novel, domain-specific dataset comprising 23,820 question-answer pairs aggregated from medical websites, government portals, and health education resources. This dataset was systematically annotated with metadata capturing age groups, regions, topics, and sociocultural contexts. MenstLLaMA was developed by fine-tuning Meta-LLaMA-3-8B-Instruct, using parameter-efficient fine-tuning with low-rank adaptation to achieve domain alignment while minimizing computational overhead. We benchmarked MenstLLaMA against 9 state-of-the-art general-purpose LLMs, including GPT-4o, Claude-3, Gemini 1.5 Pro, and Mistral. The evaluation followed a multilayered framework: (1) automatic evaluation using standard natural language processing metrics (BLEU [Bilingual Evaluation Understudy], METEOR [Metric for Evaluation of Translation with Explicit Ordering], ROUGE-L [Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence], and BERTScore [Bidirectional Encoder Representations from Transformers Score]); (2) evaluation by clinical experts (N=18), who rated 200 expert-curated queries for accuracy and appropriateness; (3) medical practitioner interaction through the ISHA (Intelligent System for Menstrual Health Assistance) interactive chatbot, assessing qualitative dimensions (eg, relevance, understandability, preciseness, correctness, and context sensitivity); and (4) a user study with volunteer participants (N=200), who evaluated MenstLLaMA in 15- to 20-minute randomized sessions, rating the system across 7 qualitative user satisfaction metrics.

RESULTS: MenstLLaMA achieved the highest scores in BLEU (0.059) and BERTScore (0.911), outperforming GPT-4o (BLEU: 0.052, BERTScore: 0.896) and Claude-3 (BERTScore: 0.888). Clinical experts preferred MenstLLaMA's responses over gold-standard answers in several culturally sensitive cases. In medical practitioners' evaluations using ISHA, the chat interface powered by MenstLLaMA, the model scored 3.5/5 in relevance, 3.6/5 in understandability, 3.1/5 in preciseness, 3.5/5 in correctness, and 4.0/5 in context sensitivity. User evaluations indicated even stronger results, with ratings of 4.7/5 for understandability, 4.3/5 for relevance, 4.28/5 for preciseness, 4.1/5 for correctness, 4.6/5 for tone, 4.2/5 for flow, and 3.9/5 for context sensitivity.

CONCLUSIONS: MenstLLaMA demonstrates exceptional accuracy, empathy, and user satisfaction within the domain of MHE, bridging critical gaps left by general-purpose LLMs. Its potential for integration into broader health education platforms positions it as a transformative tool for menstrual well-being. Future research could explore its long-term impact on public perception and menstrual hygiene practices, while expanding demographic representation, enhancing context sensitivity, and integrating multimodal and voice-based interactions to improve accessibility across diverse user groups.
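
The low-rank adaptation mentioned in METHODS can be illustrated with a minimal sketch. It is not the authors' training code: the abstract does not report the rank, scaling factor, or which layers were adapted, so the rank of 16 and the 4096 x 4096 layer size below are assumptions chosen only for illustration, and the function names are hypothetical. The sketch shows the general idea that a frozen weight matrix W is adjusted by a scaled product of two small trainable matrices B and A, so far fewer parameters are trained than in full fine-tuning.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Count the parameters in the two low-rank factors:
    B has shape (d_out x rank) and A has shape (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Count the parameters updated when the whole matrix is fine-tuned."""
    return d_out * d_in


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x_row[k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for x_row in x
    ]


def lora_effective_weight(w, b, a, alpha: float, rank: int):
    """Return W + (alpha / rank) * (B @ A); W itself stays frozen."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w_val + scale * d_val for w_val, d_val in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


if __name__ == "__main__":
    # Illustrative sizes only; the rank used for MenstLLaMA is not stated.
    d, r = 4096, 16
    trained = lora_trainable_params(d, d, r)
    total = full_params(d, d)
    print(f"LoRA factors: {trained} parameters vs full matrix: {total}")
    print(f"fraction trained: {trained / total:.4%}")
```

Under these assumed sizes, the two factors hold well under 1% of the parameters in the full matrix, which is the sense in which the approach keeps computational overhead low.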
