Leveraging large language models for heuristic usability assessment of medical software: Insights with the Radiation Planning Assistant

Authors: Court, Laurence E; Smit, Jacobus; Strauss, Lourens; Shaw, William; Marais, Andrea; Trauernicht, Christoph; Joubert, Nanette; Smith, Elaine; Badre, Shona; Lazarus, Graeme L; Khotle, Thekiso; Netherton, Lauren; van Heerden, Wanda; Cardenas, Carlos; Serban, Monica; Seuntjens, Jan; Chung, Christine V; Govyadinov, Pavel; Khan, Meena; Nair, Saurabh; Netherton, Tucker; Zhang, Lifei

Journal: Journal of Applied Clinical Medical Physics; Impact factor: 2.200
Year: 2026; Citation: 2026 Feb;27(2):e70495
DOI: 10.1002/acm2.70495

Abstract

BACKGROUND: Usability engineering is essential for ensuring the safety and effectiveness of medical software, as design-related issues are a leading cause of use errors in clinical settings. Heuristic evaluation provides a practical approach to identifying usability problems, but its outcomes depend heavily on expert interpretation. Large Language Models (LLMs), such as ChatGPT, offer a potential means to augment heuristic evaluation by generating structured, context-aware usability feedback. This study explored the use of ChatGPT to support heuristic assessment of the Radiation Planning Assistant (RPA), a web-based radiotherapy planning tool designed to support clinical teams in low- and middle-income countries.
METHODS: ChatGPT was provided with the RPA user and technical guides, training videos for each functional dashboard, and Zhang et al.'s 14 usability heuristics. The model was instructed to score each dashboard against these heuristics using Zhang's 0-4 severity scale and to propose concrete interface improvements. The resulting feedback was reviewed and scored independently by the RPA developer team and by 13 users during a dedicated User Meeting. ChatGPT, developer, and user ratings were then compared.
RESULTS: ChatGPT identified 26 potential usability issues across six heuristic domains. The developer team considered nine of these actionable, although all were classified as minor (severity ≤ 2). User ratings varied widely, with nine suggestions achieving mean scores ≥ 1.5. Qualitative agreement between users and developers was limited, underscoring the importance of diverse perspectives in heuristic evaluation. Three suggestions, namely enhanced upload logs, reversible actions ("reopen request"), and stronger error prevention, were rated as potentially high priority by a minority of users. ChatGPT's ratings were consistent across dashboards.
CONCLUSIONS: While ChatGPT did not reveal any critical usability failures, its heuristic assessment proved valuable in prompting discussion, identifying minor refinements, and enriching both developer and user engagement with the RPA's interface design. This study demonstrates that LLMs can serve as an effective, low-cost complement to conventional heuristic evaluation, supporting early-stage usability review and stakeholder dialogue in the development of medical software.
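
The Methods and Results describe three kinds of judgement: a 0-4 severity scale, developer decisions on whether a suggestion is actionable, and a cut-off of a mean user score of at least 1.5. The short Python sketch below shows how that bookkeeping could be tallied. It is not the authors' analysis code, and several parts of it are assumptions:

- the record layout and the names `Suggestion`, `summarize`, `ZHANG_HEURISTICS` and `SEVERITY_LABELS`;
- that users rated on the same 0-4 scale as the developers;
- the heuristic and severity wording, which follows how Zhang et al.'s set is usually listed.

The demo values are toy inputs, not study data.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional

# Zhang et al.'s 14 heuristics as they are commonly listed (assumed wording;
# the exact prompt text given to ChatGPT is not reproduced in the abstract).
ZHANG_HEURISTICS = (
    "Consistency and standards", "Visibility of system state",
    "Match between system and world", "Minimalist", "Minimize memory load",
    "Informative feedback", "Flexibility and efficiency", "Good error messages",
    "Prevent errors", "Clear closure", "Reversible actions",
    "Use users' language", "Users in control", "Help and documentation",
)

# 0-4 severity scale; labels follow the usual Nielsen-style wording (assumed).
SEVERITY_LABELS = {
    0: "not a usability problem",
    1: "cosmetic problem only",
    2: "minor usability problem",
    3: "major usability problem",
    4: "usability catastrophe",
}

USER_FLAG_THRESHOLD = 1.5  # mean user score cut-off quoted in the abstract
MINOR_MAX_SEVERITY = 2     # developer severity <= 2 counted as "minor"


@dataclass
class Suggestion:
    """One ChatGPT-proposed interface change (hypothetical record layout)."""
    suggestion_id: str
    heuristic: str
    developer_severity: Optional[int] = None  # None = not judged actionable (assumption)
    user_scores: List[int] = field(default_factory=list)  # assumed 0-4, one per user

    def __post_init__(self) -> None:
        # Reject heuristic names outside the list and scores outside 0-4.
        if self.heuristic not in ZHANG_HEURISTICS:
            raise ValueError(f"unknown heuristic: {self.heuristic!r}")
        scores = list(self.user_scores)
        if self.developer_severity is not None:
            scores.append(self.developer_severity)
        for score in scores:
            if score not in SEVERITY_LABELS:
                raise ValueError(f"severity must be 0-4, got {score!r}")

    def mean_user_score(self) -> float:
        # Mean of the user ratings; 0.0 when no user rated the suggestion.
        return float(mean(self.user_scores)) if self.user_scores else 0.0


def summarize(suggestions: List[Suggestion]) -> dict:
    """Tally developer and user judgements and list where they disagree."""
    actionable = {s.suggestion_id for s in suggestions
                  if s.developer_severity is not None}
    minor = {s.suggestion_id for s in suggestions
             if s.developer_severity is not None
             and s.developer_severity <= MINOR_MAX_SEVERITY}
    flagged = {s.suggestion_id for s in suggestions
               if s.mean_user_score() >= USER_FLAG_THRESHOLD}
    return {
        "total": len(suggestions),
        "heuristic_domains": len({s.heuristic for s in suggestions}),
        "actionable": len(actionable),
        "minor_actionable": len(minor),
        "user_flagged": sorted(flagged),
        "users_only": sorted(flagged - actionable),       # users flagged, developers did not act
        "developers_only": sorted(actionable - flagged),  # developers acted, users did not flag
    }


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not the study's data.
    demo = [
        Suggestion("A", "Informative feedback", 2, [1, 2, 3, 0]),
        Suggestion("B", "Reversible actions", None, [2, 2, 1, 2]),
        Suggestion("C", "Prevent errors", 1, [0, 1, 1, 0]),
    ]
    print(summarize(demo))
```

The sketch lists the suggestions flagged by only one group. This is one simple way to surface the limited user-developer agreement that the abstract reports. The study itself describes that comparison qualitatively.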
