Analyzing Patient Complaints in Web-Based Reviews of Private Hospitals in Selangor, Malaysia, Using Large Language Model-Assisted Content Analysis: Mixed Methods Study



Abstract

BACKGROUND: Large language model (LLM)-assisted content analysis (LACA) is a modification of traditional content analysis that uses an LLM to codevelop codebooks and automatically assign thematic codes to a dataset of web-based reviews.

OBJECTIVE: This study aims to develop and validate the use of LACA for analyzing hospital web-based reviews and to identify themes of issues from web-based reviews using this method.

METHODS: Web-based reviews for 53 private hospitals in Selangor, Malaysia, were acquired. Fake reviews were filtered out using natural language processing and machine learning algorithms trained on validated yelp.com datasets. The GPT-4o mini model application programming interface (API) was then used to filter out reviews without any quality issues. In total, 200 of the remaining reviews were randomly extracted and fed into the GPT-4o mini model API to produce a codebook, which was validated through parallel human-LLM coding to establish interrater reliability. The codebook was then used to code (label) all reviews in the dataset. The thematic codes were then summarized into themes using factor analysis to increase interpretability.

RESULTS: A total of 14,938 web-based reviews were acquired, of which 1121 (9.3%) were fake, 1279 (12%) contained negative sentiments, and 9635 (88%) did not contain any negative sentiment. The GPT-4o mini model subsequently derived 41 thematic codes inductively, together with their definitions. Average human-GPT interrater reliability was almost perfect (κ=0.81). Factor analysis identified 6 interpretable latent factors: "Service and Communication Effectiveness," "Clinical Care and Patient Experience," "Facilities and Amenities Quality," "Appointment and Patient Flow," "Financial and Insurance Management," and "Patient Rights and Accessibility." The cumulative explained variance for the 6 factors was 0.74, and Cronbach α was between 0.88 and 0.97 (good to excellent) for all factors except factor 6 (0.61: questionable). The identified factors follow a global pattern of issues reported in the literature.

CONCLUSIONS: A data collection and processing pipeline consisting of Python Selenium, the GPT-4o mini model API, and a factor analysis module can support valid and reliable thematic analysis. Despite the potential for collection and information bias in web-based reviews, LACA of web-based reviews is cost-effective, time-efficient, and can be performed in real time, helping hospital managers develop hypotheses for further investigations promptly.
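
The abstract reports two reliability statistics, Cohen κ for human-LLM coding agreement and Cronbach α for the internal consistency of each factor, without showing how they are computed. The sketch below uses the textbook formulas for both. It is only an illustration: the function names and the toy labels are hypothetical and are not taken from the study, whose actual implementation is not described here.

```python
from collections import Counter
from statistics import variance


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item (column)."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent (row-wise sum across items).
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = thematic code assigned, 0 = not assigned.
    human = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
    llm = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
    print(f"kappa = {cohen_kappa(human, llm):.2f}")

    # Hypothetical item scores for one factor (3 items, 5 reviews).
    factor_items = [[1, 0, 1, 1, 0], [1, 0, 1, 0, 0], [1, 0, 1, 1, 0]]
    print(f"alpha = {cronbach_alpha(factor_items):.2f}")
```

In this sketch κ is computed per thematic code from paired human and LLM labels; averaging the per-code values would give a summary figure comparable in kind to the average κ reported above, although the study's exact averaging procedure is not stated in the abstract.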
