Abstract
Natural language processing (NLP) has become an essential tool in healthcare, enabling sentiment analysis to extract insights from patient reviews, clinician notes, and medical research. This study evaluates the effectiveness of three NLP models in analyzing patient sentiment from physician reviews: Bidirectional Encoder Representations from Transformers (BERT), Valence Aware Dictionary and sEntiment Reasoner (VADER), and Flair. A total of 1,486 reviews of 30 pain management specialists in Atlanta, GA, were collected from Healthgrades. Sentiment scores were derived from each model and compared with the numerical ratings provided by patients. Statistical analyses, including pairwise t-tests, Pearson correlation, and logistic regression, were conducted to assess each model's performance. Results showed significant differences among models (P < 0.05), with Flair demonstrating the highest correlation with patient ratings (r = 0.80), followed by BERT (r = 0.74) and VADER (r = 0.59). Logistic regression analysis further supported Flair's superior predictive accuracy. These findings highlight the potential of sentiment analysis in healthcare, offering an objective lens to interpret subjective patient experiences. By leveraging sentiment analysis, healthcare systems may improve patient satisfaction assessment, identify early signs of mental health concerns, and reduce documentation bias. While the results are promising, this study is limited by its retrospective design, single geographic region, and reliance on publicly available online reviews, which may not reflect the broader patient population or clinical encounters. Prospective studies and real-world validation in diverse settings are necessary to confirm the clinical applicability of these models. Future research should also focus on refining NLP models for medical contexts, integrating multimodal sentiment analysis, and addressing ethical considerations in patient data handling.
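As an illustration of the correlation step described above, the following minimal sketch computes a Pearson correlation coefficient between model sentiment scores and patient star ratings. It is not the study's code: the function name pearson_r and the toy values are assumptions made for illustration, and the study's own scores would come from BERT, VADER, or Flair applied to the collected reviews.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, NOT taken from the study's 1,486 reviews.
star_ratings = [5, 1, 4, 2, 5, 3]                # patient-provided ratings
model_scores = [0.9, -0.8, 0.6, -0.3, 0.7, 0.1]  # sentiment scores from one model

r = pearson_r(model_scores, star_ratings)
print(f"r = {r:.2f}")
```

Running the same function once per model on the full set of reviews would yield one coefficient per model, which is the kind of comparison reported above (r = 0.80, 0.74, and 0.59).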