Abstract
BACKGROUND: Artificial intelligence (AI) chatbots, driven by advances in natural language processing, can analyze and generate human language through computational linguistics and machine learning. Despite the rapid development of large language models, few studies have assessed whether educational conversations delivered by AI chatbots can be as efficacious as those delivered by humans.
OBJECTIVE: This study aims to explore and compare the potential efficacy of human-delivered conversations and AI chatbot conversations in increasing women's knowledge and awareness of heart attack symptoms and of how to respond to a heart attack in the United States.
METHODS: This is a secondary analysis of 2 datasets collected from the AI Chatbot Development Project. Women aged 25 years or older were recruited through flyers and social media. The first dataset contained conversational data in which a research interventionist engaged in educational conversations with participants (human dataset), whereas the second dataset contained conversational data in which an AI chatbot named HeartBot engaged in the same educational conversations with participants (HeartBot dataset). Knowledge and awareness of symptoms and response to a heart attack were measured before and after the interaction with either the human or HeartBot. Perceived message effectiveness and conversational quality were measured in the post-interaction survey. Ordinal logistic regression analyses were conducted to explore factors predicting participants' knowledge, adjusting for age, race or ethnicity, intervention group type, education, word count, message effectiveness, and message humanness.
RESULTS: A total of 171 participants (mean age 41.06, SD 12.08 years) in the human dataset and 92 participants (mean age 45.85, SD 11.94 years) in the HeartBot dataset completed the study.
Both human-delivered conversations and HeartBot conversations were associated with significant improvements in participants' ability to recognize heart attack symptoms (adjusted odds ratio [AOR] 15.19, 95% CI 8.46-27.25, P<.001; AOR 7.18, 95% CI 3.59-14.36, P<.001), differentiate between symptoms (AOR 9.44, 95% CI 5.60-15.91, P<.001; AOR 5.44, 95% CI 2.76-10.74, P<.001), call emergency services (AOR 6.87, 95% CI 4.09-11.55, P<.001; AOR 5.74, 95% CI 2.84-11.60, P<.001), and seek emergency care within 60 minutes of symptom onset (AOR 8.68, 95% CI 4.98-15.15, P<.001; AOR 2.86, 95% CI 1.55-5.28, P<.001), even after adjusting for covariates; within each pair, the first AOR refers to the human dataset and the second to the HeartBot dataset. Interaction tests comparing the 2 datasets showed significantly greater improvement with human-delivered conversations than with HeartBot conversations for all outcomes except the question on calling an ambulance (P=.09).
CONCLUSIONS: The study's findings provide new insights into the fully automated AI HeartBot, compared with human-delivered text message conversations, and suggest that HeartBot has the potential to improve women's knowledge and awareness of heart attack symptoms and appropriate response behaviors. Nevertheless, the current evidence remains preliminary. A randomized controlled trial is warranted to validate this study's findings.
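
As a reading aid for the odds ratios reported in the RESULTS, the following minimal Python sketch shows how an AOR and its 95% CI relate to a regression coefficient on the log-odds scale. It is not the authors' analysis code. It assumes the intervals are Wald-type (symmetric on the log scale, with z = 1.96) and that the first AOR in each pair belongs to the human dataset and the second to the HeartBot dataset; the function names and outcome labels are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% interval (assumed Wald-type CI)


def log_odds_summary(aor, ci_low, ci_high, z=Z_95):
    """Return (beta, se): the log-odds coefficient and an approximate
    standard error recovered from a reported odds ratio and its CI."""
    beta = math.log(aor)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def wald_ci(beta, se, z=Z_95):
    """Rebuild the odds-ratio CI from a log-odds coefficient and its SE."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


# (AOR, CI lower, CI upper) as reported in the abstract.
# Assumption: the first triple is the human dataset, the second HeartBot.
reported = {
    "recognize symptoms": ((15.19, 8.46, 27.25), (7.18, 3.59, 14.36)),
    "differentiate symptoms": ((9.44, 5.60, 15.91), (5.44, 2.76, 10.74)),
    "call emergency services": ((6.87, 4.09, 11.55), (5.74, 2.84, 11.60)),
    "seek care within 60 min": ((8.68, 4.98, 15.15), (2.86, 1.55, 5.28)),
}

for outcome, (human, heartbot) in reported.items():
    for label, (aor, low, high) in (("human", human), ("HeartBot", heartbot)):
        beta, se = log_odds_summary(aor, low, high)
        rebuilt_low, rebuilt_high = wald_ci(beta, se)
        print(
            f"{outcome:26s} {label:9s} beta={beta:.3f} se={se:.3f} "
            f"CI=({rebuilt_low:.2f}, {rebuilt_high:.2f})"
        )
```

Under these assumptions, the rebuilt intervals should closely match the reported ones, which is a quick way to check that an odds ratio and its CI are internally consistent. The standard errors recovered this way are approximations and say nothing about the between-dataset interaction tests, which require the fitted model itself.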