Fairness in AI-Driven Oncology: Investigating Racial and Gender Biases in Large Language Models


Abstract

INTRODUCTION: Large language model (LLM) chatbots have many applications in medical settings. However, these tools can perpetuate racial and gender biases through their responses, worsening disparities in healthcare. Given the ongoing discussion of LLM chatbots in oncology and the widespread goal of addressing cancer disparities, this study focuses on biases propagated by LLM chatbots in oncology.

METHODS: Chat Generative Pre-trained Transformer (ChatGPT; OpenAI, San Francisco, CA, USA) was asked to determine what occupation a generic description of "assesses cancer patients" would correspond to for different demographics. ChatGPT, Gemini (Alphabet Inc., Mountain View, CA, USA), and Bing Chat (Microsoft Corp., Redmond, WA, USA) were prompted to provide oncologist recommendations in the top U.S. cities, and the demographic makeup (race, gender) of the recommendations was compared against national distributions. ChatGPT was also asked to generate a job description for oncologists of different demographic backgrounds. Finally, ChatGPT, Gemini, and Bing Chat were asked to generate hypothetical cancer patients with race, smoking, and drinking histories.

RESULTS: LLM chatbots are about twice as likely to predict Blacks and Native Americans as oncology nurses rather than oncologists, compared to Asians (p < 0.01 and p < 0.001, respectively). They are likewise significantly more likely to predict females than males as oncology nurses (p < 0.001). ChatGPT's real-world oncologist recommendations overrepresent Asians by nearly a factor of two, and underrepresent Blacks by a factor of two and Hispanics by a factor of seven. The chatbots also generate different job descriptions depending on demographics, including cultural competency and advocacy, and excluding treatment administration, for underrepresented backgrounds. AI-generated cancer cases are not fully representative of real-world demographic distributions and encode stereotypes about substance abuse; for example, Hispanics have a greater proportion of smokers than Whites by about 20% in ChatGPT-generated breast cancer cases.

CONCLUSION: To our knowledge, this is the first study to investigate racial and gender biases across such a diverse set of AI chatbots, and the first to do so within oncology. The methodology presented in this study provides a framework for targeted bias evaluation of LLMs in various fields across medicine.
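The abstract reports significance levels for differences in occupation predictions between demographic groups but does not specify the statistical procedure used. A minimal sketch of one plausible analysis, assuming a two-sided two-proportion z-test on occupation-label counts; the function name and all counts below are illustrative, not taken from the paper:

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Two-sided two-proportion z-test.

    Returns the z statistic and p-value for the difference between the
    proportions hits_a/n_a and hits_b/n_b, using a pooled estimate of the
    common proportion under the null hypothesis of no difference.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    p_pool = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard-normal tail probability via math.erf, doubled for two sides.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p


# Hypothetical counts: how often a group's description was labeled
# "oncology nurse" out of 100 prompts (illustrative numbers only).
z, p = two_proportion_z(40, 100, 20, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these illustrative counts (40/100 vs. 20/100), the difference is significant at the p < 0.01 level; the same comparison could be run per chatbot and per demographic pair to reproduce the kind of results reported above.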
