The Impact of Race, Ethnicity, and Sex on Fairness in Artificial Intelligence for Glaucoma Prediction Models

种族、民族和性别对青光眼预测人工智能模型公平性的影响

阅读：2

作者：Ravindranath,Rohith,Stein,Joshua D,Hernandez-Boussard,Tina,Fisher,A Caroline,Wang,Sophia Y

期刊：	Ophthalmology Science	影响因子：	4.600
时间：	2025	起止号：	2025 Jan-Feb;5(1):100596
doi：	10.1016/j.xops.2024.100596	研究方向：	神经科学
疾病类型：	青光眼

Abstract

OBJECTIVE: Despite advances in artificial intelligence (AI) in glaucoma prediction, most works lack multicenter focus and do not consider fairness concerning sex, race, or ethnicity. This study aims to examine the impact of these sensitive attributes on developing fair AI models that predict glaucoma progression to necessitating incisional glaucoma surgery. DESIGN: Database study. PARTICIPANTS: Thirty-nine thousand ninety patients with glaucoma, as identified by International Classification of Disease codes from 7 academic eye centers participating in the Sight OUtcomes Research Collaborative. METHODS: We developed XGBoost models using 3 approaches: (1) excluding sensitive attributes as input features, (2) including them explicitly as input features, and (3) training separate models for each group. Model input features included demographic details, diagnosis codes, medications, and clinical information (intraocular pressure, visual acuity, etc.), from electronic health records. The models were trained on patients from 5 sites (N = 27 999) and evaluated on a held-out internal test set (N = 3499) and 2 external test sets consisting of N = 1550 and N = 2542 patients. MAIN OUTCOMES AND MEASURES: Area under the receiver operating characteristic curve (AUROC) and equalized odds on the test set and external sites. RESULTS: Six thousand six hundred eighty-two (17.1%) of 39 090 patients underwent glaucoma surgery with a mean age of 70.1 (standard deviation 14.6) years, 54.5% female, 62.3% White, 22.1% Black, and 4.7% Latinx/Hispanic. We found that not including the sensitive attributes led to better classification performance (AUROC: 0.77-0.82) but worsened fairness when evaluated on the internal test set. However, on external test sites, the opposite was true: including sensitive attributes resulted in better classification performance (AUROC: external #1 - [0.73-0.81], external #2 - [0.67-0.70]), but varying degrees of fairness for sex and race as measured by equalized odds. CONCLUSIONS: Artificial intelligence models predicting whether patients with glaucoma progress to surgery demonstrated bias with respect to sex, race, and ethnicity. The effect of sensitive attribute inclusion and exclusion on fairness and performance varied based on internal versus external test sets. Prior to deployment, AI models should be evaluated for fairness on the target population. FINANCIAL DISCLOSURES: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用；引用内容仅为补充信息，不代表本站立场。

2、若认为本页面引用内容涉及侵权，请及时与本站联系，我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容，需注明“来源：[生知库]”并获得授权；使用引用内容的，需自行联系原作者获得许可。

4、投稿及合作请联系：info@biocloudy.com。