Abstract
INTRODUCTION: The adoption of Artificial Intelligence (AI) in clinical decision support has been hindered by algorithmic bias and a lack of transparency. To address this, we developed a fairness-auditing framework built on detailed data provenance. METHODS: We simulated audits on a synthetic patient dataset (N = 1,000), comparing logistic regression and random forest models for gender bias using 5-fold cross-validation and permutation testing. RESULTS: Logistic regression achieved 75.2 ± 1.0% accuracy (AUC = 0.806 ± 0.030) and random forest achieved 70.1 ± 1.4% accuracy (AUC = 0.745 ± 0.020). Provenance logs detected gender disparities in both models. Logistic regression exhibited statistically significant bias (EOD = +0.256, p = 0.0080), while random forest's smaller disparity (EOD = +0.055, p = 0.5664) was not statistically significant, demonstrating that our audit distinguishes systematic discrimination from random variation. Sensitivity analysis confirmed successful bias detection across bias magnitudes from β = -0.10 to β = -0.80. DISCUSSION: Despite lower accuracy, random forest showed 57% less bias than logistic regression, challenging the assumption that interpretability guarantees fairness. We introduce a standardized AI Fairness Provenance Record documenting data origin, model choices, and bias metrics, enabling auditors to trace decisions to their source. This framework maps to FDA transparency guidelines and ONC HTI-1 requirements, demonstrating how provenance-based auditing supports regulatory compliance and provides a pathway toward more responsible and equitable AI in clinical settings.
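The abstract's headline fairness statistic is the equalized odds difference (EOD) paired with a permutation test for significance. As a minimal sketch of how such a check can be run, the code below computes the true-positive-rate gap between two gender groups and a permutation p-value; the function names, the binary 0/1 group encoding, the use of the TPR component alone, and the permutation count are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def equalized_odds_difference(y_true, y_pred, group):
    """TPR gap between group 1 and group 0 (one component of
    equalized odds); positive values mean group 1 is favored.
    Assumes binary labels/predictions and a 0/1 group vector."""
    tpr = []
    for g in (0, 1):
        mask = (group == g) & (y_true == 1)  # positives in group g
        tpr.append(y_pred[mask].mean())      # fraction correctly flagged
    return tpr[1] - tpr[0]

def permutation_pvalue(y_true, y_pred, group, n_perm=10_000, seed=0):
    """Two-sided p-value: how often does shuffling group labels
    yield a disparity at least as extreme as the observed one?"""
    rng = np.random.default_rng(seed)
    observed = equalized_odds_difference(y_true, y_pred, group)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(group)  # break any true group effect
        if abs(equalized_odds_difference(y_true, y_pred, perm)) >= abs(observed):
            exceed += 1
    # +1 smoothing so the p-value is never exactly zero
    return observed, (exceed + 1) / (n_perm + 1)

# Toy usage on synthetic labels and predictions
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
y_pred = (rng.random(1000) < 0.5 + 0.1 * group).astype(int)
eod, p = permutation_pvalue(y_true, y_pred, group)
print(f"EOD = {eod:+.3f}, p = {p:.4f}")
```

Under the null hypothesis that predictions are independent of the protected attribute, shuffling group labels destroys any real disparity, so an observed EOD that is rarely matched by permuted data (small p) signals systematic rather than random bias, which is the distinction the abstract draws between the two models.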