Abstract
PURPOSE: The data-generating mechanisms underlying health care data are infrequently considered, which allows inequitable equilibria to be reinforced throughout the care continuum. As race-based criteria are reassessed, their effect on patterns of disease progression should also be reevaluated. We propose a novel microsimulation-based framework for attenuating societal bias in primary care registry data and use it to study this effect. METHODS: Our data transformation framework generates counterfactual outcome distributions that would have been observed in the absence of race-based diagnosis and treatment criteria. We developed a continuous-time, discrete-event, individual-level simulation model of kidney function decline, measured by estimated glomerular filtration rate (eGFR). The model simulates individual eGFR trajectories over time. eGFR decline is accelerated by hypertension, diabetes, and reaching chronic kidney disease stage 3a, and can be delayed by interventions, which are applied based on eGFR level, measured with or without an adjustment for Black race. A Bayesian calibration procedure was applied to identify rates of eGFR decline consistent with the stage distributions observed in the cohort. RESULTS: Under the counterfactual scenario without a race adjustment, Black individuals qualified for diagnosis earlier, and non-Black individuals later, than under the reference scenario with race adjustment. The difference was largest for earlier stages and smaller at each consecutive stage. We did not observe differences in life expectancy between the two scenarios. LIMITATIONS: Large variability in the prevalence of treatment and heterogeneity in treatment effectiveness may affect our results. CONCLUSIONS: Our data transformation framework demonstrates how explicitly representing the data generation process can inform estimates of the effect of policy changes on clinical data distributions. The framework can be flexibly adapted to mitigate bias in other health data.
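ILLUSTRATION: The threshold mechanism described in the methods can be sketched in a few lines of Python. This is a minimal, deterministic sketch and not the authors' model: it assumes a constant linear eGFR decline, takes the race multiplier of 1.159 from the 2009 CKD-EPI creatinine equation as an assumed adjustment, and uses hypothetical acceleration factors for hypertension and diabetes rather than calibrated rates. All names (`Person`, `annual_decline`, `years_to_stage_3a`) are invented for this example.

```python
from dataclasses import dataclass

# All values below are illustrative assumptions, not calibrated estimates.
STAGE_3A_THRESHOLD = 60.0   # mL/min/1.73 m^2; eGFR below this is CKD stage 3a or worse
RACE_MULTIPLIER = 1.159     # assumed adjustment (2009 CKD-EPI coefficient for Black race)
BASE_DECLINE = 1.0          # hypothetical baseline eGFR loss per year
HYPERTENSION_FACTOR = 1.5   # hypothetical acceleration of decline
DIABETES_FACTOR = 2.0       # hypothetical acceleration of decline


@dataclass
class Person:
    egfr: float              # current eGFR computed WITHOUT a race adjustment
    black: bool
    hypertension: bool = False
    diabetes: bool = False


def annual_decline(person: Person) -> float:
    """Yearly eGFR loss, accelerated multiplicatively by comorbidities."""
    rate = BASE_DECLINE
    if person.hypertension:
        rate *= HYPERTENSION_FACTOR
    if person.diabetes:
        rate *= DIABETES_FACTOR
    return rate


def years_to_stage_3a(person: Person, race_adjusted: bool) -> float:
    """Years until the reported eGFR first falls to the stage 3a threshold."""
    multiplier = RACE_MULTIPLIER if (race_adjusted and person.black) else 1.0
    # Reported eGFR = multiplier * unadjusted eGFR, so the unadjusted value
    # must fall to threshold / multiplier before the criterion is met.
    target = STAGE_3A_THRESHOLD / multiplier
    if person.egfr <= target:
        return 0.0
    return (person.egfr - target) / annual_decline(person)


patient = Person(egfr=90.0, black=True, hypertension=True)
with_adjustment = years_to_stage_3a(patient, race_adjusted=True)
without_adjustment = years_to_stage_3a(patient, race_adjusted=False)
```

In this sketch, removing the adjustment shortens the time at which a Black individual meets the stage 3a criterion, while a non-Black individual's time is unchanged. It covers only the individual threshold-crossing step; the stochastic event times, treatment effects, and Bayesian calibration described above are outside its scope.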