Abstract
INTRODUCTION: Artificial intelligence (AI) models in healthcare require accurate diagnostic data. In dementia, diagnostic ambiguity and inconsistent coding may distort data quality.

METHODS: This cohort study analyzed 2016 to 2018 Medicare Part A hospitalization claims across 3000+ U.S. counties. Seventeen International Classification of Diseases, 10th Revision dementia codes were grouped into five categories. Temporal patterns were modeled using the transitive sequential pattern mining (tSPM+) algorithm; matrix similarity and multivariable regression assessed geographic and demographic variation.

RESULTS: Non-specific codes were most common. Alzheimer's and vascular dementia codes showed high regional variability. Frequent transitions from specific to non-specific codes indicated diagnostic signal decay. Counties with more rural, Medicaid-eligible, and Black or Hispanic patients had lower alignment with national patterns.

DISCUSSION: Dementia documentation varies widely and systematically across the United States. Much of this reflects inconsistent diagnostic practices, not true disease differences. Signal decay introduces bias into claims-based research and AI. Linking claims to validated cohorts may improve data quality and model fairness.

HIGHLIGHTS: Non-specific dementia codes dominate Medicare hospitalization data. Temporal analysis shows diagnostic signal decay over time. Geographic variation is linked to rurality and racial demographics. Signal decay poses bias risks for AI and claims-based research. Model explains 38% of variation in diagnostic pattern similarity.
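
The temporal modeling step named in METHODS can be illustrated with a minimal sketch of transitive sequencing: for each patient, every ordered pair of diagnostic categories in which one is first recorded before the other is counted, not only adjacent pairs. This is a simplified illustration of the general idea, not the tSPM+ implementation used in the study. The category labels (`alzheimers`, `vascular`, `nonspecific`), the function names, and the toy patient records are all hypothetical and are not taken from the study data.

```python
from collections import Counter
from itertools import combinations


def transitive_pairs(events):
    """Return the set of ordered (earlier, later) category pairs for one patient.

    `events` is a list of (ISO date string, category) tuples. Only the first
    occurrence of each category is kept, which is an assumption of this sketch.
    """
    first_seen = {}
    for date, category in sorted(events):
        # sorted() orders by date first, so setdefault keeps the earliest date
        first_seen.setdefault(category, date)
    ordered = sorted(first_seen.items(), key=lambda item: item[1])
    # combinations() preserves input order, so each pair is (earlier, later);
    # same-day pairs are dropped because their order is undefined
    return {(a, b) for (a, da), (b, db) in combinations(ordered, 2) if da < db}


def count_sequences(patients):
    """Count, across patients, how many patients show each ordered pair."""
    counts = Counter()
    for events in patients.values():
        counts.update(transitive_pairs(events))
    return counts


# Hypothetical toy records: patient id -> list of (date, category)
patients = {
    "p1": [("2016-03-01", "alzheimers"), ("2017-01-15", "nonspecific"),
           ("2018-02-01", "nonspecific")],
    "p2": [("2016-05-10", "vascular"), ("2016-09-01", "alzheimers"),
           ("2017-06-20", "nonspecific")],
    "p3": [("2017-02-02", "nonspecific"), ("2017-08-08", "alzheimers")],
}

counts = count_sequences(patients)

# Patients moving from a specific category to the non-specific one,
# a rough proxy for the "signal decay" pattern described in RESULTS
specific_to_nonspecific = sum(
    n for (earlier, later), n in counts.items()
    if later == "nonspecific" and earlier != "nonspecific"
)
```

In a county-level analysis, one such count table per county could be arranged as a category-by-category matrix and compared against a national matrix, which is one way to read the "matrix similarity" step; the exact similarity measure is not specified in the abstract.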