Abstract
Algorithmic estimates of dementia status are widely used in public health and epidemiologic research, but inadequate algorithm performance across racial/ethnic groups has been a barrier. We present improvements in the accuracy of group-specific "probable dementia" estimation using a transfer learning approach. Transfer learning combines models trained on a large "source" data set that has imprecise outcome assessments with models trained on a smaller "target" data set that has high-quality outcome assessments. It improves model accuracy by leveraging the large source data while refining estimates with the smaller target data. We illustrate with data from the Health and Retirement Study (source data: n = 6630) and the Harmonized Cognitive Assessment Protocol (target data: n = 2388). Models for dementia status estimation were evaluated on overall accuracy (Brier score), calibration (intercept, slope), and discriminative ability (area under the receiver operating characteristic curve [AUC] and area under the precision-recall curve [AUPRC]). The transfer-learned algorithm was more accurate than the best previously reported algorithm among both non-Hispanic Black participants (Brier 0.049 vs 0.061; AUC 0.84 vs 0.81; AUPRC 0.52 vs 0.39) and Hispanic participants (Brier 0.052 vs 0.056; AUC 0.89 vs 0.87; AUPRC 0.61 vs 0.56). Transfer learning can improve dementia status estimation for groups historically underrepresented in research. This article is part of a Special Collection on Methods in Social Epidemiology.
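For readers less familiar with the evaluation metrics named above, the following is a minimal Python sketch of how the Brier score, AUC, and AUPRC can be computed from predicted probabilities. It is illustrative only and is not the authors' code: the function names and the toy data are hypothetical, AUPRC is approximated here by average precision (one common estimator, which may differ from the one used in the study), and the calibration intercept and slope are omitted because they require fitting a logistic recalibration model.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome.

    Lower is better; 0 means perfect probabilistic predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random case outranks a random non-case.

    Ties between a case and a non-case count as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Average precision, used here as an approximation to AUPRC.

    Assumption: tied probabilities are ordered arbitrarily rather than
    grouped, which is adequate for a sketch but not for production use.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one case")
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each true case
    return total / n_pos


# Hypothetical toy data (not from the study): 1 = dementia, 0 = no dementia.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.05, 0.10, 0.80, 0.20, 0.35, 0.15, 0.40, 0.90]

print(f"Brier: {brier_score(y_true, y_prob):.4f}")
print(f"AUC:   {roc_auc(y_true, y_prob):.4f}")
print(f"AP:    {average_precision(y_true, y_prob):.4f}")
```

Because dementia is relatively rare in the target population, AUPRC is more sensitive than AUC to how well a model ranks the true cases, which is consistent with the larger AUPRC differences reported above.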