Abstract
The National Institutes of Health's All of Us Research Program (All of Us) aims to enhance precision medicine by collecting multimodal data from one million or more participants. Because All of Us uses nonprobability sampling and prioritizes enrollment from populations for which data on health outcomes are limited, prevalence estimates derived from it may not reflect those of the general U.S. population. This study examines the challenges of estimating electronic health record-based disease prevalence from All of Us and offers a framework and a novel R package (waou) to help researchers work through these issues. We evaluated three weighting techniques for improving the generalizability of prevalence estimates for dementia, type 2 diabetes, and depression. Using data from All of Us with the National Health Interview Survey as a benchmark, we found that weighting yielded more representative estimates for dementia and type 2 diabetes but amplified bias for depression. The waou package facilitates the application of these methods, enabling researchers to critically evaluate the generalizability of their estimates. This work underscores the need for careful consideration of bias in epidemiological research that uses All of Us data for population-level inference.