Abstract
BACKGROUND: Large biobanks offer unprecedented data for psychiatric genomic research, but concerns remain about their representativeness and generalizability. This study examined depression prevalence and polygenic risk score (PRS) associations in All of Us data to assess the potential impact of nonrepresentative sampling. METHODS: Depression prevalence and correlates were analyzed in two subsamples: participants with self-reported personal medical history (PMH) data (N = 185,232 overall; N = 114,739 with genetic data) and participants with electronic health record (EHR) data (N = 287,015 overall; N = 206,175 with genetic data). PRS weights were estimated across ancestry groups. Associations of PRS with depression were examined by state and by ancestry. RESULTS: Depression prevalence varied across states in both PMH (16.7-35.9%) and EHR (0.2-45.8%) data. Concordance between PMH and EHR diagnoses was low (kappa: 0.29, 95% CI: 0.30-0.30). Overall, a one standard deviation increase in depression PRS was associated with higher odds of lifetime depression based on PMH (odds ratio [OR] = 1.05, 95% confidence interval [CI]: 1.04-1.07) and EHR (OR = 1.05, 95% CI: 1.04-1.07). Results were generally consistent across ancestry groups, with the strongest signal for European ancestry (PMH: OR = 1.10, 95% CI: 1.08-1.12; EHR: OR = 1.07, 95% CI: 1.05-1.10). Associations between PRS and lifetime depression were largely consistent by state of residence in both subsamples, and the statistically significant associations varied minimally (ORs = 1.06-1.45). CONCLUSIONS: Recorded depression prevalence by state in All of Us ranges widely, likely reflecting recruitment differences, EHR data completeness, and true geographic variation; yet PRS associations remained relatively stable. As studies like All of Us expand, accounting for sample composition and measurement approaches will be crucial for generating actionable findings.
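For readers less familiar with the concordance statistic reported above, the following is a minimal sketch of how Cohen's kappa could be computed from a 2x2 cross-tabulation of PMH and EHR depression status. It is not the study's analysis code; the function name, argument names, and the counts in the usage line are hypothetical and chosen only for illustration.

```python
def cohens_kappa(both_yes, pmh_only, ehr_only, both_no):
    """Cohen's kappa for two binary ratings summarized as a 2x2 table.

    Arguments are cell counts (hypothetical names, not from the study):
      both_yes - depression recorded in both PMH and EHR
      pmh_only - recorded in PMH only
      ehr_only - recorded in EHR only
      both_no  - recorded in neither source
    """
    n = both_yes + pmh_only + ehr_only + both_no
    if n == 0:
        raise ValueError("the table has no observations")

    # Observed agreement: share of participants on whom both sources agree.
    observed = (both_yes + both_no) / n

    # Marginal prevalence of depression in each source.
    pmh_yes = (both_yes + pmh_only) / n
    ehr_yes = (both_yes + ehr_only) / n

    # Agreement expected by chance if the two sources were independent.
    expected = pmh_yes * ehr_yes + (1 - pmh_yes) * (1 - ehr_yes)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (observed - expected) / (1 - expected)


# Illustrative counts only; these are not data from All of Us.
example_kappa = cohens_kappa(both_yes=20, pmh_only=5, ehr_only=10, both_no=15)
print(round(example_kappa, 2))
```

Kappa rescales raw agreement by the agreement expected from the marginal prevalences alone, so a low value can occur even when the two sources agree on most participants, which matters when prevalence differs as much between sources as it does here.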