Abstract
BACKGROUND: Polygenic risk scores (PRS) offer an elegant approach to estimating an individual's genetic predisposition to a given disease or trait. Numerous tools are available for PRS calculation, each applying a different strategy to account for linkage disequilibrium and to shrink effect sizes. No single tool is inherently superior, so multiple tools should be tested to identify the one that best suits the research question. Population stratification and limited PRS portability further complicate the field. Here, we developed STREAM-PRS, a PRS pipeline designed to calculate scores using five popular tools: PRSice-2, PRS-CS, LDpred2, lassosum, and lassosum2.
METHODS: STREAM-PRS first computes scores under various settings in a training dataset. The selected variants are subsequently used for score calculation in the test dataset, followed by principal component (PC) correction and standardization to improve portability across different centers. Finally, the pipeline determines the best PRS tool and settings based on the variance explained (R²) in the test dataset. To demonstrate this PRS pipeline, we applied it to an in-house inflammatory bowel disease (IBD) cohort consisting of 3192 IBD cases and 822 controls. In total, 472 scores were created using the 1000 Genomes non-Finnish European subpopulation as the training dataset and applied to UK Biobank data as the test dataset.
RESULTS: Computing 472 scores across the five PRS tools, with 404 individuals in the training dataset and 1000 individuals in the test dataset, took approximately 20 h. For IBD, lassosum was identified as the best-performing tool, with optimal settings of a shrinkage value of 0.7 and a lambda value of 0.008859. Applying this optimized PRS to our in-house IBD dataset (validation) resulted in an R² of 0.203 and an AUC of 0.75. Further, the PRS showed a high positive predictive value of 0.905 but a low negative predictive value of 0.341. This suggests that the PRS is effective in identifying individuals at high risk but might be less reliable in excluding lower-risk individuals.
CONCLUSIONS: Overall, STREAM-PRS provides an efficient framework for selecting the best PRS calculation strategy and helps bridge the portability gap within the PRS field. STREAM-PRS is available at https://github.com/SaraBecelaere/STREAM-PRS.
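The post-processing steps summarized in METHODS (PC correction, standardization, and selection of the best score by variance explained) can be illustrated with a minimal Python sketch. This is not the STREAM-PRS implementation: the function names, the toy data, and the labels `setting_a` and `setting_b` are hypothetical, PC correction is assumed to mean taking residuals from a linear regression of the raw score on the PCs, and the squared Pearson correlation is used as a simple stand-in for whichever R² definition the pipeline actually reports.

```python
import numpy as np

def pc_correct_and_standardize(prs, pcs):
    """Regress a raw PRS on ancestry PCs and z-score the residuals (assumed approach)."""
    design = np.column_stack([np.ones(len(prs)), pcs])  # intercept + PCs
    beta, *_ = np.linalg.lstsq(design, prs, rcond=None)
    resid = prs - design @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)

def r_squared(score, phenotype):
    """Squared Pearson correlation, used here as a proxy for variance explained."""
    r = np.corrcoef(score, phenotype)[0, 1]
    return r ** 2

def select_best(scores, pcs, phenotype):
    """Return the name of the score with the highest R² after PC correction."""
    r2 = {
        name: r_squared(pc_correct_and_standardize(raw, pcs), phenotype)
        for name, raw in scores.items()
    }
    return max(r2, key=r2.get), r2

# Toy data (entirely simulated, for illustration only)
rng = np.random.default_rng(0)
n = 1000
pcs = rng.normal(size=(n, 5))
signal = rng.normal(size=n)
phenotype = (signal + rng.normal(size=n) > 0).astype(float)

# Two hypothetical tool/setting combinations; both are confounded by the first PC
scores = {
    "setting_a": signal + 0.5 * pcs[:, 0] + rng.normal(scale=0.5, size=n),
    "setting_b": rng.normal(size=n) + 0.5 * pcs[:, 0],
}

best, r2 = select_best(scores, pcs, phenotype)
print(best, r2)
```

In this toy setup only `setting_a` carries phenotype signal, so it should be selected. In a real analysis the regression coefficients and the standardization parameters would more plausibly be estimated in a reference sample and then applied to new cohorts, which this sketch does not attempt.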