Abstract
BACKGROUND: The United States Preventive Services Task Force (USPSTF) provides age-based recommendations for prostate cancer screening. Such a single-criterion strategy can miss aggressive cancers that occur before the designated cut-off age and can over-screen men whose cancers occur at a much older age. A multi-factorial model that incorporates a broad medical history, including causal and non-causal conditions and their chronological relationship to the onset of cancer, may offer more accurate predictions.
OBJECTIVE: This study aims to develop an AI-driven (machine learning) predictive model for prostate cancer based on patients' medical history and to compare its performance with traditional age-based criteria.
METHODS: A predictive model was developed using electronic health records (EHRs) from the All of Us database. A binary indicator for a prostate cancer diagnosis was established using SNOMED codes. A Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression model was then used to examine the relationship between all prior health conditions and the cancer indicator, thereby identifying the most predictive features. Predictive performance was assessed using the area under the receiver operating characteristic curve (AUROC) and McFadden's R².
RESULTS: The EHR-based model achieved a 10-fold cross-validated McFadden's R² of 0.36, compared with an R² of 0.20 for a model based on USPSTF eligibility criteria. Validation using AUROC further showed that the proposed model outperformed current screening criteria in terms of both sensitivity and specificity.
CONCLUSION: This study highlights the potential of personalized screening strategies and demonstrates that AI-driven prediction models based on EHR data can predict prostate cancer non-invasively and with greater accuracy than existing age-based guidelines. Such approaches may help reduce invasive diagnostic procedures that result from unnecessary screening and improve early detection by focusing diagnostic efforts on those most at risk.
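As a rough illustration of the analysis outlined in METHODS, the following Python sketch fits an L1-penalized (LASSO) logistic regression and compares it with an age-only baseline using 10-fold cross-validated McFadden's R² and AUROC. It is a minimal sketch under stated assumptions, not the study's pipeline: the data are synthetic, the use of scikit-learn is an assumption, and every name in it (`age`, `conditions`, `mcfadden_r2`, the penalty strength `C=0.1`) is hypothetical. The age-only model only stands in for an age-based criterion, and the null log-likelihood is computed from the pooled sample prevalence, which is one of several reasonable conventions for a cross-validated pseudo-R².

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def mcfadden_r2(y_true, p_pred):
    """McFadden's pseudo-R²: 1 - LL(model) / LL(intercept-only null)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), 1e-12, 1 - 1e-12)
    ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    p0 = y.mean()  # the null model predicts the overall prevalence for everyone
    ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
    return 1.0 - ll_model / ll_null


# Synthetic cohort (assumption): age plus 30 binary prior-condition indicators,
# of which only the first 5 truly affect the outcome.
rng = np.random.default_rng(0)
n = 3000
age = rng.uniform(40, 80, n)
conditions = rng.binomial(1, 0.2, size=(n, 30))
logit = -2.5 + 0.05 * (age - 60) + 1.5 * conditions[:, :5].sum(axis=1)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X_full = np.column_stack([age, conditions])  # medical history + age
X_age = age.reshape(-1, 1)                   # age-only baseline

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def cv_probabilities(model, X):
    """Out-of-fold predicted probabilities of the positive class."""
    return cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]


# L1 penalty gives the LASSO behaviour: uninformative coefficients shrink to zero.
full_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
age_model = make_pipeline(StandardScaler(), LogisticRegression())

p_full = cv_probabilities(full_model, X_full)
p_age = cv_probabilities(age_model, X_age)

r2_full, r2_age = mcfadden_r2(y, p_full), mcfadden_r2(y, p_age)
auc_full, auc_age = roc_auc_score(y, p_full), roc_auc_score(y, p_age)

# Refit on all data to see how many features the L1 penalty retained.
full_model.fit(X_full, y)
n_selected = int(np.count_nonzero(full_model[-1].coef_))

print(f"McFadden R2: full={r2_full:.3f}, age-only={r2_age:.3f}")
print(f"AUROC:       full={auc_full:.3f}, age-only={auc_age:.3f}")
print(f"Features with non-zero coefficients: {n_selected} of {X_full.shape[1]}")
```

The printed values depend entirely on the synthetic data and carry no information about the reported results; the sketch only shows how the two metrics and the feature selection step fit together.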