Abstract
BACKGROUND: Image-based artificial intelligence (AI) risk models can estimate short-term breast cancer risk directly from mammograms and may outperform traditional questionnaire-based tools. However, risk stratification remains particularly challenging in women with dense breasts who do not otherwise meet high-risk criteria. At our institutions, molecular breast imaging (MBI) is used as supplemental screening for this population. This study evaluated the performance and clinical utility of a mammography-based AI risk model (iCAD ProFound AI® Risk) in predicting short-term breast cancer risk among women with dense breasts undergoing MBI.
METHODS: This retrospective, IRB-approved study included 416 non-actionable (BI-RADS category 1 or 2) screening digital breast tomosynthesis mammograms of dense breasts (BI-RADS density category C or D) obtained from 2018 to 2023, all followed by MBI within one year. The cohort comprised 70 cancer cases (16.8%) and 346 non-cancer controls (83.2%). Mammograms were retrospectively processed using the ProFound AI® Risk model to generate 1-year risk and density scores. Tyrer-Cuzick and Gail model scores were computed for comparison. Group differences were assessed using t-tests and effect sizes (Cohen's d), and model discrimination was evaluated with receiver operating characteristic (ROC) analysis, reporting area under the curve (AUC), sensitivity, and specificity with 95% confidence intervals (CIs).
RESULTS: Across the full cohort, mean AI risk scores were higher in cancer cases than in controls (0.41±0.35 vs. 0.37±0.21), although this difference was not statistically significant (P=0.239; Cohen's d=0.23). In subgroup analyses, discriminatory performance increased progressively with breast density. The greatest separation was observed in women with extremely dense breasts (category D), where the AI model achieved an AUC of 0.75 (95% CI: 0.61-0.89; P=0.049), with 69.3% sensitivity and 61.1% specificity at a threshold of 0.14. The effect size was also largest in this group (d=0.41). In contrast, the traditional models showed limited and non-significant discrimination across all density categories, with AUC values ranging from 0.54 to 0.63. When stratified by cancer subtype, AI risk scores were significantly higher in invasive lobular carcinoma (ILC) than in controls (0.69±0.46 vs. 0.41±0.32; P=0.048; d=0.56). Differences for ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC) were not significant, although risk scores trended higher in cancer cases. AI-estimated risk likewise tended to increase with tumor grade, with the strongest separation seen in grade 2 cancers (P=0.089).
CONCLUSIONS: Although overall differences between cancer and non-cancer groups were not statistically significant, the mammography-based AI risk model demonstrated statistically significant discrimination in women with extremely dense breasts, outperforming both the Tyrer-Cuzick and Gail models. The AI model also showed better separation in ILC and in higher-grade tumors. These findings support the role of image-based AI tools in refining risk assessment in women for whom mammography is least effective and in guiding more targeted use of supplemental MBI screening.
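For readers less familiar with the discrimination and effect-size metrics reported above, the following is a minimal illustrative sketch of how AUC, sensitivity and specificity at a fixed threshold, and Cohen's d can be computed from per-patient risk scores. It is not the study's analysis code. The function names are hypothetical, the scores are made up for illustration and are not study data, and two conventions are assumptions: a score at or above the threshold counts as positive, and Cohen's d uses the pooled standard deviation.

```python
from math import sqrt
from statistics import mean, stdev


def auc_mann_whitney(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sens_spec(case_scores, control_scores, threshold):
    """Sensitivity and specificity, calling scores >= threshold positive (assumed convention)."""
    sens = sum(s >= threshold for s in case_scores) / len(case_scores)
    spec = sum(s < threshold for s in control_scores) / len(control_scores)
    return sens, spec


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd


# Made-up scores for illustration only; these are NOT data from the study.
cases = [0.10, 0.20, 0.50, 0.90]
controls = [0.05, 0.10, 0.15, 0.30, 0.40]

auc = auc_mann_whitney(cases, controls)
sens, spec = sens_spec(cases, controls, threshold=0.14)
d = cohens_d(cases, controls)
print(f"AUC={auc:.3f}, sensitivity={sens:.2f}, specificity={spec:.2f}, d={d:.2f}")
```

The pairwise AUC shown here is equivalent to the normalized Mann-Whitney U statistic. Confidence intervals such as those reported above would require an additional step (for example, bootstrapping), which this sketch omits.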