Abstract
BACKGROUND: The COVID-19 pandemic has transitioned into an endemic phase with heterogeneous resurgences. Despite widespread vaccination and public health measures, the interplay of viral evolution, population immunity, and environmental factors drives diverse global patterns of COVID-19 burden. However, how these systematic factors dynamically shape disease transmission and severity across populations remains incompletely understood. OBJECTIVE: This study aims to determine the relative contributions and temporal dynamics of viral variants, population immunity (natural infection and vaccination), environmental conditions, and public health measures in determining COVID-19 disease burden. METHODS: This retrospective longitudinal time-series study used a big data-driven interpretable machine learning approach to analyze global multifaceted data across 38 countries from pandemic onset through December 31, 2022. Daily time-series data encompassing viral variants, natural infection, vaccination coverage, environmental conditions, policy interventions, health care infrastructure, and migration trends were integrated. A gradient-boosted trees model (XGBoost [extreme gradient boosting]), coupled with Shapley Additive Explanations interpretation, was used to quantify the complex interdependencies among these factors and their spatiotemporal effects on 4 COVID-19 burden metrics: effective reproduction number (Rt), hospitalizations, intensive care unit (ICU) admissions, and deaths. RESULTS: Variant-related factors dominate transmission (Rt), contributing 24.02% (95% CI 10.10-66.88), but their contribution progressively attenuates across severe outcomes (4.24%, 95% CI 1.59-10.89 for ICU; 5.52%, 95% CI 1.94-15.39 for deaths). Omicron 21K and Delta 21J exceed baseline transmissibility by 12.2% and 3.4%, respectively.
Conversely, immunity-related factors show inverse patterns: natural infection contributions escalate with severity (12.82% for Rt, 14.91% for hospitalization, 21.96% [95% CI 7.36-47.55] for ICU, rising to 36.00% [95% CI 10.25-78.56] for deaths). COVID-19 vaccination maintains substantial influence on severe outcomes (18.04% [95% CI 6.39-42.49] for ICU; 20.31% [95% CI 6.53-58.31] for deaths), with critical population coverage thresholds for protection: 29.9% (95% CI 29.8-29.9) for transmission reduction and 72.3% (95% CI 72.2-72.8) for ICU prevention. Routine immunizations exhibit cross-protective effects, particularly the yellow fever vaccine at more than 600,000 doses for Rt reduction and more than 100,000 doses for ICU protection. Temperature demonstrates threshold effects: 14.95°C (95% CI 14.86-15.43) for hospitalizations and 11.89°C (95% CI 11.81-11.97) for ICU admissions. Health care infrastructure contributes 23.98% (95% CI 7.03-73.13) to hospitalization outcomes. CONCLUSIONS: This large-scale epidemiological data mining reveals previously unrecognized patterns through three innovations: (1) quantifying variant evolutionary fitness with transmission thresholds, (2) identifying dual vaccination coverage thresholds for transmission versus severe disease prevention, and (3) discovering dose-specific cross-protection from routine immunizations. Unlike black-box predictions, this interpretable framework integrates multidomain surveillance data to reveal how variants, immunity, and environment jointly shape disease burden with temporal resolution. Real-world applications include tiered vaccination strategies targeting specific coverage goals, variant surveillance prioritizing lineages with demonstrated fitness in contemporary immunity contexts, and expanding routine immunization programs as pandemic preparedness measures.
This framework provides quantifiable benchmarks for adaptive pandemic response across immunization strategies, variant surveillance, and health care capacity planning.
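
The percentage contributions reported in the results can be read as shares of total Shapley attribution. As an illustration only, the following minimal Python sketch shows one common way such shares may be computed: take the mean absolute SHAP value of each feature, sum within factor groups, and normalize to 100%. The abstract does not specify the exact aggregation used in the study, so this formula is an assumption; the function name, feature names, group labels, and the toy SHAP matrix are all hypothetical and do not reproduce study data.

```python
from collections import defaultdict


def group_contributions(shap_values, feature_names, feature_groups):
    """Return each factor group's share (%) of total mean |SHAP| attribution.

    shap_values: list of rows (one per observation), one SHAP value per feature.
    feature_groups: maps each feature name to a factor group label.
    Assumption: contribution = group mean |SHAP| / total mean |SHAP|.
    """
    n_samples = len(shap_values)
    # Mean absolute SHAP value per feature across all observations.
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    shares = defaultdict(float)
    for name, value in zip(feature_names, mean_abs):
        shares[feature_groups[name]] += 100.0 * value / total
    return dict(shares)


# Toy SHAP matrix (rows: country-days, columns: features); values are made up.
shap_values = [
    [0.30, -0.10, 0.05, -0.05],
    [-0.20, 0.20, -0.05, 0.05],
]
feature_names = [
    "omicron_21k_share",
    "vaccination_coverage",
    "temperature",
    "prior_infection",
]
feature_groups = {
    "omicron_21k_share": "variant",
    "vaccination_coverage": "immunity",
    "temperature": "environment",
    "prior_infection": "immunity",
}

contributions = group_contributions(shap_values, feature_names, feature_groups)
for group, share in sorted(contributions.items()):
    print(f"{group}: {share:.1f}%")
```

In practice, the SHAP matrix would come from an explainer applied to a fitted model for each outcome (Rt, hospitalizations, ICU admissions, deaths), which would make contributions comparable across outcomes because each set of shares sums to 100%.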