Abstract
Reliable short-term forecasts enable urban health systems to anticipate dengue surges and allocate resources effectively. We assembled monthly dengue case counts for Freetown, Sierra Leone (2015-2024), and compared four probabilistic model families under a leakage-safe, rolling-origin evaluation at horizons of 1 to 3 months: a negative binomial generalized linear model (NB-GLM), a negative binomial INGARCH model (INGARCH-NB), a mechanistic renewal model with negative binomial observations (Renewal-NB), and a bidirectional long short-term memory network with a negative binomial output (BiLSTM-NB). All models used the same seasonal harmonics and autoregressive lags; "light" climate inputs (rainfall, temperature, and relative humidity) were restricted to lag-1 covariates to reflect real-time availability. We evaluated probabilistic performance using mean log score (primary), empirical coverage and median widths of 50% and 90% predictive intervals, calibration diagnostics based on the probability integral transform, and Diebold-Mariano tests with Newey-West standard errors. For the main comparison, we evaluated models on a strictly matched set of common issue-target pairs within each horizon (n = 32 per horizon). On this aligned set, INGARCH-NB achieved the best mean log score at all horizons, indicating the strongest overall distributional accuracy. BiLSTM-NB remained competitive and provided more conservative upper-tail uncertainty at longer horizons (e.g., 90% interval coverage of 100% at h = 3), at the cost of wider intervals. NB-GLM variants produced the sharpest intervals, but their coverage fell substantially below nominal levels, indicating overconfidence, while renewal-based forecasts attained nominal coverage largely through uncertainty inflation that degraded sharpness and log score. In a leakage-safe light-climate ablation, adding lag-1 climate covariates yielded small, statistically non-significant gains for NB-GLM and did not improve renewal forecasts.
Overall, the results support a horizon-aware toolkit for operational dengue forecasting: INGARCH-NB as a strong default when distributional accuracy is prioritized, complemented by calibrated deep learning (BiLSTM-NB) when conservative tail reliability is preferred. The aligned indices, per-issue forecasts, and code provide a transparent baseline for future work in similar urban settings.
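For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how mean log score, empirical interval coverage, and median interval width could be computed for negative binomial forecasts. It is not the authors' code. It assumes a mean-dispersion parameterization (mean mu, size k, variance mu + mu^2/k), a positively oriented log score (higher is better), and central prediction intervals built from equal-tailed quantiles; the function names and the example numbers are hypothetical.

```python
import math
import statistics


def nb_logpmf(y, mu, k):
    """Log pmf of a negative binomial with mean mu > 0 and size k > 0.

    Assumed parameterization: variance = mu + mu**2 / k.
    """
    return (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu))
        + y * math.log(mu / (k + mu))
    )


def nb_quantile(p, mu, k, max_count=1_000_000):
    """Smallest count q with P(Y <= q) >= p, found by summing the pmf."""
    cdf = 0.0
    for q in range(max_count):
        cdf += math.exp(nb_logpmf(q, mu, k))
        if cdf >= p:
            return q
    raise ValueError("quantile not reached; increase max_count")


def evaluate(forecasts, observed, level=0.90):
    """Score a list of (mu, k) forecasts against observed counts.

    Returns the mean log score (higher is better under the assumed
    orientation), the empirical coverage of the central interval at
    the given level, and the median width of that interval.
    """
    alpha = 1.0 - level
    log_scores, widths, hits = [], [], 0
    for (mu, k), y in zip(forecasts, observed):
        log_scores.append(nb_logpmf(y, mu, k))
        lo = nb_quantile(alpha / 2, mu, k)
        hi = nb_quantile(1 - alpha / 2, mu, k)
        hits += int(lo <= y <= hi)
        widths.append(hi - lo)
    n = len(observed)
    return {
        "mean_log_score": sum(log_scores) / n,
        "coverage": hits / n,
        "median_width": statistics.median(widths),
    }


# Hypothetical forecasts and counts, for illustration only.
example = evaluate([(10.0, 2.0), (25.0, 5.0), (40.0, 3.0)], [7, 31, 12])
```

Coverage near the nominal level combined with a narrow median width corresponds to the calibrated-and-sharp behaviour the abstract contrasts across models; a model can reach nominal coverage simply by widening its intervals, which is why the log score is treated as the primary metric.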