Abstract
Accurate solar irradiance forecasts are vital for photovoltaic (PV) power prediction, especially in tropical and subtropical regions affected by dust, wildfire smoke, and pollution. However, satellite aerosol retrievals are often obstructed by clouds, AErosol RObotic NETwork (AERONET) stations are sparsely distributed, and climatological datasets cannot capture intra-day variability. Global products such as the Copernicus Atmosphere Monitoring Service (CAMS) provide broad coverage but miss local events because of their coarse resolution and uncertainties in the underlying emission database. In this study, atmospheric parameters from automated METeorological Aerodrome Report (METAR) observations and CAMS aerosol products are used as inputs to data-driven models trained on normalized pseudo global horizontal clear-sky irradiance ([Formula: see text]) targets from a single site. The models tested include gradient boosting methods, random forests, neural networks, and a quantum variational circuit. Results were obtained using only openly available data from seven test sites with significant aerosol loads, covering 2015–2024. The predicted global horizontal clear-sky irradiance ([Formula: see text]) is then fed into the Heliosat-3 method, which uses a satellite-derived cloud index (CI) to estimate the all-sky global horizontal irradiance (GHI). The resulting all-sky GHI is benchmarked against the all-sky GHI output of Heliosat-3 coupled with [Formula: see text] from the physics-based McClear model. Categorical boosting (CatBoost) achieves the highest root mean squared error (RMSE) skill score (SS) relative to the McClear reference, 4.2% over the entire test dataset. All models show a consistently positive RMSE SS of 1–5% in the 6–8 km visibility range. During dust and sand events, the Light Gradient-Boosting Machine (LightGBM) achieves a positive RMSE SS of 21%.
These findings demonstrate the value of a [Formula: see text]-based machine learning approach for improving solar irradiance estimates in aerosol-rich environments.
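The benchmarking chain summarized above rests on two standard relations: all-sky GHI as the product of a clear-sky index and a clear-sky GHI, and an RMSE skill score computed against a reference. The Python sketch below illustrates both under stated assumptions. The cloud-index-to-clear-sky-index mapping is the Heliosat-2 piecewise relation (Rigollier et al., 2004), used only as a stand-in because the Heliosat-3 mapping is not specified here. The skill score is taken as SS = 1 − RMSE_model / RMSE_ref. All function names and the toy numbers are hypothetical and are not data from the study.

```python
import numpy as np

def clear_sky_index_from_ci(ci):
    """Map a satellite cloud index n to a clear-sky index k_c.

    Assumption: Heliosat-2 piecewise relation used as a stand-in;
    the Heliosat-3 mapping used in the study may differ.
    """
    n = np.asarray(ci, dtype=float)
    return np.where(
        n < -0.2, 1.2,                                   # overbright pixels: capped k_c
        np.where(
            n < 0.8, 1.0 - n,                            # linear regime: k_c = 1 - n
            np.where(
                n < 1.1,
                2.0667 - 3.6667 * n + 1.6667 * n**2,     # quadratic transition
                0.05,                                    # heavily overcast floor
            ),
        ),
    )

def all_sky_ghi(ci, ghi_clear):
    """All-sky GHI = k_c(CI) * clear-sky GHI (from McClear or an ML model)."""
    return clear_sky_index_from_ci(ci) * np.asarray(ghi_clear, dtype=float)

def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def rmse_skill_score(pred, ref, obs):
    """SS = 1 - RMSE(pred) / RMSE(ref); positive means pred beats the reference."""
    return 1.0 - rmse(pred, obs) / rmse(ref, obs)

# Toy usage with made-up numbers (hypothetical, not results from the study).
obs = np.array([620.0, 480.0, 150.0])          # "measured" all-sky GHI, W/m^2
cis = np.array([0.3, 0.45, 0.85])              # satellite cloud index per time step
ghi_ml = all_sky_ghi(cis, [900.0, 880.0, 860.0])   # ML clear-sky GHI as input
ghi_ref = all_sky_ghi(cis, [940.0, 920.0, 900.0])  # reference clear-sky GHI as input
print(f"RMSE SS = {100 * rmse_skill_score(ghi_ml, ghi_ref, obs):.1f}%")
```

For instance, a cloud index of 0.3 gives a clear-sky index of 0.7, so a clear-sky GHI of 900 W/m² yields an all-sky estimate of 630 W/m². A positive SS means the chain driven by the data-driven clear-sky GHI has a lower RMSE than the chain driven by the reference clear-sky GHI.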