Abstract
Road traffic accidents remain a major global public health concern, and complex urban driving environments substantially increase drivers' visual load and accident risk. Whereas existing research typically adopts a macro perspective that considers multiple factors such as the driver, vehicle, and road, this study focuses on a key safety factor, the driver's visual load, and on its direct source, the driver's visual environment. We develop an interpretable framework that combines computer vision and machine learning to quantify how road scene features influence oculomotor behavior and scene-induced visual load, establishing a complete and interpretable link between scene features, eye movement behavior, and visual load. Using the DR(eye)VE dataset, visual attention demand is established through occlusion experiments and shown to correlate with eye-tracking metrics. K-means clustering is applied to discriminative oculomotor features to classify visual load levels, while semantic segmentation extracts quantifiable road scene features such as the Green Visibility Index, Sky Visibility Index, and Street Canyon Enclosure. Among the machine learning models compared (Random Forest, AdaBoost, XGBoost, and SVM), XGBoost performs best in visual load detection. SHAP analysis reveals critical thresholds: the probability of high visual load increases when pole density exceeds 0.08%, signage exceeds 0.55%, or buildings account for more than 14%; blink duration and blink rate decrease, indicating elevated visual load, when street enclosure exceeds 38% or road congestion exceeds 25%. The proposed framework provides actionable insights for urban design and driver assistance systems, advancing traffic safety through data-driven optimization of road environments.
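Scene features such as the Green Visibility Index and Sky Visibility Index are derived from semantic segmentation. A common way to define such indices is as the share of image pixels assigned to a given class (vegetation, sky). The minimal Python sketch below computes per-frame pixel proportions from a 2-D label mask. It assumes Cityscapes-style trainIds for the class labels and uses an illustrative Street Canyon Enclosure definition (building, wall, fence, and vegetation pixels combined); the abstract specifies neither the label set nor the exact formulas, so the class IDs, the enclosure definition, and the function names `class_shares` and `scene_features` are hypothetical.

```python
from collections import Counter

# Hypothetical label IDs, assuming a Cityscapes-style 19-class trainId scheme;
# the actual segmentation model and label set are not stated in the abstract.
BUILDING, WALL, FENCE, POLE = 2, 3, 4, 5
TRAFFIC_SIGN, VEGETATION, SKY = 7, 8, 10


def class_shares(mask):
    """Return {label_id: fraction of pixels} for a 2-D label mask (list of rows)."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty mask")
    return {label: n / total for label, n in counts.items()}


def scene_features(mask):
    """Pixel-proportion scene features for one frame (illustrative definitions)."""
    shares = class_shares(mask)

    def share(*ids):
        # Sum the pixel fractions of the given classes; absent classes count as 0.
        return sum(shares.get(i, 0.0) for i in ids)

    return {
        "green_visibility_index": share(VEGETATION),
        "sky_visibility_index": share(SKY),
        # Assumed enclosure definition: vertical elements bounding the street.
        "street_canyon_enclosure": share(BUILDING, WALL, FENCE, VEGETATION),
        "pole_share": share(POLE),
        "sign_share": share(TRAFFIC_SIGN),
        "building_share": share(BUILDING),
    }


if __name__ == "__main__":
    # Toy 4 x 5 mask: sky on top, buildings and vegetation in the middle, road below.
    demo_mask = [
        [10, 10, 10, 10, 10],
        [2, 2, 8, 8, 5],
        [2, 2, 8, 7, 0],
        [0, 0, 0, 0, 0],
    ]
    print(scene_features(demo_mask))
```

In practice the mask would come from a segmentation network run on each video frame, and per-frame features would then be aggregated over the time window used for the visual load labels before being passed to a classifier such as XGBoost.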