Abstract
Accurate traffic light detection and classification are fundamental for autonomous vehicle (AV) navigation and real-time traffic management in complex urban environments. Existing systems often fall short of reliably identifying and classifying traffic light states in real time, particularly flashing modes. This study introduces FlashLightNet, a novel end-to-end deep learning framework that integrates the nano variant of You Only Look Once version 10 (YOLOv10n) for traffic light detection, an 18-layer Residual Neural Network (ResNet-18) for feature extraction, and a Long Short-Term Memory (LSTM) network for temporal state classification. The proposed framework is designed to robustly detect and classify traffic light states, including conventional signals (red, green, and yellow) and flashing signals (flash red and flash yellow), under diverse and challenging conditions such as varying lighting, occlusions, and environmental noise. The framework was trained and evaluated on a comprehensive custom dataset of traffic light scenarios organized into temporal sequences to capture spatiotemporal dynamics. The dataset was prepared by recording videos of traffic lights at several intersections in Starkville, Mississippi, and on the Mississippi State University campus, covering the red, green, yellow, flash red, and flash yellow states. In addition, simulation-based video datasets with flashing periods of 2, 3, and 4 s for traffic light states at several intersections were created using RoadRunner, further enhancing the diversity and robustness of the dataset. The YOLOv10n model achieved a mean average precision (mAP) of 99.2% in traffic light detection, while the ResNet-18 and LSTM combination classified traffic light states (red, green, yellow, flash red, and flash yellow) with an F1-score of 96%.
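For illustration, the sketch below shows one plausible way to wire the ResNet-18 + LSTM temporal classifier summarized above in PyTorch; the layer sizes, sequence length, class names, and module name TrafficLightStateClassifier are assumptions for this sketch, not the paper's exact configuration, and the detection stage (YOLOv10n producing the traffic-light crops) is omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Illustrative sketch: per-frame features from a truncated ResNet-18 feed an LSTM
# that labels a sequence of traffic-light crops as one of five states.
# Hidden size, sequence length, and crop size are assumed values.
STATES = ["red", "green", "yellow", "flash_red", "flash_yellow"]

class TrafficLightStateClassifier(nn.Module):
    def __init__(self, hidden_size=128, num_states=len(STATES)):
        super().__init__()
        backbone = resnet18(weights=None)            # ResNet-18 feature extractor
        # Drop the final fully connected layer; keep the 512-d pooled features.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_states)

    def forward(self, clips):                        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        x = self.features(clips.flatten(0, 1))       # (b*t, 512, 1, 1)
        x = x.flatten(1).view(b, t, -1)              # (b, t, 512) per-frame features
        out, _ = self.lstm(x)                        # temporal modelling of flashing
        return self.head(out[:, -1])                 # classify from the last time step

# Example: a batch of two 16-frame sequences of 64x64 traffic-light crops.
logits = TrafficLightStateClassifier()(torch.randn(2, 16, 3, 64, 64))
print(logits.shape)                                  # torch.Size([2, 5])
```

The temporal window is what lets the classifier separate steady red/yellow from flash red/flash yellow, since a single frame cannot distinguish a lit phase of a flashing signal from a steady one.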