Abstract
Pengju Ren, Shanghai Maritime University, China

Litchi is a popular subtropical fruit with approximately 100 known varieties worldwide. Traditional post-harvest litchi variety classification relies primarily on manual identification, which is inefficient, highly subjective, and lacks a standardized system. This study constructs an image dataset comprising the 12 most common litchi varieties found in commercial markets and proposes YOLO-LitchiVar, a lightweight, high-precision detection model that jointly optimizes computational efficiency and recognition accuracy for fine-grained litchi variety classification. The proposed model is built upon the YOLOv12 architecture and improves on it through three novel modules that work together. First, we introduce the DSC3k2 module to meet lightweight design requirements. It employs a depthwise separable convolutional structure that decouples standard convolution into spatial filtering and channel fusion. This design reduces model complexity, decreasing parameters from 2.57 million to 2.20 million (a 14.1% reduction) and computational cost from 6.5 GFLOPs to 5.6 GFLOPs (a 13.8% reduction). Second, we develop the C2PSA cross-layer feature aggregation module to enhance feature representation through multi-scale feature alignment and fusion, specifically improving the characterization of shallow microtextures. This module addresses the missed detection of Icy-Flesh Litchi caused by the loss of micro-concave texture, increasing the recall rate from 0.492 to 0.706 (a 43.5% improvement). Finally, we integrate an ECA attention mechanism to improve discriminative performance. It dynamically calibrates channel weights through a 1D convolution with an adaptive kernel size, thereby suppressing background noise (e.g., illumination variations) and features shared by similar varieties.
This integration lowers the misclassification rate between Icy-Flesh Litchi and Osmanthus-Fragr Litchi from 0.462 to 0.340 (a 19.1% reduction). Experiments on a dataset of 11,998 multi-variety litchi images demonstrate that YOLO-LitchiVar achieves strong overall performance, with a mAP50-95 of 94.4%, which is 0.8% higher than the YOLOv12 baseline model. It also remains lightweight, with 2.20 million parameters and a computational cost of 5.6 GFLOPs, making it suitable for mobile deployment. This study provides an efficient and effective solution for intelligent litchi variety identification with global applicability.
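To make two of the mechanisms named above concrete, the following is a minimal sketch rather than the DSC3k2 or ECA code used in this study. It counts the weights of a standard convolution against a depthwise separable one (spatial filtering plus channel fusion), and computes the adaptive 1D kernel size as formulated in the original ECA-Net work. The function names, the channel widths, and the defaults `gamma=2` and `b=1` are assumptions chosen for illustration; the abstract does not state the values used in YOLO-LitchiVar.

```python
import math


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def dsconv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise separable convolution (bias omitted)."""
    depthwise = c_in * k * k   # spatial filtering: one k x k filter per input channel
    pointwise = c_in * c_out   # channel fusion: 1 x 1 convolution across channels
    return depthwise + pointwise


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1D kernel size: truncate (log2(C) + b) / gamma, then
    bump even values up to the next odd number (assumed ECA-Net defaults)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


if __name__ == "__main__":
    # Hypothetical channel widths, not taken from the paper.
    for c in (64, 128, 256):
        std = conv_params(c, c, 3)
        sep = dsconv_params(c, c, 3)
        print(f"C={c}: standard={std}, separable={sep}, "
              f"ratio={sep / std:.3f}, ECA kernel={eca_kernel_size(c)}")
```

For a single 3 x 3 layer the separable form needs roughly 1/c_out + 1/9 of the weights of the standard form. The whole-model reduction reported above is smaller than this per-layer ratio, presumably because only some layers of the network are replaced.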