Abstract
Rock identification plays a fundamental role in geological work, particularly in resource reservoir characterization, stratigraphic division, engineering stability assessment, and hazard prevention. However, traditional manual identification is inefficient and has limited ability to capture dynamic and fine-grained features. To address these challenges, this study applies image classification and object detection techniques to igneous, sedimentary, and metamorphic rocks. We propose an improved You Only Look Once version 11 (YOLO11) model that integrates the Efficient Visual State Space (EVSS) module. By modeling long-range spatial dependencies, EVSS overcomes the locality limitations of conventional convolutional networks and improves the extraction of key rock characteristics such as texture and fractures. The proposed method is compared with three mainstream deep learning models. Experimental results show that the EVSS-enhanced YOLO11 achieves the highest classification accuracy (92%), outperforming the Vision Transformer (ViT, 85%), ResNet (74%), and the standard YOLO11 (87%). In object detection, the EVSS-integrated YOLO11 also performs best, reaching a mean average precision at an intersection-over-union threshold of 0.5 (mAP50) of 91.8%, compared with 87.7% for the original YOLO11. By combining efficient visual feature modeling with multi-scale detection, the EVSS-YOLO11 framework proves effective and robust for rock image identification and provides strong technical support for intelligent geological analysis.