Abstract
In unstructured environments, terrain perception is essential to the stability and environmental awareness of quadruped robot locomotion. Existing approaches rely primarily on visual or proprioceptive signals, and their effectiveness is limited under visual occlusion or when terrain features are ambiguous. To address this, this study proposes a multimodal terrain perception method that integrates acoustic features with proprioceptive signals. The method collects environmental acoustic information through an externally mounted sound sensor and combines it with proprioceptive data from the robot's IMU and joint encoders. It was deployed on the Lite2 quadruped robot platform developed by Deep Robotics, and experiments were conducted on four representative terrain types: concrete, gravel, sand, and carpet. For each terrain type, 400 s of data were collected. Mel-spectrogram features are extracted from the acoustic signals and concatenated with the IMU and joint encoder data to form feature vectors, which are then fed into a support vector machine (SVM) for terrain classification. Experimental results show that the terrain classification accuracy is 78.28% without acoustic signals and increases to 82.52% when acoustic features are incorporated. To further improve classification performance, this study jointly explores the SVM hyperparameters C and γ together with the time-window length win. With the tuned settings, the classification accuracy reaches up to 99.53% across all four terrains.
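As a rough illustration of the data flow described above, the following Python sketch shows how a recording might be cut into windows of length win, how one fused feature vector could be assembled per window, and how the joint search space over C, γ, and win can be enumerated. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 100 Hz proprioceptive sampling rate, the use of non-overlapping windows, the per-channel mean as the proprioceptive summary, and the candidate grid values are all hypothetical, and the mel-spectrogram features are assumed to be computed elsewhere.

```python
from itertools import product
from statistics import fmean

DURATION_S = 400.0  # per-terrain recording length stated in the abstract
TERRAINS = ["concrete", "gravel", "sand", "carpet"]


def window_bounds(n_samples, rate_hz, win_s):
    """Return (start, stop) index pairs for non-overlapping windows.

    Non-overlapping windows are an assumption; the abstract only states
    that the window length `win` is tuned.
    """
    size = int(round(rate_hz * win_s))
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    return [(s, s + size) for s in range(0, n_samples - size + 1, size)]


def fuse_features(mel_feat, imu_window, joint_window):
    """Concatenate audio features with IMU and joint-encoder summaries.

    `mel_feat` is a flat list of mel-spectrogram features for the window
    (assumed precomputed). `imu_window` and `joint_window` are lists of
    per-sample rows; each channel is summarized by its mean, which is a
    placeholder choice rather than the paper's stated method.
    """
    imu_feat = [fmean(channel) for channel in zip(*imu_window)]
    joint_feat = [fmean(channel) for channel in zip(*joint_window)]
    return list(mel_feat) + imu_feat + joint_feat


def search_grid(c_values, gamma_values, win_values):
    """Enumerate every (C, gamma, win) combination to evaluate."""
    return list(product(c_values, gamma_values, win_values))


# Hypothetical settings: 100 Hz proprioception, 0.5 s windows.
PROPRIO_RATE_HZ = 100
n_samples = int(DURATION_S * PROPRIO_RATE_HZ)
bounds = window_bounds(n_samples, PROPRIO_RATE_HZ, win_s=0.5)
windows_per_terrain = len(bounds)

# Hypothetical candidate values for the joint search.
grid = search_grid([1, 10], [0.01, 0.1], [0.5, 1.0])
```

Because win changes both the number of windows per terrain and the content of each feature vector, the features have to be recomputed for every candidate win before an SVM is trained with a given C and γ, which is why the three parameters are searched together in this sketch.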