Abstract
To address the limitations in time-frequency feature representation of shearer arm gear faults, as well as the parameter redundancy and low training efficiency of standard convolutional neural networks (CNNs), this study proposes a diagnostic method based on an improved S-transform and a Depthwise Separable Convolutional Neural Network (DSCNN). First, the improved S-transform is applied to the vibration signals, converting the original one-dimensional signals into two-dimensional time-frequency images that preserve the fault characteristics of the gear. Then, a neural network model combining standard convolution and depthwise separable convolution is constructed for fault identification. The experimental dataset covers five gear conditions: tooth deficiency, tooth breakage, tooth wear, tooth crack, and normal. Several frequency-domain and time-frequency methods, namely the Wavelet Transform, Fourier Transform, S-transform, and Gramian Angular Field (GAF), are compared using the same network model. Furthermore, Grad-CAM is applied to visualize the responses of key convolutional layers, highlighting the regions of interest related to gear fault features. Finally, the proposed model is compared with four typical CNN architectures: Deep Convolutional Neural Network (DCNN), InceptionV3, Residual Network (ResNet), and Pyramid Convolutional Neural Network (PCNN). Experimental results demonstrate that frequency-domain representations consistently outperform raw time-domain signals in fault diagnosis tasks. The Grad-CAM visualizations indicate that the model focuses on the critical fault features. Moreover, the proposed method achieves high classification accuracy while reducing both training time and the number of model parameters.
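The parameter saving that motivates depthwise separable convolution can be illustrated with a short calculation. The sketch below is not the network proposed in this study. The kernel size and channel counts are assumed values chosen only for illustration, the function names are hypothetical, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed layer shape, for illustration only: 3 x 3 kernel, 64 -> 128 channels.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
dsc = depthwise_separable_params(k, c_in, c_out)
ratio = dsc / std  # algebraically equal to 1/c_out + 1/k**2

print(std, dsc, round(ratio, 3))  # 73728 8768 0.119
```

For this assumed layer, the depthwise separable form needs roughly 12% of the weights of the standard convolution. The ratio approaches 1/k² as the number of output channels grows.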