Abstract
This study proposes a semi-supervised fault diagnosis framework based on vision transformers (ViTs) to improve diagnostic accuracy and generalization for machine cutting tools (MCTs) when labeled data are limited, a common constraint in intelligent manufacturing systems. The proposed method integrates pseudo-label generation, uncertainty quantification, and a dynamic teacher-student knowledge distillation strategy with an adaptive model refinement loop. Time-frequency scalograms generated with the continuous wavelet transform (CWT) serve as input representations, preserving the critical temporal and spectral characteristics of the acoustic emission (AE) signals. A ViT-based architecture extracts both local and global features, enabling accurate fault diagnosis across MCT components such as bearings, gears, and cutting tools. The framework first trains a teacher model with transfer learning on a small labeled dataset. It then generates pseudo-labels for the unlabeled data and refines them using uncertainty estimation. High-confidence pseudo-labeled samples are merged with the labeled data to train a lightweight DeiT-tiny transformer student model through knowledge distillation, which improves generalization while keeping computational cost low. Finally, an adaptive refinement loop filters out low-confidence samples and updates the model iteratively to progressively improve performance. The proposed framework was validated on real-world AE data collected from a milling machine, where it achieved an accuracy of 99.68%; it also reliably identified subtle fault variations on both the experimental and benchmark datasets. By integrating these techniques, this work presents a scalable, data-efficient, and interpretable solution for predictive maintenance and intelligent fault diagnosis in Industry 4.0 environments.
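The uncertainty-based pseudo-label filtering and teacher-student distillation steps summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that uncertainty is estimated from several stochastic teacher forward passes (for example, Monte Carlo dropout) and measured by predictive entropy, and that the student is trained with a standard Hinton-style distillation loss. The function names (`select_pseudo_labels`, `distillation_loss`), thresholds, temperature, and weighting are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(
    mc_probs: Sequence[Sequence[Sequence[float]]],
    conf_threshold: float = 0.9,      # hypothetical confidence cut-off
    entropy_threshold: float = 0.3,   # hypothetical uncertainty cut-off (nats)
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs that pass both filters.

    Assumption: mc_probs[i][t] is the teacher's class-probability vector for
    unlabeled sample i on stochastic forward pass t (e.g. MC dropout).
    """
    selected = []
    for i, passes in enumerate(mc_probs):
        n_passes = len(passes)
        n_classes = len(passes[0])
        # Average the stochastic predictions into one predictive distribution.
        mean = [sum(p[c] for p in passes) / n_passes for c in range(n_classes)]
        # Predictive entropy as the uncertainty score (higher = less certain).
        entropy = -sum(p * math.log(p) for p in mean if p > 0)
        label = max(range(n_classes), key=lambda c: mean[c])
        # Keep only samples that are both confident and low-uncertainty.
        if mean[label] >= conf_threshold and entropy <= entropy_threshold:
            selected.append((i, label))
    return selected


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target: int,
    temperature: float = 4.0,  # hypothetical softening temperature T
    alpha: float = 0.5,        # hypothetical weight on the hard-label term
) -> float:
    """alpha * CE(hard label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label cross-entropy on the student's ordinary (T = 1) softmax.
    ce = -math.log(softmax(student_logits)[target])
    # Soft-target KL divergence at temperature T; the T^2 factor keeps the
    # soft-target gradients on a scale comparable to the hard-label term.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


# Example: two unlabeled samples, two stochastic teacher passes, three classes.
mc = [
    [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02]],  # confident and consistent
    [[0.60, 0.30, 0.10], [0.30, 0.60, 0.10]],  # passes disagree
]
print(select_pseudo_labels(mc))  # [(0, 0)]
```

In this sketch, only the first sample is kept as a pseudo-labeled training example: its averaged prediction is confident (0.96) with low entropy, whereas the second sample's passes disagree. Combining a confidence check with an entropy check is one common way to reject samples that look confident on a single pass but are unstable across passes.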