Abstract
Artificial intelligence (AI) algorithms enhance distributed acoustic sensing (DAS) signal interpretation by leveraging large-scale acoustic data. However, heterogeneous deployment environments hinder model generalization and exacerbate label scarcity. To overcome these challenges, we propose MAEPD, a foundation model for DAS signal recognition trained via masked-autoencoder pre-training on large-scale, unlabeled DAS data collected from diverse domains. The pre-trained model is subsequently adapted to downstream tasks using adapter-based prompt tuning (APT) with only a small number of labeled samples. On the DAS gait identity recognition task, with only 240 image signals per class, APT achieves 94.75% accuracy, a 4.46% improvement over full fine-tuning, while updating only 2.77% of the model's parameters. An inference latency of 2.74 ms per image meets real-time requirements. Compared to pre-training on gait data alone (35.6k samples), MAEPD improves accuracy by 3.88%, demonstrating the benefit of diverse pre-training data. The method also performs robustly on water pipe leakage detection, perimeter security, and public datasets, with low sensitivity to the quantity of labeled data. These results demonstrate an efficient and scalable solution for DAS signal recognition.