Abstract
Multivariate time series (MTS) anomaly detection faces critical challenges, including complex sensor interdependencies, environmental noise, and the inefficiency of modeling long-range dependencies. Existing methods often fail to balance computational efficiency with fine-grained modeling of normal patterns. To address these issues, we propose P-ALIGN, a framework that integrates patch-based feature extraction with prototypical alignment and contrastive learning. Specifically, a patching mechanism captures long-term context with linear complexity, while an EmbedPatch encoder learns a set of normal prototypes that serve as global reference points. Aligning latent features with these prototypes suppresses noise and prevents the model from over-reconstructing anomalies. Furthermore, a Contrastive Fusion module enlarges the discriminative gap between normal and abnormal distributions. Extensive experiments on six real-world benchmarks demonstrate that P-ALIGN consistently outperforms state-of-the-art methods, achieving an 11% improvement in F1-score and a 12.23% increase in the Normalized Affinity (NAff) metric.
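
To make the patching and prototype-alignment ideas concrete, the following is a minimal NumPy sketch, not the P-ALIGN implementation. It rests on several assumptions that the abstract does not specify: patches are non-overlapping, the encoder is replaced by an identity map over flattened patches, the prototype is a fixed stand-in rather than a learned one, and the anomaly score is the Euclidean distance to the nearest prototype. The names `make_patches` and `prototype_scores` are hypothetical.

```python
import numpy as np

def make_patches(x, patch_len, stride):
    """Split a (T, C) series into flattened patches of shape (N, patch_len * C)."""
    starts = range(0, x.shape[0] - patch_len + 1, stride)
    return np.stack([x[s:s + patch_len].reshape(-1) for s in starts])

def prototype_scores(z, prototypes):
    """Distance from each embedding in z (N, D) to its nearest prototype (K, D)."""
    # Broadcasting yields an (N, K) matrix of pairwise Euclidean distances.
    d = np.linalg.norm(z[:, None, :] - prototypes[None, :, :], axis=-1)
    return d.min(axis=1)

# Toy data: a flat 3-channel series with one injected spike at t = 50.
x = np.zeros((64, 3))
x[50] = 5.0

# Assumption: non-overlapping patches (stride equals patch length).
patches = make_patches(x, patch_len=16, stride=16)

# Assumption: a single hand-set "normal" prototype stands in for learned ones,
# and the flattened patch itself stands in for the encoder output.
prototypes = np.zeros((1, patches.shape[1]))
scores = prototype_scores(patches, prototypes)
```

In this toy setting only the patch containing the spike lies away from the prototype, so it receives the largest score. The number of patches grows linearly with the series length for a fixed patch length and stride, which is the sense in which the sketch illustrates linear cost.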