Abstract
As cybersecurity threats grow, effective intrusion detection has become critical for safeguarding networks. Although deep learning methods have advanced, two major issues persist: (1) class imbalance biases models toward normal traffic, which increases false negatives; and (2) single-task frameworks limit feature representation and fail to exploit the potential of multi-task collaboration. To address these issues, we propose the Memory Autoencoder with CNN-Attention Integration Network (MEMCAIN), a multi-task feature fusion deep learning method. First, MEMCAIN integrates a CNN with attention mechanisms, constructing CCA Blocks through contrastive normalization to capture spatiotemporal features. These blocks are stacked to form CCANet, which enables comprehensive spatiotemporal feature extraction from traffic data. Second, a memory autoencoder is introduced to capture the latent distribution features of traffic flows. Finally, an end-to-end collaborative training framework jointly optimizes CCANet (the main task) and the memory autoencoder (the auxiliary task). Experiments show that MEMCAIN significantly outperforms the baselines across multiple datasets, and ablation studies validate the contribution of each module to fine-grained intrusion detection in complex network environments.
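
To make the idea of jointly optimizing a main classification task and an auxiliary reconstruction task concrete, the following is a minimal sketch of one common way to combine two such objectives: a weighted sum of a cross-entropy term and a reconstruction term. It is not the loss used by MEMCAIN. The function names, the use of mean squared error for reconstruction, and the trade-off coefficient `aux_weight` are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample (stand-in for the main task)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(x, x_hat):
    """Mean squared reconstruction error (stand-in for the auxiliary task)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def joint_loss(logits, label, x, x_hat, aux_weight=0.5):
    """Weighted sum of the main and auxiliary losses.

    aux_weight is a hypothetical trade-off coefficient, not a value
    taken from the paper.
    """
    return cross_entropy(logits, label) + aux_weight * mse(x, x_hat)
```

In an end-to-end setup, a single scalar of this kind would be backpropagated through both branches, so the auxiliary reconstruction signal also shapes any features the two tasks share.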