Abstract
Extreme Multi-Label Text Classification (XMTC) is a crucial task in natural language processing that aims to assign the most relevant subset of labels to an input text from an extremely large label set. Existing deep learning models often neglect correlation information among labels when addressing XMTC, which limits their prediction performance. This paper proposes the Label Correlation Enhancement Network (LCENet), a pluggable modular architecture that effectively captures and exploits inter-label correlation knowledge without modifying the original model structure. The LCENet module introduces a bottleneck layer and a residual connection mechanism, transforming raw label predictions into correlation-enhanced outputs. The bottleneck design reduces parameter complexity from O(L²) to O(RL), making the approach computationally feasible for extreme-scale label spaces. We integrate the LCENet module into several mainstream deep XMTC models, including CNN-, BERT-, and RNN-based architectures, and conduct comprehensive evaluations on three benchmark datasets: EUR-Lex, AmazonCat-13K, and Wiki-500K. Experimental results demonstrate that LCENet significantly improves the performance of the baseline models, achieving consistent gains on metrics such as Precision@k and nDCG@k, with a maximum increase of 5.22 percentage points in P@1, while also accelerating model convergence. Ablation studies further verify the effectiveness of the key components: the bottleneck layer, residual connections, and nonlinear activation functions. Training-curve analysis shows that LCENet provides stronger learning signals through label correlation constraints, alleviating the early-stage stagnation observed in some baseline models (Fig. 5) and reducing the number of steps required for convergence by nearly half.
This study presents a practical and effective enhancement framework for deep multi-label learning, offering strong scalability and broad applicability. The core idea of LCENet can be extended to other structured prediction tasks such as multimodal learning and sequence labeling.
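The bottleneck-plus-residual transformation described in the abstract can be illustrated with a minimal numerical sketch. This is an assumption-laden illustration, not the paper's implementation: the names (`W_down`, `W_up`, `lcenet_like`) and the ReLU choice are hypothetical, and only serve to show why factoring the label-correlation mapping through a rank-R bottleneck brings the parameter count from O(L²) down to O(RL).

```python
import numpy as np

# Hedged sketch of the bottleneck + residual idea from the abstract.
# All names and initialization choices here are illustrative assumptions.
L, R = 1000, 32          # label-space size L, bottleneck width R (R << L)
rng = np.random.default_rng(0)

# A dense label-to-label correlation matrix would need L*L parameters: O(L^2).
# Factoring it through a rank-R bottleneck needs L*R + R*L parameters: O(RL).
W_down = rng.standard_normal((R, L)) * 0.01   # projects L label scores down to R
W_up = rng.standard_normal((L, R)) * 0.01     # projects R hidden units back up to L

def lcenet_like(logits):
    """Residual correlation enhancement: y + W_up @ relu(W_down @ y)."""
    hidden = np.maximum(W_down @ logits, 0.0)  # nonlinear bottleneck activation
    return logits + W_up @ hidden              # residual connection keeps raw scores

raw_scores = rng.standard_normal(L)            # raw per-label scores from a base model
enhanced = lcenet_like(raw_scores)             # correlation-enhanced outputs, shape (L,)

full_params = L * L                            # 1,000,000 for a dense L x L matrix
bottleneck_params = 2 * L * R                  # 64,000 with the rank-32 bottleneck
print(enhanced.shape, full_params, bottleneck_params)
```

The residual connection means the module starts out near the identity on the base model's predictions, so it can be plugged into an existing architecture without disrupting what the base model has already learned, consistent with the "pluggable" framing above.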