Abstract
The automated classification of clinical diagnoses in electronic health records (EHRs) is critical for enhancing clinical decision-making and enabling large-scale medical research, yet existing methods struggle with heterogeneous data structures and limited annotated datasets. In particular, current approaches do not adequately address two challenges at once: extracting contextual medical semantics from unstructured clinical narratives, and generalizing across institutions with divergent documentation practices. This study proposes a novel framework that integrates three core components: a Transformer-based architecture for hierarchical feature extraction from clinical text, a multi-task learning paradigm that leverages diagnostic interdependencies, and transfer learning initialization from pretrained medical language models. Evaluated on the MIMIC-III dataset, the framework achieves state-of-the-art performance with 89.2% accuracy and an 87.6% F1-score, outperforming conventional CNN-RNN hybrids by 8.0% in recall and improving on ablated configurations by 4.9-6.2% on key metrics. These results show that combining contextual attention mechanisms, cross-task knowledge sharing, and medical domain adaptation addresses EHR heterogeneity while reducing reliance on institution-specific annotations. The framework thus provides a robust foundation for clinical decision support systems that balance accuracy with practical deployability across diverse healthcare environments.
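To make the multi-task component concrete, the following is a minimal sketch of a joint objective over a shared representation. It is not the implementation evaluated in this study: the small dense encoder merely stands in for the Transformer text encoder, and the two tasks (a single-label "primary diagnosis" head and a multi-label "comorbidity" head), the layer sizes, and the task weights are all hypothetical choices made for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multitask_loss(features, primary_label, comorbidity_labels, params,
                   task_weights=(1.0, 0.5)):
    """Weighted sum of two task losses computed from one shared representation.

    All names and the task weights are hypothetical, chosen for illustration.
    """
    # Shared encoder: a stand-in for the Transformer-based text encoder.
    shared = [math.tanh(h) for h in linear(features, *params["encoder"])]

    # Head 1 (assumed task): single-label classification, cross-entropy loss.
    probs = softmax(linear(shared, *params["primary_head"]))
    loss_primary = -math.log(probs[primary_label])

    # Head 2 (assumed task): multi-label classification, mean binary cross-entropy.
    p = [sigmoid(z) for z in linear(shared, *params["comorbidity_head"])]
    loss_comorbidity = -sum(
        y * math.log(pi) + (1 - y) * math.log(1 - pi)
        for y, pi in zip(comorbidity_labels, p)
    ) / len(p)

    total = task_weights[0] * loss_primary + task_weights[1] * loss_comorbidity
    return total, loss_primary, loss_comorbidity


rng = random.Random(0)
params = {
    "encoder": init_layer(8, 4, rng),           # 8 input features -> 4 shared units
    "primary_head": init_layer(4, 3, rng),      # 3 hypothetical diagnosis classes
    "comorbidity_head": init_layer(4, 5, rng),  # 5 hypothetical comorbidity flags
}
features = [rng.uniform(-1, 1) for _ in range(8)]
total, loss_primary, loss_comorbidity = multitask_loss(
    features, primary_label=1, comorbidity_labels=[1, 0, 0, 1, 0], params=params
)
print(f"total={total:.4f} primary={loss_primary:.4f} comorbidity={loss_comorbidity:.4f}")
```

Because both heads read the same shared vector, gradients from either loss would update the encoder during training, which is the mechanism by which related diagnostic tasks can inform one another. In a transfer learning setting, the encoder parameters would be initialized from a pretrained model rather than at random as done here.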