Abstract
BACKGROUND: Multi-label medical image classification is challenging due to complex inter-label dependencies, data imbalance, and the need to integrate multiple data modalities. These challenges hinder the development of robust and interpretable diagnostic systems capable of leveraging diverse clinical information.

METHODS: We propose a cancer risk stratification framework that combines univariate thresholding with multivariate modeling using a hybrid parallel deep learning architecture, MedFusionNet. First, univariate thresholds are applied to identify the top-N discriminative features for each label. These selected features are then fed into MedFusionNet, which integrates self-attention mechanisms, dense connections, and feature pyramid networks (FPNs). The architecture is further extended for multi-modal learning by fusing image data with corresponding textual and clinical metadata. Self-attention captures dependencies across image regions, labels, and modalities; dense connections enable efficient feature propagation; and FPNs support multi-scale representation and cross-modal fusion.

RESULTS: Extensive evaluations on multiple datasets, including NIH ChestX-ray14 and a custom cervical cancer dataset, show that MedFusionNet consistently outperforms existing models, delivering higher accuracy, improved robustness, and enhanced interpretability compared with traditional deep learning approaches.

CONCLUSIONS: MedFusionNet provides an effective and scalable solution for multi-label medical image classification and cancer risk stratification. By integrating multi-modal information and advanced architectural components, it improves predictive performance while maintaining high interpretability, making it well suited for real-world clinical applications.
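
To make the two-stage pipeline summarized above concrete, the following is a minimal PyTorch sketch of the general idea: univariate top-N feature pre-selection followed by self-attention-based fusion of image, textual, and clinical embeddings into multi-label (sigmoid) outputs. This is an illustration only, not MedFusionNet's actual implementation; the names select_top_n and FusionSketch, the placeholder univariate scores, and all dimensions and hyperparameters are assumptions made for the example.

import torch
import torch.nn as nn


def select_top_n(features: torch.Tensor, scores: torch.Tensor, n: int) -> torch.Tensor:
    """Keep the n feature columns with the highest univariate scores.

    `scores` stands in for precomputed per-feature discriminative
    statistics (e.g., per-label thresholded test statistics).
    """
    top_idx = scores.topk(n).indices
    return features[:, top_idx]


class FusionSketch(nn.Module):
    """Toy stand-in for the multi-modal fusion stage: image, text, and
    clinical embeddings are projected to a shared width, mixed with
    multi-head self-attention, and mapped to multi-label logits."""

    def __init__(self, img_dim=256, txt_dim=128, clin_dim=32,
                 d_model=128, num_labels=14):
        super().__init__()
        self.proj_img = nn.Linear(img_dim, d_model)
        self.proj_txt = nn.Linear(txt_dim, d_model)
        self.proj_clin = nn.Linear(clin_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, num_labels)  # one logit per label

    def forward(self, img, txt, clin):
        # Stack the three modality embeddings as a length-3 token sequence.
        tokens = torch.stack(
            [self.proj_img(img), self.proj_txt(txt), self.proj_clin(clin)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)  # cross-modality mixing
        pooled = fused.mean(dim=1)                    # average over modalities
        return self.head(pooled)  # raw logits; apply sigmoid for probabilities


if __name__ == "__main__":
    batch = 4
    img_feats = torch.randn(batch, 512)
    scores = torch.rand(512)                          # placeholder univariate scores
    img_top = select_top_n(img_feats, scores, n=256)  # stage 1: feature pre-selection
    model = FusionSketch(img_dim=256)
    logits = model(img_top, torch.randn(batch, 128), torch.randn(batch, 32))
    probs = torch.sigmoid(logits)                     # independent per-label probabilities
    print(probs.shape)                                # torch.Size([4, 14])

Note that the sigmoid output treats each label independently; in the full framework, inter-label dependencies are instead modeled inside the attention stage, which is why the sketch attends across modalities before pooling.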