Abstract
Medical image classification requires models that capture both fine-grained local patterns and global anatomical structures while remaining computationally efficient for clinical deployment. Although state-of-the-art models such as MedMamba utilize State-Space Models (SSMs) to balance accuracy and efficiency, their sequential operations limit parallelism and increase runtime. To overcome these limitations, we propose MedSpectralNet, a lightweight Convolutional Neural Network (CNN) architecture that approximates self-attention with linear complexity to efficiently extract multi-frequency features. The model introduces a dual-stream feature extractor that processes global and local information in parallel, and a ContextGate block that adaptively fuses multi-scale representations. Evaluated on six MedMNIST benchmark datasets (BloodMNIST, BreastMNIST, DermaMNIST, PneumoniaMNIST, OrganCMNIST, and OrganSMNIST), MedSpectralNet achieves accuracies of 93.7% on OrganCMNIST and 98.0% on BloodMNIST, with relative accuracy gains of 1–4.3% over larger transformer-based models. Importantly, it delivers this performance with only 8.5 million parameters, roughly 60% of the parameter count of MedMamba-T (14.5 million). MedSpectralNet also attains AUC values of up to 0.999 across multiple classes, demonstrating state-of-the-art accuracy with substantially reduced computational cost and improved parallelization, making it well suited for real-time, resource-constrained medical image classification applications.
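To make the fusion idea concrete, the following is a minimal PyTorch-style sketch of a gated fusion in the spirit of the ContextGate block described above. It assumes two feature streams of equal channel width and spatial size; the layer choices, module name, and shapes are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch only: a plausible gated fusion of a global and a local
# feature stream, in the spirit of the ContextGate block described in the
# abstract. Layer choices and shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    """Adaptively fuses two same-shaped feature maps with a learned gate."""
    def __init__(self, channels: int):
        super().__init__()
        # Gate derived from the concatenated streams; sigmoid keeps it in (0, 1).
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, global_feat: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([global_feat, local_feat], dim=1))
        # Convex combination: g weights the global stream, (1 - g) the local one.
        return g * global_feat + (1 - g) * local_feat

# Tiny smoke test on random features (batch 2, 64 channels, 28x28 as in MedMNIST).
if __name__ == "__main__":
    fuse = ContextGate(64)
    x_g, x_l = torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28)
    print(fuse(x_g, x_l).shape)  # torch.Size([2, 64, 28, 28])
```

A learned convex combination of this kind keeps the fusion cheap (a single 1x1 convolution) while letting the network decide, per location, how much to rely on global context versus local detail.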