Abstract
Microscopic hyperspectral imaging (MHSI) of unstained tissue provides quantitative, label-free cues for pathology, but practical diagnosis is hindered by weak morphological contrast and high-dimensional spectra. Patch-wise classification is therefore unstable: discriminative spectral signatures are subtle, spatially sparse, and easily confounded by noise and tissue heterogeneity. To address this, we construct a new unstained breast MHSI dataset and formulate slice-level diagnosis as a multiple instance learning (MIL) problem. We propose a Multi-Scale Hierarchical Attention Network (MS-HAN) tailored to hyperspectral MIL. Each instance (patch) is encoded by an Inception-like multi-branch extractor that operates at a fixed spatial resolution using parallel convolution kernels to capture spectral-spatial patterns at different receptive fields. To reduce high intra-class spectral variability, we introduce a prototype-based clustering regularization that softly assigns instance embeddings to learnable centers and refines the representation. We then apply dual attention directly on the spatial feature map: channel (spectral) attention generates band-wise weights from global spatial descriptors, explicitly modeling inter-band dependencies, followed by spatial attention producing a 2D attention map to localize informative cellular regions. These modules are trained end-to-end with only slice-level labels. Finally, a hierarchical aggregator models inter-patch dependencies via self-attention and performs attention pooling to obtain the slice representation for classification. On a strictly patient-split cohort of 60 patients, MS-HAN achieved 86.7% accuracy and 0.92 AUC, outperforming strong MIL baselines (e.g., TransMIL and DS-MIL). McNemar's test demonstrated statistically significant improvements over ABMIL ([Formula: see text]) and DS-MIL ([Formula: see text]), with marginal significance against CLAM and TransMIL ([Formula: see text]). 
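As a reading aid for the McNemar comparison reported above, the following is a minimal sketch of an exact two-sided McNemar test on paired predictions from two classifiers evaluated on the same test slices. The function names and the toy labels are hypothetical and purely illustrative. They are not the study's data, and the abstract does not state whether the exact or the chi-square variant of the test was used.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count the paired cases where exactly one of the two models is correct."""
    # b: model A correct, model B wrong; c: model A wrong, model B correct
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value (binomial test on the discordant pairs)."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Under H0 the discordant pairs split as Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Toy labels only (assumption): 1 = malignant, 0 = benign
    y_true = [1, 1, 0, 0, 1, 0]
    pred_a = [1, 1, 0, 0, 1, 1]
    pred_b = [1, 0, 1, 0, 0, 1]
    b, c = discordant_counts(y_true, pred_a, pred_b)
    print(b, c, mcnemar_exact(b, c))
```

Only the discordant pairs enter the test, so with a small patient-split test set the achievable p-values are coarse. This is one reason such comparisons are usually read alongside accuracy and AUC.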
Ablation studies confirmed the necessity of both the prototype regularization and the hyperspectral-specific attention. Without any pixel-level annotations, attention visualizations highlighted regions consistent with tumor-related morphology and emphasized informative spectral ranges, although these explanations still await expert validation. These results suggest that hyperspectral-specific feature refinement combined with hierarchical MIL aggregation may improve the robustness of stain-free breast cancer detection from MHSI. Further multi-center validation and expert review of the attention explanations are needed to establish clinical utility.
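To make the attention-pooling step of the aggregator more concrete, the sketch below shows a generic attention-based MIL pooling in plain Python. It assumes the common form a_k = softmax_k(w^T tanh(V h_k)) with slice representation z = sum_k a_k h_k. This form is an assumption, since the abstract does not give the exact scoring function, and the sketch omits the preceding self-attention over patches. The names `attention_pool`, `V`, and `w` are hypothetical.

```python
import math


def attention_pool(instances, V, w):
    """Attention-weighted pooling of patch embeddings into one slice embedding.

    instances: list of K patch embeddings, each a list of D floats
    V:         L x D projection matrix (list of L rows), learned in practice
    w:         list of L floats, learned in practice
    Returns (slice_embedding, attention_weights).
    """
    # One unnormalized attention score per patch: w^T tanh(V h)
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * uj for wj, uj in zip(w, hidden)))

    # Numerically stable softmax over the K patches
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slice embedding is the attention-weighted sum of patch embeddings
    dim = len(instances[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    # Toy values only (assumption): 3 patches with 2-dimensional embeddings
    patches = [[0.1, 0.2], [2.0, -1.0], [0.0, 0.3]]
    V = [[1.0, 0.0], [0.0, 1.0]]
    w = [1.0, -1.0]
    pooled, weights = attention_pool(patches, V, w)
    print(pooled, weights)
```

The per-patch weights are the quantities that attention visualizations typically display: patches with larger weights contribute more to the slice-level prediction, which is how localization can emerge from slice-level labels alone.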