Abstract
Semi-supervised joint classification of hyperspectral images (HSIs) and LiDAR-derived digital surface models (DSMs) remains challenging because of the scarcity of labeled pixels, strong intra-class variability, and the heterogeneous nature of spectral and elevation features. In this work, we propose a Hybrid Mamba-Graph Fusion Network (HMGF-Net) with Multi-Stage Pseudo-Label Refinement (MS-PLR) for semi-supervised hyperspectral-LiDAR classification. The framework comprises a spectral-spatial HSI backbone that combines 3D and 2D convolutions, a compact LiDAR CNN encoder, Mamba-style state-space sequence blocks that model long-range spectral and cross-modal dependencies, and a graph fusion module that propagates information over a heterogeneous pixel graph. Semi-supervised learning is realized through a three-stage pseudo-labeling pipeline that progressively filters, smooths, and re-weights pseudo-labels according to prediction confidence, spatial-spectral consistency, and graph neighborhood agreement. We validate HMGF-Net on three benchmark hyperspectral-LiDAR datasets. Compared with eight state-of-the-art baselines, including 3D-CNNs, SSRN, HybridSN, transformer-based models such as SpectralFormer, multimodal CNN-GCN fusion networks, and recent semi-supervised methods, the proposed approach delivers consistent gains in overall accuracy, average accuracy, and Cohen's kappa, especially in low-label regimes (10% labeled pixels). The results indicate that combining sequence modeling and graph reasoning with carefully designed pseudo-label refinement is essential for exploiting abundant unlabeled samples in multimodal remote sensing scenarios.
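
As a rough illustration of the filter, smooth, and re-weight idea summarized above, the following toy sketch runs three stages over a small pixel graph. It is not the MS-PLR procedure itself: the abstract does not specify thresholds or update rules, so the function name `refine_pseudo_labels`, the confidence threshold of 0.9, the neighbor majority vote (used here as a simple stand-in for spatial-spectral consistency), and the agreement-fraction weight are all assumptions made for illustration.

```python
from collections import Counter


def refine_pseudo_labels(probs, neighbors, conf_threshold=0.9):
    """Toy three-stage pseudo-label refinement (illustrative assumptions only).

    probs:      per-pixel class-probability lists for unlabeled pixels.
    neighbors:  per-pixel lists of neighbor indices (a hypothetical pixel graph).
    Returns a dict mapping pixel index -> (pseudo-label, sample weight).
    """
    # Stage 1 (filter): keep only predictions whose top probability is high enough.
    labels = {}
    for i, p in enumerate(probs):
        best = max(range(len(p)), key=p.__getitem__)
        if p[best] >= conf_threshold:
            labels[i] = best

    # Stage 2 (smooth): majority vote over the pixel and its retained neighbors.
    # Votes use the stage-1 labels, so the update order does not matter.
    smoothed = {}
    for i, lab in labels.items():
        votes = Counter([lab] + [labels[j] for j in neighbors[i] if j in labels])
        top, count = votes.most_common(1)[0]
        # On a tie, the pixel keeps its own label.
        smoothed[i] = lab if votes[lab] == count else top

    # Stage 3 (re-weight): weight = fraction of retained neighbors that agree.
    refined = {}
    for i, lab in smoothed.items():
        kept = [smoothed[j] for j in neighbors[i] if j in smoothed]
        if kept:
            weight = sum(1 for other in kept if other == lab) / len(kept)
        else:
            weight = 0.5  # assumed neutral weight when no neighbor was retained
        refined[i] = (lab, weight)
    return refined


if __name__ == "__main__":
    # Three pixels in a chain; the middle one disagrees with both neighbors.
    probs = [[0.95, 0.05], [0.05, 0.95], [0.95, 0.05]]
    neighbors = [[1], [0, 2], [1]]
    print(refine_pseudo_labels(probs, neighbors))
```

In a training loop, the returned weights would typically scale each pseudo-labeled pixel's contribution to the unsupervised loss term, so that pixels whose graph neighbors disagree influence the model less.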