Abstract
In hyperspectral image (HSI) classification, feature learning and label accuracy play a crucial role. In actual hyperspectral scenes, however, noisy labels are unavoidable and seriously impact the performance of methods. While deep learning has achieved remarkable results in HSI classification tasks, its noise-resistant performance usually comes at the cost of feature representation capabilities. High-dimensional and deep convolution can capture rich deep semantic features, but with high complexity and resource consumption. To deal with these problems, we propose a CViT Weakly Supervised Network (CWSN) for HSI classification. Specifically, a lightweight 1D-2D two-branch network is used for local generalization and enhancement of spatial-spectral features. Then, the fusion and characterization of local and global features are achieved through the CNN-Vision Transformer (CViT) cascade strategy. The experimental results on four benchmark HSI datasets show that CWSN has good anti-noise ability and ensures the robustness and versatility of the network facing both clean and noisy training sets. Compared to other methods, the CWSN has better classification accuracy.