Abstract
Tobacco leaf diseases significantly reduce yield and quality, underscoring the need for rapid, non-destructive diagnostic tools. Although hyperspectral imaging (HSI) has been applied in tobacco pathology, most existing studies focus on single diseases and lack generalized, interpretable frameworks for multi-class identification. In this study, hyperspectral images of healthy leaves and of leaves with four major diseases (brown spot, wildfire, tobacco mosaic virus (TMV), and potato virus Y (PVY)) were collected to construct a balanced, leaf-independent dataset. Pixels were grouped by leaf ID, and the entire dataset was partitioned strictly at the leaf level to prevent pixel-level data leakage and to ensure generalization to unseen leaves. Multiple preprocessing techniques, wavelength-selection methods, and machine-learning classifiers were systematically compared. A compact artificial neural network (ANN) model combining Savitzky-Golay preprocessing with wavelength selection by the successive projections algorithm (SPA) achieved the best overall performance while requiring only a small number of informative wavelengths. A Transformer model provided slightly stronger predictive capacity but required full-spectrum inputs and incurred substantially higher computational cost. Pixel-level predictions enabled lesion-area-based severity estimation for the two leaf-spot diseases. SHapley Additive exPlanations (SHAP) analysis highlighted physiologically meaningful spectral regions associated with pigment absorption and structural variation. Overall, this study presents an efficient and interpretable HSI framework for multi-disease tobacco diagnosis, supporting the development of practical hyperspectral or multispectral systems.
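
To make the leaf-level partitioning and the lesion-area-based severity estimate concrete, the following is a minimal sketch in plain Python rather than the implementation used in this study. The function names `split_by_leaf` and `lesion_severity`, the test fraction, the label strings, and the toy leaf IDs are all hypothetical. Severity is assumed here to be the fraction of a leaf's pixels predicted as non-healthy, which is one plausible reading of "lesion-area-based"; the exact definition is not stated above.

```python
import random


def split_by_leaf(leaf_ids, test_fraction=0.3, seed=0):
    """Split pixel indices so that no leaf contributes pixels to both sets.

    leaf_ids: one leaf ID per pixel (hypothetical layout).
    Returns (train_idx, test_idx) as lists of pixel indices.
    """
    unique_leaves = sorted(set(leaf_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_leaves)
    # Hold out whole leaves, never individual pixels.
    n_test = max(1, round(len(unique_leaves) * test_fraction))
    test_leaves = set(unique_leaves[:n_test])
    train_idx = [i for i, leaf in enumerate(leaf_ids) if leaf not in test_leaves]
    test_idx = [i for i, leaf in enumerate(leaf_ids) if leaf in test_leaves]
    return train_idx, test_idx


def lesion_severity(pixel_labels, healthy_label="healthy"):
    """Assumed severity measure: share of leaf pixels predicted as lesion."""
    if not pixel_labels:
        raise ValueError("pixel_labels must not be empty")
    lesion_pixels = sum(1 for label in pixel_labels if label != healthy_label)
    return lesion_pixels / len(pixel_labels)


# Toy example: 14 pixels drawn from 4 leaves.
leaf_ids = ["leaf_a"] * 4 + ["leaf_b"] * 3 + ["leaf_c"] * 5 + ["leaf_d"] * 2
train_idx, test_idx = split_by_leaf(leaf_ids, test_fraction=0.25, seed=42)

train_leaves = {leaf_ids[i] for i in train_idx}
test_leaves = {leaf_ids[i] for i in test_idx}
overlap = train_leaves & test_leaves  # empty by construction

# Toy per-pixel predictions for one leaf: 2 of 4 pixels flagged as lesion.
severity = lesion_severity(["healthy", "brown_spot", "brown_spot", "healthy"])
```

Splitting on leaf IDs rather than on pixels matters because neighbouring pixels from the same leaf are highly correlated, so a pixel-level random split would let near-duplicates of test pixels appear in training and inflate the reported accuracy.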