Abstract
Accurately modeling the nonlinear relationships between near-infrared (NIR) spectral signatures and biochemical traits in corn remains a major challenge. A key difficulty lies in capturing multi-scale contextual dependencies, ranging from local absorption peaks to global spectral patterns, that jointly determine quality constituents such as protein and oil. To address this, we propose SpecTran, a spectral Transformer network designed specifically for NIR regression. SpecTran integrates three key components: adaptive multi-scale patch embedding, which extracts spectral features at multiple resolutions to capture both fine and coarse patterns; spectral-enhanced positional encoding, which preserves wavelength-order information more effectively than standard positional encoding; and hierarchical feature fusion for robust multi-task prediction. Evaluated on the public Eigenvector corn dataset, SpecTran achieved an average R² of 0.483 across four key traits (moisture, starch, oil, and protein). Compared with the best-performing baseline, the standard Transformer model, it reduced the RMSE by 11.2% for protein and 10.7% for oil. These results demonstrate SpecTran's superior ability to model complex spectral dynamics while providing interpretable insights, offering a reliable framework for NIR-based agricultural quality assessment.