Abstract
Complex forest structures, interspecies similarities, and intraspecies variations constrain the acquisition of species-specific tree phenotypes. This study develops a scalable framework for extracting species-specific structural parameters at the individual tree level. Leveraging ultrahigh-resolution UAV-based RGB and LiDAR data, we propose a novel self-attention-guided spectral-structural multimodal fusion transformer (SAMFormer). Its key components are (1) an adaptive feature enhancement module (AFEM) that employs spatial and channel attention to selectively highlight canopy features while suppressing background noise, and (2) a cross-modal fusion module (CMFM) that captures intra- and inter-modal dependencies through a cross-attention mechanism, generating highly discriminative representations. SAMFormer achieves fine-grained tree identification in complex forest environments, mitigating blurred canopy segmentation and species misclassification. K-fold cross-validation demonstrates robust performance across diverse scenes, with an F1-score of 86.3% and an mAP@0.5 of 88.0%, significantly outperforming single-modal inputs and mainstream instance segmentation models. We generate large-scale species-specific maps of tree structural parameters from SAMFormer outputs, allometric equations, and a sliding window strategy, and then use these parameters to map carbon stock. Ecological analysis reveals that tree competition is coupled with both structural parameters and carbon stock: competition intensity exhibits a significant negative correlation with each (p < 0.001). Trees adapt by adjusting growth strategies (e.g., reducing radial growth and limiting canopy expansion), ultimately lowering biomass accumulation and carbon stock. Additionally, species mixing enhances carbon stock, as mixed forests store more carbon than monocultures.
This work provides a high-throughput, non-destructive pathway for forest phenotyping, supporting precision forestry and climate-adaptive management practices.
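
As a rough illustration of the cross-attention idea behind the CMFM, the sketch below lets tokens from one modality (RGB) attend to tokens from another (LiDAR) using generic scaled dot-product attention. It is a minimal, single-head sketch under stated assumptions, not SAMFormer's actual implementation: the token counts, feature width, random projection matrices, residual fusion step, and all names (`cross_attention`, `rgb_tokens`, `lidar_tokens`) are hypothetical and chosen only for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (single head).

    Queries come from one modality; keys and values come from the other.
    Returns the attended features and the attention weights.
    """
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)  # scores[i][j]: query token i vs. context token j
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


# Hypothetical sizes: 6 RGB tokens, 4 LiDAR tokens, 8 features per token.
rng = random.Random(0)


def random_matrix(rows, cols, scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rgb_tokens = random_matrix(6, 8)
lidar_tokens = random_matrix(4, 8)
w_q, w_k, w_v = (random_matrix(8, 8, 0.1) for _ in range(3))

# RGB tokens query the LiDAR tokens; a residual sum is one simple
# (assumed) way to fuse the attended structure features back into RGB.
attended, weights = cross_attention(rgb_tokens, lidar_tokens, w_q, w_k, w_v)
fused = [[a + b for a, b in zip(r, att)] for r, att in zip(rgb_tokens, attended)]
```

Each row of `weights` sums to one, so every RGB token receives a convex combination of projected LiDAR features. Swapping the roles of the two inputs gives the LiDAR-to-RGB direction, and feeding the same tokens as both query and context reduces the function to self-attention, which is one generic way to model intra-modal dependencies.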