Abstract
Pre-harvest defoliation of cotton is a key agronomic practice for improving mechanical harvesting efficiency and raw cotton purity. Collecting defoliation trait data for genetic mapping, and thereby breeding varieties that defoliate readily, is an important alternative to conventional defoliant spraying. However, manual field surveys are hampered by low throughput and human error. In this study, we established a framework for high-throughput collection of defoliation data in large fields. Three spectral indices (MTCI, VDVI, CI) and leaf area index (LAI) were first screened as core predictors through hierarchical segmentation analysis at three levels: leaf number (LN), leaf number difference (LND), and defoliation rate (DR). Four deep learning architectures (CNN, BiGRU, CNN-BiGRU, and CNN-BiGRU-Attention) were then developed, and the CNN-BiGRU-Attention hybrid model performed best at all three levels, with R² values exceeding 0.85. Notably, the model's inversion accuracy at the LN and LND levels was higher than at the DR level, a pattern also supported by the results of the genome-wide association study (GWAS). By combining GWAS and transcriptome results, we identified a new gene, GhDR_UAV1, associated with defoliation traits. Overexpression of GhDR_UAV1 significantly promoted the wilting of cotton leaves, indicating that GhDR_UAV1 positively regulates cotton defoliation. This study proposes a strategy for inverting cotton defoliation data at three levels through deep learning fusion of UAV remote sensing data and LAI data, and confirms that LND can provide accurate phenotypic data for GWAS. By integrating high-throughput defoliation phenomics with genomics, it provides a new theoretical basis for cotton defoliation regulation and genetic improvement.