Abstract
We propose a lightweight Multi-Head Self-Attention (MHSA) mechanism for plant phenotypic feature extraction that integrates cross-species transfer learning with dynamic head pruning to improve efficiency without sacrificing accuracy. The central challenge is to reduce redundant computation while preserving the model's ability to generalize across diverse plant species, a problem aggravated by the high dimensionality of attention mechanisms in Vision Transformers. Our solution, the Transferable Attention Head Alignment (TAHA) framework, operates in three stages: pre-training on a source species, cross-species alignment via a Domain Alignment Loss (DAL), and head pruning based on a transferability score. The framework retains only the attention heads with the highest transferability, reducing model complexity while preserving the capacity to discriminate phenotypic traits. The pruned MHSA module integrates seamlessly with standard Transformer backbones, enabling efficient deployment on edge devices. Experiments on real edge hardware (Raspberry Pi 4, NVIDIA Jetson Nano) and GPU platforms show that our approach achieves accuracy comparable to full-head models while cutting computational cost by up to 40% (14.1 ms inference latency on Raspberry Pi 4, 519 M parameters). The method is particularly relevant for scalable plant phenotyping, where computational resources are often constrained yet generalization across species is essential. Moreover, the iterative alignment-and-pruning procedure allows gradual adaptation to new species without full retraining, making the approach more practical for real-world agricultural deployment. Supplementary experiments on phylogenetically distant species (Arabidopsis → pine) probe the framework's generalization limits, showing a 7.2% F1-score drop relative to transfer between closely related species (Arabidopsis → maize) and highlighting the need for trait-specific head adaptation in distant transfers. Overall, the proposed method advances lightweight feature extraction by combining transfer learning with attention head optimization, striking a favorable balance between performance and efficiency.
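To make the head-selection step in the third stage concrete, the following is a minimal sketch of transferability-score-based pruning applied to a fused QKV projection. It assumes per-head transferability scores have already been computed (e.g., from the DAL); all function and variable names (select_heads, prune_qkv_projection, keep_ratio) are illustrative assumptions, not identifiers from the TAHA framework itself.

```python
# Hypothetical sketch of transferability-score-based head pruning (Stage 3 of TAHA).
# Names and the keep_ratio value are illustrative assumptions, not from the paper.
import torch


def select_heads(transferability_scores: torch.Tensor, keep_ratio: float = 0.6) -> torch.Tensor:
    """Return indices of the attention heads with the highest transferability.

    transferability_scores: shape (num_heads,), one score per head.
    keep_ratio: fraction of heads to retain (0.6 keeps 60% of heads, roughly
    consistent with an up-to-40% computation reduction).
    """
    num_heads = transferability_scores.numel()
    num_keep = max(1, int(round(keep_ratio * num_heads)))
    # Keep the top-scoring heads; the remainder are pruned from the MHSA module.
    keep_idx = torch.topk(transferability_scores, k=num_keep).indices
    return torch.sort(keep_idx).values


def prune_qkv_projection(qkv_weight: torch.Tensor, keep_idx: torch.Tensor,
                         num_heads: int) -> torch.Tensor:
    """Slice a fused QKV weight of shape (3*dim, dim) down to the kept heads."""
    dim = qkv_weight.shape[1]
    head_dim = dim // num_heads
    # Reshape to (3, num_heads, head_dim, dim), select the kept heads, flatten back.
    w = qkv_weight.view(3, num_heads, head_dim, dim)
    return w[:, keep_idx].reshape(3 * keep_idx.numel() * head_dim, dim)


if __name__ == "__main__":
    torch.manual_seed(0)
    scores = torch.rand(12)                 # e.g. 12 heads in one ViT-Base block
    kept = select_heads(scores, keep_ratio=0.6)
    qkv = torch.randn(3 * 768, 768)         # fused QKV weight of one block
    pruned_qkv = prune_qkv_projection(qkv, kept, num_heads=12)
    print(f"kept heads: {kept.tolist()}, pruned QKV shape: {tuple(pruned_qkv.shape)}")
```

In this sketch, pruning is a static slice of the projection weights, so the reduced module drops into a standard Transformer block with a smaller head count; the iterative alignment-and-pruning loop described above would simply recompute the scores and reapply the selection for each new target species.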