Abstract
Inspection of power transmission lines with unmanned aerial vehicles relies primarily on object detection. However, new types of obstacles keep emerging, so detection models must be updated frequently, which incurs substantial retraining costs. To address this challenge, we propose IViT, a framework that integrates incremental learning with a hybrid CNN-Transformer architecture to improve obstacle detection. We combine knowledge distillation with an elastic response selection distillation strategy to preserve detection performance on previously learned classes. We further strengthen knowledge retention with star convolutional residual blocks built on element-wise multiplication. We also design a separable convolution aggregation block that integrates PConv with an attention mechanism, merging global and local information to improve detection accuracy. Finally, we combine the two modules into a single hybrid block. On the static detection task, IViT achieves a mAP of 55.3%, a mAP(50) of 83.6%, and a mAP(75) of 61.0%. On the incremental detection task, it achieves a mAP of 57.8%, a mAP(50) of 79.7%, and a mAP(75) of 62.3%. Extensive experiments on the transmission line corridor external damage dataset and the INSPLAD dataset show that IViT outperforms mainstream static and incremental object detection models.