Two-Stage Marker Detection-Localization Network for Bridge-Erecting Machine Hoisting Alignment

Abstract

Hoisting alignment for bridge-erecting machines demands high precision under interference from the construction environment, such as lighting variations, occlusion, and marker contamination. This paper presents a two-stage marker detection-localization network tailored to this task. The network follows a "coarse detection, fine estimation" framework. The first stage is a lightweight detection module that integrates a dynamic hybrid backbone (DHB) with a dynamic switching mechanism to filter background noise efficiently and generate coarse localization boxes for marker regions. Specifically, the DHB switches dynamically between convolutional and Transformer branches according to feature complexity, using depthwise separable convolutions from MobileNetV3 for low-level geometric features and lightweight Transformer blocks for high-level semantic features. The second stage is a Transformer-based homography estimation module that uses multi-head self-attention to capture long-range dependencies between marker keypoints and the scene context. By integrating enhanced multi-scale feature interaction with position encoding that combines absolute position and marker geometric priors, this module learns precise homography matrices between markers and hoisting equipment end to end from the coarse localization boxes. To address data scarcity in construction scenes, a multi-dimensional data augmentation strategy is developed, comprising random homography transformation (simulating viewpoint changes), photometric augmentation (adjusting brightness, saturation, and contrast), and background blending with bounding box extraction. Experiments on a real bridge-erecting machine dataset show that the network achieves a detection accuracy (mAP) of 97.8%, a homography estimation reprojection error below 1.2 mm, and a processing rate of 32 FPS. Compared with traditional single-stage CNN-based methods, it significantly improves alignment precision and robustness in complex environments, offering reliable technical support for the precise control of automated hoisting in bridge-erecting machines.
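
The geometric part of the augmentation strategy and the reprojection-error metric can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the corner-jitter sampling scheme, the `max_shift` parameter, and all function names are assumptions made for illustration, and the error is measured in the same units as the input points (the abstract does not state how the reported millimetre values are derived). Photometric augmentation and background blending are omitted.

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct linear transform: solve for H with dst ~ H @ src from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def random_homography(size, max_shift=0.15, rng=None):
    """Sample a homography by jittering the corners of a size x size marker.

    max_shift is a hypothetical knob: the maximum corner displacement as a
    fraction of the marker size, standing in for a viewpoint change.
    """
    rng = np.random.default_rng() if rng is None else rng
    src = np.array([[0, 0], [size, 0], [size, size], [0, size]], dtype=float)
    dst = src + rng.uniform(-max_shift, max_shift, src.shape) * size
    return dlt_homography(src, dst), src, dst

def project(h, pts):
    """Apply homography h to an (N, 2) array of points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

def bounding_box(pts):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the warped marker."""
    (x0, y0), (x1, y1) = pts.min(axis=0), pts.max(axis=0)
    return float(x0), float(y0), float(x1), float(y1)

def reprojection_error(h_est, h_true, pts):
    """Mean Euclidean distance between points projected by the two homographies."""
    diff = project(h_est, pts) - project(h_true, pts)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

# Usage: one augmented sample with its box label and ground-truth homography.
h_true, corners, warped = random_homography(100.0, rng=np.random.default_rng(0))
box = bounding_box(warped)
```

In a pipeline of this kind, the warped marker would be pasted onto a background image, `box` would serve as the label for the detection stage, and `h_true` as the regression target for the homography stage.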
