Abstract
Fixed roadside monocular cameras are widely used as low-cost sensing devices in intelligent transportation systems; however, extracting reliable three-dimensional (3D) information from such sensors remains challenging due to limited baselines, long observation distances, and moving vehicles. This paper presents a traffic-oriented 3D vehicle reconstruction framework based on monocular image sequences captured by fixed roadside cameras. Semantic and non-semantic vehicle feature points are exploited jointly to balance structural consistency against surface completeness, and a feature-map consistency optimization strategy is introduced to refine feature-point localization and reduce reprojection error. In addition, an optimized incremental Structure-from-Motion (SfM) pipeline incorporating traffic-aware initialization, keyframe selection, and local bundle adjustment is developed to improve reconstruction efficiency. Experiments on real-world traffic surveillance videos show that the proposed method reduces the mean reprojection error by 13.6% and shortens reconstruction time by 43.9% compared with widely used incremental SfM systems.
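For context, the mean reprojection error used as the headline metric is assumed here to follow the standard SfM definition (the paper does not restate it in the abstract):

```latex
\[
e_{\mathrm{reproj}} \;=\; \frac{1}{N}\sum_{i=1}^{N}
\left\lVert \mathbf{x}_i \;-\; \pi\!\bigl(\mathbf{K}\,[\mathbf{R}\mid\mathbf{t}]\,\mathbf{X}_i\bigr) \right\rVert_2 ,
\]
```

where \(\mathbf{x}_i\) is an observed 2D feature point, \(\mathbf{X}_i\) its reconstructed 3D point in homogeneous coordinates, \(\mathbf{K}\) the camera intrinsic matrix, \([\mathbf{R}\mid\mathbf{t}]\) the estimated camera pose, \(\pi(\cdot)\) the perspective division onto the image plane, and \(N\) the number of observations. A 13.6% reduction in this quantity means the reconstructed points reproject, on average, noticeably closer to their detected image locations.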