Abstract
Object detectors deployed in closed-circuit television (CCTV) systems suffer substantial performance degradation because of the domain gap between their training datasets and real-world environments. At the same time, growing privacy concerns and stricter personal data regulations limit the reuse and distribution of source-domain data, which motivates source-free learning. To address both challenges, we propose a stable and effective source-free semi-supervised domain adaptation framework based on the Mean Teacher paradigm. The method integrates three key components: (1) pseudo-label fusion, which combines predictions from weakly and strongly augmented views to generate more reliable pseudo-labels; (2) static adversarial regularization (SAR), which replaces dynamic discriminator optimization with a frozen adversarial head to provide a stable domain-invariance constraint; and (3) a time-varying exponential weighting strategy that balances the contributions of labeled and unlabeled target data throughout training. We evaluate the method on four benchmarks: Cityscapes, Foggy Cityscapes, Sim10k, and a real-world CCTV dataset. The experimental results show that the proposed method improves mAP@0.5 by an average of 7.2% over existing methods and achieves a 6.8% gain in a low-label setting with only 2% labeled target data. Under challenging shifts such as clear-to-foggy adaptation and synthetic-to-real transfer, it yields an average improvement of 5.4%, indicating its effectiveness and practical relevance for real-world CCTV object detection under privacy constraints.
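As a rough illustration of two of the ingredients named above, the following sketch shows a Mean Teacher style exponential moving average (EMA) update and a time-varying exponential weight on the unlabeled loss. The abstract does not give the exact schedule, so the ramp-up form exp(-k(1 - t/T)^2), the function names, and the default values (w_max, sharpness, alpha) are all assumptions chosen for illustration, not the settings used in this work.

```python
import math

def unlabeled_weight(step, total_steps, w_max=1.0, sharpness=5.0):
    """Assumed exponential ramp-up: small weight early, w_max at the end."""
    if total_steps <= 0:
        return w_max
    # Clamp training progress to [0, 1].
    phase = min(max(step / total_steps, 0.0), 1.0)
    return w_max * math.exp(-sharpness * (1.0 - phase) ** 2)

def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher update: teacher <- alpha * teacher + (1 - alpha) * student.

    Parameters are plain lists of floats here to keep the sketch dependency-free.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

def combined_loss(labeled_loss, unlabeled_loss, step, total_steps):
    """Labeled loss plus the time-weighted unlabeled (pseudo-label) loss."""
    return labeled_loss + unlabeled_weight(step, total_steps) * unlabeled_loss
```

Under this assumed schedule the pseudo-label term contributes very little at the start of training, when the teacher is least reliable, and reaches its full weight at the final step.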