Abstract
Annotating Rotated Bounding Boxes (RBBs) for oriented object detection is labor-intensive and time-consuming. Single-point supervision offers a cost-effective alternative, but a single point provides insufficient size and orientation information, so existing methods rely heavily on complex priors and fixed refinement stages. In this paper, we propose DP2PNet (Diffusion-Point-to-Polygon Network), the first diffusion model-based framework for single-point supervised oriented object detection. DP2PNet features three key innovations: (1) a multi-scale consistent noise generator that replaces manual or external-model priors with Gaussian noise, reducing dependency on domain-specific information; (2) a Noise Cross-Constraint module, based on multiple instance learning, that selects optimal noise point bags by fusing receptive field matching and object coverage; and (3) a Semantic Key Point Aggregator that aggregates noise points via graph convolution into semantic key points, from which pseudo-RBBs are generated using convex hulls. DP2PNet also supports dynamic adjustment of refinement stages without retraining, enabling flexible accuracy optimization. Extensive experiments on the DOTA-v1.0 and DIOR-R datasets show that DP2PNet achieves 53.82% and 53.61% mAP50, respectively, comparable to methods that rely on complex priors. It also exhibits strong noise robustness and cross-dataset generalization.
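The abstract states that pseudo-RBBs are generated from semantic key points using convex hulls, but does not say how a hull becomes a rotated box. The sketch below assumes one common choice, the minimum-area enclosing rectangle of the hull, and should not be read as DP2PNet's actual implementation. It relies on a classic result: a minimum-area enclosing rectangle has one side collinear with an edge of the convex hull, so it suffices to try every hull edge as an orientation. The function names `convex_hull` and `min_area_rbb` and the `(cx, cy, w, h, theta)` box parameterization are hypothetical; the key points are simply taken as input rather than produced by a graph convolution.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # z-component of (a - o) x (b - o); > 0 means a left (CCW) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def min_area_rbb(points: List[Point]) -> Tuple[float, float, float, float, float]:
    """Hypothetical pseudo-RBB: minimum-area rectangle enclosing the key points.

    Returns (cx, cy, w, h, theta), with theta in radians measured from the x-axis.
    Brute force over hull edges, O(n^2) in the number of hull vertices.
    """
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least 3 non-collinear points")
    best = None
    n = len(hull)
    for i in range(n):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate hull by -theta so the current edge is axis-aligned.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        umin, umax, vmin, vmax = min(us), max(us), min(vs), max(vs)
        area = (umax - umin) * (vmax - vmin)
        if best is None or area < best[0]:
            cu, cv = (umin + umax) / 2, (vmin + vmax) / 2
            # Rotate the box center back by +theta into image coordinates.
            cx, cy = cu * c - cv * s, cu * s + cv * c
            best = (area, (cx, cy, umax - umin, vmax - vmin, theta))
    return best[1]
```

Interior key points do not affect the result because only hull vertices are used. The returned angle is ambiguous up to multiples of pi/2 (with width and height swapped accordingly), so any real pipeline would need to fix a convention such as the one used by its dataset's RBB format.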