SOP: Selective Orthogonal Projection for Composed Image Retrieval


Abstract

The proliferation of intelligent sensor networks in urban surveillance and remote sensing has triggered explosive growth of unstructured visual sensor data. Accurately retrieving targets from these massive streams based on complex cross-modal user intents remains a critical bottleneck for efficient intelligent perception. Composed Image Retrieval (CIR) addresses this by enabling retrieval via a multi-modal query that combines a reference image with semantic control signals. However, existing methods often struggle with the abstract instructions found in real-world scenarios. Consequently, models suffer from feature distribution shifts caused by focus ambiguity, as well as semantic erosion caused by highly entangled visual and textual features. To address these challenges, we propose a geometry-based Selective Orthogonal Projection Network (SOP). First, the Selective Focus Recovery module quantifies instruction uncertainty via information entropy and calibrates shifted query features to the true target distribution using structural consistency regularization. Second, to ensure data fidelity, we introduce Orthogonal Subspace Projection and Geometric Composition Fidelity. These mechanisms employ Gram-Schmidt orthogonalization to decouple features into a constant visual base and an orthogonal modification increment, restricting semantic modifications to the null space. Extensive experiments on the FashionIQ, Shoes, and CIRR datasets demonstrate that SOP significantly outperforms SOTA methods, offering a novel solution for efficient large-scale sensor data retrieval and analysis.
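The core geometric idea — splitting the text-driven modification into a component along the reference image feature and a component orthogonal to it — can be sketched with a single Gram–Schmidt step. This is an illustrative reconstruction from the abstract alone, not the paper's implementation; the function name and toy 4-D vectors are our own, and real CIR features would be high-dimensional embeddings (e.g. from CLIP).

```python
import numpy as np

def orthogonal_decompose(visual_base, text_mod):
    """One Gram-Schmidt step: split a textual modification vector into
    components parallel and orthogonal to the reference visual feature.
    (Hypothetical helper; names are not from the paper.)"""
    v = visual_base / np.linalg.norm(visual_base)  # unit visual direction
    parallel = np.dot(text_mod, v) * v             # component along the visual base
    orthogonal = text_mod - parallel               # modification increment, orthogonal to the base
    return parallel, orthogonal

# Toy 4-D features for illustration only
img = np.array([1.0, 0.0, 0.0, 0.0])
txt = np.array([0.5, 0.3, -0.2, 0.1])
par, orth = orthogonal_decompose(img, txt)
# The increment lies in the subspace orthogonal to the visual base,
# so the semantic edit cannot erode the preserved visual component:
assert abs(float(np.dot(orth, img))) < 1e-9
```

Under this reading, "restricting semantic modifications to the null space" means the composed query keeps the visual base untouched and applies only the orthogonal increment, so visual identity and textual edits occupy disjoint directions of the feature space.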
