Abstract
Bloom filters (BFs) are space-efficient probabilistic data structures widely used for set membership queries, but they produce false positives, and the false positive rate rises as more elements are stored in a filter of fixed size. This study proposes a novel method to reduce false positives in BFs for image retrieval by incorporating content-based check bits derived from image captions and image intensity. These check bits enable more compact BFs without compromising performance. Experiments on a dataset of 31,783 images from Flickr demonstrate that our method significantly reduces the false positive rate, even with smaller filter sizes. By leveraging auxiliary data, our technique improves BF efficiency in terms of memory usage and processing cost, making it more practical for large-scale image retrieval, database management, and network security applications. This work highlights the value of integrating additional data sources into BFs to achieve high efficiency with minimal computational overhead.
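To make the idea of check bits concrete, the following is a minimal sketch of one possible reading, not the construction evaluated in this work. The abstract does not specify how check bits are stored or consulted, so everything here is an assumption made for illustration: the class name `CheckedBloomFilter`, the slot layout (each filter position holds a small mask rather than a single bit), the salted SHA-256 hashing, and the derivation of the check value from a caption string and a single mean-intensity byte.

```python
import hashlib


def _hash(data: bytes, salt: int, mod: int) -> int:
    """Deterministic salted hash reduced to the range [0, mod)."""
    digest = hashlib.sha256(salt.to_bytes(4, "big") + data).digest()
    return int.from_bytes(digest[:8], "big") % mod


class CheckedBloomFilter:
    """Bloom filter whose slots carry a small check mask (hypothetical layout)."""

    def __init__(self, num_slots: int, num_hashes: int, check_width: int = 8):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.check_width = check_width
        # 0 means "never set"; otherwise a bitmask of the check values seen here.
        self.slots = [0] * num_slots

    def _positions(self, key: str) -> list:
        data = key.encode("utf-8")
        return [_hash(data, i, self.num_slots) for i in range(self.num_hashes)]

    def _check_bit(self, caption: str, mean_intensity: int) -> int:
        # Assumption: the check value depends on the caption text and one
        # intensity byte; the source does not describe the actual derivation.
        aux = caption.encode("utf-8") + bytes([mean_intensity & 0xFF])
        return 1 << _hash(aux, 0xC0DE, self.check_width)

    def add(self, key: str, caption: str, mean_intensity: int) -> None:
        bit = self._check_bit(caption, mean_intensity)
        for pos in self._positions(key):
            self.slots[pos] |= bit

    def might_contain(self, key: str, caption: str, mean_intensity: int) -> bool:
        # Positive only if every slot was set AND recorded this check value.
        bit = self._check_bit(caption, mean_intensity)
        return all(self.slots[pos] & bit for pos in self._positions(key))

    def plain_might_contain(self, key: str) -> bool:
        # Ordinary Bloom filter test that ignores the check masks.
        return all(self.slots[pos] != 0 for pos in self._positions(key))
```

In this sketch a checked query can only succeed where the plain query also succeeds, so the check masks can remove false positives but never add them, and inserted items are always found. The sketch does not reproduce the memory accounting claimed above, since each slot here occupies `check_width` bits rather than one.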