Abstract
Information retrieval is a critical methodology for accurately and efficiently obtaining required information from massive amounts of data. In this paper, we propose an information retrieval framework (SE-MSLC) that uses information theory to improve the effectiveness of inverted index retrieval, thereby achieving higher-quality retrieval results in intelligent vertical domain search engines. First, we propose a semantic entropy-driven keyword importance analysis method (SE-KIA) in the query understanding module. This method combines search query logs, the search engine's corpus, and the theory of semantic entropy, enabling the search engine to dynamically adjust the weights of query keywords and thus better recognize user intent. Then, we propose a hybrid recall strategy that combines a multi-stage strategy with a logical combination strategy (HRS-MSLC) in the recall module. It performs recall separately for the keywords obtained from multi-granularity word segmentation of the query, in the form of multi-queue recall, while considering both the "AND" and "OR" logical relationships between the keywords. By systematically managing retrieval uncertainty and prioritizing keywords with high information content, it achieves a better balance between the quantity of the retrieval results and their relevance to the query. Finally, we evaluate our methods experimentally using Hit Rate@K and case analysis. Our results demonstrate that the proposed method improves Hit Rate@1 by 7.3% and Hit Rate@3 by 6.6% while effectively resolving the bad cases in our vertical domain search engine.
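For reference, Hit Rate@K is commonly computed as the fraction of queries for which at least one relevant result appears among the top K returned results. The following is a minimal sketch under that common definition; the function name `hit_rate_at_k`, the dictionary-based data layout, and the toy data are illustrative assumptions, not the evaluation code or data used in this paper.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one relevant document in the top k.

    ranked:   query id -> ranked list of returned document ids (assumed layout)
    relevant: query id -> set of document ids judged relevant (assumed layout)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Only queries that have at least one judged-relevant document are scored.
    queries = [q for q, docs in relevant.items() if docs]
    if not queries:
        return 0.0
    hits = sum(
        1
        for q in queries
        if any(doc in relevant[q] for doc in ranked.get(q, [])[:k])
    )
    return hits / len(queries)


# Hypothetical toy data: three queries, each with one relevant document.
ranked = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d2", "d9", "d4"],
    "q3": ["d8", "d5", "d6"],
}
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d0"}}

hr1 = hit_rate_at_k(ranked, relevant, k=1)  # only q2 hits at rank 1
hr3 = hit_rate_at_k(ranked, relevant, k=3)  # q1 and q2 hit within the top 3
```

Comparing the metric at K=1 and K=3 in this way separates how often the best-ranked result is correct from how often a correct result is merely near the top.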