Abstract
Hot data identification plays a critical role in a wide range of computing systems, including memory hierarchy management, database optimization, and large-scale storage infrastructures, and its importance has grown further with the emergence of non-volatile memory (NVM) technologies. However, many existing schemes fail to identify hot and cold data accurately, or do so only at the cost of excessive computational overhead and large memory requirements. The main cause is that their data structures do not effectively record both recency and frequency information. To date, two representative data structures have been widely adopted: bit-array counters and multiple Bloom filters. Bit-array counters effectively capture access count (i.e., frequency) information but ignore recency. Multiple Bloom filters were proposed to record both frequency and recency information and have therefore been widely employed. Nevertheless, many hot data identification schemes based on multiple Bloom filters still suffer from low accuracy due to fundamental limitations of the underlying data structure. To overcome these inherent limitations, this paper proposes Tierra, a novel hot data identification scheme built on an entirely new data structure that employs asymmetric multilevel arrays. These asymmetric arrays improve performance by reducing internal data movement by 3.1×. In addition, Tierra incorporates a recency-aware request screening mechanism based on an enhanced stack distance approximation algorithm, which substantially reduces computational overhead while improving identification accuracy. Comprehensive evaluations using diverse real-world workloads demonstrate that Tierra achieves high accuracy, with an average true identification rate of 99.4%.
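To make the frequency/recency distinction concrete, the following is a minimal sketch of the generic multiple-Bloom-filter approach referred to above. It is not Tierra and not any specific published scheme. The class name, the number of filters, the filter size, the salted SHA-256 hashing, the round-robin decay interval, the linear recency weights, and the hot threshold are all illustrative assumptions. Frequency is approximated by how many filters contain a key. Recency comes from periodically clearing the oldest filter and giving hits in newer filters a larger weight.

```python
import hashlib


class MultiBloomHotDataFilter:
    """Hypothetical, simplified multiple-Bloom-filter hot data identifier.

    - Frequency: a repeated access is recorded in the newest filter that does
      not yet contain the key, so the number of filters holding a key grows
      with its access count (capped at num_filters).
    - Recency: filters are cleared round-robin every decay_interval requests,
      and a hit in a more recently activated filter earns a larger weight.
    """

    def __init__(self, num_filters=4, bits_per_filter=1024, num_hashes=2,
                 decay_interval=512, hot_threshold=2.0):
        self.n = num_filters
        self.m = bits_per_filter
        self.k = num_hashes
        self.decay_interval = decay_interval
        self.hot_threshold = hot_threshold
        self.filters = [bytearray(bits_per_filter) for _ in range(num_filters)]
        self.current = 0   # index of the newest (most recently cleared) filter
        self.requests = 0

    def _positions(self, key):
        # k bit positions derived from salted SHA-256 digests (assumed hashing).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def _by_age(self):
        # Filter indices ordered from newest (age 0) to oldest (age n-1).
        return [(self.current - age) % self.n for age in range(self.n)]

    def _contains(self, idx, key):
        return all(self.filters[idx][p] for p in self._positions(key))

    def record(self, key):
        """Record one write access to key, decaying every decay_interval requests."""
        for idx in self._by_age():
            if not self._contains(idx, key):
                for p in self._positions(key):
                    self.filters[idx][p] = 1
                break
        self.requests += 1
        if self.requests % self.decay_interval == 0:
            self.decay()

    def decay(self):
        """Clear the oldest filter and make it the newest one."""
        self.current = (self.current + 1) % self.n
        self.filters[self.current] = bytearray(self.m)

    def score(self, key):
        # Weight n/n for a hit in the newest filter down to 1/n for the oldest.
        return sum((self.n - age) / self.n
                   for age, idx in enumerate(self._by_age())
                   if self._contains(idx, key))

    def is_hot(self, key):
        return self.score(key) >= self.hot_threshold
```

For example, recording key "A" three times and key "B" once gives "A" a score of 1.0 + 0.75 + 0.5 = 2.25, so it is classified as hot under the assumed threshold of 2.0, while "B" scores 1.0. After four calls to `decay()`, every filter has been cleared and "A" scores 0. A plain bit-array counter would instead keep the count for "A" until it was explicitly reset, with no notion of when the accesses happened.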