Abstract
Collaborative filtering and neural recommendation models have advanced substantially over the past decade, yet they frequently struggle to capture the fine-grained structure of user preferences and the rich, multi-relational dependencies among items. Knowledge graph-based methods provide a principled means of representing such structured information; however, systematic mechanisms for combining structured reasoning with heterogeneous recommendation signals remain underexplored. In this work, we introduce KGERA (Knowledge Graph Enhanced Reasoning Architecture), a recommendation framework that augments a multi-relational knowledge graph with an explicit reasoning module to enhance ranking quality through test-time inference. We construct a unified knowledge graph encoding diverse item relationships, including genre co-membership, creator-level associations, and data-driven similarity links, and apply structured reasoning on this graph to compute contextualized recommendation scores. Unlike approaches that require expensive large language model inference, KGERA performs lightweight, interpretable reasoning exclusively at inference time using embedding-based computations, allowing the system to adapt dynamically to evolving user preference signals without retraining. The reasoning component is coupled with nine complementary recommendation strategies through a learned ensemble whose components include collaborative filtering (ItemKNN), graph neural networks (LightGCN), self-supervised graph learning (SGL, SimGCL), neural collaborative filtering (NCF), knowledge graph embeddings (TransE), content-based filtering (TextProxy), and popularity-based signals (PopRec), combined into a single predictive model. Experiments on MovieLens-1M demonstrate that KGERA significantly outperforms strong baselines, including recent self-supervised methods from SIGIR 2021–2022, achieving improvements of 52.83% in NDCG@10, 50.24% in Recall@10, and 54.62% in MRR@10 over the best baseline.
We validate KGERA through leave-one-out ablation studies, threshold sensitivity analysis, 5-fold cross-validation (coefficient of variation, CV = 4.84%), and stratified experiments across user activity levels (77.5%–85.6% improvement for low- and medium-activity users), item popularity tiers (114.2% improvement for medium-popularity items), and content genres; together, these analyses support the robustness and generalizability of our approach. All improvements are statistically significant with [Formula: see text] under paired t-tests. Collectively, these results indicate that explicit test-time reasoning over structured knowledge is a computationally efficient and interpretable direction for the design of next-generation recommender systems.
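
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how NDCG@10, Recall@10, MRR@10, and a relative improvement over a baseline are commonly computed under binary relevance. It is not the authors' evaluation code: the function names, the binary-relevance assumption, and the reading of "CV" as the coefficient of variation (sample standard deviation divided by the mean) are all assumptions made for illustration.

```python
import math
import statistics


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def mrr_at_k(ranked, relevant, k=10):
    """Reciprocal rank of the first relevant item within the top-k, else 0."""
    for pos, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: discounted gain divided by the ideal discounted gain."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def relative_improvement(model_score, baseline_score):
    """Percentage improvement of a model score over a baseline score."""
    return 100.0 * (model_score - baseline_score) / baseline_score


def coefficient_of_variation(fold_scores):
    """Assumed meaning of 'CV': sample standard deviation over the mean, in percent."""
    return 100.0 * statistics.stdev(fold_scores) / statistics.mean(fold_scores)
```

In practice each metric would be averaged over all test users before computing the relative improvement, and the coefficient of variation would be taken over the per-fold averages.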