Abstract
With the rapid growth of online education, personalized learning path recommendation plays an increasingly important role in improving learning efficiency and the learning experience. However, existing methods remain limited in three respects: modeling knowledge structure, tracking the learner's changing knowledge state, and optimizing the recommendation strategy. To address these limitations, this study proposes an online personalized English learning path recommendation method that integrates a domain knowledge graph with deep reinforcement learning. The graph encodes prerequisite (directed) and semantic (undirected) relations and uses a resource-to-knowledge mapping to structurally bind learning resources to concepts; learner mastery is updated in real time through interaction feedback, graph-based propagation, and an exponential forgetting mechanism. The task is formulated as a Markov decision process (MDP) in which Q-learning provides value-based pruning of prerequisite-feasible candidates and proximal policy optimization (PPO) selects the final action from the pruned set (a prune-then-select workflow). Deployed as a WeChat Mini Program, the system was evaluated on 200 active learners over three months, yielding 18,742 valid interactions. It achieves a Precision of 0.85, Recall of 0.82, F1 of 0.84, MAE of 0.12, RMSE of 0.18, cumulative return G of 650, and AMG of 0.42, consistently outperforming the strong baselines AKT, LightGCN, TA-RL, cDQN, KG-H, MC, CF, and Rule; paired per-learner tests with Benjamini-Hochberg false discovery rate (BH-FDR) control confirm that the improvements are significant, particularly for Top-K ∈ [3,10]. Engineering evaluations show an average latency of 241 ms for personalized recommendation at 200 concurrent threads, and startup times below 350 ms on Wi-Fi and below 500 ms on 4G across mainstream devices, demonstrating practical scalability and real-time applicability.
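As an illustration of the prune-then-select workflow and the exponential forgetting mechanism summarized above, the following Python sketch shows one way the steps could fit together. It is a minimal sketch under stated assumptions, not the system's implementation: the function names, the mastery threshold of 0.6, the decay rate, and the hand-set Q-values and policy scores are all hypothetical. A trained PPO policy would normally sample from a learned distribution; a deterministic argmax stands in for it here, and graph-based propagation of mastery is omitted.

```python
import math


def decayed_mastery(mastery: float, elapsed_days: float, decay_rate: float = 0.1) -> float:
    """Exponential forgetting (assumed form): mastery * exp(-decay_rate * elapsed_days)."""
    return mastery * math.exp(-decay_rate * elapsed_days)


def feasible_candidates(prereqs: dict, mastery: dict, threshold: float = 0.6) -> list:
    """Return unmastered concepts whose prerequisites are all mastered.

    `prereqs` maps each concept to its list of prerequisite concepts
    (the directed edges of the knowledge graph).
    """
    return [
        concept
        for concept, required in prereqs.items()
        if mastery.get(concept, 0.0) < threshold
        and all(mastery.get(p, 0.0) >= threshold for p in required)
    ]


def prune_then_select(candidates: list, q_values: dict, policy_scores: dict, keep: int = 2) -> str:
    """Keep the `keep` candidates with the highest Q-value, then pick the
    pruned candidate with the highest policy score (argmax stand-in for PPO)."""
    pruned = sorted(candidates, key=lambda c: q_values[c], reverse=True)[:keep]
    return max(pruned, key=lambda c: policy_scores[c])


# Hypothetical English-grammar concepts and learner state.
prereqs = {
    "present_simple": [],
    "past_simple": ["present_simple"],
    "present_perfect": ["past_simple"],
    "articles": [],
    "passive_voice": ["past_simple", "present_perfect"],
    "prepositions": [],
}
mastery = {"present_simple": 0.9, "past_simple": 0.7, "articles": 0.3, "prepositions": 0.2}

candidates = feasible_candidates(prereqs, mastery)
q_values = {"present_perfect": 0.8, "articles": 0.5, "prepositions": 0.2}
policy_scores = {"present_perfect": 0.3, "articles": 0.6, "prepositions": 0.9}

choice = prune_then_select(candidates, q_values, policy_scores, keep=2)
print(candidates)  # concepts that pass the prerequisite check
print(choice)      # concept selected from the pruned set
```

In this toy state, "prepositions" has the highest policy score but is removed by the value-based pruning step, so the selection is made among the remaining candidates only. This is the intended effect of pruning before selecting: the policy chooses only among actions that the value estimate already considers promising.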