Abstract
A nation's progress toward Sustainable Development Goal 4 (Quality Education) depends in part on the long-term health of its educational research system, yet systematic, longitudinal assessments of such research ecosystems remain scarce. This study applies a generative artificial intelligence (AI)-based framework to evaluate the sustainable development of China's empirical educational research ecosystem from 2004 to 2023. We compiled a dataset of 2,145 empirical studies published in leading Chinese education journals and used GPT-4o to score each paper on 31 quality indicators covering the research problem, theoretical framing, design, data collection, analysis, and reporting, on a 1-10 analytic rating scale. From the resulting score distributions, we constructed a fuzzy relation matrix and applied the fuzzy comprehensive evaluation method to derive annual and overall sustainability indices, with objective indicator weights determined by the Criteria Importance Through Intercriteria Correlation (CRITIC) method. The overall sustainability index of China's empirical educational research ecosystem over the 20-year period is 75.77 on a 100-point scale, with membership degrees concentrated at quality levels 7 (0.328) and 8 (0.435), indicating a generally robust and maturing system. Longitudinal trends reveal three stages of evolution (fluctuating development, rapid growth, and continuous improvement), corresponding to a shift toward more stable, high-quality output. At the micro level, the ecosystem shows strong responsiveness to real-world educational problems, with high average scores for the relevance (8.45) and social significance (8.23) of research questions, as well as generally solid research design and data analysis practices. However, relatively lower scores for transparency of data analysis (7.08) and accessibility of raw data (6.46) highlight persistent challenges for reproducibility, open science, and methodological innovation.
We conclude that China's empirical educational research ecosystem has reached a relatively high and stable level of performance but faces critical challenges in strengthening data openness, methodological renewal, and AI-augmented governance. The proposed generative AI-based evaluation framework may offer a scalable tool for the continuous monitoring and governance of national research ecosystems, though its results should be interpreted as an auxiliary input rather than a substitute for expert peer assessment.
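The scoring-to-index pipeline described in the abstract (CRITIC weighting followed by fuzzy comprehensive evaluation) can be sketched as follows. This is a minimal illustration on synthetic data: the number of papers, the four indicators, and the random scores are placeholders, not the study's actual 2,145-paper dataset or 31 indicators.

```python
import numpy as np

# Toy score matrix: rows = papers, columns = quality indicators, scores on 1-10.
rng = np.random.default_rng(0)
scores = rng.integers(5, 11, size=(200, 4))  # synthetic, for illustration only

# --- CRITIC weights: contrast (std. dev.) x conflict (1 - correlation) ---
std = scores.std(axis=0, ddof=1)
corr = np.corrcoef(scores, rowvar=False)
info = std * (1.0 - corr).sum(axis=0)   # information content per indicator
weights = info / info.sum()             # normalized objective weights

# --- Fuzzy relation matrix R: for each indicator, the share of papers ---
# --- at each quality level 1..10 (an empirical membership distribution) ---
levels = np.arange(1, 11)
R = np.array([[(scores[:, j] == l).mean() for l in levels]
              for j in range(scores.shape[1])])  # shape: (indicators, levels)

# --- Fuzzy comprehensive evaluation: weighted aggregation, then defuzzify ---
B = weights @ R                   # membership degrees over the 10 levels
index = (B * levels).sum() * 10   # sustainability index on a 100-point scale
```

Here defuzzification uses the weighted-average (centroid) operator; the membership vector `B` is the analogue of the level-7/level-8 concentrations reported in the abstract, and `index` is the analogue of the 75.77 overall score.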