Abstract
OBJECTIVE: This study aims to reorganize randomized controlled trials (RCTs) of traditional Chinese medicine (TCM) for adults with type 2 diabetes mellitus (T2DM) and metabolic dysfunction-associated steatotic liver disease (MASLD; including trials using the legacy term NAFLD) into a clinical evidence-anchored knowledge graph (KG), and to harmonize effect semantics ("unified effects") to support endpoint- and design-aware evidence navigation. METHODS: We systematically reviewed RCTs from 2015 to 2025. Effect direction and scale were unified using a prespecified rule (treatment effect [TE] > 0 indicates improvement). The prespecified primary endpoints, chosen to maximize cross-trial comparability, were alanine aminotransferase (ALT), triglycerides (TG), Homeostatic Model Assessment of Insulin Resistance (HOMA-IR), and controlled attenuation parameter (CAP); aspartate aminotransferase (AST) was retained as a robustness endpoint. Metabolic endpoints were synthesized at the 12-week timepoint, whereas imaging endpoints (CAP and liver stiffness measurement [LSM]) were synthesized within a prespecified 8- to 24-week window. Trials were stratified as add-on versus mixed designs, and primary efficacy inferences were based on add-on trials in which background Western medicine was balanced between arms. Evidence was synthesized using random-effects meta-analysis with restricted maximum likelihood (REML) estimation, reporting prediction intervals, and weighted meta-regression. Risk of bias was assessed with RoB 2 and certainty of evidence with GRADE. RESULTS: A total of 95 trials were included (n = 8,813; follow-up 2-48 weeks; predominantly add-on designs). The KG linked intervention categories (classic formulas, custom formulas, and Chinese patent medicines) to recurrent syndrome/symptom patterns, and Salvia miltiorrhiza emerged as a central hub in the herb layer. In add-on trials, pooled effects for ALT, AST, HOMA-IR, and TG were directionally favorable, but heterogeneity was substantial, and prediction intervals for these biochemical endpoints were often wide and crossed the null.
CAP showed a more reproducible short-term imaging signal than LSM. Meta-regression suggested hypothesis-generating design patterns in which estimate stability tended to improve with larger sample sizes (approximately ≥40-50 participants per arm) and longer follow-up (approximately ≥12-16 weeks). RoB 2 ratings were predominantly "some concerns," and GRADE certainty was commonly downgraded for inconsistency and/or imprecision. CONCLUSION: In adults with T2DM and MASLD, add-on trials show directionally favorable pooled biochemical/metabolic changes after unified-effect harmonization, but uncertainty remains substantial. CAP may be a more reproducible short-term imaging endpoint than LSM. Evidence-derived design patterns should be interpreted as hypothesis-generating rather than as causal thresholds. SYSTEMATIC REVIEW REGISTRATION: https://www.crd.york.ac.uk/prospero/, identifier CRD420251167450.
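As a minimal sketch of the unified-effect rule (TE > 0 indicates improvement), the Python below orients a two-arm mean difference so that a reduction in a lower-is-better endpoint yields a positive TE. The names `unified_effect`, `ArmSummary`, and `LOWER_IS_BETTER` are hypothetical. Treating every listed endpoint as lower-is-better, and using an unstandardized mean difference with its usual standard error, are assumptions for illustration, not the review's documented extraction procedure.

```python
import math
from dataclasses import dataclass

# Assumption: for each endpoint named in the abstract, a lower value is the favorable direction.
LOWER_IS_BETTER = {"ALT": True, "AST": True, "TG": True, "HOMA-IR": True, "CAP": True, "LSM": True}


@dataclass
class ArmSummary:
    mean: float  # arm mean; both arms must use the same scale (final value or change from baseline)
    sd: float    # arm standard deviation
    n: int       # participants analyzed in the arm


def unified_effect(endpoint: str, treatment: ArmSummary, control: ArmSummary) -> tuple:
    """Return (TE, SE) for a mean difference oriented so that TE > 0 means improvement."""
    raw_md = treatment.mean - control.mean
    # Flip the sign for lower-is-better endpoints so a larger reduction gives a positive TE.
    te = -raw_md if LOWER_IS_BETTER[endpoint] else raw_md
    # Standard error of an unstandardized mean difference between two independent arms.
    se = math.sqrt(treatment.sd ** 2 / treatment.n + control.sd ** 2 / control.n)
    return te, se
```

For example, a treatment arm with a mean ALT change of -12 U/L versus -5 U/L in the control arm gives a raw difference of -7 and therefore TE = 7, read as improvement.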
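A minimal sketch of REML-based random-effects pooling with a prediction interval follows. It assumes the commonly used fixed-point REML iteration for the between-trial variance τ², a Wald z-based confidence interval, and a prediction interval based on the t distribution with k - 2 degrees of freedom. All function names are hypothetical; the review's own software and any small-sample confidence-interval adjustment are not specified here, and the t quantile is computed by numerical integration only to keep the example dependency-free.

```python
import math
from statistics import NormalDist


def tau2_reml(y, v, tol=1e-10, max_iter=1000):
    """REML between-trial variance via fixed-point iteration (assumes every v_i > 0)."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sw)  # truncate at zero
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2


def t_cdf_nonneg(x, df, n=2000):
    """P(T <= x) for x >= 0 under Student t(df), integrating the density with Simpson's rule."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(t):
        return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / n
    s = pdf(0.0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
    return 0.5 + s * h / 3


def t_quantile(p, df):
    """Student t quantile for p > 0.5: bracket by doubling, then bisect."""
    hi = 1.0
    while t_cdf_nonneg(hi, df) < p:
        hi *= 2
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf_nonneg(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def random_effects(y, v, level=0.95):
    """Pool unified effects y (sampling variances v); return estimate, CI, and prediction interval."""
    k = len(y)
    if k < 3:
        raise ValueError("a prediction interval needs at least 3 trials")
    tau2 = tau2_reml(y, v)
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    q = 0.5 + level / 2
    z = NormalDist().inv_cdf(q)
    half_pi = t_quantile(q, k - 2) * math.sqrt(tau2 + se ** 2)
    return {"mu": mu, "se": se, "tau2": tau2,
            "ci": (mu - z * se, mu + z * se),
            "pi": (mu - half_pi, mu + half_pi)}
```

With five illustrative trials whose unified effects are 1 to 5 (each with sampling variance 0.5), τ² is 2.0 and the pooled 95% CI excludes zero while the 95% prediction interval crosses it, which is the kind of pattern reported above for biochemical endpoints.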
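The weighted meta-regression step can be sketched, under stated simplifications, as inverse-variance weighted least squares of TE on a single trial-level moderator such as per-arm sample size or follow-up duration. The function name is hypothetical. Here τ² is passed in and held fixed (for example, from a pooled fit) rather than re-estimated as residual heterogeneity, and the slope standard error is the model-based value without a Knapp-Hartung adjustment.

```python
import math


def weighted_meta_regression(y, v, x, tau2=0.0):
    """Regress unified effects y on one moderator x with weights 1 / (v_i + tau2).

    Returns (intercept, slope, se_slope). tau2 is treated as known (a simplification).
    """
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # model-based SE when weights are inverse variances
    return intercept, slope, se_slope
```

A slope from such a fit describes an association across trials; on its own it does not identify a sample-size or follow-up threshold.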