Abstract
IMPORTANCE: The unexplored quality of evidence supporting online video claims by medical professionals creates a credibility-evidence gap that threatens the principles of evidence-based medicine.
OBJECTIVE: To systematically evaluate the evidence hierarchy supporting medical claims in health care professional-created online videos using a novel evidence classification framework.
DESIGN, SETTING, AND PARTICIPANTS: In this quality improvement study using a cross-sectional analysis, YouTube was searched using cancer- and diabetes-related terms. A total of 309 videos met the inclusion criteria. The video search, data extraction, and archiving were conducted between June 20 and 21, 2025, to create a static dataset. Videos were assessed using the newly developed Evidence-GRADE (E-GRADE [Grading of Recommendations Assessment, Development and Evaluation]) framework, categorizing evidence into 4 levels: grade A (high certainty from systematic reviews and/or guidelines), grade B (moderate certainty from randomized clinical trials, cohort studies, and high-quality observational studies with clear citations), grade C (low certainty from limited observational studies, physiological mechanisms, or case series without critical appraisal), and grade D (very low or no certainty from anecdotal evidence).
EXPOSURE: Videos that had a minimum of 10 000 views, were created by health care professionals, had a minimum duration of 1 minute, and contained specific health claims.
MAIN OUTCOMES AND MEASURES: Primary outcomes included the distribution of evidence grades (A-D) supporting medical claims. Secondary outcomes included correlations between evidence quality and engagement metrics (views and likes) and traditional quality scores (DISCERN, JAMA benchmark criteria, and Global Quality Scale).
RESULTS: Among the 309 videos included, which had a median of 164 454 (IQR, 58 909-477 075) views, most medical claims (193 [62.5%]) were supported by very low or no evidence (grade D), while only 61 claims (19.7%) were supported by high-quality evidence (grade A). Moderate (grade B) and low (grade C) evidence levels were found in 45 (14.6%) and 10 (3.2%) videos, respectively. Grade D videos were associated with a 34.6% higher view count than grade A videos (incidence rate ratio, 1.35; 95% CI, 1.00-1.81; P = .047), a statistically significant difference. Traditional quality tools showed only weak correlations (range of coefficients, 0.11-0.23) with evidence levels, thus failing to detect important qualitative differences.
CONCLUSIONS AND RELEVANCE: In this quality improvement study, a substantial credibility-evidence gap was found in physician-generated video-sharing content, where medical authorities often legitimized claims lacking robust empirical support. These findings emphasize the need for evidence-based content guidelines and enhanced science communication training for health care professionals to maintain scientific integrity in digital health information.
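The grade distribution reported in the Results can be checked from the raw counts, and an incidence rate ratio can be read as a percent difference in view count. The following is a minimal sketch of that arithmetic only, not the authors' analysis code; the function names `grade_shares` and `irr_to_percent_change` are hypothetical. The rounded ratio of 1.35 converts to 35.0%, so the reported 34.6% presumably reflects the unrounded estimate, which is an assumption here.

```python
# Counts per evidence grade as reported in the Results (309 videos in total).
grade_counts = {"A": 61, "B": 45, "C": 10, "D": 193}


def grade_shares(counts):
    """Return each grade's share of the total as a percentage (1 decimal place)."""
    total = sum(counts.values())
    return {grade: round(100 * n / total, 1) for grade, n in counts.items()}


def irr_to_percent_change(irr):
    """Convert an incidence rate ratio to a percent difference vs the reference group."""
    return round((irr - 1) * 100, 1)


total_videos = sum(grade_counts.values())
shares = grade_shares(grade_counts)
# Uses the rounded IRR from the abstract; grade A is the reference group.
pct_change = irr_to_percent_change(1.35)

print(total_videos)
print(shares)
print(pct_change)
```

The four counts sum to the 309 included videos, and the computed shares match the percentages stated in the Results.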