Abstract
BACKGROUND: This study aimed to assess and compare the ability of ChatGPT-4o and Gemini Pro to generate structured abstracts from full-text systematic reviews and meta-analyses in orthodontics, based on adherence to the PRISMA for Abstracts (PRISMA-A) Checklist, using a customised prompt developed for this purpose. MATERIALS AND METHODS: A total of 162 full-text systematic reviews and meta-analyses published in Q1-ranked orthodontic journals since January 2019 were included. Each full-text article was processed by ChatGPT-4o and Gemini Pro using a structured prompt aligned with the PRISMA-A Checklist. Outputs were scored using a tailored Overall Quality Score (OQS) derived from the 11 items of the PRISMA-A Checklist. Inter-rater and time-dependent reliability were assessed with intraclass correlation coefficients (ICCs), and model outputs were compared using Mann-Whitney U tests. RESULTS: Both models achieved satisfactory OQS values in generating PRISMA-A-compliant abstracts; however, ChatGPT-4o consistently scored higher than Gemini Pro. The most notable differences were observed in the "Included Studies" and "Synthesis of Results" sections, where ChatGPT-4o produced more complete and structurally coherent outputs. ChatGPT-4o achieved a mean OQS of 21.67 (SD 0.58) versus 21.00 (SD 0.71) for Gemini Pro, a statistically significant difference (p < 0.001). CONCLUSIONS: Both LLMs demonstrated the ability to generate PRISMA-A-compliant abstracts from systematic reviews, with ChatGPT-4o consistently achieving higher quality scores than Gemini Pro. Although tested in orthodontics, the approach holds potential for broader application across evidence-based dental and medical research. Systematic reviews and meta-analyses are essential to evidence-based dentistry but can be challenging and time-consuming to report in accordance with established standards.
The structured prompt developed in this study may help researchers generate PRISMA-A-compliant abstracts more efficiently, supporting faster and more standardised reporting of high-level clinical evidence.