Abstract
Background: Large Language Models (LLMs) are reshaping medical research workflows.

Objective: This narrative review synthesizes evidence on LLM applications across systematic reviews, scientific writing, and clinical research.

Methods: We reviewed literature from 2023-2025 on LLM applications in medical research, identified through PubMed, Scopus, Web of Science, arXiv, medRxiv, and Google Scholar. Studies reporting empirical findings, methodological evaluations, or systematic analyses of LLM applications were included; editorials and commentaries without empirical data were excluded.

Results: In systematic reviews, LLMs achieve 80-94% data-extraction accuracy and reduce screening workload by 40%, but show only slight-to-moderate agreement (κ = 0.16-0.43) in risk-of-bias assessment. In scientific writing, fabricated-reference rates of 47-55% and demographic bias in over 90% of evaluated outputs demand rigorous verification. In clinical research, LLMs assist with statistical coding and protocol development but require human validation. Critically, excessive reliance on automated tools risks cognitive offloading that erodes analytical capabilities.

Conclusions: LLMs are powerful but unstable tools requiring constant verification. Success depends on maintaining human-in-the-loop approaches that preserve critical thinking while leveraging AI efficiency.