Abstract
BACKGROUND: DeepSeek, an open-source large language model (LLM), has greatly accelerated LLM adoption in health care. Its rapid deployment has raised concerns about its impact on patient outcomes and safety. However, little is known about how DeepSeek is used and regulated in health care.
OBJECTIVE: This study aimed to (1) systematically review the characteristics of DeepSeek deployments in the top 100 hospitals in China, and (2) compare the performance and risks of DeepSeek as reported in hospital disclosures with those reported in research evidence.
METHODS: We searched the official websites and WeChat accounts of the top 100 hospitals in China, as well as the Web of Science and PubMed databases, using the terms "DeepSeek" and "large language models." Searches were limited to records after January 15, 2025, when DeepSeek was first released. All searches were conducted on May 20, 2025, with an update on June 28, 2025. We extracted the basic characteristics of DeepSeek; its aims, evaluation approach, performance, and risks; and hospital regulations. A coding framework was developed covering the application scenarios, evaluation dimensions, and risk sources of LLMs. The risk of bias was assessed using the Joanna Briggs Institute checklist.
RESULTS: We identified a total of 58 DeepSeek models in 48 of the top 100 Chinese hospitals and found 27 studies in the literature. The first hospital deployment of DeepSeek was recorded on February 10, 2025, and deployment rapidly expanded to 37 hospitals within a month. Concurrently, most related research studies (20/27, 74%) were published after May 2025. Among deployments and studies that reported version information, DeepSeek-reasoner (R1) was the most frequently used model, and private deployment was the predominant approach. DeepSeek was mainly used to assist in clinical decision-making, including patient diagnosis and treatment recommendation. Among hospital disclosures, only 36% (21/58) clearly indicated a predeployment assessment, 22% (13/58) presented assessment results, and 9% (5/58) identified potential risks and countermeasures. Transparency in hospital reporting was poor: none of the disclosures presented evaluation details. Compared with the research evidence, hospital disclosures were more likely to report higher performance and fewer risks for DeepSeek.
CONCLUSIONS: This is one of the first scoping reviews to reveal the rapid, widespread deployment of DeepSeek in China's leading hospitals, primarily for clinical decision support. This deployment poses potential risks to patient outcomes and safety. We highlight the urgent need to expand existing regulations to downstream developers and users to promote the responsible use of LLMs in health care. Hospitals need to use a more rigorous validation process and adopt a more transparent reporting policy. The main limitations of this review include the restriction to top-tier hospitals and the inherent constraints of gray literature. These factors should be considered when interpreting the findings.