Abstract
BACKGROUND: While AI chatbots have increased access to healthcare information, evidence regarding the readability, reliability, and overall quality of the nursing care plans they generate remains limited.

PURPOSE: This study comparatively evaluated nursing care plan texts generated by ChatGPT, Gemini, and DeepSeek in terms of readability, reliability, and overall quality.

METHODS: Thirty nursing diagnoses were randomly selected from the NANDA International 2021–2023 taxonomy. For each diagnosis, a nursing care plan was generated with each of the three AI chatbots, yielding 90 texts. Outputs were evaluated using a descriptive information form, the DISCERN instrument, and four readability measures: the Flesch Reading Ease Score (FRES), the Simple Measure of Gobbledygook (SMOG), the Gunning Fog Index, and the Flesch–Kincaid Grade Level (FKGL).

RESULTS: Readability analyses indicated that the care plans generated by all three AI models significantly exceeded the recommended sixth-grade reading level (P < .001). DISCERN scores reflected moderate reliability, with mean scores of 57.41 ± 5.9 for ChatGPT, 58.41 ± 4.8 for Gemini, and 56.51 ± 6.8 for DeepSeek. Overall, 27 texts (90%) were rated as providing nursing care information of moderate quality. The presence of verifiable references showed a statistically significant positive association with both reliability and quality scores (P < .05).

CONCLUSION: Although AI chatbots show potential as supportive tools in nursing education and documentation, they should not be used as standalone resources for generating complete nursing care plans; professional review remains essential. Improvements in content clarity, reference accuracy, and expert oversight are needed to enhance their applicability in nursing practice.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12912-026-04295-7.
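The four readability indices named in the Methods are standard text-surface formulas. As an illustrative aside (the abstract does not state which software the authors used to compute their scores), the minimal sketch below shows how such scores could be obtained for a generated care-plan passage with the open-source textstat Python package; the readability_report helper and the sample sentence are hypothetical, not material from the study.

import textstat  # third-party package: pip install textstat

def readability_report(text: str) -> dict:
    # Hypothetical helper: computes the four measures named in the Methods.
    return {
        "FRES": textstat.flesch_reading_ease(text),    # higher score = easier text
        "SMOG": textstat.smog_index(text),             # U.S. grade level
        "Fog": textstat.gunning_fog(text),             # U.S. grade level
        "FKGL": textstat.flesch_kincaid_grade(text),   # U.S. grade level
    }

# Hypothetical care-plan excerpt for demonstration only.
sample = (
    "Assess the patient's skin integrity every shift. Keep the skin clean "
    "and dry, and reposition the patient at least every two hours."
)
scores = readability_report(sample)
for name, value in scores.items():
    print(f"{name}: {value:.1f}")

# Grade-level indices above 6 exceed the sixth-grade threshold commonly
# recommended for patient-facing health materials (the benchmark the study
# applies in its Results).
hard = [n for n, v in scores.items() if n != "FRES" and v > 6]
print("Exceeds 6th-grade level on:", ", ".join(hard) or "none")

Note that FRES is interpreted in the opposite direction from the three grade-level indices (higher FRES means easier text), which is why it is excluded from the threshold check above.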