Abstract
BACKGROUND: The quality of generated nursing diagnoses and care plans reported in existing research remains debated, and previous studies have relied primarily on ChatGPT as the sole large language model (LLM). PURPOSE: To evaluate the quality of nursing diagnoses and care plans generated by a prompt framework across different LLMs and to assess the potential applicability of LLMs in clinical settings. METHODS: We designed a structured nursing assessment template and iteratively developed a prompt framework incorporating various prompting techniques. We then evaluated the quality of the nursing diagnoses and care plans generated by this framework across two distinct LLMs (ERNIE Bot 4.0 and Moonshot AI) and assessed their clinical utility. RESULTS: The scope and nature of the nursing diagnoses generated by ERNIE Bot 4.0 and Moonshot AI were similar to the "gold standard" nursing diagnoses and care plans. The structured assessment template effectively and comprehensively captured the key characteristics of neurosurgical patients, and the strategic use of prompting techniques enhanced the generalization capabilities of the LLMs. CONCLUSION: Our findings further confirm the potential of LLMs in clinical nursing practice. However, significant challenges remain in effectively integrating LLM-assisted nursing processes into clinical environments.