Abstract
BACKGROUND: Multidisciplinary teams (MDTs) are fundamental to cancer care but face increasing burdens. This pilot study evaluates a large language model (LLM) simulating a tumor board for a clinically complex cohort of non-small cell lung cancer (NSCLC) patients, using a full-context guideline injection methodology to ground its reasoning in authoritative standards.

METHODS: Ten real-world NSCLC cases were presented to Google's Gemini 2.5 Pro using structured prompt engineering. The model was primed by providing the complete National Cancer Institute guidelines as in-context source data in a structured JSON file. AI-generated recommendations were scored against those of the institutional human MDT.

RESULTS: The LLM demonstrated high performance, achieving mean scores of 4.9/5.0 for content accuracy, 5.0/5.0 for internal consistency, and 4.4/5.0 for clinical applicability. Importantly, no safety concerns were identified in the AI's recommendations. However, the model did not generate any novel insights beyond those considered by the human MDT.

CONCLUSIONS: An LLM primed with comprehensive guidelines can accurately and safely replicate MDT recommendations for complex NSCLC cases. The combination of guideline injection and meticulous prompt engineering is a critical strategy for ensuring LLM reliability. This positions these models as powerful decision-support tools to augment, not replace, expert clinical workflows.
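The full-context guideline injection described in the methods can be illustrated with a minimal sketch. This is not the authors' code; the function name, prompt wording, and miniature guideline fragment are hypothetical, and the sketch only shows the general pattern of serializing a guideline document as JSON and prepending it to the case description so the model reasons from the supplied standard rather than its training data alone.

```python
import json

def build_tumor_board_prompt(guidelines: dict, case_summary: str) -> str:
    """Compose one prompt that injects the complete guideline document
    (serialized as JSON) ahead of the patient case, grounding the
    model's recommendations in the provided source data."""
    return (
        "You are simulating a multidisciplinary tumor board for NSCLC.\n"
        "Ground every recommendation strictly in the guidelines below.\n\n"
        "=== GUIDELINES (JSON) ===\n"
        f"{json.dumps(guidelines, indent=2)}\n\n"
        "=== PATIENT CASE ===\n"
        f"{case_summary}\n\n"
        "Provide a treatment recommendation citing the relevant guideline sections."
    )

# Hypothetical miniature guideline fragment, for illustration only.
guidelines = {
    "disease": "NSCLC",
    "stages": {"IV": {"first_line": ["platinum doublet", "checkpoint inhibitor"]}},
}
prompt = build_tumor_board_prompt(
    guidelines, "68M, stage IV adenocarcinoma, PD-L1 60%, ECOG 1."
)
```

In practice the resulting prompt string would be sent to the model's chat/completion endpoint; because the full guideline text travels inside the context window, no retrieval step or fine-tuning is required.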