Abstract
BACKGROUND: Accurate interpretation of clinical test results is essential for diagnosing and managing complex immunological disorders. We explored the potential of large language models (LLMs) to automate interpretive reports for immunodeficiency and immune competence assessment via quantitative lymphocyte profiling and B-cell subset phenotyping (QLP/BSP). METHODS: An LLM was fine-tuned using parameter-efficient techniques on a dataset of immunophenotyping test results and corresponding interpretive reports. Model performance was compared against a retrieval-based method and expert pathologist reports. A novel automated evaluation framework assessed both the accuracy of cell population comments and the clinical relevance of generated interpretations. A custom application was built to simulate clinical workflows and measure the impact on pathologist efficiency and accuracy. RESULTS: Fine-tuned LLMs achieved accuracy comparable to expert pathologists in identifying and commenting on abnormal cell type counts and frequencies, whereas the retrieval-based method exhibited substantial error rates. The rate at which abnormalities in cell subtypes were commented on did not differ significantly between the LLMs and pathologists. More importantly, LLMs significantly reduced the time required for a single pathologist to finalize reports, with a mean reduction of 29%. CONCLUSION: Our results suggest that LLMs hold promise for enhancing efficiency and consistency in the clinical laboratory setting. By automating aspects of interpretive reporting, LLMs can potentially reduce pathologist workload and improve the turnaround time for critical diagnostic information, while still requiring expert pathologist oversight. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10875-026-02006-0.