An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports

Abstract

CONTEXT: Analysis of diagnostic information in pathology reports for clinical or translational research and for quality assessment/control often requires manual data extraction, which can be laborious, time-consuming, and error-prone.

OBJECTIVE: We sought to develop, employ, and evaluate a simple, dictionary- and rule-based natural language processing (NLP) algorithm for generating searchable information on various types of parameters from diverse surgical pathology reports.

DESIGN: Data were exported from the pathology laboratory information system (LIS) into extensible markup language (XML) documents, which were parsed by NLP-based Python code into the desired data points and delivered to Excel spreadsheets. Accuracy and efficiency were compared with a manual data extraction method, with concordance measured by Cohen's κ coefficient and corresponding P values.

RESULTS: The automated method was highly concordant with the manual method (90%-100%, P<.001), with excellent inter-observer reliability (Cohen's κ: 0.86-1.0), in 3 clinicopathological research scenarios: squamous dysplasia presence and grade in anal biopsies, epithelial dysplasia grade and location in colonoscopic surveillance biopsies, and adenocarcinoma grade and amount in prostate core biopsies. Significantly, the automated method was 24-39 times faster and inherently linked each diagnosis to additional variables such as patient age and location, which would otherwise require additional manual processing time.

CONCLUSIONS: A simple, flexible, and scalable NLP-based platform can be used to correctly, safely, and quickly extract and deliver linked data from pathology reports into searchable spreadsheets for clinical and research purposes.
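The pipeline described under DESIGN (XML export, dictionary- and rule-based parsing, spreadsheet output, and a Cohen's κ comparison) might be sketched roughly as follows. This is a minimal illustration and not the authors' code. The XML element names, the sample reports, the `RULES` dictionary, and the helper names `classify`, `extract_rows`, `cohen_kappa`, and `to_csv` are all hypothetical. CSV stands in for Excel so that only the Python standard library is needed.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical LIS export; the abstract does not give the real schema.
SAMPLE_XML = """<reports>
  <report id="1" age="54"><site>Anus</site>
    <diagnosis>High-grade squamous intraepithelial lesion (HSIL).</diagnosis></report>
  <report id="2" age="61"><site>Anus</site>
    <diagnosis>Low grade squamous intraepithelial lesion.</diagnosis></report>
  <report id="3" age="47"><site>Anus</site>
    <diagnosis>Negative for dysplasia.</diagnosis></report>
</reports>"""

# Illustrative dictionary of ordered rules; the first matching pattern wins.
# Negation is checked first so "negative for dysplasia" is not misread.
RULES = [
    ("negative", re.compile(r"\b(negative for|no evidence of)\s+(dysplasia|squamous)", re.I)),
    ("high-grade", re.compile(r"\bhigh[\s-]grade\b|\bHSIL\b", re.I)),
    ("low-grade", re.compile(r"\blow[\s-]grade\b|\bLSIL\b", re.I)),
]


def classify(text):
    """Return the label of the first rule whose pattern occurs in the text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "unclassified"


def extract_rows(xml_text):
    """Parse the XML export into one row per report, keeping linked variables."""
    root = ET.fromstring(xml_text)
    rows = []
    for report in root.iter("report"):
        diagnosis = report.findtext("diagnosis", default="")
        rows.append({
            "id": report.get("id"),
            "age": report.get("age"),
            "site": report.findtext("site", default=""),
            "grade": classify(diagnosis),
        })
    return rows


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def to_csv(rows):
    """Serialize rows to CSV text (a stand-in for the Excel output)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "age", "site", "grade"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = extract_rows(SAMPLE_XML)
    manual = ["high-grade", "low-grade", "negative"]  # hypothetical manual labels
    print(to_csv(rows))
    print("kappa:", cohen_kappa([r["grade"] for r in rows], manual))
```

Because each row carries the report's age and site alongside the extracted grade, the linkage to additional variables that the abstract highlights comes from the parsing step itself and needs no separate lookup.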
