Development of an Open-Source Annotated Glaucoma Medication Dataset From Clinical Notes in the Electronic Health Record

基于电子健康记录中的临床记录,开发开源带注释的青光眼药物数据集

阅读:1

Abstract

PURPOSE: To describe the methods involved in processing and characteristics of an open dataset of annotated clinical notes from the electronic health record (EHR) annotated for glaucoma medications. METHODS: In this study, 480 clinical notes from office visits, medical record numbers (MRNs), visit identification numbers, provider names, and billing codes were extracted for 480 patients seen for glaucoma by a comprehensive or glaucoma ophthalmologist from January 1, 2019, to August 31, 2020. MRNs and all visit data were de-identified using a hash function with salt from the deidentifyr package. All progress notes were annotated for glaucoma medication name, route, frequency, dosage, and drug use using an open-source annotation tool, Doccano. Annotations were saved separately. All protected health information (PHI) in progress notes and annotated files were de-identified using the published de-identifying algorithm Philter. All progress notes and annotations were manually validated by two ophthalmologists to ensure complete de-identification. RESULTS: The final dataset contained 5520 annotated sentences, including those with and without medications, for 480 clinical notes. Manual validation revealed 10 instances of remaining PHI which were manually corrected. CONCLUSIONS: Annotated free-text clinical notes can be de-identified for upload as an open dataset. As data availability increases with the adoption of EHRs, free-text open datasets will become increasingly valuable for "big data" research and artificial intelligence development. This dataset is published online and publicly available at https://github.com/jche253/Glaucoma_Med_Dataset. TRANSLATIONAL RELEVANCE: This open access medication dataset may be a source of raw data for future research involving big data and artificial intelligence research using free-text.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。