OBJECTIVE: This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation. METHOD: Our approach focused on rapid adaptation of existing toolkits for named entity recognition using two existing toolkits, Carafe and LingPipe. RESULTS: The "out of the box" Carafe system achieved a very good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task. With further tuning, we were able to reduce the token-level error term by over 36% through task-specific feature engineering and the introduction of a lexicon, achieving a phrase F-measure of 0.9736. CONCLUSIONS: We were able to achieve good performance on the de-identification task by the rapid retargeting of existing toolkits. For the Carafe system, we developed a method for tuning the balance of recall vs. precision, as well as a confidence score that correlated well with the measured F-score.
Rapidly retargetable approaches to de-identification in medical records.
针对医疗记录中去标识化问题的快速重定向方法
阅读:4
作者:Wellner Ben, Huyck Matt, Mardis Scott, Aberdeen John, Morgan Alex, Peshkin Leonid, Yeh Alex, Hitzeman Janet, Hirschman Lynette
| 期刊: | Journal of the American Medical Informatics Association | 影响因子: | 4.600 |
| 时间: | 2007 | 起止号: | 2007 Sep-Oct;14(5):564-73 |
| doi: | 10.1197/jamia.M2435 | ||
特别声明
1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。
2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。
3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。
4、投稿及合作请联系:info@biocloudy.com。
