Pre-training phenotyping classifiers


Abstract

Recent transformer-based pre-trained language models have become a de facto standard for many text classification tasks. Nevertheless, their utility in the clinical domain, where classification is often performed at the encounter or patient level, remains uncertain because of their limit on maximum input length. In this work, we introduce a self-supervised pre-training method that relies on a masked-token objective and is free from any maximum-input-length limitation. We compare the proposed method with supervised pre-training that uses billing codes as the source of supervision. We evaluate the proposed method on one publicly available dataset and three in-house datasets using standard evaluation metrics such as the area under the ROC curve and the F1 score. We find that, surprisingly, although self-supervised pre-training performs slightly worse than supervised pre-training, it still preserves most of the gains from pre-training.
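To make the length-agnostic pre-training idea concrete, the sketch below is a minimal illustration, not the authors' implementation: it pre-trains an encoder with a masked-token objective over an unordered bag of tokens, so a record of any length fits in a single forward pass. The EmbeddingBag encoder, the multi-hot BCE target, and all names, dimensions, and hyperparameters are assumptions made for this example.

```python
# Minimal sketch (assumed design, not the paper's code) of masked-token
# pre-training over variable-length bags of tokens in PyTorch.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # assumed vocabulary size
EMBED_DIM = 64      # assumed embedding dimension
MASK_PROB = 0.15    # assumed BERT-style masking rate

class BagEncoder(nn.Module):
    """Averages token embeddings (no length limit) and predicts masked tokens."""
    def __init__(self):
        super().__init__()
        self.embed = nn.EmbeddingBag(VOCAB_SIZE, EMBED_DIM, mode="mean")
        self.out = nn.Linear(EMBED_DIM, VOCAB_SIZE)

    def forward(self, tokens, offsets):
        rep = self.embed(tokens, offsets)   # one vector per record
        return self.out(rep)                # logits over the vocabulary

def mask_bag(bag):
    """Split a bag of token ids into visible tokens and masked targets."""
    mask = torch.rand(len(bag)) < MASK_PROB
    if mask.all():                          # keep at least one visible token
        mask[0] = False
    return bag[~mask], bag[mask]

model = BagEncoder()
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic "patient records": variable-length lists of token ids.
records = [torch.randint(0, VOCAB_SIZE, (n,)) for n in (50, 200, 1200)]

for step in range(100):
    visible, targets, offsets, off = [], [], [], 0
    for bag in records:
        vis, masked = mask_bag(bag)         # masks are re-sampled each step
        offsets.append(off)
        off += len(vis)
        visible.append(vis)
        t = torch.zeros(VOCAB_SIZE)
        t[masked] = 1.0                     # multi-hot target of masked ids
        targets.append(t)
    logits = model(torch.cat(visible), torch.tensor(offsets))
    loss = loss_fn(logits, torch.stack(targets))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After pre-training, the output layer would be discarded and the encoder's record-level vector fed to a small classifier head for each phenotype; because the encoder averages embeddings, no maximum input length applies.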
