Hypothesis testing procedure for binary and multi-class F1-scores in the paired design



Abstract

In modern medicine, medical tests are used for various purposes, including diagnosis, disease screening, prognosis, and risk prediction. The performance of a binary medical test is commonly quantified by its sensitivity, specificity, and positive and negative predictive values. In addition, the F1-score, defined as the harmonic mean of precision (positive predictive value) and recall (sensitivity), has come to be used in the medical field because of its favorable characteristics. The F1-score has been extended to multi-class classification in two forms: a micro-averaged F1-score and a macro-averaged F1-score. The micro-averaged F1-score pools the per-sample classifications across classes and then calculates a single overall F1-score, whereas the macro-averaged F1-score is the arithmetic mean of the per-class F1-scores. Sokolova and Lapalme [1] gave an alternative definition of the macro-averaged F1-score as the harmonic mean of the arithmetic means of precision and recall over classes. Although some statistical inference methods for binary and multi-class F1-scores have been proposed, hypothesis testing procedures for them remain underdeveloped. We therefore aim to develop a hypothesis testing procedure for comparing two F1-scores in a paired study design, based on the large-sample multivariate central limit theorem.
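The three multi-class definitions described above can be made concrete with a minimal Python sketch. It is only an illustration of the definitions, not the authors' implementation: the function names (`per_class_counts`, `f1_scores`), the toy labels, and the convention of returning 0.0 when a denominator is zero are all assumptions introduced here.

```python
from typing import Dict, List, Sequence, Tuple


def per_class_counts(
    y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (TP, FP, FN) for each class, treating it one-vs-rest."""
    counts = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        counts.append((tp, fp, fn))
    return counts


def safe_div(num: float, den: float) -> float:
    # Assumed convention: an undefined ratio is reported as 0.0.
    return num / den if den else 0.0


def harmonic_mean(p: float, r: float) -> float:
    return safe_div(2.0 * p * r, p + r)


def f1_scores(
    y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]
) -> Dict[str, float]:
    counts = per_class_counts(y_true, y_pred, labels)
    precisions = [safe_div(tp, tp + fp) for tp, fp, _ in counts]
    recalls = [safe_div(tp, tp + fn) for tp, _, fn in counts]

    # Micro-averaged: pool the counts over classes, then compute one F1.
    tp_sum = sum(tp for tp, _, _ in counts)
    fp_sum = sum(fp for _, fp, _ in counts)
    fn_sum = sum(fn for _, _, fn in counts)
    micro = harmonic_mean(
        safe_div(tp_sum, tp_sum + fp_sum), safe_div(tp_sum, tp_sum + fn_sum)
    )

    # Macro-averaged: arithmetic mean of the per-class F1-scores.
    k = len(labels)
    macro = sum(harmonic_mean(p, r) for p, r in zip(precisions, recalls)) / k

    # Alternative macro definition: harmonic mean of the arithmetic
    # means of precision and recall over classes.
    macro_alt = harmonic_mean(sum(precisions) / k, sum(recalls) / k)

    return {"micro": micro, "macro": macro, "macro_alt": macro_alt}


if __name__ == "__main__":
    # Toy data (hypothetical), three classes.
    y_true = [0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 1, 1, 2, 2]
    print(f1_scores(y_true, y_pred, labels=[0, 1, 2]))
```

On this toy data the three scores differ (about 0.667, 0.656, and 0.693), which shows that the two macro-averaged definitions are not interchangeable and that a comparison of two tests must fix one definition in advance.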
