Assessing generalizability of a dengue classifier across multiple datasets

评估登革热分类器在多个数据集上的泛化能力

阅读:1

Abstract

Early diagnosis of dengue fever is important for individual treatment and monitoring disease prevalence in the population. To assist diagnosis, previous studies have proposed classification models to detect dengue from symptoms and clinical measurements. However, there has been little exploration of whether existing models can be used to make predictions for new populations. In this study, we assess the generalizability of dengue classification models to new datasets. We trained logistic regression models on five publicly available dengue datasets from previous studies, using three explanatory variables identified as important in prior work: age, white blood cell count, and platelet count. These five datasets were collected at different times in different locations, with a variety of disease rates and patient ages. A model was trained on each dataset, and predictive performance and model calibration was evaluated on both the original (training) dataset, and the other (test) datasets from different studies. By comparing the model's performance when applied to data from a new location, we are able to assess the model's generalizability to new populations. We further compared performance with larger models and other classification methods. In-sample area under the receiver operating characteristic curve (AUC) values for the logistic regression models ranged from 0.74 to 0.89, while out-of-sample AUCs ranged from 0.55 to 0.89. Matching age ranges in training/test datasets increased AUC values and balanced the sensitivity and specificity. Adjusting the predicted probabilities to account for differences in dengue prevalence improved calibration in 20/28 training-test pairs. Results were similar when other explanatory variables were included and when other classification methods (decision trees and support vector machines) were used. The in-sample performance of the logistic regression model was consistent with previous dengue classifiers, suggesting the chosen model is a good choice in a variety of settings and has decent overall performance. However, adjustments are required to make predictions on new datasets. Practitioners can use existing dengue classifiers in new settings but should be careful with different patient ages and disease rates.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。