"Cake causes herpes?" - promiscuous dichotomisation induces false positives

Abstract

BACKGROUND: Continuous biomedical data is often dichotomised, or split into several groups, for analysis, despite long-standing warnings from statisticians that this constitutes bad practice. Dichotomisation is typically discouraged because it reduces statistical power and may obscure important trends. This paper considers another reason to discourage the practice: dichotomising at an arbitrary yet flexible threshold (which we term 'promiscuous dichotomisation') is a powerful researcher degree of freedom, and hence a powerful tool for manipulating data.

METHODS: The motivating question is: given a set of uniformly distributed data, how probable is it that a threshold can be engineered to produce the illusion of a true effect when none exists? To estimate this, we employed both analytical and Monte Carlo simulation approaches to quantify the expected number of spurious findings that could arise from manipulating a dichotomous threshold for an arbitrary data set. We also illustrate this with NHANES data, showing how a spurious relationship between blood glucose and herpes status could be engineered.

RESULTS: For even a relatively small sample of [Formula: see text], a false positive rate of [Formula: see text] can be observed, rising to over [Formula: see text] if low-count scenarios are not excluded. With larger samples, even with low-count exclusion, false positive rates in excess of [Formula: see text] for [Formula: see text] and [Formula: see text] for [Formula: see text] are possible, climbing to in excess of [Formula: see text] and [Formula: see text] respectively if low-count scenarios are not excluded. For most configurations, manipulation of thresholds was a highly viable method of crafting a false positive result.

CONCLUSIONS: It is likely that manipulating cut-off points in measured variables represents a significant source of data manipulation in published science, and the ease of access to larger health databases means this is an issue that is likely to grow in severity. We discuss the implications of this, and means of identifying potential promiscuous dichotomisation.
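
The Monte Carlo idea described in the METHODS paragraph can be illustrated with a minimal sketch. This is not the authors' code. It assumes a continuous variable drawn uniformly at random, a binary outcome that is independent of it (prevalence 0.5), Pearson's chi-squared test without continuity correction on each 2x2 table, a significance level of 0.05, and a "low-count exclusion" read as skipping any table with a cell below 5. All function and parameter names (`chi2_p_value`, `min_p_over_thresholds`, `false_positive_rate`, `min_count`) are hypothetical and chosen for illustration only.

```python
import math
import random


def chi2_p_value(a, b, c, d):
    """P-value of Pearson's chi-squared test (1 df, no continuity
    correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def min_p_over_thresholds(values, outcomes, min_count=5):
    """Smallest p-value obtainable by scanning every possible cut-point.
    Tables with any cell below min_count are skipped (assumed exclusion rule)."""
    pairs = sorted(zip(values, outcomes))
    n = len(pairs)
    total_pos = sum(outcomes)
    best = 1.0
    pos_below = 0
    for k in range(1, n):  # k observations fall below the threshold
        pos_below += pairs[k - 1][1]
        a, b = pos_below, k - pos_below          # below threshold: outcome 1 / 0
        c = total_pos - pos_below                # above threshold: outcome 1
        d = (n - k) - c                          # above threshold: outcome 0
        if min(a, b, c, d) < min_count:
            continue
        best = min(best, chi2_p_value(a, b, c, d))
    return best


def false_positive_rate(n, trials=2000, prevalence=0.5, alpha=0.05,
                        min_count=5, seed=0):
    """Fraction of null data sets in which some threshold gives p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        values = [rng.random() for _ in range(n)]
        outcomes = [1 if rng.random() < prevalence else 0 for _ in range(n)]
        if min_p_over_thresholds(values, outcomes, min_count) < alpha:
            hits += 1
    return hits / trials
```

Calling `false_positive_rate(50)` and `false_positive_rate(50, min_count=0)` gives a rough sense of how much a freely chosen threshold inflates the nominal 5% error rate, with and without the low-count exclusion. The figures produced by this sketch depend on the assumptions above and should not be expected to reproduce the values reported in the paper.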
