Abstract
We have studied 21 435 unique randomized controlled trials (RCTs) from the Cochrane Database of Systematic Reviews (CDSR). Of these trials, 7224 (34%) have a continuous (numerical) outcome and 14 211 (66%) have a binary outcome. We find that trials with a binary outcome have larger sample sizes on average, but also larger standard errors and fewer statistically significant results. We conclude that researchers tend to increase the sample size to compensate for the low information content of binary outcomes, but not sufficiently. In many cases, the binary outcome is the result of dichotomization of a continuous outcome, which is sometimes referred to as "responder analysis". In those cases, the loss of information is avoidable. Burdening more participants than necessary is wasteful, costly, and unethical. We provide a method to convert a sample size calculation for the comparison of two proportions into one for the comparison of the means of the underlying continuous outcomes. This demonstrates how much the sample size may be reduced if the outcome were not dichotomized. We also provide a method to calculate the loss of information after a dichotomization. We apply this method to all the trials from the CDSR with a binary outcome, and estimate that on average, only about 60% of the information is retained after dichotomization. We provide R code and a shiny app at: https://vanzwet.shinyapps.io/info_loss/ to do these calculations. We hope that quantifying the loss of information will discourage researchers from dichotomizing continuous outcomes. Instead, we recommend they "model continuously but interpret dichotomously". For example, they might present "percentage achieving clinically meaningful improvement" derived from a continuous analysis rather than by dichotomizing raw data.