Describing the Pearson R distribution of aggregate data

Abstract

Ecological studies and epidemiological research often rely on group-averaged data to make inferences about individual-level patterns. However, estimating the correlation between individual scores from a correlation between group averages is subject to the "ecological fallacy". This article uses Monte Carlo simulation and random sampling to construct distributions of Pearson R values computed from group-averaged (aggregate) data. We show that, as the group size increases, these distributions can be approximated by a generalized hypergeometric distribution. The expectation of the constructed distribution slightly underestimates the individual-level Pearson R, but the difference shrinks as the number of groups increases. The approximately normal distribution obtained through Fisher's transformation can be used to build confidence intervals that estimate the Pearson R of the individual scores from the Pearson R of the aggregated scores.
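The simulation idea described above can be illustrated with a minimal sketch. It is not the article's actual procedure: it assumes individual scores are bivariate normal with a known correlation `rho`, that individuals are assigned to groups at random, and that the classic standard error 1/sqrt(n - 3) applies to the Fisher-transformed aggregate correlation, with n taken as the number of groups. The function names `pearson_r`, `simulate_aggregate_r`, and `fisher_ci` are hypothetical and chosen for illustration only.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_aggregate_r(rho, n_groups, group_size, n_trials, seed=0):
    """Monte Carlo distribution of Pearson R computed from group means.

    Assumption (not from the article): individuals are standard bivariate
    normal with correlation rho and are grouped at random.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho ** 2)
    results = []
    for _ in range(n_trials):
        mean_x, mean_y = [], []
        for _ in range(n_groups):
            sum_x = sum_y = 0.0
            for _ in range(group_size):
                x = rng.gauss(0.0, 1.0)
                # y is built so that corr(x, y) equals rho
                y = rho * x + scale * rng.gauss(0.0, 1.0)
                sum_x += x
                sum_y += y
            mean_x.append(sum_x / group_size)
            mean_y.append(sum_y / group_size)
        results.append(pearson_r(mean_x, mean_y))
    return results


def fisher_ci(r, n_groups, z_crit=1.96):
    """Approximate confidence interval via Fisher's z transformation.

    Assumption: standard error 1 / sqrt(n_groups - 3), so n_groups must
    exceed 3. z_crit = 1.96 corresponds to roughly 95% coverage.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n_groups - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    rs = simulate_aggregate_r(rho=0.6, n_groups=20, group_size=10, n_trials=2000)
    print("mean aggregate R:", sum(rs) / len(rs))
    print("interval around first simulated R:", fisher_ci(rs[0], 20))
```

Under these assumptions, comparing the mean of the simulated values with `rho` for several values of `n_groups` gives a quick way to inspect how the gap between the two changes with the number of groups.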
