Controlling FDR in selecting group-level simultaneous signals from multiple data sources with application to the National COVID Collaborative Cohort data



Abstract

One challenge in exploratory association studies using observational data is that associations between the predictors and the outcome can be weak and rare, while the candidate predictors have complex correlation structures. False discovery rate (FDR) controlling procedures provide important statistical guarantees for the replicability of predictor identification in exploratory research. In the recently established National COVID Collaborative Cohort (N3C), electronic health record (EHR) data on the same set of grouped candidate predictors are collected independently at multiple data-contributing sites, offering opportunities to identify true associations by combining information from different sources. A further challenge is handling the heterogeneous data types recorded for the same clinical endpoint across sites. This paper addresses that challenge by presenting a general knockoff-based variable selection algorithm that identifies associations from unions of group-level conditional independence tests (simultaneous signals) with exact FDR control guarantees in finite-sample settings. The algorithm works with general regression settings, allowing heterogeneity in both the predictors and the outcomes across data sources. We demonstrate its performance with extensive numerical studies and an application to the N3C data.
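
The abstract does not spell out the algorithm itself. As a rough illustration of how a knockoff-based procedure turns per-group statistics into a selected set at a target FDR level, the sketch below implements only the standard knockoff+ thresholding step from the general knockoff filter literature. It is not the paper's simultaneous-signal method: the function name `knockoff_plus_select`, the input vector `W` (one signed statistic per group, where large positive values favor the original group over its knockoff), and the level `q` are assumptions made for illustration. Constructing the knockoffs and combining statistics across data sources are not shown.

```python
import numpy as np


def knockoff_plus_select(W, q=0.1):
    """Select indices using the standard knockoff+ threshold at target FDR q.

    W is a hypothetical vector of signed knockoff statistics, one per group.
    Returns the indices j with W[j] >= T, where T is the smallest threshold t
    such that (1 + #{W <= -t}) / max(#{W >= t}, 1) <= q.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, in ascending order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        n_neg = np.sum(W <= -t)  # statistics that favor the knockoff copy
        n_pos = np.sum(W >= t)   # statistics that favor the original group
        if (1 + n_neg) / max(n_pos, 1) <= q:
            return np.flatnonzero(W >= t)
    # No threshold meets the bound, so nothing is selected.
    return np.array([], dtype=int)


# Made-up statistics for ten groups, purely for demonstration.
W_demo = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 6]
selected = knockoff_plus_select(W_demo, q=0.2)
```

With these made-up statistics, a stricter level such as `q=0.1` selects nothing, because the ratio cannot fall below 1/7 when at most seven statistics exceed any usable threshold.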
