Feature selection by replicate reproducibility and non-redundancy



Abstract

MOTIVATION: A fundamental step in many analyses of high-dimensional data is dimension reduction. Two basic approaches are the introduction of new synthetic coordinates and the selection of extant features. Advantages of the latter include interpretability, simplicity, transferability, and modularity. A common criterion for unsupervised feature selection is variance or dynamic range. In practice, however, high-variance features can be noisy, important features can have low variance, and variances may simply not be comparable across features because they are measured on unrelated numeric scales or in different physical units. Moreover, users may want to incorporate measures of signal-to-noise ratio and non-redundancy into feature selection.

RESULTS: Here, we introduce the RNR algorithm, which selects features based on (i) the reproducibility of their signal across replicates and (ii) their non-redundancy, measured by linear dependence. It takes as input a typically large set of features measured on a collection of objects with two or more replicates per object. It returns an ordered list of features, i_1, i_2, …, i_k, where feature i_1 is the one with the highest reproducibility across replicates, i_2 is the one with the highest reproducibility across replicates after projecting out the dimension spanned by i_1, and so on. Applications to microscopy-based imaging of cells and to proteomics highlight the benefits of the approach.

AVAILABILITY AND IMPLEMENTATION: The RNR method is available via Bioconductor (Huber W, Carey VJ, Gentleman R et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 2015;12:115-21) in the R package FeatSeekR. Its source code is also available at https://github.com/tcapraz/FeatSeekR under the GPL-3 open source license.
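
The greedy procedure described under RESULTS can be illustrated with a minimal Python sketch. It is not the FeatSeekR implementation or its API. The abstract does not say how reproducibility is scored, so the sketch assumes a simple stand-in: the fraction of a feature's variance that lies between objects rather than between replicates of the same object. The function names (`reproducibility`, `standardize`, `select_features`), the `data[feature][object][replicate]` layout, and the toy data are all hypothetical.

```python
from math import sqrt

EPS = 1e-12  # residuals with less variance than this are treated as redundant


def reproducibility(v, n_rep):
    """Assumed score: share of the variance of v that lies between objects.

    v is a flat list ordered object by object, with n_rep values per object.
    """
    grand_mean = sum(v) / len(v)
    ss_total = sum((x - grand_mean) ** 2 for x in v)
    if ss_total < EPS:  # nothing left after projection
        return 0.0
    ss_between = 0.0
    for start in range(0, len(v), n_rep):
        obj_mean = sum(v[start:start + n_rep]) / n_rep
        ss_between += n_rep * (obj_mean - grand_mean) ** 2
    return ss_between / ss_total


def standardize(v):
    """Center v and scale it to unit length (constant features become zeros)."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    norm = sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered] if norm > 0 else centered


def select_features(data, k):
    """Return up to k feature indices, most reproducible first.

    After each pick, the picked feature's residual is projected out of all
    remaining features, so linearly redundant features lose their signal.
    """
    n_rep = len(data[0][0])
    residuals = [standardize([x for obj in feat for x in obj]) for feat in data]
    selected = []
    for _ in range(min(k, len(data))):
        candidates = [f for f in range(len(data)) if f not in selected]
        best = max(candidates, key=lambda f: reproducibility(residuals[f], n_rep))
        selected.append(best)
        u = residuals[best]
        uu = sum(x * x for x in u)
        if uu < EPS:
            continue  # picked feature carries no remaining signal
        for f in candidates:
            if f == best:
                continue
            coef = sum(a * b for a, b in zip(residuals[f], u)) / uu
            residuals[f] = [a - coef * b for a, b in zip(residuals[f], u)]
    return selected


# Toy data: 4 features, 4 objects, 2 replicates per object.
toy = [
    [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.1]],      # reproducible
    [[2.0, 2.2], [4.0, 4.2], [6.0, 5.8], [8.0, 8.2]],      # exactly 2x feature 0
    [[1.0, 1.3], [-1.0, -0.8], [1.0, 0.7], [-1.0, -1.2]],  # second signal
    [[0.5, -0.5], [-0.3, 0.4], [0.2, -0.6], [-0.1, 0.3]],  # mostly noise
]
order = select_features(toy, k=3)
```

In the toy data, feature 1 is an exact multiple of feature 0. Once one of the two has been selected, the other has essentially zero residual and is passed over in favor of the second, independent signal. Because each selected residual is already orthogonal to the earlier ones, projecting out only the latest pick at each step is equivalent to projecting out the span of all features selected so far.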
