An integrative multiomics random forest framework for robust biomarker discovery

Abstract

BACKGROUND: High-throughput technologies now produce a wide array of omics data, from genomic and transcriptomic profiles to epigenomic and proteomic measurements. Integrating multiple omics layers measured on the same samples can reveal cross-layer molecular hubs that single-layer analyses miss. However, many existing integrative methods rely on linear assumptions or univariate feature importance, which limits their ability to capture nonlinear and interaction-driven dependencies across data modalities.

RESULTS: We present an unsupervised multivariate random forest (MRF) framework with an inverse minimal depth (IMD) importance measure to prioritize biomarkers shared across omics layers. In each forest, one layer serves as a multivariate response and the other as predictors. IMD summarizes how early a predictor (or, on the response side, the response variable that drives the maximal split) appears across trees, yielding interpretable cross-layer feature rankings. We provide two IMD-based selection strategies and introduce an optional IMD power transform to enhance sensitivity to interaction signals. In extensive simulations spanning linear, nonlinear, and interaction regimes, our method matches sparse partial least squares and sparse canonical correlation analysis under linear settings and outperforms them as nonlinearity increases, while univariate ensemble learners adapted to this task (random forest, gradient boosting machine, XGBoost) underperform in the multivariate, unsupervised context. Applied to breast invasive carcinoma and colon adenocarcinoma data from The Cancer Genome Atlas (TCGA), MRF-IMD identifies genes, CpGs, and microRNAs enriched for cancer-relevant pathways and yields more robust survival stratification than linear integrators with matched model sizes.
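To make the IMD idea concrete, the following is a minimal sketch of scoring variables by how early they split across a forest. It is not the authors' implementation. The abstract does not give the exact IMD formula, so this sketch assumes that depth is counted from the root at 0, that each tree contributes 1 / (1 + minimal depth) for a variable (0 if the variable never splits in that tree), and that the optional power transform raises each per-tree value to an exponent before averaging. The `Node` class is a hypothetical stand-in for whatever tree structure a fitted MRF exposes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical tree node: `feature` is the split variable, None for a leaf."""
    feature: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def minimal_depths(root: Node) -> Dict[str, int]:
    """Return the shallowest depth (root = 0) at which each variable splits in one tree."""
    depths: Dict[str, int] = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is None or node.feature is None:
            continue  # leaf or missing child: no split to record
        if depth < depths.get(node.feature, depth + 1):
            depths[node.feature] = depth
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return depths


def imd_scores(forest: List[Node], variables: List[str], power: float = 1.0) -> Dict[str, float]:
    """Average per-tree inverse minimal depth, (1 / (1 + depth)) ** power.

    Assumptions (not from the paper): the 1 / (1 + depth) form, the per-tree
    exponent `power`, and a contribution of 0 when a variable never splits.
    """
    if not forest:
        raise ValueError("forest must contain at least one tree")
    totals = {v: 0.0 for v in variables}
    for tree in forest:
        for var, depth in minimal_depths(tree).items():
            if var in totals:
                totals[var] += (1.0 / (1 + depth)) ** power
    return {v: total / len(forest) for v, total in totals.items()}


if __name__ == "__main__":
    # Two toy trees over predictors g1..g4; g4 never splits.
    forest = [
        Node("g1", Node("g2", Node(), Node()), Node()),
        Node("g1", Node("g3", Node(), Node()), Node("g2", Node(), Node())),
    ]
    scores = imd_scores(forest, ["g1", "g2", "g3", "g4"])
    print(scores)  # {'g1': 1.0, 'g2': 0.5, 'g3': 0.25, 'g4': 0.0}
    print(sorted(scores, key=scores.get, reverse=True))  # ['g1', 'g2', 'g3', 'g4']
```

Averaging per-tree inverse depths, rather than inverting an average depth, lets variables that never split in a tree contribute 0 instead of an arbitrary penalty depth; an exponent applied per tree (rather than to the final average) is what allows it to change the ranking at all.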
In a TCGA pan-cancer analysis, MRF-IMD features achieve a higher Adjusted Rand Index than competing methods and recover coherent tumor-type clusters. In data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the integrative signature improves stratification of dementia progression over a published methylation risk score.

CONCLUSIONS: MRF-IMD provides a scalable and interpretable framework for multiomics integration that reliably identifies cross-layer biomarkers when nonlinear and interaction-driven dependencies are present. This approach advances robust biomarker discovery beyond the limits of linear integrative methods.
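The pan-cancer comparison is scored with the Adjusted Rand Index (ARI), which measures agreement between a clustering and reference labels (here, tumor types), corrected for chance: 1 for identical partitions, and close to 0 for agreement no better than random. The following is a minimal standard-library sketch of the usual contingency-table formula. It is not the authors' evaluation code; in practice `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two partitions of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    # Contingency cell counts n_ij and the row/column marginals a_i, b_j.
    cells = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)
    index = sum(comb(c, 2) for c in cells.values())   # pairs grouped together in both
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only when both partitions are a single cluster, or both are all singletons.
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (label names do not matter)
    print(round(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]), 4))  # 0.5714
    print(adjusted_rand_index([0, 0, 0, 0], [0, 1, 2, 3]))  # 0.0
```

Because ARI depends only on which samples are grouped together, cluster labels produced by any clustering of the selected features can be compared directly with tumor-type labels without matching label names.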