Machine learning-based proteogenomic data modeling identifies circulating plasma biomarkers for early detection of lung cancer

Abstract

BACKGROUND: Genetic aberrations are among the critical drivers of lung cancer. How genetic variation affects proteomic dysregulation at the population level, and whether those effects can characterize potential diagnostic biomarkers, requires additional investigation. Modeling such proteogenomic interactions is crucial for understanding early-stage biological disruptions and for informing biomarker discovery, successful clinical trials, and the development of effective therapeutics.

METHODS: We investigated two complementary aspects of lung cancer risk. First, we performed a genome-wide association study of lung cancer using population-scale datasets, then examined whether lung cancer risk-associated variants influence plasma protein levels using UK Biobank Pharma Proteomics Project data. Second, we identified plasma proteomic dysregulations in presymptomatic and symptomatic patients, using machine learning methods to pinpoint diagnostic biomarkers.

RESULTS: Using the identified proteins, machine learning models achieved median cross-validated AUCs of 0.85-0.88 (0-4 years before diagnosis [YBD]), 0.81-0.84 (5-9 YBD), and 0.80-0.86 (0-9 YBD). In survival analyses within the 5-9 YBD group, elevated levels of eight proteins, including CALCB, PLAUR, and CD74, were significantly associated with lower survival. We identified 22 disease-associated proteins: 14 previously implicated in lung cancer, including CEACAM5, CXCL17, GDF15, and WFDC2, and 8 novel proteins. These proteins were enriched in pathways related to cytokine signaling, interleukin regulation, neutrophil degranulation, and lung fibrosis.

CONCLUSIONS: While these findings do not establish mechanistic causality, they highlight proteomic alterations that reflect systemic changes preceding diagnosis. Our study contributes to understanding genome-proteome relationships in lung cancer and identifies circulating proteins warranting further investigation as potential early biomarkers for screening and risk stratification.
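To make the "median cross-validated AUC" metric concrete, here is a minimal sketch of how such a number can be computed. It is not the study's pipeline: the abstract does not name the models, the fold scheme, or the software, so the single-marker score, the five stratified folds, the helper names (`auc`, `stratified_folds`, `median_cv_auc`), and the synthetic data are all assumptions made for illustration.

```python
import random
from statistics import median


def auc(labels, scores):
    """Rank-based AUC: chance that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with a roughly equal case/control ratio in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def median_cv_auc(labels, values, k=5, seed=0):
    """Median held-out AUC of a one-marker score across k folds.

    The 'model' is deliberately trivial (an assumption for this sketch):
    training only decides whether the marker is higher or lower in cases.
    """
    aucs = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        train_auc = auc([labels[i] for i in train], [values[i] for i in train])
        sign = 1.0 if train_auc >= 0.5 else -1.0
        aucs.append(auc([labels[i] for i in held_out],
                        [sign * values[i] for i in held_out]))
    return median(aucs)


# Synthetic example: 50 cases with a shifted marker level, 50 controls.
rng = random.Random(42)
labels = [1] * 50 + [0] * 50
values = ([rng.gauss(1.0, 1.0) for _ in range(50)]
          + [rng.gauss(0.0, 1.0) for _ in range(50)])
print(round(median_cv_auc(labels, values), 2))
```

Reporting the median over folds, rather than a single train/test split, summarizes held-out performance in a way that is less sensitive to one unusually easy or hard fold; a range such as 0.85-0.88 would then plausibly reflect the spread across different models or protein sets, though the abstract does not say which.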
