Application of Transcriptome-Based Gene Set Featurization for Machine Learning Model to Predict the Origin of Metastatic Cancer

Authors: Yeonuk Jeong, Jinah Chu, Juwon Kang, Seungjun Baek, Jae-Hak Lee, Dong-Sub Jung, Won-Woo Kim, Yi-Rang Kim, Jihoon Kang, In-Gu Do

Abstract

Identifying the primary site of origin of metastatic cancer is vital for guiding treatment decisions, especially for patients with cancer of unknown primary (CUP). Despite advanced diagnostic techniques, the primary site in CUP remains difficult to pinpoint, and CUP is responsible for a considerable number of cancer-related deaths. Knowing the origin is crucial for effective management and may improve patient outcomes. This study introduces a machine learning framework, ONCOfind-AI, that uses transcriptome-based gene set features to improve the accuracy of predicting the origin of metastatic cancers. We demonstrate that gene set scores, used to characterize transcriptome profiles generated on different platforms, can facilitate the integration of RNA sequencing and microarray data. Integrating data from different platforms improved the accuracy of machine learning models for predicting cancer origins. We validated our method using external data from clinical samples collected through the Kangbuk Samsung Medical Center and from Gene Expression Omnibus. External validation showed a top-1 accuracy ranging from 0.80 to 0.86, with a top-2 accuracy of 0.90. This study highlights that incorporating biological knowledge through curated gene sets can help merge gene expression data from different platforms, thereby providing the compatibility needed to develop more effective machine learning prediction models.
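To make the two ideas in the abstract concrete, the sketch below shows one simple way gene set features and top-k accuracy could be computed. It is an illustration only: the abstract does not state which gene set scoring method ONCOfind-AI uses, so the mean within-sample rank percentile used here is an assumption, and the function names gene_set_scores and top_k_accuracy, the gene names, and the gene sets are all hypothetical. The point it illustrates is that a rank-based score depends only on the ordering of genes within a sample, so two platforms that report the same ordering on different numeric scales yield the same features.

```python
import numpy as np

def gene_set_scores(expr, genes, gene_sets):
    """Score each gene set per sample as the mean within-sample rank percentile.

    expr: (n_samples, n_genes) expression matrix (any platform, any scale).
    genes: list of gene names matching the columns of expr.
    gene_sets: dict mapping a set name to a list of gene names.
    Returns an (n_samples, n_sets) array, columns ordered by sorted set name.
    NOTE: this scoring rule is an assumed stand-in, not the paper's method.
    """
    expr = np.asarray(expr, dtype=float)
    # Double argsort gives each gene's rank within its sample (0 = lowest);
    # ties are broken by column order. Dividing scales ranks to [0, 1].
    ranks = expr.argsort(axis=1).argsort(axis=1) / (expr.shape[1] - 1)
    index = {g: i for i, g in enumerate(genes)}
    columns = []
    for name in sorted(gene_sets):
        idx = [index[g] for g in gene_sets[name] if g in index]
        if not idx:
            raise ValueError(f"gene set {name!r} has no genes in the matrix")
        columns.append(ranks[:, idx].mean(axis=1))
    return np.column_stack(columns)

def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true label is among the k highest-scored classes."""
    probs = np.asarray(probs, dtype=float)
    top_k = np.argsort(-probs, axis=1)[:, :k]
    hits = (top_k == np.asarray(labels)[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical toy data: one sample measured on two scales (raw and log2).
genes = ["G1", "G2", "G3", "G4"]
gene_sets = {"SET_A": ["G2", "G4"], "SET_B": ["G1", "G3"]}
raw_scale = np.array([[1.0, 5.0, 3.0, 9.0]])
log_scale = np.log2(raw_scale + 1.0)  # monotone transform, same gene ordering

scores_raw = gene_set_scores(raw_scale, genes, gene_sets)
scores_log = gene_set_scores(log_scale, genes, gene_sets)

# Hypothetical class probabilities for three samples and three candidate origins.
probs = np.array([[0.60, 0.30, 0.10],
                  [0.20, 0.50, 0.30],
                  [0.40, 0.35, 0.25]])
labels = np.array([0, 2, 2])
top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
```

In this toy case scores_raw and scores_log are identical because the log transform preserves gene order within the sample. Real RNA sequencing and microarray data will not agree perfectly in rank, so the sketch shows why set-level rank features are more comparable across platforms than raw values, not that they remove platform effects entirely.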
