Benchmarking knowledge graph embedding models for the prediction of oligogenic combinations

对用于预测寡基因组合的知识图谱嵌入模型进行基准测试

阅读:2

Abstract

Identifying the potential oligogenic causes of rare diseases remains a challenge, notwithstanding the advancements made in the last decade. While a variety of predictive and ranking approaches have been proposed, their precision remains limited, as only a small number of high-quality training cases are available and it remains difficult to know which features may be most relevant for the design of new predictors. We hypothesize here that structured biological information, which provides an integration of various relevant biological networks and ontologies in a single heterogeneous knowledge graph, can make a difference as it allows for learning a relevant genetic representation through KGE methods. An exhaustive benchmarking is performed here wherein we assess the performance of various state-of-the-art embedding models for the task of identifying potentially pathogenic gene pairs. The results obtained show that these KGE provide highly accurate predictions, leading to an Area Under the Precision-Recall Curve of up to $0.93$, representing also a significant advancement over previous approaches for predicting gene pairs involved in oligogenic diseases. We show nonetheless that care needs to be taken in the cross-validation when using embeddings, as data leakage between folds in embedding space will reveal overly optimistic results. The further evaluation of the methods on a holdout set as well as on a group of new male infertility cases show that three Translational Distance models (TransE, MurE, and RotatE) and two of the Semantic Matching models (DisMult and QuatE) provide the better results. The analysis is concluded by comparing all known gene combinations for these top-ranking models, examining their similarities and differences. Overall, KGE provide a predictive advancement but new steps will need to be taken generate explanations as to why the pairs are relevant for oligogenic diseases.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。