PERADIGM: Phenotype embedding similarity-based rare disease gene mapping

Abstract

Identifying genes associated with rare diseases remains challenging due to the scarcity of patients and the limited statistical power of traditional association methods. Here, we introduce PERADIGM (Phenotype Embedding similarity-based RAre DIsease Gene Mapping), a novel framework that leverages natural language processing techniques to integrate comprehensive phenotype information from electronic health records for rare disease gene discovery. PERADIGM employs an embedding model to capture relationships between ICD-10 codes, providing a nuanced representation of individual phenotypes. By utilizing patient similarity scores, it enhances the identification of candidate genes associated with disease-specific phenotypes, surpassing conventional methods that rely on binary disease status. We applied PERADIGM to the UK Biobank dataset for three rare diseases: autosomal dominant polycystic kidney disease (ADPKD), Marfan syndrome, and neurofibromatosis type 1 (NF1). PERADIGM identified additional candidate genes associated with ADPKD-related and Marfan syndrome-related phenotypes, some of which are supported by existing literature, and demonstrated enhanced signal detection for NF1-specific phenotypes beyond traditional methods. Our findings demonstrate the potential of PERADIGM to identify genes associated with rare diseases and related phenotypes by incorporating phenotype embeddings and patient similarity, providing a powerful tool for precision medicine and a deeper understanding of rare disease genetics and clinical manifestations.
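
The idea of scoring patients by phenotype similarity rather than by binary disease status can be illustrated with a minimal sketch. It is not the PERADIGM implementation: the abstract does not say how code embeddings are pooled or how similarity is measured, so mean pooling and cosine similarity are assumptions here, as are the function names, the patient identifiers, and the made-up 3-dimensional vectors.

```python
import numpy as np

def patient_embedding(codes, code_vectors):
    """Average the embedding vectors of a patient's ICD-10 codes (assumed pooling)."""
    vectors = [code_vectors[c] for c in codes if c in code_vectors]
    if not vectors:
        raise ValueError("patient has no ICD-10 codes with a known embedding")
    return np.mean(vectors, axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_scores(patients, reference_ids, code_vectors):
    """Score every patient against the mean embedding of the reference patients."""
    embeddings = {pid: patient_embedding(codes, code_vectors)
                  for pid, codes in patients.items()}
    reference = np.mean([embeddings[pid] for pid in reference_ids], axis=0)
    return {pid: cosine_similarity(vec, reference)
            for pid, vec in embeddings.items()}

# Hypothetical toy embeddings; a real model would learn these from EHR data.
code_vectors = {
    "Q61.2": np.array([1.0, 0.0, 0.0]),  # polycystic kidney, autosomal dominant
    "N18.9": np.array([0.8, 0.2, 0.0]),  # chronic kidney disease, unspecified
    "I10":   np.array([0.5, 0.5, 0.0]),  # essential hypertension
    "J45.9": np.array([0.0, 0.0, 1.0]),  # asthma, unspecified
}
patients = {
    "p1": ["Q61.2", "N18.9"],  # diagnosed case, used as the reference
    "p2": ["N18.9", "I10"],    # no diagnosis, but overlapping phenotype
    "p3": ["J45.9"],           # unrelated phenotype
}
scores = similarity_scores(patients, ["p1"], code_vectors)
for pid in sorted(scores, key=scores.get, reverse=True):
    print(pid, round(scores[pid], 3))
```

In this toy setup the undiagnosed patient p2 receives a high score because of shared kidney-related codes, while p3 scores near zero. A continuous score of this kind carries more information than a case/control label, which is the property the abstract credits for the improved signal detection.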
