Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation

Authors: Shashikant Pujar, Nuala A O'Leary, Catherine M Farrell, Jane E Loveland, Jonathan M Mudge, Craig Wallin, Carlos G Girón, Mark Diekhans, If Barnes, Ruth Bennett, Andrew E Berry, Eric Cox, Claire Davidson, Tamara Goldfarb, Jose M Gonzalez, Toby Hunt, John Jackson, Vinita Joardar, Mike P Kay, Vamsi K K

Abstract

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.
