Celeste: A cloud-based genomics infrastructure with variant-calling pipeline suited for population-scale sequencing projects



Abstract

BACKGROUND: The All of Us Research Program (All of Us) is one of the world's largest sequencing efforts and will generate genetic data for over one million individuals from diverse backgrounds. This megaproject will create novel research platforms that integrate an unprecedented amount of genetic data with longitudinal health information. Here, we describe the design of Celeste, a resilient, open-source cloud architecture for implementing genomics workflows that has successfully analyzed petabytes of participant genomic information for All of Us, thereby providing other large-scale sequencing efforts with a comprehensive set of tools to power analysis. The Celeste infrastructure is highly scalable and has routinely processed fluctuating workloads of up to 9,000 whole-genome sequencing (WGS) samples per month for All of Us. It also lends itself to multiple projects. Serverless technology and container orchestration form the basis of Celeste's system for managing this volume of data.

RESULTS: In 12 months of production (within a single Amazon Web Services (AWS) Region), around 200 million serverless functions and over 20 million messages coordinated the analysis of 1.8 million bioinformatics, quality control, and clinical reporting jobs. Applying WGS analysis to clinical projects requires adapting variant-calling methods to improve the reliable detection of variants with known clinical importance. We therefore also share the process by which we tuned the variant-calling pipeline used by the multiple genome centers supporting All of Us to maximize precision and accuracy for low-fraction variant calls with clinical significance.

CONCLUSIONS: When combined with hardware-accelerated implementations for genomic analysis, Celeste had far-reaching, positive implications for turnaround time, dynamic scalability, security, and storage of analysis for one hundred thousand whole-genome samples and counting. Other groups may align their sequencing workflows to this harmonized pipeline standard, included within the Celeste framework, to meet clinical requirements for population-scale sequencing efforts. Celeste is available as an AWS deployment on GitHub and includes command-line parameters and software containers.
