A Parallelization Strategy for the Time Efficient Analysis of Thousands of LC/MS Runs in High-Performance Computing Environment

一种在高性能计算环境下高效分析数千次LC/MS运行的并行化策略

阅读:1

Abstract

Combining robust proteomics instrumentation with high-throughput enabling liquid chromatography (LC) systems (e.g., timsTOF Pro and the Evosep One system, respectively) enabled mapping the proteomes of 1000s of samples. Fragpipe is one of the few computational protein identification and quantification frameworks that allows for the time-efficient analysis of such large data sets. However, it requires large amounts of computational power and data storage space that leave even state-of-the-art workstations underpowered when it comes to the analysis of proteomics data sets with 1000s of LC mass spectrometry runs. To address this issue, we developed and optimized a Fragpipe-based analysis strategy for a high-performance computing environment and analyzed 3348 plasma samples (6.4 TB) that were longitudinally collected from hospitalized COVID-19 patients under the auspice of the Immunophenotyping Assessment in a COVID-19 Cohort (IMPACC) study. Our parallelization strategy reduced the total runtime by ∼90% from 116 (theoretical) days to just 9 days in the high-performance computing environment. All code is open-source and can be deployed in any Simple Linux Utility for Resource Management (SLURM) high-performance computing environment, enabling the analysis of large-scale high-throughput proteomics studies.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。