Protein Abundance Inference via Expectation Maximization in Fluorosequencing

Abstract

Fluorosequencing produces millions of single-peptide reads, yet a principled strategy for converting these data into quantitative protein abundances has been lacking. We introduce a probabilistic framework that adapts expectation maximization (EM) to the fluorosequencing measurement process, estimating relative protein abundances from the peptide inference results produced by previously developed peptide-classification tools. The algorithm iteratively updates the protein abundances to maximise the likelihood of the observed reads, yielding progressively more accurate abundance estimates. We first assess performance on simulated five-protein mixtures that reflect realistic labelling and system errors. A simple Python implementation processes one million reads in under ten seconds on a standard workstation and lowers the mean absolute error in relative abundance by more than an order of magnitude compared with a uniform-abundance guess, demonstrating robust protein inference in small-scale settings. Scalability is then evaluated with simulations of the complete human proteome (20 642 proteins). Ten million reads are processed in less than four hours on an NVIDIA DGX system using one Tesla V100 GPU, confirming that the method remains tractable at proteome scale. With error rates characteristic of current fluorosequencing, the algorithm yields only marginal improvements in relative-abundance accuracy. When the error rates are artificially lowered, however, the estimation error decreases significantly. This result suggests that improvements in fluorosequencing chemistry could translate directly into substantially more accurate quantitative proteomics with this computational framework. Together, these results establish EM-based inference as a scalable, model-driven bridge between peptide-level classification and protein-level quantification in fluorosequencing, laying computational groundwork for high-throughput single-molecule proteomics. The proposed framework can also serve as a refinement step within other inference methods, enhancing their protein abundance estimates.
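The abstract does not state the update equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It uses the textbook EM update for mixture proportions with fixed per-read likelihoods: the E-step assigns each read fractionally to proteins, and the M-step sets each abundance to its normalised expected read count. The names `em_abundances`, `simulate_reads` and `mean_abs_error`, the three-protein mixture, and the confusion matrix are all hypothetical and chosen for illustration; the input `likelihoods[r][p]` is assumed to be the probability of read r given protein p, as might be derived from a peptide classifier's output.

```python
import random


def em_abundances(likelihoods, n_iter=200, tol=1e-9):
    """Estimate relative abundances by EM for a mixture with fixed likelihoods.

    likelihoods[r][p] is an assumed input: P(read r | protein p).
    Returns a list of abundances that sums to 1.
    """
    n_proteins = len(likelihoods[0])
    pi = [1.0 / n_proteins] * n_proteins  # uniform starting guess
    for _ in range(n_iter):
        expected = [0.0] * n_proteins
        for row in likelihoods:
            # E-step: posterior probability that this read came from each protein
            weighted = [pi[p] * row[p] for p in range(n_proteins)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read has zero likelihood under every protein; skip it
            for p in range(n_proteins):
                expected[p] += weighted[p] / norm
        total = sum(expected)
        if total == 0.0:
            break  # no usable reads; keep the current estimate
        # M-step: abundances are the normalised expected read counts
        new_pi = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


def simulate_reads(true_pi, confusion, n_reads, seed=0):
    """Toy simulator (hypothetical): confusion[p][o] = P(observed label o | protein p)."""
    rng = random.Random(seed)
    labels = list(range(len(true_pi)))
    rows = []
    for _ in range(n_reads):
        source = rng.choices(labels, weights=true_pi)[0]
        observed = rng.choices(labels, weights=confusion[source])[0]
        # Likelihood of the observed label under each candidate protein
        rows.append([confusion[p][observed] for p in labels])
    return rows


def mean_abs_error(estimate, truth):
    return sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)


true_pi = [0.5, 0.3, 0.2]
confusion = [[0.8, 0.1, 0.1],
             [0.1, 0.8, 0.1],
             [0.1, 0.1, 0.8]]
reads = simulate_reads(true_pi, confusion, n_reads=5000)
estimate = em_abundances(reads)
uniform = [1.0 / len(true_pi)] * len(true_pi)
print("EM estimate:", [round(x, 3) for x in estimate])
print("MAE (EM):", mean_abs_error(estimate, true_pi))
print("MAE (uniform guess):", mean_abs_error(uniform, true_pi))
```

The comparison against a uniform guess mirrors the baseline named in the abstract, but the toy numbers say nothing about the reported results. A proteome-scale version would need vectorised or GPU arithmetic and a sparse representation of the likelihoods, which this sketch does not attempt.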
