Preventing Proteomics Data Tombs Through Collective Responsibility and Community Engagement


Authors: Vadadokhau, Uladzislau; Soliman, Mai; Castillon, Leticia; Pastor Muñoz, Paula; Id, Linda; Natraj Gayathri, Swethaa; Srivastava, Ankita; Runeberg, Tyko; González-Armijos, Tamara; Šapovalovaitė, Karina; Sakalauskaite, Milda; Adhikari, Sadiksha; Abe, Oluwatosin; Tohmola, Tiialotta; Li, Hao; Sundaresan, Srividhya; Vesikukka, Hanna; Roininen, Jannica; Zangene, Ehsan; Soliymani, Rabah; Tuomivaara, Sami T; Schwämmle, Veit; Saei, Amir A; Varjosalo, Markku; Jafari, Mohieddin

Journal:	Scientific Data	Impact factor:	6.900
Year:	2026	Citation:	2026 Jan 22;13(1)
DOI:	10.1038/s41597-026-06614-8

Abstract

Public proteomics repositories now host vast amounts of mass spectrometry data, yet much of it remains difficult to reuse, risking "data tombs" that are open access but not practically re-analyzable. In spring 2025, a graduate-level course at the University of Helsinki tasked six student teams with reanalyzing six projects from the PRoteomics IDEntifications (PRIDE) database (label-free quantification only) using a common R-based workflow (rpx, mzR, QFeatures, DEP/MSqRob2/limma/OmicsQ packages) that was shared across all teams. The teams reproduced identification, optional quantification, normalization, imputation, and differential expression analyses, and compared the outcomes to the original studies. As expected, systemic barriers recurred across cases: (i) no Sample and Data Relationship Format for Proteomics (SDRF-Proteomics) metadata in any of the cases; (ii) missing details regarding decoy sets for false discovery rate assessment; (iii) proprietary-only outputs or software (e.g., Thermo .msf, Progenesis) that impeded open reanalysis in interoperable, community-standard formats; (iv) missing data-independent acquisition spectral libraries or protein sequence database files (FASTA); (v) absent or vague normalization/imputation/statistical parameters; (vi) inconsistent file naming; and (vii) insufficient biological/technical replication in at least one project. These shortcomings yielded large discrepancies in the analysis results (e.g., 13,068 vs. 4,923 proteins; 108 vs. 11 differentially expressed proteins), and, in one instance, a highlighted protein lacked robust support in the deposited identifications. We observed that reproducibility in mass spectrometry-based proteomics hinges less on instruments than on transparent metadata, open formats, and executable analysis provenance.
We propose that data creators provide a minimum re-analysis package, including raw data and open formats, community standards, basic quality control summaries, data-independent acquisition spectral libraries, and complete parameter/code sets with pinned versions or containers. Moreover, we recommend repository-level nudges toward making such packages mandatory. This educational exercise both trains students and stress-tests community data practices to prevent proteomics "data tombs".
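
As an illustration only, the proposed minimum re-analysis package could be checked mechanically against a deposit's file listing. The sketch below is not the authors' tooling: the component names, the filename patterns, and the functions `check_reanalysis_package` and `missing_components` are all hypothetical, and the check covers only items that can be recognized from filenames (quality control summaries and data-independent acquisition spectral libraries are left out because they have no single conventional extension).

```python
from pathlib import PurePath

# Hypothetical mapping from package component to filename endings.
# These patterns are illustrative assumptions, not a community standard.
REQUIRED_COMPONENTS = {
    "raw_data": (".raw", ".wiff", ".d"),
    "open_format": (".mzml", ".mzid", ".mztab"),
    "sdrf_metadata": (".sdrf.tsv",),
    "fasta": (".fasta", ".fa"),
    "parameters_or_code": (".r", ".rmd", ".py", ".xml", ".json"),
    "pinned_environment": ("dockerfile", "renv.lock",
                           "environment.yml", "requirements.txt"),
}


def check_reanalysis_package(filenames):
    """Return {component: True/False} for a list of deposited file names."""
    # Compare on the lower-cased base name so paths and case do not matter.
    names = [PurePath(f).name.lower() for f in filenames]
    return {
        component: any(name.endswith(endings) for name in names)
        for component, endings in REQUIRED_COMPONENTS.items()
    }


def missing_components(filenames):
    """Return the sorted names of components with no matching file."""
    report = check_reanalysis_package(filenames)
    return sorted(name for name, present in report.items() if not present)
```

Calling `missing_components` on a deposit that contains only vendor raw files and a proprietary result file would list every other component as missing, which mirrors the barriers reported in the reanalyzed projects.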

