VEFill: a model for accurate and generalizable deep mutational scanning score imputation across protein domains

Abstract

BACKGROUND: Deep Mutational Scanning (DMS) assays can systematically assess the effects of amino acid substitutions on protein function. While DMS datasets have been generated for many targets, they often suffer from incomplete variant coverage due to technical constraints, limiting their utility in variant interpretation and downstream analyses.

RESULTS: We developed VEFill, a gradient boosting model for imputing missing DMS scores across protein domains. VEFill is trained on the Human Domainome 1 dataset, a large, standardized set of DMS experiments using a uniform stability-based assay, and integrates a broad set of additional biologically informative features including ESM-1v sequence embeddings, evolutionary conservation (EVE scores), amino acid substitution matrices, and physicochemical descriptors. The model achieved robust predictive performance (R² = 0.64, Pearson r = 0.80). It also demonstrated reliable generalization to unseen proteins in other stability-based datasets, while showing weaker performance on activity-based assays. Per-protein models further confirmed VEFill's effectiveness under limited-data conditions. A reduced two-feature version using only ESM-1v embeddings and mean DMS scores performed comparably to the full model, suggesting a computationally efficient alternative. However, true zero-shot prediction without positional context remains a challenge, particularly for functionally complex proteins.

CONCLUSIONS: VEFill offers an interpretable, scalable framework for DMS score imputation, especially effective in stability-focused and sparse-data settings. It enables systematic mutation prioritization and may support the design of efficient experimental libraries for variant effect studies.
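To make the role of positional context and the two reported metrics concrete, here is a minimal Python sketch. It is not VEFill. It is a naive baseline that fills each missing variant with the mean of the scores observed at the same position, then scores predictions with R² and Pearson r. It assumes that "mean DMS score" means a per-position average of observed scores, which the abstract does not spell out. All function names and the toy values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def position_means(observed):
    """Average the observed scores at each position.

    `observed` maps (position, mutant_residue) -> score.
    Assumption: per-position averaging; the abstract does not define the feature.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (pos, _residue), score in observed.items():
        totals[pos] += score
        counts[pos] += 1
    return {pos: totals[pos] / counts[pos] for pos in totals}


def impute_with_position_mean(observed, missing):
    """Fill each missing variant with the mean score of its position.

    Positions with no observed variant get None: this baseline has no
    positional context there and cannot make a prediction.
    """
    means = position_means(observed)
    return {variant: means.get(variant[0]) for variant in missing}


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy, made-up scores purely for illustration (not from any dataset).
observed = {(10, "A"): -1.0, (10, "G"): -3.0, (11, "L"): 0.5}
missing = [(10, "V"), (11, "I"), (12, "K")]
imputed = impute_with_position_mean(observed, missing)
```

In this toy example, position 12 has no observed variants, so the baseline returns `None` for it. That is a small-scale picture of why prediction without positional context is the harder case.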
