Class-balanced negative training sets for improving classifier model predictions of enhancer-promoter interactions

Abstract

BACKGROUND: Enhancers regulate gene expression by forming DNA loops that bring them into close proximity with their target gene promoters. The human genome contains hundreds of thousands of enhancers, vastly outnumbering its 20,000-25,000 protein-coding genes, which highlights the importance of enhancer-promoter interactions (EPIs) in gene regulation. Supervised learning models have been developed to predict EPIs, often using experimentally validated interacting enhancer-promoter pairs as positive samples and artificially generated pairs as negative samples. However, reliable negative samples are lacking. Current methods randomly select pairs from unlabeled data, leading to class imbalance and reduced predictive performance. This imbalance, in which enhancers and promoters are unevenly distributed between the positive and negative sets, hinders classifiers from learning meaningful patterns. Constructing more reliable negative samples is therefore crucial for improving the accuracy of EPI predictions.

RESULTS: We developed two methods to generate class-balanced negative training sets for EPI classifiers: one based on maximum flow and the other on Gibbs sampling. We evaluated these methods with the TargetFinder and TransEPI classifiers across five and six cell lines, respectively. The trained models were tested on a common negative test set. Our negative training sets significantly improved prediction performance across several metrics, including precision, recall, and area under the receiver operating characteristic curve.

CONCLUSIONS: Our findings demonstrate that carefully designed negative samples can enhance the performance of EPI classifiers. More advanced methods for generating negative EPIs should further improve prediction accuracy. The source code is available at https://github.com/maruyama-lab-design/CBOEP2 .
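
The abstract does not spell out the maximum-flow formulation, so the following is only a minimal sketch of one way such a method could work, not the authors' implementation in the CBOEP2 repository. It assumes that "class-balanced" means each enhancer and each promoter should appear in the negative set as often as it appears in the positive set. The function name `balanced_negatives` and the pre-filtered `candidates` list are hypothetical, and the flow solver is a plain Edmonds-Karp routine written with the Python standard library.

```python
from collections import defaultdict, deque


def balanced_negatives(positives, candidates):
    """Select negative pairs via maximum flow (illustrative assumption only).

    positives  : (enhancer, promoter) pairs labeled as interacting
    candidates : (enhancer, promoter) pairs permitted as negatives
                 (hypothetical pre-filtered list, e.g. by genomic distance)
    Each enhancer/promoter is used at most as often as it occurs in positives.
    """
    pos = set(positives)
    cap = defaultdict(int)   # residual capacity per directed edge (u, v)
    adj = defaultdict(set)   # neighbors, including reverse edges

    def add_edge(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    src, sink = "SRC", "SINK"
    for e, p in pos:
        add_edge(src, ("E", e), 1)    # capacity = positive count of enhancer e
        add_edge(("P", p), sink, 1)   # capacity = positive count of promoter p
    cand = sorted({(e, p) for e, p in candidates if (e, p) not in pos})
    for e, p in cand:
        add_edge(("E", e), ("P", p), 1)  # each candidate pair usable once

    # Edmonds-Karp: push one unit along a shortest augmenting path per round.
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    # A candidate edge with no residual capacity carries flow: keep it.
    return [(e, p) for e, p in cand if cap[(("E", e), ("P", p))] == 0]


positives = [("e1", "p1"), ("e2", "p2"), ("e3", "p3")]
candidates = [(e, p) for e in ("e1", "e2", "e3") for p in ("p1", "p2", "p3")]
negatives = balanced_negatives(positives, candidates)
```

In this toy input every enhancer and promoter occurs once among the positives, so a saturating flow returns three negative pairs that reuse each element exactly once. When the candidate list is too sparse for a saturating flow, the sketch returns fewer negatives and the balance is only approximate.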
