HKDE-LACM: a hybrid model for lactic acid bacteria classification via k-mer and DNABERT-2 embedding fusion with cyclic DE-BO optimization

Abstract

BACKGROUND: Lactic acid bacteria (LAB) play vital roles in food production and clinical applications. Accurate classification of LAB strains facilitates their functional development and targeted utilization. Although machine learning and deep learning methods have been widely applied to genome sequence classification, challenges remain in capturing comprehensive feature representations and in improving model generalizability.

RESULTS: We present HKDE-LACM, a hybrid classification model that integrates high-dimensional k-mer frequency features with contextual embeddings derived from DNABERT-2. To optimize model hyperparameters, we introduce a Cyclic Differential Evolution and Bayesian Optimization with Failure Avoidance (C-DBFA) framework. We evaluated the model with 10-fold cross-validation on three LAB datasets. Experimental results demonstrate that HKDE-LACM outperforms existing methods in both classification accuracy and robustness.

CONCLUSIONS: HKDE-LACM overcomes the limitations of traditional k-mer features by incorporating semantic embeddings, thereby enriching the representation of genomic sequences. In addition, the C-DBFA optimization framework allows the model to automatically identify optimal combinations of feature extractors and classifiers. These advantages enhance the model's generalization ability, making it a promising tool for genome-based LAB classification and related tasks.
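The idea of combining k-mer frequency features with a sequence embedding can be illustrated with a minimal sketch. The abstract does not specify how the two representations are fused, so plain concatenation is an assumption here, and the embedding vector is a hypothetical placeholder standing in for a DNABERT-2 output. The function names `kmer_frequencies` and `fuse_features` are illustrative and not taken from the paper.

```python
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies over ACGT, in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    # Slide a window of width k across the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


def fuse_features(kmer_vec, embedding_vec):
    """Fuse two feature vectors by concatenation (an assumed fusion strategy)."""
    return list(kmer_vec) + list(embedding_vec)


# Hypothetical placeholder; a real pipeline would use a model-derived embedding.
placeholder_embedding = [0.1, -0.2, 0.3]
freqs = kmer_frequencies("ACGTAC", k=2)
fused = fuse_features(freqs, placeholder_embedding)
```

With `k=2` the frequency vector has 4^2 = 16 entries, so the fused vector here has 19. In practice the k-mer block grows as 4^k, which is why such features are described as high-dimensional.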
