AliFilter: a machine learning approach to alignment filtering



Abstract

Multiple sequence alignments are a crucial step in many bioinformatic and computational biology analyses, from protein structure and function prediction to the inference of phylogenetic trees. However, highly divergent sequence alignments often contain a significant amount of noise. Reducing noise is normally achieved by filtering the alignment to remove columns that are poorly aligned or offer minimal useful information, either automatically using various software tools or through manual inspection. Manual approaches are labor-intensive and less reproducible but can utilize the researcher's specialist knowledge, rather than relying on filtering criteria that might not be adequate for each alignment. AliFilter bridges these two approaches to alignment curation, using machine learning to automate manual alignment filtering. AliFilter uses a supervised learning approach to create a model from a small number of manually annotated alignments, then applies this model to reproduce the manual annotation on different datasets. Users can employ the program with a default model or create customized models for individual datasets or filtering criteria. AliFilter accurately reproduces the results of manual annotation (98% accuracy) while being resilient to mistakes in the training data. In a typical phylogenomic workflow, AliFilter reduced the runtime by 35% and produced results that were almost identical to the full alignment, unlike other filtering tools we tested. AliFilter is free and open-source software; it is written in C# and distributed under a GPLv3 license from https://github.com/arklumpus/AliFilter, where both the source code and standalone executables for Windows, macOS, and Linux are available.
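The general idea of learning a column filter from a manually annotated alignment can be illustrated with a small Python sketch. It is not AliFilter's implementation: the abstract does not specify the features or the model, so the two per-column features (gap fraction and normalized Shannon entropy), the logistic regression classifier, the toy alignments, and all function names here are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def column_features(column):
    """Hypothetical features: [gap fraction, entropy normalized by log(20)]."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return [1.0, 0.0]
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((k / total) * math.log(k / total) for k in counts.values())
    return [column.count("-") / len(column), entropy / math.log(20)]


def columns_of(alignment):
    """Transpose a list of equal-length aligned sequences into columns."""
    return ["".join(col) for col in zip(*alignment)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(features, labels, epochs=2000, lr=0.5):
    """Fit logistic regression by batch gradient descent (1 = keep column)."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            err = score((weights, bias), x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def filter_alignment(model, alignment, threshold=0.5):
    """Keep only the columns the model scores at or above the threshold."""
    kept = [c for c in columns_of(alignment)
            if score(model, column_features(c)) >= threshold]
    if not kept:
        return ["" for _ in alignment]
    return ["".join(row) for row in zip(*kept)]


# Toy training alignment with a manual mask: '1' = keep, '0' = discard.
train_alignment = ["MKV-LAT", "MKVGLAT", "MRV--AS", "MKI--AT"]
train_mask = "1110011"

model = train(
    [column_features(c) for c in columns_of(train_alignment)],
    [int(m) for m in train_mask],
)

# Apply the learned filter to a different (toy) alignment.
new_alignment = ["MK-VA", "MKGVA", "MK-VA", "MK-VA"]
filtered = filter_alignment(model, new_alignment)
print(filtered)
```

In this toy setup the mask marks gap-rich columns for removal, so the fitted model learns to penalize the gap fraction. A realistic filter would need far more annotated columns and richer features than this sketch uses.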
