Semi-supervised machine learning for automated species identification by collagen peptide mass fingerprinting

Abstract

BACKGROUND: Biomolecular methods for species identification are increasingly used to study changing environments at both the microscopic and macroscopic levels. High-throughput peptide mass fingerprinting has largely been applied to bacterial identification, but it is increasingly used to identify archaeological and palaeontological skeletal material, yielding information on past environments and human-animal interaction. However, as applications move away from predominantly domesticated species and the more abundant wild fauna towards a much wider range of less common taxa that do not yet have genetically derived sequence information, robust methods of species identification and biomarker selection are needed.

RESULTS: Here we developed a semi-supervised machine learning algorithm for classifying the species of ancient remains based on collagen fingerprinting. The aim was to minimise the prior knowledge of known species required while still yielding satisfactory sensitivity and specificity. The algorithm runs iterations of a modified random forest classifier, combined with a similarity scoring system, to expand its set of identified samples. We tested it on a set of 6805 spectra and found that a high level of accuracy can be achieved with a training set of five identified specimens per taxon.

CONCLUSIONS: This method consistently achieves higher accuracy than two-dimensional principal component analysis and accuracy similar to that of hierarchical clustering with optimised parameters, while greatly reducing the need for human input. Within the Vertebrata, we demonstrate that this method achieves taxonomic resolution at the family or sub-family level, whereas genus- or species-level identification may require manual interpretation or further experiments. In addition, it identifies species biomarkers beyond those previously published.
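The iterative expansion of identified samples described in the Results can be illustrated with a minimal self-training sketch. This is not the authors' implementation: the abstract does not specify how the random forest is modified or how the similarity score is computed, so the sketch swaps in a plain mean Jaccard similarity vote in place of the classifier. The function names, the set-of-binned-peaks input format, the threshold value, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of binned peak masses."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def self_train(spectra, seed_labels, threshold=0.55, max_iter=10):
    """Iteratively expand the labelled set from a few identified specimens.

    spectra: dict of sample id -> set of binned peak masses (assumed format)
    seed_labels: dict of sample id -> taxon for the identified specimens
    threshold: minimum mean similarity to accept a new label (assumed value)
    """
    labels = dict(seed_labels)
    for _ in range(max_iter):
        # Group the currently labelled spectra by taxon.
        by_taxon = defaultdict(list)
        for sid, taxon in labels.items():
            by_taxon[taxon].append(spectra[sid])

        # Score every unlabelled spectrum against each taxon.
        accepted = {}
        for sid, peaks in spectra.items():
            if sid in labels:
                continue
            scores = {
                taxon: sum(jaccard(peaks, ref) for ref in refs) / len(refs)
                for taxon, refs in by_taxon.items()
            }
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                accepted[sid] = best

        if not accepted:
            break  # nothing new was identified, so stop iterating
        labels.update(accepted)  # newly identified samples join the references
    return labels


# Toy data: integer peak bins are invented and are not real collagen markers.
spectra = {
    "a1": {1, 2, 3, 4},
    "a2": {1, 2, 3, 5},
    "b1": {1, 2, 6, 7},
    "unk_a": {1, 2, 3, 4, 5},
    "unk_b": {1, 2, 6, 7, 8},
    "unk_c": {20, 21},
    "unk_d": {1, 2, 6, 8, 11},
}
seeds = {"a1": "taxon_A", "a2": "taxon_A", "b1": "taxon_B"}

result = self_train(spectra, seeds)
one_pass = self_train(spectra, seeds, max_iter=1)
```

In this toy run, `unk_d` falls below the threshold in the first pass and is only assigned to `taxon_B` once `unk_b` has joined the reference set, which is the point of iterating. `unk_c` shares no peaks with any reference and stays unlabelled, mirroring a sample left for manual interpretation.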
