Microbiology research was conducted for decades before sequencing resources and large culture collection sequence repositories became widely available, making it difficult to efficiently identify and validate strains used in historical studies. Similarly, it is challenging to find commercially available microbial strains that resemble strains of interest or that contain target genes found during metagenomic experiments. Despite tremendous advances in sequencing data availability, database curation, and sequence-searching software, identifying commercially available microbial strains from sequence data remains complicated and tedious. The American Type Culture Collection (ATCC) sells a wide variety of microbes and uniquely provides strain-level taxonomic classification and associated sequenced reference genomes for over four thousand isolates, with more added regularly. As researchers purchase and sequence isolates from ATCC, many sequences derived from ATCC isolates are deposited in public databases such as NCBI-Genome. Sequences uploaded to public databases vary in laboratory, bioinformatics, and metadata quality and can also contain mutations acquired during cultivation that are not representative of ATCC stocks. Using ATCC-sourced reference genomes ensures that consistent quality standards and analysis methodologies are applied, so that strain sequences are accurately represented. Currently, ATCC does not provide a method to search for sequence similarity between many query sequences and ATCC genomes. While NCBI-BLAST could be used to search queries against GenBank, with results filtered for "ATCC" tags, search result quality varies and the results require time-consuming sorting. Here we present the software ATCCfinder (GitHub: https://github.com/lanl/ATCCfinder, Zenodo: https://doi.org/10.5281/zenodo.15178103), which uses the ATCC application programming interface (API) to generate queryable databases from ATCC genome resources.
The algorithm generates databases of the four ATCC data types: strain-specific genome assembly sequence data (sequence), information about how each strain was collected (metadata, catalogue), and structural/functional information about genome assemblies (annotation). Once ATCCfinder has retrieved the ATCC sequences, nucleotide queries are compared against the ATCC reference genomes with the sequence alignment tool minimap2, and the results are parsed and analyzed to produce summaries of homologous sequence matches in ATCC-available strains. ATCCfinder also identifies and downloads new ATCC references, allowing users to keep their target search database up to date. ATCCfinder efficiently accesses, queries, and summarizes ATCC resources, identifying purchasable strains homologous to historical sequences, functional genes, operons, and other genetic components.
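The parse-and-summarize step described above can be illustrated with a minimal sketch. It assumes minimap2 was run with its default PAF output (for example `minimap2 references.fa queries.fa > hits.paf`) and relies only on the standard first 12 PAF columns. The function names, the identity and coverage thresholds, and the example records are hypothetical and chosen for illustration; they are not taken from ATCCfinder, whose actual filtering and summary logic is not described here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str        # query sequence name (PAF column 1)
    target: str       # reference sequence name (PAF column 6)
    identity: float   # matching bases / alignment block length
    query_cov: float  # aligned query span / query length


def parse_paf_line(line: str) -> Hit:
    """Parse the mandatory PAF columns of one minimap2 alignment record."""
    f = line.rstrip("\n").split("\t")
    qlen, qstart, qend = int(f[1]), int(f[2]), int(f[3])
    n_match, block_len = int(f[9]), int(f[10])
    return Hit(
        query=f[0],
        target=f[5],
        identity=n_match / block_len,
        query_cov=(qend - qstart) / qlen,
    )


def best_hits(paf_lines, min_identity=0.9, min_cov=0.8):
    """Keep the best-scoring hit per query above the (hypothetical) thresholds."""
    best = {}
    for line in paf_lines:
        if not line.strip():
            continue
        hit = parse_paf_line(line)
        if hit.identity < min_identity or hit.query_cov < min_cov:
            continue
        current = best.get(hit.query)
        # Rank by identity first, then by query coverage.
        if current is None or (hit.identity, hit.query_cov) > (
            current.identity,
            current.query_cov,
        ):
            best[hit.query] = hit
    return best


# Made-up PAF records; the target names are placeholders, not real ATCC IDs.
example = [
    "geneA\t1000\t0\t1000\t+\tATCC_hypothetical_1\t5000000\t100\t1100\t990\t1000\t60",
    "geneA\t1000\t0\t900\t+\tATCC_hypothetical_2\t4000000\t50\t950\t850\t900\t60",
    "geneB\t500\t0\t200\t-\tATCC_hypothetical_1\t5000000\t2000\t2200\t200\t200\t30",
]

summary = best_hits(example)
for query, hit in summary.items():
    print(f"{query}\t{hit.target}\t{hit.identity:.3f}\t{hit.query_cov:.3f}")
```

In this sketch a query is reported only if at least one alignment passes both thresholds, so a short partial match such as the `geneB` record is dropped rather than reported as a candidate strain.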
Use ATCCfinder to identify commercially available American Type Culture Collection strains based on sequence queries.
Authors: Koehler Samuel I, Middlebrook Earl A, Hovde Blake T, Hanschen Erik R
| Journal: | PeerJ | Impact factor: | 2.400 |
| Year: | 2025 | Citation: | 2025 Aug 13; 13:e19832 |
| doi: | 10.7717/peerj.19832 | ||
