Microbiology research was conducted for decades before sequencing resources and large culture collection sequence repositories became widely available, making it challenging to efficiently identify and validate strains used in historical studies. Similarly, it is challenging to find commercially available microbial strains that are similar to strains of interest or that contain target genes of interest identified in metagenomic experiments. Despite tremendous advances in sequencing data availability, database curation, and sequence-search software, identifying commercially available microbial strains from sequence data remains complicated and tedious. The American Type Culture Collection (ATCC) is an organization that sells a wide variety of microbes and is unique in providing strain-level taxonomic classification and associated sequenced reference genomes for over four thousand isolates, with more added regularly. As researchers purchase and sequence ATCC isolates, many sequences derived from them are deposited in public databases such as NCBI-Genome. Sequences uploaded to public databases vary in laboratory, bioinformatic, and metadata quality, and can also contain mutations acquired during cultivation that are not representative of ATCC stocks. Using ATCC-sourced reference genomes ensures that strain sequences are represented with consistent quality and consistent analysis methods. Currently, ATCC does not provide a way to search many query sequences against ATCC genomes for sequence similarity. While NCBI-BLAST could be used to search queries against GenBank, with results filtered for "ATCC" tags, search result quality varies and the results require time-consuming sorting. Here we present the software ATCCfinder (GitHub: https://github.com/lanl/ATCCfinder, Zenodo: https://doi.org/10.5281/zenodo.15178103), which uses the ATCC application programming interface (API) to generate queryable databases from ATCC genome resources.
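To make the manual alternative concrete, the following is a minimal sketch of the "filter BLAST results for ATCC tags" step. It assumes BLAST tabular output requested with a custom column list (`-outfmt "6 qseqid sseqid pident length evalue bitscore stitle"`); the column choice, the regular expression, and the function name `atcc_hits` are illustrative assumptions and are not part of ATCCfinder.

```python
import re
from collections import defaultdict

# Assumed column order for BLAST tabular output requested with, for example:
#   -outfmt "6 qseqid sseqid pident length evalue bitscore stitle"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore", "stitle"]

# Illustrative (hypothetical) pattern for designations such as "ATCC 25922" or "ATCC BAA-1705".
ATCC_TAG = re.compile(r"\bATCC[\s:_-]*((?:BAA-)?\d+)", re.IGNORECASE)


def atcc_hits(lines):
    """Group BLAST hits by query, keeping only subjects whose title carries an ATCC tag."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        fields = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        tag = ATCC_TAG.search(fields["stitle"])
        if tag is None:
            continue  # untagged records are dropped, however good the alignment
        hits[fields["qseqid"]].append(
            (tag.group(1).upper(), float(fields["pident"]), float(fields["bitscore"]))
        )
    # Best bitscore first within each query.
    return {q: sorted(h, key=lambda t: t[2], reverse=True) for q, h in hits.items()}
```

The sketch shows why this route needs manual sorting: a record whose title lacks an ATCC designation is discarded even when it is the best alignment, and a tagged title says nothing about whether the deposited sequence still matches current ATCC stock.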
The algorithm generates databases of the four ATCC data types: strain-specific genome assembly sequence data (sequence); information about how each strain was collected (metadata and catalogue); and structural and functional information about genome assemblies (annotation). Once ATCCfinder has retrieved ATCC sequences, nucleotide queries are compared against ATCC reference genomes with the sequence alignment tool minimap2, and the results are parsed and analyzed to produce summaries of homologous sequence matches in strains available from ATCC. ATCCfinder identifies and downloads new ATCC references, allowing users to keep their target search database up to date. ATCCfinder efficiently accesses, queries, and summarizes ATCC resources, identifying purchasable strains homologous to historical sequences, functional genes, operons, and other genetic components.
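As an illustration of the align, parse, and summarize step, here is a minimal sketch that reads minimap2's default PAF output (for example from `minimap2 atcc_refs.fa queries.fa > hits.paf`, where the file names are hypothetical) and ranks strains per query. It is not ATCCfinder's actual code: the identity and coverage definitions, the thresholds, the contig-to-strain mapping (which in practice would presumably come from the metadata or annotation databases), and all function names are assumptions. A set difference sketches how new references could be detected against a local copy.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    target: str
    identity: float        # residue matches / alignment block length (PAF cols 10, 11)
    query_coverage: float  # aligned query span / query length (PAF cols 2-4)


def parse_paf(lines):
    """Yield one Hit per PAF record; only the 12 mandatory columns are read."""
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        qlen, qstart, qend = int(f[1]), int(f[2]), int(f[3])
        matches, block_len = int(f[9]), int(f[10])
        yield Hit(f[0], f[5], matches / block_len, (qend - qstart) / qlen)


def summarize(lines, target_to_strain, min_identity=0.95, min_coverage=0.8):
    """Rank strains per query by the best single alignment passing both thresholds.

    Each alignment is scored on its own; split alignments of one query to the same
    genome are not merged in this sketch. Thresholds are illustrative assumptions.
    """
    best = {}
    for hit in parse_paf(lines):
        if hit.identity < min_identity or hit.query_coverage < min_coverage:
            continue
        strain = target_to_strain.get(hit.target, "unknown")
        score = hit.identity * hit.query_coverage
        key = (hit.query, strain)
        if key not in best or score > best[key]:
            best[key] = score
    summary = {}
    for (query, strain), score in best.items():
        summary.setdefault(query, []).append((strain, round(score, 4)))
    for rows in summary.values():
        rows.sort(key=lambda r: r[1], reverse=True)
    return summary


def new_references(remote_ids, local_ids):
    """IDs listed by the remote catalogue that are not yet downloaded locally (hypothetical)."""
    return sorted(set(remote_ids) - set(local_ids))
```

Scoring by identity multiplied by query coverage is one simple choice that penalizes both divergent and partial matches; a real tool might instead report the two values separately so users can apply their own cutoffs.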
Use ATCCfinder to identify commercially available American Type Culture Collection strains based on sequence queries
Authors: Samuel I Koehler, Earl A Middlebrook, Blake T Hovde, Erik R Hanschen
| Journal: | PeerJ | Impact factor: | 2.300 |
| Year: | 2025 | Citation: | 2025 Aug 13;13:e19832 |
| DOI: | 10.7717/peerj.19832 | | |
