Abstract
Cryo-electron tomography (cryo-ET) is a powerful technique for imaging molecular complexes in their native cellular environments. However, identifying the vast majority of molecular species in cellular tomograms remains prohibitively difficult. Machine learning (ML) methods provide an opportunity to automate the annotation process, but algorithm development has been hindered by the lack of large, standardized datasets. Here we present an experimental phantom dataset with comprehensive ground-truth annotations for six molecular species to spur new algorithm development and benchmark existing tools. This annotated dataset is available on the CryoET Data Portal with infrastructure to streamline access for methods developers across fields.