Abstract
Accurate localization and identification of protein complexes in cryo-electron tomography (cryo-ET) volumes are essential for understanding cellular functions and disease mechanisms. However, automated annotation of these macromolecular assemblies remains challenging because of low signal-to-noise ratios, missing-wedge artifacts, heterogeneous backgrounds, and structural diversity. In this study, we present a hybrid framework that integrates You Only Look Once (YOLO) object detection with UNet3D volumetric segmentation, followed by density-based spatial clustering of applications with noise (DBSCAN) post-processing, for automated protein particle annotation in cryo-ET volumes. Our approach combines YOLO's efficient region proposal with UNet3D's 3D feature extraction in a dual-branch architecture featuring optimized Spatial Pyramid Pooling-Fast (SPPF) modules and asymmetric feature splitting. Extensive experiments on the Chan Zuckerberg Initiative Imaging (CZII) cryo-ET dataset show that our method significantly outperforms existing state-of-the-art approaches, including DeepFinder, standard UNet3D, YOLOv5-3D, and 3D ResNet models, achieving a mean recall of 0.8848 and an F4-score of 0.7969. The framework performs robustly across protein particle types and imaging conditions, offering a promising technical solution for high-throughput structural biology workflows that require accurate macromolecular annotation in cellular cryo-ET data.
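For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F4-score can be computed, assuming it denotes the standard F-beta measure with beta = 4, which weights recall far more heavily than precision. The function name `fbeta_score` and the counts in the usage lines are hypothetical illustrations and are not taken from the experiments.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 4.0) -> float:
    """Return the F-beta score from true-positive, false-positive and
    false-negative counts. With beta > 1, recall counts more than precision.

    Assumption: the F4-score is the standard F-beta measure with beta = 4.
    """
    b2 = beta * beta
    # Equivalent to (1 + b2) * P * R / (b2 * P + R), written in terms of counts.
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts: 80 particles found, 40 false detections, 20 missed.
f4 = fbeta_score(tp=80, fp=40, fn=20)            # recall 0.80, precision ~0.67
f1 = fbeta_score(tp=80, fp=40, fn=20, beta=1.0)  # balanced F1 for comparison
print(f"F4 = {f4:.4f}, F1 = {f1:.4f}")
```

Under this definition a missed particle costs sixteen times as much as a false detection, so a detector tuned for high recall scores better on F4 than on the balanced F1.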