Abstract
Accurate identification of influenza virus surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA), is critical for understanding viral morphology and supporting vaccine development. Recent machine learning (ML) methods have shown promise for spike analysis, but they require large annotated datasets and often lack biological interpretability. Here, we present an unsupervised, morphology-based approach for classifying HA and NA spikes directly from cryo-electron microscopy (cryo-EM) reconstructions. Our pipeline integrates fuzzy connectedness segmentation with a simple geometric descriptor, the head-to-stem width ratio, to distinguish spike types automatically. We first applied the method to experimental influenza B virus (B/Lee/40) reconstructions, where the approach identified HA- and NA-like morphologies consistent with biological expectations, though ground truth could not be directly established. To provide quantitative validation, we then generated synthetic three-dimensional (3D) virus phantoms with randomized spike distributions, simulated cryo-EM acquisition and reconstruction, and applied our classification procedure to the segmented spikes. Across 30 independent phantom reconstructions, the method achieved an average classification purity of 97.5%. This framework reduces manual annotation effort, improves reproducibility, and provides interpretable, high-confidence spike labels. Most importantly, the annotations can support training of supervised ML classifiers, bridging classical image analysis and modern data-driven approaches. By enabling scalable spike classification in cryo-EM datasets, the method offers a practical tool for structural virology and influenza surveillance.
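As an illustration of the geometric descriptor named above, the following minimal Python sketch computes a head-to-stem width ratio from a binary mask of a single segmented spike and scores a labeling by purity. It is not the pipeline described here: the function names, the alignment of the spike axis with array axis 0 (membrane end at index 0), the equivalent-circle width measure, the `head_fraction` and `threshold` values, and the mapping of high ratios to "NA-like" are all assumptions made for the example. The purity function uses the standard clustering definition, which may differ from the one used in the reported evaluation.

```python
import numpy as np


def width_profile(mask: np.ndarray) -> np.ndarray:
    """Equivalent-circle diameter of each cross-section along axis 0."""
    areas = mask.reshape(mask.shape[0], -1).sum(axis=1)
    return 2.0 * np.sqrt(areas / np.pi)


def head_to_stem_ratio(mask: np.ndarray, head_fraction: float = 0.4) -> float:
    """Mean head width divided by mean stem width.

    Assumes the spike axis is axis 0 with the membrane end at index 0.
    head_fraction (hypothetical) is the share of the spike length
    treated as the head.
    """
    widths = width_profile(mask.astype(bool))
    occupied = np.nonzero(widths)[0]
    if occupied.size < 2:
        raise ValueError("mask must span at least two slices along axis 0")
    # Trim empty slices below and above the spike.
    widths = widths[occupied[0]: occupied[-1] + 1]
    n_head = int(round(head_fraction * len(widths)))
    n_head = min(max(n_head, 1), len(widths) - 1)
    stem, head = widths[:-n_head], widths[-n_head:]
    return float(head.mean() / stem.mean())


def classify_spike(mask: np.ndarray, threshold: float = 1.5) -> str:
    """Label a spike by thresholding its ratio.

    The threshold and the label direction are illustrative assumptions.
    """
    return "NA-like" if head_to_stem_ratio(mask) > threshold else "HA-like"


def purity(predicted, truth) -> float:
    """Standard clustering purity: majority-class share summed over groups."""
    predicted = np.asarray(predicted)
    truth = np.asarray(truth)
    matched = 0
    for label in np.unique(predicted):
        _, counts = np.unique(truth[predicted == label], return_counts=True)
        matched += counts.max()
    return matched / len(truth)


# Toy masks: a uniform rod and a spike with a wide head on a narrow stem.
rod = np.zeros((10, 9, 9), dtype=bool)
rod[:, 3:6, 3:6] = True

mushroom = np.zeros((10, 9, 9), dtype=bool)
mushroom[:6, 3:6, 3:6] = True   # stem: 3x3 cross-section
mushroom[6:, 1:8, 1:8] = True   # head: 7x7 cross-section

labels = [classify_spike(rod), classify_spike(mushroom)]
```

On real reconstructions the spike axis would first have to be estimated and the mask resampled along it; the sketch skips that step.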