Abstract
Understanding the stoichiometry and associated stability of virus-like particles (VLPs) is crucial for optimizing their assembly efficiency and immunogenic properties, which are essential for advancing biotechnology, vaccine design, and drug delivery. However, current experimental methods for determining VLP stoichiometry are labor-intensive and time-consuming, and machine learning approaches have rarely been applied to the study of VLPs. To address this challenge, we introduce a novel persistent-Laplacian-based machine learning model that leverages both harmonic and nonharmonic spectra to capture intricate topological and geometric features of VLP structures. This approach achieves superior performance on the VLP200 data set compared with existing methods. To further assess robustness and generalizability, we collected a new data set, VLP706, containing 706 VLP samples with expanded stoichiometry diversity. Our persistent-Laplacian-based machine learning model maintains strong predictive accuracy on VLP706. Additionally, through random sequence perturbative mutation analysis, we found that 60-mers and 180-mers exhibit greater stability than 240-mers and 420-mers.