Abstract
Polysorbate 20 (PS20) and polysorbate 80 (PS80) are essential surfactants used to stabilize biopharmaceutical products, yet their highly heterogeneous composition and susceptibility to oxidation and enzymatic hydrolysis complicate routine analysis. We developed a hierarchical generative model that reconstructs entire liquid chromatography-mass spectrometry (LC-MS) measurements to automatically interpret complex polysorbate datasets. By embedding domain knowledge of base structures, oxyethylene chain lengths, fatty acid esterification, and isotope patterns, the model resolves individual subspecies and reports composition at the molecular level. Applied to PS20 and PS80, the approach distinguishes oxidative from hydrolytic degradation and yields pathway-specific fingerprints. Model outputs agree closely with manual integration while delivering greater depth and automation. The method thereby turns polysorbate analysis from a labor-intensive, peak-by-peak workflow into an objective, comprehensive characterization tool suited to quality control, batch selection, and degradation monitoring throughout development and manufacturing.