Abstract
Live biotherapeutic products (LBPs) deliver microbial strains that modulate the host microbiota to promote health or to prevent or treat disease. Because the host microbiota already contains endogenous strains, accurately evaluating LBP efficacy and mechanism of action requires distinguishing administered strains from endogenous ones. Although computational tools exist for inferring strains from short-read metagenomic data, few have been rigorously tested in the context of LBP treatment. Here, we assess the ability of one such tool, StrainFacts, to estimate the abundances and genotypes of both endogenous and administered strains. We performed a simulation study of a single-strain LBP trial, modeling serial samples across a range of administered-strain abundances, co-occurring endogenous strains, and sequencing depths. StrainFacts accurately estimated the abundances and genotypes of both LBP and endogenous strains within simulated samples. We further validated the method using human vaginal microbiota samples spiked with CTV-05, the Lactobacillus crispatus strain in the LBP LACTIN-V, which has been shown to reduce the recurrence of bacterial vaginosis. Our findings demonstrate that StrainFacts can robustly assess the colonization, abundance, and dynamics of LBP and endogenous strains in both simulated and experimental microbiota samples, supporting its use for analyzing data from vaginal LBP therapeutic trials.