Abstract
MOTIVATION: The gut microbiota interacts closely with the host and plays crucial roles in maintaining health. Analysing time-series genomic data makes it possible to investigate dynamic changes in the microbiota. However, missing values pose significant analytical challenges. RESULTS: We propose a microbiome imputation framework based on a conditional score-based diffusion model, tailored to microbiome data through phylogenetic convolutional layers. Our method reduces mean absolute error across a range of missing-data ratios for both 16S rRNA and whole-genome shotgun profiles. The imputed datasets improve downstream predictive tasks, achieving area under the curve scores that exceed or are comparable to those of existing methods. To further improve performance, we embedded host metadata into the model using a tabular encoding approach, which yielded additional gains, particularly at higher missing ratios. Our findings underscore the potential of diffusion models for processing time-series microbiome data with missing values. AVAILABILITY AND IMPLEMENTATION: The code and dataset are available at https://github.com/misatoseki/metag_time_impute_phylo.git.