Abstract
BACKGROUND: Biological factors affect the human microbiome, so future population studies need realistic sample size estimates. METHODS: We assessed the temporal stability of fecal microbiome diversity, species composition, genes, and functional pathways using shallow shotgun metagenome sequencing. We used intraclass correlation coefficients (ICC) to measure biological variability over 6 months. We then estimated the number of cases needed for 1:1 and 1:3 matched case-control studies at significance levels of 0.05 and 0.001 with 80% power, according to the number of fecal specimens collected per participant. RESULTS: The fecal microbiome varied over the 6 months, with ICC < 0.6 for most alpha and beta diversity metrics. Stability of species, genes, and pathways was heterogeneous (ICC, 0.0-0.9). Detecting an odds ratio (OR) of 1.5 per standard deviation (SD) required 1,000 to 5,000 cases (significance level 0.05 for alpha and beta diversity; 0.001 for species, genes, and pathways) with equal numbers of cases and controls. Low-prevalence species required 15,102 cases and high-prevalence species required 3,527. Genes and pathways had similar requirements. In a 1:3 matched case-control study with one fecal specimen per participant, 10,068 cases were needed for low-prevalence species and 2,351 for high-prevalence species. For an OR of 1.5, the number of cases needed for low-prevalence species fell as more specimens were collected per participant: 15,102 (one specimen), 8,267 (two specimens), and 5,989 (three specimens). CONCLUSIONS: Detecting disease associations requires a large number of cases. Collecting repeated prediagnostic samples and matching each case to more controls could reduce the number of cases needed. IMPACT: Our results will inform the design of future epidemiologic studies and support the implementation of well-powered microbiome studies.
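
The abstract does not state the formulas behind these case counts, so the following is only a sketch under two standard assumptions that are not confirmed by the source: that moving from 1:1 to 1:m matching rescales the required cases by (1 + 1/m) / 2, and that averaging n specimens raises reliability according to the Spearman-Brown formula, with required cases inversely proportional to reliability. The function names and the back-calculated ICC are hypothetical and used for illustration; the ICC is inferred from the reported one- and two-specimen counts, not reported in the abstract.

```python
def scale_for_controls(cases_1to1: float, controls_per_case: int) -> int:
    """Rescale a 1:1 case count to a 1:m design.

    Assumption: the variance of a case-control contrast is proportional
    to (1 + 1/m), so the count scales by (1 + 1/m) / 2 relative to 1:1.
    """
    return round(cases_1to1 * (1 + 1 / controls_per_case) / 2)


def scale_for_repeats(cases_one_specimen: float, icc: float, n_specimens: int) -> int:
    """Rescale a single-specimen case count when n specimens are averaged.

    Assumption: Spearman-Brown gives the reliability of the mean as
    n * icc / (1 + (n - 1) * icc), and required cases are inversely
    proportional to reliability. The ratio icc / reliability simplifies
    to (1 + (n - 1) * icc) / n, which is used directly below.
    """
    return round(cases_one_specimen * (1 + (n_specimens - 1) * icc) / n_specimens)


if __name__ == "__main__":
    # Case counts reported in the abstract for an OR of 1.5 (1:1 design).
    low_prev_cases = 15_102
    high_prev_cases = 3_527

    # 1:3 matching, one specimen per participant.
    print(scale_for_controls(low_prev_cases, 3))
    print(scale_for_controls(high_prev_cases, 3))

    # ICC back-calculated (illustrative only) from the reported
    # one-specimen and two-specimen counts for low-prevalence species.
    implied_icc = 2 * 8_267 / low_prev_cases - 1
    print(round(implied_icc, 3))

    # Predicted count for three specimens under the same assumptions.
    print(scale_for_repeats(low_prev_cases, implied_icc, 3))
```

Under these assumptions the sketch reproduces the reported 1:3 counts (10,068 and 2,351) and, using an implied ICC of roughly 0.09 for low-prevalence species, the reported three-specimen count of 5,989. This agreement suggests the assumptions are at least consistent with the reported figures, but it does not establish that the authors used these exact formulas.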