Abstract
PURPOSE: We aim to develop a conditional generative diffusion model that produces three-dimensional (3D) trabecular bone samples whose structure can be tuned to prescribed values of three geometric metrics of trabecular microarchitecture: bone volume fraction (BV/TV), trabecular thickness (Tb.Th), and trabecular spacing (Tb.Sp). APPROACH: The generative model is based on 3D latent diffusion. The latent representation of trabecular patches is obtained with a dedicated variational autoencoder (VAE). To control the microstructure characteristics of the synthetic samples, the model is conditioned on BV/TV, Tb.Th, and Tb.Sp. In addition, a shifting-slab inference method is employed to generate extended volumes with locally tunable microstructure in a computationally efficient manner. The training data comprised 3551 volumes of interest (VOIs) of 128 × 128 × 128 voxels, extracted from micro-CT volumes (50 μm voxel size) of 20 femoral bone specimens and paired with trabecular metrics computed within each VOI; the data were split 9:1 into training and validation sets. For testing, 2000 synthetic bone samples were generated using single-slab inference over a wide range of condition (target) microstructure metrics. Results were evaluated in terms of (i) consistency across multiple realizations of reverse diffusion for a fixed condition, measured by the coefficient of variation (CV) of trabecular measurements; (ii) agreement between the BV/TV, Tb.Th, and Tb.Sp values provided as a condition and those measured in the corresponding synthetic samples, assessed using the Pearson correlation coefficient (PCC); and (iii) overlap between the distributions of trabecular parameters of real and synthetic bone patches; this coverage analysis included both the conditioning parameters (BV/TV, Tb.Th, and Tb.Sp) and the unconditioned metrics of degree of anisotropy, ellipsoid factor, and connectivity.
Further, extended volumes (128 × 128 × 256 voxels) were generated using shifting-slab inference with spatially invariant and spatially varying conditioning and evaluated in terms of local agreement between the prescribed and achieved trabecular parameters. RESULTS: Visually, the synthesized cancellous bone patches appear highly similar to the training micro-CT data. The conditioned parameters of the generated volumes agree well with their target values (PCC of 0.99, 0.97, and 0.95 for BV/TV, Tb.Th, and Tb.Sp, respectively). There is a trend toward generating trabeculae that are slightly thicker than prescribed, but this bias is typically on the order of one voxel (50 μm). BV/TV, Tb.Th, and Tb.Sp remain stable across multiple model inferences with a fixed condition (CV ≤ 6%). Joint distributions of the microstructure parameters of the synthetic samples capture the real-world distributions of the training data, with a slight underrepresentation of cases with large Tb.Sp (≥ 1200 μm), attributed to imbalances in the training set. The shifting-slab mechanism produced realistic and continuous trabecular structures with variable local architecture that accurately matched the prescribed spatial variation of the conditioned microstructure metrics. CONCLUSION: The proposed generative model is capable of generating realistic digital trabecular bone patches. Latent-space diffusion using a dedicated VAE, augmented with the shifting-slab mechanism, effectively overcomes computer memory constraints to enable the synthesis of 3D volumes. The conditioning mechanism is effective in guiding the synthesis toward desired microstructural characteristics. Possible applications include virtual clinical trials of new skeletal image biomarkers and establishing priors for advanced image reconstruction.
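The evaluation quantities named above (BV/TV, the CV across repeated realizations, and the PCC between target and measured values) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the input is assumed to be an already segmented (binary) volume, and the use of the sample standard deviation (`ddof=1`) for the CV is an assumption, since the abstract does not state which estimator was used. Tb.Th and Tb.Sp require local-thickness analysis and are not covered here.

```python
import numpy as np


def bv_tv(binary_volume):
    """Bone volume fraction: bone voxels divided by all voxels in the VOI.

    Assumes the volume is already segmented (True/nonzero = bone).
    """
    vol = np.asarray(binary_volume, dtype=bool)
    return vol.sum() / vol.size


def coefficient_of_variation(values):
    """CV of a metric over repeated realizations for one fixed condition.

    Uses the sample standard deviation (ddof=1); this choice is an assumption.
    """
    v = np.asarray(values, dtype=float)
    return v.std(ddof=1) / v.mean()


def pearson_cc(target, measured):
    """Pearson correlation between prescribed and measured metric values."""
    t = np.asarray(target, dtype=float)
    m = np.asarray(measured, dtype=float)
    return np.corrcoef(t, m)[0, 1]


if __name__ == "__main__":
    # Toy data only; these numbers are illustrative and not from the study.
    rng = np.random.default_rng(0)
    toy_volume = rng.random((32, 32, 32)) < 0.2  # roughly 20% "bone" voxels
    print("BV/TV:", bv_tv(toy_volume))

    repeated_bvtv = [0.20, 0.21, 0.19, 0.20]  # same condition, 4 realizations
    print("CV:", coefficient_of_variation(repeated_bvtv))

    targets = [0.10, 0.20, 0.30, 0.40]
    measured = [0.11, 0.19, 0.31, 0.42]
    print("PCC:", pearson_cc(targets, measured))
```

In such a setup, the CV would be computed per condition over its repeated realizations, whereas the PCC would be computed once per metric over all condition/sample pairs.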