Abstract
Fluorescence microscopy increasingly produces complex volumetric datasets whose biologically meaningful differences are difficult to capture with hand-crafted measurements, especially when signal is distributed across three-dimensional space. Here, we present an interpretable 3D Bag-of-Visual-Words (BoVW) pipeline for classification and analysis of volumetric microscopy data. The framework detects multiscale local keypoints, computes rotationally robust 3D gradient-based descriptors, and aggregates them into image-level visual-word representations. These features are then used for low-dimensional visualization and logistic regression classification, while model weights are mapped back to the original volumes to generate attention maps that localize discriminative structures. We applied the pipeline to two cerebellar granule neuron datasets spanning both ideal and non-ideal imaging conditions. In a near-isotropic lattice light-sheet dataset of chromatin organization, the method separated control and NIPBL loss-of-function nuclei and supported accurate classification, with the strongest performance in the facultative heterochromatin and H3.3 channels. Attention mapping and downstream connected-component and Haralick analyses revealed that loss-of-function nuclei contained more fragmented high-attention regions and smoother, more homogeneous chromatin-associated textures than controls. We then evaluated the same framework on an anisotropic confocal timelapse dataset of receptor clustering in dense neuronal cultures, where single-cell segmentation was impractical. Despite these challenges, the representation captured the expected ligand-driven clustering response and resolved subtler differences associated with overexpression of a polarity protein. Together, these results establish a simple, interpretable, and broadly applicable framework for extracting biologically meaningful structure from volumetric microscopy datasets while preserving native 3D context.
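The BoVW aggregation and classification steps summarized above can be sketched in a few lines. This is an illustrative toy example, not the authors' implementation: the descriptors are random stand-ins for the 3D gradient-based keypoint descriptors, and all function names, cluster counts, and dataset shapes are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for per-volume local descriptors: each volume yields a
# variable number of 8-D descriptor vectors (hypothetical shapes/data).
def make_descriptors(n_volumes, shift):
    return [rng.normal(shift, 1.0, size=(rng.integers(30, 60), 8))
            for _ in range(n_volumes)]

class_a = make_descriptors(10, 0.0)   # e.g. control volumes
class_b = make_descriptors(10, 1.5)   # e.g. loss-of-function volumes

# 1) Build a visual vocabulary by clustering descriptors from all volumes.
vocab = KMeans(n_clusters=16, n_init=10, random_state=0)
vocab.fit(np.vstack(class_a + class_b))

# 2) Represent each volume as a normalized histogram of visual words.
def bovw_histogram(desc, vocab):
    words = vocab.predict(desc)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

X = np.array([bovw_histogram(d, vocab) for d in class_a + class_b])
y = np.array([0] * len(class_a) + [1] * len(class_b))

# 3) Linear classifier; its per-word weights can be mapped back to the
#    keypoints assigned to each word, yielding attention maps in 3D.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```

Because the classifier is linear over visual-word counts, each weight attaches to one visual word, which is what makes projecting importance back onto keypoint locations straightforward.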