Abstract
Medical image analysis plays a crucial role in linking perceptual mechanisms with clinical diagnosis, yet conventional deep learning models often exploit statistical correlations rather than modeling the underlying generative structure, which limits their robustness in small-sample and cross-domain scenarios. To address this issue, we propose MedCSS, a hierarchical feature consistency framework that integrates causal self-supervised learning. Built on a 3D ResNet backbone, the method aligns intermediate and high-level features through distributional consistency while introducing a coding-rate-based causal regularization to suppress non-causal redundancy. Experiments on the MedMNIST3D benchmark demonstrate improved feature stability, boundary sensitivity, and generalization across diverse medical structures. Visualization analyses further reveal improved morphological coherence and causal interpretability. This study highlights the potential of causal self-supervision for structurally robust and semantically consistent representation learning in three-dimensional medical imaging.
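The abstract does not spell out the coding-rate regularizer. As a hedged illustration only (not the paper's implementation), the sketch below computes the standard coding-rate measure R(Z) = ½ log det(I + d/(nε²) ZᵀZ) from the rate-reduction literature, which a regularizer of the kind described could plausibly build on; the function name and the choice of ε are assumptions.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate of a feature batch Z of shape (n, d):
    R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z).
    Larger values indicate more volume (diversity) in feature space;
    penalizing or rewarding it controls representational redundancy.
    NOTE: illustrative sketch; `eps` and the exact form are assumptions,
    not taken from the paper."""
    n, d = Z.shape
    gram = (d / (n * eps ** 2)) * Z.T @ Z          # scaled covariance-like term
    _, logdet = np.linalg.slogdet(np.eye(d) + gram)  # numerically stable log-det
    return 0.5 * logdet

# Example: random features have positive coding rate; collapsed
# (all-zero) features have rate zero.
rng = np.random.default_rng(0)
print(coding_rate(rng.normal(size=(16, 8))))
print(coding_rate(np.zeros((16, 8))))
```

In a consistency framework of this kind, such a term would typically enter the loss with a sign and weight chosen so that redundant, non-discriminative directions in the feature space are suppressed.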