Abstract
SIGNIFICANCE: Lumen segmentation in intravascular optical coherence tomography (IVOCT) images is essential for quantifying the severity, location, and length of vascular stenosis. Current methods that rely on manual parameter tuning or single-frame spatial features struggle with image artifacts, limiting their clinical utility. AIM: We aim to develop a temporal residual U-Net (TR-Unet) that leverages spatiotemporal feature fusion for robust IVOCT lumen segmentation, particularly in artifact-corrupted images. APPROACH: We integrate convolutional long short-term memory networks to capture the evolution of vascular morphology across pullback sequences, an enhanced ResUnet for spatial feature extraction, and coordinate attention mechanisms for adaptive spatiotemporal fusion. RESULTS: Evaluated on 2451 clinical images, the proposed TR-Unet achieves strong performance: Dice coefficient = 98.54%, Jaccard similarity (JS) = 97.17%, and recall = 98.26%. On images severely corrupted by blood artifacts, TR-Unet improves on competing methods by 3.01% (Dice), 1.3% (accuracy), 5.24% (JS), 2.15% (recall), and 2.06% (precision). CONCLUSIONS: TR-Unet establishes a robust and effective spatiotemporal fusion paradigm for IVOCT segmentation, demonstrating strong robustness to artifacts and offering architectural insights for temporal modeling optimization.