Leveraging Temporal Down-Sampling Structure and Spatio-Temporal Fusion for Efficient Video Coding



Abstract

Down-sampling-based video compression frameworks have shown great potential for improving compression efficiency in modern sensing and imaging systems. However, existing methods overlook critical spatial and temporal redundancy and treat all frames uniformly during down-sampling, which discards important information and reduces compression efficiency. To address these limitations, this paper proposes a temporal down-sampling system in which only intermediate frames are down-sampled, while key frames are preserved at high quality for reference. On the decoding side, we employ a frame-recurrent enhancement mechanism to make full use of temporal redundancy. For the fusion stage of enhancement, we design a Multi-scale Temporal-Spatial Attention (MTSA) module with two components: Multi-Temporal Attention (MTA) and Pyramid Spatial Attention (PSA). MTA models temporal correlation at multiple scales, expanding the receptive field and providing stable cues in compressed regions. PSA integrates local spatial saliency and contextual structure progressively over multiple stages. Extensive experiments show that our approach achieves consistent BD-rate reductions. Under the All-Intra, Low-Delay-P, and Random Access configurations, we observe BD-rate reductions ranging from 14% to 39% for I, P, and B frames compared to VVC, and our approach also outperforms prior approaches anchored on the HEVC standard.
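The temporal down-sampling idea described in the abstract (key frames kept at full resolution, intermediate frames reduced) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `downsample` and `temporal_downsample`, the fixed `key_interval`, the `factor` of 2, and the use of average pooling are all assumptions made for illustration, since the abstract does not specify how key frames are selected or which down-sampling filter is used.

```python
import numpy as np


def downsample(frame, factor=2):
    """Average-pool a frame of shape (H, W) or (H, W, C) by `factor`.

    Average pooling is an assumed stand-in for the (unspecified) filter.
    """
    h, w = frame.shape[:2]
    # Crop so both spatial dimensions divide evenly by the factor.
    h2, w2 = h - h % factor, w - w % factor
    cropped = frame[:h2, :w2]
    # Split each spatial axis into (blocks, factor) and average within blocks.
    blocks = cropped.reshape(
        h2 // factor, factor, w2 // factor, factor, *frame.shape[2:]
    )
    return blocks.mean(axis=(1, 3))


def temporal_downsample(frames, key_interval=4, factor=2):
    """Keep every `key_interval`-th frame intact; down-sample the rest.

    Returns a list of (is_key, frame) pairs. The periodic key-frame rule
    is hypothetical, chosen only to keep the sketch simple.
    """
    result = []
    for index, frame in enumerate(frames):
        is_key = index % key_interval == 0
        result.append((is_key, frame if is_key else downsample(frame, factor)))
    return result


# Usage: five 8x8 frames; frames 0 and 4 are treated as key frames.
frames = [np.full((8, 8), float(i)) for i in range(5)]
encoded_input = temporal_downsample(frames)
shapes = [f.shape for _, f in encoded_input]
```

In this sketch the key frames would serve as the high-quality references mentioned above, while the reduced intermediate frames would be restored on the decoding side by an enhancement stage, which is not modelled here.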
