Viewport prediction with cross modal multiscale transformer for 360° video streaming


Abstract

In the realm of immersive video technologies, efficient 360° video streaming remains a challenge due to its high bandwidth requirements and the dynamic nature of user viewports. Most existing approaches neglect the dependencies between different modalities and rarely account for personal preferences, which leads to inconsistent prediction performance. Here, we present a novel viewport prediction model, the Cross Modal Multiscale Transformer (CMMST), which integrates user trajectory and video saliency features across different scales. Our approach outperforms baseline methods and maintains high precision even over extended prediction horizons. By harnessing cross-modal attention mechanisms, CMMST captures intricate user preferences and viewing patterns, offering a promising solution for adaptive streaming in virtual reality and other immersive platforms. The code for this work is available at https://github.com/bbgua85776540/CMMST.
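To make the cross-modal fusion idea in the abstract concrete, the sketch below shows one standard way to let head-trajectory tokens attend to video-saliency tokens with multi-head attention in PyTorch. It is a minimal single-scale illustration under assumed feature dimensions and class names, not the released CMMST implementation from the repository above.

```python
# Minimal sketch (assumptions only) of cross-modal attention between
# trajectory features and saliency features, in the spirit of CMMST.
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Queries come from the trajectory modality; keys/values come from
        # the saliency modality, so trajectory tokens "look at" saliency.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, traj_tokens, sal_tokens):
        # traj_tokens: (batch, T, dim) embeddings of past viewport positions
        # sal_tokens:  (batch, S, dim) embeddings of saliency-map patches
        fused, _ = self.attn(self.norm1(traj_tokens), sal_tokens, sal_tokens)
        x = traj_tokens + fused                  # residual fusion
        return x + self.ffn(self.norm2(x))       # feed-forward refinement

if __name__ == "__main__":
    block = CrossModalBlock()
    traj = torch.randn(2, 30, 128)   # 30 past trajectory samples (assumed)
    sal = torch.randn(2, 64, 128)    # 64 saliency patches (assumed)
    print(block(traj, sal).shape)    # torch.Size([2, 30, 128])
```

A multiscale variant would apply such blocks to saliency and trajectory features pooled at several temporal or spatial resolutions and merge the outputs before the prediction head; the single block here only illustrates the cross-modal attention step.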
