Multimodal scene recognition using semantic segmentation and deep learning integration



Abstract

Semantic modeling and recognition of indoor scenes present a significant challenge: generic scenes have a complex composition, containing a variety of features including themes and objects. The gap between high-level scene interpretation and low-level visual features further increases the complexity of scene recognition. To overcome these obstacles, this study presents a novel multimodal deep learning technique that enhances scene recognition accuracy and robustness by combining depth information with conventional red-green-blue (RGB) image data. A depth-aware segmentation methodology first identifies the individual objects in an image; convolutional neural networks (CNNs) with spatial pyramid pooling (SPP) then analyze these regions, allowing for more precise image classification. The effectiveness of this method is demonstrated by experimental findings, which show 91.73% accuracy on the RGB-D scene dataset and 90.53% accuracy on the NYU Depth v2 dataset. These results demonstrate how the multimodal approach can improve scene detection and classification, with potential uses in fields including robotics, sports analysis, and security systems.
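To make the SPP and fusion steps of the pipeline concrete, the following is a minimal sketch (not the paper's implementation) of spatial pyramid pooling over a CNN feature map, followed by late fusion of RGB and depth descriptors by concatenation. The function names, the pyramid levels `(1, 2, 4)`, and the use of max pooling are illustrative assumptions; the paper does not specify these details.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map over a pyramid of grids,
    producing a fixed-length vector regardless of H and W.
    Output length = C * sum(n*n for n in levels).
    Levels are illustrative; the paper does not specify them."""
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Bin boundaries for an n x n grid over the spatial dims.
        hs = np.linspace(0, h, n + 1, dtype=int)
        ws = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                region = feature_map[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled.append(region.max(axis=(1, 2)))
    return np.concatenate(pooled)

def fuse_rgb_depth(rgb_descriptor, depth_descriptor):
    """Late fusion by concatenation of the two modality descriptors
    (one simple fusion strategy; the paper may fuse differently)."""
    return np.concatenate([rgb_descriptor, depth_descriptor])

# Feature maps of different spatial sizes still yield fixed-length vectors.
rgb_feat = spatial_pyramid_pool(np.random.rand(8, 13, 17))
depth_feat = spatial_pyramid_pool(np.random.rand(8, 9, 11))
fused = fuse_rgb_depth(rgb_feat, depth_feat)  # fed to a classifier
```

The key property SPP provides here is that segmented object regions of arbitrary size map to descriptors of one fixed length (8 × (1 + 4 + 16) = 168 per modality above), so a single downstream classifier can score every region.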
