The New Frontier of Quality Evaluation for Visual Sensors: A Survey of Large Multimodal Model-Based Methods


Abstract

Visual quality assessment is entering a new frontier as media evolve from static images to temporally dynamic videos and 3D content. These visual signals are typically captured by sensing devices such as cameras and depth sensors, whose acquisition characteristics significantly influence perceptual quality. Traditional quality models, including distortion-centric and regression-based approaches, perform well on conventional degradations but struggle to evaluate higher-level attributes such as semantic plausibility and structural coherence in modern AI-generated and multimodal scenarios. The emergence of large multimodal models (LMMs), including vision–language models (VLMs) and multimodal large language models (MLLMs), reshapes the evaluation paradigm by enabling semantic grounding, instruction-driven assessment, and explainable reasoning. This survey presents a unified perspective on visual quality assessment for sensor-captured visual data across image, video, and 3D modalities. We review conventional deep learning approaches and recent LMM-based methods, highlighting how multimodal fusion and language-conditioned reasoning transform quality assessment from scalar prediction to perceptual intelligence. Finally, we discuss key challenges and future opportunities for building efficient, robust, and sensor-aware visual quality assessment systems.
