Abstract
Image captioning is the task of generating a textual description of an image using computer vision and natural language processing techniques. Despite the abundance of visual data available today, and despite recent advances such as Vision Transformers (ViT) and language models based on BERT and GPT, image captioning remains an open research problem. This review surveys the current state of the field, with particular emphasis on the use of facial expression recognition and object detection for image captioning in fault-aware systems and Prognostics and Health Management (PHM) applications within Industry 4.0 environments. To the best of our knowledge, no prior review has examined the significance of facial expressions for image captioning, especially in industrial settings where operator facial expressions can provide valuable cues for fault detection and system health monitoring; addressing this gap in the existing literature is the primary motivation for this study. We discuss the principal approaches and techniques applied to this task, including fault-aware methodologies that leverage visual data for PHM in smart manufacturing contexts, and we highlight the advantages and disadvantages of each strategy. Finally, we present a comprehensive assessment of the state of the art and recommend directions for future research toward more detailed and accurate machine-generated captions, particularly for Industry 4.0 applications where visual monitoring plays a crucial role in system diagnostics and maintenance.