Image Captioning with Object Detection and Facial Expression Recognition for Smart Industry

Abstract

This paper presents a new image captioning system that incorporates facial expression recognition to improve the emotional and contextual content of the generated captions. Affective cues are combined with visual features, enabling semantically rich and emotionally aware descriptions. Experiments were carried out on two newly constructed datasets, FlickrFace11k and COCOFace15k, and evaluated with the standard metrics BLEU, METEOR, ROUGE-L, CIDEr, and SPICE. The proposed model outperformed baselines such as Show-Attend-Tell and Up-Down on every metric. Notably, it achieved gains of 2.5 points on CIDEr and 1.0 on SPICE, indicating closer agreement with human-written reference captions. A 5-fold cross-validation confirmed the model's robustness, with minimal standard deviation across folds (<±0.2). Qualitative results further demonstrated its ability to capture fine-grained emotional expressions that conventional models often miss. These findings underscore the model's potential in affective computing, assistive technologies, and human-centric AI applications. The pipeline is designed for on-premises or edge deployment, with lightweight interfaces to IoT middleware (MQTT/OPC UA) that enable smart-factory integration. These characteristics align the method with Industry 4.0 sensor networks and human-centric analytics.
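The 5-fold robustness check mentioned above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the helper names (`k_fold_indices`, `summarize_folds`, `is_stable`) are hypothetical, the use of the sample standard deviation is an assumption, and the per-fold scores are placeholder values chosen for illustration, not results from the paper.

```python
from statistics import mean, stdev


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous, near-equal validation folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def summarize_folds(scores):
    """Return (mean, sample standard deviation) of per-fold metric scores."""
    return mean(scores), stdev(scores)


def is_stable(scores, tolerance=0.2):
    """True if the spread across folds stays below the tolerance.

    The 0.2 default mirrors the "<±0.2" figure quoted in the abstract;
    reading it as a bound on the sample standard deviation is an assumption.
    """
    return stdev(scores) < tolerance


# Placeholder per-fold scores for one metric (illustrative only, not paper data).
placeholder_scores = [100.1, 100.3, 99.9, 100.2, 100.0]

folds = k_fold_indices(11, k=5)
avg, spread = summarize_folds(placeholder_scores)
print(f"fold sizes: {[len(f) for f in folds]}")
print(f"mean={avg:.2f}, std={spread:.3f}, stable={is_stable(placeholder_scores)}")
```

In practice each fold would serve once as the validation split while the model is trained on the other four, and the same summary would be computed separately for each metric (BLEU, METEOR, ROUGE-L, CIDEr, SPICE).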
