Investigating the capabilities of large vision language models in dog emotion recognition

Abstract

Identifying emotional states in animals is a key challenge in behavioural science and a prerequisite for developing reliable welfare assessments, ethical frameworks, and robust models of human-animal communication. Recently, large vision-language models (LVLMs) such as GPT-4o, Gemini, and LLaVA have shown promise in general image-understanding tasks and are beginning to be applied to emotion recognition in animals. In this study, we critically evaluated the ability of state-of-the-art LVLMs to classify emotional states in dogs using a zero-shot approach. We assessed model performance on two datasets: (1) the Dog Emotions (DE) dataset, consisting of web-sourced images with emotion labels assigned by laypeople, and (2) the Labrador Retriever cropped-face (LRc) dataset, derived from a rigorously controlled experimental study in which emotional states were systematically elicited in dogs and defined according to the experimental context, an approach used in canine emotion research. Our results revealed that although LVLMs achieved moderate classification accuracy on DE, this performance was likely driven by superficial correlations such as background context and breed morphology. On LRc, where emotional states were experimentally induced and backgrounds were minimal, performance dropped to near-chance levels, indicating a limited ability to generalise from biologically relevant cues. Background manipulation experiments further confirmed that the models relied heavily on contextual features. Prompt variation and system-level instructions slightly improved response rates but did not improve classification accuracy. These findings highlight significant limitations in current applications of LVLMs to non-human species and raise ethical and epistemological concerns about potential anthropocentric biases embedded in their training data.
We advocate for species-sensitive AI approaches grounded in validated behavioural science, emphasising the need for high-quality, preferably experimentally based, multimodal datasets and more transparent validation. Our study underscores both the potential and the risks of using general-purpose AI to infer internal states in animals and calls for rigorous, interdisciplinary development of animal-centred computational approaches.
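The abstract separates two quantities that are easy to conflate: the response rate (how often a model returns a usable emotion label at all) and the classification accuracy, which it compares against chance. The following is a minimal sketch, not the authors' evaluation code, of how these can be computed for zero-shot outputs. It assumes, hypothetically, that a refusal or unparseable answer is recorded as `None`, that chance level is 1/K for K balanced classes with uniform guessing, and that a one-sided exact binomial test is used to ask whether accuracy exceeds chance. The function names are illustrative only, and the number of classes per dataset is left as a parameter because the abstract does not state it.

```python
from math import comb


def response_rate(predictions):
    """Fraction of items with a usable label (None marks a refusal or unparseable answer; assumed convention)."""
    if not predictions:
        raise ValueError("no predictions given")
    answered = sum(p is not None for p in predictions)
    return answered / len(predictions)


def accuracy(predictions, labels, answered_only=False):
    """Share of correct labels; by default a non-response counts as an error."""
    pairs = list(zip(predictions, labels))
    if answered_only:
        # Score only the items the model actually answered.
        pairs = [(p, y) for p, y in pairs if p is not None]
    if not pairs:
        raise ValueError("nothing to score")
    return sum(p == y for p, y in pairs) / len(pairs)


def binomial_p_above_chance(correct, n, n_classes):
    """One-sided exact binomial test: P(X >= correct) if accuracy were 1/n_classes.

    Assumes balanced classes and uniform random guessing as the chance model.
    """
    p0 = 1 / n_classes
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(correct, n + 1))
```

Counting non-responses as errors means that a prompt which only raises the response rate can leave accuracy near chance, which is one way to read the abstract's observation that prompt changes improved response rates without improving classification. A large p-value from the binomial test is consistent with near-chance performance, though with a skewed class distribution a majority-class baseline would be the fairer comparison.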
