Abstract
This study presents DeepVisionAnalytics, an integrated framework that combines eye tracking, OpenCV-based computer vision (CV), and machine learning (ML) to support objective analysis of consumer behaviour in visually driven tasks. Unlike conventional self-report surveys, which are prone to cognitive bias, recall errors, and social desirability effects, the proposed approach relies on direct behavioural measurements of visual attention. The system captures gaze distribution and fixation dynamics during interaction with products or interfaces, and uses area-of-interest (AOI) level eye-tracking metrics as the sole behavioural signal to infer candidate choice under constrained experimental conditions. In parallel, OpenCV-based CV and ML perform facial analysis to estimate demographic attributes (age, gender, and ethnicity). These estimates are obtained independently and linked post hoc to gaze-derived outcomes; they are not used as predictive features for choice inference, but serve as contextual metadata for stratified, segment-level interpretation. Empirical results show that gaze-based inference closely reproduces observed choice distributions in short-horizon, visually driven tasks, and that demographic estimates enable meaningful post hoc segmentation without affecting the decision mechanism. Together, these results show that multimodal integration can move beyond descriptive heatmaps: the platform produces reproducible decision-support artefacts, including AOI rankings, heatmaps, and segment-level summaries, grounded in objective behavioural data. By separating the decision signal (gaze) from contextual descriptors (demographics), this work contributes a reusable end-to-end platform for marketing and UX research that supports choice inference under constrained conditions and segment-level interpretation without demographic priors in the decision mechanism.
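
The separation between the decision signal and the contextual descriptors can be illustrated with a minimal sketch. It is not the DeepVisionAnalytics implementation: the data, the function names (`infer_choices`, `segment_summary`), the age-band labels, and in particular the decision rule (taking the AOI with the largest total dwell time as the inferred choice) are assumptions made here for illustration only. The point it shows is structural: choice inference reads gaze records alone, and demographic estimates are joined only afterwards.

```python
from collections import Counter, defaultdict

# Hypothetical fixation records: (participant_id, aoi_label, fixation_duration_ms).
fixations = [
    ("p1", "product_A", 320), ("p1", "product_B", 180), ("p1", "product_A", 410),
    ("p2", "product_B", 500), ("p2", "product_A", 150),
    ("p3", "product_B", 260), ("p3", "product_C", 240), ("p3", "product_B", 90),
]

# Hypothetical demographic estimates, produced separately from the gaze data.
demographics = {"p1": "18-29", "p2": "30-44", "p3": "18-29"}


def infer_choices(fixations):
    """Infer one candidate choice per participant from gaze records only.

    Assumption: the AOI with the largest total dwell time is the choice.
    """
    dwell = defaultdict(Counter)
    for participant, aoi, duration_ms in fixations:
        dwell[participant][aoi] += duration_ms
    return {p: max(totals, key=totals.get) for p, totals in dwell.items()}


def segment_summary(choices, demographics):
    """Link already-inferred choices to demographic segments (post hoc)."""
    summary = defaultdict(Counter)
    for participant, choice in choices.items():
        summary[demographics[participant]][choice] += 1
    return {segment: dict(counts) for segment, counts in summary.items()}


choices = infer_choices(fixations)
print(choices)
print(segment_summary(choices, demographics))
```

Because `infer_choices` never receives the demographic table, changing or removing the demographic estimates cannot alter the inferred choices; it only changes how they are grouped in the summary.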