Abstract
The Palace Museum’s digital cultural relics library, a key component of China’s cultural digitization initiative, offers high-precision image acquisition and multi-dimensional search. In practical use, however, it shows significant problems, including a unimodal interaction design, a fragmented information structure, and an insufficiently layered user experience. Following the system logic of “behavior-driven-perception enhancement-interface reconstruction”, this study combines eye tracking with behavior analysis to investigate the visual attention, information acquisition efficiency, and user experience of three typical user groups in the digital cultural relics library. The results reveal significant differences among user groups in their understanding of cultural content and in their interface use strategies, which point to a gap in cultural expression and a defective navigation mechanism in the current system. Based on these findings, this paper proposes a multimodal interface optimization scheme centered on “audio-visual interaction”. The scheme covers a visual guide system, hierarchical voice explanation, semantic structure reconstruction, and a tiered user adaptation mechanism, and a system flowchart integrates their logical paths. The optimization aims not only to improve interface friendliness and cultural communication, but also to foster immersive perception and facilitate the narrative transmission of cultural information. Finally, the study forms a closed theoretical loop of “empathic psychology-media materiality-multimodal interaction”, which offers a new direction for moving digital cultural heritage platforms from “static presentation” to “dynamic transmission”.