Abstract
This paper introduces a gesture-controlled conversational interface driven by a local AI model, aimed at improving accessibility and enabling hands-free interaction in digital environments. The system performs real-time hand gesture recognition through a standard laptop camera and connects to a local AI engine to generate personalized learning materials. Using intuitive gestures, such as lateral finger movements, a two-finger gesture, or an open palm, users can browse educational documents, request topic summaries, and generate automated quizzes without conventional input devices. When a file is selected, the AI model analyzes its full content and produces a structured summary and a multiple-choice assessment, both of which are saved immediately for later review. A unified gesture set supports navigation of both the user interface and the opened documents. The system was evaluated with university students and faculty (n = 31) using measures including gesture detection accuracy, command-response latency, and user satisfaction. The results show that the system delivers a smooth, hands-free user experience with strong potential for applications in accessibility, human-computer interaction, and intelligent interface design. This work contributes to the development of multimodal AI-driven educational tools, offering a practical framework for gesture-based document navigation and intelligent content enrichment.