Abstract
Eye-tracking technology enables communication for individuals with muscle control difficulties, making it a valuable assistive tool. Traditional systems rely on electrooculography (EOG) or infrared devices, which are accurate but costly and invasive. Vision-based systems offer a more accessible alternative, but they have not been extensively explored for eye-writing recognition. In addition, the natural instability of eye movements and variations in writing style produce signals of inconsistent length, which reduces recognition accuracy and limits the practical use of eye-writing systems. To address these challenges, we propose a novel vision-based eye-writing recognition approach built on a webcam-captured dataset. A key contribution is a Discrete Fourier Transform (DFT)-based length normalization method that standardizes the length of each eye-writing sample while preserving its essential spectral characteristics. This ensures uniform input lengths and improves both efficiency and robustness. We also integrate a hybrid deep learning model that combines 1D Convolutional Neural Networks (CNN) and Temporal Convolutional Networks (TCN) to jointly capture spatial and temporal features of eye-writing. To further improve model robustness, we incorporate data augmentation and initial-point normalization. The proposed system was evaluated on our new webcam-captured Arabic numbers dataset and two existing benchmark datasets using leave-one-subject-out (LOSO) cross-validation. The model achieved accuracies of 97.68% on the new dataset, 94.48% on the Japanese Katakana dataset, and 98.70% on the EOG-captured Arabic numbers dataset, outperforming existing systems. This work provides an efficient eye-writing recognition system featuring robust preprocessing techniques, a hybrid deep learning model, and a new webcam-captured dataset.
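As an illustration of DFT-based length normalization, the following is a minimal sketch, not the exact procedure used in this work. It assumes the normalization is done by truncating or zero-padding the one-sided spectrum and inverting it at the target length; the function name `dft_length_normalize` and the target length of 64 in the usage note are hypothetical choices for the example.

```python
import numpy as np

def dft_length_normalize(signal, target_len):
    """Resample a trajectory to target_len samples via its DFT.

    Assumption: length is changed by keeping the low-frequency DFT
    coefficients (zero-padding when lengthening, truncating when
    shortening). `signal` has shape (n,) or (n, channels), e.g. the
    x and y gaze coordinates over time.
    """
    x = np.asarray(signal, dtype=float)
    n = x.shape[0]

    # One-sided spectrum along the time axis.
    spectrum = np.fft.rfft(x, axis=0)

    # Number of one-sided bins needed for a signal of length target_len.
    n_bins = target_len // 2 + 1
    resized = np.zeros((n_bins,) + x.shape[1:], dtype=complex)

    # Copy the bins shared by both lengths; the rest stay zero.
    k = min(n_bins, spectrum.shape[0])
    resized[:k] = spectrum[:k]

    # Invert at the new length and rescale so amplitudes are preserved.
    return np.fft.irfft(resized, n=target_len, axis=0) * (target_len / n)
```

For example, `dft_length_normalize(sample, 64)` maps a gaze trajectory of any length to 64 samples per channel. Lengthening leaves the low-frequency content unchanged, while shortening discards components above the new Nyquist limit.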