Abstract
A persistent challenge for Deaf and Hard-of-Hearing individuals is the communication gap between sign language users and the hearing community, particularly in regions with limited automated translation resources. In Saudi Arabia, this gap is amplified by the reliance on Saudi Sign Language (SSL) and the scarcity of real-time, sentence-level translation systems. This paper presents a real-time system for sentence-level recognition of continuous SSL and direct mapping to natural spoken Arabic. The proposed system operates end-to-end on live video streams or pre-recorded content, extracting spatio-temporal landmark features using the MediaPipe Holistic framework. For classification, the input feature vector consists of 225 features derived from hand and body pose landmarks. These features are processed by a Bidirectional Long Short-Term Memory (BiLSTM) network trained on the ArabSign (ArSL) dataset to perform direct sentence-level classification over a vocabulary of 50 continuous Arabic sign language sentences, supported by an idle-based segmentation mechanism that enables natural, uninterrupted signing. Experimental evaluation demonstrates robust generalization: under a Leave-One-Signer-Out (LOSO) cross-validation protocol, the model attains a mean sentence-level accuracy of 94.2%, outperforming the fixed signer-independent split baseline of 92.07%, while maintaining real-time performance suitable for interactive use. To enhance linguistic fluency, an optional post-recognition refinement stage is incorporated using a large language model (LLM), followed by text-to-speech synthesis to produce audible Arabic output; this refinement operates strictly as post-processing and is not included in the reported recognition accuracy metrics. The results demonstrate that direct sentence-level modeling, combined with landmark-based feature extraction and real-time segmentation, provides an effective and practical solution for continuous SSL sentence recognition in real-time.