Abstract
OBJECTIVES: To develop and validate a deep learning framework for quantifying smooth-pursuit abnormalities in Parkinson's disease from smartphone video. METHODS: Smartphone videos (N = 54) from 18 patients with confirmed Parkinson's disease were rigorously annotated, yielding 1767 event-level samples (2-second windows): 941 normal and 826 abnormal smooth-pursuit events. Ocular landmarks were extracted with MediaPipe FaceLandmarker. Preprocessing comprised canthus-referenced spatial normalization, Kalman smoothing, and blink filtering. Each event sample was encoded as a sequence of kinematic features and classified with DP-MDLA Net, a dual-path multi-scale dilated-LSTM attention architecture that fuses convolutional and recurrent representations. RESULTS: With a random event-level split, the framework achieved 96.59% accuracy, 97.50% precision, 95.12% recall, a 96.30% F1-score, and an AUC of 0.9939 on the test set (n = 176). Five-fold cross-validation yielded a mean accuracy of 93.04% (SD 1.86%) and a mean AUC of 0.9735 (SD 0.0102). Subject-independent validation, with patients disjoint between training and test sets, yielded 93.57% accuracy and an AUC of 0.9693. In an ablation without spatial normalization, accuracy fell to 84.09% and AUC to 0.9323, indicating that landmark-based spatial alignment is critical. CONCLUSION: The framework enables robust event-level quantification of smooth-pursuit abnormalities from smartphone video, supporting portable bedside assessment and standardized longitudinal monitoring of Parkinson's disease without specialized equipment.
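To make "canthus-referenced spatial normalization" concrete, here is a minimal sketch of one plausible convention, not the study's implementation. It assumes per-frame 2D coordinates (in pixel units) for the iris centre and the inner and outer canthus of one eye; which MediaPipe landmark indices supply these points is left to the caller. The hypothetical function `canthus_normalize` places the origin at the canthus midpoint, aligns the x-axis with the inner-to-outer canthus line, and scales by the inter-canthal distance, so the output does not change when the head translates, rotates in the image plane, or moves closer to the camera.

```python
import numpy as np


def canthus_normalize(iris, inner, outer):
    """Map iris-centre coordinates into an eye-centred frame (hypothetical convention).

    iris, inner, outer: (T, 2) arrays of pixel coordinates, one row per frame.
    Origin = canthus midpoint; x-axis = inner->outer canthus direction;
    unit length = inter-canthal distance.
    """
    iris, inner, outer = (np.asarray(a, dtype=float) for a in (iris, inner, outer))
    origin = (inner + outer) / 2.0
    axis = outer - inner
    width = np.linalg.norm(axis, axis=1)            # (T,) inter-canthal distance
    ux = axis / width[:, None]                      # unit vector along the canthus line
    uy = np.stack([-ux[:, 1], ux[:, 0]], axis=1)    # unit vector perpendicular to it
    rel = iris - origin
    x = np.einsum("ij,ij->i", rel, ux) / width      # horizontal gaze coordinate
    y = np.einsum("ij,ij->i", rel, uy) / width      # vertical gaze coordinate
    return np.stack([x, y], axis=1)
```

Scaling and shifting all three inputs by the same amount (for example `3 * p + 7`) returns the same normalized trace, which is the property an ablation of this step would remove.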
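Blink filtering and Kalman smoothing can be sketched for a single normalized gaze coordinate as follows. Everything here is an assumption for illustration: blinks are detected by thresholding a per-frame eyelid-openness ratio (the 0.2 threshold is a placeholder), blink frames are linearly interpolated, and the trace is then passed through a forward constant-velocity Kalman filter with a 30 fps frame interval and hypothetical noise parameters `q` and `r`. The filter is forward-only; adding a Rauch-Tung-Striebel backward pass would make it a true smoother. The order (blinks first, then Kalman) is also an assumption.

```python
import numpy as np


def interpolate_blinks(signal, openness, threshold=0.2):
    """Replace blink frames (openness < threshold) by linear interpolation."""
    signal = np.asarray(signal, dtype=float).copy()
    blink = np.asarray(openness, dtype=float) < threshold
    if blink.all():
        raise ValueError("no open-eye frames to interpolate from")
    t = np.arange(signal.size)
    signal[blink] = np.interp(t[blink], t[~blink], signal[~blink])
    return signal


def kalman_cv(z, dt=1 / 30, q=1.0, r=1e-4):
    """Forward constant-velocity Kalman filter for one coordinate.

    dt assumes 30 fps video; q (white-acceleration variance) and
    r (measurement variance) are hypothetical tuning values.
    """
    z = np.asarray(z, dtype=float)
    F = np.array([[1.0, dt], [0.0, 1.0]])            # position/velocity transition
    H = np.array([[1.0, 0.0]])                       # only position is observed
    Q = q * np.array([[dt**4 / 4, dt**3 / 2], [dt**3 / 2, dt**2]])
    x = np.array([z[0], 0.0])                        # start at first sample, zero velocity
    P = np.eye(2)
    out = np.empty_like(z)
    for k, zk in enumerate(z):
        x = F @ x                                    # predict state
        P = F @ P @ F.T + Q                          # predict covariance
        s = (H @ P @ H.T)[0, 0] + r                  # innovation variance
        K = (P @ H.T)[:, 0] / s                      # Kalman gain, shape (2,)
        x = x + K * (zk - x[0])                      # correct with measurement residual
        P = P - np.outer(K, H @ P)                   # (I - K H) P
        out[k] = x[0]
    return out
```

A typical call would be `kalman_cv(interpolate_blinks(x_trace, openness))` for each coordinate.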
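Encoding 2-second event samples as kinematic feature sequences might look like the sketch below. The feature set (position, velocity, speed, acceleration magnitude), the 30 fps frame rate, and the use of fixed non-overlapping windows are all hypothetical; in the study, event windows were annotated rather than cut mechanically. Derivatives use NumPy's central finite differences.

```python
import numpy as np


def kinematic_windows(pos, fps=30.0, window_s=2.0):
    """Cut a (T, 2) normalized gaze trace into windows of per-frame kinematic features.

    Hypothetical features per frame: x, y, vx, vy, speed, |acceleration|.
    Returns an array of shape (n_windows, frames_per_window, 6).
    """
    pos = np.asarray(pos, dtype=float)
    dt = 1.0 / fps
    vel = np.gradient(pos, dt, axis=0)               # finite-difference velocity
    acc = np.gradient(vel, dt, axis=0)               # finite-difference acceleration
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    acc_mag = np.linalg.norm(acc, axis=1, keepdims=True)
    feats = np.concatenate([pos, vel, speed, acc_mag], axis=1)
    n = int(round(window_s * fps))                   # frames per window (60 at 30 fps)
    n_windows = len(feats) // n                      # trailing partial window is dropped
    return feats[: n_windows * n].reshape(n_windows, n, feats.shape[1])
```

Each resulting `(frames, features)` matrix is the kind of sequence a convolutional or recurrent classifier consumes.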
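Subject-independent validation differs from a random event-level split because every event from a given patient must fall on one side of the split; otherwise, patient-specific eye and face characteristics can leak from training into testing. A minimal sketch follows, with a hypothetical test fraction and seed; scikit-learn's `GroupShuffleSplit` provides the same grouping behaviour.

```python
import random


def patient_disjoint_split(patient_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no patient appears in both sets.

    patient_ids: one patient identifier per event sample.
    test_fraction and seed are hypothetical choices.
    Returns (train_indices, test_indices).
    """
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train, test
```

With 18 patients and a 0.2 test fraction, 4 patients' events form the test set, so the number of test events depends on how many events those patients contributed.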
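As an arithmetic check on the test-set metrics, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Substituting the reported precision (97.50%) and recall (95.12%):

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_from_precision_recall(0.9750, 0.9512)
print(f"{f1:.4f}")  # prints 0.9630
```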