Abstract
Early-stage Alzheimer's disease (AD) remains difficult to detect with conventional linguistic or cognitive assessments, which often overlook subtle, individualized disruptions in speech. In this work, we propose a biomarker discovery framework that uses fine-grained, character-level information from speech transcripts to capture these early cognitive changes. We encode transcripts symbolically at the character level and apply recurrence quantification analysis (RQA), producing interpretable recurrence plots that reveal temporal dynamics in speech patterns such as pauses, repetitions, and hesitations. Siamese neural networks then learn embeddings from these recurrence representations, enabling the discovery of discriminative linguistic biomarkers associated with cognitive decline. Applied to the DementiaBank corpus, our approach uncovers meaningful character-level signatures and makes subtle cognitive disruptions visible in the recurrence plots. These findings suggest that character-level temporal patterns may offer a promising new direction for digital biomarker discovery in dementia research, complementing traditional word-level analyses and enhancing interpretability for clinical applications.
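
To make the character-level recurrence idea concrete, the following is a minimal sketch and not the pipeline used in this work. It assumes the simplest possible symbolic encoding (each lowercased character mapped to its code point) and a categorical recurrence rule (two positions recur when their symbols are identical). The function names `encode_chars`, `recurrence_matrix`, and `recurrence_rate` are hypothetical, and the Siamese embedding stage is not shown.

```python
def encode_chars(transcript):
    """Map each character to an integer symbol.

    Assumption: lowercased Unicode code points stand in for whatever
    symbolic alphabet a real pipeline would use.
    """
    return [ord(c) for c in transcript.lower()]


def recurrence_matrix(symbols):
    """Build a binary matrix R with R[i][j] = 1 when symbols i and j match."""
    return [[1 if a == b else 0 for b in symbols] for a in symbols]


def recurrence_rate(R):
    """Fraction of recurrent points, excluding the trivial main diagonal."""
    n = len(R)
    if n < 2:
        return 0.0
    off_diag = sum(R[i][j] for i in range(n) for j in range(n) if i != j)
    return off_diag / (n * (n - 1))


if __name__ == "__main__":
    # A repeated word ("the the") appears as a short line parallel to the diagonal.
    R = recurrence_matrix(encode_chars("the the cat"))
    for row in R:
        print("".join("#" if v else "." for v in row))
    print("recurrence rate:", round(recurrence_rate(R), 3))
```

In this sketch a repeated word or phrase produces a diagonal line segment offset from the main diagonal, which is the kind of structure that RQA measures such as determinism summarize. Recurrence rate is only one such measure, chosen here because it is the simplest to compute.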