Abstract
Dementia, one of the most prevalent neurodegenerative diseases, affects millions worldwide. Understanding linguistic markers of dementia is crucial for elucidating how cognitive decline manifests in speech patterns. Current non-invasive assessments such as the Montreal Cognitive Assessment (MoCA) and the Saint Louis University Mental Status (SLUMS) examination rely on manual interpretation and often lack detailed linguistic insight. This paper introduces a first-of-its-kind interpretable artificial intelligence (IAI) framework, CharMark, which leverages first-order Markov chain models to characterize language production at the character level. By computing the steady-state probabilities of character transitions in speech transcripts from individuals with dementia and from healthy controls, we uncover distinctive character-usage patterns. The space character (" "), which represents pauses, and letters such as "n" and "i" showed statistically significant differences between the groups. Principal Component Analysis (PCA) revealed natural clustering aligned with cognitive status, while Kolmogorov-Smirnov tests confirmed distributional shifts. A Lasso logistic regression model further demonstrated that these character-level features have strong discriminative potential. Our primary contribution is the identification and characterization of candidate linguistic biomarkers of cognitive decline: features that are both interpretable and easy to compute. These findings highlight the potential of character-level modeling as a lightweight, scalable strategy for early-stage dementia screening, particularly in settings where more complex or audio-dependent models may be impractical.
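To make the core feature concrete, the following is a minimal sketch of how steady-state character probabilities can be obtained from a single transcript with a first-order Markov chain. It is not the CharMark implementation. Several details are assumptions not stated in the abstract: the function names `transition_matrix` and `steady_state` are hypothetical, a character with no outgoing transition (for example, one that occurs only as the final character) is given a uniform row, and the stationary distribution is found by power iteration on the "lazy" chain (I + P) / 2. The lazy chain has the same stationary distribution as P but cannot oscillate on periodic chains.

```python
from collections import Counter


def transition_matrix(text):
    """Return (alphabet, P), where P[a][b] estimates Pr(next char = b | current char = a)."""
    if not text:
        raise ValueError("text must be non-empty")
    alphabet = sorted(set(text))
    counts = {a: Counter() for a in alphabet}
    # Count every adjacent character pair (first-order transitions).
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    P = {}
    for a in alphabet:
        total = sum(counts[a].values())
        if total == 0:
            # Assumption (hypothetical choice): a dead-end character jumps uniformly.
            P[a] = {b: 1.0 / len(alphabet) for b in alphabet}
        else:
            P[a] = {b: counts[a][b] / total for b in alphabet}
    return alphabet, P


def steady_state(text, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution pi satisfying pi = pi P."""
    alphabet, P = transition_matrix(text)
    pi = {a: 1.0 / len(alphabet) for a in alphabet}  # uniform starting guess
    for _ in range(max_iter):
        # One step of the lazy chain: pi <- 0.5 * pi + 0.5 * (pi P).
        step = {b: sum(pi[a] * P[a][b] for a in alphabet) for b in alphabet}
        new = {b: 0.5 * pi[b] + 0.5 * step[b] for b in alphabet}
        if max(abs(new[b] - pi[b]) for b in alphabet) < tol:
            return new
        pi = new
    return pi
```

For example, `steady_state("the cat sat on the mat")[" "]` gives the long-run share of the space character, which is the kind of per-character value that could then be compared between the dementia and control groups. How transcripts are normalized (case, punctuation, annotation symbols) before this step is left open here and would affect the resulting probabilities.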