Abstract
INTRODUCTION: Early detection of mild cognitive impairment (MCI) is critical for intervention and dementia prevention. Interpretable linguistic features may offer transparent, cross-linguistic markers, yet they remain underexplored in bilingual datasets.

METHODS: Using the TAUKADIAL Challenge dataset, which includes English and Chinese picture descriptions, we extracted 93 language-specific linguistic features with the Efficient Linguistic Feature Extraction for Natural Language Datasets (ELFEN) package and 141 language-agnostic linguistic features with the Comprehensive Handcrafted Linguistic Features (LFTK) toolkit. One-way ANOVA and Tukey's Honestly Significant Difference tests assessed associations with diagnosis, task, and language.

RESULTS: Seven ELFEN and 33 LFTK features showed significant differences between diagnostic groups. MCI speech exhibited reduced lexical diversity, fewer pronouns, greater use of numerals and participles, and longer sentences across both languages. Task- and language-based analyses revealed structural and lexical variability, with greater variability in Chinese responses.

DISCUSSION: These findings identify statistically significant, interpretable linguistic features associated with MCI, establishing a cross-linguistic foundation for developing transparent, multilingual tools for early cognitive assessment.

HIGHLIGHTS: Ninety-three language-specific and 141 language-agnostic features were analyzed from bilingual speech. Seven ELFEN and 33 LFTK features were identified as significantly linked to MCI diagnosis. MCI speakers used fewer pronouns, showed lower lexical diversity, and produced longer sentences. Findings reveal consistent cross-linguistic markers in English and Chinese picture descriptions. The study offers an interpretable, statistically validated foundation for multilingual MCI screening tools.
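
As an illustration of the kind of analysis summarized in METHODS, the following minimal sketch computes one interpretable feature (a type-token ratio as a stand-in for lexical diversity) and the one-way ANOVA F statistic across groups. It is not the ELFEN or LFTK implementation: the function names, the whitespace tokenization, and the toy values are assumptions made for illustration only. Whitespace tokenization in particular would not work for Chinese text, which needs word segmentation first. In practice, p-values and the Tukey HSD post hoc comparisons would come from a statistics library, for example `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`.

```python
from statistics import mean


def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens (hypothetical stand-in feature).

    Assumes whitespace-delimited text; not suitable for unsegmented Chinese.
    """
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    F is the ratio of between-group to within-group mean squares.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy feature values per group (made up for illustration, not study data).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
print(type_token_ratio("the boy the girl"))
```

A feature would be flagged as significant when its F statistic exceeds the critical value for the chosen alpha level, after which pairwise post hoc tests indicate which groups differ.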