Abstract
Background/Objectives: Neuropsychological assessments are valuable tools for evaluating the cognitive performance of older adults. The limitations of in-person paper-and-pencil tests have inspired efforts to develop digital assessments, which would expand access to cognitive screening. Digital tests, however, often lack validation relative to their robustly validated, gold-standard paper-and-pencil counterparts. Speech-to-text (STT) technology has the potential to improve the validity of digital tests through its ability to capture verbal responses, yet the effect of its transcription performance on the standardized scores used for cognitive characterization is unknown. Methods: The present study evaluated the accuracy of Apple's STT engine relative to ground-truth transcriptions (RQ1), as well as the effect of the engine's transcription errors on the resulting standardized scores (RQ2). We analyzed data from 223 older adults who completed a digital assessment on an iPad that used STT to transcribe and score task responses. These automated transcriptions were then compared against ground-truth transcriptions that were human-corrected using external recordings. Results: Analyses revealed discrepancies between STT and ground-truth transcriptions (RQ1); however, these discrepancies were not large enough to practically affect standardized measures of cognitive performance (RQ2). Conclusions: Our results demonstrate the practical utility of Apple's STT engine for digital neuropsychological assessment and cognitive characterization. These findings support the possibility that STT, with its ability to capture and process verbal responses, can serve as a viable tool for increasing the validity of digital neuropsychological assessments.
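As a minimal sketch of the kind of transcription comparison described above, automated STT output can be scored against a human-corrected ground-truth transcription using word error rate (WER). Note that the abstract does not state which accuracy metric the study used; WER, the `wer` function, and the example transcriptions below are illustrative assumptions only.

```python
# Illustrative sketch (not the study's actual scoring procedure):
# word error rate (WER) between an automated STT transcription and a
# human-corrected ground-truth transcription, computed as the word-level
# Levenshtein edit distance divided by the number of reference words.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate between two transcriptions, tokenized by whitespace."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical example: one substitution ("house" for "horse") and one
# deletion ("table") against a five-word ground truth -> WER = 2/5 = 0.4.
ground_truth = "cat dog horse apple table"
stt_output = "cat dog house apple"
print(wer(ground_truth, stt_output))  # 0.4
```

In practice, transcriptions would be normalized (lowercased, punctuation stripped) before comparison, and a task-level score would aggregate WER or item-correctness across responses.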