The Eye movements on Machine-generated Texts Corpus (EMTeC) is a naturalistic eye-movements-while-reading corpus of 107 native English speakers reading machine-generated texts. The texts are generated by three large language models using five different decoding strategies, and they fall into six different text-type categories. EMTeC entails the eye movement data at all stages of pre-processing, i.e., the raw coordinate data sampled at 2000Â Hz, the fixation sequences, and the reading measures. It further provides both the original and a corrected version of the fixation sequences, accounting for vertical calibration drift. Moreover, the corpus includes the language models' internals that underlie the generation of the stimulus texts: the transition scores, the attention scores, and the hidden states. The stimuli are annotated for a range of linguistic features both at text and at word level. We anticipate EMTeC to be utilized for a variety of use cases such as, but not restricted to, the investigation of reading behavior on machine-generated text and the impact of different decoding strategies; reading behavior on different text types; the development of new pre-processing, data filtering, and drift correction algorithms; the cognitive interpretability and enhancement of language models; and the assessment of the predictive power of surprisal and entropy for human reading times. The data at all stages of pre-processing, the model internals, and the code to reproduce the stimulus generation, data pre-processing, and analyses can be accessed via https://github.com/DiLi-Lab/EMTeC/ .
EMTeC: A corpus of eye movements on machine-generated texts.
阅读:25
作者:Bolliger Lena S, Haller Patrick, Cretton Isabelle C R, Reich David R, Kew Tannon, Jäger Lena A
| 期刊: | Behavior Research Methods | 影响因子: | 3.900 |
| 时间: | 2025 | 起止号: | 2025 Jun 3; 57(7):189 |
| doi: | 10.3758/s13428-025-02677-4 | ||
特别声明
1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。
2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。
3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。
4、投稿及合作请联系:info@biocloudy.com。
