Abstract
INTRODUCTION: Contextual embeddings are a core component of large language models (LLMs): dynamic vector representations that capture the semantic properties of words in context. At the single-word level, they have shown structural similarities to patterns of brain activity. This alignment supports a theoretical framework in which the brain uses vector-based neural coding for natural language processing, with linguistic units represented as context-sensitive vectors analogous to LLM-derived embeddings. Building on this framework, we hypothesize that cumulative distance metrics between the contextual embeddings of adjacent linguistic units (words or Chinese characters) within a sentence may quantitatively reflect neural activation intensity during reading comprehension. METHODS: Using large-scale EEG datasets collected during reading tasks, we systematically investigated the relationship between these computationally derived distance features and frequency-specific band power measures of neural activity. RESULTS: Gamma-band power was associated with several NLP features in the ChineseEEG dataset, whereas no comparable gamma-specific effects were observed in the ZuCo1.0 dataset. In addition, significant effects were found in other frequency bands in both datasets. DISCUSSION: These mixed results call for closer examination of the direction (positive or negative) of the associations observed in the gamma and other frequency bands, their cognitive implications, and the potential influence of textual characteristics on the findings. Although the observed effects may be partly text- or dataset-dependent, our analyses revealed associations between various distance metrics and neural responses, consistent with predictions derived from the vector-based neural coding framework.
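
To make the hypothesized feature concrete, the following is a minimal sketch of one way a cumulative distance over adjacent contextual embeddings could be computed for a sentence. It is an illustration under stated assumptions, not the study's actual pipeline: the function names, the choice of cosine and Euclidean distance, and the toy two-dimensional vectors are all hypothetical, and real embeddings would come from an LLM rather than being written by hand.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u: Vector, v: Vector) -> float:
    """Return the straight-line distance between two vectors."""
    return math.dist(u, v)


def cumulative_adjacent_distance(
    embeddings: Sequence[Vector],
    metric: Callable[[Vector, Vector], float] = cosine_distance,
) -> float:
    """Sum the distance between each pair of neighbouring embeddings.

    `embeddings` is assumed to hold one contextual vector per linguistic
    unit (word or character), in reading order. A sentence with fewer
    than two units has no adjacent pairs, so the sum is 0.0.
    """
    return sum(metric(a, b) for a, b in zip(embeddings, embeddings[1:]))


# Hypothetical toy "sentence" of three units with 2-d embeddings.
toy_sentence = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
total_cosine = cumulative_adjacent_distance(toy_sentence)
total_euclidean = cumulative_adjacent_distance(toy_sentence, euclidean_distance)
```

In this toy case the first two vectors are orthogonal and the last two are identical, so only the first adjacent pair contributes to the sum. A per-sentence value of this kind is the sort of feature that could then be related to EEG band power.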