Abstract
This two-part meta-study explores the relationship between Shannon's information and iconicity in American English, with a focus on their implications for cognitive processing and the evolution of lexemes. Part one examines how information is expressed in iconic words by calculating phonemic bigram surprisal from a very large corpus of spoken American English and cross-referencing it with iconicity ratings. Iconic words, those with a form/meaning resemblance, are known to be processed with a cognitive advantage, so they are included in our tests as a benchmark. Within the framework of the Iconic Treadmill Hypothesis, we posit that as iconic words evolve towards arbitrariness, their bigram sequences become more predictable, offsetting some of the cognitive costs associated with processing arbitrary words. In part two, we extend Cognitive Load Theory and the Lossy Context Surprisal Model, both sentence-level language processing models, to test our predictions at the bigram level using the results of a battery of existing psycholinguistic experiments. In line with these models, which explain the psycholinguistic consequences of hearing improbable words in sentences, our results show that words made up of improbable phonemes are processed with a cognitive disadvantage, but that the extra processing effort enhances their retention in long-term memory. Overall, our findings speak to the cognitive limitations of language processing and how these limitations influence lexeme evolution.
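For readers unfamiliar with the measure, the following is a minimal sketch of how a word's mean phonemic bigram surprisal could be estimated, taking surprisal as -log2 P(phoneme | preceding phoneme). It is illustrative only and not necessarily the procedure used in this study: the toy corpus, the ARPAbet-like symbols, the "#" word-onset marker, the unsmoothed maximum-likelihood estimate, and the function names are all assumptions.

```python
import math
from collections import Counter


def train_bigram_counts(transcriptions):
    """Count phoneme bigrams and their left contexts over a corpus."""
    bigrams, contexts = Counter(), Counter()
    for phones in transcriptions:
        padded = ["#"] + list(phones)  # "#" marks the word onset (assumption)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def mean_bigram_surprisal(phones, bigrams, contexts):
    """Average of -log2 P(cur | prev) over a non-empty word's bigrams."""
    padded = ["#"] + list(phones)
    values = []
    for prev, cur in zip(padded, padded[1:]):
        count = bigrams[(prev, cur)]
        if count == 0:
            return math.inf  # unseen bigram; this sketch applies no smoothing
        values.append(-math.log2(count / contexts[prev]))
    return sum(values) / len(values)


# Hypothetical toy corpus, one phoneme list per word token
corpus = [["B", "AE", "T"], ["B", "AE", "D"], ["K", "AE", "T"], ["B", "IH", "T"]]
bigrams, contexts = train_bigram_counts(corpus)

bat = mean_bigram_surprisal(["B", "AE", "T"], bigrams, contexts)
kat = mean_bigram_surprisal(["K", "AE", "T"], bigrams, contexts)
print(f"bat: {bat:.3f} bits, kat: {kat:.3f} bits")
```

In this toy corpus the word beginning with the rarer onset receives the higher mean surprisal. A realistic estimate would need a large transcribed corpus and some form of smoothing for unseen bigrams.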