Abstract
Real-world conceptual knowledge is complex and multifaceted, yet most laboratory studies substantially oversimplify it. Here we develop a mathematical framework, based on natural language processing models, for tracking and characterizing the acquisition of real-world conceptual knowledge. Our approach embeds each concept in a high-dimensional representation space in which nearby coordinates reflect similar or related concepts. We test our approach using behavioral data from participants who answered small sets of multiple-choice quiz questions interleaved with two Khan Academy course videos. We apply our framework to the videos' transcripts and to the text of the quiz questions to quantify the content of each moment of video and of each quiz question. We use these embeddings, along with participants' quiz responses, to track how the learners' knowledge changed after watching each video and to predict their success on individual quiz questions. Our findings show how a small set of quiz questions can be used to obtain rich and meaningful insights into what each learner knows and how that knowledge changes as they learn.