Abstract
Theoretically driven meta-analyses often involve testing whether study characteristics moderate an effect in a pattern that supports one of several competing theoretical accounts. To conduct such analyses, researchers must manually extract (i.e., code) these characteristics from what can be hundreds of studies. Ideally, every study should be coded independently by two researchers to prevent errors and biases from compromising the dataset's integrity. The laborious nature of this task, however, means that meta-analysts usually settle for double-coding only a portion of the included studies. Some researchers have proposed using large language models (LLMs) as double coders, showing that they can reliably extract explicitly stated information (e.g., publication year) from articles. However, meta-analyses in psychology generally require studies to be coded on higher-level conceptual dimensions (e.g., the type of behavior rather than which specific behavior). The present study investigated whether two LLMs, OpenAI's GPT-5 and Google's Gemini 2.5 Pro, can code studies in this way. Both models replicated the study codes from three recently published meta-analyses in psychology with high accuracy (>92% overall), despite the limited information they received. LLM double-coding therefore offers a practical solution for meta-analysts seeking to safeguard data integrity.