Abstract
Computer-based assessments generate rich process data that capture examinees' interactions with test items. Using process data from the U.S. sample of the PISA 2012 computer-based mathematics assessment, this study applied recurrent neural networks to predict item-level correctness and assessment-level latent proficiency. The analysis also examined how expert-engineered features, architectural complexity, action variability, and score variability affected model performance. At the item level, most models achieved AUC values around 0.80, indicating good predictive performance. Moderate correlations were observed between latent proficiency estimated from all 30 items and predictions based on process data from a subset of items (n = 10). For item-level models, adding expert-engineered features reduced training time and may improve predictive performance for items with low action variability. For assessment-level models, adding expert-engineered features improved performance. Model complexity, including model type (i.e., standard RNN, GRU, or LSTM), number of nodes, and number of layers, had little effect on accuracy or efficiency. Moreover, items with greater action variability were associated with better model performance. The findings suggest that simple neural network architectures are sufficient for modeling process data with limited action variability, and that combining action sequences with expert-engineered features improves accuracy, efficiency, and interpretability.
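To make the modeling setup concrete, the following is a minimal, illustrative sketch of the item-level task: a vanilla RNN forward pass that maps an examinee's logged action sequence to a probability of answering the item correctly. This is not the authors' implementation; the action vocabulary size, hidden dimension, and random weights are assumptions for demonstration only.

```python
import math
import random

random.seed(0)

VOCAB = 12   # number of distinct logged action types (assumed)
HIDDEN = 8   # hidden-state dimension (assumed)

def rand_matrix(rows, cols):
    """Small random weight matrix (untrained, for illustration)."""
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

Wxh = rand_matrix(HIDDEN, VOCAB)    # input-to-hidden weights
Whh = rand_matrix(HIDDEN, HIDDEN)   # hidden-to-hidden (recurrent) weights
w_out = [random.gauss(0, 0.1) for _ in range(HIDDEN)]  # readout weights

def predict_correct(actions):
    """Recurrence h_t = tanh(Wxh x_t + Whh h_{t-1}) over one-hot actions,
    followed by a sigmoid readout giving P(item answered correctly)."""
    h = [0.0] * HIDDEN
    for a in actions:
        # x_t is one-hot at index a, so Wxh @ x_t is just column a of Wxh
        h = [math.tanh(Wxh[i][a] + sum(Whh[i][j] * h[j] for j in range(HIDDEN)))
             for i in range(HIDDEN)]
    logit = sum(w_out[i] * h[i] for i in range(HIDDEN))
    return 1.0 / (1.0 + math.exp(-logit))

# Example: an (assumed) encoded action sequence for one examinee on one item
p = predict_correct([3, 5, 5, 1, 7])
print(p)  # a probability strictly between 0 and 1
```

In the study, such sequence models were also compared with variants that append expert-engineered features to the learned representation; the sketch above shows only the raw action-sequence pathway. A GRU or LSTM would replace the tanh recurrence with gated updates but leave the overall input-to-probability structure unchanged.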