Abstract
Machine learning is increasingly important in geological data computing and in the broader geoscience and energy sectors, where the reliability of model outcomes is critical, so the impact of data augmentation on model performance must be evaluated. Although data augmentation is widely applied and often recommended as a standard step in machine learning pipelines, its uncritical use may distort results and undermine the reliability of automated analyses. Despite its popularity, relatively few systematic studies assess how augmentation influences model behavior in geoscience applications, particularly for microscopic rock imagery. To address this gap, the present work investigates the effects of static and dynamic data-space augmentation on the performance and generalization of convolutional models, using datasets of realistic microscopic images of rock thin sections as an interdisciplinary application of established techniques to geological prediction. A total of 133 augmentation configurations across 691 scenarios were tested on five convolutional models: three pretrained networks and two trained from scratch. The results show that augmentation has a significant and highly variable impact on performance: in many cases it reduced classification accuracy, while specific configurations improved it. Linear and nonlinear image mixing proved particularly beneficial for generalization, especially when data were limited. Furthermore, in low-data scenarios, static augmentation sometimes yielded better or more stable results than dynamic approaches, reducing the need for extensive fine-tuning of models. Training times scaled proportionally with the technique applied, whereas accuracies varied considerably. These findings indicate that augmentation can partly offset data limitations, but that strategies should be chosen carefully and tailored to the specific geoscientific machine learning application.