Impact of data space augmentation strategy on model accuracy and generalization in thin-section rock classification

Abstract

Machine learning is increasingly important in geological data computing and in the broader geoscience and energy sectors, where the reliability of model outcomes is critical. This makes it essential to evaluate how data augmentation affects model performance. Although data augmentation is widely applied and often recommended as a standard step in machine learning pipelines, its uncritical use may distort results and undermine the reliability of automated analyses. Despite its popularity, relatively few systematic studies assess how augmentation influences model behavior in geoscience applications, particularly for microscopic rock imagery. To address this need, the present work investigates the effects of static and dynamic data space augmentation on the performance and generalization of convolutional models, using datasets of realistic microscopic images of rock thin sections as an interdisciplinary application of established techniques in geological prediction. A total of 133 augmentation configurations across 691 scenarios were tested on five convolutional models: three pretrained networks and two trained from scratch. The results show that augmentation has a significant and highly variable impact on performance. In many cases augmentation reduced classification accuracy, while specific configurations improved it. Linear and nonlinear image mixing proved particularly beneficial for generalization, especially when data were limited. Furthermore, in low-data scenarios, static augmentation sometimes yielded better or more stable results than dynamic approaches, reducing the need for extensive fine-tuning of models. Training times scaled proportionally with the technique applied, whereas accuracies varied considerably. These findings highlight that augmentation can partly offset data limitations. However, augmentation strategies should be chosen carefully and tailored to the specific use case in geoscientific machine learning applications.
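
To make "linear and nonlinear image mixing" concrete, the following is a minimal NumPy sketch of two commonly used mixing schemes. It is an illustration only: the abstract does not name the specific methods the authors used, so the mixup-style blend, the CutMix-style patch paste, and the function names `mixup` and `cutmix` are assumptions rather than the study's implementation.

```python
import numpy as np


def mixup(img_a, img_b, label_a, label_b, lam):
    """Linear mixing (assumed mixup-style): a convex combination of two
    images and of their one-hot labels, weighted by lam in [0, 1]."""
    mixed_img = lam * img_a + (1.0 - lam) * img_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_img, mixed_label


def cutmix(img_a, img_b, label_a, label_b, top, left, height, width):
    """Nonlinear mixing (assumed CutMix-style): paste a rectangular patch
    of img_b into a copy of img_a. The patch must lie inside the image."""
    mixed_img = img_a.copy()
    mixed_img[top:top + height, left:left + width] = (
        img_b[top:top + height, left:left + width]
    )
    # Label weights follow the area fraction contributed by each image.
    patch_frac = (height * width) / (img_a.shape[0] * img_a.shape[1])
    mixed_label = (1.0 - patch_frac) * label_a + patch_frac * label_b
    return mixed_img, mixed_label


# Toy stand-ins for two thin-section images (H x W x C) from two classes.
img_a = np.zeros((4, 4, 3))
img_b = np.ones((4, 4, 3))
label_a = np.array([1.0, 0.0])
label_b = np.array([0.0, 1.0])

lin_img, lin_label = mixup(img_a, img_b, label_a, label_b, lam=0.25)
cut_img, cut_label = cutmix(img_a, img_b, label_a, label_b,
                            top=0, left=0, height=2, width=2)
```

In both cases the label becomes a soft target, which is one plausible reason such mixing can help generalization when few labeled images are available.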
