Gradual poisoning of a chest x-ray convolutional neural network with an adversarial attack and AI explainability methods


Abstract

Given artificial intelligence's transformative effects, studying safety is important to ensuring it is implemented beneficially. Convolutional neural networks are used in radiology research for prediction but can be corrupted through adversarial attacks. This study investigates the effect of an adversarial attack carried out through poisoned data. To improve generalizability, we create a generic ResNet pneumonia classification model and then subject it, as an example, to BadNets adversarial attacks. The study uses poisoned datasets of varying composition (2%, 16.7%, and 100% poisoned data) and two test sets (one of normal test data and one containing poisoned images) to study the effects of BadNets. SHapley Additive exPlanations (SHAP) were used to visualize the progressive corruption of the models. As corruption progressed, evaluation at intervals revealed that performance on a valid test set decreased while the model learned to predict better on a poisoned test set, and SHAP visualization showed increasing focus on the trigger. In the 16.7% poisoned model, SHAP focus did not fixate on the trigger on the normal test set, and minimal effects were seen in the 2% model. SHAP visualization showed that decreasing performance correlated with increasing focus on the trigger. Corruption in the 16.7% model could potentially remain masked unless the model is tested specifically on poisoned data, and a minimum threshold for corruption may exist. The study yields insights that can be explored in future work and with future models, and identifies potential points of intervention for safeguarding models against adversarial attacks.
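The BadNets-style attack described above stamps a small trigger pattern into a fraction of the training images and mislabels them, so the model learns to associate the trigger with the attacker's chosen class. The following is a minimal, hedged sketch of that poisoning step; the function name, trigger placement (bottom-right square), and parameters are illustrative assumptions, not the authors' implementation, and the poison ratios mirror those studied in the abstract (2%, 16.7%, 100%).

```python
import numpy as np

def poison_dataset(images, labels, poison_ratio=0.167, target_label=0,
                   trigger_size=4, trigger_value=1.0, seed=0):
    """BadNets-style poisoning sketch (illustrative, not the paper's code):
    stamp a small bright square (the trigger) into a random fraction of
    the images and flip their labels to the attacker's target class."""
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    n_poison = int(round(poison_ratio * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        # Place the trigger in the bottom-right corner of the image.
        images[i, -trigger_size:, -trigger_size:] = trigger_value
        # Mislabel so the model learns the trigger -> target shortcut.
        labels[i] = target_label
    return images, labels, idx

# Usage on toy grayscale "x-rays": 100 images, all originally class 1.
x = np.zeros((100, 28, 28), dtype=np.float32)
y = np.ones(100, dtype=int)
px, py, idx = poison_dataset(x, y, poison_ratio=0.167)
```

A model trained on `px, py` and evaluated on a clean test set (no triggers) can look nearly normal at low poison ratios, which is why the abstract's two-test-set design (clean vs. triggered) is needed to expose the backdoor.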
