We consider adversarial attacks on a black-box model when no queries are
allowed. In this setting, many methods attack surrogate models directly and
transfer the resulting adversarial examples to fool the target model. Many
previous works have investigated which attacks on the surrogate model yield
more transferable adversarial examples, but their performance remains limited
by the mismatch between the surrogate models and the target model. In this
paper, we tackle this problem from a novel angle: instead of using the
original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such
that attacks on this model transfer more easily to other models? We
show that this goal can be mathematically formulated as a well-posed
(bi-level-like) optimization problem and design a differentiable attacker to
make training feasible. Given one or a set of surrogate models, our method
can thus obtain an MSM such that adversarial examples generated on the MSM
enjoy excellent transferability. Comprehensive experiments on CIFAR-10 and ImageNet
demonstrate that by attacking the MSM, we can obtain more transferable
adversarial examples that fool black-box models, including adversarially
trained ones, with much higher success rates than existing methods. The
proposed method reveals significant security challenges for deep models and is
a promising candidate to serve as a state-of-the-art benchmark for evaluating
the robustness of deep models in the black-box setting.
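
The abstract does not spell out the bi-level formulation it mentions, but one plausible way to write it, treating the available surrogate models as proxies for the unseen target, is the sketch below. The notation is our assumption, not the paper's: $g_\theta$ is the MSM with parameters $\theta$, $f_1,\dots,f_K$ are the given surrogate models, $\mathcal{L}$ is a classification loss such as cross-entropy, and $\epsilon$ bounds the perturbation.

```latex
% Hypothetical sketch of the bi-level training objective (notation assumed).
% Outer problem: choose MSM parameters so that adversarial examples crafted
% against the MSM also fool the held-out surrogate models.
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
    \sum_{i=1}^{K} \mathcal{L}\bigl(f_i\bigl(x + \delta^{*}(\theta)\bigr),\, y\bigr)
\quad \text{s.t.} \quad
\delta^{*}(\theta) \in \operatorname*{arg\,max}_{\|\delta\|_{\infty}\le\epsilon}
    \mathcal{L}\bigl(g_{\theta}(x + \delta),\, y\bigr)
```

The inner maximization is the attack on the MSM. Because its solution depends on $\theta$, the outer gradient must flow through the attack itself, which is why a differentiable attacker is needed to make training feasible.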
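As a minimal PyTorch sketch of what such a differentiable attacker could look like, the code below unrolls a few gradient-ascent steps and replaces PGD's non-differentiable sign step with an L2-normalized step, so gradients can flow through the attack into the MSM. All function names and hyperparameters here are illustrative assumptions, not the paper's implementation; inputs are assumed to be NCHW image batches in [0, 1].

```python
import torch
import torch.nn.functional as F


def differentiable_attack(msm, x, y, eps=8 / 255, alpha=2 / 255, steps=5):
    """Unrolled attack on the MSM whose output stays differentiable w.r.t.
    the MSM's parameters (hypothetical sketch, not the paper's exact attacker)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        inner_loss = F.cross_entropy(msm(x + delta), y)
        # create_graph=True keeps each attack step in the computation graph,
        # so the outer loss can later backpropagate through the whole attack.
        (grad,) = torch.autograd.grad(inner_loss, delta, create_graph=True)
        # L2-normalized ascent step: a smooth stand-in for PGD's sign(.),
        # whose zero gradient would block the outer optimization.
        grad_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = (delta + alpha * grad / grad_norm).clamp(-eps, eps)
    return (x + delta).clamp(0.0, 1.0)


def msm_training_step(msm, surrogates, optimizer, x, y):
    """One outer step: craft adversarial examples on the MSM, then update the
    MSM so those examples also fool the surrogates (proxies for the target).
    Assumes the surrogates' parameters are frozen or absent from `optimizer`."""
    x_adv = differentiable_attack(msm, x, y)
    # We want the surrogates to misclassify x_adv, so we maximize their
    # cross-entropy, i.e. minimize its negation.
    outer_loss = -sum(F.cross_entropy(f(x_adv), y) for f in surrogates)
    optimizer.zero_grad()
    outer_loss.backward()  # gradients flow through the unrolled attack into the MSM
    optimizer.step()
    return -outer_loss.item()
```

The key design choice in this sketch is the smooth attack step: with the usual sign-based update, the attack's output would have (almost everywhere) zero gradient with respect to the MSM's parameters, and the outer optimization could not proceed.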
