robustness.attacker module
For most use cases, this can just be considered an internal class and ignored.
This module houses the robustness.attacker.Attacker
and
robustness.attacker.AttackerModel
classes.
Attacker
is an internal class that should not be
imported/called from outside the library.
AttackerModel
is a “wrapper” class which is fed a
model and adds to it adversarial attack functionalities as well as other useful
options. See robustness.attacker.AttackerModel.forward()
for documentation
on which arguments AttackerModel supports, and see
robustness.attacker.Attacker.forward()
for the arguments pertaining to
adversarial examples specifically.
For a demonstration of this module in action, see the walkthrough “Input manipulation with pretrained models”.
Note 1: .forward()
should never be called directly but instead the
AttackerModel object itself should be called, just like with any
nn.Module
subclass.
Note 2: Even though the adversarial example arguments are documented in
robustness.attacker.Attacker.forward()
, this function should never be
called directly—instead, these arguments are passed along from
robustness.attacker.AttackerModel.forward()
.

class
robustness.attacker.
Attacker
(model, dataset) Bases:
ch.nn.Module
Attacker class, used to make adversarial examples.
This is primarily an internal class; you probably want to be looking at
robustness.attacker.AttackerModel
, which is how models are actually served (AttackerModel uses this Attacker class). However, the
robustness.attacker.Attacker.forward()
function below documents the arguments supported for adversarial attacks specifically.
Initialize the Attacker.
Parameters:  model (nn.Module) – the PyTorch model to attack
 dataset (Dataset) – dataset the model is trained on, only used to get mean and std for normalization
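To illustrate what the dataset statistics are used for, here is a minimal sketch of per-channel normalization in plain Python. The `normalize` helper and the mean and std values are hypothetical and chosen only for illustration; the library's own normalization code may differ.

```python
def normalize(image, mean, std):
    """Apply (pixel - mean) / std per channel.

    `image` is a list of channels, each a flat list of pixel values in [0, 1].
    """
    return [[(p - m) / s for p in channel]
            for channel, m, s in zip(image, mean, std)]


# Hypothetical statistics for a 3-channel dataset (not from any real dataset).
mean = [0.5, 0.5, 0.5]
std = [0.25, 0.25, 0.25]

# A tiny 3-channel "image" with two pixels per channel.
image = [[0.0, 0.5], [0.5, 1.0], [0.25, 0.75]]
normalized = normalize(image, mean, std)
```

Under this sketch the perturbation would be computed on the un-normalized image, with normalization applied just before the model sees it.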

forward
(x, target, *_, constraint, eps, step_size, iterations, random_start=False, random_restarts=False, do_tqdm=False, targeted=False, custom_loss=None, should_normalize=True, orig_input=None, use_best=True, return_image=True, est_grad=None) Implementation of forward (finds adversarial examples). Note that this does not perform inference and should not be called directly; refer to
robustness.attacker.AttackerModel.forward()
for the function you should actually be calling.
Parameters:  x, target – see
robustness.attacker.AttackerModel.forward()
 constraint ("2"|"inf"|"unconstrained"|"fourier"|AttackerStep) – threat model for adversarial attacks (\(\ell_2\) ball, \(\ell_\infty\) ball, \([0, 1]^n\), Fourier basis, or custom AttackerStep subclass).
 eps (float) – radius for threat model.
 step_size (float) – step size for adversarial attacks.
 iterations (int) – number of steps for adversarial attacks.
 random_start (bool) – if True, start the attack with a random step.
 random_restarts (bool) – if True, do many random restarts and take the worst attack (in terms of loss) per input.
 do_tqdm (bool) – if True, show a tqdm progress bar for the attack.
 targeted (bool) – if True (False), minimize (maximize) the loss.
 custom_loss (function|None) – if provided, used instead of the criterion as the loss to maximize/minimize during the adversarial attack. The function should take in model, x, target and return a tuple of the form loss, None, where loss is a tensor of size N (per-element loss).
 should_normalize (bool) – if False, don’t normalize the input (not recommended unless normalization is done in the custom_loss instead).
 orig_input (ch.tensor|None) – if not None, use this as the center of the perturbation set, rather than x.
 use_best (bool) – if True, use the best (in terms of loss) iterate of the attack process instead of just the last one.
 return_image (bool) – If True (default), then return the adversarial example as an image, otherwise return it in its parameterization (for example, the Fourier coefficients if ‘constraint’ is ‘fourier’)
 est_grad (tuple|None) – if not None (the default is None), then these are (query_radius [R], num_queries [N]) to use for estimating the gradient instead of autograd. We use the spherical gradient estimator, shown below, along with antithetic sampling [1] to reduce variance: \(\nabla_x f(x) \approx \sum_{i=0}^N f(x + R\cdot \vec{\delta_i})\cdot \vec{\delta_i}\), where \(\delta_i\) are randomly sampled from the unit ball.
Returns: An adversarial example for x, i.e. a point within the feasible set determined by eps and constraint, but classified as:
 target (if targeted == True)
 not target (if targeted == False)
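How eps, step_size, iterations, targeted, random_start and use_best interact can be sketched as a small projected-gradient loop. This is an illustration in plain Python under stated assumptions, not the library's implementation: the names `project`, `make_step`, `pgd` and `toy_loss` are hypothetical, inputs are flat lists rather than tensors, only the "2" and "inf" constraints are handled, and the random start is simplified to box noise followed by projection.

```python
import math
import random


def project(x, orig, eps, constraint):
    """Project x onto the eps-ball around orig, then clip to [0, 1]."""
    diff = [xi - oi for xi, oi in zip(x, orig)]
    if constraint == "inf":
        diff = [min(max(d, -eps), eps) for d in diff]
    elif constraint == "2":
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > eps:
            diff = [d * eps / norm for d in diff]
    else:
        raise ValueError("this sketch only handles '2' and 'inf'")
    return [min(max(oi + d, 0.0), 1.0) for oi, d in zip(orig, diff)]


def make_step(grad, step_size, constraint):
    """Sign of the gradient for 'inf', unit-norm gradient for '2'."""
    if constraint == "inf":
        return [step_size * ((g > 0) - (g < 0)) for g in grad]
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0
    return [step_size * g / norm for g in grad]


def pgd(loss_and_grad, x, *, constraint, eps, step_size, iterations,
        targeted=False, random_start=False, use_best=True, seed=0):
    rng = random.Random(seed)
    sign = -1.0 if targeted else 1.0  # targeted: minimize; otherwise maximize
    adv = list(x)
    if random_start:
        # Simplification: uniform box noise, then projection onto the ball.
        noisy = [a + rng.uniform(-eps, eps) for a in adv]
        adv = project(noisy, x, eps, constraint)
    best, best_loss = list(adv), None
    for i in range(iterations + 1):
        loss, grad = loss_and_grad(adv)
        if best_loss is None or sign * loss > sign * best_loss:
            best, best_loss = list(adv), loss
        if i == iterations:
            break  # the last pass only scores the final iterate
        step = make_step(grad, sign * step_size, constraint)
        adv = project([a + s for a, s in zip(adv, step)], x, eps, constraint)
    return best if use_best else adv


def toy_loss(x, center=(0.4, 0.6)):
    """Squared distance to `center`, returned with its analytic gradient."""
    loss = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    grad = [2 * (xi - ci) for xi, ci in zip(x, center)]
    return loss, grad


x = [0.5, 0.5]
settings = dict(eps=0.1, step_size=0.05, iterations=5)
adv_inf = pgd(toy_loss, x, constraint="inf", **settings)
adv_l2 = pgd(toy_loss, x, constraint="2", **settings)
adv_targeted = pgd(toy_loss, x, constraint="inf", targeted=True, **settings)
```

In this toy setting the untargeted runs move away from `center` (raising the loss) while staying inside the eps-ball around `x`, and the targeted run moves toward it.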
[1] This means that we actually draw \(N/2\) random vectors from the unit ball, and then use \(\delta_{N/2+i} = -\delta_{i}\).
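The estimator used for est_grad can be sketched in plain Python as follows. This is a rough illustration, not the library's code: `sample_unit_ball` and `estimate_grad` are hypothetical names, inputs are flat lists, and antithetic sampling is taken to mean drawing half of the directions and reusing each one with its sign flipped. The sum is left unscaled, as in the formula above, so the result indicates the gradient's direction rather than its magnitude.

```python
import math
import random


def sample_unit_ball(dim, rng):
    """Draw a point uniformly from the unit ball in `dim` dimensions."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    radius = rng.random() ** (1.0 / dim)
    return [radius * v / norm for v in g]


def estimate_grad(f, x, query_radius, num_queries, rng):
    """Sum f(x + R * delta) * delta over antithetic pairs of directions."""
    half = num_queries // 2
    deltas = [sample_unit_ball(len(x), rng) for _ in range(half)]
    # Antithetic sampling: reuse every direction with the opposite sign.
    deltas += [[-d for d in delta] for delta in deltas]
    estimate = [0.0] * len(x)
    for delta in deltas:
        value = f([xi + query_radius * di for xi, di in zip(x, delta)])
        for j, dj in enumerate(delta):
            estimate[j] += value * dj
    return estimate


def f(x):
    """Toy linear function whose true gradient is (3, -2)."""
    return 3.0 * x[0] - 2.0 * x[1]


est = estimate_grad(f, [0.2, 0.7], query_radius=0.1, num_queries=200,
                    rng=random.Random(0))
# Positive alignment means the estimate points along the true gradient.
alignment = 3.0 * est[0] - 2.0 * est[1]
```

Pairing each direction with its negation cancels the constant term f(x) within every pair, which is where the variance reduction comes from.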

class
robustness.attacker.
AttackerModel
(model, dataset) Bases:
ch.nn.Module
Wrapper class for adversarial attacks on models. Given any normal model (a
ch.nn.Module
instance), wrapping it in AttackerModel allows for convenient access to adversarial attacks and other applications:

    model = ResNet50()
    model = AttackerModel(model)
    x = ch.rand(10, 3, 32, 32) # random images
    y = ch.zeros(10) # label 0
    out, new_im = model(x, y, make_adv=True) # adversarial attack
    out, new_im = model(x, y, make_adv=True, targeted=True) # targeted attack
    out = model(x) # normal inference (no label needed)

More code examples are available in the documentation for forward. For a more comprehensive overview of this class, see our detailed walkthrough.

forward
(inp, target=None, make_adv=False, with_latent=False, fake_relu=False, no_relu=False, with_image=True, **attacker_kwargs) Main function for running inference and generating adversarial examples for a model.
Parameters:  inp (ch.tensor) – input to do inference on [N x input_shape] (e.g. NCHW)
 target (ch.tensor) – ignored if make_adv == False. Otherwise, labels for adversarial attack.
 make_adv (bool) – whether to make an adversarial example for the model. If True, returns a tuple of the form (model_prediction, adv_input), where model_prediction is a tensor with the logits from the network.
 with_latent (bool) – also return the second-last layer along with the logits. The output then has the form ((model_logits, model_layer), adv_input) if make_adv == True, otherwise (model_logits, model_layer).
 fake_relu (bool) – useful for activation maximization. If True, replace the ReLUs in the last layer with “fake ReLUs,” which are ReLUs in the forward pass but the identity in the backward pass (otherwise, maximizing a ReLU which is dead is impossible as there is no gradient).
 no_relu (bool) – if True, return the latent output with the (pre-ReLU) output of the second-last layer, instead of the post-ReLU output. Requires fake_relu=False, and has no visible effect without with_latent=True.
 with_image (bool) – if False, only return the model output (even if make_adv == True).
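The “fake ReLU” idea described for fake_relu can be shown with scalar forward and backward rules. This is a conceptual sketch with hypothetical function names, not the library's implementation; in PyTorch the same idea would typically be written as a custom torch.autograd.Function.

```python
def relu_forward(x):
    """Forward pass shared by the standard ReLU and the fake ReLU."""
    return max(x, 0.0)


def relu_backward(x, grad_output):
    """Standard ReLU: the gradient flows only where the input was positive."""
    return grad_output if x > 0 else 0.0


def fake_relu_backward(x, grad_output):
    """Fake ReLU: the backward pass is the identity, whatever the input."""
    return grad_output


dead_input = -1.5  # a "dead" unit: its ReLU output is zero
upstream = 1.0     # gradient arriving from the layer above

output = relu_forward(dead_input)
standard_grad = relu_backward(dead_input, upstream)
fake_grad = fake_relu_backward(dead_input, upstream)
```

With the standard rule the dead unit receives no gradient, so an optimizer cannot raise its activation; with the fake rule the upstream gradient passes through unchanged.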
