robustness.attacker module

For most use cases, this module can be treated as internal and ignored.

This module houses the robustness.attacker.Attacker and robustness.attacker.AttackerModel classes.

Attacker is an internal class that should not be imported/called from outside the library. AttackerModel is a “wrapper” class which is fed a model and adds to it adversarial attack functionalities as well as other useful options. See robustness.attacker.AttackerModel.forward() for documentation on which arguments AttackerModel supports, and see robustness.attacker.Attacker.forward() for the arguments pertaining to adversarial examples specifically.

For a demonstration of this module in action, see the walkthrough “Input manipulation with pre-trained models”.

Note 1: .forward() should never be called directly; instead, call the AttackerModel object itself, just like with any nn.Module subclass.

Note 2: Even though the adversarial example arguments are documented in robustness.attacker.Attacker.forward(), this function should never be called directly. Instead, these arguments are passed along from robustness.attacker.AttackerModel.forward().

class robustness.attacker.Attacker(model, dataset)

Bases: torch.nn.Module

Attacker class, used to make adversarial examples.

This is primarily an internal class; you probably want to be looking at robustness.attacker.AttackerModel, which is how models are actually served (AttackerModel uses this Attacker class).

However, the robustness.attacker.Attacker.forward() function below documents the arguments supported for adversarial attacks specifically.

Initialize the Attacker

Parameters:
  • model (nn.Module) – the PyTorch model to attack
  • dataset (Dataset) – dataset the model is trained on, only used to get mean and std for normalization
forward(x, target, *_, constraint, eps, step_size, iterations, random_start=False, random_restarts=False, do_tqdm=False, targeted=False, custom_loss=None, should_normalize=True, orig_input=None, use_best=True, return_image=True, est_grad=None)

Implementation of forward (finds adversarial examples). Note that this does not perform inference and should not be called directly; refer to robustness.attacker.AttackerModel.forward() for the function you should actually be calling.

Parameters:
  • x, target – see robustness.attacker.AttackerModel.forward()
  • constraint (“2”|“inf”|“unconstrained”|“fourier”|AttackerStep) – threat model for adversarial attacks (\(\ell_2\) ball, \(\ell_\infty\) ball, \([0, 1]^n\), Fourier basis, or custom AttackerStep subclass).
  • eps (float) – radius for threat model.
  • step_size (float) – step size for adversarial attacks.
  • iterations (int) – number of steps for adversarial attacks.
  • random_start (bool) – if True, start the attack with a random step.
  • random_restarts (bool) – if True, do many random restarts and take the worst attack (in terms of loss) per input.
  • do_tqdm (bool) – if True, show a tqdm progress bar for the attack.
  • targeted (bool) – if True (False), minimize (maximize) the loss.
  • custom_loss (function|None) – if provided, used instead of the criterion as the loss to maximize/minimize during adversarial attack. The function should take in model, x, target and return a tuple of the form loss, None, where loss is a tensor of size N (per-element loss).
  • should_normalize (bool) – If False, don’t normalize the input (not recommended unless normalization is done in the custom_loss instead).
  • orig_input (ch.tensor|None) – If not None, use this as the center of the perturbation set, rather than x.
  • use_best (bool) – If True, use the best (in terms of loss) iterate of the attack process instead of just the last one.
  • return_image (bool) – If True (default), return the adversarial example as an image; otherwise return it in its parameterization (for example, the Fourier coefficients if constraint is “fourier”).
  • est_grad (tuple|None) – If None (default), gradients are computed with autograd. Otherwise, a tuple (query_radius [R], num_queries [N]) used to estimate the gradient instead of autograd. We use the spherical gradient estimator, shown below, along with antithetic sampling [1] to reduce variance: \(\nabla_x f(x) \approx \sum_{i=0}^N f(x + R\cdot \vec{\delta_i})\cdot \vec{\delta_i}\), where \(\delta_i\) are randomly sampled from the unit ball.
Returns:

An adversarial example for x, i.e. an input within the feasible set determined by eps and constraint, but classified as:

  • target (if targeted == True)
  • not target (if targeted == False)
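The way eps, step_size, iterations, and targeted interact can be illustrated with a minimal NumPy sketch of projected gradient descent under an \(\ell_\infty\) constraint. This is not the library's implementation: the toy linear model, `loss_and_grad`, and `pgd_linf` are hypothetical names introduced only for illustration, and the clipping to \([0, 1]\) assumes image-like inputs.

```python
import numpy as np

def loss_and_grad(w, x, y):
    """Logistic loss of a toy linear model and its gradient w.r.t. x (y in {-1, +1})."""
    margin = y * np.dot(w, x)
    loss = np.log1p(np.exp(-margin))
    grad = -y * w / (1.0 + np.exp(margin))
    return loss, grad

def pgd_linf(w, x, y, eps, step_size, iterations, targeted=False):
    """Hypothetical l_inf PGD: signed-gradient steps followed by projection."""
    sign = -1.0 if targeted else 1.0  # targeted: minimize the loss; otherwise maximize it
    x_adv = x.copy()
    for _ in range(iterations):
        _, g = loss_and_grad(w, x_adv, y)
        x_adv = x_adv + sign * step_size * np.sign(g)
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # project back onto the eps-ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep values in the valid input range
    return x_adv

w = np.array([1.0, -2.0, 0.5])
x = np.array([0.5, 0.5, 0.5])
y = 1.0
x_adv = pgd_linf(w, x, y, eps=0.1, step_size=0.05, iterations=10)
```

In this sketch, x plays the role of the center of the perturbation set; the library's orig_input argument lets that center differ from the starting point.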

[1] This means that we actually draw \(N/2\) random vectors from the unit ball, and then use \(\delta_{N/2+i} = -\delta_{i}\).
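The estimator used for est_grad can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the library's code: `estimate_grad` is a hypothetical helper, the directions are drawn with unit norm (normalized Gaussians), and the final rescaling by \(d / (N R)\) is added here so that the estimate has roughly the magnitude of the true gradient.

```python
import numpy as np

def estimate_grad(f, x, query_radius, num_queries, rng):
    """Hypothetical spherical gradient estimator with antithetic sampling."""
    d = x.size
    half = num_queries // 2
    # Draw N/2 random directions and normalize them to unit length.
    deltas = rng.standard_normal((half, d))
    deltas /= np.linalg.norm(deltas, axis=1, keepdims=True)
    # Antithetic pairs: delta_{N/2+i} = -delta_i.
    deltas = np.concatenate([deltas, -deltas], axis=0)
    values = np.array([f(x + query_radius * dl) for dl in deltas])
    # Sum of f(x + R * delta_i) * delta_i over all queries.
    raw = (values[:, None] * deltas).sum(axis=0)
    # Rescaling (an assumption of this sketch) to match the gradient's magnitude.
    return raw * d / (2 * half * query_radius)

rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
f = lambda z: float(np.dot(a, z)) + 3.0  # true gradient is a everywhere
g_hat = estimate_grad(f, np.zeros(5), query_radius=0.1, num_queries=4000, rng=rng)
cosine = np.dot(g_hat, a) / (np.linalg.norm(g_hat) * np.linalg.norm(a))
```

Because each direction is paired with its negation, the constant offset in f cancels exactly in the sum, which is one way to see why antithetic sampling reduces variance.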
class robustness.attacker.AttackerModel(model, dataset)

Bases: torch.nn.Module

Wrapper class for adversarial attacks on models. Given any normal model (a ch.nn.Module instance), wrapping it in AttackerModel allows for convenient access to adversarial attacks and other applications:

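import torch as ch
from robustness.attacker import AttackerModel

model = ResNet50()
model = AttackerModel(model, dataset) # dataset: the Dataset the model was trained on
x = ch.rand(10, 3, 32, 32) # random images
y = ch.zeros(10).long() # label 0 (integer class labels)
# constraint, eps, step_size, and iterations have no defaults; values here are illustrative
kwargs = dict(constraint='2', eps=0.5, step_size=0.1, iterations=10)
out, new_im = model(x, y, make_adv=True, **kwargs) # adversarial attack
out, new_im = model(x, y, make_adv=True, targeted=True, **kwargs) # targeted attack
out = model(x) # normal inference (no label needed)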

More code examples are available in the documentation for forward. For a more comprehensive overview of this class, see our detailed walkthrough.

forward(inp, target=None, make_adv=False, with_latent=False, fake_relu=False, no_relu=False, with_image=True, **attacker_kwargs)

Main function for running inference and generating adversarial examples for a model.

Parameters:
  • inp (ch.tensor) – input to do inference on [N x input_shape] (e.g. NCHW)
  • target (ch.tensor) – ignored if make_adv == False. Otherwise, labels for adversarial attack.
  • make_adv (bool) – whether to make an adversarial example for the model. If True, returns a tuple of the form (model_prediction, adv_input), where model_prediction is a tensor with the logits from the network.
  • with_latent (bool) – also return the second-last layer along with the logits. Output becomes of the form ((model_logits, model_layer), adv_input) if make_adv==True, otherwise (model_logits, model_layer).
  • fake_relu (bool) – useful for activation maximization. If True, replace the ReLUs in the last layer with “fake ReLUs,” which are ReLUs in the forwards pass but identity in the backwards pass (otherwise, maximizing a ReLU which is dead is impossible as there is no gradient).
  • no_relu (bool) – If True, return the latent output with the (pre-ReLU) output of the second-last layer, instead of the post-ReLU output. Requires fake_relu=False, and has no visible effect without with_latent=True.
  • with_image (bool) – if False, only return the model output (even if make_adv == True).
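The idea behind fake_relu can be shown with a small NumPy sketch that writes the forward and backward passes by hand. The function names here are hypothetical and are not part of the library; the sketch only illustrates why an identity backward pass lets gradients reach "dead" units.

```python
import numpy as np

def relu_forward(x):
    """Forward pass shared by the standard and the 'fake' ReLU."""
    return np.maximum(x, 0.0)

def relu_backward(x, grad_out):
    """Standard ReLU: the gradient is blocked wherever the input was not positive."""
    return grad_out * (x > 0)

def fake_relu_backward(x, grad_out):
    """'Fake' ReLU: the backward pass is the identity, so every unit gets a gradient."""
    return grad_out

x = np.array([-1.5, 0.3, -0.2, 2.0])  # pre-activations; two units are dead
grad_out = np.ones_like(x)            # upstream gradient, e.g. from maximizing an activation

activations = relu_forward(x)
standard_grad = relu_backward(x, grad_out)
fake_grad = fake_relu_backward(x, grad_out)
```

With the standard backward pass, the two dead units receive zero gradient and cannot be pushed into the active region; with the identity backward pass they receive the full upstream gradient.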