
Image recognition: Defending against adversarial attacks
using a Generative Adversarial Network (GAN)

Speaker: Guofei Pang

Division of Applied Mathematics


Brown University

A presentation based on the paper:

Ilyas, Andrew, et al. "The Robust Manifold Defense: Adversarial Training using Generative Models." arXiv preprint arXiv:1712.09196 (2017).
Outline

 Adversarial attacks
 Generative Adversarial Network (GAN)
 How to defend attacks using GAN
 Numerical results

Adversarial Attacks

Image as a vector: an n x m image is flattened into
x = (x_j), j = 1, 2, ..., n*m
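A trivial PyTorch illustration of this flattening (the 28 x 28 size is just an MNIST-style assumption):

```python
import torch

img = torch.rand(28, 28)   # an n x m image (here n = m = 28)
x = img.reshape(-1)        # the vector x = (x_j), j = 1, ..., n*m
print(x.shape)             # torch.Size([784])
```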
Adversarial Attacks

Adversarial examples for a classifier C():

 A pair of inputs x1 and x2 that are close: ||x1 - x2||_2 < e0
 A person says they are of the same class
 But the classifier says they are completely different: ||C(x1) - C(x2)|| > f0
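As a concrete illustration (not from the slides), the simplest first-order attack, the fast gradient sign method (FGSM) of Goodfellow et al., crafts x2 from x1 in one gradient step. A minimal PyTorch sketch, assuming a trained classifier clf that returns logits; note FGSM bounds the infinity norm of the perturbation rather than the 2-norm above, but the idea is the same:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(clf, x1, label, eps=0.1):
    """Craft x2 with ||x2 - x1||_inf <= eps that changes the classifier's output."""
    x = x1.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(clf(x), label)  # classification loss at x1
    loss.backward()                        # first-order information: gradient w.r.t. x
    x2 = x + eps * x.grad.sign()           # one signed-gradient step
    return x2.clamp(0.0, 1.0).detach()     # keep pixels in a valid range
```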
Adversarial Attacks

Why is the classifier fooled by these examples?

An intuition from the authors:


 Natural images lie on a low-dimensional manifold
 Noisy images fall off it into a high-dimensional manifold
 High dimensionality is tough for the classifier

Generative adversarial network (GAN)

 x and x' have similar PDFs
 After training the GAN, G() has learned the underlying distribution of the image dataset
 The DNN G() is a nonlinear mapping from the low-dimensional space, z, to the high-dimensional space, x'

[Diagram: a noisy input z, say z ~ N(0, I), feeds the GAN generator G(); the generative model outputs the synthetic image x' = G(z), shown alongside the original image x]
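For concreteness, a minimal fully connected generator in PyTorch; a sketch only (the sizes are MNIST-style assumptions, and real GANs typically use convolutional architectures):

```python
import torch
import torch.nn as nn

latent_dim = 100       # dimension of the low-dimensional latent space z
img_dim = 28 * 28      # dimension of the high-dimensional image space x'

# G(): nonlinear map from z to image space
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, img_dim), nn.Tanh(),  # pixel values scaled to [-1, 1]
)

z = torch.randn(16, latent_dim)  # noisy input z ~ N(0, I)
x_synth = G(z)                   # synthetic images x' = G(z)
```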
Generative adversarial network (GAN)

 Convergence state: pdata(x) = pG(x)
 Green solid line: probability density function (PDF) of the generator G(), i.e., pG(x)
 Black dotted line: PDF of the original images x, i.e., pdata(x)
 Blue dashed line: output of the discriminator D()
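The convergence state comes from the minimax training of Goodfellow et al. (2014). A sketch of one alternating training step with the standard non-saturating losses, assuming a discriminator D that ends in a sigmoid and optimizers opt_G, opt_D (all names are placeholders):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, x_real, latent_dim=100):
    batch = x_real.size(0)
    ones = torch.ones(batch, 1)
    zeros = torch.zeros(batch, 1)

    # Discriminator step: push D(x) -> 1 on real images, D(G(z)) -> 0 on fakes
    z = torch.randn(batch, latent_dim)
    x_fake = G(z).detach()                 # freeze G while updating D
    loss_D = F.binary_cross_entropy(D(x_real), ones) \
           + F.binary_cross_entropy(D(x_fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step (non-saturating loss): push D(G(z)) -> 1
    z = torch.randn(batch, latent_dim)
    loss_G = F.binary_cross_entropy(D(G(z)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

At the optimum the discriminator can do no better than D(x) = 1/2, which forces pG(x) = pdata(x).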
How to defend attacks using GAN

 G() is pre-trained on the training (image) dataset and has learned its underlying distribution

Invert and Classify: the original image x could fall off the low-dimensional manifold when noise enters, so the generator is first inverted,

z* = argmin_z ||G(z) - x||_2,

and the synthetic image x' = G(z*), which lies back on the low-dimensional manifold, is then fed to the classifier C(); a sketch follows.
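The inversion has no closed form; in practice z* is found by gradient descent on the reconstruction error. A minimal sketch, assuming a pre-trained generator G and classifier clf (placeholder names, with x flattened to match G's output):

```python
import torch

def invert_and_classify(G, clf, x, latent_dim=100, steps=500, lr=0.05):
    """Project x onto the range of G, then classify the projection."""
    z = torch.randn(1, latent_dim, requires_grad=True)   # random initial z
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((G(z) - x) ** 2).sum()  # z* = argmin_z ||G(z) - x||_2^2
        loss.backward()
        opt.step()
    x_proj = G(z).detach()              # on-manifold image x' = G(z*)
    return clf(x_proj).argmax(dim=1)    # classify the projection
```

Because the inversion is non-convex, several random restarts of z are commonly used to avoid poor local minima.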
How to defend attacks using GAN

 G() is pre-trained on the training (image) dataset and has learned its underlying distribution

Enhanced Invert and Classify: the same pipeline, but the classifier C() is also retrained; the training objective combines the classification loss with an upper bound on the attack magnitude (a sketch of such retraining follows).
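A minimal sketch of the retraining idea, applied to images already projected onto the manifold as above; this is a generic first-order adversarial-training loop in the spirit of the slide, not necessarily the paper's exact algorithm (the infinity-norm ball and step size are illustrative choices):

```python
import torch
import torch.nn.functional as F

def enhanced_training_step(clf, opt, x_proj, label, eps0=0.3, pgd_steps=10):
    """One retraining step on a projected image x_proj = G(z*):
    the inner loop finds a worst-case perturbation of magnitude <= eps0
    (the upper bound of attack magnitude); the outer step retrains C()."""
    delta = torch.zeros_like(x_proj, requires_grad=True)
    for _ in range(pgd_steps):                       # inner maximization (PGD)
        loss = F.cross_entropy(clf(x_proj + delta), label)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + 0.1 * eps0 * grad.sign()).clamp(-eps0, eps0)
        delta = delta.detach().requires_grad_(True)
    opt.zero_grad()                                  # outer minimization
    F.cross_entropy(clf(x_proj + delta), label).backward()
    opt.step()
```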
Numerical results

First-order classifier attacks for handwritten digit classification

Numerical results

First-order classifier attacks for gender classification

Numerical results

Substitute model attacks (black-box attacks; see the sketch below)

Results from Invert and Classify
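For context (not from the slides): in a substitute model attack the adversary cannot see the defended model's gradients, so it trains its own substitute network to mimic the target's input-output behaviour and crafts first-order attacks on the substitute, hoping they transfer. A minimal sketch, where sub, target, opt, and xs are placeholder names:

```python
import torch
import torch.nn.functional as F

def train_substitute(sub, opt, target, xs, epochs=10):
    """Fit the substitute to the target's predicted labels (black-box queries)."""
    with torch.no_grad():
        labels = target(xs).argmax(dim=1)  # only the target's outputs are observed
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(sub(xs), labels).backward()
        opt.step()

def transfer_attack(sub, x, label, eps=0.1):
    """Craft an FGSM example on the substitute and apply it to the target."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(sub(x), label).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```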
Numerical results

Comparison between Invert and Classify and Enhanced Invert and Classify
Open questions

 GAN for regression problems?
 GAN versus other neural networks?
 One defense strategy for all types of attacks?
