
Sparse Coding: An Overview

Brian Booth
SFU Machine Learning Reading Group

November 12, 2013

The aim of sparse coding

Every column of D is a prototype.
Similar to, but more general than, PCA.

Example: Sparse Coding of Images

Sparse Coding in V1

Example: Image Denoising

Example: Image Restoration

Sparse Coding and Acoustics

The inner ear (cochlea) also does sparse coding of frequencies.

Sparse Coding and Natural Language Processing

Outline

Introduction: Why Sparse Coding?
Sparse Coding: The Basics
Adding Prior Knowledge
Conclusions

The aim of sparse coding, revisited

We assume our data x satisfies

    x \approx \sum_{i=1}^{n} \alpha_i d_i = D\alpha

Learning:
Given training data x^j, j \in \{1, \dots, m\}
Learn the dictionary D and the sparse codes \alpha^j

Encoding:
Given test data x and the dictionary D
Learn the sparse code \alpha
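The model above can be sketched in a few lines of NumPy (a toy illustration with invented numbers, not data from the talk): x is a linear combination of a few columns, or atoms, of an over-complete dictionary D, so its code alpha is sparse.

```python
import numpy as np

# Toy sketch of x = D @ alpha with a sparse alpha (all numbers invented).
rng = np.random.default_rng(0)

n_features, n_atoms = 8, 20              # over-complete: more atoms than dims
D = rng.standard_normal((n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms (columns of D)

alpha = np.zeros(n_atoms)
alpha[[3, 11]] = [2.0, -1.5]             # only two atoms are active

x = D @ alpha                            # x = sum_i alpha_i d_i

print(np.count_nonzero(alpha))           # 2
print(np.linalg.norm(x - D @ alpha))     # 0.0
```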

Learning: The Objective Function

Dictionary learning involves optimizing:

    \arg\min_{\{d_i\}, \{\alpha^j\}} \sum_{j=1}^{m} \Big\| x^j - \sum_{i=1}^{n} \alpha_i^j d_i \Big\|^2 + \lambda \sum_{j=1}^{m} \sum_{i=1}^{n} |\alpha_i^j|

    subject to \|d_i\|^2 \le c, \quad i = 1, \dots, n.

In matrix notation:

    \arg\min_{D, A} \|X - DA\|_F^2 + \lambda \sum_{i,j} |A_{i,j}|

    subject to \sum_i D_{i,j}^2 \le c, \quad j = 1, \dots, n.

Split the optimization over D and A in two.
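The split can be sketched as two alternating sub-problems. The NumPy below is only a schematic (helper names, step counts, and data are invented): an iterative soft-thresholding (ISTA) pass stands in for the code update, and the constraint on the atom norms is enforced by naive column rescaling rather than the Lagrange-dual update described next.

```python
import numpy as np

# Schematic of alternating over D and A (invented helpers and data).

def objective(X, D, A, lam):
    """||X - D A||_F^2 + lam * sum_ij |A_ij| -- the objective above."""
    return np.sum((X - D @ A) ** 2) + lam * np.sum(np.abs(A))

def update_codes_ista(X, D, A, lam, n_steps=50):
    """A-step: iterative soft-thresholding with step size 1/L."""
    L = 2 * np.linalg.norm(D, 2) ** 2        # Lipschitz const. of the gradient
    for _ in range(n_steps):
        A = A - 2 * D.T @ (D @ A - X) / L    # gradient step on the fit term
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # shrink
    return A

def update_dictionary(X, A, c=1.0, eps=1e-6):
    """D-step: ridge-regularized least squares, then rescale columns
    so that sum_i D_ij^2 <= c (a crude stand-in for the dual update)."""
    D = X @ A.T @ np.linalg.inv(A @ A.T + eps * np.eye(A.shape[0]))
    scale = np.maximum(np.linalg.norm(D, axis=0) / np.sqrt(c), 1.0)
    return D / scale

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 40))
D = rng.standard_normal((10, 15))
D /= np.linalg.norm(D, axis=0)
A = np.zeros((15, 40))
lam = 0.1

start = objective(X, D, A, lam)
A = update_codes_ista(X, D, A, lam)                   # repeat both steps
print(objective(X, D, A, lam) < start)                # True: A-step descends
D = update_dictionary(X, A)
print(np.linalg.norm(D, axis=0).max() <= 1.0 + 1e-9)  # True: constraint holds
```

In practice the two updates are repeated until convergence; the talk solves the D-step through the Lagrange dual below instead of rescaling.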

Step 1: Learning the Dictionary

Reduced optimization problem:

    \arg\min_{D} \|X - DA\|_F^2

    subject to \sum_i D_{i,j}^2 \le c, \quad j = 1, \dots, n.

Introduce Lagrange multipliers:

    \mathcal{L}(D, \vec{\lambda}) = \operatorname{tr}\!\left( (X - DA)^T (X - DA) \right) + \sum_{j=1}^{n} \lambda_j \left( \sum_i D_{i,j}^2 - c \right)

where each \lambda_j \ge 0 is a dual variable...

Step 1: Moving to the dual

From the Lagrangian

    \mathcal{L}(D, \vec{\lambda}) = \operatorname{tr}\!\left( (X - DA)^T (X - DA) \right) + \sum_{j=1}^{n} \lambda_j \left( \sum_i D_{i,j}^2 - c \right)

minimize over D to obtain the Lagrange dual

    \mathcal{D}(\vec{\lambda}) = \min_D \mathcal{L}(D, \vec{\lambda}) = \operatorname{tr}\!\left( X^T X - X A^T \left( A A^T + \Lambda \right)^{-1} (X A^T)^T - c\,\Lambda \right)

where \Lambda = \operatorname{diag}(\vec{\lambda}).

The dual can be optimized using conjugate gradient.
Only n \lambda values, compared to D being n \times k.

Step 1: Dual to the Dictionary

With the optimal \vec{\lambda}, our dictionary is

    D^T = \left( A A^T + \Lambda \right)^{-1} (X A^T)^T

Key point: Moving to the dual reduces the number of optimization variables, speeding up the optimization.
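A quick numerical check of this closed form (toy data; Λ is fixed by hand here purely for illustration, whereas in the actual method it comes from maximizing the dual): at Dᵀ = (AAᵀ + Λ)⁻¹(XAᵀ)ᵀ, the gradient of the Lagrangian with respect to D vanishes.

```python
import numpy as np

# Verify the stationarity of the closed-form dictionary (toy sizes).
rng = np.random.default_rng(2)
p, n, m = 6, 10, 30                    # feature dim, # atoms, # samples
X = rng.standard_normal((p, m))        # data, one sample per column
A = rng.standard_normal((n, m))        # codes (dense here, just for the algebra)
lam = np.full(n, 0.5)                  # one dual variable per atom, fixed by hand
Lam = np.diag(lam)

# D^T = (A A^T + Lambda)^{-1} (X A^T)^T
D = (np.linalg.inv(A @ A.T + Lam) @ (X @ A.T).T).T

# Gradient of the Lagrangian w.r.t. D is zero at this D:
grad = -2 * (X - D @ A) @ A.T + 2 * D @ Lam
print(np.allclose(grad, 0))            # True
```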

Step 2: Learning the Sparse Code

With D now fixed, optimize for A:

    \arg\min_{A} \|X - DA\|_F^2 + \lambda \sum_{i,j} |A_{i,j}|

Unconstrained, convex quadratic optimization.
Many solvers for this (e.g. interior point methods, the in-crowd algorithm, fixed-point continuation).

Note:
Same problem as the encoding problem.
Runtime of optimization in the encoding stage?
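As one concrete (if naive) solver for this problem, not one of the algorithms named above, here is a cyclic coordinate-descent sketch on synthetic data: each code entry is updated in closed form by soft-thresholding while the others are held fixed.

```python
import numpy as np

# Minimal coordinate-descent solver for the encoding problem
# (synthetic data; not an algorithm from the slide).

def soft(z, t):
    """Scalar soft-thresholding."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def encode(x, D, lam, n_sweeps=100):
    """min_a ||x - D a||^2 + lam * sum_i |a_i|, one coordinate at a time."""
    a = np.zeros(D.shape[1])
    for _ in range(n_sweeps):
        for i in range(D.shape[1]):
            r = x - D @ a + D[:, i] * a[i]        # residual with atom i removed
            a[i] = soft(D[:, i] @ r, lam / 2) / (D[:, i] @ D[:, i])
    return a

rng = np.random.default_rng(3)
D = rng.standard_normal((12, 30))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms
a_true = np.zeros(30)
a_true[[4, 9, 20]] = [1.5, -2.0, 1.0]             # 3 active atoms
x = D @ a_true

a = encode(x, D, lam=0.05)
print(np.linalg.norm(x - D @ a) < 0.5)            # True: small residual
print(np.count_nonzero(np.abs(a) > 1e-3))         # only a few atoms are active
```

Coordinate descent is attractive here because each single-coordinate sub-problem has the closed form soft(dᵢᵀr, λ/2) / ‖dᵢ‖².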

Speeding up the testing phase

Fair amount of work on speeding up the encoding stage:

H. Lee et al., Efficient sparse coding algorithms
http://ai.stanford.edu/~hllee/nips06-sparsecoding.pdf

K. Gregor and Y. LeCun, Learning Fast Approximations of Sparse Coding
http://yann.lecun.com/exdb/publis/pdf/gregor-icml-10.pdf

S. Hawe et al., Separable Dictionary Learning
http://arxiv.org/pdf/1303.5244v1.pdf


Relationships between Dictionary atoms

Dictionaries are over-complete bases.
Dictate relationships between atoms.
Example: Hierarchical dictionaries.

Example: Image Patches

Example: Document Topics

Problem Statement

Goal:
Have sub-groups of the sparse code all be non-zero (or zero).

Hierarchical:
If a node is non-zero, its parent must be non-zero.
If a node's parent is zero, the node must be zero.

Implementation:
Change the regularization.
Enforce sparsity differently...

Grouping Code Entries

Level k is included in k + 1 groups.
Add |\alpha_i| to the objective function once for each group.

Group Regularization

Updated objective function:

    \arg\min_{D, \{\alpha^j\}} \sum_{j=1}^{m} \left[ \| x^j - D\alpha^j \|^2 + \lambda\, \Omega(\alpha^j) \right]

where

    \Omega(\alpha) = \sum_{g \in \mathcal{P}} w_g \, \| \alpha|_g \|

\alpha|_g are the code values for group g.
w_g weights the enforcement of the hierarchy.
Solve using proximal methods.
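The workhorse of those proximal methods is the proximal operator of a single group term w_g ‖α|_g‖, which is block soft-thresholding: it shrinks a group as a block and can zero it entirely, which is what lets the hierarchy switch whole sub-trees off together. A minimal sketch with invented example groups (composing the operators over all nested groups, as in the cited work, is omitted):

```python
import numpy as np

# Block soft-thresholding: the prox of t * ||alpha restricted to g||_2.

def prox_group(alpha, group, t):
    a = alpha.copy()
    g = a[group]
    norm = np.linalg.norm(g)
    if norm <= t:
        a[group] = 0.0                 # weak group: switched off entirely
    else:
        a[group] = (1 - t / norm) * g  # strong group: shrunk toward 0
    return a

alpha = np.array([0.05, -0.03, 2.0, 1.0])

weak = prox_group(alpha, [0, 1], t=0.1)    # group norm ~0.058 <= 0.1
strong = prox_group(alpha, [2, 3], t=0.1)  # group norm ~2.24 > 0.1

print(weak)                                # [0. 0. 2. 1.]
print(0 < strong[2] < 2.0)                 # True: survives, mildly shrunk
```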


Summary

Two interesting directions:
Increasing the speed of the testing phase.
Optimizing dictionary structure.
