Introduction

Dense Methods
Sparse Methods
Handwritten Digits

Non-Negative Matrix Factorization with Applications to Handwritten Digit Recognition
Michael J. M. Mazack
CSCI 8314 Final Project
Department of Scientific Computation
University of Minnesota - Twin Cities

December 15th, 2009

Michael Mazack

NMF and Handwritten Digit Recognition

Definition
History
Properties

Definition

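$A \approx WH$
Non-negative Matrix Factorization (NMF)
Let $A \in \mathbb{R}^{m \times n}$ be a matrix such that $a_{ij} \ge 0$ for all $i \in \{1, \dots, m\}$
and $j \in \{1, \dots, n\}$ (henceforth, $A \ge 0$). Then for $k \le \min\{m, n\}$
there exist $W \in \mathbb{R}^{m \times k} \ge 0$ and $H \in \mathbb{R}^{k \times n} \ge 0$ such that $A \approx WH$.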
Gives a rank-k approximation of A.
Factorization is not unique.
Can be approached as an optimization problem.
Implemented in MATLAB as nnmf().
The History of NMF

1971: Self modeling curve resolution used in chemometrics (continuous method).
1994: Positive Matrix Factorization used in Finland by
Paatero et al.
1999: Popularized by Lee and Seung under the name
non-negative matrix factorization.
2001: Sparse methods for NMF emerge.
(Gao and Church, 2005) and Wikipedia

Important Properties
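$A \approx WH$

$Ax \approx WHx = Wy$, where $y = Hx$.

The columns of $W$ span a rank-$k$ approximation of the column space of $A$.

$A \approx WH = W D D^{-1} H = W' H'$, where $W' = WD$, $H' = D^{-1} H$, and $D \in \mathbb{R}^{k \times k} \ge 0$ ($D$ diagonal, with strictly positive diagonal entries).

Factorization is not unique.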
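The rescaling argument can be checked numerically. The sketch below is only an illustration: the sizes, the random factors, and the diagonal entries of `d` are arbitrary choices, not values from the slides.

```matlab
% Illustrative check that rescaling by a positive diagonal D leaves W*H unchanged.
% All sizes and values below are arbitrary choices for the demonstration.
m = 6; n = 5; k = 3;
w = rand(m, k);                  % non-negative factor W
h = rand(k, n);                  % non-negative factor H
d = diag([0.5, 2, 4]);           % diagonal D with strictly positive entries

w2 = w * d;                      % W' = W*D
h2 = d \ h;                      % H' = inv(D)*H

gap = norm(w*h - w2*h2, 'fro');  % zero up to rounding error
nonneg = all(w2(:) >= 0) && all(h2(:) >= 0);
```

Both `(w, h)` and `(w2, h2)` are non-negative and give the same product, so they are two different factorizations of the same matrix.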
The Problem

$A \approx WH$
Minimize $f(W, H) = \frac{1}{2} \|A - WH\|_F^2$ subject to:
$W \in \mathbb{R}^{m \times k} \ge 0$,
$H \in \mathbb{R}^{k \times n} \ge 0$.
(Resembles an optimization problem)

Multiplicative Update
Alternating Least Squares

Dense Methods

Multiplicative Update Algorithm

$A \approx WH$

$H_{ij} \leftarrow H_{ij} \dfrac{(W^T A)_{ij}}{(W^T W H)_{ij}}$

$W_{ij} \leftarrow W_{ij} \dfrac{(A H^T)_{ij}}{(W H H^T)_{ij}}$

Iterative method starting with random $W \ge 0$ and $H \ge 0$.


Approaches the optimization problem using
Karush-Kuhn-Tucker (KKT) conditions.
For more information see (Lee and Seung, 2001).
MATLAB Implementation of Multiplicative NMF


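function [w, h] = nmf_mu(a, k, maxiter)
% Multiplicative-update NMF: a (m-by-n, non-negative) is approximated by w*h.
[m, n] = size(a);
w = rand(m, k);    % random non-negative starting factors
h = rand(k, n);
for i = 1:maxiter
    % The 1e-9 terms guard against division by zero.
    h = h.*(w'*a)./(w'*w*h + 1e-9);
    w = w.*(a*h')./(w*h*h' + 1e-9);
end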
Convergence for this method is very slow.
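To see how the iteration behaves, the same two update lines can be run inline on a small synthetic matrix while recording the residual. This is a sketch under assumed sizes and an assumed iteration count; the variable `res` is introduced here for illustration only.

```matlab
% Sketch: run the multiplicative updates on synthetic data and track the residual.
% The test matrix, rank, and iteration count are arbitrary illustrative choices.
m = 20; n = 15; k = 4; maxiter = 200;
a = rand(m, k) * rand(k, n);      % non-negative matrix of rank at most k
w = rand(m, k);
h = rand(k, n);
res = zeros(maxiter + 1, 1);
res(1) = norm(a - w*h, 'fro');    % residual of the random starting point
for i = 1:maxiter
    h = h.*(w'*a)./(w'*w*h + 1e-9);
    w = w.*(a*h')./(w*h*h' + 1e-9);
    res(i + 1) = norm(a - w*h, 'fro');
end
fprintf('residual: %.4f (start) -> %.4f (after %d iterations)\n', ...
        res(1), res(end), maxiter);
```

Plotting `res` against the iteration number is one way to inspect the convergence rate on a given problem.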
Alternating Non-Negative Least Squares

$A \approx WH$
At each step solve:

$\min_H \|WH - A\|_F^2$ such that $H \ge 0$

$\min_W \|H^T W^T - A^T\|_F^2$ such that $W \ge 0$

Iterative method starting with random $W \ge 0$.


Finds the next H and W by solving alternating, non-negative
least squares problems (ANLS).
Efficient implementations using projected gradients exist:
http://www.csie.ntu.edu.tw/~cjlin/nmf/index.html
NMF/ANLS Algorithm
$\min_H \|WH - A\|_F^2$ such that $H \ge 0$

$\min_W \|H^T W^T - A^T\|_F^2$ such that $W \ge 0$

Let $a_j$ and $h_j$ denote the $j$-th columns of $A$ and $H$, respectively.
Also let $a_i$ and $w_i$ denote the $i$-th rows of $A$ and $W$, respectively.
To find $W$ and $H$ such that $A \approx WH$:
Start with a random $W \in \mathbb{R}^{m \times k} \ge 0$.
For $j \le n$, solve $\min_{h_j} \|W h_j - a_j\|_2^2$.
Set all negative elements in $h_j$ to 0 and set $H(:, j) = h_j$.
For $i \le m$, solve $\min_{w_i^T} \|H^T w_i^T - a_i^T\|_2^2$.
Set all negative elements in $w_i$ to 0 and set $W(i, :) = w_i$.
Repeat the minimization problems.
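The steps above can be sketched literally as unconstrained least squares followed by setting negative entries to zero. This is a minimal illustration on random data with assumed sizes; it uses MATLAB's backslash operator for the least squares solves, and it is not the `lsqnonneg`-based implementation given on the next slide.

```matlab
% Sketch of the steps above: unconstrained least squares, then clip negatives to 0.
% Data, sizes, and iteration count are illustrative assumptions.
m = 12; n = 9; k = 3; maxiter = 20;
a = rand(m, n);                       % non-negative data matrix
w = rand(m, k);                       % random non-negative start for W
h = zeros(k, n);
for iter = 1:maxiter
    for j = 1:n
        hj = w \ a(:, j);             % min over h_j of ||W*h_j - a_j||_2
        h(:, j) = max(hj, 0);         % set negative elements to 0
    end
    for i = 1:m
        wi = h' \ a(i, :)';           % min over w_i' of ||H'*w_i' - a_i'||_2
        w(i, :) = max(wi, 0)';        % set negative elements to 0
    end
end
relerr = norm(a - w*h, 'fro') / norm(a, 'fro');
```

Clipping is cheap but only approximates the constrained minimizer of each subproblem, whereas a non-negative least squares solver handles the constraint exactly.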
MATLAB Implementation of NMF/ANLS


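function [w, h] = nmf_als(a, k, maxiter)
% NMF by alternating non-negative least squares; lsqnonneg solves each subproblem.
[m, n] = size(a);
w = rand(m, k);
h = zeros(k, n);
for i = 1:maxiter
    % Solve for h, one column at a time.
    for j = 1:n
        h(:, j) = lsqnonneg(w, a(:, j));
    end
    % Solve for w, one row at a time (transposed problem).
    for j = 1:m
        w(j, :) = lsqnonneg(h', a(j, :)')';
    end
end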
SNMF/R
SNMF/L
Applications

Sparse Methods

Two Popular Methods:
SNMF Right (SNMF/R) [forces sparseness in H].
SNMF Left (SNMF/L) [forces sparseness in W].
Both minimize $f(W, H)$ subject to $W \ge 0$ and $H \ge 0$.
Both reduce to ANLS problems.

SNMF/R Objective Function:

$f(W, H) = \frac{1}{2} \left[ \|A - WH\|_F^2 + \eta \|W\|_F^2 + \beta \sum_{j=1}^{n} \|H(:, j)\|_1^2 \right]$

SNMF/L Objective Function:

$f(W, H) = \frac{1}{2} \left[ \|A - WH\|_F^2 + \eta \|H\|_F^2 + \beta \sum_{i=1}^{m} \|W(i, :)\|_1^2 \right]$

SNMF/R
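SNMF/R Objective Function:

$f(W, H) = \frac{1}{2} \left[ \|A - WH\|_F^2 + \eta \|W\|_F^2 + \beta \sum_{j=1}^{n} \|H(:, j)\|_1^2 \right]$

ANLS Problem:

$\min_H \left\| \begin{pmatrix} W \\ \sqrt{\beta}\, e_{1 \times k} \end{pmatrix} H - \begin{pmatrix} A \\ 0_{1 \times n} \end{pmatrix} \right\|_F^2$ such that $H \ge 0$.

$\min_W \left\| \begin{pmatrix} H^T \\ \sqrt{\eta}\, I_k \end{pmatrix} W^T - \begin{pmatrix} A^T \\ 0_{k \times m} \end{pmatrix} \right\|_F^2$ such that $W \ge 0$.

Where $e_{1 \times k}$ is a row vector of all ones.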
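A short derivation shows why the stacked least squares problem for $H$ carries the sparsity penalty. It assumes the appended row is $\sqrt{\beta}\, e_{1 \times k}$, as in the `sqrt(beta)*ones(1,k)` term of the MATLAB implementation, and that $H \ge 0$; the factor $\frac{1}{2}$ and the $\|W\|_F^2$ term are constant in $H$ and are omitted.

```latex
\begin{aligned}
\left\| \begin{pmatrix} W \\ \sqrt{\beta}\, e_{1 \times k} \end{pmatrix} H
      - \begin{pmatrix} A \\ 0_{1 \times n} \end{pmatrix} \right\|_F^2
&= \|WH - A\|_F^2 + \beta \, \| e_{1 \times k} H \|_2^2 \\
&= \|WH - A\|_F^2 + \beta \sum_{j=1}^{n} \Big( \sum_{l=1}^{k} H_{lj} \Big)^2 \\
&= \|A - WH\|_F^2 + \beta \sum_{j=1}^{n} \|H(:, j)\|_1^2
   \qquad \text{(using } H \ge 0 \text{)}.
\end{aligned}
```

The same argument applied to the transposed problem, with $\sqrt{\eta}\, I_k$ appended, produces the $\eta \|W\|_F^2$ term.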


MATLAB Implementation of SNMF/R


function [w, h] = snmfr(a, beta, eta, k, maxiter)
% SNMF/R via ANLS: beta penalizes the column 1-norms of h, eta the size of w.
[m, n] = size(a);
w = rand(m, k);
h = zeros(k, n);
for i = 1:maxiter
    % Solve for h.
    for j = 1:n
        h(:, j) = lsqnonneg([w ; sqrt(beta)*ones(1, k)], ...
                            [a(:, j) ; 0]);
    end
    % Solve for w (transposed problem, one row at a time).
    for j = 1:m
        w(j, :) = lsqnonneg([h' ; sqrt(eta)*eye(k)], ...
                            [a(j, :)' ; zeros(k, 1)])';
    end
end
% We now have a sparse h.
h = sparse(h);
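As a sanity check on the stacked system used in the h-update, the sketch below compares the stacked residual with the penalized objective evaluated directly. Sizes, data, and the value of `beta` are arbitrary assumptions made for the illustration.

```matlab
% Numerical check (illustrative sizes) that the stacked system used in the
% h-update reproduces the penalized objective term by term.
m = 7; n = 5; k = 3; beta = 0.5;
a = rand(m, n);
w = rand(m, k);
h = rand(k, n);                               % any non-negative h

% Residual of the stacked system [w; sqrt(beta)*ones] * h = [a; 0].
stacked = norm([w ; sqrt(beta)*ones(1, k)]*h - [a ; zeros(1, n)], 'fro')^2;
% Data fit plus beta times the sum of squared column 1-norms of h.
direct  = norm(w*h - a, 'fro')^2 + beta*sum(sum(abs(h), 1).^2);
gap = abs(stacked - direct);                  % zero up to rounding error
```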

SNMF/L
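SNMF/L Objective Function:

$f(W, H) = \frac{1}{2} \left[ \|A - WH\|_F^2 + \eta \|H\|_F^2 + \beta \sum_{i=1}^{m} \|W(i, :)\|_1^2 \right]$

ANLS Problem:

$\min_H \left\| \begin{pmatrix} W \\ \sqrt{\eta}\, I_k \end{pmatrix} H - \begin{pmatrix} A \\ 0_{k \times n} \end{pmatrix} \right\|_F^2$ such that $H \ge 0$.

$\min_W \left\| \begin{pmatrix} H^T \\ \sqrt{\beta}\, e_{1 \times k} \end{pmatrix} W^T - \begin{pmatrix} A^T \\ 0_{1 \times m} \end{pmatrix} \right\|_F^2$ such that $W \ge 0$.

Where $e_{1 \times k}$ is a row vector of all ones.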


MATLAB Implementation of SNMF/L


function [w, h] = snmfl(a, beta, eta, k, maxiter)
% SNMF/L via ANLS: beta penalizes the row 1-norms of w, eta the size of h.
[m, n] = size(a);
w = rand(m, k);
h = zeros(k, n);
for i = 1:maxiter
    % Solve for h.
    for j = 1:n
        h(:, j) = lsqnonneg([w ; sqrt(eta)*eye(k)], ...
                            [a(:, j) ; zeros(k, 1)]);
    end
    % Solve for w (transposed problem, one row at a time).
    for j = 1:m
        w(j, :) = lsqnonneg([h' ; sqrt(beta)*ones(1, k)], ...
                            [a(j, :)' ; 0])';
    end
end
% We now have a sparse w.
w = sparse(w);

Some Applications

SNMF/R
Molecular Cancer Class Discovery (Gao and Church, 2005).
Microarray Data Analysis (Kim and Park, 2007).
SNMF/L
Handwritten Digit Recognition (next slide!)
Others?
SNMF/R seems to be used almost exclusively in recent literature.

The Problem
Image Representation
The Database
Algorithm Test Results

Handwritten Digit Recognition

The Problem
Automatically classify a single unknown handwritten digit using a
database of known digits.
[Figure: An Unknown Digit (Test Image)] 0? ... 2? ... 3? ... 5? ... 9?
$16 \times 16$-pixel grayscale images (matrices) of digits 0, ..., 9.
Application: Automatic mail sorting at the post office.
Image Representation
[Figure: Images from the Database]
Scanned and rescaled to $16 \times 16$-pixel grayscale images.
Pixels modified to take floating point values between 0 (white) and 1 (black).

The Database
7291 handwritten digits collected by the U.S. Postal Service.
Breakdown of Digits

Digit   Sample Size
0       1194
1       1005
2       731
3       658
4       652
5       556
6       664
7       645
8       542
9       644

Database retrieved from http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html


How We Handle the Database


Unroll the $16 \times 16$-pixel images into vectors in $\mathbb{R}^{256}$.
Collect all the different types (0 through 9) of unrolled images.
Place all unrolled images of type $i \in \{0, 1, ..., 9\}$ into the matrix $D_i$ as the columns.

$D_5 = \begin{pmatrix} | & | & | & & | \\ 5 & 5 & 5 & \cdots & 5 \\ | & | & | & & | \end{pmatrix}, \qquad D_5 \in \mathbb{R}^{256 \times 556}$

Notice there are many more columns than rows.
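The unrolling and grouping can be sketched as follows. The image stack `imgs` and the label vector `labels` are random stand-ins introduced for this illustration, not the actual database, and the cell array `D` uses index `i + 1` for digit `i` because MATLAB indices start at 1.

```matlab
% Sketch: unroll 16x16 images and group them by digit type into matrices D{i+1}.
% The images and labels here are random stand-ins, not the actual database.
nimg = 50;
imgs = rand(16, 16, nimg);              % hypothetical stack of grayscale images
labels = mod(0:nimg-1, 10);             % hypothetical digit label for each image
D = cell(10, 1);
for i = 0:9
    sel = imgs(:, :, labels == i);      % all images of type i
    D{i + 1} = reshape(sel, 256, []);   % each image becomes one column in R^256
end
```

With these stand-in labels every `D{i + 1}` has five columns; with the real database the column counts would follow the breakdown table.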
The Column Space and Least Squares


Take a test image $d \in \mathbb{R}^{256}$.

$D_5 = \begin{pmatrix} | & | & | & & | \\ 5 & 5 & 5 & \cdots & 5 \\ | & | & | & & | \end{pmatrix}, \qquad d = \, ?$

Is $d$ a linear combination of the columns of some $D_i$?
How close is $d$ to being a linear combination of the columns of $D_i$?
Solve a least squares problem!

$\rho_i = \min_x \|D_i x - d\|_2^2$

Observe: We are interested in the residual $\rho_i$ and not the $x$.


A Classification Algorithm

Let $d \in \mathbb{R}^{256}$ be a test digit to classify and let $i \in \{0, 1, ..., 9\}$.
Form the $D_i$ matrices (as described before) for every $i$.
For every $i$, find $\rho_i = \min_x \|D_i x - d\|_2^2$.
Compute $\min_i \{\rho_i\}$ and classify $d$ as a digit of the type $i$ that attains the minimum.
Q: How can we use NMF to do this efficiently?
A: Use properties of NMF and a low-rank approximation.
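Before turning to NMF, the plain residual-based classifier can be sketched on synthetic data. Everything here is an assumption made for illustration: the matrices in `D` are random, each has only 8 columns so that the least squares problems are well posed, and the names `rho` and `digit` are hypothetical.

```matlab
% Sketch of residual-based classification on synthetic data (not the real database).
% Each hypothetical D{i+1} holds 8 random "training images" of type i as columns.
D = cell(10, 1);
for i = 0:9
    D{i + 1} = rand(256, 8);
end
d = D{4} * rand(8, 1);                  % test vector built from the type-3 columns

rho = zeros(10, 1);                     % squared residual for each digit type
for i = 0:9
    x = D{i + 1} \ d;                   % least squares solution for type i
    rho(i + 1) = norm(D{i + 1}*x - d)^2;
end
[~, idx] = min(rho);
digit = idx - 1;                        % MATLAB indices start at 1, digits at 0
```

Because `d` lies in the column space of the type-3 matrix, its residual there is zero up to rounding, and `digit` comes out as 3.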

Exploiting Properties of NMF


Let $D_i$ have a rank-$k$ NMF called $D_i^k = W_i^k H_i^k$. Recall that the columns of $W_i^k$ form a basis for the column space of $D_i^k$.

$D_i x \approx D_i^k x = W_i^k H_i^k x = W_i^k y$, where $y = H_i^k x$.

This allows us to use $W_i^k$ instead of $D_i^k$ to find the residual:

$\min_x \|D_i^k x - d\|_2^2$ becomes $\min_y \|W_i^k y - d\|_2^2$.

Note: SNMF/L can be used to get a sparse $W_i^k$ which can be exploited to solve the least squares problem faster!

Why Do We Use Fixed Low Rank Approximation?

Reduces the computation time (pre-compute $W_i^k$).
Avoids disasters (some $D_i$ matrices span $\mathbb{R}^{256}$!).
Provides fairness (not all $D_i$ matrices have the same rank).

The NMF Based Algorithm

Let $i \in \{0, 1, ..., 9\}$.
Do once at startup:
  Form the $D_i$ matrices for every $i$.
  Compute the NMF (or SNMF/L) of each $D_i$ with a rank-$k$ approximation.
Let $d \in \mathbb{R}^{256}$ be a test digit to classify.
For every $i$, compute $q_i = \min_y \|W_i^k y - d\|_2^2$.
Compute $\min_i \{q_i\}$ and classify $d$ as the $i$ that attains the minimum.
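The whole procedure can be sketched end to end on synthetic data. This is an illustration under stated assumptions, not the author's test code: the $D_i$ are random low-rank non-negative matrices, the NMF is computed inline with multiplicative updates, the least squares solve uses the backslash operator, and the sizes (`k`, `maxiter`, `ncol`) are arbitrary.

```matlab
% Sketch of the NMF based classifier on synthetic data (not the real database).
% NMF is computed here with multiplicative updates; sizes are illustrative.
k = 5; maxiter = 100; ncol = 30;
W = cell(10, 1);
for i = 0:9                                  % done once at startup
    Di = rand(256, k) * rand(k, ncol);       % hypothetical non-negative D_i
    w = rand(256, k);
    h = rand(k, ncol);
    for it = 1:maxiter
        h = h.*(w'*Di)./(w'*w*h + 1e-9);
        w = w.*(Di*h')./(w*h*h' + 1e-9);
    end
    W{i + 1} = w;                            % keep only the basis W_i^k
end

d = rand(256, 1);                            % hypothetical test digit
q = zeros(10, 1);
for i = 0:9
    y = W{i + 1} \ d;                        % least squares against W_i^k
    q(i + 1) = norm(W{i + 1}*y - d)^2;
end
[~, idx] = min(q);
digit = idx - 1;                             % classify d as this digit type
```

Only the ten 256-by-`k` matrices in `W` are needed at test time, which is where the saving over working with the full $D_i$ comes from.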

NMF Based Algorithm Results


The following data are the test results for the NMF based algorithm with a rank approximation of 10 on a sample of 2007 test digits.

Digit   Sample Size   Correct   Incorrect   Success Rate
0       359           353       6           98.329%
1       264           257       7           97.348%
2       198           175       23          88.384%
3       166           141       25          84.940%
4       200           178       22          89.000%
5       160           148       12          92.500%
6       170           163       7           95.882%
7       147           129       18          87.755%
8       166           149       17          89.759%
9       177           167       10          94.350%

Average Success Rate: 92.676%.
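The reported rates can be recomputed from the Sample Size and Correct columns. The check below suggests that 92.676% is the pooled rate over all 2007 test digits (1860 correct), while the unweighted mean of the ten per-digit rates is slightly lower, at about 91.8%. The variable names are chosen here for illustration.

```matlab
% Recompute the success rates from the Correct and Sample Size columns above.
total   = [359 264 198 166 200 160 170 147 166 177];
correct = [353 257 175 141 178 148 163 129 149 167];
rates   = 100 * correct ./ total;            % per-digit success rates (%)
pooled  = 100 * sum(correct) / sum(total);   % 1860 of 2007 digits
unweighted = mean(rates);                    % plain mean of the ten rates
fprintf('pooled: %.3f%%, unweighted mean: %.3f%%\n', pooled, unweighted);
```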


SNMF/L Based Algorithm Results


All tests were of $D_i$ with rank 10 taking $\eta = 0.1$. The number of ANLS iterations was equal to 5. Run time is given in seconds.
Note: Each $D_i \in \mathbb{R}^{256 \times 10}$ has 2560 entries.

$\beta$     min nnz($D_i^k$)   max nnz($D_i^k$)   Run Time (s)   Success Rate
0.01        542                1271               181.37         92.676%
0.1         529                1199               179.19         91.179%
1.0         269                1000               147.50         90.533%
10.0        198                930                124.60         90.882%
100.0       218                674                135.45         88.490%
1000.0      157                411                151.08         84.853%
10000.0     157                256                131.97         80.668%

The run time is the sum of the time to obtain the factorizations and the time to test all 2007 digits.
Comparison to Other Classification Algorithms


Other methods for handwritten digit recognition are based on the SVD and minimization of tangent distances. See (Mazack, 2009), available at: http://www.wwu.edu/depts/math/colloquium/Mazack slides.pdf
Benefits of the NMF based algorithm:
Computing NMF is almost trivial compared to computing the SVD.
Much faster and more accurate than the tangent distance method.
Has the ability to introduce sparsity.
Drawbacks of the NMF based algorithm:
Least squares problems at each step are expensive.
The SVD gives a better rank-k approximation of each $D_i$.
The SVD based algorithm is much faster (13.03 seconds, k = 10).
Convergence of NMF can be time consuming.
References
Y. Gao and G. Church. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics, 21:3970-3975, 2005.
D. Lee and H. Seung. Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference. MIT Press. pp. 556-562, 2001.
H. Kim and H. Park. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics, 23:1495-1502, 2007.
M. Mazack. Algorithms for Handwritten Digit Recognition. Masters colloquium, Mathematics Department, Western Washington University, 2009.
The End!
