$$A = U\Sigma V^T$$
and
$$U = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$
We can then solve for the $\vec{v}$ vectors using $A^T \vec{u}_i = \sigma_i \vec{v}_i$, producing $\vec{v}_1 = [0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}]^T$ and $\vec{v}_2 = [1, 0, 0]^T$.
The last $\vec{v}$ must be orthogonal to the other two and have unit norm, so we can pick $[0, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}]^T$.
(b) Let us think about what the SVD does. Let us look at the matrix $A$ acting on some vector $\vec{x}$ to give the result $\vec{y}$. We have
$$A\vec{x} = U\Sigma V^T \vec{x} = \vec{y}.$$
Observe that $V^T \vec{x}$ rotates the vector, $\Sigma$ scales it, and $U$ rotates it again. We will try to "reverse" these operations one at a time and then put them together.
If $U$ "rotates" the vector $\Sigma V^T \vec{x}$, what operator can we derive that will undo the rotation?
Solution: By orthonormality, we know that $U^T U = U U^T = I$. Therefore, $U^T$ undoes the rotation.
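As a quick numerical check, here is a minimal NumPy sketch using the $2 \times 2$ matrix $U$ from part (a):

```python
import numpy as np

s = np.sqrt(2)
U = np.array([[ 1/s, 1/s],
              [-1/s, 1/s]])

# Orthonormal columns: U^T U = U U^T = I, so multiplying by U^T undoes U.
print(np.allclose(U.T @ U, np.eye(2)))  # True
print(np.allclose(U @ U.T, np.eye(2)))  # True
```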
(c) Derive a matrix that will "unscale", or undo the effect of, $\Sigma$ where it is possible to do so. Recall that $\Sigma$ has the same dimensions as $A$. Ignore any division by zero (that is to say, let the entry stay zero).
Solution: If you observe the equation
$$\Sigma \vec{x} = \vec{y}, \qquad (1)$$
you can see that $\sigma_i x_i = y_i$ for $i = 0, \ldots, m-1$, which means that to obtain $x_i$ from $y_i$, we need to multiply $y_i$ by $\frac{1}{\sigma_i}$. For any $i > m-1$, the information in $x_i$ is lost by multiplying with $0$. Therefore, the reasonable guess for $x_i$ in this case is $0$. That is why we padded zeros at the bottom of the matrix $\tilde{\Sigma}$ given below:
$$\text{If } \Sigma = \begin{bmatrix} \sigma_0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_{m-1} & 0 & \cdots & 0 \end{bmatrix}, \text{ then } \tilde{\Sigma} = \begin{bmatrix} \frac{1}{\sigma_0} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_{m-1}} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$
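A minimal NumPy sketch of this construction (the function name and tolerance are our own, not from the notebook):

```python
import numpy as np

def unscale(Sigma, tol=1e-12):
    """Build Sigma-tilde: transpose the shape, invert the non-zero
    singular values, and leave everything else at zero."""
    Sigma_tilde = np.zeros(Sigma.T.shape)
    for i in range(min(Sigma.shape)):
        if abs(Sigma[i, i]) > tol:       # where sigma_i = 0, let it stay zero
            Sigma_tilde[i, i] = 1.0 / Sigma[i, i]
    return Sigma_tilde

# A wide 2x3 Sigma with singular values 2 and sqrt(2):
Sigma = np.array([[2.0, 0.0, 0.0],
                  [0.0, np.sqrt(2.0), 0.0]])
print(unscale(Sigma))    # 3x2, with 1/2 and 1/sqrt(2) on the diagonal
```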
$$\vec{x} = A^\dagger \vec{y} = V \tilde{\Sigma} U^T \vec{y} = \begin{bmatrix} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{\sqrt{2}} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \vec{y} = \begin{bmatrix} 3 \\ \frac{1}{2} \\ -\frac{1}{2} \end{bmatrix}$$
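As a sanity check, here is a short NumPy sketch assuming the singular values $\sigma_0 = 2$ and $\sigma_1 = \sqrt{2}$ implied by the $\tilde{\Sigma}$ above ($A$ is rebuilt from the SVD factors rather than restated):

```python
import numpy as np

s = np.sqrt(2)
U = np.array([[ 1/s, 1/s],
              [-1/s, 1/s]])
V = np.array([[ 0,   1, 0  ],
              [-1/s, 0, 1/s],
              [ 1/s, 0, 1/s]])              # columns are the v vectors
Sigma = np.array([[2, 0, 0],
                  [0, s, 0]])               # assumed singular values 2, sqrt(2)
Sigma_tilde = np.array([[1/2, 0  ],
                        [0,   1/s],
                        [0,   0  ]])

A = U @ Sigma @ V.T                         # rebuild A from its SVD factors
# Our V Sigma-tilde U^T agrees with NumPy's Moore-Penrose pseudoinverse.
print(np.allclose(V @ Sigma_tilde @ U.T, np.linalg.pinv(A)))  # True
```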
(g) (Optional) Now we will see why this matrix is a useful proxy for the matrix inverse in such circumstances. Show that the solution given by the Moore-Penrose pseudoinverse satisfies the minimality property that if $\hat{\vec{x}}$ is the pseudoinverse solution to $A\vec{x} = \vec{y}$, then $\|\hat{\vec{x}}\| \le \|\vec{z}\|$ for all other vectors $\vec{z}$ satisfying $A\vec{z} = \vec{y}$.
(Hint: look at the vectors involved in the V basis. Think about the relevant nullspace and how it is
connected to all this.)
This minimality property is useful in both control applications (as you will see in the next problem)
and in communications applications.
Solution: Since $\hat{\vec{x}}$ is the pseudoinverse solution, we know that
$$\hat{\vec{x}} = V \tilde{\Sigma} U^T \vec{y}.$$
Let us write down what $\hat{\vec{x}}$ is with respect to the columns of $V$. Let there be $k$ non-zero singular values. The following expression comes from expanding the matrix multiplication:
$$\hat{\vec{x}}|_V = V^T \hat{\vec{x}} = V^T A^\dagger \vec{y} = V^T V \tilde{\Sigma} U^T \vec{y} = \tilde{\Sigma} U^T \vec{y} = \left[ \frac{\langle \vec{y}, \vec{u}_0 \rangle}{\sigma_0}, \frac{\langle \vec{y}, \vec{u}_1 \rangle}{\sigma_1}, \ldots, \frac{\langle \vec{y}, \vec{u}_{k-1} \rangle}{\sigma_{k-1}}, 0, \ldots, 0 \right]^T$$
The $n - k$ zeros at the end come from the fact that there are only $k$ non-zero singular values. Therefore, by construction, $\hat{\vec{x}}$ is a linear combination of the first $k$ columns of $V$.
Now consider any other solution $\vec{z}$ satisfying $A\vec{z} = \vec{y}$, and write it in the same basis as $\vec{z}|_V = V^T \vec{z}$, where $\vec{z}|_V$ is the projection of $\vec{z}$ in the $V$ basis. Using the idea of "unscaling" for the first $k$ elements (where the unscaling is clearly invertible) and "unrotation" after that, we see that the first $k$ elements of $\vec{z}|_V$ must be identical to the first $k$ elements of $\hat{\vec{x}}|_V$.
However, since the information for the last $n - k$ elements of $\vec{z}|_V$ is lost by multiplying by zeros, any values $\alpha_\ell$ there are unconstrained as weights on the last part of the $V$ basis, namely the weights on the basis for the nullspace of $A$. Therefore,
$$\vec{z}|_V = \left[ \frac{\langle \vec{y}, \vec{u}_0 \rangle}{\sigma_0}, \frac{\langle \vec{y}, \vec{u}_1 \rangle}{\sigma_1}, \ldots, \frac{\langle \vec{y}, \vec{u}_{k-1} \rangle}{\sigma_{k-1}}, \alpha_k, \alpha_{k+1}, \ldots, \alpha_{n-1} \right]^T.$$
Now, since the columns of $V$ are orthonormal, observe that
$$\|\hat{\vec{x}}\|^2 = \sum_{i=0}^{k-1} \left| \frac{\langle \vec{y}, \vec{u}_i \rangle}{\sigma_i} \right|^2$$
and that,
k−1
h~y,~ui i 2 n−1
2 2
||~z|| = ∑
+ ∑ |αi |
i=0 σ i i=k
Therefore,
$$\|\vec{z}\|^2 = \|\hat{\vec{x}}\|^2 + \sum_{i=k}^{n-1} |\alpha_i|^2 \ge \|\hat{\vec{x}}\|^2,$$
so $\|\hat{\vec{x}}\| \le \|\vec{z}\|$ for every solution $\vec{z}$, with equality exactly when all the $\alpha_i$ are zero.
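A small numerical illustration of this minimality property ($A$ and $\vec{y}$ here are the values consistent with the computation above; the nullspace direction is the last column of $V$):

```python
import numpy as np

A = np.array([[1.0, -1.0,  1.0],
              [1.0,  1.0, -1.0]])           # consistent with the SVD factors above
y = np.array([2.0, 4.0])

x_hat = np.linalg.pinv(A) @ y               # pseudoinverse (minimum-norm) solution
null_vec = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)   # last column of V spans null(A)
z = x_hat + 3.0 * null_vec                  # another solution to A z = y

print(np.allclose(A @ x_hat, y), np.allclose(A @ z, y))   # True True
print(np.linalg.norm(x_hat) <= np.linalg.norm(z))         # True
```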
2. Eigenfaces
In this problem, we will be exploring the use of PCA to compress and visualize pictures of human faces.
We use the images from the data set Labeled Faces in the Wild. Specifically, we use a set of 13,232 images
aligned using deep funneling to ensure that the faces are centered in each photo. Each image is 100 × 100
pixels. To turn an image into a vector, we stack each column of pixels in the image on top of each other,
and we normalize each pixel value to be between 0 and 1. Thus, a single image of a face is represented by
a 10,000-dimensional vector. A vector this size is a bit challenging to
work with directly. We combine the vectors from each image into a single matrix so that we can run PCA.
For this problem, we will provide you with the first 1000 principal components, but you can explore how
well the images are compressed with fewer components. Please refer to the IPython notebook to answer the
following questions.
(a) We provide you with a randomly selected subset of 1000 faces from the training set, the first 1000
principal components, all 13,232 singular values, and the average of all of the faces. What do we need
the average of the faces for?
Solution: We need to zero-center the data by subtracting out the average before running PCA. During
the reconstruction, we need to add the average back in.
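A minimal sketch of this centering-and-reconstruction pipeline (the array names, random stand-in data, and choice of k are illustrative, not the notebook's code):

```python
import numpy as np

# X holds one 10,000-dimensional face vector per row.
X = np.random.rand(1000, 10000)             # stand-in for the provided faces

mean_face = X.mean(axis=0)                  # the "average face"
Xc = X - mean_face                          # zero-center before running PCA

# Principal components are the right singular vectors of the centered data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 100                                     # number of components to keep
codes = Xc @ Vt[:k].T                       # compress: project onto components
X_rec = codes @ Vt[:k] + mean_face          # reconstruct: add the average back
```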
3. Image Processing by Clustering
In this homework problem, you will learn how to use the k-means algorithm to solve two image processing problems: (1) color quantization and (2) image segmentation.
Digital images are composed of pixels (you can think of pixels as the small points shown on your screen). Each pixel is a data point (sample) of an original image. The intensity of each pixel is its feature. If we use an 8-bit integer $Var_i$ to represent the intensity of one pixel, $Var_i = 0$ means black, while $Var_i = 255$ means white. Images expressed only by pixel intensities are called grayscale images.
In color image systems, the color of a pixel is typically represented by three component intensities (features) such as Red, Green, and Blue. The features of one pixel are then a list of three 8-bit integers: $[Var_r, Var_g, Var_b]$. Here $[0, 0, 0]$ means black, while $[255, 255, 255]$ represents white. You can find tools like this website to see how these values correspond to colors.
(a) Please look at the IPython notebook file, where you will find a 4 by 4 grayscale image. Perform the
k-means algorithm on the 16 data points with k = 4. What are the representation colors (centroids)?
Show the image after color quantization.
Solution: See sol3.ipynb. There are multiple local optima. Here we chose [0, 135, 175, 230] as the
initial centroids. The final centroids are [23.5, 119.67, 174, 230.67]. The values of the pixels assigned
to each cluster are {0, 18, 27, 49} for 23.5, {111, 113, 115, 122, 123, 134} for 119.67, {169, 175, 178}
for 174, and {223, 234, 235} for 230.67.
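A minimal sketch of this computation (a plain k-means implementation on the 16 intensities; the function name is our own, not from sol3.ipynb):

```python
import numpy as np

def kmeans_1d(pixels, centroids, iters=20):
    """Plain k-means on scalar intensities."""
    pixels = pixels.astype(float)
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest centroid.
        labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
        # Update step: move each centroid to the mean of its cluster.
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = pixels[labels == j].mean()
    return centroids, labels

pixels = np.array([0, 18, 27, 49, 111, 113, 115, 122, 123, 134,
                   169, 175, 178, 223, 234, 235])
centroids, labels = kmeans_1d(pixels, [0, 135, 175, 230])
print(centroids)                  # [ 23.5  119.67 174.  230.67]
quantized = centroids[labels]     # each pixel replaced by its cluster's color
```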
(b) See the IPython notebook. Apply the k-means algorithm to the grayscale image with different values of
k. Observe the distortion. Choose a good value for k, which should be the minimum value for keeping
the compressed image visually similar to the original image. Calculate the memory we need for the
compressed image.
Solution: As you increase the value of k_gray, the compressed image becomes more similar to the original
image and the value of distortion_gray becomes smaller. For this image, 8 different colors might be
enough, but some might feel we need more. The image we use is of size 597 ×
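For the memory calculation, here is a hedged sketch; the bit-packing scheme and the 600 × 400 dimensions are illustrative assumptions, not the notebook's actual values:

```python
import numpy as np

def compressed_bytes(height, width, k):
    """Estimate memory for a k-color quantized grayscale image: each pixel
    stores a palette index, plus the k 8-bit centroid values themselves."""
    index_bits = int(np.ceil(np.log2(k)))        # bits per pixel index
    index_bytes = height * width * index_bits / 8
    palette_bytes = k                            # one byte per centroid
    return index_bytes + palette_bytes

# With k = 8, each pixel needs only 3 bits instead of 8.
print(compressed_bytes(600, 400, 8))             # hypothetical image size
```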
4. Brain-machine interface
The IPython notebook pca_brain_machine_interface.ipynb will guide you through the process of analyzing
brain-machine interface data using principal component analysis (PCA). This will help you to prepare for
the project, where you will need to use PCA as part of a classifier that will allow you to use voice or music
inputs to control your car.
Please complete the notebook by following the instructions given.
Solution: The notebook pca_brain_machine_interface_sol.ipynb contains solutions to this exercise.
Contributors:
• Siddharth Iyer.
• Justin Yim.
• Stephen Bailey.
• Yu-Yun Dai.