
2/16/2015

Assignment #2
Principal Component Analysis

Matthew Reaume
GISC9216

February 16, 2015


GISC9216
Janet Finlay
GISC Program Coordinator and Instructor
Niagara College
135 Taylor Road
Niagara-on-the-Lake, ON
L0S 1J0
Dear Ms. Finlay,
RE: GISC9216 Assignment #2: Principal Component Analysis
Please accept this letter as my formal submission of deliverable two, Principal Component
Analysis, for the course GISC9216, Digital Image Processing.
The assignment provides working knowledge of Principal Component Analysis (PCA) and compares
the resulting PCA image to the original subset image. A comparison is also conducted between the
unsupervised classification of the PCA image and the original unsupervised classification, outlining
the noticeable differences between the two images. This submission includes a write-up answering
the questions given for Assignment 2, along with diagrams, histograms, and images.
Should you have any questions regarding the enclosed documents, or if there are technical issues
regarding the files, please contact me at your convenience at mkreaume9@gmail.com or at
226-345-2440. Thank you for your time and attention. I look forward to your comments and
suggestions.
Kindest Regards,
Matt Reaume, BES
GIS-GM Certificate Candidate
Niagara College
M.R./m.r.
Enclosures:

Digital Copy:
X:\Students\MREAUME2\GISC9216\Assignment#1\ReaumeMGISC9216D1 containing:
a) ReaumeMGISC9216D2.docx
b) unsupervised_image.img
c) pca_unsuper_image.img
d) pca_image

Table of Contents
1) Transform Original Bands to Principal Components
2) Band Comparison of Strong, Moderate, and Weak Correlations
3) PCA Variance on First, Second, and Third Channel
4) Comparison of Original Data to the PCA Channels
5a) Unsupervised Classification on the Original Image
5b) Unsupervised Classification on the Principal Component Analysis
6) Comparison of the PCA and Unsupervised

Table of Figures
Figure 1: Histogram of Band 1 and Band 2
Figure 2: Histogram of Band 1 and Band 4
Figure 3: Eigen Matrix of PCA
Figure 4: Eigen Values of PCA
Figure 5: Formal Layout of Unsupervised Classification
Figure 6: Formal Layout of Unsupervised PCA
Figure 7: Agricultural Features of PCA (left) and Unsupervised Classification (right)
Figure 8: Pixel Value Comparison
Figure 9: Urban Features Comparison of PCA (left) and Unsupervised Classification (right)
Figure 10: Comparison of PCA (left) and Subset Image (right)
Figure 11: Comparison of Unsupervised Classification (left) and Subset Image (right)

Table of Tables
Table 1: Strong Correlation of Bands
Table 2: Moderate Correlation of Bands
Table 3: Weak Correlation of Bands
Table 4: PCA Channels and Calculation of Covariance
Table 5: Comparison of PCA to Original Subset Scatterplots
Table 6: Comparison of PCA and Original Subset Histograms


1) Transform Original Bands to Principal Components


Principal Component Analysis (PCA) compresses the original file bands by reorganizing their
variance into new bands, called principal components, so that most of the information is carried
by fewer bands. This procedure transforms a set of correlated variables into a new set of
uncorrelated variables. Original image bands are transformed into principal components because
some bands are strongly correlated with each other, which makes much of their data redundant. For
example, if bands 1 and 2 are strongly correlated, as are bands 4 and 5 and bands 5 and 6, then the
second band of each correlated pair largely repeats the information already in the first. Pulling out
only the most important information is therefore useful, and the data become easier to interpret
when there is less overlap between the bands. This can improve the accuracy of the results that
Principal Components provide.
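The transform described above can be sketched for the simplest case of two bands. This Python sketch is illustrative only: the synthetic band values and the function names are assumptions, not the procedure used by the image-processing software in this assignment. It computes the covariance of two correlated bands, finds the eigen values and eigenvectors of the 2 × 2 covariance matrix in closed form, and rotates the pixel values onto the new, uncorrelated axes.

```python
import math
import random


def mean(values):
    """Arithmetic mean of a list of pixel values."""
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance (n - 1 denominator) of two equally long pixel lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pca_two_bands(band_a, band_b):
    """Rotate two correlated bands onto their principal axes.

    Returns ((lambda1, lambda2), (pc1, pc2)) with lambda1 >= lambda2.
    """
    a = covariance(band_a, band_a)
    b = covariance(band_a, band_b)
    c = covariance(band_b, band_b)
    # Closed-form eigen values of the symmetric 2 x 2 covariance matrix [[a, b], [b, c]].
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = (half_trace + radius, half_trace - radius)
    # Angle of the first principal axis; the second axis is perpendicular to it.
    theta = 0.5 * math.atan2(2 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    ma, mb = mean(band_a), mean(band_b)
    pc1 = [v1[0] * (x - ma) + v1[1] * (y - mb) for x, y in zip(band_a, band_b)]
    pc2 = [v2[0] * (x - ma) + v2[1] * (y - mb) for x, y in zip(band_a, band_b)]
    return eigenvalues, (pc1, pc2)


# Hypothetical, strongly correlated bands (synthetic values, not the assignment image).
rng = random.Random(42)
band1 = [rng.uniform(20, 200) for _ in range(1000)]
band2 = [0.9 * value + rng.gauss(0, 5) for value in band1]

eigenvalues, (pc1, pc2) = pca_two_bands(band1, band2)
share = eigenvalues[0] / sum(eigenvalues)
print(f"Variance in PC1: {share:.1%}")
print(f"Covariance of PC1 and PC2: {covariance(pc1, pc2):.2e}")
```

Because the rotated channels are uncorrelated, their covariance is zero up to rounding error, and for these strongly correlated synthetic bands the first channel carries almost all of the variance.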

2) Band Comparison of Strong, Moderate, and Weak Correlations


Feature space images graph the data file values of one band against the values of another band, in
the form of a scatterplot. The colours in the images below reflect the density of points for the two
bands: bright tones represent a high density and dark tones represent a low density. When the
scatterplot is linear, starting at the bottom left of the graph and ending in the top right corner, the
two bands are strongly correlated. Bands that are weakly correlated produce a triangular
scatterplot. Lastly, scatterplots that are only partially linear, with both ends of the feature pointing
toward opposite corners, indicate a moderate correlation. The following tables list the correlation
of each band pair, an example image, and the reasoning for classifying the correlation as strong,
moderate, or weak.
Table 1: Strong Correlation of Bands

Bands: 1:2, 1:3, 2:3
Example of Correlation: (feature space image)
Reasoning: This is a strong correlation due to the linear placement of the scatterplot. The plot
starts in the bottom left and progresses linearly to the top right of the image.


Table 2: Moderate Correlation of Bands

Bands: 1:5, 1:6, 2:5, 2:6, 3:5, 3:6, 5:6
Example of Correlation: (feature space image)
Reasoning: This is a moderate correlation due to the partially linear placement of the scatterplot.
There is a partial linear shape, and both ends of the feature point diagonally toward opposite
corners of the feature space image.

Table 3: Weak Correlation of Bands

Bands: 1:4, 2:4, 3:4, 3:5, 4:5, 4:6
Example of Correlation: (feature space image)
Reasoning: This is a weak correlation due to the shape of the scatterplot. There is no linear shape
in the image; instead, the scatterplot is triangular.

Along with the feature space images above, histograms also help show the correlation between
bands. Figure 1 below shows the redundancy between band 1 and band 2: these bands have a strong
correlation in Table 1, and their histograms have a nearly identical overall shape. Figure 2 displays
band 1 and band 4, which have a weak correlation in Table 3; here the two histograms have clearly
different shapes. The tables above and the histograms below show that some of the data is
redundant, so a Principal Component Analysis is performed to reduce this redundancy.


Figure 1: Histogram of Band 1 and Band 2

Figure 2: Histogram of Band 1 and Band 4
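The feature space images and the strong, moderate, and weak correlations above can also be checked numerically. The Python sketch below is a hedged illustration: it builds a feature space image as a 2-D histogram of two bands and computes the Pearson correlation coefficient for a band pair. The band values are synthetic and hypothetical, and the bin count (32) and 8-bit value range (0 to 255) are assumptions rather than settings taken from the assignment.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two bands' pixel values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def feature_space(band_x, band_y, bins=32, max_value=256):
    """Count pixels in each cell of a 2-D histogram of two bands.

    grid[row][col] counts pixels whose band_y value falls in `row` and whose
    band_x value falls in `col`. Larger counts would be drawn as brighter
    tones; row 0 holds the lowest band_y values, so a viewer that draws
    row 0 at the top needs the rows reversed.
    """
    width = max_value / bins
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(band_x, band_y):
        col = min(int(x / width), bins - 1)
        row = min(int(y / width), bins - 1)
        grid[row][col] += 1
    return grid


# Hypothetical 8-bit bands: band2 tracks band1 closely, band4 is unrelated to it.
rng = random.Random(7)
band1 = [rng.randint(30, 220) for _ in range(5000)]
band2 = [max(0, min(255, round(0.8 * v + rng.gauss(0, 6)))) for v in band1]
band4 = [rng.randint(0, 255) for _ in band1]

r12 = pearson_r(band1, band2)
r14 = pearson_r(band1, band4)
fs = feature_space(band1, band2)
print(f"r(band1, band2) = {r12:.3f}")
print(f"r(band1, band4) = {r14:.3f}")
```

A coefficient close to 1 corresponds to the narrow diagonal scatterplots in Table 1, while a value near 0 corresponds to the broad point clouds in Table 3; where to draw the line between strong, moderate, and weak remains a judgement call.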


3) PCA Variance on First, Second, and Third Channel


The subset image has redundancy between some bands; therefore, a Principal Component Analysis
is executed to reduce that redundancy. Principal Component Analysis picks out the unique
information from all of the bands and places it into new channels. Together, these components
contain all of the information in the original bands, reorganized into new, uncorrelated channels.
Figure 3 below shows the eigen matrix for the six bands: a 6 × 6 grid of values in which each entry
is the weight that one original band contributes to one principal component.

Figure 3: Eigen Matrix of PCA
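To make the structure of a 6 × 6 eigen matrix concrete, here is a pure-Python sketch that eigen-decomposes the covariance matrix of six bands using the Jacobi rotation method. It is an illustration under stated assumptions, not the algorithm used by the software that produced Figure 3: the six bands are synthetic, and the helper names (`covariance_matrix`, `jacobi_eigen`, `to_components`, `from_components`) are hypothetical. It shows two properties of a PCA eigen matrix: its eigenvectors are orthonormal, and because of that the original bands can be rebuilt exactly from all six components.

```python
import math
import random


def covariance_matrix(bands):
    """Sample covariance matrix (n - 1 denominator) of equally long bands."""
    n = len(bands[0])
    means = [sum(band) / n for band in bands]
    return [
        [sum((x - mi) * (y - mj) for x, y in zip(bi, bj)) / (n - 1) for bj, mj in zip(bands, means)]
        for bi, mi in zip(bands, means)
    ]


def jacobi_eigen(matrix, max_sweeps=50):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by descending eigenvalue;
    eigenvectors[k] holds the band weights for principal component k.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the rotation into the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: a[k][k], reverse=True)
    eigenvalues = [a[k][k] for k in order]
    eigenvectors = [[v[i][k] for i in range(n)] for k in order]
    return eigenvalues, eigenvectors


def to_components(bands, eigenvectors):
    """Forward PCA: project mean-centred bands onto each eigenvector."""
    n = len(bands[0])
    means = [sum(band) / n for band in bands]
    components = [
        [sum(w * (band[i] - m) for w, band, m in zip(vec, bands, means)) for i in range(n)]
        for vec in eigenvectors
    ]
    return components, means


def from_components(components, eigenvectors, means):
    """Inverse PCA: the eigenvectors are orthonormal, so their transpose undoes the rotation."""
    n = len(components[0])
    return [
        [m + sum(vec[b] * comp[i] for vec, comp in zip(eigenvectors, components)) for i in range(n)]
        for b, m in enumerate(means)
    ]


# Hypothetical six-band image: bands 1-3, 5 and 6 follow one signal, band 4 another.
rng = random.Random(1)
signal = [rng.uniform(0, 100) for _ in range(500)]
other = [rng.uniform(0, 100) for _ in range(500)]
bands = [
    [g * w + rng.gauss(0, 3) for g in (other if i == 3 else signal)]
    for i, w in enumerate([0.9, 0.8, 1.0, 1.2, 0.7, 0.6])
]

cov = covariance_matrix(bands)
eigenvalues, eigen_matrix = jacobi_eigen(cov)  # 6 x 6: one row of band weights per component
components, means = to_components(bands, eigen_matrix)
restored = from_components(components, eigen_matrix, means)
```

Rebuilding from only the first three components instead of all six would give an approximation whose error corresponds to the variance held in the discarded channels.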

To create the Principal Component Analysis image, the three components at the top of the eigen
values list are selected, since they carry the most information and are the most important for the
purpose of this assignment. Three components are chosen because the goal is to reduce the
redundancy of the six bands. The top three values account for 99.4% of the variance, so
approximately 0.6% of the data is lost when the PCA is executed. Figure 4 below shows the eigen
values, and Table 4 shows the calculation of percentages for each of the PCA channels.

Figure 4: Eigen Values of PCA

Table 4: PCA Channels and Calculation of Covariance

PCA Channel | Eigen Value | Divided By Total | Percentage (%) | Cumulative Variance Accounted For (%)
1 | 1993.072512 | 0.681932356 | 68.19323562 | 68.2
2 | 816.4739036 | 0.27935761 | 27.93576097 | 96.1
3 | 94.49608606 | 0.032331959 | 3.233195894 | 99.4
4 | 11.71935572 | 0.004009793 | 0.400979283 | 99.8
5 | 5.215368048 | 0.001784445 | 0.178444497 | 99.9
6 | 1.706371854 | 0.000583837 | 0.058383735 | 100.0
Total | 2922.683597 | | 100% | 100%

According to Table 4, the first channel accounts for 68.2% of the variance, the second for 27.9%,
and the third for 3.2%. Together, these three channels account for 99.4% of the variance, which
means that the remaining approximately 0.6%, as mentioned above, is lost. Adding the fourth
channel increases the cumulative variance by only 0.4%, whereas adding the second channel
increases it by 27.9% and adding the third by 3.3%. This shows why PCA is useful for compressing
data: almost all of the variance is concentrated in the first few channels, so the remaining channels
can be dropped with little loss of information.
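The percentage and cumulative columns in Table 4 come from dividing each eigen value by the sum of all six. A short Python sketch of that arithmetic, using the eigen values from Table 4, is shown below; the variable names are illustrative only.

```python
from itertools import accumulate

# Eigen values of PCA channels 1 to 6, copied from Table 4.
eigen_values = [1993.072512, 816.4739036, 94.49608606, 11.71935572, 5.215368048, 1.706371854]

total = sum(eigen_values)
percentages = [100 * value / total for value in eigen_values]  # "Percentage (%)" column
cumulative = list(accumulate(percentages))  # cumulative variance accounted for

kept = 3  # channels retained for the PCA image
lost = 100 - cumulative[kept - 1]

for channel, (pct, cum) in enumerate(zip(percentages, cumulative), start=1):
    print(f"Channel {channel}: {pct:6.2f}%  cumulative {cum:5.1f}%")
print(f"Keeping {kept} channels loses {lost:.2f}% of the variance")
```

Run as written, the cumulative values round to the Table 4 column (68.2, 96.1, 99.4, 99.8, 99.9, 100.0), and keeping three channels discards about 0.64% of the variance, consistent with the roughly 0.6% loss discussed above.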

4) Comparison of Original Data to the PCA Channels


The Principal Component Analysis produces data that differ significantly from the original subset
image. Table 5 below shows how the scatterplots change from the original data to the PCA channels
for bands 1 and 2, 1 and 3, and 2 and 3. These band pairs are strongly correlated in the original
subset image, but the corresponding PCA channels show a weak correlation or none at all. This is
due to the elimination of redundancy and the compression that the Principal Component Analysis
provides, and the result is a different set of data from the original subset image. The data can also
be viewed through the histograms in Table 6. This table shows how strongly correlated bands 1, 2,
and 3 are with each other in the original data, which results in similar histograms, whereas the
histograms of PCA channels 1, 2, and 3 have completely different shapes. Different shapes indicate
a weak correlation or none, and similar shapes indicate a strong correlation. Therefore, compressing
the data and eliminating redundancy makes the information in each channel unique.
Table 5: Comparison of PCA to Original Subset Scatterplots

Bands: 1:2
PCA Channels and Original Data: (scatterplot images)
Comparison: The PCA scatterplot shows that the bands are unique when compared to the original
data. The redundancy has been removed, as shown by the weak or absent correlation.

Bands: 1:3
PCA Channels and Original Data: (scatterplot images)
Comparison: Again, the PCA scatterplot shows that the bands have little or no correlation, and the
redundancy has also been removed.

Bands: 2:3
PCA Channels and Original Data: (scatterplot images)
Comparison: Lastly, the PCA scatterplot shows a weak or no correlation between the bands, and the
redundancy is once again removed.

Table 6: Comparison of PCA and Original Subset Histograms

(Histogram images for Band 1, Band 2, and Band 3, shown for the PCA channels and for the original data)


5a) Unsupervised Classification on the Original Image

Figure 5: Formal Layout of Unsupervised Classification


5b) Unsupervised Classification on the Principal Component Analysis

Figure 6: Formal Layout of Unsupervised PCA


6) Comparison of the PCA and Unsupervised


Agricultural Comparison
After creating the Principal Component Analysis image and comparing its unsupervised
classification with the original unsupervised classification, it is evident that the urban and
agricultural areas differ noticeably. Figure 7 below displays the unsupervised Principal Component
Analysis image on the left and the original unsupervised classification on the right. This figure
highlights the agricultural areas, illustrating that there is more of this land-use type in the PCA image
than in the original unsupervised classification image. This comparison suggests that the PCA image
is closer to reality, since agricultural areas occur throughout the entire image, especially around the
residential/roads land-use type. This is supported by the pixel counts in the histogram column in
Figure 8: the PCA image contains 116,448 agricultural pixels, while the unsupervised classification
image contains 41,038. When the PCA image is compared to the original subset image, its
vegetation areas match more closely than those of the original unsupervised classification image,
as shown in Figure 10.

Figure 7: Agricultural Features of PCA (left) and Unsupervised Classification (right)


Figure 8: Pixel Value Comparison

Urban (Industrial/Commercial) Comparison


When comparing the urban land-use types between the PCA image and the unsupervised
classification image, it is evident that the urban areas are more pronounced in the original
unsupervised classification image. This is due to the unsupervised classification process grouping
multiple land-use types together, which results in an inaccurate representation of real-world
features. Figure 9 below displays the unsupervised Principal Component Analysis image on the left
and the original unsupervised classification on the right. This figure highlights the urban areas,
illustrating that there is more of this land-use type in the original unsupervised classification image
than in the PCA image. As shown in Figure 8, the histogram column lists the number of pixels in
each land-use type: the PCA image contains 12,334 urban (industrial/commercial) pixels, while the
original unsupervised classification contains 51,123. When the original unsupervised classification
image is compared with the original subset image, it is clear that these urban areas do not appear
on the subset image, which means that the unsupervised classification did not assign these
land-use types correctly. This comparison can be seen in Figure 11.

Figure 9: Urban Features Comparison of PCA (left) and Unsupervised Classification (right)


Figure 10: Comparison of PCA (left) and Subset Image (right)

Figure 11: Comparison of Unsupervised Classification (left) and Subset Image (right)
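The pixel counts quoted from Figure 8 can be compared with a few lines of Python. This is a simple arithmetic sketch with a hypothetical helper name (`compare`); pixel counts are only a proxy for area, and converting them to hectares would require the image's pixel size, which is not stated in this report.

```python
# Pixel counts read from the histogram column in Figure 8.
pixel_counts = {
    "agricultural": {"pca": 116_448, "unsupervised": 41_038},
    "urban (industrial/commercial)": {"pca": 12_334, "unsupervised": 51_123},
}


def compare(counts):
    """Return (PCA minus unsupervised pixel count, PCA / unsupervised ratio)."""
    difference = counts["pca"] - counts["unsupervised"]
    return difference, counts["pca"] / counts["unsupervised"]


for land_use, counts in pixel_counts.items():
    difference, ratio = compare(counts)
    print(f"{land_use}: PCA - unsupervised = {difference:+,} pixels (ratio {ratio:.2f})")
```

The PCA classification holds roughly 2.8 times as many agricultural pixels as the original unsupervised classification, and roughly a quarter as many urban pixels.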
