
Monday, June 4, 2007

Helvio’s experiments
PCA of ions

Univariate stats

Boxplot of ions from the original containers

Ion concentrations from the original containers.

Note that SO4 and NO3 have very little variation and are largely uninformative. I recommend dropping them completely.
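The notes judge "very little variation" from the boxplot. One possible way to make that check numeric is the coefficient of variation (standard deviation divided by mean) per ion. The sketch below assumes that measure, an arbitrary 5% cutoff, and made-up concentrations. None of these come from the original analysis.

```python
import numpy as np

def low_variation_columns(data, names, cv_threshold=0.05):
    """Return the names of columns whose coefficient of variation
    (sample std / |mean|) is below cv_threshold.
    The 0.05 default is an arbitrary, assumed cutoff."""
    data = np.asarray(data, dtype=float)
    means = data.mean(axis=0)
    stds = data.std(axis=0, ddof=1)
    cv = stds / np.abs(means)
    return [name for name, c in zip(names, cv) if c < cv_threshold]

# Hypothetical concentrations: rows = containers, columns = ions (not real data).
names = ["NO2", "NH4", "PO4", "SO4", "NO3"]
data = np.array([
    [0.8, 1.2, 0.30, 10.0, 5.0],
    [0.5, 0.9, 0.25, 10.1, 5.0],
    [0.2, 0.4, 0.10, 10.0, 5.1],
    [0.9, 1.5, 0.35,  9.9, 5.0],
])
print(low_variation_columns(data, names))  # → ['SO4', 'NO3']
```

The coefficient of variation is unit-free, so ions with very different concentration ranges can be compared. It breaks down for variables whose mean is near zero, and it is not meaningful for pH, which is on a log scale.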

PCA

PCA of ions without SO4 or NO3. The point of PCA is to collapse the variation in the different variables into a single common index. That is, it uses the covariance between variables to calculate the vectors (principal components) that best capture their combined variance, with the first component capturing the most.
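A minimal sketch of that calculation in NumPy. It assumes the variables are standardized, which makes the PCA an eigen-decomposition of the correlation matrix. The notes do not say whether the ions were standardized, and the data below are hypothetical.

```python
import numpy as np

def pca(data):
    """PCA on standardized variables, i.e. an eigen-decomposition of the
    correlation matrix. Returns eigenvalues (variance captured by each
    component, largest first) and eigenvectors (one column per component)."""
    x = np.asarray(data, dtype=float)
    corr = np.corrcoef(x, rowvar=False)      # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort, largest first
    return eigvals[order], eigvecs[:, order]

# Hypothetical data: rows = containers, columns = NO2, NH4, PO4, pH.
data = np.array([
    [0.8, 1.2, 0.30, 7.9],
    [0.5, 0.9, 0.25, 7.6],
    [0.2, 0.4, 0.10, 7.1],
    [0.9, 1.5, 0.35, 8.0],
    [0.4, 0.7, 0.20, 7.4],
    [0.6, 0.5, 0.12, 7.2],
])
eigvals, eigvecs = pca(data)
print("variance per component:", np.round(eigvals, 3))
print("component 1 weights (NO2, NH4, PO4, pH):", np.round(eigvecs[:, 0], 3))
```

Standardizing keeps pH and the ions, which sit on very different scales, from being weighted purely by their units. Running PCA on the raw covariance matrix is the main alternative.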

The scree plot illustrates the amount of the variance in the data set (all ions + pH) explained by each component.

Component 1 explains most of the variation.
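The heights in a scree plot are each eigenvalue divided by the sum of all eigenvalues. The sketch below computes those fractions and prints a crude text version of the plot. It assumes a correlation-matrix PCA and uses made-up data.

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of total (standardized) variance captured by each
    principal component, largest first: the bar heights of a scree plot."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh is ascending; reverse it
    eigvals = np.clip(eigvals, 0.0, None)     # drop tiny negative round-off
    return eigvals / eigvals.sum()

# Hypothetical data: rows = containers, columns = NO2, NH4, PO4, pH.
data = np.array([
    [0.8, 1.2, 0.30, 7.9],
    [0.5, 0.9, 0.25, 7.6],
    [0.2, 0.4, 0.10, 7.1],
    [0.9, 1.5, 0.35, 8.0],
    [0.4, 0.7, 0.20, 7.4],
    [0.6, 0.5, 0.12, 7.2],
])
ratios = explained_variance_ratio(data)
for i, r in enumerate(ratios, start=1):
    print(f"PC{i}: {r:6.1%} " + "#" * int(round(r * 40)))
```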

Variable sorting

Biplot: a plot placing the different variables and the original containers on a plot with components 1 and 2 as the axes. The numbers represent the original containers. Most of the variables increase toward the left. There are clearly 2 highly influential outliers: 1 and 4.

Note that at this point we could suggest 2 groups of containers, 1, 4, 9, 10 and the rest, based on where they fall along component 1.
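A biplot combines two sets of coordinates: component scores for the containers (the points) and loadings for the variables (the arrows). The sketch below computes both for a standardized PCA and then splits containers by the side of 0 they fall on along component 1. The data, the container numbering, and the split at exactly 0 are illustrative assumptions, not the grouping rule used in the notes.

```python
import numpy as np

def biplot_coordinates(data, n_components=2):
    """Return (scores, loadings) for a PCA on standardized variables.
    scores:   one row per container, its position on each component.
    loadings: one row per variable, its correlation with each component."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize each variable
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(x, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]  # largest components first
    eigvals = np.clip(eigvals[order], 0.0, None)      # guard against round-off below 0
    eigvecs = eigvecs[:, order]
    scores = z @ eigvecs
    loadings = eigvecs * np.sqrt(eigvals)
    return scores, loadings

# Hypothetical data: rows = containers 1..6, columns = NO2, NH4, PO4, pH.
data = np.array([
    [0.8, 1.2, 0.30, 7.9],
    [0.5, 0.9, 0.25, 7.6],
    [0.2, 0.4, 0.10, 7.1],
    [0.9, 1.5, 0.35, 8.0],
    [0.4, 0.7, 0.20, 7.4],
    [0.6, 0.5, 0.12, 7.2],
])
containers = np.arange(1, len(data) + 1)
scores, loadings = biplot_coordinates(data)
left = containers[scores[:, 0] < 0].tolist()
right = containers[scores[:, 0] >= 0].tolist()
print("below 0 on component 1:", left)
print("at or above 0 on component 1:", right)
```

The sign of each eigenvector is arbitrary, so whether the variables point "left" or "right" can flip between software packages. The grouping itself does not change, only which side is called negative.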
PCA without container 1

Now note that NO2 and the other variables are perfectly aligned with the first and second components, respectively. This means that each variable loads completely on a single component: there is no covariance between NO2 and the other variables, while NH4, PO4, and pH all correlate with one another. PCA is suggesting here that these correlated variables can separate our containers into 2 groups, those above and below 0 on component 2. We could do the same with NO2 on component 1, but its values are almost too continuous to split.
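"Loads completely" can be checked from the loadings matrix. For each variable, find the component with the largest squared loading and compute what share of that variable's squared loadings falls there. A share near 1 means the variable is aligned with that component alone. The loadings below are invented for illustration only.

```python
import numpy as np

def dominant_component(loadings, names):
    """For each variable, return (component number, share), where share is
    the fraction of that variable's squared loadings on its strongest
    component. A share near 1 means it loads almost entirely on one axis."""
    sq = np.asarray(loadings, dtype=float) ** 2
    best = np.argmax(sq, axis=1)
    share = sq[np.arange(len(sq)), best] / sq.sum(axis=1)
    return {name: (int(b) + 1, float(s)) for name, b, s in zip(names, best, share)}

# Hypothetical loadings on components 1 and 2 (rows = variables).
names = ["NO2", "NH4", "PO4", "pH"]
example_loadings = np.array([
    [ 0.95,  0.10],
    [ 0.10, -0.90],
    [ 0.20,  0.85],
    [-0.05,  0.90],
])
for name, (comp, share) in dominant_component(example_loadings, names).items():
    print(f"{name}: component {comp} ({share:.0%} of its squared loading)")
```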

PCA-based groupings

The PCA suggests 2 main groups:

1: 2,3,4,5,6
2: 7,8,9,10

and container 1 is completely different from the rest. This grouping is confirmed by factor analysis.

Clustering

But PCA isn't a great way to make groups. Let's try k-means clustering. This makes a pre-specified number of groups for us by finding the centroids that minimize the within-group sum of squared errors. It looks like we need 4 groups to represent the containers:

1: 2,3,5,6
2: 7,8
3: 4,9,10
4: 1

That first container is a strong outlier.

More clustering

A perhaps more rigorous method of clustering (just a different algorithm) gives:

1: 1
2: 2,3,5,6,7,8
3: 4,9,10

It appears that these make decent groups. Now we can see what happens in models…
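The k-means step from the Clustering slide can be sketched with Lloyd's algorithm: assign each container to its nearest centroid, move each centroid to the mean of its members, and repeat until nothing moves. Because the result depends on the random starting centroids, the sketch keeps the best of several starts. This is a generic NumPy illustration with toy points, not the software used for these notes, which is not named.

```python
import numpy as np

def kmeans(x, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, sse) for the run, out of n_init
    random starts, with the lowest within-cluster sum of squared errors."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        # start from k distinct data points chosen at random
        centroids = x[rng.choice(len(x), size=k, replace=False)]
        for _ in range(max_iter):
            # squared distance from every point to every centroid
            d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # move each centroid to its members' mean (keep it if the cluster emptied)
            new_centroids = np.array([
                x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        sse = ((x - centroids[labels]) ** 2).sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels, best_sse

# Toy 2-D points (e.g. two PCA scores per container); not the real data.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, sse = kmeans(points, k=2)
print(labels, round(sse, 3))
```

Since k must be fixed in advance, one common way to choose it is to run the function for several values of k and look for the point where the SSE stops dropping sharply. The notes do not say how 4 groups was settled on.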
