
A guide to the practical use of Chemometrics with applications for Static SIMS

Joanna Lee and Ian Gilmore, National Physical Laboratory, Teddington, UK


Email: joanna.lee@npl.co.uk Web: http://www.npl.co.uk/nanoanalysis

Crown Copyright 2006

Contents

1. Introduction
2. Linear algebra
3. Factor analysis
   - Principal component analysis
   - Multivariate curve resolution
4. Multivariate regression
   - Multiple linear regression
   - Principal component regression
   - Partial least squares regression
5. Classification
   - Principal component discriminant function analysis
   - Partial least squares discriminant analysis
6. Conclusion

Chemoinformatics

[Diagram: chemical information at the centre of a cycle of creation, design, analysis, dissemination, retrieval, use, visualisation, organisation and management, with applications including Virtual Screening, Modelling, QSARs, 3D Descriptors, Graph Structures and Chemometrics]

A. R. Leach and V. J. Gillet, An Introduction to Chemoinformatics, Kluwer Academic Publishers, 2003

Chemometrics

Chemometrics is the science of relating measurements made on a chemical system or process to the state of the system via application of mathematical or statistical methods.

A. M. C. Davies, Spectroscopy Europe 10 (1998) 28

Chemometrics

Advantages:
- Fast and efficient on modern computers
- Statistically valid
- Removes potential bias
- Uses all the information available

Disadvantages:
- Lots of different methods, procedures and terminologies
- Can be difficult to understand

Data analysis

A SIMS dataset can be analysed with three broad aims:

Identification: What chemicals are on the surface? Where are they located?

Classification: Is there an outlier in the data? Which group does it belong to?

Calibration / Quantification: How is it related to known properties? Can we predict these properties?

Data matrix

[Figure: mass spectra of Samples 1-3, intensity vs mass (1-5 u)]

Each row of the data matrix is the spectrum of one sample; each column is one variable (mass):

X = [  9  32  10   1  21
      18  20  22   4  12
      24  12  30   6   6 ]

X has 3 rows and 5 columns: a 3 x 5 data matrix (samples x variables).

Vector algebra (1)

a = [1 2 4]ᵀ (ax = 1, ay = 2, az = 4),   b = [4 2 2]ᵀ

Transpose (ᵀ): exchanges rows and columns, turning a column vector into a row vector.

Vector inner product (dot product):

a · b = |a||b| cos θ = ax bx + ay by + az bz

aᵀb = [1 2 4][4 2 2]ᵀ = 1×4 + 2×2 + 4×2 = 16

Vector length:

|a| = √(a · a) = √(aᵀa) = √(1² + 2² + 4²)

For these two vectors, θ = 44.5°.
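A minimal numpy sketch of these calculations, using the vectors a and b above:

import numpy as np

a = np.array([1, 2, 4])
b = np.array([4, 2, 2])

dot = a @ b                         # a'b = 1*4 + 2*2 + 4*2 = 16
len_a = np.sqrt(a @ a)              # |a| = sqrt(1^2 + 2^2 + 4^2)
len_b = np.linalg.norm(b)           # |b| = sqrt(24)

# angle from a.b = |a||b| cos(theta)
theta = np.degrees(np.arccos(dot / (len_a * len_b)))
print(dot, f"{theta:.1f}")          # 16 44.5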

Vector algebra (2)

a · b = |a||b| cos θ

Orthogonality: if a · b = 0 the vectors are orthogonal, i.e. at right angles (θ = 90°). Orthogonal vectors are uncorrelated. If they are also of unit length they are orthonormal, i.e. aᵀa = 1 and bᵀb = 1.

Collinearity: if θ = 0° the vectors are collinear.

Correlation: if 0° < θ < 90° the vectors are neither orthogonal nor collinear; they are correlated. The smaller θ is, the larger the correlation between a and b.

Matrix algebra

Matrix addition: A + B = C,   (I×K) + (I×K) = (I×K)

A and B must be the same size; each corresponding element is added (e.g. pure spectra + noise = experimental data):

[2 4 1]   [1 2 0]   [3 6 1]
[3 8 6] + [0 1 2] = [3 9 8]

Matrix multiplication: AB = C,   (I×N)(N×K) = (I×K)

The number of columns of A must equal the number of rows of B. The element in row i and column j of the product AB is row i of A times column j of B:

[1 4]           [1×1 + 4×3   1×2 + 4×2]   [13 10]
[2 2] [1 2]  =  [2×1 + 2×3   2×2 + 2×2] = [ 8  8]
[4 2] [3 2]     [4×1 + 2×3   4×2 + 2×2]   [10 12]
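The two worked examples above, checked with numpy (the @ operator is matrix multiplication):

import numpy as np

# Addition: A and B must be the same size (2 x 3 here)
A = np.array([[2, 4, 1],
              [3, 8, 6]])
B = np.array([[1, 2, 0],
              [0, 1, 2]])
print(A + B)        # [[3 6 1], [3 9 8]]

# Multiplication: columns of C (2) must equal rows of D (2)
C = np.array([[1, 4],
              [2, 2],
              [4, 2]])
D = np.array([[1, 2],
              [3, 2]])
print(C @ D)        # [[13 10], [8 8], [10 12]]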

Matrix inverse

Identity matrix: diagonal of 1s

    [1 0 0]
I = [0 1 0]        AI = A
    [0 0 1]

Matrix inverse for a square matrix: AA⁻¹ = I (only exists if the matrix is full rank).

Matrix pseudoinverse for a rectangular matrix: A⁺ = Aᵀ(AAᵀ)⁻¹, so that AA⁺ = I.

We can now solve the matrix equation AB = C:
- if A is square, B = A⁻¹C
- if A is rectangular, B = A⁺C

Additional properties:

(AB)ᵀ = BᵀAᵀ    (AB)⁻¹ = B⁻¹A⁻¹    (AB)⁺ = B⁺A⁺
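A short numpy check of the pseudoinverse definition above, with an arbitrary 2 x 3 matrix chosen for illustration:

import numpy as np

A = np.array([[1., 0., 2.],
              [0., 1., 1.]])                    # rectangular, full rank

A_plus = A.T @ np.linalg.inv(A @ A.T)           # A+ = A'(AA')^-1
print(np.allclose(A @ A_plus, np.eye(2)))       # AA+ = I -> True
print(np.allclose(A_plus, np.linalg.pinv(A)))   # agrees with numpy's pinv -> True

# Solving AB = C for a rectangular A: B = A+ C
Cm = np.array([[1.], [2.]])
Bs = A_plus @ Cm
print(np.allclose(A @ Bs, Cm))                  # True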

Rank and singularity

Simultaneous equations of any size can be solved with matrices. Rank = the number of unique equations.

1x + 2y = 5
3x + 2y = 7

[1 2][x]   [5]        [x]   [1 2]⁻¹[5]   [1]
[3 2][y] = [7]   so   [y] = [3 2]  [7] = [2]

Rank is the maximum number of rows or columns that are linearly independent, so rank R ≤ min(I, K). To obtain a unique solution, the rank must equal the number of variables.

This matrix is rank 2:

[1 2][x]   [ 5]
[3 2][y] = [ 7]        Row 3 is simply a multiple of row 1, so rank = 2
[2 4]      [10]

The matrix inverse only exists for a full-rank matrix: if a matrix cannot be inverted, it is singular.

Matrix projections

a = 2x + 3y, i.e. a = [2 3][x y]ᵀ

To write a in terms of new axes x* and y* (the original axes rotated by 30°), we find its projections onto the new axes:

[x]   [0.87  −0.5][x*]
[y] = [0.5   0.87][y*]

a = [2 3][x y]ᵀ = [3.2 1.6][x* y*]ᵀ
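The same projection, computed with numpy; R holds the new axes x* and y* (a 30 degree rotation) as its columns:

import numpy as np

a = np.array([2., 3.])                     # a = 2x + 3y

th = np.radians(30)
R = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])  # columns are x*, y* in old coordinates

a_star = R.T @ a                           # projections of a onto the new axes
print(a_star.round(1))                     # [3.2 1.6]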

Data matrix

[Figure: mass spectra of Samples 1-3 and of pure Chemicals 1-2, intensity vs mass (1-5 u)]

Without noise, the data matrix is the product of the sample compositions and the pure chemical spectra:

Data matrix            Sample composition      Chemical spectra
(samples × masses)     (samples × chemicals)   (chemicals × masses)

[ 9 32 10  1 21]       [5 1]
[18 20 22  4 12]   =   [2 4]   [1 6 1 0 4]
[24 12 30  6  6]       [0 6]   [4 2 5 1 1]
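The factorisation above, reproduced in numpy with the numbers from the slide:

import numpy as np

comp = np.array([[5, 1],              # sample composition (3 samples x 2 chemicals)
                 [2, 4],
                 [0, 6]])
spectra = np.array([[1, 6, 1, 0, 4],  # chemical spectra (2 chemicals x 5 masses)
                    [4, 2, 5, 1, 1]])

X = comp @ spectra
print(X)                              # the 3 x 5 data matrix of slide 7
print(np.linalg.matrix_rank(X))       # 2: two components, no noise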

Data matrix

1. Each spectrum can be represented by a vector.
2. Instead of x, y, z in real space, the axes are mass 1, mass 2, mass 3, etc. in variable space (also called data space).
3. Without noise, the rank of the dataset = the number of unique components (here, 2).
4. With random, uncorrelated noise, the rank of the dataset = the number of samples or the number of variables, whichever is smaller.

Data analysis

Identification: What chemicals are on the surface? Where are they located?

Classification: Is there an outlier in the data? Which group does it belong to?

Calibration / Quantification: How is it related to known properties? Can we predict these properties?

Terminology

To clarify existing terminology and emphasise the relationship between the different chemometrics techniques, the following terminology is adopted in this tutorial:

Terms used here | PCA                                          | MCR                      | PLS
factors P       | loadings, eigenvectors, principal components | component spectra        | latent vectors, latent variables
projections T   | scores                                       | component concentrations | scores

Principal component analysis (PCA)

PCA is a technique for reducing matrices of data to their lowest dimensionality by describing them using a small number of factors.

- Factors are linear combinations of the original variables, i.e. masses
- Equivalent to a rotation in data space: the factors are new axes
- Data are described by their projections onto the factors
- PCA is a factor analysis technique

[Figure: data points in the x-y plane with PCA Factor 1 and PCA Factor 2 as rotated axes]

Principal component analysis (PCA)

PCA follows the factor analysis equation:

X = TPᵀ + E,   (I×K) = (I×N)(N×K) + (I×K)

where X is the data matrix, T the projections of the data onto the factors, P the factors and E the experimental noise. Equivalently,

X = TPᵀ + E = Σₙ₌₁ᴺ tₙpₙᵀ + E,   each tₙpₙᵀ of size (I×1)(1×K)

We decompose the data X (rank R) into N simpler matrices of rank 1, where N < R. Each simple matrix is the outer product of two vectors, tₙ and pₙ.

PCA outline

Raw Data → (data selection and preprocessing) → Data Matrix → (matrix multiplication) → Covariance Matrix → (decomposition) → Eigenvectors and Eigenvalues → (sort by eigenvalues) → PCA Factors and Projections × R → (factor compression) → PCA Factors and Projections × N → (reproduction) → Reproduced Data Matrix

After Malinowski, Factor Analysis in Chemistry, John Wiley & Sons (2002)


PCA decomposition

The covariance matrix contains information about the variances of the data points within the dataset, and is defined as

Z = XᵀX,   (K×K) = (K×I)(I×K)

In PCA, Z is decomposed into a set of eigenvectors p and associated eigenvalues λ, such that

Zp = λp,   (K×K)(K×1) = (K×1)

Eigenvalues and eigenvectors have some special properties:
- Eigenvalues are positive or zero
- The number of non-zero eigenvalues = the rank of the data, R
- Eigenvectors are orthonormal
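A minimal numpy sketch of this decomposition on a small random data matrix (illustrative data only):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 5))                    # 20 samples x 5 masses
Xc = X - X.mean(axis=0)                    # mean centering (see slide on mean centering)

Z = Xc.T @ Xc                              # covariance matrix, (K x K)
evals, evecs = np.linalg.eigh(Z)           # solves Zp = lambda p
order = np.argsort(evals)[::-1]            # sort by eigenvalue, largest first
evals, P = evals[order], evecs[:, order]

T = Xc @ P                                 # projections (scores) on the factors
print((evals >= -1e-12).all())             # eigenvalues positive or zero: True
print(np.allclose(P.T @ P, np.eye(5)))     # eigenvectors orthonormal: True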


PCA factors

Because Z is the covariance matrix, the eigenvectors of Z are special directions in data space that are optimal in describing the variance of the data. The eigenvalues are the amount of variance described by their associated eigenvectors.

These eigenvectors are the factors PCA obtains for the factor analysis equation. Sorted by their eigenvalues, the PCA factors successively capture the largest amount of variance (spread) within the dataset:

X̂ = TPᵀ = Σₙ₌₁ᴺ tₙpₙᵀ

where tₙ is the projection of the data onto the nth factor (the scores) and pₙ is the nth factor (the loadings). The projections of the data onto the factors (often called scores) are orthogonal.

PCA graphical representation

[Figure: correlated data cloud in the x-y plane enclosed by an ellipse; PCA Factor 1 lies along the major axis, PCA Factor 2 along the minor axis]

- The first factor lies along the major axis of the ellipse and accounts for most of the variation
- Instead of describing the data using the correlated variables x and y, we transform them onto a new basis (the factors), which are uncorrelated
- By removing higher factors (variances due to noise) we can reduce the dimensionality of the data: factor compression


Number of factors

Example: a dataset of 8 spectra made by mixing 3 pure compound spectra.

[Figure: sorted eigenvalue plots for (a) no noise, where 3 eigenvalues stand out above numerical zero, and (b) Poisson noise, max 5000 counts, where the plot levels off after 3 factors]

Ways to choose the number of factors N:

1. Prior knowledge of the system
2. Scree test: the eigenvalue plot levels off in a linearly decreasing manner after 3 factors
3. Percentage of variance captured by the Nth eigenvector:
   (Nth eigenvalue / sum of all eigenvalues) × 100%
4. Percentage of total variance captured by N eigenvectors:
   (sum of eigenvalues up to N / sum of all eigenvalues) × 100%
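Criteria 3 and 4 are one line each in numpy, given sorted eigenvalues (hypothetical values below):

import numpy as np

evals = np.array([1.2e6, 3.4e5, 8.1e4, 2.1e2, 1.9e2, 1.7e2, 1.5e2, 1.4e2])

pct_each = 100 * evals / evals.sum()   # variance captured by the Nth eigenvector
pct_total = np.cumsum(pct_each)        # total variance captured by N eigenvectors
print(pct_each.round(2))
print(pct_total.round(2))              # levels off after 3 factors (scree test)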

Data reproduction

X = TPᵀ + E = Σₙ₌₁ᴺ tₙpₙᵀ + E

X̂ = X − E = TPᵀ        E = X − X̂ = X − Σₙ₌₁ᴺ tₙpₙᵀ

X̂ is the reproduced data matrix:
- reproduced from the N selected factors and projections
- noise filtered by removal of the higher factors that describe noise variations
- useful for MCR

E is the matrix of residuals:
- should contain noise only
- useful for judging the quality of the PCA model
- may show up unexpected features!
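In numpy, reproduction and residuals follow directly from a truncated decomposition (here via the SVD, which is equivalent to the covariance eigendecomposition):

import numpy as np

rng = np.random.default_rng(1)
X = rng.random((20, 5))
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T, P = U * s, Vt.T                   # scores and loadings

N = 3                                # number of factors kept
X_hat = T[:, :N] @ P[:, :N].T        # reproduced data matrix
E = Xc - X_hat                       # residual matrix: ideally noise only
print(np.linalg.norm(E) / np.linalg.norm(Xc))   # relative size of the residuals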


Data preprocessing

- Enhances PCA by bringing out the important variance in the dataset
- Makes assumptions about the nature of the variance in the data
- Can distort interpretation and quantification
- Includes: mass binning, peak selection, mean centering, normalisation, variance scaling, Poisson scaling, logarithmic transformation

More details in the following slides.

Mean centering

X̃_{i,k} = X_{i,k} − mean(X_{:,k})

where X_{i,k} is the raw intensity of sample i at mass k and mean(X_{:,k}) is the mean intensity of mass k.

- Subtract the mean spectrum from each sample
- PCA then describes variations from the mean

[Figure: raw data: the 1st factor goes from the origin to the centre of gravity of the data; after mean centering: the 1st factor passes through the data mean and accounts for the highest variance]

Normalisation

X̃_{i,k} = X_{i,k} / sum(X_{i,:})

where sum(X_{i,:}) is the total intensity of sample i.

- Divide each spectrum by its total ion intensity
- Reduces the effects of topography, sample charging and drift in the primary ion current
- Assumes chemical variances can be described by relative changes in ion intensities
- Reduces the rank of the data by 1

S. N. Deming, J. A. Palasota, J. M. Nocerino, J. Chemomet. 7 (1993) 393

Variance scaling

X̃_{i,k} = X_{i,k} / var(X_{:,k})

where var(X_{:,k}) is the variance of mass k across the dataset.

- Divide each variable by its variance in the dataset
- Equalises the importance of each variable (i.e. mass)
- Problematic for weak peaks, so usually used with peak selection
- Called auto scaling when combined with mean centering

[Figure: for each variable (mass, in a SIMS spectrum), the mean and variance of the raw data after mean centering, variance scaling and auto scaling]

P. Geladi and B. Kowalski, Partial Least-Squares Regression: A Tutorial, Analytica Chimica Acta 185 (1986) 1

Poisson scaling

X̃_{i,k} = X_{i,k} / (√mean(X_{i,:}) · √mean(X_{:,k}))

where mean(X_{i,:}) is the mean intensity of sample i and mean(X_{:,k}) is the mean intensity of mass k.

- SIMS data are dominated by Poisson counting noise
- Equalises the noise variance of each data point
- Provides greater noise rejection

[Figure: eigenvalue plots with no preprocessing (number of factors unclear) and with Poisson scaling (4 factors needed)]

M. R. Keenan, P. G. Kotula, Surf. Interface Anal. 36 (2004) 203

Data preprocessing summary

Method of preprocessing | Effect of preprocessing
No preprocessing        | First factor goes from the origin to the mean of the data
Mean centering          | All factors describe variations from the mean
Normalisation           | Equalises the total ion yield of each sample and emphasises relative changes in ion intensities
Variance scaling        | Equalises the variance of every peak regardless of intensity; best with peak selection
Poisson scaling         | Equalises the noise variance of each data point; provides greater noise rejection
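Each of these options is a one-line transformation; a sketch in numpy (the Poisson scaling follows Keenan & Kotula, dividing by the square roots of the row and column means):

import numpy as np

def mean_center(X):
    # subtract the mean spectrum from each sample
    return X - X.mean(axis=0)

def normalise(X):
    # divide each spectrum by its total ion intensity
    return X / X.sum(axis=1, keepdims=True)

def variance_scale(X):
    # divide each variable (mass) by its variance across the dataset
    return X / X.var(axis=0, keepdims=True)

def poisson_scale(X):
    # divide by sqrt(sample mean) * sqrt(mass mean)
    g = np.sqrt(X.mean(axis=1, keepdims=True))
    h = np.sqrt(X.mean(axis=0, keepdims=True))
    return X / (g * h)

X = np.array([[ 9., 32., 10.,  1., 21.],
              [18., 20., 22.,  4., 12.],
              [24., 12., 30.,  6.,  6.]])
print(mean_center(X).mean(axis=0).round(12))  # every mass now has zero mean
print(normalise(X).sum(axis=1))               # every spectrum now sums to 1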

PCA example (1)

- Three protein compositions (100% fibrinogen, 50% fibrinogen / 50% albumin, 100% albumin) adsorbed onto poly(DTB suberate)
- The first factor (62% of the variance) shows the relative abundance of the amino acid peaks of the two proteins
- Projection onto the first factor separates the samples based on protein composition

[Figure: PCA Factor 1 (62%) loadings and projections, with albumin-rich samples at one extreme and fibrinogen-rich samples at the other]

D. J. Graham et al., Appl. Surf. Sci. 252 (2006) 6860

PCA example (2)

- 16 different single protein films adsorbed on mica
- Excellent classification of the proteins using only 2 factors
- Factors are consistent with the total amino acid composition of the various proteins
- 95% confidence limits provide a means for identification / classification

[Figure: PCA factors, and projections onto factors 1 (53%) and 2 (19%) with 95% confidence ellipses around each protein]

M. Wagner & D. G. Castner, Langmuir 17 (2001) 4649

PCA image analysis

- The datacube contains a raster of I × J pixels and K mass peaks
- The datacube is rearranged into a 2D data matrix with dimensions [(I·J) × K] prior to PCA: unfolding
- PCA results are folded back into projection images prior to interpretation

[Figure: an I × J × K datacube unfolded pixel by pixel into an (I·J) × K matrix]
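Unfolding and folding are plain reshapes; a sketch with toy dimensions:

import numpy as np

I_, J, K = 4, 5, 3                      # pixels: 4 rows x 5 columns, 3 mass peaks
cube = np.arange(I_ * J * K, dtype=float).reshape(I_, J, K)

X = cube.reshape(I_ * J, K)             # unfold: one row per pixel, (I*J) x K
# ... run PCA on X here ...
scores = X[:, 0]                        # stand-in for one column of projections
image = scores.reshape(I_, J)           # fold back into a projection image
print(image.shape)                      # (4, 5)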

PCA image example (1)

- Immiscible PC / PVC polymer blend
- 42 counts per pixel on average

[Figure: total ion image, and sorted eigenvalue plots for mean centering, normalisation and Poisson scaling]

With Poisson scaling only 2 factors are needed: the dimensionality of the image is reduced by a factor of 20!

J. Lee, I. S. Gilmore, to be published

PCA image example (1)

Poisson scaled PCA results after mean centering:

[Figure: projection images and factor spectra (mass 10-40 u) for the first two factors]

- The 1st factor distinguishes the PVC and PC phases
- The 2nd factor shows detector saturation for the intense ³⁵Cl peak

J. Lee, I. S. Gilmore, to be published

PCA image example (2)

Hair fibre with a multi-component pretreatment (image courtesy of Dr Ian Fletcher, ICI Measurement Sciences Group)

[Figure: total ion image (50 μm scale), total spectrum, and PCA factor images and spectra]

- PCA factors are linear combinations of the chemical components and optimally describe variance
- PCA results can be difficult to interpret: use Varimax rotation!

H. F. Kaiser, Psychometrika 23 (1958) 187

PCA image example (2)

[Figure: Varimax rotated factor images and spectra for Factors 1-5]

After Varimax rotation, distributions and characteristic peaks are obtained, simplifying the interpretation of a huge dataset.

Multivariate curve resolution (MCR)

PCA factors are directions that describe variance; the positive and negative peaks in the factors can be difficult to interpret.

We want to resolve the original chemical spectra, i.e. reverse the process of slide 14:

Data matrix (samples × masses) = Sample composition (samples × chemicals) × Chemical spectra (chemicals × masses)

Use multivariate curve resolution (also called self-modelling mixture analysis).

Multivariate curve resolution (MCR)

X = TPᵀ + E,   (I×K) = (I×N)(N×K) + (I×K)

with X the data matrix, T the projections of the data onto the factors, P the factors and E the experimental noise.

MCR is designed for the recovery of chemical spectra and contributions from a multicomponent mixture, when little or no prior information about the composition is available. MCR uses an iterative least-squares algorithm to extract solutions, while applying suitable constraints, e.g. non-negativity.

[Figure: data in the x-y plane with MCR Factor 1 and MCR Factor 2 pointing along the two chemical directions]

Multivariate curve resolution (MCR)

Six steps to MCR results:

1. Determine the number of factors N via an eigenvalue plot
2. Obtain the PCA reproduced data matrix for N factors
3. Obtain initial estimates of the spectra (factors) or contributions (projections):
   - random initialisation
   - PCA factors
   - Varimax rotated PCA factors
   - a pure variable detection algorithm, e.g. SIMPLISMA
4. Constraints: non-negativity, equality
5. Convergence criterion
6. Alternating least squares (ALS) optimisation

Outline of MCR

Raw Data → Data Matrix → (PCA) → Reproduced Data Matrix → MCR-ALS → MCR Projections T and MCR Factors P

MCR-ALS takes as inputs the number of factors, the initial estimates and the constraints.

MCR-ALS algorithm

Start with the PCA reproduced data matrix X̂ = TPᵀ and an initial estimate of the factors P. Then:

(1) Find an estimate of T using P, applying constraints:   T = X̂(Pᵀ)⁺
(2) Find a new estimate of P using T, applying constraints:   Pᵀ = T⁺X̂
(3) Compute the MCR reproduced matrix:   X̂_MCR = TPᵀ
(4) Compare results and check convergence:   E = X̂ − X̂_MCR

Steps (1)-(4) are repeated until the MCR factors P and projections T are able to reconstruct the reproduced data matrix X̂ within the acceptable error specified in the convergence criterion.

(Pseudoinverse of a rectangular matrix: A⁺ = Aᵀ(AAᵀ)⁻¹)
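A minimal MCR-ALS sketch in numpy, with non-negativity imposed by simple clipping (production codes use proper non-negative least squares); the mixture from slide 14 is used as test data:

import numpy as np

def mcr_als(X, P0, n_iter=500, tol=1e-12):
    P = P0.copy()                                      # initial spectra estimate, N x K
    err_old = np.inf
    for _ in range(n_iter):
        T = np.clip(X @ np.linalg.pinv(P), 0, None)    # (1) contributions from P
        P = np.clip(np.linalg.pinv(T) @ X, 0, None)    # (2) spectra from T
        err = np.linalg.norm(X - T @ P)                # (3) reproduce, (4) check
        if abs(err_old - err) < tol:
            break
        err_old = err
    return T, P

comp = np.array([[5., 1.], [2., 4.], [0., 6.]])
spectra = np.array([[1., 6., 1., 0., 4.], [4., 2., 5., 1., 1.]])
X = comp @ spectra
T, P = mcr_als(X, P0=np.abs(np.random.default_rng(0).random((2, 5))))
print(np.linalg.norm(X - T @ P))   # small residual; T, P recovered up to scaling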

Rotational ambiguity

- MCR can suffer from rotational ambiguity
- The accuracy of the resolved spectra depends on selectivity, i.e. the existence of pixels or samples where there is a contribution from only one component
- Good initial estimates are essential
- Peaks of the intense components may appear in the spectra resolved for weak components

[Figure: data in the x-y plane where several different pairs of MCR factors ('Chemical 1?', 'Chemical 2?') fit the data equally well]

MCR image example (1)

Hair fibre with a multi-component pretreatment (image courtesy of Dr Ian Fletcher, ICI Measurement Sciences Group)

[Figure: total ion image (50 μm scale), total spectrum, and MCR projection images 1-5]

MCR image example (1)

[Figure: MCR factor spectra 1-5]

Distributions and characteristic peaks are obtained, in complete agreement with PCA and with manual analysis by an expert.

MCR image example (2)

- Three images are each assigned a SIMS spectrum (PBC, PC, PVT) and combined to form a multivariate image dataset
- Poisson noise is added to the image (on average ~50 counts per pixel)
- Projections onto the PCA factors show combinations of the original images

[Figure: PCA projection images 1-3]

MCR image example (2)

MCR resolves the original images and spectra unambiguously!

[Figure: MCR projection images 1-3]

Data analysis

Identification: What chemicals are on the surface? Where are they located?

Classification: Is there an outlier in the data? Which group does it belong to?

Calibration / Quantification: How is it related to known properties? Can we predict these properties?

Regression analysis

[Figure: mass spectra of Samples 1-3 alongside measured properties, e.g. molecular weight and density from an XPS measurement]

We can build a model to predict the properties of materials from their SIMS spectra:

y = f(x) + e
y = b₁x₁ + b₂x₂ + b₃x₃ + ... + b_m x_m + e

where y is the dependent variable (the measured property), b_m are the regression coefficients and x_m are the independent variables (the intensity at mass m).

Multiple linear regression (MLR)

I = no. of samples, K = no. of mass units, M = no. of dependent variables

Extending to I samples and M dependent variables:

Y = XB + E,   (I×M) = (I×K)(K×M) + (I×M)

where Y holds the dependent variables, X is the SIMS data matrix, B the regression matrix and E the error. The least squares (MLR) solution is

B = X⁺Y   or   B = (XᵀX)⁻¹XᵀY

where X⁺ = (XᵀX)⁻¹Xᵀ is the pseudoinverse of X. XᵀX is the covariance matrix of X. In SIMS this is likely to be close to singular, and a well defined inverse matrix cannot be found. This is the problem of collinearity, caused by linearly dependent rows or columns in the matrix.
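MLR in numpy; B_true is an arbitrary regression matrix used only to simulate data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 4))                        # 10 samples x 4 masses
B_true = np.array([[1.0], [0.0], [2.0], [-1.0]])
Y = X @ B_true + 0.01 * rng.standard_normal((10, 1))

B = np.linalg.pinv(X) @ Y                      # B = X+ Y
B2, *_ = np.linalg.lstsq(X, Y, rcond=None)     # same solution, computed stably
print(B.round(2).ravel())                      # close to B_true
print(np.allclose(B, B2))                      # True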

MLR - graphical representation

Y = XB + E

We relate Y to the projection of X onto B. MLR finds the least squares solution, i.e. the best R² correlation between Y and the projections of the data onto the regression vector, XB.

With a large number of variables (e.g. masses) there is a risk of overfitting!

A. M. C. Davies, T. Fearn, Spectroscopy Europe 17 (2005) 28

Principal component regression (PCR)

I = no. of samples, M = no. of dependent variables, N = no. of PCA factors

- PCA reduces the dimensionality of the data and reduces the effect of noise
- The PCA projection matrix gives the coordinates of the data points in the reduced factor space
- Hence we can use the PCA projection matrix T in the linear regression:

Y = TB + E,   (I×M) = (I×N)(N×M) + (I×M)

B = T⁺Y = (TᵀT)⁻¹TᵀY

These are now guaranteed to be invertible, since the columns of the PCA projection matrix (the score vectors) are orthogonal.
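A PCR sketch in numpy: PCA of X (via the SVD), then least squares on the first N scores (simulated data for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((15, 6))
y = X @ rng.random((6, 1)) + 0.01 * rng.standard_normal((15, 1))

Xc, yc = X - X.mean(axis=0), y - y.mean()       # mean centre both blocks

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
N = 3
T = U[:, :N] * s[:N]                            # PCA projections (scores)

B = np.linalg.inv(T.T @ T) @ T.T @ yc           # T'T is diagonal: always invertible
y_hat = T @ B + y.mean()
print(round(np.corrcoef(y_hat.ravel(), y.ravel())[0, 1], 3))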

PCR graphical representation

Y = TB + E

where T is the PCA projection matrix, B the regression matrix and E the error.

One-factor PCR: PCR finds the correlation between Y and the projection of the data onto the first PCA factor, T. The regression vector is a multiple of the PCA factor P.

For more than one factor, PCR finds the linear combinations of the projections T that are best for predicting Y, i.e. the regression vectors are linear combinations of the PCA factors P.

A. M. C. Davies, T. Fearn, Spectroscopy Europe 17 (2005) 28

Partial least squares regression (PLS)

X = SIMS data matrix, Y = dependent variables (e.g. XPS)

The problem with PCR:
- The projections T used in the regression are computed to model X only
- By choosing directions that maximise the variance in the data X, we merely hope to include information which relates the original variables to Y
- The first few PCA factors of X may contain only matrix effects and may have no relation to the quantities Y which we want to predict

Introducing PLS:
- PLS extracts projections that are common to both X and Y
- This is done by simultaneous decomposition of X and Y using an iterative algorithm (NIPALS)
- It removes redundant information from the regression, i.e. factors describing large amounts of variance in X that do not correlate with Y
- This gives a more viable, robust solution using a smaller number of factors

PLS (NIPALS) algorithm

I = no. of samples, M = no. of dependent variables, N = no. of factors

(1) PCA decomposition: for a single matrix X, NIPALS calculates t₁ and p₁ alternately until convergence:

X ≈ t pᵀ,   (I×K) ≈ (I×1)(1×K)

The next set of factors, t₂ and p₂, are calculated by fitting the residuals (the data not explained by t₁p₁ᵀ).

(2) PLS decomposition: for simultaneous decomposition of X and Y, PLS finds a mutual set of projections common to X and Y, so that t_X = t_Y:

X ≈ t pᵀ, (I×K);   Y ≈ t qᵀ, (I×M), with q of size (1×M)

From E. Malinowski, Factor Analysis in Chemistry, John Wiley and Sons (2002)

PLS formulation

We can now write

X = TPᵀ + E
Y = TQᵀ + F

with T the projections and E, F the errors. The regression vector follows from

Y = XB + E,   B = X⁺Y = (Pᵀ)⁺Qᵀ = WQᵀ

where W is the weights matrix.

- T are the PLS projections used to predict Y from X (often referred to as scores)
- W is the weights matrix and reflects the covariance structure between X and Y
- P and Q are not orthogonal matrices, due to the constraint of finding common projections T; they are sometimes called x-loadings and y-loadings respectively
- In the literature, 'latent variable' refers to the set of quantities t, p and q associated with each PLS factor
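A compact NIPALS PLS1 sketch (a single y column) for illustration; after the factors are extracted, the regression vector is assembled in the standard form B = W(PᵀW)⁻¹q:

import numpy as np

def pls1_nipals(X, y, n_factors):
    X, y = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(n_factors):
        w = X.T @ y                      # weight: covariance of X with y
        w /= np.linalg.norm(w)
        t = X @ w                        # common projection (score)
        p = X.T @ t / (t @ t)            # x-loading
        q = (y @ t) / (t @ t)            # y-loading
        X -= np.outer(t, p)              # deflate X ...
        y -= q * t                       # ... and y; next factor fits residuals
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.inv(P.T @ W) @ Q    # regression vector

rng = np.random.default_rng(0)
X = rng.random((20, 8)); X -= X.mean(axis=0)
y = X @ rng.random(8);   y -= y.mean()
B = pls1_nipals(X, y, n_factors=3)
print(round(np.corrcoef(X @ B, y)[0, 1], 3))   # close to 1 on this clean example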

PLS example (1)

- SIMS spectra of thin films of Irganox were compared with their thicknesses measured with XPS
- The PLS model is able to predict thicknesses for t < 6 nm
- The PLS regression vector shows us the SIMS peaks most correlated with thickness (e.g. 59, 231, 277 and 1176 u)

[Figure: regression vector for Y vs mass (0-1200 u), and thickness predicted by SIMS vs thickness measured by XPS (0-6 nm)]

J. Lee, I. S. Gilmore, to be published

PLS example (2)

- Surfaces of plasma deposited films were characterised by SIMS; this was then related to bovine arterial endothelial cell (BAEC) growth (cells counted optically)
- Allowed the surface treatment to be characterised
- Reduces the amount of biological cell counting experiments required

[Figure: PLS prediction from ToF-SIMS data vs measured cell growth]

A. Chilkoti, A. E. Scheimer, V. H. Perez Luna and B. D. Ratner, Anal. Chem. 67 (1995) 2883

PLS validation

- PLS can be used to build predictive models (calibration)
- Validation is needed to guard against over-fitting
- Without enough data for a validation set, cross validation can be useful

[Figure: dependent variable Y vs independent variable X for a good predictive model (smooth fit) and for an overfitted model (the fit chases every data point)]

PLS validation

Leave-one-out cross validation is the most popular:
- Calculate the PLS model excluding sample i
- Predict sample i
- Repeat for all samples
- Calculate the root mean square error of prediction

RMSEC (Root Mean Square Error of Calibration) goes down with an increasing number of factors. To decide the optimal number of factors, use the minimum of RMSECV (Root Mean Square Error of Cross Validation) or PRESS (Prediction Residual Sum of Squares).

[Figure: RMSEC decreases monotonically with the number of PLS factors, while RMSECV passes through a minimum]
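A sketch of the leave-one-out loop, here wrapped around a plain least-squares fit (swap in a PLS model in practice):

import numpy as np

def rmsecv_loo(X, y, fit, predict):
    n = len(y)
    errs = np.empty(n)
    for i in range(n):                                 # exclude sample i ...
        mask = np.arange(n) != i
        model = fit(X[mask], y[mask])
        errs[i] = predict(model, X[i:i+1])[0] - y[i]   # ... then predict it
    return np.sqrt(np.mean(errs ** 2))                 # RMSECV

rng = np.random.default_rng(0)
X = rng.random((12, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.standard_normal(12)

fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda b, Xnew: Xnew @ b
print(round(rmsecv_loo(X, y, fit, predict), 3))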

PLS validation

- If the dataset is large enough, split it into calibration and validation sets
- Rule of thumb: 2/3 calibration set, 1/3 validation set
- Validation data should be statistically independent from the calibration data, e.g. NOT repeat spectra of the same sample
- An independent validation set is essential if we want to use the model to predict new samples

[Figure: a population is sampled; the sample is split into calibration and validation sets, and the validated model is used for prediction on the next set of samples]

Data analysis

Identification: What chemicals are on the surface? Where are they located?

Classification: Is there an outlier in the data? Which group does it belong to?

Calibration / Quantification: How is it related to known properties? Can we predict these properties?

PCA classification (1)

- 16 different single protein films adsorbed on mica (as in PCA example (2) above)
- Excellent classification of the proteins using only 2 factors
- Factors are consistent with the total amino acid composition of the various proteins
- 95% confidence limits provide a means for identification / classification

[Figure: projections onto PCA factors 1 (53%) and 2 (19%) with 95% confidence ellipses around each protein]

M. Wagner & D. G. Castner, Langmuir 17 (2001) 4649

PCA classification (2)

- Octadecanethiol self-assembled monolayers on gold substrates, exposed to different allylamine plasma deposition times
- Projections of the data onto the PCA factors indicate four clusters of objects
- Magnification of the framed cluster reveals further clustering
- Outliers can also be located

M. Von Gradowski et al., Surf. Interface Anal. 36 (2004) 1114

PC-DFA

PC-DFA = Principal Component Discriminant Function Analysis

Discriminant functions maximise Fisher's ratio between groups:

Fisher's ratio = (mean₁ − mean₂)² / (var₁ + var₂)

Used, for example, to distinguish strains of bacteria.

J. S. Fletcher et al., Appl. Surf. Sci. 252 (2006) 6869
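Fisher's ratio is a one-liner; hypothetical projections of two strains onto a single factor:

import numpy as np

def fishers_ratio(a, b):
    # (mean1 - mean2)^2 / (var1 + var2)
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

rng = np.random.default_rng(0)
strain1 = rng.normal(0.0, 1.0, 50)
strain2 = rng.normal(3.0, 1.0, 50)
print(round(fishers_ratio(strain1, strain2), 2))   # well separated: large ratio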

PLS-DA

PLS-DA = Partial Least Squares Discriminant Analysis

- We put the data in X and the categorical information (class membership) in Y
- PLS finds factors that explain the variance in the data X while taking into account the classification in Y
- The regression vector is used for future predictions

[Figure: PLS-DA projections 1 and 2 separating the classes]

Other methods

- PC-DFA and PLS-DA are both supervised methods: prior knowledge about the groups is required
- There also exist unsupervised clustering methods
- All of these (and much more) belong to the wider field of chemoinformatics

[Diagram: the chemical information lifecycle from the introduction: creation, design, analysis, dissemination, retrieval, use, visualisation, organisation, management]

Conclusion

In this tutorial we looked at:
- Identification using PCA and MCR
- Quantification using MLR, PCR and PLS
- Classification using PC-DFA and PLS-DA
- The importance of validation for predictive models
- Data preprocessing techniques and their effects
- Matrix and vector algebra
- A new set of terminologies:

Terms used here | PCA                                          | MCR                      | PLS
factors P       | loadings, eigenvectors, principal components | component spectra        | latent vectors, latent variables
projections T   | scores                                       | component concentrations | scores

Bibliography

General
- A. R. Leach, V. J. Gillet, An Introduction to Chemoinformatics, Kluwer Academic Publishers (2003)
- S. Wold, Chemometrics; what do we mean with it, and what do we want from it?, Chemom. Intell. Lab. Syst. 30 (1995) 109
- E. R. Malinowski, Factor Analysis in Chemistry, John Wiley and Sons (2002)
- P. Geladi, H. Grahn, Multivariate Image Analysis, John Wiley and Sons (1996)
- D. J. Graham, NESAC/BIO ToF-SIMS MVA web resource, http://nb.engr.washington.edu/nb-sims-resource/

PCA
- D. J. Graham, M. S. Wagner, D. G. Castner, Information from complexity: challenges of ToF-SIMS data interpretation, Appl. Surf. Sci. 252 (2006) 6860
- M. R. Keenan, P. G. Kotula, Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images, Surf. Interface Anal. 36 (2004) 203

MCR
- N. B. Gallagher, J. M. Shaver, E. B. Martin, J. Morris, B. M. Wise, W. Windig, Curve resolution for multivariate images with applications to TOF-SIMS and Raman, Chemom. Intell. Lab. Syst. 73 (2004) 105
- J. A. Ohlhausen, M. R. Keenan, P. G. Kotula, D. E. Peebles, Multivariate statistical analysis of time-of-flight secondary ion mass spectrometry using AXSIA, Appl. Surf. Sci. 231-232 (2004) 230
- R. Tauler, A. de Juan, MCR-ALS Graphic User Friendly Interface, http://www.ub.es/gesq/mcr/mcr.htm

PLS
- P. Geladi, B. Kowalski, Partial Least-Squares Regression: A Tutorial, Analytica Chimica Acta 185 (1986) 1
- A. M. C. Davies, T. Fearn, Back to basics: observing PLS, Spectroscopy Europe 17 (2005) 28

Acknowledgements

This work is supported by the UK Department of Trade and Industry's Valid Analytical Measurement (VAM) Programme and co-funded by the UK MNT Network.

We would like to thank Dr Ian Fletcher (ICI Measurement Sciences Group) for images and expert analysis, and Dr Martin Seah (NPL) for helpful comments.

For further information on Surface and Nanoanalysis at NPL, please visit http://www.npl.co.uk/nanoanalysis