
Data Treatment

Chemometrics
Defined as the application of mathematical, statistical, graphical, or symbolic methods to maximize the chemical information that can be extracted from data

Selection of Suitable Microwave Digestion Method


(From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279)

  

 

- PCA (Principal Component Analysis)
- SIMCA (Soft Independent Modeling of Class Analogies)
- PROMETHEE (Preference Ranking Organization METHod for Enrichment Evaluation)
- GAIA (Geometrical Analysis for Interactive Aid)
- Fuzzy Clustering

Principal Component Analysis (PCA)




- A summarization and data reduction technique
- Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as components

PCA
- Based on the derivation of linear combinations of the original variables to produce principal components, which are characterized by scores and loadings:

$PC_{jk} = a_{j1} x_{k1} + a_{j2} x_{k2} + \dots + a_{jn} x_{kn}$

where $PC_{jk}$ is the score of object k on component j, $a_{ji}$ is the loading of variable i on component j, $x_{ki}$ is the measured value of variable i on object k, and n is the total number of original variables.

- SCORES: projections of the objects onto a particular component
- LOADINGS: reflect the contribution of each variable to a particular component
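As a minimal sketch of the score equation, the Python below computes each object's score on a component as the loading-weighted sum of its measured values. The data and loadings are hypothetical illustration values, not results from Kokot et al.; in practice the loadings would come from an eigen-decomposition of the (usually mean-centred) data matrix.

```python
def pc_scores(data, loadings):
    """Return scores[k][j] = sum over i of loadings[j][i] * data[k][i]."""
    return [
        [sum(a * x for a, x in zip(component, row)) for component in loadings]
        for row in data
    ]

# Hypothetical mean-centred measurements: 3 objects x 2 variables.
data = [[1.0, 2.0], [0.0, -1.0], [-1.0, -1.0]]
# Hypothetical loadings: 2 components x 2 variables (orthonormal rows).
loadings = [[0.6, 0.8], [-0.8, 0.6]]

scores = pc_scores(data, loadings)
for k, row in enumerate(scores):
    print(f"object {k}: PC1 = {row[0]:.2f}, PC2 = {row[1]:.2f}")
```

Plotting the two score columns against each other gives the score plot; plotting the loading rows on the same axes gives a biplot.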

PCA
BIPLOT displays scaled scores and loadings in a PC plane


- The 1st component accounts for the largest amount of variation
- Subsequent components account for decreasing amounts of the data variance
From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

Extracted Information:

The objects (methods of digestion) appear to cluster in at least two groups based on the six metal variables (Cu, Pb, Ni, Cr, Co, and Zn).
- Group I: methods 4Cb, 7Ab, and HPb, which have no hydrofluoric acid (HF) in their acid mixtures.
- Group II: methods that contain HF in their acid digest.
The presence of HF plays a major role in the discrimination of methods into groups. The atypical method of digestion, 8Ab, appeared either as an outlier or as a single-member group. The metals Cr and Pb are the two most discriminating variables.

From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

Soft Independent Modeling of Class Analogies (SIMCA)


- Uses PCA to model the shape and position of the object formed by the samples in row space for class definition
- The shape of a class depends on the number of components used in the model
- To predict the classification of future samples, it is necessary to determine what region of measurement space they occupy

SIMCA Procedure:
1. Compute the residual standard deviation (RSD) for the class as a whole (the mean distance between the objects of the class and the class model).
2. Compute the RSD for each object (the orthogonal distance between the object and the class model).
3. Compute the F value from the computed residuals. If Fcal < Fcrit, the unknown sample is a member of the class.
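The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of Kokot et al.: the class model is a hypothetical one-component model given by a class mean and a unit direction vector, the degrees of freedom follow one common SIMCA convention, and `F_CRIT` is a placeholder that would normally be read from an F table.

```python
def residual(x, mean, direction):
    """Part of (x - mean) that is orthogonal to a one-component class model."""
    centred = [xi - mi for xi, mi in zip(x, mean)]
    t = sum(c * d for c, d in zip(centred, direction))  # score on the component
    return [c - t * d for c, d in zip(centred, direction)]

def simca_f(unknown, training, mean, direction, n_components=1):
    n_obj, n_var = len(training), len(mean)
    dof_var = n_var - n_components
    # Step 1: squared RSD of the class as a whole.
    ss_class = sum(e * e for row in training for e in residual(row, mean, direction))
    s0_sq = ss_class / ((n_obj - n_components - 1) * dof_var)
    # Step 2: squared RSD of the unknown object.
    s_sq = sum(e * e for e in residual(unknown, mean, direction)) / dof_var
    # Step 3: F value as the ratio of the two variances.
    return s_sq / s0_sq

# Hypothetical class: 4 training objects, 3 variables, model along the first axis.
training = [[1.0, 0.1, 0.0], [-1.0, -0.1, 0.0], [2.0, 0.0, 0.1], [-2.0, 0.0, -0.1]]
mean = [0.0, 0.0, 0.0]
direction = [1.0, 0.0, 0.0]
F_CRIT = 4.0  # hypothetical critical value; look up the real one in an F table

f_close = simca_f([0.5, 0.1, 0.1], training, mean, direction)
f_far = simca_f([0.5, 1.0, 0.0], training, mean, direction)
print(f_close < F_CRIT, f_far < F_CRIT)  # class member? for each unknown
```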

Extracted Information :

Only method 4C, and probably 11A, could be assigned to the training set consisting of digestion methods with HF in their acid mixtures. This suggests that methods 4C and 11A perform about as well as the digestion methods that include HF in their acid mixtures, based on the defined variables.

From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE) and Geometrical Analysis for Interactive Aid (GAIA)

PROMETHEE
- Designed to rank a number of actions (objects) in the context of constraints present in or imposed on the data
- Ranking is performed according to a set of user-supplied preference conditions, which are applied to the criteria (variables)

Metal content (µg g^-1)

Method      Cu            Pb          Co            Zn
2B          103           155         12.6          441
4A          99.5          166         15            432
4B          91.4          159         16            433
6A          103           145         13.7          432
8B          98            159         13            421
8C          99            164         13            435
NBS 2704    98.6 ± 5.0    161 ± 17    14.0 ± 0.6    438 ± 12

Objects: the digestion methods (rows). Variables: the metal contents (columns). Preference conditions: the NBS 2704 reference values (last row).

PROMETHEE I
Procedure:
1. Define a specific preference function for each criterion.
2. Compute the degree of preference associated with the best action in each pairwise comparison, using the preference function.
3. Calculate the positive and negative preference flows for each alternative.

POSITIVE FLOW: expresses how much an alternative dominates the other ones (its power)
NEGATIVE FLOW: expresses how much it is dominated by the other ones (its weakness)
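A minimal sketch of the flow calculation is given below. It assumes the "usual" preference function (1 if strictly better, 0 otherwise), equal weights, and, as the criterion, the absolute deviation of each result from the NBS 2704 reference value. These choices are assumptions for illustration and are not necessarily the preference conditions used by Kokot et al. The Cu and Zn values are taken from the example data for methods 4A, 8B, and 8C.

```python
def usual_preference(d):
    """'Usual' preference function: 1 if strictly better, otherwise 0."""
    return 1.0 if d > 0 else 0.0

def promethee_flows(scores, weights):
    """scores[name] is a list of criterion values where larger is better."""
    names = list(scores)
    n = len(names)

    def pi(a, b):
        # Weighted degree of preference of action a over action b.
        return sum(w * usual_preference(sa - sb)
                   for w, sa, sb in zip(weights, scores[a], scores[b]))

    positive = {a: sum(pi(a, b) for b in names if b != a) / (n - 1) for a in names}
    negative = {a: sum(pi(b, a) for b in names if b != a) / (n - 1) for a in names}
    return positive, negative

reference = {"Cu": 98.6, "Zn": 438.0}
measured = {
    "4A": {"Cu": 99.5, "Zn": 432.0},
    "8B": {"Cu": 98.0, "Zn": 421.0},
    "8C": {"Cu": 99.0, "Zn": 435.0},
}
# Assumed criterion: negated absolute deviation, so that larger is better.
scores = {m: [-abs(v[e] - reference[e]) for e in ("Cu", "Zn")]
          for m, v in measured.items()}
weights = [0.5, 0.5]  # hypothetical equal weights

positive, negative = promethee_flows(scores, weights)
for m in measured:
    print(m, positive[m], negative[m])
```

Under these assumptions method 8C has the largest positive flow and a negative flow of zero, because it is closer to the reference value than 4A and 8B on both metals.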

Interpretation of flow chart




- Actions (methods of digestion) that are comparable are joined by one or more arrows.
- Any comparable action to the left of another is preferred.
- Any actions that are incomparable remain unconnected.

Extracted Information:

From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

For the BSR data set (denoted by a "b" in the label, e.g. 7Ab), methods 12b and 4Bb outranked the others, but they could not be compared with each other because each method performed differently on the six metal variables. For the NBS 2704 data (labels without a "b", e.g. 7A), the performance of method 8C is comparable to that of methods 4A and 8B. However, 8C lies to the left of 4A and 8B, so 8C is preferred over them.

PROMETHEE II


- Applied to eliminate the indecisive result and to produce a simple ranking scale
- Computes the net outranking flow value: the difference between the associated positive and negative outranking flows
- Results are less reliable than those of PROMETHEE I
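The net flow calculation can be sketched in a few lines of Python. The action names and flow values are hypothetical and serve only to show how the net flow yields a complete ranking.

```python
def net_flow_ranking(positive, negative):
    """Rank actions by net flow = positive flow - negative flow (highest first)."""
    net = {a: positive[a] - negative[a] for a in positive}
    ranking = sorted(net, key=net.get, reverse=True)
    return ranking, net

# Hypothetical flows for three actions.
positive = {"A": 0.75, "B": 0.25, "C": 0.5}
negative = {"A": 0.25, "B": 0.5, "C": 0.5}

ranking, net = net_flow_ranking(positive, negative)
print(ranking)
```

Collapsing two flows into one number always produces a total order, which is why actions that PROMETHEE I leaves incomparable still receive a rank here.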

Extracted Information:
PROMETHEE II ranking for complete NBS 2704, BSR and polished combined data. (From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279)

The net flow values for the two most preferred BSR methods are very similar and are well above the values for the next methods, 11Bb, 6Ab, and 8Ab. For NBS 2704, method 8C had a considerably higher value than each of the next two methods. HF in the acid mixtures (methods 4Bb, 12b, and 11Bb) plays a major role in the digestion of the BSR sample. HCl in the acid mixtures (methods 8C, 8B, and 4A) determines the efficiency of digestion of the NBS 2704 sample.

GAIA
- A method for investigating the PROMETHEE results
- Net outranking flows are decomposed into a form suitable for PCA
- The biplot facilitates the interpretation of the significance of the criteria

Extracted Information:


From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

The GAIA biplot shows the discrimination of the methods of digestion according to the acid digest composition (PC2) and the origin of the rock/sediment sample (PC1). The diagram is very similar to the exploratory PCA plot; however, the cluster separation appears to be sharper.

FUZZY CLUSTERING


- Attempts to assign a degree of class membership for a given object over several classes
- Classification is performed with the aid of a membership function:

$m(x) = 1 - c\,|x - a|^{p}$

where a and c are constants and p is positive (or constructed with reference to the data)
- The sum of the membership values for each object is 1

Main advantage: Facilitates the distinction between objects that clearly belong to one cluster (membership value of 1 or close to 1) and those that are members of several clusters (membership value of 1/(no. of clusters)).
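A minimal sketch of the idea follows, assuming a membership function of the form m(x) = 1 - c|x - a|^p in one dimension. Clipping negative values to zero and rescaling so that the memberships of each object sum to 1 are assumptions made for this illustration, as are the cluster centres and constants.

```python
def membership(x, a, c, p):
    """m(x) = 1 - c * |x - a|**p, clipped at 0 (the clipping is an assumption)."""
    return max(0.0, 1.0 - c * abs(x - a) ** p)

def normalised_memberships(x, centres, c, p):
    """Memberships of x in each cluster, rescaled so that they sum to 1."""
    raw = [membership(x, a, c, p) for a in centres]
    total = sum(raw)
    if total == 0:
        # Far from every centre: fall back to equal membership.
        return [1.0 / len(centres)] * len(centres)
    return [m / total for m in raw]

centres = [0.0, 10.0]  # hypothetical cluster centres
near_first = normalised_memberships(1.0, centres, c=0.01, p=2)
midway = normalised_memberships(5.0, centres, c=0.01, p=2)
print(near_first, midway)
```

The object near the first centre gets a membership close to 1 in that cluster, while the object midway between the two centres gets 1/2 in each, matching the 1/(no. of clusters) case described above.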

Fuzzy Clustering


Extracted Information:
Results are in good agreement with the other chemometrics procedures. For NBS 2704, methods 2B, 4C, 7A, and 11A are the important class members of one cluster. This cluster is characterized by the exclusion of HF in the acid mixtures.


The second cluster is composed of methods 4A, 4B, 8B, and 8C. These methods have HF in their acid digest.


In the case of BSR, a 3-cluster model is more appropriate for analysis. The use of a 2-cluster model is heavily influenced by the atypical method 8Ab.


Some methods are in the intermediate positions, e.g. 2A. They are classified as members of two clusters.


From Kokot, et al., 1992. Anal. Chim. Acta, 259, 267-279

Selection of Suitable Digestion Method




The previous examples showed that all the chemometrics procedures provide consistent information about outliers, groupings, and trends. However, only the multicriteria decision-making method PROMETHEE provides rank-order information, which helps in the selection of a suitable microwave digestion method. SIMCA and fuzzy clustering (FC) are the preferred methods for sample classification.

Chemometrics
Extraction of Latent Information

Ordination Diagram


- Used in the determination of carrier substances for trace metals in sediments
- Involves simple correlation analysis on the set of major and trace elements
- The positive correlation coefficient matrices obtained are pictured graphically in a 2-dimensional diagram

Interpretation of Ordination Diagram


- The proximity of two variables on the diagram is a measure of their statistical dependence.
- If a trace element is significantly correlated with one or several major elements, it is possible that the mineral phase containing the major elements is its carrier.
- The validity of this hypothesis should be verified by a chemical speciation analysis.
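The correlation step behind the diagram can be sketched as below. The concentrations are hypothetical values chosen for illustration; they are not data from Jaquet et al. A strong positive coefficient between a trace element and a major component is what would place the two close together on the diagram.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical concentrations in five sediment samples.
organic_c = [1.0, 2.0, 3.0, 4.0, 5.0]   # major component
al = [5.0, 3.0, 6.0, 2.0, 4.0]          # major element
cu = [12.0, 14.0, 16.0, 18.0, 20.0]     # trace element

r_cu_orgc = pearson(organic_c, cu)
r_cu_al = pearson(al, cu)
print(round(r_cu_orgc, 3), round(r_cu_al, 3))
```

In this made-up data set Cu follows organic carbon and not Al, so organic matter would be the candidate carrier to check by speciation analysis.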

Ordination Diagram

(From Jaquet, et al. 1982. Hydrobiologia, 91, 139-146.)

[Figure: ordination diagrams; panel labels: autochthonous carbonate, mixed carbonate, mixed silicate, silicate, dolomitic, contaminated]

Three major carrier substances in the whole study area:
1. Organic matter: Cd, Pb, Ag, Cu, and Hg
2. Phosphates: Zn and Sn
3. Silicates: Ni, V, Co, Be, Cr, and Zn

Ordination Diagram

(From Jaquet, et al. 1982. Hydrobiologia, 91, 139-146.)

[Figure: ordination diagrams; panel labels: autochthonous carbonate, mixed carbonate, mixed silicate, silicate, dolomitic, contaminated]

Unlike in most of the other facies, organic matter does not act as a carrier for any metal in facies 1 (autochthonous carbonate). This exceptional behaviour has been attributed to the fact that organic carbon in facies 1 is mostly autochthonous, whereas in other facies, particularly in facies 7, allochthonous, anthropogenic organic matter predominates.

Linear Regression Analysis




Develops linear equations from collected experimental data to make predictions about the values of a dependent variable based on the values of one or more independent variables.

Simple Linear Regression
- One independent variable is used to predict the value of the dependent variable.

$Y = a + bX$

where Y is the dependent variable, a is the constant (intercept), b is the slope (regression coefficient), and X is the independent variable.

Multiple Linear Regression
- More than one independent variable is used to predict the dependent variable.

$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$
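As a small worked example of the simple case, the sketch below estimates a and b by ordinary least squares. The x and y values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (a, b) for Y = a + b*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope (regression coefficient)
    a = my - b * mx        # intercept
    return a, b

# Hypothetical measurements.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]

a, b = fit_line(x, y)
print(f"Y = {a:.2f} + {b:.2f} X")
```

For these values the slope works out to 9.7 / 5 = 1.94 and the intercept to 6.0 - 1.94 × 2.5 = 1.15.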

Application: Normalization Procedures


  

- Metal: Grain size normalization
- Metal: Reference metal normalization
- Multi-element normalization

Concept: Should the concentration of the metal be related to changing sediment particle size, the concentration will change with a constant relation to grain size or its proxy
(Loring and Rantala, 1992. Earth Science Reviews, 235-283)

Why do we use normalization procedures?




- To reduce or eliminate grain size effects on chemical data
- To identify anomalous metal concentrations in sediments
- To determine the factors that control the trace metal distribution in sediments (multiple regression)

Example: Metal: Reference Metal Normalization


Al was used as a proxy for the granular variations of the aluminosilicate fractions



95% confidence band

Concentrations of the metals covary with Al, except for Cd

Data points outside the 95% confidence band were considered contaminated


The slopes of these regression equations can be compared to the metal to aluminum ratios computed for average continental rocks and for average continental soils
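One way to apply this screening is sketched below: fit metal against Al on baseline samples, then flag a new sample whose metal concentration falls outside a 95% band around the regression line. The data are hypothetical, and building the band as a prediction interval with a Student t value of 2.776 (two-sided 95%, 4 degrees of freedom) is an assumption for this sketch; Windom et al. may have constructed their band differently.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope, plus the mean of x and Sxx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b, mx, sxx

def outside_band(x_new, y_new, x, y, t_crit):
    """True if (x_new, y_new) lies outside the prediction band of the fit."""
    n = len(x)
    a, b, mx, sxx = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_new - mx) ** 2 / sxx)
    return abs(y_new - (a + b * x_new)) > half_width

# Hypothetical baseline samples: Al (%) and a trace metal (arbitrary units).
al = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
metal = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
T_CRIT = 2.776  # two-sided 95% Student t for n - 2 = 4 degrees of freedom

natural = outside_band(3.5, 7.2, al, metal, T_CRIT)
enriched = outside_band(3.5, 9.0, al, metal, T_CRIT)
print(natural, enriched)
```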

(From Windom, et al., 1989. Environ. Sci. Technol., 23, 314-320)

Hierarchical Cluster Analysis (HCA)




- Seeks to minimize within-group variance and maximize between-group variance, and represents that information in the form of a two-dimensional plot called a dendrogram
- The result is a number of heterogeneous groups with homogeneous contents
- Classifies objects or variables into several mutually exclusive groups based on the similarity of the characteristics they possess
- Used to develop hypotheses about the nature of the data or to examine previously stated hypotheses
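The merging process that a dendrogram records can be sketched with a small agglomerative routine. The version below uses average linkage with Euclidean distance, which is an assumption chosen for brevity and differs from a variance-minimizing (Ward-type) criterion; the points are hypothetical.

```python
import math

def average_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, distance): the dendrogram's merge history

    def dist(ca, cb):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in ca for j in cb)
        return total / (len(ca) * len(cb))

    while len(clusters) > n_clusters:
        d, ia, ib = min(
            (dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merges.append((clusters[ia], clusters[ib], d))
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)] + [merged]
    return clusters, merges

# Hypothetical objects described by two variables each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = average_linkage(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
```

The heights at which branches join in a dendrogram correspond to the distances stored in `merges`.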

HCA

Dendrogram

(From Angelidis and Aloupi, 2000. Mar. Poll. Bull., 77-82)

[Figure: dendrogram with Clusters I, II, III, and IV marked]

Cluster I: The group of Fe, Mn, Zn, Pb, and Li. The presence of naturally occurring Li, Fe, and Mn in this group suggests that the other elements (Zn and Pb) may also be of similar (natural) origin, or that they may have been distributed evenly in the coastal sediments by tidal activity. The oxides of Fe and Mn probably play an important role in their distribution.


Cluster II: The group of organic carbon and Cu indicates the role of organic matter in the distribution of Cu.

Dendrogram

(From Angelidis and Aloupi, 2000. Mar. Poll. Bull., 77-82)

[Figure: dendrogram with Clusters III and IV marked]

Cluster III: The group of Al, Cr, and Ni. The fact that Cr and Ni appear in the same group as the naturally derived Al suggests that weathering of natural rocks may play an important role in the distribution of those metals in the sediments of the study area.
 

Cluster IV: The group of Cd. Cadmium forms a group of its own which indicates that the metal has a different distribution process compared to the other metals.

Principal Component Analysis (PCA)


 

- A summarization and data reduction technique
- Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as components
- Provides a visual display of the data that is often more enlightening than a comparison of only one or two variables at a time
- Used to delimit the areas of most contaminated sediments and the relative importance of the major anthropogenic metal inputs

PCA


- The spatial distribution of metals is explained by two PCs, which account for 77.9% of the variance.
- Three end members were identified:
  - the clean Buzzards Bay sediments
  - the less contaminated outer harbor sediments
  - the contaminated inner harbor sediments
- The first PC has separated the clean Buzzards Bay samples from the contaminated samples in New Bedford Harbor.

(From Shine, et al., 1995. Environ. Sci. Technol., 29, 1781-1788)

The second PC has further separated the samples from the New Bedford Harbor based on the types of metals present in the sediments.

PCA


- Co, Mn, and Ni define the clean Buzzards Bay sediments
- Zn and Pb define the outer portion of New Bedford Harbor
- Cu, Cd, and Cr define the contaminated inner portion of New Bedford Harbor
- Each of these three clusters of metals has loadings that correspond to the geographical clusters in the score plot.
(From Shine, et al., 1995. Environ. Sci. Technol., 29, 1781-1788)
