You are on page 1of 48

# Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Tutorial on Exploratory Data

Analysis
Julie Josse, Franois Husson, Sbastien L
julie.josse at agrocampus-ouest.fr
francois.husson at agrocampus-ouest.fr
Applied Mathematics Department, Agrocampus Ouest

useR-2008
Dortmund, August 11th 2008

1 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Our research focus on multi-table data analysis

We teach Exploratory Data analysis from a long time (but in

French ...)
The use of a more geometrical point of view allowing to draw

graphs
The possibility to propose new methods (taking into account

## different structure on the data)

To have a package user friendly and oriented to practitioner (a

## very easy GUI)

2 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Outline

## Multiple Correspondence Analysis (MCA)

Some extensions

3 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Multivariate data analysis

Principal Component Analysis (PCA) continuous variables
Correspondence Analysis (CA) contingency table
Multiple Correspondence Analysis (MCA) categorical

variables
Dimensionality reduction describe the dataset with smaller

number of variables
Techniques widely used for applications such as: data

## compression, data reconstruction; preprocessing before

clustering, and ...

4 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Exploratory data analysis

Descriptive methods
Data visualization
Geometrical approach: importance to graphical outputs
Identification of clusters, detection of outliers

## French school (Benzcri)

5 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Principal Component Analysis

6 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

PCA in R

## prcomp, princomp from stats

PCA from FactoMineR (http://factominer.free.fr)

7 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Notations:

## xi. the individual i,

S = n1 Xc0 Xc the covariance
matrix,
W = Xc Xc0 the inner products
matrix.

## can also be included in the analysis

8 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Some examples
Many examples
Sensory analysis: products - descriptors
Environmental data: plants - measurements; waters physico-chemical analyses
Economy: countries - economic indicators
Microbiology: cheeses - microbiological analyses
etc.
Today we illustrate PCA with:
data decathlon: athletes performances during two athletics
meetings
data chicken: genomics data

9 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Decathlon data

100m

Long.jump

Shot.put

High.jump

400m

110m.hurdle

Discus

Pole.vault

Javeline

1500m

Rank

Points

Competition

41 athletes (rows)
13 variables (columns):
10 continuous variables corresponding to the performances
2 continuous variables corresponding to the rank and the
points obtained PCA Example
1 categorical variable corresponding to the athletics meeting:
DataOlympic
: performances
41 athletes
during
two meetings of decathlon
Gameofand
Decastar
(2004)

SEBRLE
CLAY
KARPOV
BERNARD
YURKOV

11.04
10.76
11.02
11.02
11.34

7.58
7.40
7.30
7.23
7.09

14.83
14.26
14.77
14.25
15.19

2.07
1.86
2.04
1.92
2.10

49.81
49.37
48.37
48.93
50.42

14.69
14.05
14.09
14.99
15.31

43.75
50.72
48.95
40.87
46.26

5.02
4.92
4.92
5.32
4.72

63.19
60.15
50.31
62.77
63.44

291.70
301.50
300.20
280.10
276.40

1
2
3
4
5

8217
8122
8099
8067
8036

Decastar
Decastar
Decastar
Decastar
Decastar

Sebrle
Clay
Karpov
Macey
Warners

10.85
10.44
10.50
10.89
10.62

7.84
7.96
7.81
7.47
7.74

16.36
15.23
15.93
15.73
14.48

2.12
2.06
2.09
2.15
1.97

48.36
49.19
46.81
48.97
47.97

14.05
14.13
13.97
14.56
14.01

48.72
50.11
51.65
48.34
43.73

5.00
4.90
4.60
4.40
4.90

70.52
69.71
55.54
58.46
55.39

280.01
282.00
278.11
265.42
278.05

1
2
3
4
5

8893
8820
8725
8414
8343

OlympicG
OlympicG
OlympicG
OlympicG
OlympicG

10 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Problems - objectives

## variables (Euclidian distance) partition between individuals

Variables study: are there linear relationships between

## variables? Visualization of the correlation matrix; find some

synthetic variables
Link between the two studies: characterization of the groups

## of individuals by the variables; particular individuals to better

11 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Two points clouds

var 1
var k

ind 1
ind k

12 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Pre-processing: Mean centering, Scaling?

Mean centering does not modify the shape of the cloud
Scaling: variables are always scaled when they are not in the

same units

20

60

+
+
+

40

+
++ +

++

20

+
+++

+
++
+
+ ++
++
+
+
+
+
+
+
+
+
+
+

## Long jump (in cm)

20

+++ +
+ ++
+ +
+
+ ++ + + ++ + ++ + +
+++ ++

40

10

10

+
+

+
+

+
+
+
+

20

60

10

10

20

30

40

+
50

50

## PCA always centered and often scaled

13 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Individuals cloud

Individuals are in RK
Similarity between individuals: Euclidean distance
Study the structure, i.e. the shape of the individual cloud
14 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Inertia

## the data and the barycenter:

Ig

I
X

pi ||xi. g ||2 =

i=1

= tr (S) =

1X
(xi. g )0 (xi. g ),
I
i=1

s = K .

15 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Fit the individuals cloud

Find the subspace who better sum up the data: the closest one by
projection.

## Figure: Camel vs dromedary?

16 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

xi.

min

= < x, u1 > u1
max

Fu

## Maximize the variance of the projected data

Minimize the distance between individuals and their projections

## Best representation of the diversity, variability of the individuals

Do not distort the distances between individuals

17 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Find the subspace (1)

Find the first axis u1 , for which variance of Fu1 = Xu1 is

maximized:
u1 = argmaxu1 var (Xu1 ) with u10 u1 = 1
1
1
var (Fu1 ) = (Xu1 )0 Xu1 = u10 X 0 Xu1
I
I
max u10 Su1 with u10 u1 = 1
u1 first eigenvector of S (associated with the largest
eigenvalue 1 ):
Su1 = 1 u1 .
18 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Find the subspace (2)

Projected inertia on the first axis:
var (Fu1 ) = var (Xu1 ) = 1
Percentage of variance explained by the first axis:

P 1
k k
Additional axes are defined in an incremental fashion: each new

## direction is chosen by maximizing the projected variance among all

orthogonal directions
Solution: K eigenvectors u1 ,...,uK of the data covariance matrix

## corresponding to the K largest eigenvalues 1 ,...,K

19 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Example: graph of the individuals

Individuals factor map (PCA)

Casarsa

YURKOV
Parkhomenko
Korkizoglou

0
2

Dimension 2 (17.37%)

Zsivoczky
Smith
Macey

SEBRLE
Pogorelov
CLAY
MARTINEAU
HERNU
KARPOV
Turi Terek

Barras

Uldal
McMULLEN

BOURGUIGNON

Schoenbeck
Bernard

Karlivans
BARRAS Qi

Hernu
Ojaniemi
BERNARD

Smirnov
ZSIVOCZKY
Gomez

Schwarzl

Nool

Averyanov
Lorenzo
WARNERS

Warners

NOOL

Sebrle

Clay

Karpov

Drews

Dimension 1 (32.72%)

## Need variables to interpret the dimension of variability

20 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Can we interpret the individualsgraph with the variables?

Correlation between variable x.k and Fu (the vector of

individuals coordinates in RI )

r(F2,k)

r(F1,k)

## Correlation circle graph

21 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

1.0

## Variables factor map (PCA)

Discus
Shot.put

X400m
0.5

X1500m
High.jump

0.0

Pole.vault

0.5

Long.jump

1.0

Dimension 2 (17.37%)

Javeline
X110m.hurdle
X100m

1.0

0.5

0.0

0.5

1.0

Dimension 1 (32.72%)

## Figure: Graph of the variables.

22 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Find the subspace (1)

Variables are in RI
Find the first dimension v1 (in RI ) which maximizes the

x.k

Gv

## Pv1 (x.k ) = v1 (v10 v1 )1 v10 x.k

< v1 , x.k >
Gv1 (k) =
||v1 ||

P
PK
PK
2
2
2
max K
k=1 Gv1 (k) = max
k=1 cor (v1 , x.k ) = max
k=1 cos ()
with v 0 v = 1
v1 is the best synthetic variable
23 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Solution: v1 is the first eigenvector of W = XX 0 the individuals

inner product matrix (associated with the largest eigenvalue 1 ):
Wv1 = 1 v1

## The next dimensions are the other eigenvectors

Dimensionality reduction: principal components are linear
combination of the variables
A subset of components to sum up the data

24 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

1.0

## Variables factor map (PCA)

Discus
Shot.put

X400m
0.5

X1500m
High.jump

X110m.hurdle
X100m
0.0

Dimension 2 (17.37%)

Javeline

Pole.vault

1.0

0.5

Long.jump

1.0

0.5

0.0

0.5

1.0

Dimension 1 (32.72%)

## Same representation! What a wonderful result!

25 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Projections...
Only well projected variables (high cos2 between the variable and
its projection) can be interpreted!

A
HA
HB
HA
HB

D
HD

HD

HC
HC
B

26 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Exercices...

## your correlation circle look like?

27 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Exercices...

## your correlation circle look like?

mat=matrix(rnorm(7*200,0,1),ncol=200)
PCA(mat)

28 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Link between the two representations: transition formulae

Su = X 0 Xu = u
XX 0 Xu = X u W (Xu) = (Xu)
WFu = Fu and since Wv = v then Fu and v are colinear
Since, ||Fu || = and ||v || = 1 we have:

v=
u=

1 Fu

1 Gv

Gv = X 0 v = 1 X 0 Fu
Fu = Xu = 1 XGv

K
1 X
Fs (i) =
xik Gs (k)
s k=1

I
1 X
Gs (k) =
xik Fs (i)
s i=1

29 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Example on decathlon
Dim 2 (17.37%)

Discus
X400m

X1500m
Javeline

X110m.hurdle
X100m

Casarsa

Shot.put

High.jump

Rank

Points
Dim 1 (32.72%)
Pole.vault

YURKOV
Parkhomenko
Korkizoglou
Zsivoczky
Smith
Macey
Pogorelov
SEBRLE
CLAY
MARTINEAU
HERNU
KARPOV
Turi TerekBarras
McMULLEN
BOURGUIGNON Uldal
OlympicG
Decastar
Schoenbeck
Bernard
Karlivans
BARRAS Qi
Hernu
Ojaniemi
BERNARD
Smirnov
ZSIVOCZKY
Gomez
Schwarzl
Nool
Averyanov
Lorenzo
WARNERS
Warners
NOOL

Long.jump
Sebrle
Clay
Karpov
Dim 1 (32.72%)

Drews

## Figure: Individuals and variables representations.

30 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Supplementary information (1)

For the continuous variables: projection of these

1.0

## Variables factor map (PCA)

Discus
Shot.put

X400m
0.5

X1500m
High.jump

X110m.hurdle
X100m
Rank

0.0

Dimension 2 (17.37%)

Javeline

Points

Pole.vault

1.0

0.5

Long.jump

1.0

0.5

0.0

0.5

1.0

Dimension 1 (32.72%)

## Supplementary information do not participate to create the axes

31 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Supplementary information (2)

How to deal with (supplementary) categorical variables?
HERNU
BARRAS
NOOL
BOURGUIGNON
Sebrle
Clay

11.37
7.56
14.41
1.86
Decastar
11.33
6.97
14.09
1.95
Decastar
11.33
7.27
12.68
1.98
Decastar
11.36
6.80
13.46
1.86
Decastar
10.85
7.84
16.36
2.12
OlympicG
10.44
7.96
15.23
2.06
OlympicG

HERNU
BARRAS
NOOL
BOURGUIGNON
Sebrle
Clay

11.37
7.56
14.41
1.86
11.33
6.97
14.09
1.95
11.33
7.27
12.68
1.98
11.36
6.80
13.46
1.86
10.85
7.84
16.36
2.12
10.44
7.96
15.23
2.06

Decastar
Olympic.G

11.18
10.92

7.25
7.27

14.16
14.62

1.98
1.98

## The categories are projected at the barycenter of the individuals

who take the categories

32 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Supplementary information (3)

Individuals factor map (PCA)
Decastar
OlympicG

Casarsa

YURKOV
Parkhomenko
Korkizoglou

Zsivoczky
Smith

Sebrle
Macey

SEBRLE
Pogorelov
CLAY
MARTINEAU
HERNU
KARPOV
Turi Terek

Barras

McMULLEN

BOURGUIGNON Uldal

OlympicG

Decastar
Schoenbeck
Bernard

Karlivans
BARRAS Qi

Hernu
Ojaniemi

BERNARD
Smirnov
ZSIVOCZKY
Gomez

Schwarzl

Nool

Averyanov
Lorenzo
WARNERS

Warners

NOOL

Dimension 2 (17.37%)

Clay

Karpov

Drews

Dimension 1 (32.72%)

## Figure: Projection of supplementary variables.

33 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Confidence ellipses
Individuals factor map (PCA)
Decastar
OlympicG

Casarsa

YURKOV
Parkhomenko
Korkizoglou

Zsivoczky
Smith

Sebrle
Macey

SEBRLE
Pogorelov
CLAY
MARTINEAU
HERNU
KARPOV
Turi Terek

Barras

McMULLEN

BOURGUIGNON Uldal

OlympicG

Decastar
Schoenbeck
Bernard

Karlivans
BARRAS Qi

Hernu
Ojaniemi

BERNARD
Smirnov
ZSIVOCZKY
Gomez

Schwarzl

Nool

Averyanov
Lorenzo
WARNERS

Warners

NOOL

Dimension 2 (17.37%)

Clay

Karpov

Drews

Dimension 1 (32.72%)

## Figure: Confidence ellipses around the barycenter of each category.

34 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Number of dimensions?

## brought by the dimension

Quality of the approximation:

PQ
s
PsK

s
s

## variability in our data (the other components are noise)

Bar plot of the eigenvalues: scree test
Test on eigenvalues, confidence interval,...

35 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Percentage of variance obtained under independence

Is there a structure on my data?

nr=50
nc=8
iner=rep(0,1000)
for (i in 1:1000)
{
mat=matrix(rnorm(nr*nc,0,1),ncol=nc)
iner[i]=PCA(mat,graph=F)\$eig[2,3]
}
quantile(iner,0.95)

36 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Percentage of variance obtained under independence

Is there a structure on my data?
nbind
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
25
30
35
40
45
50
100

4
96.5
93.3
90.5
88.1
86.1
84.5
82.8
81.5
80.0
79.0
78.1
77.3
76.5
75.5
75.1
74.1
72.0
69.8
68.5
67.5
66.4
65.6
60.9

5
93.1
88.6
84.9
82.3
79.5
77.5
75.7
74.0
72.5
71.5
70.3
69.4
68.4
67.6
67.0
66.1
63.3
61.1
59.6
58.3
57.1
56.3
51.4

6
90.2
84.8
80.9
77.2
74.8
72.3
70.3
68.6
67.2
65.7
64.6
63.5
62.6
61.8
60.9
60.1
57.1
55.1
53.3
52.0
50.8
49.9
44.9

7
87.6
81.5
77.4
73.8
70.7
68.2
66.3
64.4
62.9
61.5
60.3
59.2
58.2
57.1
56.5
55.6
52.5
50.3
48.6
47.3
46.1
45.2
40.0

8
85.5
79.1
74.4
70.7
67.4
65.0
62.9
61.2
59.4
58.1
57.0
55.6
54.7
53.7
52.8
52.1
48.9
46.7
44.9
43.4
42.4
41.4
36.3

Number of variables
9
10
11
83.4
81.9
80.7
76.9
75.1
73.2
72.0
70.1
68.3
68.2
66.1
64.0
65.1
62.9
61.1
62.4
60.1
58.3
60.1
58.0
56.0
58.3
55.8
54.0
56.7
54.4
52.2
55.1
52.8
50.8
53.9
51.5
49.4
52.9
50.3
48.3
51.8
49.3
47.1
50.8
48.4
46.3
49.9
47.4
45.5
49.1
46.6
44.7
46.0
43.4
41.4
43.6
41.1
39.1
41.9
39.5
37.4
40.5
38.0
36.0
39.3
36.9
34.8
38.4
35.9
33.9
33.3
31.0
28.9

12
79.4
72.2
67.0
62.8
59.4
56.5
54.4
52.4
50.5
49.0
47.8
46.6
45.5
44.6
43.7
42.9
39.6
37.3
35.6
34.1
33.1
32.1
27.2

13
78.1
70.8
65.3
61.2
57.9
55.1
52.7
50.9
48.9
47.5
46.1
45.2
44.0
43.0
42.1
41.3
38.1
35.7
34.0
32.7
31.5
30.5
25.8

14
77.4
69.8
64.3
60.0
56.5
53.7
51.3
49.3
47.7
46.2
44.9
43.6
42.6
41.6
40.7
39.8
36.7
34.4
32.7
31.3
30.2
29.2
24.5

15
76.6
68.7
63.2
59.0
55.4
52.5
50.1
48.2
46.6
45.0
43.6
42.4
41.4
40.4
39.6
38.7
35.5
33.2
31.6
30.1
29.0
28.1
23.3

## Table: 95 % quantile inertia on the two first dimensions of 10000 PCA

on data with independent variables

16
75.5
68.0
62.2
58.0
54.3
51.5
49.2
47.2
45.4
44.0
42.5
41.4
40.3
39.3
38.4
37.5
34.5
32.1
30.4
29.1
27.9
27.0
22.3

37 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Percentage of variance obtained under independence

nbind
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
25
30
35
40
45
50
100

17
74.9
67.0
61.3
57.0
53.6
50.6
48.1
46.2
44.4
42.9
41.6
40.4
39.4
38.3
37.4
36.7
33.5
31.2
29.5
28.1
27.0
26.1
21.5

18
74.2
66.3
60.7
56.2
52.5
49.8
47.2
45.2
43.4
42.0
40.7
39.5
38.5
37.4
36.5
35.8
32.5
30.3
28.6
27.3
26.1
25.3
20.7

19
73.5
65.6
59.7
55.4
51.8
49.0
46.5
44.4
42.8
41.3
39.8
38.7
37.6
36.7
35.8
34.9
31.8
29.5
27.9
26.5
25.4
24.6
19.9

20
72.8
64.9
59.1
54.5
51.2
48.3
45.8
43.8
41.9
40.4
39.1
37.9
36.9
35.8
34.9
34.2
31.1
28.8
27.1
25.8
24.7
23.8
19.3

25
70.7
62.3
56.4
51.8
48.1
45.2
42.8
40.7
39.0
37.4
36.2
35.0
33.8
32.9
32.0
31.3
28.1
26.0
24.3
23.0
21.9
21.1
16.7

Number of variables
30
35
40
68.8
67.4
66.4
60.4
58.9
57.6
54.3
52.6
51.4
49.7
47.8
46.7
45.9
44.4
42.9
42.9
41.4
40.1
40.6
39.0
37.7
38.5
36.9
35.5
36.8
35.1
33.9
35.2
33.6
32.3
34.0
32.4
31.1
32.8
31.1
29.8
31.7
30.1
28.8
30.7
29.1
27.8
29.9
28.3
27.0
29.1
27.5
26.2
26.0
24.5
23.3
23.9
22.3
21.1
22.2
20.7
19.6
21.0
19.5
18.4
20.0
18.5
17.4
19.1
17.7
16.6
14.9
13.6
12.5

50
64.7
55.8
49.5
44.6
41.0
38.0
35.6
33.5
31.8
30.4
29.0
27.9
26.8
25.9
25.1
24.3
21.4
19.3
17.8
16.6
15.7
14.9
11.0

75
62.0
52.9
46.4
41.6
38.0
35.0
32.6
30.5
28.8
27.4
26.0
24.9
23.9
22.9
22.2
21.4
18.6
16.6
15.2
14.1
13.2
12.5
8.9

100
60.5
51.0
44.6
39.8
36.1
33.2
30.8
28.8
27.1
25.7
24.3
23.2
22.2
21.3
20.5
19.8
17.0
15.1
13.7
12.7
11.8
11.1
7.7

150
58.5
49.0
42.4
37.6
34.0
31.0
28.7
26.7
25.0
23.6
22.4
21.2
20.3
19.4
18.6
18.0
15.2
13.4
12.1
11.1
10.3
9.6
6.4

200
57.4
47.8
41.2
36.4
32.7
29.8
27.5
25.5
23.9
22.4
21.2
20.1
19.2
18.3
17.5
16.9
14.2
12.5
11.1
10.2
9.4
8.7
5.7

## Table: 95 % quantile inertia on the two first dimensions of 10000 PCA

on data with independent variables
38 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## between the variable and its projection) can be interpreted!

For the individuals: (same idea) distance between individuals

## can only be interpreted for well projected individuals

res.pca\$ind\$cos
Dim.1
Sebrle
0.70
Clay
0.71
Karpov
0.85

Dim.2
0.08
0.03
0.00

39 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Contribution
Contribution to the inertia to create the axis:
For the individuals: Ctrs (i) =

F 2 (i)
PI s 2
i =1 Fs (i)

Fs2 (i)
s

## Individuals with large coordinate contribute the most to the

construction of the axis
round(res.pca\$ind\$contrib,2)
Dim.1 Dim.2
Sebrle
12.16 2.62
Clay
11.45 0.98
Karpov
15.91 0.00
For the variables: Ctrs (k) =

Gs2 (k)
s

s

## Variables highly correlated with the principal component

contribute the most to the construction of the dimension
40 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Description of the dimensions (1)

By the quantitative variables:
The correlation between each variable and the coordinate of
the individuals (principal components) on the axis s is
calculated
The correlation coefficients are sorted and significant ones are
given
\$Dim.1
\$Dim.1\$quanti
Dim.1
Points
Long.jump
Shot.put
Rank
400m
110m.hurdle
100m

0.96
0.74
0.62
-0.67
-0.68
-0.75
-0.77

\$Dim.2
\$Dim.2\$quanti
Dim.2
Discus
Shot.put

0.61
0.60

41 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

## Description of the dimensions (2)

By the categorical variables:
Perform a one-way analysis of variance with the coordinates of

## the individuals on the axis explained by the categorical variable

A F -test by variable
For each category, a t-test to compare the average of the

## category with the general mean

\$Dim.1\$quali
Competition

P-value
0.155

\$Dim.1\$category
Estimate
OlympicG
0.4393
Decastar
-0.4393

P-value
0.155
0.155
42 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Practice

library(FactoMineR)
data(decathlon)
res <- PCA(decathlon,quanti.sup=11:12,quali.sup=13)
plot(res,habillage=13)
res\$eig
x11()
barplot(res\$eig[,1],main="Eigenvalues",names.arg=1:nrow(res\$eig))
res\$ind\$coord
res\$ind\$cos2
res\$ind\$contrib
dimdesc(res)
aa=cbind.data.frame(decathlon[,13],res\$ind\$coord)
bb=coord.ellipse(aa,bary=TRUE)
plot.PCA(res,habillage=13,ellipse=bb)
#write.infile(res,file="my_FactoMineR_results.csv") #to export a list

43 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

Application

Chicken data:
43 chickens (individuals)
7407 genes (variables)
One categorical variable: 6 diets corresponding to different

stresses
Do genes differentially expressed from one stress to another?

## Dimensionality reduction: with few principal components, we

identify the structure in the data

44 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

100

## Individuals factor map (PCA)

J16
J16R16
J16R5
J48
J48R24
N

j48r24_9
j48r24_5

50

j48r24_7

J48R24
j48r24_3
j48r24_6

j48r24_2
J16

j48r24_1
0

j48_7
j48_6

j48r24_4

j48_1
J48

J16R16
J16R5

j48_3
j48_4
j48_2
-50

Dimension 2 (9.35%)

j48r24_8

-100

-50

50

Dimension 1 (19.63%)

45 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

-20

j48r24_8
j16r5_6
j48r24_9
j16r5_8
j48_7 j16r5_2 j16r5_7
j16r16_1 j48_1
j16_7
j16_6
N_1
j16r5_5
j48_6
N_2
j48_4
N_3 N
j48r24_6
J16R5
N_6
J48
j48r24_1
J16
N_4 j48r24_2 j16r16_5
N_7 j16r16_6
j16_5
j16r16_7
j16r16_3
j16r16_2
j16_4
j48_3
J16R16
J48R24
j16r16_9 j16r16_8
j16r5_1 j16_3
j16r5_4
j16r5_3

j48r24_7

j48_2
j48r24_3

-40

j48r24_5
j16r16_4
j48r24_4

-60

Dimension 4 (5.87%)

20

40

## Individuals factor map (PCA)

J16
J16R16
J16R5
J48
J48R24
N

-60

-40

-20

20

40

60

Dimension 3 (7.24%)

46 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

100

## Individuals factor map (PCA)

J16
J16R16
J16R5
J48
J48R24
N

j48r24_9
j48r24_5

50

j48r24_7

J48R24
j48r24_3
j48r24_6

j48r24_2
J16

j48r24_1
0

j48_7
j48_6

j48r24_4

j48_1
J48

J16R16
J16R5

j48_3
j48_4
j48_2
-50

Dimension 2 (9.35%)

j48r24_8

-100

-50

50

Dimension 1 (19.63%)

47 / 48

Introduction

Individuals cloud

Variables cloud

Helps to interpret

-20

j48r24_8
j16r5_6
j48r24_9
j16r5_8
j48_7 j16r5_2 j16r5_7
j16r16_1 j48_1
j16_7
j16_6
N_1
j16r5_5
j48_6
N_2
j48_4
N_3 N
j48r24_6
J16R5
N_6
J48
j48r24_1
J16
N_4 j48r24_2 j16r16_5
N_7 j16r16_6
j16_5
j16r16_7
j16r16_3
j16r16_2
j16_4
j48_3
J16R16
J48R24
j16r16_9 j16r16_8
j16r5_1 j16_3
j16r5_4
j16r5_3

j48r24_7

j48_2
j48r24_3

-40

j48r24_5
j16r16_4
j48r24_4

-60

Dimension 4 (5.87%)

20

40

## Individuals factor map (PCA)

J16
J16R16
J16R5
J48
J48R24
N

-60

-40

-20

20

40

60

Dimension 3 (7.24%)

48 / 48