
Robust PCA in Stata

Vincenzo Verardi (vverardi@fundp.ac.be)


FUNDP (Namur) and ULB (Brussels), Belgium
FNRS Associate Researcher

Outline:
- Introduction
- Robust Covariance Matrix
- Robust PCA
- Application
- Conclusion

PCA transforms a set of correlated variables into a smaller set of uncorrelated variables (principal components).

For p random variables X1, ..., Xp, the goal of PCA is to construct a new set of p axes in the directions of greatest variability.

[Figure: a sequence of scatterplots of X1 against X2, illustrating step by step the directions of greatest variability that the principal components pick out]

Hence, for the first principal component, the goal is to find a linear transformation

$$Y = \alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_p X_p \; (= \alpha^T X)$$

such that the variance of Y $\left(= \mathrm{Var}(\alpha^T X) = \alpha^T \Sigma\, \alpha\right)$ is maximal.

The direction of $\alpha$ is given by the eigenvector corresponding to the largest eigenvalue of the covariance matrix $\Sigma$.


The second vector (orthogonal to the first) is the one with the second highest variance. This corresponds to the eigenvector associated with the second largest eigenvalue.

And so on.
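The eigendecomposition behind these statements can be checked directly in Stata; a small sketch using the 3×3 correlation matrix from the simulation shown later in the talk:

matrix C = (1, .7, .6 \ .7, 1, .5 \ .6, .5, 1)
matrix symeigen V L = C
matrix list L    // eigenvalues = variances of the PCs, largest first
matrix list V    // columns of V are the corresponding eigenvectors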

The new variables (PCs) have a variance equal to their corresponding eigenvalue:

$$\mathrm{Var}(Y_i) = \lambda_i \quad \text{for all } i = 1, \dots, p$$

The relative variance explained by each PC is given by $\lambda_i \big/ \sum_{j=1}^{p} \lambda_j$.

How many PCs should be considered?

- Keep a sufficient number of PCs to reach a cumulative explained variance of at least 60-70% of the total.
- Kaiser criterion: keep PCs with eigenvalues > 1.
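Both rules are easy to eyeball in Stata from a scree plot; a minimal sketch (the variable names are placeholders):

. pca x1-x3
. screeplot, yline(1)    // Kaiser cutoff at eigenvalue 1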

PCA is based on the classical covariance matrix, which is sensitive to outliers. Illustration:

. set obs 1000
. drawnorm x1-x3, corr(C)

. matrix list C

        c1  c2  c3
    r1   1
    r2  .7   1
    r3  .6  .5   1

. corr x1 x2 x3
(obs=1000)

                 x1      x2      x3
        x1   1.0000
        x2   0.7097  1.0000
        x3   0.6162  0.5216  1.0000

. replace x1=100 in 1/100
(100 real changes made)

. corr x1 x2 x3
(obs=1000)

                 x1      x2      x3
        x1   1.0000
        x2   0.0005  1.0000
        x3  -0.0148  0.5216  1.0000

Contaminating only 10% of a single variable wipes out its correlations with the others.
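For completeness, a hedged do-file sketch reproducing the illustration; the explicit definition of C is my addition (its creation never appears on the slides), and no seed is set:

clear
set obs 1000
matrix C = (1, .7, .6 \ .7, 1, .5 \ .6, .5, 1)
drawnorm x1 x2 x3, corr(C)
corr x1 x2 x3
replace x1 = 100 in 1/100     // contaminate 10% of x1
corr x1 x2 x3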

This drawback can easily be solved by basing the PCA on a robust estimate of the covariance (correlation) matrix.

A well-suited method for this is the Minimum Covariance Determinant (MCD) estimator, which considers all subsets containing h% of the observations (generally 50%) and estimates $\mu$ and $\Sigma$ on the data of the subset associated with the smallest covariance matrix determinant.

Intuition:

The generalized variance proposed by Wilks (1932) is a one-dimensional measure of multidimensional scatter. It is defined as $GV = \det(\Sigma)$.

In the 2×2 case it is easy to see the underlying idea:

$$\Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}
\qquad\Rightarrow\qquad
\det(\Sigma) = \underbrace{\sigma_x^2\,\sigma_y^2}_{\text{raw bivariate spread}} \;-\; \underbrace{\sigma_{xy}^2}_{\text{spread due to covariation}}$$

Remember, MCD considers all subsets containing 50% of the observations.

However, if N = 200, the number of subsets to consider would be:

$$\binom{200}{100} \approx 9.0549 \times 10^{58}$$

Solution: use subsampling algorithms.
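This can be verified directly with Stata's built-in comb() function:

. di %10.4e comb(200,100)
9.0549e+58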

The implemented algorithm: Rousseeuw and Van Driessen (1999)

1. p-subset
2. Concentration (sorting distances)
3. Estimation of robust MCD
4. Estimation of robust PCA

Consider a number of subsets containing (p+1) points (where p is the number of variables), sufficiently large to be sure that at least one of the subsets does not contain outliers.

Calculate the covariance matrix on each subset and keep the one with the smallest determinant (a Mata sketch of this step follows).

Do some fine tuning to get closer to the global solution.
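A minimal Mata sketch of the core of this step, assuming the data sit in an n × p matrix X; the variable names and the use of jumble() for random sampling are illustrative choices, not the mcd command's actual internals:

mata:
// generalized variance of one random (p+1)-subset
X = st_data(., "x1 x2 x3")        // hypothetical variable names
n = rows(X); p = cols(X)
idx = jumble(1::n)[|1 \ p+1|]     // p+1 randomly chosen observations
det(variance(X[idx, .]))          // determinant to be minimized across subsets
end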

The minimal number of p-subsets we need, to have a probability Pr of getting at least one clean subset when a fraction $\gamma$ of outliers corrupts the dataset, can be easily derived:

$$\Pr = 1 - \left[\,1 - (1-\gamma)^p\,\right]^N$$

Reading the formula from the inside out:

- $\gamma$ is the contamination rate;
- $(1-\gamma)$ is the probability that one random point in the dataset is not an outlier;
- $(1-\gamma)^p$ is the probability that none of the p random points in a p-subset is an outlier;
- $1-(1-\gamma)^p$ is the probability that at least one of the p random points in a p-subset is an outlier;
- $\left[1-(1-\gamma)^p\right]^N$ is the probability that there is at least one outlier in each of the N considered p-subsets (i.e. that all considered subsets are corrupt);
- $1-\left[1-(1-\gamma)^p\right]^N$ is the probability that there is at least one clean p-subset among the N considered.

Rearranging we have:

$$N^* = \frac{\log(1-\Pr)}{\log\!\left(1-(1-\gamma)^p\right)}$$
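A small numeric sketch of this formula in Stata; the values of Pr, γ and p are illustrative, not from the talk:

. local Pr = 0.99
. local gamma = 0.30
. local p = 5
. di ceil(ln(1-`Pr')/ln(1-(1-`gamma')^`p'))    // about 26 subsets suffice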

The preliminary p-subset step allowed us to estimate a preliminary $\mu^*$ and $\Sigma^*$.

Calculate Mahalanobis distances, using $\mu^*$ and $\Sigma^*$, for all individuals. Mahalanobis distances are defined as

$$MD(x_i) = \sqrt{(x_i-\mu)^{\prime}\, \Sigma^{-1}\, (x_i-\mu)}$$

Squared MDs are distributed as $\chi^2_p$ for Gaussian data.

Sort individuals according to their Mahalanobis distances and re-estimate $\mu^*$ and $\Sigma^*$ using the first 50% of observations.

Repeat the previous step till convergence.
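A minimal Mata sketch of the distance computation behind this concentration step, assuming data X (n × p), location mu (1 × p) and scatter S (p × p); the function name md2 is mine:

mata:
// squared Mahalanobis distance of every row of X from (mu, S)
real colvector md2(real matrix X, real rowvector mu, real matrix S)
{
    real matrix Z
    Z = X :- mu                           // center each observation
    return(rowsum((Z * invsym(S)) :* Z))  // (x_i-mu)' S^-1 (x_i-mu)
}
end

Sorting these distances and keeping the smallest 50% gives the subset on which $\mu^*$ and $\Sigma^*$ are re-estimated.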

In Stata, Hadi's method (hadimvo) is available to estimate a robust covariance matrix.

Unfortunately, it is not very robust. The reason for this is simple: it relies on a non-robust preliminary estimation of the covariance matrix.

1. Compute a variant of MD based on coordinatewise medians:
   $$MD(x_i) = \sqrt{(x_i - x_{\mathrm{MED}})^{\prime}\, S_{\mathrm{MED}}^{-1}\, (x_i - x_{\mathrm{MED}})}$$
2. Sort individuals according to MD. Use the subset with the first p+1 points to re-estimate $\mu$ and $\Sigma$.
3. Compute MD and sort the data.
4. Check if the first point out of the subset is an outlier. If not, add this point to the subset and repeat steps 3 and 4. Otherwise stop.


clear
set obs 1000
local b=sqrt(invchi2(5,0.95))               // chi2(5)-based cutoff for the distances
drawnorm x1-x5 e
replace x1=invnorm(uniform())+5 in 1/100    // shift the mean of x1 for 100 obs
mcd x*, outlier                             // Fast-MCD robust distances
gen RD=Robust_distance
hadimvo x*, gen(a b) p(0.5)                 // Hadi's distances
scatter RD b, xline(`b') yline(`b')

[Figure: Robust distance (Fast-MCD) plotted against Hadi distance (p=.5), with the chi2-based cutoff lines; Fast-MCD separates the contaminated observations while Hadi's distances do not]

. drawnorm x1-x3, corr(C)
. pca x1-x3

Principal components/correlation         Number of obs    =    1000
                                         Number of comp.  =       3
                                         Trace            =       3
    Rotation: (unrotated = principal)    Rho              =  1.0000

    Component    Eigenvalue   Difference   Proportion   Cumulative
    Comp1           2.26233      1.79061       0.7541       0.7541
    Comp2           .471721      .205771       0.1572       0.9114
    Comp3            .26595            .       0.0886       1.0000

Principal components (eigenvectors)

    Variable      Comp1     Comp2     Comp3   Unexplained
    x1           0.6021   -0.2227   -0.7667             0
    x2           0.5815   -0.5358    0.6123             0
    x3           0.5471    0.8145    0.1931             0

. replace x1=100 in 1/100
(100 real changes made)
. pca x1-x3

Principal components/correlation         Number of obs    =    1000
                                         Number of comp.  =       3
                                         Trace            =       3
    Rotation: (unrotated = principal)    Rho              =  1.0000

    Component    Eigenvalue   Difference   Proportion   Cumulative
    Comp1           1.51219      .511435       0.5041       0.5041
    Comp2           1.00075      .513695       0.3336       0.8376
    Comp3           .487058            .       0.1624       1.0000

Principal components (eigenvectors)

    Variable      Comp1     Comp2     Comp3   Unexplained
    x1          -0.0261    0.9986    0.0463             0
    x2           0.7064    0.0512   -0.7059             0
    x3           0.7073   -0.0143    0.7068             0

The structure is destroyed: the contaminated x1 now loads almost alone on one component, and the share of variance explained by the first component drops from 75% to 50%.

. mcd x*
The number of subsamples to check is 20
. pcamat covRMCD, n(1000)

Principal components/correlation         Number of obs    =    1000
                                         Number of comp.  =       3
                                         Trace            =       3
    Rotation: (unrotated = principal)    Rho              =  1.0000

    Component    Eigenvalue   Difference   Proportion   Cumulative
    Comp1           2.24708      1.77368       0.7490       0.7490
    Comp2           .473402      .193882       0.1578       0.9068
    Comp3            .27952            .       0.0932       1.0000

Principal components (eigenvectors)

    Variable      Comp1     Comp2     Comp3   Unexplained
    x1           0.6045   -0.0883   -0.7917             0
    x2           0.5701   -0.6462    0.5074             0
    x3           0.5564    0.7581    0.3402             0

Basing the PCA on the MCD covariance matrix (left behind by mcd as covRMCD) essentially recovers the eigenvalues and loadings of the uncontaminated data.

QUESTION: Can a single indicator accurately sum up research excellence?

GOAL: Determine the underlying factors measured by the variables used in the Shanghai ranking.

Method: principal component analysis.

Alumni: alumni recipients of the Nobel Prize or the Fields Medal;
Award: current faculty Nobel laureates and Fields Medal winners;
HiCi: highly cited researchers;
N&S: articles published in Nature and Science;
PUB: articles in the Science Citation Index-Expanded and the Social Science Citation Index.

. pca scoreonalumni scoreonaward scoreonhici scoreonns scoreonpub

Principal components/correlation         Number of obs    =     150
                                         Number of comp.  =       5
                                         Trace            =       5
    Rotation: (unrotated = principal)    Rho              =  1.0000

    Component    Eigenvalue   Difference   Proportion   Cumulative
    Comp1           3.40526      2.53266       0.6811       0.6811
    Comp2           .872601      .458157       0.1745       0.8556
    Comp3           .414444      .225411       0.0829       0.9385
    Comp4           .189033     .0703686       0.0378       0.9763
    Comp5           .118665            .       0.0237       1.0000

Principal components (eigenvectors)

    Variable         Comp1     Comp2     Comp3     Comp4     Comp5   Unexplained
    scoreonalu~i    0.4244   -0.4816    0.5697   -0.5129   -0.0155             0
    scoreonaward    0.4405   -0.5202   -0.1339    0.6991    0.1696             0
    scoreonhici     0.4829    0.2651   -0.4261   -0.3417    0.6310             0
    scoreonns       0.5008    0.1280   -0.3848   -0.1104   -0.7567             0
    scoreonpub      0.3767    0.6409    0.5726    0.3453    0.0161             0

The first component accounts for 68% of the inertia and is given by:

$$\phi_1 = 0.42\,\text{Al.} + 0.44\,\text{Aw.} + 0.48\,\text{HiCi} + 0.50\,\text{NS} + 0.38\,\text{PUB}$$

    Variable       Corr. ($\phi_1$, Xi)
    Alumni                 0.78
    Awards                 0.81
    HiCi                   0.89
    N&S                    0.92
    PUB                    0.70
    Total score            0.99
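The correlations in this table can be reproduced along these lines; a sketch, where totalscore is a hypothetical name for the ranking's overall-score variable:

. pca scoreonalumni scoreonaward scoreonhici scoreonns scoreonpub
. predict phi1, score          // score on the first component
. corr phi1 scoreonalumni scoreonaward scoreonhici scoreonns scoreonpub totalscore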

. mcd scoreonalumni scoreonaward scoreonhici scoreonns scoreonpub, raw
The number of subsamples to check is 20
. pcamat covMCD, n(150) corr

Principal components/correlation         Number of obs    =     150
                                         Number of comp.  =       5
                                         Trace            =       5
    Rotation: (unrotated = principal)    Rho              =  1.0000

    Component    Eigenvalue   Difference   Proportion   Cumulative
    Comp1           1.96803      .507974       0.3936       0.3936
    Comp2           1.46006      .624132       0.2920       0.6856
    Comp3           .835928      .426794       0.1672       0.8528
    Comp4           .409133     .0822867       0.0818       0.9346
    Comp5           .326847            .       0.0654       1.0000

Principal components (eigenvectors)

    Variable         Comp1     Comp2     Comp3     Comp4     Comp5   Unexplained
    scoreonalu~i   -0.4437    0.4991    0.2350    0.6946   -0.1277             0
    scoreonaward   -0.5128    0.4375   -0.0544   -0.5293    0.5123             0
    scoreonhici     0.5322    0.3220   -0.3983    0.3494    0.5765             0
    scoreonns       0.3178    0.6537   -0.1712   -0.3163   -0.5851             0
    scoreonpub      0.3948    0.1690    0.8682   -0.1233    0.2158             0

Two underlying factors are uncovered:
$\phi_1$ explains 38% of the inertia and $\phi_2$ explains 28% of the inertia.

    Variable       Corr. ($\phi_1$, ·)   Corr. ($\phi_2$, ·)
    Alumni                -0.05                 0.78
    Awards                -0.01                 0.83
    HiCi                   0.74                 0.88
    N&S                    0.63                 0.95
    PUB                    0.72                 0.63
    Total score            0.99                 0.47

Classical PCA can be heavily distorted by the presence of outliers.

A robustified version of PCA can be obtained either by relying on a robust covariance matrix or by removing multivariate outliers identified through a robust identification method (a sketch of the second route follows).
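A hedged sketch of the second route on the simulated data, reusing the Robust_distance variable that mcd, outlier generates (shown earlier); the chi-squared cutoff with p = 3 degrees of freedom mirrors the one used in the Fast-MCD vs. Hadi comparison and is my choice, not the talk's:

. mcd x1-x3, outlier
. pca x1-x3 if Robust_distance < sqrt(invchi2(3, 0.95))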
