(02)2705-5066
www.wunan.com.tw
R
R R R
R (R )
R
R
R R
(data mining)
(decision tree)
(regression tree) (classification)
R
2017 2
Chapter 01 1
Chapter 03 23
24
rpart ( ) 24
30
printcp ( ) 38
post ( ) 42
rpart ( ) 43
labels ( ) path.rpart ( ) 47
rsq.rpart ( ) R 49
50
{ggplot2} ggplot ( ) 57
67
74
Chapter 04 87
88
88
93
95
97
98
99
103
103
104
105
106
110
Chapter 05 115
rpart.plot ( ) 116
prp ( ) 118
119
122
122
123
125
132
140
{plotmo} plotmo ( ) 146
Chapter 06 151
152
153
153
163
165
169
170
175
179
Chapter 07 k 189
191
191
197
204
206
209
216
() 251
() 255
257
329
329
331
Chapter 11 341
lm ( ) 342
{Blossom} ols ( ) lad ( ) 348
ols ( ) 348
lad ( ) 349
{psych} setCor ( ) 357
{rms} ols ( ) 367
373
{GGally} 373
383
390
392
Chapter 12 403
404
404
{radiant} logistic ( ) 415
{rms} lrm ( ) 422
431
{rpart} rpart ( ) 431
{partykit} ctree ( ) 436
{evtree} evtree ( ) 438
{C50} C5.0 ( ) 441
447
{rpartScore} rpartScore ( ) 451
Chapter 13 457
458
{DiscriMiner} 458
{mda} fda ( ) 473
{MASS} lda ( ) 479
485
489
{rpart} rpart ( ) 489
{partykit} ctree ( ) 492
{evtree} evtree ( ) 494
{Rweka} J48 ( ) 496
{C50} C5.0 ( ) 497
Chapter 14 501
502
(loop) 505
514
518
521
t 539
548
559
579
603
608
613
Chapter 01
R
2 R
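Commonly used decision-tree algorithms include Chi-square Automatic Interaction Detection (CHAID), Classification and Regression Tree (CART, also labeled CRT), AID, FACT, QUEST, C4.5, and Ctree. SPSS provides CHAID, CRT (CART), and QUEST; in R, CART is available through the {rpart} package.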
(child node) (
01 3
) (impurity)
(root node) (child
node) () ()
(branching point)
CART (classification and regression tree)
(binary tree structured classifiers)
()
CART
(impurity measure)
()
()
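For a categorical target, node impurity is measured with the Gini index; for a continuous target, with least squares deviation, i.e. the sum of squares of deviation from the mean (SS). Each split is chosen by its improvement measure (the decrease in impurity) using a greedy algorithm. CART uses the Gini index (Gini Index), ID3 and C4.5 use information gain (Information Gain), and CHAID uses the chi-square test.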
Gini index. Suppose the cases in node D fall into n classes and p_j is the proportion of cases in class j:

$$\mathrm{Gini}(D) = 1 - \sum_{j=1}^{n} p_j^2$$

For a node D of 100 cases split between two classes:

$$50 \text{ and } 50:\quad \mathrm{Gini}(D) = 1 - (0.5^2 + 0.5^2) = 0.50$$
$$20 \text{ and } 80:\quad \mathrm{Gini}(D) = 1 - (0.2^2 + 0.8^2) = 0.32$$
$$10 \text{ and } 90:\quad \mathrm{Gini}(D) = 1 - (0.1^2 + 0.9^2) = 0.18$$

The purer the node, the smaller its Gini index. (ID3 measures node impurity with entropy instead.)

If a split s divides D into D1 and D2 with N1 and N2 cases (N = N1 + N2), the impurity after the split is the size-weighted average of the two child nodes:

$$\mathrm{Gini}_s(D) = \frac{N_1}{N}\,\mathrm{Gini}(D_1) + \frac{N_2}{N}\,\mathrm{Gini}(D_2)$$

Example with N = 16: D1 holds 6 cases (1 and 5 in the two classes) and D2 holds 10 cases (8 and 2):

$$\mathrm{Gini}(D_1) = 1 - \left((1/6)^2 + (5/6)^2\right) = 0.278$$
$$\mathrm{Gini}(D_2) = 1 - \left((8/10)^2 + (2/10)^2\right) = 0.320$$
$$\mathrm{Gini}_s(D) = \frac{6}{16}(0.278) + \frac{10}{16}(0.320) = 0.304$$

Prediction error of a regression model (Garcia & Filzmoser, 2015). For n cases with observed values y_i and predicted values \hat{y}_i, the residuals and their mean are

$$e_i = y_i - \hat{y}_i, \qquad \bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i$$

$$\mathrm{SEP} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (e_i - \bar{e})^2}$$

$$\mathrm{MSEP} = \frac{1}{n}\sum_{i=1}^{n} e_i^2, \qquad \mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$$

Robust alternatives to SEP use the interquartile range of the residuals, S_IQR (based on the quartiles Q_3 - Q_1), or the median absolute deviation of the residuals, S_MAD:

$$S_{\mathrm{MAD}} = \operatorname{median}_i \left| e_i - \operatorname{median}(e) \right|$$

The predicted residual error sum of squares (PRESS):

$$\mathrm{PRESS} = \sum_{i=1}^{n} e_i^2 = n \cdot \mathrm{MSEP}$$
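The Gini and prediction-error measures in this section can be checked numerically. The R sketch below is an illustration rather than code from the book: the helper names gini(), gini_split() and pred_errors() are hypothetical, and PRESS is computed from whatever residuals are passed in (for a genuine PRESS they should be prediction residuals, e.g. leave-one-out).

```r
# Gini index of a node from its class counts: Gini(D) = 1 - sum(p_j^2)
gini <- function(counts) {
  p <- counts / sum(counts)   # class proportions p_j
  1 - sum(p^2)
}

# Impurity after a binary split of D into D1 and D2:
# N1/N * Gini(D1) + N2/N * Gini(D2)
gini_split <- function(counts1, counts2) {
  n1 <- sum(counts1)
  n2 <- sum(counts2)
  n  <- n1 + n2
  (n1 / n) * gini(counts1) + (n2 / n) * gini(counts2)
}

gini(c(50, 50))                          # 0.5
gini(c(20, 80))                          # 0.32
gini(c(10, 90))                          # 0.18
round(gini(c(1, 5)), 3)                  # 0.278
round(gini(c(8, 2)), 3)                  # 0.32
round(gini_split(c(1, 5), c(8, 2)), 3)   # 0.304

# Prediction-error summaries for residuals e_i = y_i - yhat_i.
# For PRESS, e should be prediction residuals (e.g. leave-one-out).
pred_errors <- function(y, yhat) {
  e <- y - yhat
  n <- length(e)
  msep <- mean(e^2)
  c(bias  = mean(e),
    SEP   = sqrt(sum((e - mean(e))^2) / (n - 1)),  # equals sd(e)
    MSEP  = msep,
    RMSEP = sqrt(msep),
    PRESS = sum(e^2))                               # equals n * MSEP
}

# Toy data: SEP = 0.75, MSEP = 0.4375, PRESS = 1.75
pred_errors(y = c(3, 5, 7, 9), yhat = c(2.5, 5.5, 6, 9.5))
```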
CART
(training-and-testing) k (k-fold cross validation)
(training samples) (testing samples)
()
450 (n=300)
300 150
SEM
()
()
k k
k-1
k
100 5
(k1, k2, k3, k4) k5 k1, k2,
k3, k5 k4
(RN) (N)
(RN) (N)
0.00%
20 18
2 90.0% (=18/20) 10.0%
(=2/20)
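A minimal R sketch of k-fold assignment, assuming 100 cases and k = 5 as in the example above; the seed is arbitrary, and the model-fitting step is left as a comment because it depends on the learner being validated. The last two lines restate the 18/20 accuracy arithmetic.

```r
set.seed(2017)   # arbitrary seed, only for a reproducible split
n <- 100         # number of cases
k <- 5           # number of folds (k1, ..., k5)

# Assign every case to one of k folds of equal size (20 cases each)
fold <- sample(rep(1:k, length.out = n))
table(fold)

# Each fold serves once as the test set; the other k - 1 folds train the model
sizes <- sapply(1:k, function(i) {
  test_idx  <- which(fold == i)
  train_idx <- which(fold != i)
  # ... fit the model on train_idx and score it on test_idx here ...
  c(train = length(train_idx), test = length(test_idx))
})
sizes

# Accuracy of one test fold: 18 of 20 cases classified correctly
accuracy   <- 18 / 20       # 0.9
error_rate <- 1 - accuracy  # 0.1
```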
R
CART
(/)
1/2~2/3
1/3~1/2
(2015)
(2014)
Garcia, H., & Filzmoser, P. (2015). Multivariate statistical analysis using the R package chemometrics. R package chemometrics.
Chapter 02
{rpart}
rpart ( )
R
{rpart}
rpart ( )
{rpart} rpart ( )
formula () y ~ x1 + x2 + x3 ()
data
weights
subset ()
na.action
method "anova", "poisson", "class",
"exp" "anova" ()
"class" () (survival object) "exp"
()R
model
x y
parms () anova poisson
1
exponential poisson
1 0
gini
gini
control rpart rpart.control ( )
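A minimal sketch of an rpart() call that sets the arguments listed above, using the kyphosis data that ships with {rpart} as a stand-in for the book's data; the control values simply restate the defaults. It also shows how method is guessed when omitted.

```r
library(rpart)

# kyphosis ships with {rpart}; Kyphosis is a factor ("absent", "present")
fit_class <- rpart(Kyphosis ~ Age + Number + Start,
                   data    = kyphosis,
                   method  = "class",                  # classification tree
                   parms   = list(split = "gini"),     # default split index
                   control = rpart.control(minsplit = 20, cp = 0.01))
print(fit_class)

# With method omitted, rpart() guesses it from the response:
# a factor gives "class", a numeric vector gives "anova"
fit_guess <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit_anova <- rpart(Start ~ Age + Number, data = kyphosis)
c(fit_guess$method, fit_anova$method)   # "class" "anova"
```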
{rpart} rpart ( ) 02 11
cost
cost
cost
rpart.control ( )
rpart.control ( )
minsplit ()
20 20
minsplit
minbucket
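If only one of minsplit and minbucket is specified, the other is derived from it: minbucket = round(minsplit/3), so minsplit = 60 gives minbucket = 20; conversely, minsplit = minbucket × 3, so minbucket = 30 gives minsplit = 90.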
minsplit minbucket
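The minsplit/minbucket rule can be checked on the list that rpart.control() returns; a small sketch:

```r
library(rpart)

# Defaults: minsplit = 20, minbucket = round(20/3) = 7
ctrl_default <- rpart.control()
c(ctrl_default$minsplit, ctrl_default$minbucket)   # 20 7

# Only minsplit given: minbucket = round(minsplit / 3)
ctrl_a <- rpart.control(minsplit = 60)
ctrl_a$minbucket   # 20

# Only minbucket given: minsplit = minbucket * 3
ctrl_b <- rpart.control(minbucket = 30)
ctrl_b$minsplit    # 90
```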
cp (complexity parameter) = 0.01
cp
R cp
cp
cp
maxcompete
4
maxsurrogate (surrogate splits)
5 0
()
usesurrogate 0
0 1
2
xval 10
surrogatestyle 0
(surrogate variable)
1
maxdepth ()
0 30 30
parms (anova)
(Poisson)
1 (Exponential)
(
) () () (priors)
1
(splitting index) gini (information)
1 gini
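A sketch of the classification parms options (prior probabilities, a loss matrix, and the splitting index). The prior and loss values are illustrative assumptions, not recommendations, and kyphosis again stands in for the book's data.

```r
library(rpart)

# Illustrative misclassification costs: zeros on the diagonal,
# positive off-diagonal entries (rows/columns follow the factor levels)
loss_mat <- matrix(c(0, 1,
                     2, 0), nrow = 2, byrow = TRUE)

fit_info <- rpart(Kyphosis ~ Age + Number + Start,
                  data   = kyphosis,
                  method = "class",
                  parms  = list(prior = c(0.65, 0.35),     # must sum to 1
                                loss  = loss_mat,
                                split = "information"))   # entropy-based index
print(fit_info)
table(observed  = kyphosis$Kyphosis,
      predicted = predict(fit_info, type = "class"))
```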
print ( ) rpart
print ( )
object rpart ( )
minlength 0
1 spaces
2 cp
cp () digits
summary ( ) rpart
summary ( )
object rpart ( ) cp (
) cp
printcp ( ) rpart (cp )
printcp ( )
digits
digits = getOption("digits") 4
path.rpart ( ) (path)
path.rpart ( )
tree rpart
nodes ()
()
pretty 0
print.it
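A sketch of printcp() and path.rpart() on the same stand-in data (kyphosis). The seed is arbitrary and only makes the 10-fold cross-validated xerror column reproducible; picking the first two leaves is just a convenient choice of nodes.

```r
library(rpart)

set.seed(1)   # xval = 10 cross-validation assigns folds at random
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

printcp(fit)              # CP, nsplit, rel error, xerror, xstd for each subtree
cp_tab <- fit$cptable     # the same table as a matrix
cp_tab[which.min(cp_tab[, "xerror"]), ]   # row with the lowest xerror

# Node numbers are the row names of fit$frame; pick the first two leaves
leaf_ids <- as.numeric(rownames(fit$frame))[fit$frame$var == "<leaf>"]
paths <- path.rpart(fit, nodes = leaf_ids[1:2], print.it = TRUE)
```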
read.csv ( ) test0.csv
temp IQ, INVO, TACT, SCORE, RANKA, RANKB
() () () (
) A ( A ) B ( B )
A 1 A 2 B
3 C B 1
2
()
() () () A (
) B () as.factor ( ) RANKA, RANKB
() ifelse ( )
> temp=read.csv("test0.csv",header=T)
> tail(temp)
IQ INVO TACT SCORE RANKA RANKB
37 110 50 23 90 1 1
38 110 50 24 91 1 1
39 110 52 25 92 1 1
40 110 52 26 93 1 1
41 108 53 24 91 1 1
42 115 32 28 90 1 1
> temp$RANKA = as.factor (temp$RANKA)
> temp$RANKA = ifelse (temp$RANKA == 1, "A ", ifelse (temp$RANKA == 2, "B ", "C "))
> temp$RANKB = as.factor(temp$RANKB)
> temp$RANKB = ifelse(temp$RANKB == 1, "", "")
> names (temp) = c ("","","",""," A"," B")
R
D rdata/
D rdata
/
read.csv( ) read.xlsx( )
> temp=read.csv("d:/rdata/test0.csv",header=T)
\\/
> temp=read.csv("d:\\rdata\\test0.csv",header=T)
R accepts / or \\ as the path separator; a single \ starts an escape sequence, as the error below shows:
> temp=read.csv("d:\Rdata\test0.csv",header=T)
Error: '\R' is an unrecognized escape in character string starting ""d:\R"
[] "d:\R" R
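Backslash handling can be checked at the console. In this sketch file.path() (base R) builds the same path with /, and gsub() converts a \\-style path; the paths only mirror the document's d:/rdata example and do not need to exist.

```r
p_slash   <- "d:/rdata/test0.csv"
p_escaped <- "d:\\rdata\\test0.csv"   # "\\" is ONE backslash character

nchar("\\")            # 1
cat(p_escaped, "\n")   # prints d:\rdata\test0.csv

# file.path() joins its parts with "/", which R also accepts on Windows
file.path("d:", "rdata", "test0.csv")   # "d:/rdata/test0.csv"

# Convert a backslash path to forward slashes ("\\\\" is the regex for one "\")
gsub("\\\\", "/", p_escaped)             # "d:/rdata/test0.csv"
```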
read.csv ( )
*.xlsx {xlsx}
read.xlsx ( ) test0.xlsx
(NUM)
> library(xlsx)
Loading required package: rJava
Loading required package: xlsxjars
> edata = read.xlsx ("test0.xlsx", 1)
> tail (edata, 3)
NUM IQ INVO TACT SCORE RANKA RANKB
40 s40 110 52 26 93 1 1
41 s41 108 53 24 91 1 1
42 s42 115 32 28 90 1 1
head ( )
cor ( )
cor.test ( ) 0
p cor.test ( ) p $p.value
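A sketch of cor() and cor.test(), with R's built-in mtcars data standing in for test0.csv (wt and mpg are mtcars columns, not the book's variables). cor.test() returns an "htest" object, so individual results such as $p.value and $estimate can be extracted.

```r
x <- mtcars$wt    # car weight
y <- mtcars$mpg   # miles per gallon

cor(x, y)                   # Pearson correlation coefficient
ct <- cor.test(x, y)        # tests H0: population correlation = 0
ct                          # full report: t, df, p-value, 95% CI
ct$p.value                  # p value only
round(cor(mtcars[, c("mpg", "wt", "hp")]), 2)   # correlation matrix
```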
{wCorr} weightedCorr ( )
(tetrachoric correlation)
(polyserial correlation) weightedCorr ( )
x, y, weights
1 () ML
weightedCorr ( )
()
(error sum of squares; [ESS])
A ()
aov ( )
B ()
aov ( )
( B
t )
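A sketch of the aov() and t-test steps using R's built-in PlantGrowth data as a stand-in (group plays the role of the grouping variable; this is not the book's data). The residual sum of squares extracted here is the error sum of squares (ESS) that appears in the ANOVA table.

```r
fit_aov <- aov(weight ~ group, data = PlantGrowth)   # group: ctrl, trt1, trt2
summary(fit_aov)                                      # one-way ANOVA table with F test

ess <- sum(residuals(fit_aov)^2)   # error (within-group) sum of squares
ess

# Two groups only: independent-samples t test (pooled variance)
two <- droplevels(subset(PlantGrowth, group %in% c("trt1", "trt2")))
tt  <- t.test(weight ~ group, data = two, var.equal = TRUE)
tt
```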
F
0
{ggplot2} geom_boxplot ( )
ggplot ( ) ( A) (
) geom_boxplot ( ) colour, fill
geom_jitter ( )
B
coord_flip ( )
Chapter 03
R
(Regression Tree)
Gini (log-likelihood function)
method = "anova": $SST - (SS_L + SS_R)$, where $SST = \sum_{i} (y_i - \bar{y})^2$
$SS_L$, $SS_R$
()
y ()
(ynew - y)
rpart ( )
library ( ) {rpart} rpart ( )
method = anova
n = 20, cp = 0.01
( 7) minsplit = 20
minbucket = 7
regt1
print ( ) rpart ( )
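The anova splitting criterion SST - (SS_L + SS_R) can be verified on a one-split tree. In this sketch mtcars (mpg ~ wt) stands in for the chapter's data. The brute-force loop is only an illustration that assumes rpart's default minbucket = 7; it is not how rpart searches internally. For method = "anova", the dev column of fit$frame holds each node's sum of squares.

```r
library(rpart)

# mtcars stands in for the chapter's data; one predictor keeps it readable
stump <- rpart(mpg ~ wt, data = mtcars, method = "anova",
               control = rpart.control(maxdepth = 1))   # root split only

y    <- mtcars$mpg
sst  <- sum((y - mean(y))^2)                  # SST at the root node
leaf <- stump$frame$var == "<leaf>"
ss_children <- sum(stump$frame$dev[leaf])     # SS_L + SS_R
gain <- sst - ss_children                     # SST - (SS_L + SS_R)

# Brute-force check over cut points of wt, both children >= 7 cases
vals <- sort(unique(mtcars$wt))
best <- 0
for (i in seq_len(length(vals) - 1)) {
  thr   <- (vals[i] + vals[i + 1]) / 2
  left  <- y[mtcars$wt <  thr]
  right <- y[mtcars$wt >= thr]
  if (length(left) < 7 || length(right) < 7) next
  g <- sst - (sum((left - mean(left))^2) + sum((right - mean(right))^2))
  best <- max(best, g)
}
c(rpart = gain, brute_force = best)
```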
.
R,
.,2008.08
; . , 2017.05
.
ISBN 978-957-11-5229-5 ()
1.2.3.
ISBN 978-957-11-9149-2
489.67 97009174
1. 2. 3.
512.4 106005096
1MCH
1H0G
R
275
1063394
(02)2705-5066(02)2706-6100
http://www.wunan.com.tw
1063394
wunanwunan.com.tw
(02)2705-5066(02)2706-6100
0 1 0 6 8 9 5 3
http://www.wunan.com.tw
wunan@wunan.com.tw
/6
0 1068953
(04)2223-0891(04)2223-3549
/290
(07)2358-702(07)2350-236
20175
760
2 0 0 8 8
3 5 0