
www.wunan.com.tw
(02)2705-5066



(data mining)
(decision tree)
(regression tree) (classification)




February 2017


Chapter 01

Chapter 02 {rpart} rpart ( )
rpart ( )
rpart.control ( )

Chapter 03
rpart ( )
printcp ( )
post ( )
rpart ( )
labels ( ) path.rpart ( )
rsq.rpart ( ) R
{ggplot2} ggplot ( )

Chapter 04

Chapter 05
rpart.plot ( )
prp ( )
{plotmo} plotmo ( )

Chapter 06

Chapter 07 k

Chapter 08 {evtree}
evtree ( )
{evtree} predict ( )

Chapter 09 {partykit}
ctree ( )
plot ( )
lmtree ( ) glmtree ( )
lmtree ( )
glmtree ( )
I
II
predict ( )

Chapter 10 {RWeka} {tree}
{RWeka} J48 ( )
J48 ( )
{tree} tree ( )

Chapter 11
lm ( )
{Blossom} ols ( ) lad ( )
ols ( )
lad ( )
{psych} setCor ( )
{rms} ols ( )
{GGally}

Chapter 12
{radiant} logistic ( )
{rms} lrm ( )
{rpart} rpart ( )
{partykit} ctree ( )
{evtree} evtree ( )
{C50} C5.0 ( )
{rpartScore} rpartScore ( )

Chapter 13
{DiscriMiner}
{mda} fda ( )
{MASS} lda ( )
{rpart} rpart ( )
{partykit} ctree ( )
{evtree} evtree ( )
{RWeka} J48 ( )
{C50} C5.0 ( )

Chapter 14
(loop)
t

Chapter 15 RStudio
RStudio
Chapter 01


(classification) (data mining)
(logistic)
(decision tree)
(root node)
(leaf node) (terminal node)
(decision tree)
(regression tree) (classification tree)
(response variable)
(explanatory variables)
(continuous-response variable)
(binary response variable)
(dummy variable)
(Chi-square Automatic Interaction Detection; [CHAID]) (Classification and Regression Tree; [CART]/[CRT]) AID FACT QUEST C4.5 Ctree
SPSS CHAID CRT (CART) QUEST
R CART

(child node) (impurity)
(root node) (child node)
(branching point)
CART (classification and regression tree)
(binary tree structured classifiers)
CART
(impurity measure)
Gini (least squares deviation) (sum of squares of deviations from the mean; [SS])
(improvement measure)
(greedy algorithm)
Gini (Gini Index) ID3 C4.5 (Information Gain) CHAID
Gini
A split s divides the data set D into two subsets (D1, D2) containing N1 and N2 observations. For a data set D with n classes, the Gini index is

\[ \mathrm{Gini}(D) = 1 - \sum_{j=1}^{n} p_j^2 \]

where p_j is the relative frequency of class j in D.

For a node of 100 observations divided between two classes:

50 and 50: \[ \mathrm{Gini} = 1 - \sum_{j=1}^{n} p_j^2 = 1 - (0.5^2 + 0.5^2) = 0.50 \]

20 and 80: \[ \mathrm{Gini} = 1 - \sum_{j=1}^{n} p_j^2 = 1 - (0.2^2 + 0.8^2) = 0.32 \]

10 and 90: \[ \mathrm{Gini} = 1 - \sum_{j=1}^{n} p_j^2 = 1 - (0.1^2 + 0.9^2) = 0.18 \]

ID3 (entropy)

After the split, the Gini index of D is the weighted sum of the Gini indices of the two subsets:

\[ \mathrm{Gini}(D) = \frac{N_1}{N}\,\mathrm{Gini}(D_1) + \frac{N_2}{N}\,\mathrm{Gini}(D_2) \]

(2014) With N = 16 observations split (< 40) into D1 (N1 = 6) and D2 (N2 = 10):

\[ \mathrm{Gini}(D_1) = 1 - \sum_{j=1}^{n} p_j^2 = 1 - \left( (1/6)^2 + (5/6)^2 \right) = 0.278 \]

\[ \mathrm{Gini}(D_2) = 1 - \sum_{j=1}^{n} p_j^2 = 1 - \left( (8/10)^2 + (2/10)^2 \right) = 0.320 \]

\[ \mathrm{Gini}(D) = \frac{N_1}{N}\,\mathrm{Gini}(D_1) + \frac{N_2}{N}\,\mathrm{Gini}(D_2) = \frac{6}{16} \times 0.278 + \frac{10}{16} \times 0.320 = 0.304 \]
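The Gini calculations in this passage can be checked numerically. The following is a minimal sketch in base R; the helper names gini ( ) and gini_split ( ) are hypothetical and are not functions of {rpart}.

```r
# Gini index of one node, computed from its class counts (hypothetical helper)
gini <- function(counts) {
  p <- counts / sum(counts)   # relative frequency p_j of each class
  1 - sum(p^2)
}

# Weighted Gini index of a binary split into D1 and D2 (hypothetical helper)
gini_split <- function(counts1, counts2) {
  n1 <- sum(counts1)
  n2 <- sum(counts2)
  (n1 * gini(counts1) + n2 * gini(counts2)) / (n1 + n2)
}

gini(c(50, 50))   # 0.50
gini(c(20, 80))   # 0.32
gini(c(10, 90))   # 0.18

# D1 holds 1 and 5 observations of the two classes, D2 holds 8 and 2
round(gini(c(1, 5)), 3)                    # 0.278
round(gini(c(8, 2)), 3)                    # 0.32
round(gini_split(c(1, 5), c(8, 2)), 3)     # 0.304
```

A lower value means a purer node, so a split that yields the smallest weighted Gini index is preferred.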
The prediction error (residual) of observation i is the difference between the observed value and the predicted value (2015):

\[ e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n \]

(standard error of prediction; [SEP])

\[ \mathrm{SEP} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (e_i - \bar{e})^2} \]

(interquartile range)

\[ s_{IQR} = Q_3 - Q_1 \]

\[ s_{MAD} = \mathrm{median}_i \left| e_i - \mathrm{median}_j (e_j) \right| \]

(the mean squared error of prediction; [MSEP])

\[ \mathrm{MSEP} = \frac{1}{n} \sum_{i=1}^{n} e_i^2 \]

\[ \mathrm{SEP}^2 = \mathrm{MSEP} - \bar{e}^2 \]

\[ \mathrm{RMSEP} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2} \]

(predicted residual error sum of squares; [PRESS])

\[ \mathrm{PRESS} = \sum_{i=1}^{n} e_i^2 = n \cdot \mathrm{MSEP} \]

(Garcia & Filzmoser, 2015)
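The prediction-error measures (SEP, MSEP, RMSEP, PRESS) can be computed directly from a vector of residuals. The sketch below uses base R only; the observed and predicted values are made-up numbers for illustration, not data from the book.

```r
y    <- c(78, 76, 90, 91, 92, 93)   # hypothetical observed values
yhat <- c(80, 75, 88, 92, 90, 95)   # hypothetical predicted values

e <- y - yhat                       # residuals e_i = y_i - yhat_i
n <- length(e)

sep   <- sqrt(sum((e - mean(e))^2) / (n - 1))  # standard error of prediction
msep  <- mean(e^2)                             # mean squared error of prediction
rmsep <- sqrt(msep)                            # root MSEP
press <- sum(e^2)                              # predicted residual error sum of squares

# Robust spread measures of the residuals
s_iqr <- unname(quantile(e, 0.75) - quantile(e, 0.25))
s_mad <- median(abs(e - median(e)))

c(SEP = sep, MSEP = msep, RMSEP = rmsep, PRESS = press)
```

SEP as written here equals sd (e), and PRESS equals n times MSEP. The relation SEP^2 = MSEP - mean(e)^2 holds exactly only when the variance is divided by n rather than n - 1, so with small samples the two sides differ slightly.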


(pruning) R
(cp)


R


CART

CART
(training-and-testing) k (k-fold cross validation)

(training samples) (testing samples)



With 450 observations, for example, 300 (n = 300) can serve as training samples and the remaining 150 as testing samples.
SEM

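In k-fold cross validation the observations are divided into k folds. The tree is built on k - 1 folds and tested on the remaining fold, and the procedure is repeated k times. With 100 observations and 5 folds, the first round builds the tree on folds k1, k2, k3, and k4 and tests it on k5; the next round builds it on k1, k2, k3, and k5 and tests it on k4; and so on.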
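Both validation designs can be sketched in a few lines of base R. The sample sizes (450 split into 300 and 150; 100 observations in 5 folds) come from the text; the seed and the variable names are assumptions made for illustration.

```r
set.seed(123)   # arbitrary seed, used only to make the sketch reproducible

# Training-and-testing: 450 observations, 300 for training, 150 for testing
n <- 450
train_idx <- sample(n, 300)
test_idx  <- setdiff(seq_len(n), train_idx)
length(train_idx)   # 300
length(test_idx)    # 150

# k-fold cross validation: 100 observations assigned at random to 5 folds
k <- 5
fold <- sample(rep(seq_len(k), length.out = 100))
table(fold)         # 20 observations per fold

for (i in seq_len(k)) {
  test_rows  <- which(fold == i)   # the held-out fold
  train_rows <- which(fold != i)   # the other k - 1 folds
  # a model would be fitted on train_rows and evaluated on test_rows here
}
```

Each observation is used for testing exactly once, which is why k-fold cross validation is attractive when the sample is too small to set aside a separate testing sample.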

(RN) (N)
(RN) (N)


0.00%

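For example, if 18 of 20 observations are classified correctly and 2 are misclassified, the correct classification rate is 90.0% (= 18/20) and the misclassification rate is 10.0% (= 2/20).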
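The same rates can be read off a cross-classification table of actual and predicted classes. This is a small sketch with invented labels: 20 observations, of which 2 are deliberately misclassified.

```r
actual    <- factor(c(rep("A", 10), rep("B", 10)))   # hypothetical true classes
predicted <- actual
predicted[c(3, 15)] <- c("B", "A")                   # two hypothetical misclassifications

tab <- table(actual, predicted)   # rows: actual class, columns: predicted class
tab

accuracy   <- sum(diag(tab)) / sum(tab)   # correctly classified / total
error_rate <- 1 - accuracy

round(100 * accuracy, 1)     # 90
round(100 * error_rate, 1)   # 10
```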
R


CART

(/)




1/2~2/3

1/3~1/2

(2015)
(2014)
Garcia, H., & Filzmoser, P. (2015). Multivariate statistical analysis using the R package chemometrics. R package chemometrics.

Chapter 02
{rpart}
rpart ( )


{rpart}

rpart ( )

{rpart} rpart ( )

rpart (formula, data, weights, subset, na.action = na.rpart, method, model = FALSE, x = FALSE, y = TRUE, parms, control, cost)

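formula: the model formula, written as y ~ x1 + x2 + x3 (without interaction terms).
data: the data frame that contains the variables named in the formula.
weights: optional case weights.
subset: an optional expression that selects the subset of rows (observations) used in the fit.
na.action: how missing values are handled; the default is na.rpart.
method: one of "anova", "poisson", "class", or "exp". Use "anova" for a continuous response (regression tree), "class" for a factor response (classification tree), and "exp" for a survival object. If method is omitted, R chooses a method according to the type of the response variable.
model: whether a copy of the model frame is kept in the result.
x, y: whether a copy of the x matrix and of the response variable is kept in the result.
parms: optional parameters for the splitting function. The anova method has no parameters. For poisson splitting the single parameter is the coefficient of variation of the prior distribution on the rates, with default 1; the exponential method takes the same parameter as poisson. For classification splitting, parms is a list that can hold the prior probabilities (positive and summing to 1), the loss matrix (with 0 on the diagonal), and the splitting index, either gini or information; the default is gini.
control: a list of options that control the rpart algorithm; see rpart.control ( ).
cost: a vector of non-negative costs, one for each variable in the model; the default is 1 for all variables. The costs act as scalings when splits are considered: the improvement from a split on a variable is divided by the cost of that variable.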
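As an illustration of these arguments, the sketch below fits one regression tree and one classification tree. It uses the built-in mtcars and iris data sets rather than the book's test0.csv, so the variable names are stand-ins chosen for this example.

```r
library(rpart)

# Regression tree: continuous response, method = "anova"
reg_fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova")

# Classification tree: factor response, method = "class",
# with the splitting index set explicitly through parms
cls_fit <- rpart(Species ~ ., data = iris, method = "class",
                 parms = list(split = "gini"))

reg_fit$method   # "anova"
cls_fit$method   # "class"

# Keep only a subset of rows with the subset argument
sub_fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
                 subset = cyl > 4)
sub_fit$frame$n[1]   # number of observations in the root node of the subset fit
```

Stating method explicitly is a cautious habit: it prevents R from guessing the tree type from the class of the response.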

rpart.control ( )

rpart.control ( )

rpart.control (minsplit = 20, minbucket = round (minsplit/3), cp = 0.01, maxcompete = 4, maxsurrogate = 5, usesurrogate = 2, xval = 10, surrogatestyle = 0, maxdepth = 30)

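minsplit: the minimum number of observations that a node must contain before a split is attempted. The default is 20, so a node with fewer than 20 observations is not split further.
minbucket: the minimum number of observations in any leaf node. If only one of minsplit and minbucket is specified, the other is derived from it: minbucket = round (minsplit/3). With minsplit = 60, minbucket is 20; with minbucket set to 30, minsplit becomes 90, so a node needs at least 90 observations to be split and every leaf node holds at least 30.
cp: the complexity parameter (complexity parameter), default 0.01. A split that does not improve the overall fit by at least a factor of cp is not attempted. A smaller cp gives a larger tree and a larger cp gives a smaller tree.
maxcompete: the number of competitor splits retained in the output; the default is 4.
maxsurrogate: the number of surrogate splits (surrogate splits) retained in the output; the default is 5. Setting it to 0 shortens the computation time.
usesurrogate: how surrogates are used in the splitting process. With 0, surrogates are displayed only, and an observation with a missing value on the primary split variable is not sent further down the tree. With 1, surrogates are used in order, and an observation that is missing on all of them is not split. With 2, the default, an observation that is missing on all surrogates is sent in the majority direction.
xval: the number of cross validations; the default is 10.
surrogatestyle: how the best surrogate is selected. With 0, the default, the total number of correct classifications for a potential surrogate variable (surrogate variable) is used; with 1, the percentage correct over the non-missing values of the surrogate is used.
maxdepth: the maximum depth of any node of the final tree, with the root node counted as depth 0. The default is 30, and values above 30 should not be used.
parms: the (anova) method has no optional parameters. For the (Poisson) method the parameter is the coefficient of variation of the prior distribution on the rates, with default 1, and the (Exponential) method takes the same parameter. For classification, the list can hold the prior probabilities (priors), which must sum to 1, the loss matrix, and the splitting index (splitting index), which is gini or (information); the default is gini.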
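The link between minsplit and minbucket can be verified by inspecting the list that rpart.control ( ) returns. The sketch below also passes a control list to rpart ( ); the mtcars data and the particular settings are assumptions chosen for illustration.

```r
library(rpart)

# Only minsplit given: minbucket is derived as round(minsplit / 3)
rpart.control(minsplit = 60)$minbucket    # 20

# Only minbucket given: minsplit is derived as minbucket * 3
rpart.control(minbucket = 30)$minsplit    # 90

# Defaults
ctrl0 <- rpart.control()
c(ctrl0$minsplit, ctrl0$minbucket, ctrl0$cp, ctrl0$maxdepth)   # 20, 7, 0.01, 30

# A looser control list lets a small data set grow a larger tree
ctrl <- rpart.control(minsplit = 10, cp = 0.001, maxdepth = 5, xval = 10)
fit  <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
              control = ctrl)
fit$control$minbucket   # 3, that is round(10 / 3)
```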

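print ( ) prints an rpart object. The syntax of print ( ) is:

print (object, minlength = 0, spaces = 2, cp, digits = getOption("digits"))

object: an object fitted by rpart ( ).
minlength: the minimum length for abbreviating the labels of factor levels. The default 0 means no abbreviation; 1 abbreviates each level to a single letter.
spaces: the number of spaces by which nodes of increasing depth are indented; the default is 2.
cp: nodes with a complexity parameter smaller than cp are pruned from the printout.
digits: the number of digits to print.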

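summary ( ) returns a detailed listing of a fitted rpart object. The syntax of summary ( ) is:

summary (object, cp = 0, digits = getOption ("digits"))

object: an object fitted by rpart ( ).
digits: the number of digits to print.
cp: nodes with a complexity parameter smaller than cp are trimmed from the listing.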
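prune ( ) prunes a fitted tree. The syntax of prune ( ) is:

prune (tree, cp)

tree: an object fitted by rpart ( ).
cp: the complexity parameter (cp) to which the tree is trimmed.

printcp ( ) displays the complexity parameter (cp) table of a fitted rpart object. The syntax of printcp ( ) is:

printcp (object, digits = getOption ("digits") - 2)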

digits
digits = getOption("digits") 4
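A common way to combine printcp ( ) and prune ( ) is to grow a large tree, read the cp table, and prune back to the cp with the lowest cross-validated error. The sketch below follows that pattern on the built-in mtcars data; the data set, the seed, and the control settings are assumptions for illustration.

```r
library(rpart)

set.seed(1)   # cross-validated errors depend on random fold assignment
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.001))

printcp(fit)  # columns: CP, nsplit, rel error, xerror, xstd

# Pick the cp value with the smallest cross-validated error (xerror)
cp_table <- fit$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

pruned <- prune(fit, cp = best_cp)

# The pruned tree never has more nodes than the full tree
c(full = nrow(fit$frame), pruned = nrow(pruned$frame))
```

Selecting cp by minimum xerror is one convention; the one-standard-error rule, which picks the simplest tree within one xstd of the minimum, is a frequently used alternative.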

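path.rpart ( ) returns the path (path) from the root node to the selected nodes. The syntax of path.rpart ( ) is:

path.rpart (tree, nodes, pretty = 0, print.it = TRUE)

tree: a fitted rpart object.
nodes: an integer vector with the numbers of the nodes whose paths are wanted. If it is omitted, the nodes are selected interactively by clicking on the plotted tree.
pretty: an integer that controls the abbreviation of factor levels; the default 0 means no abbreviation.
print.it: a logical value that indicates whether the paths are printed.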
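The sketch below lists the splitting rules that lead to each leaf node of a classification tree. It uses the built-in iris data, which is an assumption for illustration; the leaf node numbers are taken from the row names of the frame component of the fitted object.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# Node numbers are the row names of fit$frame; leaves are marked "<leaf>"
leaf_nodes <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])
leaf_nodes

# One character vector of rules per requested node, returned as a list
paths <- path.rpart(fit, nodes = leaf_nodes, print.it = FALSE)
names(paths)    # node numbers as names
paths[[1]]      # rules from the root node to the first leaf
```

With print.it = TRUE the same paths are written to the console as they are computed.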



read.csv ( ) imports the data file test0.csv into the data frame temp. The variables are IQ, INVO, TACT, SCORE, RANKA, and RANKB. RANKA is coded 1, 2, and 3 for grades A, B, and C, and RANKB is coded 1 and 2. as.factor ( ) converts RANKA and RANKB to factors, and ifelse ( ) replaces the numeric codes with labels.

> temp=read.csv("test0.csv",header=T)
> tail(temp)
    IQ INVO TACT SCORE RANKA RANKB
37 110   50   23    90     1     1
38 110   50   24    91     1     1
39 110   52   25    92     1     1
40 110   52   26    93     1     1
41 108   53   24    91     1     1
42 115   32   28    90     1     1
> temp$RANKA = as.factor (temp$RANKA)
> temp$RANKA = ifelse (temp$RANKA == 1, "A", ifelse (temp$RANKA == 2, "B", "C"))
> temp$RANKB = as.factor (temp$RANKB)
> # the two RANKB labels were lost in extraction; "level 1" and "level 2" are placeholders
> temp$RANKB = ifelse (temp$RANKB == 1, "level 1", "level 2")
> # the Chinese variable names were lost in extraction; the English names are kept
> names (temp) = c ("IQ","INVO","TACT","SCORE","RANKA","RANKB")
> tail (temp)
    IQ INVO TACT SCORE RANKA   RANKB
37 110   50   23    90     A level 1
38 110   50   24    91     A level 1
39 110   52   25    92     A level 1
40 110   52   26    93     A level 1
41 108   53   24    91     A level 1
42 115   32   28    90     A level 1

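When the data file is stored in the rdata folder on drive D, the folder is included in the file name, with a forward slash (/) between drive, folder, and file, in read.csv ( ) or read.xlsx ( ):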

> temp=read.csv("d:/rdata/test0.csv",header=T)

A double backslash (\\) can be used in place of the forward slash (/):


> temp=read.csv("d:\\rdata\\test0.csv",header=T)

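R accepts / or \\ as the separator in a path. A single backslash (\) produces an error:

> temp=read.csv("d:\Rdata\test0.csv",header=T)
Error: '\R' is an unrecognized escape in character string starting ""d:\R"
[Note] R cannot interpret the string "d:\R" because \R is not a valid escape sequence.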

If the folder or file named in read.csv ( ) does not exist, R reports an error:

> temp = read.csv ("d:/R01/test0.csv",header = T)
Error in file (file, "rt") : cannot open the connection
In addition: Warning message:
In file (file, "rt") :
  cannot open file 'd:/R01/test0.csv': No such file or directory
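Both path errors can be avoided by building the path with file.path ( ) and checking that the file exists before reading it. This is a minimal sketch; the folder d:/rdata and the file test0.csv are the names used in the text, and the object names are hypothetical.

```r
data_dir <- "d:/rdata"                         # folder named in the text
csv_path <- file.path(data_dir, "test0.csv")   # joins the parts with "/"
csv_path                                       # "d:/rdata/test0.csv"

if (file.exists(csv_path)) {
  temp <- read.csv(csv_path, header = TRUE)
  loaded <- TRUE
} else {
  # No error is raised; the problem is reported and the script continues
  message("File not found: ", csv_path)
  loaded <- FALSE
}
```

Because file.path ( ) always inserts a forward slash, no backslash ever appears in the string and the escape error cannot occur.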

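To import an Excel file (*.xlsx), use read.xlsx ( ) from the {xlsx} package. The file test0.xlsx contains one additional variable, the case number (NUM).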

> library(xlsx)
Loading required package: rJava
Loading required package: xlsxjars
> edata = read.xlsx ("test0.xlsx", 1)
> tail (edata, 3)
NUM IQ INVO TACT SCORE RANKA RANKB
40 s40 110 52 26 93 1 1
41 s41 108 53 24 91 1 1
42 s42 115 32 28 90 1 1

> edata = read.xlsx ("test0.xlsx", 1)



> edata = read.xlsx ("d:/rdata/test0.xlsx",1)


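names ( ) assigns the variable names and head ( ) lists the first observations:

> # the Chinese variable names were lost in extraction; the English names are kept
> names (edata) = c ("NUM","IQ","INVO","TACT","SCORE","RANKA","RANKB")
> head (edata,2)
  NUM  IQ INVO TACT SCORE RANKA RANKB
1 s01 130   17   10    78     2     2
2 s02 131   18   11    76     2     2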
2 s02 131 18 11 76 2 2

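cor ( ) computes the correlation matrix of the four quantitative variables:

> round (cor (temp [,1:4]),3)
          IQ   INVO  TACT SCORE
IQ     1.000 -0.013 0.206 0.311
INVO  -0.013  1.000 0.582 0.813
TACT   0.206  0.582 1.000 0.779
SCORE  0.311  0.813 0.779 1.000
[Note] The correlations of IQ, INVO, and TACT with SCORE are 0.311, 0.813, and 0.779.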

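cor.test ( ) tests whether a correlation coefficient differs significantly from 0. The p-value is stored in the $p.value element of the object that cor.test ( ) returns:

> round (cor.test (~ IQ + SCORE, data = temp) $p.value,3)
[1] 0.045
> round (cor.test (~ INVO + SCORE, data = temp) $p.value,3)
[1] 0
> round (cor.test (~ TACT + SCORE, data = temp) $p.value,3)
[1] 0
[Note] The three p-values are 0.045, 0.000, and 0.000 (p < .05), so all three correlation coefficients differ significantly from 0.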

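The weightedCorr ( ) function in the {wCorr} package computes correlation coefficients, including the tetrachoric correlation (tetrachoric correlation) and the polyserial correlation (polyserial correlation). The syntax of weightedCorr ( ) is:

weightedCorr (x, y, method = c("Pearson", "Spearman", "Polyserial", "Polychoric"), weights = rep(1, length (x)), ML = FALSE)

x and y are the two variables to be correlated. weights holds the case weights; the default of 1 for every observation gives the unweighted coefficient. ML indicates whether full maximum likelihood estimation is used.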


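weightedCorr ( ) gives the Pearson correlations of the three explanatory variables with SCORE:

> library (wCorr)
> round (weightedCorr (temp [,1], temp [,4], method = "Pearson"), 3)
[1] 0.311
> round (weightedCorr (temp [,2], temp [,4], method = "Pearson"), 3)
[1] 0.813
> round (weightedCorr (temp [,3], temp [,4], method = "Pearson"), 3)
[1] 0.779
[Note] The three correlation coefficients are 0.311, 0.813, and 0.779, the same values that cor ( ) returns.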


()
(error sum of squares; [ESS])

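Differences among the three RANKA groups (A, B, C) are tested with one-way analysis of variance, using aov ( ):

> # the response names were lost in extraction; IQ, INVO, and TACT are inferred from the column order
> summary (aov (IQ ~ RANKA, data = temp))
            Df Sum Sq Mean Sq F value Pr(>F)
RANKA        2  345.3  172.67   4.533  0.017 *
Residuals   39 1485.6   38.09
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary (aov (INVO ~ RANKA, data = temp))
            Df Sum Sq Mean Sq F value   Pr(>F)
RANKA        2   5918    2959   55.79 3.62e-12 ***
Residuals   39   2068      53
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary (aov (TACT ~ RANKA, data = temp))
            Df Sum Sq Mean Sq F value   Pr(>F)
RANKA        2   6382    3191   28.48 2.38e-08 ***
Residuals   39   4370     112
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Note] The F values for the differences among the RANKA groups are significant on all three variables (p < .05).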

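Differences between the two RANKB groups are tested in the same way with aov ( ) (RANKB has only two levels, so a t test could be used instead):

> # the response names were lost in extraction; IQ, INVO, and TACT are inferred from the column order
> summary (aov (IQ ~ RANKB, data = temp))
            Df Sum Sq Mean Sq F value Pr(>F)
RANKB        1     25   25.02   0.554  0.461
Residuals   40   1806   45.15
> summary (aov (INVO ~ RANKB, data = temp))
            Df Sum Sq Mean Sq F value   Pr(>F)
RANKB        1   4841    4841   61.55 1.28e-09 ***
Residuals   40   3146      79
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary (aov (TACT ~ RANKB, data = temp))
            Df Sum Sq Mean Sq F value  Pr(>F)
RANKB        1   4939    4939   33.99 8.2e-07 ***
Residuals   40   5813     145
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Note] Between the two RANKB groups, the F value is not significant in the first analysis (F = 0.554, p = .461) and is significant in the other two (p < .05).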

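The geom_boxplot ( ) function in the {ggplot2} package draws box plots. In ggplot ( ) the x axis is the grouping variable (RANKA) and the y axis is the quantitative variable. In geom_boxplot ( ), colour sets the outline colour and fill sets the fill colour; geom_jitter ( ) adds the jittered data points.

> library (ggplot2)
> # the y variable was lost in extraction; INVO is a placeholder
> p.m = ggplot (data = temp, aes (x = RANKA, y = INVO))
> p.m + geom_boxplot (colour = "blue", fill = "gray") + geom_jitter (width = 0.4)

[Figure: box plots by RANKA group]

For the RANKB groups, coord_flip ( ) draws the box plots horizontally:

> # the y variable was lost in extraction; INVO is a placeholder
> p.m = ggplot (data = temp,aes (x = RANKB, y = INVO))
> p.m + geom_boxplot (colour = "blue", fill = "green", size = 1) + geom_jitter (size = 2,shape = 1) + coord_flip ()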

Chapter 03


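(Regression Tree)
Classification trees split on the Gini index or on the log-likelihood function (log-likelihood function). With method = "anova", the splitting criterion of a regression tree is \( SS_T - (SS_L + SS_R) \), where \( SS_T = \sum (y_i - \bar{y})^2 \) is the sum of squares of the node and \( SS_L \) and \( SS_R \) are the sums of squares of the left and right child nodes.

Each node is summarized by the mean of y in the node, and the prediction error for a new observation assigned to the node is \( (y_{new} - \bar{y}) \).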
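The anova splitting criterion can be worked through by hand on a small example. The sketch below evaluates one candidate split in base R; the six response and predictor values are invented for illustration, and ss ( ) is a hypothetical helper.

```r
# Sum of squared deviations from the node mean (hypothetical helper)
ss <- function(y) sum((y - mean(y))^2)

y <- c(76, 78, 90, 91, 92, 93)   # hypothetical response values
x <- c(17, 18, 50, 50, 52, 52)   # hypothetical predictor values

left <- x < 30                   # candidate split: x < 30 goes left

ss_t <- ss(y)                    # sum of squares of the parent node
ss_l <- ss(y[left])              # left child:  76, 78 (mean 77)
ss_r <- ss(y[!left])             # right child: 90 to 93 (mean 91.5)

improvement <- ss_t - (ss_l + ss_r)
c(SST = ss_t, SSL = ss_l, SSR = ss_r, improvement = improvement)

# Fitted value of each child node is its mean
c(left = mean(y[left]), right = mean(y[!left]))
```

rpart ( ) evaluates this quantity for every candidate cut point of every predictor and keeps the split with the largest improvement.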

rpart ( )
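After library ( ) loads the {rpart} package, rpart ( ) builds the regression tree with method = "anova". The default settings apply: a node needs at least n = 20 observations to be split (minsplit = 20), cp = 0.01, and each leaf node holds at least 7 observations (minbucket = 7):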

> round (20/3,0)


[1] 7

The fitted regression tree object is named regt1:

> library (rpart)
> regt1 = rpart (SCORE ~ IQ + INVO + TACT, data = temp, method = "anova")

print ( ) displays the object fitted by rpart ( ):

> print (regt1)


n = 42
node), split, n, deviance, yval
* denotes terminal node
[Note] The number of valid observations is N = 42.

2008.08; 2017.05
ISBN 978-957-11-5229-5
ISBN 978-957-11-9149-2
489.67 97009174
512.4 106005096
1MCH
1H0G
(02)2705-5066 (02)2706-6100
http://www.wunan.com.tw
wunan@wunan.com.tw
01068953
(04)2223-0891 (04)2223-3549
(07)2358-702 (07)2350-236