
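Suppose $X_1, X_2, \dots, X_B$ are i.d. (identically distributed but not necessarily independent), each with variance $\sigma^2$ and pairwise correlation $\rho > 0$. The variance of the average is given by

$$
\begin{aligned}
\operatorname{Var}\Big(\frac{1}{B}\sum_{i=1}^{B} X_i\Big)
&= \frac{1}{B^2}\Big(\sum_{i=1}^{B}\operatorname{Var}(X_i) + \sum_{i \neq j}\operatorname{Cov}(X_i, X_j)\Big) \\
&= \frac{1}{B^2}\big(B\sigma^2 + B(B-1)\rho\sigma^2\big) \\
&= \frac{\sigma^2}{B} + \frac{B-1}{B}\rho\sigma^2 \\
&= \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2 \;\longrightarrow\; \rho\sigma^2 \quad \text{as } B \to \infty.
\end{aligned}
$$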
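A quick numerical check of the result $\operatorname{Var}(\bar X) = \rho\sigma^2 + (1-\rho)\sigma^2/B$. This is a minimal sketch that assumes normally distributed $X_i$ built from one shared component plus independent noise, a construction chosen here only because it yields exactly correlation $\rho$; the function names are hypothetical.

```python
import random
import statistics


def variance_of_average(sigma2: float, rho: float, B: int) -> float:
    """Closed form: rho * sigma^2 + (1 - rho) * sigma^2 / B."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / B


def simulate_variance_of_average(sigma2: float, rho: float, B: int,
                                 n_trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of Var(mean(X_1, ..., X_B)).

    Each X_i = sqrt(rho) * Z + sqrt(1 - rho) * E_i with Z, E_i independent
    N(0, sigma^2), so Var(X_i) = sigma^2 and Corr(X_i, X_j) = rho for i != j.
    """
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    averages = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, sd)  # the common component Z
        xs = [rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, sd)
              for _ in range(B)]
        averages.append(sum(xs) / B)
    return statistics.variance(averages)


if __name__ == "__main__":
    # Adding more terms only shrinks the second term; the floor rho * sigma^2 remains.
    for B in (1, 10, 100):
        print(B, variance_of_average(1.0, 0.3, B),
              round(simulate_variance_of_average(1.0, 0.3, B, n_trials=5000), 3))
```

With $\rho = 0.3$ and $\sigma^2 = 1$ the simulated variance should track the closed form and level off near 0.3 rather than going to zero.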

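Pruned Tree for Training Data

Training Error: 0.28% (N = 5000)   Test Error: 0.32% (N = 3124)

Each node shows the predicted class and the counts of (Edible, Poison) training cases it contains:

    Root: Edible (2577, 2423)
      odor = alm, ans, non: Edible (2577, 72)
        spore.print.color = blc, brw, bff, chc, orn, prp, wht, yll: Edible (2577, 30)
          stalk.color.below.ring = brw, gry, orn, pnk, red, wht: Edible (2577, 14)
          stalk.color.below.ring = yll: Poison (0, 16)
        spore.print.color = grn: Poison (0, 42)
      odor = crs, fsh, fol, mst, png, spc: Poison (0, 2351)

Confusion matrices (rows: actual class, columns: predicted class):

    > train.cm
            Predicted
    Actual   Edible Poison
      Edible   2577      0
      Poison     14   2409
    > test.cm
            Predicted
    Actual   Edible Poison
      Edible   1631      0
      Poison     10   1483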
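The error rates quoted for the pruned tree follow directly from the confusion matrices. A minimal sketch, with the matrices hard-coded from the slide as dictionaries keyed by (actual, predicted); the helper names are hypothetical. The second helper isolates the costly mistake for this data set, poisonous mushrooms labeled edible.

```python
# Confusion matrices from the slide, keyed by (actual, predicted).
train_cm = {("Edible", "Edible"): 2577, ("Edible", "Poison"): 0,
            ("Poison", "Edible"): 14, ("Poison", "Poison"): 2409}
test_cm = {("Edible", "Edible"): 1631, ("Edible", "Poison"): 0,
           ("Poison", "Edible"): 10, ("Poison", "Poison"): 1483}


def error_rate(cm):
    """Fraction of all cases whose predicted class differs from the actual one."""
    total = sum(cm.values())
    wrong = sum(n for (actual, predicted), n in cm.items() if actual != predicted)
    return wrong / total


def poison_called_edible(cm):
    """Fraction of actually poisonous mushrooms that were predicted edible."""
    poison_total = cm[("Poison", "Edible")] + cm[("Poison", "Poison")]
    return cm[("Poison", "Edible")] / poison_total


for name, cm in [("train", train_cm), ("test", test_cm)]:
    print(name, f"{error_rate(cm):.2%}", f"{poison_called_edible(cm):.2%}")
```

The overall error prints as 0.28% (14 of 5000) for training and 0.32% (10 of 3124) for test, matching the figures above; every error is a poisonous mushroom predicted edible.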

Figure: Variable Importance (panels: OOB and Gini). Variables in each panel, from most to least important:

OOB: odor, spore.print.color, gill.color, habitat, population, stalk.root, gill.size, cap.color, ring.type, ring.number, stalk.surface.above.ring, gill.spacing, stalk.shape, stalk.surface.below.ring, stalk.color.below.ring, cap.surface, cap.shape, stalk.color.above.ring, bruises, veil.color, gill.attachment, veil.type

Gini: odor, spore.print.color, gill.color, gill.size, stalk.surface.below.ring, ring.type, stalk.surface.above.ring, population, stalk.root, habitat, bruises, gill.spacing, stalk.color.above.ring, cap.color, stalk.color.below.ring, ring.number, stalk.shape, cap.surface, cap.shape, veil.color, gill.attachment, veil.type
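To make the Gini panel concrete: in the R randomForest package, Gini importance is the total decrease in node impurity (measured by the Gini index) from splits on a variable, averaged over all trees. The sketch below computes the decrease for a single split, the root split on odor from the pruned tree, using its (Edible, Poison) counts. The function names are hypothetical, and the exact weighting a given implementation uses when summing these decreases may differ from this per-split, proportion-weighted version.

```python
def gini(counts):
    """Gini impurity 1 - sum_k p_k^2 of a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_decrease(parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted


# Root split of the pruned tree, as (Edible, Poison) training counts.
root = (2577, 2423)
left = (2577, 72)    # odor in {alm, ans, non}
right = (0, 2351)    # all other odors
print(round(gini_decrease(root, [left, right]), 4))
```

This prints 0.4715: a nearly balanced root (impurity about 0.4995) is split into one almost-pure node and one pure node, which is consistent with odor topping both importance rankings.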

Figure: Partial Dependence on odor (almond, anise, creosote, fishy, foul, musty, none, pungent, spicy), on gill.color (black, brown, buff, chocolate, gray, green, orange, pink, purple, red, white, yellow), on spore.print.color (black, brown, buff, chocolate, green, orange, purple, white, yellow) and on gill.size (broad, narrow); the vertical axes run from -10 to 10.
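Partial dependence on a categorical variable can be computed by forcing every observation to take each level in turn and averaging the model's output. A minimal sketch: `toy_score` is a hypothetical stand-in for a fitted model's score (for a classifier this would typically be a log-odds-type quantity), and the function names are assumptions, not the randomForest API.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

Row = Dict[str, str]


def partial_dependence(predict: Callable[[Row], float], rows: Sequence[Row],
                       feature: str, levels: Sequence[str]) -> Dict[str, float]:
    """For each level v, overwrite `feature` with v in every row (other
    features keep their observed values) and average the predictions."""
    return {v: mean(predict({**row, feature: v}) for row in rows) for v in levels}


def toy_score(row: Row) -> float:
    """Hypothetical model: favors 'edible' for almond/anise/no odor and broad gills."""
    score = 2.0 if row["odor"] in {"almond", "anise", "none"} else -2.0
    if row["gill.size"] == "broad":
        score += 0.5
    return score


rows = [{"odor": "foul", "gill.size": "broad"},
        {"odor": "none", "gill.size": "narrow"}]
print(partial_dependence(toy_score, rows, "odor", ["almond", "foul"]))
```

Here the result is `{'almond': 2.25, 'foul': -1.75}`: each value averages the score over both rows with odor overwritten, so the gill.size effect is averaged out rather than held fixed.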

Figure: lm diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours); observations 266, 124 and 399 are labeled as the most extreme points.

    Call:
    lm(formula = y ~ ., data = train)

    Residuals:
        Min      1Q  Median      3Q     Max
    -38.439  -6.281  -0.157   6.077  43.840

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  6.90011    0.38307  18.013  < 2e-16 ***
    x1           1.01472    0.19203   5.284 1.69e-07 ***
    x2           0.33665    0.19040   1.768   0.0775 .
    x3          -2.03025    0.19615 -10.350  < 2e-16 ***
    x4          -0.04505    0.19248  -0.234   0.8150
    x5           0.00369    0.19045   0.019   0.9845
    x6          -0.04939    0.18944  -0.261   0.7944
    x7           0.20111    0.19208   1.047   0.2955
    x8           0.14179    0.19606   0.723   0.4698
    x9          -0.32560    0.19544  -1.666   0.0962 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 10.06 on 690 degrees of freedom
    Multiple R-squared:  0.1747,  Adjusted R-squared:  0.1639
    F-statistic: 16.23 on 9 and 690 DF,  p-value: < 2.2e-16
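The derived statistics in this summary can be recomputed from its own numbers as a consistency check. A small sketch using the standard formulas for adjusted $R^2$ and the overall F statistic; $n = 700$ is inferred from 690 residual degrees of freedom plus 10 estimated coefficients, and the function names are hypothetical. Because the printed $R^2$ is rounded, agreement is only to the printed precision.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2: float, n: int, p: int) -> float:
    """F statistic for H0: all p slope coefficients are zero."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


n, p, r2 = 700, 9, 0.1747
print(round(adjusted_r2(r2, n, p), 4), round(overall_f(r2, n, p), 2))

# A t value is the estimate divided by its standard error (x3 row).
print(round(-2.03025 / 0.19615, 2))
```

This prints `0.1639 16.23` and `-10.35`, matching the adjusted R-squared, the F-statistic and the t value for x3 in the output above.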

Figure: random forest variable importance for x1 to x9, measured by %IncMSE and by IncNodePurity; x3, x2 and x1 rank highest under both measures.
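In the R randomForest package, %IncMSE is computed by permuting each predictor in the out-of-bag data and recording how much the MSE rises (averaged over trees and normalized), while IncNodePurity is the total decrease in residual sum of squares from splits on that variable. The sketch below shows only the permutation idea, for a single fixed model on a held-out set rather than per-tree OOB samples; the function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def mse(y: Sequence[float], yhat: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def permutation_importance(predict: Callable[[List[float]], float],
                           X: List[List[float]], y: List[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average increase in MSE when each column of X is randomly shuffled."""
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - base)
        importances.append(sum(increases) / n_repeats)
    return importances
```

For a model that ignores a column, shuffling it leaves every prediction unchanged, so its importance is exactly zero; columns the model relies on show a positive increase.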

Figure: scatter plots of y against x1, x2, x3 and x4.

Figure: Partial Dependence on x1, x2, x3 and x4.

Figure: panels for x1, x2, x3 and x4 showing functions of x (panel labels include sin x and log x).

Figure: MSE versus Number of Trees (0 to 500), with legend Random Forest: $R^2 = 0.72$; Good LM: $R^2 = 0.93$; Bad LM: $R^2 = 0.16$.
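The MSE axis and the quoted $R^2$ values are linked by one common definition, $R^2 = 1 - \text{SSE}/\text{SST}$, which equals 1 minus the MSE divided by the mean squared deviation of $y$. The slides do not say whether their $R^2$ values come from test data or out-of-bag predictions, so this sketch only illustrates the definition; the function name is hypothetical.

```python
def r_squared(y, yhat):
    """R^2 = 1 - SSE / SST: the share of variation in y the predictions explain."""
    ybar = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1.0 - sse / sst


y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, [1.5, 2.0, 3.0, 3.5]))  # SSE = 0.5, SST = 5
print(r_squared(y, [2.5] * 4))             # predicting the mean gives 0
```

A lower MSE at the same spread of $y$ therefore means a higher $R^2$, which is why the curves and the legend values tell the same story.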

Some Stuff
Intuitively, reducing $\rho$ will reduce $\rho\sigma^2$ and therefore reduce the variance of the average.
Regression: $\hat f^{B}_{\mathrm{rf}}(x) = \dfrac{1}{B}\sum_{b=1}^{B} T(x;\Theta_b)$, where $\Theta_b$ characterizes the $b$-th random forest tree.
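The averaging formula can be illustrated with a deliberately tiny forest. This is a sketch under simplifying assumptions, not a full random forest: each "tree" is a depth-1 regression stump (real random forests grow deep trees), $\Theta_b$ is taken to be the bootstrap sample together with the random subset of `mtry` candidate features, and all names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

# A "tree" here is a regression stump: (feature index, threshold, left value, right value).
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], y: List[float], features: Sequence[int]) -> Stump:
    """Best single split (by residual sum of squares) among the candidate features."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            rss = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or rss < best[0]:
                best = (rss, j, t, ml, mr)
    if best is None:  # no usable split: predict the sample mean everywhere
        m = mean(y)
        return (features[0], float("inf"), m, m)
    _, j, t, ml, mr = best
    return (j, t, ml, mr)


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_forest(X: List[List[float]], y: List[float], B: int = 50,
               mtry: int = 1, seed: int = 0) -> List[Stump]:
    """Grow B stumps, each on a bootstrap sample with mtry random candidate features.

    A stump has a single split node, so drawing the feature subset once per
    tree is the same as drawing it at each node.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows sampled with replacement
        features = rng.sample(range(p), mtry)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return trees


def predict_forest(trees: List[Stump], row: Sequence[float]) -> float:
    """f_rf(x) = (1/B) * sum over b of T(x; Theta_b)."""
    return mean(predict_stump(tree, row) for tree in trees)


if __name__ == "__main__":
    X = [[i - 20, (i * 7) % 5] for i in range(40)]  # column 0 informative, column 1 noise
    y = [3.0 if row[0] > 0 else -3.0 for row in X]
    forest = fit_forest(X, y, B=50, mtry=1)
    print(predict_forest(forest, [19, 2]), predict_forest(forest, [-20, 2]))
```

With `mtry=1` roughly half the stumps can only split on the noise column, which pulls the averaged prediction toward zero; setting `mtry` equal to the number of features reduces this sketch to bagging.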
