http://lib.stat.cmu.edu/DASL/Datafiles/USCrime.html
Datafile Name: US Crime
Datafile Subjects: Social science
Story Names: US Crime
Reference: Vandaele, W. (1978) Participation in illegitimate activities: Erlich
revisited. In Deterrence and Incapacitation, Blumstein, A., Cohen, J. and Nagin, D.,
eds., Washington, DC: National Academy of Sciences, 270-335. Methods: Study, New
York: Chapman & Hall, 11. Also found in: Hand, D.J., et al. (1994) A Handbook of
Small Data Sets, London: Chapman & Hall, 101-103.
Authorization: Contact author
Description: These data are crime-related and demographic statistics for 47 US
states in 1960. The data were collected from the FBI's Uniform Crime Report and
other government agencies to determine how the crime rate depends on the other
variables measured in the study.
Number of cases: 47
Variable names:
R: crime rate: number of offenses reported to police per million population
Age: number of males aged 14-24 per 1000 population
S: indicator variable for Southern states (0 = no, 1 = yes)
Ed: mean number of years of schooling x 10 for persons aged 25 or older
Ex0: 1960 per capita expenditure on police by state and local government
Ex1: 1959 per capita expenditure on police by state and local government
LF: labor force participation rate per 1000 civilian urban males aged 14-24
M: number of males per 1000 females
N: state population size in hundreds of thousands
NW: number of non-whites per 1000 population
U1: unemployment rate of urban males aged 14-24, per 1000
U2: unemployment rate of urban males aged 35-39, per 1000
W: median value of transferable goods and assets or family income, in tens of $
X: number of families per 1000 earning below one half of the median income
Y: crime rate (the response variable, R)
> RyM<-read.table(file.choose(),header=TRUE)
> RyM
       R Age S  Ed Ex0 Ex1  LF    M   N  NW  U1 U2   W   X
1   79.1 151 1  91  58  56 510  950  33 301 108 41 394 261
2  163.5 143 0 113 103  95 583 1012  13 102  96 36 557 194
3   57.8 142 1  89  45  44 533  969  18 219  94 33 318 250
4  196.9 136 0 121 149 141 577  994 157  80 102 39 673 167
5  123.4 141 0 121 109 101 591  985  18  30  91 20 578 174
6   68.2 121 0 110 118 115 547  964  25  44  84 29 689 126
7   96.3 127 1 111  82  79 519  982   4 139  97 38 620 168
8  155.5 131 1 109 115 109 542  969  50 179  79 35 472 206
9   85.6 157 1  90  65  62 553  955  39 286  81 28 421 239
10  70.5 140 0 118  71  68 632 1029   7  15 100 24 526 174
11 167.4 124 0 105 121 116 580  966 101 106  77 35 657 170
12  84.9 134 0 108  75  71 595  972  47  59  83 31 580 172
13  51.1 128 0 113  67  60 624  972  28  10  77 25 507 206
14  66.4 135 0 117  62  61 595  986  22  46  77 27 529 190
15  79.8 152 1  87  57  53 530  986  30  72  92 43 405 264
16  94.6 142 1  88  81  77 497  956  33 321 116 47 427 247
17  53.9 143 0 110  66  63 537  977  10   6 114 35 487 166
18  92.9 135 1 104 123 115 537  978  31 170  89 34 631 165
19  75.0 130 0 116 128 128 536  934  51  24  78 34 627 135
20 122.5 125 0 108 113 105 567  985  78  94 130 58 626 166
21  74.2 126 0 108  74  67 602  984  34  12 102 33 557 195
22  43.9 157 1  89  47  44 512  962  22 423  97 34 288 276
23 121.6 132 0  96  87  83 564  953  43  92  83 32 513 227
24  96.8 131 0 116  78  73 574 1038   7  36 142 42 540 176
25  52.3 130 0 116  63  57 641  984  14  26  70 21 486 196
26 199.3 131 0 121 160 143 631 1071   3  77 102 41 674 152
27  34.2 135 0 109  69  71 540  965   6   4  80 22 564 139
28 121.6 152 0 112  82  76 571 1018  10  79 103 28 537 215
29 104.3 119 0 107 166 157 521  938 168  89  92 36 637 154
30  69.6 166 1  89  58  54 521  973  46 254  72 26 396 237
31  37.3 140 0  93  55  54 535 1045   6  20 135 40 453 200
32  75.4 125 0 109  90  81 586  964  97  82 105 43 617 163
33 107.2 147 1 104  63  64 560  972  23  95  76 24 462 233
34  92.3 126 0 118  97  97 542  990  18  21 102 35 589 166
35  65.3 123 0 102  97  87 526  948 113  76 124 50 572 158
36 127.2 150 0 100 109  98 531  964   9  24  87 38 559 153
37  83.1 177 1  87  58  56 638  974  24 349  76 28 382 254
38  56.6 133 0 104  51  47 599 1024   7  40  99 27 425 225
39  82.6 149 1  88  61  54 515  953  36 165  86 35 395 251
40 115.1 145 1 104  82  74 560  981  96 126  88 31 488 228
41  88.0 148 0 122  72  66 601  998   9  19  84 20 590 144
42  54.2 141 0 109  56  54 523  968   4   2 107 37 489 170
43  82.3 162 1  99  75  70 522  996  40 208  73 27 496 224
44 103.0 136 0 121  95  96 574 1012  29  36 111 37 622 162
45  45.5 139 1  88  46  41 480  968  19  49 135 53 457 249
46  50.8 126 0 104 106  97 599  989  40  24  78 25 593 171
47  84.9 130 0 121  90  91 623 1049   3  22 113 40 588 160

> RyM$S<-factor(RyM$S)
> levels(RyM$S)
[1] "0" "1"
> levels(RyM$S)<-c("no","si")
> table(RyM$S)

no si
31 16
# I don't use na.omit(RyM) to remove missing data, since my data are complete
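A quick way to back up the claim that the data are complete is to count the missing values in each column before deciding whether na.omit() is needed. This is a minimal sketch: `count_missing()` and the three-row data frame `toy` are hypothetical stand-ins (in the session you would pass RyM).

```r
# Count missing values per column; count_missing() is a hypothetical helper.
count_missing <- function(df) {
  colSums(is.na(df))
}

# Hypothetical stand-in for RyM: the first three rows of R and Ex1
toy <- data.frame(R = c(79.1, 163.5, 57.8), Ex1 = c(56, 95, 44))
miss <- count_missing(toy)
print(miss)

# Only drop rows when something is actually missing
if (any(miss > 0)) {
  toy <- na.omit(toy)
}
```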
> summary(RyM)
Var       Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
R        34.20    65.85    83.10    90.51   105.75   199.30
Age      119.0    130.0    136.0    138.6    146.0    177.0
Ed        87.0     97.5    108.0    105.6    114.5    122.0
Ex0       45.0     62.5     78.0     85.0    104.5    166.0
Ex1      41.00    58.50    73.00    80.23    97.00   157.00
LF       480.0    530.5    560.0    561.2    593.0    641.0
M        934.0    964.5    977.0    983.0    992.0   1071.0
N         3.00    10.00    25.00    36.62    41.50   168.00
NW         2.0     24.0     76.0    101.1    132.5    423.0
U1       70.00    80.50    92.00    95.47   104.00   142.00
U2       20.00    27.50    34.00    33.98    38.50    58.00
W        288.0    459.5    537.0    525.4    591.5    689.0
X        126.0    165.5    176.0    194.0    227.5    276.0
S: no:31  si:16
> modelo<-lm(R~Ex1,RyM)
> modelo

Call:
lm(formula = R ~ Ex1, data = RyM)

Coefficients:
(Intercept)          Ex1
    16.5164       0.9222

> cat("Y=",modelo$coefficients[1],"+",modelo$coefficients[2],"X","\n\n")
Y= 16.51642 + 0.9222031 X
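The two coefficients printed by cat() can be checked by hand: in simple linear regression the least-squares slope is S_xy / S_xx and the intercept makes the line pass through the point of means. A sketch, assuming the R and Ex1 columns of the table above are typed in as plain vectors (`R_rate` is a name chosen here to avoid confusion with the language name):

```r
# R (crime rate) and Ex1 (1959 police expenditure) for the 47 states, in table order
R_rate <- c(79.1, 163.5, 57.8, 196.9, 123.4, 68.2, 96.3, 155.5, 85.6, 70.5,
            167.4, 84.9, 51.1, 66.4, 79.8, 94.6, 53.9, 92.9, 75.0, 122.5,
            74.2, 43.9, 121.6, 96.8, 52.3, 199.3, 34.2, 121.6, 104.3, 69.6,
            37.3, 75.4, 107.2, 92.3, 65.3, 127.2, 83.1, 56.6, 82.6, 115.1,
            88.0, 54.2, 82.3, 103.0, 45.5, 50.8, 84.9)
Ex1 <- c(56, 95, 44, 141, 101, 115, 79, 109, 62, 68,
         116, 71, 60, 61, 53, 77, 63, 115, 128, 105,
         67, 44, 83, 73, 57, 143, 71, 76, 157, 54,
         54, 81, 64, 97, 87, 98, 56, 47, 54, 74,
         66, 54, 70, 96, 41, 97, 91)

# Closed-form least-squares estimates
b1 <- cov(Ex1, R_rate) / var(Ex1)    # slope = S_xy / S_xx
b0 <- mean(R_rate) - b1 * mean(Ex1)  # line passes through (mean(Ex1), mean(R))

fit <- lm(R_rate ~ Ex1)
cat("Y=", b0, "+", b1, "X\n")
all.equal(unname(coef(fit)), c(b0, b1))  # TRUE: same estimates as lm()
```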
> summary(modelo)

Call:
lm(formula = R ~ Ex1, data = RyM)

Residuals:
    Min      1Q  Median      3Q     Max
-59.558 -15.676   1.229  14.674  59.374

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  16.5164    13.0427   1.266    0.212
Ex1           0.9222     0.1537   6.001 3.11e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] 261 194 250 167 174 126 168 206 239 174 170 172 206 190 264 247 166 165 135
[20] 166 195 276 227 176 196 152 139 215 154 237 200 163 233 166 158 153 254 225
[39] 251 228 144 170 224 162 249 171 160
Histogram for variable R
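> RyM$R
 [1]  79.1 163.5  57.8 196.9 123.4  68.2  96.3 155.5  85.6  70.5 167.4  84.9
[13]  51.1  66.4  79.8  94.6  53.9  92.9  75.0 122.5  74.2  43.9 121.6  96.8
[25]  52.3 199.3  34.2 121.6 104.3  69.6  37.3  75.4 107.2  92.3  65.3 127.2
[37]  83.1  56.6  82.6 115.1  88.0  54.2  82.3 103.0  45.5  50.8  84.9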
> n<-length(RyM$R)
> n
[1] 47
# Sturges' rule for the number of classes: k = 1 + 3.3*log10(n)
> k<-1+3.3*log10(47)
> k
[1] 6.517923
> round(k)
[1] 7
# Range: A = Xmax - Xmin
> A<-max(RyM$R)-min(RyM$R)
> A
[1] 165.1
# Class width: tic = A/round(k)
> tic<-A/round(k)
> tic
[1] 23.58571
# The data have one decimal place, so we round the class width to one decimal
> tic<-23.6
# First class: LI1 = min(RyM$R), LS1 = LI1 + tic
> LI1<-min(RyM$R)
> LI1
[1] 34.2
> LS1<-LI1+tic
> LS1
[1] 57.8
# Second class: LI2 = LI1 + tic (= LS1), LS2 = LS1 + tic
> LI2<-LI1+tic
> LI2
[1] 57.8
> LS2<- LS1+tic
> LS2
[1] 81.4
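The manual steps (Sturges' k, range, class width, class limits) can be wrapped in one function and compared with the graph.freq() tables. This is a minimal sketch: `sturges_classes()` is a hypothetical helper, not part of agricolae. It rounds the width *up* to one decimal (23.58571 becomes 23.6, as in the session) so that k classes always cover the whole range, and it counts with right-closed classes, which is the convention that reproduces the frequencies reported for R and Ex0.

```r
# Hypothetical helper: Sturges classes with the width rounded up to `digits` decimals
sturges_classes <- function(x, digits = 1) {
  k <- round(1 + 3.3 * log10(length(x)))           # number of classes
  A <- max(x) - min(x)                              # range
  width <- ceiling(A / k * 10^digits) / 10^digits   # class width, rounded up
  breaks <- round(min(x) + width * (0:k), digits)   # class limits
  fi <- as.vector(table(cut(x, breaks, right = TRUE, include.lowest = TRUE)))
  list(k = k, width = width, breaks = breaks, fi = fi)
}

R_rate <- c(79.1, 163.5, 57.8, 196.9, 123.4, 68.2, 96.3, 155.5, 85.6, 70.5,
            167.4, 84.9, 51.1, 66.4, 79.8, 94.6, 53.9, 92.9, 75.0, 122.5,
            74.2, 43.9, 121.6, 96.8, 52.3, 199.3, 34.2, 121.6, 104.3, 69.6,
            37.3, 75.4, 107.2, 92.3, 65.3, 127.2, 83.1, 56.6, 82.6, 115.1,
            88.0, 54.2, 82.3, 103.0, 45.5, 50.8, 84.9)
Ex0 <- c(58, 103, 45, 149, 109, 118, 82, 115, 65, 71,
         121, 75, 67, 62, 57, 81, 66, 123, 128, 113,
         74, 47, 87, 78, 63, 160, 69, 82, 166, 58,
         55, 90, 63, 97, 97, 109, 58, 51, 61, 82,
         72, 56, 75, 95, 46, 106, 90)

r_cls <- sturges_classes(R_rate)
r_cls$breaks              # 34.2 57.8 81.4 105.0 128.6 152.2 175.8 199.4
r_cls$fi                  # 11 10 14 7 0 3 2
sturges_classes(Ex0)$fi   # 12 12 8 7 5 0 3
```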
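# graph.freq(), normal.freq() and polygon.freq() come from the agricolae package: library(agricolae)
> h<-graph.freq(RyM$R,frequency=3)
> summary(h)
   Inf   Sup    MC fi        fri Fi       Fri
  34.2  57.8  46.0 11 0.23404255 11 0.2340426
  57.8  81.4  69.6 10 0.21276596 21 0.4468085
  81.4 105.0  93.2 14 0.29787234 35 0.7446809
 105.0 128.6 116.8  7 0.14893617 42 0.8936170
 128.6 152.2 140.4  0 0.00000000 42 0.8936170
 152.2 175.8 164.0  3 0.06382979 45 0.9574468
 175.8 199.4 187.6  2 0.04255319 47 1.0000000
# Inf/Sup: lower/upper class limit; MC: class mark (midpoint); fi, fri: absolute and
# relative frequency; Fi, Fri: cumulative absolute and cumulative relative frequency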
[Figure: histogram of RyM$R produced by graph.freq (density scale)]
> normal.freq(h,col="blue",frequency=3)
[Figure: the same histogram of RyM$R with a fitted normal curve]
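> qqnorm(RyM$R)
> shapiro.test(RyM$R)

        Shapiro-Wilk normality test

data:  RyM$R
W = 0.9127, p-value = 0.001882

# p-value < 0.05: normality of R is rejected at the 5% level
[Figure: normal Q-Q plot of RyM$R (Sample Quantiles against Theoretical Quantiles)]
[Stem-and-leaf display of RyM$R; its command and the stems below 6 were lost in extraction]
   6 | 568014559
   8 | 0233556823567
  10 | 3475
  12 | 22337
  14 | 6
  16 | 47
  18 | 79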
> par(mfrow=c(1,3))
> hist(RyM$R)
> plot(density(RyM$R,na.rm=TRUE))
> plot(sort(RyM$R),pch=".")
[Figure: three panels: "Histogram of RyM$R" (Frequency against RyM$R), the density estimate of RyM$R, and sort(RyM$R) plotted against Index]
# Coefficient of variation
# Defining the function cv
> cv<-function(x){sd(x)/abs(mean(x))*100}
> cv(RyM$R)
[1] 42.73219
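The cv() function implements the usual definition of the coefficient of variation in percent. Plugging in the mean of R from the summary (90.51) and the standard deviation implied by the result (about 38.68):

```latex
\mathrm{CV} = 100\,\frac{s}{|\bar{x}|}, \qquad
\mathrm{CV}_R \approx 100 \times \frac{38.68}{90.51} \approx 42.7\%
```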
# Outliers for variable R
Varanalizar<-RyM$R
outliers<-boxplot(Varanalizar,plot=FALSE)$out
boxplot(Varanalizar,col="blue")
# label each outlier with its observation number, to the right of the point (box at x = 1)
for(i in seq_along(outliers))
{
text(1,outliers[i],paste(which(Varanalizar==outliers[i]),collapse=","),cex=.8,pos=4)}
[Figure: boxplot of R; the outlying observations 4, 11 and 26 are labelled]
[Figure: boxplot of Ex1; the outlying observation 29 is labelled]
Outlier observation in Ex1: 29
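boxplot() flags a point as an outlier when it lies more than 1.5 times the hinge spread beyond the lower or upper hinge, with the hinges taken from fivenum(). A sketch of that rule returning observation numbers directly, which avoids the which(...==...) lookup in the loop; `tukey_outliers()` is a hypothetical helper, and the vectors are the R and Ex1 columns from the table.

```r
# Hypothetical helper reproducing the outlier rule used by boxplot()/boxplot.stats()
tukey_outliers <- function(x, coef = 1.5) {
  hinges <- fivenum(x)[c(2, 4)]                       # lower and upper hinges
  step <- coef * diff(hinges)
  which(x < hinges[1] - step | x > hinges[2] + step)  # observation numbers
}

R_rate <- c(79.1, 163.5, 57.8, 196.9, 123.4, 68.2, 96.3, 155.5, 85.6, 70.5,
            167.4, 84.9, 51.1, 66.4, 79.8, 94.6, 53.9, 92.9, 75.0, 122.5,
            74.2, 43.9, 121.6, 96.8, 52.3, 199.3, 34.2, 121.6, 104.3, 69.6,
            37.3, 75.4, 107.2, 92.3, 65.3, 127.2, 83.1, 56.6, 82.6, 115.1,
            88.0, 54.2, 82.3, 103.0, 45.5, 50.8, 84.9)
Ex1 <- c(56, 95, 44, 141, 101, 115, 79, 109, 62, 68,
         116, 71, 60, 61, 53, 77, 63, 115, 128, 105,
         67, 44, 83, 73, 57, 143, 71, 76, 157, 54,
         54, 81, 64, 97, 87, 98, 56, 47, 54, 74,
         66, 54, 70, 96, 41, 97, 91)

tukey_outliers(R_rate)  # 4 11 26
tukey_outliers(Ex1)     # 29
```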
# Splitting the Ex1 data according to variable S
> split(RyM$Ex1,RyM$S)
$no
 [1]  95 141 101 115  68 116  71  60  61  63 128 105  67  83  73  57 143  71  76
[20] 157  54  81  97  87  98  47  66  54  96  97  91

$si
 [1]  56  44  79 109  62  53  77 115  44  54  64  56  54  74  70  41
Varanalizar<- split(RyM$Ex1,RyM$S)$no
outliers<-boxplot(Varanalizar,plot=FALSE)$out
boxplot(Varanalizar,col="blue")
for(i in seq_along(outliers))
{
text(1,outliers[i],paste(which(Varanalizar==outliers[i]),collapse=","),cex=.8,pos=4)
}
[Figure: boxplot of Ex1 for the "no" group, with the outlier labelled]
Outlier (position within the "no" group): 20, i.e. observation 29 (Ex1 = 157)
Varanalizar<- split(RyM$Ex1,RyM$S)$si
outliers<-boxplot(Varanalizar,plot=FALSE)$out
boxplot(Varanalizar,col="blue")
for(i in seq_along(outliers))
{
text(1,outliers[i],paste(which(Varanalizar==outliers[i]),collapse=","),cex=.8,pos=4)
}
[Figure: boxplot of Ex1 for the "si" group, with the outliers labelled]
Outliers (positions within the "si" group): 8, 4, i.e. observations 18 and 8 (Ex1 = 115 and 109)
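The positions reported above (20; 8 and 4) refer to the split vectors, not to the rows of RyM. A sketch that applies the boxplot rule inside each level of S and maps the result back to the original observation numbers; `group_outliers()` is a hypothetical helper and the vectors are the Ex1 and S columns from the table.

```r
# Hypothetical helper: outliers of x within each level of g, as row numbers of the full data
group_outliers <- function(x, g, coef = 1.5) {
  rows <- split(seq_along(x), g)  # original row numbers, one vector per group
  lapply(rows, function(r) {
    out <- boxplot.stats(x[r], coef = coef)$out
    r[x[r] %in% out]              # keep the rows whose value was flagged
  })
}

Ex1 <- c(56, 95, 44, 141, 101, 115, 79, 109, 62, 68,
         116, 71, 60, 61, 53, 77, 63, 115, 128, 105,
         67, 44, 83, 73, 57, 143, 71, 76, 157, 54,
         54, 81, 64, 97, 87, 98, 56, 47, 54, 74,
         66, 54, 70, 96, 41, 97, 91)
S <- factor(c(1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0,
              0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1,
              0, 0, 1, 0, 1, 0, 0),
            levels = c(0, 1), labels = c("no", "si"))

res <- group_outliers(Ex1, S)
res  # $no is 29, $si is 8 18
```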
> par(mfrow=c(1,2))
> plot(R~Ex1 ,RyM)
> plot(R~S ,RyM)
[Figure: left, R plotted against Ex1; right, boxplots of R for S = no and S = si]
###
> modelo<-lm(RyM$R~RyM$Ex1)
> summary(modelo)

Call:
lm(formula = RyM$R ~ RyM$Ex1)

Residuals:
    Min      1Q  Median      3Q     Max
-59.558 -15.676   1.229  14.674  59.374

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  16.5164    13.0427   1.266    0.212
RyM$Ex1       0.9222     0.1537   6.001 3.11e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 29.14 on 45 degrees of freedom
Multiple R-squared:  0.4445,  Adjusted R-squared:  0.4322
F-statistic: 36.01 on 1 and 45 DF,  p-value: 3.114e-07
R-squared is almost 45%, and it is influenced by the extreme values in the model.
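Writing R-squared out shows why a few extreme states matter: it is one minus the ratio of the residual sum of squares to the total sum of squares, so a handful of large (squared) residuals can pull it down noticeably. A sketch that recomputes both R-squared values from these definitions, using the R and Ex1 columns typed in as vectors:

```r
R_rate <- c(79.1, 163.5, 57.8, 196.9, 123.4, 68.2, 96.3, 155.5, 85.6, 70.5,
            167.4, 84.9, 51.1, 66.4, 79.8, 94.6, 53.9, 92.9, 75.0, 122.5,
            74.2, 43.9, 121.6, 96.8, 52.3, 199.3, 34.2, 121.6, 104.3, 69.6,
            37.3, 75.4, 107.2, 92.3, 65.3, 127.2, 83.1, 56.6, 82.6, 115.1,
            88.0, 54.2, 82.3, 103.0, 45.5, 50.8, 84.9)
Ex1 <- c(56, 95, 44, 141, 101, 115, 79, 109, 62, 68,
         116, 71, 60, 61, 53, 77, 63, 115, 128, 105,
         67, 44, 83, 73, 57, 143, 71, 76, 157, 54,
         54, 81, 64, 97, 87, 98, 56, 47, 54, 74,
         66, 54, 70, 96, 41, 97, 91)

fit <- lm(R_rate ~ Ex1)
sse <- sum(residuals(fit)^2)           # residual sum of squares
sst <- sum((R_rate - mean(R_rate))^2)  # total sum of squares
r2 <- 1 - sse / sst

n <- length(R_rate)
p <- 1                                 # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(c(r2 = r2, adj_r2 = adj_r2), 4)
```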
> plot(modelo,2)
[Figure: "Normal Q-Q" plot of the standardized residuals of lm(R ~ Ex1) against the theoretical quantiles; observations 19 and 29 are among the labelled points]
We can remove observations 29, 19 and 2 from the model to see how our R-squared changes.
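One way to make the choice of points explicit is a rule on the standardized residuals. This sketch assumes the common rule of thumb |standardized residual| > 2, which is an assumption, not the criterion stated in these notes; for this fit it happens to flag exactly observations 2, 19 and 29, and refitting without them reproduces the summary that follows.

```r
R_rate <- c(79.1, 163.5, 57.8, 196.9, 123.4, 68.2, 96.3, 155.5, 85.6, 70.5,
            167.4, 84.9, 51.1, 66.4, 79.8, 94.6, 53.9, 92.9, 75.0, 122.5,
            74.2, 43.9, 121.6, 96.8, 52.3, 199.3, 34.2, 121.6, 104.3, 69.6,
            37.3, 75.4, 107.2, 92.3, 65.3, 127.2, 83.1, 56.6, 82.6, 115.1,
            88.0, 54.2, 82.3, 103.0, 45.5, 50.8, 84.9)
Ex1 <- c(56, 95, 44, 141, 101, 115, 79, 109, 62, 68,
         116, 71, 60, 61, 53, 77, 63, 115, 128, 105,
         67, 44, 83, 73, 57, 143, 71, 76, 157, 54,
         54, 81, 64, 97, 87, 98, 56, 47, 54, 74,
         66, 54, 70, 96, 41, 97, 91)
d <- data.frame(R = R_rate, Ex1 = Ex1)
fit <- lm(R ~ Ex1, data = d)

# Assumed rule of thumb: flag observations with |standardized residual| > 2
flag <- which(abs(rstandard(fit)) > 2)
flag  # observations 2, 19 and 29

fit2 <- update(fit, data = d[-flag, ])  # refit without the flagged states
coef(fit2)
summary(fit2)$r.squared
```

Because removing points chosen for their poor fit raises R-squared almost by construction, the flagged states are worth inspecting before they are dropped.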
> modelo<-lm(R~Ex1,RyM[-c(2,19,29),])
> summary(modelo)

Call:
lm(formula = R ~ Ex1, data = RyM[-c(2, 19, 29), ])

Residuals:
    Min      1Q  Median      3Q     Max
-64.792 -14.427   2.869  15.707  33.953

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.7190    12.2912  -0.058    0.954
Ex1           1.1627     0.1518   7.659 1.68e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25 on 42 degrees of freedom
Multiple R-squared:  0.5828,  Adjusted R-squared:  0.5729
F-statistic: 58.67 on 1 and 42 DF,  p-value: 1.682e-09

# Our R-squared rises to 58%
> subset(RyM,select=c("R","Ex1"))
       R Ex1
1   79.1  56
2  163.5  95
3   57.8  44
4  196.9 141
5  123.4 101
6   68.2 115
7   96.3  79
8  155.5 109
9   85.6  62
10  70.5  68
11 167.4 116
12  84.9  71
13  51.1  60
14  66.4  61
15  79.8  53
16  94.6  77
17  53.9  63
18  92.9 115
19  75.0 128
20 122.5 105
21  74.2  67
22  43.9  44
23 121.6  83
24  96.8  73
25  52.3  57
26 199.3 143
27  34.2  71
28 121.6  76
29 104.3 157
30  69.6  54
31  37.3  54
32  75.4  81
33 107.2  64
34  92.3  97
35  65.3  87
36 127.2  98
37  83.1  56
38  56.6  47
39  82.6  54
40 115.1  74
41  88.0  66
42  54.2  54
43  82.3  70
44 103.0  96
45  45.5  41
46  50.8  97
47  84.9  91
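[The beginning of this summary was lost in extraction. Its coefficients are for a regression of R on Ex1 with 38 residual degrees of freedom, i.e. fitted to 40 states, with seven observations removed.]
Residuals (Min and 1Q lost):
 Median      3Q     Max
  2.798  15.006  32.326

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.3652    11.9364  -0.282     0.78
Ex1           1.2225     0.1509   8.104 8.32e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.67 on 38 degrees of freedom
Multiple R-squared:  0.6335,  Adjusted R-squared:  0.6238
F-statistic: 65.67 on 1 and 38 DF,  p-value: 8.325e-10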
> plot(R~Ex1,data=RyM[-c(2,19,18,29,11,46,22),])
> abline(modelo)
[Figure: R plotted against Ex1 without observations 2, 19, 18, 29, 11, 46 and 22, with the fitted regression line]
# We can see that some observations are still far from the estimated regression line.
# With those observations removed, R-squared reached 63%.
> summary(lm(R~Age+S+Ed+Ex0+Ex1+LF+M+N+NW+U1+U2+W+X,data=RyM))

Call:
lm(formula = R ~ Age + S + Ed + Ex0 + Ex1 + LF + M + N + NW +
    U1 + U2 + W + X, data = RyM)

Residuals:
    Min      1Q  Median      3Q     Max
-34.884 -11.923  -1.135  13.495  50.560

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.918e+02  1.559e+02  -4.438 9.56e-05 ***
Age          1.040e+00  4.227e-01   2.460  0.01931 *
Ssi         -8.308e+00  1.491e+01  -0.557  0.58117
Ed           1.802e+00  6.496e-01   2.773  0.00906 **
Ex0          1.608e+00  1.059e+00   1.519  0.13836
Ex1         -6.673e-01  1.149e+00  -0.581  0.56529
LF          -4.103e-02  1.535e-01  -0.267  0.79087
M            1.648e-01  2.099e-01   0.785  0.43806
N           -4.128e-02  1.295e-01  -0.319  0.75196
NW           7.175e-03  6.387e-02   0.112  0.91124
U1          -6.017e-01  4.372e-01  -1.376  0.17798
U2           1.792e+00  8.561e-01   2.093  0.04407 *
W            1.374e-01  1.058e-01   1.298  0.20332
X            7.929e-01  2.351e-01   3.373  0.00191 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.94 on 33 degrees of freedom
Multiple R-squared:  0.7692,  Adjusted R-squared:  0.6783
F-statistic: 8.462 on 13 and 33 DF,  p-value: 3.686e-07
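In this full model Ex1 has a negative, non-significant coefficient, although on its own it was highly significant. A likely reason is that Ex0 and Ex1 (police expenditure in 1960 and in 1959) carry almost the same information, so each one's effect is hard to separate from the other's. A quick check of their correlation, using the two columns from the table:

```r
Ex0 <- c(58, 103, 45, 149, 109, 118, 82, 115, 65, 71,
         121, 75, 67, 62, 57, 81, 66, 123, 128, 113,
         74, 47, 87, 78, 63, 160, 69, 82, 166, 58,
         55, 90, 63, 97, 97, 109, 58, 51, 61, 82,
         72, 56, 75, 95, 46, 106, 90)
Ex1 <- c(56, 95, 44, 141, 101, 115, 79, 109, 62, 68,
         116, 71, 60, 61, 53, 77, 63, 115, 128, 105,
         67, 44, 83, 73, 57, 143, 71, 76, 157, 54,
         54, 81, 64, 97, 87, 98, 56, 47, 54, 74,
         66, 54, 70, 96, 41, 97, 91)

r <- cor(Ex0, Ex1)
r
# Variance inflation factor if these were the only two predictors in the model
1 / (1 - r^2)
```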
Remove NW
> summary(lm(R~Age+S+Ed+Ex0+Ex1+LF+M+N+U1+U2+W+X,data=RyM))

Call:
lm(formula = R ~ Age + S + Ed + Ex0 + Ex1 + LF + M + N + U1 +
    U2 + W + X, data = RyM)

Residuals:
   Min     1Q Median     3Q    Max
-35.29 -11.72  -0.96  13.71  50.91

Coefficients:
[coefficient table and fit statistics lost in extraction]

[Fragment of a later summary whose command was lost; its coefficient table has no LF term, so LF had been removed as well]
Residuals (Min and 1Q lost):
 Median      3Q     Max
 -1.150  13.050  50.860

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -683.94267  148.80924  -4.596  5.4e-05 ***
Age            1.06058    0.38739   2.738  0.00966 **
Ssi           -6.17049   11.56670  -0.533  0.59708
Ed             1.74411    0.59587   2.927  0.00598 **
Ex0            1.55088    1.00817   1.538  0.13297
Ex1           -0.58155    1.06816  -0.544  0.58959
M              0.13488    0.17288   0.780  0.44051
N             -0.04601    0.12440  -0.370  0.71372
U1            -0.55055    0.38253  -1.439  0.15898
U2             1.78151    0.83123   2.143  0.03912 *
W              0.13283    0.10002   1.328  0.19276
X              0.78095    0.22210   3.516  0.00123 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.32 on 35 degrees of freedom
Multiple R-squared:  0.7687,  Adjusted R-squared:  0.696
F-statistic: 10.58 on 11 and 35 DF,  p-value: 3.52e-08
Remove N
> summary(lm(R~Age+S+Ed+Ex0+Ex1+M+U1+U2+W+X,data=RyM))

Call:
lm(formula = R ~ Age + S + Ed + Ex0 + Ex1 + M + U1 + U2 + W +
    X, data = RyM)

Residuals:
    Min      1Q  Median      3Q     Max
-34.148 -12.773   0.667  12.649  49.797

Coefficients:
[rows for (Intercept) through U1 lost in extraction]
   Estimate Std. Error t value Pr(>|t|)
U2  1.77880    0.82117   2.166 0.036997 *
W   0.12639    0.09731   1.299 0.202227
X   0.75334    0.20666   3.645 0.000837 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.07 on 36 degrees of freedom
Multiple R-squared:  0.7678,  Adjusted R-squared:  0.7033
F-statistic: 11.91 on 10 and 36 DF,  p-value: 1.042e-08
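Remove Ex1
> summary(lm(R~Age+S+Ed+Ex0+M+U1+U2+W+X,data=RyM))

Call:
lm(formula = R ~ Age + S + Ed + Ex0 + M + U1 + U2 + W + X, data = RyM)

Residuals:
    Min      1Q  Median      3Q     Max
-37.416 -13.828  -0.419  12.294  48.106

Coefficients:
[coefficient table and fit statistics lost in extraction]

[Fragment of a later summary with eight predictors (38 residual degrees of freedom); its command was lost]
Residuals (Min and 1Q lost):
 Median      3Q     Max
 0.7294 11.7858 49.9309

Coefficients:
[rows before X lost in extraction]
  Estimate Std. Error t value Pr(>|t|)
X  0.73210    0.18921   3.869 0.000415 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.68 on 38 degrees of freedom
Multiple R-squared:  0.7638,  Adjusted R-squared:  0.7141
F-statistic: 15.36 on 8 and 38 DF,  p-value: 8.88e-10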
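Remove M
> summary(lm(R~Age+Ed+Ex0+U1+U2+W+X,data=RyM))

Call:
lm(formula = R ~ Age + Ed + Ex0 + U1 + U2 + W + X, data = RyM)

Residuals:
     Min       1Q   Median       3Q      Max
-42.4673 -12.8154  -0.3834  11.4613  51.9505

Coefficients:
[coefficient table and fit statistics lost in extraction]

[Fragment of a later summary with six predictors (40 residual degrees of freedom); its command was lost]
Residuals (Min and 1Q lost):
 Median      3Q     Max
 -1.313   9.919  54.544

Coefficients:
[rows before X lost in extraction]
  Estimate Std. Error t value Pr(>|t|)
X   0.8236     0.1815   4.538 5.10e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.83 on 40 degrees of freedom
Multiple R-squared:  0.7478,  Adjusted R-squared:  0.71
F-statistic: 19.77 on 6 and 40 DF,  p-value: 1.441e-10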
> modelo<-lm(R~Age+Ed+Ex0+U2+X,data=RyM)
> modelo

Call:
lm(formula = R ~ Age + Ed + Ex0 + U2 + X, data = RyM)

Coefficients:
(Intercept)          Age           Ed          Ex0           U2
  -524.3743       1.0198       2.0308       1.2331       0.9136
[the X coefficient was lost in extraction]
> shapiro.test(residuals(modelo))

        Shapiro-Wilk normality test

data:  residuals(modelo)
W = 0.9715, p-value = 0.3017

# p-value > 0.05: normality of the residuals is not rejected
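The elimination path above (drop a non-significant term, refit, repeat) is essentially backward elimination by p-value. This sketch automates it with the F tests from drop1() and a 5% threshold, both of which are assumptions. `backward_eliminate()` is a hypothetical helper, the data are the 47-state table typed in as text, and the order in which it removes terms need not match the sequence chosen by hand in the session.

```r
# The 47-state table from the start of the session
crime <- read.table(header = TRUE, text = "
R Age S Ed Ex0 Ex1 LF M N NW U1 U2 W X
79.1 151 1 91 58 56 510 950 33 301 108 41 394 261
163.5 143 0 113 103 95 583 1012 13 102 96 36 557 194
57.8 142 1 89 45 44 533 969 18 219 94 33 318 250
196.9 136 0 121 149 141 577 994 157 80 102 39 673 167
123.4 141 0 121 109 101 591 985 18 30 91 20 578 174
68.2 121 0 110 118 115 547 964 25 44 84 29 689 126
96.3 127 1 111 82 79 519 982 4 139 97 38 620 168
155.5 131 1 109 115 109 542 969 50 179 79 35 472 206
85.6 157 1 90 65 62 553 955 39 286 81 28 421 239
70.5 140 0 118 71 68 632 1029 7 15 100 24 526 174
167.4 124 0 105 121 116 580 966 101 106 77 35 657 170
84.9 134 0 108 75 71 595 972 47 59 83 31 580 172
51.1 128 0 113 67 60 624 972 28 10 77 25 507 206
66.4 135 0 117 62 61 595 986 22 46 77 27 529 190
79.8 152 1 87 57 53 530 986 30 72 92 43 405 264
94.6 142 1 88 81 77 497 956 33 321 116 47 427 247
53.9 143 0 110 66 63 537 977 10 6 114 35 487 166
92.9 135 1 104 123 115 537 978 31 170 89 34 631 165
75.0 130 0 116 128 128 536 934 51 24 78 34 627 135
122.5 125 0 108 113 105 567 985 78 94 130 58 626 166
74.2 126 0 108 74 67 602 984 34 12 102 33 557 195
43.9 157 1 89 47 44 512 962 22 423 97 34 288 276
121.6 132 0 96 87 83 564 953 43 92 83 32 513 227
96.8 131 0 116 78 73 574 1038 7 36 142 42 540 176
52.3 130 0 116 63 57 641 984 14 26 70 21 486 196
199.3 131 0 121 160 143 631 1071 3 77 102 41 674 152
34.2 135 0 109 69 71 540 965 6 4 80 22 564 139
121.6 152 0 112 82 76 571 1018 10 79 103 28 537 215
104.3 119 0 107 166 157 521 938 168 89 92 36 637 154
69.6 166 1 89 58 54 521 973 46 254 72 26 396 237
37.3 140 0 93 55 54 535 1045 6 20 135 40 453 200
75.4 125 0 109 90 81 586 964 97 82 105 43 617 163
107.2 147 1 104 63 64 560 972 23 95 76 24 462 233
92.3 126 0 118 97 97 542 990 18 21 102 35 589 166
65.3 123 0 102 97 87 526 948 113 76 124 50 572 158
127.2 150 0 100 109 98 531 964 9 24 87 38 559 153
83.1 177 1 87 58 56 638 974 24 349 76 28 382 254
56.6 133 0 104 51 47 599 1024 7 40 99 27 425 225
82.6 149 1 88 61 54 515 953 36 165 86 35 395 251
115.1 145 1 104 82 74 560 981 96 126 88 31 488 228
88.0 148 0 122 72 66 601 998 9 19 84 20 590 144
54.2 141 0 109 56 54 523 968 4 2 107 37 489 170
82.3 162 1 99 75 70 522 996 40 208 73 27 496 224
103.0 136 0 121 95 96 574 1012 29 36 111 37 622 162
45.5 139 1 88 46 41 480 968 19 49 135 53 457 249
50.8 126 0 104 106 97 599 989 40 24 78 25 593 171
84.9 130 0 121 90 91 623 1049 3 22 113 40 588 160
")
crime$S <- factor(crime$S, levels = c(0, 1), labels = c("no", "si"))

# Hypothetical helper: drop the term with the largest F-test p-value until all are < alpha
backward_eliminate <- function(formula, data, alpha = 0.05) {
  fit <- lm(formula, data = data)
  repeat {
    tests <- drop1(fit, test = "F")                   # first row is "<none>"
    p <- setNames(tests[["Pr(>F)"]][-1], rownames(tests)[-1])
    if (length(p) == 0 || max(p) < alpha) break
    worst <- names(which.max(p))
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

final <- backward_eliminate(R ~ ., crime)
print(formula(final))
```

The p-values of the terms that survive this kind of selection tend to look better than they would in a fresh sample, so the final model is best read as a candidate rather than a confirmed one.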
# Our model improves when we remove some observations
# I will remove observations 2, 19, 18, 29, 11 and 46 from the model
We see that our R-squared improves (87%)
> modelo<-lm(R~Age+Ed+Ex0+U2+X,data=RyM[-c(2,19,18,29,11,46),])
> summary(modelo)

Call:
lm(formula = R ~ Age + Ed + Ex0 + U2 + X, data = RyM[-c(2, 19,
    18, 29, 11, 46), ])

Residuals:
     Min       1Q   Median       3Q      Max
-30.4608  -6.2641  -0.1716   7.9186  25.0147

Coefficients:
[...]
X   0.6349
[rest of the output lost in extraction]
> stem(RyM$Age)

  The decimal point is 1 digit(s) to the right of the |

  11 | 9
  12 | 1345566678
  13 | 000111234555669
  14 | 001122335789
  15 | 012277
  16 | 26
  17 | 7
> stem(RyM$Ed)
The decimal point is 1 digit(s) to the right of the |
8 | 77888999
9 | 013
9 | 69
10 | 0244444
10 | 578889999
11 | 001233
11 | 666788
12 | 111112
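> stem(RyM$LF)
[output lost in extraction]

[Stem-and-leaf display of U1; its command was lost in extraction]
   7 | 02366777889
   8 | 0133446789
   9 | 12246779
  10 | 022223578
  11 | 1346
  12 | 4
  13 | 055
  14 | 2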
> stem(RyM$U2)
The decimal point is 1 digit(s) to the right of the |
2 | 001244
2 | 5567778889
3 | 11233444
3 | 555556677889
4 | 0011233
4 | 7
5 | 03
5 | 8
> stem(RyM$W)

  The decimal point is 2 digit(s) to the right of the |

  2 | 9
  3 | 2
  3 | 89
  4 | 001233
  4 | 56679999
  5 | 0113344
  5 | 66667889999
  6 | 2223334
  6 | 6779
> stem(RyM$X)

  The decimal point is 1 digit(s) to the right of the |

  12 | 659
  14 | 42348
  16 | 0235666780012446
  18 | 0456
  20 | 0665
  22 | 4578379
  24 | 79014
  26 | 146
> h<-graph.freq(RyM$Ex0,frequency=3)
> summary(h)
   Inf   Sup     MC fi        fri Fi       Fri
  45.0  62.3  53.65 12 0.25531915 12 0.2553191
  62.3  79.6  70.95 12 0.25531915 24 0.5106383
  79.6  96.9  88.25  8 0.17021277 32 0.6808511
  96.9 114.2 105.55  7 0.14893617 39 0.8297872
 114.2 131.5 122.85  5 0.10638298 44 0.9361702
 131.5 148.8 140.15  0 0.00000000 44 0.9361702
 148.8 166.1 157.45  3 0.06382979 47 1.0000000
> lines(density(RyM$Ex0),col="red")
> polygon.freq(h,frequency=3,col="blue")