
Jack Molloy Econometrics Final Project.

Summary Statistics, using the observations 1 - 26
for the variable W (26 valid observations)

  Mean          Median        Minimum       Maximum
  71.0093       72.1270       53.3850       81.6640
  Std. Dev.     C.V.          Skewness      Ex. kurtosis
  7.96766       0.112206      -0.395554     -0.731328
  5% Perc.      95% Perc.     IQ range      Missing obs.
  55.3289       81.5951       12.8987       0

Summary Statistics, using the observations 1 - 26
for the variable POP (26 valid observations)

  Mean          Median        Minimum       Maximum
  126.830       131.135       108.700       139.260
  Std. Dev.     C.V.          Skewness      Ex. kurtosis
  9.77984       0.0771101     -0.367997     -1.32665
  5% Perc.      95% Perc.     IQ range      Missing obs.
  109.428       138.822       17.9975       0

Summary Statistics, using the observations 1 - 26
for the variable T (26 valid observations)

  Mean          Median        Minimum       Maximum
  65.1462       65.2500       62.3000       67.3000
  Std. Dev.     C.V.          Skewness      Ex. kurtosis
  1.23878       0.0190155     -0.471961     -0.212280
  5% Perc.      95% Perc.     IQ range      Missing obs.
  62.5450       67.2300       1.50000       0

Summary Statistics, using the observations 1 - 26
for the variable R (26 valid observations)

  Mean          Median        Minimum       Maximum
  13.7535       12.1550       4.83000       27.4700
  Std. Dev.     C.V.          Skewness      Ex. kurtosis
  6.20394       0.451082      0.585032      -0.511413
  5% Perc.      95% Perc.     IQ range      Missing obs.
  5.09250       27.0290       9.45500       0

Summary Statistics, using the observations 1 - 26
for the variable PR (26 valid observations)

  Mean          Median        Minimum       Maximum
  0.329231      0.260000      0.220000      0.750000
  Std. Dev.     C.V.          Skewness      Ex. kurtosis
  0.133294      0.404866      1.64563       2.19273
  5% Perc.      95% Perc.     IQ range      Missing obs.
  0.220000      0.697500      0.155000      0

Summary Statistics, using the observations 1 - 26
for the variable Y (26 valid observations)

  Mean          Median        Minimum       Maximum
  26.9050       23.6035       9.53450       59.2670
  Std. Dev.     C.V.          Skewness      Ex. kurtosis
  13.5495       0.503607      0.775717      -0.257330
  5% Perc.      95% Perc.     IQ range      Missing obs.
  9.97147       57.1904       19.2505       0
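The C.V. entries in the summary tables can be cross-checked against the reported means and standard deviations. The sketch below is a minimal Python illustration, assuming C.V. is the standard deviation divided by the mean. The dictionary and function names are hypothetical and not part of the original gretl session.

```python
# Mean and Std. Dev. copied from the summary statistics reported above.
summary = {
    "W":   (71.0093, 7.96766),
    "POP": (126.830, 9.77984),
    "T":   (65.1462, 1.23878),
    "R":   (13.7535, 6.20394),
    "PR":  (0.329231, 0.133294),
    "Y":   (26.9050, 13.5495),
}


def coefficient_of_variation(mean, std_dev):
    """Assumed definition: C.V. = standard deviation / mean."""
    return std_dev / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in summary.items()}

for name, value in cv.items():
    print(f"{name:>3}: C.V. = {value:.4f}")
```

Read this way, T varies least relative to its mean, while R and Y vary most.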

Correlation coefficients, using the observations 1 - 26
5% critical value (two-tailed) = 0.3882 for n = 26

        W       POP         T         R        PR         Y
   1.0000    0.8533    0.2433   -0.2028    0.5711    0.7478   W
             1.0000    0.1581    0.1236    0.6298    0.8285   POP
                       1.0000   -0.0633    0.0787    0.1078   T
                                 1.0000   -0.0240    0.0089   R
                                           1.0000    0.9538   PR
                                                     1.0000   Y
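The 5% critical value of 0.3882 quoted with the correlation matrix can be recovered from the t distribution. The sketch below is a minimal Python illustration. It assumes the usual t test for a correlation coefficient with n - 2 degrees of freedom, and the t value of 2.0639 is taken from a standard t table rather than from the gretl output. All names are illustrative.

```python
import math

n = 26            # observations, as stated with the correlation matrix
df = n - 2        # degrees of freedom for a test on a single correlation
t_crit = 2.0639   # assumption: two-tailed 5% t value for 24 df (t table)

# r is significant when |r| * sqrt(df / (1 - r^2)) > t_crit.
# Solving that inequality for r gives the critical correlation.
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

# Correlations with W, copied from the matrix above.
corr_with_w = {"POP": 0.8533, "T": 0.2433, "R": -0.2028, "PR": 0.5711, "Y": 0.7478}
significant = {name: abs(r) > r_crit for name, r in corr_with_w.items()}

print(f"critical |r| = {r_crit:.4f}")
for name, flag in significant.items():
    print(f"corr(W, {name}) significant at 5%: {flag}")
```

Under these assumptions POP, PR and Y are individually correlated with W beyond the critical value, while T and R are not.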

Expected Signs:
POP: I would expect a positive sign. The more people in a given area, the more water we expect to be
consumed, holding everything else constant.
T: As the temperature increases, we would expect water consumption to increase on average.
R: The expected sign depends on how much rain is actually received. If there is a drought we would
expect to see a decrease in water consumed, and vice versa. As a whole I would expect a negative
sign, as LA doesn't receive a lot of rain.
PR: Price would be positive because, regardless of the price, people are always going to pay for water.
CO: We would expect a negative sign. If a conservation effort is underway where the majority of the
water is coming from, we would expect to see a decrease in the amount of water used in the LA area.
Y: Positive. Water is a necessity: regardless of income it is needed to survive and will never be cut out
of a person's budget.

Best Guess:
W_t = B0 + B1*POP_t + B2*T_t + B3*R_t + B4*CO_t + B5*PR_t + e_t
After testing my model, I'm going to choose not to add anything. Although T has a very high p-value,
I'm going to leave it in the model because it makes sense that water consumption will fluctuate
with the temperature.
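The remark about T can be checked against the coefficients and standard errors reported for Model 1 in this project. The sketch below is a minimal Python illustration. The critical value 2.086 is an assumption taken from a standard t table (two-tailed 5%, 20 degrees of freedom), and the variable names are hypothetical.

```python
# (coefficient, standard error) as reported in the Model 1 output.
model1 = {
    "const": (-54.5268, 34.0213),
    "POP":   (0.619997, 0.093616),
    "T":     (0.710405, 0.527127),
    "PR":    (19.9866, 9.95994),
    "R":     (-0.377744, 0.104983),
    "CO":    (-9.91879, 3.94786),
}
n, k = 26, 6      # observations and estimated coefficients (including const)
t_crit = 2.086    # assumption: two-tailed 5% t value for n - k = 20 df

# t-ratio = coefficient / standard error.
t_ratio = {name: b / se for name, (b, se) in model1.items()}
significant_5pct = {name: abs(t) > t_crit for name, t in t_ratio.items()}

# Fit statistics implied by the reported R-squared.
r_squared = 0.872090
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)
f_stat = (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))

for name, t in t_ratio.items():
    print(f"{name:>5}: t = {t:7.4f}  significant at 5%: {significant_5pct[name]}")
print(f"adjusted R-squared = {adj_r_squared:.6f}, F(5, 20) = {f_stat:.3f}")
```

On these figures T falls well short of the 5% threshold and PR falls just short of it, which matches the p-values printed in the Model 1 output.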

Model 1: OLS, using observations 1988-2013 (T = 26)
Dependent variable: W

             Coefficient    Std. Error    t-ratio    p-value
  const      -54.5268       34.0213       -1.6027    0.12467
  POP          0.619997      0.093616      6.6228   <0.00001   ***
  T            0.710405      0.527127      1.3477    0.19283
  PR          19.9866        9.95994       2.0067    0.05849   *
  R           -0.377744      0.104983     -3.5981    0.00180   ***
  CO          -9.91879       3.94786      -2.5124    0.02068   **

  Mean dependent var    71.00935     S.D. dependent var    7.967661
  Sum squared resid     203.0051     S.E. of regression    3.185947
  R-squared             0.872090     Adjusted R-squared    0.840112
  F(5, 20)              27.27193     P-value(F)            2.76e-08
  Log-likelihood       -63.60915     Akaike criterion      139.2183
  Schwarz criterion     146.7669     Hannan-Quinn          141.3920
  rho                  -0.144780     Durbin-Watson         2.100911

Testing for Heteroskedasticity and Serial Correlation:
Below is the residual graph

[Figure: Regression residuals (= observed - fitted W). Vertical axis: residual; horizontal axis: year.]
Model estimation range: 1988 - 2013
Standard error of residuals = 3.18595

              W         fitted       residual
  1988     53.3850     52.7449       0.640054
  1989     60.3110     60.8723      -0.561296
  1990     60.0640     61.7169      -1.65292
  1991     58.9390     61.9822      -3.04318
  1992     62.3300     61.6439       0.686084
  1993     62.3610     65.7480      -3.38697
  1994     65.4090     63.4772       1.93179
  1995     71.9140     69.5676       2.34638
  1996     70.4170     68.3059       2.11110
  1997     73.5490     70.8000       2.74903
  1998     69.0930     64.3641       4.72889
  1999     67.4790     72.3487      -4.86969
  2000     73.2770     73.0661       0.210907
  2001     68.9470     73.2899      -4.34292
  2002     73.4370     74.0248      -0.587830
  2003     72.3400     74.9717      -2.63168
  2004     76.0440     77.3421      -1.29807
  2005     75.1550     73.8626       1.29243
  2006     81.6640     79.1398       2.52417
  2007     80.5440     77.1662       3.37778
  2008     81.4180     81.3668       0.0511882
  2009     76.7070     76.1257       0.581318
  2010     80.6150     79.5877       1.02732
  2011     80.0620     81.9459      -1.88390
  2012     81.4670     76.0467       5.42031
  2013     69.3150     74.7353      -5.42031

[Figure: Histogram of the residuals (uhat3) with the normal density N(7.1054e-015, 2.668). Horizontal axis: uhat3; vertical axis: Density.]
Test statistic for normality: Chi-square(2) = 0.250 [0.8824]

Variance Inflation Factors
Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem

  POP    2.065
  T      1.050
  R      1.045
  PR     4.341
  CO     2.835

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient
between variable j and the other independent variables

Properties of matrix X'X:
  1-norm = 685710.01
  Determinant = 3.7456619e+008
  Reciprocal condition number = 1.2450385e-008
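The VIF formula quoted above can be inverted to show how much of each regressor is explained by the others. The sketch below is a minimal Python illustration using the VIF values reported in this output. The function and variable names are hypothetical.

```python
# Variance inflation factors as reported in the output above.
vif = {"POP": 2.065, "T": 1.050, "R": 1.045, "PR": 4.341, "CO": 2.835}


def r_squared_from_vif(v):
    """Invert VIF(j) = 1 / (1 - R(j)^2) to recover R(j)^2."""
    return 1.0 - 1.0 / v


aux_r2 = {name: r_squared_from_vif(v) for name, v in vif.items()}

# Rule of thumb quoted in the output: values above 10 may signal collinearity.
flagged = [name for name, v in vif.items() if v > 10.0]

for name, r2 in aux_r2.items():
    print(f"{name:>3}: VIF = {vif[name]:.3f}, implied R(j)^2 = {r2:.3f}")
print("flagged:", flagged)
```

None of the regressors crosses the quoted threshold of 10. PR has the largest value, with roughly three quarters of its variation explained by the other regressors.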
[Figure: Actual and fitted W, by year. Series shown: fitted, actual.]

After running multiple tests for heteroskedasticity and serial correlation, I found a possible
heteroskedasticity issue involving the CO variable. I found no positive serial correlation. I couldn't
run the Durbin-Watson d test because we have a time-lagged dummy in the model. The CO variable
must be re-examined when this regression is run again. Overall, as shown by the graph above, this
model does a good job of explaining the variation in W.
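The Durbin-Watson figure printed with the Model 1 output can be recomputed from the residual column reported earlier in this project. The sketch below is a minimal Python illustration of the standard formula: the sum of squared first differences of the residuals divided by the sum of squared residuals. It only reproduces the statistic and does not carry out the d test. Variable names are illustrative.

```python
# Model 1 residuals for 1988-2013, copied from the residual table.
residuals = [
    0.640054, -0.561296, -1.65292, -3.04318, 0.686084, -3.38697,
    1.93179, 2.34638, 2.11110, 2.74903, 4.72889, -4.86969,
    0.210907, -4.34292, -0.587830, -2.63168, -1.29807, 1.29243,
    2.52417, 3.37778, 0.0511882, 0.581318, 1.02732, -1.88390,
    5.42031, -5.42031,
]

# Sum of squared residuals (denominator of the statistic).
ssr = sum(e ** 2 for e in residuals)

# Sum of squared first differences e_t - e_{t-1} (numerator).
diff_sq = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))

durbin_watson = diff_sq / ssr

print(f"sum of squared residuals = {ssr:.4f}")
print(f"Durbin-Watson statistic  = {durbin_watson:.4f}")
```

A value near 2 is conventionally read as little first-order serial correlation in the residuals.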

Model 6: Heteroskedasticity-corrected, using observations 1988-2013 (T = 26)
Dependent variable: W

             Coefficient    Std. Error    t-ratio    p-value
  const      -97.0203       21.016        -4.6165    0.00017   ***
  POP          0.537897      0.0669329     8.0364   <0.00001   ***
  T            1.47431       0.349782      4.2149    0.00043   ***
  R           -0.317789      0.0587544    -5.4088    0.00003   ***
  PR          27.3191        7.13968       3.8264    0.00106   ***
  CO         -11.9451        6.36359      -1.8771    0.07517   *

Statistics based on the weighted data:

  Sum squared resid     49.79320     S.E. of regression    1.577866
  R-squared             0.966496     Adjusted R-squared    0.958120
  F(5, 20)              115.3875     P-value(F)            4.83e-14
  Log-likelihood       -45.33957     Akaike criterion      102.6791
  Schwarz criterion     110.2277     Hannan-Quinn          104.8529
  rho                  -0.271268     Durbin-Watson         2.246119

Statistics based on the original data:

  Mean dependent var    71.00935     S.D. dependent var    7.967661
  Sum squared resid     229.8576     S.E. of regression    3.390115

I ran a heteroskedasticity-corrected regression, and the results were quite different from those of the
original model. This indicates that there is an issue with heteroskedasticity with this model and set of variables.
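One rough way to quantify how different the two sets of results are is to compare the coefficients of Model 1 and Model 6 side by side. The sketch below is a minimal Python illustration using the coefficients reported in this project. The percent-change measure is an illustrative choice, not a formal test for heteroskedasticity, and the names are hypothetical.

```python
# Coefficients as reported for Model 1 (OLS) and Model 6 (heteroskedasticity-corrected).
ols = {"const": -54.5268, "POP": 0.619997, "T": 0.710405,
       "R": -0.377744, "PR": 19.9866, "CO": -9.91879}
hsk = {"const": -97.0203, "POP": 0.537897, "T": 1.47431,
       "R": -0.317789, "PR": 27.3191, "CO": -11.9451}

# Percent change of each corrected coefficient relative to the OLS estimate.
pct_change = {name: 100.0 * (hsk[name] - ols[name]) / abs(ols[name]) for name in ols}

# Check whether any coefficient changes sign between the two models.
same_sign = all((ols[name] > 0) == (hsk[name] > 0) for name in ols)

for name, change in pct_change.items():
    print(f"{name:>5}: OLS = {ols[name]:10.4f}  corrected = {hsk[name]:10.4f}  change = {change:7.1f}%")
print("all signs unchanged:", same_sign)
```

On these numbers every sign is unchanged, but the coefficient on T roughly doubles while POP and R move by less than a fifth of their OLS values.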
