Homework 1 - Econometrics With R

Disa Alda Naomi

April 10, 2018
Econ 144 - Economic Forecasting: Homework 1
Problem 1
1.
a)
library(MASS)    # truehist()
library(DAAG)    # nsw74psid1 data set
attach(nsw74psid1)
truehist(trt,col ="gray", xlab="0 = PSID, 1 = NSW", ylab="Fraction", main="Histogram of Count of The Study")
lines(density(trt),col="blue", lwd=2)
[Figure: Histogram of Count of The Study (x-axis: 0 = PSID, 1 = NSW; y-axis: Fraction)]
truehist(age, col ="gray", xlab="Age", ylab="Fraction", main="Histogram of Age of Participants", prob=TRUE)
lines(density(age), col="blue", lwd=2)
[Figure: Histogram of Age of Participants (x-axis: Age; y-axis: Fraction)]
truehist(educ, col ="gray",xlab="Years of Education", ylab="Fraction", main="Histogram of Years of Education")
lines(density(educ), col="blue", lwd=2)
[Figure: Histogram of Years of Education (x-axis: Years of Education; y-axis: Fraction)]
truehist(black, col ="gray",xlab="0 = not black, 1 = black", ylab="Fraction", main="Histogram of Black Participants")
lines(density(black), col="blue", lwd=2)
[Figure: Histogram of Black Participants (x-axis: 0 = not black, 1 = black; y-axis: Fraction)]
truehist(hisp, col ="gray",xlab="0 = not hispanic, 1 = hispanic", ylab="Fraction", main="Histogram of Hispanic Participants")
lines(density(hisp), col="blue", lwd=2)
[Figure: Histogram of Hispanic Participants (x-axis: 0 = not hispanic, 1 = hispanic; y-axis: Fraction)]
truehist(marr, col ="gray",xlab="0 = not married, 1 = married", ylab="Fraction", main="Histogram of Married Participants")
lines(density(marr), col="blue", lwd=2)
[Figure: Histogram of Married Participants (x-axis: 0 = not married, 1 = married; y-axis: Fraction)]
truehist(nodeg, col ="gray",xlab="0 = completed high school, 1 = dropout", ylab="Fraction", main="Histogram of Dropout Participants")
lines(density(nodeg), col="blue", lwd=2)
[Figure: Histogram of Dropout Participants (x-axis: 0 = completed high school, 1 = dropout; y-axis: Fraction)]
truehist(re74, col ="gray",xlab="Real Earnings in 1974", ylab="Fraction", main="Histogram of Real Earnings in 1974")
lines(density(re74), col="blue", lwd=2)
[Figure: Histogram of Real Earnings in 1974 (x-axis: Real Earnings in 1974; y-axis: Fraction)]
[Figure: histogram of real earnings in 1975 (x-axis: Real Earnings in 1975; y-axis: Fraction)]
[Figure: histogram of real earnings in 1978 (x-axis: Real Earnings in 1978; y-axis: Fraction)]
From the histograms, we observe that there are significantly more subjects in the sample from the PSID than
from the NSW study. The age of participants peaks just below thirty, with a smaller peak just before fifty.
The histogram of years of education has a prominent peak at approximately 12 years, making it the most
common observed value. The histograms of black and Hispanic participants show that most of the sample is
neither black nor Hispanic. The histogram of married participants shows that most subjects in the sample are
married, which makes sense since the participants are generally aged 20 and above. The histogram of dropouts
shows that most subjects in the sample completed high school. The histograms of real earnings in all three
years are right-skewed, which is what we expect from an income distribution; the histogram of real earnings
in 1978 has the flattest peak.
b)
fullmodel=lm(re78~trt+age+educ+black + hisp + marr +nodeg + re74 + re75)
summary(fullmodel)
##
## Call:
## lm(formula = re78 ~ trt + age + educ + black + hisp + marr +
## nodeg + re74 + re75)
##
## Residuals:
## Min 1Q Median 3Q Max
## -64870 -4302 -435 3786 110412
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -129.74276 1688.51706 -0.077 0.9388
## trt 751.94643 915.25723 0.822 0.4114
## age -83.56559 20.81380 -4.015 6.11e-05 ***
## educ 592.61020 103.30278 5.737 1.07e-08 ***
## black -570.92797 495.17772 -1.153 0.2490
## hisp 2163.28118 1092.29036 1.981 0.0478 *
## marr 1240.51952 586.25391 2.116 0.0344 *
## nodeg 590.46695 646.78417 0.913 0.3614
## re74 0.27812 0.02792 9.960 < 2e-16 ***
## re75 0.56809 0.02756 20.613 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10070 on 2665 degrees of freedom
## Multiple R-squared: 0.5864, Adjusted R-squared: 0.585
## F-statistic: 419.8 on 9 and 2665 DF, p-value: < 2.2e-16
The VIF for each variable is below 5, ranging from a minimum of 1.04506 for hisp to a maximum of 3.871331
for re74. We conclude that none of the variables shows strong collinearity, so we do not omit any variable.
The statistically significant variables (p-value below 0.05) are age, educ, hisp, marr, re74 and re75; the
remaining variables are not statistically significant at the 5% level. The regression has an adjusted R-squared
of 0.585. Based on the F-statistic, the regression is jointly significant at the 1% level, i.e. we reject the null
hypothesis that all slope coefficients are zero.
anova(fullmodel)
## Analysis of Variance Table

##
## Response: re78
## Df Sum Sq Mean Sq F value Pr(>F)
## trt 1 3.9811e+10 3.9811e+10 392.5421 < 2.2e-16 ***
## age 1 1.8792e+09 1.8792e+09 18.5289 1.734e-05 ***
## educ 1 8.4777e+10 8.4777e+10 835.9071 < 2.2e-16 ***
## black 1 4.6227e+09 4.6227e+09 45.5802 1.791e-11 ***
## hisp 1 8.7206e+07 8.7206e+07 0.8599 0.3539
## marr 1 6.1605e+09 6.1605e+09 60.7427 9.250e-15 ***
## nodeg 1 1.0394e+07 1.0394e+07 0.1025 0.7489
## re74 1 2.0274e+11 2.0274e+11 1998.9912 < 2.2e-16 ***
## re75 1 4.3093e+10 4.3093e+10 424.8942 < 2.2e-16 ***
## Residuals 2665 2.7028e+11 1.0142e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
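The ANOVA output indicates that trt, age, educ, black, marr, re74 and re75 are significant factors in predicting
re78. Note, however, that anova() on an lm fit reports sequential (Type I) sums of squares: each row tests a
variable only after the variables listed above it have been included. This is why trt and black appear significant
here but not in the t-tests from summary(), which test each variable given all the others.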
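The order dependence of sequential sums of squares can be seen in a minimal sketch on simulated data. The variables x1, x2, y and the seed are illustrative assumptions, not the NSW/PSID data: x2 is built to be strongly correlated with x1, while only x1 enters the true model. Reordering the two regressors leaves the fitted model unchanged but changes the ANOVA rows.

```r
# Simulated data (illustrative only): x2 is strongly correlated with x1,
# but only x1 enters the true model for y
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)
y  <- 1 + 2 * x1 + rnorm(n)

fit_a <- lm(y ~ x2 + x1)   # x2 entered first
fit_b <- lm(y ~ x1 + x2)   # x1 entered first

# Same model, so the coefficients agree regardless of the order of entry
all.equal(coef(fit_a)[c("x1", "x2")], coef(fit_b)[c("x1", "x2")])

# Sequential (Type I) sums of squares: each row is conditional on the rows above it
anova(fit_a)
anova(fit_b)

# For the last-entered variable, the Type I F-test equals the squared t-test from summary()
t_x1 <- summary(fit_a)$coefficients["x1", "t value"]
f_x1 <- anova(fit_a)["x1", "F value"]
c(t_squared = t_x1^2, F = f_x1)
```

Only the last row of a Type I table matches the corresponding summary() t-test; for the earlier rows, the test depends on which variables were entered before them.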
step(fullmodel)
## Start: AIC=49323.01
## re78 ~ trt + age + educ + black + hisp + marr + nodeg + re74 +
## re75
##
## Df Sum of Sq RSS AIC
## - trt 1 6.8456e+07 2.7035e+11 49322
## - nodeg 1 8.4527e+07 2.7037e+11 49322
## - black 1 1.3482e+08 2.7042e+11 49322
## <none> 2.7028e+11 49323
## - hisp 1 3.9781e+08 2.7068e+11 49325
## - marr 1 4.5411e+08 2.7074e+11 49325
## - age 1 1.6348e+09 2.7192e+11 49337
## - educ 1 3.3376e+09 2.7362e+11 49354
## - re74 1 1.0061e+10 2.8034e+11 49419
## - re75 1 4.3093e+10 3.1338e+11 49717
##
## Step: AIC=49321.68
## re78 ~ age + educ + black + hisp + marr + nodeg + re74 + re75
##
## - black 1 1.0495e+08 2.7046e+11 49321
## - nodeg 1 1.0708e+08 2.7046e+11 49321
## <none> 2.7035e+11 49322
## - marr 1 3.8715e+08 2.7074e+11 49324
## - hisp 1 4.2143e+08 2.7077e+11 49324
## - age 1 1.7112e+09 2.7206e+11 49337
## - educ 1 3.4257e+09 2.7378e+11 49353
## - re74 1 1.0011e+10 2.8036e+11 49417
## - re75 1 4.3032e+10 3.1338e+11 49715
##
## Step: AIC=49320.72
## re78 ~ age + educ + hisp + marr + nodeg + re74 + re75
##
## - nodeg 1 8.9778e+07 2.7055e+11 49320
## <none> 2.7046e+11 49321
## - marr 1 4.7535e+08 2.7093e+11 49323
## - hisp 1 5.1045e+08 2.7097e+11 49324
## - age 1 1.6466e+09 2.7210e+11 49335
## - educ 1 3.7157e+09 2.7417e+11 49355
## - re74 1 1.0049e+10 2.8051e+11 49416
## - re75 1 4.3409e+10 3.1387e+11 49717
##
## Step: AIC=49319.61
## re78 ~ age + educ + hisp + marr + re74 + re75
##
## <none> 2.7055e+11 49320
## - marr 1 4.3805e+08 2.7098e+11 49322
## - hisp 1 5.0861e+08 2.7105e+11 49323
## - age 1 1.5918e+09 2.7214e+11 49333
## - educ 1 5.9169e+09 2.7646e+11 49375
## - re74 1 1.0021e+10 2.8057e+11 49415
## - re75 1 4.3372e+10 3.1392e+11 49715
##
## Call:
## lm(formula = re78 ~ age + educ + hisp + marr + re74 + re75)
##
## Coefficients:
## (Intercept) age educ hisp marr
## 517.8421 -81.4208 547.7177 2405.4974 1124.5932
## re74 re75
## 0.2773 0.5682
The model with the lowest AIC (49319.61) includes the variables age, educ, hisp, marr, re74 and re75; step() drops trt, black and nodeg.
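The AIC values printed by step() come from extractAIC(), which for an lm fit is n log(RSS/n) + 2p, where p is the number of estimated coefficients. The sketch below recomputes it by hand on simulated data (the variables and seed are illustrative assumptions) and shows that AIC() differs from it only by a constant, so both rank models identically.

```r
set.seed(2)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 3 + 1.5 * x1 - 2 * x2 + rnorm(n)   # x3 is irrelevant by construction

full    <- lm(y ~ x1 + x2 + x3)
reduced <- update(full, . ~ . - x3)

# AIC on the scale used by step() for lm fits: n * log(RSS / n) + 2 * edf,
# where edf is the number of estimated coefficients
aic_step <- function(fit) {
  n   <- length(resid(fit))
  rss <- sum(resid(fit)^2)
  n * log(rss / n) + 2 * length(coef(fit))
}

c(manual = aic_step(full), extractAIC = extractAIC(full)[2])
c(full = aic_step(full), without_x3 = aic_step(reduced))

# AIC() uses the full Gaussian log-likelihood and also counts sigma as a parameter,
# so it differs from extractAIC() by the constant n * log(2 * pi) + n + 2
AIC(full) - extractAIC(full)[2]
n * log(2 * pi) + n + 2
```

Each step() iteration evaluates this quantity for every single-term deletion and keeps the deletion with the smallest value, stopping when <none> is smallest.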
c)
library(leaps)   # regsubsets()
library(car)     # subsets(), vif()
ss=regsubsets(re78~trt + age + educ + black + hisp + marr + nodeg + re74 + re75,data=nsw74psid1, nbest=3)
summary(ss)
## Subset selection object

## Call: regsubsets.formula(re78 ~ trt + age + educ + black + hisp + marr +
## nodeg + re74 + re75, data = nsw74psid1, nbest = 3)
## 9 Variables (and intercept)
## Forced in Forced out
## trt FALSE FALSE
## age FALSE FALSE
## educ FALSE FALSE
## black FALSE FALSE
## hisp FALSE FALSE
## marr FALSE FALSE
## nodeg FALSE FALSE
## re74 FALSE FALSE
## re75 FALSE FALSE
## 3 subsets of each size up to 8
## Selection Algorithm: exhaustive
## trt age educ black hisp marr nodeg re74 re75
## 1 ( 1 ) " " " " " " " " " " " " " " " " "*"
## 1 ( 2 ) " " " " " " " " " " " " " " "*" " "
## 1 ( 3 ) " " " " "*" " " " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " " " " "*" "*"
## 2 ( 2 ) " " " " "*" " " " " " " " " " " "*"
## 2 ( 3 ) " " " " " " " " " " " " "*" " " "*"
## 3 ( 1 ) " " " " "*" " " " " " " " " "*" "*"
## 3 ( 2 ) " " " " " " " " " " " " "*" "*" "*"
## 3 ( 3 ) " " "*" " " " " " " " " " " "*" "*"
## 4 ( 1 ) " " "*" "*" " " " " " " " " "*" "*"
## 4 ( 2 ) " " " " "*" " " "*" " " " " "*" "*"
## 4 ( 3 ) " " " " "*" " " " " "*" " " "*" "*"
## 5 ( 1 ) " " "*" "*" " " "*" " " " " "*" "*"
## 5 ( 2 ) " " "*" "*" " " " " "*" " " "*" "*"
## 5 ( 3 ) " " "*" "*" "*" " " " " " " "*" "*"
## 6 ( 1 ) " " "*" "*" " " "*" "*" " " "*" "*"
## 6 ( 2 ) " " "*" "*" "*" "*" " " " " "*" "*"
## 6 ( 3 ) " " "*" "*" "*" " " "*" " " "*" "*"
## 7 ( 1 ) " " "*" "*" " " "*" "*" "*" "*" "*"
## 7 ( 2 ) " " "*" "*" "*" "*" "*" " " "*" "*"
## 7 ( 3 ) "*" "*" "*" " " "*" "*" " " "*" "*"
## 8 ( 1 ) " " "*" "*" "*" "*" "*" "*" "*" "*"
## 8 ( 2 ) "*" "*" "*" "*" "*" "*" " " "*" "*"
## 8 ( 3 ) "*" "*" "*" " " "*" "*" "*" "*" "*"
subsets(ss,statistic="cp",legend=F,main="Mallows CP",col="steelblue4", ylim=c(0,50))
[Figure: Mallows CP (x-axis: Subset Size; y-axis: Statistic: cp); each point is labelled with its subset, using the abbreviations listed below]
## Abbreviation
## trt t
## age a
## educ e
## black b
## hisp h
## marr m
## nodeg n
## re74 r74
## re75 r75
We observe that the regression with the lowest Cp statistic is the six-predictor model (seven parameters
including the intercept) containing age, educ, hisp, marr, re74 and re75. We proceed with this model, since a
low Cp, ideally close to the number of parameters, indicates a model with little bias and low expected prediction
error. It is also the model selected by step() in b).
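Mallows' Cp can be computed directly as Cp = RSS_p / s^2 - n + 2p, where s^2 is the error variance estimated from the full model. The sketch below assumes p counts all coefficients including the intercept (conventions differ between texts) and uses simulated data, not the NSW/PSID sample. Under this convention the full model always has Cp equal to its own p, and a subset that omits a relevant variable gets a Cp far above its p.

```r
# Mallows' Cp for a candidate lm fit, using the error variance of the full model.
# Convention assumed here: p counts all coefficients, including the intercept.
mallows_cp <- function(sub_fit, full_fit) {
  s2 <- sigma(full_fit)^2             # RSS_full / (n - p_full)
  n  <- length(resid(sub_fit))
  p  <- length(coef(sub_fit))
  sum(resid(sub_fit)^2) / s2 - n + 2 * p
}

set.seed(3)
n  <- 150
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)   # x3 is irrelevant by construction

full <- lm(y ~ x1 + x2 + x3)
candidates <- list(x1 = lm(y ~ x1), x1_x2 = lm(y ~ x1 + x2), full = full)
round(sapply(candidates, mallows_cp, full_fit = full), 2)
```

Here dropping x2 inflates Cp well above 2, while the x1 + x2 model should land near its p of 3.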
d)
chosenmodel <- lm(re78~age + educ + hisp + marr + re74 + re75)
summary(chosenmodel)
##
## Call:
## lm(formula = re78 ~ age + educ + hisp + marr + re74 + re75)
##
## Residuals:
## -64914 -4316 -498 3722 110303
##
## Coefficients:
## (Intercept) 517.84209 1182.98921 0.438 0.6616
## age -81.42078 20.55047 -3.962 7.63e-05 ***
## educ 547.71773 71.70329 7.639 3.04e-14 ***
## hisp 2405.49742 1074.08566 2.240 0.0252 *
## marr 1124.59319 541.07870 2.078 0.0378 *
## re74 0.27729 0.02789 9.941 < 2e-16 ***
## re75 0.56817 0.02747 20.681 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10070 on 2668 degrees of freedom
## F-statistic: 629.4 on 6 and 2668 DF, p-value: < 2.2e-16
plot(chosenmodel$fit,chosenmodel$res, col="skyblue4",lwd=10, main="Residuals vs. Fitted Values of Chosen Regression Model")
abline(h=0,lwd=2,col = "red3")
[Figure: Residuals vs. Fitted Values of Chosen Regression Model (x-axis: chosenmodel$fit; y-axis: chosenmodel$res; red horizontal line at zero)]
The spread of the residuals increases as the fitted values increase, forming a fan shape. This pattern in the
residuals of the chosen model suggests heteroskedasticity. Even so, the residuals are roughly balanced around
zero, with approximately half of the points on or above the zero line.
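A fan shape can be checked formally with a Breusch-Pagan test. The sketch below computes the studentized (Koenker) form by hand, n times the R-squared of a regression of squared residuals on the regressors, compared with a chi-squared distribution. The data are simulated with error spread growing in x by construction; for the chosen model, the auxiliary regression would use age, educ, hisp, marr, re74 and re75 instead. If the lmtest package is installed, lmtest::bptest(fit) should report the same statistic, since it uses the studentized version by default.

```r
set.seed(4)
n <- 300
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 + 0.5 * x)   # error spread grows with x by construction

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the original regressors
aux     <- lm(resid(fit)^2 ~ x)
bp_stat <- n * summary(aux)$r.squared            # studentized (Koenker) form: n * R^2
k       <- length(coef(aux)) - 1                 # number of slope terms in aux
p_value <- pchisq(bp_stat, df = k, lower.tail = FALSE)
c(statistic = bp_stat, df = k, p.value = p_value)
```

A small p-value rejects constant error variance; heteroskedasticity-robust standard errors would then be a natural next step.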
e)
vif(chosenmodel)
## age educ hisp marr re74 re75

## 1.227765 1.264142 1.010674 1.142693 3.863304 3.833144
plot(vif(chosenmodel), main="VIF of The Variables in The Chosen Model", ylab="VIFs", xlab="Variables")
[Figure: VIF of The Variables in The Chosen Model (one point per variable, in the order age, educ, hisp, marr, re74, re75)]
Here the variable with the highest VIF is re74 (3.86), closely followed by re75 (3.83), so these two variables are
the most affected by collinearity. This makes sense, since real earnings in consecutive years tend to move in the
same direction. However, all VIFs are below five, so none of them indicates collinearity strong enough to require
dropping a variable.
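The VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that predictor on all the other predictors. The sketch below computes it by hand; vif_manual is a hypothetical helper, and e74, e75 and age are simulated stand-ins (not the NSW/PSID variables), with the earnings pair built to be strongly related.

```r
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(5)
n   <- 500
e74 <- rnorm(n, mean = 10, sd = 3)
e75 <- 0.9 * e74 + rnorm(n, sd = 1.5)   # hypothetical earnings pair, strongly related
age <- rnorm(n, mean = 30, sd = 8)      # unrelated to the earnings pair

round(vif_manual(data.frame(e74, e75, age)), 2)
```

With only two predictors, R_j^2 is simply their squared correlation, so a correlation like the 0.86 between re74 and re75 alone would give a VIF of about 1 / (1 - 0.86^2) ≈ 3.8.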
f)
library(corrplot)
chosenmodelcorr = cor(nsw74psid1)
corrplot(chosenmodelcorr, method="number", main="Correlation Plot")
Correlation Plot (correlations printed in the plot, rounded to two decimals):
        trt    age   educ  black   hisp   marr  nodeg   re74   re75   re78
trt    1.00  -0.22  -0.15   0.33   0.04  -0.45   0.22  -0.32  -0.32  -0.25
age   -0.22   1.00  -0.19  -0.10   0.00   0.27   0.17   0.25   0.22   0.11
educ  -0.15  -0.19   1.00  -0.34  -0.09   0.05  -0.76   0.33   0.35   0.37
black  0.33  -0.10  -0.34   1.00  -0.12  -0.25   0.32  -0.29  -0.29  -0.27
hisp   0.04   0.00  -0.09  -0.12   1.00   0.01   0.06  -0.03  -0.05  -0.01
marr  -0.45   0.27   0.05  -0.25   0.01   1.00  -0.10   0.28   0.27   0.22
nodeg  0.22   0.17  -0.76   0.32   0.06  -0.10   1.00  -0.28  -0.29  -0.30
re74  -0.32   0.25   0.33  -0.29  -0.03   0.28  -0.28   1.00   0.86   0.70
re75  -0.32   0.22   0.35  -0.29  -0.05   0.27  -0.29   0.86   1.00   0.74
re78  -0.25   0.11   0.37  -0.27  -0.01   0.22  -0.30   0.70   0.74   1.00
The variables are mostly weakly correlated with each other, some positively and some negatively. One notable
exception is that no-degree status is strongly negatively correlated with years of education (-0.76), which makes
sense since having no degree implies fewer years of education. Another is that re75 is strongly positively
correlated with re74 (0.86), which also makes sense since people with high earnings in one year are likely to
have high earnings in the next. The same argument explains the strong positive correlations of re78 with re74
(0.70) and with re75 (0.74).
g)
chosenmodel_cook=cooks.distance(chosenmodel)
plot(chosenmodel_cook,ylab="Cook's distance",type='o',main="Cook's Distance Plot",col="skyblue4")
[Figure: Cook's Distance Plot (x-axis: Index, observations 0 to about 2700; y-axis: Cook's distance)]
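There seems to be one outlier with the highest Cook's distance, about 0.25. Two other observations, with Cook's
distances of about 0.15 and 0.14, might also be influential. All three are well below the conventional cutoff of 1,
but they stand out from the rest of the sample, so we might want to examine them and consider omitting them
from our regression model.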
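Cook's distance combines the size of a residual with the leverage of its observation: D_i = r_i^2 / p * h_ii / (1 - h_ii), where r_i is the standardized residual, h_ii the leverage and p the number of coefficients. The sketch below checks this formula against cooks.distance() on simulated data with one planted outlier (an illustrative assumption). For the chosen model, with n = 2675, the common 4/n rule of thumb would be about 0.0015.

```r
set.seed(6)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[1] <- y[1] + 15                      # plant one outlying response (illustrative)
fit <- lm(y ~ x)

p <- length(coef(fit))                 # number of coefficients
h <- hatvalues(fit)                    # leverages h_ii
r <- rstandard(fit)                    # internally studentized residuals
cook_manual <- r^2 / p * h / (1 - h)

all.equal(cook_manual, cooks.distance(fit))
which.max(cooks.distance(fit))         # the planted observation
sum(cooks.distance(fit) > 4 / n)       # count above the 4/n rule of thumb
```

Refitting without the flagged observations and comparing coefficients is a simple way to judge whether they actually change the conclusions.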
h)
truehist(chosenmodel$res,col="skyblue3",xlab="Residuals",ylab="Fraction",main="Histogram of Residuals")
[Figure: Histogram of Residuals (x-axis: Residuals; y-axis: Fraction)]
i)
qqnorm(chosenmodel$res,col="skyblue4", main="QQ Normal Plot")
qqline(chosenmodel$res, col=2)
[Figure: QQ Normal Plot (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles; red reference line from qqline())]
#abline(0,1,col="red",lwd=2,lty=2)
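If the residuals were normally distributed, the points would lie along the reference line drawn by qqline(). The
QQ plot broadly follows the line, but some parts lie above it and some below. The residual summary in d)
(quartiles of about -4,300 and 3,700 against extremes of about -65,000 and 110,000) suggests that these
departures are concentrated in heavy tails.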
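Heavy tails can be quantified alongside the QQ plot with the sample excess kurtosis (about 0 for normal data) and a Shapiro-Wilk test. The sketch below uses simulated residuals, with a t distribution on 3 degrees of freedom as an assumed heavy-tailed stand-in; excess_kurtosis is a hypothetical helper. Since stats::shapiro.test accepts 3 to 5000 observations, it could also be applied to chosenmodel$res (n = 2675).

```r
# Sample excess kurtosis: about 0 for normal data, large for heavy-tailed data
excess_kurtosis <- function(e) {
  z <- (e - mean(e)) / sd(e)
  mean(z^4) - 3
}

set.seed(7)
normal_res <- rnorm(2000)
heavy_res  <- rt(2000, df = 3)          # heavy-tailed stand-in for regression residuals

c(normal = excess_kurtosis(normal_res), heavy = excess_kurtosis(heavy_res))

# Shapiro-Wilk test (stats::shapiro.test accepts 3 to 5000 observations)
shapiro.test(normal_res)$p.value
shapiro.test(heavy_res)$p.value
```

With samples this large, the test rejects even modest departures, so it is best read together with the QQ plot rather than instead of it.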
j)
plot(chosenmodel$fit, re78,col="skyblue4", xlab="Predicted Response",ylab="Observed Response",main="Observed vs. Predicted Response")
lines(lowess(chosenmodel$fit, re78),lwd=2)
abline(0,1,col="red",lwd=2,lty=2)
legend(150,110000, c(expression(y[obs]==y[pred]), "Lowess Smoother"), fill =c("red", "black"),cex=1,bty="n")
[Figure: Observed vs. Predicted Response (x-axis: Predicted Response; y-axis: Observed Response; legend: y_obs = y_pred, Lowess Smoother)]
The scatterplot of observed vs. predicted values roughly follows a positive linear relationship. The data are more
clustered at lower values than at higher values, and there is a moderate amount of deviation from the 45-degree
line. The Lowess smoother indicates a positive, roughly linear relationship similar to the 45-degree line, with a
slightly different slope.
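Two facts help read an observed-vs-predicted plot for an OLS fit with an intercept: the R-squared equals the squared correlation between observed and fitted values, and a linear regression of observed on fitted values returns exactly intercept 0 and slope 1, i.e. the 45-degree line. Any systematic gap between the Lowess curve and that line therefore points to nonlinearity rather than a mis-scaled fit. A minimal check on simulated data (an illustrative assumption):

```r
set.seed(14)
n   <- 200
x   <- rnorm(n)
y   <- 1 + 2 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

# R^2 equals the squared correlation between observed and fitted values
c(r2 = summary(fit)$r.squared, cor_sq = cor(fitted(fit), y)^2)

# With an intercept, OLS of observed on fitted gives intercept 0 and slope 1,
# i.e. exactly the y_obs = y_pred reference line
b <- coef(lm(y ~ fitted(fit)))
b
```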
2. Problem 2.2 from Textbook
setwd("/Users/AldaSiamipar/Documents/Econ144/Data")
library(TTR)
library(lattice)
df <- read.table("Chapter2no2.txt", header = TRUE)
summary(df)
## date GRGDP RETURN

## 1950Q2 : 1 Min. :-2.7075 Min. :-26.937
## 1950Q3 : 1 1st Qu.: 0.3273 1st Qu.: -0.915
## 1950Q4 : 1 Median : 0.7679 Median : 2.023
## 1951Q1 : 1 Mean : 0.7958 Mean : 1.962
## 1951Q2 : 1 3rd Qu.: 1.3107 3rd Qu.: 5.753
## 1951Q3 : 1 Max. : 3.9343 Max. : 20.117
## (Other):242
par(mfrow=c(2,1))
truehist(df$GRGDP, col ="orange", xlab="Growth in GDP", ylab="Fraction", main="Histogram of Growth in GDP")
truehist(df$RETURN, col ="yellow", xlab="S&P500 Returns", ylab="Fraction", main="Histogram of S&P500 Returns")
[Figure: Histogram of Growth in GDP (x-axis: Growth in GDP; y-axis: Fraction)]
[Figure: Histogram of S&P500 Returns (x-axis: S&P500 Returns; y-axis: Fraction)]
timeseries2 <- ts(df[, c("GRGDP", "RETURN")], frequency=4, start=c(1950,2))  # quarterly, from 1950Q2
plot.ts(timeseries2, main = "Time Series of Growth in US GDP and S&P500 Returns")
[Figure: Time Series of Growth in US GDP and S&P500 Returns (panels for GRGDP and RETURN; x-axis: Time, 1950 to 2010)]
acf(df$GRGDP, plot=TRUE)
[Figure: Series df$GRGDP (x-axis: Lag, 0 to 20; y-axis: ACF)]
acf(df$RETURN, plot=TRUE)
[Figure: Series df$RETURN (x-axis: Lag, 0 to 20; y-axis: ACF)]
The time series plots of both growth in US GDP and S&P500 returns suggest first-order weak stationarity: both
series fluctuate around a mean that appears constant over time and cross it very often. There are hints of
seasonality, but they are weak and inconsistent: the magnitudes of the dips vary, as does the length from one
peak to the next. The ACF plots are also consistent with covariance (second-order) stationarity, which requires
the autocovariances to depend only on the lag and not on time; a nonstationary series would typically show an
ACF that decays very slowly.
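The sample ACF plotted by acf() is r_k = sum_t (x_t - xbar)(x_{t-k} - xbar) / sum_t (x_t - xbar)^2. The sketch below recomputes it by hand and compares it with acf(); the series is a simulated stationary AR(1) of the same length as the GDP data (248 quarters), an illustrative assumption rather than the actual GRGDP series. sample_acf is a hypothetical helper.

```r
# Sample autocorrelation at lags 0..max_lag, using the same estimator as acf():
# r_k = sum_{t=k+1}^{n} (x_t - xbar)(x_{t-k} - xbar) / sum_{t=1}^{n} (x_t - xbar)^2
sample_acf <- function(x, max_lag) {
  n  <- length(x)
  xc <- x - mean(x)
  sapply(0:max_lag, function(k) sum(xc[(1 + k):n] * xc[1:(n - k)]) / sum(xc^2))
}

set.seed(8)
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 248))   # simulated stationary AR(1)

mine  <- sample_acf(x, 10)
built <- as.numeric(acf(x, lag.max = 10, plot = FALSE)$acf)
round(rbind(mine, built), 3)

# Approximate 95% band that acf() draws around zero
1.96 / sqrt(length(x))
```

For 248 observations the band is roughly ±0.12, so autocorrelations inside it are not distinguishable from zero at about the 5% level.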
df3 <- read.table("Chapter3no1.txt", header = TRUE)
summary(df3)
## date rpce rdpi

## 1/1/00 : 1 Min. :1690 Min. : 1877
## 1/1/01 : 1 1st Qu.:2989 1st Qu.: 3344
## 1/1/02 : 1 Median :4578 Median : 5164
## 1/1/03 : 1 Mean :5036 Mean : 5544
## 1/1/04 : 1 3rd Qu.:7052 3rd Qu.: 7662
## 1/1/05 : 1 Max. :9592 Max. :10536
## (Other):634
timeseries3 <-ts(df3, frequency=12, start=c(1959,1))
plot.ts(timeseries3, main="Time Series of Consumption Growth and Income Growth")
[Figure: Time Series of Consumption Growth and Income Growth (panels: date, rpce, rdpi; x-axis: Time, 1960 to 2010)]
logtimeseries3 <- log(timeseries3)
plot.ts(logtimeseries3, main="Time Series of Log Transform of Consumption Growth and Income Growth")
[Figure: Time Series of Log Transform of Consumption Growth and Income Growth (panels: date, rpce, rdpi; x-axis: Time)]
difflogtimeseries3 <- diff(logtimeseries3)
plot.ts(difflogtimeseries3, main="Time Series of Difference of Log Transform of Consumption Growth and Income Growth")
[Figure: Time Series of Difference of Log Transform of Consumption Growth and Income Growth (panels: date, rpce, rdpi; x-axis: Time)]
a) The permanent income hypothesis states that agents smooth consumption over their lifetimes: a person's
consumption at a point in time is determined not just by current income but also by expected income in future
years. In the plots of the differenced log series, consumption growth is less volatile than income growth (the
axis ticks of the rpce panel run from -0.02 to 0.02, versus -0.04 to 0.04 for rdpi). This agrees with the
permanent income model: large fluctuations in income growth are not matched by equally large fluctuations in
consumption growth, because consumption does not depend only on current income.
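Differencing the log of a level series gives continuously compounded growth rates, which are close to simple percentage growth when growth is small; comparing their standard deviations quantifies the volatility difference seen in the plots. The sketch below uses simulated series under a hypothetical smoothing rule in which consumption growth responds only partly to income growth; it is not the rpce/rdpi data.

```r
set.seed(9)
n <- 640
income_growth <- rnorm(n, mean = 0.003, sd = 0.009)
# Hypothetical smoothing rule: consumption growth responds only partly to income growth
cons_growth   <- 0.002 + 0.2 * income_growth + rnorm(n, sd = 0.005)
income <- 1000 * exp(cumsum(income_growth))
cons   <- 900 * exp(cumsum(cons_growth))

g_inc  <- diff(log(income))            # log differences = continuously compounded growth
g_cons <- diff(log(cons))
exact  <- income[-1] / income[-n] - 1  # simple percentage growth, for comparison

max(abs(g_inc - exact))                # small when growth rates are small
c(sd_income = sd(g_inc), sd_consumption = sd(g_cons))
```

Applied to the data, the same comparison would be sd(difflogtimeseries3[,2]) against sd(difflogtimeseries3[,3]).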
b)
#difflogtimeseries3[,2]
m = lm(difflogtimeseries3[,2]~difflogtimeseries3[,3])
summary(m)
##
## Call:
## lm(formula = difflogtimeseries3[, 2] ~ difflogtimeseries3[, 3])
##
## Residuals:
## -0.0303967 -0.0029727 0.0001525 0.0030417 0.0244545
##
## Coefficients:
## (Intercept) 0.0022542 0.0002242 10.055 < 2e-16 ***
## difflogtimeseries3[, 3] 0.1745671 0.0292029 5.978 3.77e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.005318 on 637 degrees of freedom
## F-statistic: 35.73 on 1 and 637 DF, p-value: 3.768e-09
Both the intercept and the coefficient on disposable income growth are statistically significant at the 1% level,
since their p-values are far below 0.01. The slope coefficient indicates that a one-percentage-point increase in the
growth of disposable income is associated with a 0.1746-percentage-point increase in the growth of consumption,
in line with the permanent income hypothesis. The R-squared (not shown in the output above) is about 0.053,
which can be recovered from the F-statistic as 35.73/(35.73 + 637); the value 0.005318 is the residual standard
error, not the R-squared. Variation in disposable income growth therefore explains only about 5% of the variation
in consumption growth, which implies our regression model is not very good.
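Because the printed outputs above omit the R-squared lines, the R-squared can be recovered from the F-statistic: inverting F = (R^2 / df1) / ((1 - R^2) / df2) gives R^2 = F df1 / (F df1 + df2). The sketch below applies this to the F-statistics reported for the regressions in b) and c); since the printed F values are rounded, the results are approximate.

```r
# R^2 implied by an F-statistic with df1 numerator and df2 denominator degrees of freedom,
# from F = (R^2 / df1) / ((1 - R^2) / df2)
r2_from_f <- function(f, df1, df2) (f * df1) / (f * df1 + df2)

# Adjusted R^2 for n observations and k slope coefficients
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2_b <- r2_from_f(35.73, 1, 637)   # regression in b): F = 35.73 on 1 and 637 DF
r2_c <- r2_from_f(22, 2, 635)      # regression in c): F = 22 on 2 and 635 DF
round(c(r2_b = r2_b, r2_c = r2_c, adj_r2_c = adj_r2(r2_c, n = 638, k = 2)), 4)
```

This gives an R-squared of about 0.053 for b) and about 0.065 for c), with an adjusted R-squared for c) of about 0.062, consistent with the 0.06185 quoted there.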
c)
library(dyn)
library(xts)
lagged <- lag.xts(difflogtimeseries3[,3], k=1)  # value from the previous period, x_{t-1}
#lagged
#difflogtimeseries3[,3]
mpluslag = lm(difflogtimeseries3[,2]~difflogtimeseries3[,3] + lagged)
summary(mpluslag)
##
## Call:
## lm(formula = difflogtimeseries3[, 2] ~ difflogtimeseries3[, 3] +
## lagged)
##
## Residuals:
## -0.0300734 -0.0028702 -0.0000006 0.0029797 0.0255131
##
## Coefficients:
## (Intercept) 0.0019887 0.0002405 8.268 7.98e-16 ***
## difflogtimeseries3[, 3] 0.1872692 0.0293883 6.372 3.58e-10 ***
## lagged 0.0828596 0.0293877 2.820 0.00496 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.005285 on 635 degrees of freedom
## (1 observation deleted due to missingness)
## F-statistic: 22 on 2 and 635 DF, p-value: 5.8e-10
The lagged variable we just added is statistically significant at the 1% level. Its slope coefficient indicates that a
one-percentage-point increase in last period's growth of disposable income is associated with a
0.08-percentage-point increase in the growth of consumption. The fit improves slightly, with an adjusted
R-squared of 0.06185. The total marginal effect of income growth on consumption growth is 0.187 + 0.083 ≈ 0.27,
which is larger than the effect of 0.175 in b). We may want to keep the t-1 lag and experiment with additional lags
when estimating this regression.
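A lag can also be built explicitly in base R by shifting the vector one position and padding with NA, so that element t holds x_{t-1}. This avoids a common pitfall: stats::lag() on a ts object only shifts the time attributes, which lm() ignores, so lm(y ~ lag(x)) silently regresses on the unshifted values (the dyn package exists partly to handle this). The sketch below uses simulated growth series with assumed coefficients, and lag1 is a hypothetical helper.

```r
# Shift a series back one period so that element t holds x_{t-1}; the first value is NA
lag1 <- function(x) c(NA, head(as.numeric(x), -1))

set.seed(11)
n <- 640
x <- rnorm(n, mean = 0.003, sd = 0.009)             # hypothetical income growth
y <- 0.002 + 0.19 * x + 0.08 * lag1(x) + rnorm(n, sd = 0.005)

fit <- lm(y ~ x + lag1(x))   # the first row has an NA lag and is dropped
coef(fit)
nobs(fit)                    # n - 1 observations used

# Total effect of a sustained one-unit change in x: current plus lagged coefficient
sum(coef(fit)[-1])
```

Dropping the first observation mirrors the "(1 observation deleted due to missingness)" note in the output above.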
df4a <- read.table("Chapter3no3a.txt", header = TRUE)
summary(df4a)
## date rgdp
## 1/1/00 : 1 Min. : 1766
## 1/1/01 : 1 1st Qu.: 3178
## 1/1/02 : 1 Median : 5821
## 1/1/03 : 1 Mean : 6522
## 1/1/04 : 1 3rd Qu.: 9176
## 1/1/05 : 1 Max. :13491
## (Other):255
ts3a <- ts(df4a$rgdp, frequency=4, start=c(1947,1))  # use only the GDP column, not the date labels
plot.ts(ts3a, main="Time Series of U.S. Real GDP")
[Figure: Time Series of U.S. Real GDP (x-axis: Time, 1950 to 2010)]
logts3a<- log(ts3a)
plot.ts(logts3a, main="Time Series of Log Transform of U.S. Real GDP")
[Figure: Time Series of Log Transform of U.S. Real GDP (x-axis: Time)]
difflogts3a <- diff(logts3a)
plot.ts(difflogts3a, main="Difference of Log Transform of U.S. Real GDP")
[Figure: Difference of Log Transform of U.S. Real GDP (x-axis: Time)]
a) Exact Definition: Real GDP (Gross Domestic Product) is the inflation-adjusted value of the goods and
services produced by factors of production located in the United States.
Periodicity/Frequency: quarterly (1947Q1 - 2012Q1).
Units: y is in billions of chained 2005 dollars; x is in quarterly increments of time from January 1947 (1947Q1)
to January 2012 (2012Q1).
Q: Is the underlying stochastic process first and second order weakly stationary?
A: The plot of RGDP against time shows that the underlying stochastic process is neither first-order nor
second-order weakly stationary: the series trends upward, so its mean changes over time, with occasional dips
and peaks that are indicative of recessions and booms. When we take the difference of the log transformation,
however, the resulting series appears first- and second-order weakly stationary: it crosses a common mean often,
and its variance does not seem to depend on time, since we cannot discern a pattern in the fluctuations.
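An informal way to back up these visual judgments is to compare the mean and variance of the first and second halves of a series: a trending level series shows very different means, while its log difference should not. The sketch below uses a simulated GDP-like series (the growth parameters and the starting value are illustrative assumptions; half_stats is a hypothetical helper). It is not a formal test; a unit-root test such as the augmented Dickey-Fuller test (e.g. tseries::adf.test) would be the usual formal check.

```r
set.seed(12)
n <- 261                                              # same length as the RGDP sample
g <- rnorm(n - 1, mean = 0.008, sd = 0.009)           # hypothetical quarterly log growth
level <- 1766 * exp(cumsum(c(0, g)))                  # trending, GDP-like level series

# Compare the mean and variance of the first and second halves of a series
half_stats <- function(x) {
  h <- floor(length(x) / 2)
  a <- x[1:h]
  b <- x[(h + 1):length(x)]
  c(mean_1 = mean(a), mean_2 = mean(b), var_1 = var(a), var_2 = var(b))
}

round(half_stats(level), 1)             # means differ sharply: not first-order stationary
round(half_stats(diff(log(level))), 5)  # similar means and variances in both halves
```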
df4b <- read.table("Chapter3no3b.txt", header = TRUE)
summary(df4b)
## DATE jpy_usd
## 1/10/00: 1 Min. : 75.72
## 1/10/01: 1 1st Qu.:110.09
## 1/10/02: 1 Median :130.34
## 1/10/03: 1 Mean :168.75
## 1/10/05: 1 3rd Qu.:235.64
## 1/10/06: 1 Max. :358.44
## (Other):10385
#df4b
ts3b <-ts(df4b, frequency=365.25, start=c(1947,1,4))

plot.ts(ts3b, main="Time Series of Japan/U.S. Foreign Exchange Rate")
[Figure: Time Series of Japan/U.S. Foreign Exchange Rate (x-axis: Time)]
logts3b<- log(ts3b)
plot.ts(logts3b, main="Time Series of Log Transform of Japan/U.S. Foreign Exchange Rate")
[Figure: Time Series of Log Transform of Japan/U.S. Foreign Exchange Rate (x-axis: Time)]
difflogts3b <- diff(logts3b)
plot.ts(difflogts3b, main="Time Series of Difference of Log Transform of Japan/U.S Foreign Exchange Rate")
[Figure: Time Series of Difference of Log Transform of Japan/U.S Foreign Exchange Rate (x-axis: Time)]
b) Exact Definition: The Japan/U.S. foreign exchange rate is the buying rate of 1 U.S. dollar in terms of
Japanese yen.
Periodicity/Frequency: daily (1971-01-04 to 2012-06-01).
Units: y is in Japanese yen per U.S. dollar; x is in daily increments of time from 4 January 1971 to 1 June 2012.
A: The plot of the Japan/U.S. exchange rate against time shows that the underlying stochastic process is not
stationary; in fact, it shows a downward trend. The exchange rate fluctuated over some years, but overall it
consistently trends downward. When we take the difference of the log transformation, however, the resulting
series appears first- and second-order weakly stationary: it crosses a common mean often, and its variance does
not seem to depend on time, since we cannot discern a pattern in the fluctuations, although there are more
extreme fluctuations at the beginning of the dataset.
df4c <- read.table("Chapter3no3c.txt", header = TRUE)
summary(df4c)
## DATE CMRate10Yr
## 1/1/01 : 1 Min. : 0.000
## 1/1/02 : 1 1st Qu.: 4.340
## 1/1/03 : 1 Median : 6.160
## 1/1/04 : 1 Mean : 6.407
## 1/1/07 : 1 3rd Qu.: 7.950
## 1/1/08 : 1 Max. :15.840
## (Other):13152
#df4c
ts3c <-ts(df4c, frequency=365.25, start=c(1962,1,2))
plot.ts(ts3c, main="Time Series of 10-Year Treasury Constant Maturity Rate")
[Figure: Time Series of 10-Year Treasury Constant Maturity Rate (x-axis: Time)]
logts3c<- log(ts3c)
plot.ts(logts3c, main="Time Series of Log Transform of 10-Year Treasury Constant Maturity Rate")
[Figure: Time Series of Log Transform of 10-Year Treasury Constant Maturity Rate (x-axis: Time)]
difflogts3c <- diff(logts3c)
plot.ts(difflogts3c, main="Time Series of Difference of Log Transform of 10-Year Treasury Constant Maturity Rate")
[Figure: Time Series of Difference of Log Transform of 10-Year Treasury Constant Maturity Rate (x-axis: Time)]
c) Exact Definition: Yields on 10-year Treasury bonds of the U.S. government, adjusted to constant maturities.
Periodicity/Frequency: daily (1962-01-02 to 2012-06-07).
Units: y is the 10-Year Treasury Constant Maturity Rate in percent; x is in daily increments of time from
2 January 1962 to 7 June 2012.
A: The plot of the 10-Year Treasury Constant Maturity Rate against time shows that the underlying stochastic
process is not stationary: the rate seems to increase up to the late 1970s and shows a decreasing trend after that.
When we take the difference of the log transformation, however, the resulting series appears first- and
second-order weakly stationary: it crosses a common mean often, and its variance does not seem to depend on
time, since we cannot discern a pattern in the fluctuations, although there are more extreme fluctuations at the
end of the dataset. Note that the summary above reports a minimum CMRate10Yr of 0.000; log() returns -Inf
for zero values, so those observations should be set to NA or removed before the log transformation.
df4d <- read.table("Chapter3no3d.txt", header = TRUE)
summary(df4d)
## DATE unemrate
## 1/1/00 : 1 Min. : 2.500
## 1/1/01 : 1 1st Qu.: 4.600
## 1/1/02 : 1 Median : 5.600
## 1/1/03 : 1 Mean : 5.785
## 1/1/04 : 1 3rd Qu.: 6.800
## 1/1/05 : 1 Max. :10.800
## (Other):767
#df4d
ts3d <- ts(df4d$unemrate, frequency=12, start=c(1948,1))  # use only the rate column, not the date labels
plot.ts(ts3d, main="Time Series of Civilian Unemployment Rate")
[Figure: Time Series of Civilian Unemployment Rate (x-axis: Time, 1950 to 2010)]
logts3d<- log(ts3d)
plot.ts(logts3d, main="Time Series of Log Transform of Civilian Unemployment Rate")
[Figure: Time Series of Log Transform of Civilian Unemployment Rate (x-axis: Time)]
difflogts3d <- diff(logts3d)
plot.ts(difflogts3d, main="Time Series of Difference of Log Transform of Civilian Unemployment Rate")
[Figure: Time Series of Difference of Log Transform of Civilian Unemployment Rate (x-axis: Time)]
d) Exact Definition: The unemployment rate represents the number of unemployed as a percentage of the
labor force. Labor force data are restricted to people 16 years of age and older who currently reside in
1 of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental
facilities, homes for the aged), and who are not on active duty in the Armed Forces.
Periodicity/Frequency: monthly (12 observations per year).
Units: y is the civilian unemployment rate in percent; x is in monthly increments of time starting from January 1948.
A: The plot of the civilian unemployment rate shows cycles (and possibly seasonality), with a slight upward
trend. When we take the difference of the log transformation, however, the resulting series appears first- and
second-order weakly stationary: it crosses a common mean often, and we cannot discern a clear pattern in its
fluctuations, although they are more extreme at the beginning of the dataset and seem to diminish somewhat
over time.

QUARTERLY DATA
qdata <- read.table("Chapter4no3.txt", header = TRUE)
summary(qdata)
## OBS P R Pgrowth
## 1980Q2 : 1 Min. : 42.50 Min. : 4.310 Min. :-5.3069
## 1980Q3 : 1 1st Qu.: 63.52 1st Qu.: 6.297 1st Qu.: 0.2814
## 1980Q4 : 1 Median : 77.51 Median : 7.816 Median : 1.0533
## 1981Q1 : 1 Mean : 90.26 Mean : 8.758 Mean : 0.8489
## 1981Q2 : 1 3rd Qu.:121.33 3rd Qu.:10.333 3rd Qu.: 1.7130
## 1981Q3 : 1 Max. :161.98 Max. :17.740 Max. : 4.3114
## (Other):120
## Rchanges
## Min. :-2.15667
## 1st Qu.:-0.32917
## Median :-0.10892
## Mean :-0.08032
## 3rd Qu.: 0.16950
## Max. : 1.61333
##
#qdata
qdatats <- ts(qdata[, -1], frequency=4, start=c(1980, 2))  # drop the OBS label column; first observation is 1980Q2
plot.ts(qdatats)
[Figure: qdatats, time series plots of the quarterly variables (x-axis: Time, 1980 to 2010)]
par(mfrow=c(2,2))
Pacf <- acf(qdata$P, plot=TRUE)
Pgrowthacf <- acf(qdata$Pgrowth, plot = TRUE)
Racf <- acf(qdata$R, plot=TRUE)
Rchangesacf <- acf(qdata$Rchanges, plot = TRUE)
[Figure: 2 x 2 panel of ACFs: Series qdata$P, Series qdata$Pgrowth, Series qdata$R, Series qdata$Rchanges (x-axis: Lag; y-axis: ACF)]
Based on the ACF plots, house prices (P) show stronger time dependence than the real interest rate (R). The
ACF of house-price growth also reflects stronger time dependence than that of changes in the real interest rate,
although neither of these two ACFs shows autocorrelations of large magnitude.
par(mfrow=c(2,2))
Ppacf <- pacf(qdata$P, plot=TRUE)
Pgrowthpacf <- pacf(qdata$Pgrowth, plot = TRUE)
Rpacf <- pacf(qdata$R, plot=TRUE)
Rchangespacf <- pacf(qdata$Rchanges, plot = TRUE)
[Figure: 2 x 2 panel of PACFs: Series qdata$P, Series qdata$Pgrowth, Series qdata$R, Series qdata$Rchanges (x-axis: Lag; y-axis: Partial ACF)]
Based on the PACF plots, house prices (P) show a magnitude of time dependence similar to that of the real
interest rate (R). The PACF of house prices is negative at lags greater than 1, while the PACF of the real
interest rate oscillates between negative and positive values at lags greater than 1. However, the PACF of
changes in the real interest rate is generally larger in magnitude at each lag than the PACF of house-price
growth.
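One way to see what the PACF measures: the lag-k partial autocorrelation is the last coefficient of the best linear AR(k) predictor, and it can be computed from the autocorrelations with the Durbin-Levinson recursion. The sketch below implements that recursion (`pacf_dl()` is a hypothetical helper) and checks it against `pacf()` on a simulated AR(1) series with an assumed coefficient of 0.9. For an AR(1), the theoretical PACF equals the coefficient at lag 1 and is zero at all higher lags.

```r
# Durbin-Levinson recursion: partial autocorrelations phi_kk from the
# autocorrelations rho_1, ..., rho_K (lag 0 excluded).
pacf_dl <- function(rho) {
  K <- length(rho)
  out <- numeric(K)
  phi <- rho[1]            # AR(1) coefficients: phi_11 = rho_1
  out[1] <- rho[1]
  if (K >= 2) {
    for (k in 2:K) {
      # phi_kk = (rho_k - sum_j phi_{k-1,j} rho_{k-j}) / (1 - sum_j phi_{k-1,j} rho_j)
      pkk <- (rho[k] - sum(phi * rho[(k - 1):1])) /
             (1 - sum(phi * rho[1:(k - 1)]))
      # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
      phi <- c(phi - pkk * rev(phi), pkk)
      out[k] <- pkk
    }
  }
  out
}

set.seed(2)
x <- arima.sim(model = list(ar = 0.9), n = 200)
rho <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]     # drop lag 0
mine <- pacf_dl(rho)
builtin <- pacf(x, lag.max = 10, plot = FALSE)$acf[, 1, 1]
print(round(cbind(lag = 1:10, mine, builtin), 3))

# Theoretical PACF of an AR(1): the coefficient at lag 1, zero afterwards
print(ARMAacf(ar = 0.9, lag.max = 5, pacf = TRUE))
```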
ANNUAL DATA
adata <-read.table("Chapter4no1.txt", header=TRUE)
#data
par(mfrow=c(2,2))
Pacf <- acf(adata$P, plot=TRUE)
Pgrowthacf <- acf(adata$Pgrowth, plot = TRUE)
Racf <- acf(adata$R, plot=TRUE)
Rchangesacf <- acf(adata$Rchange, plot = TRUE)
[Figure: ACF plots for adata$P, adata$Pgrowth, adata$R, and adata$Rchange]

The ACFs of the annual data show similar time dependence for all four series: house prices, the real interest
rate, house-price growth, and changes in the real interest rate. The annual ACFs decay faster than the
quarterly ACFs. The annual and quarterly ACFs of all variables move in the same direction, except for
house-price growth, where the annual ACF can be negative at lags where its quarterly counterpart is positive.
Both the annual and the quarterly data show strong autocorrelations that decay over time and a large
first-order partial autocorrelation close to one.
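A simple idealization helps explain why the annual ACFs die out faster per lag. If a quarterly series were an AR(1) with coefficient phi and the annual series kept one observation every fourth quarter, the annual series would be an AR(1) with coefficient phi^4, so one annual lag spans four quarterly lags. The sketch below checks this with theoretical ACFs from `ARMAacf()`. The value phi = 0.9 is assumed, and real annual data are often averages rather than point samples, which changes the exact numbers.

```r
phi <- 0.9                    # assumed quarterly AR(1) coefficient
k_years <- 0:5                # annual lags

# Theoretical quarterly ACF, evaluated at lags 0, 4, 8, ... (one year apart)
acf_quarterly <- ARMAacf(ar = phi, lag.max = 4 * max(k_years))
quarterly_at_year_lags <- acf_quarterly[4 * k_years + 1]   # element 1 is lag 0

# Theoretical ACF of the every-fourth-quarter series: an AR(1) with phi^4
acf_annual <- ARMAacf(ar = phi^4, lag.max = max(k_years))

print(round(rbind(quarterly_at_year_lags, acf_annual), 4))

# Per lag, the annual ACF decays like (phi^4)^k instead of phi^k
print(c(quarterly_lag1 = phi, annual_lag1 = phi^4))
```

Measured in calendar time the two ACFs coincide; they differ only because each annual lag covers a full year.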
par(mfrow=c(2,2))
Ppacf <- pacf(adata$P, plot=TRUE)
Pgrowthpacf <- pacf(adata$Pgrowth, plot = TRUE)
Rpacf <- pacf(adata$R, plot=TRUE)
Rchangespacf <- pacf(adata$Rchange, plot = TRUE)
[Figure: PACF plots for adata$P, adata$Pgrowth, adata$R, and adata$Rchange]

The PACFs of house prices and real interest rates show a similar magnitude of time dependence in both the
quarterly and the annual data. However, the annual PACFs oscillate more between negative and positive
values, and these oscillations are generally larger in magnitude.
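One plausible contributor to the larger oscillations in the annual PACFs is sample size: annual data over the same period have roughly a quarter as many observations, and the sampling noise in a sample PACF shrinks roughly like 1/sqrt(n). The sketch below illustrates this with simulated AR(1) series, whose true PACF is zero beyond lag 1, so any nonzero value there is noise. The coefficient 0.9 is assumed, 126 is the number of quarterly rows implied by the OBS summary of `qdata`, and 32 annual observations is a hypothetical stand-in.

```r
set.seed(42)
phi <- 0.9           # assumed AR(1) coefficient for both frequencies
n_quarterly <- 126   # rows in qdata: 6 listed OBS levels plus 120 "(Other)"
n_annual <- 32       # hypothetical annual sample, roughly 126 / 4

# Average absolute sample PACF at lags 2 to 5 over many simulated AR(1) samples;
# the true PACF at these lags is zero, so this measures pure sampling noise
mean_abs_pacf <- function(n, reps = 500, lags = 2:5) {
  vals <- replicate(reps, {
    x <- arima.sim(model = list(ar = phi), n = n)
    abs(pacf(x, lag.max = max(lags), plot = FALSE)$acf[lags])
  })
  mean(vals)
}

# Half-width of the dashed 95% bands that pacf() draws: qnorm(0.975) / sqrt(n)
band <- function(n) qnorm(0.975) / sqrt(n)

noise <- c(quarterly = mean_abs_pacf(n_quarterly), annual = mean_abs_pacf(n_annual))
bands <- c(quarterly = band(n_quarterly), annual = band(n_annual))
print(round(rbind(noise, bands), 3))
```

If the annual file really is about a quarter the length of the quarterly one, PACF bars of the size seen in the annual plots may partly reflect this extra sampling noise rather than different dynamics.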