Problem 1
1.
a)
truehist(trt,col ="gray", xlab="0 = PSID, 1 = NSW", ylab="Fraction", main="Histogram of Count of The Stu
lines(density(trt),col="blue", lwd=2)
[Figure: histogram of trt with density overlay. x-axis: 0 = PSID, 1 = NSW; y-axis: Fraction.]
truehist(age, col ="gray", xlab="Age", ylab="Fraction", main="Histogram of Age of Participants", prob=TR
lines(density(age), col="blue", lwd=2)
[Figure: Histogram of Age of Participants. x-axis: Age; y-axis: Fraction.]
truehist(educ, col ="gray",xlab="Years of Education", ylab="Fraction", main="Histogram of Years of Educa
lines(density(educ), col="blue", lwd=2)
[Figure: histogram of educ with density overlay. x-axis: Years of Education; y-axis: Fraction.]
truehist(black, col ="gray",xlab="0 = not black, 1 = black", ylab="Fraction", main="Histogram of Black P
lines(density(black), col="blue", lwd=2)
[Figure: Histogram of Black Participants. y-axis: Fraction.]
[Figure: Histogram of Married Participants. y-axis: Fraction.]
[Figure: Histogram of Real Earnings in 1974. y-axis: Fraction.]
[Figure: Histogram of Real Earnings in 1978. y-axis: Fraction.]
##
## Call:
## lm(formula = re78 ~ trt + age + educ + black + hisp + marr +
## nodeg + re74 + re75)
##
## Residuals:
## Min 1Q Median 3Q Max
## -64870 -4302 -435 3786 110412
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -129.74276 1688.51706 -0.077 0.9388
## trt 751.94643 915.25723 0.822 0.4114
## age -83.56559 20.81380 -4.015 6.11e-05 ***
## educ 592.61020 103.30278 5.737 1.07e-08 ***
## black -570.92797 495.17772 -1.153 0.2490
## hisp 2163.28118 1092.29036 1.981 0.0478 *
## marr 1240.51952 586.25391 2.116 0.0344 *
## nodeg 590.46695 646.78417 0.913 0.3614
## re74 0.27812 0.02792 9.960 < 2e-16 ***
## re75 0.56809 0.02756 20.613 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10070 on 2665 degrees of freedom
## Multiple R-squared: 0.5864, Adjusted R-squared: 0.585
## F-statistic: 419.8 on 9 and 2665 DF, p-value: < 2.2e-16
The VIF of every variable is below 5, ranging from a minimum of 1.04506 for hisp to a maximum of 3.871331 for re74. We
conclude that none of the variables shows strong collinearity, so we do not omit any of them. The statistically
significant variables (i.e. p-value below 0.05) are age, educ, hisp, marr, re74 and re75. The remaining
variables (trt, black and nodeg) are not statistically significant at the 5% significance level. The regression has an
adjusted R-squared of 0.585. Based on the F-statistic, the regression is jointly statistically significant at the 1%
significance level.
anova(fullmodel)
## Start: AIC=49323.01
## re78 ~ trt + age + educ + black + hisp + marr + nodeg + re74 +
## re75
##
## Df Sum of Sq RSS AIC
## - trt 1 6.8456e+07 2.7035e+11 49322
## - nodeg 1 8.4527e+07 2.7037e+11 49322
## - black 1 1.3482e+08 2.7042e+11 49322
## <none> 2.7028e+11 49323
## - hisp 1 3.9781e+08 2.7068e+11 49325
## - marr 1 4.5411e+08 2.7074e+11 49325
## - age 1 1.6348e+09 2.7192e+11 49337
## - educ 1 3.3376e+09 2.7362e+11 49354
## - re74 1 1.0061e+10 2.8034e+11 49419
## - re75 1 4.3093e+10 3.1338e+11 49717
##
## Step: AIC=49321.68
## re78 ~ age + educ + black + hisp + marr + nodeg + re74 + re75
##
## Df Sum of Sq RSS AIC
## - black 1 1.0495e+08 2.7046e+11 49321
## - nodeg 1 1.0708e+08 2.7046e+11 49321
## <none> 2.7035e+11 49322
## - marr 1 3.8715e+08 2.7074e+11 49324
## - hisp 1 4.2143e+08 2.7077e+11 49324
## - age 1 1.7112e+09 2.7206e+11 49337
## - educ 1 3.4257e+09 2.7378e+11 49353
## - re74 1 1.0011e+10 2.8036e+11 49417
## - re75 1 4.3032e+10 3.1338e+11 49715
##
## Step: AIC=49320.72
## re78 ~ age + educ + hisp + marr + nodeg + re74 + re75
##
## Df Sum of Sq RSS AIC
## - nodeg 1 8.9778e+07 2.7055e+11 49320
## <none> 2.7046e+11 49321
## - marr 1 4.7535e+08 2.7093e+11 49323
## - hisp 1 5.1045e+08 2.7097e+11 49324
## - age 1 1.6466e+09 2.7210e+11 49335
## - educ 1 3.7157e+09 2.7417e+11 49355
## - re74 1 1.0049e+10 2.8051e+11 49416
## - re75 1 4.3409e+10 3.1387e+11 49717
##
## Step: AIC=49319.61
## re78 ~ age + educ + hisp + marr + re74 + re75
##
## Df Sum of Sq RSS AIC
## <none> 2.7055e+11 49320
## - marr 1 4.3805e+08 2.7098e+11 49322
## - hisp 1 5.0861e+08 2.7105e+11 49323
## - age 1 1.5918e+09 2.7214e+11 49333
## - educ 1 5.9169e+09 2.7646e+11 49375
## - re74 1 1.0021e+10 2.8057e+11 49415
## - re75 1 4.3372e+10 3.1392e+11 49715
##
## Call:
## lm(formula = re78 ~ age + educ + hisp + marr + re74 + re75)
##
## Coefficients:
## (Intercept) age educ hisp marr
## 517.8421 -81.4208 547.7177 2405.4974 1124.5932
## re74 re75
## 0.2773 0.5682
The model with the lowest AIC includes the variables age, educ, hisp, marr, re74 and re75.
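The AIC values printed in the stepwise output can be checked by hand. The following is a minimal sketch, assuming the linear-model form n * log(RSS / n) + 2 * p that extractAIC() reports for lm fits, and using the rounded RSS values from the output above. The helper name step_aic is hypothetical.

```r
# Sketch: reproduce the AIC values printed in the stepwise output above.
# Assumes the lm form used by extractAIC(): n * log(RSS / n) + 2 * p,
# where p counts the estimated coefficients including the intercept.
step_aic <- function(rss, n, p) n * log(rss / n) + 2 * p

n <- 2675  # 2665 residual df + 10 coefficients in the full model

aic_full   <- step_aic(rss = 2.7028e11, n = n, p = 10)  # full model
aic_chosen <- step_aic(rss = 2.7055e11, n = n, p = 7)   # final model

print(round(c(full = aic_full, chosen = aic_chosen), 1))
```

Because the RSS values are rounded in the printed output, the results should agree with the reported 49323.01 and 49319.61 only approximately.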
c)
ss=regsubsets(re78~trt + age + educ + black + hisp + marr + nodeg + re74 + re75,data=nsw74psid1, nbest=3
summary(ss)
[Figure: Mallows CP. x-axis: Subset Size; y-axis: Statistic: cp. Each point is labeled with the abbreviations of the variables in that subset, as listed below.]
## Abbreviation
## trt t
## age a
## educ e
## black b
## hisp h
## marr m
## nodeg n
## re74 r74
## re75 r75
We observe that the regression with the lowest Cp statistic is the one with six predictors: age, educ, hisp,
marr, re74 and re75. We proceed with this model, since the model with the lowest Cp statistic is our
preferred choice.
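As a rough check on the Cp comparison, Mallows' Cp can be computed from residual sums of squares. This is a sketch under the standard definition Cp = RSS_p / sigma2_full - n + 2p, using the rounded RSS values from the stepwise output earlier in this problem. The helper name mallows_cp is hypothetical, and the figures are approximate because of the rounding.

```r
# Sketch: Mallows' Cp from residual sums of squares.
# Cp = RSS_p / sigma2_full - n + 2 * p, where sigma2_full is the residual
# variance of the full model and p counts coefficients incl. the intercept.
mallows_cp <- function(rss_p, p, rss_full, p_full, n) {
  sigma2_full <- rss_full / (n - p_full)
  rss_p / sigma2_full - n + 2 * p
}

n <- 2675
# Rounded RSS values taken from the stepwise output (full and final models).
cp_full   <- mallows_cp(2.7028e11, 10, 2.7028e11, 10, n)
cp_chosen <- mallows_cp(2.7055e11, 7,  2.7028e11, 10, n)

print(c(full = cp_full, chosen = cp_chosen))
```

By construction the full model's Cp equals its number of coefficients, so a smaller model is attractive when its Cp falls below that value.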
d)
chosenmodel <- lm(re78~age + educ + hisp + marr + re74 + re75)
summary(chosenmodel)
##
## Call:
## lm(formula = re78 ~ age + educ + hisp + marr + re74 + re75)
##
## Residuals:
## Min 1Q Median 3Q Max
## -64914 -4316 -498 3722 110303
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 517.84209 1182.98921 0.438 0.6616
## age -81.42078 20.55047 -3.962 7.63e-05 ***
## educ 547.71773 71.70329 7.639 3.04e-14 ***
## hisp 2405.49742 1074.08566 2.240 0.0252 *
## marr 1124.59319 541.07870 2.078 0.0378 *
## re74 0.27729 0.02789 9.941 < 2e-16 ***
## re75 0.56817 0.02747 20.681 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10070 on 2668 degrees of freedom
## Multiple R-squared: 0.586, Adjusted R-squared: 0.585
## F-statistic: 629.4 on 6 and 2668 DF, p-value: < 2.2e-16
plot(chosenmodel$fit,chosenmodel$res, col="skyblue4",lwd=10, main="Residuals vs. Fitted Values of Chosen
abline(h=0,lwd=2,col = "red3")
[Figure: residuals vs. fitted values of the chosen model, with a horizontal line at zero. x-axis: chosenmodel$fit.]
The magnitude of the residuals increases somewhat as the fitted values increase. This indicates a pattern in
the residuals of the chosen model and might imply heteroskedasticity. Even so, approximately half of the
points lie on or above the zero line.
e)
vif(chosenmodel)
[Figure: VIF of The Variables in The Chosen Model. x-axis: Index (1 to 6); y-axis: vif(chosenmodel), from about 1.0 to 3.5.]
Here the variable with the highest VIF is re75, which suggests it is the most likely to be collinear with the
other predictors. However, the VIFs of all variables are below 5, so none of them indicates strong collinearity.
The higher VIFs of the earnings variables make sense, since real earnings in different years are likely to move
in the same direction.
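To make the VIF numbers concrete, here is a minimal sketch that computes the VIF by hand as 1 / (1 - R2_j), where R2_j comes from regressing predictor j on the other predictors. The data are simulated (not the NSW/PSID data) and the helper name vif_by_hand is hypothetical.

```r
# Sketch: variance inflation factor computed by hand on simulated data.
# VIF_j = 1 / (1 - R2_j), with R2_j from regressing predictor j on the rest.
set.seed(144)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.86 * x1 + sqrt(1 - 0.86^2) * rnorm(n)  # strongly correlated with x1
x3 <- rnorm(n)                                  # unrelated predictor

vif_by_hand <- function(target, others) {
  d  <- data.frame(target = target, others)
  r2 <- summary(lm(target ~ ., data = d))$r.squared
  1 / (1 - r2)
}

vifs <- c(
  x1 = vif_by_hand(x1, data.frame(x2, x3)),
  x2 = vif_by_hand(x2, data.frame(x1, x3)),
  x3 = vif_by_hand(x3, data.frame(x1, x2))
)
print(round(vifs, 2))

# With only two predictors the VIF depends on their correlation alone.
# 0.86 is the re74-re75 correlation reported in part f).
pairwise_vif <- 1 / (1 - 0.86^2)
print(round(pairwise_vif, 2))
```

The pairwise figure of about 3.84 is close to the largest VIFs reported for the earnings variables, which suggests most of their inflation comes from the correlation between re74 and re75.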
f)
library(corrplot)
chosenmodelcorr = cor(nsw74psid1)
corrplot(chosenmodelcorr, method="number", main="Correlation Plot")
[Figure: Correlation Plot. Values recovered from the figure:]
         trt    age   educ  black   hisp   marr  nodeg   re74   re75   re78
trt     1.00  -0.22  -0.15   0.33   0.04  -0.45   0.22  -0.32  -0.32  -0.25
age    -0.22   1.00  -0.19  -0.10   0.00   0.27   0.17   0.25   0.22   0.11
educ   -0.15  -0.19   1.00  -0.34  -0.09   0.05  -0.76   0.33   0.35   0.37
black   0.33  -0.10  -0.34   1.00  -0.12  -0.25   0.32  -0.29  -0.29  -0.27
hisp    0.04   0.00  -0.09  -0.12   1.00   0.01   0.06  -0.03  -0.05  -0.01
marr   -0.45   0.27   0.05  -0.25   0.01   1.00  -0.10   0.28   0.27   0.22
nodeg   0.22   0.17  -0.76   0.32   0.06  -0.10   1.00  -0.28  -0.29  -0.30
re74   -0.32   0.25   0.33  -0.29  -0.03   0.28  -0.28   1.00   0.86   0.70
re75   -0.32   0.22   0.35  -0.29  -0.05   0.27  -0.29   0.86   1.00   0.74
re78   -0.25   0.11   0.37  -0.27  -0.01   0.22  -0.30   0.70   0.74   1.00
The variables are mostly weakly correlated with each other, some positively and some negatively. One notable
correlation is that no-degree status is strongly negatively correlated with years of education (-0.76),
which makes sense since a person without a degree is unlikely to have spent many years in education. Another is
that re75 is strongly positively correlated with re74 (0.86). This also makes sense, since the higher a person's
earnings in the previous year, the more likely that person is to have high earnings in the current year. The same
argument applies to the strong positive correlations between re78 and re74 (0.70) and between re78 and re75 (0.74).
g)
chosenmodel_cook=cooks.distance(chosenmodel)
plot(chosenmodel_cook,ylab="Cook's distance",type='o',main="Cook's Distance Plot",col="skyblue4")
[Figure: Cook's Distance Plot. x-axis: Index; y-axis: Cook's distance.]
There seems to be an outlier with the highest Cook's distance of 0.25. Two other observations, with Cook's
distances of 0.15 and approximately 0.14, might also be potential outliers. We might want to omit them from
our regression model.
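Rather than reading influential points off the plot, the largest Cook's distances can be ranked directly and the model refitted without the worst one. The sketch below uses simulated data with one planted outlier. It is an illustration only, not the chosen model.

```r
# Sketch: flag influential observations with Cook's distance on simulated data.
set.seed(1)
x <- c(1:29, 60)        # the last point has high leverage
y <- 2 * x + rnorm(30)
y[30] <- y[30] - 80     # and a large residual: a planted outlier

fit <- lm(y ~ x)
cd  <- cooks.distance(fit)

# Rank observations by influence instead of reading them off the plot.
top3 <- head(sort(cd, decreasing = TRUE), 3)
print(round(top3, 3))

# Refit without the most influential point and compare the coefficients.
worst  <- which.max(cd)
fit_wo <- lm(y ~ x, subset = -worst)
print(rbind(with = coef(fit), without = coef(fit_wo)))
```

Comparing the two coefficient rows shows how much a single influential observation can move the fit, which is the reason for considering its removal.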
h)
truehist(chosenmodel$res,col="skyblue3",xlab="Residuals",ylab="Fraction",main="Histogram of Residuals")
[Figure: Histogram of Residuals. x-axis: Residuals; y-axis: Fraction.]
i)
qqnorm(chosenmodel$res,col="skyblue4", main="QQ Normal Plot")
qqline(chosenmodel$res, col=2)
[Figure: QQ Normal Plot. x-axis: Theoretical Quantiles; y-axis: Sample Quantiles.]
#abline(0,1,col="red",lwd=2,lty=2)
We expect the points to fall along the reference line if the residuals are normally distributed. The QQ plot
shows a curve with some parts above the reference line and some parts below it, but it follows the overall
trend of the line.
j)
plot(chosenmodel$fit, re78,col="skyblue4", xlab="Predicted Response",ylab="Observed Response",main="Obse
lines(lowess(chosenmodel$fit, re78),lwd=2)
abline(0,1,col="red",lwd=2,lty=2)
legend(150,110000, c(expression(y[obs]==y[pred]), "Lowess Smoother"), fill =c("red", "black"),cex=1,bty=
[Figure: Observed vs. Predicted Response. x-axis: Predicted Response; y-axis: Observed Response. Legend: y_obs = y_pred, Lowess Smoother.]
setwd("/Users/AldaSiamipar/Documents/Econ144/Data")
library(TTR)
library(lattice)
df <- read.table("Chapter2no2.txt", header = TRUE)
summary(df)
[Figure: Histogram of Growth in GDP. x-axis: Growth in GDP; y-axis: Fraction.]
[Figure: histogram of S&P500 Returns. x-axis: S&P500 Returns.]
#timeseries2
plot.ts(timeseries2, main = "Time Series of Growth in US GDP and S&P500 Returns")
[Figure: Time Series of Growth in US GDP and S&P500 Returns. x-axis: Time.]
acf(df$GRGDP, plot=TRUE)
[Figure: Series df$GRGDP. x-axis: Lag; y-axis: ACF.]
acf(df$RETURN, plot=TRUE)
[Figure: Series df$RETURN. x-axis: Lag; y-axis: ACF.]
The time series plots of both growth in US GDP and S&P500 returns show first-order weak stationarity, as
both series cross their respective means (which are constant in time) very often. There are hints of seasonality,
but any seasonality appears weak and inconsistent: the magnitudes of the dips vary, as does the
length from one peak to the next. The ACF plots of the series also suggest that the variance does not vary
much with time, which is consistent with covariance stationarity.
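To show what the ACF plots distinguish, here is a minimal sketch comparing the sample autocorrelations of simulated white noise and a simulated persistent AR(1) series. Neither series is the GDP or S&P500 data, and the AR coefficient of 0.8 is an arbitrary choice for illustration.

```r
# Sketch: sample autocorrelations for simulated series.
set.seed(144)
n <- 200
white_noise <- rnorm(n)
ar1 <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))  # persistent series

acf_wn  <- acf(white_noise, lag.max = 5, plot = FALSE)$acf[, 1, 1]
acf_ar1 <- acf(ar1,         lag.max = 5, plot = FALSE)$acf[, 1, 1]

# Lag 0 is always 1; the first row of the printed matrix is the lag index.
print(round(rbind(lag = 0:5, white_noise = acf_wn, ar1 = acf_ar1), 2))

# Approximate 95% band that acf() draws under the white-noise null.
band <- qnorm(0.975) / sqrt(n)
print(round(band, 3))
```

Autocorrelations that stay inside the band at most lags, as in the white-noise row, indicate weak time dependence.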
setwd("/Users/AldaSiamipar/Documents/Econ144/Data")
df3 <- read.table("Chapter3no1.txt", header = TRUE)
summary(df3)
logtimeseries3 <- log(timeseries3)
plot.ts(logtimeseries3, main="Time Series of Log Transform of Consumption Growth and Income Growth")
[Figure: Time Series of Log Transform of Consumption Growth and Income Growth. Panels: date, rpce, rdpi; x-axis: Time.]
difflogtimeseries3 <- diff(logtimeseries3)
plot.ts(difflogtimeseries3, main="Time Series of Difference of Log Transform of Consumption Growth and I
[Figure: Time Series of Difference of Log Transform of Consumption Growth and Income Growth. Panels: date, rpce, rdpi; x-axis: Time.]
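The diff(log(...)) transform used above is read as a growth rate because log differences approximate period-over-period percentage changes when the changes are small. A minimal sketch with a made-up level series (the numbers are illustrative, not from the data set):

```r
# Sketch: log differences approximate period-over-period growth rates.
# The level series here is made up for illustration.
level <- c(100, 101, 103, 102, 105)

log_diff   <- diff(log(level))               # what diff(log(x)) computes
pct_growth <- diff(level) / head(level, -1)  # exact simple growth rate

print(round(cbind(log_diff, pct_growth), 4))

# The two agree closely when changes are small.
max_gap <- max(abs(log_diff - pct_growth))
print(max_gap)
```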
a) Permanent income theory states that agents spread consumption over their lifetimes: a person's consumption
at a point in time is determined not just by their current income but also by their expected income in future
years. We see that consumption growth is less volatile than income growth. This agrees with the permanent
income model, since large fluctuations in income growth do not correspond to equally large fluctuations in
consumption growth when consumption does not depend only on current income.
b)
#difflogtimeseries3[,2]
m = lm(difflogtimeseries3[,2]~difflogtimeseries3[,3])
summary(m)
##
## Call:
## lm(formula = difflogtimeseries3[, 2] ~ difflogtimeseries3[, 3])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0303967 -0.0029727 0.0001525 0.0030417 0.0244545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0022542 0.0002242 10.055 < 2e-16 ***
## difflogtimeseries3[, 3] 0.1745671 0.0292029 5.978 3.77e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.005318 on 637 degrees of freedom
## Multiple R-squared: 0.05312, Adjusted R-squared: 0.05163
## F-statistic: 35.73 on 1 and 637 DF, p-value: 3.768e-09
Both the intercept and the coefficient on disposable income growth are statistically significant at the 1%
significance level, since their p-values are sufficiently small. The slope coefficient indicates that for every one
percentage point increase in the growth of disposable income, the growth of consumption increases by about 0.175
percentage points, which agrees with the permanent income hypothesis. The R-squared is 0.05312, which means that
variation in the growth of disposable income explains roughly 5% of the variation in the growth of consumption,
so our regression model does not fit very well.
c)
library(dyn)
library(xts)
# lag.xts() takes the shift as `k`; k=1 gives the previous period's value
lagged <- lag.xts(difflogtimeseries3[,3], k=1)
#lagged
#difflogtimeseries3[,3]
mpluslag = lm(difflogtimeseries3[,2]~difflogtimeseries3[,3] + lagged)
summary(mpluslag)
##
## Call:
## lm(formula = difflogtimeseries3[, 2] ~ difflogtimeseries3[, 3] +
## lagged)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0300734 -0.0028702 -0.0000006 0.0029797 0.0255131
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0019887 0.0002405 8.268 7.98e-16 ***
## difflogtimeseries3[, 3] 0.1872692 0.0293883 6.372 3.58e-10 ***
## lagged 0.0828596 0.0293877 2.820 0.00496 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.005285 on 635 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.06479, Adjusted R-squared: 0.06185
## F-statistic: 22 on 2 and 635 DF, p-value: 5.8e-10
The newly added lagged variable is statistically significant at the 1% significance level. Its slope coefficient
indicates that a one percentage point change in last period's growth of disposable income leads to a change of
about 0.08 percentage points in the growth of consumption. The R-squared is higher, with an adjusted R-squared
of 0.06185. The total marginal effect of income growth on consumption growth is about 0.27 (0.187 + 0.083), which
is larger than in the regression in b). We may want to keep the t-1 lag and experiment with additional lags when
estimating this regression.
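The total marginal effect quoted above is the sum of the coefficients on current and lagged income growth. A short sketch of the arithmetic, with the coefficient values copied from the two regression outputs (the variable names are illustrative):

```r
# Sketch: total effect as the sum of the income-growth coefficients.
b_current <- 0.1872692  # coefficient on current income growth, from the output in c)
b_lagged  <- 0.0828596  # coefficient on lagged income growth, from the output in c)
total_effect <- b_current + b_lagged
print(round(total_effect, 4))

# For comparison, the single-regressor slope estimated in part b).
b_simple <- 0.1745671
print(total_effect > b_simple)
```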
setwd("/Users/AldaSiamipar/Documents/Econ144/Data")
df4a <- read.table("Chapter3no3a.txt", header = TRUE)
summary(df4a)
## date rgdp
## 1/1/00 : 1 Min. : 1766
## 1/1/01 : 1 1st Qu.: 3178
## 1/1/02 : 1 Median : 5821
## 1/1/03 : 1 Mean : 6522
## 1/1/04 : 1 3rd Qu.: 9176
## 1/1/05 : 1 Max. :13491
## (Other):255
ts3a <-ts(df4a, frequency=4, start=c(1947,1))
plot.ts(ts3a, main="Time Series of U.S. Real GDP")
[Figure: Time Series of U.S. Real GDP. Panels: date, rgdp; x-axis: Time.]
logts3a<- log(ts3a)
plot.ts(logts3a, main="Time Series of Log Transform of U.S. Real GDP")
[Figure: Time Series of Log Transform of U.S. Real GDP. Panel: rgdp; x-axis: Time.]
difflogts3a <- diff(logts3a)
plot.ts(difflogts3a, main="Difference of Log Transform of U.S. Real GDP")
[Figure: Difference of Log Transform of U.S. Real GDP. Panels: date, rgdp; x-axis: Time.]
a) Exact Definition: Real GDP (Gross Domestic Product) is the inflation-adjusted value of the goods and
services produced by factors of production located in the United States.
Periodicity/Frequency: quarterly (1947Q1 - 2012Q1).
Units: y is in billions of chained 2005 dollars; x is in quarterly increments of time from January 1947 to
January 2012.
Q: Is the underlying stochastic process first and second order weakly stationary?
A: The plot of RGDP against time shows that the underlying stochastic process is neither first-order nor
second-order weakly stationary. We observe occasional dips and peaks that are indicative of recessions
and booms, and the mean differs across time. But when we take the difference of the log
transformation of the time series, the underlying stochastic process appears to be first- and second-order
weakly stationary: the plot often crosses a common mean, and the variance does not appear to depend on
time, as we cannot really discern a pattern in the fluctuations.
setwd("/Users/AldaSiamipar/Documents/Econ144/Data")
df4b <- read.table("Chapter3no3b.txt", header = TRUE)
summary(df4b)
## DATE jpy_usd
## 1/10/00: 1 Min. : 75.72
## 1/10/01: 1 1st Qu.:110.09
## 1/10/02: 1 Median :130.34
## 1/10/03: 1 Mean :168.75
## 1/10/05: 1 3rd Qu.:235.64
## 1/10/06: 1 Max. :358.44
## (Other):10385
#df4b
[Figure: Time Series of Japan/U.S. Foreign Exchange Rate. Panels: DATE, jpy_usd; x-axis: Time.]
logts3b<- log(ts3b)
plot.ts(logts3b, main="Time Series of Log Transform of Japan/U.S. Foreign Exchange Rate")
[Figure: Time Series of Log Transform of Japan/U.S. Foreign Exchange Rate. Panel: jpy_usd; x-axis: Time.]
difflogts3b <- diff(logts3b)
plot.ts(difflogts3b, main="Time Series of Difference of Log Transform of Japan/U.S Foreign Exchange Rate
[Figure: Time Series of Difference of Log Transform of Japan/U.S Foreign Exchange Rate. Panels: DATE, jpy_usd; x-axis: Time.]
b) Exact Definition: The Japan/U.S. foreign exchange rate is the buying rate of one U.S. dollar in terms of
Japanese yen.
Periodicity/Frequency: daily (1971-01-04 to 2012-06-01).
Units: y is in Japanese yen per U.S. dollar; x is in daily increments of time from 4 January 1971 to
1 June 2012.
Q: Is the underlying stochastic process first and second order weakly stationary?
A: The plot of the Japan/U.S. exchange rate against time shows that the underlying stochastic process is not
stationary; in fact it shows a downward trend. The exchange rate fluctuated in a few years, but overall it
consistently follows a downward trend. But when we take the difference of the log transformation of the time
series, the underlying stochastic process appears to be first- and second-order weakly stationary: the plot
often crosses a common mean, and the variance does not appear to depend on time, as we cannot really
discern a pattern in the fluctuations, although we observe more extreme fluctuations at the beginning of the
dataset.
setwd("/Users/AldaSiamipar/Documents/Econ144/Data")
df4c <- read.table("Chapter3no3c.txt", header = TRUE)
summary(df4c)
## DATE CMRate10Yr
## 1/1/01 : 1 Min. : 0.000
## 1/1/02 : 1 1st Qu.: 4.340
## 1/1/03 : 1 Median : 6.160
## 1/1/04 : 1 Mean : 6.407
## 1/1/07 : 1 3rd Qu.: 7.950
## 1/1/08 : 1 Max. :15.840
## (Other):13152
#df4c
ts3c <-ts(df4c, frequency=365.25, start=c(1962,1,2))
plot.ts(ts3c, main="Time Series of 10-Year Treasury Constant Maturity Rate")
[Figure: Time Series of 10-Year Treasury Constant Maturity Rate. Panels: DATE, CMRate10Yr; x-axis: Time.]
logts3c<- log(ts3c)
plot.ts(logts3c, main="Time Series of Log Transform of 10-Year Treasury Constant Maturity Rate")
[Figure: Time Series of Log Transform of 10-Year Treasury Constant Maturity Rate. Panel: CMRate10Yr; x-axis: Time.]
difflogts3c <- diff(logts3c)
plot.ts(difflogts3c, main="Time Series of Difference of Log Transform of 10-Year Treasury Constant Matur
[Figure: Time Series of Difference of Log Transform of 10-Year Treasury Constant Maturity Rate. Panels: DATE, CMRate10Yr; x-axis: Time.]
c) Exact Definition: Yields on 10-year treasury bonds of the U.S. government, adjusted to constant maturities.
Periodicity/Frequency: daily (1962-01-02 to 2012-06-07).
Units: y is the 10-Year Treasury Constant Maturity Rate in percent; x is in daily increments of time from
2 January 1962 to 7 June 2012.
Q: Is the underlying stochastic process first and second order weakly stationary?
A: The plot of the 10-Year Treasury Constant Maturity Rate against time shows that the underlying stochastic
process is not stationary. The rate seems to increase up to the late 1970s but shows a decreasing trend since then.
But when we take the difference of the log transformation of the time series, the underlying stochastic
process appears to be first- and second-order weakly stationary: the plot often crosses a common
mean, and the variance does not appear to depend on time, as we cannot really discern a pattern in the
fluctuations, although we observe more extreme fluctuations at the end of the dataset.
setwd("/Users/AldaSiamipar/Documents/Econ144/Data")
df4d <- read.table("Chapter3no3d.txt", header = TRUE)
summary(df4d)
## DATE unemrate
## 1/1/00 : 1 Min. : 2.500
## 1/1/01 : 1 1st Qu.: 4.600
## 1/1/02 : 1 Median : 5.600
## 1/1/03 : 1 Mean : 5.785
## 1/1/04 : 1 3rd Qu.: 6.800
## 1/1/05 : 1 Max. :10.800
## (Other):767
#df4d
ts3d <-ts(df4d, frequency=12, start=c(1948,1))
plot.ts(ts3d, main="Time Series of Civilian Unemployment Rate")
[Figure: Time Series of Civilian Unemployment Rate. Panels: DATE, unemrate; x-axis: Time.]
logts3d<- log(ts3d)
plot.ts(logts3d, main="Time Series of Log Transform of Civilian Unemployment Rate")
[Figure: Time Series of Log Transform of Civilian Unemployment Rate. Panel: unemrate; x-axis: Time.]
difflogts3d <- diff(logts3d)
plot.ts(difflogts3d, main="Time Series of Difference of Log Transform of Civilian Unemployment Rate")
[Figure: Time Series of Difference of Log Transform of Civilian Unemployment Rate. Panels: DATE, unemrate; x-axis: Time.]
d) Exact Definition: The unemployment rate represents the number of unemployed as a percentage of the
labor force. Labor force data are restricted to people 16 years of age and older, who currently reside in
1 of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental
facilities, homes for the aged), and who are not on active duty in the Armed Forces.
Periodicity/Frequency: monthly (12 observations per year).
Units: y is the Civilian Unemployment Rate in percent; x is in monthly increments of time starting from
January 1948.
Q: Is the underlying stochastic process first and second order weakly stationary?
A: The plot of the Civilian Unemployment Rate shows cycles/seasonality, with a slight upward trend. But when
we take the difference of the log transformation of the time series, the underlying stochastic process
appears to be first- and second-order weakly stationary: the plot often crosses a common mean, and
the variance does not appear to depend on time, as we cannot really discern a pattern in the fluctuations,
although we observe more extreme fluctuations at the beginning of the dataset, and the fluctuations seem to
diminish a bit over time.
## OBS P R Pgrowth
## 1980Q2 : 1 Min. : 42.50 Min. : 4.310 Min. :-5.3069
## 1980Q3 : 1 1st Qu.: 63.52 1st Qu.: 6.297 1st Qu.: 0.2814
## 1980Q4 : 1 Median : 77.51 Median : 7.816 Median : 1.0533
## 1981Q1 : 1 Mean : 90.26 Mean : 8.758 Mean : 0.8489
## 1981Q2 : 1 3rd Qu.:121.33 3rd Qu.:10.333 3rd Qu.: 1.7130
## 1981Q3 : 1 Max. :161.98 Max. :17.740 Max. : 4.3114
## (Other):120
## Rchanges
## Min. :-2.15667
## 1st Qu.:-0.32917
## Median :-0.10892
## Mean :-0.08032
## 3rd Qu.: 0.16950
## Max. : 1.61333
##
#qdata
qdatats <- ts(qdata,frequency=4, start=c(1980, 4))
plot.ts(qdatats)
[Figure: qdatats. Time series panels: OBS, P, R, Pgrowth, Rchanges; x-axis: Time.]
par(mfrow=c(2,2))
Pacf <- acf(qdata$P, plot=TRUE)
Pgrowthacf <- acf(qdata$Pgrowth, plot = TRUE)
Racf <- acf(qdata$R, plot=TRUE)
Rchangesacf <- acf(qdata$Rchanges, plot = TRUE)
[Figure: ACF plots in a 2x2 grid for qdata$P, qdata$Pgrowth, qdata$R and qdata$Rchanges. x-axis: Lag; y-axis: ACF.]
Based on the ACF plots, house prices (P) have stronger time dependence than real interest rates (R). The ACF
of the growth of house prices also reflects stronger time dependence than that of the changes in real interest
rates, though neither the growth of house prices nor the changes in real interest rates shows autocorrelations
of large magnitude.
par(mfrow=c(2,2))
Ppacf <- pacf(qdata$P, plot=TRUE)
Pgrowthpacf <- pacf(qdata$Pgrowth, plot = TRUE)
Rpacf <- pacf(qdata$R, plot=TRUE)
Rchangespacf <- pacf(qdata$Rchanges, plot = TRUE)
[Figure: Partial ACF plots in a 2x2 grid for qdata$P, qdata$Pgrowth, qdata$R and qdata$Rchanges. x-axis: Lag; y-axis: Partial ACF.]
Based on the PACF plots, house prices (P) have a similar magnitude of time dependence to real interest
rates (R). The PACF of house prices at lags greater than 1 is negative, while the PACF of real interest rates
at lags greater than 1 oscillates between negative and positive. However, the PACF of changes in interest
rates generally shows greater magnitudes at each lag than the PACF of growth in house prices.
ANNUAL DATA
setwd("/Users/AldaSiamipar/Documents/Econ144/Data")
adata <-read.table("Chapter4no1.txt", header=TRUE)
#data
par(mfrow=c(2,2))
Pacf <- acf(adata$P, plot=TRUE)
Pgrowthacf <- acf(adata$Pgrowth, plot = TRUE)
Racf <- acf(adata$R, plot=TRUE)
Rchangesacf <- acf(adata$Rchange, plot = TRUE)
[Figure: ACF plots in a 2x2 grid for adata$P, adata$Pgrowth, adata$R and adata$Rchange. x-axis: Lag; y-axis: ACF.]
[Figure: Partial ACF plots in a 2x2 grid; recovered panel titles: Series adata$P, Series adata$Pgrowth. x-axis: Lag; y-axis: Partial ACF.]