
School of Business

OPIM 5503 Data Analytics Using R


Session 5a

Maximum Likelihood Estimation (continued)


Multivariate Distributions
Seemingly Unrelated Regression (SUR)
In the previous sessions, we examined a number of univariate distributions. For example, for a univariate normal distribution:

x = rnorm(100,mean = 10, sd = 4)
dnorm(8,mean = 10,sd = 4)
[1] 0.08801633
plot(density(x))

A univariate normal distribution can be depicted as

N(\mu, \sigma^2)

A multivariate distribution is a joint probability distribution of two or more variables. For example, a bivariate normal distribution can be depicted as

N\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right)

To work with multivariate distributions, we need to use the package mvtnorm. Install and load the package.

install.packages("mvtnorm")
library(mvtnorm)

Suppose we want to create random data from the following joint distribution.

N\!\left( \begin{pmatrix} 4 \\ 10 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} \right)

The syntax is:

M = c(4,10)
S = matrix(c(2,1,1,3),nrow = 2,ncol = 2)
S
x = rmvnorm(1000,M,sigma = S)
head(x)
[,1] [,2]
[1,] 3.4458896 9.982241
[2,] 1.6570184 9.352386
[3,] 4.9145387 8.972307
[4,] 5.7328493 10.121716
[5,] 3.8460262 10.883015
[6,] 5.1941458 10.597796
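
As a quick sanity check (an optional aside), the sample mean vector and covariance matrix of the simulated draws should be close to M and S:

colMeans(x)   # should be close to c(4, 10)
cov(x)        # should be close to S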

If you want to evaluate the joint density at the values 4 and 9 for the two variables (for continuous variables this is a density, not a probability):

dmvnorm(c(4,9),mean = M,sigma = S)
[1] 0.05827419
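
To see exactly what dmvnorm() computes, here is the bivariate normal density evaluated by hand in base R (an added sketch; it reproduces the value above up to rounding):

# f(x) = exp(-(x - M)' S^(-1) (x - M) / 2) / sqrt((2*pi)^k * det(S)), with k = 2
z = c(4,9) - M                            # deviation from the mean vector
exp(-0.5 * t(z) %*% solve(S) %*% z) / sqrt((2*pi)^2 * det(S))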

To plot the estimated joint density, we need the function kde2d() from the MASS package, which computes a two-dimensional kernel density estimate over a grid.

library(MASS)
y = kde2d(x = x[,1],y=x[,2])
persp(y,col="red",theta = 30)
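
The kde2d() output also works with contour() for a top-down view of the same estimated density (an optional alternative to the perspective plot):

contour(y)   # contour plot of the estimated joint density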

Seemingly Unrelated Regression

Read the file hsb2.csv

hsb2 <- read.csv("C:/hsb2.csv")


head(hsb2)
id female race ses schtyp prog read write math science socst
1 70 male white low public general 57 52 41 47 57
2 121 female white middle public vocation 68 59 53 63 61
3 86 male white high public general 44 33 54 58 31
4 141 male white high public vocation 63 44 47 53 56
5 172 male white middle public academic 47 52 57 53 61
6 113 male white middle public academic 44 52 51 63 61
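
One caveat before running the regressions below (an added note): since R 4.0, read.csv() no longer converts text columns to factors by default, and as.numeric() applied to a character vector returns NA. If female, ses, etc. come in as character, convert them to factors first; the numeric codes then follow the default alphabetical level ordering.

hsb2$female = as.factor(hsb2$female)   # numeric codes follow alphabetical levels
hsb2$ses    = as.factor(hsb2$ses)      # e.g., high = 1, low = 2, middle = 3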

Suppose we want to build the following regression models.


\text{read} = \alpha_0 + \alpha_1(\text{female}) + \alpha_2(\text{ses}) + \alpha_3(\text{socst}) + \varepsilon_1
\text{math} = \beta_0 + \beta_1(\text{female}) + \beta_2(\text{ses}) + \beta_3(\text{science}) + \varepsilon_2
One can build these two regressions separately. As we discussed before, the error terms are specified as:

\varepsilon_1 \sim N(0, \sigma_1^2); \qquad \varepsilon_2 \sim N(0, \sigma_2^2)

reg1 = lm(read ~ as.numeric(hsb2$female) + as.numeric(hsb2$ses) + socst, data = hsb2)
summary(reg1)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.89628 3.87283 5.396 1.96e-07 ***
as.numeric(hsb2$female) 1.77089 1.13889 1.555 0.122
as.numeric(hsb2$ses) -0.88930 0.67222 -1.323 0.187
socst 0.58583 0.05373 10.903 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

reg2 = lm(math ~ as.numeric(hsb2$female) + as.numeric(hsb2$ses) + science, data = hsb2)
summary(reg2)

                        Estimate Std. Error t value Pr(>|t|)
(Intercept)             24.88929    3.41436   7.290 7.46e-12 ***
as.numeric(hsb2$female) -0.93640    1.04268  -0.898    0.370
as.numeric(hsb2$ses)    -0.77059    0.60771  -1.268    0.206
science                  0.59406    0.05303  11.202  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

An implicit assumption in the above is that the error terms of the two models are uncorrelated. But perhaps they are correlated. For example, a student who does better than expected in reading (positive error) may also do better than expected in math (positive error) because the student is smart overall; or perhaps the errors are negatively correlated (one who is good at math may not be good at reading, and vice versa). A better approach is to allow for the possibility of correlation between the two error terms, as shown below.

\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right)
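
A quick informal check (an added aside, using reg1 and reg2 estimated above) is to correlate the residuals of the two separate fits:

cor(resid(reg1), resid(reg2))   # rough indication of error correlation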

This approach is called seemingly unrelated regression. The following code provides the maximum
likelihood estimation for this specification.

library(bbmle)

LLSUR = function(a0,a1,a2,a3,b0,b1,b2,b3,s1,s2,s12)
{
  # a0..a3: coefficients of the read equation; b0..b3: the math equation
  y1 = a0 + a1*(as.numeric(hsb2$female)) + a2*(as.numeric(hsb2$ses)) + a3*(hsb2$socst)
  y2 = b0 + b1*(as.numeric(hsb2$female)) + b2*(as.numeric(hsb2$ses)) + b3*(hsb2$science)
  # residuals of the two equations
  e1 = hsb2$read - y1
  e2 = hsb2$math - y2
  # joint covariance matrix: s1, s2 are the error variances, s12 the covariance
  S = matrix(c(s1,s12,s12,s2), nrow = 2, ncol = 2)
  # sum of bivariate normal log-densities over all observations
  LLsum = sum(dmvnorm(cbind(e1,e2), mean = c(0,0), sigma = S, log = TRUE))
  return(-1*LLsum)   # mle2() expects the negative log-likelihood
}
res1 = mle2(minuslogl = LLSUR,
            start = list(a0 = mean(hsb2$read), a1 = 0, a2 = 0, a3 = 0,
                         b0 = mean(hsb2$math), b1 = 0, b2 = 0, b3 = 0,
                         s1 = 100, s2 = 100, s12 = cov(hsb2$read, hsb2$math)))
summary(res1)

Maximum likelihood estimation

Call:
mle2(minuslogl = LLSUR, start = list(a0 = mean(hsb2$read), a1 = 0,
    a2 = 0, a3 = 0, b0 = mean(hsb2$math), b1 = 0, b2 = 0, b3 = 0,
    s1 = 100, s2 = 100, s12 = cov(hsb2$read, hsb2$math)))

Coefficients:
Estimate Std. Error z value Pr(z)
a0 25.405590 4.108955 6.1830 6.290e-10 ***
a1 1.705661 1.133366 1.5050 0.132336
a2 -1.055350 0.670755 -1.5734 0.115631
a3 0.508460 0.058770 8.6518 < 2.2e-16 ***
b0 29.949253 3.761965 7.9611 1.706e-15 ***
b1 -0.668848 1.045196 -0.6399 0.522221
b2 -0.918586 0.609210 -1.5078 0.131597
b3 0.495192 0.061306 8.0774 6.615e-16 ***
s1 63.499317 6.428655 9.8775 < 2.2e-16 ***
s2 52.924588 5.424540 9.7565 < 2.2e-16 ***
s12 16.183306 5.338093 3.0317 0.002432 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing the results to the separate regressions, we see that the coefficients are somewhat different. Is there correlation between the two error terms? The implied correlation is \sigma_{12} / (\sigma_1 \sigma_2):

16.1833/(sqrt(63.499)*sqrt(52.9245))
[1] 0.2791613
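
Equivalently, the estimates can be pulled from the fitted object; bbmle provides a coef() method for mle2 objects:

est = coef(res1)
est["s12"] / sqrt(est["s1"] * est["s2"])   # implied error correlation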

This is yet another illustration of how the maximum likelihood principle allows you to design and estimate more complicated specifications.
