
School of Business

OPIM 5503 Data Analytics Using R


Session 5a

Maximum Likelihood Estimation (continued)


Multivariate Distributions
Seemingly Unrelated Regression (SUR)
In the previous sessions, we examined a number of univariate distributions. For example, for a univariate normal distribution:

x = rnorm(100,mean = 10, sd = 4)
dnorm(8,mean = 10,sd = 4)
[1] 0.08801633
plot(density(x))

A univariate normal distribution can be depicted as

N(\mu, \sigma^2)

A multivariate distribution is a joint probability distribution of two or more variables. For example, a bivariate normal distribution can be depicted as

N\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right)

To work with multivariate distributions, we need to use the package mvtnorm. Install and load the package.

install.packages("mvtnorm")
library(mvtnorm)

Suppose we want to create random data from the following joint distribution.

N\!\left( \begin{pmatrix} 4 \\ 10 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} \right)

The syntax is:

M = c(4,10)
S = matrix(c(2,1,1,3),nrow = 2,ncol = 2)
S
x = rmvnorm(1000,M,sigma = S)
head(x)
[,1] [,2]
[1,] 3.4458896 9.982241
[2,] 1.6570184 9.352386
[3,] 4.9145387 8.972307
[4,] 5.7328493 10.121716
[5,] 3.8460262 10.883015
[6,] 5.1941458 10.597796
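
As a quick sanity check (an optional aside), the sample mean vector and covariance matrix of the simulated draws should be close to M and S:

colMeans(x)   # should be close to c(4, 10)
cov(x)        # should be close to S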

If you want to evaluate the joint density at the values 4 and 9 for the two variables (for continuous variables this is a density, not a probability):

dmvnorm(c(4,9),mean = M,sigma = S)
[1] 0.05827419
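
To see exactly what dmvnorm() computes, here is the bivariate normal density evaluated by hand in base R (an added sketch; it reproduces the value above up to rounding):

# f(x) = exp(-(x - M)' S^(-1) (x - M) / 2) / sqrt((2*pi)^k * det(S)), with k = 2
z = c(4,9) - M                            # deviation from the mean vector
exp(-0.5 * t(z) %*% solve(S) %*% z) / sqrt((2*pi)^2 * det(S))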

To plot the estimated joint density, we need the function kde2d() from the MASS package, which computes a two-dimensional kernel density estimate over a grid.

library(MASS)
y = kde2d(x = x[,1],y=x[,2])
persp(y,col="red",theta = 30)
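
The kde2d() output also works with contour() for a top-down view of the same estimated density (an optional alternative to the perspective plot):

contour(y)   # contour plot of the estimated joint density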

Seemingly Unrelated Regression

Read the file hsb2.csv

hsb2 <- read.csv("C:/hsb2.csv")


head(hsb2)
id female race ses schtyp prog read write math science socst
1 70 male white low public general 57 52 41 47 57
2 121 female white middle public vocation 68 59 53 63 61
3 86 male white high public general 44 33 54 58 31
4 141 male white high public vocation 63 44 47 53 56
5 172 male white middle public academic 47 52 57 53 61
6 113 male white middle public academic 44 52 51 63 61
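
One caveat before running the regressions below (an added note): since R 4.0, read.csv() no longer converts text columns to factors by default, and as.numeric() applied to a character vector returns NA. If female, ses, etc. come in as character, convert them to factors first; the numeric codes then follow the default alphabetical level ordering.

hsb2$female = as.factor(hsb2$female)   # numeric codes follow alphabetical levels
hsb2$ses    = as.factor(hsb2$ses)      # e.g., high = 1, low = 2, middle = 3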

Suppose we want to build the following regression models.


\text{read} = \alpha_0 + \alpha_1(\text{female}) + \alpha_2(\text{ses}) + \alpha_3(\text{socst}) + \varepsilon_1
\text{math} = \beta_0 + \beta_1(\text{female}) + \beta_2(\text{ses}) + \beta_3(\text{science}) + \varepsilon_2
One can build these two regressions separately. As we discussed before, the error terms are specified as:

\varepsilon_1 \sim N(0, \sigma_1^2); \qquad \varepsilon_2 \sim N(0, \sigma_2^2)

reg1 = lm(read ~ as.numeric(hsb2$female) + as.numeric(hsb2$ses) + socst, data = hsb2)
summary(reg1)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.89628 3.87283 5.396 1.96e-07 ***
as.numeric(hsb2$female) 1.77089 1.13889 1.555 0.122
as.numeric(hsb2$ses) -0.88930 0.67222 -1.323 0.187
socst 0.58583 0.05373 10.903 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

reg2 = lm(math ~ as.numeric(hsb2$female) + as.numeric(hsb2$ses) + science, data = hsb2)
summary(reg2)

                        Estimate Std. Error t value Pr(>|t|)
(Intercept)             24.88929    3.41436   7.290 7.46e-12 ***
as.numeric(hsb2$female) -0.93640    1.04268  -0.898    0.370
as.numeric(hsb2$ses)    -0.77059    0.60771  -1.268    0.206
science                  0.59406    0.05303  11.202  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

An implicit assumption in the above is that the error terms of the two models are uncorrelated. But perhaps they are correlated. For example, a student who does better than expected in reading (positive error) may also do better than expected in math (positive error) because the student is smart overall; or perhaps the errors are negatively correlated (one who is good at math may not be good at reading, and vice versa). A better approach is to allow for the possibility of correlation between the two error terms, as shown below.

\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right)
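
A quick informal check (an added aside, using reg1 and reg2 estimated above) is to correlate the residuals of the two separate fits:

cor(resid(reg1), resid(reg2))   # rough indication of error correlation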

This approach is called seemingly unrelated regression. The following code provides the maximum
likelihood estimation for this specification.

library(bbmle)

LLSUR = function(a0,a1,a2,a3,b0,b1,b2,b3,s1,s2,s12)
{
  # a0..a3: coefficients of the read equation; b0..b3: the math equation
  y1 = a0 + a1*(as.numeric(hsb2$female)) + a2*(as.numeric(hsb2$ses)) + a3*(hsb2$socst)
  y2 = b0 + b1*(as.numeric(hsb2$female)) + b2*(as.numeric(hsb2$ses)) + b3*(hsb2$science)
  # residuals of the two equations
  e1 = hsb2$read - y1
  e2 = hsb2$math - y2
  # joint covariance matrix: s1, s2 are the error variances, s12 the covariance
  S = matrix(c(s1,s12,s12,s2), nrow = 2, ncol = 2)
  # sum of bivariate normal log-densities over all observations
  LLsum = sum(dmvnorm(cbind(e1,e2), mean = c(0,0), sigma = S, log = TRUE))
  return(-1*LLsum)   # mle2() expects the negative log-likelihood
}
res1 = mle2(minuslogl = LLSUR,
            start = list(a0 = mean(hsb2$read), a1 = 0, a2 = 0, a3 = 0,
                         b0 = mean(hsb2$math), b1 = 0, b2 = 0, b3 = 0,
                         s1 = 100, s2 = 100, s12 = cov(hsb2$read, hsb2$math)))
summary(res1)

Maximum likelihood estimation

Call:
mle2(minuslogl = LLSUR, start = list(a0 = mean(hsb2$read), a1 = 0,
    a2 = 0, a3 = 0, b0 = mean(hsb2$math), b1 = 0, b2 = 0, b3 = 0,
    s1 = 100, s2 = 100, s12 = cov(hsb2$read, hsb2$math)))

Coefficients:
Estimate Std. Error z value Pr(z)
a0 25.405590 4.108955 6.1830 6.290e-10 ***
a1 1.705661 1.133366 1.5050 0.132336
a2 -1.055350 0.670755 -1.5734 0.115631
a3 0.508460 0.058770 8.6518 < 2.2e-16 ***
b0 29.949253 3.761965 7.9611 1.706e-15 ***
b1 -0.668848 1.045196 -0.6399 0.522221
b2 -0.918586 0.609210 -1.5078 0.131597
b3 0.495192 0.061306 8.0774 6.615e-16 ***
s1 63.499317 6.428655 9.8775 < 2.2e-16 ***
s2 52.924588 5.424540 9.7565 < 2.2e-16 ***
s12 16.183306 5.338093 3.0317 0.002432 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing the results to the separate regressions, we see that the coefficients are somewhat different. Is there correlation between the two error terms? The implied correlation is \sigma_{12} / (\sigma_1 \sigma_2):

16.1833/(sqrt(63.499)*sqrt(52.9245))
[1] 0.2791613
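
Equivalently, the estimates can be pulled from the fitted object; bbmle provides a coef() method for mle2 objects:

est = coef(res1)
est["s12"] / sqrt(est["s1"] * est["s2"])   # implied error correlation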

This is yet another illustration of how the maximum likelihood principle allows you to design and estimate more complicated specifications.
