
Variable   Description
wage       Daily wage
female     Dummy variable = 1 if female
educ       Education in years
exper      Experience in months
expersq    Square of experience
tenure     Tenure in months
race       Race (1 = Black)

Descriptive Statistics:

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |       526    5.896103    3.693086        .53      24.98
      female |       526    .4790875     .500038          0          1
        educ |       526    12.56274    2.769022          0         18
       exper |       526    17.01711    13.57216          1         51
     expersq |       526    473.4354    616.0448          1       2601
      tenure |       526    5.104563    7.224462          0         44
        race |       526    .1026616    .3038053          0          1
----------------------------------------------------------------------

. bysort female: summ wage


-> female = 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |       274    7.099489    4.160858        1.5      24.98

----------------------------------------------------------------------
-> female = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |       252    4.587659    2.529363        .53      21.63

Our aim is to explain the wage differential between males and females.
We are trying to find out whether females are discriminated against in the workplace.
If they are, the average wage received by females should be less than that received by males.

. reg wage female


      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =   68.54
       Model |  828.220467     1  828.220467           Prob > F      =  0.0000
    Residual |  6332.19382   524  12.0843394           R-squared     =  0.1157
-------------+------------------------------           Adj R-squared =  0.1140
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.4763

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   -2.51183   .3034092    -8.28   0.000    -3.107878   -1.915782
       _cons |   7.099489   .2100082    33.81   0.000     6.686928     7.51205
------------------------------------------------------------------------------

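Writing the model as $wage = \beta_0 + \delta_0\,female + u$:
$E(wage \mid female=0) = \beta_0 = 7.0995$
$E(wage \mid female=1) = \beta_0 + \delta_0 = 7.0995 - 2.5118 = 4.5877$
$\delta_0 = E(wage \mid female=1) - E(wage \mid female=0) = -2.5118$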
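The equations above say that, in a regression on a single dummy, the intercept is the mean for the base group and the dummy coefficient is the difference in group means (which is why they match the `bysort female: summ wage` output). A minimal plain-Python sketch that checks this by computing OLS by hand; the data values are made up for illustration and are not the 526 observations above.

```python
# Minimal sketch: OLS of y on one 0/1 dummy plus an intercept.
# The data below are hypothetical, chosen only to illustrate the algebra.

def ols_on_dummy(y, d):
    """Return (intercept, slope) from regressing y on a 0/1 dummy d."""
    n = len(y)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    slope = sxy / sxx
    intercept = y_bar - slope * d_bar
    return intercept, slope

def group_mean(y, d, value):
    """Mean of y over observations whose dummy equals `value`."""
    vals = [yi for yi, di in zip(y, d) if di == value]
    return sum(vals) / len(vals)

wage = [8.0, 6.5, 7.5, 5.0, 4.0, 4.5]   # hypothetical wages
female = [0, 0, 0, 1, 1, 1]             # 1 = female

b0, d0 = ols_on_dummy(wage, female)
# Intercept equals the male (female = 0) mean.
print(b0, group_mean(wage, female, 0))
# Slope equals the female mean minus the male mean.
print(d0, group_mean(wage, female, 1) - group_mean(wage, female, 0))
```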
The wage difference may also be due to other factors, such as education.

. reg wage female educ


      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  2,   523) =   91.32
       Model |  1853.25304     2  926.626518           Prob > F      =  0.0000
    Residual |  5307.16125   523  10.1475359           R-squared     =  0.2588
-------------+------------------------------           Adj R-squared =  0.2560
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.1855

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -2.273362   .2790444    -8.15   0.000    -2.821547   -1.725176
        educ |   .5064521   .0503906    10.05   0.000     .4074592     .605445
       _cons |   .6228168   .6725334     0.93   0.355     -.698382    1.944016
------------------------------------------------------------------------------

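$wage = \beta_0 + \delta_0\,female + \beta_1\,educ + u$

$E(wage \mid female=0, educ) = \beta_0 + \beta_1\,educ$
$E(wage \mid female=1, educ) = (\beta_0 + \delta_0) + \beta_1\,educ$
So, interpret the coefficient $\delta_0$.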

Draw a diagram.
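Because female only shifts the intercept in this model, the fitted wage-education lines for men and women are parallel. A small sketch that tabulates both lines at a few education levels, using the coefficients printed above for `reg wage female educ`; the helper name `predicted_wage` is our own illustrative choice.

```python
# Coefficients copied from the `reg wage female educ` output above.
B0 = 0.6228168      # _cons
DELTA0 = -2.273362  # female
B1 = 0.5064521      # educ

def predicted_wage(educ, female):
    """Fitted wage from wage = b0 + delta0*female + b1*educ."""
    return B0 + DELTA0 * female + B1 * educ

for years in (8, 12, 16):
    male = predicted_wage(years, 0)
    fem = predicted_wage(years, 1)
    # The gap is identical at every education level: the two lines are parallel.
    print(years, round(male, 4), round(fem, 4), round(fem - male, 4))
```

Plotting these points gives two lines with common slope 0.5065 whose vertical distance is the female coefficient.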

Overall Model
. reg wage female educ exper expersq tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  5,   520) =   68.94
       Model |  2854.50963     5  570.901926           Prob > F      =  0.0000
    Residual |  4305.90466   520  8.28058589           R-squared     =  0.3987
-------------+------------------------------           Adj R-squared =  0.3929
       Total |  7160.41429   525  13.6388844           Root MSE      =  2.8776

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -1.790226   .2576917    -6.95   0.000    -2.296471   -1.283981
        educ |   .5300907   .0485881    10.91   0.000     .4346376    .6255439
       exper |   .2048418   .0344576     5.94   0.000     .1371486     .272535
     expersq |  -.0041266   .0007489    -5.51   0.000    -.0055978   -.0026553
      tenure |   .1336694   .0206325     6.48   0.000     .0931361    .1742027
       _cons |  -2.120092   .7120462    -2.98   0.003    -3.518933   -.7212513
------------------------------------------------------------------------------

What is the interpretation of the coefficients?
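Because exper enters together with its square, its coefficient cannot be read on its own: the estimated effect of one more unit of exper on wage is $\hat\beta_{exper} + 2\hat\beta_{expersq}\,exper$, which depends on the level of experience. A short sketch using the coefficients from the overall model above; the function name is ours.

```python
# Coefficients from `reg wage female educ exper expersq tenure` above.
B_EXPER = 0.2048418
B_EXPERSQ = -0.0041266

def marginal_effect_exper(exper):
    """d E(wage) / d exper = b_exper + 2 * b_expersq * exper."""
    return B_EXPER + 2 * B_EXPERSQ * exper

# Level of exper at which the estimated effect changes sign.
turning_point = -B_EXPER / (2 * B_EXPERSQ)

for x in (1, 10, 20, 30):
    print(x, round(marginal_effect_exper(x), 4))
print("turning point:", round(turning_point, 2))
```

The estimated return to experience shrinks as experience grows and turns negative at roughly 24.8 units of exper.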

Suppose the dependent variable is in logarithmic form. What is the interpretation of the coefficient on the dummy variable?

. reg lnwage female educ exper expersq tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  5,   520) =   79.77
       Model |  64.3854726     5  12.8770945           Prob > F      =  0.0000
    Residual |  83.9442788   520  .161431305           R-squared     =  0.4341
-------------+------------------------------           Adj R-squared =  0.4286
       Total |  148.329751   525   .28253286           Root MSE      =  .40179

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.2979067   .0359802    -8.28   0.000    -.3685911   -.2272222
        educ |   .0809586   .0067841    11.93   0.000     .0676309    .0942862
       exper |     .03281   .0048111     6.82   0.000     .0233583    .0422616
     expersq |   -.000648   .0001046    -6.20   0.000    -.0008535   -.0004426
      tenure |    .016215   .0028808     5.63   0.000     .0105555    .0218744
       _cons |   .4146365   .0994195     4.17   0.000     .2193233    .6099498
------------------------------------------------------------------------------
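With ln(wage) as the dependent variable, 100 times a dummy coefficient is only an approximate percentage difference. The exact proportional difference in predicted wage between women and men, holding the other regressors fixed, is $100\,[\exp(\hat\delta) - 1]$. A short sketch applying both formulas to the female coefficient above:

```python
import math

# Coefficient on female from `reg lnwage female educ exper expersq tenure` above.
delta = -0.2979067

approx_pct = 100 * delta                  # log approximation to the % difference
exact_pct = 100 * (math.exp(delta) - 1)   # exact % difference in predicted wage

print(f"approximate: {approx_pct:.2f}%  exact: {exact_pct:.2f}%")
```

For a coefficient of this size the approximation (about -29.8%) and the exact figure (about -25.8%) differ by roughly four percentage points, so the exact form is preferable when the coefficient is large in absolute value.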

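Suppose there are four categorical groups:

o Married male
o Married female
o Single female
o Single male

Suppose we want to find the effect of these four categories on the expected wage.
What is the dummy variable trap?
If an intercept and all four dummies are included, the dummies sum to one for every observation and are perfectly collinear with the constant. Therefore one of the dummies is dropped and serves as the base group (single male in the regression below).
What are the interpretations of the coefficients?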
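The trap can be seen directly by building one dummy per category: in every row exactly one dummy equals 1, so the four columns add up to the intercept column. A minimal sketch with invented category labels for six workers (the labels reuse the variable names from the regression; the data are hypothetical):

```python
# Hypothetical category labels for six workers (not the wage data).
groups = ["marr_Ml", "marr_Fm", "sing_Fm", "sing_Ml", "marr_Fm", "sing_Ml"]
categories = ["marr_Ml", "marr_Fm", "sing_Fm", "sing_Ml"]

# One 0/1 dummy column per category.
dummies = {c: [1 if g == c else 0 for g in groups] for c in categories}

# Row sums of the four dummies: each is 1, i.e. identical to the constant column,
# so an intercept plus all four dummies are perfectly collinear.
row_sums = [sum(dummies[c][i] for c in categories) for i in range(len(groups))]
print(row_sums)

# Dropping one category (the base group) removes the exact linear dependence.
base = "sing_Ml"
regressors = [c for c in categories if c != base]
print(regressors)
```

Each remaining coefficient is then read as the expected wage difference between that group and single males, holding the other regressors fixed.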

. reg wage marr_Fm marr_Ml sing_Fm educ exper expersq tenure tenuresq

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  8,   517) =   47.68
       Model |  3040.11599     8  380.014498           Prob > F      =  0.0000
    Residual |   4120.2983   517  7.96962921           R-squared     =  0.4246
-------------+------------------------------           Adj R-squared =  0.4157
       Total |  7160.41429   525  13.6388844           Root MSE      =  2.8231

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     marr_Fm |  -1.400509   .4151456    -3.37   0.001    -2.216089    -.584929
     marr_Ml |   1.300762   .3973564     3.27   0.001     .5201303    2.081394
     sing_Fm |  -.3938605    .400119    -0.98   0.325     -1.17992    .3921986
        educ |   .5229128   .0480534    10.88   0.000     .4285088    .6173168
       exper |   .1831766   .0376334     4.87   0.000     .1092434    .2571098
     expersq |  -.0037014   .0007926    -4.67   0.000    -.0052586   -.0021442
      tenure |   .1941521    .048538     4.00   0.000     .0987962     .289508
    tenuresq |   -.002614   .0016599    -1.57   0.116    -.0058749    .0006469
       _cons |   -2.84824   .7178692    -3.97   0.000    -4.258539   -1.437941
------------------------------------------------------------------------------

Suppose we think the effect of education on wages differs by sex.

We hypothesize that one additional year of education increases wages more for a woman than for a man.
Hypothesis: the incremental effect of education on wages is larger for females than for males.

If we draw a wage-education line for males and one for females, what are the intercept and the slope coefficient of each line?
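Written out for the model estimated below (with ln(wage) as the dependent variable), the two lines are as follows. Here $\delta_1$ denotes the coefficient on Female_Educ, and $\mathbf{x}'\gamma$ is our own shorthand for the remaining controls exper, expersq, tenure and tenuresq, which shift both lines equally:

```latex
\begin{aligned}
\ln(wage) &= \beta_0 + \delta_0\,female + \beta_1\,educ + \delta_1\,(female \times educ) + \mathbf{x}'\gamma + u \\
\text{Males } (female = 0):\quad E[\ln(wage) \mid educ, \mathbf{x}] &= \beta_0 + \beta_1\,educ + \mathbf{x}'\gamma \\
\text{Females } (female = 1):\quad E[\ln(wage) \mid educ, \mathbf{x}] &= (\beta_0 + \delta_0) + (\beta_1 + \delta_1)\,educ + \mathbf{x}'\gamma \\
H_0:\ \delta_1 = 0 \qquad &\text{vs.} \qquad H_1:\ \delta_1 > 0
\end{aligned}
```

So $\delta_0$ is the difference in intercepts and $\delta_1$ the difference in slopes; the hypothesis in the text corresponds to $\delta_1 > 0$.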

. reg lnwage female educ Female_Educ exper expersq tenure tenuresq


      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  7,   518) =   58.37
       Model |  65.4081534     7  9.34402192           Prob > F      =  0.0000
    Residual |   82.921598   518  .160080305           R-squared     =  0.4410
-------------+------------------------------           Adj R-squared =  0.4334
       Total |  148.329751   525   .28253286           Root MSE      =   .4001

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.2267886   .1675394    -1.35   0.176    -.5559289    .1023517
        educ |   .0823692   .0084699     9.72   0.000     .0657296    .0990088
 Female_Educ |  -.0055645   .0130618    -0.43   0.670    -.0312252    .0200962
       exper |   .0293366   .0049842     5.89   0.000      .019545    .0391283
     expersq |  -.0005804   .0001075    -5.40   0.000    -.0007916   -.0003691
      tenure |   .0318967    .006864     4.65   0.000      .018412    .0453814
    tenuresq |    -.00059   .0002352    -2.51   0.012     -.001052    -.000128
       _cons |    .388806   .1186871     3.28   0.001     .1556388    .6219732
------------------------------------------------------------------------------

Draw the diagram of the wage-education relationship for females and males separately.
From the above model, what can we conclude about the hypothesis?
From the above model, can we conclude that there is no wage discrimination against women?

Detection of Multicollinearity


If a model has a high R² but few individually significant variables, it may have a multicollinearity problem.
Multicollinearity is detected using the variance inflation factor (VIF) of each variable.
A variable with a VIF greater than 10 potentially has a multicollinearity problem.

    Variable |       VIF       1/VIF
-------------+----------------------
      female |     23.02    0.043445
 Female_Educ |     22.86    0.043741
       exper |     15.01    0.066634
     expersq |     14.39    0.069498
      tenure |      8.06    0.123998
    tenuresq |      7.21    0.138634
        educ |      1.80    0.554330
-------------+----------------------
    Mean VIF |     13.19
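The VIF of regressor $x_j$ is $1/(1 - R_j^2)$, where $R_j^2$ is the R² from regressing $x_j$ on all the other regressors. A minimal sketch for the special case of a model with only two regressors, where $R_j^2$ is just their squared correlation. The data are invented to mimic a dummy and its interaction with education (like female and Female_Educ), which suggests why those two variables show the largest VIFs in the table.

```python
def simple_r_squared(x, z):
    """R-squared from regressing x on z with an intercept (= squared correlation)."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxz = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
    sxx = sum((a - x_bar) ** 2 for a in x)
    szz = sum((b - z_bar) ** 2 for b in z)
    return sxz ** 2 / (sxx * szz)

def vif_two_regressors(x, z):
    """VIF of x when the model contains only x and z: 1 / (1 - R^2)."""
    return 1.0 / (1.0 - simple_r_squared(x, z))

# Hypothetical data: a dummy and its interaction with a continuous variable.
female = [0, 0, 0, 0, 1, 1, 1, 1]
educ = [10, 12, 14, 16, 11, 12, 13, 16]
female_educ = [f * e for f, e in zip(female, educ)]

print(round(vif_two_regressors(female, female_educ), 2))
```

With more than two regressors, $R_j^2$ must come from a full multiple regression of $x_j$ on the others.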

Non-Linear Effect of a Variable


By using the same logic, we can estimate the non-linear effect of a variable.
Suppose we are estimating the relationship between education and wages (as in the previous case).
We hypothesize that, for individuals with more than 10 years of education, an additional year of education raises wages by more than it does for individuals with 10 or fewer years of education.
Step 1:
Create a dummy variable Edu_M10 = 1 if education is more than 10 years and 0 otherwise.
Step 2:
Interact Edu_M10 with education.
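The model we estimate is
$wage = \beta_0 + \beta_1\,educ + \delta_0\,Edu\_M10 + \delta_1\,(Edu\_M10 \times educ) + u$

The wage equation for education of 10 years or less ($Edu\_M10 = 0$) is
$wage = \beta_0 + \beta_1\,educ + u$

The wage equation for education of MORE than 10 years ($Edu\_M10 = 1$) is
$wage = (\beta_0 + \delta_0) + (\beta_1 + \delta_1)\,educ + u$

If we want to test whether having more than 10 years of education has a significant effect on wages, what should we do?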
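The two steps can be sketched in plain Python: build Edu_M10 and its interaction with education, and check that the fitted line changes slope above the threshold. The helper names are ours, and the coefficients in the usage lines are made up purely to show the change in slope; they are not estimates.

```python
def make_piecewise_regressors(educ_values, threshold=10):
    """Return (Edu_M10, Edu_M10 * educ) for each observation."""
    edu_m10 = [1 if e > threshold else 0 for e in educ_values]
    interaction = [d * e for d, e in zip(edu_m10, educ_values)]
    return edu_m10, interaction

def fitted_wage(educ, b0, b1, d0, d1, threshold=10):
    """wage = b0 + b1*educ + d0*Edu_M10 + d1*(Edu_M10*educ)."""
    d = 1 if educ > threshold else 0
    return b0 + b1 * educ + d0 * d + d1 * d * educ

educ = [8, 10, 11, 12, 16]
edu_m10, edu_m10_x_educ = make_piecewise_regressors(educ)
print(edu_m10)
print(edu_m10_x_educ)

# Illustrative (made-up) coefficients, only to show the change in slope.
b = dict(b0=1.0, b1=0.3, d0=-0.5, d1=0.2)
print(fitted_wage(9, **b) - fitted_wage(8, **b))    # slope at or below 10 years: b1
print(fitted_wage(13, **b) - fitted_wage(12, **b))  # slope above 10 years: b1 + d1
```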

Difference-in-Differences Estimator (DiD)


The method is widely used in policy analysis.
Consider the following case:
o The school enrollment of a girl child depends on many factors, such as household income, parents' education, culture and government policy. One researcher hypothesized that one factor affecting enrollment is the girl's dignity in her peer group: if a girl finds that her dignity has been affected, she will be disinclined to go to school and will drop out. The researcher therefore undertook a simple experimental study.
The researcher collected a sample of 80 girls aged 6-12 from two villages, T and C. She collected information on the half-yearly amount ($) spent by the family on the girl's education, the distance travelled (by bus) to school, and the number of days the girl went to school in the last 180 days.
After collecting the sample, she undertook an intervention in which all the girls in village T were dewormed, but those in village C were not. After designing the intervention, the researcher had to leave the job of coordinating the project and return to her university.
Suppose you are a manager of a consulting firm and your firm collected
another sample of 80 girls from T and C, 7 months after the deworming
intervention. You are asked to evaluate the impact of the deworming
intervention.

What will you do?

Data Available
Number of days attended school in the last 6 months (both before and after the program, but not for the same set of girls)
Distance travelled to school
Amount spent on education
Variable of interest: number of days attended school in the last 6 months (denote it by Y)
T: village T, where the girls were dewormed (the treatment village)
C: village C, where the girls were not dewormed (the control village)
0: time period before deworming
1: time period after deworming
DiD estimate (from sample averages):
$\widehat{DiD} = (\bar{Y}_{T1} - \bar{Y}_{T0}) - (\bar{Y}_{C1} - \bar{Y}_{C0})$
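The DiD estimate from the four cell means can be computed directly: the change in the treatment village minus the change in the control village. A sketch with invented school-day counts (three girls per cell, purely illustrative):

```python
def mean(values):
    return sum(values) / len(values)

def did_from_means(y_t1, y_t0, y_c1, y_c0):
    """(mean T after - mean T before) - (mean C after - mean C before)."""
    return (mean(y_t1) - mean(y_t0)) - (mean(y_c1) - mean(y_c0))

# Hypothetical days attended (out of 180) in each village and period.
y_t0 = [100, 110, 120]   # village T, before deworming
y_t1 = [140, 150, 160]   # village T, after deworming
y_c0 = [105, 115, 125]   # village C, before
y_c1 = [115, 125, 135]   # village C, after

print(did_from_means(y_t1, y_t0, y_c1, y_c0))
```

Subtracting the control village's change removes the part of the treatment village's change that would have happened anyway, under the assumption that both villages would otherwise have followed the same trend.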

However, this estimate does not control for the effect of other variables. We therefore estimate DiD with a regression, which lets us control for other effects.
Define two dummy variables:
Time = 1 for the time period after deworming and 0 otherwise
Group = 1 for the treatment group (the village where the girls were dewormed) and 0 otherwise

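Regress
$sch\_days = \beta_0 + \beta_1\,time + \beta_2\,group + \beta_3\,(time \times group) + \beta_4\,dist + \beta_5\,amt + u$

$\beta_3$ (the coefficient on time_group) is the DiD estimator.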

      Source |       SS       df       MS              Number of obs =      80
-------------+------------------------------           F(  5,    74) =   16.18
       Model |  23211.9555     5   4642.3911           Prob > F      =  0.0000
    Residual |   21237.032    74  286.986919           R-squared     =  0.5222
-------------+------------------------------           Adj R-squared =  0.4899
       Total |  44448.9875    79  562.645411           Root MSE      =  16.941

------------------------------------------------------------------------------
    sch_days |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   9.735718   5.731058     1.70   0.094    -1.683664     21.1551
       group |   1.340718   5.404498     0.25   0.805    -9.427979    12.10941
  time_group |   26.43942   7.871359     3.36   0.001     10.75539    42.12344
        dist |  -.5642605   .1820824    -3.10   0.003    -.9270676   -.2014534
         amt |   1.981004   .7319576     2.71   0.008     .5225462    3.439461
       _cons |   30.60021   7.191824     4.25   0.000     16.27019    44.93023
------------------------------------------------------------------------------
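Without the controls dist and amt, the regression on time, group and time_group is saturated, and the coefficient on time_group equals the difference-in-differences of the four cell means exactly. The pure-Python sketch below checks this on invented data by solving the OLS normal equations; `solve` and `ols` are our own illustrative helpers, not a library API. With dist and amt added, as in the output above, the estimate generally changes because it then holds those controls fixed.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data as (days attended, time, group).
data = [(100, 0, 1), (110, 0, 1), (120, 0, 1),
        (140, 1, 1), (150, 1, 1), (160, 1, 1),
        (105, 0, 0), (115, 0, 0), (125, 0, 0),
        (115, 1, 0), (125, 1, 0), (135, 1, 0)]

y = [row[0] for row in data]
X = [[1, t, g, t * g] for _, t, g in data]   # const, time, group, time_group
b0, b1, b2, b3 = ols(X, y)
print(round(b3, 6))   # coefficient on time_group = DiD of the cell means
```

Here the cell means are 115 and 125 (C before and after) and 110 and 150 (T before and after), so the time_group coefficient reproduces (150 - 110) - (125 - 115).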

Adjusted R²

When R² is adjusted for the degrees of freedom of its numerator (SSR, n - k - 1) and denominator (SST, n - 1), the result is the adjusted R²:
$\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)}$
where k is the number of slope coefficients.

Advantages of adjusted R²
1. Unlike R², its value does not automatically increase when variables are added.
Sometimes the adjusted R² can even be negative, implying a poor fit of the model.
2. Adjusted R² can be used to choose between two non-nested models that use different functional forms of the independent variables (but not different functional forms of the dependent variable).
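As a check on the formula, the R² and adjusted R² reported for the overall wage model (`reg wage female educ exper expersq tenure`) can be recomputed from its residual and total sums of squares. A minimal sketch; the function names are ours.

```python
def r_squared(ssr, sst):
    """R^2 = 1 - SSR/SST."""
    return 1 - ssr / sst

def adjusted_r_squared(ssr, sst, n, k):
    """1 - [SSR/(n-k-1)] / [SST/(n-1)], with k slope coefficients."""
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

# Residual SS, total SS, observations and slope count from the overall wage model.
ssr, sst, n, k = 4305.90466, 7160.41429, 526, 5
print(round(r_squared(ssr, sst), 4), round(adjusted_r_squared(ssr, sst, n, k), 4))
```

Both values agree with the Stata output (0.3987 and 0.3929); the adjusted value is lower because each added regressor costs a degree of freedom.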
Example

First Difference estimation


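The problem with the wage equation is the unobserved individual effect.
The problem with estimating a country-specific model is the unobserved country-specific effect.
Suppose, however, that we observe each cross-sectional unit for at least two periods.
The unobserved effects are of two types:
1. Those that change over time ($u_{it}$).
2. Those that do not change over time ($a_i$) but remain constant, and may be correlated with the explanatory variable $x_{it}$.

Consider the model $y_{it} = \beta_0 + \delta_0\,d2_t + \beta_1 x_{it} + a_i + u_{it}$, $t = 1, 2$, where $d2_t = 1$ in the second period. One easy way to obtain unbiased estimates of $\delta_0$ and $\beta_1$ is to take the first difference of the variables:
$\Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i$
Differencing removes $a_i$.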
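To see why differencing helps, one can build a tiny two-period panel in which the unit effect $a_i$ rises with $x$. A minimal sketch with invented, noise-free numbers: pooled OLS on the levels is badly biased (here it even gets the sign wrong), while OLS on the first differences recovers the true slope exactly. The names `BETA`, `DELTA` and `a_i` are illustrative.

```python
def slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

BETA = -0.25                        # hypothetical true effect of x on y
DELTA = 0.5                         # hypothetical period-2 shift
a_i = [0.0, 1.0, 2.0, 3.0, 4.0]     # unobserved unit effects
x1 = [1.0, 3.0, 4.0, 7.0, 8.0]      # period 1 regressor, rises with a_i
x2 = [2.0, 3.5, 6.0, 7.5, 10.0]     # period 2 regressor
y1 = [BETA * x + a for x, a in zip(x1, a_i)]
y2 = [DELTA + BETA * x + a for x, a in zip(x2, a_i)]

# Pooled OLS ignores a_i, which is correlated with x: biased slope.
pooled = slope(x1 + x2, y1 + y2)

# First differences: a_i cancels, the intercept picks up DELTA.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
first_diff = slope(dx, dy)

print(round(pooled, 4), round(first_diff, 4))
```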

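Let us use this approach to estimate the sleep-work tradeoff. The first-differenced equation is
$\Delta slpnap_i = \delta_0 + \beta_1 \Delta totwrk_i + \beta_2 \Delta educ_i + \beta_3 \Delta marr_i + \beta_4 \Delta yngkid_i + \beta_5 \Delta gdhlty_i + \Delta u_i$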

The unobserved effect $a_i$ is called an unobserved individual effect or an individual fixed effect.
The same factors (some biological) that cause people to sleep more or less (captured in $a_i$) are likely correlated with the amount of time spent working.
Some people just have more energy, which causes them to sleep less and work more.
      Source |       SS       df       MS              Number of obs =     239
-------------+------------------------------           F(  5,   233) =    8.19
       Model |  14674698.2     5  2934939.64           Prob > F      =  0.0000
    Residual |  83482611.7   233  358294.471           R-squared     =  0.1495
-------------+------------------------------           Adj R-squared =  0.1313
       Total |  98157309.9   238  412425.672           Root MSE      =  598.58

------------------------------------------------------------------------------
    d_slpnap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    d_totwrk |  -.2266694    .036054    -6.29   0.000    -.2977029   -.1556359
      d_educ |  -.0244717   48.75938    -0.00   1.000    -96.09008    96.04113
      d_marr |   104.2139   92.85536     1.12   0.263    -78.72946    287.1574
    d_yngkid |    94.6654   87.65252     1.08   0.281    -78.02739    267.3582
    d_gdhlty |   87.57785   76.59913     1.14   0.254    -63.33758    238.4933
       _cons |  -92.63404    45.8659    -2.02   0.045    -182.9989   -2.269152
------------------------------------------------------------------------------

Test for heteroskedasticity in the error term of the regression


Breusch-Pagan test of heteroskedasticity
. reg wage female educ exper expersq tenure
      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  5,   520) =   68.94
       Model |  2854.50963     5  570.901926           Prob > F      =  0.0000
    Residual |  4305.90466   520  8.28058589           R-squared     =  0.3987
-------------+------------------------------           Adj R-squared =  0.3929
       Total |  7160.41429   525  13.6388844           Root MSE      =  2.8776

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -1.790226   .2576917    -6.95   0.000    -2.296471   -1.283981
        educ |   .5300907   .0485881    10.91   0.000     .4346376    .6255439
       exper |   .2048418   .0344576     5.94   0.000     .1371486     .272535
     expersq |  -.0041266   .0007489    -5.51   0.000    -.0055978   -.0026553
      tenure |   .1336694   .0206325     6.48   0.000     .0931361    .1742027
       _cons |  -2.120092   .7120462    -2.98   0.003    -3.518933   -.7212513
------------------------------------------------------------------------------

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of wage

         chi2(1)      =   163.37
         Prob > chi2  =   0.0000
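The idea behind the test with fitted values is to regress the squared OLS residuals on the fitted values and check whether they have explanatory power. A minimal sketch of the n·R² (Koenker, studentized) form, assuming the residuals and fitted values are already available. Note that Stata's default `estat hettest` reports the original version, which assumes normally distributed errors, so the two statistics generally differ; the n·R² form is the one `estat hettest, iid` reports. The data in the usage lines are invented.

```python
import math

def breusch_pagan_nr2(residuals, fitted):
    """Regress e^2 on the fitted values; LM = n * R^2, compared with chi2(1)."""
    n = len(residuals)
    e2 = [e * e for e in residuals]
    f_bar = sum(fitted) / n
    e2_bar = sum(e2) / n
    sfe = sum((f - f_bar) * (v - e2_bar) for f, v in zip(fitted, e2))
    sff = sum((f - f_bar) ** 2 for f in fitted)
    see = sum((v - e2_bar) ** 2 for v in e2)
    r2 = sfe ** 2 / (sff * see)               # R^2 of the simple auxiliary regression
    lm = n * r2
    p_value = math.erfc(math.sqrt(lm / 2))    # P(chi2 with 1 df > lm)
    return lm, p_value

# Hypothetical residuals whose spread grows with the fitted value.
fitted = [float(i) for i in range(1, 21)]
residuals = [f if i % 2 == 0 else -f for i, f in enumerate(fitted)]
lm, p = breusch_pagan_nr2(residuals, fitted)
print(round(lm, 2), p)
```

A small p-value, as in the Stata output above, leads us to reject the null of constant variance.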

Detection of Influential Observations


Detecting outliers (observations with large residuals) will not reliably detect influential observations, because of masking.
Influential observations are identified using Cook's distance (D) or the DFITS value.
According to Cook's distance, an observation is influential if D is greater than 4/n.
According to DFITS, an observation is influential if |DFITS| is greater than $2\sqrt{k/n}$, where k is the number of estimated parameters.
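For a simple regression with p = 2 coefficients, the leverage is $h_i = 1/n + (x_i - \bar{x})^2/S_{xx}$, Cook's distance is $D_i = \frac{e_i^2}{p\,s^2}\frac{h_i}{(1-h_i)^2}$, and $DFITS_i = t_i\sqrt{h_i/(1-h_i)}$, where $t_i$ is the externally studentized residual. The sketch below applies the 4/n rule for D and the $2\sqrt{k/n}$ rule for |DFITS| to invented data in which the last point has high leverage and lies far from the line; `influence_measures` is our own illustrative function.

```python
import math

def influence_measures(x, y):
    """Cook's D and DFITS for each point of a simple regression y = b0 + b1*x + u."""
    n, p = len(x), 2                     # p = number of estimated coefficients
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    cooks, dfits = [], []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx                     # leverage
        cooks.append(e * e / (p * s2) * h / (1 - h) ** 2)
        s2_del = ((n - p) * s2 - e * e / (1 - h)) / (n - p - 1)  # s^2 without point i
        t = e / math.sqrt(s2_del * (1 - h))                     # externally studentized residual
        dfits.append(t * math.sqrt(h / (1 - h)))
    return cooks, dfits

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1, 9.0, 5.0]   # hypothetical data
cooks, dfits = influence_measures(x, y)
n, k = len(x), 2
flag_cook = [i for i, d in enumerate(cooks) if d > 4 / n]
flag_dfits = [i for i, d in enumerate(dfits) if abs(d) > 2 * math.sqrt(k / n)]
print(flag_cook, flag_dfits)
```

The last point pulls the fitted line toward itself, so its ordinary residual understates how much it changes the estimates; both influence measures flag it.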
