
Score:

<1 point>

Week 5

Correlation and Regression

1. Create a correlation table for the variables in our data set. (Use analysis ToolPak or StatPlus:mac LE function Correlation.)
a. Reviewing the data levels from week 1, what variables can be used in a Pearson's Correlation table (which is what Excel produces)?

b. Place table here (C8):

<1 point>

c. Using r = approximately .28 as the significant r value (at p = 0.05) for a correlation between 50 values, what variables are significantly related to Salary?
To compa?

d. Looking at the above correlations - both significant or not - are there any surprises - by that I mean any relationships you expected to be meaningful and are not and vice-versa?

e. Does this help us answer our equal pay for equal work question?
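
The r = approximately .28 cutoff in part c can be checked from the t distribution: for a correlation between n values, the critical r equals t_crit / sqrt(t_crit^2 + n - 2). The sketch below is illustrative only. It assumes the two-tailed critical t of 2.0106348 for 48 degrees of freedom (the value Excel reports in the t-test output elsewhere in this workbook), and the function names are hypothetical, not part of the assignment.

```python
import math

def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| that is significant, given the two-tailed critical t for n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumption: 2.0106348 is the two-tailed critical t at alpha = 0.05 with 48 df.
R_CRIT = critical_r(2.0106348, 50)
print(round(R_CRIT, 2))  # about 0.28

def is_significant(r: float, r_crit: float = R_CRIT) -> bool:
    """A correlation counts as significant when its magnitude reaches the cutoff."""
    return abs(r) >= r_crit
```

Any entry in the correlation table whose absolute value is at or above this cutoff would be treated as significantly different from zero at p = 0.05.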

2. Below is a regression analysis for salary being predicted/explained by the other variables in our sample (Midpoint, age, performance rating, service, gender, and degree variables). (Note: since salary and compa are different ways of expressing an employee's salary, we do not want to have both used in the same regression.)
Please interpret the findings.
Ho: The regression equation is not significant.
Ha: The regression equation is significant.
Ho: The regression coefficient for each variable is not significant
Ha: The regression coefficient for each variable is significant
Sal
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9915590747
R Square             0.9831893985
Adjusted R Square    0.9808437332
Standard Error       2.6575925726
Observations         50

ANOVA
              df    SS
Regression    6     17762.2996739
Residual      43    303.700326126
Total         49    18066

                      Coefficients      Standard Error
Intercept             -1.7496212123     3.6183676583
Midpoint              1.2167010505      0.0319023509
Age                   -0.0046280102     0.065197212
Performance Rating    -0.0565964405     0.0344950678
Service               -0.0425003573     0.0843369821
Gender                2.420337212       0.8608443176
Degree                0.2755334143      0.7998023048

Note: since Gender and Degree are expressed as 0 and 1, they are considered dummy variables and can be used in a multiple regression equation.
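
The F statistic and the t statistics in this output follow directly from the sums of squares, degrees of freedom, coefficients, and standard errors listed above. The following Python sketch recomputes them as a cross-check; the variable names are illustrative, and the figures are copied from the SUMMARY OUTPUT rather than re-estimated from the raw data.

```python
# Figures copied from the SUMMARY OUTPUT for the salary regression.
ss_regression, df_regression = 17762.2996739, 6
ss_residual, df_residual = 303.700326126, 43

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares; R Square is the explained share of total SS.
f_stat = ms_regression / ms_residual
r_square = ss_regression / (ss_regression + ss_residual)

# (coefficient, standard error) pairs from the coefficients table.
coefficients = {
    "Intercept": (-1.7496212123, 3.6183676583),
    "Midpoint": (1.2167010505, 0.0319023509),
    "Age": (-0.0046280102, 0.065197212),
    "Performance Rating": (-0.0565964405, 0.0344950678),
    "Service": (-0.0425003573, 0.0843369821),
    "Gender": (2.420337212, 0.8608443176),
    "Degree": (0.2755334143, 0.7998023048),
}

# Each t statistic is the coefficient divided by its standard error.
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}

print(round(f_stat, 2), round(r_square, 4))
for name, t in t_stats.items():
    print(f"{name:20s} t = {t:8.3f}")
```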

Interpretation:
For the Regression as a whole:
What is the value of the F statistic:
What is the p-value associated with this value:
Is the p-value <0.05?
Do you reject or not reject the null hypothesis:
What does this decision mean for our equal pay question:
For each of the coefficients:
What is the coefficient's p-value for each of the variables:
Is the p-value < 0.05?
Do you reject or not reject each null hypothesis:
What are the coefficients for the significant variables?
Using only the significant variables, what is the equation?
Is gender a significant factor in salary:
If so, who gets paid more with all other things being equal?
How do we know?
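
The coefficient questions above all apply one decision rule: reject Ho for a variable when its p-value is below 0.05. A minimal Python sketch of that rule follows, using the p-values reported in the salary regression output; the helper name `decide` is hypothetical.

```python
ALPHA = 0.05

# P-values as reported in the regression output for the salary model.
p_values = {
    "Intercept": 0.6311664899,
    "Midpoint": 8.664163e-35,
    "Age": 0.9437389875,
    "Performance Rating": 0.1081531819,
    "Service": 0.6168793519,
    "Gender": 0.0073966188,
    "Degree": 0.732148119,
}

def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Return the hypothesis-test decision for one coefficient."""
    return "reject Ho" if p_value < alpha else "do not reject Ho"

decisions = {name: decide(p) for name, p in p_values.items()}
significant = [name for name, p in p_values.items() if p < ALPHA]

for name, decision in decisions.items():
    print(f"{name:20s} {decision}")
```

Only the variables collected in `significant` would carry into the reduced equation the worksheet asks for.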

<1 point>

3. Perform a regression analysis using compa as the dependent variable and the same independent variables as used in question 2. Show the result, and interpret your findings by answering the same questions.
Note: be sure to include the appropriate hypothesis statements.

Regression hypotheses
Ho:
Ha:
Coefficient hypotheses (one to stand for all the separate variables)
Ho:
Ha:
Place D94 in output box.

Interpretation:
For the Regression as a whole:
What is the value of the F statistic:
What is the p-value associated with this value:
Is the p-value < 0.05?
Do you reject or not reject the null hypothesis:
What does this decision mean for our equal pay question:
For each of the coefficients:
What is the coefficient's p-value for each of the variables:
Is the p-value < 0.05?
Do you reject or not reject each null hypothesis:
What are the coefficients for the significant variables?
Using only the significant variables, what is the equation?
Is gender a significant factor in compa:
If so, who gets paid more with all other things being equal?
How do we know?

<1 point>

4. Based on all of your results to date,
Do we have an answer to the question of are males and females paid equally for equal work?
If so, which gender gets paid more?
How do we know?
Which is the best variable to use in analyzing pay practices - salary or compa? Why?
What is most interesting or surprising about the results we got doing the analysis during the last 5 weeks?

<2 points>

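5. Why did the single factor tests and analysis (such as t and single factor ANOVA tests on salary equality) not provide a complete answer to our salary equality question?
What outcomes in your life or work might benefit from a multiple regression examination rather than a simpler one variable test?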


Note on the coefficient hypotheses: technically we have one Ho/Ha pair for each input variable. They are listed once to save space.

ANOVA (continued)
              MS             F                Significance F
Regression    2960.383279    419.151611129    1.8121524E-036
Residual      7.062798282

Coefficients (continued)
                      t Stat          P-value           Lower 95%         Upper 95%
Intercept             -0.483538816    0.6311664899      -9.0467550427     5.547512618
Midpoint              38.138288116    8.664163E-035     1.1523638283      1.2810382727
Age                   -0.070984788    0.9437389875      -0.1361107191     0.1268546987
Performance Rating    -1.640711097    0.1081531819      -0.1261623747     0.0129694936
Service               -0.503935003    0.6168793519      -0.2125820912     0.1275813765
Gender                2.8115852804    0.0073966188      0.684279192       4.156395232
Degree                0.3445019009    0.732148119       -1.3374216547     1.8884884833

ID  Salary  Compa  Midpoint  Age  Performance Rating  Service  Gender  Raise  Degree  Gender1  Grade
1   56.5    0.992  57        34   85                  8        0       5.7    0       M        E
2   28.3    0.913  31        52   80                  7        0       3.9    0       M        B
3   34      1.097  31        30   75                  5        1       3.6    1       F        B
4   64.5    1.132  57        42   100                 16       0       5.5    1       M        E
5   46.6    0.971  48        36   90                  16       0       5.7    1       M        D
6   75.6    1.129  67        36   70                  12       0       4.5    1       M        F
7   41.6    1.040  40        32   100                 8        1       5.7    1       F        C
8   24.6    1.070  23        32   90                  9        1       5.8    1       F        A
9   74.6    1.114  67        49   100                 10       0       4      1       M        F
10  24      1.044  23        30   80                  7        1       4.7    1       F        A
11  24.3    1.055  23        41   100                 19       1       4.8    1       F        A
12  57.9    1.015  57        52   95                  22       0       4.5    0       M        E
13  41.9    1.047  40        30   100                 2        1       4.7    0       F        C
14  24.5    1.066  23        32   90                  12       1       6      1       F        A
15  23.1    1.005  23        32   80                  8        1       4.9    1       F        A
16  41.1    1.027  40        44   90                  4        0       5.7    0       M        C
17  71.8    1.259  57        27   55                  3        1       3      1       F        E
18  34.2    1.104  31        31   80                  11       1       5.6    0       F        B
19  23.8    1.036  23        32   85                  1        0       4.6    1       M        A
20  34.3    1.106  31        44   70                  16       1       4.8    0       F        B
21  78.5    1.171  67        43   95                  13       0       6.3    1       M        F
22  54.9    1.143  48        48   65                  6        1       3.8    1       F        D
23  23.3    1.015  23        36   65                  6        1       3.3    0       F        A
24  55.9    1.165  48        30   75                  9        1       3.8    0       F        D
25  23.7    1.029  23        41   70                  4        0       4      0       M        A
26  23.8    1.036  23        22   95                  2        1       6.2    0       F        A
27  38.7    0.967  40        35   80                  7        0       3.9    1       M        C
28  76.7    1.145  67        44   95                  9        1       4.4    0       F        F
29  75.9    1.133  67        52   95                  5        0       5.4    0       M        F
30  48.5    1.011  48        45   90                  18       0       4.3    0       M        D
31  22.3    0.970  23        29   60                  4        1       3.9    1       F        A
32  27      0.871  31        25   95                  4        0       5.6    0       M        B
33  61.4    1.077  57        35   90                  9        0       5.5    1       M        E
34  27.8    0.898  31        26   80                  2        0       4.9    1       M        B
35  22.8    0.992  23        23   90                  4        1       5.3    0       F        A
36  24.7    1.075  23        27   75                  3        1       4.3    0       F        A
37  23.3    1.014  23        22   95                  2        1       6.2    0       F        A
38  56.8    0.997  57        45   95                  11       0       4.5    0       M        E
39  36.1    1.165  31        27   90                  6        1       5.5    0       F        B
40  24      1.045  23        24   90                  2        0       6.3    0       M        A
41  47.3    1.182  40        25   80                  5        0       4.3    0       M        C
42  21.9    0.950  23        32   100                 8        1       5.7    1       F        A
43  75.6    1.128  67        42   95                  20       1       5.5    0       F        F
44  57.2    1.003  57        45   90                  16       0       5.2    1       M        E
45  53.1    1.107  48        36   95                  8        1       5.2    1       F        D
46  58.1    1.019  57        39   75                  20       0       3.9    1       M        E
47  61      1.071  57        37   95                  5        0       5.5    1       M        E
48  64.6    1.134  57        34   90                  11       1       5.3    1       F        E
49  66.3    1.163  57        41   95                  21       0       6.6    0       M        E
50  57.5    1.008  57        38   80                  12       0       4.6    0       M        E

The ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)?
Note: to simplify the analysis, we will assume that jobs within each grade comprise equal work.

The column labels in the table mean:
ID - Employee sample number
Salary - Salary in thousands
Age - Age in years
Performance Rating - Appraisal rating (employee evaluation score)
Service - Years of service (rounded)
Gender - 0 = male, 1 = female
Midpoint - salary grade midpoint
Raise - percent of last raise
Grade - job/pay grade
Degree (0 = BS\BA, 1 = MS)
Gender1 (Male or Female)
Compa - salary divided by midpoint
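
The column legend defines Compa as salary divided by midpoint, so a value above 1 means the employee is paid above the grade midpoint. A small Python sketch of that calculation follows, using three rows from the data table (IDs 2, 3, and 4); the function name is illustrative. Because the listed salaries are rounded to one decimal, a recomputed ratio can differ from the sheet's Compa column in the third decimal for some rows.

```python
def compa(salary: float, midpoint: float) -> float:
    """Compa ratio: salary expressed as a fraction of the grade midpoint."""
    return salary / midpoint

# (ID, Salary in thousands, Midpoint) taken from the data table.
rows = [
    (2, 28.3, 31),
    (3, 34.0, 31),
    (4, 64.5, 57),
]

ratios = {emp_id: round(compa(salary, midpoint), 3) for emp_id, salary, midpoint in rows}

for emp_id, ratio in ratios.items():
    position = "above" if ratio > 1 else "at or below"
    print(f"ID {emp_id}: compa = {ratio} ({position} midpoint)")
```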

Sal   Compa  G  Mid  Age  EES  SR  Raise  Deg
24    1.045  1  23   32   90   9   5.8    1
24.2  1.053  1  23   30   80   7   4.7    1
23.4  1.018  1  23   41   100  19  4.8    1
23.4  1.017  1  23   32   90   12  6      1
22.6  0.983  1  23   32   80   8   4.9    1
22.9  0.995  1  23   36   65   6   3.3    0
23.1  1.003  1  23   22   95   2   6.2    0
23.3  1.011  1  23   29   60   4   3.9    1
22.7  0.985  1  23   23   90   4   5.3    0
23.5  1.023  1  23   27   75   3   4.3    0
23    1.002  1  23   22   95   2   6.2    0
24    1.042  1  23   32   100  8   5.7    1
35.5  1.145  1  31   30   75   5   3.6    1
34.7  1.119  1  31   31   80   11  5.6    0
35.5  1.146  1  31   44   70   16  4.8    0
35.2  1.136  1  31   27   90   6   5.5    0
40.4  1.01   1  40   32   100  8   5.7    1
42.7  1.068  1  40   30   100  2   4.7    0
53.4  1.112  1  48   48   65   6   3.8    1
51.5  1.072  1  48   30   75   9   3.8    0
49.8  1.037  1  48   36   95   8   5.2    1
68.3  1.198  1  57   27   55   3   3      1
65.4  1.148  1  57   34   90   11  5.3    1
78.4  1.17   1  67   44   95   9   4.4    0
75.9  1.133  1  67   42   95   20  5.5    0
24    1.044  0  23   32   85   1   4.6    1
23.3  1.012  0  23   41   70   4   4      0
24.1  1.049  0  23   24   90   2   6.3    0
27.5  0.887  0  31   52   80   7   3.9    0
27.1  0.875  0  31   25   95   4   5.6    0
27.7  0.895  0  31   26   80   2   4.9    1
40.8  1.019  0  40   44   90   4   5.7    0
43.9  1.097  0  40   35   80   7   3.9    1
41    1.025  0  40   25   80   5   4.3    0
48.7  1.014  0  48   36   90   16  5.7    1
49.4  1.029  0  48   45   90   18  4.3    0
64.4  1.13   0  57   34   85   8   5.7    0
64.5  1.132  0  57   42   100  16  5.5    1
58.9  1.033  0  57   52   95   22  4.5    0
57.9  1.016  0  57   35   90   9   5.5    1
59    1.035  0  57   45   95   11  4.5    0
63.3  1.111  0  57   45   90   16  5.2    1
56.8  0.996  0  57   39   75   20  3.9    1
58    1.017  0  57   37   95   5   5.5    1
62.4  1.094  0  57   41   95   21  6.6    0
63.8  1.12   0  57   38   80   12  4.6    0
79    1.179  0  67   36   70   12  4.5    1
77    1.149  0  67   49   100  10  4      1
74.8  1.116  0  67   43   95   13  6.3    1
76    1.135  0  67   52   95   5   5.4    0

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.7050179
R Square             0.4970503
Adjusted R Square    0.4132254
Standard Error       0.0561253
Observations         50

ANOVA
              df    SS           MS           F            Significance F
Regression    7     0.1307501    0.0186786    5.9296226    7.83E-005
Residual      42    0.1323019    0.00315
Total         49    0.263052

             Coefficients    Standard Error    t Stat       P-value      Lower 95%    Upper 95%
Intercept    0.9486239       0.0817168         11.60868     1.09E-014    0.7837128    1.113535
Mid          0.0034995       0.0006493         5.3900133    2.98E-006    0.0021892    0.0048098
Age          0.0005528       0.0014459         0.3822925    0.7041721    -0.002365    0.0034708
EES          -0.001846       0.0010252         -1.800846    0.0789106    -0.003915    0.0002227
SR           -0.000418       0.0018278         -0.228814    0.8201239    -0.004107    0.0032704
G            0.064665        0.0183397         3.525963     0.0010349    0.027654     0.1016759
Raise        0.014655        0.0139089         1.053639     0.2980722    -0.013414    0.0427242
Deg          0.0014676       0.0161098         0.0910996    0.9278465    -0.031043    0.0339785

t-Test: Two-Sample Assuming Equal Variances

                                Variable 1    Variable 2
Mean                            1.06684       1.04836
Variance                        0.0043016     0.006481
Observations                    25            25
Pooled Variance                 0.0053913
Hypothesized Mean Difference    0
df                              48
t Stat                          0.8898353
P(T<=t) one-tail                0.1889963
t Critical one-tail             1.6772242
P(T<=t) two-tail                0.3779926
t Critical two-tail             2.0106348

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9931287
R Square             0.9863046
Adjusted R Square    0.984022
Standard Error       2.4352823
Observations         50

ANOVA
              df    SS           MS           F            Significance F
Regression    7     17938.425    2562.6321    432.10336    5.30E-037
Residual      42    249.08519    5.9305997
Total         49    18187.51

             Coefficients    Standard Error    t Stat       P-value      Lower 95%    Upper 95%
Intercept    -4.871454       3.5457007         -1.373905    0.1767599    -12.02697    2.2840593
Mid          1.2284155       0.0281713         43.605164    1.32E-036    1.1715635    1.2852676
Age          0.0368279       0.0627397         0.5869957    0.5603489    -0.089786    0.1634418
EES          -0.082158       0.0444842         -1.846901    0.0718147    -0.171931    0.0076148
SR           -0.077848       0.0793089         -0.981585    0.331925     -0.2379      0.0822034
G            2.9145083       0.7957605         3.6625445    0.0006935    1.3085986    4.520418
Raise        0.6763295       0.6035088         1.1206622    0.2687989    -0.541601    1.8942595
Deg          0.0345044       0.6990073         0.0493621    0.9608648    -1.376149    1.4451582
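
The t statistic in the "t-Test: Two-Sample Assuming Equal Variances" output can be re-derived from the group means, variances, and counts it lists. The Python sketch below does so as a cross-check; the function name `pooled_t` is hypothetical, and the critical value is the one Excel reports rather than one computed here.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance two-sample t test from summary statistics.

    Returns (t statistic, pooled variance, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, pooled_var, df

# Summary figures copied from the t-test output (Variable 1, then Variable 2).
t_stat, pooled_var, df = pooled_t(1.06684, 0.0043016, 25, 1.04836, 0.006481, 25)

# Two-tailed critical t reported by Excel for df = 48 at alpha = 0.05.
T_CRIT_TWO_TAIL = 2.0106348
reject_null = abs(t_stat) > T_CRIT_TWO_TAIL

print(round(t_stat, 4), round(pooled_var, 7), df, reject_null)
```

With these inputs the statistic stays inside the critical bounds, matching the two-tail p-value of 0.3779926 shown in the output.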
