
BIT 5724

Homework Set #1
Due: Thursday, 14 February, 2013
Ben Thompson
Source: Data Analysis and Decision Making Fourth Edition
Chapter 2 Describing the Distribution of a Single Variable
Chapter 3 Finding Relationships among Variables
Chapter 4 Probability and Probability Distributions

Virginia Tech MBA Spring 2013 - BIT 5724 Managerial Statistics


Chapter 2
Chapter 2, Question #17a - What salary level is most indicative of those
earned by students graduating from this MBA program this year?
The salary level "most indicative" is the most representative (typical) salary.
Note that the summary statistics table below shows that the mean salary ($124,452.50) is
noticeably larger than the median ($119,250), illustrating that the sample is skewed to the
right. In a skewed data set, the median is a more representative measure of central
tendency than the mean. Thus, the salary level most indicative of those earned by
students graduating from the MBA program this year is $119,250.

Chapter 2, Question #17b Do the empirical rules for standard deviations apply to this data?
Standard deviation is a measure of variability, expressed in the same units as the
observations (square root of the variance). The empirical rules for standard deviation
apply if the values of the variable are approximately normally distributed (symmetric and
bell-shaped).

Empirical Rules of Standard Deviation:


Approximately 68% of the observations are within one standard deviation of the mean
Approximately 95% of the observations are within two standard deviations of the mean
Approximately 99.7% of the observations are within three standard deviations of the mean
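
The percentages in these rules come from the normal distribution. As a minimal sketch (Python is used here only for illustration; the homework itself was done in Excel), the probability that a normally distributed value lies within k standard deviations of its mean is erf(k / sqrt(2)). The function name is hypothetical.

```python
import math

def prob_within_k_sd(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# Print the exact normal-distribution share for 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(prob_within_k_sd(k), 4))
```

These values round to about 68%, 95%, and 99.7%, which is where the rule-of-thumb figures come from.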

Salary (summary statistics from file P02_17.xlsx, produced with the Descriptive Statistics tool of the Data Analysis add-in in MS Excel)

Statistic             Value
Mean                  124,452.5
Standard Error        2,792.708825
Median                119,250
Mode                  106,700
Standard Deviation    39,494.86696
Sample Variance       1,559,844,516
Kurtosis              1.069430798
Skewness              0.812492851
Range                 209,600
Minimum               41,900
Maximum               251,500
Sum                   24,890,500
Count                 200

Calculations for the mean, standard deviation, and comparison to the empirical rules of standard deviation

Column Q, rows 22-24, holds the number of standard deviations (1, 2, 3). Cell S25 holds the mean and cell S26 holds s*.

Mean (S25):  =AVERAGE(B3:B202)
s* (S26):    =STDEV.S(B3:B202)

Mean - s (column S, rows 22-24):
=S$25-(S$26*Q22)
=S$25-(S$26*Q23)
=S$25-(S$26*Q24)

Mean + s (rows 22-24):
=S$25+(S$26*Q22)
=S$25+(S$26*Q23)
=S$25+(S$26*Q24)

Empirical Rule - Count, Estimate (column U, rows 22-24):
=200 * 68.24%
=200 * 95.44%
=200 * 99.74%

Empirical Rule - Count, Actual (column V, rows 22-24):
=(COUNTIF($B$3:$B$202,"<="&SUM(S25,S26))-COUNTIF($B$3:$B$202,"<"&(S25-S26)))
=(COUNTIF($B$3:$B$202,"<="&SUM(S25,S26*2))-COUNTIF($B$3:$B$202,"<"&(S25-S26*2)))
=(COUNTIF($B$3:$B$202,"<="&SUM(S25,S26*3))-COUNTIF($B$3:$B$202,"<"&(S25-S26*3)))

Empirical Rule - % (labeled column X in the worksheet, rows 22-24):
Estimate 0.68     Actual =V22/COUNT($B$3:$B$202)
Estimate 0.95     Actual =V23/COUNT($B$3:$B$202)
Estimate 0.997    Actual =V24/COUNT($B$3:$B$202)

Note: Sample set contains 200 data points.
*s = standard deviation


Excel column and row numbers are labeled to help understand the
formulas. The sample data set is in $B$3:$B$202.
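
For readers who prefer code to spreadsheet formulas, the COUNTIF comparison can be sketched in Python. This is only an illustration: the function name and the short salary list are hypothetical, and the real data set is the 200 salaries in P02_17.xlsx.

```python
import statistics

def share_within_k_sd(values, k):
    """Return (count, share) of observations within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation, like Excel's STDEV.S
    low, high = mean - k * s, mean + k * s
    # Same idea as COUNTIF("<=" upper bound) minus COUNTIF("<" lower bound).
    count = sum(1 for v in values if low <= v <= high)
    return count, count / len(values)

# Hypothetical illustration values, NOT the homework data set.
sample = [41_900, 77_500, 106_700, 119_250, 124_000, 163_000, 251_500]
for k in (1, 2, 3):
    print(k, share_within_k_sd(sample, k))
```

Comparing the returned share with 68%, 95%, and 99.7% mirrors the Estimate versus Actual columns of the worksheet.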
The table below is the result of the calculations above. The estimates show the
counts and percentages predicted by the empirical rules of standard deviation.
While the actual results are similar to the empirical values, the share of data
within one standard deviation of the mean (73%) is larger than the empirical rule
value (68%). Overall, I would say the empirical rules could apply to this data set,
but I would use caution in this case because the data is not symmetrical and
bell-shaped around the mean. This will be shown graphically next.

Standard     Mean - s        Mean + s         Empirical Rule - Count    Empirical Rule - %
Deviation                                     Estimate    Actual        Estimate    Actual
1            84,957.6330     163,947.3670     136         146           68.0%       73.0%
2            45,462.7661     203,442.2339     191         190           95.0%       95.0%
3            5,967.8991      242,937.1009     199         198           99.7%       99.0%
Mean         124,452.5000
s*           39,494.8670
Note: Sample set contains 200 data points.
*s = standard deviation
Chapter 2, Question #17b Can you tell, or at least make an educated guess, by looking at the
shape of the histogram? Why?
The empirical rules for standard deviation apply if the values of the variable are
approximately normally distributed (symmetric and bell-shaped). The histogram shows that
the data is not perfectly symmetrical around the median, nor is it perfectly bell-shaped.
Thus, just by looking at the histogram, you can make an educated guess that the
distribution, while close to a symmetrical bell shape, is slightly skewed to the right
(positively skewed), with a higher frequency within one standard deviation of the mean
than the empirical rule predicts, as shown in the table above.
This histogram is unimodal, meaning it shows one peak, and is likely from one data set.

[Figure: histogram "Recent Graduate Salaries". X axis: Salary in Thousands (45 to 265); Y axis: Frequency (0 to 60); median marked at $119,250.]

Chapter 2, Question #17c If the empirical rules apply here, between which two numbers can you be
about 68% sure that the salary of any one of these 200 students will fall?
Assuming the empirical rules apply here, 68% of the data would fall within one
standard deviation of the mean. Reviewing the same table of information shown
earlier, we can see that if the empirical rules applied, 68% of the salary data would
fall between $84,957.63 and $163,947.37. Please note that, from the earlier analysis,
73% of the salary data actually falls between these two numbers.
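
The two bounds are simply the mean minus and plus one standard deviation. A minimal arithmetic check, using the mean and standard deviation reported in the summary statistics (the variable names are only illustrative):

```python
mean = 124_452.5   # sample mean from the summary statistics
s = 39_494.86696   # sample standard deviation from the summary statistics

# One-standard-deviation interval around the mean.
low, high = mean - s, mean + s
print(f"${low:,.2f} to ${high:,.2f}")
```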

Standard     Mean - s        Mean + s         Empirical Rule - Count    Empirical Rule - %
Deviation                                     Estimate    Actual        Estimate    Actual
1            84,957.6330     163,947.3670     136         146           68.0%       73.0%
2            45,462.7661     203,442.2339     191         190           95.0%       95.0%
3            5,967.8991      242,937.1009     199         198           99.7%       99.0%
Mean         124,452.5000
s*           39,494.8670
Note: Sample set contains 200 data points.
*s = standard deviation

Chapter 2, Question #17d If the MBA program wants to make a statement such as "Some of our
recent graduates started out making X dollars or more, and almost all of them started out making at
least Y dollars" for their promotional materials, what values of X and Y would you suggest they use?
Defend your choice.
To promote the MBA program, you want to advertise the top salaries prominently.
The key phrases in the statement are "some of our recent graduates" and
"making X dollars or more", which require more than one graduate at or above X,
so X cannot be the single top salary. You can rank the salaries and select the
second-highest salary, $247,900, which is the highest salary satisfying the
parameters of this statement.
The statement also needs to show the minimum salary graduates can expect,
through the phrase "almost all of them started out making at least". It would be
fair to say that 90% could represent "almost all", and the 10th percentile starts
at $77,500.
(Note: A design engineer uses the rule of the 5th/95th percentile. Basically, it
satisfies 90% of a population's needs, which is deemed "almost all".)
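
The percentile attached to each rank can be approximated as the share of the other observations that fall below a given salary. The sketch below assumes the Excel ranking tool computes (number of values below) / (n - 1) and truncates to three decimals; that is an assumption about the tool, not something stated in the assignment, and the function name is hypothetical.

```python
import math

def percent_rank(count_below, n, digits=3):
    """Share of the other n - 1 observations below a value, truncated to `digits` decimals."""
    raw = count_below / (n - 1)
    factor = 10 ** digits
    return math.floor(raw * factor) / factor

n = 200
# Rank 2 of 200 has 198 salaries below it; rank 180 has 20 salaries below it.
print(percent_rank(198, n))
print(percent_rank(20, n))
```

Under this assumption the results agree with the 99.40% and 10.00% listed for ranks 2 and 180 in the ranking table.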

Completed Statement:
"Some of our recent graduates started out making $247,900 or more, and almost all of
them started out making at least $77,500."

Salary       Rank    Percentile
$251,500     1       100.00%
$247,900     2       99.40%
$241,600     3       98.90%
...
$77,500      180     10.00%
$77,300      181     9.50%
$76,100      182     9.00%
$73,500      183     8.50%
$73,200      184     8.00%
$71,500      185     7.50%
$70,900      186     6.50%
$70,900      186     6.50%
$70,500      188     6.00%
$70,200      189     5.50%
$65,100      190     5.00%

Chapter 2, Question #17e As an admissions officer of this MBA program, how would you proceed to
use these findings to market the program to prospective students?
As the admissions officer, I would market the MBA program by presenting our graduates'
starting salaries in comparative terms.
Source: a recent US News article, "The Future Looks Bright for B-School Grads," June 1, 2012

MBA Program
Our MBA program has been recognized by organizations as a leading MBA program,
with a median starting salary for our graduates that is 32% higher than the
national median expected MBA starting salary.
What does this MBA mean to my salary?
The 2012 average starting salary of an undergraduate is only $44,259, almost
$75,000 below our MBA graduates' median starting salary of $119,250.
The 2012 median expected starting salary of an MBA graduate is only $90,000,
about $29,000 less than the median for our MBA graduates.
In fact, 84% of our MBA graduates land a starting salary above the national
median expected starting salary.
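
The comparative figures in this marketing copy follow from simple arithmetic on the program median and the two national figures cited above. A minimal check (variable names are illustrative only):

```python
program_median = 119_250       # median starting salary of the 200 graduates
national_mba_median = 90_000   # national median expected MBA starting salary cited above
undergrad_average = 44_259     # average undergraduate starting salary cited above

# Percentage premium over the national MBA median, and the two dollar gaps.
premium_pct = (program_median / national_mba_median - 1) * 100
gap_vs_mba = program_median - national_mba_median
gap_vs_undergrad = program_median - undergrad_average
print(f"{premium_pct:.1f}% premium; gaps of ${gap_vs_mba:,} and ${gap_vs_undergrad:,}")
```

Note that the 32% premium is based on the program's median salary; using the mean of $124,452.50 instead would give a larger figure.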

Our MBA Graduates' Starting Salaries

Mean               $124,453
Median             $119,250
Maximum            $251,500
16th Percentile    $90,300
Sample Count       200


Chapter 3
Chapter 3, Question #28a Create a table of correlations between all of the variables. Comment on the
magnitudes of the correlations. Specifically, which of the last three variables, Square Feet, Bedrooms,
and Bathrooms, are highly correlated with selling price?
Before looking at a correlation table, it is best to look at the descriptive statistics.
Appraised Value
Statistic             Value
Mean                  132,090
Standard Error        1,049.307444
Median                131,760
Mode                  136,890
Standard Deviation    12,765.3760
Sample Variance       162,954,824.5
Kurtosis              -0.212803681
Skewness              0.038596873
Range                 65,800
Minimum               101,930
Maximum               167,730
Sum                   19,549,320
Count                 148

Appraised Value:
The average (mean) appraised home value for this sample of 148 homes is $132,090,
ranging from $101,930 up to $167,730. The mean and median are approximately the same,
with low skewness and kurtosis, indicating the appraised values are fairly symmetrical
and open to the empirical rules of standard deviation.

Selling Price
Statistic             Value
Mean                  132,955.6757
Standard Error        1,181.489365
Median                135,190
Mode                  145,140
Standard Deviation    14,373.4385
Sample Variance       206,595,733.6
Kurtosis              -0.010596101
Skewness              -0.188608381
Range                 88,600
Minimum               83,760
Maximum               172,360
Sum                   19,677,440
Count                 148

Selling Price:
The average (mean) home selling price for this sample is $132,956 (higher than the
mean appraised value), ranging from $83,760 up to $172,360. The mean is less than the
median, with negative skew, indicating that some outliers with low selling prices pull
the average selling price down.


Square Feet
Statistic             Value
Mean                  1,754.1419
Standard Error        18.844603
Median                1,730
Mode                  1,700
Standard Deviation    229.2545
Sample Variance       52,557.6192
Kurtosis              -0.137747392
Skewness              0.265111803
Range                 1,300
Minimum               1,210
Maximum               2,510
Sum                   259,613
Count                 148

Square Feet:
The average (mean) square footage for this sample is 1,754, ranging from 1,210 up to
2,510. The mean and median are close to the same, with some positive skewness,
indicating the values are fairly symmetrical, but with some larger homes skewing the
square footage distribution to the right.

Statistic             Bedrooms        Bathrooms
Mean                  3.033783784     3.040540541
Standard Error        0.070727278     0.102764857
Median                3               3
Mode                  3               3
Standard Deviation    0.860434478     1.250188440
Sample Variance       0.740347490     1.562971134
Kurtosis              -0.639534125    -0.464574189
Skewness              0.389320707     0.091921821
Range                 3               5
Minimum               2               1
Maximum               5               6
Sum                   449             450
Count                 148             148

Bedrooms:
The average number of bedrooms for this sample is 3, ranging from 2 to 5. The mode and
median are the same, showing most homes in this sample have 3 bedrooms. With some
negative kurtosis, positive skewness, and the mean being slightly higher than the
median, there are more homes with 2 than 4 bedrooms, but the 5-bedroom homes skew the
mean higher.

Bathrooms:
The average (mean) number of bathrooms for this sample is 3, ranging from 1 to 6. The
mode and median are the same, showing most homes in this sample have 3 bathrooms. With
the mean being slightly higher than the median, there are more homes with more than 3
bathrooms than there are homes with fewer than 3 bathrooms.

A correlation table, like the one below, shows the strength of the linear
relationship between each pair of numerical variables. A correlation is basically
a single-number summary of a scatterplot. Correlation values are always between -1
and 1, so a high correlation is one that is close to either of those numbers.
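
As a minimal sketch of what each entry in such a table measures, the Pearson correlation can be computed directly from two columns of data. The function name and the toy values below are hypothetical illustrations, not the homework data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values, NOT the homework data set.
square_feet = [1200, 1500, 1700, 2000, 2500]
selling_price = [95_000, 120_000, 118_000, 150_000, 170_000]
print(round(pearson_r(square_feet, selling_price), 4))
```

Excel's CORREL function and the Correlation tool in the Data Analysis add-in compute the same quantity.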

Correlation Table    Appraised Value    Selling Price    Square Feet    Bedrooms       Bathrooms
Appraised Value      1
Selling Price        0.833406326        1
Square Feet          0.677236205        0.749499929      1
Bedrooms             0.572260566        0.587484524      0.790229857    1
Bathrooms            0.423526693        0.446022394      0.603537259    0.738621892    1

Magnitude of correlations between selling price and square feet, number of bedrooms, and number of bathrooms:
Selling Price and Square Feet: as you can see from the correlation table above, there is a strong correlation
(0.7495) between square feet and selling price. In fact, this is arguably the strongest contributor to selling price, since
the appraised value variable is just a valuation of selling price and not a contributor to it.
Selling Price and Bedrooms: there is a correlation between the number of bedrooms and the selling price of a home
(0.5875). This correlation is stronger than the correlation of selling price to the number of bathrooms, but we may
need to look closer using a scatterplot to determine whether this is a strong correlation.
Selling Price and Bathrooms: there is a correlation between the number of bathrooms and the selling price of a home
(0.4460). While this is the weakest correlation of the three variables being evaluated, we will want to look at this
correlation more closely using a scatterplot to determine how strong it is.
Note that square footage is strongly correlated with both the number of bedrooms (0.7902) and the number of bathrooms
(0.6035) in this sample. This reveals that, typically, the larger the home in square feet, the more bedrooms and bathrooms it has.


Chapter 3, Question #28b Create four scatterplots to show how the other four variables are related to
Selling Price. In each, Selling Price should be on the Y axis. Are these in line with the correlations in
part a?
A scatterplot is a scatter of points, where each point denotes the values of an
observation for two selected variables.
A trend line is a line or curve that fits the scatter as well as possible; it is given by the
regression equation shown with each plot.

[Figure: scatterplot "Correlation between Selling Price and Appraisal Price". X axis: Appraisal Price of Home (in thousands); Y axis: Selling Price (in thousands); linear trend line with regression equation y = 0.9384x + 9003.6.]

Correlation between Selling Price and Appraisal Price:
The scatterplot shows a strong positive correlation between the Selling Price and the
Appraisal Price, as evidenced by almost all of the data points being close to the linear
trend line. This is in line with the correlation table value of 0.8334.

[Figure: scatterplot "Correlation between Selling Price and Square Feet". X axis: Square Feet in Home (1200 to 2600); Y axis: Selling Price (in thousands); linear trend line with regression equation y = 46.991x + 50527.]

Correlation between Selling Price and Square Feet:
The scatterplot shows a strong positive correlation between the Selling Price and the
square feet, as evidenced by the data points being close to the linear trend line. This is
in line with the correlation table value of 0.7495. Note that the data points are not as
close to the trend line as in the plot against Appraisal Price, which is in line with the
lower correlation value.
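
The link between a correlation and its trend line can be checked numerically: for a simple linear regression, the slope equals r times (standard deviation of y / standard deviation of x), and the line passes through the two means. A minimal sketch using the Square Feet and Selling Price statistics reported in part a (variable names are illustrative):

```python
r = 0.749499929             # correlation of Selling Price and Square Feet
sd_price = 14_373.4385      # standard deviation of Selling Price
sd_sqft = 229.2545          # standard deviation of Square Feet
mean_price = 132_955.6757   # mean Selling Price
mean_sqft = 1_754.1419      # mean Square Feet

# Least-squares slope and intercept implied by the summary statistics.
slope = r * sd_price / sd_sqft
intercept = mean_price - slope * mean_sqft
print(round(slope, 3), round(intercept))
```

The result should agree, up to rounding, with the regression equation y = 46.991x + 50527 reported for the Square Feet scatterplot.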

[Figure: scatterplot "Correlation between Selling Price and Number of Bedrooms". X axis: Number of Bedrooms; Y axis: Selling Price (in thousands); linear trend line with regression equation y = 9813.8x + 103183.]

Correlation between Selling Price and Bedrooms:
The scatterplot shows a moderate positive correlation between the Selling Price and the
number of bedrooms, as evidenced by the data points lying fairly close to the linear
trend line. This is in line with the correlation table value of 0.5875. Note that the
data points are not as close to the trend line as in the plots against Appraisal Price or
Square Feet, which is in line with the lower correlation value.
The regression equation implies that each additional bedroom is associated with an
increase in selling price of $9,813.80 on average.

[Figure: scatterplot "Correlation between Selling Price and Number of Bathrooms". X axis: Number of Bathrooms; Y axis: Selling Price (in thousands); linear trend line with regression equation y = 5127.9x + 117364.]

Correlation between Selling Price and Bathrooms:
The scatterplot shows a slightly positive correlation between the Selling Price and the
number of bathrooms, as evidenced by a wider spread of the data points around the
linear trend line. This is in line with the correlation table value of 0.4460. Note that
some homes with one bathroom sold for more than some homes with five bathrooms.
Overall, the linear trend shows some positive correlation, but not as much as for the
other variables.
The regression equation implies that each additional bathroom is associated with an
increase in selling price of $5,127.90 on average.


Chapter 3, Question #28c You might think of the difference, Selling Price minus Appraised Value, as
the error in the appraised value, in the sense that this difference is how much more or less the house
sold for than the appraiser expected. Find the correlation between this difference and Selling Price,
and find the correlation between the absolute value of this difference and Selling Price. If either of
these correlations is reasonably large, what is it telling us?

[Figure: calculations for the difference (Selling Price minus Appraised Value) and the absolute value of that difference.]

[Figure: scatterplot "Correlation between Selling Price and the difference between Appraisal Price and Selling Price". X axis: Difference between Appraisal Price and Selling Price of Home in Thousands; Y axis: Selling Price (in thousands); linear trend line with regression equation y = 0.8424x + 132226.]

Correlation between Selling Price and the difference between Appraisal Price and Selling Price:
As the scatterplot shows, the data points look more like a cloud, with no close
correlation, and there is one extreme outlier that sold for just above $80,000. The
correlation value of 0.4679 is in line with the cloud pattern of the scatterplot.

[Figure: scatterplot "Correlation between Selling Price and the absolute value of the difference between the Appraisal Price and Selling Price". X axis: Difference between Appraisal Price and Selling Price of Home in Thousands (0 to 60); Y axis: Selling Price (in thousands); linear trend line with regression equation y = -0.4753x + 135889.]

Correlation between Selling Price and the absolute value of the difference between Appraisal Price and Selling Price:
As the scatterplot shows, the data points look more like a cloud, with no close
correlation. The correlation value of -0.1690 is in line with the cloud pattern of the
scatterplot.

Note: the absolute value of the difference between the appraised value and the selling
price is mostly $10,000 or less, but the difference is not correlated to the selling price.

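The correlation between the difference and Selling Price can also be cross-checked from the summary statistics alone, without the raw data. The sketch below uses the standard identities Var(S - A) = Var(S) + Var(A) - 2 Cov(S, A) and Cov(S - A, S) = Var(S) - Cov(S, A), where S is Selling Price and A is Appraised Value; the variable names are illustrative only.

```python
import math

var_price = 206_595_733.6        # sample variance of Selling Price
var_appraised = 162_954_824.5    # sample variance of Appraised Value
r_price_appraised = 0.833406326  # correlation of Selling Price and Appraised Value

# Covariance recovered from the correlation and the two standard deviations.
cov = r_price_appraised * math.sqrt(var_price * var_appraised)
var_diff = var_price + var_appraised - 2 * cov   # Var(S - A)
cov_diff_price = var_price - cov                 # Cov(S - A, S)

r_diff_price = cov_diff_price / math.sqrt(var_diff * var_price)
slope = cov_diff_price / var_diff                # slope of Selling Price on the difference
print(round(r_diff_price, 4), round(slope, 4))
```

Both results should agree, up to rounding, with the reported correlation of 0.4679 and the trend-line slope of 0.8424. The same shortcut does not work for the absolute value of the difference, which needs the raw data.
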

Chapter 4
Chapter 4, Question #70 Suppose X and Y are independent random variables. The possible values of
X are -1,0, and 1; the possible values of Y are 10, 20, and 30. You are given that P(X=-1 and Y=10) =
0.05, P(X=0 and Y=30) = 0.20, P(Y=10) = 0.20, and P(X=0) = 0.50. Determine the joint probability
distribution of X and Y.
We are given the four values stated in the question (shown in bold black in the joint probability table). We can then use the
joint probability formulas to calculate the missing values.
Joint Probability Distribution: the probability of the joint event that
X = x and Y = y both occur. Because X and Y are independent here, P(X=x and Y=y) = P(X=x)P(Y=y).
The joint probabilities must be nonnegative and sum to one.
Marginal Probability (Distribution): the probability distribution of a
single random variable. These probabilities are called marginal because they are
usually displayed in the margins of a table.
Multiplication Rule for Independent Events: P(A and B) = P(A)P(B)
Conditional Probability: P(A|B) = P(A and B) / P(B)
The principles used to calculate the joint probabilities are:
The joint probabilities in each column and row must add up to the
corresponding marginal probability.
The joint probability for independent random variables can be calculated by
using the multiplication rule for independent events.
Conditional probability is the joint probability divided by the marginal
probability.

Using the addition rule for mutually exclusive events, the conditional
probability rule, and the multiplication rule, we were able to calculate the
missing joint probabilities shown in the joint probability table.
The results show that X is twice as likely to be 0 (0.50) as it is to be -1 or to
be 1 (0.25 each). Also, Y is twice as likely to be 20 or to be 30 (0.40 each) as it is
to be 10 (0.20). This is why the highest joint probabilities (0.2) are
P(X=0 and Y=20) and P(X=0 and Y=30).
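
The full joint distribution can be reproduced from the four given values. The sketch below is one possible ordering of the steps, not necessarily the order used in the worksheet: it recovers the marginals by dividing under independence, fills in the remaining marginals so each set sums to one, and multiplies.

```python
# Given values from the problem statement.
p_x_neg1_and_y10 = 0.05
p_x0_and_y30 = 0.20
p_y10 = 0.20
p_x0 = 0.50

# Independence: P(X=x and Y=y) = P(X=x) * P(Y=y), so divide to recover marginals.
p_x = {-1: p_x_neg1_and_y10 / p_y10, 0: p_x0}
p_x[1] = 1 - p_x[-1] - p_x[0]     # marginals of X must sum to one
p_y = {10: p_y10, 30: p_x0_and_y30 / p_x0}
p_y[20] = 1 - p_y[10] - p_y[30]   # marginals of Y must sum to one

# Multiplication rule for independent events gives every joint probability.
joint = {(x, y): p_x[x] * p_y[y] for x in (-1, 0, 1) for y in (10, 20, 30)}

for x in (-1, 0, 1):
    print(x, [round(joint[(x, y)], 3) for y in (10, 20, 30)])
```

Each printed row lists P(X=x and Y=y) for Y = 10, 20, 30; the rows sum to the marginals of X and all nine entries sum to one.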
