Homework Set #1
Due: Thursday, 14 February, 2013
Ben Thompson
Source: Data Analysis and Decision Making Fourth Edition
Chapter 2 Describing the Distribution of a Single Variable
Chapter 3 Finding Relationships among Variables
Chapter 4 Probability and Probability Distributions
Chapter 2
Chapter 2, Question #17a - What salary level is most indicative of those
earned by students graduating from this MBA program this year?
The salary level "most indicative" of this group is the one that best represents a typical graduate.
Note that the summary statistics table to the right shows that the mean salary ($124,452.50) is
noticeably larger than the median ($119,250) and that skewness is positive (0.81), so the sample is
skewed to the right. In a skewed data set, the median is a more representative measure of central
tendency than the mean, because it is less affected by a few very high salaries. Thus, the salary
level most indicative of those earned by students graduating from the MBA program this year is $119,250.
Salary
Mean                  124,452.5
Standard Error        2,792.708825
Median                119,250
Mode                  106,700
Standard Deviation    39,494.86696
Sample Variance       1,559,844,516
Kurtosis              1.069430798
Skewness              0.812492851
Range                 209,600
Minimum               41,900
Maximum               251,500
Sum                   24,890,500
Count                 200
Summary statistics from file
P02_17.xlsx using the
descriptive statistics, data
analysis tool in MS Excel.
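Several entries in the summary statistics table are linked by simple identities, which gives a quick sanity check on the Excel output. The sketch below uses Python rather than Excel and recomputes the mean, standard error, and sample variance from the reported sum, count, and standard deviation. The variable names are illustrative and do not come from the workbook.

```python
import math

# Values copied from the summary statistics table for P02_17.xlsx.
total = 24_890_500        # Sum
count = 200               # Count
std_dev = 39_494.86696    # Standard Deviation (sample)

mean = total / count                      # Mean = Sum / Count
std_error = std_dev / math.sqrt(count)    # Standard Error = s / sqrt(n)
variance = std_dev ** 2                   # Sample Variance = s squared

print(f"mean           = {mean:,.1f}")
print(f"standard error = {std_error:,.3f}")
print(f"variance       = {variance:,.0f}")
```

Each recomputed value should agree with the corresponding table entry up to rounding in the reported standard deviation.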
Calculations for the mean, standard deviation, and comparison to the empirical rules of standard deviation

Cell S25 (Mean):   =AVERAGE(B3:B202)
Cell S26 (s*):     =STDEV.S(B3:B202)

Row   Q (Standard Deviation)   Mean - s            Mean + s
22    1                        =S$25-(S$26*Q22)    =S$25+(S$26*Q22)
23    2                        =S$25-(S$26*Q23)    =S$25+(S$26*Q23)
24    3                        =S$25-(S$26*Q24)    =S$25+(S$26*Q24)

Empirical Rule - Count
Row   U (Estimate)     V (Actual)
22    =200 * 68.24%    =(COUNTIF($B$3:$B$202,"<="&SUM(S25,S26))-COUNTIF($B$3:$B$202,"<"&(S25-S26)))
23    =200 * 95.44%    =(COUNTIF($B$3:$B$202,"<="&SUM(S25,S26*2))-COUNTIF($B$3:$B$202,"<"&(S25-S26*2)))
24    =200 * 99.74%    =(COUNTIF($B$3:$B$202,"<="&SUM(S25,S26*3))-COUNTIF($B$3:$B$202,"<"&(S25-S26*3)))

Empirical Rule - % (column X in the worksheet)
Row   Estimate   Actual
22    0.68       =V22/COUNT($B$3:$B$202)
23    0.95       =V23/COUNT($B$3:$B$202)
24    0.997      =V24/COUNT($B$3:$B$202)
Excel column and row numbers are labeled to help understand the
formulas. The sample data set is in $B$3:$B$202.
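For readers who prefer to check the spreadsheet logic outside Excel, the same counting can be sketched in Python. This is a minimal illustration, not the method used in the workbook: the function name `empirical_rule_table` and the sample values are hypothetical, and the real data would come from `$B$3:$B$202`.

```python
import statistics

def empirical_rule_table(data, ks=(1, 2, 3)):
    """For each k, count the values within k sample standard deviations of the mean."""
    mean = statistics.mean(data)    # Excel: =AVERAGE(B3:B202)
    s = statistics.stdev(data)      # Excel: =STDEV.S(B3:B202)
    n = len(data)
    rows = []
    for k in ks:
        low, high = mean - k * s, mean + k * s
        # Same result as COUNTIF(<= high) - COUNTIF(< low): values in [low, high].
        actual = sum(low <= x <= high for x in data)
        rows.append({"k": k, "low": low, "high": high,
                     "actual_count": actual, "actual_pct": actual / n})
    return rows

# Hypothetical sample for illustration only; not the P02_17.xlsx data.
sample = [42, 55, 61, 64, 66, 70, 71, 75, 80, 88, 93, 120]
for row in empirical_rule_table(sample):
    print(row)
```

Comparing `actual_pct` with 0.68, 0.95, and 0.997 mirrors the Estimate and Actual columns above.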
Chapter 2, Question #17b Continued - Do the empirical rules for standard deviations apply to this
data?
The chart to the right shows the results of the calculations above. The estimates are the values
predicted by the empirical rules of standard deviation. The actual results are similar to the
estimates, but the share of the data within one standard deviation of the mean (73%) is larger
than the empirical rule value (68%). Overall, I would say the empirical rules could apply to this
data set, but I would use caution in this case because the data is not symmetrical and
bell-shaped around the mean. This will be shown graphically next.
[Table: Mean, s, Mean - s, and Mean + s for 1, 2, and 3 standard deviations]
Chapter 2, Question #17b - Can you tell, or at least make an educated guess, by looking at the
shape of the histogram? Why?
[Figure: Histogram of salary frequency, with the median ($119,250) marked]
Chapter 2, Question #17c - If the empirical rules apply here, between which two numbers can you be
about 68% sure that the salary of any one of these 200 students will fall?
Assuming the empirical rules apply here, 68% of the data would fall within one standard deviation
of the mean. Using the same chart of information shown earlier, 68% of the salary data would fall
between $84,957.63 and $163,947.37 if the empirical rules applied. Please note that, from the
earlier analysis, 73% of the salary data actually falls between these two numbers.
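The two bounds are simply the mean minus and plus one standard deviation. A short Python check, using the values from the summary statistics table (the variable names are illustrative):

```python
mean = 124_452.5          # Mean from the summary statistics table
std_dev = 39_494.86696    # Standard Deviation from the same table

k = 1                     # one standard deviation, about 68% under the empirical rule
lower = mean - k * std_dev
upper = mean + k * std_dev

print(f"${lower:,.2f} to ${upper:,.2f}")
```

Setting `k` to 2 or 3 gives the corresponding intervals for the 95% and 99.7% rules.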
[Table, repeated from part b: Mean, s, Mean - s, and Mean + s for 1, 2, and 3 standard deviations]
Chapter 2, Question #17d - If the MBA program wants to make a statement such as "Some of our
recent graduates started out making X dollars or more, and almost all of them started out making at
least Y dollars" for their promotional materials, what values of X and Y would you suggest they use?
Defend your choice.
To promote the MBA program, you want to advertise the top salaries strongly. The key phrases
in the statement are "some of our recent graduates" and "making X dollars or more," which
require that more than one graduate earned at least X, so the single top salary cannot be
used. Ranking the salaries and selecting the second highest, $247,900, gives the highest
salary that satisfies the parameters of this statement.
The statement also needs to show the minimum salary graduates can expect, with the phrase
"almost all of them started out making at least." It would be fair to say that 90% could
represent "almost all," and the 10th percentile starts at $77,500.
(Note: A design engineer uses the rule of the 5th/95th percentile. Basically, it will
satisfy 90% of a population's needs, which is deemed "almost all.")
Virginia Tech MBA Spring 2013 - BIT 5724 Managerial Statistics
Completed Statement:
"Some of our recent graduates started out making $247,900 or more, and almost all of
them started out making at least $77,500."
Salary      Rank   Percentile
$251,500    1      100.00%
$247,900    2      99.40%
$241,600    3      98.90%
...
$77,500     180    10.00%
$77,300     181    9.50%
$76,100     182    9.00%
$73,500     183    8.50%
$73,200     184    8.00%
$71,500     185    7.50%
$70,900     186    6.50%
$70,900     186    6.50%
$70,500     188    6.00%
$70,200     189    5.50%
$65,100     190    5.00%
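The Percentile column is consistent with a simple rule: take the number of salaries strictly below a given salary, divide by n - 1, and truncate to three decimal places. That rule is an assumption inferred from the table values, not something stated in the source. The Python sketch below applies it to three rows; the helper name `percent_rank` is hypothetical.

```python
def percent_rank(n_below, n):
    """Share of the other n - 1 values that lie strictly below a value,
    truncated (not rounded) to three decimal places."""
    thousandths = (n_below * 1000) // (n - 1)   # integer math avoids float rounding
    return thousandths / 1000

n = 200  # Count from the summary statistics table

# (salary, number of salaries strictly below it), read off the ranks in the table.
# The two tied $70,900 salaries occupy ranks 186 and 187, so 13 salaries lie below them.
for salary, n_below in [(247_900, 198), (77_500, 20), (70_900, 13)]:
    print(f"${salary:,}: {percent_rank(n_below, n):.2%}")
```

Under this rule, tied salaries share one percentile because the same number of values lies below each of them, which matches the two $70,900 rows.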
Chapter 2, Question #17e - As an admissions officer of this MBA program, how would you proceed to
use these findings to market the program to prospective students?
As the admissions officer, with regard to salary, I would market the MBA program by comparing our graduates' starting salaries
with national benchmarks.
Source: a recent US News article, "The Future Looks Bright for B-School Grads," June 1, 2012
MBA Program
Our MBA program has been recognized by organizations as a leading MBA program because our
graduates' median starting salary is 32% higher than the median expected MBA starting salary.
What does this MBA mean to my salary?
The 2012 average starting salary of an undergraduate is only $44,259, almost $75,000 below
our MBA median starting salary of $119,250.
The 2012 median expected starting salary of an MBA graduate is only $90,000, more than
$29,000 less than the median for our MBA graduates.
In fact, 84% of our MBA graduates land a starting salary above the national median
expected salary.
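The comparisons in this marketing copy come from straightforward arithmetic on three figures. A small Python check of those numbers follows; the variable names are illustrative, and the benchmark figures are the ones quoted above from the US News article.

```python
program_median = 119_250       # median starting salary for this MBA program
national_mba_median = 90_000   # 2012 median expected MBA starting salary
undergrad_average = 44_259     # 2012 average undergraduate starting salary

premium_pct = (program_median / national_mba_median - 1) * 100
gap_vs_mba = program_median - national_mba_median
gap_vs_undergrad = program_median - undergrad_average

print(f"premium over national MBA median: {premium_pct:.1f}%")
print(f"gap versus national MBA median:   ${gap_vs_mba:,}")
print(f"gap versus undergraduate average: ${gap_vs_undergrad:,}")
```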
Chapter 3
Chapter 3, Question #28a - Create a table of correlations between all of the variables. Comment on the
magnitudes of the correlations. Specifically, which of the last three variables, Square Feet, Bedrooms,
and Bathrooms, are highly correlated with selling price?
Before looking at a correlation table, it is best to look at the descriptive statistics.
Appraised Value
Mean                  132,090
Standard Error        1,049.307444
Median                131,760
Mode                  136,890
Standard Deviation    12,765.3760
Sample Variance       162,954,824.5
Kurtosis              -0.212803681
Skewness              0.038596873
Range                 65,800
Minimum               101,930
Maximum               167,730
Sum                   19,549,320
Count                 148
Appraised Value:
The mean appraised home value for this sample of 148 homes is $132,090, ranging from
$101,930 to $167,730. The mean and median are approximately the same, and skewness and
kurtosis are both low, indicating that the appraised values are fairly symmetrical and
that the empirical rules of standard deviation can reasonably be applied.
Selling Price
Mean                  132,955.6757
Standard Error        1,181.489365
Median                135,190
Mode                  145,140
Standard Deviation    14,373.4385
Sample Variance       206,595,733.6
Kurtosis              -0.010596101
Skewness              -0.188608381
Range                 88,600
Minimum               83,760
Maximum               172,360
Sum                   19,677,440
Count                 148
Selling Price:
The mean home selling price for this sample is about $132,956 (higher than the mean
appraised value), ranging from $83,760 to $172,360. The mean is less than the median and
the skew is negative, indicating that a few homes with low selling prices pull the mean
selling price down.
Square Feet
Mean                  1,754.1419
Standard Error        18.844603
Median                1,730
Mode                  1,700
Standard Deviation    229.2545
Sample Variance       52,557.6192
Kurtosis              -0.137747392
Skewness              0.265111803
Range                 1,300
Minimum               1,210
Maximum               2,510
Sum                   259,613
Count                 148
Square Feet:
The mean size for this sample is 1,754 square feet, ranging from 1,210 to 2,510. The mean
and median are close, with some positive skewness, indicating that the values are fairly
symmetrical but that some larger homes skew the square footage distribution to the right.
                      Bedrooms        Bathrooms
Mean                  3.033783784     3.040540541
Standard Error        0.070727278     0.102764857
Median                3               3
Mode                  3               3
Standard Deviation    0.860434478     1.250188440
Sample Variance       0.740347490     1.562971134
Kurtosis              -0.639534125    -0.464574189
Skewness              0.389320707     0.091921821
Range                 3               5
Minimum               2               1
Maximum               5               6
Sum                   449             450
Count                 148             148
Bedrooms:
The average number of bedrooms for this sample is 3, ranging from 2 to 5. The mode and
median are the same, showing that the most common home in this sample has 3 bedrooms.
With some negative kurtosis, positive skewness, and a mean slightly higher than the
median, there are more homes with 2 bedrooms than with 4, but the 5-bedroom homes pull
the mean higher.
Bathrooms:
The mean number of bathrooms for this sample is 3, ranging from 1 to 6. The mode and
median are the same, showing that the most common home in this sample has 3 bathrooms.
The mean being slightly higher than the median suggests that homes with more than 3
bathrooms pull the mean up more than homes with fewer than 3 bathrooms pull it down.
Chapter 3, Question #28a Continued - Create a table of correlations between all of the variables.
Comment on the magnitudes of the correlations. Specifically, which of the last three variables, Square
Feet, Bedrooms, and Bathrooms, are highly correlated with selling price?
A correlation table, like the one depicted to the right, shows the strength of the linear
relationship between each pair of numerical variables. Each correlation is essentially a
single-number summary of a scatterplot. Correlation values are always between -1 and 1,
so a strong correlation is one close to either of those numbers.
                 Selling Price   Square Feet    Bedrooms       Bathrooms
Selling Price    1
Square Feet      0.749499929     1
Bedrooms         0.587484524     0.790229857    1
Bathrooms        0.446022394     0.603537259    0.738621892    1
Magnitude of correlations between selling price and square feet, number of bedrooms, and number of bathrooms.
Selling Price and Square Feet - As the correlation table above shows, there is a strong correlation
(0.7495) between square feet and selling price. In fact, this is arguably the strongest contributor to selling price, since
the appraised value is an estimate of the selling price rather than a contributor to it.
Selling Price and Bedrooms - There is a moderate correlation (0.5875) between the number of bedrooms and the selling
price of a home. This correlation is stronger than the one between selling price and the number of bathrooms, but a
scatterplot is needed to judge whether it is a strong correlation.
Selling Price and Bathrooms - There is a correlation (0.4460) between the number of bathrooms and the selling price of a
home. This is the weakest correlation of the three variables being evaluated, so a scatterplot is needed to judge
how strong it is.
Note that the square footage of the homes in this sample is strongly correlated with both the number of bedrooms (0.7902)
and the number of bathrooms (0.6035). This reveals that, typically, the larger the home in square feet, the more bedrooms and bathrooms it has.
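As an illustration of how such a correlation table is built, the following Python sketch computes the Pearson correlation for every pair of columns. It is a minimal example under stated assumptions: the function names are hypothetical, and the five-home data set is invented for illustration and is not the textbook data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_table(columns):
    """Map each pair of column names to its correlation (lower triangle plus diagonal)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b])
            for i, a in enumerate(names) for b in names[: i + 1]}

# Hypothetical five-home sample for illustration; not the textbook data set.
homes = {
    "Selling Price": [118_000, 125_500, 133_000, 141_200, 156_900],
    "Square Feet":   [1_450, 1_600, 1_730, 1_900, 2_300],
    "Bedrooms":      [2, 3, 3, 3, 4],
}
for pair, r in correlation_table(homes).items():
    print(pair, round(r, 4))
```

Only the lower triangle is stored because the correlation of A with B equals the correlation of B with A, which is also why the table above is triangular.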
Chapter 3, Question #28b - Create four scatterplots to show how the other four variables are related to
Selling Price. In each, Selling Price should be on the Y axis. Are these in line with the correlations in
part a?
[Figure: Scatterplot of Selling Price (thousands) versus Appraisal Price of Home (thousands). Regression equation: y = 0.9384x + 9003.6]
[Figure: Scatterplot of Selling Price (thousands) versus Square Feet in Home. Regression equation: y = 46.991x + 50527]
Chapter 3, Question #28b Continued - Create four scatterplots to show how the other four variables
are related to Selling Price. In each, Selling Price should be on the Y axis. Are these in line with the
correlations in part a?
[Figure: Scatterplot of Selling Price (thousands) versus Number of Bedrooms. Regression equation: y = 9813.8x + 103183]
[Figure: Scatterplot of Selling Price (thousands) versus Number of Bathrooms, with a linear trendline. Regression equation: y = 5127.9x + 117364]
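One way to check whether the regression equations are in line with the correlations in part a is to rebuild each fitted line from summary statistics alone. For a simple least-squares line, the slope equals r times the ratio of the standard deviations, and the line passes through the two means. The Python sketch below applies that identity to the descriptive statistics and correlations reported earlier. The function name `fitted_line` is hypothetical, and Appraised Value is omitted because its correlation with Selling Price is not given in the text.

```python
def fitted_line(r, mean_x, sd_x, mean_y, sd_y):
    """Least-squares slope and intercept from summary statistics."""
    slope = r * sd_y / sd_x               # slope = r * (s_y / s_x)
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept

# Selling Price summary statistics (the y variable in every plot).
mean_price, sd_price = 132_955.6757, 14_373.4385

# (correlation with Selling Price, mean, standard deviation) for each x variable.
predictors = {
    "Square Feet": (0.749499929, 1_754.1419, 229.2545),
    "Bedrooms":    (0.587484524, 3.033783784, 0.860434478),
    "Bathrooms":   (0.446022394, 3.040540541, 1.250188440),
}
for name, (r, mean_x, sd_x) in predictors.items():
    slope, intercept = fitted_line(r, mean_x, sd_x, mean_price, sd_price)
    print(f"{name}: y = {slope:.3f}x + {intercept:.0f}")
```

The resulting slopes and intercepts agree with the regression equations shown on the three scatterplots to within rounding, so the plots and the correlation table are consistent with each other.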
Chapter 3, Question #28c - You might think of the difference, Selling Price minus Appraised Value, as
the error in the appraised value, in the sense that this difference is how much more or less the house
sold for than the appraiser expected. Find the correlation between this difference and Selling Price,
and find the correlation between the absolute value of this difference and Selling Price. If either of
these correlations is reasonably large, what is it telling us?
[Figure: Scatterplot of Selling Price (thousands) versus the difference between Selling Price and Appraised Value (thousands). Regression equation: y = 0.8424x + 132226]
Correlation between Selling Price and the difference between Selling Price and
Appraised Value:
As the scatterplot shows, the data points form a cloud, which indicates no close
correlation, with one extreme outlier that sold for just above $80,000. The
correlation value of 0.4679 is in line with the cloud pattern of the scatterplot.
[Figure: Scatterplot of Selling Price (thousands) versus the absolute value of the difference (thousands). Regression equation: y = -0.4753x + 135889]
Correlation between Selling Price and the absolute value of the difference between
Appraisal Price and Selling Price:
As the scatterplot shows, the data points form a cloud, which indicates no close
correlation. The correlation value of -0.1690 is in line with the cloud pattern of
the scatterplot.
Note: the absolute value of the difference between the appraised value and the selling
price is mostly $10,000 or less, and it is not strongly correlated with the selling price.
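The two correlations in this question differ only in whether the sign of the appraisal error is kept. The Python sketch below shows the calculation on a tiny invented data set, where the signed error tracks selling price closely but the absolute error does not. The function names and all prices are hypothetical and are not the textbook data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def error_correlations(selling, appraised):
    """Correlate Selling Price with the signed and the absolute appraisal error."""
    error = [s - a for s, a in zip(selling, appraised)]   # Selling Price - Appraised Value
    abs_error = [abs(e) for e in error]
    return pearson(error, selling), pearson(abs_error, selling)

# Hypothetical prices for illustration only.
selling = [100_000, 110_000, 120_000, 130_000]
appraised = [110_000, 110_000, 120_000, 120_000]

r_signed, r_abs = error_correlations(selling, appraised)
print(f"signed error vs. selling price:   {r_signed:.4f}")
print(f"absolute error vs. selling price: {r_abs:.4f}")
```

A large signed correlation would mean that expensive homes tend to sell above their appraisals and cheap homes below, while a large absolute correlation would mean that appraisals are less accurate at one end of the price range.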
Chapter 4
Chapter 4, Question #70 - Suppose X and Y are independent random variables. The possible values of
X are -1, 0, and 1; the possible values of Y are 10, 20, and 30. You are given that P(X=-1 and Y=10) =
0.05, P(X=0 and Y=30) = 0.20, P(Y=10) = 0.20, and P(X=0) = 0.50. Determine the joint probability
distribution of X and Y.
We are given the values listed (bolded black) in the joint probability table below. Then we can use the joint probability formulas
to calculate the missing values, in the order listed in this chart.
A joint probability distribution gives the probability of each joint event in which
X = x and Y = y both occur. Because X and Y are independent here,
P(X=x and Y=y) = P(X=x)P(Y=y).
The joint probabilities must be nonnegative and sum to one.
Chapter 4, Question #70 Continued - Suppose X and Y are independent random variables. The
possible values of X are -1, 0, and 1; the possible values of Y are 10, 20, and 30. You are given that
P(X=-1 and Y=10) = 0.05, P(X=0 and Y=30) = 0.20, P(Y=10) = 0.20, and P(X=0) = 0.50. Determine the joint
probability distribution of X and Y.
Using the addition rule for mutually exclusive events, the conditional
probability rule, and the multiplication rule, we were able to calculate the
missing joint probabilities, as shown in the table to the left.
The results show that X is twice as likely to be 0 as it is to be -1, and twice as
likely to be 0 as it is to be 1. Likewise, Y is twice as likely to be 20 as it is to
be 10, and twice as likely to be 30 as it is to be 10. This is why the highest joint
probabilities (0.2) are P(X=0 and Y=20) and P(X=0 and Y=30).
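The derivation can be retraced in a few lines of Python. This sketch follows one possible route, using independence to recover the marginals first; it is not necessarily the order of steps used in the original table. Exact fractions are used so the results are free of rounding error, and the variable names are illustrative.

```python
from fractions import Fraction as F

x_values, y_values = (-1, 0, 1), (10, 20, 30)

# Given in the problem statement.
p_x0 = F("0.50")         # P(X=0)
p_y10 = F("0.20")        # P(Y=10)
p_xm1_y10 = F("0.05")    # P(X=-1 and Y=10)
p_x0_y30 = F("0.20")     # P(X=0 and Y=30)

# Independence: P(X=x and Y=y) = P(X=x)P(Y=y), so dividing recovers a marginal.
p_x = {-1: p_xm1_y10 / p_y10, 0: p_x0}
p_x[1] = 1 - p_x[-1] - p_x[0]          # marginal probabilities sum to one
p_y = {10: p_y10, 30: p_x0_y30 / p_x0}
p_y[20] = 1 - p_y[10] - p_y[30]

# Multiply the marginals to fill in every cell of the joint table.
joint = {(x, y): p_x[x] * p_y[y] for x in x_values for y in y_values}

print("X \\ Y", *y_values)
for x in x_values:
    print(x, *[float(joint[(x, y)]) for y in y_values])
```

The marginals come out as P(X=-1) = P(X=1) = 0.25 and P(Y=20) = P(Y=30) = 0.40, which matches the "twice as likely" observations above.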