
Statistical Methods for Managerial Decisions Assignment 1

1. Prepare a brief report summarizing the home values (prices) in this area. Use both graphical and numerical
summaries. Your report should briefly describe what those summaries tell you, and anything of particular
note/interest.

Quantile            House Price
100.0%  maximum     446,436
99.5%               385,738
97.5%               329,453
90.0%               255,526
75.0%   quartile    205,397
50.0%   median      151,917
25.0%   quartile    111,875
10.0%               92,785
2.5%                62,971
0.5%                41,830
0.0%    minimum     16,858

Parameter           Value
Mean                163,862
Std Dev             67,652
Std Err Mean        2,091
Upper 95% Mean      167,965
Lower 95% Mean      159,760
N                   1,047
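The standard error and the 95% limits of the mean can be cross-checked from the mean, standard deviation, and N reported above. The sketch below assumes a normal-approximation interval; the software that produced the table most likely used a t quantile, so the limits can differ by a few units.

```python
from math import sqrt
from statistics import NormalDist

# Values copied from the summary statistics above.
mean, std_dev, n = 163_862, 67_652, 1_047

std_err = std_dev / sqrt(n)        # standard error of the mean
z = NormalDist().inv_cdf(0.975)    # about 1.96; normal approximation (an assumption)
lower = mean - z * std_err
upper = mean + z * std_err

print(f"Std Err Mean ~ {std_err:,.0f}")
print(f"95% interval for the mean ~ ({lower:,.0f}, {upper:,.0f})")
```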

a. The histogram of House Prices is skewed to the right (positively skewed): the median (151,917) is below the mean (163,862), meaning more than half the houses are priced below the average house price in the area. (Median < Mean)
b. The interquartile range runs from 111,875 to 205,397, i.e. the middle 50% of the houses are priced between these two values.
c. As can be seen from the boxplot, there is a significant number of outliers at the high end of house prices, which suggests that some houses cover a large area or are luxurious and are therefore more expensive (since geography makes no difference in this case).
d. The mean of this distribution is 163,862 with a standard deviation of 67,652. The distribution is spread out rather than narrow, as the histogram also shows.
e. From a real estate agent's perspective, these statistics can be used to bucket houses by clients' budgets and requirements. For example, the cheapest house is worth about 16.9K while the costliest is worth about 446K.
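The remark about high-end outliers can be checked against the quantile table alone. This is a minimal sketch assuming the common 1.5 x IQR boxplot rule; the report does not say which rule the plotting software used.

```python
# Quartiles and extremes copied from the quantile table above.
q1, q3 = 111_875, 205_397
minimum, maximum = 16_858, 446_436

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # 1.5 x IQR rule (assumed convention)
upper_fence = q3 + 1.5 * iqr

has_high_outliers = maximum > upper_fence
has_low_outliers = minimum < lower_fence

print(f"IQR = {iqr:,}")
print(f"Fences = ({lower_fence:,.0f}, {upper_fence:,.0f})")
print(f"High outliers: {has_high_outliers}, low outliers: {has_low_outliers}")
```

Under that rule the maximum lies above the upper fence while the minimum stays inside the lower fence, which matches outliers appearing only on the expensive side.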

2. Does the normal model provide a good description of the prices? Use a Normal Quantile plot to frame your response.

a. The distribution of House Prices is unlikely to follow the normal model, as can be seen from the normal curve fitted over the histogram of house prices.
b. The Normal Quantile Plot agrees with this inference, since most of the data points (black dots) fall outside the acceptable limits (red dotted curves) and lie well away from the normal model line (red line).
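Without the raw data, the idea behind the Normal Quantile Plot can still be illustrated from the quantile table in Q1: compare each observed quantile with the quantile a normal model would give. The sketch below assumes the model uses the sample mean and standard deviation from Q1; it is a numeric stand-in for the plot, not a reproduction of it.

```python
from statistics import NormalDist

# Normal model with the sample mean and standard deviation from Q1 (assumption).
model = NormalDist(mu=163_862, sigma=67_652)

# Observed quantiles copied from the Q1 table.
observed = {
    0.025: 62_971, 0.10: 92_785, 0.25: 111_875, 0.50: 151_917,
    0.75: 205_397, 0.90: 255_526, 0.975: 329_453,
}

gaps = {}
for p, actual in observed.items():
    expected = model.inv_cdf(p)      # quantile under the normal model
    gaps[p] = actual - expected      # positive: data sit above the model line
    print(f"{p:>6.1%}  observed {actual:>8,}  model {expected:>10,.0f}  gap {gaps[p]:>+10,.0f}")
```

Both tails come out above the model while the median falls below it, the curved pattern expected from right-skewed data.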

3. Irrespective of your response to Q2, assume that Price ~ N(164K, (68K)^2). Given this:
A. Calculate the following probabilities P(Price > 92.8K), P(Price < 255.5K). Do these numbers agree with what you see in the
data?

a. P(Price > 92.8K), assuming Price ~ N(164K, (68K)^2)

x = 92.8, μ = 164, σ = 68
x = μ + zσ
92.8 = 164 + 68z
z = -1.04
P(X < 92.8) = 0.1492
P(X > 92.8) = 1 - P(X < 92.8)
P(X > 92.8) = 0.8508

As per data, 942 of 1047 total houses have prices greater than 92.8K, which is close to 90% of the data.
As per data, P(Price > 92.8K) = 0.899

b. P(Price < 255.5K), assuming Price ~ N(164K, (68K)^2)

x = 255.5, μ = 164, σ = 68
x = μ + zσ
255.5 = 164 + 68z
z = 1.34
P(X < 255.5) = 0.9099

As per data, 943 of 1047 total houses have prices less than 255.5K, which is close to 90% of the data.
As per data, P(Price < 255.5K) = 0.90

For P(Price > 92.8K) the model and the data are close, differing by about 5 percentage points (0.851 vs 0.899). For P(Price < 255.5K) the two probabilities are very close (0.910 vs 0.90), so the model represents the data well there.
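The two probabilities can also be computed directly rather than from a z-table. This is a short sketch using Python's standard library with the stated model N(164, 68^2), prices in thousands; because it does not round z to two decimals, the results differ slightly from the table-lookup values.

```python
from statistics import NormalDist

# Stated model: Price ~ N(164K, (68K)^2), in thousands.
price = NormalDist(mu=164, sigma=68)

p_above_92_8 = 1 - price.cdf(92.8)    # P(Price > 92.8K)
p_below_255_5 = price.cdf(255.5)      # P(Price < 255.5K)

# Empirical proportions from the counts quoted in the answer.
data_above_92_8 = 942 / 1047
data_below_255_5 = 943 / 1047

print(f"P(Price > 92.8K):  model {p_above_92_8:.4f}, data {data_above_92_8:.4f}")
print(f"P(Price < 255.5K): model {p_below_255_5:.4f}, data {data_below_255_5:.4f}")
```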

Student Name : Rajat Shah Dated : 28/04/2017


Student ID : 61810460

B. Once again, assuming the above normal distribution, what percentage of houses should have a value less than 232K? Does that agree with the data?

Assuming: Price ~ N(164K, (68K)^2)

x = 232, μ = 164, σ = 68
x = μ + zσ
232 = 164 + 68z
z = 1
P(X < 232) = 0.8413

As per data, 878 of 1047 total houses have prices less than 232K, which puts 232K at roughly the 83.8th percentile of the data values.

As per data, P(Price < 232K) = 0.838, which is in sync with the Normal Distribution Model.

C. Based on the theoretical model, what do you expect should be the price of a house that is exactly on the 3rd quartile (75th percentile)? How does that compare to the actual?

As per the distribution table above for the data, the 75th percentile House Price is 205,397.

As per the theoretical model, however:
P(Z < z) = 0.75; z = 0.675
x = μ + zσ
x = 164 + 68 × 0.675
x = 209.9

The theoretical and actual values are within a comparable range (209.9K vs 205.4K).
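Parts B and C are the two directions of the same calculation: a cumulative probability for a given price, and a price for a given cumulative probability. A minimal sketch under the stated model N(164, 68^2), prices in thousands:

```python
from statistics import NormalDist

# Stated model: Price ~ N(164K, (68K)^2), in thousands.
price = NormalDist(mu=164, sigma=68)

p_below_232 = price.cdf(232)       # part B: 232 is exactly one sigma above the mean
q3_model = price.inv_cdf(0.75)     # part C: 75th percentile under the model

# Values quoted from the data in the answer.
data_below_232 = 878 / 1047
q3_data = 205.397

print(f"P(Price < 232K): model {p_below_232:.4f}, data {data_below_232:.4f}")
print(f"75th percentile: model {q3_model:.1f}K, data {q3_data:.1f}K")
```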

4. Create a histogram and boxplot for the Living Area variable. Is the distribution symmetric? Check the skewness measure to see if it
is consistent with your observation.

Based on the histogram and boxplot, the distribution does not seem symmetric;
it looks positively skewed (skewed to the right), since the Mean > Median.

The skewness measure as per data is positive,
which is consistent with the observation based on the histogram.
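For reference, a skewness measure can be computed as the third central moment divided by the 1.5 power of the second. The sketch below uses this plain moment definition; statistical packages often apply a small-sample correction, so their value may differ slightly. The two lists are hypothetical toy values, not the Living Area data.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy samples for illustration only.
toy_symmetric = [1, 2, 3, 4, 5]
toy_right_skewed = [1, 2, 2, 3, 10]

print(f"symmetric sample:    {skewness(toy_symmetric):+.3f}")
print(f"right-skewed sample: {skewness(toy_right_skewed):+.3f}")
```

A value near zero indicates symmetry, and a positive value indicates a longer right tail, the case where Mean > Median.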

5. Create a new column in the dataset by taking the logarithm of the Living Area variable. Is the normal distribution a better fit for
this variable or the original (Living Area) variable? Why do you think this is the case?

As is evident from the normal distribution fit as well as the normal quantile plot, the logarithm of the Living Area variable can be fairly well
approximated by a normal model. The original Living Area variable, however, has a significant number of data points outside the acceptable range,
so the normal model is not a very good approximation for it.
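One way to put a number on "better fit" is the correlation between the sorted values and the matching standard normal quantiles, which is what a normal quantile plot shows visually: the closer to 1, the straighter the plot. The sketch below is illustrative only. The areas are hypothetical toy values with a long right tail, not the dataset, and the plotting positions (i + 0.5)/n are one common convention among several.

```python
from math import log, sqrt
from statistics import NormalDist

def qq_correlation(values):
    """Pearson correlation between sorted values and standard normal quantiles."""
    n = len(values)
    xs = sorted(values)
    zs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    szz = sum((z - mean_z) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)

# Hypothetical right-skewed toy areas (for illustration, not the real data).
toy_areas = [800, 1000, 1200, 1500, 2000, 3000, 6000]

r_raw = qq_correlation(toy_areas)
r_log = qq_correlation([log(a) for a in toy_areas])

print(f"raw values: r = {r_raw:.3f}")
print(f"log values: r = {r_log:.3f}")
```

For right-skewed values like these, the logarithm compresses the long right tail, which is a plausible reason the log-transformed variable sits closer to the normal line.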
