
BES Tutorial Sample Solutions, Semester 2 2010

WEEK 3 TUTORIAL EXERCISES (To be discussed in the week
starting August 2)
1. Using the car data from Week 2, Question 3:
(a) Redo Q3(c) using EXCEL to confirm that the
frequency histogram is given by Figure 3.1.

Figure 3.1: Revised histogram for age of cars
(vertical axis: Frequency; horizontal axis: Age)

(b) Calculate the mean, median and mode for this sample
of data and use them to further describe the
distribution of ages.
Mean: $\bar{x} = \frac{5 + 5 + 6 + \dots + 24 + 11}{20} = 7.3$
Ordering the data from lowest to highest:
2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24
Median = (6 + 6)/2 = 6
Mode = 6
The sample mean is to the right of the mode and median,
suggesting that the sample distribution is skewed
towards the right. The cause seems to be the large outlier:
one car had an age of 24, which appears to be very
different from the ages of the other cars. Given the skewness
and the outlier, the median is possibly a better measure of
central tendency. Hence a typical second-hand car is 6
years old.
Alternatively, the EXCEL output is:

Age
Mean                  7.3
Standard Error        1.126476
Median                6
Mode                  6
Standard Deviation    5.037752
Sample Variance       25.37895
Kurtosis              5.712234
Skewness              2.0983
Range                 22
Minimum               2
Maximum               24
Sum                   146
Count                 20
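The hand calculations and most of the EXCEL figures above can be cross-checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative and the data are the 20 ordered ages listed earlier. Kurtosis and skewness are left out because the standard library has no direct equivalent of the EXCEL formulas.

```python
import math
import statistics

# Ages (years) of the 20 cars, as listed in the ordered data above.
ages = [2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24]

mean = statistics.mean(ages)            # sum of ages divided by 20
median = statistics.median(ages)        # average of the 10th and 11th ordered values
mode = statistics.mode(ages)            # most frequent value
sample_var = statistics.variance(ages)  # sample variance (divides by n - 1)
sample_sd = statistics.stdev(ages)      # square root of the sample variance
std_error = sample_sd / math.sqrt(len(ages))  # standard error of the mean
data_range = max(ages) - min(ages)

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", round(sample_var, 5), "sd:", round(sample_sd, 6))
print("standard error:", round(std_error, 6), "range:", data_range)
```

A mean above the median, as here, is the pattern the answer uses to diagnose right skew.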

(c) If the largest observation were removed from this data
set, how would the three measures of central tendency
you have calculated change?
Mean: $\bar{x} = \frac{5 + 5 + 6 + \dots + 6 + 11}{19} = 6.4$ (now closer to the median)
Median = 6 (the 10th of the 19 ordered observations; unchanged)
Mode = 6 (unchanged)
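The effect of dropping the outlier can be checked in the same way. The sketch below assumes the same 20 ages and simply removes the largest value before recomputing; the names are illustrative.

```python
import statistics

# Ages (years) of the 20 cars from Question 1.
ages = [2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24]

# Drop the single largest observation (the 24-year-old car).
trimmed = sorted(ages)[:-1]

new_mean = statistics.mean(trimmed)      # 122 / 19
new_median = statistics.median(trimmed)  # 10th of the 19 ordered values
new_mode = statistics.mode(trimmed)

print(round(new_mean, 1), new_median, new_mode)
```

Only the mean moves, which illustrates why the median and mode are described as robust to a single extreme value.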

2. For the following statistical population, compute the
mean, range, variance and standard deviation: 3, 3, 5,
12, 13, 14, 17, 20, 21, 21.
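Mean: $\mu = \frac{3 + 3 + 5 + 12 + 13 + 14 + 17 + 20 + 21 + 21}{10} = 12.9$
Range $= 21 - 3 = 18$
Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{(3 - 12.9)^2 + \dots + (21 - 12.9)^2}{10} = 45.89$
Standard deviation: $\sigma = \sqrt{45.89} = 6.7742$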
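As a check on the arithmetic, the population measures can be recomputed with Python's standard library. This is a sketch only; note that `pvariance` and `pstdev` divide by N (population formulas), matching the calculation above, whereas `variance` and `stdev` would divide by N - 1.

```python
import statistics

# The statistical population from Question 2.
population = [3, 3, 5, 12, 13, 14, 17, 20, 21, 21]

mu = statistics.mean(population)
data_range = max(population) - min(population)
pop_var = statistics.pvariance(population)  # divides by N, not N - 1
pop_sd = statistics.pstdev(population)      # square root of the population variance

print(mu, data_range, round(pop_var, 2), round(pop_sd, 4))
```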

3. For the population in Q2 above, what would happen to
each of the measures you have calculated if:
(a) 4 were added to each data point (observation)?
The mean would increase by 4, but the range, variance
and standard deviation would be unchanged.
(b) Each data point was multiplied by 2?
The mean, range and standard deviation would be
multiplied by 2, whilst the variance would be multiplied
by 4.
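These shift and scale results can be verified numerically. The sketch below assumes the Q2 population and uses a small helper, `summarize`, which is a hypothetical name introduced here for illustration.

```python
import statistics

def summarize(values):
    """Return the four population measures used in Q2 for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # population variance
        "sd": statistics.pstdev(values),           # population standard deviation
    }

population = [3, 3, 5, 12, 13, 14, 17, 20, 21, 21]

base = summarize(population)
shifted = summarize([x + 4 for x in population])  # part (a): add 4 to each point
scaled = summarize([2 * x for x in population])   # part (b): multiply each point by 2

for label, result in [("base", base), ("shifted", shifted), ("scaled", scaled)]:
    print(label, result)
```

Adding a constant moves every point and the mean by the same amount, so deviations from the mean are unchanged; multiplying by 2 doubles every deviation, so squared deviations (and hence the variance) are multiplied by 4.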
4. Calculate the 90th percentile for the following set of
data:
-2.4, -1.34, 3.4, 3.5, 4.01, 6.5, 6.7, 7.25, 7.9, 8.46, 9.7,
9.8, 10.45
For a value of $p = 90$, we have
$L_p = (n + 1)\frac{p}{100} = (14)\frac{90}{100} = 12.6$,
implying that the 90th percentile is 60% of the distance
between the 12th and 13th observations. Then:
90th percentile $= 9.8 + (0.6)(10.45 - 9.8) = 10.19$
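The location rule used above can be written as a small function. This is a sketch of the $(n + 1)p/100$ convention with linear interpolation between neighbouring ordered observations; the function name is hypothetical, and the clamping at the ends is an assumption added here so the function stays defined for extreme values of p. Other software may use different percentile conventions and give slightly different answers.

```python
def percentile_n_plus_1(values, p):
    """Percentile via the location L_p = (n + 1) * p / 100 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p / 100  # 1-based location in the ordered data

    # Assumption: clamp to the smallest/largest observation outside the data range.
    if position <= 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]

    lower = int(position)          # index (1-based) of the observation just below
    fraction = position - lower    # share of the distance to the next observation
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + fraction * (above - below)

data = [-2.4, -1.34, 3.4, 3.5, 4.01, 6.5, 6.7, 7.25, 7.9, 8.46, 9.7, 9.8, 10.45]
p90 = percentile_n_plus_1(data, 90)
print(round(p90, 2))
```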

5. SIA: Migrant wealth.


Suppose the Minister for Immigration is interested in
research on the assimilation of migrant households (a
household where the chief income-earner is foreign
born). The Household, Income and Labour Dynamics
in Australia (HILDA) survey is a representative
survey of Australian households. Using 4,669
household observations for 2002 from HILDA, we
find there are 3,567 households classified as
Australian-born and 1,102 classified as migrants. One
key consideration is how migrant households are
doing in terms of wealth compared with Australian-born
households. Using these data, we find the
following:
Summary statistics for net household wealth ($A)

                   Mean      10th percentile   Median    90th percentile
Australian-born    236,064   1,545             123,020   560,006
Migrant            248,970   1,720             131,152   524,372

(a) What can you say about the distribution of net
household wealth for both Australian-born and
migrant households by looking at just the mean and
the median figures?
The wealth distribution is skewed quite heavily towards
the right for both Australian-born and migrant
households. The mean is much larger than the median,
suggesting that more than 50% of each sample have less
than average wealth, while less than 50% of each sample
have more than average wealth. In other words, there is
a fair amount of wealth inequality in both samples.
(b) More generally what can you say about the
distribution of wealth for migrant households
compared to that for Australian-born households? In
particular, which type of household has greater
variation in wealth?
Based on just the mean and the median measures, a
typical migrant family appears to be slightly wealthier
than a typical Australian-born family. Both figures are
larger for the migrant sample than the Australian-born
sample. This is also the case for the 10th percentile
figure. By contrast, the 90th percentile is greater for the
Australian-born sample than the migrant sample. These
figures suggest that, while typical migrant families are
better off than typical Australian families in terms of
wealth, migrant families are less likely to be very poor or
very rich compared with Australian-born families. In
other words, Australian-born families have greater
variation in household wealth than migrant families.

(c) Suppose the minister has net household wealth of
$600,000. What can you say about their financial
circumstances relative to other Australian-born
households?
The minister's household has greater wealth than at least
90% of Australian-born households in Australia. They
are amongst the wealthiest 10% of Australian
households.
6. SIA: Sydney housing prices.


Figure 3.2 depicts a scatter plot of Sydney housing
prices versus distance from Sydney. The unit of
observation is a suburb, price is the mean of the
median price of houses sold in each suburb for two
quarters (September and December 2002) and
distance is measured in kilometers from Sydney's
CBD.
(a) What would you expect the correlation to be between
price and distance?
There is an inverse relationship between Distance to
CBD and Price so expect correlation to be negative.
(b) Does it appear that there is a linear relationship
between the two variables?
Relationship does not look linear largely because of the
large variability in prices for suburbs close to the CBD.
(These observations also tend to distort what the
relationship looks like for the bulk of the data. If you
were to eliminate these outliers, it is not clear what the
relationship would look like for the remainder of the
data.)
(c) What other key features of these data can be
determined from the plot?

Figure 3.2: House prices in Sydney suburbs versus distance to CBD
(vertical axis: Price $; horizontal axis: Distance to CBD (kms))

- We have already mentioned the large variability in
  prices for suburbs close to the CBD. This could be stated
  more formally: the variance of prices close to the
  CBD (conditional variance) is much larger than
  the variance of prices further away from the CBD.
- There are other outliers around 30 km from the CBD
  (Clareville, Palm Beach and Whale Beach).
- There is no suspicion that these outliers are due to
  errors. All are feasible observations.
- The price and distance variables are both skewed
  to the right.
- There are numerous suburbs where there were no
  sales. Most of these are suburbs relatively close to
  the CBD.
- What should we do with the zero-sales
  observations when we analyse the data? They are
  not data errors of the kind that sometimes occur. But
  they are not real zeros, as we don't know what the price
  would have been had there been sales in the
  period in question.

7. Anzac Grange wants to develop guidelines for setting
prices of cars according to the car's age. They hire a
business consultant who chooses a sample of 117
second-hand passenger car advertisements collected
from www.drive.com.au and retrieves data on age and
price of the cars.
(a) The business consultant first calculates the correlation
coefficient between age and price and finds it to be
-0.278. Interpret this result.

Correlation coefficients lie between -1 and 1. A negative
value suggests an inverse relationship between the
variables. A magnitude of 0.278 suggests that the
linear relationship is not very strong.
(b) Then the business consultant constructs a simple
linear regression model using price (in dollars) as the
dependent variable, and age (in years) as the
independent variable. This model can be represented
by:

$\text{price}_i = \beta_0 + \beta_1 \text{age}_i + u_i$

Interpret the ordinary least squares coefficient
estimates, found to be b0 = 47,467 and b1 = -2,658.
The estimated slope coefficient of -2,658 suggests that
for every additional year of age, second-hand car prices
are expected to drop by $2,658. The sign is as expected:
older cars tend to have a lower value. The sign is also
consistent with the negative correlation coefficient.
Literally, the intercept is the predicted price of
second-hand cars with age = 0, i.e. $47,467. As is
sometimes the case, interpretation of intercepts may be
somewhat problematic. In this particular situation all
second-hand cars have age > 0.
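To make the interpretation concrete, the fitted line can be evaluated at a few ages. The sketch below assumes the estimates stated in the question (b0 = 47,467 and b1 = -2,658); the function name and the example ages are illustrative and not part of the original exercise. It also squares the correlation coefficient, which in a simple linear regression with an intercept equals the share of price variation explained by age.

```python
# OLS estimates as stated in the question (price in dollars, age in years).
B0 = 47_467   # intercept: predicted price at age = 0
B1 = -2_658   # slope: expected change in price per extra year of age

def predicted_price(age_years):
    """Predicted price from the fitted line: b0 + b1 * age."""
    return B0 + B1 * age_years

# Illustrative ages only; the original sample values are not given.
for age in (0, 5, 10):
    print(age, predicted_price(age))

# Each extra year of age lowers the predicted price by 2,658 dollars.
one_year_change = predicted_price(6) - predicted_price(5)

# Correlation from part (a); its square is R-squared in simple regression.
r = -0.278
r_squared = r ** 2
print(one_year_change, round(r_squared, 4))
```

The small value of the squared correlation is in line with the answer to part (a): age alone accounts for only a modest share of the variation in advertised prices.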
