A S 12.4.3
Did you hear about the statistician who put her head in the oven and her feet in the refrigerator? She said, "On average, I feel just fine."
Which average?
All three averages are useful for summarizing e.g. the distribution of household incomes.
In 1998, the income common to the greatest number of households (mode) was R25 000. Half the households (median) earned less than R38 885. The mean income was R50 600. Reporting only one measure of central tendency might be misleading and perhaps reflect a bias.
The table shows the heights of 50 randomly chosen Grade 12 school girls.

height (cm)   midpoint (x)   frequency (f)   (f)(x)
150- <155     152,5          4               610,0
155- <160     157,5          7               1102,5
160- <165     162,5          18              2925,0
165- <170     167,5          11              1842,5
170- <175     172,5          6               1035,0
175- <180     177,5          4               710,0
Total                        50              8225,0

Mean = 8225,0 / 50 = 164,5 cm
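The grouped-mean calculation can be checked with a short script. This is a minimal sketch: the function name `grouped_mean` is illustrative, and the last two frequencies (6 and 4) are taken from the (f)(x) products 1035,0 and 710,0 divided by their midpoints.

```python
# Class midpoints (cm) and frequencies from the height table.
midpoints = [152.5, 157.5, 162.5, 167.5, 172.5, 177.5]
frequencies = [4, 7, 18, 11, 6, 4]


def grouped_mean(midpoints, frequencies):
    """Estimate the mean of grouped data as sum(f * x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_fx / sum(frequencies)


mean_height = grouped_mean(midpoints, frequencies)
print(mean_height)  # 164.5
```

Because every girl in a class is treated as if her height equals the class midpoint, the result is an estimate of the mean rather than its exact value.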
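height (cm)   midpoint   frequency (f)   cumulative frequency
150- <155     152,5      4               4
155- <160     157,5      7               11
160- <165     162,5      18              29
165- <170     167,5      11              40
170- <175     172,5      6               46
175- <180     177,5      4               50

The median is the mean of the heights of the 25th and 26th girls. Both the 25th and the 26th girl fall in the class 160- <165, whose midpoint is 162,5, so the median is approximately 162,5 cm.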
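Locating the class that holds the median can be sketched as a running total over the frequencies. The helper names below are illustrative, and the estimate uses class midpoints, as in the worked answer, rather than interpolation inside the class.

```python
from itertools import accumulate

midpoints = [152.5, 157.5, 162.5, 167.5, 172.5, 177.5]
frequencies = [4, 7, 18, 11, 6, 4]


def class_of_position(position, frequencies):
    """Index of the first class whose cumulative frequency reaches position."""
    for index, cumulative in enumerate(accumulate(frequencies)):
        if cumulative >= position:
            return index
    raise ValueError("position exceeds the number of observations")


def grouped_median(midpoints, frequencies):
    """Estimate the median from the midpoints of the middle observation(s)."""
    n = sum(frequencies)
    # Even n: positions n/2 and n/2 + 1. Odd n: both positions coincide.
    lower_pos, upper_pos = (n + 1) // 2, n // 2 + 1
    lower = midpoints[class_of_position(lower_pos, frequencies)]
    upper = midpoints[class_of_position(upper_pos, frequencies)]
    return (lower + upper) / 2


median_height = grouped_median(midpoints, frequencies)
print(median_height)  # 162.5
```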
Measures of Variability
A measure of variability is a single summary figure that describes the spread of data within a distribution.
Range - difference between the smallest and largest observations.
Percentiles - values below which p% of the observations fall.
Interquartile Range (IQR) - range of the middle half of the scores.
The results of a survey of the travelling time (in minutes) of 200 workers are as follows. Complete the cumulative frequency table.

Time (min)     Freq (f)   Cum freq
0 < x ≤ 10     28         28
10 < x ≤ 20    37         65
20 < x ≤ 30    55         120
30 < x ≤ 40    44         164
40 < x ≤ 50    22         186
50 < x ≤ 60    14         200

[Graph: cumulative frequency (0 to 200) plotted against time in minutes (0 to 60)]

Use the graph to estimate:
the median: Q2 = 27 (read at a cumulative frequency of 100)
the interquartile range: Q3 = 37 (read at 150); Q1 = 16 (read at 50); IQR = 37 - 16 = 21
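The graph readings can be approximated numerically. The sketch below assumes straight-line segments between the plotted points (a smooth hand-drawn curve will give slightly different readings) and adds the starting point (0, 0); the function name `time_at` is illustrative.

```python
# Upper class boundaries (min) and cumulative frequencies, starting at (0, 0).
upper_bounds = [0, 10, 20, 30, 40, 50, 60]
cum_freq = [0, 28, 65, 120, 164, 186, 200]


def time_at(target, bounds, cumulative):
    """Linearly interpolate the time at which the cumulative frequency reaches target."""
    for i in range(1, len(cumulative)):
        if cumulative[i] >= target:
            fraction = (target - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
            return bounds[i - 1] + fraction * (bounds[i] - bounds[i - 1])
    raise ValueError("target exceeds the total frequency")


n = cum_freq[-1]
q1 = time_at(n / 4, upper_bounds, cum_freq)
q2 = time_at(n / 2, upper_bounds, cum_freq)
q3 = time_at(3 * n / 4, upper_bounds, cum_freq)
print(round(q1, 1), round(q2, 1), round(q3, 1), round(q3 - q1, 1))
# 15.9 26.4 36.8 20.9
```

These values are close to the readings of 16, 27 and 37 minutes taken from the graph, which gives an interquartile range of about 21 minutes.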
The five-number summary consists of the lowest data value, first quartile (Q1), median (Q2), third quartile (Q3) and highest data value.
Unathi sells the following number of computers in 12 months: 34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37
Arrange the data in ascending order: 1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 50, 57
Minimum = 1
Q1 = (15 + 19)/2 = 17
Median = Q2 = (24 + 28)/2 = 26
Q3 = (37 + 47)/2 = 42
Maximum = 57
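The same five-number summary can be reproduced in code. This sketch takes Q1 and Q3 as the medians of the lower and upper halves of the ordered data, which matches the working above; other quartile conventions exist and can give slightly different values. The function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values before the median position
    upper_half = ordered[(n + 1) // 2:]   # values after the median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


unathi = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
summary = five_number_summary(unathi)
print(summary)  # (1, 17.0, 26.0, 42.0, 57)
```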
Barry also sells computers during a 12 month period. Below is a 5-number summary for each person.

          min   Q1   Q2   Q3   max
Unathi    1     17   26   42   57
Barry     6     15   32   46   62

Which person would you most likely want to appoint for your company? Barry.
Explain. Barry's highest and lowest sales are higher than Unathi's corresponding sales, and Barry's median sales figure is also higher than Unathi's.
Percentiles
A percentile is a score below which a certain percentage of values fall. Percentiles divide a data set into 100 equal parts.
e.g. If your test score is in the 95th percentile, it means that if 1000 students took the test, at least 950 students did worse than you and at most 49 students did better than you. It does not mean that you received 95% for the test.
Oscar's height is at the 90th percentile and his weight is at the 60th percentile for his age. Describe Oscar's physical build in general terms. He is taller than 90% of the people but only weighs more than 60% of the people: possibly tall and thin.
In 2004 the snow depth at Tiffendell was measured (in mm) for 25 days and recorded. 242, 228, 217, 209, 253, 239, 266, 242, 251, 240, 223, 219, 246, 260, 258, 225, 234, 230, 249, 245, 254, 243, 235, 231, 257.

Depth (mm)   freq   cum freq   cum %
200-210      1      1          4
210-220      2      3          12
220-230      3      6          24
230-240      5      11         44
240-250      7      18         72
250-260      5      23         92
260-270      2      25         100
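The frequency, cumulative frequency and cumulative % columns can be rebuilt from the raw depths. This is a minimal sketch: the function name `cumulative_table` is illustrative, and each class is assumed to include its lower boundary and exclude its upper one (for example 230 mm counts in 230-240), which is the convention that reproduces the frequencies in the table.

```python
depths = [242, 228, 217, 209, 253, 239, 266, 242, 251, 240, 223, 219, 246,
          260, 258, 225, 234, 230, 249, 245, 254, 243, 235, 231, 257]


def cumulative_table(data, start, stop, width):
    """Build rows of (lower, upper, freq, cum freq, cum %) for equal-width classes."""
    rows = []
    running = 0
    for lower in range(start, stop, width):
        upper = lower + width
        freq = sum(1 for value in data if lower <= value < upper)
        running += freq
        rows.append((lower, upper, freq, running, 100 * running / len(data)))
    return rows


table = cumulative_table(depths, 200, 270, 10)
for lower, upper, freq, cum, pct in table:
    print(f"{lower}-{upper}: freq={freq}, cum freq={cum}, cum %={pct:.0f}")
```

The cumulative % column gives the percentile rank of each upper class boundary: 44% of the days had a depth below 240 mm, so 240 mm sits at the 44th percentile.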
Plot the graph of snow depth against the cumulative frequency and cumulative % using two different vertical axes.
[Graph: cumulative frequency (left axis, 0 to 25) and cumulative % (right axis, 0 to 100, giving the percentiles) plotted against depth (mm)]
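For how many days was the depth at least 250 mm? 18 of the 25 days had a depth below 250 mm, so 25 - 18 = 7 days.
[Graph: the 2004 cumulative curve against depth (mm), with the 2005 depths shown by a broken line]
A year later (2005) the depth is shown by the broken line. Explain which year was possibly the better one. 2004: the depth was greater on more days, e.g. 44% of the days were below 240 mm in 2004 compared to 80% below 240 mm in 2005.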
Standard Deviation
Standard deviation is useful when comparing the spread of two or more data sets that have approximately the same mean. This technique is best used with symmetric distributions with no outliers. The smaller the standard deviation, the narrower the spread of measurements around the mean, as the data set possibly has few very high or very low values. e.g. If the mean of a data set is 5 and the S.D. is 2, then a typical value lies within 2 of the mean, i.e. between 3 and 7.
Super Crisps come in 25g bags. There are two machines (A & B) producing the chips. A quality control engineer weighs a sample of 10 bags from each machine.

A   25,6   24,8   25,7   25,3   25,5   25     24,9   25,7   25,5   25,6
B   25,3   25,4   24,9   25,3   25,3   25,3   25,4   25,4   25,4   25,3

Calculate the mean of each machine.
For A: mean = 253,6 /10 = 25,36g
For B: mean = 253 /10 = 25,3g
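The means, and the spread of each sample around its mean, can be checked with a short script. This is a sketch under one assumption: the variance is taken as the sum of squared deviations divided by n (the population convention), not by n - 1. The function name `summarize` is illustrative.

```python
from math import sqrt

# Masses (g) of the 10 sampled bags from each machine.
machine_a = [25.6, 24.8, 25.7, 25.3, 25.5, 25.0, 24.9, 25.7, 25.5, 25.6]
machine_b = [25.3, 25.4, 24.9, 25.3, 25.3, 25.3, 25.4, 25.4, 25.4, 25.3]


def summarize(sample):
    """Return (mean, sum of squared deviations, variance, standard deviation)."""
    n = len(sample)
    mean = sum(sample) / n
    squared_deviations = sum((x - mean) ** 2 for x in sample)
    variance = squared_deviations / n  # assumption: divide by n, not n - 1
    return mean, squared_deviations, variance, sqrt(variance)


for name, sample in (("A", machine_a), ("B", machine_b)):
    mean, ss, variance, sd = summarize(sample)
    print(f"{name}: mean={mean:.2f} g, sum of squares={ss:.3f}, "
          f"variance={variance:.4f}, SD={sd:.2f} g")
```

The sums of squared deviations come out at about 1,044 for A and 0,2 for B; dividing by 10 and taking the square root gives standard deviations of roughly 0,32 g and 0,14 g.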
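Below are the variance and standard deviation for each machine. The variance is the sum of the squared deviations from the mean divided by the number of bags (10), and the standard deviation is its square root.

Machine   Mean      Sum of squared deviations   Variance   S.D.
A         25,36g    1,044                       0,1044     0,32g
B         25,3g     0,2                         0,02       0,14g

Super Crisps will be taken to court if it is found their bags are less than 25g. Which machine gives the best chance of avoiding this fate? Machine B.
Explain. A: Within one standard deviation of the mean, the mass of chips is from 25,04g to 25,68g. B: Within one standard deviation of the mean, the mass of chips is from 25,16g to 25,44g. B has a narrower spread than A. Its smallest value is very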