Data Description

SUBTOPICS
1. Data Representation
2. Measures of Central Tendency
3. Measures of Dispersion
1.0 INTRODUCTION
Statistics is a science with two branches:
- descriptive statistics (Malay: statistik perihalan)
- inferential statistics
It is used in science, business, industry, economics, medicine, agriculture and so on.
1.1 DATA REPRESENTATION
Data
Data is a collection of observations, measurements or information.
There are two types:
- Quantitative information, divided into
  - discrete data
  - continuous data
- Qualitative information
Discrete data
Discrete data can be described by a discrete variable, which is a variable that can only assume particular numerical values over a certain interval. Discrete data are usually obtained by counting.

Continuous data
Continuous data can be described by a continuous variable, which can assume any numerical value over a certain interval. Continuous data are obtained by measuring, and their accuracy depends on the measuring instrument.
Frequency distribution
A frequency distribution is a table that lists the data values and their frequencies. The frequency is the number of times a value occurs.
- Ungrouped frequency distribution
- Grouped frequency distribution
- Terminology used in frequency distributions
- Steps for constructing a frequency distribution

Ungrouped frequency distribution

Example 1:

The following data is a record of the number of residents living in each unit of a block of flats.

8 4 5 6 7 3 5 7
4 5 6 2 4 6 4 5
5 3 7 5 3 5 6 3
6 5 3 4 5 8 5 6

Table 1.1

Number of residents   Tally          Number of units
in each unit                         (frequency)
2                     /              1
3                     /////          5
4                     /////          5
5                     ///// /////    10
6                     ///// /        6
7                     ///            3
8                     //             2
Table 1.2
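The tally step can be sketched in Python with `collections.Counter`, which counts how many times each value occurs. This is a minimal illustration on the Example 1 data; the variable names are our own, and the strokes are printed one per unit rather than grouped in fives.

```python
from collections import Counter

# Number of residents in each unit of the block of flats (Example 1)
residents = [
    8, 4, 5, 6, 7, 3, 5, 7,
    4, 5, 6, 2, 4, 6, 4, 5,
    5, 3, 7, 5, 3, 5, 6, 3,
    6, 5, 3, 4, 5, 8, 5, 6,
]

# Counter does the job of the tally column: value -> number of occurrences
frequency = Counter(residents)

for value in sorted(frequency):
    # one "/" per occurrence (not grouped in fives), then the frequency
    print(value, "/" * frequency[value], frequency[value])
```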
Grouped data

Example 2:

Based on the records of a construction company, the ages of the buyers are as shown in Table 1.3.

42 34 50 26 40 36 47 30 38 31
30 38 45 34 58 42 33 36 53 42
46 41 32 38 52 33 44 41 34 37
45 27 31 39 40 32 37 48 34 51
54 32 46 36 34 44 29 41 31 38
33 39 56 31 45 37 49 36 43 32

Table 1.3

Age (years)          Number of buyers
(class interval)     (frequency)
25 - 29              3
30 - 34              18
35 - 39              13
40 - 44              11
45 - 49              8
50 - 54              5
55 - 59              2
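A minimal Python sketch of grouping whole-number data into equal-width classes, assuming (as in Table 1.3) integer data, so each class runs from its lower limit to lower limit + width - 1. The helper `grouped_counts` is hypothetical, written only for this illustration.

```python
# Ages of the 60 buyers (Example 2)
ages = [
    42, 34, 50, 26, 40, 36, 47, 30, 38, 31,
    30, 38, 45, 34, 58, 42, 33, 36, 53, 42,
    46, 41, 32, 38, 52, 33, 44, 41, 34, 37,
    45, 27, 31, 39, 40, 32, 37, 48, 34, 51,
    54, 32, 46, 36, 34, 44, 29, 41, 31, 38,
    33, 39, 56, 31, 45, 37, 49, 36, 43, 32,
]

def grouped_counts(data, first_lower, width, num_classes):
    """Count integer data into classes such as 25 - 29, 30 - 34, ..."""
    table = []
    for i in range(num_classes):
        lower = first_lower + i * width
        upper = lower + width - 1  # integer data: upper limit = lower + width - 1
        count = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, count))
    return table

for lower, upper, count in grouped_counts(ages, 25, 5, 7):
    print(f"{lower} - {upper}: {count}")
```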
Frequency distribution terminologies

Class limits: the smallest and biggest values of a class (the lower and upper class limits).
Class boundaries of a class (or the actual class limits): the value midway between the upper class limit of a class and the lower class limit of the next class.
The size or width of a class: the distance between the class boundaries.
Class size = upper class boundary − lower class boundary
Class mark or midpoint of a class: the average of the upper and lower class limits, or the average of the upper and lower class boundaries.
EXAMPLE 3:

Given the classes 25 - 29, 30 - 34 and 35 - 39, for the class 30 - 34:
Lower class limit = 30
Upper class limit = 34
Lower class boundary = $\frac{29 + 30}{2} = 29.5$
Upper class boundary = $\frac{34 + 35}{2} = 34.5$
Class size = upper class boundary − lower class boundary = 34.5 − 29.5 = 5
Midpoint of the class = $\frac{\text{lower limit} + \text{upper limit}}{2} = \frac{30 + 34}{2} = 32$
or = $\frac{\text{upper boundary} + \text{lower boundary}}{2} = \frac{34.5 + 29.5}{2} = 32$
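The terminology above can be checked with a small Python function. This is a sketch under one assumption: `gap`, the distance between the upper limit of one class and the lower limit of the next, is supplied by the user (1 for whole-number data such as Example 3). The name `class_summary` is hypothetical.

```python
def class_summary(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary, class size, midpoint) of a class.

    gap: distance between the upper limit of one class and the lower limit
    of the next (1 for whole numbers, 0.1 for data recorded to 1 d.p.).
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    size = upper_boundary - lower_boundary
    midpoint = (lower_limit + upper_limit) / 2
    return lower_boundary, upper_boundary, size, midpoint

print(class_summary(30, 34))  # (29.5, 34.5, 5.0, 32.0)
```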
Steps for constructing a frequency distribution
Step 1: Choose a suitable number of classes. As a guideline, use Sturges' formula
$$k = 1 + 3.3\log_{10} n$$
where k is the number of classes and n is the number of data values.
Step 2: Determine the range of the data.
Range = highest value − lowest value
Step 3: Determine the class size.
$$\text{Class size} = \frac{\text{Range}}{\text{Number of classes}}$$
Step 4: State the class limits. The classes are chosen so that they do not overlap and each observation belongs to exactly one class.
Step 5: Determine the frequency of each class by using the tally method.
Example 4:

Table 1.4 shows the masses (to the nearest 0.1 kg) of 40 newborn babies.

3.0 3.5 4.3 2.9 3.6 3.4 2.4 3.5 3.4 2.7
2.7 2.8 3.4 3.7 1.6 3.1 2.8 3.2 3.5 3.3
3.5 2.4 3.5 3.0 3.4 2.9 3.7 2.3 3.4 3.2
3.1 2.9 3.3 3.6 3.5 2.6 3.2 2.8 3.1 3.4

Table 1.4
Solution

Number of data, n = 40
Using Sturges' formula, the number of classes
= 1 + 3.3 log 40
= 6.29
= 6 (nearest integer)
Highest value = 4.3
Lowest value = 1.6
Range = 4.3 − 1.6
= 2.7
Class size = 2.7 / 6
= 0.45
= 0.5 (1 d.p.)
Frequency distribution
Mass (kg)    Tally                  Frequency
1.5 - 1.9    /                      1
2.0 - 2.4    ///                    3
2.5 - 2.9    ///// ////             9
3.0 - 3.4    ///// ///// ///// /    16
3.5 - 3.9    ///// /////            10
4.0 - 4.4    /                      1
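The steps of Example 4 can be sketched in Python. `Decimal` is used so that differences such as 4.3 - 1.6 stay exact; rounding the class size up (rather than to the nearest 0.1) is our assumption, chosen so the classes always cover the range, and it gives the same 0.5 here. The first lower limit 1.5 is taken from the solution.

```python
import math
from decimal import Decimal, ROUND_CEILING

raw = """3.0 3.5 4.3 2.9 3.6 3.4 2.4 3.5 3.4 2.7
2.7 2.8 3.4 3.7 1.6 3.1 2.8 3.2 3.5 3.3
3.5 2.4 3.5 3.0 3.4 2.9 3.7 2.3 3.4 3.2
3.1 2.9 3.3 3.6 3.5 2.6 3.2 2.8 3.1 3.4"""
masses = [Decimal(s) for s in raw.split()]   # exact decimals, no float error

n = len(masses)
k = round(1 + 3.3 * math.log10(n))           # Sturges: 6.29 -> 6 classes
data_range = max(masses) - min(masses)       # 4.3 - 1.6 = 2.7
# class size 2.7 / 6 = 0.45, rounded UP to 1 d.p. (assumption) -> 0.5
width = (data_range / k).quantize(Decimal("0.1"), rounding=ROUND_CEILING)

step = Decimal("0.1")     # data recorded to the nearest 0.1 kg
lower = Decimal("1.5")    # first lower class limit, as in the solution
table = []
for _ in range(k):
    upper = lower + width - step
    table.append((lower, upper, sum(lower <= m <= upper for m in masses)))
    lower += width

for lo, hi, f in table:
    print(f"{lo} - {hi}: {f}")
```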
Stem plots (stem and leaf diagrams)
A stem plot is another technique for illustrating quantitative data.
Steps for constructing a stem plot:
1. Separate each value into two parts, the stem and the leaf.
2. Draw a vertical line and list the stems on its left in order of magnitude, starting from the smallest.
3. List the leaves on the right of the line.
4. Arrange the leaves in ascending order and give a key.
Stem plots can be used to compare two data samples by drawing back-to-back stem plots.
A stem plot gives a good picture of the shape of the data distribution. We can also read off the highest and lowest values and determine the mode.
Example 5:
The following data shows the number of
patients treated daily in a clinic over a
period of 3 weeks.

60 31 42 55 64 68 78
59 53 63 81 44 58 62
75 65 72 69 86 75 47

Solution :

Stem | Leaf
3 | 1
4 | 2 4 7
5 | 3 5 8 9
6 | 0 2 3 4 5 8 9
7 | 2 5 5 8
8 | 1 6
Key: 4 | 2 means 42
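A minimal Python sketch of the stem plot steps, assuming two-digit whole numbers so the stem is the tens digit and the leaf is the units digit. The function name `stem_plot` is our own.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the rows of a stem plot for two-digit whole numbers."""
    stems = defaultdict(list)
    for v in sorted(values):            # sorting first puts the leaves in ascending order
        stems[v // 10].append(v % 10)   # stem = tens digit, leaf = units digit
    rows = []
    for stem in range(min(stems), max(stems) + 1):   # keep empty stems so gaps show
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

patients = [60, 31, 42, 55, 64, 68, 78,
            59, 53, 63, 81, 44, 58, 62,
            75, 65, 72, 69, 86, 75, 47]
for row in stem_plot(patients):
    print(row)
print("Key: 4 | 2 means 42")
```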

QUESTION 1:

Based on the following statements, determine
whether the data obtained is discrete or
continuous data.
1. The age of a person.
2. The result obtained when a fair die is thrown.
3. The time taken to run 100 metres.
4. Set negative.
5. The number of robberies reported per day.
6. The diameter of a tennis ball.
QUESTION 2 :

The scores on a statistics quiz range from 1 to 10 and are as follows.

8 5 3 7 6 6 9 5 4 8
7 1 2 5 6 5 4 9 7 6
5 10 8 10 7 6 9 5 6 4

Using an interval width of 2, with a lower limit of 1 for the first interval, construct a frequency table of the scores with 5 intervals.
QUESTION 3:

The following data shows the time (in days) between the date an electricity bill was issued and the date the payment was made.

22 15 29 41 14 35 21 8 25 36
1 11 38 11 32 17 5 23 28 16
15 21 24 34 18 13 16 25 2 9
18 31 30 16 27 26 12 10 18 20
17 4 20 20 9 33 28 6 24 15

Construct a frequency table with a class width of 7 days, where the lower class limit of the first class is 1 day.
QUESTION 4:

The masses (in g) of a random sample of 24 biscuits were recorded as follows.

0.92 0.98 0.97 1.03 1.05 1.11 1.22 1.32
1.51 1.46 1.09 1.13 1.25 0.99 1.35 1.43
0.95 1.10 1.23 1.03 1.45 1.03 1.06 1.53

Construct a stem and leaf diagram for the given data and write down the modal mass.
QUESTION 5:

Use a back-to-back stem plot to compare the examination marks in Statistics and Economics for a class of 20 students.

Statistics : 68 59 76 80 62 62 48 59 47 39
83 73 66 43 51 88 70 84 79 76

Economics : 56 64 70 51 43 61 40 62 81 72
37 80 43 70 67 42 57 52 53 66
HISTOGRAM
(a) Histogram with equal class widths
The steps for constructing a histogram are as follows.
Step 1 : Determine the class boundaries for each
class.
Step 2 : Choose a suitable scale for the horizontal
axis and mark the class boundary.
Step 3 : Choose a suitable scale for the vertical
axis to represent frequency.

Example 6:
The table below shows the time taken between the ordering and serving of food for 50 randomly selected customers in a restaurant.

Illustrate the data by means of a histogram.

Time (min)            10 - 12  13 - 15  16 - 18  19 - 21  22 - 24  25 - 27  28 - 30  31 - 33
Number of customers   2        10       18       9        5        3        2        1
Solution

Time (min)   Number of customers   Lower boundary
10 - 12      2                     9.5
13 - 15      10                    12.5
16 - 18      18                    15.5
19 - 21      9                     18.5
22 - 24      5                     21.5
25 - 27      3                     24.5
28 - 30      2                     27.5
31 - 33      1                     30.5
Graph
(b) Histograms with different class widths

- The height of each bar is proportional to
  $\dfrac{\text{frequency}}{\text{class width}}$

- This quantity, the frequency density, is used as the height of each rectangle.
Example 7:

The following table shows the electricity consumption of 100 households in a residential area over one month.

Electricity consumption (kilowatt-hours)   Number of households
100 - 199                                  20
200 - 249                                  26
250 - 299                                  23
300 - 349                                  17
350 - 399                                  8
400 - 549                                  6
Solution
Choose a class width of 50 as the standard width.

Lower      Upper      Class    n = class width ÷   Frequency   Frequency density
boundary   boundary   width    standard width                  = frequency ÷ n
99.5       199.5      100      2                   20          10
199.5      249.5      50       1                   26          26
249.5      299.5      50       1                   23          23
299.5      349.5      50       1                   17          17
349.5      399.5      50       1                   8           8
399.5      549.5      150      3                   6           2
Graph
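A minimal Python sketch of the frequency density calculation in Example 7, taking the class boundaries and a chosen standard width as inputs. The function name `frequency_densities` is our own.

```python
def frequency_densities(boundaries, frequencies, standard_width):
    """Bar heights for a histogram with unequal class widths.

    boundaries: list of (lower, upper) class boundaries.
    Each height is frequency / n, where n = class width / standard width.
    """
    heights = []
    for (lower, upper), f in zip(boundaries, frequencies):
        n = (upper - lower) / standard_width
        heights.append(f / n)
    return heights

boundaries = [(99.5, 199.5), (199.5, 249.5), (249.5, 299.5),
              (299.5, 349.5), (349.5, 399.5), (399.5, 549.5)]
households = [20, 26, 23, 17, 8, 6]
print(frequency_densities(boundaries, households, 50))
# [10.0, 26.0, 23.0, 17.0, 8.0, 2.0]
```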
Question 1:

The table below shows the time taken between the ordering and serving of food for randomly selected customers in a restaurant.

Time (min)   Number of customers
10 - 12      2
13 - 15      10
16 - 18      18
19 - 21      9
22 - 24      5
25 - 27      3
28 - 30      2
31 - 33      1
Illustrate the data by means of a histogram.
Question 2:

The times taken by the 40 runners in a 100-metre race at a school are as follows:

Time, x (seconds)   Number of runners
11.5 < x ≤ 12.0     2
12.0 < x ≤ 12.5     5
12.5 < x ≤ 13.0     9
13.0 < x ≤ 13.5     13
13.5 < x ≤ 14.5     8
14.5 < x ≤ 16.0     3
Plot a histogram to display the above data.
Cumulative Frequency Curve
Cumulative frequency is the total number of data values that are less than a particular value (usually the upper class boundary).
A cumulative frequency curve (ogive) is obtained by plotting the class boundaries along the horizontal axis and the corresponding cumulative frequencies along the vertical axis.
There are two types:
(a) "More than" cumulative frequency curve
- the cumulative frequency at a class boundary is the sum of the frequencies of all classes above that boundary.
(b) "Less than" cumulative frequency curve
- the cumulative frequency at a class boundary is the sum of the frequencies of all classes below that boundary.

Example:

Consider the following frequency distribution, which shows the marks obtained by 50 students in a Mathematics test.
Marks     Frequency
50 - 54   2
55 - 59   3
60 - 64   8
65 - 69   12
70 - 74   15
75 - 79   6
80 - 84   3
85 - 89   1
Draw a cumulative frequency curve.
Solution:
Upper boundary Less than, cumulative frequency
< 49.5 0
< 54.5 2
< 59.5 5
< 64.5 13
< 69.5 25
< 74.5 40
< 79.5 46
< 84.5 49
< 89.5 50
Graph
Solution:
Upper boundary More than, cumulative frequency
> 49.5 50
> 54.5 48
> 59.5 45
> 64.5 37
> 69.5 25
> 74.5 10
> 79.5 4
> 84.5 1
> 89.5 0
Graph
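Both cumulative frequency tables can be produced with `itertools.accumulate`. This minimal sketch assumes the class boundaries 49.5, 54.5, ..., 89.5 from the example; the "more than" value at a boundary is simply the total minus the "less than" value there.

```python
from itertools import accumulate

# Class boundaries 49.5, 54.5, ..., 89.5 and the class frequencies (Marks example)
boundaries = [49.5 + 5 * i for i in range(9)]
frequencies = [2, 3, 8, 12, 15, 6, 3, 1]
total = sum(frequencies)

# "Less than": 0 below the first class, then running totals
less_than = [0] + list(accumulate(frequencies))
# "More than": everything not below the boundary
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b}: less than {lt}, more than {mt}")
```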
Measures of Central Tendency
Measures of central tendency are also called measures of location or averages.
- They are descriptive measures that indicate where the centre, or most typical value, of a data set lies.
Three important types:
1. mean
2. median
3. mode
1. Mean (Arithmetic mean)
a) Mean of ungrouped data
- is the sum of the values of all observations divided by the total number of observations.
i) For a set of n data values $x_1, x_2, x_3, \ldots, x_n$, the mean is
$$\bar{x} = \frac{\text{Sum of all data}}{\text{Number of data}} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example 1:

The following data shows the amount of vitamin C in 8 oranges. The figures are in mg per 10 g.

1.15 1.56 1.52 1.32 1.10 1.26 1.45 1.36

Solution
$$\bar{x} = \frac{1.15 + 1.56 + 1.52 + 1.32 + 1.10 + 1.26 + 1.45 + 1.36}{8} = \frac{10.72}{8} = 1.34$$
(ii) For data in an ungrouped frequency distribution: if the values $x_1, x_2, x_3, \ldots, x_n$ occur $f_1, f_2, f_3, \ldots, f_n$ times respectively, then the mean is
$$\bar{x} = \frac{f_1x_1 + f_2x_2 + f_3x_3 + \cdots + f_nx_n}{f_1 + f_2 + f_3 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum fx}{\sum f}$$
Example 2 :
The following table shows the number of typing errors per page in a report of 100 pages.

Number of errors in one page   0    1    2    3    4
Number of pages                64   23   8    4    1

Solution

Number of errors (x)   Number of pages (f)   fx
0                      64                    0
1                      23                    23
2                      8                     16
3                      4                     12
4                      1                     4
                       Σf = 100              Σfx = 55

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{55}{100} = 0.55$$
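The formula $\bar{x} = \sum fx / \sum f$ translates directly into Python. This is a minimal sketch; `mean_from_frequencies` is a name we chose for the illustration.

```python
def mean_from_frequencies(values, frequencies):
    """Mean of an ungrouped frequency distribution: sum(f*x) / sum(f)."""
    total_f = sum(frequencies)
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / total_f

errors = [0, 1, 2, 3, 4]
pages = [64, 23, 8, 4, 1]
print(mean_from_frequencies(errors, pages))  # 0.55
```

Passing a frequency of 1 for every value reduces this to the ordinary mean of ungrouped data.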
b) Mean of grouped data
When a set of data is given as a grouped frequency distribution, the mean is
$$\bar{x} = \frac{\sum fx}{\sum f}$$
where x = midpoint of the class
= $\frac{1}{2}$(lower class boundary + upper class boundary)
and f is the corresponding frequency.
Example 3 :

The waiting times of 40 customers at a bank are shown in the table below.

Time (minutes)        1 - 4  5 - 8  9 - 12  13 - 16  17 - 20  21 - 24  25 - 28
Number of customers   3      8      13      9        4        2        1

Find the mean waiting time of a customer for the above distribution.

Solution

Time (minutes)   Midpoint, x   Number of customers, f   fx
1 - 4            2.5           3                        7.5
5 - 8            6.5           8                        52
9 - 12           10.5          13                       136.5
13 - 16          14.5          9                        130.5
17 - 20          18.5          4                        74
21 - 24          22.5          2                        45
25 - 28          26.5          1                        26.5
                               Σf = 40                  Σfx = 472

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{472}{40} = 11.8$$
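For grouped data the only extra step is replacing each class by its midpoint. A minimal Python sketch, assuming classes are given as (lower limit, upper limit) pairs; for equal-width classes the midpoint of the limits equals the midpoint of the boundaries. The name `grouped_mean` is our own.

```python
def grouped_mean(classes, frequencies):
    """Estimate the mean of grouped data using class midpoints.

    classes: list of (lower limit, upper limit) pairs.
    Every value in a class is treated as if it were the class midpoint.
    """
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_fx / sum(frequencies)

waiting = [(1, 4), (5, 8), (9, 12), (13, 16), (17, 20), (21, 24), (25, 28)]
customers = [3, 8, 13, 9, 4, 2, 1]
print(grouped_mean(waiting, customers))  # 11.8
```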
Using the coding method to find the mean
(a) For ungrouped data:

The coding formula
$$y_i = \frac{x_i - a}{n}, \quad i = 1, 2, \ldots, k,$$
is used to transform the set of numbers $x_1, x_2, \ldots, x_k$ to the set of numbers $y_1, y_2, \ldots, y_k$, where a is the assumed mean and n is the scaling factor. Then
$$n y_i = x_i - a \quad\Rightarrow\quad x_i = a + n y_i \quad\Rightarrow\quad \bar{x} = a + n\bar{y}$$
(b) For grouped data:
The coding formula is
$$y_i = \frac{x_i - a}{n}, \quad i = 1, 2, \ldots, k,$$
where n is usually the class width, if all class intervals have equal width. The set of data $x_1, x_2, \ldots, x_k$ with respective frequencies $f_1, f_2, \ldots, f_k$ is transformed to another set of data $y_1, y_2, \ldots, y_k$ with the same frequencies $f_1, f_2, \ldots, f_k$. By definition, the mean of $x_1, x_2, \ldots, x_k$ is
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{\sum f(a + ny)}{\sum f} = \frac{a\sum f}{\sum f} + \frac{n\sum fy}{\sum f} = a + n\bar{y}$$
Example :
A researcher measured the heights (in cm) of seedlings that had been planted for two years and obtained the following results:

Find the mean height of the seedlings using the coding method.

Height (cm)   Frequency
55 - 59       3
60 - 64       6
65 - 69       10
70 - 74       13
75 - 79       21
80 - 84       14
85 - 89       8
90 - 94       4
95 - 99       1

Solution:
By choosing a = 77 and n = 5, the coding is $y = \frac{x - 77}{5}$, so

Height (cm)   Midpoint, x   f    y    fy
55 - 59       57            3    -4   -12
60 - 64       62            6    -3   -18
65 - 69       67            10   -2   -20
70 - 74       72            13   -1   -13
75 - 79       77            21   0    0
80 - 84       82            14   1    14
85 - 89       87            8    2    16
90 - 94       92            4    3    12
95 - 99       97            1    4    4
                            Σf = 80   Σfy = −17

$$\bar{x} = a + n\left(\frac{\sum fy}{\sum f}\right) = 77 + 5\left(\frac{-17}{80}\right) = 75.9375 \approx 75.9$$
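A minimal Python sketch of the coding method, using the assumed mean a = 77 and scaling factor n = 5 from the solution. The name `coded_mean` is our own; the result agrees with computing $\sum fx / \sum f$ directly.

```python
def coded_mean(midpoints, frequencies, a, n):
    """Mean via the coding y = (x - a) / n, then x_bar = a + n * y_bar."""
    ys = [(x - a) / n for x in midpoints]
    y_bar = sum(f * y for y, f in zip(ys, frequencies)) / sum(frequencies)
    return a + n * y_bar

midpoints = [57, 62, 67, 72, 77, 82, 87, 92, 97]
seedlings = [3, 6, 10, 13, 21, 14, 8, 4, 1]
print(coded_mean(midpoints, seedlings, a=77, n=5))  # 75.9375
```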
Question 1

Find the mean of the following numbers.

8, 5, 7, 9, 10, 6, 9, 8, 7, 10

Answer: 7.9
Question 2

The following frequency distribution shows the diameters (in cm) of twenty tubes manufactured by a machine.

Diameter (cm)   5.0   5.3   5.8   6.2   6.5
Frequency       5     6     3     4     2

Find the mean diameter.

Answer: 5.6
Question 3

The incomes (in RM) of 100 employees are given in the following table. Estimate the mean income.

Income, I (RM)     Frequency
800 ≤ I < 1000     10
1000 ≤ I < 1200    25
1200 ≤ I < 1400    15
1400 ≤ I < 1600    20
1600 ≤ I < 1800    30

Answer: RM1370
Question 4

The scores obtained by 50 students on a quiz in Statistics
were recorded as follows.

Answers : 75.08
Score 65 67 69 71 73 75 77 79 81 83
Frequency 2 3 3 5 7 6 8 9 4 5
Question 5

Using an appropriate coding, determine the mean mark for the grouped frequency distribution given below.

Marks       10 - 19  20 - 29  30 - 39  40 - 49  50 - 59  60 - 69  70 - 79  80 - 89  90 - 99
Frequency   2        5        6        12       15       13       10       4        3

Answer: 55.93
Median
1. Median for ungrouped data
- If a set of data has n observations $x_1, x_2, x_3, \ldots, x_n$ arranged in ascending order, then

(a) the median is the $\left(\frac{n+1}{2}\right)$th observation when n is odd;
(b) the median is the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations when n is even.
Example 1:
(a) 75, 67, 48, 66, 89, 51, 70

Solution:
Arrange the numbers in ascending order:

48, 51, 66, 67, 70, 75, 89
Number of observations, n = 7

Median = $\left(\frac{7+1}{2}\right)$th observation = 4th observation

Therefore, the median = 67
Example 2:
14, 16, 17, 17, 18, 21, 23, 27, 29, 29, 30, 32

Solution
Number of observations, n = 12

$\left(\frac{12+1}{2}\right)$th = 6.5th observation, so

$$\text{Median} = \frac{1}{2}\left(6\text{th observation} + 7\text{th observation}\right) = \frac{1}{2}(21 + 23) = 22$$
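The odd/even rule can be written as a short Python function. This is a minimal sketch (our own `median`, converting the 1-based positions in the notes to 0-based list indices); the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Median using the (n+1)/2 th observation rule from the notes."""
    xs = sorted(values)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]           # (n+1)/2 th observation (1-based)
    return (xs[n // 2 - 1] + xs[n // 2]) / 2   # mean of (n/2)th and (n/2 + 1)th

print(median([75, 67, 48, 66, 89, 51, 70]))                       # 67
print(median([14, 16, 17, 17, 18, 21, 23, 27, 29, 29, 30, 32]))  # 22.0
```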
2. Median of grouped data
- The median is the value of the middle observation after all the observations have been arranged in order.
- The median is the $\left(\frac{1}{2}\sum f\right)$th observation, and it can be estimated using the following steps.
(a) Find the median class.
(b) Determine the cumulative frequency before the median class.
(c) Use the method of proportion to calculate the median:
$$\text{Median} = L_B + \left(\frac{\frac{1}{2}\sum f - F_B}{f_m}\right)c$$
where
$L_B$ = lower boundary of the median class
$F_B$ = cumulative frequency before the median class
$f_m$ = frequency of the median class
c = width of the median class
Example 3:

Find the median for the data in the following grouped frequency distribution.
Class       1 - 2  3 - 4  5 - 6  7 - 8  9 - 10  11 - 12  13 - 14  15 - 16
Frequency   5      8      22     18     36      29       20       12
Solution:

Class boundary   Frequency, f   Cumulative frequency
0.5 - 2.5        5              5
2.5 - 4.5        8              13
4.5 - 6.5        22             35
6.5 - 8.5        18             53
8.5 - 10.5       36             89
10.5 - 12.5      29             118
12.5 - 14.5      20             138
14.5 - 16.5      12             150
Number of observations, Σf = 150
Median = $\left(\frac{150}{2}\right)$th = 75th observation, so the median class is 8.5 - 10.5.
$$\text{Median} = L_B + \left(\frac{\frac{1}{2}\sum f - F_B}{f_m}\right)c = 8.5 + \left(\frac{\frac{1}{2}(150) - 53}{36}\right)(2) = 8.5 + \left(\frac{75 - 53}{36}\right)(2) = 8.5 + 1.22 \approx 9.7$$
(Graphs: cumulative frequency against data, and frequency against data, each marking where the median lies.)
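A minimal Python sketch of the grouped-median formula: walk through the cumulative frequencies until the class containing the $\frac{1}{2}\sum f$ th observation is found, then interpolate within it. It assumes classes are given by their boundaries; `grouped_median` is a name chosen for this illustration.

```python
def grouped_median(boundaries, frequencies):
    """Median = L_B + ((sum(f)/2 - F_B) / f_m) * c for grouped data.

    boundaries: list of (lower boundary, upper boundary) for each class.
    """
    half = sum(frequencies) / 2
    cumulative = 0                        # F_B: cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= half:        # first class reaching sum(f)/2 = median class
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")

boundaries = [(0.5 + 2 * i, 2.5 + 2 * i) for i in range(8)]
frequencies = [5, 8, 22, 18, 36, 29, 20, 12]
print(round(grouped_median(boundaries, frequencies), 1))  # 9.7
```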
MODE
- The mode of a set of data is the value that
occurs most frequently.
A) Mode for ungrouped data
Example 1:
Find the mode for each of the following sets of data:
(a) 74, 9, 5, 8, 3, 8, 8
(b) 0.2, 0.6, 0.6, 0.6, 0.3, 0.8, 0.8, 0.8, 0.3
(c) 2, 2, 6, 6, 8, 8, 9, 9, 13, 13

Solution:

(a) 74, 9, 5, 8, 3, 8, 8
Mode = 8 (unimodal)

(b) 0.2, 0.6, 0.6, 0.6, 0.3, 0.8, 0.8, 0.8, 0.3
Mode = 0.6 and 0.8 (bimodal)

(c) 2, 2, 6, 6, 8, 8, 9, 9, 13, 13
No mode: a mode does not exist when every observation has the same frequency.
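A minimal Python sketch of the three cases, returning a list so that unimodal, bimodal and "no mode" results can all be represented. The function `modes` is our own; Python's `statistics.multimode` returns every value when all frequencies tie, which is why the "no mode" rule is coded by hand here.

```python
from collections import Counter

def modes(values):
    """All values with the highest frequency; [] if every value occurs equally often."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # no mode: every observation has the same frequency
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([74, 9, 5, 8, 3, 8, 8]))                          # [8]
print(modes([0.2, 0.6, 0.6, 0.6, 0.3, 0.8, 0.8, 0.8, 0.3]))   # [0.6, 0.8]
print(modes([2, 2, 6, 6, 8, 8, 9, 9, 13, 13]))                # []
```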
B) Mode for grouped data
- When the data are grouped into class intervals, the class with the highest frequency is called the modal class.

- The mode of the data can be estimated using either of the following two methods:
(a) Drawing a histogram
(Graph: histogram of frequency against data, with construction lines through the points A, B, C and D at the top of the modal bar and its neighbouring bars; the estimated mode is read off below the point where the lines cross.)
(b) Calculation method
(Graph: frequency against data, marking $d_1$, $d_2$, a and c on the modal class and the mode $m_0$.)
$$\text{Mode}, \; m_0 = a + \left(\frac{d_1}{d_1 + d_2}\right)c$$
where
a = lower boundary of the modal class
$d_1$ = difference between the frequency of the modal class and that of the class immediately preceding it
$d_2$ = difference between the frequency of the modal class and that of the class immediately after it
c = width of the modal class
Example 2:

The table below shows the distribution of the heights of 30 type A plants that have been planted for 6 weeks. The heights are measured to the nearest cm. Estimate the mode of the distribution.
Height (cm)   3 - 5  6 - 8  9 - 11  12 - 14  15 - 17  18 - 20
Frequency     1      2      11      10       5        1
Answer: 11.2 cm
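A minimal Python sketch of the calculation method, assuming classes are given by their boundaries, that a missing neighbouring class counts as frequency 0, and that ties for the highest frequency pick the first such class. `grouped_mode` is a hypothetical helper.

```python
def grouped_mode(boundaries, frequencies):
    """Mode = a + (d1 / (d1 + d2)) * c, using the modal class."""
    i = frequencies.index(max(frequencies))       # modal class (first one if tied)
    a, upper = boundaries[i]
    c = upper - a
    before = frequencies[i - 1] if i > 0 else 0   # assumption: 0 if no class before
    after = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    d1 = frequencies[i] - before
    d2 = frequencies[i] - after
    return a + d1 / (d1 + d2) * c

boundaries = [(2.5, 5.5), (5.5, 8.5), (8.5, 11.5),
              (11.5, 14.5), (14.5, 17.5), (17.5, 20.5)]
plants = [1, 2, 11, 10, 5, 1]
print(round(grouped_mode(boundaries, plants), 1))  # 11.2
```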
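Relative Frequency Distribution
A relative frequency distribution shows the proportion or percentage of the data in each class interval.
$$\text{Relative frequency} = \frac{\text{frequency of class}}{\text{total frequency}} = \frac{f}{\sum f}$$
The total of the relative frequencies is equal to 1.

$$\text{Percentage distribution} = \frac{f}{\sum f} \times 100\%$$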
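A minimal Python sketch of both formulas, applied for illustration to the buyers' age classes from the grouped frequency example (60 buyers). The variable names are our own.

```python
# Buyers' ages: class intervals and frequencies (grouped data example)
classes = ["25 - 29", "30 - 34", "35 - 39", "40 - 44", "45 - 49", "50 - 54", "55 - 59"]
frequencies = [3, 18, 13, 11, 8, 5, 2]
total = sum(frequencies)

relative = [f / total for f in frequencies]          # f / sum(f)
percentage = [100 * f / total for f in frequencies]  # f / sum(f) x 100%

for cls, r, p in zip(classes, relative, percentage):
    print(f"{cls}: {r:.3f} ({p:.1f}%)")
```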
Question 1:
The data below give the distances (x, in km) from their colony at which 100 adult sea birds were found dead.

(i) Determine the mean of the data.
(ii) Display the data in a cumulative frequency graph.
(iii) Find an approximation for the median.

Distance from colony (x km)   Number of birds
x ≤ 100                       25
100 < x ≤ 200                 35
200 < x ≤ 300                 20
300 < x ≤ 500                 15
500 < x ≤ 1000                5
Answer: (i) 212.5 km
(iii) 171 km
Question 2:
The relative frequency distribution of the marks obtained by 50 social science students in a mathematics quiz is shown in the following table.

Find the percentage of students whose marks in the quiz exceed the mean.

Marks                5     6     7     8     9     10
Relative frequency   0.08  0.12  0.18  0.24  0.22  0.16
Answer: 62%
Measures of Dispersion
A measure of dispersion is a descriptive measure that indicates the spread or variability of a data set.
Frequently used measures of dispersion:
(a) Range
i) Range for ungrouped data
The range for a set of data $x_1, x_2, \ldots, x_n$ is defined as the difference between the biggest and smallest values in the set.

Range = biggest value − smallest value
Example:

4, 9, 16, 20, 11, 15, 19, 14

Range = biggest value − smallest value
= 20 − 4
= 16

* If there are extreme values in the data set, the range is not suitable as a measure of dispersion.
ii) Range for a frequency distribution
The range for a frequency distribution is defined as the difference between the midpoint of the highest class and the midpoint of the lowest class.

Range = midpoint of the highest class − midpoint of the lowest class
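Quartiles for ungrouped data
- Quartiles are the values that divide a set of data, arranged in ascending or descending order, into four equal parts.
- If a set of data has n observations $x_1, x_2, x_3, \ldots, x_n$ arranged in ascending order, and $r_1 = \frac{1}{4}n$ and $r_3 = \frac{3}{4}n$ are integers, then

First quartile, $Q_1 = \frac{1}{2}\left(x_{r_1} + x_{r_1+1}\right)$

Third quartile, $Q_3 = \frac{1}{2}\left(x_{r_3} + x_{r_3+1}\right)$

If $\frac{1}{4}n$ (or $\frac{3}{4}n$) is not an integer, round it up to the next integer r and take the rth observation, $x_r$, as in the examples below.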
Example 1:
Find the first quartile, median and third quartile for each of the following sets of data.
(a) 114, 120, 133, 138, 145, 148, 151
(b) 15, 19, 20, 22, 25, 27, 27, 28, 31, 32

Solution
(a) 114, 120, 133, 138, 145, 148, 151
n = 7, $\frac{1}{4}n = \frac{7}{4} = 1.75$, rounded up to 2; $\frac{3}{4}n = 5.25$, rounded up to 6

First quartile, $Q_1 = x_2 = 120$
Median, $Q_2 = x_4 = 138$
Third quartile, $Q_3 = x_6 = 148$

(b) 15, 19, 20, 22, 25, 27, 27, 28, 31, 32
n = 10, $\frac{1}{4}n = \frac{10}{4} = 2.5$, rounded up to 3; $\frac{3}{4}n = 7.5$, rounded up to 8

First quartile, $Q_1 = x_3 = 20$
Median, $Q_2 = \frac{1}{2}(25 + 27) = 26$
Third quartile, $Q_3 = x_8 = 28$
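The quartile rule used in these notes can be sketched in Python. `Fraction` keeps $\frac{k}{4}n$ exact, so a value such as 2.5 is never misread as a whole number. This is our own helper, not a library routine; note that `statistics.quantiles` uses a different interpolation rule and can give different values.

```python
import math
from fractions import Fraction

def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the rule in these notes.

    If r = k*n/4 is a whole number, average the r-th and (r+1)-th
    observations; otherwise round r up and take that observation.
    """
    xs = sorted(values)
    r = Fraction(k * len(xs), 4)
    if r.denominator == 1:
        r = int(r)
        return (xs[r - 1] + xs[r]) / 2
    return xs[math.ceil(r) - 1]

a = [114, 120, 133, 138, 145, 148, 151]
b = [15, 19, 20, 22, 25, 27, 27, 28, 31, 32]
print([quartile(a, k) for k in (1, 2, 3)])  # [120, 138, 148]
print([quartile(b, k) for k in (1, 2, 3)])  # [20, 26.0, 28]
```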
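Quartiles for grouped data

$$Q_1 = L_B + \left(\frac{\frac{1}{4}\sum f - F_B}{f_m}\right)c$$

$$Q_2 = L_B + \left(\frac{\frac{1}{2}\sum f - F_B}{f_m}\right)c$$

$$Q_3 = L_B + \left(\frac{\frac{3}{4}\sum f - F_B}{f_m}\right)c$$

where $L_B$, $F_B$, $f_m$ and c are the lower boundary, the cumulative frequency before the class, the frequency and the width of the class containing the quartile concerned.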
Example 1:
The following table shows the ages of 50 mothers when they gave birth to their first-born.

Find the values of $Q_1$ and $Q_3$.
Age (x years)   Frequency
20 ≤ x < 23     1
23 ≤ x < 26     8
26 ≤ x < 29     13
29 ≤ x < 32     20
32 ≤ x < 35     7
35 ≤ x < 38     1
Interquartile range and semi-interquartile range

Interquartile range = $Q_3 - Q_1$

Semi-interquartile range = $\frac{1}{2}(Q_3 - Q_1)$

Example 1

Find the range, first quartile, median, third quartile,
interquartile range and semi-interquartile range for
the following set of data.
(a) 2.2, 0.6, 1.2, 2.3, 1.5, 0.9
(b) 127, 162, 221, 135, 346, 153, 341, 235

Example 2

The table below shows the number of fish reared in each house in a row of 25 houses on Green Road. Find the median and semi-interquartile range for the data.

Number      1   2   3   4   5   6
Frequency   1   5   8   7   3   1
Example 3:
The following table shows the height distribution for a group of students. Find the first quartile, third quartile and semi-interquartile range.

Height (cm)   150 - 155  155 - 160  160 - 165  165 - 170  170 - 175  175 - 180
Frequency     15         32         68         52         24         12

Answer: 160.28, 168.58, 4.15
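The three grouped-quartile formulas differ only in the fraction $\frac{k}{4}$ of $\sum f$, so one Python function covers them. A minimal sketch, assuming classes are given by their boundaries (here 150 - 155, 155 - 160, ... as in Example 3); `grouped_quartile` is a name chosen for this illustration.

```python
def grouped_quartile(boundaries, frequencies, k):
    """Q_k = L_B + ((k/4 * sum(f) - F_B) / f_m) * c for k = 1, 2, 3."""
    target = k * sum(frequencies) / 4
    cumulative = 0                        # F_B
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= target:      # class containing the quartile
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")

heights = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175), (175, 180)]
students = [15, 32, 68, 52, 24, 12]
q1 = grouped_quartile(heights, students, 1)
q3 = grouped_quartile(heights, students, 3)
print(round(q1, 2), round(q3, 2), round((q3 - q1) / 2, 2))  # 160.28 168.58 4.15
```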
Example 4:
The table below shows the distribution of the masses (in kg) of babies born in a hospital from January to June. Draw an ogive to show the frequency distribution. From your ogive, find the first quartile and third quartile for the mass of the babies.

Mass (kg)   0.0 - 1.0  1.0 - 2.0  2.0 - 3.0  3.0 - 4.0  4.0 - 5.0  5.0 - 6.0
Number      12         233        442        185        96         32

Answer: 2.0 kg, 3.3 kg
Standard deviation
Ungrouped data:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$

Example 1:
Find the standard deviation for the set of data {4, 5, 6, 7, 8, 9, 10}.

Answer: 2
Example 2:
Find the standard deviation for the following set of data.
3, 5, 6, 4, 6, 5, 6, 8, 5
Answer: 1.33

Example 3:
Find the mean and standard deviation for
the following set of data.
13, 23, 35, 6, 28, 35, 48, 12, 37
Answer: 13.10
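A minimal Python sketch of the second form of the formula, $s = \sqrt{\sum x^2/n - \bar{x}^2}$, dividing by n as these notes do. `std_dev` is our own helper; `statistics.pstdev` computes the same quantity in a numerically safer way.

```python
import math

def std_dev(values):
    """s = sqrt(sum(x^2)/n - mean^2), dividing by n as in these notes."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - mean ** 2)

print(std_dev([4, 5, 6, 7, 8, 9, 10]))                         # 2.0
print(round(std_dev([3, 5, 6, 4, 6, 5, 6, 8, 5]), 2))          # 1.33
print(round(std_dev([13, 23, 35, 6, 28, 35, 48, 12, 37]), 2))  # 13.1
```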

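Grouped data:

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$

Example 1:
Find the mean and standard deviation for the data below.

Observation   Frequency
3             4
5             22
8             35
9             36
12            17
15            6

Answer: 8.5, 2.69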
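The same computation for a frequency table weights each value by its frequency. A minimal Python sketch on the Example 1 data; `grouped_mean_sd` is a hypothetical helper.

```python
import math

def grouped_mean_sd(xs, fs):
    """Mean and s = sqrt(sum(f x^2)/sum(f) - mean^2) for a frequency table."""
    total_f = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / total_f
    mean_sq = sum(f * x * x for x, f in zip(xs, fs)) / total_f
    return mean, math.sqrt(mean_sq - mean ** 2)

mean, sd = grouped_mean_sd([3, 5, 8, 9, 12, 15], [4, 22, 35, 36, 17, 6])
print(mean, round(sd, 2))  # 8.5 2.69
```

For grouped classes, pass the class midpoints as `xs`.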
Example 2:
Find the mean and standard deviation for the data in the following distribution.

Class       Frequency
0 - 9.9     5
10 - 19.9   13
20 - 29.9   23
30 - 39.9   31
40 - 49.9   16

Answer: 29.50, 11.17
Coding method

$$y = \frac{x - k}{h}$$
where k is the assumed mean and h is the scaling factor.

Therefore, the standard deviation of x = h × the standard deviation of y:
$$s_x = h\,s_y$$
Example 1:
By using a suitable assumed mean, find the
standard deviation for the following
frequency distribution.

Class     Frequency
51 - 52   7
53 - 54   12
55 - 56   28
57 - 58   35
59 - 60   23
61 - 62   18
63 - 64   11
65 - 66   7
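A minimal Python sketch of the coding method for the standard deviation. The choices k = 57.5 (the midpoint of 57 - 58) and h = 2 (the class width) are our assumptions for illustration; any k and nonzero h give the same $s_x$.

```python
import math

def coded_sd(midpoints, fs, k, h):
    """Standard deviation via y = (x - k) / h, then s_x = h * s_y."""
    ys = [(x - k) / h for x in midpoints]
    total_f = sum(fs)
    y_bar = sum(f * y for y, f in zip(ys, fs)) / total_f
    s_y = math.sqrt(sum(f * y * y for y, f in zip(ys, fs)) / total_f - y_bar ** 2)
    return h * s_y

midpoints = [51.5 + 2 * i for i in range(8)]   # 51.5, 53.5, ..., 65.5
fs = [7, 12, 28, 35, 23, 18, 11, 7]
print(round(coded_sd(midpoints, fs, k=57.5, h=2), 2))  # 3.49
```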
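Variance

Ungrouped data
$$s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

Grouped data
$$s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$

* Standard deviation = $\sqrt{\text{variance}}$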
Symmetry and skewness of data distribution
(a) Symmetrical distribution (bell shaped): mean = median = mode

(b) Positively skewed distribution (skewed to the right): mode < median < mean

(c) Negatively skewed distribution (skewed to the left): mean < median < mode
Example 1:
A farmer recorded the number of eggs collected from a farm for 65 days. The frequency distribution of the number of eggs collected is summarized in the table below.

Number of eggs   10  11  12  13  14  15  16  17
Frequency        2   3   6   9   10  15  12  8

Determine the mode, mean and median for the above distribution. Comment on the shape of the graph.

(Graph: the frequency distribution, marking the mode, median and mean.)
Mean < median ≤ mode
Pearson coefficient of skewness

If mean = mode, Pearson coefficient of skewness = 0, symmetrical
distribution.
If mean > mode, Pearson coefficient of skewness = + ve, positively
skewed distribution.
If mean < mode, Pearson coefficient of skewness = - ve, negatively
skewed distribution.

Pearson coefficient of skewness
mean-mode
standard deviation
=
3(mean - median)
standard deviation
=
Example 1
Calculate the Pearson coefficient of
skewness for the following case.

x   20  30  40  50  60  70  80
y   11  13  18  28  14  5   3
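A minimal Python sketch of both Pearson formulas using the standard `statistics` module, assuming that y is the frequency of each x value. `pstdev` divides by n, matching the standard deviation used in these notes.

```python
import statistics

x_values = [20, 30, 40, 50, 60, 70, 80]
y_freqs = [11, 13, 18, 28, 14, 5, 3]   # assumption: y is the frequency of each x

# Expand the frequency table into individual observations
data = [x for x, f in zip(x_values, y_freqs) for _ in range(f)]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)   # population form (divide by n)

pearson_mode = (mean - mode) / sd
pearson_median = 3 * (mean - median) / sd
print(round(pearson_mode, 3), round(pearson_median, 3))  # -0.317 -0.952
```

Both coefficients are negative, so under this reading of the table the distribution is negatively skewed; the two formulas need not give the same number.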
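Boxplot

(Diagram: a boxplot marking the smallest value, $Q_1$, the median $Q_2$, $Q_3$ and the largest value.)

Symmetry and skewness of data distribution
(Diagrams: boxplots of a symmetrical, a positively skewed and a negatively skewed distribution.)
- Symmetrical distribution: $Q_2 - Q_1 = Q_3 - Q_2$
- Positively skewed distribution: $Q_3 - Q_2 > Q_2 - Q_1$
- Negatively skewed distribution: $Q_2 - Q_1 > Q_3 - Q_2$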
Use of boxplots to identify outliers

Upper boundary = $Q_3 + 1.5(Q_3 - Q_1)$

Lower boundary = $Q_1 - 1.5(Q_3 - Q_1)$

(Diagram: a boxplot whose whiskers end at the last value inside each boundary; values beyond either boundary are marked as outliers.)
Example 1:
Auto classic Company has 48 used cars for sale. The table below shows the ages, x (in years), of the cars.

(a) Find (i) the median age,
(ii) the first and third quartiles for this distribution.
(b) Draw a boxplot to represent the data.
(c) State the type of distribution based on your boxplot.

Age (x)     1   2   3   4   5   6   7   8   9
Frequency   7   12  8   6   5   4   3   2   1
Example 2:
The following data shows a summary of the marks in Mathematics and Biology for students in a class.

Draw two boxplots for these data and comment on the distributions of marks for Mathematics and Biology.

Subject       Minimum   Maximum   Median   First quartile   Third quartile
Mathematics   10        90        60       45               70
Biology       35        85        60       48               72
Example 3:
The following stem plot shows the maximum temperature for each day from 1st August to 23rd August in a town. Draw a boxplot and use your boxplot to identify the outlier.
Stem | Leaf
7 | 6 7
7 | 0 2 2 3
6 | 5 7 8 8 8 9 9
6 | 2 3 3 4 4 4 4 4
5 | 9
5 | 1
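A minimal Python sketch of the outlier boundaries, using the quartile rule stated in these notes for ungrouped data. The temperatures are read from the split-stem plot above (for example, the row 7 | 6 7 gives 76 and 77); the function names are our own.

```python
import math
from fractions import Fraction

def quartile(values, k):
    """Quartile rule from these notes: average two neighbours if k*n/4 is whole, else round up."""
    xs = sorted(values)
    r = Fraction(k * len(xs), 4)
    if r.denominator == 1:
        return (xs[int(r) - 1] + xs[int(r)]) / 2
    return xs[math.ceil(r) - 1]

def outliers(values):
    """Return (lower boundary, upper boundary, values outside them)."""
    q1, q3 = quartile(values, 1), quartile(values, 3)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return lower, upper, [x for x in sorted(values) if x < lower or x > upper]

# Values read from the stem plot in Example 3 (1st to 23rd August)
temps = [76, 77, 70, 72, 72, 73, 65, 67, 68, 68, 68, 69, 69,
         62, 63, 63, 64, 64, 64, 64, 64, 59, 51]
print(outliers(temps))  # (55.0, 79.0, [51])
```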
