(Summary of Chapters 1 to 4)
Data Representation and summarisation techniques
Variable
a quantity which can vary in value
Variables can be classified into quantitative variables and qualitative variables.
[Diagram: Variables split into Quantitative Variables and Qualitative Variables]
Quantitative variables
- Numerical values are used to represent the value. e.g.:
weight, mass, no. of members in a family
Qualitative variables
- Non-numerical descriptors are used to represent the
value. e.g.: Religion, race, skin colour
Quantitative variables can be further classified into continuous variables and discrete variables.
Continuous variables
- These variables can take any value in a given range. e.g.: weight, mass, time
Discrete variables
- These variables can take only specific values in a given
range. e.g.: no. of members in a family, no. of students in a class
Summarisation of data
Data is summarised to identify its important features. These features help us to make
deductions about the data, which in turn help us to make predictions from the data.
Data summarisation can be done by either of the following two methods:
Data Summarisation
e.g.:
Value    Frequency    Cumulative Frequency
1        10           10
2        8            18
3        1            19
4        2            21
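The cumulative frequency column is a running total of the frequency column. A minimal Python sketch of that calculation, using the numbers from the example table (the variable names are illustrative only):

```python
from itertools import accumulate

# Values and frequencies copied from the example table.
values = [1, 2, 3, 4]
frequencies = [10, 8, 1, 2]

# Each cumulative frequency is the sum of all frequencies
# up to and including the current row.
cumulative = list(accumulate(frequencies))

for value, freq, cum in zip(values, frequencies, cumulative):
    print(value, freq, cum)
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.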
Weight (w) kg    Frequency    Cumulative frequency
0 ≤ w < 9        10           10
9 ≤ w < 18       9            19
18 ≤ w < 28      5            24
28 ≤ w < 38      20           44
Lower class boundary = (Lower class limit + Upper class limit of class above) / 2
Upper class boundary = (Upper class limit + Lower class limit of class below) / 2
Class width = Upper class boundary - Lower class boundary
Class mid-value = (Lower class boundary + Upper class boundary) / 2
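The class formulas can be checked with a short Python sketch. The function names and the gapped classes 10-19 and 20-29 are hypothetical, chosen only for illustration; the class 18 ≤ w < 28 comes from the weight table, where the limits already coincide with the boundaries.

```python
def boundary_between(upper_limit, next_lower_limit):
    """Boundary shared by two adjacent classes: halfway between their limits."""
    return (upper_limit + next_lower_limit) / 2


def class_width(lower_boundary, upper_boundary):
    """Class width = upper class boundary - lower class boundary."""
    return upper_boundary - lower_boundary


def class_mid_value(lower_boundary, upper_boundary):
    """Class mid-value = average of the two class boundaries."""
    return (lower_boundary + upper_boundary) / 2


# Hypothetical classes with a gap between the limits: 10-19 and 20-29.
shared = boundary_between(19, 20)

# Class 18 <= w < 28 from the weight table.
width = class_width(18, 28)
mid = class_mid_value(18, 28)

print(shared, width, mid)
```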
(c) Histograms
e.g.: [Figure: example histogram]
To compare two sets of data, a back-to-back stem and leaf diagram can be used.
Description, how to calculate, advantages and disadvantages:
(1) Mode
    Advantages: Easy to calculate; not affected by extreme values
    Disadvantages: No important mathematical features
(2) Median
    Advantages: Relatively easy to calculate; not affected by extreme values
    Disadvantages: No important mathematical features
(3) Quantiles
    How to calculate:
    (a) Percentiles: the xth percentile is the (x/100)(n+1)th value *
    (b) Quartiles: the xth quartile is the (x/4)(n+1)th value *
    * grouped frequency distributions
(4) Mean
    Disadvantages: Affected by extreme values
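The position rule for quantiles can be sketched in Python. This is a minimal illustration and not a full method: the data set is hypothetical, the function names are invented for the example, and the quartile rule is assumed to be the percentile rule with 4 in place of 100. The data size n = 11 is chosen so that every position is a whole number; when it is not, a further convention (such as interpolation) is needed and is not covered here.

```python
def percentile_position(x, n):
    """Position of the x-th percentile among n ordered values: (x/100)(n+1)."""
    return x / 100 * (n + 1)


def quartile_position(x, n):
    """Position of the x-th quartile among n ordered values: (x/4)(n+1)."""
    return x / 4 * (n + 1)


# Hypothetical ungrouped data set with n = 11 values, sorted first.
data = sorted([7, 3, 9, 4, 12, 5, 8, 6, 15, 10, 11])
n = len(data)

positions = [quartile_position(k, n) for k in (1, 2, 3)]

# Positions count from 1, so subtract 1 to index the sorted list.
q1, q2, q3 = (data[int(p) - 1] for p in positions)

print(positions, q1, q2, q3)
```

Here the 50th percentile and the 2nd quartile share the same position, which matches the median being Q2.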
[Figure: box plot showing Q1, Q2 and Q3, the interquartile range (IQR = Q3 - Q1), the whiskers, the outlier boundary or minimum value, and outliers marked with crosses (x)]
A box plot shows the quartiles, the interquartile range, sometimes the range, and any possible outliers.
The outlier boundaries are worked out according to the formula given in the
question.
If there are no outliers, the whiskers are drawn to the minimum and the
maximum values.
If there are outliers, the whiskers are drawn either to the outlier boundaries or to
the most extreme values that are not outliers (the lowest such value below Q1 and the
highest such value above Q3). The outliers are usually marked with crosses.
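The whisker rules can be sketched in Python. Because the outlier formula is given in the question, the multiplier k is left as a parameter; the default k = 1.5 and the rule Q1 - k × IQR, Q3 + k × IQR are assumptions made for this example, as are the function names and the data. The sketch draws the whiskers to the most extreme values that are not outliers.

```python
def outlier_boundaries(q1, q3, k=1.5):
    """Assumed rule: boundaries at Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whisker_ends(data, q1, q3, k=1.5):
    """Return the whisker ends and the list of outliers."""
    low, high = outlier_boundaries(q1, q3, k)
    inside = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    # Whiskers stop at the most extreme values that are not outliers.
    return min(inside), max(inside), outliers


# Hypothetical data with one unusually large value; Q1 = 5 and Q3 = 11.
data = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 30]
lower_end, upper_end, outliers = whisker_ends(data, q1=5, q3=11)

print(outlier_boundaries(5, 11), lower_end, upper_end, outliers)
```

If the list of outliers is empty, the whisker ends returned are simply the minimum and maximum values of the data.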
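$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$
$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$
Variance is the mean of the squares minus the square of the mean.
$\sigma^2$ = variance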
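A short Python sketch can confirm that "the mean of the squares minus the square of the mean" gives the same result as averaging the squared deviations from the mean. Both functions divide by n, which is an assumption consistent with the statement above; the function names and the data set are hypothetical.

```python
def variance_from_deviations(data):
    """Mean of the squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def variance_from_squares(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    mean_of_squares = sum(x * x for x in data) / n
    square_of_mean = (sum(data) / n) ** 2
    return mean_of_squares - square_of_mean


# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance_from_deviations(data), variance_from_squares(data))
```

The second form is usually quicker by hand because it only needs the totals of x and of x squared.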
(6) Skewness
Skewness is a measure of the asymmetry (lack of symmetry) of a distribution.
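                          Positively skewed       Symmetrical             Negatively skewed
How the graph looks       [figure]                [figure]                [figure]
Mean, median and mode     mode < median < mean    mean = median = mode    mean < median < mode
How the box plot looks    [figure]                [figure]                [figure]
Quartiles                 Q2 - Q1 < Q3 - Q2       Q2 - Q1 = Q3 - Q2       Q2 - Q1 > Q3 - Q2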
Skewness formula: 3 (Mean - Median) / Standard deviation
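The skewness formula can be evaluated with Python's standard `statistics` module. This is a minimal sketch: the function name and the data are hypothetical, and the population standard deviation (`pstdev`, dividing by n) is an assumption, since the notes do not say which form of standard deviation is intended.

```python
from statistics import mean, median, pstdev


def skewness_coefficient(data):
    """3 * (mean - median) / standard deviation."""
    return 3 * (mean(data) - median(data)) / pstdev(data)


# Hypothetical data set: mean 5, median 4.5, standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
coefficient = skewness_coefficient(data)

print(coefficient)
```

A positive result indicates positive skew, a negative result indicates negative skew, and a value near zero indicates an approximately symmetrical distribution.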