Chapters 1 To 4 - Data Representation and Summarisation Techniques

EDEXCEL GCE Statistics 1
(Summary of Chapters 1 to 4)
Data Representation and summarisation techniques
Variable
a quantity which can vary in value
Variables can be classified into quantitative variables and qualitative variables.
Variables
- Quantitative variables
- Qualitative variables
Quantitative variables
- Numerical values are used to represent the value, e.g. weight, mass, number of members in a family.
Qualitative variables
- Non-numerical descriptors are used to represent the value, e.g. religion, race, skin colour.
Quantitative variables can be further classified into continuous variables and discrete variables.
Quantitative variables
- Continuous variables
- Discrete variables
Continuous variables
- These can take any value within a given range, e.g. weight, mass, time.
Discrete variables
- These can take only specific values within a given range, e.g. number of members in a family, number of students in a class.
Summarisation of data
Data are summarised to identify their important features. These features help us to make deductions about the data, which in turn help us to make predictions.
Data summarisation can be done by either of the following two methods:
- Using graphical methods
- Using numerical methods
(1) Using graphical methods
- Frequency distributions
- Grouped frequency distributions
- Histograms
- Stem and leaf diagrams
(a) Frequency distribution
e.g.:
Value   Frequency   Cumulative frequency
1       10          10
2       8           18
3       1           19
4       2           21
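The cumulative frequency column is a running total of the frequency column. A minimal Python sketch using the values from the example table (the variable names are illustrative only):

```python
from itertools import accumulate

# Values and frequencies taken from the example table.
values = [1, 2, 3, 4]
frequencies = [10, 8, 1, 2]

# Each cumulative frequency is the sum of all frequencies up to that row.
cumulative = list(accumulate(frequencies))

for value, freq, cum in zip(values, frequencies, cumulative):
    print(f"{value:<7} {freq:<11} {cum}")
```

The last cumulative frequency always equals the total number of observations (21 here).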
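(b) Grouped frequency distribution
e.g.:
Weight (w) kg   Frequency   Cumulative frequency
0 ≤ w < 9       10          10
9 ≤ w < 18      9           19
18 ≤ w < 28     5           24
28 ≤ w < 38     20          44
In each class, the smaller end value is the lower class limit and the larger end value is the upper class limit (for example, 9 and 18 in the class 9 ≤ w < 18).
Lower class boundary = (Lower class limit + Upper class limit of the class above) / 2
Upper class boundary = (Upper class limit + Lower class limit of the class below) / 2
Here "class above" and "class below" mean the previous and the next row of the table.
Class width = Upper class boundary − Lower class boundary
Class mid-value = (Lower class boundary + Upper class boundary) / 2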
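The boundary, width and mid-value formulas can be sketched in Python as follows. The function name `class_boundaries` is hypothetical, and the handling of the first and last class (extending them by half the neighbouring gap) is an assumption, since the notes do not say what to do when there is no class above or below.

```python
def class_boundaries(limits):
    """Turn (lower limit, upper limit) pairs, in table order, into class boundaries.

    Needs at least two classes. The outer boundaries of the first and last
    class are an assumption: they are extended by half the neighbouring gap.
    """
    boundaries = []
    last = len(limits) - 1
    for i, (lower, upper) in enumerate(limits):
        if i > 0:
            lower_b = (lower + limits[i - 1][1]) / 2
        else:
            lower_b = lower - (limits[1][0] - upper) / 2
        if i < last:
            upper_b = (upper + limits[i + 1][0]) / 2
        else:
            upper_b = upper + (lower - limits[i - 1][1]) / 2
        boundaries.append((lower_b, upper_b))
    return boundaries

# Classes from the weight example: limits already meet, so boundaries equal limits.
weights = class_boundaries([(0, 9), (9, 18), (18, 28), (28, 38)])
widths = [upper - lower for lower, upper in weights]
mid_values = [(lower + upper) / 2 for lower, upper in weights]

# Hypothetical classes with gaps (10-19, 20-29, 30-39) to show the averaging.
gapped = class_boundaries([(10, 19), (20, 29), (30, 39)])

print(widths, mid_values, gapped)
```

For classes written as 10-19, 20-29 and so on, the boundary between two neighbouring classes falls halfway across the gap (19.5 here), which is why the width is 10 rather than 9.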
(c) Histograms
In a histogram, the frequency is represented by the area of a bar, not by its height:
Frequency ∝ area of a bar
Frequency = frequency density × class width
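Rearranging the last relation gives frequency density = frequency / class width, which is the height of each bar. A short Python sketch using the weight classes from the grouped frequency example (variable names are illustrative):

```python
# Class boundaries and frequencies from the grouped weight example.
classes = [(0, 9), (9, 18), (18, 28), (28, 38)]
frequencies = [10, 9, 5, 20]

widths = [upper - lower for lower, upper in classes]

# Bar height: frequency density = frequency / class width.
densities = [freq / width for freq, width in zip(frequencies, widths)]

# Bar area: frequency density x class width recovers the frequency.
areas = [density * width for density, width in zip(densities, widths)]

for (lower, upper), density in zip(classes, densities):
    print(f"{lower} <= w < {upper}: frequency density = {density:.2f}")
```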
(d) Stem and leaf diagrams

Values are divided into two sections called stem and leaf.
First we draw an unordered stem and leaf diagram and later convert it to an ordered
stem and leaf diagram.
To compare two sets of data, a back-to-back stem and leaf diagram can be used.
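As an illustration, an ordered stem and leaf diagram for two-digit whole numbers can be built as in the Python sketch below. The data are made up, the function name `stem_and_leaf` is hypothetical, and taking the tens digit as the stem and the units digit as the leaf is an assumption that suits this data only.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: ordered leaves}, with stem = tens digit and leaf = units digit."""
    diagram = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)
        diagram[stem].append(leaf)  # unordered at this stage
    # Ordering the stems and the leaves gives the ordered diagram.
    return {stem: sorted(leaves) for stem, leaves in sorted(diagram.items())}

data = [23, 15, 31, 27, 12, 35, 21, 18]  # made-up values
diagram = stem_and_leaf(data)

for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1 | 2 means 12")
```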
(2) Using numerical methods
- Measures of location
- Measures of dispersion
(a) Measures of location
Here we use a single value to represent the set of data.

(1) Mode
Description: The value which occurs the greatest number of times.
How to calculate: Find the number which occurs the greatest number of times.
Advantages: Easy to calculate. Not affected by extreme values.
Disadvantages: No important mathematical features.

(2) Median
Description: The value that comes at the centre of a set of values when arranged in ascending or descending order.
How to calculate:
(a) For a set of values, the median is the $\frac{1}{2}(n+1)$th value.
(b) For grouped frequency distributions, the median is the $\frac{1}{2}n$th value.
Advantages: Relatively easy to calculate. Not affected by extreme values.
Disadvantages: No important mathematical features.

(3) Quantiles
How to calculate:
(a) Percentiles: the xth percentile is the $\frac{x}{100}(n+1)$th value *
(b) Quartiles: the xth quartile is the $\frac{x}{4}(n+1)$th value *
* Use the $\frac{x}{100}n$th or $\frac{x}{4}n$th value for grouped frequency distributions.

(4) Mean
Description: The average value.
How to calculate:
(a) For raw data, mean = $\bar{x} = \frac{\sum x}{n}$
(b) For grouped frequency distributions, mean = $\bar{x} = \frac{\sum fx}{\sum f}$, where x is the class mid-value.
Advantages: All values are used in working out the mean. Has important mathematical features.
Disadvantages: Affected by extreme values.
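The measures of location can be checked with a short Python sketch. The raw data below are made up for illustration; the grouped mean uses the class mid-values and frequencies of the weight example. `statistics.mode` and `statistics.median` are standard-library functions.

```python
import statistics

# Made-up raw data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mode = statistics.mode(data)      # value that occurs most often
median = statistics.median(data)  # middle of the ordered values
mean = sum(data) / len(data)      # sum of x divided by n

# Grouped data from the weight example: class mid-values and frequencies.
mid_values = [4.5, 13.5, 23.0, 33.0]
frequencies = [10, 9, 5, 20]
grouped_mean = sum(f * x for f, x in zip(frequencies, mid_values)) / sum(frequencies)

print(mode, median, mean, round(grouped_mean, 2))
```

With 8 values the median is the 4.5th value, so it lies halfway between the 4th and 5th ordered values (4 and 5).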
Using interpolation to find a value in a grouped frequency distribution
Required value (the nth value) =
Lower class boundary + ((n − cumulative frequency up to the class above) / Frequency of class) × Class width
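A minimal Python sketch of this interpolation, applied to the grouped weight example (total frequency 44). The function name `grouped_value` is hypothetical; it first finds the class that contains the required position and then applies the formula.

```python
def grouped_value(position, boundaries, frequencies):
    """Estimate the value at a given position in a grouped frequency distribution."""
    cumulative = 0  # cumulative frequency up to the class above
    for (lower, upper), freq in zip(boundaries, frequencies):
        if position <= cumulative + freq:
            return lower + (position - cumulative) * (upper - lower) / freq
        cumulative += freq
    raise ValueError("position is larger than the total frequency")

boundaries = [(0, 9), (9, 18), (18, 28), (28, 38)]
frequencies = [10, 9, 5, 20]
n = sum(frequencies)

# For grouped data the quartiles are the (1/4)n th, (1/2)n th and (3/4)n th values.
q1 = grouped_value(n / 4, boundaries, frequencies)
median = grouped_value(n / 2, boundaries, frequencies)
q3 = grouped_value(3 * n / 4, boundaries, frequencies)

print(q1, median, q3)
```

For the median, the 22nd value lies in the class 18 ≤ w < 28, so the estimate is 18 + ((22 − 19) / 5) × 10 = 24.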
(b) Measures of dispersion

Here we use quantities which give us an idea about the spread of the data.
(1) Range = Maximum value − Minimum value
(2) Interquartile range = 3rd quartile − 1st quartile
(IQR = Q3 − Q1)
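The range and interquartile range can be sketched in Python as below. The data are made up. `statistics.quantiles` with `method="exclusive"` (Python 3.8 or later) places the quartiles at positions based on n + 1 and interpolates linearly between neighbouring values; treating that as equivalent to the (n+1) rule used in these notes is an assumption.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13]  # made-up values

data_range = max(data) - min(data)

# Quartiles at the (1/4)(n+1)th, (2/4)(n+1)th and (3/4)(n+1)th positions.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```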
(3) Box plots (or box and whisker diagrams)
[Figure: example box plot labelled with Q1, Q2 and Q3, the whiskers, the minimum outlier boundary (or minimum value), the maximum outlier boundary, and outliers marked with crosses]
A box plot shows the quartiles, the interquartile range, sometimes the range, and any possible outliers.
The outlier boundaries are worked out according to the formula given in the question.
If there are no outliers, the whiskers are drawn to the minimum and maximum values.
If there are outliers, the whiskers are drawn either to the outlier boundaries or to the most extreme values that are not outliers. The outliers are usually marked with crosses.
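A Python sketch of finding outliers and whisker ends. The data are made up, and the boundaries Q1 − 1.5 × IQR and Q3 + 1.5 × IQR are only a commonly used rule assumed here for illustration; in an exam, use the formula given in the question.

```python
import statistics

data = sorted([1, 3, 5, 7, 9, 11, 13, 30])  # made-up values

q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# Assumed rule: values more than 1.5 x IQR outside the quartiles are outliers.
lower_boundary = q1 - 1.5 * iqr
upper_boundary = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_boundary or x > upper_boundary]
inside = [x for x in data if lower_boundary <= x <= upper_boundary]

# Here the whiskers end at the most extreme values that are not outliers.
whisker_low, whisker_high = min(inside), max(inside)

print(lower_boundary, upper_boundary, outliers, whisker_low, whisker_high)
```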
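(4) Variance ($\sigma^2$)

For raw data, variance is:
$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$
For grouped data, variance is:
$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$
Variance is the mean of the squares minus the square of the mean.
(5) Standard deviation ($\sigma$)
$\sigma = \sqrt{\text{variance}}$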
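The "mean of the squares minus the square of the mean" rule translates directly into Python. The data below are made up, and the function names are hypothetical.

```python
import math

def variance(values):
    """Variance of raw data: mean of the squares minus the square of the mean."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    square_of_mean = (sum(values) / n) ** 2
    return mean_of_squares - square_of_mean

def grouped_variance(mid_values, frequencies):
    """Variance of grouped data, using class mid-values x and frequencies f."""
    total = sum(frequencies)
    mean_of_squares = sum(f * x * x for f, x in zip(frequencies, mid_values)) / total
    square_of_mean = (sum(f * x for f, x in zip(frequencies, mid_values)) / total) ** 2
    return mean_of_squares - square_of_mean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
var = variance(data)
sd = math.sqrt(var)  # standard deviation is the square root of the variance

print(var, sd)
```

For this data the mean of the squares is 232 / 8 = 29 and the square of the mean is 5² = 25, so the variance is 4 and the standard deviation is 2.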
(6) Skewness
Skewness is a measure of the lack of symmetry in a distribution.
Feature                 Positively skewed      Symmetrical            Negatively skewed
Mean, median and mode   Mode < Median < Mean   Mode = Median = Mean   Mode > Median > Mean
Quartiles               Q2 − Q1 < Q3 − Q2      Q2 − Q1 = Q3 − Q2      Q2 − Q1 > Q3 − Q2
[Figures: how the graph and the box plot look for each type of skewness]
Skewness formula
Skewness = 3 (Mean − Median) / Standard deviation
Positive value: positively skewed
Negative value: negatively skewed
The closer the value is to zero, the more symmetrical the distribution.
The further the value is from zero, the more skewed the distribution.
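The skewness formula 3 (Mean − Median) / Standard deviation can be tried out with a short Python sketch. The data are made up, the function name `skewness` is hypothetical, and the standard deviation is worked out as the square root of the mean of the squares minus the square of the mean.

```python
import math
import statistics

def skewness(values):
    """3 x (mean - median) / standard deviation."""
    n = len(values)
    mean = sum(values) / n
    median = statistics.median(values)
    sd = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    return 3 * (mean - median) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
coefficient = skewness(data)

if coefficient > 0:
    print(coefficient, "positively skewed")
elif coefficient < 0:
    print(coefficient, "negatively skewed")
else:
    print(coefficient, "symmetrical")
```

Here the mean is 5, the median is 4.5 and the standard deviation is 2, so the value is 3 × 0.5 / 2 = 0.75, which is positive. This agrees with Mode (4) < Median (4.5) < Mean (5).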