
EDEXCEL GCE Statistics 1

(Summary of Chapters 1 to 4)
Data Representation and summarisation techniques

Variable
a quantity which can vary in value
Variables can be classified into quantitative variables and qualitative variables.
Variables
  - Quantitative variables
  - Qualitative variables

Quantitative variables
- Numerical values are used to represent the value, e.g.: weight, mass, number of members in a family.
Qualitative variables
- Non-numerical descriptors are used to represent the value, e.g.: religion, race, skin colour.

Quantitative variables can be further classified into continuous variables and discrete variables.
Quantitative variables
  - Continuous variables
  - Discrete variables

Continuous variables
- These can take any value within a given range, e.g.: weight, mass, time.
Discrete variables
- These can take only specific values in a given range, e.g.: number of members in a family, number of students in a class.


Summarisation of data
Data are summarised to identify their important features. These features help us
to make deductions about the data, which in turn help us to make predictions.
Data summarisation can be done by either of the following two methods:
Data summarisation
  - Using graphical methods
  - Using numerical methods

(1) Using graphical methods


Frequency distributions
Grouped frequency distributions
Histograms
Stem and leaf diagrams

(a) Frequency distribution

e.g.:

Value    Frequency    Cumulative Frequency
1        10           10
2        8            18
3        1            19
4        2            21
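The cumulative frequency column is a running total of the frequency column. A minimal Python sketch using the figures from the example above (the variable names are illustrative, not part of the original notes):

```python
from itertools import accumulate

# Values and frequencies taken from the example frequency distribution
values = [1, 2, 3, 4]
frequencies = [10, 8, 1, 2]

# Each cumulative frequency is the sum of all frequencies up to and including that row
cumulative = list(accumulate(frequencies))

for value, freq, cum in zip(values, frequencies, cumulative):
    print(f"{value:<8} {freq:<12} {cum}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.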


(b) Grouped Frequency distribution


e.g.:

Weight (w) kg    Frequency    Cumulative frequency
0 ≤ w < 9        10           10
9 ≤ w < 18       9            19
18 ≤ w < 28      5            24
28 ≤ w < 38      20           44

In each class, the smaller value is the lower class limit and the larger value is the upper class limit.

Lower class boundary = (Lower class limit + Upper class limit of class above) / 2
Upper class boundary = (Upper class limit + Lower class limit of class below) / 2
Class width = Upper class boundary − Lower class boundary
Class mid-value = (Lower class boundary + Upper class boundary) / 2
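The boundary formulas matter most when there are gaps between classes. The sketch below applies them to hypothetical classes 10-19, 20-29 and 30-39 (these classes are an assumption made for illustration, not data from the notes). The first and last classes have no neighbour on one side, so the sketch assumes the same gap there as between the nearest pair of classes.

```python
def class_boundaries(limits):
    """Return (lower, upper) class boundaries for ascending (lower, upper) class limits.

    Assumption: the outer edge of the first and last class reuses the gap
    to the nearest neighbouring class. Needs at least two classes.
    """
    boundaries = []
    last = len(limits) - 1
    for i, (lo, hi) in enumerate(limits):
        # Gap to the class listed above (previous row) and below (next row)
        gap_above = lo - limits[i - 1][1] if i > 0 else limits[1][0] - limits[0][1]
        gap_below = limits[i + 1][0] - hi if i < last else limits[last][0] - limits[last - 1][1]
        # Halving the gap is the same as averaging the two neighbouring limits
        boundaries.append((lo - gap_above / 2, hi + gap_below / 2))
    return boundaries


limits = [(10, 19), (20, 29), (30, 39)]  # hypothetical class limits
boundaries = class_boundaries(limits)
widths = [upper - lower for lower, upper in boundaries]
mid_values = [(lower + upper) / 2 for lower, upper in boundaries]

print(boundaries)
print(widths)
print(mid_values)
```

For classes written with inequalities, such as 9 ≤ w < 18, there are no gaps, so the class limits and the class boundaries coincide.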
(c) Histograms
e.g.: [Figure: example histogram, not reproduced]

Frequency ∝ area of a bar

Frequency = frequency density × class width
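Rearranging the relationship gives frequency density = frequency / class width, which is the height of each bar. A short sketch using the weight classes from the grouped frequency distribution example (variable names are illustrative):

```python
# (lower boundary, upper boundary, frequency) for each weight class in the example
classes = [(0, 9, 10), (9, 18, 9), (18, 28, 5), (28, 38, 20)]

widths = [upper - lower for lower, upper, _ in classes]
densities = [freq / (upper - lower) for lower, upper, freq in classes]

for (lower, upper, freq), width, density in zip(classes, widths, densities):
    # The area of each bar (density x width) recovers the frequency
    print(f"{lower} <= w < {upper}: width {width}, frequency density {density:.3f}")
```

Because the classes have unequal widths (9 and 10), comparing bar heights alone would be misleading; it is the areas that are proportional to the frequencies.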

(d) Stem and leaf diagrams


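Each value is divided into two parts called the stem and the leaf (for example, 23 has stem 2 and leaf 3).
First we draw an unordered stem and leaf diagram and later convert it to an ordered
stem and leaf diagram.
e.g.: [Figure: example stem and leaf diagram, not reproduced]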

To compare two sets of data, a back to back stem and leaf diagram can be used.
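The two-step construction (unordered first, then ordered) can be sketched in Python. The data set below is hypothetical, and the function assumes non-negative whole numbers with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(data, ordered=True):
    """Group values by stem (tens) with the units digit as the leaf."""
    diagram = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)
        diagram[stem].append(leaf)  # leaves arrive in data order (unordered diagram)
    if ordered:
        for leaves in diagram.values():
            leaves.sort()  # ordered diagram: leaves ascending within each stem
    return dict(sorted(diagram.items()))


data = [23, 17, 35, 21, 19, 32, 28, 14, 26]  # hypothetical data
unordered = stem_and_leaf(data, ordered=False)
ordered = stem_and_leaf(data)

for stem, leaves in ordered.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 2 | 3 means 23")
```

A key should always accompany the diagram so that the reader knows how to rebuild each value from its stem and leaf.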


(2) Using Numerical Methods


Measures of Location
Measures of dispersion
(a) Measures of Location
Here we use a single value to represent the set of data.

(1) Mode
Description: the value which occurs the greatest number of times.
How to calculate: find the value which occurs the greatest number of times.
Advantages: easy to calculate; not affected by extreme values.
Disadvantages: no important mathematical features.

(2) Median
Description: the value that comes at the centre of a set of values when arranged in ascending or descending order.
How to calculate:
(a) For a set of values, the median is the $\tfrac{1}{2}(n+1)$th value.
(b) For grouped frequency distributions, the median is the $\tfrac{1}{2}n$th value.
Advantages: relatively easy to calculate; not affected by extreme values.
Disadvantages: no important mathematical features.

(3) Quantiles
How to calculate:
(a) Percentiles: the $x$th percentile is the $\tfrac{x}{100}(n+1)$th value. *
(b) Quartiles: the $x$th quartile is the $\tfrac{x}{4}(n+1)$th value. *
* Use $n$ in place of $(n+1)$ for grouped frequency distributions.

(4) Mean
Description: the average value.
How to calculate:
(a) For raw data, mean = $\frac{\sum x}{n}$
(b) For grouped frequency distributions, mean = $\frac{\sum fx}{\sum f}$
Advantages: all values are used in working out the mean; has important mathematical features.
Disadvantages: affected by extreme values.
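As a rough illustration of these measures, the sketch below computes the mode, median, quartiles and mean for a small hypothetical data set, and the mean of the grouped weight example using class mid-values. The helper names are illustrative. Linear interpolation for fractional positions is an assumption, since the notes do not say how to treat them.

```python
from collections import Counter


def value_at(sorted_data, position):
    """Value at a 1-based position; fractional positions are interpolated (assumption)."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return sorted_data[whole - 1]
    below, above = sorted_data[whole - 1], sorted_data[whole]
    return below + fraction * (above - below)


def mode(data):
    """Most frequent value (the first one met if several tie)."""
    return Counter(data).most_common(1)[0][0]


def quantile(data, x, parts):
    """x-th quantile of raw data using the (x/parts)(n+1)th value."""
    ordered = sorted(data)
    return value_at(ordered, x * (len(ordered) + 1) / parts)


data = [3, 5, 5, 7, 8, 9, 12]  # hypothetical raw data, n = 7

data_mode = mode(data)
median = quantile(data, 1, 2)   # (1/2)(n+1) = 4th value
q1 = quantile(data, 1, 4)       # (1/4)(n+1) = 2nd value
q3 = quantile(data, 3, 4)       # (3/4)(n+1) = 6th value
mean = sum(data) / len(data)    # sum of x divided by n

# Grouped mean for the weight example: class mid-values stand in for x
mid_values = [4.5, 13.5, 23, 33]
frequencies = [10, 9, 5, 20]
grouped_mean = sum(f * x for x, f in zip(mid_values, frequencies)) / sum(frequencies)

print(data_mode, median, q1, q3, mean, round(grouped_mean, 2))
```

The grouped mean is only an estimate, because every value in a class is treated as if it sat at the class mid-value.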


Using interpolation to find a value in a grouped frequency distribution

Required value (nth value) =

Lower class boundary + [(n − cumulative frequency up to class above) / Frequency of class] × Class width
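A worked sketch of the interpolation formula, applied to the weight example from the grouped frequency distribution. The total frequency is 44, so the median is the 22nd value, which falls in the class 18 ≤ w < 28. The function name and the tuple layout are illustrative choices, not part of the notes.

```python
def interpolate(position, classes):
    """Estimate the value at a given position in a grouped frequency distribution.

    classes: (lower boundary, upper boundary, frequency) tuples in ascending order.
    """
    cumulative_before = 0  # cumulative frequency up to the class above
    for lower, upper, frequency in classes:
        if position <= cumulative_before + frequency:
            width = upper - lower
            return lower + (position - cumulative_before) / frequency * width
        cumulative_before += frequency
    raise ValueError("position exceeds the total frequency")


classes = [(0, 9, 10), (9, 18, 9), (18, 28, 5), (28, 38, 20)]
n = sum(frequency for _, _, frequency in classes)

median = interpolate(n / 2, classes)       # 18 + (22 - 19) / 5 * 10
q1 = interpolate(n / 4, classes)           # 9 + (11 - 10) / 9 * 9
q3 = interpolate(3 * n / 4, classes)       # 28 + (33 - 24) / 20 * 10

print(median, q1, q3)
```

Working by hand, the median is 18 + (3/5) × 10 = 24, the lower quartile is 10 and the upper quartile is 32.5.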

(b) Measures of dispersion


Here we use quantities which give us an idea about the spread of the data.
(1) Range = Maximum value − minimum value
(2) Interquartile range = 3rd quartile − 1st quartile

(IQR = Q3 − Q1)
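Both measures are quick to compute for raw data. The sketch below uses a hypothetical data set and Python's `statistics.quantiles` with `method="exclusive"`, which places the quartiles at positions based on (n+1), in line with the raw-data rule used in these notes.

```python
from statistics import quantiles

data = [3, 5, 5, 7, 8, 9, 12]  # hypothetical raw data

# Range: distance between the largest and smallest values
data_range = max(data) - min(data)

# Quartiles at the (n+1)/4, 2(n+1)/4 and 3(n+1)/4 positions
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(f"Range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

The range depends only on the two extreme values, whereas the interquartile range describes the spread of the middle half of the data and is less affected by extremes.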

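(3) Box plots (or box and whisker diagrams)

e.g.: [Figure: box plot with Q1, Q2 and Q3 marked on the box, whiskers extending to the minimum outlier boundary (or minimum value) and to the maximum outlier boundary, and outliers marked with crosses (x)]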

A box plot shows the quartiles, the interquartile range, sometimes the range, and any possible outliers.
The outlier boundaries are worked out using the formula given in the question.
If there are no outliers, the whiskers are drawn to the lowest and the highest
values.
If there are outliers, the whiskers are drawn either to the outlier boundaries or to
the lowest and highest values that are not outliers. The outliers are usually marked
with crosses.
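The whisker rules can be sketched as follows. Since the outlier formula is supplied by each question, the rule used here (Q1 − 1.5 × IQR and Q3 + 1.5 × IQR) is only an assumed, commonly seen example, and the data set is hypothetical.

```python
from statistics import quantiles


def box_plot_summary(data, k=1.5):
    """Quartiles, outlier boundaries, outliers and whisker ends.

    Assumption: boundaries are Q1 - k*IQR and Q3 + k*IQR; use the question's own formula.
    """
    ordered = sorted(data)
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_boundary = q1 - k * iqr
    upper_boundary = q3 + k * iqr
    outliers = [x for x in ordered if x < lower_boundary or x > upper_boundary]
    inside = [x for x in ordered if lower_boundary <= x <= upper_boundary]
    return {
        "quartiles": (q1, q2, q3),
        "boundaries": (lower_boundary, upper_boundary),
        "outliers": outliers,
        # Whiskers end at the most extreme values that are not outliers
        "whiskers": (inside[0], inside[-1]),
    }


data = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 30]  # hypothetical data
summary = box_plot_summary(data)
print(summary)
```

Here Q1 = 11 and Q3 = 17, so the assumed boundaries are 2 and 26; the values 1 and 30 would be marked with crosses, and the whiskers would stop at 10 and 18 (or at the boundaries, depending on the convention chosen).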

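(4) Variance ($\sigma^2$)

For raw data, variance is:

$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$

For grouped data, variance is:

$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$

Variance is the mean of the squares minus the square of the mean.

(5) Standard Deviation ($\sigma$)

$\sigma = \sqrt{\text{variance}}$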
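A short sketch of "mean of the squares minus the square of the mean" for both raw and grouped data. The raw data set is hypothetical; the grouped figures are the class mid-values and frequencies from the weight example. Function names are illustrative.

```python
from math import sqrt


def variance_raw(data):
    """Mean of the squares minus the square of the mean, for raw data."""
    n = len(data)
    mean_of_squares = sum(x * x for x in data) / n
    mean = sum(data) / n
    return mean_of_squares - mean ** 2


def variance_grouped(mid_values, frequencies):
    """Same idea for grouped data, weighting each mid-value by its frequency."""
    total = sum(frequencies)
    mean_of_squares = sum(f * x * x for x, f in zip(mid_values, frequencies)) / total
    mean = sum(f * x for x, f in zip(mid_values, frequencies)) / total
    return mean_of_squares - mean ** 2


data = [3, 5, 5, 7, 8, 9, 12]  # hypothetical raw data
raw_variance = variance_raw(data)
raw_sd = sqrt(raw_variance)

mid_values = [4.5, 13.5, 23, 33]   # mid-values of the weight classes
frequencies = [10, 9, 5, 20]
grouped_variance = variance_grouped(mid_values, frequencies)
grouped_sd = sqrt(grouped_variance)

print(round(raw_variance, 3), round(raw_sd, 3))
print(round(grouped_variance, 3), round(grouped_sd, 3))
```

For the raw data, the sum of squares is 397 and the mean is 7, so the variance is 397/7 − 49 = 54/7. These formulas divide by n, which is the convention used in these notes.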


(6) Skewness
Skewness is a measure of how far a distribution departs from symmetry.

                        Positively skewed       Symmetrical             Negatively skewed
Mean, median and mode   Mode < Median < Mean    Mode = Median = Mean    Mode > Median > Mean
Quartiles               Q2 − Q1 < Q3 − Q2       Q2 − Q1 = Q3 − Q2       Q2 − Q1 > Q3 − Q2

[Figures: sketches of the graph and the box plot for each type of skew, not reproduced]

Skewness formula

Skewness = 3 (Mean − Median) / Standard deviation

Positive value: positively skewed
Negative value: negatively skewed

The closer the value is to zero, the more symmetrical the distribution.
The further the value is from zero, the more skewed the distribution.
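Both skewness checks can be expressed in a few lines. The sketch below computes the coefficient 3 (mean − median) / standard deviation and compares the quartile gaps Q2 − Q1 and Q3 − Q2. The function names and the numbers passed in are hypothetical.

```python
def skew_coefficient(mean, median, standard_deviation):
    """3 (mean - median) / standard deviation."""
    return 3 * (mean - median) / standard_deviation


def describe(value):
    """Turn a signed measure of skew into a description."""
    if value > 0:
        return "positively skewed"
    if value < 0:
        return "negatively skewed"
    return "symmetrical"


def quartile_skew(q1, q2, q3):
    """Compare the gaps between quartiles: (Q3 - Q2) against (Q2 - Q1)."""
    return describe((q3 - q2) - (q2 - q1))


coefficient = skew_coefficient(mean=12, median=10, standard_deviation=4)  # hypothetical values
print(coefficient, describe(coefficient))
print(quartile_skew(5, 7, 12))   # upper gap larger than lower gap
print(quartile_skew(5, 10, 12))  # lower gap larger than upper gap
```

The two methods use different information, so they can occasionally disagree on data that is close to symmetrical; an exam answer should state which measure was used.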
