
STATISTICS 4040

GCSE Statistics
Revision notes

Collecting data
Sample – This is when data is collected from part of the population. There are different methods of sampling:
random sampling, stratified sampling, systematic sampling, cluster sampling, quota sampling and
convenience sampling.
Random sample – Where each piece of data has an equal chance of being picked.
Methods
Random number table – Tables of random numbers can be used.
Here is an extract from a table of random numbers:
36015 37672 90153 67480 26237 10635 34269 01638
Split the numbers into two-digit numbers:
36 01 53 76 72 90 15 36 74 80 26 23 71 06 35 34 26 90 16 38
Then start from 36 and select numbers between 0 and 50, leaving out any numbers above 50:
36 01 15 36 26
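The random number table method can be sketched in Python as below. This is only an illustration: the function names two_digit_numbers and select_in_range are made up for this sketch, and the extract is the one given above.

```python
def two_digit_numbers(blocks):
    """Join the blocks of digits and split them into two-digit numbers."""
    digits = "".join(blocks)
    return [int(digits[i:i + 2]) for i in range(0, len(digits) - 1, 2)]


def select_in_range(numbers, low, high, how_many):
    """Keep the first `how_many` numbers between low and high (inclusive)."""
    chosen = [n for n in numbers if low <= n <= high]
    return chosen[:how_many]


extract = ["36015", "37672", "90153", "67480", "26237", "10635", "34269", "01638"]
pairs = two_digit_numbers(extract)
print(select_in_range(pairs, 0, 50, 5))  # [36, 1, 15, 36, 26]
```

Note that 01 is printed as 1 because Python stores it as a whole number.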
Calculator – Use the RAN button on your calculator. For numbers from 0 to 100, type
100 Shift RAN and repeat until you have enough numbers.
Numbers in a bag – List numbers from 1 to 100, put them in a bag and select the required number of them at
random.

Stratified sample – Where each group (stratum) is sampled in proportion to its size in the population.


Example-
The table shows the number of students in a school
Year Students
7 120
8 100
9 115
10 125
Total 460
A stratified sample of size 30 is to be taken. How many Year 7 students will be picked?
Solution
Fraction of Year 7 students in the school is 120/460
120/460 × 30 = 7.83 (to 2 d.p.), so approximately 8 Year 7 students will be picked
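The same calculation can be checked with a short Python sketch. It uses the school table above (460 students in total); the function name stratified_allocation is an assumption made for this illustration.

```python
def stratified_allocation(strata, sample_size):
    """Return each group's share of the sample, in proportion to its size."""
    total = sum(strata.values())
    return {name: size / total * sample_size for name, size in strata.items()}


students = {"Year 7": 120, "Year 8": 100, "Year 9": 115, "Year 10": 125}
shares = stratified_allocation(students, 30)
print(round(shares["Year 7"], 2))  # 7.83
print(round(shares["Year 7"]))     # 8
```

The shares are left unrounded on purpose, because rounding every group separately does not always give numbers that add up to the sample size.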

Other Sample techniques


Convenience sample – The first so many pieces of data in the list are sampled.
(Quick but unlikely to be representative.)
Quota sample – The amount or quota of each group is given, e.g. 100 women were sampled.
Systematic sampling – Data is chosen at regular intervals, e.g. every 10th person.
Cluster sampling – The population is divided into groups (clusters) and then a group is chosen at random.

Khodabocus Aihjaaz Ahmad


Census – This is when all of the data in the population is taken. For example a census of the entire population
of the UK is taken every 10 years.

         Advantages                           Disadvantages
Sample   Cheaper                              Not completely representative
         Less time consuming                  Possibly biased
         Less data to be analysed

Census   Unbiased                             Time consuming
         Accurate                             Expensive
         Takes account of whole population    Difficult to ensure whole population is surveyed

Types of Data
- Secondary data – This is data that has been collected by someone else.
Advantage – No need to collect; ready to analyse
Disadvantage – Could be unreliable
- Primary data – This is data collected by the person doing the analysis.
Advantage – Should be reliable
Disadvantage – Collecting is time consuming

Continuous data – This is data that is measured on a continuous scale (lengths, heights, weights, other measurements).
Discrete data – This is data that consists of separate numbers (shoe sizes, number of people, money).

Quantitative data – This is data that has numerical values (time, heights, weights, number of people).
Qualitative data – This is data that is not numerical (colour, type).

Questionnaires
Open questions – Have no suggested answers and give people the chance to reply as they wish
Advantage – Allows for a range of answers
Disadvantage – Range of responses can be too broad, which makes them hard to analyse
Closed questions – Give a set of answers for the person to choose from
Advantage – Restricts responses, making them easy to analyse
Disadvantage – Will not necessarily cover all responses

Pilot survey (pre-test) – Small scale replica of the survey to be carried out. Used to ensure that the method of
data collection, the questionnaire and the data required are suitable for the bigger survey.
Leading question – Avoid questions that suggest an opinion, such as “Smoking is bad for you. Do you agree?”

Other sampling methods See page 16



Calculations
Means from frequency distributions
Example

Means from grouped data


Find the mid-point of each group and then multiply by frequency. Sum and then divide by total frequency
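The mid-point method can be sketched in Python as follows. The groups and frequencies here are hypothetical (they are not taken from these notes), and the function name grouped_mean is made up for the illustration.

```python
def grouped_mean(groups):
    """Estimate the mean of grouped data.

    groups is a list of (lower, upper, frequency) tuples.
    """
    # Mid-point of each group multiplied by its frequency, then summed.
    total_fx = sum((lower + upper) / 2 * f for lower, upper, f in groups)
    total_f = sum(f for _, _, f in groups)
    return total_fx / total_f


# Hypothetical data: groups 0-10, 10-20 and 20-30 with frequencies 3, 5 and 2.
groups = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]
print(grouped_mean(groups))  # 14.0
```

The answer is an estimate, because the mid-point stands in for every value in its group.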
Example

Standard Deviation
Variance is a measure of spread about the mean of a distribution of data
The square root of the variance is the standard deviation
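As a rough sketch, the two definitions above can be written in Python like this. It assumes the form of the variance that divides by the number of values n; the data list is hypothetical.

```python
from math import sqrt


def variance(values):
    """Mean of the squared differences from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def standard_deviation(values):
    """Square root of the variance."""
    return sqrt(variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0
```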
Example 1

Example 2 – If the data is grouped (the mean for this example was found at the top of this page)


Standardised Scores
This is used to compare values from different sets of data. For example, how do you compare your score in a
maths mock exam with your score in an English exam? Here’s how:
Standardised score = (score – mean) / standard deviation
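A minimal Python sketch of the formula is shown below. The marks, means and standard deviations are hypothetical values chosen for illustration; they are not the figures from the example in these notes.

```python
def standardised_score(score, mean, standard_deviation):
    """How many standard deviations a score is above (+) or below (-) the mean."""
    return (score - mean) / standard_deviation


# Hypothetical marks: maths 70 (mean 60, s.d. 10), English 65 (mean 50, s.d. 20).
maths = standardised_score(70, 60, 10)
english = standardised_score(65, 50, 20)
print(maths, english)  # 1.0 0.75
```

With these made-up figures the maths mark is the better performance, because its standardised score is higher.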
Example
Sam takes an exam in maths and another in English.
His marks along with the mean marks for the year and the standard deviation are shown below

Normal Distribution
Standard deviation is used to describe the normal distribution.
The normal distribution appears when large amounts of data are collected such as heights of people.
When put into a histogram the data will form a Bell shape as below.

Scatter Diagrams


The equation of a line of best fit has the form y = ax + b, where a is the gradient of the line
and b is the intercept on the y-axis.
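One common way to get a and b is to read two points off the drawn line. The Python sketch below assumes that approach; the two points are hypothetical and the function name line_through is made up for the illustration.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = ax + b through two points on the line."""
    (x1, y1), (x2, y2) = p1, p2
    a = (y2 - y1) / (x2 - x1)  # gradient = change in y / change in x
    b = y1 - a * x1            # intercept, from y = ax + b at the first point
    return a, b


# Hypothetical points read from a line of best fit.
a, b = line_through((2, 7), (6, 15))
print(a, b)  # 2.0 3.0
```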
Causal Relationship
When a change in one variable causes a change in another variable there is said to be a causal relationship
between the two.
For example
The size of a car engine and the amount of petrol the car uses.
Sales of computers and sales of software
Not a causal relationship -> Sales of chocolates and sales of clothes.

Spearman’s Rank – Spearman’s rank correlation coefficient is a numerical measure of the correlation
between two sets of data.
–1 is a perfect negative correlation
+1 is a perfect positive correlation
0 means no correlation
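The coefficient is usually worked out with the formula 1 – 6Σd² / (n(n² – 1)), where d is the difference between the two ranks of each item and n is the number of items. The Python sketch below uses that formula; it assumes there are no tied values, and the marks are hypothetical.

```python
def ranks(values):
    """Rank the values from 1 (smallest) upwards. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation coefficient: 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical marks for five students.
maths = [56, 75, 45, 71, 62]
english = [66, 70, 40, 60, 65]
print(round(spearman(maths, english), 2))  # 0.6
```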

Geometric mean
To work out the geometric mean of n numbers, multiply the numbers together and then take the nth root of the
product
Example: geometric mean of 3, 7, 4, 8
Geometric mean = 4th root of (3 × 7 × 4 × 8) = 4th root of 672 = 5.09 (to 2 d.p.)
In percentage change problems the geometric mean tells us the average percentage change over a period
of time.
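A short Python sketch of the calculation follows. The first call uses the numbers 3, 7, 4, 8 from above; the second uses hypothetical yearly multipliers (a 10% rise then a 20% rise) to show the percentage change idea.

```python
from math import prod


def geometric_mean(values):
    """Multiply the n values together and take the nth root of the product."""
    return prod(values) ** (1 / len(values))


print(round(geometric_mean([3, 7, 4, 8]), 2))  # 5.09

# Hypothetical: a 10% rise followed by a 20% rise (multipliers 1.10 and 1.20).
average_multiplier = geometric_mean([1.10, 1.20])
print(round((average_multiplier - 1) * 100, 1))  # 14.9 (% average rise per year)
```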
Index numbers
An index number shows the rate of change in the quantity, value or price of an item over a period of time.
Index number = (quantity / quantity in base year) × 100
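The formula can be sketched in Python as below. The prices are hypothetical (an item worth 200 in the base year and 250 later) and are not the figures from the example in these notes.

```python
def index_number(quantity, base_quantity):
    """Quantity as a percentage of the base-year quantity."""
    return quantity / base_quantity * 100


# Hypothetical: item worth 200 in the base year and 250 in a later year.
print(index_number(250, 200))  # 125.0
```

An index number of 125 means a 25% increase on the base year.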
Example

Chain base index numbers

A chain base index number tells you the annual percentage change. It is found by using the previous year
as the base year and then working out the relative value of an item
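As an illustration, the Python sketch below works out chain base index numbers for a list of yearly values. The values are hypothetical and the function name chain_base_indices is made up for this sketch.

```python
def chain_base_indices(values):
    """Index of each year's value using the previous year as the base year."""
    return [
        round(current / previous * 100, 1)
        for previous, current in zip(values, values[1:])
    ]


# Hypothetical values of an item in three consecutive years.
values = [200, 250, 300]
print(chain_base_indices(values))  # [125.0, 120.0]
```

Here the value rose by 25% in the first year and by 20% in the second.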
Example (Using data above for antique)

Weighted Means
In a GCSE course 40% of the mark is for paper 1, 40% is for paper 2, 10% is for coursework task 1 and 10% is
for coursework task 2.
If a student scores the following marks we can work out the weighted mean.
Paper 1 62%
Paper 2 38%
Coursework 1 58%
Coursework 2 29%
Weighted mean = (40 × paper 1 + 40 × paper 2 + 10 × coursework 1 + 10 × coursework 2) / (40 + 40 + 10 + 10)
= (40 × 62 + 40 × 38 + 10 × 58 + 10 × 29) / 100 = 4870 / 100 = 48.7%
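The weighted mean for these marks can be checked with a small Python sketch. The function name weighted_mean is made up for the illustration; the marks and weights are the ones listed above.

```python
def weighted_mean(marks, weights):
    """Sum of (mark x weight) divided by the sum of the weights."""
    return sum(m * w for m, w in zip(marks, weights)) / sum(weights)


marks = [62, 38, 58, 29]    # paper 1, paper 2, coursework 1, coursework 2
weights = [40, 40, 10, 10]  # percentage of the final mark for each part
print(weighted_mean(marks, weights))  # 48.7
```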

Time series and moving averages


A time series graph shows how values change over a period of time (days, weeks, months, quarters of
years).
The moving average gives an idea of how the values are changing.
To find the 3-point moving averages of 23, 22, 24, 25, 26, 29, 28:
average 23, 22, 24, then average 22, 24, 25, then average 24, 25, 26 and so on.
Once you have calculated the moving averages you will need to plot them. Then draw a line of best fit through
the moving averages to get a trend line.
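The moving averages for the values above can be worked out with a short Python sketch. The function name moving_averages is an assumption made for this illustration.

```python
def moving_averages(values, points):
    """Average each run of `points` consecutive values, moving along one at a time."""
    return [
        sum(values[i:i + points]) / points
        for i in range(len(values) - points + 1)
    ]


averages = moving_averages([23, 22, 24, 25, 26, 29, 28], 3)
print([round(a, 2) for a in averages])  # [23.0, 23.67, 25.0, 26.67, 27.67]
```

Seven values give five 3-point moving averages.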

Quality assurance
Quality assurance checks are used in commercial production. For example, a packet of crisps should have a weight of 50g.
Samples of packets are taken at regular intervals and the mean weight calculated. Upper and lower warning and
action limits are set. If the sample mean is outside the warning limits another sample should be taken
immediately. If the sample mean is outside the action limits the production should be stopped and
machines reset.
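The decision rule can be sketched in Python as below. The warning and action limits used here are hypothetical figures for a 50g packet; real limits would be set for the production line in question.

```python
def check_sample_mean(mean, warning, action):
    """Decide what to do with a sample mean.

    warning and action are (lower limit, upper limit) pairs.
    """
    lower_warning, upper_warning = warning
    lower_action, upper_action = action
    if mean < lower_action or mean > upper_action:
        return "stop production and reset machines"
    if mean < lower_warning or mean > upper_warning:
        return "take another sample immediately"
    return "no action needed"


# Hypothetical limits for a target weight of 50g.
warning_limits = (49.0, 51.0)
action_limits = (48.5, 51.5)
print(check_sample_mean(50.2, warning_limits, action_limits))
print(check_sample_mean(51.2, warning_limits, action_limits))
print(check_sample_mean(48.1, warning_limits, action_limits))
```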


Quality control chart for ranges.


Samples are taken and the range found. If the range is too large then production should be stopped.

Charts and Graphs


Box plots

A box plot shows, from left to right: lowest value, lower quartile, median, upper quartile, highest value

Outliers – Any value more than 1.5 × IQR above the UQ or below the LQ is considered to be an outlier
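The outlier rule can be sketched in Python as follows. The data and quartiles are hypothetical; the quartiles are passed in rather than calculated, because different methods of finding quartiles can give slightly different values.

```python
def outliers(values, lower_quartile, upper_quartile):
    """Values more than 1.5 x IQR below the LQ or above the UQ."""
    iqr = upper_quartile - lower_quartile
    low_limit = lower_quartile - 1.5 * iqr
    high_limit = upper_quartile + 1.5 * iqr
    return [v for v in values if v < low_limit or v > high_limit]


# Hypothetical data with LQ = 11.5 and UQ = 17.5, so IQR = 6.
data = [2, 11, 12, 14, 15, 17, 18, 30]
print(outliers(data, 11.5, 17.5))  # [2, 30]
```

Here the limits are 11.5 – 9 = 2.5 and 17.5 + 9 = 26.5, so 2 and 30 are outliers.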
Cumulative frequency
The frequency of a distribution is accumulated
For example
Mark    Frequency    Cumulative frequency
0–1     4            4
1–2     5            4 + 5 = 9
2–3     2            4 + 5 + 2 = 11
3–4     6            4 + 5 + 2 + 6 = 17
4–5     2            4 + 5 + 2 + 6 + 2 = 19
5–6     3            4 + 5 + 2 + 6 + 2 + 3 = 22
6–7     1            4 + 5 + 2 + 6 + 2 + 3 + 1 = 23
The values of the cumulative frequency are then plotted at the top value of each group and connected either
by straight lines or a curve
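The running totals in the table above can be reproduced with a short Python sketch. It also pairs each cumulative frequency with the top value of its group, which gives the points to plot.

```python
from itertools import accumulate

upper_values = [1, 2, 3, 4, 5, 6, 7]  # top value of each group in the table
frequencies = [4, 5, 2, 6, 2, 3, 1]

cumulative = list(accumulate(frequencies))
print(cumulative)  # [4, 9, 11, 17, 19, 22, 23]

# Points to plot: (top value of group, cumulative frequency).
points = list(zip(upper_values, cumulative))
print(points[:3])  # [(1, 4), (2, 9), (3, 11)]
```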


Histograms – The area of the bar represents the frequency and the height of the bar is the frequency density.

Frequency density = frequency / class width
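A minimal Python sketch of the frequency density formula is given below. The class (10 to 30) and its frequency (12) are hypothetical.

```python
def frequency_density(frequency, lower, upper):
    """Height of the histogram bar: frequency divided by class width."""
    return frequency / (upper - lower)


# Hypothetical class from 10 to 30 with frequency 12.
print(frequency_density(12, 10, 30))  # 0.6
```

Checking the other way round, the area of the bar is 0.6 × 20 = 12, which is the frequency.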

Stem and Leaf diagrams


This is a chart to help order data.
For example
68, 72, 56, 52, 78, 53, 64, 73
can be represented in a stem and leaf diagram:
5 | 2 3 6
6 | 4 8
7 | 2 3 8
Key: 5 | 2 = 52
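The ordering step can be sketched in Python as below. This assumes two-digit whole numbers, with the tens digit as the stem and the units digit as the leaf; the function name stem_and_leaf is made up for the illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group sorted values by tens digit (stem); units digits are the leaves."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return dict(rows)


diagram = stem_and_leaf([68, 72, 56, 52, 78, 53, 64, 73])
for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

This prints the three rows of the diagram, stems 5, 6 and 7, with the leaves in order.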

Comparative Pie charts


When comparing two sets of data using pie charts we need to take the total frequencies into account.
The areas of the two circles should be in the same ratio as the two frequencies.
The larger pie chart has the bigger frequency.
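Because area depends on the radius squared, the radii must be in the ratio of the square roots of the total frequencies. The Python sketch below works out the second radius; the radius and totals are hypothetical.

```python
from math import sqrt


def second_radius(first_radius, first_total, second_total):
    """Radius of the second pie chart so that the areas match the totals."""
    return first_radius * sqrt(second_total / first_total)


# Hypothetical: first chart has radius 3 cm for a total frequency of 100,
# and the second chart must represent a total frequency of 400.
print(second_radius(3, 100, 400))  # 6.0
```

Four times the frequency needs four times the area, which is only twice the radius.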


Compound Bar Charts See page 53

Population pyramids
These allow you to compare percentages of populations by age and gender.

Choropleth maps – Used to show population distributions


Probability
Odds
The ratio failures : successes is the odds against an event happening.
The ratio successes : failures is the odds on an event happening.
If the odds are 7:2 against, what is the probability of success?
Answer: There are 7 failures to every 2 successes, so in (7 + 2) = 9 attempts there will be 2 successes.
The probability of a success is 2/9.
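The conversion from odds to probability can be sketched in Python. Fractions are used so the answer stays exact; the function name probability_from_odds_against is made up for the illustration.

```python
from fractions import Fraction


def probability_from_odds_against(failures, successes):
    """Odds against of failures : successes, turned into P(success)."""
    return Fraction(successes, failures + successes)


print(probability_from_odds_against(7, 2))  # 2/9
```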
Mutually exclusive events – Events that cannot happen at the same time.
Independent events – Whether one event happens has no effect on the probability of the other event.
Exhaustive events – A set of events is exhaustive if the set contains all possible outcomes.
Rules of probability
P(a or b) = P(a) + P(b) when a and b are mutually exclusive
P(a and b) = P(a) × P(b) when a and b are independent
Tree diagrams
When completing a tree diagram remember that each set of branches from the same point must add up to 1.
As you travel along the branches to find possible outcomes you multiply the probabilities.
If there is more than one route to the outcome you want, add the probabilities of those routes together.
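The tree diagram rules can be illustrated with a small Python sketch. The situation is hypothetical: the probability of rain is taken to be 1/4 on each of two days, and the two days are assumed to be independent.

```python
from fractions import Fraction
from itertools import product

p_rain = Fraction(1, 4)                          # hypothetical probability
branches = {"rain": p_rain, "dry": 1 - p_rain}   # each pair of branches adds to 1

# Multiply along the branches to get the probability of each route.
routes = {}
for first, second in product(branches, repeat=2):
    routes[(first, second)] = branches[first] * branches[second]

# Add the routes that give exactly one rainy day.
exactly_one = sum(p for route, p in routes.items() if route.count("rain") == 1)
print(exactly_one)  # 3/8
```

The four routes together have total probability 1, which is a useful check.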

Venn Diagrams

Discrete uniform distribution

A discrete uniform distribution has n distinct outcomes. Each outcome is equally likely, with probability
equal to 1/n.
For example, a fair six-sided dice is rolled. The possible outcomes would be written as a probability distribution:

x:     1    2    3    4    5    6
p(x):  1/6  1/6  1/6  1/6  1/6  1/6

Binomial distribution

Suppose each trial has two possible outcomes, a success with probability p and a failure with probability q = 1 – p,
and the trials are independent. If n trials are carried out then the probabilities are found by expanding (p + q)^n.

p (success) = 0.2
q (failure) = 0.8
5 trials are carried out.
Probability distribution is (p + q)^5 = p^5 + 5p^4q + 10p^3q^2 + 10p^2q^3 + 5pq^4 + q^5

Probability of two successes: use 10p^2q^3 = 10 × (0.2)^2 × (0.8)^3 = 0.2048
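Each term of the expansion can be calculated with a short Python sketch. The function name binomial_probability is made up for the illustration; math.comb gives the coefficients 1, 5, 10, 10, 5, 1.

```python
from math import comb


def binomial_probability(successes, trials, p):
    """Probability of exactly `successes` successes in `trials` independent trials."""
    q = 1 - p
    return comb(trials, successes) * p ** successes * q ** (trials - successes)


print(round(binomial_probability(2, 5, 0.2), 4))  # 0.2048

# All six terms of (p + q)^5, for 0 to 5 successes.
distribution = [binomial_probability(k, 5, 0.2) for k in range(6)]
print(round(sum(distribution), 4))  # 1.0
```

The terms add up to 1 because (p + q)^5 = 1^5 = 1.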
