GCSE Statistics
Revision notes
Collecting data
Sample – This is when data is collected from part of the population. There are different methods for sampling:
random sampling, stratified sampling, systematic sampling, cluster sampling, quota sampling and convenience sampling.
Random sample – Where each piece of data has an equal chance of being picked.
Methods
Random number table – Tables of random numbers can be used.
Here is an extract from a table of random numbers
36015 37672 90153 67480 26237 10635 34269 01638
Split the numbers into two-digit numbers
36 01 53 76 72 90 15 36 74 80 26 23 71 06 35 34 26 90 16 38
Then start from 36 and select the numbers between 0 and 50, leaving out any numbers above 50
36 01 15 36 26
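The steps above can be sketched in Python. This is only an illustration of the method; the variable names are made up for this sketch.

```python
# Digits copied from the random number table extract above.
extract = "36015 37672 90153 67480 26237 10635 34269 01638"

# Remove the spaces, then split the digits into two-digit numbers.
digits = extract.replace(" ", "")
pairs = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]

# Read from the start (36) and keep only the numbers from 0 to 50.
selected = [n for n in pairs if n <= 50]

print(pairs)
print(selected[:5])  # the first five selected numbers: 36, 1, 15, 36, 26
```

Note that 36 appears twice. If each item can only be picked once, a repeated number would normally be ignored and the next suitable number used instead.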
Calculator – Use the RAN button on your calculator. For numbers from 0 to 100, type
100 Shift RAN and repeat until you have enough numbers.
Numbers in a bag – Write the numbers from 1 to 100 on slips of paper, put them in a bag and pick out as many as
you need at random.
Census – This is when data is collected from every member of the population. For example, a census of the entire
population of the UK is taken every 10 years.
          Advantage                   Disadvantage
Sample    Cheaper                     Not completely representative
          Less time consuming         Possibly biased
          Less data to be analysed
Continuous data – This is data that is on a continuous scale (lengths, heights, weights, measurements)
Discrete data – This is data that consists of separate numbers (shoe sizes, number of people, money)
Quantitative data – This is data that has numerical values (times, heights, weights, number of people)
Qualitative data – This is data that is not numerical (colour, type)
Questionnaires
Open questions – Have no suggested answers and give people the chance to reply as they wish.
Advantage – Allows for a range of answers.
Disadvantage – The range of responses can be too broad, making them hard to analyse.
Closed questions – Give a set of answers for the person to choose from.
Advantage – Restricts the responses, making them easy to analyse.
Disadvantage – Will not necessarily cover all possible responses.
Pilot survey (pre-test) – A small-scale replica of the survey to be carried out. It is used to check that the method of
data collection, the questionnaire and the data required are suitable for the bigger survey.
Leading question – Avoid questions that suggest an opinion, such as “Smoking is bad for you. Do you agree?”
Calculations
Means from frequency distributions
Example
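A minimal sketch of the calculation in Python. The frequency table here is made up for illustration and is not the example from these notes. The mean is the total of value × frequency divided by the total frequency.

```python
# Hypothetical frequency table (made up for this sketch): value -> frequency.
frequency_table = {1: 3, 2: 5, 3: 8, 4: 4}

# Sum of the frequencies (the number of pieces of data).
total_frequency = sum(frequency_table.values())

# Sum of value x frequency.
total_fx = sum(x * f for x, f in frequency_table.items())

mean = total_fx / total_frequency
print(total_frequency, total_fx, round(mean, 2))  # 20 53 2.65
```

For grouped data the same calculation gives an estimate of the mean if the midpoint of each group is used as the value.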
Standard Deviation
Variance is a measure of spread about the mean of a distribution of data
The square root of the variance is the standard deviation
Example 1
Example 2 – If the data is grouped (the mean for this example was found at the top of this page)
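As a rough sketch, the variance and standard deviation can be worked out from a frequency table as follows. The table is made up for illustration, and the sketch assumes the usual formula: variance = (sum of f × x²) ÷ (sum of f) − mean².

```python
import math

# Hypothetical frequency table (made up for this sketch): value -> frequency.
# For grouped data, use the midpoint of each group as the value.
frequency_table = {1: 3, 2: 5, 3: 8, 4: 4}

n = sum(frequency_table.values())
mean = sum(x * f for x, f in frequency_table.items()) / n

# Variance = (sum of f * x squared) / (sum of f) - (mean squared)
variance = sum(f * x ** 2 for x, f in frequency_table.items()) / n - mean ** 2

# The standard deviation is the square root of the variance.
standard_deviation = math.sqrt(variance)

print(round(variance, 4), round(standard_deviation, 2))  # 0.9275 0.96
```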
Standardised Scores
This is used to compare values from different sets of data. For example, how do you compare your score in a
maths mock exam to your score in an English exam? Here's how:
Standardised score = (score – mean) ÷ standard deviation
Example
Sam takes an exam in maths and another in English.
His marks along with the mean marks for the year and the standard deviation are shown below
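A short sketch of the comparison in Python. The marks, means and standard deviations used here are made up for illustration and are not Sam's actual figures.

```python
def standardised_score(score, mean, standard_deviation):
    """Return how many standard deviations a score is above or below the mean."""
    return (score - mean) / standard_deviation


# Hypothetical figures (made up for this sketch).
maths = standardised_score(score=72, mean=60, standard_deviation=8)
english = standardised_score(score=65, mean=55, standard_deviation=5)

print(maths, english)  # 1.5 2.0
```

With these made-up figures the English mark is further above its mean (2 standard deviations) than the maths mark (1.5 standard deviations), so the English result is the better one relative to the year group, even though the raw maths mark is higher.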
Normal Distribution
Standard deviation is used to describe the normal distribution.
The normal distribution appears when large amounts of data are collected such as heights of people.
When put into a histogram the data will form a Bell shape as below.
Scatter Diagrams
The equation of a line of best fit has the form y = ax + b, where a is the gradient of the line
and b is the intercept on the y-axis.
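One way to find a and b is to read two points off the line of best fit and work out the gradient and intercept from them. The sketch below assumes that approach; the two points are made up for illustration.

```python
def line_through(point_1, point_2):
    """Return (a, b) for the line y = ax + b through two points."""
    x1, y1 = point_1
    x2, y2 = point_2
    a = (y2 - y1) / (x2 - x1)  # gradient = change in y / change in x
    b = y1 - a * x1            # intercept, from rearranging y1 = a * x1 + b
    return a, b


# Hypothetical points read off a line of best fit drawn by eye.
a, b = line_through((2, 7), (6, 15))

print(a, b)  # 2.0 3.0, so the line is y = 2x + 3
```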
Causal Relationship
When a change in one variable causes a change in another variable there is said to be a causal relationship
between the two.
For example
The size of a car engine and the amount of petrol the car uses.
Sales of computers and sales of software
Not a causal relationship -> Sales of chocolates and sales of clothes.
Spearman's Rank
Spearman's rank correlation coefficient is a numerical measure of the correlation between two sets of data.
–1 is a perfect negative correlation
+1 is a perfect positive correlation
0 means no correlation
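The notes do not give the formula, but the coefficient is usually calculated as 1 − 6Σd² ÷ (n(n² − 1)), where d is the difference between each pair of ranks and n is the number of pairs. A minimal sketch under that assumption, with made-up ranks and no tied ranks:

```python
def spearman_rank(ranks_x, ranks_y):
    """Spearman's rank correlation coefficient for two lists of ranks (no ties)."""
    n = len(ranks_x)
    # d is the difference between the two ranks given to the same item.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to the same five entries.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

print(round(spearman_rank(judge_a, judge_b), 2))  # 0.8
```

Identical rankings give +1 and completely reversed rankings give –1.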
Geometric mean
To work out the geometric mean of n numbers, multiply the numbers together and then take the nth root of the
product
Geometric mean of 3, 7, 4, 8
Geometric mean = ⁴√(3 × 7 × 4 × 8) = ⁴√672 = 5.09 (to 2 decimal places)
In percentage change problems the geometric mean of the multipliers tells us the average percentage change over a
period of time.
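A short sketch of both uses in Python. The first part repeats the example with 3, 7, 4 and 8. The second part uses made-up percentage increases of 10% and 20%, written as the multipliers 1.10 and 1.20.

```python
import math


def geometric_mean(values):
    """Multiply the values together, then take the nth root of the product."""
    product = math.prod(values)
    return product ** (1 / len(values))


gm = geometric_mean([3, 7, 4, 8])
print(round(gm, 2))  # 5.09

# Hypothetical percentage changes: +10% then +20%, as multipliers.
average_multiplier = geometric_mean([1.10, 1.20])
average_percentage_change = (average_multiplier - 1) * 100
print(round(average_percentage_change, 2))  # 14.89
```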
Index numbers
An index number shows the rate of change in quantity, value or price of an item over a period of time.
Index number = (quantity ÷ quantity in base year) × 100
Example
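A minimal sketch of the formula in Python. The prices are made up for illustration and are not the example from these notes.

```python
def index_number(quantity, base_quantity):
    """Index number = quantity / quantity in base year x 100."""
    return quantity / base_quantity * 100


# Hypothetical prices: 50p in the base year and 60p in the current year.
index = index_number(quantity=60, base_quantity=50)

print(round(index, 1))  # 120.0
```

An index number of 120 means the price is 20% higher than in the base year. The base year itself always has an index number of 100.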
Weighted Means
In a GCSE course 40% of the mark is for paper 1, 40% is for paper 2, 10% is for coursework task 1 and 10% is
for coursework task 2.
If a student scores the following marks, we can work out the weighted mean.
Paper 1 62%
Paper 2 38%
Coursework 1 58%
Coursework 2 29%
Weighted mean = (40 × Paper 1 + 40 × Paper 2 + 10 × Coursework 1 + 10 × Coursework 2) ÷ (40 + 40 + 10 + 10)
= (40 × 62 + 40 × 38 + 10 × 58 + 10 × 29) ÷ 100
= 4870 ÷ 100 = 48.7%
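The same calculation can be checked with a short Python sketch, using the weights and marks given above. The dictionary names are made up for this sketch.

```python
# Weights and marks from the example above.
weights = {"Paper 1": 40, "Paper 2": 40, "Coursework 1": 10, "Coursework 2": 10}
marks = {"Paper 1": 62, "Paper 2": 38, "Coursework 1": 58, "Coursework 2": 29}

# Multiply each mark by its weight and add the results together.
weighted_total = sum(weights[name] * marks[name] for name in weights)

# Divide by the total of the weights.
weighted_mean = weighted_total / sum(weights.values())

print(weighted_total, weighted_mean)  # 4870 48.7
```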
Quality assurance
Quality assurance is used in commercial production. For example, a packet of crisps should have a weight of 50 g.
Samples of packets are taken at regular intervals and the mean weight is calculated. Upper and lower warning and
action limits are set. If the sample mean is outside the warning limits, another sample should be taken
immediately. If the sample mean is outside the action limits, production should be stopped and the
machines reset.
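The decision rule can be sketched in Python. The warning and action limits and the sample weights below are made up for illustration; real limits would be set for the particular production line.

```python
def check_sample_mean(sample_mean, warning_limits, action_limits):
    """Decide what to do with a sample mean, given (lower, upper) limits."""
    lower_warning, upper_warning = warning_limits
    lower_action, upper_action = action_limits
    if sample_mean < lower_action or sample_mean > upper_action:
        return "stop production and reset machines"
    if sample_mean < lower_warning or sample_mean > upper_warning:
        return "take another sample immediately"
    return "continue production"


# Hypothetical limits (in grams) around the target weight of 50 g.
WARNING_LIMITS = (49.0, 51.0)
ACTION_LIMITS = (48.5, 51.5)

# Hypothetical sample of four packet weights.
sample = [50.2, 49.8, 50.5, 49.9]
sample_mean = sum(sample) / len(sample)

print(check_sample_mean(sample_mean, WARNING_LIMITS, ACTION_LIMITS))
```

The action limits are checked first because they lie outside the warning limits, so a mean beyond an action limit is also beyond a warning limit.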
Outliers – Any value more than 1.5 × IQR above the UQ or more than 1.5 × IQR below the LQ is considered to be an outlier.
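A minimal sketch of the outlier rule in Python. The data set is made up for illustration, and its quartiles are taken as given (LQ = 20, UQ = 30) rather than calculated in the code.

```python
def outlier_limits(lower_quartile, upper_quartile):
    """Return the (lower, upper) limits beyond which a value is an outlier."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr


# Hypothetical data with LQ = 20 (3rd value) and UQ = 30 (9th value).
data = [2, 15, 20, 22, 24, 25, 26, 28, 30, 33, 50]
low, high = outlier_limits(lower_quartile=20, upper_quartile=30)

outliers = [x for x in data if x < low or x > high]

print(low, high)  # 5.0 45.0
print(outliers)   # [2, 50]
```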
Cumulative frequency
The frequencies of a distribution are added up group by group to give a running total.
For example
Mark    Frequency    Cumulative frequency
0–1     4            4
1–2     5            4 + 5 = 9
2–3     2            4 + 5 + 2 = 11
3–4     6            4 + 5 + 2 + 6 = 17
4–5     2            4 + 5 + 2 + 6 + 2 = 19
5–6     3            4 + 5 + 2 + 6 + 2 + 3 = 22
6–7     1            4 + 5 + 2 + 6 + 2 + 3 + 1 = 23
The values of the cumulative frequency are then plotted at the top value of each group and joined either
by straight lines or by a curve.
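The running totals and the points to plot can be produced with a short Python sketch, using the frequencies from the table above. The variable names are made up for this sketch.

```python
from itertools import accumulate

# Top value of each group and the frequencies, from the table above.
upper_values = [1, 2, 3, 4, 5, 6, 7]
frequencies = [4, 5, 2, 6, 2, 3, 1]

# accumulate() gives the running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Each point is (top value of the group, cumulative frequency).
points = list(zip(upper_values, cumulative))

print(cumulative)  # [4, 9, 11, 17, 19, 22, 23]
print(points)
```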
Histograms – The area of each bar represents the frequency, and the height of each bar is the frequency density
(frequency ÷ class width).
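A minimal sketch of the bar heights for a histogram with unequal class widths. It assumes frequency density = frequency ÷ class width, and the grouped data is made up for illustration.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
groups = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

# Height of each bar = frequency / class width.
densities = [frequency / (upper - lower) for lower, upper, frequency in groups]

for (lower, upper, frequency), density in zip(groups, densities):
    print(f"{lower}-{upper}: frequency {frequency}, frequency density {density}")
```

The 20-40 group has the largest frequency, but the 10-20 group has the tallest bar because its frequency is packed into a narrower class.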
Population pyramids
These allow you to compare percentages of populations by age and gender.
Probability
Odds
The ratio failures : successes is the odds against an event happening
The ratio successes : failures is the odds on an event happening
If the odds are 7:2 against, what is the probability of success?
Answer: There are 7 chances of failure to every 2 chances of success, so in (7 + 2) = 9 attempts there will be 2 successes.
The probability of a success is 2/9.
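The conversion from odds against to a probability can be sketched in Python. The function name is made up for this sketch, and `Fraction` is used so the answer stays as an exact fraction.

```python
from fractions import Fraction


def probability_from_odds_against(failures, successes):
    """Odds against of failures : successes -> probability of success."""
    return Fraction(successes, failures + successes)


p_success = probability_from_odds_against(failures=7, successes=2)

print(p_success)      # 2/9
print(1 - p_success)  # 7/9, the probability of failure
```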
Mutually Exclusive events – Events that cannot happen at the same time
Independent events – The probability of one event is not affected by whether or not the other event happens.
Exhaustive events – A set of events is exhaustive if the set contains all possible outcomes.
Rules of probability
P(a or b) = P(a) + P(b) when a and b are mutually exclusive
P(a and b) = P(a) × P(b) when a and b are independent
Tree diagrams
When completing a tree diagram, remember that the probabilities on each set of branches from the same point must add up to 1.
As you travel along the branches to find possible outcomes, you multiply the probabilities.
If there is more than one path that gives the outcome you want, add the probabilities of those paths together.
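The three rules can be sketched in Python for a two-stage tree diagram. The situation is made up for illustration: a bag holds 3 red and 2 blue counters, and a counter is drawn twice with replacement.

```python
from fractions import Fraction
from itertools import product

# Hypothetical branches: the probabilities on each set of branches add up to 1.
first_draw = {"red": Fraction(3, 5), "blue": Fraction(2, 5)}
second_draw = {"red": Fraction(3, 5), "blue": Fraction(2, 5)}

# Multiply along the branches to get the probability of each path.
paths = {
    (first, second): first_draw[first] * second_draw[second]
    for first, second in product(first_draw, second_draw)
}

# Two paths give one counter of each colour, so add their probabilities.
one_of_each = paths[("red", "blue")] + paths[("blue", "red")]

print(one_of_each)          # 12/25
print(sum(paths.values()))  # 1, because the four paths cover every outcome
```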
A discrete uniform distribution has n distinct outcomes. Each outcome is equally likely, with probability
equal to 1/n.
For example, a fair six-sided dice is rolled. The possible outcomes can be written as a probability distribution:
x:      1     2     3     4     5     6
p(x):   1/6   1/6   1/6   1/6   1/6   1/6
Binomial distribution
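If each trial has two possible outcomes, a success with probability p and a failure with probability q (so p + q = 1),
and n independent trials are carried out, then the probabilities are found by expanding (p + q)ⁿ.
For example:
p (success) = 0.2
q (failure) = 0.8
5 trials are carried out.
The probability distribution is (p + q)⁵ = p⁵ + 5p⁴q + 10p³q² + 10p²q³ + 5pq⁴ + q⁵
The term p⁵ is the probability of 5 successes, 5p⁴q is the probability of 4 successes, and so on down to q⁵, the
probability of no successes.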
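Each term of the expansion can be evaluated with a short Python sketch, using p = 0.2, q = 0.8 and 5 trials from the example. `math.comb(n, k)` gives the coefficients 1, 5, 10, 10, 5, 1; the variable names are made up for this sketch.

```python
from math import comb

p = 0.2  # probability of success
q = 0.8  # probability of failure
n = 5    # number of trials

# The term for k successes is (coefficient) * p^k * q^(n - k).
distribution = {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)}

for successes, probability in distribution.items():
    print(successes, round(probability, 5))

# The probabilities of all the possible numbers of successes add up to 1.
print(round(sum(distribution.values()), 5))
```

With these values the most likely result is exactly 1 success, with probability 5pq⁴ = 0.4096.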