s2 Revision Notes

Karina Gahir
S2 EDEXCEL REVISION NOTES

Binomial Distribution
Binomial probability distribution is defined as:
o $P(X=r) = {}^{n}C_{r} \, p^{r} (1-p)^{n-r}$
o Distribution is written as: X~B(n,p)
Conditions include:
o Fixed number of trials
o All trials are independent of one another
o Probability of success remains constant
o Each trial must have the same two possible outcomes
E(X) = np
Var(X) = npq [where q = 1 - p]
SD(X) = $\sqrt{\mathrm{Var}(X)} = \sqrt{npq}$
To calculate the probabilities:

o $P(X \le x)$ = read off the tables
o $P(X \ge x) = 1 - P(X \le x-1)$
o $P(X < x) = P(X \le x-1)$
o $P(X > x) = 1 - P(X \le x)$
o $P(x \le X \le y) = P(X \le y) - P(X \le x-1)$
o $P(x < X < y) = P(X \le y-1) - P(X \le x)$
o $P(x \le X < y) = P(X \le y-1) - P(X \le x-1)$
o $P(x < X \le y) = P(X \le y) - P(X \le x)$
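The table rules above can be checked numerically. The following is a minimal sketch, assuming illustrative values n = 10, p = 0.3, x = 2 and y = 5 that are not from the notes; `binom_pmf` and `binom_cdf` are hypothetical helper names, with `binom_cdf` playing the role of the cumulative tables.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(x, n, p):
    """P(X <= x): the value the cumulative tables would give."""
    return sum(binom_pmf(r, n, p) for r in range(x + 1))

n, p = 10, 0.3   # illustrative values only
x, y = 2, 5

p_at_least_x = 1 - binom_cdf(x - 1, n, p)                    # P(X >= x)
p_less_than_x = binom_cdf(x - 1, n, p)                       # P(X < x)
p_more_than_x = 1 - binom_cdf(x, n, p)                       # P(X > x)
p_x_to_y = binom_cdf(y, n, p) - binom_cdf(x - 1, n, p)       # P(x <= X <= y)

mean = n * p                 # E(X) = np
variance = n * p * (1 - p)   # Var(X) = npq
```

Each rule only ever uses "less than or equal to" values, which is why a single cumulative table is enough.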
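Note: If p > 0.5, X~B(n,p) is not on the tables, so convert to Y~B(n,1-p), where Y = n - X counts the failures
Example:
If X~B(18,0.9), 0.9 is > 0.5 therefore not on the tables
We want to find P(X > 8)
o X changes to Y, where Y = 18 - X
o n stays the same, so n = 18
o NEW value for p is 1.0 - 0.9 = 0.1, so Y~B(18,0.1)
o NEW value in the inequality is 18 - 8 = 10
o And the sign swaps round, because more than 8 successes means fewer than 10 failures
This will become $P(Y < 10) = P(Y \le 9)$ AND Y~B(18,0.1)
We can now read this off the tables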
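As a quick check of this conversion, the sketch below compares the direct calculation with the converted one. It assumes Y counts failures, so Y = 18 - X and Y~B(18, 0.1); `binom_cdf` is a hypothetical helper standing in for the tables.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

n, p = 18, 0.9

# Direct calculation: P(X > 8) = 1 - P(X <= 8) with X ~ B(18, 0.9)
direct = 1 - binom_cdf(8, n, p)

# Converted calculation: P(Y < 10) = P(Y <= 9) with Y ~ B(18, 0.1)
via_failures = binom_cdf(9, n, 1 - p)

print(direct, via_failures)  # the two values should agree
```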
Poisson Distribution
Poisson probability distribution is defined as:
o $P(X=r) = \dfrac{e^{-\lambda} \lambda^{r}}{r!}$
o Distribution is written as: X~Po(λ)

Conditions include:
o Events occur at random
o All events are independent of one another
o Average rate of occurrence remains constant
o Zero probability of simultaneous occurrences
E(X) = λ
Var(X) = λ
SD(X) = $\sqrt{\mathrm{Var}(X)} = \sqrt{\lambda}$
To calculate the probabilities:

o $P(X \le x)$ = read off the tables
o $P(X \ge x) = 1 - P(X \le x-1)$
o $P(X < x) = P(X \le x-1)$
o $P(X > x) = 1 - P(X \le x)$
o $P(x \le X \le y) = P(X \le y) - P(X \le x-1)$
o $P(x < X < y) = P(X \le y-1) - P(X \le x)$
o $P(x \le X < y) = P(X \le y-1) - P(X \le x-1)$
o $P(x < X \le y) = P(X \le y) - P(X \le x)$
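The same rules apply to the Poisson tables. Below is a minimal sketch, assuming an illustrative rate of 3.5 that is not from the notes; `poisson_pmf` and `poisson_cdf` are hypothetical helper names.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

def poisson_cdf(x, lam):
    """P(X <= x): the value the cumulative tables would give."""
    return sum(poisson_pmf(r, lam) for r in range(x + 1))

lam = 3.5  # illustrative rate only

p_at_most_2 = poisson_cdf(2, lam)                        # P(X <= 2)
p_at_least_3 = 1 - poisson_cdf(2, lam)                   # P(X >= 3) = 1 - P(X <= 2)
p_2_to_5 = poisson_cdf(5, lam) - poisson_cdf(1, lam)     # P(2 <= X <= 5)
```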
To approximate the binomial by the Poisson, the following conditions have to apply:
o If X~B(n,p) and
1) n is large [n > 50] and
2) p is small [p < 0.1] then X is approximately Po(np)
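To see how close the approximation is, the sketch below compares an exact binomial probability with its Poisson approximation. The values n = 200 and p = 0.02 are assumptions chosen to satisfy both conditions; they are not from the notes.

```python
from math import comb, exp, factorial

n, p = 200, 0.02   # illustrative: n > 50 and p < 0.1
lam = n * p        # the approximating distribution is Po(np) = Po(4)

# Exact: P(X <= 3) with X ~ B(200, 0.02)
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(4))

# Approximation: P(X <= 3) with X ~ Po(4)
approx = sum(exp(-lam) * lam**r / factorial(r) for r in range(4))

print(round(exact, 4), round(approx, 4))
```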
Continuous Random Variables

For a probability density function (p.d.f)
o $P(a < X < b) = \int_{a}^{b} f(x)\,dx$
o $P(X < k) = P(X \le k)$
o P(X=x) is always equal to 0

For a cumulative distribution function (c.d.f)
o $F(x_0) = P(X \le x_0)$; using a p.d.f, $F(x_0) = \int_{\text{lower limit}}^{x_0} f(x)\,dx$
o Finding probabilities: just sub numbers into the F(x), but check the limits first
o To convert c.d.f to p.d.f, you have to differentiate the c.d.f
o To find the mode, calculate $\frac{dy}{dx}$ for the p.d.f y = f(x), make $\frac{dy}{dx} = 0$ and solve
o To find the median, make F(x) = 0.5
o To find the lower quartile (Q1), make F(x) = 0.25
o To find the upper quartile (Q3), make F(x) = 0.75
o To find the inter quartile range, calculate Q3 - Q1
In general
o $E(X) = \int_{\text{lower limit}}^{\text{upper limit}} x f(x)\,dx$
o $Var(X) = \int_{\text{lower limit}}^{\text{upper limit}} x^{2} f(x)\,dx - [E(X)]^{2}$
o E(aX+b) = aE(X) + b
o $Var(aX+b) = a^{2} Var(X)$
o SD(X) = $\sqrt{\mathrm{Var}(X)}$
For a 3 part probability density function, build the c.d.f in stages
o Stage 1) Integrate the first f(x) from its lower limit up to x
o 2) Put the lower limit of the next part (where the first part ends) into that F(x) just calculated
o 3) Add this answer to the next f(x) integrated from that limit up to x (then repeat for the third part)
o To calculate E(X), find $\int x f(x)\,dx$ for each part of f(x) and then add them
o To calculate Var(X), find $\int x^{2} f(x)\,dx$ for each part of f(x), add them, and then subtract $[E(X)]^{2}$
o To calculate SD(X), take the square root of Var(X)
o To calculate the median, use the part of F(x) whose values include 0.5 and put that = 0.5
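The integrals in this section can be checked numerically. The following is a rough sketch, assuming a made-up p.d.f. f(x) = 2x on [0, 1] that is not from the notes; `integrate` is a simple midpoint rule and `solve_F` is a bisection search, both hypothetical helpers rather than exam methods.

```python
def integrate(g, lower, upper, steps=10_000):
    """Midpoint-rule approximation to the integral of g from lower to upper."""
    width = (upper - lower) / steps
    return sum(g(lower + (i + 0.5) * width) for i in range(steps)) * width

def f(x):
    """Made-up p.d.f.: f(x) = 2x for 0 <= x <= 1, and 0 otherwise."""
    return 2 * x if 0 <= x <= 1 else 0.0

def F(x):
    """c.d.f.: integrate f from the lower limit 0 up to x."""
    return integrate(f, 0, x)

def solve_F(target, lo=0.0, hi=1.0):
    """Find x with F(x) = target by bisection (F is increasing on [lo, hi])."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if F(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

total_area = integrate(f, 0, 1)                                   # should be 1 for a valid p.d.f.
mean = integrate(lambda x: x * f(x), 0, 1)                        # E(X)
variance = integrate(lambda x: x**2 * f(x), 0, 1) - mean**2       # Var(X)
median = solve_F(0.5)                                             # F(x) = 0.5
lower_quartile, upper_quartile = solve_F(0.25), solve_F(0.75)
iqr = upper_quartile - lower_quartile
```

For this p.d.f. the exact answers are E(X) = 2/3, Var(X) = 1/18 and median = $1/\sqrt{2}$, so the numerical values can be compared against working done by hand.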
Sampling
Key Definitions
o A statistical model is a statistical process devised to describe or make predictions about
the expected behaviour of a real-world problem
o A population is a collection/ group/ set of individuals or items
o A sample is any subset of a population
o A sampling frame is a complete list or complete identification of the population (e.g. a
list, index register, database, map or file)
o A sampling unit is an individual member of the population
o A census is a survey of every member of a population
o A sample survey is a survey of a sample of a population
o A random sample is one in which every member of the population has an equal chance of
being selected
o Statistical inference consists of those methods by which one makes generalisations about a
population, based on information obtained from samples selected from that population
o A statistic is a random variable that depends only on the observed sample (e.g. the
sample mean)
o A sampling distribution is the probability distribution of a statistic (e.g. the sampling
distribution of the mean is the probability distribution of all possible means of samples of
a fixed size)
Advantages and Disadvantages of a sample survey
Disadvantages of a census (a survey of every member of the population):
o May be too expensive
o May be time consuming
o May involve testing till destruction (e.g. if you want to find out how long batteries last,
you have to test them until they run out)
o May be impossible to carry out for every member of the population
Therefore a sample survey:
o Is less time consuming
o Not as expensive
o HOWEVER it may not be representative of the whole population, and can be biased.
Common sources of bias involve:
Subjective choice by person taking the sample
Non-response
Sampling from an incomplete sampling frame
Sampling Distribution of a statistic
o Parameters are quantities that describe characteristics of a population (e.g. the mean,
variance or proportion that satisfies certain criteria)
o They can be estimated from sample data using quantities called statistics:
o For example, if a random sample of size n, $X_1, ..., X_n$, is taken from a population
with an unknown mean, the sample mean $\bar{X}$ can be used to estimate the
population mean. $\bar{X}$ is a statistic given by:
$\bar{X} = \dfrac{\sum X_i}{n}$
o The probability distribution of a statistic is called its sampling distribution; it gives
all the possible values that the statistic can take, together with their probabilities
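A small sampling distribution can be listed in full. The sketch below assumes a made-up population in which the values 1, 2 and 3 are equally likely and samples of size 2 are taken with replacement; none of these values come from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # made-up population, each value equally likely
n = 2                    # sample size

# Every equally likely ordered sample (X1, X2), sampling with replacement
samples = list(product(population, repeat=n))

# Count how often each value of the sample mean occurs
counts = Counter(Fraction(sum(sample), n) for sample in samples)

# Sampling distribution of the mean: each possible mean with its probability
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

expected_sample_mean = sum(m * pr for m, pr in sampling_distribution.items())

for mean, probability in sampling_distribution.items():
    print(mean, probability)
```

Here the expected value of the sample mean equals the population mean of 2, which is why the sample mean is a sensible estimator of it.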
Continuous Distributions
o For the uniform distribution X~U[a,b], the probability density function (p.d.f) is
$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$
o The area under the graph (a rectangle) is always equal to 1
o When finding probabilities, we use the concept that the area is always equal to 1
o The cumulative distribution function (c.d.f) is
$F(x) = \begin{cases} 0 & x < a \\ \dfrac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}$
o $E(X) = \dfrac{a+b}{2}$
o $Var(X) = \dfrac{(b-a)^{2}}{12}$
The proof for this is as follows:
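The proof itself is missing from these notes. One standard derivation, offered as a sketch and not necessarily the author's working, uses $Var(X) = E(X^2) - [E(X)]^2$ with the uniform p.d.f. $f(x) = \frac{1}{b-a}$ on [a, b]:

```latex
\begin{align*}
E(X^2) &= \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3} \\
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4} \\
&= \frac{4a^2 + 4ab + 4b^2 - 3a^2 - 6ab - 3b^2}{12} = \frac{a^2 - 2ab + b^2}{12} = \frac{(b-a)^2}{12}
\end{align*}
```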
Continuity Corrections
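Here X is the discrete variable (binomial or Poisson) and Y is the normal variable used to approximate it.
o $P(X = b) = P(b - \tfrac{1}{2} < Y < b + \tfrac{1}{2})$
o $P(X \le b) = P(Y < b + \tfrac{1}{2})$
o $P(X < b) = P(Y < b - \tfrac{1}{2})$
o $P(X \ge b) = P(Y > b - \tfrac{1}{2})$
o $P(X > b) = P(Y > b + \tfrac{1}{2})$
o $P(a \le X \le b) = P(a - \tfrac{1}{2} < Y < b + \tfrac{1}{2})$
o $P(a < X < b) = P(a + \tfrac{1}{2} < Y < b - \tfrac{1}{2})$
o $P(a \le X < b) = P(a - \tfrac{1}{2} < Y < b - \tfrac{1}{2})$
o $P(a < X \le b) = P(a + \tfrac{1}{2} < Y < b + \tfrac{1}{2})$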
Approximating the Binomial and Poisson by the Normal distribution

Normal Approximation to the Binomial Distribution
o If X~B(n,p) and
1) n is large and
2) p is close to 0.5 then X is approximately N(np, npq)
Normal Approximation to the Poisson Distribution
o If X~Po(λ) and
1) λ is large [λ > 10]
then X is approximately N(λ, λ)
The steps are:
o First we identify which distribution to use
o Then we approximate it to the normal
o Then we do our continuity correction
o Then we find our z value using the following formula (from S1):
$z = \dfrac{x\ \text{value} - \text{mean}}{\text{standard deviation}}$
o Then we use our normal distribution table to find the probability
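These steps can be sketched in code. The example assumes a made-up X~B(100, 0.5) and the probability P(X ≤ 55), neither of which comes from the notes; `NormalDist` from Python's standard `statistics` module stands in for the normal distribution table.

```python
from math import comb, sqrt
from statistics import NormalDist

# Step 1: identify the distribution (made-up example: X ~ B(100, 0.5))
n, p = 100, 0.5

# Step 2: approximate by the normal with mean np and variance npq
mean = n * p
sd = sqrt(n * p * (1 - p))

# Step 3: continuity correction, P(X <= 55) becomes P(Y < 55.5)
corrected_value = 55 + 0.5

# Step 4: z = (x value - mean) / standard deviation
z = (corrected_value - mean) / sd

# Step 5: look up the probability (NormalDist() is the standard normal)
approx = NormalDist().cdf(z)

# Exact binomial probability, for comparison only
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(56))

print(round(z, 2), round(approx, 4), round(exact, 4))
```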
Hypothesis Testing
Carrying out the tests
When carrying out a hypothesis test, the stages involved are as follows:
o Establish the null (H0) and alternative (H1) hypotheses
o Decide on a significance level (usually given to you in the question)
o Collect suitable data (the conditions for this are that the sampling is random and the
items are independent)
o Conduct the test (binomial or Poisson?)
o Interpret your results by comparing the calculated probability to the significance level
o If your calculated probability > significance level then you do not reject H0,
meaning there is not enough evidence to suggest that... (relate to the scenario in the
question)
o If your calculated probability < significance level then you reject H0, meaning
there is evidence to suggest that... (relate to the scenario in the question)
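The stages above can be sketched for a binomial test. The scenario is hypothetical and not from the notes: H0: p = 0.3, H1: p > 0.3, with 11 successes observed in 20 trials and a 5% significance level.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

# Hypothetical scenario: H0: p = 0.3, H1: p > 0.3
n, p0, observed, significance = 20, 0.3, 11, 0.05

# H1 uses ">", so calculate the upper tail P(X >= observed) assuming H0 is true
calculated_probability = 1 - binom_cdf(observed - 1, n, p0)

# Compare the calculated probability with the significance level
reject_h0 = calculated_probability < significance

print(round(calculated_probability, 4),
      "reject H0" if reject_h0 else "do not reject H0")
```

The conclusion would then be written in the context of the question, as in the last two stages above.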
Writing the null and alternative hypothesis

For Binomial distribution:
o H0 : p = ??
o H1 : p < ?? or p > ??
To decide on whether to use < or > you have to see in the question if there was a decrease
or increase: if there was a decrease, then p < ??, and if there was an increase, then p > ??
When you conduct your test and you decide which way round the ≤ or ≥ goes, it goes the
same way as the sign in your alternative hypothesis (H1) did, i.e. calculate P(X ≤ observed value) for < and P(X ≥ observed value) for >
For Poisson distribution:
o H0 : λ = ??
o H1 : λ < ?? or λ > ??
To decide on whether to use < or > you have to see in the question if there was a decrease
or increase: if there was a decrease, then λ < ??, and if there was an increase, then λ > ??
When you conduct your test and you decide which way round the ≤ or ≥ goes, it goes the
same way as the sign in your alternative hypothesis (H1) did
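The link between the sign in H1 and the tail that gets calculated can be sketched for a Poisson test. The rate 8 and the observed counts 3 and 14 are made-up values, and `one_tailed_probability` is a hypothetical helper, not a standard function.

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(x + 1))

def one_tailed_probability(observed, lam, alternative):
    """Tail probability under H0; `alternative` is the sign in H1, '<' or '>'."""
    if alternative == "<":
        return poisson_cdf(observed, lam)            # P(X <= observed)
    if alternative == ">":
        return 1 - poisson_cdf(observed - 1, lam)    # P(X >= observed)
    raise ValueError("alternative must be '<' or '>'")

# Made-up example with H0: rate = 8
lower_tail = one_tailed_probability(3, 8, "<")    # H1: rate < 8, observed 3
upper_tail = one_tailed_probability(14, 8, ">")   # H1: rate > 8, observed 14

print(round(lower_tail, 4), round(upper_tail, 4))
```

Both values would then be compared with the significance level in the usual way.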
For two tailed tests:
o Ho : p = ??
o H1 : p ≠ ??
When carrying out a two tailed test, the significance level is halved for each tail (e.g. 5%
significance changes to 2.5% in each tail). You then interpret your results based on this new
significance level.
To identify whether you use a two tailed test or not, you have to see whether in the question
they are talking about a change.
(e.g. Some time after the end of the campaign, a survey is conducted to find out if it has
had any impact on the number of smokers, either positive or negative)
Note: you are looking at whether the numbers have increased or decreased, therefore looking at
whether they have changed or not. Therefore a two tailed test is required.
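A two tailed test can be sketched in the same way. The scenario is hypothetical and not from the notes: H0: p = 0.5, H1: p ≠ 0.5, with 15 successes observed in 20 trials at the 5% level, so each tail gets 2.5%.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

# Hypothetical scenario: H0: p = 0.5, H1: p != 0.5
n, p0, observed, significance = 20, 0.5, 15, 0.05

# Use the tail on the same side as the observed value relative to the mean np
if observed > n * p0:
    tail_probability = 1 - binom_cdf(observed - 1, n, p0)   # P(X >= observed)
else:
    tail_probability = binom_cdf(observed, n, p0)           # P(X <= observed)

# Two tailed test: compare with half the significance level
reject_h0 = tail_probability < significance / 2

print(round(tail_probability, 4),
      "reject H0" if reject_h0 else "do not reject H0")
```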
