
Karina Gahir

S2 EDEXCEL REVISION NOTES


Binomial Distribution
Binomial probability distribution is defined as:
o $P(X = r) = {}^{n}C_{r} \times p^{r} \times (1-p)^{n-r}$
o Distribution is written as: X~B(n,p)
Conditions include:
o Fixed number of trials
o All trials are independent of one another
o Probability of success remains constant
o Each trial must have only two possible outcomes (success or failure)
E(X) = np
Var(X) = npq [where q = 1 - p]
SD(X) = $\sqrt{\mathrm{Var}(X)} = \sqrt{npq}$
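As a quick numerical check of the formulas above, here is a minimal Python sketch. The function name `binom_pmf` and the parameters n = 12, p = 0.3 are illustrative choices, not from the notes. It builds the whole distribution of X~B(n,p) from the formula and confirms that the probabilities sum to 1, the mean is np and the variance is npq:

```python
from math import comb

def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), using nCr * p^r * (1-p)^(n-r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 12, 0.3          # hypothetical example parameters
q = 1 - p
probs = [binom_pmf(r, n, p) for r in range(n + 1)]

total = sum(probs)                                         # should be 1
mean = sum(r * pr for r, pr in enumerate(probs))           # should be np
var = sum(r**2 * pr for r, pr in enumerate(probs)) - mean**2  # should be npq

print(round(total, 6), round(mean, 6), round(var, 6))  # 1.0 3.6 2.52
```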

To calculate the probabilities:
o $P(X \le x)$ = read off the tables
o $P(X \ge x) = 1 - P(X \le x-1)$
o $P(X < x) = P(X \le x-1)$
o $P(X > x) = 1 - P(X \le x)$
o $P(x \le X \le y) = P(X \le y) - P(X \le x-1)$
o $P(x < X < y) = P(X \le y-1) - P(X \le x)$
o $P(x \le X < y) = P(X \le y-1) - P(X \le x-1)$
o $P(x < X \le y) = P(X \le y) - P(X \le x)$
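Every rule above reduces to differences of the cumulative probability $P(X \le x)$, which is what the tables give. This minimal Python sketch uses a helper `cdf` (a hypothetical stand-in for the tables) and hypothetical values n = 20, p = 0.35, x = 4, y = 9. It checks three of the rules against adding the individual probabilities directly:

```python
from math import comb

def pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def cdf(x, n, p):
    """P(X <= x), the value the cumulative tables give (0 when x < 0)."""
    return sum(pmf(r, n, p) for r in range(x + 1))

n, p = 20, 0.35   # hypothetical parameters
x, y = 4, 9

ge = 1 - cdf(x - 1, n, p)                 # P(X >= x)
incl = cdf(y, n, p) - cdf(x - 1, n, p)    # P(x <= X <= y)
excl = cdf(y - 1, n, p) - cdf(x, n, p)    # P(x < X < y)

# The same probabilities found by adding individual terms directly
direct_ge = sum(pmf(r, n, p) for r in range(x, n + 1))
direct_incl = sum(pmf(r, n, p) for r in range(x, y + 1))
direct_excl = sum(pmf(r, n, p) for r in range(x + 1, y))

print(abs(ge - direct_ge) < 1e-12,
      abs(incl - direct_incl) < 1e-12,
      abs(excl - direct_excl) < 1e-12)  # True True True
```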
Note: If p > 0.5, the tables cannot be used directly, so convert X~B(n,p) to Y~B(n,1-p), where Y = n - X is the number of failures.
Example:
o If X~B(18,0.9), 0.9 > 0.5 so this is not on the tables
o We want to find $P(X > 8)$
o X changes to Y = 18 - X, so Y~B(18,0.1): the NEW value for p is 1.0 - 0.9 = 0.1, while n stays 18
o The NEW boundary is 18 - 8 = 10, and the inequality sign swaps round: $P(X > 8) = P(Y < 10) = P(Y \le 9)$
o We can now read this off the tables
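The swap in the example above can be checked numerically. In this minimal Python sketch, the helper `cdf` is a hypothetical stand-in for the tables. It computes $P(X > 8)$ directly for X~B(18,0.9) and compares it with $P(Y \le 9)$ for Y~B(18,0.1):

```python
from math import comb

def cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

# Direct: X ~ B(18, 0.9), P(X > 8) = 1 - P(X <= 8)
direct = 1 - cdf(8, 18, 0.9)
# Swapped: Y = 18 - X ~ B(18, 0.1), P(Y < 10) = P(Y <= 9)
swapped = cdf(9, 18, 0.1)

print(abs(direct - swapped) < 1e-12)  # True
```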

Poisson Distribution
Poisson probability distribution is defined as:
o $P(X = r) = \dfrac{e^{-\lambda}\lambda^{r}}{r!}$
o Distribution is written as: $X \sim \mathrm{Po}(\lambda)$


Conditions include:
o Events occur at random
o All events are independent of one another
o Average rate of occurrence remains constant
o Zero probability of simultaneous occurrences
E(X) = $\lambda$
Var(X) = $\lambda$
SD(X) = $\sqrt{\mathrm{Var}(X)} = \sqrt{\lambda}$
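A minimal Python sketch confirms that the mean and variance of a Poisson distribution are both $\lambda$. It builds the probabilities from the formula for a hypothetical $\lambda = 4.5$. The infinite sum is truncated at r = 60, which is an assumption that the tail beyond that point is negligible, as it is for this $\lambda$:

```python
from math import exp, factorial

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) = e^(-lam) * lam^r / r! for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

lam = 4.5  # hypothetical rate
# Truncate the infinite sum at r = 60; the remaining tail is negligible here.
probs = [poisson_pmf(r, lam) for r in range(61)]

mean = sum(r * pr for r, pr in enumerate(probs))
var = sum(r * r * pr for r, pr in enumerate(probs)) - mean**2
print(round(mean, 6), round(var, 6))  # 4.5 4.5
```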

To calculate the probabilities:
o $P(X \le x)$ = read off the tables
o $P(X \ge x) = 1 - P(X \le x-1)$
o $P(X < x) = P(X \le x-1)$
o $P(X > x) = 1 - P(X \le x)$
o $P(x \le X \le y) = P(X \le y) - P(X \le x-1)$
o $P(x < X < y) = P(X \le y-1) - P(X \le x)$
o $P(x \le X < y) = P(X \le y-1) - P(X \le x-1)$
o $P(x < X \le y) = P(X \le y) - P(X \le x)$
To approximate the binomial distribution by the Poisson distribution, the following conditions have to apply:
o If X~B(n,p) and
1) n is large [n > 50] and
2) p is small [p < 0.1], then X can be approximated by Po(np)
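To see how close the approximation is, this minimal Python sketch compares exact binomial probabilities with Poisson ones for a hypothetical X~B(100, 0.02). That example satisfies n > 50 and p < 0.1, so it is approximated by Po(2):

```python
from math import comb, exp, factorial

# Hypothetical example: X ~ B(100, 0.02), so n > 50 and p < 0.1.
n, p = 100, 0.02
lam = n * p  # Poisson parameter np = 2

diffs = []
for r in range(5):
    exact = comb(n, r) * p**r * (1 - p)**(n - r)   # binomial P(X = r)
    approx = exp(-lam) * lam**r / factorial(r)     # Poisson P(X = r)
    diffs.append(abs(exact - approx))
    print(r, round(exact, 4), round(approx, 4))

print("largest difference:", round(max(diffs), 4))
```

For these values the two columns agree to about two or three decimal places, which is why the approximation is acceptable when n is large and p is small.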


Continuous Random Variables


For a probability density function (p.d.f)
o $P(a < X < b) = \int_{a}^{b} f(x)\,dx$
o $P(X < k) = P(X \le k)$

For a cumulative distribution function (c.d.f)
o $F(x_0) = P(X \le x_0) = \int_{\text{lower limit}}^{x_0} f(x)\,dx$, using a p.d.f
o Finding probabilities: just substitute numbers into F(x), but check the limits first

Finding the mode: calculate $\frac{d}{dx}f(x)$, make $\frac{d}{dx}f(x) = 0$ and solve
In general
o To find the median, make F(x) = 0.5
o To find the lower quartile (Q1), make F(x) = 0.25
o To find the upper quartile (Q3), make F(x) = 0.75
o To find the interquartile range, calculate Q3 - Q1

$E(X) = \int_{\text{lower limit}}^{\text{upper limit}} x f(x)\,dx$
$\mathrm{Var}(X) = \int_{\text{lower limit}}^{\text{upper limit}} x^{2} f(x)\,dx - [E(X)]^{2}$
$\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$
o $E(aX+b) = aE(X) + b$
o $\mathrm{Var}(aX+b) = a^{2}\,\mathrm{Var}(X)$
o $P(X = x)$ is always equal to 0
o To convert a c.d.f to a p.d.f, differentiate the c.d.f

For a 3 part probability density function
o To find the c.d.f:
Stage 1) Integrate the first f(x) from its lower limit to x
2) Substitute the boundary value (the upper limit of the first part, which is the lower limit of the next part) into the F(x) just calculated
3) Add this answer to the next f(x) integrated from that boundary to x, then repeat for the third part
o To calculate E(X), find $\int x f(x)\,dx$ for each part and then add them
o To calculate Var(X), find $\int x^{2} f(x)\,dx$ for each part, add them, then subtract $[E(X)]^{2}$
o To calculate SD(X), take $\sqrt{\mathrm{Var}(X)}$
o To calculate the median, find which part of F(x) reaches 0.5 (check the value of F(x) at each boundary) and put that part = 0.5
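The single-part recipes above can be sketched in Python for a hypothetical p.d.f, $f(x) = 12x^2(1-x)$ for $0 \le x \le 1$. Its derivative and c.d.f $F(x) = 4x^3 - 3x^4$ are worked out by hand and typed in. Instead of integrating by hand, the sketch uses Simpson's rule for the integrals, and it solves $f'(x) = 0$ and $F(x) = 0.5$ by bisection. Both are numerical stand-ins for the algebra, not the exam method:

```python
# Hypothetical p.d.f: f(x) = 12x^2(1 - x) for 0 <= x <= 1, 0 otherwise.
def f(x):
    return 12 * x**2 * (1 - x)

def f_prime(x):
    # derivative of f, used to find the mode
    return 24 * x - 36 * x**2

def F(x):
    # c.d.f: integral of f from the lower limit 0 to x
    return 4 * x**3 - 3 * x**4

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) strips."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def bisect(g, lo, hi, tol=1e-12):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(lo) < 0) == (g(mid) < 0):
            lo = mid   # no sign change on [lo, mid], so the root is in [mid, hi]
        else:
            hi = mid
    return (lo + hi) / 2

total = simpson(f, 0, 1)                              # area under f, should be 1
mean = simpson(lambda x: x * f(x), 0, 1)              # E(X)
var = simpson(lambda x: x**2 * f(x), 0, 1) - mean**2  # E(X^2) - [E(X)]^2
mode = bisect(f_prime, 0.1, 1.0)                      # solve f'(x) = 0
median = bisect(lambda x: F(x) - 0.5, 0.0, 1.0)       # solve F(x) = 0.5

print(round(total, 6), round(mean, 6), round(var, 6), round(mode, 6), round(median, 4))
```

By hand the same p.d.f gives E(X) = 0.6, Var(X) = 0.04 and mode 2/3, which the numerical values should match.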

Sampling
Key Definitions
o A statistical model is a statistical process devised to describe or make predictions about
the expected behaviour of a real-world problem
o A population is a collection/ group/ set of individuals or items
o A sample is any subset of a population
o A sampling frame is a complete list or complete identification of the population (e.g. a
list, index register, database, map or file)
o A sampling unit is an individual member of the population
o A census is a survey of every member of a population
o A sample survey is a survey of a sample of a population
o A random sample is one in which every member of the population has an equal chance of
being selected
o Statistical inference is the set of methods by which one makes generalisations about a
population, based on information obtained from samples selected from that population

o A statistic is a random variable that depends only on the observed sample (e.g. the
sample mean)
o A sampling distribution is the probability density function of a statistic (e.g. the sampling
distribution of the mean is the probability distribution of all possible means of samples of
a fixed size)
Advantages and Disadvantages of a sample survey
Disadvantages of a census:
o May be too expensive
o May be time consuming
o May involve testing to destruction (e.g. to find out how long batteries last, you would have
to run every battery until it goes flat)
o May be impossible to carry out for every member of the population
Therefore a sample survey:
o Is less time consuming
o Is less expensive
o HOWEVER, it may not be representative of the whole population, and can be biased.
Common sources of bias involve:
Subjective choice by person taking the sample
Non-response
Sampling from an incomplete sampling frame
Sampling Distribution of a statistic
o Parameters are quantities that describe characteristics of a population (e.g. the mean,
variance or proportion that satisfies certain criteria)
o They can be estimated from sample data using quantities called statistics:
For example, if a random sample of size n, $X_1, \ldots, X_n$, is taken from a population
with an unknown mean, the sample mean $\bar{X}$ is a statistic given by:
$\bar{X} = \dfrac{\sum X_i}{n}$
and $\bar{X}$ can be used to estimate the population mean.
o The probability distribution of a statistic is called its sampling distribution; it gives
all the possible values that the statistic can take, together with their probabilities
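A small example makes the idea of a sampling distribution concrete. This Python sketch assumes a hypothetical population containing equal numbers of items labelled 1, 2 and 3, so each independent draw is 1, 2 or 3 with probability 1/3. It lists every possible sample of size 2 and collects the distribution of the sample mean $\bar{X}$ as exact fractions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical population: equal numbers of items labelled 1, 2 and 3,
# so each draw is 1, 2 or 3 with probability 1/3, independently.
values = [1, 2, 3]
n = 2  # sample size

# Sample mean of every possible ordered sample of size n
counts = Counter(Fraction(sum(s), n) for s in product(values, repeat=n))
total = len(values) ** n  # number of equally likely samples
sampling_dist = {m: Fraction(c, total) for m, c in sorted(counts.items())}

for m, prob in sampling_dist.items():
    print(m, prob)
# 1 1/9
# 3/2 2/9
# 2 1/3
# 5/2 2/9
# 3 1/9
```

The mean of this sampling distribution is 2, the same as the population mean.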

Continuous Distributions
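o For the uniform distribution X~U[a,b], the probability density function (p.d.f) is
$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$
o The graph is a rectangle of width b - a and height 1/(b - a), so the area under the graph is always equal to 1
o When finding probabilities, we use the fact that the total area is 1, so each probability is the area of a smaller rectangle: $P(c < X < d) = \dfrac{d-c}{b-a}$ for $a \le c < d \le b$
o The cumulative distribution function (c.d.f) is
$F(x) = \begin{cases} 0 & x < a \\ \dfrac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}$
o $E(X) = \dfrac{a+b}{2}$
o $\mathrm{Var}(X) = \dfrac{(b-a)^{2}}{12}$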
The proof for this is as follows:
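A sketch of the standard derivation, using $E(X^2)$ and the result $E(X) = \frac{a+b}{2}$:

```latex
\begin{align*}
E(X^2) &= \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3} \\
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4} \\
&= \frac{4a^2 + 4ab + 4b^2 - 3a^2 - 6ab - 3b^2}{12} = \frac{a^2 - 2ab + b^2}{12} = \frac{(b-a)^2}{12}
\end{align*}
```

The step from $b^3 - a^3$ to $a^2 + ab + b^2$ uses the factorisation $b^3 - a^3 = (b-a)(a^2 + ab + b^2)$.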

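Continuity Corrections
(X is the discrete variable and Y is the continuous normal variable used to approximate it)
o $P(X = b) = P(b - \tfrac{1}{2} < Y < b + \tfrac{1}{2})$
o $P(X \le b) = P(Y < b + \tfrac{1}{2})$
o $P(X < b) = P(Y < b - \tfrac{1}{2})$
o $P(X \ge b) = P(Y > b - \tfrac{1}{2})$
o $P(X > b) = P(Y > b + \tfrac{1}{2})$
o $P(a \le X \le b) = P(a - \tfrac{1}{2} < Y < b + \tfrac{1}{2})$
o $P(a < X < b) = P(a + \tfrac{1}{2} < Y < b - \tfrac{1}{2})$
o $P(a \le X < b) = P(a - \tfrac{1}{2} < Y < b - \tfrac{1}{2})$
o $P(a < X \le b) = P(a + \tfrac{1}{2} < Y < b + \tfrac{1}{2})$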
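To make the continuity correction rules above concrete, this small Python sketch maps each discrete statement about X to the interval used for Y. The function name `corrected_interval` and the string labels are hypothetical conventions for this illustration, and `None` stands for a side with no bound:

```python
def corrected_interval(kind, a=None, b=None):
    """Continuity-corrected (lower, upper) bounds for Y; None means unbounded."""
    half = 0.5
    rules = {
        "X = b":       lambda: (b - half, b + half),
        "X <= b":      lambda: (None, b + half),
        "X < b":       lambda: (None, b - half),
        "X >= b":      lambda: (b - half, None),
        "X > b":       lambda: (b + half, None),
        "a <= X <= b": lambda: (a - half, b + half),
        "a < X < b":   lambda: (a + half, b - half),
        "a <= X < b":  lambda: (a - half, b - half),
        "a < X <= b":  lambda: (a + half, b + half),
    }
    return rules[kind]()

print(corrected_interval("X <= b", b=22))         # (None, 22.5)
print(corrected_interval("a < X < b", a=3, b=8))  # (3.5, 7.5)
```

Storing the rules in a dictionary of small functions keeps each line of the list above as one entry, so the code can be checked against the notes line by line.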

Approximating the Binomial and Poisson distributions by the Normal distribution

Normal Approximation to the Binomial Distribution
o If X~B(n,p) and
1) n is large and
2) p is close to 0.5, then X can be approximated by N(np, npq)
Normal Approximation to the Poisson Distribution
o If $X \sim \mathrm{Po}(\lambda)$ and
1) $\lambda$ is large [$\lambda > 10$]
then X can be approximated by $N(\lambda, \lambda)$

- First we identify which distribution to use
- Then we approximate it by the normal
- Then we apply our continuity correction
- Then we find our z value using the following formula (from S1):
$z = \dfrac{x\text{ value} - \text{mean}}{\text{standard deviation}}$
- Then we use our normal distribution table to find the probability
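The five steps above can be followed in a minimal Python sketch for a hypothetical question: X~B(40, 0.5), find $P(X \le 22)$. Instead of reading $\Phi(z)$ from the table, the sketch computes it from `math.erf` using $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. It also computes the exact binomial answer for comparison:

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal c.d.f Phi(z), computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Step 1: X ~ B(40, 0.5) (hypothetical); we want P(X <= 22).
n, p = 40, 0.5
# Step 2: approximate by Y ~ N(np, npq).
mean, sd = n * p, sqrt(n * p * (1 - p))
# Step 3: continuity correction, P(X <= 22) becomes P(Y < 22.5).
x = 22.5
# Step 4: standardise.
z = (x - mean) / sd
# Step 5: find Phi(z) (computed here rather than read from a table).
approx = phi(z)

exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(23))
print(round(z, 2), round(approx, 4), round(exact, 4))
```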

Hypothesis Testing
Carrying out the tests

o When carrying out a hypothesis test, the stages involved are as follows:


Establish the null ($H_0$) and alternative ($H_1$) hypotheses
Decide on a significance level (usually given to you in the question)
Collect suitable data (conditions for this are random sampling and independent
items)
Conduct the test: binomial or Poisson?
Interpret your results by comparing the calculated probability with the significance level
If your calculated probability > significance level, then you do not reject $H_0$,
meaning there is no evidence to suggest that... (relate to the scenario in the
question)
If your calculated probability < significance level, then you reject $H_0$, meaning
there is evidence to suggest that... (relate to the scenario in the question)
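The stages above can be sketched in Python for a hypothetical one-tailed binomial test: $H_0: p = 0.3$, $H_1: p < 0.3$, 5% significance level, with 2 successes in 20 trials. The helper `binom_cdf` stands in for the cumulative tables. Because $H_1$ uses "<", the calculated probability is $P(X \le 2)$:

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

# Hypothetical example: H0: p = 0.3, H1: p < 0.3 (a decrease),
# 5% significance level, n = 20 trials, 2 successes observed.
n, p0, alpha, observed = 20, 0.3, 0.05, 2

# H1 uses "<", so the calculated probability is P(X <= observed).
prob = binom_cdf(observed, n, p0)
reject_h0 = prob < alpha
print(round(prob, 4), "reject H0" if reject_h0 else "do not reject H0")
# 0.0355 reject H0
```

Here 0.0355 < 0.05, so $H_0$ is rejected: there is evidence that the probability of success has decreased.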

Writing the null and alternative hypotheses


For Binomial distribution:
o $H_0: p = ??$
o $H_1: p < ??$ or $p > ??$
To decide whether to use < or >, look in the question for a decrease or an increase: if there was a decrease, then p < ??, and if there was an increase, then p > ??
When you conduct your test, the $\le$ or $\ge$ in the probability you calculate goes the
same way as the sign in your alternative hypothesis ($H_1$): for $H_1: p < ??$ find $P(X \le x)$, and for $H_1: p > ??$ find $P(X \ge x)$
For Poisson distribution:
o $H_0: \lambda = ??$
o $H_1: \lambda < ??$ or $\lambda > ??$
To decide whether to use < or >, look in the question for a decrease or an increase: if there was a decrease, then $\lambda < ??$, and if there was an increase, then $\lambda > ??$
When you conduct your test, the $\le$ or $\ge$ in the probability you calculate goes the
same way as the sign in your alternative hypothesis ($H_1$): for $H_1: \lambda < ??$ find $P(X \le x)$, and for $H_1: \lambda > ??$ find $P(X \ge x)$
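As a sketch of the direction rule for a Poisson test, the hypothetical helper `one_tailed_p` picks the tail from the sign in $H_1$. The example is also hypothetical: $H_0: \lambda = 7$, $H_1: \lambda > 7$, 5% level, 12 events observed. Because $H_1$ uses ">", the calculated probability is $P(X \ge 12) = 1 - P(X \le 11)$:

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(x + 1))

def one_tailed_p(observed, lam, direction):
    """Tail named by H1: '<' gives P(X <= x), '>' gives P(X >= x)."""
    if direction == "<":
        return poisson_cdf(observed, lam)
    return 1 - poisson_cdf(observed - 1, lam)

# Hypothetical: H0: lambda = 7, H1: lambda > 7, 5% level, 12 events observed.
prob = one_tailed_p(12, 7, ">")
print(round(prob, 4), "reject H0" if prob < 0.05 else "do not reject H0")
```

The probability is just above 0.05, so $H_0$ is not rejected: there is no evidence that the rate has increased.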
For two tailed tests:
o $H_0: p = ??$
o $H_1: p \ne ??$
When carrying out a two tailed test, the significance level is split equally between the two tails (e.g. at 5%
significance, each tail uses 2.5%). You then interpret your result against this halved
significance level.
To identify whether to use a two tailed test, see whether the question
talks about a change (in either direction).
(e.g. Some time after the end of the campaign, a survey is conducted to find out if it has
had any impact on the number of smokers, either positive or negative)

Note: you are looking at whether the numbers have increased or decreased, i.e. whether
they have changed or not. Therefore a two tailed test is required.
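A minimal Python sketch of a two tailed binomial test uses a hypothetical example: $H_0: p = 0.4$, $H_1: p \ne 0.4$, 5% level, with 13 successes in 20 trials. The tail on the side of the observed value is compared with half the significance level (2.5%). The helper `binom_cdf` again stands in for the tables:

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

# Hypothetical: H0: p = 0.4, H1: p != 0.4, 5% significance, n = 20, 13 successes.
n, p0, alpha, observed = 20, 0.4, 0.05, 13

# Test the tail on the side of the observed value, against alpha / 2.
if observed > n * p0:
    tail = 1 - binom_cdf(observed - 1, n, p0)   # P(X >= observed)
else:
    tail = binom_cdf(observed, n, p0)           # P(X <= observed)

reject_h0 = tail < alpha / 2
print(round(tail, 4), "reject H0" if reject_h0 else "do not reject H0")
```

Since 13 is above the expected value np = 8, the upper tail $P(X \ge 13)$ is used. It is about 0.021, below 0.025, so $H_0$ is rejected.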
