Probability Properties
Ranges from no chance to 100% chance: 0 ≤ probability ≤ 1.
A probability cannot be negative.
A probability cannot be greater than 1.
Therefore the total probability of all the possibilities in a situation must be 1.
Possibilities are increased by adding alternatives.
Possibilities are decreased by imposing restrictions.
Probabilities of non-overlapping (disjoint) alternatives are added to give probability of a larger
set of possibilities. E.g. Pr(Getting A or B) = Pr(A) + Pr(B)
If there is an overlap, subtract the probability of the overlap because it is counted in both alternatives.
E.g. Pr(Getting A or B) = Pr(A) + Pr(B) - Pr(A and B)
Probability of something not happening is 1 minus the probability it happens.
E.g. Pr(A') = 1 - Pr(A) and Pr(A'|B) = 1 - Pr(A|B)
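These rules can be checked by enumerating a small sample space. A sketch in Python; the events A and B (a total of 7, and a first die of 6, when rolling two fair dice) are illustrative choices, not from the notes:

```python
from fractions import Fraction

# Sample space: all ordered outcomes of rolling two fair dice.
space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def pr(event):
    """Probability of an event = favourable outcomes / total outcomes."""
    return Fraction(sum(1 for s in space if event(s)), len(space))

A = lambda s: s[0] + s[1] == 7        # total is 7
B = lambda s: s[0] == 6               # first die shows 6

# Addition rule with overlap: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
lhs = pr(lambda s: A(s) or B(s))
rhs = pr(A) + pr(B) - pr(lambda s: A(s) and B(s))
print(lhs, rhs)                       # both 11/36

# Complement rule: Pr(not A) = 1 - Pr(A)
print(pr(lambda s: not A(s)) == 1 - pr(A))  # True
```

Using exact fractions avoids floating-point noise when checking equalities like these.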
Estimating Probabilities
Choosing a number at random from A to B: assume the probability of each of the B - A + 1 equally likely numbers is 1/(B - A + 1).
Choosing a point at random along a line of length A cm: assume the probability that the point lies in any segment is proportional to the length of the segment, so the chance of choosing a point in a segment of length l cm is estimated as l/A.
Choosing a point at random in an area of A cm²: the chance of the point lying within a section of area B cm² is B/A.
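The area rule can be illustrated by simulation. A sketch assuming (illustratively) a 10 cm × 10 cm square (A = 100 cm²) containing a 5 cm × 4 cm rectangular section (B = 20 cm²):

```python
import random

random.seed(1)

A = 10 * 10   # total area: 10 cm x 10 cm square
B = 5 * 4     # section: 5 cm x 4 cm rectangle in one corner

trials = 100_000
hits = 0
for _ in range(trials):
    # Choose a point uniformly at random in the square.
    x, y = random.uniform(0, 10), random.uniform(0, 10)
    if x < 5 and y < 4:               # point lies inside the section
        hits += 1

print(hits / trials)  # close to B/A = 20/100 = 0.2
```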
Independence of Events
Two events A and B are independent if and only if Pr(AB) = Pr(A)Pr(B)
Note that independence is saying that knowing B occurs (or has occurred) does not in any way affect
the probability of A occurring.
Conditional Probability
For events A and B, the conditional probability of A given B is denoted by Pr(A|B) and is
defined by: Pr(A and B) = Pr(A|B)Pr(B) = Pr(B|A)Pr(A)
Equivalently:
If Pr(B) ≠ 0, then Pr(A|B) = Pr(A and B)/Pr(B)
If Pr(A) ≠ 0, then Pr(B|A) = Pr(A and B)/Pr(A)
The conditional operator "|" is read as: given, if, when.
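Both the product rule Pr(A and B) = Pr(A|B)Pr(B) and the independence condition can be checked on a small enumerated sample space. The two-dice events below are illustrative:

```python
from fractions import Fraction

# Sample space: all ordered outcomes of rolling two fair dice.
space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def pr(event):
    return Fraction(sum(1 for s in space if event(s)), len(space))

A = lambda s: s[0] % 2 == 0           # first die is even
B = lambda s: s[1] == 3               # second die shows 3
AandB = lambda s: A(s) and B(s)

# Conditional probability: Pr(A|B) = Pr(A and B) / Pr(B), valid since Pr(B) != 0
pr_A_given_B = pr(AandB) / pr(B)
print(pr_A_given_B)                   # 1/2

# Product rule: Pr(A and B) = Pr(A|B) Pr(B)
print(pr(AandB) == pr_A_given_B * pr(B))   # True

# Independence: here Pr(A and B) = Pr(A) Pr(B), so A and B are independent
print(pr(AandB) == pr(A) * pr(B))          # True
```

Note that Pr(A|B) = Pr(A) here, which matches the remark above: knowing B occurred does not change the probability of A.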
Probability Distributions
A set of distinct events that cover all possibilities in a situation. We can put together a list or
description of the corresponding probabilities. This gives us a probability distribution for the variable
that describes how the total probability of 1 is distributed over the possible values of the random
variable. (A random variable is any variable which has a probability distribution associated with it.)
I.e., the probability that X lies in an interval is given by the integral of f(x) over the interval:
Pr(a ≤ X ≤ b) = integral from a to b of f(x) dx.
Therefore the probability density function (pdf) f(x) must satisfy:
f(x) ≥ 0 for all x, and the integral of f(x) over all possible values of x is 1.
If X is a continuous random variable, note that Pr(X = x) = Pr(x ≤ X ≤ x) = 0 for any x, and therefore Pr(a ≤ X ≤ b) = Pr(a < X < b).
I.e., the probability associated with an exact (individual) value of a continuous random variable is 0.
Where: p(x) = Pr(X=x) is the probability function for a discrete random variable and f(x) is the
probability density function for a continuous random variable.
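These pdf conditions can be checked numerically for a particular density. A sketch using the illustrative pdf f(x) = 2x on [0, 1] (not a distribution from the notes):

```python
def f(x):
    # Illustrative pdf: f(x) = 2x on [0, 1], 0 elsewhere.
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# Total probability is 1:
print(round(integrate(f, 0, 1), 6))          # 1.0
# Pr(0.2 <= X <= 0.5) = integral of f over [0.2, 0.5] = 0.5^2 - 0.2^2 = 0.21
print(round(integrate(f, 0.2, 0.5), 6))      # 0.21
# Pr(X = x) corresponds to an interval of zero width, so it is 0:
print(integrate(f, 0.3, 0.3))                # 0.0
```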
Distributions: Binomial, Geometric, Negative Binomial, Poisson, Exponential, Gamma, Normal. For each distribution we record its definition, notes, and probability function.
Binomial distribution:
The same conditions are repeated n times with each repeat independent of all others.
At each repetition (or trial), there is a constant probability (p) that a particular event (E) occurs.
Let X be the number of times out of n that E occurs. The random variable X is discrete.
Possible values for x are: 0, 1, ..., n.
The distribution has two parameters, n and p, and can be written: X ~ b(n, p).
Probability function: Pr(X = x) = (n choose x) p^x (1 - p)^(n-x), where (n choose x) = n!/(x!(n - x)!).
Mean: np. Variance: np(1 - p).
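The binomial probability function can be computed directly, and its mean and variance checked against the np and np(1 - p) formulas. A sketch with illustrative parameter values n = 10, p = 0.3:

```python
from math import comb

def binomial_pmf(x, n, p):
    # Pr(X = x) = (n choose x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(round(sum(pmf), 6))                      # total probability: 1.0
mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean) ** 2 * pmf[x] for x in range(n + 1))
print(round(mean, 6), round(var, 6))           # n*p = 3.0 and n*p*(1-p) = 2.1
```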
Linear Combinations
Means: In general, E(aX + bY) = aE(X) + bE(Y), and similarly E(aX - bY) = aE(X) - bE(Y).
Variances: Consider Var(cX + Y) where c is a constant:
Var(cX + Y) = c^2 Var(X) + 2c Cov(X, Y) + Var(Y).
If X and Y are independent, Cov(X, Y) = 0 and so Var(cX + Y) = c^2 Var(X) + Var(Y).
Covariance: Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y).
With this definition we have Cov(X, X) = Var(X) and Cov(X, Y) = Cov(Y, X).
Correlation: rho = Cov(X, Y) / (sd(X) sd(Y)). As |rho| -> 1, X and Y approach a linear relationship; rho measures the strength of the linear relationship between X and Y. With this definition we have -1 ≤ rho ≤ 1.
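Treating a small paired dataset as a complete population, the identity Var(cX + Y) = c^2 Var(X) + 2c Cov(X, Y) + Var(Y) holds exactly. A sketch with illustrative data:

```python
# Illustrative paired data, treated as a complete population.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 5.0]
c = 3.0

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)       # population variance

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

lhs = var([c * x + y for x, y in zip(xs, ys)])
rhs = c**2 * var(xs) + 2 * c * cov(xs, ys) + var(ys)
print(round(lhs, 6) == round(rhs, 6))                  # True

# Correlation: rho = Cov(X, Y) / (sd(X) sd(Y)), always between -1 and 1.
rho = cov(xs, ys) / (var(xs) ** 0.5 * var(ys) ** 0.5)
print(-1 <= rho <= 1)                                  # True
```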
Variables
Any characteristic we observe can be considered as a variable, and its values may be either
numerical or descriptive.
Variable Types
Qualitative (Descriptive):
Nominal: straight descriptions
Ordinal: descriptions with natural ordering
Quantitative (Numerical):
Count (discrete): whole number answers
Continuous: unrestricted numerical values
Hypothesis Testing
All statistical tests have this basic structure:
P-value = the probability of getting our data (or more extreme) if the null hypothesis is true. If the p-value is small, it gives evidence to reject H0, but if the p-value is not small, we cannot reject H0. I.e.:
< 10% = slight evidence
< 5% = some evidence
< 1-2% = strong evidence
If the p-value is not sufficiently small, our conclusion is that we haven't got enough evidence in the data to reject our original assumption.
For each category, compute (observed frequency - expected frequency)^2 / expected frequency, then add the results of all the categories together. This test statistic measures how far the observed frequencies are from the expected frequencies. It has a chi-square distribution, modelled with varying degrees of freedom, which gives the p-value (probability) of obtaining our test statistic, or more extreme, if the expected frequencies actually apply. Degrees of freedom = number of categories - 1 (Note: categories are counted after any combining).
For a two-way (contingency) table, compute (observed - expected)^2 / expected for each cell, then add the results over all the cells. Again compare the test statistic with the chi-square distribution table. But our rule for finding degrees of freedom must now relate to both the variables involved. Hence, if the row variable has r categories and the column variable has c categories, then the degrees of freedom = (r - 1)(c - 1).
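The test statistic and degrees of freedom for a two-way table can be computed directly; the expected frequencies under independence come from the row and column totals. A sketch with an illustrative 2 × 3 table of counts:

```python
# Illustrative 2 x 3 contingency table of observed frequencies.
observed = [[20, 30, 25],
            [30, 20, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

r, c = len(observed), len(observed[0])
df = (r - 1) * (c - 1)
print(round(chi2, 3), df)   # compare with a chi-square table on df degrees of freedom
```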
95% confidence interval for the difference between two population proportions:
(p1 - p2) ± 1.96 √[ p1(1 - p1)/n1 + p2(1 - p2)/n2 ]
where p1 and p2 are the two sample proportions and n1 and n2 are the two sample sizes.
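A sketch of this interval with illustrative counts (45 successes out of 100 versus 30 out of 90; the numbers are not from the notes):

```python
from math import sqrt

# Illustrative sample counts.
x1, n1 = 45, 100       # group 1: successes, sample size
x2, n2 = 30, 90        # group 2: successes, sample size

p1, p2 = x1 / n1, x2 / n2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)    # standard error of p1 - p2
diff = p1 - p2
lower, upper = diff - 1.96 * se, diff + 1.96 * se
print(round(lower, 3), round(upper, 3))
# If 0 lies outside the interval, we have evidence of a difference in proportions.
```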
Example:
Group 1: 10.96 10.77 10.9 10.69 10.87 (total 54.19)
Group 2: 10.88 10.75 10.8 10.81 10.7 10.82 (total 64.76)
Group 3: 11.13 10.99 10.98 11.02 (total 44.12)
Overall total: 163.07
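The group totals above can be verified, and a one-way ANOVA F statistic computed, directly from the data. A minimal sketch (between- and within-group sums of squares only; the p-value would still come from an F table):

```python
groups = [
    [10.96, 10.77, 10.9, 10.69, 10.87],
    [10.88, 10.75, 10.8, 10.81, 10.7, 10.82],
    [11.13, 10.99, 10.98, 11.02],
]

totals = [sum(g) for g in groups]
print([round(t, 2) for t in totals], round(sum(totals), 2))
# [54.19, 64.76, 44.12] 163.07

n = sum(len(g) for g in groups)
grand_mean = sum(totals) / n

# Between-groups and within-groups sums of squares.
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1          # k - 1
df_within = n - len(groups)           # n - k
F = (ssb / df_between) / (ssw / df_within)
print(round(F, 2))                    # compare with an F table on (2, 12) df
```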
Interpreting results: For 2-way ANOVA and multiple regressions, we have a separate p-value to
interpret for each factor/predictor. Here we need to be more careful, in that the result for each
factor/predictor is evidence (or lack of evidence) for its effect on the response after allowing for the
(possible effects of) other factors/predictors in the model.
If interaction is included in an ANOVA model, we also need to: interpret the test result for interaction (H0 is that there is no interaction); and be more careful again in interpreting the main effects: if interaction is present, the result for each factor is evidence (or lack of evidence) for its effect on the response averaged over the levels of the other factors in the model.
We find very strong evidence that time is affected by lights and age group but no evidence that it is
affected by gender, in each case after allowing for the other factors in the model.
Reliability
We define the item's reliability at age t as R(t) = Pr(item is still working at age t) = Pr(failure hasn't occurred by age t). The situation involves the random variable T = the item's age at failure. Then Pr(failure occurs at or before age t) = Pr(T ≤ t) = F(t). Note that R(t) = Pr(T > t) = 1 - F(t), or F(t) = 1 - R(t).
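A sketch of R(t) = 1 - F(t), assuming (illustratively) an exponential lifetime with mean 1000 hours, so F(t) = 1 - e^(-t/1000); the distribution and mean are assumptions, not from the notes:

```python
from math import exp

MEAN_LIFE = 1000.0    # assumed mean lifetime in hours (illustrative)

def F(t):
    """Cdf of the assumed exponential lifetime: Pr(T <= t)."""
    return 1 - exp(-t / MEAN_LIFE) if t >= 0 else 0.0

def R(t):
    """Reliability at age t: Pr(item still working) = 1 - F(t)."""
    return 1 - F(t)

for t in (0, 500, 1000, 2000):
    print(t, round(R(t), 4))
# R(0) = 1; reliability decreases with age; R(1000) = e^-1, about 0.3679
```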