
Bayesian Statistical Analysis

Chapter 1: Fundamentals of Bayesian Inference

Tang Yin-cai
yctang@stat.ecnu.edu.cn

SCHOOL OF FINANCE AND STATISTICS

March 11, 2009


1.1 The Bayesian Method and Comparison with Classical Method




Statistical inference

Statistical Inference is a problem in which data have been generated in accordance with some unknown probability distribution, which must be analyzed so that some type of inference about the unknown distribution can be made.

In other words, in a statistics problem there are two or more probability distributions which may have generated the data. By analyzing the data, we attempt
■ to learn about the unknown distribution,
■ to make some inferences about certain properties of the distribution, and
■ to determine the relative likelihood that each possible distribution is actually the correct one.



There are three approaches to Probability:
1. Axiomatic: probability by definition and properties
2. Relative Frequency: repeated trials
3. Degree of belief (subjective): a personal measure of uncertainty

We are quite familiar with the first two, and we use them quite often in decision making, especially when no information or data are available. The third is closely related to Bayesian inference, which we are going to learn.



Let's take a look at Hypothesis Testing as an example to see what classical statistical inference and Bayesian inference do, correspondingly.

Hypothesis Testing is a form of proof by statistical contradiction: evidence is gathered in favor of a theory by demonstrating that the data would be unlikely to be observed if the postulated theoretical model were false.

Why do we do it this way?



Classical Approach

According to probability theory, we would like to express our uncertainty as:

    P(Model is True | Observed Data)

However, based on our epistemological foundations, we cannot state that the model is true with a certain probability X.

Either the model is true, or it is not.



Instead, we are limited to a knowledge of:

    P(Observed Data | Model is True)

■ If P(Observed Data | Model is True) is close to one, then the data are consistent with the model, and we would not reject it as an objective interpretation of reality. Example?
■ If P(Observed Data | Model is True) is not close to one, then the data are inconsistent with the model's predictions, and we reject the model. Example?



Thus we can summarize the three-step procedure for classical hypothesis testing.

■ Step 1. Define the Research Hypothesis. A Research (or Alternative) Hypothesis is a statement, derived from theory, about what the researcher expects to find in the data.
■ Step 2. Define the Null Hypothesis. The Null Hypothesis is a statement of what you would not expect to find if your research or alternative hypothesis were consistent with reality.
■ Step 3. Conduct an analysis of the data to determine whether or not you can reject the null hypothesis with some pre-determined probability. If you can reject the null hypothesis with that probability, then the data are consistent with the model; if you cannot, then the data are not consistent with the model.



Bayesian Approach

Bayesians, in contrast, try to do the following:

■ make inferences based on all available information;
■ see how new data affect our (old) inferences;
■ identify all hypotheses (or states of nature) that may be true;
■ know what each hypothesis (or state of nature) predicts;
■ know how to update our old inferences in light of our observations.

In sum, Bayesians try to do statistics the way scientists think.
See Figure 1 for a schematic representation of Bayesian reasoning.


Figure 1: Schematic Representation of Bayesian Reasoning. [Diagram: by Theory/Creativity, a Hypothesis/Model is formed; Deduction leads from the Hypothesis/Model to Prediction; Prediction stands in epistemic relationships to Data/Observation; Induction leads from the Data back to Inference/Verification/Classification of the Hypothesis/Model.]



Bayes' Theorem/Rule

Based on conditional probability, we have Bayes' Theorem:

    p(θ|y) = p(θ)p(y|θ) / p(y)          (1.1)

p(θ) — called the Prior Distribution
■ the probability distribution for the parameters θ
■ our subjective uncertainty about the parameters before we see the data: we have some idea about what values the parameters might take



p(y) — called the marginal distribution of the data, or the Prior Predictive Distribution
■ the unconditional distribution of the data
■ a constant with respect to θ: it depends only on y

Thus, we may write Bayes' formula as

    p(θ|y) ∝ p(θ)p(y|θ).          (1.2)


p(θ|y) — called the Posterior Distribution
■ proportional to the product of the prior and the likelihood
■ combines the information from the prior with the information from the data
■ can be updated as either kind of information changes.
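The proportional form (1.2) suggests a simple recipe for a discrete set of parameter values: multiply each prior weight by the corresponding likelihood, then renormalize. A minimal sketch in Python (the candidate θ values and weights below are illustrative, not from the text):

```python
# Bayes' rule on a discrete parameter space:
# posterior ∝ prior × likelihood, then normalize so the weights sum to 1.

def posterior(prior, likelihood):
    """prior, likelihood: dicts mapping each theta to p(theta) and p(y | theta)."""
    unnorm = {t: prior[t] * likelihood[t] for t in prior}
    p_y = sum(unnorm.values())              # marginal p(y), the normalizing constant
    return {t: v / p_y for t, v in unnorm.items()}

# Illustrative example: two equally likely hypotheses for theta.
prior = {0.2: 0.5, 0.5: 0.5}
lik = {0.2: 0.10, 0.5: 0.30}                # p(y | theta) for the observed y
post = posterior(prior, lik)
print(post)                                  # theta = 0.5 is now 3 times as probable
```

Note that p(y) drops out of the ratio of posterior probabilities, which is why only the proportional form is needed.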



The key difference between C & B

Deduction and induction are two facets of reasoning.
■ We deduce outcomes from a hypothesis: "If A then B", that is, if (hypothesis) A is true, then B can be concluded (observed).
■ We infer a hypothesis from outcomes by induction.

Figure 2 shows one possible situation which may happen.


Figure 2: An example of Bayesian reasoning. [Diagram: hypothesis A connected to possible outcomes B, C, D, and E.]

If A is true, then we are likely to observe B, C or D. B and C are now observed. Therefore, A is supported!


The key difference between classical and Bayesian reasoning is that the Bayesian believes that knowledge is subjective. Consequently, the Bayesian rejects the epistemological foundation that there exists a "true" data-generating process that can be revealed through a process of elimination.



Subjectivity and objectivity

■ All statistical methods that use probability are subjective in the sense of relying on mathematical idealizations of the world.
■ Bayesian methods are sometimes said to be especially subjective because of their reliance on a prior distribution, but in most problems scientific judgment is necessary to specify both the 'likelihood' and the 'prior' parts of the model.
■ A general principle is at work here: whenever there is replication, in the sense of many exchangeable units observed, there is scope for estimating features of a probability distribution from data and thus making the analysis more 'objective.'


Problems with Classical Statistical Inference


Problem A: p-value and Hypothesis Testing

Example:
■ Background and Data Information:
◆ The staff of Slater School was concerned that their high cancer rate could be due to two nearby high-voltage transmission lines.
◆ There were 8 cases of invasive cancer over a long time among 145 staff members.
◆ Based on the national cancer rate among women of this age, the expected number of cancers is 4.2 (a rate of approximately 3/100).


■ Assumption — independence:
The 145 staff members developed cancer independently of each other, and the chance of cancer, θ, was the same for each staff person. Therefore, the number of cancers, Y, follows a binomial distribution: Y | θ ∼ Bin(145, θ).



The Question

The classical hypothesis test is

    H0 : θ = 0.03   vs.   H1 : θ > 0.03

Instead of answering this question directly, we answer the following question: how well does each of four simplified competing theories explain the data?

    Theory A1 : θ = 0.03
    Theory A2 : θ = 0.04
    Theory A3 : θ = 0.05
    Theory A4 : θ = 0.06



The Likelihood of Theories A1-A4

For each hypothesized θ, from Bin(145, θ), we have

    Pr(Y = 8 | θ) = C(145, 8) θ^8 (1 − θ)^137.          (1.3)

    Theory A1 : Pr(Y = 8 | θ = 0.03) ≈ 0.036
    Theory A2 : Pr(Y = 8 | θ = 0.04) ≈ 0.096
    Theory A3 : Pr(Y = 8 | θ = 0.05) ≈ 0.134
    Theory A4 : Pr(Y = 8 | θ = 0.06) ≈ 0.136

This is a ratio of approximately 1 : 3 : 4 : 4. So Theory A2 explains the data about 3 times as well as Theory A1.
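Equation (1.3) can be evaluated directly with Python's exact integer binomial coefficient; the computed values are close to, though not identical with, the rounded figures quoted above:

```python
from math import comb

# Pr(Y = y | theta) under Bin(145, theta), as in Equation (1.3)
def binom_lik(theta, n=145, y=8):
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

for theta in (0.03, 0.04, 0.05, 0.06):
    print(f"Pr(Y = 8 | theta = {theta}) = {binom_lik(theta):.3f}")
# Exact arithmetic gives roughly 0.040, 0.097, 0.138, 0.139.
```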



From the likelihood principle we see that, once Y = 8 has been observed,

    p(y|θ) = Pr(Y = y|θ)

describes how well each theory, or value of θ, explains the data. No other value of Y is relevant.

The likelihood principle is central to Bayesian reasoning.



Bayesian Analysis

There are other sources of information about whether cancer can be induced by proximity to high-voltage transmission lines.
■ Pro: Some epidemiologists show positive correlations between cancer and proximity.
■ Con: Other epidemiologists do not find these correlations, and physicists and biologists maintain that the energy in the magnetic fields associated with high-voltage power lines is too small to have an appreciable biological effect.



Suppose we judge the pro and con sources equally reliable. Then Theory A1 (no effect) is as likely as Theories A2, A3, and A4 together, and we judge Theories A2, A3, and A4 to be equally likely among themselves. So,

    Pr(A1) ≈ 0.5 ≈ Pr(A2) + Pr(A3) + Pr(A4),

    Pr(A2) ≈ Pr(A3) ≈ Pr(A4) ≈ 1/6.

These quantities will represent our prior beliefs.


Based on Bayes' Theorem and the assumptions about the four theories, we have

    Pr(A1|Y = 8) = Pr(A1) Pr(Y = 8|A1) / Pr(Y = 8)
                 = Pr(A1) Pr(Y = 8|A1) / Σ_{i=1}^{4} Pr(Ai) Pr(Y = 8|Ai)
                 = (1/2 × 0.036) / (1/2 × 0.036 + 1/6 × 0.096 + 1/6 × 0.134 + 1/6 × 0.136)
                 = 0.23

    Pr(A2|Y = 8) = 0.21
    Pr(A3|Y = 8) = 0.28
    Pr(A4|Y = 8) = 0.28.


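The same update can be carried out numerically. A sketch in Python, using the prior weights and the rounded likelihoods quoted earlier (with these rounded inputs, the last digits can differ slightly from the values above):

```python
# Posterior over the four theories: Pr(Ai | Y = 8) ∝ Pr(Ai) × Pr(Y = 8 | Ai).
prior = {"A1": 1/2, "A2": 1/6, "A3": 1/6, "A4": 1/6}
lik   = {"A1": 0.036, "A2": 0.096, "A3": 0.134, "A4": 0.136}  # rounded, from Eq. (1.3)

unnorm = {a: prior[a] * lik[a] for a in prior}
p_y = sum(unnorm.values())                 # Pr(Y = 8), the normalizing constant
post = {a: v / p_y for a, v in unnorm.items()}

for a, p in post.items():
    print(f"Pr({a} | Y = 8) = {p:.2f}")
print(f"Pr(theta > 0.03 | Y = 8) = {1 - post['A1']:.2f}")   # approximately 0.77
```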


Accordingly, we see that each of these four theories is roughly equally likely, and the odds are about 3:1 that the cancer rate at Slater is greater than 0.03.

Therefore, the Bayesian analysis reveals that

    Pr(θ > 0.03|Y = 8) = 0.77,

which would not be sufficient to reject the null hypothesis H0 : θ = 0.03.



Non-Bayesian Analysis

    H0 : θ = 0.03   vs.   H1 : θ > 0.03

p-value of classical statisticians: the probability, under H0, of observing an outcome at least as extreme as that actually observed.

For the Slater problem, we find:

    p-value = Pr(Y = 8|θ = 0.03)
            + Pr(Y = 9|θ = 0.03)
            + Pr(Y = 10|θ = 0.03)
            + · · · + Pr(Y = 145|θ = 0.03)
            ≈ 0.07.
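This tail sum is straightforward to evaluate directly. A sketch in Python, summing the binomial upper tail:

```python
from math import comb

# Binomial pmf under the null: Pr(Y = y | theta = 0.03), Y ~ Bin(145, 0.03)
def binom_pmf(y, n=145, theta=0.03):
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

# p-value = Pr(Y >= 8 | theta = 0.03): sum the pmf over y = 8, ..., 145
p_value = sum(binom_pmf(y) for y in range(8, 146))
print(f"p-value = {p_value:.3f}")   # approximately 0.07
```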


Thus, under a classical hypothesis test (at the significance level α = 0.10), we reject the null hypothesis of no effect from the power lines at Slater.



Critique of p-values

Bayesians claim that the p-value should not be used to compare hypotheses because:
■ hypotheses should be compared by how well they explain the data;
■ the p-value does not account for how well the alternative hypotheses explain the data;
■ the p-value summands are irrelevant, because they don't describe how well any hypothesis explains the observed data.

In short, the p-value does not obey the likelihood principle, because it uses Pr(Y = y|θ) for values of y other than the observed value y = 8. The same is true of all classical hypothesis tests and confidence intervals.


Problem B: Confidence intervals

A 100p% (frequentist) confidence interval (CI) for a parameter θ is an interval constructed according to a specific method (for example, the maximum likelihood method), such that if we were to repeat the experiment numerous times, with a new set of observational data (with different random errors) for each experiment, then 100p% of the confidence intervals we construct using this method would contain the true (fixed) value of θ, whatever that is.


An Example

For example, a random sample survey of American adults may indicate that mean income in the United States is $35,000. Assuming (rather implausibly) that income is normally distributed, we could estimate a 90% confidence interval for our sample mean, perhaps [$15,000, $55,000] for a modestly sized sample. Using conventional frequentist inference, we can conclude that intervals like the one calculated would cover the true (population) mean income 90% of the time over repeated applications of the sampling procedure.
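The coverage claim can be illustrated by simulation: construct the 90% interval over many repeated samples and count how often it contains the true mean. A sketch in Python (the population parameters below are invented for illustration):

```python
import random
import statistics

random.seed(0)
TRUE_MEAN, SD, N, Z90 = 35_000, 12_000, 50, 1.645  # hypothetical population

covered = 0
TRIALS = 2_000
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    xbar = statistics.mean(sample)
    half = Z90 * SD / N**0.5          # known-sigma 90% z-interval half-width
    if xbar - half <= TRUE_MEAN <= xbar + half:
        covered += 1

print(f"coverage = {covered / TRIALS:.2f}")   # close to 0.90
```

The probability statement attaches to the procedure across repetitions, not to any one computed interval.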



Questions

The questions about the frequentist CI arise:
■ What about non-repeatable data? That is, there is no data-generating process (DGP) creating data sets for us, just a single set of data. How can we apply frequentist procedures?
■ What about asymmetric distributions? For example, for a skewed Be(1, 3) sampling distribution of the sample mean, what is the CI for the population mean? The mode may not be included in the CI. This seems implausible!
■ What about multimodal distributions? How can the two modes lie in the middle of the CI?



Explanation

One reason: different definitions of "probability".
■ The frequentist definition: probability is the long-run expected frequency of occurrence,

    P(A) = n/N,

where n is the number of times event A occurs in N opportunities.
■ The Bayesian view: probability is related to degree of belief. It is a measure of the plausibility of an event given incomplete knowledge.


■ Thus a frequentist believes that a population
mean is real, unknown, and can only be
estimated from the data.
■ Knowing the distribution for the sample mean, he
constructs a confidence interval, centered at the
sample mean.
■ Tricky: Either the true mean is in the interval or
it is not.



■ So the frequentist can’t say there’s a 95%
  probabilityᵃ that the true mean is in this interval,
  because it’s either already in, or it’s not. And
  that’s because, to a frequentist, the true mean,
  being a single fixed value, doesn’t have a
  distribution.
■ The sample mean does. Thus the frequentist
  must use circumlocutions like "95% of
  similar intervals would contain the true mean, if
  each interval were constructed from a different
  random sample like this one."
a. "probability" = long-run fraction having this characteristic.


[Figure 3: Confidence intervals for population mean — 20 confidence
intervals based on the z distribution (x-axis: confidence interval;
y-axis: index).]
Bayesian’s Point of View

Bayesians have an altogether different world-view.
They say that only the data are real. The
population mean is an abstraction, and as such
some values are more believable than others
based on the data and their prior beliefs.
(Sometimes the prior belief is very non-informative,
however.) The Bayesian constructs a credible
interval, centered near the sample mean, but
tempered by "prior" beliefs concerning the mean.


A credible interval (which can also be abbreviated
to CI: how confusing) is an inherently Bayesian
concept: it is an interval such that the parameter is
believed to lie in the interval with probability p.
Fundamentally, the belief (probability) attaches to
the person who makes the statement, rather than
the parameter itself - in other words, it is subjective.



Now the Bayesian can say what the frequentist
cannot: "There is a 95% probabilityᵃ that this
interval contains the mean."ᵇ
a. "probability" = degree of believability.
b. A frequentist is a person whose long-run ambition is to be wrong 5% of the time. A
Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly
believes he has seen a mule.


The Late Bus Example: What we are concerned
with is the probability that the school bus will be
late in the morning. We observed n mornings and
found that the school bus was late y times. Then y
follows the binomial distribution Bin(n, θ), where θ
is the probability that the school bus is late:

      Pr(Y = y|θ) = C(n, y) θ^y (1 − θ)^(n−y),

where C(n, y) = n!/(y!(n − y)!) is the binomial
coefficient.


Let n = 10, y = 3 and assume that we have no
information about θ. That is, we choose the uniform
distribution on (0, 1) as the prior for θ, called the
noninformative prior of θ. In later subsections
we will discuss the example in a general way
where Beta distributions Be(α, β) will be used as
informative priors for θ. Here the uniform prior is
a special Beta distribution, Be(1, 1). Thus we have
from (1.2) that the posterior distribution is the Beta
distribution Be(4, 8). Its mean, median and mode
are 0.33, 0.32 and 0.30 respectively. Thus the 95%
symmetric credible interval is (0.11, 0.61).ᵃ
a. In R, we get the quantiles: qbeta(0.025, 4, 8) = 0.11, qbeta(0.975, 4, 8) = 0.61.
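The posterior summaries above can be checked by simulation rather than by R's qbeta. A minimal sketch in Python (used here only for illustration; it draws from Be(4, 8) with the standard library's random.betavariate and reads the quantiles off the sorted draws):

```python
import random

random.seed(0)

# Posterior for the late-bus example: Be(1 + y, 1 + n - y) = Be(4, 8)
alpha, beta = 4, 8

# Monte Carlo draws from the posterior, sorted for quantile lookup
draws = sorted(random.betavariate(alpha, beta) for _ in range(100_000))

mean = sum(draws) / len(draws)
median = draws[len(draws) // 2]
lo = draws[int(0.025 * len(draws))]    # ≈ qbeta(0.025, 4, 8)
hi = draws[int(0.975 * len(draws))]    # ≈ qbeta(0.975, 4, 8)

print(round(mean, 2), round(median, 2), round(lo, 2), round(hi, 2))
# ≈ 0.33 0.32 0.11 0.61
```

With 100,000 draws the Monte Carlo error is well below the two decimal places reported on the slide.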


Criticisms of the Bayesian approach

■ The results are subjective. With only a few
  observations, the parameter estimates may be
  sensitive to the choice of priors. (See the Slater
  School case.)
■ Bayesian reply: Bayesians use diffuse
  priors, sensitivity analysis, etc. to mitigate
  the influence of priors on their results.
■ The Bayesian analysis is philosophically
  unsound. Bayesians treat θ as a random variable
  whereas classical analysis treats θ as a fixed, but
  unknown, constant.
■ Bayesian reply: Treating θ as random does not
  necessarily mean that θ is random; rather, it
  expresses our uncertainty/knowledge about θ.


Advantages of Bayesian statistics

There has been a big explosion of Bayesian statistics
over the past 20 years. Approximately 30% of
papers in top statistical reviews are about
Bayesian statistics. Among the top 10 most cited
mathematicians over the last 10 years, 5 are
Bayesian statisticians! Over the last 5 years, 4
COPSS Medals were awarded to Bayesian
statisticians. Bayesian inference has been widely
used because of advantages which classical
statistics may lack.


■ The Bayesian approach is very well adapted to
  many application areas:
  bioinformatics, genetics, epidemiology,
  econometrics, machine learning, spatial
  statistics, clinical trials, survival analysis,
  computer modelling, nuclear magnetic
  resonance, etc.
■ It allows one to incorporate in a principled way any
  prior information available on a given problem.
■ It is honest and makes clear that any analysis
  relies in part on subjectivity.
■ Knowledge synthesis: it formalizes the process of
  learning from data to update beliefs.


■ It is a simple framework, much simpler than
  "standard" approaches. Nevertheless, it is richer
  than the classical approach in modelling, with
  fewer assumptions and less (irrelevant) math too.
■ Classical methods are often special cases of
  Bayesian methods: for instance, basic
  hypothesis testing and estimation, design and
  sample-size computations, linear and non-linear
  regression, non-parametric statistics, etc. It gives
  a direct interpretation of confidence intervals and
  p-values, which is not easy via the classical
  approach.


■ It is straightforward to handle missing data,
  outliers, censored data, sparse data sets, etc.
■ It provides comprehensive and robust estimation
  of models that cannot be fitted otherwise:
  multilevel models, nested random effects, etc.


1.2 Introduction to Bayesian Statistics



Overview



Process of Bayesian Inference

Bayesian inference is the process of fitting a
probability model to a set of data and summarizing
the result by a probability distribution on the
parameters of the model and on unobserved
quantities such as predictions for new
observations.


The process of Bayesian data analysis:
■ Setting up a full probability model: a joint
  probability distribution for all observable and
  unobservable quantities in a problem.
■ Conditioning on observed data: calculating and
  interpreting the appropriate posterior distribution.
■ Evaluating the fit of the model and the
  implications of the resulting posterior
  distribution:
  ◆ Does the model fit the data?
  ◆ Are the substantive conclusions reasonable?
  ◆ How sensitive are the results to the modeling
    assumptions in step 1?


General notation for statistical
inference



Two kinds of estimands

Two kinds of estimands (unobserved quantities for
which statistical inferences are made):

1) potentially observable quantities, such as future
   observations of a process;
2) quantities that are not directly observable, that is,
   parameters that govern the hypothetical process
   leading to the observed data (for example,
   regression coefficients).

The distinction between these two kinds of
estimands is not always precise, but it is generally
useful as a way of understanding how a statistical
model for a particular problem fits into the real
world.


Notations

Notations (they can be scalars or vectors):

■ θ — unobservable quantities or population
  parameters of interest;
■ y = (y1, . . . , yn) — the observed data;
■ ỹ — unknown, but potentially observable,
  quantities;
■ When using matrix notation, we consider vectors
  as column vectors. For example, if u is a vector
  with n components, then uᵀu is a scalar and uuᵀ
  an n × n matrix.


Exchangeability

The n values yi may be regarded as
exchangeable, meaning that the joint probability
density p(y1, . . . , yn) should be invariant to
permutations of the indexes.

■ Generally, it is useful and appropriate to model
  data from an exchangeable distribution as
  independently and identically distributed (iid)
  given some unknown parameter vector θ with
  distribution p(θ).


Bayesian inference



Bayesian Inference

■ Bayesian statistical conclusions about a
  parameter θ, or unobserved data ỹ, are made in
  terms of probability statements.
■ These probability statements are conditional on
  the observed value of y, and in our notation are
  written simply as p(θ|y) or p(ỹ|y).
■ We also implicitly condition on the known values
  of any covariates, x.


Bayes’ rule

■ Prior distribution p(θ)
■ Sampling/data distribution p(y|θ)
■ Joint distribution p(θ, y) = p(θ)p(y|θ)
■ Posterior distribution

      p(θ|y) = p(θ, y)/p(y) = p(θ)p(y|θ)/p(y),          (1.4)

  where
  ◆ p(y) = Σθ p(θ)p(y|θ) for discrete θ, or
  ◆ p(y) = ∫θ p(θ)p(y|θ)dθ for continuous θ
■ or (unnormalized posterior)

      p(θ|y) ∝ p(θ)p(y|θ).                              (1.5)
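For a discrete θ, (1.4) is just elementwise multiplication of prior and likelihood followed by normalization by p(y). A minimal Python sketch (the three candidate θ values and the uniform prior are made up for illustration, reusing the late-bus data y = 3, n = 10):

```python
# Discrete Bayes' rule: posterior = prior * likelihood / p(y), as in (1.4).
from math import comb

thetas = [0.2, 0.3, 0.5]          # hypothetical candidate values of theta
prior  = [1/3, 1/3, 1/3]          # uniform prior p(theta)

n, y = 10, 3
lik = [comb(n, y) * t**y * (1 - t)**(n - y) for t in thetas]  # p(y|theta)

p_y = sum(p * l for p, l in zip(prior, lik))                  # marginal p(y)
posterior = [p * l / p_y for p, l in zip(prior, lik)]         # p(theta|y)

print([round(p, 3) for p in posterior])   # → [0.344, 0.456, 0.2]
```

As expected with y/n = 0.3, the value θ = 0.3 receives the most posterior mass, and the posterior probabilities sum to 1.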


Prediction

■ Before the data y are considered, the distribution
  of the unknown but observable y is

      p(y) = ∫ p(y, θ)dθ = ∫ p(θ)p(y|θ)dθ.             (1.6)

■ This is called
  ◆ the marginal distribution of y, or
  ◆ the prior predictive distribution.
■ Why prior: because it is not conditional on a
  previous observation of the process.
■ Why predictive: because it is the distribution for a
  quantity that is observable.
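Equation (1.6) can be approximated by simulation: draw θ from the prior, then y given θ. A Python sketch for the binomial model with the uniform Be(1, 1) prior (in this conjugate case the prior predictive is known to be uniform over {0, . . . , n}, which the simulation should reproduce; sample sizes here are our choice):

```python
import random
from collections import Counter

random.seed(1)
n = 10
draws = 100_000

counts = Counter()
for _ in range(draws):
    theta = random.random()                              # theta ~ Be(1,1) = U(0,1)
    y = sum(random.random() < theta for _ in range(n))   # y | theta ~ Bin(n, theta)
    counts[y] += 1

# Prior predictive p(y): for the uniform prior this is 1/(n+1) for every y.
probs = [counts[y] / draws for y in range(n + 1)]
print([round(p, 2) for p in probs])   # each entry ≈ 1/11 ≈ 0.09
```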


Posterior predictive distribution

After the data y have been observed, we can
predict an unknown observable ỹ from the same
process:

      p(ỹ|y) = ∫ p(ỹ, θ|y)dθ = ∫ p(ỹ|θ, y)p(θ|y)dθ = ∫ p(ỹ|θ)p(θ|y)dθ.

■ The posterior predictive distribution is an
  average of conditional predictions over the
  posterior distribution of θ.
■ The last step holds because y and ỹ are
  conditionally independent given θ.
■ Why posterior: conditional on the observed y.
■ Why predictive: a prediction for an observable ỹ.


Likelihood

■ Using Bayes’ rule with a chosen probability
  model means that the data y affect the posterior
  inference (1.5) only through the function p(y|θ) —
  the likelihood function (when regarded as a
  function of θ for fixed y).
■ In this way Bayesian inference obeys what is
  sometimes called the likelihood principle, which
  states that for a given sample of data, any two
  probability models p(y|θ) that have the same
  likelihood function yield the same inference for θ.


Likelihood and odds ratios

■ The posterior odds (ratio) for θ1 compared to θ2:

      p(θ1|y)/p(θ2|y) = [p(θ1)p(y|θ1)/p(y)] / [p(θ2)p(y|θ2)/p(y)]
                      = [p(θ1)p(y|θ1)] / [p(θ2)p(y|θ2)].        (1.7)

■ Odds have the attractive property that Bayes’
  rule takes a particularly simple form: in words,
  the posterior odds are equal to the prior odds
  multiplied by the likelihood ratio, p(y|θ1)/p(y|θ2).
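Equation (1.7) in code: posterior odds = prior odds × likelihood ratio. A Python sketch with two hypothetical values of θ under the binomial model (all numbers are made up for illustration):

```python
# Posterior odds = prior odds * likelihood ratio, as in (1.7).
from math import comb

n, y = 10, 3
theta1, theta2 = 0.3, 0.5
prior1, prior2 = 0.5, 0.5          # equal prior mass on the two values

def lik(theta):
    """Binomial likelihood p(y | theta)."""
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

prior_odds = prior1 / prior2
likelihood_ratio = lik(theta1) / lik(theta2)
posterior_odds = prior_odds * likelihood_ratio

print(round(posterior_odds, 3))    # → 2.277
```

With equal prior odds, the posterior odds reduce to the likelihood ratio: the data y = 3 out of n = 10 favor θ1 = 0.3 over θ2 = 0.5 by a factor of about 2.3.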


Computation and software

■ We will rely primarily on the statistical package R
  for graphs, basic simulations, fitting of
  classical simple models (including regression, ...),
  optimization, and some simple programming.
■ We use WinBUGS within R (see Appendix C) as a
  first try for fitting most models.
■ Other related software:
  ◆ First Bayes: http://www.tonyohagan.co.uk/1b/
  ◆ BACC for R/S-plus/Matlab:
    http://www.econ.umn.edu/~bacc
  ◆ MCMCpack: R package (V0.7-3)
  ◆ coda: R package


Specific computational tasks that arise in Bayesian
data analysis include:
■ Vector and matrix manipulations (see Table 1.1)
■ Computing probability density functions (see
  Appendix A)
■ Drawing simulations from probability distributions
■ Structured programming (including looping and
  customized functions)
■ Calculating the linear regression estimate and
  variance matrix
■ Graphics, including scatterplots with overlain
  lines and multiple graphs per page


Our general approach to computation is to fit many
models, gradually increasing the complexity (see
Appendix C for a simple example). Appendix C
illustrates how to perform computations in R and
BUGS in several different ways for a single
example.


1.3 Simple Examples



The Bayes’ Theorem/rule revisited



The Bayes’ Theorem

■ The central idea and goal of the applied Bayesian
  paradigm is to investigate how to combine
  information, and how the model changes, when
  new information from different sources (’data’) is
  received.


The Bayes’ Theorem

■ This is done through Bayes’ rule:

      p(θ|y) = p(θ)p(y|θ)/p(y) ∝ p(θ)p(y|θ),

■ where
  ◆ θ is the parameter of interest;
  ◆ y is the observed data;
  ◆ p(y|θ) is the probability of y given θ (the likelihood);
  ◆ p(θ) is the prior (distribution), the initial
    distribution for θ;
  ◆ p(θ|y) is the posterior distribution for θ, given
    the data y;
  ◆ p(y) is the marginal distribution, the total
    probability of the given data y.


■ Suppose ỹ ∼ p(ỹ|θ) is to be observed. The
  (posterior) predictive distribution of ỹ, given
  observed data y, can be obtained from the
  posterior distribution:

      p(ỹ|y) = ∫ p(ỹ|θ)p(θ|y)dθ.

■ Cf.: the marginal distribution p(y) is sometimes
  called the prior predictive distribution.
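This integral rarely needs to be evaluated analytically; simulation suffices: draw θ from the posterior, then ỹ given θ. A Python sketch for the late-bus posterior Be(4, 8) and a new run of 10 mornings (the new sample size and the use of random.betavariate are our choices for illustration):

```python
import random
from collections import Counter

random.seed(2)

alpha, beta, n_new = 4, 8, 10      # Be(4, 8) posterior; 10 future mornings
draws = 100_000

counts = Counter()
for _ in range(draws):
    theta = random.betavariate(alpha, beta)                      # theta ~ p(theta|y)
    y_new = sum(random.random() < theta for _ in range(n_new))   # y~ | theta
    counts[y_new] += 1

# Posterior predictive mean: E[y~|y] = n_new * E[theta|y] = 10 * 4/12
pred_mean = sum(y * c for y, c in counts.items()) / draws
print(round(pred_mean, 2))   # ≈ 3.33
```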


Types of priors

In applied Bayesian inference, we have three kinds
of priors for the parameter.
1. Uninformative prior:
   ■ uniform, as wide as possible
   ■ sometimes called a flat prior
   ■ problem: often difficult to define
2. Informative prior:
   ■ not uniform
   ■ assumes we have some prior knowledge
3. Conjugate prior:
   ■ prior and posterior have the same
     distributional form
   ■ often makes the maths easier


Since Bayesian inference is virtually determined by
the prior of the parameter θ and the likelihood
p(y|θ), we often write the Bayesian model as

Y |θ ∼ p(y|θ) and θ ∼ p(θ).



Example 1: θ takes two possible values



θ takes two possible values

Assume a DNA trace is found at a crime scene.
Assume the trace is run through a database of
10,000,000 citizens, and a single match is found.
What is the probability of guilt?

Let θ = 1 denote ’guilty’ and θ = 0 not. Then
p(θ = 1) = 10⁻⁷. We also assume
p(match|θ = 1) ≈ 1 and p(match|θ = 0) = 10⁻⁶.

From Bayes’ Theorem, we have

      p(θ = 1|match) = (1 × 10⁻⁷) / (1 × 10⁻⁷ + 10⁻⁶ × 1) ≈ 0.09.
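The arithmetic, as a check in Python (a pure base-rate calculation; the 10⁻⁶ false-match rate is the example's assumption, and the code keeps the exact 1 − 10⁻⁷ factor that the slide approximates by 1):

```python
# Bayes' theorem for the DNA-match example.
prior_guilty = 1e-7          # p(theta = 1): one citizen out of 10,000,000
p_match_guilty = 1.0         # p(match | theta = 1), assumed ~1
p_match_innocent = 1e-6      # p(match | theta = 0), the false-match rate

p_match = (p_match_guilty * prior_guilty
           + p_match_innocent * (1 - prior_guilty))    # total probability
posterior_guilty = p_match_guilty * prior_guilty / p_match

print(round(posterior_guilty, 3))   # → 0.091
```

Despite the seemingly damning single match, the posterior probability of guilt is only about 1/11, because innocent matches among 10,000,000 people are roughly ten times as likely as the guilty one.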


Example 2: Binomial data with beta
prior



Binomial data with beta prior

Suppose that

SCHOOL OF FINANCE AND S TAT I S T I C S

March 11, 2009 Chapter 1 - p. 79/??


Binomial data with beta prior

Suppose that
■ the likelihood (model) for y given θ is binomial

Bin(n, θ), i.e.,


 
n y
p(y|θ) = θ (1 − y)n−y ,
y

SCHOOL OF FINANCE AND S TAT I S T I C S

March 11, 2009 Chapter 1 - p. 79/??


Binomial data with beta prior

Suppose that
■ the likelihood (model) for y given θ is binomial

Bin(n, θ), i.e.,


 
n y
p(y|θ) = θ (1 − y)n−y ,
y

■ and the prior is beta Be(α, β), where the


hyperparameters α and β are known,
1
p(θ) = θα−1 (1 − θ)β−1 , 0 ≤ θ ≤ 1.
B(α, β)

SCHOOL OF FINANCE AND S TAT I S T I C S

March 11, 2009 Chapter 1 - p. 79/??


Binomial data with beta prior

Suppose that
■ the likelihood (model) for y given θ is binomial

Bin(n, θ), i.e.,


 
p(y|θ) = C(n, y) θ^y (1 − θ)^{n−y},  y = 0, 1, ..., n,

■ and the prior is beta Be(α, β), where the


hyperparameters α and β are known,
p(θ) = θ^{α−1} (1 − θ)^{β−1} / B(α, β),  0 ≤ θ ≤ 1.

■ Find the 1) joint, 2) marginal, 3) posterior, and 4) predictive distributions.


■ The results are as follows:
p(y, θ) = C(n, y) θ^{α+y−1} (1 − θ)^{n−y+β−1} / B(α, β),

p(y) = C(n, y) B(y + α, n − y + β) / B(α, β),  y = 0, 1, ..., n,

p(θ|y) = θ^{α+y−1} (1 − θ)^{n−y+β−1} / B(y + α, n − y + β),  0 ≤ θ ≤ 1,

p(ỹ|y) = C(n, ỹ) B(y + ỹ + α, 2n − y − ỹ + β) / B(y + α, n − y + β),  ỹ = 0, 1, ..., n.
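These distributions are easy to evaluate numerically. A minimal sketch using only the standard library (`log_beta` and `marginal_pmf` are illustrative helper names, not from any package), on the Late Bus data y = 3, n = 10 with a uniform prior:

```python
import math

def log_beta(a, b):
    # log B(a, b) computed via log-gamma for numerical stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def marginal_pmf(y, n, alpha, beta):
    # Marginal p(y) = C(n, y) B(y + alpha, n - y + beta) / B(alpha, beta)
    log_c = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return math.exp(log_c + log_beta(y + alpha, n - y + beta) - log_beta(alpha, beta))

n, y, alpha, beta = 10, 3, 1.0, 1.0            # Late Bus data, uniform Be(1, 1) prior
post_mean = (alpha + y) / (alpha + beta + n)   # mean of the posterior Be(alpha + y, beta + n - y)
total = sum(marginal_pmf(k, n, alpha, beta) for k in range(n + 1))
print(post_mean, total)                        # the marginal pmf sums to 1
```

With α = β = 1 the marginal p(y) works out to the discrete uniform 1/(n + 1) for every y, a handy sanity check.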




■ Note that the posterior of θ, Be(α + y, n − y + β),
has the same form as its prior, a beta distribution.
■ Priors which have the same form as the
posteriors are called conjugate priors.
■ Thus the beta distribution is the conjugate prior for
the proportion θ.
■ If α = 1, β = 1, the prior becomes the uniform
prior.




Sources of influence on the posterior: Now let us
take a numerical example to see how
■ different priors,

■ different data, and

■ new coming data

bring changes to the posterior.




The influence of different priors

The shape of the beta distribution Be(α, β) is


determined by both hyperparameters α and β. The
expectation, mode and variance are
E(θ) = α / (α + β),
M(θ) = (α − 1) / (α + β − 2),
Var(θ) = αβ / ((α + β)^2 (α + β + 1)).
And we can also get the median and different
quantiles. Similar quantities can be obtained from
the posterior beta distributions.
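These summaries are quick to compute; a minimal sketch (`beta_summary` is an illustrative helper name, and the Be(4, 8) example assumes the Late Bus data y = 3, n = 10 with a uniform prior):

```python
def beta_summary(alpha, beta):
    # Mean, mode and variance of Be(alpha, beta); the mode needs alpha, beta > 1
    mean = alpha / (alpha + beta)
    mode = (alpha - 1) / (alpha + beta - 2) if alpha > 1 and beta > 1 else None
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, mode, var

# Posterior Be(4, 8): a uniform Be(1, 1) prior updated with y = 3, n = 10
mean, mode, var = beta_summary(4, 8)
print(mean, mode, var)   # 1/3, 0.3, 2/117 (about 0.0171)
```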




Thus we have
■ When both α and β increase, the variance gets
smaller.
■ When β increases, the distribution shifts toward 0
(the mean decreases), and when α increases, it
shifts toward 1 (the mean increases).
For the Late Bus Example, suppose we observe
y = 3 late buses in two weeks (n = 10 days). The
figure below shows 9 priors for θ with (α, β) = (0.5, 0.5),
(0.5, 1.0), (0.5, 1.5), (1.0, 0.5), (1.0, 1.0),
(1.0, 1.5), (1.5, 0.5), (1.5, 1.0), (1.5, 1.5), and the
corresponding overlapping posteriors.



[Figure: the nine priors Be(α, β) listed above, each shown with its overlapping posterior; the posterior means range from 0.29 (α = 0.5, β = 1.5) to 0.38 (α = 1.5, β = 0.5), and the posterior modes from 0.25 to 0.35.]



The influence of different data

■ The result of the Bayesian inference is also


affected by the data or its distribution (likelihood).

■ For the Late Bus Example, we only consider the
case under the flat prior: p(θ) = Be(1, 1) ∝ 1, the
uniform distribution.
■ Figure 5 shows the posterior beta distributions
Be(y + 1, n − y + 1), where n = 5 (one week) and
y = 0, 1, 2, 3, 4, 5.
■ Figure 6 shows the two posteriors for
n = 5, y = 1 and n = 10, y = 3. We see that the
shapes of the posteriors differ a lot.



Figure 5: Posterior distributions Be(α + y, n − y + β) for n = 5 and y = 0, 1, 2, 3, 4, 5, with the flat prior shown for reference.


Figure 6: Posterior distributions Be(α + y, n − y + β) for n = 5, y = 1 and n = 10, y = 3, with the flat prior shown for reference.
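The contrast between the two data sets in Figure 6 can be checked numerically; a minimal sketch under the flat Be(1, 1) prior (`posterior_summary` is an illustrative helper name):

```python
def posterior_summary(y, n):
    # Mean and variance of the posterior Be(y + 1, n - y + 1) under a flat prior
    a, b = y + 1, n - y + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

for n, y in [(5, 1), (10, 3)]:
    mean, var = posterior_summary(y, n)
    print(n, y, round(mean, 3), round(var, 4))
```

The larger sample (n = 10, y = 3) gives a noticeably smaller posterior variance: more data concentrates the posterior.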



The influence of new coming data

■ Suppose we observe some data y1. Then, from
Bayes' rule, we get the posterior distribution
p(θ|y1) ∝ p(y1|θ) × p(θ).

■ Later we observe some more data y2 . If it is


independent of the first data set y1 , then
p(y1 and y2 |θ) = p(y1 |θ) × p(y2 |θ).




Hence, from Bayes' rule, we have
p(θ|y1, y2) ∝ p(θ) × p(y1|θ) × p(y2|θ)
∝ p(θ|y1) × p(y2|θ).
That is, we use the first posterior as the prior for
the second posterior. The resulting posterior can
then be used as a new prior distribution which can
be updated with further data.
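For the beta-binomial model this updating is just the addition of counts; a minimal sketch reusing the Late Bus data sets from the figures above (`update_beta` is an illustrative helper name):

```python
def update_beta(alpha, beta, y, n):
    # One conjugate update: Be(alpha, beta) prior + y successes in n Binomial trials
    return alpha + y, beta + n - y

# Sequential updating: yesterday's posterior is today's prior
a, b = 1.0, 1.0                        # flat Be(1, 1) prior
a, b = update_beta(a, b, y=1, n=5)     # first data set (n = 5, y = 1)
a, b = update_beta(a, b, y=3, n=10)    # second, independent data set (n = 10, y = 3)

# Updating once with the pooled data gives the same posterior
a2, b2 = update_beta(1.0, 1.0, y=4, n=15)
print((a, b), (a2, b2))   # both are (5.0, 12.0)
```

Because the two data sets are independent given θ, sequential and batch updating agree exactly.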




■ The Bayesian approach is often talked about as
a learning process:
As we get more data, we add to our store of in-
formation by multiplying it by our current posterior
distribution.
■ It has been argued that this can form the basis of
science and this has been applied to the
(Bayesian) decision making process.
■ For the Late Bus Example, if after 10 weeks we
observe 10 late buses, then we see from Figure 7
that as evidence accumulates, our beliefs about θ
converge, even though our priors differ greatly:
Be(1, 1), Be(2, 5), Be(1, 10).



Figure 7: Posterior distributions with accumulating sampling information: priors Be(1, 1), Be(2, 5), Be(1, 10), updated with n = 5, y = 1 and then n = 50, y = 10.


Example 3: Normal data with normal prior




Example 3: Normal data with normal prior

■ This example is important because it addresses


the normal likelihood and normal prior
combination often used in practice.
■ Assume that
◆ an observation y is normally distributed with
mean θ and known variance σ^2;
◆ the parameter of interest, θ, also has a normal
distribution, with parameters µ and τ^2.


■ Find the marginal, posterior, and predictive
distributions.



The results are as follows:

p(y) = N(µ, σ^2 + τ^2),

p(θ|y) = N( (τ^2 y + σ^2 µ) / (σ^2 + τ^2), σ^2 τ^2 / (σ^2 + τ^2) ),

p(ỹ|y) = N( (τ^2 y + σ^2 µ) / (σ^2 + τ^2), σ^2 + σ^2 τ^2 / (σ^2 + τ^2) ).



If y1, y2, ..., yn are observed instead of a single
observation y, then from the sampling distribution
of ȳ, N(θ, σ^2/n), we have

p(θ|ȳ) = N( (τ^2 ȳ + (σ^2/n) µ) / (σ^2/n + τ^2), (σ^2/n) τ^2 / (σ^2/n + τ^2) ),

p(ỹ|ȳ) = N( (τ^2 ȳ + (σ^2/n) µ) / (σ^2/n + τ^2), σ^2 + (σ^2/n) τ^2 / (σ^2/n + τ^2) ).

We see that the normal distribution is the conjugate
prior for the normal mean.
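A small sketch of these updating formulas (the helper name and the numerical values µ = 0, τ^2 = 1, σ^2 = 4, n = 4, ȳ = 2 are illustrative, not from the text):

```python
def normal_posterior(ybar, n, sigma2, mu, tau2):
    # Posterior N(m, v) for theta given ybar, with known sampling variance sigma2
    s2n = sigma2 / n                      # variance of the sample mean ybar
    m = (tau2 * ybar + s2n * mu) / (s2n + tau2)
    v = s2n * tau2 / (s2n + tau2)
    return m, v

m, v = normal_posterior(ybar=2.0, n=4, sigma2=4.0, mu=0.0, tau2=1.0)
print(m, v)        # posterior mean 1.0, posterior variance 0.5
print(4.0 + v)     # predictive variance for one new observation adds sigma^2
```

The posterior mean is a precision-weighted compromise between the prior mean µ and the sample mean ȳ; as n grows, the weight on ȳ tends to 1.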

