
A brief introduction to Bayesian statistics

H. Tyralis, Department of Water Resources and Environmental Engineering, National Technical University of Athens

1.1 Purpose of a statistical analysis

- The purpose of a statistical analysis is fundamentally one of inversion.
- It aims at retrieving the causes (reduced to the parameters θ) from the effects (summarized by the observations x).
- In other words, when observing a random phenomenon directed by θ, statistical methods allow us to deduce from x an inference (that is, a summary, a characterization) about θ.

[Diagram: nature → observations x → statistical analysis → causes θ (parameters, deterministic); reduction: nature → real parameters]

1.2 Purpose of a statistical analysis

- Probabilistic modeling, instead, characterizes the behavior of the future x conditional on θ.

[Diagram: as above, with probabilistic modeling added as the forward arrow from the causes θ to the prediction of future x]

1.3 Purpose of a statistical analysis

- This inverting aspect of statistics is obvious in the notion of the likelihood function since, formally, it is just the sample density rewritten in the proper order,

  l(θ|x) = f(x|θ)

  i.e. as a function of θ, which is unknown, depending on the observed value x.

1.4 Purpose of a statistical analysis

Example of statistical analysis

[Diagram: nature → observations x → maximize l(θ|x) = f(x|θ) → causes θ (parameters, deterministic); f(x|θ): nature (real parameters)]

2. Bayes' theorem

- A general description of the inversion of probabilities is given by Bayes' theorem:

  P(A|E) = P(E|A)P(A)/P(E) = P(E|A)P(A)/[P(E|A)P(A) + P(E|A^c)P(A^c)]

- This theorem is an actualization principle, since it describes the updating of the likelihood of A from P(A) to P(A|E); a numeric illustration follows below.
- Bayes (1764) proved a continuous version of this result, namely that, given two random variables x and y with conditional distribution f(x|y) and marginal distribution g(y), the conditional distribution of y given x is

  g(y|x) = f(x|y)g(y)/∫f(x|y)g(y)dy
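To make the updating concrete, here is a minimal Python sketch of the discrete form of the theorem; the numbers (a rare event A and an imperfect signal E) are purely illustrative, not from the lecture.

```python
# Bayes' theorem for events: P(A|E) = P(E|A)P(A) / P(E).
# All probabilities below are hypothetical illustration values.
p_A = 0.01            # prior probability P(A)
p_E_given_A = 0.95    # P(E|A)
p_E_given_Ac = 0.05   # P(E|A^c)

# Law of total probability: P(E) = P(E|A)P(A) + P(E|A^c)P(A^c)
p_E = p_E_given_A * p_A + p_E_given_Ac * (1 - p_A)

# The update: the prior P(A) = 0.01 becomes the posterior P(A|E)
p_A_given_E = p_E_given_A * p_A / p_E
print(p_A_given_E)    # ~0.161
```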

3.1 Definition of the Bayesian statistical model

- Bayes and Laplace went further and considered that the uncertainty on the parameters θ of a model could be modeled through a probability distribution π on Θ, called the prior distribution. The inference is then based on the distribution of θ conditional on x, π(θ|x), called the posterior distribution and defined by

  π(θ|x) = f(x|θ)π(θ)/∫f(x|θ)π(θ)dθ

- The main addition brought by a Bayesian statistical model is thus to consider a probability distribution on the parameters.

DEFINITION: A Bayesian statistical model is made of a parametric statistical model, f(x|θ), and a prior distribution on the parameters, π(θ).

3.2 Definition of the Bayesian statistical model

- In statistical terms, Bayes' theorem thus actualizes the information on θ contained in x.
- Its impact is based on the daring move that puts causes (observations) and effects (parameters) on the same conceptual level, since both of them have probability distributions.
- From a statistical viewpoint, there is thus little difference between observations and parameters, since conditional manipulations allow for an interplay of their respective roles.

[Diagram: probabilistic modeling maps nature π(θ) forward to observations x; statistical analysis inverts this to the causes π(θ|x) (parameters) and to the prediction of future x; reduction: nature → real parameters]

3.3 Definition of the Bayesian statistical model

- Historically, this perspective, that parameters directing random phenomena can also be perceived as random variables, goes against the atheistic determinism of Laplace as well as the clerical position of Bayes, who was a Nonconformist minister.
- The importance of the prior distribution in a Bayesian statistical analysis is not at all that the parameter of interest θ can (or cannot) be perceived as generated from π, or even as a random variable, but rather that the use of a prior distribution is the best way to summarize the available information (or even the lack of information) about this parameter, as well as the residual uncertainty, thus allowing for incorporation of this imperfect information in the decision process.
- A more technical point is that the only way to construct a mathematically justified approach operating conditional upon the observations is to introduce a corresponding distribution on the parameters.

4.1 Bayes' example

Problem

- A billiard ball W is rolled on a line of length 1, with a uniform probability of stopping anywhere. It stops at p. A second ball O is then rolled n times under the same assumptions, and X denotes the number of times the ball O stopped to the left of W. Given X, what inference can we make about p?

[Diagram: the unit interval [0,1] with W stopped at p; O is rolled n times and stops X times to the left of W]

4.2 Bayes' example

Solution

- The problem is to derive the posterior distribution of p given X.
- The prior distribution of p is uniform on [0,1], π(p) = 1.
- X ~ B(n,p), so P(X = x|p) = (n choose x) p^x (1-p)^(n-x).
- P(a < p < b and X = x) = ∫_a^b (n choose x) p^x (1-p)^(n-x) dp
- P(X = x) = ∫_0^1 (n choose x) p^x (1-p)^(n-x) dp = (n choose x) B(x+1, n-x+1), and we derive that

  P(a < p < b|X = x) = ∫_a^b p^x (1-p)^(n-x) dp / ∫_0^1 p^x (1-p)^(n-x) dp

  i.e. the distribution of p conditional upon X = x is a beta distribution Be(x+1, n-x+1).
- Be(a,b): f(x|a,b) = x^(a-1)(1-x)^(b-1)/B(a,b) · I_[0,1](x). (A simulation check of this result follows.)
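As a check, the posterior Be(x+1, n-x+1) can be verified by simulating Bayes' experiment directly; the sketch below conditions uniform/binomial draws on X = x (the values of n, x, and the number of draws are illustrative choices).

```python
# Simulate the billiard experiment and keep only the draws with X = x;
# the retained values of p should follow Be(x+1, n-x+1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, x = 10, 3                            # illustrative choices

p = rng.uniform(0, 1, size=500_000)     # prior: p ~ U(0,1)
X = rng.binomial(n, p)                  # X | p ~ B(n, p)
p_post = p[X == x]                      # condition on the observed X = x

# Compare simulated moments with the analytic Be(x+1, n-x+1)
print(p_post.mean(), (x + 1) / (n + 2))                   # both ~0.333
print(p_post.var(), stats.beta(x + 1, n - x + 1).var())   # both ~0.017
```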

5. Prior and posterior distributions

- Sample distribution: f(x|θ).
- Prior distribution on θ: π(θ).
- Joint distribution of (θ,x): φ(θ,x) = f(x|θ)π(θ).
- Marginal distribution of x: m(x) = ∫φ(θ,x)dθ = ∫f(x|θ)π(θ)dθ.
- Posterior distribution of θ: π(θ|x) = f(x|θ)π(θ)/m(x) = f(x|θ)π(θ)/∫f(x|θ)π(θ)dθ.
- Predictive distribution of y, when y ~ g(y|θ,x): g(y|x) = ∫g(y|θ,x)π(θ|x)dθ. (A numeric sketch of all these quantities follows.)
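As a concrete illustration, the sketch below evaluates each quantity above on a grid of θ values for an assumed binomial sample distribution; the model, data, and grid are illustrative choices, not from the slides.

```python
# Prior, joint, marginal, posterior, and predictive on a theta grid.
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)              # pi(theta): uniform prior on (0,1)
x, n = 7, 10                             # illustrative observed data

like = stats.binom.pmf(x, n, theta)      # sample distribution f(x|theta)
joint = like * prior                     # phi(theta,x) = f(x|theta) pi(theta)
m_x = joint.sum() * dtheta               # marginal m(x), by Riemann sum
post = joint / m_x                       # posterior pi(theta|x)

# Predictive of a future y | theta ~ B(1, theta):
# g(y=1|x) = integral of theta * pi(theta|x) dtheta
print((theta * post).sum() * dtheta)     # ~ (x+1)/(n+2) = 0.667
```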

6.1 Improper prior distributions

- When the parameter θ can be treated as a random variable with known probability distribution π(θ), Bayes' theorem is the basis of Bayesian inference, since it leads to the posterior distribution

  π(θ|x) = f(x|θ)π(θ)/∫f(x|θ)π(θ)dθ

- In many cases, however, the prior distribution is determined on a subjective or theoretical basis that provides a measure π such that

  ∫_Θ π(θ)dθ = ∞

- In such cases the prior distribution is said to be improper (or generalized).

Example

- Consider a distribution f(x-θ) where the location parameter θ ∈ R. If no prior information is available on the parameter θ, it is quite acceptable to consider that the likelihood of an interval [a,b] is proportional to its length b-a, and therefore that the prior is proportional to the Lebesgue measure on R. Hence ∫_R π(θ)dθ = ∞.

6.2 Improper prior distributions

Be careful

- When using an improper prior distribution, always check that

  ∫f(x|θ)π(θ)dθ < ∞

- This guarantees a proper posterior distribution

  π(θ|x) = f(x|θ)π(θ)/∫f(x|θ)π(θ)dθ

  (A numeric check of the condition is sketched below.)
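A minimal numeric sketch of this check for the location example of the previous slide, assuming a standard normal density f(x-θ) and the flat prior π(θ) = 1; the observed x and the grid bounds are arbitrary illustration choices.

```python
# Check that int f(x|theta) pi(theta) dtheta is finite for the
# normal location model with the improper flat prior pi(theta) = 1.
import numpy as np

x = 1.7                                        # arbitrary observed value
theta = np.linspace(x - 40.0, x + 40.0, 200_001)
dtheta = theta[1] - theta[0]

f = np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)  # f(x - theta)
integral = f.sum() * dtheta                    # pi(theta) = 1 everywhere
print(integral)                                # ~1.0 < inf: posterior is proper
```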

7.1 Another example

- Consider x ~ N(θ,1).
- Sample distribution: f(x|θ) = (1/√(2π)) exp[-(1/2)(x-θ)²].
- Prior distribution: π(θ) = c, θ ∈ R.
- Posterior distribution of θ: π(θ|x) = f(x|θ)π(θ)/∫f(x|θ)π(θ)dθ.
- f(x|θ)π(θ) = c(1/√(2π)) exp[-(1/2)(x-θ)²]
- ∫_R f(x|θ)π(θ)dθ = ∫_R c(1/√(2π)) exp[-(1/2)(x-θ)²]dθ = m(x)
- This leads to π(θ|x) = c(1/√(2π)) exp[-(1/2)(x-θ)²]/m(x); in fact π(θ|x) = h(x) exp[-(1/2)(θ-x)²].
- Now look at the equivalence: for y ~ N(μ,1), f(y|μ) = (1/√(2π)) exp[-(1/2)(y-μ)²] = h(μ) exp[-(1/2)(y-μ)²].
- Since π(θ|x) ∝ exp[-(1/2)(θ-x)²], we conclude that θ|x ~ N(x,1). (A grid-based check follows.)
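A grid-based sketch confirming θ|x ~ N(x,1) under the flat prior; the observed value x and the grid are arbitrary illustration choices.

```python
# Normalize the unnormalized posterior exp[-(theta-x)^2/2] numerically
# and check that its mean is x and its standard deviation is 1.
import numpy as np

x = 0.6                                        # arbitrary observed value
theta = np.linspace(x - 10.0, x + 10.0, 20_001)
dtheta = theta[1] - theta[0]

post = np.exp(-0.5 * (theta - x) ** 2)         # pi(theta|x) up to h(x)
post /= post.sum() * dtheta                    # normalize numerically

mean = (theta * post).sum() * dtheta
sd = np.sqrt((((theta - mean) ** 2) * post).sum() * dtheta)
print(mean, sd)                                # ~0.6 and ~1.0, i.e. N(x, 1)
```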

7.2 Another example

[Flowchart: prior knowledge (or lack of it) on the parameter θ → prior distribution π(θ); observations x → determination of the sample density f(x-θ); both combine into the posterior distribution π(θ|x).]

[Figure: inference on θ for x = 0; the posterior density of θ|x=0 ~ N(0,1) is plotted over θ ∈ [-4,4], together with the flat prior π(θ) = 0.1.]

8.1 Hurst-Kolmogorov stochastic process

- Parameter: θ = (μ, σ², H).
- Sample distribution:

  f(x_n|θ) = (2π)^(-n/2) [det(σ²R)]^(-1/2) exp[-(1/(2σ²)) (x_n - μe)^T R^(-1) (x_n - μe)]

  where e = (1, …, 1)^T and R = R(H) is the autocorrelation matrix of the process.
- Prior distribution on θ: π(θ) ∝ 1/σ².
- Posterior distribution of θ:

  π(H|x_n) ∝ |R|^(-1/2) (e^T R^(-1) e)^(n/2 - 1) [e^T R^(-1) e · x_n^T R^(-1) x_n - (x_n^T R^(-1) e)²]^(-(n-1)/2)

  σ²|H,x_n ~ Inv-Gamma((n-1)/2, [e^T R^(-1) e · x_n^T R^(-1) x_n - (x_n^T R^(-1) e)²]/(2 e^T R^(-1) e))

  μ|σ²,H,x_n ~ N(x_n^T R^(-1) e/(e^T R^(-1) e), σ²/(e^T R^(-1) e))

  (A grid evaluation of π(H|x_n) is sketched below.)
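A minimal sketch of evaluating the marginal posterior π(H|x_n) above on a grid, assuming the standard HK/fGn autocorrelation ρ(k) = (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H))/2 for R(H); the data here are plain white noise, used only for illustration.

```python
# Evaluate log pi(H|x_n) on a grid of H values and normalize.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)                       # illustrative data x_n
n = x.size
e = np.ones(n)
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)

def log_post_H(H):
    # rho(k) = (|k+1|^2H - 2|k|^2H + |k-1|^2H)/2  (assumed fGn form)
    R = 0.5 * ((lags + 1) ** (2 * H) - 2 * lags ** (2 * H)
               + np.abs(lags - 1) ** (2 * H))
    _, logdet = np.linalg.slogdet(R)
    Ri = np.linalg.inv(R)
    eRe, xRx, xRe = e @ Ri @ e, x @ Ri @ x, x @ Ri @ e
    return (-0.5 * logdet + (n / 2 - 1) * np.log(eRe)
            - 0.5 * (n - 1) * np.log(eRe * xRx - xRe ** 2))

H_grid = np.linspace(0.51, 0.99, 49)
log_p = np.array([log_post_H(H) for H in H_grid])
post = np.exp(log_p - log_p.max())             # unnormalized grid posterior
post /= post.sum() * (H_grid[1] - H_grid[0])   # normalize on the grid
print(H_grid[np.argmax(post)])                 # posterior mode of H
```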

8.2 Hurst-Kolmogorov stochastic process

[Figure: simulated data series plotted against time (0 to 100); values roughly in [-2, 2].]

8.3 Hurst-Kolmogorov stochastic process

[Figure: histogram of the posterior sample of μ|x_n, roughly over the interval (-20, 30).]

8.4 Hurst-Kolmogorov stochastic process

[Figure: histogram of the posterior sample of σ²|x_n, roughly over the interval (0, 20).]

8.5 Hurst-Kolmogorov stochastic process

[Figure: histogram of the posterior sample of H|x_n, over the interval (0.5, 1.0).]

8.6 Hurst-Kolmogorov stochastic process

Prognosis

[Figure: posterior mean prediction plotted together with the data against time (0 to 150).]

8.7 Hurst-Kolmogorov stochastic process

[Figure: 0.90 predictive percentile for y_i|x_n plotted together with the data against time.]

