
1.13. Sufficiency

Definition: Let X have a family of possible distributions $\{F_X(x; \theta), \theta \in H\}$. A statistic $T = T(X)$ is sufficient for $\theta$ based on X iff each conditional distribution of X given $T = t$ does not depend on $\theta$ (except perhaps for a null set A with $P_\theta(T \in A) = 0$). (Idea: base decisions on a (minimal) sufficient statistic.)
Eg: $x_1, x_2, \ldots, x_n$ iid $B(1, p)$, $p \in [0, 1]$. Let $T = \sum_{i=1}^n x_i$; note $T \sim \mathrm{bin}(n, p)$.
For $t \in \{0, 1, 2, \ldots, n\}$, consider $P(X = x \mid T = t)$:
$$P(X = x \mid T = t) = \frac{P(X = x,\ T = t)}{P(T = t)} = \frac{P(X = x)\, I\big(x_i \in \{0,1\},\, i = 1, 2, \ldots, n,\ \sum_{i=1}^n x_i = t\big)}{\binom{n}{t} p^t (1-p)^{n-t}}$$
$$= \frac{\prod_{i=1}^n p^{x_i}(1-p)^{1-x_i}\ I\big(x_i \in \{0,1\},\, i = 1, 2, \ldots, n,\ \sum_{i=1}^n x_i = t\big)}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{p^t (1-p)^{n-t}\ I\big(x_i \in \{0,1\},\ \sum_{i=1}^n x_i = t\big)}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{I\big(x_i \in \{0,1\},\ \sum_{i=1}^n x_i = t\big)}{\binom{n}{t}}$$
So it does not depend on p; hence T is sufficient for p based on X.
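A quick Monte Carlo sanity check of this conclusion (my own sketch, not part of the notes; all names are mine): for two different values of p, the empirical conditional distribution of X given $T = t$ should be the same, namely uniform over the $\binom{n}{t}$ binary sequences with t ones.

```python
# Empirical check that P(X = x | T = t) is free of p for Bernoulli data.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
n, t, reps = 4, 2, 200_000

for p in (0.3, 0.7):
    X = rng.binomial(1, p, size=(reps, n))
    rows = X[X.sum(axis=1) == t]                 # keep samples with T = t
    counts = Counter(map(tuple, rows.tolist()))
    freqs = sorted(c / len(rows) for c in counts.values())
    print(p, freqs)                              # all close to 1/C(4,2) = 1/6
```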
Eg 2: $x_1, x_2, \ldots, x_n$ iid $N(\mu, 1)$.
Candidate sufficient statistic: $T = \sum_{i=1}^n x_i$.
Need to show $P(X = x \mid T = t)$ does not depend on $\mu$ for all t.
Claim: the conditional distribution of $x_1, \ldots, x_n$ given $T = t$ does not involve $\mu$ ... (1)
Given $T = t$ we have $x_n = t - \sum_{j=1}^{n-1} x_j$, so the conditional characteristic function satisfies
$$E\Big(e^{i \sum_{j=1}^n x_j u_j} \,\Big|\, T = t\Big) = E\Big(e^{i\big(\sum_{j=1}^{n-1} x_j u_j + u_n (t - \sum_{j=1}^{n-1} x_j)\big)} \,\Big|\, T = t\Big) \ \ldots (2)$$
so it suffices that the conditional distribution of $x_1, \ldots, x_{n-1}$ given $T = t$ does not involve $\mu$ ... (3).
(3) is true because $(x_1, \ldots, x_{n-1}, T)$ is jointly normal, so the conditional distribution is normal with mean and covariance free of $\mu$; (3) true $\Rightarrow$ (2) true $\Rightarrow$ (1) true.
(Fisher–Neyman criterion / factorization criterion)
Let X have density $f_X(x; \theta)$, $\theta \in H$. $T = T(X)$ is sufficient for $\theta$ based on X iff $f_X(x; \theta)$ can be expressed in the form
$$f_X(x; \theta) = g(T(x); \theta)\, h(x) \quad (*)$$
where h(x) does not depend on $\theta$.
Valid for densities of discrete & continuous r.v.s; even valid in more abstract settings.
eg: $x_1, \ldots, x_n$ iid $B(1, p)$, $p \in (0, 1)$:
$$f_X(x) = \prod_{i=1}^n p^{x_i} (1-p)^{1-x_i}\, I(x_i \in \{0,1\}) = \underbrace{p^{\sum x_i} (1-p)^{\,n - \sum x_i}}_{g(\sum x_i;\ p)} \cdot \underbrace{I(x_i \in \{0,1\},\ i = 1, \ldots, n)}_{h(x)}$$
$\sum x_i$ is a sufficient statistic by the factorization criterion.
eg 2: $x_1, \ldots, x_n$ iid $N(\mu, 1)$, $\mu \in \mathbb{R}$:
$$f_X(x) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} \exp\Big\{-\frac{(x_i - \mu)^2}{2}\Big\} = \underbrace{(2\pi)^{-n/2}\, e^{-\frac{1}{2}\sum_{i=1}^n x_i^2}}_{h(x)} \cdot \underbrace{e^{\mu \sum_{i=1}^n x_i}\, e^{-\frac{n\mu^2}{2}}}_{g(\sum x_i;\ \mu)}\ I(x \in \mathbb{R}^n)$$
$\sum x_i$ is sufficient for $\mu$ based on X.
Discrete case: see text.
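This factorization can be checked numerically (my own illustration, not from the notes): for two $N(\mu, 1)$ samples with the same $\sum x_i$, the likelihood ratio should be the same number for every $\mu$.

```python
# Likelihood ratio for N(mu, 1) samples with equal sums is constant in mu.
import numpy as np
from scipy.stats import norm

x = np.array([0.1, 2.3, -0.4])          # sum = 2.0
y = np.array([1.0, 1.0, 0.0])           # sum = 2.0 as well
for mu in (-1.0, 0.0, 2.5):
    lx = norm.pdf(x, loc=mu).prod()
    ly = norm.pdf(y, loc=mu).prod()
    print(mu, lx / ly)                  # same value for every mu
```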
Proof: we prove it when T is one-dimensional, under regularity conditions.
Suppose (*) holds. Let $T_1 = T$ and let $T_2, \ldots, T_n$ be such that the transformation from $x_1, \ldots, x_n$ to $T_1, \ldots, T_n$ has a non-vanishing Jacobian for its inverse. Then
$$f_T(t; \theta) = f_X(T^{-1}(t); \theta)\, |J(T^{-1})| = g(t_1; \theta) \cdot \underbrace{h(T^{-1}(t))\, |J(T^{-1})|}_{\text{doesn't involve } \theta}$$
We wish to show the distribution of $(x_1, \ldots, x_n)$ given $T_1 = t_1$ does not involve $\theta$.
First show $f_{T_2, T_3, \ldots, T_n \mid T_1}(t_2, t_3, \ldots, t_n \mid t_1)$ does not involve $\theta$. (Key!)
Then show the conditional cdf of $x_1, \ldots, x_n$ given $T_1 = t_1$ does not involve $\theta$.
$$f_{T_2, \ldots, T_n \mid T_1} = \frac{f_{T_1, \ldots, T_n}(t_1, \ldots, t_n)}{\int \cdots \int f_{T_1, \ldots, T_n}(t_1, \ldots, t_n)\, dt_2 \cdots dt_n} = \frac{g(t_1, \theta)\ h(T^{-1}(t))\, |J(T^{-1})|}{g(t_1, \theta) \int \cdots \int h(T^{-1}(t))\, |J(T^{-1})|\, dt_2 \cdots dt_n}$$
The factor $g(t_1, \theta)$ cancels, so this does not involve $\theta$.
Conversely, wlog assume $T_1, X_2, \ldots, X_n$ have a joint density which can be obtained via the Jacobian from $f_X(x; \theta)$. If $X \mid T_1 = t_1$ is independent of $\theta$, then $x_2, \ldots, x_n \mid T_1 = t_1$ is also independent of $\theta$.
$$f_X(x; \theta) = f_{T_1, x_2, \ldots, x_n}(t_1(x), x_2, \ldots, x_n;\ \theta) \cdot \underbrace{|J(T_1)|}_{\text{a function of } x;\ \text{does not depend on } \theta}$$
Using the definition of conditional density:
$$f_X(x; \theta) = \underbrace{f_{T_1}(t_1(x); \theta)}_{g(\theta, T):\ \text{may depend on } \theta,\ \text{depends on } x \text{ through } t_1(x)} \cdot \underbrace{f_{x_2, \ldots, x_n \mid T_1}(x_2, \ldots, x_n \mid t_1(x))\, |J|}_{h(x)}$$
so (*) holds.
Notes: (1) From the proof, we see that $g(T(x); \theta)$ is, except for a factor which might depend on x but not on $\theta$, the density of T.
(2) If T is k-dimensional, say $T = (T_1, \ldots, T_k)$, we say that $(T_1, \ldots, T_k)$ are jointly sufficient for $\theta$ based on X.
(3) If T is sufficient for $\theta$ based on X and if S is a 1-1 function, then S(T) is sufficient for $\theta$ based on X also (e.g. $(\sum x_i)^{1/3}$, $\frac{1}{\sum x_i}$).
(4) If $\theta$ is k-dimensional and T is sufficient for $\theta$ based on X, then the dimension of T can be $>$, $=$, or $<$ k.
eg: $x_1, x_2, \ldots, x_n$ iid $U(0, \theta)$, where $\theta \in (0, \infty)$:
$$f_X(x; \theta) = \prod_{i=1}^n \frac{1}{\theta}\, I(0 < x_i < \theta) = \frac{1}{\theta^n}\, I(x_{1:n} > 0) \cdot I(x_{n:n} < \theta)$$
By the factorization criterion, $X_{n:n}$ is sufficient for $\theta$ based on X. Try: $x_1, x_2, \ldots, x_n$ iid $U(\theta_1, \theta_2)$.
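A small sketch of the same point (mine, assuming only the $U(0, \theta)$ density above): two samples of the same size with the same maximum have identical likelihood functions.

```python
# U(0, theta) likelihood depends on the data only through max x_i.
import numpy as np

def lik(x, theta):
    x = np.asarray(x)
    return (x.min() > 0) * (x.max() < theta) / theta ** len(x)

x = [0.2, 1.7, 0.9]
y = [1.7, 1.1, 0.5]                      # same maximum, same n
for theta in (1.5, 2.0, 5.0):
    print(theta, lik(x, theta), lik(y, theta))   # equal for every theta
```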
Sufficiency in exponential families
Let $Q: (a, b) \to \mathbb{R}$, $H = (a, b)$.
Let r(x) be a non-negative function defined on $\mathbb{R}$.
Suppose that $r(x)\, e^{Q(\theta) T(x)}$ is integrable over $\mathbb{R}$ for $\theta \in (a, b)$.
Then $f(x, \theta) = c(\theta)\, r(x)\, e^{Q(\theta) T(x)}$ is a well-defined family of densities, called a one-parameter exponential family:
$$f(x, \theta) = c(\theta)\, r(x)\, e^{Q(\theta) T(x)} = e^{Q(\theta) T(x) + D(\theta) + B(x)}\, I(x \in A)$$
Note: $f(x, \theta) > 0$ iff $r(x) > 0$, so the support does not depend on $\theta$ (unlike $U(0, \theta)$).
Eg: $X \sim \mathrm{Poisson}(\lambda)$:
$$f_X(x, \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}\, I(x \in \mathbb{Z}^+) = \underbrace{e^{-\lambda}}_{c(\lambda)}\ e^{\overbrace{(\log \lambda)}^{Q(\lambda)} \overbrace{x}^{T(x)}}\ \underbrace{\frac{1}{x!}\, I(x \in \mathbb{Z}^+)}_{r(x)}$$
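As a check (my own, not in the notes), the three pieces reassemble the Poisson pmf:

```python
# c(lam) * r(x) * exp(Q(lam) * T(x)) with c = e^{-lam}, r = 1/x!, Q = log(lam), T(x) = x.
import math
from scipy.stats import poisson

lam = 3.7
for x in range(6):
    f = math.exp(-lam) * (1 / math.factorial(x)) * math.exp(math.log(lam) * x)
    print(x, f, poisson.pmf(x, lam))     # the two columns agree
```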
(1) If X has a one-parameter exponential family of densities with structure function T(x), then $T = T(X)$ also has a one-parameter exponential family of densities (with structure function the identity):
$$f_T(t; \theta) = c(\theta)\, r^*(t)\, e^{Q(\theta) t}$$
(2) If $x_1, x_2, \ldots, x_n$ are iid with common densities $f(x; \theta) = c(\theta)\, r(x)\, e^{Q(\theta) T(x)}$,
then $\sum_{i=1}^n T(x_i)$ has an exponential family of densities & is sufficient (minimal) for $\theta$ based on X.
Proof: suffices to prove in the case $T(x) = x$. Suppose that $x_1$ and $x_2$ are independent with
$$f_{x_i}(x, \theta) = c(\theta)\, r_i(x)\, e^{Q(\theta) x}, \quad \theta \in H.$$
Let $Y = x_1 + x_2$:
$$f_Y(y, \theta) = \int f_1(y - v; \theta)\, f_2(v; \theta)\, dv = \underbrace{c_1(\theta)\, c_2(\theta)}_{c^*(\theta) = [c(\theta)]^2}\ e^{Q(\theta) y} \underbrace{\int r_1(y - v)\, r_2(v)\, dv}_{r^*(y)}$$
By induction, continue with $(x_1 + x_2) + x_3,\ \ldots,\ (x_1 + x_2 + \cdots + x_n)$.
Eventually, $Y = \sum_{i=1}^n x_i$ will have an exponential family of densities, with
$$c^*(\theta) = [c(\theta)]^n, \qquad r^*(y) = \underbrace{(r * r * \cdots * r)}_{n\text{-fold convolution}}(y), \qquad Q^*(\theta) = Q(\theta), \qquad T(Y) = Y.$$
Proof (sufficiency):
$$f_X(x_1, \ldots, x_n) = [c(\theta)]^n \prod_{i=1}^n r(x_i)\ e^{Q(\theta) \overbrace{\sum_{i=1}^n T(x_i)}^{\text{sufficient}}}$$
so by the factorization criterion $\sum_{i=1}^n T(x_i)$ is sufficient.
1.15. If $x_1, x_2, \ldots, x_n$ iid $f_X(x, \theta) = c(\theta)\, r(x)\, e^{\overbrace{Q(\theta)}^{\text{invertible, for identifiability}} \overbrace{T(x)}^{\text{structure function}}}$,
then $\sum_{i=1}^n T(x_i)$ is sufficient for $\theta$ based on X and $\sum_{i=1}^n T(x_i)$ has an exponential family of densities with structure function $T_0(t) = t$.
Example: $x_1, x_2, \ldots, x_n$ iid $\mathrm{Beta}(1, \theta)$: what is a good sufficient statistic?
$$f_X(x, \theta) = \frac{(1 - x)^{\theta - 1}}{B(1, \theta)}\, I(0 < x < 1) = \underbrace{\frac{1}{B(1, \theta)}}_{c(\theta)}\ \underbrace{\frac{I(0 < x < 1)}{1 - x}}_{r(x)}\ e^{\overbrace{\theta \cdot \log(1 - x)}^{Q(\theta) T(x)}}$$
With $T(x) = \log(1 - x)$, a sufficient statistic for $\theta$ is $\sum_{i=1}^n \log(1 - x_i) = \log\big(\prod_{i=1}^n (1 - x_i)\big)$, or one could use $\prod_{i=1}^n (1 - x_i)$.
(If T is sufficient for $\theta$ based on x and if g is invertible, then g(T) is sufficient for $\theta$ based on X.)
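A numerical illustration (mine, not from the notes): the $\mathrm{Beta}(1, \theta)$ likelihood depends on the data only through $\sum \log(1 - x_i)$, so two samples matching on that statistic have a likelihood ratio that is constant in $\theta$.

```python
# Two Beta(1, theta) samples with equal sum(log(1 - x_i)) have theta-free ratio.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
x = rng.beta(1, 2.0, size=5)
s = np.log1p(-x).sum()                   # the sufficient statistic
y = 1 - np.exp(np.full(5, s / 5))        # constant sample with the same statistic
for theta in (0.5, 2.0, 7.0):
    lx = beta.pdf(x, 1, theta).prod()
    ly = beta.pdf(y, 1, theta).prod()
    print(theta, lx / ly)                # constant (here exactly 1) in theta
```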
Notes: (1) The dimension of a sufficient statistic can be $<$, $=$, or $>$ the dimension of the parameter space.
(2) A k-parameter exponential family of densities is of the form
$$f(x, \theta) = c(\theta)\, r(x)\, e^{\sum_{i=1}^k Q_i(\theta) T_i(x)} \quad \text{or} \quad \exp\Big(A(\theta) + B(x) + \sum_{i=1}^k Q_i(\theta) T_i(x)\Big)\, I(x \in D)$$
where $\theta \in \prod_{i=1}^m (a_i, b_i)$ (usually m = k), and the $Q_i(\theta)$ and the $T_i(x)$ should both be functionally independent.
Ex. $x_1, x_2, \ldots, x_n$ are iid $N(\theta_1, \theta_2)$, $\theta_1 \in \mathbb{R}$, $\theta_2 \in \mathbb{R}^+$:
$$f_X(x; \theta) = \frac{1}{\sqrt{2\pi\theta_2}} \exp\Big\{-\frac{(x - \theta_1)^2}{2\theta_2}\Big\} = \frac{1}{\sqrt{2\pi\theta_2}}\, e^{-\frac{\theta_1^2}{2\theta_2}}\ e^{\frac{\theta_1}{\theta_2} x}\ e^{-\frac{x^2}{2\theta_2}}$$
Some authors advocate reparameterization in terms of the natural parameters $\eta_1 = Q_1(\theta)$, $\eta_2 = Q_2(\theta)$, ..., $\eta_k = Q_k(\theta)$.
Curved exponential families: the dimension of $\theta$ in $(\eta_1, \eta_2, \ldots, \eta_k)$ is less than k. For the full exponential family we also assume that $\{(Q_1(\theta), Q_2(\theta), \ldots, Q_k(\theta)) : \theta \in H\} \subset \prod_{i=1}^m (a_i, b_i)$ contains a k-dim rectangle.
If $x_1, x_2, \ldots, x_n$ are iid with densities (*), then a sufficient statistic for $\theta$ based on X is $\big(\sum_{i=1}^n T_1(x_i),\ \sum_{i=1}^n T_2(x_i),\ \ldots,\ \sum_{i=1}^n T_k(x_i)\big)$.
(In fact, complete sufficient, minimal sufficient.)
If $x_1, x_2, \ldots, x_n$ are iid $N(\theta_1, \theta_2)$, then a sufficient statistic is $\big(\sum_{i=1}^n x_i,\ \sum_{i=1}^n x_i^2\big)$ or, applying a 1-1 transformation, $(\bar{x}, s^2)$.
Examples of sufficient stats: $x_1, x_2, \ldots, x_n$ iid with the following distributions have the following sufficient statistics:
(1) $\Gamma(\theta_1, \theta_2)$, $\theta_1 > 0$, $\theta_2 > 0$: $\big(\prod_{i=1}^n x_i,\ \sum_{i=1}^n x_i\big)$ or $\big(\sum_{i=1}^n \log x_i,\ \sum_{i=1}^n x_i\big)$
(2) $U(-\theta, \theta)$: $\max |x_i|$
(3) $f_X(x; \theta) \propto \frac{1}{\theta_2}\, e^{-x/\theta_2}\, I(0 < x < \theta_1)$, $\theta_2 \in (0, +\infty)$: $\big(\sum_{i=1}^n x_i,\ x_{n:n}\big)$
It is hard to use the factorization criterion to show that a statistic T is not sufficient. To show non-sufficiency, we should use the definition: the conditional distribution of X given $T = t$ does depend on $\theta$ for a non-trivial set of t's. Whether T(x) is sufficient depends on H.
For example (each line is a successively more compact sufficient statistic):
$H_1$: $x_1, x_2, \ldots, x_n$ iid $N(\theta, 1)$, $\theta \in \mathbb{R}$: the sample $x_1, x_2, \ldots, x_n$; the order statistics $x_{1:n}, x_{2:n}, \ldots, x_{n:n}$; $\sum_{i=1}^n x_i = n\bar{x}$.
$H_2$: $x_1, x_2, \ldots, x_n$ iid $U(-\theta, \theta)$, $\theta > 0$: the sample $x_1, x_2, \ldots, x_n$; the order statistics $x_{1:n}, x_{2:n}, \ldots, x_{n:n}$; $(x_{1:n}, x_{n:n})$; $\max |x_i| = \max(-x_{1:n}, x_{n:n})$.
Minimal sufficient statistic: T is a minimal sufficient statistic (based on $X \sim F(x; \theta)$) if T is sufficient for $\theta$ based on x and T is a function of every sufficient statistic based on x.
To identify a minimal sufficient statistic, we need to consider the minimal sufficient partition of $\mathcal{X}$ (the set of possible values of X).
For each $x \in \mathcal{X}$, let
$$D(x) = \{y : y \in \mathcal{X},\ f_X(y; \theta) = f_X(x; \theta)\, \underbrace{k(y, x)}_{\text{indep of } \theta}\}$$
(Note: if $y \in D(x)$, then $x \in D(y)$.) This defines a partition of $\mathcal{X}$.
For example, $x_1, x_2, \ldots, x_n$ iid $B(1, \theta)$ with n = 6: $\mathcal{X} = \{0, 1\}^6$, $|\mathcal{X}| = 2^6 = 64$.

x                       f(x; θ)         # of x in the cell
(1,1,1,1,1,1)           θ^6             1
(1,1,1,1,1,0), ...      θ^5 (1-θ)       6
...                     θ^4 (1-θ)^2     15
...                     θ^3 (1-θ)^3     20
...                     θ^2 (1-θ)^4     15
...                     θ (1-θ)^5       6
(0,0,0,0,0,0)           (1-θ)^6         1

— 7 partition cells covering all 64 points. D(x) is called the minimal sufficient partition.
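The partition can be enumerated directly (my own sketch, not in the notes): grouping $\{0,1\}^6$ by the criterion above, which for Bernoulli data reduces to equality of $\sum x_i$, reproduces the cell sizes in the table.

```python
# Enumerate the minimal sufficient partition for n = 6 Bernoulli trials.
from itertools import product
from collections import defaultdict

cells = defaultdict(list)
for x in product((0, 1), repeat=6):
    cells[sum(x)].append(x)              # sum(x) indexes the partition cell

print({t: len(v) for t, v in sorted(cells.items())})
# {0: 1, 1: 6, 2: 15, 3: 20, 4: 15, 5: 6, 6: 1} -- 64 points in 7 cells
```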
Any statistic T(x) which is constant within cells & takes different values in different cells defines a minimal sufficient statistic.
How must x and y be related if they are to be in the same cell of the minimal sufficient partition?
Must have $f_X(x, \theta)$ and $f_X(y, \theta)$ such that one is a constant multiple of the other. Usually, we can look at the ratio $\frac{f_X(x, \theta)}{f_X(y, \theta)}$.
eg: $x_1, x_2, \ldots, x_n$ iid $B(1, \theta)$. For $x, y \in \{0, 1\}^n$:
$$\frac{f_X(x, \theta)}{f_X(y, \theta)} = \frac{\theta^{x_1}(1 - \theta)^{1 - x_1} \cdots \theta^{x_n}(1 - \theta)^{1 - x_n}}{\theta^{y_1}(1 - \theta)^{1 - y_1} \cdots \theta^{y_n}(1 - \theta)^{1 - y_n}} = \Big(\frac{\theta}{1 - \theta}\Big)^{\sum_{i=1}^n x_i - \sum_{i=1}^n y_i}$$
This is indep of $\theta$ iff $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$.
Consider $T(x) = \sum_{i=1}^n x_i$: $\sum_{i=1}^n x_i$ is minimal sufficient for $\theta$ based on x.
eg: $x_1, x_2, \ldots, x_n$ iid $\mathrm{Cauchy}(\theta, 1)$:
$$f_X(x, \theta) = \prod_{i=1}^n \frac{1}{\pi(1 + (x_i - \theta)^2)} \quad \Rightarrow \quad \frac{f_X(x, \theta)}{f_X(y, \theta)} = \frac{\prod_{i=1}^n (1 + (y_i - \theta)^2)}{\prod_{i=1}^n (1 + (x_i - \theta)^2)}$$
If y is a permutation of x, then $\frac{f_X(x, \theta)}{f_X(y, \theta)}$ is indep of $\theta$; comparing the two polynomials in $\theta$ gives the converse, and that determines the minimal sufficient partition.
So $(x_{1:n}, x_{2:n}, \ldots, x_{n:n})$ is a minimal sufficient statistic.
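A numerical check of the permutation criterion (mine, not from the notes):

```python
# Cauchy(theta, 1) likelihood ratio: constant in theta iff y is a permutation of x.
import numpy as np
from scipy.stats import cauchy

x = np.array([0.3, -1.2, 2.5])
y_perm = np.array([2.5, 0.3, -1.2])       # permutation of x
y_other = np.array([0.3, -1.2, 2.4])      # not a permutation
for theta in (-1.0, 0.0, 3.0):
    lx = cauchy.pdf(x, loc=theta).prod()
    print(theta,
          lx / cauchy.pdf(y_perm, loc=theta).prod(),    # always 1
          lx / cauchy.pdf(y_other, loc=theta).prod())   # changes with theta
```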
1.20 Nice properties of estimates.
Unbiasedness (all Bayes estimates are biased)
$X \sim F_X(x; \theta)$, $\theta \in H$. Wish to estimate $g(\theta)$ based on X.
If $Y = h(X)$ is an estimate with finite mean, we can write $E_\theta(Y) = g(\theta) + b_Y(\theta)$, where $b_Y(\theta)$ is called the bias of Y as an estimate of $g(\theta)$.
If $b_Y(\theta) = 0$ $\forall \theta \in H$, then Y is said to be an unbiased estimate of $g(\theta)$ (and consequently, $g(\theta)$ is estimable).
eg: $x_1, x_2, \ldots, x_n$ iid $N(\mu, 1)$, $\mu \in \mathbb{R}$:
$Y = \frac{1}{n} \sum_{i=1}^n x_i$ is an unbiased estimate of $\mu$.
More generally, if $x_1, x_2, \ldots, x_n$ are iid with common mean $\mu$, then $\frac{1}{n} \sum_{i=1}^n x_i$ is an unbiased estimate of $\mu$.
If $x_1, x_2, \ldots, x_n$ are iid with common variance $\sigma^2$, then $s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2$ is an unbiased estimate of $\sigma^2$.
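A simulation check of this (my own, not in the notes), contrasting the $\frac{1}{n-1}$ and $\frac{1}{n}$ versions:

```python
# E[s^2] with ddof=1 matches sigma^2; the ddof=0 version is biased downward.
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, reps = 5, 4.0, 400_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
print(x.var(axis=1, ddof=1).mean())     # ~ 4.0   (unbiased)
print(x.var(axis=1, ddof=0).mean())     # ~ 4.0 * (n-1)/n = 3.2  (biased)
```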
Invariance in estimation
If we use T as an estimate of $g(\theta)$ and then wish to estimate $h(g(\theta))$, we would consider h(T) as a good estimate of $h(g(\theta))$.
eg: $x_1, x_2, \ldots, x_n$ iid $N(\mu, \sigma^2)$; wish to estimate $\sigma^2$ by $T = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2$.
Instead we might wish to estimate $\sigma$. Consider $\sqrt{T}$; but $E(\sqrt{T}) < \sigma$.
However, $\sqrt{T}$ multiplied by a suitable constant is unbiased for $\sigma$; the constant depends on the density.
Unbiasedness is not invariant: if T is unbiased for $g(\theta)$, h(T) is usually not unbiased for $h(g(\theta))$.
Unbiased estimates are not unique.
If $x_1, x_2, \ldots, x_n$ iid $N(\mu, 1)$:
$\bar{x}$ is unbiased for $\mu$;
$\frac{x_1 + 2x_2 + 3x_3}{6}$, $x_1$, $\frac{x_1 + x_2}{2}$, $x_{\frac{n+1}{2}:n}$ are all unbiased for $\mu$.
If $T_1, T_2, \ldots, T_k$ are unbiased estimates of $g(\theta)$ based on X, then $\sum_{i=1}^k a_i T_i$ is unbiased for $g(\theta)$ provided $\sum_{i=1}^k a_i = 1$.
Sometimes one could have $\emptyset = \{\text{unbiased estimates of } g(\theta)\}$.
eg: $X \sim \mathrm{bin}(1, p)$, $p \in (0, 1)$; wish to estimate $\log p$. Claim: no unbiased estimate exists.
Suppose $T = T(X)$ is an unbiased estimate of $\log p$ based on X. Then
$$\log p = E_p(T) = T(0)(1 - p) + T(1)p = T(0) + (T(1) - T(0))p,$$
a linear function of p — contradiction (#).
So, no matter what T you choose, $E_p(T)$ is linear in p while $\log p$ is not; therefore no unbiased estimate exists.
Proposition: suppose $X \sim F_X(x; \theta)$, $\theta \in H$, and X has only a finite number of possible values. Then no unbounded $g(\theta)$ is estimable.
Proof: For any estimate T, we have only a finite list of possible values $T_1, \ldots, T_N$, so
$$E_\theta(T) = \sum_{i=1}^N T_i\, P_\theta(T = T_i) \in [T_{1:N}, T_{N:N}]$$
is bounded, but $g(\theta)$ is unbounded, so there is no chance for $E_\theta(T) = g(\theta)$, and the proposition holds.
(E. Lehmann) The only unbiased estimate may be unreasonable.
eg: $X \sim P(\lambda)$; wish to estimate $e^{-2\lambda}$. Let T(X) be an unbiased estimate:
$$e^{-2\lambda} = \sum_{j=0}^\infty \frac{e^{-\lambda} \lambda^j}{j!}\, T(j) \quad \Rightarrow \quad e^{-\lambda} = \sum_{j=0}^\infty \frac{\lambda^j}{j!}\, T(j) \quad \text{for all } \lambda > 0$$
But $e^{-\lambda} = \sum_{j=0}^\infty \frac{(-\lambda)^j}{j!}$, so matching coefficients gives $T(j) = (-1)^j$, $j = 0, 1, 2, \ldots$:
$$T(X) = \begin{cases} 1 & \text{if } X \text{ is even}, \\ -1 & \text{if } X \text{ is odd}. \end{cases}$$
But $e^{-2\lambda} \in (0, 1)$, so the only unbiased estimate is absurd.
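A simulation (mine, not in the notes) makes the absurdity concrete: T is unbiased for $e^{-2\lambda}$ yet only ever takes the values ±1.

```python
# T(X) = (-1)^X: unbiased for e^{-2*lam}, but its range is {-1, 1}.
import numpy as np

rng = np.random.default_rng(3)
lam, reps = 1.0, 1_000_000
x = rng.poisson(lam, size=reps)
t = (-1.0) ** x
print(t.mean(), np.exp(-2 * lam))       # both ~ 0.135
print(np.unique(t))                     # [-1.  1.]
```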
How do we pick among unbiased estimates? Step 1: sufficiency.
Suppose that $T_1$ is an unbiased estimate of $g(\theta)$ based on X and that S is the (minimal) sufficient statistic for $\theta$ based on X.
Define $T_2 = E(T_1 \mid S)$; obviously it is a function of S.
It is also a statistic, because $T_1$ is a function of X and S is a sufficient statistic for $\theta$ (so the conditional expectation does not involve $\theta$).
Usually it is an estimate (e.g. when the set of possible values of $g$ is convex).
What about unbiasedness? $E_\theta(T_2) = E_\theta(E(T_1 \mid S)) = E_\theta(T_1) = g(\theta)$.
So $T_2$ is also unbiased.
If we are lucky, the set of unbiased estimates that are functions of S contains only one estimate.
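A simulation sketch of this conditioning step (mine, not in the notes) for $N(\mu, 1)$ data, where $T_1 = x_1$, $S = \bar{x}$, and $E(T_1 \mid S) = \bar{x}$: conditioning keeps the mean but shrinks the variance from 1 to $\frac{1}{n}$.

```python
# Rao-Blackwell-style improvement: condition x_1 on the sufficient statistic xbar.
import numpy as np

rng = np.random.default_rng(4)
mu, n, reps = 2.0, 10, 200_000
x = rng.normal(mu, 1.0, size=(reps, n))
t1 = x[:, 0]                  # unbiased, var = 1
t2 = x.mean(axis=1)           # E(T1 | xbar) = xbar: unbiased, var = 1/n
print(t1.mean(), t1.var())    # ~ 2.0, ~ 1.0
print(t2.mean(), t2.var())    # ~ 2.0, ~ 0.1
```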


Completeness
Let $\{F(x; \theta),\ \theta \in H\}$ be a family of distributions. The family is said to be complete iff for any measurable function g:
$$\Big[E_\theta(g(X)) = 0,\ \forall \theta \in H\Big] \ \Rightarrow\ \Big[P_\theta(g(X) = 0) = 1,\ \forall \theta \in H\Big] \quad (*)$$
The family is said to be boundedly complete if, for any bounded g, (*) holds.
If $E_\theta(h(X)) = 0$ $\forall \theta$, we can call h(X) an unbiased estimate of 0.
The family is complete iff there are no non-trivial unbiased estimates of 0.
Complete $\Rightarrow$ boundedly complete.
Suppose $\{F_\theta,\ \theta \in H\}$ is complete & suppose $g_1(x)$ and $g_2(x)$ are unbiased estimates of $h(\theta)$. Then $g(x) = g_1(x) - g_2(x)$ satisfies $E_\theta(g(X)) = 0$ $\forall \theta$, so $g_1 = g_2$ a.e. $P_\theta$: the unbiased estimate is essentially unique.
Eg: $X \sim U(0, \theta)$, $\theta \in (0, \infty)$.
Suppose $E_\theta(g(X)) = 0$ $\forall \theta$:
$$\int_0^\theta \frac{1}{\theta}\, g(x)\, dx = 0 \quad \forall \theta \quad (*) \qquad \Big(\int_a^b f(x)\, dx = 0\ \forall (a, b) \ \Rightarrow\ f(x) = 0 \text{ a.e. } x\Big)$$
so $g(x) = 0$ a.e. x. (Differentiating (*) where possible: $g(\theta) = 0$, $\forall \theta > 0$, i.e. $g(x) = 0$, $x > 0$.)
eg 2: $f(x; \theta) = c(\theta)\, r(x)\, e^{\theta x}$ where $-\infty < \theta < \infty$.
Suppose $E_\theta(g(X)) = 0$, $\forall \theta \in (-\infty, \infty)$:
$$\int c(\theta)\, r(x)\, e^{\theta x}\, g(x)\, dx = 0 \quad \Rightarrow \quad \int e^{\theta x}\, r(x)\, g(x)\, dx = 0, \quad \forall \theta \in (-\infty, \infty).$$
This is just the Laplace transform of $g(x) r(x)$; by uniqueness of Laplace transforms, $g(x) r(x) = 0$ a.e., i.e. $P_\theta(g(X) = 0) = 1$, $\forall \theta \in (-\infty, \infty)$.
In general, in a k-dim exponential family, the minimal sufficient statistic based on a sample of size n (for any n) has a complete family of distributions provided $\{(Q_1(\theta), Q_2(\theta), \ldots, Q_k(\theta)) : \theta \in H\}$ contains an open set (usually a k-dim rectangle); otherwise often not.
Eg: $x_1, x_2, \ldots, x_n$ iid $f(x; \theta)$: is the family of distributions of $X = (x_1, \ldots, x_n)$ complete?
Consider $g(x) = x_1 - x_2$: $E_\theta(g(X)) = 0$ $\forall \theta$, but $P_\theta(g(X) = 0) \neq 1$, so the family of distributions of X is not complete.
Questions, Friday 1-23-10
(1) Rohatgi p. 364, example 9:
$$f(x; \theta) = \frac{1}{\theta}\, I\Big(x \in \Big(-\frac{\theta}{2}, \frac{\theta}{2}\Big)\Big),\ \theta > 0 \quad \Rightarrow \quad f_X(x; \theta) = \frac{1}{\theta^n}\, I\Big(x \in \Big\{x : -\frac{\theta}{2} \le x_{1:n},\ x_{n:n} \le \frac{\theta}{2}\Big\}\Big) = g((x_{1:n}, x_{n:n}); \theta)$$
Compare: $f(x; \theta) = \frac{1}{\theta}\, I(x \in (0, \theta))$, $\theta > 0$:
$$f_X(x; \theta) = \frac{1}{\theta^n}\, I(x_{1:n} > 0)\, I(x_{n:n} < \theta) = h(x)\, g(x_{n:n}; \theta)$$
(2) $N(\theta, \theta^2)$:
$$f_X(x; \theta) = \frac{1}{\sqrt{2\pi\theta^2}}\, e^{\frac{1}{\theta} x - \frac{1}{2\theta^2} x^2 - \frac{1}{2}}$$
with $Q_1(\theta) = \frac{1}{\theta}$: is $Q_2$ a function of $Q_1$? Yes: $Q_2(\theta) = -\frac{1}{2\theta^2} = -\frac{1}{2} Q_1(\theta)^2$ (a curved exponential family).
Definition. Sufficiency
Let $x = (x_1, x_2, \ldots, x_n)$ be a sample from — or have a family of possible distributions — $\{F_X(x; \theta),\ \theta \in H\}$.
A statistic $T = T(X)$ is SUFFICIENT for $\theta$ (or for the family of distributions) based on x, iff each conditional distribution of x given $T = t$ is independent of $\theta$ (except perhaps for a null set A with $P_\theta(T \in A) = 0$ for all $\theta \in H$).
January 11/09 Class
Data: X = x
Model: $X \sim F(x; \theta)$, $\theta \in H$
Goal: given a model $F(x; \theta)$, we have to decide the true value of $\theta$ based on x.
We know H; we do not know which $\theta$ in H.
Classical (objective) vs Bayesian (subjective):
- estimation (point, interval) — additional information for the Bayesian: a prior distribution $f_\Theta(\theta)$ of $\theta$
- hypothesis testing
- selection, ranking, ...
Inference: trying to infer properties of a population based on a sample.
$\hat{\theta}_1 = \frac{x}{n}$: if n is small one may not prefer it; constant estimates such as $\hat{\theta}_2 = \frac{1}{2}$ or $\hat{\theta}_3 = c$ are good when $\theta$ happens to be near $\frac{1}{2}$ or c.
$x_1, x_2, \ldots, x_n$ iid $N(\mu, 1)$.
Take the sufficient statistic $\bar{x}$; then $\bar{x} \sim N(\mu, \frac{1}{n})$.
Knowing $\bar{x}$, we can generate a sample $x_1^*, x_2^*, \ldots, x_n^*$ with the same distribution $N(\mu, 1)$ (the conditional distribution given $\bar{x}$ does not involve $\mu$).
January 11 Class
On classes of estimates of $\nu$: $\{\delta(x) : \delta\}$
Example
(i) $x_1, x_2, \ldots, x_n$ iid $N(\mu, 1)$, $\mu \in \mathbb{R}$; we wish to estimate $\mu$.
Candidates: $\bar{x} = \frac{1}{n} \sum x_i$, or $x_{\frac{n+1}{2}:n}$, or $x_1$, or $x_1^2$. Not $\frac{x_1 + a}{2}$: if a is unknown, then $\frac{x_1 + a}{2}$ is not a statistic.
(ii) $x_1, x_2, \ldots, x_n$ iid $N(\theta_1, \theta_2)$, $\theta_1 \in \mathbb{R}$, $\theta_2 \in \mathbb{R}^+$; we wish to estimate $(\theta_1, \theta_2)$.
Candidates: $(\bar{x}, s^2)$ with $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$; or $\Big(x_{\frac{n+1}{2}:n},\ \frac{(x_{n:n} - x_{1:n})^2}{16}\Big)$.
Which estimate is best?
It depends on how the estimate will be used — on the cost of error.
Might consider loss functions and the corresponding risk:
$l(t, \nu)$ — the cost of estimating $\nu$ by t; risk $= E_\theta(l(T, \nu))$.
* The problem is to write down the loss function.
CLASS 02 January
* If T = t is sufficient, we can reproduce the data without knowledge of the parameter $\theta$, because the conditional distribution of x does not depend on $\theta$.
* T is a compression of the sample; we would like the most compact one. This is the minimal sufficient statistic.
* Definition: the same (as before).
* Example: $x_1, x_2, \ldots, x_n$ iid $B(1, p)$, $p \in (0, 1)$; let $T = \sum x_i$; note $T \sim \mathrm{bin}(n, p)$.
Is T sufficient for p based on x?
For $t \in \{0, 1, 2, \ldots, n\}$, consider
$$P(X = x \mid T = t) = \frac{P(X = x, T = t)}{P(T = t)} = \frac{P(X = x)\, I(x_i \in \{0,1\},\ i = 1, 2, \ldots, n,\ \sum_{i=1}^n x_i = t)}{\binom{n}{t} p^t (1-p)^{n-t}}$$
(e.g. n = 6, t = 4: the $\binom{6}{4}$ points $(1,1,1,1,0,0),\ (1,1,1,0,1,0),\ \ldots$ all get the same conditional probability)
$$= \frac{1}{\binom{n}{t}}\, I\Big(x_i = 0, 1;\ \sum x_i = t\Big)$$
T is sufficient for p based on x.
Example 2: $x_1, x_2, \ldots, x_n$ iid $N(\mu, 1)$; candidate sufficient statistic $T = \sum x_i$; claim ①?
Fisher–Neyman criterion, or factorization criterion:
Eg 1: $x_1, x_2, \ldots, x_n$ iid $B(1, p)$, $p \in (0, 1)$:
$$f_X(x) = \prod_{i=1}^n p^{x_i} (1 - p)^{1 - x_i}\, I(x_i \in \{0, 1\}) = \underbrace{p^{\sum x_i} (1 - p)^{\,n - \sum x_i}}_{g(\sum x_i,\ p)}\ \underbrace{I(x_i \in \{0, 1\},\ i = 1, \ldots, n)}_{h(x)}$$
eg 2: $x_1, x_2, \ldots, x_n$ iid $N(\mu, 1)$, $\mu \in \mathbb{R}$:
$$f_X(x) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x_i - \mu)^2}\, I(x_i \in \mathbb{R}) = \Big(\frac{1}{\sqrt{2\pi}}\Big)^n e^{-\frac{1}{2} \sum (x_i - \mu)^2}\, I(x_i \in \mathbb{R})$$
$$= \Big(\frac{1}{\sqrt{2\pi}}\Big)^n e^{-\frac{1}{2} \sum (x_i^2 - 2\mu x_i + \mu^2)}\, I(x_i \in \mathbb{R}) = \Big(\frac{1}{\sqrt{2\pi}}\Big)^n e^{-\frac{1}{2} \sum x_i^2}\ e^{\mu \sum x_i}\ e^{-\frac{n\mu^2}{2}}\, I(x_i \in \mathbb{R})$$
Factorization criterion $\Rightarrow$ $T = \sum x_i$ is sufficient for $\mu$.
Factorization criterion proof: OK (see above).
Why do we use the characteristic function? (ask professor)
Notes (1)? ask professor.
January 11, 1.1
Data.
$X = (x_1, x_2)$: $\Omega \to \mathbb{R} \times \mathbb{R}$, $w \mapsto (w_1, w_2) \mapsto (x_1, x_2)$;
$X = (x_1, \ldots, x_n)$: $\Omega \to \mathbb{R}^n$, $(w_1, \ldots, w_n) \mapsto (x_1, \ldots, x_n)$.
$X \sim F_X(x; \theta)$; $(x_1, x_2) \sim f_{(x_1, x_2)}(x_1, x_2; \theta)$
$X \sim$ Poisson: $f_X(x; \theta)$, $\theta = \lambda$, $H = \{\lambda : \lambda > 0\}$
$X \sim$ Normal: $f_X(x; \mu, \sigma^2)$, $\theta = (\mu, \sigma^2)$, $H = \{(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 > 0\}$
Eg: Toss a coin n times.
X = # of heads appearing in the n tosses.
$X \sim \mathrm{bin}(n, p)$, or $X = \sum x_i$ with
$$x_i = \begin{cases} 1 & \text{if head} \\ 0 & \text{otherwise} \end{cases}$$
where marginally $x_i \sim \mathrm{Bernoulli}(p)$.
p = ? $X = \sum x_i \sim \mathrm{bin}(n, p)$
$\theta = p$, $H = \{p : 0 \le p \le 1\}$. Note: 1. estimate: realized value of the estimator; 2. estimator: random variable.
How should we estimate p? Pick some function of X to estimate p.
$T_1 = \frac{X}{n}$: $E(T_1) = p$.
$$E[T_1] = E\Big[\frac{X}{n}\Big] = \frac{1}{n}\, E[X] = \frac{np}{n} = p$$
$$\mathrm{var}[T_1] = \mathrm{var}\Big[\frac{X}{n}\Big] = \frac{1}{n^2}\, \mathrm{var}(X) = \frac{1}{n^2}\, np(1 - p) = \frac{p(1-p)}{n}$$
January 11 class
$T_2 = \frac{1}{e}$, $T_3 = c$ (constant estimates). If p is close to $\frac{1}{e}$ we prefer $T_2 = \frac{1}{e}$.
We prefer $T_1$ to $T_2, T_3$, but we cannot say $T_1$ is better than $T_2, T_3$: in some circumstances $T_2, T_3$ might be closer to p.
Let
$$T_4 = \begin{cases} \frac{1}{2} & \text{if } \big|\frac{X}{n} - \frac{1}{2}\big| < 0.02 \\ \frac{X}{n} & \text{if } \big|\frac{X}{n} - \frac{1}{2}\big| \ge 0.02 \end{cases}$$
Model: suppose $X = (x_1, x_2, \ldots, x_n)$ has distribution $f_X(x; \theta)$, $\theta \in H$, H = parameter space.
$\theta$ is unknown except that $\theta \in H$ (k-dim parameter space).
Eg:
$X = (x_1, \ldots, x_n)$, $x_i \sim N(\mu, \sigma^2)$; $\theta = (\mu, \sigma^2)$; $H = \{(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 > 0\}$ (k-dim parameter space).
Let $\nu = h(\theta)$ be some function of $\theta$, e.g. $\nu = \mu + 3\sigma = h(\mu, \sigma^2)$; we wish to estimate $\nu$.
Define $N = \{\nu : \nu = h(\theta) \text{ for some } \theta \in H\}$. An estimate of $\nu$ is any statistic (any function of x alone) (sometimes $\nu = \theta$ or $\nu = \sqrt{\theta}$).
Identifiability assumption:
$$\theta \neq \theta' \ \Rightarrow\ F_X(x; \theta) \neq F_X(x; \theta')$$
Example where it fails: $x = x_0 + x_1$ with $x_0 \sim N(\mu_0, 1)$, $x_1 \sim N(\mu_1, 1)$ independent, $\theta = (\mu_0, \mu_1)$:
$x_0 \sim N(3, 1)$, $x_1 \sim N(2, 1)$ gives $x \sim N(5, 2)$; $x_0 \sim N(2, 1)$, $x_1 \sim N(3, 1)$ also gives $x \sim N(5, 2)$ — $\theta$ is not identifiable.
Often x is a random sample from $F(x; \theta)$, $\theta \in H$:
$x = (x_1, \ldots, x_n)$ iid; H = parameter space; $\mathcal{X}$ = possible values of X; estimates: all functions from $\mathcal{X}$ to N.
eg: $\{x_i\}_1^n$ iid $N(\mu, 1)$, $\mu \in \mathbb{R}$: $\hat{\mu}$ = ? $\bar{x}$, $x_{\frac{n+1}{2}:n}$, $\sqrt{\frac{x_1^2 + x_2^2}{2}}$, ...
eg: $\{x_i\}_1^n$ iid $N(\mu, \sigma^2)$, $\mu \in \mathbb{R}$, $\sigma^2 \in \mathbb{R}^+$: $(\hat{\mu}, \hat{\sigma}^2)$ = ? $(\bar{x}, s^2)$, $\big(x_{\frac{n+1}{2}:n},\ \frac{R^2}{4}\big)$ (R the sample range), ...
Which estimate should be used? We want T such that $E_\theta(T - h(\theta))^2$ is uniformly small: $T \approx h(\theta)$.
Steps (Fisher):
- Cut out all non-informative aspects of the data.
- Pick T(x) to simplify, without loss of information, in a way that does not depend on $\theta$.
eg: $x_i$ iid $N(\mu, 1)$: $\bar{x}$ is sufficient for $\mu$ based on $(x_1, \ldots, x_n)$.
Sufficiency:
Definition (Rohatgi)
Let $x = (x_1, \ldots, x_n)$ be a sample from $\{f_\theta : \theta \in H\}$. A statistic T = T(x) is sufficient for $\theta$, or for the family of distributions $\{f_\theta : \theta \in H\}$, if and only if the conditional distribution of x, given T = t, does not depend on $\theta$ (except perhaps for a null set A, $P_\theta\{T \in A\} = 0$ for all $\theta$).
Definition (Arnold)
Let x have a family of possible distributions $\{F_X(x; \theta)\}$.
A statistic T = T(x) is sufficient for $\theta$ based on x iff each conditional distribution of x given $T = t$ is independent of $\theta$ (except perhaps for a null set A with $P_\theta(T \in A) = 0$, $\forall \theta \in H$).
What is one family?
Binomial(1, p): $\{F_X(x;\ \theta = p)\}$
Binomial(n, p): $\{F_X(x;\ \theta = (n, p))\}$
eg: $x_1, \ldots, x_n$ iid binomial(1, p). $x = (x_1, \ldots, x_n)$ has a family of possible distributions
$$\{f_X(x;\ \theta = p)\} = \{f_X(x;\ \theta = 0.5),\ f_X(x;\ \theta = 0.1),\ f_X(x;\ \theta = 0.2),\ f_X(x;\ \theta = 0.8),\ \ldots\}$$
$$f_X(x) = f_{x_1}(x_1) \cdots f_{x_n}(x_n) \quad (x_i \text{ are indep})$$
$$= p^{x_1}(1 - p)^{1 - x_1} \cdots p^{x_n}(1 - p)^{1 - x_n} = p^{\sum x_i} (1 - p)^{\,n - \sum x_i}\, I(x_i \in \{0, 1\},\ i = 1, 2, \ldots, n)$$
Let $T = \sum_1^n x_i \sim \mathrm{binomial}(n, p)$. Why?
Theorem 1.
The MGF uniquely determines a DF and, conversely, if the MGF exists, it is unique.
$$M_{T = \sum x_i}(t) = E(e^{t \sum x_i}) = E(e^{t x_1 + t x_2 + \cdots + t x_n}) = E(e^{t x_1} e^{t x_2} \cdots e^{t x_n}) = E e^{t x_1}\, E e^{t x_2} \cdots E e^{t x_n} = m_{x_1}(t)\, m_{x_2}(t) \cdots m_{x_n}(t)$$
x
(t) =? x binomial (n,p)
m
x
(t) = E(e
tx
) =

n
i=0
e
tx
p
x
(X = x)
=

n
i=0
e
tx
_
n
x
_
p
x
(1 p)
nx
=

n
i=0
_
n
x
_
(e
t
p)
x
(1 p)
nx
Binomial theorem (a +b)
n
= sum
n
x=0
_
n
x
_
a
x
b
nx
= (e
t
p + (1 p))
n
If x bin (1,p) M
x
(t) = (e
t
p + (1 p))
1
. Then
$$M_{T = \sum x_i}(t) = M_{x_1}(t)\, M_{x_2}(t) \cdots M_{x_n}(t) = (e^t p + (1 - p)) \cdot (e^t p + (1 - p)) \cdots (e^t p + (1 - p)) = (e^t p + (1 - p))^n$$
By the theorem this is the MGF of a variable $X \sim \mathrm{Bin}(n, p)$.
Therefore $T = \sum x_i \sim \mathrm{Bin}(n, p)$, for $t \in \{0, 1, \ldots, n\}$.
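A symbolic check of the MGF argument (mine, not in the notes, using sympy):

```python
# Product of n Bernoulli(p) MGFs equals the Binomial(n, p) MGF.
import sympy as sp

t, p = sp.symbols('t p')
n = 6
bern_mgf = sp.exp(t) * p + (1 - p)
prod_mgf = bern_mgf ** n                      # product of n identical factors
x = sp.symbols('x', integer=True)
binom_mgf = sp.summation(sp.binomial(n, x) * (sp.exp(t) * p) ** x
                         * (1 - p) ** (n - x), (x, 0, n))
print(sp.expand(prod_mgf - binom_mgf))        # 0
```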
Now, is T sufficient for $\theta = p$?
Note: we have $x = (x_1, \ldots, x_n)$, $x_i \sim \mathrm{binom}(1, p)$. Here $\theta = p$ and $T = \sum x_i$ (a one-dimensional case).
By definition we have:
If $P(X = x \mid T = t)$ is independent of $\theta$, then T is sufficient for $\theta = p$.
We know that $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, $P(B) > 0$:
$$P(X = x \mid T = t) = \frac{P(X = x, T = t)}{P(T = t)}$$
Notice: $P(X = x, T = t) = P(\{X = x\} \cap \{T = t\})$.
What are the events $\{X = x\}$ and $\{T = t\}$?
$X: \Omega \to \mathbb{R}^n$, $w \mapsto (x_1(w), \ldots, x_n(w))$; and $T = \sum x_i: \mathbb{R}^n \to \mathbb{R}$.
$$\{X = x\} = \{(x_1, \ldots, x_n) \in \mathbb{R}^n : X(w) = (x_1, \ldots, x_n)\}$$
$$\{T = \textstyle\sum x_i = t\} = \{(x_1, \ldots, x_n) \in \mathbb{R}^n : T(x_1, \ldots, x_n) = \textstyle\sum x_i = t\}$$
If $\sum x_i = t$, then $\{X = x\} \cap \{T = t\} = \{X = x\}$, so $P\{X = x, T = t\} = P\{X = x\}$ for t = 0, 1, ..., n.
Now
$$P(X = x \mid T = t) = \frac{P(X = x, T = t)}{P(T = t)} = \frac{P(X = x)}{P(T = t)}\, I\Big(x_i = 0 \text{ or } 1,\ \textstyle\sum x_i = t\Big)$$
• $P(X = x) = P\{x_1 = x_1,\ x_2 = x_2,\ \ldots,\ x_n = x_n\} = \prod_{i=1}^n P(x_i = x_i)$ (* the $x_i$ are independent).
$P(x_i = x_i)$ = ? $x_i \sim \mathrm{bin}(1, p)$: $P(x_i = x_i) = p^{x_i}(1 - p)^{1 - x_i}$
$$P(X = x) = \prod_{i=1}^n P(x_i = x_i) = \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i} = p^{x_1}(1 - p)^{1 - x_1}\, p^{x_2}(1 - p)^{1 - x_2} \cdots p^{x_n}(1 - p)^{1 - x_n}$$
$$= p^{\sum x_i}(1 - p)^{\,n - \sum x_i} = p^t (1 - p)^{n - t}, \qquad t = \textstyle\sum x_i.$$
• $P(T = t)$ = ? $T = \sum x_i \sim \mathrm{binomial}(n, p)$:
$$P(T = t) = \binom{n}{t} p^t (1 - p)^{n - t}$$
Then, for $\sum x_i = t$, $t = 0, 1, 2, \ldots, n$:
$$P(X = x \mid T = t) = \frac{p^t (1 - p)^{n - t}}{\binom{n}{t} p^t (1 - p)^{n - t}}\, I\Big(x_i \in \{0, 1\}, \text{ with } \sum_{i=1}^n x_i = t\Big) = \frac{1}{\binom{n}{t}}\, I\Big(x_i \in \{0, 1\}, \text{ with } \sum_{i=1}^n x_i = t\Big)$$
Since, for each t, $P(X = x \mid T = t)$ does not depend on p, T is sufficient for p based on x.
eg: $x = (x_1, \ldots, x_n)$; x has a family of possible distributions $\{F_X(x, \theta) : \theta \in H\}$, where $F_X$ is the distribution of $x_i \sim U(0, \theta)$.
Claim: $T = X_{n:n}$ is sufficient for $\theta$ based on x.
Does $P(X \le x \mid X_{n:n} = t)$ depend on $\theta$? Heuristically,
$$P(X \le x \mid X_{n:n} = t) = \frac{P(X \le x,\ x_{n:n} = t)}{P(x_{n:n} = t)}$$
• $P(X \le x,\ x_{n:n} = t)$ = ?
The event $\{X \le x\} = \{x_1 \le x_1,\ x_2 \le x_2,\ \ldots,\ x_n \le x_n\}$, and by independence
$$P(X \le x) = P(x_1 \le x_1)\, P(x_2 \le x_2) \cdots P(x_n \le x_n) = F_{x_1}(x_1) \cdots F_{x_n}(x_n)$$
$F_X(x)$ = ? If $x \sim U(a, b)$: $f_X(x) = \frac{1}{b - a}\, I(x \in (a, b))$ and $F_X(x) = \frac{x - a}{b - a}$ for $x \in (a, b)$.
Here $x \sim U(0, \theta)$, so $F_X(x) = \frac{x}{\theta}$. Then
$$P(X \le x) = F_{x_1}(x_1) \cdots F_{x_n}(x_n) = \frac{x_1}{\theta} \cdots \frac{x_n}{\theta} = \frac{1}{\theta^n} \prod x_i$$
• $P(x_{n:n} \le t) = P(x_1 \le t,\ x_2 \le t,\ \ldots,\ x_n \le t) = P(x_1 \le t)\, P(x_2 \le t) \cdots P(x_n \le t) = F_{x_1}(t)\, F_{x_2}(t) \cdots F_{x_n}(t)$
Remember $F_X(x) = \frac{x}{\theta}$ if $x \sim U(0, \theta)$; then $F_X(t) = \frac{t}{\theta}$, and
$$P(x_{n:n} \le t) = \frac{t}{\theta} \cdots \frac{t}{\theta} = \Big(\frac{t}{\theta}\Big)^n$$
So, heuristically,
$$P(X \le x \mid x_{n:n} = t) = \frac{\frac{1}{\theta^n} \prod_{i=1}^n x_i}{\frac{t^n}{\theta^n}}\, I(x_{n:n} = t) = \frac{\prod_{i=1}^n x_i}{t^n}\, I(x_{n:n} = t)$$
Then $x_{n:n}$ is a sufficient statistic for $\theta$ based on x, because $P(X \le x \mid x_{n:n} = t)$ does not depend on $\theta$.
MORE EXAMPLES AND EXERCISES
Example 4 (p. 360): Let $x_1, x_2$ be iid $P(\lambda)$ rvs. Then $T = x_1 + x_2$ is sufficient for $\lambda$ based on $x = (x_1, x_2)$.
$$P\{X = x \mid T = t\} = P\{X = (x_1, x_2) \mid T = x_1 + x_2 = t\} = P\{x_1 = x_1,\ x_2 = x_2 \mid T = x_1 + x_2 = t\}$$
$$= \begin{cases} \dfrac{P\{x_1 = x_1,\ x_2 = x_2,\ T = x_1 + x_2 = t\}}{P(x_1 + x_2 = t)} & \text{if } t = x_1 + x_2,\ x_i = 0, 1, 2, \ldots \\[4pt] 0 & \text{otherwise} \end{cases}$$
$$= \frac{P\{x_1 = x_1,\ x_2 = t - x_1\}}{P(x_1 + x_2 = t)}\, I\Big(x_i \in \{0, 1, \ldots\} \text{ with } \sum_{i=1}^2 x_i = t\Big)$$
$x_1$ and $x_2$ are independent, so
• $P(x_1 = x_1,\ x_2 = t - x_1) = P(x_1 = x_1)\, P(x_2 = t - x_1)$
We know that $P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}\, I(x \in \{0, 1, \ldots\})$, so
$$P(x_1 = x_1,\ x_2 = t - x_1) = \frac{e^{-\lambda} \lambda^{x_1}}{x_1!} \cdot \frac{e^{-\lambda} \lambda^{t - x_1}}{(t - x_1)!}$$
• $P(x_1 + x_2 = t)$ = ? What is the distribution of $T = x_1 + x_2$?
• By MGF (Moment Generating Function):
$$m_{T = x_1 + x_2}(t) = E(e^{t(x_1 + x_2)}) = E(e^{t x_1 + t x_2}) = E(e^{t x_1} e^{t x_2}) = E(e^{t x_1})\, E(e^{t x_2}) \quad (*\ x_1 \text{ and } x_2 \text{ are independent})$$
$$= m_{x_1}(t)\, m_{x_2}(t)$$
$m_{x_1}(t)$ = ? $m_{x_2}(t)$ = ? We know that $x_1 \sim \mathrm{poiss}(\lambda)$ and $x_2 \sim \mathrm{poiss}(\lambda)$. If $X \sim \mathrm{poiss}(\lambda)$:
$$m_X(t) = E(e^{tX}) = \sum_{x=0}^\infty e^{tx} f(x) = \sum_{x=0}^\infty e^{tx}\, \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^\infty \frac{(e^t \lambda)^x}{x!}$$
We know that $e^u = \sum_{x=0}^\infty \frac{u^x}{x!}$ (remember Taylor's theorem: $f(x) = \sum_{i=0}^\infty \frac{f^{(i)}(a)}{i!} (x - a)^i$), so
$$m_X(t) = e^{-\lambda}\, e^{\lambda e^t} = e^{\lambda(e^t - 1)}$$
Then
$$m_{x_1 + x_2}(t) = m_{x_1}(t)\, m_{x_2}(t) = e^{\lambda(e^t - 1)}\, e^{\lambda(e^t - 1)} = e^{2\lambda(e^t - 1)}$$
This is the MGF of $X \sim \mathrm{poisson}(2\lambda)$. Therefore $P(x_1 + x_2 = t) = \frac{e^{-2\lambda} (2\lambda)^t}{t!}$.
$$P\{X = x \mid T = t\} = \frac{P\{x_1 = x_1,\ x_2 = t - x_1\}}{P(x_1 + x_2 = t)} = \frac{P(x_1 = x_1)\, P(x_2 = t - x_1)}{P(x_1 + x_2 = t)} = \frac{\frac{e^{-\lambda} \lambda^{x_1}}{x_1!} \cdot \frac{e^{-\lambda} \lambda^{t - x_1}}{(t - x_1)!}}{\frac{e^{-2\lambda} (2\lambda)^t}{t!}}\, I_A(X)$$
$$A = \{(x_1, x_2) \in \mathbb{R}^2 : x_i = 0, 1, 2, \ldots;\ i = 1, 2;\ t = x_1 + x_2\}$$
$$= \frac{\frac{e^{-2\lambda} \lambda^t}{x_1! (t - x_1)!}}{\frac{e^{-2\lambda}\, 2^t \lambda^t}{t!}}\, I_A(X) = \frac{t!}{2^t\, x_1! (t - x_1)!}\, I_A(X) = \binom{t}{x_1} \Big(\frac{1}{2}\Big)^t I_A(X)$$
which does not depend on $\lambda$ (it is the $\mathrm{Binomial}(t, \frac{1}{2})$ pmf), so T is sufficient.
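A simulation check (mine, not from the notes) that the conditional distribution is $\mathrm{Binomial}(t, \frac{1}{2})$ regardless of $\lambda$:

```python
# Given x1 + x2 = t for iid Poisson(lam), x1 should be Binomial(t, 1/2).
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(5)
lam, t, reps = 2.5, 4, 400_000
x1 = rng.poisson(lam, reps)
x2 = rng.poisson(lam, reps)
keep = x1[x1 + x2 == t]
for k in range(t + 1):
    print(k, (keep == k).mean(), binom.pmf(k, t, 0.5))   # columns agree
```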
Example 5 (p. 360)
Let $x_1, x_2$ be iid $P(\lambda)$ rvs, and consider the statistic $T = x_1 + 2x_2$.
$\theta = \lambda$, $H = \{\lambda : \lambda > 0\}$; $x = (x_1, x_2)$, $T = x_1 + 2x_2$.
$$P(X = x \mid T = t) = P(x_1 = x_1,\ x_2 = x_2 \mid x_1 + 2x_2 = t)$$
(exercise: this conditional probability depends on $\lambda$, so $T = x_1 + 2x_2$ is not sufficient).
Theorem (Factorization criterion)
Let x have density $f_X(x; \theta)$, $\theta \in H$. Then T = T(x) is sufficient for $\theta$ based on x if and only if $f_X(x; \theta)$ can be expressed in the form
$$f_X(x; \theta) = g(T(x), \theta)\, h(x) \quad (x \in \mathbb{R}^n) \quad (*)$$
where h(·) is a nonnegative function of $x_1, \ldots, x_n$ only and does not depend on $\theta$, and g(·) is a nonnegative function of $\theta$ and T(x).
The theorem is true for discrete or absolutely continuous cases and even more abstract settings (Halmos–Savage).
Proof.
Suppose T is one-dimensional. Suppose (*) holds.
Let $t_1 = T$ and let $t_2, \ldots, t_n$ be such that the transformation from x to $T = (t_1, \ldots, t_n)$ has a non-vanishing Jacobian for its inverse:
$$T: T(x) = t, \qquad T^{-1}: x = T^{-1}(t)$$
$T(x) = T(T^{-1}(t))$ is a function of $t_1$, so
$$f_T(t; \theta) = f_X(T^{-1}(t); \theta)\, |J(T^{-1})| \overset{(*)}{=} g(t_1; \theta)\, h(T^{-1}(t))\, |J(T^{-1})|$$
We will show the conditional density $f_{t_2, \ldots, t_n \mid t_1}(t_2, \ldots, t_n \mid t_1)$ does not depend on $\theta$:
$$f_{t_2, \ldots, t_n \mid t_1}(t_2, \ldots, t_n \mid t_1) = \frac{f(t_2, \ldots, t_n, t_1)}{f_{t_1}(t_1)} = \frac{f_T(t; \theta)}{f_{t_1}(t_1)} = \frac{g(t_1; \theta)\, h(T^{-1}(t))\, |J(T^{-1})|}{\int_{t_2} \cdots \int_{t_n} g(t_1; \theta)\, h(T^{-1}(t))\, |J(T^{-1})|\, dt_2 \cdots dt_n}$$
Notice: $g(t_1; \theta)$ is a constant for the integral, so we can cancel it with the numerator:
$$= \frac{h(T^{-1}(t))\, |J(T^{-1})|}{\int_{t_2} \cdots \int_{t_n} h(T^{-1}(t))\, |J(T^{-1})|\, dt_2 \cdots dt_n}$$
This expression does not depend on $\theta$. Then $t_1$ is sufficient for $\theta$ based on x.
Conversely: assume that $x_1, x_2, \ldots, x_n$ have a density that can be obtained via the Jacobian from $f_X(x; \theta)$. If $x \mid T_1 = t_1$ is independent of $\theta$, then $x_2, \ldots, x_n \mid T_1 = t_1$ is also independent of $\theta$.
Why? Because $(x_2, \ldots, x_n)$ is a subset of $(x_1, \ldots, x_n)$.
So
$$f_X(x; \theta) = f_{t_1, x_2, \ldots, x_n}(t_1(x), x_2, \ldots, x_n;\ \theta)\, |J|$$
Notice: |J| does not depend on $\theta$; it is a function of the $x_i$'s only.
$$f_X(x; \theta) = f_{t_1}(t_1(x); \theta)\, f_{x_2, \ldots, x_n \mid t_1}(x_2, \ldots, x_n \mid t_1(x))\, |J|$$
The first factor may depend on $\theta$ but depends on x only through $t_1(x)$; by hypothesis the conditional density does not depend on $\theta$, so (*) holds:
$$f_X(x; \theta) = g(t_1(x); \theta)\, h(x)$$
For discrete cases see Rohatgi.
Definition (Ancillary statistic)
$X \sim f_X(x; \theta)$, $\theta \in H$. U(x) is ancillary based on x if the distribution of U(x) doesn't involve $\theta$.
Note 1: From the proof we see that $g(T(x); \theta)$ is, except for a factor which might depend on x but not on $\theta$, the density of T. Why?
Example 1: $x_1, \ldots, x_n$ iid $N(\mu, 1)$:
$$f_X(x; \mu) = \prod_i f_{x_i}(x_i, \mu) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x_1 - \mu)^2} \cdots \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x_n - \mu)^2} = \Big(\frac{1}{\sqrt{2\pi}}\Big)^n e^{-\frac{1}{2} \sum (x_i - \mu)^2}$$
We have
$$\sum (x_i - \mu)^2 = \sum (x_i - \bar{x} + \bar{x} - \mu)^2 = \sum \big((x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - \mu) + (\bar{x} - \mu)^2\big)$$
$$= \sum (x_i - \bar{x})^2 + 2(\bar{x} - \mu) \sum (x_i - \bar{x}) + \sum (\bar{x} - \mu)^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$
(* since $\sum (x_i - \bar{x}) = 0$ and $\sum (\bar{x} - \mu)^2 = n(\bar{x} - \mu)^2$.) So
$$f_X(x; \mu) = \Big(\frac{1}{\sqrt{2\pi}}\Big)^n e^{-\frac{1}{2} \sum (x_i - \bar{x})^2 - \frac{n}{2}(\bar{x} - \mu)^2} = \Big(\frac{1}{\sqrt{2\pi}}\Big)^n e^{-\frac{1}{2} \sum (x_i - \bar{x})^2}\, e^{-\frac{n}{2}(\bar{x} - \mu)^2}$$
Then $g(t(x), \mu)$ = ? h(x) = ?
Let $t(x) = \bar{x}$:
$$g(t(x), \mu) = e^{-\frac{n}{2}(\bar{x} - \mu)^2}, \qquad h(x) = \Big(\frac{1}{\sqrt{2\pi}}\Big)^n e^{-\frac{1}{2} \sum (x_i - \bar{x})^2}$$
Therefore $T(x) = \bar{x}$ is sufficient for $\mu$ based on X.
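A quick numerical check (mine, not in the notes) of the sum-of-squares decomposition used above:

```python
# sum((x - mu)^2) == sum((x - xbar)^2) + n*(xbar - mu)^2
import numpy as np

rng = np.random.default_rng(6)
x, mu = rng.normal(size=8), 0.7
xbar, n = x.mean(), len(x)
lhs = ((x - mu) ** 2).sum()
rhs = ((x - xbar) ** 2).sum() + n * (xbar - mu) ** 2
print(lhs, rhs)                         # equal up to floating-point error
```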
Note 2:
If T is k-dimensional, say $T = (t_1, \ldots, t_k)$, we say that $(t_1, \ldots, t_k)$ are JOINTLY SUFFICIENT for $\theta$ based on X.
EXAMPLE ②
$x_1, \ldots, x_n$ iid $N(\mu, 1)$:
$t_1 = x_{1:n},\ t_2 = x_{2:n},\ \ldots,\ t_n = x_{n:n}$
Then $(x_{1:n}, x_{2:n}, \ldots, x_{n:n})$ are jointly sufficient for $\mu$ based on x.
Why? ($\bar{x}$ is sufficient for $\mu$ and is a function of the order statistics.)
EXAMPLE ③
$x_1, \ldots, x_n$ iid $N(\theta_1, \theta_2)$; $\theta_1, \theta_2$ UNKNOWN:
$t_1 = x_{1:n},\ t_2 = x_{2:n},\ \ldots,\ t_n = x_{n:n}$
• $(x_{1:n}, x_{2:n}, \ldots, x_{n:n})$ are jointly sufficient for $\theta$ based on x. Why?
• $(\bar{x}, s^2)$ or $(\sum x_i, \sum x_i^2)$ is sufficient for $(\theta_1, \theta_2)$ based on x. Why?
Example 8
$$f_X(x; \theta_1, \theta_2) = \frac{1}{(2\pi\theta_2)^{n/2}}\, e^{-\frac{1}{2\theta_2} \sum_{i=1}^n (x_i - \theta_1)^2}, \qquad \sum_{i=1}^n (x_i - \theta_1)^2 = \sum x_i^2 - 2\theta_1 \sum x_i + n\theta_1^2$$
$$f_X(x; \theta_1, \theta_2) = \frac{1}{(2\pi\theta_2)^{n/2}}\, e^{-\frac{1}{2\theta_2} \sum x_i^2}\, e^{\frac{\theta_1}{\theta_2} \sum x_i}\, e^{-\frac{n\theta_1^2}{2\theta_2}}$$
$g(t(x), \theta)$ = ? h(x) = ?