Mathematical Biostatistics Boot Camp: Lecture 4, Random Vectors


Brian Caffo
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Johns Hopkins University

August 3, 2012


Table of contents

1 Table of contents
2 Random vectors
3 Independence
    Independent events
    Independent random variables
    IID random variables
4 Correlation
5 Variance and correlation properties
6 Variances properties of sample means
7 The sample variance
8 Some discussion


Random vectors

Random vectors are simply random variables collected into a vector

For example, if $X$ and $Y$ are random variables, $(X, Y)$ is a random vector

The joint density $f(x, y)$ satisfies $f \geq 0$ and

$$\int\int f(x, y)\,dx\,dy = 1$$

For discrete random variables

$$\sum_x \sum_y f(x, y) = 1$$

In this lecture we focus on independent random variables, where

$$f(x, y) = f(x)g(y)$$
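
As a small illustration of these conditions, the following Python sketch builds a discrete joint mass function from two marginals and checks that it is nonnegative, sums to 1, and returns the original marginal when summed over the other variable. The marginal values and the names `f_x`, `g_y`, and `joint` are assumptions made up for this example; they do not come from the lecture.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal mass functions for X and Y (illustrative values only).
f_x = {0: Fraction(1, 4), 1: Fraction(3, 4)}
g_y = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

# Under independence the joint mass function is the product of the marginals.
joint = {(x, y): f_x[x] * g_y[y] for x, y in product(f_x, g_y)}

# A valid joint mass function is nonnegative and sums to 1.
total = sum(joint.values())
all_nonnegative = all(p >= 0 for p in joint.values())

# Recover the marginal of X by summing the joint over y.
marginal_x = {x: sum(joint[(x, y)] for y in g_y) for x in f_x}

print(total, all_nonnegative, marginal_x == f_x)
```

Exact fractions are used so that the sum-to-one check is not affected by floating point rounding.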


Independent events


Two events $A$ and $B$ are independent if

$$P(A \cap B) = P(A)P(B)$$

Two random variables, $X$ and $Y$, are independent if for any two sets $A$ and $B$

$$P([X \in A] \cap [Y \in B]) = P(X \in A)P(Y \in B)$$

If $A$ is independent of $B$ then
    $A^c$ is independent of $B$
    $A$ is independent of $B^c$
    $A^c$ is independent of $B^c$
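
The product rule and the complement facts can be checked by brute force on a small sample space. The sketch below assumes two fair coin flips with four equally likely outcomes; the helper names `prob` and `independent` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips; each outcome has probability 1/4.
omega = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a subset of omega) under equally likely outcomes."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}  # head on flip 1
B = {w for w in omega if w[1] == "H"}  # head on flip 2
A_c, B_c = omega - A, omega - B        # complements

def independent(E, F):
    """Check the product rule P(E and F) = P(E) P(F)."""
    return prob(E & F) == prob(E) * prob(F)

checks = [independent(A, B), independent(A_c, B),
          independent(A, B_c), independent(A_c, B_c)]
print(checks, prob(A & B))
```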


Example

What is the probability of getting two consecutive heads?

$A = \{\text{Head on flip 1}\}$, $P(A) = .5$

$B = \{\text{Head on flip 2}\}$, $P(B) = .5$

$A \cap B = \{\text{Head on flips 1 and 2}\}$

$$P(A \cap B) = P(A)P(B) = .5 \times .5 = .25$$


Example


Volume 309 of Science reports on a physician who was on trial for expert testimony in a criminal trial

Based on an estimated prevalence of sudden infant death syndrome of 1 out of 8,543, Dr Meadow testified that the probability of a mother having two children with SIDS was $\left(\frac{1}{8,543}\right)^2$

The mother on trial was convicted of murder

What was Dr Meadow's mistake(s)?


Example: continued


For the purposes of this class, the principal mistake was to assume that the probabilities of having SIDS within a family are independent

That is, $P(A_1 \cap A_2)$ is not necessarily equal to $P(A_1)P(A_2)$

Biological processes that have a believed genetic or familial environmental component, of course, tend to be dependent within families

In addition, the estimated prevalence was obtained from an unpublished report on single cases; hence it provided no information about recurrence of SIDS within families
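
To see how much the independence assumption can matter, the sketch below contrasts the product rule with the general multiplication rule $P(A_1 \cap A_2) = P(A_1)P(A_2 \mid A_1)$. Only the prevalence of 1 in 8,543 comes from the example; the conditional probability used for the dependent case is a made-up placeholder, not an estimate of any real recurrence risk, and the function name `joint_probability` is hypothetical.

```python
from fractions import Fraction

p_single = Fraction(1, 8543)  # prevalence quoted in the testimony

# Under the (questionable) independence assumption, P(A1 and A2) = P(A1) P(A2).
p_independent = p_single * p_single

def joint_probability(p_first, p_second_given_first):
    """General multiplication rule: P(A1 and A2) = P(A1) P(A2 | A1)."""
    return p_first * p_second_given_first

# HYPOTHETICAL conditional probability, invented purely for illustration.
p_second_given_first = Fraction(1, 100)
p_dependent = joint_probability(p_single, p_second_given_first)

print(1 / p_independent)            # 72982849
print(p_dependent / p_independent)  # factor by which the product rule is off
```

With any conditional probability larger than the marginal one, the product of the marginals understates the joint probability.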


Useful fact


We will use the following fact extensively in this class:

If a collection of random variables $X_1, X_2, \ldots, X_n$ are independent, then their joint distribution is the product of their individual densities or mass functions

That is, if $f_i$ is the density for random variable $X_i$ we have that

$$f(x_1, \ldots, x_n) = \prod_{i=1}^n f_i(x_i)$$


IID random variables


In the instance where $f_1 = f_2 = \ldots = f_n$ we say that the $X_i$ are iid for independent and identically distributed

iid random variables are the default model for random samples

Many of the important theories of statistics are founded on assuming that variables are iid


Example


Suppose that we flip a biased coin with success probability $p$ a total of $n$ times; what is the joint density of the collection of outcomes?

These random variables are iid with densities $p^{x_i}(1 - p)^{1 - x_i}$

Therefore

$$f(x_1, \ldots, x_n) = \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i} = p^{\sum x_i}(1 - p)^{n - \sum x_i}$$
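
A quick numerical check of this identity: the sketch below evaluates the joint mass function once as a product of individual Bernoulli masses and once in the collapsed form. The outcome sequence `xs`, the value of `p`, and the function names are assumptions chosen for illustration.

```python
import math

def bernoulli_joint_product(xs, p):
    """Joint mass function as the product of the individual Bernoulli masses."""
    return math.prod(p ** x * (1 - p) ** (1 - x) for x in xs)

def bernoulli_joint_closed_form(xs, p):
    """Same quantity written as p^(sum x_i) (1 - p)^(n - sum x_i)."""
    s, n = sum(xs), len(xs)
    return p ** s * (1 - p) ** (n - s)

xs = [1, 0, 1, 1, 0]   # hypothetical outcomes (1 = success)
p = 0.3                # hypothetical success probability
lhs = bernoulli_joint_product(xs, p)
rhs = bernoulli_joint_closed_form(xs, p)
print(lhs, rhs)
```

The closed form shows that the joint mass function depends on the data only through the number of successes.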


Correlation


The covariance between two random variables $X$ and $Y$ is defined as

$$\mathrm{Cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = E[XY] - E[X]E[Y]$$

The following are useful facts about covariance
1. $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$
2. $\mathrm{Cov}(X, Y)$ can be negative or positive
3. $|\mathrm{Cov}(X, Y)| \leq \sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}$
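
The two forms of the covariance and the bound in the last fact can be verified on a small example. The joint mass function below is a hypothetical one on $\{0,1\} \times \{0,1\}$, and `expect` is a helper name introduced only for this sketch.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint mass function on {0,1} x {0,1} (illustrative values only).
joint = {(0, 0): Fraction(4, 10), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}

def expect(g):
    """E[g(X, Y)] under the joint mass function."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mu_x, mu_y = expect(lambda x, y: x), expect(lambda x, y: y)

# Definition and shortcut form of the covariance.
cov_def = expect(lambda x, y: (x - mu_x) * (y - mu_y))
cov_short = expect(lambda x, y: x * y) - mu_x * mu_y

var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)

print(cov_def, cov_short, abs(cov_def) <= sqrt(var_x * var_y))
```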


Correlation

The correlation between $X$ and $Y$ is

$$\mathrm{Cor}(X, Y) = \mathrm{Cov}(X, Y)/\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}$$

1. $-1 \leq \mathrm{Cor}(X, Y) \leq 1$
2. $\mathrm{Cor}(X, Y) = \pm 1$ if and only if $X = a + bY$ for some constants $a$ and $b$
3. $\mathrm{Cor}(X, Y)$ is unitless
4. $X$ and $Y$ are uncorrelated if $\mathrm{Cor}(X, Y) = 0$
5. $X$ and $Y$ are more positively correlated, the closer $\mathrm{Cor}(X, Y)$ is to $1$
6. $X$ and $Y$ are more negatively correlated, the closer $\mathrm{Cor}(X, Y)$ is to $-1$
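
Several of these properties are easy to see numerically. The sketch below uses small made-up data lists and treats them as empirical distributions; the helpers `mean`, `cov`, and `cor` are written out by hand for this illustration rather than taken from a library.

```python
from math import sqrt, isclose

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Population-style covariance of two equal-length lists."""
    mu_u, mu_v = mean(u), mean(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / len(u)

def cor(u, v):
    """Correlation: covariance divided by the product of standard deviations."""
    return cov(u, v) / sqrt(cov(u, u) * cov(v, v))

y = [1.0, 2.0, 4.0, 7.0]            # hypothetical data
x_up = [3 + 2 * b for b in y]       # X = a + bY with b > 0
x_down = [3 - 2 * b for b in y]     # X = a + bY with b < 0
y_rescaled = [100 * b for b in y]   # same quantity in different units
z = [2.0, 1.0, 5.0, 3.0]            # hypothetical, not a linear function of y

print(cor(x_up, y), cor(x_down, y), cor(z, y), cor(z, y_rescaled))
```

An exact linear relationship gives a correlation of 1 or -1 depending on the sign of the slope, and rescaling one variable leaves the correlation unchanged, which is what "unitless" means in practice.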


Some useful results


Let $\{X_i\}_{i=1}^n$ be a collection of random variables

When the $\{X_i\}$ are uncorrelated

$$\mathrm{Var}\left(\sum_{i=1}^n a_i X_i + b\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i)$$

Otherwise

$$\mathrm{Var}\left(\sum_{i=1}^n a_i X_i + b\right) = \sum_{i=1}^n a_i^2 \mathrm{Var}(X_i) + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^n a_i a_j \mathrm{Cov}(X_i, X_j)$$

If the $X_i$ are iid with variance $\sigma^2$ then $\mathrm{Var}(\bar X) = \sigma^2/n$ and $E[S^2] = \sigma^2$
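
The general formula can be checked numerically, with the double sum running over pairs $i < j$. The sketch below treats three short made-up data columns as an empirical joint distribution, so the identity holds exactly up to floating point error; the data, the coefficients, and the helper names are all assumptions for illustration.

```python
from math import isclose

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Covariance with divisor n; cov(u, u) is the variance."""
    mu_u, mu_v = mean(u), mean(v)
    return sum((s - mu_u) * (t - mu_v) for s, t in zip(u, v)) / len(u)

# Hypothetical correlated data columns X_1, X_2, X_3 and coefficients a_i, b.
X = [[1.0, 2.0, 4.0, 7.0], [2.0, 1.0, 5.0, 3.0], [0.0, 3.0, 3.0, 8.0]]
a, b = [2.0, -1.0, 0.5], 10.0
n = len(X)

# Left side: variance of the combination sum_i a_i X_i + b, computed directly.
combo = [sum(a[i] * X[i][k] for i in range(n)) + b for k in range(len(X[0]))]
lhs = cov(combo, combo)

# Right side: sum of a_i^2 Var(X_i) plus twice the sum over pairs i < j.
rhs = sum(a[i] ** 2 * cov(X[i], X[i]) for i in range(n))
rhs += 2 * sum(a[i] * a[j] * cov(X[i], X[j])
               for i in range(n - 1) for j in range(i + 1, n))

print(lhs, rhs)
```

Note that the constant `b` shifts every value of the combination equally and so has no effect on the variance.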


Example proof
Prove that $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y)$

$$\begin{aligned}
\mathrm{Var}(X + Y) &= E[(X + Y)(X + Y)] - E[X + Y]^2 \\
&= E[X^2 + 2XY + Y^2] - (\mu_x + \mu_y)^2 \\
&= E[X^2 + 2XY + Y^2] - \mu_x^2 - 2\mu_x\mu_y - \mu_y^2 \\
&= (E[X^2] - \mu_x^2) + (E[Y^2] - \mu_y^2) + 2(E[XY] - \mu_x\mu_y) \\
&= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y)
\end{aligned}$$


Result


A commonly used subcase from these properties is that if a collection of random variables $\{X_i\}$ are uncorrelated, then the variance of the sum is the sum of the variances

$$\mathrm{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \mathrm{Var}(X_i)$$

Therefore, it is sums of variances that tend to be useful, not sums of standard deviations; that is, the standard deviation of the sum of a bunch of independent random variables is the square root of the sum of the variances, not the sum of the standard deviations
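
A two-variable example makes the distinction concrete. The standard deviations 3 and 4 below are hypothetical values picked so the arithmetic is easy to follow.

```python
from math import sqrt

# Hypothetical standard deviations of independent random variables.
sds = [3.0, 4.0]

# Variances add for independent (more generally, uncorrelated) variables.
var_of_sum = sum(s ** 2 for s in sds)
sd_of_sum = sqrt(var_of_sum)

# Adding standard deviations directly overstates the spread.
naive_sum_of_sds = sum(sds)

print(sd_of_sum, naive_sum_of_sds)  # 5.0 7.0
```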


The sample mean


Suppose $X_i$ are iid with variance $\sigma^2$

$$\begin{aligned}
\mathrm{Var}(\bar X) &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) \\
&= \frac{1}{n^2}\mathrm{Var}\left(\sum_{i=1}^n X_i\right) \\
&= \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i) \\
&= \frac{1}{n^2} \times n\sigma^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}$$
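
The result that the variance of the sample mean is $\sigma^2/n$ can be confirmed exactly by enumeration. The sketch below assumes, purely as a convenient iid model, $n = 3$ rolls of a fair six-sided die and lists all $6^3$ equally likely samples; the helper name `variance` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair six-sided die, used here as a convenient iid model

def variance(values):
    """Variance of a list of equally likely values."""
    mu = Fraction(sum(values), len(values))
    return sum((v - mu) ** 2 for v in values) / len(values)

sigma2 = variance(list(faces))  # variance of a single roll

n = 3
# Every equally likely sequence of n rolls, reduced to its sample mean.
sample_means = [Fraction(sum(rolls), n) for rolls in product(faces, repeat=n)]
var_of_mean = variance(sample_means)

print(sigma2, var_of_mean, sigma2 / n)
```

Because every sample is enumerated, the agreement is exact rather than approximate as it would be in a simulation.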


Some comments


When $X_i$ are independent with a common variance, $\mathrm{Var}(\bar X) = \frac{\sigma^2}{n}$

$\sigma/\sqrt{n}$ is called the standard error of the sample mean

The standard error of the sample mean is the standard deviation of the distribution of the sample mean

$\sigma$ is the standard deviation of the distribution of a single observation

Easy way to remember, the sample mean has to be less variable than a single observation, therefore its standard deviation is divided by a $\sqrt{n}$


The sample variance


The sample variance is defined as

$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n - 1}$$

The sample variance is an estimator of $\sigma^2$

The numerator has a version that's quicker for calculation

$$\sum_{i=1}^n (X_i - \bar X)^2 = \sum_{i=1}^n X_i^2 - n\bar X^2$$

The sample variance is (nearly) the mean of the squared deviations from the mean
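
Both forms of the numerator can be compared directly. The sketch below uses a short list of made-up observations and also checks the result against `statistics.variance` from the Python standard library, which uses the $n - 1$ divisor.

```python
import statistics
from math import isclose

x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations
n = len(x)
xbar = sum(x) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((xi - xbar) ** 2 for xi in x) / (n - 1)

# Shortcut numerator: sum of squares minus n times the squared mean.
s2_shortcut = (sum(xi ** 2 for xi in x) - n * xbar ** 2) / (n - 1)

print(s2_definition, s2_shortcut, statistics.variance(x))
```

The shortcut is convenient by hand, but in floating point it subtracts two large numbers and can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.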


The sample variance is unbiased

$$\begin{aligned}
E\left[\sum_{i=1}^n (X_i - \bar X)^2\right] &= \sum_{i=1}^n E[X_i^2] - nE[\bar X^2] \\
&= \sum_{i=1}^n \left\{\mathrm{Var}(X_i) + \mu^2\right\} - n\left\{\mathrm{Var}(\bar X) + \mu^2\right\} \\
&= \sum_{i=1}^n \left\{\sigma^2 + \mu^2\right\} - n\left\{\sigma^2/n + \mu^2\right\} \\
&= n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2 \\
&= (n - 1)\sigma^2
\end{aligned}$$
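
The unbiasedness result can be checked exactly on a small model. The sketch below assumes $n = 3$ rolls of a fair six-sided die (population variance $35/12$) and averages the sum of squared deviations over all $6^3$ equally likely samples; the function name `sum_sq_dev` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)        # fair die as an iid model
sigma2 = Fraction(35, 12)  # population variance of a fair six-sided die

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

n = 3
samples = list(product(faces, repeat=n))  # all 6^n equally likely samples

# Expected value of the numerator, averaging over every possible sample.
expected_numerator = sum(sum_sq_dev(s) for s in samples) / len(samples)

expected_s2 = expected_numerator / (n - 1)   # divide by n - 1: unbiased
expected_biased = expected_numerator / n     # divide by n: too small on average

print(expected_numerator, expected_s2, expected_biased)
```

Dividing by $n$ instead of $n - 1$ gives an estimator whose expected value is $(n - 1)\sigma^2/n$, which is why the sample variance uses $n - 1$.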


Hoping to avoid some confusion

Suppose $X_i$ are iid with mean $\mu$ and variance $\sigma^2$

$S^2$ estimates $\sigma^2$

The calculation of $S^2$ involves dividing by $n - 1$

$S/\sqrt{n}$ estimates $\sigma/\sqrt{n}$, the standard error of the mean

$S/\sqrt{n}$ is called the sample standard error (of the mean)


Example


In a study of 495 organo-lead workers, the following summaries were obtained for TBV in cm$^3$

mean = 1151.281

sum of squared observations = 662361978

sample sd = $\sqrt{(662361978 - 495 \times 1151.281^2)/494} = 112.6215$

estimated se of the mean = $112.6215/\sqrt{495} = 5.062$
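
The same arithmetic can be reproduced from the reported summaries using the shortcut formula for the numerator. Because the mean is reported to only three decimals, the recomputed values may differ slightly from the figures above in the later decimal places; the variable names are chosen for this sketch.

```python
from math import sqrt

n = 495
mean = 1151.281               # reported mean (rounded)
sum_of_squares = 662361978    # reported sum of squared observations

# Shortcut formula: (sum of squares - n * mean^2) / (n - 1).
sample_var = (sum_of_squares - n * mean ** 2) / (n - 1)
sample_sd = sqrt(sample_var)
se_mean = sample_sd / sqrt(n)

print(round(sample_sd, 2), round(se_mean, 2))
```

This calculation is sensitive to rounding in the mean, since it subtracts two numbers of similar size, so small differences from the reported standard deviation are expected.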
