
Random Variables and Probability Distributions
Random Variables - Random responses corresponding
to subjects randomly selected from a population.
Probability Distributions - A listing of the possible
outcomes and their probabilities (discrete r.v.s) or their
densities (continuous r.v.s)
Normal Distribution - Bell-shaped continuous
distribution widely used in statistical inference
Sampling Distributions - Distributions corresponding
to sample statistics (such as mean and proportion)
computed from random samples

Normal Distribution
Bell-shaped, symmetric family of distributions
Classified by 2 parameters: Mean ($\mu$) and standard
deviation ($\sigma$). These represent location and spread
Random variables that are approximately normal have
the following properties with respect to individual measurements:

Approximately half (50%) fall above (and below) mean


Approximately 68% fall within 1 standard deviation of mean
Approximately 95% fall within 2 standard deviations of mean
Virtually all fall within 3 standard deviations of mean

Notation when Y is normally distributed with mean $\mu$ and
standard deviation $\sigma$:

$$Y \sim N(\mu, \sigma)$$

Normal Distribution

$P(Y \ge \mu) = 0.50$
$P(\mu - \sigma \le Y \le \mu + \sigma) \approx 0.68$
$P(\mu - 2\sigma \le Y \le \mu + 2\sigma) \approx 0.95$
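The percentages above can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the helper name `prob_within` is illustrative and does not come from the slides.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= Y <= mu + k*sigma) for Y ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The result depends only on k, not on the particular mean or spread.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The computed areas are about 0.6827, 0.9545 and 0.9973, which the slide rounds to 68%, 95% and "virtually all".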

Example - Heights of U.S. Adults


Female and male adult heights (in inches) are well approximated by
normal distributions: $Y_F \sim N(63.7, 2.5)$ and $Y_M \sim N(69.1, 2.6)$
[Figure: histograms of adult heights in inches.
Females (INCHESF, cases weighted by PCTF): Mean = 63.7, Std. Dev = 2.48, N = 99.68.
Males (INCHESM, cases weighted by PCTM): Mean = 69.1, Std. Dev = 2.61, N = 99.23.]

Source: Statistical Abstract of the U.S. (1992)

Standard Normal (Z) Distribution


Problem: Unlimited number of possible normal
distributions ($-\infty < \mu < \infty$, $\sigma > 0$)
Solution: Standardize the random variable to have
mean 0 and standard deviation 1

$$Y \sim N(\mu, \sigma) \quad \Rightarrow \quad Z = \frac{Y - \mu}{\sigma} \sim N(0, 1)$$

Probabilities of certain ranges of values and specific
percentiles of interest can be obtained through the
standard normal (Z) distribution
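To illustrate why standardizing loses nothing, the sketch below compares a probability computed on the original scale with the same probability computed on the Z scale. It assumes Python 3.8+ (`statistics.NormalDist`); the function name `standardize` and the height of 72 inches are hypothetical choices made only for this illustration.

```python
from statistics import NormalDist

def standardize(y, mu, sigma):
    """Convert a value y from N(mu, sigma) to its z-value."""
    return (y - mu) / sigma

mu, sigma = 69.1, 2.6   # male heights from the slides
y = 72.0                # hypothetical height, chosen only for illustration

z = standardize(y, mu, sigma)
p_original = NormalDist(mu, sigma).cdf(y)   # P(Y <= y) on the original scale
p_standard = NormalDist(0.0, 1.0).cdf(z)    # P(Z <= z) on the standard scale
print(round(z, 3), round(p_original, 4), round(p_standard, 4))
```

Both probabilities agree up to floating-point error, which is why a single Z-table serves every normal distribution.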

Standard Normal (Z) Distribution


Standard Normal Distribution Characteristics:

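$P(Z \ge 0) = P(Y \ge \mu) = 0.5000$
$P(-1 \le Z \le 1) = P(\mu - \sigma \le Y \le \mu + \sigma) = 0.6826$
$P(-2 \le Z \le 2) = P(\mu - 2\sigma \le Y \le \mu + 2\sigma) = 0.9544$
$P(Z \ge z_a) = P(Z \le -z_a) = a$ (using Z-table)

a      0.500  0.100  0.050  0.025  0.010  0.005
z_a    0.000  1.282  1.645  1.960  2.326  2.576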
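The tabulated critical values can be reproduced without a printed Z-table. This is a sketch under the assumption that Python 3.8+ is available; `z_upper` is an illustrative helper name, and it relies on the standard-library `NormalDist.inv_cdf`.

```python
from statistics import NormalDist

def z_upper(a):
    """z-value with upper-tail area a, i.e. P(Z >= z_a) = a."""
    return NormalDist().inv_cdf(1.0 - a)

for a in (0.500, 0.100, 0.050, 0.025, 0.010, 0.005):
    print(f"a = {a:.3f}   z_a = {z_upper(a):.3f}")
```

By symmetry, the lower-tail cut-off with the same area is simply `-z_upper(a)`.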

Finding Probabilities of Specific Ranges


Step 1 - Identify the normal distribution of interest (i.e.
its mean ($\mu$) and standard deviation ($\sigma$))
Step 2 - Identify the range of values that you wish to
determine the probability of observing ($Y_L$, $Y_U$), where
often the upper or lower bound is $\infty$ or $-\infty$
Step 3 - Transform $Y_L$ and $Y_U$ into Z-values:

$$Z_L = \frac{Y_L - \mu}{\sigma} \qquad Z_U = \frac{Y_U - \mu}{\sigma}$$

Step 4 - Obtain $P(Z_L \le Z \le Z_U)$ from Z-table
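The four steps above can be sketched as a small function. This is one possible implementation, not the slides' own: it assumes Python 3.8+, uses `statistics.NormalDist` in place of a printed Z-table, and the name `prob_between` is hypothetical.

```python
import math
from statistics import NormalDist

def prob_between(y_lower, y_upper, mu, sigma):
    """P(y_lower <= Y <= y_upper) for Y ~ N(mu, sigma), via Z-values."""
    std_normal = NormalDist(0.0, 1.0)

    def cdf_at(y):
        # Handle unbounded ranges explicitly (Step 2 allows +/- infinity).
        if y == math.inf:
            return 1.0
        if y == -math.inf:
            return 0.0
        z = (y - mu) / sigma            # Step 3: transform to a Z-value
        return std_normal.cdf(z)

    return cdf_at(y_upper) - cdf_at(y_lower)   # Step 4

# Illustrative call: female heights within one standard deviation of the mean.
print(round(prob_between(63.7 - 2.5, 63.7 + 2.5, 63.7, 2.5), 4))
# One-sided range with an infinite upper bound.
print(round(prob_between(70.0, math.inf, 63.7, 2.5), 4))
```

Passing `math.inf` or `-math.inf` covers the one-sided ranges mentioned in Step 2.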

Example - Adult Female Heights


What is the probability a randomly selected female is
5'10" or taller (70 inches)?
Step 1 - Y ~ N(63.7 , 2.5)
Step 2 - $Y_L = 70.0$, $Y_U = \infty$
Step 3 - $Z_L = \frac{70.0 - 63.7}{2.5} = 2.52$, $Z_U = \infty$
Step 4 - $P(Y \ge 70) = P(Z \ge 2.52) = .0059$ ($\approx 1/170$)
z      .00    .01    .02    .03
2.4  .0082  .0080  .0078  .0075
2.5  .0062  .0060  .0059  .0057
2.6  .0047  .0045  .0044  .0043
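The table lookup in this example can be mimicked in code: round the z-value to two decimals (row plus column), then report the upper-tail area to four decimals. The sketch assumes Python 3.8+; `table_upper_tail` is an illustrative name, and the areas come from `statistics.NormalDist` rather than from the printed table.

```python
from statistics import NormalDist

def table_upper_tail(z):
    """P(Z >= z) rounded the way a two-decimal, four-digit Z-table lists it."""
    z_rounded = round(z, 2)                       # row + column of the table
    return round(1.0 - NormalDist().cdf(z_rounded), 4)

z_l = (70.0 - 63.7) / 2.5                         # Step 3 of the example
print(round(z_l, 2), table_upper_tail(z_l))

# Rebuild the table excerpt: rows 2.4 to 2.6, columns .00 to .03.
for row in (2.4, 2.5, 2.6):
    print(row, [table_upper_tail(row + col / 100) for col in range(4)])
```

The entry for row 2.5 and column .02 is the .0059 used in Step 4.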

Finding Percentiles of a Distribution


Step 1 - Identify the normal distribution of interest
(i.e. its mean ($\mu$) and standard deviation ($\sigma$))

Step 2 - Determine the percentile of interest, 100p%
(e.g. the 90th percentile is the cut-off where 90%
of scores are below and 10% are above)

Step 3 - Turn the percentile of interest into a tail
probability a and corresponding z-value ($z_p$):
If $100p \ge 50$ then $a = 1 - p$ and $z_p = z_a$
If $100p < 50$ then $a = p$ and $z_p = -z_a$

Step 4 - Transform $z_p$ back to original units:

$$Y_p = \mu + z_p \sigma$$
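The percentile steps above can be sketched in code as follows. This is a minimal illustration assuming Python 3.8+; the names `z_upper` and `normal_percentile` are hypothetical, and `NormalDist.inv_cdf` stands in for the reverse Z-table lookup.

```python
from statistics import NormalDist

def z_upper(a):
    """z-value with upper-tail area a, i.e. P(Z >= z_a) = a."""
    return NormalDist().inv_cdf(1.0 - a)

def normal_percentile(p, mu, sigma):
    """Return the 100p-th percentile of N(mu, sigma), following Steps 3 and 4."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p >= 0.5:
        a = 1.0 - p            # upper-tail probability
        z_p = z_upper(a)
    else:
        a = p                  # lower-tail probability
        z_p = -z_upper(a)
    return mu + z_p * sigma    # Step 4: back to original units

# Illustrative calls: 90th and 10th percentiles of female heights.
print(round(normal_percentile(0.90, 63.7, 2.5), 2))
print(round(normal_percentile(0.10, 63.7, 2.5), 2))
```

The two branches give percentiles that are symmetric about the mean, so the 10th and 90th percentiles lie equally far from 63.7.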

Example - Adult Male Heights

Above what height do the tallest 5% of males lie?

Step 1 - Y ~ N(69.1 , 2.6)
Step 2 - Want to determine 95th percentile (p = .95)
Step 3 - Since 100p > 50, a = 1-p = 0.05
$z_p = z_a = z_{.05} = 1.645$

Step 4 - $Y_{.95}$ = 69.1 + (1.645)(2.6) = 73.4 inches

z      .03    .04    .05    .06
1.5  .0630  .0618  .0606  .0594
1.6  .0516  .0505  .0495  .0485
1.7  .0418  .0409  .0401  .0392
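For percentiles the Z-table is read in reverse: the tail area 0.05 falls between the entries for z = 1.64 (.0505) and z = 1.65 (.0495), and the midpoint 1.645 is the value used in Step 3. The sketch below mimics that search. It assumes Python 3.8+, and `bracket_tail_area` is a hypothetical helper, not something taken from the slides.

```python
from statistics import NormalDist

def bracket_tail_area(a, z_max=4.0):
    """Find adjacent two-decimal z-values whose upper-tail areas bracket a."""
    std_normal = NormalDist()
    steps = int(round(z_max * 100))
    for i in range(steps):
        z_low, z_high = i / 100, (i + 1) / 100
        tail_low = 1.0 - std_normal.cdf(z_low)     # larger area
        tail_high = 1.0 - std_normal.cdf(z_high)   # smaller area
        if tail_high <= a <= tail_low:
            return z_low, z_high
    raise ValueError("tail area not covered by the searched grid")

z_low, z_high = bracket_tail_area(0.05)
z_mid = (z_low + z_high) / 2                       # conventional table value
height = 69.1 + z_mid * 2.6                        # Step 4 of the example
print(z_low, z_high, round(z_mid, 3), round(height, 1))
```

Taking the midpoint is a convention for reading the table; an exact inverse would give a z-value very slightly below 1.645.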

Statistical Models
When making statistical inference it is useful to
write random variables in terms of model
parameters and random errors

$$Y = \mu + (Y - \mu) = \mu + \varepsilon$$

Here $\mu$ is a fixed constant and $\varepsilon$ is a random variable

In practice $\mu$ will be unknown, and we will use sample data to
estimate or make statements regarding its value
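The decomposition of a response into a fixed parameter plus a random error can be illustrated by simulation. The sketch below is only an illustration: it assumes normally distributed errors (the slides do not require this), uses the female-height values 63.7 and 2.5 as the "true" parameters, and `simulate_responses` is a hypothetical name.

```python
import random
from statistics import fmean

def simulate_responses(mu, sigma, n, seed=0):
    """Generate n responses Y = mu + error, with errors ~ N(0, sigma)."""
    rng = random.Random(seed)                      # seeded for repeatability
    errors = [rng.gauss(0.0, sigma) for _ in range(n)]
    responses = [mu + e for e in errors]
    return responses, errors

responses, errors = simulate_responses(mu=63.7, sigma=2.5, n=10_000)

# In practice mu is unknown; the sample mean serves as its estimate.
print(round(fmean(responses), 2), round(fmean(errors), 2))
```

The average error is close to zero, so the sample mean of the responses lands close to the fixed constant used to generate them.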

Sampling Distributions and the Central Limit Theorem
Sample statistics based on random samples are also random
variables and have sampling distributions that are probability
distributions for the statistic (outcomes that would vary across
samples)
When samples are large and measurements independent then
many estimators have normal sampling distributions (CLT):
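Sample Mean:

$$\bar{Y} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

Sample Proportion:

$$\hat{\pi} \sim N\left(\pi, \sqrt{\frac{\pi(1 - \pi)}{n}}\right)$$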
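A small simulation can make the Central Limit Theorem concrete: sample means and sample proportions from clearly non-normal populations have spreads close to the population standard deviation divided by the square root of n. This is a sketch, assuming Python 3.8+; the exponential and Bernoulli populations, the sample sizes, and the name `simulate_sample_means` are all illustrative choices.

```python
import math
import random
from statistics import fmean, stdev

def simulate_sample_means(draw_one, n, reps, seed=0):
    """Return `reps` sample means, each from n independent draws."""
    rng = random.Random(seed)
    return [fmean([draw_one(rng) for _ in range(n)]) for _ in range(reps)]

# Skewed population: exponential with mean 1 and standard deviation 1.
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n=50, reps=2000)
print(round(fmean(means), 3), round(stdev(means), 3), round(1 / math.sqrt(50), 3))

# Binary population with success probability 0.3: the mean of 0/1 values
# is the sample proportion.
props = simulate_sample_means(lambda rng: 1.0 if rng.random() < 0.3 else 0.0,
                              n=100, reps=2000)
print(round(fmean(props), 3), round(stdev(props), 3),
      round(math.sqrt(0.3 * 0.7 / 100), 3))
```

In each printed line the last two numbers, the simulated and the theoretical standard deviation of the statistic, should be close.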

Example - Adult Female Heights


Random samples of n = 100 females to be selected
For each sample, the sample mean is computed
Sampling distribution:

$$\bar{Y} \sim N\left(63.7, \frac{2.5}{\sqrt{100}}\right) = N(63.7, 0.25)$$

Note that approximately 95% of all possible random
samples of 100 females will have sample means between
63.2 and 64.2 inches
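The width of that 95% range comes from the standard error of the sample mean. The sketch below computes it and the probability that a sample mean falls within half an inch of the population mean. It assumes Python 3.8+ and a normal sampling distribution; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)

def prob_mean_within(half_width, sigma, n):
    """P(|sample mean - population mean| <= half_width)."""
    z = half_width / standard_error(sigma, n)
    return 2.0 * NormalDist().cdf(z) - 1.0

se = standard_error(2.5, 100)
print(se, round(prob_mean_within(2 * se, 2.5, 100), 4))
```

With a standard deviation of 2.5 and n = 100 the standard error is 0.25, so two standard errors on either side of the mean give a range one inch wide.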
