2/14/2016
Descriptive Statistics
Statistics: Sampling Methods
  Probability
    Simple random
    Pseudo random
    Stratified
    Systematic
    Cluster
  Non-probability
    Convenience
    Judgment
Quantitative or Numerical
  Discrete
  Continuous
Dimensions of Data
General statistical symbol conventions:
Symbols differ for populations and samples
Populations are described by parameters
Parameters are represented by Greek letters
For example, $\mu$ = mu = population mean
$\sigma$ = sigma = population standard deviation
N=Population Size
Samples are described by statistics
Statistics are represented by Latin letters
For example, $\bar{X}$ = X-bar = sample mean
s = sample standard deviation
n = sample size
Symbols
Mean
The arithmetic average, the center of balance
for the data.
Median
The middle of the ordered data set
Mode
The most common value
Summary Methods--Center
The sum of all the observations in a data set
divided by the number of observations.
Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Mean Characteristics
It is that value that divides the data set into two parts of
equal size with respect to the number of observations.
Specifically, 50% of the observations are larger than
the median and 50% are smaller.
Position in the ordered data: $L_{50} = (n + 1) \cdot \frac{50}{100} = \frac{n + 1}{2}$
Unique
More applicable
Insensitive
Exclusive
Median Characteristics
The value with the highest frequency.
Universal use
Simple
Only hope for nominal data
Insensitive
Highly unstable
Doesn't always exist.
Sometimes there is more than one mode.
Small changes in observations can dramatically change the
mode.
Mode Characteristics
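The three measures of center can be compared side by side in a short sketch. It uses Python's standard `statistics` module on the six values from the deck's own example (1, 2, 3, 3, 4, 5); the extra observation of 100 is a hypothetical outlier added here only to illustrate sensitivity, and `multimode` assumes Python 3.8 or later.

```python
import statistics

data = [1, 2, 3, 3, 4, 5]

mean = statistics.mean(data)         # sum of observations / number of observations
median = statistics.median(data)     # middle of the ordered data
modes = statistics.multimode(data)   # every value tied for the highest frequency

# Hypothetical outlier: the mean moves a lot, the median and mode do not.
with_outlier = data + [100]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)
modes_out = statistics.multimode(with_outlier)

# When no value repeats, every value ties, which is one way of seeing
# that a useful mode does not always exist.
no_repeat_modes = statistics.multimode([1, 2, 3])

print("original:    ", mean, median, modes)
print("with outlier:", round(mean_out, 2), median_out, modes_out)
print("no repeats:  ", no_repeat_modes)
```

The mean jumps from 3 to almost 17 once the outlier is added, while the median and mode stay at 3.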
Summary Statistic Behavior
Xi          Xi + 7       2Xi
1           1 + 7 = 8    2 * 1 = 2
2           9            4
3           10           6
3           10           6
4           11           8
5           12           10
$\mu$ = 3   $\mu$ = 10   $\mu$ = 6
Md = 3      Md = 10      Md = 6
Mo = 3      Mo = 10      Mo = 6
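The behavior in the table can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the helper name `center` is made up for this illustration.

```python
import statistics

def center(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

x = [1, 2, 3, 3, 4, 5]
shifted = [v + 7 for v in x]   # Xi + 7
doubled = [2 * v for v in x]   # 2Xi

center_x = center(x)
center_shifted = center(shifted)
center_doubled = center(doubled)

print(center_x, center_shifted, center_doubled)
```

Adding a constant shifts all three measures by that constant, and multiplying by a constant multiplies all three by it.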
Irritating Data
Summary Methods--Spread
Simple and intuitive
It is sensitive to extreme values, just like the mean
Range Characteristics
Interquartile Range
The distance between the first and third quartiles
of the data set. IQR = Q3 - Q1
A quartile is a percentile, but instead of dividing
the data into 100 levels it divides it into 4
The first quartile is the 25th percentile
The third quartile is the 75th percentile
The IQR ignores the smallest 25% and the
largest 25% of the data, so outliers in the tails do not affect it.
A Range Alternative
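Data (40 ordered observations, read down each column):
 0   10   20   35
 1   10   21   35
 2   12   22   35
 4   13   22   38
 5   13   22   39
 5   13   24   45
 6   14   24   50
 7   16   25   56
 9   17   26   60
 9   19   33   63

L25 = (40 + 1) * (25/100) = 10.25
  10th Obs = 9
  11th Obs = 10
  10 - 9 = 1
  1 * .25 = .25
  Q1 = 9 + .25 = 9.25
L75 = (40 + 1) * (75/100) = 30.75
  30th Obs = 33
  31st Obs = 35
  35 - 33 = 2
  2 * .75 = 1.5
  Q3 = 33 + 1.5 = 34.5
IQR = 34.5 - 9.25 = 25.25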
Calculate the IQR
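The hand calculation can be reproduced with a short Python sketch. It follows the same location rule used on the slide, L = (n + 1) * (p / 100), and interpolates between the two neighboring observations; the function name `percentile_value` is an assumption made for this example, and other software may use a different quartile convention and give slightly different answers.

```python
def percentile_value(sorted_values, p):
    """Value at the p-th percentile using the location L = (n + 1) * p / 100."""
    n = len(sorted_values)
    location = (n + 1) * p / 100
    whole = int(location)            # position of the lower observation (1-based)
    fraction = location - whole      # how far to move toward the next observation
    if whole < 1:
        return sorted_values[0]
    if whole >= n:
        return sorted_values[-1]
    lower = sorted_values[whole - 1]
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)

data = sorted([
    0, 1, 2, 4, 5, 5, 6, 7, 9, 9,
    10, 10, 12, 13, 13, 13, 14, 16, 17, 19,
    20, 21, 22, 22, 22, 24, 24, 25, 26, 33,
    35, 35, 35, 38, 39, 45, 50, 56, 60, 63,
])

q1 = percentile_value(data, 25)
q3 = percentile_value(data, 75)
iqr = q3 - q1

print(q1, q3, iqr)
```

For the 40 observations this gives Q1 = 9.25, Q3 = 34.5, and IQR = 25.25, matching the slide.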
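Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$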
Variance
It is a unique value and uses all information in
the data set
It is difficult to interpret
Variance Characteristics
Population: $\sigma = \sqrt{\sigma^2}$
Sample: $s = \sqrt{s^2}$
Standard Deviation
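A minimal Python sketch of both versions follows, assuming the usual definitions: the population variance divides the sum of squared deviations by N, and the sample variance divides by n - 1. The function names are made up for this illustration, and the data are the six values used in the deck's Summary Statistic Behavior tables.

```python
import math

def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Squared deviations from the sample mean, dividing by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [1, 2, 3, 3, 4, 5]

pop_var = population_variance(data)
samp_var = sample_variance(data)
pop_sd = math.sqrt(pop_var)     # standard deviation is the square root of the variance
samp_sd = math.sqrt(samp_var)

print(round(pop_var, 3), round(pop_sd, 3))
print(round(samp_var, 3), round(samp_sd, 3))
```

The standard deviation is in the same units as the data, which is why it is easier to interpret than the variance.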
Calculating relative frequencies is the closest to
dispersion one can get with categorical data.
What if you want to compare two data sets and
they are in different units, or they are in
different magnitudes?
The Coefficient of Variation (CV) is a measure
of relative dispersion.
Is everything covered?
Population: $CV = \frac{\sigma}{\mu} \times 100$
Sample: $CV = \frac{s}{\bar{X}} \times 100$
Eliminates units and enables comparisons
Eliminates the effect of differences of
magnitude.
Often the best choice for comparative
dispersion
Concerns
Not usable for data with a 0 mean
Inappropriate for data that can be negative.
Coefficient of Variation
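The claim that the CV removes the effect of magnitude can be illustrated with a short sketch. It assumes the population standard deviation (`statistics.pstdev`) as the numerator; the function name and the zero-mean check are choices made for this example, not part of the slides.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        # The ratio is undefined when the mean is 0.
        raise ValueError("CV is not usable for data with a 0 mean")
    return statistics.pstdev(values) / mean * 100

x = [1, 2, 3, 3, 4, 5]
doubled = [2 * v for v in x]    # same shape, twice the magnitude

cv_x = coefficient_of_variation(x)
cv_doubled = coefficient_of_variation(doubled)

print(round(cv_x, 1), round(cv_doubled, 1))
```

Doubling every value doubles both the mean and the standard deviation, so the two data sets have the same CV of about 43%.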
Summary Statistic Behavior
Xi                   Xi + 7               2Xi
1                    8                    2
2                    9                    4
3                    10                   6
3                    10                   6
4                    11                   8
5                    12                   10
$\mu$ = 3            $\mu$ = 10           $\mu$ = 6
Md = 3               Md = 10              Md = 6
Mo = 3               Mo = 10              Mo = 6
range = 4            range = 4            range = 8
$\sigma^2$ = 1.667   $\sigma^2$ = 1.667   $\sigma^2$ = 6.667
$\sigma$ = 1.291     $\sigma$ = 1.291     $\sigma$ = 2.582
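The pattern in the table follows from the population definitions of the mean and variance. The short derivation below is a sketch in which the constants $a$ (the amount added) and $c$ (the multiplier) are notation introduced here; the table uses $a = 7$ and $c = 2$.

```latex
% Adding a constant: Y_i = X_i + a
\mu_Y = \mu_X + a, \qquad
\sigma_Y^2 = \frac{1}{N}\sum_{i=1}^{N}\bigl((X_i + a) - (\mu_X + a)\bigr)^2
           = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu_X)^2
           = \sigma_X^2

% Multiplying by a constant: Y_i = c X_i
\mu_Y = c\,\mu_X, \qquad
\sigma_Y^2 = \frac{1}{N}\sum_{i=1}^{N}(c X_i - c\,\mu_X)^2
           = c^2 \sigma_X^2, \qquad
\sigma_Y = |c|\,\sigma_X
```

With $c = 2$ the variance is multiplied by 4 (1.667 becomes 6.667) and the standard deviation and range by 2, while adding 7 leaves every measure of spread unchanged.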
More irritated data
Skewness
Measures the degree of asymmetry in a data set.
Pearson's 2nd Skewness Coefficient: $Sk = \frac{3(\bar{X} - Md)}{s}$
Ranges from -3 to 3 usually.
Reflects the general result that
when $\bar{X}$ > Md, the data is right skewed, Sk > 0
when $\bar{X}$ < Md, the data is left skewed, Sk < 0
when $\bar{X}$ = Md, the data is unskewed, i.e., symmetric,
Sk = 0
This is not a rule, rather a rule of thumb.
Statisticians know that the size of the sample and the
value of the mode affect the skewness.
Shape Methods
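A small Python sketch shows how the sign of the coefficient tracks the direction of the skew. It assumes Pearson's 2nd coefficient in its usual form, 3 * (mean - median) / standard deviation, with the sample standard deviation; the three small data sets are hypothetical and chosen only for illustration.

```python
import statistics

def pearson_second_skewness(values):
    """3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)   # needs at least two values that are not all equal
    return 3 * (mean - median) / s

right_skewed = [1, 2, 2, 3, 3, 4, 20]      # long tail to the right
symmetric = [1, 2, 3, 4, 5]
left_skewed = [-v for v in right_skewed]   # mirror image, long tail to the left

sk_right = pearson_second_skewness(right_skewed)
sk_sym = pearson_second_skewness(symmetric)
sk_left = pearson_second_skewness(left_skewed)

print(round(sk_right, 2), sk_sym, round(sk_left, 2))
```

The extreme value pulls the mean toward the tail while the median stays put, which is what produces the nonzero coefficient.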
[Figures: Right Skewed Histogram; Symmetric Histogram]
Weights:             Boys     Men
Mean                 54.78    172.52
Median               53       171.5
Mode                 52       171
Standard Deviation   7.93     21.81
Sample Variance      62.91    475.48
Skewness             0.67     0.14
Range                32       103
Minimum              41       126
Maximum              73       229
Sum                  2739     8626
Count                50       50

Which group of males has the more uniform weight? How do you know?
  Boys: $CV = \frac{7.93}{54.78} \times 100 = 14.5\%$
  Men:  $CV = \frac{21.81}{172.52} \times 100 = 12.6\%$

Which group is least symmetric?
  Boys: $Sk = \frac{3(54.78 - 53)}{7.93} = 0.67$
  Men:  $Sk = \frac{3(172.52 - 171.5)}{21.81} = 0.14$
Some Descriptive Statistics
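Both questions can be answered from the summary figures alone. The sketch below assumes the CV is the standard deviation as a percentage of the mean and that skewness is Pearson's 2nd coefficient; the dictionary layout and function names are made up for this example, and the numbers are taken from the Boys and Men columns.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative dispersion, in percent."""
    return std_dev / mean * 100

def pearson_second_skewness(mean, median, std_dev):
    """3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

groups = {
    "Boys": {"mean": 54.78, "median": 53.0, "std_dev": 7.93},
    "Men": {"mean": 172.52, "median": 171.5, "std_dev": 21.81},
}

cv = {name: coefficient_of_variation(g["std_dev"], g["mean"])
      for name, g in groups.items()}
sk = {name: pearson_second_skewness(g["mean"], g["median"], g["std_dev"])
      for name, g in groups.items()}

most_uniform = min(cv, key=cv.get)                      # smallest relative spread
least_symmetric = max(sk, key=lambda k: abs(sk[k]))     # skewness farthest from 0

for name in groups:
    print(name, round(cv[name], 1), round(sk[name], 2))
print(most_uniform, least_symmetric)
```

The men have the larger standard deviation in absolute terms but the smaller CV, so relative to their mean their weights are the more uniform; the boys have the skewness farther from zero.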
Chebyshev's Theorem
  $\%OBS \geq 1 - \frac{1}{k^2}$
  k = number of standard deviations, k > 1
  Universal application
  Provides minimum guarantee

Empirical or Normal Rule
  $\mu \pm 1\sigma$: find about 68% of observations
  $\mu \pm 2\sigma$: find about 95% of observations
  $\mu \pm 3\sigma$: find about 99.7% of observations
  Only bell-shaped and symmetric distributions.
  Only integer numbers of standard deviations (1, 2, 3)
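The two rules can be put side by side in a short sketch. It assumes the standard statement of Chebyshev's bound, at least 1 - 1/k^2 of the observations within k standard deviations of the mean for k > 1, and hard-codes the approximate empirical-rule percentages; the function and constant names are made up for this illustration.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

# Approximate shares for bell-shaped, symmetric data (empirical rule).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

cheb_2 = chebyshev_min_fraction(2)
cheb_3 = chebyshev_min_fraction(3)
cheb_1_5 = chebyshev_min_fraction(1.5)   # non-integer k is allowed

for k in (2, 3):
    print(k, round(chebyshev_min_fraction(k), 3), EMPIRICAL_RULE[k])
```

Chebyshev's guarantee is lower (75% at k = 2 against about 95%) because it has to hold for any distribution, not only bell-shaped ones.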
Descriptive Statistics