Descriptive Statistics
The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.
What is the pattern of scores over the range of possible values? Where, on the scale of possible scores, is a point that best represents the set of scores? Do the scores cluster about their central point or do they spread out around it?
Central Tendency
Measure of Central Tendency:
A single summary score that best describes the central location of an entire distribution of scores.
The typical score.
Measures of Central Tendency:
Mean
The sum of all scores divided by the number of scores.
Median
The value that divides the distribution in half.
Mode
The most frequent score.
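As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The ten scores are the small data set used in the variance example later in these slides; the variable names are illustrative only.

```python
import statistics

# Ten scores (the small data set from the variance example).
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

mean = statistics.mean(scores)        # sum of all scores / number of scores
median = statistics.median(scores)    # value that divides the distribution in half
modes = statistics.multimode(scores)  # most frequent score(s); a list, because ties are possible

print(mean, median, modes)
```

`multimode` is used here rather than `mode` so that a distribution with two typical scores reports both of them.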
Mode: Pros
Good for nominal data. Good when there are two typical scores. Easiest to compute and understand. The score comes from the data set.
Mode: Cons
Ignores most of the information in a distribution. Small samples may not have a mode.
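To find the median, count in (N+1)/2 scores when the observations are ordered from lowest to highest.
Hotel rates: (35+1)/2 = 18, and the 18th score is 317, so the median hotel rate is $317.
When N is even, (N+1)/2 falls halfway between two scores and the median is the average of those two; for example, middle scores of 6 and 7 give a median of 6.5.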
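The counting rule can be sketched in Python as follows. The function name `median_by_position` is hypothetical, and the 35 hotel rates are taken from the X column of the computational-formula table in these slides; the sketch assumes a non-empty list of numbers.

```python
def median_by_position(scores):
    """Return the median by counting in (N + 1) / 2 scores from the lowest."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of the middle
    lower = ordered[int(position) - 1]  # score at the whole-number part of the position
    if position.is_integer():
        return lower                    # odd N: the middle score itself
    upper = ordered[int(position)]      # even N: the next score up
    return (lower + upper) / 2          # average of the two middle scores

# The 35 hotel rates from the slides' table.
hotel_rates = [472, 303, 280, 282, 417, 400, 254, 205, 384, 264, 317, 76,
               643, 480, 136, 250, 100, 732, 317, 264, 384, 750, 402, 422,
               373, 325, 313, 749, 791, 196, 891, 283, 52, 186, 693]

print(median_by_position(hotel_rates))                     # 18th of 35 ordered scores
print(median_by_position([3, 4, 4, 4, 6, 7, 7, 8, 8, 9]))  # halfway between the 5th and 6th
```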
Median: Pros
Not influenced by extreme scores or skewed distributions. Good with ordinal data. Easier to compute than the mean.
Median: Cons
May not exist in the data. Doesn't take actual values into account.
Mean
Is the balance point of a distribution. The sum of negative deviations from the mean exactly equals the sum of positive deviations from the mean.
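The balance-point property can be checked numerically. This is a small sketch using the ten scores from the variance example in these slides; the variable names are illustrative.

```python
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
mean = sum(scores) / len(scores)

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]

negative_total = sum(d for d in deviations if d < 0)  # scores below the mean
positive_total = sum(d for d in deviations if d > 0)  # scores above the mean

print(negative_total, positive_total, sum(deviations))
```

The two totals have the same size and opposite signs, so the deviations sum to zero.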
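Population
\[ \mu = \frac{\sum X}{N} \]
\mu (mu) is the population mean; N is the total number of scores in a population; \sum X (sigma, the sum of X) means add up all scores.
Sample
\[ \bar{X} = \frac{\sum X}{n} \]
\bar{X} (X bar) is the sample mean; n is the number of scores in the sample.
Hotel rates:
\[ \bar{X} = \frac{13005}{35} \approx 371.6 \]
Mean hotel rate: $371.60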
Mean: Pros
Mathematical center of a distribution. Just as far from scores above it as it is from scores below it. Good for interval and ratio data. Does not ignore any information. Inferential statistics is based on mathematical properties of the mean.
Mean: Cons
Influenced by extreme scores and skewed distributions. May not exist in the data.
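To illustrate how an extreme score pulls the mean but not the median, here is a sketch that swaps one score for a hypothetical outlier. The value 90 is invented for illustration and does not come from the slides.

```python
import statistics

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

# Hypothetical variant: the highest score (9) is replaced by an extreme score of 90.
with_outlier = scores[:-1] + [90]

print(statistics.mean(scores), statistics.median(scores))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The mean more than doubles, while the median is unchanged because the middle scores are unchanged.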
Which average?
Each measure contains a different kind of information.
For example, all three measures are useful for summarizing the distribution of American household incomes.
In 1998, the income common to the greatest number of households (the mode) was $25,000. Half the households earned less than $38,885 (the median). The mean income was $50,600.
Reporting only one measure of central tendency might be misleading and perhaps reflect a bias.
Which average?
Wal-Mart's average wage is around $10 an hour, nearly double the federal minimum wage. The truth is that our wages are competitive with comparable retailers in each of the more than 3,500 communities we serve, with one exception: a handful of urban markets with unionized grocery workers. Few people realize that about 74 percent of Wal-Mart hourly store associates work full-time, compared to 20 to 40 percent at comparable retailers. This means Wal-Mart spends more broadly on health benefits than do most big retailers, whose part-timers are not offered health insurance. You may not be aware that we are one of the few retail firms that offer health benefits to part-timers. Premiums begin at less than $40 a month for an individual and less than $155 per month for a family.
Measures of Variability
A single summary figure that describes the spread of observations within a distribution.
Range
Interquartile Range
Variance
Standard Deviation
Range: Cons
Value depends only on two scores (the highest and the lowest). Very sensitive to outliers. Influenced by sample size (the larger the sample, the larger the range tends to be).
Interquartile Range: Pros
Fairly easy to compute. Scores exist in the data set. Eliminates influence of extreme scores.
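A rough sketch of both measures follows. Quartile conventions differ between textbooks and software; this version assumes the quartiles are the medians of the lower and upper halves of the ordered scores, which may not match the convention used in the original lecture. The function names and the outlier value 90 are hypothetical, and the sketch assumes at least two scores.

```python
import statistics

def value_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

def interquartile_range(scores):
    """IQR as the median of the upper half minus the median of the lower half."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # lowest half of the scores
    upper_half = ordered[-half:]   # highest half (the middle score is skipped when N is odd)
    return statistics.median(upper_half) - statistics.median(lower_half)

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
with_outlier = scores[:-1] + [90]  # hypothetical extreme score

print(value_range(scores), interquartile_range(scores))
print(value_range(with_outlier), interquartile_range(with_outlier))
```

The single extreme score inflates the range but leaves the interquartile range unchanged.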
Variance
The average squared amount by which scores deviate from the mean.
Score − Mean = Difference Score
Average of Difference Scores = 0
In order to make this number not 0, square the difference scores (no negatives to cancel out the positives).
Population (\sigma^2, sigma squared)
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
Sample
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n} \]
Variance
Use the definitional formula to calculate the variance.
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n} \]
\[ S^2 = \frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10} \]
\[ S^2 = \frac{40}{10} = 4.0 \]
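The definitional formula translates directly into code. This sketch divides by n, as the slides do; the function name is hypothetical.

```python
def variance_definitional(scores):
    """Variance as the mean of squared deviations from the mean (divides by n)."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / n

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
print(variance_definitional(scores))
```

Python's `statistics.pvariance` also divides by n, whereas `statistics.variance` divides by n − 1, so the two library functions give different answers on the same data.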
Variance
Use the computational formula to calculate the variance.
\[ S^2 = \frac{n \sum X^2 - (\sum X)^2}{n^2} \]
Hotel rates (n = 35):
X       X^2
472     222784
303     91809
280     78400
282     79524
417     173889
400     160000
254     64516
205     42025
384     147456
264     69696
317     100489
76      5776
643     413449
480     230400
136     18496
250     62500
100     10000
732     535824
317     100489
264     69696
384     147456
750     562500
402     161604
422     178084
373     139129
325     105625
313     97969
749     561001
791     625681
196     38416
891     793881
283     80089
52      2704
186     34596
693     480249
Sum: 13386     Sum: 6686202
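As a sketch, the computational formula needs only three totals: n, the sum of X, and the sum of X squared. The function name is hypothetical; the hotel-rate totals are the column sums from the table, and the second call uses the totals for the ten scores of the definitional example (sum of X = 60, sum of X squared = 400).

```python
import math

def variance_computational(n, sum_x, sum_x_squared):
    """Variance from totals: (n * sum(X^2) - (sum X)^2) / n^2."""
    return (n * sum_x_squared - sum_x ** 2) / n ** 2

# Hotel rates: n = 35, column sums taken from the table.
hotel_variance = variance_computational(35, 13386, 6686202)
hotel_sd = math.sqrt(hotel_variance)

# Ten scores from the definitional example.
small_variance = variance_computational(10, 60, 400)

print(round(hotel_variance, 2), round(hotel_sd, 2), small_variance)
```

Both formulas give the same variance for the ten scores; the computational form avoids subtracting the mean from every score.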
Variance: Pros
Takes all data into account. Lends itself to computation of other stable measures (and is a prerequisite for many of them).
Standard Deviation
To undo the squaring of difference scores, take the square root of the variance. This returns the measure to the original units rather than squared units.
The standard deviation is a rough measure of the average amount by which observations deviate on either side of the mean.
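Population
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{N \sum X^2 - (\sum X)^2}{N^2}} \]
Sample
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{n \sum X^2 - (\sum X)^2}{n^2}} \]
Definitional formula:
\[ S = \sqrt{\frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10}} = \sqrt{\frac{40}{10}} = 2.0 \]
Computational formula:
\[ S = \sqrt{\frac{10(400) - (60)^2}{10^2}} = \sqrt{4.0} = 2.0 \]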
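A minimal sketch of the standard deviation as the square root of the variance, again dividing by n as the slides do. The function name is hypothetical.

```python
import math

def standard_deviation(scores):
    """Square root of the variance (variance computed with n in the denominator)."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / n
    return math.sqrt(variance)  # back in the original units of the scores

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
print(standard_deviation(scores))
```

The standard library's `statistics.pstdev` uses the same n denominator; `statistics.stdev` uses n − 1 instead.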
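[Figure: histogram of hotel rates. Horizontal axis: Rates, in bins 0-99, 100-199, 200-299, 300-399, 400-499, 500-599, 600-699, 700-799, 800-899. Vertical axis: Frequency, 0 to 5.]
Mean: $371.60
Standard Deviation: $211.57
\[ S = \sqrt{44760.88} \approx 211.57 \]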
Standard Deviation: Pros
Lends itself to computation of other stable measures (and is a prerequisite for many of them). Average of deviations around the mean. In many distributions, the majority of the data falls within one standard deviation above or below the mean.
Is an efficient way to describe a distribution with just two numbers (the mean and the standard deviation). Allows a direct comparison between distributions that are on different scales.
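The claim that most of the data lies within one standard deviation of the mean can be checked on the ten scores from the variance example. This is only a sketch on one small data set; the share will differ from one distribution to another.

```python
import statistics

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

mean = statistics.mean(scores)  # 6
sd = statistics.pstdev(scores)  # standard deviation with n in the denominator

# Scores no more than one standard deviation below or above the mean.
within_one_sd = [x for x in scores if mean - sd <= x <= mean + sd]
share = len(within_one_sd) / len(scores)

print(within_one_sd, share)
```

Here 8 of the 10 scores fall between 4 and 8, one standard deviation either side of the mean.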