
Central Tendency and Variability

Descriptive Statistics
The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.

What is the pattern of scores over the range of possible values? Where, on the scale of possible scores, is a point that best represents the set of scores? Do the scores cluster about their central point or do they spread out around it?

Central Tendency
Measure of Central Tendency:

A single summary score that best describes the central location of an entire distribution of scores.
The typical score.

The center of the distribution.

One distribution can have multiple locations where scores cluster.


Must decide which measure is best for a given situation.

Central Tendency
Measures of Central Tendency:

Mean
The sum of all scores divided by the number of scores.

Median
The value that divides the distribution in half when observations are ordered.

Mode
The most frequent score.

Central Tendency Example: Mode


Las Vegas hotel rates: 52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891

Mode: the most frequent observation.

Mode(s) for hotel rates: 264, 317, 384 (each occurs twice)
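A minimal Python sketch of finding every mode. It assumes the 35 rates exactly as they appear in the worked variance table later in this deck (sum 13386). HOTEL_RATES and modes() are hypothetical names used only for illustration. All scores that tie for the highest count are returned, which is why this data set has three modes.

```python
from collections import Counter

# HOTEL_RATES is a hypothetical name; values copied from the worked variance table.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

def modes(scores):
    """Hypothetical helper: every score tied for the highest frequency, ascending."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, count in counts.items() if count == top)

print(modes(HOTEL_RATES))  # [264, 317, 384]
```

Python's standard library also offers statistics.multimode, which returns the modes in order of first appearance.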

Pros and Cons of the Mode


Pros: Good for nominal data. Good when there are two typical scores. Easiest to compute and understand. The score comes from the data set.

Cons: Ignores most of the information in a distribution. Small samples may not have a mode.

Central Tendency Example: Median


Las Vegas hotel rates: 52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891

The median is the middle value when observations are ordered. To find the middle, count in (N+1)/2 scores when observations are ordered from lowest to highest: (35+1)/2 = 18, so the median is the 18th score.

Median hotel rate: $317
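The counting rule above can be sketched in Python as follows. This is a minimal illustration that assumes an odd number of scores and uses the 35 rates from the worked variance table. median_odd() and HOTEL_RATES are hypothetical names. The rule's 1-based position has to be converted to Python's 0-based index.

```python
# HOTEL_RATES is a hypothetical name; values copied from the worked variance table.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

def median_odd(scores):
    """Hypothetical helper: the ((N + 1) / 2)th ordered score, odd N only."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 0:
        raise ValueError("this sketch only handles an odd number of scores")
    position = (n + 1) // 2        # 1-based position: (35 + 1) / 2 = 18
    return ordered[position - 1]   # convert to a 0-based index

print(median_odd(HOTEL_RATES))  # 317
```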


Finding the median with an even number of scores

2, 2, 3, 5, 6, 7, 7, 7, 8, 9

With an even number of scores, the median is the average of the middle two observations when observations are ordered. Find the average of the (N/2)th and the ((N+2)/2)th scores.

N/2 = 5th score (6); (N+2)/2 = 6th score (7)

Add the middle two observations and divide by two: (6+7)/2 = 6.5

Median is 6.5
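A small Python sketch that follows the slide's counting rules for both odd and even N. median() is a hypothetical helper name. For this input it agrees with the standard library's statistics.median.

```python
def median(scores):
    """Hypothetical helper: median via the 1-based counting rules on the slides."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]   # the ((N + 1) / 2)th score
    lower = ordered[n // 2 - 1]            # the (N / 2)th score
    upper = ordered[(n + 2) // 2 - 1]      # the ((N + 2) / 2)th score
    return (lower + upper) / 2

print(median([2, 2, 3, 5, 6, 7, 7, 7, 8, 9]))  # 6.5
```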

Pros and Cons of Median


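Pros: Not influenced by extreme scores or skewed distributions. Good with ordinal data. Easier to compute than the mean.

Cons: May not exist in the data (with an even number of scores it can fall between two observed values, as with 6.5 above). Doesn't take the actual values of the scores into account, only their order.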

Mean
Is the balance point of a distribution. The negative deviations from the mean exactly offset the positive deviations: their sums are equal in size and opposite in sign, so all deviations from the mean add up to zero.
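The balance-point property can be checked exactly with Python's fractions.Fraction, which avoids floating-point rounding. This sketch assumes the 35 rates from the worked variance table. HOTEL_RATES is a hypothetical name.

```python
from fractions import Fraction

# HOTEL_RATES is a hypothetical name; values copied from the worked variance table.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

mean = Fraction(sum(HOTEL_RATES), len(HOTEL_RATES))  # exactly 13386/35
deviations = [x - mean for x in HOTEL_RATES]

negative_total = sum(d for d in deviations if d < 0)
positive_total = sum(d for d in deviations if d > 0)

print(round(float(negative_total), 2), round(float(positive_total), 2))  # -2873.14 2873.14
print(negative_total + positive_total == 0)  # True: the two sides balance exactly
```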

Mean
Population
\[ \mu = \frac{\sum X}{N} \]
μ (mu) is the population mean; Σ (sigma), the sum of X, means add up all scores; N is the total number of scores in a population.

Sample
\[ \bar{X} = \frac{\sum X}{n} \]
X̄ (X bar) is the sample mean; n is the total number of scores in a sample.

Central Tendency Example: Mean


Las Vegas hotel rates: 52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891

\[ \bar{X} = \frac{\sum X}{n} = \frac{13386}{35} = 382.46 \]

Mean hotel rate: $382.46
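A direct Python translation of X̄ = ΣX / n. It assumes the 35 rates from the worked variance table (ΣX = 13386). HOTEL_RATES is a hypothetical name.

```python
# HOTEL_RATES is a hypothetical name; values copied from the worked variance table.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

total = sum(HOTEL_RATES)   # sum of X
n = len(HOTEL_RATES)       # number of scores
mean = total / n

print(total, n, round(mean, 2))  # 13386 35 382.46
```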

Pros and Cons of the Mean


Pros: Mathematical center of a distribution: the total distance to the scores above it equals the total distance to the scores below it. Good for interval and ratio data. Does not ignore any information. Inferential statistics is based on mathematical properties of the mean.

Cons: Influenced by extreme scores and skewed distributions. May not exist in the data (the mean need not equal any actual score).

The effect of skew on average.


In a normal distribution, the mean, median, and mode are the same. In a skewed distribution, the mean is pulled toward the tail.
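A quick Python check of this effect on the hotel rates, using the standard statistics module. The sketch assumes the 35 rates from the worked variance table (HOTEL_RATES is a hypothetical name). The rates have a long right tail (643 up to 891), and that tail pulls the mean well above the median.

```python
import statistics

# HOTEL_RATES is a hypothetical name; values copied from the worked variance table.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

mean = statistics.mean(HOTEL_RATES)
median = statistics.median(HOTEL_RATES)

# A right-skewed distribution: the mean sits above the median.
print(round(mean, 2), median)  # 382.46 317
```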

Which average?
Each measure contains a different kind of information.

For example, all three measures are useful for summarizing the distribution of American household incomes in 1998:
The mode: the income common to the greatest number of households was $25,000.
The median: half the households earned less than $38,885.
The mean: the mean income was $50,600.

Reporting only one measure of central tendency might be misleading and perhaps reflect a bias.

Which average?
Wal-Mart's average wage is around $10 an hour, nearly double the federal minimum wage. The truth is that our wages are competitive with comparable retailers in each of the more than 3,500 communities we serve, with one exception: a handful of urban markets with unionized grocery workers. Few people realize that about 74 percent of Wal-Mart hourly store associates work full-time, compared to 20 to 40 percent at comparable retailers. This means Wal-Mart spends more broadly on health benefits than do most big retailers, whose part-timers are not offered health insurance. You may not be aware that we are one of the few retail firms that offer health benefits to part-timers. Premiums begin at less than $40 a month for an individual and less than $155 per month for a family.


Measures of Variability
A single summary figure that describes the spread of observations within a distribution.

Measures of Variability
Range
Difference between the smallest and largest observations.

Interquartile Range
Range of the middle half of scores.

Variance
Mean of all squared deviations from the mean.

Standard Deviation
Rough measure of the average amount by which observations deviate from the mean. The square root of the variance.

Variability Example: Range


Las Vegas Hotel Rates: 52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891

Range: 891 - 52 = 839
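In Python the range is a one-liner. This sketch assumes the 35 rates from the worked variance table, and HOTEL_RATES is a hypothetical name. Only the two extreme scores affect the result.

```python
# HOTEL_RATES is a hypothetical name; values copied from the worked variance table.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

data_range = max(HOTEL_RATES) - min(HOTEL_RATES)  # largest minus smallest
print(data_range)  # 839
```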

Pros and Cons of the Range


Pros: Very easy to compute. Scores exist in the data set.

Cons: Value depends only on two scores. Very sensitive to outliers. Influenced by sample size (the larger the sample, the larger the range tends to be).

Variability Example: Interquartile Range


Las Vegas Hotel Rates
52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891

Interquartile Range:

Q1 is the (n+1)/4 = (35+1)/4 = 9th score (254); Q3 is the 3(n+1)/4 = 27th score (472).

IQR = 472 - 254 = 218
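The quartile positions above can be sketched in Python as follows. The sketch assumes the 35 rates from the worked variance table. HOTEL_RATES and quartile_by_position() are hypothetical names, and the helper deliberately refuses fractional positions rather than guessing an interpolation rule. For comparison, statistics.quantiles (Python 3.8+) with method="exclusive" uses the same (n + 1)p positions.

```python
import statistics

# HOTEL_RATES is a hypothetical name; values copied from the worked variance table.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

def quartile_by_position(scores, p):
    """Hypothetical helper: the score at 1-based position p * (n + 1)."""
    ordered = sorted(scores)
    position = p * (len(ordered) + 1)
    if position != int(position):
        raise ValueError("fractional position; this sketch does not interpolate")
    return ordered[int(position) - 1]

q1 = quartile_by_position(HOTEL_RATES, 0.25)  # (35 + 1) / 4 = 9th score
q3 = quartile_by_position(HOTEL_RATES, 0.75)  # 3(35 + 1) / 4 = 27th score
print(q1, q3, q3 - q1)  # 254 472 218

print(statistics.quantiles(HOTEL_RATES, n=4, method="exclusive"))  # [254.0, 317.0, 472.0]
```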

Pros and Cons of the Interquartile Range


Pros: Fairly easy to compute. Scores exist in the data set. Eliminates influence of extreme scores.

Cons: Discards much of the data.

Variance
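A measure of how far scores typically deviate from the typical score (the mean), expressed in squared units.

Score - Mean = Difference score. The average of the difference scores is always 0, because the positive and negative differences cancel out. To get a number that is not 0, square the difference scores (so there are no negatives to cancel out the positives) and then average them.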
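A short Python illustration of why the differences are squared. It uses the ten scores from this deck's worked variance examples (3, 4, 4, 4, 6, 7, 7, 8, 8, 9). The raw difference scores average to 0, while the squared ones do not.

```python
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
mean = sum(scores) / len(scores)      # 6.0

diffs = [x - mean for x in scores]    # difference scores: score - mean
squared = [d ** 2 for d in diffs]     # squaring removes the signs

print(sum(diffs) / len(diffs))        # 0.0: positives and negatives cancel
print(sum(squared) / len(squared))    # 4.0: the average squared difference
```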

Variance: Definitional Formula


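Population
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

Sample
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n} \]

σ² (sigma squared) is the population variance; S² is the sample variance.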

Variance
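Use the definitional formula to calculate the variance of the scores 3, 4, 4, 4, 6, 7, 7, 8, 8, 9 (n = 10, X̄ = 6).

\[ S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10} = \frac{40}{10} = 4.0 \]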
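A minimal Python version of the definitional formula, dividing by n as this deck does. variance_definitional() is a hypothetical helper name. The standard library's statistics.pvariance follows the same divide-by-n convention. statistics.variance divides by n - 1 instead and would give 40/9 (about 4.44) here.

```python
import statistics

def variance_definitional(scores):
    """Hypothetical helper: sum of (X - mean)^2 divided by n."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
print(variance_definitional(scores))   # 4.0
print(statistics.pvariance(scores))    # same value, divide-by-n convention
```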

Variance: Computational Formula


Population
\[ \sigma^2 = \frac{N\sum X^2 - (\sum X)^2}{N^2} \]

Sample
\[ S^2 = \frac{n\sum X^2 - (\sum X)^2}{n^2} \]

Variance
Use the computational formula to calculate the variance.

X      X²
3      9
4      16
4      16
4      16
6      36
7      49
7      49
8      64
8      64
9      81
Sum: 60      Sum: 400

\[ S^2 = \frac{n\sum X^2 - (\sum X)^2}{n^2} = \frac{10(400) - (60)^2}{10^2} = \frac{4000 - 3600}{100} = \frac{400}{100} = 4.0 \]
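The computational formula needs only three running quantities: n, ΣX, and ΣX². The Python sketch below is a minimal version of it. variance_computational() is a hypothetical helper name. Python integers are exact, so the numerator is computed without rounding.

```python
def variance_computational(scores):
    """Hypothetical helper: (n * sum(X^2) - (sum X)^2) / n^2."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (n * sum_x2 - sum_x ** 2) / n ** 2

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
print(sum(scores), sum(x * x for x in scores))  # 60 400
print(variance_computational(scores))           # 4.0
```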

Variability Example: Variance


Las Vegas Hotel Rates

\[ S^2 = \frac{n\sum X^2 - (\sum X)^2}{n^2} = \frac{35(6686202) - (13386)^2}{35^2} = \frac{234017070 - 179184996}{1225} = 44760.88 \]

X      X²
472    222784
303    91809
280    78400
282    79524
417    173889
400    160000
254    64516
205    42025
384    147456
264    69696
317    100489
76     5776
643    413449
480    230400
136    18496
250    62500
100    10000
732    535824
317    100489
264    69696
384    147456
750    562500
402    161604
422    178084
373    139129
325    105625
313    97969
749    561001
791    625681
196    38416
891    793881
283    80089
52     2704
186    34596
693    480249
Sum: 13386      Sum: 6686202
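The hotel-rate table can be reproduced in Python to check the intermediate sums. The values below are the 35 rates from the table, listed in sorted order (order does not affect the sums). HOTEL_RATES is a hypothetical name.

```python
# HOTEL_RATES is a hypothetical name; values copied from the table above, sorted.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

n = len(HOTEL_RATES)
sum_x = sum(HOTEL_RATES)                   # sum of X
sum_x2 = sum(x * x for x in HOTEL_RATES)   # sum of X squared
numerator = n * sum_x2 - sum_x ** 2        # exact integer arithmetic
variance = numerator / n ** 2

print(n, sum_x, sum_x2)     # 35 13386 6686202
print(numerator, n ** 2)    # 54832074 1225
print(round(variance, 2))   # 44760.88
```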

Pros and Cons of Variance


Pros: Takes all data into account. Lends itself to computation of other stable measures (and is a prerequisite for many of them).

Cons: Hard to interpret, because it is expressed in squared units. Can be influenced by extreme scores.

Standard Deviation
To undo the squaring of the difference scores, take the square root of the variance. This returns the measure to the original units rather than squared units.

Standard Deviation
Rough measure of the average amount by which observations deviate on either side of the mean. The square root of the variance.

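Population
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{N\sum X^2 - (\sum X)^2}{N^2}} \]

Sample
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{n\sum X^2 - (\sum X)^2}{n^2}} \]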

Variability Example: Standard Deviation

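Definitional formula:
\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10}} = \sqrt{\frac{40}{10}} = \sqrt{4.0} = 2.0 \]

Computational formula:
\[ S = \sqrt{\frac{n\sum X^2 - (\sum X)^2}{n^2}} = \sqrt{\frac{10(400) - (60)^2}{10^2}} = \sqrt{\frac{4000 - 3600}{100}} = \sqrt{4.0} = 2.0 \]

Mean: 6    Standard Deviation: 2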
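A minimal Python sketch of the standard deviation as the square root of the divide-by-n variance. It uses the same ten scores and cross-checks the result against statistics.pstdev, which uses the same population-style convention.

```python
import math
import statistics

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
n = len(scores)
mean = sum(scores) / n
variance = sum((x - mean) ** 2 for x in scores) / n   # 4.0
sd = math.sqrt(variance)                               # back to original units

print(mean, sd)                    # 6.0 2.0
print(statistics.pstdev(scores))   # same value, divide-by-n convention
```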

Variability Example: Standard Deviation


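Las Vegas Hotel Rates

[Figure: histogram of hotel rates; x-axis: Rates, in $100 bins from 0-99 to 800-899; y-axis: Frequency, 0 to 9]

Mean: $382.46 (13386 / 35)

Standard Deviation:
\[ S = \sqrt{\frac{n\sum X^2 - (\sum X)^2}{n^2}} = \sqrt{\frac{35(6686202) - (13386)^2}{35^2}} = \sqrt{\frac{234017070 - 179184996}{1225}} = \sqrt{44760.88} = 211.57 \]

Standard Deviation: $211.57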
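The same calculation in Python. It assumes the 35 rates from the worked variance table, and HOTEL_RATES is a hypothetical name. The sketch reports the mean and the standard deviation together, in dollars.

```python
import math

# HOTEL_RATES is a hypothetical name; values copied from the worked variance table.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

n = len(HOTEL_RATES)
sum_x = sum(HOTEL_RATES)
sum_x2 = sum(x * x for x in HOTEL_RATES)
sd = math.sqrt((n * sum_x2 - sum_x ** 2) / n ** 2)  # computational formula

print(round(sum_x / n, 2), round(sd, 2))  # 382.46 211.57
```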

Pros and Cons of Standard Deviation


Pros: Lends itself to computation of other stable measures (and is a prerequisite for many of them). Roughly the average deviation around the mean, in the original units. In many distributions, most of the data fall within one standard deviation above or below the mean.

Cons: Influenced by extreme scores.
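The "most of the data within one standard deviation" point can be checked on the hotel rates with a short Python sketch. It assumes the 35 rates from the worked variance table (HOTEL_RATES is a hypothetical name) and the deck's divide-by-n standard deviation (statistics.pstdev).

```python
import statistics

# HOTEL_RATES is a hypothetical name; values copied from the worked variance table.
HOTEL_RATES = [52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282,
               283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
               472, 480, 643, 693, 732, 749, 750, 791, 891]

mean = statistics.mean(HOTEL_RATES)
sd = statistics.pstdev(HOTEL_RATES)      # divide-by-n standard deviation
lower, upper = mean - sd, mean + sd      # about 170.89 to 594.03

within = [x for x in HOTEL_RATES if lower <= x <= upper]
print(len(within), len(HOTEL_RATES))              # 24 35
print(f"{len(within) / len(HOTEL_RATES):.0%}")    # 69%
```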

Mean and Standard Deviation


Using the mean and standard deviation together is an efficient way to describe a distribution with just two numbers, and it allows a direct comparison between distributions that are on different scales.
