
Methods for Describing Sets of Data
1. Describe data using graphs
2. Describe data using numerical measures

1/6/2015

Describing Qualitative Data


Key Terms
A class is one of the categories into which
qualitative data can be classified.
The class frequency is the number of
observations in the data set falling into a
particular class.
The class relative frequency is the class
frequency divided by the total number of
observations in the data set.
The class percentage is the class relative
frequency multiplied by 100.
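
The three quantities defined above can be computed with a short sketch. It uses the counts of majors from the Summary Table slide later in this deck; the function name `class_summary` is only illustrative.

```python
# Counts of students per major, taken from the Summary Table slide.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

def class_summary(counts):
    """Return {class: (frequency, relative frequency, percentage)}."""
    total = sum(counts.values())  # total number of observations
    summary = {}
    for name, freq in counts.items():
        rel = freq / total                    # class relative frequency
        summary[name] = (freq, rel, rel * 100)  # percentage = rel * 100
    return summary

summary = class_summary(counts)
for name, (freq, rel, pct) in summary.items():
    print(f"{name:<12} {freq:>4} {rel:>6.2f} {pct:>5.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on a hand-built table.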

Data Presentation
[Diagram: data presentation methods split into two branches.]
Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
Quantitative Data: Dot Plot, Stem-&-Leaf Display, Frequency Distribution, Histogram


Summary Table
1. Lists categories & number of elements in each category
2. Obtained by tallying responses in each category
3. May show frequencies (counts), % or both

Major        Count
Accounting   130
Economics    20
Management   50
Total        200

(Each row is a category; the counts come from tally marks.)


Bar Graph
[Figure: bar graph of Frequency (vertical axis, 0 to 150) by Major (horizontal axis: Acct., Econ., Mgmt.).]
Vertical bars for qualitative variables
Equal bar widths
Bar height shows frequency or % (percent used also)
Vertical axis starts at the zero point


Pie Chart
1. Shows breakdown of total quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°)(percent), e.g., (360°)(10%) = 36°

[Figure: pie chart of Majors. Acct. 65%, Mgmt. 25%, Econ. 10% (a 36° slice).]
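
The angle rule for a pie chart can be sketched in a few lines. The percentages are the ones from the Majors pie chart; the function name `slice_angles` is an illustrative choice.

```python
# Class percentages from the Majors pie chart.
percentages = {"Acct.": 65, "Mgmt.": 25, "Econ.": 10}

def slice_angles(percentages):
    """Angle of each slice in degrees: 360 times the class percentage / 100."""
    return {name: 360 * pct / 100 for name, pct in percentages.items()}

angles = slice_angles(percentages)
for name, angle in angles.items():
    print(f"{name:<6} {angle:5.1f} degrees")
```

The angles add up to 360 degrees whenever the percentages add up to 100.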


Pareto Diagram
Like a bar graph, but with the categories arranged by
height in descending order from left to right.
[Figure: Pareto diagram of Frequency (vertical axis, 0 to 150) by Major, ordered Acct., Mgmt., Econ.]
Vertical bars for qualitative variables
Equal bar widths
Bar height shows frequency or % (percent used also)
Vertical axis starts at the zero point

Summary
Bar graph: The categories (classes) of the qualitative
variable are represented by bars, where the height of
each bar is either the class frequency, class relative
frequency, or class percentage.
Pie chart: The categories (classes) of the qualitative
variable are represented by slices of a pie (circle). The
size of each slice is proportional to the class relative
frequency.
Pareto diagram: A bar graph with the categories
(classes) of the qualitative variable (i.e., the bars)
arranged by height in descending order from left to
right.
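
As a small illustration of the difference between a bar graph and a Pareto diagram, the sketch below sorts the major counts from the Summary Table slide into descending order and prints a rough text version of the bars. The name `pareto_order` and the text rendering (one `#` per 10 observations) are illustrative choices, not part of the original slides.

```python
# Counts of students per major, taken from the Summary Table slide.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

def pareto_order(counts):
    """Return (class, frequency) pairs sorted by frequency, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

ordered = pareto_order(counts)
for name, freq in ordered:
    # One '#' per 10 observations gives a crude horizontal bar.
    print(f"{name:<12} {'#' * (freq // 10)} {freq}")
```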

Thinking Challenge
You're an analyst for IRI. You want to show the
market shares held by Web browsers in 2006.
Construct a bar graph, pie chart, & Pareto diagram
to describe the data.

Browser             Mkt. Share (%)
Firefox             14
Internet Explorer   81
Safari              4
Others              1

Bar Graph Solution*
[Figure: bar graph of Market Share (%) (vertical axis, 0% to 100%) by Browser (horizontal axis: Firefox, Internet Explorer, Safari, Others).]

Pie Chart Solution*
[Figure: pie chart of Market Share. Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%.]

Pareto Diagram Solution*
[Figure: Pareto diagram of Market Share (%) (vertical axis, 0% to 100%) by Browser, ordered Internet Explorer, Firefox, Safari, Others.]

Graphical Methods for Describing Quantitative Data


Dot Plot
1. Horizontal axis is a scale for the quantitative variable,
e.g., percent.
2. The numerical value of each measurement is located
on the horizontal scale by a dot.



Stem-and-Leaf Display
1. Divide each observation into stem value and leaf value
   Stems are listed in order in a column
   Leaf value is placed in corresponding stem row to right of bar
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

   2 | 144677
   3 | 028
   4 | 1

   (For example, 26 has stem 2 and leaf 6.)
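
The display above can be reproduced with a short sketch, assuming two-digit whole numbers so that the tens digit is the stem and the units digit is the leaf. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (tens digit) to a string of its leaves (units digits)."""
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # e.g. 26 -> stem 2, leaf 6
        rows[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(rows.items())}

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```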


Frequency Distribution Table Steps
1. Determine range
2. Select number of classes
   Usually between 5 & 15 inclusive
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations & assign to classes

Frequency Distribution Table Example
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Class (Boundaries)   Midpoint   Frequency
15.5 - 25.5          20.5       3
25.5 - 35.5          30.5       5
35.5 - 45.5          40.5       2

Width = upper boundary - lower boundary = 10 for each class
Midpoint = (Lower + Upper Boundaries) / 2
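
The counting step can be sketched as follows, assuming equal-width classes that start at a given lower boundary. The function name `frequency_table` and its parameters are illustrative. Because the boundaries end in .5, no whole-number observation can fall exactly on a boundary.

```python
def frequency_table(data, lower, width, n_classes):
    """Return (lower boundary, upper boundary, midpoint, frequency) per class."""
    table = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        midpoint = (lo + hi) / 2
        freq = sum(1 for x in data if lo < x <= hi)  # observations in this class
        table.append((lo, hi, midpoint, freq))
    return table

raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
table = frequency_table(raw, lower=15.5, width=10, n_classes=3)
for lo, hi, midpoint, freq in table:
    print(f"{lo:>5} - {hi:<5} midpoint {midpoint:<5} frequency {freq}")
```

Dividing each frequency by the number of observations (10 here) gives the relative frequencies .3, .5 and .2.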

Relative Frequency & % Distribution Tables

Relative Frequency Distribution
Class          Prop.
15.5 - 25.5    .3
25.5 - 35.5    .5
35.5 - 45.5    .2

Percentage Distribution
Class          %
15.5 - 25.5    30.0
25.5 - 35.5    50.0
35.5 - 45.5    20.0


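Histogram
Class          Freq.
15.5 - 25.5    3
25.5 - 35.5    5
35.5 - 45.5    2

[Figure: histogram of the table above. Vertical axis: Count (0 to 5); it may instead show frequency, relative frequency, or percent. Horizontal axis: lower boundaries 15.5, 25.5, 35.5, 45.5, 55.5. Bars touch.]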

Summation Notation


Summation Notation
Most formulas we use require a summation of numbers.

$$\sum_{i=1}^{n} x_i$$

Sum the measurements on the variable that appears to the
right of the summation symbol, beginning with the 1st
measurement and ending with the nth measurement.

Summation Notation
For the data $x_1 = 5$, $x_2 = 3$, $x_3 = 8$, $x_4 = 5$, $x_5 = 4$:

$$\sum_{i=1}^{5} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 = 5^2 + 3^2 + 8^2 + 5^2 + 4^2 = 25 + 9 + 64 + 25 + 16 = 139$$
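
In code, a summation is just a loop over the measurements. The sketch below evaluates the sum of squares for the five values on this slide and, for contrast, the square of the sum, which is a different quantity. The variable names are illustrative.

```python
x = [5, 3, 8, 5, 4]  # x_1 through x_5

sum_x = sum(x)                            # sum of the measurements
sum_of_squares = sum(xi ** 2 for xi in x)  # square each value, then add
square_of_sum = sum_x ** 2                 # add the values, then square

print(sum_x, sum_of_squares, square_of_sum)
```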

Mean, Median, Mode and Range
The Basics of Statistics

Did You Know...
That you probably use Statistics such as Mean, Median,
Mode and Range almost every day without even realizing it?!?

Today We Will Learn

Mean
Median
Mode
Range

And how to use these in everyday life, as well as the classroom!

To Help You Learn
You should have a Pencil, a Piece of Paper, and a
Calculator if you need one.

What Do We Already Know?
Sure, the words Mean, Median, Mode and Range all sound confusing.
But what about the words we already know, like Average, Middle,
Most Frequent, and Difference?
They are all the same ideas!

Mean
The mean is the Average of a
group of numbers
It is helpful to know the mean
because then you can see which
numbers are above and below the
mean
It is very easy to find!

Mean Example
Here are example test scores for Ms. Math's class.
82 93 86 97 82
To find the Mean, first you must add up all of the
numbers.
82 + 93 + 86 + 97 + 82 = 440
Now, since there are 5 test scores, we will next
divide the sum by 5.
440 ÷ 5 = 88
The Mean is 88!

Median
The Median is the middle value
on the list.
The first step is always to put the
numbers in order.


Median Example
First, let's examine these five test scores.
78 93 86 97 79
We need to put them in order.
78 79 86 93 97
The number in the middle is 86.
In this case, the Median is 86!

Median Example #2
Now, let's try it with an even number of test scores.
92 86 94 83 72 88
First, we will put them in order.
72 83 86 88 92 94
This time, there are two numbers in the middle, 86 and 88.
Now we will need to find the Average of these two numbers, by
adding them and dividing by two.
86 + 88 = 174
174 ÷ 2 = 87
Here the Median is 87

Mode
The Mode refers to the number that occurs
the most frequently.
It's easy to remember: the first two
letters are the same! MOde and MOst
Frequently!

Mode Example
Here is a list of temperatures for one week.
Mon.  Tues.  Wed.  Thurs.  Fri.  Sat.  Sun.
77    79     83    77      83    77    82

Again, we will put them in order.
77 77 77 79 82 83 83
77 is the most frequent number, so the mode = 77

Range
The range is the difference between the
highest and the lowest numbers of the
series.
All we have to do is put the numbers in
order and subtract!


Range Example
Let's look at the temperatures again.
77 77 77 79 82 83 83
The highest number is 83, and the lowest is 77.
All you need to do is subtract!
83 - 77 = 6
In this case, the Range is 6

Now YOU try it!!!
This is the Stat Family!
Dad 34, Mom 33, Jack 5, Alex 5, Katie 1

Mean
Here are the ages again
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
What is the Mean?
Remember Mean is the AVERAGE
Try it on your paper and see what you come up
with!

Mean
Remember, to find the mean, we have to first add up
all of the numbers.
34 + 33 + 5 + 5 + 1 = 78
Then, since there are 5 people in the family, we next
divide by 5.
78 ÷ 5 = 15.6
The Mean in this case is 15.6

Median
Here are the ages again
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
What is the Median?
Remember Median is the MIDDLE NUMBER
Try it on your paper and see what you come up
with!


Median
Remember, to find the median, we have to first
put all of the numbers in order.
34 33 5 5 1
The Median in this case is 5

Mode
Here are the ages again
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
What is the Mode?
Remember Mode is the MOST FREQUENT
Try it on your paper and see what you come up with!


Mode
Remember, to find the mode, we have to
first put all of the numbers in order.
34 33 5 5 1
5 is the only age that appears more than once.
The Mode in this case is 5

Range
Here are the ages again
Dad- 34, Mom- 33, Jack- 5, Alex- 5, Katie- 1
What is the Range?
Remember Range is the DIFFERENCE
Try it on your paper and see what you come up with!

Range
Remember, to find the range, we have to first put
all of the numbers in order.
34 33 5 5 1
The highest age is 34, and the lowest is 1.
Now we need to subtract to find the difference.
34 - 1 = 33
The range is 33
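
To check the four Stat Family answers, here is a minimal sketch using Python's standard `statistics` module. The dictionary layout and variable names are illustrative.

```python
import statistics

# Ages of the Stat Family, as listed on the slides.
ages = {"Dad": 34, "Mom": 33, "Jack": 5, "Alex": 5, "Katie": 1}
values = list(ages.values())

mean_age = statistics.mean(values)      # sum of the ages divided by 5
median_age = statistics.median(values)  # middle value once sorted
mode_age = statistics.mode(values)      # most frequent age
range_age = max(values) - min(values)   # highest minus lowest

print(mean_age, median_age, mode_age, range_age)
```

Note that `statistics.median` sorts the values internally, so the ages do not need to be ordered first.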


Next, Try it at home with your Family!

Numerical Measures
of Central Tendency


Thinking Challenge
[Figure: salaries of Rs 400,000, Rs 70,000, Rs 50,000, Rs 30,000, and Rs 20,000.]
... employees cite low pay: most workers earn only Rs 20,000.
... President claims average pay is Rs 70,000!

Two Characteristics
The central tendency of the set of
measurements, that is, the tendency of the data to
cluster, or center, about certain numerical values.

Central Tendency (Location)

Two Characteristics
The variability of the set of measurements, that
is, the spread of the data.

Variation (Dispersion)

Standard Notation
Measure   Sample      Population
Mean      $\bar{x}$   $\mu$
Size      $n$         $N$

Mean
1. Most common measure of central tendency
2. Acts as balance point
3. Affected by extreme values (outliers)
4. Denoted $\bar{x}$ where

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$$

Median
1. Measure of central tendency
2. Middle value in ordered sequence
   If n is odd, middle value of sequence
   If n is even, average of 2 middle values
3. Position of median in sequence

   $$\text{Positioning Point} = \frac{n + 1}{2}$$

4. Not affected by extreme values

Median Example
Odd-Sized Sample
Raw Data: 24.1 22.6 21.5 23.7 22.6
Ordered:  21.5 22.6 22.6 23.7 24.1
Position: 1    2    3    4    5

$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0$$

Median = 22.6

Median Example
Even-Sized Sample
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
Position: 1   2   3   4   5    6

$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$

$$\text{Median} = \frac{7.7 + 8.9}{2} = 8.30$$
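
The positioning-point rule from the two median examples can be written out as a sketch. The function name `median_by_position` is illustrative; the standard library's `statistics.median` gives the same result.

```python
def median_by_position(data):
    """Return (positioning point, median) using position = (n + 1) / 2."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2             # 1-based position in the ordered data
    lower = ordered[int(position) - 1]  # value at the whole-number position
    if position.is_integer():
        return position, lower          # odd n: the middle value itself
    upper = ordered[int(position)]      # even n: next value up
    return position, (lower + upper) / 2

odd_sample = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```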

Mode
1. Measure of central tendency

2. Value that occurs most often


3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative
data


Mode Example
No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
More Than 1 Mode
Raw Data: 21 28 28 41 43 43
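
The three cases above can be distinguished with a short sketch built on `collections.Counter`. The function name `modes` is illustrative, and treating a data set in which every value occurs exactly once as having no mode is the convention assumed here (the standard library's `statistics.multimode` would instead return every value in that case).

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes; empty if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treated as "no mode"
    return sorted(value for value, count in counts.items() if count == top)

print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))  # no mode
print(modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9]))    # one mode
print(modes([21, 28, 28, 41, 43, 43]))          # more than one mode
```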

Thinking Challenge
You're a financial analyst
for Prudential-Bache
Securities. You have
collected the following
closing stock prices of new
stock issues: 17, 16, 21, 18,
13, 16, 12, 11.
Describe the stock prices
in terms of central
tendency.

Central Tendency Solution*
Mean

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$$

Central Tendency Solution*
Median
Raw Data: 17 16 21 18 13 16 12 11
Ordered:  11 12 13 16 16 17 18 21
Position: 1  2  3  4  5  6  7  8

$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$$

$$\text{Median} = \frac{16 + 16}{2} = 16$$

Central Tendency Solution*
Mode
Raw Data: 17 16 21 18 13 16 12 11
Mode = 16

Summary of Central Tendency Measures
Measure   Formula                  Description
Mean      $\sum x_i / n$           Balance Point
Median    Position $(n + 1) / 2$   Middle Value When Ordered
Mode      none                     Most Frequent

Percentiles
Measures of central tendency that divide a
group of data into 100 parts
At least n% of the data lie below the nth
percentile, and at most (100 - n)% of the
data lie above the nth percentile
Example: 90th percentile indicates that at
least 90% of the data lie below it, and at
most 10% of the data lie above it
The median and the 50th percentile have
the same value.
Applicable for ordinal, interval, and ratio
data
Not applicable for nominal data

Percentiles: Example
Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
Location of 30th percentile:

$$i = \frac{30}{100}(8) = 2.4$$

The location index, i, is not a whole number;
i + 1 = 2.4 + 1 = 3.4; the whole number portion is
3; the 30th percentile is at the 3rd location of
the array; the 30th percentile is 13.


Quartiles
Measures of central tendency that divide a
group of data into four subgroups
Q1: 25% of the data set is below the first
quartile
Q2: 50% of the data set is below the second
quartile
Q3: 75% of the data set is below the third
quartile
Q1 is equal to the 25th percentile
Q2 is located at 50th percentile and equals
the median
Q3 is equal to the 75th percentile
Quartile values are not necessarily members
of the data set

Quartiles
[Figure: ordered data divided by Q1, Q2, and Q3 into four parts, each holding 25% of the data.]

Quartiles: Example
Ordered array: 106, 109, 114, 116, 121,
122, 125, 129

Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$

Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$

Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$
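
The location-index rule used in the percentile and quartile examples can be sketched as one function. This follows the two cases shown on the slides: when i is a whole number, average the values at locations i and i + 1; otherwise take the value at the whole-number part of i + 1. The function name `percentile` is illustrative, and other software (including Python's `statistics.quantiles`) uses different interpolation rules, so results may not match elsewhere.

```python
def percentile(data, p):
    """p-th percentile (0 < p < 100) by the location-index rule on the slides."""
    ordered = sorted(data)
    n = len(ordered)
    i = p * n / 100  # location index
    if i == int(i):
        # Whole number: average the values at 1-based locations i and i + 1.
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    # Not a whole number: the location is the whole-number part of i + 1.
    location = int(i + 1)
    return ordered[location - 1]

p30 = percentile([14, 12, 19, 23, 5, 13, 28, 17], 30)
quartile_data = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q2, q3 = (percentile(quartile_data, p) for p in (25, 50, 75))
print(p30, q1, q2, q3)
```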

Measures of Shape
Skewness
   Absence of symmetry
   Extreme values in one side of a distribution
Kurtosis
   Peakedness of a distribution
   Leptokurtic: high and thin
   Mesokurtic: normal shape
   Platykurtic: flat and spread out
Box and Whisker Plots
   Graphic display of a distribution
   Reveals skewness

Shape
1. Describes how data are distributed
2. Measures of Shape
   Skew = Symmetry

[Figure: three distribution curves.]
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean

Skewness
[Figure: three distribution curves: Negatively Skewed, Symmetric (Not Skewed), Positively Skewed.]

Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal in shape
Platykurtic: flat and spread out
[Figure: three distribution curves: Leptokurtic, Mesokurtic, Platykurtic.]

Numerical Measures
of Variability


Range
1. Measure of dispersion
2. Difference between largest & smallest
   observations
   Range = $x_{\text{largest}} - x_{\text{smallest}}$
3. Ignores how data are distributed

[Figure: two dot plots on a scale of 7 8 9 10 with differently distributed data; in both, Range = 10 - 7 = 3.]

Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{x}$ or $\mu$)

[Figure: dot plot on a scale from 4 to 12 with the mean marked at $\bar{x}$ = 8.3.]

Standard Notation
Measure              Sample      Population
Mean                 $\bar{x}$   $\mu$
Standard Deviation   $s$         $\sigma$
Variance             $s^2$       $\sigma^2$
Size                 $n$         $N$

Sample Variance Formula

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

n - 1 in denominator!

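Sample Standard Deviation Formula

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$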

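Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{where} \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 8.3$$

$$s^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368$$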
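
The sample variance formula translates directly into code. The sketch below spells out the steps by hand; the function name `sample_variance` is illustrative, and `statistics.variance` from the standard library computes the same quantity.

```python
def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

raw = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
s2 = sample_variance(raw)
print(round(s2, 3))
```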

Thinking Challenge
You're a financial analyst
for Prudential-Bache
Securities. You have
collected the following
closing stock prices of
new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
What are the variance
and standard deviation
of the stock prices?

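Variation Solution*
Sample Variance
Raw Data: 17 16 21 18 13 16 12 11

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{where} \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 15.5$$

$$s^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14$$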

Variation Solution*
Sample Standard Deviation

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{11.14} = 3.34$$
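
The stock price answers can be checked with the standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 denominator for samples. The variable names are illustrative.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]

variance = statistics.variance(prices)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(prices)      # square root of the sample variance

print(round(variance, 2), round(std_dev, 2))
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead.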

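Summary of Variation Measures

Measure: Range
Formula: $X_{\text{largest}} - X_{\text{smallest}}$
Description: Total Spread

Measure: Standard Deviation (Sample)
Formula: $\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Description: Dispersion about Sample Mean

Measure: Standard Deviation (Population)
Formula: $\sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Description: Dispersion about Population Mean

Measure: Variance (Sample)
Formula: $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Description: Squared Dispersion about Sample Mean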
