
STAT 201 Introduction to Business Statistics

Class 3:
Describing Data by Graphs, Charts, Tables
Bar and Pie Charts
Stem and Leaf Diagram
Scatter Plots, Line Charts
Describing Data by Numerical Measures

Takeaways from Class 2

A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes. The relative frequency of a class equals the fraction or proportion of items belonging to that class.

For a data set with n observations: Relative frequency of a class = Frequency of the class / n

Three steps are necessary to define the classes for a frequency distribution with quantitative data:

Determine the number of non-overlapping classes
Determine the width of each class
Determine the class limits

Takeaways from Class 2

Too many classes:

May yield a very jagged distribution with gaps from empty classes

Too few classes:

May compress variation too much and yield a blocky distribution

May use 2^k > n as a guideline, where k is the number of classes and n is the number of observations
Range of data = Largest data point - Smallest data point

Approximate class width = Range / (Number of classes)
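
As a rough illustration of the guideline and the width formula above, the following Python sketch picks the smallest k with 2^k > n and then computes the approximate class width. The function names `number_of_classes` and `approximate_class_width` are made up for this example, and the inputs (n = 50, smallest value 68, largest value 141) are borrowed from the aptitude test data used later in these notes.

```python
def number_of_classes(n):
    """Smallest k such that 2**k > n (the guideline from the notes)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def approximate_class_width(smallest, largest, k):
    """Range of the data divided by the number of classes."""
    return (largest - smallest) / k


n = 50                      # number of observations
k = number_of_classes(n)    # 2**5 = 32 is not > 50, but 2**6 = 64 is
width = approximate_class_width(68, 141, k)
print(k, round(width, 2))   # 6 12.17
```

In practice the computed width is usually rounded up to a convenient value so that the classes cover the full range of the data.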

Takeaways from Class 2


Important Considerations for Selecting Classes:

Must be mutually exclusive
Must be all-inclusive
Categories (classes) should be of equal width
Avoid empty categories

Histogram: A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis.

Cumulative Distributions (Ogives): Show the number of data items with values less than or equal to the upper class limit of each class.

Today's Focus

Further graphs and charts for qualitative and quantitative data

Bar and Pie Charts
Stem and Leaf Diagram
Scatter Plots and Line Charts

Describing Data by Numerical Measures

Mean
Median
Mode

Bar Charts

A bar chart is a graphical device for depicting categorical data summarized in a frequency, relative frequency, or percent frequency distribution.

Horizontal axis: Specify the labels used for the classes (categories).
Vertical axis: Using a bar of fixed width drawn above each class label, extend the length of the bar until it reaches the frequency, relative frequency, or percent frequency of the class.
For categorical data, bars should be separated to emphasize the fact that each class is separate.

Soft Drink Example Revisited

Data from a sample of 50 soft drink purchases


Coke Classic   Diet Coke      Pepsi          Diet Coke      Coke Classic
Coke Classic   Dr. Pepper     Diet Coke      Pepsi          Pepsi
Coke Classic   Dr. Pepper     Sprite         Coke Classic   Diet Coke
Coke Classic   Coke Classic   Sprite         Coke Classic   Diet Coke
Coke Classic   Diet Coke      Coke Classic   Sprite         Pepsi
Coke Classic   Coke Classic   Coke Classic   Pepsi          Coke Classic
Sprite         Dr. Pepper     Pepsi          Diet Coke      Pepsi
Coke Classic   Coke Classic   Coke Classic   Pepsi          Dr. Pepper
Coke Classic   Diet Coke      Pepsi          Pepsi          Pepsi
Pepsi          Coke Classic   Dr. Pepper     Pepsi          Sprite

Soft Drink Example Revisited

Data from a sample of 50 soft drink purchases


Soft Drink      Frequency   Relative Frequency   Percent Frequency
Coke Classic       19             0.38                  38
Diet Coke           8             0.16                  16
Dr. Pepper          5             0.10                  10
Pepsi              13             0.26                  26
Sprite              5             0.10                  10
Total              50             1.00                 100
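
The frequency, relative frequency, and percent frequency columns can be reproduced from the 50 raw purchases. The sketch below is one way to do it in Python with `collections.Counter`; the short aliases (`CC`, `DC`, `DP`, `P`, `S`) and the variable names are introduced only for this illustration.

```python
from collections import Counter

# Short aliases (hypothetical, for readability only)
CC, DC, DP, P, S = "Coke Classic", "Diet Coke", "Dr. Pepper", "Pepsi", "Sprite"

# The 50 purchases from the sample, in the order listed in the notes
purchases = [
    CC, DC, P, DC, CC,
    CC, DP, DC, P, P,
    CC, DP, S, CC, DC,
    CC, CC, S, CC, DC,
    CC, DC, CC, S, P,
    CC, CC, CC, P, CC,
    S, DP, P, DC, P,
    CC, CC, CC, P, DP,
    CC, DC, P, P, P,
    P, CC, DP, P, S,
]

n = len(purchases)
counts = Counter(purchases)

# drink -> (frequency, relative frequency, percent frequency)
table = {
    drink: (freq, freq / n, 100 * freq / n)
    for drink, freq in sorted(counts.items())
}

for drink, (freq, rel, pct) in table.items():
    print(f"{drink:<13}{freq:>4}{rel:>7.2f}{pct:>6.0f}")
```

The relative frequencies sum to 1 and the percent frequencies to 100, which is a quick check that every purchase was counted exactly once.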

Soft Drink Example Revisited

Column Bar Chart:

[Figure: "Bar Chart for Soft Drinks". Horizontal axis: Soft Drinks (Coke Classic, Diet Coke, Dr. Pepper, Pepsi, Sprite). Vertical axis: Frequency, from 0 to 20.]

Bar Charts vs Histogram

Histograms are used to represent a frequency distribution associated with a single quantitative (ratio or interval scale) variable. There are no gaps between the histogram bars.

Bar charts are used when one or more variables of interest are categorical.

Multivariate Categorical Data

A useful feature of bar charts is that they can display several variables at once.

Amounts in thousand dollars:

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                   46.5         55.0         27.5      129.0
Bonds                    32.0         44.0         19.0       95.0
Derivatives              15.5         20.0         13.5       49.0
Savings                  16.0         28.0          7.0       51.0
Total                   110.0        147.0         67.0      324.0
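
The Total row and Total column of the investment table are plain sums over investors and over categories. A small Python sketch, with the dictionary layout and variable names chosen only for this example, shows how they can be recomputed:

```python
# Amounts in thousand dollars, one inner dict per investor
amounts = {
    "Investor A": {"Stocks": 46.5, "Bonds": 32.0, "Derivatives": 15.5, "Savings": 16.0},
    "Investor B": {"Stocks": 55.0, "Bonds": 44.0, "Derivatives": 20.0, "Savings": 28.0},
    "Investor C": {"Stocks": 27.5, "Bonds": 19.0, "Derivatives": 13.5, "Savings": 7.0},
}
categories = ["Stocks", "Bonds", "Derivatives", "Savings"]

# Total per investor (the Total row of the table)
investor_totals = {inv: sum(row.values()) for inv, row in amounts.items()}

# Total per investment category (the Total column of the table)
category_totals = {
    cat: sum(row[cat] for row in amounts.values()) for cat in categories
}

grand_total = sum(investor_totals.values())
print(investor_totals)
print(category_totals)
print(grand_total)
```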

Multivariate Categorical Data

A useful feature of bar charts is that they can display several variables at once.

[Figure: clustered column chart "Investment Categories Across Investors". Horizontal axis: Investment Categories (Stocks, Bonds, Derivatives, Savings). Vertical axis: Thousand Dollars, from 0.0 to 60.0. One bar per investor (Investor A, Investor B, Investor C) in each category.]

Horizontal Bar Chart

Besides column bar charts, we can also display data with horizontal bar charts.

[Figure: horizontal bar chart "Investment Categories Across Investors". Vertical axis: Investment Categories (Stocks, Bonds, Derivatives, Savings). Horizontal axis: Thousand Dollars, from 0.0 to 60.0. One bar per investor (Investor A, Investor B, Investor C) in each category.]

Pie Charts

A pie chart is another graphical device for depicting the relative frequency or percent frequency distribution of categorical data.

A pie chart has the shape of a circle.
The circle is divided into slices corresponding to the categories or classes to be displayed.
The size of each slice is proportional to the magnitude of the displayed variable associated with each category or class.

Soft Drink Example Revisited

Pie Chart:

[Figure: "Pie Chart for Soft Drink Purchases". Slices: Coke Classic 38%, Diet Coke 16%, Dr. Pepper 10%, Pepsi 26%, Sprite 10%.]

Bar Chart vs Pie Chart

Both Bar and Pie Charts are used to depict categorical data. Which type of chart should be preferred?

A pie chart is appropriate for showing each category's proportion of a total. Otherwise, a bar chart is more appropriate.

Stem and Leaf Diagram

A simple way to see distribution details from quantitative data

Constructing a Stem and Leaf Diagram

(Sort the data from low to high)
Decide how to split each value into stem and leaf
List all possible stems in a single column
For each stem, list all leaves associated with the stem

Stem and Leaf Diagram

Ex: Number of questions answered correctly in an aptitude test:


112   73  126   82   92   72   92  128  104  108
 69   76  118  132   96   97   86  127  134  100
107   73  124   83   92  115   95   76  141   91
 81  102   80   81  106   84   68  100  119   98
 85  113  115   94   98  106  106   75   95  119

Stem and Leaf Diagram

Sorting the data from low to high:


 68   69   72   73   73   75   76   76   80   81
 81   82   83   84   85   86   91   92   92   92
 94   95   95   96   97   98   98  100  100  102
104  106  106  106  107  108  112  113  115  115
118  119  119  124  126  127  128  132  134  141

Stem and Leaf Diagram

Leading digits of each value go to the left of a vertical line:

 6 |
 7 |
 8 |
 9 |
10 |
11 |
12 |
13 |
14 |

Stem and Leaf Diagram

To the right of the vertical line, record the last digit for each data value:

 6 | 89
 7 | 233566
 8 | 01123456
 9 | 12224556788
10 | 002466678
11 | 2355899
12 | 4678
13 | 24
14 | 1

Numbers to the left of the vertical line form the stem.
Each digit to the right of the vertical line is a leaf.
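
The construction steps can also be sketched in a few lines of Python. This is only an illustration under the assumption that the data are whole numbers whose last digit serves as the leaf; the function name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (leading digits) to a string of sorted leaves (last digits)."""
    groups = defaultdict(list)
    for value in sorted(values):          # step 1: sort from low to high
        stem, leaf = divmod(value, 10)    # step 2: split into stem and leaf
        groups[stem].append(leaf)
    # step 3: list every possible stem, even one that has no leaves
    stems = range(min(groups), max(groups) + 1)
    # step 4: attach the leaves to each stem
    return {s: "".join(str(leaf) for leaf in groups.get(s, [])) for s in stems}


# Aptitude test scores from the example above
scores = [
    112, 73, 126, 82, 92, 72, 92, 128, 104, 108,
    69, 76, 118, 132, 96, 97, 86, 127, 134, 100,
    107, 73, 124, 83, 92, 115, 95, 76, 141, 91,
    81, 102, 80, 81, 106, 84, 68, 100, 119, 98,
    85, 113, 115, 94, 98, 106, 106, 75, 95, 119,
]

display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem:>2} | {leaves}")
```

Running it on the 50 test scores prints the same nine rows as the display above.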

Stem and Leaf Diagram vs Histogram

A Stem and Leaf diagram is similar to a histogram in that it displays the distribution of a quantitative variable.

Advantages of Stem and Leaf over Histogram:

The stem and leaf display is easier to construct by hand
Within a class interval, the stem and leaf display provides more information than the histogram because it shows the actual data
In the histogram, the individual value of the data is lost once it falls into a class
The stem and leaf diagram shows individual data values

Stem and Leaf Diagram

Using other stem units:


Using the 100s digit as the stem: Round off the 10s digit to form the leaves

613 would become     6 | 1
776 would become     7 | 8
...
1224 becomes        12 | 2

Line Charts

Line charts are effective tools for representing data that are measured over time (e.g., monthly, quarterly, annually).
Line charts show values of one variable vs. time:

Time is traditionally shown on the horizontal axis
The variable of interest is shown on the vertical axis

Line charts are also called trend charts or time series plots.

Line Chart Example


Year   Inflation Rate
1985   3.56
1986   1.86
1987   3.65
1988   4.14
1989   4.82
1990   5.40
1991   4.21
1992   3.01
1993   2.99
1994   2.56
1995   2.83
1996   2.95
1997   2.29
1998   1.56
1999   2.21
2000   3.36
2001   2.85
2002   1.59
2003   2.27
2004   2.68
2005   3.39
2006   3.24

[Figure: line chart "Inflation Rate". Horizontal axis: Year, 1985 to 2006. Vertical axis: Inflation Rate, from 0 to 6.]

Scatter Diagrams

A scatter diagram is a graphical presentation of the relationship between two quantitative variables.

It shows points for bivariate data (joint values for 2 quantitative variables)
One variable is measured on the vertical axis and the other variable is measured on the horizontal axis

Scatter Diagrams

Relationship between number of commercials shown and sales:


Week   Number of commercials   Sales in $1000
  1              2                   50
  2              5                   57
  3              1                   41
  4              3                   54
  5              4                   54
  6              1                   38
  7              5                   63
  8              3                   48
  9              4                   59
 10              2                   46

Scatter Diagrams

Relationship between number of commercials shown and sales:

[Figure: scatter diagram "Sales vs Number of Commercials Shown". Horizontal axis: Number of commercials shown, from 0 to 6. Vertical axis: Sales in $1000, from 0 to 70.]

What type of relationship is there between number of commercials shown and sales?

Increasing Linear Relationship
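
Without drawing the plot, one informal way to see the increasing pattern is to average the sales for each number of commercials. The sketch below does this in Python for the ten weeks in the table. Grouping by the number of commercials is an approach chosen for this illustration, not a method from the slides, and the variable names are hypothetical.

```python
from collections import defaultdict

# Weekly data from the table: number of commercials and sales in $1000
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

# Collect the sales figures observed for each number of commercials
by_count = defaultdict(list)
for c, s in zip(commercials, sales):
    by_count[c].append(s)

# Mean sales for each number of commercials, in increasing order
average_sales = {c: sum(v) / len(v) for c, v in sorted(by_count.items())}

for c, avg in average_sales.items():
    print(c, avg)
# 1 39.5
# 2 48.0
# 3 51.0
# 4 56.5
# 5 60.0
```

Average sales rise with every additional commercial, which agrees with the increasing relationship visible in the scatter diagram.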

Scatter Diagrams

Several types of relationships between two variables


Increasing Linear
Decreasing Linear
Increasing Curvilinear
Decreasing Curvilinear
No Relationship

Tabular and Graphical Methods for Summarizing Data

Categorical Data

Tabular Methods:
Frequency distribution
Relative frequency distribution
Percent frequency distribution

Graphical Methods:
Bar chart
Pie chart

Quantitative Data

Tabular Methods:
Frequency distribution
Relative frequency distribution
Percent frequency distribution
Cumulative frequency distribution
Cumulative relative frequency distribution
Cumulative percent frequency distribution

Graphical Methods:
Histogram
Ogive
Stem and Leaf Display
Line chart
Scatter diagram


Measures of Center and Location

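Center and Location

Mean (balance point):

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Median: midpoint of the ordered data

Mode: value that occurs most often

Weighted Mean:

$$\bar{x}_W = \frac{\sum w_i x_i}{\sum w_i} \qquad \mu_W = \frac{\sum w_i x_i}{\sum w_i}$$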

Mean (Arithmetic Average)


The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 axis, one with Mean = 3 and one with Mean = 4.]

$$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \qquad \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$$

Mean (Arithmetic Average)

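The Mean is the arithmetic average of data values

Population mean (N = Population Size):

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

Sample mean (n = Sample Size):

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$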
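
To make the effect of an extreme value concrete, the short Python sketch below applies the sample mean formula to the two five-value data sets from the previous slide. The helper name `mean` and the variable names are chosen for this example; `statistics.median` from the standard library is used only for comparison with the median, which is covered next.

```python
import statistics


def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # the 5 is replaced by an extreme value

print(mean(typical), statistics.median(typical))            # 3.0 3
print(mean(with_outlier), statistics.median(with_outlier))  # 4.0 3
```

Changing a single value from 5 to 10 moves the mean from 3 to 4, while the middle value stays at 3.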

Median

In an ordered array (lowest to highest), the median is the middle number, i.e., the number that splits the distribution in half numerically

50% of the data is above the median, 50% is below
Represented as Md

The median is not affected by extreme values

[Figure: two dot plots on a 0 to 10 axis, each with Median = 3.]

Median

To find the median, sort the n data values from low to high (sorted data is called a data array)
Find the value in the i = (1/2) n position

The ith position is the Median Index Point

If i is not an integer, round up to the next highest integer. (i.e., for an odd number of observations, the median is the middle value.)
If i is an integer, the median is the average of the values in positions i and i + 1. (i.e., for an even number of observations, the median is the average of the two middle values.)

Median Example
Data array: 4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24

Note that n = 13
Find the i = (1/2) n position: i = (1/2)(13) = 6.5

Since 6.5 is not an integer, round up to 7
The median is the value in the 7th position: Md = 12
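
The index rule translates directly into code. The following Python sketch follows the steps above for a non-empty list of numbers; the function name `median_by_index` is invented for this illustration, and the second call uses a small made-up data set to show the even case.

```python
import math


def median_by_index(values):
    """Median via the median index point i = n / 2 (positions count from 1)."""
    data = sorted(values)              # the data array
    n = len(data)
    i = n / 2
    if i != int(i):
        position = math.ceil(i)        # not an integer: round up
        return data[position - 1]      # convert 1-based position to 0-based index
    position = int(i)
    # integer: average the values in positions i and i + 1
    return (data[position - 1] + data[position]) / 2


scores = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
print(median_by_index(scores))         # 12 (the value in the 7th position)
print(median_by_index([1, 2, 3, 4]))   # 2.5 (average of positions 2 and 3)
```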

Mean vs Median

The median is the measure of location most often reported for annual income and property value data because a few extremely large incomes or property values can inflate the mean.


Shape of a Distribution

Describes how data is distributed
Symmetric or skewed
The greater the difference between the mean and the median, the more skewed the distribution

Left-Skewed: Mean < Median (Longer tail extends to left)
Symmetric: Mean = Median
Right-Skewed: Median < Mean (Longer tail extends to right)

Shape of a Distribution
Left-Skewed: Mean < Median (Longer tail extends to left)
Example: Exam Scores

Symmetric: Mean = Median
Example: Weight and height of people

Right-Skewed: Median < Mean (Longer tail extends to right)
Example: Data from business and economics (Income, Housing prices)

Mode

A measure of location
The value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes (2 modes = bimodal)

[Figure: dot plots illustrating a data set with Mode = 5 (on a 0 to 14 axis) and a data set with No Mode.]
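
The cases listed above (one mode, several modes, no mode, categorical data) can be illustrated with a small Python sketch. The function name `modes` and the sample inputs are made up for this example, and the code adopts one possible convention: if no value occurs more than once, the data set is reported as having no mode.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                      # assumption: no repeated value means no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 2, 3]))                        # [2]
print(modes([1, 1, 2, 2, 3]))                     # [1, 2] (bimodal)
print(modes([1, 2, 3]))                           # [] (no mode)
print(modes(["Pepsi", "Sprite", "Pepsi"]))        # ['Pepsi'] (categorical data)
```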
