
STAT 201 Introduction to Business Statistics

Class 2: Describing Data by Graphs, Charts, Tables

Takeaways from Class 1

Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data. It gives managers a better understanding of the business environment and enables them to make more informed and better decisions.
Descriptive Statistics: Summaries of data, which may be graphical, tabular, or numerical
Inferential Statistics: Procedures that help draw conclusions about a set of data from a subset of that data

Population: the set of all items or individuals of interest
Parameter: a summary measure computed to describe a characteristic of the population
Sample: a subset of the population
Statistic: a summary measure computed to describe a characteristic of the sample drawn from the population


Types of Data:
Qualitative (Categorical): Data grouped by specific categories (e.g., eye color, marital status), generally measured on one of two scales:
    Nominal Scale: only labels
    Ordinal Scale: can be ordered
Quantitative (Numerical): Data grouped by numerical values (e.g., number of children, weight), generally measured on one of two scales:
    Interval Scale: meaningful intervals in addition to ordinal scale
    Ratio Scale: meaningful ratios in addition to interval scale (there is a zero value)
Time Series Data: Collected over several time periods

Today's Focus

Introducing tabular and graphical methods commonly used to summarize both categorical and quantitative data. Tabular and graphical summaries of data are found in:

Annual reports
Newspaper articles
Research studies

Sources: The Economist, RealClearPolitics.com



It is important to understand how these summaries are prepared and how they should be interpreted.

Summarizing Categorical Data


Frequency Distribution:

A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.

Example: Soft Drink Purchases*

Data from a sample of 50 soft drink purchases


Coke Classic, Diet Coke, Pepsi, Diet Coke, Coke Classic,
Coke Classic, Dr. Pepper, Diet Coke, Pepsi, Pepsi,
Coke Classic, Dr. Pepper, Sprite, Coke Classic, Diet Coke,
Coke Classic, Coke Classic, Sprite, Coke Classic, Diet Coke,
Coke Classic, Diet Coke, Coke Classic, Sprite, Pepsi,
Coke Classic, Coke Classic, Coke Classic, Pepsi, Coke Classic,
Sprite, Dr. Pepper, Pepsi, Diet Coke, Pepsi,
Coke Classic, Coke Classic, Coke Classic, Pepsi, Dr. Pepper,
Coke Classic, Diet Coke, Pepsi, Pepsi, Pepsi,
Pepsi, Coke Classic, Dr. Pepper, Pepsi, Sprite

Source: Modern Business Statistics by Anderson, Sweeney, Williams



Summarizing Categorical Data

To develop a frequency distribution, count the number of times each item type appears in the data.

Soft Drink      Frequency
Coke Classic    19
Diet Coke       8
Dr. Pepper      5
Pepsi           13
Sprite          5

We can also use Excel to find the frequencies.
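
The same count can also be sketched outside Excel. The Python snippet below is only an illustration, not part of the course materials. It tallies the 50 purchases from the example with the standard library's collections.Counter. The short aliases (CC, DC, DP, P, S) and the variable names are assumptions introduced here to keep the list readable.

```python
from collections import Counter

# Short aliases for the five soft drinks (introduced here for readability only).
CC, DC, DP, P, S = "Coke Classic", "Diet Coke", "Dr. Pepper", "Pepsi", "Sprite"

# The 50 purchases from the example, in the order they are listed.
purchases = [
    CC, DC, P, DC, CC,
    CC, DP, DC, P, P,
    CC, DP, S, CC, DC,
    CC, CC, S, CC, DC,
    CC, DC, CC, S, P,
    CC, CC, CC, P, CC,
    S, DP, P, DC, P,
    CC, CC, CC, P, DP,
    CC, DC, P, P, P,
    P, CC, DP, P, S,
]

# Counter maps each distinct item to the number of times it appears.
frequency = Counter(purchases)

# Print the frequency distribution in alphabetical order of soft drink.
for drink in sorted(frequency):
    print(f"{drink:<14}{frequency[drink]:>3}")
```

The counts agree with the table: 19, 8, 5, 13, and 5, for a total of 50.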

What can we say by looking at this data? Who is the market leader?

Coke Classic is the market leader
Pepsi is second
Diet Coke is third


The summary provides more insight than the raw data!


Relative frequency of a class equals the fraction or proportion of items belonging to that class. For a data set with n observations:

Relative frequency of a class = Frequency of class / n

Percent frequency of a class is its relative frequency multiplied by 100.

Soft Drink      Frequency   Relative Frequency   Percent Frequency
Coke Classic    19          0.38                 38
Diet Coke       8           0.16                 16
Dr. Pepper      5           0.10                 10
Pepsi           13          0.26                 26
Sprite          5           0.10                 10
Total           50          1.00                 100
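
As a rough illustration of the calculation, the sketch below derives the relative and percent frequencies from the soft drink frequencies. The variable names are assumptions made for this example.

```python
# Frequencies from the soft drink example.
frequency = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi": 13,
    "Sprite": 5,
}

# n is the total number of observations.
n = sum(frequency.values())

# Relative frequency of a class = frequency of the class / n.
relative = {drink: count / n for drink, count in frequency.items()}

# Percent frequency = relative frequency * 100.
percent = {drink: 100 * rel for drink, rel in relative.items()}

for drink in frequency:
    print(f"{drink:<14}{frequency[drink]:>4}{relative[drink]:>8.2f}{percent[drink]:>6.0f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on the arithmetic.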


Summarizing Quantitative Data

A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.
Three steps are necessary to define the classes for a frequency distribution with quantitative data:

1. Determine the number of non-overlapping classes
2. Determine the width of each class
3. Determine the class limits

Number of classes: Classes are formed by specifying ranges that will be used to group the data.
As a general rule, it is recommended to use between 5 and 20 classes. As a guideline, the number of classes k may be chosen as the smallest k such that 2^k > n, where n is the number of observations.
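
The 2^k > n guideline can be sketched as a small search for the smallest such k. The function name suggested_number_of_classes is hypothetical and introduced only for this illustration.

```python
def suggested_number_of_classes(n):
    """Return the smallest k with 2**k > n (the 2^k > n guideline)."""
    k = 1
    # Keep increasing k until 2**k exceeds the number of observations.
    while 2 ** k <= n:
        k += 1
    return k


# With 300 observations: 2**8 = 256 is not greater than 300, 2**9 = 512 is.
print(suggested_number_of_classes(300))
# With 50 observations: 2**5 = 32 is not greater than 50, 2**6 = 64 is.
print(suggested_number_of_classes(50))
```

The result is only a starting point. The final number of classes may still be adjusted for convenience.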

Too many classes:

May yield a very jagged distribution with gaps from empty classes
Can give a poor indication of how frequency varies across classes

[Figure: histogram with too many classes. Horizontal axis: Temperature (classes from 12 to 60 in steps of 4); vertical axis: Frequency.]

Too few classes:

May compress variation too much and yield a blocky distribution
Can obscure important patterns of variation

[Figure: histogram with too few classes. Horizontal axis: Temperature (classes 0, 30, 60); vertical axis: Frequency.]


Number of classes:

Number of Data Points   Number of Classes
under 50                5 - 7
50 - 100                6 - 10
100 - 250               7 - 12
over 250                10 - 20

Class widths can typically be reduced as the number of observations increases
Distributions with numerous observations are more likely to be smooth and have gaps filled since data are plentiful

Width of classes:

If possible, use the same width for each class.

Range of data = Largest data point - Smallest data point

Approximate class width = Range / (Number of classes)
Generally round to a convenient number


Class Limits:

Class limits must be chosen so that each data item belongs to one and only one class.
Lower class limit: the smallest possible data value assigned to the class.
Upper class limit: the largest possible data value assigned to the class.
Class midpoint: the value halfway between the lower and upper class limits.


Important Considerations for Selecting Classes:

Must be mutually exclusive
Must be all-inclusive
Categories (classes) should be of equal width
Avoid empty categories

Example: Credit Card Balances


(See Class 02 Example Credit Card Balances.xls)
Credit Card Balance Data (300 observations)

$1,018.00, $1,021.00, $1,081.00, $300.00, $769.00, $486.00, $716.00, $1,013.00, $440.00, $1,246.00, $1,254.00, ..., $1,115.00.

Determining the number of non-overlapping classes:

The guideline 2^k > n suggests 9 classes (2^8 = 256 < 300, 2^9 = 512 > 300).

Minimum data value: $99.00
Maximum data value: $1493.00
Range: $1493.00 - $99.00 = $1394.00

Approximate class size:

Approximate class size = Range / (Number of classes) = 1394 / 9 = 154.89
For convenience and better representation, we pick a class size of 200.00.
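
The arithmetic above can be reproduced in a few lines. One caveat: the slides only say the width is rounded to a convenient number. Rounding up to the next multiple of 100 is an assumption made here as one possible rule, and it happens to give the 200 chosen in the example.

```python
import math

# Values taken from the credit card balance example.
minimum = 99.00
maximum = 1493.00
number_of_classes = 9

# Range of data = largest data point - smallest data point.
data_range = maximum - minimum

# Approximate class width = range / number of classes.
approx_width = data_range / number_of_classes

# Assumption for illustration: "convenient" means the next multiple of 100.
convenient_width = math.ceil(approx_width / 100) * 100

print(f"Range: {data_range:.2f}")
print(f"Approximate class width: {approx_width:.2f}")
print(f"Convenient class width: {convenient_width}")
```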

Determining the class limits:

Class Lower Limit   Class Upper Limit
0                   199.99
200                 399.99
400                 599.99
600                 799.99
800                 999.99
1000                1199.99
1200                1399.99
1400                1599.99

The 9th class ($1600 to $1799.99) is omitted because no data fall in it (the maximum data value is $1493). So, we use 8 classes in total.
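
A minimal sketch of how such class limits could be generated and used to tally data is shown below. It assumes classes start at 0 with a width of 200, as in the table. Only the 12 balances actually printed in the example are used, since the full 300-observation data set is in the spreadsheet and not reproduced here. The counts are therefore not the real frequency distribution.

```python
import math

width = 200
maximum = 1493.00  # largest balance in the full data set, from the example

# Number of classes needed, starting at 0, so the maximum falls in the last class.
number_of_classes = math.floor(maximum / width) + 1

# Lower limits 0, 200, ..., and upper limits one cent below the next lower limit.
lower_limits = [i * width for i in range(number_of_classes)]
classes = [(lower, round(lower + width - 0.01, 2)) for lower in lower_limits]

# Only the balances visible in the example text (not the full 300 observations).
visible_balances = [1018.00, 1021.00, 1081.00, 300.00, 769.00, 486.00,
                    716.00, 1013.00, 440.00, 1246.00, 1254.00, 1115.00]

# Each balance belongs to exactly one class: the one with index balance // width.
counts = [0] * number_of_classes
for balance in visible_balances:
    counts[int(balance // width)] += 1

for (lower, upper), count in zip(classes, counts):
    print(f"{lower:>6} to {upper:>8.2f}: {count}")
```

Because the class index is computed by integer division, every value lands in one and only one class, which is the mutually exclusive and all-inclusive requirement stated earlier.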

The example can be worked in Excel using an array formula or using PhStat2.

Summarizing Quantitative Data

Histogram: A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis.
A rectangle is drawn above each class, with its base determined by the class limits on the horizontal axis and its height corresponding to the frequency, relative frequency, or percent frequency.
Adjacent rectangles of a histogram touch one another. (Unlike a bar graph, there is no separation between the rectangles.)
Histograms provide information about the shape or form of a distribution.

[Figure: Histogram. Horizontal axis: Upper Class Limit (199.99 through 1599.99); left vertical axis: Frequency; right vertical axis: Cumulative %.]

Cumulative Distributions (Ogives): A cumulative distribution shows the number of data items with values less than or equal to the upper class limit of each class.
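
A cumulative distribution can be built from a frequency distribution by taking running totals. In the sketch below the upper class limits follow the credit card example, but the class frequencies are hypothetical numbers made up for illustration. They are not the actual counts from the data file.

```python
from itertools import accumulate

# Upper class limits as in the credit card example.
upper_limits = [199.99, 399.99, 599.99, 799.99, 999.99, 1199.99, 1399.99, 1599.99]

# HYPOTHETICAL frequencies (chosen only to sum to 300), not the real data.
frequencies = [10, 35, 60, 80, 55, 35, 20, 5]

# Running total: number of items less than or equal to each upper class limit.
cumulative = list(accumulate(frequencies))

# Cumulative percent frequency relative to the total number of observations.
total = cumulative[-1]
cumulative_percent = [100 * c / total for c in cumulative]

for limit, cum, pct in zip(upper_limits, cumulative, cumulative_percent):
    print(f"<= {limit:>8.2f}: {cum:>4} ({pct:.1f}%)")
```

The last cumulative frequency always equals the number of observations, and the last cumulative percent is 100.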
