Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data. It gives managers a better understanding of the business environment and enables them to make more informed and better decisions.
Descriptive Statistics: Summaries of data, which may be graphical, tabular, or numerical.
Inferential Statistics: Procedures that help draw conclusions about a set of data from a subset of that data.
Population: the set of all items or individuals of interest. A parameter is a summary measure computed to describe a characteristic of the population.
Sample: a subset of the population. A statistic is a summary measure computed to describe a characteristic of the sample drawn from the population.
Quantitative (Numerical): Data grouped by numerical values (e.g., number of children, weight), generally
Interval Scale: meaningful intervals in addition to the properties of the ordinal scale.
Ratio Scale: meaningful ratios in addition to the properties of the interval scale (there is a zero value).
Today's Focus
Introducing tabular and graphical methods commonly used to summarize both categorical and quantitative data. Tabular and graphical summaries of data are found in:
It is important to understand how these summaries are prepared and how they should be interpreted.
A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.
To develop a frequency distribution, count the number of times each item type appears in data.
Soft Drink      Frequency
Coke Classic    19
Diet Coke       8
Dr. Pepper      5
Pepsi           13
Sprite          5
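The counting step can be sketched in Python. This is a minimal illustration, assuming the raw data are a list of 50 purchases; the list below is hypothetical and is rebuilt from the soft drink counts above, since the original purchase order is not given.

```python
from collections import Counter

# Hypothetical raw data: 50 purchases rebuilt from the class counts
# (the original order of the purchases is not known).
purchases = (
    ["Coke Classic"] * 19
    + ["Diet Coke"] * 8
    + ["Dr. Pepper"] * 5
    + ["Pepsi"] * 13
    + ["Sprite"] * 5
)

# Count how many times each soft drink appears in the data.
frequency = Counter(purchases)

# List the classes from most to least frequent.
for drink, count in frequency.most_common():
    print(f"{drink:<12} {count:>3}")
```

Sorting by count with `most_common()` puts the most frequent class on the first row.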
What can we say by looking at this data? Who is the market leader?
Coke Classic is the market leader, Pepsi is second, and Diet Coke is third.
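Relative frequency of a class equals the fraction or proportion of items belonging to a class. For a data set with n observations:
Relative frequency of a class = (frequency of the class) / n
Percent frequency of a class is its relative frequency multiplied by 100.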
Soft Drink      Frequency   Relative Frequency   Percent Frequency
Coke Classic    19          0.38                 38
Diet Coke       8           0.16                 16
Dr. Pepper      5           0.10                 10
Pepsi           13          0.26                 26
Sprite          5           0.10                 10
Total           50          1.00                 100
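As a rough sketch, the relative and percent frequencies can be computed from the class counts in a few lines of Python. The variable names are illustrative only.

```python
# Class frequencies from the soft drink example.
frequencies = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi": 13,
    "Sprite": 5,
}

# n is the total number of observations.
n = sum(frequencies.values())

# Relative frequency = class frequency / n; percent frequency = relative * 100.
relative = {drink: count / n for drink, count in frequencies.items()}
percent = {drink: 100 * rel for drink, rel in relative.items()}

for drink in frequencies:
    print(f"{drink:<12} {frequencies[drink]:>3} {relative[drink]:>6.2f} {percent[drink]:>5.0f}")
```

The relative frequencies sum to 1 and the percent frequencies to 100, which is a quick check on the table.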
For quantitative data, a frequency distribution is likewise a tabular summary showing the number (frequency) of items in each of several non-overlapping classes.
Three steps necessary to define the classes for a frequency distribution with quantitative data:
Determine the number of non-overlapping classes
Determine the width of each class
Determine the class limits
Number of classes: Classes are formed by specifying ranges that will be used to group the data.
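As a general rule, it is recommended to use between 5 and 20 classes. As a guideline, one may choose the smallest k such that 2^k > n, where k is the number of classes and n is the number of observations.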
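The 2^k > n guideline can be sketched as a small Python function. The function name is hypothetical, and the result is only a starting point to be weighed against the 5 to 20 classes rule of thumb.

```python
def suggested_number_of_classes(n: int) -> int:
    """Return the smallest k with 2**k > n (a guideline, not a strict rule)."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = 1
    # Increase k until 2**k first exceeds n.
    while 2 ** k <= n:
        k += 1
    return k


# With 300 observations: 2**8 = 256 is not enough, 2**9 = 512 is.
print(suggested_number_of_classes(300))
```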
Too many classes (narrow class intervals): may yield a very jagged distribution with gaps from empty classes, and can give a poor indication of how frequency varies across classes.
[Figure: histogram of Frequency versus Temperature with many narrow classes]
Too few classes (wide class intervals): may compress variation too much and yield a blocky distribution, and can obscure important patterns of variation.
[Figure: histogram of Frequency versus Temperature with few wide classes (0, 30, 60, More)]
Suggested number of classes, increasing with the number of data points: 5-7, 6-10, 7-12, 10-20.
Distributions with numerous observations are more likely to be smooth and have gaps filled, since data are plentiful.
Class limits must be chosen so that each data item belongs to one and only one class.
Lower class limit: the smallest possible data value assigned to the class.
Upper class limit: the largest possible data value assigned to the class.
Class midpoint: the value halfway between the lower and upper class limits.
Classes must be all-inclusive.
Classes should be of equal width.
Avoid empty classes.
$1,018.00, $1,021.00, $1,081.00, $300.00, $769.00, $486.00, $716.00, $1,013.00, $440.00, $1,246.00, $1,254.00, , $1,115.00.
Determining the number of classes: the guideline 2^k > n suggests 9 classes for n = 300 observations (2^8 = 256 < 300, 2^9 = 512 > 300).
Minimum data value: $99.00
Maximum data value: $1,493.00
Range: $1,493.00 - $99.00 = $1,394.00
Approximate class size (width) = Range / (number of classes) = 1394 / 9 = 154.89. For convenience and a cleaner presentation, a class width of $200.00 is used.
The 9th class ($1,600.00 to $1,799.99) is omitted because no data fall in it (the maximum data value is $1,493.00), so 8 classes are used in total.
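The width and class-limit arithmetic can be sketched as follows. This is a minimal illustration that assumes the first class starts at $0.00 and that values are recorded to the cent; both are assumptions, chosen to match upper class limits of 199.99, 399.99, and so on.

```python
import math

minimum, maximum = 99.00, 1493.00
suggested_classes = 9  # from the 2^k > n guideline with n = 300

data_range = maximum - minimum                  # 1394.00
approx_width = data_range / suggested_classes   # about 154.89
width = 200                                     # rounded up to a convenient value

# Assumption: the first class starts at 0. Keep only as many classes
# as are needed to cover the maximum data value.
num_classes = math.floor(maximum / width) + 1

# Each class runs from its lower limit up to one cent below the next lower limit.
classes = [(i * width, (i + 1) * width - 0.01) for i in range(num_classes)]

for lower, upper in classes:
    midpoint = (lower + upper) / 2
    print(f"${lower:>8.2f} to ${upper:>8.2f}   midpoint ${midpoint:>8.2f}")
```

Rounding the width up to 200 is why fewer classes than the guideline's 9 are needed to cover the data.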
Histogram: A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis.
A rectangle is drawn above each class, with its base determined by the class limits on the horizontal axis and its height corresponding to the frequency, relative frequency, or percent frequency.
Adjacent rectangles of a histogram touch one another. Unlike a bar graph, there is no separation between the rectangles. Histograms provide information about the shape or form of a distribution.
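A text-only sketch of the binning behind a histogram is shown below. It assumes classes of width $200 starting at $0 and uses only the twelve values visible in the data listing, not the full data set, so the counts are illustrative.

```python
# The twelve values visible in the data listing (not the full data set).
values = [
    1018.00, 1021.00, 1081.00, 300.00, 769.00, 486.00,
    716.00, 1013.00, 440.00, 1246.00, 1254.00, 1115.00,
]

width = 200       # class width (assumed, classes start at 0)
num_classes = 8   # $0.00 to $1,599.99

# Assign each value to exactly one class by integer division.
counts = [0] * num_classes
for value in values:
    index = int(value // width)
    counts[index] += 1

# One row per class; the bar length is the class frequency.
for i, count in enumerate(counts):
    lower = i * width
    upper = lower + width - 0.01
    print(f"${lower:>7.2f} to ${upper:>7.2f} | {'#' * count}")
```

Integer division places each value in one and only one class, which matches the requirement on class limits.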
[Figure: histogram of Frequency by Upper Class Limit (199.99 to 1599.99), with a Cumulative % line on a second axis]
Cumulative Distributions (Ogives): a cumulative distribution shows the number of data items with values less than or equal to the upper class limit of each class.
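Cumulative frequencies are running totals of the class frequencies, which can be sketched with `itertools.accumulate`. The frequencies below are illustrative: they come from the twelve values visible in the data listing, binned into classes of width $200, not from the full data set.

```python
from itertools import accumulate

# Upper class limits for 8 classes of width 200 starting at 0 (assumed).
upper_limits = [199.99, 399.99, 599.99, 799.99, 999.99, 1199.99, 1399.99, 1599.99]

# Illustrative class frequencies (twelve visible values only).
frequencies = [0, 1, 2, 2, 0, 5, 2, 0]

# Running total: items with values less than or equal to each upper class limit.
cumulative = list(accumulate(frequencies))

# Cumulative percent, as plotted on the second axis of an ogive.
total = cumulative[-1]
cumulative_percent = [100 * c / total for c in cumulative]

for limit, cum, pct in zip(upper_limits, cumulative, cumulative_percent):
    print(f"<= {limit:>8.2f}   {cum:>3}   {pct:>6.1f}%")
```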