Information
Sample
Subset
Definitions
A variable is some characteristic of a population
or sample.
E.g. student grades.
Numerical Data
Real numbers, e.g. heights, weights, prices,
waiting time at a medical practice, etc.
Also referred to as quantitative or interval.
Arithmetic operations can be performed on
numerical data, thus it is meaningful to talk
about 2*Height, or Price + $1, and so on.
Nominal Data
The values of nominal data are categories.
E.g. responses to questions about marital status are
categories, coded as: Single = 1, Married = 2,
Divorced = 3, Widowed = 4
Ordinal Data
Ordinal data appear to be categorical in
nature, but their values have an order; a
ranking to them:
E.g. University course evaluation system: poor = 1,
fair = 2, good = 3, very good = 4, excellent = 5
Examples of nominal data
person: 1, 2, 3, ...        married: yes, no, no, ...
computer: 1, 2, 3, 4, ...   brand: IBM, Dell, Compaq, IBM, ...
With nominal data, all we can calculate is the
proportion of data that falls into each category.
Brand        IBM    Dell   Compaq   other   total
Frequency    25     11     8        6       50
Proportion   50%    22%    16%      12%
Examples of ordinal data
exam grade: HD, D, C, P, F
Food quality: Excellent, Good, Satisfactory, Poor
With ordinal data, all we can use is computations
involving the ordering process.
Numerical values shown alongside: weight gain +10, +5, ...;
75 000, 68 000, ...; 55, 42, ...
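A minimal sketch of the one calculation that is valid for nominal data: turning category counts into proportions. The brand counts are those shown above (IBM 25, Dell 11, Compaq 8, other 6); the function name `proportions` is an assumption made for illustration.

```python
from collections import Counter

def proportions(observations):
    """Return the share of observations that falls into each category."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Rebuild the 50 raw observations from the brand counts on the slide.
brands = ["IBM"] * 25 + ["Dell"] * 11 + ["Compaq"] * 8 + ["other"] * 6
shares = proportions(brands)
for brand, share in shares.items():
    print(f"{brand}: {share:.0%}")
```

Note that the category labels are only counted, never added or averaged, which is why the same code works whether the brands are stored as names or as arbitrary codes.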
Hierarchy of Data
Numerical
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.
Nominal
Values are the arbitrary numbers that represent
categories.
Only calculations based on the frequencies of occurrence
are valid.
Data may not be treated as ordinal or numerical.
Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.
Data may be treated as nominal but not as numerical.
Introduction ...
The methods presented apply to both
the entire population, and
a sample selected from the population.
Graphical techniques
for nominal data
The graphical presentations shown here
are used primarily for nominal data.
These graphical tools are most appropriate
when the raw data can be naturally
categorised in a meaningful manner.
Bar charts
The bar chart is mainly used for nominal
data.
A bar chart graphically represents the
frequency of each category as a bar rising
vertically from the horizontal axis.
The height of each bar is proportional to the
frequency of the corresponding category.
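As a rough plain-text sketch of the idea that bar length is proportional to frequency, the snippet below draws one bar per category. The bars run horizontally only because that is easier in text; the category names, the counts, and the function name `text_bar_chart` are hypothetical.

```python
def text_bar_chart(frequencies):
    """Return one text line per category; bar length equals the frequency."""
    return [f"{category:<8}{'#' * count} {count}"
            for category, count in frequencies.items()]

# Hypothetical category counts, used only to show the layout.
chart = text_bar_chart({"A": 3, "B": 1, "C": 5})
print("\n".join(chart))
```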
Pie charts
Another useful chart to present nominal
data is the pie chart.
The pie chart is a very popular tool used to
represent the proportions of appearance for
nominal data.
A pie chart is a circle that is subdivided into
slices whose areas are proportional to the
frequencies (or relative frequencies),
thereby displaying the proportion of
occurrences of each category.
Example 2.1
To determine the approximate market share of
various women's magazines in New Zealand, a
women's magazine readership survey was
conducted using a sample of 200 readers.
Data was collected and the count of the
occurrences (frequencies) was recorded for each
magazine.
The frequencies were presented in a bar chart.
Then the frequencies were converted to
proportions and the results were presented in a
pie chart.
Example 2.1
1 = Australian Women's Weekly (NZ Edition); 2 = Next;
3 = NZ New Idea; 4 = NZ Woman's Day; 5 = NZ Women's
Weekly; and 6 = That's Life.
(10/100)(360°) = 36°
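The slice-angle arithmetic can be sketched as follows: each category receives a share of the 360 degrees equal to its relative frequency. The category names and counts are hypothetical, chosen so that one category holds 10 of 100 observations; `slice_angles` is an illustrative name.

```python
def slice_angles(frequencies):
    """Map each category to its pie-slice angle in degrees."""
    total = sum(frequencies.values())
    # Multiply before dividing to keep the arithmetic exact for whole counts.
    return {category: count * 360 / total
            for category, count in frequencies.items()}

# Hypothetical counts: category "A" holds 10 of the 100 observations.
angles = slice_angles({"A": 10, "B": 90})
print(angles["A"])
```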
Example 2.5
Providing information concerning the
monthly bills of new subscribers in the
first month after signing on with a
telephone company
collect data
prepare a frequency distribution
draw a histogram.
Example 2.5
As part of a larger study, a long-distance company
wanted to acquire information about the monthly
bills of new subscribers in the first month after
signing with the company. The company's
marketing manager conducted a survey of 200
new residential subscribers wherein the first
month's bills were recorded. These data are stored
in file XM02-05. The general manager planned to
present his findings to senior executives. What
information can be extracted from these data?
Example 2.5
In Example 2.1 we created a frequency distribution
of the 6 categories. In this example we also create
a frequency distribution by counting the number of
observations that fall into a series of intervals,
called classes.
The justification for the classes chosen below will
be discussed later.
Example 2.5
We have chosen eight classes defined in such a way
that each observation falls into one and only one
class. These classes are defined as follows:
Classes
[Table: eight classes, each described as "Amounts that are ..."]
Example 2.4
Interpret
(18 + 28 + 14 = 60) ÷ 200 = 30%
Building a Histogram
1) Collect the data
2) Create a frequency distribution for the data
How?
a) Determine the number of classes to use
How?
Refer to Table 2.10:
With 200 observations, we should have
between 7 and 10 classes.
Alternatively, we could use Sturges' formula:
Number of class intervals = 1 + 3.3 log10(n)
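As a quick check of Sturges' formula for this example, the sketch below evaluates it for n = 200. It assumes the logarithm is base 10, which is the usual reading when the constant is 3.3; the function name `sturges_classes` is illustrative.

```python
import math

def sturges_classes(n):
    """Suggested number of class intervals: 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

k = sturges_classes(200)
print(round(k, 2))  # about 8.59, so 8 or 9 classes
```

The result sits inside the 7 to 10 range suggested for 200 observations, so the two rules of thumb agree here.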
Building a Histogram
Class width
It is generally best to use equal class
widths, but sometimes unequal class
widths are called for.
Unequal class widths are used when the
frequency associated with some classes is
too low. Then,
several classes are combined together to form
a wider and more populated class
it is possible to form an open-ended class at
the higher or lower end of the histogram.
Building a Histogram
1) Collect the data
2) Create a frequency distribution for the data
How?
a) Determine the number of classes to use. [8]
b) Determine how wide to make each class
(assuming equal class width). How?
Look at the range of the data, that is,
Range = Largest observation − Smallest observation
Range = $119.63 − $0 = $119.63
Then each class width becomes:
Range ÷ (# classes) = 119.63 ÷ 8 ≈ 15
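The two steps above (choose a class width, then count observations per class) might be sketched like this. The bill amounts are hypothetical and are not the XM02-05 data; the boundary rule, in which a value exactly on a class limit is counted in the lower class, is also an assumption.

```python
import math

def class_width(largest, smallest, n_classes):
    """Raw class width: the range divided by the number of classes."""
    return (largest - smallest) / n_classes

def frequency_distribution(data, lower, width, n_classes):
    """Count observations per class.

    Class i covers values above lower + i*width up to and including
    lower + (i+1)*width; the minimum value is placed in the first class.
    """
    counts = [0] * n_classes
    for x in data:
        index = math.ceil((x - lower) / width) - 1
        # Clamp so the minimum lands in class 0 and nothing overflows the last class.
        index = min(max(index, 0), n_classes - 1)
        counts[index] += 1
    return counts

raw_width = class_width(119.63, 0.0, 8)   # 14.95..., rounded up to 15
width = 15
bills = [0.0, 9.5, 15.0, 15.01, 42.19, 104.8, 119.63]  # hypothetical bills
counts = frequency_distribution(bills, 0.0, width, 8)
print(round(raw_width, 2), counts)
```

Rounding the raw width up to a convenient number keeps the class limits easy to read while still covering the full range with eight classes.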
Shapes of Histograms
Symmetry
A histogram is said to be symmetric if, when we
draw a vertical line down the center of the
histogram, the two sides are identical in shape
and size:
Shapes of Histograms
Skewness
A skewed histogram is one with a long tail
extending to either the right or the left:
Positively skewed (long tail to the right)
Negatively skewed (long tail to the left)
Shapes of Histograms
Modality
A unimodal histogram is one with a single peak, while
a bimodal histogram is one with two peaks:
Shapes of Histograms
Bell Shape
A special type of symmetric unimodal
histogram is one that is bell shaped:
Histogram comparison
Compare and contrast the following histograms based
on data from Example 2.7: The marks from the computer ...
Unimodal vs. bimodal
Marks (computer course)
Stem
Leaf
2
000001111233333334455555667889999
0000111112344666778999
001335589
124445589
33566
3458
022224556789
334457889999
00112222233344555999
Relative frequency
It is often preferable to show the
relative frequency (proportion) of
observations falling into each class,
rather than the frequency itself.
Class relative frequency=
Class frequency
Total number of observations
Relative frequency
Relative frequencies should be used
when
the population relative frequencies are
studied
comparing two or more histograms
the number of observations of the samples
studied are different.
Ogive
(pronounced Oh-jive) is a graph of
a cumulative frequency distribution.
We create an ogive in three steps
First, from the frequency distribution created
earlier, calculate relative frequencies:
Class relative frequency=
Class frequency
Total number of observations
Relative Frequencies
For example, we had 71 observations in our
first class (telephone bills from $0.00 to
$15.00). Thus, the relative frequency for this
class is 71 ÷ 200 (the total number of phone bills) =
0.355 (or 35.5%).
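A small sketch of the same calculation for every class at once. Of the eight class frequencies used here, 71, 18, 28 and 14 appear on the slides and 37 follows from the stated relative frequency of .185; the remaining three counts (13, 9, 10) are assumed so that the total is 200.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [count / total for count in frequencies]

# Class frequencies; the values 13, 9 and 10 are assumptions (see lead-in).
class_frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
rel = relative_frequencies(class_frequencies)
print([round(r, 3) for r in rel])
```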
Ogive
Is a graph of a cumulative frequency
distribution.
We create an ogive in three steps
1) Calculate relative frequencies.
2) Calculate cumulative relative
frequencies by adding the current
class relative frequency to the previous
class cumulative relative frequency.
(For the first class, its cumulative relative
frequency is just its relative frequency.)
First class: .355
Next class: .355 + .185 = .540
...
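The running-total step can be sketched with `itertools.accumulate`, which adds each class relative frequency to the cumulative value of the class before it. The frequencies are partly assumed: 71, 18, 28 and 14 come from the slides, 37 follows from the relative frequency .185, and 13, 9 and 10 are filled in so that the total is 200.

```python
from itertools import accumulate

# Class frequencies; the values 13, 9 and 10 are assumptions (see lead-in).
class_frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
total = sum(class_frequencies)

# Step 1: relative frequencies.
relative = [count / total for count in class_frequencies]

# Step 2: cumulative relative frequencies (running sum of step 1).
cumulative = list(accumulate(relative))

# Step 3 would plot these values against the upper limit of each class.
print([round(c, 3) for c in cumulative])
```

The last cumulative value should be 1 (up to floating-point rounding), which is a handy check that no class was missed.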
Ogive
Is a graph of a cumulative frequency
distribution.
1) Calculate relative frequencies.
2) Calculate cumulative relative
frequencies.
3) Graph the cumulative relative
frequencies
[Figure: Example 2.5 ogive, annotated at around $35]
Line Chart
Plot the frequency of a category above
the point on the horizontal axis
representing that category.
Use line charts when the categories are
points in time.
Line charts are particularly useful when
the trend over time is to be
emphasised.
Line Chart
Figure 2.22 Line chart showing change in Australian exports over time
Line Chart
Australian exports have had a slow but
steady increase from 1992 to 2004.
After 2004, Australian exports have
been increasing steadily at a much
higher rate.
Example 2.8
In a major Australian city there are four
competing newspapers: N1, N2, N3 and N4.
To help design advertising campaigns, the
advertising managers of the newspapers
need to know which segments of the
newspaper market are reading their papers.
A survey was conducted to analyse the
relationship between newspapers read and
occupation.
Example 2.8
A sample of newspaper readers was
asked to report which newspaper they
read (N1, N2, N3 or N4) and to indicate
whether they were a blue-collar worker (1),
a white-collar worker (2), or a professional
(3).
The responses are stored in file XM02-08.
Example 2.8
By counting the number of times each of
the 12 combinations occurs, we produced
Table 2.16.
Example 2.8
If occupation and newspaper are related,
Example 2.8
Interpretation: The relative frequencies in the rows 2
and 3 are similar, but there are large differences
between rows 1 and 2, and between rows 1 and 3.
Row 1: Blue collar (1); Row 2: White collar (2);
Row 3: Professional (3)
This tells us that blue collar workers tend to read
different newspapers from both white collar
workers and professionals and that white collar
and professionals are quite similar in their
newspaper choice.
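A minimal sketch of how such a cross-classification table and its row relative frequencies could be computed. The responses below are hypothetical and are NOT the XM02-08 data; the function names are illustrative.

```python
from collections import Counter

def cross_classify(pairs):
    """Count how often each (occupation, newspaper) combination occurs."""
    return Counter(pairs)

def row_relative_frequencies(table, occupations, newspapers):
    """For each occupation, the share of its readers choosing each newspaper."""
    rows = {}
    for occupation in occupations:
        row_total = sum(table[(occupation, paper)] for paper in newspapers)
        rows[occupation] = [table[(occupation, paper)] / row_total
                            for paper in newspapers]
    return rows

# Hypothetical responses: (occupation code, newspaper read).
responses = [(1, "N1"), (1, "N1"), (1, "N3"), (1, "N4"),
             (2, "N2"), (2, "N2"), (2, "N1"), (2, "N4"),
             (3, "N2"), (3, "N2"), (3, "N3"), (3, "N1")]

table = cross_classify(responses)
rows = row_relative_frequencies(table, [1, 2, 3], ["N1", "N2", "N3", "N4"])
for occupation, shares in rows.items():
    print(occupation, shares)
```

Comparing rows rather than raw counts matters because the occupations may have different numbers of respondents.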
Example 2.8
Use the data from the cross-classification table to create
bar charts.
For example, Professionals (3) tend to read newspaper N2
more than twice as often as newspaper N3.
Example 2.9
A small-business owner wants to assess the
effects of advertising on sales levels.
Paired observation data were collected.
Each pair consisted of monthly advertising
expenditure and monthly sales levels.
Scatter diagram
A scatter diagram can describe the
relationship between advertising
expenditure and sales.
Advert  Sales
1       30
3       40
5       40
4       50
2       35
5       50
3       35
2       25
[Scatter diagram: Sales (vertical axis, 0 to 60) against Advertising Expenditure (horizontal axis, 0 to 5)]
Advertising expenditure and sales appear to have a linear relationship.
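As a rough plain-text stand-in for the scatter diagram, the sketch below pairs the eight advertising and sales observations from Example 2.9 and marks each pair on a small grid. The function name `text_scatter` and the grid layout are assumptions made for illustration.

```python
advertising = [1, 3, 5, 4, 2, 5, 3, 2]
sales = [30, 40, 40, 50, 35, 50, 35, 25]

def text_scatter(xs, ys, x_values, y_values):
    """Return text rows, largest y first, with '*' wherever an (x, y) pair occurs."""
    points = set(zip(xs, ys))
    lines = []
    for y in sorted(y_values, reverse=True):
        marks = "".join(" *" if (x, y) in points else " ." for x in x_values)
        lines.append(f"{y:>3} |{marks}")
    return lines

# Columns are advertising levels 1 to 5; rows are sales levels 50 down to 25.
plot = text_scatter(advertising, sales, range(1, 6), range(25, 55, 5))
print("\n".join(plot))
```

The marks drift from the lower left to the upper right, which is the pattern the slide describes as a linear relationship.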
Chapter-Opening Example
WERE OIL COMPANIES GOUGING CUSTOMERS 1999–2006?: SOLUTION
Interpreting the results:
The scatter diagram reveals that the two
prices are strongly related linearly.
As the oil price increases, petrol price also
increases. When the price of oil was below
A$85, the relationship between the two
variables was stronger than when the price of
oil exceeded A$85.
Summary I
Factors That Identify When to Use Frequency and Relative
Frequency Tables, Bar and Pie Charts
1. Objective: Describe a single set of data.
2. Data type: Nominal.
Factors That Identify When to Use a Histogram, Ogive, or Stem-and-Leaf Display
1. Objective: Describe a single set of data.
2. Data type: Interval.
Factors that Identify When to Use a Cross-classification Table
1. Objective: Describe the relationship between two variables.
2. Data type: Nominal.
Factors that Identify When to Use a Scatter Diagram
1. Objective: Describe the relationship between two variables.
2. Data type: Interval.
Summary II
                                      Numerical data     Nominal data
Single set of data                    Histogram          Frequency and relative frequency tables, bar and pie charts
Relationship between two variables    Scatter diagram    Cross-classification table, bar charts
Typical patterns
Positive linear relationship
No relationship