Problems & Considerations in FD

PROBLEMS & CONSIDERATIONS
IN FREQUENCY DISTRIBUTION
SUBMITTED TO: Mr. Dinesh Dhankhar, Asst. Professor
SUBMITTED BY: Aarti, Ph.D. - 09

DEPARTMENT OF TOURISM & HOTEL MANAGEMENT, KUK
FREQUENCY DISTRIBUTION
FREQUENCY

It is the number of times a particular value of the variable
occurs.
Usually abbreviated as f.

FREQUENCY DISTRIBUTION

A tabular organization of statistical data, where each piece of
data is assigned its corresponding frequency.

Used for both qualitative and quantitative data.

One variable is considered at a time.

Indicates the shape of the empirical distribution of the variable.

TYPES OF FREQUENCIES
Absolute Frequency
The absolute frequency is the number of times that a certain
value appears in a statistical study.
It is denoted by fi.

The sum of the absolute frequencies is equal to the total number
of data, which is denoted by N.

This sum is commonly denoted by the Greek letter Σ (capital
sigma), which represents 'sum'.
Cumulative Frequency

It is the sum of the absolute frequencies
of all values less than or equal to the
value considered.

It is denoted by Fi.
Relative Frequency

The quotient between the absolute frequency of a
certain value and the total number of data.

Expressed in terms of percentages or fractions.

Denoted by ni.

The sum of the relative frequency is equal to 1.
Example

A city has recorded the
following daily maximum
temperatures during the
month:
32, 31, 28, 29, 33, 32,
31, 30, 31, 31, 27, 28,
29, 30, 32, 31, 31, 30,
30, 29, 29, 30, 30, 31,
30, 31, 34, 33, 33, 29,
29.

x_i    f_i    F_i    N_i

27     1      1      0.032
28     2      3      0.097
29     6      9      0.290
30     7      16     0.516
31     8      24     0.774
32     3      27     0.871
33     3      30     0.968
34     1      31     1
Total  31

Here N_i = F_i / N is the relative cumulative frequency.
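The table above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name frequency_table and the variable names are illustrative choices, not part of the original slides. It also computes the relative frequency n_i, which the table does not show.

```python
from collections import Counter
from itertools import accumulate

# Daily maximum temperatures from the example (31 observations).
temps = [32, 31, 28, 29, 33, 32, 31, 30, 31, 31, 27, 28,
         29, 30, 32, 31, 31, 30, 30, 29, 29, 30, 30, 31,
         30, 31, 34, 33, 33, 29, 29]


def frequency_table(data):
    """Return rows (x_i, f_i, F_i, n_i, N_i), sorted by value x_i."""
    counts = Counter(data)                    # absolute frequencies f_i
    values = sorted(counts)
    total = len(data)                         # N, the total number of data
    absolute = [counts[v] for v in values]
    cumulative = list(accumulate(absolute))   # cumulative frequencies F_i
    return [(v, f, F, f / total, F / total)
            for v, f, F in zip(values, absolute, cumulative)]


table = frequency_table(temps)
for x, f, F, n, N in table:
    print(f"{x:>3} {f:>3} {F:>3} {n:6.3f} {N:6.3f}")
```

The last cumulative frequency equals N, and the relative frequencies add up to 1, which gives two quick checks on any table built this way.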
REASONS for constructing a FD

To organize the data in a meaningful, intelligent way.

To enable the reader to make comparisons among different data
sets.

To facilitate computational procedures for measures of average
and spread.

To enable the reader to determine the nature and shape of
distribution.

To enable the researcher to draw charts and graphs for
presentation of data.
TYPES of Frequency Distribution
CATEGORICAL

Used when data can be placed in specific categories, such as nominal or
ordinal level data.
Example:
Political affiliations,
Blood types

UNGROUPED

Used when the data take only a few distinct values.
Example:
Number of incoming calls per day over first 20 days

GROUPED

Used when a large amount of data is to be organized.
The values are grouped in intervals (classes) that have the same width (amplitude).
Each class is assigned its corresponding frequency.
Example:
Miles traveled by 50 employees of a company to work every day.

CONSTRUCTION
Classify the data

Decide the number of classes, and use classes of equal width to
divide the data.
Range: highest score - lowest score.

Width: divide the range by the number of class intervals.
Round the interval width in either direction to a convenient
number, even if that means adjusting the number of class
intervals.

Frequencies: count the number of observations that occur in
each interval and enter the count as the frequency of the
interval.
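The construction steps can be sketched as follows. This is an illustration under stated assumptions, not the only way to build classes: the data are integers, the classes are half-open intervals [lower, upper) starting at the lowest value, and the width is taken as range // number of classes + 1 so that the highest value always falls inside the last class. The miles values are hypothetical, since the slides give no raw data.

```python
def grouped_distribution(data, num_classes):
    """Return (lower, upper, frequency) for equal-width classes [lower, upper).

    Assumes integer data. The width is rounded up so that
    num_classes * width is strictly greater than the range.
    """
    low, high = min(data), max(data)
    data_range = high - low                     # highest score - lowest score
    width = data_range // num_classes + 1       # rounded-up class width
    counts = [0] * num_classes
    for x in data:
        counts[(x - low) // width] += 1         # index of the class holding x
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(num_classes)]


# Hypothetical miles travelled to work by 20 employees.
miles = [1, 2, 6, 7, 12, 13, 2, 6, 9, 5,
         18, 7, 3, 15, 15, 4, 17, 1, 14, 5]

classes = grouped_distribution(miles, 6)
for lower, upper, freq in classes:
    print(f"{lower:>2} to under {upper:>2}: {freq}")
```

With these values the range is 17, so six classes get a width of 3, and the class frequencies add up to the 20 observations.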
CONSIDERATIONS

Must:

5-20 classes;

The classes must be exhaustive- enough classes to accommodate all the data.

The class width should be an odd number. This ensures that the midpoint
has the same place value as the data. E.g. width = (H - L) / number of classes,
where H is the highest value and L is the lowest value; round up the result.

The classes must be mutually exclusive- no overlapping class limits.

The classes must be continuous.

The classes must be equal in width (exception: open-ended distributions, which
have no specific beginning value or no specific ending value). This makes it
easier to compare the frequency in one class to another.
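The requirements listed above lend themselves to a mechanical check. The sketch below assumes each class is written as a half-open interval (lower, upper) and that the list is sorted by lower limit; the function name check_classes and the sample intervals are hypothetical.

```python
def check_classes(classes, data):
    """Check a list of half-open classes (lower, upper) against the rules.

    Assumes the classes are sorted by their lower limit.
    """
    widths = {upper - lower for lower, upper in classes}
    neighbours = list(zip(classes, classes[1:]))
    return {
        "count_5_to_20": 5 <= len(classes) <= 20,
        # Exhaustive: every observation falls in some class.
        "exhaustive": all(any(lo <= x < up for lo, up in classes)
                          for x in data),
        # Mutually exclusive: no class starts before the previous one ends.
        "mutually_exclusive": all(a[1] <= b[0] for a, b in neighbours),
        # Continuous: no gaps between neighbouring classes.
        "continuous": all(a[1] == b[0] for a, b in neighbours),
        "equal_width": len(widths) == 1,
    }


good = [(1, 4), (4, 7), (7, 10), (10, 13), (13, 16), (16, 19)]
bad = [(0, 5), (4, 9), (10, 15)]   # too few, overlapping, and with a gap

good_report = check_classes(good, [1, 5, 9, 12, 18])
bad_report = check_classes(bad, [1, 9, 20])
print(good_report)
print(bad_report)
```

The second set of classes fails every rule except equal width, which shows that equal widths alone do not make a valid set of classes.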

Mere SUGGESTIONS:

Avoid open-ended classes if possible such as "75 and over".

Try to use between 5 and 20 classes if possible. If you have
fewer than 5 classes, you're not really breaking up the data, and
if you use more than 20 classes, this will probably be
information overload.

It is usually convenient to use class sizes of 5 or 10, in other
words, to have each class containing 5 or 10 possible values.

It is usually convenient to make the lower limit of the first
category a multiple of the class size.

It is necessary to include scores with zero frequency in order to
draw the frequency polygons correctly.

PROBLEMS
Selection of classes
No hard & fast rules

It depends on a number of factors such as:
The number of items to be classified
The magnitude of the class interval
The accuracy desired
The ease of calculation for further processing of data

Difficult to find out values with zero frequencies

PRESENTATION
BAR GRAPH
Used for discrete variables,
often nominal or ordinal
data.
Bars represent separate
groups, so they should be
separated.

HISTOGRAM

First introduced by Karl Pearson.

It is a graphical representation of a single dataset, which is tallied into classes.

It comprises a series of rectangles whose widths are defined by the
limits of the classes and whose heights are determined by the frequency in
each interval.

Used for continuous variables.

Bars represent segments of a range, so they should touch.

A RELATIVE FREQUENCY HISTOGRAM

A Relative frequency histogram is made by
taking the relative frequencies as heights of
the rectangles.
Don't forget to close the tails to the X axis.

PIE GRAPH
1999 Top Company Employers in Central Florida
Tourism: 35%
Retail: 20%
Health Care: 16%
Others: 29%
Pie graphs are used to show the relationship between the
parts and the whole.
ABSOLUTE FREQUENCY POLYGON
An absolute frequency polygon is drawn exactly like a histogram
except that points are drawn rather than bars.
RELATIVE FREQUENCY POLYGON
The relative frequency polygon is drawn exactly like
the absolute frequency polygon except the Y-axis is
labeled and incremented with relative frequency
rather than absolute frequency.
CUMULATIVE FREQUENCY POLYGON/OGIVES

A cumulative frequency polygon will always be
monotonically increasing.
The line never goes down; it either stays at the same
level or increases.
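The reason is that each point adds a frequency that cannot be negative. A minimal sketch with hypothetical class frequencies, one of them zero, shows the flat segment and the check:

```python
from itertools import accumulate

# Hypothetical class frequencies; the third class is empty.
frequencies = [5, 5, 0, 1, 4, 2]

# Running totals give the heights of the cumulative frequency polygon.
ogive = list(accumulate(frequencies))

# Each height is at least as large as the one before it.
is_non_decreasing = all(a <= b for a, b in zip(ogive, ogive[1:]))

print(ogive)              # [5, 10, 10, 11, 15, 17]
print(is_non_decreasing)  # True
```

The empty class produces the repeated value 10, where the line stays level instead of rising.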
PARETO CHART
It is named after Vilfredo Pareto.
It is a chart that contains both bars and a line graph, where
individual values are represented in descending order by bars,
and the cumulative total is represented by the line.
The purpose of the Pareto chart is to highlight the most important
among a (typically large) set of factors.
Used to show frequencies for nominal variables.
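The numbers behind a Pareto chart can be prepared as in the sketch below: sort the categories by frequency in descending order (the bars) and accumulate their share of the total (the line). The complaint categories and counts are hypothetical, and pareto_rows is an illustrative name.

```python
def pareto_rows(counts):
    """Return (label, frequency, cumulative percent), largest frequency first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for label, freq in ordered:
        running += freq                        # cumulative total for the line
        rows.append((label, freq, 100 * running / total))
    return rows


# Hypothetical guest complaints at a hotel, by category.
complaints = {"Late check-in": 12, "Noise": 30, "Billing": 8, "Cleanliness": 50}

rows = pareto_rows(complaints)
for label, freq, cum_pct in rows:
    print(f"{label:<15} {freq:>3} {cum_pct:6.1f}%")
```

In this made-up example the first two categories already account for 80% of all complaints, which is the kind of concentration the chart is meant to expose.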

TIME SERIES GRAPHS
A line chart, also called a time plot, is a series of data plotted at
various time intervals.
Measuring time along the horizontal axis and the numerical quantity
of interest along the vertical axis yields a point on the graph for
each observation.
Joining points adjacent in time by straight lines produces a time
plot.
Used to show a pattern or trend that occurs over time.

[Figure: Growth Trends in Internet Use by Age, 1997 to 1999. Horizontal axis: April 1997 to July 1999, in quarterly steps from Apr-97 to Jul-99. Vertical axis: millions of adults, 0 to 35. Series: Age 18 to 29, Age 30 to 49, Age 50+.]
STEM and LEAF PLOT

It is an alternative to the histogram.

Data are grouped according to their leading
digits (called the stem) while listing the final
digits (called leaves) separately for each
member of a class.

The leaves are displayed individually in
ascending order after each of the stems.
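A stem-and-leaf plot for two-digit scores can be sketched as below. The sketch assumes non-negative integers, takes the tens digit as the stem and the units digit as the leaf, and lists stems with no leaves so that gaps stay visible. The scores are the seven grade values from this deck's scatter-plot data; stem_and_leaf is an illustrative name.

```python
def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ascending order."""
    first, last = min(values) // 10, max(values) // 10
    stems = {stem: [] for stem in range(first, last + 1)}  # keep empty stems
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return stems


grades = [78, 92, 90, 58, 43, 74, 81]

plot = stem_and_leaf(grades)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Unlike a histogram, the plot keeps every original value readable: stem 7 with leaves 4 and 8 stands for the scores 74 and 78.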

SCATTER PLOT
Absences and Grade
Horizontal axis: Absences (x), 0 to 16. Vertical axis: Grades (y), 40 to 95.

Absences (x)    Grade (y)
8               78
2               92
5               90
12              58
15              43
9               74
6               81

First introduced by Sir Francis Galton
SHAPES
The Normal Distribution
A bell-shaped curve
Called the normal curve or a normal distribution
It is symmetrical
The far left and right portions containing the low-frequency
extreme scores are called the tails of the distribution.
Variations in Normal Distribution:
Mesokurtic = normal distribution
Leptokurtic = thin
Platykurtic = broad or fat

Copyright Houghton Mifflin Company. All rights reserved.
Skewed Distributions
A skewed distribution is not symmetrical; it has only one pronounced tail.
A distribution may be either negatively skewed or positively skewed.
The direction of the skew follows the tail: a tail toward the low scores is
negative skew, and a tail toward the high scores is positive skew.

Negatively Skewed Distribution
A negatively skewed
distribution contains extreme
low scores that have a low
frequency, but does not
contain low frequency
extreme high scores
Positively Skewed Distribution
A positively skewed distribution contains extreme high scores that
have a low frequency, but does not contain low frequency
extreme low scores.
Bimodal Distribution
A bimodal distribution is a symmetrical
distribution containing two distinct humps
Rectangular Distribution
A rectangular distribution is a symmetrical
distribution shaped like a rectangle
