
Welcome to Business Statistics


Lecture 1 & 2
Contents: Basic Statistical Concepts
Summarisation of Data
Frequency Distribution
Measures of Central Tendency
Measures of Dispersion
Relative Dispersion, Skewness.
Shwetank Rewatkar
Using Statistics
Malcolm Forbes, a businessman and keen hot-air balloon enthusiast, lost
his way and landed in the middle of a cornfield. He saw a man running towards
him and had the following conversation:
Forbes: Sir, can you tell me where I am?
Man: Certainly, you are in a basket in a field of corn.
Forbes: Sir, you must be a statistician.
Man: That's amazing. How did you know?
Forbes: Easy. Your information is concise, precise and absolutely
useless!!!
A GOOD STUDENT of Statistics should ensure that the information
resulting from a good statistical analysis is always CONCISE, often
PRECISE and never USELESS.
Types of Variables
A. Qualitative or attribute variable - the characteristic being studied is generally
nonnumeric.
EXAMPLES: gender, religious affiliation, type of automobile owned, state of birth, eye color.

B. Qualitative variables can also be described by numbers, although the
numbering may be arbitrary.
EXAMPLES: car registration number; states of birth coded as 1, 2, 3, 4, etc.

C. Quantitative variable - can be described by a number for which arithmetic
operations such as averaging make sense.
EXAMPLES: balance in your mobile account, minutes remaining in class, or number of children
in a family.

D. A quantitative variable can be either discrete (e.g. number of children) or continuous (e.g. minutes remaining).
Summary of Types of Variables
Four Scales of Measurement (1 = Weakest, 4 = Strongest)
1 Nominal scale - data classified into
categories that cannot be arranged in any
particular order. Numbers are just labels for
groups or classes. Nominal stands for
NAME.

EXAMPLES: eye color, gender, religious affiliation,
Platform number.


2 Ordinal scale - data arranged in
some order according to their relative size
or quality. The differences between data
values cannot be determined or are
meaningless: we know one is better than
the other, but not how much better.

EXAMPLE: During a taste test of 4 soft drinks, Coca
Cola was ranked number 1, Sprite number 2,
Seven-up number 3, and Orange Mirinda number
4.

3 Interval scale - similar to the ordinal scale,
with the additional property that
differences between data values are
meaningful and can be measured.
There is no natural zero point.

EXAMPLE: Time of day. 10:00 a.m. is not twice 5:00
a.m., but the interval between 00:00 and 10:00 a.m. is
twice the interval between 00:00 and 5:00 a.m.


4 Ratio scale - an interval scale with an
inherent (true) zero point. Both differences
and ratios are meaningful at this level of
measurement.

EXAMPLES: Monthly income of surgeons, or
distance traveled by manufacturers'
representatives per month.

Population v/s Sample
A population is a collection of all possible individuals, objects, or measurements
of interest. The population is also called the UNIVERSE. Greek letters, like μ or σ,
are used for population values, which are termed population parameters. A sample is a
portion, or subset, of the measurements selected from the population of
interest. Roman letters, like x̄ or s, are used for describing sample statistics.
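To make the parameter/statistic distinction concrete, here is a minimal sketch using Python's standard `statistics` module. The balances below are hypothetical. Note that the population standard deviation σ divides by N (`pstdev`), while the sample standard deviation s divides by n - 1 (`stdev`).

```python
import statistics

# Hypothetical population: mobile-account balances (in rupees) of all 8 members of a small group
population = [120, 150, 90, 200, 175, 130, 160, 110]

# Population parameters (Greek letters): mu and sigma, dividing by N
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# A sample is a subset of the population; here, simply the first four members
sample = population[:4]

# Sample statistics (Roman letters): x-bar and s, with s dividing by n - 1
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"Population: mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"Sample:     x_bar = {x_bar:.3f}, s = {s:.3f}")
```

In practice the population values are usually unknown, and x̄ and s computed from a sample are used to estimate μ and σ.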
Types of Statistics Descriptive Statistics
Data and Data Collection: A set of measurements obtained on
some variable is called a data set.

Descriptive Statistics - methods of organizing, summarizing, and
presenting data in an informative way. When the entire
population is considered, tabulating and presenting
the data is generally a challenge.


Inferential Statistics: A decision, estimate, prediction, or
generalization about a population, based on a sample.


Problems To Be Solved
Percentiles & Quartiles.
Measures of Central Tendency,
Mean, Arithmetic, Geometric, Harmonic.
Mean for individual, discrete, continuous distribution.
Mean from Assumed mean.
Median for individual, discrete, continuous distribution.
Mode for individual, discrete, continuous distribution
Measures of Dispersion,
Range.
Mean Deviation.
Standard Deviation.
Coefficient of Variation.
Combined Standard Deviation.
Skewness,
Test for Skewness.
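One item in the list above, the mean from an assumed mean, can be sketched as follows. This uses the common step-deviation shortcut, mean = A + (Σf·d / Σf) × i with d = (x - A) / i, where A is the assumed mean and i a common class width; the frequency distribution is hypothetical.

```python
# Hypothetical discrete frequency distribution: values x with frequencies f
x = [10, 20, 30, 40, 50]
f = [5, 8, 12, 9, 6]


def mean_direct(x, f):
    """Arithmetic mean of a discrete series: sum(f * x) / sum(f)."""
    return sum(xi * fi for xi, fi in zip(x, f)) / sum(f)


def mean_assumed(x, f, A, i=1):
    """Step-deviation shortcut: mean = A + (sum(f * d) / sum(f)) * i, with d = (x - A) / i."""
    d = [(xi - A) / i for xi in x]
    return A + sum(di * fi for di, fi in zip(d, f)) / sum(f) * i


print(mean_direct(x, f))               # direct method
print(mean_assumed(x, f, A=30, i=10))  # assumed mean A = 30, common width i = 10
```

Whatever value is chosen for A, the result is the same as the direct method; a well-chosen A only makes hand calculation easier.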
Requisites of a Good Measure of Central Tendency
It should be rigidly defined, which means that it should be calculated
and interpreted in the same way by everyone

It should be based on all values of the data

It should not be unduly affected by the extreme values

It should be amenable to further algebraic treatment

It should be amenable to sampling, by which we mean that the results
obtained from different samples should be similar

It should be simple to compute.
Some Measures of Central Tendency
Arithmetic Mean: It is a mathematical average, obtained by dividing the sum of the
observations by the number of observations.
Median: It is the VALUE of the middle observation of the array and is a positional
average.
Quartiles, Deciles, Percentiles: These are also positional averages; they divide the series
into four, ten and 100 equal parts respectively.
MODE: The mode is the value of the data that occurs most frequently.
Geometric Mean: It is a specialized average, applicable when the quantities to be
averaged follow an exponential law of growth or decline.
Harmonic Mean: The harmonic mean is used to average rates.
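Each of these measures can be computed with Python's standard `statistics` module, as in the sketch below. All data are hypothetical: a small data set for the mean, median, mode and quartiles, two growth factors for the geometric mean, and two speeds over equal distances for the harmonic mean.

```python
import statistics

# Hypothetical data set (already sorted for readability)
data = [2, 4, 4, 5, 7, 9, 10]

arithmetic_mean = statistics.fmean(data)      # sum / count
median = statistics.median(data)              # middle value of the array
mode = statistics.mode(data)                  # most frequent value
quartiles = statistics.quantiles(data, n=4)   # Q1, Q2, Q3; n=10 gives deciles, n=100 percentiles

# Geometric mean suits growth: e.g. 10% growth then 20% growth (hypothetical)
growth_factors = [1.10, 1.20]
gm = statistics.geometric_mean(growth_factors)  # average growth factor per period

# Harmonic mean suits rates: e.g. equal distances driven at 40 km/h and 60 km/h (hypothetical)
hm = statistics.harmonic_mean([40, 60])         # average speed for the whole trip

print(arithmetic_mean, median, mode, quartiles)
print(round(gm, 4), round(hm, 2))
```

Note that `quantiles` uses the "exclusive" method by default; textbook quartile formulas may give slightly different values on small data sets.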


Arithmetic Mean
Merits
Easy to understand and simple to calculate

It is based on all items of the series

Rigidly defined by a mathematical formula

It is capable of further algebraic treatment

It has sampling stability and is least affected
by sampling fluctuations

Arrangement of items is not required
Demerits
It is affected by extreme values & thus for
distributions where concentration is on
small or big values the mean is not an
ideal representative

For open ended distributions mean cannot
be calculated with accuracy

Mean is not useful for studying
qualitative phenomena like beauty,
intelligence, honesty, etc.

Mean does not have a life of its own:
it need not be an actual value. Saying the
average number of children in India is 3.6
describes no real family.

Mean lets positive and negative deviations
cancel out, which can hide large individual deviations.
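Two of the demerits above can be checked numerically. In this sketch with hypothetical incomes, one extreme value drags the mean far more than the median, and the deviations from the mean sum to zero, so positive and negative deviations cancel.

```python
import statistics

# Hypothetical monthly incomes (in thousands of rupees)
incomes = [30, 32, 35, 38, 40]
with_extreme = incomes + [400]   # add one extreme value

mean_before = statistics.fmean(incomes)
mean_after = statistics.fmean(with_extreme)
median_before = statistics.median(incomes)
median_after = statistics.median(with_extreme)

# Positive and negative deviations from the mean always cancel out
deviation_sum = sum(x - mean_before for x in incomes)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
print(f"sum of deviations from the mean: {deviation_sum}")
```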
Median
Merits
Useful in open-ended series, as it is
based on position and not on the
values.
Easier to compute than the
mean in case of unequal class
intervals.
It is not affected by extreme values.
Suitable for qualitative data that
can be ranked.
It minimises the sum of absolute
deviations taken about it.
Demerits
Requires arrangement of data.
It is not based on all the items of
the series.
Incapable of algebraic treatment;
the combined median of two groups
cannot be obtained from their medians.
For grouped data, it assumes values are
uniformly distributed within the median
class, which is not always true.
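The claim that the median minimises total absolute deviations can be verified with a small brute-force sketch on hypothetical data: the sum of |x - c| is smallest when c is the median, and larger about the mean.

```python
import statistics

data = [3, 7, 8, 12, 30]  # hypothetical, with one large value


def total_absolute_deviation(values, centre):
    """Sum of |x - centre| over all values."""
    return sum(abs(x - centre) for x in values)


med = statistics.median(data)
avg = statistics.fmean(data)

print("about the median:", total_absolute_deviation(data, med))
print("about the mean:  ", total_absolute_deviation(data, avg))

# Brute-force check over whole-number candidates 0..30
best = min(range(0, 31), key=lambda c: total_absolute_deviation(data, c))
print("best whole-number centre:", best)
```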
MODE
Merits
In certain situations mode is the
only suitable average, e.g. size
of shoes, garments, wages, etc.
It is not affected by extreme
values.
It can be used for qualitative
phenomena.
It indicates the point of maximum
concentration in highly
skewed distributions.
Limitations
In case of bimodal or multimodal
series, the mode cannot
be uniquely defined.
It is incapable of further
algebraic treatment.
It is not based on all the
items of the series.
It is not rigidly defined,
because different formulae
give different answers.
Its value is affected by the size of
the class interval.
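The first limitation is easy to see in code. Python's `statistics.multimode` returns every value that ties for the highest frequency, so a bimodal series returns two values. The shoe sizes below are hypothetical.

```python
import statistics

# Hypothetical shoe sizes sold in a day
unimodal = [6, 7, 8, 8, 8, 9, 10]
bimodal = [6, 7, 7, 8, 8, 9, 10]

print(statistics.multimode(unimodal))  # → [8]
print(statistics.multimode(bimodal))   # → [7, 8]
```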
Case Study Descriptive Statistics
Ms. Kathryn Ball of AutoUSA wants to
develop tables, charts, and graphs to
show the typical selling price on
various dealer lots. The table on the
right reports only the price of the 80
vehicles sold last month at Whitner
Autoplex.
Constructing a Frequency Table - Example
Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is the "2 to the k rule":
select the smallest k such that 2^k > n.
There were 80 vehicles sold, so n = 80. If we try k = 6, which
means we would use 6 classes, then 2^6 = 64, somewhat less
than 80. Hence, 6 is not enough classes. If we let k = 7, then 2^7 =
128, which is greater than 80. So the recommended number of
classes is 7.
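The 2 to the k rule amounts to a short loop: keep increasing k until 2^k exceeds n. A minimal sketch (the function name is illustrative):

```python
def number_of_classes(n):
    """Smallest k such that 2**k > n (the '2 to the k rule')."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


print(number_of_classes(80))  # → 7
```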
Step 2: Determine the class interval or width.
The formula is: i ≥ (H - L)/k, where i is the class interval, H is
the highest observed value, L is the lowest observed value, and
k is the number of classes.
($35,925 - $15,546)/7 ≈ $2,911
Round up to some convenient number, such as a multiple of
10 or 100. Use a class width of $3,000.
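Step 2 can be sketched the same way, using the highest and lowest prices quoted above. The `multiple` parameter is an illustrative choice: rounding up to a multiple of 100 gives the $3,000 width, while a multiple of 10 would give $2,920.

```python
import math


def class_width(highest, lowest, k, multiple=100):
    """Raw width (H - L) / k, rounded UP to a convenient multiple."""
    raw = (highest - lowest) / k
    return raw, math.ceil(raw / multiple) * multiple


raw, width = class_width(35925, 15546, 7)
print(f"raw width = {raw:.2f}, chosen width = {width}")
```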


Step 3: Set the individual class limits

Constructing a Frequency Table - Example

Step 4: Tally the vehicle selling prices into the classes.
Step 5: Count the number of items in each class.
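Steps 3 to 5 can be sketched in a few lines. The 80 actual prices are not reproduced here, so the prices below are hypothetical, and the first lower class limit of $15,000 is an assumption (a convenient value just below the lowest price). Each class runs from its lower limit "up to" the next one.

```python
from collections import Counter

# Hypothetical selling prices (not the actual 80 Whitner Autoplex prices)
prices = [15546, 17320, 18900, 20150, 21800, 23400,
          23950, 26100, 27800, 29500, 31250, 35925]

lower_limit = 15000   # assumed first lower class limit
width = 3000          # class width from Step 2
k = 7                 # number of classes from Step 1

# Step 4: tally each price into class j, which covers
# lower_limit + j*width up to (but not including) lower_limit + (j+1)*width
tally = Counter((p - lower_limit) // width for p in prices)

# Step 5: count the items in each class, including empty classes
table = [(lower_limit + j * width, lower_limit + (j + 1) * width, tally.get(j, 0))
         for j in range(k)]

for low, high, count in table:
    print(f"${low:,} up to ${high:,}: {count}")
```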


Constructing a Frequency Table
Relative Frequency Distribution
To convert a frequency distribution to a relative frequency
distribution, each of the class frequencies is divided by the
total number of observations.
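As a small sketch with hypothetical class frequencies, the conversion is a single division per class; the relative frequencies always sum to 1.

```python
# Hypothetical class frequencies from a frequency table
frequencies = [4, 6, 5, 3, 2]

n = sum(frequencies)                     # total number of observations
relative = [f / n for f in frequencies]  # each class frequency divided by the total

print(relative)  # → [0.2, 0.3, 0.25, 0.15, 0.1]
```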
Graphic Presentation of a Frequency Distribution
The three commonly used graphic
forms are:
Histograms
Frequency polygons
Cumulative frequency distributions



Histogram
A histogram for a frequency distribution based on
quantitative data is very similar to the bar chart showing the
distribution of qualitative data. The classes are marked on
the horizontal axis and the class frequencies on the vertical
axis. The class frequencies are represented by the heights
of the bars.
Histogram Using Excel
Frequency Polygon
A frequency polygon also
shows the shape of a
distribution and is similar
to a histogram.
It consists of line segments
connecting the points
formed by the intersections
of the class midpoints and
the class frequencies.
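The points of a frequency polygon are (class midpoint, class frequency) pairs. The sketch below computes them for a hypothetical table; adding a zero-frequency point one class width beyond each end, so the polygon meets the horizontal axis, is a common convention rather than something stated on the slide.

```python
# Hypothetical frequency table: first lower class limit, class width, class frequencies
lower_limit = 15000
width = 3000
frequencies = [4, 6, 5, 3, 2]

# Class midpoint = lower limit + half the class width
midpoints = [lower_limit + j * width + width // 2 for j in range(len(frequencies))]

# Points to plot, closed with a zero-frequency point at each end (common convention)
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, frequencies))
          + [(midpoints[-1] + width, 0)])

for x, y in points:
    print(x, y)
```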

Cumulative Frequency Distribution
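A "less than" cumulative frequency distribution is the running total of the class frequencies, paired with each class's upper limit. A minimal sketch with a hypothetical table, using `itertools.accumulate`:

```python
from itertools import accumulate

# Hypothetical frequency table: upper class limits and class frequencies
upper_limits = [18000, 21000, 24000, 27000, 30000]
frequencies = [4, 6, 5, 3, 2]

# "Less than" cumulative frequency: running total of the class frequencies
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

for upper, cf in zip(upper_limits, cumulative):
    print(f"less than ${upper:,}: {cf} ({cf / total:.0%})")
```

Plotting these cumulative counts (or percentages) against the upper class limits gives the cumulative frequency polygon.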
Standard Deviation
Merits
It is based on all items of the
distribution.
It is amenable to algebraic
treatment.
It is least affected by fluctuations
in sampling.
It facilitates the calculation of
combined standard deviation of
two or more groups.
It provides a unit of
measurement for the normal
distribution.
Demerits
It cannot be used for comparing
the variability of two or more
series of observations given in
different units.
It is difficult to compute as
compared with other measures
of dispersion.
It is strongly affected by extreme
values: because deviations are squared,
values far from the mean get more weight
than values near it.
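Two points above can be illustrated together: the combined standard deviation of two groups (a merit) and the coefficient of variation, which addresses the demerit about comparing series in different units. This sketch uses the standard formula σ12 = sqrt((n1(σ1² + d1²) + n2(σ2² + d2²)) / (n1 + n2)), with d_i the gap between each group mean and the combined mean, and population standard deviations. The groups are hypothetical; the result is checked against the SD of the pooled data.

```python
import math
import statistics


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and (population) standard deviation of two groups."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - mean12, mean2 - mean12
    var12 = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return mean12, math.sqrt(var12)


def coefficient_of_variation(values):
    """CV = sigma / mean * 100: a unit-free measure of relative dispersion."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100


# Hypothetical groups
group_a = [10, 12, 14, 16, 18]
group_b = [20, 22, 24, 26]

mean12, sd12 = combined_mean_sd(
    len(group_a), statistics.fmean(group_a), statistics.pstdev(group_a),
    len(group_b), statistics.fmean(group_b), statistics.pstdev(group_b),
)

print(f"combined mean = {mean12:.2f}, combined SD = {sd12:.4f}")
print(f"check on pooled data: SD = {statistics.pstdev(group_a + group_b):.4f}")
print(f"CV of group A = {coefficient_of_variation(group_a):.2f}%")
```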