ECONOMIC ANALYSIS 1
III SEMESTER
B A ECONOMICS
(2013 Admission )
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
Calicut university P.O, Malappuram Kerala, India 673 635.
263 A
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
B.A. ECONOMICS
(2013 ADMISSION )
III SEMESTER
QUANTITATIVE METHODS FOR
ECONOMIC ANALYSIS 1
Prepared by:
Module
Materials Prepared by
Full Module
Editor
Dr.C.Krishnan
Associate Professor
PG Department of Economics
Govt. College Kodanchery
Kozhikode 673580
Email: ckcalicut@rediffmail.com
CONTENTS
MODULE - I
MODULE - II
MODULE - III
MODULE - IV
Module I
Description of Data and Sampling
1. STATISTICS-MEANING
Statistics is as old as the human race. Its utility has been increasing as the ages go by. In the
olden days it was used in the administrative departments of the states and its scope was limited.
Earlier it was used by governments to keep records of births, deaths, population etc., for
administrative purposes. In the 17th century John Graunt was the first man to make a systematic
study of birth and death statistics and to calculate the expectation of life at different ages,
which led to the idea of life insurance.
The word Statistics seems to have been derived from the Latin word status, the Italian word
statista or the German word Statistik, each of which means a political state. Fields like
agriculture, economics, sociology, business management etc., now use statistical methods
for different purposes.
Statistics has been defined differently by different writers. According to Webster, "Statistics are
the classified facts representing the conditions of the people in a state, especially those facts
which can be stated in numbers or any tabular or classified arrangement."
According to Bowley, statistics is the science of counting and the science of averages. He also
describes statistics as numerical statements of facts in any department of enquiry placed in
relation to each other.
According to Yule and Kendall, statistics means quantitative data affected to a marked extent by
multiplicity of causes.
A broader definition of statistics was given by Horace Secrist. According to him, statistics
means aggregates of facts affected to a marked extent by multiplicity of causes, numerically
expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in
a systematic manner for a predetermined purpose and placed in relation to each other.
This definition points out some essential characteristics that numerical facts must possess so that
they may be called statistics. These characteristics are:
1. They are enumerated or estimated according to a reasonable standard of accuracy
2. They are affected by multiplicity of factors
3. They must be numerically expressed
4. They must be aggregate of facts
W.I. King defines the science of statistics as the method of judging collective, natural or social
phenomena from the results obtained from the analysis or enumeration or collection of
estimates.
Prof. Boddington has defined statistics as the science of estimates and probabilities.
Let us also see some other definitions of statistics.
Statistics as a discipline is the development and application of methods to collect, analyse and
interpret data.
Statistics is the science of learning from data, and of measuring, controlling, and communicating
uncertainty; and it thereby provides the navigation essential for controlling the course of
scientific and societal advances.
Statistics is a collection of mathematical techniques that help to analyse and present data.
Statistics is also used in associated tasks such as designing experiments and surveys and planning
the collection and analysis of data from these.
Statistics is the study of numerical information, called data. Statisticians acquire, organize, and
analyse data. Each part of this process is also scrutinized. The techniques of statistics are applied
to a multitude of other areas of knowledge.
Thus, to sum up, statistics are numerical statements of facts capable of analysis and
interpretation, and the science of statistics is the study of the principles and the methods applied
in collecting, presenting, analysing and interpreting numerical data in any field of inquiry.
Characteristics of Statistics
1. Statistics are aggregate of facts: A single age of 20 or 30 years is not statistics, a series of ages
are. Similarly, a single figure relating to production, sales, birth, death etc., would not be
statistics although aggregates of such figures would be statistics because of their comparability
and relationship.
2. Statistics are affected to a marked extent by a multiplicity of causes: A number of causes
affect statistics in a particular field of enquiry, e.g., production statistics are affected by
climate, soil, fertility, availability of raw materials and methods of quick transport.
3. Statistics are numerically expressed, enumerated or estimated: The subject of statistics is
concerned essentially with facts expressed in numerical form, with their quantitative details but
not qualitative descriptions. Therefore, facts indicated by terms such as good or poor are not
statistics unless a numerical equivalent is assigned to each expression. The figures may be either
enumerated or, where actual enumeration is not possible or is very difficult, estimated.
4. Statistics are enumerated or estimated according to a reasonable standard of accuracy: Personal
bias and prejudices of the enumerator should not enter into the counting or estimation of
figures, otherwise conclusions from the figures would not be accurate. The figures should be
counted or estimated according to reasonable standards of accuracy. Absolute accuracy is neither
necessary nor sometimes possible in social sciences. But whatever standard of accuracy is once
adopted should be used throughout the process of collection or estimation.
5. Statistics should be collected in a systematic manner for a predetermined purpose: The
statistical methods to be applied depend on the purpose of the enquiry, since figures are always
collected with some purpose. If there is no predetermined purpose, all the efforts in collecting
the figures may prove to be wasteful. The purpose of a series of ages of husbands and wives may be
to find whether young husbands have young wives and old husbands have old wives.
6. Statistics should be capable of being placed in relation to each other: The collected figures
should be comparable and well-connected in the same department of inquiry. Ages of husbands
are to be compared only with the corresponding ages of wives, and not with, say, heights of
trees.
Functions of Statistics
The functions of statistics may be enumerated as follows :
(i) To present facts in a definite form: Without a statistical study our ideas are likely to be vague,
indefinite and hazy, but figures help us to represent things in their true perspective. For
example, the statement that some students out of 1,400 who had appeared for a certain
examination were declared successful would not give as much information as the statement that 300
students out of the 1,400 who took the examination were declared successful.
(ii) To simplify unwieldy and complex data: It is not easy to treat large numbers, and hence they
are simplified either by taking a few figures to serve as a representative sample or by taking an
average to give a bird's-eye view of the large masses. For example, complex data may be
simplified by presenting them in the form of a table, graph or diagram, or by representing them
through an average etc.
(iii) To use it as a technique for making comparisons: The significance of certain figures can be
better appreciated when they are compared with others of the same type. The comparison
between two different groups is best represented by certain statistical methods, such as average,
coefficients, rates, ratios, etc.
Uses of Statistics
Statistics is primarily used either to make predictions based on the data available or to make
conclusions about a population of interest when only sample data is available.
In both cases statistics tries to make sense of the uncertainty in the available data.
Statisticians apply statistical thinking and methods to a wide variety of scientific, social, and
business endeavours in such areas as astronomy, biology, education, economics, engineering,
genetics, marketing, medicine, psychology, public health and sports, among many others. Many
economic, social, political, and military decisions cannot be made without statistical techniques,
such as the design of experiments to gain federal approval of a newly manufactured drug.
Statistics is of two types:
(a) Descriptive statistics involves methods of organizing, picturing and summarizing information
from data.
(b) Inferential statistics involves methods of using information from a sample to draw conclusions
about the population.
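The distinction between the two types can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 1,000 incomes is simulated (a hypothetical data set, not taken from this material), and the names `population` and `sample` are assumptions made for the example. The first part summarises the sample itself (descriptive); the second uses the sample mean as an estimate of the population mean (inferential).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers on each run

# Hypothetical population: 1,000 simulated monthly incomes (assumed data).
population = [random.gauss(50, 10) for _ in range(1000)]

# In practice only a sample is observed.
sample = random.sample(population, 50)

# Descriptive statistics: organise and summarise the data in hand.
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
sample_range = (min(sample), max(sample))

# Inferential statistics: use the sample to say something about the population.
estimated_population_mean = sample_mean
true_population_mean = statistics.mean(population)  # normally unknown
estimation_error = estimated_population_mean - true_population_mean

print(f"sample mean   = {sample_mean:.2f}")
print(f"sample median = {sample_median:.2f}")
print(f"estimate of population mean = {estimated_population_mean:.2f}")
print(f"error of the estimate       = {estimation_error:.2f}")
```

The error printed at the end is available here only because the whole population was simulated; with real data it is unknown, which is the uncertainty that inferential methods try to measure.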
These days statistical methods are applicable everywhere. There is hardly any field of work in
which statistical methods are not applied. According to A. L. Bowley, "A knowledge of statistics is
like a knowledge of foreign languages or of Algebra, it may prove of use at any time under any
circumstances." The importance of the statistical science is increasing in almost all spheres of
knowledge, e.g., astronomy, biology, meteorology, demography, economics and mathematics.
Economic planning without statistics is bound to be baseless. Statistics serve in administration,
and facilitate the work of formulation of new policies. Financial institutions and investors utilise
statistical data to summarise past experience. Statistics are also helpful to an auditor, when he
uses sampling techniques or test checking to audit the accounts of his client.
(a) Statistics and Economics: In the year 1890 Prof. Alfred Marshall, the renowned economist,
observed that "statistics are the straw out of which I, like every other economist, have to make
bricks." This shows the significance of statistics in economics. Economics is concerned with the
production and distribution of wealth as well as with the complex institutional set-up connected
with the consumption, saving and investment of income. Statistical data and statistical methods
are of immense help in the proper understanding of economic problems and in the
formulation of economic policies. In fact these are the tools and appliances of an economist's
laboratory. In the field of economics it is almost impossible to find a problem which does not
require an extensive use of statistical data. As economic theory advances, the use of statistical
methods also increases. The laws of economics like the law of demand, the law of supply etc. can be
considered true and established with the help of statistical methods. Statistics of consumption
tell us about the relative strength of the desire of a section of people. Statistics of production
describe the wealth of a nation. Exchange statistics throw light on the commercial development of
a nation. Distribution statistics disclose the economic conditions of various classes of people.
Therefore statistical methods are necessary for economics.
(b) Statistics and business: Statistics is an aid to business and commerce. When a person enters
business, he enters into the profession of forecasting. Modern statistical devices have made
business forecasting more precise and accurate. A businessman needs statistics right from the
time he proposes to start a business. He should have relevant facts and figures to prepare the
financial plan of the proposed business. Statistical methods are necessary for these purposes. In
industrial concerns statistical devices are being used not only to determine and control the
quality of products manufactured but also to reduce wastage to a minimum. The technique of
statistical control is used to maintain the quality of products.
(c) Statistics and Research: Statistics is an indispensable tool of research. Most of the
advancement in knowledge has taken place because of experiments conducted with the help of
statistical methods. For example, experiments about crop yield with different types of fertilizers
and different types of soils, or about the growth of animals under different diets and environments,
are frequently designed and analysed according to statistical methods. Statistical methods are also
useful for research in medicine and public health. In fact there is hardly any research work
today that one can find complete without statistical data and statistical methods.
Other uses of statistics are as follows.
(1) Statistics helps in providing a better understanding and exact description of a phenomenon of
nature.
(2) Statistics helps in proper and efficient planning of a statistical inquiry in any field of study.
(3) Statistics helps in collecting appropriate quantitative data.
(4) Statistics helps in presenting complex data in a suitable tabular, diagrammatic and graphic
form for an easy and clear comprehension of the data.
(5) Statistics helps in understanding the nature and pattern of variability of a phenomenon through
quantitative observations.
(6) Statistics helps in drawing valid inferences, along with a measure of their reliability, about
the population parameters from the sample data.
Limitations of Statistics
Statistics is indispensable to almost all sciences - social, physical and natural. It is very often
used in most of the spheres of human activity. In spite of the wide scope of the subject it has
certain limitations. Some important limitations of statistics are the following:
1. Statistics does not study qualitative phenomena: Statistics deals with facts and figures. So
the quality aspect of a variable or the subjective phenomenon falls out of the scope of statistics.
For example, qualities like beauty, honesty, intelligence etc. cannot be numerically expressed. So
these characteristics cannot be examined statistically. This limits the scope of the subject.
2. Statistical laws are not exact: Statistical laws are not exact, as in the case of the natural
sciences. These laws are true only on average. They hold good under certain conditions. They cannot
be universally applied. So statistics has less practical utility.
3. Statistics does not study individuals: Statistics deals with aggregate of facts. Single or
isolated figures are not statistics. This is considered to be a major handicap of statistics.
4. Statistics can be misused: Statistics is mostly a tool of analysis. Statistical techniques are
used to analyze and interpret the collected information in an enquiry. As it is, statistics does not
prove or disprove anything. It is just a means to an end. Statements supported by statistics are
more appealing and are commonly believed. For this, statistics is often misused. Statistical
methods rightly used are beneficial but if misused these become harmful. Statistical methods
used by less expert hands will lead to inaccurate results. Here the fault does not lie with the
subject of statistics but with the person who makes wrong use of it.
Other limitations are as follows.
(1) Statistical laws are true on average. Statistics are aggregates of facts. So a single
observation is not a statistic; statistics deals with groups and aggregates only.
(2) Statistical methods are best applicable to quantitative data.
(3) Statistical methods cannot be applied to heterogeneous data.
(4) If sufficient care is not exercised in collecting, analyzing and interpreting the data,
statistical results might be misleading.
(5) Only a person who has an expert knowledge of statistics can handle statistical data
efficiently.
(6) Some errors are possible in statistical decisions. Inferential statistics in particular involves
certain errors. We do not know whether an error has been committed or not.
2. DATA: ELEMENTS, VARIABLES, OBSERVATIONS, SCALE OF MEASUREMENT
Data may be defined as facts, observations, and information that come from investigations. Data
can be defined as groups of information that represent the qualitative or quantitative attributes of
a variable or set of variables, which is the same as saying that data can be any set of information
that describes a given entity. Data in statistics can be classified into grouped data and ungrouped
data.
Quantitative Methods for Economic Analysis - I
1. Elements: A data element is a unit of data for which the definition, identification,
representation, and permissible values are specified by means of a set of attributes. It is the
smallest named item of data that conveys meaningful information or condenses lengthy
description into a short code called data field in the structure of a database.
2. Variable - property of an object or event that can take on different values. A variable is any
measurable characteristic or attribute that can have different values for different subjects. Height,
age, amount of income, country of birth, grades obtained at school and type of housing are
examples of variables. For example, college major is a variable that takes on values like
mathematics, computer science, English, psychology, etc.
Discrete Variable - a variable with a limited number of values (e.g., gender (male/female),
college class (freshman/sophomore/junior/senior)).
Continuous Variable - a variable that can take on many different values, in theory, any value
between the lowest and highest points on the measurement scale.
Independent Variable - a variable that is manipulated, measured, or selected by the researcher as
an antecedent condition to an observed behavior. In a hypothesized cause-and-effect
relationship, the independent variable is the cause and the dependent variable is the outcome or
effect.
Dependent Variable - a variable that is not under the experimenter's control -- the data. It is the
variable that is observed and measured in response to the independent variable.
Qualitative Variable - a variable based on categorical data.
Quantitative Variable - a variable based on quantitative data.
Qualitative vs. Quantitative Variables
Variables can be classified as qualitative (aka, categorical) or quantitative (aka, numeric).
Qualitative. Qualitative variables take on values that are names or labels. The color of a
ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be
examples of qualitative or categorical variables.
Quantitative. Quantitative variables are numeric. They represent a measurable quantity.
For example, when we speak of the population of a city, we are talking about the number
of people in the city - a measurable attribute of the city. Therefore, population would be a
quantitative variable.
In algebraic equations, quantitative variables are represented by symbols (e.g., x, y, or z).
Discrete vs. Continuous Variables
Quantitative variables can be further classified as discrete or continuous. If a variable can take on
any value between its minimum value and its maximum value, it is called a continuous variable;
otherwise, it is called a discrete variable.
Some examples will clarify the difference between discrete and continuous variables.
Suppose the fire department mandates that all fire fighters must weigh between 150 and
250 pounds. The weight of a fire fighter would be an example of a continuous variable;
since a fire fighter's weight could take on any value between 150 and 250 pounds.
Suppose we flip a coin and count the number of heads. The number of heads could be any
integer value between 0 and plus infinity. However, it could not be any number between
0 and plus infinity. We could not, for example, get 2.3 heads. Therefore, the number of
heads must be a discrete variable.
Univariate vs. Bivariate Data
Statistical data are often classified according to the number of variables being studied.
Univariate data. When we conduct a study that looks at only one variable, we say that we
are working with univariate data. Suppose, for example, that we conducted a survey to
estimate the average weight of high school students. Since we are only working with one
variable (weight), we would be working with univariate data.
Bivariate data. When we conduct a study that examines the relationship between two
variables, we are working with bivariate data. Suppose we conducted a study to see if
there were a relationship between the height and weight of high school students. Since we
are working with two variables (height and weight), we would be working with bivariate
data.
3. Observations
An observation is the value, at a particular period, of a particular variable, such as the individual
price of an item at a given outlet. Observation is also a method of data collection in which the
situation of interest is watched and the relevant facts, actions and behaviors are recorded.
Observation units vary according to the specific survey or data collection: for statistical data
collected on persons the observation unit is usually one individual or a household.
4. Scale of Measurement
Normally, when one hears the term measurement, one may think in terms of measuring the
length of something (e.g., the length of a piece of wood) or measuring a quantity of something
(e.g., a cup of flour). This represents a limited use of the term measurement. In statistics, the
term measurement is used more broadly and is more appropriately termed scales of measurement.
Scales of measurement refer to ways in which variables/numbers are defined and categorized.
Each scale of measurement has certain properties which in turn determine the appropriateness
of certain statistical analyses. The four scales of measurement are nominal, ordinal,
interval, and ratio.
Properties of Measurement Scales
Each scale of measurement satisfies one or more of the following properties of measurement.
Identity: Each value on the measurement scale has a unique meaning. It is not equal to any other
value on the scale.
Magnitude: All values on the measurement scale have an ordered relationship to one another.
That is, some values are larger and some are smaller.
Equal intervals: Scale units along the scale are equal to one another. This means, for example,
that the difference between 1 and 2 would be equal to the difference between 19 and 20.
A minimum value of zero: The scale has a true zero point, that is, no values exist below zero.
Measurement scales are of four types, namely, Nominal Scale of Measurement, Ordinal Scale of
Measurement, Interval Scale of Measurement and Ratio Scale of Measurement.
(a) Nominal Scale of Measurement
The nominal scale of measurement only satisfies the identity property of measurement. Values
assigned to variables represent a descriptive category, but have no inherent numerical value with
respect to magnitude.
Gender is an example of a variable that is measured on a nominal scale. Individuals may be
classified as "male" or "female", but neither value represents more or less "gender" than the
other. Religion and political affiliation are other examples of variables that are normally
measured on a nominal scale.
(b) Ordinal Scale of Measurement
The ordinal scale has the property of both identity and magnitude. Each value on the ordinal
scale has a unique meaning, and it has an ordered relationship to every other value on the scale.
An example of an ordinal scale in action would be the results of a horse race, reported as "win",
"place", and "show". We know the rank order in which horses finished the race. The horse that
won finished ahead of the horse that placed, and the horse that placed finished ahead of the horse
that showed. However, we cannot tell from this ordinal scale whether it was a close race or
whether the winning horse won by a mile.
(c) Interval Scale of Measurement
The interval scale of measurement has the properties of identity, magnitude, and equal intervals.
A perfect example of an interval scale is the Fahrenheit scale to measure temperature. The scale
is made up of equal temperature units, so that the difference between 40 and 50 degrees
Fahrenheit is equal to the difference between 50 and 60 degrees Fahrenheit.
With an interval scale, you know not only whether different values are bigger or smaller, you
also know how much bigger or smaller they are. For example, suppose it is 60 degrees
Fahrenheit on Monday and 70 degrees on Tuesday. You know not only that it was hotter on
Tuesday, you also know that it was 10 degrees hotter.
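The Monday and Tuesday temperatures above can be used to sketch what an interval scale does and does not support. The Python snippet below is only an illustration: the helper name `fahrenheit_to_celsius` is an assumption introduced for the example. It shows that the difference between the two days keeps its meaning when the unit changes (10 Fahrenheit degrees are always 10 x 5/9 Celsius degrees), while the ratio of the two readings changes with the unit, because the scale has no true zero.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32) * 5 / 9


monday_f, tuesday_f = 60.0, 70.0
monday_c = fahrenheit_to_celsius(monday_f)
tuesday_c = fahrenheit_to_celsius(tuesday_f)

# Differences are meaningful on an interval scale.
difference_f = tuesday_f - monday_f   # 10 Fahrenheit degrees
difference_c = tuesday_c - monday_c   # the same gap expressed in Celsius degrees

# Ratios are not: the answer depends on where the arbitrary zero sits.
ratio_f = tuesday_f / monday_f
ratio_c = tuesday_c / monday_c

print(f"difference: {difference_f:.1f} F = {difference_c:.2f} C")
print(f"ratio in Fahrenheit: {ratio_f:.3f}")
print(f"ratio in Celsius:    {ratio_c:.3f}")
```

Because the two ratios disagree, a statement such as "Tuesday was so many times hotter than Monday" has no fixed meaning on this scale, whereas "Tuesday was 10 degrees hotter" does.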
(d) Ratio Scale of Measurement
The ratio scale of measurement satisfies all four of the properties of measurement: identity,
magnitude, equal intervals, and a minimum value of zero.
The weight of an object would be an example of a ratio scale. Each value on the weight scale has
a unique meaning, weights can be rank ordered, units along the weight scale are equal to one
another, and the scale has a minimum value of zero.
Weight scales have a minimum value of zero because objects at rest can be weightless, but they
cannot have negative weight.
The table below will help clarify the fundamental differences between the four scales of
measurement:
Scale      Indicates    Indicates Direction   Indicates Amount   Absolute
           Difference   of Difference         of Difference      Zero
Nominal    X
Ordinal    X            X
Interval   X            X                     X
Ratio      X            X                     X                  X
You will notice in the above table that only the ratio scale meets the criteria for all four
properties of scales of measurement.
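The relationship between the four scales and the four properties can also be written down as a small lookup table. The Python sketch below is illustrative only; the names `PROPERTIES`, `SCALES` and `satisfies_all` are assumptions made for the example, and the property lists simply restate the descriptions given in the text for each scale.

```python
# The four properties of measurement, in the order used in the text.
PROPERTIES = ("identity", "magnitude", "equal intervals", "minimum value of zero")

# Properties satisfied by each scale, as described in the text.
SCALES = {
    "nominal": {"identity"},
    "ordinal": {"identity", "magnitude"},
    "interval": {"identity", "magnitude", "equal intervals"},
    "ratio": set(PROPERTIES),
}


def satisfies_all(scale):
    """Return True if the scale has every property of measurement."""
    return SCALES[scale] == set(PROPERTIES)


for name, props in SCALES.items():
    marks = ["X" if p in props else "-" for p in PROPERTIES]
    print(f"{name:<9}", " ".join(marks))

scales_with_all_properties = [name for name in SCALES if satisfies_all(name)]
print(scales_with_all_properties)
```

Each scale in the dictionary contains all the properties of the one before it, which is why the scales are usually listed in this order.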
Interval and Ratio data are sometimes referred to as parametric and Nominal and Ordinal data
are referred to as nonparametric. Parametric means that it meets certain requirements with
respect to parameters of the population (for example, the data will be normal--the distribution
parallels the normal or bell curve). In addition, it means that numbers can be added, subtracted,
multiplied, and divided. Parametric data are analyzed using statistical techniques identified as
Parametric Statistics. As a rule, there are more statistical technique options for the analysis of
parametric data and parametric statistics are considered more powerful than nonparametric
statistics. Nonparametric data are lacking those same parameters and cannot be added,
subtracted, multiplied, and divided. For example, it does not make sense to add Social Security
numbers to get a third person. Nonparametric data are analyzed by using Nonparametric
Statistics.
3. TYPES OF DATA: Qualitative and Quantitative; Cross-section, Time series and Pooled Data
3.1 Qualitative and Quantitative
Data is a collection of facts, such as values or measurements. It can be numbers, words,
measurements, observations or even just descriptions of things. Some methods provide data
which are quantitative and some methods provide data which are qualitative.
Quantitative data are anything that can be expressed as a number, or quantified. Examples of
quantitative data are scores on achievement tests, number of hours of study, or weight of a
subject. These data may be represented by ordinal, interval or ratio scales and lend themselves to
most statistical manipulation. Thus quantitative data are data that can be quantified and verified,
and are amenable to statistical manipulation.
Qualitative data cannot be expressed as a number. Data that represent nominal scales such as
gender, socio-economic status, religious preference are usually considered to be qualitative data.
Thus qualitative data are data that approximate or characterize but do not measure the attributes,
characteristics, properties, etc., of a thing or phenomenon.
Qualitative data describe whereas quantitative data define.
Both types of data are valid types of measurement. But only quantitative data can be analysed
statistically, and thus more rigorous assessments of the data are possible.
Quantitative and qualitative data provide different outcomes, and are often used together to get a
full picture of a population. For example, if data are collected on annual income (quantitative),
occupation data (qualitative) could also be gathered to get more detail on the average annual
income for each type of occupation.
Quantitative and qualitative data can be gathered from the same data unit depending on whether
the variable of interest is numerical or categorical. For example:
Example 1: Oil Painting
Qualitative data: blue/green color, gold frame
Quantitative data: picture is 10" by 14", cost Rs 5000
Example 2
Data unit     Categorical variable                              = Qualitative data
A person      "In which country were your children born?"       India
A person      "What is your occupation?"                         Banker
A person      "Do you work full-time or part-time?"              Full-time
A house       "In which city or town is the house located?"      Thrissur
A business    "What is the industry of the business?"            Textile retail
A farm        "What is the main activity of the farm?"           Dairy
Quantitative data from the same data units: a business has 110 employees; a farm has 36 cows.
Quantitative:
Discrete: He has 4 legs; he has 2 brothers.
Continuous: He weighs 25.5 kg; he is 565 mm tall.
Other examples: if one considered the closing prices of a group of 20 different tech stocks on the
BSE on September 15, 2014, this would be an example of cross-sectional data. Note that the
underlying population should consist of members with similar characteristics. For example,
suppose you are interested in how much companies spend on research and development
expenses. Firms in some industries such as retail spend little on research and development
(R&D), while firms in industries such as technology spend heavily on R&D. Therefore, it's
inappropriate to summarize R&D data across all companies. Rather, analysts should summarize
R&D data by industry, and then analyze the data in each industry group. Other examples of
cross-sectional data would be: an inventory of all ice creams in stock at a particular supermarket,
a list of grades obtained by a class of students for a specific test.
The major difference between time series data and cross-section data is that the former focuses
on results gained over an extended period of time, often within a small area, whilst the latter
focuses on the information received from surveys and opinions at a particular time, in various
locations, depending on the information sought.
4. FREQUENCY DISTRIBUTIONS: ABSOLUTE AND RELATIVE
Frequency distribution is a specification of the way in which the frequencies of members of a
population are distributed according to the values of the variates which they exhibit. For
observed data the distribution is usually specified in tabular form, with some grouping for
continuous variates.
The frequency distribution or frequency table is a tabular organization of statistical data,
assigning to each piece of data its corresponding frequency.
Types of Frequencies
(a) Absolute Frequency
The absolute frequency is the number of times that a certain value appears in a statistical study.
It is denoted by $f_i$.
The sum of the absolute frequencies is equal to the total number of data, which is denoted by N.
$f_1 + f_2 + f_3 + \dots + f_n = N$
This sum is commonly denoted by the Greek letter $\Sigma$ (capital sigma), which represents a sum:
$\sum_{i=1}^{n} f_i = N$
(b) Relative Frequency
The relative frequency is the quotient between the absolute frequency of a certain value and the
total number of data. It can be expressed as a percentage and is denoted by $n_i$.
$n_i = \frac{f_i}{N}$
Value   fi    Fi    ni      Ni
...     1     1     0.032   0.032
...     2     3     0.065   0.097
...     6     9     0.194   0.290
30      7     16    0.226   0.516
31      8     24    0.258   0.774
32      3     27    0.097   0.871
33      3     30    0.097   0.968
34      1     31    0.032   1
Total   31          1
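The columns of the table above can be reproduced with a short calculation. The Python sketch below is illustrative and rests on two assumptions that the text does not state explicitly: that Fi and Ni stand for the cumulative absolute and cumulative relative frequencies, and that the eight absolute frequencies are 1, 2, 6, 7, 8, 3, 3 and 1 (the values consistent with the cumulative column and the total of 31). The variable names are chosen for the example only.

```python
from itertools import accumulate

# Absolute frequencies f_i (assumed, consistent with the table: total 31).
absolute = [1, 2, 6, 7, 8, 3, 3, 1]

total = sum(absolute)                                    # N
cumulative = list(accumulate(absolute))                  # F_i (assumed meaning)
relative = [f / total for f in absolute]                 # n_i = f_i / N
cumulative_relative = [F / total for F in cumulative]    # N_i (assumed meaning)

print(f"{'fi':>3} {'Fi':>3} {'ni':>6} {'Ni':>6}")
for f, F, n, N in zip(absolute, cumulative, relative, cumulative_relative):
    print(f"{f:>3} {F:>3} {n:>6.3f} {N:>6.3f}")
print(f"N = {total}")
```

The last cumulative frequency always equals N and the last cumulative relative frequency always equals 1, which gives a quick check on any frequency table.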
The information is presented on a coordinate axis. The values of the variable are represented on
the horizontal axis and the absolute, relative or cumulative frequencies are represented on the
vertical axis.
The data is represented by bars whose height is proportional to the frequency.
Example
A study has been conducted to determine the blood group of a class of 20 students. The results
are as follows:
Blood
Group
fi
AB
9
20
[Figure: bar chart with blood group on the horizontal axis and number of students on the vertical axis]
Histogram:
A histogram is a set of vertical bars whose areas are proportional to the frequencies
represented. While constructing a histogram, the variable is always taken on the X axis and the
frequencies on the Y axis. The width of each bar in the histogram is proportional to its
class interval. The bars are drawn without leaving space between them, because a histogram generally
represents continuous data. If the class intervals are uniform for a frequency distribution,
then the width of all the bars will be equal.
Example:
Marks     No. of students
10-15     5
15-20     20
20-25     47
25-30     38
30-35     10
[Figure: histogram with marks on the X axis and number of students on the Y axis]
Example:
Draw a frequency polygon to the following frequency distribution
Marks:            10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of students:  ...     13      19      28      19      11      9
[Figure: frequency polygon with marks on the X axis and number of students on the Y axis]
Frequency Curves
Frequency curves are derived from frequency polygons. A frequency curve is obtained by
joining the points of a frequency polygon with a freehand smoothed curve. Unlike a frequency
polygon, where the points are joined by straight lines, we join the points freehand
in order to get a smoothed frequency curve. It is used to remove the ruggedness of the
polygon and to present it in a good form or shape. We smooth only the angularities of the polygon,
without making any basic change in the shape of the curve. In this case also the curve
begins and ends at the base line, as in the case of the polygon. The area under the curve must remain almost
the same as in the case of the polygon.
Example:
Marks:
10-20
No. of
Students:
20-30
30-40
15
40-50
20
50-60
60-70
12
7
[Figure: frequency curve with marks on the X axis and number of students on the Y axis]
Marks     No. of students
10-20     4
20-30     6
30-40     10
40-50     20
50-60     18
60-70     2

Marks less than   No. of students
10                0
20                4
30                10
40                20
50                40
60                58
70                60

Marks more than   No. of students
10                60
20                56
30                50
40                40
50                20
60                2
70                0
[Figure: less than and more than cumulative frequency curves, with marks on the X axis and number of students on the Y axis]
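The "less than" and "more than" cumulative columns above can be derived mechanically from the class frequencies. A minimal sketch, assuming the six class frequencies 4, 6, 10, 20, 18 and 2 shown above (variable names are illustrative):

```python
from itertools import accumulate

limits = [10, 20, 30, 40, 50, 60, 70]    # class limits for marks
f = [4, 6, 10, 20, 18, 2]                # frequencies of 10-20, ..., 60-70

# Number of students with marks less than 10, 20, ..., 70
less_than = [0] + list(accumulate(f))

# Number of students with marks more than 10, 20, ..., 70
total = sum(f)
more_than = [total - c for c in less_than]

for limit, lt, mt in zip(limits, less_than, more_than):
    print(limit, lt, mt)
```

At every class limit the two cumulative counts add up to the total of 60 students.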
Pie Diagrams
One of the most common ways to represent data graphically is called a pie chart. It gets its name
by how it looks, just like a circular pie that has been cut into several slices. This kind of graph is
helpful when graphing qualitative data, where the information describes a trait or attribute and is
not numerical. Each trait corresponds to a different slice of the pie. By looking at all of the pie
pieces, you can compare how much of the data fits in each category.
Pie charts are a form of an area chart that are easy to understand with a quick look. They show
the part of the total (percentage) in an easy-to-understand way. Pie charts are useful tools that
help you figure out and understand polls, statistics, complex data, and income or spending. They
are popular because the share of each part in the total can be read at a glance.
Pie diagrams are used when an aggregate and its divisions are to be shown together.
The aggregate is shown by means of a circle and the divisions by the sectors of the circle. For
example, the total expenditure of a government distributed over different departments
like agriculture, irrigation, industry, transport etc. can be shown in a pie diagram. In constructing
a pie diagram the various components are first expressed as percentages, and each percentage
is then multiplied by 3.6 (since 360 degrees / 100 = 3.6) to get the angle for that component. The circle is then divided into sectors
such that the angles of the sectors equal the angles of the components. Therefore one sector
represents one component. Usually the components are shown with their angles in descending
order.
Example:
You conducted a survey as part of a project work. You had taken a sample of 20 individuals and
you want to represent their occupation using a pie chart .
First, put your data into a table, then add up all the values to get a total:
Farmer   Business   Teacher   Bank   Driver   TOTAL
4        5          6         1      4        20
Divide each value by the total and multiply by 100 to get a percent:
Farmer       Business     Teacher      Bank        Driver       TOTAL
4/20 = 20%   5/20 = 25%   6/20 = 30%   1/20 = 5%   4/20 = 20%   100%
Now you need to figure out how many degrees to give each pie slice (correctly
called a sector).
A full circle has 360 degrees, so we do this calculation:
Farmer            Business          Teacher            Bank              Driver            TOTAL
4/20 x 360 = 72   5/20 x 360 = 90   6/20 x 360 = 108   1/20 x 360 = 18   4/20 x 360 = 72   360
[Figure: pie chart of occupations: Farmer 20%, Business 25%, Teacher 30%, Bank 5%, Driver 20%]
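The percentage and angle calculation for the occupation example can be sketched as follows. The counts are those implied by the fractions in the worked example (4/20, 5/20 and so on); the dictionary and variable names are illustrative.

```python
# Occupation counts from the survey of 20 individuals
counts = {"Farmer": 4, "Business": 5, "Teacher": 6, "Bank": 1, "Driver": 4}
total = sum(counts.values())

# Share of each occupation as a percentage of the total
percent = {name: 100 * value / total for name, value in counts.items()}

# Central angle of each sector: the same share applied to 360 degrees
angle = {name: 360 * value / total for name, value in counts.items()}

for name in counts:
    print(f"{name:<9} {percent[name]:5.1f}%  {angle[name]:6.1f} degrees")
```

The angles must add up to 360 degrees, just as the percentages add up to 100.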
Pie charts are to be used with qualitative data, however there are some limitations in using them.
If there are too many categories, then there will be a multitude of pie pieces. Some of these are
likely to be very skinny, and can be difficult to compare to one another.
If we want to compare different categories that are close in size, a pie chart does not always help
us to do this. If one slice has central angle of 30 degrees, and another has a central angle of 29
degrees, then it would be very hard to tell at a glance which pie piece is larger than the other.
6. SUMMARY MEASURE OF DISTRIBUTIONS
We will discuss three sets of summary measures namely Measures of Central Tendency,
Variability and Shape. These are called summary measures because they summarise the data. For
example, one summary measure very familiar to you is the mean (a measure of
central tendency). If we take the mean mark of the students in a class for a subject, it gives a rough
idea of what the marks are like. Thus, based on just one summary value, we get an idea of the entire
data.
6.1 Measures of Central Tendency
A measure of central tendency is a measure that tells us where the middle of a bunch of data lies.
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics. The
mean (often called the average) is most likely the measure of central tendency that you are most
familiar with, but there are others, such as the median and the mode.
Mean: Mean is the most common measure of central tendency. It is simply the sum of the
numbers divided by the number of numbers in a set of data. This is also known as average.
Median: Median is the number present in the middle when the numbers in a set of data are
arranged in ascending or descending order. If the number of numbers in a data set is even, then
the median is the mean of the two middle numbers.
Mode: Mode is the value that occurs most frequently in a set of data.
The mean, median and mode are all valid measures of central tendency, but under different
conditions, some measures of central tendency become more appropriate to use than others. In
the following sections, we will look at the mean, mode and median, and learn how to calculate
them.
We will also discuss Geometric Mean and Harmonic Mean.
Requisites of a good average
Since an average is a single value representing a group of values, it is desired that such a
value satisfies the following properties.
1. Easy to understand: Since statistical methods are designed to simplify complexity, an
average should itself be easy to understand.
2. Simple to compute: A good average should be easy to compute so that it can be used
widely. However, though ease of computation is desirable, it should not be sought at the
expense of other advantages; if, in the interest of greater accuracy, a more difficult average
is desirable, it should be used.
3. Based on all items: The average should depend upon each and every item of the series,
so that if any of the items is dropped, the average itself is altered.
4. Not unduly affected by extreme observations: Although each and every item should
influence the value of the average, none of the items should influence it unduly. If one or two
very small or very large items unduly affect the average, i.e. either increase or reduce its
value, the average cannot be really typical of the entire series. In other words, extremes may distort
the average and reduce its usefulness.
5. Rigidly defined: An average should be properly defined so that it has only one
interpretation. It should preferably be defined by an algebraic formula so that if different people
compute the average from the same figures they all get the same answer. The average should not
depend upon the personal prejudice and bias of the investigator, otherwise the results can be
misleading.
6. Capable of further algebraic treatment: We should prefer to have an average that can be
used for further statistical computation so that its utility is enhanced. For example, if we are
given the average income and number of employees of two or more factories, we
should be able to compute the combined average.
7. Sampling stability: Last, but not least, we should prefer a value which has what
statisticians call sampling stability. This means that if we pick 10 different groups of college
students and compute the average of each group, we should expect to get approximately the
same value. It does not mean, however, that there can be no difference in the values of different
samples. There may be some differences, but an average for which this difference is smaller
is considered better than one for which it is larger.
(a) Mean (Arithmetic mean / average)
The mean (or average) is the most popular and well known measure of central tendency. It can
be used with both discrete and continuous data, although its use is most often with continuous
data. The mean is equal to the sum of all the
values in the data set divided by the number of values in the data set. So, if we have n values in a
data set and they have values x1, x2, ..., xn, the sample mean, usually denoted by $\bar{x}$ (pronounced "x
bar"), is:
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]
This formula is usually written in a slightly different manner using the Greek
capital letter $\Sigma$, pronounced "sigma", which means "sum of...":
\[ \bar{x} = \frac{\sum x}{n} \]
Example
In a survey you collected information on monthly spending for mobile recharge by 20 students of
which 10 are male and 10 female. We illustrate below how the data is used to find mean.
Student   Male     Female   Both
1         250      100      350
2         150      150      300
3         100      150      250
4         175      100      275
5         150      200      350
6         250      150      400
7         200      125      325
8         200      150      350
9         150      130      280
10        170      180      350
Total     1795     1435     3230
Mean      179.50   143.50   161.50
First we found the mean for male students. Here $\sum x$ = 1795 and n = 10, so the mean is 1795/10 = 179.5.
Similarly, for female students, $\sum x$ = 1435 and n = 10, so the mean is 1435/10 = 143.5.
We also find the mean for male and female students taken together.
Here $\sum x$ = 3230 and n = 20, so the mean is 3230/20 = 161.50.
Based on the above we can make certain observations. Male students spend Rs. 179.50 on
average in a month on mobile recharge. Female students spend Rs. 143.50. We may conclude
that male students spend more on monthly mobile recharges. As a researcher, you may now use
this information to study further why this is so: what are the factors that make male
students spend more on mobile recharges? We have also calculated the average for all students
taken together. It is Rs. 161.50. Thus we observe that the male students spend more than the
average for all students, while female students spend less than the average for all students.
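The three means in this example can be checked with a few lines of code. This is a minimal sketch using the spending figures from the table; it also shows that the overall mean can be recovered from the two group means and group sizes, which illustrates the "further algebraic treatment" property of the mean.

```python
from statistics import mean

# Monthly mobile recharge spending (Rs.) of the 10 male and 10 female students
male = [250, 150, 100, 175, 150, 250, 200, 200, 150, 170]
female = [100, 150, 150, 100, 200, 150, 125, 150, 130, 180]

mean_male = mean(male)                   # sum of values / number of values
mean_female = mean(female)
mean_all = mean(male + female)           # all 20 students taken together

# Combined mean from the group means, weighted by group size
combined = (len(male) * mean_male + len(female) * mean_female) / (len(male) + len(female))

print(mean_male, mean_female, mean_all, combined)
```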
The mean can also be calculated using another method, called the short cut method, as explained below.
Short cut method: The arithmetic mean can also be calculated by the short cut method. This method
reduces the amount of calculation. It involves the following steps:
i. Assume any one value as an assumed mean, which is also known as the working mean
or arbitrary average (A).
ii. Find the deviation of each value from the assumed mean (d = X - A).
iii. Add all the deviations ($\sum d$).
iv. Apply the formula
\[ \bar{X} = A + \frac{\sum d}{N} \]
where $\bar{X}$ = mean, $\sum d$ = sum of deviations from the assumed mean, A = assumed mean and
N = number of items.
Example:
Calculate arithmetic mean
Roll No :   1    2    3    4    5    6
Marks   :   40   50   55   78   58   60
Solution (taking A = 55)
Roll No.   Marks (X)   d = X - 55
1          40          -15
2          50          -5
3          55          0
4          78          23
5          58          3
6          60          5
                       $\sum d = 11$
\[ \bar{X} = A + \frac{\sum d}{N} = 55 + \frac{11}{6} = 56.83 \]
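The short cut method always gives the same answer as the direct method, whatever assumed mean is chosen. A minimal sketch with the six marks from the example (variable names are illustrative):

```python
marks = [40, 50, 55, 78, 58, 60]
A = 55                                   # assumed (working) mean

# Deviations of each value from the assumed mean, d = X - A
d = [x - A for x in marks]

shortcut_mean = A + sum(d) / len(marks)  # A + (sum of d) / N
direct_mean = sum(marks) / len(marks)    # (sum of X) / N

print(sum(d), round(shortcut_mean, 2), round(direct_mean, 2))
```

Choosing a different value of A changes the deviations but not the final mean.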
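For a discrete frequency distribution the formula is $\bar{X} = \frac{\sum fX}{N}$, where $N = \sum f$.
Example
X :   1    2    3    4    5
f :   10   12   8    7    11
Solution
X      f     fX
1      10    10
2      12    24
3      8     24
4      7     28
5      11    55
       N = 48   $\sum fX = 141$
\[ \bar{X} = \frac{\sum fX}{N} = \frac{141}{48} = 2.94 \]
By the short cut method, $\bar{X} = A + \frac{\sum fd}{N}$, where d = X - A.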
Continuous series
In a continuous frequency distribution, the exact value of each individual item
is unknown. It is therefore assumed that
the frequency of each class interval is concentrated at its centre, so the mid point of each class
interval has to be found. In a continuous frequency distribution, the mean can be calculated
by any of the following methods.
a. Direct method
b. Short cut method
c. Step deviation method
a. Direct Method
Steps:
1. Find out the mid value of each group or class. The mid value is obtained by adding the
lower and upper limit of the class and dividing the total by two. (symbol = m)
2. Multiply the mid value of each class by the frequency of the class. In other words m will
be multiplied by f.
3. Add up all the products to get $\sum fm$
4. Divide $\sum fm$ by N: $\bar{X} = \frac{\sum fm}{N}$
Example:
From the following find out the mean profit
Profit/Shop:    100-200   200-300   300-400   400-500   500-600   600-700   700-800
No. of shops:   10        18        20        26        30        28        18
Solution
Profit     Mid point (m)   No. of shops (f)   fm
100-200    150             10                 1500
200-300    250             18                 4500
300-400    350             20                 7000
400-500    450             26                 11700
500-600    550             30                 16500
600-700    650             28                 18200
700-800    750             18                 13500
                           $\sum f = 150$     $\sum fm = 72900$
\[ \bar{X} = \frac{\sum fm}{N} = \frac{72900}{150} = 486 \]
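b. Short cut method
\[ \bar{X} = A + \frac{\sum fd}{N} \]
where A is the assumed mean and d = m - A.
Example: (solving the last example)
Solution: Calculation of Mean
Profit     m     d = m - 450   f     fd
100-200    150   -300          10    -3000
200-300    250   -200          18    -3600
300-400    350   -100          20    -2000
400-500    450   0             26    0
500-600    550   100           30    3000
600-700    650   200           28    5600
700-800    750   300           18    5400
                               $\sum f = 150$   $\sum fd = 5400$
\[ \bar{X} = A + \frac{\sum fd}{N} = 450 + \frac{5400}{150} = 450 + 36 = 486 \]
c. Step deviation method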
The short cut method discussed above is further simplified, and the calculations are reduced to a
great extent, by adopting the step deviation method.
Steps:
1. Find out the mid value of each class or group (m)
2. Assume any one of the mid values as an average (A)
3. Find out the deviations of the mid value of each class from the assumed mean (d)
4. Divide the deviations by a common factor (d' = d/c)
5. Multiply the d' of each class by its frequency (fd')
6. Add up the products ($\sum fd'$)
7. Then apply the formula
\[ \bar{X} = A + \frac{\sum fd'}{N} \times c \]
where c = common factor
Example:
Calculate mean for the last problem
Solution
Profit     m     f     d = m - 450   d' = d/100   fd'
100-200    150   10    -300          -3           -30
200-300    250   18    -200          -2           -36
300-400    350   20    -100          -1           -20
400-500    450   26    0             0            0
500-600    550   30    100           1            30
600-700    650   28    200           2            56
700-800    750   18    300           3            54
                 $\sum f = 150$                   $\sum fd' = 54$
\[ \bar{X} = A + \frac{\sum fd'}{N} \times c = 450 + \frac{54}{150} \times 100 = 450 + (0.36 \times 100) = 486 \]
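The direct, short cut and step deviation methods are three routes to the same number. The sketch below applies all three to the profit data; the class limits and frequencies come from the example, while the variable names are illustrative.

```python
# Class limits and frequencies from the profit example
classes = [(100, 200), (200, 300), (300, 400), (400, 500),
           (500, 600), (600, 700), (700, 800)]
f = [10, 18, 20, 26, 30, 28, 18]

m = [(lo + hi) / 2 for lo, hi in classes]    # mid point of each class
N = sum(f)

# a. Direct method: sum(f*m) / N
direct = sum(fi * mi for fi, mi in zip(f, m)) / N

# b. Short cut method: A + sum(f*d) / N, with d = m - A
A = 450
shortcut = A + sum(fi * (mi - A) for fi, mi in zip(f, m)) / N

# c. Step deviation method: A + (sum(f*d') / N) * c, with d' = (m - A) / c
c = 100
step = A + sum(fi * (mi - A) / c for fi, mi in zip(f, m)) / N * c

print(direct, shortcut, step)
```

The short cut and step deviation methods only shrink the numbers being multiplied; they do not change the result.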
The mean is essentially a model of your data set: a single value that stands in for all the observations. You will
notice, however, that the mean is not often one of the actual values that you have observed in
your data set. However, one of its important properties is that it minimises error in the prediction
of any one value in your data set. That is, it is the value for which the sum of squared deviations
from all the values in the data set is smallest.
An important property of the mean is that it includes every value in your data set as part of the
calculation. In addition, the mean is the only measure of central tendency where the sum of the
deviations of each value from the mean is always zero.
We complete our discussion on arithmetic mean by listing the merits and demerits of it.
Merits:
It is rigidly defined.
It is easy to calculate and simple to follow.
(b) Median
The median is also a frequently used measure of central tendency. The median is the midpoint of
a distribution: the same number of data points are above the median as below it. The median is
the middle score for a set of data that has been arranged in order of magnitude.
The median is determined by sorting the data set from lowest to highest values and taking the
data point in the middle of the sequence. There is an equal number of points above and below the
median. For example, in the data 7,8,9,10,11, the median is 9; there are two data points greater
than this value and two data points less than this value. Thus to find the median, we arrange the
observations in order from smallest to largest value. If there is an odd number of observations,
the median is the middle value.
If there is an even number of observations, the median is the average of the two middle values.
Thus, the median of the numbers 2, 4, 7, 12 is (4+7)/2 = 5.5.
In certain situations the mean and median of the distribution will be the same, and in some
situations it will be different. For example, in the data 1,2,3,4,5 the median is 3; there are two
data points greater than this value and two data points less than this value. In this case, the
median is equal to the mean. But consider the data 1,2,3,4,10. In this dataset, the median still is
three, but the mean is equal to 4.
The median can be determined for ordinal data as well as interval and ratio data. Unlike the
mean, the median is not influenced by outliers at the extremes of the data set. For this reason, the
median often is used when there are a few extreme values that could greatly influence the mean
and distort what might be considered typical. For data which is very skewed, the median often is
used instead of the mean.
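The small data sets used in this discussion can be checked directly. This is a minimal sketch using the `median` and `mean` functions of Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

odd_set = [7, 8, 9, 10, 11]      # odd number of observations: middle value
even_set = [2, 4, 7, 12]         # even number: average of the two middle values
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data with one extreme value

print(median(odd_set))                           # 9
print(median(even_set))                          # 5.5
print(median(symmetric), mean(symmetric))        # 3 3
print(median(with_outlier), mean(with_outlier))  # 3 4
```

The last line shows the point made above: the extreme value 10 pulls the mean up to 4 but leaves the median at 3.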
Calculation of Median : Discrete series
Steps:
Arrange the data in ascending or descending order
Find cumulative frequencies
Apply the formula
Median = Size of $\left(\frac{N+1}{2}\right)$th item
Example: Calculate median from the following
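Size of shoes:   5    5.5   6    6.5   7    7.5   8
Frequency:       10   16    28   15    30   40    34
Solution
Size    f     Cumulative f (c.f.)
5       10    10
5.5     16    26
6       28    54
6.5     15    69
7       30    99
7.5     40    139
8       34    173
N = 173
Median = Size of $\left(\frac{N+1}{2}\right)$th item = Size of $\left(\frac{173+1}{2}\right)$th item
= 87th item = 7, since the 87th item falls in the cumulative frequency 99, which corresponds to size 7.
Median = 7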
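The lookup of the median item in the cumulative frequency column can be sketched as below, using the shoe size data. The use of `bisect_left` to find the first cumulative frequency that reaches the median position is one possible way to code the lookup, not part of the textbook method itself.

```python
from bisect import bisect_left
from itertools import accumulate

sizes = [5, 5.5, 6, 6.5, 7, 7.5, 8]      # already in ascending order
f = [10, 16, 28, 15, 30, 40, 34]

cum = list(accumulate(f))                # cumulative frequencies
N = cum[-1]
position = (N + 1) / 2                   # the ((N + 1) / 2)th item

# First size whose cumulative frequency reaches the median position
median_size = sizes[bisect_left(cum, position)]

print(N, position, median_size)
```

When N is even the position is fractional, and the median lies between two neighbouring items; this sketch then returns the size containing the upper of the two.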
Calculation of median Continuous frequency distribution
Steps:
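Find out the median item by using N/2
Find out the class in which the median lies
Apply the formula
\[ \text{Median} = L + \frac{h}{f}\left(\frac{N}{2} - \text{c.f.}\right) \]
Where L = lower limit of the median class
h = class interval of the median class
f = frequency of the median class
N = $\sum f$, the total frequency
c.f. = cumulative frequency of the class preceding the median class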
Age in years   Cumulative frequency
Below 10       2
Below 20       5
Below 30       9
Below 40       12
Below 50       14
Below 60       15
Below 70       15.5
70 and over    15.6
Solution:
First we have to convert the distribution to a continuous frequency distribution as in the
following table and then compute median.
Age in years   Frequency (f)      Cumulative frequency (c.f.)
0-10           2                  2
10-20          5 - 2 = 3          5
20-30          9 - 5 = 4          9
30-40          12 - 9 = 3         12
40-50          14 - 12 = 2        14
50-60          15 - 14 = 1        15
60-70          15.5 - 15 = 0.5    15.5
70 and above   15.6 - 15.5 = 0.1  15.6
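Median item = $\frac{N}{2} = \frac{15.6}{2}$ = 7.8
The cumulative frequency (c.f.) just greater than 7.8 is 9. Thus the corresponding class 20-30 is
the median class.
L = 20, h = 10, f = 4, N = 15.6, c.f. = 5
Use the formula
\[ \text{Median} = L + \frac{h}{f}\left(\frac{N}{2} - \text{c.f.}\right) = 20 + \frac{10}{4}(7.8 - 5) = 20 + \frac{5}{2} \times 2.8 = 20 + 5 \times 1.4 = 27 \]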
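The interpolation formula for the median of a continuous distribution can be written as a small function. This is a sketch under stated assumptions: `grouped_median` is a hypothetical helper name, the classes are given as (lower, upper) pairs, and the median class is taken as the first class whose cumulative frequency reaches N/2.

```python
from itertools import accumulate

def grouped_median(classes, freqs):
    """Median = L + (h / f) * (N / 2 - c.f.) for a continuous distribution.

    classes: list of (lower, upper) class limits in ascending order
    freqs:   frequency of each class
    """
    N = sum(freqs)
    cum = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency reaches N / 2
    k = next(i for i, cf in enumerate(cum) if cf >= N / 2)
    L, upper = classes[k]
    h = upper - L                        # class interval of the median class
    cf_before = cum[k - 1] if k > 0 else 0
    return L + (h / freqs[k]) * (N / 2 - cf_before)

# Age distribution from the example; the open last class has no upper limit,
# which is harmless here because the median does not fall in it.
age_classes = [(0, 10), (10, 20), (20, 30), (30, 40),
               (40, 50), (50, 60), (60, 70), (70, None)]
age_freqs = [2, 3, 4, 3, 2, 1, 0.5, 0.1]

print(round(grouped_median(age_classes, age_freqs), 2))
```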
Weight (kg):       93-97   98-102   103-107   108-112   113-117   118-122   123-127   128-132
No. of students:   3       5        12        17        14        6         3         1
Solution: Since the formula for mode requires the distribution to be continuous
with exclusive type classes, we first convert the classes into class boundaries.
Weight     Class boundaries   Mid value (X)   No. of students (f)   d = (X - 110)/5   fd     Less than c.f.
93-97      92.5-97.5          95              3                     -3                -9     3
98-102     97.5-102.5         100             5                     -2                -10    8
103-107    102.5-107.5        105             12                    -1                -12    20
108-112    107.5-112.5        110             17                    0                 0      37
113-117    112.5-117.5        115             14                    1                 14     51
118-122    117.5-122.5        120             6                     2                 12     57
123-127    122.5-127.5        125             3                     3                 9      60
128-132    127.5-132.5        130             1                     4                 4      61
                                              $N = \sum f = 61$                       $\sum fd = 8$
Mean
\[ \text{Mean} = A + \frac{\sum fd}{N} \times c = 110 + \frac{8}{61} \times 5 = 110 + \frac{5 \times 8}{61} = 110.66 \]
Mean = 110.66 kg.
Mode
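Here the maximum frequency is 17. The corresponding class 107.5-112.5 is the modal class.
Using the formula of mode
\[ \text{Mode} = L + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} \]
where L is the lower boundary of the modal class, h is its width, $f_1$ is its frequency, and $f_0$ and
$f_2$ are the frequencies of the classes preceding and following it. We get
\[ \text{Mode} = 107.5 + \frac{5(17 - 12)}{2(17) - 12 - 14} = 107.5 + \frac{25}{8} = 107.5 + 3.125 = 110.625 \]
Median
Use the formula
\[ \text{Median} = L + \frac{h}{f}\left(\frac{N}{2} - \text{c.f.}\right) \]
Here N/2 = 61/2 = 30.5
The cumulative frequency (c.f.) just greater than 30.5 is 37. So the corresponding class 107.5-112.5 is the median class.
Substituting values in the median formula
\[ \text{Median} = 107.5 + \frac{5}{17}\left(\frac{61}{2} - 20\right) = 107.5 + \frac{5}{17}(30.5 - 20) = 107.5 + \frac{5 \times 10.5}{17} = 107.5 + 3.09 = 110.59 \]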
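The mean, mode and median of the weight distribution can be verified together. The sketch below uses the class boundaries and frequencies from the solution table; the symbols `f0`, `f1`, `f2` for the frequencies around the modal class are a naming assumption, and the mode formula as coded requires the modal class not to be the first or last class.

```python
from itertools import accumulate

# Class boundaries 92.5-97.5, 97.5-102.5, ..., 127.5-132.5 and frequencies
bounds = [(92.5 + 5 * i, 97.5 + 5 * i) for i in range(8)]
f = [3, 5, 12, 17, 14, 6, 3, 1]
N = sum(f)

# Mean by the step deviation method, with A = 110 and common factor c = 5
A, c = 110, 5
mids = [(lo + hi) / 2 for lo, hi in bounds]
mean_w = A + sum(fi * (x - A) / c for fi, x in zip(f, mids)) / N * c

# Mode: L + h * (f1 - f0) / (2*f1 - f0 - f2), where f1 is the modal frequency
k = f.index(max(f))                      # modal class (assumed not at either end)
L, U = bounds[k]
f0, f1, f2 = f[k - 1], f[k], f[k + 1]
mode_w = L + (U - L) * (f1 - f0) / (2 * f1 - f0 - f2)

# Median: L + (h / f) * (N/2 - c.f. of the preceding class)
cum = list(accumulate(f))
j = next(i for i, cf in enumerate(cum) if cf >= N / 2)
Lm, Um = bounds[j]
median_w = Lm + (Um - Lm) / f[j] * (N / 2 - cum[j - 1])

print(round(mean_w, 2), round(mode_w, 3), round(median_w, 2))
```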
Measurement Scale
Best Measure
Nominal
(Categorical)
Mode
Ordinal
Median
Interval
Ratio
comprising all values greater than the median value and the other part comprising all the values
smaller than the median value.
Merits of median
(1) Simplicity:- It is a very simple measure of the central tendency of the series. In the case of
simple statistical series, just a glance at the data is enough to locate the median value.
(2) Free from the effect of extreme values: - Unlike the arithmetic mean, the median value is not
distorted by the extreme values of the series.
(3) Certainty: - Certainty is another merit of the median. The median value is always a certain
specific value in the series.
(4) Real value: - Median value is real value and is a better representative value of the series
compared to arithmetic mean average, the value of which may not exist in the series at all.
(5) Graphic presentation: - Besides algebraic approach, the median value can be estimated also
through the graphic presentation of data.
(6) Possible even when data is incomplete: - Median can be estimated even in the case of certain
incomplete series. It is enough if one knows the number of items and the middle item of the
series.
Demerits of median:
Following are the various demerits of median:
(1) Lack of representative character: - Median fails to be a representative measure in case of such
series the different values of which are wide apart from each other. Also, median is of limited
representative character as it is not based on all the items in the series.
(2) Unrealistic:- When the median is located somewhere between the two middle values, it
remains only an approximate measure, not a precise value.
(3) Lack of algebraic treatment: - Arithmetic mean is capable of further algebraic treatment, but
median is not. For example, multiplying the median with the number of items in the series will
not give us the sum total of the values of the series.
However, the median is quite a simple method of finding the average of a series. It is commonly
used for series relating to qualitative observations, such as the health
of students.
Mode: The value of the variable which occurs most frequently in a distribution is called the
mode.
Merits of mode:
Following are the various merits of mode:
(1) Simple and popular: - Mode is a very simple measure of central tendency. Sometimes, just a glance at
the series is enough to locate the modal value. Because of its simplicity, it is a very popular
measure of the central tendency.
(2) Less effect of marginal values: - Compared to the mean, the mode is less affected by marginal
values in the series. Mode is determined only by the value with the highest frequency.
(3) Graphic presentation:- Mode can be located graphically, with the help of histogram.
(4) Best representative: - Mode is that value which occurs most frequently in the series.
Accordingly, mode is the best representative value of the series.
(5) No need of knowing all the items or frequencies: - The calculation of mode does not require
knowledge of all the items and frequencies of a distribution. In simple series, it is enough if one
knows the items with highest frequencies in the distribution.
Demerits of mode:
Following are the various demerits of mode:
(1) Uncertain and vague: - Mode is an uncertain and vague measure of the central tendency.
(2) Not capable of algebraic treatment: - Unlike mean, mode is not capable of further algebraic
treatment.
(3) Difficult: - When the frequencies of all items are identical, it is difficult to identify the modal
value.
(4) Complex procedure of grouping:- Calculation of mode involves a cumbersome procedure of
grouping the data. If the extent of grouping changes, there will be a change in the modal value.
(5) Ignores extreme marginal frequencies:- It ignores extreme marginal frequencies. To that
extent the modal value is not a representative value of all the items in a series.
Besides, one can question the representative character of the modal value as its calculation does
not involve all items of the series.
Exercises
1. Find the measures of central tendency for the data set 3, 7, 9, 4, 5, 4, 6, 7, and 9.
Mean = 6, median = 6 and the modes are 4, 7 and 9. Note that this data set is multimodal, since three values share the highest frequency.
2. Four friends take an IQ test. Their scores are 96, 100, 106, 114. Which of the following
statements is true?
I. The mean is 103.
II. The mean is 104.
III. The median is 100.
IV. The median is 106.
(A) I only
(B) II only
(C) III only
(D) IV only
(E) None is true
The correct answer is (B). The mean score is computed from the equation:
Mean score = x / n = (96 + 100 + 106 + 114) / 4 = 104
Since there are an even number of scores (4 scores), the median is the average of the two middle
scores. Thus, the median is (100 + 106) / 2 = 103.
3. The owner of a shoe shop recorded the sizes of the feet of all the customers who bought shoes
in his shop in one morning. These sizes are listed below:
8 7 4 5 9 13 10 8 8 7 6 5 3 11 10 8 5 4 8 6
What is the mean of these values: 7.25
What is the median of these values: 7.5
What is the mode of these values: 8.
4. Eight people work in a shop. Their hourly wage rates of pay are:
Worker
Wage
14
Rs.
Work out the mean, median and mode for the values above.
Mean = 5.75, Median = 4.50, Mode = 4.00.
Using the above findings, which measure would the owner of the shop use to argue that the staff
are paid well? He will use the mean, because the mean is the highest of the three values.
Using the above findings, which measure would the staff in the shop use to argue that they are
badly paid? The staff will use the mode, as it is the lowest of the three measures of
central tendency.
5. The table below gives the number of accidents each year at a particular road junction:
Year:        1991   1992   1993   1994   1995   1996   1997   1998
Accidents:   4      5      4      2      10     5      3      5
Work out the mean, median and mode for the values above.
Mean =4.75
Median =4.5
Mode =5
Using the above measures, a road safety group want to get the council to make this junction
safer.
Which measure will they use to argue for this? They will use mode as it is the figure which will
help them to justify their argument that the junction has a large number of accidents.
Using the same data the council do not want to spend money on the road junction. Which
measure will they use to argue that safety work is not necessary? The council will use the median, as
it is the lowest of the three figures and helps them argue that the junction has fewer accidents.
6. Mr Sasi grows two different types of tomato plant in his greenhouse.
One week he keeps a record of the number of tomatoes he picks from each type of plant.
Day
Mon Tue Wed
Type A 5
5
4
Type B 3
4
3
(a) Calculate the mean, median and mode for the Type A plants.
Mean =3, Median = 4, Mode = 5.
(b) Calculate the mean, median and mode for the Type B plants.
Mean =5, Median = 4, Mode = 3.
(c) Which measure would you use to argue that there is no difference between the types?
We will use median as it is the same for both plants.
(d) Which measure would you use to argue that Type A is the best plant?
We will use mode as mode for type A is higher than B. Note that for type A mean is lower than
type B and median is the same for both types.
(e) Which measure would you use to argue that Type B is the best plant?
We will use the mean, as the mean for type B is higher than that for type A.
Geometric Mean:
The geometric mean is a type of mean or average, which indicates the central tendency or typical
value of a set of numbers. It is similar to the arithmetic mean, which is what most people think of
with the word "average", except that the numbers are multiplied and then the n th root (where n is
the count of numbers in the set) of the resulting product is taken.
Geometric mean is defined as the nth root of the product of the n items of a series. If there are two
items, we take the square root; if there are three items, we take the cube root; and so on.
Symbolically;
\[ GM = \sqrt[n]{(X_1)(X_2)\cdots(X_n)} \]
where X1, X2, ..., Xn refer to the various items of the series.
For instance, the geometric mean of two numbers, say 2 and 8, is just the square root of their
product; that is $\sqrt{2 \times 8} = 4$. As another example, the geometric mean of the three numbers 1, 1/2 and 1/4 is
the cube root of their product (1/8), which is 1/2; that is
\[ \sqrt[3]{1 \times \tfrac{1}{2} \times \tfrac{1}{4}} = \sqrt[3]{\tfrac{1}{8}} = \tfrac{1}{2} \]
When the number of items is three or more, the task of multiplying the numbers and of
extracting the root becomes excessively difficult. To simplify calculations, logarithms are used.
GM is then calculated as follows.
\[ \log GM = \frac{\log X_1 + \log X_2 + \dots + \log X_n}{N} = \frac{\sum \log X}{N} \]
\[ GM = \text{Antilog}\left(\frac{\sum \log X}{N}\right) \]
In discrete series
\[ GM = \text{Antilog}\left(\frac{\sum f \log X}{N}\right) \]
In continuous series
\[ GM = \text{Antilog}\left(\frac{\sum f \log m}{N}\right) \]
where f = frequency and m = mid point
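The logarithmic route to the geometric mean can be sketched in code. The function name `geometric_mean_log` is hypothetical; it mirrors the formula GM = Antilog(sum of log X / N) using base-10 logarithms, with the antilog taken as a power of 10.

```python
import math

def geometric_mean_log(values):
    """GM = antilog(sum(log X) / N), using base-10 logarithms.

    All values must be positive, since the logarithm of zero or of a
    negative number is undefined.
    """
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log                # antilog of the mean logarithm

print(round(geometric_mean_log([2, 8]), 6))          # square root of 16
print(round(geometric_mean_log([1, 0.5, 0.25]), 6))  # cube root of 1/8
```

The restriction to positive values in the docstring is the same limitation listed for the geometric mean: it cannot be computed when a value is zero or negative.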
Merits of G.M
1. It is based on each and every item of the series.
2. It is rigidly defined.
3. It is useful in averaging ratios and percentages and in determining rates of increase and
decrease.
4. It is capable of algebraic manipulation.
Limitations
1. It is difficult to understand
2. It is difficult to compute and to interpret
3. It cannot be computed when there are both negative and positive values in a series, or when one or
more of the values is zero.
4. G.M. has very limited applications.
Harmonic Mean
Harmonic mean is a kind of average. It is the mean of a set of positive values. It is calculated
by dividing the number of observations by the sum of the reciprocals of the numbers in the series.
Hence, the Harmonic Mean of a set of n numbers a1, a2, a3, ... an is given as
\[ HM = \frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \frac{1}{a_3} + \dots + \frac{1}{a_n}} \]
\[ HM = 2 \times \frac{12}{7} = \frac{24}{7} = 3.43 \]
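The harmonic mean formula can be sketched as a short function. The data in the usage line (the two numbers 3 and 4) are an illustrative assumption, chosen only because their reciprocals sum to 7/12 and so reproduce the value 24/7 = 3.43 seen above; the original data of the worked example are not recoverable from the text.

```python
def harmonic_mean(values):
    """Number of items divided by the sum of their reciprocals.

    Values must be positive; a zero would make a reciprocal undefined.
    """
    return len(values) / sum(1 / x for x in values)

# Illustrative data (an assumption, not taken from the text): 3 and 4
hm = harmonic_mean([3, 4])
print(round(hm, 2))
```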
2. It is difficult to compute
3. It gives the largest weight to the smallest item.
The study of dispersion is very important in statistical data. If in a certain factory there is
consistency in the wages of workers, the workers will be satisfied. But if some workers have high
wages and some have low wages, there will be unrest among the low paid workers and they
might go on strike and arrange demonstrations. If in a certain country some people are very
poor and some are very rich, we say there is economic disparity. It means that dispersion is
large. The idea of dispersion is important in the study of wages of workers, prices of
commodities, standard of living of different people, distribution of wealth, distribution of land
among farmers and various other fields of life. Some brief definitions of dispersion are:
The degree to which numerical data tend to spread about an average value is called the
dispersion or variation of the data.
Dispersion or variation may be defined as a statistic signifying the extent of the scatter of
items around a measure of central tendency.
Dispersion or variation is the measurement of the scatter of the size of the items of a series about
the average.
There are five frequently used measures of variability: the Range, Interquartile range or quartile
deviation, Mean deviation or average deviation, Standard deviation and Lorenz curve.
7.1 Range
The range is the simplest measure of variability to calculate, and one you have
probably encountered many times in your life. The range is simply the highest
score minus the lowest score.
Range: R = maximum minimum
Lets take a few examples. What is the range of the following group of numbers: 10, 2, 5, 6, 7, 3,
4. Well, the highest number is 10, and the lowest number is 2, so 10 - 2 = 8. The range is 8.
Lets take another example. Heres a dataset with 10 numbers: 99, 45, 23, 67, 45, 91, 82, 78, 62,
51. What is the range. The highest number is 99 and the lowest number is 23, so 99 - 23 equals
76; the range is 76.
Example 2: Ms. Kesavan listed 9 integers on the blackboard. What is the range of these integers?
14, -12, 7, 0, -5, -8, 17, -11, 19
Ordering the data from least to greatest, we get:
-12, -11, -8, -5, 0, 7, 14, 17, 19
Range: R = highest - lowest = 19 - (-12) = 19 + 12 = 31
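The three range calculations above can be checked with a short Python sketch. The function name `value_range` is chosen here only for illustration.

```python
def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

first = [10, 2, 5, 6, 7, 3, 4]
second = [99, 45, 23, 67, 45, 91, 82, 78, 62, 51]
integers = [14, -12, 7, 0, -5, -8, 17, -11, 19]

for data in (first, second, integers):
    print(value_range(data))  # prints 8, then 76, then 31
```

Note that the data need not be sorted first; only the largest and smallest values matter.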
Now the lower quartile (Q1) is the 25th percentile and the upper quartile (Q3) is the 75th
percentile. It is interesting to note that the 50th percentile is the middle quartile (Q2), which is in
fact what you have studied under the title Median. Thus symbolically
Inter quartile range = Q3 - Q1
If we divide (Q3 - Q1) by 2 we get what is known as the semi-interquartile range or quartile deviation,
i.e.
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$
Another look at the same issue is given here to make the concept more clear for the student.
In the same way that the median divides a dataset into two halves, it can be further divided into
quarters by identifying the upper and lower quartiles. The lower quartile is found one quarter of
the way along a dataset when the values have been arranged in order of magnitude; the upper
quartile is found three quarters along the dataset. Therefore, the upper quartile lies half way
between the median and the highest value in the dataset whilst the lower quartile lies halfway
between the median and the lowest value in the dataset. The inter-quartile range is found by
subtracting the lower quartile from the upper quartile.
For example, the examination marks for 20 students following a particular module are arranged
in order of magnitude.
median lies at the mid-point between the two central values (10th and 11th)
= half-way between 60 and 62 = 61
The lower quartile lies at the mid-point between the 5th and 6th values
= half-way between 52 and 53 = 52.5
The upper quartile lies at the mid-point between the 15th and 16th values
= half-way between 70 and 71 = 70.5
The inter-quartile range for this dataset is therefore 70.5 - 52.5 = 18 whereas the range is: 80 - 43
= 37.
The inter-quartile range provides a clearer picture of the overall dataset by removing/ignoring the
outlying values.
Like the range however, the inter-quartile range is a measure of dispersion that is based upon
only two values from the dataset. Statistically, the standard deviation is a more powerful measure
of dispersion because it takes into account every value in the dataset. The standard deviation is
explored in the next section.
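The "median of each half" idea described above can be sketched in Python as follows. The list `marks` is hypothetical: the original marks are not reproduced here, so the values were chosen only to agree with the figures quoted in the text (5th and 6th values 52 and 53, 10th and 11th values 60 and 62, 15th and 16th values 70 and 71, lowest 43, highest 80). The function name is also an assumption made for illustration.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) by taking the median of each half of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 0:
        lower, upper = ordered[:half], ordered[half:]
    else:
        # One common convention for odd n: the median is counted in both halves.
        lower, upper = ordered[:half + 1], ordered[half:]
    return median(lower), median(ordered), median(upper)

# Hypothetical marks of 20 students, consistent with the values quoted in the text.
marks = [43, 47, 49, 51, 52, 53, 55, 57, 58, 60,
         62, 63, 65, 67, 70, 71, 74, 76, 78, 80]

q1, q2, q3 = quartiles_by_halves(marks)
print(q1, q2, q3)                # 52.5 61.0 70.5
print(q3 - q1)                   # inter-quartile range: 18.0
print(max(marks) - min(marks))   # range: 37
```

Other textbooks and software packages use slightly different quartile conventions, so small differences from these values are possible with other tools.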
Example 1
The wheat production (in Kg) of 20 acres is given as: 1120, 1240, 1320, 1040, 1080, 1200, 1440,
1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, and 1885. Find the
quartile deviation and coefficient of quartile deviation.
After arranging the observations in ascending order, we get
1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720, 1730, 1750,
1755, 1785, 1880, 1885, 1960.
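Here n = 20.
$$Q_1 = \text{value of the } \frac{n+1}{4}\text{th item} = \frac{20+1}{4}\text{th item} = 5.25\text{th item}$$
$$Q_1 = 5\text{th item} + 0.25\,(6\text{th item} - 5\text{th item}) = 1240 + 0.25\,(1320 - 1240) = 1240 + 20 = 1260$$
$$Q_3 = \text{value of the } \frac{3(n+1)}{4}\text{th item} = \frac{3(20+1)}{4}\text{th item} = 15.75\text{th item}$$
$$Q_3 = 15\text{th item} + 0.75\,(16\text{th item} - 15\text{th item}) = 1750 + 0.75\,(1755 - 1750) = 1750 + 3.75 = 1753.75$$
$$\text{Quartile deviation (Q.D.)} = \frac{Q_3 - Q_1}{2} = \frac{1753.75 - 1260}{2} = \frac{493.75}{2} = 246.88$$
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{1753.75 - 1260}{1753.75 + 1260} = 0.164$$
Example 2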
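Wages       No. of labourers
30 - 32     12
32 - 34     18
34 - 36     16
36 - 38     14
38 - 40     12
40 - 42      8
42 - 44      6
Solution
Range = L - S = 44 - 30 = 14
Calculation of quartiles:
Wages (X)   f     c.f.
30 - 32     12    12
32 - 34     18    30
34 - 36     16    46
36 - 38     14    60
38 - 40     12    72
40 - 42      8    80
42 - 44      6    86
$$Q_1 = \text{size of the } \frac{N}{4}\text{th item} = \frac{86}{4} = 21.5\text{th item}$$
which lies in the group 32 - 34.
$$Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times i = 32 + \frac{21.5 - 12}{18} \times 2 = 32 + 1.06 = 33.06$$
$$Q_3 = \text{size of the } \frac{3N}{4}\text{th item} = 3 \times 21.5 = 64.5\text{th item}$$
which lies in the group 38 - 40.
$$Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times i = 38 + \frac{64.5 - 60}{12} \times 2 = 38 + 0.75 = 38.75$$
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{38.75 - 33.06}{2} = \frac{5.69}{2} = 2.85$$
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{38.75 - 33.06}{38.75 + 33.06} = \frac{5.69}{71.81} = 0.08$$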
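The positional rule used for the wheat production data (locate the (n+1)/4th and 3(n+1)/4th items and interpolate between neighbouring values) can be sketched in Python. This is only an illustration under that rule; the helper name `value_at` is an assumption, and other quartile conventions would give slightly different answers.

```python
def value_at(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    # Linear interpolation between the two neighbouring items.
    return lower + fraction * (upper - lower)

wheat = sorted([1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
                1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885])
n = len(wheat)

q1 = value_at(wheat, (n + 1) / 4)        # 5.25th item
q3 = value_at(wheat, 3 * (n + 1) / 4)    # 15.75th item
quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)

print(q1, q3)                            # 1260.0 1753.75
print(round(quartile_deviation, 2))      # 246.88
print(round(coefficient, 3))             # 0.164
```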
Merits of Quartile Deviation
1. It is simple to understand and easy to calculate.
2. It is not influenced by extreme values.
3. It can be found out with open end distribution.
Demerits
1. It ignores the first 25% of the items and the last 25% of the items.
2. It is a positional average : hence not amenable to further mathematical treatment.
3. The value is affected by sampling fluctuations.
7.3 Mean Deviation or Average Deviation
Average deviation (mean deviation) is the average amount of variation
(scatter) of the items in a distribution from either the mean or the median or
the mode, ignoring the signs of these deviations. In other words, the mean
deviation or average deviation is the arithmetic mean of the absolute
deviations.
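Example 1: Find the Mean Deviation of 3, 6, 6, 7, 8, 11, 15, 16
Step 1: Find the mean:
$$\bar{x} = \frac{3 + 6 + 6 + 7 + 8 + 11 + 15 + 16}{8} = \frac{72}{8} = 9$$
Step 2: Find the distance of each value from that mean:
Value    Distance from 9
3        6
6        3
6        3
7        2
8        1
11       2
15       6
16       7
Step 3: Find the mean of those distances:
$$\text{Mean Deviation} = \frac{6 + 3 + 3 + 2 + 1 + 2 + 6 + 7}{8} = \frac{30}{8} = 3.75$$
It tells us how far, on average, all values are from the middle.
In that example the values are, on average, 3.75 away from the middle.
The formula is:
$$\text{Mean Deviation} = \frac{\sum |x - \bar{x}|}{N}$$
where $\bar{x}$ is the mean, $N$ is the number of values and $|x - \bar{x}|$ is the absolute deviation of each value from the mean.
Let us redo Example 1 using the formula: Find the Mean Deviation of 3, 6, 6, 7,
8, 11, 15, 16
Step 1: Find the mean:
$$\bar{x} = \frac{3 + 6 + 6 + 7 + 8 + 11 + 15 + 16}{8} = \frac{72}{8} = 9$$
x      $x - \bar{x}$    $|x - \bar{x}|$
3      -6     6
6      -3     3
6      -3     3
7      -2     2
8      -1     1
11      2     2
15      6     6
16      7     7
$$\sum |x - \bar{x}| = 30$$
$$\text{Mean Deviation} = \frac{\sum |x - \bar{x}|}{N} = \frac{30}{8} = 3.75$$
Example 2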
Calculate the mean deviation using mean for the following data
Class        2-4    4-6    6-8    8-10
Frequency    3      4      2      1
Solution
Class    Mid value (X)    Frequency (f)    d = X - 5    fd    |X - 5.2|    f|X - 5.2|
2-4      3                3                -2           -6    2.2          6.6
4-6      5                4                 0            0    0.2          0.8
6-8      7                2                 2            4    1.8          3.6
8-10     9                1                 4            4    3.8          3.8
Total                     N = 10                         2                 14.8
$$\bar{X} = 5 + \frac{\sum fd}{N} = 5 + \frac{2}{10} = 5.2$$
$$\text{Mean Deviation} = \frac{\sum f\,|X - \bar{X}|}{N} = \frac{14.8}{10} = 1.48$$
Example 3
Calculate the mean deviation from the mean and from the median for the following data.
Class interval    0-10    10-20    20-30    30-40    40-50    50-60    60-70
Frequency (f)     8       12       10       8        3        2        7
Solution
Let us first make the necessary computations.
Class interval    Mid value (X)    Frequency (f)    Less than c.f.    fX
0-10              5                8                8                 40
10-20             15               12               20                180
20-30             25               10               30                250
30-40             35               8                38                280
40-50             45               3                41                135
50-60             55               2                43                110
60-70             65               7                50                455
Total                              N = 50                             1450
$$\text{Mean } (\bar{X}) = \frac{\sum fX}{N} = \frac{1450}{50} = 29$$
(N/2) = (50/2) = 25. The c.f. just greater than 25 is 30 in the table above. So the
corresponding class 20-30 is the median class.
Here l = lower limit of the median class = 20, f = frequency of the median class =
10, h = class interval of the median class = 10, c = cumulative frequency of the
class preceding the median class = 20.
Use the formula of median to substitute values.
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 20 + \frac{10}{10}(25 - 20) = 20 + 5 = 25$$
The absolute deviations from the mean (29) and from the median (25) are:
Mid value (X)    f     |X - 29|    f|X - 29|    |X - 25|    f|X - 25|
5                8     24          192          20          160
15               12    14          168          10          120
25               10    4           40           0           0
35               8     6           48           10          80
45               3     16          48           20          60
55               2     26          52           30          60
65               7     36          252          40          280
Total            50                800                      760
$$\text{M.D. from mean} = \frac{\sum f\,|X - \bar{X}|}{N} = \frac{800}{50} = 16$$
$$\text{M.D. from median} = \frac{\sum f\,|X - \text{Median}|}{N} = \frac{760}{50} = 15.2$$
Thus we have computed Mean Deviation from Mean and Median. Let us
compare the two results. MD from Mean is 16 and MD from median is 15.2.
So, M.D. from Median < M.D. from Mean. This illustrates that M.D. is least when
taken about median.
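The mean deviation calculations can be sketched in Python as below. The function names are chosen for illustration only. The first function handles raw data and can measure deviations about either the mean or the median; the second handles a frequency distribution using class mid values and measures deviations about the mean.

```python
from statistics import mean, median

def mean_deviation(values, about=mean):
    """Average absolute deviation of raw data about a chosen centre."""
    centre = about(values)
    return sum(abs(x - centre) for x in values) / len(values)

def grouped_mean_deviation(midpoints, freqs):
    """Return (mean, mean deviation about the mean) for grouped data."""
    n = sum(freqs)
    centre = sum(f * x for x, f in zip(midpoints, freqs)) / n
    deviation = sum(f * abs(x - centre) for x, f in zip(midpoints, freqs)) / n
    return centre, deviation

values = [3, 6, 6, 7, 8, 11, 15, 16]
print(mean_deviation(values))                 # about the mean: 3.75
print(mean_deviation(values, about=median))   # about the median: 3.5

midpoints = [5, 15, 25, 35, 45, 55, 65]
freqs = [8, 12, 10, 8, 3, 2, 7]
print(grouped_mean_deviation(midpoints, freqs))  # (29.0, 16.0)
```

For the raw data the deviation about the median (3.5) is smaller than the deviation about the mean (3.75), in line with the remark that M.D. is least when taken about the median.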
Merits of M.D.
v. It is rigidly defined.
vi. It is a better measure for comparison.
Demerits of M.D.
i. It is not amenable to further algebraic treatment.
Standard Deviation
The concept of standard deviation was introduced by Karl Pearson in 1893. It is the most
important measure of dispersion and is widely used. It is a measure of the dispersion of a set of
data from its mean. Loosely speaking, the standard deviation is a kind of average of the deviations
from the mean, and it can often help you find the story behind the data.
The standard deviation is a measure that summarises the amount by which every value within a
dataset varies from the mean. Effectively it indicates how tightly the values in the dataset are
bunched around the mean value. It is the most widely used measure of dispersion
since, unlike the range and inter-quartile range, it takes into account every value in the dataset.
When the values in a dataset are tightly bunched together the standard deviation is small.
When the values are spread apart the standard deviation will be relatively large.
In finance, standard deviation is defined as a statistical measure of dispersion in the value of an asset around
its mean. The standard deviation calculation tells you how spread out the numbers are in your
sample. Standard deviation is represented using the symbol $\sigma$ (sigma).
For example, if you want to measure the performance of a mutual fund, SD can be used. It gives an
idea of how volatile a fund's performance is likely to be. It is an important measure of a fund's
performance. It gives an idea of how much the return on the asset at a given time differs or
deviates from the average return. Generally, it gives an idea of a fund's volatility, i.e. a higher
dispersion (indicated by a higher standard deviation) shows that the value of the asset has
fluctuated over a wide range.
The formula for finding SD in sentence form is: it is the square root of the Variance. So now
you ask, what is the Variance? Let us see what variance is.
The Variance is defined as the average of the squared differences from the Mean.
We can calculate the variance by following these steps:
a. Work out the Mean (the simple average of the numbers).
b. Then for each number: subtract the Mean and square the result (the squared difference).
c. Then work out the average of those squared differences.
You may ask, why square the differences? If we just added up the differences from the mean,
the negatives would cancel the positives, as the example below shows. So we take the square.
Example
You have figures of the marks obtained by your five bench mates, which are as
follows: 600, 470, 170, 430 and 300. Find out the Mean, the Variance, and the
Standard Deviation.
Your first step is to find the Mean:
$$\bar{x} = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394$$
So the mean (average) mark is 394. Now work out the difference of each mark from the mean and square it:
x       $x - \bar{x}$    $(x - \bar{x})^2$
600      206      42436
470       76       5776
170     -224      50176
430       36       1296
300      -94       8836
Total      0     108520
To calculate the Variance, take each difference, square it, find the sum
(108520) and find the average:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{108520}{5} = 21704$$
So, the Variance is 21,704.
The Standard Deviation is just the square root of Variance, so:
$$\sigma = \sqrt{21704} = 147.32 \approx 147$$
Now we can see which marks are within one Standard Deviation (147) of the
Mean: those between 394 - 147 = 247 and 394 + 147 = 541, namely 470, 430 and 300.
Please note that there is a slight difference when we find variance from a
population and from a sample. In the above example we found out variance for data
collected from all your bench mates. So it may be considered as a population.
Suppose now you collect data only from some of your bench mates. Now it may
be considered as a sample. If you are finding variance for sample data, in the
formula to find variance, divide by N-1 instead of N.
For example, if we say that in our problem the marks are of some students in a
class, it should be treated as a sample. In that case
Variance (or to be precise Sample Variance) = 108,520 / 4 = 27,130. Note that
instead of N (i.e. 5) we divided by N-1 (5-1 = 4).
Standard Deviation (Sample Standard Deviation) = $\sqrt{27130} = 164.71 \approx 165$
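The difference between dividing by N and by N-1 can be checked with Python's standard `statistics` module, which provides both versions: `pvariance` and `pstdev` treat the data as a population, while `variance` and `stdev` treat it as a sample. The snippet below is only a quick check of the marks example.

```python
from statistics import pstdev, pvariance, stdev, variance

marks = [600, 470, 170, 430, 300]

# Population formulas: divide the sum of squared deviations by N.
print(pvariance(marks))          # 21704
print(round(pstdev(marks), 2))   # 147.32

# Sample formulas: divide by N - 1 instead.
print(variance(marks))           # 27130
print(round(stdev(marks), 2))    # 164.71
```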
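Based on the above information, let us build the formula for finding SD. Since
we use two different formulae for data which is a population and data which is a
sample, we will have two different formulae for SD also.
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$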
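Example 1
Calculate SD for the following observations using different methods.
160, 160, 161, 162, 163, 163, 163, 164, 164, 170
(a) Direct method No.1
Formula
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
X       $X - \bar{X}$    $(X - \bar{X})^2$
160     -3      9
160     -3      9
161     -2      4
162     -1      1
163      0      0
163      0      0
163      0      0
164      1      1
164      1      1
170      7      49
$\sum X = 1630$         $\sum (X - \bar{X})^2 = 74$
$$\bar{X} = \frac{1630}{10} = 163$$
Now compute SD
$$\sigma = \sqrt{\frac{74}{10}} = \sqrt{7.4} = 2.72$$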
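(b) Direct method No.2
Formula
$$\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$$
X       $X^2$
160     25600
160     25600
161     25921
162     26244
163     26569
163     26569
163     26569
164     26896
164     26896
170     28900
$\sum X = 1630$     $\sum X^2 = 265764$
$$\sigma = \sqrt{\frac{265764}{10} - \left(\frac{1630}{10}\right)^2} = \sqrt{26576.4 - 26569} = \sqrt{7.4} = 2.72$$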
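(c) Method 3 (Short Cut Method): in this method, instead of finding the mean, we assume a
figure as the mean. Here we have arbitrarily assumed 162 as the mean.
We use the formula
$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}, \qquad d = X - 162$$
X       d = X - 162    $d^2$
160     -2      4
160     -2      4
161     -1      1
162      0      0
163      1      1
163      1      1
163      1      1
164      2      4
164      2      4
170      8      64
$\sum X = 1630$     $\sum d = +10$     $\sum d^2 = 84$
$$\sigma = \sqrt{\frac{84}{10} - \left(\frac{10}{10}\right)^2} = \sqrt{8.4 - 1} = \sqrt{7.4} = 2.72$$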
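As a check, the three routes to the standard deviation (deviations from the actual mean, the sum of squares of the raw values, and deviations from an assumed mean of 162) can be compared in a few lines of Python. The variable names are illustrative only.

```python
from math import sqrt

data = [160, 160, 161, 162, 163, 163, 163, 164, 164, 170]
n = len(data)
mean = sum(data) / n

# (a) Deviations taken from the actual mean.
direct = sqrt(sum((x - mean) ** 2 for x in data) / n)

# (b) Sum of squares of the raw values.
raw = sqrt(sum(x * x for x in data) / n - (sum(data) / n) ** 2)

# (c) Short-cut method: deviations taken from an assumed mean.
assumed_mean = 162
d = [x - assumed_mean for x in data]
shortcut = sqrt(sum(di * di for di in d) / n - (sum(d) / n) ** 2)

print(round(direct, 2), round(raw, 2), round(shortcut, 2))  # 2.72 2.72 2.72
```

Any other assumed mean gives the same answer, because the correction term $(\sum d / N)^2$ removes the effect of the arbitrary choice.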
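For the observations 2, 3, 4, 5, 6, 7, 9:
(c) Mean
$$\bar{X} = \frac{2 + 3 + 4 + 5 + 6 + 7 + 9}{7} = 5.143$$
(d) Variance
$$\sigma^2 = \frac{2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 9^2}{7} - 5.143^2 = 4.978$$
(e) Standard Deviation
$$\sigma = \sqrt{4.978} = 2.231$$
Mean deviation about the mean
X      $|X - \bar{X}| = |X - 5.143|$
2      3.143
3      2.143
4      1.143
5      0.143
6      0.857
7      1.857
9      3.857
$\sum |X - \bar{X}| = 13.143$
$$\text{Mean Deviation} = \frac{13.143}{7} = 1.878$$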
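Class        0-10    10-20    20-30    30-40    40-50    50-60    60-70
Frequency    5       12       30       45       50       37       21
Solution
Class    Midpoint (m)    f     d = (m - 35)/10    fd     $fd^2$
0-10     5               5     -3                 -15    45
10-20    15              12    -2                 -24    48
20-30    25              30    -1                 -30    30
30-40    35              45     0                   0     0
40-50    45              50     1                  50    50
50-60    55              37     2                  74    148
60-70    65              21     3                  63    189
Total                    N = 200                  $\sum fd = 118$    $\sum fd^2 = 510$
$$\bar{X} = A + \frac{\sum fd}{N} \times i = 35 + \frac{118}{200} \times 10 = 35 + 5.9 = 40.9$$
$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{510}{200} - \left(\frac{118}{200}\right)^2} \times 10 = \sqrt{2.55 - 0.3481} \times 10 = 1.4839 \times 10 = 14.839$$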
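The step-deviation calculation for grouped data can be sketched in Python as follows. The function name and argument names are assumptions made for this illustration; the assumed mean (35) and class width (10) are the ones used in the solution above, and the first class (0-10, frequency 5) is inferred from N = 200 and the fd column.

```python
from math import sqrt

def step_deviation_stats(midpoints, freqs, assumed_mean, width):
    """Mean and standard deviation of grouped data by the step-deviation method."""
    n = sum(freqs)
    d = [(m - assumed_mean) / width for m in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = assumed_mean + sum_fd / n * width
    sd = sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width
    return mean, sd

midpoints = [5, 15, 25, 35, 45, 55, 65]
freqs = [5, 12, 30, 45, 50, 37, 21]

mean, sd = step_deviation_stats(midpoints, freqs, assumed_mean=35, width=10)
print(round(mean, 1), round(sd, 3))  # 40.9 14.839
```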
Merits of Standard Deviation
1. It is rigidly defined and its value is always definite and based on all observations.
2. As it is based on arithmetic mean, it has all the merits of arithmetic mean.
3. It is amenable to further algebraic treatment.
4. It is less affected by sampling fluctuations.
Demerits
1. It is not easy to calculate.
2. It gives more weight to extreme values, because the values are squared up.
Coefficient of Variation
Standard deviation is the absolute measure of dispersion. It is expressed in
terms of the units in which the original figures are collected and stated. The relative
measure of standard deviation is known as coefficient of variation.
$$\text{Coefficient of variation (C.V.)} = \frac{\sigma}{\bar{X}} \times 100$$
Variance: square of standard deviation.
Symbolically,
$$\text{Variance} = \sigma^2$$
Find the percentage of each of the cumulated figures taking the grand total of each
corresponding column as 100.
Represent the percentage of the cumulated frequencies on X axis and those of the values
on the Y axis.
Plot the percentages of cumulated values against the percentages of the cumulated
frequencies of a given distribution and join the points so plotted through a free hand
curve.
The greater the distance between the curve and the line of equal distribution, the
greater the dispersion. If the Lorenz curve is nearer to the line of equal distribution, the
dispersion or variation is smaller.
The table below gives the annual income of 8 individuals, on which a Lorenz curve
can be drawn (the original figure was prepared using MS Excel).
Individual    Income     % population    % income    Cumulative income %
1             5000       12.5            1.204819    1.204819
2             12000      25              2.891566    4.096385
3             18000      37.5            4.337349    8.433735
4             30000      50              7.228916    15.66265
5             40000      62.5            9.638554    25.3012
6             60000      75              14.45783    39.75904
7             100000     87.5            24.09639    63.85542
8             150000     100             36.14458    100
Total         415000
Example
From the following table giving data regarding income of workers in a factory, draw a
Lorenz curve to study inequality of income.
The following method is used for constructing the Lorenz curve.
1. Cumulate the values of the variable (size of items) and the frequencies.
2. Calculate the percentage for each cumulated value of the size and
frequency of items.
3. Plot the percentage of the cumulated values of the variable against the
percentage of the corresponding cumulated frequencies. Join these points with a
smooth free hand curve. This curve is called the Lorenz curve.
4. Join zero per cent on the X axis with 100% on the Y axis. This line is
called the line of equal distribution.
Income       Mid value    Cumulative income    % of cumulative income    No. of workers (f)    Cumulative no. of workers    % of cumulative no. of workers
0-500        250          250                  2.94                      6000                  6000                         37.50
500-1000     750          1000                 11.76                     4250                  10250                        64.06
1000-2000    1500         2500                 29.41                     3600                  13850                        86.56
2000-3000    2500         5000                 58.82                     1500                  15350                        95.94
3000-4000    3500         8500                 100.00                    650                   16000                        100.00
Total        8500                                                        16000
The Gini coefficient (also known as the Gini index or Gini ratio) is a measure of statistical
dispersion intended to represent the income distribution of a nation's residents. This is the most
commonly used measure of inequality. The coefficient varies between 0, which reflects complete
equality and 1, which indicates complete inequality (one person has all the income or
consumption, all others have none). It was developed by the Italian statistician and sociologist
Corrado Gini in 1912.
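The link between the Lorenz curve and the Gini coefficient can be illustrated with the incomes of the 8 individuals used earlier. The sketch below builds the Lorenz points (cumulative share of population against cumulative share of income) and then approximates the Gini coefficient by the trapezoid rule, i.e. one minus twice the area under the Lorenz curve. The function names are assumptions for illustration, and the trapezoid rule is one common approximation, not necessarily the method intended in these notes.

```python
def lorenz_points(incomes):
    """Cumulative (population share, income share) points, starting at (0, 0)."""
    ordered = sorted(incomes)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points

def gini_coefficient(points):
    """One minus twice the area under the Lorenz curve (trapezoid rule)."""
    twice_area = sum((x1 - x0) * (y1 + y0)
                     for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1 - twice_area

incomes = [5000, 12000, 18000, 30000, 40000, 60000, 100000, 150000]
points = lorenz_points(incomes)

for population_share, income_share in points[1:]:
    print(round(100 * population_share, 1), round(100 * income_share, 4))

gini = gini_coefficient(points)
print(round(gini, 3))
```

A value of 0 would mean every individual has the same income (the Lorenz curve coincides with the line of equal distribution); the further the curve sags below that line, the closer the coefficient moves towards 1.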
8.3 - Skewness
We have discussed earlier techniques to measure the dispersion of a
distribution about its measure of central tendency (mean, median or mode).
Here we look at another characteristic of a distribution, named skewness. Skewness characterizes
the degree of asymmetry of a distribution around its mean. If there is only one mode (peak)
in our data (unimodal), and if the other data are distributed evenly to the left and right of
this value, then plotting the data gives a symmetric, bell-shaped curve, of which the normal curve
is the best-known example (see figure below). Here we say that there is no skewness, or skewness = 0. If there is zero
skewness (i.e., the distribution is symmetric) then the mean = median for this distribution.
However, data need not always be like this. Sometimes the bulk of the data is at the left and the
right tail is longer; we then say that the distribution is skewed right or positively skewed. Positive
skewness indicates a distribution with an asymmetric tail extending towards more positive
values. On the other hand, sometimes the bulk of the data is at the right and the left tail is
longer; we then say that the distribution is skewed left or negatively skewed. Negative skewness
indicates a distribution with an asymmetric tail extending towards more negative values.
[Figure: three curves labelled Skewed Left, Symmetric and Skewed Right]
Tests of Skewness
There are certain tests to know whether skewness does or does not exist in a frequency
distribution.
They are :
1. In a skewed distribution, the values of mean, median and mode do not coincide. The
mean and the mode are pulled apart and the median lies between them.
For a moderately skewed distribution, Mean - Mode = 3 (Mean - Median) approximately,
so that Median - Mode = 2/3 (Mean - Mode).
2. Quartiles will not be equidistant from median.
3. When the asymmetrical distribution is drawn on the graph paper, it will not give a
bell-shaped curve.
4. Sum of the positive deviations from the median is not equal to sum of negative
deviations.
5. Frequencies are not equal at points of equal deviations from the mode.
Nature of Skewness
Skewness can be positive or negative or zero.
1. When the values of mean, median and mode are equal, there is no skewness.
2. When mean > median > mode, skewness will be positive.
3. When mean < median < mode, skewness will be negative.
Characteristic of a good measure of skewness
1. It should be a pure number in the sense that its value should be independent of the
unit of the series and also degree of variation in the series.
2. It should have zero-value, when the distribution is symmetrical.
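Measures of Skewness
Karl Pearson's coefficient of skewness:
$$Sk_p = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$
Properties of Karl Pearson coefficient of Skewness
(1) $-1 \le Sk_p \le 1$.
Bowley's coefficient of skewness is based on the quartiles $Q_1$, $Q_2$ (the median) and $Q_3$.
Then
$$Sk_q = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$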
Note:
In the above equation, the Qs denote the quartiles. Divide a set of data into two
groups (high and low) of equal size at the statistical median if there is an even number of data
points, or two groups consisting of points on either side of the statistical median itself plus the
statistical median if there is an odd number of data points. Find the statistical medians of the low
and high groups, denoting these first and third quartiles by Q1 and Q3. The interquartile range
is then defined by IQR = Q3 - Q1.
Properties of Bowley's coefficient of skewness
1. $-1 \le Sk_q \le 1$.
2. $Sk_q = 0$: the distribution is symmetrical.
3. $Sk_q > 0$: the distribution is skewed to the right.
4. $Sk_q < 0$: the distribution is skewed to the left.
Advantage of Bowley's coefficient of skewness
$Sk_q$ does not depend on extreme values.
Disadvantage of Bowley's coefficient of skewness
$Sk_q$ does not utilize the data fully.
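Example
The following table shows the distribution of 128 families according to the number of
children.
No of children    No of families    Cumulative frequency
0                 20                20
1                 15                35
2                 25                60
3                 30                90
4                 18                108
5                 10                118
6                 6                 124
7                 3                 127
8 or more         1                 128
$$Q_1 = \frac{N+1}{4}\text{th observation} = (32.25)\text{th observation} = 1$$
$$Q_2 = \frac{2(N+1)}{4}\text{th observation} = (64.5)\text{th observation} = 3$$
$$Q_3 = \frac{3(N+1)}{4}\text{th observation} = (96.75)\text{th observation} = 4$$
$$Sk_q = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{4 + 1 - 2(3)}{4 - 1} = -\frac{1}{3} = -0.333$$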
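The quartile positions and the quartile-based coefficient of skewness for this family-size distribution can be checked with a short Python sketch. The helper name is an assumption, the numbers of children 0 to 7 are inferred from the cumulative frequencies, and the open class "8 or more" is represented by 8 purely for illustration (it does not affect the quartiles).

```python
def value_at_position(values, freqs, position):
    """Value of the observation at a given position in a discrete frequency table."""
    cumulative = 0
    for value, freq in zip(values, freqs):
        cumulative += freq
        if cumulative >= position:
            return value
    raise ValueError("position exceeds the total frequency")

children = [0, 1, 2, 3, 4, 5, 6, 7, 8]      # 8 stands for "8 or more"
families = [20, 15, 25, 30, 18, 10, 6, 3, 1]
n = sum(families)                            # 128

q1 = value_at_position(children, families, (n + 1) / 4)       # 32.25th observation
q2 = value_at_position(children, families, 2 * (n + 1) / 4)   # 64.5th observation
q3 = value_at_position(children, families, 3 * (n + 1) / 4)   # 96.75th observation

skewness = (q3 + q1 - 2 * q2) / (q3 - q1)
print(q1, q2, q3)            # 1 3 4
print(round(skewness, 3))    # -0.333
```

The coefficient is negative because the median (3) lies closer to the upper quartile (4) than to the lower quartile (1).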
Bowley's measure of skewness is based on the middle 50% of the observations because
it leaves 25% of the observations on each extreme of the distribution. As an
improvement over Bowley's measure, Kelly has suggested a measure based on P10 and
P90, so that only 10% of the observations on each extreme are ignored.
$$Sk_k = \frac{(P_{90} - P_{50}) - (P_{50} - P_{10})}{(P_{90} - P_{50}) + (P_{50} - P_{10})}$$
8.4 - KURTOSIS
[Figure: mesokurtic, leptokurtic and platykurtic curves]
Measures of Kurtosis
Moment ratio and Percentile Coefficient of kurtosis are used to measure the kurtosis
A population is any entire collection of people, animals, plants or things from which we
may collect data. It is the entire group we are interested in, which we wish to describe
or draw conclusions about.
A population is an entire set of individuals or objects, which may be finite or infinite.
Examples of finite populations include the employees of a given company, the number
of airplanes owned by an airline, or the potential consumers in a target market.
Examples of infinite populations include the number of watches manufactured by a
company that plans to be in business forever, or the grains of sand on the beaches of the
world or stars in the sky.
For a deeper understanding of a population, consider a market researcher for a fast food
chain who might want to determine the flavour preferences of Indian customers
between the ages of 15 and 25. The population in this example is finite and includes
every Indian in this age group of 15-25.
Note that population does not refer to people only. Statisticians also speak of a
population of objects, or events, or procedures, or observations, including such things
as the quantity of haemoglobin in blood, the number of visits to the doctor by a patient, or
the number of surgical operations by a doctor. A population is thus an aggregate of creatures,
things, cases and so on.
Sample
A population commonly contains too many individuals to study conveniently, so gathering data
from every individual in the population would be nearly impossible or prohibitively expensive.
So an investigation is often restricted to a part drawn from it, which is called a sample. A sample
is thus a portion of the population, a slice of it, that is expected to carry the characteristics
of the whole.
A sample is a group of units selected from a larger group (the population). By studying the
sample it is hoped to draw valid conclusions about the larger group.
A sample is a smaller group of members of a population selected to represent the population.
A sample is a subset of population.
A sample is a scientifically drawn group that actually possesses the same characteristics as the
population if it is drawn randomly. Thus a well-chosen sample will contain most of the
information about a particular population parameter but the relation between the sample and the
population must be such as to allow true inferences to be made about a population from that
sample.
A familiar example of sampling is what a cook does in a kitchen: checking whether the rice has
cooked enough by tasting a single grain.
If the sample is to be used to make inferences about the population the sample data must be
unbiased. In order for a sample to be unbiased, it must be
A population includes each element from the set of observations that can be made.
Depending on the sampling method, a sample can have fewer observations than the population,
the same number of observations, or more observations. More than one sample can be derived
from the same population.
Other differences are related to terms used. For example,
The mean of a population is denoted by the symbol $\mu$; but the mean of a sample is
denoted by the symbol $\bar{x}$.
What is the difference between information based on a sample and information based on a
population? Information based on a sample is, by definition, incomplete; as such, a sample
demands that inferences be drawn regarding the population from which it came. Information
based on a population, however, is considered complete, and therefore requires no inferential
leap to be made.
What characteristics are necessary before a sample can be considered random? The members of
the sample must be chosen based on chance from the population. Each member of the population
must have an equal likelihood of being chosen.
What is the consequence of failing to have a random sample from a population? A sample is a
subset of a population. If a sample is randomly selected and sufficiently large, the information
obtained from the sample will be representative of the population. A small sample, or one that is
not drawn in a random fashion, may be biased. Making inferences from a biased sample to a
population is ill-advised and may lead to costly business mistakes.
Different methods of sampling
There are numerous sample selection methods for drawing the sample from the population,
broadly classified into random or probability-based sampling schemes or survey design methods,
and non-random or non-probability based sampling.
Probability Sampling
Probability samples are selected in such a way as to be representative of the population. They
provide the most valid or credible results because they reflect the characteristics of the
population from which they are selected.
The main types of probability sampling are described below.
Simple Random Sampling
The most widely known type of a random sample is the simple random sample (SRS). This is
characterized by the fact that the probability of selection is the same for every case in the
population. All have an equal chance of being selected. Simple random sampling is a method of
selecting n units from a population of size N such that every unit of the population has an equal
chance of being selected.
There are two methods by which we can select a random sample:
(a) Lottery Method
An example may make this easier to understand. Imagine you want to carry out a survey of 100
voters in a small town with a population of 1,000 eligible voters. One method of SRS is to
write the name of each voter on a separate slip of paper, put all the slips into a box and draw 100
slips at random. The draw is done in this manner: shake the box, draw a slip and set
it aside, shake again, draw another, set it aside, and so on until we have 100 slips. These 100
form our sample. This sample is drawn through a simple random sampling procedure:
at each draw, every name in the box has the same probability of being chosen. This is called
the lottery method of random sampling.
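On a computer, the lottery method amounts to drawing without replacement from a list. A minimal Python sketch follows, assuming the 1,000 voters are simply labelled "Voter 1" to "Voter 1000" (hypothetical labels) and using a fixed seed only so that the draw can be repeated.

```python
import random

# Hypothetical voter list standing in for the names written on the slips.
voters = [f"Voter {i}" for i in range(1, 1001)]

rng = random.Random(2013)          # fixed seed so the draw is reproducible
sample = rng.sample(voters, 100)   # 100 draws without replacement

print(len(sample), len(set(sample)))  # 100 100: no voter is drawn twice
print(sample[:5])
```

`random.sample` plays the role of shaking the box and setting each drawn slip aside: every voter still in the box is equally likely to be picked at each draw.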
(b) Table of random numbers:
The lottery method is a clumsy physical process for choosing random samples. Often it is
convenient to use a ready-made table of random numbers. A random number table is a table of
digits. The digit given in each position in the table was originally chosen randomly from the
digits 1,2,3,4,5,6,7,8,9,0 by a random process in which each digit is equally likely to be chosen.
Thus a random number table is a series of digits (0 to 9) arranged randomly through the rows and
columns. Table 1 gives part of table of random numbers. The digits are often grouped in fives as
shown here.
Table 1 : table of Random Numbers
The researcher can use the list of random numbers to draw a simple random sample from a
population. Here is one procedure for using Table 1 to select a simple random sample:
Step 1: Each element in the population from which the sample is to be drawn must be assigned a
unique number. This is usually done by numbering the elements in the population consecutively.
If there were 280 elements in the population, for example, they would be numbered 001, 002,
003, ..., 280.
Step 2: Determine a starting point in the table by closing your eyes and placing the point of your
pencil anywhere in the table.
Step 3: Using the starting point you have selected, begin reading the numbers in the table either
across the rows or down the columns. If your population consists of 99 or fewer elements, read
the numbers in two-digit units; for 999 or fewer elements in the population, read the numbers in
three-digit units, and so forth. If a table number is larger than the number of elements in the
population (e.g., if the table number is 323 and your population is 280), skip that number and
read the next. If you come to a number equivalent to one you have already drawn, you can either
skip the number and read the next one or count the data for that unit of analysis twice. Continue
until you have selected as many valid numbers as there are elements in your desired sample.
The population elements that comprise the simple random sample are those whose
numbers correspond to the numbers read from the table.
For example, suppose you have to select a sample of 5 students from a population of 75 students.
First give numbers to all students from 1 to 75. Now, following Step 2 above,
place your pencil anywhere on the table. Suppose it lands on 62570 in the 2nd column and
4th row. Since Step 3 says that for a population of 99 or fewer elements we read
the numbers in two-digit units, we read only the first two digits, which gives 62. So the 62nd student is
the first member of our sample. (If you get a number which is bigger than the population size, take the
next number from the table.) Now, to get the next member, move in the table in any direction from
the number you have chosen. Suppose we decide to keep moving down the column. The
next entry is 26440. We take the first two digits, so the number is 26. This means the 26th student
is the second member of our sample. Going down the column, we get 47174, so the 47th student is the third
member. Moving down, we get 34378 and take 34, so the 34th student is the fourth member. Next is 22466, so the
22nd student is the fifth member of our sample.
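The reading rule in Step 3 can be sketched in Python. The function below is a hypothetical helper written for this illustration: it reads the leading digits of each table entry (as many digits as the population size has), skips numbers that are zero or larger than the population, and skips repeats. The column of entries is the one quoted in the example above.

```python
def read_sample(table_entries, population_size, sample_size):
    """Select unit numbers by reading the leading digits of random-number table entries."""
    digits = len(str(population_size))   # 2 digits for up to 99 units, 3 for up to 999
    chosen = []
    for entry in table_entries:
        number = int(entry[:digits])
        if number == 0 or number > population_size:
            continue                     # no such unit in the population: skip
        if number in chosen:
            continue                     # already drawn: skip and read the next
        chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

column = ["62570", "26440", "47174", "34378", "22466"]
print(read_sample(column, population_size=75, sample_size=5))  # [62, 26, 47, 34, 22]
```

If the entries run out before the sample is complete, the function simply returns the units found so far; in practice one would continue reading further down or across the table.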
Stratified Random Sampling
In this form of sampling, the population is first divided into two or more mutually exclusive
segments based on some categories of variables of interest in the research. It is designed to
organize the population into homogeneous subsets before sampling, and then to draw a random
sample within each subset. With stratified random sampling the population of N units is divided
into subpopulations of $N_1, N_2, \ldots, N_L$ units respectively. These subpopulations, called strata, are
non-overlapping and together they comprise the whole of the population. When these have been
determined, a sample is drawn from each, with a separate draw for each of the different strata.
The sample sizes within the strata are denoted by $n_1, n_2, \ldots, n_L$ respectively. If an SRS is taken within each
stratum, then the whole sampling procedure is described as stratified random sampling.
The primary benefit of this method is to ensure that cases from smaller strata of the population
are included in sufficient numbers to allow comparison.
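A minimal Python sketch of stratified random sampling is given below. It assumes proportional allocation (each stratum contributes in proportion to its size), which is only one possible way of fixing the stratum sample sizes; the strata names and sizes are hypothetical.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """Draw a simple random sample within each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation; rounding may shift the total slightly in general.
        stratum_size = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, stratum_size)
    return sample

# Hypothetical population of 1,000 units split into two non-overlapping strata.
strata = {"urban": list(range(1, 601)), "rural": list(range(601, 1001))}
sample = stratified_sample(strata, total_sample_size=50, seed=1)

print({name: len(units) for name, units in sample.items()})  # {'urban': 30, 'rural': 20}
```

When the aim is to compare strata, as mentioned above, a researcher may instead deliberately take more units from the smaller strata than proportional allocation would give.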
Systematic Sampling
This method of sampling is at first glance very different from SRS. In practice, it is a variant of
simple random sampling that involves some listing of elements: every kth element of the list is then
drawn for inclusion in the sample. Say you have a list of 10,000 people and you want a sample of
1,000.
Creating such a sample includes three steps:
1. Divide number of cases in the population by the desired sample size. In this example,
dividing 10,000 by 1,000 gives a value of 10.
2. Select a random number between one and the value attained in Step 1. In this example,
we choose a number between 1 and 10 - say we pick 7.
3. Starting with case number chosen in Step 2, take every tenth record (7, 17, 27, etc.).
More generally, suppose that the N units in the population are ranked 1 to N in some order (e.g.,
alphabetic). To select a sample of n units, we take a unit at random from the first k units, where
k = N/n, and take every kth unit thereafter.
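The three steps can be sketched in Python as follows. The function name and its parameters are assumptions made for illustration; the sketch also assumes that the population size is an exact multiple of the sample size, as in the example of 10,000 people and a sample of 1,000.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the case numbers of a systematic sample (cases are numbered from 1)."""
    k = population_size // sample_size            # Step 1: sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k) # Step 2: random start between 1 and k
    return [start + i * k for i in range(sample_size)]  # Step 3: every kth case

picked = systematic_sample(10_000, 1_000, start=7)
print(picked[:3], picked[-1])  # [7, 17, 27] 9997
```

Passing `start=7` reproduces the example in the text; leaving it out lets the function choose the random start itself.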
Cluster Sampling
In some instances the sampling unit consists of a group or cluster of smaller units that we call
elements or subunits (these are the units of analysis for your study). There are two main reasons
for the widespread application of cluster sampling. Although the first intention may be to use the
elements as sampling units, it is found in many surveys that no reliable list of elements in the
population is available and that it would be prohibitively expensive to construct such a list. In
many countries there are no complete and updated lists of the people, the houses or the farms in
any large geographical region.
Even when a list of individual houses is available, economic considerations may point to the
choice of a larger cluster unit. For a given size of sample, a small unit usually gives more precise
results than a large unit. For example a SRS of 600 houses covers a town more evenly than 20
city blocks containing an average of 30 houses each. But greater field costs are incurred in
locating 600 houses and in traveling between them than in covering 20 city blocks. When cost is
balanced against precision, the larger unit may prove superior.
Nonprobability Sampling
Social research is often conducted in situations where a researcher cannot select the kinds of
probability samples used in large-scale social surveys. For example, say you wanted to study
homelessness - there is no list of homeless individuals nor are you likely to create such a list.
However, you need to get some kind of a sample of respondents in order to conduct your
research. To gather such a sample, you would likely use some form of non-probability sampling.
To restate, the primary difference between probability methods of sampling and non-probability
methods is that in the latter you do not know the likelihood that any element of a population will
be selected for study.
There are four primary types of non-probability sampling methods:
Availability Sampling
Availability sampling is a method of choosing subjects who are available or easy to find. This
method is also sometimes referred to as haphazard, accidental, or convenience sampling. The
primary advantage of the method is that it is very easy to carry out, relative to other methods. For
example, if you want to collect data from women alone, you may stand in a crowded marketplace
and distribute your schedule as you wish.
Quota Sampling
Quota sampling is designed to overcome the most obvious flaw of availability sampling. Rather
than taking just anyone, you set quotas to ensure that the sample you get represents certain
characteristics in proportion to their prevalence in the population. Note that for this method, you
have to know something about the characteristics of the population ahead of time. Say you want
to make sure you have a sample proportional to the population in terms of gender - you have to
know what percentage of the population is male and female, then collect sample until yours
matches. Marketing studies are particularly fond of this form of research design.
Purposive or judgmental Sampling
Purposive sampling is a sampling method in which elements are chosen based on the purpose of the
study. Purposive sampling may involve studying the entire population of some limited group
(Economics BA students of Calicut University) or a subset of a population (Economics BA
students of Calicut University who are women). As with other non-probability sampling
methods, purposive sampling does not produce a sample that is representative of a larger
population, but it can be exactly what is needed in some cases, such as the study of an organization,
a community, or some other clearly defined and relatively limited group.
Snowball Sampling
Snowball sampling is a method in which a researcher identifies one member of some population
of interest, speaks to him/her, then asks that person to identify others in the population that the
researcher might speak to. This person is then asked to refer the researcher to yet another person,
and so on. Snowball sampling is very good for cases where members of a special population are
difficult to locate.
The best sampling method is the sampling method that most effectively meets the particular
goals of the study in question. The effectiveness of a sampling method depends on many factors.
Because these factors interact in complex ways, the best sampling method is seldom obvious.
Good researchers use the following strategy to identify the best sampling method.
List the research goals (usually some combination of accuracy, precision, and/or cost).
Identify potential sampling methods that might effectively achieve those goals.
Choose the method that does the best job of achieving the goals.
***********************************
Module II
CORRELATION AND REGRESSION ANALYSIS
Module II. Correlation and Regression Analysis
Correlation - Meaning, Types and Degrees of Correlation - Methods of Measuring Correlation -
Graphical Methods: Scatter Diagram and Correlation Graph; Algebraic Methods: Karl
Pearson's Coefficient of Correlation and Rank Correlation Coefficient - Properties and
Interpretation of Correlation Coefficient
Introduction
Correlation is a statistical technique which tells us if two variables are related. For
example, consider the variables family income and family expenditure. It is well known that
income and expenditure increase or decrease together. Thus they are related in the sense that
a change in one variable is accompanied by a change in the other variable. Again, price and
demand of a commodity are related variables; when price increases, demand will tend to
decrease, and vice versa. If the change in one variable is accompanied by a change in the other,
then the variables are said to be correlated. We can therefore say that family income and family
expenditure, and price and demand, are correlated.
Correlation can tell us something about the relationship between variables. It is used to
understand: (a) whether the relationship is positive or negative, and (b) the strength of the relationship.
Correlation is a powerful tool that provides these vital pieces of information.
In the case of family income and family expenditure, it is easy to see that they both rise or fall
together in the same direction. This is called positive correlation.
In case of price and demand, change occurs in the opposite direction so that increase in one is
accompanied by decrease in the other. This is called negative correlation.
Coefficient of Correlation
Correlation is measured by what is called the coefficient of correlation (r). A correlation coefficient
is a statistical measure of the degree to which changes in the value of one variable predict changes
in the value of another. Correlation coefficients take values between -1 and +1, and the
numerical value gives us an indication of the strength of the relationship. In general, r > 0 indicates
a positive relationship, r < 0 indicates a negative relationship, while r = 0 indicates no linear relationship
(the variables are uncorrelated). Here r = +1.0 describes a perfect positive
correlation and r = -1.0 describes a perfect negative correlation. The closer the coefficient is to
+1.0 or -1.0, the greater the strength of the relationship between the variables. As a rule of
thumb, the following guidelines on strength of relationship are often useful (though many experts
would somewhat disagree on the choice of boundaries).
Value of r
Strength of relationship
Correlation is only appropriate for examining the relationship between meaningful quantifiable
data (e.g. air pressure, temperature) rather than categorical data such as gender, favourite colour
etc.
A key thing to remember when working with correlations is never to assume a correlation means
that a change in one variable causes a change in another. Sales of personal computers and
athletic shoes have both risen strongly in the last several years and there is a high correlation
between them, but you cannot assume that buying computers causes people to buy athletic shoes
(or vice versa).
The second caution is that the Pearson correlation technique (which we are about to see) works
best with linear relationships: as one variable gets larger (or smaller), the other gets larger (or
smaller) in direct proportion. It does not work well with curvilinear relationships (in which the
relationship does not follow a straight line). An example of a curvilinear relationship is age and
health care. They are related, but the relationship doesn't follow a straight line. Young children
and older people both tend to use much more health care than teenagers or young adults. (In such
cases, the technique of multiple regression can be used to examine curvilinear relationships.)
Methods of Measuring Correlation
I. Graphical Method
(a) Scatter Diagram
(b) Correlation Graph
II. Algebraic Method (Coefficient of Correlation)
(a) Karl Pearson's Coefficient of Correlation
(b) Spearman's Rank Correlation Coefficient
I. (a) Scatter Diagram
Scatter Diagram (also called scatter plot, XY graph) is a graph that shows the relationship
between two quantitative variables measured on the same individual. Each individual in the data
set is represented by a point in the scatter diagram. The predictor variable is plotted on the
horizontal axis and the response variable is plotted on the vertical axis. Do not connect the points
when drawing a scatter diagram. The scatter diagram graphs pairs of numerical data, with one
variable on each axis, to look for a relationship between them. If the variables are correlated, the
points will fall along a line or curve. The better the correlation, the tighter the points will hug the
line. Scatter Diagram is a graphical measure of correlation.
Examples of Scatter Diagram. Given below each diagram is the value of correlation.
Note that the value shows how good the correlation is (not how steep the line is), and if it is
positive or negative.
Scatter Diagram Procedure
1. Collect pairs of data where a relationship is suspected.
2. Draw a graph with the independent variable on the horizontal axis and the dependent variable
on the vertical axis. For each pair of data, put a dot or a symbol where the x-axis value intersects
the y-axis value. (If two dots fall together, put them side by side, touching, so that you can see
both.)
3. Look at the pattern of points to see if a relationship is obvious. If the data clearly form a line
or a curve, you may stop. The variables are correlated.
The data set below represents a random sample of 5 workers in a particular industry. The
productivity of each worker was measured at one point in time, and the worker was asked the
number of years of job experience. The dependent variable is productivity, measured in number
of units produced per day, and the independent variable is experience, measured in years.
Worker   y = Productivity (output/day)   x = Experience (in years)
1        33                              10
2        19                              6
3        32                              12
4        26                              8
5        15                              4
[Figure: scatter diagram with Experience on the horizontal axis and Productivity on the vertical axis]
This scatter diagram tells us that the two variables, productivity and experience, are
positively correlated.
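The visual impression can be checked numerically. The sketch below applies the product-moment (Pearson) formula, which is discussed later in this module, to the five workers in the table. The function name `pearson_r` is an illustrative choice, not something defined in the text.

```python
from math import sqrt

experience = [10, 6, 12, 8, 4]        # x, in years
productivity = [33, 19, 32, 26, 15]   # y, output per day

def pearson_r(x, y):
    """Product-moment correlation computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_dxdy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_dx2 = sum((a - mean_x) ** 2 for a in x)
    sum_dy2 = sum((b - mean_y) ** 2 for b in y)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

r = pearson_r(experience, productivity)
print(round(r, 2))  # 0.96
```

A value this close to +1 agrees with the upward pattern of the plotted points.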
Merits of Scatter Diagram Method:
1. It is an easy way of finding the nature of correlation between two variables.
2. By drawing a line of best fit by free hand method through the plotted dots, the method
can be used for estimating the missing value of the dependent variable for a given value
of independent variable.
3. Scatter diagram can be used to find out the nature of linear as well as non-linear
correlation.
4. The values of extreme observations do not affect the method.
Demerits of Scatter Diagram Method:
It gives only a rough idea of how the two variables are related. It gives an idea about the
direction of correlation and also whether it is high or low. But this method does not give any
quantitative measure of the degree or extent of correlation.
I (b) Correlation Graph
Correlation graph is also used as a measure of correlation. When this method is used
the correlation graph is drawn and the direction of curve is examined to understand the nature of
correlation. Under this method, separate curves are drawn for the X variable and Y variable on
the same graph paper. The values of the variable are taken as ordinates of the points plotted.
From the direction and closeness of the two curves we can infer whether the variables are
related. If both the curves move in the same direction (upward or downward), correlation is said
to be positive. If the curves are moving in the opposite direction, correlation is said to be
negative.
But correlation graphs are not capable of doing anything more than suggesting the fact
of a possible relationship between two variables. We can neither establish any causal
relationship between two variables nor obtain the exact degree of correlation through them.
They only tell us whether the two variables are positively or negatively correlated. Example of a
graph is given below.
II. (a) Karl Pearson's Coefficient of Correlation
In a population, the product moment correlation is designated by the Greek letter rho (ρ). When computed
in a sample, it is designated by the letter "r" and is sometimes called "Pearson's r."
Pearson's correlation reflects the degree of linear relationship between two variables.
Mathematical Formula:
The quantity r, called the linear correlation coefficient, measures the strength and the
direction of a linear relationship between two quantitative variables. We use the Greek
letter ρ (rho) to represent the population correlation coefficient and r to represent the
sample correlation coefficient.
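Correlation coefficient for ungrouped data
\[ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n \, \sigma_X \, \sigma_Y} \]
Where
X_i is the ith observation of the variable X
Y_i is the ith observation of the variable Y
\bar{X} is the mean of the observations of the variable X
\bar{Y} is the mean of the observations of the variable Y
n is the number of pairs of observations of X and Y
\sigma_X is the standard deviation of the variable X
\sigma_Y is the standard deviation of the variable Y
Writing out the standard deviations, the formula becomes
\[ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}} \]
which can also be computed directly from the raw values as
\[ r = \frac{n \sum X_i Y_i - (\sum X_i)(\sum Y_i)}{\sqrt{[\, n \sum X_i^2 - (\sum X_i)^2 \,][\, n \sum Y_i^2 - (\sum Y_i)^2 \,]}} \]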
Year (i)   Annual advertising expenditure (Xi)   Annual sales (Yi)
1          10                                    20
2          12                                    30
3          14                                    37
4          16                                    50
5          18                                    56
6          20                                    78
7          22                                    89
8          24                                    100
9          26                                    120
10         28                                    110
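Computing the necessary values and substituting them in the formula, we solve using both
forms of the formula. We get
\[ \bar{X} = \frac{\sum X_i}{n} = \frac{190}{10} = 19 \]
\[ \bar{Y} = \frac{\sum Y_i}{n} = \frac{690}{10} = 69 \]
Writing dx = Xi - 19 and dy = Yi - 69 for the deviations from the means:
Year (i)   Xi    Yi    dx    dy    dx²    dy²     dx·dy
1          10    20    -9    -49   81     2401    441
2          12    30    -7    -39   49     1521    273
3          14    37    -5    -32   25     1024    160
4          16    50    -3    -19   9      361     57
5          18    56    -1    -13   1      169     13
6          20    78    1     9     1      81      9
7          22    89    3     20    9      400     60
8          24    100   5     31    25     961     155
9          26    120   7     51    49     2601    357
10         28    110   9     41    81     1681    369
Total      190   690   0     0     330    11200   1894
\[ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}} = \frac{1894}{\sqrt{330 \times 11200}} = 0.985 \]
Using the raw values instead:
Year (i)   Xi²    Yi²     XiYi
1          100    400     200
2          144    900     360
3          196    1369    518
4          256    2500    800
5          324    3136    1008
6          400    6084    1560
7          484    7921    1958
8          576    10000   2400
9          676    14400   3120
10         784    12100   3080
Total      3940   58810   15004
\[ r = \frac{10(15004) - (190)(690)}{\sqrt{[\,10(3940) - (190)^2\,][\,10(58810) - (690)^2\,]}} = \frac{18940}{\sqrt{3300 \times 112000}} = 0.985 \]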
The correlation coefficient between annual advertising expenditure and annual sales revenue is
0.985. This is a positive value and is very close to 1. So it implies there is a very strong positive
correlation between annual advertising expenditure and annual sales revenue.
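The two ways of computing r in this worked example can be verified with a short Python sketch. The function names are illustrative; the data are the advertising expenditure and sales figures from the example.

```python
from math import sqrt

advertising = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]   # Xi
sales = [20, 30, 37, 50, 56, 78, 89, 100, 120, 110]      # Yi

def r_from_deviations(x, y):
    """r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_dxdy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_dx2 = sum((a - mean_x) ** 2 for a in x)
    sum_dy2 = sum((b - mean_y) ** 2 for b in y)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

def r_from_raw_sums(x, y):
    """r from the raw sums of X, Y, X squared, Y squared and XY."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

r_dev = r_from_deviations(advertising, sales)
r_raw = r_from_raw_sums(advertising, sales)
print(round(r_dev, 3), round(r_raw, 3))  # 0.985 0.985
```

Both routes give the same value because the raw-sum form is only an algebraic rearrangement of the deviation form.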
Properties of Correlation Coefficient
1. The correlation coefficient lies between -1 and +1; symbolically, -1 ≤ r ≤ +1.
2. The correlation coefficient is independent of the change of origin and scale.
3. The coefficient of correlation is the geometric mean of the two regression coefficients:
\[ r = \pm \sqrt{b_{yx} \times b_{xy}} \]
If one regression coefficient is positive, the other regression coefficient is also positive, and the
correlation coefficient is positive; r takes the common sign of the two regression coefficients.
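Properties 2 and 3 can be illustrated numerically with the advertising and sales data used above. This is a sketch under stated assumptions: the shifts (20 and 70) and positive scale factors (2 and 10) are arbitrary choices, and the helper `slope` computes a regression coefficient as the sum of products of deviations divided by the sum of squared deviations of the independent variable.

```python
from math import sqrt, isclose

x = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
y = [20, 30, 37, 50, 56, 78, 89, 100, 120, 110]

def deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def pearson_r(x, y):
    dx, dy = deviations(x), deviations(y)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def slope(dependent, independent):
    """Regression coefficient of `dependent` on `independent`."""
    dd, di = deviations(dependent), deviations(independent)
    return sum(a * b for a, b in zip(dd, di)) / sum(b * b for b in di)

r = pearson_r(x, y)

# Property 2: shift the origin and rescale (positive scale factors) and recompute r
u = [(a - 20) / 2 for a in x]
v = [(b - 70) / 10 for b in y]
r_uv = pearson_r(u, v)

# Property 3: geometric mean of the two regression coefficients
b_yx = slope(y, x)
b_xy = slope(x, y)
geometric_mean = sqrt(b_yx * b_xy)

print(isclose(r, r_uv), isclose(r, geometric_mean))  # True True
```

Both regression coefficients are positive here, so the positive square root is taken.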
II. (b) Spearman's Rank Correlation Coefficient
Spearman's rank correlation coefficient (rs) measures the strength of association between two ranked variables.
Data which are arranged in numerical order, usually from largest to smallest, and numbered 1, 2, 3,
... are said to be in ranks or ranked data. These ranks prove useful at certain times when two or
more values of one variable are the same. The coefficient of correlation for such type of data is
given by the Spearman rank difference correlation coefficient.
Pearson's coefficient of correlation works with the actual values of the variables. The Spearman Rank
Correlation Coefficient is its analogue when the data are in terms of ranks. One can therefore also
call it the correlation coefficient between the ranks.
The Spearman's rank-order correlation is used when there is a monotonic relationship between
our variables. A monotonic relationship is a relationship that does one of the following: (1) as the
value of one variable increases, so does the value of the other variable; or (2) as the value of one
variable increases, the other variable value decreases. A monotonic relationship is an important
underlying assumption of the Spearman rank-order correlation. It is also important to recognize
the assumption of a monotonic relationship is less restrictive than a linear relationship (an
assumption that has to be met by the Pearson product-moment correlation). The middle image
above illustrates this point well: A non-linear relationship exists, but the relationship is
monotonic and is suitable for analysis by Spearman's correlation, but not by Pearson's
correlation.
Let us illustrate the relevance of the Spearman Rank Correlation Coefficient with the aid of an
example.
As an example, let us consider a musical talent contest where 10 competitors are evaluated by
two judges, A and B. Usually judges award numerical scores for each contestant after his/her
performance.
A product moment correlation coefficient of scores by the two judges hardly makes sense here as
we are not interested in examining the existence or otherwise of a linear relationship between the
scores.
What makes more sense is the correlation between the ranks of contestants as judged by the two judges.
The Spearman Rank Correlation Coefficient can indicate whether the judges agree with each other's views as far
as the talent of the contestants is concerned (though they might award different numerical scores), in other words, whether the judges are unanimous.
The numerical value of the correlation coefficient, rs, ranges between -1 and +1. The correlation
coefficient is a number indicating how closely the two sets of ranks are related.
In general,
rs > 0 implies positive agreement among ranks
rs < 0 implies negative agreement (or agreement in the reverse direction)
rs = 0 implies no agreement
The closer rs is to +1, the better the agreement, while rs closer to -1 indicates strong agreement in the
reverse direction.
The formula for finding the Spearman Rank Correlation Coefficient is
\[ r_s = 1 - \frac{6 \sum_{i=1}^{n} (X_i - Y_i)^2}{n(n^2 - 1)} \]
Where
X_i is the rank of the ith observation of the variable X
Y_i is the rank of the ith observation of the variable Y
n is the number of pairs of observations
Let us calculate the Spearman Rank Correlation Coefficient for our example of the musical talent
contest where 10 competitors are evaluated by two judges, A and B. The ratings are given below.
Contestant   Rating by judge 1   Rating by judge 2
1            1                   2
2            2                   4
3            3                   5
4            4                   1
5            5                   3
6            6                   6
7            7                   7
8            8                   9
9            9                   10
10           10                  8
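Contestant   Rating by judge 1 (Xi)   Rating by judge 2 (Yi)   Xi - Yi   (Xi - Yi)²
1            1                        2                        -1        1
2            2                        4                        -2        4
3            3                        5                        -2        4
4            4                        1                        3         9
5            5                        3                        2         4
6            6                        6                        0         0
7            7                        7                        0         0
8            8                        9                        -1        1
9            9                        10                       -1        1
10           10                       8                        2         4
Total                                                                    28
\[ r_s = 1 - \frac{6 \sum (X_i - Y_i)^2}{n(n^2 - 1)} = 1 - \frac{6 \times 28}{10(10^2 - 1)} = 1 - \frac{168}{990} = 0.8303 \]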
The Spearman Rank Correlation Coefficient tries to assess the relationship between ranks
without making any assumptions about the nature of their relationship. Hence it is a
non-parametric measure, a feature which has contributed to its popularity and
widespread use.
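The calculation for the talent contest can be reproduced with a few lines of Python. The function name `spearman_rs` is illustrative, and the sketch assumes there are no tied ranks, as in this example.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

judge_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # Xi
judge_2 = [2, 4, 5, 1, 3, 6, 7, 9, 10, 8]    # Yi

rs = spearman_rs(judge_1, judge_2)
print(round(rs, 4))  # 0.8303
```

Only the ranks enter the formula, so any scores that produce the same ordering would give the same coefficient.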
Interpretation of Rank Correlation Coefficient (R)
1. The value of rank correlation coefficient, R ranges from -1 to +1
2. If R = +1, then there is complete agreement in the order of the ranks and the ranks are
in the same direction
3. If R = -1, then there is complete agreement in the order of the ranks and the ranks are
in the opposite direction
4. If R = 0, then there is no correlation
Advantages of Spearman's Rank Correlation
1. This method is simpler to understand and easier to apply compared to Karl Pearson's
correlation method.
2. This method is useful where we can give the ranks and not the actual data
(qualitative terms).
3. This method can be used where the initial data are in the form of ranks.
Disadvantages of Spearman's Rank Correlation
1. It cannot be used for finding out correlation in a grouped frequency distribution.
2. This method should not be applied where N exceeds 30, because the calculations become tedious.
3. As Spearman's rank only uses rank, it is not affected by significant variations in
readings. As long as the order remains the same, the coefficient will stay the same. As
with any comparison, the possibility of chance will have to be evaluated to ensure that
the two quantities are actually connected.
4. A significant correlation does not necessarily mean cause and effect.
REGRESSION ANALYSIS*
* Note: In the syllabus for III Semester BA Economics paper Quantitative Methods for
Economic Analysis 1, though the title of this Module II is given as Correlation and Regression
Analysis, regression is not included in the contents. Hence here we give a brief discussion on
regression.
If two variables are significantly correlated, and if there is some theoretical basis for doing so, it
is possible to predict values of one variable from the other. This observation leads to a very
important concept known as Regression Analysis.
Regression analysis, in the general sense, means the estimation or prediction of the unknown value
of one variable from the known value of the other variable. It is one of the most important
statistical tools, extensively used in almost all sciences: natural, social and physical. It
is specially used in business and economics to study the relationship between two or more
variables that are related causally and for the estimation of demand and supply graphs, cost
functions, production and consumption functions and so on.
Prediction or estimation is one of the major problems in almost all the spheres of human activity.
The estimation or prediction of future production, consumption, prices, investments, sales,
profits, income etc. are of very great importance to business professionals. Similarly, population
estimates and population projections, GNP, Revenue and Expenditure etc. are indispensable for
economists and efficient planning of an economy.
Regression analysis was explained by M. M. Blair as follows:
Regression analysis is a mathematical measure of the average relationship between two or more
variables in terms of the original units of the data.
Regression Analysis is a very powerful tool in the field of statistical analysis for predicting the
value of one variable, given the value of another variable, when those variables are related to
each other. It is a mathematical measure of the average relationship between two or
more variables, and a statistical tool used to predict the value of an unknown
variable from a known variable.
Advantages of Regression Analysis
1. Regression analysis provides estimates of values of the dependent variables from the values of
independent variables.
2. Regression analysis also helps to obtain a measure of the error involved in using the
regression line as a basis for estimations.
3. Regression analysis helps in obtaining a measure of the degree of association or correlation
that exists between the two variables.
Assumptions in Regression Analysis
1. Existence of actual linear relationship.
2. The regression analysis is used to estimate the values within the range for which it is valid.
3. The relationship between the dependent and independent variables remains the same till the
regression equation is calculated.
4. The dependent variable takes any random value but the values of the independent variables are
fixed.
5. In regression, we have only one dependent variable in our estimating equation. However, we
can use more than one independent variable.
Regression line
A regression line summarizes the relationship between two variables in the setting when one of
the variables helps explain or predict the other.
A regression line is a straight line that describes how a response variable y changes as an
explanatory variable x changes. A regression line is used to predict the value of y for a given
value of x. Regression, unlike correlation, requires that we have an explanatory variable and a
response variable.
Regression line is the line which gives the best estimate of one variable from the value of any
other given variable. The regression line gives the average relationship between the two variables
in mathematical form.
For two variables X and Y, there are always two lines of regression:
Regression line of X on Y: gives the best estimate for the value of X for any specific given
value of Y:
X = a + bY
Where
a = X intercept
b = Slope of the line
X = Dependent variable
Y = Independent variable
Regression line of Y on X: gives the best estimate for the value of Y for any specific given
value of X:
Y = a + bX
Where
a = Y intercept
b = Slope of the line
Y = Dependent variable
X = Independent variable
Simple Linear Regression
Regression analysis is most often used for prediction. The goal in regression analysis is to create
a mathematical model that can be used to predict the values of a dependent variable based upon
the values of an independent variable. In other words, we use the model to predict the value of Y
when we know the value of X. (The dependent variable is the one to be predicted). Correlation
analysis is often used with regression analysis because correlation analysis is used to measure the
strength of association between the two variables X and Y.
In regression analysis involving one independent variable and one dependent variable the values
are frequently plotted in two dimensions as a scatter plot. The scatter plot allows us to visually
inspect the data prior to running a regression analysis. Often this step allows us to see if the
relationship between the two variables is increasing or decreasing and gives only a rough idea of
the relationship. The simplest relationship between two variables is a straight-line or linear
relationship. Of course the data may well be curvilinear and in that case we would have to use a
different model to describe the relationship. Simple linear regression analysis finds the straight
line that best fits the data.
Age in months (x)   Mean height in centimeters (y)
18                  76.1
19                  77.0
20                  78.1
21                  78.2
22                  78.8
23                  79.7
24                  79.9
25                  81.1
26                  81.2
27                  81.8
28                  82.8
29                  83.5
[Figure: scatter plot of Mean Height (vertical axis) against Age in months (horizontal axis)]
We can see on the plot a strong positive linear association with no outliers. The correlation is
r=0.994, close to the r = 1 of points that lie exactly on a line.
If we draw a line through the points, it will describe these data very well. This line is called the
regression line and the process of doing so is called Fitting a line. This is done in figure below.
Let y be a response variable and x an explanatory variable.
A straight line relating y to x has an equation of the form y = a + bx.
In this equation, b is the slope, the amount by which y changes when x increases by one unit.
The number a is the intercept, the value of y when x = 0.
The straight line describing the data has the form
height = a + (b × age).
In the figure below, the regression line has been drawn with the following equation:
height = 64.93 + (0.635 × age).
[Figure: Regression Line. The fitted line y = 0.635x + 64.92 drawn through the plot of Mean Height against Age in months]
The figure above shows that this line fits the data well.
The slope b = 0.635 tells us that the height of children increases by about 0.6 cm for each
month of age.
The slope b of a line y = a + bx is the rate of change in the response y as the explanatory
variable x changes.
The slope of a regression line is an important numerical description of the relationship
between the two variables.
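The slope b and the intercept a of the regression line of y on x are obtained by the method of
least squares as
\[ b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}, \qquad a = \frac{\sum y - b \sum x}{n} \]
In the regression equation the symbol y* refers to the predicted value of y for a given
value of x.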
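The fitted line for the age and height data can be reproduced with a short Python sketch, assuming the usual least-squares formulas for the slope and intercept. The function names are illustrative; the data are the twelve age and height pairs listed earlier.

```python
from math import sqrt

age = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]            # x, months
height = [76.1, 77.0, 78.1, 78.2, 78.8, 79.7,
          79.9, 81.1, 81.2, 81.8, 82.8, 83.5]                     # y, centimeters

def fit_line(x, y):
    """Least-squares intercept a and slope b of the line y = a + b x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)

a, b = fit_line(age, height)
r = pearson_r(age, height)
print(round(a, 2), round(b, 3), round(r, 3))  # 64.93 0.635 0.994
```

The slope of about 0.635 cm per month and the correlation of about 0.994 match the values quoted in the text.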
Let us see with the aid of an example how regression is used for prediction.
Example:
Scores made by students in a statistics class in the mid - term and final examination are
given here. Develop a regression equation which may be used to predict final
examination scores from the mid term score.
STUDENT   MID TERM   FINAL
1         98         90
2         66         74
3         100        98
4         96         88
5         88         80
6         45         62
7         76         78
8         60         74
9         74         86
10        82         80
Solution:
We want to predict the final exam scores from the mid term scores. So let us designate
y for the final exam scores and x for the mid term exam scores. We open the
following table for the calculations.
STUDENT   X     Y     X²      XY
1         98    90    9604    8820
2         66    74    4356    4884
3         100   98    10000   9800
4         96    88    9216    8448
5         88    80    7744    7040
6         45    62    2025    2790
7         76    78    5776    5928
8         60    74    3600    4440
9         74    86    5476    6364
10        82    80    6724    6560
Total     785   810   64521   65074
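\[ b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} = \frac{10(65074) - (785)(810)}{10(64521) - (785)^2} = \frac{14890}{28985} = 0.514 \]
\[ a = \frac{\sum y - b \sum x}{n} = \frac{810 - (0.514)(785)}{10} = \frac{810 - 403.49}{10} = \frac{406.51}{10} = 40.651 \]
The regression equation is therefore y* = 40.651 + 0.514x.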
We can use this to find the projected or estimated final scores of the students.
For example, for the midterm score of 50 the projected final score is
y* = 40.651 + (0.514)(50) = 40.651 + 25.70 = 66.351, which is quite a good estimate.
To give another example, consider the midterm score of 70. Then the projected final
score is
y* = 40.651 + (0.514)(70) = 40.651 + 35.98 = 76.631, which is again a very good estimate.
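The same regression can be checked in Python. This is a sketch using the least-squares formulas from the worked solution; `fit_line` and `predict_final` are illustrative names. Because the code keeps the slope unrounded, its intercept (about 40.67) and predictions differ slightly from the hand calculation, which rounds b to 0.514 before computing a.

```python
midterm = [98, 66, 100, 96, 88, 45, 76, 60, 74, 82]   # x
final = [90, 74, 98, 88, 80, 62, 78, 74, 86, 80]      # y

def fit_line(x, y):
    """Least-squares intercept a and slope b of the line y = a + b x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

a, b = fit_line(midterm, final)

def predict_final(midterm_score):
    """Projected final score y* for a given midterm score."""
    return a + b * midterm_score

print(round(b, 3))                      # 0.514
print(predict_final(50), predict_final(70))
```

The projected scores agree with the values in the text to within a few hundredths of a mark.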
call center may wish to know the relationship between wait times of callers and number of
complaints.
4. A fundamental driver of enhanced productivity in business and rapid economic advancement
around the globe during the 20th century was the frequent use of statistical tools in
manufacturing as well as service industries. Today, managers consider regression an
indispensable tool.
Correlation or Regression
Correlation and regression analysis are related in the sense that both deal with relationships
among variables. Whether to use Correlation or Regression in an analysis is often confusing for
researchers.
In regression the emphasis is on predicting one variable from the other, in correlation the
emphasis is on the degree to which a linear model may describe the relationship between two
variables. In regression the interest is directional, one variable is predicted and the other is the
predictor; in correlation the interest is non-directional, the relationship is the critical aspect.
Correlation makes no a priori assumption as to whether one variable is dependent on the other(s)
and is not concerned with the relationship between variables; instead it gives an estimate as to
the degree of association between the variables. In fact, correlation analysis tests for
interdependence of the variables.
As regression attempts to describe the dependence of a variable on one (or more) explanatory
variables; it implicitly assumes that there is a one-way causal effect from the explanatory
variable(s) to the response variable, regardless of whether the path of effect is direct or indirect.
There are advanced methods that allow a non-dependence based relationship to be
described (e.g. Principal Components Analysis or PCA) and these will be touched on later.
*************************
MODULE III
INDEX NUMBERS AND TIME SERIES ANALYSIS
Index Numbers: Meaning and Uses - Laspeyres', Paasche's, Fisher's, Dorbish-Bowley,
Marshall-Edgeworth and Kelly's Methods - Tests of Index Numbers: Time Reversal and Factor
Reversal tests - Base Shifting, Splicing and Deflating - Special Purpose Indices: Wholesale Price
Index, Consumer Price Index and Stock Price Indices: BSE SENSEX and NSE-NIFTY. Time
Series Analysis - Components of Time Series, Measurement of Trend by Moving Average and the
Method of Least Squares.
Introduction
Historically, the first index was constructed in 1764 to compare the Italian price index in
1750 with the price level in 1500. Though originally developed for measuring the effect of
change in prices, index numbers have today become one of the most widely used statistical
devices and there is hardly any field where they are not used. Newspapers headline the fact that
prices are going up or down, that industrial production is rising or falling, that imports are
increasing or decreasing, that crimes are rising in a particular period compared to the previous
period as disclosed by index numbers. They are used to feel the pulse of the economy and they
have come to be used as indicators of inflationary or deflationary tendencies. In fact, they are
described as barometers of economic activity, i.e., if one wants to get an idea as to what is
happening to an economy, one should look at important indices like the index number of
industrial production, agricultural production, business activity, etc.
Of the important statistical devices and techniques, Index Numbers have today become one of
the most widely used for judging the pulse of economy, although in the beginning they were
originally constructed to gauge the effect of changes in prices. Today we use index numbers for
cost of living, industrial production, agricultural production, imports and exports, etc.
Index numbers are the indicators which measure percentage changes in a variable (or a group of
variables) over a specified time. For example, if we say that the index of export for the year 2013
is 125, taking base year as 2010, it means that there is an increase of 25% in the country's export
as compared to the corresponding figure for the year 2010.
Definitions of Index number
According to
Spiegel: An index number is a statistical measure, designed to measure changes in a variable,
or a group of related variables with respect to time, geographical location or other characteristics
such as income, profession, etc.
Patternson: In its simplest form, an index number is the ratio of two index numbers expressed as
a percent. An index is a statistical measure, a measure designed to show changes in one variable
or a group of related variables over time, with respect to geographical location or other
characteristics.
Bowley: Index numbers are used to measure the changes in some quantity which we cannot
observe directly
We can thus say that index numbers are economic barometers to judge the inflationary (increase in
prices) or deflationary (decrease in prices) tendencies of the economy. They help the government
in adjusting its policies in case of inflationary situations.
consider all items of production and each item may have undergone a different fractional
increase (or even a decrease). How do we obtain a composite measure? This composite measure
is provided by index numbers, which may be defined as a device for combining the variations that
have occurred in a group of related variables over a period of time, with a view to obtaining a figure that
represents the net result of the change in the constituent variables.
Index numbers may be classified in terms of the variables that they are intended to measure. In
business, different groups of variables in the measurement of which index number techniques are
commonly used are (i) price, (ii) quantity, (iii) value and (iv) business activity. Thus, we have
index of wholesale prices, index of consumer prices, index of industrial output, index of value of
exports and index of business activity, etc. Here we shall be mainly interested in index numbers
of prices showing changes with respect to time, although methods described can be applied to
other cases. In general, the present level of prices is compared with the level of prices in the past.
The present period is called the current period and some period in the past is called the base
period.
1) Index numbers are used as economic barometers:
An index number is a special type of average which helps to measure economic
fluctuations in the price level, the money market and economic cycles such as inflation and deflation.
G. Simpson and F. Kafka say that index numbers are today one of the most widely used
statistical devices. They are used to take the pulse of the economy and they are used as indicators
of inflationary or deflationary tendencies. So index numbers are called economic barometers.
2) Index numbers help in formulating suitable economic policies and planning etc.
Many of the economic and business policies are guided by index numbers. For
example while deciding the increase of DA of the employees; the employers have to depend
primarily on the cost of living index. If salaries or wages are not increased according to the
cost of living it leads to strikes, lock outs etc. The index numbers provide some guide lines that
one can use in making decisions.
3) They are used in studying trends and tendencies.
Since index numbers are most widely used for measuring changes over a period of
time, the time series so formed enable us to study the general trend of the phenomenon under
study. For example for last 8 to 10 years we can say that imports are showing upward
tendency.
4) They are useful in forecasting future economic activity.
Index numbers are used not only in studying the past and present workings of our
economy but are also important in forecasting future economic activity.
5) Index numbers measure the purchasing power of money.
The cost of living index numbers determine whether the real wages are rising, falling
or remaining constant. The real wages can be obtained by dividing the money wages by the
corresponding price index and multiplying by 100. Real wages help us in determining the
purchasing power of money.
6) Index numbers are used in deflating.
Index numbers are highly useful in deflating i.e. they are used to adjust the wages for cost of
living changes and thus transform nominal wages into real wages, nominal income to real
income, nominal sales to real sales etc. through appropriate index numbers.
(a)Unweighted indices
(i) Simple Aggregative method
(ii) Simple average of price relative method
(b)Weighted indices
(i) Weighted Aggregative Indices
1. Laspeyres' Method
2. Paasche's Method
3. Dorbish and Bowley's Method
4. Fisher's Ideal Method
5. Marshall-Edgeworth Method, and
6. Kelley's Method
(ii) Weighted Average of relatives
Let us see them in detail.
a (i) Simple Aggregative Method
This is a simple method for constructing index numbers. In this method, the total of the prices of
commodities in a given (current) year is divided by the total of the prices of commodities in a
base year and expressed as a percentage.
$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$
where P0 denotes base year prices and P1 denotes current year prices.
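Solution :
Calculation of simple aggregative index number for 2013 (against the year 2010) using the
formula.

| Commodity | Price in the year 2010 (P0) | Price in the year 2013 (P1) |
|---|---|---|
| A | 60 | 80 |
| B | 50 | 60 |
| C | 70 | 100 |
| D | 120 | 160 |
| E | 100 | 150 |
| Total | 400 | 550 |

$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100 = \frac{550}{400} \times 100 = 137.50$$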
This means that the price index for the year 2013, taking 2010 as base year, is 137.5, showing
that there is an increase of 37.5% in the prices in 2013 as against 2010.
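The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `simple_aggregative_index` is chosen here for illustration and is not part of the text.

```python
# Prices of commodities A to E from the worked example.
prices_2010 = [60, 50, 70, 120, 100]   # base year prices, P0
prices_2013 = [80, 60, 100, 160, 150]  # current year prices, P1


def simple_aggregative_index(current, base):
    """Total of current prices divided by total of base prices, as a percentage."""
    return sum(current) / sum(base) * 100


index_2013 = simple_aggregative_index(prices_2013, prices_2010)
print(f"Simple aggregative index for 2013 (2010 = 100): {index_2013:.2f}")
```

Because only the two totals enter the formula, a commodity with a high unit price dominates the result, which is the weakness of this method.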
Example 2
Compute the index number for the years 2011, 2012, 2013 and 2014, taking 2010 as base year,
from the following data.
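| Year | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|
| Price | 120 | 144 | 168 | 204 | 216 |

Solution :
Price relatives for different years are

| Year | Price | Price relative (2010 = 100) |
|---|---|---|
| 2010 | 120 | 120/120 × 100 = 100 |
| 2011 | 144 | 144/120 × 100 = 120 |
| 2012 | 168 | 168/120 × 100 = 140 |
| 2013 | 204 | 204/120 × 100 = 170 |
| 2014 | 216 | 216/120 × 100 = 180 |

| Year | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|
| Index | 100 | 120 | 140 | 170 | 180 |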
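As a quick check of the price relatives above, the same series can be computed in Python. This is a sketch only; the variable names are illustrative.

```python
# Yearly prices from Example 2; 2010 is the base year.
prices = {2010: 120, 2011: 144, 2012: 168, 2013: 204, 2014: 216}
base_year = 2010

# Price relative = (price in the given year / price in the base year) * 100
price_relatives = {
    year: price / prices[base_year] * 100 for year, price in prices.items()
}

for year, relative in price_relatives.items():
    print(year, round(relative, 2))
```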
a (ii) Simple Average of Price Relatives Method
When this method is used to construct a price index, first of all price relatives are
obtained for the various items included in the index and then an average of these relatives is
obtained using any one of the measures of central value, i.e., arithmetic mean, median, mode,
geometric mean or harmonic mean. When the arithmetic mean is used for averaging the relatives, the
formula for computing the index is:
$$P_{01} = \frac{\sum \left( \frac{P_1}{P_0} \times 100 \right)}{N}$$
where P01 is the price index, N is the number of items, P0 is the price
in the base year and P1 is the price of the corresponding commodity in the present year (for which the index
is to be calculated).
Example
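Construct by simple average of price relatives method the price index of 2013, taking 2010 as
base year, from the following data

| Item | Price in 2010 | Price in 2013 |
|---|---|---|
| 1 | 60 | 80 |
| 2 | 50 | 60 |
| 3 | 60 | 72 |
| 4 | 50 | 75 |
| 5 | 25 | 37.5 |
| 6 | 20 | 30 |

Solution
Find the price relative for each item, take the sum, and substitute in the formula.

| Item | Price in 2010 (P0) | Price in 2013 (P1) | Price relative (P1/P0 × 100) |
|---|---|---|---|
| 1 | 60 | 80 | 80/60 × 100 = 133.33 |
| 2 | 50 | 60 | 60/50 × 100 = 120.00 |
| 3 | 60 | 72 | 72/60 × 100 = 120.00 |
| 4 | 50 | 75 | 75/50 × 100 = 150.00 |
| 5 | 25 | 37.5 | 37.5/25 × 100 = 150.00 |
| 6 | 20 | 30 | 30/20 × 100 = 150.00 |
| Total | | | 823.33 |

Substituting we get
$$P_{01} = \frac{\sum \left( \frac{P_1}{P_0} \times 100 \right)}{N} = \frac{823.33}{6} = 137.22$$
Price index for 2013, taking 2010 as base year = 137.22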
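The simple average of price relatives can be reproduced as follows. This is a minimal sketch using the arithmetic mean; the list names are illustrative, and the six items are taken in the order in which they appear in the example.

```python
# Base year and current year prices of the six items in the example.
p0 = [60, 50, 60, 50, 25, 20]
p1 = [80, 60, 72, 75, 37.5, 30]

# Step 1: price relative of each item.
relatives = [current / base * 100 for base, current in zip(p0, p1)]

# Step 2: arithmetic mean of the relatives.
index_am = sum(relatives) / len(relatives)

print([round(r, 2) for r in relatives])
print(round(index_am, 2))
```

Here every item counts equally, whatever its unit price, because each relative is a pure ratio.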
An un-weighted aggregate price index represents the changes in prices, over time, for an
entire group of commodities. However, an un-weighted aggregate price index has two
shortcomings. First, this index considers each commodity in the group as equally
important. Thus, the most expensive commodities per unit are overly influential. Second,
not all the commodities are consumed at the same rate. In an un-weighted index, changes
in the price of the least consumed commodities are overly influential.
The primary disadvantage of the Laspeyres Method is that it does not take into consideration the
consumption pattern. The Laspeyres Index has an upward bias. When the prices increase, there
is a tendency to reduce the consumption of higher priced items. Similarly when prices decline,
consumers shift their purchase to those items which decline the most.
b. i. (ii) Paasche's Method
Under this method weights are determined by quantities in the given (current) year
$$P_{01} = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times 100$$
The Paasche price index uses the consumption quantities in the year of interest instead of using
the initial quantities. Thus, the Paasche index is a more accurate reflection of total consumption
costs at that point in time. However, there are two major drawbacks of the Paasche index. First,
accurate consumption values for current purchases are often difficult to obtain. Thus, many
important indices, such as the consumer price index (CPI), use the Laspeyres method. Second, if
a particular product increases greatly in price compared to the other items in the market basket,
consumers will avoid the high-priced item out of necessity, not because of changes in what they
might prefer to purchase.
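b. i. (iii) Dorbish and Bowley's Method
Dorbish and Bowley's index is the arithmetic mean of Laspeyres' and Paasche's indices:
$$P_{01} = \frac{L + P}{2}$$
where L = Laspeyres' Index and P = Paasche's Index
OR it may be written as
$$P_{01} = \frac{\frac{\sum P_1 Q_0}{\sum P_0 Q_0} + \frac{\sum P_1 Q_1}{\sum P_0 Q_1}}{2} \times 100$$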
The geometric mean of Laspeyres' and Paasche's price indices is called Fisher's price index.
Fisher's price index uses both current year and base year quantities as weights. This index corrects
the positive bias inherent in the Laspeyres' index and the negative bias inherent in the Paasche's
index. Fisher's price index is also a weighted aggregative price index because it is an average
(G.M.) of two weighted aggregative indices. The computational formula for Fisher's ideal price
index is:
$$P_{01} = \sqrt{L \times P}$$
OR
$$P_{01} = \sqrt{\frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times \frac{\sum P_1 Q_1}{\sum P_0 Q_1}} \times 100$$
Fisher's Index is known as ideal because (1) it is based on the geometric mean, which
is considered to be the best average for constructing index numbers. (2) It takes into account
both current as well as base year prices and quantities (3) It satisfies both time reversal as well
as the factor reversal tests (which we will study soon) and (4) it is free from bias.
It is not, however, a practical index to compute because it is excessively laborious.
The data, particularly for the Paasche segment of the index, are not readily available.
b.i. (v) Marshall-Edgeworth Method
If the weights are taken as the arithmetic mean of base and current year quantities, then the
weighted aggregative index is called the Marshall-Edgeworth index. Like Fisher's index, the
Marshall-Edgeworth index also requires too much labour in the selection of commodities. In some cases the
use of this index is not suitable, for example in comparing the price level of a large
country with that of a small country. The Marshall-Edgeworth index can be calculated by using the formula
given below.
$$P_{01} = \frac{\sum P_1 (Q_0 + Q_1)}{\sum P_0 (Q_0 + Q_1)} \times 100 = \frac{\sum P_1 Q_0 + \sum P_1 Q_1}{\sum P_0 Q_0 + \sum P_0 Q_1} \times 100$$
It is a simple, readily constructed measure, giving a very close approximation to the
results obtained by the ideal formula.
The Marshall-Edgeworth formula uses the arithmetic mean of the quantities purchased in the
base and current periods as weights. Like the Fisher 'Ideal' index it is impracticable to use as a
timely indicator of price change because it requires the use of quantities purchased in the current
period. In practice, the Marshall-Edgeworth index and the Fisher 'Ideal' index give similar
results.
b.i. (vi) Kelley's Method
According to Truman L. Kelley, the formula for constructing index numbers is
$$P_{01} = \frac{\sum P_1 q}{\sum P_0 q} \times 100$$
where q refers to the quantities of some period, not necessarily the base year or current year.
Example 1
From the following data calculate Price Index Numbers for 2013 with 2000 as base year by using
(i) Laspeyres' Method (ii) Paasche's Method (iii) Dorbish and Bowley's Method (iv) Fisher's Ideal
Index (v) Marshall-Edgeworth Method

| Commodity | Price (2000) | Quantity (2000) | Price (2013) | Quantity (2013) |
|---|---|---|---|---|
| A | 20 | 8 | 40 | 6 |
| B | 50 | 10 | 60 | 5 |
| C | 40 | 15 | 50 | 15 |
| D | 20 | 20 | 20 | 25 |
Solution
Let us first compute the necessary values.
(i) Laspeyres' Method

| Commodity | P0 (2000) | Q0 (2000) | P1 (2013) | Q1 (2013) | P1Q0 | P0Q0 |
|---|---|---|---|---|---|---|
| A | 20 | 8 | 40 | 6 | 320 | 160 |
| B | 50 | 10 | 60 | 5 | 600 | 500 |
| C | 40 | 15 | 50 | 15 | 750 | 600 |
| D | 20 | 20 | 20 | 25 | 400 | 400 |
| Total | | | | | 2070 | 1660 |

$$P_{01} = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100 = \frac{2070}{1660} \times 100 = 124.70$$
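(ii) Paasche's Method

| Commodity | P0 (2000) | Q0 (2000) | P1 (2013) | Q1 (2013) | P1Q1 | P0Q1 |
|---|---|---|---|---|---|---|
| A | 20 | 8 | 40 | 6 | 240 | 120 |
| B | 50 | 10 | 60 | 5 | 300 | 250 |
| C | 40 | 15 | 50 | 15 | 750 | 600 |
| D | 20 | 20 | 20 | 25 | 500 | 500 |
| Total | | | | | 1790 | 1470 |

$$P_{01} = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times 100 = \frac{1790}{1470} \times 100 = 121.77$$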
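(iii) Dorbish and Bowley's Method
$$P_{01} = \frac{L + P}{2} = \frac{124.70 + 121.77}{2} = \frac{246.47}{2} = 123.23$$
(iv) Fisher's Ideal Index
$$P_{01} = \sqrt{L \times P} = \sqrt{124.70 \times 121.77} \approx 123.23$$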
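(v) Marshall-Edgeworth Method

| Commodity | P0 (2000) | Q0 (2000) | P1 (2013) | Q1 (2013) | P1Q1 | P0Q1 | P1Q0 | P0Q0 |
|---|---|---|---|---|---|---|---|---|
| A | 20 | 8 | 40 | 6 | 240 | 120 | 320 | 160 |
| B | 50 | 10 | 60 | 5 | 300 | 250 | 600 | 500 |
| C | 40 | 15 | 50 | 15 | 750 | 600 | 750 | 600 |
| D | 20 | 20 | 20 | 25 | 500 | 500 | 400 | 400 |
| Total | | | | | 1790 | 1470 | 2070 | 1660 |

$$P_{01} = \frac{\sum P_1 Q_0 + \sum P_1 Q_1}{\sum P_0 Q_0 + \sum P_0 Q_1} \times 100 = \frac{2070 + 1790}{1660 + 1470} \times 100 = \frac{3860}{3130} \times 100 = 123.32$$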
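The five weighted aggregative indices of this example can be computed together. The sketch below assumes the data as read from the solution tables (2000 as base year, 2013 as current year); the helper name `aggregate` is illustrative.

```python
from math import sqrt

# Commodities A to D: base year (2000) and current year (2013) data.
p0 = [20, 50, 40, 20]
q0 = [8, 10, 15, 20]
p1 = [40, 60, 50, 20]
q1 = [6, 5, 15, 25]


def aggregate(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


laspeyres = aggregate(p1, q0) / aggregate(p0, q0) * 100   # base year weights
paasche = aggregate(p1, q1) / aggregate(p0, q1) * 100     # current year weights
dorbish_bowley = (laspeyres + paasche) / 2                # arithmetic mean of L and P
fisher = sqrt(laspeyres * paasche)                        # geometric mean of L and P
marshall_edgeworth = (
    (aggregate(p1, q0) + aggregate(p1, q1))
    / (aggregate(p0, q0) + aggregate(p0, q1))
    * 100
)

for name, value in [
    ("Laspeyres", laspeyres),
    ("Paasche", paasche),
    ("Dorbish-Bowley", dorbish_bowley),
    ("Fisher", fisher),
    ("Marshall-Edgeworth", marshall_edgeworth),
]:
    print(f"{name}: {value:.2f}")
```

The last three indices all fall between the Laspeyres and Paasche values, since each is some form of average of the two weighting schemes.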
Example 2
Compute index number from the following data

| Materials | Unit | Quantity required | Price in 2000 | Price in 2010 |
|---|---|---|---|---|
| Cement | 100 lb | 500 lb | 5.0 | 8.0 |
| Timber | c.ft. | 2000 c.ft. | 9.5 | 14.2 |
| Steel | cwt. | 50 cwt. | 34.0 | 42.0 |
| Bricks | per '000 | 20000 | 12.0 | 24.0 |
Solution
Since the quantities (weights) required of different materials are fixed for both base year and
current year, we will use Kelley's formula.
For materials we have to do certain conversions. For example, for cement the unit is 100 lb and
the quantity required is 500 lb. Hence, the quantity consumed per unit for cement is 500/100 =
5. Similarly, the quantity consumed per unit for bricks is 20000/1000 = 20.
By Kelley's Method,
$$P_{01} = \frac{\sum P_1 q}{\sum P_0 q} \times 100$$

| Materials | Unit | Quantity required | q | P0 (2000, Rs.) | P1 (2010, Rs.) | P0q | P1q |
|---|---|---|---|---|---|---|---|
| Cement | 100 lb | 500 lb | 5 | 5.0 | 8.0 | 25 | 40 |
| Timber | c.ft. | 2000 c.ft. | 2000 | 9.5 | 14.2 | 19000 | 28400 |
| Steel | cwt. | 50 cwt. | 50 | 34.0 | 42.0 | 1700 | 2100 |
| Bricks | per '000 | 20000 | 20 | 12.0 | 24.0 | 240 | 480 |
| Total | | | | | | 20965 | 31020 |

Substituting
$$P_{01} = \frac{31020}{20965} \times 100 = 1.4796 \times 100 = 147.96$$
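The fixed-weight calculation above can be sketched in Python as follows. The data are those of the example, with the steel price for 2010 taken as 42.0 as used in the computation; the variable names are illustrative.

```python
# Quantity required expressed in pricing units:
# cement 500 lb at a unit of 100 lb, bricks 20000 at a unit of 1000.
q = [500 / 100, 2000, 50, 20000 / 1000]   # cement, timber, steel, bricks
p0 = [5.0, 9.5, 34.0, 12.0]               # prices in 2000
p1 = [8.0, 14.2, 42.0, 24.0]              # prices in 2010

sum_p0q = sum(p * w for p, w in zip(p0, q))
sum_p1q = sum(p * w for p, w in zip(p1, q))

# Fixed-weight aggregative index: the same weights q apply to both years.
kelley_index = sum_p1q / sum_p0q * 100
print(round(sum_p0q, 2), round(sum_p1q, 2), round(kelley_index, 2))
```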
relative importance of those commodities in the group. Thus the index for the whole group is
obtained by taking the weighted average of the price relatives. To find the average, the Arithmetic
Mean or the Geometric Mean can be used.
$$P_{01} = \frac{\sum PV}{\sum V}$$
P = Price relative, i.e. (P1/P0) × 100
V = Value weights, i.e. P0Q0
Example:
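From the following data compute the price index by applying the weighted average of price relatives
method using the Arithmetic Mean

| Commodity | P0 | Q0 | P1 |
|---|---|---|---|
| Sugar | 3.0 | 20 Kg. | 4.0 |
| Flour | 1.5 | 40 Kg. | 1.6 |
| Milk | 1.0 | 10 Lit. | 1.5 |

Solution

| Commodity | P0 | Q0 | P1 | V = P0Q0 | P = P1/P0 × 100 | PV |
|---|---|---|---|---|---|---|
| Sugar | 3.0 | 20 Kg. | 4.0 | 60 | 4.0/3.0 × 100 | 8000 |
| Flour | 1.5 | 40 Kg. | 1.6 | 60 | 1.6/1.5 × 100 | 6400 |
| Milk | 1.0 | 10 Lit. | 1.5 | 10 | 1.5/1.0 × 100 | 1500 |
| Total | | | | ΣV = 130 | | ΣPV = 15900 |

$$P_{01} = \frac{\sum PV}{\sum V} = \frac{15900}{130} = 122.31$$
where V = value weights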
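The above example can be reworked using the G.M. as follows.

| Commodity | P0 | Q0 | P1 | V = P0Q0 | P | Log P | V Log P |
|---|---|---|---|---|---|---|---|
| Sugar | 3.0 | 20 Kg. | 4.0 | 60 | 133.3 | 2.1249 | 127.494 |
| Flour | 1.5 | 40 Kg. | 1.6 | 60 | 106.7 | 2.0282 | 121.692 |
| Milk | 1.0 | 10 Lit. | 1.5 | 10 | 150.0 | 2.1761 | 21.761 |
| Total | | | | ΣV = 130 | | | ΣV Log P = 270.947 |

$$P_{01} = \text{Antilog} \left( \frac{\sum V \log P}{\sum V} \right) = \text{Antilog} \left( \frac{270.947}{130} \right) = \text{Antilog } 2.0842 = 121.4$$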
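Both versions of the weighted average of relatives can be checked numerically. The sketch below uses base year values (P0 × Q0) as weights, as in the example; computed without rounding the logarithms, the geometric mean version comes out at about 121.4.

```python
from math import log10

# Sugar, flour, milk.
p0 = [3.0, 1.5, 1.0]
q0 = [20, 40, 10]
p1 = [4.0, 1.6, 1.5]

weights = [p * q for p, q in zip(p0, q0)]               # V = P0 * Q0
relatives = [c / b * 100 for b, c in zip(p0, p1)]       # P = P1 / P0 * 100

# Weighted arithmetic mean of relatives: sum(P * V) / sum(V)
index_am = sum(r * v for r, v in zip(relatives, weights)) / sum(weights)

# Weighted geometric mean of relatives: antilog(sum(V * log P) / sum(V))
mean_log = sum(v * log10(r) for r, v in zip(relatives, weights)) / sum(weights)
index_gm = 10 ** mean_log

print(round(index_am, 2), round(index_gm, 2))
```

The geometric mean is never larger than the arithmetic mean of the same relatives with the same weights, which is why the second figure is the lower of the two.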
Merits of Weighted Average of Relatives Indices
When different index numbers are constructed by the average of relatives method, all of
which have the same base, they can be combined to form a new index.
When an index is computed by selecting one item from each of the many sub-groups of
items, the values of each sub-group may be used as weights. Then only the method of
weighted average of relatives is appropriate.
When a new commodity is introduced to replace the one formerly used, the relative for
the new item may be spliced to the relative for the old one, using the former value weights.
The price or quantity relatives for each single item in the aggregate are, in effect, themselves
a simple index that often yields valuable information for analysis.
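Time reversal test: the condition is
$$P_{01} \times P_{10} = 1$$
Where P01 is the price index for year 1 with year 0 as base year and P10 is the price index for
year 0 with year 1 as base year (both expressed as ratios, without the factor 100).
This test is satisfied by neither Laspeyres' nor Paasche's index numbers.
Laspeyres' Method:
$$P_{01} \times P_{10} = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times \frac{\sum P_0 Q_1}{\sum P_1 Q_1} \neq 1$$
Paasche's Method:
$$P_{01} \times P_{10} = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times \frac{\sum P_0 Q_0}{\sum P_1 Q_0} \neq 1$$
Factor reversal test: if P01 stands for the price relative for the year 1 with base year 0 and Q01 stands for
the quantity relative for the year 1 with base year 0, then the condition is
$$P_{01} \times Q_{01} = \frac{\sum P_1 Q_1}{\sum P_0 Q_0}$$
This test is also satisfied by neither Laspeyres' nor Paasche's index numbers.
Laspeyres' Formula:
$$P_{01} \times Q_{01} = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times \frac{\sum Q_1 P_0}{\sum Q_0 P_0} \neq \frac{\sum P_1 Q_1}{\sum P_0 Q_0}$$
Paasche's Formula:
$$P_{01} \times Q_{01} = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times \frac{\sum Q_1 P_1}{\sum Q_0 P_1} \neq \frac{\sum P_1 Q_1}{\sum P_0 Q_0}$$
Fisher's Formula:
$$P_{01} \times Q_{01} = \sqrt{\frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times \frac{\sum P_1 Q_1}{\sum P_0 Q_1}} \times \sqrt{\frac{\sum Q_1 P_0}{\sum Q_0 P_0} \times \frac{\sum Q_1 P_1}{\sum Q_0 P_1}} = \frac{\sum P_1 Q_1}{\sum P_0 Q_0}$$
Fisher's formula satisfies both the time reversal and the factor reversal tests. This is why
Fisher's formula is often called Fisher's Ideal Index Number.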
Example
For the following data prove that Fisher's Ideal Index satisfies both the Time Reversal Test
and the Factor Reversal Test.

| Commodity | Base Year Price | Base Year Quantity | Current Year Price | Current Year Quantity |
|---|---|---|---|---|
| A | 6 | 50 | 10 | 56 |
| B | 2 | 100 | 2 | 120 |
| C | 4 | 60 | 6 | 60 |
| D | 10 | 30 | 12 | 24 |
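Solution

| Commodity | P0 | Q0 | P1 | Q1 | P0Q0 | P0Q1 | P1Q0 | P1Q1 |
|---|---|---|---|---|---|---|---|---|
| A | 6 | 50 | 10 | 56 | 300 | 336 | 500 | 560 |
| B | 2 | 100 | 2 | 120 | 200 | 240 | 200 | 240 |
| C | 4 | 60 | 6 | 60 | 240 | 240 | 360 | 360 |
| D | 10 | 30 | 12 | 24 | 300 | 240 | 360 | 288 |
| Total | | | | | ΣP0Q0 = 1040 | ΣP0Q1 = 1056 | ΣP1Q0 = 1420 | ΣP1Q1 = 1448 |

Time reversal test: P01 × P10 = 1
$$P_{01} = \sqrt{\frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times \frac{\sum P_1 Q_1}{\sum P_0 Q_1}}, \qquad P_{10} = \sqrt{\frac{\sum P_0 Q_1}{\sum P_1 Q_1} \times \frac{\sum P_0 Q_0}{\sum P_1 Q_0}}$$
Substituting
$$P_{01} \times P_{10} = \sqrt{\frac{1420}{1040} \times \frac{1448}{1056} \times \frac{1056}{1448} \times \frac{1040}{1420}} = \sqrt{1} = 1$$
Hence the time reversal test is satisfied.
Factor reversal test: P01 × Q01 = ΣP1Q1 / ΣP0Q0
$$Q_{01} = \sqrt{\frac{\sum Q_1 P_0}{\sum Q_0 P_0} \times \frac{\sum Q_1 P_1}{\sum Q_0 P_1}} = \sqrt{\frac{1056}{1040} \times \frac{1448}{1420}}$$
$$P_{01} \times Q_{01} = \sqrt{\frac{1420}{1040} \times \frac{1448}{1056} \times \frac{1056}{1040} \times \frac{1448}{1420}} = \sqrt{\frac{1448 \times 1448}{1040 \times 1040}} = \frac{1448}{1040} = \frac{\sum P_1 Q_1}{\sum P_0 Q_0}$$
Hence the factor reversal test is also satisfied.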
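The two tests can also be verified numerically for this data set. The sketch below works with ratios (without the factor 100); the function name `fisher_ratio` is chosen here for illustration. Swapping the roles of the two years gives P10, and swapping prices with quantities gives the quantity index Q01.

```python
from math import isclose, sqrt

# Commodities A to D.
p0 = [6, 2, 4, 10]
q0 = [50, 100, 60, 30]
p1 = [10, 2, 6, 12]
q1 = [56, 120, 60, 24]


def aggregate(a, b):
    """Sum of pairwise products."""
    return sum(x * y for x, y in zip(a, b))


def fisher_ratio(p_base, q_base, p_cur, q_cur):
    """Fisher's index as a ratio: geometric mean of the Laspeyres and Paasche ratios."""
    laspeyres = aggregate(p_cur, q_base) / aggregate(p_base, q_base)
    paasche = aggregate(p_cur, q_cur) / aggregate(p_base, q_cur)
    return sqrt(laspeyres * paasche)


p01 = fisher_ratio(p0, q0, p1, q1)   # price index, year 1 on base year 0
p10 = fisher_ratio(p1, q1, p0, q0)   # price index, year 0 on base year 1
q01 = fisher_ratio(q0, p0, q1, p1)   # quantity index: prices and quantities swapped

value_ratio = aggregate(p1, q1) / aggregate(p0, q0)

print("Time reversal holds:", isclose(p01 * p10, 1.0))
print("Factor reversal holds:", isclose(p01 * q01, value_ratio))
```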
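The following series is given with base year 2000. Convert it into a new series with base
year 2003.

| Year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|---|---|---|---|---|---|
| Index | 100 | 130 | 145 | 155 | 205 | 255 |

| Year | Index (Base = 2000) | Index (Base = 2003) |
|---|---|---|
| 2000 | 100 | 100/155 × 100 = 64.52 |
| 2001 | 130 | 130/155 × 100 = 83.87 |
| 2002 | 145 | 145/155 × 100 = 93.55 |
| 2003 | 155 | 155/155 × 100 = 100.00 |
| 2004 | 205 | 205/155 × 100 = 132.26 |
| 2005 | 255 | 255/155 × 100 = 164.52 |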
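Base shifting amounts to dividing every index in the series by the index of the new base year and multiplying by 100. A minimal sketch with the series above (the function name `shift_base` is illustrative):

```python
# Index numbers with base year 2000.
old_series = {2000: 100, 2001: 130, 2002: 145, 2003: 155, 2004: 205, 2005: 255}


def shift_base(series, new_base_year):
    """Re-express every index as a percentage of the index of the new base year."""
    new_base_value = series[new_base_year]
    return {year: value / new_base_value * 100 for year, value in series.items()}


new_series = shift_base(old_series, 2003)
for year, value in new_series.items():
    print(year, round(value, 2))
```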
| Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 |
|---|---|---|---|---|---|---|
| Index | 100 | 105 | 110 | 107 | 112 | 107 |
| Year | Index | Index as a percentage of the preceding year's index |
|---|---|---|
| 2005 | 100 | - |
| 2006 | 105 | 105/100 × 100 = 105 |
| 2007 | 115 | 115/105 × 100 = 109.52 |
| 2008 | 130 | 130/115 × 100 = 113.04 |
| 2009 | 150 | 150/130 × 100 = 115.38 |
| 2010 | 175 | 175/150 × 100 = 116.67 |
Splicing of index numbers means combining two or more series of overlapping index numbers to
obtain a single series of index numbers on a common base. This is done by the same technique as used in
base shifting.
It is of two types: (i) Splicing of new index numbers to old index numbers
(ii) Splicing of old index numbers to new index numbers.
Splicing of index numbers can be done only if the index numbers are constructed with the same
items and have an overlapping year. Suppose we have an index number with a base year of 2001
and another index number (using the same items as the first one) with a base of 2011. Suppose
both index numbers are continuing. Then we can splice the first series of index numbers to the
second series and have a common index with base 2011. We can also splice index number series
two to series one and have a common index number with base 2001. Splicing is generally
done when an old index number with an old base is being discontinued and a new index with a
new base is being started.
The following formula must be used in this method of splicing
$$\text{Index number after splicing} = \frac{\text{index number to be spliced} \times \text{old index number of existing base}}{100}$$
Example
Index Number A given below was started in 1981 and discontinued in 2001 when another index
B was started which continues up to date. From the data given in the table below splice the index
number B to index number A so that a continuous series of index numbers from 1981 up to date
is available.
Splicing of Index B to Index A
Here we multiply index B by a common factor, which is the ratio of index A to index B in
the overlapping year 2001, i.e. 200/100.

| Year | Index A | Index B | Index B spliced to A |
|---|---|---|---|
| 1981 | 100 | - | - |
| ... | | | |
| 2000 | 180 | - | - |
| 2001 | 200 | 100 | 200/100 × 100 = 200 |
| 2002 | | 120 | 200/100 × 120 = 240 |
| 2003 | | 140 | 200/100 × 140 = 280 |
| ... | | | |
| 2013 | | 250 | 200/100 × 250 = 500 |

Thus we have a continuous series of index numbers with base 1981 which continues up to date.
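The splicing step can be sketched as follows, using only the years for which the example gives figures. The splicing factor is the value of index A in the overlapping year divided by the value of index B in that year; the variable names are illustrative.

```python
# Index A (base 1981) and index B (base 2001) overlap in 2001.
index_a_in_overlap_year = 200
index_b = {2001: 100, 2002: 120, 2003: 140, 2013: 250}
overlap_year = 2001

# Multiply every value of B by A(overlap) / B(overlap).
splicing_factor = index_a_in_overlap_year / index_b[overlap_year]
b_spliced_to_a = {year: value * splicing_factor for year, value in index_b.items()}

for year, value in b_spliced_to_a.items():
    print(year, round(value, 2))
```

Splicing in the other direction (old index to new) would use the reciprocal factor, B(overlap) / A(overlap), applied to the values of index A.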
DEFLATING THE INDEX NUMBERS
By deflating we mean making allowances for the effect of changing price levels. A rise in the price
level means a reduction in the purchasing power of money. To take the case of a single
commodity, suppose the price of wheat rises from Rs. 500 per quintal in 1999 to Rs. 1,000 per
quintal in 2009. It means that in 2009 one can buy only half as much wheat if he spends the same
amount which he was spending on wheat in 1999. Thus the value (or purchasing power) of a
rupee is simply the reciprocal of an appropriate price index written as a proportion. If prices
increase by 60 per cent, the price index is 1.60 and what a rupee will buy is only 1/1.60 or 5/8 of
what it used to buy. In other words the purchasing power of the rupee is 5/8 of what it was.
Similarly, if prices increase by 25 per cent the price index is 1.25 (125 per cent), and the
purchasing power of the rupee is 1/1.25 = 0.80.
$$\text{Thus the purchasing power of money} = \frac{1}{\text{price index}}$$
In times of rising prices the money wages should be deflated by the price index to get the
figure of real wages. The real wages alone tell whether a wage earner is in a better position or in a
worse position.
For calculating the real wage, the money wage or income is divided by the corresponding
price index and multiplied by 100.
$$\text{i.e. Real wages} = \frac{\text{Money wages}}{\text{Price index}} \times 100$$
$$\text{Thus Real Wage Index} = \frac{\text{Real wage of current year}}{\text{Real wage of base year}} \times 100$$
Example
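The annual wages of workers (in Rs.) are given along with Consumer Price Indices.
Find (i) the real wage and (ii) the real wage indices.

| Year | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|
| Wages | 1800 | 2200 | 3400 | 3600 |
| Consumer Price Indices | 100 | 170 | 300 | 320 |

| Year | Wage | Price Index | Real Wage | Real Wage Index |
|---|---|---|---|---|
| 2010 | 1800 | 100 | 1800/100 × 100 = 1800 | 1800/1800 × 100 = 100.00 |
| 2011 | 2200 | 170 | 2200/170 × 100 = 1294.1 | 1294.1/1800 × 100 = 71.90 |
| 2012 | 3400 | 300 | 3400/300 × 100 = 1133.3 | 1133.3/1800 × 100 = 62.96 |
| 2013 | 3600 | 320 | 3600/320 × 100 = 1125 | 1125/1800 × 100 = 62.50 |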
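The deflation of money wages can be reproduced in a few lines. This is a sketch with the data of the example; 2010 is taken as the base year for the real wage index, and the variable names are illustrative.

```python
# Money wages (Rs.) and consumer price indices by year.
wages = {2010: 1800, 2011: 2200, 2012: 3400, 2013: 3600}
price_index = {2010: 100, 2011: 170, 2012: 300, 2013: 320}
base_year = 2010

# Real wage = money wage / price index * 100
real_wage = {year: wages[year] / price_index[year] * 100 for year in wages}

# Real wage index = real wage of the year / real wage of the base year * 100
real_wage_index = {
    year: value / real_wage[base_year] * 100 for year, value in real_wage.items()
}

for year in wages:
    print(year, round(real_wage[year], 1), round(real_wage_index[year], 2))
```

Although money wages doubled between 2010 and 2013, the real wage index falls to 62.5, so the workers' purchasing power fell by more than a third.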
Wholesale Price Index (WPI) represents the price of goods at a wholesale stage i.e. goods that
are sold in bulk and traded between organizations instead of consumers. WPI is used as a
measure of inflation in some economies.
Uses
In a dynamic world, prices do not remain constant. Inflation rate calculated on the basis of the
movement of the Wholesale Price Index (WPI) is an important measure to monitor the dynamic
movement of prices. As WPI captures price movements in a most comprehensive way, it is
widely used by Government, banks, industry and business circles. Important monetary and fiscal
policy changes are often linked to WPI movements. Similarly, the movement of WPI serves as
an important determinant, in formulation of trade, fiscal and other economic policies by the
Government of India. The WPI indices are also used for the purpose of escalation clauses in the
supply of raw materials, machinery and construction work.
WPI is used as an important measure of inflation in India. Fiscal and monetary policy changes
are greatly influenced by changes in WPI.
WPI is an easy and convenient method to calculate inflation. The inflation rate is measured by the
change in WPI between the beginning and the end of a year: the percentage increase in WPI
over a year gives the rate of inflation for that year.
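As an illustration of this rule, the sketch below computes the annual inflation rate from two WPI readings. The WPI values used are hypothetical figures chosen only for the example; they are not actual published data.

```python
def inflation_rate(wpi_start, wpi_end):
    """Percentage change in WPI between the start and the end of the year."""
    return (wpi_end - wpi_start) / wpi_start * 100


# Hypothetical WPI readings, for illustration only.
wpi_at_start_of_year = 150.0
wpi_at_end_of_year = 159.0

rate = inflation_rate(wpi_at_start_of_year, wpi_at_end_of_year)
print(f"Inflation rate: {rate:.1f}%")
```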
WPI computation in India
WPI is the most widely used inflation indicator in India. This is published by the Office of
Economic Adviser, Ministry of Commerce and Industry. WPI captures price movements in a
most comprehensive way. It is widely used by Government, banks, industry and business
circles. Important monetary and fiscal policy changes are linked to WPI movements. It has been in use
since 1939 and has been published regularly since 1947. With changing times, economies
undergo structural changes. Thus, there is a need to revisit such indices from time to time,
and new sets of articles / commodities need to be
included based on current economic scenarios. Thus, since 1939, the base year of WPI has been
revised on a number of occasions. The current series of Wholesale Price Index has 2004-05 as
the base year.
Wholesale price index comprises as far as possible all transactions at first point of bulk sale in
the domestic market. Provisional monthly WPI for All Commodities is released on 14th of every
month (next working day, if 14th is holiday). Detailed item level WPI is put on official website
(http://www.eaindustry.nic.in/) for public use. The provisional index is made final after a period
of eight weeks/ two months.
The Office of the Economic Adviser to the Government of India undertook to publish for the
first time, an index number of wholesale prices, with base week ended August 19, 1939 = 100,
from the week commencing January 10, 1942. The index was calculated as the geometric mean
of the price relatives of 23 commodities classified into four groups: (1) food & tobacco; (2)
agricultural commodities; (3) raw materials; and (4) manufactured articles. Each item was
assigned equal weight and for each item, there was a single price quotation. That was a modest
beginning to what became an important weekly activity for the monitoring and management of
the Indian economy and a benchmark for business transactions.
Steps in compilation of WPI in India
Like most of the price indices, WPI is based on Laspeyres' formula for reasons of practical
convenience. Therefore, once the concept of wholesale price is defined and the base year is
finalized, the exercise of index compilation involves finalization of the item basket and allocation of
weights (W) at item and group / sub-group level. Simultaneously, the exercise to collect base prices
(P0), current prices (P1), finalization of item specifications, price data sources, and data
collection machinery is undertaken. These steps are:
1. Definition of the Concept of Wholesale Prices:
Wholesale price has divergent connotations adopted by the different departments using them.
There is no uniform definition for agricultural and non- agricultural commodities as all the
wholesale prices cannot be collected from the established markets. So proper definition has to be
made by the competent authority.
For example in the case of agricultural commodities, in practice, there are three types of
wholesale markets viz., primary, secondary and terminal in the agricultural sector. The price
movements and price levels in all three vary. Price movement in the terminal market may tend to
converge toward the retail prices. The option to collect wholesale prices at these three different
stages of wholesale transactions exists for agricultural commodities, though the primary market is
preferred. So, the Ministry of Agriculture has defined wholesale price as the rate at which a
relatively large transaction of purchase, usually for further sale, is effected.
Similarly, for non-agricultural commodities, which are predominantly manufacturing items, the
problem arises, as there are no established sources in markets. This is true of mining and fuel
items also. The issue of ex-factory vis-à-vis wholesale prices for non-agricultural items has been
discussed by the successive Working Groups set up for the revision of WPI and all have reached
the conclusion that in practice, it is not feasible to collect wholesale prices for most of the
manufacturing items. It has also been observed that the margin of wholesalers in the case of
non-agricultural commodities remains unchanged over a long period of time. As a result, it is felt
that the trends in the index compiled on the basis of ex-factory prices would not be much
different from the index compiled on the basis of wholesale prices, if it were feasible to get
these prices. The last Working Group has recommended collecting wholesale prices from the
markets as far as possible, because the economy is moving towards globalization and open
trade with inputs increasing in the commodities set.
2) Choice of Base Year
The second step is choice of base year. The well-known criteria for the selection of base year
are (i) a normal year i.e. a year in which there are no abnormalities in the level of production,
trade and in the price level and price variations, (ii) a year for which reliable production, price
and other required data are available and (iii) a year as recent as possible and comparable with
other data series at national and state level. The National Statistical Commission has
recommended that the base year should be revised every five years and not later than ten years.
3. Selection of Items, Varieties/ Grades, Markets:
To ensure that the items in the index basket are as representative as possible, efforts are
made to include all the important items transacted in the economy during the base year. The
importance of an item in the free market will depend on its traded value during the base year. At
wholesale level, bulk transactions of goods and services need to be captured. As the services are
not covered so far, the WPI basket mainly consists of items from goods sector. In the absence of
single source of data on traded value, the selection procedures followed for agricultural
commodities and non-agricultural commodities have also been different.
For example, in the case of agricultural commodities: as there is little scope for the emergence of
new commodities in agriculture, the selection of new items in the basket is done on the basis
of increased importance in wholesale markets. Varieties, which have declined in importance,
need to be dropped in the revised series. Final inclusion or exclusion of an item in the basket is
based on the process of consultation with the various departments. The exercise of adding
/deleting commodities, specifications and markets is completed once the consultation process is
over. In the existing WPI series, items, their specifications and markets have been finalized in
consultation with the Directorate of E&S (M/O Agriculture), National Horticulture Board,
Spices Board, Tea Board, Coffee Board, Rubber Board, Silk Board, Directorate of Tobacco,
Cotton Corporation of India etc.
4. Derivation of Weighting Diagram
Weights used in the WPI are value weights, not quantity weights, as it is difficult to assign quantity
weights. Distribution of the appropriate weight to each of the items is the most important exercise for
a reliable index. Unlike consumer price indices, where weights are derived on the basis of results
of Expenditure Surveys, several sources of data are used for derivation of weights for WPI.
5) Collection of Prices
In WPI, the pricing methodology used is specification pricing. Under this, in consultation with the
identified source agencies, precise specifications of all items in the basket are defined for repeat
pricing every week. All characteristics like make, model and features, along with the unit of sale,
type of packaging, if applicable, etc. are recorded and printed in the price collection schedule. At
the time of scrutiny of price data all these are kept in mind. This pricing to constant quality
technique is the cornerstone of Laspeyres' formula. In case of changes in quality and
specifications, due adjustments are made as per the standard procedures.
The collection of base prices is done concurrently while the work on finalisation of index basket
is on. Therefore, price collection is normally done for larger number of items pending
finalisation. Once the basket is ready, current prices are collected only as per the final basket
from the designated sources. Weekly prices need to be collected for pre-determined day of the
week. For the current series prices are quoted on the basis of the prevailing prices of every
Friday. Agricultural wholesale prices are for bulk transactions and include transport cost.
Non-agricultural prices are ex-mine or ex-factory, inclusive of excise duty but exclusive of rebate, if
any.
6) Treatment of prices collected from open market & administered prices:
There are some items which constitute part of index baskets but the prices for these items are
either totally administered by the Government or are under dual pricing policy. The issue of
using administered prices for index compilation is resolved by taking into account appropriate
ratio between the levy and non-levy portions. Where these ratios are not available, the issues can
be resolved through taking the appropriate number of price quotations of the administered prices
and the open market prices after periodic review.
Due to variation in quality and different price movements of the commodities belonging to
unorganized sector, separate quotations from organized and unorganized units have to be taken
and merged based on the turnover value of both the sectors at item level. For pricing from
unorganized sector, adequate number of price quotations has to be drawn out of the list of units
by criteria of share of production as far as possible.
7) Classification structure:
Successive Working Groups have recommended aligning the classification of items into groups
and sub-groups with the latest revised National Industrial Classification (NIC), which in turn is
comparable to the International Standard Industrial Classification (ISIC). Classification based
on NIC makes the WPI data amenable to comparison with the Index of Industrial Production (IIP)
and National Income data.
Major Group/Groups: I. Primary Articles II. Fuel, Power, Light & Lubricants III. Manufactured
Products
8) Methodology of Index Calculation
Actual index compilation is done in stages.
In the first stage, once the price data are scrutinized, the price relative for each price quote is
calculated. The price relative is the ratio of the current price to the base price multiplied
by 100, i.e. $(P_1/P_0) \times 100$.
In the next stage, the commodity/item level index is arrived at as the simple arithmetic average of
the price relatives of all the varieties (each quote) included under that commodity. A simple
average of price relatives is used under the implicit assumption that each price quotation
collected for an item/commodity has equal importance, i.e. the shares of production value are
equal.
Next, the indices for the sub-groups/groups/major groups are compiled. The aggregation
method is based on the Laspeyres formula as below:
$I = \frac{\sum (I_i \times W_i)}{\sum W_i}$
Where,
$I$ = Index number of wholesale prices of a sub-group/group/major group/all commodities
$\sum$ = represents the summation operation,
$I_i$ = Index of the ith item/sub-group/group/major group.
$W_i$ = Weight assigned to the ith item/sub-group/group/major group.
The weights are value weights. Aggregation is first done at sub-group and group level. All
commodities index is compiled by aggregating Major group indices.
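The three stages of compilation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the item names, price quotes and weights are hypothetical and are not taken from the official WPI basket.

```python
def price_relative(base_price, current_price):
    """Price relative of one quote: (P1 / P0) x 100."""
    return current_price / base_price * 100


def item_index(quotes):
    """Simple arithmetic mean of the price relatives of all quotes of an item."""
    relatives = [price_relative(p0, p1) for p0, p1 in quotes]
    return sum(relatives) / len(relatives)


def weighted_index(indices, weights):
    """Weighted aggregation: sum(Ii x Wi) / sum(Wi)."""
    return sum(i * w for i, w in zip(indices, weights)) / sum(weights)


# Hypothetical (base price, current price) quotes for two items.
rice_index = item_index([(20.0, 22.0), (25.0, 30.0)])   # relatives 110 and 120
wheat_index = item_index([(10.0, 10.5)])                 # relative 105

# Hypothetical value weights of the two items within one sub-group.
sub_group_index = weighted_index([rice_index, wheat_index], [3.0, 1.0])
print(round(rice_index, 2), round(wheat_index, 2), round(sub_group_index, 2))
```

The same `weighted_index` function can be applied again at each higher level (sub-group to group, group to major group, major group to all commodities), using the indices of the lower level as inputs.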
9) Handling of the Seasonal Commodities :
There are a number of agricultural items, especially some fruits and vegetables, which are
seasonal in nature. When a particular seasonal item disappears from the market and its prices are
not available because it is out of season, the weight of that item is imputed amongst the
other items on a pro rata basis within the sub-group of vegetables or fruits. The underlying
assumption is that if the item had remained available, its price would have moved in
the same proportion as the prices of the other items in the sub-group which did remain available.
This is equivalent to giving a greater weight to the remaining items. The seasonality problem can
also be handled by other methods: i) prices of unavailable items can be extrapolated
forward from the period of availability, or ii) if a seasonal item has insignificant weight, it can
be removed permanently from the basket.
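The pro rata imputation of weights can be illustrated with a short Python sketch. The fruit names and weights below are hypothetical and serve only to show the arithmetic.

```python
def redistribute_weights(weights, unavailable):
    """Spread the weights of unavailable items pro rata over the available ones."""
    available = {k: w for k, w in weights.items() if k not in unavailable}
    # Scale factor keeps the sub-group total weight unchanged.
    scale = sum(weights.values()) / sum(available.values())
    return {k: w * scale for k, w in available.items()}


# Hypothetical weights for a fruit sub-group; mango is out of season.
fruit_weights = {"mango": 2.0, "banana": 3.0, "apple": 5.0}
in_season = redistribute_weights(fruit_weights, unavailable={"mango"})
print(in_season)
```

The remaining items keep their relative proportions (3 : 5 here) while together carrying the full weight of the sub-group, which is what "giving a greater weight to the remaining items" means.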
data from several surveys to produce a timely and precise measure of average price change for
the consumption sector.
The Consumer Price Index (CPI) is a comprehensive measure of price changes in a basket of
goods and services representative of consumption expenditure. The calculation involved in
estimating the CPI is quite rigorous. Consumption items are classified into categories and
sub-categories, and separate indices are compiled for consumer categories such as urban or
rural. From these indices and sub-indices, the final overall price index is calculated, mostly
by national statistical agencies. It is one of the most important
statistics for an economy and is generally based on the weighted average of the prices of
commodities. It gives an idea of the cost of living.
Inflation is measured using CPI. The percentage change in this index over a period of time gives
the amount of inflation over that specific period, i.e. the increase in prices of a representative
basket of goods consumed.
The CPI frequently is called a cost-of-living index, but it differs in important ways from a
complete cost-of-living measure. A cost-of-living index would measure changes over time in the
amount that consumers need to spend to reach a certain utility level or standard of living. Both
the CPI and a cost-of-living index would reflect changes in the prices of goods and services, such
as food and clothing that are directly purchased in the marketplace; but a complete cost-of-living
index would go beyond this role to also take into account changes in other governmental or
environmental factors that affect consumers' well-being. It is very difficult to determine the
proper treatment of public goods, such as safety and education, and other broad concerns, such as
health, water quality, and crime, that would constitute a complete cost-of-living framework.
How do we read or interpret an index?
An index is a tool that simplifies the measurement of movements in a numerical series. Most of
the specific CPI indexes have a 1982-84 reference base. That is, the agency computing the index
sets the average index level (representing the average price level)-for the 36-month period
covering the years 1982, 1983, and 1984-equal to 100. The agency then measures changes in
relation to that figure. An index of 110, for example, means there has been a 10-percent increase
in price since the reference period; similarly, an index of 90 means a 10-percent decrease.
Movements of the index from one date to another can be expressed as changes in index points
(simply, the difference between index levels), but it is more useful to express the movements as
percent changes. This is because index points are affected by the level of the index in relation to
its reference period, while percent changes are not.
            Year I     Year II    Change in index points   Percent change
Item A      112.500    121.500    9.000                    9.0/112.500 x 100 = 8.0
Item B      225.000    243.000    18.000                   18.0/225.000 x 100 = 8.0
Item C      110.000    128.000    18.000                   18.0/110.000 x 100 = 16.4
In the table above, Item A increased by half as many index points as Item B between Year I and
Year II. Yet, because of different starting indexes, both items had the same percent change; that
is, prices advanced at the same rate. By contrast, Items B and C show the same change in index
points, but the percent change is greater for Item C because of its lower starting index value.
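The difference between a change in index points and a percent change can be checked with a short Python sketch that uses the index levels of Items A, B and C from the table above. The function names are illustrative.

```python
# Index levels in Year I and Year II, taken from the table above.
index_levels = {
    "Item A": (112.5, 121.5),
    "Item B": (225.0, 243.0),
    "Item C": (110.0, 128.0),
}


def index_point_change(old, new):
    """Change in index points: simple difference between index levels."""
    return new - old


def percent_change(old, new):
    """Percent change: index point change relative to the starting level."""
    return (new - old) / old * 100


changes = {
    name: (index_point_change(old, new), percent_change(old, new))
    for name, (old, new) in index_levels.items()
}
for name, (points, percent) in changes.items():
    print(name, round(points, 1), round(percent, 1))
```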
Uses of cost of living index numbers:
1. Cost of living index numbers indicate whether the real wages are rising or falling. In
other words they are used for calculating the real wages and to determine the change in
the purchasing power of money.
Purchasing power of money = 1 / Cost of living index number
Real wages = (Money wages / Cost of living index number) x 100
2. Cost of living indices are used for the regulation of D.A or the grant of bonus to the
workers so as to enable them to meet the increased cost of living.
3. Cost of living index numbers are used widely in wage negotiations.
4. These index numbers are also used for analyzing markets for particular kinds of goods.
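The deflation of money wages described in point 1 can be sketched as follows. The wage and index figures are hypothetical, and the purchasing power function assumes the index is expressed relative to a base of 100.

```python
def real_wage(money_wage, cost_of_living_index):
    """Real wages = (money wages / cost of living index number) x 100."""
    return money_wage / cost_of_living_index * 100


def purchasing_power(cost_of_living_index):
    """Purchasing power of money = 1 / index, with the index taken as a
    ratio to its base of 100 (an assumption made for this sketch)."""
    return 1 / (cost_of_living_index / 100)


# Hypothetical figures: a money wage of 15,000 when the index stands at 125.
wage = real_wage(15000.0, 125.0)
power = purchasing_power(125.0)
print(wage, power)
```

With these figures the money wage of 15,000 is worth 12,000 in base-period prices, and one unit of money buys 0.8 of what it bought in the base period.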
Main steps or problems in construction of cost of living index numbers
Production of the CPI requires the skills of many professionals, including economists,
statisticians, computer scientists, data collectors, and others.
Cost of living index numbers measure the changes in the level of prices of commodities
which directly affect the cost of living of a specified group of persons at a specified place.
General index numbers fail to give an idea of the cost of living of different classes of people at
different places.
Different classes of people consume different types of commodities, and people's consumption
habits also vary from person to person, place to place and class to class, i.e. richer class, middle
class and poor class. For example, the cost of living of rickshaw pullers at BBSR is different from
that of rickshaw pullers at Kolkata. The consumer price index helps us in determining the effect
of rise and fall in prices on different classes of consumers living in different areas.
The following are the main steps in constructing a cost of living index number.
1. Decision about the class of people for whom the index is meant
It is absolutely essential to decide clearly the class of people for whom the index
is meant, i.e. whether it relates to industrial workers, teachers, officers, labourers, etc. Along
with the class of people, it is also necessary to decide the geographical area covered by the
index, such as a city, an industrial area or a particular locality in a city.
2. Conducting family budget enquiry
Once the scope of the index is clearly defined the next step is to conduct a sample
family budget enquiry i.e. we select a sample of families from the class of people for
whom the index is intended and scrutinize their budgets in detail. The enquiry should be
conducted during a normal period i.e. a period free from economic booms or depressions.
The purpose of the enquiry is to determine the amount an average family spends on
different items. The family budget enquiry gives information about the nature and quality
of the commodities consumed by the people. The commodities are classified under the
following heads:
i) Food ii) Clothing iii) Fuel and Lighting iv) House rent v) Miscellaneous
3. Collecting retail prices of different commodities
The collection of retail prices is a very important and at the same time very
difficult task, because such prices may vary from place to place, shop to shop and person
to person. Price quotations should be obtained from the local markets where the class of
people reside, or from super bazaars or departmental stores from which they usually make
their purchases.
Where
$p_1$ and $p_0$ stand for the prices of the current year and base year.
$q_1$ and $q_0$ stand for the quantities of the current year and base year.
Steps:
i) The prices of commodities for the various groups in the current year are multiplied by the
quantities of the base year, and the aggregate expenditure of the current year is obtained, i.e. $\sum p_1 q_0$.
ii) Similarly, the aggregate expenditure of the base year is obtained, i.e. $\sum p_0 q_0$.
iii) The aggregate expenditure of the current year is divided by the aggregate expenditure of the
base year and the quotient is multiplied by 100.
Symbolically,
Consumer price index $= \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$
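Under the family budget method,
Consumer price index $= \frac{\sum PV}{\sum V}$
Where $P = \frac{p_1}{p_0} \times 100$ for each item and
$V = p_0 q_0$, the value in the base year.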
Example
Construct the Consumer price index number of 2013 on the basis of 2009 from the following
data using 1) the aggregate expenditure method and 2) the family budget method.
Commodity   Quantity in units in 2009   Price per unit in 2009   Price per unit in 2013
A           100                         8                        12
B           25                          6                        7.50
C           10                          5                        5.25
D           20                          48                       52
E           25                          15                       16.50
F           30                          9                        27
Solution
(1) Aggregate expenditure method
Formula for the aggregate expenditure method:
Consumer price index $= \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

Commodity   Price per unit    Price per unit    Quantity in units   p0q0    p1q0
            in 2009 (p0)      in 2013 (p1)      in 2009 (q0)
A           8                 12                100                 800     1200
B           6                 7.5               25                  150     187.5
C           5                 5.25              10                  50      52.5
D           48                52                20                  960     1040
E           15                16.5              25                  375     412.5
F           9                 27                30                  270     810
Total                                                               2605    3702.50

Consumer price index $= \frac{3702.50}{2605} \times 100 = 142.13$
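(2) Family budget method
Consumer price index $= \frac{\sum PV}{\sum V}$
Where $P = \frac{p_1}{p_0} \times 100$ and $V = p_0 q_0$

Commodity   p0    p1      q0    P = (p1/p0) x 100   V = p0q0   PV
A           8     12      100   150                 800        120000
B           6     7.5     25    125                 150        18750
C           5     5.25    10    105                 50         5250
D           48    52      20    108.33              960        104000
E           15    16.5    25    110                 375        41250
F           9     27      30    300                 270        81000
Total                           898.33              2605       370250

Consumer price index $= \frac{370250}{2605} = 142.13$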
Note: It should be noted that the answer obtained by applying the aggregate expenditure method
and family budget method is the same.
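The agreement between the two methods can be verified with a short Python sketch that uses the data of this example. The function and variable names are illustrative.

```python
# Commodity: (base-year quantity q0, base-year price p0, current-year price p1)
basket = {
    "A": (100, 8, 12),
    "B": (25, 6, 7.5),
    "C": (10, 5, 5.25),
    "D": (20, 48, 52),
    "E": (25, 15, 16.5),
    "F": (30, 9, 27),
}


def cpi_aggregate_expenditure(basket):
    """Sum(p1 * q0) / Sum(p0 * q0) x 100."""
    current = sum(p1 * q0 for q0, p0, p1 in basket.values())
    base = sum(p0 * q0 for q0, p0, p1 in basket.values())
    return current / base * 100


def cpi_family_budget(basket):
    """Sum(P * V) / Sum(V), with P = p1/p0 x 100 and V = p0 * q0."""
    weighted_total = 0.0
    value_total = 0.0
    for q0, p0, p1 in basket.values():
        relative = p1 / p0 * 100   # price relative P
        value = p0 * q0            # base-year value weight V
        weighted_total += relative * value
        value_total += value
    return weighted_total / value_total


cpi_1 = cpi_aggregate_expenditure(basket)
cpi_2 = cpi_family_budget(basket)
print(round(cpi_1, 2), round(cpi_2, 2))
```

The two results coincide because P x V = (p1/p0) x 100 x p0q0 = 100 x p1q0, so the family budget formula reduces algebraically to the aggregate expenditure formula.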
of a particular class of people stationed at a particular place. In this index number we take
retail price of the commodities.
2. The wholesale price index number and the consumer price index numbers are generally
different because there is lag between the movement of wholesale prices and the retail
prices.
3. The retail prices required for the construction of the consumer price index number may
increase much faster than the wholesale prices, i.e. there might be erratic changes in the
consumer price index number unlike the wholesale price index number.
4. The method of constructing index numbers is in general the same for wholesale prices and
cost of living. However, the wholesale price index number is based on a different weighting
system, and the selection of commodities also differs from that of the cost of living index
number.
Limitations or demerits of index numbers:
Although index numbers are indispensable tools in economics, business, management
etc, they have their limitations and proper care should be taken while interpreting them. Some of
the limitations of index numbers are
1. Since index numbers are generally based on a sample, it is not possible to take into
account each and every item in the construction of index.
2. At each stage of the construction of index numbers, starting from selection of
commodities to the choice of formulae there is a chance of the error being introduced.
3. Index numbers are a special type of average; since the various averages like mean,
median and G.M. have their relative limitations, their use may also introduce some error.
4. None of the formulae for the construction of index numbers is exact; each contains the
so-called formula error. For example, Laspeyres' index number has an upward bias while
Paasche's index has a downward bias.
5. An index number is used to measure the change for a particular purpose only. Its misuse
for other purposes would lead to unreliable conclusions.
6. In the construction of price or quantity index numbers it may not be possible to retain the
uniform quality of commodities during the period of investigation.
selecting a group of stocks that are representative of the whole market or a specified sector or
segment of the market. An Index is calculated with reference to a base period and a base index
value.
Stock indexes are useful for benchmarking portfolios, for generalizing the experience of all
investors, and for determining the market return used in the Capital Asset Pricing Model
(CAPM).
A hypothetical portfolio encompassing all possible securities would be too broad to measure, so
proxies such as stock indexes have been developed to serve as indicators of the overall market's
performance. In addition, specialized indexes have been developed to measure the performance
of more specific parts of the market, such as small companies.
It is important to realize that a stock price index by itself does not represent an average return to
shareholders. By definition, a stock price index considers only the prices of the underlying stocks
and not the dividends paid. Dividends can account for a large percentage of the total investment
return.
A stock market index (or just index) is a number that measures the relative value of a group of
stocks. As the stocks in this group change value, the index also changes value. If an index goes
up by 1%, it means the total value of the securities which make up the index has gone up
by 1%.
A stock market index measures the change in the stock prices of the index's components.
How it works/Example:
Let's say we want to measure the performance of the Indian stock market. Assume there are
currently four public companies listed in India: Company A, Company B,
Company C, and Company D.
In the year 2000, the four companies' stock prices were as follows:
Company A   10
Company B   8
Company C   12
Company D   25
Total       55
To create an index, we simply set the total (55) in the year 2000 equal to 100 and measure any
future periods against that total. For example, let's assume that in 2001 the stock prices were:
Company A   4
Company B   38
Company C   12
Company D   24
Total       78
Because 78 is 41.82% higher than the 2000 base, the index is now at 141.82. Every day,
month, year, or other period, the index can be recalculated based on current stock prices.
Note that this index is price-weighted (i.e., the larger the stock price, the more influence it has on
the index). Indexes can be weighted by any number of metrics, including shares outstanding,
market capitalization, or stock price.
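The price-weighted index in this example can be sketched in Python as follows. The prices of Company B in 2000 (8) and Company A in 2001 (4) are assumptions implied by the stated totals of 55 and 78; the function name is illustrative.

```python
def price_weighted_index(base_prices, current_prices, base_value=100.0):
    """Index = (sum of current prices / sum of base prices) x base value."""
    return sum(current_prices) / sum(base_prices) * base_value


# Stock prices in the base year 2000 and in 2001.
prices_2000 = {"Company A": 10, "Company B": 8, "Company C": 12, "Company D": 25}
prices_2001 = {"Company A": 4, "Company B": 38, "Company C": 12, "Company D": 24}

index_2001 = price_weighted_index(prices_2000.values(), prices_2001.values())
print(round(index_2001, 2))
```

Because only the sum of prices matters, the large rise in Company B's price dominates the index even though Company A's price fell sharply, which is the sense in which higher-priced stocks carry more influence.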
Name
Amex Composite
DWS NASDAQ-100 Volatility Target Index
FTSE NASDAQ 500 Index
NASDAQ Capital Market Composite Index
NASDAQ Composite
NASDAQ Global Market Composite
NASDAQ Global Select Market Composite
NASDAQ OMX 100 Index
NASDAQ OMX AeA Illinois Tech Index
NASDAQ OMX Middle East North Africa Index
NASDAQ-100
NYSE Composite
OMX Baltic 10
OMX Copenhagen 20
OMX Helsinki 25
OMX Nordic 40
OMX Stockholm 30 Index
Russell 1000
Russell 2000
Russell 3000
S&P 100
S&P 500
S&P MidCap
The NASDAQ-100 Equal Weighted Index
VINX 30
Wilshire 5000
CNX Nifty(The CNX Nifty is a well diversified 50 stock index accounting for 23 sectors
of the economy. It is used for a variety of purposes such as benchmarking fund portfolios,
index based derivatives and index funds.)
LIX15 Midcap
CNX 100
Nifty Midcap 50
CNX Midcap
India VIX
CNX IT Index
CPSE Index
CNX Shariah25
CNX 100 Equal Weight (The CNX 100 Equal Weight Index comprises the same constituents as the
CNX 100 Index, which is a free-float market capitalization based index.
The CNX 100 tracks the behavior of the combined portfolio of two indices, viz. CNX Nifty and CNX
Nifty Junior. It is a diversified 100 stock index. The maintenance of the CNX Nifty and the CNX
Nifty Junior is synchronized so that the two indices are always disjoint sets; i.e. a stock will
never appear in both indices at the same time.)
CNX Alpha Index
CNX Defty
NV20 Index
NI15 Index
Nifty TR 2X Leverage
Nifty TR 1X Inverse
They provide a historical comparison of returns on money invested in the stock market
against other forms of investments such as gold or debt.
They can be used as a standard against which to compare the performance of an equity
fund.
It is a lead indicator of the performance of the overall economy or a sector of the
economy
Stock indexes reflect highly up to date information
Modern financial applications such as Index Funds, Index Futures, Index Options play an
important role in financial investments and risk management
The Sensex is an "index". What is an index? An index is basically an indicator. It gives you a
general idea about whether most of the stocks have gone up or most of the stocks have gone
down. The Sensex is an indicator of all the major companies of the BSE.
BSE SENSEX is considered as the Barometer of Indian Capital Markets. If the Sensex goes up,
it means that the prices of the stocks of most of the major companies on the BSE have gone up.
If the Sensex goes down, this tells you that the stock price of most of the major stocks on the
BSE have gone down.
BSE SENSEX, first compiled in 1986, was calculated on a "Market Capitalization-Weighted"
methodology of 30 component stocks representing large, well-established and financially sound
companies across key sectors. The base year of S&P BSE SENSEX was taken as 1978-79. S&P
BSE SENSEX today is widely reported in both domestic and international markets through print
as well as electronic media. It is scientifically designed and is based on globally accepted
construction and review methodology. Since September 1, 2003, BSE SENSEX is being
calculated on a free-float market capitalization methodology. The "free-float market
capitalization-weighted" methodology is a widely followed index construction methodology on
which majority of global equity indices are based; all major index providers like MSCI, FTSE,
STOXX, and Dow Jones use the free-float methodology.
The BSE Sensex currently consists of the following 30 major Indian companies as of October
2014
Axis Bank Ltd
Bajaj Auto Ltd
Bharat Heavy Electricals Ltd
Bharti Airtel Ltd
Cipla Ltd
Coal India Ltd
Dr.Reddy's Laboratories Ltd
GAIL (India) Ltd
HDFC Bank Ltd
Hero MotoCorp Ltd
Hindalco Industries Ltd
Hindustan Unilever Ltd
Housing Development Finance Corporation Ltd
ICICI Bank Ltd
Infosys Ltd
ITC Ltd
Larsen & Toubro Ltd
Mahindra and Mahindra Ltd
Maruti Suzuki India Ltd
NTPC Ltd
Oil and Natural Gas Corporation Ltd
Reliance Industries Ltd
Sesa Goa Ltd
State Bank of India
Sun Pharmaceutical Industries Ltd
Tata Consultancy Services Ltd
Tata Motors Ltd
Tata Power Company Ltd
Tata Steel Ltd
Wipro Ltd
Sector
CEMENT AND CEMENT PRODUCTS
CEMENT AND CEMENT PRODUCTS
PAINTS
BANKS
AUTOMOBILES - 2 AND 3 WHEELERS
BANKS
ELECTRICAL EQUIPMENT
REFINERIES
TELECOMMUNICATION - SERVICES
OIL EXPLORATION/PRODUCTION
PHARMACEUTICALS
MINING
CONSTRUCTION
PHARMACEUTICALS
GAS
CEMENT AND CEMENT PRODUCTS
COMPUTERS - SOFTWARE
BANKS
AUTOMOBILES - 2 AND 3 WHEELERS
ALUMINIUM
PERSONAL CARE
FINANCE - HOUSING
CIGARETTES
BANKS
BANKS
COMPUTERS - SOFTWARE
FINANCIAL INSTITUTION
STEEL AND STEEL PRODUCTS
BANKS
ENGINEERING
PHARMACEUTICALS
AUTOMOBILES - 4 WHEELERS
AUTOMOBILES - 4 WHEELERS
MINING
POWER
OIL EXPLORATION/PRODUCTION
POWER
BANKS
REFINERIES
MINING
BANKS
PHARMACEUTICALS
COMPUTERS - SOFTWARE
AUTOMOBILES - 4 WHEELERS
POWER
STEEL AND STEEL PRODUCTS
COMPUTERS - SOFTWARE
CEMENT AND CEMENT PRODUCTS
BREW/DISTILLERIES
COMPUTERS - SOFTWARE
equally spaced and, the observations may take values from a continuous distribution.
The above setup could be easily generalized: for example, the times of observation need not be
equally spaced in time, the observations may only take values from a discrete distribution . . .
If we repeatedly observe a given system at regular time intervals, it is very likely that the
observations we make will be correlated. So we cannot assume that the data constitute a random
sample. The time-order in which the observations are made is vital.
Objectives of time series analysis:
description - summary statistics, graphs
analysis and interpretation - find a model to describe the time dependence in the data, can we
interpret the model
forecasting or prediction - given a sample from the series, forecast the next value, or the next
few values
control - adjust various control parameters to make the series fit closer to a target
adjustment - in a linear model the errors could form a time series of correlated observations,
and we might want to adjust estimated variances to allow for this
Types of time Series
1. continuous
2. discrete
Discrete means that observations are recorded at discrete times; it says nothing about the nature
of the observed variable. The time intervals can be annual, quarterly, monthly, weekly, daily,
hourly, etc.
Continuous means that observations are recorded continuously, e.g. temperature and/or humidity
in some laboratory. Again, a time series can be continuous regardless of the nature of the
observed variable.
Discrete time series can result when continuous time series are sampled. Sometimes quantities
that do not have an instantaneous value are aggregated, which also results in a discrete time
series, e.g. daily or monthly rainfall. We will mostly study discrete time series in this course.
Uses of time series
There are two main uses of time series analysis: (a) identifying the nature of the phenomenon
represented by the sequence of observations, and (b) forecasting (predicting future values of the
time series variable). Both of these goals require that the pattern of observed time series data is
identified and more or less formally described. Once the pattern is established, we can interpret
and integrate it with other data (i.e., use it in our theory of the investigated phenomenon, e.g.,
seasonal commodity prices). Regardless of the depth of our understanding and the validity of our
interpretation (theory) of the phenomenon, we can extrapolate the identified pattern to predict
future events.
The usage of time series models is twofold:
Obtain an understanding of the underlying forces and structure that produced the
observed data
Fit a model and proceed to forecasting, monitoring or even feedback and feedforward
control.
Economic Forecasting
Sales Forecasting
Budgetary Analysis
Stock Market Analysis
Yield Projections
Process and Quality Control
Inventory Studies
Workload Projections
Utility Studies
Census Analysis
Time series analysis can be useful to see how a given asset, security or economic variable
changes over time or how it changes compared to other variables over the same time period. For
example, in stock market investments, suppose you wanted to analyze a time series of daily
closing stock prices for a given stock over a period of one year. You would obtain a list of all the
closing prices for the stock over each day for the past year and list them in chronological order.
This would be a one-year, daily closing price time series for the stock. Delving a bit deeper, you
might be interested to know if a given stock's time series shows any seasonality, meaning it goes
through peaks and valleys at regular times each year. Or you might want to know how a stock's
share price changes as an economic variable, such as the unemployment rate, changes.
The analysis of time series is of great significance not only to economists and businessmen
but also to scientists, astronomers, geologists, etc., for the reasons given below.
1) It helps in understanding past behavior. It helps to understand what changes have taken
place in the past. Such analysis is helpful in predicting the future behavior.
2) It helps in planning future operations. Statistical techniques have been evolved which
enable time series to be analysed in such a way that the influences which have determined
the form of that series may be ascertained. If the regularity of occurrence of any feature
over a sufficiently long period could be clearly established, then, within limits, prediction
of probable future variations would become possible.
3) It helps in evaluating current accomplishments. The actual performance can be compared
with the expected performance and the cause of variation analysed. For example, if the
expected sale for 2000-01 was 10,000 washing machines and the actual sale was only
9,000, one can investigate the cause of the shortfall in achievement.
4) It facilitates comparison. Different time series are often compared and important
conclusions drawn therefrom.
Month   Demand
1       120
2       110
3       90
4       115
5       125
6       117
7       121
The engineer who is in charge of this project needs to predict the demand for the next month (the
8th month) based on the available data. He decided to take the average of the data and predicted
the demand as follows.
Average = (120 + 110 + 90 + 115 + 125 + 117 + 121)/7 = 114
But this method has a disadvantage. The above method is known as the Simple Mean
Forecasting Method. The main problem with this method is the space needed to store all of
the past data. If the data contains several thousand items, each of which has several hundred data
records, you need a lot of memory space to store this data on your computer. In addition, this
method is not very sensitive to a shift in recent data if it contains a large number of data points.
A solution to these problems is the Moving Averages technique. Using this method, you need
to maintain only the N most recent periods of data points. At the end of each period, the oldest
period's data is discarded and the newest period's data is added to the database. The sum of
the N data points is then divided by N and used as a forecast for the next period.
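The formula for a three period moving average is given below:
$MA(3) = \frac{D_t + D_{t-1} + D_{t-2}}{3}$
where $D_t$ is the demand in the most recent period.
Now using the three period moving average, the forecast for the above problem can be calculated
as follows.
$MA(3) = \frac{121 + 117 + 125}{3} = 121$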
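The two forecasting methods can be compared with a minimal Python sketch using the seven months of demand data from this example. The class name and its methods are illustrative; the fixed-length `deque` is one simple way of keeping only the N most recent periods.

```python
from collections import deque


def simple_mean_forecast(history):
    """Simple mean forecasting: average of all past observations."""
    return sum(history) / len(history)


class MovingAverageForecaster:
    """Keeps only the N most recent observations and averages them."""

    def __init__(self, n):
        # deque(maxlen=n) silently discards the oldest value when full.
        self.window = deque(maxlen=n)

    def add(self, value):
        self.window.append(value)

    def forecast(self):
        return sum(self.window) / len(self.window)


demand = [120, 110, 90, 115, 125, 117, 121]   # months 1 to 7

mean_forecast = simple_mean_forecast(demand)

forecaster = MovingAverageForecaster(n=3)
for value in demand:
    forecaster.add(value)
ma3_forecast = forecaster.forecast()

print(mean_forecast, ma3_forecast)
```

The simple mean needs the whole history, while the moving average forecaster holds only three values at any time and reacts more quickly to recent changes in demand.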
When a trend is to be determined by the method of moving averages, the average value for a
number of years is computed, and this average is taken as the normal or trend value for the unit
of time falling at the middle of the period covered in the calculation of the average. While
applying this method, it is necessary to select a period for the moving average, such as 3 yearly,
5 yearly or 8 yearly.
The 3 yearly moving averages shall be computed as follows:
$\frac{a+b+c}{3}, \frac{b+c+d}{3}, \frac{c+d+e}{3}, \ldots$
5 yearly moving averages:
$\frac{a+b+c+d+e}{5}, \frac{b+c+d+e+f}{5}, \frac{c+d+e+f+g}{5}, \ldots$
Example
Calculate the 3 yearly moving averages and 5 yearly moving averages of the production figures
given below.
For computing the three yearly trend, first find the three yearly moving totals a+b+c, b+c+d,
c+d+e, etc. (Column 3 in the following table). Then find the average of each; since each is a sum
of three values, divide it by 3 (Column 4).
Year   Production   3 yearly        3 yearly moving    5 yearly        5 yearly moving
                    moving totals   averages           moving totals   averages
                                    (trend values)                     (trend values)
(1)    (2)          (3)             (4) = (3) / 3      (5)             (6) = (5) / 5
1990   242          -               -                  -               -
1991   250          744             248.0              -               -
1992   252          751             250.3              1246            249.2
1993   249          754             251.3              1259            251.8
1994   253          757             252.3              1260            252.0
1995   255          759             253.0              1265            253.0
1996   251          763             254.3              1276            255.2
1997   257          768             256.0              1288            257.6
1998   260          782             260.7              1295            259.0
1999   265          787             262.3              -               -
2000   262          -               -                  -               -
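The moving averages of this example can be reproduced with a short Python sketch. The function name is illustrative, and it handles odd periods only, where the average falls exactly on the middle year of each window.

```python
years = list(range(1990, 2001))
production = [242, 250, 252, 249, 253, 255, 251, 257, 260, 265, 262]


def centered_moving_average(values, period):
    """Moving average placed against the middle item of each window.

    Assumes an odd period; positions without a full window stay None.
    """
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(window) / period
    return result


ma3 = centered_moving_average(production, 3)
ma5 = centered_moving_average(production, 5)

for year, value, m3, m5 in zip(years, production, ma3, ma5):
    m3_text = "-" if m3 is None else f"{m3:.1f}"
    m5_text = "-" if m5 is None else f"{m5:.1f}"
    print(year, value, m3_text, m5_text)
```

A longer period smooths the series more but leaves more years at each end without a trend value: one year at each end for the 3 yearly average and two for the 5 yearly average.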
[Figure: Moving Average chart showing the actual values and the 3-period moving average (forecast) for each data point, 1991 to 2000; vertical axis: Value.]
Solving these two normal equations, we get a and b. Substituting these values in the
equation y = a+bx, we get the trend equation.
Example:
Fit a linear trend to the following data by the least square method.
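Year         2000   2002   2004   2006   2008
Production   18     21     23     27     16
Solution
Let x = t - 2004 ... (I)
Let the trend line of y (production) on x be
$y = a + bx$ (origin: 2004) ... (II)

Year (t)   y     x = t - 2004   x2    xy     Ye = 21 + 0.1x   Y - Ye
2000       18    -4             16    -72    20.6             -2.6
2002       21    -2             4     -42    20.8             0.2
2004       23    0              0     0      21.0             2.0
2006       27    2              4     54     21.2             5.8
2008       16    4              16    64     21.4             -5.4
Total      105   0              40    4                       0

The normal equations are
$\sum y = na + b \sum x$
$\sum xy = a \sum x + b \sum x^2$
Substituting the totals from the table:
$105 = 5a + b(0)$, so $a = \frac{105}{5} = 21$
$4 = a(0) + 40b$, so $b = \frac{4}{40} = \frac{1}{10} = 0.1$
The trend equation is therefore
$y = 21 + 0.1x$ ... (III)
Putting x = -4, -2, 0, 2 and 4 in (III), we obtain the trend values (Ye) for the years 2000,
2002, ..., 2008 respectively, as given in the last but one column of the table above.
The differences (Y - Ye) are given in the last column.
We have $\sum (Y - Y_e) = 0$.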
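The fit above can be checked with a small Python sketch that solves the two normal equations for a straight line directly. The function name and the `origin` parameter are illustrative.

```python
def fit_linear_trend(times, values, origin):
    """Fit y = a + b*x by least squares, where x = t - origin.

    Solves the normal equations
        sum(y)  = n*a + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    """
    x = [t - origin for t in times]
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(values)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, values))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


years = [2000, 2002, 2004, 2006, 2008]
production = [18, 21, 23, 27, 16]

a, b = fit_linear_trend(years, production, origin=2004)
trend_values = [a + b * (t - 2004) for t in years]
residuals = [y - ye for y, ye in zip(production, trend_values)]

print(round(a, 2), round(b, 2))
print([round(v, 1) for v in trend_values])
print(round(sum(residuals), 6))
```

Choosing the middle year as the origin makes the sum of x equal to zero, so the two normal equations separate and a is simply the mean of y.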
The least squares method (LSM) is probably the most popular technique in statistics. This is due
to several factors.
First, most common estimators can be cast within this framework. For example, the mean of a
distribution is the value that minimizes the sum of squared deviations of the scores.
Second, using squares makes LSM mathematically very tractable because the Pythagorean
theorem indicates that, when the error is independent of an estimated quantity, one can add the
squared error and the squared estimated quantity.
Third, the mathematical tools and algorithms involved in LSM (e.g. derivatives) have been
well studied for a relatively long time.
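The first point, that the mean minimizes the sum of squared deviations, can be illustrated numerically. The sketch below reuses the five production figures from the trend example purely as illustrative scores; the function name is an assumption of this sketch.

```python
def sum_squared_deviations(scores, centre):
    """Sum of squared deviations of the scores from a chosen centre."""
    return sum((s - centre) ** 2 for s in scores)


scores = [18, 21, 23, 27, 16]
mean = sum(scores) / len(scores)

ssd_at_mean = sum_squared_deviations(scores, mean)

# Compare with other candidate centres on either side of the mean.
candidates = [mean - 2, mean - 1, mean + 1, mean + 2]
ssd_elsewhere = [sum_squared_deviations(scores, c) for c in candidates]

print(mean, ssd_at_mean, ssd_elsewhere)
```

Every candidate other than the mean gives a larger sum of squares, and the increase grows with the distance from the mean.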
The use of LSM in a modern statistical framework can be traced to Galton (1886) who used it in
his work on the heritability of size which laid down the foundations of correlation and (also gave
the name to) regression analysis. The two antagonistic giants of statistics Pearson and Fisher,
who did so much in the early development of statistics, used and developed it in different
contexts (factor analysis for Pearson and experimental design for Fisher).
Nowadays, the least squares method is widely used to find or estimate the numerical values of the
parameters to fit a function to a set of data and to characterize the statistical properties of
estimates. It exists in several variations: its simplest version is called ordinary least
squares (OLS); a more sophisticated version is called weighted least squares (WLS), which often
performs better than OLS because it can modulate the importance of each observation in the final
solution. Recent variations of the least squares method are alternating least squares (ALS) and
partial least squares (PLS).
MODULE IV
NATURE AND SCOPE OF ECONOMETRICS
Econometrics: Meaning, Scope, and Limitations - Methodology of econometrics-Modern
interpretation-Stochastic Disturbance term- Population Regression Function and Sample
Regression Function-Assumptions of Classical Linear regression model.
Introduction
Between the world wars, advances in mathematical statistics and a cadre of
mathematically trained economists led to econometrics, which was the name proposed for the
discipline of advancing economics by using mathematics and statistics. The roots of modern
econometrics can be traced to the American economist Henry L. Moore. Moore studied
agricultural productivity and attempted to fit changing values of productivity for plots of corn
and other crops to a curve using different values of elasticity. Moore made several errors in his
work, some from his choice of models and some from limitations in his use of mathematics.
Ragnar Frisch coined the word econometrics and helped to found both the Econometric
Society in 1930 and the Journal Econometrica in 1933.
Econometrics may be described as a branch of economics in which economic theory and statistical methods are fused in the analysis of numerical and institutional data. The term econometrics means economic measurement, which is synonymous with empirical research in economics. Econometrics is concerned with the measurement of data and the application of statistical procedures that have been formulated in mathematical terms. It is therefore a branch of mathematical economics. Statistical data and statistical procedures are employed to provide numerical results, which may be used to verify, or to help verify, economic theorems. Econometrics provides the quantitative information that may be used to make a qualitative analysis empirically truer and more meaningful.
The term econometrics is formed from two Greek words meaning "economy" and "measure". Econometrics is a rapidly developing branch of economics that aims to give empirical content to economic relations. The term econometrics was first used by Pawel Ciompa in 1910, but the credit for coining the term in its modern sense should be given to Ragnar Frisch (1926), one of the founders of the Econometric Society. He was the person who established the subject in the sense in which it is known today. Econometrics can be defined generally as the application of mathematics and statistical methods to the analysis of economic data. In the words of Samuelson, Koopmans and Stone (1954), econometrics is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference". Other definitions of econometrics are:
Every application of mathematics or of statistical methods to the study of economic phenomena
(Malinvaud 1966)
The production of quantitative economic statements that either explain the behaviour of variables
we have already seen, or forecast (i.e. predict) behaviour that we have not yet seen, or both
(Christ 1966)
Econometrics is the art and science of using statistical methods for the measurement of economic
relations (Chow, 1983).
Need for econometrics
Economic theory makes statements or hypotheses that are mostly qualitative in nature. For example, microeconomic theory states that, other things remaining the same, a reduction in the price of a commodity is expected to increase the quantity demanded of that commodity. Thus economic theory postulates a negative or inverse relation between price and quantity. But the theory does not provide any numerical measure of the relationship between the two. It is the job of the econometrician to provide such numerical estimates. Econometrics gives empirical content to most economic theory.
Scope of Econometrics
To make the meaning of econometrics clearer and more detailed, it is appropriate to quote Frisch (1933) in full: "Econometrics is by no means the same as economic statistics. Nor is it identical with what we call general economic theory, although a considerable portion of this theory has a definitely quantitative character. Nor should econometrics be taken as synonymous with the application of mathematics to economics. Experience has shown that each of these three viewpoints, that of statistics, economic theory, and mathematics, is a necessary, but not by itself a sufficient, condition for a real understanding of the quantitative relations in modern economic life. It is the unification of all three that is powerful. And it is this unification that constitutes econometrics."
Let us consider the following example to understand this unification more clearly. From the higher secondary (+2) classes onwards we learn the demand function, which explains that demand is a function of price, assuming ceteris paribus. When we relax the assumption of ceteris paribus, we argue that demand is influenced by four factors, namely price, price of substitutes, income and taste of the consumer. When we consider these four factors together, it is a case of an exact relation. This exact relation can be expressed in the form of a regression model, where quantity demanded is the dependent variable and price, price of substitutes, income and taste are the independent variables. This mathematical representation is again an exact relation. But practical wisdom suggests that there are many more factors which influence the quantity demanded, such as the expectation of a price rise, the arrival of a new product, government policy and so on. Because of the influence of these factors, our price-quantity relation is no longer exact. Naturally, there should be a provision to incorporate the influence of these other factors. The inclusion of such a provision is the distinguishing feature of econometrics, and how it is done is explained in later pages.
Goals of econometrics
There are three main goals:
1. Analysis - the testing of economic theory
2. Policy making - supplying numerical estimates which can be used for decision making
3. Forecasting - using numerical estimates to forecast future values
1. Analysis: Testing Economic theory
The earlier economic theories started from a set of observations concerning the behaviour of individuals as consumers or producers. Some basic assumptions were set regarding the motivations of individual economic units. From these assumptions the economists derived, by pure logical reasoning, some general conclusions regarding the working of the economic system. Economic theories thus developed at an abstract level were not tested against economic reality. No attempt was made to examine whether the theories adequately explained the actual economic behaviour of individuals.
Econometrics aims primarily at the verification of economic theories, that is, at obtaining empirical evidence to test the explanatory power of economic theories and to decide how well they explain the observed behaviour of economic units.
2. Policy making
Various econometric techniques can be applied in order to obtain reliable estimates of the individual coefficients of economic relationships. Knowledge of the numerical values of these coefficients is very important for the decisions of the firm as well as for the formulation of the economic policy of the government. It helps to compare the effects of alternative policy decisions.
For example, if the price elasticity of demand for a product is less than one (inelastic demand), it will not benefit the manufacturer to decrease its price, because his revenue would be reduced. Since econometrics can provide numerical estimates of the coefficients of economic relationships, it becomes an essential tool for the formulation of sound economic policies.
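The revenue argument in the example above can be checked with a short calculation. The sketch below is only an illustration: the function name, the starting price and quantity, and the elasticity values of -0.5 and -1.5 are assumptions chosen for the example, and the approximation (percentage change in quantity = elasticity x percentage change in price) is the usual one for small price changes.

```python
def revenue_change(price, quantity, elasticity, pct_price_change):
    """Approximate change in revenue after a percentage price change,
    assuming a constant price elasticity of demand (hypothetical helper)."""
    new_price = price * (1 + pct_price_change / 100)
    # % change in quantity = elasticity * % change in price
    new_quantity = quantity * (1 + elasticity * pct_price_change / 100)
    return new_price * new_quantity - price * quantity


# A 10% price cut with inelastic demand (|elasticity| < 1): revenue falls.
inelastic_change = revenue_change(100.0, 1000.0, -0.5, -10.0)

# The same price cut with elastic demand (|elasticity| > 1): revenue rises.
elastic_change = revenue_change(100.0, 1000.0, -1.5, -10.0)

print(round(inelastic_change, 2))  # negative: revenue is reduced
print(round(elastic_change, 2))    # positive: revenue is increased
```

The sign of the result depends only on whether the elasticity is below or above one in absolute value, which is why a numerical estimate of the elasticity is needed before a pricing decision is taken.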
3. Forecasting future values
In formulating policy decisions, it is essential to be able to forecast the values of economic variables. Such forecasts enable policy makers to make efficient decisions. For example, what will be the demand for food grains in India by 2020? Estimates of this are essential for formulating agricultural production policies. Similarly, what will be the impact of a rise in the deposit rate on the share market? It is generally expected that if bank deposit rates go up, the day-to-day demand for shares will come down. Econometric tools help in making such decisions.
of substitutes, income and taste are independent variables or regressors. There are certain practical difficulties at this stage. There may be a host of variables influencing a phenomenon. Is it possible to identify all of them? Even if we could identify them all, is it appropriate to include all of them in the model? If we omit certain important variables, it will lead to errors. Similarly, if we include a large number of variables or unnecessary variables, it will also lead to errors. An error of this kind committed in the development of an econometric model is called specification bias or specification error. So let us assume that we are considering only price as the variable influencing quantity demanded, assuming other factors remain constant. So let us write,
D = f (P)
where D represents quantity demanded, P represents price.
(b) Sign and magnitude of parameters: Once the function is identified, the next task is to attribute signs to the coefficients. Based on the general theory, we know that price takes a negative sign. Thus we can convert the demand function into a demand equation as follows:
D = α + βP
where α and β are the parameters of the demand equation, with β expected to be negative.
But we know that price is not the only factor influencing demand, and at the same time it is difficult to add all the variables. Thus, to accommodate the unexplained variables, or variables which are not included in the model, we add a stochastic term U to the model, called the disturbance term or error term. The inclusion of an error term makes an econometric model unique and distinct from a mathematical or exact model. When an error term is included, our demand equation model becomes,
D = α + βP + U
Similarly, in the case of the consumption function, the variables are consumption expenditure, income, savings, government policy and so on. Conventionally we assume that consumption expenditure depends on income, assuming other factors remain constant. Thus our consumption function model will be,
C = α + βY + U
where α is the intercept and β is the slope.
method is simultaneous equation model. However, in the present discussion let us limit ourselves to single equation models.
The second issue is also very relevant. If we use a linear equation, there is an implied assumption that the rate of change, or more precisely the coefficient, remains constant. When we estimate a demand equation, we assume that the rate of change in quantity demanded for a change in price is constant. Similarly, in the case of the consumption function, we assume that the slope (β) remains constant; in other words, the marginal propensity to consume remains constant. A little numerical reasoning suggests that the marginal propensity to consume is unlikely to be constant at all levels of income. Then what is the logic in assuming a linear equation? We have to keep in mind that linear equations are convenient for classroom analysis but may be too simple for policy research. However, after this caution, for the time being let us assume that we follow a linear equation for the purpose of simple understanding and explanation.
When we develop an econometric model, time specifications are also very important. Conventionally, we give the suffix t to all current values, t-1 to previous values and t+1 (or t*) to future values. Thus our models can be written as,
Dt = α + βPt + Ut    (Demand equation)
Ct = α + βYt + Ut    (Consumption equation)
Yt = α + βXt + Ut
When we incorporate only one independent variable, the model captures only a narrow part of reality. To make the model more realistic, we have to incorporate more independent variables. When we use two independent variables, the model can be written as,
Yt = α + β1X1t + β2X2t + Ut
This is the simplest multiple regression model. When we have two or more independent variables, the model becomes a multiple regression model. The general form of a multiple regression model with k independent variables can be written as,
Yt = α + β1X1t + β2X2t + ... + βkXkt + Ut
or, more compactly,
Yt = α + ΣβiXit + Ut
Just like incorporating current variables, it is easy to incorporate lagged variables or expected
variables in a model. See the following example.
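Yt = α + β1Pt + β2Yt-1 + β3W* + Ut
where the new variables are Yt-1, which is the lagged value of Y, and W*, which is an expected value.
Models may also take different mathematical forms, such as,
Yt = α + β1X1t + β2Zt + Ut
Yt = α + β log Xt + Ut
Yt = α + βXt + Ut
Log Yt = α + β log Xt + Ut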
The choice of the model depends on many factors, particularly the scatter diagram of the dependent and independent variables. Among the above forms, the double log model is often preferred because its coefficients directly give elasticity values.
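The claim that the slope of a double log model is an elasticity can be illustrated numerically. The following sketch assumes a hypothetical demand curve Q = 100 x P^(-1.2), generated without any error term, and fits a straight line to log Q against log P using the usual least squares formulas. The helper name and the figures are assumptions made for this illustration only.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical constant-elasticity demand: Q = 100 * P^(-1.2)
prices = [1.0, 2.0, 4.0, 8.0, 16.0]
quantities = [100.0 * p ** -1.2 for p in prices]

log_p = [math.log(p) for p in prices]
log_q = [math.log(q) for q in quantities]

intercept, slope = fit_line(log_p, log_q)

# The slope of the log-log line recovers the elasticity used above.
print(round(slope, 4))
```

Because the data were generated with an elasticity of -1.2, the fitted slope reproduces that value; with real data the slope would be an estimate of the elasticity rather than an exact figure.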
Thus, in the model specification stage, we mainly consider the variables to be included in the model and the mathematical form of the model. Any error committed at this stage will lead to errors termed specification bias or specification error, as mentioned earlier.
2 Estimation of the model
As mentioned above, one of the objectives of econometric models is to estimate the
coefficients. Estimations are possible only if data are gathered. Data can be collected either by
census method or sample method. Important sampling methods used are simple random sample,
stratified sample, systematic sample, multistage sampling, cluster sampling and quota sampling.
Similarly, data are classified into primary data, secondary data, time series data, cross section
data and pooled data.
In econometric models, the distinction between time series data and cross section data is
important. To make the distinction clear, let us consider the following example,
Year     1999   2000   2002   2003   2004   2005   2007   2008   2010
Sales      15     14     17     14     12     14     17     14     12
A casual look at the data set gives the impression that it is a time series, because it is ordered in time. But the given set is neither time series nor cross section. Why?
For a data set to be a time series, two conditions must hold: the data must be collected at equal intervals, and they must relate to a single entity. The given data are not recorded at equal intervals (the years 2001, 2006 and 2009 are missing) and hence do not form a time series. But if we are provided with sales data for a few years at regular intervals (one year, six months, etc.), they definitely constitute time series data.
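The equal-interval condition can be checked mechanically. The sketch below uses the years from the sales example; the function name is a hypothetical helper introduced only for this illustration.

```python
def has_equal_intervals(periods):
    """True when consecutive observations are separated by the same gap."""
    gaps = {later - earlier for earlier, later in zip(periods, periods[1:])}
    return len(gaps) <= 1


# Years from the sales example: 2001, 2006 and 2009 are missing.
sales_years = [1999, 2000, 2002, 2003, 2004, 2005, 2007, 2008, 2010]

# A complete annual series for comparison.
annual_years = list(range(1999, 2011))

print(has_equal_intervals(sales_years))   # False: gaps of one and two years
print(has_equal_intervals(annual_years))  # True: every gap is one year
```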
Now what is cross section data? When we gather information on multiple entities at a point of time, it is called cross section data. For example, if we gather details of income, savings, education, occupation, etc. of a group of 35 persons at a point of time, it is a good example of cross section data. In other words, survey data are broadly cross section data.
In short, time series data are gathered over intervals of time, while cross section data are gathered at a point of time. The classification into time series and cross section data is important because the choice of appropriate techniques depends on the nature of the data.
Another set of data used in econometric modelling is pooled data. Pooled data, put simply, are a combination of time series and cross section data. But the treatment of a pooled data set is a little complicated.
Aggregation problem
Once the data are collected, another issue to be dealt with is the aggregation problem. The aggregation problem arises from the irrational pooling of data. Aggregation problems are classified into aggregation over individuals, over commodities, over space and over time.
Aggregation over individuals arises when we take the sum total of the incomes of a few individuals or firms. When we do this exercise, we are likely to commit errors. For example, if the incomes of three persons X, Y and Z are Rs 100000, Rs 10000 and Rs 500 respectively, their aggregate income can easily be computed as Rs 110500 and their average income as Rs 36833. But this computation, and any comparison based on it, is unscientific, because the average is not representative of any of the three persons; this is the problem of aggregation over individuals. We may aggregate the quantities of various commodities using appropriate quantity indexes, or the prices of a group of commodities using an appropriate price index. But these aggregations may lead to errors, called aggregation over commodities.
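The three-person income example can be worked out as follows. This is only an illustrative sketch using the figures given in the text; the variable names are assumptions.

```python
# Incomes (in rupees) of the three persons in the example.
incomes = {"X": 100000, "Y": 10000, "Z": 500}

aggregate_income = sum(incomes.values())
average_income = aggregate_income / len(incomes)

# Distance of each person's income from the average.
gap_from_average = {
    person: income - average_income for person, income in incomes.items()
}

print(aggregate_income)       # 110500
print(round(average_income))  # 36833
for person, gap in gap_from_average.items():
    print(person, round(gap))
```

Every individual lies far from the average (one far above it, two far below it), which is why conclusions drawn from the aggregate alone can be misleading.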
When we collect data for different purposes, periodicity is very important. But in many practical situations, this periodicity is not maintained. For example, in India, data are recorded on two bases: some data are recorded by calendar year, while others are recorded by financial year. Accountants admit that these differences create considerable difficulties when computing certain ratios or comparing different years. This problem is called aggregation over time.
Finally, the aggregation of the populations of different towns, countries or regions also creates problems. This problem is called aggregation over space. The above sources of aggregation
create various complications which may impart some aggregation bias in the estimates of the
coefficients.
Identification problem
While discussing the econometric methodology, econometricians mention the problem of identification of coefficients. This problem arises seriously only in the case of simultaneous equation models, but a brief mention is made here.
We know that demand is a function of price. Similarly, supply is also a function of price. At the equilibrium point, demand equals supply. Since the observed data on price and quantity are equilibrium values, we do not know whether we are estimating the parameters of the demand function or of the supply function. The problem becomes more complex when we deal with a system with a large number of equations.
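A small simulation can make the identification problem concrete. The sketch below is built on assumptions introduced only for illustration: a demand curve with slope -2, a supply curve with slope +1, and independent random shocks of equal variance to both curves. Equilibrium price and quantity are generated, and then quantity is regressed on price by least squares.

```python
import random


def ols_slope(x, y):
    """Slope of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


random.seed(42)  # fixed seed so the illustration is reproducible

DEMAND_SLOPE = -2.0  # assumed: Q = 20 - 2P + u
SUPPLY_SLOPE = 1.0   # assumed: Q = 2 + 1P + v

prices, quantities = [], []
for _ in range(5000):
    u = random.gauss(0, 1)  # shock to demand
    v = random.gauss(0, 1)  # shock to supply
    # Equilibrium: 20 + DEMAND_SLOPE*P + u = 2 + SUPPLY_SLOPE*P + v
    p = (20 - 2 + u - v) / (SUPPLY_SLOPE - DEMAND_SLOPE)
    q = 2 + SUPPLY_SLOPE * p + v
    prices.append(p)
    quantities.append(q)

fitted_slope = ols_slope(prices, quantities)
print(round(fitted_slope, 2))
```

Under these assumptions the fitted slope settles near -0.5, which is neither the demand slope (-2) nor the supply slope (+1) but a mixture of the two. This is the sense in which the observed data alone cannot identify either function.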
Choice of the appropriate econometric technique: The next issue is the selection of the appropriate method for estimating the coefficients of economic relationships. The kit of econometric tools provides different techniques, which can be split into single equation techniques and simultaneous equation techniques. The important single equation techniques are the Ordinary Least Squares method, the Indirect Least Squares or Reduced Form technique, the Two Stage Least Squares method, the Limited Information Maximum Likelihood method and mixed estimation. Simultaneous equation techniques are techniques which are applied to all the equations of a system at once, and give estimates of the coefficients of all the functions simultaneously. The most important are the Three Stage Least Squares method and the Full Information Maximum Likelihood method. The selection of the method depends on the following.
1. The nature of the relation and its identification condition.
2. The properties of the estimates of the coefficients obtained from each technique
3. Simplicity of the method
4. Time and cost requirements of the method
5. The desirable properties expected for the coefficients.
3 Evaluation of estimates
After the estimation of the model, the econometrician must proceed with the evaluation of the results of the computations, that is, with testing the reliability of the results. The evaluation consists of deciding whether the estimates of the parameters are theoretically meaningful and statistically satisfactory. For this purpose, we use different criteria, namely a priori criteria, statistical criteria and econometric criteria.
Statistical criteria (First order test): The estimated coefficients may be acceptable on a priori grounds but need not be statistically valid. Thus the validity of the model is to be ascertained using statistical criteria. The frequently used tests are the standard error, the t test, the coefficient of determination and the F ratio. These tests are discussed later in detail.
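As a preview of these measures, the sketch below computes them for a simple two-variable regression. The data are hypothetical, and the formulas used are the standard textbook ones for a regression with one explanatory variable (they are not taken from this passage): the standard error of the slope is the square root of the residual variance divided by the sum of squared deviations of X, the t ratio is the slope divided by its standard error, the coefficient of determination is one minus the ratio of the residual to the total sum of squares, and the F ratio is the explained sum of squares divided by the residual variance.

```python
import math

# Hypothetical observations on X (explanatory) and Y (dependent).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
residual_ss = sum(e ** 2 for e in residuals)      # unexplained variation
total_ss = sum((yi - mean_y) ** 2 for yi in y)    # total variation
explained_ss = total_ss - residual_ss

residual_variance = residual_ss / (n - 2)         # two parameters estimated
se_slope = math.sqrt(residual_variance / sxx)     # standard error of slope
t_ratio = slope / se_slope
r_squared = 1 - residual_ss / total_ss            # coefficient of determination
f_ratio = explained_ss / residual_variance        # one explanatory variable

print(round(slope, 2), round(se_slope, 4))
print(round(t_ratio, 1), round(r_squared, 4), round(f_ratio, 1))
```

In a regression with a single explanatory variable the F ratio equals the square of the t ratio of the slope, which the figures printed above can be used to confirm.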
Econometric criteria (Second order test): The validity of the model also depends on the validity of the assumptions of the model, or more specifically the stochastic assumptions. If the assumptions of the econometric method applied by the investigator are not satisfied, either the estimates of the parameters cease to possess some of their desirable properties, or the statistical criteria lose their validity and become unreliable for determining the significance of these estimates.
When the model does not satisfy the economic, statistical or econometric criteria, it is appropriate to re-specify the model. This process of re-specification and re-estimation should continue until we get reliable estimates.
will normally differ from the new estimates. The difference is tested for statistical significance
with appropriate methods.
There are often competing models capable of explaining the same recurring relationship, called
an empirical regularity, but few models provide useful clues to the magnitude of the association.
Yet this is what matters most to policymakers. When setting monetary policy, for example,
central bankers need to know the likely impact of changes in official interest rates on inflation
and the growth rate of the economy. It is in cases like this that economists turn to econometrics.
Econometrics uses economic theory, mathematics, and statistical inference to quantify economic
phenomena. In other words, it turns theoretical economic models into useful tools for economic
policymaking. The objective of econometrics is to convert qualitative statements (such as "the relationship between two or more variables is positive") into quantitative statements (such as "consumption expenditure increases by 95 cents for every one dollar increase in disposable income"). Econometricians (practitioners of econometrics) transform models developed by economic theorists into versions that can be estimated. As Stock and Watson put it, "econometric methods are used in many branches of economics, including finance, labor economics, macroeconomics, microeconomics, and economic policy." Economic policy decisions are rarely made without econometric analysis to assess their impact.
Econometrics can be divided into theoretical and applied components.
Theoretical econometricians investigate the properties of existing statistical tests and procedures for estimating unknowns in the model. They also seek to develop new statistical procedures that are valid (or robust) despite the peculiarities of economic data, such as their tendency to change simultaneously. Theoretical econometrics relies heavily on mathematics, theoretical statistics, and numerical methods to prove that the new procedures have the ability to draw correct inferences.
Applied econometricians, by contrast, use econometric techniques developed by the theorists to translate qualitative economic statements into quantitative ones. Because applied econometricians are closer to the data, they often run into (and alert their theoretical counterparts to) data attributes that lead to problems with existing estimation techniques. For example, the econometrician might discover that the variance of the data (how much individual values in a series differ from the overall average) is changing over time.
The main tool of econometrics is the linear multiple regression model, which provides a formal approach to estimating how a change in one economic variable, the explanatory variable, affects the variable being explained, the dependent variable, taking into account the impact of all the other determinants of the dependent variable. This qualification is important because a regression seeks to estimate the marginal impact of a particular explanatory variable after taking into account the impact of the other explanatory variables in the model.
The methodology of econometrics is fairly straightforward. It involves 4 steps as explained
below.
The first step is to suggest a theory or hypothesis to explain the data being examined. The
explanatory variables in the model are specified, and the sign and/or magnitude of the
relationship between each explanatory variable and the dependent variable are clearly stated. At
this stage of the analysis, applied econometricians rely heavily on economic theory to formulate
the hypothesis. For example, a tenet of international economics is that prices across open borders
move together after allowing for nominal exchange rate movements (purchasing power parity).
The empirical relationship between domestic prices and foreign prices (adjusted for nominal
exchange rate movements) should be positive, and they should move together approximately one
for one.
The second step is the specification of a statistical model that captures the essence of the theory the economist is testing. The model proposes a specific mathematical relationship between the dependent variable and the explanatory variables, on which, unfortunately, economic theory is usually silent. By far the most common approach is to assume linearity, meaning that any change in an explanatory variable will always produce the same change in the dependent variable (that is, a straight-line relationship).
Because it is impossible to account for every influence on the dependent variable, a catchall variable is added to the statistical model to complete its specification. The role of the catchall is to represent all the determinants of the dependent variable that cannot be accounted for, because of either the complexity of the data or its absence. Economists usually assume that this "error" term averages to zero and is unpredictable, simply to be consistent with the premise that the statistical model accounts for all the important explanatory variables.
The third step involves using an appropriate statistical procedure and an econometric software package to estimate the unknown parameters (coefficients) of the model using economic data. This is often the easiest part of the analysis thanks to readily available economic data and excellent econometric software. However, just because something can be computed doesn't mean it makes economic sense to do so.
The fourth step is by far the most important: administering the "smell test". Does the estimated model make economic sense, that is, yield meaningful economic predictions? For example, are the signs of the estimated parameters that connect the dependent variable to the explanatory variables consistent with the predictions of the underlying economic theory? (In the household consumption example, for instance, the validity of the statistical model would be in question if it predicted a decline in consumer spending when income increased.) If the estimated parameters do not make sense, how should the econometrician change the statistical model to yield sensible estimates? And does a more sensible estimate imply an economically significant effect? This step, in particular, calls on and tests the applied econometrician's skill and experience.
REGRESSION ANALYSIS
The term regression was introduced by Francis Galton. Regression analysis is concerned
with the study of the dependence of one variable (dependent variable) on one or more other
variables (explanatory variables) with a view to estimating the average (mean) value of the
former in terms of known (fixed) values of the latter.
Galton found that, although there was a tendency for tall parents to have tall children and for short parents to have short children, the average height of children born of parents of a given height tended to move or "regress" towards the average height in the population as a whole. In other words, the height of the children of unusually tall or unusually short parents tends to move towards the average height of the population. In the modern view of regression, the concern is with finding out how the average height of sons changes, given the fathers' height. Regression analysis is largely concerned with estimating and/or predicting the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable.
Origin of the Linear Regression Model
There are different methods for estimating the parameters of a model. Of these, the most popular and widely used is the regression technique using the Ordinary Least Squares (OLS) method. This method is used because of the inherent properties of the estimates derived from it. But first let us try to understand the rationale of this method. For this purpose, let us go back to the demand theory and the consumption function which we discussed in the earlier chapter. Demand theory says that there is a negative relation between price and quantity demanded, ceteris paribus. In the case of the consumption function, there is a positive relation between consumption expenditure and income. There are three important questions here.
1. Which is the dependent variable and which is the independent variable?
2. Which is the appropriate mathematical form which explains the phenomenon?
3. What is the expected sign and magnitude of the coefficients?
In order to answer these questions, the theory will give the necessary support.
In the case of the demand equation, quantity demanded is the dependent variable and price is the independent variable. Economic theory does not dictate the choice between single equation models and simultaneous equation models. So we may assume that the relation is explained with the help of a single equation, and a linear one at that. As far as the sign and magnitude of the coefficients are concerned, in the equation D = α + βP + U, α can take any value but is preferably zero or positive. It shows the quantity demanded at price zero. A negative quantity demanded is not meaningful, so if we get a negative value it can be approximated to zero. β can be positive or negative, but normally it will be negative, assuming that the commodity demanded is a normal good. Of course, the elasticity of demand for the commodity also influences the magnitude and nature of this value.
In the case of the consumption function, consumption is the dependent variable and income is the independent variable. Whether the relation is linear or non-linear is a debatable issue. For instance, the psychological law of Keynes suggests that when income increases, consumption also increases, but less than proportionately. So assuming that consumption and income are linearly related is, in one way, an oversimplification. But for the time being let us assume so, just for explanatory purposes. The sign and magnitude of the parameters α and β have a meaning and an interpretation. α represents consumption when income takes the value zero; that is, according to theory, it is autonomous consumption. Similarly, β is nothing but the marginal propensity to consume, which is normally less than 1 and cannot be negative.
Based on the rationale and logic discussed above, let us rewrite the demand equation as D = α + βP + U, where D is the quantity demanded, P is price, and α and β are the parameters to be estimated. In order to estimate these parameters, we use the Ordinary Least Squares (OLS) method. Once we plot the observations and the estimated line on a graph, we can see the deviations between the actual and estimated observations, popularly called errors. Naturally, a rational decision is to minimize these errors. Thus, from all possible lines, we choose the one for which the deviations of the points are the smallest possible. The least squares criterion requires that the regression line be drawn in such a way as to minimize the sum of the squares of the deviations of the observations from it. The first step is to draw the line so that the sum of the simple deviations of the observations is zero. Some observations will lie above the line and will have a positive deviation, some will lie below the line and will have a negative deviation, and the points lying on the line will have a zero deviation. In summing these deviations the positive values offset the negative values, so that the final algebraic sum of these residuals equals zero. Mathematically, Σe = 0. Since the sum total of deviations is 0, it cannot be minimized as such. So we square the deviations and minimize the sum of the squares, Σe². Thus we call this method the least squares method.
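The two properties described above (residuals summing to zero and a minimum sum of squared residuals) can be verified on a small data set. The sketch below uses hypothetical price and demand figures and the standard least squares formulas for a two-variable regression; the function names are assumptions made for this illustration.

```python
def fit_ols(x, y):
    """Return (alpha, beta) of the least squares line y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def sum_squared_errors(x, y, alpha, beta):
    """Sum of squared deviations of the observations from a given line."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical observations on price (P) and quantity demanded (D).
price = [1.0, 2.0, 3.0, 4.0, 5.0]
demand = [9.8, 8.1, 6.2, 3.9, 2.0]

alpha_hat, beta_hat = fit_ols(price, demand)
residuals = [d - (alpha_hat + beta_hat * p) for p, d in zip(price, demand)]

# Compare the fitted line with two arbitrary alternative lines.
best = sum_squared_errors(price, demand, alpha_hat, beta_hat)
shifted = sum_squared_errors(price, demand, alpha_hat + 0.5, beta_hat)
tilted = sum_squared_errors(price, demand, alpha_hat, beta_hat + 0.2)

print(round(alpha_hat, 2), round(beta_hat, 2))  # intercept and (negative) slope
print(abs(sum(residuals)) < 1e-9)               # residuals sum to zero
print(best < shifted and best < tilted)         # smallest sum of squares
```

Any other line, whether shifted up or tilted, gives a larger sum of squared deviations than the fitted line, which is what the least squares criterion requires.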
Population Regression Function (PRF)
Mathematically, a population regression function (PRF) or conditional expectation function
(CEF) can be defined as the average value of the dependent variable for a given value of the
explanatory or independent variable. In other words, the PRF shows how the average value
of the dependent variable varies with the given value of the explanatory variable. On the other
hand, when we estimate the average value of the dependent variable with the help of a sample, the
result is called the stochastic sample regression function (SRF). Symbolically, the PRF can be
written as
E(Y | Xi) = f (Xi)
where f (Xi) denotes some function of the explanatory variable X.
This relation is known as the conditional expectation function
(CEF) or population regression function (PRF). It states merely that the expected value of the
distribution of Y given Xi is functionally related to Xi. In simple terms, it tells how the mean or
average response of Y varies with X. For example, an economist might posit that consumption
expenditure is linearly related to income. Therefore, as a first approximation or a working
hypothesis, we may assume that the PRF E(Y | Xi) is a linear function of Xi,
E(Y | Xi) = β1 + β2Xi
where β1 and β2 are unknown but fixed parameters known as the regression coefficients;
β1 and β2 are also known as the intercept and slope coefficients, respectively.
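The idea of a PRF as a set of conditional means can be sketched with a small made-up population. In the example below, the income levels, the consumption figures and the variable names are all hypothetical; the figures were picked so that the conditional means fall exactly on a straight line, which will not generally happen with real data.

```python
from statistics import mean

# Hypothetical population: each income level X maps to the consumption
# expenditure Y of every family at that income level (made-up figures).
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
}

# E(Y | Xi): the average of Y over all families sharing the income level Xi.
conditional_means = {x: mean(ys) for x, ys in population.items()}

# Because these conditional means happen to lie on a straight line, the
# intercept and slope can be recovered from any two of them.
slope = (conditional_means[120] - conditional_means[80]) / (120 - 80)
intercept = conditional_means[80] - slope * 80

for x, m in conditional_means.items():
    print(f"E(Y | X = {x}) = {m}")
print(f"Implied PRF: E(Y | Xi) = {intercept:.1f} + {slope:.1f} Xi")
```

Here the conditional means are 65, 77 and 89, so the implied intercept is 17 and the implied slope is 0.6. Individual families still differ from the mean of their income group, which is where the disturbance term comes in.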
We can express the deviation of an individual Yi around its expected value as follows:
ui = Yi - E(Y | Xi), or
Yi = E(Y | Xi) + ui, where the deviation ui is an unobservable random variable taking
positive or negative values. Technically, ui is known as the stochastic disturbance or stochastic
error term.
We can say that the expenditure of an individual family, given its income level, can be
expressed as the sum of two components: (1) E(Y | Xi), which is simply the mean consumption
expenditure of all the families with the same level of income, known as the
systematic, or deterministic, component; and (2) ui, which is the random, or nonsystematic,
component. The term ui is a surrogate or proxy for all the omitted or neglected variables that
may affect Y but are not (or cannot be) included in the regression model.
If E(Y | Xi) is assumed to be linear in Xi, it may be written as
Yi = E(Y | Xi) + ui
= β1 + β2Xi + ui
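One step that the passage leaves implicit may be worth writing out. Taking the conditional expectation, given Xi, of both sides of Yi = E(Y | Xi) + ui shows that the disturbance has a conditional mean of zero. The lines below are a sketch that uses only the definitions above, together with the fact that E(Y | Xi) is a constant once Xi is given.

```latex
\begin{align*}
E(Y_i \mid X_i) &= E\big[\,E(Y \mid X_i) \mid X_i\,\big] + E(u_i \mid X_i) \\
                &= E(Y \mid X_i) + E(u_i \mid X_i) \\
\Rightarrow \quad E(u_i \mid X_i) &= 0
\end{align*}
```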
Sample Regression Function (SRF)
Since the entire population is rarely available, we cannot compute the mean of Y for each given
Xi directly and have to estimate the PRF on the basis of sample information. From a given sample
we can estimate the mean value of Y corresponding to the chosen Xi values. The estimated PRF may
not be accurate because of sampling fluctuations, so only an approximation to the PRF
can be obtained. In general, we would get N different sample regression functions (SRFs) for N
different samples, and these SRFs are not likely to be the same.
We can develop the concept of the sample regression function (SRF) to represent the
sample regression line:
Ŷi = β̂1 + β̂2Xi
where Ŷi is the estimator of E(Y | Xi), and β̂1 and β̂2 are the estimators of β1 and β2.
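The point that different samples give different SRFs can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the "true" PRF with intercept 17 and slope 0.6, the normally distributed disturbance with standard deviation 5, the income levels, the seed, and the helper names ols_fit and draw_sample. The estimates use the standard two-variable least squares formulas.

```python
import random


def ols_fit(x, y):
    """Return (b1_hat, b2_hat) for the line y = b1 + b2 * x,
    using the standard two-variable least squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2_hat = s_xy / s_xx
    b1_hat = y_bar - b2_hat * x_bar
    return b1_hat, b2_hat


# Assumed "true" PRF, used only to generate illustrative data.
BETA1, BETA2, SIGMA = 17.0, 0.6, 5.0
INCOME_LEVELS = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]


def draw_sample(rng):
    """One sample: one family per income level, Yi = BETA1 + BETA2 * Xi + ui."""
    x = list(INCOME_LEVELS)
    y = [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA) for xi in x]
    return x, y


rng = random.Random(12345)  # fixed seed so that a run can be repeated
N_SAMPLES = 5
srfs = [ols_fit(*draw_sample(rng)) for _ in range(N_SAMPLES)]

for k, (b1_hat, b2_hat) in enumerate(srfs, start=1):
    print(f"Sample {k}: Y_hat = {b1_hat:.2f} + {b2_hat:.3f} X")
```

Each sample yields its own intercept and slope, scattered around the values used to generate the data; none of them should be expected to reproduce the PRF exactly.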
determine the approximate functional form. A scattergram cannot be visualised in
multidimensional form. For all these reasons, the stochastic disturbance ui assumes an extremely
critical role in regression analysis.
Assumptions of Classical Linear Regression Model
1. U is a random real variable. The value which it may assume in any one period depends on
chance. It may be positive, zero or negative. Each value has a certain probability of
being assumed by U in any particular instance.
2. The mean value of U in any particular period is zero. If we consider all the possible
values of U, for any given value of X, they would have an average value equal to zero.
With this assumption we may say that Y = α + βX + U gives the relationship between
X and Y on the average. That is, when X assumes the value X1, the dependent variable
will on the average assume the value Y1, although the actual value of Y observed in any
particular occasion may display some variation.
3. The variance of U is constant in each period. The variance of U about its mean is
constant at all values of X. In other words, for all values of X, the values of U will show the
same dispersion around their mean.
4. The variable U has a normal distribution.
5. The random terms of different observations are independent. This means that the
covariances of any ui with any other uj (i ≠ j) are equal to zero.
6. U is independent of the explanatory variables.
The above-mentioned assumptions are the classical assumptions of regression estimation and
make the OLS method efficient.
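Assumptions 2, 3 and 5 can be visualised with simulated disturbances. The sketch below does not test any real data: it generates values of U that satisfy the assumptions by construction and then checks that their sample mean is close to zero, their variance is about the same at every X, and disturbances at different X values are nearly uncorrelated. The standard deviation, the X values, the seed and the helper name covariance are all hypothetical choices.

```python
import random
from statistics import mean, pvariance

rng = random.Random(2024)   # fixed seed so that a run can be repeated
SIGMA = 2.0                 # assumed common standard deviation of U
X_VALUES = [10, 20, 30]
DRAWS = 5000                # number of disturbances generated at each X

# Independent normal disturbances with zero mean and the same variance at every X.
u_by_x = {x: [rng.gauss(0.0, SIGMA) for _ in range(DRAWS)] for x in X_VALUES}

# Sample mean and variance of U at each value of X.
summary = {x: (mean(us), pvariance(us)) for x, us in u_by_x.items()}


def covariance(a, b):
    """Sample covariance of two equally long lists (divisor n)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)


cov_10_20 = covariance(u_by_x[10], u_by_x[20])

for x, (m, v) in summary.items():
    print(f"X = {x}: mean of U = {m:.3f}, variance of U = {v:.3f}")
print(f"Covariance between U at X = 10 and U at X = 20: {cov_10_20:.3f}")
```

The means should come out near zero, the variances near SIGMA squared (4 here), and the covariance near zero, with small differences due to sampling.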
There are a few other assumptions also used in OLS estimation. They are:
(i) The explanatory variables are measured without error. In the case of the dependent
variable, errors of measurement may or may not arise.
(ii) The explanatory variables are not perfectly linearly correlated. If there is more than one
explanatory variable in the relationship, it is assumed that they are not perfectly correlated with
each other. More specifically, we are assuming the absence of multicollinearity.
(iii) There is no aggregation problem. In the previous chapter, we discussed aggregation over
individuals, time, space and commodities. So we assume the absence of all these problems.
(iv) The relationship being estimated is identified. This means that the relationship has a unique
mathematical form, so that there is no confusion about the coefficients and the equations to
which they belong.
(v) The relationship is correctly specified. It is assumed that we have not committed any
specification error in determining the explanatory variables, in deciding the mathematical form
etc.
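Assumption (ii) in the list above can be illustrated with a small calculation. In a regression with two explanatory variables, the slope estimates come from solving a pair of normal equations whose coefficient matrix is built from the sums of squares and cross products of the regressors measured from their means. The sketch below, with made-up data and hypothetical helper names, shows that the determinant of that matrix is zero when one regressor is an exact linear function of the other, so that no unique solution exists.

```python
def centred_cross_products(a, b):
    """Sum of products of the deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def determinant_2x2(x1, x2):
    """Determinant of the centred cross-product matrix of two regressors.
    A value of zero means the normal equations have no unique solution."""
    s11 = centred_cross_products(x1, x1)
    s22 = centred_cross_products(x2, x2)
    s12 = centred_cross_products(x1, x2)
    return s11 * s22 - s12 ** 2


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_collinear = [2.0 * v for v in x1]       # exact linear function of x1
x2_distinct = [2.0, 1.0, 4.0, 3.0, 6.0]    # correlated with x1, but not perfectly

det_collinear = determinant_2x2(x1, x2_collinear)
det_distinct = determinant_2x2(x1, x2_distinct)

print("Determinant with perfectly collinear regressors:", det_collinear)
print("Determinant with imperfectly correlated regressors:", round(det_distinct, 4))
```

A zero determinant in the first case means the separate effects of the two variables cannot be estimated; in the second case the determinant is positive and the estimates are well defined.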
*************