STATISTICS
BY
D. N. ELHANCE, M. COM.,
Head of the Department and Dean of the Faculty of Commerce,
University of Jodhpur,
Jodhpur.
Printed by: Eagle Offset Printers, 15, Thornhill Road, Allahabad
Published by: Kitab Mahal, 15, Thornhill Road, Allahabad
IN MEMORY
OF
MY FATHER
PREFACE TO THE THIRTEENTH EDITION
A new edition of the now famous book on Statistics has come out
maintaining its old traditions intact but with new approaches all round
to register and record the various changing aspects.
Calculations have been re-calculated in order to eliminate even the
slightest variations which may have crept in during the past years.
Change to metric system has also been completed.
In its present form the utility of the book has increased considerably,
and University students as well as administrators will find sufficient
material for their guidance and assistance.
The author will feel grateful to the discriminating student community
and the general users of the book for their indulgence in pinpointing
any error.
D. N. ELHANCE
PREFACE TO THE SIXTH EDITION
The present edition of this book has many new features. Two
new chapters, Designs of Experiments and Statistical Quality Control,
have been added in this volume. The chapter on Growth of Statistics
in India has been brought up to date and the latest figures have been
substituted for old ones.
Some chapters of this book have been revised and new points have
been included in them. A large number of fresh questions have been
added at the end of each chapter to make the book more useful to
examinees. The entire portion on Indian Statistics has been brought up
to date.
I hope the present volume will be found useful by the students
of the subject. I am grateful to a number of students and friends who
gave me valuable suggestions for the improvement of the book and
I am confident that they will continue to do so in future also.
D. N. ELHANCE
PREFACE TO THE SECOND EDITION
From the various reviews which appeared in a large number of
journals and papers, I conclude that the first edition of this book was
very well received. In the present edition I have rearranged certain
chapters and brought the chapter on Growth of Statistics in India up to
date. Besides, I have included a large number of new questions at the
end of each chapter.
The book is now divided into two volumes. Volume I covers the
entire B.Com., B.A. and B.Sc. course of statistics of all the universities
of India and Pakistan. Volume II contains chapters on Probability,
Theoretical Frequency Distributions and Sampling. The two volumes
are available separately as well as in a combined form.
I am grateful to a large number of friends who have given me
valuable suggestions for the improvement of the book. I hope the
students of the subject will find the book more useful than before.
15th April, 1958. D. N. ELHANCE
PREFACE TO THE FIRST EDITION
The science of statistics has assumed great importance in recent
years. It was once known as the "Science of Kings" and its scope
was extremely limited, but today the science of statistics has become
an all-important science, without which no other science can progress.
The modern age is the age of statistics and it is very correctly said that
the extent of the economic development of a country can best be known
by finding out the extent to which statistical organisation has developed
there. Till recently the foreign government of our country and even
our countrymen were very indifferent towards statistics. After the
independence of the country the era of economic planning started and along
with it the importance of statistics increased considerably. In fact
economic planning cannot be imagined in the absence of statistical
data.
It is a matter of great satisfaction that the importance of statistics
is gradually being realised in our country and that it is occupying the
place of honour which it should have got much earlier. Statistics
is now taught in almost all the universities of the country and there are
a number of statistical institutes which impart special training in this
subject. This book is an attempt to furnish a simple, non-mathematical
text for those who desire to equip themselves with a knowledge of the
elementary statistical methods used in modern times. The treatment
of this subject has been as far as possible of a non-mathematical character
because most of the students who study this subject do not have
a mathematical background. This book has been written primarily
for the use of M.A., M.Com., and B.Com. students who study this subject.
The book covers the entire course which is prescribed for the statistics
paper in these examinations in various universities of the country as
also the courses prescribed for this paper in the I.A.S. and P.C.S.
examinations. A large number of questions have been given at the end of
each chapter with a view to helping the students in solving numerical
problems and thus familiarising themselves with different types of formulae
used in statistical analysis.
I am grateful to my colleagues in the Faculty of Commerce, Allahabad
University, who have given me some very valuable suggestions.
Thanks are also due to Mr. S.V. Erasmus, my secretary, who worked
almost like a machine for all the days during which this book was
written. Kitab Mahal, my publishers, deserve congratulations for the
nice printing and get-up of the book.
1st December, 1956. D. N. ELHANCE
CONTENTS
CHAPTER
1. Meaning and Definition of Statistics
2. Origin and Growth of Statistics
3. Importance, Limitations and Functions of Statistics
4. Preliminaries to the Collection of Data
5. Collection of Primary and Secondary Data
6. Accuracy, Approximation and Errors
7. Classification, Seriation and Tabulation
8. Ratios, Percentages and Logarithms
9. Measures of Central Tendency
10. Measures of Dispersion
11. Moments, Skewness and Kurtosis
12. Index Numbers
13. Diagrammatic Representation of Data
14. Graphic Representation of Data
15. Analysis of Time Series
16. Correlation
17. Regression and Ratio of Variation
18. Theory of Attributes and Consistence of Data
19. Association of Attributes
20. Interpolation
21. Business Forecasting
22. Interpretation of Data
23. Probability
24. Theoretical Frequency Distributions
25. Theory of Sampling
26. Sampling of Attributes
27. Sampling of Variables (Large Samples)
28. Chi-square Test and Goodness of Fit
29. Sampling of Variables (Small Samples)
30. Analysis of Variance
31. Designs of Experiments
32. Statistical Quality Control
33. Growth of Statistics in India
34. Mathematical Tables
Statistical methods
The second sense in which the word statistics is used refers to
the statistical principles and methods used in collection, analysis and
interpretation of data. In this sense the word is used in singular.
Statistical methods (or statistics) have a very wide range. They include
not only simple and commonly known devices of comparison and
analysis, but also highly technical and mathematical formulae which
are capable of being understood only by experts who have received
special training in this subject.
Statistical methods and experimental methods. Statistical methods
include all those devices which are used in collection and simplification
of numerical data so as to render them capable of being analysed and
commonly understood without much difficulty. Statistical methods are
different from experimental methods inasmuch as the latter are more
accurate and precise than the former. In experimental methods it is
possible for us to study the effects of any one of the many factors
affecting a phenomenon individually by making the other factors inoperative
for the time being. Thus in physics it is not difficult to study the effects
of, say, only heat on the density of air by making other factors
inoperative for the duration of study. But the same thing is not possible
in statistical methods. It is not feasible to study the effects of, say,
only inflation on prices. The effects of inflation cannot be studied
separately from the effects of many other factors like demand, supply,
exports and imports, etc. However, by the use of statistical methods
it is possible to have a rough idea of the effects of inflation upon prices.
Statistical study cannot be as accurate as the study done by experimental
methods. Thus we see that statistical methods are comparatively less
accurate and are usually applied in inexact sciences like sociology though
even in physical sciences (which are classed as exact sciences), the use
of these methods is sometimes necessary. Statistical methods are thus
of universal application though their primary field is social sciences.
Thus "Statistics are numerical facts, but statistics is a body of
methods for making decisions when there is uncertainty arising from
the incompleteness or the unstability of the information available. The
decisions may be made either for the practical purpose of selecting
a course of action or for the scientific purpose of gaining general
knowledge."
DEFINITION
The term Statistics has been defined differently by different authors.
Some authors have defined the word as used in the first sense
(of numerical data) while others have defined it as used in the second
sense (of statistical methods or the science of statistics).
First Type
Of the first type of definitions the one given by Horace Secrist is
the most exhaustive. It is as follows:
MEANING AND DEFINITION OF STATISTICS 3
"By statistics we mean aggregates of facts affected to a marked
extent by multiplicity of causes, numerically expressed,
enumerated or estimated according to reasonable standards of
accuracy, collected in a systematic manner for a predetermined
purpose and placed in relation to each other."
This definition makes it clear that statistics (as numerical data)
should possess the following characteristics:
(i) They should be aggregates of facts. Single and unconnected
figures are not statistics. A single age of 25 years or 40 years is not
statistics but a series relating to the ages of a group of persons would
be called statistics. A single figure relating to birth, death, purchase,
sale, accident, etc., does not form statistics though aggregates of figures
relating to births, deaths, purchases, sales, accidents, etc., would be
called statistics because they can be studied in relation to each other and
are capable of comparison. It is possible to study them in relation to
time, place and frequency of occurrence.
(ii) They should be affected to a marked extent by multiplicity of causes.
Usually statistical facts are not traceable to a single cause. Since
statistics are most commonly used in social sciences it is only natural that
they are affected by a large variety of factors at the same time. It is
usually not possible to study the effects of any one of these factors
separately as is the case in experimental methods. In statistical methods
the effects of various factors affecting a particular phenomenon are
generally studied in a combined form though attempts are also made
to study the effects of different sets of factors separately as well. Most
of the statistics, however, are affected to a considerable degree by
multiple causation. For example, statistics of prices are affected by
conditions of supply, demand, exports, imports, currency circulation and
a large number of other factors.
(iii) They should be numerically expressed. Qualitative expressions
like good, bad, young, old, etc., do not form part of statistical studies
unless a numerical equivalent is assigned to each such expression. If
it is said that the production of wheat per acre in 1953 was 100 maunds
and in the year 1954 it was only 60 maunds or if it is said that of two
persons A and B, A is 20 years old and B 60 years old, we shall be
making statistical statements.
(iv) They should be enumerated or estimated according to reasonable
standards of accuracy. Numerical statements can either be enumerated,
in which case they are supposed to be accurate and precise, or they can
be estimated by some expert observers. Where the scope of statistical
enquiry is very wide or where the numbers are very large, enumeration
is usually out of question and in such cases figures can only be estimated.
It is obvious that estimated figures cannot be absolutely accurate and
precise. The degree of accuracy expected in such figures depends to
a large extent on the purpose for which statistics are collected and also
4 FUNDAMENTALS OF STATISTICS
on the nature of the particular problem about which data are being
collected. There cannot be a uniform standard of accuracy for all
types of enquiries. For example, if the heights of a group of individuals
are being measured it is all right if the measurements are correct to the
tenth of an inch but if we are measuring the distance from Bombay
to Calcutta, a difference of even a few furlongs can be easily ignored.
(v) They should be collected in a systematic manner. If figures are
collected in a haphazard fashion one can never be sure about the degree
of accuracy of such data. It is, therefore, essential that statistics must
be collected in a systematic manner so that they may conform to
reasonable standards of accuracy.
(vi) They should be collected for a predetermined purpose. It is obvious
that if statistical data are not collected with some predetermined aim
their usefulness would be almost negligible. Figures are usually
collected with some end in view, as without it all the efforts made in the
collection of figures would be completely wasteful and the figures so collected
would not be in any way useful.
(vii) They should be placed in relation to each other. Statistics are
collected mostly for the purpose of comparison. If the collected figures
are not capable of being compared with each other they lose a very
large part of their value. It is necessary that the figures which are
collected should be a homogeneous lot because it is not possible to
compare figures which are of a heterogeneous character and which
cannot be placed in relationship to each other. If, for example, the
height of a person and the money spent by him in getting his house
constructed are placed together it does not make any sense and the figures
cannot be compared to each other. Such figures naturally do not come
under the category of statistics.
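The point made under characteristic (iv), that a reasonable standard of accuracy depends on the size of the thing measured, can be illustrated with a short calculation. The following is only a minimal sketch in Python: the height of 66 inches, the distance of 1,000 miles and the error of 3 furlongs are illustrative assumptions and are not figures given in the text.

```python
# Illustrative figures only: none of these values come from the text.
INCHES_PER_FURLONG = 7920   # 1 furlong = 660 feet = 7,920 inches
FURLONGS_PER_MILE = 8


def relative_error(absolute_error, measured_value):
    """Return the error as a fraction of the quantity measured (same units)."""
    return absolute_error / measured_value


# An assumed height of 66 inches, measured correct to a tenth of an inch.
height_error = relative_error(0.1, 66.0)

# An assumed distance of 1,000 miles, measured with an error of 3 furlongs.
distance_in_furlongs = 1000 * FURLONGS_PER_MILE
distance_error = relative_error(3, distance_in_furlongs)

# The same 3-furlong error expressed in inches, for comparison with 0.1 inch.
distance_error_in_inches = 3 * INCHES_PER_FURLONG

print(f"height:   {height_error:.4%} of the measured value")
print(f"distance: {distance_error:.4%} of the measured value")
```

Under these assumed figures the error in the distance is far larger in absolute terms than the error in the height, yet it is a smaller fraction of the quantity measured, which is the sense in which it "can be easily ignored".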
Webster has also defined statistics in the same sense in which Secrist
has defined it. Webster's definition of statistics is as follows:
"Statistics are the classified facts respecting the conditions of
the people in a state ... specially those facts which can be stated
in numbers or in tables of numbers or in any tabular or classified
arrangement."
This definition is rather narrow. It confines statistics only to
those facts which relate to the condition of the people in a state. This
was a very old concept of the word statistics and it does not suit modern
conditions. At present statistics relate to all aspects of human activity
and as such this definition falls short of the modern concept of the term.
Moreover, this definition is not as clear and exhaustive as the one given
by Secrist.
Second Type
Of the second type of definitions of the term statistics (as statistical
methods or science of statistics) the one given by Seligman is very
short and simple and yet quite comprehensive. According to Seligman,
"Statistics is the science which deals with the methods of collecting,
classifying, presenting, comparing and interpreting numerical data
collected to throw some light on any sphere of enquiry."
According to King, "the science of statistics is the method of judging
collective, natural or social phenomenon from the results obtained
from the analysis or enumeration or collection of estimates." This
definition is not very exhaustive and it limits the scope of the science
of statistics. The author himself admits this defect but is of the view
that for practical purposes it is all right.
A. L. Bowley has given a series of definitions but most of the
definitions given by him are not complete and lay emphasis only on
some of the aspects of the science. At one place Bowley says, "Statistics
may be called the science of counting." At another place he is of
the view that "Statistics may rightly be called the science of averages".
Both these definitions are defective as the science of statistics does not
confine itself either to counting or to averaging alone. These are no
doubt important statistical methods but they do not cover the entire
field of the science of statistics. Yet another definition given by the
same author characterises statistics as "the science of measurement of
the social organism regarded as a whole in all its manifestations."
Obviously this definition limits the application of the statistical methods
to only one field, namely, sociology. Bowley realised this limitation
and he himself writes at another place that statistics cannot be confined
to any one science.
Boddington has defined statistics as the science of "estimates and
probabilities." This definition gives expression only to certain methods
by which conclusions are derived in this science. No doubt in most
of the cases statistics are estimates and probabilities but it should be
remembered that the scope of the science is not confined merely to
these things.
Lovitt defines the science as "that which deals with the collection,
classification and tabulation of numerical facts as the basis for
explanation, description and comparison of phenomena." This definition is
fairly satisfactory and it indicates that the science of statistics is a
simple and scientific exposition of statistical methods.
Having briefly discussed some of the definitions of the term statistics
and having seen their drawbacks we are now in a position to give
a simple and complete definition of the term in the following words:
Statistics (as used in the sense of data) are numerical statements of fact
capable of analysis and interpretation and the science of statistics is a study of
the principles and methods used in the collection, presentation, analysis and
interpretation of numerical data in any sphere of enquiry.
MAIN DIVISIONS OF THE STUDY OF STATISTICS
Statistics as a science can be divided into two main classes, namely,
statistical methods and applied statistics.
1. Statistical methods
Under statistical methods are studied all those devices, rules of
procedure and general principles which are applicable to all kinds or
groups of data. Thus they include all the general principles and
techniques which are commonly used in the collection, analysis and
interpretation of data relating to any sphere of enquiry. Statistical methods
are the tools in the hands of a statistical investigator. These are devices
for achieving the desired ends explained in theory. Since a method is
always a means to an end, its accuracy and precision depend on the
object which is desired to be achieved and this in turn is considerably
affected by the peculiar features of the problem to which it is related.
This is the reason why different statistical methods are used in different
types of enquiries and no uniform standard of accuracy is desired to
be achieved in different types of investigations.
2. Applied statistics
Applied statistics deal with the application of statistical methods
to specific problems or concrete forms. If we have to estimate the
national income of a country or its industrial or agricultural production
then the special techniques followed to achieve these ends and the
results obtained thereby would form part of applied statistics. As is
clear from the above explanation applied statistics can be further divided
into two main groups. They may be either descriptive or scientific.
Descriptive applied statistics deal with data which are known and
which naturally relate either to the present or to the past. For example,
business statistics are descriptive applied statistics, as they deal with
the analysis, measurement and presentation of business facts relating
to past or present. On the basis of these facts decisions about various
business problems are usually taken.
Scientific applied statistics deal with the formulation of physical
and psychological laws on the basis of quantitative data collected for
descriptive purposes by the use of appropriate statistical methods. If,
for example, by the use of some business statistics we are in a position
to derive certain conclusions, which we use for forecasting the future
trend or tendency of that particular phenomenon, we are making use
of scientific applied statistics. For purposes of business forecasting
we have to make use of such statistics.
OBJECTS OF STATISTICS
Questions
1. Explain clearly the concepts of statistics, statistical methods and statistical
sciences.
2. Examine the main differences between statistical methods and experimental
methods.
3. Critically examine the following definitions of statistics: "Statistics is a
science of counting", "Statistics is a science of averages", and "Statistics is a science
of the measurement of social organism in all its aspects". (B. Com. Agra, 1943).
4. Discuss the meaning and scope of statistics.
5. "Statistics affects everybody and touches life at many points. It is both
a science and an art." Explain the above statement with appropriate examples.
(B. Com. Agra, 1946).
6. "Statistics of a business can be treated scientifically and the preparation
and study of business statistics may be made a more exact science than the study of
national and social statistics". Explain. (B. Com. Allahabad, 1932).
7. "Science without statistics bears no fruit, statistics without science have
no root." Explain the above statement with necessary comments.
(M. A. Patna, 1943).
8. "Statistics is co-operative counting." Comment.
9. What are the characteristics that statistics (statistical data) possess? Explain
with illustrations.
10. What are the main divisions of statistics? Illustrate with examples.
11. Write a note on the objects of statistics.
12. "Statistical methods include all those devices of analysis and synthesis by
means of which statistics are scientifically collected and used to explain or describe
phenomena either in their individual or related capacities". Comment on the above
statement.
13. Explain with illustrations how statistical methods tend to clarity of thought,
accuracy of estimates, verification of theories and discovery of relations.
(B. Com. Agra, 1947).
14. "By statistics we mean quantitative data affected to a marked extent by a
multiplicity of causes". Explain. (M. Com. Agra, 1945).
15. In what ways can statistical methods be misused by interested persons?
Give at least two examples of the misuse of statistics.
16. "A statistician is not an alchemist expected to produce gold from any worthless
material." Comment.
2. Origin and Growth of Statistics
Early Beginnings
The origin of statistics is suggested by the derivation of this word.
It seems to have been derived from the Latin word status which means
a political state. In fact the origin of statistics was due to administrative
requirements of the state. Statistics in the past were a by-product of
administrative activity. The administration of the states required the
collection and analysis of data relating to population and material wealth
of the country for purposes of war and finance. The earliest form of
statistical data, therefore, relates to census of population and property;
collection of data for other purposes, however, was not entirely ruled
out. Perhaps one of the earliest censuses of population and wealth
was held in Egypt as early as 3050 B.C. for the erection of pyramids.
Rameses II conducted a census of all lands of Egypt. During the Middle
Ages such censuses were held in England, Germany and other Western
countries as well. In India about 2000 years ago we had an efficient
system of collecting administrative statistics. During the Hindu period,
particularly during the Mauryan regime, our country had an efficient
system of collecting vital statistics and of the registration of births and
deaths. Ain-i-Akbari gives us a detailed account of the administrative
and statistical survey conducted during the reign of Emperor Akbar.
The histories of the other countries of the world also clearly indicate
that in ancient times statistics was regarded as a matter connected with
the activities of the state and that is why it was known as a science of
statecraft. The systematic collection of official statistics originated in
Germany towards the end of the eighteenth century. In its earliest
form it was an attempt to assess, for political purposes, the relative
strengths of the German states by comparing population, industrial and
agricultural output. In England, statistics is a legacy of the Napoleonic
Wars. In order to raise the new taxes that the cost of the war demanded,
it was found necessary to collect such facts and figures as would
enable government to have an idea about the probable revenue and
expenditure more accurately.
Sixteenth Century
These spasmodic attempts made in ancient times to collect certain
facts and figures can be left out of account as in those days statistical
methods were not properly developed and we do not know the
technique by which these figures were collected. Most of these figures
are not available and all that we know is that such statistics were collected
in those days. It has been only within comparatively recent times that
mankind has realised the usefulness of collecting statistics
relating to the phenomena of physical and social universe. Prior to it,
the astronomers used to record the movements of heavenly bodies like
stars and planets to foretell their position and to make forecasts about
eclipses. Tycho Brahe (1546-1601) collected valuable information about
the movements of planets and Johannes Kepler made an exhaustive study
of these data and discovered the three famous laws relating to the
movement of planets. It was on the basis of these laws that Sir Isaac Newton
formulated his theory of gravitation. Sir Francis Bacon (1561-1626)
was of the opinion that a proper knowledge of nature can be obtained
only on the basis of the study of data relating to various forms of nature,
and under his influence this method was adopted by scholars in various
fields. When these methods proved their efficacy in physical sciences
and when it was found that the results obtained by the use of these devices
were very accurate, social sciences like politics, economics and sociology
all adopted statistical methods for the formulation of their theories
and for testing the degree of accuracy achieved by them.
Seventeenth Century
Eighteenth Century
The modern theory of statistics can be said to have been formulated
by L. A. J. Quetelet (1796-1874). He put forward the notion of the 'average
man' whose actions, he stated, conform to the average results obtained
from society. He was further of the opinion that the action and
behaviour of other persons deviated from this norm in a lesser or greater
degree and these deviations from this theoretical average were capable
of being treated by the method of errors and probability. He also
emphasised the importance of the 'law of large numbers' which was
formulated by Jacob Bernoulli.
In fact the science of statistics is highly indebted to the games of
chance. G. Cardano (1501-1576), who was a great mathematician and
at the same time a big gambler also, wrote a valuable treatise on the
hazards of the games of chance and he pronounced certain rules by which
the risks of gambling could be minimized and one could protect
oneself against cheating. These rules were based on the correct approach
to the problems which we, in modern times, study under the theory
of probability. Jacob Bernoulli and his nephew Daniel Bernoulli
(1700-1782) laid a solid foundation of the theory of probability and put forward
the idea of 'moral expectation'. It was after this that Pierre Simon de
Laplace (1749-1827) published, in 1812, his monumental work on the
theory of probability. This work is recognised as one of the best ever
done on the subject of probability. It is both mathematical and
philosophical. Later on most of the prominent mathematicians of
the eighteenth and nineteenth centuries like Moivre, Fourier, Lagrange,
Chrystal, Bayes, Todhunter, Gauss, Morgan, Lexis and Charlier, to mention
only a few names, contributed to the subject of probability.
Nineteenth and Twentieth Centuries
On these foundations laid by the mathematicians of the eighteenth
and nineteenth centuries modern theory of statistics' was gradually built
up. G. F. Knapp (1842-1926) and W. Lexis (1837-1914) contributed
valuable works on the statistics of mortality. Sir Francis Galton
(1822-1911) was the first to introduce statistical methods in the field of
biometry. Later on Karl Pearson took up this chain and his work on the
subject is too well known to need any detailed description. In the words
of Pearson himself, "the whole problem of evolution is a problem of
vital statistics, a problem of longevity, of fertility, of health, of disease
and it is as impossible for the evolutionist to proceed without statistics as it
would be for the Registrar General to discuss the National Mortality
without an enumeration of the population, a classification of deaths and
a knowledge of statistical theory."
It was in the second half of the last century and in the present
century that statistical methods entered the realm of the science of
economics and became intimately associated with the ancient subject of
mathematics. Though the relationship of statistics and mathematics is
very old yet it is only during the last 100 years or so that the two sciences
have come very close to each other. In recent years the domain of
statistical methods has considerably widened and today there is hardly
any science which does not make use of statistical methods. The science
of statistics is now associated with all other sciences in some form or
the other and we shall now study the relationship of statistics with other
sciences, particularly with economics and mathematics. For the past
two decades particularly there has been a remarkable and sustained
growth in the use of statistics. This is because business, government
and science, three fields in which applications of statistics are most
numerous and diverse, are growing in volume and complexity. It is
also because of the technological revolution which has taken place in
data handling, affecting especially computing and tabulating equipment,
and a scientific revolution in statistical theories and techniques.
RELATIONSHIP OF STATISTICS WITH OTHER SCIENCES
Statistics and Economics
Though the relationship of statistics with economics dates back to
1690 when Sir William Petty published his book named "Political
Arithmetic" yet the relationship of these two sciences became intimate rather
late. No doubt statistical data about economic problems used to be
collected in the past but there was no relationship between statistics and
economic theory. In earlier stages of development the science of
economics was based on deduction and the predominance of the deductive
approach was responsible for the lack of interest of economists in
quantitative data for purposes of the development of economic
doctrines. Besides this, there was also a tendency in those days to avoid
figures which were considered to be lifeless, rude and coarse. What
was responsible for this peculiar disposition to figures in those days is
difficult to state. It is a fact that people wanted to avoid rude shocks
which awaited them in the world of facts and always wanted to be vague
in their statements and logic. Gradually this hatred for figures melted
away and even deductive writers like J. S. Mill admitted that "in some
cases instead of deducing our conclusions from reasoning and verifying
them from observations we begin by obtaining them provisionally from
specific experience and afterwards connect them with the principles of
human nature by a priori reasoning." Similarly in 1871 W. S. Jevons
wrote that "the deductive science of economy must be verified and
rendered useful from the purely inductive science of statistics. Theory
must be invested with the reality of life and fact. Political economy
might gradually be erected into the exact science, if only commercial
statistics were far more complete and accurate than they are at present
so that the formulae could be endowed with the exact meaning by the
aid of numerical data." Jevons developed the technique of analysis of
time-series and was the pioneer in the field of price studies and index
numbers. Rightly he has been called the 'Father of Index Numbers'.
Besides Jevons the Historical School (1843-1883) also brought statistics
and economics close to each other. In fact Roscher, Knies and
Hildebrand were all of the opinion that economic doctrines should not be
argued in the abstract and that they should be inductively verified. The
effect of the preachings of the Historical School was indeed very great and
the science of economics no longer remained merely deductive in its
approach. By the time the present century began, much of the opposition
to the use of statistical methods in the realm of economics had
ended, and in 1907 Alfred Marshall could write, "Disputes as to methods
have ceased. Qualitative analysis has done the greater part of its work ...
that is to say, there is general agreement as to the characteristic and
duration of the changes which various economic forces tend to produce.
Much less progress has been made towards the quantitative determination
of the relative strength of different economic forces ... that higher
and more difficult tasks must wait upon the slow growth of thorough
realistic statistics." At the same time Pareto wrote, "The progress of
political economy in the future will depend in great part upon the
investigations of empirical laws derived from statistics which will then be
compared with known theoretical laws or will suggest derivation from
them of new laws." Later on Lord Keynes, writing about the functions of
statistics, wrote that it is "first, to suggest empirical laws, which may or may
not be capable of subsequent deductive explanations; and secondly, to
supplement deductive reasoning by checking its results and submitting
them to the test of experience." Now there are no two opinions about
the fact that both induction and deduction are necessary for the growth
and development of economic science. In fact statistics and economics
are so intermixed with each other now that the question of their separation
does not arise.
Factors responsible for closer ties between economics and statistics. Since
1890 two factors have worked together to bring about this great change in
the relationship of statistics and economics. The first is the development
of statistical methods: of probability and sampling, simple and
partial correlation and association, periodicity and index numbers, etc.;
the second is the enlargement of statistical material in recent years. In
fact during this period various eminent statisticians like C. B. Davenport,
A. L. Bowley, W. Pearson, W. I. King and R. A. Fisher have made very
valuable contributions towards the development of the science of statistics.
During this period statistical data have also increased in
quantum all over the world on account of the establishment of statistical
bureaus in various countries. The improvement of statistical methods
and the expansion of statistical data have thus brought economics and
statistics very close to each other and have marked the real inception of
statistics in the domain of the science of economics.
Statistics, economics and mathematics
It has already been mentioned above that statistics and mathematics
have been closely in touch with each other ever since the seventeenth
century, when the theory of probability was found to have a bearing on
various statistical methods. During the last 100 years or so not only
have statistics and mathematics come very close to each other due to the
development of mathematical statistics, but these sciences have been
joined by economics as well, and now there is a happy union between
statistics, economics and mathematics. Mathematics has considerably
study of the significance of these deviations has also to be made for various
purposes. All this cannot be done without the use of statistical methods.
We thus find that the science of statistics helps meteorology in a large
number of ways.
The above account of the origin and growth of statistics clearly
reveals the fact that the great science of statistics is associated with all
the other important sciences, both physical as well as social. In fact
today the domain of statistics is very wide; it is almost universal and
it is difficult to imagine any science worth the name where statistics has
not proved its usefulness in some form or the other. Bowley was right
when he said, "A knowledge of statistics is like a knowledge of foreign
language or of algebra; it may prove of use at any time under any
circumstances."
Causes of the recent growth of statistics. The tremendous growth in
the use of statistics, as has been shown above, can be attributed mainly to
two factors, viz., increased demand for statistics and decreasing cost of
statistics.
(i) Increased demand. There has been a phenomenal increase in the
demand for statistics in various fields. Statistics are most commonly
used by businessmen, government and scientists. The spheres of the
activities of all these three categories have increased extraordinarily in
modern times. The magnitude of business has considerably increased
resulting in an increased demand for statistics. Business in modern
times has become a very complicated affair and this fact has further
augmented the demand for statistical data. The complexity in business is
on account of numerous government regulations, labour disputes,
ever-increasing taxes and the technological revolution which the business world
has witnessed in recent years.
Even more than business activities, the activities of the government
have increased both in size as well as in complexity. Modern states are
welfare states and they have to look after a large variety of things result-
ing in an increased demand of statistics.
Probably the most spectacular development of modern world is the
growth of scientific research. Science today is a very complex pheno-
menon and different types of researches in the field of science are of an
extremely complex nature and they make an extensive use of statistical
data. We thus find that the demand for statistics has considerably
increased and this is one reason why the science of statistics is developing
so fast.
(ii) Decreasing cost. Another reason why the science of statistics
has developed so fast and has become so popular is that on account of
a number of reasons the cost and the time required for the collection and
analysis of data have gone down. There has been a vast improvement in
the technique of processing the data which has resulted in great economy
of both time and cost. Modern computing and tabulating machines
not only save time but money also. The development of electronic
calculators and other modern machines like desk-calculators and card
sorting machines etc., have made the task of scientists, businessmen and
administrators very easy and simple. They have resulted in a very great
economy both in terms of money as well as the time needed to do a job.
Statistical theory has also developed in modern times in such a
manner that the cost of compilation of statistical data has gone down
considerably. The theory of sampling, various designs of experiments
and statistical quality control have all contributed towards lowering
the cost of collection and analysis of statistical data.
Questions
1. Write a short essay on the origin and growth of the science of statistics and
throw light on its future.
2. Explain the relationship between economics and statistics. How far has
the use of statistical methods in economics led to its development?
(B. Com. Lucknow, 1941)
3. "Statistics are the straw out of which every other economist has to
make the bricks." (Marshall).
Explain, in the light of the above observation, the relation between economics
and statistics and discuss how far it is correct to say that the science of economics is
becoming statistical in its method. (M. Com. Allahabad, 1944).
4. Trace the association of mathematics with the science of statistics and show
how the former has considerably helped the development of the latter.
5. Discuss the relationship between statistics and various social sciences.
6. Do you think that statistical methods are of any help in physical sciences?
If so, how?
7. Write a brief essay on the relationship of economics, statistics and mathematics.
8. Show how the science of statistics which was originally the science of statecraft
has now become the science of universal application. Do you think that statistical
methods are in reality applicable to all types of sciences?
9. How far has the growth of statistics coincided with the development of
physical and social sciences?
10. "Statistics is an apparatus by the help of which the validity of the laws of
physical and social sciences can be tested". Comment.
11. Discuss the factors responsible for the quick development of statistics in
recent years.
Importance, Limitations and
Functions of Statistics 3
Statistics and the common man. The fact that in the modern world
statistical methods are of universal applicability is in itself enough to
show how important the science of statistics is. As a matter of fact
there are millions of people all over the world who have not heard a
word about statistics and yet who make a profuse use of statistical methods
in their day-to-day decisions. Statistical methods are common
ways of thinking and hence are used by all types of persons. When
a person wishes to purchase a car or a radio and goes through the
price lists of various companies and makers to arrive at a decision, what
he really aims at is to have an idea about the average level and the range
within which the prices vary, though he may not know a word about
these terms. When a farmer wishes to have a particular quantity of
rain in a particular season so that he may have a good crop, he has in fact
an idea of the correlation that exists between rainfall and crop yields and
the regression line of crop yields on rainfall. Again when we use a
common proverb "as you sow, so you reap" we indirectly hint that there
is a positive correlation between one's actions and achievements.
Examples can be multiplied to show that human behaviour and
statistical methods have much in common. In fact statistical methods
are so closely connected with human actions and behaviour that practically
all human activity can be explained by statistical methods. This shows
how important and universal statistics is.
CAUSES OF THE IMPORTANCE OF STATISTICS
Simplifies complexity. One reason why statistics is so important
today is that it simplifies complexity. The human mind is not capable of
assimilating huge masses of facts and figures, and statistical methods, by making
these data easily intelligible and readily understandable, render a great
service, because in their absence the information would not have been
of any use. Statistical methods describe a phenomenon in a very simple
fashion. If, for example, we have to study the economic system of
Soviet Russia we cannot properly understand it by a purely descriptive
method in which no statistics are used, but if the different aspects of the
economic system are numerically expressed we can understand the whole
thing in a short time and in a better manner.
Measures results. Similarly if we have to measure the results of
a particular policy it can best be done by statistical methods. If we have
to study, for example, the effect of a rise in the bank rate on the industries
of a country we can do so in a proper manner only by means of a statistical
They tell us about the volume of business done in a country and the
amount of money in circulation. Distribution statistics disclose the
economic conditions of the various classes of people. They throw light
on the distribution of national dividend amongst the inhabitants of a
country. We thus find that in all types of economic problems statistical
approach is essential and statistical analysis useful. Mathematics and
its offsprings, statistics and accounting, are the powerful instruments
which the modern economist has at his disposal, and of which business,
through the development of research agencies and methods, is making
constantly greater use.
Need in planning. The modern age is an age of planning. The days
of laissez faire are gone and state intervention in practically all aspects
of life has become universal in character. Today we live in a period of
transition; economic activities are being more and more closely directed
to the production of such goods, and the provision of such services, as
the government may decide to be most urgently required. Our future
is very largely being planned, and this planning, to be successful, must be
soundly based on the correct analysis of complex statistical data. Whenever
we think of a plan we have to think of statistics. Planning cannot
be imagined without statistics. If we study the economic plans implemented
in various countries in recent times we will find that all of them
are a statistical study of the economic resources of the respective countries,
and they suggest possible ways and means of utilising these resources
in the best possible manner. Various plans that have been prepared
for the economic development of India have also made use of the statistical
material available about various economic problems. The fact that
in our country the amount of statistical material available to the planners
has been very scanty is responsible for many drawbacks and inaccuracies
in different plans. Not only are plans of economic development constructed
on the basis of statistical data but the success that a plan achieves is
also measured best by the use of statistical apparatus. We thus find that
in the field of economic planning the use of statistics is indispensable.
Usefulness in commerce. Statistics are an aid to business and commerce.
In fact today the situation is that a businessman succeeds or fails
according as his forecasts prove to be accurate or otherwise. When a
man enters business he enters the profession of forecasting, because success
in business is always the result of precision in forecasting and failure in
business is very often due to wrong expectations, which arise in turn due
to faulty reasoning and inaccurate analysis of various causes affecting a
particular phenomenon. Modern devices have made business fore-
casting more definite and precise. Economic barometers are the gifts
of statistical methods and businessmen all over the world make extensive
use of them. A producer estimates the probable demand for his goods, analyses
the effects of trade cycles and seasonal variations as also of changes
in habits and customs of people on the demand of his wares, and after
taking all these factors into consideration finally takes decision about the
quantum of production. A businessman who ignores the effects of booms
and depressions can never succeed and is bound to face frustrations as his
was generally believed that the Indian people wanted to resume fighting
again. A poll of public opinion carried out by a leading newspaper
revealed the following result:
                                          Yes    No    No opinion
Are you in favour of another round
of fighting with Pakistan?                 65    25        10
Uses in War
(i) Active lead by Officers. A statistical analysis of the Indian and
Pakistani casualties during the Indo-Pakistani War of September 1965
revealed that the proportion of officers among those killed was higher
on the Indian side. This showed that the Indian armies were actually
led by their officers and this was one of the important factors responsible
for Indian victory. This factor will assume importance in the formula-
tion of future war strategy.
(ii) Training in the Use of War Equipment. The heavy reverses
suffered by Pakistan during the above War, despite its vastly
superior Air Force and Armoured Corps, came as a great surprise to the
whole world. Statistical analysis of its causes revealed that a high
and positive correlation existed between the period and intensity of
training in the use of aeroplanes and tanks and their effective use in war.
A further investigation into the period and intensity of training provided
in both the countries revealed that the Pakistani failure to make effective
use of its fighters, bombers and tanks was due to inadequate and inferior
training of its personnel.
(iii) Inspection of purchases. During war, military requirements
of goods and commodities increase tremendously. Complete inspection
of each and every item involves huge expenditure and the time of a
large number of personnel, and it also cannot be done expeditiously.
Here statistics come to the help of the army. The use of the sampling
inspection method not only helps in quick disposal but also gives accurate
results. Under this method, only a few items, say 2 per cent, are selected
on a random sample basis and thoroughly inspected. This method is both
cheaper and more expeditious. It also ensures accuracy as it is easier to
inspect more closely a few rather than a large number of items.
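The sampling inspection method described above may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lot of 10,000 items, its 3 per cent defect rate and the function name inspect_sample are all hypothetical; only the 2 per cent sampling fraction is taken from the text.

```python
import random

def inspect_sample(lot, fraction=0.02, seed=0):
    """Draw a simple random sample from the lot and inspect only those items.

    `lot` is a list of booleans (True = defective). Returns the number of
    items inspected and the share of them found defective.
    """
    rng = random.Random(seed)               # fixed seed so the sketch is repeatable
    n = max(1, round(len(lot) * fraction))  # e.g. 2 per cent of the lot
    sample = rng.sample(lot, n)             # every item has an equal chance
    defective = sum(sample)                 # True counts as 1
    return n, defective / n

# Hypothetical lot: 10,000 items of which 300 (3 per cent) are defective.
lot = [i < 300 for i in range(10_000)]
inspected, defect_rate = inspect_sample(lot)
print(inspected, defect_rate)
```

Only 200 of the 10,000 items are examined, yet the defect rate found in the sample serves as an estimate of the defect rate of the whole lot.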
LIMITATIONS OF THE SCIENCE OF STATISTICS
and yet many persons in the group might have become poorer than what
they were before. Statistical methods ignore such individual cases.
Shifting of Definition
(i) Monthly and Hourly Wage Rates. A firm had introduced productivity
methods with the result that productivity had increased. Since
the demand for its product was inelastic and labour laws did not permit
retrenchment, it decided upon reducing the working hours. As a
result, the monthly rates of wages could be increased only marginally.
A dispute arose between labour and management. The contention of
labour was that despite a significant increase in productivity, wages
had increased only marginally; and in support of its argument, it
produced monthly wage statistics. The management's argument was
just the opposite. It maintained that the increase in wages had been
commensurate with the increase in productivity; and in support of its
contention, it produced average hourly wage statistics. Both
labour and management were right. It depends on the definition of
wages which is adopted. Labour's definition will be considered
more appropriate when wages are viewed as income of workers; and that
of management will be more appropriate when wages are viewed as cost
of production.
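A small worked example may make the point concrete. The figures below are purely hypothetical (the text gives none): working hours are assumed to fall from 200 to 160 a month while the monthly wage rises from 1,000 to 1,040.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old

# Hypothetical figures, not taken from the text.
hours_before, hours_after = 200, 160          # working hours per month
monthly_before, monthly_after = 1000.0, 1040.0

hourly_before = monthly_before / hours_before  # wage per hour before
hourly_after = monthly_after / hours_after     # wage per hour after

monthly_rise = pct_change(monthly_before, monthly_after)
hourly_rise = pct_change(hourly_before, hourly_after)
print(f"monthly wage rose by {monthly_rise:.1f} per cent")
print(f"hourly wage rose by {hourly_rise:.1f} per cent")
```

On these assumed figures the monthly wage rises by 4 per cent while the hourly wage rises by 30 per cent, so the same facts support both contentions, depending on which definition of wages is adopted.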
(ii) Employment of Women. The census of 1961 showed that the
percentage of working women in India had increased from 23.30 in 1951
to 27.96 in 1961. It might be concluded from this that the female labour
participation ratio increased significantly during the decade. But as a
matter of fact, a major part of this increase was due to the inclusion of
unpaid family workers and housewives under the nomenclature 'workers'
Inappropriate comparison
(i) Deaths in Hospitals. The statement that 'the incidence of death
among sick persons is higher in hospitals than at home' is likely to lead
to the conclusion that more patients die in hospitals than at home due
to lack of proper treatment and care. But this conclusion turns out to be
completely erroneous if it is borne in mind that in India only seriously
ailing persons are hospitalised.
(ii) It was claimed by a teacher that his teaching method was
superior to that of others. He supported his argument by showing that all
the students in his class secured first class. Investigation into the
matter revealed that, unlike others, all his students had secured first class
in the previous examination and were merit holders. His success was,
therefore, due to better material in his class rather than to the superiority of
his teaching method.
DISTRUST OF STATISTICS
and should work like a true researcher without any preconceived notions
or conclusions about the problem under investigation. It should not be
forgotten, as W. I. King said, that "statistics is a most useful servant but
only of great value to those who understand its proper use."
Questions
1. Discuss the preliminary steps which should be taken before commencing
the work of 'collection' of data.
2. Why is it necessary to determine the object and scope of the enquiry before
planning an investigation?
3. What is a statistical unit? Is it necessary that the data be homogeneous?
(B. Com. Agra, 1939).
4. What steps would you take to organise an economic survey of a typical
Indian village?
5. Describe the various stages in conducting a primary economic investigation.
What precautions will you take at each stage? (M. A. East Punjab, 1950).
6. What is meant by (a) units of collection, and (b) units of analysis? Explain
their respective uses.
7. Differentiate between simple and composite units. Give examples of each.
8. Write a note on the purpose and utility of planning a statistical investigation.
9. What is meant by degree of accuracy? How should it be determined?
10. Distinguish between primary and secondary data. Illustrate your answer
with examples.
Collection of Primary and
Secondary Data 5
Primary and secondary data. After the preliminaries discussed in
the last chapter have been gone through, the task of the collection of
data begins. Statistical data, as we have already seen, can be either
primary or secondary. Primary data are those which are collected for
the first time and are thus original in character, whereas secondary
data are those which have already been collected by some other persons
and which have passed through the statistical machine at least
once. Primary data are in the shape of raw materials to which statistical
methods are applied for the purpose of analysis and interpretation.
Secondary data are usually in the shape of finished products since
they have been treated statistically in some form or the other. After
statistical treatment the primary data lose their original shape and become
secondary data. On a closer examination it will be found that the dis-
tinction between primary data and secondary data in many cases is one
of degree only. Data which are secondary in the hands of one may be
primary for others. Statistics of agricultural production are secondary
data for the Agriculture Department of a Government, but for the pur-
pose of calculation of national income these data are primary, because
they will have to go through further analysis and their shape will not
remain the same.
Factors affecting choice of method. It is obvious that the methods
of the collection of primary data and secondary data would not be exactly
identical because in one case the data have to be originally collected
while in the other the work is of the nature of compilation. There are
various methods of the collection of primary and secondary data and the
choice of the method depends on a number of factors. Nature, object
and scope of the enquiry are the most important things on which the
selection of the method depends. The method selected should be
such that it suits the type of enquiry that is being conducted.
Availability of finance is another factor which influences the selec-
tion of the method of collection of data. When financial resources at
the disposal of the investigator are scanty he shall have to leave aside
expensive methods even though they are better than others which are
comparatively cheap.
Availability of time has also to be taken into account. Some methods
involve a long duration of enquiry while with others the enquiry can be
conducted in a comparatively shorter duration. The time at the disposal
of the investigator thus affects the selection of the technique by which
data are to be collected.
By local reports
The last method of collection of primary data is through local
reports. In this method data are not formally collected by enumerators
but by the local correspondents or agents in their own fashion and to
their own liking. Obviously such data cannot be very reliable and
as such this method is used in those cases where the purpose of investigation
can be served with rough estimates only and where a high degree
of precision is not necessary. This method has the advantage of being
least expensive and it also saves the botheration usually associated with
statistical investigations of other types.
REPRESENTATIVE DATA
As has been pointed out previously, a statistical investigation can
be either of census type or of sample type. In a census enquiry all the
units associated with a particular problem are taken into account, whereas
in a sample enquiry only a few selected units are studied and on the
basis of such studies attempts are made to draw generalisations which
may be applicable to the whole data. If, for example, we have to find
out the average monthly expenditure of the 2000 students residing in the
hostels of the Allahabad University and if we hold a census investigation
we shall have to study the monthly expenditure of each one of these 2000
students. If, however, we hold a sample investigation we shall select, say,
200 students out of these 2000 and then study their expenditure. On the
basis of the study of these 200 units (technically called a "sample") we
can draw conclusions which will hold good about the expenditure of all
the 2000 students (technically called a "universe" or "population").
The sample is considered to be representative of the universe, and if the
sample has been properly selected and if its size is adequate, whatever
holds good for the sample should also hold good, approximately, for the universe. If
the scope of the enquiry is very wide a census investigation would not
only be very expensive but highly cumbersome also. Moreover it will
take a very long time and require a large number of enumerators. In
such cases a sample investigation is very suitable. A sample usually
gives representative data and the generalisations made on the basis of
such data usually hold good for the universe.
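The hostel example above can be imitated with a short simulation. The sketch below is an illustration only: the expenditure figures are invented (drawn from a normal distribution with an assumed mean of 150 and standard deviation of 30), and only the sizes of the universe (2000) and of the sample (200) come from the text.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so that the sketch is repeatable

# Hypothetical universe: monthly expenditure of each of the 2000 students.
universe = [rng.gauss(150, 30) for _ in range(2000)]

# Census investigation: every unit of the universe is studied.
census_mean = statistics.mean(universe)

# Sample investigation: 200 students chosen at random, without replacement.
sample = rng.sample(universe, 200)
sample_mean = statistics.mean(sample)

print(f"census mean: {census_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
```

The two averages will not agree exactly, but with a random sample of this size the sample average should normally lie close to the census average, at one-tenth of the work of enumeration.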
The most important point, however, is the selection of the sample.
A sample study would give dependable conclusions only if the sample is
a true representative of the universe. Broadly speaking there are two
methods by which samples can be selected and they are:
(1) Deliberate or purposive sampling,
(2) Random or chance sampling.
Deliberate selection or purposive sampling
In deliberate selection or purposive sampling the investigator himself
chooses from the universe a few such units which according to his
estimates are best representatives of the population. His selection is
1 For a detailed study see chapters on Sampling.
any king is clearly 4/52 and the chance of its being any card of spade is
13/52. This clearly indicates that if the chances of selection of all the
units in a universe are equal, and if from it selections are made at random,
then the likelihood is that in the sample so selected the various types
of units would be in nearly the same proportion in which they are in the universe.
On this basis it is said that random sampling gives a representative sample
which contains the characteristics of the population. Further, as
has been pointed out earlier, the size of the sample and its accuracy are
also related. In ten tosses of a coin it is not unlikely that seven times it
falls heads and only three times tails. But if there are 100 tosses there is
a greater chance of the proportions of heads and tails being nearly equal. If the
number of tosses is 1000 the chance of a nearly equal distribution of heads and tails
is still greater.
The bigger the size of the sample the greater is the chance of accuracy.
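The effect of the number of tosses can be checked by exact calculation from the binomial distribution. The sketch below is illustrative: the function names and the band of 40 to 60 per cent heads are assumptions chosen for the example. It also brings out a point worth noting, namely that "equal" must be read as "nearly equal in proportion", since the chance of exactly equal numbers of heads and tails actually falls as the tosses increase.

```python
from math import comb

def prob_share_within(n, lo_pct, hi_pct):
    """Exact probability that heads form between lo_pct and hi_pct per cent
    (inclusive) of n tosses of a fair coin."""
    favourable = sum(
        comb(n, k) for k in range(n + 1)
        if lo_pct * n <= 100 * k <= hi_pct * n
    )
    return favourable / 2 ** n

def prob_exactly_half(n):
    """Exact probability of exactly n/2 heads in n tosses (n even)."""
    return comb(n, n // 2) / 2 ** n

for n in (10, 100, 1000):
    print(n, prob_share_within(n, 40, 60), prob_exactly_half(n))
```

The first probability rises towards certainty as n grows from 10 to 100 to 1000, while the second shrinks.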
Law of statistical regularity. Thus according to the rules of the
theory of probability, if from the universe a moderately large sized sample
is chosen at random, it is almost certain that on an average the sample so
chosen will have the same characteristics as the universe. It is on this
basis that games of chance are played successfully by a large number of persons
and the insurance companies are able to insure people against various
types of calamities. In statistics this law is known as the "Law of Statistical
Regularity". It is a corollary to the main theory of probability.
The theory of probability tells us of the mathematical expectation of the success or
failure of an event and on this basis the law of statistical regularity tells us that
random selection from the universe is very likely to give a representative sample.
Law of inertia of large numbers. We have mentioned above that
there is a relationship between the size of a sample and its accuracy.
The larger the sample the greater would be the accuracy. The reason
for this lies in the fact that in large numbers the chances of compensatory
action are greater. If in the first ten tosses of a coin there are seven heads
and three tails, it is quite likely that in the next ten tosses the situation
might be reversed and there may be seven tails and three heads. The
larger the number of such experiments the greater are the chances of
one irregularity compensating the other. It is said on this basis that
large numbers have got an inertia or that they are more constant. The
production of wheat in the district of Allahabad might show great variations
year after year but the production figures of the state of U. P. would
not vary much, because if in some districts the crop is above normal it is
very likely that in others it might be below normal. Similarly the production
figures of wheat for the whole of India would show still less
variation and the figures of world production would show hardly any
significant change. This phenomenon is characterised as the "Law of
Inertia of Large Numbers" which states that large numbers are relatively
more constant and stable than small ones. It is on the basis of this law
that we say that larger the size of the sample the greater would be its
accuracy.
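The compensating action described above can be illustrated by a small simulation. Everything in the sketch is hypothetical: 50 districts, 30 years, and yearly outputs drawn independently and uniformly between 50 and 150 units. The independence of the districts is itself an assumption; if all districts suffered the same drought together, the compensation would be weaker.

```python
import random
import statistics

def coefficient_of_variation(values):
    """Standard deviation expressed as a fraction of the mean."""
    return statistics.pstdev(values) / statistics.mean(values)

rng = random.Random(7)  # fixed seed so that the sketch is repeatable
years, districts = 30, 50

# production[y][d]: hypothetical output of district d in year y.
production = [
    [rng.uniform(50, 150) for _ in range(districts)]
    for _ in range(years)
]

one_district = [row[0] for row in production]   # a single district, year by year
state_total = [sum(row) for row in production]  # all districts added together

cv_district = coefficient_of_variation(one_district)
cv_state = coefficient_of_variation(state_total)
print(f"relative variation, one district: {cv_district:.3f}")
print(f"relative variation, whole state:  {cv_state:.3f}")
```

The relative variation of the state total comes out much smaller than that of a single district, because good years in some districts offset bad years in others.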
It should not be concluded from the above discussion that the law
of inertia of large numbers does not allow any change in figures with the
passage of time. All that it means is that large numbers are more constant
and stable than small ones. There are no violent fluctuations in large
numbers. After all the figures of world production of wheat do change
from time to time but these changes are not violent and sudden. They
are slow and gradual. Long-period trend is indicated by large numbers:
they simply ignore the short-period regular and irregular fluctuations.
COLLECTION OF SECONDARY DATA
Sources of secondary data
We know that secondary data are those which have already been
collected and analysed by someone else, and as such the problems
associated with the original collection of data do not arise here. Secondary
data may be either published or unpublished. The sources of published data
are usually : -
(a) Official publications of the central, state and the local
governments.
(b) Official publications of the foreign governments or international
bodies like the United Nations Organization and its
subsidiary bodies.
(c) Reports and publications of trade associations, chambers of
commerce, banks, co-operative societies, stock exchanges, and
trade unions, etc.
(d) Technical trade journals like the Economica, Indian
Journal of Economics, Commerce, Capital, etc., and books
and newspapers.
(e) Reports submitted by economists, research scholars, university
bureaus and various other educational associations, etc.
The sources of unpublished data are varied, and such materials may
be found with scholars and research workers, trade associations,
chambers of commerce, labour bureaus, etc. Many enquiries of a private
nature are conducted by these bodies and these findings are not
published and are usually meant for the consumption of their members only.
22. What is 'sampling' and what are its uses? Explain how you would design
a sample survey to estimate an average size of holding in a locality.
(M. A. Agra, 1947).
23. "It is never safe to take published statistics at their face value without
knowing their meanings and limitations and it is always necessary to criticise the
arguments that can be based on them." (Bowley). Elucidate. (B. Com. Allahabad, 1946).
24. Why is it necessary to scrutinize and edit secondary data before its use?
What precautions would you take before using such statistics?
25. Write short notes on :
(a) Theory of Probability.
(b) Law of Statistical Regularity.
(c) Law of Inertia of Large Numbers.
26. "In any sample survey there are many sources of errors. A perfect survey
is a myth". Discuss the statement.
27. Suppose you want to study the changes in the extent of indebtedness of
middle-class people of Allahabad for the next five years. How would you proceed
to do it? Explain all the processes. (B. Com. Banaras, 1955).
28. Describe the procedure you would adopt in order to obtain the necessary
information for introducing compulsory primary education in a big city.
(B. Com. Banaras, 1952).
29. "Statistics, especially other people's statistics, are full of pitfalls for the user".
(Conner) Do you agree with this statement?
30. "Samples are devices for learning about large masses by observing a few
individuals". (Snedecor).
Elucidate the above statement.
31. How would you conduct an enquiry about 'Payment of Wages' in an
industry? On what points would it be necessary for you to be clear before actually
beginning investigation work? (M. Com. Agra, 1957).
32. How would you organise a marketing survey of the fruit trade in a particular
region with a view to making suggestions for its development? Explain the
procedure you would follow step by step. (M. Com. Agra, 1956).
Accuracy, Approximation
And Errors 6
Editing of data. After collection of data the next step in a statistical
investigation is the scrutiny of the collected figures. This is technically
called editing of data. It is a necessary step as in most cases the collected
data contain various types of mistakes and errors. It is quite likely
that some question has been misunderstood by the informants, and if it
is so, this part of the data has to be collected afresh, or it may be that
answers to a particular question are, in general, vague, and it is difficult
to draw inferences from them, or some of the schedules and questionnaires
are so haphazardly filled that it is necessary to reject them. It is
also likely that some of the investigators were biased and the answers
filled by them or the data collected by them show unmistakable signs of
their prejudices. In all such cases the collected data have to be edited
and modified. However, it should be clearly understood that undue
tampering of data should never be done. If only a few schedules are
defective they can be omitted but this too should be done very carefully.
In some cases the omission of a few schedules would not affect the general
conclusions, while in others this may entirely change the complexion
of the problem under study. As has been pointed out earlier, absolute
accuracy is neither possible nor essential but decision about the extent
to which inaccuracies, approximations and errors can be allowed is a
very important step in statistical analysis and we shall study these things
in the following pages.
ACCURACY
coin may fall heads in 3 tosses out of 4 but in 3000 tosses the number of
heads and tails are bound to be more or less equal. There is a general
tendency everywhere to give ages in round figures. It is another
example of unbiased error. If some people have, in this process,
over-estimated their ages, others might have under-estimated them. A person
of 29 years of age may call himself of 30 but it is also likely that a person
of 31 years may call himself of 30, and in such a case the errors cancel
each other.
The following table will illustrate the characteristics of the biased
and unbiased errors : -
TABLE I
Biased and unbiased errors
Exact number   Correct to nearest   Absolute   Correct to next     Absolute
               1000 (unbiased),     error      1000 and over       error
               in thousands                    (biased), in
                                               thousands
50,241         50                   +241       51                  -759
60,507         61                   -493       61                  -493
49,361         49                   +361       50                  -639
61,427         61                   +427       62                  -573
53,764         54                   -236       54                  -236
48,090         48                   + 90       49                  -910
50,460         50                   +460       51                  -540
96,670         97                   -330       97                  -330
60,250         60                   +250       61                  -750
Total 5,30,770 530                  +770       536                 -5,230
When figures are estimated correct to the nearest thousand the
error is an unbiased one. The unbiased absolute error in the above
figures, as shown in column 3, is only 770 and the relative error is
770/5,30,000 = 0.001453. The errors are negligible.
When figures are estimated correct to the next one thousand and
over, the error is a biased one. The biased absolute error in the
above case is -5,230 as shown in column 5 and the relative error is
5,230/5,36,000 = 0.00975. These errors are comparatively much greater than
in the previous case and cannot be safely ignored.
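The totals and errors of Table I can be checked with a short Python sketch. It assumes, as the signs in the table suggest, that the absolute error is the actual figure minus the estimate and that the relative error is the size of the absolute error divided by the estimated total; the function names are illustrative only.

```python
import math

# Exact figures taken from Table I.
exact = [50241, 60507, 49361, 61427, 53764, 48090, 50460, 96670, 60250]

def to_nearest_thousand(x):
    """Approximate to the nearest 1000 (the unbiased method)."""
    return (x + 500) // 1000 * 1000

def to_next_thousand(x):
    """Always approximate upwards to the next 1000 (the biased method)."""
    return math.ceil(x / 1000) * 1000

def error_summary(values, approximate):
    """Return (estimated total, absolute error, relative error)."""
    estimates = [approximate(v) for v in values]
    absolute_error = sum(values) - sum(estimates)  # actual minus estimate
    relative_error = abs(absolute_error) / sum(estimates)
    return sum(estimates), absolute_error, relative_error

unbiased = error_summary(exact, to_nearest_thousand)
biased = error_summary(exact, to_next_thousand)
print("nearest 1000:", unbiased)
print("next 1000   :", biased)
```

The unbiased errors largely cancel in the total, while the biased errors, all of one sign, accumulate.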
Errors in multiplication, division, etc. However, it should be
remembered that neither are unbiased errors always compensating
nor biased errors always cumulative. Where items have to be added
together biased errors would no doubt be cumulative and unbiased
ones compensating; but where items have to be subtracted the situation
is just the reverse, and biased errors would be smaller in size than the
unbiased ones. If two items are multiplied together unbiased errors
60 FUNDAMENTALS OF STATISTICS
would give a better estimate than the biased ones. But if the items
are divided and the algebraic signs of the two figures are the same
(as is the case in biased errors) the result would be quite close to the
true value; and if the signs are opposite (as is the case in unbiased errors)
the results would be away from the true value. In other words, ordinarily,
unbiased errors are compensating only when items have to be added
or multiplied but when the items have to be subtracted or divided
biased errors would give results closer to the true value than the results
given by unbiased errors.
These points can be illustrated as follows : -
      True value   Estimated value with   Estimated value with
                   biased error           unbiased error
(a)   100          99                     99
(b)   200          197                    202
(i) Biased error in (a) = 1 and unbiased error = 1
(ii) Biased error in (b) = 3 and unbiased error = -2
(iii) Biased error of (a+b) or 300 = (300-296) or 4 and
unbiased error = (300-301) = -1
(iv) Biased error for (b-a) or 100 = (100-98) = 2 and
unbiased error = (100-103) = -3
(v) Biased error for (a×b) or 20,000 = (20,000-19,503) = 497 and
the unbiased error = (20,000-19,998) = 2
(vi) Biased error for (b÷a) or (200÷100) or 2 = (2 - 197/99) = 0.01
and the unbiased error = (2 - 202/99) = -0.04
Thus it is clear that in addition and multiplication the biased
errors are greater than the unbiased ones whereas in subtraction and
division the position is reversed and the unbiased errors are greater than
the biased ones.
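The six results of the illustration can be reproduced with the following Python sketch, which takes the error in each case as the true value minus the estimated value. The variable and function names are assumptions made for the illustration.

```python
from fractions import Fraction

true_a, true_b = 100, 200
biased_a, biased_b = 99, 197      # both figures err in the same direction
unbiased_a, unbiased_b = 99, 202  # the two figures err in opposite directions

def errors(a, b):
    """Error (true minus estimated) for the sum, difference, product and quotient."""
    return {
        "a+b": (true_a + true_b) - (a + b),
        "b-a": (true_b - true_a) - (b - a),
        "a*b": (true_a * true_b) - (a * b),
        "b/a": Fraction(true_b, true_a) - Fraction(b, a),
    }

biased = errors(biased_a, biased_b)
unbiased = errors(unbiased_a, unbiased_b)
for operation in biased:
    print(operation, round(float(biased[operation]), 2),
          round(float(unbiased[operation]), 2))
```

Comparing the sizes of the two columns shows the reversal described above: the biased error is the larger one for the sum and the product, and the smaller one for the difference and the quotient.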
Estimation of errors
In most of the statistical investigations in actual practice the exact
figures or the true values are not known. In such cases we cannot
measure the absolute or the relative error. But it is possible to estimate
them.
Estimation of unbiased errors. Unbiased errors can be estimated
without much difficulty in most of the cases. In the illustration in
Table I if the actual figures were not known, all we could say was
that the total of the figures (correct to nearest 1000) was 5,30,000.
If the absolute error in the above figures is to be estimated then in
each of the nine items it can range between 0 and 499. It will be
zero if the actual number was in exact thousands, and in such a case
the actual and the approximated figures would be the same. The
maximum error in any figure can be 499 because the approximated
figure will be discarding all numbers less than 500 and adding all
Questions
1. Write a note on the editing of primary and secondary data for the purposes of
analysis and interpretation.
2. The statistician who desires to safeguard his analysis and results from
imperfections entering at the very start should rest his choice among sources upon a test
of reliability rather than upon accessibility and convenience.
Expand this statement so as to bring out clearly the way in which sources should
be used. (M. Com. Lucknow, 1943).
3. Discuss the standard of accuracy required in statistical calculations. To what
extent should approximations be used? (M. A. Agra, 1949).
4. What precautions should be taken in the use of published statistics?
(B. Com. Agra, 1949).
5. Mention the advantages of approximation of Statistics. What degree of
accuracy is generally required in each statistical investigation?
(M. Com. Rajputana, 1951).
6. What are the different ways of approximating figures? Discuss the merits
of each.
7. To what extent can figures be safely approximated in statistical analysis?
How should such figures be written?
8. (a) Discuss the sources of errors in statistics and their effects.
(b) State the important methods of approximation and their utility in
statistics. (B. Com. Agra, 1940).
9. In what way does a statistical error differ from a mistake? What classes of
errors are there and how may they be measured? (B. Com., Allahabad, 1943).
10. Discuss the various types of errors likely to creep into statistical investigations
and suggest how to avoid or correct them. (B. Com. Agra, 1949).
11. Of the biased errors the statistician should have none : but of the unbiased
ones the more the merrier, notwithstanding that they are also errors. Elucidate.
12. In framing statistical estimates we are not so definite as the Modern Traveller
who:
"........ knew the weather to a T,
Longitude to a degree,
The Latitude exactly."
Explain the bearing of the above on the degree of accuracy desired in statistical
estimates as distinguished from the estimates of the more exact sciences.
(M. A. Punjab, 1952).
13. Show how biased errors are generally cumulative and unbiased ones
compensating. Are there any exceptions to this general rule?
14. Discuss the various methods of estimating biased and unbiased errors both
absolutely and relatively.
15. Distinguish between
(a) Absolute and relative errors and
(b) Biased and unbiased errors.
Discuss the effects of these errors and explain the steps that are taken to meet the
effects. (B. Com. Agra, 1938).
Classification, Seriation and
Tabulation
7
CLASSIFICATION
Need and meaning
The data which are collected or compiled in accordance with the
rules and methods discussed in the preceding chapter are usually very
voluminous and large in quantity. As such they are not directly fit for
analysis or interpretation. If, for example, the figures of the expenses
of 2,000 students residing in Allahabad University hostels are before
us, as collected, it would not be possible to draw any inferences from
them because for purposes of comparison, analysis and interpretation
it is essential that the data are in a condensed form. Further, it is
also essential that the likes must be separated from the unlikes. All
the 2,000 students, no doubt, are alike in the sense that all of them
belong to a particular university and live in hostels but they differ in
other respects. Some may be living in single-seated rooms and others
in double or treble-seated rooms; some may be living in costlier hostels
and others in comparatively cheaper ones; some may be having their
private messing arrangements while others may have joined the common
mess. Thus, even though the data collected relate to one set of persons
yet there may be many types of dissimilarities even within this group.
For the purpose of analysis and interpretation, data have to be divided
in homogeneous groups. In order to remove these defects of volume
and heterogeneity, statistical data are tabulated with a view to present
a condensed and homogeneous picture. But before the tabulation of
data, it is necessary to arrange them in homogeneous groups so that
there may be no difficulty in tabulation. The process of arranging data in
groups or classes according to resemblances and similarities is technically called
Classification. Thus, by classification we try to strike a note of homogeneity
in the heterogeneous elements of the collected information.
Classification gives expression to the similarities which may be found in the
diversity of individual units. In classification of data units having
a common characteristic are placed in one class and in this fashion the
whole data are divided into a number of classes. Even after classification
the statistical data are not fit for comparison and interpretation but this
is certainly the first step towards the tabulation of data. After
tabulation of data statistical analysis and interpretation are possible.
Classification is a preliminary to tabulation and it prepares the ground for
proper presentation of statistical facts.
Characteristics of an ideal classification
Despite the fact that classification is a very important preliminary
in a statistical analysis no hard and fast rules can be laid down for it.
TABLE 1
I        II                 III      IV
0-10     0 and under 10     0-9      5
10-20    10 and under 20    10-19    15
20-30    20 and under 30    20-29    25
In the first method, items whose values are just 10 or 20 can
be classified either in 0-10 group and 10-20 group respectively or in
10-20 and 20-30 classes respectively. Usually in such cases the item
is classified in the next higher class so that the item whose value is
exactly 10 would come in 10-20 group. In the second method, this
point is made clear. Items whose values are less than 10 would
be in the 0-10 class interval. This is the exclusive method of
classification. In exclusive method the items whose values are equal
to the upper limit of a class are grouped in the next higher class.
In other words, the upper limit of a class is excluded and items with
values less than the upper limit are taken into account. As against
this the third method is inclusive. In it the upper limit is also
included in the class interval. This method, in reality, is like the second
method as 0-9 means 0 and under 10. To emphasise this point sometimes
the class interval is written as 0-9.99. The fourth method indicates
only the mid-points.
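The difference between the exclusive and the inclusive methods can be sketched in Python as follows. The sketch assumes whole-number values and a class width of 10; the function names are hypothetical and serve only to label the two methods.

```python
def exclusive_class(value, width=10):
    """Class written as 'lower-upper', the upper limit being excluded."""
    lower = (value // width) * width
    return f"{lower}-{lower + width}"

def inclusive_class(value, width=10):
    """Class written as 'lower-upper' for whole numbers, the upper limit being included."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"

for value in (9, 10, 19, 20):
    print(value, exclusive_class(value), inclusive_class(value))
```

An item whose value is exactly 10 falls in the 10-20 class under the exclusive method and in the 10-19 class under the inclusive method, so both methods place it in the same group of items.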
Counting the number of items in each class. After deciding the number
of classes, their magnitude and class limits, the next thing to be done
is to count the number of items falling in each class. This can be done
in any of the following ways : -
(a) By tally sheets. Under this method, the class intervals are
written on a sheet of paper (called Tally Sheet) and for each item a
stroke is marked against the class interval in which it falls. Usually
after every four strokes in a class, the fifth item is indicated by drawing
a horizontal or diagonal line over or through the strokes. These groups
of five are easy to count. Data sorted in such a manner would give
the following type of tally sheet.
TABLE 2
Number of marks obtained by 80 students
(Tally Sheet)
Marks    Tally                                   Total
20-30    ||||/ ||||/ ||                          12
30-40    ||||/ ||||/ ||||/ |||                   18
40-50    ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ |   31
50-60    ||||/ ||||/                             10
60-70    ||||/ ||||                              9
Total                                            80
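A tally sheet of this kind can be produced mechanically. The Python sketch below starts from the class totals of Table 2 (the individual marks are not given in the text) and writes each bundle of five as four strokes crossed by a fifth, shown here as `||||/`; this notation and the function name are assumptions of the sketch.

```python
def tally_marks(count):
    """Show a count as bundles of five ('||||/') followed by the remaining strokes."""
    bundles, rest = divmod(count, 5)
    groups = ["||||/"] * bundles
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)

# Class totals as given in Table 2.
frequencies = {"20-30": 12, "30-40": 18, "40-50": 31, "50-60": 10, "60-70": 9}

for marks, count in frequencies.items():
    print(f"{marks:<8}{tally_marks(count):<40}{count:>3}")
print(f"{'Total':<48}{sum(frequencies.values()):>3}")
```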
TABLE 6
Marks obtained by 100 students in statistics (sex-wise)
Marks    Males    Females    Total
30-40    8        6          14
40-50    16       10         26
50-60    14       16         30
60-70    12       8          20
70-80    6        4          10
Total    56       44         100
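The row and column totals of a two-way table such as Table 6 should agree with each other, and this can be verified with a short Python sketch. The list names are chosen for the illustration only.

```python
marks = ["30-40", "40-50", "50-60", "60-70", "70-80"]
males = [8, 16, 14, 12, 6]
females = [6, 10, 16, 8, 4]

# Totals of rows (each class of marks) and of columns (each sex).
row_totals = [m + f for m, f in zip(males, females)]
column_totals = (sum(males), sum(females))
grand_total = sum(row_totals)

for label, m, f, total in zip(marks, males, females, row_totals):
    print(f"{label:<8}{m:>6}{f:>9}{total:>7}")
print(f"{'Total':<8}{column_totals[0]:>6}{column_totals[1]:>9}{grand_total:>7}")
```

The grand total reached by adding the rows must equal the one reached by adding the columns; a disagreement points to a mistake in tabulation.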
TABLE 8
Marks obtained by students (sex-wise, on the basis of civil
conditions and residence)
[Blank manifold table. The columns come under the heading Number of Students, beginning with Males. The rows are grouped by Residence, beginning with Hostellers, and each group is sub-divided by Marks (30-40, 40-50, 50-60, 60-70, 70-80) with a Total, followed by a Total section and a Grand Total.]
The above table gives information about a large number of inter-related
questions regarding students, namely, about the marks obtained,
sex-wise distribution, civil conditions and residence. Manifold tables
are very useful in presenting population census data.
Rules of tabulation
Having discussed the meaning, importance and types of tabulation,
it is necessary to lay down certain rules regarding construction of tables.
The following general rules should be observed in the construction
of tables : -
1. The table should be precise and easy to understand. It should
not be necessary to go through footnotes or explanation to properly
understand a table.
2. If the data are very large they should not be crowded in a
single table. This would increase the chances of mistakes and would
make the table unwieldy and inconvenient. Such data can be presented
in a number of tables. Each table should be complete in itself and
should serve a particular purpose.
3. The table should suit the size of the paper and, therefore,
the width of the columns should be decided beforehand.
4. There should be thick lines to separate the data under one
class, from the data under another class and the lines separating the
sub-divisions of classes should be comparatively thin.
5. The number of main headings should be few though there
is no harm if the number of sub-headings is large. This will help in
understanding the main points of the table.
6. Captions, headings or sub-headings of columns, and headings
and sub-headings of rows must be self-explanatory.
7. Those columns whose data are to be compared should be
kept side by side. Similarly percentages, totals and averages must also
be kept close to the data.
8. As far as possible figures should be approximated before
tabulation. This would reduce unnecessary details.
9. The units of measurement under each heading or sub-heading
must always be indicated.
10. Total of rows should be placed in the extreme right column,
though sometimes they are placed in the first column after the vertical
captions on the left. The totals of columns should ordinarily be placed
at the foot though in some cases it is helpful to place them at the top of
the table.
11. Items should be arranged either in alphabetical, chronological
or geographical order or according to size, importance, emphasis or
causal relationship to facilitate comparison.
12. If certain figures are to be emphasised they should be in
distinctive type or in a "box" or "circle" or between thick lines.
13. When percentages are given side by side with original figures
they should be in a separate type-preferably italics.
14. If some portion of collected data cannot be classified in any
class or division a miscellaneous class should be created and the data
shown in it.
15. There should be a proper title to each table. It should tell
what exactly the table presents.
Besides the rules mentioned above, the figures should be scruti-
nized before being entered in a table. Below a table, should be given
the method of collection, sources of data, general results obtained and
their limitations. The probable error should also be mentioned.
It should be remembered that there cannot be any rigidity about
these rules. Tables must suit the needs and requirements of an
investigation. Bowley has correctly said that "in collection and
tabulation common sense is the chief requisite and experience the chief
teacher."
Questions
1. What do you understand by classification, seriation and tabulation?
Discuss their importance in a statistical analysis.
2. "Classification is the process of arranging things (either actually or notionally)
in groups or classes according to their resemblances and affinities giving expression
to the unity of attributes that may subsist amongst a diversity of individuals."
Elucidate the above statement. (B. Com. Allahabad, 1947).
3. How would you proceed to classify the observations made and what points
will you take into consideration in tabulating them? Mention the kinds of tables
generally used. (B. Com. Agra, 1941).
4. What precautions would you take in tabulating your data?
(B. Com. Agra, 1933).
5. "In collection and tabulation common sense is the chief requisite and
experience the chief teacher." (Bowley).
What precautions in your opinion are necessary to avoid statistical errors in the
collection and computation of primary data? (M. A. Agra, 1940).
6. Discuss the main functions and importance of tabulation in a scheme of
investigation. Prepare blank tables to show distribution of students of a college
according to age, class and residence for arranging (a) Physical training and (b) Tutorial classes.
7. (a) Draw up a blank table with suitable headings, spacings, double lines,
etc., in which could be shown the number and tonnage of ships entered and cleared
at ports in India for 10 years distinguishing steam and sailing vessels and also those
with cargoes from those in ballast.
(b) What do you mean by "A Statistical Unit of Measurement"? Give a
suitable illustration. (B. Com. Hons. AiMDTII, 194%).
8. Draw up two independent blank tables giving rows, columns and totals in
each case summarizing the details about the members of a number of families
distinguishing males from females, earners from dependants and adults from children.
9. Draw up in detail, with proper attention to spacing, double lines, etc.,
and showing all sub-totals, a blank table in which could be entered the numbers
occupied in six industries on two dates, distinguishing males from females, and
among the latter single, married and widowed. (M. A. Alld., 1940).
10. Explain how you would tabulate statistics of death from principal diseases
by sexes, in two different provinces in India for a period of five years.
(M. Com. Calcutta, 19").
11. Prepare a table with a proper title, divisions and subdivisions to represent
the following heads of information : -
(a) Export of cotton piecegoods from India.
(b) To Burma, China, Java, Iran, Iraq.
(c) Amount of piecegoods to each country.
(d) Value of piecegoods to each country.
(e) From 1939-40 to 1945-46 year by year.
1" I,.I,.a.,
17. a" 19. 11,,14. 'I,.
i6.17. I,.
Ja, 18. ale 15. 20,
10. 22, .17, 21, 19. 19. 16. 18, II, 18, 10 •
11.
19. 17. t6. 14.
".,
24. In an enquiry to find out relation between age and monthly wages, the
following data were collected from 40 mill workers :
S. No. Age (Years) Wage (Rs.) S. No. Age (Years) Wage (Rs.)
I. 37 81 :11 41 89
1. al 100 aa 38 9a
3· 49 101 a3 41 8I
4· ,6 109 24 37 140
S· 57 (02. as 4S 94
6. 34 104 a6 4 6 .n9
7· 25 8( 2.7 28 99
8. 48 tit, a8 43 109
9. 51 100 2.9 41 92.
10. 41 89 30 31 no
n. 4, 15' 31 5S tao
12.. H 101 32 42 115
13· 38 99 H 4Q 119
14· 41 U3 H 4S 90
IS· 31 100 3S So 76
16. 30 99 56 24 IS8
17· 55 130 37 :n 76
til. 30 159 38 u 76
19· 2.9 90 59 al 94
ao. u 79 40 58 89
Tabulate the above data in the following form : -
13, 81, 58, 81 SS, 7S, 61, 70, 84, 84, 81. 87, 67, 6" 62, 62, 61, S9, 5S, 57, 75,
72, 84, 91, 87, 76, 43, 83, 40 , 73, 86, 73, 43, 33, 76, 95, 73, 65, 77, 72, 72, 29,
43 85, 4%, 80, 75, 85,62, 57, 64, 70,95, 57, 74, ,0. 7S, 49, 55,64, 92, 73, 73, 96,.
69 51, 22, 7S, 80, 36, 70 8S, 47, 69,63, 53, 91, H. 69, 30. (AndbrfJ, B. A., 1914)
31. Tabulate the following data by taking 10 as the class-interval :
30, 45, 55, 65, 60, 90, lIS. 8s. 95> 100, 95, '65, 75. 8S, IZS, lIO, 87, 6"
100, lIS, 65, 60, 75, 9S, 130, 95, 125, II5, 6" 70, 9" 8" 6S, 60, 80, 8"
75, 95. 55, 45, 35, 45, 40, 85, 135, 140, 9S, 65, 4S, 3', U5, 90,80. IZS, 130,
~5. 90, 100, 95, 85, 85, uo, II5, 40, 35, 12 5, 35, lOS, 7',45,
(B. CtIIII., Vwa"" 1964).
Ratios, Percentages And
Logarithms 8
RATIOS AND PERCENTAGES
Need. After the statistical data have been collected, edited, classified
and tabulated, they are ready for further statistical analysis. In the
process of classification and tabulation the size of the data is considerably
reduced and a large number of figures are condensed. This is done with
a view to make the data easily understandable and fit for analysis and
interpretation. But even after condensation, data might be fairly large
in quantity and the figures may be very big and unwieldy. It may not be
easy to draw inferences from them. To remove this difficulty, sometimes,
ratios and percentages are calculated so that big figures are reduced to
small ones and a relative study of the data is possible. Absolute figures
are unfit for relative study and in statistical analysis where most of the
data are compared relatively, absolute figures, even though they are
essential, do not have very great significance.
Derivatives
Ratios and percentages are obtained by a combination of two or
more figures. They are derived from the absolute figures collected for the
purpose of investigation, and that is why they are sometimes referred
to as "derivatives." Derivative is a quantity obtained by the combination
of two or more figures. In a statistical analysis a variety of derivatives
are used. Ratios, percentages, rates, coefficients, measures of central
tendency and measures of dispersion, skewness, kurtosis are all statistical
derivatives. Ratios and percentages are simple derivatives while measures
of central tendency or averages of the first order and measures of
dispersion and skewness or averages of the second order are complex
derivatives, as in their calculation a number of statistical processes have to be
undergone. Simple derivatives may be either co-ordinate or subordinate.
When two or more parts of a universe are compared with each other with
the help of ratios or percentages these derivatives are called co-ordinate
derivatives, and when a part of the universe is compared with the whole
of the universe derivatives are said to be subordinate. The ratio of
females and males in a population is an example of co-ordinate derivative
and the ratio of females to the total population is a subordinate derivative.
Ratios. In the simplest possible form, a ratio is a quotient or the
numerical quantity obtained by dividing one figure by another. If
800 is divided by 100 the quotient is 8. Here 800 has been compared
with 100 which is the base in this case. In other words, 800 is to 100
as 8 is to 1. Or 800 : 100 : : 8 : 1. The process reduces the size of the
numbers and thus facilitates comparison. Instead of saying that the
10% rise ∴ price becomes 95 + (10 × 100)/100 = 105
20% fall ∴ price becomes 105 - (20 × 100)/100 = 85
Thus according to this method the prices went up 10% over the
original price.
Using the second supposition : -
Original price = 100
5% fall ∴ price becomes (95 × 100)/100 = 95
10% rise ∴ price becomes (110 × 95)/100 = 104.5
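The two ways of reckoning successive percentage changes can be compared with a short Python sketch. It assumes, from the arithmetic shown, that under the first supposition every percentage is taken on the original price and under the second on the price immediately preceding the change; the function names are illustrative only.

```python
from fractions import Fraction

def on_original_price(start, changes):
    """Every percentage change is reckoned on the original price."""
    price = Fraction(start)
    for pct in changes:
        price += Fraction(start) * pct / 100
    return price

def on_preceding_price(start, changes):
    """Every percentage change is reckoned on the price just before it."""
    price = Fraction(start)
    for pct in changes:
        price += price * pct / 100
    return price

changes = [-5, 10]  # a 5% fall followed by a 10% rise
print(float(on_original_price(100, changes)))
print(float(on_preceding_price(100, changes)))
```

For a 5% fall followed by a 10% rise the first reckoning gives 105 and the second 104.5, so the base of each percentage must always be stated.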
the age of 45, and if during this period, on the basis of current fertility
rates, they give birth to 2412 female children, the female gross
reproduction rate would be 2.412. Reproduction rates are generally expressed in
terms of unity. It means that on the assumptions mentioned above for
each mother at the present moment there would be 2.412 mothers in
future.
Thus
Female gross reproduction rate =
(Number of female children expected to be born to 1000 newly born
females on the basis of current fertility without mortality) ÷ 1000
Net reproduction rate. As has been noted above the gross
reproduction rate does not take into account the factor of mortality. The
net reproduction rate takes into account this factor also. Female net
reproduction rate tells us about the number of female children expected
to be born to 1000 newly born females on the basis of current fertility
and mortality rates. It is quite obvious that net reproduction rate would
be less than the gross reproduction rate. 1000 newly born females would
in actual practice not remain 1000 at the age of say 16. Some of them
would die. Supposing their number is reduced from 1000 to 800 and
suppose further that the current fertility rate for the age of 16 is 20 per
1000 then the total number of children born to them would not be 20
but 16 only. If the sex ratio is 50 : 50 then only 8 female children would
be taken into account for the calculation of female net reproduction
rate. In the calculation of gross reproduction rate 10 female children
would have been taken into account. In each age group of women in the
child-bearing age period, the number would go on declining due to
mortality and the number of children born would also be reduced. If suppose
the total of 2,412 children (presumed in the calculation of gross
reproduction rate) comes down to 1,411, female net reproduction rate would be
1.411. It shows that for every present mother there would be 1.411
future mothers or, in other words, the population is growing. If net
reproduction rate is just 1 it indicates a stationary population in future
and if it is less than 1 it is a sign of declining population.
Thus
Female net reproduction rate =
(Number of female children expected to be born to 1000 newly born
females on the basis of current fertility and mortality rates) ÷ 1000
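The figures used in the discussion of gross and net reproduction rates can be worked out with the following Python sketch. The function names, and the default sex ratio of one half, are assumptions of the sketch based on the 50 : 50 ratio supposed in the text.

```python
from fractions import Fraction

def reproduction_rate(female_children, newly_born_females=1000):
    """Female children expected per newly born female."""
    return Fraction(female_children, newly_born_females)

def female_births(women, fertility_per_1000, female_share=Fraction(1, 2)):
    """Female children born to a group of women at a fertility rate per 1000."""
    return women * Fraction(fertility_per_1000, 1000) * female_share

def outlook(net_rate):
    """Interpret a net reproduction rate as growing, stationary or declining."""
    if net_rate > 1:
        return "growing"
    if net_rate == 1:
        return "stationary"
    return "declining"

gross = reproduction_rate(2412)
net = reproduction_rate(1411)
print(float(gross), float(net), outlook(net))

# Age 16 in the text: 1000 women without mortality, 800 survivors with mortality.
print(female_births(1000, 20), female_births(800, 20))
```

The last line gives the 10 female children counted for the gross rate and the 8 counted for the net rate at the age of 16.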
In the same way male reproduction rates can be calculated by taking
into account the fathers and the number of male children expected to be
born. Combined reproduction rates for males and females can be
calculated by taking into account the population (both males and females)
and the number of children (both males and females) expected to be born.
LOGARITHMS
Example I
Example II
Divide .0009 by .008
(a) log .0009 = 4̄.9542
(b) log .008 = 3̄.9031
log a - log b = 1̄.0511
Anti-log 1̄.0511 = .1125
∴ .0009 ÷ .008 = .1125
To raise a number to a power. In order to raise a number to a power
multiply the log of the number by the exponent of the power and find
out the anti-log.
Thus aⁿ = Anti-log (n × log a)
Example I
Find out the value of (95.2)⁴
log 95.2 = 1.9786
1.9786 × 4 = 7.9144
Anti-log 7.9144 = 82040000
∴ (95.2)⁴ = 82040000
Example II
Find out the cube of .0991
log of .0991 = 2̄.9961
2̄.9961 × 3 = 4̄.9883
Anti-log of 4̄.9883 = .0009727
∴ (.0991)³ = .0009727
Note. In the second example above, 2 which is carried forward
from the mantissa to the characteristic is subtracted from the product
of 3 and 2̄ and thus the characteristic of the product is 4̄.
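The rule for raising a number to a power can be followed in Python, which computes logarithms directly instead of reading them from four-figure tables. Its results therefore agree with the table-based figures above only approximately. The helper names are assumptions of this sketch.

```python
import math

def bar_form(x):
    """Split log10(x) into (characteristic, mantissa), the mantissa lying between 0 and 1."""
    log = math.log10(x)
    characteristic = math.floor(log)
    return characteristic, log - characteristic

def power_by_logs(a, n):
    """a raised to the power n, found as the anti-log of n times log a."""
    return 10 ** (n * math.log10(a))

# log .0991 has characteristic -2 (written with a bar) and a positive mantissa.
print(bar_form(0.0991))
# The cube has characteristic -4, as explained in the Note.
print(bar_form(0.0991 ** 3))
print(power_by_logs(95.2, 4))
print(power_by_logs(0.0991, 3))
```

Keeping the mantissa positive and letting the characteristic carry the sign is exactly what the bar notation does.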
To extract the root of a number. To extract the root of a number
divide the log of the number by the index of the root and find out the
anti-log.
Thus
ⁿ√a = anti-log (log a ÷ n)
Example I
Find out the value of ∛92.4
log 92.4 = 1.9657
Divided by 3: 1.9657 ÷ 3 = .6552
Anti-log .6552 = 4.519
∴ ∛92.4 = 4.519
Example II
Find out the value of $\sqrt[7]{.00487}$
log .00487 = $\bar{3}.6875$
To divide $\bar{3}.6875$ by 7 we shall have to write it as $\bar{7} + 4.6875$,
because in $\bar{3}.6875$ the characteristic is negative and the mantissa is
positive, and division is not possible with the figures as they are.
So
$(\bar{7} + 4.6875) \div 7 = \bar{1}.6696$
Anti-log $\bar{1}.6696$ = .4673
∴ $\sqrt[7]{.00487}$ = .4673
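The handling of a negative characteristic can be illustrated in Python. This is a sketch under stated assumptions: the helper names `bar_form` and `root_by_logs` are hypothetical, and `math.log10` replaces the printed tables, so the last digit may differ from a table-based answer.

```python
import math

def bar_form(x: float) -> tuple[int, float]:
    """Split log10(x) into a characteristic (possibly negative)
    and a mantissa that is always positive, as in log tables."""
    lg = math.log10(x)
    characteristic = math.floor(lg)
    return characteristic, lg - characteristic

def root_by_logs(a: float, k: int) -> float:
    """Return the k-th root of positive a as antilog(log a / k)."""
    return 10 ** (math.log10(a) / k)

characteristic, mantissa = bar_form(0.00487)   # bar-3 and about .6875
seventh_root = root_by_logs(0.00487, 7)
cube_root = root_by_logs(92.4, 3)
print(characteristic, round(mantissa, 4), round(seventh_root, 4), round(cube_root, 3))
```

Because the computer stores the whole logarithm as one signed number, the step of rewriting the characteristic as a multiple of the index is needed only when working by hand with tables.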
The utility of logarithms is very great in statistical calculations.
As has been said in the beginning, they help us in studying proportionate
changes. 10 to 100 is the same degree of relative change as 100
to 1000. In absolute figures these changes are different, but if we find
out their logarithms, they would be 1 and 2 (for 10 and 100 respectively)
and 2 and 3 (for 100 and 1000 respectively), indicating that the relative
changes in the two cases are identical.
Questions
1. Define a statistical derivative and discuss its utility in statistical analysis.
2. What is meant by co-ordinate and subordinate derivatives? Illustrate with
examples.
3. What precautions are necessary in the use of ratios and percentages?
4. What do you understand by a crude birth rate? Is it an accurate measurement
of the population growth of a locality? If not, how can it be modified to
give better results?
5. What is a "standard population"? How are birth rates and death rates
standardized?
6. What do you understand by general fertility rate? Is it an improvement
over standardized birth rates?
7. What statistical data are necessary for the calculation of net-reproduction-rate?
What is the deficiency in the existing Indian data in this respect?
(M.A. Alld., 1951).
8. What is net-reproduction-rate? Explain with the help of an example the
method of calculating it.
9. What are the various ways of the measurement of population growth? In
this connection discuss in detail the calculation of net-reproduction-rate.
(M. Com. Allahabad, 1952).
10. Point out the ambiguity or mistake, if any, in the following statements:
(a) 99% of the people who drink die before reaching 100 years of age.
Therefore, drinking is bad for longevity.
(b) The rate of increase in the number of cows in India is greater than the
population. Therefore, the people of India are now getting more milk per head.
(M. A. Patna, 1943).
11. Below is given the fertility rate per 1000 women, by their age group, for a
certain country for 19[?]6:

Age group (years)    Fertility rate per 1000 women
16-20                 19
21-25                173
26-30                [?]
31-35                201
36-40                157
41-45                 67
46-50                  9

Assuming that the ratio of female babies to total births for the country and year
concerned is 48.8%, calculate the gross-reproduction-rate for the country and explain
what this rate means.
12. The following are the death-rates, per thousand, per annum, of two towns
in a certain year:

                       Town A                            Town B
Ages        Population  Deaths  Death-rate    Population  Deaths  Death-rate
(years)                         per 1000                          per 1000
Under 2        3000       192     64.0           5000       300     60.0
2-10          10000        70      7.0          12000        78      6.5
10-20         10000        40      4.0          10000        38      3.8
20-60         32500       260      8.0          25000       190      7.6
60 & over      8500       510     60.0           8000       460     57.5
All ages      64000      1072     16.75         60000      1066     17.77

(a) For each age group the death-rate of town A is greater than that of
town B but the reverse is the case when all age-groups are grouped together. Why
is it so?
(b) Calculate the standardized death-rate for town B taking the population
of town A as the standard. (B. Com. Andhra, 1944), (M. A. Punjab, 1954).
13. Compute crude and standardized death-rates in the following and find out
if the local population has a higher or lower death-rate:
I
80
Above 6) 8000 320 8000 400
14. What are absolute and relative measurements? Explain in this connection
the use of ratios, percentages and co-efficients. (B. Com. -" 6"0. 1941).
15. Write short notes on: (a) Derivative series. (b) Complex derivatives.
(c) Total fertility rate. (d) Male reproduction rate. (e) Fallacies in the use of
ratios and percentages.
A B
......__-----I-----I----------
No. of ClDcUdateI Suc:ceuful No. 0( c:aocU- Sac:cellCuJ
appeared data appeared
M. Sc.
M. A.
60 ,0
90
zoo
Z40
160
190
160
_____ :'0:. __
ToU! 800 '90 800 '90
(II y,.,.. T. D. C•• R4.. 1961).
17. The following table gives the result of certain examinations of three
universities in the year 19[?]7. Which is the best university?
M.A.
---------r=- Percentages resultlln the otliveftlty
A B C
----- ... ----------
7
, ----- 0
M.Sc:.
I
70
B.A. 80
B.Bc. 70
B.Com. 60
(M. A. c.Jmtlti)
Measures of Central
Tendency 9
Need and meaning
We have discussed in the last chapter the utility of various statistical
derivatives like ratios, percentages and rates, etc., in reducing the quantum
of data and also in reducing the size of the figures. But these derivatives
are not enough for the proper condensation of figures and sometimes
there are many fallacies in their use. Condensation of data is necessary
in statistical analysis because a large number of big figures are not only
confusing to the mind but difficult to analyse also. In order to reduce the
complexity of data and to make them comparable it is essential that the various
phenomena which are being compared are reduced to one figure each.
If, for example, a comparison is made between the marks obtained by a
group of 200 students belonging to a university and the marks obtained
by another group of 200 students belonging to another university, it
would be impossible to arrive at any conclusion if the two series relating
to these marks are directly compared. On the other hand, if each of these
series is represented by one figure, comparison would be an extremely
easy affair. It is obvious that a figure which is used to represent a whole
series should neither have the lowest value in the series nor the highest
value, but a value somewhere between these two limits, possibly in the
centre, where most of the items of the series cluster. Such figures are
called Measures of Central Tendency or Averages. An average represents
a whole series and as such, its value always lies between the minimum
and maximum values and generally it is located in the centre or middle
of the distribution.
Objects. Measures of central tendency or averages give a bird's eye
view of the huge mass of statistical data which ordinarily are not easily intelligible.
They are devices to aid the human mind in grasping the true significance
of large aggregates of facts and measurements. They set aside the
unnecessary details of the data and put forward a concise picture of the
complex phenomena under investigation. If the human mind were capable
of grasping all the details of large numbers and their interrelationships,
averages would have no utility. But the human mind is not capable of
this. It is impossible to keep in mind, say, the details of heights, weights,
incomes and expenditures of even 200 students, let alone bigger figures.
This difficulty of keeping all the details in mind necessitates the use of
averages not only for grasping the central theme of the data, but also for the
facility of comparison and further analysis. Averages are thus extremely
helpful for purposes of comparison.
Why is an average a representative. The reason why an average is
a valid representative of a series lies in the fact that ordinarily most of the
items of a series cluster in the middle. On the extreme ends the number
of items is very small. In a population of 10,000 adults there would
hardly be any person who is 2 ft. high or whose height is above 8 ft.
There will be a small range within which these values would vary,
say 5 ft. to 6' 5". Even within this range a large number of persons
would have a height between, say, 5' 5" and 5' 10". In other class intervals
of height the number of persons would be comparatively small. Under
such circumstances if we conclude that the height of this particular
group of persons would be represented by, say, 5' 7", we can reasonably
be sure that this figure would, for all practical purposes, give us a
satisfactory conclusion. This average would satisfactorily represent
the whole group of figures from which it has been calculated. Ordinarily,
items with values less than the average cancel the items whose values
are more than the average. Thus the average of 3, 4 and 5 is 4. The
item before it is one less in value and the item after it is one more in
value than the average figure of 4. Thus the two deviations of -1
and +1 cancel each other.
Typical and descriptive averages. It should, however, be noted
that a series can be represented by an average only if the average is
really typical. Sometimes the average which is calculated is not truly
representative of the series. In such cases it should not be used to
represent the series. Averages which are representative are called
Typical Averages and those which are not representative and have only
a theoretical value are called Descriptive Averages.
Characteristics of a representative average. In whatever way we define
an average it is necessary to keep in mind the fact that an average is
a particular value of a variable and as such it has to be expressed in the
same unit in which the series is. If the variable refers to the weights
of students in pounds the average would also be a weight and in pounds.
Similarly the average of ratios and percentages should be in ratios and
percentages only. Averages are meant for condensing a frequency
distribution into one figure and it is necessary that they are in the same
unit in which the original series is. At this stage, it is necessary to decide
about the desiderata or the requirements for a good measure of central
tendency. A typical average should possess the following
characteristics:
(a) It should be rigidly defined. If an average is left to the estimation
of an observer and if it is not a definite and fixed value it cannot be
representative of a series. The bias of the investigator in such cases
would considerably affect the value of the average. If the average is
rigidly defined this instability in its value would be no more, and it
would always be a definite figure.
(b) It should be based on all the observations of the series. If some
of the items of the series are not taken into account in its calculation
the average cannot be said to be a representative one. As we shall
see later on there are some averages which do not take into account
all the values of a group and to this extent they are not satisfactory
averages.
(c) It should be capable of further algebraic treatment. If an average
does not possess this quality, its use is bound to be very limited. It
will not be possible to calculate, say, the combined average of two or
more series from their individual averages; further it will not be possible
to study the average relationships of various parts of a variable, if it is
expressed as the sum of two or more variables. Many other similar
studies would not be possible if the average is not capable of further
algebraic treatment.
(d) It should be easy to calculate and simple to follow. If the
calculation of the average involves tedious mathematical processes it will
not be readily understood and its use will be confined only to a limited
number of persons. It can never be a popular average. As such,
one of the qualities of a good average is that it should not be too abstract
or mathematical and there should be no difficulty in its calculation.
Further, the properties of the average should be such that they can be
easily understood by persons of ordinary intelligence.
(e) It should not be affected by fluctuations of sampling. If two
independent sample studies are made in any particular field, the averages
thus obtained should not materially differ from each other. No doubt,
when two separate enquiries are made, there is bound to be a difference
in the average values calculated, but in some cases this difference would
be great while in others comparatively less. Those averages in which
this difference, which is technically called "fluctuation of sampling",
is less are considered better than those in which this difference is
more.
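Characteristic (c) mentions the combined average of two or more series. For the arithmetic average this can be sketched in Python as below. The function name `combined_mean` and the two series (an average of 50 over 20 items and an average of 60 over 30 items) are hypothetical figures chosen only for illustration; they do not come from the text.

```python
def combined_mean(means, counts):
    """Combine the arithmetic averages of several series, weighting
    each average by the number of items in its series."""
    total = sum(mean * count for mean, count in zip(means, counts))
    return total / sum(counts)

# Hypothetical series: average 50 over 20 items, average 60 over 30 items.
overall = combined_mean([50, 60], [20, 30])
print(overall)
```

Only the individual averages and the numbers of items are needed; the original items themselves are not required.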
One more thing to be remembered about averages is that the items
whose average is being calculated should form a homogeneous group. It is absurd
to talk about the average of a man's height and his weight. If the data
from which an average is being calculated are not homogeneous,
misleading conclusions are likely to be drawn. In finding out the average
production of cotton cloth per mill, if big and small mills are not separated,
the average would be unrepresentative. Similarly, to study the wage
level in the cotton-mill industry of India, separate averages should be
calculated for the male and female workers. Again, adult workers should
be studied separately from the juvenile group. Thus we see that as far
as possible, the data from which an average is calculated should be a
homogeneous lot. Homogeneity can be achieved either by selecting
only like items or by dividing the heterogeneous data into a number
of homogeneous groups.
Measures of various orders
Statistical series may differ from each other in the following three
ways : -
1. They may differ in the values of the variable round which
most of the items cluster.
ARITHMETIC AVERAGE
$$a = \frac{1}{n}\,(m_1 + m_2 + m_3 + \dots + m_n)$$
or
$$a = \frac{\Sigma m}{n}$$
Where
a = Arithmetic average; m = Values of the variable; Σ = Summation
or total; n = Number of items.
The following example would illustrate this formula.
Example 1. Calculate the simple arithmetic average of the
following items:

Size of items
20    50    72
28    53    74
34    54    75
39    59    78
42    64    79

Solution. Direct Method
Computation of arithmetic average

Size of items (m)
20
28
34
39
42
50
53
54
59
64
72
74
75
78
79
n = 15        Σm = 821

$$a = \frac{\Sigma m}{n} = \frac{821}{15} = 54.73$$
Proof. Suppose $m_1$, $m_2$, $m_3$, etc., stand for the values of a
variable and $d_1$, $d_2$, $d_3$, etc., for their respective deviations from the
mean, and a stands for their arithmetic average and n for the number
of items.
Then,
$$a = \frac{m_1 + m_2 + m_3 + \dots + m_n}{n}$$
$$m_1 + m_2 + m_3 + \dots + m_n = an$$
The number of items is equal to n.
∴ If we subtract a, n times, from each side of the equation we
get
$$(m_1 - a) + (m_2 - a) + (m_3 - a) + \dots + (m_n - a) = an - an = 0$$
But
$(m_1 - a) = d_1$, $(m_2 - a) = d_2$, $(m_3 - a) = d_3$ and so on.
∴ $d_1 + d_2 + d_3 + \dots + d_n = 0$
Or $\Sigma d = 0$
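The property proved above can be verified on the fifteen items of Example 1. The following Python sketch is only an illustration; it uses exact fractions from the standard `fractions` module so that rounding does not disturb the result, and the variable names are hypothetical.

```python
from fractions import Fraction

# The fifteen items of Example 1.
items = [20, 28, 34, 39, 42, 50, 53, 54, 59, 64, 72, 74, 75, 78, 79]

# Exact arithmetic average, kept as a fraction to avoid rounding.
mean = Fraction(sum(items), len(items))

# Deviation of every item from the arithmetic average.
deviations = [m - mean for m in items]

print(mean, sum(deviations))
```

With floating-point division in place of `Fraction`, the sum of the deviations may come out as a very small non-zero number because of rounding.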
Symbolically:
$$a = x + \frac{\Sigma dx}{n}$$
Where
a = Actual arithmetic average; x = Assumed arithmetic average;
Σdx = The sum of the deviations from the assumed mean; n = Number
of items.
It should be remembered that the difference between the actual
arithmetic average and the assumed arithmetic average is equal to the
sum of the deviations from the assumed arithmetic average divided by
the number of items.
Symbolically:
$$a - x = \frac{\Sigma dx}{n}$$
If we solve Example No. 1 by this short-cut method it will give us
exactly the same answer as we got by the direct method. This alternative
method is illustrated below:

Calculation of arithmetic average
Short-cut method

Size of items    Deviation from an assumed mean (50)
(m)              (dx)
20               -30
28               -22
34               -16
39               -11
42                -8
50                 0
53                +3
54                +4
59                +9
64               +14
72               +22
74               +24
75               +25
78               +28
79               +29
n = 15           Σdx = +71

$$a = 50 + \frac{71}{15} = 50 + 4.73 = 54.73$$
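The agreement of the two methods can be checked with a short Python sketch. The function names `mean_direct` and `mean_short_cut` are hypothetical and serve only to illustrate the two formulae on the data of Example 1.

```python
items = [20, 28, 34, 39, 42, 50, 53, 54, 59, 64, 72, 74, 75, 78, 79]

def mean_direct(values):
    """Direct method: sum of the items divided by their number."""
    return sum(values) / len(values)

def mean_short_cut(values, assumed):
    """Short-cut method: assumed mean plus the average deviation from it."""
    total_deviation = sum(m - assumed for m in values)
    return assumed + total_deviation / len(values)

print(round(mean_direct(items), 2), round(mean_short_cut(items, 50), 2))
```

Any assumed mean gives the same answer; a value near the centre of the series, such as 50 here, merely keeps the deviations small.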
Or
$$a = \frac{\Sigma mf}{n} = \frac{\Sigma mf}{\Sigma f}$$
The following illustration would clarify the formula:
Example 2. The following table gives the number of children
born per family in 735 families. Calculate the average number of
children born per family.
$$a = x + \left(\frac{\Sigma f\,dx}{n} \times i\right)$$
The following illustrations would clarify these rules:
Example 3. The following data relate to sizes of shoes sold at
a store during a given week. Find the average size by the short-cut
method.

Computation of the average size of shoes

Size of shoes    No. of pairs    Size of shoes    No. of pairs
4.5                1             8                 95
5                  2             8.5               82
5.5                4             9                 75
6                  5             9.5               44
6.5               15             10                25
7                 30             10.5              15
7.5               60             11                 4

Solution. Short-cut Method No. 1.
Height    No. of persons    Deviations from the assumed mean (67)    Step-deviation    Total deviations
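For a frequency distribution such as the shoe sizes of Example 3, the short-cut method weights every deviation by its frequency. The Python sketch below is an illustration only: the function name `mean_of_frequency_table` is hypothetical, and the assumed mean of 8 is chosen here for convenience and is not taken from the text.

```python
sizes = [4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
pairs = [1, 2, 4, 5, 15, 30, 60, 95, 82, 75, 44, 25, 15, 4]

def mean_of_frequency_table(values, freqs, assumed=0.0):
    """Short-cut method for a discrete series: assumed mean plus the
    frequency-weighted average deviation from the assumed mean."""
    n = sum(freqs)
    total_deviation = sum(f * (v - assumed) for v, f in zip(values, freqs))
    return assumed + total_deviation / n

average_size = mean_of_frequency_table(sizes, pairs, assumed=8)
print(sum(pairs), round(average_size, 2))
```

With `assumed=0.0` the same function reduces to the direct formula, the sum of mf divided by the sum of f.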
(iv) The fourth characteristic laid down for an ideal average, that
it should be easy to calculate and simple to follow, is also found in
arithmetic average. The calculation of the arithmetic average is simple
and it is very easily understandable. It does not require the arraying
of data which is necessary in case of some other averages. In fact this
average is so well known that to a common man "average" means an
arithmetic average.
Thus the arithmetic average
(a) is simple to calculate,
(b) does not need arraying of data,
(c) is easy to understand.
(v) The last characteristic of an ideal average, that it should be
least affected by fluctuations of sampling, is also present in arithmetic
average to a certain extent. If the number of items in a series is large,
the arithmetic average provides a good basis of comparison, as in such
cases the abnormalities in one direction are set off against the
abnormalities in another direction.
Drawbacks of arithmetic average
Though the arithmetic average satisfies most of the conditions
of an ideal average, there are certain drawbacks also from which it suffers
and as such it should be used with caution. These drawbacks really
arise on account of the peculiar nature of this average and the technique
of its calculation. The points worth consideration in this respect are
as follows:
(i) Since arithmetic average is calculated from all the items of a
series, sometimes the abnormal items may considerably affect this average,
particularly when the number of items is not large. For example,
if the income of a shopkeeper is Rs. 1,000 per month and the incomes of
his three assistants are Rs. 25, Rs. 35 and Rs. 40 per month respectively,
the average income of this group would be Rs. $\frac{1000+25+35+40}{4}$
or Rs. 275 per month. This is not at all a representative figure. Similarly,
if one player in cricket scores 300 runs and the remaining 10 players
score only 140 runs, the total is 440 runs and the average per player is
40 runs. It is not a representative figure as 10 players out of 11 have
scored on an average only 14 runs each.
(ii) Further, the fact that the arithmetic average cannot be calculated
without all the items of a series can also be said to be a drawback.
If out of 1000 items the values of 999 items are known the arithmetic
average cannot be calculated. Other averages like median and mode do
not need complete data.
(iii) Arithmetic average is no doubt easy to calculate but in a
relative sense its calculation may be more difficult than that of mode or
median as they can be located merely by inspection.
(iv) Another point to be noted in this connection is that the
arithmetic average can be a figure which does not exist in the series
at all. The arithmetic average of 12, 14 and 19 is 15. No item of the
series has a value of 15.
(v) Arithmetic average sometimes gives results which appear
almost absurd. If we have to find out the number of children per
family, and if we use the arithmetic average, it is quite likely that we get
the average as 3.4 children. Obviously the result is absurd. A child
cannot be divided into fractions.
(vi) Sometimes arithmetic average gives fallacious conclusions.
Suppose the incomes of two groups of persons are as follows:

           A        B
         1000      325
          100      300
           75      285
           25      290
Total    1200     1200

The average income of each of these two groups is Rs. 300. It would
appear from the averages that both the groups are economically
at the same level, and the two series are almost similar to each other, but
this is not the case. The two series entirely differ from each
other so far as their composition is concerned.
(vii) The arithmetic average gives greater importance to bigger items
of a series and lesser importance to smaller items. It has
an upward bias. One big item among four items, three of which are
small, will push up the average considerably.
But the reverse is not true. If in a series of four items there are three
big items and one small item the average will not be pulled down very
much.
The above discussion thus leads us to the conclusion that though
arithmetic average fulfils most of the conditions of an ideal average yet
it should be used with caution as it is likely to give erroneous conclusions
under certain conditions.
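Two of the drawbacks listed above can be seen numerically in the following Python sketch, which uses the figures given in the text together with the standard `statistics` module. The variable names are hypothetical, and the range (largest item minus smallest item) is used here merely as one simple way of showing that two series with the same average can differ in composition.

```python
from statistics import mean

# Drawback (i): one abnormal item dominates a short series.
incomes = [1000, 25, 35, 40]
average_income = mean(incomes)

# Drawback (vi): equal averages, very different composition.
group_a = [1000, 100, 75, 25]
group_b = [325, 300, 285, 290]
range_a = max(group_a) - min(group_a)
range_b = max(group_b) - min(group_b)

print(average_income, mean(group_a), mean(group_b), range_a, range_b)
```

The averages of groups A and B coincide, while the spread between the highest and lowest incomes is many times greater in group A than in group B.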
MEDIAN
Median is the value of the middle item of a series when it is arrayed in
ascending or descending order of magnitude.
It divides the series in two equal parts.
The values of items in one part are less than the value of the median and
in the other part are more than it. If in a class there are 21 students and
if they stand in a line in accordance with their height, beginning with the
shortest amongst them and ending with the tallest, then the 11th student
would be in the centre and would divide them in two parts consisting of
ten students each. Students of one part will have heights less than the
height of the 11th student and of the other part more than this height.
The height of the 11th student is the median height. For ungrouped
data it may be convenient to find the value of the median by counting
$\frac{N+1}{2}$ items, beginning with the highest (or lowest) item in the
array. In grouped data this counting is abandoned.
Symbolically M = size of $\frac{n+1}{2}$ items
where M stands for the median and n for the number of items.
the 5.5th item. In such a case the values of the 5th and 6th items would be
added and the total would be divided by 2; the resulting figure would
be the value of the median. The following example would clarify this
point:
Example 8. The following table gives the marks obtained by a
batch of 30 B. Com. students in a class-test in statistics. (Marks 100).

Roll No.    Marks obtained    Roll No.    Marks obtained
 1          33                16          24
 2          32                17          33
 3          55                18          42
 4          47                19          38
 5          21                20          45
 6          50                21          26
 7          27                22          33
 8          12                23          44
 9          68                24          48
10          49                25          52
11          40                26          30
12          17                27          58
13          44                28          37
14          48                29          38
15          62                30          35

Find the value of the median.
Solution. Marks obtained by 30 students arranged in ascending
order of magnitude:

Serial No.  Marks    Serial No.  Marks    Serial No.  Marks
 1          12       11          33       21          47
 2          17       12          35       22          48
 3          21       13          37       23          48
 4          24       14          38       24          49
 5          26       15          38       25          50
 6          27       16          40       26          52
 7          30       17          42       27          55
 8          32       18          44       28          58
 9          33       19          44       29          62
10          33       20          45       30          68

M = size of $\frac{n+1}{2}$ items
  = size of $\frac{30+1}{2}$ or 15.5th item
  = $\frac{38+40}{2}$ = 39 marks
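The steps of Example 8 (array the items, then take the middle item, or the mean of the two middle items when their number is even) can be sketched in Python as follows. The function name `median_of_items` is hypothetical; the marks are those of the thirty students listed above.

```python
marks = [33, 32, 55, 47, 21, 50, 27, 12, 68, 49, 40, 17, 44, 48, 62,
         24, 33, 42, 38, 45, 26, 33, 44, 48, 52, 30, 58, 37, 38, 35]

def median_of_items(values):
    """Array the items in ascending order and return the size of the
    (n + 1) / 2 th item; for an even n this is the mean of the two
    middle items."""
    arrayed = sorted(values)
    n = len(arrayed)
    middle = n // 2
    if n % 2 == 1:
        return arrayed[middle]
    return (arrayed[middle - 1] + arrayed[middle]) / 2

median_marks = median_of_items(marks)
print(median_marks)
```

Here n is 30, so the 15th and 16th items of the array (38 and 40) are averaged.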
Median = size of $\frac{n+1}{2}$ pairs, where n equals the total frequency
       = size of $\frac{457+1}{2}$ or 229th pair
       = 8.5
It will be clear from the above figures that the value of items from
213th to 294th is 8.5. The value of the 229th item, thus, is also 8.5.
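In a discrete series the median is located from the cumulative frequencies. The following Python sketch illustrates this on the shoe-size data; the function name `value_at_position` is hypothetical, and `itertools.accumulate` from the standard library builds the cumulative frequencies.

```python
from itertools import accumulate

sizes = [4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
pairs = [1, 2, 4, 5, 15, 30, 60, 95, 82, 75, 44, 25, 15, 4]

def value_at_position(values, freqs, position):
    """Return the value of the item standing at the given position
    (counted from 1) when the series is arrayed in ascending order."""
    for value, cumulative in zip(values, accumulate(freqs)):
        if position <= cumulative:
            return value
    raise ValueError("position exceeds the total frequency")

n = sum(pairs)
median_size = value_at_position(sizes, pairs, (n + 1) / 2)
print(n, (n + 1) / 2, median_size)
```

The cumulative frequency reaches 212 at size 8 and 294 at size 8.5, so the 229th pair falls in size 8.5.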
Determination of median in a continuous series
When the median of a continuous frequency distribution has to
be determined there is one difficulty. The value of the median lies in
a class interval, and to get a definite figure, interpolation has to be done.
Suppose, for example, it is found that the value of the median lies in the
20 to 30 class interval whose frequency is 40. Now to find out the value
of the median we have to take recourse to interpolation and to apply a
particular formula. This formula, which we discuss below, is based on
the assumption that the frequencies of the class in which the median lies
are uniformly spread over the whole class-interval. In the above case
we shall presume that these 40 units are equally distributed in the whole
class interval of 20 to 30, or each of these ten values 20, 21, 22 and so on
has a frequency of 4 units.
The formula of interpolation to find out the median is:
at the middle item cutting the curve at a particular point. The value of
the median is read on the vertical line (called ordinate) at the point of
intersection. This procedure would be illustrated in the chapter on
Graphs.
Merits of median
(i) It satisfies the first condition laid down in previous pages for
an ideal average as it is rigidly defined.
(ii) It can be easily calculated and it is understood without any
difficulty.
(iii) It is not affected by the values of the extreme items and as
such is sometimes more representative than arithmetic average. If the
incomes of five persons are Rs. 30, Rs. 35, Rs. 40, Rs. 45 and Rs. 1,000
the median would be Rs. 40 whereas the arithmetic average would be
Rs. 230. Median in such cases is a better average.
(iv) Even if the value of the extremes is not known median can
be calculated if the number of items is known.
(v) It can be located merely by inspection in many cases.
(vi) It gives best results in a study of those phenomena which are
incapable of direct quantitative measurement, for example intelligence.
It is impossible to measure intelligence quantitatively but it is possible to
arrange a group of persons in ascending or descending order of intelligence
and thus to locate a person whose intelligence can be said to be average.
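Merit (iii) can be checked on the five incomes quoted there. The Python sketch below uses the standard `statistics` module; the second series, in which the extreme income is raised to Rs. 10,000, is a hypothetical variation added only to show that the median does not move.

```python
from statistics import mean, median

incomes = [30, 35, 40, 45, 1000]
median_income = median(incomes)
average_income = mean(incomes)

# Hypothetical variation: the extreme item is made ten times larger.
stretched = [30, 35, 40, 45, 10000]

print(median_income, average_income, median(stretched), mean(stretched))
```

The arithmetic average follows the extreme item, whereas the median remains at the middle item of the array.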
Drawbacks of median
(i) Median may not be representative of a series in many cases.
This is specially so when there are wide variations between the values
of different items. For example, if the marks obtained by eleven students
are respectively 15, 16, 16, 18, 18, 20, 54, 60, 60, 60, and 72 the median
marks would be 20. Clearly the average is not representative of the series.
(ii) It is not suitable for further algebraic treatment. For example,
we cannot find out the total value of the items if we know their
number and median.
(iii) When median has to be calculated in continuous series it
requires interpolation. The assumption of the interpolation, that all
the frequencies of the class-interval are uniformly spread over their
values in the class-interval, may not be actually true. In most cases it will
not be true.
(iv) If big or small items in a series are to receive greater importance
median would be an unsuitable average. Median ignores the
values of extreme items.
(v) Median is more likely to be affected by the fluctuations of
sampling than the arithmetic average.
(vi) The arrangement of items in ascending or descending order
is sometimes very tedious.
Comparison of mean and median
Both the mean and the median satisfy the conditions of rigid
definition and stability, but so far as ease in calculation is concerned
median has a distinct advantage over mean. On the other hand, the
general fluctuations of sampling affect the median to a greater extent
than the mean, though there might be some cases where mean is affected
to a greater extent by such fluctuations than the median.
So far as the ease of algebraic treatment of these two averages is
concerned, mean is definitely superior to median. In case of mean,
when several series relating to one phenomenon are combined into one,
it is possible to find out the combined average from the averages of
various series and their number of observations. It is not possible in
case of median. However, if the component series are symmetrical¹
their means and medians would be identical and as such combined mean
and median would also be the same. But in case of asymmetrical
distribution the combined median would not coincide with the mean nor with
any other assignable value. The median of the sum or difference of the
corresponding values of the items of two series is not equal to the sum or
difference of their medians, as is the case with arithmetic average. The
calculated value of the median subject to error is not necessarily the same
as the true value of the median, even if the error is zero, that is, if
positive and negative errors cancel each other.
On the other hand, median has certain advantages over the mean.
It is easily calculated and is readily obtained without even knowing
the value of all the items, provided they can be arrayed. Further, in
some cases mean cannot be calculated due to the extreme class intervals
being infinite, like "less than 100" or "more than 10,000" etc., but median
can be easily obtained in such distributions. Sometimes median may be
more representative than the arithmetic average, due to the fact that it is
not affected by the values of extreme items. If, for example, the values
of most of the items of a sample cluster round 200, median would not be
affected if suddenly one item, whose value is 3000, is included in the
sample. Mean in such cases is more affected by fluctuations of sampling
than the median. Further, median is generally the value of a particular
item of the series, whereas mean may not be the value of any item of the
series. In this sense median is a more natural average than the mean.
QUARTILES, DECILES AND PERCENTILES
It has been seen that the median divides an arrayed series in two
equal parts. The values of items in one part are more than the median
value, and the values of items in the other part less than the value of the
median. With a view to have a better study of the composition of
a series it may be necessary to divide it in four, five, six, seven, eight,
nine, ten or hundred parts. Usually the series are divided either in
four, ten or hundred parts. Just as one item divides the series in two
parts, three items would divide it in four parts, nine items in ten parts and
ninety-nine items in hundred parts. The values of these items are
respectively known as Quartiles, Deciles and Percentiles. A series can be
divided in five, seven or eight parts by Quintiles, Septiles and Octiles.
¹ For further explanation see chapters on Dispersion and Skewness.
There are thus three quartiles, nine deciles and ninety-nine percentiles
in a series. The second quartile, fifth decile and 50th percentile is the
median. The value of the item which divides the first half of a series
(with values less than the median) in two equal parts is called the First Quartile
or Lower Quartile, and the value of the item which divides the latter
half of a series (with values more than the median) in two equal parts is
called the Third Quartile or Upper Quartile. The Second Quartile or the
Middle Quartile is the same thing as the median.
The calculation of Quartiles, Deciles, Percentiles and other such
values is done by following the same rules with which the value of median
is determined.
Thus
$Q_1$ = the value of $\frac{n}{4}$ items
Solution. 1st Quartile or $Q_1$ = size of $\frac{n}{4}$ pairs
= size of $\frac{457}{4}$ or 114.25th pair
= 7.5
Upper Quartile = size of $\frac{3(n)}{4}$ pairs
= size of $\frac{3(457)}{4}$ or 342.75th pair
= 9
7th Decile = size of $\frac{7(n)}{10}$ pairs
= size of $\frac{7(457)}{10}$ or 319.9th pair
= 9
46th Percentile = size of $\frac{46(n)}{100}$ pairs
= size of $\frac{46(457)}{100}$ or 210.22th pair
= 8
3rd Quintile = size of $\frac{3(n)}{5}$ pairs
= size of $\frac{3(457)}{5}$ or 274.2th pair
= 8.5
5th Octile = size of $\frac{5(n)}{8}$ pairs
= size of $\frac{5(457)}{8}$ or 285.6th pair
= 8.5
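The partition values worked out above all follow one rule: find the position k(n)/parts and read off the value whose cumulative frequency first reaches it. A Python sketch of this rule on the shoe-size data is given below; the function name `partition_value` and its parameters `k` and `parts` are hypothetical.

```python
from itertools import accumulate

sizes = [4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
pairs = [1, 2, 4, 5, 15, 30, 60, 95, 82, 75, 44, 25, 15, 4]

def partition_value(values, freqs, k, parts):
    """Return the k-th partition value when a discrete series is divided
    into the given number of parts (4 = quartiles, 10 = deciles,
    100 = percentiles, and so on)."""
    position = k * sum(freqs) / parts
    for value, cumulative in zip(values, accumulate(freqs)):
        if position <= cumulative:
            return value
    raise ValueError("position exceeds the total frequency")

upper_quartile = partition_value(sizes, pairs, 3, 4)
seventh_decile = partition_value(sizes, pairs, 7, 10)
percentile_46 = partition_value(sizes, pairs, 46, 100)
print(upper_quartile, seventh_decile, percentile_46)
```

The same function gives the 3rd quintile with `parts=5` and the 5th octile with `parts=8`.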
$$Q_1 = l_1 + \frac{l_2 - l_1}{f_1}\,(q_1 - c)$$
Where $l_1$ and $l_2$ are the lower and upper limits of the class in which
the first quartile lies, $f_1$ the frequency of this class, $q_1$ the quartile number
$\frac{n}{4}$ and c the cumulative frequency of the class preceding the quartile
class.
$$Q_3 = l_1 + \frac{l_2 - l_1}{f_1}\,(q_3 - c)$$
Where $l_1$ and $l_2$ stand for the lower and upper limits of the class
in which the 3rd quartile lies, $f_1$ for the frequency of this class, $q_3$ the
quartile number and c the cumulative frequency of the class preceding
the quartile class.
Similarly the formulae can be derived for the calculation of deciles,
percentiles, etc.
Thus
$$D_2 = l_1 + \frac{l_2 - l_1}{f_1}\,(d_2 - c)$$
and
$$P_{72} = l_1 + \frac{l_2 - l_1}{f_1}\,(p_{72} - c)$$
Example 13. From the data given below calcula 'e the median and
quartiles.
Solution. Calculation of the median and quartile ages of married females.
Age Number of married Cumulatlye frequency.
females
~
0-5 3 3
5-10 31 34
10-15 410 444
15-20 1809 Q253
20-25 2446 4699
25-30 2223 6922
30-35 1723 8645
35-40 1292 9937
40-45 963 10900
45-50 762' 11662
50-55 531 12193
55-60 317 12510
60-65 156 12666
65-70 59 12725
70-75 37 12762
Total 12,762
The median age of married females
= the age of the $\frac{n}{2}$th female, where n equals the total frequency
= the age of the $\frac{12762}{2}$th, i.e., 6381st married female
124 FUNDAMENTALS OF STATISTICS
who lies in the 25-30 age group. Applying the formula of interpolation
$$M = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$$
where $M$ represents the median; $l_1$ and $l_2$ the lower and the upper limits
of the group in which median is situated; $f_1$ the frequency of median class;
$m$ the number of middle item or $\frac{n}{2}$ items; and $c$ the cumulative
frequency of the group lower than the one in which median is situated.
$$M = 25 + \frac{30 - 25}{2223}(6381 - 4699) = 28.8 \text{ years approx.}$$
The lower quartile age of married females
= the age of the $\frac{n}{4}$th, i.e., 3190.50th married female who lies
in the 20-25 age group;
By interpolation
$$Q_1 = l_1 + \frac{l_2 - l_1}{f_1}(q_1 - c)$$
where $Q_1$ represents lower quartile; $l_1$ and $l_2$ the lower and the upper
limits of the group in which lower quartile is situated; $f_1$ the frequency of
lower quartile class; $q_1$ the number of $\frac{n}{4}$ items; and $c$ the cumulative
frequency of the group lower than the one in which the lower
quartile is situated.
$$Q_1 = 20 + \frac{25 - 20}{2446}(3190.50 - 2253) = 21.9 \text{ yrs. approx.}$$
The upper quartile age of married females
= the age of the $\frac{3(n)}{4}$th, i.e., 9571.5th married female who
is situated in the 35-40 age group;
By interpolation,
$$Q_3 = l_1 + \frac{l_2 - l_1}{f_1}(q_3 - c)$$
where $Q_3$ stands for upper quartile; $l_1$ and $l_2$ for the lower and the upper
limits of the group in which upper quartile is situated; $f_1$ for the frequency
of upper quartile class; $q_3$ for the number of $\frac{3(n)}{4}$ items; and $c$
for the cumulative frequency of the group lower than the one in which
$Q_3$ is situated.
$$Q_3 = 35 + \frac{40 - 35}{1292}(9571.5 - 8645) = 38.6 \text{ yrs. approx.}$$
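The interpolation used in Example 13 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `grouped_quantile` and its arguments are hypothetical and do not come from the text. It assumes the classes are given in ascending order and locates the class by running through the cumulative frequencies, exactly as is done by hand above.

```python
from itertools import accumulate

def grouped_quantile(classes, freqs, fraction):
    """Interpolate the value at `fraction` (between 0 and 1) of a grouped series.

    classes: list of (lower limit, upper limit) pairs in ascending order.
    freqs:   frequency of each class, in the same order.
    """
    n = sum(freqs)
    position = fraction * n                  # n/4 for Q1, n/2 for the median, 3n/4 for Q3
    cumulative = list(accumulate(freqs))
    for i, cum in enumerate(cumulative):
        if position <= cum:                  # first class whose cumulative frequency reaches the position
            l1, l2 = classes[i]
            c = cumulative[i - 1] if i > 0 else 0   # cumulative frequency of the preceding class
            return l1 + (l2 - l1) / freqs[i] * (position - c)
    raise ValueError("fraction must lie between 0 and 1")

# Data of Example 13 (ages of married females).
ages = [(a, a + 5) for a in range(0, 75, 5)]
females = [3, 31, 410, 1809, 2446, 2223, 1723, 1292, 963, 762, 531, 317, 156, 59, 37]

median = grouped_quantile(ages, females, 0.50)
q1 = grouped_quantile(ages, females, 0.25)
q3 = grouped_quantile(ages, females, 0.75)
print(round(median, 1), round(q1, 1), round(q3, 1))
```

Deciles and percentiles follow by passing the corresponding fraction, for instance 0.8 for the 8th decile.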
MEASURES OF CENTRAL TENDENCY 125
Example 14. From the data given in Example 10 calculate
(a) 8th decile and (b) 56th percentile.
Solution : (a) 8th decile
$D_8$ = size of $\frac{8(n)}{10}$ items, where n equals 245
= size of 196th item, which lies in 7-9 group; applying the formula
of interpolation,
$$D_8 = l_1 + \frac{l_2 - l_1}{f_1}(d_8 - c)$$
where $l_1$ and $l_2$ represent the lower and the upper limits of the group
in which 8th decile is situated; $f_1$, the frequency of the same group; $d_8$, the
value of $\frac{8(n)}{10}$ item; and $c$, the cumulative frequency of the group
lower than the one in which 8th decile is situated.
$$\text{We get } D_8 = 7 + \frac{9 - 7}{65}(196 - 144) = 8.6$$
(b) 56th Percentile
$P_{56}$ = size of $\frac{56(n)}{100}$ items
= size of $\frac{56(245)}{100}$ items
= size of 137.2th item, which lies in 5-7 group.
Applying the formula of interpolation,
$$P_{56} = l_1 + \frac{l_2 - l_1}{f_1}(p_{56} - c)$$
where $l_1$, $l_2$ and $f_1$ represent the lower and the upper limits and the
frequency of the group in which the 56th percentile is situated; $p_{56}$, the value
of $\frac{56(n)}{100}$ item; and $c$, the cumulative frequency of the group lower
than the one in which $P_{56}$ is situated.
$$\text{We get } P_{56} = 5 + \frac{7 - 5}{85}(137.2 - 59) = 6.84$$
MODE
Mode is the most common item of a series. It represents the most typical
or frequent value of a series, a value which is in fact the fashion (la mode).
When one speaks of the "average student," "the most common wage,"
"the common man" or "the typical farm" and the like, he is unconsciously
referring to mode. If it is said that the most common wage in a particular
industry is Rs. 50 per month, what it means is that the largest number of
persons get this single figure of Rs. 50 as wage. Other figures of wage
are not as popular as this one, and the number of persons getting them is
less than the number getting Rs. 50 per month.
Methods of calculation. It appears from this definition that it must
be very easy to calculate the mode of a series. In fact it is not always
so. As we shall see later on, the most satisfactory method of calculating
mode is that of "curve fitting" which is an extremely difficult process.
In ordinary practice, however, mode is estimated by easier methods which
are comparatively very much less accurate than the method of curve
fitting. These methods are no doubt very simple and easy.
Example 15
Find out the mode of the following series :-
Solution
Size of                     Frequency
item       (1)     (2)     (3)     (4)     (5)     (6)
  5         48     100             156
  6         52             108             168
  7         56     116                             179*
  8         60             123*    180*
  9         63*    120*                    175*
 10         57             112                     162
 11         55     105             157
 12         50             102             143
 13         52      93                             150
 14         41              98     161
 15         57     120*                    172
 16         63*            115                     163
 17         52     100             140
 18         48              88
 19         40
Each grouped total is written against the first item of its group.
The frequencies in column (1) are first added in two's in columns (2)
and (3). Then they are added in three's in columns (4), (5) and (6). The
maximum frequency in each column is marked with an asterisk. It
will be observed that mode changes with the change in grouping. Thus
according to column (1) mode should be 9 or 16; according to column
(2) it should be either 9 or 10 or 15 or 16. To find out the point of maximum
concentration the data can be arranged in the shape of a table as
follows:
Analysis Table
Columns          Size of item containing maximum frequency
                   7    8    9   10   11   15   16
(1)                          9                  16
(2)                          9   10        15   16
(3)                     8    9
(4)                     8    9   10
(5)                          9   10   11
(6)                7    8    9
No. of times
a size occurs      1    3    6    3    1    1    2
Since the size 9 occurs the largest number of times it is the modal
size, or mode is 9.
If we look at the frequencies in the original table, we shall find
that the frequency of 63, which is the maximum single frequency, is
against two values, 9 and 16. The series thus appears to be bi-modal
but the process of grouping leads us to the conclusion that the
concentration of items round 9 is more than the concentration round 16.
Even if the frequency against 16 was 64 instead of 63, probably grouping
would have disclosed that the concentration of items round about
9 is more, even though the individual frequency against 9 is only 63. It
is thus never safe to rely only on the inspection of a series and to locate
the mode at the point of maximum frequency. Mode is affected by the
frequencies of the neighbouring items also, and, therefore, grouping is
essential, as it reveals the true point of maximum concentration.
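The grouping procedure of Example 15 can be sketched in Python as follows. The function name `mode_by_grouping` is hypothetical and the code is only one possible reading of the method: it forms the six columns (single items, two's starting at the first and second item, three's starting at the first, second and third item), finds the maximum total in each, and tallies the sizes that fall in a maximum group.

```python
from collections import Counter

def mode_by_grouping(sizes, freqs):
    """Return (modal size, tally of sizes) by the grouping method."""
    n = len(freqs)
    tally = Counter()
    # (group width, starting index): column (1) is the raw series,
    # columns (2)-(3) add in two's, columns (4)-(6) add in three's.
    for width, start in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [(sum(freqs[i:i + width]), sizes[i:i + width])
                  for i in range(start, n - width + 1, width)]
        top = max(total for total, _ in groups)
        for total, members in groups:
            if total == top:              # tied maxima are all counted, as in the analysis table
                tally.update(members)
    return tally.most_common(1)[0][0], tally

sizes = list(range(5, 20))
freqs = [48, 52, 56, 60, 63, 57, 55, 50, 52, 41, 57, 63, 52, 48, 40]
mode, tally = mode_by_grouping(sizes, freqs)
print(mode, dict(tally))
```

The tally reproduces the last row of the analysis table: size 9 is counted six times and size 16 only twice.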
Determination of mode in a continuous series
In a continuous series the determination of mode involves two
steps. First, by the process of grouping, the class in which there is
maximum concentration has to be located. After this the value of
mode is interpolated by the use of a formula. It should be remembered
that mode does not always give satisfactory results in a continuous
series. If the size of the class-interval is changed the modal class also
changes in many cases. Suppose, for example, the magnitude of
class-intervals is 10 and mode lies in, say, 30-40 group. If this series is
regrouped in class-intervals having magnitude of only 5, it is quite likely
that the mode may lie in, say, 45-50 group. It would depend on the
distribution of items in various class-intervals. For determining mode
in a continuous series, the class-intervals should not be very big in size,
but if the size of the class-intervals is very small the frequencies also
become very small, the distribution becomes irregular and the
determination of mode becomes very difficult. The series may even become
multi-modal.
It has already been said that the mode is affected by the frequencies
of the neighbouring classes. The formulae for the interpolation of
mode are based on this very assumption. If the frequency of the
class or by deducting $\frac{f_0}{f_0 + f_2} \times (l_2 - l_1)$ from the upper limit of the
modal class. Thus if Z stands for the mode,
The two sets of formulae given above would give different values
of mode as they are based on different assumptions. In the first case
we take into account only the frequencies of the preceding and succeeding
classes whereas in the second case (i) the difference of the modal
frequency and the preceding frequency, and (ii) the difference of the
modal frequency and the succeeding frequency, are taken into account.
The second set of formulae are supposed to be better than the
first set and usually mode is interpolated by starting with the lower
limit. As such we shall be making use of the following formula in
the determination of mode in a continuous series.
$$Z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(l_2 - l_1)$$
Example 16. The following table gives the length of life of 150
electric lamps :-
Life (hours)        Frequency of lamps
0 to 400                    4
400 to 800                 12
800 to 1200                40
1200 to 1600               41
1600 to 2000               27
2000 to 2400               13
2400 to 2800                9
2800 to 3200                4
Calculate the mode.
Solution. Determination of mode by grouping
Life (hours)                  Frequency of lamps
                (1)     (2)     (3)     (4)     (5)     (6)
0-400             4      16              56
400-800          12              52              93
800-1200         40      81                             108
1200-1600        41              68      81
1600-2000        27      40                      49
2000-2400        13              22                      26
2400-2800         9      13
2800-3200         4
Each grouped total is written against the first class of its group.
$$Z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(l_2 - l_1)$$
Where $Z$ stands for mode, $l_1$ and $l_2$ stand for the lower and upper
limits of the modal group, $f_1$ stands for frequency of the modal group,
$f_0$ stands for frequency of the group preceding the modal group, $f_2$ stands
for frequency in the group succeeding the modal group.
$$\text{We get, } Z = 1200 + \frac{41 - 40}{82 - 40 - 27} \times 400 = 1226.67 \text{ hours}$$
Thus
The modal life of the lamp = 1226.67 hours.
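The interpolation of Example 16 may be checked with a short Python sketch. The function name `interpolated_mode` is hypothetical; the arguments follow the symbols of the formula above.

```python
def interpolated_mode(l1, l2, f0, f1, f2):
    """Mode interpolated within the modal class.

    l1, l2: lower and upper limits of the modal class
    f1:     frequency of the modal class
    f0, f2: frequencies of the preceding and succeeding classes
    """
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * (l2 - l1)

# Modal class 1200-1600 of Example 16.
z = interpolated_mode(1200, 1600, f0=40, f1=41, f2=27)
print(round(z, 2))
```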
Determination of mode by curve fitting
As has been said earlier, the above methods of the calculation
of mode are unsatisfactory. In most of the distributions, as they arise
in actual practice, these methods would not give satisfactory results.
The ideal method of calculating the mode is that of curve fitting. Since
there are many irregularities in the data which we normally come across,
it is necessary to remove them before determination of mode. These
irregularities are removed by the technique of curve fitting. Attempts
are made to fit an ideal curve which gives the closest possible fit to the
actual distribution. The value of the variable corresponding to the
maximum of this ideal curve is the value of the mode. The technique
of curve fitting is highly mathematical and should be left to the more
advanced students of this subject.
Determination of mode from mean and median
In a symmetrical distribution the mean, median and mode are
identical. We shall discuss in the next chapter the concept of a symmetrical
distribution which gives a normal curve. In actual practice,
however, symmetrical distributions are very rare, and data usually give
an asymmetrical curve. In distributions which moderately differ from
Drawbacks of mode
Mode is an unsatisfactory average and has many drawbacks.
Some of them are as follows:
(i) Mode is ill-defined, indeterminate and indefinite. The very
first condition laid down for an ideal average, that it should be rigidly
defined, is not fulfilled by it.
(ii) Mode is not based on all the observations of a series and as
such the second condition is also not fulfilled by it.
(iii) Mode is not capable of further mathematical treatment.
(iv) Mode may be unrepresentative in many cases. If in a series
of 1000 items 20 have a particular value and other values have frequencies
less than 20, it does not necessarily mean that the value whose frequency
is 20 is the typical or average value. In such cases data should be
converted into class intervals of a bigger magnitude.
(v) In many cases it may be impossible to set a definite value of
mode. There may be 2, 3 or more modal values.
Comparison of mode with mean and median
From the above discussion about the merits and drawbacks of
mean, median and mode it is obvious that mode does not stand in
comparison either to mean or median. Mode no doubt possesses the
merit of being the most popular item of a series and has also the
advantage of easy calculation and common understandability, yet its
drawbacks are too many to be set off against these merits. Mean is
simple in calculation, its value is definite and can be easily determined.
It is amenable to algebraic treatment and is usually not affected much
by fluctuations of sampling. Median is more easily calculated than
even mean, and in certain cases it is as stable as mean, but if variations
in the values of items are not uniform, median is indeterminate, and is
almost incapable of algebraic treatment. Mode is hardly suitable for
most of the elementary studies as it is correctly determined only by
curve-fitting which is an extremely difficult process. It is unrepresentative
in many cases, and is not based on all the observations of a series.
Thus, of these three averages, mean has definite advantages over median
and mode, though there may be some cases where median or mode
may have preference over mean. Mode has its own importance and
it may be the reason for giving its value along with mean but it should
be clearly understood that mode cannot replace mean and for that
matter neither can median do so. However, it should not be taken
to mean that median and mode are superficial averages and have no
independent virtues. There are certain fields in which median or
mode may give better result than the mean, but such cases are few
and the universality of mean cannot be challenged on account of these
cases. We shall discuss more about this point in a later section after
we have examined the other averages also.
GEOMETRIC MEAN
Geometric mean is the nth root of the product of n items of a series.
Thus if the geometric mean of 3. 6 and P Ie. ~o be calculated it would
be equal to the cube root of the product of these figures. Similarly
the geometric mean of 8, 9, 12 and 16 would be the 4th root of the
product of these four figures.
Symbolically $g = \sqrt[n]{m_1 \times m_2 \times m_3 \times \dots \times m_n}$
where $g$ stands for the geometric mean, $n$ for the number of items and
$m$ for the values of the variable.
The calculation of the geometric mean by this process is possible
only if the number of items is very few. If the number of items is
large and their size is big, this method is more or less out of question.
In such cases calculations have to be done with the help of logs. In
terms of logs,
$$\log g = \frac{\log m_1 + \log m_2 + \log m_3 + \dots + \log m_n}{n}$$
or
$$g = \text{Anti-log}\left\{\frac{\log m_1 + \log m_2 + \log m_3 + \dots + \log m_n}{n}\right\}$$
or
$$\text{Geometric Mean} = \text{Anti-log}\left[\text{assumed log} + \frac{\sum \text{Deviations}}{n}\right]$$
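The logarithmic method can be sketched in Python, where `math.log10` takes the place of the log tables. The function name `geometric_mean` is hypothetical, and the sketch assumes that every item is positive, since the logarithm of zero or of a negative figure is not defined.

```python
import math

def geometric_mean(values):
    """Geometric mean as the anti-log of the average of the logs."""
    if any(v <= 0 for v in values):
        raise ValueError("all items must be positive")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log                    # anti-log to base 10

print(round(geometric_mean([3, 6, 8, 9]), 4))   # fourth root of 1296
print(round(geometric_mean([2, 4, 8]), 4))      # cube root of 64
```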
$$g = \text{Anti-log}\,\frac{2.1208}{8} = \text{Anti-log } .265 = 1.841$$
$$\text{Series B.}\quad g = \text{Anti-log}\,\frac{\overline{16} + 6.3512}{8} = \text{Anti-log } \bar{2}.7938 = .06220$$
Algebraic properties of geometric mean
Geometric mean possesses certain mathematical properties and they
are as follows :-
(i) Just as in case of arithmetic average the sum of the items
remains unchanged if each item is replaced by the arithmetic average,
similarly in case of geometric mean the product of the items remains
unchanged if each item is replaced by the geometric mean. Thus the
total of 2, 4 and 8 is 14 and the arithmetic average is 4 2/3. If in place
of these figures we substitute the arithmetic average the total would
still remain 14. Similarly in case of geometric mean the product of
these three figures 2, 4 and 8 is 64 and the geometric mean is 4. If in
place of these numbers the geometric mean is written the product would
still remain 64.
(ii) On account of the above property of the geometric mean, it
is possible to calculate the combined geometric mean of two or more
series if only their geometric means and the number of items are known.
$$= \text{anti-log}\left[\frac{10.8681}{5}\right] = \text{anti-log } 2.1736 = 149$$
If we calculate geometric mean of the five items together we shall
get this very figure. It can be verified from the answer of Example No.
17 in which the geometric mean of these five items has been calculated.
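Property (ii) can be illustrated with a small Python sketch. The two series used here are purely illustrative figures, not the data of Example 17, and the function names are hypothetical. The combined geometric mean is obtained by weighting the log of each geometric mean by its number of items, and it is then compared with the geometric mean of all the items taken together.

```python
import math

def geometric_mean(values):
    """Geometric mean via natural logs (all items assumed positive)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def combined_geometric_mean(means, counts):
    """Combine geometric means of several series, given their item counts."""
    weighted_logs = sum(n * math.log(g) for g, n in zip(means, counts))
    return math.exp(weighted_logs / sum(counts))

series_a = [2, 4, 8]          # illustrative figures only
series_b = [3, 6, 8, 9]

combined = combined_geometric_mean(
    [geometric_mean(series_a), geometric_mean(series_b)],
    [len(series_a), len(series_b)],
)
direct = geometric_mean(series_a + series_b)
print(round(combined, 4), round(direct, 4))
```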
(iii) Just as in the case of arithmetic average, sum of the deviations
from the mean on either side is always equal, similarly in case of
geometric mean the product of the corresponding ratios on either side
is always equal. If the ratios of the geometric mean to the figures
which are equal to or less than it are multiplied together, this product
would be equal to the product of the ratios of figures more than the
geometric mean.
Thus the geometric mean of 3, 6, 8 and 9 is equal to 6. The
product of the ratios of items equal to it or less than it would be
equal to the product of the ratios of items more than it.
Thus
$$\frac{g}{3} \times \frac{g}{6} = \frac{8}{g} \times \frac{9}{g}$$
or
$$\frac{6}{3} \times \frac{6}{6} = \frac{8}{6} \times \frac{9}{6}$$
This property of the geometric mean is very important. It
indicates that geometric mean measures relative changes. If the price
$$r = \sqrt[n]{\frac{P_n}{P_0}} - 1$$
Thus if Rs. 1,000 at compound interest become Rs. 1,500 at the end
of 10 years there has been an increase of 50% and the simple rate of
interest is 5%. The compound rate would be
$$
\begin{aligned}
r &= \sqrt[10]{\frac{1500}{1000}} - 1 \\
  &= \sqrt[10]{1.5} - 1 = 1.041 - 1 \\
  &= .041 \text{ or } 4.1\%
\end{aligned}
$$
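The compound rate worked out above can be verified with a brief Python sketch. The function name `compound_rate` is hypothetical; its arguments correspond to $P_0$, $P_n$ and $n$ of the formula.

```python
def compound_rate(p0, pn, n):
    """Average rate per period at which p0 grows to pn over n periods."""
    return (pn / p0) ** (1 / n) - 1

r = compound_rate(1000, 1500, 10)
simple_rate = (1500 - 1000) / 1000 / 10      # 50% spread evenly over 10 years
print(round(r * 100, 1), round(simple_rate * 100, 1))
```

The compound rate comes out lower than the simple rate of 5% because each year's interest itself earns interest in later years.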
Whenever we have to find out the average of the rates of increase
or decrease, such problems arise. If we calculate the mean of the
rates of increase or decrease the study would be inaccurate as mean
measures absolute changes but if the geometric mean of the rates of
increase or decrease is calculated the results would be accurate, as
geometric mean measures relative changes.
Merits of geometric mean
Besides the above-mentioned mathematical properties the geometric
mean has many other merits. We shall now examine the worth of
this average by finding out how many conditions of an ideal average
(laid down earlier) it satisfies.
(i) The geometric mean is rigidly defined and its value is a precise
figure.
(ii) It is based on all the observations of a series. Like arithmetic
average it cannot be calculated, if even a single value of a series
is missing.
(iii) It is capable of further algebraic treatment. As we have
seen above, various types of mathematical relationships can be established
between data when a relative study is being made with the help of
geometric mean.
(iv) Geometric mean is not much affected by the fluctuations of
sampling. It gives comparatively more weight to smaller items. In
this respect it is better than the arithmetic average and a single big figure
does not push its value very much.
Thus out of five conditions laid down for an ideal average
geometric mean satisfies four.
1,450 .00069
7,200 .00014
120 .00833
1060 .00094
150 .00667
480 .00208
360 .00278
96 .01042
200 .00500
520 .00192
60 .01667
-~·m~048
Solution :
Calculation of quadratic mean
Size of items      Square of the size
    (m)                  (m²)
     10                   100
     30                   900
     40                  1600
     50                  2500
     70                  4900
    n=5                 10000
$$Q_m = \sqrt{\frac{10000}{5}} = 44.72$$
The arithmetic average of the series would have been 40. Quadratic
mean is seldom used as an average except in case of finding out
the average of the positive and the negative deviations from a measure
of central tendency. In that case it is known as standard deviation. We
shall discuss it in the next chapter.
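The calculation of the quadratic mean, and its comparison with the arithmetic average of the same series, can be sketched in Python. The function name `quadratic_mean` is hypothetical.

```python
import math

def quadratic_mean(values):
    """Square root of the average of the squared items."""
    return math.sqrt(sum(v * v for v in values) / len(values))

items = [10, 30, 40, 50, 70]
qm = quadratic_mean(items)
am = sum(items) / len(items)
print(round(qm, 2), am)
```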
Moving average
Moving average is calculated by using the technique of simple
arithmetic average. It is useful in removing the irregularity of time
series and is usually calculated to study the long period trend. The first
thing to be decided in the calculation of moving average is the "period"
for which the average is to be calculated. The moving average may
be three-yearly, five-yearly or seven-yearly depending on the nature of
the series. We shall discuss this problem of periodicity of moving
average later in the chapter on Analysis of Time Series. For the present
we shall simply illustrate the technique of its calculation.
If a three yearly moving average is to be calculated the arithmetic
average of the first three years' figures would be found out and written
against the middle year (second year in this case). Then the first year's
figure would be dropped and the arithmetic average of second, third
and fourth years' figures would be calculated and written against the
third year. Similarly the arithmetic average of the figures of third,
fourth and fifth years would be written against the fourth year and so
on. The following example would illustrate the method of its
calculation.
Example 22. Calculate the three yearly moving average of the
following figures relating to the annual sales of a concern (in lakhs of
rupees).
Calculation of three yearly moving average
Year     Sales (in lakhs     3-Yearly moving     3-Yearly moving
         of rupees)          total               average
1945          8                  ...                 ...
1946          9                  25                  8.3
1947          8                  24                  8.0
1948          7                  23                  7.7
1949          8                  24                  8.0
1950          9                  27                  9.0
1951         10                  30                 10.0
1952         11                  32                 10.7
1953         11                  34                 11.3
1954         12                  33                 11.0
1955         10                  ...                 ...
Similarly, if a five yearly moving average has to be calculated
the first five figures (of years 1945 to 1949) would be added and their
average would be written against the third year or 1947, then the next
five figures leaving the first (of years 1946 to 1950) would be averaged
and the figures written against the middle year of 1948 and so on.
Moving average is very helpful in removing the fluctuations of
time series and giving an idea about the general trend.
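The technique may be sketched in Python as follows. The function name `moving_average` is hypothetical, and the sketch assumes an odd period so that each average can be written against a definite middle year; years without a complete window are left as `None`, corresponding to the blank entries in the table.

```python
def moving_average(values, period):
    """Centred moving average; positions without a full window are None."""
    if period % 2 == 0:
        raise ValueError("this sketch assumes an odd period")
    half = period // 2
    result = [None] * len(values)
    for mid in range(half, len(values) - half):
        window = values[mid - half: mid + half + 1]
        result[mid] = sum(window) / period   # written against the middle year
    return result

sales = [8, 9, 8, 7, 8, 9, 10, 11, 11, 12, 10]     # 1945 to 1955, Example 22
three_yearly = moving_average(sales, 3)
five_yearly = moving_average(sales, 5)
print([None if a is None else round(a, 1) for a in three_yearly])
```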
Progressive average
It is also calculated by the help of simple arithmetic average. It
is a cumulative average and is different from the moving average. In
the calculation of this average, figures of all previous years are added
and no figure is left out as in the case of moving average. Thus the
progressive average of the second year would be equal to the arithmetic
average of the figures of the first two years; the progressive average
of the third year would be equal to the arithmetic average of the figures
of the first three years and so on.
The following illustration would clarify the procedure :
Example 23. Calculate the progressive average of the data given
in Example 22 :-
Calculation of progressive average
Years     Sale (in lakhs     Progressive     Progressive
          of rupees)         total           average
1945           8                  8              8.0
1946           9                 17              8.5
1947           8                 25              8.3
1948           7                 32              8.0
1949           8                 40              8.0
1950           9                 49              8.2
1951          10                 59              8.4
1952          11                 70              8.8
1953          11                 81              9.0
1954          12                 93              9.3
1955          10                103              9.4
Progressive average is used by business-houses particularly in early
years with a view to compare the current profits with those of
the past.
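A short Python sketch of the progressive average follows. The function name `progressive_average` is hypothetical; the progressive totals are built with `itertools.accumulate` and each is divided by the number of years included so far.

```python
from itertools import accumulate

def progressive_average(values):
    """Return (progressive totals, progressive averages) of a series."""
    totals = list(accumulate(values))
    averages = [total / (i + 1) for i, total in enumerate(totals)]
    return totals, averages

sales = [8, 9, 8, 7, 8, 9, 10, 11, 11, 12, 10]     # 1945 to 1955, Example 22
totals, averages = progressive_average(sales)
print(totals)
```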
Relation between different averages
When different averages have been calculated from a given set of
observations it will be found that there is a relationship between their
values. Generally these relationships are of the following type :-
(i) If a series is "normal" or "symmetrical" the values of its mean,
median and mode would be identical.
there are wide variations in any series median is the most unsuitable
average. Similarly, if the enquiry under question relates to, say,
"average size of ready-made clothes" or "size of typical farms," the
average to be used is mode. The use of mode is every day increasing
in business and commerce. Modal output per machine or modal time
needed to produce a commodity are very important concepts in the
business world of today. But mode is very often indeterminate and
unrepresentative and is entirely unsuitable for many enquiries. It is
not capable of further algebraic treatment and has limited use. If an
enquiry is being conducted to study the relative changes in the price
level at two periods, neither arithmetic average nor median or mode
would give satisfactory results. In such cases the best average is the
geometric mean. In the construction of index numbers the use of
geometric mean is almost universal. But geometric mean is entirely
useless if bigger items have to be given more weight or if a study of
absolute, rather than relative changes, is undertaken. Harmonic mean
similarly is the best average if small items have to be given more weights
or if we have to find out the average of certain types of rates, etc. If,
for example, we have to calculate the average speed of a person who
walks four miles per hour for the first mile and three miles an hour
for the second mile, arithmetic average would give inaccurate results.
Harmonic mean of these figures, which would be 3 3/7, is the correct
average. This person takes fifteen minutes to cover the first mile and
twenty to cover the second mile or in thirty-five minutes he covers
two miles. The speed is 3 3/7 miles per hour.
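The walking example can be checked exactly in Python with the standard `fractions` module. The function name `harmonic_mean` is hypothetical, and the sketch assumes whole-number speeds so that exact fractions can be used.

```python
from fractions import Fraction

def harmonic_mean(values):
    """Number of items divided by the sum of their reciprocals."""
    return Fraction(len(values)) / sum(Fraction(1, v) for v in values)

speeds = [4, 3]                              # miles per hour over one mile each
average_speed = harmonic_mean(speeds)

# Cross-check from first principles: 2 miles in (1/4 + 1/3) hours.
hours = Fraction(1, 4) + Fraction(1, 3)      # 35 minutes
check = Fraction(2) / hours

arithmetic = Fraction(sum(speeds), len(speeds))
print(average_speed, check, arithmetic)
```

The arithmetic average of 3 1/2 miles per hour overstates the speed, because more time is spent walking at the slower rate.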
The above discussion clearly shows that each type of average has
its own field of importance and usefulness. Before selecting an average
all these considerations should be kept in mind. In actual practice
two or three averages of a series may be necessary for a proper
understanding of its special features. A discriminate use of averages is
essential for sound statistical analysis. But all said and done, it has to be
admitted that arithmetic average would be found to be the ideal average
for a larger number of enquiries than any other average.
Limitations of averages
Even when an average has been selected very judiciously and is
ideal for a particular investigation, it should never be forgotten that
even the best average has its own limitations. An average is a single
figure representing a series, and no single figure can condense in itself
all the properties of the items which it represents. This is the reason
why conclusions which are drawn on the basis of a study of averages
are not always infallible. The average height of women may be less
than the average height of men but it does not mean that no woman
can be taller than a man. The well-known example of the mathematician
who calculated the average depth of a stream and, finding it lower
than the average height of his family members, attempted to cross it and
drowned with his family in the process, is an illustration on this point.
The average depth of the river may have been lower than the height
of the shortest member of the mathematician's family, but at some
point the depth of the stream must have been more than the height of
the tallest member in the group.
Average is a single figure and can be expected to represent a series
only as best as a single figure can. Averages do not throw light on the
formation of a series or distribution of frequencies round the various
values of a variable. It is for this reason that measures of dispersion
and skewness are calculated. Averages do not reveal the whole story of
a series. A student getting 30, 40 and 50 marks respectively in three
examinations would have the same average as another who gets 50, 40
and 30 marks respectively. The progress of the two students is in
different directions but on the basis of the averages they will be ranked
together.
In fact if wrong conclusions are drawn by the use of judiciously
selected averages, it is not the fault of the averages. The fault lies
with the person drawing the conclusions. The inherent limitations of
averages should always be kept in mind and they should not be expected
to reveal more than what they can.
WEIGHTED AVERAGE
Need and meaning. In the calculation of simple average each
item of the series is considered equally important but there may be
cases where all items may not have equal importance, and some of
them may be comparatively more important than others. The fundamental
purpose of finding out an average is that it shall "fairly" represent,
so far as a single figure can, the central tendency of the many
varying figures from which it has been calculated. This being so,
it is necessary that if some items of a series are more important than
others, this fact should not be overlooked altogether in the calculation
of an average. If we have to find out the average income of the
employees of a certain mill and if we simply add the figures of the
income of the manager, an accountant, a clerk, a labourer and a watchman
and divide the total by five, the average so obtained cannot be
a fair representative of the income of these people. The reason is that
in a mill there may be one manager, two accountants, six clerks, one
thousand labourers and one dozen watchmen, and if it is so, the relative
importance of the figures of their income is not the same. Similarly
if we are finding out the change in the cost of living of a certain group
of people and if we merely find the simple arithmetic average of the
prices of the commodities consumed by them, the average would be
unrepresentative. All the items of consumption are not equally important.
The price of salt may increase by 500% but this will not affect
the cost of living to the extent to which it would be affected if the
price of wheat goes up only by 50%. In such cases if an average has
to maintain its representative character, it should take into account
the relative importance of the different items from which it is being
calculated. The simple average gives equal importance to all the
items of a series. In this sense a simple average is also a weighted
average, because in a simple average the relative importance of all the
items is supposed to be the same. But in actual practice the importance
of various items is not always the same and in such cases the
simple arithmetic average and the weighted arithmetic average would
differ in value. Therefore, in order that an average may be a typical or
a representative average, it is necessary that the relative importance
of items is taken into account in its calculation. Thus if item A is
considered to be five times as important as item B, the weights of these
items respectively should be 5 and 1. Weights are figures which indicate
the relative importance of various items.
Difficulties in weighting. It is easy to say that in many cases it is
better to take into account the relative importance of items and to have
a weighted average, rather than simple average, but it is very difficult
to decide the relative importance of different items. If we have to
decide the relative importance of items, the problem that would arise
would be about the basis or criteria of determining the relative importance.
How should weights be assigned is a question very difficult to
answer. In fact no hard and fast rule can be laid down for the assignment
of weights, as the relative importance of items depends on the
nature and purpose of the investigation. In some cases the weights
are determined without much difficulty, and such cases are those where
weights are determined on the basis of some evidences associated with
given data. If we have to decide the weights of the income figures of
a manager, an accountant, a clerk, a labourer and a watchman, the
simplest method would be to give them weights in accordance with
their number. Thus if there is one manager, two accountants, six
clerks, one thousand labourers and twelve watchmen, the weights
would also be these very figures respectively. To calculate the average
income of these people if instead of finding out the simple arithmetic
average of the figures of their incomes, we multiply their incomes by
their numbers (weights), and if the total of these products is divided
by the total of weights, we shall get the weighted arithmetic average
of the series. This average would be a better representative of the
series than the simple arithmetic average. Many writers like Secrist
and Kelly are of opinion, and rightly too, that this is not a weighted
average. When values are multiplied by their frequencies and the
sum of their products is divided by the total of their frequencies, it
is in fact a simple arithmetic average of the series. In cases of discrete
and continuous series we have already seen that arithmetic average
is calculated by multiplying the values by their respective frequencies.
Such writers are of the opinion that weights should be determined
by some such evidence, which is not associated with the items themselves.
But it is neither easy nor safe to associate weights to various
items arbitrarily, as in such cases weighted average may give misleading
conclusions. Weights have to be judiciously selected.
In fact difficulties in the selection of proper weights are so many
that many writers are of opinion that it is better to have simple average
than to have weighted average of doubtful fairness. Thus Bowley says:
where $\sum mw$ stands for the sum of the products of the values and their
respective weights, and $\sum w$ for the sum of the weights. The following
illustration would clarify the formula :-
Calculation of the weighted arithmetic average : direct method
Example 24. Calculate the weighted arithmetic average of the
price of tea, from the following data assuming the quantities sold as
weights:
Price per pound     Quantities sold
(Rs.)               (pounds)
2.25                14
2.50                11
2.75                9
3.00                6
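The direct method described above can be sketched in a few lines of Python. This is only an illustrative check on the data of Example 24, not the book's worked solution; the second price is read as 2.50 (an assumption, since the digit is damaged in the source), and the function name `weighted_mean` is introduced here purely for illustration.

```python
# Example 24 data: price per pound (Rs.) and quantity sold (pounds).
# The second price is read as 2.50 (an assumption; the source digit is damaged).
prices = [2.25, 2.50, 2.75, 3.00]
quantities = [14, 11, 9, 6]


def weighted_mean(values, weights):
    """Return sum(m * w) / sum(w), the weighted arithmetic average."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(m * w for m, w in zip(values, weights)) / total_weight


weighted = weighted_mean(prices, quantities)  # 101.75 / 40
simple = sum(prices) / len(prices)            # 10.50 / 4

print(f"weighted average price: Rs. {weighted:.4f}")
print(f"simple average price:   Rs. {simple:.4f}")
```

Because the larger quantities were sold at the lower prices, the weighted figure comes out below the simple one.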
152 FUNDAMENTALS OF STATISTICS
error in the size of items. The reason for it is that the errors in weights
are usually unbiased and compensate each other while errors in the
values of items are generally biased ones. It is for this reason that
we had concluded above that attempt should be made to make the
items free from bias and we should not strain after exactness in
weights. According to King, "The items should be as exact as pos-
sible and the weights used should be approximately accurate ...... ".
Short-cut method of calculating weighted arithmetic average
The method discussed above for the calculation of the weighted
arithmetic average is sometimes found to be very tedious, particularly
when the size of the items is big. In such cases a short-cut method can
be used. In this method, first an average is assumed and the deviations
of each item from the assumed average are multiplied by the respective
weights of the items. The sum of these products is then divided by
the total of the weights and added to the assumed average. The resulting
figure is the actual weighted arithmetic average of the series.
Symbolically,
$$a' = x' + \frac{\sum d'w}{\sum w}$$
where $a'$ stands for the weighted arithmetic average, $x'$ for the
assumed average, $\sum d'w$ for the sum of the products of the deviations
and the respective weights of items, and $\sum w$ for the total of the weights.
The following example would illustrate the formula :-
Example 25
From the following table calculate the weighted average price of tea.
Price per lb.       Lbs. sold
Rs.  p.
1    00             200
1    35             275
1    62             400
1    75             150
2    00             100
2    25             75
2    50             50
Solution. Calculation of the weighted average price of a lb. of tea
Price in         Lbs. sold     Deviations from        Total
paisa per lb.                  assumed weighted       deviations
                               average (175)
(m)              (w)           (d')                   (d'w)
100              200           -75                    -15,000
135              275           -40                    -11,000
162              400           -13                    - 5,200
175              150             0                          0
200              100           +25                    + 2,500
225               75           +50                    + 3,750
250               50           +75                    + 3,750
Σm = 1,247       Σw = 1,250                           Σd'w = -21,200
$$a' = x' + \frac{\sum d'w}{\sum w}$$
where $a'$ stands for the weighted average, $x'$ for the assumed weighted
average, $w$ for the weight and $d'$ for the deviation from the assumed weighted
average.
We get $a' = 175 + \frac{-21{,}200}{1{,}250} = 175 - 16.96 = 158.04$ paisa
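As a cross-check on the short-cut method, the sketch below computes the weighted average of Example 25 both directly and through deviations from the assumed average of 175. The function names are illustrative only. Multiplying out the tabulated figures gives Σd'w = −21,200 (the product 40 × 275 being 11,000), so both routes return 158.04 paisa.

```python
# Example 25 data: price in paisa per lb. (m) and lbs. sold (w).
prices = [100, 135, 162, 175, 200, 225, 250]
pounds_sold = [200, 275, 400, 150, 100, 75, 50]
ASSUMED_AVERAGE = 175  # x', the assumed weighted average used in the solution


def direct_weighted_mean(values, weights):
    """Direct method: sum(m * w) / sum(w)."""
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)


def shortcut_weighted_mean(values, weights, assumed):
    """Short-cut method: x' + sum(d' * w) / sum(w), where d' = m - x'."""
    total_dw = sum((m - assumed) * w for m, w in zip(values, weights))
    return assumed + total_dw / sum(weights)


direct = direct_weighted_mean(prices, pounds_sold)
shortcut = shortcut_weighted_mean(prices, pounds_sold, ASSUMED_AVERAGE)

print(f"direct method:    {direct:.2f} paisa")
print(f"short-cut method: {shortcut:.2f} paisa")
```

The two methods agree whatever value is chosen as the assumed average; a value near the middle of the series merely keeps the deviations small.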
(a) When the importance of all the items in a series is not equal. We
have seen that simple arithmetic average gives equal importance to all
the items of a series. In many cases all the items may not be of equal
importance. If it is so, a simple arithmetic average would give us
misleading conclusions. The following example would clarify the
point : -
Subject       Weight   Marks   Marks   Marks   Weighted   Weighted   Weighted
                         A       B       C     marks A    marks B    marks C
Statistics      4       63      60      65       252        240        260
Mathematics     3       65      64      70       195        192        210
Economics       2       58      56      63       116        112        126
Hindi           1       70      80      52        70         80         52
Total          10      256     260     250       633        624        648
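The effect of the weights in the table above can be seen by computing both averages for each candidate. The sketch below assumes that the last three columns of the table are the marks multiplied by the subject weights; the dictionary layout and function names are illustrative only.

```python
# Marks of candidates A, B and C, with the subject weights from the table.
weights = {"Statistics": 4, "Mathematics": 3, "Economics": 2, "Hindi": 1}
marks = {
    "A": {"Statistics": 63, "Mathematics": 65, "Economics": 58, "Hindi": 70},
    "B": {"Statistics": 60, "Mathematics": 64, "Economics": 56, "Hindi": 80},
    "C": {"Statistics": 65, "Mathematics": 70, "Economics": 63, "Hindi": 52},
}


def simple_average(scores):
    """Every subject counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, subject_weights):
    """Each subject counts in proportion to its weight."""
    total = sum(scores[s] * subject_weights[s] for s in scores)
    return total / sum(subject_weights[s] for s in scores)


simple = {c: simple_average(s) for c, s in marks.items()}
weighted = {c: weighted_average(s, weights) for c, s in marks.items()}
best_simple = max(simple, key=simple.get)
best_weighted = max(weighted, key=weighted.get)

for candidate in marks:
    print(candidate, f"simple={simple[candidate]:.1f}",
          f"weighted={weighted[candidate]:.1f}")
print("highest simple average:  ", best_simple)
print("highest weighted average:", best_weighted)
```

On the simple average candidate B stands first, but once the subjects are weighted candidate C does, which is the point the illustration makes.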
(d) When ratios, percentages or rates are being averaged. Suppose the
heights of four groups of persons are measured and it is found that
5% of the persons in group A, 10% in group B, 8% in group C, and
4% in group D have heights less than 50" and it is required to find
out the percentage of people in all the groups combined together
whose heights would be less than 50". A simple arithmetic average of
these percentages would give a misleading conclusion. The reason is
that it takes no account of the number of persons in each group. In such
cases the number of persons in each group should be ascertained (or
assumed), and on that basis the weighted arithmetic average calculated,
which gives the correct result. If, suppose, the numbers of persons in these
groups were respectively 50, 70, 75 and 55, the weighted arithmetic average
can be calculated by taking these frequencies as weights of the various
percentages.
The percentage of people with heights less than 50" (in all
the groups combined together) would be :-
$$\frac{(5 \times 50)+(10 \times 70)+(8 \times 75)+(4 \times 55)}{50+70+75+55} = \frac{250+700+600+220}{250} = \frac{1770}{250} = 7.08\%$$
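The same calculation can be checked mechanically. The following sketch uses the group sizes and percentages given above; the variable names are illustrative only.

```python
# Percentage of persons under 50" in each group, and the size of each group.
percent_below = {"A": 5, "B": 10, "C": 8, "D": 4}
group_sizes = {"A": 50, "B": 70, "C": 75, "D": 55}

# Simple average of the four percentages (ignores group sizes).
simple = sum(percent_below.values()) / len(percent_below)

# Weighted average, using the number of persons in each group as weights.
weighted_total = sum(percent_below[g] * group_sizes[g] for g in group_sizes)
combined = weighted_total / sum(group_sizes.values())

print(f"simple average of percentages: {simple:.2f}%")
print(f"combined (weighted) percentage: {combined:.2f}%")
```

The simple average (6.75%) understates the combined figure (7.08%) because the groups with the higher percentages happen to be the larger ones.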
(e) When it is desired to calculate the average of a series from the averages
of its component parts. We have already discussed in the section on
simple arithmetic average how the means of two or more component
series can be combined into one. The method involves the calculation
of the weighted arithmetic average of the different means, using the number
of items in each case as the weights. Thus, if the average of a series
is 20 and the number of items in it is 10, and the average of another series
is 25 and the number of items in it is 15, the combined average of the
two series would be equal to the weighted average of these two averages,
the weights being 10 and 15 respectively (the number of items in each
case). The weighted arithmetic average would be :-
$$\frac{(20 \times 10)+(25 \times 15)}{10+15} = \frac{575}{25} = 23$$
The simple arithmetic average of the two averages would be
$\frac{20+25}{2}$ or 22.5. This is an inaccurate average, as, if it is multiplied
by the total frequency (now 25), it would not give the correct aggregate.
If, however, we multiply the weighted arithmetic average, or 23,
by the total frequency, or 25, the product would be 575, which is the
total of the aggregates of the two series (200+375).
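The aggregate test described above is easy to verify. The sketch below uses the two means and item counts from the text; the function name `combined_mean` is illustrative only.

```python
def combined_mean(means, counts):
    """Weighted average of component means, weighted by the number of items."""
    aggregate = sum(m * n for m, n in zip(means, counts))
    return aggregate / sum(counts)


means = [20, 25]   # averages of the two component series
counts = [10, 15]  # number of items in each series

combined = combined_mean(means, counts)
simple = sum(means) / len(means)
total_items = sum(counts)

print("combined mean:", combined)
print("simple mean of the two averages:", simple)
print("combined mean x total items:", combined * total_items)
print("simple mean x total items:  ", simple * total_items)
```

Only the weighted figure reproduces the true aggregate of 575 when multiplied back by the 25 items.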
Discriminate weighting. We have seen that in many cases the simple
arithmetic average and weighted arithmetic average differ considerably,
and the question that arises is, which of the two averages should be used
in such cases to represent the series? For this, it is necessary to study
the weights of the items in relation to their size. Sometimes it would
be found that big items in a series are associated with big weights and
small items with small weights. In such cases the weighted arithmetic
average would be more than the simple arithmetic average. Thus the
simple arithmetic average of natural numbers 1, 2, 3, 4, 5, 6, 7, 8, 9,
and 10 is 5.5, and if these numbers are associated with weights whose
respective values are 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 the weighted arithmetic
average would be 7.0.
If, on the other hand, big items are associated with small weights
and small items with big weights, the weighted arithmetic average
would be less than the simple arithmetic average. If the weights in
the above case were respectively 10, 9, 8, 7, 6, 5, 4, 3, 2, and 1 the
weighted arithmetic average would be 4.0 whereas the simple arith-
metic average is 5.5.
Chance weighting. If weights are indiscriminately associated with
values or, in other words, if big items are associated with both big and
small weights and similarly small items with both small and big weights,
the weighted average and the simple average would not materially differ.
Thus if for the values of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 the weights
were respectively 10, 3, 6, 4, 5, 8, 2, 1, 9, and 7 the weighted arithmetic
average would be 5.4 and the simple arithmetic average is 5.5.
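The three situations described under discriminate and chance weighting can be reproduced with the figures given in the text. This is a minimal sketch; the names `rising`, `falling` and `chance` are labels chosen here for the three sets of weights.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weight)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = list(range(1, 11))                # 1, 2, ..., 10
rising = list(range(1, 11))                # big items carry big weights
falling = list(range(10, 0, -1))           # big items carry small weights
chance = [10, 3, 6, 4, 5, 8, 2, 1, 9, 7]   # weights unrelated to size

simple = sum(values) / len(values)
with_rising = weighted_mean(values, rising)
with_falling = weighted_mean(values, falling)
with_chance = weighted_mean(values, chance)

print(f"simple average:               {simple:.1f}")
print(f"big items with big weights:   {with_rising:.1f}")
print(f"big items with small weights: {with_falling:.1f}")
print(f"chance weighting:             {with_chance:.1f}")
```

Only when the weights move systematically with the size of the items does the weighted average depart much from 5.5.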
$$\begin{aligned}
&= \text{anti-log}\left(\frac{42.9129}{20}\right) \\
&= \text{anti-log}\ 2.1456 \\
&= 140 \text{ (to the nearest whole number)}
\end{aligned}$$
The method discussed above is the direct method. We have
seen in the calculation of simple geometric mean that a short-cut method
can be used by assuming a geometric mean. The weighted geometric
mean can also be calculated by the short-cut method. The deviations
of the logs. from the log. of the assumed geometric mean are multiplied
by their respective weights, and the sum of the products is divided
by the total of the weights. The resulting figure is added to the log.
of the assumed geometric mean. The anti-log. of this figure would
give the actual weighted geometric mean of the series.
where d' stands for the deviations of the logs. from the log. of the
assumed mean.
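The verbal rule above amounts to taking the anti-log. of log x' + Σd'w / Σw, where x' is the assumed geometric mean; this notation is an assumption made here for the sketch, as is the data, which is hypothetical and not taken from the book. The code compares the short-cut result with the direct method (anti-log. of Σ(w log m) / Σw).

```python
import math


def weighted_gm_direct(values, weights):
    """Direct method: anti-log of sum(w * log m) / sum(w)."""
    log_sum = sum(w * math.log10(m) for m, w in zip(values, weights))
    return 10 ** (log_sum / sum(weights))


def weighted_gm_shortcut(values, weights, assumed):
    """Short-cut method: deviations d' of the logs from log(assumed mean)."""
    log_assumed = math.log10(assumed)
    total_dw = sum(w * (math.log10(m) - log_assumed)
                   for m, w in zip(values, weights))
    return 10 ** (log_assumed + total_dw / sum(weights))


# Hypothetical items and weights, for illustration only.
values = [120, 150, 90, 200]
weights = [3, 2, 4, 1]

direct = weighted_gm_direct(values, weights)
shortcut = weighted_gm_shortcut(values, weights, assumed=100)

print(f"direct method:    {direct:.4f}")
print(f"short-cut method: {shortcut:.4f}")
```

As with the arithmetic average, the choice of the assumed mean does not affect the result; it only changes the size of the deviations handled along the way.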
Items       Weight
1           5
.5          10
10.0        20
45.0        10
175.0       15
.01         2
4.0         15
11.2        8
Solution. Computation of the weighted harmonic mean
Items       Reciprocals     Weight     Weight × Reciprocals
1           1.0000          5          5.0000
.5          2.0000          10         20.0000
10.0        .1000           20         2.0000
45.0        .0222           10         .2220
175.0       .0057           15         .0855
.01         100.0000        2          200.0000
4.0         .2500           15         3.7500
11.2        .0893           8          .7144
Total                       85         231.7719
Weighted harmonic mean = Reciprocal of (Σ(w × reciprocal) / Σw)
= Reciprocal of (231.7719 / 85)
= Reciprocal of 2.727
= .3667
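The computation in the table can be reproduced without a table of reciprocals. In the sketch below the first item is read as 1 and the sixth as .01, as in the reciprocals column (an assumption where the data table is damaged); the function name is illustrative only.

```python
def weighted_harmonic_mean(values, weights):
    """Reciprocal of the weighted arithmetic average of the reciprocals."""
    if any(v == 0 for v in values):
        raise ValueError("the harmonic mean is undefined for zero items")
    weighted_reciprocals = sum(w / v for v, w in zip(values, weights))
    return sum(weights) / weighted_reciprocals


items = [1, 0.5, 10.0, 45.0, 175.0, 0.01, 4.0, 11.2]
weights = [5, 10, 20, 10, 15, 2, 15, 8]

whm = weighted_harmonic_mean(items, weights)
print(f"sum of weights: {sum(weights)}")
print(f"weighted harmonic mean: {whm:.4f}")
```

The single very small item (.01) contributes 200 of the 231.77 in the denominator, which is why the weighted harmonic mean is so far below most of the items.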
Questions
1. What is meant by measures of central tendency? What are the characteristics
of a good measure of central tendency?
2. Define arithmetic average, geometric mean, median and mode. Which of
these is most representative and why? (M. Com. Au'" 1945).
3. What is a statistical average? What are the desirable properties for an
average to possess? Which of the averages you know possess most of these
properties? (M. A., Delhi, 19H).
4. What are the algebraic properties of the arithmetic average?
5. Define weighted average. How does it differ from a simple average? Is a
weighted average better than a simple one? Give reasons.
6. Discuss critically the use of weighted mean in statistics.
(B. Com., Calcutta, 1937).
7. What are the algebraic properties of the geometric mean? Is it a better
average than median and mode? If so, how?
8. Compare and contrast the relative merits and demerits of the various measures
of central tendency which you know.
29. Make a frequency table having grades of wages with class intervals of two
Annas each from the following data of daily wages received by 30 labourers in a certain
factory and then compute the average daily wages paid to a labourer.
Daily wages in annas,
14, 16, 16, 14, 22, 13, 15, 24, 12, 23,
14, 20, 17, 21, 18, 18, 19, 20, 17, 16,
15, 11, 12, 21, 20, 17, 18, 19, 22, 23.
(B. A. Hons., Punjab, 1945).
30. The following table gives the monthly average of automobile production
in the United States for the years 1926-1932 (unit 1,000 cars).
Year Production Year Production
1926 358.4 1930 279.7
1927 283.4 1931 199.1
1928 363.2 1932 114.2
1929 446.5
Calculate the average per cent of change per year.
31. The following is the table of the ages of 30 adult persons :
Years 0 1 2 3 4 5 6 7 8 9 Total
20-29 2 1 2 2 1 1 1 10
30-39 2 1 2 1 2 8
40-49 2 2 1 1 6
50-59 1 2 1 4
60-69 1. 1 2
Thus there' are two persons of 23 years, one of 57 years and so on.
Find out the mean of the series
(a) by using only totals of class intervals.
(b) by using the entire data
32. A candidate obtains the following percentages in an examination : Sanskrit
75 ; Mathematics 84 ; Economics 56 ; English 78 ; Politics 57 ; History 55 ;
Geography 47. It is agreed to give double weight to marks in English, Mathematics and
Sanskrit. What is the weighted and unweighted mean ?
33. Explain what is meant by weighted average, and discuss the effect of weighting.
Calculate (i) the unweighted mean of the prices in column III and (ii) the mean
obtained by weighting each price by the quantity consumed.
I                    II                     III
Articles of food     Quantity consumed      Price in rupees per
                                            maund
Flour                11.5 mds.              5.8
Ghee                 5.6 mds.               58.4
Sugar                .28 mds.               8.2
Potato               .16 mds.               2.5
Oil                  .35 mds.               20.0
(M. A. Cal., 1937).
34. The following table gives the number of employees and their monthly earnings
in two factories of a particular city :
                          A                         B
Description     No. of       Monthly      No. of       Monthly
of workmen      employees    earnings     employees    earnings
                             Rs.                       Rs.
(a)             3            800          2            750
(b)             20           145          10           150
(c)             15           50           15           60
(d)             25           30           25           50
(e)             80           35           40           40
(f)             250          20           120          20
Compare the weighted average.
35. Suppose that an automobile makes a 200 mile trip, covering the first 100
miles at the rate of 50 miles an hour and the second 100 miles at the rate of 40 miles
an hour. What is its average speed ?
36. A railway train runs for 30 minutes at a speed of 40 miles an hour and then,
because of repairs of the track runs for 10 minutes at a speed of 8 miles, an hour, after
which it resumes its previous speed and runs for 20 minutes except for a period of 2
minutes when it had to run over a bridge with a speed of 30 miles per hour. What
is its average speed ?
37. The following table indicates the increase in cost of living over July 1946,
for a working class family as at 1st January 1955, and the weights assigned to various
groups.
38. The table shows the age distribution of married females according to the sample
census of 1941 in the Baroda State.
Calculate the median age of married females and also the two quartiles.
(I. A. & A. S., etc., Exam., 1942)
39. Calculate the values of the median and the two quartiles for the following :-
40. Calculate the mean and median for the following distribution.
41. The following table gives the distribution of the male and female population
of a certain area in India. By finding the mean age, the median age, and the upper
and lower quartile ages, make comments on the age distribution of the two sexes
in the area :-
47. The following table gives the marks obtained by 65 students in Statistics in
a certain examination :-
Examination marks Number of students
More than 70% 7
60% 18
50% 40
40% 40
30% 63
20% 65
Calculate the median of the above series.
48. Find out the median of the following series
Wages No. of labourers
Rs.
60-70 5
50-60 10
40-50 20
30--40 5
20-30 3
49. The following is the age distribution of candidates appearing at the Matriculation
and Intermediate Arts examinations of the Patna University in 1937.
Age in years 12- 13- 14- 15 16- 17 18 19 20- 21 22 Total
Matriculation 5 48 189 303 522 980 981 794 515 474 X 4811
Intermediate X X X 5 45 87 127 150 155 127 175 871
Compare the median and modal ages of the Matriculation candidates with those
of I. A. candidates. (M. A., Patna, 1940)
50. The following table shows the frequency with which profits are made. What
is the Mode ?
                                                        Frequency
Exceeding Rs. 3,000 and not exceeding Rs. 4,000            83
    "     "   4,000  "   "      "     "   5,000            27
    "     "   5,000  "   "      "     "   6,000            25
    "     "   6,000  "   "      "     "   7,000            50
    "     "   7,000  "   "      "     "   8,000            75
    "     "   8,000  "   "      "     "   9,000            38
    "     "   9,000  "   "      "     "  10,000            18
51. Find the modal wage group from the following table :
Wages in Rupees No. of labourers
Above 30 520
40 470
50 399
60 210
70 105
" 80 45
90 7
52. Find out the median and the mode for the following table
No. of days absent      No. of students
Less than  5            29
  "    "  10            224
  "    "  15            465
  "    "  20            582
  "    "  25            634
  "    "  30            644
  "    "  35            650
  "    "  40            653
  "    "  45            655
53. Find the median and mode from the following table :
Class      Frequency      Class      Frequency
0-3        4              18-20      24
3-6        8              20-24      14
6-10       10             24-25      16
10-12      14             25-28      11
12-15      16             28-30      10
15-18      20             30-36      6
54. Find the modal wage from the following data :
Weekly Wage                   No. of wage-earners
Sh. d.       Sh. d.
12  6   to   17  6            4
17  6   "    22  6            44
22  6   "    27  6            38
27  6   "    32  6            28
32  6   "    37  6            6
37  6   "    42  6            8
42  6   "    47  6            12
47  6   "    52  6            2
52  6   "    57  6            2
(B. Com., Rajputana, 1949)
55. Find out the mode of the following series :-
Size of item Frequency Size of item Frequency
0-9.99 10 40-49.99 11
10-19.99 14 50-59.99 13
20-29.99 16 60-69.99 17
30-39.99 14 70-79.99 13
56. Calculate the geometric mean of the following figures : -
5, 10, 192, 14,374, 20,498, 1,20,674, 15,491
57. Compute the weighted geometric average of relative prices of the following
commodities for the year 1939 (Base year 1938; price 100) :-
Weight
Commodity Relative Price (value produced in 1938)
Corn 128.8 1,385
Cotton 62.4 819
Hay 117.7 842
Wheat 99.0 561
Oats 130.9 408
Potatoes 143.5 194
Sugar 125.6 142
Barley 150.2 100
Tobacco 101.1 103
Rye 116.2 25
Rice 117.5 17
Oil seeds 78.7 29
How does it differ from the unweighted geometric mean, and why ?
(B. Com., Alld. 1943)
58. The following table gives index numbers for various items entering the cost
of living. Find an index of the cost of living by computing a weighted average of
these items. The weights to be used are also given in the table :-
Table
Items Index Weight
1. Clothing 77.3 13
2. Food 74.5 43
3. Fuel and light 85.8 6
4. Housing 64.6 18
5. Sundries 92.5 20
59. Compute the geometric mean of the following series
Marks No. of students
0-10 5
10-20 7
20-30 15
30-40 25
40-50 8
60
60. The annual incomes of fifteen families are given below in rupees :-
80, 2500, 90, 1200, 1450, 7200, 120, 1060, 150, 480, 360, 96, 200, 520 and 60.
Calculate the Harmonic Mean.
61. The following table gives (a) the total number of persons possessing holdings
of different sizes and (b) the total area of land comprised in holdings of different
sizes in U. P. during the year ending on 30th June, 1945 :
                                       Total number      Total area in
Size of holdings in acres              of persons in     thousands
                                       thousands         of acres
Not exceeding .5                       2,643             925
Exceeding .5 but not exceeding 1       1,696             1,556
    "     1   "   "     "      2       2,205             3,361
    "     2   "   "     "      3       1,430             3,373
    "     3   "   "     "      4       992               3,458
    "     4   "   "     "      5       703               3,150
    "     5   "   "     "      6       515               2,817
    "     6   "   "     "      7       378               2,446
    "     7   "   "     "      8       283               2,112
    "     8   "   "     "      9       216               1,830
    "     9   "   "     "      10      171               1,617
    "     10  "   "     "      12      206               2,264
    "     12  "   "     "      14      138               1,776
    "     14  "   "     "      16      96                1,424
    "     16  "   "     "      18      68                1,252
    "     18  "   "     "      20      51                972
    "     20  "   "     "      25      70                1,570
    "     over 25                      115               5,310
Grand Total                            12,276            41,113
(i) Calculate the average size of holdings in the U. P.
(ii) Assuming the minimum size of an economic holding to be 10 acres-
(1) Calculate the percentage of the area under uneconomic holdings in 1945 in
the U. P.
(2) Calculate the percentage of persons having uneconomic holdings in the
U. P. in 1945. (P. C. S. 1951)
62. (a) Define a 'weighted mean.'
If several sets of observations are combined into a single set, show that the mean
of the combined set is the weighted mean of the means of the several sets.
(b) The number of asthma sufferers whose first attacks came at various ages is
given in the following table. Calculate the mean age at the first attack by any method.
TABLE
Age at first attack      Number of cases
0-5                      298
5-10                     113
10-15                    64
15-20                    61
20-25                    70
25-30                    81
30-35                    77
35-40                    64
40-45                    53
45-50                    40
50-55                    35
55-60                    24
60-65                    20
(I. A. S. 1955)
63. Find the mean, mode, standard deviation and co-efficient of skewness for
the following :-
Year under 10, 20, 30, 40, 50, 60.
No. of persons 15, 32, 51, 78, 97, 109.
(P. C. S. 1952)
64. What are the desiderata for a satisfactory average? Point out the special
characteristics of the arithmetic mean, the median and the geometric mean.
Explain the step-deviation method for finding out the arithmetic mean of a
frequency distribution. Derive the useful formula and apply it to find the arithmetic
mean of the distribution.
Variate 5, 10, 15, 20, 25, 30, 35, 40, 45, 50.
Frequency 20, 43, 75, 67, 72, 45, 39, 9, 8, 6.
(P. C. S. 1954)
65. The following table gives the monthly income of 24 families in a certain
locality :-
Serial No. of     Monthly income     Serial No. of     Monthly income
the family        in Rupees          the family        in Rupees
1                 60                 13                96
2                 400                14                98
3                 86                 15                104
4                 95                 16                75
5                 100                17                80
6                 150                18                94
7                 110                19                100
8                 74                 20                75
9                 90                 21                600
10                92                 22                82
11                280                23                200
12                180                24                84
Calculate the arithmetic average, the median and the mode of the above incomes.
Which average would represent the above series the best? Give reasons.
(P. C. S. 1955).
66. Figures concerning the number of deaths in two towns in a particular year are
given below :-
                     Town A                        Town B
Age-group     No. of persons    Deaths     No. of persons    Deaths
in years      living                       living
0-10          500               100        12,000            4,800
10-20         3,000             150        6,000             360
20-30         7,000             200        9,000             180
30-40         10,000            300        25,000            250
over 40       19,500            750        48,000            576
Total         40,000            1,500      1,00,000          6,166
Compare the health conditions in both towns.
(P. C. S. 1955)
67. You are given the following statistics of population and unemployment in :-
(a) Your country as a whole for a standardised age distribution.
(b) The local administrative area in which you live.
Calculate (i) the standardised unemployment rate in the country as a whole, (ii)
the standardised rate of unemployment in the local area and (iii) the crude rate of
unemployment in the local area.
                                    Age (Years)
                         15-30    30-45    45-60    60-75    Total
Standard population
Age constitution         250      350      300      100      1,000
Unemployment rate
per cent                 5        8        12       15       -
Local population
Age constitution         300      300      350      50       1,000
Unemployment rate
per cent                 4        9        12       20       -
(P. C. S. 1956).
68. Fifty items sold in Department A of the Corner Store had a mean price of 30
rupees. Seventy-five items sold in Department B had a mean price of 20 rupees. The
mean price of commodities sold in Departments A and B was 24 rupees. Is it right?
69. If $x_1$ and $x_2$ are two positive values of a variate, prove that their geometric
mean is equal to the geometric mean of their arithmetic and harmonic means.
70. (a) An examination candidate's percentages are : English, 73; French, 82;
Mathematics, 57; Science, 62; History, 60. Find the candidate's weighted mean if
weights of 4, 3, 3, 1, 1 respectively are allotted to the subjects.
(b) The average percentages for the same examination were 57, 52, 48, 55, 50
for the above subjects respectively. Find the weighted mean for the whole examination.
71. "The inherent inability of the human mind to grasp in its entirety a large body
of numerical data compels us to seek relatively few constants that will adequately des-
cribe the data."-R. A. Fisher.
Comment.
72. Find the average ages of men and women blood donors from the following
data :-
Age, years 10-19 20-29 30-39 40-49 50-59 60-69
Frequency, Men 3016 6894 9229 5714 3575 1492
Women 7845 16,008 13,107 9685 6374 2137
Age, years 70-79 80-89 90 and over
Frequency, Men 170 9 1
Women 173 9
73. A candidate obtains the following percentages in an examination : Latin,
75; Mathematics, 84; French, 56; English, 78; Science, 57; History, 54;
Geography, 47. It is agreed to give double weight to the marks in English, Mathematics
and Latin. What is his weighted mean ?
74. The frequency distributions of real income in rupees of the employees of a
big industrial concern, in two different periods, are as given below :
                        Frequency
Income in Rs.     Period 1     Period 2
0-50              90           200
50-100            150          400
100-150           100          120
150-200           80           100
200-250           70           150
over 250          10           30
Total             500          1,000
The total income of 10 employees in the frequency class 'over 250' in Period 1 is
Rs. 3,000 and that of 30 employees in Period 2 is Rs. 18,000.
(a) Compute the mean and median incomes for the two periods.
(b) Write a very brief note on the relative economic conditions of the employees
in the two periods, supporting your statements by analysis of the given
data, if necessary.
(c) Every employee belonging to the top 25 per cent of the earners is required to
pay 1 per cent of his income to a workers' relief fund. Estimate the increase
in contributions to this fund from Period 1 to Period 2. (I. A. S. 1958)
75. The following are the monthly salaries in rupees of 30 employees of a firm :-
139, 126, 114, 100, 88, 62, 77, 99, 103, 144, 148, 63, 69, 148, 132, 118, 142,
16, 123, 104, 95, 80, 85, 106, 123, 133, 140, 134, 108, 129.
The firm gave bonus of Rs. 10, 15, 20, 25, 30 and 35 for individuals in the
respective salary groups: exceeding 60 but not exceeding 75, exceeding 75 but not
exceeding 90 and so on up to exceeding 135 but not exceeding 150. Find out the average
bonus paid per employee. (B. Com., B. H. U.)
76. For a certain group of 'Saree' weavers of Banaras, the median and quartile
earnings per week are Rs. 44.3, Rs. 43.0 and Rs. 45.9 respectively. The earnings for
the group range between Rs. 40 and Rs. 50. Ten per cent of the group earn under
Rs. 42 per week, 13 per cent earn Rs. 47 and over and 6 per cent Rs. 48 and over. Put
these data into the form of a frequency distribution and obtain an estimate of the mean
wage. (P. C. S., 19~6).
77. From a frequency distribution of marks in Accounts of 100 students, the mean
was found to be 35. Later it was discovered that the marks 35 were mis-read as 25.
Find the correct mean.
78. From the following data, find the missing frequency.
No. of Tablets 4 - 8 - 12 - 16 - 20 - 24 - 28 - 32 - 36 - 40
No. of Persons cured 11 13 16 14 9 17 6 4
The average number of tablets given to cure fever was 20.
79. Calculate the Median, Quartiles, 6th Decile and 70th Percentile from the
following data :-
Marks less than 80 70 60 50 40 30 20 10
No. of Students 100 90 80 60 52 20 13
(B. Com., Raj., 1951).
80. (a) From the data given below, find the mode:
(b) If the mode and the median of a moderately asymmetrical series are 16
inches and 20.2 inches respectively, compute the most probable median.
(B. Com., Delhi, 1960).
81. Recast the following cumulative table into the form of an ordinary
frequency distribution and determine the value of Mode by using the formula
Mean - Mode = 3 (Mean - Median).
No. of days absent    No. of students    No. of days absent    No. of students
Less than 5           29                 Less than 30
    "    10           224                    "    35
    "    15           465                    "    40
    "    20           582                    "    45
    "    25           634
(B. Com., Lucknow, 1957)
82. A taxicab drives from a plain-town to a hill-station, 60 miles distant, at a
mileage rate of 10 miles per gallon of petrol and on the return trip at 15 miles per gallon.
Find the harmonic mean rate of mileage per gallon. Verify that this is the proper
average in this particular case.
83. An aeroplane flies around a square the sides of which measure 100 miles
each. The aeroplane covers at a speed of 100 miles per hour the first side, at 200 miles
per hour the second side, at 300 miles per hour the third side and at 400 m.p.h. the
fourth side. What is the average speed of the aeroplane around the square ?
84. A train moves first 10 miles at the rate of 10 m.p.h., next 20 miles at the rate
of 30 m.p.h., and then due to repairs in the track another 5 miles at the speed of 5
miles per hour. It covers the last 15 miles at the rate of 10 miles an hour. Find the
average speed of the train per hour.
85. The mean wage of 50 labourers working in a factory is Rs. 38. The mean
wage of 30 labourers working in the morning shift is Rs. 40. Find the mean wage
of the remaining 20 labourers working in the evening shift.
86. The teachers of statistics reported mean examination marks of 37.5, 41 and
42 in their classes which consisted of 32, 25 and 17 students respectively. Determine
the mean marks for all the classes taken together.
87. The following table gives the distribution of the average weekly wages of
100 workers in a factory. Calculate (i) the average weekly total wage bill of these
workers; (ii) the weekly wage of a worker whose wage is greater than that of
75% of the workers.
Weekly wages 16-20 21-25 26-30 31-35 36-40 41-45 46-50
No. of workers 7 12 It 8
Weekly wages 56-60
No. of workers
88. The monthly incomes of 8 families in rupees in a certain locality are given
below. Calculate the Mean, the Geometric Mean and the Harmonic Mean, and confirm
that the relationship a > g > h holds true.
Family          A    B    C     D    E    F     G    H
Income (Rs.)    70   10   500   75   8    250   8    42
(Sagar, B. Com., II, 1965)
Calculate 3, 4 and 5 yearly moving averages from the following data :-
Years   1951  52  53  54  55  56  57  58  59  60  61  62  63  64  65
Value   18    20  22  25  30  37  38  38  40  43  45  46  48  49  F
There is a member A such that there are twice as many members older than A
as there are members younger than A. Estimate his age (in years up to two decimals).
(M. A., Eco., Delhi, 1963).
91. The arithmetic mean, the mode and the median of a group of 75 observations
were calculated to be 17, H, 19 respectively. It was later discovered that one
observation was wrongly read as 43 instead of the correct value 53. Examine to what
extent the calculated values of the three averages will be affected by the discovery
of this error. (M. A., Eco., Delhi, 1963).
92. If the mode and the median of a moderately asymmetrical series are 166 and
15.6 respectively, what would be its most probable median? (B. Com., Agra, 1960).
93. Under what conditions is the weighted average (i) equal to the simple average, (ii)
greater than the simple average and (iii) less than the simple average? Illustrate your answer
with the help of examples.
94. (a) A train starts from rest and travels successive quarters of a mile at average
speeds of 12, 16, 24 and 48 miles per hour. The average speed over the whole
mile is 19.2 m.p.h. and not 25 m.p.h.
(b) The price of a commodity increased by 5 per cent from 1954 to 1955,
by 8 per cent from 1955 to 1956 and by 77 per cent from 1956 to 1957. The average
increase from 1954 to 1957 is quoted as 26 per cent and not 30 per cent.
Explain the two statements as you would to a layman and verify the arithmetic
mean. (M. Com., Agra, 1962)
95. If the arithmetic mean of two numbers is 20 and their geometric mean is 16, find
the harmonic mean.
Measures of Dispersion 10
Need and meaning. In the preceding chapters we have already
discussed why it is necessary to tabulate and classify statistical series
and to condense them into a single figure called average. The average
as we have already seen has its own limitations and even an ideal average
can represent a series only" as best as a single figure can". No doubt
averages have a very great utility in statistical analysis but they fail
to reveal the entire story of a phenomenon. There may be a dozen
series whose averages may be identical but which may differ from each
other in a hundred ways. Obviously in such cases further statistical
analysis of the data is necessary so that these differences between various
series may also be studied and accounted for. If this is done statistical
analysis would be more accurate and we shall be more confident of our
conclusions.
Suppose there are three series of nine items each as follows :
In the first series the mean is 40 and the value of all the items
is identical. The items are not at all scattered, and the mean fully
discloses the characteristics of this distribution. However, in the
second case, though the mean is 40, yet all the items of the series have
different values. But the items are not very much scattered, as the
minimum value of the series is 36 and the maximum is 44. In this
case also the mean is a good representative of the series. Here the mean
cannot replace each item, yet the difference between the mean and
the other items is not very significant. In the third series also, the mean
is 40 and the values of the different items are also different, but here the
values are very widely scattered and the mean is 40 times the
smallest value of the series and half of the maximum value. Obviously
the average does not satisfactorily represent the individual items in
this group. In order to have a correct analysis of these three series,
it is essential that we study something more than their averages, because
the averages are identical and yet the series differ widely from each other in
their formation. The scatter in the first case is nil, in the second case
it varies within a small range, while in the third case the values range
over a very big span and are widely scattered. It is evident from
the above that the extent of the scatter round an average should
also be studied to throw more light on the composition of a series. The
name given to this scatter is dispersion.
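The contrast drawn above can be illustrated numerically. The three series below are hypothetical, constructed only to match the description in the text (nine items each, mean 40; the second running from 36 to 44; the third from 1 to 80), since the original table is not reproduced here. The spread is measured simply as the difference between the largest and smallest items.

```python
# Hypothetical series built to match the description in the text:
# nine items each, all with an arithmetic mean of 40.
series = {
    "first": [40] * 9,
    "second": [36, 37, 38, 39, 40, 41, 42, 43, 44],
    "third": [1, 9, 20, 30, 40, 50, 60, 70, 80],
}


def mean(items):
    return sum(items) / len(items)


def spread(items):
    """Difference between the largest and the smallest item."""
    return max(items) - min(items)


means = {name: mean(items) for name, items in series.items()}
spreads = {name: spread(items) for name, items in series.items()}

for name in series:
    print(f"{name:>6} series: mean = {means[name]:.0f}, "
          f"spread = {spreads[name]}")
```

All three means coincide, so the average alone cannot distinguish the series; only a measure of scatter does.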
Dispersion in a general sense. Dispersion, thus, refers to the variability
in the size of items. It indicates that the size of items in a series is not
uniform. The values of the various items differ from each other. If this
variation is substantial, dispersion is said to be considerable, and if the
variation is little, dispersion is insignificant. This is rather a general sense
in which this term is used. If there is a series in which the scatter of the
values is much, say, from 100 to 1000, this series would be said to have
more dispersion than one in which the values range only from 100
to 200.
Dispersion in a precise sense. The term dispersion not only gives a
general impression about the variability of a series, but also a precise
measure of this variation. Usually in a precise study of dispersion, the
deviations of the size of items from a measure of central tendency are found
out and then these deviations are averaged, to give a single figure
representing the dispersion of the series. This figure can be compared
with similar figures representing other series. It goes without saying
that such comparisons would give a better idea about the formation of
series than a mere comparison of their averages.
unit of the original data. In the above case the average income would be
referred to as Rs. 120 per month and the relative dispersion as 20/120, or .167,
or 16.7%. In a comparison of the variability of two or more series, it is
the relative dispersion that has to be taken into account, as the absolute
dispersion may be erroneous or unfit for comparison if the series are
originally expressed in different units.
Measures of dispersion
The following measures of dispersion are in common use--
1. Range
2. Inter-Quartile-Range
3. Semi-Inter-Quartile-Range or Quartile Deviation
4. Average Deviation or Mean Deviation
5. Standard Deviation or Root-Mean-Square Deviation taken
from the mean.
We shall discuss them in turn.
RANGE
Range is the simplest possible measure of dispersion. It is the
difference between the values of the extreme items of a series. Thus if in a series
relating to the weight measurements of a group of students the lightest
student has a weight of 90 pounds and the heaviest of 240 pounds, the
value of range would be 150 pounds. This figure indicates the variability
in the weights of students. The distance on the scale measuring 150
pounds would include the weight of every student. If the data are given
in the shape of continuous frequency distribution, range is the difference
between the lower limit of the smallest class and the upper limit of the
biggest class.
Range as calculated above is an absolute measure of dispersion which
is unfit for purposes of comparison if the distributions are in different
units. For example, the range of the weights of students cannot be
compared with the range of their height measurements, as the range of
weights would be in pounds and that of heights in inches. Sometimes,
for purposes of comparison, a relative measure of range is calculated.
If range is divided by the sum of the extreme items, the resulting figure
is called "The Ratio of the Range" or "The Coefficient of the Scatter."
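The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the data are just the two extreme weights (90 and 240 pounds) mentioned in the text.

```python
def value_range(values):
    """Absolute range: largest item minus smallest item."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range: range divided by the sum of the extreme items."""
    return (max(values) - min(values)) / (max(values) + min(values))


# Only the extremes matter, so the two weights from the text are enough.
weights_lb = [90, 240]

print(value_range(weights_lb))                       # 150 (pounds)
print(round(coefficient_of_range(weights_lb), 3))    # 0.455 (a pure number)
```

The first figure carries the unit of the data, while the second is unit-free, which is why only the second can be compared across series measured in different units.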
Merits, demerits and uses of range
A good measure of dispersion should possess the same qualities
which were laid down in the last chapter for a good measure of central
tendency. A good measure of dispersion should be rigidly defined,
easily calculated, readily understood and further, should be capable of
algebraic treatment and should not be affected much by the fluctuations
of sampling.
The only merits possessed by range are that it can be easily calculated
and readily understood. As against these, there are many drawbacks from
which it suffers. The most important point against range is that it is
50% of the values, a percentile range which takes into account, say, the
90th and the 10th percentiles would give a better measure of dispersion
than either of these two. If the difference of the 90th and the 10th
percentiles is found out it will be called the 10-90 percentile range. Unlike
range it has the advantage of not being affected by the values of the
extreme items of a series, and it also does not leave aside 50% of the
values as the inter-quartile range does. A 10-90 percentile range would
leave only 20% of the values at the extremes. It, however, suffers from
most of those defects from which range and inter-quartile range suffer.
SEMI-INTER-QUARTILE RANGE
Semi-inter-quartile range or Quartile deviation
\[ = \frac{Q_3 - Q_1}{2} \]
Where $Q_3$ and $Q_1$ stand for the upper and lower quartiles respectively.
In a symmetrical series the median lies half way on the scale from $Q_1$
to $Q_3$. If, therefore, the value of the quartile deviation is added to the
lower quartile or subtracted from the upper quartile, in a symmetrical
series, the resulting figure would be the value of the median. But
generally series are not symmetrical, and in a moderately asymmetrical
series $Q_1$ + quartile deviation, or $Q_3$ - quartile deviation, would not give
the value of the median. There would be a difference between the two
figures, and the greater the difference, the greater would be the extent of
departure from normality.
Quartile deviation is an absolute measure of dispersion. If it
is divided by the average value of the two quartiles, a relative measure
of dispersion is obtained. It is called the Coefficient of Quartile Deviation.
Symbolically
\[ \text{Coefficient of quartile deviation} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
The following example would clarify the procedure of the calculation
of the quartile deviation and its coefficient :-
Example 1. Calculate the Semi-Inter-Quartile Range and its
coefficient of the marks of 59 students in Economics given below.
\[ Q_3 = 40 + \frac{50 - 40}{15}(44.25 - 38) = 44.2 \text{ marks} \]
Semi-inter-quartile range
\[ = \frac{Q_3 - Q_1}{2} = \frac{44.2 - 22.5}{2} = 10.85 \text{ marks.} \]
be 33.35 and if they are subtracted from the upper quartile it will again be
33.35. The actual value of the median is 34.33. It shows that the series
is not perfectly normal, though the departure from normality is not much.
It, however, reveals that the dispersion of items on the two sides of the
median is almost equal.
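The arithmetic of this example can be checked with a short Python sketch. It assumes the quartile values read off the worked solution (lower quartile 22.5 and upper quartile 44.2 marks) and the stated median of 34.33; the function names are illustrative only.

```python
def quartile_deviation(q1, q3):
    """Semi-inter-quartile range: half the distance between the quartiles."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """Quartile deviation divided by the average of the two quartiles."""
    return (q3 - q1) / (q3 + q1)


q1, q3 = 22.5, 44.2          # quartiles as read from the worked solution
median = 34.33               # actual median quoted in the text

qd = quartile_deviation(q1, q3)
print(round(qd, 2))                                         # 10.85
print(round(coefficient_of_quartile_deviation(q1, q3), 3))  # 0.325
# Q1 + Q.D. and Q3 - Q.D. give the same figure, the mid-point of the quartiles.
print(round(q1 + qd, 2), round(q3 - qd, 2))                 # 33.35 33.35
print(round(median - (q1 + qd), 2))                         # 0.98
```

The last line shows the small gap between the mid-point of the quartiles and the median, which is what the text uses as a rough indication of departure from symmetry.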
Merits and drawbacks of quartile deviation
The quartile deviation possesses the merits of simple calculation and
easy understandability. It is commonly understood and its calculation
does not involve any mathematical intricacies. These are the points in
favour of quartile deviation, but there are a large number of points which
go against it. Quartile deviation is neither based on all the observations
of the data, nor is it capable of further algebraic treatment. It is affected
to a considerable extent by the fluctuations of sampling. A change in
the value of a single item may in certain cases affect its value considerably.
Thus quartile deviation is not a very good measure of dispersion,
particularly for series in which the variation is considerable. However, for
rough studies, quartile deviation may give an approximate idea of the
extent of variability in a series.
In the direct method, as we have seen above, the mean deviation would
be calculated by totalling the deviations from the mean or median (plus
and minus signs ignored) and dividing this total by the number of items.
In the short-cut method the mean or median is assumed and the totals of
the values of items below the actual mean or median and above it are found
out. The former is subtracted from the latter and divided by the number
of items. The resulting figure is the required mean deviation.
Symbolically
\[ \delta_m = \frac{1}{n}(my - mx) \]
Where $\delta_m$ stands for the mean deviation from median, $my$ for
the total of the values above the actual median, $mx$ for the total of the values
below it, and $n$ for the number of items.
\[ \delta = \frac{1}{n}(ay - ax) \]
Where $\delta$ stands for the mean deviation from mean, $ay$ stands for the
total of the values above the actual arithmetic average and $ax$ for the total of the values
below it. The following example would illustrate these formulae :-
Example 2. The following are the marks obtained by a batch of 9
students in a certain test :-
Serial No.   Marks           Serial No.   Marks
             (out of 100)                 (out of 100)
1            68              5            54
2            49              6            38
3            32              7            59
4            21              8            66
                             9            41
Calculate the mean deviation of the series.
Solution. Direct method. Calculation of mean deviation of the series
of marks of 9 students (arranged in ascending order of magnitude).
Students   Marks   Deviations from median (49)
                   (+ and - signs ignored)
           (m)     (dm)
1          21      28
2          32      17
3          38      11
4          41      8
5          49      0
6          54      5
7          59      10
8          66      17
9          68      19
                   Σdm = 115
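\[ \text{Median} = \text{value of the } \frac{n+1}{2}\text{th item} = 49 \text{ marks} \]
\[ \text{Mean deviation or } \delta_m = \frac{\Sigma dm}{n} \]
Where $\Sigma dm$ represents the summation of the deviations from the
median; and $n$, the number of items.
\[ \delta_m = \frac{115}{9} \text{ marks} = 12.8 \text{ marks} \]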
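The direct and short-cut routes for Example 2 can be compared with a minimal Python sketch. The function names are illustrative; the short-cut version below is written for the case where equally many items lie above and below the centre, which holds for the median of these nine marks.

```python
from statistics import median

marks = [68, 49, 32, 21, 54, 38, 59, 66, 41]   # Example 2


def mean_deviation_direct(values, centre):
    """Average of the absolute deviations from the chosen centre."""
    return sum(abs(v - centre) for v in values) / len(values)


def mean_deviation_shortcut(values, centre):
    """(total above centre - total below centre) / n.

    Valid when the number of items above and below the centre is equal.
    """
    above = sum(v for v in values if v > centre)
    below = sum(v for v in values if v < centre)
    return (above - below) / len(values)


m = median(marks)
print(m)                                             # 49
print(round(mean_deviation_direct(marks, m), 1))     # 12.8
print(round(mean_deviation_shortcut(marks, m), 1))   # 12.8
```

Both routes total 115 in the numerator: the deviations add to 115, and the marks above the median (247) exceed those below it (132) by the same amount.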
Solution
Calculation of Mean Deviation from the arithmetic average.
Prices Rs.        Deviations from arithmetic average (Rs. 100.425)
                  (+ and - signs ignored)
(m)               (d)
100.500           .075
100.250           .175
100.375           .050
100.625           .200
100.750           .325
100.125           .300
100.375           .050
100.625           .200
100.500           .075
100.125           .300
Σm = 1004.250     Σd = 1.750
\[ \text{Arithmetic average} = \frac{\Sigma m}{n} = \frac{1004.25}{10} = \text{Rs. } 100.425 \]
Arithmetic average or $a = \dfrac{1004.250}{10} = 100.425$
Number of items smaller than arithmetic average or $nx = 5$ and their total or $ax = 501.250$
Number of items bigger than arithmetic average or $ny = 5$ and their total or $ay = 503.000$
\[ \text{Mean deviation} = \frac{1}{n}\left[(ay - ny \times a) + (nx \times a - ax)\right] = \frac{1}{n}(ay - ax) \]
\[ = \frac{1}{10}(503.000 - 501.250) = \frac{1.750}{10} = .175 \text{ rupees.} \]
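As a check on the figures above, the following Python sketch recomputes the mean deviation of the ten prices both directly and by the adjusted short-cut expression. `Decimal` is used only to keep the rupee figures exact; the variable names are illustrative.

```python
from decimal import Decimal

prices = [Decimal(p) for p in (
    "100.500 100.250 100.375 100.625 100.750 "
    "100.125 100.375 100.625 100.500 100.125"
).split()]

n = len(prices)
a = sum(prices) / n                      # arithmetic average, 100.425

# Direct method: total of absolute deviations divided by n.
md_direct = sum(abs(p - a) for p in prices) / n

# Short-cut method with the adjustment for items above and below the average:
# (1/n) * [(ay - ny*a) + (nx*a - ax)]
above = [p for p in prices if p > a]
below = [p for p in prices if p < a]
md_shortcut = ((sum(above) - len(above) * a) + (len(below) * a - sum(below))) / n

print(a)             # 100.425
print(md_direct)     # 0.175
print(md_shortcut)   # 0.175
```

Because five items lie on each side of the average here, the two adjustment terms cancel and the expression reduces to (ay - ax)/n, as in the worked solution.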
\[ \text{Mean deviation} = \frac{\Sigma fd}{\Sigma f} = \frac{472}{50} \text{ marks} = 9.44 \text{ marks.} \]
Adjustments
Number of items less than the actual arithmetic average (27) = 28
192 FUNDAMENTALS OF STATISTICS
Direct Method.
. hmetlc
Atit . average=a + 'SfdX • 25+('-864
~XI=l. 2f23 x 5 ) = 10.5
64
Age-group   Mid-value   Frequency   Deviation     Total         Deviation       Total
            of the                  from the      deviations    from the        deviations
            group                   assumed av.   from the      average (20.7)  from the
                                    (19.5)        assumed av.   (+ & - signs    average
                                                                ignored)
(m)         (m.v.)      (f)         (dx)          (fdx)         (d)             (fd)
15-16       15.5        0           -4            0             5.2             0
16-17       16.5        1           -3            -3            4.2             4.2
17-18       17.5        3           -2            -6            3.2             9.6
18-19       18.5        8           -1            -8            2.2             17.6
19-20       19.5        12          0             0             1.2             14.4
20-21       20.5        14          +1            +14           .2              2.8
21-22       21.5        14          +2            +28           .8              11.2
22-23       22.5        5           +3            +15           1.8             9.0
23-24       23.5        2           +4            +8            2.8             5.6
24-25       24.5        3           +5            +15           3.8             11.4
25-26       25.5        1           +6            +6            4.8             4.8
26-27       26.5        0           +7            0             5.8             0
27-28       27.5        1           +8            +8            6.8             6.8
                        n = 64                    Σfdx = +77                    Σfd = 97.4
\[ \text{Arithmetic average or } a = x + \frac{\Sigma fdx}{n} = 19.5 + \frac{77}{64} = 20.7 \text{ years.} \]
Symbolically
\[ \sigma = \sqrt{\frac{\Sigma d^2}{n}} \]
Where $\sigma$ stands for the standard deviation, $\Sigma d^2$ for the sum of
the squares of the deviations measured from the arithmetic average
and $n$ for the number of items.
Difference between root-mean-square deviation and standard deviation.
Various terms like Mean Error, Mean Square Error and Error of Mean
Square are used to denote the value of standard deviation. We shall
be using the term standard deviation only, as it is most popularly used.
Some writers use the term root-mean-square deviation to denote the
standard deviation. This is technically wrong, because the standard deviation
is only one of the many values that the root-mean-square deviation
can take. Root-mean-square deviation is the square root of the arithmetic
average of the squares of deviations measured from any arbitrary value. If the
deviations are measured from the arithmetic average there is no difference
between root-mean-square deviation and the standard deviation;
in other words, standard deviation is the root-mean-square deviation
measured from the arithmetic average. If deviations are not measured
from the arithmetic average but from some other value, we can find out
the value of the standard deviation from the value of the root-mean-square
deviation. In fact the short-cut method of calculating the standard
deviation is based on the relationship between standard deviation and
root-mean-square deviation. We shall discuss this point a little later.
\[ \text{Standard Deviation or } \sigma = \sqrt{\frac{\Sigma m^2 - (\Sigma m)^2/n}{n}} = \sqrt{\frac{39{,}764 - (630)^2/10}{10}} = \sqrt{\frac{74}{10}} = 2.72 \]
Short-cut method. Standard deviation can also be calculated by
a short-cut method. Here the deviations from an assumed average
are calculated and squared. Their sum is divided by the number of
items, or in other words, the arithmetic average of the squares of the
deviations from the assumed average is found out. From this figure the
square of the arithmetic average of the deviations from the assumed
mean is subtracted. The square root of the resulting figure is the
standard deviation.
Symbolically
\[ \sigma = \sqrt{\frac{\Sigma dx^2}{n} - \left(\frac{\Sigma dx}{n}\right)^2} \]
Where $dx$ stands for the deviation from the assumed mean.
Example No. 9 would be solved by this method as follows :-
Proof
Let $\sigma = \sqrt{\dfrac{\Sigma d^2}{n}}$ and $s = \sqrt{\dfrac{\Sigma dx^2}{n}}$, and $c = (a - x)$
\[ \sigma^2 = \frac{\Sigma d^2}{n} \quad \text{and} \quad s^2 = \frac{\Sigma dx^2}{n} \]
\[ \sigma^2 = s^2 - c^2 \]
As $s$ would always be greater than $\sigma$, the root-mean-square deviation
from the mean would always be less than the root-mean-square
deviation from any other point.
Short-cut Method
Size of items   Deviations from      Square of
                assumed mean 62      deviations
                (dx)                 (dx²)
60              -2                   4
60              -2                   4
61              -1                   1
62              0                    0
63              1                    1
63              1                    1
63              1                    1
64              2                    4
64              2                    4
70              8                    64
Total           +10                  84
\[ \sigma = \sqrt{\frac{\Sigma dx^2}{n} - \left(\frac{\Sigma dx}{n}\right)^2} = \sqrt{\frac{84}{10} - \left(\frac{10}{10}\right)^2} = \sqrt{8.4 - 1} = \sqrt{7.4} = 2.72 \]
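The agreement between the direct and short-cut methods can be verified with a small Python sketch on the ten items of this example. The function names are illustrative; any assumed mean may be passed to the short-cut version.

```python
from math import sqrt

items = [60, 60, 61, 62, 63, 63, 63, 64, 64, 70]


def sd_direct(values):
    """Root-mean-square deviation measured from the arithmetic average."""
    n = len(values)
    a = sum(values) / n
    return sqrt(sum((v - a) ** 2 for v in values) / n)


def sd_shortcut(values, assumed_mean):
    """sqrt( mean of dx^2  -  (mean of dx)^2 ), dx taken from an assumed mean."""
    n = len(values)
    dx = [v - assumed_mean for v in values]
    return sqrt(sum(d * d for d in dx) / n - (sum(dx) / n) ** 2)


print(round(sd_direct(items), 2))          # 2.72
print(round(sd_shortcut(items, 62), 2))    # 2.72
print(round(sd_shortcut(items, 50), 2))    # 2.72, the assumed mean does not matter
```

The correction term (Σdx/n)² is exactly the square of the gap between the actual and assumed means, which is why the result does not depend on the value assumed.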
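This formula can be written in the following ways also.
(i)
\[ \sigma = \sqrt{\frac{\Sigma (dx)^2 - n(a - x)^2}{n}} \]
With $d$ as the deviation from the actual mean $a$, $dx$ as the deviation from the assumed mean $x$, and $c = (a - x)$:
Then $dx = d + c$
\[ (dx)^2 = (d + c)^2 = d^2 + 2cd + c^2 \]
\[ \Sigma (dx)^2 = \Sigma d^2 + 2c\,\Sigma d + nc^2 \]
but $\Sigma d = 0$
\[ \therefore \Sigma (dx)^2 = \Sigma d^2 + nc^2 \]
\[ \frac{\Sigma (dx)^2}{n} = \frac{\Sigma d^2}{n} + c^2 \]
\[ \frac{\Sigma d^2}{n} = \frac{\Sigma (dx)^2}{n} - c^2 = \frac{\Sigma (dx)^2}{n} - (a - x)^2 \]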
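(ii)
\[ \sigma = \sqrt{\frac{\Sigma (dx)^2}{n} - (a - x)^2} \]
(i)
\[ \sigma = \sqrt{\frac{84 - 10(63 - 62)^2}{10}} = \sqrt{\frac{84 - 10}{10}} = \sqrt{7.4} = 2.72 \]
(ii)
\[ \sigma = \sqrt{\frac{84}{10} - (63 - 62)^2} = \sqrt{8.4 - 1} = \sqrt{7.4} = 2.72 \]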
\[ \sigma = \sqrt{\frac{\Sigma fd^2}{n}} \]
The following illustration would clarify this procedure : -
Example 10. Calculate the standard deviation from the following
data :-
Size of item   Frequency   Size of item   Frequency
6              3           10             8
7              6           11             5
8              9           12             4
9              13
Size of   Frequency   Size ×       Deviation from    Deviations   Frequency ×
items                 Frequency    the average (9)   squared up   square of
                                                                  deviations
(m)       (f)         (mf)         (d)               (d²)         (fd²)
6         3           18           -3                9            27
7         6           42           -2                4            24
8         9           72           -1                1            9
9         13          117          0                 0            0
10        8           80           +1                1            8
11        5           55           +2                4            20
12        4           48           +3                9            36
          n = 48      Σmf = 432                                   Σfd² = 124
\[ \text{Arithmetic average} = \frac{\Sigma mf}{n} = \frac{432}{48} = 9 \]
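The table above can be reproduced with a short Python sketch that works directly from the sizes and frequencies of Example 10. The final standard deviation is not legible in the source, so the figure printed below is simply what the formula σ = √(Σfd²/n) yields for these data; names are illustrative.

```python
from math import sqrt

sizes = [6, 7, 8, 9, 10, 11, 12]
freqs = [3, 6, 9, 13, 8, 5, 4]

n = sum(freqs)                                            # number of items
mean = sum(m * f for m, f in zip(sizes, freqs)) / n       # Σmf / n
sum_fd2 = sum(f * (m - mean) ** 2 for m, f in zip(sizes, freqs))
sd = sqrt(sum_fd2 / n)                                    # sqrt(Σfd² / n)

print(n, mean, sum_fd2)   # 48 9.0 124.0
print(round(sd, 2))       # 1.61
```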
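Standard Deviation
\[ \sigma = \sqrt{\frac{\Sigma fdx^2}{n} - (a - x)^2} \]
\[ a = x + \frac{\Sigma fdx}{n} = 22 + \frac{38}{100} = 22.38 \text{ articles} \]
Standard deviation
\[ \sigma = \sqrt{\frac{\Sigma fdx^2 - n(a - x)^2}{n}} = \sqrt{\frac{490 - 100(.38)^2}{100}} = \sqrt{4.756} = 2.2 \text{ articles approx.} \]
or
\[ \sigma = \sqrt{\frac{\Sigma fdx^2}{n} - \left(\frac{\Sigma fdx}{n}\right)^2} = \sqrt{\frac{490}{100} - \left(\frac{38}{100}\right)^2} = \sqrt{4.9 - .144} = \sqrt{4.756} = 2.2 \]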
Age      Mid-     Frequency   Deviations    Total        Square of    Frequency
group    value                from the      deviations   deviations   × square
                              assumed
                              av. (55)
         (m)      (f)         (dx)          (fdx)        (dx²)        (fdx²)
20-30    25       3           -30           -90          900          2700
30-40    35       61          -20           -1220        400          24400
40-50    45       132         -10           -1320        100          13200
50-60    55       153         0             0            0            0
60-70    65       140         +10           +1400        100          14000
70-80    75       51          +20           +1020        400          20400
80-90    85       2           +30           +60          900          1800
                  n = 542                   Σfdx = -150               Σfdx² = 76500
\[ (a - x) = \frac{-150}{542} = -.28 \]
\[ \sigma = \sqrt{\frac{76500 - 542(.28)^2}{542}} = \sqrt{141.07} = 11.9 \]
The following method will also give us the same result.
Standard deviation of the age distribution
\[ = \sqrt{\frac{\Sigma fdx^2}{n} - \left(\frac{\Sigma fdx}{n}\right)^2} = \sqrt{\frac{76500}{542} - \left(\frac{-150}{542}\right)^2} = \sqrt{141.07} = 11.9 \text{ years} \]
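The grouped calculation above can be sketched in Python using the mid-values and frequencies of the table and the assumed average of 55. The helper name `grouped_sd` is made up for this illustration.

```python
from math import sqrt

mid_values = [25, 35, 45, 55, 65, 75, 85]
freqs = [3, 61, 132, 153, 140, 51, 2]


def grouped_sd(mids, fs, assumed):
    """Mean and standard deviation of a frequency distribution,
    using deviations (dx) from an assumed average."""
    n = sum(fs)
    sum_fdx = sum(f * (m - assumed) for m, f in zip(mids, fs))
    sum_fdx2 = sum(f * (m - assumed) ** 2 for m, f in zip(mids, fs))
    mean = assumed + sum_fdx / n
    sd = sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2)
    return n, sum_fdx, sum_fdx2, mean, sd


n, sum_fdx, sum_fdx2, mean, sd = grouped_sd(mid_values, freqs, 55)
print(n, sum_fdx, sum_fdx2)   # 542 -150 76500
print(round(sd ** 2, 2))      # 141.07
print(round(sd, 1))           # 11.9
```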
Example 13. The following data relate to the ages of a group
of Government employees. Calculate the standard deviation.
Age      Number of employees   Age      Number of employees
50-55    25                    30-35    80
45-50    30                    25-30    110
40-45    40                    20-25    170
35-40    45
Solution. Calculation of standard deviation
\[ \text{Standard deviation or } \sigma = \sqrt{\frac{\Sigma fdx^2}{n} - \left(\frac{\Sigma fdx}{n}\right)^2} \times i = \sqrt{\frac{2435}{500} - \left(\frac{-635}{500}\right)^2} \times 5 = 9.0 \text{ years.} \]
Where $\sigma_2^2$ stands for the square of the standard deviation after
corrections, $\sigma_1^2$ for the square of the standard deviation before correction
and $h^2$ for the square of the magnitude of the class intervals.
\[ \sigma_1^2 = 141.07 \]
\[ h = 10 \]
\[ \text{Therefore } \sigma_2^2 = 141.07 - \frac{10^2}{12} = 132.74 \]
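The correction just applied amounts to subtracting h²/12 from the uncorrected variance. A minimal Python sketch, with an illustrative function name and the figures of the example above:

```python
from math import sqrt


def sheppard_corrected_variance(variance, class_width):
    """Variance of grouped data after subtracting h^2 / 12."""
    return variance - class_width ** 2 / 12


corrected = sheppard_corrected_variance(141.07, 10)
print(round(corrected, 2))         # 132.74
print(round(sqrt(corrected), 2))   # 11.52, corrected standard deviation
```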
Therefore
\[ a_{12} = \frac{n_1 a_1 + n_2 a_2}{n_1 + n_2} \]
Where $a_{12}$ stands for the mean of a series and $a_1$ and $a_2$ for the means
of its component parts, and $n_1$ and $n_2$ for the number of items in the
two component parts respectively.
If, further, $\sigma_1$ and $\sigma_2$ stand for the standard deviations of these
component parts and $\sigma_{12}$ for the standard deviation of the whole series,
then
\[ \sigma_{12} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}} \]
where $d_1 = a_1 - a_{12}$ and $d_2 = a_2 - a_{12}$.
                      Series A   Series B
Number of items       100        500
Mean                  50         60
Standard deviation    10         11
Solution.
Combined mean or
\[ a_{12} = \frac{n_1 a_1 + n_2 a_2}{n_1 + n_2} = \frac{(100 \times 50) + (500 \times 60)}{100 + 500} = \frac{35{,}000}{600} = 58.3 \]
\[ \sigma_{12} = \sqrt{\frac{100[(10)^2 + (-8.3)^2] + 500[(11)^2 + (1.7)^2]}{100 + 500}} = \sqrt{\frac{16889 + 61945}{600}} = \sqrt{131.39} = 11.5 \]
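The combined mean and combined standard deviation can be recomputed with a brief Python sketch. It assumes that the deviations d₁ and d₂ are measured from the combined mean, as the figures -8.3 and 1.7 in the working suggest; the function names are illustrative.

```python
from math import sqrt


def combined_mean(n1, a1, n2, a2):
    """Weighted mean of two component series."""
    return (n1 * a1 + n2 * a2) / (n1 + n2)


def combined_sd(n1, a1, s1, n2, a2, s2):
    """Standard deviation of the pooled series from the two components."""
    a12 = combined_mean(n1, a1, n2, a2)
    d1, d2 = a1 - a12, a2 - a12          # component means minus combined mean
    return sqrt((n1 * (s1 ** 2 + d1 ** 2) + n2 * (s2 ** 2 + d2 ** 2)) / (n1 + n2))


print(round(combined_mean(100, 50, 500, 60), 1))        # 58.3
print(round(combined_sd(100, 50, 10, 500, 60, 11), 1))  # 11.5
# Equal sizes and equal means: reduces to sqrt((s1^2 + s2^2) / 2).
print(round(combined_sd(100, 50, 10, 100, 50, 11), 2))  # 10.51
```

Working with unrounded deviations gives about 11.46 before rounding, which agrees with the 11.5 obtained from the rounded figures.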
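\[ \sigma_{12} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}} \]
Thus, if in the above example, the number of items in each case
was 100 and if the mean in each case was 50, the combined standard
deviation by the first method would have been
\[ \sqrt{\frac{100(100 + 0) + 100(121 + 0)}{100 + 100}} = \sqrt{\frac{10000 + 12100}{200}} = \sqrt{\frac{22100}{200}} = \sqrt{\frac{221}{2}} = 10.51 \]
If we apply the second rule then
\[ \sigma_{12} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}} = \sqrt{\frac{100 + 121}{2}} = \sqrt{\frac{221}{2}} = 10.51 \]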
We can know from elementary algebra that the sum of the first
$n$ natural numbers is
\[ \frac{n(n + 1)}{2} \]
\[ \frac{n(n + 1)(2n + 1)}{6} \]
Thus the sum of the squares of natural numbers 1 to 5 would
be 1 + 4 + 9 + 16 + 25 = 55. It is equal to
\[ \frac{5(5 + 1)(10 + 1)}{6} = \frac{5 \times 6 \times 11}{6} = 55 \]
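Both closed forms are easy to confirm by brute force. The following Python sketch, with illustrative function names, checks them against direct summation.

```python
def sum_first_n(n):
    """1 + 2 + ... + n = n(n + 1) / 2."""
    return n * (n + 1) // 2


def sum_squares_first_n(n):
    """1^2 + 2^2 + ... + n^2 = n(n + 1)(2n + 1) / 6."""
    return n * (n + 1) * (2 * n + 1) // 6


print(sum_first_n(5), sum_squares_first_n(5))   # 15 55

# Compare the formulas with direct summation for a range of n.
for n in range(1, 51):
    assert sum_first_n(n) == sum(range(1, n + 1))
    assert sum_squares_first_n(n) == sum(k * k for k in range(1, n + 1))
```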
We have seen in Example No. 9 (Direct method No. 2) that
\[ \sigma = \sqrt{\frac{\Sigma m^2 - (\Sigma m)^2/n}{n}} \]
45-50 47.5 110
50--55 52.5 111-1
......... ... ............ .." ......... ............
----
200 200
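The value of the mean when 43 was misread as 53 is given by
\[ 40 = \frac{1}{200}(2.5f_1 + 7.5f_2 + \dots + 37.5f_8 + 42.5f_9 + 47.5f_{10} + 52.5f_{11} + \dots) \]
Let the value of the corrected mean be $\bar{x}$.
\[ \text{Then } \bar{x} = \frac{1}{200}\left(2.5f_1 + 7.5f_2 + \dots + 37.5f_8 + 42.5(f_9 + 1) + 47.5f_{10} + 52.5(f_{11} - 1) + \dots\right) \]
Let $2.5f_1 + \dots + 37.5f_8 + 47.5f_{10} + 57.5f_{12} + \dots = s$
\[ \text{Then } 40 = \frac{1}{200}(s + 42.5f_9 + 52.5f_{11}) \quad \text{or} \quad s + 42.5f_9 + 52.5f_{11} = 8000 \]
\[ \text{and } \bar{x} = \frac{1}{200}\left[s + 42.5(f_9 + 1) + 52.5(f_{11} - 1)\right] \]
\[ = \frac{1}{200}\left[s + 42.5f_9 + 52.5f_{11} + 42.5 - 52.5\right] \]
\[ = \frac{1}{200}(8000 - 10) = \frac{7990}{200} = 39.95 \]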
\[ \frac{1}{200}\left[45000 - 150\right] = \frac{44850}{200} = 224.25 \]
$\sigma^2 = s^2 - d^2$ where $d$ is the difference between the actual and
assumed mean.
In this example $s^2 = 224.25$ and $d = (40 - 39.95) = 0.05$
\[ \therefore \sigma^2 = 224.25 - 0.0025 = 224.2475 \]
\[ \therefore \sigma = 14.97 \]
The corrected standard deviation corresponding to the corrected
distribution is 14.97.
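The correction can be retraced step by step in Python. The sketch below assumes, as the working above indicates, 200 items with class mid-values 42.5 and 52.5 for the true and misread values, an uncorrected mean of 40, and a total of squared deviations about 40 equal to 45000; the variable names are illustrative.

```python
from math import sqrt

n = 200
wrong_mean = 40
wrong_sum_sq_about_40 = 45000        # Σf(x - 40)² before correction
misread_mid, true_mid = 52.5, 42.5   # 43 (class 40-45) was read as 53 (class 50-55)

# Corrected mean: remove the misread mid-value, add the true one.
corrected_mean = (n * wrong_mean - misread_mid + true_mid) / n

# Corrected second moment about the old mean 40.
s2 = (wrong_sum_sq_about_40
      - (misread_mid - wrong_mean) ** 2
      + (true_mid - wrong_mean) ** 2) / n

# Shift from the assumed point (40) to the corrected mean: σ² = s² - d².
d = wrong_mean - corrected_mean
corrected_sd = sqrt(s2 - d ** 2)

print(corrected_mean)            # 39.95
print(s2)                        # 224.25
print(round(corrected_sd, 2))    # 14.97
```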
Example 17. The mean age and standard deviation of a group
of 100 persons (grouped in intervals 10-, 12-, ... etc.) were found
to be 32.02 and 13.18. Later it was discovered that the age 57 was
misread as 27. Find the corrected mean and standard deviation.
Solution. The age 57 belongs to the group 56-58 (mid-value
= 57) and the age 27 belongs to the group 26-28 (mid-value = 27).
Let the misread frequencies of these two groups be $f_1$ and $f_2$. Then
the corrected frequencies will be $(f_1 + 1)$ and $(f_2 - 1)$ respectively. All
other frequencies have been entered correctly.
Mid-value   Frequency (wrong)   Frequency (correct)
57          f₁                  f₁ + 1
27          f₂                  f₂ - 1
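\[ 29.93 = \frac{1}{1000}\left[t + 16075\right] \quad \therefore t = 29930 - 16075 = 13855 \]
\[ \bar{x} = \frac{1}{1000}\left[t + (28 \times 10) + (121 \times 20) + (198 \times 30) + (176 \times 35) + (27 \times 50)\right] = \frac{30005}{1000} = 30.005 \]
Again let $\Sigma f_i (x_i - 29.93)^2 = T$, where $f_i$ are the correct frequencies.
Then the second moment about 29.93, when the errata was not
considered, is given by :
\[ s^2 = \frac{1}{1000}\left[T + (28 \times 397.2049) + (121 \times 98.6049) + (198 \times .0049) + (176 \times 25.7049) + (27 \times 402.8049)\right. \]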
C - J2fl
Modulus is equal to standard deviation multiplied by the square
root of 2 or
\[ C = \sigma \times \sqrt{2} \]
Like standard deviation this measure is also based on the second
moment about the mean.
Precision
It is the reciprocal of modulus.
Thus
\[ \text{Precision} = \frac{1}{\sigma\sqrt{2}} \]
Probable error
It is equal to .67449 × standard deviation.
Modulus, precision and probable errors are used in the theory of
errors of observations. We shall discuss them in the chapters on Sampling.
Standard deviation should not be confused with the term "Standard
Error" which stands for the standard deviation of simple sampling.
The concept of standard error will also be discussed in detail in the
chapters on Sampling.
Variance
It is equal to the square of the standard deviation or in other words
it is the second moment about the mean.
Coefficient of variation
It stands for the percentage which the value of the standard deviation
is to the value of the mean. In other words, if the standard deviation
is divided by the mean and multiplied by 100 we get the coefficient
of variation. This measure was first suggested by Professor Karl
Pearson. According to him, coefficient of variation is the "percentage
variation in the mean, the standard deviation being treated as the total
variation in the mean."
Symbolically
\[ \text{Coefficient of variation or } V = \frac{\sigma}{a} \times 100 = \text{Coefficient of standard deviation} \times 100 \]
Thus, if the mean of a series is 50 and the standard deviation is 10,
the coefficient of variation would be
\[ \frac{10}{50} \times 100 \text{ or } 20\% \]
It means that the standard deviation is 20% of the mean.
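As a one-line illustration in Python (the function name is made up here), the coefficient of variation for the figures just quoted is:

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


print(coefficient_of_variation(10, 50))   # 20.0
```

Being a percentage, the figure is free of the units of the data, so it can be compared across series expressed in different units.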
Gini's mean difference
Corrado Gini, an Italian statistician, has suggested that instead
of measuring dispersion from any measure of central tendency, the
mean difference between the values of all possible pairs of the variable
should be found out, and it would give a good measure of dispersion.
Thus, this measure of dispersion is equal to the mean difference (regardless
of algebraic signs) of each possible pair of the values of the variable.
Symbolically
\[ \text{Gini's mean difference} = \frac{g}{m} \]
Where $g$ stands for the total of the differences in the values of all
possible pairs of a variable and $m$ stands for the total number of differences.
The total number of differences would be equal to $\tfrac{1}{2}n(n - 1)$.
The following example would illustrate the above formula :-
Example 19. Find out Gini's mean difference from the following
items :-
22, 24, 26, 28, 30.
Solution
30-22 = 8     28-22 = 6
30-24 = 6     28-24 = 4
30-26 = 4     28-26 = 2
30-28 = 2
Total = 20    = 12
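The pairwise differences of this example can be enumerated in Python with `itertools.combinations`. This is only a sketch of the definition above; it also prints the mean deviation and standard deviation of the same items for comparison.

```python
from itertools import combinations
from math import sqrt

items = [22, 24, 26, 28, 30]

# Absolute difference for every possible pair of values.
differences = [abs(x - y) for x, y in combinations(items, 2)]
g = sum(differences)          # total of the differences
m = len(differences)          # number of pairs, n(n - 1) / 2
gini_mean_difference = g / m

mean = sum(items) / len(items)
mean_deviation = sum(abs(v - mean) for v in items) / len(items)
sd = sqrt(sum((v - mean) ** 2 for v in items) / len(items))

print(g, m, gini_mean_difference)   # 40 10 4.0
print(mean_deviation)               # 2.4
print(round(sd, 2))                 # 2.83
```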
The mean deviation of the above series = 2.4 and the standard
deviation = 2.8.
Gini's mean difference is always more than the mean deviation
as it gives greater importance to extreme variations. The value of
Gini's mean difference lies in the fact that it studies the variations
among the values of a variable rather than from a central value.
If the square root of the average of the squares of all differences is
Thus jZ-::j~ x j 2( 55 1)
-~/~xJ}
-J 40 X 2..00:J20
5 2
d
tion.
--_----- :±: 1 stan- :±: 2 stan- ::I: 3 stan-
dud dev;,,- dard devia- dard devia-
tion tion tion
The range in both the cases is Rs. 2,000 and the mean deviation
is Rs. 666.7 in both the cases. The absolute measures of dispersion
are thus equal but the variation in the two series is, in reality, not
identical. If, however, we calculate relative measures of dispersion this
anomaly would be removed. The coefficient of range in the two cases
would be 1 and 1/11 respectively and similarly the mean coefficient of
dispersion would be 2/3 and 2/33 respectively. In comparing the dispersion
of two series, expressed in different units, the use of relative measures
of dispersion is inevitable because absolute measures of dispersion
in such cases would be in different units.
Lorenz curve
Dispersion can be studied graphically also with the help of what
is called the Lorenz Curve, after the name of Dr. Lorenz who first studied
the dispersion of the distribution of wealth by the graphic method. The
technique of drawing a Lorenz Curve is not very difficult. In it the size
of items and the frequencies are both cumulated and, taking the total as
100, percentages are calculated for the various cumulated values. These
percentages are plotted on a graph paper. If there is proportionately
equal distribution of the frequencies over various values of a variate, the
points would lie in a straight line. This line is called the "Line of Equal
Distribution." If, however, the distribution of items is not proportionately
equal, it indicates variability, and the curve would be away from
the line of equal distribution. The farther the curve is from this
line, the greater is the variability in the series. The following example
would illustrate the procedure of drawing a Lorenz curve :-
Example 16. Draw a Lorenz curve from the following data :-
Income   No. of persons   No. of persons   No. of persons
         (Group A)        (Group B)        (Group C)
10       5                8                15
20       10               7                6
40       20               5                2
50       25               3                1
80       40               2                1
To draw the Lorenz Curve from the above data the size of the item
and frequencies would have to be cumulated and then percentages would
have to be calculated by taking the respective totals as 100. This is
done in the following table :
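        Income                  Group A                 Group B                 Group C
Size   Cumu-   Cumu-     No. of  Cumu-   Cumu-    No. of  Cumu-   Cumu-    No. of  Cumu-   Cumu-
       lated   lated %   persons lated   lated %  persons lated   lated %  persons lated   lated %
10     10      5         5       5       5        8       8       32       15      15      60
20     30      15        10      15      15       7       15      60       6       21      84
40     70      35        20      35      35       5       20      80       2       23      92
50     120     60        25      60      60       3       23      92       1       24      96
80     200     100       40      100     100      2       25      100      1       25      100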
Now the cumulative percentages would be plotted on a graph paper.
Percentages relating to the number of persons would be shown on the
abscissa, and from left to right the scale would begin with 100 and end
with 0. The income percentages would be shown on the ordinate, and
here the scale will begin with 0 at the bottom and go up to 100 at the top.
The above percentages would give the following type of curve :
From the above figure it is clear that in the first group of persons,
the distribution of income is proportionately equal so that 5% of the
income is shared by 5% of the population, 15% of the income by 15%
of the population and so on. It gives the line of equal distribution.
In the second group the distribution is uneven so that 5% of the income
is shared by 32% of the people and 15% of the income by 60% of the
people. In the third group the distribution is still more unequal so that
5% of the income is shared by 60% of the people and 15% of the income
by 84% of the people. The variation in group C is thus greater than the
variation in group B. Curve C is thus at a greater distance from the
line of equal distribution than curve B.
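The cumulative percentages behind the three curves can be generated with a short Python sketch. The group labels A, B and C follow the discussion above, and the helper name is illustrative; plotting is left out, since only the points to be plotted are computed.

```python
from itertools import accumulate


def cumulative_percentages(values):
    """Running totals expressed as percentages of the grand total."""
    total = sum(values)
    return [100 * c / total for c in accumulate(values)]


income = [10, 20, 40, 50, 80]
persons = {
    "A": [5, 10, 20, 25, 40],
    "B": [8, 7, 5, 3, 2],
    "C": [15, 6, 2, 1, 1],
}

income_pct = cumulative_percentages(income)
print(income_pct)                     # [5.0, 15.0, 35.0, 60.0, 100.0]
for group, counts in persons.items():
    # Each pair (persons %, income %) is one point on that group's curve.
    print(group, cumulative_percentages(counts))
# A [5.0, 15.0, 35.0, 60.0, 100.0]
# B [32.0, 60.0, 80.0, 92.0, 100.0]
# C [60.0, 84.0, 92.0, 96.0, 100.0]
```

Group A reproduces the income percentages exactly, which is the line of equal distribution; the further a group's percentages run ahead of the income percentages, the further its curve lies from that line.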
The Lorenz curve has a great drawback. It does not give any
numerical value of the measure of dispersion. It merely gives a picture
of the extent to which a series is pulled away from an equal distribution.
It should be used along with some numerical measure of dispersion. It
is very useful in the study of income distributions, distributions of land
and wages, etc.
Questions
1. What is meant by dispersion? What are the methods of computing
measures of dispersion ? Illustrate the practical utility of such methods.
eM. C_., Ail••, 194').
2. Explain the meaning of the term dispersion and distinguish between absolute
and relative measures of dispersion. (B. Com., Allahabad, 1946).
3. Discuss the various ways in which the differences in the characteristics of
frequency distributions are generally measured. (B. Com., Lucknow, 1957).
4. Explain the various methods of describing the scatter of a frequency
distribution and say what you know as to the relative worth of the relative measures.
(B. Com., Nagpur, 1944).
5. Frequency distributions may either differ in the numerical size of their
averages though not necessarily in their formations or they may have the same values
of the average but differ in their respective formations.
Explain and illustrate how the measures of dispersion afford a supplement to the
information about the frequency distributions given by the averages.
(M. Com., Rajputana, 1952).
6. Define carefully the mean deviation, standard deviation and quartile
deviation of any given distribution. In what problems should each be used ?
(M. A., Allahabad, 1940).
7. What are the mathematical properties of standard deviation? How is it a
better measure of dispersion than the mean deviation or quartile deviation ?
8. What is meant by Sheppard's Corrections? Under what conditions should
these corrections be made ?
9. Define dispersion. Why is it necessary to measure dispersion in order to
make comparisons of frequency distributions ?
10. What is range? What are its advantages and disadvantages as a measure of
dispersion ?
11. Find directly the standard deviation of the natural numbers from 1 to 10
and verify the answer obtained by a short-cut method.
12. Write short notes on
(a) Lorenz Curve (b) Charlier's Check (c) Gini's Mean Difference (d) Precision
(e) Modulus (f) Root Mean Square deviation.
13. The following table gives weights of one hundred persons. Compute the
coefficient of dispersion by the Method of Limits.
Weight in lbs. of 100 persons
Class-interval No. of persons
85- 95 4
95-105 13
105-115 8
115-125 14
125-135 9
135-145 16
145-155 17
155-165 9
165-175 8
175-185 2
100
14. What are the different measures of dispersion ? The following table gives
the height of one hundred persons. Calculate the dispersion by Range Method.
Height of 100 persons in inches
Height in inches   Frequency
Below 62           2
      63           8
      64           19
      65           32
      66           45
      67           58
      68           85
      69           93
      70           100
15. The following are the marks obtained by a batch of 9 students in a certain
test : -
Serial Number Marks Serial number Marks
(out of 100) (out of 100)
1 68 5 54
2 49 6 38
3 32 7 59
4 21 8 66
9 41
Calculate the mean deviation of the series.
18. Calculate the mean deviation from the following data. What light does it
throw on the social conditions of the community ?
Difference in age between husband and wife in a particular community.
Difference in years   Frequency   Difference in years   Frequency
0- 5                  449         20-25                 109
5-10                  705         25-30                 52
10-15                 507         30-35                 16
15--20 281 35--40 .oJ
19. The following table gives the age distributions of students admitted to a
college in the years 1914 and 1938. Find which of the two groups is more variable in age.
Number of students admitted in
Age 1914 1938
15- 0 1
16- 1 6
17- 3 .4
18- 8 :2
19- ·12 ·5
20- 14 :0
21- 14 7
22- 5 .9·
23- .2 3
24- 3 0
25- 1 O.
.Q 1
26-
27- 1 0
20. Calculate quartile deviation and its coefficient of A's monthly earnings
for a year.
Months   Monthly earnings   Months   Monthly earnings
         Rs.                         Rs.
1        139                7        160
2        150                8        161
3        151                9        162
4        151                10       162
5        157                11       173
6        158                12       175
21. From the following table giving height of students calculate the
Semi-Interquartile Range and the Coefficient of Quartile Deviation.
23. Compute the standard deviation of the rainfall in the various jute-growing
districts of Bengal from the following statement :-
24. Calculate the standard deviation of the following two series. Which shows
greater deviation ?
25. Find standard deviation of the figures in the following table to show whether
the variation is greater in the area or the yield ?
26. The index numbers of prices of cotton and coal shares in 1942 were as under :-
31. Calculate the standard deviation for the following table giving the age
distribution of 542 members of the House of Commons.
Age      No. of members
20-      3
30-      61
40-      132
50-      153
60-      140
70-      51
80-      2
Total    542
32. The following table gives the frequency distribution of expenditure on food
per family per month among working class families in two localities. Find the
arithmetic average and the standard deviation of the expenditure at both places.
Range of expenditure      No. of families
in Rs. per month          Place A    Place B
Rs.  3- 6                 28         39
 "   6- 9                 292        284
 "   9-12                 389        401
 "  12-15                 212        202
 "  15-18                 59         48
 "  18-21                 18         21
~~ ~ 5
(P. C. S., 41).
33. Find the mean yield of paddy and the standard deviation for the distribution
of the results of 3,061 crop-cutting experiments shown in the following table --
Yield of paddy per acre in
Lbs. No. of experiments
0- 400 236
401- 800 481
801-1200 604
1201-1600 576
1601--2000 419
2001--2400 333
2401--2800 217
2801--3200 87
3201--3600 64
3601-4000 23
4001-4400 14
4401-4800 6
4801--5200 1
3061
(B. Com., Bombay, 1945).
34. Calculate the mean and standard deviation of the following series--
Marks Number of students Marks Number of students
1- 5 1 21-25 7
6-10 18 26-30 2
11-15 25 31-35 1
16--20 26
35. Find out the mean and standard deviation of the following data : -
Age under Number of persons Age under Number of persons
dying dying
10 15 50 100
20 30 60 110
30 53 70 115
40 75 80 125
36. Find out the coefficient of variation of the following series :-
Number of Number of
Income persons Income persons
More than 1000 0 More than 500 600
900 50 400 750
800 110 " 300 350
700 200 200 900
600 400 " 100 1000
37. Calculate the standard deviation of the following seri,cs:-
Marks Number of students
More than 0 100
10 90
20 75
30 50
40 25
50 15
60 5
70 0
38. Find out the mean and variance from the following data : -
Factory .A Factory B
Wages No. of No. of
workers workers
Not exceeding Rs. 40 30 45
Exceeding Rs. 40 but not exceeding Rs. 80 25 35
80 120 30 25
120 160 45 40
160 200 25 25
200 240 13 20
240 280 24 5
" 280 320 8 5
Total 200 200
39. A collar manufacturer is considering the production of a new style of collar
to attract young men. The following statistics of neck circumferences are available
based upon measurements of a typical group of college students : -
Mid-value No. of students Mid-value No. of students
(inches) (inches)
12.5 4 15.0 29
13.0 19 15.5 18
13.5 30 16.0 1
14.0 63 16.5 1
14.5 66
Compute the Standard Deviation and use the criterion (X ±3 Standard Deviation)
to determine the largest and smallest size of collars he should make in order to meet
the needs of practically all his customers, bearing in mind that collars are worn, on
average, ! inch larger than neck size. (B. Com., Raj., 1949).
40. Calculate the arithmetic average and the standard deviation of the following
figures and state the percentages of cases which lie outside the mean at distance ±σ,
±2σ, ±3σ, where σ stands for the standard deviation.
148, f45, 141, 116, 96, $II, 87, 89, 91, 91, 102, 95, 108, 120, 139.
41. Find the S. D. of the following frequency distribution : -
Exceeding But not exceeding Frequency
5.5 6.5 4
6.5 7.5 2
7.5 8.5 5
8.5 9.5 7
9.5 10.5 9
10.5 11.5 4
11.5 12.5 2
(M. A., Agra, 1934).
42. The following table relates to the profits and losses of 100 firms. Calculate
the average profits and the standard deviation of profits.
Profits Rs.          Number of firms
5000 to 6000         8
4000 to 5000         12
3000 to 4000         30
2000 to 3000         10
1000 to 2000         5
0 to 1000            5
-1000 to 0           6
-2000 to -1000       8
-3000 to -2000       9
-4000 to -3000       7
43. In any two series, where /1 and /. represent the deviation from a trial average,
100,
X/l -180 ,E/11=245320
XJ.-250 ,EJ,I-4385Q
II .... 100
Calculate the coefficient of variation for the two series.
44. In any two samples, where the variates $X_1$ and $X_2$ are measured in the same
units,
$n_1 = 36$ (summation) $\Sigma x_1^2 = 49428$
$n_2 = 49$ (summation) $\Sigma x_2^2 = 71258$
Compute the values of the Standard Deviations of the two samples. What
additional information is required to calculate the co-efficient of variation of the above
two samples? Indicate the uses of such a coefficient. (B. Com., Lucknow, 43).
45. An analysis of the monthly wages paid to workers in two firms A and B,
belonging to the same industry, gives the following results : -
                                         Firm A    Firm B
Number of wage-earners                   586       648
Average monthly wage (Rs.)               52.5      47.5
Variance of the distribution of wages    100       121
(a) Which firm, A or B, pays out the larger amount as monthly wages ?
(b) In which firm, A or B, is there greater variability in individual wages ?
(c) What are the measures of (i) average monthly wage, and (ii) the variability
in individual wages, of all the workers in the two firms, A and B, taken together ?
(I. A. S., 1951).
46. The following table gives the marks obtained by 100 students : -
Digits (Division of Class-interval)
Marks 0 1 2 3 4 5 6 7 8 9 Total
0-9 2 4 3 1 1 1 12
10-19 5 3 4 2 1 15
20-29 1 7 8 10 5 4 3 2 40
30-39 3 5 10 2 1 1 22
40-49 4 3 2 2 11
100
By calculating the co-efficient of variation in each case, find which team may
be considered more consistent. (I. A. S., 1954).
52. Explain the method of computing the standard deviation of a frequency
distribution from a working origin different from the arithmetical mean.
Calculate the standard deviation for the data given below using the interval,
50-59 as working origin : -
Class-interval Frequency
0- 9 2
10- 19 4
20- 29 23
30- 39 30
40- 49 40
50- 59 45
60- 69 35
70-79 25
80- 89 12
90- 99 9
100-109 6
110-119 10
120-129 3
130-139 1
140-149 1
150-159 3
Total 249
How would the value obtained above be modified if you have to adjust it for
the reason that the data are grouped in class-intervals ? (I. A. S., 1956).
53. The following is a record of the number of bricks laid each day for 20 days
by two bricklayers A and B :-
A- 725, 700, 750, 650, 675, 725, 675, 725, 625, 675,
700, 675, 725, 675, 800, 650, 675, 625, 700, 650.
B- 575, 625, 600, 575, 675, 625, 575, 550, 650, 625,
550, 700, 625, 600, 625, 650, 575, 675, 625, 600.
Calculate the co-efficient of variation in each case, and discuss the relative consistency
of the two bricklayers. If the figures for A were in every case 10 more and
those for B in every case 20 more than the figures given above, how would the answer
be affected ? (M. Com., Banaras, 1950).
54. A distribution consists of three components with frequencies of 200,
250 and 300 having means of 25, 10 and 15 and standard deviations of 3, 4
and 5 respectively. Find the mean and the standard deviation of the combined
distribution. (M. Com., Banaras, 1954).
55. Suppose each measurement in a distribution is multiplied by 2. What
happens to the : -
(a) mean of the distribution
(b) variance of the distribution
(c) standard deviation of the distribution
(d) each of the three if .. is added to each measurement ?
56. Compute the values of arithmetic average, mode, median and standard
deviation for the following observations :
96, 8.... 10.3, 88, 92, 98, 100, 96, 87
92, 94.
57. Suppose a group of children have a distribution of I. Q. Scores with mean
100 and standard deviation 10. If one child with I.Q. 70 is removed, what will be the
effect on the mean and standard deviation ?
58. Three distributions each of 100 members and standard deviation 4.5 units
are located with their arithmetic means at 12.1, 17.1 and 22.1 units respectively. Find
the standard deviation of the distribution obtained by combining the three.
59. The first of the two samples has 100 items with mean and standard deviation
,: If the whole group has 250 items with mean 15.6 and standard deviation
√13.44, find the standard deviation of the second group.
. , (M. A., Beo., Ik/~/, 1~"91
60. The mean and the standard deviation of a sample of 100 observations was
calculated as 40 and 5.1 respectively by a student who took by mistake 50 instead of
40 for one observation. Calculate the correct mean and standard deviation.
61. Co-efficient of variation of two series are 60% and 70%. Their standard
deviatjons are :z [ {lnd z6. What are their arithmetic means?
62. Given: Number Mean Variance
IG~ ~
11 Group 60 5'
1 and II Group combincd 95 u .,
Find the missing items.
63. Indicate the extent of dispersion graphically for the data given in the
following table : -
Years
Income (in thousands)
I '.><
AII
B1
6
t6
55
8
ao
,6
JJ
(8
57
9
18
58
8
ZO
59
10
Z2
60 61
12
36
10
18
6z 6_
J4
zz
u.
110
64. The table given below gives the population and weekly earnings of two
localities-A and B. Represent the data graphically to bring out the inequalities
of distribution of earnings.
Weekly earning
(in Rs. I
o-:to I 2
20-40 6 S
40-00 8 zo
60-80 IS zS
So-IOO 20 4~
65. Find the actual class-intervals from the data given below :
dx -3 -2 -1 0 1 2 3
f 10 15 25 25 10 10 5
,~ n n n
Second Moments about the value x. It is obvious that any moment about
a value other than mean, would be more than the value of the moment
about the mean. Thus the first moment about the mean is 0 because the
sum of the deviations from the mean is always 0. The second moment
about the mean is the variance or the square of the standard deviation.
Just as we can calculate the first and second moments either about the
mean or about any other value, similarly 3rd, 4th, 5th and nth moments
can be calculated either about the mean or about any other value of the
variate. Thus the third moment about the mean or
$\pi_3 = \dfrac{\Sigma f d^3}{n}$ and
Size of   Frequency   Deviations from arbitrary origin (4)    Deviations from mean (5.35)
item (m)  (f)         dx     fdx     fdx²    fdx³             d        d²       fd²      d³       fd³
2         10          -2     -20     40      -80              -3.35    11.22    112.2    -37.6    -376.0
4         15          0      0       0       0                -1.35    1.82     27.3     -2.46    -36.9
8         8           4      32      128     512              +2.65    7.02     56.1     18.6     148.8
10        7           6      42      252     1512             +4.65    21.62    151.3    100.7    704.9
Total     40                 +54     420     1944                               346.9             440.8
The first moment about the arbitrary origin (4) or
$\nu_1 = \dfrac{\Sigma f dx}{n} = \dfrac{54}{40} = 1.35$
The first moment about the mean or
$\pi_1 = \nu_1 - \nu_1 = \dfrac{54}{40} - \dfrac{54}{40} = 0$
The second moment about the arbitrary origin (4) or
$\nu_2 = \dfrac{\Sigma f dx^2}{n} = \dfrac{420}{40} = 10.5$
The second moment about mean or
$\pi_2 = \nu_2 - \nu_1^2 = \dfrac{420}{40} - \left(\dfrac{54}{40}\right)^2 = 8.68$
The third moment about the arbitrary origin (4) or
$\nu_3 = \dfrac{\Sigma f dx^3}{n} = \dfrac{1944}{40} = 48.6$
The third moment about the mean or
$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = \dfrac{1944}{40} - \dfrac{3 \times 54 \times 420}{40 \times 40} + 2 \times \left(\dfrac{54}{40}\right)^3$
$= 48.6 - 42.5 + 4.92$
$= 11.02$
and further
$\gamma_1 = +\sqrt{\beta_1}$
and
$\gamma_2 = \beta_2 - 3$
Thus for example No. 1,
$\beta_1 = \dfrac{(11.02)^2}{(8.68)^3}$ and $\gamma_1 = \sqrt{\dfrac{(11.02)^2}{(8.68)^3}}$
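The conversion from moments about an arbitrary origin to moments about the mean can be checked with a short sketch. The figures are those of the worked example (sizes 2, 4, 8, 10 with frequencies 10, 15, 8, 7 and origin 4); the variable and function names are illustrative only. Carried without intermediate rounding, the third moment about the mean comes to about 11.0, so the last digits differ slightly from the hand calculation.

```python
# Moments of a frequency distribution about an arbitrary origin and about the mean.
sizes = [2, 4, 8, 10]
freqs = [10, 15, 8, 7]
origin = 4  # arbitrary origin used in the worked example

n = sum(freqs)

def raw_moment(order):
    """Moment of the given order about the arbitrary origin."""
    return sum(f * (x - origin) ** order for x, f in zip(sizes, freqs)) / n

v1, v2, v3 = raw_moment(1), raw_moment(2), raw_moment(3)

# Conversion formulas: moments about the mean from moments about the origin.
pi2 = v2 - v1 ** 2
pi3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3

# Coefficients derived from the moments.
beta1 = pi3 ** 2 / pi2 ** 3
gamma1 = beta1 ** 0.5

print("v1, v2, v3:", v1, v2, v3)
print("pi2, pi3:", round(pi2, 2), round(pi3, 2))
print("beta1, gamma1:", round(beta1, 3), round(gamma1, 3))
```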
Need and meaning. In our studies so far, we have discussed the methods
of measuring the central tendency of a frequency distribution and the
methods of studying the concentration of items round the central value.
These measures of central tendency and dispersion do not reveal whether
the dispersal of values on either side of an average is symmetrical or not.
If observations are arranged in a symmetrical order round a measure of
central tendency, we get what is called a "symmetrical distribution."
When plotted on a graph paper such a distribution gives a normal or
ideal curve. A normal curve has many mathematical properties, which
we shall study in a later chapter in which we shall discuss the various types
of theoretical frequency distributions. For the present it would suffice
to say that in a normal distribution the values of the mean, median and
mode coincide and the quartiles are equidistant from the median. It is obvious
that in such cases the sum of the deviations measured from the mean,
median or mode would be 0. We have already mentioned in earlier
chapters that the empirical relationships between various averages and
measures of dispersion hold good only in a symmetrical distribution.
Figure 1.
Figure 2.
Figure No. 3 also gives the shape of a moderately skew curve.
This curve is skewed to the left and in it, the value of mode would be
greater than the value of median and the value of median would be greater
than the value of the mean. Such curves are called negatively skew.
Figure 3.
Tests of skewness
In order to find out whether a particular distribution is skew, certain
tests are usually applied. They are as follows :-
(a) In a skew distribution values of mean, median and mode
would not coincide. The mean and mode would be pulled wide apart
and median would usually lie between them. We have already seen
that in moderately asymmetrical distribution :
Mean = Mode + 3/2 (Median - Mode)
(b) In a skew distribution the two quartiles would not be equidistant
from the median or in other words $(Q_3 - M) - (M - Q_1)$ would
not be 0.
(c) A skew distribution when plotted on a graph paper would not
give a symmetrical bell-shaped curve.
Measures of skewness
The above mentioned tests would indicate whether a particular
distribution is skew or not. If a particular distribution is found to be
skew the next problem that arises is to measure the extent of skewness.
Some distributions may be slightly different from the symmetrical distribution
while others may be very much different from it. Measures
of skewness are meant to give an idea about the extent of asymmetry
in a series.
First measures of skewness. The first measures of skewness are
based on the assumption that in a skew distribution the values of mean,
median and mode do not coincide. This being so, the difference
between any two of these values indicates the extent of skewness.
Thus first measures of skewness are :-
(i) Mean - Mode or (a - Z)
(ii) Mean - Median or (a - M)
(iii) Median - Mode or (M - Z)
The above measures of skewness are absolute measures. For purposes
of comparison it is necessary to have relative measures of skewness.
Relative measures of skewness are obtained by dividing the
absolute measures by any measure of dispersion. The absolute measures
of skewness should not be divided by a measure of central tendency or
average because here the problem is not to study the extent of skewness
in relation to the size of items, but it is to study the asymmetry in relation
to the dispersal of items round a central value. The purpose of studying
skewness is to find out how much more or less do the items on one side
deviate from the items on the other side of a central value. Therefore,
absolute measures of skewness should be divided by a measure of dispersion
rather than a measure of central tendency. Relative measures of
skewness are known as coefficients of skewness.
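Thus
Coefficient of skewness or
$j = \dfrac{a - Z}{\delta_Z}$ .... (i)
or $j = \dfrac{a - Z}{\delta_a}$ .... (ii)
If mode is ill-defined median can be used in place of mode and then
$j = \dfrac{a - M}{\delta_M}$ .... (iii)
or $j = \dfrac{a - M}{\delta_a}$ .... (iv)
$j = \dfrac{M - Z}{\delta_Z}$ .... (v)
or $j = \dfrac{M - Z}{\delta_M}$ .... (vi)
Here $\delta_Z$, $\delta_M$ and $\delta_a$ stand for the mean deviation from the mode, the median and the mean respectively.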
Karl Pearson has given a formula in which the denominator is not
the mean deviation but standard deviation.
Thus $j = \dfrac{a - Z}{\sigma}$ .... (vii)
If mode is ill-defined, Karl Pearson is of opinion that its value
should be estimated on the basis of the empirical relationship which
exists between the values of mean, median and mode in a moderately
asymmetrical distribution. We have seen that in a moderately asymmetrical
distribution
(Mean - Mode) = 3 (Mean - Median)
Thus $j = \dfrac{3(a - M)}{\sigma}$ .... (viii)
The value of the above coefficients of skewness would be 0 for a
symmetrical distribution and for skew distributions it would be a pure
number. These are the two properties of these coefficients and for these
reasons they are regarded as better than other measures. In theory there
are no limits to the values of the coefficient numbers (i), (ii), (iii), (iv),
(v), (vi) and (vii). In actual practice for moderately asymmetrical distributions
all these coefficients (excepting No. viii) vary between ±1. The
theoretical limits of coefficient number (viii) are ±3 (because the
theoretical limits of $\dfrac{a - M}{\sigma}$ are ±1) but they are never reached in actual
practice.
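Karl Pearson's two coefficients can be written as small functions. The sketch below is illustrative only: the function names and the sample figures (mean 50, median 48, mode 44, standard deviation 10) are hypothetical and were chosen so that the empirical relationship Mean - Mode = 3 (Mean - Median) holds exactly, in which case both formulas give the same value.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Coefficient (vii): (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def pearson_skewness_median(mean, median, sd):
    """Coefficient (viii): 3 * (mean - median) / standard deviation,
    used when the mode is ill-defined."""
    return 3 * (mean - median) / sd

# Hypothetical figures for a moderately asymmetrical distribution.
mean, median, mode, sd = 50, 48, 44, 10

j_mode = pearson_skewness_mode(mean, mode, sd)
j_median = pearson_skewness_median(mean, median, sd)
print(j_mode, j_median)
```

A positive value indicates that the mean lies above the mode, and a symmetrical distribution gives 0 under either formula.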
Second Measure of Skewness. The second measure of skewness is
based on the quartiles. It has been said above that in a skewed distribution
$(M - Q_1)$ and $(Q_3 - M)$ would not be equal. A measure of skewness
is thus derived by finding out the difference between these
two values.
Thus
Second measure of skewness $= (Q_3 - M) - (M - Q_1)$
$= Q_3 - 2M + Q_1$
$= Q_3 + Q_1 - 2M$
The above is an absolute measure of skewness. The relative
measure can be obtained by dividing this absolute measure by the sum
of $(Q_3 - M)$ and $(M - Q_1)$.
Thus the coefficient of skewness or
$j = \dfrac{(Q_3 - M) - (M - Q_1)}{(Q_3 - M) + (M - Q_1)} = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$ .... (ix)
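The quartile coefficient (ix) needs only the two quartiles and the median. The following sketch uses a hypothetical function name and hypothetical quartile values, not figures from the text.

```python
def quartile_skewness(q1, median, q3):
    """Coefficient (ix): (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Hypothetical quartiles: the upper quartile lies farther from the median
# than the lower one, so the coefficient is positive.
skewed = quartile_skewness(q1=20, median=28, q3=40)

# Quartiles equidistant from the median give 0.
symmetrical = quartile_skewness(q1=20, median=30, q3=40)

print(skewed, symmetrical)
```

Because the numerator can never exceed the denominator in size, this coefficient always lies between -1 and +1.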
KURTOSIS
Figure 4.
Measures of kurtosis
Kurtosis is measured by coefficient $\beta_2$ or its derivative $\gamma_2$. We
have seen in connection with the study of moments that
$\beta_2 = \dfrac{\pi_4}{\pi_2^2}$
In other words $\beta_2$ is equal to the fourth moment about the mean
divided by the square of the second moment about the mean.
$\gamma_2 = \beta_2 - 3$
The standard value of $\beta_2$ is taken as 3 and the curves with values
of $\beta_2$ less than 3 are called platykurtic and curves with values of $\beta_2$ more
than 3 are called leptokurtic. In a normal or mesokurtic curve the value
of $\beta_2$ is equal to 3. As such for a normal curve the value of $\gamma_2 = 0$, and
in curves which are more flat-topped or more peaked than the normal
curve the value of $\gamma_2$ would be either a minus or plus figure. The
bigger the value of $\gamma_2$ in a frequency distribution, the greater is its
departure from normality.
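A minimal sketch of the kurtosis calculation follows, assuming a discrete frequency distribution given as values with frequencies. The function names and the two small distributions are hypothetical and serve only to show one flat-topped and one peaked case.

```python
def beta2(values, freqs):
    """Fourth moment about the mean divided by the square of the second."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    pi2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    pi4 = sum(f * (x - mean) ** 4 for x, f in zip(values, freqs)) / n
    return pi4 / pi2 ** 2

def classify(b2, tol=1e-9):
    """Name the curve from gamma2 = beta2 - 3."""
    gamma2 = b2 - 3
    if gamma2 > tol:
        return "leptokurtic"
    if gamma2 < -tol:
        return "platykurtic"
    return "mesokurtic"

flat = beta2([-1, 1], [1, 1])            # items spread evenly, none at the centre
peaked = beta2([-2, 0, 2], [1, 10, 1])   # items heavily concentrated at the centre

print(flat, classify(flat))
print(peaked, classify(peaked))
```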
Dispersion, skewness and kurtosis contrasted
Now that we have studied dispersion, skewness and kurtosis, it
will not be out of place to compare and contrast them, as all these measures
are meant to study the formation of a frequency distribution. Dispersion
studies the scatter of items round a central value or among themselves.
It does not show the extent to which deviations cluster below an average
or above it. Measures of skewness study this point. They tell us about
the cluster of deviations above and below a measure of central tendency.
In a normal distribution the deviations below and above an average are
equal while in an asymmetrical distribution they are not equal. Kurtosis
studies the concentration of items at the central part of a series. If the
items concentrate too much in the centre the curve becomes leptokurtic,
and if the concentration in the centre is comparatively little the curve
becomes platykurtic.
Thus we find that measures of dispersion, skewness and kurtosis
study three different aspects of a frequency distribution. Measures of
dispersion throw light on the span within which values of a variable lie.
They study the size of a series. Measures of skewness throw light on
the shape of the series and the size of variation on either side of a central
value. Kurtosis studies the frequencies of a series at the central values.
The theory of skewness and kurtosis has not a very great importance
in economic and social studies, as in these cases a normal distribution
is usually out of question, but the importance of these studies is
very great in biological studies and studies relating to other physical
sciences.
Questions
1. Define moments and discuss the method of calculating moments of dispersion
about the mean.
2. How would you calculate the value of a moment about the mean from the
value of the moment about any arbitrary value ?
3. What is skewness? How does it differ from dispersion? What are the
various measures of skewness which you know ?
4. What is kurtosis? What purpose does it serve? Is the study of kurtosis
useful in economic and social sciences ? If not, why ?
5. Find the Second Moment of Dispersion and a coefficient of skewness from
the data in the following series : -
Size of item Frequl;ncy Size of item Frequency
3 7·5 8S
7 8·S 32-
za 9·S 8
60
6. Find out the mean wage and a coefficient of skewness for the following :-
3~
40 ..,.
men get at the rate of Rs.
..
•• ,.••
5-5 0 ....
4-50 per man
"••
48
roo
u." .... ..
.f
6-50
7-5 0 .. .
...
f' f'
.f 8-5 0 f •
..•• ..
....
87 9-5 0
..
'f
43
2:& .. ff 20--5 0
Ir-50 .. ..
7. Find the coefficient of skewness from the following data :-
Heights of school boys at age 5
Height in No. Height in No. Height in No.
inches inches .inches
28 1 ~6 166 44 567
29
30
0 n 3'44 45 233
I ;8 740 46 89
31 3 39 U 67 47 27
32 S 40 1670 48 19
33 13 41 1614 49 4
34 I 40 42 154 1 50 4
H i 59 4~ 102.8 SI 1
(Bombay, 1935).
8. Find out coefficient of dispersion and a coefficient of skewness from the
following table giving wages of 230 persons and explain their significance.
Wages No. of persons Wages No. of persons
Rs. 70--So 12 lIo--UO 50
SOP-90 18 120--130 45
90-100 ~S 130--140 20
.. lOO-lIO 42. 140-150 S
(B. Com., Agra, 1940).
9. The following table gives the distribution of population in towns A and B
in age groups. Compare the variation and skewness of their frequencies.
Age-group Population in thousands
A B
0-10 is 10
10-2.0 16 12.
Z0-3° IS 2.4
30 -40 IZ '2
40-"-5 0 10 29
50-""60 5 II
60-70 Z 3
Above-70 I I
(B. Com., Agra, 1947).
10. From the following table state in which section the coefficient of skewness
is higher.
Marks obtained Students of sec. A Students of Sec. B.
0.--10 I 0
10--2.0 4 ~
20--3 0 10 10
S0--40 2Z 13
40--5 0 So 42
50--60 35 So
, $0--70 to 10
70--S0 7 8
S0-90 I 2
11. Weekly earnings of two groups of workers in factory A and B are given
and you are asked to compute coefficient of skewness by Pearson's method.
a
:0-:4 IJ
12. Find the mean deviation and standard deviation of the following table giving
the marks obtained by 500 candidates. Calculate also a coefficient of skewness.
lJur.>hcr of marks No. of candidates Number of marb No. of C2ndidsres
f . ~ 10 ,0 Below SO tl)lf.
.. 20 70 •• 60 3S..
.. 50 120 .. 70 4a~
.. 40 ,6. .. 80 100
13. Find the standard and quartile deviations and coefficient of skewness for
the following : -
0-,
Class
,-6
..
PI:Cq'l1coq
8
CbIa
I&-JIG
1l0-:4
Prequency
a4
as
O-IO 10 z.t-a, 20
1_1,
10-13
tria
14
16
u
,o--,a
as-,o 14
u
JZ-J' 7
'70~ Find out the mean deviation, standard deviation and quartile deviation from
the following table. Also calculate a coefficient of skewness.
16. Calculate Karl Pearson's coefficient of skewness from the following data :-
Marks Number of students Marks Number of students
AIac"e 0 1,0 Abo"c ,0 70
10 140 •• 60 ,0
..••., :: I: .. I:
,t.
t
40. 80
(8. C-•• AJItI•• I",).
17. Calculate the value of the 3rd moment about the mean from the data given
in question No. 16 above.
II. Fied out the muca of tJa
aod),. from the data 8ival in question No. I,
above.
19. Find the mean, mode, standard deviation and a co-efficient of skewness
for the following : -
Years under 10, 20, 30, 40, 50, 60.
No. of persons 15, 32, 51, 78, 97, 109.
(P.CoS., 19'11).
20. Calculate Mean, S.D. and Karl Pearson's Co-efficient of skewness from
the following data grouped into unequal step intervals.
t
Clads
p~
1002.0
24
J-9
fJ,
'-4
1-2 .. J
1121
0
obviously means Wholesale Price. The reason for it is that the wholesale
prices are more stable than retail prices and in one locality there is one
wholesale price of a commodity, but the retail price varies from shop
to shop. Besides this, the wholesale prices are more quickly affected
by changes in demand or supply or by other similar factors than retail
prices. Wholesale prices are more sensitive than retail prices. Retail
prices depend on wholesale prices, and, therefore, there is always a time
lag between changes in wholesale prices and changes in retail prices.
As such retail prices cannot give a correct picture of the price level at
a particular moment. But this does not solve the problem of definition
altogether, because we shall have to define the term Wholesale Price.
Wholesale price can be ex-factory price or price including incidental
expenses or price at which wholesalers in a market purchase or sell a
commodity. Again wholesale price may be at one level at the opening
of the market and at another level at the close of the market. A decision
about the exact definition of the term wholesale price would depend on
the purpose of the index number. Generally, wholesale price index
numbers take into account the price at which wholesalers in the market
purchase or sell a commodity. If the price of a commodity is controlled,
then controlled prices are taken into account even though the price in
the black-market may be much higher.
Frequency of price quotations. Yet another thing associated with
the price quotations is a decision with regard to their number. The
question that arises in this connection is, how many quotations should
be obtained per week or per month, as the case may be. There is
no hard and fast rule about the frequency with which price quotations
should be obtained. In general the larger the number of quotations
the better it is. But too many quotations also complicate the problem
of the construction of index numbers. Ordinarily if an index
number is constructed every week one quotation per week is considered
enough. For a monthly index number at least four quotations per
month should be taken. The Economic Adviser's index number of
CHOICE OF AVERAGE
If we have to study the price variations of only one commodity
the simple price relatives are the relevant index numbers and the problem
of a choice of average does not arise. But in actual practice the technique
of index numbers is used to study the changes in general price level and
in such cases more than one commodity have to be taken into account.
When the price relatives of all such commodities have been calculated,
the next problem is to average them into a single figure. Theoretically
any average can be used for the purpose but in practice a choice has to be
made from amongst the mean, median and geometric mean only. Which
of these three averages should be used is a question which requires
B 11 22 33
C 12 6 12
D 13 6.5 13
E 14 7.0 28
TABLE V
Averaging Price Relatives in Fixed
Base Method
Commodity          Price Relative   Price Relative   Price Relative
                   1940=100         1941             1942
A                  100              200              300
B                  100              200              300
C                  100              50               100
D                  100              50               100
E                  100              50               200
Total              500              550              1000
Mean               100              110              200
Median             100              50               200
Geometric Mean     100              87               178.2
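The three averages in Table V can be reproduced with Python's standard `statistics` module (the `geometric_mean` function requires Python 3.8 or later). This is a sketch for checking the table; the dictionary name is illustrative.

```python
from statistics import mean, median, geometric_mean

# Fixed-base price relatives (1940 = 100) of commodities A to E.
relatives = {
    1941: [200, 200, 50, 50, 50],
    1942: [300, 300, 100, 100, 200],
}

averages = {}
for year, values in relatives.items():
    averages[year] = (mean(values), median(values), geometric_mean(values))
    print(year, averages[year][0], averages[year][1], round(averages[year][2], 1))
```

The geometric means agree with the table apart from the last digit, which depends on how the logarithms were rounded in the hand calculation.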
TABLE VI
Averaging Link Relatives in Chain Base Method
Link Relatives
Commodity
1940 1941 1942
A 100 200 150
B 100 200 150
C 100 50 200
D 100 50 200
E 100 50 400
Total 500 550 1100
Mean 100 110 220
Median 100 50 200
Geometric Mean 100 87 204.6
TABLE VII
Calculation of Fixed base Index numbers by the use of Geometric
Mean and Arithmetic Average
Mean 100 87
In the above table the price of commodity A for the year 1954
is double its price in 1953, and of commodity B it is half of the price
of the year 1953. If these two commodities are of equal importance
there should be no change in the value of the index number from 1953
to 1954. If the price of one commodity is doubled and that of the
other is halved there is no change in the general price level. However,
the arithmetic average of the price relatives shows that there is an increase
of 25% in the prices of 1954 as compared to the prices of 1953. The
geometric mean, however, does not reveal any change and it is thus a
correct measure. Similarly in the year 1955 when the price of commodity
A has gone up by 50% and of B has gone down by 50% arithmetic average
does not record any change. A 50% fall is never made good by a 50%
rise. A 50% fall can be compensated by a rise of 100% but since arithmetic
average measures absolute changes a 50% fall has been set off against
a 50% rise, and the index number shows no change. The geometric
mean, however, records a change and shows that the average price has
gone down. Thus, geometric mean is a better measure than arithmetic
average, so far as the construction of index numbers is concerned. It
has another advantage over the arithmetic average inasmuch as it makes
possible the replacement of commodities which have become obsolete
and the inclusion of new ones without affecting the balance of the index.
We shall see a little later that the index numbers which are constructed
by the use of geometric mean are reversible and satisfy the time reversal
test, and make base shifting an extremely easy task. Therefore, in the
construction of index numbers invariably the geometric mean should
be used. The Economic Adviser's Index Number of wholesale prices
in India also uses the geometric mean.
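The argument can be verified numerically. The sketch below uses the price relatives described in the passage (commodity A doubled and B halved in 1954; A up 50% and B down 50% in 1955, both on the base 1953) and also checks the reversibility claim by computing the index backwards from 1954 to 1953. The function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

def arithmetic_index(relatives):
    """Simple arithmetic average of price relatives."""
    return sum(relatives) / len(relatives)

def geometric_index(relatives):
    """Simple geometric mean of price relatives."""
    return prod(relatives) ** (1 / len(relatives))

rel_1954 = [200, 50]   # A doubled, B halved (1953 = 100)
rel_1955 = [150, 50]   # A up 50%, B down 50% (1953 = 100)

am_1954, gm_1954 = arithmetic_index(rel_1954), geometric_index(rel_1954)
am_1955, gm_1955 = arithmetic_index(rel_1955), geometric_index(rel_1955)

# Time reversal: relatives of 1953 prices on the base 1954.
rel_back = [100 * 100 / r for r in rel_1954]
am_product = (am_1954 / 100) * (arithmetic_index(rel_back) / 100)
gm_product = (gm_1954 / 100) * (geometric_index(rel_back) / 100)

print(am_1954, gm_1954)            # arithmetic average shows a rise, geometric mean does not
print(am_1955, round(gm_1955, 1))  # arithmetic average shows no change, geometric mean a fall
print(am_product, gm_product)      # only the geometric mean gives a product of 1
```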
PROBLEM OF WEIGHTING
C Dozen 16 5.6 7.0
D Yard 21 1.5 1.4
Sollllioll. The price reLitive of the current year
_ Current year's price X tOO
Base year's price
and according to this rule the price relatives of the commodities A,
B, C and D for the current year would be respectively 122.5, 160, 125
and 93.3.
The values of the base year
=Quantity of the base year X Price: of the base year, and according
to this rule the values in the base year for conunodities A, B, C and D
would be respectively 112, 12, 89.6 and 31.5.
These figures are used in the following table for calculating
weighted index number : -
Price Relative Values or
Commodi~y of the current year weight Weight X Price
relatives
(I) M (IV)
A 122.5 112 13,720
B 160 12 1,920
C· 125 89.6 11,200
D 93.3 31.5 2,939
Total 245.1 29779
Weightid index number of prices ~ l~ "",~9779
... ~V 245.1
-=121.5
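The weighted average of relatives can be sketched in a few lines. The prices and quantities are those of commodities A to D in this example; the dictionary names are illustrative and the calculation carries the relatives without rounding.

```python
# Base-year quantities, base-year prices and current-year prices of A to D.
base_quantity = {"A": 7, "B": 6, "C": 16, "D": 21}
base_price = {"A": 16, "B": 2, "C": 5.6, "D": 1.5}
current_price = {"A": 19.6, "B": 3.2, "C": 7.0, "D": 1.4}

# Price relative I and base-year value V (the weight) for each commodity.
relative = {k: current_price[k] * 100 / base_price[k] for k in base_price}
weight = {k: base_quantity[k] * base_price[k] for k in base_price}

weighted_index = (
    sum(relative[k] * weight[k] for k in weight) / sum(weight.values())
)
print(round(weighted_index, 1))
```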
Calculation of Index Number by the Weighted Aggregative Method
Commodity   Unit    Quantities of    Prices of the    Prices of the      (q0 × p0)   (q0 × p1)
                    the base year    base year        current year
                    (q0)             (p0)             (p1)
A           Maund   7                16               19.6               112         137.2
B           Kilo    6                2                3.2                12          19.2
C           Dozen   16               5.6              7.0                89.6        112.0
D           Yard    21               1.5              1.4                31.5        29.4
Total                                                                    Σp0q0       Σp1q0
                                                                         245.1       297.8
Where p1 and p0 stand for the prices of the current year and base
year and q0 for the quantities consumed in the base year. The following
example would illustrate the above procedure : -
Example 2. Construct the cost of living index number for 1940
on the basis of 1939 from the following data using the Aggregate Expenditure
method.
Quantity con- Unit
Article~ sumed in Price in Price in
1939 1939 1940
Rs. Paise Rs. Paise
Rice 6 mds. roaund 5'75 6
Wheat 6 ., 5 8
Gmm 1 " 6 9
"
Arhar ,pulse
Ghee
6
"
4 seers
••"
seer
S
2
10
l·S
S:rr 1 md. maund 20 15
S t 12 seers ., 20.50 18
Oll 20 Sc.Qrs maund 4 4.75
Where I stands for the current year's price relative and V for the
values of the base year.
Example 3. Construct the cost of living index number for 1950
on the basis of 1946 from the following data using the family budget
Method.
Articles        Quantity consumed   Unit          Price in     Price in
                in 1946                           1946 (Rs.)   1950 (Rs.)
Rice            5 mds.              per md.       12           16
Bajra           5 mds.              per md.       8            10
Wheat           1 md.               per md.       10           20
Gram            1 md.               per md.       6            12
Arhar           5 mds.              per md.       8            12
Other pulses    2 mds.              per md.       6            8
Ghee            4 kilos             per kg.       2.5          4
Gur             2 mds.              per md.       5            10
Salt            12.5 kilos          per 40 kgs.   8            10
Oil             24 kilos            per 40 kgs.   40           50
Clothing        40 meters           per meter     0.5          1
Firewood        10 mds.             per md.       1            1.6
Kerosene        1 tin               per tin       4            7
House-rent      -                   per house     24           30
Solution. Construction of Cost of Living Index Number
Articles        Quantities    Unit          Price in   Price in   Price relatives   Values consumed   Product of price
                consumed in                 1946       1950       of current        in base year      relative and
                base year                   (Rs.)      (Rs.)      year (I)          (weights) (V)     weight (IV)
Rice            5 mds.        per md.       12         16         133.3             60                7,998
Bajra           5 mds.        per md.       8          10         125               40                5,000
Wheat           1 md.         per md.       10         20         200               10                2,000
Gram            1 md.         per md.       6          12         200               6                 1,200
Arhar           5 mds.        per md.       8          12         150               40                6,000
Other pulses    2 mds.        per md.       6          8          133.3             12                1,599.6
Ghee            4 kilos       per kg.       2.5        4          160               10                1,600
Gur             2 mds.        per md.       5          10         200               10                2,000
Salt            12.5 kilos    per 40 kgs.   8          10         125               2.5               312.5
Oil             24 kilos      per 40 kgs.   40         50         125               24                3,000
Clothing        40 meters     per meter     0.5        1          200               20                4,000
Firewood        10 mds.       per md.       1          1.6        160               10                1,600
Kerosene        1 tin         per tin       4          7          175               4                 700
House-rent      -             per house     24         30         125               24                3,000
Total                                                                               272.5             40,010.1
Index Number of 1950 $= \dfrac{\Sigma IV}{\Sigma V} = \dfrac{40010.1}{272.5} = 147$
In fact the two methods discussed above are the same as weighted
aggregative method and weighted average of relatives method discussed
earlier in connection with weighting of wholesale price index numbers.
The results given by both these methods are always identical.
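That the two methods agree can be checked on the data of Example 3. In the sketch below each article is entered with its 1946 price, its 1950 price and its base-year expenditure (the weight V); the base-year quantity is recovered as V divided by the 1946 price. The list layout and names are illustrative, and the relatives are carried without rounding, so the result is 146.8 rather than the rounded 147.

```python
# (article, price in 1946, price in 1950, base-year expenditure V)
items = [
    ("Rice", 12, 16, 60), ("Bajra", 8, 10, 40), ("Wheat", 10, 20, 10),
    ("Gram", 6, 12, 6), ("Arhar", 8, 12, 40), ("Other pulses", 6, 8, 12),
    ("Ghee", 2.5, 4, 10), ("Gur", 5, 10, 10), ("Salt", 8, 10, 2.5),
    ("Oil", 40, 50, 24), ("Clothing", 0.5, 1, 20), ("Firewood", 1, 1.6, 10),
    ("Kerosene", 4, 7, 4), ("House-rent", 24, 30, 24),
]

total_weight = sum(v for _, _, _, v in items)

# Family budget method: weighted average of price relatives.
family_budget = sum((p1 / p0 * 100) * v for _, p0, p1, v in items) / total_weight

# Aggregate expenditure method: cost of the base-year quantities at
# current prices, as a percentage of their cost at base-year prices.
current_cost = sum((v / p0) * p1 for _, p0, p1, v in items)
aggregate_expenditure = current_cost / total_weight * 100

print(round(family_budget, 1), round(aggregate_expenditure, 1))
```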
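$\dfrac{92}{100} \times 102$
$\dfrac{92}{100} \times \dfrac{102}{100} \times 104$
$\dfrac{92}{100} \times \dfrac{102}{100} \times \dfrac{104}{100} \times 98$
1949   $\dfrac{92}{100} \times \dfrac{102}{100} \times \dfrac{104}{100} \times \dfrac{98}{100} \times 103$
$\dfrac{92}{100} \times \dfrac{102}{100} \times \dfrac{104}{100} \times \dfrac{98}{100} \times \dfrac{103}{100} \times 101$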
BASE SHIFTING
Year    Index A    Index B    Index B spliced to A
1938    180
1939    200        100        (100 × 200)/100 = 200
1940               150        (150 × 200)/100 = 300
1941               160        (160 × 200)/100 = 320
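The splicing arithmetic shown above amounts to multiplying every figure of the new series by the ratio of the old index to the new index in the overlapping year. A minimal sketch, assuming the old index stood at 200 and the new index at 100 in the overlap year 1939; the function and argument names are hypothetical.

```python
def splice(new_index, old_at_overlap, new_at_overlap=100):
    """Express a new index series on the base of the old series."""
    factor = old_at_overlap / new_at_overlap
    return {year: value * factor for year, value in new_index.items()}

index_b = {1939: 100, 1940: 150, 1941: 160}
spliced = splice(index_b, old_at_overlap=200)
print(spliced)
```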
Thus in the above example the ratio of the two index numbers
Year    Per capita income (Rs.)    Cost of living index (Base 1939)
1939    65                         100
1940    70                         110
1941    75                         120
1942    80                         130
1943    90                         150
1944    100                        200
1945    120                        250
1946    150                        350
Solution. Deflating per capita income

Year    Per capita income    Cost of living index    Deflated or real per
        (Rs.)                (Base 1939)             capita income (Rs.)
1939    65                   100                     65.0
1940    70                   110                     70 x 100/110 = 63.6
1941    75                   120                     75 x 100/120 = 62.5
1942    80                   130                     80 x 100/130 = 61.5
1943    90                   150                     90 x 100/150 = 60.0
1944    100                  200                     100 x 100/200 = 50.0
1945    120                  250                     120 x 100/250 = 48.0
1946    150                  350                     150 x 100/350 = 42.9
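The deflating step is the same for every year: money income is multiplied by 100 and divided by the cost of living index. A minimal sketch in Python, with the figures of the example above (the names `INCOME`, `COST_OF_LIVING` and `deflate` are illustrative):

```python
# Per capita income (Rs.) and cost of living index (base 1939 = 100).
INCOME = {1939: 65, 1940: 70, 1941: 75, 1942: 80,
          1943: 90, 1944: 100, 1945: 120, 1946: 150}
COST_OF_LIVING = {1939: 100, 1940: 110, 1941: 120, 1942: 130,
                  1943: 150, 1944: 200, 1945: 250, 1946: 350}


def deflate(money_income, price_index):
    """Real income = money income x 100 / price index."""
    return money_income * 100 / price_index


REAL_INCOME = {year: round(deflate(INCOME[year], COST_OF_LIVING[year]), 1)
               for year in INCOME}

for year, value in REAL_INCOME.items():
    print(year, value)
```

Although money income rises from Rs. 65 to Rs. 150, the deflated series falls throughout the period.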
INDEX NUMBERS
REVERSIBILITY TESTS
As we shall see later on, the only formula which satisfies the factor
reversal test is the "Ideal" formula given by Fisher himself.
Circular Test
Another test applied in index number studies is the circular test.
It is a sort of extension of the time reversal test. Suppose an index
number is constructed for the year 1955 with the base of 1939 and another
index number for 1939 on the base of 1914, then it should be possible
for us to directly get an index number for 1955 on the base of 1914. If
the index number calculated directly does not give an inconsistent value,
the circular test is said to be satisfied. If $P_{01}$ represents the price
change of year 1 on the base of year 0, $P_{12}$ the price change of year 2
on the base of year 1, and $P_{20}$ the price change of year 0 on the base
of year 2, then the following equation should be satisfied:
$$P_{01} \times P_{12} \times P_{20} = 1$$
This test is not fulfilled by most of the common formulae used in the
construction of index numbers. Even Fisher's ideal formula does not
satisfy this test. This test is fulfilled by unweighted or fixed weighted
aggregatives or by index numbers which use simple geometric mean.
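The circular test can be illustrated with a small Python sketch. The prices below are hypothetical (they are not taken from the text), the indices are expressed as ratios rather than percentages, and the function names are illustrative only.

```python
import math

# Hypothetical prices of three commodities in three periods 0, 1 and 2.
PRICES = {
    0: [10.0, 5.0, 8.0],
    1: [20.0, 15.0, 10.0],
    2: [25.0, 12.0, 16.0],
}


def geometric_mean_index(base, current):
    """Simple (unweighted) geometric mean of price relatives."""
    relatives = [c / b for b, c in zip(PRICES[base], PRICES[current])]
    return math.prod(relatives) ** (1 / len(relatives))


def simple_aggregative_index(base, current):
    """Unweighted aggregative index: ratio of the sums of prices."""
    return sum(PRICES[current]) / sum(PRICES[base])


def circular_product(index):
    """P01 x P12 x P20, which equals 1 when the circular test is met."""
    return index(0, 1) * index(1, 2) * index(2, 0)


print(circular_product(geometric_mean_index))
print(circular_product(simple_aggregative_index))
```

For both formulae the product comes out as 1 (up to floating-point rounding), whatever prices are supplied.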
THE PROBLEM OF AN IDEAL INDEX NUMBER
$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0}$$
In the above formula weights are the quantities of the base year.
The weights are fixed. This formula does not satisfy the time reversal
test because when the index number of the base year (based on the current
year) is calculated the quantities of the current year would be taken as
weights. This will be clear from the following example:
Example 8.
Commodity   Price 1940   Quantity   Price 1950   Quantity   p0q0   p0q1   p1q0   p1q1
            (Rupees)     1940       (Rupees)     1950
            p0           q0         p1           q1
A           10           3          20           4          30     40     60     80
B           5            4          15           3          20     15     60     45
Total                                                       50     55     120    125
$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1}$$
This formula differs from the previous one inasmuch as the weights
used here are the quantities of the current year and as such weights are
not fixed. This index number is also not reversible. From the data
given in Example No. 8,
Index number of 1950 = $\frac{125}{55}$
If the year 1950 is now taken as base and the index number of 1940
is calculated, the prices and quantities of 1950 would become respectively
$p_0$ and $q_0$ and those of 1940 $p_1$ and $q_1$ respectively.
Thus
Index number of 1940 = $\frac{50}{120}$
Again we find that the time reversal test is not satisfied as the two
indices are not reciprocals of each other.
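The failure of the time reversal test for both base-year and current-year weighting can be verified with exact fractions. The sketch below assumes the data of the example as recovered from its totals (commodity A: prices 10 and 20, quantities 3 and 4; commodity B: prices 5 and 15, quantities 4 and 3); the function names are illustrative.

```python
from fractions import Fraction

# Prices and quantities of the two commodities in 1940 and 1950.
P_1940 = {"A": 10, "B": 5}
Q_1940 = {"A": 3, "B": 4}
P_1950 = {"A": 20, "B": 15}
Q_1950 = {"A": 4, "B": 3}


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(prices[k] * quantities[k] for k in prices)


def base_weighted(p_base, q_base, p_cur, q_cur):
    """Base-year quantities as weights (first formula)."""
    return Fraction(aggregate(p_cur, q_base), aggregate(p_base, q_base))


def current_weighted(p_base, q_base, p_cur, q_cur):
    """Current-year quantities as weights (second formula)."""
    return Fraction(aggregate(p_cur, q_cur), aggregate(p_base, q_cur))


for formula in (base_weighted, current_weighted):
    forward = formula(P_1940, Q_1940, P_1950, Q_1950)   # 1950 on 1940
    backward = formula(P_1950, Q_1950, P_1940, Q_1940)  # 1940 on 1950
    print(formula.__name__, forward, backward, forward * backward)
```

In neither case is the product of the forward and backward indices equal to 1, which is what the time reversal test requires.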
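(3) DROBISCH AND BOWLEY'S FORMULA
$$P_{01} = \frac{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} + \dfrac{\sum p_1 q_1}{\sum p_0 q_1}}{2}$$
From the data given in Example 8
Index number of 1950 = $\dfrac{\frac{120}{50} + \frac{125}{55}}{2} = \dfrac{257}{110}$
Index number of 1940 = $\dfrac{\frac{55}{125} + \frac{50}{120}}{2} = \dfrac{257}{600}$
Here again the time reversal test is not satisfied.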
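The two fractions can be checked with exact arithmetic. A minimal Python sketch, using only the four aggregates of the example (the variable names are illustrative):

```python
from fractions import Fraction

# Aggregates of the example: S_PxQy is the sum of (price of year x) times
# (quantity of year y), with 0 standing for 1940 and 1 for 1950.
S_P0Q0, S_P0Q1, S_P1Q0, S_P1Q1 = 50, 55, 120, 125

# Arithmetic mean of the base-weighted and current-weighted indices.
index_1950 = (Fraction(S_P1Q0, S_P0Q0) + Fraction(S_P1Q1, S_P0Q1)) / 2
index_1940 = (Fraction(S_P0Q1, S_P1Q1) + Fraction(S_P0Q0, S_P1Q0)) / 2

print(index_1950, index_1940, index_1950 * index_1940)
```

The product of the two indices is 66049/66000, which is close to but not equal to 1.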
(4) FISHER'S IDEAL FORMULA
The formula given by Fisher is a geometric cross of Laspeyre's
and Paasche's formulae.
In other words
$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$
This formula satisfies both the time reversal test and the factor
reversal test. From the data given in Example 8
Index number of 1950 = $\sqrt{\frac{120}{50} \times \frac{125}{55}}$
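And if 1950 is taken as base, in which case the prices and quantities
of 1950 would respectively be $p_0$ and $q_0$, and the index of 1940 is
calculated (the price and quantity of 1940 would now be $p_1$ and $q_1$
respectively), it would be
Index number of 1940 = $\sqrt{\frac{55}{125} \times \frac{50}{120}}$
The two index numbers are reciprocals of each other and the time
reversal test is satisfied.
Symbolically
$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$
and
$$P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}}$$
Thus,
$$P_{01} \times P_{10} = 1$$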
This formula satisfies the factor reversal test also. In factor reversal
test besides the index of price change, the index of quantity change has
also to be calculated. In calculating quantity index the weights are the
prices; in other words, the positions of $p$ and $q$ are interchanged. Thus,
quantity index of the current year or
$$q_{01} = \sqrt{\frac{\sum p_0 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_1 q_0}}$$
For factor reversal test
$$P_{01} \times q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$
In Fisher's Ideal formula
$$P_{01} = \sqrt{\frac{120}{50} \times \frac{125}{55}}$$
and
$$q_{01} = \sqrt{\frac{55}{50} \times \frac{125}{120}}$$
Thus
$$P_{01} \times q_{01} = \sqrt{\frac{120}{50} \times \frac{125}{55} \times \frac{55}{50} \times \frac{125}{120}} = \frac{125}{50}$$
The value of $\frac{\sum p_1 q_1}{\sum p_0 q_0}$ is also equal to $\frac{125}{50}$. Thus the factor reversal
test is satisfied.
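Both tests can be confirmed numerically for the ideal formula. A minimal Python sketch using the four aggregates of the example (50, 55, 120 and 125); the variable names are illustrative:

```python
import math

# S_PxQy is the sum of (price of year x) times (quantity of year y),
# with 0 standing for 1940 and 1 for 1950.
S_P0Q0, S_P0Q1, S_P1Q0, S_P1Q1 = 50, 55, 120, 125

# Price index of 1950 on 1940, and of 1940 on 1950.
price_index_1950 = math.sqrt((S_P1Q0 / S_P0Q0) * (S_P1Q1 / S_P0Q1))
price_index_1940 = math.sqrt((S_P0Q1 / S_P1Q1) * (S_P0Q0 / S_P1Q0))

# Quantity index of 1950 on 1940: p and q interchange their roles.
quantity_index_1950 = math.sqrt((S_P0Q1 / S_P0Q0) * (S_P1Q1 / S_P1Q0))

time_reversal_product = price_index_1950 * price_index_1940
factor_reversal_product = price_index_1950 * quantity_index_1950
value_ratio = S_P1Q1 / S_P0Q0

print(time_reversal_product, factor_reversal_product, value_ratio)
```

The first product is 1 and the second equals the value ratio 125/50, apart from floating-point rounding.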
This formula does not satisfy the circular test. Though Fisher has
called it an ideal formula, its practical utility is not much because in its
calculation each time fresh weights have to be used and generally it is
very difficult to have correct information about weights each time the
index number is prepared. Moreover the computation of index number
by this formula involves difficulties in calculation.
The two indices are reversible but in the computation of this index
number also the current year's weights are needed. It has the same
defect which Fisher's ideal index has.
(b) A variation is possible in the above-mentioned formula if
instead of arithmetically crossed weighted aggregatives geometrically
crossed weighted aggregatives are taken into account. If it is done the
formula becomes more scientific. Walsh considers this formula as the
best from the theoretical point of view.
Thus
$$P_{01} = \frac{\sum \sqrt{q_0 (q_1 p_1)}}{\sum \sqrt{q_0 (q_1 p_0)}}$$
The data given in Example 8 would be used in the following manner
to construct index numbers in accordance with this formula.

Commodity   Price    Quantity   Price    Quantity   q0q1p1   sqrt(q0q1p1)   q0q1p0   sqrt(q0q1p0)
            1940     1940       1950     1950
            (p0)     (q0)       (p1)     (q1)
A           10       3          20       4          240      15.5           120      10.9
B           5        4          15       3          180      13.4           60       7.7
Total                                                        28.9                    18.6

Index number of 1950 = $\frac{28.9}{18.6}$ and
Index number of 1940 = $\frac{18.6}{28.9}$
This index number also satisfies the time reversal test but here also
the difficulty is the same as in the previous two formulae, i.e., it also
requires current weights, which are very difficult to obtain.
(6) KELLY'S FORMULA
Questions
1. "It is really questionable, though bordering on heresy to put the question,
whether we would be any worse off if the whole bag of tricks relating to index numbers
were scrapped." Commenting on the above statement, discuss the utility of the index
number in modern times.
2. "The pernicious nature of tying wages to cost of living indexes is apparent.
The whole scheme is positively Machiavellian in its acceptance of deceptions as a
necessity in politics. And does it really work so well after all? The truth is that
it is too inefficient even to keep the workers standardized." Comment.
3. "Index numbers are economic barometers." Explain this statement and
mention what precautions should be taken in making use of any published index
numbers. (B. Com., Allahabad, 19'z).
4. On what basis should commodities be selected for purposes of constructing
a wholesale price index number?
5. Describe with illustrations the construction of a weighted index number
of wholesale prices and show its importance. (B. Com., Nagpur, I9U).
6. Distinguish between fixed base and chain base index numbers. What are
their respective merits and demerits?
7. Examine the claims of (a) geometric average and (b) chain base method in
the technique of index number construction. Illustrate your answer with illustrations.
(B. Com., Delhi, 19H).
8. Show giving suitable examples the importance of the use of index numbers
in interpreting economic effects.
(B. Com., Allahabad, 1946).
9. What points would you take into consideration in choosing the base and
determining the weights in the preparation of cost of living index numbers?
(B. Com., Agra, 1943).
10. What are the main sources of errors in cost of living index numbers?
How can these errors be avoided? (B. Com., Allahabad, 1938).
11. Write a note on the construction of an index number of industrial production.
12. Explain Fisher's ideal method of constructing index numbers. Do you
think that this method can be adopted in the construction of index numbers in this
country? If not, why?
13. Discuss time reversal and factor reversal tests. Do you think it is necessary
that a good index number should satisfy them? If so, why?
14. Define an index number. Explain the role of weights in the construction
of an index of a general price level. (M. A., Rajputana, 1950).
15. "The real problem for the maker of index numbers is whether he shall leave
weighting to chance or seek to rationalise it." (Mitchell)
Distinguish clearly between chance weighting and rational weighting and
suggest a solution of the above problem. Also discuss whether Fisher's ideal formula
offers a rational system of weighting. (M. Com., Allahabad, 19' I).
16. Discuss the problem of obtaining a perfect formula for the index number of
prices. Explain fully what is meant by reversibility of an index number.
(M. Com., Allahabad, 1940).
17. "... the method of index numbers is at once applicable to the disentanglement
of that which is common to the whole group from those variations which are special
to individual items." Elucidate. (M. Com., Rajputana, 1942).
18. "Index numbers seek to set aside the irregularity of individual instances
and replace it by the regularity of big numbers." Comment. (M. A., Punjab, 1953).
19. Describe briefly the method you will adopt for the compilation of cost of
living numbers for working classes in an industrial area. (B. Com. Hons., Andhra, 1944).
20. "The discussion of the proper weights to be used has occupied a space in
statistical literature out of all proportion to its significance. For it may be said that
no great importance need be attached to the special choice of weights; one of the most
convenient facts of statistical theory is that given certain conditions the same result
is obtained with sufficient closeness whatever logical system of weights is applied."
(Bowley). Discuss the above statement. (M. Com., Allahabad, 1948).
21. Write short notes on:
(a) Base shifting, (b) Splicing of index numbers, (c) Deflating of index
numbers.
22. You are required to construct a cost of living index for the textile workers
of a city. What information will you collect for the purpose? Explain the method of
constructing the index. (I. A. S., 1943).
23. Compute the index numbers for each year from the following average annual
wholesale price of jute in Calcutta in rupees per bale of 400 lbs. for the period 1914 to
1930:
Year    Rupees    Year    Rupees
1914    78        1922    88
1915    H         1923    78
1916    67        1924    ;'6
1917    56        1925    Jl2.
1918    72        1926    99
1919    92        1927    76
1920    98        1928    7}
1921    94        1929    71
                  1930    5"
24. Construct appropriate index numbers to discuss the fluctuations in the
export of raw cotton and raw jute from India for the period 1930-31 to 1935-36,
using the average of the period 1926-30 as base.
            Raw cotton      Raw cotton       Raw jute        Raw jute
Year        quantity        value            quantity        value
            (1000 tons)     (lakhs of Rs.)   (1000 tons)     (lakhs of Rs.)
1926-30     609             5,941            826             2,924
(average)
1930-31     701             4,633            620             1,288
1931-32     423             2,;45            587             1,119
1932-33     365             2,037            J 63            973
1933-34     104             z,7H             748             1,093
1934-35     6.13            3,495            752             1,087
1935-36     607             3,377            771             1,311
(I. C. S., 1939).
25. Use the following data of industrial production in India to compare the
annual fluctuations in Indian industrial activity by the chain base method:
Index Numbers of Industrial Production in India
Year Index No. Index No.
12.Q 149
-2.1 rz2. 15 6
n6 -29 (37
I~O .-3 0 16.2
1%0 - 31 149
-3 z 160
--33 160
(M. Com., Lucknow, 19H).
26. The following table gives the average wholesale prices of the commodities
A, B and C:
Average wholesale prices in Rupees
Commodity   1944   1945   1946   1947   1948   1949   1950   1951
1. A        50.6   61.6   66.8   71.0   70.6   72.0   72.0   7~.6
2. B        6.8    6.4    J.6    6.2    6.4    7.8    6.0    6.8
3. C        29.6   25.8   26.4   28.6   28.6   30.2   d.o    34.6
Find out the index numbers (i) by reference to 1944 as base year (ii) by the chain
base method.
27. You are given the following series of index numbers of wholesale prices of
four commodities and a straight index based on the average. Calculate a new index
for the five years based on the chain method.
Year A B C D Total Average
I
1949 33 0 616 3 84 371 1,671 41 8
19,0 171 660 35.:& %40 1,424 35 6
19" 176 1 6~6 35" %40 1,408 ~S2
28. Which average would you use in computing the price Index Number from
the following data for 1934 on the basis of 1930? Give reasons.
100
eM. C_., LIltA:mJ.,. 19jO.J
            Prices                Quantity
Crops    Base year    1947     Base year    1947
1 12 20 50 120
2 10 12 100 70
3 14 15 60 70
4 16 18 30 50
5 18 20 40 40
6 :a 15 70 60
7 20 16 90 100
8 !l.-_ 18 80 80
Find the index numbers for 1947 by (1) Base year weighting, (2) Current year
weighting, (3) Fisher's Ideal Formula.
32. Given the following data, what index numbers would you use for purposes
of comparison? Give reasons.
Rice Wheat Jowar
Year Price Quantity Price Quantity Price Quantity
19 2 7 9.3 100 6.4 It S·l 1
1934 4·5 90 3.7 40 2·7 3
Prices and quantities are given in arbitrary units.
(M. A., Cal., 1947).
33. Explain what is meant by Factor Reversal Test. Construct with the help
of the data given below Fisher's Ideal Index, and show how it satisfies the factor
reversal test.
Estimated total produce Harvest price per
in thous:md tons in mannM in fHRt.,.;,.....·
- -·dts"Trict Sarau -- Sa~-----
3". The following figures show the imports of cotton piece-goods into India
from Great Britain during 1913-14 and a few post-war years. Find (a) index numbers
of quantity, (b) index numbers of value, and (c) index numbers of price, using the figures
of 1913-14 as base.
...
I,
5
S
6
75
00
00
8
9
6 00
00
00
Arhat pulse 6 " 8 00 IC 00
Ghee 4 "
seers seet 2 no 1 50
Sugar 1 md. maund 1.0 00 15 00
Salt
Oil
Oothing
I2 seers
20
50 yds.
. seer
,.
ya.rd
20
4
0
50
00
50
18
0
a
00
75
75
Firewood I2 mds. maund 0 50 1 Ill:
Kerosene I tin °tin 4 00 3 12
House-reot bouse 10 75 12- 75
40. Construct the cost of living Index Number for 1950 on the basis of 1940
from the following data using the Family Budget Method.
Gur           2 mds.         per md.       5      10
Salt          12.5 seers                   8      10
Oil           24 seers                     40     50
Clothing      40 yds.        per yard      .5     1
Firewood      10 mds.        per md.       1      1.6
Kerosene      1 tin          per tin       4      7
House-rent                   per house     24     30
41. An enquiry into the budgets of the middle class families in a city in
England gave the following information:
Expenses on      Food     Rent    Clothing    Fuel     Misc.
                 35%      15%     20%         10%      20%
Prices (1928)    £150     £30     £75         £25      £40
Prices (1929)    £145     £30     £65         {,z ..   [.4'
What changes in cost of living figures of 1929 as compared with that of 1928
are seen? (B. Com., Lucknow, 1944).
42. An average family of industrial workers in a certain town consumed during
August 1939, 1.5 maunds of foodgrain, 10 yds. of cloth, 2 maunds of fuel, and 1 tin
of kerosene oil and paid Rs. 15 as house rent. Foodgrain then sold at an average price
of Rs. 6 per maund, cloth at 8 as. per yard, and fuel at Rs. Z.4 per md., while a tin of
kerosene at Rs. ,. By August 1943, the average prices of foodgrains and cloth had
risen to three times and:r.j. times the pre-war average, respectively, fuel rose to Rs. S
per maund and house rent to Rs. 20. The solitary exception was kerosene whose price
fell by 8 annas per tin.
Express in quantitative terms the rise that took place in the cost of living of
industrial workers in the given town in August 1943, as compared with August 1939,
making clear your method of approach. (M. Com., Allahabad, 1948).
43. Given below are two sets of indices, one with 1939 as base and the other
with 1947 as base.
(a) Year    Index Nos.    (b) Year    Index Nos.
    1939    100               1947    100
    1940    120               1948    110
    1941    150               1949    90
    1942    200               1950    98
    1943    300               1951    101
    1944    350               1952    110
    1945    370               1953    98
    1946    380               1954    96
    1947    400
The index number (a) with 1939 base was discontinued in 1947. It is desired to
splice the second index number (b) with 1947 base to the first index number for the sake
of continuity. How will it be done, so that the combined series has a common base
of 1939?
44. You are given a sufficient number of family budgets which show the total
expenditure of each family, and the number of children, adult males, and adult females
in each family. Explain how you will employ the data to derive the most suitable weights
that may be given for the different costs of maintaining (a) one adult male, (b) one adult
female, and (c) one child. (I. A. S., 1947).
45. What considerations should enter in the selection of the base period for the
computation of serial index number? When and how will you give effect to a shift
in the base originally selected? Explain with illustrations the meanings of the terms:
Factor reversal test, Fisher's Ideal Index Number. (I. A. S., 1952).
46. $Q_i$ and $P_i$ respectively denote the quantity purchased and the price per unit
of each of $n$ commodities ($i = 1, 2, \ldots, n$) in the 'base' year; $q_i$ and $p_i$, the
corresponding measures in the current year.
What exactly do the following ratios indicate, and test whether each of them
satisfies (a) the time reversal test and (b) the factor reversal test:
(i) $\sum p_i / \sum P_i$; (ii) $\sum Q_i p_i / \sum Q_i P_i$;
(iii) $\sum q_i p_i / \sum q_i P_i$; (iv) $\sum (Q_i + q_i) p_i / \sum (Q_i + q_i) P_i$;
(v) $\sqrt{\dfrac{\sum Q_i p_i}{\sum Q_i P_i} \times \dfrac{\sum q_i p_i}{\sum q_i P_i}}$
In all cases summation extends over all the values of $i$, and the ratio is multiplied
by 100. (I. A. S., 1943).
47. Define an 'Index number'.
The average of wholesale prices was higher in 1937 than in 1936 by 15.1 per
cent, the index numbers for the two years being 108.7 and 94.4 respectively (1930 = 100).
This increase followed rises of 6.1, 1.0 and 2.8 per cent, each year being compared with
the preceding. In 1933 prices were the same as in 1932 but 1.1 per cent below 1931.
Prices in 1931 were 13.2 per cent below 1930.
From these data compute the index numbers for each year from 1930 to 1937.
(P. C. S., 1956).
48. Explain the methods by which index numbers of volume and price of national
production are prepared, and discuss to what extent each method is satisfactory.
(I. A. S., 1947).
49. You are required to construct a cost of living index for textile workers of a
city. Indicate what information you would collect for the purpose, and explain the
method of constructing the index.
(I. A. S., 1948).
50. How will you construct an index number of prices that will exhibit with great
sensitiveness movements in the general price level? Examine from this point of
view the (Indian) Economic Adviser's Index Number of Wholesale Prices. (I. A. S., 1949).
51. What are index numbers of prices and for what purposes are they used?
Describe the general method of construction of a wholesale price index number,
illustrating your remarks with the help of any official index in current use in India.
(I. A. S., 1955).
52. What is Fisher's Ideal Formula for preparing Index Numbers? What are
'Time Reversal' and 'Factor Reversal' tests?
Compute an appropriate index number for purposes of comparison from the
following data:
          Rice                 Wheat                Jowar
Year      Price    Quantity    Price    Quantity    Price    Quantity
1935      4        50          3        10          2        5
1945      10       40          8        5           4        4
(Prices and quantities are stated in arbitrary units.) (I. A. S., 1956).
53. One of the important problems during the Second Plan period is to keep a
watch on inflation of prices. For this it is convenient to define a suitable index number
of prices and study its short term changes. As is usual with the construction of index
numbers there are difficulties in (a) choice of a suitable formula, (b) choice of a base
period, (c) periodical collection of statistics, etc.
You are required to prepare a brief report, discussing the various issues involved
and giving your recommendations about building a series of index numbers of prices,
keeping in view the purpose for which this series is intended. Also suggest at what
interval the index number should be computed. You need not write an essay on the
various ways of computing index numbers but only justify the procedures you
recommend.
(I. A. S., 1958).
As has been said earlier, in these diagrams only the length of the bars
or lines is taken into account. Since only one dimension of the figure
is taken into account these diagrams are known as one-dimensional
diagrams. The bars which are drawn can be of any width or thickness.
It has no effect on the diagram. However, the thickness should not be
too much as otherwise bars would appear like rectangles and give a
misleading impression. Such diagrams are also known as Bar Diagrams.
Bar diagrams can be of three types:
(i) Simple bar diagrams.
(ii) Multiple bar diagrams.
(iii) Sub-divided bar diagrams.
In simple bar diagrams one bar represents only one figure and as
such there will be as many bars as the number of figures. Such diagrams
represent only one particular type of data. For example, the number
of students in a university year after year can be represented by such bars.
Each bar would represent their number in a particular year. Multiple
bar diagrams are prepared on the basis of simple bar diagrams. These
diagrams represent more than one type of data at a time. Thus, if for
each year two bars are constructed, one representing the number of male
students and the other representing the number of female students, the
diagram would be a multiple bar diagram. Similarly, there can be
diagrams in which 3, 4 or 5 or even more bars are constructed at the same
DIAGRAMMATIC REPRESENTATION OF DATA
Fig. 1
If the number of items is very large, bars have to be replaced by
simple lines. The technique of drawing such diagrams is the same
as in the previous case. The only difference is that the bars will have
no thickness. The data given below in table II are represented by lines
in figure No. 2:
TABLE II
Heights of 32 students of M. Com. final class
Serial No.      Height        Serial No.      Height
of student                    of student
1               4' 11"        17              5' 5"
2               5' 0"         18              5' 5"
3               5' 0"         19              5' 6"
4               5' 1"         20              5' 6"
5               5' 2"         21              5' 7"
6               5' 2"         22              5' 7"
7               5' 2"         23              5' 7"
8               5' 3"         24              5' 7"
9               5' 3"         25              5' 7"
10              5' 3"         26              5' 8"
11              5' 3"         27              5' 8"
12              5' 4"         28              5' 9"
13              5' 4"         29              5' 10"
14              5' 4"         30              5' 10"
15              5' 5"         31              5' 10"
16              5' 5"         32              6' 0"
Heights of Students
[Line diagram: heights on the vertical axis (0 to 80), serial numbers of
students on the horizontal axis (0 to 32)]
Fig. 2
Fig. 3
In all the above examples the figures given were already in ascending
order. If the figures are not in either ascending or descending
order they have to be so arranged for facility of comparison. But this
can be done only if the figures do not relate to a series of years. For
example, it is not possible to first write the figures of 1945 and then of
1948 and then of 1944 with a view to secure an ascending or descending
series. Such arrangement is possible only if the data are not in the
shape of time series.
TABLE IV
19 60 - 61 610.~ 62.4. 6 5
The data given in the above table can be very suitably represented
by a multiple bar diagram. The figures of exports and imports each
year can be represented by two types of bars placed side by side. Such
diagrams facilitate comparison. The figures given in table IV above
can be represented by a multiple bar diagram in the following manner:
[Multiple bar diagram: Exports and Imports]
Fig. 4
TABLE V
TABLE VI
(Crores of Rupees)
Total 114°·S
-----
610.1
--
1:.73·~
[Sub-divided bar diagram; legend: Miscellaneous, Industries, Agriculture,
Social Service, Irrigation, Transport]
Fig. 15
In the above figure the lengths of the bars are in proportion to
the total expenditure incurred by the three types of government. Each
bar is then divided in six parts, each part representing the expenditure
on one of the six heads. It would be noted that the expenditure of
Central Government is highest on transport and next highest on
irrigation. The various parts within a bar are arranged in the order
in which the Central Government has spent money. The arrangement
of the parts in all the three bars is identical. This diagram not only
gives us information about the total expenditure of the three types of
governments on the first Five Year Plan but also the distribution of
this amount over various major heads. From this diagram we can
also compare the expenditure of various governments on any particular
head, say transport or agriculture or industries, etc. Such sub-divided
bars can be used to represent the time series also if the data are
available about various parts, period after period. For example, if we have
got the figures of the number of students of Allahabad University during
the last ten years divided facultywise we can have ten bars to represent
the total number of students in the last ten years and each bar can
be divided in four parts on the basis of the strength of students in each
of the four faculties of Arts, Science, Commerce and Law. The
following table gives such figures about the Allahabad University for
a period of six years.
TABLE VII
[Sub-divided bar diagram: students of Allahabad University by faculty
(Arts, Science, Commerce, Law)]
Fig. 7
TABLE VIII
[Bar diagram: Exports and Imports, 1960-61 to 1963-64]
Fig. 8
B ';00 34 0 +4CJ
D 4~0 4~5 +n
E 52.0 53°\ -~. l()
G 4(.10 3 80 -~o
- J no p.e -50
Differences in the number of houses in ten small towns from 19,6 to 1966
[Bar diagram of the differences for each town]
Fig. 9
In all the above diagrams bars have been used to represent the actual
figures. Many times comparison of the data is done on a relative basis.
In such cases also, simple or multiple bar diagrams can be used. Even
sub-divided bars can be used for the purpose. If the data regarding the
cost of production of a particular commodity and its sale price are
available for a number of years, sub-divided percentage bars can be drawn
to show the percentage cost of each item to the total cost. It is also
possible to draw bar diagrams which show the percentage of profits
and cost to the total turnover. In the following table the data about the
cost, proceeds, profit and loss per chair in the years 1963, 1964 and 1965
are given.
TABLE X (a)
Cost, Proceeds, Profit or Loss per chair
during 1963, 1964 and 1965
Particulars              1963      1964      1965
                         (Rs.)     (Rs.)     (Rs.)
Cost per chair
(1) Wages                12        10        11
(2) Other costs          8         7         7
(3) Polishing            4         3         3
Total cost               24        20        21
Proceeds per chair       25        20        20
Fig. 10
In the above diagram first of all three bars of equal size have been
drawn. They represent the sale proceeds of each of the three years.
Then the percentage of profit for the year 1963, which is 4, is shown at the
bottom of the bar. From this level polishing charges, which are 16%, are
measured. From this level (which is now 20% from the base line) another
part is cut at a distance of 32% (or 52% from the base line). The
remaining portion or 48% represents the wages. Similarly, the bar for
the year 1964 has been drawn. There is no profit in this year and so this
bar contains only three divisions representing polishing charges, other
expenses and wages. In the year 1965 there is a loss of 5% so that the
total cost is 105% of the sale proceeds. This loss of 5% is shown below
the base line and from this level the percentages of other parts are marked.
The bar is thus divided according to the various items of cost. It
should be noted that out of the percentage of polishing expenses in 1965
(which is 15) 5% is below the base line and 10% above it.
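The percentages used in drawing such bars can be worked out as follows. This is a minimal Python sketch assuming the per-chair figures of Table X (a) (wages 12, 10, 11; other costs 8, 7, 7; polishing 4, 3, 3; proceeds 25, 20, 20); the names `CHAIR_DATA` and `percentage_components` are illustrative.

```python
# Per-chair figures in rupees for the three years.
CHAIR_DATA = {
    1963: {"wages": 12, "other costs": 8, "polishing": 4, "proceeds": 25},
    1964: {"wages": 10, "other costs": 7, "polishing": 3, "proceeds": 20},
    1965: {"wages": 11, "other costs": 7, "polishing": 3, "proceeds": 20},
}


def percentage_components(row):
    """Express each cost item, and the profit or loss, as a percentage of
    the sale proceeds. A negative profit is a loss, drawn below the base line."""
    proceeds = row["proceeds"]
    costs = {name: value for name, value in row.items() if name != "proceeds"}
    parts = {name: 100 * value / proceeds for name, value in costs.items()}
    parts["profit"] = 100 * (proceeds - sum(costs.values())) / proceeds
    return parts


for year, row in CHAIR_DATA.items():
    print(year, percentage_components(row))
```

For 1965 the profit component is negative, which corresponds to the 5% drawn below the base line.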
Though bar diagrams can be used to show many sub-divisions
it is not worthwhile to use them if the number of such divisions is large,
because in such cases comparison becomes somewhat difficult. Due to
disparity of figures in different bars various sub-divisions may be thrown
wide apart from each other.
TWO-DIMENSIONAL DIAGRAMS
Rectangles
As has been said earlier, in such diagrams the size of the items is
represented by the area of the rectangle. As such, not only their lengths
are taken into account but also their breadths, because the area of a
rectangle is equal to the product of its length and breadth. When two
figures are to be shown by the areas of two rectangles, two methods
can be adopted: either their breadths may be kept equal and their lengths
in proportion to the two figures or their lengths can be kept equal and
their breadths in proportion to the size of the two figures. In both
the cases the area of the rectangles would be in proportion to the size of
the figures. Generally the lengths are kept equal and the breadths in
proportion to the size of items. The data given below in Table XI are
represented in figure 11 in the shape of rectangles:
TABLE XI (a)
Expenditure on the first Five Year Plan
(in crores of rupees)
TABLE XI (b)
(in Percentages)
                         Central Govt.           Part A States           Part B States
Head                     Per cent   Cumulative   Per cent   Cumulative   Per cent   Cumulative
Irrigation and power     27.1       27.1         21.43      21.43        33.77      33.77
Transport and
Communication            24.0       51.1         33.01      54.44        9.28       43.05
Social Services          20.5       71.6         15.44      69.88        31.52      74.57
Agriculture and
Development              17.5       89.1         15.02      84.90        20.87      95.44
Industries               8.4        97.5         11.82      96.72        2.93       98.37
Miscellaneous            2.5        100.0        3.28       100.00       1.63       100.00
Total                    100.0                   100.00                  100.00
[Rectangles for Central Govt., Part A States and Part B States]
Fig. 11
TABLE XII
Cost of Production, Profits and Number of Units Produced by two
factories A and B
Particulars               Factory A     Factory B
                          (Rupees)      (Rupees)
Wages                     2000
Raw Materials             3000
Total cost                5000
Profits                   4000          1000
Total Sales               9000          4800
Number of units
produced and sold         1000          800
In the above table sale prices per unit in case of Factories A and
B are respectively Rs. 9 and Rs. 6. Two rectangles would be drawn
whose breadths would be in the ratio of 9 : 6 and whose lengths would
be in the ratio of 1000 : 800. These rectangles would then represent the
total sale proceeds and within them various divisions would be made to
represent the amount of different items of cost and the amount of profit.
Cost of Production, Sale Proceeds and Profits of a commodity produced
by factories A and B
Fig. 12
The first rectangle is 3" long and three items (profits, materials and
wages) whose parts are to be cut are in the ratio of 4000 : 3000 : 2000.
The vertical scale is divided in this very ratio to get the sub-divisions
of profits, materials and wages. Similar calculations have been made
in case of the second rectangle representing the profits and the cost of
Factory B.
Squares
Amongst two-dimensional diagrams, sometimes squares give a
better comparison than rectangles and bars. They are specially useful
when some items of the series have values much higher than others.
In such cases if a bar diagram is drawn then the bars representing big
figures would be very big in size while the bars representing the smaller
figures would be comparatively very small. If, for example, two values
are in the ratio of 1600 : 100 one bar would be 16 times the length of
the other. In such cases, squares give a better comparison because in
case of squares the area is taken into account. In the above case where
two figures are in the ratio of 16 : 1 the sides of the squares would be
in the ratio of 4 : 1 though their area would be in the ratio of 16 : 1.
In the construction of squares first of all the square root of the
various figures is calculated and then squares are drawn with the lengths
of their sides in the same proportion as the square roots of the original
figures. The area of the squares would be in the same proportion as
the ratio of original figures. The following table gives the figures of
the production of coal in some countries in the year 1951. These
figures are represented in the shape of squares in figure 13.
Country              Production
                     (00,00,000 tons)
U. S. A.             130.1
Russia               44.0
United Kingdom       16.4
India                ~·3
Fig. 13
The scale in the above diagram is calculated as follows:-
The area of any square is first calculated. Thus the area of the
square representing the production of the United Kingdom is .55 x .55 or
.3025 square inch. This area represents a production of 16.4 million
tons. Therefore one square inch would roughly represent 54 million
tons.
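The square-root rule and the calculation of the scale can be checked with a short computation. The following is only a sketch in Python, not part of the original text; the function names are illustrative, and the side of .55 inch for the United Kingdom square is the one quoted in the passage above.

```python
import math

def square_side(value, reference_value, reference_side):
    """Side of a square whose area is proportional to `value`.

    The reference square (side `reference_side`) stands for `reference_value`.
    """
    return reference_side * math.sqrt(value / reference_value)

def units_per_square_inch(value, side):
    """Scale of the diagram: how many units one square inch represents."""
    return value / (side * side)

# Two figures in the ratio 1600 : 100 give sides in the ratio sqrt(16) : 1.
side_ratio = square_side(1600, 100, 1.0)

# Production in million tons, as read from the table above.
uk_side = 0.55
russia_side = square_side(44.0, 16.4, uk_side)
scale = units_per_square_inch(16.4, uk_side)

print(f"side ratio for 1600 : 100 = {side_ratio:.1f}")
print(f"side of the Russia square = {russia_side:.2f} inch")
print(f"one square inch represents {scale:.1f} million tons")
```

Any other square of the diagram can serve as the reference; the scale obtained is the same because all the areas are in one proportion.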
For facility of comparison, as also for saving space, the total of the
figures can be represented by one square and the other figures can be
shown in the shape of divisions. These divisions would be in the shape
of rectangles. The divisions can be made either horizontally or
vertically. The data given below in table XIV (a) have been represented thus
in figure No. 14 below :-
TABLE XIV (a)
Production of Manganese (000 tons)
Country            Production
Russia             z.z.vO
South Africa       870
Gold Coast         7 11 3
India              747
French Morocco     316
Brazil             179
Egypt              167
Japan              148
Total              5410
The total production of 54,10,000 tons would be represented
by a square which would be divided into various parts to represent the
production of the various countries. The square root of 5410 is
approximately 73.5. If this quantity is represented by 3.7" then, to show
the production of the various countries, this length would be divided into
parts in proportion to their production. The figures obtained
after calculation are shown in table XIV (b).
TABLE XIV (b)
Production of Manganese (000 tons)
Country          Production    Length in inches    Cumulative length in inches
Russia           2.!\JU        1.51                1.51
South Africa     870           0.59                2.10
Fig. 14
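The lengths of the divisions follow from simple proportion: each component takes the same share of the side of the square as it has of the total. A minimal sketch in Python is given below; the function names are illustrative only, and only figures that are legible in table XIV (a) are used.

```python
def division_length(value, total, side):
    """Length cut off along the side of the square for one component."""
    return side * value / total

def cumulative_lengths(lengths):
    """Running totals, i.e. the distances at which the dividing lines fall."""
    running, marks = 0.0, []
    for length in lengths:
        running += length
        marks.append(running)
    return marks

TOTAL = 5410   # thousand tons, the total of table XIV (a)
SIDE = 3.7     # inches, the side of the square

south_africa = division_length(870, TOTAL, SIDE)
india = division_length(747, TOTAL, SIDE)
marks = cumulative_lengths([south_africa, india])

print(f"South Africa: {south_africa:.3f} inch")
print(f"India: {india:.3f} inch")
```

The cumulative column of table XIV (b) is obtained in the same way, by adding each length to the sum of those before it.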
Circle or Pie-Diagrams
Circles occupy a unique place amongst two-dimensional diagrams.
The reason for their popularity lies in the facility and ease with which
they can be drawn. The area of a circle is directly proportional to the
square of its radius. Thus if the radius of a circle is four times the radius of
another circle its area would be sixteen times the area of the other circle.
The areas of circles are always in the ratio of the squares of their radii.
Circles can be used at all places where squares are used. Just as in the
construction of squares the square roots of the various items are calculated,
similarly in the construction of circles the square roots of the various figures
are found out. In the case of squares their sides are kept in the ratio of these
square roots, and in the case of circles their radii are kept in this ratio. The
data given in table XV below are represented by circles in figure No. 15.
TABLE XV (a)
Country          Production
U. S. A.         2200
Venezuela        6z:.z
Russia           301
Saudi Arabia     268
Iran             132
The square roots of these figures and the ratio of the radii of the
various circles have been calculated in Table No. XV (b) below:-
TABLE XV (b)
U. S. A.         2200     46.9     1.02
Fig. 15
In the above diagram, for the calculation of the scale the area of any
circle can be calculated. Thus, the area of the last circle, representing the
production of Iran, is about .2 square inch. If .2 square inch
represents 132 million gallons, then one square inch would represent 660
million gallons. The areas of the circles are in proportion to the figures
of production. Circles look more beautiful than squares and are also
easy to draw. As such, wherever a choice has to be made between
squares and circles the latter should be preferred.
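The radii of the circles and the scale of the diagram can be worked out as in the sketch below. It is written in Python purely for illustration; the function names are not from the text, the area of .2 square inch for the Iran circle is the approximate figure quoted above, and the Venezuela figure is left out because it is not legible in the table.

```python
import math

def units_per_square_inch(value, area):
    """Scale of the diagram from one circle of known area."""
    return value / area

def radius_for(value, scale):
    """Radius of the circle whose area represents `value` at the given scale."""
    area = value / scale
    return math.sqrt(area / math.pi)

# Production in million gallons, as read from table XV (a).
production = {"U.S.A.": 2200, "Russia": 301, "Saudi Arabia": 268, "Iran": 132}

scale = units_per_square_inch(production["Iran"], 0.2)
radii = {country: radius_for(value, scale) for country, value in production.items()}

for country, radius in radii.items():
    print(f"{country}: radius {radius:.2f} inch")
```

The radii come out in the ratio of the square roots of the figures, which is the rule stated for circles.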
Just as in the case of squares it is possible to represent the aggregate
by one big square and the various components by rectangles cut within it,
similarly, in the case of circles the aggregate can be represented by a big
circle and the various components by sectors cut inside it. Such
diagrams are known as Angular Diagrams. Sectors are not difficult to draw
and they look beautiful. This is the reason why they are preferred
to squares.
The areas of the various sectors are in proportion to the angles which
they make at the centre of the circle. The angle at the centre of the
circle is of 360 degrees. This angle of 360 degrees represents the
aggregate. It can be divided into a number of smaller angles whose
degrees would be in proportion to the values of the components.
Table XVI (a) below gives the total expenditure on the First Five Year
Plan and its distribution amongst various types of States.
Government          Expenditure in crores of rupees
Fig. 16
From the data given above the following diagrams can easily be
drawn:
Expenditure of Central Government and Part A
States on First Five Year Plan
(Crores of rupees)
(Angular diagrams: Part A States; Central Govt.)
Fig. 17
Circles can be used at all places where squares and rectangles are
used. Angular diagrams look beautiful but if the number of
components is very large it becomes difficult to show them in this fashion.
In such cases the smaller components should be merged and then the data
should be shown in the shape of a circular diagram. But the merging of
components is not always desirable. It may sometimes create
misleading impressions. It should be remembered that the various sectors in two
or more circles should always be kept in one order to facilitate
comparison.
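The division of the 360 degrees amongst the components of an angular diagram can be sketched as follows. The code is in Python and is only an illustration; the component names and values are hypothetical and are not taken from any table in this chapter.

```python
def sector_angles(components):
    """Angle at the centre for each component, in degrees.

    The whole circle (360 degrees) stands for the aggregate, so each
    component receives the same share of 360 as it has of the total.
    """
    total = sum(components.values())
    return {name: 360.0 * value / total for name, value in components.items()}

# Hypothetical components of an aggregate of 100.
example = {"Component A": 50, "Component B": 30, "Component C": 20}
angles = sector_angles(example)

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

Since the angles are shares of one whole, they always add up to 360 degrees, which is a convenient check before the sectors are drawn.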
THREE-DIMENSIONAL DIAGRAMS
The following table gives the population of four towns in the last
census :-
TABLE XVIII
Population of Four Towns
Town     Population
A        5,00,000
B        1,00,000
C        50,000
D        10,000
Fig. 18-A
" (,) Join C and G. Rub off line BF and FG. The required cube
is ADCGHB.
CO,lJifilimm of CUb6. Three-dimensional diagrams are more difficult
to construct than surface diagrams. Cube roots of the figures cannot be
calculated very easily and though the drawing of a cube is not very
difficult, cylinders and spheres require very great care in construction.
As such three-dimensional diagrams, in general, and cylinders and spheres
in particular are not very popular. But it must be admitted that
three-dimensional diagrams look more beautiful than bars, squares,
rectangles or circles.
Fig. 18-B
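The cube roots referred to above can be found quickly by computation. The sketch below, in Python, assumes by analogy with squares and circles that the sides of the cubes are kept in the ratio of the cube roots of the figures, so that the volumes are in the ratio of the figures themselves; the function name and the choice of a 1 inch side for the smallest cube are illustrative assumptions.

```python
# Population of the four towns, from table XVIII.
population = {"A": 500_000, "B": 100_000, "C": 50_000, "D": 10_000}

def cube_sides(values, smallest_side=1.0):
    """Sides of cubes whose volumes are proportional to the values.

    The smallest value is given a cube of side `smallest_side` (assumed).
    """
    smallest = min(values.values())
    return {
        name: smallest_side * (value / smallest) ** (1.0 / 3.0)
        for name, value in values.items()
    }

sides = cube_sides(population)

for town, side in sides.items():
    print(f"Town {town}: side {side:.2f} inch")
```

A population fifty times as large thus needs a cube whose side is less than four times as long, which is why cubes are suited to figures that differ very widely.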
PICTOGRAMS
TABLE XIX
Teacher-student ratio in University A
1962-6~
AGE-PYRAMIDS
(Age pyramids: India, Census 1951, and U.S.A., Census 1950; population by age group and sex)
Fig. 20
CARTOGRAMS
The regional distribution of data is usually shown by the use of
maps. The distribution of rainfall in various regions of India or the
production of coal in various parts of the country can be shown with
the help of maps. Similarly, the density of population in a particular country
can best be studied by drawing a map and putting down dots, each representing
a certain number of people. Thus, one lakh or ten lakhs of people can
be represented by a dot and the density of population in various regions
can be suitably pictured in this manner. The following map shows
the density of the Indian population :-
Fig. 21
Various methods by which statistical data can be represented by
means of diagrams and pictures have been discussed above. Which
particular diagram should be chosen for a particular type of data is a
question not very easy to answer. The selection of a diagram should
338 FUNDAMENTALS OF STATISTICS
Questions
1. Write a note on the necessity and usefulness of diagrammatic representation
of statistical data.
2. What types of mistakes are commonly committed in the construction of
diagrams ? What precautions are necessary in this connection ?
3. Point out the usefulness of diagrammatic representation of facts and explain
the construction of any one of the different forms of diagrams you know.
(B. Com., Allahabad, 1945).
4. Write short notes on :-
(a) Surface diagrams (b) Volume diagrams (c) Pie diagrams (d) Bar diagrams
(e) Two-dimensional diagrams.
5. The following table gives the details of the cost of the construction of a
house in Allahabad :-
Land Cement 800
Labour Lime 800
Bricks Stone 600
Iron 1800 Sand 200
Timber 1500 Other things 1500
Represent the above figures by a suitable diagram. (B. Com., Allahabad, 1945).
6. Represent the following data by vertical bars constructed on a percentage
basis :-
Proceeds, Cost, Profit or Loss (per pair of shoes) manufactured by Allahabad Shoe Company
in the years 1936 and 1940
                              1940         1936
                              Rs. as.      Rs. as.
Proceeds per pair of shoes    12  8        10  0
Cost per pair
  Wages                        4  0         3  0
  Leather                      8  0         6  0
  Other costs                  1  0         0  8
  Total                       13  0         9  8
                              Loss         Profit
Profit or loss per pair       -0  8        +0  8
(B. Com., Allahabad)
Country Population
(-000)
China 4,'11.770
India 3.5:t,310
Russia 1.61,000
America 1.&4,070
Germany 64.77 6
Japan 64.7 00
United Kingdom 4 6•0 77
France 4 1 ,860
Italy 4°,100
Others 7.0 5,°..:.7.:..7_ _ _ __
Total population of the World 20.11.800
IJ. Rcpresent the following data by a suitable diagram sho\ving thc difference
bctween procecds and costs : -
Year Total proceed, TotaleOSh
1940 22.0 19·5
194 1 2,7·3 :n.7
1942 2,8.2 30 •0
1943 30 .3 2S·6
1944 P·7 26.1
1945 n·, 304. 2
(These figures are imaginary~
14· The follOWing table gives the profit and loss of a concern. Represent
the data by .ub-divided bar-diagrams.
u. The following table gives the national income of India by industrial origin.
Rl:1'resent the data by 8ub-divided bar diagrams : -
N4tiDlla/ l"come of I"tl;4 b.J Industr;ll/ Origi"
(In hundred croT'e/)
Particulars 194 8 -49 1949-5 0 19,0-,1
Agriculture 4:t·S 44·9 48 ,9
Mining, Manufacturing and
Hand trades 14.8 IS·0 I s.3
Commerce, Transport and
Communication • 16.0 16.6 16.9
Other services 13·4 13. 8 14.4
Net domestiG product at
factor cost 86·7 0
9 .5 95·5
Net earned income from
abroad -0.% -0.1 --0.1
:10, The following table gives the quarterly foreign trade of India. Represent
tbe data by a suitable diagram. .
In CroresoiRupees
Exports Imports Difference
<+) (-)
1952--53
Second quarter 48,7 38.9 + 9. 8
Third quarter 151.0 16 7.7 -10·7
Fourth. quarter 140 .3 13 8,3 + z~o
19H-H
First quarter 132.6 130•6 + z.o
Second quarter 119·9 164. 0 -44.1
Third quarter 13 0 .4 148.0 -17.6
Fourth quarter 148.8 12 4. 0 +z4· 8
19S4-SS
First :a=er
Secon quarter
13 2 .9 I29·6 + 3·3
II3·S 14S· z -3 1.7
;no Represent the following data by a suitable diagram :
Main headings of
income of the
1948-49
(lakhs
1949-So-
(lalehs of
19S 0
(lakhs of
-,1.
Central Government
Import duty /
of rupees)
7. z 74
rupees)
12..616
h pees
z,471
)
Production duty 1.06 3 6.78, 6,7S4
Income tax 13.988 n.'H iz.5'71
Other taxes 319 360 '. 661
22. Represent the following data by a circular diagram divided into sectors :
&5· The following table gives the population of India on the basis of religion_
Represent the data by pie'diagram constructed on a percentage basis:-
2,. Utilise the following data to represent diagrammatically the re/aJipe in ere-
Uc in note circulation towards the end of 194' in different countries:-
2.6. Show the details of monthly expenditure of two families given below by
means of two-dimensional diagrams : -
Family A ,Family B
Items of expenditure (Income Rs. ~Ineome Rs.
500 p. m.) 400 p. m.)
(Rupees) (Rupees)
Food 140 12.0
Clothing 80 80
House rent 100 60
Education 30 40
Fuel and JigbtinQ 40 ao
Miscellaneous 40 4°'
(M. A •• Plllljah. 1952.)-
Custom 4 0 50 4S S8
Central Excise Duty 868 Gsz
Corporation Tax 2. 0 4 13 S
Taxes on Income 1574 1410
Salt Bu 10SO
Opium so 46
Other heads ua 15 0
----~~~----.--------------------------------~~~~~~=-~~--
~ (B. Ctllll •• N~gpur.1943)·
The following tabIegives...th.e birth rates and death rate. of a f.w countries
in the world during the year 19,1 : -
~ ; -." -_ -Death rate
Countn' Birtllu'te
Egypt 44 207
Canada 2.4- II
U. S. A.. 19 II.
India
Japan
Germany
,.
33
16
2.4
19
11
France 18 16
Irish Free State 10 14-
United Kingdom 16 lIZ
Soviet Russla 40 II
A.ustralia 200 9
New Zealand 18 8
Palestine H 20J
Sweden IS 120
Norway 17 II
~ ....b
M
...:..'"'" ....
~
..,_ .0- ...,
H .....
..... "".....~
.. .2- ..2-
.,;, ~ ..b .,;, b :......
0-
~ 0- '"
2- ~ '"
~
Material.
- --- --- --- -- -
57 20S 55 36 3S 38
---------
17
2020 1i)
Labour 10 8 II II II IZ 7 5 8
Over-head 14- 10 15 16 17 10 Ia 9 la
Total
---
61
--- ---
45 61
- 63 --- --- - - --- ---
63 70 41 31 46
/
Draw a 2taph of the different component of costs as percentage of the total
~o. Show by suitable diagrams the absolute as well as relative changes in the:
student population of the colleges A and B in the different departments from 1940 to
1947 : -
Subject A B
1940 1947 1940 1947
Arts 300 3S o 100 2.00
Science 12.0 500 150 2.5 0
Commerce 2.00 6so 13 0 15 0
Law 100 300 100 12.0
(B. Co".., Agra, 1948).
31. Indicate the diagrams you would consider most appropriate to use for
representing each of the following classes of statistical data, stating briefly the reason for
your choice :-
(a) Distribution of a large number of candidates according to the number of
marks scored by each at a public examination.
(b) Marks scored by two selected candidates in each of the different subjects tested
at an examination.
(c) Total value of Indian Exports and Imports during the years 1938 to 1955.
(d) Distribution of Assets of all Indian Life Assurance Companies put together
as at January 19, 1956.
(e) Middle class Cost of Living Index Numbers in Bombay and Calcutta during
the years 1938 to 19H.
(f) Distribution of age, sex and civil condition of persons enumerated at the
Census in 1951. (I. A. S., 1956).
32.. Diagrammatically compare the following itatistics : -
Country Acres
India 7·S
Denmark 40.0
Holland 26.0
.Germany aI·S
France 20·S
Belgium 14·5
Britain 20.0
U.S.A. 145.0
(Source : Congress ,Agrarian Reforms Committce Report, 195 0 )
33. "Give me an undigested heap of figures and I cannot see the wood for the
trees. Give me a diagram and I am positively encouraged to forget detail until I have
a real grasp of the overall picture. Diagrams register a meaningful impression almost
before we think."-Moroney.
Discuss the utility of diagrams and elucidate the above statement.
-34. The following tabie gives the number of students appearing at various exa·
minations from a college in 1958, 1961 and 191140 -
BUIDlllations I Number of Students appearing
---- - \---1918 1961 --"19 64
a~ I ~ ~
B. Com. I 100 us
..B,..--,S,C.,-________ i ISO 2.50
Total I 4,0 675,
Represent the above data by a suitable diagram.
55. The table given below shows the percent of the worlF done in tae manu-
facturing sections as against the allotted quota.
Sections Monday Tuesday Wednesday Thursday Saturday Weekly
Jan. 2.5 2.6 2.7 2.8 30 Total
A ~ ~ n ~ 55
B 70 6S 80 85 100 100
C 6, 51 7S So 75
75 So 100 100
D
-110
Allotted quota for each workday 100%. Draw a Gantt'iCilart-£rOm: {In: sbcvt'
dat..
36. The following table gives information of outlay in th6 two five year plans
of India under IDl\jor heads of development expenditure : -
(Figure: co-ordinate axes with origin O and the plotted points)
TABLE I
Variable x    Variable y
12            8
-10           6
-6            -6
4             -5
Thus P is plotted at a point where x has a value of 12 and y of 8.
The distance of P from the ordinate is 1.2" indicating a value of 12 of
the x-variable and its distance from the abscissa is 1.6" indicating a value of
8 of the y-variable. Since both the values are positive the point P is in the
first quadrant. On the x-axis the distance from the point of origin to any other
point is known as the x-co-ordinate and on the y-axis it is known as the y-co-ordinate.
These two distances are called the co-ordinates of the point which is
GRAPHIC REPRESENTATION OF DATA 349
plotted. Thus the co-ordinates of point P are 12 and 8 ; the x-co-ordinate
is 12 and the y-co-ordinate is 8. Co-ordinates are expressed in terms of x
and y. Q, R and S are points whose x and y co-ordinates are respectively
-10 and 6, -6 and -6, and 4 and -5. The co-ordinates of these points
are indicated by dotted lines in the above figure. In actual practice only
the points are plotted and lines are not drawn to show the
co-ordinates.
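The quadrant in which a point falls depends only on the signs of its two co-ordinates. This can be sketched in Python as follows; the function name is illustrative, and the four points are those of Table I.

```python
def quadrant(x, y):
    """Return the quadrant (1 to 4) of the point (x, y), or 0 if it lies on an axis."""
    if x > 0 and y > 0:
        return 1
    if x < 0 and y > 0:
        return 2
    if x < 0 and y < 0:
        return 3
    if x > 0 and y < 0:
        return 4
    return 0

# Co-ordinates of the four points of Table I.
points = {"P": (12, 8), "Q": (-10, 6), "R": (-6, -6), "S": (4, -5)}
quadrants = {name: quadrant(x, y) for name, (x, y) in points.items()}

print(quadrants)
```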
Thus, the scale chosen must be such as would permit the whole
data to be represented in an accurate manner so that the fluctuations are
clearly indicated. The respective sizes of the scales of the x-axis and y-axis
cannot be rigidly laid down. They depend, amongst other things, on the
size of the paper also. Conventionally, however, y-axis is taken 1½ times
as long as x-axis but there is absolutely no rigidity about it.
Plotting of data. When the scales have been decided and marked
on the graph paper the last thing to be done is to plot the data. On
the basis of the values of the x and y co-ordinates various points should be
plotted on the graph paper. The next thing is to join these points.
The rule with regard to the joining of points is that if the figures relate to
a continuous variable the line joining the points should give as smooth
a curve as possible. By continuous variable we mean such variables
as can assume any value within a given range. For example, the
heights of persons can have any value within a specific range. The series
relating to the heights of some persons would be a continuous variable.
In such cases it should not appear as if the different parts of a curve are
not smooth and give an angular picture. If a curve is smooth it
indicates that the different values of the variable are continuous and there are
no parts separate or different from each other and further there is
no break or gap between them. If the variable is discrete then the
different points should be joined by straight lines. It would indicate
that there is no continuity between the value represented by one point
and that represented by another. It means that the variable can assume
only those values which are indicated by the various points. It cannot
have any value between the points. Ordinarily it is very difficult to
smooth curves and the data are shown by joining the points with straight
lines. However, curves which show mathematical relationships can
always be smoothed and they should not be shown by straight lines.
First we shall study the graphs relating to time series and then the
graphs representing frequency distributions. Graphs of time series
can be either on natural scale or on ratio scale.
1958     9.54
1959     8.90
1960     8.93
1961     8.57
1962     8.;0
1963     10.04
1964     10.76
1965     11.03
If the above data are to be shown by means of a graph, first of all
the x-axis and y-axis would have to be drawn. Since the values of both
the variables are positive we shall draw only one quadrant in which the
values of both x and y variables are positive. On the abscissa or x-axis
we shall show the years and on the ordinate or y-axis the figures of the
production of steel. In figure No. 2 given below, on the x-axis 1" represents
two years and on the y-axis 1" represents four lakh tons of steel. Thus
against the year 1958 the point at a distance of 2.38" from the point of
origin would show the production of 9.54 lakh tons of steel.
Similarly against the year 1959 the point at a distance of 2.22" would
represent a production of 8.90 lakh tons. In the same way other
points can be plotted. The line joining these points would be the
desired graph. It would be as follows :
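The height at which each point is plotted is found by dividing the figure by the number of units that one inch represents. A minimal sketch in Python follows; the names are illustrative, and the scale of four lakh tons to the inch is the one stated above.

```python
LAKH_TONS_PER_INCH = 4.0   # vertical scale stated in the text

def height_in_inches(production):
    """Distance above the x-axis at which a production figure is plotted."""
    return production / LAKH_TONS_PER_INCH

# Production of steel in lakh tons for the first two years of the table.
steel = {1958: 9.54, 1959: 8.90}
heights = {year: height_in_inches(value) for year, value in steel.items()}

for year, height in heights.items():
    print(f"{year}: {height:.3f} inch")
```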
(Line graph of the production of steel; horizontal axis: Years)
Fig. 2
From the above graph the production of steel in each year can be
known. The graph can also reveal the changes in the production from
year to year. If these data were represented by a bar diagram it would
not have looked so impressive.
The only difference between absolute historigrams and index
historigrams is that in the former the actual values of the variable are plotted
whereas in the latter their index numbers are plotted. Absolute
historigrams tell us about the changes in actual values whereas index
historigrams tell us about the relative or percentage changes. If in the above
case the production of steel for the year 1958 is represented by 100 and the
production figures of other years are expressed as relatives, and if these
indices are plotted, the resulting graph would be an index historigram.
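The relatives needed for an index historigram can be computed as in the following sketch, written in Python for illustration only. The function name is not from the text; the figures are those legible in the table of steel production, and the year 1962 is left out because its figure cannot be read with certainty.

```python
def index_numbers(series, base_year):
    """Express every figure as a percentage of the base-year figure."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}

# Production of steel in lakh tons (1962 omitted, figure not legible).
steel = {
    1958: 9.54, 1959: 8.90, 1960: 8.93, 1961: 8.57,
    1963: 10.04, 1964: 10.76, 1965: 11.03,
}
indices = index_numbers(steel, base_year=1958)

for year, index in indices.items():
    print(f"{year}: {index:.1f}")
```

Plotting these indices instead of the original figures changes only the vertical scale; the shape of the curve remains the same, but the readings now show percentage changes from the base year.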
False Base Line
If the fluctuations in the values of a variable are very small as
compared to the size of the items, a false base line is used. By its use even minor
fluctuations are magnified so that they are clearly visible on the graph.
If the size of the items is big and if the vertical scale begins from zero, the
curve would be mostly at the top of the paper, and if the differences in
the values of the various items are not much, it would, more or less, be of the
shape of a straight line. With a false base line the scale from zero to the smallest
value of the variable is omitted. Whenever a false base line is used
it should be very clearly indicated on the graph. Generally in such
cases the vertical scale is broken in two parts and some blank space is left
between them. The lower part of the vertical scale is kept very small
and it begins with zero. The upper part begins with a value equal or
nearly equal to the smallest value of the variable. To make the breaking
of the vertical scale prominent, saw-tooth lines are usually used. In Figure
No. 3, which represents the data given in table III below, such a false base
line has been used :-
TABLE III
Total Supply of Money in India
(in hundred-million rupees)
Month       1951    1952    Month        1951    1952
January     19.7    18.7    July         20.1    18.3
February    20.2    19.0    August       19.4    18.1
March       20.6    18.9    September    19.0    17.9
April       20.9    18.9    October      19.0    17.9
May         20.9    18.7    November     18.7    17.9
June        20.4    18.5    December     18.8    17.9
Total Supply of Money in India
(Line graph with false base line; vertical axis: Rs. (hundred million); horizontal axis: months of 1951 and 1952)
Fig. 3
In the above graph 1" on the vertical scale represents 100 million
rupees. If a false base line was not used the vertical scale would have
been 21" long. If 1" was to represent 500 million rupees the size of the
vertical scale would still have been more than 4" but then the fluctuations
in the supply of money during these two years would not have been
very clear from the graph.
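The lengths of the vertical scale quoted above follow from the highest figure of table III (20.9 hundred-million rupees). The sketch below, in Python, reproduces them; the function name is illustrative, and the starting value of 1750 million rupees used for the false base line is only an assumed figure, chosen a little below the smallest value of the series.

```python
def axis_length(highest, units_per_inch, base=0.0):
    """Length of the vertical scale needed to reach the highest value."""
    return (highest - base) / units_per_inch

HIGHEST = 2090.0   # million rupees, i.e. 20.9 hundred-million

full_scale = axis_length(HIGHEST, 100.0)      # 1 inch = 100 million, from zero
coarse_scale = axis_length(HIGHEST, 500.0)    # 1 inch = 500 million, from zero
# With a false base line the scale starts near the smallest value
# (1750 million rupees is an assumed starting point).
false_base = axis_length(HIGHEST, 100.0, base=1750.0)

print(f"from zero, 1 inch = 100 million: {full_scale:.1f} inches")
print(f"from zero, 1 inch = 500 million: {coarse_scale:.2f} inches")
print(f"false base line, 1 inch = 100 million: {false_base:.1f} inches")
```

The false base line thus keeps the fine scale, on which the fluctuations are visible, without requiring an impossibly long axis.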
TABLE IV
(Line graph; legend: Income, Expenditure, Differences)
Fig. 4
In the graph given above two quadrants, the first and the fourth,
are shown as some of the figures of the y-axis (relating to differences) are
negative.
If the units in which the different variables are expressed are different,
then also such graphs can be used. The technique of their construction
is the same as in the above case. The only difference is that in such
cases two or more scales relating to different units have to be shown.
If, however, there are only two variables in different units, one vertical
scale can be shown on the left hand side and the other on the right hand
side. The two historigrams can be plotted in this manner on the same
paper. To facilitate comparison the two scales are so adjusted that the
historigrams are close to each other. Thus, if the average value of both
the variables is kept near about the centre of the vertical scale the two
curves would afford a good comparison. A false base line can be used for
the purpose of adjusting the scales. The following table gives the
monthly imports (volume and value) of liquor in India in the year 1941-42.
These data have been plotted in Figure No. 5.
TABLE V
Monthly Imports of Liquor in India
1941-42
Month    Volume (lakhs of gallons)    Value (lakhs of rupees)    Month    Volume (lakhs of gallons)    Value (lakhs of rupees)
April 4. 6 October 2.6., 4.6 3 1 •1
May 3·9 November 19·3 3·4 23. 2
June ;;6 11.1 December 2..1 IS·3
July 4. 1 2.6.6 January 2.·3 2.1'.1
August '3·3 2.1.0 February 1.6 16.7
September ;.6 2.;·4 March ;·s 19·C}
Monthly Import of Liquor in India
(Line graph; left-hand vertical scale: Volume)
Fig. 5
TABLE VI
July         96    50
August       56    33
September    59    43
October      32    23
November     60    48
December     22    19
represent 32,000 cwts. and 20,00,000 rupees. The scales are such that
the historigram would run through the bars, making the graph beautiful
and facilitating the comparison between value and volume.
(Graph of volume and value by month)
Fig. 6
From the above graph it is possible to study the variations in the
volume of exports month after month and similarly the variations in
the values are also clear. The two variables are moving in the same
direction. Whenever there has been a rise in the volume, values have
also gone up and conversely, with a fall in volume, values have also gone
down. This indicates that there is a positive relationship between the
two phenomena.
Method of showing Range-Zone Graph
In some data the difference between the maximum and minimum
values of a variable has to be emphasised and presented graphically.
In such cases zone graphs are used. Zone graphs show the range of
variations. Figure No. 7 presents a zone graph based on the data given
in table VII.
TABLE VII
Average Prices of Gold in Bombay (Per Tola)
Year       Maximum      Minimum
           Rs. P.       Rs. P.
195 0 122.00 12.1.;0
195 1 12.1.80 UI.Z.S
19P. 13) .12 I21,2-5
1953 132. 12 n6.6z
1954 134·n rzR·7°
1955 13 6 . 80 133.20
195 6 13 6 ,75 I3 I •2 ,
1957 135.5 0 133·94
195 8 135. 20 134·2.5
1959 137·7° 134·75
In order to plot the above data in such a manner that the difference
between the maximum and the minimum prices in different years is
clearly represented, it is necessary to use a false base line as the difference
between the maxima and minima is not very much. In figure No. 7
below, the minimum and the maximum values have been plotted and the
difference between the two has been made prominent by drawing thin
bars between these values. The size of the bars represents the range of
variation in prices in different years.
Maximum and Minimum Prices of Gold in Bombay
Fig. 7
TABLE VIII
Maximum and Minimum Values of "X"
Date    Maximum value    Minimum value
I 52 50
l, 51
; 56 55
4 5; 50
S 51 48
6 52 51
7 53 51
S 55 53
9 56 54
10 58 54
II 51 ,6
12 59 54
13 56 55
58
1,14 62
54
60
16 63 62
17 61 59
18 60 51
19 64 58
20 66 63
.21 62 60
22 59 '5
23 60 '9
24 64 63
25 66 64
26 67 62
27 65 60
28 .1 8 n
29 58 56
30 15 ~2
(Zone graph of maximum and minimum values; horizontal axis: Dates)
Fig. 8
Method of showing difference
When the difference of two figures is to be shown prominently
the space between them is either coloured or cross-lined. Such graphs
are very attractive and appealing. Positive and negative differences
are indicated by different colours or different types of lines and
cross-lines. Such a graph has been shown in figure No. 9 which presents
the data given in table IX.
TABLE IX
India's Foreign Trade (January, 1953 to July, 1954)
Month        Imports                  Exports
1953         (in crores of rupees)    (in crores of rupees)
January      43.5                     44.5
February     40.4                     39.2
March        47.1                     48.8
April        56.1                     38.9
May          51.4                     41.0
June         51.8                     40.0
July         50.0                     41.0
August       46.5                     49.4
September    45.5                     48.8
October      39.0                     48.7
November     39.4                     51.5
December     3~.9                     44.6
(Graph of imports and exports by month, 1953 and 1954)
Fig. 9
From the above graph the favourable and unfavourable balance
of trade can be very easily studied.
Band Graphs
Band graph is a type of line graph which is used to present the
total for successive time periods broken up into sub-totals for the various
component parts of the total. The various component parts are plotted
one over another and in this way there would be as many bands as the
number of parts. To distinguish the various parts from each other they are
either coloured in different colours or the space between them is filled
by cross-hatch, vertical or horizontal lines or different types of signs
and symbols. This type of graph is specially useful for studying total
cost divided into various component parts or total sales, production,
consumption, exports or imports, according to different states, districts
or regions. Table X below gives the imports of newsprint in India
from various countries :-
TABLE X
Import of Newsprint in India (1947-48 to 1952-53)
Country    1947-48    1948-49    1949-50    1950-51    1951-52    1952-53
Canada     6.6        8.2        7.1        6.2        11.9       10.6
Finland    5.5        5.9        2.4        8.0        12.9       10.9
Sweden     4.6        7.3        4.4        3.6        1.6        4.5
Norway     9.1        14.9       8.5        11.9       12.1       10.7
Others     5.4        7.8        4.7        24.0       18.5       12.6
Fig. 10
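The curves of a band graph are obtained by adding the figure of each component to the sum of the components plotted below it, so that the topmost curve shows the total. The following sketch in Python illustrates the calculation, together with the conversion to percentages of the yearly total; the function names are illustrative, and the figures are those of Table X as far as they can be read from the print.

```python
# Imports of newsprint by country for the six years 1947-48 to 1952-53.
imports = {
    "Canada":  [6.6, 8.2, 7.1, 6.2, 11.9, 10.6],
    "Finland": [5.5, 5.9, 2.4, 8.0, 12.9, 10.9],
    "Sweden":  [4.6, 7.3, 4.4, 3.6, 1.6, 4.5],
    "Norway":  [9.1, 14.9, 8.5, 11.9, 12.1, 10.7],
    "Others":  [5.4, 7.8, 4.7, 24.0, 18.5, 12.6],
}

def band_boundaries(series):
    """Upper boundary of each band: running sum of the components, year by year."""
    periods = len(next(iter(series.values())))
    running = [0.0] * periods
    boundaries = {}
    for name, values in series.items():
        running = [r + v for r, v in zip(running, values)]
        boundaries[name] = running
    return boundaries

def percentage_shares(series):
    """Each component as a percentage of the total of its year."""
    totals = [sum(year) for year in zip(*series.values())]
    return {
        name: [100.0 * v / t for v, t in zip(values, totals)]
        for name, values in series.items()
    }

boundaries = band_boundaries(imports)
shares = percentage_shares(imports)
totals = boundaries["Others"]   # the last band plotted carries the total

print([round(t, 1) for t in totals])
```

When the percentage shares are stacked in the same manner the topmost curve becomes a horizontal line at 100 in every year.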
From the above graph it is possible to study the trend of total
imports as also the imports from various countries. If the data are
given in the shape of percentages then also band graphs can be used.
In such graphs the total in each year would be represented by 100 and
the curves representing the imports from various countries year after
year would be expressed as percentages of the totals in different years.
The data given in table XI below are shown on the basis of percentages
in figure No. 11.
TABLE XI
Number of Hindi Films produced in India (1940-50)
Year    No. of Hindi Films (1)    Total No. of Films (2)    Col. (1) as Percentage of Col. (2)
1940    86                                                  50.3
1941    78                                                  46.5
1942    97                                                  59.5
1943    108                                                 68.0
1944    86                                                  68.2
1945    73                                                  73.7
1946    155                                                 77.5
1947    186                                                 65.7
1948    148                                                 55.9
1949    157                                                 54.3
1950    IH                                                  46.()
Fig. 11
(Table of monthly sales, cumulative sales and annual totals for three years)
In the table given above the monthly figures of sales are first
cumulated. These figures are shown in the second row for each of the
three years. Monthly annual total has been given in the third row for
each year. The figure given against December, 1953 is the total of the
figures of the twelve months ending December, 1953. Similarly, the
figure given against January, 1954 is the total of the twelve months
ending January, 1954 or, in other words, the figure against January, 1954
is found out by dropping the figure of January, 1953 and adding the
figure of January, 1954 to the annual total of 1953. In this manner
other figures have also been calculated. The above data would be
plotted in the shape of a zee-chart as follows.
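The two derived series of a zee-chart, the cumulative figures and the totals of the twelve months ending with each month, can be computed as in the sketch below. It is written in Python for illustration; the function name is not from the text and the monthly sales figures are hypothetical.

```python
def zee_chart_series(previous_year, current_year):
    """Cumulative figures and twelve-month totals for the current year.

    Each twelve-month total is found by dropping the figure of the same
    month of the previous year and adding the figure of the current month.
    """
    if len(previous_year) != 12 or len(current_year) != 12:
        raise ValueError("twelve monthly figures are needed for each year")
    cumulative, annual_totals = [], []
    running = 0
    window = sum(previous_year)
    for old, new in zip(previous_year, current_year):
        running += new
        window += new - old
        cumulative.append(running)
        annual_totals.append(window)
    return cumulative, annual_totals

# Hypothetical monthly sales for two successive years.
sales_first_year = [10] * 12
sales_second_year = [12] * 12

cumulative, annual_totals = zee_chart_series(sales_first_year, sales_second_year)
print(cumulative)
print(annual_totals)
```

In December the cumulative figure and the twelve-month total coincide, which is why the three curves of the chart meet and give it the shape of the letter Z.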
(Zee chart)
Fig. 12
The rules for constructing the bar frequency curves are the same
as discussed in the previous chapter in connection with bar diagrams.
The values of the variable are shown on the base line and above each
value a bar representing the related frequency is drawn. The lengths of the
various bars are kept in proportion to the size of the respective
frequencies. Sometimes instead of thick bars only lines are drawn. Since they
do not look beautiful, thick bars should be used for the purpose. In
table XIII below the data relate to the number of rooms in houses.
The series is a discrete one and a bar frequency diagram represents the
data as shown in figure No. 13.
TABLE XIII
Number of Rooms in Houses
1    170
2    183
3    ;:9 1
4    146
5    105
6    75
7    42
8    30
9    2,
Number of Rooms in Houses
Fig. 13
Bars can be horizontal also but vertical bars give better graphs.
The data can be shown by a discontinuous curve as well. In this case
instead of bars or lines only points would be plotted at various heights
representing the number of houses. These points would be joined
by straight lines and a frequency polygon would thus be obtained. In
table XIV below the frequencies of the different values of a variable are
given. These data are shown in the shape of a frequency polygon in
figure No. 14.
TABLE XIV
Values of a Variable and their Corresponding Frequencies
Values of a Variable    Frequency
1                        3
2                        11
3                        32
4                        41
24
s
6
7
8
9
10
II
12
13
14
IS
16 z
Fig. 14
Fig. 15 A
Fig. 15 B    Fig. 15 C (horizontal axis: Age; vertical axis: No. of People)
Fig. 16
In such cases, as the name suggests, the curves are not symmetrical.
The frequencies of various values are not in any mathematical relationship.
They are the most common type of curves, as in actual practice
perfectly symmetrical curves are rarely obtained. Such curves may be
either positively skew or negatively skew. Figure No. 17 shows a
curve which is positively skew and Figure No. 18 a curve which is
negatively skew:
Positively Skew Curve
Fig. 17
Fig. 18
In such curves skewness is very high and the size of the items
containing the maximum frequency is generally in one corner (not in
the middle as in the case of symmetrical curves). Figure No. 19 below
shows a J-shaped curve : -
J-Shaped Curve
Fig. 19
U-shaped curves
In the U-shaped curve the values at the extremes have very high
frequencies and the values at the middle have very low frequencies.
Figure No. 20 shows such a curve:
U-Shaped Curve
Fig. 20
Fig. 21
TABLE XVI
Age distribution of a group of students
 5-6      40
 6-7      96
 7-8     156
 8-9     122
 9-10    306
10-11    402
11-12    494
12-13    574
13-14    638
14-15    682
15-16    702
16-17    710
Fig. 22 (horizontal axis: Age)
The above graph shows that the total number of students is 710. As
such, the value of the median will be the age of the 355th student, that
of the first quartile the age of the 177.5th student, and that of the third
quartile the age of the 532.5th student. (In the graphic method the median
is the value of the (n/2)th item, and the first and third quartiles are
similarly the values of the (n/4)th and (3n/4)th items respectively.)
To find the value of the median, first of all the point on the vertical
scale which reads 355 has to be located. From this point a line is
drawn parallel to the base, touching the ogive at some point. From
this point of intersection another line is drawn parallel to the
vertical scale, touching the base line. The value at the point where
the base line is touched is the value of the median. In the above
graph the value of the median located in this fashion comes to about 10.5
years. In the same manner the values of quartiles, deciles and percentiles
can be located.
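The graphic reading can be checked arithmetically by treating each stretch of the ogive between two class limits as a straight line. The sketch below is an illustration under stated assumptions and not the author's own procedure: the function name `value_at_rank` is hypothetical, the ages start at 5, and the cumulative count for the class 8-9 is taken as 222 on the assumption that a cumulative frequency cannot fall below the preceding figure of 156.

```python
def value_at_rank(lower_start, upper_limits, cumulative, rank):
    """Read the value of the item with the given rank off a 'less than' ogive,
    assuming a straight line between successive plotted points."""
    if rank <= 0 or rank > cumulative[-1]:
        raise ValueError("rank must lie between 0 and the total frequency")
    prev_limit, prev_cum = lower_start, 0
    for limit, cum in zip(upper_limits, cumulative):
        if rank <= cum:
            # Share of the class that lies below the required item.
            share = (rank - prev_cum) / (cum - prev_cum)
            return prev_limit + share * (limit - prev_limit)
        prev_limit, prev_cum = limit, cum


# Upper age limits and cumulative numbers of students.
# Assumption: the entry for 8-9 is taken as 222 (see the note above).
upper_limits = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
cumulative = [40, 96, 156, 222, 306, 402, 494, 574, 638, 682, 702, 710]
n = cumulative[-1]

median = value_at_rank(5, upper_limits, cumulative, n / 2)
first_quartile = value_at_rank(5, upper_limits, cumulative, n / 4)
third_quartile = value_at_rank(5, upper_limits, cumulative, 3 * n / 4)
```

The median works out to a little over 10.5 years, in agreement with the value read from the graph; the first quartile depends on the assumed figure for the class 8-9.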
Just as frequency curves are smoothed, the ogives or
cumulative frequency curves can also be smoothed. The data given
in table XVII below have been cumulated by both the methods discussed
earlier and the smoothed frequency curves have been shown in figures
Nos. 23 and 24:
Fig. 23 (horizontal axis: Less than Height in inches)
Height distribution of a group of persons
(more than curve)
after smoothing
Fig. 24 (horizontal axis: More than Height in inches)
Cumulative frequency curves have some advantages over simple
frequency curves. Simple frequency curves cannot be compared with
each other unless the magnitude of the class intervals is uniform in all
the series. But there is no such restriction in case of cumulative
frequency curves. Again, uneven class intervals of series may distort
simple frequency curves but a cumulative frequency curve is not affected
by the unequal size of the class-intervals. Cumulative curves are well
adapted for interpolation. We have already seen how the values of
Income           No. of Persons    Cumulative frequency
Rs.                                (Less than)
 1-2                   3                  3
 2-3                  10                 13
 3-4                  14                 27
 4-5                  22                 49
 5-6                  38                 87
 6-7                  40                127
 7-8                  38                165
 8-9                  33                198
 9-10                 20                218
10-11                 11                229
11-12                 10                239
12-13                 11                250
13-14                  8                258
14-15                  4                262
15-16                  4                266
16-17                  3                269
                     269
Fig. 25
The upper part of the above figure shows the method by which
an ogive is constructed. The area of the various rectangles representing the
frequencies in different classes is in proportion to the number of items
in their respective groups. Since the operation is cumulative, the base
of a rectangle is the cumulative frequency of all previous class intervals.
Thus, the base of the first rectangle is zero, that of the second 3, that
of the third 13, and so on. The cumulative frequency curve passes
through the upper limits of the various class intervals as the data have
been cumulated upwards. The lower part of the figure gives simple
frequency bar graphs. From it a frequency polygon can be constructed,
which in turn can be smoothed to give the simple frequency curve.
Galton's method of locating median
Francis Galton has given a method by which the median can be
located graphically without cumulating the series. In this type of graph
the values of the variable are marked on the horizontal line and the
frequencies on the vertical line. The special feature of this graph is
that for each successive plotting of frequency the previously plotted
point is taken as the base. The frequencies are shown by plotting
points equal in number to the frequency. A curve is then drawn passing
through the middle of the various groups of points plotted in the above
manner. A line is then drawn parallel to the base line from that point
on the vertical scale which corresponds to the median item. This line
touches the curve and from this point of intersection a perpendicular is
drawn towards the base line. The point at which the perpendicular touches
the base line gives the value of the median. The data given in table XIX
below are plotted in this fashion in Figure No. 26:
TABLE XIX
Marks obtained by 32 students
4
4S
SO
54
2.
S
3
35
40
42 I 4
S
3
Marks obtained by 32.
58
60
II/ldenls
,
2.
10
4D SO
Marks
Fig. 26
TABLE XX
1    100     -      -
2    200    100    100
3    300    100     50
4    400    100     33.3
5    500    100     25
From the above table it is clear that whether the income increases
from 200 to 300 or from 700 to 800 the absolute increase is equal in both
the cases. However, the relative changes are not equal. A change
from 100 to 200 shows an increase of 100% whereas an increase from
700 to 800 shows an increase of 14.3% only. The ratio scale shows this
difference very clearly.
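The distinction between absolute and relative change can be verified with a short calculation. This is a minimal sketch; the function name `changes` is an illustrative choice and not part of the text.

```python
def changes(old, new):
    """Return the absolute change and the percentage change from old to new."""
    absolute = new - old
    relative = 100 * absolute / old
    return absolute, relative


abs_low, rel_low = changes(100, 200)    # change from 100 to 200
abs_high, rel_high = changes(700, 800)  # change from 700 to 800
```

Both absolute changes are 100, while the percentage changes are 100 and roughly 14.3 respectively. A natural scale plots the first pair of numbers and a ratio scale plots the second.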
TABLE XXI
Year A B Logarithm A Logarithm B
The above data are plotted on a natural scale in figure No. 28
below:
Sums of Rs. 100 and Rs. 1,000 rising at Compound Interest rate of 10%
on Natural Scale
Fig. 28 (horizontal axis: Years; vertical axis: Rs.)
From the above figure it appears that the sum of Rs. 1,000 is
increasing at a higher rate than the sum of Rs. 100. This is so because
the natural scale measures absolute changes, and the absolute
changes in curve B are no doubt greater than the absolute changes
in curve A. If, however, the data are plotted on a ratio scale, this anomaly
disappears. In figure No. 29 below the original data are not
plotted; instead their logarithms have been plotted on a natural scale.
Sums of Rs. 100 and Rs. 1,000 rising at Compound Interest rate of 10%
Fig. 29 (horizontal axis: Years; vertical axis: Logs)
From the above figure it is clear that the two sums are increasing at
the same rate. This graph can be plotted on a semi-logarithmic scale
also. In this case, instead of logarithms, the original data would be plotted
on semi-logarithmic paper as in Figure 30.
Sums of Rs. 100 and Rs. 1,000 rising at Compound Interest rate of 10%
Fig. 30 (horizontal axis: Years; vertical axis: Amount; curves A and B)
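The comparison of the two sums on the natural and the ratio scale can be checked numerically. The sketch below assumes annual compounding at 10% over ten years; the helper name `compound` and the ten-year span are assumptions made for illustration.

```python
import math


def compound(principal, rate, years):
    """Amounts at the end of years 0, 1, ..., years at compound interest."""
    return [principal * (1 + rate) ** t for t in range(years + 1)]


sum_a = compound(100, 0.10, 10)    # curve A
sum_b = compound(1000, 0.10, 10)   # curve B

# Natural scale: the vertical gap between the curves keeps widening.
absolute_gap = [b - a for a, b in zip(sum_a, sum_b)]

# Ratio scale: the gap between the logarithms stays the same every year.
log_gap = [math.log10(b) - math.log10(a) for a, b in zip(sum_a, sum_b)]
```

The absolute gap grows from 900 at the start to more than 2,300 after ten years, whereas the gap between the logarithms is 1 throughout. Equal rates of growth therefore appear as parallel lines on the logarithmic plot.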
Fig. 31 (axes: x and y)
In the above graph when the value of x is 1, the value of y is 1, and
when the value of x is 2, the value of y is also 2. All corresponding values
of x and y are equal. Simple equations of the first degree also disclose
linear relationships. Such equations are of the form y = a + bx. In this
equation, a is a constant representing the distance from the point of
origin to the place where the line touches the vertical scale or y axis,
and b is a constant representing the tangent of the angle which the line
makes with the horizontal base. If a = 1 and b = 3, the equation
would be y = 1 + 3x. To construct the graph, various values of x can be
assumed and the corresponding values of y can be calculated from the
above equation. Thus, the following type of series can be obtained.
TABLE XXII
Values of x and y in the equation
y = 1 + 3x
x     y
1     4
2     7
3    10
4    13
5    16
If the above data are plotted on a graph paper, the following type
of figure would be obtained:
Curve of the equation y = 1 + 3x
Fig. 32 (axes: x and y)
It is clear that the function is linear and that is why a straight line
has been obtained by plotting these values. Any two values of x and the
corresponding values of y would have been sufficient to locate the line.
That the curve represents the equation is proved by the fact that the
equation is satisfied by the co-ordinates of the various points of this
curve. In a linear relationship if one variable increases by a constant
amount, the corresponding increase in the other variable is also constant.
In the above case the x variable always increases by 1 and the y variable
by 3. Such series, in which there is a constant increment of this type,
are called arithmetic series.
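The constant increment can be seen by generating the series from the equation and taking successive differences. A minimal sketch follows; the function name `linear_series` is an illustrative choice.

```python
def linear_series(a, b, xs):
    """Values of y = a + b*x for each x in xs."""
    return [a + b * x for x in xs]


xs = [1, 2, 3, 4, 5]
ys = linear_series(1, 3, xs)    # the equation y = 1 + 3x

# Increase in y for each unit increase in x.
increments = [later - earlier for earlier, later in zip(ys, ys[1:])]
```

Every increment equals b, that is 3, which is what makes the series an arithmetic one and the graph a straight line.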
In physical sciences various types of linear relationships are found.
In economic and social phenomena such relationships are rare. However,
one example is that of the growth of money at a simple rate of interest.
If y represents the sum to which Re. 1 would amount at the end of x
years at r rate of simple interest, the equation would be of the following
type:
y = 1 + rx
Thus in ten years at 5% simple interest, Rs. 100 would amount to
100 + (5 × 10) or Rs. 150. In this case r is constant and so the relationship
is linear. If plotted on a natural scale such data would always give a
straight line.
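The simple-interest example can be reproduced as follows. This is a sketch only; the function name `simple_interest_amount` and the choice of expressing the rate in per cent are assumptions made for illustration.

```python
def simple_interest_amount(principal, rate_percent, years):
    """Amount to which the principal grows at simple interest."""
    return principal + principal * rate_percent * years / 100


# Rs. 100 at 5% simple interest, at the end of years 0 to 10.
amounts = [simple_interest_amount(100, 5, x) for x in range(11)]

# Yearly additions to the amount.
yearly_growth = [later - earlier for earlier, later in zip(amounts, amounts[1:])]
```

The amount after ten years is Rs. 150 and the addition in every year is the same Rs. 5, so the points lie on a straight line when plotted on a natural scale.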
Non-linear relationship
Non-linear functions are of many types. We shall discuss some
common forms of such relationships, particularly those which can fit
economic and social phenomena. Parabolic and hyperbolic functions
are very common in physical sciences and they can fit economic
data also. Parabolic curves are used to represent data to which the laws
of increasing and decreasing returns apply. Demand curves and utility
curves are also parabolic in nature. The general form of the equation in
such cases is y = ax^b. The curve is parabolic when the exponent b is
positive and hyperbolic if it is negative. In such curves there is no constant
term. The following examples would illustrate the difference between
parabolic curves and hyperbolic curves.
If the equation is y = x^2, the following series can be obtained:
 x     y
-4    16
-3     9
-2     4
-1     1
 0     0
 1     1
 2     4
 3     9
 4    16
The graph of the above data would be as follows:
Parabolic curve of the equation y = x^2
Fig. 33
The above figure is that of a parabolic curve. In such relationships
an important characteristic is that if the x variable increases in geometric
progression, the y variable also increases in geometric progression. Thus,
if the values of x are 2, 4, 8 and 16, the corresponding values of y would
be 4, 16, 64 and 256. Both the series are in geometric progression.
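The geometric-progression property can be checked by comparing the common ratios of the two series. A minimal sketch, with the hypothetical helper names `power_curve` and `common_ratios`:

```python
def power_curve(a, b, xs):
    """Values of y = a * x**b for each x in xs."""
    return [a * x ** b for x in xs]


def common_ratios(series):
    """Ratio of each term to the term before it."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


xs = [2, 4, 8, 16]
ys = power_curve(1, 2, xs)      # the equation y = x^2

x_ratios = common_ratios(xs)
y_ratios = common_ratios(ys)
```

The x series has a common ratio of 2 and the y series a common ratio of 4, which is 2 raised to the exponent b = 2.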
If the relationship is y = x^-1, the following series can be
obtained:
x    y
     (x^-1)
i
t z
I I
Fig. 34
2     4
4    16
If the above data are plotted on graph paper, the following type of
exponential curve would be obtained:
Exponential curve representing the equation
y = ab^x
Fig. 35 (axes: x and y)
Fig. 36 (axes: x and y)
The above curve can be used to study the relationship between
cost and return. With every constant increase of one unit in x-variable.
l:t;
August
:~:~
~I 2.0
~~
2.I
~~
20
September    29    22    20    20
October      32    21    23    18
November     32    19    26    20
December     32    20    23    22
January      31    19    28    23
February     25    18    20    22
March        24    19    21    28
Plot the above figures on a graph paper and show also the balance of trade.
(B. Com., Allahabad, I9~8).
8. Study the following table and draw a graph on logarithmic scale to show
net supplies of cereals and adult population in (Undivided) India:
Year    Production     Seed and      Net Imports (+)      Adult
        of cereals     wastage       or Exports (-)       population
        '000 tons      '000 tons     '000 tons            '000
9. What are the advantages of the Ratio Scale over the Natural Scale? Plot
the following data graphically on the logarithmic scale:
Year       Total Notes issued       Notes in circulation
           in crores of rupees      in crores of rupees
1933-34    167
1934-35    172
1935-36    167
1936-37    192
1937-38    185
1938-39    187
1939-40    237
1940-41    258
1941-42    410
1942-43    625
(B. Com., Nagpur, 1943).
10. Plot the following figures relating to the population of India (undivided) so as
to show the proportionate increase in population from one period to another:
Year    Population
        (000,000's omitted)
1871    uo
1881    :I.~O
1891    290
1901    295
1911    315
1921    5to
1931    HO
1941    390
(B. Com., Nagpur, I~J).
11. Show the results of working of class I railways graphically and comment
thereon.
(In millions of
Year       Capital outlay    Gross earnings
1923-24    464               JO
1924-25    473               74
1925-26    487               n
1926-27    505               71
1927-28    594               86
1928-29    599               86
1929-30    617               84
1930-31    627               77
1931-32    631               71
1932-33    638               70
1933-34    635               71
(B. Com., .AlII, IHO).
12. Represent graphically the data given below on a single sheet of graph paper
to bring out clearly the relative fluctuations in the prices of various articles. Draw
such conclusions as you can from the graphs.
rr/HJ/~rll!s prit6r ;11 lVInpw
(hi rupees per maund)
Year Rice Wheat Linseed Gur Cotton Tobacco
19 zB 7.7 7. 0 6·s H·I 17.3
192 9 5.5 8.0 7·' 29.8 17. 1
193 0 3.6 6.5 6.2 17.3 14.,
1931 2.7 4. 2 4. 2 13.3 II.6
193 2 3·4 3·5 3·5 14. 8 4-9
I9H ,.2 3.4 3. 1 u·9 4·9
1934 2.8 3.6 4.1 13·. 5.7
(M. C-.• Alltlht."lItl. 19'''),
13. The following table shows the total sales of gold by the Bank of England
on foreign account. Represent the data graphically on the logarithmic scale:
Year    Pounds ('000)
1910    14,44'
1911    8,228
1912    9,670
1913    7,94~
1914    8,027
1915    "3,076
1916    2,360
(B. Com., Allahabad, 1932).
14. The following table gives cost of living index numbers of Kanpur, Nagpur
and Calcutta. Represent the data graphically:
Year    Kanpur         Nagpur         Calcutta
        (1939=100)     (1939=100)     (1939=100)
1944 ~I4 267 Z79
1945 ,08 219 aS3
1946 ,a8 as, 27'
1947 378 3 ao 30 9
1948 471 37 2 339
1949 47 8 377 348
1950 434 37% ;49
19,1 4,1 391 370
191 2 441 ,80 HI
19H 413 3 8, ,49
15. When should false base line be used? The following data give the index
number of industrial profits in India. Represent it graphically:
Year    Index Number    Year    Index Number
        (1929=100)              (1929=100)
1941    187             1946    229
1942    222             1947    192
1943    246             1948    260
1944    239             1949    182
1945    334             1950    247
16. The following table gives the indices of the supply of money relating to
certain countries. Represent the data on a logarithmic scale:
Base 1948=100
Year ending    U. K.    U. S. A.    France
1945            86       92          47
1946            97       99          62
1947            98      102          71
1948           100      100         100
1949           lOT      100         125
1950           102      108         144
1951           103      115         170
1952           105      119         192
1953           100      121         214
17. The following table gives the data relating to production, wholesale
prices, and cost of living in India. Represent them at one place by a suitable graph:-
Year and Indices of Indices of Indices of cost
quarter Industrial wholesale of living
production prices
(1939=100)
1951-5 2
I II7·' 459·9 144
2 II7·7 439·9 145
3 IZl., 43.5. 6 145
4 u6.0 401.9 139
195 2 -H
1 u6·7 373. 2 140
2 u8.: 386.7 143
5 1H·6 381 •2 142
4 13 2 .7 387 .4 142
19$3-54
I 135.5 395.9
2 134'9 407.3
3 137·5 39 1 • 2
4 137·3 395'~
18. The following table gives the prices of gold and wheat and net export of
gold during the years 1931-32 to 1938-39:
Years    Average price    Average price    Net export
         of gold          of wheat         of gold
         (per tola)       (per maund)      (crores of Rs.)
         Rs. as.          Rs.
:l93:1-3 Z 2S 4 3·'
:l93 2 -H 30 I2 3.3
:1933-34 33 6 ·::.8
1934-35 35 8 3.1
1935-,6 35 4 3. 2
1936-37 36 0 3.9
1937-3 8 36 6 3.0
193 8-39 H r.a 3.4
Plot the above figures on a graph paper and comment upon the relationship.
(M. A., Agra, 1943).
19. The following table gives the production of sugar in Cuba, Java and
(undivided) India during 1930-39 in millions of quintals. Represent the figures
by a suitable diagram and comment on their relationship.
Year    Cuba    Java    India
19 2 9-3 0 44 '"9
193 -3 1
0 30 28 20
193 1 -;2 2S 26 24
193 2-33 19 14 28
1933-34 22 Ii ;0
1934-35 2S S ;1
1",~6 ~ 6 ;6
193 6-37 29 14 40
1937-3 8 29 14 52
193 8-39 26 ~, %7
(M. A., Palna, 194').
20. The following table gives the proportion of married women in 1910 and
in 1920 among women of every age. Show graphically that the increase was most
marked for the women of younger years.
Age    Married women %    Married women %
       1910               1920
18 17.0 19.2
zo ;6.2 3 8-4
SO·7 52 .9
62.0 64. 2
65-7 67.8
(B. Com., Nagpllr, 1944).
21. Describe the Lorenz graph. How does it differ from an Ogive? Illustrate
your answer by fitting a Lorenz curve and an Ogive to the following data:
Percentage of age distribution of the male
population in British India, 19P'
Age groups    Males    Age groups    Males
0-10          28.0     50-60          5.6
10-20         20.9     60-70          2.7
20-30         17.7     70-over        1.1
30-40         14.3     Mean Age      23.2
40-50          9.7
(M. A., Patna, 1940).
22. Write a short account of the use of graphic methods in statistics. Draw
a diagram to represent the data given in the following table showing the number of
rooms measured, in a certain investigation, in which the size lay between the limits
given in the column on the left. (Area calculated to the nearest square foot):
Area of room    Number    Area of room    Number
(Sq. ft.)                 (Sq. ft.)
zo and under 40 ~ zoo and under 220 18
40 ..
60
60
80
14
16
Z20
140"
Z4°
260
u
,
80
100
120
.. 100
120
140
~6
51
H
260
280
300
..
.. .. z80
300
3 20
a
I
a
140
160
180
•• 160
180
200
35
IS
2(,
310
340
360
..
..
..
. 34°
360
380
26
Read off the median and quartiles from your diagram and check your results
by actual calculation. (B. Com., Hons., Andhra, 1944).
23.
Distribution of firms in Woollen and Worsted Industries
in Yorkshire, according to number of operatives
Operatives    No. of firms    Operatives    No. of firms
1-20          380             301-340       24
21-60         320             341-380       18
61-100        182             381-420       11
101-140       147             421-460       16
141-180        92             461-500        9
181-220        66             501-700       19
221-260       ~9              701-900       15
261-300       ,0              901-          16
Total number of firms 1~84
Represent this distribution graphically (by means of a cumulative diagram) and
from this graph estimate the median and quartiles of the group. (B. Com., London,
1930).
24. Represent the following frequency distribution by means of a graph.
Construct the cumulative frequency curve also.
Class interval    Frequency    Class interval    Frequency
0-5                13          30-35             250
5-10               42          35-40             237
10-15             135          40-45             .l~
15-20             237          45-50              42
20-25             250          50-55              13
25-30             256
25. The following table gives the figures of production of paper and paper
boards in India in '000 tons. Represent these figures by a Z-curve.
Year    Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sep.   Oct.   Nov.   Dec.
1949     8.1    7.7    8.7    9.1    9.0    9.1    9.1    9.0    9.0    8.4    8.2    8.9
1950     8.3    8.5    9.1    8.7    9.5    8.7    9.3    9.4    9.4    9.0    9.3    9.7
1951    10.0    9.9   11.0   10.5   11.2   11.0   10.9   10.1   II.~   11.4   11.4   11.1
26. The following table gives the wholesale price index numbers of certain
countries. Represent the data on logarithmic scale.
Base 1948=100
Year and Month    India    U.K.    U.S.A.
1953
January           103      150     105
February          104      148     105
March             105      150     105
April             105      152     105
May               108      ljI     105
June              110      150     105
July              111      150     106
August            112      149     106
September         110      149     106
October           107      148     106
November          106      149     105
December          106      149     105
6,90
4.9 6
6.44
14·40
H'99
7. 6 5 :22.'90
1951- 52 8.12 7·B :24. 2 1
0
..H+ ,
+10
... -,
20
2S
2Z
.. -y
-10
0
17
13 .. .. ., -IS
-20
•• -:-lo
.. -I,
10
7
.. .. -a, -3 0
" -zo
.. -.2.,
:& . -n •• -3 0
TABLE I
Values of X-Variable in different years
Fig. 1
ANALYSIS OF TIME SERIES
Data which are available in the above fashion are usually affected
by a multiplicity of causes. The changes in the values of a variable
related to time can be the result of a large variety of factors, like changes
in the tastes and habits of people, changes in population, reduction in
cost of production, increase in incomes of people, etc. The value of
a variable changes due to the interaction of such forces. If these forces
were constant and not liable to change, the value of the variable would
also be constant; and even if the equilibrium of their effects was slightly
disturbed and there was no change thereafter, the values of the variable
would change slightly and then become constant. But in actual
practice things do not happen in this fashion. Reality is more complex
than this. These forces are never constant, and as such, due to the
effects of their constant interactions, values of the variables also go on
changing with the passage of time. Generally, we do not know much
about the variations in these factors or about the magnitude of such
variations. An idea about their effects is obtained only by a study of
the changes in the values of the related variable. Therefore, to study
the changes in these forces and the magnitude of such changes, we have
to study the variations in the values of related variables in chronological
order. In economics, the condition in which there is no change in the
effects of these forces is called static, and the condition in which there
is a change is called dynamic. The
analysis of time series is done to understand the dynamic conditions.
As has been already said, the effects of various factors and their
significance are incapable of detailed study. Their existence is recognised
due to changes in the values of related variables. By studying
changes in the time series, an idea can be obtained about the changes in
the effects of various forces which interact simultaneously. The effects
of these forces can be classified roughly in some major categories.
These categories or classes are called the components of time series,
because a time series is the result of the combined effects of different
categories of forces. These components are as follows:
decrease but the general tendency of the data is upwards. It can be said,
therefore, that the data have an upward trend or tendency. That component
of a time series which reveals the general tendency of the data is called long period
or secular trend. The secular trend can be either upward or downward.
It cannot be both ways. Secular trend is the effect of such factors as
are more or less constant for a long time or change very gradually
and slowly. Such factors are changes in population or tastes and habits
of people, etc. The effect of such factors is very slow and gradual. For
example, the effect of increase in population on prices or production
cannot be sudden or irregular. It would always be very slow,
gradual and regular.
In the analysis of time series the trend values are taken as normal
values. From these normal values, an idea is obtained about the different
types of fluctuations which may be both regular and irregular. The
concept of normal values is no doubt an empirical one but it is very useful
in studying economic events. It should be remembered that social sci-
ences are incapable, by their very nature, of adopting the experimental
methods in their studies, and as such the concept of normal values,
even though it is empirical, is very essential and helpful.
As has been said, trend values indicate the smooth, regular and
long term movement of a series, but it should not be concluded from
this statement that all time series give a definitely rising or falling trend.
In many series the values fluctuate round a more or less
constant figure which does not change with the passage of time. An
example of such data is a series relating to the temperature of a particular
locality. The temperature would no doubt fluctuate in various seasons
but the general tendency of the temperature would hardly change with
the passage of time. Barometric readings generally fluctuate round a
more or less constant value.
Analysis of time series. The observed values in a time series are the
result of the interaction of the various components discussed above:
secular trends, seasonal variations, cyclical fluctuations and random
fluctuations. In the analysis of time series attempts are made to isolate
the effects of these forces and to study them separately. The importance
of such a study cannot be overemphasized. Economists and businessmen
have not only to study the short time fluctuations but have also
to observe the long period or secular trend of the data. But here many
difficulties have to be faced. These difficulties arise due to the
limitations of the science of statistics. If it were possible for a statistician to
carry on experiments like a physicist he would have been in a position
to isolate the effects of various factors and to study only one factor at a
time. But a statistician cannot do so. He is helpless in this respect.
The only course open to him is to study the effects of various factors
by the process of elimination. In the following pages this method has
been explained in detail.
In this method the data are first plotted on a graph paper and a
smoothed curve is fitted to the data merely by inspection. The curve
is fitted in such a manner that the general tendency of the figures
becomes clear. Such curves eliminate the other components, namely the
regular and irregular fluctuations.
From the point of view of simplicity this is the best method. It
saves time. In other methods complex mathematical processes have
to be used whereas in this method nothing of the type is needed. But
the main disadvantage of this method is that the trend curve so drawn
can be affected by the bias of the statistician. In such cases different
curves would be obtained from the same data by different persons. Due
to this shortcoming this method is not very popular. Usually trend
values are obtained by the application of mathematical formulae.
Since curve fitting by mere inspection does not usually give
satisfactory values of the trend, other methods are used for the purpose.
Moving average method is a simple device of reducing fluctuations
and obtaining trend values with a fair degree of accuracy. The
technique of moving average has already been discussed in the chapter on
Measures of Central Tendency. The first thing to be decided in this
method is the period of the moving average, that is, the number of
consecutive items whose average would be calculated each time. Suppose
it has been decided that the period of the moving average would be 5
(years, months, weeks or days as the case may be); then the arithmetic
average of the first five items (numbers 1, 2, 3, 4 and 5) would be placed
against item No. 3 and then the arithmetic average of item Nos. 2, 3, 4,
5 and 6 would be placed against item No. 4. This process would be
repeated till the arithmetic average of
the last five items has been calculated.
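The placing of each average against the middle item of its group can be sketched in a few lines. The function name `moving_average` and the series used below are hypothetical; only odd periods are handled, since an even period has no single middle item.

```python
def moving_average(values, period):
    """Moving averages of an odd period, each placed against the middle item.

    Items too near either end to have a full group are marked None.
    """
    if period < 1 or period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        group = values[i - half:i + half + 1]
        trend[i] = sum(group) / period
    return trend


# Hypothetical series of seven items (illustration only).
series = [3, 5, 4, 6, 8, 7, 9]
trend = moving_average(series, 5)
```

With a period of 5 the first average falls against item No. 3 and the last against the third item from the end, so two items at each end of the series receive no trend value.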
The most important question that arises here is about the period
of average. Whether we should take a three-yearly (monthly or weekly)
moving average, or a five-yearly, seven-yearly or nine-yearly moving
average, or the moving average of some other period, is a question not
easy to answer. This question is very important because trend values
are affected by the period of the moving average. We have already
said that the purpose of moving average is to obtain trend values so
that all types of fluctuations are eliminated or in any case reduced to
the minimum. The period of moving average should be such as would
achieve these objectives. This statement, however, does not carry us
very far, for we have still to find out which period would be ideal for
realising the above mentioned aims. We shall study the trend values
obtained by various periods of moving average in different types of series
to arrive at some general conclusion.
1946     4     4
1947     6     6     6
1948     8     8     8     8
1949    10    10    10    10    10
1950    12    12    12    12    12
1952    16    16    16    16
1953    18    18    18
1954    20    20
1955    22
The data given in the above table are plotted below in figure
No. 2.
Linear Trend
Fig. 2 (horizontal axis: Years)
Fig. 3 (horizontal axis: Years)
TABLE IV
Year    Original data    3-yearly moving average    5-yearly moving average
1946    19               -                          -
1947    21               20.6                       -
1948    22               21.6                       21
1949    22               21.6                       21
1950    21               20.6                       20
1951    19               18.6                       18
1952    16               15.6                       15
1953    12               11.6                       11
1954     7                6.6                       -
1955     1               -                          -
data and the other two of trend values, have been plotted in figure
No. 4 below:
Fig. 4
It will be observed from the above figure that the moving averages
have given curves which are parallel to the curve of the original
data. These curves are below the curve of the original series, indicating
that in such cases trend values are less than the original ones. Further,
the longer the period of the moving average, the farther the trend curve is
from the curve of the original data. We thus conclude that if a series
contains only trend and no fluctuations and if the original figures give a curve
concave to the base, the moving average would give another curve, parallel to the
original one. Further, the longer the period of the moving average, the farther would
be the trend curve from the curve of the original data. This is another example of
Curvi-Linear Trend.
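The conclusion can be checked against the original figures of Table IV. The sketch below is only an illustration: the helper name `centred_ma` is hypothetical, and the averages are kept unrounded, whereas the table prints them cut to one decimal place.

```python
def centred_ma(values, period):
    """Moving averages for an odd period, one per complete group."""
    half = period // 2
    return [sum(values[i - half:i + half + 1]) / period
            for i in range(half, len(values) - half)]


# Original data of Table IV for the years 1946 to 1955.
original = [19, 21, 22, 22, 21, 19, 16, 12, 7, 1]

ma3 = centred_ma(original, 3)   # belongs to the years 1947 to 1954
ma5 = centred_ma(original, 5)   # belongs to the years 1948 to 1953

# Every trend value lies below the original figure of the same year.
below3 = all(m < y for m, y in zip(ma3, original[1:-1]))
below5 = all(m < y for m, y in zip(ma5, original[2:-2]))

# Gap between the original figure and the trend value for 1948.
gap3 = original[2] - ma3[1]
gap5 = original[2] - ma5[0]
print(ma5, below3, below5, gap3 < gap5)
```

In this series the five-yearly trend lies a full unit below the original figures while the three-yearly trend lies about a third of a unit below them, which is the sense in which the longer period gives the farther curve.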
Year   Cyclical       5-yearly   7-yearly   9-yearly
       fluctuations   moving     moving     moving
                      average    average    average
1945      -1            -.6         0         +.3
1946      -2            -.8         0         +.4
1947      -2            -.4         0         +.2
1948      +1            +.2         0         -.1
Fig. 5
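How the period of the moving average interacts with a cycle of fixed length can be tried out numerically. The sketch below is an illustration only: the seven-year cycle used is an assumed one, chosen to agree with the four rows of Table V that are legible here, and the function names are hypothetical.

```python
# Assumed seven-year cycle (sums to zero), repeated for 35 hypothetical years.
cycle = [2, 0, -1, -2, -2, 1, 2]
series = cycle * 5


def centred_ma(values, period):
    """Moving averages for an odd period, one per complete group."""
    half = period // 2
    return [sum(values[i - half:i + half + 1]) / period
            for i in range(half, len(values) - half)]


def residual_range(period):
    """Largest fluctuation (ignoring sign) left after averaging."""
    return max(abs(v) for v in centred_ma(series, period))


for period in (5, 7, 9):
    print(period, round(residual_range(period), 2))
```

Under this assumed cycle the five-yearly and nine-yearly averages leave fluctuations of roughly .8 and .4 on either side of zero, while the seven-yearly average leaves none.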
It will be observed from table No. V as also from figure No. 5 that the
seven-yearly moving average eliminates all the fluctuations while the
five-yearly and nine-yearly moving averages only reduce them. The reason
is that there is a seven-yearly cycle in the fluctuations and that is why
a seven-yearly moving average has eliminated the fluctuations completely.
With a five-yearly moving average the range of the fluctuations, which
is ±2 in the original series, has been brought down to ±.8 and a nine-yearly
moving average reduces the range to ±.4, whereas a seven-yearly moving
average reduces it to 0. A fourteen-yearly moving average will also
eliminate the fluctuations completely. In fact, if the period of the moving
average is a multiple of 7 the fluctuations would be completely
eliminated. Other periods of moving average would merely reduce the
fluctuations. They cannot eliminate them. Thus, we arrive at another
conclusion that if a series contains cyclical fluctuations, a moving average with
a period less or more than the period of the cycle would only reduce the fluctuations
but if the period of the moving average is the same as the period of the cycle or its
multiple the fluctuations would be completely eliminated.
Example. The following table contains a series of irregular
fluctuations:
TABLE VI
Year   Irregular      5-yearly   7-yearly   9-yearly
       fluctuations   moving     moving     moving
                      average    average    average
1936      -2            ...        ...        ...
1937       0            ...        ...        ...
1938      +1            -.2        ...        ...
1939       0             0         -.43       ...
1940       0            -.2         0         -.55
1941      -1            -.2        -.43       -.44
1942      -1            -.8        -.71       -.55
1943      +1           -1.0        -.86       -.55
1944      -3           -1.0        -.71       -.22
1945      -1            -.6        -.14       -.55
1946      -1            -.2        -.43       -.22
Fig. 6
and reading the time distances between various peaks. The average
of these time distances would give the average duration of the cycle.
The following table gives the values of a variable and its three-yearly,
five-yearly and seven-yearly averages, which have been plotted in
Figure No. 7.
TABLE VII
Values of a Variable
1920   225   ...   ...   ...
1921   213   213   ...   ...
Fig. 7 (curves: original data and 3-yearly, 5-yearly and 7-yearly moving
averages; horizontal axis: 1920 to 1935)
From the above graph it is clear that the first peak in the above
data is in the year 1925 and the second in 1930. The time distance
between these two peaks is five years. The third peak is in 1935 and
here also the time distance is the same. In this series, the time distance
between adjoining peaks is always five years. In other words, the cycle
has a perfectly uniform period. The best period of moving average
in the above series is, thus, five years. If the time distances between
adjoining peaks were not uniform then the arithmetic average of the various
time distances would have been calculated to obtain the average duration
of the cycle and the relevant period of the moving average.
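Reading off the peaks and averaging the distances between them can be sketched as below. The figures are hypothetical, made up so that the peaks fall in 1925, 1930 and 1935; they are not the values of Table VII, and the function names are assumptions of this sketch.

```python
def peak_years(years, values):
    """Years whose value is higher than both neighbouring values."""
    return [years[i] for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]


def average_cycle_length(peaks):
    """Arithmetic average of the time distances between adjoining peaks."""
    gaps = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical annual figures for 1920 to 1936.
years = list(range(1920, 1937))
values = [225, 213, 205, 215, 230, 245, 235, 225, 230,
          245, 260, 250, 240, 245, 260, 275, 265]

peaks = peak_years(years, values)
print(peaks, average_cycle_length(peaks))
```

The averaged distance is the figure that would be taken as the period of the moving average; when the distances are unequal it need not be a whole number and would have to be rounded.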
It should be noted that if the average duration of a cycle is an even
number of years, say six, then the average of the first six figures (1 to 6)
would be placed between the third and fourth items and similarly, the
average of item Nos. 2 to 7 would be placed between the fourth and
fifth items. The arithmetic average of these two moving average figures
would be kept against item No. 4. Similarly, the other trend values
would have to be found out. The calculation of a moving average with
an even period thus involves two processes and considerably increases
the work of calculation. Generally, however, the period of cycles is
in odd figures and this difficulty does not arise.
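The two processes needed for an even period can be sketched as follows. The function name `centred_even_ma` and the sample figures are hypothetical; the sketch simply follows the description above.

```python
def centred_even_ma(values, period):
    """Two-step moving average for an even period.

    Step 1 averages each group of `period` figures; such an average
    falls between two items. Step 2 averages each adjoining pair of
    step-1 figures, so that the result falls against an item.
    """
    if period % 2:
        raise ValueError("period must be even")
    first = [sum(values[j:j + period]) / period
             for j in range(len(values) - period + 1)]
    centred = [(a + b) / 2 for a, b in zip(first, first[1:])]
    half = period // 2
    out = [None] * len(values)
    for k, value in enumerate(centred):
        out[k + half] = value      # the first result goes against item No. half + 1
    return out


# Hypothetical series of eight items with a six-yearly period.
print(centred_even_ma([1, 2, 3, 4, 5, 6, 7, 8], 6))
```

With a period of six the first trend value falls against item No. 4, as described above, and three items at each end of the series are left without a trend value.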
The question that naturally arises is how to obtain such a line
which would satisfy the above-mentioned conditions. We shall first
take an example and illustrate, by a very simple and non-mathematical
technique, how this can be done. Later on, we shall examine the
mathematical implications of the technique.
TABLE VIII
Years    Production
1945       12.7
1946       10.1
1947       13.0
1948       13.2
1949       12.6
1950       14.2
1951       13.7
TABLE IX
Solution: Calculation of the Line of the Best Fit by the Method
of Least Squares
(2) The deviations of years from the middle year have been
calculated. They are given in column No. 3 and the squares of these
deviations are shown in column No. 4.
(3) Deviations given in column No. 3 have been multiplied by
the production figures given in column No. 2 and the products have
been totalled. This is in column No. 5.
(4) The total of column No. 5 has been divided by the total of
column No. 4. This figure (10.8/28 = +.386) shows the average annual
rate of growth.
(5) The arithmetic average of the figures of column No. 2
(12.786) has been written against the middle year in column No. 6.
The procedure of finding trend values for other years is clarified in this
column. The trend values of the years before the middle year would
be less than the trend value of the middle year and similarly, the trend
values of the years after the middle year would be more than the trend
value of the middle year. The difference between the figures of
adjoining years would be .386. In this example, if the average annual
rate of growth had been negative, the trend values of the years
before the middle year would have been more than the trend value of
the middle year and the trend values of years after the middle year would
have been less than the trend value of the middle year.
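The steps above can be followed in a short Python sketch using the production figures of Table VIII. The variable names are hypothetical, and the sketch assumes an odd number of consecutive years so that a middle year exists.

```python
years = [1945, 1946, 1947, 1948, 1949, 1950, 1951]
production = [12.7, 10.1, 13.0, 13.2, 12.6, 14.2, 13.7]

# Deviations of years from the middle year, and their squares.
middle = years[len(years) // 2]
x = [year - middle for year in years]
sum_x2 = sum(d * d for d in x)

# Products of the deviations and the production figures, totalled.
sum_xy = sum(d * y for d, y in zip(x, production))

# Average annual rate of growth and trend value of the middle year.
growth = sum_xy / sum_x2
middle_trend = sum(production) / len(production)

# Trend values move away from the middle year by `growth` per year.
trend = [middle_trend + growth * d for d in x]

print(round(growth, 3), round(middle_trend, 3))
print([round(t, 3) for t in trend])
```

The unrounded rate of growth is 10.8/28, so trend values computed this way can differ in the third decimal place from figures obtained by adding the rounded rate .386 year by year.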
The production figures of gold and the Line of the Best Fit
obtained by the Method of Least Squares have been shown in the graph on
next page.
TABLE X
Values of x and y
x     y
1     3
2     4
3     6
4     9
5    10
In the above table, the values of x and y are not in any fixed
relationship and we have to obtain such an equation which would give the most
probable values of a and b. First, we shall obtain five equations for
the five sets of relationships disclosed by the above figures.
Thus, when y = a + bx
3 = a + 1b
4 = a + 2b
6 = a + 3b
9 = a + 4b
and 10 = a + 5b
Any two of the above equations can be solved as simultaneous
equations to get the values of a and b. But these values would not
necessarily satisfy the remaining three equations. We have to obtain
such values of a and b which are most probable, taking into account the
relationships in all the above five equations. For this we shall have to
obtain two normal equations. The first normal equation can be obtained
by multiplying the five equations by the respective values of the
coefficients of a and adding them together. The second normal equation
can be obtained by multiplying the five equations by the respective values
of the coefficients of b and adding them together.
Since the value of the coefficient of a is unity in all the above five cases,
we shall simply add these five equations to get the first normal equation.
Thus, the first normal equation would be:
32 = 5a + 15b
To obtain the second normal equation we shall multiply equation
number 1 by 1, number 2 by 2, number 3 by 3 and so on, and after this
we shall add them. After multiplication, these equations would be:
3 = a + 1b
8 = 2a + 4b
18 = 3a + 9b
36 = 4a + 16b
50 = 5a + 25b
and their total would be:
115 = 15a + 55b
This is the second normal equation. Thus, the two normal
equations are:
5a + 15b = 32 ...... (i)
15a + 55b = 115 ...... (ii)
If these simultaneous equations are solved, the values of a and b
would be found to be .7 and 1.9 respectively. These are the most
probable values of a and b. From these values we can get
the equation which would give the line of the best fit.
y = a + bx
Substituting the values of a and b,
y = .7 + 1.9x
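The two normal equations can also be formed and solved mechanically. The sketch below does this for the figures of Table X; the variable names are hypothetical and the solution is obtained by simple elimination.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 6, 9, 10]

n = len(xs)
sum_x = sum(xs)                               # 15
sum_y = sum(ys)                               # 32
sum_x2 = sum(x * x for x in xs)               # 55
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 115

# Normal equations:
#   sum_y  = n * a     + b * sum_x      ... (i)
#   sum_xy = a * sum_x + b * sum_x2     ... (ii)
# Eliminating a between (i) and (ii) gives b, and (i) then gives a.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(round(a, 3), round(b, 3))
```

The same two totals, 32 and 115, appear on the left-hand sides of equations (i) and (ii) above, which is a convenient check on the arithmetic.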
or b = .386
Substituting the value of b in equation (i), we get
89.5 = 7a + 10.8
or 89.5 - 10.8 = 7a
or 78.7 = 7a
or a = 11.243
The equation of the Line of Best Fit is y = a + bx. Substituting the
above values of a and b, we have
y = 11.243 + .386x
To find the trend values or computed values of y for different values
of x (from 1 to 7), we substitute the values of x in the above equation and
get the following results:
When x = 1, y = 11.243 + .386 = 11.629
  "  x = 2, y = 11.243 + .772 = 12.015
  "  x = 3, y = 11.243 + 1.158 = 12.401
  "  x = 4, y = 11.243 + 1.544 = 12.787
  "  x = 5, y = 11.243 + 1.930 = 13.173
  "  x = 6, y = 11.243 + 2.316 = 13.559
  "  x = 7, y = 11.243 + 2.702 = 13.945
These trend values are shown in the last column of the table giving
the solution of Example No. 6 by this method. It will be observed that
the trend values obtained by this method are practically the same as those
obtained by the method first illustrated. The difference of .001 in the trend
values is due to the approximation of decimals.
In the example solved above, the calculation of trend values would
become very easy if the deviations from the middle year are taken as the
values of x. This has been done in the following table:
TABLE XII
World Production of Gold and its Line of Best Fit (in crores of ounces)
Year    Production   Deviations    Square of    Col. 2 ×   Trend
        (y)          from middle   deviations   Col. 3     values
                     year (x)      (x²)         (xy)
(1)     (2)          (3)           (4)          (5)        (6)
1945    12.7          -3             9          -38.1      11.628
1946    10.1          -2             4          -20.2      12.014
1947    13.0          -1             1          -13.0      12.400
1948    13.2           0             0            0        12.786
1949    12.6          +1             1          +12.6      13.172
1950    14.2          +2             4          +28.4      13.558
1951    13.7          +3             9          +41.1      13.944
Total   89.5                        28          +10.8
In the above data
Σ(y) = 89.5, Σ(x) = 0, Σ(xy) = 10.8, Σ(x²) = 28, n = 7
The two normal equations are
Σ(y) = na + bΣ(x)
Σ(xy) = aΣ(x) + bΣ(x²)
With this equation, the trend values can be easily obtained. Thus,
for the year 1945, when the value of x is -3, the trend value would be
12.786 + (-3 × .386) or 12.786 - 1.158 or 11.628. Similarly, other
values can be calculated. These values are identical with those calculated
before.
This simplification is possible only when the x's are consecutive
numbers. It will be so when there is an unbroken time series. If there is
a break in the time series, this simplification would not be of much help.
In fact, in an unbroken time series where the sum of the deviations from
the middle year is equal to zero, the two normal equations are:
Σ(y) = na
Σ(xy) = bΣ(x²)
It will be noted that in the above solution these simplified equations
are satisfied.
It should be kept in mind that if the deviations of these trend
figures from the original data are calculated, their sum would be zero.
In the above example, the positive and negative deviations both total
about 2.7 and their sum is zero. The sum of the squares of these
deviations would be about 6.1. The sum of the squares of such deviations
from values of trend other than these would be more than 6.1. The
Method of Least Squares is based on the principle that the sum of the
squares of the deviations from the Line of the Best Fit is the least.
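These properties can be verified for the gold production figures. The sketch below is an illustration only: the function names are hypothetical, and the alternative lines tried (a slope of .5, and an intercept raised by .2) are arbitrary choices made for comparison.

```python
x = [-3, -2, -1, 0, 1, 2, 3]                      # deviations from 1948
y = [12.7, 10.1, 13.0, 13.2, 12.6, 14.2, 13.7]    # production figures


def fit(x, y):
    """Least-squares line when the deviations x add up to zero."""
    intercept = sum(y) / len(y)
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return intercept, slope


def sum_of_squares(intercept, slope):
    """Sum of squared deviations of the data from a given line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


a, b = fit(x, y)
residuals = [v - (a + b * d) for d, v in zip(x, y)]

positive_total = sum(r for r in residuals if r > 0)
best = sum_of_squares(a, b)
steeper = sum_of_squares(a, 0.5)        # some other slope
higher = sum_of_squares(a + 0.2, b)     # some other intercept

print(round(sum(residuals), 6), round(positive_total, 2), round(best, 2))
print(steeper > best, higher > best)
```

Any other pair of intercept and slope could be substituted in the last two comparisons; the fitted line should always give the smaller sum of squares.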
The main limitation of the Method of Least Squares is that if some
items are added to the original series, a fresh equation has to be obtained
as the values of x, xy and x², etc., would all change with the addition
of items. In the case of the moving average, this difficulty is not there.
Solution:
TABLE XIV
Fitting of a Parabola of the Second Degree
Year    Values   Deviations    xy      x²y     x²     x³      x⁴
        (y)      from middle
                 year (x)
1945     17        -5          -85     425     25    -125    625
1946     20        -4          -80     320     16     -64    256
1947     19        -3          -57     171      9     -27     81
1948     26        -2          -52     104      4      -8     16
1949     24        -1          -24      24      1      -1      1
1950     40         0            0       0      0       0      0
1951     35        +1          +35      35      1      +1      1
1952     55        +2         +110     220      4      +8     16
1953     51        +3         +153     459      9     +27     81
1954     74        +4         +296    1184     16     +64    256
1955     69        +5         +345    1725     25    +125    625
  "  x = 2, y = 48.12
  "  x = 3, y = 56.07
  "  x = 4, y = 64.88
  "  x = 5, y = 74.55
These are the required trend values. The original data and the
trend values are plotted below in figure 9:
Values of a Variable and its Trend
(Parabola of the Second Degree)
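The fitting of the parabola can be sketched in Python from the columns of Table XIV. This is an illustration under stated assumptions: the parabola is written as y = a + bx + cx², where the symbols a, b and c are labels chosen for this sketch, and the three normal equations are simplified by the fact that the deviations x and their cubes add up to zero.

```python
y = [17, 20, 19, 26, 24, 40, 35, 55, 51, 74, 69]   # values, 1945 to 1955
x = list(range(-5, 6))                             # deviations from 1950

n = len(y)
sum_y = sum(y)
sum_xy = sum(d * v for d, v in zip(x, y))
sum_x2y = sum(d * d * v for d, v in zip(x, y))
sum_x2 = sum(d ** 2 for d in x)
sum_x4 = sum(d ** 4 for d in x)

# With sum(x) = sum(x^3) = 0 the normal equations reduce to:
#   sum_y   = n * a      + c * sum_x2
#   sum_xy  = b * sum_x2
#   sum_x2y = a * sum_x2 + c * sum_x4
b = sum_xy / sum_x2
c = (n * sum_x2y - sum_x2 * sum_y) / (n * sum_x4 - sum_x2 ** 2)
a = (sum_y - c * sum_x2) / n

trend = [a + b * d + c * d * d for d in x]
print(round(a, 2), round(b, 2), round(c, 2))
print([round(t, 2) for t in trend])
```

Values computed this way with unrounded constants agree with the trend values printed above only to within about a tenth of a unit, presumably because the printed figures were worked out from rounded constants.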
Fig. 10
TABLE XVI
Calculation of Seasonal Variations Index by Monthly Averages
               Production of a Commodity                  Monthly   Seasonal
Month     1936   1937   1938   1939   1940     Total     average   variation
                                                                   index
(1)       (2)    (3)    (4)    (5)    (6)      (7)       (8)       (9)
Jan.      131    145    208    238    263       985      197.0      99.7
Feb.      129    151    211    237    261       989      197.8     100.1
Mar.      121    149    207    225    250       952      190.4      96.3
Apr.      119    143    201    219    244       926      185.2      93.7
May       113    149    201    212    240       915      183.0      92.6
June      116    159    203    206    253       937      187.4      94.8
July      113    153    193    201    250       910      182.0      92.1
Aug.      123    165    205    210    255       958      191.6      97.0
Sep.      130    173    210    220    269      1002      200.4     101.4
Oct.      131    185    221    237    285      1059      211.8     107.2
Nov.      145    197    223    241    290      1096      219.2     110.9
Dec.      149    198    222    260    300      1129      225.8     114.2
Total                                         11858     2371.6    1200.0
Average                                         988      197.6     100.0
The procedure followed above can be explained in the following
four steps:
(1) Find out the total of the values of similar months. For
example, in table XVI above the totals of the figures of the Januarys,
Februarys, etc., are given in column No. 7.
(2) Divide these totals by the number of years. This will give
the monthly average for the various months (table XVI, column No. 8).
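The calculation behind Table XVI can be sketched as below, starting from the monthly totals of column No. 7. The last column of the table is consistent with expressing each monthly average as a percentage of the average of all the monthly averages; that reading, like the function name `seasonal_indices`, is an assumption of this sketch.

```python
months = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "June",
          "July", "Aug.", "Sep.", "Oct.", "Nov.", "Dec."]

# Totals of the five years 1936 to 1940 for each month (column No. 7).
totals = [985, 989, 952, 926, 915, 937, 910, 958, 1002, 1059, 1096, 1129]


def seasonal_indices(totals, number_of_years):
    """Monthly averages expressed as percentages of their own average."""
    monthly_avg = [t / number_of_years for t in totals]
    overall = sum(monthly_avg) / len(monthly_avg)
    return [100 * m / overall for m in monthly_avg]


indices = seasonal_indices(totals, 5)
for month, index in zip(months, indices):
    print(month, round(index, 1))
```

Indices found this way add up to 1,200, so that their average is 100. Small differences in the first decimal place from the printed column can arise because the table divides by the rounded average 197.6.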
The seasonal fluctuations of the data given in table XVI are very
clearly visible in the above figure. In this example data relating only
to five years have been taken. In actual practice, data relating to a
larger number of years should be taken so that cyclical fluctuations may
not affect seasonal variations.
Month    Original   12-monthly   Centred     Deviations   Seasonal    Irregular
         figures    moving       moving      from trend   variation   fluctuations
                    average      average
                                 (trend)
(1)      (2)        (3)          (4)         (5)          (6)         (7)
1936
May        113        ...          ...          ...          ...         ...
June       116        127          ...          ...          ...         ...
July       113        128         127.5        -14.5        -17.1       +2.6
Aug.       123        130         129.0         -6.0         -9.1       +3.1
Sept.      130        132         131.0         -1.0         -4.4       +3.4
1938
Jan.       208        195         193.0        +15.0        +15.2       -0.2
1939
Jan.       238        218         217.5        +20.5        +15.2       +5.3
Feb.       237        218         218.0        +19.0        +13.9       +5.1
1940
Jan.       263        243         241.0        +22.0        +15.2       +6.8
Feb.       261        247         245.0        +16.0        +13.9       +2.1
July       250
Aug.       255
Nov.       290
Dec.       300
The figures of seasonal variations given in col. (6) of table No.
XVII above have been calculated as below:
TABLE XVIII
                      Deviations from Trend                  Seasonal variation
Month        1936     1937     1938     1939     1940       (average of cols.
                                                             2, 3, 4, 5 and 6)
(1)          (2)      (3)      (4)      (5)      (6)         (7)
January      ...      + 3.5    +15.0    +20.5    +22.0       +15.2
February     ...      + 5.0    +15.5    +19.0    +16.0       +13.9
March        ...      - 1.0    + 7.5    + 7.5    + 1.0       + 3.8
April        ...      -11.0    - 1.0    - 0.5    - 9.0       - 5.4
May          ...      - 9.0    - 3.0    - 9.0    -17.5       - 9.6
June         ...      - 3.0    - 3.0    -17.5    - 7.5       - 7.8
July         -14.5    -13.5    -15.5    -25.0    ...         -17.1
August       - 6.0    - 6.5    - 6.0    -18.0    ...         - 9.1
Sept.        - 1.0    - 3.5    - 3.0    -10.0    ...         - 4.4
Oct.           0.0    + 2.0    + 6.0    + 5.0    ...         + 3.3
Nov.         +10.5    +12.0    + 7.5    + 7.0    ...         + 9.3
Dec.         +10.0    + 9.0    + 5.0    +23.0    ...         +11.8
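The averaging in Table XVIII, and the way an irregular fluctuation is then left as a remainder, can be sketched for three of the months. The dictionary layout and names below are hypothetical; the figures are the deviations from trend shown in the table.

```python
# Deviations from trend for the years in which they are available.
deviations = {
    "January":  [3.5, 15.0, 20.5, 22.0],      # 1937 to 1940
    "July":     [-14.5, -13.5, -15.5, -25.0], # 1936 to 1939
    "December": [10.0, 9.0, 5.0, 23.0],       # 1936 to 1939
}

# Seasonal variation of a month = average of its deviations from trend.
seasonal = {month: sum(values) / len(values)
            for month, values in deviations.items()}

# Irregular fluctuation = deviation from trend minus seasonal variation,
# shown here for July 1936.
irregular_july_1936 = deviations["July"][0] - seasonal["July"]

print(seasonal)
print(round(irregular_july_1936, 1))
```

The unrounded averages are 15.25, -17.125 and 11.75; the table shows them to one decimal place.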
Fig. 12
Seasonal variation and irregular fluctuations
(curves: Seasonal Variations; Irregular Fluctuations)
Fig. 13
The above graphs very clearly show the seasonal variations of the
data.
Method of link relatives
The following steps should be taken to calculate the seasonal
variation indices by this method:
(1) Calculate the link relatives of the seasonal figures. For
calculating link relatives divide the figure of each season by the figure of the
immediately preceding season and multiply by 100.
Link relative = (Current season's figure / Previous season's figure) × 100
(2) Calculate the average of the link relatives for each season.
(3) Convert these averages into chain relatives on the base of the
first season.
(4) Calculate the chain relative of the first season on the base
of the last season. There will be some difference between this chain
relative of the first season and the chain relative calculated by the previous
method. This is due to the effect of long period changes. It is,
therefore, necessary to correct these chain relatives.
(5) For correction, the chain relative of the first season calculated
by the first method is deducted from the chain relative (of the first season)
calculated by the second method. The difference is divided by the
number of seasons. The resulting figure multiplied by 1, 2, 3 (and so
on) is deducted respectively from the chain relatives of the 2nd, 3rd,
4th (and so on) seasons. These are the corrected chain relatives.
(6) Corrected chain relatives, expressed as percentages of their
average, give the indices of seasonal variations.
The following table would illustrate the above procedure:
TABLE XIX
Quarterly Figures
Quarter   1940   1941   1942   1943   1944
TABLE XX
Calculation of Chain Relatives
                                  Quarters
Year                     1        2        3        4
1940                     --      120      133       83
1941                     80      117      113       89
1942                     88      129      111       92
1943                     80      125      115       96
1944                     83      117      120       79
Arithmetic average      82.8    121.6    118.4      88
Chain relatives:
  1st quarter   100
  2nd quarter   (100 × 121.6)/100 = 121.6
  3rd quarter   (121.6 × 118.4)/100 = 143.9
  4th quarter   (143.9 × 88)/100 = 126.6
Corrected chain relatives:
  1st quarter   100
  2nd quarter   121.6 - 1.2 = 120.4
  3rd quarter   143.9 - 2.4 = 141.5
  4th quarter   126.6 - 3.6 = 123.0
Seasonal indices:
  1st quarter   (100/121.2) × 100
  2nd quarter   (120.4/121.2) × 100
  3rd quarter   (141.5/121.2) × 100
  4th quarter   (123.0/121.2) × 100
In the above table the figure for correction has been calculated as
follows:
Chain relative of first quarter (on the basis of first quarter) = 100
Chain relative of first quarter (on the basis of last quarter)
= (82.8 × 126.6)/100 = 104.8
The difference between these chain relatives = 104.8 - 100 = 4.8
Difference per quarter = 4.8/4 = 1.2
Seasonal variation indices have been calculated as follows:
Average of corrected chain relatives
= (100 + 120.4 + 141.5 + 123.0)/4 = 121.2
Seasonal variation indices = (Corrected chain relatives × 100)/121.2
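Steps (3) to (6) of the method of link relatives can be sketched in Python, starting from the average link relatives of Table XX. The variable names are hypothetical, and the sketch carries full decimals throughout, whereas the worked example rounds at each stage.

```python
# Average link relatives for quarters 1 to 4 (Table XX).
avg_link = [82.8, 121.6, 118.4, 88.0]

# Step (3): chain relatives on the base of the first quarter.
chain = [100.0]
for q in range(1, len(avg_link)):
    chain.append(chain[-1] * avg_link[q] / 100)

# Step (4): chain relative of the first quarter on the base of the last.
first_on_last = chain[-1] * avg_link[0] / 100

# Step (5): spread the difference evenly over the quarters.
per_quarter = (first_on_last - 100.0) / len(avg_link)
corrected = [c - per_quarter * k for k, c in enumerate(chain)]

# Step (6): corrected chain relatives as percentages of their average.
average = sum(corrected) / len(corrected)
indices = [100 * c / average for c in corrected]

print([round(c, 1) for c in corrected])
print([round(i, 1) for i in indices])
```

Because no intermediate rounding is done, the corrected chain relatives may differ from those of Table XX by a few hundredths; the indices come to roughly 82.5, 99.3, 116.7 and 101.5 and add up to 400.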
If from the original series trend values are isolated the remainder
consists of regular and irregular fluctuations. If from these fluctuations
seasonal variations are also isolated the remainder consists of cyclical and
irregular fluctuations. In figure No. 13 we have presented seasonal
variations and the remaining cyclical and irregular fluctuations.
Ordinarily, if the period of the moving average coincides with the period
of the cycle in the series and if the cycle is of a more or less uniform duration,
cyclical fluctuations are considerably reduced and sometimes eliminated
in the process of finding out the trend values. As such, if seasonal
variations are isolated from the total fluctuations, the remainder mostly
consists of irregular fluctuations. However, a certain amount of cyclical
fluctuations may also be there if they have not been completely eliminated.
There is no well-recognised method of separating irregular and cyclical
fluctuations. One method is by finding out the moving average of the
series of cyclical and irregular fluctuations. By this process the irregular
fluctuations would be reduced and cyclical data would become more
prominent. The period of the moving average in this case would depend
on two factors: (1) the irregularity of the data and (2) the extent to which
the curve is to be smoothed. The more irregular a series is, the longer
should be the period of the moving average so that irregularities in one
direction may be set off against irregularities in another direction. But
if the period is too long the curve would be very much smoothed. The
problem is to find a middle course between these two factors and in
taking a decision the object of the analysis should be kept in mind.
So far as irregular fluctuations are concerned, there is no method
to isolate them. Since by nature they are irregular it is difficult to know
anything about them. After removing trend, seasonal and cyclical
fluctuations from the original data, whatever remains constitutes the
irregular fluctuations. Since they are irregular in character, their scientific
analysis is more or less out of the question. It should, however, not be
taken to mean that they are not important. They are sometimes very
significant and they can even give birth to cyclical and other types of
regular fluctuations.
Questions
1. Distinguish between secular trend, seasonal variations and cyclical
fluctuations. How would you measure secular trend in any given data? (M. Com., Agra, 1946)
2. (a) Distinguish between regular and irregular fluctuations in a time series.
(b) Write a short note on the value of analysing time variations.
(M. A., Punjab, 1951).
3. Write a short essay on "Analysis of Time Series". (M. A., Patna, 1954).
4. Describe briefly the statistical procedure you would adopt in the analysis
of a time series and explain how you would isolate the secular trend.
(M. A., Patna, 1942).
5. Explain clearly what is meant by time series analysis. Indicate fully the
importance of such analysis in business. (B. Com., Lucknow, 1944).
6. Describe one method each of (a) eliminating the effect of trend from a time
series (b) measuring the seasonal variations.
In measuring seasonal variation, can cyclical and erratic influences be
eliminated? How? (I. A. S., 1948).
7. What is meant by trend? How would you statistically eliminate the
influence of seasonal and cyclic factors on the long period movement of any series?
(B. Com., Bombay, 1936).
8. Discuss the claims and limitations of the method of moving averages as
applied to analysis of time series. (M. A., Delhi, 1953).
9. How would you find out the trend values in a series by the Method of Least
Squares? Explain the mathematical implications of the technique.
10. What do you understand by parabolic trend? How would you fit a
parabola of the 2nd degree to a time series to obtain trend values?
11. In an experiment designed to find the effects of seed rate on the yield of
wheat the following results were obtained:
Seed rate (lbs. per acre)        40    50    60    70    80
Average yield of wheat
(lbs. per acre)                 850   862   858   817   768
Draw a graph and fit a second degree parabola.
12. Fit a straight line trend by the method of least squares and a parabolic trend
(by a parabola of the second degree) to the data relating to growth of reserves of
Cooperative Societies in India as given below. Plot the series and the trend on a
graph paper:
13. Explain how you will deal with a time series, and illustrate your remarks
with the help of the following series of annual figures for the period 1901-1930.
Period        Annual Values
1901-1910     208, 223, 225, 212, 239, 242, 238, 252, 257, 250
1911-1920     273, 270, 268, 288, 284, 282, 300, 305, 295, 313
1921-1930     317, 309, 329, 335, 327, 345, 344, (illegible), 362, 360
(I. C. S., 1939).
14. Explain the use of moving averages in the analysis of time series. Find
out an approximate moving average for the following series:
1901     506        1912     818
1902     620        1913     745
1903    1036        1914     845
1904     615        1915    1276
1905  (illegible)   1916  (illegible)
1906     696        1917     814
1907    1116        1918     929
1908     738        1919    1560
1909     663        1920     961
1910     777        1921     926
1911    1189
(M. A., Calcutta)
15. The following are the figures for the infantile mortality rate in England
and Wales (deaths of infants under one year of age per 1,000 live births).
Year    Rate      Year    Rate
1922     77       1936     59
1923     69       1937     58
1924     75       1938     53
1925     75       1939     51
1926     70       1940     57
1927     70       1941     60
1928     65       1942     61
1929     74       1943     49
1930     60       1944     45
1931     66       1945     46
1932     65       1946     43
1933     64
1934     59
1935     57
Fit a simple moving average of fives to the series and apply a further simple
moving average of fives to the result.
16. Business Cycles in U. S. A. and England arranged in chronological order
(1796-1923) have had the following duration as measured to the nearest years:
U. S. A.
6,6, 5, 3. 7, 3,3, 5,4, 3, r" t, ", 6, 4, 3, 5, 5,4.9, " 3. 2. 3, 4, 3. 4 ,2, " 5, 2, 3·
England
4, 6, 3, 5, 6, 4, 2, 6, 10, 7, 4, 8, 8, 9, 8, 10, 7, 6, 5, 2.
Tabulate the above figures in classes of one year each and calculate the average
duration of the business cycle in each country separately. (B. Com., Lucknow, 1939).
17. The following table gives the Bank Clearings in the Bombay city for the
years 19.6 to 1940 in millions of rupees. Find the trend.
25. The following are the quarterly index numbers of Industrial Production
with 1930 = 100 (All items) published by the Board of Trade, U. K. By a moving
average of four, calculate a quarterly index corrected for seasonal effects.
Year    Index      Year    Index
1928               1930
I       106.0      I       107.6
II      100.4      II      100.0
III      97.1      III      96.5
IV      103.7      IV       96.0
1929               1931
I       107.2      I        91.5
II      108.6      II       89.1
III     107.3      III      86.4
IV      110.5      IV       94.1
1932
I        91.7
II       91.0
III      84.4
IV       91.7
26. The following is an index no. of the price of lead from 1926-1945 together
with the "Statist" wholesale price index of the period. Construct an index of lead
prices, "corrected" for changes in the wholesale price level.
Year    Index of wholesale    Index No. of
        price                 lead prices
1926        125                  157
1927        122              (illegible)
1928        119                  109
1929        114                  117
1930         96                   95
1931         82                   71
1932         79                   63
1933         78                   65
1934    (illegible)               61
1935         83                   78
1936         88                   95
1937        102                  121
1938         90                   83
1939         94                   85
1940        128                  127
1941        142                  129
1942        151                  129
1943        115                  129
1944        160                  129
1945        164                   42
27. The number (in hundreds) of letters posted in a certain city on each day in
a typical period of five weeks was as follows:
            Sun.   Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.   Total for
                                                                each week
1st week     18    161    170     164    153      181     76      923
2nd week     18    165    179     157    168      195     85      967
~rd week 162 169 153 139 IS, S.1 9I1
~th week 182 170 16z 179 95 9 83
5th week 186 170 170 18z 120 101 7
Total for 886 814 79 2 922- 458 4801
1111 weeks ,
Calculate the average fluctuation indices within a week.
(B. Com., Andhra, 1942)
50. The revenue from Sales Tax in U. P. during 1958-59 to 1962-63 is shown
in the following table. Fit a straight line trend by the method of least squares and
exhibit the data as also the trend on a graph paper.
Years                   1958-59   1959-60   1960-61   1961-62   1962-63
Revenue (Rs. Lakhs) 427 6u 52 1
51. Find out Trend, Short-time Oscillation, Seasonal Variations and Irregular
fluctuation from the following data:
Year Summer Monsoon Autumn Winter
1. 62 II9
,....
J.. 86
99
u9
171
2.2.1
2.35
s· 136 5 0 2.
Correlation 16
Meaning. In the various types of analysis discussed so far in previous
chapters we have confined ourselves to such series where various items
assumed different values of one variable. We have discussed how
measures of central tendency and measures of dispersion and skewness
are calculated in such cases for purposes of comparison and analysis.
With the help of these measures such data can be easily understood.
There can, however, be such series also where each item assumes the
values of two or more variables. For example, if the heights and weights
of a group of persons are measured we shall get such series where each
member of the group would assume two values, one relating to height
and the other relating to weight. If besides heights and weights, the chest
measurements were also taken, each member of the group would assume
three values relating to three different variables. In such cases we can
calculate averages, dispersion and skewness, etc., in accordance with
the rules given in previous chapters.
But sometimes it appears that the values of the various variables
so obtained are inter-related. It is likely that such a relationship may
be found in two series relating to the heights and weights of a group
of persons. It may be observed that weights increase with increase in
heights, so that tall people are heavier than short people.
Similarly, if data are collected about the prices of a commodity and the
quantities sold at different prices two series would be obtained. One
variable would be the various prices of the commodity and the other
variable would be the quantities sold at these prices. In two such series
we are again likely to find some relationship. With an increase in the
price of the commodity the quantity sold is likely to decrease. We can
thus conclude that there is some relationship between price and demand.
Such relationships can be found in many types of series, for example,
prices and supply, heights and weights of persons, prices of sugar and
sugarcane, ages of husbands and wives, etc.
The term correlation (or co-variation) indicates the relationship between
two such variables in which, with changes in the values of one variable, the
values of the other variable also change. Here the word relationship has
been used in the sense of mutual dependence. Correlation in two series
need not always be the result of their mutual inter-dependence. Changes
in one series may be the cause of changes in the other and there may
be a cause and effect relationship between the two series. It is also likely
that the changes in the two series are the effects of a third factor which
affects both these series either in the same way or in different ways.
SCATTER DIAGRAM
The three figures given above are scatter diagrams. They indicate
the scatter of various points. These points are not in any mathematical
relationship and as such they only indicate the trend of the data. Figure
No. 1 indicates positive correlation as it shows that the values of the two
variables move in the same direction. Figure No. 2 indicates negative
correlation as the values of the two variables are moving in reverse
directions. Figure No. 3 does not have any trend line and it shows that
there is no correlation between the two variables.
It is possible to have a line of the best fit in the above type of data.
If it is drawn by the method of least squares it would set a mathematical
relationship in the variations of the two variables. One advantage of
the line of the best fit is that with it, if the value of one variable is given,
it is possible to estimate the value of the other variable. The line of the
best fit can also be drawn by free-hand. It is rather difficult to draw such
a line by free-hand method. Generally a piece of thread is stretched
through the plotted points to locate the best possible position for the
line.
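The idea of a line of the best fit drawn by the method of least squares can be sketched in a few lines of Python. This is only an illustration: the observations are hypothetical and the helper name fit_line is an assumption, not something taken from the text.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b

# Hypothetical observations (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [12, 15, 21, 24, 28]

a, b = fit_line(xs, ys)
estimate_at_6 = a + b * 6      # estimate y for a given value of x
```

Once a and b are known, the value of one variable can be estimated for any given value of the other, which is the advantage of the fitted line mentioned above.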
CORRELATION GRAPH
TABLE I
Price and Supply of a Commodity
Year        Price per Maund     Supply
            (in rupees)         (in maunds)
1944             32              22,000
1945             45              29,000
1946             32              22,000
1947             29              19,000
1948             44              27,000
1949             69              43,000
1950             40              24,000
1951             29              18,000
1952             31              20,000
1953             39              23,000
1954             53              32,000
1955             43              26,000
Average         40.5             25,400
In representing the above data by a graph the ordinate will have
two scales, one which will show the price (in rupees) and the other
which will show the supply (in maunds). In adjusting the scales of the
two variables, care should be taken to see that their averages are at the
same level or in any case very close to each other. If this is done the
two curves would be close to each other and their relationship can be
easily studied. If need be, a false base line can be taken for the purpose
[Fig. 4: Price and Supply of a Commodity. Price (per maund) and supply (in thousand maunds) are plotted against years on two vertical scales.]
COEFFICIENT OF CORRELATION
[Fig. 5 and Fig. 6: graphs in which the plotted points lie on a straight line]
It would be observed from the above graphs that all the corresponding
values of x and y are in a straight line. Figure 5 indicates perfect
positive correlation between x and y as the variations in the values of the
two series are always in a fixed proportion and they move in the same
direction. Figure 6, on the other hand, shows a perfect negative correlation
between x and y as the variations between their values are in a constant
ratio and the two series move in reverse directions.
After knowing this, it is necessary to obtain such a measure of correlation
which can accurately indicate the degree of correlation in quantitative
terms. The measure should be such that its extreme values represent
perfect positive and perfect negative correlations and the value
in the middle, absence of correlation. Such a measure is given by the
coefficient of correlation.
The coefficient of correlation which we are going to discuss in
the following pages always varies between the two limits of +1 and −1.
When there is perfect positive correlation its value is +1 and when there
is perfect negative correlation its value is −1. Its mid-point is 0, which
indicates absence of correlation. As the value of this coefficient decreases
from the upper limit of +1, the extent of positive correlation between
the two variables also declines. When it reaches the value of 0 it indicates
complete absence of correlation and when it goes further down in negative
values (less than zero) it indicates negative correlation. When it reaches
the other limit of −1 there is evidence of perfect negative correlation
between the two series.
The above-mentioned points can be studied from the graphs which
have been given so far. When the values of the variable are like those
given in Figure 5 there is perfect positive correlation or the value of the
coefficient of correlation is +1; when they are like those given in Figure
1 there is positive correlation but it is not perfect, or the value of the
coefficient of correlation is less than +1 but more than 0. When the
values are like those given in Figure 3 there is no correlation between
the data or the value of the coefficient of correlation is 0; when the values
are like those given in Figure 2, there is negative correlation though not
perfect, which means that the value of the coefficient of correlation would
lie between 0 and −1. If, however,
the values of the variable are like those given in Figure 6 there is perfect
negative correlation or in other words, the value of the coefficient of
correlation would be −1.
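The limits described above can be checked numerically. The following Python sketch uses hypothetical series and an illustrative helper pearson_r (a product-moment coefficient written out by hand, not code from the text) to show the values +1, −1 and an intermediate value.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment coefficient of correlation of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                                 # hypothetical values
r_perfect_pos = pearson_r(xs, [2 * x + 3 for x in xs])   # points on a rising line
r_perfect_neg = pearson_r(xs, [10 - 2 * x for x in xs])  # points on a falling line
r_imperfect = pearson_r(xs, [2, 1, 4, 3, 5])             # rising trend, not exact
```

The first two series lie exactly on straight lines and give the two limits; the third moves broadly in the same direction as x and gives a value between 0 and +1.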
CALCULATION OF COEFFICIENT OF CORRELATION
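$$\dots - (a_1 y_1^2 + a_2 y_2^2 + \dots + a_n y_n^2)^2$$
$$= y_1^2 y_2^2 (a_1^2 + a_2^2 - 2 a_1 a_2) + \dots$$
$$= y_1^2 y_2^2 (a_1 - a_2)^2 + \dots$$
= 0 if $a_1 = a_2 = a_3 = \dots = a_n$,
or positive for all other values, i.e. > 0 for other values.
Thus
(i) When $a_1 = a_2 = \dots = a_n$
$$n^2 \sigma_1^2 \sigma_2^2 - (\Sigma xy)^2 = 0$$
or $(\Sigma xy)^2 = n^2 \sigma_1^2 \sigma_2^2$
or $\dfrac{(\Sigma xy)^2}{n^2 \sigma_1^2 \sigma_2^2} = 1$
or $r^2 = 1$
or $r = \pm 1$
(ii) When $a_1$ is not equal to $a_2$ or $a_3$, etc.
$$n^2 \sigma_1^2 \sigma_2^2 - (\Sigma xy)^2 > 0$$
or $n^2 \sigma_1^2 \sigma_2^2 > (\Sigma xy)^2$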
Average height of father $= \dfrac{\Sigma m}{n} = \dfrac{544}{8} = 68''$
Average height of son $= \dfrac{\Sigma m}{n} = \dfrac{552}{8} = 69''$
or $1 > \dfrac{(\Sigma xy)^2}{n^2 \sigma_1^2 \sigma_2^2}$
or $1 > r^2$
or $1 > r$
or $r < 1$
Thus r, or the coefficient of correlation, cannot exceed unity.
464 FUNDAMENTALS OP STATISTICS
$$\sigma_1 = \sqrt{\frac{\Sigma x^2}{n}} = \sqrt{\frac{36}{8}} = 2.12''$$
Standard deviation of the heights of sons
$$\sigma_2 = \sqrt{\frac{\Sigma y^2}{n}} = \sqrt{\frac{44}{8}} = 2.34''$$
Substituting the above values in Karl Pearson's formula, we get,
$$r = \frac{+24}{8 \times 2.12 \times 2.34} = +.6$$
(ii)
$$r = \frac{\Sigma xy}{n \sqrt{\dfrac{\Sigma x^2}{n}} \sqrt{\dfrac{\Sigma y^2}{n}}}$$
If the above example is solved by this formula, we get
$$r = \frac{24}{8 \times \sqrt{\dfrac{36}{8}} \times \sqrt{\dfrac{44}{8}}} = +.6$$
(iii)
$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}}$$
In the above example with this formula we get
$$r = \frac{24}{\sqrt{36 \times 44}} = +.6$$
$$r = \frac{37560 - 544 \times 552/8}{\sqrt{\{37028 - (544)^2/8\}\{38132 - (552)^2/8\}}}$$
$$= \frac{37560 - 37536}{\sqrt{36 \times 44}} = \frac{24}{\sqrt{36 \times 44}} = +.6$$
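As a quick arithmetic check, the following Python sketch evaluates two of the forms used above from the sums quoted in the worked example (n = 8, Σxy = 24, Σx² = 36, Σy² = 44, all taken from deviations about the actual means). The variable names are illustrative only.

```python
from math import sqrt

n = 8          # pairs of observations
sum_xy = 24    # sum of products of deviations from the actual means
sum_x2 = 36    # sum of squared deviations, first series
sum_y2 = 44    # sum of squared deviations, second series

# Form using the standard deviations explicitly.
sigma_1 = sqrt(sum_x2 / n)
sigma_2 = sqrt(sum_y2 / n)
r_with_sigmas = sum_xy / (n * sigma_1 * sigma_2)

# Form using only the sums of squares.
r_from_sums = sum_xy / sqrt(sum_x2 * sum_y2)
```

Both expressions are algebraically identical, so they agree to rounding error and give approximately +.6, as in the text.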
Short-cut methods
In these methods assumed average is used for the calculation of
coefficient of correlation. Instead of taking deviations from the actual
arithmetic average (for calculating standard deviation and $\Sigma xy$) the
deviations in both the series are taken from assumed averages. The
sum of the products of such corresponding deviations ($\Sigma xy$) is later on
corrected by subtracting from this figure the product of the differences
between the actual and assumed averages in the two series, and the number
of pairs of observations. The standard deviation of the two series is
either calculated by the short-cut method or the relevant formula is
inserted in the formula of calculating coefficient of correlation. Thus
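coefficient of correlation, or
$$r = \frac{\Sigma xy - n(a_1 - x_1)(a_2 - x_2)}{n \sigma_1 \sigma_2}$$
Where,
$a_1$ = Actual arithmetic average of the first series.
$a_2$ = Actual arithmetic average of the second series.
$x_1$ = Assumed arithmetic average of the first series.
$x_2$ = Assumed arithmetic average of the second series.
$\Sigma xy$ = Sum of the products of deviations from the assumed averages.
The other symbols stand for the same things as in the first formula.
or
(ii)
$$r = \frac{\Sigma xy - n\left(\dfrac{\Sigma x}{n}\right)\left(\dfrac{\Sigma y}{n}\right)}{n \sqrt{\dfrac{\Sigma x^2}{n} - \left(\dfrac{\Sigma x}{n}\right)^2} \sqrt{\dfrac{\Sigma y^2}{n} - \left(\dfrac{\Sigma y}{n}\right)^2}}$$
(iii)
$$r = \frac{\Sigma xy - \dfrac{\Sigma x \, \Sigma y}{n}}{\sqrt{\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}} \sqrt{\Sigma y^2 - \dfrac{(\Sigma y)^2}{n}}}$$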
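The short-cut method can be sketched in Python as follows. The eight pairs of heights are illustrative values chosen to be consistent with the totals quoted in the earlier worked example (Σ = 544 and 552); they, the assumed averages 67 and 68, and the function names are assumptions made for the sketch, not data reproduced from the text.

```python
from math import sqrt

def r_direct(xs, ys):
    """Coefficient of correlation from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r_assumed(xs, ys, assumed_x, assumed_y):
    """Short-cut: deviations are taken from assumed averages, then corrected."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    actual_x, actual_y = sum(xs) / n, sum(ys) / n
    # Standard deviations by the short-cut formula.
    sigma_1 = sqrt(sum(d * d for d in dx) / n - (sum(dx) / n) ** 2)
    sigma_2 = sqrt(sum(d * d for d in dy) / n - (sum(dy) / n) ** 2)
    correction = n * (actual_x - assumed_x) * (actual_y - assumed_y)
    return (sum_dxdy - correction) / (n * sigma_1 * sigma_2)

# Illustrative heights in inches (assumed values, see lead-in).
fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

r1 = r_direct(fathers, sons)
r2 = r_assumed(fathers, sons, assumed_x=67, assumed_y=68)
```

Whatever assumed averages are chosen, the correction term brings the result back to the value obtained from the actual means.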
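$$\sigma_1 = \sqrt{\frac{2830}{10} - \left(\frac{12}{10}\right)^2} = 16.79 \text{ thousand labourers}$$
$$\sigma_2 = \sqrt{\frac{91}{10} - \left(\frac{-5}{10}\right)^2} = 2.97 \text{ lakh bales}$$
Coefficient of correlation, or
$$r = \frac{390 - 10\,[(581.2 - 580)(24.5 - 25)]}{10 \times 16.79 \times 2.97} = +.8$$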
In the second, third and fourth methods there is no need to
calculate the arithmetic average, or the standard deviation.
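Coefficient of Correlation
Second Method
$$r = \frac{390 - 10\left(\dfrac{12}{10}\right)\left(\dfrac{-5}{10}\right)}{\sqrt{2815.6 \times 88.5}} = +.8$$
Coefficient of Correlation
Fourth Method
$$r = \frac{390 \times 10 - (12 \times -5)}{\sqrt{2830 \times 10 - (12)^2}\,\sqrt{91 \times 10 - (-5)^2}} = \frac{3960}{\sqrt{28156 \times 885}} = +.8$$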
The coefficient of correlation of +.8 indicates a high degree of
positive correlation between the number of labourers and the consumption
of cotton. It means that with an increase in the number
of labourers the consumption of cotton also increases and vice versa.
Coefficient of correlation
Direct Method, Formula No. 3
$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}} = \frac{\Sigma xy}{\sqrt{944 \times 4147}}$$
The value of +.07 indicates that there is hardly any correlation
between the short time oscillations of the two series. Probably this is
on account of the fact that the figures are imaginary. Ordinarily
short time fluctuations of supply and prices give a high degree of
correlation.
(iii) Correlation of cyclical fluctuations. In example No. 3 above
we have calculated the coefficient of correlation between short time
fluctuations of two series. We know that short time fluctuations
consist of seasonal variations, cyclical fluctuations and irregular
fluctuations. It may be desired to study the correlation exclusively
between cyclical fluctuations of two series. For this, it is necessary to
obtain exclusive figures of cyclical fluctuations. After this has been
done, these cyclical fluctuations are divided by the standard deviations
of the series to which they relate. This is done to bring them to a
common denominator. These figures are then multiplied in pairs
and their products are totalled to obtain the value of $\Sigma xy$. This figure
divided by the number of pairs of values gives the required coefficient
of correlation. Here the formula for coefficient of correlation is
$$r = \frac{\Sigma xy}{n}$$
The reason for this modification in the formula is that the values
are already divided by their respective standard deviations and as such
there is no need of dividing $\Sigma xy$ by the product of the standard
deviations.
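A short Python sketch of this modification: once each series of fluctuations is expressed in units of its own standard deviation, the coefficient is simply the sum of the paired products divided by the number of pairs. The two series are hypothetical and are assumed to be deviations whose mean is already zero; the function name is illustrative.

```python
from math import sqrt

def in_standard_units(values):
    """Divide each fluctuation by the standard deviation of its series.

    Assumes the fluctuations are measured from the mean (the mean is
    subtracted here as a safeguard; for the data below it is zero).
    """
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Hypothetical cyclical fluctuations of two series.
cycles_a = [4, -2, -5, 1, 3, -1]
cycles_b = [3, -1, -4, 2, 2, -2]

xs = in_standard_units(cycles_a)
ys = in_standard_units(cycles_b)
n = len(xs)
r = sum(x * y for x, y in zip(xs, ys)) / n   # r = (sum of xy) / n

# The same figure from the ordinary formula, for comparison.
r_check = sum(a * b for a, b in zip(cycles_a, cycles_b)) / sqrt(
    sum(a * a for a in cycles_a) * sum(b * b for b in cycles_b)
)
```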
In actual practice instead of cyclical fluctuations the percentages
of cyclical fluctuations are used in the calculation of coefficient of
correlation. Cycle percents can be easily calculated by the following
procedure:
(1) Represent the original series by percentages based on trend
values. It means that the value against a particular year should be
divided by the corresponding trend value and multiplied by 100. These
are percentages of trends.
(2) Compute the seasonal variation indices by the methods
explained in the last chapter.
(3) From the percentages of the trend subtract the corresponding
seasonal variation percentages. The resulting figure would give the
cyclical variations.
$n = 11$, $\Sigma xy = +5.53$
Coefficient of correlation, or
$$r = \frac{\Sigma xy}{n} = \frac{5.53}{11} = +.5027$$
Age of Husbands (x-series)
Age in years        Number of Husbands
20-30                  5
30-40                 20
40-50                 44
50-60                 24
60-70                  7
Total                100

Age of Wives (y-series)
Age in years        Number of Wives
15-25                 17
25-35                 37
35-45                 15
45-55                 25
55-65                  6
Total                100

Age of wives              Age of husbands (years)
(years)       20-30   30-40   40-50   50-60   60-70   Total
15-25           5       9       3      ...     ...      17
25-35          ...     10      25       2      ...      37
35-45          ...      1      12       2      ...      15
45-55          ...     ...      4      16       5       25
55-65          ...     ...     ...      4       2        6
Total           5      20      44      24       7      100
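= 45.8 years
Standard deviation of x-series, or
$$\sigma_1 = \sqrt{\frac{92}{100} - \left(\frac{8}{100}\right)^2} \times 10 = \sqrt{.92 - .0064} \times 10 = 9.55 \text{ years}$$
$$a_2 = x_2 + \left(\frac{\Sigma f dy}{n} \times i\right) = 40 + \left(\frac{-34}{100} \times 10\right) = 36.6 \text{ years}$$
$$\sigma_2 = \sqrt{\frac{154}{100} - \left(\frac{-34}{100}\right)^2} \times 10$$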
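Products of deviations and frequencies
(in each cell the product f × dx × dy is shown first, followed by the frequency in brackets)

Age of wives            Age of husbands (x-series)
(y-series)              20-30      30-40      40-50      50-60      60-70
              dx →       -20        -10         0         +10        +20       Total
15-25  dy = -20        2000 (5)   1800 (9)     0 (3)       ...        ...       3800
25-35  dy = -10           ...     1000 (10)    0 (25)    -200 (2)     ...        800
35-45  dy =   0           ...        0 (1)     0 (12)       0 (2)     ...          0
45-55  dy = +10           ...       ...        0 (4)     1600 (16)  1000 (5)    2600
55-65  dy = +20           ...       ...       ...         800 (4)    800 (2)    1600
Total                    2000       2800        0         2200       1800       8800 (Σxy)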
In the above table the figures given at the left top of various
cells indicate the products of the deviations of x and y series from
their assumed averages and the corresponding frequencies. Thus,
in x-series, the deviation of the mid-point of 20-30 group from the
assumed average (of 45) is −20, and in y-series the figure of deviation
of the mid-point of 15-25 group from the assumed average (of 40) is
also −20. The number of items with these values is 5. The product
of these deviations and the frequency is (−20 × −20 × 5) or 2000,
which is written at the left top corner in the relevant cell. The total
of all such products comes to 8800. This is the value of $\Sigma xy$ from
assumed averages.
The coefficient of correlation between the ages of husbands and
wives can now be calculated by using the following formula:
$$r = \frac{\Sigma xy - n(a_1 - x_1)(a_2 - x_2)}{n \times \sigma_1 \times \sigma_2}$$
$$r = \frac{8800 - 100(45.8 - 45)(36.6 - 40)}{100 \times 9.55 \times 11.92}$$
$$= +.79$$
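The same calculation for grouped data can be sketched in Python. The cell frequencies below are an assumed reading of the husbands-and-wives table, placed so that they agree with the row and column totals and with Σxy = 8800; the variable names are illustrative.

```python
from math import sqrt

x_mid = [25, 35, 45, 55, 65]   # mid-points, ages of husbands
y_mid = [20, 30, 40, 50, 60]   # mid-points, ages of wives
# freq[i][j]: number of couples with wife in class i and husband in class j
# (assumed placement, consistent with the marginal totals).
freq = [
    [5, 9, 3, 0, 0],
    [0, 10, 25, 2, 0],
    [0, 1, 12, 2, 0],
    [0, 0, 4, 16, 5],
    [0, 0, 0, 4, 2],
]
assumed_x, assumed_y = 45, 40  # assumed averages

n = sum(sum(row) for row in freq)
sum_fdx = sum_fdy = sum_fdx2 = sum_fdy2 = sum_fdxdy = 0
for i, row in enumerate(freq):
    for j, f in enumerate(row):
        dx = x_mid[j] - assumed_x
        dy = y_mid[i] - assumed_y
        sum_fdx += f * dx
        sum_fdy += f * dy
        sum_fdx2 += f * dx * dx
        sum_fdy2 += f * dy * dy
        sum_fdxdy += f * dx * dy

actual_x = assumed_x + sum_fdx / n
actual_y = assumed_y + sum_fdy / n
sigma_1 = sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2)
sigma_2 = sqrt(sum_fdy2 / n - (sum_fdy / n) ** 2)
r = (sum_fdxdy - n * (actual_x - assumed_x) * (actual_y - assumed_y)) / (
    n * sigma_1 * sigma_2
)
```

With these frequencies the sketch gives averages of 45.8 and 36.6 years and r of about .795, which the text reports as +.79.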
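Ages of husbands (columns) and ages of wives (rows); deviations in steps of 10 years

Ages of     Mid-     dy      dy in     Age-group  20-30  30-40  40-50  50-60  60-70
wives       points           steps     Mid-points   25     35     45     55     65
                                       dx          -20    -10     0     +10    +20   Total   f dy   f dy²   f dx dy
15-25        20      -20      -2                     5      9      3    ...    ...     17    -34     68       38
25-35        30      -10      -1                   ...     10     25      2    ...     37    -37     37        8
35-45        40        0       0                   ...      1     12      2    ...     15      0      0        0
45-55        50      +10      +1                   ...    ...      4     16      5     25    +25     25       26
55-65        60      +20      +2                   ...    ...    ...      4      2      6    +12     24       16
Total                                                5     20     44     24      7    100    -34    154       88
f dx                                               -10    -20      0    +24    +14     +8
f dx²                                               20     20      0     24     28     92
f dx dy                                             20     28      0     22     18     88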
$$r = \frac{(88 \times 100) - (8 \times -34)}{\sqrt{92 \times 100 - (8)^2}\,\sqrt{154 \times 100 - (-34)^2}}$$
$$= \frac{8800 + 272}{\sqrt{9200 - 64}\,\sqrt{15400 - 1156}}$$
$$r = \frac{9072}{\sqrt{9136 \times 14244}}$$
$$r = \text{Antilog}\,[\log 9072 - \tfrac{1}{2}(\log 9136 + \log 14244)]$$
r=Anti log. [3.9 5 76-h(3·9 60 9+4.15 23)J
1
give the lower limit, and if it is added, it would give the upper limit,
within which the coefficient of correlation can be expected to vary.
Thus, in the example solved above, relating to the ages of husbands
and wives, the value of the probable error would be,
$$S_y = \sqrt{\frac{\Sigma d^2}{n}}$$
ship is established between the two measures which is also helpful
in the calculation of coefficient of correlation. Such a relationship
is obtained by dividing the standard error of the estimates by
the standard deviation of the y series, or in other words, such a
relationship is expressed by finding out the value of $\dfrac{S_y}{\sigma_y}$. This is usually
called measure of correlation. A better measure of correlation is obtained
by finding out the coefficient of correlation which is obtained by the
following formula:
Coefficient of correlation, or
$$r = \sqrt{1 - \frac{S_y^2}{\sigma_y^2}}$$
Substituting the values in the above two equations we get:
$$1010 = 5a + 15b \quad \dots (i)$$
$$3370 = 15a + 55b \quad \dots (ii)$$
If these two equations are solved simultaneously the value of a
would be 100 and the value of b 34, and the equation of the line of the
best fit would be
$$y = 100 + 34x$$
On the basis of this equation the computed values of y have been
given in column No. 3 of the following table.
In order to calculate the value of $S_y$ we shall have to find out the
difference between the original and computed values of y and we shall
have to obtain the square of these deviations. For calculating the value
of $\sigma_y$ we shall have to find out the deviations of the original values of y
from the arithmetic average of the series and we shall have to obtain the
square of these deviations also. This has been done in the following
table:
Calculation of the values of $S_y$ and $\sigma_y$
                                   Difference between       Deviations of original
                    Computed       original and             values of y from the
                    values of y    computed values          mean of the series (202)
(x)      (y)        (yc)           (d)         (d²)         (dy)         (dy²)
1        166        134             32         1024          -36          1296
2        184        168             16          256          -18           324
3        142        202            -60         3600          -60          3600
4        180        236            -56         3136          -22           484
5        338        270             68         4624          136         18496
15      1010       1010            ...        12640          ...         24200
Standard error of the estimate, or
$$S_y = \sqrt{\frac{\Sigma d^2}{n}} = \sqrt{\frac{12640}{5}} = \sqrt{2528} = 50.2$$
Standard deviation of y series, or
$$\sigma_y = \sqrt{\frac{\Sigma dy^2}{n}} = \sqrt{\frac{24200}{5}} = \sqrt{4840}$$
Coefficient of correlation
$$r = \sqrt{1 - \frac{S_y^2}{\sigma_y^2}} = \sqrt{1 - \frac{2528}{4840}} = +.69$$
The coefficient of correlation calculated by Karl Pearson's
formula or by the Product Moment formula would also be +.69.
This can be verified as follows:
According to Karl Pearson's formula
$$r = \frac{\Sigma xy}{n \times \sigma_x \times \sigma_y}$$
The value of $\Sigma xy$ if calculated would be 340 and of $\sigma_x$ 1.4.
Thus
$$r = \frac{340}{5 \times 1.4 \times 69.6} = +.69$$
It can easily be proved that
$$S_y = \sigma_y \sqrt{1 - r^2}$$
In the above example
$$\sigma_y \sqrt{1 - r^2} = 69.6 \sqrt{1 - .48} = 69.6 \sqrt{.52} = 69.6 \times .72 = 50.2$$
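The least-squares example can be re-worked in Python as a check. The sketch below uses the five observations and the fitted line y = 100 + 34x from the example; the x values 1 to 5 are read from the table, and the variable names are illustrative.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]
ys = [166, 184, 142, 180, 338]
a, b = 100, 34                           # line of the best fit: y = a + b*x

computed = [a + b * x for x in xs]       # computed values of y
n = len(ys)
mean_y = sum(ys) / n

# Standard error of the estimate: spread of y about the fitted line.
s_y = sqrt(sum((y - yc) ** 2 for y, yc in zip(ys, computed)) / n)
# Standard deviation of y: spread of y about its own mean.
sigma_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)

r = sqrt(1 - s_y ** 2 / sigma_y ** 2)    # magnitude of r; sign follows the slope b
s_y_back = sigma_y * sqrt(1 - r ** 2)    # recovers the standard error
```

The square root gives only the magnitude of r; since the slope b is positive here, the coefficient is taken as positive.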
Short-cut Method. The coefficient of correlation by the Method
of Least Squares can also be obtained directly without calculating the
values of $S_y$ and $\sigma_y$. For this, besides the computed values of y the
value of $\Sigma y^2$ has also to be obtained. In the above example the values
of $y^2$ would be as follows:
Original values of y:   166   184   142   180   338   Total 1,010
= +.69
It will be observed that in this method there is no need of finding
out the values of $d$, $d^2$, $dy$ and $dy^2$. The table which is prepared for
obtaining the computed values of y is enough for finding out the coefficient
of correlation. Only the values of ~(yS) and ry have to be calculated.
After assigning ranks to the various items of both the series the
differences of corresponding rank values are calculated. To calculate
the coefficient of correlation the following formula is used:
$$r = 1 - \frac{6(\Sigma d^2)}{n(n^2 - 1)}$$
or
$$r = 1 - \frac{6(\Sigma d^2)}{n^3 - n}$$
Where r stands for the coefficient of correlation, $\Sigma d^2$ for the total
of the squares of the differences of corresponding ranks, and n for the
number of pairs of observations. The following example would
illustrate the above formula:
Example 6. Calculate the coefficient of rank correlation from the
following data:
= .82
Sometimes where there is more than one item with the same value
a common rank is given to such items. This rank, as has been said
earlier, is the average of the ranks which these items would have got had
they differed slightly from each other. When this is done, the coefficient
of rank correlation needs some correction, because the above formula is
based on the supposition that the ranks of various items are different and
that no rank is given to more than one item.
If in a series there are m items whose ranks are common, then for
correction of the coefficient of rank correlation $\frac{1}{12}(m^3 - m)$ is added to
the value of $\Sigma d^2$. If there are more than one such groups of items
with common rank, this value is added as many times as the number of
such groups. This procedure is clarified in the following example:
Calculation of Coefficient of Rank Correlation
  x       Rank       y       Rank      Difference        d²
                                       of ranks (d)
 48        3        13       5.5         -2.5           6.25
 33        5        13       5.5         -0.5            .25
 40        4        24       1           +3.0           9.00
  9       10         6       8.5         +1.5           2.25
 16        8        15       4           +4.0          16.00
 16        8         4      10           -2.0           4.00
 65        1        20       2           -1.0           1.00
 24        6         9       7           -1.0           1.00
 16        8         6       8.5         -0.5            .25
 57        2        19       3           -1.0           1.00
n = 10                                     0           41.00
In the above table in x-series the figure 16 occurs three times. The
rank of all these items is 8 which is the average of 7, 8 and 9, the ranks
which these items would have got had there been some difference
between their values. In y-series figures 13 and 6 both occur two times.
Their ranks are respectively 5.5 and 8.5. Due to these common ranks
the coefficient of rank correlation would have to be corrected.
= +.73
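The correction for common ranks can be sketched in Python. The ten pairs of values are an assumed reading of the table in this example (the ranks and the total of d² agree with the figures shown), and the function names are illustrative rather than taken from the text.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; equal values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Add (m**3 - m) / 12 for every group of m items with a common rank."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def rank_correlation(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    sum_d2 += tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Assumed reading of the example data.
xs = [48, 33, 40, 9, 16, 16, 65, 24, 16, 57]
ys = [13, 13, 24, 6, 15, 4, 20, 9, 6, 19]

uncorrected_d2 = sum(
    (p - q) ** 2 for p, q in zip(average_ranks(xs), average_ranks(ys))
)
r = rank_correlation(xs, ys)
```

Here the uncorrected total of d² is 41; the three common ranks in x add 2 and the two pairs in y add .5 each, giving 44 and a coefficient of about +.73.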
Coefficient of concurrent deviations
Sometimes it is desired to study the correlation between two
series in a very casual manner, and in such cases no particular attention
is needed so far as precision is concerned. In such cases it is enough to
calculate the coefficient of concurrent deviations. In this method correlation
is calculated between the direction of deviations, not their magnitudes.
As such only the direction of deviations is taken into account in the
calculation of this coefficient, and their magnitude is ignored.
It has already been said earlier that if the short time fluctuations of
two time series are positively correlated or in other words if their
deviations are concurrent, their curves would move in the same direction
and would indicate positive correlation between them. Coefficient of
concurrent deviations is calculated on this very principle and ordinarily
it indicates the relationship between short time fluctuations only.
To calculate the coefficient of concurrent deviations, the deviations
are not calculated from any average or by the method of moving
averages but only their direction from the previous period is noted
down. The formula for the calculation of coefficient of concurrent
deviations is given below:
Coefficient of concurrent deviations, or
$$r = \pm\sqrt{\pm\frac{2c - n}{n}}$$
where r represents the coefficient of correlation,
We get,
$$r = \pm\sqrt{\pm(-.6364)} = -\sqrt{.6364} = -.798$$
Thus, the coefficient of correlation between the output of steel and
the number of unemployed persons in steel industry is −.798, which
indicates that there is a high degree of inverse correlation between the
two.
If there are concurrent deviations between two series (whether
positive or negative) $xy$ would always be plus. The value of c is equal
to the number of times the deviations are concurrent. In the above
series there are only two concurrent deviations, which means that only
two times the movement of the two series has been in the same direction
and that is why there is a high degree of negative correlation between
them.
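A Python sketch of the coefficient of concurrent deviations. It assumes the usual form of the formula, r = ±√(±(2c − n)/n), in which n is the number of pairs of deviations and the sign of (2c − n) is carried outside the root; periods with no change are treated as not concurrent, which is a simplifying assumption. The two short series are hypothetical.

```python
from math import sqrt

def directions(series):
    """+1, -1 or 0 for each period, compared with the previous period."""
    return [(cur > prev) - (cur < prev) for prev, cur in zip(series, series[1:])]

def coefficient_from_counts(c, n):
    """c = number of concurrent deviations, n = number of pairs of deviations."""
    inner = (2 * c - n) / n
    # The sign of (2c - n) is placed both inside and outside the root.
    return sqrt(inner) if inner >= 0 else -sqrt(-inner)

def concurrent_deviation_coefficient(xs, ys):
    dx, dy = directions(xs), directions(ys)
    c = sum(1 for p, q in zip(dx, dy) if p * q > 0)   # same direction
    return coefficient_from_counts(c, len(dx))

# Figures implied by the worked example: two concurrent deviations out of eleven.
r_example = coefficient_from_counts(2, 11)

# Hypothetical series that always move in opposite directions.
r_opposite = concurrent_deviation_coefficient([10, 12, 11, 14, 15], [5, 4, 6, 3, 2])
```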
Correlation table
Just as in case of scatter diagrams we plot the values of x and y
variables on the graph and study their trend lines, in the same way,
conclusions can be drawn about the relationship between two variables
which are presented in the shape of continuous series in a two-way or
correlation table. In a correlation table the number of items which
assume particular values of x and y variables are entered in the relevant
cells. Whenever correlation between grouped series has to be studied
a correlation table is necessary. The following table gives the figures
of production of pig iron and the figures of industrial production:
. Indices of Pi~ Iron Production
Total
I I I I I I i
~a
a
'
c:I \
-
1}.::>-130
110-120
[Oo-tIO
I
I
I
I
I
I
I
I I 1 () I ~4 I
11115 1 1 0 1
! n
I I
I
.p
-g.S!
... u
I 9°--100 1 I I 1 31 H I I I I
'0 .g 90 (
I!o-- I 2. I 2.4 i \ I
VJ e 70- roo I I I 7 I l I I
C)
~p.. 60- 70 I I 2. \ I I \ i
oS 50 - 60 ! () I z! I ! 1 I I
Total 6 I 4 I 10 I 2.9 i 41 I 511 I 40 I rt> I ~04
In the above table the figures are in a diagonal band almost in the
same style as the points are in a scatter diagram when there is positive
correlation between two variables. If in a correlation table the values
of x (or y) variable given horizontally are in ascending order and the
values of y (or x) variable given vertically in descending order from the
top, and if in such a table the frequencies are in a diagonal band rising
from left bottom towards the right top, there is an indication of a high
degree of positive correlation between the two series. In the above
table indices of pig iron production shown horizontally are in ascending
order and the indices of industrial production shown vertically in
descending order from the top. The frequencies are in a diagonal band
rising from left bottom to right top. This indicates that there is a high
degree of positive correlation between the two series. If in such a table
the frequencies are in a diagonal band which falls from left top to right
bottom it indicates a high degree of negative correlation. If there is
no trend in the frequencies either upward or downward it is an indication
of absence of correlation or of a very low degree of correlation
between two series. It should be remembered that if the values on the
vertical scale are arranged in ascending order from the top, the conclusion
with regard to the direction of correlation would be reverse. In
such cases a diagonal band rising from left bottom to right top would
indicate negative correlation and a diagonal band sloping from left top
to right bottom would indicate positive correlation.
Lag and lead:
When there is cause and effect relationship between two series it
is not unlikely that there is a time lag between the changes in the values
of the subject and the relative. If, for example, it is established that a
rise in prices is accompanied by an increase in supply it is quite possible
that the change in the supply may take place three months or six months
after the changes in prices. The difference in the period of change in the
values of the subject and the relative is called time lag. If there is a
time lag of, say, one year between the price changes and the change in
the supply, it is essential that in the study of correlation the values of
prices should be paired with those values of supply series which are
obtained after one year from the change in price series. Thus the prices of
the year 1953 should be paired with the supplies of 1954 and the prices
of 1954 with the supply of 1955 and so on. The values of the relative
should always be lagged in such a way that they can be compared with
the values of the subject. The underlying principle of allowing for
the lagging effect is that the values paired should be such which would
give the highest value of the resulting coefficient of correlation. The
period of lag can be estimated by plotting the two series on a graph paper
and studying the period of peaks and troughs in the data. If there is a
lag of one year between the series of the price and supply the price curve
will lead by one year and the supply curve would lag by this period.
The peaks in the price curve would be one year earlier than the peaks
in the supply curve and similarly the troughs in the price curve would
be one year earlier than the troughs in the supply curve.
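The principle of pairing values so as to obtain the highest coefficient can be sketched in Python. Both series are hypothetical: the supply figures are constructed to follow the price figures one period later, and the function names are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def best_lag(subject, relative, max_lag):
    """Pair subject[t] with relative[t + lag] and return the lag giving the highest r."""
    results = {}
    for lag in range(max_lag + 1):
        xs = subject[: len(subject) - lag]   # drop the last `lag` values of the subject
        ys = relative[lag:]                  # drop the first `lag` values of the relative
        results[lag] = pearson_r(xs, ys)
    best = max(results, key=results.get)
    return best, results

# Hypothetical yearly figures; supply responds to price one year later.
prices = [30, 34, 40, 37, 33, 36, 42, 45, 41, 38]
supply = [20, 21, 25, 31, 28, 24, 27, 33, 36, 32]

lag, by_lag = best_lag(prices, supply, max_lag=3)
```

In practice the lag suggested by such a search should still be checked against the peaks and troughs of the plotted curves, as described above.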
Questions
1. What is meant by correlation? Does it always signify cause and effect
relationship between two variables? (M. Com., Rajputana).
2. Explain the meaning and significance of the concept of correlation. How
will you calculate it from a statistical point of view? (M. Com., Agra).
3. What are the special characteristics of Karl Pearson's coefficient of correlation?
What are the assumptions which this formula is based on?
4. How would you calculate the coefficient of correlation by the method of least
squares? Explain the underlying assumptions of such a correlation.
5. What is meant by correlation? Give the general rules for interpreting its
coefficient. (M. Com., Alld.).
6. What is correlation? Explain how you would study correlation by:
(a) Graphs
(b) Correlation Table
(c) Karl Pearson's coefficient of correlation.
7. Discuss the problems involved in correlation analysis in the case of time series
and state how they can be solved. (M. A., Allahabad, 1950).
8. Write short notes on:
(a) Positive and negative correlation.
(b) Line of the best fit.
(c) Lag and Lead.
(d) Correlation table.
(e) Coefficient of determination.
9. Prove that Karl Pearson's coefficient of correlation cannot exceed ±1.
10. How would you calculate coefficient of correlation between:
(a) Long time changes.
(b) Short time changes.
(c) Cyclical fluctuations of two series.
n. Compute the coefficient of correlation from the following data:-
1100-1000--900--400 1200 1400--600--1000
-3 600 35 00 %400 UOD-3600-UOO 1800 3000
n. The following data give the Index numbers of industrial preductlo!.
of Great Britain and the number of registered unemployed persona in the ..m~
country during the year 1914--31:-
Industries Numberoftegbtercd
Year Production UnemploJc~
(Index Numbet) (Hundred thouunds)
[924 XOO 1I·5
191' 10% IZ.O
19z6 104 14.0
1927 107 II.I
1928 105 U.,
1929 XI2 IZ.2
1930 103 19.1
193 1 94 z6.4
Calculate the coefficient of correlation between production and. the number of.
unemployed_ (B. eli,..• L.chN., 1944)
"";3. The following table give. the value of expOttl of raw cotton from India
and the value of the Imports of manufactured ,cotton goods Into India during the
yean %913-14 to 1931-3%:- "
(IncfOxesof .cupeei) Imports of manufactured
Year Exports of Raw Cotton Cotton Good.
[915--[4 4Z ,6
19 1 7--18 4~ 49
[9 1 9--20 S. SJ
1921--12 ss Sa
1925--24 89 6,
1929--30 9i 76
['131-32 66 Sl
Calculate the coefficient of correlatioh between the yalue of the expottl of raw
cotton and the value of the imports of cotton manufactured goods. (M. A •• Clllndlil
, 1937)' (B. Co,. •• N"IPIir. 1944).
14. Calculate the coetJi.cient of correlation between the COlt of living and the
weekly wage !lit" from the following data : -
Year Coatof living Index Index of weekly waae
Rate.
1920 1,1 In
192 1: no 120
1922 102 99
1925 101 9'
1924 103 101
192 5
192 6
100
100
'.1
10Z
192 7 96 100
19 21 9' 99
1:9%9
1930
9'
87
99
9'
1931 84 96
193 2 81
(M. A.•• ...J:W-,
1937).
1,. Calculate the coefficient of . correlation
given below : -
between the vaIuCl of!JC and"
!JC
78
19
u,
"
IH
97 1,6
69 :rn
'9
79
107
136
68 u3
61 108
(You may ule 69.1 working mean for:lt and that for.1)
liZ . .
(M. :A... Delbi. 19")'
16. Calculate the coefficient of correlation betweep infant mortality and
o'lCrerowding from the following datal-
i 109!
In£mt mortality
Percentage of
population ovcrcrwded
!
-1---1--
I
. 14.9· 6., ,.8
!
96
---- ---I
IU I 14Z
I
1,1 114 uS IOZ 109 1,6/ 122
12. Calculate the' coefficient of correlation between the ages of 100 mothers and
daughterS from the following data : -
eOf
moAtfera in
years S-IO 10-1,
Age of daughters
-----
1,-10
-10--1,
- - ----
----- - - - - ---- ----- -----
in years
1S-~0
T:;-I
Is-a, 6 3 ... ... ... 9
1'-35 ~ 16 10 ... ... 29
55-4' ... 10 IS, 7 ... 32:
......
II ,
4'-H ... 10 4 21
SS-6, ... ...7 4 9
Total 9 29 ,2 21 9 100
a,. The',following table gives the number of students IlavJng different heights
and ,weights.
Heights in
Weight in pounds
Total
-
inches
8Q--90 90--100 lOO-IIO nO-I 20 Uo--I30
,cr.--:-H I 3,. 7 , 2 18
'SS-6o 2 4 10 ...-'7 4 1,
60-6, I , 12 10 7
-----
, - "
-
6'--71
Total
...
4 If
Do you find any relation between height and weight ?
8
57 I 28
6 3
16 I
20
100
i .'
200-2 50
2,0-3 00
300- , , 0
4
3
2
,
4
6
2
4
8
'"
I,
,
2
,
II
14
21
,,0-400 I 4 6 10 al
Total 10 ,19 20 I 18 67
Is there any relationship betwecl!.1-ge and intelligence?
CB. (:0111 •• Agra. 1942).
2,. Calculate from the data reproduced below pertaining to 66 selected village
in Meerut District. the value of r. between "total cultivable area" and "the are
under wheat."
Total Cultivable Area' (in Bighas)
Total
'"
----
l4
---~
..
29
'
II
I
e - . -_ _ _
8
-----
4 -
2
-----
66
3
1.8. Ptom the following table calculate the cocSicient of correlation between the
llg e. of fathers and S008 and eathnllte the probable euon of the tcsult obtained.
Age of Father. Age of SonI
Yean
-- %
1-'_- - - - - - - - - - - - - - - -
6 10 14 18 Z% %6 50 Total
S,-60 1 3 6 14
--- --
,0-"
----- -
8 10
-- 6
---
%
--' _._
1.6
~
4,-,0 13 8 4 2 27
- --- - - r - - -- --- --- - -----
+0-45 14 18
'S-40
--- ------
IS 20 8
3
------- "
--------
43
--- ---
30-35 6 12.%5 x6 59
2S-~0 IS 26 :0 I 6:
20-2, U- 10
--- ---
2-
<
34
Total 4~ 48
---
62 H 47 1.7 H ~ ,,00
29. Calculate the coefficient of conelation &om the following data by tbe
method of rank dUfercnce.
- n. 18. 95. 70 • 60. So. 81.. SO.
_7-J20. I,... 1,0. II" IIO. 140. 14.2., 100
~o. Calculate the coeBicient of rank cortclation of tbe ~lloWing data : -
_87, 22, ~3, 75, '37•
.)'-29. 63, S2, 46. 48.
31. Calculate the rank ooeIJiclent of correlation of the following data :-
~-80. 78, n. 75. 68. 67, 60, 59·
_7--I:r.. 13. 14, 14. 14. 16. 15. 17.
J2. The competiton m a beauty contest llre ranked by three Judges in the fonow-
ing order : -
~6. Compute the coef£lcient of correlation of the short time oscillations from toe
following ignoring decimals.
~7. ,The following table gives the wholesale price ihdex 'number8 for'Calcutta
and Karachi for the period 19:'7-41 : -
Year Calcutta Index Nos. Karachi Index Nos.
(base: July, 1914) (baae: July, 1914)
192.7 148 137
192.8 145 137
192.9 141 133
193 0 116 108
19;1 96 9~
1932. 91 99
19H 87 97
1934 89 96
19;5 91 99
193 6 91 10Z
1937 10Z 108
193 8 95 104
19'9 108 108
1940 IlIO u6
1941 139 lZ0
calculate the coefficient of correlation of the short time oscillation between the
above two l~dicC8 taking five-yearly moving average and ignoring decimals.
1940 145 uS
[94 1 1S3 2.0S
1942 186 19 6
194' 202. 177
2.07 168
1944
194~ 2.0 4 177
1946 19 8 170
1947 zoo 16,
1948 208 170
1949 23 2 17'
228 180
19~o
7 104 108
August 14 10 5 108
21 10.3 109
2.8 102. 10 9
I
7 107 108
Scpo 14 , 106 107
I 108
ZI lOS
t I 2.8 1I0 IIO
.'
Oct.
I
14
7
I
II,
,109
II 2.
107
2.1 U2 108
2.8 10 7
- 108
Number of
seeds per pod
Numbetof
6 I I
L:J----
7 8
0
9
28
~
10
10
II
2
Total
147
pods 7 7
Length of
pod In em.
I
---
Number of
poda
Total number of seeds
In each category
S 6 46
6 33 2~8
7 H
" 4~7
8 57 51 4
9 13 113
I
10 2 19
II 1 9
-,.--
Total 147 1,176
Calculate the co-efficient of correlatIon and Its probable error. You may use the
following results;-
L~ngth of rod
if! em.
Number 0 seeds pet pod
Means Variance
1. 18 79
0·9"4
(loA. S .. 19S 2).
41. How do you find a coefficient of correlation when only ranks are known?
Persons in the income group Rs. 20,000 to Rs. 25,000 were asked to supply
30
before a specified date Returns of their annual income for some purpose connected with
taxation. But when the date arrived, only 20 Returns of total income as noted below
had been received in the following order.
20.690 %.1.720 2.4,010 2.0,090 20,940 &1,5 10
20.340 220,42.0 2Z,180 2.2,600 2;2,94 0 23.080
u,840 23.,10 Z4.260 2;3.740 24,7 20 23.310
21,3 00 24,Ho
Do the data confirm the belief that persons with bigger incomes delay
submission of Returns more than the others? What is the measure of confidence you can
associate with your answer? You may assume that the square of the test criterion
t is equal to $(n-2)\rho^2/(1-\rho^2)$, where $\rho$ is the co-efficient of rank correlation and n the
number of observations on which the coefficient is based. (I. A. S., 1953).
42. What is rank correlation and what is the purpose for which it is used?
Obtain the formula for the Spearman coefficient of rank correlation
$$\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$$
where n is the number of individuals ranked and d is the difference in the ranks
assigned to the same individual.
Twelve pictures submitted in a competition were ranked by two judges with
results shown in the table below:
TABLE
Picture
lUnk assigned by
1\
5
B
9
c
D~E FG: H-:__2_,-~ rL-
7
I
3 4 12 2 II 10 8
first Judge
--------+--1--1--,-- ---ro--- _--
Rank assigned by 5 8 9 II I 2 10 4 IZ 7 6
second Judge
Calculate $\rho$. Is there a lack of independence in these rankings?
(Assume that on the hypothesis of independence of two sets of n rankings,
$t = \rho\sqrt{\dfrac{n-2}{1-\rho^2}}$ follows the t distribution with n−2 degrees of freedom).
Extracts from Statistical Tables
Table I: The normal distribution
The area under the standard normal curve between x=0 and x=0.35 is 0.1368
and between x=0 and x=1.15 is 0.3746.
Table II: The Chi-square distribution
Degrees of freedom                                       1      2      3      4
Values of χ² significant at 5 per cent level
of probability                                           3.84   5.99   7.81   9.49
Table III: The t distribution
Degrees of freedom                                       10     11     12
Values of t significant at 5 per cent level
of probability                                           2.23   2.20   2.18
(I. A. S., 1956).
43. Following is a table showing height-weight frequency distribution.
Compute Karl Pearson's Coefficient of Correlation:
~-H~V~a-t~ia~b~le~(b~e~ig~b~t-.~In~ch~CI~)-----·--------
- - ' _ ' - - - ; ; ; - , -f - - - - ' - -
YVa.tiable ~o 6~ 64 66 68 70 72. 74
(weight, pounds) J~~
---I - J ,
~'o '~ I
-, %.2.0
no 5
4
1
4
~
3
1
2.00 2.
---
1 ,
f--
I 1
9 7
f..--
19{' I 3 8 16 ; S
do I S 8 IS I2. I
-
" 170 2. 8 18 2.6 8 I
160 19 40 20 ..
ISO S
1--,
1, 2.6 9 2.
140 I 4 6 S I
1,0 I ; I r
12.0
I--
t t
- ---I - -
44. The data of the table given below are from a paper on "Wool in the World
Economy." Calculate coefficient of correlation.
45. From the following data find out if there is any relationship between density
of population and death rate.
a b
x y x y
100 400 21 58
200 600 17 56
300 700 u 64
4QO SOO 23 66
500 100 20 62
600 300 19 54
700 200 t8 60
50. Calculate Co-efficient of Correlation from the following data by (i) Karl
Pearson's Method (ii) Spearman's Ranking Method and (iii) Concurrent Deviation
Method.
51. Mention the rules for interpretation of Karl Pearson's Co-efficient of
correlation. What is the significance of the co-efficient of correlation, r, for the following
values based on the numbers of observations (a) 50 and (b) 500: r = .2, .4, .9.
52. Given: Number of pairs of observations of x and y series = 15
Arithmetic average of x series
Standard Deviation of x series
Arithmetic average of y series
Standard Deviation of y series
Sum of products of deviations of x and y series = 122
Find out (a) Co-efficient of Correlation between x and y, (b) the probable error
of the Co-efficient. (M. A. Raj., 19( 3).
H. Given: r = .56, };xy = 60, oy ,,;, y. ~x2 = 90.
54. A student calculates the value of r as +.7 when the value of N is 5, and
concludes that r is highly significant. Is he correct?
56. The following table gives the results of two different intelligence tests. Find
the co-efficient of correlation between them.
48
100
30
80
12
Regular PI*yers 15 0 90
(T. D. C., II._".., Raj., 1962).
Regression and Ratio of
Variation 17
Meaning and use. It has been discussed in earlier chapters that
when a line of the best fit is obtained for data, it gives the best possible
mean values of y for a given value of x (when x is the independent
variable and y the dependent variable). It is also possible to fit another
straight line to the data to obtain the best possible mean values of x for
given values of y (assuming y as the independent variable and x as the
dependent variable). If two such lines are plotted on graph paper (one
showing the best possible values of y for given values of x and
the other showing the best possible values of x for given values of
y), a study of correlation can be made from them. If there is perfect
correlation between the two series both the lines would coincide, or in
other words, there would be only one line which would give the best
possible mean values of y for given values of x and the best possible
mean values of x for given values of y. The farther the lines are from
each other the lesser is the correlation between the two series. If they
cut each other at right angles there is no correlation between them.
These lines are called regression lines.
We have seen in the chapter on analysis of time series that the
line of the best fit describes the change in a given series accompanying
a unit change in time. The regression lines describe the average
relationship between the two series. In fact there is no difference between
the lines of the best fit and the regression lines, though the term "line of
the best fit" is generally used when the x-series relates to time and the y-series
to the values of a variable. If both the x and y series are variables the lines
of the best fit are known as lines of regression. The equations describing
the regression lines are called regression equations. We shall discuss
later on how regression equations are obtained. The use of
these terms dates back to the time of early studies made by Francis
Galton. He made studies relating to the heights of fathers and sons and
found that the deviations in the mean heights of the sons from the mean
height of the race were less than the deviations in the mean height of the
fathers from the mean height of the race. When the fathers were above
the mean or below the mean the sons tended to go back or regress
towards the mean. Regression thus implies going back or returning.
Galton studied the average relationship between these two variables
graphically and called the line describing this relationship the line of
regression. Since then these terms are in common use. Regression
lines thus study the average relationship between two variables. They
throw light on the correlation between two series. If the coefficient
of correlation between the heights of fathers and sons is +.7 it means
that if a group of fathers have heights which are more than average by
x inches their sons would have heights which would be more than
average by .7x inches (provided the standard deviations of the two series
are equal). This going back to the mean or average is called
regression.
Regression equations
Regression equations are algebraic expressions of the regression
lines. Since there are two regression lines there are two regression
equations. The regression line of x on y gives the best possible mean
values of x for given values of y and similarly the regression line of y
on x gives the best possible mean values of y for given values of x. As
such, the regression equation of x on y would be used to describe the
variation in the values of x for given changes in the values of y, and
similarly the regression equation of y on x would be used to
describe the variation in the values of y for given changes in the values
of x. If the variations are studied from the respective means of the two
series the regression equations would be calculated as follows:
Regression equation of x on y

(i) $x = a + by$

The above equation is of the same type as we studied in earlier
chapters in connection with the line of the best fit by the method of least
squares. This equation can also be written in terms of the coefficient
of correlation, standard deviations and the means of the two series.
Thus

(ii) $x - \bar{x} = r \dfrac{\sigma_x}{\sigma_y} (y - \bar{y})$

where $\bar{x}$ and $\bar{y}$ stand respectively for the mean values of the x and y
series, $\sigma_x$ and $\sigma_y$ for their standard deviations and r for the coefficient
of correlation.

Regression equation of y on x

(i) $y = a + bx$

Or

(ii) $y - \bar{y} = r \dfrac{\sigma_y}{\sigma_x} (x - \bar{x})$

The symbols stand for the same things as in the previous case.
The following example would illustrate the above formulae:
Example 1. Calculate the regression equations from the following
data:
x    y
1    166
2    184
3    142
4    180
5    338
This example was solved in the last chapter by the method of least
squares and we had obtained the computed values of y by the equation
y = a + bx. The equation obtained was y = 100 + 34x. Now we shall
solve it by taking into account the values of the coefficient of correlation,
standard deviations and the means of the two series. In the above
example the value of the coefficient of correlation or r is +.69 and the
values of the standard deviations of the x and y series or $\sigma_x$ and $\sigma_y$ are
respectively 1.4 and 69.6. The values of the means of the two series are
respectively 3 and 202.
With the above figures the regression of y on x would be

$y - \bar{y} = r \dfrac{\sigma_y}{\sigma_x} (x - \bar{x})$
$y - 202 = .69 \times \dfrac{69.6}{1.4} (x - 3)$
$y - 202 = 34 (x - 3)$
$y - 202 = 34x - 102$
$y = 34x + 100$

Thus it will be noted that the regression equation of y on x
obtained by this method is exactly the same as obtained by the equation
y = a + bx.
The regression equation of x on y would be

$x - \bar{x} = r \dfrac{\sigma_x}{\sigma_y} (y - \bar{y})$
$x - 3 = .69 \times \dfrac{1.4}{69.6} (y - 202)$
$x - 3 = .014 (y - 202)$
$x - 3 = .014y - 2.828$
$x = .014y + .172$
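The two formulae used in Example 1 can be checked numerically. The following Python sketch is only an illustration: the function name `regression_equations` and its argument names are assumptions introduced here, and the inputs are the rounded figures quoted in the example, so the constants come out close to, but not exactly equal to, the rounded values 100, 34, .172 and .014.

```python
def regression_equations(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((a, b), (c, d)) such that y = a + b*x and x = c + d*y.

    b is the regression coefficient of y on x (r * sd_y / sd_x) and
    d is the regression coefficient of x on y (r * sd_x / sd_y).
    Both lines pass through the point of means (mean_x, mean_y).
    """
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    a_yx = mean_y - b_yx * mean_x   # intercept of the line of y on x
    a_xy = mean_x - b_xy * mean_y   # intercept of the line of x on y
    return (a_yx, b_yx), (a_xy, b_xy)


# Figures of Example 1: means 3 and 202, standard deviations 1.4 and 69.6, r = +.69
y_on_x, x_on_y = regression_equations(3, 202, 1.4, 69.6, 0.69)
print("y = %.2f + %.2f x" % y_on_x)
print("x = %.3f + %.4f y" % x_on_y)
```

The small differences from the printed equations arise because the text rounds the slopes to 34 and .014 before working out the constant terms.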
From the above two regression equations we can calculate the
most probable values of x for given values of y and most probable values
of y for given values of x.
The regression equation of y on x is y = 34x + 100. From this
we can compute the values of y for given values of x. Thus when
x = 1, y = (34 × 1) + 100 = 134
x = 2, y = (34 × 2) + 100 = 168
x = 3, y = (34 × 3) + 100 = 202
x = 4, y = (34 × 4) + 100 = 236
x = 5, y = (34 × 5) + 100 = 270
Similarly, from the regression equation of x on y we can calculate
most probable values of x for given values of y. The regression
equation of x on y is x = .014y + .172.
Thus when
y = 166, x = (.014 × 166) + .172 = 2.496
y = 184, x = (.014 × 184) + .172 = 2.748
y = 142, x = (.014 × 142) + .172 = 2.160
y = 180, x = (.014 × 180) + .172 = 2.692
y = 338, x = (.014 × 338) + .172 = 4.904
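The computed values above follow mechanically from the two equations. A minimal Python sketch, assuming the rounded equations y = 34x + 100 and x = .014y + .172 of the example (the function names `predict_y` and `predict_x` are hypothetical):

```python
def predict_y(x):
    """Most probable y for a given x, from the regression of y on x."""
    return 34 * x + 100


def predict_x(y):
    """Most probable x for a given y, from the regression of x on y."""
    return 0.014 * y + 0.172


xs = [1, 2, 3, 4, 5]            # actual values of x
ys = [166, 184, 142, 180, 338]  # actual values of y

computed_y = [predict_y(x) for x in xs]
computed_x = [round(predict_x(y), 3) for y in ys]

for x, cy in zip(xs, computed_y):
    print("x =", x, " computed y =", cy)
for y, cx in zip(ys, computed_x):
    print("y =", y, " computed x =", cx)
```

The same two functions also give the values read off the graph later in the text: y = 151 when x is 1.5, and x = 2.972 when y is 200.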
To plot the regression line of x on y we shall take the actual values
of y and the computed values of x, and similarly to plot the regression
line of y on x we shall take the actual values of x and computed values
of y.
Thus for the regression line of x on y the data would be
y    166      184      142      180      338
x    2.496    2.748    2.160    2.692    4.904
[Figure: Regression Lines, showing the lines x = .014y + .172 and y = 34x + 100, with x on the vertical scale and y on the horizontal scale.]
Fig. 1
From the above regression lines we can find out the values of x
for given values of y and the values of y for given values of x. Thus,
if we have to find out the value of y when the value of x is 1.5 we shall
use the regression line of y on x. From the point 1.5 on the vertical
scale, a line parallel to the base would be drawn touching the regression
line of y on x at point P. From this point a perpendicular towards the
base line would be drawn touching it at point R. The value at point
R is 151. This would be the value of y when the value of x is 1.5. In
the same way we can find out the most probable value of x for a given
value of y. Here we shall take into account the regression line of x on y.
Suppose we have to find out the most probable value of x when y is
200. We shall draw a perpendicular from the base at the point where
the value is 200. This perpendicular touches the regression line of x
on y at point A. From this point a line parallel to the base has been
drawn touching the vertical scale at B. The value at this point is 2.972.
This is the most probable value of x when y is 200.
These values can also be calculated from the regression equations.
Thus, if we have to calculate the value of y when x is 1.5 we shall use
the regression equation of y on x. It is
y = 34x + 100
If x is 1.5 the value of y would be
(34 × 1.5) + 100
= 151
Similarly, if the value of x is to be calculated when the value of
y is 200, we shall use the regression equation of x on y. It is
x = .014y + .172
If y is 200 the value of x would be
(.014 × 200) + .172
= 2.972
Thus we notice that the computed values of x and y for given
values of y and x as obtained by regression equations are the same as
obtained by the lines of regression graphically.
Regression equation and regression coefficients
Regression equations are expressed in terms of mean values of
the two series and indicate the variation in one series from its mean as
compared to a variation from the mean of the other series. A regression
coefficient gives the value by which one variable changes for a unit
change in the other variable. Just as there are two regression
equations, similarly there are two regression coefficients. The regression
coefficient of x on y would indicate the value by which the x variable
would change as compared to a unit change in the value of the y variable.
Similarly the regression coefficient of y on x would indicate the value
by which the y variable would change for a unit change in the value of
the x variable.

Regression coefficient of x on y: $b_{xy} = r \dfrac{\sigma_x}{\sigma_y}$
Regression coefficient of y on x: $b_{yx} = r \dfrac{\sigma_y}{\sigma_x}$

Multiplying the two coefficients, the standard deviations cancel and
$b_{xy} \times b_{yx} = r^2$, so that
$r = \sqrt{b_{xy} \times b_{yx}}$
Thus the coefficient of correlation is the square root of the product of the
two regression coefficients. It can also be said that the coefficient of
correlation is equal to the geometric mean of the two regression
coefficients. In the last example the value of bxy was .014 and of byx 34.
Therefore,
$r = \sqrt{.014 \times 34} = \sqrt{.476} = .69$
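The relation between r and the two regression coefficients can be sketched in Python as follows. The function name is hypothetical, and the sign rule it applies (r takes the common sign of the two coefficients, which can never differ in sign) is a standard fact added here for completeness rather than something stated in the passage above.

```python
import math


def correlation_from_coefficients(b_xy, b_yx):
    """Coefficient of correlation as the geometric mean of b_xy and b_yx.

    Both regression coefficients carry the sign of r, so a product that is
    negative indicates inconsistent inputs.
    """
    product = b_xy * b_yx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(product)
    return -r if b_xy < 0 else r


# Values from the last example: b_xy = .014 and b_yx = 34
r = correlation_from_coefficients(0.014, 34)
print(round(r, 2))
```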
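In terms of the deviations of the two series from their means, the
regression coefficient of x on y can be written as
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum dx\,dy}{n\,\sigma_x\,\sigma_y} \times \frac{\sigma_x}{\sigma_y} = \frac{\sum dx\,dy}{n\,\sigma_y^2} = \frac{\sum dx\,dy}{n \times \dfrac{\sum dy^2}{n}} = \frac{\sum dx\,dy}{\sum dy^2}$$
Similarly
$$b_{yx} = \frac{\sum dx\,dy}{\sum dx^2}$$
where $\sum dx\,dy$ stands for the sum of the products of the deviations
of the two series from their respective means, and dx and dy for the deviations
of the x and y series from their respective means.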
The following example would illustrate the above procedure:
Example 2. Calculate the regression coefficients from the following
data:
x    y
1    166
2    184
3    142
4    180
5    338
Solution          Calculation of Regression Coefficients
        x Series                         y Series
x     Deviations    (dx²)     y      Deviations    (dy²)     dxdy
      from mean              from mean
      (dx)                           (dy)
1       −2            4      166       −36          1296       72
2       −1            1      184       −18           324       18
3        0            0      142       −60          3600        0
4       +1            1      180       −22           484      −22
5       +2            4      338      +136         18496      272
Total                10                            24200      340
The value of
$b_{xy} = \dfrac{\sum dx\,dy}{\sum dy^2} = \dfrac{340}{24200} = .014$
The value of
$b_{yx} = \dfrac{\sum dx\,dy}{\sum dx^2} = \dfrac{340}{10} = 34$
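The calculation of Example 2 can be reproduced with a short Python sketch. The function name `regression_coefficients` is an assumption made for illustration; the code simply takes deviations from the actual means and forms the two ratios given above.

```python
def regression_coefficients(xs, ys):
    """Return (b_xy, b_yx) using deviations from the actual means.

    b_xy = sum(dx*dy) / sum(dy^2)   (regression coefficient of x on y)
    b_yx = sum(dx*dy) / sum(dx^2)   (regression coefficient of y on x)
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sum_dy2, sum_dxdy / sum_dx2


xs = [1, 2, 3, 4, 5]
ys = [166, 184, 142, 180, 338]
b_xy, b_yx = regression_coefficients(xs, ys)
print(round(b_xy, 3), b_yx)
```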
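It should be noted that the same values were obtained for bxy and
byx when the coefficient of correlation and standard deviations of the
two series were actually calculated.
If instead of the actual arithmetic averages assumed arithmetic
averages are used, the formula would be
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum dx\,dy - n\left(\dfrac{\sum dx}{n}\right)\left(\dfrac{\sum dy}{n}\right)}{\sum dy^2 - n\left(\dfrac{\sum dy}{n}\right)^2}$$
Similarly
$$b_{yx} = \frac{\sum dx\,dy - n\left(\dfrac{\sum dx}{n}\right)\left(\dfrac{\sum dy}{n}\right)}{\sum dx^2 - n\left(\dfrac{\sum dx}{n}\right)^2}$$
where dx and dy now stand for the deviations from the assumed averages.
Taking 2 and 200 as the assumed averages of the x and y series of the
above example:
        x-Series                          y-Series
x     Deviations    (dx²)     y      Deviations    (dy²)     dxdy
      from assumed           from assumed
      average (2)            average (200)
      (dx)                   (dy)
1       −1            1      166       −34          1156       34
2        0            0      184       −16           256        0
3       +1            1      142       −58          3364      −58
4       +2            4      180       −20           400      −40
5       +3            9      338      +138         19044      414
Total    5           15                 10         24220      350
$$b_{xy} = \frac{350 - 5\left(\frac{5}{5}\right)\left(\frac{10}{5}\right)}{24220 - 5\left(\frac{10}{5}\right)^2} = \frac{350 - 10}{24220 - 20} = \frac{340}{24200} = .014$$
$$b_{yx} = \frac{350 - 5\left(\frac{5}{5}\right)\left(\frac{10}{5}\right)}{15 - 5\left(\frac{5}{5}\right)^2} = \frac{350 - 10}{15 - 5} = \frac{340}{10} = 34$$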
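The short-cut formula gives the same coefficients whatever assumed averages are chosen. The Python sketch below illustrates this under the assumed averages 2 and 200 used above; the function name and parameter names are hypothetical.

```python
def regression_coefficients_assumed(xs, ys, assumed_x, assumed_y):
    """Return (b_xy, b_yx) using deviations from assumed averages.

    The correction term n * (sum_dx / n) * (sum_dy / n) removes the effect
    of measuring the deviations from assumed rather than actual averages.
    """
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = sum_dxdy - n * (sum_dx / n) * (sum_dy / n)
    b_xy = numerator / (sum_dy2 - n * (sum_dy / n) ** 2)
    b_yx = numerator / (sum_dx2 - n * (sum_dx / n) ** 2)
    return b_xy, b_yx


xs = [1, 2, 3, 4, 5]
ys = [166, 184, 142, 180, 338]
b_xy, b_yx = regression_coefficients_assumed(xs, ys, 2, 200)
print(round(b_xy, 3), b_yx)
```

Calling the function with any other pair of assumed averages, for example 3 and 180, should return the same two coefficients.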
RATIO OF VARIATION
Galton's Graph
Francis Galton, who first used the term regression in his studies
relating to the mean height of fathers and sons, has given a graph to study
the ratio of variation between the two series. This graph is known
after his name as Galton's Graph. This graph is very useful when the
variations in the series are irregular, as is generally the case in series
relating to economic data.
60
2 100
; 90
4 12 5
:5 90
6 6S
80
7 60
II. liS 70
lOa
9 Ijo
10 ljO 13 0
II 160 16 5
12 140 IIO
13 LZO 13°
14 100 110
15 So 9°
16 70
65
- - - - - - - - - _..
The above indices are plotted in the following graph:
[Figure: the x-series (subject) on the vertical scale plotted against the y-series (relative) on the horizontal scale.]
Fig. 2. Galton's Graph Showing Ratio of Variation
In the above graph the x-series has been taken as the subject and
the y-series as the relative, because the standard deviation of the indices
of the x-series is more than the standard deviation of the indices of the y-series.
The subject series is on the vertical scale and the relative series on the
horizontal scale. The line which has been drawn passes through the
averages of the two series. The average of both the series is 100. The
number of points on either side of the line is almost equal and they are
also equi-distant from the line of regression. To find out the ratio of
variation we have to find out the tangent of the angle between y and x.
For this, any point (B) may be taken on the vertical scale and from it a line
should be drawn parallel to the base touching the line of regression
at a point (C). The distance from this point
(C) to the ordinate axis from where the line begins (B), divided by the
distance between B and the point at which the line of regression touches
the vertical scale (A), gives the ratio of variation. In other words, the
ratio of variation is equal to the tangent of the angle BAC or is equal to
BC/BA.
I 79 49
z 62 40
3 33 25
", 6
55
46
62
31
H
34
7 31 34
8 34 2.8
Average 49 35
7. Calculate Karl Pearson's coefficient of correlation and the regression equations
from the following data:
Age of Husband Age of Wife
III 17
19 17
20 18
21 18
22 18
23
24
2.5 20
2.6 ZI
2.7 22
(Allahabad, M. Com., 1951)
8. Write down the two regression equations that may be associated with the
following pairs of values:
(x) 152 114 138 154 144 15; 141 I17 136 154
(1) 193 300 414 594 676 549 4 83 481 659
(I. A. S., 1951).
9. The following marks have been obtained by a class of students in statistics
(out of 100).
Paper I 80, 45, 55, ,6, 58, 60, 6" 68, 70, 75, 8,.
Paperll 82.. ,6, 50, 48, 60, 6z, 64, 6" 70, 74, 90.
Compute the coefficient of correlation for the above data. Find the lines of
regression and examine the relationship. (Indian Audit and Accounts
Examination, 1945).
10. Vital Statistics of the U. P. (in thousands)
Dysentery
Year Fever Respiratory and Others Total
diseases diarrhoea
19~1 10Z.5 37 10 .ub 1300
1932. 8H 34 13 176 10 7 6
1933 69 8 35 12 160 90 5
1934 97 0 47 18 260 12 95
Find out 'r' of the deaths from the fevers and total deaths given above.
Calculate standard error of this coefficient and the line of regression of the deaths from
fevers on total deaths. (M. A., Agra, 1937).
n. Given the f'~llowing values of' Arithmetic Mean, Standard Deviatlon and
~dcnt of Correlation of 240 sets of values; find the regression equation of:J< in
(tems of ,:<., and.:l. •
.. ., .. Maths. =47.6
Standard deviation of marks in Englisb = 10.8
.. t, t, .. .. Maths = 16,9
r between marks in English and Maths. =+0.42.
Form the two lines of regression and explain why there are two equations of
regression. Calculate the expected average marks in Maths. of c:andidates who received
~o marks in English. (U. P. C. S., 1941).
16. In the following table are recorded data showing the test scores made by
salesmen on an intelligence test and their weekly sales:
SalCSlIlllQ 2 ~ 4 S 6 7 8 9 10
l'e6.t Scorea 70 So 60 80 SO 90 40 60 60
Sales (000) &·5 6.0 4'5 ,.04.5 2.0 S.,
5.0 4.'
Calculate the regression line of sales on test score, and estimate the most probable
weekly sales volume if a salesman makes a score of 70. What will be the sampling error
of your estimate? (I. A. S., 1948).
17. Given
Find out : -
(a}.1 (b) ,
18. In a partially destroyed laboratory record of an analysis of correlation data
the following results only are legible:
Variance of x = 9
Regression Equations:
8x − 10y + 66 = 0
40x − 18y = 214
What were (a) the mean values of x and y, (b) the standard deviation of y and (c)
the coefficient of correlation between x and y? (I. A. S., 1947).
19. Explain what is meant by a scatter diagram and line of regression. Why
should there be, in general, two lines of regression for each bivariate distribution?
You are given the following results for the heights (x) and weights (y) of 1,000
policemen of U. P.:
$\bar{x}$ = 68.00 in., $\bar{y}$ = 150.00 lb., r = +0.60
$\sigma_x$ = 2.50 in., $\sigma_y$ = 20.00 lb.
Estimate from the above data (a) the height of a particular policeman whose
weight is 200 lb., (b) the weight of a particular policeman who is 5 ft. tall. (P. C. S., I9B).
20. (a) Explain the concepts of correlation and regression.
(b) How is a regression equation, such as that of the timber volume of a tree on
its height and girth measurements, useful in predicting the total timber volume of
standing trees in a plantation?
(c) The correlation between marriage rate and the value of industrial exports
over a number of years is of the order of 0.95 for a certain country. Recalling the
purpose for which the correlation coefficient was introduced, what conclusions can you
draw about the association between marriage rate and industrial production?
(I. A. S., 1957).
21. Suppose a school class has an examination at the beginning and at the end
of the school year. What is meant by "regression of final grades on beginning grades"
and by "regression of beginning grades on final grades"? Which of these would be
more useful in practice? Might these regressions coincide?
22. Is the regression in the population always a straight line? If not, give an
example of a population where it is not.
23. Explain the difference between "regression" and "correlation" problems.
Can a correlation problem also be a regression problem? Can a regression problem
also be a correlation problem?
24. Label the following examples as regression or correlation type problems.
It is desired to study:
(a) The connection between I. Q. and weight of 15-year-old girls.
(b) The connection between the velocity of the Ganges River and its depth at
various points.
(c) The connection between the amount of winter snow and the barley yield
for some locality.
~l
Year: 195 2 B 54 55 56 57 58 59 60 61 61.
Sales 72 84 66 60 48 42 54 63 51 57 69
(Lakhs Rs.)
Profits
(Lakhs Rs.) 42 P 48 46 30 2:1! 36 3 8 34 42 44 40
27. Plot a Galton graph from the following Table and show the Ratio of
Variation.
Year
Subject Series
Relative Series
, :toI
I 160
I 2
30
170
f 503
180
I 4\5\1:>\71
100
190
100 12.0
ZOO
'150
Z10 2.2.0
8
160
:t~o
1]0
2.40
9
3x = y, 8y = 6x and ax = 4.
34. Find out the following:
(i) Co-efficient of correlation,
(ii) The two regression equations,
(iii) Most likely value of x when y is 34,
(iv) Most likely value of y when x is 47,
(v) The regression Co-efficients.
Subject Series: 48 50 53 49 53 49
Relative series ; 36 32 33 38 35 30
35. Explain the circumstances when co-efficient of correlation would be more
than unity.
36. The two regression lines between height (x) in inches and weight (y) in
lbs. of male students are:
4y − 15x + 530 = 0
20x − 3y − 975 = 0
Find the mean height and weight of the group and r. Also estimate the weight
when the boy has height 80 inches and height when weight is 167.5 lbs.
37. For 30 students of a class the regression equation of marks in History (x)
on the marks in Geography (y) is: 3y − 5x + 100 = 0.
The mean marks in Geography is 40 and the variance of marks in History is
4/9 of the variance of marks in Geography. Find the mean marks in History and the
Co-efficient of correlation between marks in the two subjects.
38. Comment on the following:
The co-efficient of correlation between x and y and x and z is the same. Hence
the rate of increase of x with respect to y is the same.
39. The values obtained in measurement of characters x and y on each of 35
individuals led to the following table:
Mid-point of x : 10 15 20 25
Frequencies: 4 6 8 10 7
Average of y : 3 4 6 8 9
Obtain the linear regression equation of y on x. Can you determine from
this data the regression equation of x on y? If not, why?
Theory of Attributes and
Consistence of Data 18
Meaning
It has been said in an earlier chapter that statistics deal with
quantitative data alone. Quantitative data may arise in any of the following
two ways:
(a) In the first place an investigator may measure the actual
magnitude of some variable: height or weight of a group of individuals,
their income or expenditure, marks obtained by a group of students or
the number of labourers getting a particular amount as wage etc. In
all these cases the data are such that a fairly accurate quantitative
measurement is possible. In all the previous chapters we have discussed
various statistical methods applicable to such type of data which are
known as statistics of variables. Measures of central tendency, measures
of dispersion and skewness and correlation are some of the important
statistical methods used in the analysis of such data.
(b) In the second place data might be such that it may not be
possible for an investigator to measure their magnitude. In such
cases the observer can only study the presence or absence of a
particular quality in a group of individuals. Examples of such phenomena
are blindness, insanity, deaf-mutism, sickness, honesty, extravagance
etc. In such cases an observer cannot measure the magnitude of the
data; for example, he cannot measure the extent of blindness or honesty
in quantitative form. All he can do is to count the number of persons
who are blind or who are honest. He has to take this decision on the
basis of some standard definition of the term in question. Such data
in which the quantitative measurement of the magnitude is not possible
and in which only the presence or absence of an attribute can be studied
are called statistics of attributes.
In the present and the succeeding chapters we shall be discussing
some general aspects of the theory applicable to statistics of attributes.
It should be noted that the methods of statistical analysis applicable
to statistics of variables can be used to a certain extent in the analysis
of statistics of attributes also. For example, the presence or absence
of an attribute may be treated as changes in the values of a variable (which
has only two values).
Classification of data
In the analysis of statistics relating to attributes, the first thing
is the classification of data. Here data are classified on the basis of
Now, if one more attribute B is taken into account the first order
classes, i.e., A and α, can each be divided into two classes: one in which
attribute B is present and the other in which it is not present.
Thus
(A) = (AB) + (Aβ)
(α) = (αB) + (αβ)
Similarly
(B) = (AB) + (αB)
(β) = (Aβ) + (αβ)
If there is a third attribute C also, then each of the above second
order class frequencies can be divided into two classes: one in which C is
present and the other in which C is not present. Thus
(AB) = (ABC) + (ABγ)
(Aβ) = (AβC) + (Aβγ)
(αB) = (αBC) + (αBγ)
(αβ) = (αβC) + (αβγ)
From the above, it is clear that if higher order frequencies are given
we can find out the values of the lower order frequencies without much
difficulty.
Thus
(ABC) + (ABγ) = (AB)
(ABC) + (AβC) = (AC)
(ABC) + (αBC) = (BC)
(AβC) + (Aβγ) = (Aβ)
The frequencies of the other classes can be similarly calculated.
From the above we can easily conclude that
(A) = (AB) + (Aβ)
    = (ABC) + (ABγ) + (AβC) + (Aβγ)
The following examples would clarify the above rules.
Example 1. Given the following ultimate class frequencies, find
the frequencies of the positive and negative classes and the total number
of observations:
(AB) = 250    (Aβ) = 120
(αB) = 200    (αβ) = 70
Solution
N = (A) + (α)
  = (AB) + (Aβ) + (αB) + (αβ)
  = 250 + 120 + 200 + 70
  = 640
(A) = (AB) + (Aβ)
    = 250 + 120
    = 370
(B) = (AB) + (αB)
    = 250 + 200
    = 450
(α) = (αB) + (αβ)
    = 200 + 70
    = 270
(β) = (Aβ) + (αβ)
    = 120 + 70
    = 190
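The additions of Example 1 can be written as a small Python sketch. The function name and the dictionary keys are assumptions made for illustration; lower-case `a` and `b` stand for the negative classes α and β.

```python
def first_order_classes(AB, Ab, aB, ab):
    """Positive and negative class frequencies from the four ultimate
    class frequencies of two attributes.

    AB, Ab, aB, ab stand for (AB), (Aβ), (αB) and (αβ) respectively.
    """
    return {
        "N": AB + Ab + aB + ab,
        "A": AB + Ab,   # (A) = (AB) + (Aβ)
        "a": aB + ab,   # (α) = (αB) + (αβ)
        "B": AB + aB,   # (B) = (AB) + (αB)
        "b": Ab + ab,   # (β) = (Aβ) + (αβ)
    }


classes = first_order_classes(AB=250, Ab=120, aB=200, ab=70)
print(classes)
```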
When only two attributes are involved the class frequencies can
be very easily found out by a table of the following type. In this table
the given data can be filled in the relevant cells and frequencies of the
blank cells can be easily found out.
           A             α             Total
B          (AB) 250      (αB) 200      (B) 450
β          (Aβ) 120      (αβ) 70       (β) 190
Total      (A) 370       (α) 270       N 640
Solution
Here we have to find out the frequencies of the following
classes:
(α); (β); (Aβ); (αB); (αβ).
Now
(α) = N − (A) = 100 − 50 = 50
(β) = N − (B) = 100 − 40 = 60
(Aβ) = (A) − (AB) = 50 − 30 = 20
(αB) = (B) − (AB) = 40 − 30 = 10
(αβ) = (α) − (αB)
     = N − (A) − (B) + (AB)
     = 100 − 50 − 40 + 30
     = 40
SO 100
"
*Proof
(αβ) = (α) − (αB)
     = N − (A) − {(B) − (AB)}
     = N − (A) − (B) + (AB)
−(AB) = N − (A) − (B) − (αβ)
(AB) = −N + (A) + (B) + (αβ) = (A) + (B) − N + (αβ)
Now if (AB) is less than (A) + (B) − N it is obvious that (αβ)
would be negative.
**Proof
(Aβγ) = (Aβ) − (AβC)
      = (A) − (AB) − (AC) + (ABC)
−(ABC) = (A) − (AB) − (AC) − (Aβγ)
(ABC) = (AB) + (AC) − (A) + (Aβγ)
Now if (ABC) is less than (AB) + (AC) − (A) it is clear that (Aβγ)
would be negative.
Similarly other rules can be proved.
***Proof
(αβγ) = (αβ) − (αβC)
      = (α) − (αB) − (βC) + (AβC)
      = N − (A) − (B) + (AB) − (C) + (BC) + (AC) − (ABC)
(ABC) = (AB) + (AC) + (BC) − (A) − (B) − (C) + N − (αβγ)
Now if (ABC) is more than (AB) + (AC) + (BC) − (A) − (B) − (C) + N
it is clear that (αβγ) would be negative.
Thus AB has got a negative value. Hence the given data are
inconsistent as it is obvious that no class-frequency occurring by count-
ing real attributes can be negative.
Example 6. A labour welfare officer returns the following number
of workers observed with certain classes of defects amongst a number of
factory workers. A denotes development defects and B, low nutrition.
N = 600, (A) = 250, (αB) = 400, (Aβ) = 200
Do you find the data consistent?
Solution
From the given data,
N = 600
(A) = 250
(AB) = (A) − (Aβ) = 250 − 200 = 50
(B) = (AB) + (αB) = 50 + 400 = 450
Now, according to the conditions of consistence
(AB) ≥ (A) + (B) − N
or (AB) ≥ 250 + 450 − 600
or (AB) ≥ 100
But the value of AB is 50 which is less than 100. Hence the given
data are inconsistent.
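The consistence test for two attributes amounts to working out the four ultimate class frequencies and checking that none is negative. A minimal Python sketch follows, with hypothetical function names and with lower-case letters standing for the negative classes α and β:

```python
def ultimate_classes(N, A, B, AB):
    """Ultimate class frequencies of two attributes from N, (A), (B), (AB)."""
    return {
        "AB": AB,
        "Ab": A - AB,          # (Aβ) = (A) − (AB)
        "aB": B - AB,          # (αB) = (B) − (AB)
        "ab": N - A - B + AB,  # (αβ) = N − (A) − (B) + (AB)
    }


def is_consistent(N, A, B, AB):
    """Data are consistent only if no ultimate class frequency is negative."""
    return all(v >= 0 for v in ultimate_classes(N, A, B, AB).values())


# Example 6: N = 600, (A) = 250, (B) = 450, (AB) = 50
example6 = ultimate_classes(600, 250, 450, 50)
print(example6, is_consistent(600, 250, 450, 50))
```

For Example 6 the class (αβ) works out to −50, which is another way of seeing that (AB) falls short of (A) + (B) − N by 50.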
Example 7. A market investigator returns the following data of
10000 people consulted:
8110 liked chocolates,
7520 liked toffee,
4180 liked boiled sweets,
5700 liked chocolates and toffee,
3500 liked chocolates and boiled sweets,
3480 liked toffee and boiled sweets, and
2970 liked all the three.
Show that this information as it stands must be incorrect.
Solution
Denoting
persons who liked chocolates by A,
persons who liked toffee by B,
and persons who liked boiled sweets by C,
the given data are:
(A) = 8110     (AC) = 3500
(B) = 7520     (BC) = 3480
(C) = 4180     (ABC) = 2970
(AB) = 5700    N = 10000
Now, according to the conditions of consistence,
(ABC) ≤ (AB) + (AC) + (BC) − (A) − (B) − (C) + N.
Substituting the given values, we get,
(ABC) ≤ 5700 + 3500 + 3480 − 8110 − 7520 − 4180 + 10000
or (ABC) ≤ 2870
So ABC can be either equal to or less than 2870. But as per the
data given it is 2970. Hence the data are incorrect.
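With three attributes the same idea applies to all eight ultimate classes. The Python sketch below is an illustration under assumed names; lower-case letters stand for the negative classes α, β and γ.

```python
def ultimate_classes_3(N, A, B, C, AB, AC, BC, ABC):
    """Eight ultimate class frequencies of three attributes, obtained from
    the positive class frequencies."""
    return {
        "ABC": ABC,
        "ABc": AB - ABC,
        "AbC": AC - ABC,
        "aBC": BC - ABC,
        "Abc": A - AB - AC + ABC,
        "aBc": B - AB - BC + ABC,
        "abC": C - AC - BC + ABC,
        "abc": N - A - B - C + AB + AC + BC - ABC,
    }


# Example 7: the market investigator's returns
classes3 = ultimate_classes_3(N=10000, A=8110, B=7520, C=4180,
                              AB=5700, AC=3500, BC=3480, ABC=2970)
negative = [name for name, value in classes3.items() if value < 0]
print(classes3)
print("negative classes:", negative)
```

Only the class (αβγ) comes out negative, which corresponds to the condition used in the solution: (ABC) exceeds its upper limit of 2870 by exactly 100.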
Example 8. If in a village actually invaded by anthrax, 70 per
cent of the goats are attacked and 85 per cent have been inoculated with
vaccine, what is the lowest percentage of the inoculated that must have
been attacked?
Solution
Denoting
the attribute of attack by A
and the attribute of inoculation by B,
the given data are
(A) = 70    (B) = 85    N = 100
We have to find the lowest percentage of AB.
Now, according to the conditions of consistence,
(AB) ≥ 0, and
(AB) ≥ (A) + (B) − N, i.e., 70 + 85 − 100 = 55
Hence the value of AB cannot be less than 55.
Hence the lowest percentage of the inoculated that must have been
attacked is
(AB)/(B) × 100 = (55/85) × 100 = 65 per cent (approximately)
Proof
(AB) ≥ (A) + (B) - N (the proof of this has already been given).
Applying the above inequality to the universe of C, we get
(ABC) ≥ (AC) + (BC) - (C)
≥ (A) + (C) - N + (B) + (C) - N - (C)
≥ (A) + (B) + (C) - 2N
Applying the above inequality to the universe of D, we get
(ABCD) ≥ (AD) + (BD) + (CD) - 2(D)
≥ (A) + (D) - N + (B) + (D) - N + (C) + (D) - N - 2(D)
≥ (A) + (B) + (C) + (D) - 3N
∴ (ABCD ... M) ≥ [(A) + (B) + (C) + (D) + ... + (M)] - (n - 1)N,
where n is the number of attributes.
Hence the least possible value of (ABCD) is 10 and, since the value of N has been assumed as 100, at least 10% of the combatants must have lost an eye, an ear, a leg and an arm.
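The general lower limit derived in the proof can be sketched as follows. This is an illustration only; the function name `least_joint_frequency` is hypothetical, and the four percentages in the first call are assumed figures chosen for illustration, since the data of the combatants problem are not reproduced here.

```python
def least_joint_frequency(n_total, class_frequencies):
    """Lower limit of the joint frequency: the sum of the class
    frequencies minus (n - 1) N, and never below zero."""
    n_attributes = len(class_frequencies)
    bound = sum(class_frequencies) - (n_attributes - 1) * n_total
    return max(0, bound)


# Assumed percentages for four attributes out of N = 100
print(least_joint_frequency(100, [70, 75, 80, 85]))  # 10
# Example 8 of the text: (A) = 70, (B) = 85, N = 100
print(least_joint_frequency(100, [70, 85]))          # 55
```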
Questions
1. Given the following frequencies of the positive classes, find the frequencies of the ultimate classes.
(A) = 80, (B) = 100, (AB) = 70, N = 250
2. A number of labourers in a factory were examined for the presence or absence of certain defects, of which three chief descriptions were noted:
A - Physical weakness
B - Nerve signs
C - Mental dullness
Given the following ultimate frequencies, find the frequencies of the positive classes, including the whole number of observations N:
(ABC) = 75   (αBC) = 98
(ABγ) = 110   (αBγ) = 702
(AβC) = 106   (αβC) = 74
(Aβγ) = 489   (αβγ) = 4815
3. Given the following frequencies, find out the frequencies of the positive and the ultimate classes.
N = 29,002,525   (ABγ) = 82
(A) = 23,467   (AβC) = 380
(B) = 14,192   (αBC) = 500
(C) = 97,383   (ABC) = 25
4. Measurements are made on a thousand husbands and a thousand wives. If the measurements of the husbands exceed the measurements of the wives in 789 cases for one measurement, in 741 cases for another and in 690 cases for both measurements, in how many cases will both measurements on the wife exceed the measurements on the husband?
5. In a Girls' High School there were 200 students. Their results in the quarterly, half-yearly and the annual examinations were as follows:
80 passed the quarterly examination,
75 passed the half-yearly examination,
and 96 passed the annual examination.
(A) × (B) / N.
From the above it is clear that, ordinarily, if attributes A and B are independent, the expected frequency of (AB) would be equal to (A) × (B) / N.
Criterion of independence
If there is no kind of relationship between the attributes A and B, we may expect to find the same proportion of A's in B's as in β's. In other words, attribute A must be equally common amongst B's and not-B's. If, for example, blindness and deafness are not associated, the proportion of blind people amongst the deaf and amongst the not-deaf must be equal. If, however, it is found that the proportion of blind people amongst the deaf is more than their proportion amongst the not-deaf, it indicates that blindness and deafness have some association.
Two attributes A and B are said to be independent if the observed frequency of (AB) is equal to its expected frequency, i.e.,
(AB) = (A) × (B) / N
Example 1. In a population of 200 students the number of married students is 80. Out of 60 students who failed, 24 belonged to the married group. It is required to find out whether the attributes of marriage and failure are independent.
Suppose the attribute of marriage is represented by A and failure by B. The actual value of (AB) = 24
Expected value of (AB) = (A) × (B) / N = 80 × 60 / 200 = 24
The actual value of (AB) and its expected value are equal. It means that attributes A and B are independent. We can look at this problem from another point of view also. The percentage of married students who failed is 24/80 × 100 = 30. The percentage of failure amongst the unmarried students, 36/120 × 100, is also 30. The two attributes are thus independent of each other.
Yet another way of looking at the above problem is to find out whether α and β are associated or, in other words, whether bachelorhood is associated with success in examination. The actual value of (αβ) in the above illustration
= (α) - (αB)
= N - (A) - (B) + (AB)
= 200 - 80 - 60 + 24
= 84
The value of (α) = N - (A) = 200 - 80 = 120
The value of (β) = N - (B) = 200 - 60 = 140
The expected value of (αβ)
= (α) × (β) / N = 120 × 140 / 200
= 84
Thus we find that α and β are also independent of each other.
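The two comparisons made in Example 1 can be sketched in Python. This is an illustration only; the function name `expected_frequency` is hypothetical. Each observed class frequency is compared with its expected frequency under independence.

```python
def expected_frequency(first, second, n):
    """Expected frequency of a class when the attributes are independent."""
    return first * second / n


n, a, b, ab = 200, 80, 60, 24   # figures of Example 1
alpha, beta = n - a, n - b      # (α) = 120, (β) = 140
alpha_beta = n - a - b + ab     # observed (αβ) = 84

independent_ab = ab == expected_frequency(a, b, n)
independent_alpha_beta = alpha_beta == expected_frequency(alpha, beta, n)
print(independent_ab, independent_alpha_beta)  # True True
```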
We are now in a position to lay down the following general rules for deciding whether any two attributes are independent or not. These rules are satisfied by substituting the figures given in the above example. In the above example the actual values of (AB), (αβ), (Aβ) and (αB) are respectively 24, 84, 56 and 36.
Attributes A and B are independent if
(i) (AB) = (A) × (B) / N, i.e., 24 = 80 × 60 / 200
(ii) (αβ) = (α) × (β) / N, i.e., 84 = 120 × 140 / 200
(iii) (AB) × (αβ) = (Aβ) × (αB), i.e., 24 × 84 = 56 × 36
Thus we can conclude that if A and B are independent, the nine-cell table would have the following form:
A   α
Association
In statistics the word association has a technical meaning. In common parlance, if A and B are found together in a large number of cases they are said to be associated. But in statistics they cannot be said to be associated unless they are found together in a larger number of cases than is expected if they are independent. Thus even if A and B are found together in a very large number of cases, they will not be said to be associated unless this number is greater than the figure expected when the attributes A and B are independent. This point should always be kept in mind while drawing inferences from statistical data relating to attributes.
A and B would be associated if they are not independent or, in other words, if (AB) is not equal to (A) × (B) / N.
If
(AB) > (A) × (B) / N
A and B are said to be positively associated or simply associated.
If, on the other hand,
(AB) < (A) × (B) / N
A and B are said to be negatively associated or simply disassociated.
It should be remembered that disassociation does not mean absence of association. It means presence of negative association.
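The three cases above can be sketched as a small Python function. This is an illustration only; the function name `nature_of_association` is hypothetical. It compares (AB) with (A) × (B) / N exactly, which ignores sampling fluctuations.

```python
def nature_of_association(n, a, b, ab):
    """Compare the observed (AB) with its expected value (A)(B)/N."""
    expected = a * b / n
    if ab > expected:
        return "positive association"
    if ab < expected:
        return "negative association (disassociation)"
    return "independence"


# (A) = 40, (B) = 30, (AB) = 20, N = 100: expected (AB) is 12
print(nature_of_association(100, 40, 30, 20))  # positive association
# (A) = 80, (B) = 60, (AB) = 24, N = 200: expected (AB) is 24
print(nature_of_association(200, 80, 60, 24))  # independence
```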
Example 2
Given
(A) = 40   (B) = 30   (AB) = 20   N = 100
Study the association between A and B; α and β; A and β; and α and B.
We can represent the given data in the shape of a table and obtain the frequencies of the missing classes:
        A          α          Total
B       (AB) 20    (αB) 10    (B) 30
β       (Aβ) 20    (αβ) 50    (β) 70
Total   (A) 40     (α) 60     N 100
The above values are those which have been observed. We can now calculate the expected frequencies. The expected frequency of
(AB) = (A) × (B) / N = 40 × 30 / 100 = 12
(αβ) = (α) × (β) / N = 60 × 70 / 100 = 42
(Aβ) = (A) × (β) / N = 40 × 70 / 100 = 28
attributes. We know that (AB) cannot be less than (A) + (B) - N. Thus if (A) + (B) - N is more than 0, (AB) would also be more than 0. If (AB) is just equal to (A) + (B) - N, there would be complete negative association between A and B. In such cases the value of (αβ) only would be 0. For example, if the value of (A) = 50 and of (B) = 60 and N = 100, the value of (A) + (B) - N would be 50 + 60 - 100 or 10. If the value of (AB) is 10, the value of (Aβ) would be 40, of (αB) 50 and of (αβ) 0. Thus we conclude that if the value of (AB) or (αβ) is 0, there is complete disassociation between the two attributes.
Intensity of association
In actual practice, in most of the cases the value of (AB) would lie somewhere between the two limits, one laid down by the expected value and the other laid down by the value expected in perfect positive or negative association. The intensity of association is indicated by the extent to which the observed value of (AB) deviates from its expected value towards the limit of perfect association. We shall discuss the quantitative measurement of the intensity of association or disassociation a little later.
Chance association
At this stage it is necessary to point out that if the value of (AB) is found to be greater than the value of (A) × (B) / N, it should not be at once concluded that there is association between the two attributes. It is quite possible that the difference between the observed and the expected values has arisen by chance. Such association may be the result of sampling fluctuations, and the true association may be zero. As such, unless the difference between the observed and the expected values is very significant we should not conclude that there is any association or disassociation between the two attributes. The question which naturally arises here is how much divergence between the observed and expected values can be termed as significant. We shall discuss this question in detail in the chapters on Sampling. This point has been raised here only to warn the student of this subject against drawing hasty conclusions.
Coefficient of association
So far we have discussed that a rough idea about the extent of association or disassociation between two attributes can be had by finding out the extent of the difference between their observed and expected frequencies. For practical purposes it is enough to take a decision about whether the two attributes in question are associated, disassociated or independent. But in some cases the difference between observed and expected frequencies may be due to what are called fluctuations of sampling. Under such circumstances it becomes necessary to obtain an idea about the extent to which the difference between the observed and expected frequencies can be due to chance fluctuations. We shall discuss these tests and their significance in a separate chapter entitled Sampling of Attributes. For the present we shall discuss the possibility of obtaining a coefficient of association which can give some idea about the extent of association between two attributes. It would be convenient if the coefficient of association is such that its value is 0 when the two attributes are independent, +1 when they are perfectly associated and -1 when they are perfectly disassociated. Many such coefficients of association have been worked out by different authors but the one given by Yule is easy and simple.
Yule's coefficient of association or
Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
We know that when the two attributes A and B are independent, (AB)(αβ) = (Aβ)(αB). As such, if two attributes are independent, the value of the numerator in the above formula would be 0 and the value of the coefficient of association would also be 0. Similarly, if there is perfect association between the two attributes A and B, the value of (Aβ)(αB) would be 0 and, since it will be so both in the numerator and the denominator, it is evident that the value of the coefficient of association would be +1. Similarly, if there is perfect disassociation between two attributes A and B, the value of (AB)(αβ) would be 0 and, since it will be so both in the numerator and denominator, the coefficient of association would be -1. The following example would illustrate the above formula:
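Example 4. Calculate the coefficients of association from the following data:
(1) (A) = 60; (B) = 80; (AB) = 48; N = 100
(2) (A) = 60; (B) = 80; (AB) = 60; N = 100
(3) (A) = 60; (B) = 80; (AB) = 40; N = 100
(4) (A) = 60; (B) = 80; (AB) = 50; N = 100
Solution:
(1) In this case
(AB) = 48
(Aβ) = (A) - (AB) = 60 - 48 = 12
(αB) = (B) - (AB) = 80 - 48 = 32
(αβ) = (α) - (αB) = 40 - 32 = 8
Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
= [(48 × 8) - (12 × 32)] / [(48 × 8) + (12 × 32)]
= 0 / 768 = 0
Thus the attributes A and B are independent.
(2) In this case
(AB) = 60
(Aβ) = (A) - (AB) = 60 - 60 = 0
(αB) = (B) - (AB) = 80 - 60 = 20
(αβ) = (α) - (αB) = 40 - 20 = 20
Q = [(60 × 20) - (0 × 20)] / [(60 × 20) + (0 × 20)]
= 1200 / 1200 = +1
Thus there is perfect positive association between the attributes A and B.
(3) In this case
(AB) = 40
(Aβ) = (A) - (AB) = 60 - 40 = 20
(αB) = (B) - (AB) = 80 - 40 = 40
(αβ) = (α) - (αB) = 40 - 40 = 0
Q = [(40 × 0) - (20 × 40)] / [(40 × 0) + (20 × 40)]
= -800 / 800 = -1
Thus there is perfect negative association between the attributes A and B.
(4) In this case
(AB) = 50
(Aβ) = (A) - (AB) = 60 - 50 = 10
(αB) = (B) - (AB) = 80 - 50 = 30
(αβ) = (α) - (αB) = 40 - 30 = 10
Q = [(50 × 10) - (10 × 30)] / [(50 × 10) + (10 × 30)]
= 200 / 800 = +.25
Thus there is slight association between the attributes A and B.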
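The four cases of Example 4 can be reproduced with a short Python sketch. This is an illustration only; the function name `yules_q` is hypothetical. It derives the ultimate class frequencies from N, (A), (B) and (AB) and applies Yule's formula.

```python
def yules_q(n, a, b, ab):
    """Yule's coefficient of association from N, (A), (B) and (AB)."""
    a_beta = a - ab
    alpha_b = b - ab
    alpha_beta = n - a - b + ab
    cross_1 = ab * alpha_beta     # (AB)(αβ)
    cross_2 = a_beta * alpha_b    # (Aβ)(αB)
    return (cross_1 - cross_2) / (cross_1 + cross_2)


# Cases (1) to (4) of Example 4: (A) = 60, (B) = 80, N = 100
results = [yules_q(100, 60, 80, ab) for ab in (48, 60, 40, 50)]
print(results)  # [0.0, 1.0, -1.0, 0.25]
```

The function fails with a division by zero when both cross products vanish, a case that does not arise in this example.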
The chief characteristic of this coefficient of association is that it is independent of the relative proportions of A's and α's in the data. If all the terms containing A are multiplied by a constant, the value of Q would not be affected. Similarly, if all the terms containing B or α or β are multiplied by a constant, the value of Q would remain unaffected. Thus, if in the last example the values of (AB) and (Aβ) are multiplied by two, the frequencies would be
(AB) = 100
(Aβ) = 20
(αB) = 30
(αβ) = 10
Q = [(100 × 10) - (20 × 30)] / [(100 × 10) + (20 × 30)]
= 400 / 1600
= +.25
Thus the value of the coefficient of association has remained unaffected even though all the terms containing A, i.e., (AB) and (Aβ), were multiplied by 2. Similarly, if (αB) and (αβ) were multiplied by a constant the value of Q would not be affected.
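This property can be checked numerically with a brief Python sketch. It is an illustration only; the function name `q_from_cells` is hypothetical. It uses the frequencies of case (4) of the last example and doubles first the terms containing A and then the terms containing α.

```python
def q_from_cells(ab, a_beta, alpha_b, alpha_beta):
    """Yule's Q computed directly from the four ultimate class frequencies."""
    cross_1 = ab * alpha_beta
    cross_2 = a_beta * alpha_b
    return (cross_1 - cross_2) / (cross_1 + cross_2)


original = q_from_cells(50, 10, 30, 10)
doubled_a = q_from_cells(100, 20, 30, 10)     # (AB) and (Aβ) doubled
doubled_alpha = q_from_cells(50, 10, 60, 20)  # (αB) and (αβ) doubled
print(original, doubled_a, doubled_alpha)     # 0.25 0.25 0.25
```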
The comparison of the coefficients of association of two sets of data can give an idea about the extent of association between two sets of attributes.
Example 5. The following table gives the number of persons suffering from certain infirmities in Bengal.
Sex       Total numbers   Insane   Deaf-mutes   Insane and deaf-mutes
Males     260,00,000      12,650   21,301       545
Females   241,00,000      9,055    14,136       317
Study the association between insanity and deaf-mutism, separately for males and females.
Solution:
Denoting insanity by A and sanity by α, deaf-mutism by B and its absence by β, the given data are
          Males         Females
N     =   260,00,000    241,00,000
(A)   =   12,650        9,055
(B)   =   21,301        14,136
(AB)  =   545           317
Calculating the class frequencies of the second order from the above, we have
                                 Males         Females
(AB)                         =   545           317
(Aβ) = (A) - (AB)            =   12,105        8,738
(αB) = (B) - (AB)            =   20,756        13,819
(αβ) = N - (A) - (B) + (AB)  =   259,66,594    240,77,126
Substituting the above values in Yule's formula for the coefficient of association,
Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
where Q represents the coefficient of association, we get
Q for males = [(545 × 259,66,594) - (12,105 × 20,756)] / [(545 × 259,66,594) + (12,105 × 20,756)]
= +.96
and Q for females = [(317 × 240,77,126) - (8,738 × 13,819)] / [(317 × 240,77,126) + (8,738 × 13,819)]
= +.97
Thus there is a positive association between insanity and deaf-mutism for both the males and the females of Bengal, but the degree of association is more for females than for males.
Coefficient of colligation
Another important coefficient, which is also independent of the relative proportions of A's and α's in the data (like Yule's Coefficient of Association), is known as the Coefficient of Colligation. Coefficient of Colligation or
Y = [1 - √{(Aβ)(αB) / [(AB)(αβ)]}] / [1 + √{(Aβ)(αB) / [(AB)(αβ)]}]
The following example illustrates the above formula:
Example 6
Given
(AB) = 30   (αB) = 5
(Aβ) = 10   (αβ) = 5
Calculate the Coefficient of Colligation.
Solution
Coefficient of Colligation or
Y = [1 - √{(10 × 5) / (30 × 5)}] / [1 + √{(10 × 5) / (30 × 5)}]
= (1 - √.33) / (1 + √.33)
= (1 - .57) / (1 + .57) = .43 / 1.57
= .27
It can easily be proved that the coefficient of association or
Q = 2Y / (1 + Y²)
In the above example the coefficient of association or Q
= [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
= [(30 × 5) - (10 × 5)] / [(30 × 5) + (10 × 5)]
= 100 / 200 = .5
=_i!_x 100=41
100
Amongst the rich, however, the association is not of a very high degree. As such, the idea that the association in the total population is due to the fact that a larger number of rich people are vaccinated and are also exempt from tuberculosis is incorrect, because in that case the association of A and B in the sub-population of C should have been very high.
It should be noted that in the above case the percentages of people exempt from tuberculosis in the total population as also in the sub-population of the poor are very low (.7% and 2.77% respectively) but in the vaccinated group the percentages are very high (37.5% and 30% respectively). In the sub-population of the rich the percentage of people exempt from tuberculosis is fairly high (45%) but in this group also the percentage of vaccinated people exempt from tuberculosis is higher by 5%; it is 50%. Thus the attributes A and B are positively associated in the total population as also in the sub-populations. The association between A and B is thus not due to their common association with C.
It is possible to lay down a formula for the coefficient of partial association by modifying the original formula for the calculation of the coefficient of association. The only change that is introduced in the formula is that the sub-population in which association is being studied is also indicated. Thus the formula for the coefficient of association between A and B is
Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
Now if we wish to study the partial association between A and B in the sub-population of C, we shall add the attribute C in each of the above classes and the coefficient of association would then be
Q = [(ABC)(αβC) - (AβC)(αBC)] / [(ABC)(αβC) + (AβC)(αBC)]
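The partial coefficient can be sketched in Python as follows. This is an illustration only; the function name `partial_q` is hypothetical and the four frequencies are assumed figures, not data from the text.

```python
def partial_q(abc, a_beta_c, alpha_b_c, alpha_beta_c):
    """Yule's Q between A and B within the sub-population of C."""
    cross_1 = abc * alpha_beta_c       # (ABC)(αβC)
    cross_2 = a_beta_c * alpha_b_c     # (AβC)(αBC)
    return (cross_1 - cross_2) / (cross_1 + cross_2)


# Assumed frequencies inside the sub-population C, for illustration only
q_in_c = partial_q(abc=30, a_beta_c=10, alpha_b_c=20, alpha_beta_c=40)
print(round(q_in_c, 3))  # 0.714
```

The same function gives the partial association in the sub-population of γ when the four frequencies (ABγ), (Aβγ), (αBγ) and (αβγ) are supplied instead.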
Illusory associations
It is clear from the above discussion about partial associations that sometimes misleading or illusory associations may be observed between two attributes which are not directly associated but which are both individually associated with a third attribute. Thus, if A and B are two independent attributes but both of them are positively associated with a third attribute C, it would appear as if A and B are directly associated. If A is positively associated with C and B negatively associated with C, the association between A and B would appear to be negative. Thus misleading conclusions may be arrived at if the partial associations are not studied. The following illustration would further clarify this point:
Suppose out of 100 non-vegetarian patients, a new dietetic treatment is tried on 80. Further, if out of 30 patients who died only 10 were from those who were under the new treatment, the coefficient of association between the two attributes, i.e., New Treatment (A) and Deaths (B), would be -1. Suppose further that the same treatment was tried upon vegetarian patients also. If out of 100 vegetarian patients the treatment was tried upon 40 and if out of 60 deaths in this group 40 were those on whom the new treatment was tried, there would be perfect positive association between the new treatment and deaths, or the coefficient of association between new treatment (A) and deaths (B) would be +1. Thus there is perfect negative association in one case and perfect positive association in the other. If, further, the results were published without distinguishing between non-vegetarian and vegetarian patients, the conclusions would be highly misleading. In that case out of 200 patients the new treatment is tried on 120 and out of 90 deaths 50 are from the group on which the treatment was tried. The coefficient of association in this case between new treatment (A) and deaths (B) would be -.17, indicating that there is a slight degree of negative association between them.
It is thus evident from the above illustration that if the association in sub-populations is not studied separately, misleading conclusions are liable to be drawn. There may be cases where the apparent association or disassociation between the two attributes in the universe at large may be the result of the association of the two attributes with a third attribute. It may also be (as in the above illustration) that the association between attributes in the universe at large may not appear significant but there may be a high degree of negative or positive association between the two attributes in the various sub-populations. In the above illustration, if the combined results are published they would show that out of 120 patients on whom the new treatment was tried only 50/120 × 100 or 41.7% died, while in 80 patients on whom it was not tried the percentage of deaths was 40/80 × 100 or 50. It would indicate that the new treatment has some value. But we have seen that in the non-vegetarian group out of 80 patients on whom the treatment was tried only 10, or 12.5%, died and in the vegetarian group the percentage of deaths was 100. It means that the new dietetic treatment is excellent for non-vegetarian patients but suicidal for vegetarian patients. In this case it is necessary that the results are published separately for the two types of patients.
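The three coefficients quoted in the illustration can be reproduced with a short Python sketch. It is an illustration only; the function name `q_from_counts` is hypothetical. Treatment is attribute A and death is attribute B.

```python
def q_from_counts(total, treated, deaths, treated_deaths):
    """Yule's Q between treatment (A) and death (B)."""
    ab = treated_deaths
    a_beta = treated - treated_deaths
    alpha_b = deaths - treated_deaths
    alpha_beta = total - treated - deaths + treated_deaths
    cross_1 = ab * alpha_beta
    cross_2 = a_beta * alpha_b
    return (cross_1 - cross_2) / (cross_1 + cross_2)


non_vegetarian = q_from_counts(100, 80, 30, 10)
vegetarian = q_from_counts(100, 40, 60, 40)
pooled = q_from_counts(200, 120, 90, 50)
print(non_vegetarian, vegetarian, round(pooled, 2))  # -1.0 1.0 -0.17
```

Pooling the two groups hides two perfect but opposite associations behind a coefficient close to zero.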
Manifold classification
Number of classes
                            Attribute A
Attribute B   A1       A2       A3       ...   As       Total
B1            (A1B1)   (A2B1)   (A3B1)   ...   (AsB1)   (B1)
B2            (A1B2)   (A2B2)   (A3B2)   ...   (AsB2)   (B2)
B3            (A1B3)   (A2B3)   (A3B3)   ...   (AsB3)   (B3)
...           ...      ...      ...      ...   ...      ...
Bt            (A1Bt)   (A2Bt)   (A3Bt)   ...   (AsBt)   (Bt)
Total         (A1)     (A2)     (A3)     ...   (As)     N
In the above table the totals of the various columns A1, A2, etc., and the totals of the various rows B1, B2, etc., would give the first order frequencies and the frequencies in the various cells would be second order frequencies. The total of either A1, A2, etc., or B1, B2, etc., would give the grand total N. Such a table is called a Contingency Table. The following contingency table is 4 × 4 fold. It gives the details about the stature of the parents and the stature of the sons.
                        Parents
Offspring   V. Tall   Tall   Medium   Short   Total
V. Tall     20        30     20       2       72
Tall        14        125    85       12      236
Medium      3         140    165      125     433
Short       3         37     68       151     259
Total       40        332    338      290     1000
From the above table we can easily know that there are 40 very tall parents and 20 of their children are very tall, 14 tall, 3 medium and 3 short. Similarly there are 290 short parents and 151 of their children are short, 125 medium, 12 tall and 2 very tall.
Association in contingency tables
For studying association in such tables the easiest way is to convert them into 2 × 2 fold tables by merging the various groups. For example, in the above case, the tall and very tall groups can be combined in one and named "tall". Similarly the medium and short groups can be combined in one and named "not tall." If it is done, the above table would become 2 × 2 fold and then association can be easily studied. The contingency table given above can be reduced to a 2 × 2 fold table as follows:
                  Parents
Offspring   Tall   Not tall   Total
Tall        189    119        308
Not tall    183    509        692
Total       372    628        1000
We can now trace the association between the stature of the parents and the stature of the offspring from the above table by the method discussed earlier. Thus the percentage of tall children in the universe is 308/1000 × 100 or 30.8. The percentage of tall children amongst tall parents is 189/372 × 100 or 50.8. This indicates that there is a positive association between the stature of the parents and the stature of the offspring. Similarly, the percentage of not-tall children in the universe is 692/1000 × 100 or 69.2. The percentage of not-tall children amongst not-tall parents is 509/628 × 100 or 81.1. Here again there is indication of positive association between the stature of the parents and offspring.
The above procedure of studying association in contingency tables is not very accurate or convenient. In actual practice we are concerned with finding out whether A's on the whole depend on B's. We are not concerned with the association of individual A's and B's. There is need of a coefficient which would summarize the extent of dependence of one attribute on the other. Moreover, the technique of pooling sub-groups together is laborious and inconvenient, particularly if s and t are large.
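The merging of groups described above can be sketched in Python. This is an illustration only; the function name `collapse` is hypothetical, and the cell values are those implied by the row and column figures quoted in the text for the 4 × 4 stature table (rows are offspring, columns are parents, both in the order very tall, tall, medium, short).

```python
table = [
    [20, 30, 20, 2],
    [14, 125, 85, 12],
    [3, 140, 165, 125],
    [3, 37, 68, 151],
]


def collapse(cells, row_groups, column_groups):
    """Merge groups of rows and columns by adding their cell frequencies."""
    return [
        [sum(cells[r][c] for r in rows for c in cols) for cols in column_groups]
        for rows in row_groups
    ]


# "Tall" = very tall + tall, "not tall" = medium + short
tall_vs_not = collapse(table, [(0, 1), (2, 3)], [(0, 1), (2, 3)])
print(tall_vs_not)  # [[189, 119], [183, 509]]
```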
Coefficient of contingency
If A and B are completely independent of each other in the universe at large, then the actual values of (A1B1), (A2B2), etc., must be equal to their expected values, which are in turn equal to (A1)(B1)/N, (A2)(B2)/N, etc. If δ denotes the difference between the actual and the expected frequency of a cell,
χ² = Σ {δ² / expected frequency}
This value is called "Square Contingency" and if the mean of the square contingency is calculated, it is called "Mean Square Contingency." Thus
Square contingency = χ² and
Mean square contingency or
φ² = χ² / N
(φ² is pronounced as Phi-Square)
χ² can also be calculated by the following formula:
χ² = Σ {(actual frequency)² / expected frequency} - N
If the actual and the expected frequencies are equal in all cases, the value of χ² and φ² would be 0. The limits of χ² and φ² vary in different cases and as such they are not suitable for studying the association in contingency tables. Karl Pearson has given the following formula for the calculation of the "Coefficient of Mean Square Contingency." According to it the coefficient of mean square contingency or
C = √{χ² / (N + χ²)} = √{φ² / (1 + φ²)}
The maximum value of C depends on the number of classes; in an s × s fold table it is √{(s - 1)/s}. Thus in a 2 × 2 fold table the maximum value of C
= √{(2 - 1)/2} = .707
Solution
Let us take as our hypothesis the supposition that the two attributes, viz., stature of parent and stature of offspring, are independent. If this be true, the theoretical cell frequencies would be:
                        Parents
Offspring   V. Tall   Tall    Medium   Short
V. Tall     2.9       23.9    24.3     20.9
Tall        9.4       78.4    79.8     68.4
Medium      17.3      143.8   146.4    125.5
Short       10.4      85.9    87.5     75.2
We get,
χ² = (20 - 2.9)²/2.9 + (30 - 23.9)²/23.9 + (20 - 24.3)²/24.3 + (2 - 20.9)²/20.9
+ (14 - 9.4)²/9.4 + (125 - 78.4)²/78.4 + (85 - 79.8)²/79.8 + (12 - 68.4)²/68.4
+ (3 - 17.3)²/17.3 + (140 - 143.8)²/143.8 + (165 - 146.4)²/146.4 + (125 - 125.5)²/125.5
+ (3 - 10.4)²/10.4 + (37 - 85.9)²/85.9 + (68 - 87.5)²/87.5 + (151 - 75.2)²/75.2
= 325.1
C = √{χ² / (N + χ²)}
where C represents the Coefficient of Mean Square Contingency.
We get,
C = √{325.1 / (1000 + 325.1)}
= .495
√{(.495)² / [(1 - (.495)²) × √((4 - 1)(4 - 1))]}
= √{.245 / (.755 × 3)}
= √(.245 / 2.265)
= √.108
= .329
11. Do you find any association between the tempers of brothers and sisters from the following data:
Good natured brothers and good natured sisters   1230
Good natured brothers and sullen sisters   850
Sullen brothers and good natured sisters   30
Sullen brothers and sullen sisters   580
12. Explain the method of finding association between two attributes. Out of 70 thousand literates in a particular district of India, the number of criminals was 100. Out of 930 thousand illiterates in the same district, the number of criminals was 15 thousand. On the basis of these figures, do you find any association between illiteracy and criminality? (M. A., Agra, 1941).
13. (a) Write a short note on the use of Coefficient of Association in analysing economic statistics.
(b) From the figures given in the following table, compare the association between literacy and unemployment in rural and urban areas, and give reasons for the difference, if any:
                                Urban      Rural
Total Adult Males               25 lakhs   200 lakhs
Literate Males                  10 lakhs   40 lakhs
Unemployed Males                5 lakhs    12 lakhs
Literate and unemployed Males   3 lakhs    4 lakhs
(M. A., Agra, 1937).
(M. A., Patna, 1943).
14. The following table gives the number of literates and criminals in three cities of U. P.
                                    Kanpur   Allahabad   Agra
Total number (thousands)            244      184         230
Literates (in thousands)            40       47          34
Literate criminals (in hundreds)    3        2           2
Illiterate criminals (in hundreds)  40       20          24
Compare the degree of association between criminality and illiteracy in each of the three towns. (M. A., Allahabad, 1944).
15. A census revealed the following figures of the blind and the insane in two age-groups in a certain population.
                                    Age-group     Age-group
                                    15-25 years   Over 75 years
Total population                    2,70,000      1,60,200
Number of blind                     1,000         2,000
Number of insane                    6,000         1,000
Number of insane among the blind    19            9
(a) Obtain a measure of the association between blindness and insanity in each of the two age-groups.
(b) Do you consider that blindness and insanity are associated or disassociated with each other in the two age-groups, or more in one age-group than in the other?
(U. P. C. S., 1948).
(M. Com., Allahabad, 1950).
16. Calculate the coefficient of association between extravagance in fathers and sons from the following data:
Extravagant fathers with extravagant sons   327
Extravagant fathers with miserly sons   545
Miserly fathers with extravagant sons   741
Miserly fathers with miserly sons   255
(M. A., Lucknow, 1947).
17. In December 1897, there was an outbreak of plague in a jail in Bombay. Of 127 persons who were uninoculated, 10 contracted plague, 6 of them dying. Of 147 persons who were inoculated, 3 contracted plague and there were no deaths. Trace the association between (a) inoculation and contracting the disease, (b) inoculation and mortality among persons who have contracted the disease.
18. The following table shows the distribution of the temper in pairs of sisters in an exhaustive school enquiry:
                      Second Sister
First Sister    Good natured   Sullen   Total
Good natured    1040           180      1220
Sullen          160            120      280
Total           1200           300      1500
Trace the association, if any, in the distribution of tempers in first sisters and second sisters. (M. Com., Raj., 1952).
19. Find out the coefficient of association between the type of college training and success in teaching from the following table:
Institution         Successful   Unsuccessful   Total
Teachers' College   58           42             100
University          49           51             100
Total               107          93             200
..<i
Fair 2.6~
1
2.57
I 17 8
to
::8
Poor 41
I 91
21. Given the following contingency table for hair colour and eye colour, find the value of C. Is there good association between the two?
                   Hair colour
Eye colour   Fair   Brown   Black   Total
Blue         15     5       20      40
Grey         20     10      20      50
22. The following table shows the association among 1000 criminals between their weight and mentality. Calculate the coefficient of contingency between the two:
                            Weights in lbs.
Mentality   90-120   120-130   130-140   140-150   150 upward   Total
Normal      50       102       198       210       240          800
Weak        30       38        72        30        30           200
Total       80       140       270       240       270          1000
23. The data in the following table were obtained in a cross between a rust resistant and a susceptible variety of oats. The F3 families were compared for reaction to rust in the seedling stage, and in the field under ordinary epidemic conditions.
Classification of Seedling and Field Reactions of 900 F3 Families of Oats
Seedling. Reactlon
Field Reaction
Resistant Segregating Susceptible
Resistant 112 51 37
Segregating 47 404 49
Susceptible 13 In
•
IZ
Test the significance of the association in the table and calculate the coefficient of contingency.
24. 1000 subjects of English, French, German, Italian and Spanish nationality were asked to name their preference among the music of those five nationalities. The results were as follows:
                               Nationality of music preferred
Nationality of subject   English   French   German   Italian   Spanish   Total
English                  32        16       75       47        30        200
French                   10        67       42       41        40        200
German                   12        23       107      36        22        200
Italian                  16        20       44       76        44        200
Spanish                  8         53       30       43        66        200
Total                    78        179      298      243       202       1000
Discuss the association between the nationality of the subject and the nationality of the music preferred.
25. Examine critically the following statements and the inferences drawn. State whether the inference is or is not valid in each case, giving reasons. It may be that to test the validity of the inference in any case, some additional information is needed; in that case state what this additional information is and how it should be analysed to examine the truth of the inference drawn.
(a) Statement: Eighty-five per cent of the girls reading in University B are short-sighted and wear glasses whereas only 25 per cent of the boys have this eye defect and wear glasses.
Inference: There is a strong association between eye condition (defective or not) and sex, with short-sightedness being almost a characteristic of the female sex.
(b) Statement: The acreage under food and cash crops in a tahsil for the years 1953-1955 was as shown below:
Year   Acreage (in units of a thousand acres)
       Food crops   Cash crops
1953   100          20
1954   120          38
1955   150          62
Inference: During recent years there has been a tendency to convert the land under food crops to cash crops.
(c) Statement: Out of all the 600 children of a school, who were vaccinated, only 20 were attacked with tuberculosis. Out of all the 300 children of another school, none of whom were vaccinated, 30 were attacked with tuberculosis.
What would have been the frequency of 'fathers with dark eyes and sons with
dark eyes', for the same total number, had there been complete independence?
(1. A. S., I9H).
29. In a town of about 1,00,000 population, 52,000 males and 48,000
females were distributed as follows:
                                 Males             Females
                             (in thousands)    (in thousands)
Educated and employed              38                 6
Educated and unemployed             2                14
Uneducated and employed             8                18
Uneducated and unemployed           4                10
Is there any connection between education and employment in the two groups
as well as in the total population?
Interpolation 20
Meaning. Ordinarily statistical data relating to the values of two
interrelated variables are not available in the shape of continuous
series. Such data are found in the form of discrete series so that for
some given values of the x-variable the corresponding values of the y-variable are
available. Sometimes the necessity is felt for obtaining the corresponding value
of the y-variable for a particular value of the x-variable which is not available
in the given series. The process of estimating such unknown values
of the y-variable corresponding to particular values of the x-variable is
called Interpolation.
Suppose the interrelated values of two variables x and y are
as follows:
x y
3·2.
2. 4. 1
3 5. 6
4 6.8
5 7·'
6 7·9
When statistical data are available in the shape of class intervals and
class frequencies it becomes inevitable to use the technique of
interpolation for the calculation of the values of median and mode. It
would be remembered that interpolation is always done under certain
assumptions and in case of the interpolation of median we had assumed
that the magnitude of the median class is equally distributed over the
frequencies of that class. Similarly, in case of interpolation of mode
our assumption was that the modal value is affected by the values and
frequencies of the adjoining classes. But it is not only for the purpose
of locating the median or mode that methods of interpolation are used.
The use of such methods is necessary in a large variety of studies. As
a matter of fact whenever we have to fill up the gaps in statistical data
the technique of interpolation has to be used. Gaps in statistical data
may arise due to various reasons. In many cases it is not possible to
collect the whole data about the problem under study. Even if it were
possible to collect the whole data it may not be worth while to do so
due to the large amount of expenditure involved or due to organisational
difficulties of a complex nature. Population census, for example,
is not conducted every year because it involves a huge expenditure of
money and there are considerable difficulties in organising it. As such
population censuses are conducted only once in ten years in almost
all countries of the world. Now if we wish to know the population
of a country in between these census years we shall have to use the
technique of interpolation. In India the last census of population
was held in 1951 and prior to it in 1941. If we have to estimate the
population of India in 1940 or 1947 we shall have to use the technique of
interpolation. If, on the other hand, we wish to know the population
of India in 1954 or 1955, we shall have to extrapolate it because the last
available figure relates to 1951 only. Besides this, gaps in statistical
data may also be on account of the fact that for some special reasons
no statistics were collected either in a particular week or month or year
as the case may be. In some cases the collected data may have been
destroyed or lost and in such cases also the technique of interpolation
has to be used to fill up the gaps in statistical information. In all such
cases where there are gaps in data we have only two alternatives before
us: (a) either to fill up the gaps by imaginary figures or most likely
figures according to our intuition or judgment or (b) to fill up the gaps
by the most likely figures as estimated on the basis of the available data.
Obviously the latter alternative is better and is likely to give us a
dependable estimate.
Assumptions
As has been said above, interpolation of figures is done on certain
assumptions. They are as follows:
(i) There are no sudden jumps in figures from one period to
another. In other words it means that the data are in the shape of a
continuous or smoothed curve. If, for example, we are interpolating the
figure of population of India in the year 1933 and we are given the
figures of Indian population in the years 1911, 1921, 1931, 1941 and 1951,
our presumption would be that the population in this country has
grown up smoothly and there are no violent ups and downs in these
figures.
(ii) The second assumption is that the rate of change of the figures
is uniform.
It means that in our example of interpolation of Indian
population our second assumption would be that the rate of the growth
of Indian population has been uniform throughout the period 1911 to
1951.
Methods of interpolation
GRAPHIC METHODS
Graphic Methods in continuous time series
If statistical data are available in sufficient quantities they can be
plotted on a graph paper. After this a continuous smoothed curve can
be drawn passing through the plotted points. This curve would
disclose the interrelation of the two variables and if we know the value
of one variable we can estimate the corresponding value of the other
variable. This method would become clear by the following example:
Example 1. The following table gives the population of England
and Wales. The figures are for every twenty years beginning from
1811:
Year      Population
          (in crores)
1811        1.02
1831        1.39
1851        1.79
1871        2.27
1891        2.90
1911        3.61
1931        4.00
The above figures are plotted below in figure 1:
[Fig. 1: population of England and Wales (in crores) plotted against the census years, with a smoothed curve drawn through the plotted points]
In the above figure the plotted points are very clear. The curve
is not obtained merely by joining the plotted points by straight lines.
In that case the curve would not have been smooth. Now suppose
we have to interpolate the population figures for the years 1861 and
1881. For this we shall first locate these values on the x-axis on which
the years are shown. From these points two ordinates shall be drawn
up to the curve, and from the points where they meet the curve
perpendiculars shall be drawn to the y-axis. We can now read the values
at the points where these perpendiculars touch the y-axis. They would be
the interpolated figures of the population for the years 1861 and 1881.
From the above graph these figures are 2.0 crores and 2.6 crores for the
years 1861 and 1881 respectively. For these years the actual figures as
obtained from the censuses were respectively 2.007 and 2.597 crores. It is
clear that the difference between the interpolated figures and the actual
figures is not much.
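A rough numerical counterpart of the graphic reading can be sketched in Python. This is only an illustration under stated assumptions: it joins neighbouring census points by straight lines, which is cruder than the smoothed curve the graphic method calls for, and the function name `linear_interpolate` is hypothetical, not something taken from the text.

```python
def linear_interpolate(years, values, target_year):
    """Estimate the value at target_year by joining the two neighbouring
    tabulated points with a straight line (an assumption of this sketch;
    the graphic method itself uses a smoothed curve)."""
    points = list(zip(years, values))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= target_year <= x1:
            return y0 + (y1 - y0) * (target_year - x0) / (x1 - x0)
    raise ValueError("target_year lies outside the tabulated range")


# Census figures of England and Wales (in crores) for 1851, 1871 and 1891,
# as given in Example 1.
years = [1851, 1871, 1891]
population = [1.79, 2.27, 2.90]

estimate_1861 = linear_interpolate(years, population, 1861)
estimate_1881 = linear_interpolate(years, population, 1881)
print(round(estimate_1861, 3), round(estimate_1881, 3))
```

The straight-line estimates come to about 2.03 and 2.585 crores, which may be compared with the census figures of 2.007 and 2.597 crores quoted above.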
The actual figure of population for 1921 as disclosed was 3.789 crores.
Thus here again the difference between interpolated and the actual figure
is not much. The error is only.. 4~%'
ALGEBRAIC METHODS
Suppose there are two series x and y. We presume that the values
of y depend on the values of x in such a manner that when x is given y
can be estimated. In a parabolic curve, as we have seen in the chapter
on Analysis of Time Series, the relationship is of the type
$y = a + bx + cx^2 + dx^3 + \dots$
where a, b, c and d are constants to be determined. The equation
$y = a + bx + cx^2 + dx^3$ is a parabola of the third order. If we are given four
values of y, we can fit a parabola of the third order to the series.
Similarly if 8 values of y are available, we can fit a parabola of the 7th order
to such a series.
1931 33.8
1941 38.9
In the above table we are given four values of the y-variable and
as such we can fit a parabola of 3rd order. It would be of the following
type
$y = a + bx + cx^2 + dx^3$
Now if we can know the values of a, b, c and d in the above
equation, we can complete it and then it would be possible to interpolate
the population for the year 1926.
In this data we shall take the years as x-series and the population
figures as y-series. We have seen in the chapter on Analysis of Time
Series that in place of given values of x we can write down their
deviations from any point of origin. If we take 1926 as the year of origin
the deviations of the years 1911, 1921, 1931 and 1941 would be
respectively -15, -5, +5 and +15. We can further reduce their size by
dividing them by a common factor and writing them as -3, -1, +1
and +3 respectively. The deviation at the point of origin or 1926
would be 0. Thus the data given in the above example can be written as
follows:
 x       y
-3     30.3
-1     30.5
 0     $y_0$
 1     33.8
 3     38.9
or
$30.5 = a - b + c - d$ (ii)
$y_0 = a + b(0) + c(0)^2 + d(0)^3$
or
$y_0 = a$ (iii)
$33.8 = a + b(1) + c(1)^2 + d(1)^3$
or
$33.8 = a + b + c + d$ (iv)
$38.9 = a + b(3) + c(3)^2 + d(3)^3$
or
$38.9 = a + 3b + 9c + 27d$ (v)
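The four conditions on a, b, c and d can be solved mechanically. The sketch below is an illustration only. It assumes that the condition for x = -3 reads 30.3 = a - 3b + 9c - 27d, as implied by the tabulated pair x = -3, y = 30.3, and the helper name `solve_linear_system` is hypothetical.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    # Augmented matrix: coefficients followed by the right-hand side.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [entry / pivot_value for entry in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [e - factor * p for e, p in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


# Reduced deviations of 1911, 1921, 1931 and 1941 from the origin 1926,
# with the corresponding population figures from the example.
xs = [-3, -1, 1, 3]
ys = [Fraction("30.3"), Fraction("30.5"), Fraction("33.8"), Fraction("38.9")]

# Each row holds 1, x, x^2, x^3, the coefficients of a, b, c and d.
matrix = [[Fraction(x) ** power for power in range(4)] for x in xs]
a, b, c, d = solve_linear_system(matrix, ys)

# Since y_0 = a, the constant term is the interpolated figure for 1926.
print(float(a))
```

Under these assumptions a works out to 31.84375, so the interpolated figure for 1926 would be about 31.84 in the units of the table.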
100     2.0000
101     2.0043     +.0043
102     2.0086     +.0043
103     2.0129     +.0043
2..006""
In the above table the figures were tabulated very closely and
the difference was equal. In many cases the difference may not be equal
and consequently it may not be possible to interpolate figures by this
simple method. In such cases we have to proceed to higher differences
in the hope that they would ultimately be equal and may vanish at some
stage. In other words, we presume that the differences are finite and are
capable of being eliminated.
Now take the following table relating to the squares of certain
numbers:
Number    Square    First         Second        Third
  x         y       difference    difference    difference
  1         1
                       +3
  2         4                       +2
                       +5                          0
  3         9                       +2
                       +7
  4        16
In this case the first differences are not equal but the second
differences are constant and consequently the third differences vanish.
In practice the differences are indicated by the sign $\Delta$. Thus first
differences would be indicated by $\Delta^1$, second differences by $\Delta^2$,
third differences by $\Delta^3$ and so on. The first difference in each column
is called the Leading Difference. Thus the leading differences in the above
table are +3, +2 and 0. If we are given the leading term (the first figure in the
column of squares) and the leading differences we can build up the whole
table. Thus 2+0 (second and third leading differences) is equal to 2
which is the value of the second entry in the second difference column;
3+2 is equal to 5 which is the value of the second entry in the first
difference column. In this way we can find out all the differences and
when the differences of the first column are obtained we can easily find
out the values of the various terms.
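The building-up of a table from its leading term and leading differences can be sketched as follows. This is a minimal illustration assuming the leading term 1 and the leading differences 3, 2 and 0 of the table of squares. The function name `rebuild_from_leading_differences` is hypothetical.

```python
def rebuild_from_leading_differences(leading_term, leading_differences, count):
    """Rebuild `count` terms of a series from its leading term and its
    leading differences, assuming all higher differences vanish."""
    # state[0] is the current term; state[k] is the current k-th difference.
    state = [leading_term] + list(leading_differences)
    terms = []
    for _ in range(count):
        terms.append(state[0])
        # Each entry moves one row down by adding the difference of the
        # next higher order standing beside it.
        for order in range(len(state) - 1):
            state[order] += state[order + 1]
    return terms


squares = rebuild_from_leading_differences(1, [3, 2, 0], 6)
print(squares)
```

With these inputs the rebuilt series is 1, 4, 9, 16, 25, 36, that is, the squares continue beyond the tabulated terms because the third difference stays at zero.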
FUNDAMENTALS OF STATISTICS
Newton's formula
We have already shown above how Newton's formula is derived
and how it is based on binomial expansion. We give below examples
to illustrate the use of this formula.
Expecta- I
tion of Diflcrences
life in
z6.o )'3
Z3. 1 /.Y4
z.o·4 Y:;
$y_x = y_0 + x\Delta^1_0 + \frac{x(x-1)}{1 \times 2}\Delta^2_0 + \frac{x(x-1)(x-2)}{1 \times 2 \times 3}\Delta^3_0$
" ~e XI 80 4
.11
.1.
+z.oS
+II4 All
All
-9 11 6·.f
~J All
-t-7 A.I,
+Z~ All
+1' ' A
-
.•
a. " 3S X. 918 .YI -66 A'.
+ 4 8 All
•• .40 fx• Q66 '.
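$x = \frac{24 - 20}{25 - 20} = \frac{4}{5} = .8$
$y_x = y_0 + x\Delta^1_0 + \frac{x(x-1)}{1 \times 2}\Delta^2_0 + \frac{x(x-1)(x-2)}{1 \times 2 \times 3}\Delta^3_0 + \frac{x(x-1)(x-2)(x-3)}{1 \times 2 \times 3 \times 4}\Delta^4_0$
We get,
$y_x = 296 + (.8 \times 303) + \frac{.8(.8-1)}{1 \times 2} \times (-98) + \frac{.8(.8-1)(.8-2)}{1 \times 2 \times 3} \times 7 + \frac{.8(.8-1)(.8-2)(.8-3)}{1 \times 2 \times 3 \times 4} \times 18$
$= 296 + 242.4 + 7.84 + .224 - .3168$
$= 546$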
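The substitution above can be checked with a short Python sketch of Newton's formula. It is an illustration only: it reads the worked figures as a leading term of 296 with leading differences 303, -98, 7 and 18 at x = .8, and the function name `newton_forward` is hypothetical.

```python
def newton_forward(leading_term, leading_differences, x):
    """Evaluate Newton's formula
    y_x = y_0 + x*D1 + x(x-1)/(1*2)*D2 + x(x-1)(x-2)/(1*2*3)*D3 + ...
    from the leading term and the leading differences D1, D2, D3, ..."""
    total = leading_term
    coefficient = 1.0
    for order, difference in enumerate(leading_differences, start=1):
        # Extend the binomial coefficient x(x-1)...(x-order+1) / order!
        coefficient *= (x - (order - 1)) / order
        total += coefficient * difference
    return total


estimate = newton_forward(296, [303, -98, 7, 18], 0.8)
print(round(estimate, 4))
```

On this reading the formula gives 546.1472, which agrees with the rounded answer of 546.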
Differences
Newton-Gauss formula
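$y_x = y_0 + x\Delta y_0 + \frac{x(x-1)}{1 \times 2}\Delta^2 y_{-1} + \frac{(x+1)x(x-1)}{1 \times 2 \times 3}\Delta^3 y_{-1} + \frac{(x+1)x(x-1)(x-2)}{1 \times 2 \times 3 \times 4}\Delta^4 y_{-2}$
$x = \frac{3.75 - 3.5}{4.0 - 3.5} = \frac{.25}{.5} = .5$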
Substituting the values we get,
Y,=z.o'US + (oj X 1'581) + {'S l,S-'I)z X 'zn>}
= { (0,-1) 'S ('S~ I) X - '03 8 }
9-11
Value of x = - - = .5
11-6
Stirling's formula is
$y_x = y_0 + x\,\frac{\Delta y_0 + \Delta y_{-1}}{2} + \frac{x^2}{2}\Delta^2 y_{-1} + \frac{x(x^2-1)}{6} \times \frac{\Delta^3 y_{-2} + \Delta^3 y_{-1}}{2} + \frac{x^2(x^2-1^2)}{24}\Delta^4 y_{-2} + \dots$
Substituting the values, we get,
$y_x = 48 + .5 \times \frac{5+4}{2} + \frac{.25}{2} \times (-1) + \frac{.5(.25-1)}{6} \times \frac{-3+2}{2} + \frac{.25(.25-1)}{24} \times 5$
$= 48 + 2.25 - .125 + .03125 - .039$
$= 50.117$.
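The arithmetic of this substitution can be re-traced with a small Python sketch. It is a hedged illustration: the inputs are read from the worked line as a central term of 48, first differences 5 and 4, second difference -1, third differences -3 and 2, and fourth difference 5, with x = .5. The function name `stirling_interpolate` is hypothetical.

```python
def stirling_interpolate(y0, first_pair, second, third_pair, fourth, x):
    """Evaluate Stirling's formula up to the fourth difference.

    first_pair and third_pair hold the two central odd-order differences,
    whose means enter the formula; second and fourth are the central
    even-order differences."""
    mean_first = sum(first_pair) / 2
    mean_third = sum(third_pair) / 2
    return (
        y0
        + x * mean_first
        + (x ** 2) / 2 * second
        + x * (x ** 2 - 1) / 6 * mean_third
        + (x ** 2) * (x ** 2 - 1) / 24 * fourth
    )


estimate = stirling_interpolate(48, (5, 4), -1, (-3, 2), 5, 0.5)
print(round(estimate, 3))
```

With these readings the terms are 48, 2.25, -.125, .03125 and about -.039, giving 50.117 as in the worked result.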
1_--;=,,----~--r___,.;---...,D-iHi-e-lI=en
.........ce-s.._____r-~_-~-
Year
1 901
19 11
! poPUlatiOn
(000)
y
First I Second I Third}
~l ~I ~.
Fourth
~4
1 92.1
1 931
1 941
1 951
$x = \frac{1941 - 1936}{1951 - 1941} = \frac{5}{10} = .5$
Y"
=Y _XA 1 _
- 0 ~ ,. 1
+ (X+I)-,,"
%
AI _ _
~ 'I 1
(.%"+1)-,,"(-,,"-1)
6
AI
~ T_I
•
$= 39 - (.5 \times 12) + \left(\frac{1.5 \times .5}{2} \times 1\right) - \left(\frac{1.5 \times .5 \times -.5}{6} \times -4\right)$
$= 39 - 6 + .375 - .250$
$= 33.125$ thousands.
 x      y
10      4     $y_0$
20      5     $y_1$
30      ?     $y_2$
40      7     $y_3$
50      8     $y_4$
In the above table the x series advances by equal intervals of ten
units and the value of x for which the corresponding value of y has to
be interpolated is also one of the class limits of the x series. As such in
this case we can make use of a formula of direct binomial expansion.
Since 4 values of y are given we can presume that the fourth leading
difference would be 0. We have already illustrated that the leading
differences follow the law of Binomial Expansion and as such we shall
raise a binomial of the 4th order.
Thus
$\Delta^4_0 = 0$
or
$y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0$
Substituting the values of $y_0$, $y_1$ etc., we get
$8 - 28 + 6y_2 - 20 + 4 = 0$
or $6y_2 = 36$
or $y_2 = 6$
The following example would further clarify this formula :-
Example 8. Obtain the missing figure in the following table.
Value of chi-square at 1% level of significance.
Degrees of     1% chi-square
Freedom
   1              6.64      $y_0$
   2              9.21      $y_1$
   3             11.34      $y_2$
   4             13.28      $y_3$
   5               ?        $y_4$
   6             16.81      $y_5$
   7             18.48      $y_6$
   8             20.07      $y_7$
   9             21.67      $y_8$
Since the known quantities are eight the eighth leading difference
will be zero.
$\Delta^8_0 = y_8 - 8y_7 + 28y_6 - 56y_5 + 70y_4 - 56y_3 + 28y_2 - 8y_1 + y_0 = 0$
Substituting the given values, we get
$\Delta^8_0 = (21.67) - (8 \times 20.07) + (28 \times 18.48) - (56 \times 16.81) + 70y_4 - (56 \times 13.28) + (28 \times 11.34) - (8 \times 9.21) + 6.64 = 0$
$= 21.67 - 160.56 + 517.44 - 941.36 + 70y_4 - 743.68 + 317.52 - 73.68 + 6.64 = 0$
or $70y_4 = 160.56 - 21.67 - 517.44 + 941.36 + 743.68 - 317.52 + 73.68 - 6.64$
$70y_4 = 1056.01$
or $y_4 = 15.09$
Thus the interpolated value of chi-square at 1% level of significance
for five degrees of freedom is 15.09.
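The binomial expansion method lends itself to a short Python sketch. This is an illustration under the method's own assumption that, with n known values among n + 1 equally spaced terms, the n-th leading difference vanishes. The function name `fill_missing_by_binomial_expansion` is hypothetical.

```python
from math import comb


def fill_missing_by_binomial_expansion(values):
    """Estimate the single missing entry (given as None) in an equally
    spaced series by setting the highest-order leading difference to zero."""
    missing = [index for index, value in enumerate(values) if value is None]
    if len(missing) != 1:
        raise ValueError("exactly one value must be missing")
    m = missing[0]
    n = len(values) - 1
    # The n-th leading difference is the alternating binomial sum
    # sum over k of (-1)^(n-k) * C(n, k) * y_k, which is set equal to zero.
    known_sum = sum(
        (-1) ** (n - k) * comb(n, k) * value
        for k, value in enumerate(values)
        if value is not None
    )
    coefficient = (-1) ** (n - m) * comb(n, m)
    return -known_sum / coefficient


# Five-term illustration worked above: y_2 is missing.
y2 = fill_missing_by_binomial_expansion([4, 5, None, 7, 8])

# Example 8: chi-square at the 1% level, five degrees of freedom missing.
chi_square = [6.64, 9.21, 11.34, 13.28, None, 16.81, 18.48, 20.07, 21.67]
y4 = fill_missing_by_binomial_expansion(chi_square)
print(y2, round(y4, 2))
```

The first call reproduces the value 6 and the second gives about 15.09, matching the two worked results.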
METHOD USED IN UNEQUAL INTERVAL OF ARGUMENTS
11. The following table gives the census of population of a certain town in
1891, 1901, 1911, 1921 and 1931. Estimate the population in 1925, making your
method clear:
Years     Population
1891        98,754
1901      1,32,285
1911      1,68,076
1921      1,95,690
1931      2,46,050
(M. A., Ca"utta, 19 n),
12. The following are the annual premiums in a certain Life Insurance Company
for a policy of Rs. 500 payable at death with an agreed bonus:
Age next      Annual
birthday      Premium
              Rs. as.
   25          24-10
   30          27-11
   35          31-9
   40          36-6
   45          42-5
Calculate the premium at age 36.
(M. Com., Lucknow, 1942)
13. The following table gives the quantities of a certain brand of tea demanded
at prices noted against each. Estimate the probable demand when the price is
Rs. 1-14-0 a pound.
Price of tea      Quantity demanded
per lb.           in thousand
Rs. as.           lbs.
 1-4                82.5
 1-8                70.8
 1-12               63.1
 2-0                55.0
 2-4                48.9
(M. A., Allahabad, 1942).
14. The gross profits of the Buland Sugar Co., Ltd., are given below:
Years Gross profit
(in lakhs of Rupees)
19;5-;6 4. 86
1937-311 12.64
1939-40 1;.68
1941-41 16.6~
1945-44 13.1 9
Make an estimate for 1932-43 and 1944-45. (B. Co", •• RtU•• 1944).
1,. 'tf Ix repreaents the numbers living at age x in a life table, find as accurately
.. the data will permit. Ix for values of X= 55,42 and 47, given.
' ..- 51 2 • ',.-439, "0= H6; '50= 245· (1. A. S., 1948).
16. From the following data, estimate the number of persons earning wages
between 60 and 70 rupees.
Wages in rupees      No. of persons
                     in thousands
Below 40                  250
40-60                     120
60-80                     100
80-100                     70
100-120                    50
26. The age of mothers and the average number of children born per mother
are given in a table below.
Interpolate the average number of children born per
mother aged 30-34.
15-19
20-24
25-29
30-34
35-39
40-44 (4~. P. C. S.).
~9. It is required to find the missing value in the following table. Establish
!lny suitable formula for interpolation, and find the missing value.
19 1 5
1920
19 2 5
193 0
1935
Assuming the conditions of the market to be the same, estimate the sales for th e
year 1940. eM. A., Palna, 1941). \
Age in months z g 10 12
Weights in lbs. 7t 16 IS ZI
Estimate the weights of the baby at the age of 7 montha. (M. A., PaIno, 1940).
34. Determine by Lagrange's formula the percentage number of criminals
under 35 years.
Age                    % number of criminals
Under 25 years               52.0
  "   30   "                 67.3
  "   40   "                 84.1
  "   50   "                 94.4
(M. A., Agra, 1934).
35. The following table gives the number of income tax assessees in U. P.:
Incomes not exceeding      Number of
Rs.                        assessees
 2,500                       7,166
 3,000                      10,576
 5,000                      17,200
 7,500                      20,505
10,000                      21,975
Estimate the number of assessees with incomes not exceeding Rs. 4000.
Age-group Deaths
.1.1- 13,2.1.9
35- 18,1391
45- 24,.1.25
55- 31,49 6
(P. C. S., 1'5.1.).
58. Find by algebraic method of interpolation, using all the infurmation given
the likely number for 1950 from the following table of index numbers of production
of certain article in India
Find out the number of candidates (a) who secured more than 48 but not more
than So marks, (b) less than 48 but not less than 45 marks. (P. C. S., 1<)54)
40. The following figures show the valbe of a life annuity upon a single life
aged 20, at rates of interest varying from z.~ [,.} , per cent : -
Calculate the intermediate values at 2.75 and 3.75 per cent after developing an
appropriate interpolation formula. (P. C. S., 1956).
41. Develop a formula which will help interpolation when observations are
known to be at unequal intervals.
The observed values of a function are respectively 168, 120, 72 and 65 at the four
positions 3, 7, 9 and 10 of the independent variable. What is the best estimate you
can give for the value of the function at the position 6 of the independent variable?
(I. A. S., 1951).
42. The. following values are given in a table:
x .}
216.000
Z 22.6.9 81
3 ----
4 25 0 •047
26z,l44
Using any suitable 4lgcbrair. method, find the value of.} for X= ,.
Also draw a graph of the above points on a piece of squared paper and from tbl.
graph find the value ofy for X=4.4 cr.
A. S., 19H).
4~. (a) By coqstructing a difference table,lind the 7th term .. well a8 the
general term of the sequence :
(b) Given
Sin 4,°=0.7°7 1 ,
50°=0.7660,
55 0 =0. 81 9 Z ,
60°=0.8660.
Fi~d Sin p.o, by using any method of interpolation. (1. A. S., 195 ,).
45. The length of the day was 12 hours on March 19th; 14 hours on April 18th
and 15 hours 40 minutes on May 18th. Required an approximate value of (a) the
length of the day on May 3rd (b) the mean length of the day during the period, March
19th to May 18th. (I. A. S., 1947).
. 46. The following table gives the population of Indore city at the time of last
61X censuses : -
It has been said earlier that business forecasting has in recent years
been made a more scientific and accurate proposition than what
it was formerly when business forecasts were made only on the basis
of experience and intuition. In modern times business forecasting
has been put on a scientific footing so that the risks associated with it have
been considerably minimized and the chance of precision increased.
In fact in most of the economically advanced countries there are
organisations of experienced persons which undertake this highly specialized
work of drawing statistical inferences from the past data as modified in the
light of current conditions, with a view to study the future course of events. The
Harvard Economic Society, Brookmire Economic Service and Babson Statistical
Organisation of the United States of America, The London and Cambridge
Economic Service of the United Kingdom and The Swedish Board of Trade
are world famous institutions which undertake the work of business
the two series of cyclical percentages with different time-lags. The time-lag
which gives the highest value of the coefficient of correlation is considered
to be the best estimate of the lag between the two series.
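The search for the best time-lag can be sketched in Python. The two series below are made-up illustrative figures, not data from the text, and the function names `pearson` and `best_time_lag` are hypothetical. The sketch simply correlates the leading series with the lagging series shifted by 0, 1, 2, ... periods and keeps the shift with the highest coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Coefficient of correlation between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def best_time_lag(leading, lagging, max_lag):
    """Correlate leading[t] with lagging[t + lag] for each lag from 0 to
    max_lag and return the lag with the highest coefficient."""
    correlations = {}
    for lag in range(max_lag + 1):
        xs = leading[: len(leading) - lag]
        ys = lagging[lag:]
        correlations[lag] = pearson(xs, ys)
    best = max(correlations, key=correlations.get)
    return best, correlations


# Hypothetical cyclical percentages: the second series repeats the first
# two periods later (its first two entries are arbitrary fill-ins).
leading = [100, 104, 109, 103, 96, 92, 97, 105, 110, 106, 98, 94]
lagging = [101, 99] + leading[:-2]

best_lag, correlations = best_time_lag(leading, lagging, max_lag=4)
print(best_lag, round(correlations[best_lag], 3))
```

Here the highest coefficient occurs at a lag of two periods, which is how the second series was constructed. With real series the forecast would still need the adjustments for current conditions that the text goes on to describe.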
Once the time-lags between the movements of various series have
been estimated forecasting can be done easily. It should be remembered
that here also forecasting is not done mechanically and due adjustments
are made for the effects of the current economic conditions and other
special factors operating at that time. Thus, if there is currency
inflation in a country it can be forecast that wholesale prices would go up
or that retail prices would increase or that wages would record an
upward change, but if there is an effective government control over
prices and wages, then despite inflation they may not record any change
or in any case the expected quantum of change. We thus see that the
effects of special factors operating at the time of making a forecast are
very important and have to be taken into account. In the above case
unless the forecasts are modified in the light of special factors operating,
the inferences are bound to be misleading and inaccurate.
Action and reaction theory
This theory is based on the assumption that every action has a
reaction after some time. It also assumes that the magnitude of the reaction
is based on the magnitude of the original action. Thus if the price of
a commodity has gone up above the normal level there is every
likelihood that after some time it would go down below the normal level.
In making forecasts according to this theory special study has to be made
about the normal level of the phenomena in question. Normal level
is not fixed for all times and since it is a dynamic concept, the normal level
of phenomena has to be very carefully estimated at the time of making
forecasts. It is common knowledge that after a boom there is a
depression which is again followed by a boom and the cycle goes on in this
way. Thus for every action there is a reaction in the reverse direction.
This being so if it is felt that a particular phenomenon has gone above its
normal level it can be forecast that after some time there would be a
tendency for it to move in the reverse direction below the normal level.
The extent of the movement and the time after which the reaction would
set in would have to be decided on the basis of past happenings as
modified in the light of current facts. Thus the basic nature of business
forecasting remains the same in all theories and the analysis of past and present
conditions has to be done in all cases.
Babson Statistical Organisation of the U. S. A. makes its forecasts
on the basis of this theory. It should be remembered that in this theory
forecasting is done on the basis of the actual level of phenomena in relation
to its normal level.
Specific historical analogy theory
As the name of this theory suggests it is based on the study of
such past conditions which closely resemble those under which
forecasting is being made. What is done actually is that a time series relating
to the data in question is thoroughly scrutinized and from it such a period
is selected in which conditions were similar to those prevailing at the
time of making the forecasts. The course which events took in the
past under similar circumstances is then studied and it gives an idea
about the likely course which the phenomena in question would follow
in future. The theory is thus based on the assumption that history
repeats itself and that whatever happened in the past under a set of
circumstances is likely to happen in future also if conditions are the same.
This theory also makes due adjustments for the special circumstances
which prevail at the time of making the forecasts but it is largely
dependent on past conditions.
Cross-cut analysis theory
This theory is different from the last one. It denies that history
repeats itself in economic life and according to it each factor affecting
a phenomenon should be studied separately and independently. In the
last theory we have seen that business forecasting was done on the basis
of the analysis of past conditions as modified in the light of current
situations. In it, the different factors affecting the problem under study
were not studied separately and independently. In historical analogy
theory, as also in other theories, the conclusions were arrived at by the
study of the combined effects of the various factors affecting a
phenomenon as modified in the light of current conditions. In this theory
the combined effects of the various factors are not studied. The effect
of each factor is studied independently. The process is very difficult
and we have already discussed in the chapter on Analysis of Time Series
how difficulties arise in the separation of the effects of various types of
factors affecting a series. It is obvious that this theory makes little use
of historical data. It concentrates on the analysis of the present factors
affecting a phenomenon in question and as far as possible the effects of
each factor are studied separately.
Utility of business forecasting
For Controlling Business Cycles. From what has been said above it
is clear that the utility of business forecasting is very great not only to
businessmen and economists but to the society as a whole. It is common
knowledge that business cycles are always very harmful in their effects.
Abrupt rise and fall in price level is injurious not only to businessmen
but to all types of persons. Industry, trade, agriculture, etc., all suffer
from the painful effects of depression. Trade cycles increase the risk
of business, create unemployment, induce speculation and discourage
capital formation. They spread from country to country and in a short
time the entire economic body of the whole world is in the grip of trade
cycles. The Great Depression of the "thirties" is a very well-known
illustration on this point. Business forecasting reduces the risk
associated with business cycles. If businessmen, industrialists and economists
know in advance that a period of depression is expected in the
near future they can plan in such a manner that the intensity of
depression is reduced and its harmful effects are minimized. Similarly
if businessmen could know in advance that a boom is to set in they
can plan their policy in such a manner as to take the maximum
advantage of the situation. Business forecasting is thus very useful for the
purpose of controlling business cycles.
For Making Profits. Besides this, businessmen make forecasts for
the purpose of making profits. As has been said earlier when a person
enters business he enters the profession of forecasting. In business,
forecasting has to be done at every stage. A businessman may dislike
statistics or statistical theories of business forecasting but he cannot do
without making forecasts. A businessman has to forecast the future
level of prices and the extent of demand and his success or failure
depends on the accuracy of the forecasts that he makes. The amount
of stock to be kept by a businessman or the quantum of goods to be
produced by an industrialist entirely depends on what they feel about
the future course of events. It is thus obvious that in business and
commerce forecasting is indispensable and it plays a very important
part in the determination of various policies.
Questions
1. What is meant by business forecasting? What are the assumptions on which
business forecasts are made?
2. Discuss the important theories of business forecasting. How does analysis
of time series help in forecasting of economic events? (M. Com., All., 1952).
3. What is meant by business forecast? Explain the major classes of
methods used in forecasting. (M. Com., Lucknow, 1944).
4. What is meant by business activity index? How will you weigh the
various series of which such index will be composed? (M. Com., Lucknow, 1947).
5. What do you understand by time-lag theory of business forecasting? What
are the general assumptions in this theory?
6. Write a note on the usefulness of analysis of time series in business
forecasting.
7. What important theories of business forecasting are known to you? Give
a critical estimate of each of them.
8. What is meant by business barometers? Write a note on their limitations.
9. What is the utility of business forecasting in the modern world? What
are its limitations?
10. In what fields of business must facts be studied to judge the position of
a business in regard to business cycles? Why is a careful analysis of business cycles
important in the field of marketing? (M. Com., Lucknow, 1943)
Interpretation of Data
22
In previous chapters we have discussed the various methods of
collection and analysis of data. An attempt has been made in those
chapters to analyse the various methods which are used by statisticians
to collect statistical material and the technique used by them for their
detailed analysis with a view to draw inferences from them. All such
rules of collection and analysis of data are briefly termed as "statistical
methods". The task of the statistician does not end after the
collection and analysis of data have been done and he has to draw inferences
from the analysis that he has done. In drawing inferences it is
necessary to exercise extreme care otherwise misleading conclusions may be
drawn and the whole purpose of the collection and analysis of data
may be destroyed.
Meaning and Need. Interpretation of data refers to that part of
the science of statistics which is associated with the drawing of
inferences from the collected facts after an analytical study. Interpretation
is an extremely important and useful branch of the science of statistics
because it makes possible the use of collected data. Statistical facts have
by themselves no utility and interpretation makes it possible for us to
utilise collected data in various fields of activity. The usefulness and
utility of collected information lies in its proper interpretation. All
statistics are collected with a view to draw certain conclusions about the
problem which is being studied. In all such sciences where the method
of induction is used, statistical methods are important tools but as is
true of all other methods, statistical methods also depend to a considerable
extent on the nature and the use to which they are put. If statistical
methods are misused it is natural that the conclusions obtained would be
inaccurate and undependable and if, on the other hand, a proper use of
statistical methods has been made there is no reason why the inferences
drawn would not be fairly accurate and trustworthy. It is, therefore,
extremely essential that various statistical methods are very carefully
used in the analysis of data. Mistakes are committed in the analysis of
data either deliberately or unconsciously. As a scientist it is the duty
of the statistician to see that mistakes are as few as possible. Deliberate
mistakes are due to bias and prejudice and they can be eliminated
completely if the statistician is careful in the selection of his staff, so that only
such persons are given the task of collection and analysis of data who can
work impartially. In previous chapters, whenever we have discussed
the methods of collection of data or their analysis we have indicated the
various sources from which errors can arise. But it is not enough that
the statistician only knows the possible sources of errors or mistakes
as this knowledge would not by itself reduce the magnitude of the errors
FUNDAMENTALS OF STATISTICS
the goods. Even if the quantity of the re-exports has not increased
there may have been a decline in the consumption of home-made goods
and there may not have been any increase in the per capita consumption.
It is also likely that the population of the country has increased
and the additional imports are accounted for by the increase in population.
Even if none of these factors has changed and the per capita
consumption of goods has really gone up, it is no proof that the economic
condition of the people has improved. It may be that the additional
imports are mostly of luxury goods consumed by a handful of rich persons
who may have become richer, whereas the vast majority of the
people may be consuming the same quantity of goods as formerly or
even less. It is obvious that a conclusion which appears all
right at the beginning may not be accurate at all.
Wrong interpretation of statistical measures
Wrong interpretation of index numbers. Mistakes in interpretation
of data may also arise due to wrong interpretation of statistical measures
calculated from the data which have been collected and analysed. Thus
if index numbers have been calculated from the collected data and if they
are not properly interpreted, wrong conclusions are bound to be arrived
at. It has already been said that index numbers only reveal a general
tendency and further that index numbers constructed for one purpose
may not necessarily be suitable for another purpose. If some conclusions
are arrived at by interpreting index numbers without keeping in mind
their limitations it is quite likely that they may not be accurate. Similarly
if index numbers constructed for one purpose are used in problems
which are of a different nature wrong conclusions are likely to be drawn.
Thus it may be wrong to say that since the general price level has
increased, therefore, the cost of living must have gone up. It has already
been pointed out that general purpose wholesale price index numbers
and cost of living index numbers are constructed in two different ways
and serve different purposes. One cannot be used in place of the other.
Similarly it would be a wrong interpretation of index numbers to say that
since the general price level has gone up, therefore, the quantity of money
in the country has also increased. The general price level does not merely
depend on the quantity of money in circulation. It also depends on the
quantity of goods and the velocity of the circulation of money, etc.
Wrong interpretation of correlation. Just as wrong interpretation
of index numbers would give wrong conclusions, similarly if the coefficient
of correlation or the coefficient of association is not properly interpreted
wrong conclusions are likely to be arrived at. The coefficient of correlation
also indicates a general tendency and that is why we had mentioned
in an earlier chapter that a trend line can be obtained for studying
the correlation between two series. It was also mentioned
in the chapter on Correlation that the coefficient of correlation should be
interpreted very carefully because it does not fully disclose the mutual
dependence of two variables. Moreover correlation does not necessarily
mean a cause and effect relationship between two series. Suppose
in the state of U.P. there is negative correlation between the 'area under
sugarcane' and the 'area under foodgrains'. If from this we conclude
that the cultivation of sugarcane is increasing at the cost of cultivation
of foodgrains it would be a clear example of wrong interpretation of the
coefficient of correlation. Again if we conclude that people prefer
sugar to foodgrains it would also be a wrong interpretation of the
coefficient of correlation. It is likely that due to import of foodgrains
at low prices from foreign countries their cultivation inside the country
has become less profitable and so sugarcane is being cultivated in place
of foodgrains. It may also be that due to the increase in the number of sugar
mills the price of sugarcane has gone up and it may have become
more profitable to produce sugarcane than to produce foodgrains.
It is also likely that due to the construction of canals those people
who could not produce sugarcane for want of adequate water have
started producing it. The preference for sugarcane cultivation over
the cultivation of foodgrains may also be due to changes in the climatic
conditions. Thus before any conclusion is arrived at from the
coefficient of correlation in the present case it is absolutely essential to
take into account all these factors. If they are ignored it is obvious
that the conclusion would not be dependable.
Take another example. Suppose the proportion of child accidents
is less in those localities where there are parks and more in those where
there are no parks. It means there is negative correlation between the
number of parks and the number of child accidents. From this it
cannot be concluded that to reduce the number of accidents the number
of parks must be increased. It is possible that in those localities where
there are parks rich people live, the number of children is small
and the number of servants who look after them large. It is also likely
that in such localities there may be gardens attached to the houses and
the children may not be coming out on the roads frequently. It is
thus clear that unless the coefficient of correlation is interpreted very
carefully misleading conclusions are likely to be drawn.
Wrong interpretation of association. In the case of the coefficient of
association also, unless proper precaution is taken wrong conclusions are likely
to be drawn. In the chapter on Association of Attributes, while
discussing partial association, we had mentioned that the association between
two attributes may be the result of their common association with a
third attribute and there may not be any direct association between them.
Thus, if there is a positive association between inoculation and exemption
from attack of small-pox it should not be immediately concluded
that inoculation is useful in preventing the disease. It may be that most
of the inoculated people are rich and live in healthy surroundings, where
the chances of getting the disease are small. Thus the apparent association
between inoculation and exemption from small-pox may be due to the
common association of both these attributes with a third attribute,
namely, the economic status of the people.
Output of consumer's i I
_
goods 12.2. : 147 171. 190 1.00 1.30 274
4. Interpret the data given below and illustrate any two series by a
suitable diagram :-
                                            Percentage of
Country                      World land   World cultivated   World production      World
                             area         area               of cereals (quantity) population
Asia excluding U.S.S.R.        18.6          32.9               31.0                 53.1
North America                  17.3          21.2               21.5                  8.2
U.S.S.R.                       16.1          16.8               22.0                  7.6
Europe excluding U.S.S.R.       3.7          16.3               16.0                 17.9
Mid and South America          13.2           5.7                4.5                  5.0
Africa                     [illegible]        5.6                4.0                  7.7
Oceania                    [illegible]        1.5                1.0                  0.5
Total                         100.0         100.0              100.0                100.0
(M. A., Allahabad, 1952).
5. How far do you agree with the conclusions drawn in the following cases :-
(a) It is observed that intelligent fathers have intelligent sons, and intelligent
grand-fathers have intelligent grand-sons; therefore, intelligence is hereditary.
(b) Two series, quantity of money in circulation and general price index,
are found to possess positive correlation of a fairly high order. It is concluded that
one is the cause and the other the effect in a direct causal relationship.
(c) It is observed that generally death rates in two towns are identical. It is
inferred from this that the populations of both the towns are equally healthy.
(M. A., Rajputana, 1959).
(b) The per capita income for India in 1931-32 according to the estimates framed
by Dr. V. K. R. V. Rao was Rs. 65.00. The estimate for 1948-49 framed by the
National Income Committee was Rs. 225. In 1948-49 India was, therefore, four
times more prosperous than in 1931-32.
(c) The examination result of school X was 75% in a particular year. In the
same year and at the same examination only 400 out of a total of 600 students were
successful in school Y. The teaching standard of the former school was decidedly
better. (B. Com., Delhi, 1953)
[Table. The six column headings are illegible in the source; a dash marks a figure that is missing.]
1931-32   15.9   222.5    8.5   0.7   2.0     -
1932-33   17.9   261.2   16.4   1.4   2.2     -
1933-34   17.3   257.0   30.2   2.7   1.6     -
1934-35   18.4   275.8   36.9   3.2   1.2   17.2
1935-36   22.5   333.6   55.3   5.3   1.0   20.4
1936-37   25.2   380.2   63.0   6.1   0.8   24.1
1937-38   22.2   319.4   57.9   5.3   1.0   18.8
1938-39   16.5   145.4   35.9   3.2   1.0    7.3
1939-40   19.1   217.4   70.3   6.6   1.0    9.2
1940-41   25.4   286.4   52.0   5.1   1.2   15.5
1941-42   17.5   145.5   38.8   3.8   0.8    7.5
Write a short review, based on the above table, of the sugar economy of Uttar
Pradesh during the period 1931 to 1942. (M. A., Agra, 1944).
8. Interpret the following results relating to two colleges A and B and find
out which of the two is better :-
                        College A                       College B
Examination    No. of candidates   Successful   No. of candidates   Successful
               appeared                         appeared
M. A.                 30               25             190           [illegible]
M. Com.               50               45         [illegible]           85
B. A.                200              150             100               70
B. Com.              120               75              80               50
Total                400              295         [illegible]          215
(a) Whether the economic condition of the agriculturist of Uttar Pradesh was
favourable or unfavourable to him month by month in 1948.
(b) Whether at the end of 1948 he was better or worse off as compared to (i)
1932 ; (ii) the beginning of 1948.
12. Discuss the soundness of the following arguments and indicate what
additional information, if any, would be needed to test the matter more effectively.
Your answers must be brief and to the point. (You may assume that all the facts
stated are correct.)
(a) Wages have risen much more than salaries compared with pre-war days,
so any action to remove the excess of demand should be concentrated on them.
(b) The percentage of women in the various ranks of state service falls
consistently as one goes up the hierarchy; this must mean either that there is a prejudice
against promoting women, or that they have less ability to do responsible work.
(c) The mortality rates of miners for a particular period were above the male
average in each age-group, but those of miners' wives were even further above the
female averages; it follows that high rates for miners were not due to unhealthy
working conditions but to low incomes.
(d) The futility of diphtheria immunization is shown by the fact that there
were over 5,000 cases of diphtheria amongst immunized children in a particular
period. (P. C. S., 1953).
13. The following figures have been taken from the Census of India Report,
1951 :
                          Age-group 5 to 14 years
Zone             Number of       Number of          Number of       Number of
                 males in        married males      females in      married females
                 thousands       in thousands       thousands       in thousands
North India     [illegible]      [illegible]        [illegible]         1,568
East India         10,935        [illegible]          10,253            1,759
South India         9,256             87               9,213              421
West India      [illegible]          128               5,010              535
Central India       6,750            494               6,427            1,364
On the basis of the foregoing figures, write a critical note on the extent of early
marriages in different zones of India. (P. C. S., 1955).
1
or San e w ill not get a pmc:
d the ch ance t h at h' . 1S
. (4+
16 ) or l'
16
4 In
This principle can be extended to cases where there are more than
2 operations to be performed.
Example 2. There are six doors in a room. Four persons have
to enter it. In how many ways can they enter from different doors ?
The first person can enter from any of the six doors, the second
from any of the remaining five doors, the third from any of the remaining
four doors, and the fourth from any of the remaining three doors.
Thus the total number of ways by which they can enter through different
doors is
$6 \times 5 \times 4 \times 3 = 360$
(i) The number of permutations of n dissimilar things taken r at a time.
This is the same thing as finding out the number of ways in which
r places can be filled when there are n dissimilar things. The first place
can be filled up in n ways, the second in (n-1) ways, the third in (n-2)
ways and so on. The total number of permutations would be
$n \times (n-1) \times (n-2) \times \cdots \times (n-r+1)$.
In example No. 2 the value of n was six and of r four.
Thus the number of ways in which four people can enter a six-door
room from different doors is
$n(n-1)(n-2)(n-3)$
or
$6\,(6-1)(6-2)(6-3) = 360$
The above rule is written in short form as $^nP_r$.
Thus
$^nP_r = n(n-1)(n-2)\cdots(n-r+1) = \dfrac{n!}{(n-r)!}$
The sign ! is read as factorial.
$n! = n(n-1)(n-2)(n-3)\cdots 3 \cdot 2 \cdot 1$
In example No. 2, the number of ways in which four persons can
enter a room from six different doors
$= {}^6P_4 = \dfrac{6!}{(6-4)!} = \dfrac{720}{2} = 360$
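The rule for the number of permutations can be checked numerically. What follows is only a sketch in Python; the function name `n_p_r` is our own and is not taken from the text. It evaluates the factorial formula for Example 2 and confirms the figure by listing every possible assignment of doors.

```python
from itertools import permutations
from math import factorial, perm

def n_p_r(n: int, r: int) -> int:
    """Arrangements of r things taken from n dissimilar things: n! / (n-r)!."""
    return factorial(n) // factorial(n - r)

# Example 2: four persons entering a six-door room through different doors.
doors = range(1, 7)
by_formula = n_p_r(6, 4)
by_listing = sum(1 for _ in permutations(doors, 4))  # count every ordered choice
by_stdlib = perm(6, 4)                               # built-in equivalent

print(by_formula, by_listing, by_stdlib)
```

All three figures agree, which shows that the product n(n-1)...(n-r+1) and the factorial form count the same arrangements.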
This is equal to
$\dfrac{(m+n+p)!}{m!\,n!\,p!}$
PROBABILITY
If, however, the shelves were of equal size so that each could
contain four books the answer would be
$\dfrac{(m+n+p)!}{m!\,n!\,p!\,3!}$ or $\dfrac{(3m)!}{m!\,m!\,m!\,3!}$, since $m = n = p$,
$= \dfrac{12!}{4!\,4!\,4!\,3!}$
$= 5775$
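The arithmetic for the case of equal shelves may be verified with a short Python sketch. The helper name `equal_shelf_arrangements` is hypothetical; it assumes, as in the text, 12 books placed four to a shelf on three shelves, and it divides by 3! exactly as the formula above does.

```python
from math import factorial

def equal_shelf_arrangements(books: int, shelves: int) -> int:
    """(books)! / ((size!)^shelves * shelves!) for shelves of equal size."""
    size = books // shelves  # books per shelf; assumes an exact division
    return factorial(books) // (factorial(size) ** shelves * factorial(shelves))

print(equal_shelf_arrangements(12, 3))
```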
(iv) The number of ways in which n things may be arranged among themselves
when p of the things are exactly alike and of one kind, q of the things are
exactly alike, of a second kind, r of the things are exactly alike, of a third kind,
and the rest are all different.
If n things are all different, the total number of ways in which
they can be arranged is $n!$. If, however, some of them are exactly
similar to each other, the number of permutations is reduced. Thus
the number of ways in which n things can be arranged, if p are of one
kind, q of a second kind, and r of a third kind, is
$\dfrac{n!}{p!\,q!\,r!}$
Example 6. In how many ways can the letters of the word
"combination" be arranged?
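The rule for arranging things some of which are alike, n!/(p! q! r!), lends itself to a mechanical check. The following Python sketch is an illustration only; the function name `arrangements` is our own. It counts how often each letter occurs and divides n! by the factorial of each count. The word "combination" has eleven letters, of which o, i and n each occur twice.

```python
from collections import Counter
from math import factorial

def arrangements(word: str) -> int:
    """Distinct orderings of the letters of word: n! / (p! q! r! ...)."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)  # divide out orderings of identical letters
    return total

print(Counter("combination"))       # shows which letters are repeated
print(arrangements("combination"))
```

Letters that occur once contribute a factor of 1! and so leave the result unchanged, which is why the rule needs to mention only the repeated kinds.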
Simple events
We have already pointed out that if an event can happen in m
ways and fail to happen in n ways the probability of its happening or p is
$\dfrac{m}{m+n}$
Example 9. From a bag containing four black and five red balls
a draw of three balls is made. What is the probability that all of them
would be black ?
The total number of ways in which three balls can be drawn out
of 9 is $^9C_3$.
The number of ways in which three black balls can be drawn out
of four $= {}^4C_3$.
Therefore the desired probability $= \dfrac{^4C_3}{^9C_3} = \dfrac{4}{84} = \dfrac{1}{21}$
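The ratio of combinations in Example 9 can be evaluated exactly. This is a minimal Python sketch, using the standard `math.comb` and `fractions.Fraction`; the variable names are our own.

```python
from fractions import Fraction
from math import comb

favourable = comb(4, 3)  # ways of drawing three black balls out of four
possible = comb(9, 3)    # ways of drawing any three balls out of nine

probability = Fraction(favourable, possible)
print(favourable, possible, probability)
```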
$\dfrac{nn'}{(m+n)(m'+n')}$ would be the chance of both events failing;
$\dfrac{mn'}{(m+n)(m'+n')}$ would be the chance of the first happening and the second
failing; and
$\dfrac{nm'}{(m+n)(m'+n')}$ would be the chance of the first failing and the second
happening.
If the respective chances of the happening of two events are
denoted by $p$ and $p'$ the probability that both of them would happen
would be $pp'$. Similarly the chance that none of them would happen
would be $(1-p)(1-p')$ and the chance that the first would happen and the
second would fail to happen would be $p(1-p')$; the chance that the
first event does not happen and the second happens would be
$p'(1-p)$.
If $p$ is the probability of the happening of an event in one trial,
the probability of its happening in each of two trials would be $p^2$, in each
of three trials $p^3$ and in each of n trials $p^n$, because the probability of its
happening in each of these trials is equal and independent of the previous events.
If $p_1$, $p_2$ and $p_3$ are the probabilities of the happening of three
independent events, the probability that some one of them would
happen is
$1-(1-p_1)(1-p_2)(1-p_3)$ because the chance that all the three
events fail to happen is $(1-p_1)(1-p_2)(1-p_3)$ and except in this case
some of the events (either 1 or 2 or 3) must happen.
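The rules for independent events can be tried out on figures. The sketch below is in Python and is an illustration only: the function names and the three probabilities 1/2, 1/3 and 1/4 are hypothetical values chosen for the purpose, not taken from the text.

```python
from fractions import Fraction
from math import prod

def all_happen(probabilities):
    """Chance that every one of the independent events happens."""
    return prod(probabilities)

def at_least_one(probabilities):
    """1 minus the chance that all the independent events fail."""
    return 1 - prod(1 - p for p in probabilities)

# Hypothetical chances of three independent events.
chances = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]

print(all_happen(chances))    # p1 * p2 * p3
print(at_least_one(chances))  # 1 - (1-p1)(1-p2)(1-p3)
```

Exact fractions are used so that the results can be compared directly with a hand calculation.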
Example 12. A die is thrown three times; what is the chance
that on the first throw it falls with number 1 upwards, in the
second with either number 1 or number 2, and in the third with either
number 1, or number 2, or number 3 upwards ?
The probability of the first event is $\frac{1}{6}$
The probability of the second event is $\frac{2}{6}$
The probability of the third event is $\frac{3}{6}$
$\therefore$ The probability of the compound event is $\frac{1}{6} \times \frac{2}{6} \times \frac{3}{6}$
$= \frac{1}{36}$
Example 13. A bag contains 5 red and 8 green balls. Two draws
of three balls each are made, the balls being replaced after the first draw.
What is the chance that in the first draw all the balls were red and in
the second, green ?
The number of ways in which three balls can be drawn out of
13 is $^{13}C_3$.
The number of ways in which three red balls can be drawn out of
5 is $^5C_3$.
The number of ways in which three green balls can be drawn out
of 8 is $^8C_3$.
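The three counts of Example 13 can be combined into the required chance by the multiplication rule, since the balls are replaced and the two draws are therefore independent. The Python sketch below is our own working under that reading of the example and is not the printed solution.

```python
from fractions import Fraction
from math import comb

ways_any_three = comb(13, 3)   # three balls out of 13
ways_three_red = comb(5, 3)    # three red balls out of 5
ways_three_green = comb(8, 3)  # three green balls out of 8

first_all_red = Fraction(ways_three_red, ways_any_three)
second_all_green = Fraction(ways_three_green, ways_any_three)

# Independent draws: multiply the two chances.
required = first_all_red * second_all_green
print(ways_any_three, ways_three_red, ways_three_green, required)
```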
_ 'C. X484_
The required probability would be - -52C-'- - 416~
2!..
13
Addition of probabilities
If a set of events is of such a character that when one of them
happens the others cannot happen, the events are said to be mutually
exclusive. For example, if three persons run a race only one of them can
win, the other two cannot, assuming a dead heat to be impossible
(that is, assuming that two or more persons do not cover the distance in
exactly the same time).
The rule for finding out the probability of the happening of
mutually exclusive events is as follows :-
"If an event can happen in different ways which are mutually exclusive the
probability that it will happen is the sum of the probabilities of its happening
in these different ways."
The above two events are mutually exclusive; therefore the required
chance that four successively drawn balls are alternately of different
colours (without mentioning the colour with which to begin):
=I'~+I~
=}
Example 19. From 30 tickets marked with the first thirty numerals
one is drawn at random. It is then replaced and a second draw
is made. Find the chance that (a) in the first draw it is a multiple of
5 or of 7 and (b) in the second it is a multiple of 3 or of 7.
(a) The chance that the number is a multiple of 5 is $\frac{6}{30}$
and the chance that it is a multiple of 7 is $\frac{4}{30}$.
These events are mutually exclusive. Hence the required chance
is $\frac{6}{30} + \frac{4}{30} = \frac{1}{3}$
(b) The chance that the number is a multiple of 3 is $\frac{10}{30}$
and the chance that it is a multiple of 7 is $\frac{4}{30}$.
But 21 is a common multiple of both 3 and 7. Hence
the probability that it is a multiple of either 3 or 7
$= \frac{10}{30} + \frac{4}{30} - \frac{1}{30} = \frac{13}{30}$
The probability of the compound event $= \frac{1}{3} \times \frac{13}{30} = \frac{13}{90}$
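The working of Example 19 can be confirmed by simply counting tickets. The following Python sketch is illustrative; the helper name `chance_of_multiple` is hypothetical. Counting each ticket once takes care of the common multiple 21 automatically, which is the point of the subtraction in part (b).

```python
from fractions import Fraction

tickets = range(1, 31)  # tickets marked 1 to 30

def chance_of_multiple(*divisors):
    """Chance that a ticket drawn at random is a multiple of any given divisor."""
    hits = [t for t in tickets if any(t % d == 0 for d in divisors)]
    return Fraction(len(hits), len(tickets))

first_draw = chance_of_multiple(5, 7)   # no common multiple below 35
second_draw = chance_of_multiple(3, 7)  # 21 is counted only once
compound = first_draw * second_draw     # draws are independent (replacement)

print(first_draw, second_draw, compound)
```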
Example 20. A bag contains 5 red and 3 black balls, and a
second one 4 red and 5 black balls. One of the bags is chosen at
random and a draw of 2 balls is made from it. Find the chance that
one is red and the other black.
The probability that the first bag is selected and a draw of two
balls gives one red and one black ball $= \dfrac{1}{2} \times \dfrac{^5C_1 \times {}^3C_1}{^8C_2}$
$= \dfrac{15}{56}$
The probability that the second bag is selected and a draw of two
balls gives one red and one black ball $= \dfrac{1}{2} \times \dfrac{^4C_1 \times {}^5C_1}{^9C_2}$
$= \dfrac{5}{18}$
Since the events are mutually exclusive the probability of the
events $= \dfrac{15}{56} + \dfrac{5}{18} = \dfrac{275}{504}$
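Example 20 combines the multiplication rule (choice of bag, then the draw) with the addition rule (the two bags are mutually exclusive routes). A Python sketch of the calculation follows. It assumes the first bag holds 5 red and 3 black balls and the second 4 red and 5 black, as implied by the denominators $^8C_2$ and $^9C_2$ in the working; the function name is our own.

```python
from fractions import Fraction
from math import comb

def one_red_one_black(red: int, black: int) -> Fraction:
    """Chance that a draw of two balls gives one red and one black."""
    return Fraction(comb(red, 1) * comb(black, 1), comb(red + black, 2))

half = Fraction(1, 2)  # each bag is equally likely to be chosen

via_first_bag = half * one_red_one_black(5, 3)
via_second_bag = half * one_red_one_black(4, 5)

# The two routes are mutually exclusive, so their chances are added.
total = via_first_bag + via_second_bag
print(via_first_bag, via_second_bag, total)
```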
$\dfrac{n!}{r!\,(n-r)!}\,p^r q^{n-r}$
$= {}^nC_r\,p^r q^{n-r}$
where p stands for the probability of its happening and q for the
probability of its not happening in a single trial.
The probability that a series of events will happen r times in n
trials and fail to happen n-r times is equal to $p^r q^{n-r}$, but each series can
happen in $^nC_r$ different orders, all mutually exclusive. Therefore the
probability of the happening of an event exactly r times in n trials is
$^nC_r\,p^r q^{n-r}$
If we expand $(p+q)^n$ by the binomial theorem we can get the
probabilities of the happening of the event exactly n times, n-1 times,
n-2 times, etc., in n trials. We shall discuss the binomial theorem in
detail in the next chapter. For the present we shall obtain the
probability of the happening of events of this type by $^nC_r\,p^r q^{n-r}$.
Example 22. What is the probability of obtaining exactly three heads in
five throws with a single coin ?
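The formula $^nC_r\,p^r q^{n-r}$ is easily put to work. The Python sketch below is an illustration, with function and variable names of our own choosing; it applies the formula to the question of Example 22, taking p = q = 1/2 for a fair coin, and also sums the terms of $(p+q)^n$ to show that they add up to 1.

```python
from fractions import Fraction
from math import comb

def exactly(n: int, r: int, p: Fraction) -> Fraction:
    """Probability of exactly r happenings in n independent trials."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

half = Fraction(1, 2)  # chance of a head with a fair coin (assumed)

three_heads_in_five = exactly(5, 3, half)
all_terms = [exactly(5, r, half) for r in range(6)]  # terms of (p + q)^5

print(three_heads_in_five)
print(sum(all_terms))
```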
To make the total of 12, two coins must fall with the mark 3
up and three coins must fall with the mark 2 up. No other arrangement
would give a total of 12.
The probability that a coin in one throw would fall with the
mark 3 up is $\frac{1}{2}$ and the probability that it will fall with the mark 2 up
is also equal to $\frac{1}{2}$.
Therefore the chance that two coins would fall with mark 3 up
and three coins with mark 2 up
$= {}^5C_2\,p^2 q^3$
$= \dfrac{5 \times 4}{2 \times 1} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \dfrac{10}{32} = \dfrac{5}{16}$
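The probability that an event would happen at least r times in n trials is
$= p^n + {}^nC_1\,p^{n-1}q + {}^nC_2\,p^{n-2}q^2 + {}^nC_3\,p^{n-3}q^3 + \cdots + {}^nC_r\,p^r q^{n-r}$
$= \dfrac{5 \times 4}{2 \times 1} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2}$
$= \frac{10}{32}$
Therefore the probability of
at least three heads $= \frac{1}{32} + \frac{5}{32} + \frac{10}{32} = \frac{16}{32}$
$= \frac{1}{2}$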
Example 25. The chance that a ship safely reaches a port is $\frac{1}{5}$.
Out of 5 ships expected what is the probability that 3 at least would
arrive safely ?
The probability of the safe arrival of a ship, or p, $= \frac{1}{5}$ and hence
$q = \frac{4}{5}$.
We want the probability of the safe arrival of at least three ships
out of five, which means either all the five ships or four ships or three
ships.
The probability of the safe arrival of all the five ships
$= p^n = \left(\frac{1}{5}\right)^5 = \frac{1}{3125}$
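The remaining terms of Example 25 follow the same pattern, and the whole sum can be sketched in Python. This is our own working and not the printed solution; it assumes p = 1/5, the value implied by the step (1/5)^5 = 1/3125 above, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def exactly(n: int, r: int, p: Fraction) -> Fraction:
    """Probability of exactly r happenings in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def at_least(n: int, r: int, p: Fraction) -> Fraction:
    """Probability of r or more happenings in n trials: sum of the terms."""
    return sum(exactly(n, k, p) for k in range(r, n + 1))

p_safe = Fraction(1, 5)  # assumed chance that one ship arrives safely

five = exactly(5, 5, p_safe)
four = exactly(5, 4, p_safe)
three = exactly(5, 3, p_safe)
print(five, four, three, at_least(5, 3, p_safe))
```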
Suppose a bag contains three black balls and four white ones.
Another bag contains four black balls and five white ones. A black ball
has been drawn from one of the bags and we have to find out the
probability that it came from the first bag. It is clearly a question
of inverse probability. Here we know that a black ball must have come
either from the first bag or from the second one and we have to find
out the probability of the first case.
If a very large number of draws are made from the first bag
(the ball being replaced after each draw) and if this number is denoted
by N then we can reasonably expect that $\frac{3}{7}N$ times we shall get the
black ball and $\frac{4}{7}N$ times the white ball. Similarly if the draws were
made from the second bag we shall get a black ball $\frac{4}{9}N$ times and a
white ball $\frac{5}{9}N$ times. This is an application of the general theorem which
is due to James Bernoulli and was published in Ars Conjectandi in 1713,
eight years after the death of the author. It should be remembered
that this theorem holds good only when N is large. In ten draws from
the first bag (with replacement) it is possible, though improbable, to get a black ball
each time, but this will practically never happen if the number of draws is, say, 1000.
In a large number of draws the number of times we shall get a black
ball would be more or less equal to $pN$ or $\frac{3}{7}N$.
In the above illustration the probability that the ball came from
the first bag would be equal to the probability of the favourable events
divided by the sum of the probabilities of all possible events. In other
words the probability would be
$p = \dfrac{\frac{3}{7}N}{\frac{3}{7}N + \frac{4}{9}N} = \dfrac{3}{7} \times \dfrac{63}{55} = \dfrac{27}{55}$
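The frequency argument can be made concrete by fixing N. The Python sketch below takes N = 630, a hypothetical figure chosen only because it is divisible by both 7 and 9, and counts the expected number of black balls from each bag.

```python
from fractions import Fraction

N = 630  # hypothetical number of draws from each bag

black_from_first = Fraction(3, 7) * N   # expected black balls, first bag
black_from_second = Fraction(4, 9) * N  # expected black balls, second bag

# Of all the black balls obtained, the share that came from the first bag.
share_first = black_from_first / (black_from_first + black_from_second)
print(black_from_first, black_from_second, share_first)
```

Any other value of N gives the same share, since N cancels from the numerator and the denominator.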
$\dfrac{\frac{1}{2} \times \frac{3}{7}}{\left(\frac{1}{2} \times \frac{3}{7}\right) + \left(\frac{1}{2} \times \frac{4}{9}\right)} = \dfrac{27}{55}$
Example 27. Suppose a black ball has been drawn from one of
three bags, the first containing three black balls and seven white, the
second five black balls and three white, and the third eight black balls and
four white. What is the probability that it was drawn from the first
bag ?
If an event is known to have proceeded from one of n mutually
exclusive causes whose probabilities are $P_1, P_2, \ldots, P_n$ and furthermore
if $p_1, p_2, \ldots, p_n$ are the respective probabilities that the event
follows when one of the n causes exists, then the probability that the
event proceeded from the mth cause is
$P = \dfrac{P_m\,p_m}{P_1 p_1 + P_2 p_2 + \cdots + P_n p_n}$
In the given problem :-
$P_1 = P_2 = P_3 = \frac{1}{3}$ since it is just as probable that the ball was drawn
from one bag as another;
also $p_1$, i. e., the probability of drawing a black ball from the first
bag $= \frac{3}{10}$
and $p_2$, i. e., the probability of drawing a black ball from the second
bag $= \frac{5}{8}$
and $p_3$, i. e., the probability of drawing a black ball from the third
bag $= \frac{8}{12}$
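The general rule for inverse probability can be applied mechanically once the two sets of probabilities are listed. The following Python sketch is our own illustration, with a hypothetical function name `posterior`; it is not the printed solution. It uses the values stated for Example 27: equal chances of 1/3 for the three bags and chances 3/10, 5/8 and 8/12 of a black ball from each.

```python
from fractions import Fraction

def posterior(causes, given_cause, m):
    """Probability that the event proceeded from cause m (counted from 1).

    causes      -- probabilities P1..Pn that each cause exists
    given_cause -- probabilities p1..pn of the event under each cause
    """
    products = [P * p for P, p in zip(causes, given_cause)]
    return products[m - 1] / sum(products)

bags = [Fraction(1, 3)] * 3
black = [Fraction(3, 10), Fraction(5, 8), Fraction(8, 12)]

print(posterior(bags, black, 1))
```

The same function reproduces the two-bag illustration: with causes 1/2 and 1/2 and black-ball chances 3/7 and 4/9 it returns 27/55.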
$\frac{1}{10} \times \frac{49}{50} \times \frac{9}{10} = \frac{441}{5000}$
Probability that the event did not happen and that A and B were
wrong
$= \frac{9}{10} \times \frac{1}{50} \times \frac{1}{10} = \frac{9}{5000}$
$= \frac{1}{10} \times \frac{49}{50} \times \frac{9}{10} \times \frac{3}{10} = \frac{1323}{50000}$
Probability that the event did not happen and A and B were
wrong and C right
$= \frac{9}{10} \times \frac{1}{50} \times \frac{1}{10} \times \frac{7}{10} = \frac{63}{50000}$
Therefore the probability that the event actually did occur
$= \dfrac{1323/50000}{1323/50000 + 63/50000}$
$= \dfrac{1323/50000}{1386/50000} = \dfrac{1323}{1386} = \dfrac{21}{22}$
Questions
1. According to the Life Tables, out of one lakh of persons living at age
10, 82134 survive to age 40, of whom 837 die in a year. What is the probability for a
man of 40 of surviving one more year ?
2. In 128 litters, each of five puppies, the number of males were distributed
as follows :-
No. of males per litter    0    1    2    3    4    5
No. of litters             3   22   44   38   16    5
Calculate the probability of a male birth.
3. Eight balls numbered from 1 to 8 are placed in a bag and two drawn at
random. What is the probability that they are numbered 1 and 2 ?
4. Find out the probability that a man asked to form a two-digit number out
of 2, 3, 5, 7, 9 would form 79, when
(a) repetitions are not allowed.
(b) repetitions are allowed.
5. What is the chance of throwing a number greater than 4 with an ordinary
die whose faces are numbered from 1 to 6 ?
6. In a single throw with two dice find the chance of throwing (1) eight, (2)
eleven.
7. Compare the chance of throwing 4 with one die, 8 with two dice, and 12
with three dice.
8. A and B throw with three dice; if A throws 14, find B's chance of
throwing a higher number.
9. From a pack of 52 cards two are drawn at random; find the chance that
one is a king and the other a queen.
10. In shuffling a pack of cards, four are accidentally dropped; find the chance
that the missing cards should be one from each suit.
11. If n people are seated at a round table, what is the probability that two
named individuals will be neighbours ?
12. A bag contains 4 white, 5 red, and 6 green balls. Three balls are drawn
at random. What is the probability that a white, a red and a green ball are drawn ?
13. Find the chance of drawing a king, a queen and a knave in that order from
a pack of cards in three consecutive draws, the cards drawn not being replaced.
14. Goddard, the captain of the West Indies Cricket team, is reported to have
observed the rule of calling "heads" every time the toss was made during the five
matches of the Test-series with the Indian team. What is the probability of his winning
the toss in all the five matches ?
How will the probability be affected if he had made a rule of tossing a coin privately
to decide whether to call "heads" or "tails" on each occasion ? (I. A. S., 1950).
15. A lady declares that, by tasting a cup of tea made with milk, she can
discriminate whether milk or tea infusion was first made into the cup. It is proposed to
test this claim by means of an experiment with 12 cups of tea, 6 made in one way
and 6 in the other way, and presenting them in random order to her.
Calculate the probability that on the null hypothesis the lady would judge
correctly all the 12 cups, it being known to her that 6 are of each kind.
If, however, the 12 cups were presented to the lady in 6 pairs, each pair to
consist of either kind, and the presentation be again in random order, how will the
probability of correctly judging every cup on the null hypothesis be altered ?
Which of the two designs would you prefer and why ? (I. A. S., 1949).
16. In a given race the odds in favour of three horses A, B, C are 1:3, 2:3
and 2:5 respectively. Assuming that a dead heat is impossible, find the chance that
one of them will win the race.
17. Find the probability of throwing 6 at least once in four throws with a
single die.
18. A and B throw alternately with a pair of dice. A wins if he throws 6
before B throws 7, and B if he throws 7 before A throws 6.
If A begins, show that his chance of winning is $\frac{30}{61}$.
19. A, B and C in order toss a coin. The first one who throws a head wins.
What are their respective chances ?
20. A, B and C, in order, draw from a pack of cards, replacing their cards after
each draw. If the first man to draw a heart wins, what are their respective chances ?
21. (a) Given n independent events with respective probabilities of occurrence
$p_1, p_2, \ldots, p_n$,
write down the probability of at least one of these events happening.
(b) What is the probability of getting 9 cards of the same suit in one hand
at a game of bridge ? (I. A. S., 1951).
22. A problem in statistics is given to three students, R. G: T. whose chances
of solving it are i, ! and t. What is the probability that the problem will be solved?
23. A has six shares in a lottery in which there are three prizes and ten blanks.
B has two shares in a lottery in which there are four prizes and eight blanks. Who
has the better chance to win a prize ?
24. A has four shares in a lottery in which there are four prizes and ten blanks,
B has three in which there are three prizes and four blanks. Who has the better
chance of winning exactly one prize ? Who of winning two prizes ?
25. A can hit a target 3 times in 5 shots, B, 2 times in 5 shots, C, 3 times in 4
shots. They fire a volley. What is the probability that 2 shots hit ?
(M. A., Punjab, 1945).
26. A is one of five horses entered for a race, and is to be ridden by one of
the two jockeys B and C; it is 3 to 1 that B rides A, in which case all the horses are
equally likely to win; if C rides A his chance is doubled. What are the odds in
favour of his winning ?
27. Fourteen quarters and one five-dollar gold piece are in one purse and fifteen
quarters are in another. Ten coins are taken from the first and put into the second,
and then ten coins are taken from the second and put into the first. Which purse
is probably the more valuable ?
28. A man draws from an urn containing two balls, one white and one black.
If he draws a white ball he wins. If he fails to draw a white ball, the draw is replaced,
another black ball is added and he draws again. If he fails to draw a white ball in
the next draw, the process is repeated. What are his respective chances of winning
in 2, 3, 4, 5, 7, and 10 trials ?
29. In each of a set of games it is 2 to 1 in favour of the winner of the previous
game; what is the chance that the player who wins the first game shall win three at
least of the next four ?
30. If the chance that a vessel arrives safely at a port is 'l"!'
find the chance
that out of 5 vessels expected 4 at least will arrive safely.
31. There are 9 coins in a bag, 5 of which are sovereigns and the rest are
unknown coins of equal values; find what they must be if the probable value of
a draw is 12 shillings.
32. A pays B 1 sh. to guess the number of heads in a single toss of 4 coins;
what expectation should B place on each of the possibilities: no head, one head, two
heads, etc. ?
33. A bag contains S balls which are just as likely to be white as coloaued.
Two white balls are drawn from the bag what is the probability that all are white "1
34. Suppose A is' known to tell the.. truth in five cases out of six and he stllb:S
that a white ball was drawn from a bag containing 9 black and one white ball what is
the probability that the white ball was really drawn ?
35. A, Band C run a race. The odds are 8 to 3 against A, S to 2 against B.
Assuming a dead heat to be impossible, find the odds against C.
36. The odds,against A solving a problem are 8 to 6, and the odds in favour of
B solving the. same 15toblem are 14 to 10. What is the probability that if both of
them try the probfcm will be solved ?
37. Four persons each draw a card from an ordinary pack; find the chance
that no cards are of equal value.
38. A, B and C, in order, cut from a pack of cards, replacing them after each
cut. The first to cut a diamond is to win; what are their respective chances?
39. A and B, who are players of equal skill, leave a badminton game when A
had scored 12 points and B 13. If the game was to finish at 15, and the winner was
to get Rs. 32, what share ought each to take?
40. P has in his pocket one sovereign and four shillings. He takes out two
coins at random and promises to give one coin to R and the other to Q. What is
the worth of Q's expectation?
41. There are three balls in a bag, but it is not known of what colour they
are; one ball drawn from the bag is of red colour. What is the probability that all
balls are red?
42. There are 6 balls in a bag. Their colours are not known. A draw of
3 balls is made and all of them are white. What is the probability that no white ball
is left in the bag?
43. A bag contains 5 balls. Their colours are not known. A ball is taken
out and found to be white. It is replaced and another draw is made. This ball also
happens to be white. This is also replaced, and a simultaneous draw of two balls
is made. What is the chance that both of them would be white?
44. A speaks the truth in 75% of cases and B in 80% of the cases. In what
percentage of cases are they likely to contradict each other in stating the same fact?
45. A and B are two very weak students of statistics and their chances of solving
i
- a problem correctly are and ii respectively. -they are given a question and obtain
the same answer. If the probability of their making a common mistake is TU1ST find
the chance that their answer was correct.
46. A and B play for a prize of Rs. 324. A is to throw a die first and is to win
if he throws 6. If he fails, B is to throw and is to win if he throws 6 or 5. If he
fails, A is to throw again and to win if he throws 6 or 5 or 4, and so on. Find their
respective expectations.
47. Eight mice are selected at random from a large number and then divided
into two groups of four each, group A and group B. Each mouse in group A is
given a dose a of a certain poison which is expected to kill one in four. Each mouse
in group B is given a dose b of another poison which is expected to kill one in two.
Show that, nevertheless, there may be fewer deaths in group B than in group A and
find the probability of this happening. (M. Com., Allahabad, 1954)
48. Define 'mutually exclusive events'; state and prove the theorem of addition
of probabilities concerning mutually exclusive events.
A, B and C in order toss a coin. The first one to throw a head wins; what are
their respective chances of winning? Assume that the game may continue indefinitely.
(I. A. S., 1955).
49. There are m points on a line consisting of the colours black and white.
If p and q are the probabilities of a point being black and white respectively so that
p+q=1, find the expectations of obtaining
(i) a black-black join,
(ii) a black-white join, and
(iii) a white-white join,
a join being defined as the line joining adjacent points. (I. A. S., 1949).
50. A special type of an automatic telephone dial can be designed to consist
always of 3 integers subject, however, to two restrictions, namely (a) that zero
cannot be dialled first and (b) that two consecutive integers cannot be dialled, the
lower one first and the next higher one immediately after. (For this purpose '9'
and '0' are not consecutive integers.)
How many subscribers can be served and in how many of their numbers will
there be at least one zero? (I. A. S., 1953).
51. State and prove the theorem of multiplication of probabilities.
p is the probability that a man aged x will die in a year. Find the probability
that out of 5 men, A, B, C, D, E, each aged x, A will die in the year and be the first
to die. (I. A. S., 1954).
52. The following table gives the values of p(x, i), the probability that a person
aged x will survive up to (i-1) years more but die before the end of the i-th year :-

Age x      i = 1    i = 2    i = 3    i = 4
70          .25      .20      .15      .10
75          .30      .25      .20      .15
(a) Find the chance that out of 10 persons, each now aged 70, exactly 5 will
die before the end of 2 years.
(b) I pay to each person aged 75 an amount of Rs. 1,400 and ask him to pay me
a certain amount at the end of every year so long as he is alive, up to a maximum of
4 years. Find the value of the yearly instalment which I must receive from each if
I intend not to gain or lose in this transaction in the long run. Ignore all
considerations of investment of money, interest etc. (I. A. S., 1957).
53. A fiction and a non-fiction book are selected from a bookshelf containing
12 fiction and 30 non-fiction books. In how many ways can the choice be made?
54. It is considered that the only way to maintain peace between 6 countries
is to have non-aggression pacts between every possible pair of countries. How
many pacts are necessary ?
55. A customer buying a dozen eggs always examines a sample of 3 to see if
they are fresh. In how many ways can she pick the sample? If the dozen includes
3 bad eggs, in how many ways can she take a sample which includes at least one bad
egg?
56. I have 24 friends whose birthdays are equally likely to fall on any day of
the year and who will send me an invitation to their birthday. What is the chance
that I will receive two or more invitations for the same day?
57. Suppose it is 9 to 7 against a person A, who is now 35 years of age, living
till he is 65, and 3 to 2 against a person B, now 45, living till he is 75; find the chance
that one at least of these persons will be alive 30 years hence. (B. A., Punjab, 1958).
58. The following mortality table shows the number of survivors to various ages
of 1,00,000 newly born males.
,\ge Survivors Age Survivors
0 1,00,000 60 67.7 8 7
10 93,601 70 4 6 ,739
20 9 2 ,293 80 19,86')
30 Qo,oqa 90 2,81ll
40 116,880 100 6
l() 80 $11
(i) Find the probability of a newly born infant in this population living till
60 years old. (ii) Find the probability of a 20 years old in this population living until
he is 50 years old.
59. Three groups of children contain respectively 3 girls and 1 boy; 2 girls and
2 boys; 1 girl and 3 boys. One child is selected from each group at random. Show
that the chance that the three selected at random consist of 1 girl and 2 boys is 13/32.
(M. Com., Raj., 1965).
60. A factory using quality control methods mass produces an article and past
records show that on the average 4 articles are found defective out of every batch of
100. What is the maximum number of defective articles likely to be encountered in
a batch of 100?
It is brought to your notice that recently several batches of 100 were turned out
containing 11 to 15 defectives. What inference would you draw?
61. A and B stand in a ring with ten other persons. If the arrangement of 12
persons is at random, find the chance that there are exactly three persons between
A and B. (Agra, M. St., 1951).
62. p is the probability that a man aged x will die in a year. Find the
probability that out of five men A, B, C, D and E, each aged x, A will die in the year
and be the first to die. (I. A. S., 1954)
Theoretical Frequency
Distributions 24
Thus the coins can fall in four ways: either a and b both fall heads, or a
heads and b tails, or a tails and b heads, or both a and b tails. If p stands
for the probability of a coin falling head and q for falling tail, then,
The probability of two heads is $p \times p = p^2$
The probability of one head and one tail is $(p \times q) + (q \times p) = pq + pq = 2pq$
The probability of two tails is $q \times q = q^2$
The reader who has done even elementary algebra will at once
recognize the terms of the expansion $(p+q)^2$, which are $p^2 + 2pq + q^2$.
This gives us a clue to follow up.
Let us now analyse the sample of three coins. If three coins a,
b and c are tossed simultaneously the results would be as follows :-
Thus the probabilities of three heads, two heads and one tail, one
head and two tails, and 3 tails are respectively $p^3$, $3p^2q$, $3pq^2$ and $q^3$.
These are the terms in the expansion $(p+q)^3$.
Thus we arrive at a very simple rule of finding out the probabilities
of three heads, two heads, one head and zero head. Their respective
probabilities would be as follows :-
3 Heads $= p^3 = (\tfrac{1}{2})^3 = \tfrac{1}{8}$
2 Heads $= 3p^2q = 3(\tfrac{1}{2})^2(\tfrac{1}{2}) = \tfrac{3}{8}$
1 Head $= 3pq^2 = 3(\tfrac{1}{2})(\tfrac{1}{2})^2 = \tfrac{3}{8}$
0 Head $= q^3 = (\tfrac{1}{2})^3 = \tfrac{1}{8}$
The terms of the binomial expansion are
$(p+q)^n = p^n + np^{n-1}q + \frac{n(n-1)}{1 \times 2}p^{n-2}q^2 + \frac{n(n-1)(n-2)}{1 \times 2 \times 3}p^{n-3}q^3 + \dots + q^n$
n
1     1  1
2     1  2  1
3     1  3  3  1
4     1  4  6  4  1
5     1  5  10  10  5  1
6     1  6  15  20  15  6  1
7     1  7  21  35  35  21  7  1
8     1  8  28  56  70  56  28  8  1
9     1  9  36  84  126  126  84  36  9  1
10    1  10  45  120  210  252  210  120  45  10  1
It is clear from the above that each term in the triangle is derived
by adding the two terms in the line above, which lie on either side of
it. Thus, as is indicated by three small triangles, in line four 6 is
obtained by adding 3 and 3 in the third line; in line ten, 120 is obtained
by adding 36 and 84, and 45 is obtained by adding 36 and 9.
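The adding rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `pascal_rows` is hypothetical, and the rows are numbered by the exponent n as in the text.

```python
def pascal_rows(max_n):
    """Return the coefficient rows of (p + q)^n for n = 1 .. max_n."""
    rows = []
    row = [1, 1]  # coefficients of (p + q)^1
    for _ in range(max_n):
        rows.append(row)
        # every inner term is the sum of the two terms lying above it
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return rows


rows = pascal_rows(10)
for n, row in enumerate(rows, start=1):
    print(n, row)
```

Line four of the output contains the 6 (from 3 + 3) and line ten contains the 45 and 120 mentioned in the text.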
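Now if we have to raise a binomial $(p+q)^3$ where $p = \tfrac{1}{4}$ and
$q = \tfrac{3}{4}$, we can do it in the following ways :-
$(p+q)^3 = p^3 + 3p^{3-1}q + \frac{3(3-1)}{1 \times 2}p^{3-2}q^2 + \frac{3(3-1)(3-2)}{1 \times 2 \times 3}q^3$
$= p^3 + 3p^2q + 3pq^2 + q^3$
$= (\tfrac{1}{4})^3 + 3(\tfrac{1}{4})^2(\tfrac{3}{4}) + 3(\tfrac{1}{4})(\tfrac{3}{4})^2 + (\tfrac{3}{4})^3$
$= \tfrac{1}{64} + \tfrac{9}{64} + \tfrac{27}{64} + \tfrac{27}{64}$
We would obtain the same results if, instead of actual multiplication
of n, (n-1), (n-2) etc., for finding out the numerical coefficients
of the various terms we substitute ${}^3C_3$, ${}^3C_2$, ${}^3C_1$ and ${}^3C_0$.
In such a case
$(p+q)^3 = {}^3C_3\,p^3 + {}^3C_2\,p^2q + {}^3C_1\,pq^2 + {}^3C_0\,q^3$
$= p^3 + 3p^2q + 3pq^2 + q^3$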
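The substitution of combination values for the coefficients can be checked with a short Python sketch. The function name `binomial_terms` is hypothetical, and the values p = 1/4 and n = 3 are taken as an assumption from the worked expansion; exact fractions are used so that the terms can be compared with the sixty-fourths obtained by hand.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n, p):
    """Terms nCr * p^r * q^(n-r), listed from r = n successes down to r = 0."""
    q = 1 - p
    return [comb(n, r) * p**r * q ** (n - r) for r in range(n, -1, -1)]


terms = binomial_terms(3, Fraction(1, 4))
for successes, term in zip(range(3, -1, -1), terms):
    print(successes, term)
print("total:", sum(terms))
```

Since the terms are the whole expansion of (p+q)^3 with p+q=1, they add up to 1.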
Number of successes    Expected frequency    Actual frequency
7                      792                   847
6                      924                   948
5                      792                   731
4                      495                   430
3                      220                   198
2                      66                    60
1                      12                    7
0                      1                     0
Fig. 1 (horizontal axis: Number of Successes)
Mean and standard deviation of binomial distribution
The mean and standard deviation of such theoretical frequency
distributions, where we know the number of independent events and
the probability of the happening of the event in question, can be very
easily calculated. If M stands for the mean of such a distribution, n for
the number of independent events and p for the probability of the
happening of the event in a single trial, then
$M = np$
In the question relating to 12 coins solved above the value of the
mean of the theoretical frequencies would thus be
$M = 12 \times \tfrac{1}{2} = 6$
The mean of the actual frequencies comes to 6.14. Thus the two
means are not very different from each other.
The value of the standard deviation of the expected frequencies in
such cases is
$\sigma = \sqrt{npq}$
In the above example the standard deviation of the theoretical
frequencies would be :
$\sigma = \sqrt{12 \times \tfrac{1}{2} \times \tfrac{1}{2}} = \sqrt{3} = 1.73$
The standard deviation calculated from the actual frequencies
comes to 1.71. Thus a binomial distribution has mean $np$ and standard
deviation $\sqrt{npq}$.
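The comparison of theoretical and actual values can be reproduced with the following Python sketch. It assumes that the actual frequencies are those of the 4096 trials of 12 listed in the exercises at the end of this chapter (Weldon's results); the variable names are illustrative only.

```python
from math import sqrt

# Theoretical values for n = 12 independent events with p = 1/2
n, p = 12, 0.5
q = 1 - p
theoretical_mean = n * p
theoretical_sd = sqrt(n * p * q)

# Actual frequencies of 0, 1, ..., 12 successes (assumed: the 4096 trials
# tabulated in the exercises of this chapter)
observed = [0, 7, 60, 198, 430, 731, 948, 847, 536, 257, 71, 11, 0]
total = sum(observed)
actual_mean = sum(r * f for r, f in enumerate(observed)) / total
actual_sd = sqrt(sum(f * (r - actual_mean) ** 2 for r, f in enumerate(observed)) / total)

print(theoretical_mean, round(theoretical_sd, 2))
print(round(actual_mean, 2), round(actual_sd, 2))
```

Under that assumption the actual mean and standard deviation agree with the figures of 6.14 and 1.71 quoted in the text.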
NORMAL DISTRIBUTION
Fig. 2
The above figure shows the area enclosed by ordinates at 1, 2 and 3
sigma distances from the mean ordinate. The following table shows the
area relationship in a normal curve in more detail :-
Area of the Normal Curve between Mean Ordinate and Ordinates at
various Sigma distances from the Mean as Percentage of the Total Area.
With the above formula a normal curve can be fitted to any given
frequency distribution. The height of the mean ordinate can be
calculated by the formula
$y_0 = \frac{Ni}{2.5066\,\sigma}$
and then the heights of other ordinates
at various σ distances from the mean ordinate can be calculated by the
formula
$y = \frac{Ni}{2.5066\,\sigma} \times 2.71828^{-\frac{x^2}{2\sigma^2}}$
The following example would illustrate the procedure of fitting a
normal curve.
Example 1. Fit a normal curve to the following frequency
distribution relating to the heights of certain children :-
Heights (inches) No. of children
Total 106
$= 18 \times \frac{1}{\sqrt{2.71828}} = 18 \times \frac{1}{1.6487} = 10.9175$
Thus the heights of the ordinates at one σ distance from the mean
on either side would be 10.9175. Similarly the heights of other
ordinates can be calculated and these points can be plotted to obtain a
normal curve.
In actual practice these calculations are not done. Only the
height of the mean ordinate is calculated and the heights of other
ordinates are seen from mathematical tables. It has already been said
earlier that the heights of the ordinates at various sigma distances from
the mean are in a fixed relationship to the height of the mean ordinate.
It has also been said that the height of the ordinate at one sigma distance
from the mean is 60.653% of the height of the mean ordinate. Thus
if the height of the mean ordinate is 18 the height of the ordinate at
one sigma distance would be $18 \times \frac{60.653}{100} = 10.9175$. This is exactly the
same figure which we obtained by the use of the formula. When the
heights of various ordinates have been found in the above fashion a
normal curve can be drawn from them.
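The fixed relationship between the mean ordinate and the other ordinates can be illustrated with a small Python sketch. The function name `ordinate_height` is hypothetical; it simply multiplies the mean ordinate by e raised to minus half the squared sigma distance, using the mean ordinate of 18 from the example.

```python
from math import exp


def ordinate_height(mean_ordinate, z):
    """Height of the ordinate at z sigma distances from the mean."""
    return mean_ordinate * exp(-z * z / 2)


heights = {z: ordinate_height(18, z) for z in (0, 1, 2, 3)}
for z, height in heights.items():
    print(z, round(height, 4))
```

At one sigma distance the ratio to the mean ordinate is about 60.653 per cent, which is why the tabulated percentages can be used in place of the formula.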
The following table gives the heights of ordinates at various sigma
distances from the mean ordinate as percentages of the height of the
mean ordinate :-

Distance from          Percentage height of the ordinate
mean ordinate          as compared to the height
                       of the mean ordinate
 .25                   96.923
 .50                   88.250
 .75                   75.484
1.00                   60.653
1.25                   45.783
1.50                   32.465
1.75                   21.627
2.00                   13.534
2.25                    7.956
2.50                    4.394
2.75                    2.280
3.00                    1.111
the rich. Thus, forces affecting human beings are not independent in
character. Further, they are not evenly balanced nor do they tend to
produce variations of equal size and magnitude on either side of the
average. The values of mode, median and mean do not coincide here,
and as such the distribution which is obtained is not symmetrical but
skew either to the right or to the left.
Besides this, there are other difficulties also in social sciences.
The data are not only very complex, variable and skew to a marked
extent, but the complexity, variability and skewness are not uniform and
permanent. They are subject to trends, cycles and similar other complex
changes. Under such circumstances it becomes difficult to do
dependable researches and to arrive at solid conclusions. But this does
not mean that the properties of the normal distribution and various
types of inferences that can be drawn from them are useless in case of
social and economic phenomena. In fact, as we shall see in subsequent
chapters, it is generally in these fields that the properties of the normal
distribution are utilized in drawing various types of inferences. Of
course these inferences are drawn under certain assumptions and with
varying degrees of dependability, but this does not matter much in
economic and social studies, which are basically characterised as belonging
to 'inexact sciences'.
POISSON DISTRIBUTION
We have seen in earlier sections that even when p and q are unequal,
a binomial distribution tends to be a normal distribution provided
the value of the exponent n is sufficiently large so that (p-q) becomes
very small as compared to $\sqrt{npq}$. If, however, p is indefinitely small,
the limits of the series are found in a different way. Here we presume
that n is very large and that the average of the series or np is a finite
number. If the average of the series is represented by a, or in other
words if np=a, and if the above conditions are satisfied, the binomial
distribution assumes a very convenient form.
The probability of n successes (n here standing for the number of
successes, not the number of trials) is then obtained by the following
rule :-
$P_n = e^{-a}\frac{a^n}{n!}$
where e is the base of the natural logarithms and has a value of 2.7183.
The above equation was given by Poisson in 1837 and is known after
his name.
As an example let us consider the following data collected by
Bortkewitsch and quoted by R. A. Fisher showing the chance of a cavalry
man being killed by a horse kick in the course of a year. The data are
based on the records of 10 Army Corps for 20 years and thus give 200
readings :-
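Now $e^{-a} = 2.7183^{-.61}$
$= \frac{1}{2.7183^{.61}}$
$= \frac{1}{\text{Anti-log}\,(.61 \times \log 2.7183)}$
$= \frac{1}{\text{Anti-log}\,(.61 \times 0.4346)}$
$= \frac{1}{\text{Anti-log}\,(.265106)}$
$= \frac{1}{1.841} = \frac{1000}{1841}$
$a^0 = 1$ because any figure raised to the power 0 is equal to 1.
$0! = 1$
Therefore $P_0$ or $e^{-a}\frac{a^0}{0!} = \frac{1000}{1841} \times 1 = \frac{1000}{1841}$
This is the probability of 0 deaths and therefore the expected
frequency of 0 deaths in 200 readings would be
$\frac{1000}{1841} \times 200 = 108.7$
Similarly $P_1$ or $e^{-a}\frac{a^1}{1!} = \frac{1000}{1841} \times \frac{61}{100}$
and the expected frequency of one death in 200 readings would be
$\frac{1000}{1841} \times \frac{61}{100} \times 200 = 66.3$
Similarly $P_2$ or $e^{-a}\frac{a^2}{2!} = \frac{1000}{1841} \times \frac{3721}{20000}$
and the expected frequency of two deaths in 200 readings would be
$\frac{1000}{1841} \times \frac{3721}{20000} \times 200 = 20.2$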
In the same manner we can find out the expected frequencies of 3
and 4 or more deaths. They would be respectively 4.1 and .7.
Now we can compare the actual and the expected frequencies
from the following table :-

Number of deaths per      Frequencies expected      Frequencies
year per corps            in 200 readings           observed
0                         108.7                     109
1                          66.3                      65
2                          20.2                      22
3                           4.1                       3
4                            .7                       1
Total                     200.0                     200
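The expected frequencies in the table can be recomputed directly, without logarithm tables, as in the following Python sketch. The names are illustrative; the average a is worked out from the observed frequencies, and the last class is treated as "4 or more" deaths, as in the text.

```python
from math import exp, factorial

# Observed frequencies of 0, 1, 2, 3 and 4 deaths in the 200 readings
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}
readings = sum(observed.values())
a = sum(deaths * freq for deaths, freq in observed.items()) / readings


def poisson_probability(k, mean):
    """Probability of exactly k occurrences when the average is `mean`."""
    return exp(-mean) * mean**k / factorial(k)


expected = {k: readings * poisson_probability(k, a) for k in range(4)}
# the last class collects whatever is left over: 4 or more deaths
expected[4] = readings - sum(expected.values())

for k in sorted(expected):
    print(k, round(expected[k], 1), observed[k])
```

Small differences in the second decimal place from the hand calculation are to be expected, because the text rounds the logarithms.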
In Poisson's distribution
Mean or $a = np = n \times \frac{a}{n} = a$
and the standard deviation
$\sigma = \sqrt{npq} = \sqrt{n \times \frac{a}{n} \times 1} = \sqrt{a}$
since q is almost equal to 1 when p is very small.
Thus the standard deviation of the Poisson distribution of
example No. 1 solved above would be $\sqrt{.61} = .781$. For the actual
frequencies the value of standard deviation comes to .78, which is
very close to the expected value of the standard deviation.
Thus it is clear that the above series is an excellent example of
Poisson distribution.
In fact the Poisson's distribution applies to such cases where we
can count the number of times an event happens but where it is futile
to find out the number of times it did not happen. Thus in the
above illustration we can count the number of times a man was
killed by a horse kick but it is meaningless and out of question to
find out the number of times a man was not killed by a horse kick.
Similarly it is possible to count the number of goals scored in football
matches by a particular team but it is not possible to know the number
of times goals were not scored by the said team. Here p or the
probability of the happening of the event is very small and so q is almost
equal to unity. In such cases we cannot use the binomial expansion
$(p+q)^n$ because the value of p is unknown. As such Poisson's
distribution raises the equation
Therefore
$P_0 = 2.7183^{-.44} = \frac{2093}{3250}$
and in 325 pages the expected frequency against 0 mistake per page
would be
$\frac{2093}{3250} \times 325 = 209.3$
Similarly
$P_1 = 2.7183^{-.44} \times \frac{.44}{1} = \frac{2093}{3250} \times \frac{44}{100}$
and the expected frequency against one mistake per page would be
$\frac{2093}{3250} \times \frac{44}{100} \times 325 = 92.1$
Similarly
$P_2 = \frac{2093}{3250} \times \frac{(.44)^2}{2} = \frac{2093}{3250} \times \frac{1936}{20000}$
and the expected frequency against two mistakes per page would be
$\frac{2093}{3250} \times \frac{1936}{20000} \times 325 = 20.3$
Number of mistakes     Observed         Expected
per page               frequency        frequency
0                      211              209.3
1                       90               92.1
2                       19               20.3
3                        5                3.0
4                        0                0.3
6. Two types of electric bulbs have the same average life of 2000 hours. Their
standard deviations are, however, 15 and 20 hours respectively. In each case what
is the chance that the bulbs would not burn longer than 1800 hours ?
7. How would you fit a normal curve to a given frequency distribution ?
8. Fit a normal curve to the following data and find out its mean and standard
deviation : -
Results obtained by W.F.R. Weldon, of 4096 throws of 12 dice each, a throw of
4, 5 or 6 being called a success : -
Success Frequency
0 0
1 7
2 60
3 198
4 430
5 731
6 948
7 847
8 536
9 257
10 71
11 11
12 0
9. Assuming that half the population in India is vegetarian so that the chance
of an individual being a vegetarian is half and assuming further that 500 investigators
each take 12 individuals to see whether they are vegetarians, how many investigators
would you expect to report that four people or less were vegetarians?
10. A card is drawn from a pack of 52 cards and then replaced. The process
is repeated 10 times and the number of black cards drawn is noted. One thousand
such experiments (of drawing 10 cards in each experiment) are conducted and the
results obtained are given below :-
Number of Black Frequency
Cards
0 0
1 8
2 46
3 117
4 210
5 245
6 206
7 120
8 39
9 6
10 4
What theoretical frequency distribution would be expected to apply to the
above data and why? Calculate the theoretical frequencies of the distribution and
see if the fit is good.
11. The district of Rangoon was divided in 100 zones and the number of
direct hits on the residential houses during the flying bomb raids in the last War was
recorded. Results are given below :-
Number of Hits Number of Zones
0 23
1 35
2 23
3 12
4 4
5 2
6 1
Total 100
Which theoretical frequency distribution should apply in the above case and
why? Calculate the theoretical frequencies in that distribution and compare them
with the observed ones.
12. Articles are produced by a factory in large quantities and 3% of them are
found to be defective. They are despatched in batches of equal number. How large
should a batch be to ensure that
(a) not more than 1 in 5;
(b) not more than 1 in 10;
contains more than three defective articles.
13. If the chance of being killed by a motor accident during a year is 1/3000,
use Poisson distribution to calculate the probability that out of 500 persons at least
one would die of motor accident in a year.
14. A person can hit a target one out of twenty times. Use Poisson's
distribution to determine how many trials should be had in order to have 99% chance of
hitting the target at least ten times.
15. Male and female children are born in approximately equal numbers. If
twins are born, in what relative proportion would you expect
(a) two boys;
(b) two girls;
(c) one of each.
16. In what circumstances may a Poisson distribution be used? Give the
general term of Poisson distribution and derive the mean and variance of the
distribution. (P. C. S. 1953).
17. Derive the normal distribution as the limiting form of the symmetrical
binomial distribution.
Assume the mean height of soldiers to be 68.22 in. with a variance of 10.8
(in.)². How many soldiers in a regiment of 1,000 would you expect to be over six
feet tall? (I. A. S., 1956).
18. Give an example of the Poisson distribution, explaining the underlying
stochastic model responsible for it.
In a city with 400 census blocks, each having approximately the same
population, the frequency distribution of the number of cholera cases is as follows :-
No. of cases 0 1 2 3 4
No. of city blocks 160 146 64 25 5
Examine by an appropriate goodness of fit test whether the occurrence of
cholera cases is distributed at random all over the city. (I. A. S., 1957).
19. From records of 10 Prussian army corps kept over 20 years the following
data were obtained showing the number of deaths caused by the kick of a horse.
Determine the average number of deaths per army corps per annum, and calculate the
theoretical Poisson frequencies.
Number of deaths Frequency of
per army corps occurrence
per annum
0 109
1 65
2 22
3 3
4 1
Total 200
20. A London district was divided up into 200 sub-areas and the number of
direct hits on dwelling houses during the flying bomb raids was recorded :
study has been made in such a manner that we can obtain a large variety
of information about the phenomena to which the sample relates, it
would be easy for us to have an idea about similar information relating
to the universe. If, for example, the sample studies relating to
expenditure of selected students in Indian universities have been done
properly, they would give us an idea about the distribution of expenditure
of all the students in Indian universities. Thus the aim of sampling
studies is to obtain the best possible values of the parameters. (The word
parameter is used to indicate various statistical measures like mean,
standard deviation, correlation, etc., in the universe. As against this
the term statistics refers to the statistical measures relating to the sample.)
This aim is best achieved if the sample studies are made in such a way
that they disclose a mathematical relationship between the values of
the distribution. For example, if it is found out that a particular
frequency distribution obtained by a sample study conforms to Binomial,
Normal or Poisson distribution, the parameter values can be very easily
estimated and a high degree of reliance can be placed on them.
Thus a large part of sampling theory is devoted to finding out
some constants of the universe. If they are found out, a very accurate
idea about the parent distribution is obtained from the sampling studies.
Even if only the mean and standard deviation of the universe can be
estimated by some mathematical relationship observed in the sample it
is enough to have an idea about many other parameter values.
Precision in sampling
Since the main aim of sampling studies is to obtain information
about the problem under study in the universe at large, and since
sampling studies are made only from a few units collected out of a
large number constituting the universe, an obvious question that arises
is, "to what extent can we depend on the sample estimates?" It is
clear that if a sample fails to reveal the main characteristics of the
universe, it does not serve the purpose for which it is meant. As such the
question relating to the reliability and dependability of the sample
estimates is a very vital and important question in theory of sampling.
If the sampling studies reveal unmistakably that the observed frequency
distribution conforms to some theoretical frequency distribution the
problem is solved to a considerable extent because then it is possible to
estimate the parameter values with a high degree of accuracy. If,
however, no such mathematical relationship is disclosed the problem has to
be very carefully investigated. The conclusions in the sampling studies are
based not on certainties but on probabilities. The probabilities of some
events are high and of others low and the degree of accuracy of sampling
estimates, naturally, depends on the degree of probability with
which they are made. Thus if out of 1000 people 999 have heights
below 74" we can say with a high degree of confidence that the height
of the 1000th person would also be below 74". Here the probability
of the statement being true is very high, almost touching the realm of
certainty. Similarly the probability of the accuracy of the statements
that a man cannot jump higher than 12 feet or that a man cannot live
for more than 150 years is so high that they are never questioned.
Theoretically speaking a man can be more than 74" high, can jump
higher than 12 feet and can live more than 150 years. As against such
events there are others whose probability of happening is very low and
we cannot make any assertion with even a fair degree of accuracy.
Theory of sampling makes an attempt to indicate the degree of
reliance that can be placed on various estimates obtained from sampling
studies. This is done by assigning limits within which the estimate
is expected to vary. These limits vary with the degree of confidence
which we wish to achieve in our assertions. Thus if we want to assert
a fact with a very high degree of confidence, the limits which shall be
placed will be wide so that the chance of the estimate going beyond
them is minimum. It means that the degree of confidence which can
be put in any estimate is expressed in terms of probability. We can
thus make a statement that the probability that the average monthly
expenditure of university students in India would be within the limits
of Rs. 60 and Rs. 120 is .99. It would mean that the degree of
confidence that we place in the estimates is very high because the probability
of the actual figure being beyond these limits is very low, 1-.99 or
.01.
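How the limits widen as the desired degree of confidence rises can be illustrated with a short Python sketch. Everything numerical here is an assumption made for illustration: the estimate of Rs. 90, the standard error of Rs. 11.65 and the use of the normal curve for the sampling distribution are not given in the text, and the function name `confidence_limits` is hypothetical.

```python
from statistics import NormalDist


def confidence_limits(estimate, standard_error, confidence):
    """Limits within which the true value is expected to lie with the given
    probability, assuming a normally distributed estimate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical figures: estimated average expenditure Rs. 90, standard error Rs. 11.65
for confidence in (0.90, 0.95, 0.99):
    lower, upper = confidence_limits(90, 11.65, confidence)
    print(confidence, round(lower, 1), round(upper, 1))
```

With these assumed figures the limits for a probability of .99 come out close to Rs. 60 and Rs. 120, and the limits for lower probabilities are narrower.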
The accuracy or precision of estimates depends on a variety
of factors. The first is the manner in which the estimate is made from the
sample data. This leads us to the theory of estimation. The second
is the manner in which the sample was obtained. This leads us to the study
of technique of sampling. A third factor is the size of the sample. If
the size of the sample is small much reliance cannot be placed on the
estimate.
We shall now discuss the various types of sampling and see what
is the extent of confidence that can be placed on various samples under
different types of sampling.
Types of sampling
Samples can be selected from a universe in the following three
manners :-
(1) By random sampling.
(2) By purposive sampling.
(3) By mixed sampling.
Random sampling. As has been discussed in earlier chapters random
sampling is one where the individual units constituting the sample are
selected at random. By random selection we mean that the selection
has been done in such a manner that the probability of the inclusion
of each item of the universe in the sample is equal. The selection is
thus entirely objective. The first and by far the most important
question that arises here is how to obtain such a sample. We shall discuss
a little later how a random sample can be selected and how selections
which appear at first thought quite random are in reality not so.
selection is not purely random. As we shall see a little later the
selection of a purely random sample is a very difficult task and one can
never be sure that a particular sample is a perfectly random sample.
Another reason of bias in random sampling may arise when a selected
unit is, for some reason, deleted and a new unit is substituted in its
place. Thus, if a particular student selected at random is not available
for questioning or if a particular house selected at random in a
house-to-house enquiry is vacant, and if a new unit is substituted in such cases,
the sample no more remains a random sample, and the results may be
biased.
Further, the results of random sample enquiry may be rendered
inaccurate if the selected units are not properly investigated or even
where they are investigated properly, faulty reasoning is applied to
draw inferences.
The foregoing discussion clearly shows that if a sample is to be
made free from bias it is necessary that all personal choice should be
eliminated. The human factor must have the least say so far as the
selection of the sample is concerned. The technique of selection should
therefore, be such, that no room is left for the personal whims of the
investigators. As we shall see in the following paragraphs the selection
of samples is now-a-days done by mechanical aids and through such
devices that the human factor has the least chance of affecting the
choice.
Other types of bias which arise due to substitution of new units
in place of selected ones or due to incomplete investigation of the
sampled units can be removed if only proper care is taken in
conducting the enquiry.
SELECTING A RANDOM SAMPLE
on the size of the universe and the size of the sample. Thus, if there
are 100 rooms in a hostel each with a serial number we can select every
tenth room and the student living in it can be included in the sample.
Another arrangement is that the names of the students are written in
alphabetical order and then every tenth student is selected from the
list. Similarly, the units of the universe can be arranged geographically
and the selection of the sample done in accordance with the above
procedure. Thus, if we have to select 10 villages out of 100 in a
particular district, the villages can be arranged geographically so that the
names of villages in different tahsils are noted down and every tenth
village selected.
The above mentioned methods would give a random sample unless
every tenth unit is of a variety different from the common lot, in which
case the sample would no more be random.
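The selection of every tenth unit can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and choosing the starting room at random among the first ten is an assumption added here; the text only says that every tenth room is taken.

```python
import random


def systematic_sample(units, step, rng=random):
    """Take every `step`-th unit, starting from a randomly chosen position
    among the first `step` units (the random start is an assumption)."""
    start = rng.randrange(step)
    return units[start::step]


rooms = list(range(1, 101))  # serial numbers of the 100 hostel rooms
sample = systematic_sample(rooms, 10, random.Random(1))
print(sample)
```

Whatever the starting room, the sample contains ten rooms whose serial numbers differ by ten, which is exactly why the method fails when every tenth unit happens to be of a special kind.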
number to each unit of the universe we assign it, say, 104 numbers, so
that the first unit has numbers 0001 to 0104 and the second unit 0105
to 0208, we shall be in a position to use all the numbers of the table.
Having done this we can select any ten numbers from the tables and
they would constitute the sample. If we get two or more numbers, say
in the 0105 to 0208 group, we can ignore the later numbers once that
unit has been selected.
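The device of giving each unit a block of 104 numbers can be sketched in Python as below. Several things are assumptions made for illustration: the universe is taken to have 96 units (so that 96 × 104 = 9984 four-digit numbers are used), Python's pseudo-random generator stands in for reading numbers from a printed table, and the function names are hypothetical.

```python
import random

NUMBERS_PER_UNIT = 104  # each unit owns a block of 104 consecutive numbers


def unit_for_number(number):
    """Unit to which a four-digit number belongs: 0001-0104 -> 1, 0105-0208 -> 2, ..."""
    return (number - 1) // NUMBERS_PER_UNIT + 1


def select_units(sample_size, n_units, rng):
    """Read four-digit numbers until `sample_size` different units are chosen."""
    highest = n_units * NUMBERS_PER_UNIT
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 9999)  # stands in for a number read from the table
        if not 1 <= number <= highest:
            continue  # number not assigned to any unit
        unit = unit_for_number(number)
        if unit not in chosen:  # ignore numbers falling on a unit already selected
            chosen.append(unit)
    return chosen


sample = select_units(10, 96, random.Random(7))
print(sample)
```

Because every unit owns the same count of numbers, each unit keeps an equal chance of selection while very few numbers of the table are wasted.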
A question that arises at this stage is, what is the guarantee that
these numbers are really random? No proof of it can be given but
experience has shown that the numbers have given very satisfactory
results. Thus the proof of their randomness lies only in the success
of thousands and thousands of repeated investigations that have been
conducted by using them throughout the world. Another set of 1,00,000
numbers has been constructed by Kendall and Babington Smith by
using a randomizing machine. They are also very popular and have
given correct results in a large number of investigations.
(2) In an infinite universe
Collection of data. After a sample has been selected the next step
is to collect the required data from the sample units. For this purpose
we shall have to select an appropriate method of collection of data. We
have already discussed in an earlier chapter the various methods which
are used in the collection of data and their merits and limitations. The
reliability of the results of a sample survey depends to a considerable
extent on the manner in which data have been collected and on the
reliability of the collected information. The fact that a particular
sample is a purely random sample is absolutely no proof that the
inferences drawn from its study are also dependable. Sample surveys
may give most unreliable and inaccurate results even though the
samples are purely random, if the data are not collected properly. If
the investigator or the informant is biased or if the questionnaire
adopted is unsatisfactory or if the method of collection of data is not
appropriate, the results of the survey are bound to be misleading.
Therefore, it is necessary that adequate care be taken in the collection of
data and one should not think that since the sample selected is random
the results ought to be satisfactory.
After the data have been collected, statistical inferences are drawn
from them. It has already been said that sample studies are meant to
draw certain conclusions regarding the universe as a whole, and as such
the generalisations from sample studies should be very carefully made.
The question that naturally arises here is how far would the results of
the sample hold good for the universe as a whole. It is obvious that in
such studies one cannot be dogmatic about the inferences. Usually
the conclusions are stated in a very general form. As has been said
before, the extent of confidence associated with a particular
generalisation is expressed in terms of probability, and invariably certain limits
are laid down within which results are expected to vary. In inductive
reasoning we apply to the unknown (universe) the results of the known
(sample). It is no doubt a leap in the dark, but statistical methods
relating to sampling definitely reduce the hazards and dangers involved
in such an attempt.
THEORY OF SAMPLING
1. Show the necessity of the uses of the method of random sampling in any
extensive investigation. How would you make use of these methods in carrying out
an economic survey of the rural areas of C. P. ? (B. Com., Nagpur, 1948).
2. What is meant by "sample" methods of enquiry? When is it adopted
and what are its advantages ? Describe the test that may be applied to determine
whether the sample is representative or not.
(B. Com., Hons., Andhra, 1944).
3. Discuss the special features of the different types of universes from which
samples can be drawn.
4. What are the main objects of sampling? Compare and contrast the merits
and drawbacks of Sample and Census studies.
5. What do you understand by "random sample?" Is it a synonym for
"representative sample ?" Why is a random sample supposed to speak for the 'population'?
To what types of enquiries is the technique of random sampling specially applicable?
(M. A., Rajputana, 1951).
6. Discuss the various methods of judging the reliability of various types of
sampling studies.
7. "Random sampling owes its importance to the fact that we can assess the
results obtained from it in terms of probability."
Elucidate this statement and also discuss the technique of random sampling
investigation. (M. A., Allahabad, 1950).
8. What do you understand by the term "bias" ? How can bias in samples
be reduced ? Is it possible to completely eliminate bias in sampling studies? If
not, why?
9. Discuss the relative advantages and disadvantages of the method of complete
enumeration and the method of random sample survey in social and economic
enquiries. (M. A., Punjab, 1952).
10. Discuss the important methods of selecting random samples from different
types of universes.
11. How are purposive and mixed samples selected? What are the chief
dangers in the selection of samples by these methods ?
12. "One of the aims of statistics is to describe population (through sample)
and to this end statistical constants are calculated".
Discuss fully the above statement and show how this is sought to be achieved.
(M. A., Patna, 1940).
13. Write a note on the respective merits and demerits of:-
(a) Random sampling,
(b) Purposive sampling,
(c) Mixed sampling.
14. Compare the relative advantages and disadvantages of the method of
complete enumeration and the method of random sample survey. Explain with
reasons the method you will adopt in enquiries relating to :-
(a) Area under rice.
(b) Cost of production of sugarcane in Bihar. (M. A., Patna, 1942).
15. Write a note on the theory of sampling.
16. How would you conduct a sample survey? What special points should
be kept in mind in the :-
(a) selection of a sample,
(b) collection of data.
17. What do you understand by "statistical induction"? What precautions
are necessary in drawing inferences from a sample survey ?
18. What is meant by "precision" in connection with sampling studies? Discuss
how the precision of sampling studies is estimated at various levels of significance.
19. Discuss how the theory of sampling is based on the theory of probability.
20. Write short notes on :
(a) Inertia of large numbers,
(b) Multi-stage-sampling,
(c) Stratified sampling,
(d) Utilisation of variable sampling fractions.
21. If a sample is obtained by selecting every tenth item, what possible bias
could result? Give examples. Why is this not a random sample ?
22. Suppose it is desired to estimate the mean family size in a certain town.
Would the recording of the family size of a random selection of high-school students
be a reasonable way to obtain data ?
Sampling of Attributes 26
We have already pointed out the distinction between statistics of
variables and statistics of attributes in earlier chapters, and have also
discussed various statistical methods which are used in these two types
of statistics for the purpose of analysis of data. In sampling studies
also, we shall discuss the statistics of attributes separately from the sta-
tistics of variables.
The sampling of attributes may be understood as drawing a
sample from a universe which consists of A's and a's. Thus if we
are studying the problem of blindness and if this attribute is represent-
ed by A then a would represent the absence of blindness. In order to
find out percentage of blind people in a particular universe we may
take a sample and study the percentage of blind people in it, so that we
may be in a position to draw certain conclusions about their percentage
in the universe. For the sake of convenience here also we shall call
the drawing of an individual on sampling as an "event", and the pre-
sence of attribute A as "success" (represented by p) and its absence as
"failure" (represented by q). Thus, if we have taken a sample of 200
people and 'if we find that out of them 8 are blind we will say that the
number of events was 200 and out of it there were 8 successes and 192
8 192
failures. The probability of success or p=- and q=-
200 200
Simple sampling
Before proceeding further we shall lay down certain assumptions,
which we presume would hold good in the sample which is under
study. The sampling which satisfies these assumptions would be called
"Simple Sampling." Thus by simple sampling we shall mean a random
sample in which the following conditions hold good :-
(1) The probabilities of drawing individuals with attributes A, or the
chance of success of various events are independent whether previous trials have
been made or not. It means that the proportion of A's at each draw of a
sample unit is identical. This assumption holds good in case of
tossing a coin or drawing a ball or a card provided that before the second
and subsequent draws the ball or card drawn previously is
replaced. Thus the probability of a coin falling "heads" is identical in all
throws and similarly the probability of drawing a black ball from a
bag containing three black and four white balls is identical for all
draws provided there is replacement each time. In actual practice
this condition would not hold good in drawing samples relating to
attributes from a "finite" population. For example, if in the universe
of 10,000 people there are 100 blind persons the probability of drawing a
blind person in the first event is $\frac{100}{10000}$, in the second $\frac{99}{9999}$, in the
third $\frac{98}{9998}$ and so on. It will be noticed, however, that if the number
of items is very large there will be no material difference in the
probabilities of various events even if this condition does not hold
good.
(2) The probability (or p) of drawing an individual with attribute A
remains constant and is the same for all samples. This condition would hold
good only if the proportion of A's in the universe remains constant
each time a sample is drawn. If a die is tossed at two different places
or at two different times the probability of success (if coming of
No. 6 is taken as success) would be identical. This cannot be said
about sampling of attributes if the two samples have been drawn at
two different places (of the same universe) or at two different times. The
proportion of blind in the same universe would not be identical either
at two places or at the same place at different times. In the analysis of
sampling of attributes we presume that this would be so.
The simple sampling is a particular type of random sampling in
which the above conditions hold good.
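Condition (1) can be checked numerically for the blindness illustration. The sketch below, which uses a hypothetical helper `blind_draw_probabilities`, lists the chance of drawing a blind person at each successive draw when every earlier draw also produced a blind person and nothing is replaced.

```python
from fractions import Fraction

def blind_draw_probabilities(blind, total, draws):
    """Chance of a blind person on each successive draw without replacement,
    given that every earlier draw also produced a blind person."""
    return [Fraction(blind - i, total - i) for i in range(draws)]

probs = blind_draw_probabilities(100, 10000, 3)
for draw, p in enumerate(probs, start=1):
    print(draw, p, float(p))

# Largest change in probability over the first three draws
spread = float(probs[0] - probs[-1])
print(spread)
```

With a universe of 10,000 the probability moves only in the fourth decimal place, which is why the condition is treated as satisfied for a large universe.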
It should, however, be kept in mind that in actual practice in
most of the data that we shall come across, these conditions would
not hold good and statistical inferences will have to be made with the
hypothesis that these conditions are satisfied by the data. In certain
biological studies and studies relating to physical sciences these
conditions do hold good. In economic and social studies, however, the
limitations imposed by these conditions do not leave much room for
the application of the rules which we shall discuss below and which
apply in case of simple sampling only.
Mean and standard deviation in simple sampling of attributes
We have already mentioned in earlier chapters that if the probability
of the happening of an event in one trial is known we can find the
probability of its happening r times in n trials by the expansion of
a binomial. If p denotes the chance of success of an event and if 1-p
or q denotes the chance of its failure and if we take N samples with
n events the frequencies of samples with n, (n-1), (n-2), ... successes
are the terms in the series $N(p+q)^n$ or
$N\left\{p^n + np^{n-1}q + \frac{n(n-1)}{1 \times 2}p^{n-2}q^2 + \dots + q^n\right\}$
We have mentioned in the chapter on Theoretical Frequency
Distributions that the mean and standard deviation of such series are
given by the following rules :-
Mean or $M = np$
Standard Deviation or $\sigma = \sqrt{npq}$
$\sigma = \sqrt{\frac{pq}{n}}$
The following examples would illustrate the above rules :-
Example 1. Suppose four coins are tossed simultaneously 1600
times and falling of heads are called successes. We have thus 1600
samples of four tosses each.
Successes    Frequency
    4            90
    3           420
    2           580
    1           410
    0           100
  Total        1600
In the above case the mean of the series is 1.994 and the standard
deviation .99. If the coins are unbiased and if they are properly
thrown the value of p or the probability of success would be $\frac{1}{2}$ and
frequencies of 4, 3, 2, 1 and 0 successes would be the various terms of the
expansion $1600\left(\frac{1}{2}+\frac{1}{2}\right)^4$ or they would be respectively 100, 400, 600, 400
and 100. For this theoretical or expected distribution, the value of mean
would be $np$ or $\left(4 \times \frac{1}{2}\right)$ or 2 and the value of standard deviation
would be $\sqrt{npq}$ or $\sqrt{4 \times \frac{1}{2} \times \frac{1}{2}}$ or 1.
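The comparison made in Example 1 can be reproduced with a short Python sketch. One assumption should be noted: the frequency for 2 successes is taken as 580, the figure that brings the total to 1600 and the mean to 1.994.

```python
from math import comb, sqrt

# Observed distribution of heads in 1600 throws of four coins
# (580 for two heads is an assumed reading; it makes the total 1600)
observed = {4: 90, 3: 420, 2: 580, 1: 410, 0: 100}

N = sum(observed.values())
mean_obs = sum(k * f for k, f in observed.items()) / N
sd_obs = sqrt(sum(k * k * f for k, f in observed.items()) / N - mean_obs ** 2)

# Expected distribution: the terms of N(p + q)^n with n = 4, p = q = 1/2
n, p = 4, 0.5
expected = {k: N * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}
mean_exp = n * p
sd_exp = sqrt(n * p * (1 - p))

print(round(mean_obs, 3), round(sd_obs, 3))
print(expected, mean_exp, sd_exp)
```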
We thus find that there is some difference between observed
values of the mean and standard deviation and their expected values.
These differences may be due to what we have been calling fluctuations
of sampling. The question that arises here is, to what extent can such
differences be assigned to sampling fluctuations and consequently
ignored? We know that samples which are classified according to fre-
quencies of attributes give rise to binomial distribution. A binomial
distribution gives a single humped type of curve when p and q are
equal, or even when they are unequal but the value of the exponent n
is large it gives us a distribution which very closely resembles a
normal distribution. We, however, know that in a normal frequency
distribution 99.73% of the items lie within the limits given by mean
Solution
The total number of throws = 3086 × 12 = 37032.
The chance of success, that is of throwing a
2, 3 or 4 with one die in one throw = $\frac{1}{2}$.
Hence the expected value of successes = $\frac{1}{2} \times 37032 = 18516$.
The observed value of successes is 19142.
Thus, the observed number of successes is in excess of the
expected number by (19142-18516) = 626.
The standard deviation of simple sampling is
$\sigma = \sqrt{npq} = \sqrt{\frac{1}{2} \times \frac{1}{2} \times 37032} = 96.2$
The deviation observed is 6.5 times of this figure and it is,
therefore, most improbable that it is due to fluctuations of sampling.
Example 3. Certain cross of the pea gave 5321 yellow and 1804
green seeds. The expectation is 25 per cent. of green seeds on a
Mendelian hypothesis. Can the divergences from the expected values have
arisen from fluctuations of simple sampling only?
Solution
The total number of pea seeds examined = (5321+1804) = 7125.
The expectation of green seeds is 25 per cent. of the total.
Therefore the expected result is 1781 green seeds. But the observed
result is 1804 green seeds, and so it is in excess of the expected result
by 23. The standard deviation of simple sampling is
$\sigma = \sqrt{npq} = \sqrt{0.25 \times 0.75 \times 7125} = 36.6$.
The divergence from expectation is thus only 0.6 times of this
and hence may very well have arisen from fluctuations of simple
sampling.
Example 4. Balls are drawn from a bag containing equal numbers
of black and white balls, each ball being returned before drawing
another. In 2250 drawings, 1018 black and 1232 white balls have been
drawn. Do you suspect some bias on the part of the drawer ?
Solution
The expectation of drawing a white ball in one draw is $\frac{1}{2}$, since
the bag contains equal number of black and white balls.
In 2250 drawings the expected number of white balls is 1125.
But the number of white balls drawn is 1232. Thus the numerical
difference from the expected result is 107.
The standard deviation of simple sampling is
$\sigma = \sqrt{npq} = \sqrt{\frac{1}{2} \times \frac{1}{2} \times 2250} = 23.7$
The divergence from the expectation is thus about 4.5 times of
this and hence it is not probable that it arose due to fluctuations of
sampling.
Explanation of the deviation must be sought somewhere else, and
it seems reasonable to suspect that the drawer was biased.
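The procedure followed in Examples 2 to 4 is the same each time: find the expected number of successes np, the standard deviation of simple sampling, and the ratio of the observed divergence to that standard deviation. A minimal Python sketch follows; the function name `successes_test` is an assumption made for illustration.

```python
from math import sqrt

def successes_test(n, p, observed):
    """Return expected successes, sqrt(npq), and |observed - expected| / sqrt(npq)."""
    expected = n * p
    sd = sqrt(n * p * (1 - p))
    return expected, sd, abs(observed - expected) / sd

# (number of events, probability of success, observed successes)
examples = {
    "Example 2 (dice)": (37032, 0.5, 19142),
    "Example 3 (green seeds)": (7125, 0.25, 1804),
    "Example 4 (white balls)": (2250, 0.5, 1232),
}

results = {name: successes_test(*args) for name, args in examples.items()}
for name, (expected, sd, ratio) in results.items():
    verdict = "significant" if ratio > 3 else "may be due to sampling"
    print(f"{name}: expected {expected:.0f}, s.d. {sd:.1f}, ratio {ratio:.2f}, {verdict}")
```

The threshold of three standard deviations used in the verdict follows the rule stated in the text for significance.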
In the above example we have calculated the standard deviation
of numbers of the simple sampling. We can similarly calculate the
standard deviation of the proportions of the simple sampling by the
use of the formula already given. The following examples would illus-
trate the rules regarding the calculation of the standard deviation of
proportions of simple sampling.
Example 5. A group of scientific men reported 1,705 sons and
1,527 daughters. Do these figures conform to the hypothesis that the
sex ratio is $\frac{1}{2}$ ?
Solution
The total number of observations is 1705 + 1527 = 3232.
The number of sons is 1705.
Therefore the observed male ratio is $\frac{1705}{3232}$ or 0.5275.
On the given hypothesis the male ratio ought to be 0.5000.
Thus the observed male ratio is in excess of the theoretical ratio
by 0.0275.
The standard deviation of the proportion is
$s = \sqrt{\frac{pq}{n}} = \sqrt{\frac{1}{2} \times \frac{1}{2} \times \frac{1}{3232}} = .0088$.
The divergence from hypothesis is thus about 3.13 times this
standard error and it is, therefore, most improbable that it arose as a
sampling fluctuation. Hence it can be definitely said that the figures
given do not conform to the given hypothesis.
Example 6. 12 dice were thrown 6500 times, 4, 5 or 6 being
reckoned as a "success". What proportion of success do you expect ?
If in actual observation the proportion of success is found to be 0.5016,
find the standard deviation of the proportion with the given number
of throws and state whether you would regard the excess of successes
as probably significant of bias in the dice.
Solution
The total number of throws = 6500 × 12 = 78000.
The expected proportion of success is $\frac{1}{2}$ or 0.5000.
The observed proportion of successes is 0.5016, and thus is in
excess of expected proportion by .0016.
The standard deviation of the proportion is
$s = \sqrt{\frac{pq}{n}} = \sqrt{\frac{1}{2} \times \frac{1}{2} \times \frac{1}{78000}} = .00179$
The deviation observed is only .9 times this figure and it is,
therefore, probable that it arose as a sampling fluctuation. Therefore, the
excess of the proportion of successes is not significant of bias in the
dice.
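Examples 5 and 6 work with proportions rather than numbers, but the steps are the same. The sketch below assumes a hypothetical helper `proportion_test` that returns the standard deviation of the proportion and the ratio of the observed divergence to it.

```python
from math import sqrt

def proportion_test(n, p, observed_proportion):
    """Return sqrt(pq/n) and |observed - p| divided by it."""
    se = sqrt(p * (1 - p) / n)
    return se, abs(observed_proportion - p) / se

# Example 5: 1705 sons out of 3232 births, hypothesis p = 1/2
se5, ratio5 = proportion_test(3232, 0.5, 1705 / 3232)

# Example 6: 78000 throws, observed proportion 0.5016, hypothesis p = 1/2
se6, ratio6 = proportion_test(78000, 0.5, 0.5016)

print(round(se5, 4), round(ratio5, 2))
print(round(se6, 5), round(ratio6, 2))
```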
Standard errors
The standard deviation of simple sampling is briefly called Stan-
dard Error. The term standard error has in reality a wider meaning
than merely the standard deviation of simple sampling. But for the
sake of convenience the term can be defined as mentioned above. Thus
if the difference between the expected and observed frequencies is more
than three times the standard error the difference is said to be
significant, which means that such a difference could not have arisen due to
fluctuations of sampling or the probability of such a difference arising
due to chance is very very low. If the difference is less than three-
On this basis the limits would be 37.5 ± (3 × 2.5) per cent, or 30%
and 45%.
We find that the difference between the two results is very little.
Example 8. 500 eggs are taken at random from a large con-
signment, and 50 are found to be bad. Estimate the percentage of
bad eggs in the consignment and assign limits within which the per-
centage probably lies.
Solution
$\sigma = \sqrt{\frac{1032}{89032} \times \frac{88000}{89032} \times 1000} = 3.4$
As such the range within which p lies is not dependent on the size of
the universe. This is the reason why the size of the universe is not
indicated anywhere in the formula of the standard error. But standard
error is affected by the size of the sample. If, therefore, p is constant
and if n is changed the value of standard error of p would also change.
The value of the standard error varies inversely as the square root of n.
Therefore, if n becomes larger the value of the standard error becomes
smaller. If the value of p is $\frac{1}{2}$ and
the value of n is 100 the value of standard error would be $\sqrt{\frac{1}{2} \times \frac{1}{2} \times \frac{1}{100}}$
or .05 or 5%. If we wish to reduce the standard error to one-half of
its magnitude, that is 2.5%, the value of n should increase four-fold and
not two-fold only. Thus $\sqrt{\frac{1}{2} \times \frac{1}{2} \times \frac{1}{400}} = .025$ or 2.5%.
Standard error and precision
Standard error gives us an idea about the unreliability of a sample.
The greater the standard error the greater is the departure of actual
frequencies from the expected ones and consequently greater is the
unreliability of the sample. The reciprocal of the standard error or
$\frac{1}{\text{standard error}}$ is a measure of reliability of the sample. We have seen
in the chapter on Measures of Dispersion that this value is called
Precision. The reliability or precision of an observed proportion varies
as the square root of the number of items in the sample. To double
the precision (which means the same thing as reducing the standard
error to one-half) the number of observations should be increased
fourfold and to treble the precision the number of observations should be
increased nine-fold.
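The relation between sample size, standard error and precision can be tabulated with a few lines of Python. The function names `standard_error` and `precision` are illustrative assumptions; p is taken as 1/2 as in the text.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a proportion in simple sampling: sqrt(pq/n)."""
    return sqrt(p * (1 - p) / n)

def precision(p, n):
    """Precision is the reciprocal of the standard error."""
    return 1 / standard_error(p, n)

for n in (100, 400, 900):
    print(n, round(standard_error(0.5, n), 4), round(precision(0.5, n), 1))
```

A four-fold sample halves the standard error and a nine-fold sample reduces it to one-third, so the precision is doubled and trebled respectively.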
Standard error of the difference between proportions of two samples
In the questions done so far we have tried to study the difference
between the actual proportions as observed and the expected
proportions in the universe. There may be cases where two samples have been
taken from distinct materials or different populations and they give $p_1$
and $p_2$ as the proportions of A's, the number of observations in the
two samples being $n_1$ and $n_2$ respectively. The question that may arise
here is whether the difference in the two proportions disclosed by the
two samples is significant or there is no real difference between them
and the observed difference is due to fluctuations of sampling, the two
populations being similar so far as the proportion of A's in them is
concerned. In such cases we do not have any idea about the
proportions of A's in the universes from which the samples have been drawn.
However, in such cases we can proceed on the Null Hypothesis, i.e., on
the hypothesis that there is no difference in the values $p_1$ and $p_2$ and
whatever difference is there is due to sampling fluctuations. We can
further assume the value of p in the universe or $p_0$ as the weighted mean
proportion in the two samples taken together. In other words
$p_0 = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$
This is the best possible estimate of $p_0$ that we can have in the
given circumstances. The standard error of the difference between the
proportions in the two samples would be
$\text{s.e.}_{1-2} = \sqrt{p_0 q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
If the observed difference between $p_1$ and $p_2$ is more than three
times the standard error of the difference it is significant, otherwise it
could have arisen due to chance fluctuations and as such can be
ignored.
The following examples would illustrate the above rule :-
Example 10. In a random sample of 1000 persons from town A
400 are found to be consumers of rice. In a sample of 800 from town B
400 are found to be consumers of rice. Discuss the question whether
the data reveal a significant difference between A and B so far as the
proportion of rice consumers is concerned.
Solution
In the two towns together, the percentage of rice consumers is
$p_0 = \frac{(400+400) \times 100}{1000+800} = 44.4$
and therefore
$q_0 = (100 - 44.4) = 55.6$
In town A it is 40 per cent. whereas in town B it is 50 per cent.
The difference in the percentages of rice consumers in the two towns is
10. Assuming that the samples taken are simple samples, the standard
error of sampling for the difference between percentages observed in
the samples of the given sizes would be:
$\sqrt{(44.4 \times 55.6)\left(\frac{1}{1000} + \frac{1}{800}\right)} = 2.357$ per cent.
The actual difference which is 10 per cent. is over 4.2 times this
standard error. So it can be concluded that the data reveal a
significant difference between A and B so far as the proportion of rice
consumers is concerned.
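The pooled test used in Example 10 can be sketched in Python as below. The function name `pooled_difference_test` is an assumption; percentages are used throughout, as in the worked solution.

```python
from math import sqrt

def pooled_difference_test(x1, n1, x2, n2):
    """Test the difference of two sample percentages on the null hypothesis.

    Returns the pooled percentage p0, the standard error of the difference
    sqrt(p0 * q0 * (1/n1 + 1/n2)), and the ratio |p1 - p2| / s.e.
    """
    p1 = 100 * x1 / n1
    p2 = 100 * x2 / n2
    p0 = 100 * (x1 + x2) / (n1 + n2)
    q0 = 100 - p0
    se = sqrt(p0 * q0 * (1 / n1 + 1 / n2))
    return p0, se, abs(p1 - p2) / se

# Town A: 400 rice consumers in 1000; town B: 400 in 800
p0, se, ratio = pooled_difference_test(400, 1000, 400, 800)
print(round(p0, 1), round(se, 3), round(ratio, 2))
```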
Example II. The following table gives the proportion of dark-
coloured people in two cities.
$p_0 = \frac{(105+149) \times 100}{250+450} = 36.3$ approx.
and therefore $q_0 = (100 - 36.3) = 63.7$ approx.
If this were the true percentage, the standard error of sampling
of the difference between percentages observed in samples of the given
sizes would be
$\text{s.e.}_{1-2} = \left\{(p_0 q_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right\}^{\frac{1}{2}}$
The actual difference is I. per cent and is only 1.6 times this
standard error and so could have arisen due to fluctuations of simple
sampling.
Hence, it cannot be reasonably concluded that the product of the
first factory is inferior to that of the second.
Sometimes we may come across cases where the proportions of
A's are not the same in the two materials or universes from which the
samples have been chosen, but $p_1$ and $p_2$ are the real proportions. In
such cases we may be interested in finding out whether the difference
would vanish if further samples were taken. Such a situation usually
arises in questions where association between attributes is studied. The
proportion of A's in the universe of B and in the universe of β may be
different from each other and we can presume that $p_1$ and $p_2$ are the
real proportions and then we can test our hypothesis. We may then
find out whether further samples would also indicate the difference in
the proportion of A's in the universe of B's and β's or whether the
difference has arisen only in the present case due to sampling
fluctuations. Here the standard errors of the two proportions $p_1$ and $p_2$ would
be respectively as follows :-
$\text{s.e.}_1 = \sqrt{\frac{p_1 q_1}{n_1}}$
and
$\text{s.e.}_2 = \sqrt{\frac{p_2 q_2}{n_2}}$
As such the standard error of the difference of $p_1$ and $p_2$ would be
The value of
$p_2 = \frac{200}{4000}$ or .05 or 5%
and of
$\text{s.e.}_{1-2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{10 \times 90}{1000} + \frac{5 \times 95}{4000}}$ per cent.
= 1.009%
$\text{s.e.}_{1-0} = \sqrt{\frac{p_0 q_0}{n_1+n_2} \times \frac{n_2}{n_1}}$
In the present case $p_0$ or the proportion of deaf persons in the universe
is equal to $\frac{300}{5000}$ or .06 or 6%, and $q_0$ therefore = 100-6 or 94%.
The values of $n_1$ and $n_2$ are respectively 1000 and 4000. Substituting
these figures in the above formula we get
$\text{s.e.}_{1-0} = \sqrt{\frac{6 \times 94}{1000+4000} \times \frac{4000}{1000}}$
23. The figures for antitoxin treatment in a hospital, for a certain period, in
the treatment of diphtheria were :-
                        Cases    Deaths
Antitoxin treatment      228       37
Ordinary treatment       337       28
Can it be concluded that there were significantly more deaths in the group
treated by antitoxin ?
24. The following table relates to the hair colour of girls at Bombay.
                              Of Dark        Total      Percent
                              Hair Colour    observed   Dark
Bombay                        21,537         49,507     43.5
Film Sector of Bombay          4,008          9,743     41.1
Non-Film Sector of Bombay     17,529         39,764     44.1
Do you regard the difference observed in the percentages of girls of dark hair
colour in Bombay and its Film Sector as significant?
25. The subject under investigation is the measure of dependence of Tamil
on words of Sanskrit origin. One newspaper article reporting the proceedings of
the Constituent Assembly contained 2025 words of which 729 words were declared
by a literary critic to be of Sanskrit origin. A second article by the same author
describing atomic research contained 1600 words of which 640 words were declared by
the same critic to be of Sanskrit origin. Assuming that simple sampling conditions
hold, estimate the limits for the proportion of Sanskrit terms in the writer's vocabulary
and examine whether there is any significant difference in the dependence of this writer
on words of Sanskrit origin in writing on these two subjects. (I. A. S., 1947).
26. Show how you would test the significance of the difference between the
prevalence of a certain attribute in two given populations from each of which you
could take large samples.
In a random sample of 500 men from a particular district of U.P., 300 are found
to be smokers. Out of 1,000 men from another district, 550 are smokers. Do the
data indicate that the two districts are significantly different with respect to the
prevalence of smoking among men ? (P. C. S., 1953).
Sampling of Variables
(Large Samples) 27
Nature of the problem. In the last chapter we studied the sampling
of attributes and were concerned with the question whether a
particular member of the sample possessed an attribute A or did not possess
it. Now we shall be discussing the sampling of variables and here we
shall come across such individuals which can assume any value of a
variable. Thus in a series relating to heights we are no more
concerned with the question whether a particular individual is tall or not tall
but we have before us individuals who can have any height ranging
from the lowest to the highest. Under such circumstances we cannot
classify the items of a sample in two groups, one possessing an attribute
and the other not possessing it, because in statistics of variables the
values of various items of the sample can range within wide limits.
Theoretically speaking the limits are infinite but for the sake of
convenience and practical considerations the range is limited.
Objects of study. The aims of sampling studies in statistics of
variables are the same as in case of statistics of attributes. Here also we
compare the actual or observed frequencies with those expected under certain
assumptions and try to find out whether the difference can be attributed
to chance. As in sampling of attributes here too we try to obtain one
or two constants for the universe (mean or standard deviation) because if
they are obtained, an idea about the type of parent distribution is easily
formed. In sampling studies relating to variables, as in sampling studies
of attributes our third aim is to assess the reliability of our estimates.
Sampling distribution
In statistics of variables generally the question of finding out the
values of p and q does not arise and as such it is very difficult to obtain
the expected frequencies. If, however, we take a large number of
samples from the same universe and calculate any function (mean or
standard deviation, etc.), we shall have a series relating to the values of
the function. If say, 100 samples are taken and if the mean value of
each sample is found out, we shall have a series relating to the mean
of 100 samples. Similarly we can have a series relating to standard
deviation of 100 samples. Such series relating to the values of a
function are called Sampling Distributions. There is one very important
characteristic of all sampling distributions and it is that they give a more
or less normal distribution. If the number of samples used in the
sampling distribution is large, almost invariably, the sampling
distribution would be a normal distribution even though the parent distribution
from which the samples have been taken is not normal. This is a very
important and useful characteristic which helps in the analysis of samples
But if the size of the sample is large or, in other words, if n is
large, $\sqrt{n-1}$ can be approximated as $\sqrt{n}$ as it would not make an
appreciable difference and the formula then becomes
Standard Error of the Mean = $\frac{\sigma(\text{Sample})}{\sqrt{n}}$
It should be observed that the above formula is derived without
any reference to the form of the parent distribution or the proportionate
size of the sample and is therefore of general application. The standard
error of the mean can give us two limits within which the parameter
mean is expected to fluctuate. Our supposition here is that the
sampling distribution of mean would conform to the normal curve and even
when there is one sample, mean ± 3σ would give us the range within
which the mean value is expected to vary.
Suppose in a given case the sample mean has a value of 40 and the
number of items in the sample is 121. Suppose further that the
standard deviation of the sample is 16.5. Then the
Standard Error of the mean = $\frac{16.5}{\sqrt{121}} = \frac{16.5}{11} = 1.5$
$\text{s.e.}_m = \frac{\sigma_p}{\sqrt{N}}$
where $\sigma_p$ is the standard deviation of the entire population;
and
N is the number of items in the sample.
Substituting the given values, we get
$\text{s.e.}_m = \frac{1248}{\sqrt{900}} = 41.6$ miles.
The observed difference between the two means of mileage is
15232-15180, or 52 miles. This is only 1.25 times this standard error
and so could have arisen through the fluctuations of simple sampling.
Hence it can be said that the divergence of the sample mean from the
population mean is not significant.
Note. When the standard deviation of the population is given,
the standard deviation of the sample should not be used as the latter
is only a substitute for the former.
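The test applied in this solution can be written as a small Python function. The name `mean_divergence` and the argument names are assumptions made for illustration; the figures are those of the solution above.

```python
from math import sqrt

def mean_divergence(mean_a, mean_b, population_sd, n):
    """Return the standard error of the mean and the ratio of the
    observed difference between the two means to that standard error."""
    se = population_sd / sqrt(n)
    return se, abs(mean_a - mean_b) / se

se, ratio = mean_divergence(15232, 15180, population_sd=1248, n=900)
print(se, ratio)   # a ratio below 3 is attributed to sampling fluctuations
```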
, vn
s.e. q=I'36z63--=
VN
Substituting the values of a and N we get for the age of bride-
grooms.
s. e. 1st and 9th Decile
The standard errors of the 2nd and the 8th decile are given by the formula
$$\text{s.e. 2nd and 8th Decile} = 1.42877\,\frac{\sigma}{\sqrt{N}}$$
Substituting the values of σ and N we get for the age of bridegrooms
$$\text{s.e. 2nd and 8th Decile} = \frac{1.42877 \times 8}{\sqrt{250{,}000}} = .023 \text{ year.}$$
The standard errors of the 3rd and the 7th deciles are given by the formula
$$\text{s.e. 3rd and 7th Decile} = 1.31800\,\frac{\sigma}{\sqrt{N}}$$
Substituting the values of σ and N we get for the age of bridegrooms.
$$\text{s.e.}_{md} = 1.25331\,\frac{\sigma}{\sqrt{N}}$$
Substituting the values we get,
$$\text{s.e.}_{md} = \frac{1.25331 \times 8}{\sqrt{250000}} = .020 \text{ year.}$$
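$$\text{s.e. of quartile deviation} = .78672\,\frac{\sigma}{\sqrt{N}}$$
Standard error of the standard deviation or
$$\text{s.e.}_{\sigma} = \frac{\sigma}{\sqrt{2n}}$$
Standard error of variance or
$$\text{s.e.}_{\sigma^2} = \sigma^2\sqrt{\frac{2}{n}}$$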
s. e. I r= J -3-
2.11
Some of the above formulae are illustrated below in the following examples :-
Example 9. For the height distribution of 7575 females, the standard deviation is found to be 2.46 inches. Within what limits may this standard deviation be taken to be correct ?
Solution
Taking the universe to be normal, the standard error of the standard deviation of the height is
$$\text{s.e.}_{\sigma} = \frac{\sigma}{\sqrt{2N}} = \frac{2.46}{\sqrt{2 \times 7575}} = 0.02 \text{ approx.}$$
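As a check on Example 9, the following sketch evaluates the same formula and also forms the limits σ ± 3 s.e. The limits are our own illustration of the question asked; they do not appear in the surviving text, and the function name is hypothetical.

```python
import math

def standard_error_of_sd(sd, n):
    """Standard error of the standard deviation for a normal universe:
    sigma / sqrt(2n)."""
    return sd / math.sqrt(2 * n)

se_sd = standard_error_of_sd(2.46, 7575)

# Illustrative limits: standard deviation +/- 3 standard errors.
lower, upper = 2.46 - 3 * se_sd, 2.46 + 3 * se_sd
print(round(se_sd, 4), round(lower, 2), round(upper, 2))
```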
$$\text{s.e.}_{\sigma^2} = \sigma^2\sqrt{\frac{2}{n}}$$
Substituting the values we get
$$\text{s.e.}_{\sigma^2} = 25\sqrt{\frac{2}{100}} = 3.535$$
Thus the standard error of the variance in the problem is 3.535.
Example 11. The following is the frequency table of Six Months Prime Commercial Paper Rates, January, 1931 to December, 1941.
Class Interval        Frequencies
2.50 to 2.99%              2
3.00 to 3.49               7
3.50 to 3.99              20
4.00 to 4.49              30
4.50 to 4.99              20
5.00 to 5.49              10
5.50 to 5.99               5
6.00 to 6.49               6
Calculate the standard error of the coefficient of variation for the above table.
Standard deviation or
$$\sigma = \sqrt{\frac{\Sigma f dx^2}{n} - \left(\frac{\Sigma f dx}{n}\right)^2} = \sqrt{\frac{66.75}{100} - \left(\frac{19.5}{100}\right)^2} = .79$$
Coefficient of variation is
$$V = \frac{\sigma \times 100}{a} = \frac{.79 \times 100}{4.4} = 17.95$$
The standard error of the coefficient of variation is given by the formula
$$\text{s.e.}_V = \frac{V}{\sqrt{2n}}\sqrt{1 + \frac{2V^2}{10^4}}$$
$$= \frac{17.95}{\sqrt{2 \times 100}}\sqrt{1 + \frac{2 \times (17.95)^2}{10^4}} = 1.3$$
Thus, the standard error of the coefficient of variation for the given distribution is 1.3.
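The whole calculation can be retraced from the frequency table. The sketch below assumes the class frequencies 2, 7, 20, 30, 20, 10, 5 and 6 (total 100) and uses class midpoints; these are our reading of the table, not figures stated separately in the text. Because it works with the unrounded mean and standard deviation, the coefficient of variation differs slightly from the 17.95 obtained above, but the standard error still rounds to 1.3.

```python
import math

# Class midpoints and frequencies as read from the table (assumed).
midpoints = [2.745, 3.245, 3.745, 4.245, 4.745, 5.245, 5.745, 6.245]
freqs = [2, 7, 20, 30, 20, 10, 5, 6]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
sd = math.sqrt(variance)

# Coefficient of variation (per cent) and its standard error.
cv = 100 * sd / mean
se_cv = cv / math.sqrt(2 * n) * math.sqrt(1 + 2 * cv ** 2 / 10 ** 4)
print(round(mean, 2), round(sd, 2), round(cv, 2), round(se_cv, 2))
```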
Solution
The standard error of the calculated r in the sample is
$$\text{s.e.}_r = \frac{1 - r^2}{\sqrt{N}} = \frac{1 - (0.082)^2}{\sqrt{N}} = 0.0089$$
$$= .67 \times \frac{3}{2} = 1.005$$
The standard error of this regression coefficient is
$$\text{s.e.}_b = \frac{\sigma_1\sqrt{1 - r^2}}{\sigma_2\sqrt{N}} = \frac{3\sqrt{1 - (.67)^2}}{2\sqrt{N}} = .028.$$
Thus, the regression coefficient of Hapur prices over Karachi prices is 1.005 and its standard error is .028.
Example 16. For a given group of adults, the coefficient of correlation between height and weight is .6; standard deviations of height and of weight are 3 inches and 12 lbs. respectively and the means in height and weight for the entire group are 69 inches and 145 lbs. respectively. Find out the best estimate of the weight of an individual who is 72 inches tall. Assign limits to this estimate in which in all probability his actual weight would be lying.
Solution
The regression equation of weight (Y) on height (X) is
$$(Y - A_y) = r\,\frac{\sigma_y}{\sigma_x}\,(X - A_x);$$
or
$$Y = r\,\frac{\sigma_y}{\sigma_x}\,(X - A_x) + A_y.$$
Solution
Denoting poor children by A, well-to-do children by α, below normal weight by B, and above normal weight by β, we get,
(AB) = 55     (αB) = 13
(Aβ) = 11     (αβ) = 48
Substituting these values in the formula,
$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$
where Q represents the coefficient of association, we get,
$$Q = \frac{(55 \times 48) - (11 \times 13)}{(55 \times 48) + (11 \times 13)} = \frac{2640 - 143}{2640 + 143} = +.89$$
The standard error of this coefficient of association is
$$\text{s.e.}_Q = \frac{1 - Q^2}{2}\sqrt{\frac{1}{(AB)} + \frac{1}{(A\beta)} + \frac{1}{(\alpha B)} + \frac{1}{(\alpha\beta)}}$$
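A short sketch of both calculations, using the four class frequencies of the example. The function names are our own, and the numerical value of the standard error is computed here as an illustration; it is not quoted from the text.

```python
import math

def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Coefficient of association Q for a 2 x 2 table."""
    num = ab * alpha_beta - a_beta * alpha_b
    den = ab * alpha_beta + a_beta * alpha_b
    return num / den

def se_yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Standard error of Q: (1 - Q^2)/2 * sqrt(sum of reciprocals)."""
    q = yule_q(ab, a_beta, alpha_b, alpha_beta)
    recip = 1 / ab + 1 / a_beta + 1 / alpha_b + 1 / alpha_beta
    return (1 - q ** 2) / 2 * math.sqrt(recip)

# (AB) = 55, (A beta) = 11, (alpha B) = 13, (alpha beta) = 48
q = yule_q(55, 11, 13, 48)
se_q = se_yule_q(55, 11, 13, 48)
print(round(q, 3), round(se_q, 3))
```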
Up till now we have discussed the various formulae for the calculation of the standard error of a measure calculated from a sample, and on its basis we have tried to lay down the limits within which the parameter value of that particular measure is expected to vary. We may, many times, come across problems of a slightly different type where the results of two samples may be before us and we may then have to test whether there is a significant difference between the results of the two samples and to find out whether two such samples could have come from the same universe. We shall discuss below some problems of this type.
Standard error of the difference of sample means
Suppose two samples have given us two mean values and we have to find out whether there is a significant difference between the two values, or whether they could have come from one universe or from universes having the same mean and standard deviation. Here we shall calculate the standard error of the difference of the two sample means and then we shall find out whether the difference between them is more than, say, three times the standard error of the difference. If the actual difference between the two means is more than thrice the standard error of the difference it is said to be significant; otherwise the difference can be due to fluctuations of sampling.
$$\sqrt{\sigma_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$\sqrt{\sigma_p^2 \times \frac{n_2}{n_1(n_1 + n_2)}}$$
In all the above cases we presume that simple sampling conditions hold good. If, however, the two samples are from such universes between which there is a correlation, the standard error of the difference of sample means would be
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - 2r\,\frac{\sigma_1 \times \sigma_2}{\sqrt{n_1 n_2}}}$$
Solution
Supposing the two samples have been drawn quite independently, the standard error of the difference of the two means is
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Substituting the given values, we get,
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\frac{(17.41)^2}{1821} + \frac{(16.04)^2}{746}} = .715$$
Solution
Since the two samples are independent and come from the same universe under simple sampling conditions, the standard error of the difference of two mean heights is
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\sigma_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = .47$$
The observed difference of the two mean heights is .91 inches, which is only two times this standard error and so could have arisen through the fluctuations of simple sampling. Hence it cannot be said that the mean height of the seniors is greater than that of the new entrants.
Example 21. The mean produce of wheat of a sample of 100 fields comes to 200 lbs. per acre with a standard deviation of 10 lbs. Another sample of 150 fields gives the mean at 220 lbs. with a standard deviation of 12 lbs. Assuming the standard deviation of the yield at 11 lbs. for the universe, find out if there is a significant difference between the mean yields of the two samples.
Solution
Supposing the samples are independent and come from the same universe, the standard error of the difference of the mean yields would be :
$$\sqrt{\sigma_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(11)^2\left(\frac{1}{100} + \frac{1}{150}\right)} = 1.42$$
The observed difference between the two means is 20 lbs., which is more than thrice the standard error of the difference of means. Hence the difference is significant and could not have arisen due to fluctuation of sampling.
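Example 21 can be verified with the sketch below, which assumes means of 200 and 220 lbs. and a universe standard deviation of 11 lbs. (our reading of the damaged figures; they reproduce the stated standard error of 1.42). The function name is illustrative only.

```python
import math

def se_diff_means_same_universe(pop_sd, n1, n2):
    """Standard error of the difference of two sample means when both
    samples come from one universe with standard deviation pop_sd."""
    return math.sqrt(pop_sd ** 2 * (1 / n1 + 1 / n2))

se = se_diff_means_same_universe(11, 100, 150)
ratio = (220 - 200) / se     # observed difference in units of s.e.
significant = ratio > 3      # the "thrice the standard error" rule
print(round(se, 2), round(ratio, 1), significant)
```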
Example 22. A random sample of 100 villages in a district gives the mean population of 500 persons per village. Another sample of 150 villages from the same district gives the mean at 504. If the standard deviation of the population of villages in that district is 20, find out if the mean of the first sample is significantly different from the combined mean of the two samples taken together.
Solution
The combined mean of the two samples
$$= \frac{(100 \times 500) + (150 \times 504)}{100 + 150} = 502.4$$
The difference between the first sample mean and the combined mean thus is (502.4 - 500) = 2.4.
The standard error of the difference of the first sample mean and the combined mean is
$$= \sqrt{\sigma_p^2 \times \frac{n_2}{n_1(n_1 + n_2)}}$$
Substituting the values we get
$$= \sqrt{(20)^2 \times \frac{150}{100(100 + 150)}} = \sqrt{2.4} = 1.549$$
The observed difference is less than thrice the standard error of the difference and as such it could have arisen due to fluctuation of sampling. Hence, the difference is not significant.
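The steps of Example 22 can be retraced as follows. The sketch assumes a universe standard deviation of 20, the value that reproduces the standard error of 1.549 obtained in the solution; variable names are our own.

```python
import math

n1, mean1 = 100, 500
n2, mean2 = 150, 504
pop_sd = 20  # assumed: the value consistent with the worked solution

# Combined mean of the two samples, weighted by sample size.
combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# s.e. of the difference between the first mean and the combined mean.
se = math.sqrt(pop_sd ** 2 * n2 / (n1 * (n1 + n2)))

diff = combined_mean - mean1
significant = diff > 3 * se
print(combined_mean, round(se, 3), round(diff, 1), significant)
```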
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - 2r\,\frac{\sigma_1 \times \sigma_2}{\sqrt{n_1 n_2}}} = \sqrt{3.9793} = 1.99$$
Thus the standard error of the difference of the two mean scores is 1.99. The observed difference is 114 - 110, or 4, which is only twice this standard error and so could have arisen through the fluctuations of simple sampling. Hence, the difference is not significant.
Standard error of the difference between two sample medians
The standard error of the difference of two sample medians is obtained by the formula
$$\text{s.e.}_{md_1 - md_2} = \sqrt{\text{s.e.}_{md_1}^2 + \text{s.e.}_{md_2}^2}$$
The following example illustrates the formula :-
Example 24. Two samples of 100 and 80 students are taken with a view to find out their average monthly expenditure. It is found out that the median monthly expenditure for the first group is Rs. 85 and for the second group is Rs. 100.
The standard deviation for the first group is Rs. 7 and for the second, Rs. 8.
Examine if the difference between the median monthly expenditure of the two samples is statistically significant.
Solution
Assuming that the conditions of simple sampling hold true, the standard error of the difference of the median monthly expenditures is
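$$\sqrt{\frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}}$$
$$\sqrt{\frac{\sigma_p^2}{2} \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$\sqrt{\frac{\sigma_p^2}{2} \times \frac{n_2}{n_1(n_1 + n_2)}}$$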
The following examples would illustrate the above formulae.
Example 25. In a sample of 1000 the mean is found to be 17.5 and the standard deviation 2.5. In another sample of 800 the mean is 18 and the standard deviation 2.7. Assuming that the samples are independent, discuss whether the two samples could have come from universes which have the same standard deviation.
Solution
Supposing that the samples are drawn independently, the standard error of the difference of their standard deviations is
$$\text{s.e.}_{\sigma_1 - \sigma_2} = \sqrt{\frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}}$$
Substituting the values, we get
$$\sqrt{\frac{(2.5)^2}{2 \times 1000} + \frac{(2.7)^2}{2 \times 800}} = .088.$$
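A brief sketch of Example 25. The concluding comparison with three standard errors is our own completion of the argument, following the rule used elsewhere in the chapter; the function name is hypothetical.

```python
import math

def se_diff_sds(sd1, n1, sd2, n2):
    """Standard error of the difference of two sample standard
    deviations for independent samples."""
    return math.sqrt(sd1 ** 2 / (2 * n1) + sd2 ** 2 / (2 * n2))

se = se_diff_sds(2.5, 1000, 2.7, 800)
ratio = (2.7 - 2.5) / se   # observed difference in units of s.e.
significant = ratio > 3
print(round(se, 3), round(ratio, 2), significant)
```

On these figures the difference of .2 is a little over twice its standard error, so by the three-standard-error rule the two universes could have the same standard deviation.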
$$\text{s.e.}_{\sigma_1 - \sigma_2} = \sqrt{\frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}}$$
Solution
Supposing that the two samples are independent and have come from a universe which is normal, the standard error of the difference of the standard deviations is
$$\text{s.e.}_{\sigma_1 - \sigma_2} = \sqrt{\frac{\sigma_p^2}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\frac{(3.5)^2}{2}\left(\frac{1}{100} + \frac{1}{200}\right)} = \sqrt{\frac{6.125 \times 3}{200}} = \sqrt{.091875} = .303$$
The observed difference in the two standard deviations is 1, which is slightly more than thrice the standard error of the difference of deviations, and hence the difference is significant.
Example 28. The standard deviation of the weights of 746 United States soldiers of French extraction was shown to be 16.03 pounds. At the time of demobilization weight measurements were made not only of the United States soldiers of French extraction, but of 80,000 soldiers of all types. For the entire group the value of $\sigma_p$ was 17.06 pounds. Does the observed σ for the French differ significantly from this value ?
Solution
Supposing that the distribution is normal, the standard error of the difference of the standard deviations of weight of the soldiers of French extraction and of all types is
$$\text{s.e.}_{\sigma_1 - \sigma} = \sqrt{\frac{\sigma_p^2}{2} \times \frac{n_2}{n_1(n_1 + n_2)}}$$
Substituting the values, we get,
$$\text{s.e.}_{\sigma_1 - \sigma} = \sqrt{\frac{(17.06)^2}{2} \times \frac{79254}{746(746 + 79254)}} = .44$$
The observed difference of the standard deviations is 1.03 pounds, which is only 2.3 times this standard error and so can be attributed to the fluctuations of sampling. Hence, the data do not reveal a significant difference between the two standard deviations.
Example 29. The standard deviation of the wages of 400 textile workers is Rs. 12.8; another sample of 600 textile workers gives the standard deviation at Rs. 15.7. Find out if the standard deviation of the first sample significantly differs from the combined standard deviation of the two samples, which is Rs. 14.
Solution
Supposing that the distribution is normal, the standard error of the difference of the standard deviation of the first sample and the combined standard deviation of the two samples is
$$\sqrt{\frac{\sigma^2}{2} \times \frac{n_2}{n_1(n_1 + n_2)}} = \sqrt{\frac{98 \times 600}{400 \times 1000}} = \sqrt{.147} = .38.$$
The observed difference between the standard deviation of the first sample and the combined standard deviation is 1.2, which is more than thrice the standard error. Hence the difference is significant.
1. Under what assumptions would you test the significance of mean and standard deviation in large samples ?
2. What do you understand by simple sampling of variables ? What limitations are imposed by the assumption that simple sampling conditions are satisfied by a particular sampling study ?
3. Calculate the standard error of the mean from the following data collected in one of the many random sample inquiries conducted to find out average earning of a particular class :-
Earnings per month in rupees     Number of Persons
up to 10                                50
up to 20                               150
up to 30                               300
up to 40                               500
up to 50                               700
up to 60                               800
up to 70                               900
up to 80                              1000
(M. Com., 1951).
4. The~ frequency table 11\oW8 the diltribution of 1110 observations
made oa 149 tty price leries duriog ten bllSiDCII cycles :
Dluation of cycle Frequency
(in months)
7.50 to 12.49 months 7
U.5O to 17.49 27
17.50 to 22.49 61
22.50 to 27.49 115
27.50 to 32.49 139
32.50 to 37.49 1~
37.50 .to 42.49 167
42.50 to 47.49 124
47.50 to 52.49 122
52.50 to 57.49 67
57.50 to 62.49 52
62.50 to 67.49 15
67.50 to 72.49 15
72.50 to 77.49 8
77.50 to 82.49 2
82.50 to 87.49 2
87.50 to 92.49 o
92.50 to 97.49 1
Calculate the average duration of the cycle for this distribution and state that
within what limits is your average correct?
5. To test if a small electric current affected the growth of wheat seedlings, 100 pairs of plants were grown in parallel boxes and one member of each pair was treated by receiving a small electric current. The mean difference in heights between the treated and the untreated (treated - untreated) is +4 mm. with a standard deviation of 2 mm.
Does the electric current exercise any influence on the growth of the plants ?
6. If the standard deviation of pulse rate in adults is 8 and the normal pulse
rate is 70 would you say a high pulse rate was diagnostic if a group of 64 people
suffering from a disease were found to have a pulse rate of 75?
7. The following table shows the frequency distribution of yield of wheat in maunds per acre in 998 irrigated fields selected at random in the province of Punjab.
Limits in mds. No. of fields
0- 4 ~ 45
4- 8 184
8-12 281
12--16 228
16-20 155
20-24 71
24-28 22
28--32 5
32--36 1
Calculate the sampling error of the mean.
8. Suppose that the standard deviation of stature in men is 2.48 inches. One
hundred male students in a large university are measured and their average height is
found to be 68.52 inches. Determine the 98 per cent. confidence limits for the mean
height of the men of the university.
9. The average breaking strength of steel rods is specified to be 18.5 thousand
pounds. To test this a sample of 100 rods was tested and gave the- following results:-
Breaking Strength No. of rods.
15,000-16,000 12
16,000-17,000 20
17,000-18,000 40
18,000-19,000 20
19,000-20,000 8
Do the results of the test justify the hypothesis ?
10. A sample of 900 members is found to have a mean of 3.4 cm. Can it be reasonably regarded as a simple sample from a large population with mean 3.2 cm. and standard deviation 2.3 cm. ?
11. A certain colliery is supposed to supply coal of ash content about 15%. One hundred samples of coal of the company when analysed gave an ash content of 16.8% with a standard deviation of 8%. Do these results justify the company's claim ?
12. What is the standard error of the standard deviation if the parent distribution is (a) normal (b) not normal ?
It is known that the mean and standard deviation of a variable are respectively 100 and 10 in the universe. It is, however, considered sufficient to draw a sample of sufficient size but such as to ensure that the mean of the sample would be, in all probability, within 0.01% of the true value. How much would the cost be (exclusive of overhead charges) if the charges for drawing 100 members of a sample be one rupee ?
(I. A. S., 1947).
13. The following table gives the frequency distribution of weights for adult males born in Punjab.
31. In a wheat variety test conducted over a wide area, the mean difference between two varieties was found to be 5.5 bushels to the acre. The standard error of this difference was 1.4 bushels per acre, and was determined from 100 pairs of plots. Set up the fiducial limits at the 5% probability level for the mean difference in yield between the two varieties.
32. The median height of 100 M. Com. students is 66" with a standard deviation
of 3" and the median height of 121 M. A. (Economics) students is 64" with a standard
deviation of 4". Are M. Com. students taller than M. A. (Economics)
students?
33. For two groups of students it was found that there was no significant
difference in their mean weights. However, the standard deviation in the first
group (of 100 students) was 10 lbs. and in the second group (of 150 students) 13
Ibs. Is there a significant difference in the two standard deviations ?
34. To find out the mean weight of girl students two independent samples of 100 each are taken which reveal no significant difference between the means; for the first sample the standard deviation is 2.5 inches while for the second it is 3.5 inches. Test the significance of the difference between these two measures taking the standard deviation of the universe to be 2.85.
35. The following data relate to the wages paid to workers in a certain industry.
                                1st sample     2nd sample
Number of workers                  586            648
Mean monthly wage                Rs. 52.5       Rs. 47.5
Standard deviation of wages         10             11
Find out if the mean wage and the standard deviation of wages of the first sample significantly differ from the combined mean and the combined standard deviation of the two samples.
36. Two hundred observations drawn independently from a population have a mean equal to 30 and standard deviation of 15.
To examine the null hypothesis that the hypothetical mean is 20 the procedure followed is to compute $w = (30 - 20) \div 15/\sqrt{200}$ and to verify whether this exceeds 1.96.
Explain clearly the basis of this test, assuming that the variable under study in the population is not known a priori to follow a normal distribution.
If you are told that the distribution in the population is symmetrical about 20, do you suspect the sampling procedure if 150 observations out of 200 exceed the value 20 ?
(I. A. S., 1957).
37. The heights of a random sample of 1,304 Scotsmen had a mean of 68.5456", with standard deviation 2.480". A random sample of 6,194 Englishmen gave an average height of 67.4375", and a standard deviation of 2.548". Is this difference significant ?
38. In an ordnance factory two different methods of shell filling are compared. The average and standard deviation of weights in a sample of 96 shells filled by an old process are 1.26 lbs. and 0.013 lbs., and a sample of 72 shells filled by a new process gave an average of 1.28 lbs. with a standard deviation of 0.011 lbs. Is the difference in average weights significant ?
39. Two chemists estimate the acidity of a jar of dilute hydrochloric
acid. Their results are an average of 10.162 with a standard deviation of 0.23 based
on 15 determinations and 10.341 with standard deviation of 0.12 based on 24,
determinations. What is the level of significance of the difference between the chemists'
estimates of the acidity ?
40. If it is known that the variance of the length of life of electric light bulbs is 2,500 and if we obtain for 25 light bulbs a mean life of 500 hours, determine a 95 per cent confidence-interval estimate of the population mean.
41. The mean life of stockings used by an army was 40 days with a standard deviation of 8 days. Assume the life of the stockings follows a normal distribution. If 100,000 pairs are issued, how many would need replacement after 35 days ? After 46 days ?
42. On an examination a class of 18 students had a mean of 70 with σ = 6. Another class of 21 had a mean of 77 with σ = 8, on the same examination. Is there reason to believe that one class is significantly better than the other ? Consider the classes as samples from some universe. What might the universe be ?
43. A machine is set to turn out ball bearings having a radius of 1 centimetre (allowable error ± .01 centimetre). A sample of 10 ball bearings produced by this machine has a mean radius of 1.004 centimetres with σ = .003. Is there reason to suspect the machine is turning out ball bearings having a mean radius greater than 1.0 centimetre ?
44. A sample of 25 workmen showed an average increase in wages of 45 cents per hour with σ = 10 cents. Give 90 per cent confidence limits for the mean wage increase in the population from which the sample was drawn. State the assumptions used.
Chapter 28: Chi-square Test and Goodness of Fit
Meaning. In the 19th chapter, while discussing the study of association in manifold classification or in contingency tables, we had mentioned that the value of chi-square is used to study the divergence of actual and expected frequencies, and that from this value the coefficient of contingency is calculated to find out if there is any association between the attributes in question. Thus chi-square is a measure of the actual divergence of the observed and expected frequencies. The importance of such a measure is obviously very great in sampling studies, where we have invariably to study the divergence between theory and fact. In sampling studies we never expect that there will be perfect coincidence between expected and observed frequencies, and the question that we have to tackle is about the extent to which the difference between expected and observed frequencies can be ignored as arising due to fluctuations of sampling.
Chi-square, as we have seen, is a measure of the actual difference between the expected and observed frequencies, and as such if there is no difference between expected and observed frequencies the value of chi-square is 0. If there is a difference between the observed and the expected frequencies then the value of chi-square would be more than 0. But the difference between the expected and observed frequencies may also be due to fluctuations of sampling, and a value of chi-square which can arise from such fluctuations should be ignored in drawing inferences. Such values of chi-square under different conditions are usually available in the shape of tables, and if the calculated value of chi-square is more than that given in the table it indicates that the difference between expected and observed frequencies is not solely due to sampling fluctuations, and that there is some other reason for it. If, on the other hand, the calculated value of chi-square is less than the table value it indicates that the difference between expected and observed frequencies may have arisen due to chance fluctuations and can be ignored.
In this way the chi-square test enables us to find out whether the divergence between theory and fact, or between expected and actual frequencies, is significant or not. If the calculated value of chi-square is very small as compared to its table value it indicates that the divergence between actual and expected frequencies is very little and consequently the fit is good. If, on the other hand, the calculated value of chi-square is very big as compared to its table value it indicates that the divergence between expected and observed frequencies is very great and consequently the fit is poor.
Before going into further details of the chi-square test we shall introduce the reader to certain terms used in this connection.
Degrees of freedom
The term degrees of freedom refers to the number of "independent constraints" in a set of data. We shall illustrate this concept with a few examples. Suppose there is a 2 × 2 association table and the actual frequencies of the various classes are as follows :-
              A             α           Total
B          (AB) 22       (αB) 38          60
β          (Aβ)  8       (αβ) 32          40
Total         30            70           100
Suppose that we presume that the two attributes A and B are independent; then the expected frequency of the class (AB) would be (30 × 60)/100 or 18. Now once we decide the expected frequency of the class (AB), the expected frequencies of the remaining three classes are automatically fixed. Thus for the class (αB) the expected frequency must be 60 - 18 or 42, and similarly for the class (Aβ) the frequency must be 30 - 18 or 12, and for (αβ) it must be 70 - 42 or 28. It means then that so far as this table is concerned we have only one choice of our own, and in the remaining three classes we have no freedom to fill in the frequencies as we like. It means that we have only one degree of freedom so far as this table is concerned. There is one independent constraint here and three constraints are dependent. In such tables the degrees of freedom are calculated by the formula:
$$v = (c - 1)(r - 1)$$
where v stands for the degrees of freedom, c for the number of columns and r for the number of rows.
Thus in a 2 × 2 table degrees of freedom are (2 - 1)(2 - 1) or 1. Similarly in a 3 × 3 table, degrees of freedom are (3 - 1)(3 - 1) or 4, and in a 3 × 4 table the degrees of freedom are (3 - 1)(4 - 1) or 6.
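The reasoning above can be sketched in a few lines of Python. The helper names are our own; the observed table is taken with rows B, β and columns A, α, which is how we read the damaged layout.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence:
    (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def degrees_of_freedom(table):
    """v = (c - 1)(r - 1) for a contingency table."""
    return (len(table[0]) - 1) * (len(table) - 1)

# Rows: B, beta; columns: A, alpha (assumed layout).
observed = [[22, 38], [8, 32]]
expected = expected_frequencies(observed)
dof = degrees_of_freedom(observed)
print(expected, dof)
```

Only the first expected cell (18) is freely determined; the other three (42, 12 and 28) follow from the marginal totals, which is why the table has one degree of freedom.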
If the data are not given in the shape of contingency tables as above but are given in the shape of a series of individual observations or discrete or continuous series, then the degrees of freedom are calculated in a different way. Take the following illustration relating to 1024 throws of ten coins each, which gives the following distribution :-
Number of Heads     Actual Frequencies
0                          2
1                         10
2                         38
3                        106
4 188
5 :on
6 "2(>
7 u.S
8 59
9 7
10 3
Total 1024
value of χ² is more than the table value it means that the difference is significant, and if it is less it indicates that the difference could have arisen due to chance fluctuations and as such can be ignored. The second method is more convenient and easy and is generally used.
We shall now give below some illustrations in which the value of χ² has been calculated and conclusions drawn from it. As we have seen in the chapter on Association of Attributes, the value of χ² is calculated by the formula
$$\chi^2 = \Sigma\left[\frac{(f - f')^2}{f'}\right]$$
where f stands for the observed frequency and f′ for the corresponding expected frequency.
Example. The following table shows the number of people interviewed by age-groups and the number in each age group estimated to have peptic ulcers.
Age group           15-20   20-25   25-35   35-45   45-55   55-65   65-75   Total
Nos. Interviewed     199     300    1128    1375    1089     625     155    4871
P. U. Cases            1       8      38      96     105      56      12     316
Do these figures justify the hypothesis that peptic ulcer is equally prevalent in all age groups ?
Solution
If peptic ulcer was equally prevalent in all age groups then in each age group (316/4871 × 100) or 6.5% of the people should suffer from it. On this basis the observed and expected frequencies would be as follows :-
Age group           15-20   20-25   25-35   35-45   45-55   55-65   65-75
Observed Cases         1       8      38      96     105      56      12
Expected Cases        13    19.5      73      89      71    40.5      10
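We get,
$$\chi^2 = \frac{(18-20)^2}{20} + \frac{(19-20)^2}{20} + \frac{(23-20)^2}{20} + \frac{(21-20)^2}{20} + \frac{(16-20)^2}{20}$$
$$+ \frac{(25-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(21-20)^2}{20} + \frac{(15-20)^2}{20}$$
$$= \frac{1}{20}\{4 + 1 + 9 + 1 + 16 + 25 + 4 + 0 + 1 + 25\} = 4.3$$
The number of degrees of freedom
= (c - 1)(r - 1) = (10 - 1)(2 - 1)
= 9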
Test whether the colour of the son's eyes is associated with that of the father's (you may use the fact that the 5% value of chi-square for 1 degree of freedom is 3.84).
Solution
Let us take for our hypothesis the supposition that the eye colour of sons and the eye colour of fathers are independent. If this be true, the expected frequencies would be :
Eye colour in sons
not light     light
We get,
$$\chi^2 = \frac{(230-144)^2}{144} + \frac{(148-234)^2}{234} + \frac{(151-237)^2}{237} + \frac{(471-385)^2}{385}$$
$$= (86)^2\left\{\frac{1}{144} + \frac{1}{234} + \frac{1}{237} + \frac{1}{385}\right\} = 133.37.$$
The number of degrees of freedom
= (c - 1)(r - 1) = (2 - 1)(2 - 1)
= 1.
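The sum above can be reproduced with a small helper. This is a sketch only: `chi_square` is our own name, and the observed frequencies (230, 148, 151, 471) are our reading of the damaged figures, chosen so that each differs from its expected value by 86.

```python
def chi_square(observed, expected):
    """Chi-square as the sum of (f - f')^2 / f' over all classes."""
    return sum((f - fe) ** 2 / fe for f, fe in zip(observed, expected))

observed = [230, 148, 151, 471]   # assumed reading of the eye-colour data
expected = [144, 234, 237, 385]   # expected frequencies used in the text

chi2 = chi_square(observed, expected)
table_value = 3.84                # 5% value for 1 degree of freedom
print(round(chi2, 2), chi2 > table_value)
```

Direct evaluation gives about 133.39, agreeing with the text's 133.37 up to rounding; either way the value is far above 3.84, so the hypothesis of independence is rejected.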
Solution (1)
We take for our hypothesis the supposition that inoculation and survival are independent attributes. If this be true, the expected frequencies would be :-
                             Died of anthrax     Survived
Inoculated with vaccine             4                8
Not Inoculated                      4                8
Substituting the observed and the expected frequencies in the formula
$$\chi^2 = \Sigma\left[\frac{(f - f')^2}{f'}\right]$$
We get,
$$\chi^2 = \frac{(2-4)^2}{4} + \frac{(10-8)^2}{8} + \frac{(6-4)^2}{4} + \frac{(6-8)^2}{8}$$
$$= (2)^2\left\{\frac{1}{4} + \frac{1}{8} + \frac{1}{4} + \frac{1}{8}\right\} = 3.$$
The number of degrees of freedom is
= (c - 1)(r - 1) = (2 - 1)(2 - 1)
= 1.
The value of χ² at 5% level of significance for one degree of freedom is 3.841. The calculated value, which is 3, is thus less than it. Hence there is no cause to suspect the hypothesis and the data do not suggest that survival is associated with inoculation. Whatever association there appears to be between these two from a direct comparison may be due to fluctuations of sampling.
Solution (2)
There is a simple formula for calculating χ² for this type of table, i.e., the (2 × 2) fold table, given by Brandt and Snedecor.
Representing the (2 × 2) fold table as follows :-
b1     b2     Tb
c1     c2     Tc
T1     T2     T
χ² is given by
$$\chi^2 = \frac{(b_1 c_2 - b_2 c_1)^2\,T}{T_1 \cdot T_2 \cdot T_b \cdot T_c}$$
We multiply diagonal frequencies and find the difference between the two products. The difference is squared and multiplied by the grand total, and the result is divided by the product of the sub-totals.
In our case, the table is as follows
 2     10     12
 6      6     12
 8     16     24
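A minimal sketch of the shortcut, applied to the anthrax table (2, 10 in the first row and 6, 6 in the second). The function name is our own. It gives the same value of 3 as the longer method of Solution (1).

```python
def chi_square_2x2(b1, b2, c1, c2):
    """Shortcut for a 2 x 2 table:
    (b1*c2 - b2*c1)^2 * T / (T1 * T2 * Tb * Tc)."""
    tb, tc = b1 + b2, c1 + c2      # row sub-totals
    t1, t2 = b1 + c1, b2 + c2      # column sub-totals
    t = tb + tc                    # grand total
    return (b1 * c2 - b2 * c1) ** 2 * t / (t1 * t2 * tb * tc)

chi2 = chi_square_2x2(2, 10, 6, 6)
print(chi2)
```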
we get,
$$\chi^2 = \frac{(12-17.7)^2}{17.7} + \frac{(26-20.3)^2}{20.3} + \frac{(16-10.3)^2}{10.3} + \frac{(6-11.7)^2}{11.7}$$
$$= (5.7)^2\left\{\frac{1}{17.7} + \frac{1}{20.3} + \frac{1}{10.3} + \frac{1}{11.7}\right\} = 9.367$$
The number of degrees of freedom is
= (c - 1)(r - 1) = (2 - 1)(2 - 1)
= 1.
The value of χ² for one degree of freedom at 5% level of significance is 3.841. The calculated value is thus greater than this value. So the hypothesis is incorrect and the association between the two attributes is significant. Thus it is established that the vaccine is effective in controlling susceptibility to tuberculosis.
Example 6. Two investigators draw samples from the same town in order to estimate the number of persons falling in the income-groups "poorer", "middle class", "well-to-do". (The limits of the groups are defined in terms of money and are the same for both investigators.) Their results are as follows :-
Investigator                 Income-group
               "Poorer"   "Middle class"   "Well-to-do"   Total
A                 140          100              15          255
B                 140           50              20          210
Total             280          150              35          465
Show that the sampling technique of at least one of the investigators is suspected.
Solution
Let us take as our hypothesis the supposition that the sampling technique of both the investigators is the same and beyond suspicion. If this be true, and further, as both the samples have been drawn from one population, the two samples should contain almost the same proportion of "poorer", "middle class" and "well-to-do" people. On this basis the theoretical frequencies would be :-
Investigator                 Income-group
               "Poorer"   "Middle class"   "Well-to-do"   Total
A                 154           82              19          255
B                 126           68              16          210
Totals            280          150              35          465
Substituting the observed and the theoretical frequencies in the formula
$$\chi^2 = \Sigma\left[\frac{(f - f')^2}{f'}\right]$$
We get,
$$\chi^2 = 13.38$$
The number of degrees of freedom
= (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2.
The value of χ² for 2 degrees of freedom at 5% level of significance is 5.991, which is much less than the calculated value. Hence the hypothesis does not hold ground and suspicion arises in the sampling technique of at least one of the investigators.
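The same test can be run directly from the observed table, letting the expected frequencies be computed from the marginal totals instead of being rounded to whole numbers as in the solution. This is a sketch with an illustrative function name; because the expected values are not rounded, the result (about 13.15) differs a little from the figure obtained in the text, without changing the conclusion.

```python
def chi_square_from_table(observed):
    """Chi-square and degrees of freedom for an r x c contingency table,
    with expected frequencies derived from the marginal totals."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    total = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / total
            chi2 += (obs - exp) ** 2 / exp
    dof = (len(rows) - 1) * (len(cols) - 1)
    return chi2, dof

# Investigators A and B; poorer, middle class, well-to-do.
chi2, dof = chi_square_from_table([[140, 100, 15], [140, 50, 20]])
print(round(chi2, 2), dof, chi2 > 5.991)
```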
12     8     4     24
Substituting these values in the formula, we get
$$\chi^2 = \frac{(24)^2}{12 \times 12}\left\{\frac{(9)^2}{12} + \frac{(2)^2}{8} + \frac{(1)^2}{4} - \frac{(12)^2}{24}\right\}$$
$$= \frac{576}{144}\{6.75 + 0.5 + 0.25 - 6\} = 4 \times 1.5 = 6$$
The number of degrees of freedom is
(c - 1)(r - 1) = (2 - 1)(3 - 1) = 2
The value of χ² for 2 degrees of freedom at 5% level of significance is 5.99. The calculated value is thus slightly greater than this value. Hence the treatment may be considered to have some positive effect.
Where the probability of the happening of an event is known and the actual frequency of the number of times the event happened is given, χ² can be calculated directly from the following formula :-
$$\chi^2 = \frac{\left(a - \frac{p}{q}\,b\right)^2}{\frac{p}{q}\,n}$$
Where
a = observed frequency
b = total number of observations less a
p = probability of the happening of the event
q = probability of not happening of the event
n = total frequency.
The following examples would illustrate this formula :
Example 8. In a certain hospital during a certain week, 50 babies were born out of whom 30 were males and 20 females. Using the χ² test ascertain if this distribution is inconsistent with the observation that the sex ratio among births is 1 : 1.
Solution
Representing the chance of a male birth by p and that of a female birth by q, so that p = q = .50, and substituting a = 30, b = 20 and n = 50 in the formula
$$\chi^2 = \frac{\left(a - \dfrac{p}{q}\,b\right)^2}{\dfrac{p}{q}\,n}$$
we get
$$\chi^2 = \frac{\left(30 - \dfrac{.50}{.50} \times 20\right)^2}{\dfrac{.50}{.50} \times 50} = \frac{100}{50} = 2$$
The number of degrees of freedom is 1. The value of χ² for 1 degree of freedom at 5% level of significance is 3.841. The calculated value is less than this figure. Hence the observed sex ratio during the given week is not significantly different from the general observation of sex ratio which is 1 : 1.
The value of χ² in the above question could also have been calculated in the usual manner as follows:

                Observed     Expected
                Frequency    Frequency
Male births     30           25
Female births   20           25
Total           50           50

$$\chi^2 = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} = \frac{25}{25} + \frac{25}{25} = 2$$
Example 9. Upon the basis of past experience the fatality rate from malaria fever for a certain community was found to be 12.5 per cent (that is, reported deaths from malaria fever ÷ reported cases of malaria fever = .1250). A survey was made of certain congested areas. The homes studied were selected as nearly as possible at random and a fatality ratio of 30% (45 deaths) was found for 150 cases of malaria fever.
Substituting a = 45, b = 105, p = .125, q = .875 and n = 150 in the formula
$$\chi^2 = \frac{\left(a - \dfrac{p}{q}\,b\right)^2}{\dfrac{p}{q}\,n}$$
we get
$$\chi^2 = \frac{\left(45 - \dfrac{.125}{.875} \times 105\right)^2}{\dfrac{.125}{.875} \times 150} = 42$$
The number of degrees of freedom is 1. The value of χ² for one degree of freedom at 1% level of significance is 6.64, which is much less than the calculated value. Hence the sample fatality rate represents a significant departure from the population fatality rate from malaria fever.
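By the formula
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
we shall get the same value of χ².
Thus
$$\chi^2 = \frac{(45 - 18.75)^2}{18.75} + \frac{(105 - 131.25)^2}{131.25} = 36.75 + 5.25 = 42$$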
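
That the direct formula and the usual formula must agree can be verified for both examples. The following is a short sketch only; the function names `chi_square_direct` and `chi_square_usual` are our own and do not come from any library.

```python
def chi_square_direct(a, n, p):
    """Direct formula: (a - (p/q) b)^2 / ((p/q) n), where b = n - a and q = 1 - p."""
    b = n - a
    q = 1 - p
    ratio = p / q
    return (a - ratio * b) ** 2 / (ratio * n)


def chi_square_usual(a, n, p):
    """Usual formula applied to the two classes 'happened' and 'did not happen'."""
    observed = [a, n - a]
    expected = [n * p, n * (1 - p)]
    return sum((f_o - f_e) ** 2 / f_e for f_o, f_e in zip(observed, expected))


# Example 8: 30 male births out of 50, p = .5
births_direct = chi_square_direct(30, 50, 0.5)
births_usual = chi_square_usual(30, 50, 0.5)

# Example 9: 45 deaths out of 150 cases, p = .125
malaria_direct = chi_square_direct(45, 150, 0.125)
malaria_usual = chi_square_usual(45, 150, 0.125)

print(births_direct, births_usual, malaria_direct, malaria_usual)
```

Algebraically both expressions reduce to (a - np)² / npq, so the agreement is not an accident of these particular figures.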
Expected values
The above examples illustrate the use of χ² in testing the significance of the difference between the observed and the expected values. The expected values must be very carefully obtained. Where we are dealing with data about throws of a coin we know the theoretical frequencies a priori. In many other cases the expected frequencies can be calculated on the basis of the past records. Thus we can know from past records the incidence of malaria or tuberculosis, but in such cases there is need of caution. The incidence of the disease in the past and the present may not be alike. In association tables the expected frequencies can be calculated on the hypothesis that the attributes in question are independent of each other.
Conditions for the application of chi-square test
While applying the χ² test the following conditions should be observed:
1. The number of observations must be reasonably large. It is necessary because if the number of items is small the values f_o - f_e, or the differences between the observed and expected frequencies, would not be normally distributed. In the χ² test our presumption is that these differences give a normal distribution. It is difficult to say what should be the least number of observations, but as far as possible the number should not be less than 50.
2. No theoretical or expected frequency should be very small. Ordinarily they should not be less than 5. If the theoretical frequencies are smaller than this number, then adjoining classes should be merged together.
3. In the χ² test we calculate the probability of getting on random sampling a value of χ² equal to or more than the observed value, and if the probability is very low we conclude that there is a significant divergence between observed and expected frequencies. Usually if p is less than .05 the divergence between theory and fact is supposed to be significant. If, however, the value of p is high it is no conclusive proof that the difference between the observed and expected frequencies is insignificant. All that can be said is that so far as the χ² test is concerned the difference is insignificant, and the data and the hypothesis are in agreement. If the value of p is very large, say near unity, then also there is reason to suspect the hypothesis, because in actual practice there is never a very close agreement between observed and expected frequencies. Very close agreements are as rare as very great divergences.
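
The second condition, the merging of adjoining classes, can be sketched as follows. This is one possible way of doing it and not a prescribed rule: classes are accumulated from the first onwards until the expected frequency reaches the minimum, and any small remainder at the end is added to the last merged class. The function name and the frequencies used are illustrative assumptions.

```python
def merge_small_classes(observed, expected, minimum=5.0):
    """Merge adjoining classes until every expected frequency is at least `minimum`."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for f_obs, f_exp in zip(observed, expected):
        acc_obs += f_obs
        acc_exp += f_exp
        if acc_exp >= minimum:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    # A remainder that is still too small joins the last merged class
    if acc_obs > 0 or acc_exp > 0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


# Illustrative frequencies with a thin upper tail
observed = [11, 35, 50, 48, 24, 15, 9, 7, 3, 1, 1]
expected = [15, 29, 48, 43, 35, 21, 9, 3, 1, 0, 0]
merged_obs, merged_exp = merge_small_classes(observed, expected)
print(merged_obs)
print(merged_exp)
```

It should be remembered that the degrees of freedom are to be counted on the number of classes left after merging.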
The additive property of χ²
χ² has a very useful property of addition. If a number of sample studies have been conducted in the same field then the results can be pooled together for obtaining an accurate idea about the real position. Suppose ten experiments have been conducted to test whether a particular vaccine is effective against a particular disease. Now here we shall have ten different values of χ² and ten different values of ν (degrees of freedom). We can add the ten values of χ² to obtain one value and similarly the ten values of ν can also be added together. Thus, we shall have one value of χ² and one value of the degrees of freedom. Now we can test the results of all these ten experiments combined together and find out the value of p.
Suppose five independent experiments have been conducted in a particular field. Suppose in each case there was one degree of freedom and the following values of χ² were obtained:
Experiment Number   Value of χ²   Degrees of Freedom
1 4.3 1
2 5.7 1
3 1..1 1
4 3.9 1
5 8.3 1
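
The pooling of results by the additive property may be sketched as below. The three values of χ² used here are hypothetical and are chosen only to show the effect; they are not taken from the table above. The 5% table values for 1, 2 and 3 degrees of freedom are the usual ones.

```python
def pool_chi_square(results):
    """Add the chi-square values and the degrees of freedom of independent experiments."""
    total_chi2 = sum(chi2 for chi2, _ in results)
    total_dof = sum(dof for _, dof in results)
    return total_chi2, total_dof


# 5% table values of chi-square for 1, 2 and 3 degrees of freedom
TABLE_VALUE_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815}

# Hypothetical experiments: (value of chi-square, degrees of freedom)
experiments = [(2.9, 1), (3.1, 1), (3.5, 1)]

individually_significant = [chi2 > TABLE_VALUE_5_PERCENT[dof]
                            for chi2, dof in experiments]
pooled_chi2, pooled_dof = pool_chi_square(experiments)
pooled_significant = pooled_chi2 > TABLE_VALUE_5_PERCENT[pooled_dof]

print(individually_significant)
print(round(pooled_chi2, 2), pooled_dof, pooled_significant)
```

In this illustration no single experiment reaches the 5% level, yet the pooled value does, which is precisely the use of the additive property.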
Discuss whether there is any significant difference between the two communities in the matter of tea taking.
8. The table given below shows the data obtained during an epidemic of cholera:

                 Attacked   Not Attacked   Total
Inoculated       31         469            500
Not Inoculated   185        1315           1500
Total            216        1784           2000

Test the effectiveness of inoculation in preventing the attack of cholera.
[Five per cent value of χ² for one degree of freedom is 3.84.]
(Indian Audit and Accounts Service, 1941).
9. Two treatments A and B were tried to control a certain type of plant disease.
The following results were obtained : -
A : 200 plants were examined and 40 were found infected.
B : 200 plants were examined and 10 were found infected.
Is treatment B superior to treatment A ?
10. The following table shows fractional test meal results in a series of pathologically verified cases of ulcer and the cancer of stomach:

                Achlor-   Hypochlor-             Hyperchlor-
                hydria    hydria       Normal    hydria        Total
Chronic ulcer   3         7            35        9             54
Cancer          22        2            6         0             30
Total           25        9            41        9             84

Find the value of χ² on the hypothesis that the dice were unbiased and hence show that the data are consistent with this hypothesis so far as the χ² test is concerned.
17. The number of males in each of 106 eight-pig litters was found, and they are given by the following frequency distribution:
Number of males per litter   0  1  2  3   4   5   6   7  8  Total
Frequency                    0  5  9  22  25  26  14  4  1  106
Assuming that the probability of an animal being male or female is even (i.e., p = q = ½), and the frequency distribution follows the binomial law, calculate the expected frequencies of the nine classes. Find also the value of χ² to test the goodness of fit. (Punjab, M. A., Maths., 1946).
18. Ten coins are tossed 1024 times and the following frequencies observed
No. of heads 0 1 2 3 4 5 6 7 8 9 10
Frequencies 2 10 38 106 188 257 226 128 59 7 3
How does this compare with a normal distribution ?
19. In experiments on pea-breeding, Mendel obtained the following frequencies of seeds: 315 round and yellow; 101 wrinkled and yellow; 108 round and green; 32 wrinkled and green. Total, 556. Theory predicts that the frequencies should be in the proportions 9 : 3 : 3 : 1.
Examine the correspondence between theory and experiment.
20. Genetic theory states that children having one parent of blood type M and the other of blood type N will always be of one of the three types M, MN, N and that the proportions of these types will on average be as 1 : 2 : 1. A report states that out of 300 children having one M parent and one N parent, 30% were found to be type M, 45% type MN and the remainder type N. Test the hypothesis by χ² test.
(I. A. S.).
21. Given the following actual and theoretical normal frequencies (total 400), test the goodness of fit (degrees of freedom 10).
Actual       4, 11, 17, 29, 43, 56, 58, 63, 61, 25, 20, 9, 4
Theoretical  4.6, 7.9, 16.8, 30.3, 44.7, 59.1, 65, 60.4, 47.5, 31.16, 18.25, 8.5, 5.3
22. Test the following data for goodness of fit (volume of trading on New
York Stock Exchange expressed as percentage of straight line trend, 1897-1913).
Class Marks 0 1 2 3 4 5 6 7 8 9 10
Observed frequencies 11 35 50 48 24 15 9 7 3 1 1
Theoretical frequencies 15 29 48 43 35 21 9 3 1 0 0
Here n=204. Is the theoretical frequency curve a good fit to the observed
data?
23. In a certain cross the types represented by BC, Bc, bC and bc are expected to occur in a 9 : 3 : 3 : 1 ratio. The actual frequencies obtained were:
BC    Bc   bC   bc
102   16   35   7
Test the goodness of fit of observation to theory.
24. Test the goodness of fit of observation to theory for the following ratios:
       Observed Values    Theoretical Values
       A      a           A     a
(1)    134    34          3     1
(2)    240    120         3     1
(3)    76     56          1     1
(4)    243    13          15    1
25. (a) Describe some applications and the limitations of chi-square test.
(b) If for one-half of n events the chance of success is p and the chance of failure q, while for the other half the chance of success is q and the chance of failure p, what is the standard deviation of the number of successes, the events being all independent? Is the mean number of successes np in this case? (P. C. S., 1956).
26. Suppose that, in a public opinion survey, answers to the questions
(a) Do you drink?
(b) Are you in favour of local option on sale of liquor?
were tabulated as below:
             Question (a)
         Yes    No    Total
Yes      56     31    88
No       18     6     24
Total    74     37    112
Can you infer that opinion on option is dependent on whether or not an individual drinks?
Values of χ² on levels of significance P
Degrees of freedom   P=0.05   P=0.10
1                    3.84     2.71
2                    5.99     4.60
3                    7.82     6.25
(P. C. S., 1953).
27. Explain the technique of the χ² test of goodness of fit, pointing out the precautions to be observed in its applications.
Two sample polls of votes for two candidates A and B for a public office are taken, one from among residents of urban areas, and the other from residents of rural areas. The results are given below. Examine whether the nature of the area is related to voting preference in this election.
Votes for
A B Total
35. Four hundred and ninety-two candidates for scientific posts gave particulars of their university degrees and their hobbies. The degrees were in either maths., chemistry or physics, and the hobbies could be classified roughly as music, craftswork, reading or drama. Every candidate, therefore, represents one degree and one hobby. Do you find any association between the two?
Music        24   83    17   124
Craftswork   11   62    28   101
Reading      32   121   34   187
Drama        10   26    44   80
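$$t = \frac{\bar{x} - m}{\sqrt{\dfrac{\sum d^2}{n-1}}}\,\sqrt{n} = \frac{\bar{x} - m}{\sigma}\,\sqrt{n}$$
If standard deviation has been calculated by the formula
$$\sigma = \sqrt{\frac{\sum d^2}{n}}$$
then the value of "t" can be written as
$$t = \frac{\bar{x} - m}{\sigma}\,\sqrt{n-1}$$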
* This correction, which is given by the ratio of the standard error of the small samples to the standard error of the true mean, is known and is not to be calculated. It has been discovered and measured for all sizes of small samples by mathematical statisticians and is available in the shape of tables.
† See Chapter XI.
Average height of the sample = Σm / n = 67 inches.
Standard deviation of the sample is
$$\sigma = \sqrt{\frac{\sum d^2}{n-1}} = \sqrt{\frac{88}{10-1}} = 3.13 \text{ inches}$$
Substituting these values in the formula
$$t = \frac{\bar{x} - m}{\sigma}\,\sqrt{n}$$
where x̄ represents the mean of the sample, m the mean of the universe, σ the standard deviation of the sample and n the number of observations, we get
$$t = \frac{67 - 65}{3.13}\,\sqrt{10} = 2.02$$
The number of degrees of freedom is n-1 = 9.
The value of t for 9 degrees of freedom at 5% level of significance is 2.262. The calculated value of t is 2.02 and is less than the table value, so at 5% level of significance this difference could have arisen due to fluctuation of sampling and it can be said that the mean height in the universe is 65 inches.
We can interpret this result in terms of p also. On random sampling the probability of getting a value equal to or higher than 2.02 for nine degrees of freedom is more than .05. When p = .1 and degrees of freedom = 9 the value of t = 1.833. Thus the value of p in the present problem is somewhere between .05 and .1. If the value of p is .05 and t = 2.262 it means that the probability is 5%, or in only one case out of 20 this difference could have arisen due to chance. Similarly if p is .1 and t is 1.833 it means that the probability is 10%, or in only one case out of 10 such difference could have arisen due to chance. In the present example, it is in one case out of about 15 (between 10 and 20) that such a difference could have been due to chance fluctuations. It means that we can say with a fair degree of confidence that the difference may have arisen due to fluctuations of sampling.
Example 2. Ten individuals are chosen at random from a population and their heights are found to be, in inches, 63, 63, 66, 67, 68, 69, 70, 70, 71 and 71. In the light of these data discuss the suggestion that the mean height in the population is 66".
Solution
In the above question the value of the mean of the sample or x̄ would come to 67.8" and the value of standard deviation or σ would come to 3.011".
As such the value of
$$t = \frac{\bar{x} - m}{\sigma}\,\sqrt{n} = \frac{67.8 - 66}{3.011}\,\sqrt{10} = 1.89$$
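
The arithmetic of this example may be checked with the short sketch below. It is only an illustration; the function name `one_sample_t` is our own, and the second height is read as 63, an assumption which reproduces the stated mean of 67.8.

```python
import math


def one_sample_t(values, hypothesised_mean):
    """Return (mean, standard deviation, t, degrees of freedom) for a small sample."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((v - mean) ** 2 for v in values)
    # Standard deviation with n - 1 in the denominator, as used for small samples
    sd = math.sqrt(sum_sq_dev / (n - 1))
    t = (mean - hypothesised_mean) / sd * math.sqrt(n)
    return mean, sd, t, n - 1


heights = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
mean, sd, t, dof = one_sample_t(heights, 66)
print(round(mean, 1), round(sd, 3), round(t, 2), dof)
```

The value of t so obtained is then compared with the table value for n - 1 degrees of freedom in the usual way.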
0 0
0 -3 9
6 3 9
-2 -5 25
1 -2 4
5 2 4
0 -3 9
4 1 1
Substituting these values in the formula
$$t = \frac{\bar{x} - 0}{\sigma}\,\sqrt{n-1}$$
We get,
$$t = \frac{2.58 - 0}{2.97}\,\sqrt{12-1} = 2.89$$
the formula $\sqrt{\sum x^2 / n}$. In order to adjust the difference we have modi-
Solution
1    23   24   +1    0    0
2    20   19   -1   -2    4
3    19   22   +3    2    4
4    21   18   -3   -4   16
5    18   20   +2    1    1
6    20   22   +2    1    1
7    18   20   +2    1    1
8    17   20   +3    2    4
9    23   23    0   -1    1
10   16   20   +4    3    9
11   19   17   -2   -3    9
Sum of the differences = 11, Σd² = 50
$$\sigma = \sqrt{\frac{\sum d^2}{n-1}} = \sqrt{\frac{50}{11-1}} = 2.24$$
We get,
$$t = \frac{1 - 0}{2.24}\,\sqrt{11} = 1.48$$
The number of degrees of freedom is 11-1 or 10.
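
The working of the table above can be reproduced from the two columns of figures alone. The following is a minimal sketch; the function name `paired_t` and the labels `first` and `second` for the two columns are our own.

```python
import math


def paired_t(first, second):
    """t for paired observations, on the hypothesis that the mean difference is zero."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = sum(differences) / n
    sum_sq_dev = sum((d - mean_diff) ** 2 for d in differences)
    sd = math.sqrt(sum_sq_dev / (n - 1))
    t = (mean_diff - 0) / sd * math.sqrt(n)
    return differences, sum_sq_dev, t, n - 1


first = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
second = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 17]
differences, sum_sq_dev, t, dof = paired_t(first, second)
print(sum(differences), sum_sq_dev, round(t, 2), dof)
```

The test is made on the differences and not on the two columns separately, so that only one set of n - 1 degrees of freedom is available.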
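Batch No.   1   2   3   4   5   6   7   8   9   10
Lab. A      7   8   7   3   8   6   9   4   7   8
Lab. B      9   8   8   4   7   7   9   6   6   6

Batch No.   Difference   Deviation from the    Square of the
            (B - A)      mean difference (.3)  deviation
1           +2           +1.7                  2.89
2            0           -.3                   .09
3           +1           +.7                   .49
4           +1           +.7                   .49
5           -1           -1.3                  1.69
6           +1           +.7                   .49
7            0           -.3                   .09
8           +2           +1.7                  2.89
9           -1           -1.3                  1.69
10          -2           -2.3                  5.29
Total       +3                                 Σd² = 16.10
$$\sigma = \sqrt{\frac{\sum d^2}{n-1}} = \sqrt{\frac{16.1}{10-1}} = 1.33$$
Substituting these values in the formula
$$t = \frac{\bar{x} - 0}{\sigma}\,\sqrt{n}$$
We get
$$t = \frac{.3 - 0}{1.33}\,\sqrt{10} = .71$$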
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where x̄₁ and x̄₂ are the means of the samples, n₁ and n₂ the number of items in the two samples and S the standard deviation of the difference between two samples. The value of S is calculated as follows:
$$S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}}$$
where x₁ and x₂ are the values of various items in the two samples.
It will be observed that this formula of the calculation of t is again based on the formula of testing the significance of the difference of two sample means, when samples are large. The value of t is found out by obtaining the ratio of the actual difference between the two mean values to the standard error of the difference of two sample means.
As we have seen in earlier chapters, in case of large samples the standard error of the difference of two sample means is equal to
$$\sqrt{\sigma_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \quad \text{or} \quad \sigma_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
In case of small samples also the same calculations are done. σ_p is replaced by S, which is the standard deviation of the difference of two samples and is calculated in the manner indicated above. In place of the sum of n₁ and n₂ the figure which is used is 2 less than this, because the degrees of freedom in such cases are equal to n₁+n₂-2. The following examples would illustrate the above formula:
Example 6. The figures below are for protein tests of the same variety of wheat grown in two districts. In district 1 the average for 5 samples is 12.74, and in district 2 the average for 7 samples is 13.03. If these are the only figures available, test the significance of the difference between the average proteins for the two districts.
Protein Results
District 1......  12.6  13.4  11.9  12.8  13.0
District 2......  13.1  13.4  12.8  13.5  13.3  12.7  12.4
Solution
Calculation of the standard error of the difference of the average proteins for the two districts 1 and 2.

            District 1                                 District 2
Protein    Deviation from   Square of     Protein    Deviation from   Square of
results    the average      the           results    the average      the
           (12.74)          deviation                (13.03)          deviation
(x₁)       (x₁-x̄₁)          (x₁-x̄₁)²      (x₂)       (x₂-x̄₂)          (x₂-x̄₂)²
12.6       -.14             .0196         13.1       +.07             .0049
13.4       +.66             .4356         13.4       +.37             .1369
11.9       -.84             .7056         12.8       -.23             .0529
12.8       +.06             .0036         13.5       +.47             .2209
13.0       +.26             .0676         13.3       +.27             .0729
                                          12.7       -.33             .1089
                                          12.4       -.63             .3969
           Σ(x₁-x̄₁)² = 1.2320                        Σ(x₂-x̄₂)² = .9943

The difference between the average proteins of the two districts is 13.03-12.74, or .29. The standard error of the difference is
$$S = \sqrt{\frac{1.2320 + .9943}{5 + 7 - 2}} = \sqrt{.2226} = .472$$
Substituting these values in the formula
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
We get
$$t = \frac{.29}{.472 \times \sqrt{\dfrac{1}{5} + \dfrac{1}{7}}} = 1.05$$
The number of degrees of freedom is n₁+n₂-2 = 5+7-2 = 10.
The value of t for 10 degrees of freedom at 5% level of significance is 2.228. The calculated value is considerably less than this figure. Hence, it can be concluded that the average protein contained by wheat in the two districts does not differ significantly.
Note: ± signs have no significance while we compare the calculated and the table values of t.
Example 7. The heights of six randomly chosen sailors are in inches: 63, 65, 68, 69, 71 and 72. Those of ten randomly chosen soldiers are: 61, 62, 65, 66, 69, 69, 70, 71, 72 and 73. Discuss the light that these data throw on the suggestion that soldiers are, on an average, taller than sailors.
Solution
Calculation of the standard error of the difference of the mean heights of the sailors and the soldiers.

            Sailors                                   Soldiers
Height in   Deviation from   Square of    Height in   Deviation from   Square of
inches      av. height       deviation    inches      av. height       deviation
            (68 inches)                               (67.8 inches)
63          -5               25           61          -6.8             46.24
65          -3               9            62          -5.8             33.64
68           0               0            65          -2.8             7.84
69          +1               1            66          -1.8             3.24
71          +3               9            69          +1.2             1.44
72          +4               16           69          +1.2             1.44
                                          70          +2.2             4.84
                                          71          +3.2             10.24
                                          72          +4.2             17.64
                                          73          +5.2             27.04
            Σ(x₁-x̄₁)² = 60                            Σ(x₂-x̄₂)² = 153.6

$$S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{60 + 153.6}{6 + 10 - 2}} = \sqrt{15.26} = 3.9$$
We get
$$t = \frac{68 - 67.8}{3.9 \times \sqrt{\dfrac{1}{6} + \dfrac{1}{10}}} = .099$$
The number of degrees of freedom is
n₁+n₂-2 = 6+10-2, or 14.
The value of t for 14 degrees of freedom at 5% level of significance is 2.145. The calculated value is much less than this value. So the difference between the mean heights of sailors and soldiers is not significant. Hence the suggestion that soldiers are on the average taller than sailors is not borne out by these data.
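
The calculation of t for two small samples can be sketched as below, using the heights of the sailors and the soldiers given in this example. The function name `pooled_two_sample_t` is our own; the sketch follows the formula for S stated earlier and is offered only as a check on the arithmetic.

```python
import math


def pooled_two_sample_t(sample_1, sample_2):
    """Return (t, degrees of freedom, S) for the difference of two sample means."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    ss_1 = sum((x - mean_1) ** 2 for x in sample_1)
    ss_2 = sum((x - mean_2) ** 2 for x in sample_2)
    # S: sums of squares of both samples divided by n1 + n2 - 2
    S = math.sqrt((ss_1 + ss_2) / (n1 + n2 - 2))
    t = (mean_1 - mean_2) / (S * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, S


sailors = [63, 65, 68, 69, 71, 72]
soldiers = [61, 62, 65, 66, 69, 69, 70, 71, 72, 73]
t, dof, S = pooled_two_sample_t(sailors, soldiers)
print(round(t, 3), dof, round(S, 2))
print("significant at 5% (table value 2.145):", abs(t) > 2.145)
```

Only the absolute value of t is compared with the table value, in keeping with the note that the sign has no significance in this comparison.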
Example 8. Eight pots growing three wheat plants each were exposed to a high tension discharge, while nine similar pots were enclosed in an earthed wire cage. The number of tillers in each pot were as follows:
Caged         17  26  18  25  27  28  26  23  17
Electrified   16  22  16  16  21  18  15  20
Calculation of the standard error of the difference of the mean tillers of the two samples.

             Caged                                  Electrified
Pot   Tillers   Deviation   Square of     Pot   Tillers   Deviation   Square of
      (x₁)      from mean   deviation           (x₂)      from mean   deviation
                (23)                                      (18)
1     17        -6          36            1     16        -2          4
2     26        +3          9             2     22        +4          16
3     18        -5          25            3     16        -2          4
4     25        +2          4             4     16        -2          4
5     27        +4          16            5     21        +3          9
6     28        +5          25            6     18         0          0
7     26        +3          9             7     15        -3          9
8     23         0          0             8     20        +2          4
9     17        -6          36
n₁ = 9    Σ(x₁) = 207    Σ(x₁-x̄₁)² = 160    n₂ = 8    Σ(x₂) = 144    Σ(x₂-x̄₂)² = 50

Mean tiller of caged sample
x̄₁ = Σ(x₁) / n₁ = 207 / 9 = 23.
Mean tiller of electrified sample
x̄₂ = Σ(x₂) / n₂ = 144 / 8 = 18.
Difference between the two mean tillers is
x̄₁-x̄₂ = 23-18 = 5.
The standard error of this difference is
$$S = \sqrt{\frac{160 + 50}{9 + 8 - 2}} = \sqrt{14} = 3.74$$
Substituting these values in the formula
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
We get
$$t = \frac{5}{3.74 \times \sqrt{\dfrac{1}{9} + \dfrac{1}{8}}} = 2.751$$
The number of degrees of freedom is n₁+n₂-2 = 9+8-2 = 15.
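$$S \text{ or } \sigma_p = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(10 \times 100) + (10 \times 121)}{10 + 10 - 2}} = \sqrt{122.8} = 11.08$$
Substituting these values in the formula
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
We get,
$$t = \frac{500 - 560}{11.08 \times \sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -12.1$$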
In case of small samples the standard error of the coefficient of correlation is calculated in the same way as in case of large samples, with the difference that instead of 1-r², √(1-r²) is used and instead of √n, √(n-2) is used, because in the calculation of coefficient of correlation two degrees of freedom are lost and therefore the number of degrees of freedom is n-2. Thus the standard error of the coefficient of correlation in small samples is given by the formula:
$$\frac{\sqrt{1 - r^2}}{\sqrt{n - 2}}$$
The value of t is calculated by finding out the ratio between the coefficient of correlation and its standard error.
Thus
$$t = r \div \frac{\sqrt{1 - r^2}}{\sqrt{n - 2}} = \frac{\sqrt{n - 2}}{\sqrt{1 - r^2}} \times r$$
The following examples would illustrate the above formula:
Example 10. Use t test to find whether a correlation coefficient of +.5 is significant if n = 51.
Solution
Substituting the values of r and n in the formula
$$t = \frac{\sqrt{n - 2}}{\sqrt{1 - r^2}} \times r$$
We get,
$$t = \frac{\sqrt{51 - 2}}{\sqrt{1 - (.5)^2}} \times .5 = 4.04$$
For 49 degrees of freedom at 5% level of significance the value of t = 2.01. The calculated value being much higher than this, the correlation is significant.
Example 11. A random sample of 15 from a normal universe gives a correlation coefficient of -0.5. Is this significant of the existence of correlation in the universe?
Solution
Substituting the values of r and n in the formula
$$t = \frac{\sqrt{n - 2}}{\sqrt{1 - r^2}} \times r$$
We get,
$$t = \frac{\sqrt{15 - 2}}{\sqrt{1 - (-0.5)^2}} \times (-0.5) = -2.08$$
The number of degrees of freedom is 15-2 or 13. The value of t for 13 degrees of freedom at 5% level of significance is 2.16. The calculated value of t, sign apart, is smaller than this; hence the correlation of the sample is not significant enough to warrant the existence of correlation in the universe.
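
The two examples on the significance of a correlation coefficient may be verified with the following sketch. The function name `correlation_t` is our own; the formula is the one given above.

```python
import math


def correlation_t(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t, n - 2


t_10, dof_10 = correlation_t(0.5, 51)    # Example 10
t_11, dof_11 = correlation_t(-0.5, 15)   # Example 11

print(round(t_10, 2), dof_10)
print(round(t_11, 2), dof_11)
```

The same value of r is thus significant in a sample of 51 and not significant in a sample of 15, which shows how much the test depends on the size of the sample.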
Z-Transformation
Fisher has also given a method of testing the significance of the coefficient of correlation in small samples. In this method the coefficient of correlation in the sample or r is transformed into Z. The value of Z is calculated by the formula
$$Z = \tfrac{1}{2}\log_e{}^{*}\,\frac{1 + r}{1 - r} = \log_{10}\frac{1 + r}{1 - r} \times 1.1513$$
The standard error of Z or
$$\text{s.e.}_Z = \frac{1}{\sqrt{n - 3}}$$
Similarly the coefficient of correlation in the universe or ρ can be transformed into ξ. The value of ξ is calculated exactly in the same manner as the value of Z; of course here the coefficient of correlation of the universe is taken into account.
If the coefficient of correlation of the universe is not known it is supposed to be 0 and then the value of ξ also becomes 0.
To test the significance of a coefficient of correlation the difference between Z and ξ is calculated and the relationship of this difference and the standard error of Z is then found out. If the difference is more than thrice or twice (depending on the level of significance) the standard error it is supposed to be significant. The following examples would illustrate the above rules:
Example 12. A correlation coefficient of 0.5 is discovered in a sample of 19 pairs. Apply Z test to find out if this is significantly different
(a) from 0.7
(b) from 0.3
* Natural or Napierian system of logs. The relationship between natural logs and logs to the base 10, which are ordinarily used, is log_e = log₁₀ × 2.3026.
Solution
Substituting the given value of r in the formula
$$Z = \log_{10}\frac{1 + r}{1 - r} \times 1.1513$$
We get
$$Z = \log_{10}\frac{1 + .5}{1 - .5} \times 1.1513 = \log_{10} 3 \times 1.1513 = .4771 \times 1.1513 = .549$$
The standard error of Z is
$$\text{s.e.} = \frac{1}{\sqrt{N - 3}} = \frac{1}{\sqrt{19 - 3}} = .25$$
Now if we assume the correlation in the universe, i.e., ρ to be zero then ξ is also equal to zero. The deviation of Z from ξ is more than twice the standard error of Z. Hence the hypothesis is wrong and we may conclude that the given correlation is significantly different from zero.
Note: For Z test the coefficient of correlation of a sample is converted into Z by the formula Z = log₁₀ ((1+r)/(1-r)) × 1.1513. Similarly the coefficient of correlation of the universe or population is converted into ξ. The formula used for the purpose is the same as above.
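
The Z-transformation is easily computed with natural logarithms directly, which avoids the conversion factor 1.1513. The sketch below checks the figures of the example against a universe correlation of zero; the function names are our own.

```python
import math


def fisher_z(r):
    """Z = (1/2) log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def z_standard_error(n):
    """Standard error of Z for a sample of n pairs."""
    return 1 / math.sqrt(n - 3)


z = fisher_z(0.5)
xi = fisher_z(0.0)            # universe correlation supposed to be zero
se = z_standard_error(19)
ratio = (z - xi) / se         # difference expressed in standard errors

print(round(z, 3), se, round(ratio, 2))
# The same Z through common logarithms, as in the text
print(round(math.log10(3) * 1.1513, 3))
```

To test against any other supposed value of the universe correlation, it is enough to replace 0.0 by that value when computing `xi`.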
Other things being equal, discuss the question whether it is likely to pay the farmer to continue the more expensive dressing.
11. Why should there be different formulae for testing the significance of difference in means when the samples are (a) 'small', (b) 'large'?
The yields of two types, "Type 17" and "Type 51", of grams in pounds per acre at 6 replications are given below. What comments would you make on the differences in the mean yields? You may assume that if there be 5 degrees of freedom and P = 0.2, t is approximately 1.476.
              Yield in pounds   Yield in pounds
Replication   "Type 17"         "Type 51"
1 20.50 24.86
2 24.60 26.39
3 2~06 28.19
4 29.98 30.75
5 30.37 29.97
6 23.83 22.04
(I. A. S., 1951).
12. Find out the reliability of the sample mean of the following data:
Breaking strength of 10 specimens of 104-inch diameter hand-drawn copper wire.
Specimen   Breaking strength in pounds
1 578
2 572
3 750
4 568
5 572
6 570
7 570
8 572
9 596
10 584
13. How can the "t" test be applied for testing the significance of the difference between two sample means? Calculate the value of t in the case of two characters A and B whose corresponding values are given below:
A 41, 49, 34, 36, 49, 50, 36, 20, 18
B 46, 44, 30, 35, 26, 28, 29
14. The ash content of coal from two different mines was analysed, five analyses being made of the coal from the first mine, four of that from the second mine. Are we justified in supposing that the two mines consist of coal with the same percentage of ash content on the basis of the results obtained, which are recorded in the following tables?
Table A        Table B
per cent       per cent
ash content    ash content
24.3 18.2
20.8 16.9
2J.7 20.2
21.3 16.7
17.4
15. Is there a significant difference in the strength of lead in two number 2 pencils manufactured by R. G. T. Guteba & Co.?
Pencil (a)                 Pencil (b)
Test   Strength in         Test   Strength in
       kilograms                  kilograms
1 1.60 1 1.78
2 1.72 2 1.48
3 1.68 3 1.72
4 1.50 4 1.62
Total 6.50 Total 6.60
16. Below are given the gains in weight (pounds) of hogs fed on two different
diets. Twelve animals were fed on diet A, 15 on diet B. Is either diet superior?
Gains in weight on diet A: 25,32, 30, 34, 24, 25, 14, 32, 2.~, 30, 31, 35
Gains in weight on diet B: -44, 34, 22, 10, 47, 31, 40, 30, 32, 35,
18, 21, 35, 29, 22
17. The means of two random samples of sizes 9 and 7 respectively are 196.42 and 198.82 respectively. The sums of the squares of the deviations from the mean are 26.94 and 18.73 respectively. Can the samples be considered to have been drawn from the same normal population?
(C. A. S., 1957).
31. A certain stimulus is to be tested for its effect on blood pressure. Twelve men have their blood pressure measured before and after the stimulus. The results are as follows.
Man Before After
1 120 128
2 124 1~t
3 130 131
4 118 127
5 140 132
6 128 125
7 140 141
8 135 137
9 126 118
10 130 132
11 126 129
12 127 135
Is there reason to believe that the stimulus would, on the average, raise blood pressure five points?
32. What assumptions are made about the population when an experiment is carried out and analyzed by use of a t test?
33. If it is not necessary to pair observations and you decide to pair, do you have a different level of significance than if you did not pair? Discuss the effect of pairing on your chance of making a correct inference both in the case where you should pair and in the case where it is not necessary.
34. Suppose that 10 samples of 9 observations each have variances as follows:
23.5 30.6 29.3 27.5 27.5
26.3 29.8 30.7 22.3 26.5
Is there reason to doubt, at the 5 per cent level of significance, that these samples are from populations having equal variances?
35. A fertilizer mixing machine is set to give 10 pounds of nitrate for every 100 pounds of fertilizer. Ten 100-pound bags are examined. The percentages of nitrate are as follows:
9, 12, 11, 10, 11, 9, 11, 12, 9, 10
Is there reason to believe that the mean is not equal to 10 per cent?
36. The following data give paired yields of two varieties of wheat. Each pair was planted in a different locality. Test the hypothesis that the mean yields are equal. Find a 90 per cent confidence interval for the difference in the mean yields.
1 45 32 58 57 60 38 47 51 42 38
II 47 34 60 ~9 63 44 49 53 46 41
Explain why pairing is necessary in this problem.
37. A group of 50 boys and a group of 50 girls were given a test in arranging different shaped blocks. The times are recorded in the frequency table below. Analyse the data. If this was an industrial experiment, give a 90 per cent confidence interval for the mean saving in time if girls perform the task rather than boys.
40 51
Frequency of girls
Frequency of boys 1
Analysis of Variance 30
By now the reader would be fairly familiar with the methods of calculation and use of standard deviation in various types of studies relating to dispersion, correlation, regression and sampling errors. In the chapter on Measures of Dispersion it was pointed out that the standard deviation is the square root of the arithmetic average of the squared deviations taken from the mean. It was also mentioned there that the square of the standard deviation, or the arithmetic average of the squares of deviations taken from the mean, is called "Variance." This name was first used by R. A. Fisher, who developed very elaborately its theory and uses. Variance is in many cases a better and more convenient measure of variation than the standard deviation, and in the present chapter we shall discuss some of the simple methods by which variance of a set of series is analysed.
In the previous three chapters we discussed the methods of determining whether two samples have come from the same universe or from two universes which are significantly different from each other. One of the methods by which this study is done is by the calculation of standard error of the difference of the means of the two samples; another method is that of χ² test; and in case of small samples the method that we follow is that of t test. Here we shall discuss a method which measures the significance of the difference between several means at one time. This method is the analysis of variance.
We know that variance or
$$V = \sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$
where x stands for the values of individual items, x̄ for the mean of the series and n for the total number of items.
In case of small samples, however, we mentioned in the previous chapter that the standard deviation should be calculated by dividing the sum of the squares of the deviations between the values of items and the mean value by the degrees of freedom rather than by the total number of items. Degrees of freedom in such cases are equal to n-1. In other words, in small samples, standard deviation or
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
and Variance or
$$V = \frac{\sum (x - \bar{x})^2}{n - 1}$$
We shall now illustrate how the total variance of a series can be
divided into components and analysed in detail. Suppose four samples
of five items each have been taken from a universe and they give
different values of the mean. In these four samples there are in all
twenty items. Now if we wish to satisfy ourselves that the samples in
reality have come from the same universe we must measure the
divergence between their means and the mean of the universe or the
combined mean of all the four samples. If there is a significant
divergence between the combined mean and the means of the samples,
then we shall think that the samples have not come from the same
universe. Here there are two difficulties. One is that there are four
sample means and we have to test them for significance of difference
at one and the same time. The second difficulty is that the variation
in the values of these twenty items may be due not only to the fact
that the samples differ from each other but also to the fact that
within each sample the items differ from each other. Thus there are
two types of variations in the data: one between the various samples
and the other within the various samples. Now if the variations within
the samples and between the samples are not significantly different
from each other then the samples belong to the same universe, because
variation between the samples is just like variation within the
samples. In such a case if we had taken only one sample of 20 items it
would not have made any difference so far as variation of items is
concerned. If, however, the variation between samples is much greater
than the variation within the samples, it means that the samples come
from different universes, otherwise there would not have been a
significant difference in the variations between the samples and
within the samples. In the analysis of variance, therefore, we find
out the relationship between the variation between the samples and
the variation within the samples.
Suppose the values of the items in the four samples were as
follows :-
        Sample 1   Sample 2   Sample 3   Sample 4
            4          6         12         10
            6          8         16         10
            2          6         14         10
            6         10          8          6
            2          0         20          4
Total      20         30         70         40
The grand total of the values of all the items of all the samples or
T = 20 + 30 + 70 + 40 = 160
The grand mean of all the items of all the samples
= 160 / 20 = 8
We shall now study the total variation of these 20 items and we
shall also study the variance between the samples and within the
samples.
Total variation
Here we shall find out the squares of the deviations of each of these
20 items from the grand average.
Squares of the deviations of various items from
the grand average of 8.
        Sample 1   Sample 2   Sample 3   Sample 4
           16          4         16          4
            4          0         64          4
           36          4         36          4
            4          4          0          4
           36         64        144         16
Total      96         76        260         32
Grand total of squares
= 96 + 76 + 260 + 32 = 464
Degrees of freedom
= 20 - 1 = 19
Variance between the samples
Here we shall presume that each item of a sample is equal to
its average and then study the variance between different samples. In
other words, we shall calculate the squares of the deviations of the
means of the various samples from the grand average. If the value
of each item in the first sample is taken as 4, in the second sample
as 6, in the third sample as 14 and in the fourth sample as 8, and if
the squares of the deviations of these values from the grand average
are calculated, they would be as follows :-
        Sample 1   Sample 2   Sample 3   Sample 4
           16          4         36          0
           16          4         36          0
           16          4         36          0
           16          4         36          0
           16          4         36          0
It will be observed that the sum of the squares between the samples
and within the samples is equal to the total of the squares of the
deviations of all the items from the grand average. Similarly the
degrees of freedom in the total variation are equal to the sum of the
degrees of freedom between the samples and within the samples.
Now if we presume that the difference in the variance between
the samples and within the samples is insignificant, we shall be
proceeding on the null hypothesis. We shall then have to find out the
extent of difference in the variance between the samples and within
the samples which can be ignored as arising due to fluctuations of
sampling and on that basis we shall draw our conclusion as to whether
the difference in the present problem is significant or not. In the
analysis of variance we do not study the absolute difference in the
variance between the samples and within the samples but the ratio of
these variances is obtained. In the above problem the ratio of
variation or
\[ F = \frac{93.3}{11.5} = 8.1 \]
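The arithmetic of this first method can be checked mechanically. The following is a minimal Python sketch and not part of the original text: the item values are those implied by the tables of squares given in this chapter, and the function name `one_way_anova` is only illustrative.

```python
# Item values of the four samples, as implied by the tables of squares in the text.
samples = [
    [4, 6, 2, 6, 2],
    [6, 8, 6, 10, 0],
    [12, 16, 14, 8, 20],
    [10, 10, 10, 6, 4],
]


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for a one-way layout."""
    all_items = [x for g in groups for x in g]
    n_total = len(all_items)
    grand_mean = sum(all_items) / n_total

    # Total variation: squared deviations of every item from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_items)

    # Between samples: each item is treated as equal to its own sample mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )

    # Within samples: squared deviations of items from their own sample mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": (ss_between / df_between) / (ss_within / df_within),
    }


result = one_way_anova(samples)
print(result)
```

The sums of squares come out as 464, 280 and 184 on 19, 3 and 16 degrees of freedom, and the ratio of the two mean squares agrees with the value of F quoted above to one decimal place.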
Correction factor or
\[ \frac{T^2}{N} = \frac{160 \times 160}{20} = 1280 \]
The next step is to find out the squares of all the items (not of
their deviations from the mean as in the previous case). They would be
as follows :-
        Sample 1   Sample 2   Sample 3   Sample 4
           16         36        144        100
           36         64        256        100
            4         36        196        100
           36        100         64         36
            4          0        400         16
Total      96        236       1060        352
Grand total of the sum of squares
= 96 + 236 + 1060 + 352 = 1744
Total sum of squares is obtained by subtracting the correction
factor from the grand total of squares. Thus
Total sum of squares
= 1744 - 1280 = 464
It will be observed that it is equal to the figure obtained by the
first method.
The sum of the squares between the samples is obtained by finding
out the sum of the squares of the sample totals and dividing it by
the number of items which make up each sample total and then
subtracting from it the correction factor.
Thus the sum of squares between the samples
\[ = \frac{20^2 + 30^2 + 70^2 + 40^2}{5} - 1280 \]
\[ = \frac{400 + 900 + 4900 + 1600}{5} - 1280 \]
\[ = \frac{7800}{5} - 1280 \]
\[ = 1560 - 1280 = 280 \]
This figure is also the same as obtained by the first method.
The sum of squares within the samples is found out by subtracting
the sum of the squares between the samples from the total sum of
squares. Thus the sum of squares within the samples
= 464 - 280 = 184
This figure is also the same as obtained in the previous case.
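The short-cut method can be written out in a few lines. The sketch below is illustrative only: the function name `shortcut_sums_of_squares` is hypothetical, and the item values are those implied by the table of squares above.

```python
# Item values implied by the table of squares of the items.
samples = [
    [4, 6, 2, 6, 2],
    [6, 8, 6, 10, 0],
    [12, 16, 14, 8, 20],
    [10, 10, 10, 6, 4],
]


def shortcut_sums_of_squares(groups):
    """Sums of squares by the correction-factor (short-cut) method."""
    all_items = [x for g in groups for x in g]
    grand_total = sum(all_items)                      # T
    correction = grand_total ** 2 / len(all_items)    # T^2 / N

    # Total: sum of squares of the raw items less the correction factor.
    ss_total = sum(x * x for x in all_items) - correction

    # Between: squared sample totals divided by sample size, less the correction.
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction

    # Within: obtained by subtraction.
    ss_within = ss_total - ss_between
    return correction, ss_total, ss_between, ss_within


correction, ss_total, ss_between, ss_within = shortcut_sums_of_squares(samples)
print(correction, ss_total, ss_between, ss_within)
```

No deviation from a mean is ever computed here, which is why the method is convenient when the means are not round numbers.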
When these three figures have been obtained the table of analysis
of variance can be set up exactly in the same manner as done in the
previous case and the value of F can be calculated.
The variance ratio or F has a very important property that its value
remains unchanged if all the figures are either multiplied or divided
by a common factor or if a common figure is added to or subtracted
from each figure.
Thus if in a problem the figures are big or otherwise inconvenient
they can be reduced in magnitude either by division or by subtraction
of a common figure. The value of the variance ratio would remain
unaffected. If in the above example all the values are multiplied by
10 or divided by 2, or if 5 is added to all the values or if 3 is
deducted from all the values, the value of F would remain unaffected.
Suppose we added 1 to all the values they would then be :
        Sample 1   Sample 2   Sample 3   Sample 4
            5          7         13         11
            7          9         17         11
            3          7         15         11
            7         11          9          7
            3          1         21          5
Total      25         35         75         45
Grand total or T = 180
Correction factor or
\[ \frac{T^2}{N} = \frac{180 \times 180}{20} = 1620 \]
The squares of the above items would be
        Sample 1   Sample 2   Sample 3   Sample 4
           25         49        169        121
           49         81        289        121
            9         49        225        121
           49        121         81         49
            9          1        441         25
Total     141        301       1205        437
Sum of squares between the samples
\[ = \frac{625 + 1225 + 5625 + 2025}{5} - 1620 \]
\[ = \frac{9500}{5} - 1620 \]
\[ = 1900 - 1620 = 280 \]
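The invariance of F can also be verified numerically. The sketch below is illustrative only: the helper `f_ratio` is a hypothetical name, the item values are those implied by the tables of squares in this chapter, and the transformations tried are the ones mentioned in the text (adding 1, multiplying by 10, dividing by 2, deducting 3).

```python
import math

samples = [
    [4, 6, 2, 6, 2],
    [6, 8, 6, 10, 0],
    [12, 16, 14, 8, 20],
    [10, 10, 10, 6, 4],
]


def f_ratio(groups):
    """Variance ratio F computed by the correction-factor method."""
    all_items = [x for g in groups for x in g]
    correction = sum(all_items) ** 2 / len(all_items)
    ss_total = sum(x * x for x in all_items) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(all_items) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def transform(groups, func):
    """Apply the same change of origin or scale to every item."""
    return [[func(x) for x in g] for g in groups]


base = f_ratio(samples)
plus_one = f_ratio(transform(samples, lambda x: x + 1))
times_ten = f_ratio(transform(samples, lambda x: x * 10))
halved = f_ratio(transform(samples, lambda x: x / 2))
minus_three = f_ratio(transform(samples, lambda x: x - 3))
print(base, plus_one, times_ten, halved, minus_three)
```

A change of origin leaves both sums of squares untouched, while a change of scale multiplies both by the same factor, so the ratio is the same in every case.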
                            Deviation of            Deviation of the
                    Mean    yield from              mean yield from
     Block   Yield  yield   the mean      Square    the grand mean    Square
C      1      31             -2.6          6.76
       2      34             +0.4          0.16
       3      35    33.6     +1.4          1.96         +0.8           0.64
       4      32             -1.6          2.56
       5      36             +2.4          5.76
D      1      29             +0.6          0.36
       2      26             -2.4          5.76
       3      30    28.4     +1.6          2.56         -4.4          19.36
       4      28             -0.4          0.16
       5      29             +0.6          0.36
Total                                     51.20                       26.80
Questions
1. Define variance. How is it related to the standard deviation? Discuss
its utility and usefulness in various types of statistical analysis.
2. What do you understand by :
(a) Variance between samples.
(b) Variance within samples.
3. What is the relationship of variance between the samples and variance
within the samples?
4. What is the meaning of F coefficient? How is it computed?
5. Indicate the usefulness of the study of analysis of variance in various fields
of economic activity.
6. The figures given below are yields in bushels per acre of 6 plots of
gram. Three of these plots are of variety A and three of variety B.
A 30 32 22
B 20 18 16
Set up a table of analysis of variance and calculate F to find out the significance
of difference between the yields of the two varieties.
7. The following table gives the yields of four plots each of three varieties
of wheat. Find out if the variety differences are significant.
(v1 is the number of degrees of freedom for the greater estimate of variance
and v2 for the smaller).
v2 \ v1      8       12
   8       3.44    3.28
           6.03    5.67
   9       3.23    3.07
           5.47    5.11
  10       3.07    2.91
           5.06    4.71
                              Cities
        Allahabad             Bombay                Delhi
A   20.3  19.8  21.4     21.6  22.4  21.3     19.8  18.6  21.0
B   19.5  18.6  18.9     20.1  19.9  20.5     19.6  18.3  19.8
C   22.1  23.0  22.4     20.1  21.0  19.8     22.3  22.0  21.6
D   17.6  18.3  18.2     19.5  19.2  20.3     19.4  18.5  19.1
E   23.6  24.5  25.1     17.6  18.3  18.1     22.1  24.3  23.5
Designs of Experiments 31
Meaning and Need
Data are the fundamentals of statistics and designs are the
forerunners of data. A design is a plan for the collection and analysis
of facts and figures. The preparation of statistical designs should be
done very carefully as any error committed at this stage is likely to
upset the entire investigation. Generally the need for a
well-thought-out plan is not realised by many and the importance which
this problem deserves is not given to it, with the result that many
inquiries do not serve the purpose for which they are conducted. They
give misleading conclusions and are very apt examples of the misuse of
statistical methods. The technique of the collection of data and the
methodology of their analysis have a great bearing on the reliability
of the results arrived at. Therefore, in any statistical enquiry the
problem of collecting and analysing data must be very carefully
considered. The selection of an efficient design requires careful
planning in advance of data collection and analysis. Thoughtlessness
in the selection of a design is very likely to make the statistical
enquiry entirely useless. A detailed analysis of the actual data
involves huge cost and labour and therefore it is imperative that an
efficient design be selected. This can be done only by visualizing
the analysis of data obtainable in different plans and keeping in mind
their standard error and cost. The design which gives the smallest
sampling error is supposed to be the best design for a particular
investigation. We have discussed the technique of the collection of
data and the theory of sampling in earlier chapters and a repetition
of the same is not necessary here. Presently we shall confine
ourselves to a brief study of some of the important designs of
experiments.
Experimental Designs
Experimental designs concern the arranging of treatments in such
a manner that inferences and conclusions regarding the effects of
these treatments can be easily drawn and their reliability measured.
Experiments are made with a view to finding the validity of a
particular hypothesis and to having an idea about the extent of the
reliability that can be placed on a particular conclusion arrived at.
If a physician wants to know whether a particular drug which has been
invented will be beneficial in the treatment of a particular disease
or if a farmer wants to know whether a new type of fertiliser will
give him better yields he will frame his investigation in terms of
some suitable hypothesis. After this he will design an experiment to
find out whether the hypothesis which he has presumed is correct or
whether it is wrong and consequently has to be rejected. The selection
of the design will have a very
They are taken out after some time and the effects of corrosion on
each pair are noted down separately. At this stage two types of
experiments can be conducted. One is that the average effect of
corrosion on the 10 coated pipes is calculated and a similar
calculation is done for the 10 uncoated pipes. These two averages can
be compared and an inference can be drawn whether the effects of
corrosion on the two types of pipes are significantly different from
each other or not. This design of experiment is not a satisfactory one
because in this method the effects of variations between soil types
have not been eliminated. The effect of corrosion may be due to
(1) the soil type, which may vary from place to place; (2) the type of
pipe coating; and (3) random variations. In the above design it is not
possible to eliminate the effects of variations in the type of soil
which may be there at the different places where the pipes were buried
and as such it is difficult to say that the difference in the effects
of corrosion in the two types of pipes is entirely due to the type of
coating on them.
However, if the difference in the corrosion effects of each of the
10 pairs is found out and then the analysis of data is done this
difficulty will not be there. In this case we shall have 10 figures
representing the difference in corrosion effects and we shall find out
the average of these differences and proceed on the null hypothesis
assuming that there is no real difference in the corrosion effects. We
have already illustrated this technique by numerical examples in the
chapter on analysis of small samples. In this design if it is
calculated that the hypothesis has been disproved and should be
rejected and that there is a significant difference in the corrosion
effects, we are on more solid ground than in the previous case. The
reason is that the effects of variations between the soil types have
been eliminated in this case because each pair is in the same type of
soil.
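A short sketch may make the paired analysis concrete. The corrosion figures below are purely hypothetical (the text gives none), and the function name `paired_t` is illustrative; the statistic computed is the usual small-sample t for the mean of the 10 differences, with the standard deviation based on n - 1 degrees of freedom.

```python
import math

# Hypothetical corrosion measurements for 10 pairs of pipes (illustrative only).
uncoated = [51, 42, 63, 47, 55, 60, 49, 58, 44, 53]
coated = [50, 40, 60, 45, 54, 58, 46, 56, 43, 50]


def paired_t(first, second):
    """t statistic and degrees of freedom for the mean of paired differences."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Variance of the differences on n - 1 degrees of freedom.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error, n - 1


t_value, degrees_of_freedom = paired_t(uncoated, coated)
print(round(t_value, 3), degrees_of_freedom)
```

Because each difference is taken within a pair buried in the same soil, the soil effect cancels out before any averaging is done; the computed t is then compared with the tabulated value for 9 degrees of freedom.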
In certain studies the problem before the investigator is to test the
significance of the difference between a group of means. This is
usually solved by the technique of analysis of variance. We have
already discussed this technique in an earlier chapter and a
repetition of the same is unnecessary. Sometimes a more elementary and
less accurate test than the analysis of variance is used and this
depends on the distribution of the range. The technique which is used
is known by the name of standardised range. The range of a sample is
standardised by dividing it by the standard deviation. The
standardised range is calculated for different samples and it forms a
statistical distribution from which the significance of any observed
result can be calculated.
Latin Squares
This is an experimental design which is mostly used in agricultural
experiments where nature plays a very important role and renders even
fairly reliable results very difficult to obtain. In agricultural investi-
gations a number of experiments have to be carried out simultaneously
and inferences have to be drawn under conditions which are different
from those obtaining in other types of studies.
An example would make the above point clear. Suppose an experiment
has to be made in which the effects of 4 different types of
fertilizers on the yield of wheat are to be tested. In such a case the
most important point is to take into account the varying fertility of
the soil in the different blocks in which the experiment has to be
conducted. If this factor is ignored then the result obtained would
not be dependable because the difference, if any, in the various
fertilizers would not be exclusively on account of their intrinsic
worth but also on account of the difference in the fertility of the
soil on which the experiments have been conducted. To remove these
difficulties the technique of Latin squares is adopted and the
experiments are conducted in such a manner that the ultimate results
are not affected by the varying fertility of the soils of the various
blocks. In the above example the field on which the experiment will be
conducted shall be divided into 16 similar small plots which will then
be treated with fertilizers (A), (B), (C) and (D) as shown below :-
A B C D
B D A C
C A D B
D C B A
The above arrangement has been made in such a manner that the
effects of difference in soil fertility would have no bearing on the
ultimate conclusions and inferences which are arrived at. The
arrangement of the fertilizers in the above experiment is such that :-
(i) Each fertilizer appears 4 times in the design,
(ii) Each fertilizer is used once and only once in each row, and
(iii) Each fertilizer is used once and only once in each column.
Under the above arrangement a detailed comparison can be made
of all the fertilizers by comparing the average results of the 4
sections allotted to each fertilizer.
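The three conditions above are easy to check by machine. The following Python sketch is illustrative only (the function name `is_latin_square` is hypothetical); it tests the 4 x 4 arrangement of fertilizers A, B, C and D shown in the text.

```python
# The arrangement of fertilizers given in the text, one string per row.
square = ["ABCD", "BDAC", "CADB", "DCBA"]


def is_latin_square(rows):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(rows)
    symbols = set(rows[0])
    if len(symbols) != n:
        return False
    # Condition (ii): each row is of length n and contains every symbol.
    for row in rows:
        if len(row) != n or set(row) != symbols:
            return False
    # Condition (iii): each column contains every symbol.
    for col in range(n):
        if {row[col] for row in rows} != symbols:
            return False
    # Condition (i) follows: each symbol then appears n times in all.
    return True


print(is_latin_square(square))
```

Interchanging two letters within a single row breaks the column condition, so the check fails for such an arrangement.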
The great advantage of this type of design, which is only one of its
kind, is that it enables differences in fertility gradients in the
field to be eliminated in the comparison of the effects of the 4
fertilizers on the yield of wheat. Hence even if there is a difference
in the intensity of fertility among the plots it will not
significantly affect the comparison of the respective yields. In
practice, the results of individual fertilizers as well as the mean
results of all the fertilizers in each section would be used in
analysing this design.
This method of analysis, however, suffers from one defect and it is
that although each row and each column represents equally all 4
fertilizers there may be considerable difference in the row and
column means both up and across the field. However, this defect can be
removed by making the means of the rows and columns the same as the
field mean by adjusting the results.
Factorial Designs
In economic and social phenomena usually a large number of
factors affect a particular problem and this multiplicity of causes
creates certain difficulties in the way of analysis of results and
also in making generalisations. In such cases attempts are made to
study the effects of a single factor at a time by making the other
factors constant as far as possible. It is not always possible to do
so on account of the nature of economic science where the various
factors operating at a time are so intermingled that their effects
cannot be studied individually. However, there are certain methods by
which an idea can be obtained, though not very precisely, about the
effects of a single factor by keeping aside the effects of other
factors which are kept constant. Usually in such cases the
experimental design is so arranged that the effects of irrelevant
factors may be eliminated by analysis. This process is known as
Balancing. In the experiment referred to in earlier pages of this
chapter about the corrosion of steel pipes the factor whose effect was
to be measured was the coating of pipes and the other irrelevant
factor, namely the type of soil, was eliminated by pairing the
results. Similarly in the experiment of fertilizers the difference in
the soil fertility was eliminated by using a Latin square design.
But the process mentioned above should not be used for those
experiments where the effects of varying more than one factor are to
be determined. The following example would illustrate the point.
Suppose that a university wants to evaluate 3 methods of presenting
lectures. Suppose that these methods are called A, B and C
respectively. It also wants to evaluate different times of a day at
which the lectures might be given, namely morning, mid-day and
afternoon. These two factors, method and time of presentation, can be
evaluated simultaneously by the design shown below :-
                   Method
Time          A     B     C     Total
Morning       1     1     1       3
Mid-day       1     1     1       3
Afternoon     1     1     1       3
Total         3     3     3       9
Here 540 students offering the subject can be assigned on the basis of
random sampling to the 9 sections shown above. Each section will
contain 60 students. All the 540 students will take the same
examination and then the performance of each section will be measured
by calculating its mean score in the examination.
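The random allotment of students to the 9 sections can be sketched as follows. This is only an illustration: the student numbers 1 to 540, the seed and the function name `assign_students` are assumptions made for the example, not part of the original text.

```python
import random

METHODS = ["A", "B", "C"]
TIMES = ["Morning", "Mid-day", "Afternoon"]


def assign_students(n_students=540, seed=0):
    """Randomly allot students (numbered 1..n) equally to each time/method cell."""
    rng = random.Random(seed)          # fixed seed so the allotment is repeatable
    students = list(range(1, n_students + 1))
    rng.shuffle(students)
    cells = [(time, method) for time in TIMES for method in METHODS]
    size = n_students // len(cells)    # 60 students per section
    return {
        cell: students[i * size:(i + 1) * size]
        for i, cell in enumerate(cells)
    }


sections = assign_students()
for cell, members in sections.items():
    print(cell, len(members))
```

Once the examination marks are known, the mean score of each of the 9 sections can be tabulated by time and method, and the row and column means compared.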
Questions
1. Discuss the need and utility of planning statistical experiments.
2. What is meant by 'Experimental Designs'?
3. Write short notes on :
(i) Latin square.
(ii) Factorial Designs.
Statistical Quality Control 32
Meaning
It is a well-known fact that all repetitive processes, no matter how
carefully arranged, are not exactly identical and contain some
variability which cannot be assigned to any particular cause. Even in
the manufacture of commodities by highly specialised machines it is
not unusual to come across differences between various units of
production. For example, in the manufacture of bottles, corks and
cartons, even though highly efficient machines are used some
difference may be noticed in various units. It is difficult to say
what actually is the cause of it, but the fact remains that such
variations are not uncommon. If the difference is not much it can be
ignored and the product can be passed off as O.K. But if it is beyond
certain limits the article has to be rejected and the cause of such
variations has to be investigated. In statistical quality control such
limits are laid down within which the repetitive process, whether it
is the manufacture of corks and bottles or the estimation of printers'
errors or complaints from customers, should vary. If the variation is
within the specified limits there is no cause for alarm but if it
crosses the limits, then the hypothesis that the process is in control
has to be rejected and fresh decisions have to be taken. The upper and
the lower limits thus laid down are known as Upper Control Limit
(U.C.L.) and Lower Control Limit (L.C.L.).
Statistical quality control thus is a well-accepted and widespread
process on the basis of which it is possible to understand the
fundamental principles and techniques by which statistical decisions
are made. Statistics by itself is a science in the application of
which certain important decisions have to be arrived at. It has
already been pointed out that broadly speaking statistics refers to
the process of collection, analysis and interpretation of certain
figures relating to some facts. At each of these stages before the
actual application of the statistical technique it is essential that
some decisions are arrived at, because without them the study would
not serve the purpose for which it is meant. Statistical quality
control enables us to understand the important idea of statistical
decision making. On account of this importance statistical quality
control is applied in various types of studies.
Statistical quality control also provides an opportunity to introduce
the important idea of sequential sampling. It enables us not only to
take decisions about accepting or rejecting a particular inference but
it also helps in deciding to withhold judgment and to suggest the
collection of more data to arrive at a more fundamental decision.
future can be made only if the process is in control because then the
variations can be attributed to chance.
Acceptance inspection, as the name suggests, is a method by which
it is possible to decide whether an existing group of things should be
accepted or rejected on the basis of a sample study. It is obvious
that the universe in such cases is finite and consists of items which
are generally equal to a lot. A random sample from a lot is taken and
the sample material is inspected by using definite statistical
standards. On the basis of the sample study a decision is taken about
the quality of the entire lot and on the basis of such decision the
lot is either accepted or rejected. In acceptance inspection the
standards are fixed according to what is required of the product
whereas in process control the standards depend on the inherent
capability of the process. In acceptance inspection, for example, we
can take a decision that if 5% or less of the articles in the sample
are defective the lot will be accepted and if the percentage exceeds
5 it will be rejected. In process control it is not possible to lay
down any such limit in the beginning and the limits will depend on the
capability of the process itself.
With this preliminary background we now proceed to examine
'Process Control' and 'Acceptance Inspection' in some detail.
PROCESS CONTROL
Evolution
In old days when there was no division of labour and no
standardisation of the goods produced and when an artisan produced a
commodity from beginning to end, there was no problem of process
control. Each unit was produced as an independent unit and there was
no uniformity among corresponding parts of the units of production.
Later on, with a change in the technique of production and with the
advent of division of labour and mechanisation, instead of one whole
commodity being produced at one place or by one person, various parts
of the commodity began to be produced separately and in large
numbers. Then it became necessary that the various corresponding
parts produced must be identical so that they could be assembled
together without any difficulty. But it was found that due to
variations in the raw materials consumed or the tools used there was
no perfect uniformity in the articles produced.
The recognition of this variation of the goods produced amounted
to admitting that some variations in process measurement were
unavoidable and could not be escaped. This led to the concept of
"tolerance limits" and then variations began to be permitted if they
did not exceed certain prescribed limits. Later, more studies were
made in this technique of laying down limits and thus maintaining a
certain amount of uniformity in the articles produced. In recent years
many researches have been conducted to improve the technique of
process control to enable manufacturers to make more confident and
accurate predictions about the quality of the commodity produced.
[Figure: control chart showing a central line at the Mean, the upper
control limit UCL (M + 3σ) above it and the lower control limit
LCL (M - 3σ) below it.]
In the above graph UCL stands for Upper Control Limit, LCL for
Lower Control Limit and x represents the values obtained in an
observation. If the population is normally distributed, successive
observations if plotted would mostly fall within these limits. Only
about 3 out of 1000 would be expected to fall outside these limits.
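The limits and the "3 out of 1000" figure can be illustrated with a short sketch. The process mean of 50, the standard deviation of 2 and the observations below are hypothetical values chosen for the example; the function names are likewise illustrative. The chance of a normal observation falling outside M ± 3σ is obtained from the error function in the standard library.

```python
import math


def control_limits(mean, sigma, k=3):
    """Lower and upper control limits at k standard deviations from the mean."""
    return mean - k * sigma, mean + k * sigma


def prob_outside(k=3):
    """Chance that a normal observation falls outside mean +/- k sigma."""
    return 1 - math.erf(k / math.sqrt(2))


def out_of_control(observations, mean, sigma, k=3):
    """Observations lying outside the control limits."""
    lcl, ucl = control_limits(mean, sigma, k)
    return [x for x in observations if x < lcl or x > ucl]


# Hypothetical process: mean 50, standard deviation 2.
observations = [49, 51, 57, 50, 43.5, 52]
print(control_limits(50, 2))
print(round(prob_outside(), 4))
print(out_of_control(observations, 50, 2))
```

The probability works out to a little under 0.003, which is the basis of the statement that about 3 observations in 1000 fall outside the limits when the process is in control.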
O. C. Curve
By O. C. curve is meant Operating Characteristics Curve. The
O. C. curve of an acceptance sampling plan shows the ability of the
plan to distinguish between good and bad lots. In judging acceptance
plans it is desirable to compare their performance over a range of
possible quality levels of the submitted product. For any given
percentage of defective articles in a submitted lot the O. C. curve
shows the probability that such a lot will be accepted by the given
sampling plan. In other words the O. C. curve shows the percentage of
submitted lots that would be accepted if a large number of lots of any
specified quality were submitted for inspection. The O. C. curve can
be thought of as showing the probability of accepting lots over a
stream of products having a certain percentage of defectives.
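A few points of an O. C. curve can be computed as below. This is a sketch under stated assumptions: the sampling plan (a sample of 50 articles, lot accepted if not more than 2 are defective) is hypothetical, the function name `prob_accept` is illustrative, and the binomial distribution is used on the assumption that the lot is large compared with the sample.

```python
from math import comb


def prob_accept(p_defective, n=50, c=2):
    """Probability of finding at most c defectives in a sample of n articles."""
    return sum(
        comb(n, k) * p_defective ** k * (1 - p_defective) ** (n - k)
        for k in range(c + 1)
    )


# Points on the O. C. curve for a range of lot qualities.
for p in (0.0, 0.01, 0.02, 0.05, 0.10, 0.20):
    print(f"{p:.2f}  {prob_accept(p):.3f}")
```

Plotting the probability of acceptance against the fraction defective gives the curve: it starts at 1 for a perfect lot and falls steadily as the quality of the submitted lots worsens. A plan with a steeper fall discriminates more sharply between good and bad lots.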
Conclusion
"Without quality control you, as a producer or purchaser, are in the
same position as the man who bets on a horse race - with one
exception, the odds are not posted. Statistical quality control will
give you the odds on which you wish to place your money, your man
power, your tools and your materials. It will tell you at what level
and with what variation you are operating and, more important, it will
tell you when your process, tools or materials change from that level
and range of variability - possibly most important of all will be the
change in outlook on your purchases or production, and the inspection
of both. The dazzling light which statistical quality control throws
on everything surrounding the characteristic being examined many times
reveals startling facts, sometimes good, sometimes bad. You will be
shocked out of your complacency. Your philosophy will change for the
better. Variability will be recognised as a natural inherent
characteristic of your production or, if you are buying, of the
incoming material."
FRANK M. STEADMAN
Questions
1. Discuss the need and utility of Statistical Quality Control.
2. What do you understand by 'Process Control'? How does it differ from
'Acceptance Inspection'?
3. Write a note on the utility of Control Charts in Statistical Quality Control.
4. How would you select a sampling plan for acceptance inspection?
5. Write short notes on :
(i) O. C. Curve.
(ii) Sequential Sampling.
(iii) Selection of Control Limits.
(iv) Relationship of Control Chart and Normal Distribution.
Growth of Statistics in India 33
SECTION (1)
STATISTICAL ORGANISATION
every year from London till 1923 when its publication was done from
India. In the year 1874 Sir John Strachey, the then Governor of North
Western Province (now called Uttar Pradesh), wrote to the Secretary of
State for India suggesting the creation of a department for the
collection of statistical information regarding trade and agriculture
and the appointment of a Director of Agriculture and Commerce. It was
in accordance with these recommendations that a Department of
Agriculture and Commerce was set up in this province in the year 1875.
One of the main functions of this department was to collect trade
statistics and to suggest ways and means of improving the agricultural
statistics of the country. A little later the Indian Famine Commission
recommended the appointment of a Director of Agriculture in each
province and the appointment of Statistical Officers to assist him in
his work. In accordance with these recommendations, Agricultural
Departments were opened in various provinces and the Central
Agricultural Department, which was created in 1871 but which was
closed due to financial stringency arising out of the Afghan War, was
also revived with a view to co-ordinating the work of the various
provincial agricultural departments. Though these agricultural
departments were primarily concerned with the improvement of
agriculture yet they collected valuable statistical information about
various agricultural problems. In the year 1881 the first population
census was taken but since it was not country-wide and was not
complete it is usually left out of account. Population censuses at
that time and even as late as 1941 did not need any permanent staff or
department and as such no statistical machinery was established for
this purpose. The Census Department and the Census Staff had a very
short duration of service and used to be disbanded soon after the
census operations were over. In the year 1881 the Imperial Gazetteer
of India was also published for the first time. It contained economic
statistics of the different parts of the country. During the last few
years of the 19th century various departments of the Government of
India started collecting and publishing statistical information
relating to their subjects. It was in the year 1894 that the first
crop forecast of wheat production was made in this country and in
subsequent years forecasts were made of other agricultural commodities
also. A publication entitled Reports of Agricultural Statistics of
British India was brought out by the Revenue and Agriculture
Department in the year 1886. The statistics of foreign trade were
published by the Finance and Commerce Department and in 1895 a
Statistical Bureau was set up to deal with the agricultural statistics
and the statistics of foreign trade. This Bureau was headed by the
Director-General of Statistics.
20th century. In the year 1905 the office of the Director-General
of Commercial Intelligence was created to maintain liaison between the
Government and the businessmen. The Director-General of Commercial
Intelligence was also to look after the work of the Statistical Bureau
which was formerly under the Director-General of Statistics. In the
year 1906 this department brought out the first issue of the Indian
Trade Journal. In the year 1912 when the headquarters of the
Government of India were shifted from Calcutta to New Delhi it was
thought desirable to separate Statistics from Commercial Intelligence.
This separation dislocated the work of both the departments and
ultimately in the year 1922 they were merged again. The designation of
the head of this department was then changed to Director-General of
Commercial Intelligence and Statistics.
~nurin'g ih~ first World War the folly of not industrializing this co-
untry and not keeping adequate figures about various economic problems
was realised by the British Government. The country got an impetus
for industrial development during this period and consequently the
question of collection of statistics also came to the forefront. The
Indian Economic Enquiry Committee was appointed in the year 1924
under the chairmanship of Sir M. Viswesvarayya with a view to survey
the then existing statistical material in this country and to make recom-
mendations for improvement, In its report the Committee recommended
that the statistics collected by all Central ~nd Provincial departments
should be placed under the supervision of a Central Authority and
further that each province should have a Statistical Bureau. The Royal
Commission on Agriculture agreed with the observations of the Indian
Economic Enquiry Committee but was of the opinion that the Central
Statistical Authority should merely be a co-ordinating agency and that
the statistics should be collected by various department$ separately. In
1931 the Royal Commission on Labour emphasised the necessity of
collecting various types of labour statistics and it suggested that there
should be legislation to facilitate the work of the collection of data. The
Royal Commission on Agriculture had also suggested the creation of an
Imperial Council of Agricultural Research. Its primary duty was to promote,
guide and co-ordinate agricultural research and to act as a clearing-house for
information in regard not only to research but also to other general matters
connected with agriculture and animal husbandry. It was also
to take over the publication work done by the Imperial Agricultural
Department. Though the Government of India did not accept all the
recommendations of the Commission in this connection, yet by a resolution
passed on the 4th of August, 1930, the Secretariat of the Council
of Agricultural Research was constituted as a department of the Government
of India. This department, though primarily concerned with agricultural
research, had a statistical establishment also and it co-ordinated
the agricultural research carried on at different places. In 1933 a Statistical
Research Bureau was set up in New Delhi for the purpose of analysis
and interpretation of economic statistics. In 1933 Messrs. Bowley and
Robertson were appointed to conduct an economic census of India.
They recommended the appointment of a permanent economic staff with
a Director of Statistics at the centre. The work of this organisation
was to co-ordinate the statistics collected by the provincial and central
departments and also to conduct a census of production and census
of population. These recommendations could not be implemented
by the Government, but in 1938 an Office of the Economic Adviser to
the Government of India was created. The functions of the Economic
Adviser included collection and analysis of economic statistics. The
Statistical Research Bureau started in 1933 was merged with this office.
The second World War again found India without a well-organised
statistical machinery. With the outbreak of War in 1939, need was
felt of collecting statistics about a large number of problems and this
resulted in the setting up of small statistical organisations in various
departments of the Government both at the Centre as well as in the provinces.
These statistical units collected and analysed statistics and advised the
Government on matters relating to their respective fields. The Government
felt difficulty in the collection of industrial statistics and to overcome
it an Industrial Statistics Act was passed in 1942. The Directorate
of Industrial Statistics conducted the first Census of Manufactures in
1946 after the Census of Manufacturing Industries Rules were passed.
The Labour Bureau started constructing cost of living index numbers
for certain urban and rural areas with the base year of 1939. In 1947
the Economic Adviser's Office also started publishing the General
Purpose Wholesale Price Index Number. This index number replaced
the earlier index number which was issued by the office of the Economic
Adviser and which was of a sensitive type, being constructed out
of 23 commodities only. A National Income Committee was set up
in the year 1949 and it has given estimates of India's national income
for a number of years. National Sample Surveys were conducted
(and are still being carried on) for the first time in the year 1950. A
Statistical Unit was established in 1949 to co-ordinate all activities relating
to statistics collected in this country and later on this unit developed
into the Central Statistical Organisation in the year 1950.
In the year 1951 an international statistical conference was held at
Calcutta to study statistical problems which were common to all countries
and to suggest improvements with a view to bring about conceptual
uniformity in the data collected. The Collection of Statistics Act
was passed in 1953 and it empowered the Government to collect all
types of statistics relating to any matter. In the year 1956 the All India
Agricultural Labour Enquiry was conducted to collect up-to-date wage
statistics and other important facts relating to labour conditions in the
country. The All India Rural Credit Survey which was conducted
in the year 1951-52 to collect statistics of rural indebtedness and other
problems of rural finance was followed up in subsequent years and
very valuable figures were collected. The Indian Statistical Institute,
Calcutta and the Indian Council of Agricultural Research have
been doing very valuable work in the field of statistical research and
in the year 1960 the Indian Statistical Institute was declared
an institute of national importance recognised by the Government.
Present Position. The present position of the statistical organisation
of India can best be understood in the background of the Indian
Constitution. According to the present Constitution there are some
items over which the Central Government has exclusive control while
there are others which are under the direct jurisdiction of the State
Governments. There are some items which are under the jurisdiction
of both the Central Government and of the State Governments. The
important items of the Union List are Defence, Railways, Posts and
Telegraph, Currency and Foreign Exchange, Banking, Trade and
Commerce with foreign countries, Census, Customs and Excise Duties
and Income Tax. The Central Government is responsible for the collection
of statistics with regard to all these items. The States List includes
items like Public Health, Agriculture, Livestock, Irrigation, Forest and
Fisheries, etc., and with regard to these items statistics are collected by
the State Governments though the Central Government has also a right
to frame laws relating to any of these items. The Concurrent List includes
Vital Statistics, Economic and Social Planning, Trade Unions,
Social Insurance, Labour Welfare, Relief and Rehabilitation, Price Control,
etc., and with regard to these items both the Central Government
and the State Governments can frame laws and collect statistics. It
should not be taken to mean, however, that there is a rigid line of
demarcation between the fields of operation of the Central and the State
Governments. In fact there is co-ordination in the work of the
Centre and the States. The Centre acts as a co-ordinating agency and
publishes the statistics collected by various states on an all-India basis.
The above survey clearly indicates that the statistical organisation
in this country has gradually been decentralised. Formerly we had a
highly centralised system of the collection of statistics and the Department
of Commercial Intelligence and Statistics was the pivot round
which the wheel of statistical organisation revolved. All important
statistics were collected, compiled and published by the Department of
Commercial Intelligence and Statistics. It used to publish statistics
relating to Agriculture, Inland and Foreign Trade and Prices, etc. With
the expansion of the scope of economic statistics in this country this
single department could not cope with the situation and it was thought
advisable to decentralize the system and thus to distribute the huge task of
the collection of statistics to a number of statistical units. With this
view in mind the statistical organisation of the country was gradually
decentralized. At present, so far as the centre is concerned, each ministry
has a statistical unit (some have more than one) which is responsible
for collection and compilation of statistics relating to the subject of
the ministry. There are about 90 full-fledged statistical organisations
attached to the various ministries at the centre. Important amongst
them are the following:
1. Ministry of Food and Agriculture
(a) Directorate of Economics and Statistics.
(b) Directorate of Marketing and Inspection.
(c) Statistical Wing of the I. C. A. R.
(d) Statistical Branches of (i) Forest Research Institute, (ii) Central Rice Research Institute, (iii) Central Marine and Fisheries Research Station, etc.
(e) Institute of Agricultural Research Statistics.
2. Ministry of Commerce and Industry
(a) Department of Commercial Intelligence and Statistics.
(b) Office of the Economic Adviser to the Govt. of India.
(c) Directorate of Industrial Statistics.
(d) Statistical Sections of offices of (i) the Textile Commissioner, (ii) the Iron and Steel Controller, and (iii) the Chief Controller of Imports and Exports.
3. Ministry of Finance
(a) Department of Research and Statistics of the Reserve Bank of India.
(b) Research and Statistics Section, Department of Company Law Administration.
(c) Statistical Branch (Income-tax) of the Central Board of Revenue.
(d) Statistics and Intelligence Branch (Customs and Central Excise), Central Board of Revenue.
4. Ministry of Labour and Employment
(a) Labour Bureau.
(b) Statistical Unit in the Department of Mines.
(c) Statistical Branch-Agricultural Labour Enquiry.
(d) Statistical Section of the Directorate of Resettlement and Employment.
5. Ministry of Home Affairs
Office of the Registrar-General and Census Commissioner of India.
6. Ministry of Health
7. Cabinet Secretariat
Central Statistical Organisation-Directorate of National Sample Survey and National Income Unit.
8. Planning Commission
Specific technical studies being conducted in the different divisions and by the Evaluation Organisation.
Similarly under other Central Ministries also there are statistical
units which collect and analyse statistics relating to the subject which the
ministry deals in.
In the States also there is a decentralized type of statistical organisation.
There are more than 101 statistical units working in different
states of the country. Almost all the states in the country have either
a Directorate of Economics and Statistics or a Bureau of Statistics.
The state statistical organisation is responsible for the collection and
publication of statistics relating to the state concerned and it has District
Statistical Officers for exercising supervision and processing of
data collected from various sources. The Directorate also co-ordinates
the statistics collected by other units in the State. The Inter-State
co-ordination of statistics is done by the C. S. O.
Decentralization of the statistical organisation necessitates the
presence of an efficient co-ordinating agency so that statistics may be
collected uniformly throughout the country and there may not be
unnecessary duplication. Accordingly the Central Statistical Organisation
was set up under the Cabinet Secretariat at the Centre in 1951.
The main functions of this Organisation are:
(a) To advise the various Ministries and other Government agencies
about statistical matters,
(b) To co-ordinate the statistical work of different ministries
and Government agencies with a view to avoid duplication and to
maintain uniformity,
(c) To lay down standardized definitions of various terms with a
view to have a uniform collection of statistics so that comparability
may be achieved not only in the national field but in the international
field also, and
(d) To supply statistical data to U. N. O. and other international
bodies on behalf of the Government of India.
Besides the C. S. O. the Directorate of N. S. S. is the most important
statistical organisation in the country. It was set up in the
year 1950 and is by far the most important agency for the continuous
collection of reliable statistical data on a random sample basis.
Apart from these government organisations, the non-government
organisations worth mention are: (i) Indian Statistical Institute,
(ii) National Council of Applied Economic Research, (iii) Indian
Institute of Economic Growth, (iv) Gokhale Institute of Politics and
Economics, (v) Institute of Applied Man-power Research. The I. S. I.
was set up in 1931 and ever since has been doing researches in statistical
methods and imparting training for statistical assignments. In
1960, the Government of India passed the Indian Statistical Institute
Act and recognized it as an institute of national importance.
The above account of the gradual development of statistical organisation
in India clearly indicates that it is only very recently that we
have had in our country a statistical organisation worth the name. The
Government has realised the importance of the collection of statistics
and is keen to see that statistics are collected in our country almost in
the same fashion as they are collected in other countries of the world.
It is common knowledge that economic planning cannot progress
successfully in the absence of adequate and accurate statistical data and
the Government is now doing its best to improve the situation in this
direction. Formerly considerable difficulty was felt by the Government
in the collection of statistics due to the reluctance of people to part
with factual information. To overcome this difficulty the Government
passed an Act named the Collection of Statistics Act in the year 1953.
It gives legal powers to the Government to collect any type of statistical
information regarding industrial and commercial units. All types
POPULATION STATISTICS
list was being prepared information was collected about the purpose
for which a house was used (namely for residence or for shop or workshop
or school or any other institute etc.). In case a house was used
as a workshop or factory, further information about the number of
persons employed, type of work done, and kind of fuel or power
used was also noted down. Details were also obtained about the description
of the house: type of walls and type of roof etc. Separate sex-wise
figures about the number of persons below and above 10 years
of age living in a house were also obtained. Thus the house list which
was prepared in the months of September and October, 1960 and which
was checked again in December, 1960, contained very useful information.
(vi) The most important change which was introduced in the 1961
census related to economic characteristics about which statistics were
obtained. The occupational classification adopted in this census was
different from those adopted in earlier censuses. For the first time
in this census the whole population of the country was divided into
two broad categories of "Working" and "Not working." The 1951
classification of economic status (self-supporting, earning dependents
and not earning dependents) was entirely dropped.
Statistics about principal and subsidiary means of livelihood which
were collected for the whole population in the 1951 census were obtained
only for certain classes of people in the census of 1961 as it was thought
that for a large majority of the Indian population, particularly of rural
areas, this distinction between principal and subsidiary means of livelihood
was meaningless. Moreover the basis on which the distinction
between principal and subsidiary means of livelihood was made in
1951 and earlier censuses was changed. Formerly the criterion adopted
was that of income, so that the principal means of livelihood was
supposed to be one from which the largest share of income was derived,
but in the 1961 census income was replaced by time and the principal means
of livelihood was supposed to be one to which a person devoted a major
part of his working time.
(vii) In the 1961 census some of the questions which were asked
in the 1951 census and which were no longer important were entirely
dropped. For example, the question relating to displaced persons from
Pakistan, which was included in the 1951 census, was dropped in the last census
of 1961.
(viii) Many other minor changes were also made in the census
of 1961. For example, in earlier censuses prostitutes and concubines
were treated as unmarried but in the census of 1961 the marital status
indicated by them was noted down. Similarly the category of divorced
persons was extended and renamed as 'separated or divorced' so that
it could include persons who were not formally divorced but who were
living separately without any intention of a reunion.
Information collected in 1961 census
Information in the last census was collected on two different types
of slips, namely Household slips and Individual slips. We shall discuss
the information obtained through these slips separately.
Household slips. The household slip was meant to collect the
following information about census households.
(1) Is the household an institution. In the household slip it was to
be mentioned whether the enumerated household was an institution
like a jail, asylum, religious institution, hostel, hotel, hospital or
boarding house etc. If it was any such institution then it was to be
clearly specified in the slip. The idea was to find out the number of
households which were different types of institutions as distinct from
households which were family units.
(2) Name of the head of the household. The head of the household
was supposed to be the person on whom fell the chief responsibility
for the maintenance of various members constituting a household.
Thus the head of the household was not necessarily the eldest member
of the family or a male. The head of the household could have been
a very young person of either sex. However, the enumerators were
instructed not to go into the details of this question and were asked to write
the name of such person, as head of the household, as was given by the
informant. In hostels, hospitals etc. the Superintendent was taken as
the head of the household and such households were classed as "Households
of unrelated persons."
(3) Does the household belong to scheduled castes or tribes. A list of
such castes and tribes residing in different districts of each State was
supplied to the enumerators and they were to find out if any household
belonged to these categories.
(4) Households engaged in cultivation and/or household industries and
details of persons working in either or both cultivation and household industries.
This section of the household slip was divided into the following three
parts: (a) Cultivation, (b) Household Industry and (c) Workers at
cultivation and/or household industries.
It is clear from the above account that for the first time in our
country an attempt was made to find out the number of households
and also the number of people who were working in agriculture or household
industries or in both. Distinction was also made between whether
they were owners or only hired labourers. These statistics would be
extremely useful in studies relating to the basic structure of the Indian
economy.
Individual slips. On individual slips statistical information was
collected separately about each single individual of the country. One
slip was filled for one individual only and the following information
was collected.
(1) (a) Name. The name of the person to whom the slip related
was noted down. If the name of a lady was not disclosed then in place
In case of item No. (iii) above the name of the district of birth
and in item No. (iv) the name of the State of birth and in item No. (v)
the name of the country of birth were noted down.
(b) Whether born in village or in town. This information was collected
separately from the information of question (4) (a) above. Persons
born in places which were not considered a town at the time of
their birth but were in the category of town at the time of census were
considered to have been born in town.
(c) Duration of residence if born elsewhere. This information was
collected about those persons only who were not born in the village
or town in which they were enumerated. If a person was born in any
other village or town of the same district where he was enumerated or
if he was born in another district (of any State) or in any other country,
his length of residence at the place of enumeration (in complete years)
was noted down. If the period of stay was less than one year, the
length of residence was recorded as zero.
(5) (a) Nationality. If a person had a nationality other than
Indian then the name of the country of his nationality was noted down.
(b) Religion. Data were collected about all religions but symbols
were assigned only to Hindu, Muslim, Christian, Jain, Buddhist and
Sikh religions. For other religions the actual name of the religion
was noted down.
(c) Scheduled castes and scheduled tribes. The answer to this enquiry
was recorded only if a person belonged to the scheduled castes or scheduled
tribes. A list of such castes and tribes was prepared districtwise and
supplied to the enumerators and they were to write down the actual
caste or tribe to which the person belonged and not general terms
like Harijans or untouchables or scheduled castes. Scheduled caste
persons could have belonged only to Hindu or Sikh religions though
Scheduled tribes could belong to any religion.
(6) Literacy and education. The following information was collected
about literacy and education.
(a) Persons who could neither read nor write or who
could read but not write. Such persons were treated as illiterate.
(b) Persons who could both read and write. Only such persons
were treated as literate and the test of literacy was whether a person
could read and write a simple letter.
(c) Standard of education. If a person was literate (that is, he
could both read and write) and he had passed some examination, a
further enquiry about the highest examination passed was made and
the answer recorded.
(7) (a) Mother tongue. For purposes of census mother tongue
was supposed to be the language in which a person's mother spoke
to him or her in childhood or the language commonly spoken in the
family. If the mother of a person died in his childhood, the language
commonly spoken in the family during his childhood period was taken
as mother tongue. For infants and deaf and dumb people the language
spoken by their mother was recorded as their mother tongue.
(b) Any other language. If a person knew one or more languages
(either Indian or foreign) other than the mother tongue, they were also
noted down. However, not more than two such languages were
recorded for a person.
(8-11) Working Population. Those who were classed as Working
could belong to any one of the following categories:
(8) Working as cultivator.
(9) Working as agricultural labourer.
(10) Working in any household industry and
(11) Working in occupations and classes other than those mentioned
in 8, 9 and 10 above.
Main occupation. If a person belonged to more than one category,
say, he worked as cultivator as well as in household industry, he was
included in both these classes. All such people who were entered
in more than one class were further asked a question as to which of
these categories was their main work and which ranked as No. 2 or
No. 3 as the case may be. For the purpose of this census the main
occupation was that to which the largest amount of time was devoted.
It should be remembered that in earlier censuses when a distinction
was made between principal and subsidiary means of livelihood the
criterion was not time devoted, but income, so that the principal
means of livelihood was taken as one from which a person earned the
largest amount of income. In this census, however, the criterion was
changed from income to time.
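The change of criterion can be illustrated with a small sketch. The figures, the function names and the dictionary layout below are all hypothetical and are not taken from the census schedules; the sketch only shows how the same person may receive a different principal occupation when ranked by time instead of by income.

```python
def rank_by_time(hours_by_category):
    """Rank one person's work categories by time devoted, largest first
    (the 1961 criterion as described in the text)."""
    return sorted(hours_by_category, key=hours_by_category.get, reverse=True)


def principal_by_income(income_by_category):
    """Pick the category yielding the largest income
    (the criterion of 1951 and earlier censuses)."""
    return max(income_by_category, key=income_by_category.get)


# Hypothetical person working both as cultivator and in household industry.
weekly_hours = {"cultivator": 30, "household industry": 15}
monthly_income = {"cultivator": 40, "household industry": 90}

ranking_1961 = rank_by_time(weekly_hours)      # main work, then No. 2, ...
main_1961 = ranking_1961[0]
principal_1951 = principal_by_income(monthly_income)

print("1961 ranking by time:", ranking_1961)
print("Main occupation (time):", main_1961)
print("Principal means (income):", principal_1951)
```

For this made-up person the time criterion gives cultivation as the main occupation, while the older income criterion would have given household industry.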
Prisoners who were undertrials and were not convicted were
supposed to belong to that occupation to which they belonged before
their arrest. Similarly patients in hospitals were supposed to belong
to that category of work to which they belonged before they were
admitted as patients. However, convicted prisoners and lunatics in asylums
were classed as "not working."
(12) Not Working Population. All persons who did not do any
work and consequently were not included in categories associated with
items 8, 9, 10 and 11 were classified in this group. Eight categories
of such persons were mentioned and they were as follows:
(i) Wholetime students or children going to school who
did not do any work like making articles at home for
sale or who did not help in household industry or business
or cultivation.
(ii) Persons engaged in unpaid domestic work (like housewives)
and who did not do any other work like making
articles for sale and who did not even assist in household
cultivation, business, trade or industry.
(iii) Dependents including infants and children not going to
school, and persons permanently disabled due to illness,
old age etc.
(iv) Retired people who were not re-employed, rentiers, persons
living on agricultural and non-agricultural royalty,
rent or dividends, persons of independent means for
which they did not have to work and who did not do
any other work.
(v) Beggars, vagrants and independent women whose source
of income was not disclosed or other persons whose
source of income was not known.
(vi) Convicts in jail (not under-trials) or inmates of penal,
mental or charitable institutes.
(vii) Persons who were not employed in work at any time
before but who were seeking work for the first time.
(viii) Persons who were employed in work formerly but were
without work at the time of enumeration and who were
in search of work.
Persons who could not be classified in any one of the eight categories
were included in category (v). Persons who were not working
but had been offered work which they had not joined at the time of
enumeration were included in category (iii).
(13) Sex. People were classified as males or females. Eunuchs
and hermaphrodites were treated as males.
General criticism of Indian population census
Data not comparable. It is a sad commentary on the organisation
of Indian population statistics that the data collected in the last eight
censuses are not strictly comparable with each other because the definitions
of the various terms used and the classifications under which
the data have been published have been changing from census to census.
The area covered in various censuses differs. In the year 1881
the census covered an area of 1,382,624 square miles, in 1941 of 1,581,410
square miles and in the 1951 census the area covered was only 1.18 million
square miles (the reduction in area was due to the partition of the
country).
Figures inaccurate. Besides this the figures collected at the time
of the census are also not very accurate. There are various reasons
for the inaccuracies in Indian population census data. One of them
is the indifferent attitude of the Indian population towards census operations
in general. This apathy is on account of the fact that population
census in India is a very temporary affair and according to one writer
the Indian population census is like a comet which appears on the
Indian horizon once in 10 years, attracts much attention and passes
away unnoticed. It has already been said that there was no inter-censal
activity in our country till recently and on account of this the
general interest of the people in population census was very short-lived.
It should not be forgotten that it takes two to make the census,
the enumerator and the citizen, and the role of the latter is the
more important of the two. After all the accuracy of the census data
would depend on the replies given by the citizen rather than on the
efficiency of the enumerator. It is gratifying to note that after the last
census of 1951 the population census department has not been completely
abolished and is carrying on certain demographic surveys the
findings of some of which have been published very recently. It is
necessary that such surveys are conducted very frequently so that
people are always in touch with the activities of the census department
and if this is done there is no doubt that the general character of the
Indian figures would considerably improve in future.
An inexpensive census. The Indian census is in fact the cheapest
census in the world. In other countries enumerators are paid on the
basis of the number of people counted by them. In our country the
enumerators were given a certificate for the work done. In the 1951
census a special medal was given to efficient workers. There has
been a strong feeling that the roughly one million workers engaged in
census operations cannot be expected to put their heart in the work
unless they are paid something. In the 1961 census a token payment
of Rs. 4 per supervisor and Rs. 16 per enumerator in an average
block was made. Besides the question of payment the next
question is that of training. Our enumerators are not trained and
there is need of having better trained enumerators. The type of
training that they get is hardly any good and particularly in rural areas
the Lekhpals etc., do not take adequate care to see that the returns are
filled up accurately. It is, therefore, extremely necessary that we should
appoint, as far as possible, literate and well-trained enumerators to conduct
census operations in future.
We have already pointed out that in the last census a sample
verification of census figures was conducted on a random sampling
basis and it was found that there was an under-estimation in the original
figures. On the basis of sample verification it was estimated that the
error was of the magnitude of 1.1 per cent, i.e., for every 1000 persons about
11 were omitted. It goes without saying that if both the citizen as well
as the enumerator realise their responsibilities such errors in counting
would not arise.
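The arithmetic behind this figure can be set out as a short sketch. The omission rate of 11 per 1000 comes from the text; the block population of 250,000 is a made-up figure, and treating the rate as "omissions per 1000 persons enumerated" is an assumption made here only for illustration.

```python
OMITTED_PER_THOUSAND = 11  # from the text: about 11 persons omitted per 1000


def error_per_cent(per_thousand=OMITTED_PER_THOUSAND):
    """Express the omission rate as a percentage (11 per 1000 = 1.1 per cent)."""
    return per_thousand / 10


def estimated_omissions(enumerated, per_thousand=OMITTED_PER_THOUSAND):
    """Estimated number of persons missed, assuming the rate applies
    per 1000 persons actually enumerated (an assumption of this sketch)."""
    return round(enumerated * per_thousand / 1000)


def adjusted_count(enumerated, per_thousand=OMITTED_PER_THOUSAND):
    """Enumerated count corrected upward for the estimated omissions."""
    return enumerated + estimated_omissions(enumerated, per_thousand)


enumerated = 250_000  # hypothetical population counted in some area
missed = estimated_omissions(enumerated)
corrected = adjusted_count(enumerated)

print("Error:", error_per_cent(), "per cent")
print("Estimated omissions:", missed)
print("Adjusted count:", corrected)
```

On these assumptions an area returning 250,000 persons would have about 2,750 persons uncounted.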
Indian age-returns are admittedly unsound. One very important cause
of it is the ignorance of people. Ignorance is something quite natural in
a population which is illiterate and which does not keep any systematic
record of age. Besides ignorance, indifference is also responsible for the
unreliability of age statistics in India. As the Census Superintendent
of Central India pointed out in 1931, "Indifference arises from the
outlook on life. The average man or woman in India matures early
and is short-lived. Life presses heavily on them and fatalism overpowers
them. Childhood, adolescence, middle life, and old age are well
marked stages in life and the Hindu sociological system has laid down
conduct of life and presented rules for the observance of customs and
practices. It matters not if the present age is not known."
There are some other reasons also which are responsible for the
deliberate misstatements about age in Census Reports. In case of un-
married girls who have reached puberty the age returns are definitely
wrong. The reason for it lies in the fact that high class Hindus feel
shy to admit that they have unmarried daughters already pubescent
who should have been married by that time according to the custom
of the community. EO'umeration may also be short at this age on
account of the practice found in certain parts of India of secluding
girls at the age of puberty. At this age the returns regarding the
ages of males are also wrong, and the reasons of it may lie in the desire
of a person to appear definitely either as a boy or as a man. Early
marriage is also responsible for the wrong returns at this age, because
there is a tendency to exaggerate the age of boys just married and to
understate the ages of those who are not married. Widowers and
bachelors who have advanced in age and who wish to marry usually
understate their ages; in case of elderly Womer the returns are usually
correct but recently married girls and particularly those who have
become mothers tend to overstate their ages because motherhood
implies some elderliness. Old persons of both sexes generally overstate
their ages as it gives them a sort of pride. Some people overstate their
ages on account of superstition. Many people believe that by telling
the age correctly the length of life would be reduced. Sometimes wrong
figures are given for fear that they may be used in the court of law as
evidence of age. Moreover there is a tendency to quote the age in
round figures. The most popular digits are 0 and 5. This defect can
be removed to a certain extent by grouping the age returns in such a
way that the effect of the bias is lessened if not totally removed. It can
be achieved if there are alternately 3-yearly and 7-yearly classes so that
their midpoints are always 0 and 5. This was actually done in the 1931
census. Deliberate misstatements of age can be reduced if the persons
are made to understand and appreciate the fact that all ages are merged
in groups and that their names disappear from the age records. If there
is any doubt about the age of any person it can be verified by hastily
asking him his age on the happening of a particular important event.
In the last census the enumerators were equipped for this purpose with
a calendar of important events-events of such local importance that
everyone could be expected to remember one or other of them. The
appointment of female enumerators can also be helpful to a certain
extent as they can easily see the family members themselves and can
verify certain facts. In this respect the 1951 Census Report makes an
interesting observation that "we do not apparently share one weakness
which is prominently observed in some other countries. Our women
folk - speaking generally and in large numbers - are not keen on being
recorded as younger than they are."
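The grouping into alternate 3-yearly and 7-yearly classes can be sketched in a few lines of Python. The class limits used below (4-6, 7-13, 14-16, 17-23 and so on, with an opening class 0-3) are an assumption made for illustration; the text states only that the classes alternate and that their midpoints end in 0 and 5.

```python
def age_class(age):
    """Return (low, high) limits of the alternating 3-year / 7-year class
    containing `age`. Class limits are assumed, not taken from the census."""
    if age < 4:
        return (0, 3)  # assumed opening class for the youngest ages
    offset = (age - 4) % 10          # position inside the 10-year cycle
    start = age - offset             # first age of the cycle: 4, 14, 24, ...
    if offset < 3:
        return (start, start + 2)    # 3-year class, midpoint ends in 5
    return (start + 3, start + 9)    # 7-year class, midpoint ends in 0


def class_midpoint(age):
    """Midpoint of the class into which `age` falls."""
    low, high = age_class(age)
    return (low + high) / 2
```

An age reported as 20 or 25 thus falls at the centre of its class, so that returns heaped on these round figures are not pushed to the edge of a group.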
Statistics of occupation. Our returns of occupational structure are
also not very satisfactory. The manner of classification of the popula-
842 FUNDAlmNl'4LS OF STAnSnCS
It has been found that India has a higher age specific fertility
(births per 1000 females in the same age groups) for all ages as well as
individual age groups when compared with U.S.A., Canada, England,
Belgium, Denmark, France and even Japan. It has also been found
that fertility in India follows a pattern which is entirely different from
the pattern experienced in other countries of the world. In our country
fertility is low in the age group 15-19 but it rises very sharply in the
next age group of 20-24 and rises slightly in the age group 25-29.
After this there is a decline. In the U. S. A., England and Japan
also the fertility in the age group 15-19 is very low. In these countries
also fertility rises sharply in the age group 20-24. In U. S. A.
it reaches the peak in this very period. In England and Japan fertility
rises further but slightly in the age group 25-29 and then there is a
decline. In the U. S. A. the decline starts from the age of 24. In our
country the decline in fertility even after the age of 29 is gradual while
in other countries the decline is very steep. This is the reason why fertility
in higher age groups, say 40-44 and even 45-49, is very high in our
country as compared to the other countries of the world.
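The age specific fertility rate defined above (births per 1000 females in the same age group) can be computed as in the following minimal sketch. The counts of births and women are hypothetical and are chosen only to show the arithmetic; they are not census figures.

```python
def age_specific_fertility_rate(births, women):
    """Births per 1000 women of the same age group."""
    if women <= 0:
        raise ValueError("number of women must be positive")
    return 1000.0 * births / women


# Hypothetical (births, women) counts for three age groups.
hypothetical_counts = {
    "15-19": (1200, 10000),
    "20-24": (2600, 10000),
    "25-29": (2700, 10000),
}

rates = {
    group: age_specific_fertility_rate(births, women)
    for group, (births, women) in hypothetical_counts.items()
}
```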
In the year 1952 Sri R. A. Gopalaswami, the Indian Census
Commissioner and Registrar-General, prepared a scheme for the
improvement of population data. This scheme was divided into two
parts: the first part related to a simultaneous revision of the National
Register of Citizens and the electoral rolls, and the second part related
to the sample census of births, deaths, etc. In accordance with these
recommendations a sample census of births and deaths was carried out
in all the States excepting Mysore, Hyderabad, Orissa, West Bengal,
Bhopal and Delhi. This study covered 20 States with a population of
27.83 crores (in 1951) or, in other words, 78% of the total population
of India. The enumeration took place between September, 1952 and
January, 1953. In Uttar Pradesh it took place in 1954. The first part
of the scheme, relating to the simultaneous revision of the National
Register of Citizens and the electoral rolls, could be tried only in
Madras, Coorg, Vindhya Pradesh, Madhya Pradesh and Madhya
Bharat.
Some of the findings of the sample census of births and deaths
are as follows :
(1) In a majority of the States the birth rate has been found to
be between 30 and 40. In almost all cases the birth rates of the sample
are less than those computed by the Census Actuary for 1941-1950. It
means that there has been a certain amount of under-estimation of births
in the sample census. The sample census was confined to 1% of the
households in the selected districts. It may also be that the birth rate
in India at present is less than the birth rate during the decade
1941-1950 and this may be the reason why the sample census has obtained
birth rates lower than those obtained by the Census Actuary.
But it would be highly dangerous to draw any conclusions about the
fall in death rate from these data unless this tendency is confirmed by
other facts.
(2) Child birth index for completed maternity in India is placed
between 6 and 7, i.e., on an average a woman produces between 6 and
7 children during the reproductive age period. In Japan this figure is
5.3, in U. S. A. 3.3 and in England 2.6 only.
(3) For mothers of completed fertility the child loss varies from
about 20% to 33%. In other words, about 1/5th to 1/3rd of the children
born predecease their mothers, undoubtedly a very high figure
which shows the extent of the colossal loss of human resources which takes
place in our country.
(4) In a majority of the States of India between 40 and 50% of the
births are of the first and second order. If the total of the births of
the first three orders is taken it works out to be between 60 and 70%. In
Japan this percentage is 74, in U. S. A. 76 and in England 85. According
to the Census Commissioner all births after the third lead to unwanted
increase in the size of the family in an undeveloped country like
ours and it amounts to what he calls improvidence. The proportion
of births of the fourth and higher orders has, therefore, been called the
incidence of Improvident Maternity. This incidence in our country
ranges between 40 and 50%. In Japan this incidence is 26.1%, in
U. S. A. 23.5% and in England 14.9% only. A very interesting feature
of this study is that the incidence of improvident maternity is higher
in the urban than in rural areas. It is rather a very unexpected
result.
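The incidence of Improvident Maternity, as defined in finding (4), is the percentage of births of the fourth and higher orders among all births. A minimal sketch of the calculation follows; the counts of births by order are hypothetical and serve only to illustrate the definition.

```python
def improvident_maternity_incidence(births_by_order):
    """Percentage of births of the fourth and higher orders.

    `births_by_order` maps birth order (1, 2, 3, ...) to number of births.
    """
    total = sum(births_by_order.values())
    if total == 0:
        raise ValueError("no births recorded")
    fourth_and_higher = sum(
        count for order, count in births_by_order.items() if order >= 4
    )
    return 100.0 * fourth_and_higher / total


# Hypothetical distribution of 1000 births by order of birth.
hypothetical_births = {1: 250, 2: 220, 3: 180, 4: 150, 5: 200}
incidence = improvident_maternity_incidence(hypothetical_births)
```

In this illustration births of the first three orders form 65% of the total, so that the incidence works out to 35%.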
                  Registered            Estimated by Reverse
                                        Survival method
Decade          Birth     Death         Birth     Death
                rate      rate          rate      rate
1901-10         37                      48.1      42.6
1911-20         37        34            49.2      48.6
1921-30         34        26            46.4      36.3
1931-40         34        23            45.2      31.2
1941-50         28        20            39.9      27.4
The following table shows the birth, death and infant mortality
rates since 1947 based on the registration data.
TABLE I
1901 23,62,81,245
1911 25,21,22,410
1921 25,13,52,261
1931 27,90,15,498
1941 31,87,01,012
1951 36,11,29,622
1961 43,92,35,082
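The decennial rate of growth can be worked out from the figures of Table I as sketched below. It is assumed here that Table I gives the population of India at each census (the 1951 figure agrees with the 36.1 crores shown in Table II); the function names are illustrative only.

```python
# Figures of Table I, written in the Indian system of digit grouping.
census_population = {
    1901: "23,62,81,245",
    1911: "25,21,22,410",
    1921: "25,13,52,261",
    1931: "27,90,15,498",
    1941: "31,87,01,012",
    1951: "36,11,29,622",
    1961: "43,92,35,082",
}


def parse_indian_number(text):
    """Convert a figure such as '36,11,29,622' into an integer."""
    return int(text.replace(",", ""))


def decennial_growth(table):
    """Percentage change in population between successive censuses."""
    years = sorted(table)
    growth = {}
    for earlier, later in zip(years, years[1:]):
        p0 = parse_indian_number(table[earlier])
        p1 = parse_indian_number(table[later])
        growth[(earlier, later)] = 100.0 * (p1 - p0) / p0
    return growth


growth = decennial_growth(census_population)
```

On these figures the decade 1911-21 shows a small decline, while 1951-61 shows the largest increase of the period.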
TABLE II
                               India    U. S. A.   U. S. S. R.   World
Population (in crores)          36.1     15.1       19.4          240
Land area (in crore acres)      81.3     590.5      590.4         3251
Area per capita (in cents)
  (a) All land                  225      1264       3246          1354
  (b) Agricultural area         97       741        448           351
  (c) Arable land               97       302        287           126
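The figures of area per capita in Table II are expressed in cents, a cent being one-hundredth of an acre. The following sketch shows the conversion for the India and World columns; the function name is an assumption made for illustration.

```python
CENTS_PER_ACRE = 100  # one cent is one-hundredth of an acre


def area_per_capita_in_cents(land_crore_acres, population_crores):
    """Land area per head of population, expressed in cents."""
    acres_per_head = land_crore_acres / population_crores
    return acres_per_head * CENTS_PER_ACRE


india_all_land = area_per_capita_in_cents(81.3, 36.1)
world_all_land = area_per_capita_in_cents(3251, 240)
```

The results, about 225 cents for India and a little over 1354 cents for the World, agree with row (a) of the table.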
TABLE III
Rural and urban population (1921-1961)
              Percentage of total population
Year          Rural        Urban
1921          88.6         11.4
1931          87.9         12.1
1941          86.1         13.9
1951          82.7         17.3
1961          82.0         18.0
TABLE IV
Size of rural and urban households in 1951.
TABLE V
Sex Ratio 1921 to 1961 (Females per 1000 males)
TABLE VI
Livelihood pattern in some other countries
(Per one thousand self-supporting persons)
TABLE VII
Population Census 1961
                              Population    Density        Sex ratio      Literacy
                              (in           (No. of        (Females       rate
                              millions)     persons per    per 1,000      (per 1,000)
                                            square mile)   males)
States
Andhra Pradesh                35.98         339            981            212
Assam                         11.87         252            876            274
Bihar                         46.46         691            994            184
Gujarat                       20.63         286            940            305
Jammu and Kashmir             3.56                         878            110
Kerala                        16.90         1,127          1,022          408
Madhya Pradesh                32.37         189            953            171
Madras                        33.69         669            992            314
Maharashtra                   39.55         333            936            298
Mysore                        23.59         318            959            254
Orissa                        17.55         292            1,002          217
Punjab                        20.31         430            864            242
Rajasthan                     20.16         153            908            152
Uttar Pradesh                 73.75         649            909            176
West Bengal                   34.93         1,032          878            293
Union Territories
Andaman and Nicobar Islands   63,438 (b)    20             617            336
Delhi                         2.66          4,640          785            527
Himachal Pradesh              1.35          124            923            171
Laccadive, Minicoy and
Amindivi Islands              24,108 (b)    2,192          1,020          233
Tripura                       1.14          283            932            202
ALL INDIA                     439.24 (c)    370 (d)        941            240
TABLE VIII
Some Projections of Indian Population
by studying the density of population. He can find out the best markets
for the commodity in which he deals. He can make use of occupational
statistics to find out whether a certain area is inhabited mostly by labourers
or agriculturists or by people of a particular occupation. This
would enable him to estimate the demand for his goods at different
places. An industrialist can make use of the statistics relating to age
and sex of the population to find out the labour supply that he can
recruit from particular regions. A transport agency like the railways would
find from the Census Reports the areas which are densely populated
and in which there is scope for the development of transport. Similarly
other types of institutions engaged in trade and commerce would
find very valuable data in the Census Reports.
Various types of statistics collected at the time of population
census are useful in their own ways. Thus statistics of marital status
throw light on the pattern of population growth of a country. These
statistics are very important from the socio-medical point of view. Statistics
of marital status are also helpful in studying housing problems,
particularly in urban areas. They also throw light on the problem of
dependency. Similarly statistics of age and sex are not only helpful in
making forecasts about population for the future but they are also very
helpful in studying the problems of fertility, dependency, etc. They also
give an idea of the extent of man power that can be mobilized in times of
war, if necessary. Besides this, if statistical data relating to other problems
are classified on the basis of age and sex they become more useful
and can be analysed in a better manner.
The importance of the study of literacy and education is also very
great, particularly in a country where a large number of people are supposed
to be illiterate and uneducated. The Government can decide
about the particular policy that it should follow in matters of education
in various parts of the country on the basis of statistics collected at the
time of the census. In the last census in our country the Government
had collected statistics not only about whether a person was literate or
illiterate but also about the highest academic qualifications of the educated
persons. On the basis of these facts it is now easy for the Government
to lay down a policy with regard to the particular type of education
that is necessary and that should be developed in future. The information
about the language spoken by the people is gathered in some
detail, and this helps in deciding the relative place of regional
languages as medium of instruction for elementary and higher education.
Statistics relating to the economic characteristics of the people
are probably the most important statistics collected at the time of the
census. The data relating to the means of livelihood and the economic
status of the population can be utilised in a large number of ways. It
is possible from these data to find out which occupations and which
industries are overcrowded and on the basis of such studies the govern-
AGRICULTURAL STATISTICS
Early beginnings
Meaning. In a broad sense the term agricultural statistics is used
to refer to all statistics which have any bearing on agricultural economy,
such as statistics of land utilisation, production, crops, livestock, forest
and fisheries, agricultural prices, wages, land revenue etc. In common
usage, however, this term is used to denote statistics of land utilisation
and of production of crops. We shall be using this term in this sense
throughout this section and shall discuss statistics of agricultural prices,
wages, etc., in another section dealing with price statistics.
Early beginnings. The collection of agricultural statistics is not
something new to this country. In ancient days the kings and chiefs
who ruled the country used to collect figures of yield of various crops
and the area under cultivation. The reason for this interest in agricultural
statistics was that in those days most of the
public revenue was derived from land and as such it was necessary to
have records of area and yield of various crops to find out the amount
of land revenue. Ain-i-Akbari and some other documents of the Moghul
period clearly indicate the manner in which such statistics were collected.
The British people also realised the importance of having agricultural
statistics as India was in those days, and is even today, primarily an
agricultural country. From the point of view of revenue collection, agricultural
statistics were not so important for those areas where there was
Permanent Settlement but in Ryotwari areas where the amount of land
revenue depended upon net produce such statistics were absolutely
necessary. The Ryotwari system or the system of temporary settlement was
introduced in some parts of the country as early as 1792 and for this
purpose statistics of land values, cost of cultivation, prices of produce
and crop yields had to be collected. Later on the Ryotwari system was
introduced in various parts of the country and agricultural statistics
began to be collected more exhaustively, though on the same lines.
Crop forecasts. In those days and even till recently most of the
agricultural statistics of our country were related to crop forecasts that
were made from time to time. The history of the present crop forecasts
dates back to the year 1861. Famines and droughts in India were a
common phenomenon in those days and the government had to start the
collection of agricultural statistics with a view to apprising itself
of the real position. As has been mentioned in earlier sections it was
in the year 1875 that the Department of Agriculture and Commerce was
established in Uttar Pradesh (then known as North Western Provinces)
and later on, on the recommendations of the Indian Famine Commission,
similar departments were opened in other Provinces and the
Central Department of Agriculture, which had been closed by the Government
due to the financial stringency created by the Afghan war, was also
revived to co-ordinate the activities of the various Provincial Agricultural
Departments. A Statistical Conference was held at Calcutta in
1883 and it strongly recommended early publication of crop forecasts,
and in 1884 the Government of India issued instructions to the Provincial
Governments to make an experiment in this line and to start with
wheat. After that, from time to time, both the Central and the Provincial
Governments have been taking interest in the matter of crop forecasts
and in the year 1896 many other commodities were added to
wheat and crop forecasts of important commodities began to be made.
After the first World War certain improvements were made in the agricultural
statistics of the country and various researches were conducted
by the Indian Council of Agricultural Research and the Indian Statistical
Institute, Calcutta, not only with a view to improving the methods
of agriculture followed in this country but also to collect agricultural
statistics on scientific lines. In the last few years the system of making
forecasts and collection of other agricultural statistics has undergone a
very great change and nowadays the crop forecasts which are issued are
made on scientific lines by the use of the random sampling technique.
We shall now discuss some of the important agricultural statistics
available in our country at present :
Area Under Crops
Two sets of acreage figures are available in our country at present
viz. (i) Official Series based on village records and (ii) the N. S. S.
Series based on Sample Surveys. We shall first discuss the Official
Series.
(1) Official Series. Fairly detailed statistics are available in the
Official Series about area under crops. The total area under crops
is, broadly speaking, divided into food crops and non-food crops.
Food crops are further sub-divided into food grains (consisting of
cereals and pulses), sugar, condiments and spices, fruits and vegetables
and other food crops. Statistics relating to non-food crops are subdivided
into oil seeds, fibres, dyes and tanning materials, drugs and narcotics,
fodder crops, green manure crops and other non-food crops.
In subsequent pages we shall examine these statistics in detail
and shall discuss the methods by which statistics of area under various
crops are estimated in our country from village records. As has
been mentioned earlier there are two types of settlements in our
country, namely, (a) temporary settlements and (b) permanent settlements.
The system of temporary settlements was introduced in
our country in the year 1792, with a view to fixing land revenue for a
period, which was subject to change at the time of the next settlement.
Ordinarily the interval between the two settlements was 25
to 30 years. In order to determine the land revenue and to make
estimates of forecasts, detailed statistics had to be collected about
land revenue, land value, cultivation costs, yield and value of produce
etc. Most of the temporarily settled areas are in U. P., Punjab and
Madras.
The other type of settlement, namely, the permanent settlement,
was mostly prevalent in Bihar, Bengal, Orissa and some parts of
eastern U. P. In such areas land revenue was permanently settled
and the question of revision ordinarily did not arise. After the independence
of the country the land tenure systems were changed and many
land reforms of far-reaching nature were made. The zamindari system
was abolished and an attempt was made to have owner cultivators
in the country. We shall discuss the collection of area statistics
in these two types of settlements separately, because whereas in
temporarily settled areas there was a permanent reporting agency
collecting figures of area under different crops regularly, in permanently
settled areas, where there was no question of the revision
of land revenue, there was no such agency for the collection of area
statistics, and figures relating to such tracts of land (which came under
permanent settlements) were in the nature of very rough estimates
and in many cases even guess work.
Temporarily settled areas
Technique of estimation. In temporarily settled areas the figures
of acreage are collected by the village accountant or the patwari and
are recorded by him in his register popularly known in northern
India as Khasra. The village accountant is known by different names
in various parts of the country such as Karnam in the south, Telathi in
Bombay and Karamchari in Bihar. In Uttar Pradesh patwaris
have been replaced by lekhpals. The work of the village accountant
is supervised by his immediate superior officer known by the name
Kanungo in northern India. He is expected to check at least 7 percent
of the khasra numbers or the entries made by the village accountant.
He is to select those fields for checking where numerous changes appear
to have occurred during the year. The selection is not based on
random sampling. These figures of acreage under various crops are
collected on the basis of complete enumeration. Most of the geographical
area of temporarily settled tracts is cadastrally surveyed and
detailed maps are available in tahsils and district headquarters. The
village accountant who is the primary reporting agency is expected
to conduct a field to field inspection to find out the amount of area
devoted to different crops.
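The supervisor's check described above is purposive and not random. By way of contrast, a check of the same size drawn by simple random sampling could be sketched as follows. This is an illustration only and not a procedure laid down in any manual; the list of khasra numbers is hypothetical.

```python
import random


def random_check_sample(khasra_numbers, percent=7, seed=None):
    """Draw a simple random sample of at least `percent` per cent of entries."""
    entries = list(khasra_numbers)
    # Integer ceiling of percent * N / 100, avoiding floating-point rounding.
    sample_size = (percent * len(entries) + 99) // 100
    rng = random.Random(seed)
    return sorted(rng.sample(entries, sample_size))


# Hypothetical village with khasra numbers 1 to 200.
selected = random_check_sample(range(1, 201), seed=42)
```

With 200 entries the sample contains 14 khasra numbers, every entry having an equal chance of being selected for checking.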
The figures supplied by different primary reporting agencies
within a district are totalled to find out the total area devoted to
various crops in the district. State totals are obtained by adding
district figures.
Shortcomings. Generally it is believed that area figures in the
temporarily settled tracts are fairly satisfactory. However absolute
reliance cannot be placed on them due to a variety of reasons. The
most important reason is the inefficiency and carelessness of the primary
reporting agency. The reports from many villages are not received
in time and in many cases they are not included in the compilation of the
figures. On account of this the coverage of the figures becomes incomplete.
There is some tendency in the primary reporting agency to avoid
a change. Many times field to field inspection is not done and either
the last year's figures are repeated or the registers are filled on the
basis of pure guess work. Messrs. Bowley and Robertson had suggested
that this could be avoided if the supervisors and higher officials
issued more detailed instructions and kept better supervision. The
suggestion, however, seems to be untenable because the Agricultural
Manual already gives very detailed instructions both to the primary
reporting agency and the supervisors with regard to the
manner in which area should be measured and recorded. No amount
of supervision can remove this tendency so long as the primary reporting
agency is unavoidably careless about these measurements.
In fact the lekhpal and the supervisors are so much preoccupied
with other work of a miscellaneous type that it is too much to expect
more than a passing interest from them so far as this work is concerned.
Some people have suggested that the work of estimating the
area and yield should be done through some other agency, but there
are certain advantages in the statistical collection and reporting being
done by the agency which is responsible for the related administration,
which in this case is the revenue agency. The data collected by such
agencies are likely to be superior to those collected through ad hoc
statistical agencies. Moreover when statistics are collected through
administrative agencies such figures as are immediately and urgently
needed become immediately available to the administrative agency.
Besides, in the course of the collection of statistics such agencies gather
a considerable amount of experience and knowledge (other than statistics
proper) and this can be utilised only when statistics are collected by
the administrative agency concerned. For these reasons we are of the
opinion that the collection of area and yield statistics should be done
by the Revenue Department and the practice should not be changed.
The National Income Committee was of the view that improvement in area
statistics could be brought about if the total coverage of figures were
spread over a number of years, say 5, so that every year the crop area
was reported from only 1/5th of the villages. This
suggestion was made with a view to reducing the work load of area reporting
to only 1/5th of its present burden. This suggestion would, no
doubt, reduce the burden of the primary reporting agency but the area
statistics obtained would not necessarily be representative of the total
area distributed over various crops. The Government of India have
recently brought into force a scheme of central supervision and random
checking of the work of lekhpals and it may improve the situation to
a certain extent. It would be worthwhile to collect some data on
the basis of random sample surveys relating to the area devoted to
various crops and to compare these figures with those obtained by
complete enumeration. This has been done recently through N.S.S.
and it has been established on the basis of such studies that even
in the temporarily settled areas, where there is supposed to be complete
enumeration, the figures of area reported have been underestimates.
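The proposal to spread the coverage over 5 years amounts to dividing the villages into five groups and taking up one group each year. A minimal sketch is given below, in which the villages are allotted to the groups at random; the random allotment and the village names are assumptions made for illustration and do not form part of the Committee's proposal as reported above.

```python
import random


def rotation_groups(villages, years=5, seed=0):
    """Split villages into `years` groups of nearly equal size, one per year."""
    shuffled = list(villages)
    random.Random(seed).shuffle(shuffled)
    # Every `years`-th village after shuffling goes to the same group.
    return [shuffled[i::years] for i in range(years)]


# Hypothetical list of 23 villages.
villages = [f"Village {n}" for n in range(1, 24)]
groups = rotation_groups(villages, years=5, seed=1)
```

Each village is reported once in five years, so that the figures of any single year rest on only about one-fifth of the villages.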
Another source of error in the estimates of area is presented by
mixed crops. Till recently there was no uniform practice of reporting
such area. The areas covered by several crops in a field were estimated
in various ways in different States and the estimates in some
cases were based on formulae prescribed by the State authority in
individual cases as it was impracticable to prescribe a general method
of calculation. These formulae and ratios were very old and they
needed revision because the composition of crops in mixed farming
undergoes changes quite often. Crop cutting experiments have to be
conducted to find out the ratios and they have to be periodically revised.
The Technical Committee had suggested that in all cases the gross
unadjusted acreage of the mixture should be recorded separately for
each major crop mixture and published in Season and Crop Reports and
Crop Forecasts side by side with the net acreage of the components as
calculated at present. Where fixed ratios were used for estimating the
components of the mixture the Committee suggested that they should
be fixed for each district and their accuracy should be tested at periodical
intervals during the crop cutting surveys. In case of minor crop mixtures
the areas should be allocated to the various components according
to eye estimates, the practice followed in Punjab and some other parts
of the country. Recently a uniform procedure has been laid down
for adoption by all States, under which, for major crop mixtures, the
gross unadjusted acreage under the mixture has to be shown side by
side with the net acreage under the components calculated according
to the present practice. In respect of minor crop mixtures the areas
are to be allocated to the various components on the basis of predetermined
formulae.
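The allocation of the gross acreage of a mixture to its components by a fixed ratio can be illustrated as follows. The mixture, its acreage and the ratio of 2 : 1 are hypothetical; the ratios actually prescribed differ from State to State and are not given in the text.

```python
def allocate_mixture(gross_acreage, ratios):
    """Split the gross acreage of a crop mixture among its components.

    `ratios` maps each component crop to its share in the fixed ratio.
    """
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("ratios must add up to a positive number")
    return {crop: gross_acreage * share / total for crop, share in ratios.items()}


# Hypothetical field of 120 acres under a wheat-gram mixture in the ratio 2 : 1.
net_acreage = allocate_mixture(120, {"wheat": 2, "gram": 1})
```

The gross unadjusted acreage of 120 acres would thus be shown side by side with net acreages of 80 acres under wheat and 40 acres under gram.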
Another factor responsible for inaccuracy and confusion in the
figures of area under various crops is the uncertainty whether the area
means the area sown or the area successfully cropped. The present rule
is that it is the area sown and not the area harvested. However if
the first sowing fails and if the area is given to another crop, then in
subsequent forecasts such area is deducted from the area in the original
crop and added to the area of the subsequent crops sown. It is suggested
that if on the first sowing there is no germination, and even if the area
has not been devoted to any other crop, such area should not be counted.
However if there is germination but the crop is extremely bad such area
must be included. The Technical Committee suggested that the normal
percentage of harvested to sown areas should be estimated through
random sample surveys. This is indeed a very good suggestion to
find out the difference between the area sown and the area actually
harvested and cropped.
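The percentage of harvested to sown area can be estimated from a sample of fields as in the sketch below. The figures of the sample are hypothetical, and the use of the ratio of the two totals is one simple way of forming the estimate; the text does not state which estimator the Technical Committee had in mind.

```python
def harvested_to_sown_percentage(sample_fields):
    """Ratio of total harvested to total sown area in the sample, per cent.

    `sample_fields` is a list of (area_sown, area_harvested) pairs.
    """
    total_sown = sum(sown for sown, _ in sample_fields)
    total_harvested = sum(harvested for _, harvested in sample_fields)
    if total_sown <= 0:
        raise ValueError("sample contains no sown area")
    return 100.0 * total_harvested / total_sown


# Hypothetical random sample of four fields (areas in acres).
sample = [(10, 9), (20, 17), (5, 5), (15, 12)]
percentage = harvested_to_sown_percentage(sample)

# Applying the percentage to a reported sown area of 1,000 acres.
estimated_harvested_area = 1000 * percentage / 100
```

In this sample 43 acres out of 50 acres sown were harvested, giving 86 per cent, so that a reported sown area of 1,000 acres corresponds to an estimated harvested area of 860 acres.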
Area sown more than once is also responsible for some confusion
about statistics of area under various crops. Area sown more than
once should be counted separately each time. This practice is
revenue staff and as such the acreage figures under different crops
cannot be accurately estimated. There is no uniform system of
measuring the area under various crops and different practices are
followed in Bengal, Bihar, Orissa and certain parts of Madras.
The police chaukidar or the village headman usually collects these
primary statistics and there are hardly any senior officers corresponding
to supervisors and kanungos of the temporarily settled areas for supervi-
sion. Till recently, therefore, the estimates of area under various
crops reported for permanently settled regions were largely in the
nature of guess work. The village headman passed on the estimates to
the sub-divisional officer who in turn sent the estimates to the district
officer who modified them in the light of his own experience. The
Director of Agriculture modified the figures of district officers on the
basis of all the figures received from various districts. It is obvious
that the system was full of defects.
Need. If correct statistics about area under various crops have
to be collected for permanently settled regions it is necessary that
they should also have a permanent staff of the type that temporarily
settled areas have. Further, it is necessary to have detailed surveys of all
such areas and maps should be prepared for various tahsils and villages.
So far as area statistics are concerned it has been recognised on all hands
that there should be complete enumeration and that sampling methods should
not be followed.
Recent changes. Certain steps have been taken in this direction
in the States of Bihar, Orissa and Bengal. In 1944-45 the Bihar
Government established a primary agency of Karamcharis who have
since been recording the statistics of area on the basis of complete
enumeration. In West Bengal also a plot to plot survey was carried out
in 1944-45 and shortly afterwards a system of collecting acreage statistics
on the basis of random sampling surveys was introduced. Ad hoc
investigators were appointed to collect agricultural statistics and the
Central Government also gave financial assistance for the scheme.
It is expected that gradually all the permanently settled regions in
the country would follow a system of complete enumeration and would
have a permanent staff for the collection of agricultural statistics as
the temporarily settled areas have.
Indian States under British rule presented another problem in
this respect. A large part of their area was also not surveyed but now
with the integration of these States reporting agencies have been set
up and in future we can hope that better and more extensive statistics
would be available with regard to these areas also.
Suggestions for the improvement of acreage statistics. In order that
statistics of area may be improved and more accurate figures may be
available, the method of complete enumeration should be extended to
permanently settled areas also. As has been pointed out above, in
certain tracts of land under the permanent settlements, area statistics are
being collected on a random sample basis for want of a primary reporting
agency. This affects the quality of these statistics. Complete enumeration
is the best method so far as area statistics are concerned and it has
received the approval of various expert committees and conferences.
In view of the fact that the basis of all economic development measures
is the village, it is necessary to have detailed statistics about this unit.
The random sampling method may give fairly reliable figures of area
under principal crops for the State as a whole, but it cannot be expected
to give satisfactory area statistics when the figures for smaller units
like district, tahsil or village are needed. Random sampling survey
method cannot be recommended for use in case of minor crops also.
It was due to these reasons that the Bengal Famine Enquiry Commission
and the Inter Departmental Committee on Official Statistics
advocated the adoption of the method of complete enumeration for
finding out area statistics. At present out of the total geographical
area of 806.3 million acres 550.7 million acres (or 68.3%) is under
complete enumeration and 23.1 million acres (or 2.9%) under sample
surveys. For 146.6 million acres (or 18.2%) there are only rough
estimates available. Thus 89.4% of the total geographical area is
said to be reporting and the remaining 85.9 million acres or 10.6%
is non-reporting.
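The percentages quoted above can be verified from the acreage figures as follows. The sketch uses only the figures given in the text; the labels of the categories are illustrative.

```python
# Area in million acres under each method of reporting, as given in the text.
area_by_method = {
    "complete enumeration": 550.7,
    "sample surveys": 23.1,
    "rough estimates": 146.6,
    "non-reporting": 85.9,
}

total_area = sum(area_by_method.values())

# Share of each category in the total geographical area, per cent.
share = {
    method: 100.0 * area / total_area
    for method, area in area_by_method.items()
}

# The reporting area is everything except the non-reporting tracts.
reporting_share = 100.0 - share["non-reporting"]
```

The four categories add up to the total geographical area of 806.3 million acres, and the shares agree with the percentages stated above to within the rounding of the first decimal place.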
However certain difficulties arise in case of lI11sllrll!yed areas where
the method of complete enumeration cannot be easily adopted. It
is, therefore, suggested that in such areas which have not been ca-
dastrally surveyed, the services of experienced patwaris and other
primary reporting agencies should be utilised wherever they exist
and they should be asked to prepare maps relating to these areas
after which the estimates of area of various fields should be obtained.
In such unsurveyed areas where reporting agencies do not exist special
methods of making such estimates can be adopted. The difficulties are
no doubt aggravated by the fact that certain areas are not easily ac-
cessible. In such areas which have not been cadastrally surveyed and
where no reporting agency exists experiments can be made to
estimate the area under crops through aerial photography. Some
attempts in this direction were made by the statistical branch of the
Indian Council of Agricultural Research, in collaboration with the
Surveyor General of India. The earlier experiments made in 1947-48
unfortunately did not meet with much success as difficulty was
experienced in distinguishing different crops from the photographs
so obtained. Coloured filters were also used for the purpose but they
also did not give satisfactory results. However recent experiments
have given much better results and they are being continued.
N. S. S. Series. Apart from the figures collected from village
records there is another set of figures collected through N. S. S. This
series was started very recently and it relates mainly to area under various
crops. N. S. S. collects these figures of area under various crops
during its regular rounds of survey. Estimates are given for the whole
country and also for certain population zones. These figures are based
The above definition gives a fairly good idea about the concept
of the word "normal" but all the same it is a hypothetical concept and
leaves much room for personal bias. It is a subjective estimate. Obviously
the concept of the word "normal" in the mind of the Indian
Government was not the same as that of the U. S. Bureau. The Government
of India's opinion about "American normal" was that it was a
"full normal condition" and further that a normal condition was below
the full normal. It was felt in certain quarters that the system of normal
yield should be replaced by a system of average yield. There are a number
of countries which follow this system. Generally average yields in these
countries are estimated by finding out the moving average of yields over
a period of 5 to 10 years. The adoption of this technique in our country
presented two problems. Firstly this system needed very efficient crop
cutting experiments and secondly it involved a change in the standard
against which condition factor was reported. It was felt by some people
that the village lekhpals and crop reporters may not understand the
implication of this change from normal to average yield and they may
continue to express the condition factor against a background of normal
yield. This risk was no doubt great but it was worth taking.
The Department of Agriculture in each State was responsible
for fixing the normal yield per acre for various crops in each district.
The estimate of normal yield was based on the system of crop cutting
experiments which were conducted according to certain rules laid down
by the State Government. According to these rules average plots of
land were selected by the officers of the Agriculture and Revenue De-
partments and on these fields the crop was sown and cropped before
them. The figures of yield thus arrived at were forwarded to the Direc-
tor of Agriculture who, after taking into account various other facts,
finally fixed the normal yield figures for various districts. Normal
yield figures were generally not changed for a period of 5 years. The
aim of the annual crop cutting experiments was to furnish tests of accuracy
of the original figures and to make it possible for the Agriculture
Department to revise these estimates if necessary after 5 years.
New Concept. Recently the concept of normal yield has been
changed and it is now defined "as the moving average of actual yields
per acre as determined on the basis of the results of crop-cutting surveys
over a period of ten years." This is a much better definition and
it will certainly remove some of the shortcomings associated with the
yield estimates in our country.
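The new definition of normal yield as a ten-year moving average can be illustrated as follows. This is a minimal sketch only; the yield figures are hypothetical and the function name is an assumption made for illustration.

```python
def moving_average(yields, window=10):
    """Return the moving averages of `yields` over `window` consecutive years.

    The first value averages years 1..window, the next years 2..window+1,
    and so on, so each new year replaces the oldest one in the average.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for end in range(window, len(yields) + 1):
        span = yields[end - window:end]
        averages.append(sum(span) / window)
    return averages


# Hypothetical actual yields per acre from crop-cutting surveys, 12 years.
yields = [10, 12, 11, 9, 13, 12, 10, 11, 14, 8, 12, 13]

# With a ten-year window, twelve years of data give three "normal" figures.
normal_yields = moving_average(yields, window=10)
print(normal_yields)
```

Because each year's figure displaces only one-tenth of the average, a single abnormal season moves the normal yield very little, which is the chief merit of this definition over a figure fixed by judgment.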
Shortcomings and suggestions. Apart from the vagueness of the term
'normal yield' and the subjectivity of the concept, the system of estimating
normal yield and conducting crop cutting experiments invited a
great deal of criticism not only from non-government people but also
from those who were responsible for these calculations. Mr. Allen,
Director of Agriculture, remarked as early as 1932 that "examining the
crop tests which I have had to do since taking over charge I have been struck
with the very flimsy and unreliable evidence on which such district
averages are based. In my opinion, at present the probable error is
is no uniformity in the area of the crop cut and it is very difficult to say
whether it would be worthwhile to bring about uniformity in this
matter.
There is also a feeling that these figures of crop cutting experiments
are not properly utilised. They are kept in safe custody for a period
of five years till the question of revision of the normal yield figure comes
up. Such figures should be carefully examined as and when they are
received and a consolidated review of all the districts and for all the
crops should be prepared each year and sent to the officers of the dis-
trict where crop cutting experiments were made. Such reviews should
point out the mistakes committed in conducting crop cutting experi-
ments. When these figures would be scrutinised annually, it may not
be necessary to revise the normal yield figures rigidly after every five
years only. Figures could be revised as and when necessary, and the
annual scrutiny will guide the time when revision is needed.
Condition or Seasonal factor
Concept and technique of estimation. The figure of normal yield as
calculated on the basis of crop cutting experiments has to be adjusted
in the light of the conditions prevailing at the time of the estimation
of crop yields. This condition or seasonal factor denoted the condition
of the crop in a particular season in relation to the normal crop. It
was usually expressed in terms of annas, a fixed number of annas representing
the normal. It was a purely subjective estimate and was not
arrived at by any type of statistical calculation. Once or more during
the growth of the crop and again at the time of harvesting, the crop
reporters estimated the yield as so many annas, taking a definite
number of annas, for example 12 or 16, as standard. The tehsildar received
such statements from various crop reporters in his jurisdiction
and he assumed some sort of average from them, using also his general
knowledge of crop condition, and he reported a single figure for the
whole tehsil to the district officer. The district officer modified this
figure according to his knowledge and either proceeded like the tehsildar
and selected a single figure for the district as a whole or applied the
anna yield to the area sown for each tehsil separately and reported an
average for the whole district.
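How an anna estimate adjusts the normal yield may be sketched as below. The sketch assumes the usual multiplicative form of the traditional method (area multiplied by normal yield multiplied by the condition factor); the figures and the function name are hypothetical.

```python
def estimated_output(area_acres, normal_yield_per_acre,
                     annas_reported, annas_normal=16):
    """Estimate output from area, normal yield and an anna condition figure.

    The condition factor is the reported annas expressed as a fraction of
    the number of annas taken to represent a normal crop (e.g. 12 or 16).
    """
    if annas_normal <= 0:
        raise ValueError("annas_normal must be positive")
    condition_factor = annas_reported / annas_normal
    return area_acres * normal_yield_per_acre * condition_factor


# Hypothetical tehsil: 1,000 acres, normal yield 8 units per acre,
# crop reported as 12 annas where 16 annas represent the normal.
output_16 = estimated_output(1000, 8, annas_reported=12, annas_normal=16)

# The same crop condition reported as 9 annas on a 12-anna standard.
output_12 = estimated_output(1000, 8, annas_reported=9, annas_normal=12)

print(output_16, output_12)
```

Both calls describe a crop at three-quarters of the normal, so they give the same output; the choice of 12 or 16 annas as standard affects only the scale on which the reporter expresses his judgment.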
Defects. This system of estimating the seasonal factor was extremely
defective. The crop reporters were generally uneducated or
little educated people and the task assigned to them was one which
required very great ability and a perfectly unbiased and balanced state
of mind. As a matter of fact it is very difficult even for trained persons
to give an accurate idea about the condition of a crop as compared
to a normal crop, which itself is a vague term, and as such it should not
be any surprise if the untrained crop reporters could not give correct
estimates.
Crop reporters were in many cases found to be biased. It was
generally believed that the official crop reporters were pessimistic by
nature and it resulted in underestimation of the crop. Probably the
reason for this pessimism was that the idea of the word "normal" in
their minds was somewhat different from the one which the Government
intended it to be. Their idea of a normal crop probably was
'a crop which they longed to see but rarely saw.' When they compared
the actual crop with this concept of the normal yield underestimation
was bound to occur.
It was also felt that the figures of annawari estimates were sometimes
vitiated on account of the fact that in temporarily settled tracts
remission in the land revenue had to be granted when the seasonal
condition fell below a certain percentage of the normal. It was not
unlikely that when the seasonal conditions were on the border line for
the grant of remission, some overzealous crop reporters pitched their
estimates too high, thinking that they would displease their superiors
by the possible loss of revenue.
Moreover, the number of reports received about the condition
factor was not adequate and there was also a tendency in the crop reporters
to favour even figures. The position was further aggravated
by the fact that it was not possible to estimate the extent of error of these
estimates. These estimates were usually guess work and it was difficult
to say whether they were correct or incorrect and what was the
extent of error contained in them.
Besides these discrepancies in the collection of these figures there
were many drawbacks in the working of these figures for final estimates.
The methods by which the tehsildar arrived at the condition figure of
the tehsil and the district officer for the district were open to objection.
No recognised system of averaging was followed and this was supposed
to be a very serious drawback of these estimates. Messrs. Bowley and
Robertson had suggested that the arithmetic average of tehsil figures
should be calculated for arriving at the district condition figure, but we
continued to follow the old practice and no change was introduced
except in some states like Madras where the arithmetic average of the
village figures was calculated to find out the estimate for the 'firka' and
the weighted arithmetic average of firka estimates to find out the condition
factor for the revenue circle. The weights were proportional to the
area under the crop in various firkas. It would have been better if the
condition figure was represented as a percentage of the normal rather
than in terms of annas.
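The Madras procedure of weighting firka figures by area can be sketched as follows. The firka figures are hypothetical, and the 16-anna standard used for the conversion to a percentage is an assumption for the sake of the example.

```python
def weighted_condition(figures):
    """Weighted arithmetic average of condition figures.

    `figures` is a list of (condition_in_annas, area_under_crop) pairs;
    the weights are the areas under the crop.
    """
    total_area = sum(area for _, area in figures)
    if total_area == 0:
        raise ValueError("total area under the crop must be positive")
    return sum(condition * area for condition, area in figures) / total_area


# Hypothetical firka estimates: (condition in annas, acres under the crop).
firkas = [(12, 500), (8, 300), (10, 200)]

# Simple average treats every firka alike, whatever its acreage.
simple_average = sum(condition for condition, _ in firkas) / len(firkas)

# Weighted average gives the large firka its due influence.
weighted_average = weighted_condition(firkas)

# The same figure expressed as a percentage of a 16-anna normal.
percent_of_normal = 100 * weighted_average / 16

print(simple_average, weighted_average, percent_of_normal)
```

In this example the firka with the largest area also has the best crop, so the weighted figure is higher than the simple average; where no recognised system of averaging is followed, a difference of this kind passes unnoticed.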
Recently the Central Government has standardised the procedure
for arriving at the condition factor and recommended to the states to
adopt the new pattern.
The above discussion clearly shows that the yield statistics as
calculated by the traditional method of finding out the normal yield and
the condition factor were full of defects. Both the normal yield and
the condition factor could not be correctly estimated and the yield
estimates were extremely faulty. At present the final estimates of crop
production even in the official series are made on the basis of random
sample surveys. Yet it is necessary to obtain accurate data about the
normal yield and the condition factor because the earlier crop estimates
are still based on them. It is gratifying to note that the normal yield is now
computed as a moving average of the actual average yields per acre obtained by
Crop Cutting Surveys but these surveys should be conducted on random sampling
basis during the preceding years and the condition factor for each district should
be computed as the weighted average of tehsil figures, weights being proportionate
to area under the crop in various tehsils. Certain improvements have been
made in some States but the general condition cannot be said to be
satisfactory as yet.
1. Official figures-(B) Random sampling method
Due to the various defects pointed out above it was realised that
the system of crop estimation in our country needed a change. It was
also thought that crop estimation should be done on the basis of random
sampling surveys rather than on the basis of concepts like normal yield
and anna condition which were subjective in nature and always contained
an element of uncertainty and guess work. It is a matter of consider-
able satisfaction that the scientific method of random sampling is fast
replacing the traditional system so full of defects as we have just seen.
This system makes superfluous the determination of normal yield and
condition factor and gives the estimates of yield directly. Besides
being more straightforward and scientific, random sampling is relatively
less exposed to personal bias. Indirectly it takes into account
the effects of such factors as rainfall, soil, irrigation, methods of culti-
vation, etc.
Technique. "The technique of random sampling consists, in
principle, of choosing a sample of elements out of a given totality of
elements comprising the population, in a manner as to offer each element
of the totality an equal chance of inclusion in the sample. The technique
not only ensures that the sample is representative of the population
but also provides the means of knowing how far one is likely to be in
error in estimating any characteristic of the population on the basis
of this sample. The advantages of such a method in yield estimation
are that we are able to obtain an unbiased estimate of the average yield
per acre and can determine in addition the margin of error by which
the estimated average yield is likely to depart from the true unknown
value of the yield for the tract surveyed."*
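The two properties claimed for random sampling, an unbiased estimate of average yield and a measurable margin of error, can be illustrated with a simple random sample drawn without replacement. This is a sketch under stated assumptions: the population of plot yields is simulated, the function name is hypothetical, and the standard error uses the ordinary finite population correction, which the quotation itself does not spell out.

```python
import math
import random
import statistics


def srs_estimate(population, n, seed=0):
    """Estimate the population mean from a simple random sample of size n.

    Every element has an equal chance of inclusion. Returns the sample
    mean and its estimated standard error (with finite population
    correction, since sampling is without replacement).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)  # sample variance, divisor n - 1
    fpc = 1 - n / len(population)
    standard_error = math.sqrt(fpc * variance / n)
    return mean, standard_error


# Simulated yields per acre for 500 plots in a tract (hypothetical data).
generator = random.Random(42)
population = [generator.gauss(10, 2) for _ in range(500)]

mean, standard_error = srs_estimate(population, n=50)
print(f"estimated average yield: {mean:.2f}")
print(f"margin of error (about two standard errors): {2 * standard_error:.2f}")
```

When the sample is enlarged to cover every plot the correction factor becomes zero and so does the standard error, which is only another way of saying that complete enumeration has no sampling error.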
The random sampling method for the estimation of yield statistics
is the most scientific method. The first random sampling scheme on
scientific lines as recommended by the Indian Council of Agricultural
Research was carried out by the Indian Central Cotton Committee in
1942 in Akola district of Central Provinces. In 1943-44 it was extended
to Buldana district also. In the year 1944-45 it covered the whole
cotton belt of Central Provinces and the estimates of the surveys were
found to be about 10% higher than the official estimates. The first
* "Statistics of area and yield of crop in India" by Dr. P. V. Sukhatme,
Agricultural Situation in India, March, 1949.
the sample. Another point of difference is the shape and size of the
field selected. The Statistical Institute selects, at random, plots of 1000
square feet to form the sample plots. The I. C. A. R. selects rectangular
plots which are larger in size than those selected by the Institute.
Again while the Institute appoints special staff of field investigators
for conducting these surveys the I. C. A. R. surveys are conducted
through the existing agencies of the Revenue and Agriculture Depart-
ment.
As has been said earlier, even though the final estimates of crop
yields are now based on the results of crop cutting experiments by the
random sampling methods, it is necessary to obtain accurate data
regarding the condition factor and the normal yield for earlier crop
forecasts. As such it is necessary to effect the improvements that
have been suggested in the foregoing pages with regard to the estimation
of the normal yield and the annawari estimate. The Government
is conscious of this fact and efforts are being made to effect improvements.
Defects. Though theoretically speaking the results based on the
new technique of crop cutting surveys conducted on random sampling
basis should be better than those under the traditional method in which
subjectivity was present in ample measure, yet it cannot be asserted
with any degree of confidence that it is so. These surveys are not conducted
in a satisfactory manner and the supervision from outside agencies leaves
much to be desired. The supervising agencies are generally overburdened
with their own work which is of a varied type and they do not attach
much importance to this assignment. Probably they do not realise
the great importance which should be associated with this work. It
has been estimated that effective supervision over these State crop
cutting surveys is not more than 2% of the total number. This amount
of supervision is very inadequate and it needs strengthening. Yet
another shortcoming of these estimates is that the surveys are not conducted
on a uniform pattern in all States. Some States like U. P. follow the
pattern of I. C. A. R. surveys while others like West Bengal follow the
I. S. I. pattern. It has been pointed out that these two techniques considerably
differ from each other. The sampling unit in I. C. A. R. surveys
is the village whereas in I. S. I. surveys the village is not the
unit. The shape and the size of the plot also differ in these two types
of surveys.
In order to have dependable statistics relating to agricultural out-
put it is necessary that these drawbacks are removed and the scope and
coverage of the official series is enlarged so as to include minor crops
and fruits and vegetables etc. about which no statistics are available
at present.
2. National Sample Survey series
The N. S. S. also collects data relating to yield of major cereal
crops during their regular rounds. These figures are collected on the
basis of sample surveys and are obtained through their general purpose
to submit the various figures within the time limit prescribed. A cer-
tain amount of delay in the final crop estimate is also due to the
fact that in the random sampling method the estimate is made fairly
late when the crop is almost ready for harvesting. If crop estimation
under this method was done a little earlier probably this reason
for delay could also be avoided.
Another defect of our crop estimates is that they contain infor-
mation which is generally a month old whereas in other countries
crop forecasts generally relate to information which is only one week
old. If crop forecasts are released early, this period could be reduced
in our country as well.
The crop estimates do not receive due publicity in our country.
They are no doubt published in daily papers and magazines and also
broadcast by All India Radio, yet all this is not enough. The services
of the Information Department of the Government should be
utilised further to have better publicity.
At present the number of estimates is too large in respect of
some crops and too small for others. Two good crop estimates are
better than 4 or 5 unsatisfactory ones and it would be better if efforts
are made to have only two or at the most three forecasts for each
crop and these forecasts are of a more satisfactory quality than the
present ones.
It would also be better if the form on which the crop estimates
are submitted by the primary reporting agencies is standardised for
the country as a whole, and information is called about the area under
irrigated crops and probable harvest prices also.
In order to judge the standard of accuracy achieved by the crop
estimates attempts should be made to estimate the output by other
methods also. In case of commodities like cotton and jute an estimate
could be obtained by finding out consumption of these goods by Indian
mills, figures of exports and imports and of stock at the beginning and
end of the season. Similar methods could be used for other crops also.
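The cross-check suggested above for cotton and jute amounts to a simple balance of supply and disposal. The sketch below assumes that all domestic use passes through the mills, which is a simplification; all figures and names are hypothetical.

```python
def output_from_disposals(mill_consumption, exports, imports,
                          opening_stock, closing_stock):
    """Estimate the season's output from the disposal of the crop.

    Output plus imports plus opening stock must equal mill consumption
    plus exports plus closing stock, so output is found as the balance.
    """
    return (mill_consumption + exports - imports
            + (closing_stock - opening_stock))


# Hypothetical cotton season, all figures in lakh bales.
independent_estimate = output_from_disposals(
    mill_consumption=40,
    exports=5,
    imports=8,
    opening_stock=12,
    closing_stock=10,
)

# Hypothetical official crop estimate for the same season.
official_estimate = 33

# Percentage by which the official figure departs from the balance figure.
discrepancy_percent = (100 * (official_estimate - independent_estimate)
                       / independent_estimate)

print(independent_estimate, round(discrepancy_percent, 1))
```

A persistent discrepancy in one direction over several seasons would suggest a bias in the crop estimates, although consumption outside the mills would have to be allowed for before drawing that conclusion.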
In countries like United Kingdom and U. S. A. objectivity to
crop estimates is imparted by correlating them with weather conditions.
They have found by experience that there is a definite relationship
between crop yield and weather conditions and that crop forecasts based
on weather conditions combined with a study of soil fertility, manuring,
etc. are more accurate than subjective forecasts. We could also make
experiments in these directions.
One more suggestion can be given to impart objectivity to these
estimates which are at present subjective in nature. In some countries
like Japan, suitable systems of physical measurements during the
various stages of crop growth have been evolved and they are supposed
to give a dependable idea about the degree of accuracy of crop forecasting.
The importance of pre-harvest estimates is considerable in the
context of price policies and food administration and such a step, if
taken in our country, will be fully justified.
this that the data are consolidated at the State Headquarters. After
this the States send their figures to the Directorate of Economics and
Statistics under the Ministry of Food and Agriculture at the centre for
consolidation and publication on an all-India basis. If there is delay at
any of these stages in any part of the country the all-India statistics are
held up. India is a big country and in various areas the times of sowing
of various crops are different and even if the various agencies
associated with the collection of these statistics are efficient and punctual,
all-India statistics cannot be published soon. It is necessary that
strict punctuality must be observed at every stage and attempts should
be made to publish the statistics within the time limit set because if
statistics are published late much of their utility is lost and then they
have only an historical importance.
In conclusion it may be said that this unsatisfactory state of affairs
is partly due to the fact that the collection of agricultural statistics was
in the past treated only as incidental to the collection of land revenue.
The preparation of crop forecasts was taken up later on, at the insistence
of persons interested in trade. Actually it was at the insistence
of the cotton merchants of Manchester and Lancashire that crop forecasts
began to be issued in this country. It was never realised until
recently that every development plan has to be based on accurate statistical
information. The collection and compilation of statistics has,
therefore, remained more or less either a by-product of official activity
or a luxury which could be enjoyed in relatively easy times and
skipped over in times of difficulty. This is the reason why statistics
collected in our country are usually available in patches and their compilation
and processing have been haphazard. Even at present, as has
been said already, a good deal of data runs to waste for want of proper
processing. A lot of additional information can be derived without
any extra cost if there is proper planning and interpretation of statistics.
With all sorts of economic plans and schemes afoot, it can be reason-
ably expected that these defects shall receive proper attention at the
hands of the government and the agricultural statistics of the country
would soon become more accurate and dependable.
tion and the weights were proportional to the total value of each commodity
during the base period. To find out the average value of
products the average harvest price during the base period was taken
into account and where no price was available it was estimated from
wholesale prices. The general index was obtained by combining the
index numbers relating to the food grains group and non-food grains
group. Weights assigned to these groups were respectively 2 and
1. Later, the base period of the indices was changed to the triennium
1937-38 to 1939-40.
The revised index was available in the year 1950-51 with the base
year 1949-50. This base year was also provisional and after 1953-54
it was finalised and the final base now is the year ending June 1950.
These indices are published in the Annual Report on Currency and
Finance issued by the Reserve Bank of India and Agricultural Situation
in India. In the revised index there are a number of groups and sub-
groups. The first major group is the food-grain group which is
divided in two sub-groups namely (1) cereals and (2) pulses. Cereals
are further sub-divided into five smaller groups of rice, wheat, jwar,
bajra and maize. The second major group is the non-food grains
group and this group is sub-divided into four sub-groups namely (i)
oilseeds (ii) fibres (iii) plantation crops and (iv) miscellaneous. Each
of these four groups is further sub-divided into a number of smaller
groups. Indices are available separately for each item, each sub-group,
each major group and for all commodities.
There are in all 26 commodities (for which regular crop forecasts
are issued) included in the series. The values of different items
of production during 1949-50 have been taken as weights and the index
number is computed as a weighted arithmetic average of production
relatives for individual crops. In working out the production relatives
the chain base method is used to allow for changes in coverage as
well as technique of estimation.
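The construction described above can be sketched in two steps: link relatives (each year as a percentage of the previous year, computed on comparable coverage) are chained back to the base, and the chained relatives are then combined with value weights. The crops, weights and link relatives below are hypothetical, and the function names are assumptions made for illustration.

```python
def chain_relatives(links):
    """Chain link relatives into production relatives with base = 100.

    `links` holds each year's production as a percentage of the previous
    year's, so the result has one more entry than `links` (the base year).
    """
    relatives = [100.0]
    for link in links:
        relatives.append(relatives[-1] * link / 100.0)
    return relatives


def weighted_index(relatives, weights):
    """Weighted arithmetic average of production relatives for one year."""
    total_weight = sum(weights.values())
    return sum(relatives[crop] * weights[crop] for crop in weights) / total_weight


# Hypothetical value weights from the base year.
weights = {"rice": 70, "wheat": 30}

# Hypothetical link relatives for two years after the base year.
links = {"rice": [105, 100], "wheat": [110, 95]}

chained = {crop: chain_relatives(values) for crop, values in links.items()}

# Index for the second year after the base year.
year_two = {crop: series[2] for crop, series in chained.items()}
index_year_two = weighted_index(year_two, weights)

print(chained, index_year_two)
```

Since each link compares two consecutive years only, a change in coverage or in the technique of estimation needs to be allowed for in a single link and does not disturb the rest of the series.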
(ii) Reserve Bank of India index of Agricultural Production. This
index number was published in the Reserve Bank of India Bulletin.
The indices were available from 1939-40 onwards. Up to 1946-47,
before the partition of the country, the base period of the index
number was the average of the triennium 1936-37 to 1938-39. After
the partition of the country the index number of agricultural production
issued by the Reserve Bank of India related to the Indian Union
only and up to the year 1948-49 the base year of the index number
was the same as previously. From 1949-50 onwards the base year
of the index number had been the year 1948-49. The index number
was based on 17 commodities distributed over five major groups.
The index number was weighted and the weights were the values
of crop production. The various items and their weights were
as follows:
                              Weight
 1.  Rice                      38
 2.  Jowar and bajra           12
 3.  Maize                      2
 4.  Ragi                       2
 5.  Wheat                     14
 6.  Barley                     4
 7.  Gram                       7
 8.  Sesamum                    1
 9.  Groundnut                  7
10.  Rape and mustard           2
11.  Linseed                    1
12.  Castor seed                0.3
13.  Cotton                     3
14.  Jute                       2
15.  Tea                        4
16.  Coffee                     0.4
17.  Rubber                     0.1
The above mentioned commodities were divided in five groups
namely, foodgrains, beverages, oilseeds, fibres and others. It would be observed
that in this index number food-grains received a total weightage
of 79 out of 100 and amongst the foodgrains rice received a very high
weightage of 38. In the Ministry of Food and Agriculture index food
grains received a weightage of 66.9 only and rice only 35.3. Similarly
wheat receives a weightage of 14 in this index whereas in the other
index its weightage is only 8.5. There are differences in the weightage
of other items also.
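The group weightages referred to above can be verified by adding the item weights. In the sketch below the allocation of items to the five groups is an assumption, since the text names the groups without listing their members, and the weight of maize is taken as 2, the value implied by the stated food-grain total of 79.

```python
# Item weights of the Reserve Bank of India index as given in the table.
# The grouping of items is assumed; the maize weight of 2 is inferred
# from the stated food-grain total of 79.
groups = {
    "foodgrains": {
        "rice": 38, "jowar and bajra": 12, "maize": 2, "ragi": 2,
        "wheat": 14, "barley": 4, "gram": 7,
    },
    "oilseeds": {
        "sesamum": 1, "groundnut": 7, "rape and mustard": 2,
        "linseed": 1, "castor seed": 0.3,
    },
    "fibres": {"cotton": 3, "jute": 2},
    "beverages": {"tea": 4, "coffee": 0.4},
    "others": {"rubber": 0.1},
}

# Total weight of each group and of the whole index.
group_totals = {name: sum(items.values()) for name, items in groups.items()}
grand_total = sum(group_totals.values())

for name, total in group_totals.items():
    print(f"{name}: {total:g}")
print(f"all commodities: {grand_total:.1f}")
```

The item weights add up to slightly less than 100, which suggests that the published figures were rounded.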
(iii) Eastern Economist index of Agricultural Production. This index
is based on the average prices of 1936-37 to 1938-39 and includes
14 items spread over four major groups. This series is available from
1939-40 and was first published in the special budget number of
Eastern Economist of 1952-53.
The major groups and the items within them are as follows:
A. Food grains: (i) Rice, (ii) Wheat, (iii) Millets, (iv) Gram.
B. Fibres: (i) Cotton, (ii) Jute.
C. Oilseeds: (i) Sesamum, (ii) Groundnuts, (iii) Rape
and Mustard, (iv) Linseed.
D. Miscellaneous: (i) Sugarcane, (ii) Tobacco, (iii) Tea,
(iv) Coffee.
This index number is also a weighted index and the weights
are also proportional to the values of the commodities during the base
period.
(iv) F. A. O. index. The Food and Agriculture Organisation of
the United Nations Organisation also publishes a series of index
numbers of agricultural production relating to many countries of the
world including India. The base of these indices is the average
of the years 1934-38. A large number of commodities have been
included in these indices and they are divided in eleven groups.
This index number is also weighted and the system of weighting
is a very complicated one. The commodity weights are world prices.
Each commodity price is calculated first in terms of gold francs
per metric ton. In 1934-38 the price of a metric ton of wheat
was 100 gold francs and on this basis the prices of all other commodities
are converted to wheat relative prices on the basis of which
weights are assigned. This system of weighting is a very complicated
one and is under scrutiny at present. The Food and Agriculture
Organisation of the United Nations Organisation is interested
in international comparisons of the indices of agricultural production
and that is why world prices are taken into account for assigning weights
to different items.
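The idea of wheat relative prices can be sketched as below. Only the wheat price of 100 gold francs per metric ton comes from the text; the other prices and all quantities are hypothetical, and the way the weights are applied (a base-weighted aggregate of quantities) is an assumption, since the text does not describe the aggregation itself.

```python
# Price of wheat in 1934-38, gold francs per metric ton (from the text).
WHEAT_PRICE = 100.0


def wheat_relative_weight(price):
    """Express a commodity price relative to the price of wheat."""
    return price / WHEAT_PRICE


def aggregate(quantities, prices):
    """Sum of quantities, each weighted by its wheat relative price."""
    return sum(quantities[item] * wheat_relative_weight(prices[item])
               for item in quantities)


def production_index(current, base, prices):
    """Current aggregate as a percentage of the base period aggregate."""
    return 100.0 * aggregate(current, prices) / aggregate(base, prices)


# Hypothetical world prices in gold francs per metric ton.
prices = {"wheat": 100.0, "rice": 150.0, "cotton": 900.0}

# Hypothetical production in million metric tons.
base_output = {"wheat": 10, "rice": 20, "cotton": 1}
current_output = {"wheat": 11, "rice": 22, "cotton": 1}

index = production_index(current_output, base_output, prices)
print(wheat_relative_weight(prices["rice"]), round(index, 1))
```

Because the same world prices are applied to every country, the resulting indices are comparable internationally, though they need not agree with national indices built on domestic values.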
Live Stock and Poultry Statistics
In a country which is predominantly agricultural and in which
more than 80 percent of the population lives in villages and is connected
with rural economy, the necessity of a knowledge of cattle wealth
cannot be overemphasised. In India the statistics of livestock were
first collected at the instance of the Secretary of State for India and it
was in the year 1883 that the Statistical Conference prescribed a form
on which the details of cattle census were to be filled. Since then
figures of livestock began to be published quinquennially in Agricultural
Statistics of India. In temporarily settled areas it was the
village lekhpal who was entrusted with the task of reporting the
number of cattle in his area. The figures submitted by the patwari
were generally not reliable. In permanently settled areas the condition
was still worse. It was in the year 1916 that the Government
of India decided to improve the situation and to have a cattle census
for the whole of the country. A cattle census was held in the year
1920 and since then it is being held every fifth year. The last two
censuses which were due in the years 1950 and 1955 were respectively
held in the years 1951 and 1956. The current census was
conducted in the year 1961. Information relating to the 1951 and 1956
censuses is available in the publication entitled Indian Live Stock
Census. Besides containing data about population of live stock and
poultry this publication also contains information about the agricultural
implements and machineries of different types including tractors
in various States.
Shortcomings. These statistics are not supposed to be very satisfactory
though it should be admitted that there has been a gradual improvement
both as regards coverage as well as the quality of data. The
1930 Live Stock Census had a coverage of SO percent of the area, the
1945 census, 92 per cent of the area and 1951 census 94.4 percent
of the area. In 1956 the coverage was still better.
Live stock census prior to 1951 related to undivided India and
in 1950 the Ministry of Food and Agriculture brought out a brochure
"Live Stock Statistics" and re-estimated the data for 1940 and 1945
on the basis of the present boundary of the Indian Union. In 1951
census the data relate to all States except Orissa and some parts of
Rajasthan and Manipur. The statistics given in these publications
are not fully comparable from census to census as information is
sometimes not available for certain areas in various States. Moreover
formerly the number of princely Indian States taking part in the
enumeration was not the same at the time of every census and whatever
figures are available in previous reports relating to former Indian States,
cannot be relied upon because there was no proper agency for the
collection of data there. Moreover this census was not held at the
same time in all States. However, the last two censuses were held
almost at the same time in all States.
Another drawback in the present live stock statistics is that the
classification of cattle is not uniform for the whole country. Moreover
the definitions of the words bull, bullocks, breeding bulls and other
bulls etc. are not uniformly followed in all States and as such figures
are not strictly comparable.
Recent improvements. In 1956 certain improvements were made
and, for the first time, the census was conducted on a household basis
on a uniform pro forma. A sample verification was also done by the
Directorate of National Sample Surveys in June-July 1956. The Indian
Council of Agricultural Research had also done sample verification
in certain areas after the census of 1950. However the quality of
these statistics is still very poor. Now it has been decided to conduct
each alternate livestock census in the same year in which human census
is done. This will improve the quality of these statistics to some extent.
Livestock Products
Formerly statistics relating to live stock products were available
in Indian Agricultural Statistics. These statistics were extremely defective,
not only because their coverage was incomplete but also because their quality
was very poor. The methods adopted to estimate the output of live
stock products were extremely defective and generally speaking the
data were only of nominal importance. In 1951, however, these
statistics were published in Indian Livestock Statistics which contained
figures for 1947-48, 1948-49 and 1949-50.
Livestock products can be divided into three main categories:
(i) Edible livestock products (primary). Milk, eggs, meat
and poultry.
(ii) Edible livestock products (secondary). Ghee, butter,
curd, cream etc.
INDUSTRIAL SECTOR
Need. In modern times the economic structure of almost all important
countries of the world is increasingly dominated by large scale
industries, and economic development is measured more by industrial
development than by anything else. As such the importance of statistics
relating to the industrial sector of various countries is continuously
increasing. Since proper industrial development is not possible in the absence of
reliable and adequate data, statistics relating to the size of industrial
units, their capital structure, employment provided by them, impact
of rationalisation and productivity movements and various types of
input-output ratios have assumed great significance in modern times,
when there is almost a mad race between various countries for achieving
supremacy in the industrial field. Economically advanced nations
of the world like U. S. A., U. K., Germany and U. S. S. R. now collect
comprehensive statistics relating even to the minutest problem
associated with development and growth of industry.
The availability of statistical data relating to industries in our
country has always been very poor. As stated earlier, the attitude
of the British Government towards industrial development was never
sympathetic and the question of having any efficient system of collection
of industrial statistics never arose. In modern times the importance
of industrial statistics has considerably increased and in our country also
keen interest has been shown by the Government in matters relating
to industrial development and the question of having adequate and
dependable industrial statistics has also received the attention it deserves.
The passing of the Industrial Statistics Act, 1942, the Census of Manufacturing
Industries Rules in 1945, the Collection of Statistics Act in
1953 and the Collection of Statistics (Central) Rules in 1959 are important
steps that have been taken in recent years to improve the industrial
statistics of our country.
Data collected in other countries
Before actually examining the industrial statistics available in our
country during the British period and after independence it would
be better to have an idea about the nature of industrial statistics collected
in other countries of the world. Broadly speaking the available
industrial statistics in other countries can be classified under the
following heads :
(i) Capital structure. Under this heading statistics relating both
to fixed and to working capital are collected. Not only figures
relating to authorised, issued and paid-up capital are available but
details of investments in land, buildings, plant and machinery, furniture
and other fixed assets are also noted down. Figures of expenditure
on extensions and replacements of these assets as also the amount of
depreciation and repairs during a year are regularly collected. Working
capital figures are available on the basis of their distribution over
items like raw materials, fuels, finished and semi-finished products,
cash in hand and at bank etc. Foreign capital investments and figures
relating to the manner in which internal capital is raised (by issue of
shares, debentures, reinvestment of profits and loans) are also available.
(ii) Employment. Statistics are collected about the number of
persons employed and wages and salaries paid to them. Employees
are classified in a number of ways, usually by the nature of work (skilled,
unskilled, technical, non-technical, clerical, supervisory and
administrative etc.). Figures of wages and other emoluments paid are
collected in detail. Figures of total man-hours worked, average
employment per working day, and average wages and salaries are also
estimated. Figures of industrial disputes, absenteeism, labour turnover
etc. are usually available as a result of the working of various labour
legislations. Social security statistics generally emerge from the working
of social security measures.
(iii) Inputs. Very detailed statistics of different industrial inputs
are collected in other countries. Input and output analysis has assumed
great significance in recent years and such statistics are now collected
in great detail. Figures relating to both quantity and value of
each industrial input are obtained. Industrial inputs may be either items
of raw materials, chemicals, packing material and consumable stores
or of fuels, lubricants and electricity consumed etc. Apart from these
two major categories of inputs, (i) materials and (ii) fuel and power, there are
other inputs also, like various expenses on inward transport, printing,
advertising, warehousing, purchase agency service, local rates and taxes
etc., about which detailed statistics for every unit are collected. These
figures are of very great help in estimating the net output or value added
by a group of units and also in setting up certain techno-economic
ratios which are essential for correct analysis and interpretation of facts
and figures.
(iv) Outputs. Just as statistics of inputs are collected in detail,
statistics of output are also obtained comprehensively. Figures relating
to quantity and value of the main product, by-products and subsidiary
products manufactured in a year are obtained. These figures give an
idea about the gross output from which the total inputs have to be
deducted to arrive at the figure of net output.
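The relation between gross output, inputs and net output stated above can be illustrated with a minimal sketch. The function name and all the figures below are hypothetical and are used only to show the arithmetic; they are not taken from any census return.

```python
def net_output(gross_output, inputs):
    """Net output (value added) = gross output minus the total of all inputs.

    `inputs` maps each input head to its value for the year.
    """
    return gross_output - sum(inputs.values())


# Hypothetical unit: all values in the same money unit.
inputs = {
    "materials": 600.0,        # raw materials, chemicals, packing, stores
    "fuel_and_power": 150.0,   # fuels, lubricants, electricity
    "other_expenses": 50.0,    # inward transport, printing, rates and taxes etc.
}
value_added = net_output(1000.0, inputs)
print(value_added)
```

Keeping the input heads separate, rather than passing one total, mirrors the way the text groups inputs into materials, fuel and power, and other expenses.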
(v) Other data. Apart from figures of capital structure, industrial
employment, input and output, figures are also collected about the
potential expansion and maximum capacity of the units from various
points of view, like output, employment etc. These figures are very
helpful for the purpose of economic planning. Figures of stock and
supply of various goods and the utilisation of selected articles (generally
scarce) are also obtained. These statistics are of immense help in input
and output studies and in locating the point where economy is possible
and rationalisation schemes can be put through.
The points discussed above indicate only the broad headings
under which information is collected in other countries of the world
and should be available in our country also. In actual practice each of
these major classes is sub-divided into a large number of smaller classes
depending on the nature of the data required. These statistics are
usually collected either by sending schedules through post to the
industrial units and asking them to fill them up and to return them
to the relevant statistical authority or by deputing factory inspectors
who go from factory to factory and collect the information needed.
We shall now discuss the data that are available in our country
and shall examine their drawbacks and suggest ways of improvement.
We shall not study labour statistics in this section but shall discuss
them later on in connection with statistics of wages. Industrial statistics
available in our country prior to independence were extremely meagre
and whatever improvements have been made are of a comparatively
recent origin. As such we shall examine the available statistics in two
parts namely, one relating to the pre-independence period and the
other relating to post-independence period.
Pre-Independence period
Statistical data available in India about large scale industries before
the year 1947 can be studied under three headings, namely:
1. General statistics.
(a) Number of factories.
(b) Labourers employed and wages paid.
(c) Capital invested.
2. Statistics of output and costs.
3. Statistics of power consumed.
General statistics. So far as general statistics were concerned data
were available about the number of factories, the number of persons
employed by them and the amount of capital invested in them. These
figures were available in :
(1) Large Industrial Establishments in India, which was issued by
the Department of Commercial Intelligence and Statistics at that time.
Now this publication is issued by the Labour Bureau, Ministry of Labour.
(2) Statistical Abstracts of (British) India,
(3) Statistics of Factories, and
(4) Report on the Working of Joint Stock Companies.
For the purpose of statistics factories were divided into ten major
groups, namely: (i) Textiles, (ii) Engineering, (iii) Mineral and Metal,
(iv) Food, Drink and Tobacco, (v) Chemical dyes etc., (vi) Paper and
Printing, (vii) Processes relating to wood, stone and glass, (viii) Processes
connected with skins and hides, (ix) Gins and Presses, (x) Miscellaneous.
Each of these ten groups was further sub-divided into a number
of smaller groups and the number of factories in each of the major
and minor groups was given both districtwise and provincewise.
Separate sets of tables for seasonal and perennial factories were given.
A seasonal factory was taken to mean a factory which did not work
for more than 180 days in a year. These figures were compiled from
the returns of the Provincial Factories Departments. Figures relating to
Indian States were specially collected. The average number of persons
employed daily was calculated by dividing the total attendance on all
working days in the factory by the number of working days. These
figures were published in the Large Industrial Establishments as well
as in the Statistical Abstract. The figures in the Abstract related
to those factories only to which the Factories Act of 1934 applied.
These figures were not fully comparable with those published in the
Large Industrial Establishments because the Factories Act applied in
some cases to those factories also which employed less than 20 persons
and such factories were ignored in Large Industrial Establishments.
Figures were available provincewise and the provincial figures were
classified under three headings, namely adults, adolescents and children.
Separate figures were available for males and females for the first two
groups. These publications also contained some information about the
capital invested in various factories. Separate figures were available
for authorised and paid-up capital and debentures. However no
separate figures were available for fixed and working capital and the
amount of money spent on land, buildings, plant and machinery and
other fixed assets was not known.
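The two working rules mentioned in this paragraph, the average number of persons employed daily and the 180-day test for a seasonal factory, can be sketched as follows. This is only an illustration: the function names and the attendance figures are hypothetical, and the treatment of a factory working exactly 180 days as seasonal is our reading of "did not work for more than 180 days".

```python
def average_daily_employment(daily_attendance):
    """Total attendance on all working days divided by the number of working days."""
    working_days = len(daily_attendance)
    if working_days == 0:
        raise ValueError("the factory must have worked on at least one day")
    return sum(daily_attendance) / working_days


def is_seasonal(days_worked_in_year, limit=180):
    """A factory is seasonal if it did not work for more than `limit` days in the year."""
    return days_worked_in_year <= limit


# Hypothetical attendance on three working days.
attendance = [100, 120, 110]
print(average_daily_employment(attendance))
print(is_seasonal(150), is_seasonal(300))
```

Note that the divisor is the number of days on which the factory actually worked, not the number of days in the year, so a seasonal factory is not made to look smaller than it is during its working season.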
Output and Cost Statistics. So far as statistics of output and cost
were concerned, there were hardly any statistics worth the name available
in our country prior to the year 1946. Statistics of inputs were
particularly conspicuous by their absence and even statistics of output
were extremely faulty and inadequate. There was no legal binding
on any industrial unit to supply information about output and cost
and whatever information was available during this period was collected
on a voluntary basis. As such figures were neither complete nor
comparable. The situation with regard to the cotton mill industry was
somewhat better due to the passing of the Cotton Industry (Statistics)
Act in the year 1926. According to this Act a cotton mill was under
legal obligation to supply statistical information which was published
in Monthly Statistics of Cotton Spinning and Weaving in Indian Mills. Under
this Act figures were collected about particulars of all cotton goods
manufactured, description and weight of all yarn spun, amount of
cotton pressed and consumption of Indian cotton in Indian mills.
Figures of production of some industries were available in another
publication named Monthly Statistics of the Production of Certain Selected
Industries in India. This publication contained information about the
production of jute manufactures, paper, iron and steel, petrol and kerosene
oil, cement, paints and heavy chemicals and wheat flour mills
in India. In all these cases figures were supplied voluntarily by the
factories. These figures were not comparable month after month
because the number of factories supplying information was not the
same each time. Besides these statistics information was also available
about the production of sugar and match industries. The production
figures of sugar and match-boxes were based on the reports of the excise
authorities under the Sugar and Match (Excise duty) Act 1934. Certain
statistics were also available about the production of distilleries and
breweries.
Statistics of Power. So far as statistics of power consumed were
concerned Monthly Survey of Business Conditions in India (merged
with Indian Trade Journal since 1951) used to give monthly statistics
of the electric power generated and consumed in India. Upto October
1942 information was given in a detailed form and the figures of consumption
were given under seven heads, namely, Domestic, Commercial,
Industrial, Tramways, Electric Railways, Street lighting and
Miscellaneous. Since November 1942 only total figures of the energy
generated and total units sold for consumption began to be given.
Upto the year 1943 these statistics were supplied by the Economic Adviser's
Office but since January 1944 statistics relating to the electric power
generated and consumed began to be supplied by the Electrical Commissioner
to the Government of India.
The above survey of industrial statistics available in India prior
to independence clearly indicates that the situation was highly unsatisfactory.
The data available were extremely inadequate and their
quality was very poor. Production figures were not comparable from year
to year as the number of units supplying the data was not uniform.
Statistics relating to inputs were very inadequate and figures relating
to the quantity and value of raw materials used in production, added
value of manufacture, value of fuels and power consumed, number of
engines, horse power and kind of power used, value of land, buildings,
machinery etc. and figures relating to various items of cost were
almost non-existent.
Post-independence period
After the independence of the country the Indian Government
realised that unless the country was industrially developed economic
development was not possible. Particularly when the government
decided to proceed with the 5-year plans it became absolutely essential
to collect detailed statistics about the various problems associated with
industries. It is not surprising, therefore, to find that the condition
of industrial statistics of our country has considerably improved in
recent years. We shall discuss below some of the important achievements
of the government in the field of industrial statistics in recent
years.
(1) Annual Census of Manufactures
The Government was conscious of the fact that industrial statistics
in India were inadequate and that their necessity was acute from more
than one point of view. The need for legislation for the collection
of such statistics was a long felt one and it was in 1942 that the Industrial
Statistics Act was passed. The Act covered the then British India and
provincial governments were free to frame their own rules and to decide
the dates on which their rules were to come into force. Though
the Act was passed in 1942 it was only late in 1945 that the Directorate
of Industrial Statistics was set up at the Centre to enforce it. The first
stage in implementing the Act was the notification by the State Governments
of the rules for conducting an industrial census. Census
of Manufacturing Industries Rules were uniformly adopted by all the
States in the year 1946 and in the same year the first Census of Manufactures
was conducted. Annual censuses have been conducted since
then and the results published in the Census of Manufacturing Industries.
This Act is applicable to all factories which employ 20 or more persons
or employ 10 or more and use power. In accordance with this Act
the factories are under legal obligation to supply the requisite information
to the government and failure to do so is punishable with fine.
All information required under section 3 of these rules was to be
furnished in English and it was to be treated as confidential by the
Statistical Authority.
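The applicability rule quoted above (20 or more persons, or 10 or more persons with the use of power) can be written as a small test. The following is only a sketch of that rule as worded in the text; the function name and the sample factories are hypothetical.

```python
def covered_by_census(persons_employed, uses_power):
    """Apply the rule: 20 or more persons, or 10 or more persons and use of power."""
    if persons_employed >= 20:
        return True
    return persons_employed >= 10 and uses_power


# Hypothetical factories: (persons employed, uses power)
factories = [(25, False), (12, True), (12, False), (8, True)]
for persons, power in factories:
    print(persons, power, covered_by_census(persons, power))
```

A rule of this kind leaves out the smallest units altogether, which is one reason why census figures cannot be taken as describing the whole of manufacturing activity.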
Under these rules industries were classified in 63 categories out
of which 29 were selected in schedule I, about which data were to be
collected as per the form which was a part of schedule II of the above mentioned
rules. Thus the details of the statistics to be collected were
part of the rules, which could not even be modified without a tedious
legal procedure. About the remaining 34 industries statistics were
to be collected at a later date. The 29 industries included initially in
schedule I were as follows :
1. Wheat flour, 2. Rice milling, 3. Biscuit making, 4.
Fruits and Vegetables Processing, 5. Sugar, 6. Distilleries
and Breweries, 7. Starch, 8. Vegetable Oils,
9. Paints and Varnishes, 10. Soap, 11. Tanning,
12. Cement, 13. Glass and Glass-ware, 14. Ceramics,
15. Plywood and Tea Chests, 16. Paper and
Paper board, 17. Matches, 18. Cotton Textiles, 19.
Woollen Textiles, 20. Jute Textiles, 21. Chemicals,
22. Aluminium, Copper and Brass, 23. Iron and
Steel, 24. Bicycles, 25. Sewing Machines, 26. Producer
Gas Plants, 27. Electric Lamps, 28. Electric
Fans, 29. General Engineering and Electrical Engineering.
As per schedule II for each of the above 29 industries a form was
prescribed on which statistics were collected. The data collected about
these 29 industries were, generally speaking, of uniform pattern. However
all the forms were not identical because various items of inputs
and outputs in different industries were different. The data collected
under these rules were as follows :
12. Total salaries and wages paid (in crores)          135.7   218.6    270
13. Value at factory of materials etc. consumed
    including cost of transport (in crores)            48~'4   883.4   1206
14. Value of work done for factories by other
    concerns (in crores)                                 2.8     6.3     10
15. Depreciation (in crores)                            12.6    25.0     40
(2) The forms of the Schedules and Questionnaires were not flexible
and no change could be made in them without a very tedious and long
legal formality. This defect has been removed in the new rules framed
in 1959 under the Collection of Statistics Act.
(3) Their coverage was not complete and only 29 industries out of 63
were covered. The idea was to include the remaining 34 industries
at a later date but these industries could not be included till the day the
Industrial Statistics Act became inoperative. In fact from 1951 onwards
data were available with regard to 28 industries only, as no unit continued
in the Producer Gas Plants industry.
(4) The forms on which statistics were collected were not suitable
for Government owned factories and they did not supply information. With
the expansion of the public sector the number of such units increased
fast and there was need of bringing them within the orbit of the census.
The factories attached to Training Institutes also came in a different category
as they did not employ labour permanently on a regular basis and
as such were left out.
(5) Even for the industries covered the forms used for the collection
of statistics were not very suitable and there were genuine difficulties in
supplying all the information called for. In fact these forms were
designed on the lines of similar forms used in U. K. and U. S. A.
but conditions in those countries were different as their factories were
of a more advanced type and maintained detailed accounts of inputs
and outputs. Our forms should have been simpler and such as
could be easily understood. Moreover the information had to be
supplied in English only and this also created some difficulty.
On the whole, however, it can be said that the data collected
under CMI were not very unsatisfactory. The details were published
Statewise as well as industrywise. There was however a great delay
in publication of these figures.
2. SSMI.
Apart from the annual census of manufactures conducted by
the State Governments the Directorate of National Sample Survey
conducted a Sample Survey of Manufacturing Industries (SSMI) since
the year 19F. It covered all establishments registered under sections
2m(i) and 2m(ii) of the Factories Act of 1948, that is those using power
and employing 10 or more workers and those not using power and
employing 20 or more workers. Its scope was further extended to
cover establishments registered or licensed under the Industries (Development
and Regulation) Act of 1951 as amended from time to time.
This was done for the first time in the fourth round which related to
the year 1954. The SSMI covered the whole of the country with the
exception of Andaman and Nicobar islands. Units under the Ministries
of Defence and Railways, Government of India, were excluded
from its purview. The survey was spread over all the 63 industries
which were classified for the census of manufactures. In the year
1954 there were 32,767 factories in all (covered by 2m(i) and 2m(ii)
of the Factories Act 1948) and the sample size was 3,567 factories
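The two 1954 figures quoted above imply a sampling fraction of roughly one factory in nine. The short sketch below works this out. The function names are ours, and the raising factor shown assumes, purely for illustration, that every factory had the same chance of selection; the text does not say how the SSMI sample was actually drawn or how its estimates were built up.

```python
FACTORIES_1954 = 32767  # factories covered by the Factories Act, as quoted in the text
SAMPLE_1954 = 3567      # sample size, as quoted in the text


def sampling_fraction(sample_size, population_size):
    """Share of the population that falls in the sample."""
    return sample_size / population_size


def raising_factor(sample_size, population_size):
    """Multiplier turning a sample total into a population estimate.

    Assumption (not from the text): equal probability of selection for every unit.
    """
    return population_size / sample_size


fraction = sampling_fraction(SAMPLE_1954, FACTORIES_1954)
factor = raising_factor(SAMPLE_1954, FACTORIES_1954)
print(round(100 * fraction, 1), "per cent of factories sampled")
print(round(factor, 2), "factories represented by each sampled factory")
```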
Unified form. As has been mentioned earlier the CMI and SSMI
were using different schedules and forms for collection of industrial
statistics. Now in ASI one single form of return has been designed
to meet the requirements of both the census and the sample surveys.
Data collected under ASI relate to :
Capital structure. Details of fixed and working capital and transactions
relating to fixed capital (replacements, improvements and expansions)
during the year.
Employment and wages. Average employment and emoluments
during the year, employment by categories etc.
Inputs. Raw materials, chemicals, packing materials and consumable
stores consumed during the year. Work done by other concerns
for repairs and manufacturing processes.
Fuels and lubricants (excluding intermediate products consumed
during the year).
Other expenses not included in materials and fuel and lubricants
consumed.
Output. Quantity and value of manufactured products, by-products
and intermediate products produced during the year. Work
done for other concerns on repairs and manufacturing processes. Value
of semi-finished goods including work in progress.
Stocks. Stock of raw materials, fuels, products and by-products
at the end of the accounting year.
Installed capacity. Installed capacity of production during the
year, its basis of estimation, spare capacity and expected additional production.
Power equipment. Prime movers (steam engine, internal combustion
engine and other prime movers) as at the end of the year. Also
electric motors (AC and DC) at the end of the year.
It will be noticed that this information is very similar to
that collected under the SSMI. However more detailed information
is being collected about these items and information is also being collected
about a number of new items which were included neither in
SSMI nor in CMI. For the first time we shall be collecting statistics
relating to :
(a) Equipment other than power equipment installed.
(b) Skilled, semi-skilled and unskilled workers.
(c) Installed capacity of production.
(d) Sales effected during the year classified by the type of consumers.
(e) Labour and management relations.
(f) Training facilities given by the factories, and
(g) Industrial research.
(i,) Manufactures.
(ii,) Electric light and power.
Since these statistics are compiled from voluntarily supplied information
they are neither complete nor comparable month after month
because the number of factories supplying information is not always
the same. However for major industries like cotton and jute textiles,
sugar, iron, steel etc. the coverage is fairly wide and in some cases almost
the whole output is covered. In other cases statistics are obtained
mainly from big units.
This publication also contains figures of installed capacity. In
case of textiles installed capacity is estimated in terms of spindles and
looms. In some cases it is in terms of output. Figures of installed
capacity are estimated generally by the agencies responsible for the
collection of data but in some cases like sugar the estimates of installed
capacity are made by the Directorate of Industrial Statistics itself. In
case of iron and steel and electricity generation the capacity is estimated
on the basis of continuous operation of the plant throughout the year,
adjustments however being made for shut-down. In most other cases
installed capacity is estimated by taking into account the duration of
working of each industry.
A general study relating to the growth of industrial statistics in
our country and the data available at present about large scale industries
has already been made in the preceding chapter. It is now proposed
to examine the statistics relating to some of the important industrial
problems like finance, inputs, outputs, employment and wages
and taxes and profits etc. Presently we shall confine our study to
statistics of industrial finance only.
At present the industrial finance statistics of our country are
mainly available from the following sources:-
(i) Directorate of Industrial Statistics which had the data
collected under the annual CMI.
(ii) Directorate of NSS which had the data collected earlier
under SSMI and at present under ASI.
(iii) Department of Company Law Administration (Ministry
of Finance) which has data compiled from the annual
returns filed by Joint Stock Companies.
(iv) Office of the Controller of Capital Issues.
(v) Finance Corporations.
(vi) Reserve Bank of India.
CMI DATA
The data collected under the annual CMI (upto 1958) related to
the figures of paid-up and productive capital of the manufacturing industries.
These figures gave only a very general and superficial idea about
the productive investment in our large scale industries. These figures,
however, were available industrywise. They were not at all enough
for statistical analysis of the capital structure of our industrial economy.
They did not give an idea about the sources of industrial finance and
the form in which it was obtained. The coverage of these figures, as
has been pointed out earlier, was very limited and not only were a large number
of industries not covered at all by CMI but even for the industries
covered the data were not complete.
SSMI DATA (UPTO 1958)
The Directorate of NSS while conducting the annual sample sur-
vey of manufacturing industries also collected data relating to the ca-
pital structure of the units covered.
The data collected under ASI are similar to those collected under
SSMI. In the ASI schedule there is a provision for reporting actual
expenditure incurred during the year on plant and machinery and tools
bought for installation, and this expenditure is to be shown classified
under the following headings :
A-(1) new, (2) secondhand.
B-(1) indigenous and (2) imported.
Only expenditure incurred on major additions and alterations or
replacements is to be reported. The idea is that only such expenses
should be included as either extend the normal economic life or
raise the productivity of the assets. However under this heading
there is no mention of the materials and fuels consumed and labour
and other costs incurred by the establishments in the manufacture of
machines for their own use. In many cases big industrial units manufacture
parts of the plant and tools for their own use and such increments
to capital are being ignored. This is a serious lacuna. In fact
there is an instruction that care should be taken not to include (in the
relevant block) any material used on capital account, that is, for additions
to capital. This cannot be justified on any account and it would give
an incorrect picture of the value created or added in the establishment.
It is also contrary to the recommendations of the Statistical Commission,
United Nations, in International Standards in Basic Industrial
Series.
DEPARTMENT OF COMPANY LAW ADMINISTRATION DATA
The figures obtained by the Company Law Administration relate
to the number of companies registered in any year and their capital.
Figures are available about the authorised, issued, called-up and paid-up
capital. Figures relating to calls unpaid and forfeited shares are
also given. Figures of new registration and liquidation of companies
are also published. These statistics are given in two publications,
namely :
(i) Monthly Blue Book on Joint Stock Companies in India.
(ii) Joint Stock Companies in India (Annual).
(v) Refinance Corporation for Industry (Private) Ltd. This was set
up in June 1958 and it provides relending facilities to industrial
concerns against loans given to them by banks. Till March 1960 the
Refinance Corporation had sanctioned assistance to the extent of Rs.
4.16 crores.
Manufacture of machinery
except electrical machinery                   43    71    3.38
Manufacture of electrical
machinery, apparatus, appliances
and supplies                                  17    21    3.05
Manufacture of transport
equipment                                      8    15    7.77
Miscellaneous manufacturing
industries                                     5    14    1.23
activity in the country. The base year of the index was 1935 when
the series was started in 1938. It has been changed to 1953. The
constituents and weights assigned in the series have also been completely
revised. The weights assigned to various items are as follows :
                                          Weights
A. Industrial Production
 1. Sugar                                 .... 63
 2. Tea                                      4.95
 3. Jute Textiles                            8.11
 4. Cotton Textiles
    (i) yarn                                 7.75
    (ii) woven material                     19.39
 5. Cement                                   1.54
 6. Coal                                     6.01
 7. Iron and steel
    (i) pig iron and ferro-alloys            1.91
    (ii) finished steel                      3.S1
 8. Paper and board                          1.27
 9. Tyres                                    2.13
10. Automobiles                              1.45
11. Bicycles                                 0.45
12. Electricity                              2.11
                                            65.52
B. Transportation
   Railways: net ton-miles
                                           100.00
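To show how weights of this kind enter an index, the sketch below combines item relatives by a weighted arithmetic mean. This is an assumption made for illustration: the text gives the weights but does not state the averaging formula used in the series. The four weights are those legible in the table above; the relatives (base year = 100) are hypothetical.

```python
def weighted_index(relatives, weights):
    """Weighted arithmetic mean of item relatives: sum(w * r) / sum(w)."""
    total_weight = sum(weights[item] for item in relatives)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(relatives[item] * weights[item] for item in relatives)
    return weighted_sum / total_weight


# Weights as printed in the table; relatives are hypothetical.
weights = {"tea": 4.95, "jute_textiles": 8.11, "cement": 1.54, "coal": 6.01}
relatives = {"tea": 110.0, "jute_textiles": 95.0, "cement": 140.0, "coal": 120.0}
print(round(weighted_index(relatives, weights), 1))
```

Dividing by the sum of the weights actually used, rather than by 100, keeps the result on the base of 100 even when only some of the items are included.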
Index Number of Industrial Profits
The office of the Economic Adviser under the Ministry of Commerce
and Industry used to publish an index number of industrial profits
till April, 1951 when this work was transferred to the Ministry
of Finance. The index number is now published by the Company
Law branch under the Ministry of Finance. This index is based on
the following eight industries :
(i) Cotton, (ii) Jute, (iii) Cement, (iv) Tea, (v) Iron and steel, (vi)
Paper, (vii) Sugar and (viii) Coal.
The technique of construction of this index is very simple. A
number of companies have been selected from the list available in the
Investors Year Book and the profits of these companies are found out
and an index for each industry is calculated on the chain base system.
These chain relatives are also linked to the year 1939. Formerly they
used to be linked to the year 1918.
This index number is extremely defective and should not be used
for general purposes. Some industries have been well represented
in the index while the representation of others is very poor. Percentages
of paid up capital of companies included in the index to the total
paid up capital of all the companies in that industry for the year 1939-40
were 46.5 in case of cotton, 91.7 in case of jute, 80.t in case of cement,
).6 in case of tea, 71.2 in case of iron and steel, 31.6 in case of sugar,
to in case of paper and 63.7 in case of coal. It is obvious from these
figures that whereas some industries like paper and jute are well
represented others like cotton, sugar and tea are not represented adequately.
Besides being unrepresentative another drawback of this index
is that the definition of the term profit used in the series is very defective.
There is no uniformity in the returns submitted by the companies and
they use the term 'profit' in different senses.
Apart from the statistics recently published by the Reserve Bank
about 1001 companies and the above mentioned index of industrial
profits, no other statistics relating to industrial profits are available
in our country. No doubt the companies have to file a copy of their
Trading and Profit and Loss Account and Balance Sheet with the
Registrar of Joint Stock Companies, but these valuable data are not
officially put to any use. At a time when we are trying to have planned
economic development in our country, studies relating to profits of
industrial enterprises are of special consequence. This is particularly
so in our country where our economic plans aim at the establishment
of a socialistic pattern of society. Studies relating to profits of the
private sector are extremely necessary and it would be worth while
to have exhaustive data about this problem.
EMPLOYMENT STATISTICS
PRICE STATISTICS
prices have been renamed as harvest seasonal prices and are published in
Agricultural Situations in India. These prices should be used with caution
because there is no uniformity in the quality of the commodities
for which figures are given and these prices have been found to differ
widely from similar figures collected by various trade associations and
technical journals in the country.
Other prices
Besides the prices of agricultural commodities at harvest time the
prices of other commodities are also available in a large number of
magazines and journals some of which are official while others are
unofficial or semi-official. Monthly Survey of Business Conditions in India
used to publish the prices of a large number of commodities like cotton,
jute, iron, steel, sugar, coal, foodgrains, oilseeds, tea, etc. As has been
said earlier this publication has been merged with the Journal of Industry
and Trade which is a monthly publication of the Department of Industry
and Commerce. This publication gives the prices of agricultural and
non-agricultural commodities, and the Economic Adviser's index number
of wholesale prices, which we shall discuss later on in this very
section, is based on the prices available in this publication.
During the period of war when prices of most of the commodities
were controlled the Government used to publish the controlled
prices. Statistical data were published about the prices at which the
Government used to purchase the commodities and the prices at which
it sold the commodities to the public. Besides the above statistics
data relating to the prices of securities are also published in the shape
of bulletins by the Reserve Bank. Every week, the Reserve Bank of India
Bulletin is published and it gives the prices of securities, gold, silver,
etc. The Weekly Bulletin of Statistics and the Monthly Abstract of
Statistics, as also the Statistical Abstract of India, which are brought out by
the Central Statistical Organisation, also contain statistics relating to
securities and bullion. The above account clearly shows that there
has been some improvement in the price statistics relating to our country
in recent years. Most of the price statistics relating to our country
are published by the Office of the Economic Adviser and the Directorate
of Economics and Statistics. These departments obtain price
statistics from various state governments and also non-official
organisations like Chambers of Commerce, trading firms, etc. The state
governments have also effected certain improvements in the collection
of price statistics and this work is now done by the Economic
Intelligence Inspectors who are specially appointed to collect statistical
data. Formerly this work was done by the patwaris and kanungos who
did not devote much time to this work and supplied statistics of a very
poor quality. In order that these statistics may be collected uniformly
throughout the country the Central Government has laid down certain
instructions which are followed by most of the states.
Despite improvements that have been made in recent years in
price statistics there are yet a large number of defects from which
they suffer. First of all due to the absence of standardization, the prices
available at different periods are not strictly comparable with each
other. Moreover price statistics are not published at uniform intervals
and the data are not collected in such a manner that price index numbers
may be easily constructed from them. In other countries of the world
a large variety of price indices are published and various types of
persons utilise them with profit. This thing is not possible in our country
unless price statistics are improved considerably both in quality as well
as in quantity.
PRICE INDEX NUMBERS
Total 100
In the food articles group there are three sub-groups, namely,
cereals, pulses and others, with weights 59, 8 and 33 respectively.
Similarly in the industrial raw materials group there are four sub-groups,
namely, fibres, oilseeds, minerals and others with weights of 53, 30, 10
and 7 respectively. Other groups are similarly divided in sub-groups.
In the semi-manufactures group there are seven sub-groups and in the
manufactures group there are three sub-groups. There is no sub-group
in the miscellaneous group.
Each of the sub-groups is sub-divided into a number of items.
Thus in the cereals sub-group the number of items is 4. In the sub-group
relating to pulses the number of items is 2 and so on.
Criticism.
From the above table it is clear that the weight of the food group
has increased in the new index and the weights of non-food groups
have decreased.
Average. This index number does not use geometric mean; it
uses weighted arithmetic average instead. The general indices of the
two series can be linked on the basis : 100 of the new series = 380.6
(being the average for 1952-53) of the old series.
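The linking relation stated above (100 of the new series = 380.6 of the old series) can be sketched as a simple proportional conversion. This is only an illustration; the function names are hypothetical and the sample index value in the usage note is invented.

```python
# Linking the old and new wholesale price series.
# The only figure taken from the text is the link: 100 (new) = 380.6 (old).
LINK_FACTOR = 380.6  # old-series average for 1952-53, equal to 100 of the new series


def new_to_old(new_index: float) -> float:
    """Express a new-series index number in terms of the old series."""
    return new_index * LINK_FACTOR / 100.0


def old_to_new(old_index: float) -> float:
    """Express an old-series index number in terms of the new series."""
    return old_index * 100.0 / LINK_FACTOR
```

For example, a hypothetical new-series figure of 110 corresponds to 110 × 380.6 / 100 on the old series, and an old-series figure of 380.6 converts back to 100.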
Calcutta wholesale price index number
This index number is monthly; at present it is compiled from
69 commodities. Formerly it used to be compiled from 72 commodities
which were divided in 16 groups. A separate index number is
worked out for each group by finding out the simple arithmetic average
of the price relatives of the articles included in the group. The base
of the index number is July 1914 and the price quotations relate to the
wholesale prices of the commodities prevailing in Calcutta. Though
the index number of each individual group is calculated by the use of
simple arithmetic average yet an element of weighting is introduced by
taking more than one quotation for some items within a group. Thus
"cereals" includes four varieties of rice, and only one each of wheat,
barley, maize and oats.
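The implicit weighting described above can be illustrated with a short sketch. It assumes each quotation is a (commodity, base price, current price) triple; the function names are hypothetical and any prices used with it are invented.

```python
from statistics import mean


def price_relative(base_price: float, current_price: float) -> float:
    """Current price as a percentage of the base-period price."""
    return 100.0 * current_price / base_price


def group_index(quotations):
    """Simple arithmetic average of the price relatives of all quotations.

    `quotations` is a list of (commodity, base_price, current_price) triples.
    """
    return mean(price_relative(base, cur) for _, base, cur in quotations)


def implicit_weights(quotations):
    """Share of each commodity in the group, by number of quotations."""
    counts = {}
    for commodity, _, _ in quotations:
        counts[commodity] = counts.get(commodity, 0) + 1
    total = len(quotations)
    return {commodity: n / total for commodity, n in counts.items()}
```

With four rice quotations and one each for wheat, barley, maize and oats, `implicit_weights` gives rice a share of 4/8, so a rise in rice prices moves the cereals index four times as much as the same rise in wheat.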
The general index number is arrived at by calculating the simple
arithmetic average of all the individual price relatives included in the
compilation. It can be said that this general index number is also a
weighted index number, the weights in each case being equal to the
number of items included in each group. This index number should
be used with caution. The price quotations on which it is based refer
to one day in a month and as such cannot be representative of the prices
of the whole month. The weights of the index number have not always
been the same because the number of varieties for which quotations
are available from time to time has not been the same. Two items were
dropped under sugar since February, 1934 and one item under cotton
manufactures since September 1936, as their quotations were no longer
available. This index number is unfit for being used in discussing
problems of an all-India character, and there is a proposal to discontinue
the index number as it does not serve a very useful purpose. A similar
index number used to be published for Bombay but it was later
discontinued. The Calcutta Wholesale Price Index Number was compiled
by the Department of Commercial Intelligence and Statistics under
the Ministry of Commerce and Industry. Now this index number is
being issued on a temporary basis and may be discontinued any time.
It is published in the Indian Trade Journal.
Labour Bureau Series of Consumer Price Indices
Labour Bureau publishes consumer price indices for the follow-
ing centres in different States :
Assam : Gauhati, Silchar, Tinsukhia.
Bihar: Jamshedpur, Jharia, Dehri-on-Sone, Monghyr.
Maharashtra : Akola.
Delhi : Delhi.
Madhya Pradesh: Jubbulpore, Bhopal, Satna.
Madras and Kerala : Plantation Centres (covering four centres
only).
Mysore : Mercara.
Orissa : Cuttack, Berhampur.
Punjab : Ludhiana.
Rajasthan : Ajmer, Beawar.
West Bengal: Kharagpur.
The base period of most of these indices was the year 1944, but
it has been shifted to 1949. The base year of the index for Bhopal is 1951,
of Satna and Mercara 1953, and of Beawar August 1951 to July 1952. The four
Plantation centres covered in Madras-Kerala are Gundalpur, Kullakamby,
Vayithiri and Valparie and the base period for them is January
to June 1949. Most of the other indices were started before 1949 and
formerly had different bases. The All India Consumer Price Index number
also has 1949 as base.
The items included in these indices are distributed over five major
groups namely :
(i) Food
(ii) Fuel and lighting
(iii) House rent
(iv) Clothing, bedding and foot-wear
(v) Miscellaneous.
The items falling under each group naturally differ from centre
to centre depending on the consumption pattern in various parts of
the country. These items were selected on the basis of the family bud-
get enquiry conducted in 1943-44 for 15 centres and for the remaining
5 centres the enquiries were conducted later on. The scope of these
enquiries and the sampling frame and fraction were not identical for
all centres and this is a major shortcoming of this series.
The prices are collected through Economic Intelligence Inspec-
tors or Marketing Inspectors, though in certain centres other agencies
are also used for this purpose. The prices are obtained on a weekly
basis though they do not refer to the same day of the week for
all centres.
Index numbers are constructed in two stages. First, group
indices are computed and then they are combined into the general index.
Both the group indices and the general index are weighted. In case
of group indices weights have been assigned to items falling in the
group in proportion to expenditure on items in relation to the total
expenditure on the group concerned. In case of general index, weights
are assigned to various groups in proportion to expenditure on them
in relation to the total expenditure on all the groups combined together.
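The two-stage construction described above can be sketched as follows. This is a minimal illustration, not the Labour Bureau's actual procedure: the function names are hypothetical, the price relatives are taken as already computed, and any expenditure figures supplied to it are invented.

```python
def weighted_mean(pairs):
    """Weighted arithmetic average of (value, weight) pairs."""
    pairs = list(pairs)
    total_weight = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total_weight


def group_index(items):
    """Stage 1: items are (price_relative, expenditure_on_item) pairs."""
    return weighted_mean(items)


def general_index(groups):
    """Stage 2: combine group indices, weighted by expenditure on each group.

    `groups` maps a group name to (items, expenditure_on_group).
    """
    return weighted_mean(
        (group_index(items), expenditure)
        for items, expenditure in groups.values()
    )
```

Because `weighted_mean` divides by the total of the weights, the weights may be given as rupee expenditures or as percentages; only their proportions matter.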
age of all these indices is not satisfactory. To construct an All India
Index it is necessary that a large number of family budget enquiries
are conducted throughout the country and a larger number of regional
indices are compiled and then combined into one index.
Thus the Labour Bureau indices are, strictly speaking, not comparable
either on the basis of time or space. The period of family
budget enquiries in these indices, the frequency of price quotations, the
classification of items etc. are not uniform. Weight base and the
price base are also not identical and the quality of primary data on which
these indices are based is also not above criticism.
New Scheme of Labour Bureau
Due to the above mentioned shortcomings a new series of consumer
price indices is being prepared by the Labour Bureau. As
mentioned earlier the family budget enquiries relating to workers at 50
industrial centres have already been completed in August-September
1959 and the details of the scheme are being worked out. Very soon
we may have a new and uniform series of consumer price indices for a
large number of centres in the country.
State Series
Various State Governments compile cost of living index numbers
and publish them in their own Labour Gazettes or Bulletins. At present
about 41 index numbers are being compiled in different States of the
country in addition to the 20 indices compiled by the Labour Bureau,
Government of India. There is considerable diversity in the scope
and construction of these indices and they are not comparable with each
other. The base periods of these indices are different. Some index
numbers have as old a base as 1921 while there are others whose base
periods are very recent. Besides these diversities in the base period
the number of items included in these indices, their technique of
construction and the method of collecting primary data, all differ from each
other and as such these indices are in no way comparable.
Most of these series are now maintained with the base 1949 = 100
also. In case of other series which have a base period of 1949 or those
following 1949 there is no change in the base periods. With a view
to arrive at the indices calculated on the original base periods,
conversion factors have been worked out. The index calculated with the
new base of 1949 when multiplied by the respective conversion factor
gives the index with the original base. These conversion factors are
published by the Labour Bureau in its Gazettes.
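A small sketch may make the use of a conversion factor concrete. The text does not say how the factors were derived; the sketch assumes, as is usual in base shifting, that the factor equals the 1949 average of the index on its original base divided by 100. The function names and the figures in the usage note are hypothetical.

```python
def conversion_factor(avg_1949_on_original_base: float) -> float:
    """Assumed derivation: 1949 average on the original base, divided by 100."""
    return avg_1949_on_original_base / 100.0


def to_original_base(index_on_1949_base: float, factor: float) -> float:
    """Index on 1949 base multiplied by the factor gives the original-base index."""
    return index_on_1949_base * factor
```

If, say, an index averaged 300 in 1949 on its original base, the factor would be 3.0, and a reading of 105 on the 1949 base would correspond to 315 on the original base.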
Most of the State series of indices were already subjected to
criticism on the ground that the base periods and the period of the family
budget enquiries on the basis of which weighting diagrams were arrived
at differed. After the shifting of the base to 1949 this criticism becomes
more valid.
The number of items priced and included in these indices varies
between 20 and 75. Items of consumption expenditure have been
mostly represented under the groups :
(i) Food (ii) Fuel and lighting (iii) Clothing (iv) House rent
and (v) Miscellaneous. Hyderabad city index includes a
sixth group of "intoxicants" also.
We discuss below some of the State indices.
Bombay Working Class Cost of Living Index Number (Revised)
The working class cost of living index number for Bombay city
was first published in the year 1921. As in the absence of any family
budget enquiry it was not possible to assign weights to various items,
this index number was constructed on the aggregate consumption
method. The first family budget enquiry in the working class was
held by the Bombay Labour Office from May 1921 to April 1922 in
Bombay city. A second enquiry was conducted between May 1932
and June 1933 and the results of the second enquiry have been used
in the compilation of the revised index number. A three per cent.
sample of the working class families was taken in the Bombay city
enquiry and if the sample tenement was vacant or occupied by
persons out of the scope of the enquiry the next tenement fulfilling the
conditions was taken up. Information was collected by the interview
method. Members of the Bombay Labour Office paid house
to house visits to collect information regarding the amount of money
spent on various items. These items are distributed over five major
groups which have the following weights :
1. Food                  28    47
2. Fuel and lighting      4     7
3. Clothing               6     8
4. House rent             1    13
5. Miscellaneous          7    14
   Total                 46    89
The base year of the index number was originally the year ending
June 1934. The method used in the compilation of the index number
resembles that of the British Ministry of Labour. The index number
for each group is first calculated by averaging the group figures
together after weighting them by the percentages that each item of a
group bears to the total expenditure on the group. The final index
number is calculated by averaging the group indices after weighting
them by the percentages that expenditure on each group bears
to the total expenditure on all the groups. The average used in both
cases is the arithmetic average.
In October 194? the general index was shifted to a new base of
August 1939. The formula adopted for this purpose was very simple.
It was as follows :
$$ I_a = \frac{I_b \times 100}{I_c} $$
where I_a indicates the index on the new base, I_b represents the
index on the old base and I_c the index for August 1939 with a base
for 1934.
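The base-shifting rule just described divides the index on the old base by the old-base index of the new base period and multiplies by 100. A minimal sketch follows; the function and argument names are hypothetical and the figures in the usage note are invented.

```python
def shift_base(index_on_old_base: float, new_base_period_on_old_base: float) -> float:
    """Re-express an index so that the chosen new base period equals 100.

    index_on_old_base            -- the index to convert (old base)
    new_base_period_on_old_base  -- index of the new base period (here August
                                    1939) measured on the old base
    """
    return index_on_old_base * 100.0 / new_base_period_on_old_base
```

If the old-base index for August 1939 were 105 and the old-base index for some later month were 210, the later month would stand at 200 on the new base, and August 1939 itself would stand at 100.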
Later on the Labour Bureau shifted the base of this index to 1949
and in this shifting also a conversion factor has been calculated for the
purpose.
Kanpur Working Class Cost of Living Index Number
Kanpur working class consumer price index number is maintained
by the office of the Labour Commissioner, Government of Uttar
Pradesh. This series is based on the results of a family budget enquiry
conducted in the year 1938-39 among the working class families.
The tenements sampling was used as the sampling frame and 10 per
cent. sample was collected for study. In all I4Z% budgets were collected
but due to the difficulties created by the outbreak of the Second
World War only 300 budgets, all relating to only one locality of Kanpur
city, namely Juhi, could be processed. The average budget was,
therefore, derived on the basis of the analysis of only 300 family
budgets. Items included in this index number are divided in the five
usual groups. Expenditure on household requisites is not included
in these five groups and the index number covers only about 69 per
cent. of the total expenditure of the working class families. The
weights of various items in this index number have been worked out
on the basis of relative expenditure on different items. Though
within the group the total weights of individual items add up to
100 the total of the group weights is only 69. In other words
only 69 per cent of the total family expenditure has been accounted
for by the items of consumption expenditure.
Prices are collected from 10 selected shops in various labour
localities of Kanpur. Prices are inclusive of all sorts of taxes the
incidence of which is borne by the consumers. The selection of shops
for price collection has been done on the basis of their popularity,
situation and volume of purchases made by the workers. Formerly
the prices related to every Sunday but now they are collected on every
Saturday. The weekly prices are averaged to obtain the monthly
prices of various commodities. It should be noted that unlike most
of the series of consumer price indices the rent index for Kanpur has
been kept up to date and this has been possible by conducting special
enquiries every six months. In the computation of the index,
Laspeyres' formula (which is the most common formula for the compilation
of these indices) is used and the index is computed as a weighted
average of the price relatives in two stages. First group indices are
compiled and then they are merged into a general consumer price
index number. The following are the group weights in this index :
1. Food 11 42
2. Fuel and lighting 2 6
3. Clothing 2 8
4. House-rent 1 9
5. Miscellaneous 6
Total 21
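The statement that Laspeyres' formula is computed as a weighted average of price relatives can be written out as below. The symbols are notation introduced here for illustration and do not appear in the text: p_0 and p_1 are base-period and current prices, q_0 base-period quantities, and w the base-period expenditure on an item.

```latex
I \;=\; \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100
  \;=\; \frac{\sum w \left( \dfrac{p_1}{p_0} \times 100 \right)}{\sum w},
\qquad w = p_0 q_0 .
```

Since the sum of the weights appears in the denominator, the weights need not total 100; this matters for the Kanpur index, where the group weights are stated to total only 69.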
Working Class Consumer Price Index Calcutta
This index was originally initiated by the Controller of Civil
Supplies Department, Government of West Bengal. At present it is
maintained by the office of the Labour Commissioner, Government
of West Bengal. A special feature of this index is that no large
scale survey of the family budgets was conducted to obtain the
weighting diagram. Instead, the results of a limited survey of family
budgets conducted by the Burma Shell Co. among its workers were
utilised for the purpose. This survey was conducted by the Burma
Shell Co. in 1939-40. The consumption expenditure was reported
under the usual five groups namely food, fuel and lighting, clothing,
house rent and miscellaneous.
Prices are collected for this index from two selected shops situated
in each of the two important industrial zones of Calcutta, namely
Cossipore and Kidderpore. Prices are collected weekly by the clerks
working in the office of the Labour Commissioner. The house rent
is kept fixed in this index and the formula used in its calculation is
Laspeyres' formula. The general index is obtained by combining
the group indices which are first compiled as weighted averages of
the price relatives.
A very big shortcoming of this index is that the miscellaneous
group index is kept unchanged at 150. This figure was worked out
for the month of March 1943 by the Civil Supplies Department. Later
on when this index was taken over by the Labour Commissioner's
office, details of items included in the miscellaneous group and their
weights were not available and hence this group index has remained
unchanged.
Shortcomings. All the State indices suffer from a number of drawbacks.
In most cases a temporal or spatial comparison is out of
question even after shifting of the bases to 1949. There is no
uniformity in their base periods. The items included in the series and
their classification, the technique of obtaining price data, the
frequency of price quotation, and even the system of weighting differ.
The family budget enquiries on the basis of which weights have been
arrived at do not synchronise with the base years, and in many cases
the consumption expenditure covered is not full. As has been pointed
out, in certain indices, house rent is altogether absent or not fully
represented. The technique of assigning weights in many cases is
not scientific and generally speaking the quality of these indices is
not very satisfactory.
It is, however, gratifying to note that many States are working
on schemes for starting new series of consumer price indices of working
class and the Labour Bureau, Government of India, is also very shortly
going to issue an entirely new set of such indices for about 50 centres
scrip on the basis of the prices of the immediately preceding week. For
the first week of January, 1946 the price relatives were based on the
average prices of 1938. Unweighted geometric mean of the price relatives
of all scrips falling within each sub-group was then calculated to
give the sub-group link relatives at each of the three centres, namely
Bombay, Calcutta and Madras. From the link relatives of the sub-groups
two sets of index numbers were prepared :
1. The regional index numbers, and
2. All India index numbers.
So far as regional index numbers were concerned, for each centre
the link relative for each of the main groups was formed by computing
weighted arithmetic average of the sub-group link relatives. Weights
used in case of fixed dividend industrials and variable dividend
industrials were proportional to the total paid-up capital of all companies
quoted on the stock exchange and belonging to the particular sub-group.
In case of Government securities weights were proportional
to the amounts of loans outstanding. Regional index numbers for
each of the groups and sub-groups for each centre were formed by serial
multiplication of the corresponding link relatives.
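The chain of steps described above (weekly price relatives, an unweighted geometric mean within each sub-group, a weighted arithmetic average across sub-groups, and serial multiplication) can be sketched as follows. This is an illustration only: relatives are expressed as ratios (1.0 means no change), the function names are hypothetical, and any prices or weights supplied are invented.

```python
from math import prod  # available from Python 3.8


def geometric_mean(values):
    """Unweighted geometric mean of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def subgroup_link_relative(weekly_price_relatives):
    """Link relative of a sub-group: geometric mean of its scrips' relatives,
    each relative being this week's price divided by last week's price."""
    return geometric_mean(weekly_price_relatives)


def group_link_relative(subgroup_links, weights):
    """Weighted arithmetic average of sub-group link relatives
    (weights could be paid-up capital or loans outstanding)."""
    return sum(l * w for l, w in zip(subgroup_links, weights)) / sum(weights)


def chain_index(link_relatives, start=100.0):
    """Serial multiplication of successive link relatives into an index series."""
    series, level = [], start
    for link in link_relatives:
        level *= link
        series.append(level)
    return series
```

Starting from 100, weekly link relatives of 1.1, 1.0 and 0.5 would give index numbers of about 110, 110 and 55 in successive weeks.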
So far as all-India index numbers were concerned, all-India sub-group
link relatives were first formed by calculating weighted arithmetic
average of the sub-group link relatives at each of the three centres.
From the all-India sub-group link relatives index numbers were obtained
for each sub-group by serial multiplication. Similarly all-India group
link relatives were formed by weighting the sub-group link relatives
and all-India group index numbers were obtained from them by serial
multiplication.
(c) New series of the Reserve Bank of India
In the year 1952-53 it was realised that the series of indices of
security prices published by the Reserve Bank of India had become obsolete
and required certain modifications. The pre-war base period had
become out of date and some new securities quoted on the stock
exchanges of the country had become very influential and deserved
inclusion in the indices. The Government of India and State Governments
had also floated certain loans in the market and these scrips were also
to be included in the index number. These reasons led to the revision
of Reserve Bank series in August, 1953. Though these revised indices
were published for the first time in August, 1953 they were worked
backwards up to the first week of April, 1953.
The important changes that were made in the new series are of
various types. Firstly a number of new scrips have been added and a
few old scrips have been dropped. For variable dividend industrial
securities quotations have also been taken from Delhi Stock Exchange
and a new series of index numbers for debentures is also constructed.
At present the variable dividend industrial securities relate to four centres,
namely, Bombay, Calcutta, Madras and Delhi and others to three centres
only. Minor changes have also been made in the groupings. Now
there are four main groups as follows :
1. Government and semi-government securities with three sub-groups.
2. Debentures of industrial concerns with eight sub-groups.
3. Preference shares with nine sub-groups.
4. Variable dividend securities with five sub-groups which are
further divided in eighteen still smaller groups.
Weights of industrial securities are now proportional to the market
value of shares during the base period instead of being proportional to
the paid-up capital. It was thought that market values correctly
represent the relative importance of different industries and since this
is the practice in U.S.A., France, Norway, Finland, New Zealand, etc.
it was adopted in our country also. There has been a slight change in
the technique of the construction of index numbers also and it is that
the chain index numbers of sub-groups are now weighted instead of
weighting the relatives.
SECTION 6
WAGE STATISTICS
Statistics of wages in India have not been collected at regular
intervals either by official or non-official committees or commissions
or by any government department under statutory sanction as has been
done in other countries of the world. This state of affairs was
emphatically criticised by the Royal Commission on Labour which pleaded
for better and more statistics about wage level in various industries in
different centres. Agricultural wages present another problem and the
data available about them are extremely scanty and defective. It is
supposed that the condition of agricultural labourers is worse than that
of the industrial labourers and it is necessary to solve the problems of
agricultural labour and to study the various aspects of the agricultural
economy from the statistical point of view.
In the matter of collecting statistics of wages with regard to
industrial labourers the state of Bombay had taken the lead and had
conducted comprehensive surveys to enquire into the wage levels of certain
industries. Similar surveys had been conducted in Bihar and some
other provinces. Certain statistics were collected under the Payment
of Wages Act of 1936. The Report of the Labour Investigation Committee
appointed by the Government of India (popularly known as
Rege Committee) also contained statistics of wages relating to certain
industrial centres of the country. Some earlier statistics relating to
wages are available in the publication Prices and Wages. This publication
used to give the results of wages census in respect of some urban
and rural occupations. Its publication was suspended long ago. These
figures are extremely defective. The rates have been quoted between
very wide ranges and the frequency of employment is not given. Moreover
the unit of time for which wages have been recorded is not uniform.
At present, so far as statistics of industrial wages are concerned, they
are available in the Annual Reports of the Chief Inspector of Mines and the
Annual Reports of the Working of Factories Act. The Annual Reports
of the Working of the Trade Unions Act and the Workmen's Compensation
Act as also the Employees' State Insurance Act also contain certain
statistics of wages. The Labour Gazettes of the various States and certain
publications brought out by the Labour Bureau also contain statistics
of industrial wages. Besides these the Annual Census of Manufactures
also publishes statistics about the level of wages in different industries
of the country. Indian Labour Gazette published by the Labour Bureau
contains statistics about the employment in factories, employment
exchange statistics, wages and earnings, minimum wages in certain
industries and average weekly earnings of certain types of labour.
Wage statistics are obtained from (i) the Labour Bureau, (ii) CM,
(iii) SSMI and (iv) ASI (from 1959 onwards).
Labour Bureau data of wages in manufacturing industries
The Labour Bureau publishes statistics relating to per capita
average annual earning collected under the Payment of Wages Act
1936. Under this the statistical authority in various States only collects
the returns from individual factories as defined in section 2(m) of the
Factories Act of 1948 and sends them to the Labour Bureau for
consideration, processing and publication.
Apart from the above mentioned statistics relating to manufacturing
industries the Labour Bureau also publishes statistics relating to :
(a) Per capita average annual earnings collected under the Mines
Act (by the Chief Inspector of Mines) and Index Numbers of Nominal
Earnings of Employees in different Mining Industries (Base December
1951 = 100).
(b) Earnings of workers in plantation industries.
(c) Average annual earnings of certain types of staff working in
Government Railways, Docks, and some ad hoc figures relating to
earnings of workers in nationalised motor transport.
(d) Wages of working journalists.
(e) Minimum Wages fixed or revised under the Minimum Wages
Act of 1948.
(f) Average Wages of casual agricultural labour.
CM, SSMI and ASI Wages Statistics
The CM contained the following wage statistics (upto 1958)
relating to 'workers' and 'other employees' of the manufacturing units
covered by it :
(i) Total salaries and wages paid in cash during the year less fines
and deductions for absence or damage or loss.
(ii) Total money value of any privilege or benefits or contribution
not paid in cash but which was capable of being estimated in terms
of money and which accrued to individual employees and not to a group
of employees.
All employees not covered by 'workers' were classed in the second
category (of persons other than workers).
Under SSMI also similar figures were collected on a random
sample basis for a large number of industries.
Under the ASI workers are classified as skilled, semi-skilled and
unskilled and separate figures of wages are being collected for each
category. Wage statistics collected under ASI are of two types namely
(i) For part I of the form which is applicable to certain types of
factories the term wages includes all contractual payments and excludes
all ex-gratia payments like profit sharing bonus etc.
From 1st January, 1957 our country adopted a new trade classification
which is much better, more exhaustive and more scientific than the
earlier one. The classification of goods under the new system is more
logical and the following are the main categories under
the new scheme :-
(i) Food.
(ii) Beverages and tobacco.
(iii) Crude materials, inedible except fuels.
(iv) Mineral fuels.
(v) Animal and vegetable oil and fats.
(vi) Chemicals.
(vii) Manufactured goods.
(viii) Machinery and transport equipment.
(ix) Miscellaneous manufactured articles.
Each of the above nine classes is further sub-divided into a number
of smaller sub-divisions. Due to the adoption of this new classification,
figures relating to the period prior to 1957 are not strictly comparable
with those of later years.
With regard to each item of export and import, figures are available
not only of value but also of quantity exported or imported. Figures
of quantity represent the net weight exclusive of packing etc.
Value of the goods imported or exported was formerly based on the
wholesale cash price, less trade discount, for which like goods could be
sold at the time and place of import or export as the case may be. In
case of export no deductions were made, but in case of import
deductions were made from the value thus calculated for import duty
payable thereon. If it was difficult to ascertain the wholesale cash
price, the value generally represented the cost at which goods of like
nature and quality could be delivered at such place. However, if the
goods were subject to duty on tariff valuation, this value was taken as
the correct value even though there might be a great difference between
this valuation and the actual value.
With effect from April, 1951 the basis of valuation has been changed.
Now the export values conform strictly to f.o.b. basis inclusive of
export duty. Imports are now valued at c.i.f. basis.
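The two bases of valuation can be illustrated with a minimal Python sketch. It assumes that the f.o.b. export value is the price of the goods placed on board plus export duty, and that the c.i.f. import value is cost plus insurance plus freight. The function names and the rupee amounts are hypothetical and serve only to show the arithmetic.

```python
def export_value_fob(price_on_board, export_duty):
    """Export value on an f.o.b. basis, inclusive of export duty."""
    return price_on_board + export_duty


def import_value_cif(cost, insurance, freight):
    """Import value on a c.i.f. basis: cost + insurance + freight."""
    return cost + insurance + freight


# Hypothetical consignments, values in rupees.
export_value = export_value_fob(price_on_board=1000, export_duty=50)
import_value = import_value_cif(cost=900, insurance=20, freight=80)
print(export_value, import_value)
```

Because exports exclude freight and insurance while imports include them, the two series are not valued on a like footing.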
In making use of these figures the following things should be
remembered :-
(i) That these are based on declarations of importers and
exporters in the Bills of Entry and Shipping Bills
respectively, which are accepted practically without question.
The policy of the Government has been that
the goods should not be detained on account of mis-statements
affecting 'statistics only' (not revenue) and
therefore to a very great extent the accuracy of these figures
figures are available about the trade in merchandise and treasure.
Figures of movements of selected articles by rail and river between
these blocks and port towns are published by the Department of
Commercial Intelligence and Statistics. The registration of the
rail-borne trade is done by the Railway Audit Office. This office
registers the goods carried for delivery to consignees on their own
lines and in some cases for delivery on connected lines. Trade carried
on between the stations in the same block is not registered. The
required information is collected from the invoices, which show the
details of the goods, that is, place of destination, the nature of the
goods and their gross weight. A certain percentage, which varies
according to the class of goods, is taken to represent the weight of
packing material and it is deducted from the gross weight to arrive at
the net figure. In tables only the net weight is recorded.
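The deduction for packing described above can be sketched in Python as follows. The function name and the 5 per cent allowance are assumptions made for illustration; the text does not state the actual percentages applied to each class of goods.

```python
def net_weight(gross_weight, packing_percent):
    """Deduct the assumed packing allowance from the invoiced gross weight."""
    if not 0 <= packing_percent < 100:
        raise ValueError("packing_percent must lie between 0 and 100")
    return gross_weight * (100 - packing_percent) / 100


# Hypothetical consignment: 1000 units gross, 5 per cent packing allowance.
print(net_weight(1000, 5))
```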
The river-borne trade formerly used to represent the trade carried
by inland steamers as well as country boats. Later on it was thought
desirable to delete the figures relating to the trade carried through
country boats. The collection of figures relating to the trade carried
on through country boats required elaborate arrangements and even
then the quality of statistics obtained was highly unsatisfactory. At
present the river-borne trade represents only the trade carried by
inland steamers between various blocks. The trade carried by steamers
is registered by steamer agents. The trade partly carried by rail and
partly by river, when booked through and carried by steamers running
in connection with railways, is generally recorded by the Railway and
is treated as rail-borne trade.
These statistics relate only to the quantities and not the value of
goods transported.
9. Raw Cotton Trade Statistics
This publication deals with the imports and exports of raw cotton
from one State to another. It is a monthly publication and it gives
figures of the quantities moved from one block to another. The
figures relate to almost all the varieties of cotton, and intra-block
movement is excluded.
10. Review of Trade of India
This publication was formerly brought out by the Department
of Commercial Intelligence and Statistics but now it is published by
the Office of the Economic Adviser under the Ministry of Commerce
and Industry. This publication gives a review of India's trade, both
foreign as well as inland, and it contains useful tables relating to the
foreign trade, coastal trade, inland trade, balance of trade, etc.
11. Statistical Abstract of India
Statistical Abstract of India contains important tables relating
both to the foreign as well as inland trade of India.
1. Railways
Railway statistics are compiled and published in our country by
the Railway Board. Formerly they were available in the "Annual
Report of the Railway Board on Indian Railways". Now besides this annual
report, the Railway Board also publishes "Monthly Railway Statistics"
and this publication contains all relevant statistics relating to the
working of Indian Railways. Some of the data contained in this publication
are also reproduced in the Monthly Abstract of Statistics. Important
data contained in these publications relate to the following:
(i) Passenger Traffic and Earnings.
(ii) Freight Traffic and Earnings.
(iii) Wagons loaded.
(iv) Labour employed and wages paid.
The figures relating to passenger traffic and earnings are given
separately for each class of traffic and for each railway. So far as freight
traffic is concerned, figures are classified commodity-wise and figures of
total tonnage are also available. Statistics relating to passenger miles
and ton miles are also published and the density of traffic, as measured
by the number of passenger miles and ton miles to the length of running
track, is also calculated.
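The density measure mentioned above is simply a ratio of traffic to track length. A minimal Python sketch follows; the function name and the sample figures are hypothetical.

```python
def traffic_density(traffic_miles, running_track_miles):
    """Passenger miles (or ton miles) per mile of running track."""
    if running_track_miles <= 0:
        raise ValueError("running_track_miles must be positive")
    return traffic_miles / running_track_miles


# Hypothetical railway: 1,200,000 passenger miles over 400 miles of running track.
print(traffic_density(1_200_000, 400))
```

The same function serves for freight if ton miles are passed instead of passenger miles.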
Figures of mileage published relate to both route mileage and track
mileage. In route mileage, double or triple lines are counted only once,
but in calculating the track mileage they are counted the proper number
of times and, besides this, the transportation and commercial sidings
are also taken into account.
In addition to these, statistics relating to capital outlay and earnings
of railways are also available. These figures include total capital at
charge, gross earnings, working expenses, net earnings, percentage
of working expenses to gross earnings and percentage of net earnings
to total capital at charge etc.
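The two percentages named above can be worked out as in the following Python sketch. It assumes that net earnings equal gross earnings less working expenses; the function name, the dictionary keys and the sample amounts are hypothetical.

```python
def railway_ratios(gross_earnings, working_expenses, capital_at_charge):
    """Return net earnings and the two percentages described in the text."""
    # Assumption: net earnings = gross earnings - working expenses.
    net_earnings = gross_earnings - working_expenses
    return {
        "net_earnings": net_earnings,
        # Percentage of working expenses to gross earnings.
        "working_expenses_pct": 100 * working_expenses / gross_earnings,
        # Percentage of net earnings to total capital at charge.
        "net_earnings_to_capital_pct": 100 * net_earnings / capital_at_charge,
    }


# Hypothetical figures in crores of rupees.
ratios = railway_ratios(gross_earnings=500, working_expenses=400,
                        capital_at_charge=2000)
print(ratios)
```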
Details regarding number of locomotives and wagons of various
kinds on the last day of the official year, average load per train, and
repairs etc., are also available.
The available railway statistics published by the Railway Board
are open to criticism. As early as 1937 the Indian Railway Enquiry
Committee criticised the then existing railway statistics and offered
valuable suggestions for improvement. Some of these suggestions
have been put in operation and the quality of statistics has considerably
improved since then. However, even now some of these published
statistics are misleading. Statistics relating to the working of railways
include figures of coaching earnings per train mile, cost of hauling a
passenger train one mile, cost of hauling goods unit (1 ton) one mile,
profits on working a passenger train one mile etc. Such cost statistics,
which are very technical in nature, do not mean what they purport to
mean. The expert critic can make no use of them and they lead the
Shipping
The statistics relating to shipping in our country are available in
the following publications:
(i) Monthly Abstract of Statistics.
(ii) Accounts Relating to Foreign Trade and Navigation of
India.
(iii) Accounts relating to the Coasting Trade of India (Monthly).
The data published relate to :
(i) Tonnage of vessels entered and cleared with cargo
(a) in foreign trade.
(b) in coasting trade.
(ii) Nationality of vessels.
(iii) Country from which and to which ships arrive and go.
(iv) Ports into or from which ships entered or cleared.
(v) Number and tonnage of ships built in various parts
of the country.
(vi) Number and tonnage of vessels registered in India.
(vii) Loss of tonnage and number of lives lost.
(viii) Number of passengers carried on long and short voyages.
(When a ship is continuously out of port for 120 hours or more it
is a long voyage, otherwise a short one. Long voyage figures are given
under two heads according as the destination is in India or outside India.
Short voyage figures are classified according as the destination is within
the same state, beyond the state but in India, or outside India.)
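The voyage classification in the note above can be expressed as a small rule. The Python sketch below is only an illustration; the function name and the destination codes ("same_state", "other_state", "abroad") are hypothetical labels, not terms used in the published tables.

```python
LONG_VOYAGE_HOURS = 120  # continuous hours out of port, as stated in the text

SHORT_HEADS = {
    "same_state": "within the same state",
    "other_state": "beyond the state but in India",
    "abroad": "outside India",
}


def classify_voyage(hours_out_of_port, destination):
    """Return (voyage type, head under which the figure is classified)."""
    if destination not in SHORT_HEADS:
        raise ValueError("unknown destination code: %r" % destination)
    if hours_out_of_port >= LONG_VOYAGE_HOURS:
        # Long voyages are shown under two heads only.
        head = "outside India" if destination == "abroad" else "in India"
        return ("long", head)
    return ("short", SHORT_HEADS[destination])


print(classify_voyage(120, "other_state"))
print(classify_voyage(48, "same_state"))
```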
5. Civil Air Transport
The statistics of Civil Air Transport are published by the Director
General of Civil Aviation, Ministry of Transport and Communications,
Government of India. The data are contained in the following
publications:
(i) Monthly News Letter of Civil Aviation.
(ii) Monthly Abstract of Statistics.
The figures relate to the following:
(i) Passenger miles flown.
(ii) Revenue and non-revenue load.
(iii) Passengers carried.
(iv) Miles flown.
(v) Hours flown.
Separate figures are available for internal and international
services.
6. Communications
Statistics relating to Post Offices, Telegraph Offices and Telephones
are compiled in our country by the Director General of Posts
and Telegraphs in the Ministry of Transport and Communications.
They are published in the Monthly Abstract of Statistics and the annual
figures are available in the Annual Report of the Posts and Telegraphs
Department.
The following statistics are available in these publications :
(a) Post Offices
(i) Number of post offices and letter boxes.
(ii) Number of postal articles handled.
(iii) Mileage over which mails are carried.
(iv) Total number and amount of money orders.
(v) Post Office Savings Bank and cash certificates.
(vi) Postal Insurance Fund.
(vii) Dead Letter Office: number of letters dealt with.
(viii) Capital expenditure, receipts and charges.
(b) Telegraphs
The following information is published :
(i) Telegraph lines.
(ii) Number of telegrams handled and their receipts.
(iii) Number of telegraph offices.
Besides the above information, statistics relating to technical
details are also provided in the report.
(c) Telephones
The following information is available about the telephones in
the annual report of the Posts and Telegraphs Department :
(i) Number of exchanges.
(ii) Total lines and total telephones connected.
(iii) Number of departmental non-exchange telephones.
(iv) Total line and wire mileage.
(v) Revenue earned from rents and telephone call fees.
(vi) Number of licensed telephone companies, exchanges, sub-exchanges
and telephones.
(vii) Strength of the staff employed in the various branches of the
department and statements relating to the receipts and expenditure of
the departments.
by the Reserve Bank of India. These statistics are collected from
exchange control records, which suffer from certain defects in coverage,
classification and timing. In regard to coverage the glaring defect
relates to barter transactions, which the exchange control does not cover.
Then there are other transactions which are omitted, for instance,
investments in kind, earnings ploughed back by non-resident companies,
short-term trade credit etc. The gap in the private transactions is larger
than in the official transactions because the latter have been covered
through other sources.
So far as the classification of the transactions is concerned, the
results of the exchange control records are far from satisfactory.
According to the booklet issued by the Reserve Bank of India the
reasons for the unsatisfactory classification are (a) incomplete classification
of receipts, (b) recording of the transactions on a net basis and not a gross
basis, and (c) splitting of the same transaction under different categories.
Classification of receipts is incomplete mainly due to the rule under
which the purpose of the transaction need not be disclosed if the individual
amount is not above Rs. 25,000/-. Such transactions have roughly
been estimated to be to the tune of Rs. 50 crores per year. This defect
has been partly removed with the help of two surveys: the travel survey
conducted on a continuous basis since 1952 and the survey of unclassified
receipts relating to the quarter July-September 1955 conducted by the
Reserve Bank of India.
The transportation transactions are good examples of the
remaining two defects mentioned above. Exchange control records
the individual merchandise transaction exactly in the same way as the
transactions are reported to the exchange control, which may be either
f.o.b., c.f. or c.i.f. Thus one cannot get a true picture of this item.
Exchange control records only such transactions under this head which
are not included in the merchandise data. Exchange control statistics
do not enter satisfactorily the residual transport transactions included
under the item "transportation." They include certain transactions
which must be excluded altogether and which are often entered on a net
basis, receipts and payments offsetting each other.
The export figures of the data are recorded neither on shipment
basis nor on payment basis. They are recorded when the documents
are received, which may occur within varying periods from the date of
shipment. In a majority of cases, however, the figures are entered in
the same month in which the shipment takes place. In case of imports
the figures more or less correspond to the time when the exchange is
released to the non-residents, except where the gaps in the statistics are
filled from independent sources (for example, data relating to foreign
aid). The releases of exchange may however not coincide with the time
when the goods arrive in India and to that extent the statistics
presented are not fully accurate. With regard to other transactions the
figures correspond to the time of transfer of funds to and from India.
Although the source of the statistics of India's balance of
payments, that is, exchange control records, suffers from the above-mentioned
defects, there seems to be at present no way out except to accept them as
the final choice. Although the Reserve Bank of India is making efforts
to improve the quality of these statistics by conducting surveys from
time to time, we cannot as yet say that our balance of payments
statistics are fully dependable.
Apart from the data available from the Reserve Bank about India's
balance of payments, certain statistics relating to merchandise imports
and exports (based on customs data) are published by the Director
General of Commercial Intelligence and Statistics. However, figures of trade
in merchandise as derived from customs and the figures published by
the Reserve Bank, which are derived from the Exchange Control
Department, vary considerably because of differences in timing, coverage
and basis of valuation. The customs data relate to the physical movement
of the commodities whereas the documents of the Exchange Control
Department relate to actual "payments" rather than to "accruals".
The major cause of discrepancy between the two sets of figures lies
in the different basis of valuation adopted. The data based on customs
returns are more useful for a study of balance of trade than of
the balance of payments, and as such one has to rely almost exclusively
on the Reserve Bank data so far as balance of payments is concerned.
It can be reasonably hoped that as a result of the efforts which are being
made by the Reserve Bank of India the balance of payments statistics
of our country would improve in future.
SECTION 9
NATIONAL INCOME STATISTICS
Importance
National income statistics along with their various breakdowns are
the most important statistical measures of the economic activity of a
nation and are very useful in analysing current economic conditions. If
these statistics are available in the shape of a time series their
importance increases considerably, for then it is possible not only to
describe a nation's economy at a particular time but also to compare
the changes in it at different periods. National income statistics thus
throw light on the relative importance of the various sections of a
nation's economy. The contribution of the various component parts of a
country's economy towards the national income and the statistics of per
capita income in different sectors indicate their relative importance.
A balanced economic development of a country is not possible in the
absence of such facts and figures. The creation of the United Nations
Organization and other international institutions has opened an entirely
new avenue for national income statistics. These statistics play an
important role in the field of international economic relations and are
necessary for international comparisons of the burden of taxation, war
efforts, effects of war and similar other things. The problem of the
development of economically backward countries and the progress made by
them is always studied in the background of national income and per
capita income statistics.
Methods of Calculation
Products Method. The concept of national income can be discussed
from three sides, viz., production, income and expenditure. From the
side of production, national income can be said to represent the
national product or the total of the net values added in a particular period
in all branches of economic activity (including services). Net income
from abroad is also included in the total. If the valuation of the
output of goods and services is done at market prices the national
income is said to be national income at market prices, and if the
valuation is done at prices which are equal to the payments received by
the various factors of production only, national income is said to be
national income at factor cost. In the latter case (national income at
factor cost) the net values of output are calculated net of indirect
taxes but subsidies are included, so that the price is just equal to the
payments made to various factors of production. Another point that
should be noted in this connection is that the output must be the result
of the labour and capital invested by the residents of the country only.
The term residents is in itself ambiguous and many difficulties arise
when, say, the permanent residence of a person is in one country, his place of
work in a second country and the location of employer in a third
country.
There is no uniformity in the procedure followed in this respect
by various countries.
Incomes Method. From the side of income the term national
income refers to the total of the distributive shares. In other words,
it is equal to the aggregate of the payments accrued to various factors
of production in a particular period in the shape of wages and salaries,
interest, rent, profits, etc. National income according to this method
is calculated by totalling the income of all the residents of a country
during a specified period. Here again the term income is rather
ambiguous. Gross income is not difficult to calculate but there is
a diversity of opinion with regard to the items which should be
deducted from the figure of gross income to arrive at net income.
Expenditure Method. From the side of expenditure the term
national income is equivalent to net national expenditure. In other
words, it is equal to the total expenditure on final consumption goods
plus investments (both domestic and foreign) and hoardings, if any.
In this field also various terms like expenditure, final consumption
goods, savings, hoardings and investment have different interpretations.
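The three approaches described above can be sketched in Python with invented figures. All names and amounts below are hypothetical, and they have been chosen so that the three totals agree; in practice the totals agree only when every item is defined and treated alike under each method.

```python
def income_by_products(net_value_added, net_income_from_abroad):
    """Products method: net values added in all branches plus net income from abroad."""
    return sum(net_value_added.values()) + net_income_from_abroad


def income_by_incomes(factor_payments):
    """Incomes method: total of the distributive shares."""
    return sum(factor_payments.values())


def income_by_expenditure(final_consumption, domestic_investment,
                          foreign_investment, hoardings=0):
    """Expenditure method: final consumption plus investments and hoardings."""
    return final_consumption + domestic_investment + foreign_investment + hoardings


# Invented figures, in crores of rupees.
by_products = income_by_products(
    {"agriculture": 50, "industry": 30, "services": 18},
    net_income_from_abroad=2,
)
by_incomes = income_by_incomes(
    {"wages_and_salaries": 60, "interest": 10, "rent": 10, "profits": 20}
)
by_expenditure = income_by_expenditure(85, 10, 3, hoardings=2)
print(by_products, by_incomes, by_expenditure)
```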
Difficulties in the calculation of India's national income
After dealing briefly with the importance of national income
statistics and the concept of the same, we now propose to deal with the
special difficulties that are felt in the calculation of the national income
of India. Ordinarily there are difficulties in the calculation of national
income in all countries but in our country the difficulties are of a peculiar
nature. Broadly speaking we can divide the difficulties into two main
categories, viz., those which arise due to inadequacy and inaccuracy
of statistical data and secondly those which arise due to the peculiar
character of our economy.
Data Inadequate and Inaccurate. Statistical data relating to output
and cost of agricultural and industrial production are hopelessly
inadequate in our country. We have, no doubt, some statistics with regard
to output of agricultural commodities but this information is also
inadequate and incomplete. Data regarding the production of milk,
meat, vegetables, fruits, etc., do not exist, and on the whole, our food
statistics are unsatisfactory. In the field of industrial production, the
situation has improved only very recently, when the Census of
Manufacturing Industries Rules were passed. We have now fairly good
statistics of the output of various large scale industries of the country.
However, at present, we are collecting statistics of only 29 industries
out of the 63 industries that we have. It is necessary to collect
statistics with regard to the remaining industries also. Cottage and small
scale industries present another sphere where few statistics exist. These
these totals, the income from house property and other miscellaneous
items which could not be specifically classified was added. From this
gross income, money values of goods and services consumed in the
course of production and net increase in country's foreign indebtedness
were deducted and the national income figure was thus arrived at.
Dr. Rao's method of national income calculation was the best
under the conditions and circumstances in which the estimate was made.
His results are considered to be much better than those of his
predecessors because of the many ad hoc enquiries conducted by him to
gather information about many problems on which no statistics existed.
Scheme suggested by Messrs. Bowley and Robertson
In November 1933 the Government of India invited Dr. A. L.
Bowley and Mr. D. H. Robertson to examine the data available for
estimating the national income and wealth of India and to make
suggestions for their improvement. They submitted their report in 1934 and
were of opinion that the then available statistics of the country were very
scanty and highly defective and unfit for national income estimation.
They suggested a scheme for the estimation of India's national income,
the outline of which is discussed here. They defined national income
as "the money measure of the aggregate of goods and services accruing
to the inhabitants of a country during a year, including net increments
to, or excluding net decrements from, their individual or collective
wealth." The authors discussed both the Census of Products and Census
of Incomes Methods in detail and recommended a cautious combination of
both of them for estimating the total income of the country. The scheme
suggested by them was primarily based on the Census of Products method,
though a small part of the scheme relating to urban areas was dependent
on the collection of income figures. They were in favour of
distinguishing the rural income from the urban income as, in their
opinion, the nature of products and the methods of investigation
suitable for rural and urban areas were different from each other.
All these three definitions or concepts would give the same figure
of national income provided the different items are treated in a like
manner in all the cases. There is, however, a great diversity of
opinion with regard to the meaning of various terms used by different
countries and as such international comparisons should be made with
a great amount of caution. It is not possible, in a short space, to
discuss the various items over which differences of opinion exist, but
the most important of them are the non-monetary items like unpaid
services of housewives, services of durable consumer's goods like
*Based on 'National Income Statistics', Statistical Office of the United Nations.
1. Net rent, interest and dividends accruing to (a) persons, (b) social security
funds, pension funds, funds of life insurance companies, and non-profit institutions.
Income received from abroad is included, while income paid abroad is excluded.
&. Includes all expenditures on noncapital goods and services.
furniture, etc., rental value of owner occupied houses, farmer's con-
sumption of his own product and payments in kind. As a general rule
national income is confined to those items only which have a monetary
value but a better estimate of national income would certainly be one
which includes all types of goods and services. These items are differ-
ently treated by different countries like many other items (including
international transactions and capital gains and losses) about which
opinions are divided. Generally, however, most of the countries in-
clude services in the calculation of national income though no country
in the world includes all types of services. Unpaid services of house-
wives are probably nowhere taken into account in the calculation of
national income.
Now-a-days many countries calculate the national income by using
all the three methods separately or in a combined form. United States,
United Kingdom, Australia, France, Sweden and some other countries
calculate national income separately by the three methods and get fairly
consistent results.
National Income Committee. In August, 1949 the Government of
India appointed a committee to calculate the national income of India
and its report has been published. The committee has calculated the
national income for a number of years and the method that it has
followed in its calculations is broadly on the same lines on which Dr. Rao
made his estimates. The method followed by the committee is as
follows :-
The committee has also combined the Inventory and the Incomes
Methods like its many predecessors in the line. First of all the
committee has estimated the working force for the year 1948-49 and its
distribution in various occupations. Occupational classification is on
the basis of the classification of the economy by industry (including
agriculture, services, etc.). The Inventory Method or the Census of
Products Method has been applied in the sphere of agriculture, forestry
and animal husbandry, hunting, fishing, mining and industry. The
Incomes Method has been applied in the field of Transport, Trade, Public
Force, Public Administration, Professional and Liberal Arts and
Domestic Services. In some cases like Professional and Liberal Arts,
statistics of consumer's expenditure have also been utilised for the purpose
of income calculation. In urban areas the income from house property
is estimated from the municipal records and in the rural areas the
income is estimated on the basis of an average rate of return based on
the estimated values of houses. After finding out the net income from
various sources in the above manner the committee has made
adjustments for earned income from abroad and has thus finally arrived at
the figure of national income for the year 1948-49.
Comparison of Dr. Rao's and the N.I.C. Methods
Though the methods followed by Dr. Rao and the National
Income Committee appear to be similar in principle yet there are many
identical with the national product because the value of the production in
a nation will be the same as the payment made to the factors which
contributed towards this production. Hence, national income will
be equal to the value of all products at factor cost. The value of
production may be measured in terms of factor cost, i.e., payment
to the factors of production; or at market price, i.e., the price which
users paid for these products. So we learn that :
1. National income is the same as net national product at factor
cost.
2. Net national product at factor cost plus depreciation is
equal to the gross national product at factor cost.
3. Gross national product at factor cost plus indirect taxes
(net) is equal to the gross national product at market price.
This is popularly mentioned as the GNP.
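The three relations listed above can be chained together as in the following Python sketch. The function name, the dictionary keys and the sample amounts are hypothetical; only the identities themselves come from the text.

```python
def national_product_aggregates(national_income, depreciation, net_indirect_taxes):
    """Build the aggregates step by step from national income."""
    # 1. National income is the net national product at factor cost.
    nnp_factor_cost = national_income
    # 2. Adding depreciation gives gross national product at factor cost.
    gnp_factor_cost = nnp_factor_cost + depreciation
    # 3. Adding indirect taxes (net) gives gross national product at market price.
    gnp_market_price = gnp_factor_cost + net_indirect_taxes
    return {
        "nnp_factor_cost": nnp_factor_cost,
        "gnp_factor_cost": gnp_factor_cost,
        "gnp_market_price": gnp_market_price,
    }


# Invented figures, in crores of rupees.
aggregates = national_product_aggregates(
    national_income=900, depreciation=60, net_indirect_taxes=40
)
print(aggregates)
```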
The national income and product figures are aggregates and reveal
the totals of income and product. They are useful but unable to
reflect the basic structure of the economy. The reason is that these
statistics do not throw light on the constituents of these totals.
Moreover, the totals of income and product generated in the country
do not show the economic activity which resulted in these totals of
income and product. The economic activity is not an isolated and
stagnant performance. It is a network of flows. It is continuous,
interrelated human effort passing, shifting and transferring income and
products from one activity to another, from production to consumption
and accumulation. The sum total of all these activities makes up the
national economic activity.
National accounts have been developed to present a more faithful
and finished picture of the national economic activity. They seek
to attempt an orderly understanding of the structure of the economy
and the flow of economic activities in the country. The national accounts
technique tries to bring forward what the nation has been doing to
obtain the income and product totals. In understanding the nature
of national accounts, three things must be clearly made out:
1. national accounts or statements record the transactions (or
activities) in the areas of production, consumption, accumulation and
foreign transactions by sectors.
2. sectors are the institutional subdivisions of the economy,
namely, ownership, form of organization, industry etc.
3. the transactions are presented in the standard design developed
by the science of accountancy. The object is to arrange and structure
the transactions in such a way that they reflect the inflows and outflows
generated by the activities.
We find that national income is the preparation and presentation
of the totals of income and products generated in the country. This
deals with the quantllm and concerns itself with the magnitude.
National accounts on the other hand seeks to detail alit these totals;
and to ,'reate a stmetllre which permits compilation and comparison
of all CCOI/olllie effort which would reflect the economic activity in the
national economy. Then, the concept of social accounting is a pro-
jection of nati'mal accounting approach into the more comprehensive
area. of a structure which is so c:>mprehensive that it covers total socia'
(lctivity so far as they can be guanti1ied. This would try to eliminate
the artificial distinction between the economic and non-economic
activities. Social accounts call for more elaborate structure and are
concerned with comprehensive, orderly consistent p re5entation of the
facts of human endeavour in any society. Once the problems ot
detailed data collection, classification and c(,mputation are solved,
social accounting will truly mirror and measure the national effort and
achievements.
In our country we do not have enough statistics to have a
meaningful set of national and social accounts. However, the C.S.O.
is bringing out a simple set of national accounts for the country. We
need more exhaustive information for the preparation of social
accounts and efforts are being made to fill up the gaps in statistical
data so that such accounts may be prepared for the country.
The National Income Committee attempted to present a sample set
of social accounts of India for the year 1948-49. These accounts were
derived from three kinds of information.
The committee presented four accounts each for the three sectors
of the economy, namely, enterprises, households and Government.
The four accounts presented for each of these three sectors related to
production (operating account), consumption (appropriation account),
addition to wealth (resting account) and the external account.
However, in the final report of the National Income Committee
these accounts were dropped as it was felt that the statistical data
available in the country were not adequate for the presentation of national
accounts. Since then the national account estimates have not been
published by the CSO either. It would be worthwhile to collect
statistics and to present a simple set of national or social accounts because
they reflect in a nutshell the entire economic activities of a country and
give a bird's eye view of the economy as a whole.
SECTION 10
NATIONAL SAMPLE SURVEYS
Beginning. With the attainment of freedom the full impact of the
large gaps in the statistical information available in our country was felt
and an urgent need arose to tone up its quality and quantity. At the
instance of the Prime Minister Pt. Jawaharlal Nehru, Prof. P. C.
Mahalanobis prepared an abstract scheme of organizing a National Sample
Survey (NSS) which was approved in principle by the Government of
India in January 1950. A Directorate of National Sample Survey was
set up under the Ministry of Finance to collect economic and social
information from the whole country on the basis of random sampling.
The general approach in the planning of NSS was entrusted to the Indian
Statistical Institute together with the Gokhale Institute of Politics and
Economics, Poona, running under the direction of Prof. D. R. Gadgil.
As the experience and the field of activity of the Calcutta and Poona
Institutes had been very different it was not unnatural that certain
differences of opinion emerged at the early stages. However, as a result
of joint discussion, a plan for the sample design in its more restricted
sense of allocation, and the methods of selection of the sample units,
etc., was evolved.
Meaning. The idea of using the sampling method for collecting
economic data was not new. It can be said to be a projection of the
"Scheme for an Economic Census of India" prepared by Professors
A. E. Bowley and D. H. Robertson in 1934. Sampling is of course
the only possible approach if it is desired to collect data about various
facts relating to agriculture, industry, trade and services, etc. The only
possible method is to send round investigators to collect information
from a comparatively small number of individual households, that is,
from a sample of households, selected in such a way that it would be
possible to estimate, on the basis of the sample, the required information
for the country as a whole. Where the scope of the survey is wide and the
territory to be covered is large, the sample survey is the only practical
means of obtaining information at short intervals. This is especially
true of a rural economy of the size of India covering a vast area of 12.6
lakhs of square miles.
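
The idea of estimating a figure for the country as a whole from a
sample of households can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: it assumes a simple
random sample with equal probabilities, and the function name
`estimate_population_total` and the made-up household figures are
hypothetical, not part of the NSS procedure described in the text.

```python
import random
from statistics import mean


def estimate_population_total(sample_values, population_size):
    """Expansion estimate: sample mean multiplied by the number of
    units (e.g. households) in the whole population."""
    return mean(sample_values) * population_size


# Hypothetical population of 10,000 households with made-up monthly
# expenditure figures, used only to check the idea.
random.seed(1)
households = [random.randint(50, 150) for _ in range(10_000)]

# Simple random sample of 400 households, drawn without replacement.
sample = random.sample(households, 400)

estimate = estimate_population_total(sample, len(households))
true_total = sum(households)
relative_error = abs(estimate - true_total) / true_total
```

With a sample of only 4 per cent of the households the estimate would
normally fall close to the true total; how close is what the margin of
error of the sample measures.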
Method. For administrative purposes India was till recently
divided into 29 states (some of which are very small), about 300 districts,
about 2500 tahsils (or equivalent units), 3000 towns, and about 5,86,000
villages in round figures. The total area of India is 1.26 million square
miles broken up into some hundreds of millions of "plots" of land, and
at the 1951 census the population of India was 36 crores (7 crore households)
of whom 6 crores in round figures live in towns and 30 crores in
villages. Turning now to the livelihood pattern we find that less than
one-third of the people are self-supporting and of these again, less than
a third derive their income principally from non-agricultural sources.
As regards the sampling design, it can be stated very briefly that
the rural area of the country was divided into about 250 geographical
strata from which about a thousand sample villages were selected. In
the first three rounds the villages were directly selected within a stratum
but in subsequent rounds two tahsils were selected in each stratum and
two villages were selected within each sample tahsil (an administrative
unit of about 500 square miles on an average). Finally a sample of
households was taken up for the household enquiry and a sample of
clusters of plots for the land utilization survey. From the third round
the survey was extended to urban areas with a stratification of towns
by size.
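
The two-stage selection described above (two tahsils per stratum, two
villages per sample tahsil) can be sketched as follows. This is a
minimal sketch, not the actual NSS procedure: the text does not state
the selection probabilities, so equal-probability selection is assumed,
and the function name, the frame layout and all place names are
hypothetical.

```python
import random


def select_sample_villages(strata, tahsils_per_stratum=2,
                           villages_per_tahsil=2, seed=None):
    """Two-stage selection within each stratum.

    strata maps a stratum name to a dict that maps each tahsil name
    to the list of villages in that tahsil.
    Returns a list of (stratum, tahsil, village) tuples.
    """
    rng = random.Random(seed)
    selected = []
    for stratum, tahsils in strata.items():
        # First stage: choose tahsils within the stratum.
        chosen_tahsils = rng.sample(sorted(tahsils), tahsils_per_stratum)
        for tahsil in chosen_tahsils:
            # Second stage: choose villages within each sample tahsil.
            villages = rng.sample(tahsils[tahsil], villages_per_tahsil)
            selected.extend((stratum, tahsil, v) for v in villages)
    return selected


# Hypothetical frame: 250 strata, each with 10 tahsils of 20 villages.
frame = {
    f"stratum-{s}": {
        f"tahsil-{s}-{t}": [f"village-{s}-{t}-{v}" for v in range(20)]
        for t in range(10)
    }
    for s in range(250)
}

sample_villages = select_sample_villages(frame, seed=0)
```

With 250 strata the design yields 250 x 2 x 2 = 1,000 sample villages,
which agrees with the "about a thousand" villages mentioned in the text.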
A general idea of the nature of information collected is given
below:
(a) Sample villages: general economic information, and weekly
prices of selected commodities; rates of daily wages of skilled and
unskilled workers, etc.;
(b) Households: (i) General particulars: age, sex, marital
status, economic and employment status; births, deaths, etc.; details
regarding holdings, use of land under various categories; livestock, real
assets, loans, savings, housing conditions, etc.;
(ii) Consumer expenditure: on a very large number of items;
(iii) Household enterprises: agriculture and animal husbandry,
acreage and production of different crops; particulars of industry, crafts
and trade, including fixed capital, machinery and tools, fuel, power, raw
materials, quantity and value of production, source of finance, etc.;
(c) Utilization of land: survey of sample of revenue plots;
(d) Crop survey: crop acreage and estimates of the yield of crop
per acre by direct crop-cutting experiments; and
(e) Sample survey of manufacturing establishments (with 10
operatives or more with power, or 20 operatives or more without power):
covering practically all groups of industry over the whole of India.
The statistical work of NSS (including the preparation of the
sample design and schedules, the processing and tabulation of the data
and the preparation of the reports) is being done in the ISI. The first
report was published in December, 1952 and since then a chain of
reports and technical papers have been continuously coming out. In
addition, special surveys have been conducted and reports and data in
tabular form have been supplied from time to time to different agencies,
for example, the Fact Finding Committee of the Ministry of
Rehabilitation (survey of economic condition of refugees of West Bengal); the
Press Commission (habits of newspaper reading); the Taxation Enquiry
Commission (household consumption by expenditure levels); The
broadly the same as in the third round but the design for the rural areas
was completely changed. The fifth round also included industrial
production which covered all household and non-household enterprises
in both rural and urban areas other than those employing a minimum of
10 or more workers with power or 20 or more without power on any
day in the year. The sixth round, which commenced in May, 1953,
sought to collect comprehensive information on a wide range of
subjects: consumer expenditure; other household expenditure;
possessions including land, tanks, agricultural implements, livestock, poultry,
etc.; fertility, births, deaths and diseases; manufacturing establishments,
agriculture and animal husbandry; small scale industry and handicraft;
transport, trade, professions and services, etc. With time and
experience the scope of the surveys became wider and included more
and more subjects of enquiry. The 20th survey (July 1965-June 1966)
included Goa, Daman, Diu and Pondicherry. It now covers almost
all the aspects of the life and living of the people in the country.
In July 1969 the 24th round of the survey started.
Assessment of results. It would be instructive to note here some
of the criteria for the assessment of the results of the NSS. The main
difficulty arises out of the fact that the information on many aspects of
economic conditions in the rural areas of India has been obtained for
the first time. In most cases similar data are not available for the
purpose of direct comparison. There can be three broad approaches which,
of course, are complementary and must be used jointly and
simultaneously. Firstly, the great scientific merit of the sampling method is that in a
properly designed sample survey it is possible to calculate the margin of
error of the results from the sample data themselves. Such calculations
supply a powerful tool to judge the significance of comparisons
based on the NSS estimates. Secondly, in a properly designed sample
survey it is also possible to study (for example, by using an
inter-penetrating network of sub-samples), and sometimes to eliminate, the effect of
non-sampling errors which arise from bias, recording mistakes, and other
disturbing factors operating at the stage of collection of the primary
information. NSS has used this method with considerable success.
Thirdly, special quality checks may be carried out by highly trained
and experienced workers (including senior statisticians) who would
themselves go out to the field and directly collect some of the critical
primary data. The results thus obtained can be used to determine the
reliability of the data collected by the regular investigators. All these
methods are internal in the sense that information would be collected
by the NSS itself in many different ways with a view to improving the
reliability of the results. A second broad approach is to use external
checks by comparing the NSS results with data obtained from entirely
independent sources. External checks are of great value, and it is
intended to include test items in the NSS schedules from time to time
with the deliberate intention of using such information for purposes
of comparison with data obtained from independent sources.
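
The first two internal checks can be illustrated with a short Python
sketch. It is not the NSS computation. It assumes a simple random
sample, ignores the finite-population correction, and uses the usual
normal-approximation multiplier 1.96. The function names and the two
sets of figures are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def margin_of_error(values, z=1.96):
    """Approximate 95 per cent margin of error of the sample mean,
    calculated from the sample data themselves."""
    return z * stdev(values) / math.sqrt(len(values))


def subsample_gap_ratio(sub_a, sub_b):
    """Compare two inter-penetrating sub-samples.

    Returns the gap between the two sub-sample means divided by the
    standard error expected from sampling alone. A ratio well above 2
    suggests that something other than sampling error (for example
    investigator bias) is at work.
    """
    gap = abs(mean(sub_a) - mean(sub_b))
    standard_error = math.sqrt(variance(sub_a) / len(sub_a)
                               + variance(sub_b) / len(sub_b))
    return gap / standard_error


# Hypothetical figures recorded by two parties of investigators who
# each covered an independent sub-sample of the same area.
party_a = [98, 102, 95, 110, 101, 97, 104, 99]
party_b = [109, 115, 104, 120, 112, 108, 117, 111]

error_a = margin_of_error(party_a)
ratio = subsample_gap_ratio(party_a, party_b)
```

Because the two sub-samples are drawn in the same way from the same
area, a large ratio cannot be put down to sampling fluctuation alone,
and this is what makes the method useful for detecting non-sampling
errors.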
Questions
1. Write a brief critical note on the 1951 census of population.
(B. Com., Allahabad, 1952).
2. Discuss the possible value of Census Reports to producers, manufacturers
and businessmen. How can the Indian Census Reports be made more useful to these
people? (M. Com., Allahabad, 1948),
(B. Com., Nagpur, 1945).
3. Discuss the main features of population statistics in India. What
suggestions would you offer to make them more reliable and useful? (M. A., Allahabad, 1951).
4. "The Phoenix system is in fact a financial mistake as well as an intellectual
crime." (Census Commissioner, 1941).
How far do you agree with the above criticism of the Indian census? In what
respect is the system of conducting the 1951 census an improvement over the
1941 census? (M. A., Punjab, September, 1951).
5. What statistical data are necessary for calculation of the net reproduction
rate? What is the deficiency in the existing Indian data in this respect?
(M. A., Allahabad, 1951).
6. "The available agricultural statistics in India are incomplete and inaccurate
in so far as (a) data relating to acreage and production in non-reporting areas are
wanting, (b) information in respect of permanently settled areas is far from
satisfactory, and (c) the level of accuracy of output figures leaves much to be desired".
Comment on this statement, suggesting measures for effecting improvements
in each of these directions. (B. Com., Allahabad, 1949 and 1951).
7. Define a normal yield and describe the official method of determining it.
What do you consider to be the defects of the method and how would you remove
them? (M. A., Raj., 1950).
8. What do you understand by the term "Indian Agricultural Statistics"?
Outline their shortcomings and give concrete suggestions to remedy them.
(M. A., Raj., 1951).
9. Why are agricultural statistics in the temporarily settled areas in India
said to be comparatively more reliable than those in the permanently settled areas?
(M. A., Punjab, April, 1951).
10. Write a lucid note on either the system of crop forecasting in India or
the adequacy and reliability of data available on agricultural prices and wages in
India. (M. A., Punjab, September, 1952).
11. What important statistics of food production are available in India? How
are they compiled, and in what official publications are they found?
(M. Com., Allahabad, 1952).
12. What is meant by Census of Production? Why is such a census taken?
How far is the Industrial Statistics Act adequate from the point of view of
holding this census in India? (M. Com., Allahabad, 1947).
13. Write a lucid note on the nature and scope of industrial statistics in
India. (B. Com., Allahabad, 1953).
14. Explain the importance of "Price Statistics", and examine the nature and
scope of the data relating to them available in India. (M. Com., Allahabad, 1947).
15. Which publication would you consult to find (i) the number of co-operative
and land mortgage banks in different Indian States, (ii) the rail-borne inland
trade for the U. P., (iii) the annual absorption of small coins in India, and (iv) the
wheat area irrigated in U. P. and Bihar? (M. A., Allahabad, 19P).
16. What statistics are necessary to keep under review the purchasing
power of a country's currency? Write a critical account of such statistics in
India. (M. A., Agra, 1945).
17. What are the methods usually adopted for measuring the national income
of a country? Which of these in your opinion is suited to Indian conditions?
(M. A., Allahabad, 1950).
18. What is national income? What statistical methods of its estimation are
known to you? Give a lucid account of the method actually adopted in any of the
recent official or non-official estimates framed for India.
(M. A., Punjab, April, 1952).
19. What are the special problems of national income estimation in India?
Describe briefly the various methods followed for the calculation of Indian income.
(M. Com., Allahabad, 1952).
20. Describe briefly the method followed by the National Income Committee
for framing estimates of national income of India 1948-49. How far does this method
differ from the one recommended by the Bowley-Robertson Committee?
(B. Com., Allahabad, 1951).
21. Write notes on:
(a) N.S.S.
(b) C.S.O.
(c) National Income Unit;
(d) Reserve Bank Index of Security Prices;
(e) Consumer Price Index Numbers (Labour Bureau);
(f) Economic Adviser's Index Number of Wholesale Prices (now interim
series).
22. 'Planning without statistics is a ship without rudder and compass.' In
the light of this statement explain the importance of statistics as an effective aid to
national planning in India. (B. Com., Banaras, 1958).
23. In a communique issued by the Government of India the main features of
the 1961 Census have been described as the collection of economic data for planning
purposes, which indicates that the preparations for the huge task are now afoot. In
the light of the above objective, what suggestions can you give for the next population
census of India?
MATHEMATICAL TABLES
LOGARITHMS
                                                            Mean Differences
      0    1    2    3    4    5    6    7    8    9     1  2  3   4  5  6   7  8  9
10  0000 0043 0086 0128 0170 0212 0253 0294 0334 0374    4  8 12  17 21 25  29 33 37
11  0414 0453 0492 0531 0569 0607 0645 0682 0719 0755    4  8 11  15 19 23  26 30 34
12  0792 0828 0864 0899 0934 0969 1004 1038 1072 1106    3  7 10  14 17 21  24 28 31
13  1139 1173 1206 1239 1271 1303 1335 1367 1399 1430    3  6 10  13 16 19  23 26 29
14  1461 1492 1523 1553 1584 1614 1644 1673 1703 1732    3  6  9  12 15 18  21 24 27
15  1761 1790 1818 1847 1875 1903 1931 1959 1987 2014    3  6  8  11 14 17  20 22 25
16  2041 2068 2095 2122 2148 2175 2201 2227 2253 2279    3  5  8  11 13 16  18 21 24
17  2304 2330 2355 2380 2405 2430 2455 2480 2504 2529    2  5  7  10 12 15  17 20 22
18  2553 2577 2601 2625 2648 2672 2695 2718 2742 2765    2  5  7   9 12 14  16 19 21
19  2788 2810 2833 2856 2878 2900 2923 2945 2967 2989    2  4  7   9 11 13  16 18 20
20  3010 3032 3054 3075 3096 3118 3139 3160 3181 3201    2  4  6   8 11 13  15 17 19
21  3222 3243 3263 3284 3304 3324 3345 3365 3385 3404    2  4  6   8 10 12  14 16 18
22  3424 3444 3464 3483 3502 3522 3541 3560 3579 3598    2  4  6   8 10 12  14 15 17
23  3617 3636 3655 3674 3692 3711 3729 3747 3766 3784    2  4  6   7  9 11  13 15 17
24  3802 3820 3838 3856 3874 3892 3909 3927 3945 3962    2  4  5   7  9 11  12 14 16
25  3979 3997 4014 4031 4048 4065 4082 4099 4116 4133    2  3  5   7  9 10  12 14 15
26 4160 U66 U8S 4200 4216 4232 4249 4265 4281 4298 2 3 5 7 810 1113
4314 4330 4346 4362 4378 4393 4409 4425 4440 4466 2 S 5 6 8 11 13
21
28
29
. .72
4624
4171
4487
4639
4786
4502
4654
4800
4518
4660
4814
4633
4683
(821)
4548
4698
4843
4564 4579 4594 C609
4713
4857 m?4742 4757
4se6 4000
2
1
1
S 6
3 4
3 4
6 8 I)
6 7 9
6 7 9
" 11 12
10 12
1011
30
31 4914 4928 4942 4956 4069 4983 4997 5011 5024 5038 1 8 4 6 1 8 1011
32 5051 5065 5079 50112 5105 5110 6132 5145 filS9 5172 1 8 4 5 7 8 911
5185 5198 5211 5224 5237 5250 5263 52,8 5289 5302 1 3 5 9 10
33
34
35
5315
5441
5328
5453
5940
5465
5353
5478
5366
5400
5378
5502
5391
551' 55~7 5539 5551 1 2 4
5416 5428 1 8 "" 5
5
6
6
6
8
8
7
910
910
36 5563 5575 5587 5599 5611 5623 5635 5/47 5858 5670 1 2 4 5 6 7 8 10
5682 5694 5705 5717 5729 5762 ~63 5775 5786 1 2 S 9
~;~
'51 5 6 7 8
38 6798 5809 5821 5832 5843 6866 77 5888 5899 1 2 3 I'> 6 7 8 9
J9 6911 5922 5033 5944 5955 5066 5077 988 5000 6010 1 2 8 4 I'> 7 8 II
0021 6031 0042 6053 6004 6075 6085 0096 6107 6117 1 2 3 8 II
40
6138 OaQ 6160 6170 6191 6201 6212 6222 1 2 3
" 5 0
6128 6180
"" 5 7 6
41 I) 6
42 6232 6243 6253 6263 5274 6284 62~ 6304 6314 6325 1 2 8 6 7 8
43 6335 6345 6355 6366 6375 6385 63 6405 6415 6425 {I 7 II
" I'>
f1.~
1_-H
64~ 6503
~~
44 6435 6Ul 6454 6464 6474 6484 6513 l,.. 6 7 8
45 6532 6542 6551 6561 6571 6580 65 6599 123 6 7 8
6R'1~ I,,~ ~
~
46 6628 0637 6646 6656 6702 6712 1 2 3 4. 5 6 7 7
47 6721 6730 6739 67~
6812
"6767 ~?6
6321~ o<lS9 6848 68.S?
I 6785 6794 6803 123 4 5 5
Ii
6 7
7
fo~
48 6875 6384 6893 1 2 3 6
49 ~~ o~ll 6920 61128 6937 6946 6964 6972 6981 1 2 S 44 "4. 5 6 7
-~ 61100 6998 7007 7016 7024 7033 17042 7050 7059 7067 123 S 4- Ii 6 7
MATHEMATICAL TABLES
                                                            Mean Differences
      0    1    2    3    4    5    6    7    8    9     1  2  3   4  5  6   7  8  9
55 7404 7412 7419 7427 7435 7443 1'51 7459 7466 7474 1 2
.-
2 3 4 6
--
1\ 6 7
56
57
58
7482
1559
7634
7490
7566
7642
7497
7574
76411
7505
7582
7657
7513
7589
7664-
7520
7597
7672
7528
7604
7679
7636
7612
7686
7543
7619
7694
7551
7627
7701
1 2
1 2
2
2
1 1 2
3 .
3 4 5
5
3 4 4
5 6
5 6
5 6
7
7
7
59 7709 7716 7723 7781 7738 7745 7762 7760 7767 7774 1 1 2 3 4 4 5 6 7
60 7782 7789 7796 7803 7810 7818 7825 7832 7839 7846 1 1 2 3 4 5 6 6
61 7853 7860 7868 7875 7882 7889 7896 7903 7910 7917 1 1 2 3
" 5 6 6
62 7924 7931 7938 7945 7952 7959 7966 7973 79SO 7987 1 1 2 3 3
63 7993 8000 8007 SOH 8021 8028 8035 8O~1 8048 8055 1 1 2 3
"3 "4" I) II
5 5 6 •
tl
64 8062 8069 8075 8082 8089 8096 8102 8109 8116 8122 1 1 3 3 5 5 (I
65 8129 8186 8142 8149 8156 8162 8169 8176 8182 8189 1 1
2
2 3 3 "
4 5 5 6
66 8195 8202 8209 8215 8222 8228 8235 8241 8248 8254 1 1 2 3 S 4 5 Ii 6
67 8261 8267 8274 8280 8287 8293 8299 8306 8312 8319 1 1 2 3 3 4 5 5 6
68 8325 8331 S338 8344 8351 8857 8363 8370 8376 8382 1 1 2 3 3 4 4 5 6
69 8395 8401 8407 8414 8420 8426 8432 8439 8445 1 2 2 3 4 5 6
70
8388
8451 8457 8463 8470 8476 8482 8488 8494 8500 8506 1
1
1 2 2 3 "4 4 5 (I
~
II
, """
8518 8519 8525 8531 8537 8543 8549 8556 8561 8567 1 1 2 2 II 4 6
8M's 8579 8585 8591 8597 8603 8609 8616 8621 8627 1 1 11 2 3 4 6 5
n 8633
8692
8639
8698
8645
8704
8651
8710
8657
8716
8663
8722
8669
8727
8676 8681
8739
8686
8745
1 1 2 2 3
8
4
~
6
6
6
6
8733 1 1 2 2
J: 8761 8756 8762 8768 8774 8779 8785 8791 8797 8802 1 1 2 2 3 S 4 6 6
76 8808 8814 8820 8826 8831 8842 8848 8854 8859 Ii {;
8865 8871 8876 8882 8887
8837
8893 8899 8904 8910 8915
1
1
1
1
2
2
2
2
S 3
8 3 "" 4 5
f! 8921
8976
8927 8932 8938 8943
81182' 8987 8993 8998
8949
9004
8954 811e0 8965 8971
9()09 9016 9020 9025
1
1
1
1
2
2
2
2
3 3
8 3
4
4
4 {;
.( 5
80
81
0031
9085
1lO3G 00'2 1lO47 0053
9090 9096 9101 9106
0058
9112
11068 9OG9 0074 0079
""
1 2 3
90 0542 9547 9552 9557 9562 9566 11571 9570 9581 058U 0 1 1 2 2 3 3 4
91 9590 9595 9600 9605 9609 9614 11610 9624 962» !l033 0 2 3 4
92 9638 964~ 96-17 9652 9657 9661 9066 9671 0675 0080 0
9685 9689 9694 9699 07()3 9708 9713 9717 0722 072i 0
1
1
I
1 2
1 2
I 2
2
2
3
3 3
3
4
4
"
4
93
94
95
9731 9736 9741 9745 9750 9754 9759 1)703 071)8 9773 0
9777 9782 9786 9791 9795 9800 9805 9809 9814 9818 0
1
1
1 2
1 2
2
2
3
3
3
3
3
4
4
""
4
9823 9827 11832 9836 9841 9845 9850 11854 9859 9863 1 1 3 4
96
97
98
986S
91H2
987219877
9917 9921
9881
9926
OSSIl
9930
9S9Q
11934
989'
9939
IlSIl'J
9943
(1003 99QS
9948 0952
0
0
0
1
1
1
1
2
2
2
2
2
2
3
3
3
3
3 " ..
4
4
4
99 9956 9961 9965 9969 9974 9978 99& 99tl7 9991 9996 0 1 1 2 2 3 3 3 4
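
Any entry of the logarithm table can be checked by direct computation.
The sketch below is illustrative and is not how the printed table was
compiled. The function names `log_mantissa` and `multiply_with_logs`
are hypothetical, and the row and column arguments follow the layout of
the table (rows 10 to 99, columns 0 to 9).

```python
import math


def log_mantissa(row, column=0):
    """Four-figure mantissa of the common logarithm for the three
    digits formed by a table row (10-99) and a column (0-9).

    Example: row 20, column 3 stands for the digits 203.
    """
    number = row * 10 + column
    fractional_part = math.log10(number) % 1.0
    return round(fractional_part * 10_000)


def multiply_with_logs(a, b):
    """Multiply two positive numbers in the way the tables are used:
    add the logarithms, then take the antilogarithm of the sum."""
    return 10 ** (math.log10(a) + math.log10(b))


entry = log_mantissa(20, 3)                # mantissa of log 2.03
product = multiply_with_logs(2.03, 1.5)    # 2.03 x 1.5
```

The mean-difference columns serve the same purpose for a fourth digit:
the tabulated difference is added to the three-digit mantissa instead
of computing a new logarithm.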
ANTILOGARITHMS
                                                            Mean Differences
       0    1    2    3    4    5    6    7    8    9    1 2 3  4 5 6  7 8 9
.00  1000 1002 1005 1007 1009 1012 1014 1016 1019 1021   0 0 1  1 1 1  2 2 2
.01  1023 1026 1028 1030 1033 1035 1038 1040 1042 1045   0 0 1  1 1 1  2 2 2
.02  1047 1050 1052 1054 1057 1059 1062 1064 1067 1069   0 0 1  1 1 1  2 2 2
.03  1072 1074 1076 1079 1081 1084 1086 1089 1091 1094   0 0 1  1 1 1  2 2 2
.04  1096 1099 1102 1104 1107 1109 1112 1114 1117 1119   0 1 1  1 1 2  2 2 2
.05  1122 1125 1127 1130 1132 1135 1138 1140 1143 1146   0 1 1  1 1 2  2 2 2
.06  1148 1151 1153 1156 1159 1161 1164 1167 1169 1172   0 1 1  1 1 2  2 2 2
.07  1175 1178 1180 1183 1186 1189 1191 1194 1197 1199   0 1 1  1 1 2  2 2 2
.08  1202 1205 1208 1211 1213 1216 1219 1222 1225 1227   0 1 1  1 1 2  2 2 3
.09  1230 1233 1236 1239 1242 1245 1247 1250 1253 1256   0 1 1  1 1 2  2 2 3
.10  1259 1262 1265 1268 1271 1274 1276 1279 1282 1285   0 1 1  1 1 2  2 2 3
.11  1288 1291 1294 1297 1300 1303 1306 1309 1312 1315   0 1 1  1 2 2  2 2 3
.12  1318 1321 1324 1327 1330 1334 1337 1340 1343 1346   0 1 1  1 2 2  2 2 3
.13  1349 1352 1355 1358 1361 1365 1368 1371 1374 1377   0 1 1  1 2 2  2 3 3
.14  1380 1384 1387 1390 1393 1396 1400 1403 1406 1409   0 1 1  1 2 2  2 3 3
.15  1413 1416 1419 1422 1426 1429 1432 1435 1439 1442   0 1 1  1 2 2  2 3 3
.16  1445 1449 1452 1455 1459 1462 1466 1469 1472 1476   0 1 1  1 2 2  2 3 3
.17  1479 1483 1486 1489 1493 1496 1500 1503 1507 1510   0 1 1  1 2 2  2 3 3
.18  1514 1517 1521 1524 1528 1531 1535 1538 1542 1545   0 1 1  1 2 2  2 3 3
.19  1549 1552 1556 1560 1563 1567 1570 1574 1578 1581   0 1 1  1 2 2  3 3 3
.20  1585 1589 1592 1596 1600 1603 1607 1611 1614 1618   0 1 1  1 2 2  3 3 3
·21
·22
·23
1622 1626
WOO 1663
HIllS 1i02
1629
1667
1706
1633
1671
1710
1637
1675
1714
1641
1679
1718
1644
1683
1722
1648
1687
1726
1652
1690
1730
1656
1694
1734
0
0
0
1
1
1
1
1
1
Z
2
2
2
2
2
2
2
2
S
3 3
3 3
"s S
4
·24
·25
17:18 1742
1778 1782
1746
1786
1750
1791
1754
1795
1758
17119
1762
1803
1766
1807
1770
1811
1774
1816
0
0
1
1
1
1
2
2
2
2
2
2
3 3
3 3 •
4
·26 1820 IB2~ 1828 1832 IB37 ISn 1845 1849 IBM 1858 0
1 1 2 2 3 3 3 4
·27 Is62 1866 1871 1875 1879 11!84 1888 1892 1897 ]901 0 1 1 2 2 3 3 3 f
·28 1005 1910 1914 1I1I9 1923 1928 1932 1936 19~1 1940 0 1 1 2 2 3 3 4 f
·29 1950 11154 1959 1963 1968 1972 1977 1982 1986 1991 0 1 1 2 2 3 3 4 4
·30 1995 2000 2004 2009 2014 2018 2023 2028 2032 2037 0 1 1 2 2 3 3 4 4
·31 20~2 20~6 2051 2056 2001 2065 20'0 207&
2123
2080
2128
2084
2133
0 1 1 2 2 3 3 4
4 4
,
·32 2089 209~ 2099 2104 2109 2113 2118 0 1 1 2 2 3 3
·33 2138 2143 2148 2153 2158 2163 2168 2173 2178 2183 0 1 1 2 2 3 3 4 4
·34 21811 2193 2198 2203 2201' 2213 2218 2223 2228 2234 1 1 2 2 3 3 4 4 5
·35 2239 2244 2249 2254 225!1 2265 11270 2275 2280 2286 1 1 2 2 3 3 t 4 6
·36 2291
2344
2296 2301 2307 2312
23;;0 2355 2360 2366
2317 2323 2328 2333
2371 237i 2362 2388
2339
2393
1
1
1
1
2
2
2
2
3
3
3
3
,
4
4
4
5
6
·37
·38 2391) 240~ 2410 2-l15 2421 2427 2432 2~38 2443 2449 1 1 2 2 3 3 4 4 6
·39 2455 2460 2466 2472 2477 2483 2~81) 24115 2500 2500 1 1 2 2 S 3 4 Ii 5
·40 2512 25111 2523 2529 2535 25~1 2S47 2553 2559 2564 1 1 2 2 3 4 4 5 6
, ,.
·41 2570 25i6 2582 2588 2594 2000 2606 2612 2618 2624 1 1 2 2 3 4 4 5 6
·42 2630 26'J6 2642 2649 2655 2661 2G67 2673 2679 2685 1 1 2 2 S 4 5 6
·43 26112 2698 2704 2710 2it6 2723 272~ 2735 2742 2748 1 1 2 S ·3 5 6
·44 2754 2761 2767 2773 2780 2786 2793 2799 2805 2812 1 1 2 a 3 4 f I) 6
2818 2825 2831 2838 2844 2851 2858 2864 2871 2877 1 J 2 S S 4 5 5 6
·45
·46
·47
2884
2951
28111
2958
2897
2965
2904
2972
2911 2917 2924
2!l79 2985 299~
~931
2099
2938
3006
2944
3013
1 J
1 1
2
2
if 3
3 3
,
4
5 II 6
5 6 f>
·48 3020 3027 3034 3041 3048 30!)5 306~ 3061' 3076 3083 1 1 2 3 4 4 6 6 II
·49 3090 3097 3105 3112 3119 312G 313 31'11 31411 3165 1 1 2 3 I) 6 6
1 J 7 " 4
MATHEMATICAL TABLES
ANTILOGARITHMS
                                                            Mean Differences
       0    1    2    3    4    5    6    7    8    9    1 2 3  4 5 6  7 8 9
"SO
"51
3162 3170 3177 3184 3192 3109 3206 3214 3221 3228 1
3236
3311
3243
3319
3251
3327
3258
3334-
3266
3342
3273
3350
3281
33r.7
3289
3365
3296
3373
3S04
a38t
1
1
2 Z 9 4 5
11 3
1 2 1\ 8 4 fi
4 Ii
5
5 ~
, 7
7
"52 3443 6
"53 3388 33116 3404 3412 3420 3428 3436 34IH 3459 1 2 II 3 4 5 6 7
"54
"55
34li7
3548
3415
3556
3483
3565
3401
3573
3499
3581
3508
3589
31>16
31>97
3524
3606
31>32
3614
3540 1 2 II 3 4 5
3622 1 2 2 3 6 . 6
6
6
7
7
7
"56 3631 3639 3648 3656 3664 3673 3681 3600
3776
3698 3707
3793
1 2 S 3
1 2 3 a
.. r.
6
6 'I 8
6 'I 8
" ""
"57 3715 3124 3733 3741 3150 3158 3767 3784
"58 3802 3811 3819 3828 3831 3846 3855 3864 3873 3382 1 2 S 5 6 7 8
"59 3890 3899 ;1008 3917 3926 3936 3945 3!Jr.4 3963 3972 1 2 3 4 5 5 6 7 8
"60 3981 3090 3999 4009 4018 4027 4036 4046 405r. 4064 1 2 3 4 5 6 6 'I 8
"61 4074 4083 4003 4102 4111 4121 HaO 4140 4150 4159 1 2 3 .. 5 6 7 8 \)
"62 4169 4178 4188 4108 4207 4217 4227 4236 4246 4256 1 2 3" " Ii I) 7 8 \)
-63 4266 4216 4285 4295 4305 4315 4325 4335 4345 4355 1 2 3 4 6 6 7 8 9
"64 4365 4315 4385 4395 4406 4416 4426 4436 4446 4457 1 2 S .. 5 d 7 8 9
"65 4467 4471 4487 4498 4508 4519 4529 4539 4550 4560 1 2 3 4 Ii 6 7 8 II
"66 4511 4581 4592 4603 4613 4624 4634 4645 4656 4667 1 2 3 4 5 6 1 9 10
"67 4677 41\88 4699 4710 4721 4732 4742 4753 4764 47~6 1 2 3 4 5 7 8 9 10
"68 4786 4797 4808 4819 4831 4842 4853 4864 4875 4887 1 2 3 4 6 7 8 9 10
"69 4898 4909 4920 4032 4943 4955 4966 4977 4089 5000 1 2 3 5 6 7 8 9 10
"70 5012 5023 5035 5047 5058 5070 5082 5093 5105 5117 1 2 4 5 6 7 8 911
"71 5120 5UO 5152 51M 5176 5188 5200 6212 5224 5236 1 2 4 5 6 7 8 10 11
"72 5~"8 5260 5272 5284 5297 5309 5321 5333 5346 535S 1 2 4 5 6 7 9 1011
"13 5;,70 5383 5395 5408 5420 5433 5445 5458 5470 5483 1 3 4 5 6 8 o 10 11
"14 54~5 5508 5521 5534 5546 5559 5572 5585 5598 5610 1 3 4 .'i. 6 ~ 9 10 12
"75 5623 5636 5849 5662 5675 5689 5702 5715 5728 5141 1 3 4 5 7 8 9 10 12
"76 57~4 57G8 5781 5794 5808 5821 5834 5848 5861 5875 1 3 4 5 7 8 9 11 Ii
"77 5888 5902 5!ll6 5929 5043 5957 5970 5984 5908 6012 1 3 4 5 7 8 10 11 12
"78 6026 6039 6053 6067 6081 6095 0109 6124 fH3S 6152 1 3 4 6 7 8 10 11 13
"79 61116 6180 6194 6200 6223 6237 6252 6266 6281 6295 1 3 4. 6 7 9 10 11 13
"80 6310 63U 6339 6353 6368 6383 6397 6412 6427 6442 1 3 4 6 7 II 10 12 13
"81 6457 6471 6486 6501 6516 6531 n!")-l6 6561 n5i7 6502 2 3 5 \} 8 9 11 12 14
"82 6f)()7 6022 (i637 6653 6668 6683 6GO\) 6714 6730 6145 2 3 5 6 8 II 11 12 14
"83 67tH 67711 6792 6808 6823 6839 6~55 6671 6887 6902 2 3 5 6 8 II 11 13 14
"84 !lU18 1iU34 61150 6966 61182 61108 7015 7031 7047 7063 2 3 5 6 810 11 13 15
"SSij7U7!l 7000 7112 11:l9 7145 111ll 717B 7194 7211 7228 2 3 5 7 l! 10 12 13 15
"S6 j7'2'lt 72(H 7278 729" 7311 7328 7345 7362 7379 7396 2 3 5 7 810 12 13 }&
"87 7413 7430 7447 7464 7482 7490 75111 7534 7551 7568 2 3 5 7 910 1214111
"89 I ~~~~ 7780 7198 1816
·88 7flOa 7621 7638 7656 7674 76!l1 7709 7721 7745 2 4 5 7 911 12 14 16
7834 7852 7870 781111 71107 7925 2 4 5 7 II 11 131416
"90 1 794 7962 7980 7998 S017 S035 8054 8072 8001 Bll0 2 4 6 7 \) 11 13 15 17
;1
.,1 8128 RU7 8166 81S5 8204 8222 8241 8260 8279 8299 2 6 8 911 13 1h 17
-92 8318 8337 8356 8315 8395 8414 8433 8453 8t12 8492 2 " 6 810 12 14 15 17
,""
.,3 8;;11 8531 8551 8570 8590 8610 8630 86r.O 8610 8600 2 4 6 810 12 14 16 18
"94 8710 8130 8750 8770 8790 8810 8831 8851 8872 8892 2 6 8 10 12 14 16 18
.,5 S913 8933 8954 8974 8995 9016 9036 9051 0078 9099 2 6 8 10 12 15 11 19
"96 9120 9141 9162 9183 9204 9226 9247 9268 9200 9311 2 6 8 11 13 15 17 19
"97
"98
9333
9550
9772
0354
0512
9376
9594
9397
9616
11419
9638
9441
9661
9462
0083
9484
9705
9506
9727
9528
9750
2
2
4
4
" 7
7
9 11 13
9 11 13
15
III
1720
1820
\)1114
" 97115 9817 9840 Il863 9886 9008 9931 0054 9977 t 5 7 III 1820
 n    n²      n³      √n      ∛n      1/n      √(10n)    1/√n     1/√(10n)
46   2116   97336   6.7823  3.5830  0.02174  21.4476  0.14744  0.04663
47   2209  103823   6.8557  3.6088  0.02128  21.6795  0.14587  0.04613
48   2304  110592   6.9282  3.6342  0.02083  21.9089  0.14434  0.04564
49   2401  117649   7.0000  3.6593  0.02041  22.1359  0.14286  0.04518
50   2500  125000   7.0711  3.6840  0.02000  22.3607  0.14142  0.04472
                                                                  Mean Differences
       0     1     2     3     4     5     6     7     8     9    1 2 3  4 5 6  7 8 9
1-0 1'000 1'005 1'010 1'01 5 1'020 1'02 5 1'030 1'034 1'039 1'044 OIl 223 344
-- -- --
1-1 1'049 1'054 1'°58 1.063 1·068 1'°72 1'°77 1·082 1:086 "0<)[ Oil 223 344
1'2 1'0<)5 1'100 1' 105 1'109 l'Il4 1'1l8 1'122 1'127 1'13 1 "136 ° I I 223 344
1'3 1'14° 1'145 1'149 1'153 1'158 1'162 1'166 1'170 1'175 {'{79 OIl 223 334
1-4 1' 183 1'187 1'192 1'196 1'200 1'204 1'208 1'212 1'217 1'221 oI I 222 334
1·5 1'225 1'229 1'233 1'237 {'241 1'245 {'249 1'253 1'257 1"261 OIl 222 334
1-6 1' 265 1'269 1'273 1'277 1'281 1' 28 5 1'288 1'292 1'296 ('3OO OIl 222 333
1'7 1'3°4 1'3°8 1'311 1'3 15 1'3 19 1'3 23 1'327 1'330 1'334 1'338 OIl 2::1 2 333
1,8 1'34 2 1'345 1'349 1'353 1'35 6 1'360 1'364 1'367 1'371 1'375 011 122 333
1'9 1'37 8 1'382 1'386 1'38 9 1'393 1'396 1'400 1'4°4 1'4°7 1'411 011 122 333
2.0 1.414 1.418 1.421 1.425 1.428 1.432 1.435 1.439 1.442 1.446  0 1 1  1 2 2  2 3 3
2.1 1.449 1.453 1.456 1.459 1.463 1.466 1.470 1.473 1.476 1.480  0 1 1  1 2 2  2 3 3
2.2 1.483 1.487 1.490 1.493 1.497 1.500 1.503 1.507 1.510 1.513  0 1 1  1 2 2  2 3 3
2.3 1.517 1.520 1.523 1.526 1.530 1.533 1.536 1.539 1.543 1.546  0 1 1  1 2 2  2 3 3
2.4 1.549 1.552 1.556 1.559 1.562 1.565 1.568 1.572 1.575 1.578  0 1 1  1 2 2  2 3 3
2.5 1.581 1.584 1.587 1.591 1.594 1.597 1.600 1.603 1.606 1.609  0 1 1  1 2 2  2 3 3
2.6 1.612 1.616 1.619 1.622 1.625 1.628 1.631 1.634 1.637 1.640  0 1 1  1 2 2  2 2 3
2.7 1.643 1.646 1.649 1.652 1.655 1.658 1.661 1.664 1.667 1.670  0 1 1  1 2 2  2 2 3
2.8 1.673 1.676 1.679 1.682 1.685 1.688 1.691 1.694 1.697 1.700  0 1 1  1 1 2  2 2 3
2.9 1.703 1.706 1.709 1.712 1.715 1.718 1.720 1.723 1.726 1.729  0 1 1  1 1 2  2 2 3
3.0 1.732 1.735 1.738 1.741 1.744 1.746 1.749 1.752 1.755 1.758  0 1 1  1 1 2  2 2 3
3.1 1.761 1.764 1.766 1.769 1.772 1.775 1.778 1.780 1.783 1.786  0 1 1  1 1 2  2 2 3
3.2 1.789 1.792 1.794 1.797 1.800 1.803 1.806 1.808 1.811 1.814  0 1 1  1 1 2  2 2 2
3.3 1.817 1.819 1.822 1.825 1.828 1.830 1.833 1.836 1.838 1.841  0 1 1  1 1 2  2 2 2
3.4 1.844 1.847 1.849 1.852 1.855 1.857 1.860 1.863 1.865 1.868  0 1 1  1 1 2  2 2 2
3.5 1.871 1.873 1.876 1.879 1.881 1.884 1.887 1.889 1.892 1.895  0 1 1  1 1 2  2 2 2
3.6 1.897 1.900 1.903 1.905 1.908 1.910 1.913 1.916 1.918 1.921  0 1 1  1 1 2  2 2 2
3.7 1.924 1.926 1.929 1.931 1.934 1.936 1.939 1.942 1.944 1.947  0 1 1  1 1 2  2 2 2
3.8 1.949 1.952 1.954 1.957 1.960 1.962 1.965 1.967 1.970 1.972  0 1 1  1 1 2  2 2 2
3.9 1.975 1.977 1.980 1.982 1.985 1.987 1.990 1.992 1.995 1.997  0 1 1  1 1 2  2 2 2
4.0 2.000 2.002 2.005 2.007 2.010 2.012 2.015 2.017 2.020 2.022  0 0 1  1 1 1  2 2 2
4.1 2.025 2.027 2.030 2.032 2.035 2.037 2.040 2.042 2.045 2.047  0 0 1  1 1 1  2 2 2
4.2 2.049 2.052 2.054 2.057 2.059 2.062 2.064 2.066 2.069 2.071  0 0 1  1 1 1  2 2 2
4.3 2.074 2.076 2.078 2.081 2.083 2.086 2.088 2.090 2.093 2.095  0 0 1  1 1 1  2 2 2
4.4 2.098 2.100 2.102 2.105 2.107 2.110 2.112 2.114 2.117 2.119  0 0 1  1 1 1  2 2 2
4.5 2.121 2.124 2.126 2.128 2.131 2.133 2.135 2.138 2.140 2.142  0 0 1  1 1 1  2 2 2
4.6 2.145 2.147 2.149 2.152 2.154 2.156 2.159 2.161 2.163 2.166  0 0 1  1 1 1  2 2 2
4.7 2.168 2.170 2.173 2.175 2.177 2.179 2.182 2.184 2.186 2.189  0 0 1  1 1 1  2 2 2
4.8 2.191 2.193 2.195 2.198 2.200 2.202 2.205 2.207 2.209 2.211  0 0 1  1 1 1  2 2 2
4.9 2.214 2.216 2.218 2.220 2.223 2.225 2.227 2.229 2.232 2.234  0 0 1  1 1 1  2 2 2
5.0 2.236 2.238 2.241 2.243 2.245 2.247 2.249 2.252 2.254 2.256  0 0 1  1 1 1  2 2 2
5.1 2.258 2.261 2.263 2.265 2.267 2.269 2.272 2.274 2.276 2.278  0 0 1  1 1 1  2 2 2
5.2 2.280 2.283 2.285 2.287 2.289 2.291 2.293 2.296 2.298 2.300  0 0 1  1 1 1  2 2 2
5.3 2.302 2.304 2.307 2.309 2.311 2.313 2.315 2.317 2.319 2.322  0 0 1  1 1 1  2 2 2
5.4 2.324 2.326 2.328 2.330 2.332 2.335 2.337 2.339 2.341 2.343  0 0 1  1 1 1  1 2 2
MATHEMATICAL TABLES
Mean Differences
0 1 2 3 4 5 6 7 8 9
1 2 3  4 5 6  7 8 9
10 3.162 3.178 3.194 3.209 3.225 3.240 3.256 3.271 3.286 3.302  2 3 5  6 8 9  11 12 14
11 3.317 3.332 3.347 3.362 3.376 3.391 3.406 3.421 3.435 3.450  1 3 4  6 7 9  10 12 13
12 3.464 3.479 3.493 3.507 3.521 3.536 3.550 3.564 3.578 3.592  1 3 4  6 7 8  10 11 13
13 3.606 3.619 3.633 3.647 3.661 3.674 3.688 3.701 3.715 3.728  1 3 4  5 7 8  10 11 12
14 3.742 3.755 3.768 3.782 3.795 3.808 3.821 3.834 3.847 3.860  1 3 4  5 7 8  9 11 12
15 3.873 3.886 3.899 3.912 3.924 3.937 3.950 3.962 3.975 3.987  1 3 4  5 6 8  9 10 11
16 4.000 4.012 4.025 4.037 4.050 4.062 4.074 4.087 4.099 4.111  1 2 4  5 6 7  9 10 11
17 4.123 4.135 4.147 4.159 4.171 4.183 4.195 4.207 4.219 4.231  1 2 4  5 6 7  8 10 11
18 4.243 4.254 4.266 4.278 4.290 4.301 4.313 4.324 4.336 4.347  1 2 3  5 6 7  8 9 10
19 4.359 4.370 4.382 4.393 4.405 4.416 4.427 4.438 4.450 4.461  1 2 3  5 6 7  8 9 10
20 4.472 4.483 4.494 4.506 4.517 4.528 4.539 4.550 4.561 4.572  1 2 3  4 6 7  8 9 10
21 4.583 4.593 4.604 4.615 4.626 4.637 4.648 4.658 4.669 4.680  1 2 3  4 5 6  8 9 10
22 4.690 4.701 4.712 4.722 4.733 4.743 4.754 4.764 4.775 4.785  1 2 3  4 5 6  7 8 9
23 4.796 4.806 4.817 4.827 4.837 4.848 4.858 4.868 4.879 4.889  1 2 3  4 5 6  7 8 9
24 4.899 4.909 4.919 4.930 4.940 4.950 4.960 4.970 4.980 4.990  1 2 3  4 5 6  7 8 9
25 5.000 5.010 5.020 5.030 5.040 5.050 5.060 5.070 5.079 5.089  1 2 3  4 5 6  7 8 9
26 5.099 5.109 5.119 5.128 5.138 5.148 5.158 5.167 5.177 5.187  1 2 3  4 5 6  7 8 9
27 5.196 5.206 5.215 5.225 5.235 5.244 5.254 5.263 5.273 5.282  1 2 3  4 5 6  7 8 9
28 5.292 5.301 5.310 5.320 5.329 5.339 5.348 5.357 5.367 5.376  1 2 3  4 5 6  7 7 8
29 5.385 5.394 5.404 5.413 5.422 5.431 5.441 5.450 5.459 5.468  1 2 3  4 5 5  6 7 8
30 5.477 5.486 5.495 5.505 5.514 5.523 5.532 5.541 5.550 5.559  1 2 3  4 4 5  6 7 8
31 5.568 5.577 5.586 5.595 5.604 5.612 5.621 5.630 5.639 5.648  1 2 3  3 4 5  6 7 8
32 5.657 5.666 5.675 5.683 5.692 5.701 5.710 5.718 5.727 5.736  1 2 3  3 4 5  6 7 8
33 5.745 5.753 5.762 5.771 5.779 5.788 5.797 5.805 5.814 5.822  1 2 3  3 4 5  6 7 8
34 5.831 5.840 5.848 5.857 5.865 5.874 5.882 5.891 5.899 5.908  1 2 3  3 4 5  6 7 8
35 5.916 5.925 5.933 5.941 5.950 5.958 5.967 5.975 5.983 5.992  1 2 2  3 4 5  6 7 8
36 6.000 6.008 6.017 6.025 6.033 6.042 6.050 6.058 6.066 6.075  1 2 2  3 4 5  6 7 7
37 6.083 6.091 6.099 6.107 6.116 6.124 6.132 6.140 6.148 6.156  1 2 2  3 4 5  6 7 7
38 6.164 6.173 6.181 6.189 6.197 6.205 6.213 6.221 6.229 6.237  1 2 2  3 4 5  6 6 7
39 6.245 6.253 6.261 6.269 6.277 6.285 6.293 6.301 6.309 6.317  1 2 2  3 4 5  6 6 7
40 6.325 6.332 6.340 6.348 6.356 6.364 6.372 6.380 6.387 6.395  1 2 2  3 4 5  6 6 7
41 6.403 6.411 6.419 6.427 6.434 6.442 6.450 6.458 6.465 6.473  1 2 2  3 4 5  5 6 7
42 6.481 6.488 6.496 6.504 6.512 6.519 6.527 6.535 6.542 6.550  1 2 2  3 4 5  5 6 7
43 6.557 6.565 6.573 6.580 6.588 6.595 6.603 6.611 6.618 6.626  1 2 2  3 4 5  5 6 7
44 6.633 6.641 6.648 6.656 6.663 6.671 6.678 6.686 6.693 6.701  1 2 2  3 4 5  5 6 7
45 6.708 6.716 6.723 6.731 6.738 6.745 6.753 6.760 6.768 6.775  1 1 2  3 4 4  5 6 7
46 6.782 6.790 6.797 6.804 6.812 6.819 6.826 6.834 6.841 6.848  1 1 2  3 4 4  5 6 7
47 6.856 6.863 6.870 6.877 6.885 6.892 6.899 6.907 6.914 6.921  1 1 2  3 4 4  5 6 7
48 6.928 6.935 6.943 6.950 6.957 6.964 6.971 6.979 6.986 6.993  1 1 2  3 4 4  5 6 6
49 7.000 7.007 7.014 7.021 7.029 7.036 7.043 7.050 7.057 7.064  1 1 2  3 4 4  5 6 6
50 7.071 7.078 7.085 7.092 7.099 7.106 7.113 7.120 7.127 7.134  1 1 2  3 4 4  5 6 6
51 7.141 7.148 7.155 7.162 7.169 7.176 7.183 7.190 7.197 7.204  1 1 2  3 4 4  5 6 6
52 7.211 7.218 7.225 7.232 7.239 7.246 7.253 7.259 7.266 7.273  1 1 2  3 3 4  5 6 6
53 7.280 7.287 7.294 7.301 7.308 7.314 7.321 7.328 7.335 7.342  1 1 2  3 3 4  5 5 6
54 7.348 7.355 7.362 7.369 7.376 7.382 7.389 7.396 7.403 7.409  1 1 2  3 3 4  5 5 6
55 7.416 7.423 7.430 7.436 7.443 7.450 7.457 7.463 7.470 7.477  1 1 2  3 3 4  5 5 6
56 7.483 7.490 7.497 7.503 7.510 7.517 7.523 7.530 7.537 7.543  1 1 2  3 3 4  5 5 6
57 7.550 7.556 7.563 7.570 7.576 7.583 7.589 7.596 7.603 7.609  1 1 2  3 3 4  5 5 6
58 7.616 7.622 7.629 7.635 7.642 7.649 7.655 7.662 7.668 7.675  1 1 2  3 3 4  5 5 6
59 7.681 7.688 7.694 7.701 7.707 7.714 7.720 7.727 7.733 7.740  1 1 2  3 3 4  4 5 6
60 7.746 7.752 7.759 7.765 7.772 7.778 7.785 7.791 7.797 7.804  1 1 2  3 3 4  4 5 6
61 7.810 7.817 7.823 7.829 7.836 7.842 7.849 7.855 7.861 7.868  1 1 2  3 3 4  4 5 6
62 7.874 7.880 7.887 7.893 7.899 7.906 7.912 7.918 7.925 7.931  1 1 2  3 3 4  4 5 6
63 7.937 7.944 7.950 7.956 7.962 7.969 7.975 7.981 7.987 7.994  1 1 2  3 3 4  4 5 6
64 8.000 8.006 8.012 8.019 8.025 8.031 8.037 8.044 8.050 8.056  1 1 2  2 3 4  4 5 6
65 8.062 8.068 8.075 8.081 8.087 8.093 8.099 8.106 8.112 8.118  1 1 2  2 3 4  4 5 6
66 8.124 8.130 8.136 8.142 8.149 8.155 8.161 8.167 8.173 8.179  1 1 2  2 3 4  4 5 5
67 8.185 8.191 8.198 8.204 8.210 8.216 8.222 8.228 8.234 8.240  1 1 2  2 3 4  4 5 5
68 8.246 8.252 8.258 8.264 8.270 8.276 8.283 8.289 8.295 8.301  1 1 2  2 3 4  4 5 5
69 8.307 8.313 8.319 8.325 8.331 8.337 8.343 8.349 8.355 8.361  1 1 2  2 3 4  4 5 5
70 8.367 8.373 8.379 8.385 8.390 8.396 8.402 8.408 8.414 8.420  1 1 2  2 3 4  4 5 5
71 8.426 8.432 8.438 8.444 8.450 8.456 8.462 8.468 8.473 8.479  1 1 2  2 3 4  4 5 5
72 8.485 8.491 8.497 8.503 8.509 8.515 8.521 8.526 8.532 8.538  1 1 2  2 3 3  4 5 5
73 8.544 8.550 8.556 8.562 8.567 8.573 8.579 8.585 8.591 8.597  1 1 2  2 3 3  4 5 5
74 8.602 8.608 8.614 8.620 8.626 8.631 8.637 8.643 8.649 8.654  1 1 2  2 3 3  4 5 5
75 8.660 8.666 8.672 8.678 8.683 8.689 8.695 8.701 8.706 8.712  1 1 2  2 3 3  4 5 5
76 8.718 8.724 8.729 8.735 8.741 8.746 8.752 8.758 8.764 8.769  1 1 2  2 3 3  4 5 5
77 8.775 8.781 8.786 8.792 8.798 8.803 8.809 8.815 8.820 8.826  1 1 2  2 3 3  4 4 5
78 8.832 8.837 8.843 8.849 8.854 8.860 8.866 8.871 8.877 8.883  1 1 2  2 3 3  4 4 5
79 8.888 8.894 8.899 8.905 8.911 8.916 8.922 8.927 8.933 8.939  1 1 2  2 3 3  4 4 5
80 8.944 8.950 8.955 8.961 8.967 8.972 8.978 8.983 8.989 8.994  1 1 2  2 3 3  4 4 5
81 9.000 9.006 9.011 9.017 9.022 9.028 9.033 9.039 9.044 9.050  1 1 2  2 3 3  4 4 5
82 9.055 9.061 9.066 9.072 9.077 9.083 9.088 9.094 9.099 9.105  1 1 2  2 3 3  4 4 5
83 9.110 9.116 9.121 9.127 9.132 9.138 9.143 9.149 9.154 9.160  1 1 2  2 3 3  4 4 5
84 9.165 9.171 9.176 9.182 9.187 9.192 9.198 9.203 9.209 9.214  1 1 2  2 3 3  4 4 5
85 9.220 9.225 9.230 9.236 9.241 9.247 9.252 9.257 9.263 9.268  1 1 2  2 3 3  4 4 5
86 9.274 9.279 9.284 9.290 9.295 9.301 9.306 9.311 9.317 9.322  1 1 2  2 3 3  4 4 5
87 9.327 9.333 9.338 9.343 9.349 9.354 9.359 9.365 9.370 9.375  1 1 2  2 3 3  4 4 5
88 9.381 9.386 9.391 9.397 9.402 9.407 9.413 9.418 9.423 9.429  1 1 2  2 3 3  4 4 5
89 9.434 9.439 9.445 9.450 9.455 9.460 9.466 9.471 9.476 9.482  1 1 2  2 3 3  4 4 5
90 9.487 9.492 9.497 9.503 9.508 9.513 9.518 9.524 9.529 9.534  1 1 2  2 3 3  4 4 5
91 9.539 9.545 9.550 9.555 9.560 9.566 9.571 9.576 9.581 9.586  1 1 2  2 3 3  4 4 5
92 9.592 9.597 9.602 9.607 9.612 9.618 9.623 9.628 9.633 9.638  1 1 2  2 3 3  4 4 5
93 9.644 9.649 9.654 9.659 9.664 9.670 9.675 9.680 9.685 9.690  1 1 2  2 3 3  4 4 5
94 9.695 9.701 9.706 9.711 9.716 9.721 9.726 9.731 9.737 9.742  1 1 2  2 3 3  4 4 5
95 9.747 9.752 9.757 9.762 9.767 9.772 9.778 9.783 9.788 9.793  1 1 2  2 3 3  4 4 5
96 9.798 9.803 9.808 9.813 9.818 9.823 9.829 9.834 9.839 9.844  1 1 2  2 3 3  4 4 5
97 9.849 9.854 9.859 9.864 9.869 9.874 9.879 9.884 9.889 9.894  1 1 1  2 3 3  4 4 5
98 9.899 9.905 9.910 9.915 9.920 9.925 9.930 9.935 9.940 9.945  0 1 1  2 2 3  3 4 4
99 9.950 9.955 9.960 9.965 9.970 9.975 9.980 9.985 9.990 9.995  0 1 1  2 2 3  3 4 4
FUNDAMENTALS OF STATISTICS
Mean Differences
0 1 2 3 4 5 6 7 8 9
1 2 3  4 5 6  7 8 9
1.0 1.000
1.1 .9091 9009 8929 8850 8772 8696 8621 8547 8475 8403
1.2 .8333 8264 8197 8130 8065 8000 7937 7874 7813 7752
1.3 .7692 7634 7576 7519 7463 7407 7353 7299 7246 7194
1.4 .7143 7092 7042 6993 6944 6897 6849 6803 6757 6711  5 10 14  19 24 29  33 38 43
1.5 .6667 6623 6579 6536 6494 6452 6410 6369 6329 6289  4 8 13  17 21 25  29 33 38
1.6 .6250 6211 6173 6135 6098 6061 6024 5988 5952 5917  4 7 11  15 18 22  26 29 33
1.7 .5882 5848 5814 5780 5747 5714 5682 5650 5618 5587  3 6 10  13 16 20  23 26 29
1.8 .5556 5525 5495 5464 5435 5405 5376 5348 5319 5291  3 6 9  12 15 17  20 23 26
1.9 .5263 5236 5208 5181 5155 5128 5102 5076 5051 5025  3 5 8  11 13 16  18 21 24
2.0 .5000 4975 4950 4926 4902 4878 4854 4831 4808 4785  2 5 7  10 12 14  17 19 21
2.1 .4762 4739 4717 4695 4673 4651 4630 4608 4587 4566  2 4 7  9 11 13  15 17 20
2.2 .4545 4525 4505 4484 4464 4444 4425 4405 4386 4367  2 4 6  8 10 12  14 16 18
2.3 .4348 4329 4310 4292 4274 4255 4237 4219 4202 4184  2 4 5  7 9 11  13 14 16
2.4 .4167 4149 4132 4115 4098 4082 4065 4049 4032 4016  2 3 5  7 8 10  12 13 15
2.5 .4000 3984 3968 3953 3937 3922 3906 3891 3876 3861  2 3 5  6 8 9  11 12 14
2.6 .3846 3831 3817 3802 3788 3774 3759 3745 3731 3717  1 3 4  6 7 8  10 11 13
2.7 .3704 3690 3676 3663 3650 3636 3623 3610 3597 3584  1 3 4  5 7 8  9 11 12
2.8 .3571 3559 3546 3534 3521 3509 3497 3484 3472 3460  1 2 4  5 6 7  9 10 11
2.9 .3448 3436 3425 3413 3401 3390 3378 3367 3356 3344  1 2 3  5 6 7  8 9 10
3.0 .3333 3322 3311 3300 3289 3279 3268 3257 3247 3236  1 2 3  4 5 6  7 9 10
3.1 .3226 3215 3205 3195 3185 3175 3165 3155 3145 3135  1 2 3  4 5 6  7 8 9
3.2 .3125 3115 3106 3096 3086 3077 3067 3058 3049 3040  1 2 3  4 5 6  7 8 9
3.3 .3030 3021 3012 3003 2994 2985 2976 2967 2959 2950  1 2 3  4 4 5  6 7 8
3.4 .2941 2933 2924 2915 2907 2899 2890 2882 2874 2865  1 2 3  3 4 5  6 7 8
3.5 .2857 2849 2841 2833 2825 2817 2809 2801 2793 2786  1 2 2  3 4 5  6 6 7
3.6 .2778 2770 2762 2755 2747 2740 2732 2725 2717 2710  1 2 2  3 4 5  5 6 7
3.7 .2703 2695 2688 2681 2674 2667 2660 2653 2646 2639  1 1 2  3 4 4  5 6 6
3.8 .2632 2625 2618 2611 2604 2597 2591 2584 2577 2571  1 1 2  3 3 4  5 5 6
3.9 .2564 2558 2551 2545 2538 2532 2525 2519 2513 2506  1 1 2  3 3 4  4 5 6
4.0 .2500 2494 2488 2481 2475 2469 2463 2457 2451 2445  1 1 2  2 3 4  4 5 5
4.1 .2439 2433 2427 2421 2415 2410 2404 2398 2392 2387  1 1 2  2 3 3  4 5 5
4.2 .2381 2375 2370 2364 2358 2353 2347 2342 2336 2331  1 1 2  2 3 3  4 4 5
4.3 .2326 2320 2315 2309 2304 2299 2294 2288 2283 2278  1 1 2  2 3 3  4 4 5
4.4 .2273 2268 2262 2257 2252 2247 2242 2237 2232 2227  1 1 2  2 3 3  4 4 5
4.5 .2222 2217 2212 2208 2203 2198 2193 2188 2183 2179  0 1 1  2 2 3  3 4 4
4.6 .2174 2169 2165 2160 2155 2151 2146 2141 2137 2132  0 1 1  2 2 3  3 4 4
4.7 .2128 2123 2119 2114 2110 2105 2101 2096 2092 2088  0 1 1  2 2 3  3 4 4
4.8 .2083 2079 2075 2070 2066 2062 2058 2053 2049 2045  0 1 1  2 2 3  3 3 4
4.9 .2041 2037 2033 2028 2024 2020 2016 2012 2008 2004  0 1 1  2 2 2  3 3 4
5.0 .2000 1996 1992 1988 1984 1980 1976 1972 1969 1965  0 1 1  2 2 2  3 3 4
5.1 .1961 1957 1953 1949 1946 1942 1938 1934 1931 1927  0 1 1  2 2 2  3 3 3
5.2 .1923 1919 1916 1912 1908 1905 1901 1898 1894 1890  0 1 1  1 2 2  3 3 3
5.3 .1887 1883 1880 1876 1873 1869 1866 1862 1859 1855  0 1 1  1 2 2  2 3 3
5.4 .1852 1848 1845 1842 1838 1835 1832 1828 1825 1821  0 1 1  1 2 2  2 3 3
5.5 .1818 1815 1812 1808 1805 1802 1799 1795 1792 1789  0 1 1  1 2 2  2 3 3
5.6 .1786 1783 1779 1776 1773 1770 1767 1764 1761 1757  0 1 1  1 2 2  2 3 3
5.7 .1754 1751 1748 1745 1742 1739 1736 1733 1730 1727  0 1 1  1 1 2  2 2 3
5.8 .1724 1721 1718 1715 1712 1709 1706 1704 1701 1698  0 1 1  1 1 2  2 2 3
5.9 .1695 1692 1689 1686 1684 1681 1678 1675 1672 1669  0 1 1  1 1 2  2 2 3
6.0 .1667 1664 1661 1658 1656 1653 1650 1647 1645 1642  0 1 1  1 1 2  2 2 3
6.1 .1639 1637 1634 1631 1629 1626 1623 1621 1618 1616  0 1 1  1 1 2  2 2 2
6.2 .1613 1610 1608 1605 1603 1600 1597 1595 1592 1590  0 1 1  1 1 2  2 2 2
6.3 .1587 1585 1582 1580 1577 1575 1572 1570 1567 1565  0 0 1  1 1 1  2 2 2
6.4 .1562 1560 1558 1555 1553 1550 1548 1546 1543 1541  0 0 1  1 1 1  2 2 2
6.5 .1538 1536 1534 1531 1529 1527 1524 1522 1520 1517  0 0 1  1 1 1  2 2 2
6.6 .1515 1513 1511 1508 1506 1504 1502 1499 1497 1495  0 0 1  1 1 1  2 2 2
6.7 .1493 1490 1488 1486 1484 1481 1479 1477 1475 1473  0 0 1  1 1 1  2 2 2
6.8 .1471 1468 1466 1464 1462 1460 1458 1456 1453 1451  0 0 1  1 1 1  2 2 2
6.9 .1449 1447 1445 1443 1441 1439 1437 1435 1433 1431  0 0 1  1 1 1  2 2 2
7.0 .1429 1427 1425 1422 1420 1418 1416 1414 1412 1410  0 0 1  1 1 1  1 2 2
7.1 .1408 1406 1404 1403 1401 1399 1397 1395 1393 1391  0 0 1  1 1 1  1 2 2
7.2 .1389 1387 1385 1383 1381 1379 1377 1376 1374 1372  0 0 1  1 1 1  1 2 2
7.3 .1370 1368 1366 1364 1362 1361 1359 1357 1355 1353  0 0 1  1 1 1  1 2 2
7.4 .1351 1350 1348 1346 1344 1342 1340 1339 1337 1335  0 0 1  1 1 1  1 1 2
7.5 .1333 1332 1330 1328 1326 1325 1323 1321 1319 1318  0 0 1  1 1 1  1 1 2
7.6 .1316 1314 1312 1311 1309 1307 1305 1304 1302 1300  0 0 1  1 1 1  1 1 2
7.7 .1299 1297 1295 1294 1292 1290 1289 1287 1285 1284  0 0 0  1 1 1  1 1 1
7.8 .1282 1280 1279 1277 1276 1274 1272 1271 1269 1267  0 0 0  1 1 1  1 1 1
7.9 .1266 1264 1263 1261 1259 1258 1256 1255 1253 1252  0 0 0  1 1 1  1 1 1
8.0 .1250 1248 1247 1245 1244 1242 1241 1239 1238 1236  0 0 0  1 1 1  1 1 1
8.1 .1235 1233 1232 1230 1229 1227 1225 1224 1222 1221  0 0 0  1 1 1  1 1 1
8.2 .1220 1218 1217 1215 1214 1212 1211 1209 1208 1206  0 0 0  1 1 1  1 1 1
8.3 .1205 1203 1202 1200 1199 1198 1196 1195 1193 1192  0 0 0  1 1 1  1 1 1
8.4 .1190 1189 1188 1186 1185 1183 1182 1181 1179 1178  0 0 0  1 1 1  1 1 1
8.5 .1176 1175 1174 1172 1171 1170 1168 1167 1166 1164  0 0 0  1 1 1  1 1 1
8.6 .1163 1161 1160 1159 1157 1156 1155 1153 1152 1151  0 0 0  1 1 1  1 1 1
8.7 .1149 1148 1147 1145 1144 1143 1142 1140 1139 1138  0 0 0  1 1 1  1 1 1
8.8 .1136 1135 1134 1133 1131 1130 1129 1127 1126 1125  0 0 0  1 1 1  1 1 1
8.9 .1124 1122 1121 1120 1119 1117 1116 1115 1114 1112  0 0 0  1 1 1  1 1 1
9.0 .1111 1110 1109 1107 1106 1105 1104 1103 1101 1100  0 0 0  1 1 1  1 1 1
9.1 .1099 1098 1096 1095 1094 1093 1092 1090 1089 1088  0 0 0  0 1 1  1 1 1
9.2 .1087 1086 1085 1083 1082 1081 1080 1079 1078 1076  0 0 0  0 1 1  1 1 1
9.3 .1075 1074 1073 1072 1071 1070 1068 1067 1066 1065  0 0 0  0 1 1  1 1 1
9.4 .1064 1063 1062 1060 1059 1058 1057 1056 1055 1054  0 0 0  0 1 1  1 1 1
9.5 .1053 1052 1050 1049 1048 1047 1046 1045 1044 1043  0 0 0  0 1 1  1 1 1
9.6 .1042 1041 1039 1038 1037 1036 1035 1034 1033 1032  0 0 0  0 1 1  1 1 1
9.7 .1031 1030 1029 1028 1027 1026 1025 1024 1022 1021  0 0 0  0 1 1  1 1 1
9.8 .1020 1019 1018 1017 1016 1015 1014 1013 1012 1011  0 0 0  0 1 1  1 1 1
9.9 .1010 1009 1008 1007 1006 1005 1004 1003 1002 1001  0 0 0  0 0 1  1 1 1
TABLE OF XI AND
...
X 'CC'"
Degrees of
freedom PROBABILITY PROBABILITY
.05 .01 .05 .01