
FUNDAMENTALS OF

STATISTICS

BY

D. N. ELHANCE, M. COM.,
Head of the Department and Dean of the Faculty of Commerce,
University of Jodhpur,
Jodhpur.

KITAB MAHAL ALLAHABAD


1972
First Edition, 1957
Second Edition, 1958
Third Edition, 1960
Fourth Edition, 1962
Fifth Edition, 1964
Sixth Edition, 1965
Seventh Edition, 1966
Eighth Edition, 1967
Ninth Edition, 1968
Tenth Edition, 1969
Eleventh Edition, 1970
Twelfth Edition, 1971
Thirteenth Edition, 1972

Printed by: Eagle Offset Printers, 15, Thornhill Road, Allahabad
Published by: Kitab Mahal, 15, Thornhill Road, Allahabad
IN MEMORY

OF

MY FATHER
PREFACE TO THE THIRTEENTH EDITION
A new edition of the now famous book on Statistics has come out
maintaining its old traditions intact but with new approaches all round
to register and record the various changing aspects.
Calculations have been re-calculated in order to eliminate even the
slightest variations which may have crept in during the past years.
Change to metric system has also been completed.
In its present form the utility of the book has increased considerably,
and University students as well as administrators will find sufficient
material for their guidance and assistance.
The author will feel grateful to the discriminating student community
and the general users of the book for their indulgence in pinpointing
any error.
D. N. ELHANCE
PREFACE TO THE SIXTH EDITION
The present edition of this book has many new features. Two
new chapters, Designs of Experiments and Statistical Quality Control,
have been added in this volume. The chapter on Growth of Statistics
in India has been brought up to date and the latest figures have been
substituted for old ones.
Some chapters of this book have been revised and new points have
been included in them. A large number of fresh questions have been
added at the end of each chapter to make the book more useful to
examinees. The entire portion on Indian Statistics has been brought up
to date.
I hope the present volume will be found useful by the students
of the subject. I am grateful to a number of students and friends who
gave me valuable suggestions for the improvement of the book and
I am confident that they will continue to do so in future also.
D. N. ELHANCE
PREFACE TO THE SECOND EDITION
From the various reviews which appeared in a large number of
journals and papers, I conclude that the first edition of this book was
very well received. In the present edition I have rearranged certain
chapters and brought the chapter on Growth of Statistics in India up to
date. Besides, I have included a large number of new questions at the
end of each chapter.
The book is now divided into two volumes. Volume I covers the
entire B.Com., B.A. and B.Sc. course of statistics of all the universities
of India and Pakistan. Volume II contains chapters on Probability,
Theoretical Frequency Distributions and Sampling. The two volumes
are available separately as well as in a combined form.
I am grateful to a large number of friends who have given me
valuable suggestions for the improvement of the book. I hope the
students of the subject will find the book more useful than before.
15th April, 1958. D. N. ELHANCE
PREFACE TO THE FIRST EDITION
The science of statistics has assumed great importance in recent
years. It was once known as the "Science of Kings" and its scope
was extremely limited, but today the science of statistics has become
an all-important science, without which no other science can progress.
Modern age is the age of statistics and it is very correctly said that the
extent of the economic development of a country can best be known
by finding out the extent to which statistical organisation has developed
there. Till recently the foreign government of our country and even
our countrymen were very indifferent towards statistics. After the
independence of the country the era of economic planning started and along
with it the importance of statistics increased considerably. In fact
economic planning cannot be imagined in the absence of statistical
data.
It is a matter of great satisfaction that the importance of statistics
is gradually being realised in our country and that the subject is occupying
the place of honour which it should have got much earlier. Statistics
is now taught in almost all the universities of the country and there are
a number of statistical institutes which impart special training in this
subject. This book is an attempt to furnish a simple, non-mathematical
text for those who desire to equip themselves with a knowledge of the
elementary statistical methods used in modern times. The treatment
of this subject has been as far as possible of a non-mathematical character
because most of the students who study this subject do not always have
a mathematical background. This book has been written primarily
for use of M.A., M.Com., and B.Com. students who study this subject.
The book covers the entire course which is prescribed for the statistics
paper in these examinations in various universities of the country as
also the courses prescribed in I.A.S. and P.C.S. examinations of the
paper. A large number of questions have been given at the end of
each chapter with a view to helping the students in solving numerical
problems and thus familiarising themselves with the different types of
formulae used in statistical analysis.
I am grateful to my colleagues in the Faculty of Commerce,
Allahabad University, who have given me some very valuable suggestions.
Thanks are also due to Mr. S.V. Erasmus, my secretary, who worked
almost like a machine for all the days during which this book was
written. Kitab Mahal, my publishers, deserve congratulations for the
nice printing and get-up of the book.
1st December, 1956. D. N. ELHANCE
CONTENTS
CHAPTER
1. Meaning and Definition of Statistics
2. Origin and Growth of Statistics
3. Importance, Limitations and Functions of Statistics
4. Preliminaries to the Collection of Data
5. Collection of Primary and Secondary Data
6. Accuracy, Approximation and Errors
7. Classification, Seriation and Tabulation
8. Ratios, Percentages and Logarithms
9. Measures of Central Tendency
10. Measures of Dispersion
11. Moments, Skewness and Kurtosis
12. Index Numbers
13. Diagrammatic Representation of Data
14. Graphic Representation of Data
15. Analysis of Time Series
16. Correlation
17. Regression and Ratio of Variation
18. Theory of Attributes and Consistence of Data
19. Association of Attributes
20. Interpolation
21. Business Forecasting
22. Interpretation of Data
23. Probability
24. Theoretical Frequency Distributions
25. Theory of Sampling
26. Sampling of Attributes
27. Sampling of Variables (Large Samples)
28. Chi-square Test and Goodness of Fit
29. Sampling of Variables (Small Samples)
30. Analysis of Variance
31. Designs of Experiments
32. Statistical Quality Control
33. Growth of Statistics in India
34. Mathematical Tables
DETAILED CONTENTS

Chapter 1-Meaning and Definition of Statistics-Meaning; Definition;
Main divisions of the study of Statistics; Objects of Statistics;
Questions.
Chapter 2-Origin and Growth of Statistics-Early beginnings; 16th to
20th Centuries; Relationship with other Sciences; Statistics and
Economics; Statistics and Mathematics; Statistics and Astronomy;
Statistics and Biology, etc.; Questions.
Chapter 3-Importance, Limitations and Functions of Statistics-
Statistics and the common man; Causes of importance; Indispensability
of Statistics; Limitations of the Science of Statistics; Distrust of
Statistics; Functions of Statistics and Statisticians; Questions.
Chapter 4-Preliminaries to the Collection of Data-Object and Scope
of enquiry; Sources of information; Type of enquiry; Statistical Units;
Degree of accuracy; Questions.
Chapter 5-Collection of Primary and Secondary Data-Primary and
Secondary data; Choice of Methods; Method of Collecting Primary data;
Representative Data; Random Sampling; Collection of Secondary data;
Scrutiny of Secondary data; Questions.
Chapter 6-Accuracy, Approximation and Errors-Editing of data;
Accuracy; Approximation; Statistical Errors; Questions.
Chapter 7-Classification, Seriation and Tabulation-
Classification: Need and meaning; Characteristics; Classification
according to attributes; Classification according to class-intervals.
Seriation: Definition; Time Series; Spatial Series; Condition Series;
Discrete, Continuous, Simple and Cumulative Series.
Tabulation: Types of tabulation; Rules of tabulation; Questions.
Chapter 8-Ratios, Percentages and Logarithms-Need; Derivatives;
Fallacies in the use of percentages and ratios; Some popular Ratios
used in Population Studies; General Fertility Rate; Gross and Net
Reproduction Rates; Logarithms; Computations by logarithms; Questions.
Chapter 9-Measures of Central Tendency-Need and Meaning; Objects;
Characteristics of representative average; Measures of various orders;
Types of averages.
Arithmetic Average: Calculation of arithmetic average in a discrete
series; Calculation of the arithmetic average in a continuous series;
Charlier's Accuracy Check; Algebraic properties of arithmetic average;
Merits and Drawbacks.
Median: Meaning; Location of Median in various types of series;
Graphic calculation; Merits and Drawbacks; Comparison with mean.
Quartiles, Deciles and Percentiles, etc.: Location in various types of
series; Graphic calculation; Characteristics.
Mode: Meaning; Location of mode in various types of Series;
Determination by curve fitting; Determination of mode from mean and
median; Graphic Method; Merits and Drawbacks; Comparison with mean
and Median.
Geometric Mean: Meaning; Calculation in various types of series;
Algebraic properties of geometric mean; Merits and Drawbacks.
Harmonic Mean: Meaning; Calculation; Reciprocal character of
arithmetic average and harmonic mean; Merits and Drawbacks.
Other Averages: Quadratic mean; Moving average; Progressive average;
Relation between different averages; Selection of an average;
Limitations of averages.
Weighted Average: Need and meaning; Calculation of weighted
arithmetic average by direct and short-cut methods; When should
weighted mean be used; Weighted geometric and harmonic means;
Questions.
Chapter 10-Measures of Dispersion-Need and meaning; Measures of
dispersion.
Range: Its merits, demerits and uses.
Inter-Quartile and Semi-Inter-Quartile Range: Calculation in various
types of series; Merits and Drawbacks.
Mean Deviation: Meaning; Calculation in various types of series by
direct and short-cut methods; Characteristics and uses of mean
deviation.
Standard Deviation: Meaning; Calculation in various types of series by
direct and short-cut methods; Charlier's check of accuracy; Sheppard's
corrections for grouping; Standard Deviation and the spread of
Observations; Mathematical properties of standard deviation; Merits,
demerits and uses.
Other Measures of Dispersion: Modulus; Precision; Probable Error;
Variance; Co-efficient of Variation; Gini's Mean Difference.
Relationship between various measures of dispersion; Choice of a
measure of dispersion; Lorenz curve; Questions.
Chapter 11-Moments, Skewness and Kurtosis-
Moments: Meaning; Calculation of various moments about the mean;
β and γ co-efficients.
Skewness: Need and meaning; Tests of skewness; Measures of skewness;
First and Second measures of skewness; Positive and Negative skewness.
Kurtosis: Meaning and calculation; Dispersion, Skewness and Kurtosis
contrasted; Questions.
Chapter 12-Index Numbers-Need and Meaning.
Wholesale Price Index Numbers: Technique of construction; Selection of
items and obtaining quotations; Selection of Base; Fixed and Chain
Base; Price relatives and link relatives; Problem of weighting.
Cost of Living Index Numbers: Need; Difficulties in construction;
Aggregate expenditure method; Family budget method; Sources of errors
in cost of living index numbers.
Indices of Industrial Production: Need and technique of construction.
Indices of Business Conditions: Need and technique of construction.
Relationship between fixed base and chain base index numbers; Base
shifting; Splicing of index numbers; Deflating of index numbers;
Reversibility Tests; Time Reversal Test; Factor Reversal Test; Circular
Test; Problem of an Ideal Index Number; Various Formulae used in
construction of index numbers; Uses and Limitations of Index Numbers;
Questions.
Chapter 13-Diagrammatic Representation of Data-Need and usefulness;
Characteristics of and rules for drawing diagrams; Various types of
diagrams: Simple, multiple and sub-divided bars; Rectangles; Squares;
Circles; Cubes; Pictograms; Cartograms; Questions.
Chapter 14-Graphic Representation of Data-Construction of graphs.
Graphs of Time Series: Absolute Historigrams; False Base line; Method
of showing Range-Zone Graph; Method of showing differences; Band
Graphs; Zee-chart.
Graphs of Frequency Distributions: Bar Frequency curves; Discontinuous
curves; Continuous curves.
Theoretical Frequency Curves: Normal curve of error; Moderately
asymmetrical frequency curves; Extremely skew curves.
Cumulative Frequency Curves: Less than and more than curves; Galton's
method of locating median.
Graphs on Ratio Scale: Semi-logarithmic curves; Reading graphs on
ratio scale; Special features of ratio scale.
Linear Relationships.
Non-Linear Relationships: Parabolic and Hyperbolic curves; Exponential
Curves; Questions.
Chapter 15-Analysis of Time Series-Meaning and need; Secular Trend;
Seasonal Variations; Cyclical Fluctuations; Irregular Fluctuations.
Measurement of Trend: Curve fitting by inspection; Moving average
method; Curve fitting by mathematical equations; Method of Least
Squares; Fitting curve of the power series; Parabolic curve.
Measurement of Seasonal Fluctuations: Seasonal Variation Index (by
monthly averages); Seasonal Variation Index (by moving averages);
Method of Link Relatives.
Measurement of Cyclical and Irregular Fluctuations; Questions.
Chapter 16-Correlation-Meaning; Scatter diagram; Correlation graph.
Coefficient of Correlation: Karl Pearson's formula and its proof;
Calculation of the Coefficient in various types of series by direct and
short-cut methods; Correlation in time series: long-time changes,
short-time oscillations and cyclical fluctuations; Correlation in
grouped data; Probable Error of the Coefficient of Correlation in
Interpretation of Correlation; Correlation and method of Least Squares;
Rank Correlation; Co-efficient of Concurrent Deviations; Correlation
Table; Lag and Lead Correlation and Determination; Questions.
Chapter 17-Regression and Ratio of Variation-Meaning and use;
Regression equations; Regression Lines; Regression Coefficient; Ratio of
Variation; Galton's Graph and its interpretation; Questions.
Chapter 18-Theory of Attributes and Consistence of Data-Meaning;
Classification of Data; Rules for testing consistence of Data;
Incomplete Data; Questions.
Chapter 19-Association of Attributes-Expected values; Criterion of
Independence; Association; Complete association and disassociation;
Intensity of association; Chance association; Coefficient of
association; Coefficient of Colligation; Partial association; Illusory
association; Manifold Classification; Association in Contingency
tables; Coefficient of contingency; Tschuprow's Coefficient; Questions.
Chapter 20-Interpolation-Meaning and need; Assumptions; Methods of
interpolation.
Graphic Methods: In continuous time series; in series showing
periodicity; in correlated series.
Algebraic Methods: Methods of curve fitting; Methods of finite
differences; Newton's Formulae; Newton-Gauss Formula; Stirling Formula;
Newton-Gauss (Backward) Formula; Direct Binomial Expansion method;
Lagrange's Formula; Questions.
Chapter 21-Business Forecasting-Meaning and Need; Basis; Technique;
Business Barometers; General Assumptions; Theories of Business
Forecasting; Time-lag or Sequence Theory; Action and Reaction Theory;
Specific Historical Analogy Theory; Cross Cut Analysis Theory; Utility
of Business Forecasting; Questions.
Chapter 22-Interpretation of Data-Meaning and Need; False
generalizations; Wrong interpretation of statistical measures like
Index Numbers, Correlation and Association, etc.; Effect of wrong
interpretations of data; Questions.
Chapter 23-Probability-Permutation and combination; Calculation of
probability; Simple events; Compound events; Multiplication of
probabilities; Addition of probabilities; Use of binomial theorem;
Mathematical Expectation; Inverse probability; Questions.
Chapter 24-Theoretical Frequency Distributions-
Binomial Distribution: Meaning; Characteristics; Binomial expansion;
General form of binomial distribution; Comparison of actual and
expected frequencies; Mean and standard deviation of binomial
distribution.
Normal Distribution: Meaning; Properties of a normal curve; Equation
of normal curve; Basic assumptions of normal curve.
Poisson Distribution: Meaning; Equation of Poisson distribution;
Assumptions in Poisson distribution; Questions.
Chapter 25-Theory of Sampling-Meaning and use; Types of Universes;
Objects of Sampling; Precision in Sampling; Types of Sampling; Bias in
Sampling.
Selecting Random Sample: Lottery method; Serial, Geographical or
Alphabetical arrangement; Random numbers.
Selecting Purposive and Mixed Samples: Conducting a sample enquiry;
Questions.
Chapter 26-Sampling of Attributes-Simple sampling; Mean and Standard
Deviation in Simple Sampling of attributes; Standard Errors; Standard
Error and Size of Sample; Standard Error and Precision; Standard Error
of the difference between proportions of two Samples; Questions.
Chapter 27-Sampling of Variables (Large Samples)-Nature of the
problem; Sampling Distribution; Standard Error of Simple Sampling of
Variables; Standard Error of the Mean; Standard Error of the Median,
Quartiles, Deciles, etc.; Standard Error of Mean Deviation, Standard
Deviation, Quartile Deviation, Variance, Coefficient of Variation and
Co-efficient of Skewness; Standard Error of the Co-efficient of
Correlation, Regression and Association.
Standard Error of the Difference of Sample Means, Sample Medians and
Sample Standard Deviations; Questions.
Chapter 28-Chi-Square Test and Goodness of Fit-Meaning; Degrees of
Freedom; Levels of Significance; Formulae of Calculation; Expected
Values; Conditions for the application of Chi-square Test; Additive
property of χ²; Chi-square Distribution; Goodness of Fit; Questions.
Chapter 29-Sampling of Variables (Small Samples)-Need of Separate
Analysis; Tests of Significance; Null Hypothesis; Significance of a
Sample Mean; Form of "t" distribution.
Significance of Difference between Two Sample Means; Significance of
the Coefficient of Correlation; Z-transformation; Significance of the
difference between two Sample Coefficients of Correlation; Questions.
Chapter 30-Analysis of Variance-Meaning and Use; Total Variation;
Variance between the samples; Variance within the Samples; Formula of
Calculation; Short-cut method; Table Values of F; Questions.
Chapter 31-Designs of Experiments-Meaning and need; Experimental
Designs; Comparison in pairs; Latin Squares; Factorial Designs;
Questions.
Chapter 32-Statistical Quality Control-Meaning; Purpose.
Process Control: Evolution; Control Chart Technique; Chart of
Industrial observance; Chart of averages; Chart of Range; Control Chart
and analysis of variance; Control Chart for defects per unit; Selection
of Control limits.
Acceptance Inspection: Meaning and Technique; Sampling Technique;
Sampling Plans; Single, Double and Sequential Sampling; Average
Sampling; OC curve; Conclusion; Questions.
Chapter 33-Growth of Statistics in India-
Section 1: Statistical Organization: Early beginnings; 18th Century;
19th Century; 20th Century; Present Position; Improvement in
Methodology, Scope and Coverage.
Section 2: Population Statistics: Census procedure up to 1931; Changes
in 1941; Census of 1951; Information Collected; General Criticism of
Indian Population Census; Census of 1961: some Suggestions.
Vital Statistics: Shortcomings; Suggestions.
Demographic Surveys.
Utility of Population Statistics.
Section 3: Agricultural Statistics: Area Statistics; Temporarily
settled areas and Permanently settled areas; Yield Statistics;
Traditional Method; Random Sampling Method; Crop-estimates; Land
Utilization Statistics; Publications on Agricultural Statistics;
General Shortcomings of Agricultural Statistics.
Indices of Agricultural Production: Reserve Bank of India Index;
Eastern Economist Index; F.A.O. Index.
Miscellaneous Agricultural Statistics: Livestock Statistics; Statistics
of Holdings; Forest Statistics; Statistics of Mines and Minerals.
Section 4: Industrial Statistics: Early Statistics; Present Position;
Annual Census of Manufactures; Statistics of Industrial Output.
Indices of Industrial Production and Profits: Eastern Economist Index;
Index issued by Ministry of Commerce and Industry; Capital Index of
Industrial Activity; Index Number of Industrial Profits. Employment
Statistics; Trade Union Statistics; Industrial Dispute Statistics;
Social Security and Labour Welfare Statistics.
Section 5: Price Statistics: Harvest Prices; Other Prices; Publications
containing price statistics.
Price Index Numbers: Index Number of Harvest Prices; Economic Adviser's
Index of Wholesale Prices; Economic Adviser's new (Revised) Index of
Wholesale Prices; Calcutta Wholesale Price Index Number; Labour Bureau
Index Number of Retail Prices; Consumer Price Index Numbers compiled by
the Labour Bureau and various States; Bombay Working Class Cost of
Living Index; Kanpur Working Class Cost of Living Index.
Indices of Security Prices: Economic Adviser's Series; Old Series of
the Reserve Bank; New Series of the Reserve Bank.
Section 6: Wage Statistics: Publications containing Wage Statistics;
Labour Bureau Index of Earnings of Factory Workers.
Agricultural Wages.
Section 7: Trade Statistics: Publications containing statistics of
Inland and Foreign (Sea, Air and Land) Trade of India and their
detailed study.
Section 8: Financial Statistics: Publications containing financial
statistics and their study.
Section 9: National Income Statistics: Important Methods of
Calculation; Difficulties in the calculation of India's National
Income; Technique suitable to Indian conditions; Estimate of India's
National Income; Special Features of India's National Income.
Section 10: National Sample Survey: Beginning; Method; First round;
Subsequent rounds; Assessment of results; Information collected.
Section 11: Present Position: Shortcomings and Suggestions; Questions.
Mathematical Tables
1. Meaning and Definition of Statistics
MEANING

"Statistics", in its modern connotation, "is a body of methods for
making wise decisions in the face of uncertainty." (Wallis and Roberts)
It is used for the collection, analysis and interpretation of data in order
to provide a basis for making correct decisions. This concept of
statistics is very different from the sense in which the word was
originally used.
As its name implies, the word Statistics was originally applied only
to such facts and figures as the State required for its official purposes.
The word has since acquired a wider meaning, so that it now embraces
any set of quantitative data relating to a particular phenomenon, irres-
pective of the fact whether the data are of interest to the State or not.
The same word is also used, not only for the material which is analysed,
but also for the methods applied in its analysis. Thus in recent times
the word statistics has come to be used in two senses: as numerical data
and as statistical methods.
As numerical data
In common parlance the word Statistics denotes some numerical
data. If, for example, somebody says that he has studied the statistics
of man-hours lost by the Indian cotton mills due to strikes in the year
1955 or that he has seen the statistics of automobile accidents in the
U.S.A., he refers to the numerical figures or data relating to these
phenomena. In this sense, statistics are numerical descriptions of the
quantitative aspects of things. They take the form of counts or
measurements. Statistics about the membership of a certain hostel, for
example, include a count of the number of members and separate counts
of the number of members of various kinds, as postgraduate and
undergraduate or over and under 21 years of age. They might include such
measurements as the weights and heights of the members or the numbers
computed from such counts or measurements, for example, the
proportion of members who are married or the ratios between weights and
heights. The use of the word statistics in this sense is always in plural.
However, any figure or set of figures cannot be called statistics irres-
pective of any other consideration. Many things are taken into account
before using the word statistics for any group of figures. We shall
discuss these a little later.
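The hostel illustration above can be sketched in a few lines of Python. This is only a minimal sketch: the five member records, the field order and the variable names are all hypothetical and are not taken from the text; they merely show how counts, separate counts by kind, a proportion and ratios are computed from the same raw figures.

```python
from fractions import Fraction

# Hypothetical hostel records (not from the text):
# (status, age in years, married?, weight in kg, height in cm)
members = [
    ("postgraduate", 24, True, 62.0, 170.0),
    ("undergraduate", 19, False, 55.0, 165.0),
    ("undergraduate", 20, False, 58.5, 172.0),
    ("postgraduate", 23, False, 70.0, 180.0),
    ("undergraduate", 22, True, 66.0, 168.0),
]

# A count of the number of members.
total = len(members)

# Separate counts of members of various kinds.
postgraduates = sum(1 for m in members if m[0] == "postgraduate")
undergraduates = total - postgraduates
under_21 = sum(1 for m in members if m[1] < 21)

# A number computed from counts: the proportion of members who are married.
married = sum(1 for m in members if m[2])
proportion_married = Fraction(married, total)

# Numbers computed from measurements: ratio of weight to height per member.
weight_height_ratios = [m[3] / m[4] for m in members]

print(total, postgraduates, undergraduates, under_21, proportion_married)
```

Each printed figure is a statistic in the plural sense discussed above; none of them means much alone, but together they describe the group.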
The use of the word statistics in the above sense is, in our opinion,
not very correct. A more appropriate word to indicate numerical
facts is data and as far as possible this word should be used in place of
statistics in this sense.
Statistical methods
The second sense in which the word statistics is used refers to
the statistical principles and methods used in collection, analysis and
interpretation of data. In this sense the word is used in singular.
Statistical methods (or statistics) have a very wide range. They include
not only simple and commonly known devices of comparison and
analysis, but also highly technical and mathematical formulae which
are capable of being understood only by experts who have received
special training in this subject.
Statistical methods and experimental methods. Statistical methods
include all those devices which are used in collection and simplification
of numerical data so as to render them capable of being analysed and
commonly understood without much difficulty. Statistical methods are
different from experimental methods in as much as the latter are more
accurate and precise than the former. In experimental methods it is
possible for us to study the effects of any one of the many factors
affecting a phenomenon individually by making the other factors inoperative
for the time being. Thus in physics it is not difficult to study the effects
of, say, only heat on the density of air by making other factors in-
operative for the duration of study. But the same thing is not possible
in statistical methods. It is not feasible to study the effects of, say,
only inflation on prices. The effects of inflation cannot be separately
studied from the effects of many other factors like demand, supply,
exports and imports, etc. However, by the use of statistical methods
it is possible to have a rough idea of the effects of inflation upon prices.
Statistical study cannot be as accurate as the study done by experimental
methods. Thus we see that statistical methods are comparatively less
accurate and are usually applied in inexact sciences like sociology though
even in physical sciences (which are classed as exact sciences), the use
of these methods is sometimes necessary. Statistical methods are thus
of universal application though their primary field is social sciences.
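The contrast drawn above between the two kinds of method can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration only: the linear price formula, its coefficients and the observed figures are hypothetical and do not come from the text. The point is merely that when a second factor cannot be held inoperative, the apparent effect of the first factor is only a rough guide to its true effect.

```python
def price(inflation, demand):
    """Hypothetical price model, assumed only for this illustration."""
    return 100 + 2.0 * inflation + 0.5 * demand

# Experimental method: demand is held fixed (made "inoperative"),
# so the whole change in price is due to inflation alone.
controlled_effect = (price(5, 40) - price(3, 40)) / (5 - 3)

# Statistical method: we can only observe what happened, and here
# demand rose at the same time as inflation.
observed_effect = (price(5, 48) - price(3, 40)) / (5 - 3)

print(controlled_effect)  # effect of inflation with demand held fixed
print(observed_effect)    # apparent effect when demand moved as well
```

Under these assumed figures the controlled comparison recovers the effect of inflation exactly, while the observed comparison mixes it with the effect of demand.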
Thus "Statistics are numerical facts, but statistics is a body of
methods for making decisions when there is uncertainty arising from
the incompleteness or the instability of the information available. The
decisions may be made either for the practical purpose of selecting
a course of action or for the scientific purpose of gaining general
knowledge."
DEFINITION
The term Statistics has been defined differently by different authors.
Some authors have defined the word as used in the first sense
(of numerical data) while others have defined it as used in the second
sense (of statistical methods or the science of statistics).
First Type
Of the first type of definitions the one given by Horace Secrist is
the most exhaustive. It is as follows:
"By statistics we mean aggregates of facts affected to a marked
extent by multiplicity of causes, numerically expressed, enumerated
or estimated according to reasonable standards of
accuracy, collected in a systematic manner for a pre-determined
purpose and placed in relation to each other."
This definition makes it clear that statistics (as numerical data)
should possess the following characteristics:
(i) They should be aggregates of facts. Single and unconnected
figures are not statistics. A single age of 25 years or 40 years is not
statistics but a series relating to the ages of a group of persons would
be called statistics. A single figure relating to birth, death, purchase,
sale, accident, etc., does not form statistics though aggregates of figures
relating to births, deaths, purchases, sales, accidents, etc., would be
called statistics because they can be studied in relation to each other and
are capable of comparison. It is possible to study them in relation to
time, place and frequency of occurrence.
(ii) They should be affected to a marked extent by multiplicity of causes.
Usually statistical facts are not traceable to a single cause. Since
statistics are most commonly used in social sciences it is only natural that
they are affected by a large variety of factors at the same time. It is
usually not possible to study the effects of any one of these factors
separately as is the case in experimental methods. In statistical methods
the effects of various factors affecting a particular phenomenon are
generally studied in a combined form though attempts are also made
to study the effects of different sets of factors separately as well. Most
of the statistics, however, are affected to a considerable degree by
multiple causation. For example, statistics of prices are affected by
conditions of supply, demand, exports, imports, currency circulation and
a large number of other factors.
(iii) They should be numerically expressed. Qualitative expressions
like good, bad, young, old, etc., do not form part of statistical studies
unless a numerical equivalent is assigned to each such expression. If
it is said that the production of wheat per acre in 1953 was 100 maunds
and in the year 1954 it was only 60 maunds or if it is said that of two
persons A and B, A is 20 years old and B 60 years old, we shall be
making statistical statements.
(iv) They should be enumerated or estimated according to reasonable
standards of accuracy. Numerical statements can either be enumerated,
in which case they are supposed to be accurate and precise, or they can
be estimated by some expert observers. Where the scope of statistical
enquiry is very wide or where the numbers are very large, enumeration
is usually out of the question and in such cases figures can only be estimated.
It is obvious that estimated figures cannot be absolutely accurate and
precise. The degree of accuracy expected in such figures depends to
a large extent on the purpose for which statistics are collected and also
on the nature of the particular problem about which data are being
collected. There cannot be a uniform standard of accuracy for all
types of enquiries. For example, if the heights of a group of individuals
are being measured it is all right if the measurements are correct to the
tenth of an inch but if we are measuring the distance from Bombay
to Calcutta, a difference of even a few furlongs can be easily ignored.
(v) They should be collected in a systematic manner. If figures are
collected in a haphazard fashion one can never be sure about the degree
of accuracy of such data. It is, therefore, essential that statistics must
be collected in a systematic manner so that they may conform to
reasonable standards of accuracy.
(vi) They should be collected for a predetermined purpose. It is obvious
that if statistical data are not collected with some predetermined aim
their usefulness would be almost negligible. Figures are usually
collected with some end in view, as without it all the efforts made in the
collection of figures would be completely wasteful and the figures so collected
would not be in any way useful.
(vii) They should be placed in relation to each other. Statistics are
collected mostly for the purpose of comparison. If the collected figures
are not capable of being compared with each other they lose a very
large part of their value. It is necessary that the figures which are
collected should be a homogeneous lot because it is not possible to
compare figures which are of a heterogeneous character and which
cannot be placed in relationship to each other. If, for example, the
height of a person and the money spent by him in getting his house
constructed are placed together it does not make any sense and the figures
cannot be compared to each other. Such figures naturally do not come
under the category of statistics.
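The point made under characteristic (iv), that no uniform standard of accuracy applies to all enquiries, can be illustrated by comparing errors in relative terms. The sketch below is only an illustration: the height of 66 inches, the distance of 1,000 miles and the reading of "a few furlongs" as 3 furlongs are assumed round figures, not values given in the text.

```python
def relative_error(absolute_error, true_value):
    """Return the error as a fraction of the quantity being measured."""
    return absolute_error / true_value

# Assumed figures, for illustration only (none of them come from the text).
FURLONGS_PER_MILE = 8
height_in = 66.0                            # height of a person, in inches
height_err_in = 0.1                         # "correct to the tenth of an inch"
distance_mi = 1000.0                        # a long inter-city distance, in miles
distance_err_mi = 3 / FURLONGS_PER_MILE     # "a few furlongs", taken here as 3

height_rel = relative_error(height_err_in, height_in)
distance_rel = relative_error(distance_err_mi, distance_mi)

print(f"height:   {height_rel:.4%} of the quantity measured")
print(f"distance: {distance_rel:.4%} of the quantity measured")
```

Under these assumed figures the error of a few furlongs is, proportionately, smaller than the error of a tenth of an inch, which is why it can be ignored in the one enquiry while the other demands finer measurement.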
Webster has also defined statistics in the same sense in which Secrist
has defined it. Webster's definition of statistics is as follows:
"Statistics are the classified facts respecting the conditions of
the people in a state ... specially those facts which can be stated
in numbers or in tables of numbers or in any tabular or classified
arrangement."
This definition is rather narrow. It confines statistics only to
those facts which relate to the condition of the people in a state. This
was a very old concept of the word statistics and it does not suit modern
conditions. At present statistics relate to all aspects of human activity
and as such this definition falls short of the modern concept of the term.
Moreover, this definition is not as clear and exhaustive as the one given
by Secrist.
Second Type
Of the second type of definitions of the term statistics (as
statistical methods or science of statistics) the one given by Seligman is very
short and simple and yet quite comprehensive. According to Seligman,
"Statistics is the science which deals with the methods of collecting,
classifying, presenting, comparing and interpreting numerical data
collected to throw some light on any sphere of enquiry."
According to King, "the science of statistics is the method of
judging collective, natural or social phenomenon from the results obtained
from the analysis or enumeration or collection of estimates." This
definition is not very exhaustive and it limits the scope of the science
of statistics. The author himself admits this defect but is of the view
that for practical purposes it is all right.
A. L. Bowley has given a series of definitions but most of the
definitions given by him are not complete and lay emphasis only on
some of the aspects of the science. At one place Bowley says, "Statistics
may be called the science of counting." At another place he is of
the view that "Statistics may rightly be called the science of averages".
Both these definitions are defective as the science of statistics does not
confine itself either to counting or to averaging alone. These are no
doubt important statistical methods but they do not cover the entire
field of the science of statistics. Yet another definition given by the
same author characterises statistics as "the science of measurement of
the social organism regarded as a whole in all its manifestations."
Obviously this definition limits the application of the statistical methods
to only one field, namely, sociology. Bowley realised this limitation
and he himself writes at another place that statistics cannot be confined
to any one science.
Boddington has defined statistics as the science of "estimates and
probabilities." This definition gives expression only to certain methods
by which conclusions are derived in this science. No doubt in most
of the cases statistics are estimates and probabilities but it should be
remembered that the scope of the science is not confined merely to
these things.
Lovitt defines the science as "that which deals with the collection,
classification and tabulation of numerical facts as the basis for
explanation, description and comparison of phenomena." This definition is
fairly satisfactory and it indicates that the science of statistics is a
simple and scientific exposition of statistical methods.
Having briefly discussed some of the definitions of the term
statistics and having seen their drawbacks we are now in a position to give
a simple and complete definition of the term in the following words:
Statistics (as used in the sense of data) are numerical statements of fact
capable of analysis and interpretation and the science of statistics is a study of
the principles and methods used in the collection, presentation, analysis and
interpretation of numerical data in any sphere of enquiry.
MAIN DIVISIONS OF THE STUDY OF STATISTICS
Statistics as a science can be divided into two main classes, namely,
statistical methods and applied statistics.
1. Statistical methods
Under statistical methods are studied all those devices, rules of
procedure and general principles which are applicable to all kinds or
groups of data. Thus they include all the general principles and
techniques which are commonly used in the collection, analysis and
interpretation of data relating to any sphere of enquiry. Statistical methods
are the tools in the hands of a statistical investigator. These are devices
for achieving the desired ends explained in theory. Since a method is
always a means to an end, its accuracy and precision depend on the
object which is desired to be achieved and this in turn is considerably
affected by the peculiar features of the problem to which it is related.
This is the reason why different statistical methods are used in different
types of enquiries and no uniform standard of accuracy is desired to
be achieved in different types of investigations.
2. Applied statistics
Applied statistics deal with the application of statistical methods
to specific problems or concrete forms. If we have to estimate the
national income of a country or its industrial or agricultural production
then the special techniques followed to achieve these ends and the
results obtained thereby would form part of applied statistics. As is
clear from the above explanation applied statistics can be further divided
into two main groups. They may be either descriptive or scientific.
Descriptive applied statistics deal with data which are known and
which naturally relate either to the present or to the past. For example,
business statistics are descriptive applied statistics, as they deal with
the analysis, measurement and presentation of business facts relating
to past or present. On the basis of these facts decisions about various
business problems are usually taken.
Scientific applied statistics deal with the formulation of physical
and psychological laws on the basis of quantitative data collected for
descriptive purposes by the use of appropriate statistical methods. If,
for example, by the use of some business statistics we are in a position
to derive certain conclusions, which we use for forecasting the future
trend or tendency of that particular phenomenon, we are making use
of scientific applied statistics. For purposes of business forecasting
we have to make use of such statistics.
OBJECTS OF STATISTICS

In the words of A. L. Boddington "the ultimate end of statistical
research is to enable comparison to be made between past and present results,
with a view to ascertaining the reasons for changes which have taken place and
the effect of such changes in the future."
To achieve the above mentioned ends data relating to past and
present are collected and presented in the shape of time-series from
which valuable conclusions are drawn and these conclusions are used
for the purpose of forecasting the future trend of different problems.
Collection, presentation, analysis and interpretation of statistical data
are no easy tasks. Latest statistical methods have to be applied for
arriving at correct and dependable conclusions. Researches have been
going on for improving statistical methods with a view to making them
more accurate and precise so that the laws based on the analysis of the
descriptive applied statistics may become comparatively more stable
and dependable. It is thus very obvious that the science of statistics
is very closely associated with the progress of human civilisation. It
helps in assessing the results of past achievements of human activities
and it is also useful for making forecasts about the future course of
events.

Questions
1. Explain clearly the concepts of statistics, statistical methods and statistical
sciences.
2. Examine the main differences between statistical methods and experimental
methods.
3. Critically examine the following definitions of statistics: "Statistics is a
science of counting", "Statistics is a science of averages", and "Statistics is a science
of the measurement of social organism in all its aspects". (B. Com. Agra, 1943).
4. Discuss the meaning and scope of statistics.
5. "Statistics affects everybody and touches life at many points. It is both
a science and an art." Explain the above statement with appropriate examples.
(B. Com. Agra, 1946).
6. "Statistics of a business can be treated scientifically and the preparation
and study of business statistics may be made a more exact science than the study of
national and social statistics". Explain. (B. Com. Allahabad, 1932).
7. "Science without statistics bears no fruit, statistics without science have
no root." Explain the above statement with necessary comments.
(M. A. Patna, 1943).
8. "Statistics is co-operative counting." Comment.
9. What are the characteristics that statistics (statistical data) possess? Explain
with illustrations.
10. What are the main divisions of statistics? Illustrate with examples.
11. Write a note on the objects of statistics.
12. "Statistical methods include all those devices of analysis and synthesis by
means of which statistics are scientifically collected and used to explain or describe
phenomena either in their individual or related capacities". Comment on the above
statement.
13. Explain with illustrations how statistical methods tend to clarity of thought,
accuracy of estimates, verification of theories and discovery of relations.
(B. Com. Agra, 1947).
14. "By statistics we mean quantitative data affected to a marked extent by a
multiplicity of causes". Explain. (M. Com. Agra, 1945).
15. In what ways can statistical methods be misused by interested persons?
Give at least two examples of the misuse of statistics.
16. "A statistician is not an alchemist expected to produce gold from any
worthless material." Comment.
Origin and Growth of
Statistics 2
Early Beginnings
The origin of statistics is suggested by the derivation of this word.
It seems to have been derived from the Latin word status which means
a political state. In fact the origin of statistics was due to administrative
requirements of the state. Statistics in the past were a by-product of
administrative activity. The administration of the states required the
collection and analysis of data relating to population and material wealth
of the country for purposes of war and finance. The earliest forms of
statistical data, therefore, relate to censuses of population and property.
Collection of data for other purposes, however, was not entirely ruled
out. Perhaps one of the earliest censuses of population and wealth
was held in Egypt as early as 3050 B. C. for the erection of pyramids.
Rameses II conducted a census of all lands of Egypt. During the Middle
Ages such censuses were held in England, Germany and other Western
countries as well. In India about 2000 years ago we had an efficient
system of collecting administrative statistics. During the Hindu period,
particularly during the Mauryan regime, our country had an efficient
system of collecting vital statistics and of the registration of births and
deaths. Ain-i-Akbari gives us a detailed account of the administrative
and statistical survey conducted during the reign of Emperor Akbar.
The histories of the other countries of the world also clearly indicate
that in ancient times statistics was regarded as a matter connected with
the activities of the state and that is why it was known as a science of
statecraft. The systematic collection of official statistics originated in
Germany towards the end of the eighteenth century. In its earliest
form it was an attempt to assess, for political purposes, the relative
strengths of the German states by comparing population, industrial and
agricultural output. In England, statistics is a legacy of the Napoleonic
Wars. In order to raise new taxes that the cost of the war demanded,
it was found necessary to collect such facts and figures which would
enable government to have an idea about the probable revenue and
expenditure more accurately.

Sixteenth Century
These spasmodic attempts made in ancient times to collect certain
facts and figures can be left out of account as in those days statistical
methods were not properly developed and we do not know the
technique by which these figures were collected. Most of these figures
are not available and all that we know is that such statistics were collected
in those days. It has been only within comparatively recent times that
mankind has realised the utility and usefulness of collecting statistics
relating to the phenomena of physical and social universe. Prior to it,
the astronomers used to record the movements of heavenly bodies like
stars and planets to foretell their position and to make forecasts about
eclipses. Tycho Brahe (1546-1601) collected valuable information about
the movements of planets and Johannes Kepler made an exhaustive study
of these data and discovered the three famous laws relating to the
movement of planets. It was on the basis of these laws that Sir Isaac Newton
formulated his theory of gravitation. Sir Francis Bacon (1561-1626)
was of the opinion that a proper knowledge of nature can be obtained
only on the basis of the study of data relating to various forms of nature,
and under his influence this method was adopted by scholars in various
fields. When these methods proved their efficacy in physical sciences
and when it was found that the results obtained by the use of these devices
were very accurate, social sciences like politics, economics and sociology
all adopted statistical methods for the formulation of their theories
and for testing the degree of accuracy achieved by them.

Seventeenth Century

During the seventeenth century the methods of statistical science
were used under the banner of Political Arithmetic. Captain John Graunt
of London (1620-1674) was the first person who studied statistics of
births and deaths and he is often referred to as the 'Father of Vital
Statistics'. It was during this period that figures relating to births, deaths
and marriages were collected by other persons also, specially by the
preachers of the Protestant Churches with a view to check
illegitimacy prevailing in those days. During this period Edmund Halley
prepared the first life table giving the expectation of life at each age on
the basis of data collected by Caspar Neumann, in 1691, relating to death
records of Breslau. Sir William Petty (1623-1687) also prepared
mortality tables and calculated expectation of life at different ages. Later
on James Dodson, Thomas Simpson, Dr. Price and others also computed
mortality tables and it was during this period that the idea of life
insurance was developed. The first life insurance institution was founded
in London in the year 1698.
Even in the early 18th century statistical methods were used in the
same old name of political arithmetic. J. P. Süssmilch (1707-1767),
who was a Prussian clergyman, statistically explained the theory of
'Natural Order of Physiocratic School'. He developed the doctrine
that the ratio of births and deaths remains more or less constant and
that it is a kind of natural law.
Jacob Bernoulli (1654-1705), in his great work Ars Conjectandi,
published eight years after his death, was the first person to state the 'Law
of Large Numbers' and S. Poisson (1781-1840) also contributed a brilliant
paper on this subject.
Eighteenth Century
The modern theory of statistics can be said to have been formulated
by L. A. J. Quetelet (1796-1874). He put forward the notion of 'average
man' whose actions, he stated, conform to the average results obtained
from society. He was further of the opinion that the action and
behaviour of other persons deviated from this form in a lesser or greater
degree and these deviations from this theoretical average were capable
of being treated by the method of errors and probability. He also
emphasised the importance of the 'law of large numbers' which was
founded by Jacob Bernoulli.
In fact the science of statistics is highly indebted to the games of
chance. G. Cardano (1501-1576), who was a great mathematician and
at the same time a big gambler also, wrote a valuable treatise on the
hazards of the games of chance and he pronounced certain rules by which
the risks of gambling could be minimized and one could protect
himself against cheating. These rules were based on the correct approach
to the problems which we, in modern times, study under the theory
of probability. Jacob Bernoulli and his nephew Daniel Bernoulli
(1700-1782) laid a solid foundation of the theory of probability and put forward
the idea of 'moral expectation'. It was after this that Pierre Simon de
Laplace (1749-1827) published, in 1812, his monumental work on the
theory of probability. This work is recognised as one of the best ever
done on the subject of probability. It is both mathematical as well as
philosophical. Later on most of the prominent mathematicians of
the eighteenth and nineteenth centuries like Moivre, Fourier, Lagrange,
Chrystal, Bayes, Todhunter, Gauss, Morgan, Lexis and Charlier, to mention
only a few names, contributed to the subject of probability.
Nineteenth and Twentieth Centuries
On these foundations laid by the mathematicians of the eighteenth
and nineteenth centuries the modern theory of statistics was gradually built
up. G. F. Knapp (1842-1926) and W. Lexis (1837-1914) contributed
valuable works on the statistics of mortality. Sir Francis Galton
(1822-1911) was the first to introduce statistical methods in the field of
metry. Later on Karl Pearson took up this chain and his work on the
subject is too well known to need any detailed description. In the words
of Pearson himself, "the whole problem of evolution is a problem of
vital statistics, a problem of longevity, of fertility, of health, of disease
and it is impossible for the evolutionist to proceed without statistics as it
would be for the Registrar General to discuss the National Mortality
without an enumeration of the population, a classification of deaths and
a knowledge of statistical theory."
It was in the second half of the last century and in the present
century that statistical methods entered the realm of the science of
economics and became intimately associated with the ancient subject of
mathematics. Though the relationship of statistics and mathematics is
very old yet it is only during the last 100 years or so that the two sciences
have come very close to each other. In recent years the domain of
statistical methods has considerably widened and today there is hardly
any science which does not make use of statistical methods. The science
of statistics is now associated with all other sciences in some form or
the other and we shall now study the relationship of statistics with other
sciences, particularly with economics and mathematics. For the past
two decades particularly there has been a remarkable and sustained
growth in the use of statistics. This is because business, government
and science, three fields in which applications of statistics are most
numerous and diverse, are growing in volume and complexity. It is
also because of the technological revolution which has taken place in
data handling, affecting especially computing and tabulating equipment,
and a scientific revolution in statistical theories and techniques.
RELATIONSHIP OF STATISTICS WITH OTHER SCIENCES
Statistics and Economics
Though the relationship of statistics with economics dates back to
1690 when Sir William Petty published his book named "Political
Arithmetic" yet the relationship of these two sciences became intimate rather
very late. No doubt statistical data about economic problems used to be
collected in the past but there was no relationship between statistics and
economic theory. In earlier stages of development the science of
economics was based on deduction and the predominance of deductive
approach was responsible for the disinterest of economists towards
quantitative data for purposes of the development of economic
doctrines. Besides this, there was also a tendency in those days to avoid
figures which were considered to be lifeless, rude and coarse. What
was responsible for this peculiar disposition to figures in those days is
difficult to state. It is a fact that people wanted to avoid rude shocks
which awaited them in the world of facts and always wanted to be vague
in their statements and logic. Gradually this hatred for figures melted
away and even deductive writers like J. S. Mill admitted that "in some
cases instead of deducing our conclusions from reasoning and verifying
them from observations we begin by obtaining them provisionally from
specific experience and afterwards connect them with the principles of
human nature by a priori reasoning." Similarly in 1871 W. S. Jevons
wrote that "the deductive science of economy must be verified and
rendered useful from the purely inductive science of statistics. Theory
must be invested with the reality of life and fact. Political economy
might gradually be erected into the exact science, if only commercial
statistics were far more complete and accurate than they are at present
so that the formulae could be endowed with the exact meaning by the
aid of numerical data." Jevons developed the technique of analysis of
time-series and was the pioneer in the field of price studies and index
numbers. Rightly he has been called the 'Father of Index Numbers'.
Besides Jevons the Historical School (1843-1883) also brought statistics
and economics close to each other. In fact Roscher, Knies and
Hildebrand all were of the opinion that economic doctrines should not be
argued in the abstract and that they should be inductively verified. The
effect of the preachings of the Historical School was indeed very great and
the science of economics no more remained merely deductive in its
approach. By the time the present century began, much of the
opposition to the use of statistical methods in the realm of economics had
ended and in 1907 Alfred Marshall could write, "Disputes as to methods
have ceased. Qualitative analysis has done the greater part of its work ...
that is to say, there is general agreement as to the characteristic and
duration of the changes which various economic forces tend to produce.
Much less progress has been made towards the quantitative
determination of the relative strength of different economic forces ... that higher
and more difficult tasks must wait upon the slow growth of thorough
realistic statistics." At the same time Pareto wrote, "The progress of
political economy in the future will depend in great part upon the
investigations of empirical laws derived from statistics which will then be
compared with known theoretical laws or will suggest derivation from
them of new laws." Later on Lord Keynes, writing about the functions of
statistics, wrote that it is "first, to suggest empirical laws, which may or may
not be capable of subsequent deductive explanations; and, secondly, to
supplement deductive reasoning by checking its results and submitting
them to the test of experience." Now there are no two opinions about
the fact that both induction and deduction are necessary for the growth
and development of economic science. In fact statistics and economics
are so intermixed with each other now that the question of their
separation does not arise.
Factors responsible for closer ties between economics and statistics. Since
1890 two factors have worked together to bring about this great change in
the relationship of statistics and economics. The first is the
development of statistical methods (of probability and sampling, simple and
partial correlation and association, periodicity and index numbers, etc.);
the second is the enlargement of statistical material in recent years. In
fact during this period various eminent statisticians like C. B. Davenport,
A. L. Bowley, W. Pearson, W. I. King and R. A. Fisher have made very
valuable contributions towards the development of the science of
statistics. During this period the statistical data have also increased in
quantum all over the world on account of the establishment of statistical
bureaus in various countries. The improvement of statistical methods
and the expansion of statistical data have thus brought economics and
statistics very close to each other and have marked the real inception of
statistics in the domain of the science of economics.
Statistics, economics and mathematics
It has already been mentioned above that statistics and
mathematics have been closely in touch with each other ever since the
seventeenth century when the theory of probability was found to have bearing on
various statistical methods. During the last 100 years or so not only
have statistics and mathematics come very close to each other due to the
development of mathematical statistics, but these sciences have been
joined by economics as well and now there is a happy union between
statistics, economics and mathematics. Mathematics has considerably
helped in the development of economic theories and now mathematical
economics has become a very important branch of the science of
economics. In December 1930, the first Econometric Society was founded in
the United States of America and this was a sort of recognition of the
union of these great sciences. The purpose of the society was officially
defined as "the advancement of economic theory in its relation to statistics
and mathematics ..." The aim of econometrics is to make economics
a more realistic and practical science. Mathematical approach to economic
theories makes them more precise and logical and similarly statistics give
a quantitative conclusion about the validity of purely theoretical concepts.
Thus we see that in modern times economics, statistics and mathematics
are very much intermixed with each other and this union has proved
helpful for the development and progress of all these sciences.
Statistics and Astronomy
Statistics, as we have already mentioned earlier, were first collected
by astronomers for the study of the movement of stars and planets.
In fact there are a few things which are common between physical sciences
and statistical methods, and astronomers applied statistical methods for
the furtherance of their studies. The method of least squares was first
developed by an astronomer. Astronomers generally take a large
number of measurements and in most cases there is some difference between
these several observations. In order, therefore, to have the best possible
measurement they have to make use of the technique of the law of errors
in the form of the method of least squares. Besides this, even in physical
sciences usually first rough measurements are taken and later on as more
and more data are available and as the precision of instruments of
measurement increases, better estimates are put forward. In order to give an
idea about the degree of accuracy achieved, usually limits are assigned
within which the true value of the phenomenon is expected to lie. This
is in fact a purely statistical approach. Thus we find that even though
astronomy is a physical science, and statistical methods are generally not
applied in such sciences, yet they cannot be entirely ruled out of question.
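The idea of combining several discordant measurements into one best value, and of assigning limits around it, can be sketched as follows. For repeated measurements of a single quantity, the value that minimises the sum of squared deviations is the arithmetic mean. The observations below are hypothetical, and the choice of two standard errors for the limits is a common convention assumed here for illustration, not something stated in the text.

```python
import math

def sum_of_squares(observations, candidate):
    """Sum of squared deviations of the observations from a candidate value."""
    return sum((x - candidate) ** 2 for x in observations)

def least_squares_estimate(observations):
    """Return the least-squares estimate (the mean) and its standard error."""
    n = len(observations)
    estimate = sum(observations) / n
    # Sample variance with n - 1 in the denominator.
    variance = sum_of_squares(observations, estimate) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return estimate, standard_error

# Hypothetical repeated measurements of the same quantity.
observations = [10.2, 9.8, 10.1, 9.9, 10.0]

estimate, standard_error = least_squares_estimate(observations)
# Assumed convention: limits of two standard errors on either side.
lower = estimate - 2 * standard_error
upper = estimate + 2 * standard_error

print(f"best estimate: {estimate:.3f}")
print(f"limits: {lower:.3f} to {upper:.3f}")
```

Any other candidate value, slightly above or below the mean, gives a larger sum of squared deviations, which is the sense in which the mean is the "best possible measurement" under this method.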
Statistics and Biology
The development of biological theories has been found to be closely
associated with statistical methods. Professor Karl Pearson in his
Grammar of Science says that the whole doctrine of heredity rests on statistical
basis. The contention that tall fathers have in general tall sons can be
proved only by the use of statistical data and statistical methods. The
differences observed in the various generations in different zoological
species can be measured and studied only with the help of statistical
technique. Thus we see that statistical methods help in the formation
of the theory of development of human and animal life.
Statistics and Meteorology
Statistics is related to meteorology. In meteorology records are
made of temperature, humidity of air and barometrical pressures, etc.
For purposes of comparison and forecasting it becomes necessary to
average these figures and to study their trend and fluctuations. A
study of the significance of these deviations has also to be made for various
purposes. All this cannot be done without the use of statistical methods.
We thus find that the science of statistics helps meteorology in a large
number of ways.
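The operations mentioned above (averaging the records, studying their trend and examining the deviations) can be sketched in a few lines. The temperature readings are hypothetical, and the three-period moving average is only one simple way of bringing out a trend, chosen here as an assumption for illustration.

```python
from statistics import mean

def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth out fluctuations."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]

# Hypothetical daily temperature readings (degrees Celsius).
temperatures = [30.0, 32.0, 31.0, 35.0, 33.0, 36.0, 34.0]

overall_average = mean(temperatures)
trend = moving_average(temperatures, 3)
# Fluctuations: how far each reading lies from the overall average.
deviations = [t - overall_average for t in temperatures]

print(f"average: {overall_average:.1f}")
print("trend:", [round(v, 2) for v in trend])
print("deviations:", [round(d, 1) for d in deviations])
```

The deviations always sum to zero about the mean, so their significance has to be judged by their size and pattern rather than by their total.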
The above account of the origin and growth of statistics clearly
reveals the fact that the great science of statistics is associated with all
the other important sciences both physical as well as social. In fact
today the domain of statistics is very wide, it is almost universal and
it is difficult to imagine any science worth the name where statistics has
not proved its usefulness in some form or the other. Bowley was right
when he said, "A knowledge of statistics is like a knowledge of foreign
language or of algebra; it may prove of use at any time under any
circumstances."
Causes of the recent growth of Statistics. The tremendous growth in
the use of statistics, as has been shown above, can be attributed mainly to
two factors, viz., increased demand of statistics and decreasing cost of
statistics.
(i) Increased demand. There has been a phenomenal increase in the
demand for statistics in various fields. Statistics are most commonly
used by businessmen, government and scientists. The spheres of the
activities of all these three categories have increased extraordinarily in
modern times. The magnitude of business has considerably increased
resulting in an increased demand of statistics. The business in modern
times has become a very complicated affair and this fact has further
augmented the demand of statistical data. The complexity in business is
on account of numerous government regulations, labour disputes,
ever-increasing taxes and technological revolution which the business world
has witnessed in recent years.
Even more than business activities, the activities of the government
have incJ;eased both in size a.II well as in complexity. Modern states are
welfare states and they have to look after a large variety of things result-
ing in an increased demand of statistics.
Probably the most spectacular development of modern world is the
growth of scientific research. Science today is a very complex pheno-
menon and different types of researches in the field of science are of an
e~emely complex nature and they make an extensive use of statistical
data. We thus find that the demand of statistics has considerably in-
creased and this is one reason why the science or statistics is developing
so fast.
(ii) Decreasing cost. Another reason why the science of statistics
has developed so fast and has become so popular is that, on account of
a number of reasons, the cost and the time required for the collection and
analysis of data have gone down. There has been a vast improvement in
the technique of processing the data which has resulted in great economy
of both time and cost. Modern computing and tabulating machines
not only save time but money also. The development of electronic
calculators and other modern machines like desk-calculators and card
sorting machines etc., have made the task of scientists, businessmen and
administrators very easy and simple. They have resulted in a very great
economy both in terms of money as well as the time needed to do a job.
Statistical theory has also developed in modern times in such a
manner that the cost of compilation of statistical data has gone down
considerably. The theory of sampling and various designs of experiments
and statistical quality control have all contributed towards lowering
the cost of collection and analysis of statistical data.

Questions
1. Write a short essay on the origin and growth of the science of statistics and
throw light on its future.
2. Explain the relationship between economics and statistics. How far has
the use of statistical methods in economics led to its development?
(B. Com. Lucknow, 1941)
3. "Statistics are the straw out of which every other economist has to
make the bricks." (Marshall).
Explain, in the light of the above observation, the relation between economics
and statistics and discuss how far it is correct to say that the science of economics is
becoming statistical in its method. (M. Com. Allahabad, 1944).
4. Trace the association of mathematics with the science of statistics and show
how the former has considerably helped the development of the latter.
5. Discuss the relationship between statistics and various social sciences.
6. Do you think that statistical methods are of any help in physical sciences?
If so, how?
7. Write a brief essay on the relationship of economics, statistics and
mathematics.
8. Show how the science of statistics which was originally the science of
statecraft has now become the science of universal application. Do you think that statistical
methods are in reality applicable to all types of sciences?
9. How far has the growth of statistics coincided with the development of
physical and social sciences?
10. "Statistics is an apparatus by the help of which the validity of the laws of
physical and social sciences can be tested". Comment.
11. Discuss the factors responsible for the quick development of statistics in
recent years.
Importance, Limitations and
Functions of Statistics 3
Statistics and the common man. The fact that in the modern world
statistical methods are of universal applicability is in itself enough to
show how important the science of statistics is. As a matter of fact
there are millions of people all over the world who have not heard a
word about statistics and yet who make a profuse use of statistical
methods in their day-to-day decisions. Statistical methods are common
ways of thinking and hence are used by all types of persons. When
a person wishes to purchase a car or a radio and he goes through the
price lists of various companies and makers to arrive at a decision, what
he really aims at is to have an idea about the average level and the range
within which the prices vary, though he may not know a word about
these terms. When a farmer wishes to have a particular quantity of
rain in a particular season so that he may have a good crop, he has in fact
an idea of the correlation that exists between rainfall and crop yields and
the regression line of crop yields on rainfall. Again when we use a
common proverb "as you sow, so you reap" we indirectly hint that there
is a positive correlation between one's actions and achievements.
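The four ideas used informally by the buyer and the farmer above (average, range, correlation and the regression line of yield on rainfall) may be illustrated by a short computation. The following is only a sketch: the price quotations and the rainfall and yield figures are hypothetical and are not taken from any actual record, and the function names are chosen merely for illustration.

```python
from math import sqrt


def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Coefficient of correlation between two series of equal length."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def regression_line(xs, ys):
    """Least-squares line y = a + b*x (regression of y on x)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical price quotations for a radio from five makers (rupees).
prices = [420, 450, 480, 510, 540]
average_price = mean(prices)
price_range = max(prices) - min(prices)

# Hypothetical seasonal rainfall (cm) and crop yield (quintals per hectare).
rainfall = [10, 20, 30, 40, 50]
crop_yield = [12, 15, 21, 24, 28]
r = pearson_r(rainfall, crop_yield)
a, b = regression_line(rainfall, crop_yield)

print("average price:", average_price, "range:", price_range)
print("correlation between rainfall and yield:", round(r, 3))
print("regression of yield on rainfall: yield = %.2f + %.2f * rainfall" % (a, b))
```

A value of r close to +1 corresponds to the farmer's expectation that more rain, within limits, goes with a better crop, while the slope b states how much additional yield is associated with each additional unit of rainfall in these made-up figures.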
Examples can be multiplied to show that human behaviour and
statistical methods have much in common. In fact statistical methods
are so closely connected with human actions and behaviour that practically
all human activity can be explained by statistical methods. This shows
how important and universal statistics is.
CAUSES OF THE IMPORTANCE OF STATISTICS
Simplifies complexity. One reason why statistics is so important
today is that it simplifies complexity. The human mind is not capable of
assimilating huge facts and figures, and statistical methods, by making
these data easily intelligible and readily understandable, render a great
service, because in their absence the information would not have been
of any use. Statistical methods describe a phenomenon in a very simple
fashion. If, for example, we have to study the economic system of
Soviet Russia we cannot properly understand it by a purely descriptive
method in which no statistics are used, but if the different aspects of the
economic system are numerically expressed we can understand the whole
thing in a short time and in a better manner.
Measures results. Similarly if we have to measure the results of
a particular policy it can best be done by statistical methods. If we have
to study, for example, the effect of a rise in the bank rate on the industries
of a country we can do so in a proper manner only by means of a statistical
study of the phenomenon. Such situations involve the rather difficult
and delicate task of the comparison of two phenomena. The pertinent
question in the present case would be whether a rise in the bank rate has
affected the industries adversely or favourably. An answer to this
question would involve a comparison of the present situation with the
past and also a decision whether the change has been beneficial or
otherwise, from the point of view of industries. Without an adequate use of
statistical data it would be impossible to arrive at any sound and
dependable conclusion. Statistics thus help in measuring the effects of a
particular policy and in arriving at a conclusion about it.
Studies relationships. Yet another reason for the importance of
statistics is that it makes possible a study of correlation between two
phenomena. In all types of studies the importance of observing relation-
ship between different phenomena is very great. The relationship bet-
ween, say, price and supply, or demand and price is a phenomenon which
requires a very careful and close study before any generalization can be
made. In the absence of statistical methods it would be very difficult
to arrive at a precise and correct conclusion in this respect.
Enlarges human experience. Thus we see that the science of statistics
enlarges human experience and knowledge, by making it easier for man
to understand, describe and measure the effects of his own actions or
the actions of others. Many fields of knowledge would have for ever
remained closed to mankind but for the efficient and refined technique
and sound methodology provided by the science of statistics. It has
provided such a master key to mankind that he can use it anywhere and
can study any problem in its correct perspective and on right lines.
We have described above how the science of statistics helps in the
solution of many difficulties which mankind has to face. The technique
of the science can briefly be summed up in three words, namely,
Description, Comparison and Correlation. It is with these devices that statistics
help in improving human knowledge about various sciences and in
solving various difficulties which arise in the development and growth
of these studies. We will now briefly discuss how statistics is
indispensable in different branches of human activities.
INDISPENSABILITY OF STATISTICS
Importance in economics. In the field of economics it is almost
impossible to find a problem which does not require an extensive use of
statistical data. Important phenomena in all branches of economics
can be described, compared and correlated with the help of statistics.
Statistics of consumption tell us of the relative strength of the desire
of a certain section of the people and its variations from time to time.
By statistical analysis we can study the manner in which people spend
their income over various items of expenditure, namely, food, clothing,
house rent, etc. Statistics of production describe the wealth of a nation
and compare it year after year, showing thereby the effect of changing
economic policies and other factors on the level of production.
Exchange statistics throw light on the commercial development of a nation.
They tell us about the volume of business done in a country and the
amount of money in circulation. Distribution statistics disclose the
economic conditions of the various classes of people. They throw light
on the distribution of national dividend amongst the inhabitants of a
country. We thus find that in all types of economic problems statistical
approach is essential and statistical analysis useful. "Mathematics and
its offsprings, statistics and accounting, are the powerful instruments
which the modern economist has at his disposal, and of which business,
through the development of research agencies and methods, is making
constantly greater use."
Need in planning. Modern age is an age of planning. The days
of laissez faire are gone and state intervention in practically all aspects
of life has become universal in character. Today we live in a period of
transition; economic activities are being more and more closely directed
to the production of such goods, and the provision of such services, as
the government may decide to be most urgently required. Our future
is very largely being planned, and this planning, to be successful, must be
soundly based on the correct analysis of complex statistical data.
Whenever we think of a plan we have to think of statistics. Planning cannot
be imagined without statistics. If we study the economic plans
implemented in various countries in recent times we will find that all of them
are a statistical study of the economic resources of the respective countries,
and they suggest possible ways and means of utilising these resources
in the best possible manner. Various plans that have been prepared
for the economic development of India have also made use of the
statistical material available about various economic problems. The fact that
in our country the amount of statistical material available to the planners
has been very scanty is responsible for many drawbacks and inaccuracies
in different plans. Not only are plans of economic development
constructed on the basis of statistical data but the success that a plan achieves is
also measured best by the use of statistical apparatus. We thus find that
in the field of economic planning the use of statistics is indispensable.
Usefulness in commerce. Statistics are an aid to business and
commerce. In fact today the situation is that a businessman succeeds or fails
according as his forecasts prove to be accurate or otherwise. When a
man enters business he enters the profession of forecasting, because success
in business is always the result of precision in forecasting and failure in
business is very often due to wrong expectations, which arise in turn due
to faulty reasoning and inaccurate analysis of various causes affecting a
particular phenomenon. Modern devices have made business
forecasting more definite and precise. Economic barometers are the gifts
of statistical methods and businessmen all over the world make extensive
use of them. A producer estimates probable demand of his goods,
analyses the effects of trade cycles and seasonal variations as also of changes
in habits and customs of people on the demand of his wares, and after
taking all these factors into consideration finally takes decision about the
quantum of production. A businessman who ignores the effects of booms
and depressions can never succeed and is bound to face frustrations as his
calculations are sure to be faulty. A study of all these things is in reality
a study of statistics and hence we say that all types of businessmen have
to make use of statistics in one form or the other if they want any success
in their profession. For the solution of problems connected with the
internal organization and administration of business units and with the
processes of buying and selling that bring the businessman into contact
with the price system, methods of statistical analysis are peculiarly
appropriate. Various branches of commerce utilise the services of statistics
in different forms. Promoters of new business make extensive use of
statistical data to arrive at conclusions which are vital from the point of
view of starting a new concern. In fact in the absence of statistical data
it would be impossible to carry on the activities of promotion of new
concerns on sound lines and the number of business failures would be
much more than at present. Again, cost accounting is entirely statistical in
outlook and it is with the help of this technique that producers are in a
position to decide about the prices of various commodities. We thus
find that the science of statistics is of extreme importance to business and
commerce and in its absence the growth of these things would be lopsided
and very slow, and business uncertainties would considerably increase.
Utility to bankers, brokers, insurance companies, etc. Bankers, stock
exchange brokers, investors, insurance companies and public utility
concerns all make extensive use of statistical data. A banker has to make
a statistical study of business cycles to forecast a probable boom or a
depression and has to study in detail the seasonal variations in the demand
for call money from his clients. It is after a study of these factors that a
banker decides about the amount of reserves that should be kept. Unless
the calculations of a banker are correct, and they cannot be so in the
absence of statistical data, he is always in danger of making a mistake
and losing public confidence upon which his entire existence depends.
Statistics are equally important from the view-point of stock exchange
brokers, speculators and investors. They have to be fully conversant with
the prevailing money rates at various centres and have to study their
future trends. Their success depends on the extent to which their
forecasts about the future trends of money rates come to be true.
Insurance companies cannot carry on their business in the absence of
statistical data relating to life tables and premium rates, etc. In fact,
insurance has been one of the pioneer branches of commerce and business
which has been making use of statistics from the very beginning of its
existence. Insurance as an institution could never have existed in the
absence of statistical data. Theory of probability works itself out fully
in the field of insurance and the success of an insurance company
depends on the accuracy of the basic data that it uses for the calculation
of premium rates, etc. Again public utility concerns like railways,
electric supply companies, waterworks, etc., also make extensive use of
statistics. As a matter of fact it is difficult to imagine any business, big
or small, which does not make use of statistical data in one form or the
other. The science of statistics is indeed indispensable to business and
commerce.
Utility in Business Management. One element common to all
problems faced by business managers is the need to make decisions in the
face of uncertainty; and the essence of modern statistics lies in the
development of general principles for dealing wisely with uncertainty.
Modern statistical tools of collection, classification, tabulation, analysis
and interpretation of data have been found to be an important aid in
making wise decisions at various levels of managerial function. The
uses to which statistical methods are put in this area are many and
varied; for example, the success of a modern industrial enterprise
depends to a significant extent on the accuracy of production
programming, and sales, quality and inventory control. Statistical tools
are relied upon heavily in arriving at correct decisions in all these
aspects.
The success of production programming both in the short as well
as long period depends to a great extent on the quality of sales forecasts
and projections. The sales forecasts may be made in one or more of
various ways. A company's regional offices may make estimates of sales
in their respective areas. They may also take into account seasonal
variations. These regional estimates may then be subjected to statistical
treatment and final forecasts made. These forecasts may further be recast
in the light of statistically estimated variations in the aggregate
market under the impact of economic programming at the national
level.
The variations in sales below or above the forecast may disturb the
production schedules. These variations may be seasonal and the policy
of stabilising production means that the inventory of finished goods
would vary rather widely. An upper and a lower limit are set on the basis
of a statistical study of sales fluctuations, and if fluctuations occur outside
this zone, changes in production schedules are made.
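The setting of such a zone may be sketched as follows. The text does not say how the limits are fixed; the rule used here, namely the average past deviation plus or minus two standard deviations, is only an assumption made for illustration, and the figures of past deviations are hypothetical.

```python
from statistics import mean, pstdev


def control_zone(past_deviations, k=2.0):
    """Lower and upper limits: mean deviation -/+ k standard deviations.

    The multiplier k = 2 is an assumed convention, not taken from the text.
    """
    centre = mean(past_deviations)
    spread = pstdev(past_deviations)
    return centre - k * spread, centre + k * spread


def needs_rescheduling(deviation, zone):
    """True when a deviation of sales from forecast falls outside the zone."""
    lower, upper = zone
    return deviation < lower or deviation > upper


# Hypothetical past deviations of actual sales from forecast (thousand units).
past = [-4, 2, 0, 3, -1, 1, -2, 1]
zone = control_zone(past)

print("zone:", tuple(round(limit, 2) for limit in zone))
print("deviation of +3 calls for a change:", needs_rescheduling(3, zone))
print("deviation of -6 calls for a change:", needs_rescheduling(-6, zone))
```

A wider zone (a larger k) means fewer changes in production schedules but larger swings in the inventory of finished goods; the choice between the two is a matter of business policy rather than of statistics.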
Effective control on sales can also be exercised through regional
allocations. In this case also the determination of aggregate and
regional sales potential is based on a statistical study of trends. Statistical
methods also help the sales executive in the establishment of the share of
responsibility of each regional sales office for the achievement of sales
targets. These measures help the sales executive in determining which
of the regional offices is most effectively utilising the market potential
of its region. Market research, consumer preference studies, trade channel
studies and readership surveys are other methods of sales control which
make an extensive use of statistical tools.
Statistical methods also come to the aid of quality control. A
manufacturer of match boxes may select 100 pieces which are
considered to be satisfactory with respect to their characteristics and acceptability
to consumers. The average weight per box may then be ascertained
and set as a standard of quality. Now variations in the weight of match
boxes from this standard weight may be due to a multiplicity of
variations in the size, quality and weight of their components: variations in the
quality of match and box-wood, quantity and quality of phosphorus,
moisture in wood or phosphorus, labelling, etc. Statistical methods can
also be used to avoid the cumbersome process of weighing each and
every match box. The boxes may be divided into lots on random
sampling basis, and boxes may again be selected from each lot on random
sampling basis. If the weight of these boxes corresponds to standard
weight, the whole lot may be considered to be of standard quality.
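The procedure just described (random division into lots, random selection of boxes within each lot, and comparison of their weight with the standard) can be sketched in a few lines. The weights, the lot size, the number of boxes drawn and the tolerance of 0.3 gram are all hypothetical values assumed for illustration; the text itself does not say how close to the standard a sample must be.

```python
import random


def split_into_lots(weights, lot_size, rng):
    """Divide the boxes into lots at random."""
    shuffled = weights[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + lot_size] for i in range(0, len(shuffled), lot_size)]


def inspect_lots(lots, sample_size, standard, tolerance, rng):
    """For each lot, weigh a random sample and compare its average with the standard.

    Returns one True/False verdict per lot (True = lot taken as standard quality).
    """
    verdicts = []
    for lot in lots:
        sample = rng.sample(lot, min(sample_size, len(lot)))
        average = sum(sample) / len(sample)
        verdicts.append(abs(average - standard) <= tolerance)
    return verdicts


rng = random.Random(0)  # fixed seed so that the illustration is repeatable

# Hypothetical weights (grams) of 1,000 match boxes; the standard is 10 grams.
weights = [rng.gauss(10.0, 0.2) for _ in range(1000)]
lots = split_into_lots(weights, 100, rng)
verdicts = inspect_lots(lots, 10, standard=10.0, tolerance=0.3, rng=rng)

print("lots formed:", len(lots))
print("lots accepted as standard:", sum(verdicts))
```

Only ten boxes per lot are weighed here instead of all hundred, which is the saving in labour that the passage refers to.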
Inventory control is essential for economical functioning of business
enterprises. It relates both to quantitative and qualitative aspects. The
stocking of inventories at the optimum level depends on the accuracy
of sales forecasts and correlation between the final product and size and
quantity of each raw material, tools, equipment, fuel, etc., needed for it.
Quality control on inventory is not only facilitated but also made more
accurate with the aid of statistics. It may be a cumbersome,
time-consuming and costly process to inspect each and every item of inventory
purchases. This is particularly so if the size of items is very large. It
is also not possible to inspect meticulously each and every item. This
problem is tackled by the method of random sampling. Some items
are selected on this basis and subjected to close quality inspection and
if these are found according to specifications, the whole lot is accepted.
Importance to the State. Statistics are very helpful to a state as they
help it in administration. The modern State makes extensive use of statistical
data on various problems. Before enforcing any policy a State has to
examine its pros and cons and this can be done only with the help of
numerical data. Evils of drinking or crimes need a proper statistical
investigation before remedies can be suggested for them. The state has
to collect figures of population for various purposes; it has to estimate
the figures of national income to find out the prosperity of the country.
A state in the modern set-up besides being an administrative body is a
big commercial concern also. It carries on businesses of various kinds
and has monopoly in many cases. It needs statistics for carrying on these
works. In fact the state is always the most important single unit which
not only collects the largest amount of statistics but also needs statistics
on a very extensive scale. Probably this is the reason why official statis-
tics occupy a very important place in the statistical literature of any
country.
Desirability in Research. Modern statistical methods and statistical
data are being found increasingly useful in research in different fields.
Experiments about crop yields with different types of fertilizers and
different types of soils or the growth of animal life under different types of
diets and environments are very often designed and analysed according
to statistical methods. Even in the field of medicine and public health
statistical methods are used for testing the efficacy of new medicines and
methods of treatment. In the field of industry and commerce statisticians
carry on different types of researches. They try to find out the sources
and causes of variations of different products from their standard quality.
The technique of quality control is entirely statistical in nature. Market
researches are carried on by making extensive use of statistical methods.
Even in literary field statistical researches have been found to be useful.
Statistical studies about the length of sentences, the frequency of various
words and various parts of speech have been used to find out whether a
disputed work is of one author or the other. As a matter of fact, for
a research worker in any field which is concerned with numerical
results a study of statistical method is not only useful but necessary.
Universal applicability of statistical methods. We thus find that the
statistical methods are of wide, almost universal applicability.
Government needs them, economists and business men need them; in fact all
types of persons, astrologers, astronomers, biologists, meteorologists,
botanists, and zoologists make use of statistics and statistical methods.
Statistics, when used effectively, become so intertwined in the whole
fabric of the subject to which it is applied as to be an integral part of it.
Statistics assist in planning the initial observations, in organizing them
and formulating hypotheses from them, and in judging whether the new
observations agree sufficiently well with the predictions from the
hypotheses. The universality of statistics is enough to indicate its
importance, utility and indispensability to the modern world.
Some illustrations of the uses of statistics in various fields have
been given below.

Uses in Business Management

(i) Qualitative Change in Product. A soap manufacturer found
that his product was losing ground to a competitor who had introduced
a new quality. He decided to make a survey of consumers' preferences
so as to provide the basis for introducing qualitative changes in his
own product. But a survey of preferences of millions of consumers spread
all over the country was a very expensive and time-consuming
proposition. It was, therefore, decided to take a sample of 1,000 consumers.
Consequently, he sent his own soap and that of his competitor to the
sampled consumers along with a questionnaire requesting them to show
their preference and the reasons for such preference. The tabulation,
analysis and interpretation of the information supplied by them revealed
that scent, transparency and colour of the competitor's soap were
responsible for its preference by them, so a new formula was devised
for introducing these qualities to a greater degree in his own soap with
the result that his product could again outsell the competitor.
(ii) Sales Control. A manufacturer of electrical appliances was
worried about the declining trend in take-off from the factory. The
only information available in his records related to monthly despatches.
A consumers' survey was indeed an expensive and time-consuming
task. Even the number of dealers in his product ran into thousands and
presented the same problems, although to a lesser degree. He,
therefore, decided to address a questionnaire to a random sample of dealers
including wholesalers and retailers. Their replies were classified
and tabulated separately. A statistical interpretation of the processed data
showed that the wholesalers had drastically reduced their stocks due to
two reasons, viz., credit squeeze exercised by banks and reduced orders
from retailers. The retailers were facing marketing difficulties. The
manufacturer of competing products had started providing mobile
repairing services. These two factors were responsible for decline in
factory take-off. The provision of necessary credit facilities to
wholesalers and organisation of mobile repairing services reversed the fall in
take-off.
(iii) Issue of Bonus Shares and Revaluation of Stock. (a) A general
as well as sectoral study of price movements by the Amen Steel Structurals
showed that the prices of all the commodities had been rising steadily
during the last ten years. A statistical analysis of price movements and
developmental outlay further showed that a positive and high
correlation existed between the two. A study of trends in developmental
outlay and of the deliberations of the Planning Commission showed that
during the Fourth Plan the outlay would be nearly double that of the Third
Plan. It was, therefore, concluded that it could be safely assumed that
prices were not likely to decline in future. The company could, on this
basis, take the decision of making a revaluation of its fixed and floating
assets and issuing bonus shares.
(b) The revaluation of fixed assets, inventories and stock also
presented a serious problem as they ran into thousands. Moreover,
many of them were not available in the market due to import restrictions
and it was impossible to ascertain their market price. The task of
estimating the residual life of all the fixed assets was also very complex and
difficult. The company, therefore, decided to take a random sample of a
few items of each type of asset and make its valuation. On this basis,
the valuation of the whole lot of assets was assessed and results were
found to be very satisfactory.

Uses in Social Sciences

(i) Scholastic Performance. Numerous statistical studies have
demonstrated that a high and positive correlation exists between scholastic
performance and extra-curricular activities. Findings of these studies
have led to a vast expansion of extra-curricular activities in educational
institutions and ways and means have been devised for encouraging
students to participate in them.
(ii) Public Opinion. General impressions about public opinion
are often found to be misleading. Carefully designed statistical analysis
has been very helpful in arriving at accurate conclusions. Immediately
after the cease-fire following the Indo-Pakistan War in September 1965, it
was generally believed that the Indian people wanted to resume fighting
again. A poll of public opinion carried out by a leading newspaper
revealed the following result:
                                            Yes     No     No opinion
Are you in favour of another round
of fighting with Pakistan?                   65     25         10
Uses in War
(i) Active lead by Officers. A statistical analysis of the Indian and
Pakistani casualties during the Indo-Pakistani War of September 1965
revealed that the proportion of officers among those killed was higher
on the Indian side. This showed that the Indian armies were actually
led by their officers and this was one of the important factors responsible
for Indian victory. This factor will assume importance in the
formulation of future war strategy.
(ii) Training in the Use of War Equipment. The heavy reverses
suffered by Pakistan during the above War, despite its vastly
superior Air Force and Armoured Corps, came as a great surprise to the
whole world. Statistical analysis of their causes revealed that a high
and positive correlation existed between the period and intensity of
training in the use of aeroplanes and tanks and their effective use in war.
A further investigation into the period and intensity of training provided
in both the countries revealed that Pakistani failure to make an effective
use of its fighters, bombers and tanks was due to inadequate and inferior
training of its personnel.
(iii) Inspection of purchases. During war, military requirements
of goods and commodities increase tremendously. Complete inspection
of each and every item involves huge expenditure and the time of a
large number of personnel, and it also cannot be done expeditiously.
Here statistics come to the help of the army. The use of the sampling
inspection method helps not only in quick disposal but also gives accurate
results. Under this method, only a few items, say 2 per cent, are selected
on random sample basis and thoroughly inspected. This method is both
cheaper and expeditious. It also ensures accuracy as it is easier to
inspect more closely a few rather than a large number of items.
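The sampling inspection method may be sketched as below. The 2 per cent figure is the one mentioned above; the rule that a consignment is accepted only when the sample shows no more than a permitted number of defective items (here none) is an assumption made for illustration, as are the consignment, its dimensions and the specification limits.

```python
import random


def sampling_inspection(items, meets_spec, percent=2, max_defects=0, rng=None):
    """Inspect a random `percent` per cent of the items.

    Returns (sample size, defectives found, accepted?). The acceptance
    rule (defectives <= max_defects) is an assumed convention.
    """
    rng = rng or random.Random()
    # Sample size rounded up to a whole item, and never less than one.
    sample_size = max(1, (len(items) * percent + 99) // 100)
    sample = rng.sample(items, sample_size)
    defects = sum(1 for item in sample if not meets_spec(item))
    return sample_size, defects, defects <= max_defects


# Hypothetical consignment: diameters (mm) of 500 components, specified as 20 mm.
rng = random.Random(42)
consignment = [rng.gauss(20.0, 0.05) for _ in range(500)]

size, defects, accepted = sampling_inspection(
    consignment, lambda d: abs(d - 20.0) <= 0.25, rng=rng
)
print("inspected:", size, "defective:", defects, "accepted:", accepted)
```

Out of 500 items only 10 are examined closely, and the decision on the whole consignment rests on those 10.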
LIMITATIONS OF THE SCIENCE OF STATISTICS

Does not study qualitative phenomena. Despite the universality of its
application the science of statistics has its own limitations. The most
important limitation of the science is that it can be applied only to those
problems which are capable of quantitative expression. Such
phenomena as cannot be expressed in figures have very little use for statistical
methods. Honesty, for example, cannot be measured in figures and,
therefore, in a study of honesty statistical methods cannot be of much
help. However, it should be noted that even these subjective concepts
WPORTANCE, LIMITATIONS AND FUNCTIONS OF STATISTICS 25

caiJ be related in an indirect fashion to numerical data. Honesty itself


may not be capable of quantitative analysis but many factors which are
related to this phenomenon are capable of being expressed in ligures and
as such can throw some light on the study of this problem. A study of
the number of thefts or cases of cheating or swindling can indirectly tell
us something of the problem under study. Again, tht' crime can be
measured in terms of the men who go to prison and if the !lumber of such
persons is decreasing, we can safely say that there is a better enforcement
of the law and crime is on the decrease. Similarly a study about the
culture or civil~tion of a country is not possible with the help of statis tical
methods though these methods can certainly help in such studies in an
indirect and subsidiary manner.
Does not reveal the entire story. Another limitation of statistics is
that it cannot reveal the entire story of a problem. Since many problems
are affected by such factors which are incapable of statistical analysis it
is not always possible to examine a problem in all its manifestations only
by a statistical approach. Many problems have to be examined in the
background of a country's culture, philosophy or religion. All these
things do not come under the orbit of statistics.
Statistical laws are true only on average. We have already discussed
in an earlier chapter that statistics, as a science, is not as accurate as many
other sciences are, and statistical methods are not very precise and correct.
Laws of statistics are not universally true like the laws of physics or
astronomy. Statistical laws are true only on an average. Statistics deal
with phenomena which are affected by a multiplicity of causes and
it is not possible to study the effects of each of these factors separately
as is done under experimental methods. Due to this limitation in the
statistical methods, the conclusions arrived at are not perfectly accurate
and consequently the same conclusions cannot be arrived at under similar
conditions at all times.
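What "true only on an average" means can be seen in a small simulation. The tossing of a fair coin is an assumed illustration, not an example from the text, and the function name and seed are hypothetical.

```python
import random

def proportion_of_heads(n_tosses, seed):
    """Toss a fair coin n_tosses times and return the share of heads."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

few = proportion_of_heads(10, seed=1)        # a short series
many = proportion_of_heads(100_000, seed=1)  # a long series
print(few, many)
```

In a short series the share of heads may lie far from one half, and no single toss can be predicted; in a long series the share settles close to one half. The statistical regularity holds for the aggregate and not for the individual item.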
Does not study individuals. Statistical methods have no place for an
individual item of a series. Statistics deal with aggregates, though for
purposes of analysis these aggregates are very often reduced to single
figures. A statistical series is condensed into an average for purposes of
comparison, though an individual item of the series has no specific
recognition. This is a limitation. If only 100 persons die of starvation in
India and if the percentage of these deaths to the total population of the
country works out to be a negligible figure, statistically we will be
justified in ignoring it; but this fact does not in any way reduce the torture of
death and its aftermath so far as these 100 people are concerned, and
consequently from this point of view it is something very important and
material, and yet in statistical analysis the problem does not occupy any
significant place whatever. This type of apathy to individual items of a
series is a serious handicap in many investigations. The average income
of a group of persons might have remained the same over two periods
and yet many persons in the group might have become poorer than what
they were before. Statistical methods ignore such individual cases.
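The point about average income can be checked with a few made-up figures. The incomes below are purely hypothetical and are chosen only so that the average is the same in both periods.

```python
from statistics import mean

# Hypothetical monthly incomes (in rupees) of the same five persons
# in two periods; the figures are invented for illustration.
period_1 = [100, 200, 300, 400, 500]
period_2 = [60, 150, 250, 340, 700]

average_1 = mean(period_1)
average_2 = mean(period_2)

# Count the persons whose income fell between the two periods.
poorer = sum(1 for before, after in zip(period_1, period_2) if after < before)

print(average_1, average_2, poorer)
```

The average is Rs. 300 in both periods, yet four of the five persons have become poorer; the whole of the loss is hidden by the gain of the fifth. The average alone gives no hint of this.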

Is liable to be misused. Statistics are liable to be misused easily.
Any person can misuse statistics and draw any type of conclusion he
likes. There is very great possibility of the misuse of this science. In
reality statistical methods can be properly used only by trained people and
their use by less expert hands is sure to give inaccurate results. Statistics
is a delicate science and consequently should be used with caution.
Misuses, unfortunately, are probably as common as valid uses of
statistics. The ability to discriminate between a valid and an invalid use of
statistics is more important for most people than knowing how to make
effective use of statistics themselves. No one can afford to be misled by bad
statistics; and everyone needs knowledge that can be gained only through
the effective use of statistics. The fact that it can be used properly only
by experts limits the chances of mass popularity of this important and
useful science.
Some illustrations of the misuses of statistics have been given
below.

Shifting of Definition
(i) Monthly and Hourly Wage Rates. A firm had introduced
productivity methods with the result that productivity had increased. Since
the demand for its product was inelastic and labour laws did not permit
retrenchment, it decided upon reducing the working hours. As a
result, the monthly rates of wages could be increased only marginally.
A dispute arose between labour and management. The contention of
labour was that despite a significant increase in productivity, wages
had increased only marginally; and in support of its argument, it
produced monthly wage statistics. The management's argument was
just the opposite. It maintained that the increase in wages had been
commensurate with the increase in productivity; and in support of its
contention, it produced average hourly wage statistics. Both
labour and management were right. It depends on the definition of
wages which is adopted. Labour's definition will be considered
more appropriate when wages are viewed as income of workers; and that
of management will be more appropriate when wages are viewed as cost
of production.
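The two definitions of wages can be set side by side in a short calculation. All the figures below are hypothetical; the text gives no actual wage rates or working hours.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

# Hypothetical figures: monthly wage in rupees and hours worked per month.
before = {"monthly_wage": 400.0, "hours": 200.0}
after = {"monthly_wage": 408.0, "hours": 160.0}

hourly_before = before["monthly_wage"] / before["hours"]
hourly_after = after["monthly_wage"] / after["hours"]

monthly_rise = percent_change(before["monthly_wage"], after["monthly_wage"])
hourly_rise = percent_change(hourly_before, hourly_after)
print(round(monthly_rise, 1), round(hourly_rise, 1))
```

With these assumed figures the monthly wage rises by 2 per cent while the hourly wage rises by 27.5 per cent. Both percentages are correct; they answer different questions, the first about the worker's income and the second about the cost of an hour of labour.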
(ii) Employment of Women. The census of 1961 showed that the
percentage of working women in India had increased from 23.30 in 1951
to 27.96 in 1961. It might be concluded from this that the female labour
participation ratio increased significantly during the decade. But as a
matter of fact, a major part of this increase was due to the inclusion of
unpaid family workers and housewives under the nomenclature 'workers'.

Inaccurate Measurement Classification


(i) Incidence of Crimes. The newspapers reported that the number
of convicts in jails had been increasing at a fast rate during recent years.
It was inferred from this that crimes were on the increase. The
Government was reprimanded for growing inefficiency in police administration.
The Home Minister, making a statement in the State Assembly, said that
the number of jail convicts had increased not due to an increase in crimes
but due to stricter penal provisions and strengthening of the police force,
especially of its detection wing. He thus maintained that police
administration had become more rather than less efficient.
(ii) Performance of the First Five Year Plan. It was claimed by the
critics of the First Plan of India that per capita income declined from
Rs. 266.5 at its beginning to Rs. 255.0 at its end. This led to the
erroneous conclusion that the economy suffered deterioration during this
period and the Plan was a failure. But, in fact, this decline had occurred
due to a fall in the general price level, and real per capita income had
indeed increased. A comparison of per capita incomes in the two years
at constant 1948-49 prices shows that it had
increased to Rs. 267.8 at the end of the Plan as compared to Rs. 247.5 at
its beginning.
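The arithmetic behind this illustration is easily verified. The four income figures are those quoted above; the implied price index is a derived quantity, worked out here on the assumption that the current-price and constant-price figures measure the same per capita income.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

# Per capita income figures quoted in the passage (rupees).
current_prices = {"start": 266.5, "end": 255.0}
constant_prices = {"start": 247.5, "end": 267.8}   # at 1948-49 prices

nominal_change = percent_change(current_prices["start"], current_prices["end"])
real_change = percent_change(constant_prices["start"], constant_prices["end"])

# Implied price index (1948-49 = 100): current-price figure divided by
# the constant-price figure for the same year.
index_start = current_prices["start"] / constant_prices["start"] * 100
index_end = current_prices["end"] / constant_prices["end"] * 100

print(round(nominal_change, 1), round(real_change, 1))
print(round(index_start, 1), round(index_end, 1))
```

At current prices per capita income falls by about 4.3 per cent, while at constant prices it rises by about 8.2 per cent. The implied price index falls from about 107.7 to about 95.2, which accounts for the difference between the two results.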

Inappropriate Comparison
(i) Deaths in Hospitals. The statement that 'the incidence of death
among sick persons is higher in hospitals than at home' is likely to lead
to the conclusion that more patients die in hospitals than at home due
to lack of proper treatment and care. But this conclusion turns out to be
completely erroneous if it is borne in mind that in India only seriously
ailing persons are hospitalised.
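The fallacy can be exposed by classifying the patients according to the seriousness of their illness before comparing death rates. The counts below are entirely hypothetical and are meant only to show how the comparison may reverse.

```python
# Hypothetical counts: (patients, deaths) by seriousness of illness
# and place of treatment. None of these figures comes from the text.
cases = {
    "serious": {"hospital": (800, 160), "home": (100, 40)},
    "mild": {"hospital": (200, 2), "home": (900, 18)},
}

def death_rate(patients, deaths):
    """Deaths per 100 patients."""
    return deaths / patients * 100

def overall_rate(place):
    """Death rate for one place of treatment, all patients taken together."""
    patients = sum(cases[s][place][0] for s in cases)
    deaths = sum(cases[s][place][1] for s in cases)
    return death_rate(patients, deaths)

for severity in cases:
    for place in ("hospital", "home"):
        print(severity, place, death_rate(*cases[severity][place]))
print("overall", overall_rate("hospital"), overall_rate("home"))
```

With these assumed counts the hospital shows the higher death rate when all patients are lumped together, yet within each class of patients, serious or mild, the hospital rate is the lower one. The crude comparison is inappropriate because the two groups of patients are not alike.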
(ii) It was claimed by a teacher that his teaching method was
superior to that of others. He supported his argument by showing that all
the students in his class secured a first class. Investigation into the
matter revealed that, unlike the others, all his students had secured a first class
in the previous examination and were merit holders. His success was,
therefore, due to the abler students in his class rather than to the superiority of
his teaching method.

Defective Method in Selecting Cases
Issue of Abortion. The Morning News reported that 70 per cent of the
people in the country were in favour of legalisation of abortion. It had
come to this conclusion by a statistical analysis and interpretation of the
replies sent to it by its readers in response to a questionnaire. But a
broad-based survey made by a social organisation showed that this was
entirely incorrect. In fact, more than 80 per cent of the people were against
it. The newspaper had reached the erroneous conclusion as it was
based on the opinion of educated people who constituted only a small
minority in the population.
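How a defective selection of cases produces such a result may be shown with a weighted calculation. The shares of the two groups in the population and the opinion of the second group are hypothetical; only the figure of 70 per cent among the respondents and the finding that more than 80 per cent were against are taken from the illustration.

```python
# Hypothetical population shares and opinions. Only the 70 per cent figure
# among respondents comes from the passage; the rest is assumed.
groups = {
    "educated": {"share": 0.15, "in_favour": 0.70},
    "others": {"share": 0.85, "in_favour": 0.10},
}

# The readers' replies came, by assumption, from the educated group only.
favour_among_respondents = groups["educated"]["in_favour"]

# A broad-based survey weights each group by its share of the population.
favour_in_population = sum(g["share"] * g["in_favour"] for g in groups.values())
against_in_population = 1 - favour_in_population

print(favour_among_respondents)
print(round(favour_in_population, 3), round(against_in_population, 3))
```

Under these assumptions 70 per cent of the respondents are in favour, but only 19 per cent of the whole population is, so that 81 per cent are against. The replies were accurate for the readers who sent them; the error lay in treating those readers as representative of the country.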

DISTRUST OF STATISTICS

Figures may be incomplete or manipulated. Despite its importance and
usefulness the science of statistics is looked upon with a suspicious eye
and is quite often condemned as a tissue of falsehood. It is said that "an
ounce of truth will produce tons of statistics," or that "statistics are lies
of the first order." These statements indicate the extent to which the
science of statistics has come into disrepute and is not trusted even in modern
times when its use has spread over all types of human activities. In our
daily life we tend to accept statistical conclusions and the interpretations
placed on them uncritically. But then we are misled so often by skilful
talkers and writers who deceive us with correct facts that we come to
distrust statistics entirely and assert that "statistics can prove anything",
implying, of course, that "statistics can prove nothing." Strangely
enough, whereas on the one hand statistics is condemned in such bitter
language, on the other hand it is also said: "If figures say so it cannot
be otherwise" or "figures don't lie". The reason for such diversity
in views is not far to seek. The reason lies in the innocence of figures.
Figures are innocent and easily believable. It is human psychology that
when facts which are supported by figures come before a man they are
easily believed. Numerical data convey a sense of precision and accuracy
and consequently it is only natural that a man believes a statistical
statement usually without questioning it. There is a great danger in this
type of approach to a numerical statement. Figures which support a
particular statement may not be true. They may be incomplete,
inaccurate or deliberately manipulated by prejudiced persons who wish to
conceal the truth and want to present a false picture to achieve a particular
end. Later on, when people realise that even a statistical statement has
belied their expectation, their faith in the science of statistics is shaken
and they begin to condemn it in the strongest possible language. The
fault in such cases does not lie with the science of statistics; it lies with
those who use it. If wrong figures have been used they are bound to
give wrong conclusions and it is the duty of the persons who use statistics
to see that the figures that they use are free from all types of bias and
have been properly collected and scientifically analysed.
"Can prove anything" is not correct. Sometimes it is remarked that
statistics can prove anything. But people who say so are usually those
who do not know the A B C of the subject. Statisticians rarely claim
to prove anything. They are taught to examine the reliability of their
data and the justification of their conclusions with the utmost suspicion,
due care and caution. Statisticians generally take care that the chances
of their statement being correct are at least 20 : 1, and as such it is
absolutely wrong to say that statistics can prove anything.
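The chances of 20 : 1 mentioned above can be turned into a probability. The reading adopted here, 20 chances of being correct against 1 of being wrong, is an assumption about what the author intends, and the function name is hypothetical.

```python
from fractions import Fraction

def odds_to_probability(in_favour, against):
    """Convert odds of in_favour : against into a probability."""
    return Fraction(in_favour, in_favour + against)

p_correct = odds_to_probability(20, 1)
p_wrong = 1 - p_correct
print(p_correct, float(p_correct), float(p_wrong))
```

On this reading the statement is correct with probability 20/21, or a little over 95 per cent, and wrong with probability 1/21, a little under 5 per cent.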
Does not prove anything, is merely a tool. Many people disbelieve
statistics because it does not prove a particular thing in a particular
manner. It should be clearly understood that statistics does not prove
anything. Statistics is only a method of approach; it is a tool in the hands
of a statistician to present a phenomenon in a particular manner, nothing
beyond it. The science of statistics does not prove or disprove a thing;
it merely presents the true facts about a problem and leaves the rest to
other people. Different types of conclusions can be arrived at from the
same set of figures if there is a difference in the approach of various
persons. From one set of figures a communist can prove that Russia has
eliminated unemployment and improved the lot of the working class
and from the same set of figures an anti-communist can derive an opposite
conclusion. This fundamental difference in approach, or we may
call it bias in the minds of the investigators, has been responsible for
different conclusions being drawn from the same set of figures. For
this, the science of statistics cannot be blamed. It is not the fault of
the science. It is the mischief of those who use it.
Need of caution. A layman has, therefore, to be very cautious. If
figures have been given without the context in which they were collected
or if they are not complete, or if they relate to a phenomenon different
from the one under investigation, or even if the figures are correct and
complete but a faulty or biased logic is applied to them, the conclusions
arrived at are bound to be wrong and would strengthen the belief that
statistics are lies of the first order. Unfortunately a set of figures cannot
by itself disclose whether it is dependable or not. Figures do not bear a
trade mark of their accuracy. All figures appear to be correct and
innocent. It is this difficulty of separating good figures from bad ones
which is responsible for discrediting the science to a considerable
extent. It is, therefore, necessary that whenever we use statistics we
should first of all make sure that they were properly collected and are
suitable for the problem under investigation.
Statistical methods are delicate tools. Statistical methods are very
delicate and since they are liable to be misused easily they are very
dangerous as well. The results of the misuse of statistical methods or
statistical data should not be used to discredit the science. If a child cuts his
finger with a sharp knife or an insane person hits his own head or that
of anyone else with a stick, the fault does not lie with the knife or the
stick. It lies with the person who uses it. Similarly if statistical methods
are not properly used the fault does not lie with the science of statistics
but with the person using it. Statistics are tools and can be used in any
way we like and it is in our own interest that we use them in a proper
manner.
"He who accepts statistics indiscriminately will often be duped
unnecessarily. But he who distrusts statistics indiscriminately will often
be ignorant unnecessarily. There is an accessible alternative between
blind gullibility and blind distrust... It is possible to interpret statistics
skilfully. The art of interpretation need not be monopolized by
statisticians, though, of course, technical statistical knowledge helps.
Many important ideas of technical statistics can be conveyed to the
non-statistician without distortion or dilution. Statistical interpretation
depends not only on statistical ideas but also on ordinary clear thinking.
Clear thinking is not only indispensable in interpreting statistics but is
often sufficient even in the absence of specific statistical knowledge...
For the statistician not only death and taxes but also statistical fallacies
are unavoidable. With skill, common sense, patience and above all
objectivity, their frequency can be reduced and their effects minimised.
But eternal vigilance is the price of freedom from serious statistical
blunders." (Wallis and Roberts).

FUNCTIONS OF STATISTICS AND STATISTICIANS

To collect and analyse data. At this stage it is necessary to pause for
a moment to see what are the functions of statistics and statisticians.
We have discussed above that the main function of statistics is to collect
and present numerical data in a systematic manner so that it may be
analysed in a scientific way. Statistics is, as we have seen, not meant
to prove anything; it is merely to analyse the phenomena in a scientific
fashion. Accordingly, the role of a statistician is to collect the data in
a proper fashion, to analyse it scientifically and to set the stage for its
correct interpretation. It is futile to expect him to work wonders or to
give a particular shape to given material. He has simply to arrange the
material in a proper form so that its real worth may be exposed. After
doing this the statistician has finished his job. The task of giving a
particular shape to material is beyond the scope of the science of
statistics. Statistics are like raw materials and to convert them into finished
products is the work of people other than statisticians. The use of
economic statistics for the purpose of formulating an economic policy is
the work of an economist, not of a statistician. A statistician would
simply collect and analyse the economic statistics. He would not
formulate an economic policy on their basis; he would leave this work for
the economist.
In general a successful statistician requires not only a sound
knowledge of statistical methods but he has to be a specialist in the branch
in which he is carrying on an investigation. If a statistician is asked
to find out if fertilizer A is better than fertilizer B when applied to
potatoes, he should first know all about fertilizers in general, about their
application, growing of potatoes and many other connected things.
In this case he should be an agricultural expert. A practical statistician
is, therefore, in the first place an engineer, an economist, a biologist
or some other specialist and he has to acquire special knowledge of the
field in which he is making use of statistical methods.
A statistician should also not forget the limitations of the science
of statistics. He should not forget that laws of statistics are true only
on an average, and that he cannot boast of the same precision which
is found in experimental methods. He has to work under various
handicaps and he should be very cautious and vigilant. Even a slight
mistake on his part is liable to render his entire work useless and
defective. He should be free from bias, should have profound common sense
and should work like a true researcher without any preconceived notions
or conclusions about the problem under investigation. It should not be
forgotten, as W. I. King said, that "statistics is a most useful servant but
only of great value to those who understand its proper use."

Questions

1. Discuss fully the importance of statistics as an aid to commerce.
(B. Com. Allahabad, 1942)
2. "Knowledge of statistics is like a knowledge of foreign language or of
algebra. It may prove of use at any time under any circumstances." Explain.
3. "The statistics of a business can be treated practically and the preparation
and study of business statistics can be made a more exact science than the study of
national and social statistics." Explain. (B. Com. Allahabad, 1932)
4. Explain clearly the statistical methods used in any scientific investigation
and show their importance to theoretical economists and practical businessmen.
5. Discuss the scope, utility and limitation of statistics. (B. Com. Agra, 1937)
6. Discuss the importance of statistics and show how it can help the extension
of scientific knowledge, the establishment of a sound business and the introduction
of social and political reform. (B. Com. Agra, 1942)
7. "Figures never lie". "Statistics can prove anything". Comment on the
above two statements indicating the reasons for the existence of such divergent views
regarding the nature and functions of statistics. (B. Com. Agra, 1948)
8. "Statistics should not be used as a blind man does a lamp-post, for support
instead of for illumination". Comment on the above remark. (M. A. Agra, 1946)
9. Write an essay on "Statistics in the Service of the State." (I. C. S., 1946)
10. Discuss the usefulness of statistics to businessmen and members of legislative
councils and local bodies in India. (B. Com. Lucknow, 1942)
11. In what ways can statistical methods be misused by interested persons?
Give at least two examples of the misuse of statistics. (B. Com. Lucknow, 1939)
12. "Science without statistics bears no fruit and statistics without science has
no root." Comment.
13. Give the important limitations and uses of statistics. Show its relation to
economics and mathematics. (B. Com. Lucknow, 1938)
14. Explain the utility of maintaining statistics in industrial and commercial
concerns. (B. Com. Agra, 1949)
15. "For the most part statistics is a method of investigation for use when
other methods are of no avail; it is often a last resort of a forlorn hope".
Comment.
16. "Statistical analysis properly conducted is a delicate dissection of
uncertainties, a surgery of suppositions." Explain the above statement.
17. "Public knows too little of the statistician as a conscientious and skilled
servant of true science." In the light of the above statement explain why the
science of statistics is not well known to the common man.
18. "There are lies, damn lies and statistics." Comment.
19. "There is more than a germ of truth in the suggestion that in a society where
statisticians thrive, liberty and individuality are likely to be emasculated." Do you
agree with the above statement? If not, why?
20. Clearly show how in modern times statistics is the science of human welfare.
21. Discuss the functions of statistics and statisticians.
22. "The science of statistics is a most useful servant, but only of great value to
those who understand its proper use."-King. Comment.
23. "He who accepts statistics indiscriminately will often be duped unnecessarily.
But he who distrusts statistics indiscriminately will often be ignorant unnecessarily."
Comment.
24. "When you can measure what you are speaking about and express it in
numbers you know something about it; but when you cannot measure it, when you cannot
express it in numbers, your knowledge is of a meagre and unsatisfactory kind."
-Lord Kelvin.
Elucidate the above statement.
25. "... eternal vigilance is the price of freedom from serious statistical
blunders." Explain.
26. Explain and illustrate the use of statistics in the field of business management.
27. Give some examples of the misuse of statistics and point out the precautions
which should be taken in using statistical data.
Preliminaries to the
Collection of Data 4
Need. In order to apply the statistical methods to any type of
enquiry it is essential that statistical data be collected, as statistical
analysis is not possible in the absence of quantitative data. Data are in
fact the fundamentals of statistics. Therefore, an all-important step in
statistical work is the collection of facts and figures. The problem
appears to be very simple and easy on first thought but actually it is
not so. A careful study of the technique of collection of data and their
presentation in proper form is absolutely necessary as these things
form the very foundation of the statistical information that has to be
provided. Every aspect of the problem has to be carefully examined
so that the real purpose of the collection of facts may be fulfilled. As
such in all statistical investigations before the collection of data begins
a large number of preliminaries have to be undergone.
Problem should be numerical. A statistical study is always undertaken
to supply answers to some questions which emerge from any important
problem. But all types of questions cannot be answered statistically.
Therefore, the first thing to be observed by a statistical investigator is
whether the problem, and more particularly the question arising out of it,
is capable of quantitative expression. Such questions as: how great was
Mahatma Gandhi, how brave was Subhas Chandra Bose or how
virtuous was Aurobindo? cannot be answered by the use of statistical
methods. A question, to be suitable for statistical investigation, should
be like: what is the average production of wheat per acre in India?
What is the national income of this country, or what is the total population
of the Indian Union? All these questions are capable of being
answered in absolute or relative numbers.
Having verified the fact that the problem under consideration is
capable of quantitative study, other things to be thought about are
the object of the investigation and the scope of the enquiry. It is also
necessary to study beforehand the data which have to be collected in order
to fulfil the objects of investigation, and the sources from which the
collection has to be done. When all these things have been decided
then only a decision can be taken about the type of enquiry which is
to be conducted. At this stage the statistical units are decided and
defined and an idea is also formed about the degree of accuracy desired. Thus
before the collection of the data actually begins the following steps must
be carefully discussed and analysed:-
1. Object and scope of the enquiry.
2. Sources of information.
3. Type of enquiry to be conducted.
4. Statistical units and their definition.
5. Degree of accuracy desired.
We shall now study them in turn.
OBJECT AND SCOPE OF ENQUIRY
Object. The determination of the object of enquiry is a very
important step in statistical investigation. If the object of the enquiry
is properly determined and defined many difficulties of the collection
and analysis of data are automatically removed. It becomes easy to
decide which data are useful and essential for the purpose of the
investigation and which are comparatively less important and can be left out.
This considerably improves the degree of accuracy of the collected data.
The knowledge of the purpose of enquiry serves as a guide in the
collection of facts and the various difficulties experienced by the investigators
are easily solved if the object of the enquiry is kept in mind. Moreover,
with the object of enquiry in mind, it is always possible to have a uniform
approach to different problems which arise during the course of collection
and analysis of facts and figures. The purpose of an enquiry may be
general or specific. Census of population and census of production are
examples of statistics collected for general purpose, while statistics of
cost of living or of indebtedness of a particular group of persons are
examples of statistics collected for specific purpose.
Scope. It is also essential that the scope of the enquiry is
determined beforehand. The extent to which statistical data can be
useful and should be collected for the purpose of a particular
investigation should be decided before the actual work of collection begins.
If a very large quantity of statistical data is collected it is likely
to become unmanageable and it may not be easy to draw correct
inferences from it. On the other hand, if the quantum of statistical
data collected is inadequate, there is every possibility of the
conclusions drawn being incorrect. It is, therefore, essential that efforts
should be made to come to a correct conclusion about the exact
quantum of data that have to be collected. In some cases a complete
record of the whole data may be necessary while in others only a
selective study might suffice.
SOURCES OF INFORMATION
Primary and secondary data. Having determined the object and
scope of the enquiry it becomes necessary to think about the sources
from which data have to be collected. Broadly speaking, the source
of information may be either (i) Primary or (ii) Secondary. Primary
data are those which are collected for the first time by the investigator
or by enumerators working under him, while secondary data are those
that have already been collected by others and which are usually available
in journals, magazines or research publications. The nature, scope and
objects of the enquiry have to be taken into account for deciding
whether the data are to be collected originally or whether published or
unpublished information which has already been collected can be utilised
for the purpose of investigation. If statistics have to be collected in the
shape of primary data we shall have to decide about the persons from
whom such information is to be gathered. Some enquiries like
population census or rural indebtedness may involve varied sources
from which data have to be collected, while in small enquiries like
tourist traffic in hill stations or expenses of university education in a
particular state, the sources may be comparatively few and less varied.
If secondary data have to be used adequate precautions must be taken,
otherwise results of the investigation are likely to be inaccurate. A
careful study should be made as to how the published statistics were
collected, what was the purpose of their collection and whether they
are suitable for use in the enquiry that is being conducted. Published
statistics should never be taken at their face value.
TYPE OF ENQUIRY
Object and scope. A decision about the type of enquiry most suit-
able for a particular problem can be taken only after a study of a large
number of factors. Among these, the object and scope of the enquiry
are comparatively more important. The type of enquiry is considerably
influenced by the object with which the investigation is conducted,
and the scope of the enquiry has also a considerable bearing on this
problem. If, for example, the object of an enquiry is to find out the
total area under wheat in Uttar Pradesh, the type of enquiry best suited
would be one in which there is complete enumeration. A sample study
would not give dependable results. If, on the other hand, the object
of the enquiry is to find out the normal yield, a sample survey would give
fairly accurate results. Similarly, if the scope of the enquiry is wide, it
has to be of one type and if the scope is narrow the enquiry has to be of a
totally different form. Thus, we find that a decision about the type of
enquiry most suitable for a particular case depends considerably on the
object and scope of the investigation.
Who wants information. Another factor affecting the decision about
the type of enquiry is the answer to the question, on whose behalf the data
are being collected. If the enquiry is being conducted on behalf of the
state, the task of collection becomes comparatively easier as the state can
compel people to supply the necessary information and that too at regular
intervals, and at their own cost. If the investigation is being conducted
on behalf of an institution other than the state, for example, a chamber of
commerce, university, or a trade union, there is the force of moral
pressure only. These institutions can only persuade and request people to
give the necessary information. The type of enquiry in such cases is
bound to be of a different fashion. If the data are being collected by an
individual on his own behalf, the position is still worse. He can only
beg for information which he needs and the enquiry in this case would be
of a still different type. Besides this the financial resources of different
types of persons or institutions conducting statistical investigations also
differ. A state can spend much more than a private institution and
similarly a private institution can ordinarily spend much more than an
individual. Thus a decision about the type of enquiry is affected by its
financial implications also.

How do data emerge. The manner in which statistical information
emerges is another factor which has a bearing on the type of enquiry
which should be done in a particular case. If the data are to be
collected originally, or in other words, if the primary data have to be
collected, the type of enquiry suitable in such cases would differ from the
type which would be ideal if the data in question are secondary. If the
secondary data have to be compiled and used, the problems of definition
of terms and units, etc., no longer arise, as the data have already been
collected with certain definitions of the units which cannot be changed. In case
of primary data various terms and units will have to be defined in the
light of the objects of the enquiry; the manner of the collection of data
in this case would be entirely different from the previous one.
A decision about the type of enquiry best suited for a particular
investigation should be taken after due consideration of all the factors
discussed above. An enquiry may be
(a) Census or sample
(b) Original or repetitive
(c) Direct or indirect
(d) Open or confidential
We shall now briefly explain these types.
Census or sample. A census enquiry is one in which all the units
connected with the problem are taken into account, while in a sample
enquiry only some selected representative units are studied. Whether
a particular enquiry should be of a census type or sample type depends on
a variety of factors like object, scope and nature of the investigation, as
also on the amount of money available for the purpose. A census enquiry
is usually a costly affair and requires a big organisation which ordinarily
only a state or a big private institution can afford.
Original or repetitive. An original enquiry is one which is conducted
for the first time and a repetitive enquiry is one which is carried on in
continuation or repetition of previous enquiries. In an original enquiry
the plan of investigation has to be drawn whereas in repetitive enquiry
the previous plan is only modified to suit the new situation. It should,
however, be remembered that in the repetitive enquiry the definition of
various terms should not be materially altered as this would render com-
parison inaccurate. It should also be remembered that old definitions
very often need a change, and as such, the advantages of a new definition
should be weighed against the drawback of sacrificing comparability and
continuity of figures.
Direct or indirect. Statistical enquiries may be direct or indirect.
Direct enquiries are those in which the data are capable of quantitative
expression and can be directly measured whereas indirect enquiries
are those in which the problem is not capable of quantitative measurement
directly. If we have to collect statistics of the age, height, weight or
income of a group of persons the enquiry would be a direct enquiry as all
these things are capable of direct measurement. On the other hand, if
the problem relates to intelligence or character of a certain group of people,
the enquiry would have to be an indirect one as these phenomena
are not capable of direct quantitative measurement. In such cases some
factors which have an indirect bearing on the problem and which can be
quantitatively measured will have to be studied. For example, to study
intelligence of a group of persons we may have to study the marks ob-
tained by the group in a certain test and thus we may have some idea
about the main problem.
Open or confidential. Another classification of statistical enquiries
is into open and confidential. An open enquiry is one which is not
confidential and the details of which are not kept in secrecy. Most of the
enquiries conducted by the state, private institutions and even individuals
are of this type. However, the results of certain enquiries are not open to
the public and are kept confidential. Private bodies like manufacturers'
associations, employers' associations or trade unions sometimes collect
information, the details of which are confined only to their members and
none else. Such enquiries are of a confidential type.
STATISTICAL UNITS

Need of definition. The collection of statistics necessitates measurement
or counting, and as such it is essential that the unit in which the
data are to be collected should be properly defined. In the absence
of a proper definition of the unit it is quite likely that the items which
should have been included are omitted and those which should have
been omitted are included. At first thought it might appear to be
a very easy and even unnecessary step but a little thinking would clearly
show how difficult and important the problem is.
Physical and arbitrary units. "The unit of measurement applied to
the data in any particular problem is the statistical unit." In many
studies the unit to be used is conventionally fixed and is well determined
and defined. Physical units of measurement like ton, pound, yard, foot,
inch, hour and year, etc., are examples of this type. These units do not
need any explanation or definition. However, in many statistical studies
such customary and legal units are not available. It is in such cases that
a statistician has to arbitrarily decide about a unit and has to give it a
proper definition. In social sciences such situations arise very frequently.
For instance, if an enquiry is undertaken about the wages of workmen
in any industry the unit of measurement will have to be carefully defined.
Wage is a very general and vague term. It may refer to money wage
or real wage, piece wage or time wage, of skilled worker or of unskilled
worker, weekly wage or monthly wage, and so on. Further, a week may
be of 48 working hours or of 40 working hours or less or more. Under
such circumstances, which unit of wage income should be used is a
question not easy to answer. It is, therefore, essential that a statistician
defines the units of data before he commences the work of collection.
The unit of measurement should be uniform throughout the study of a
particular problem.

Requirements of statistical units


Should be unambiguous and specific. The first and foremost requirement
of a statistical unit is that it should be unambiguous and unmistakable.
If the unit is not specific, and if its meaning is liable to be
misunderstood, the data collected would suffer from various types of
inaccuracies. It is necessary that the units are properly defined, as in the
absence of proper definition ambiguity about their meaning is bound
to arise.
Should be stable. The statistical unit should be stable. If there
are significant fluctuations in the value of a unit the data collected at
different times or at different places would not be comparable and much
of their utility would be lost. Fluctuations in the value of currency or
difference in weights and measurements at different places may create
unending difficulties in comparison and analysis.
Should be appropriate to enquiry. The unit should further be appro-
priate to the enquiry and should be capable of correct ascertainment.
As has been noted earlier the definition and concept of a statistical unit
differ from enquiry to enquiry. Price may mean retail price in one en-
quiry, wholesale price in a second enquiry and cost price in a third en-
quiry. It may be used in other senses also. It is essential that the
unit is defined in such a manner that it completely suits the purposes
of enquiry.
Should be homogeneous. Homogeneity of the units is another essential
thing which requires careful consideration. The unit must be uniform
throughout the enquiry. The unit should imply as far as possible the
same characteristics at different times or at different places. If the data
are not homogeneous they can be broken up in groups and sub-groups to
secure uniformity. For instance, if data relating to industrial accidents
are being collected, the accidents can be divided into a number of classes
on the basis of the type of injury and compensation claimed. Thus even
heterogeneous data can be made homogeneous in small groups to ensure
uniformity in study.
Types of statistical units
Broadly speaking, statistical units can be of two types, viz.:
(a) Units of collection.
(b) Units of analysis.
(i) Units of collection. Units of collection are those units in which
figures relating to a particular problem are either enumerated or estimated;
for example, production of wheat in India may be estimated
in tons, the consumption of electricity in kilowatts and the exports
of cotton in bales. Units of measurement may be either simple or composite.
Simple units of collection like ton, pound, bale, kilowatt, yard, and
hour, etc., are not at all difficult to define. Their meaning is general
and they are in common use. However, care should be taken in their
actual use. For example, bales of cotton can be of different weights. In
such cases a standardised definition of the units must be used and this
fact should be mentioned. Similarly most of the monetary units have
different values in different countries and even in the same country at
different times the values are not the same. Allowance for such variations
must always be made.
A composite unit is one which is formed by adding a qualifying word
to a simple unit with the result that its scope becomes restricted and its
definition becomes rather difficult. "Mile" is a simple unit and its scope
and meaning are very clear; but if this word is preceded by a qualifying
word "ton" then "ton-mile" becomes a composite unit. It has now a
restricted scope and it requires a special definition. Ton-miles are equal
to the number of tons multiplied by the number of miles carried. Other
examples of composite units are passenger-miles, labour-hours, kilowatt-hours
and bus-miles.
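The definition of ton-miles given above can be illustrated by a short sketch in Python. The consignment figures and the function name ton_miles are hypothetical and are used only to show how a composite unit is built up from two simple units.

```python
# Hypothetical consignments, each recorded as (tons carried, miles hauled).
# The figures are illustrative only and do not come from any actual enquiry.
consignments = [(10, 50), (4, 120), (25, 8)]


def ton_miles(loads):
    """Return total ton-miles: for each load, tons multiplied by miles carried."""
    return sum(tons * miles for tons, miles in loads)


total = ton_miles(consignments)
print(total)  # 10*50 + 4*120 + 25*8 = 1180
```

The total tonnage (39 tons) or the total mileage (178 miles) taken alone would not measure the work done; the composite unit requires the product to be taken consignment by consignment.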
(ii) Units of analysis. Units of analysis and interpretation, as their
name suggests, are those units with which statistical data are analysed and
interpreted. They include ratios, percentages, rates and coefficients. All
these are very useful for the purpose of comparison. Comparison in
statistical analysis may relate to time, place or condition. A series relating
to annual production of manganese in India during the last ten years is a
time-series; if a comparative study of the production in different years is
to be undertaken, it can best be done by calculating ratios, coefficients
or percentages. Similarly series relating to space or condition are also
analysed with the help of such units. Ratios and coefficients involve
comparison between the numerator and denominator both of which are
supposed to be homogeneous. Similarly percentages and rates (per 1000)
are comparisons of certain figures in relation to a fixed level of 100 and
1000 respectively. Elsewhere in this volume we shall discuss the fallacies
of ratios and percentages and we shall point out the precautions which
should be taken while making use of these units for the purpose of making
comparisons or drawing inferences.
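The units of analysis described above may be sketched in Python as follows. The function names and all the figures are assumptions made for illustration; they are not taken from any published series.

```python
def ratio(numerator, denominator):
    """Plain ratio of two homogeneous quantities."""
    return numerator / denominator


def percentage(part, base):
    """Express a figure in relation to a fixed level of 100."""
    return 100 * part / base


def rate_per_thousand(events, population):
    """Express a figure in relation to a fixed level of 1000."""
    return 1000 * events / population


# Hypothetical annual production (in tons) for two years of a time-series.
production = {"first year": 1200, "second year": 1500}

print(ratio(production["second year"], production["first year"]))       # 1.25
print(percentage(production["second year"], production["first year"]))  # 125.0
print(rate_per_thousand(30, 2000))                                      # 15.0
```

In each case the numerator and the denominator are assumed to be expressed in the same homogeneous unit; otherwise the comparison has no meaning.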
DEGREE OF ACCURACY
Absolute accuracy is impossible. Before commencing the work of
actual collection of data it is necessary that the investigator has some
idea in his mind about the degree of accuracy which he desires in his
estimates. The type of enquiry and the mode of collection of data
are affected to a considerable extent by the degree of accuracy which
is aimed at. It should be kept in mind that absolute accuracy is impossible
to achieve, and as such efforts must be made to achieve only a reasonable
standard of accuracy. In most of the statistical investigations,
perfect accuracy, even if it were attainable, is hardly of much use and a
reasonable degree of accuracy is enough to draw dependable inferences.
A decision about the degree of accuracy should be made with regard
to the purpose of investigation and the nature of enquiry. The degree of
precision needed by a grain merchant in weighing grain is much less than
that needed by a chemist in weighing medicine.

The standard of accuracy aimed at should be stated, and if possible,
the limits of the probable error should also be mentioned. We shall
discuss the concept of accuracy in greater detail in the next chapter.

The above discussion gives in brief an idea of the preliminaries
that are necessary before the actual work of collection of data commences.
In the next chapter we shall discuss the methods of the collection
of statistical data.

Questions
1. Discuss the preliminary steps which should be taken before commencing
the work of collection of data.
2. Why is it necessary to determine the object and scope of the enquiry before
planning an investigation?
3. What is a statistical unit? Is it necessary that the data be homogeneous?
(B. Com. Agra, 1939).
4. What steps would you take to organise an economic survey of a typical
Indian village?
5. Describe the various stages in conducting a primary economic investigation.
What precautions will you take at each stage? (M. A. Punjab, 1950).
6. What is meant by (a) units of collection, and (b) units of analysis? Explain
their respective uses.
7. Differentiate between simple and composite units. Give examples of each.
8. Write a note on the purpose and utility of planning a statistical investigation.
9. What is meant by degree of accuracy? How should it be determined?
10. Distinguish between primary and secondary data. Illustrate your answer
with examples.
5
Collection of Primary and Secondary Data
Primary and secondary data. After the preliminaries discussed in
the last chapter have been gone through, the task of the collection of
data begins. Statistical data, as we have already seen, can be either
primary or secondary. Primary data are those which are collected for
the first time and are thus original in character, whereas secondary
data are those which have already been collected by some other persons
and which have passed through the statistical machine at least
once. Primary data are in the shape of raw materials to which statistical
methods are applied for the purpose of analysis and interpretation.
Secondary data are usually in the shape of finished products since
they have been treated statistically in some form or the other. After
statistical treatment the primary data lose their original shape and become
secondary data. On a closer examination it will be found that the dis-
tinction between primary data and secondary data in many cases is one
of degree only. Data which are secondary in the hands of one may be
primary for others. Statistics of agricultural production are secondary
data for the Agriculture Department of a Government, but for the pur-
pose of calculation of national income these data are primary, because
they will have to go through further analysis and their shape will not
remain the same.
Factors affecting choice of method. It is obvious that the methods
of the collection of primary data and secondary data would not be exactly
identical because in one case the data have to be originally collected
while in the other the work is of the nature of compilation. There are
various methods of the collection of primary and secondary data and the
choice of the method depends on a number of factors. Nature, object
and scope of the enquiry are the most important things on which the
selection of the method depends. The method selected should be
such that it suits the type of enquiry that is being conducted.
Availability of finance is another factor which influences the selec-
tion of the method of collection of data. When financial resources at
the disposal of the investigator are scanty he shall have to leave aside
expensive methods even though they are better than others which are
comparatively cheap.
Availability of time has also to be taken into account. Some methods
involve a long duration of enquiry while with others the enquiry can be
conducted in a comparatively shorter duration. The time at the disposal
of the investigator thus affects the selection of the technique by which
data are to be collected.

METHODS OF COLLECTING PRIMARY DATA

The following methods of the collection of primary data are in
common use:
(a) Direct personal investigation.
(b) Indirect oral investigation.
(c) By schedules and questionnaires.
(d) By local reports.
We shall briefly discuss each of them in turn.
Direct personal investigation
In direct personal investigation as the name suggests the investi-
gator has to collect the information personally from the sources con-
cerned. He has to be on the spot for conducting the enquiry and has to
meet people from whom data have to be collected. It is necessary that in
such cases the investigator has a keen sense of observation and he is
very polite and courteous. He should further acquaint himself with
local conditions, customs and traditions so that he is in a position to
identify himself fully with the persons from whom the information is
sought. In some cases it may not be possible or worthwhile to contact
directly the persons concerned and in such cases the investigator has to
cross-examine other persons who are closely in touch with the sources
of data. The information elicited in such a manner should be carefully
used and the investigator should make sure that the persons from whom
data are being collected actually know the facts fully and can deliver him
the goods. The investigator has to be very tactful and cautious in such
cases. He should put easy and simple questions which are capable of
being answered precisely and in a language which is not vague.
The method of direct personal investigation is suitable only for
intensive investigations. It involves enormous cost and usually requires a
long time. It is naturally not suitable for extensive enquiries where the
scope of investigation is wide. Further, in this method the bias or
prejudice of the investigator can do a lot of damage as he is in sole charge
of the collection of data. This method, however, gives very satis-
factory results if the scope of the enquiry is narrow and if the investigator
is fully dependable and is completely unbiased.
Indirect oral investigation
When the above mentioned method cannot be used either on account
of the reluctance of persons to part with information when approached
directly, or on account of the extensive scope of the enquiry or on account
of some other reason an indirect oral examination can be conducted. In
this method data are not collected directly from the persons concerned but
through indirect sources. Persons who are supposed to have knowledge
about the problem under investigation are interrogated and the desired
information is collected. Usually in such enquiries a small list of questions
relating to the investigation is prepared and these questions are put to
different persons (known as witnesses) and their answers are recorded.

Most of the commissions and committees appointed by the Government
to collect statistical data or to carry on such investigations in which
factual data have to be compiled, make use of this method. They request
different types of people to come and give evidence and on the
basis of these records, facts about different problems are ascertained. In
such enquiries the evidence of one person should not be relied upon and
the views of a number of persons should be ascertained to find out the
real position. In this method the accuracy of data collected would largely
depend on the type of persons whose evidence is being recorded. It is,
therefore, necessary to be very cautious in the selection of these persons.
Invariably it should be seen that the person who is being questioned
(a) knows full facts of the problem under investigation;
(b) is not prejudiced;
(c) is capable of expressing himself correctly and can give a true
account; and
(d) is not motivated to give colour to the facts.
Proper allowance should be made for the inherent optimism or
pessimism of the informants. Some people by nature are optimists while
others are pessimists. These persons may be honest and unbiased and
yet their evidence in most cases is likely to be affected by their inherent
psychology. The well-known example of two drunkards (one optimist
and the other pessimist), each of whom was left with half a glass of wine,
illustrates the point very clearly. The optimist said, "What do I care
for the world, I have yet half the glass with me" and the pessimist
remarked, "What can I do in this world, I have only half the glass with
me." Both of them were stating facts correctly and yet the two state-
ments give entirely different impressions.
Schedules and questionnaires
An important method of the collection of data followed usually
by private individuals, research workers, non-official institutions and
sometimes the Government also, is that of schedules and questionnaires.
In this method a list of questions relating to the problem under investi-
gation is prepared and printed and information is collected from various
sources in any of the following ways : -
(a) By sending the questionnaire to the persons concerned and requesting
them to answer the questions and return the questionnaire.
The main advantage of this method is that it is least expensive
and with it, information can be collected from a wide area in a com-
paratively short period of time. If the investigation is properly conducted
the method can easily ensure a reasonable standard of accuracy. Success
in this method depends on the co-operation that the informants are pre-
pared to give. Generally it has been found that the informants adopt an
attitude of indifference towards such enquiries and in many cases do not
even return the questionnaire. Even those who answer the questions do
so most haphazardly and in a very vague and unintelligible manner.
Only those persons who are under the authority of the investigator or
investigating institution or those who are obliged to them in some form
or the other devote some time and energy in answering the questions. In
order to have correct answers the investigator should send a very polite
letter to the informants emphasising the need and usefulness of the
investigation that is being conducted and requesting them to give their
co-operation by sending correct replies. He should further give them
an assurance that if the informants so wish their replies would be kept
confidential. Further the questions that are asked should be very
carefully framed. The questions should be : -
(1) Short and clear.
(2) Easy to understand and answer.
(3) Few in number.
(4) Free from ambiguity.
(5) Such as can be answered in Yes or No if opinion is sought on
a particular point.
(6) Corroboratory in nature.
(7) Not such as call for confidential information.
(8) Not such as may hurt the sentiments of the informants
or may arouse resentment in their minds.
However, this method cannot be used if the informants are illiterate.
If they are literate but adopt an indifferent attitude then also the method
should be used with utmost caution as in such cases likelihood of error
is very great.
(b) By sending the questionnaires through enumerators to help the informants
in filling the answers.
In this method the enumerators go to the informants along with the
questionnaires and help them in recording their answers. The enumerators
explain the aims and objects of the investigation to the informants
and also emphasise the necessity and usefulness of correct answers. They
also remove the difficulties which any informant may feel in understand-
ing the implications of a particular question or the definition or concept of
difficult terms. This method is very useful in extensive enquiries and
with it, fairly dependable results can be expected. It is, however, very expensive
and usually such enquiries can be conducted only by the Government.
Population census all over the world is conducted by this method.
In such enquiries it is necessary that not only the questions are simple
and few in number but the enumerators are also courteous and polite
and have proper training.
The selection of enumerators is a very important task and should be
carefully done. The enumerators should be explained the nature, scope
and subject of the investigation thoroughly and they should properly
understand the implications of the different questions put and the de-
finitions of the various terms used. The enumerators should have
intelligence and capacity of cross-examination for the purpose of finding
out the truth and they should be persons who are hard-working and
should have patience and perseverance.

By local reports
The last method of collection of primary data is through local
reports. In this method data are not formally collected by enumerators
but by the local correspondents or agents in their own fashion and to
their own likings. Obviously such data cannot be very reliable and
as such this method is used in those cases where the purpose of investigation
can be served with rough estimates only and where a high degree
of precision is not necessary. This method has the advantage of being
least expensive and it also saves the botheration usually associated with
statistical investigations of other types.
REPRESENTATIVE DATA
As has been pointed out previously a statistical investigation can
be either of census type or of sample type. In a census enquiry all the
units associated with a particular problem are taken into account whereas
in a sample enquiry only a few selected units are studied and on the
basis of such studies attempts are made to draw generalisations which
may be applicable to the whole data. If, for example, we have to find
out the average monthly expenditure of the 2000 students residing in the
hostels of the Allahabad University and if we hold a census investigation
we shall have to study the monthly expenditure of each one of these 2000
students. If, however, we hold a sample investigation we shall select, say,
200 students out of these 2000 and then study their expenditure. On the
basis of the study of these 200 units (technically called a "sample") we
can draw conclusions which will hold good about the expenditure of all
the 2000 students (technically called a "universe" or "population").
The sample is considered to be representative of the universe and if the
sample has been properly selected and if its size is adequate, whatever
holds good for the sample should also hold good for the universe. If
the scope of the enquiry is very wide a census investigation would not
only be very expensive but highly cumbersome also. Moreover, it will
take a very long time and require a large number of enumerators. In
such cases a sample investigation is very suitable. A sample usually
gives representative data and the generalisations made on the basis of
such data usually hold good for the universe.
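The comparison between a census and a sample of the 2000 hostel students can be tried out on made-up figures. The sketch below, in Python, is only an illustration: the expenditure figures are generated artificially (an assumed average of Rs. 150 with an assumed spread of Rs. 30), and a pseudo-random generator with a fixed seed stands in for the actual process of selection.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so that the illustration is repeatable

# Hypothetical universe: monthly expenditure (in rupees) of 2000 students.
# These values are simulated, not observed.
universe = [rng.gauss(150, 30) for _ in range(2000)]

# Census investigation: every one of the 2000 units is studied.
census_mean = statistics.mean(universe)

# Sample investigation: only 200 units, drawn without replacement, are studied.
sample = rng.sample(universe, 200)
sample_mean = statistics.mean(sample)

print(round(census_mean, 2), round(sample_mean, 2))
```

On such figures the two averages would ordinarily be found close to one another, though the sample requires only one-tenth of the work of enumeration.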
The most important point, however, is the selection of the sample.
A sample study would give dependable conclusions only if the sample is
a true representative of the universe. Broadly speaking there are two
methods by which samples can be selected and they are:
(1) Deliberate or purposive sampling,
(2) Random or chance sampling.
Deliberate selection or purposive sampling
In deliberate selection or purposive sampling the investigator himself
chooses from the universe a few such units which according to his
estimates are the best representatives of the population. His selection is
deliberate and is based on his own ideas about the representativeness of
the sampled units. These selected units are intensively studied and
certain conclusions are arrived at. It is supposed that these conclusions
would hold good for the whole population.
This technique of selection has many drawbacks. The first and
the foremost of them is that the bias or prejudice of the investigator has enough
scope to work and influence the selection. If the investigator is biased, it is
but natural that he would select such a sample which would give conclusions
which suit his requirements and views. If, for example, an
investigator wants to show that the expenses of students residing in
the hostels of the university are very high he can select such a sample
which consists of those students only who are very aristocratic and who
spend much more than others. Another defect of purposive sampling
is that it is not possible to have an idea about the degree of accuracy achieved in
any statistical investigation conducted by this method. If the scope
of enquiry is very wide the selection of the sample by this method can
never be recommended. However, if the investigator is unbiased and
has the capacity of keen observation and sound judgment even purposive
selection can give fairly dependable results.
Chance selection or random sampling
In random sampling the selection of the units is done in such a
manner that the chance of selection of each unit of the universe is the
same. In other words, the selection of the units depends entirely on
chance and one does not know beforehand which units will actually
constitute the sample. It is for this reason that this method is also
known as the method of chance selection. It is in fact a lottery method of
selection. How such a selection can be made is a question not easy to
answer. Methods, which on first thought, appear to be perfectly ran-
dom may actually prove to be otherwise. If we have to select a sample
of 200 students out of 2000 hostellers, we can write their names on small
chits of papers and after folding them and mixing them together can
blind-folded draw 200 chits. This is the lottery method. It appears to
be random but in actual practice it may not be so. It is quite possible
that some chits were folded and pressed less than others and so their
size was slightly bigger than the size of other chits and if it was so, the
chance of the selection of bigger chits cannot be said to be the same as
that of the smaller ones. Thus, we see that it is not easy to have a purely
random sample. However, various methods are in vogue and among
them the technique of Tippett's Numbers¹ is most popular. In random
sampling attempts are made to eliminate human bias in all forms and that
is why selections are usually made with the help of machines. Each unit
of the universe is assigned a number and then certain numbers are mecha-
nically selected to constitute a sample.
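The procedure of numbering the units and selecting numbers mechanically can be sketched in Python as below. It should be noted that the sketch uses the pseudo-random generator of Python's standard library in place of a printed table such as Tippett's Numbers; the function name draw_random_sample and the seed are assumptions made for illustration.

```python
import random


def draw_random_sample(universe_size, sample_size, seed=None):
    """Number the units 1..universe_size and draw sample_size distinct numbers.

    Every unit has the same chance of selection and no unit can be
    drawn twice (selection without replacement).
    """
    rng = random.Random(seed)
    serial_numbers = range(1, universe_size + 1)
    return sorted(rng.sample(serial_numbers, sample_size))


# 200 students out of 2000 hostellers, as in the example above.
chosen = draw_random_sample(2000, 200, seed=42)
print(len(chosen), chosen[:5])
```

Since the selection is made by the machine and not by the investigator, the difficulty of unequally folded chits does not arise here.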
Chance selection or random sampling has many advantages over
purposive selection. The most significant merit of this system is that by
I Fot a detailed study see chapters on Sampling.
COLLECTION OF PRIMARY AND SECONDAR'i' DATA 47

theory of probability it is possible 10 hlJlJe an ;Jea aboRI Ihe e"DrS of esti-


malion, and we can always find out whether the results are significant
or not. It is possible to assign limits within which the true value of
a measure of universe must invariably lie. Another point in favour of
this method is that the selection is nol affected by the prejudice or bias of Ibe
investigator. As we have noted above, the selection in most cases under
this system is made by mechanical devices and naturally human bias has
hardly any scope here. But it must always be kept in mind that in many
cases it is difficult to say that the selection has been purely random and
that the sample is fully representative of the universe. However, as
far as possible, the selection of the sample should be done on a random
basis as it is always likely to give better results than the method of pur-
posive selection.
Accuracy and size of sample. It should further be kept in mind that
the size of the sample has a relation to the degree of accuracy that it is
expected to achieve. Ordinarily, the bigger the size of a sample the
greater would be the accuracy, but a very big sized sample is likely to
become unmanageable and is very often unnecessary. No hard and fast
rule can be laid down with regard to the size of a sample. An ideal size
would depend on the type of the series and the size of the universe. If
the series is comparatively more variable the sample should be big to
cover up all types of variations. Again, ordinarily the bigger the universe
the greater should be the size of the sample. The accuracy in a random
sample has more or less a fixed relation to its size. The accuracy of a
sample increases with the rate of the square root of the increase in size of
the sample. If, for example, the degree of accuracy desired is to be
doubled, the size of the sample should be increased four-fold; if it is to
be trebled the size of the sample should be increased nine-fold. We
shall study this relationship in further detail in the chapters on Sampling.
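The square-root relationship can be checked with a few lines of arithmetic. The sketch below assumes the usual standard error of a sample mean, sigma divided by the square root of n, as the measure of accuracy; the function name and the figures are illustrative and are not taken from the text.

```python
import math

def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n) (assumed measure of accuracy)."""
    return sigma / math.sqrt(n)

sigma = 12.0                      # illustrative standard deviation of the universe
base = standard_error(sigma, 100)
for factor in (1, 4, 9):
    se = standard_error(sigma, 100 * factor)
    # base / se is the gain in accuracy relative to a sample of 100
    print(f"n = {100 * factor:4d}  standard error = {se:.2f}  accuracy gain = {base / se:.0f}x")
```

A four-fold sample halves the standard error and a nine-fold sample cuts it to a third, which is the doubling and trebling of accuracy described above.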
Random sampling and the theory of probability. The technique of
random sampling is based on the Theory of Probability.1 Probability
is a mathematical concept and indicates the likelihood or the chance of the
happening or not happening of a particular event. If, for example, a coin is
tossed it can fall in two ways-either with head up or tail up. Each
of these ways is equally likely and so the probability of the coin falling
head or tail up is equal. It is 1/2. If, however, a die is thrown, there
are six possible ways in which it can fall. The probability of its falling
with No. 6 up is 1/6 because the chance of its falling with any of the six
numbers upward is equal. The chance of the die not falling with 6
upward would be 5/6 as there are five ways in which it can fall without
No. 6 being upward. Thus, if an event can happen in a ways and fail to
happen in b ways and if each of them is equally likely, the chance of its
happening would be $\frac{a}{a+b}$ and of its not happening $\frac{b}{a+b}$. If from a
pack of 52 cards one card is drawn at random the chance of its being

1 For detailed description see chapters on Sampling.


any king is clearly 4/52 and the chance of its being any card of spade is
13/52. This clearly indicates that if the chances of selection of all the
units in a universe are equal, and if from it selections are made at random,
then the possibility is that in the sample so selected the various types
of units would be in the same proportion in which they are in the universe.
On this basis it is said that random sampling gives a representative sam-
ple which contains the characteristics of the population. Further, as
has been pointed out earlier, the size of the sample and its accuracy are
also related. In ten tosses of a coin it is not unlikely that seven times it
falls heads and only three times tails. But if there are 100 tosses there is
a greater chance of the proportions of heads and tails being nearly equal.
If the number of tosses is 1000 the chance of a nearly equal distribution of
heads and tails is still greater.
The bigger the size of the sample the greater is the chance of accuracy.
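The two ideas in this paragraph, the chance a/(a+b) and the effect of the number of tosses, can be worked out exactly. The sketch below is illustrative only: the function names are hypothetical, the coin is assumed fair, and "nearly equal" is taken to mean that heads make up between 40% and 60% of the tosses. What grows with the number of tosses is the chance of a nearly equal proportion, not the chance of an exactly equal count.

```python
from fractions import Fraction
from math import comb

def chance(a, b):
    """Chance of an event that can happen in a ways and fail in b equally likely ways."""
    return Fraction(a, a + b)

def chance_heads_between(n, low, high):
    """Exact chance that n tosses of a fair coin give between low and high heads."""
    favourable = sum(comb(n, k) for k in range(low, high + 1))
    return favourable / 2 ** n

print(chance(1, 5))     # a six with one die
print(chance(4, 48))    # any king from a pack of 52 cards
print(chance(13, 39))   # any spade from a pack of 52 cards

# Chance that heads make up 40% to 60% of all tosses
for n in (10, 100, 1000):
    print(n, round(chance_heads_between(n, 4 * n // 10, 6 * n // 10), 4))
```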
Law of statistical regularity. Thus according to the rules of the
theory of probability, if from the universe a moderately large sized sample
is chosen at random, it is almost certain that on an average the sample so
chosen will have the same characteristics as the universe. It is on this
basis that games of chance are played successfully by a large number of persons
and the insurance companies are able to insure people against various
types of calamities. In statistics this law is known as the "Law of Statistical
Regularity". It is a corollary to the main theory of probability.
The theory of probability tells us of the mathematical expectation of the success or
failure of an event and on this basis the law of statistical regularity tells us that
random selection from the universe is very likely to give a representative sample.
Law of inertia of large numbers. We have mentioned above that
there is a relationship between the size of a sample and its accuracy.
The larger the sample the greater would be the accuracy. The reason
for this lies in the fact that in large numbers the chances of compensatory
action are greater. If in the first ten tosses of a coin there are seven heads
and three tails, it is quite likely that in the next ten tosses the situation
might be reversed and there may be seven tails and three heads. The
larger the number of such experiments the greater are the chances of
one irregularity compensating the other. It is said on this basis that
large numbers have got an inertia or that they are more constant. The
production of wheat in the district of Allahabad might show great varia-
tions year after year but the production figures of the state of U. P. would
not vary much, because if in some districts the crop is above normal it is
very likely that in others it might be below normal. Similarly the pro-
duction figures of wheat for the whole of India would show still less
variation and the figures of world production would show hardly any
significant change. This phenomenon is characterised as the "Law of
Inertia of Large Numbers" which states that large numbers are relatively
more constant and stable than small ones. It is on the basis of this law
that we say that the larger the size of the sample the greater would be its
accuracy.
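The compensating effect behind this law can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not data from the text: each district's yearly output is drawn independently from a normal distribution with mean 100 and standard deviation 20 (hypothetical figures), and the year-to-year variation of the total is expressed relative to its average.

```python
import random
import statistics

def relative_variation(num_districts, years=2000, seed=1):
    """Coefficient of variation of total output summed over independent districts."""
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(100, 20) for _ in range(num_districts))
        for _ in range(years)
    ]
    return statistics.pstdev(totals) / statistics.mean(totals)

# One district, a group of ten, and a "state" of fifty districts
for k in (1, 10, 50):
    print(k, round(relative_variation(k), 3))
```

The relative variation falls as more districts are added together, because a poor crop in some districts is offset by a good crop in others.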
It should not be concluded from the above discussion that the law
of inertia of large numbers does not allow any change in figures with the
passage of time. All that it means is that large numbers are more constant
and stable than small ones. There are no violent fluctuations in large
numbers. After all the figures of world production of wheat do change
from time to time but these changes are not violent and sudden. They
are slow and gradual. Long-period trend is indicated by large numbers:
they simply ignore the short-period regular and irregular fluctuations.
COLLECTION OF SECONDARY DATA
Sources of secondary data
We know that secondary data are those which have already been
collected and analysed by someone else, and as such the problems asso-
ciated with the original collection of data do not arise here. Secondary
data may be either published or unpublished. The sources of published data
are usually:
(a) Official publications of the central, state and the local govern-
ments.
(b) Official publications of foreign governments or interna-
tional bodies like the United Nations Organization and its
subsidiary bodies.
(c) Reports and publications of trade associations, chambers of
commerce, banks, co-operative societies, stock exchanges, and
trade unions, etc.
(d) Technical trade journals like the Economica, Indian
Journal of Economics, Commerce, Capital, etc., and books
and newspapers.
(e) Reports submitted by economists, research scholars, university
bureaus and various other educational associations, etc.
The sources of unpublished data are varied, and such materials may
be found with scholars and research workers, trade associations, cham-
bers of commerce, labour bureaus, etc. Many enquiries of a private
nature are conducted by these bodies and these findings are not pub-
lished and are usually meant for the consumption of their members only.

Editing and scrutiny of secondary data


The secondary data must be used with caution. It is usually very
difficult to verify such data and to edit them to find out inconsistencies,
probable errors and omissions. Scrutiny of the secondary data is essen-
tial because the data might be inaccurate, unsuitable or inadequate. In the
words of Bowley, "It is never safe to take published statistics at their face
value without knowing their meanings and limitations and it is always
necessary to criticise arguments that can be based on them." Statistics
collected by other people cannot be fully depended upon as they may
contain many pitfalls and unless they have been thoroughly scrutinized
they should not be used.
The secondary data should possess the following attributes:


(i) They should be reliable. The reliability of the data can be tested
by finding out:
(a) Who collected the data and from which sources?
(b) Are both the compiler and the source dependable?
(c) Were the data collected by the use of proper methods?
(d) At what time were the data collected? Can it be regarded
as normal time?
(e) Are there any possibilities of deliberate or unconscious bias
on the part of the compiler?
(f) What degree of accuracy was desired by the compiler? Was
it achieved?
(ii) They should be suitable for the purpose of investigation. Even if the
data are reliable they should not be used if they are found to be unsuitable
for the purpose of investigation. Data which are suitable for one
enquiry may be entirely unsuitable for another. An example would make
the point clear. If an enquiry is being conducted about the level of
earnings of factory workers and if some data collected by some agency
relating to wage level are being utilised, it is quite likely that these data
may be unsuitable for the purposes of the present enquiry. It is possible
that the data which are being used might relate to wages of skilled labour
only or might relate to the wages of day workers only or might include
bonus payments. In all these cases the data are unsuitable for investigat-
ing the earnings of factory workers. The definition of various terms and
units of collection must also be carefully scrutinized and the object,
scope and nature of the enquiry should also be properly studied. If
there are differences in these, the data are not fit to be used.
(iii) They should be adequate. The data may be found to be reliable
and suitable but they may be inadequate for the purpose of the enquiry.
The original data may refer to an area which is wider or narrower than
the area of the present enquiry and if it is so, they should not be used,
because there might be significant variations in different regions.
Further the data may not cover suitable periods; for a monthly study of a
phenomenon, yearly figures are inadequate. Again the degree of accuracy
achieved in the data may be found to be inadequate for the purpose of
the investigation in which they are proposed to be used.
Thus it is very risky to use statistics collected by other people unless
they have been thoroughly scrutinized and found reliable, suitable and
adequate.
Questions
1. Distinguish between primary and secondary data. What are the various
methods by which primary data are collected?
2. "In collection of statistical data commonsense is the chief requisite and
experience the chief teacher." Discuss the above statement with comments.
(M. A. Patna, 19~1).
3. Mention the different kinds of statistical methods generally used in investi-
gations. Are there any fields of enquiry where these methods cannot be used?
(B. Com. Agra, 1940).
4. "Though figures cannot lie, yet liars can figure". Expand the above state-
ment so as to explain its bearing on the use of secondary statistical data.
(M. Com. Allahabad, 1945).
5. How will you organise an investigation into the handloom weaving industry
of Uttar Pradesh? Prepare a questionnaire for the purpose.
(B. Com. Allahabad, 1942).
6. How far do the results of statistical investigations depend upon correct
sampling? Compare the methods used to secure representative data.
(B. Com. Agra, 19~9).
7. State and explain the law of statistical regularity. Discuss the methods
generally used in sampling. (B. Com. Agra, 1941).
8. Compare the different methods used in the collection of numerical data.
Explain the importance of determining a statistical unit. (B. Com. Agra, 1942).
9. Distinguish between a census and a sample enquiry and briefly discuss their
comparative advantages. Which of these methods would you prefer for calculating
the total wages of workers in a given industry? (M. Com. Agra, 1947).
10. You are required to undertake a rapid sample survey for estimating average
size of a holding for your province. How would you plan the survey and how would
you use the results of this survey on a subsequent occasion?
11. It is desired to obtain reliable data to find out the cost of production of sugar-
cane in Uttar Pradesh. How will you proceed to organise the enquiry? What various
points of importance will you consider and what decisions on each such point would
you make? (I. C. S. 1948).
12. What is a random sample? Explain the distinction between a random sample
and a representative sample. How would you apply the technique of random sampl-
ing to an enquiry into working class family budgets?
13. Classify the methods generally employed in the collection of statistical data
and state briefly their respective merits and demerits. (B. Com. Allahabad, 1946).
14. Draw up a suitable questionnaire for surveying the economic aspects of any
cottage industry in which you may be interested. Briefly indicate how you will pro-
ceed to collect the relevant material.
15. Discuss the advantages of direct personal investigation as compared with the
other methods generally used in collecting data. (B. Com. Agra, 1950).
16. How will you organise an economic survey of a small Indian State com-
prising five towns and 1,000 villages? (M. Com. Allahabad, 1943).
17. If you are appointed to investigate the housing conditions of industrial
labour in Lucknow how will you proceed to do the job? Give a specimen of the
questions that you would put. (B. Com. Lucknow, 1944).
18. Compare the advantages and disadvantages of the census method and the
sample method of collecting statistics. (B. Com. Calcutta, 1937).
19. Statistical investigations carried out by the Government are usually based
either on complete enumeration of universe of reference, as for instance, the popula-
tion census, or on the study of "typical" cases as for instance, the proposals
regarding the economic census. Explain why the method of random samples is to be
preferred to either of these methods. (M. A. Allahabad, 1935).
20. Show the necessity of the use of method of random sampling in any
extensive investigation. How will you make use of this method in carrying out an
economic survey of the rural areas of U. P.?
21. How would you organise an investigation into the hand weaving industry
of U. P.? Propose a questionnaire suitable for the purpose.
(B. Com. Allahabad, 194~).
22. What is 'sampling' and what are its uses? Explain how you would design
a sample survey to estimate an average size of holding in a locality.
(M. A. Agra, 1947).
23. "It is never safe to take published statistics at their face value without know-
ing their meanings and limitations and it is always necessary to criticise the arguments
that can be based on them." (Bowley). Elucidate. (B. Com. Allahabad, 1946).
24. Why is it necessary to scrutinize and edit secondary data before its use?
What precautions would you take before using such statistics?
25. Write short notes on:
(a) Theory of Probability.
(b) Law of Statistical Regularity.
(c) Law of Inertia of Large Numbers.
26. "In any sample survey there are many sources of errors. A perfect survey
is a myth". Discuss the statement.
27. Suppose you want to study the changes in the extent of indebtedness of
middle-class people of Allahabad for the next five years. How would you proceed
to do it? Explain all the processes. (B. Com. Banaras, 1955).
28. Describe the procedure you would adopt in order to obtain the necessary
information for introducing compulsory primary education in a big city.
(B. Com. Banaras, 19'2).
29. "Statistics, especially other people's statistics, are full of pitfalls for the user".
(Conner). Do you agree with this statement?
30. "Samples are devices for learning about large masses by observing a few
individuals." (Snedecor).
Elucidate the above statement.
31. How would you conduct an enquiry about 'Payment of Wages' in an in-
dustry? On what points would it be necessary for you to be clear before actually
beginning investigation work? (M. Com. Agra, 1957).
32. How would you organise a marketing survey of the fruit trade in a particular
region with a view to making suggestions for its development? Explain the pro-
cedure you would follow step by step. (M. Com. Agra, 1956).
6. Accuracy, Approximation and Errors
Editing of data. After collection of data the next step in a statistical
investigation is the scrutiny of the collected figures. This is technically
called editing of data. It is a necessary step as in most cases the collected
data contain various types of mistakes and errors. It is quite likely
that some question has been misunderstood by the informants, and if it
is so, this part of the data has to be collected afresh, or it may be that
answers to a particular question are, in general, vague, and it is difficult
to draw inferences from them, or some of the schedules and question-
naires are so haphazardly filled that it is necessary to reject them. It is
also likely that some of the investigators were biased and the answers
filled by them or the data collected by them show unmistakable signs of
their prejudices. In all such cases the collected data have to be edited
and modified. However, it should be clearly understood that undue
tampering of data should never be done. If only a few schedules are
defective they can be omitted but this too should be done very carefully.
In some cases the omission of a few schedules would not affect the general
conclusions, while in others this may entirely change the complexion
of the problem under study. As has been pointed out earlier, absolute
accuracy is neither possible nor essential but decision about the extent
to which inaccuracies, approximations and errors can be allowed is a
very important step in statistical analysis and we shall study these things
in the following pages.

ACCURACY

Reasons why perfect accuracy is not possible. Perfect accuracy means to
describe a phenomenon exactly as it is. It is impossible to achieve.
We can never describe a thing with complete accuracy. There are two
reasons for it: (a) imperfection of the investigator, and (b) imperfection
of the instruments of inspection and measurement. Since man is not
perfect the investigations done by him and the instruments of measure-
ment and inspection made by him are also imperfect. For these reasons
the data collected cannot be absolutely accurate.
It is futile to expect complete accuracy in statistical investigations.
When in physical sciences, where controlled experiments can be done,
perfect accuracy cannot be achieved, it is no use to expect the same in
statistical investigations, where neither are experiments possible nor
is it possible to use the instruments of measurement at all places. In
statistical methods where personal prejudices, deliberate or uncon-
scious, are present, efforts to obtain absolute accuracy are bound to end in
failure. In reality one should not be surprised at the fact that statistical
methods have given comparatively inaccurate results, because there are
reasons for it; the fact to be really surprised at is how the statistical
methods have given results which are fairly close to accurate ones. In
fact the science of statistics helps us in understanding the factual world
with all its inaccuracy and imperfectness. When conditions of investi-
gation are imperfect, the investigator is imperfect and the instruments
of measurement are imperfect it is only natural that the results do not
achieve perfect accuracy.
No need of absolute accuracy. Moreover there is no need of absolute
accuracy in statistical investigations. If reasonably accurate estimates
are available there is no difficulty in understanding or analysing a pheno-
menon. At many places it is foolish to try to have absolute precision.
For example, if the distance from the earth to any planet is estimated
correct to inches (if it is possible) this would hardly have any practical
significance. Where billions of miles are being measured or estimated
inches have absolutely no importance. This is an example of extreme
type. In actual practice estimates which are many times more crude than
this are sufficient for the purpose of statistical analysis. No businessman
cares to weigh grain correct to an ounce. Where measurements are
being done in tons it is enough if they are correct to a pound. Similarly
in the measurement of miles a few yards have no significance, not to talk
of feet and inches. In fact we never measure a thing with perfect
accuracy. We simply estimate its true value. If in the estimates there
is reasonable accuracy we have every reason to be satisfied.
What is reasonable accuracy? But on this point a very pertinent
question arises. What do we mean by reasonable accuracy? It is not possible
to give an absolute definition of this term. It depends on the type of
data that are being used and the purpose of the investigation. In many
cases there are conventional standards of accuracy and they also help the
investigator in taking a decision. In measuring the distance from the
earth to the sun a few hundred miles can very safely be left out but
in the measurement of cloth even a few inches cannot be ignored. In
statistics there is no need of absolute accuracy; only relative accuracy
is taken into account.
How the degree of accuracy is shown. Degree of relative accuracy
achieved should always be mentioned. If the production of wheat in
a certain district is 25,000 tons (correct to 1000) the degree of accuracy
can be shown in any of the following ways:-
(a) The production is 25,000 tons (rounded in thousands).
(b) The production is 25,000 tons plus or minus an amount
not exceeding 500 tons; or the production is 25,000
tons ± 500 tons.
(c) The production is between 24,501 and 25,500 tons.
(d) The production is 25,000 tons correct to 2%.
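The four ways of stating the degree of accuracy all follow from the rounding unit. The sketch below is illustrative: the function name is hypothetical and it assumes symmetric rounding to the nearest unit, so it gives 24,500 as the lower limit where form (c) above quotes 24,501.

```python
def accuracy_statement(value, unit):
    """Describe a figure rounded to the nearest `unit` (assumes symmetric rounding)."""
    half = unit / 2
    return {
        "figure": value,
        "margin": half,                          # plus or minus this amount
        "limits": (value - half, value + half),  # lower and upper limits
        "percent": 100 * half / value,           # margin as a percentage of the figure
    }

print(accuracy_statement(25_000, 1_000))
```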
APPROXIMATION
Meaning and need. "Approximation is the basis of rounding off
the figures with a view to simplify them and to make them fit for con-
sumption and analysis without in any way impairing the standard of
reasonable accuracy." Big numbers are usually confusing to the eye
and the mind, and even when actual figures are available it is worthwhile
to round them off, with a view to make them more intelligible and fit
for analysis and interpretation. At many places there is no need to
give actual numbers and approximate figures serve the purpose all right.
If the actual figures of the production of wheat in India are given without
approximation they would be confusing and difficult to analyse and in-
terpret. Round figures can safely be given in such a case. It is quite
likely that the figures which are left out or added in the process of appro-
ximation might actually make the data more accurate and remove the
errors of calculation.
Methods of approximation
There are some universally accepted methods of approximation.
They are given below. Out of these the first one is the most accurate.
(a) Approximation to the nearest whole number. In this method the
nearest whole number is written in place of the actual figure. Thus
5,32,671 would become 5,33,000 (to the nearest 1000)
4,12,230 would become 4,12,000 (to the nearest 1000)
The rule is that if the portion that is being left is more than half
the whole number (1000 in the above case) it should be replaced by the
whole number. In the first example given above, 671 has been replaced
by 1000 and the number has thus become 533 thousands. If the portion
approximated is less than half of the whole number it should be ignored.
In the second example above 230 has been left out. If the number to be
approximated is just half of the whole number, it can either be replaced
by the whole number or ignored. However, if such cases are many,
in half of them the whole number should be kept and in the other half
the figures should be ignored. Another practice followed in such cases
is to keep the retained figure unchanged if it is even and to increase it to
the next higher figure if it is odd. Thus 325 will be rounded as 320 and
335 as 340.
The same rule can be applied in case of percentages and ratios
etc. For example 74.8% can be written as 75 percent and 73.2% as
73 percent in round numbers.
(b) Approximation by using the next higher whole number. In this
method in place of the portion which is being left out the next higher
figure is written. According to this rule:
5,32,671 would become 5,33,000 (correct to 1000) and
4,12,230 would become 4,13,000 (correct to 1000)
According to the first rule 4,12,230 was approximated at 4,12,000
but according to this rule it has been approximated at 4,13,000.
Similarly 74.8% would be approximated at 75% and 73.2% at
74% and not 73% as in the previous method.
(c) Approximation by discarding certain digits. In this method a
part of the number which is approximated is entirely left out. Thus
5,32,671 would become 5,32,000 (correct to a thousand)
and 4,12,230 would become 4,12,000 (correct to a thousand).
Similarly 74.8% would become 74% and 73.2% would become 73% (correct to
a whole number).
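The three methods can be compared side by side on the two figures used above (5,32,671 and 4,12,230 written without the Indian digit grouping). The sketch below is a minimal illustration for whole numbers only; the function names are hypothetical, and method (a) is assumed here to round an exact half upward, which is only one of the practices the text allows.

```python
def to_nearest(value, unit):
    """Method (a): nearest multiple of unit; an exact half is rounded up here."""
    return (value + unit // 2) // unit * unit

def to_next_higher(value, unit):
    """Method (b): next higher multiple of unit (unchanged if already a multiple)."""
    return -(-value // unit) * unit

def by_discarding(value, unit):
    """Method (c): drop the digits below unit."""
    return value // unit * unit

for v in (532_671, 412_230):
    print(v, to_nearest(v, 1000), to_next_higher(v, 1000), by_discarding(v, 1000))
```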
How much approximation is necessary in a particular case would
depend on the degree of accuracy achieved in the collection of data.
If, for example, certain lines have been measured correct to millimetres
then the tenth part of the millimetre can be removed by approximation;
3.22 mms. can be approximated as 3.2 mms. Ordinarily all figures except
one beyond the margin of accuracy should be left out.
Method of writing approximated figures. The approximated figure
should be written in such a manner that the degree of approximation
is clear from it. For example, a line has been measured correct to milli-
metres and the measurement is 4.99 cms. After approximation it would be-
come 5 cms. but it should be said and written as 5.0 cms. and not 5 cms.
5 cms. would mean that the measurement is correct up to centimetres
only, i.e., all measurements between 4.5 cms. and 5.5 cms. have been
expressed as 5 cms. On the other hand, 5.0 cms. would mean that the
measurement is correct up to millimetres or in other words all mea-
surements between 4.95 and 5.05 cms. have been expressed as 5.0 cms.
Thus if there is a zero at the end of an approximated figure it should always
be written.
The method of approximation should also be made clear while
writing an approximated figure. Usually the lower and the upper limits
of the approximated figure should also be stated. For example, in the
illustration given in the preceding paragraph if the measurement is correct
to centimetres it should be written as 5±0.5 cms. and if it is correct to
millimetres it should be written as 5.0±0.05 cms.
Approximation and other calculations. If approximated figures are
used in multiplication, division or for finding out the roots or powers
great care should be exercised. In such cases the errors due to approxi-
mation are carried into the results of multiplication, division, etc., and this may
considerably affect the conclusions. For example, if two figures 194
and 184 are multiplied their product would be 35,696. If, however, they
are approximated as 190 and 180 respectively and then multiplied the
product would be 34,200. There is a considerable difference between the
two results. Similarly in division of figures or in the calculation of roots
and powers, approximated figures may sometimes give erroneous con-
clusions. The effect of approximation on percentages calculated from
such figures is negligible. An illustration would make it clear. 1,23,65,357
is 25% of 4,94,61,428. If these figures are approximated correct to
a lakh they would become 1,24,00,000 and 4,95,00,000 respectively. The
former is 25.05% of the latter. Thus we see that even a high degree of
approximation has not materially affected the percentages.
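Both illustrations in this paragraph can be verified directly. The sketch below simply repeats the arithmetic with the figures given in the text; the variable names are illustrative.

```python
# Products of exact and approximated figures
exact = 194 * 184
approx = 190 * 180
print(exact, approx, exact - approx)
print(round(100 * (exact - approx) / exact, 1), "per cent of the exact product")

# Percentages from exact figures and from figures rounded to the nearest lakh
part, whole = 12_365_357, 49_461_428
part_r, whole_r = 12_400_000, 49_500_000
print(round(100 * part / whole, 2), round(100 * part_r / whole_r, 2))
```

The product loses about four per cent through approximation, while the percentage moves only in the second decimal place.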
STATISTICAL ERRORS

Meaning. The word error is used in a specialised sense in statistics.
It does not mean the same thing as mistake. Mistake in statistics means
a wrong calculation or use of inappropriate method in the collection or
analysis of data. Error, on the other hand, means "the difference between
the true value and the estimated value." We have seen in the preceding pages
that in statistics we only aim at a reasonable standard of accuracy. In
other words, we use approximated values or estimates rather than actual
values. The difference between the approximated or estimated value and
the true value is technically called the statistical error.
Causes of errors. Statistical errors arise due to a large number of
factors. They may be due to inappropriate definitions of statistical units,
bias of the investigator or the inherent instability of the collected data.
Such errors are called Errors of Origin. Errors may also arise on account
of manipulation in counting, measurement, description or approxima-
tion. Such errors are known as Errors of Manipulation. Yet another
cause of statistical errors may be the use of incomplete data. Errors may
also arise on account of inadequacy of the size of the sample and all such
errors are called Errors of Inadequacy.
Measurement of Errors
Statistical errors can be measured either
(a) absolutely, or
(b) relatively.
Absolute and relative errors. If the error is measured absolutely it is
called an absolute error and if it is measured relatively it is called relative
error. Absolute error is the difference between the true value and the
estimate. If the actual figure of sales of a concern is Rs. 9,900 and the
approximated figure is Rs. 10,000 there is difference of Rs. 100 in these
two figures. This is an absolute error. Relative error is the ratio of the
absolute error to the estimate. In the above example if the absolute error
of Rs. 100 is divided by the estimated figure of Rs. 10,000, the result,
100/10,000 or 0.01, is the relative error. The relative error can also be express-
ed in terms of percentages. It is then known as percentage error. In this
example percentage error would be (100/10,000) × 100 or 1%.
Algebraically if $U'$ stands for the actual value, $U$ for the estimated
value, $U_e$ for the absolute error and $e$ for the relative error,
$$U_e = U' - U \quad \text{and} \quad e = \frac{U' - U}{U}$$
In statistical analysis relative errors are more valuable than abso-
lute errors. Absolute errors very often give erroneous conclusions.
If the true value of a phenomenon is 99 and if it is estimated at 100,
the absolute error is 1. Again, if the true value is 99,999 and the
estimated figure 1,00,000 the absolute error is 1. The absolute error
in both cases is the same but the relative error in the first case is 1/100
while in the second it is 1/1,00,000. The first error is relatively 1000
times the second one.
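The three measures can be computed for the examples given above. The sketch below is illustrative: the function names are hypothetical, and the error is taken with its sign as the actual value minus the estimate, divided by the estimate for the relative error, whereas the worked figures in the text quote magnitudes only.

```python
def absolute_error(actual, estimate):
    """Signed absolute error, actual minus estimate (sign convention assumed)."""
    return actual - estimate

def relative_error(actual, estimate):
    """Absolute error as a ratio of the estimate."""
    return absolute_error(actual, estimate) / estimate

def percentage_error(actual, estimate):
    """Relative error expressed as a percentage."""
    return 100 * relative_error(actual, estimate)

for actual, estimate in ((9_900, 10_000), (99, 100), (99_999, 100_000)):
    print(actual, estimate,
          absolute_error(actual, estimate),
          relative_error(actual, estimate),
          percentage_error(actual, estimate))
```

The last two cases have the same absolute error, yet their relative errors differ by a factor of 1000, which is why relative errors are the more informative measure.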
Positive and negative errors. Absolute and relative errors can be
either positive or negative. If the true value exceeds the estimate, the error
is said to be positive and on the other hand if the estimate exceeds the
true value the error is called negative.
Classes of Errors
Broadly speaking errors may be either
(a) Biased or
(b) Unbiased.
Biased errors. Biased errors are those which arise on account of
some bias in the mind of the investigator or the informant or in the
instruments of measurement. If the investigator wishes to exaggerate
the figures he would approximate them at the next higher figure. If,
on the other hand, he has a downward bias he would approximate them
by discarding numbers. A biased investigator can play mischief even at
earlier stages of investigation. He can select such data which would suit
his conclusions. Biased errors may also arise due to defective instruments
of measurement. If a yard-stick, which is 35" in length, is used to measure
a certain distance it will always produce a biased error, as there will
always be a short measurement.
Biased errors are cumulative. The larger the number of cases in which
there is a biased error the greater would be its magnitude. If we measure
a distance of only 5 yards with a yard-stick of 35" the error would be
of the magnitude of 5". But if we measure 100 yards the error would
be 100". Thus biased errors are cumulative.
Unbiased errors. Unbiased errors are those which arise just on account
of chance. They are not the results of any prejudice or bias. If
figures are approximated to the nearest whole number, the error would be
unbiased, as in some cases the approximated numbers would be less than
the actual ones while in others they would be more than the true values.
Unbiased errors are generally compensating. One error compensates
the other. The law of statistical regularity works here and since errors
are both positive and negative they usually cancel each other. If the
yard-stick is just 36" and if certain distances are measured with it, it
is quite likely that in some cases the measurements are unconsciously
more than 36" while in others they are less. The larger the number
of such measurements the smaller would be the error. An unbiased
coin may fall heads in 3 tosses out of 4 but in 3000 tosses the numbers of
heads and tails are bound to be more or less equal. There is a general
tendency everywhere to give ages in round figures. It is another
example of unbiased error. If some people have, in this process,
over-estimated their ages, others might have under-estimated them. A person
of 29 years of age may call himself 30 but it is also likely that a person
of 31 years may call himself 30, and in such a case the errors cancel
each other.
The following table will illustrate the characteristics of the biased
and unbiased errors : -

TABLE I
Biased and unbiased errors

                 Correct to                   Correct to next
                 nearest 1000                 1000 and over
                 (unbiased),     Absolute     (biased),          Absolute
Exact number     in thousands    error        in thousands       error
      50,241          50          +241             51             -759
      60,507          61          -493             61             -493
      49,361          49          +361             50             -639
      61,427          61          +427             62             -573
      53,764          54          -236             54             -236
      48,090          48           +90             49             -910
      50,460          50          +460             51             -540
      96,670          97          -330             97             -330
      60,250          60          +250             61             -750
Total 5,30,770       530          +770            536            -5230
When figures are estimated correct to the nearest thousand the
error is an unbiased one. The unbiased absolute error in the above
figures, as shown in column 3, is only 770 and the relative error is
770/5,30,000 = 0.001453. The errors are negligible.
When figures are estimated correct to the next one thousand and
over, the error is a biased one. The biased absolute error in the
above case is -5230 as shown in column 5 and the relative error is
5,230/5,36,000 = 0.00975. These errors are comparatively much larger than
in the previous case and cannot be safely ignored.
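The totals of Table I can be re-derived with a short Python sketch. The two rounding helpers are our own illustrative names. Errors are taken as true value minus estimate, the sign convention of the table, and the relative error is divided by the approximated total.

```python
exact = [50_241, 60_507, 49_361, 61_427, 53_764,
         48_090, 50_460, 96_670, 60_250]


def to_nearest_thousand(x):
    # Integer arithmetic keeps the rule simple: 500 and above rounds up.
    return (x + 500) // 1000 * 1000


def to_next_thousand(x):
    # Ceiling to a multiple of 1000: every odd amount is pushed upward.
    return -(-x // 1000) * 1000


nearest = [to_nearest_thousand(x) for x in exact]
upward = [to_next_thousand(x) for x in exact]

# Absolute errors of the totals (true total minus approximated total).
unbiased_error = sum(exact) - sum(nearest)
biased_error = sum(exact) - sum(upward)

# Relative errors, ignoring sign, on the approximated totals.
unbiased_relative = abs(unbiased_error) / sum(nearest)
biased_relative = abs(biased_error) / sum(upward)

print(unbiased_error, round(unbiased_relative, 6))
print(biased_error, round(biased_relative, 6))
```

Rounding to the nearest thousand lets positive and negative item errors offset one another, while rounding every figure upward makes all item errors negative, so they accumulate in the total.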
Errors in multiplication, division, etc. However, it should be
remembered that neither are unbiased errors always compensating
nor biased errors always cumulative. Where items have to be added
together biased errors would no doubt be cumulative and unbiased
ones compensating; but where items have to be subtracted the situation
is just the reverse, and biased errors would be smaller in size than the
unbiased ones. If two items are multiplied together unbiased errors
would give a better estimate than the biased ones. But if the items
are divided and the algebraic signs of the two errors are the same
(as is the case in biased errors) the result would be quite close to the
true value; and if the signs are opposite (as is the case in unbiased errors)
the results would be away from the true value. In other words, ordinarily,
unbiased errors are compensating only when items have to be added
or multiplied, but when the items have to be subtracted or divided
biased errors would give results closer to the true value than the results
given by unbiased errors.
These points can be illustrated as follows:
       True value    Estimated value with    Estimated value with
                     biased error            unbiased error
(a)    100           99                      99
(b)    200           197                     202
(i) Biased error in (a) = 1 and unbiased error = 1
(ii) Biased error in (b) = 3 and unbiased error = -2
(iii) Biased error of (a + b) or 300 = (300 - 296) or 4 and
unbiased error = (300 - 301) = -1
(iv) Biased error for (b - a) or 100 = (100 - 98) = 2 and
unbiased error = (100 - 103) = -3
(v) Biased error for (a × b) or 20,000 = (20,000 - 19,503) = 497 and
the unbiased error = (20,000 - 19,998) = 2
(vi) Biased error for (b ÷ a) or (200 ÷ 100) or 2 = (2 - 197/99) = 0.01,
and the unbiased error = (2 - 202/99) = -0.04.
Thus it is clear that in addition and multiplication the biased
errors are larger than the unbiased ones whereas in subtraction and
division the position is reversed and the unbiased errors are larger than
the biased ones.
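The six results in the illustration can be checked with a small Python sketch. The dictionary of operations and the variable names are our own. Each error is computed as the true result minus the estimated result, as in items (i) to (vi).

```python
true_a, true_b = 100, 200
biased = (99, 197)     # both figures under-stated: errors of the same sign
unbiased = (99, 202)   # one under-stated, one over-stated: opposite signs

operations = {
    "a + b": lambda a, b: a + b,
    "b - a": lambda a, b: b - a,
    "a x b": lambda a, b: a * b,
    "b / a": lambda a, b: b / a,
}

errors = {}
for name, op in operations.items():
    true_result = op(true_a, true_b)
    # (error with biased estimates, error with unbiased estimates)
    errors[name] = (true_result - op(*biased), true_result - op(*unbiased))
    print(name, errors[name])
```

Comparing the two entries of each pair, ignoring sign, shows where the biased error is the larger one (addition and multiplication) and where the unbiased error is the larger one (subtraction and division).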
Estimation of errors
In most of the statistical investigations in actual practice the exact
figures or the true values are not known. In such cases we cannot
measure the absolute or the relative error. But it is possible to estimate
them.
Estimation of unbiased errors. Unbiased errors can be estimated
without much difficulty in most cases. In the illustration in
Table I, if the actual figures were not known, all we could say would be
that the total of the figures (correct to the nearest 1000) was 5,30,000.
If the absolute error in the above figures is to be estimated, then in
each of the nine items it can range between 0 and 499. It will be
zero if the actual number was in exact thousands, and in such a case
the actual and the approximated figures would be the same. The
maximum error in any figure can be 499 because the approximated
figure will be discarding all numbers less than 500 and adding all
numbers more than 500. Thus 60,250 has been approximated as
60,000 and 60,507 as 61,000. 0 and 499 are the minimum and maximum
of the absolute error per item in this example. The most
likely error, however, would lie somewhere between these two limits.
It can be expected to be about the middle of these limits at 249.5
or say 250. The best estimate of the unbiased absolute error is given by the
product of the average absolute error and the square root of the number of items.
In the above case the estimated average absolute error per item is 250
and the square root of the number of items (√9) is equal to 3. Thus
the estimated value of the absolute error would be:
\[ \text{Average absolute error of items} \times \sqrt{\text{No. of items}} = 250 \times \sqrt{9} = \pm 750 \]
The estimated figure of 750 compares well with the actual figure
of the absolute error which is 770.
The relative error can be estimated easily. It is equal to the
estimated absolute error divided by the approximated total of the
items. In this case it would be:
\[ \frac{250 \times \sqrt{9}}{5{,}30{,}000} = \frac{750}{5{,}30{,}000} = 0.001415 \]
The actual relative error, as we had calculated, was 0.001453.
Estimation of biased errors. Just as it is possible to estimate the
unbiased errors, similarly, the biased errors can also be estimated to a
certain extent. In the example given in Table I the minimum biased
absolute error per item is 1 and the maximum is 999, as all figures from 1
upward to 999 are approximated at 1000. Thus 60,250 has been approximated
at 61,000 and 60,507 has also been approximated at 61,000. The
likely error per item would be somewhere between 1 and 999. It will
be round about 500. The rule for the estimation of biased errors is
slightly different from the rule for unbiased errors discussed above.
In the case of biased errors the estimation is done by multiplying the average
absolute error of items by the number of items (instead of the square root of
the number as in the previous case). Thus in the above illustration
the estimated absolute biased error would be:
\[ \text{Average absolute error of items} \times \text{Number of items} = 500 \times 9 = 4500 \]
The estimated absolute biased error of 4500 compares well with
the actual figure of 5230. The estimated relative biased error can be
found by dividing this figure by the approximated total. In the
above case it would be:
\[ \frac{\text{Average absolute error of items} \times \text{Number of items}}{\text{Approximated total of all the items}} = \frac{500 \times 9}{5{,}36{,}000} = 0.0084 \]
The actual relative error had been calculated at .00975.
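The two estimation rules can be placed side by side in a brief Python sketch. The function names are hypothetical. The per-item averages of 250 and 500 and the approximated totals of 5,30,000 and 5,36,000 are the figures used in the text.

```python
import math


def estimated_unbiased_error(avg_error_per_item, n_items):
    """Rule for unbiased errors: average error x square root of the number of items."""
    return avg_error_per_item * math.sqrt(n_items)


def estimated_biased_error(avg_error_per_item, n_items):
    """Rule for biased errors: average error x number of items."""
    return avg_error_per_item * n_items


n = 9
unbiased_abs = estimated_unbiased_error(250, n)
biased_abs = estimated_biased_error(500, n)

# Relative errors are taken on the approximated totals.
unbiased_rel = unbiased_abs / 530_000
biased_rel = biased_abs / 536_000

print(unbiased_abs, round(unbiased_rel, 6))
print(biased_abs, round(biased_rel, 4))
```

The only difference between the two rules is the multiplier: the square root of the number of items when errors tend to offset one another, and the number of items itself when they all run in one direction.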
Questions

1. Write a note on the editing of primary and secondary data for the purposes of
analysis and interpretation.
2. The statistician who desires to safeguard his analysis and results from
imperfections entering at the very start should rest his choice among sources upon a test
of reliability rather than upon accessibility and convenience.
Expand this statement so as to bring out clearly the way in which sources should
be used. (M. Com. Lucknow, 1943).
3. Discuss the standard of accuracy required in statistical calculations. To what
extent should approximations be used? (M. A. Agra, 1949).
4. What precautions should be taken in the use of published statistics?
(B. Com. Agra, 1949).
5. Mention the advantages of approximation of Statistics. What degree of
accuracy is generally required in each statistical investigation?
(M. Com. Rajputana, 1951).
6. What are the different ways of approximating figures? Discuss the merits
of each.
7. To what extent can figures be safely approximated in statistical analysis?
How should such figures be written?
8. (a) Discuss the sources of errors in statistics and their effects.
(b) State the important methods of approximation and their utility in
statistics. (B. Com. Agra, 1940).
9. In what way does a statistical error differ from a mistake? What classes of
errors are there and how may they be measured? (B. Com., Allahabad, 1943).
10. Discuss the various types of errors likely to creep into statistical investigations
and suggest how to avoid or correct them. (B. Com. Agra, 1949).
11. Of the biased errors the statistician should have none: but of the unbiased
ones the more the merrier, notwithstanding that they are also errors. Elucidate.
12. In framing statistical estimates we are not so definite as the Modern Traveller
who:
".... knew the weather to a T.
Longitude to a degree.
The Latitude exactly,"
Explain the bearing of the above on the degree of accuracy desired in statistical
estimates as distinguished from the estimates of the more exact sciences.
(M. A. Punjab, 1952).
13. Show how biased errors are generally cumulative and unbiased ones
compensating. Are there any exceptions to this general rule?
14. Discuss the various methods of estimating biased and unbiased errors both
absolutely and relatively.
15. Distinguish between
(a) Absolute and relative errors and
(b) Biased and unbiased errors.
Discuss the effects of these errors and explain the steps that are taken to meet the
effects. (B. Com. Agra, 1938).
Classification, Seriation and
Tabulation
7
CLASSIFICATION
Need and meaning
The data which are collected or compiled in accordance with the
rules and methods discussed in the preceding chapter are usually very
voluminous. As such they are not directly fit for
analysis or interpretation. If, for example, the figures of the expenses
of 2,000 students residing in Allahabad University hostels are before
us, as collected, it would not be possible to draw any inferences from
them because for purposes of comparison, analysis and interpretation
it is essential that the data are in a condensed form. Further, it is
also essential that the likes must be separated from the unlikes. All
the 2,000 students, no doubt, are alike in the sense that all of them
belong to a particular university and live in hostels, but they differ in
other respects. Some may be living in single-seated rooms and others
in double or treble-seated rooms; some may be living in costlier hostels
and others in comparatively cheaper ones; some may be having their
private messing arrangements while others may have joined the common
mess. Thus, even though the data collected relate to one set of persons
yet there may be many types of dissimilarities even within this group.
For the purpose of analysis and interpretation, data have to be divided
into homogeneous groups. In order to remove these defects of volume
and heterogeneity, statistical data are tabulated with a view to presenting
a condensed and homogeneous picture. But before the tabulation of
data, it is necessary to arrange them in homogeneous groups so that
there may be no difficulty in tabulation. The process of arranging data in
groups or classes according to resemblances and similarities is technically called
Classification. Thus, by classification we try to strike a note of homogeneity
in the heterogeneous elements of the collected information.
Classification gives expression to the similarities which may be found in the
diversity of individual units. In classification of data, units having
a common characteristic are placed in one class and in this fashion the
whole data are divided into a number of classes. Even after classification
the statistical data are not fit for comparison and interpretation but this
is certainly the first step towards the tabulation of data. After tabulation
of data statistical analysis and interpretation are possible.
Classification is a preliminary to tabulation and it prepares the ground for
proper presentation of statistical facts.
Characteristics of an ideal classification
Despite the fact that classification is a very important preliminary
in a statistical analysis, no hard and fast rules can be laid down for it.
Technically the classification of data in each investigation has to be
decided after taking into account the nature, scope and purpose of the
enquiry. However, an ideal classification should possess the following
characteristics:
(a) It should be unambiguous. If there is ambiguity in classification
the very purpose for which it is meant is not served. Classification is
meant for removing ambiguity. It is necessary that the various classes
should be so defined that there is no room for doubt or confusion. It
is by no means an easy task. If we have to divide the population into
two classes, say, literates and illiterates, exhaustive definition of the
terms used is essential. Who is a literate? is a question not easy
to answer. Some criterion has to be laid down. In the last census of
population of India, a literate was defined as one who could read and
write a simple letter. This is technically not a very satisfactory definition.
After all, what is meant by a simple letter is a point on which there can
be difference of opinion. But for practical purposes the definition can
be said to be fairly satisfactory.
(b) It should be stable. The ideal classification should have the merit
of stability. If a classification is not stable and if each time an enquiry
is conducted it has to be changed, the data would not be fit for comparison.
The occupational classification in the Indian population census
suffers from this defect. Various occupations have been defined in
different ways in successive censuses, and these figures are not strictly
comparable.
(c) It should be flexible. A good classification should be flexible
and should have the capacity of adjustment to new situations and
circumstances. When we talk of stability of classification we do not
mean rigidity of classes. The term is used in a relative sense. No
classification can be stable for ever. With changes in time, some classes
become obsolete and have to be dropped, while fresh classes have also
to be added. An ideal classification should be such that it can adjust
itself to these changes and yet retain its stability. The data should be
divided into a few major classes which must be sub-divided further.
Ordinarily there would not be many changes in major classes. Only
small sub-classes may need a change and the classification can thus retain
the merit of stability and yet possess flexibility.
Basis of classification
Statistical data are classified on the basis of the characteristics
possessed by the different groups of units of a universe. As has been
pointed out earlier, these characteristics give expression to the unity
of attributes which may be traced in a diversity of individual units.
These characteristics can be either descriptive or numerical.
Unemployment, occupation, literacy, civil condition and sex are
examples of descriptive characteristics while age, income, weight
and height are examples of numerical characteristics. Descriptive
characteristics cannot be quantitatively measured or estimated. Only
their presence or absence in an individual unit can be found out.
For example, we cannot quantitatively measure literacy. All we can
say is whether an individual is literate or illiterate according to
certain definitions laid down. When data are classified on the basis
of qualities or attributes, which are incapable of quantitative measurement,
the classification is said to be according to attributes, and when the
data are classified on the basis of quantitative measurement the
classification is said to be according to class intervals.
Classification according to attributes
Simple classification. In this method the data are divided on the
basis of attributes or qualities. All those units in which a particular
characteristic is present are placed in one group and those in which
it is not present are placed in another group. If, for example, the
problem of blindness is being studied, the universe can be divided
into two classes: one in which the units possess this characteristic
and the other in which this characteristic is not found. We shall thus
have two classes: those who are blind and those who are not blind.
This type of classification, in which only one attribute is studied and
the data are divided in two parts, is called simple classification or
classification according to dichotomy.
Manifold classification. If, however, more than one attribute is
being studied simultaneously, the data would be divided into a number
of classes. If the problem of blindness is studied sex-wise, there are
two attributes under study, namely, blindness and sex. A person can
be either blind or not blind; further, a person can be either a male or
a female. Each of the two attributes is capable of division in two
classes. The data would thus be divided in four classes: (1) males who
are blind, (2) males who are not blind, (3) females who are blind,
(4) females who are not blind. The study can be further extended if
we have a third attribute, say, religion. Now each of the above four
classes is capable of further sub-division on the basis of religion. Such
classification in which more than one attribute is taken into account is
called manifold classification.
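Manifold classification by two attributes can be sketched as a cross-classification in Python. The handful of records below is hypothetical and is given only to show how the four classes arise and how units are counted into them.

```python
from collections import Counter
from itertools import product

# Hypothetical records of (sex, blind?) for six persons; illustration only.
persons = [
    ("male", True), ("male", False), ("female", False),
    ("female", True), ("male", False), ("female", False),
]

sexes = ("male", "female")
blindness = (True, False)

# Two attributes with two categories each give 2 x 2 = 4 classes.
classes = list(product(sexes, blindness))

# Each person falls into exactly one class.
counts = Counter(persons)
for cls in classes:
    print(cls, counts[cls])
```

Adding a third attribute, such as religion, would only mean passing a third tuple of categories to `product` and recording a third field for each person.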
Arbitrary nature of classification. In the various groups which are
formed in the above mentioned manner the differences are not always
natural or very well defined. Ordinarily such classification is of an
arbitrary nature. If the universe is divided in two groups, tall men
and short men, we shall have to give arbitrary definitions of the two
classes. It can be said that those who are 5 feet 4 inches or above are
tall and those who are less than 5 feet 4 inches are short. The
classification is obviously arbitrary. In those cases where a particular
attribute is decided on the basis of quantitative study, as in the above
case of tall and short men, the classification is comparatively more
definite and precise. But this is not always possible. Many attributes
cannot be studied with the help of figures. The difference between
literacy and illiteracy is an example. Here one attribute gradually
changes into another attribute and there is no clear cut line of demarcation.
The difference between a literate and an illiterate is always a
matter of opinion. There may be many persons whom it would be
difficult to classify either as literates or as illiterates. Whenever data are
classified according to attributes this point should be kept in mind and
attempts must be made to define the attributes in such a manner that
there is the least possibility of doubt and ambiguity.
Classification according to class intervals
This type of classification is applicable only in those cases where
the direct quantitative measurement of data is possible. Data relating
to height, weight, income, production and consumption, etc., come
under this category. In such cases data are classified on the basis of
values or quantities. Thus, instead of saying that a certain group
of persons is tall while the other group is short, the heights can be
specified in class-intervals. Persons whose heights, say, are within
5'4"-5'6" can form one group, those whose heights are within 5'6"-5'8"
can form another group and so on. In this way the data are
divided into a number of classes, each of which is called a class
interval. 5'6"-5'8" is one class interval. The limits within which
a class interval lies are called Class Limits. In the present case 5'6"
and 5'8" are respectively the lower and the upper limits of this class.
The difference between two class limits is termed Class Magnitude,
or Magnitude of the Class Interval. In the above example the magnitude
of the class-interval is 2". The number of items which fall in any
class-interval is called Class Frequency. If the number of persons whose
heights are 5'6"-5'8" is 116, this would be the frequency of the class
5'6"-5'8".
Classification according to class intervals involves three basic
problems. They are:-
(a) Number of classes and their magnitude.
(b) Choice of class limits.
(c) Counting the number in each class.
Number of classes. Ordinarily, a frequency distribution should
not contain more than 20 to 25 or fewer than 6 to 8 classes, depending
upon the total number of items of the series. If the number of
items in a series is large it can have a large number of class intervals
also, because in such a case all class intervals would have a fairly good
frequency. If, on the other hand, the number of items is small, the
number of classes should also be small, as otherwise there would be
no frequency in some classes and very little frequency in others. The
idea contained in the data can be easily and readily grasped when the
number of classes is small. But in such a case there is the danger of
obscuring some important characteristics of the data. If the number
of classes is large, all the characteristics of the data are contained in
them but on account of too many classes it becomes difficult to ascertain
them. In fact, a balance should be struck between these two factors.
An ideal number of classes for any frequency distribution would be that which
gives the maximum information in the clearest fashion.
Magnitude of intervals. The magnitude of class intervals depends
on the range of the data and the number of classes. If the range
(difference between the maximum and the minimum values) of the
heights of a group of persons is 15", and if it is desired to have 10
classes, the magnitude of each class interval would be 1.5". Besides
these things, a few other points should also be kept in mind. The
magnitude of the class intervals should be such that it does not distort
or obscure the important characteristics of the data. Bearing this
fact in mind the magnitude of the class interval should be 2, 5, 10,
25, 50, 100, 500, 1000, 5000 and so on, rather than odd figures like
1, 3, 7, 11, 24, 57, 92 and 472, etc. The multiples of 2, 5 and 10 are
in common use and the human mind considers them almost as natural
magnitudes.
In general, the class intervals should be of equal magnitude. If the
size of the class intervals is unequal it may give a misleading impression
and in such cases comparison of one class with another may not be
possible.
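The arithmetic of the magnitude of intervals can be sketched as follows. The function names are our own, and the minimum height of 57 inches is an assumed figure chosen only so that the range comes to 15 inches as in the text.

```python
def class_magnitude(minimum, maximum, n_classes):
    """Range of the data divided by the desired number of classes."""
    return (maximum - minimum) / n_classes


def class_limits(minimum, magnitude, n_classes):
    """Equal-magnitude (lower, upper) limits starting from the minimum value."""
    return [(minimum + i * magnitude, minimum + (i + 1) * magnitude)
            for i in range(n_classes)]


# Assumed heights running from 57 to 72 inches: a range of 15 inches.
width = class_magnitude(57, 72, 10)
limits = class_limits(57, width, 10)

print(width)
print(limits[0], limits[-1])
```

In practice the computed magnitude would usually be adjusted to a convenient figure such as 2 or 5, as the text recommends, and the number of classes altered to suit.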
Class limits. The most important thing that should be kept in
mind while choosing the class limits is that these should be chosen in
such a manner that the mid-point of a class interval and the actual average
of items of that class interval should be as close to each other as possible.
If it is not so, the class limits would obscure and distort the main
characteristics of the data. Consistent with this point, wherever
possible the class limits should be located at multiples of 2, 5, 10, 100
and such other figures. The class limits must be such that mid-points
of class intervals are familiar and common figures ending with 0, 2, 5, 10,
15, etc. These are capable of easy and simple analysis. As far as
possible in a frequency distribution there should be no indeterminate
classes like under 10 or over 10,000. Such classification may create
difficulties in analysis and interpretation.
The class limits may be written in any of the following ways : -

TABLE 1
I         II                  III       IV
0-10      0 and under 10      0-9       5
10-20     10 and under 20     10-19     15
20-30     20 and under 30     20-29     25
In the first method, items whose values are just 10 or 20 can
be classified either in the 0-10 group and 10-20 group respectively or in
the 10-20 and 20-30 classes respectively. Usually in such cases the item
is classified in the next higher class so that the item whose value is
exactly 10 would come in the 10-20 group. In the second method, this
point is made clear. Items whose values are less than 10 would
be in the 0-10 class interval. This is the exclusive method of
classification. In the exclusive method the items whose values are equal
to the upper limit of a class are grouped in the next higher class.
In other words, the upper limit of a class is excluded and items with
values less than the upper limit are taken into account. As against
this the third method is inclusive. In it the upper limit is also
included in the class interval. This method, in reality, is like the second
method as 0-9 means 0 and under 10. To emphasise this point sometimes
the class interval is written as 0-9.99. The fourth method indicates
only the mid-points.
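The exclusive and inclusive ways of writing class limits can be illustrated with a small Python sketch for whole-number data. The function names and the default magnitude of 10 are assumptions made for the illustration.

```python
def exclusive_class(value, magnitude=10, start=0):
    """Class written as 'lower and under upper': the upper limit is excluded."""
    lower = start + (value - start) // magnitude * magnitude
    return (lower, lower + magnitude)


def inclusive_label(value, magnitude=10, start=0):
    """The same grouping for whole numbers, written inclusively as 0-9, 10-19, ..."""
    lower, upper = exclusive_class(value, magnitude, start)
    return f"{lower}-{upper - 1}"


def mid_point(lower, upper):
    """Mid-point of a class, as shown in the fourth method."""
    return (lower + upper) / 2


print(exclusive_class(9), exclusive_class(10))   # an item of exactly 10 moves up a class
print(inclusive_label(10))
print(mid_point(*exclusive_class(4)))
```

An item whose value equals an upper limit is placed in the next higher class, which is the usual convention described above.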
Counting the number of items in each class. After deciding the number
of classes, their magnitude and class limits, the next thing to be done
is to count the number of items falling in each class. This can be done
in any of the following ways:
(a) By tally sheets. Under this method, the class intervals are
written on a sheet of paper (called Tally Sheet) and for each item a
stroke is marked against the class interval in which it falls. Usually
after every four strokes in a class, the fifth item is indicated by drawing
a horizontal or diagonal line over or through the strokes. These groups
of five are easy to count. Data sorted in such a manner would give
the following type of tally sheet.
TABLE 2
Number of marks obtained by 80 students
(Tally Sheet)

Marks      Tally                                        Total
20-30      ||||/ ||||/ ||                                12
30-40      ||||/ ||||/ ||||/ |||                         18
40-50      ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ |         31
50-60      ||||/ ||||/                                   10
60-70      ||||/ ||||                                     9

Total                                                    80
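The tally-sheet procedure can be imitated in Python as follows. This is a rough sketch: the sample of marks is hypothetical and is not the data of the 80 students, and the function names are our own. Each group of five is printed as four strokes followed by a diagonal stroke.

```python
from collections import Counter


def tally(marks, lower=20, upper=70, magnitude=10):
    """Count whole-number marks into exclusive classes of equal magnitude."""
    counts = Counter()
    for m in marks:
        if m in range(lower, upper):  # marks outside the limits are ignored
            start = lower + (m - lower) // magnitude * magnitude
            counts[(start, start + magnitude)] += 1
    return counts


def tally_marks(frequency):
    """Render a frequency as groups of five, the fifth stroke drawn as '/'."""
    groups, rest = divmod(frequency, 5)
    return " ".join(["||||/"] * groups + (["|" * rest] if rest else []))


# Hypothetical marks, for illustration only.
sample = [22, 25, 31, 38, 39, 41, 44, 45, 47, 49, 52, 58, 63]
counts = tally(sample)
for cls in sorted(counts):
    print(f"{cls[0]}-{cls[1]}", tally_marks(counts[cls]), counts[cls])
```

The total of the class frequencies should equal the number of items sorted, which is a simple check against items being missed or counted twice.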

(b) By mechanical aids. Various types of machines are now available
for purposes of sorting and listing of data. Some of these machines
are hand operated while others are operated with electricity. With
the help of hand operated machines the method of Needle Sorting has
become very popular now. A large number of items can be sorted
with it under any number of headings and sub-headings. Cards of
convenient size and shape, with a series of holes, are used in this method.
Each hole stands for a value and when the cards are stacked, a needle is passed
through the particular hole representing a particular value. These cards
are later on separated and counted. In this way frequencies of various
classes can be found by the repetition of this technique.
The technique of punched cards is also equally popular. In this
method the data are recorded on special cards by punched holes made
by means of a special key punch which can be operated either by hand
or electrically. Hollerith and Powers Samas sorting machines sort
the cards at a speed of about 24,000 per hour. Thus we find that
mechanical aids have made the work of classification very easy, quick
and accurate.
SERIATION
Definition. The process of seriation is closely associated with
classification of data. According to L. R. Connor, "If two variable quantities
can be arranged side by side so that the measurable differences in the one
correspond to the measurable differences in the other the result is said to form a statistical
series." If the production figures of wheat in India for the last 10 years
are arranged systematically they would form a statistical series. Similarly
if the marks obtained by a group of 100 students or their heights or
weights are properly arranged they would form statistical series.
The classification of data can be done on three bases, time, space
and condition, and they give rise to three types of statistical series known
as Time Series, Spatial Series and Condition Series.
Time Series. Time series are also known as historical series as
the data collected relate to either past or present. If the figures of
enrolment of students in the Allahabad University during the last 30
years are properly arranged they would form a time series. Similarly
figures of the population of India during the last eight censuses would
form a historical series. The changes in the level of phenomena measured
are related to the changes in time.
Spatial Series. If the data collected do not change in relation to
time but in relation to place the series is called spatial series. Technically
speaking, such series are not statistical series because changes in place
are not capable of quantitative measurement. As per the definition
given above, in statistical series both phenomena should be variable
and capable of quantitative measurement. However, in common
parlance data arranged on the basis of place are called spatial series. If
the figures of production of wheat for a particular year, for different
States of India, were noted down they would form a spatial series as
the data are in relation to place.
Condition series. If statistical data are recorded on the basis of
changes in some condition, the series so formed is called condition
series. If the data relating to the heights of 100 students were classified
they would form a condition series, as the figures are neither on the
basis of time nor place, but a particular condition, namely, height.
Similarly data relating to income, expenditure, marks, and weight of
students would give rise to condition series.
Discrete and continuous series. Statistical series may be either
discrete or continuous. A discrete series is formed from items which
are capable of exact measurement. In such cases the various units
are not capable of division. Each unit of data is separate and complete.
We can count the number of persons whose salaries are exactly Rs.
100 per month, Rs. 105 per month, or Rs. 200 per month. The
data would give rise to a discrete series. But there are certain phenomena
which are not capable of exact measurement, like height or weight.
The height of an individual cannot be measured with absolute accuracy
and as such, we cannot count the number of persons whose heights
are exactly 5'4". The actual height may vary by a thousandth part of
an inch from this figure. In such cases, therefore, the data are given
in relation to certain groups or class intervals. For example we can
count the number of persons whose heights are between 5'3" and 5'4".
Here an exact measurement is not possible. Such series are called
continuous series.
In continuous series the statistical unit is capable of division and
can be measured in fractions of any size, no matter how small. A
ton of coal can be divided into 100, 1000, 10,000 or even more parts.
Theoretically a ton of coal can be divided into a limitless number of
sub-divisions. In discrete series the statistical unit is either not divisible
or is not divided. We can imagine half a ton, one-fourth of a gallon or
one-tenth of a pound, but it would be absurd to talk about half a son,
one-fourth of a student and one-tenth of a wife. Here the unit is complete
and indivisible. We can, however, have discrete series even from
divisible units where they are not conventionally divided. For example,
marks are given to the students in whole numbers. It is possible in
such a case to have a discrete series of marks.
Simple and cumulative series. Statistical series can be either simple or
cumulative. In a simple series the frequency against each class interval
or value is shown separately and individually. In a cumulative series
the frequencies are progressively totalled and aggregates are shown.
The following example would clearly show the difference between
discrete and continuous series and simple and cumulative series:
TABLE 3
Discrete and Continuous Series

        Discrete Series                       Continuous Series
No. of children    No. of couples     Height in inches    No. of persons
  per couple
      2                  50                60-62                12
      3                  75                62-64                15
      4                  40                64-66                24
      5                  28                66-68                13
The above series are simple. If they have to be converted into
cumulative series they would be as follows : -
TABLE 4
Cumulative Series

No. of children    No. of couples     Height in inches    No. of persons
  per couple
   Up to 2               50              Up to 62               12
   Up to 3              125              Up to 64               27
   Up to 4              165              Up to 66               51
   Up to 5              193              Up to 68               64
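
The conversion of a simple series into a cumulative series can be sketched in a few lines of Python. This is only an illustration and not part of the original text: the variable names are invented here, and the figure of 75 couples with three children is the one implied by the cumulative total of 125 in Table 4 (125 less 50).

```python
from itertools import accumulate

# Simple (non-cumulative) frequencies of Table 3.
children_per_couple = [2, 3, 4, 5]
couples = [50, 75, 40, 28]           # 75 is implied by Table 4 (125 - 50)

height_upper_limits = [62, 64, 66, 68]   # upper limit of each class, in inches
persons = [12, 15, 24, 13]

# A cumulative series is the running total of the simple frequencies.
cumulative_couples = list(accumulate(couples))
cumulative_persons = list(accumulate(persons))

for value, total in zip(children_per_couple, cumulative_couples):
    print(f"Up to {value} children: {total} couples")
for limit, total in zip(height_upper_limits, cumulative_persons):
    print(f"Up to {limit} inches: {total} persons")
```

The running totals obtained in this way agree with the figures of Table 4.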
TABULATION
Meaning and importance. In the broadest sense "tabulation is an
orderly arrangement of data in columns and rows". It involves the systematic
presentation of data to elucidate the problem under investigation. It
is a process between the collection of data on the one hand, and its
final analysis on the other. In fact tabulation is meant to properly
arrange the answers relating to the questions posed in any investigation,
and is very helpful in analysis of the collected data as also in drawing
inferences from them. Tabulation is the final stage in collection and
compilation of data, and is a sort of stepping-stone to the analysis and
interpretation of figures. In deciding about the type of tabulation one
has to keep in mind the nature, scope and object of the enquiry. Tabulation
of data should be done in such a form that it suits the nature and
object of the investigation. The importance of proper tabulation is
very great because if the tabulation of data is not satisfactory its analysis
will not only be difficult but defective also.
Types of tabulation
Simple and complex tabulation. Broadly speaking, tabulation of data
can be either simple or complex. Simple tabulation gives information
about one or more groups of independent questions. Complex tabulation
shows the division of data in two or more categories and as such
is meant to give information about one or more sets of inter-related
questions.
One-way tables. Simple tabulation usually gives rise to single or one-way
tables. One-way tables supply answers to questions about one
characteristic of data only. The following table will illustrate the point:

TABLE 5
Marks obtained by 100 students in statistics

Marks       Number of students
30-40              14
40-50              26
50-60              30
60-70              20
70-80              10
Total             100
This table tells us about the number of students in each class-interval
of marks obtained by them. We can know from this table
that 30 students obtained marks between 50 and 60. This table also
tells us that the minimum marks range from 30 to 40 and the maximum
from 70 to 80. Thus this one-way table gives us information only
about one characteristic of data, that is, marks of students in statistics.
All the questions that can be answered from the table would be independent
of each other.
Two-way tables. As against the above type of table there are double
or two-way tables. Two-way tables give information about two inter-related
characteristics of a particular phenomenon. If the numbers
of students given in the above table are further divided sex-wise, the
table would become a two-way table because it would give information
about two characteristics, namely, the marks obtained by students in
statistics and the sex-wise distribution of students in various class intervals
of marks. The shape of the table will be as follows:

TABLE 6
Marks obtained by 100 students in statistics (sex-wise)

                 Number of Students
Marks       Males     Females     Total
30-40          8          6         14
40-50         16         10         26
50-60         14         16         30
60-70         12          8         20
70-80          6          4         10

Total         56         44        100

The above table is capable of supplying information about questions
relating to two inter-related phenomena. From the table not only
can we find out that 30 students obtained marks between 50 and 60 but
also the fact that out of them 14 were males and 16 females.
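
As a rough check on a two-way table, the row and column totals can be recomputed from the cells. The sketch below is an illustration only; the dictionary layout and the names are assumptions made here, while the figures are those of Table 6.

```python
# Table 6: marks class -> (males, females)
table6 = {
    "30-40": (8, 6),
    "40-50": (16, 10),
    "50-60": (14, 16),
    "60-70": (12, 8),
    "70-80": (6, 4),
}

# Row totals: students in each class interval, both sexes together.
row_totals = {marks: males + females for marks, (males, females) in table6.items()}

# Column totals: all males and all females, whatever their marks.
male_total = sum(males for males, _ in table6.values())
female_total = sum(females for _, females in table6.values())
grand_total = male_total + female_total

for marks, total in row_totals.items():
    print(marks, total)
print("Total", male_total, female_total, grand_total)
```

The row totals reproduce the one-way Table 5, which shows that a two-way table contains the one-way table as its margin.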
Three-way tables. If three inter-related phenomena are to be studied
there would be treble or three-way tables. A three-way table can
answer questions relating to three inter-related problems. In the above
example if we further find out the number of students who were hostellers
and the number who were day-scholars a three-way table would be
necessary. It would be as follows:
TABLE 7
Marks obtained by 100 students in statistics (sex-wise and
on the basis of residence)

                                Number of Students
                  Males                 Females                 Total
Marks     Host-    Day     Total   Host-    Day     Total   Host-    Day     Total
          ellers   Scho-           ellers   Scho-           ellers   Scho-
                   lars                     lars                     lars
30-40        4       4       8       4       2       6       8       6      14
40-50       10       6      16       5       5      10      15      11      26
50-60        8       6      14       9       7      16      17      13      30
60-70        7       5      12       5       3       8      12       8      20
70-80        5       1       6       2       2       4       7       3      10

Total       34      22      56      25      19      44      59      41     100
The above table can supply us information about (1) marks obtained
by students, (2) the distribution of these students sex-wise and (3) the
distribution of the students on the basis of residence.
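
A three-way table contains the lower-order tables within it: summing over any one characteristic gives back a two-way table. The following Python sketch illustrates this with the figures of Table 7. The data layout and the function names are assumptions made for illustration, and the figure of 4 male day scholars in the 30-40 class is the one implied by the male total of 8 for that class.

```python
# Table 7: marks class -> sex -> (hostellers, day scholars)
table7 = {
    "30-40": {"males": (4, 4), "females": (4, 2)},
    "40-50": {"males": (10, 6), "females": (5, 5)},
    "50-60": {"males": (8, 6), "females": (9, 7)},
    "60-70": {"males": (7, 5), "females": (5, 3)},
    "70-80": {"males": (5, 1), "females": (2, 2)},
}

def collapse_residence(table):
    """Add hostellers and day scholars, leaving a marks-by-sex table."""
    return {
        marks: {sex: sum(cells) for sex, cells in by_sex.items()}
        for marks, by_sex in table.items()
    }

def collapse_sex(table):
    """Add males and females, leaving a marks-by-residence table."""
    return {
        marks: tuple(sum(pair) for pair in zip(*by_sex.values()))
        for marks, by_sex in table.items()
    }

marks_by_sex = collapse_residence(table7)        # same figures as Table 6
marks_by_residence = collapse_sex(table7)        # (hostellers, day scholars)

hosteller_total = sum(h for h, _ in marks_by_residence.values())
day_scholar_total = sum(d for _, d in marks_by_residence.values())
print(marks_by_sex["50-60"], marks_by_residence["50-60"])
print(hosteller_total, day_scholar_total)
```

Collapsing residence returns the sex-wise figures of the two-way table, and collapsing sex gives the last three columns of Table 7.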
Higher order tables. The tables can also be manifold or of higher
order. Such tables supply information about a large number of inter-related
questions. If in the above table additional information is given
about civil conditions of the students it would become a four-way table
and similarly tables can be of still higher order: five-way, six-way, and
so on. All such tables are called manifold or higher order tables.

TABLE 8
Marks obtained by students (sex-wise, on the basis of civil
conditions and residence)

[Blank table. Columns: Number of Students, divided by sex and civil
condition. Rows: Residence (Hostellers, Day scholars, Total), each
divided by Marks (30-40, 40-50, 50-60, 60-70, 70-80, Total), followed
by a Grand Total row.]
The above table gives information about a large number of inter-related
questions regarding students, namely, about the marks obtained,
sex-wise distribution, civil conditions and residence. Manifold tables
are very useful in presenting population census data.
Rules of tabulation
Having discussed the meaning, importance and types of tabulation,
it is necessary to lay down certain rules regarding construction of tables.
The following general rules should be observed in the construction
of tables:
1. The table should be precise and easy to understand. It should
not be necessary to go through footnotes or explanation to properly
understand a table.
2. If the data are very large they should not be crowded in a
single table. This would increase the chances of mistakes and would
make the table unwieldy and inconvenient. Such data can be presented
in a number of tables. Each table should be complete in itself and
should serve a particular purpose.
3. The table should suit the size of the paper and, therefore,
the width of the columns should be decided beforehand.
4. There should be thick lines to separate the data under one
class from the data under another class, and the lines separating the
sub-divisions of classes should be comparatively thin.
5. The number of main headings should be few though there
is no harm if the number of sub-headings is large. This will help in
understanding the main points of the table.
6. Captions, headings or sub-headings of columns, and headings
and sub-headings of rows must be self-explanatory.
7. Those columns whose data are to be compared should be
kept side by side. Similarly percentages, totals and averages must also
be kept close to the data.
8. As far as possible figures should be approximated before
tabulation. This would reduce unnecessary details.
9. The units of measurement under each heading or sub-heading
must always be indicated.
10. Totals of rows should be placed in the extreme right column,
though sometimes they are placed in the first column after the vertical
captions on the left. The totals of columns should ordinarily be placed
at the foot though in some cases it is helpful to place them at the top of
the table.
11. Items should be arranged either in alphabetical, chronological
or geographical order or according to size, importance, emphasis or
causal relationship to facilitate comparison.
12. If certain figures are to be emphasised they should be in distinctive
type or in a "box" or "circle" or between thick lines.
13. When percentages are given side by side with original figures
they should be in a separate type, preferably italics.
14. If some portion of collected data cannot be classified in any
class or division a miscellaneous class should be created and the data
shown in it.
15. There should be a proper title to each table. It should tell
what exactly the table presents.
Besides the rules mentioned above, the figures should be scrutinized
before being entered in a table. Below a table should be given
the method of collection, sources of data, general results obtained and
their limitations. The probable error should also be mentioned.
It should be remembered that there cannot be any rigidity about
these rules. Tables must suit the needs and requirements of an investigation.
Bowley has correctly said that "in collection and tabulation
common sense is the chief requisite and experience the chief
teacher."

Questions
1. What do you understand by classification, seriation and tabulation? Discuss
their importance in a statistical analysis.
2. "Classification is the process of arranging things (either actually or notionally)
in groups or classes according to their resemblances and affinities giving expression
to the unity of attributes that may subsist amongst a diversity of individuals."
Elucidate the above statement. (B. Com. Allahabad, 1947).
3. How would you proceed to classify the observations made and what points
will you take into consideration in tabulating them? Mention the kinds of tables
generally used. (B. Com. Agra, 1941)
4. What precautions would you take in tabulating your data?
(B. Com. Agra, 1933).
5. "In collection and tabulation common sense is the chief requisite and
experience the chief teacher." (Bowley)
What precautions in your opinion are necessary to avoid statistical errors in the
collection and computation of primary data? (M. A. Agra, 1940).
6. Discuss the main functions and importance of tabulation in a scheme of
investigation. Prepare blank tables to show distribution of students of a college according
to age, class and residence for arranging (a) Physical training and (b) Tutorial classes.
7. (a) Draw up a blank table with suitable headings, spacings, table of lines,
etc. in which could be shown the number and tonnage of ships entered and cleared
at ports in India for 10 years distinguishing steam and sailing vessels and also those
with cargoes from those in ballast.
(b) What do you mean by "A Statistical Unit of Measurement"? Give a
suitable illustration. (B. Com. Hons. AiMDTII, 194%)
8. Draw up two independent blank tables giving rows, columns and totals in
each case summarizing the details about the members of a number of families,
distinguishing males from females, earners from dependants and adults from children.
9. Draw up in detail, with proper attention to spacing, double lines, etc.,
and showing all sub-totals, a blank table in which could be entered the numbers
occupied in six industries on two dates, distinguishing males from females, and
among the latter single, married and widowed. (M. A. Alld., 1940)
10. Explain how you would tabulate statistics of death from principal diseases
by sexes, in two different provinces in India for a period of five years.
(M. Com. Calcutta, 19")
11. Prepare a table with a proper title, divisions and subdivisions to represent
the following heads of information:
(a) Export of cotton piecegoods from India.
(b) To Burma, China, Java, Iran, Iraq.
(c) Amount of piecegoods to each country.
(d) Value of piecegoods to each country.
(e) From 1939-40 to 1945-46 year by year.
(f) Total amount exported each year.
(g) Total value of exports each year.
(M. Com. Alld., 1946).
12. Explain the purpose and methods of classification of data. How are the
machine tabulating cards prepared and used?
13. Prepare a blank form with suitable heading and spacing for use in collection
of data on one of the following:
(a) Survey of trades in your district.
(b) Standard of living of middle class families in a small town.
(c) Expenses of students in a university.
14. Distinguish between one-way, two-way, three-way tables and tables of
higher order. Illustrate your answers with examples.
15. Write short notes on:
(a) Classification according to attributes.
(b) Class limits.
(c) Magnitude of class interval.
(d) Complex tabulation.
(e) Class frequency.
16. (a) What is the motivation for arranging observed data in a frequency
distribution with a number of class-intervals of the variable?
(b) What are the principles governing the choice of (i) the number of
class-intervals, (ii) the length of the class-interval and (iii) the mid-point of the class
interval?
(c) It is said that in obtaining a frequency distribution of the census age
returns, the mid-points of the class-intervals should be multiples of 5. Give an
explanation.
(d) For a frequency distribution of marks in history of 200 candidates
(grouped in intervals 0-5, 5-10, ... etc.) the mean and standard deviation were
found to be 40 and 15. Later it was discovered that the score 43 was misread as 53
in obtaining the frequency distribution. Find the corrected mean and standard
deviation corresponding to the corrected frequency distribution. (I. A. S., 1951).
17. You are given a statistical table. What questions would you ask before
accepting it? Draft a form of tabulation to show:
(a) Sex; (b) Three ranks: Supervisors, assistants, and clerks; (c) Years 1918
and 194~; (d) Age-groups: 18 years and under, over 18 but less than H years, over
H years. (B. A. Madras, 1953).
18. What information can be obtained from a frequency distribution?
19. What are the advantages and disadvantages in having a large number of
class intervals? Discuss.
20. Define Frequency Distribution. State the principles to be observed in its
formation.

The following is a record of weights of 70 students (in lbs.). Tabulate the
data in the form of a Frequency Distribution, taking the lowest class as (60-69):
61 73 93 107 111 76 78 69 96 72
80 88 96 109 103 84 84 106 91 7J
91 92 101 91 101 90 77 105 90 86
(13 101 114 71 77 118 95 63 99 81
100 106 ~7 89 91 107 111 76 8, 86
106 109 107 97 62 74 94 98 73 67
108 Sa 115 104 85 88 98 88 91 9'
21. Make a frequency table in descending order in inclusive form from the data
given below, selecting a class interval of 5 units each.

1" I,.I,.a.,
17. a" 19. 11,,14. 'I,.
i6.17. I,.
Ja, 18. ale 15. 20,
10. 22, .17, 21, 19. 19. 16. 18, II, 18, 10 •
11.
19. 17. t6. 14.

22. Following is the record of marks obtained by 90 candidates in an examination.
Form a frequency distribution.
84, 91. ,8, 71 • 44. 87. 76.4" as, 40. 75. 86. 77. 1S. n. 71 , '4. 46, H. 45,
n.76• 94. 6,. 74, ,0,6,,80. '7, ", $6. H. 91. n. 6,. 69. 47. 119.57. n. b,40,
27. 84, H. %9,51.72 • 44. 19, 11, 67, 58, 76.3 8• 16, 37. 74, 46. 50. IS. '9. a7A 92,
IS. 4'. 61. '9.78, 15. 12, 71, 6a. II, 41, ,8, 27, 66, St. 29,6,.47.59.19. a., ".
39. 80. ". (S. A. A/JuInu. I,,,.)
23. Convert the following data of class-frequencies cumulated from the top
and from the bottom into the usual type of class-intervals with individual class-frequencies.
(i) Class-frequencies cumulated from the top. (ii) Class-frequencies cumulated from
the bottom.
Below
. ,
Matb Studen~ Abo~e Mub Stucknta

".,
10
I,
10
2ll ..... ,
0 n
45

.. ..
57 10
20
I,
50
H
1,
20
"
IS
S
"

24. In an enquiry to find out the relation between age and monthly wages, the
following data were collected from 40 mill workers:
S. No. Age (Years) Wage (Rs.) S. No. Age (Years) Wage (Rs.)
I. 37 81 :11 41 89
1. al 100 aa 38 9a
3· 49 101 a3 41 8I
4· ,6 109 24 37 140
S· 57 (02. as 4S 94
6. 34 104 a6 4 6 .n9
7· 25 8( 2.7 28 99
8. 48 tit, a8 43 109
9. 51 100 2.9 41 92.
10. 41 89 30 31 no
n. 4, 15' 31 5S tao
12.. H 101 32 42 115
13· 38 99 H 4Q 119
14· 41 U3 H 4S 90
IS· 31 100 3S So 76
16. 30 99 56 24 IS8
17· 55 130 37 :n 76
til. 30 159 38 u 76
19· 2.9 90 59 al 94
ao. u 79 40 58 89
Tabulate the above data in the following form:

                            No. of Mill Workers
Wage group       Age group      Age group      Age group      Total
in Rupees        21-30 yrs.     31-40 yrs.     41-50 yrs.
76-100
101-125
126-150
151-175

Total

Note: Your answer should show the actual process of tabulation.
(Raj. B. Com. 1958)
25. Present the following information in a suitable tabular form:
In 1940-41 the total production in India (in thousand tons) of the principal oil-seeds
was as follows: Ground-nuts 3702; linseed 454; rape and mustard 1105; castor
105; sesamum .33. Next year the production of each of the first three items fell by
56% and of the remaining items fell by 10% each.
In 1942-43 there was an increase compared to the preceding year of 8% in ground-nuts,
12% in linseed, 1% in rape and mustard, 50% in castor and 10% in sesamum.
In the next year the figures were respectively ,825, 395, 955, 140 and #1·.
(M. Com., Delhi, 1959).
26. The following is the summary of the time of leaving home and the number
of hours spent in the institution of a group of teachers in Bombay University:
One teacher leaves the bome before, .30 a. m. and spends 4 hours in the instl·
tution. Orthe 2.3 teachers who leave thelr homes between 6 and 7 a.m. 7 teachers
lpend 3 hours, II teachers .. 4 bours, 1 teacbers ... s hours. and 3 teachers 6 hours. Of
the 16 who leave between 7 and 8 a. rn.,4 teachers spend 3 hours, 6 teachers ... 4 houta,
I teachct •.. S houn and ~ teachera ... 6 houJ:l'. C ~. 9%, v.Lo leave-between 8 and 10
a.m., 6 teachets... 3 houtsl 9 teachera ... 4 houts, :zx'teacher ", I-OUI1' md 46 teachers••.
6 houts. Of the :n who leave between 10 and II a.m., %r,1II,r-.htts"'5 houts, S teachers
· .. 4 houts, 7 teachets ... Shouts and 4 teachets... 6 houts
Present the summary In a suitable Tabular Form
(Raj., B. CtIIII., 1961).
27. Arrange in a suitable tabular form the following:
The Food Grain Enquiry Committee made the following comparable study of
size of holdings in the Eastern U. P. and the rest of U. P.
In the 14 eastern districts of U. P., holdings below 2 acres account for 20% of
the area under all holdings comprising a total area of U%SO (thousand acres); the
corresponding figures for the rest of U. P. are 11% and %9036 (thousand acres). Similarly,
the proportion of area covered by holdings exceeding 2 acres but not exceeding 5
acres to the area under all holdings is 29% in 14 districts and only 3% in the rest of
U. P. On the other hand the proportion of area covered by holdings exceeding 5 acres
is much greater in the rest of U. P. than in the 14 districts. (Delhi, M. A., 1958).
(Banaras, B. Com., 1960).
28. In a newspaper account, describing the incidence of influenza among
tubercular persons living in the same family, the following paragraph appeared:
"Exactly a fifth of the 1,00,000 inhabitants showed signs of tuberculosis and no
fewer than 5000 among them had an attack of influenza, but among them only 1000
lived in infected houses. In contrast with this 1 (15th of the tubercular persons who
did not have influenza were still exposed to infection. Altogether 21,000 were
attacked by influenza and 41,000 were exposed to the risk of infection, but the number
having an attack of influenza but not of tuberculosis and living in houses where no other
cases of influenza occurred was only 2,000."
Redraft the information in a concise tabular form.
(M. Cmn., AgrfJ, 1962).
(R. A. S., 1960).
(M. A., Delhi, 1957.)
29. The following figures give the height in inches of 80 students of a class.
Represent the data by a frequency distribution with suitable class-intervals:
6Z.I, 65.5. 6~.0, 62.2, 64.7, 63.1, 6S.8, 62.3, 60.7, 63.2, 64.1, 59.6, 64.S, 61.1.
65.7,60.2,64.3,67.4, 64.S, 664, 64. 2, 6204, 63·3, 64,0, 6z." 6,.4, 66,3, S9'9, 63·5, 61.8,
6S.4, 67'3,60.4,6,,6,59.1,64. 8,61.9,62.6,67.0,68.1, '9.4, §3.6 64.4, 62.0, 63.7, 6S.3"
63.8,667, 63.9, 60.8,63.0,64.3, 6uz, 6%,7, 64.6, 64.9, 60." 64.4, 61.7, 66.5, 66.6'
63.4, 6S.2, 66.2, S9.7, 67.6, 63·5, 67.41 63. 6, 68.5, 60.0, 61.3 63.6, 61.S, 6,.1,6%.8,61'3
64.0, 68.7, 66.6.
30. Arrange the following marks in a Frequency Table taking the lowest class
interval 10-20:

13, 81, 58, 81 SS, 7S, 61, 70, 84, 84, 81. 87, 67, 6" 62, 62, 61, S9, 5S, 57, 75,
72, 84, 91, 87, 76, 43, 83, 40 , 73, 86, 73, 43, 33, 76, 95, 73, 65, 77, 72, 72, 29,
43 85, 4%, 80, 75, 85,62, 57, 64, 70,95, 57, 74, ,0. 7S, 49, 55,64, 92, 73, 73, 96,.
69 51, 22, 7S, 80, 36, 70 8S, 47, 69,63, 53, 91, H. 69, 30. (AndbrfJ, B. A., 1914)
31. Tabulate the following data by taking 10 as the class-interval:
30, 45, 55, 65, 60, 90, lIS. 8s. 95> 100, 95, '65, 75. 8S, IZS, lIO, 87, 6"
100, lIS, 65, 60, 75, 9S, 130, 95, 125, II5, 6" 70, 9" 8" 6S, 60, 80, 8"
75, 95. 55, 45, 35, 45, 40, 85, 135, 140, 9S, 65, 4S, 3', U5, 90,80. IZS, 130,
~5. 90, 100, 95, 85, 85, uo, II5, 40, 35, 12 5, 35, lOS, 7',45,
(B. CtIIII., Vwa"" 1964).
Ratios, Percentages And
Logarithms 8
RATIOS AND PERCENTAGES

Need. After the statistical data have been collected, edited, classified
and tabulated, they are ready for further statistical analysis. In the
process of classification and tabulation the size of the data is considerably
reduced and a large number of figures are condensed. This is done with
a view to make the data easily understandable and fit for analysis and
interpretation. But even after condensation, data might be fairly large
in quantity and the figures may be very big and unwieldy. It may not be
easy to draw inferences from them. To remove this difficulty, sometimes,
ratios and percentages are calculated so that big figures are reduced to
small ones and a relative study of the data is possible. Absolute figures
are unfit for relative study and in statistical analysis, where most of the
data are compared relatively, absolute figures, even though they are
essential, do not have very great significance.
Derivatives
Ratios and percentages are obtained by a combination of two or
more figures. They are derived from the absolute figures collected for the
purpose of investigation, and that is why they are sometimes referred
to as "derivatives." A derivative is a quantity obtained by the combination
of two or more figures. In a statistical analysis a variety of derivatives
are used. Ratios, percentages, rates, coefficients, measures of central
tendency and measures of dispersion, skewness and kurtosis are all statistical
derivatives. Ratios and percentages are simple derivatives while measures
of central tendency or averages of the first order and measures of dispersion
and skewness or averages of the second order are complex derivatives,
as in their calculation a number of statistical processes have to be
undergone. Simple derivatives may be either co-ordinate or subordinate.
When two or more parts of a universe are compared with each other with
the help of ratios or percentages these derivatives are called co-ordinate
derivatives, and when a part of the universe is compared with the whole
of the universe the derivatives are said to be subordinate. The ratio of
females and males in a population is an example of a co-ordinate derivative
and the ratio of females to the total population is a subordinate derivative.
Ratios. In the simplest possible form, a ratio is a quotient or the
numerical quantity obtained by dividing one figure by another. If
800 is divided by 100 the quotient is 8. Here 800 has been compared
with 100 which is the base in this case. In other words, 800 is to 100
as 8 is to 1. Or 800 : 100 : : 8 : 1. The process reduces the size of the
numbers and thus facilitates comparison. Instead of saying that the
production of wheat in country A is 3,50,800 tons and in B it is 1,00,000
tons, we can say that the ratio of production in these countries is 3,50,800
tons to 1,00,000 tons or 3.508 to 1. Ratio is the simplest form of
relative comparison between two figures. Ratios are used in a number
of ways. In the ordinary walk of life, the common man is subconsciously
aware of them. "Nehru is one man in a million", or "cost of living
has gone up four times" or "he can lift thrice his own weight" are
expressions in common use. They indicate the universal use of ratios.
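
The two ratios worked out above can be checked with a short Python sketch. It is an illustration only; the function name is an assumption made here, and exact fractions are used so that the reduced form of each ratio can be seen.

```python
from fractions import Fraction

def ratio_to_one(figure, base):
    """Reduce 'figure : base' to the form 'x : 1' as an exact fraction."""
    return Fraction(figure, base)

simple_ratio = ratio_to_one(800, 100)            # 800 : 100 :: 8 : 1
wheat_ratio = ratio_to_one(350_800, 100_000)     # 3,50,800 tons to 1,00,000 tons

print(simple_ratio)            # 8
print(float(wheat_ratio))      # 3.508
```

Dividing by the base expresses the first figure in units of the second, which is what the statement "3.508 to 1" means.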
Percentages. Ratios are very often expressed as percentages. In
the calculation of percentages also, one figure is taken as base and is
represented by 100. The other figure is expressed as a ratio of this base.
80 is 400% of 20 or 80 : 20 : : 400 : 100. Instead of saying that the
export of a commodity from a country was valued at 45 lakhs in the year
1955, and 30 lakhs in the year 1954, we can say that the exports in the
year 1955 were 150% of the figures for the year 1954.
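
A percentage is simply a ratio with the base represented by 100. The sketch below, given only as an illustration with an assumed function name, reproduces the two percentages quoted above.

```python
def as_percentage(figure, base):
    """Express 'figure' as a percentage of 'base' (the base counts as 100)."""
    return figure * 100 / base

print(as_percentage(80, 20))    # 400.0
print(as_percentage(45, 30))    # 150.0  (exports of 1955 on the base of 1954, in lakhs)
```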
Rates. Instead of 100 as base other figures like 1, 10, 1000, 10,000
can also be used. Usually when the base is 1000 the ratio is called a rate.
For example, if the number of deaths is divided by the total population
and the quotient is multiplied by 1000 we will get what is called a crude
death rate. However, there is no hard and fast rule that a rate should
have the base of 1000 only. Rate per 1000 is called rate per mille.
Coefficients. Rate per unit is called a coefficient. The death rate in
India at present is about 1.7 per cent or 17 per thousand. We can say
that the coefficient of deaths is .017. If this coefficient of .017 is multiplied
by the total population we shall get the total number of deaths.
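
The relation between a percentage, a rate per mille and a coefficient can be sketched as follows. The population and the number of deaths used here are hypothetical figures chosen only to give a death rate of 17 per thousand; they are not taken from the text, and the function name is likewise an assumption.

```python
def rate(events, population, base=1000):
    """Number of events per 'base' persons of the population."""
    return events * base / population

# Hypothetical figures, chosen to give 17 deaths per thousand.
population = 1_000_000
deaths = 17_000

per_cent = rate(deaths, population, base=100)      # 1.7 per cent
per_mille = rate(deaths, population, base=1000)    # 17 per thousand
coefficient = rate(deaths, population, base=1)     # .017 per unit

# Multiplying the coefficient by the population gives back the deaths.
recovered_deaths = round(coefficient * population)
print(per_cent, per_mille, coefficient, recovered_deaths)
```

The same two figures thus give a percentage, a rate or a coefficient according to the base chosen.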
Size of the base. The above discussion clearly indicates that the
difference between ratios, rates, percentages and coefficients is only in
the base on which they are calculated, otherwise all of them give a relative
picture of two phenomena which are interrelated. Which base should be
chosen for the purpose of comparison is a question which can be decided
after taking into account the nature of the data. Ordinarily the base
should be large enough to permit the numerator to be expressed as a
whole number and it should be small enough to prevent more than three
digits appearing in the numerator to the left of the decimal point. The
death rate in India is 17 per thousand. If the base was reduced from 1000
to 10 the numerator would be .17 which is less easy to understand. If on
the other hand, it was raised from 1000 to 1,00,000 the numerator would
be 1,700. This violates the principle of ratio which is to reduce large
numbers to smaller ones for the sake of easy understanding and analysis.
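
The rule about the size of the base can be turned into a small Python sketch. This is one possible reading of the rule, not a procedure given in the text: the list of candidate bases and the function name are assumptions, and the smallest base giving a whole-number numerator of not more than three digits is taken.

```python
from fractions import Fraction

# Candidate bases, smallest first (an assumed list for illustration).
BASES = (1, 10, 100, 1000, 10_000, 100_000)

def choose_base(coefficient, bases=BASES):
    """Return the smallest base for which the numerator is a whole number
    with at most three digits, or None if no candidate qualifies."""
    for base in bases:
        numerator = coefficient * base
        if numerator.denominator == 1 and 1 <= numerator < 1000:
            return base
    return None

death_coefficient = Fraction(17, 1000)           # coefficient of deaths, .017

print(choose_base(death_coefficient))            # 1000
print(float(death_coefficient * 10))             # 0.17, too small to read easily
print(death_coefficient * 100_000)               # 1700, four digits
```

Under this reading the base of 1000 is chosen for the death rate, in agreement with the usual statement of 17 per thousand.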
Types of bases. Various types of bases can be used for the computation
of ratios. Some of the important ones are:
1. Total to total. If one group of figures (as a whole) is compared
with another group, the base of the ratio would be the total of one of the
two groups. Income per capita is an example. In its calculation total
income is divided by the total population.
2. Total to part. Where a part is compared to a whole or universe,
the base of the ratio is usually the value of the universe. If the ratio of females
to the total population is calculated, the base would be the total population
and the ratio would be obtained by dividing the number of females
by the total population.
3. Part to part. If the ratio of males to females in a population
is studied the sex ratio of the population will have to be calculated.
This ratio is usually expressed as number of females per 1,000 males.
Here the base of the ratio is one of the two parts, males in this case.
4. Past to present. If the production of wheat in India in the year
1955 is to be expressed as a ratio of the production in the year 1954, we
shall have to use the figures of the past, that is, 1954 as the base of the
ratio. When it is said that the production of wheat in 1955 was 110
per cent of the production in 1954, it means that the production of 1954
is represented by 100 which is the base.
5. Standard area, distance and units. Sometimes the base of the ratio
is a standard area as in the case of population per square mile, or a standard
distance as in the case of cost of railway line per mile, or a standard or
conventional unit as in the case of children per family, students per school
or rooms per house, etc.
6. Arbitrary ratios. In many enquiries it is possible to use arbitrary
units and they sometimes give better results than even conventional
units. Examples of such units are horse-power, ton-mile, light-years,
class-hours, etc.
As has been said earlier, the most common arbitrary units are 1, 10,
100, 1000 and 1,00,000. Among these 100 (or per cent) is the most
popular arbitrary base.
Ratios between like and IInlike tlnits. In order to facilitate comparison
in the shape of ratios it is essential that the two figures compared should
have the same characteristics and should be expressed either in the same
unit or in comparable units. We can calculate a ratio between produc-
tion of wheat (in tons) and export of wheat <in tons). Similarly consump-
tion of cotton (in' bales) can be compared with its production (in bales), but
we cannot calculate ratios between production of wheat <in tons) and the
c0nsumption of cotton <in bales) .. Death rates are comparisons between
persons and persons. Number of persons dead are compared with the num-
ber of persons constituting the total pc·fmlation. However, in many cases
comparison has to be done between items which are expressed in dii'erent
units. For example, total income of a country is divided by the total
population to find out per capita income. Similarly in the comparison of
the number of miles -done with a gallon of p<;trol the units are different. A
direct comparison of rupee and persons or miles and gallon cannot be done
but they can be reduced to a common denominator. The common denominator
in such cases is a number or quantity. Thus, in comparing total income
with the total population we really divide the number of rupees representing
the total inco~e by the number of persons representing the total population.
Similarly in the second example above we compare the number of miles
with the number of gallons. We thus find that in these cases also the
numerator and denominator are identical. They are both numbers.
Fallacies in the use of percentages and ratios
Original data should be comparable. Percentages and ratios
should be used with caution, otherwise they are liable to give misleading
conclusions. Percentages should be used to compare only such data
as are comparable in actual figures. If the original figures are not
strictly comparable with each other, for some reason or the other, the
percentages should never be used and the original data should be presented.
Usually it is said that percentages give a better idea of the relationship
between profits and capital than the original figures. It is for this reason that
profits are usually expressed as percentages of capital invested. This
is all right, but if the figure of capital investment undergoes a change in
any year, and if only percentages of profits to capital invested are shown,
it is likely to give a wrong impression. An illustration would make the
point clear. Suppose a company has had a capital of Rs. 1,00,000 for the
last five years, and its profits each year have been Rs. 10,000, which means
that in each of these 5 years the profits are 10% of the capital. Suppose in the
next year the capital of the company is increased to Rs. 1,50,000 and the
amount of profit is Rs. 12,000. The percentage of profit to capital would
now be only 8. If only percentages of profits are shown for these six
years, it would appear as if the profits in the sixth year have gone down,
whereas the profit has increased by Rs. 2,000. This is so on account of
the fact that the original data of capital and profits are no more comparable,
as the amount of capital has increased. In such cases, where the
homogeneity of the original data has been disturbed by any factor,
percentages should not be used without the original figures being shown side
by side.
Basis of calculation of percentages. Misleading conclusions may also
be derived from the use of percentages on account of a wrong basis
of calculation. It is essential that the basis of the percentage
calculation is correctly observed. An example would clearly illustrate
how wrong conclusions can be arrived at if the basis of calculation is
not properly observed. Suppose it is said that the price of a commodity
fell 5%, then went up 10%, then again fell 20% and then again went up
25% over a period of time. It is difficult to find out the change in the
price level over the whole period, because two different answers would be
obtained according as the basis of calculation of percentages is
(i) the original price, or
(ii) the price level ruling at the time of the change.
If the original price level was 100, changes according to the first
supposition would be :
Original price 100
5% fall: price becomes $\frac{95}{100} \times 100 = 95$
10% rise: price becomes $95 + \frac{10 \times 100}{100} = 105$
20% fall: price becomes $105 - \frac{20 \times 100}{100} = 85$
25% rise: price becomes $85 + \frac{25 \times 100}{100} = 110$
Thus according to this method the prices went up 10% over the
original price.
Using the second supposition :
Original price 100
5% fall: price becomes $\frac{95 \times 100}{100} = 95$
10% rise: price becomes $\frac{110 \times 95}{100} = 104.5$
20% fall: price becomes $\frac{80 \times 104.5}{100} = 83.6$
25% rise: price becomes $\frac{125 \times 83.6}{100} = 104.5$
Or we can directly calculate the price as
$\frac{95}{100} \times \frac{110}{100} \times \frac{80}{100} \times \frac{125}{100} \times 100 = 104.5$
Thus according to the second supposition the rise in the price level
during the whole period is only 4.5%.
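The two suppositions can be checked with a short Python sketch. It is only an illustration: the function name `apply_changes` and its `base` argument are our own and do not come from the text.

```python
def apply_changes(start, changes, base="current"):
    """Apply successive percentage changes to a starting price.

    base="original": every percentage is taken on the starting price.
    base="current":  every percentage is taken on the price ruling
                     at the time of the change.
    """
    price = start
    for pct in changes:
        reference = start if base == "original" else price
        price += reference * pct / 100
    return price


changes = [-5, 10, -20, 25]  # fall 5%, rise 10%, fall 20%, rise 25%
on_original = apply_changes(100, changes, base="original")
on_current = apply_changes(100, changes, base="current")
print(round(on_original, 2), round(on_current, 2))
```

With the figures of the example the first basis ends at 110 and the second at 104.5, which is why the basis of a percentage must always be stated.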
Size of items. If the size of items is small, percentages may give
misleading conclusions. If, for example, in a school out of 2 candidates
who appeared in an examination both pass, while in another school out of
200 appearing 190 pass, the percentage of success in the first case is
100 and in the second 95 only. On the basis of these percentages we will
not be correct in drawing the conclusion that the first school is better
than the second. Therefore, percentages should not be used if the size of
items is small, say, less than 100.
Precautions in the use of ratios. Due to the fallacies in the use of
percentages mentioned above, sometimes the use of ratios may be advocated
in place of percentages. In the example given in the last paragraph
about the success of candidates in two schools, ratios would have given
a better picture. If the ratios of successful candidates to the total number
are to be expressed, they would be 2 : 2 and 190 : 200. The fallacy which
we observed in the use of percentages in this case is not found here, and
the ratios give a clear picture of the whole situation. However, ratios
should also be used with caution, particularly in comparing unrelated
and heterogeneous figures. If out of 1,000 persons in a locality 200 are
attacked by small-pox, the ratio of persons not attacked to those attacked
would be 800 : 200 or 4 : 1. If in another locality out of 1,000 persons
who were inoculated only 100 were attacked, the ratio would be 900 : 100
or 9 : 1. These ratios, namely, 4 : 1 and 9 : 1, give the impression that
the second locality is healthier than the first, but it may not be so,
because the lesser incidence of small-pox in the second locality is most
probably due to inoculation. The conclusions thus would be fallacious.
Therefore, in cases where the data are not strictly homogeneous, ratios
should also be used with a great amount of caution.
Some popular Ratios used in Population Studies
Ratios, rates and coefficients are used in all types of studies. Income
per capita, population per square mile, production per acre, turnover
ratio, fixed-asset ratio, intelligence quotients, etc., are examples of
various popular ratios used. We give below some of the popular ratios
used in population studies :
Crude death and birth rate. Crude death rate for a locality is found by
dividing the total number of deaths in that area, during the year in
question, by the number of people living in that locality at the mid-point of
that year. It is usually expressed per thousand (or per mille).
Thus
$$\text{Crude death rate} = \frac{\text{No. of deaths in a locality in a year}}{\text{No. of people living at the mid-point of that year}} \times 1000$$
Crude birth rate is similarly found by dividing the total number
of births in a locality, during the year in question, by the total population
of the locality at the mid-point of the year.
Thus
$$\text{Crude birth rate} = \frac{\text{No. of births in a locality during a year}}{\text{No. of people living at the mid-point of that year}} \times 1000$$
Standardized death and birth rates. Crude death-rates and birth-rates
are usually not fit for comparing conditions in two or more localities.
The crude death-rates of two places may be identical and yet their mortality
patterns may be entirely dissimilar. Similarly, the fact that the crude birth
rates of two places are identical does not necessarily mean that the fertility
pattern of the two places is similar. This is on account of the fact that
the age composition of two populations may differ from each other. The
percentages of people in different age groups may not be similar in the
two populations, and if it is so, the comparisons are bound to be fallacious.
To remove these drawbacks death-rates and birth-rates are standardized.
In the calculation of standardized rates it is presumed that the age
composition of the two populations is identical. The difference between the
age compositions is eliminated with the technique of presuming a
population as normal or standard. Standardized death and birth-rates are
calculated by associating the crude death and birth-rates in different age
groups with a normal or standard population instead of the actual
population of the locality. The following illustration would clarify the point :
Example 1
Suppose we have to calculate the crude and standardized death rates
of two towns A and B. Their age composition and mortality patterns
are as follows :
TABLE 9
Calculation of crude death-rates of towns A and B

                              Town A                            Town B
Age composition     Population  No. of   Death-rate   Population  No. of   Death-rate
                                deaths   per 1000                 deaths   per 1000
Less than 5 years      3000      180       60            1500       75       50
5-20 years             5000      200       40            2200       55       25
20-50 years            4000      120       30            2800       56       20
Above 50 years         2000      140       70            2500      150       60
Total                 14000      640       45.7          9000      336       37.3

Crude death rate of Town A
$$= \frac{640}{14000} \times 1000 = 45.7$$
It can also be calculated as :
$$\frac{(60 \times 3000) + (40 \times 5000) + (30 \times 4000) + (70 \times 2000)}{3000 + 5000 + 4000 + 2000} = \frac{6{,}40{,}000}{14{,}000} = 45.7$$
Crude death rate of Town B
$$= \frac{336}{9000} \times 1000 = 37.3$$
It can also be calculated as :
$$\frac{(50 \times 1500) + (25 \times 2200) + (20 \times 2800) + (60 \times 2500)}{1500 + 2200 + 2800 + 2500} = \frac{3{,}36{,}000}{9{,}000} = 37.3$$
As has been said earlier these rates cannot be directly compared.
To calculate the standardized death-rates we shall assume a standard
population. Suppose the age composition of the normal or standard
population is as follows :
TABLE 10
Standard Population

Age composition        Population
Less than 5 years         200
5-20 years                250
20-50 years               400
Above 50 years            150

Now the standardized death-rates of the two towns A and B would
be calculated as follows :
TABLE 11
Calculation of standardized death-rates of towns A and B

Age groups           Standard     Death-rate   Col. 2 ×   Death-rate   Col. 2 ×
                     population   of town A    Col. 3     of town B    Col. 5
1                    2            3            4          5            6
Less than 5 years      200          60         12,000       50         10,000
5-20 years             250          40         10,000       25          6,250
20-50 years            400          30         12,000       20          8,000
Above 50 years         150          70         10,500       60          9,000
Total                 1000                     44,500                  33,250

$$\text{Standardized death-rate of town A} = \frac{44500}{1000} = 44.5$$
$$\text{Standardized death-rate of town B} = \frac{33250}{1000} = 33.25$$
We could have calculated the number of deaths on the basis of the
standard population by applying the original death rates and got the same
answers. For example, on the basis of a death rate of 60 per thousand,
the deaths in a population of 200 (in the less than 5 years group for
town A) would have been 12. Similarly, for the next 3 groups in town
A, the number of deaths would have been 10, 12 and 10.5 respectively.
The total number of deaths thus would have been (12+10+12+10.5) or
44.5. Since the total population is 1000 the death-rate also would have
been 44.5.
In the above example we have presumed a standard population and
on the basis of this population we have calculated the standardized
death-rates for towns A and B. These rates are comparable with each other
because in their calculation the differences in the age composition of the
two populations have been eliminated. If, however, the standard
population was not known, the population of any of the two localities could
have been assumed as the standard population, and on that basis the
death-rates would have been calculated.
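The calculations of Tables 9 to 11 can be sketched in Python as follows. The function names (`crude_rate`, `age_specific_rates`, `standardized_rate`) are our own labels for the steps described above, and the figures are those of towns A and B.

```python
# Age groups in order: less than 5, 5-20, 20-50, above 50 years.
town_a = {"population": [3000, 5000, 4000, 2000], "deaths": [180, 200, 120, 140]}
town_b = {"population": [1500, 2200, 2800, 2500], "deaths": [75, 55, 56, 150]}
standard_population = [200, 250, 400, 150]


def crude_rate(population, deaths):
    """Deaths per 1000 of the whole population."""
    return 1000 * sum(deaths) / sum(population)


def age_specific_rates(population, deaths):
    """Deaths per 1000 within each age group."""
    return [1000 * d / p for p, d in zip(population, deaths)]


def standardized_rate(population, deaths, standard):
    """Apply each age group's own death-rate to the standard population."""
    rates = age_specific_rates(population, deaths)
    return sum(r * s for r, s in zip(rates, standard)) / sum(standard)


crude_a = crude_rate(**town_a)
crude_b = crude_rate(**town_b)
std_a = standardized_rate(town_a["population"], town_a["deaths"], standard_population)
std_b = standardized_rate(town_b["population"], town_b["deaths"], standard_population)
print(round(crude_a, 1), round(crude_b, 1), round(std_a, 2), round(std_b, 2))
```

Passing the population of town A itself as `standard` gives the variant mentioned in the last paragraph, where one locality is taken as the standard.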
General fertility rate. The crude birth-rate, as we have seen, does
not take into account either the age composition or the sex ratio of the
population. In the standardization of the birth-rate also, the sex ratio
is ignored. In order to study the pattern of population growth it is
necessary to take both these factors into account, because if two
populations have different sex ratios and their birth-rates are identical, it is
definite proof that there is a difference in their fertility patterns. If the
birth rates of the two populations are identical, then in the locality in which
the number of females is comparatively small the number of births per 1000
women is bound to be greater than in the other locality, because only then
would the two birth-rates be equal. To remove this difficulty the general
fertility rate is calculated. It is the number of children born per
1000 women of child-bearing age.
Thus
$$\text{General fertility rate} = \frac{\text{Total number of births}}{\text{Total number of women in child-bearing age group (15-45)}} \times 1000$$
Specific fertility rates. If a more detailed study of fertility is
necessary the child-bearing age group can be divided into a number of smaller
groups. Fertility in the first half of the child-bearing age group (15-30)
is more than in the second half (30-45). So, for a more detailed study,
fertility rates for specific age groups or individual ages, within the
child-bearing age group, are calculated. They are called specific fertility rates.
Here the total number of births to women within a particular age group is
divided by the total number of women in that age group. Thus for
women in the 20-25 age group, the specific fertility rate would be
$$\text{Specific fertility rate for women in 20-25 age group} = \frac{\text{Total number of children born to women in 20-25 age group}}{\text{Total number of women in 20-25 age group}} \times 1000$$
If the specific fertility rates for various ages are totalled together,
the result is called total fertility rate.
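A small Python sketch may make the two formulas concrete. The figures below are purely hypothetical (they are not taken from any census or from this book) and the names are our own.

```python
# Hypothetical figures for one locality in one year:
# age group -> (number of women, births to women of that group)
women_and_births = {
    "15-20": (2000, 60),
    "20-25": (1800, 360),
    "25-30": (1700, 340),
    "30-35": (1500, 180),
    "35-40": (1400, 84),
    "40-45": (1600, 16),
}


def rate_per_1000(births, women):
    """Births per 1000 women."""
    return 1000 * births / women


total_women = sum(w for w, _ in women_and_births.values())
total_births = sum(b for _, b in women_and_births.values())

# One rate for the whole child-bearing age group (15-45).
general_fertility_rate = rate_per_1000(total_births, total_women)

# One rate for each smaller age group.
specific_fertility_rates = {
    group: rate_per_1000(births, women)
    for group, (women, births) in women_and_births.items()
}
print(general_fertility_rate, specific_fertility_rates)
```

The specific rates show how unevenly the general rate is made up: in these assumed figures the 20-30 groups contribute most of the births.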
Gross reproduction rate. For a study of the population growth, the
calculation of general fertility rate or even specific fertility rates is not
enough, because in their calculation two important factors, namely the
sex of children born and mortality, are not taken into account. These
factors are considered in the calculation of reproduction rates. In gross
reproduction rate mortality is not taken into account, only the sex factor
is considered, but in the calculation of net reproduction rate both these
things are accounted for. Since children can be born only to women and
since in their case it is easy to lay down the limits of the fertility period,
reproduction rates are calculated by taking into account the mothers and
female children only. Now-a-days, however, male reproduction rates and
even combined reproduction rates (for males and females combined) are also
calculated.
Female gross reproduction rate tells us about the number of female
children expected to be born to 1000 newly born females, during their
life-time, on the assumption that none of these 1000 newly born females
would die before crossing the upper limit of the child-bearing age period and
further that the current fertility rates would continue to remain unchanged
during the whole of this period. Thus if 1000 newly born female children
remain 1000 till the age of 45 (which is the upper limit of the child-bearing
age period) or, in other words, if there is no mortality in this group till
the age of 45, and if during this period, on the basis of current fertility
rates, they give birth to 2412 female children, the female gross
reproduction rate would be 2.412. Reproduction rates are generally expressed in
terms of unity. It means that on the assumptions mentioned above, for
each mother at the present moment there would be 2.412 mothers in
future.
Thus
$$\text{Female gross reproduction rate} = \frac{\text{Number of female children expected to be born to 1000 newly born females on the basis of current fertility without mortality}}{1000}$$
Net reproduction rate. As has been noted above the gross
reproduction rate does not take into account the factor of mortality. The
net reproduction rate takes into account this factor also. Female net
reproduction rate tells us about the number of female children expected
to be born to 1000 newly born females on the basis of current fertility
and mortality rates. It is quite obvious that the net reproduction rate would
be less than the gross reproduction rate. 1000 newly born females would
in actual practice not remain 1000 at the age of, say, 16. Some of them
would die. Suppose their number is reduced from 1000 to 800 and
suppose further that the current fertility rate for the age of 16 is 20 per
1000; then the total number of children born to them would not be 20
but 16 only. If the sex ratio is 50 : 50 then only 8 female children would
be taken into account for the calculation of female net reproduction
rate. In the calculation of gross reproduction rate 10 female children
would have been taken into account. In each age group of women in the
child-bearing age period, the number would go on declining due to
mortality and the number of children born would also be reduced. If, suppose,
the total of 2,412 children (presumed in the calculation of gross
reproduction rate) comes down to 1411, the female net reproduction rate would be
1.411. It shows that for every present mother there would be 1.411
future mothers or, in other words, the population is growing. If net
reproduction rate is just 1 it indicates a stationary population in future
and if it is less than 1 it is a sign of declining population.
Thus
$$\text{Female net reproduction rate} = \frac{\text{Number of female children expected to be born to 1000 newly born females on the basis of current fertility and mortality rates}}{1000}$$
In the same way male reproduction rates can be calculated by taking
into account the fathers and the number of male children expected to be
born. Combined reproduction rates for males and females can be
calculated by taking into account the population (both males and females)
and the number of children (both males and females) expected to be born.
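The difference between the gross and the net rate can be sketched in Python. This is only an outline under our own assumptions: the function names are hypothetical, fertility is given as one rate per single year of age, and `survivors_per_1000` stands for the number still alive at each age out of 1000 newly born females.

```python
def female_births(women, fertility_per_1000, female_share):
    """Female children born in one year of age to `women` women."""
    return women * fertility_per_1000 / 1000 * female_share


def reproduction_rate(fertility_per_1000, female_share, survivors_per_1000=None):
    """Gross rate if no survivors are given (nobody dies), net rate otherwise.

    fertility_per_1000: births per 1000 women, one value per year of age.
    survivors_per_1000: women still alive at each of those ages out of
                        1000 newly born females.
    """
    if survivors_per_1000 is None:
        survivors_per_1000 = [1000] * len(fertility_per_1000)
    total = sum(
        female_births(women, rate, female_share)
        for women, rate in zip(survivors_per_1000, fertility_per_1000)
    )
    return total / 1000  # expressed in terms of unity


# The single age of 16 used in the text: fertility 20 per 1000, sex ratio 50 : 50.
gross_births_at_16 = female_births(1000, 20, 0.5)  # no mortality
net_births_at_16 = female_births(800, 20, 0.5)     # only 800 survive to 16
print(gross_births_at_16, net_births_at_16)
```

Summing such contributions over every age from 15 to 45 gives the totals (2412 and 1411 in the text's illustration) which are then divided by 1000.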

LOGARITHMS

Like ratios and percentages, logarithms also help in making relative
studies. Logarithms are short-cuts in mathematical calculation. With
their help multiplication, division, roots and powers of big and small
numbers can be easily calculated.
The common system of logs. (short form of logarithms) is based on
10. Log. of a number is the exponent to which 10 is raised to be just equal
to that number. The following example would clarify this point :
1,00,000 = $10^{5}$, therefore the logarithm of 1,00,000 = 5
10,000 = $10^{4}$, therefore the logarithm of 10,000 = 4
1,000 = $10^{3}$, therefore the logarithm of 1,000 = 3
100 = $10^{2}$, therefore the logarithm of 100 = 2
10 = $10^{1}$, therefore the logarithm of 10 = 1
1 = $10^{0}$, therefore the logarithm of 1 = 0
.1 = $10^{-1}$, therefore the logarithm of .1 = -1
.01 = $10^{-2}$, therefore the logarithm of .01 = -2
.001 = $10^{-3}$, therefore the logarithm of .001 = -3
.0001 = $10^{-4}$, therefore the logarithm of .0001 = -4
.00001 = $10^{-5}$, therefore the logarithm of .00001 = -5
The logarithms of the above figures are all integral numbers. The
logarithm of 10 is 1 and of 100 is 2. For all numbers more than 10 and
less than 100 the logs. would be between 1 and 2. Similarly the logs.
of numbers more than .01 and less than .1 would be between -2 and -1.
Characteristic and mantissa. Thus leaving aside numbers like 10,
100, 1000, etc., for all other numbers the logs. would consist of an integral
part and a fraction. The log. of a number thus consists of two parts, namely :
(a) An integral number known as the characteristic, which can be
either positive or negative.
(b) A fractional part known as the mantissa, which is always positive.
Rules for finding out characteristic. There are two rules for finding out
the characteristic of a number :
(1) The characteristic of all numbers more than 1 is equal to one
less than the number of digits to the left of the decimal point. Thus
the characteristic of 214.43 is 2 as the number of digits to the left of
the decimal point is three. Similarly the characteristic of 48297.3 is 4
and that of 11.2 is 1 and of 7 is 0. The characteristic of 1 is also 0.
(2) The characteristic of all numbers less than 1 is negative and is equal
to one more than the number of zeros after the decimal point and before any
significant digit. Thus the characteristic of .003801 is -3 as the number
of zeros after the decimal point and before a significant digit is 2.
Similarly the characteristic of .0102 is -2, of .00012 is -4 and of .182 is -1.
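The two rules amount to counting digits, which the following Python sketch does literally. The function name `characteristic` is our own, and the use of the `decimal` module is simply a convenient way to read off the written digits of a number.

```python
from decimal import Decimal


def characteristic(number):
    """Characteristic of the common log., found by counting digits."""
    value = Decimal(str(number))
    if value <= 0:
        raise ValueError("logarithms are defined only for positive numbers")
    if value >= 1:
        # Rule (1): one less than the number of digits left of the decimal point.
        return len(str(int(value))) - 1
    # Rule (2): one more than the number of zeros between the decimal
    # point and the first significant digit, with a minus sign.
    fraction = format(value, "f").split(".")[1]
    leading_zeros = len(fraction) - len(fraction.lstrip("0"))
    return -(leading_zeros + 1)


for n in (214.43, 48297.3, 11.2, 7, 0.003801, 0.0102, 0.00012, 0.182):
    print(n, characteristic(n))
```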
Rules for finding out mantissa. Mantissa of a number is seen from
the log. tables. At the end of this book there is a three-figure log.
table. Log. tables can be of 4, 5, 6 or even more figures. There are two
things which should be remembered about mantissa :
(a) Mantissa is always positive.
(b) Mantissa is not affected by the position of the decimal point.
The mantissa of 785, 78.5, 7.85, .785, .0785 and .00785 would be the
same. Looking at the log. table we find that the mantissa of all these
figures is .8949. Since, in numbers less than 1, the characteristic is negative
and the mantissa is positive, the minus sign is not written before the log. but
on the top of the characteristic; thus if the characteristic is -2 and the
mantissa is .8949 the log. would be written as $\bar{2}.8949$ and not as -2.8949.
Finding out logarithm. Thus to find out the log. of a number we
should first write down the characteristic in accordance with the above
rules and then should consult the log. tables and write down the mantissa.
The log. tables given at the end of this book are only 3-figure tables, and
as such, figures with more than 3 digits should be first approximated to 3
digits, and then these tables should be consulted. The following
illustrations would clarify these points :
log. 6789.5 = 3.8319
log. 678.95 = 2.8319
log. 67.895 = 1.8319
log. 6.7895 = 0.8319
log. .67895 = $\bar{1}.8319$
log. .067895 = $\bar{2}.8319$
log. .0067895 = $\bar{3}.8319$
Anti-logarithms. Just as with the help of log. tables it is possible
to find out the log. of a number, similarly by the use of anti-log. tables
numbers can be found out from their logs. To find out a number from
its log. first only the mantissa is taken into account. In the anti-log. tables
we can look up the number against the figures of the mantissa. After this
the position of the decimal point is decided by taking into account the
characteristic. Thus, if we have to find out the number whose log. is 2.874 we
shall read the number in the anti-log. table. The number against the mantissa
of .874 (.87 at the margin and 4 at the top in 3-figure tables as given at the
end of this book) is 7482. The characteristic of the number is 2, therefore
there should be 3 digits to the left of the decimal point. Accordingly we place
the decimal point after 8 and the number whose log. is 2.874 is 748.2.
Similarly the anti-log. of $\bar{2}.874$ would be .07482; since the characteristic is -2
the number of zeros after the decimal point and before a significant digit
would be one.
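The splitting of a log. into characteristic and mantissa, and the reverse step of taking the anti-log., can be sketched in Python. Here `math.log10` stands in for the printed tables, and the names `split_log` and `antilog` are our own.

```python
import math


def split_log(number):
    """Return (characteristic, mantissa) with the mantissa always positive."""
    log_value = math.log10(number)
    characteristic = math.floor(log_value)  # may be negative
    mantissa = log_value - characteristic   # always between 0 and 1
    return characteristic, mantissa


def antilog(characteristic, mantissa):
    """Number whose log. has the given characteristic and mantissa."""
    return 10 ** (characteristic + mantissa)


# The mantissa does not depend on the position of the decimal point.
for n in (785, 7.85, 0.0785):
    c, m = split_log(n)
    print(n, c, round(m, 4))

print(round(antilog(2, 0.874), 1), round(antilog(-2, 0.874), 5))
```

A log. written with a bar, such as bar-2 and .874, is simply -2 + .874; keeping the two parts separate is what allows the same mantissa to serve for 748.2 and .07482.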
Computation by logarithms
To multiply numbers. To multiply numbers find out their logs., add
them together and find out the anti-log. Thus
$a \times b = \text{Anti-log.}\,(\log a + \log b)$
Example I
Multiply 64.7 with 29.8
(a) log. 64.7 = 1.8109
(b) log. 29.8 = 1.4742
log. a + log. b = 3.2851
Anti-log. 3.2851 = 1928
∴ 64.7 × 29.8 = 1928
Example II
Multiply 49.3 with .0842
(a) log. 49.3 = 1.6928
(b) log. .0842 = $\bar{2}.9253$
log. a + log. b = 0.6181
Anti-log. of 0.6181 = 4.150
∴ 49.3 × .0842 = 4.150
Note. Whatever is carried forward from mantissa to characteristic
is positive and in the addition of characteristics, plus and minus signs
are taken into account. In the above example, one is carried forward
from the mantissa to the characteristic; it is positive and when it is added
to the characteristic of the first number it becomes +2; the characteristic
of the second number is -2 and so the total of the characteristics is 0.
Example III
Multiply .0842 and .00741
(a) log. .0842 = $\bar{2}.9253$
(b) log. .00741 = $\bar{3}.8698$
log. a + log. b = $\bar{4}.7951$
Anti-log. $\bar{4}.7951$ = .0006237
∴ .0842 × .00741 = .0006237
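The multiplication rule can be tried out in Python. In this sketch `math.log10` rounded to four places stands in for a four-figure log. table, which is an assumption of ours; the helper names `table_log` and `multiply_by_logs` are likewise hypothetical, and the results may differ in the last figure from values read off printed tables.

```python
import math


def table_log(x, figures=4):
    """Log. of x rounded as a `figures`-figure table would give it."""
    return round(math.log10(x), figures)


def multiply_by_logs(a, b, figures=4):
    """a x b = anti-log. (log. a + log. b)."""
    return 10 ** (table_log(a, figures) + table_log(b, figures))


print(round(multiply_by_logs(64.7, 29.8)))        # compare with 64.7 * 29.8
print(round(multiply_by_logs(49.3, 0.0842), 3))   # compare with 49.3 * 0.0842
```

Python keeps a negative log. as one ordinary negative number, so no carrying between mantissa and characteristic is needed here; the Note above applies only to the bar notation used with tables.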
To divide numbers. To divide one number by another, find out the
log. of the dividend and from it subtract the log. of the divisor. Find out
the anti-log. of this difference. It will be the required answer.
Thus
$a \div b = \text{Anti-log.}\,(\log a - \log b)$
Example I
Divide 1928.1 by 29.8
(a) log. 1928.1 = 3.2856
(b) log. 29.8 = 1.4742
log. a - log. b = 1.8114
Anti-log. 1.8114 = 64.71
∴ 1928.1 ÷ 29.8 = 64.71
Example II
Divide .0009 by .008
(a) log. .0009 = $\bar{4}.9542$
(b) log. .008 = $\bar{3}.9031$
log. a - log. b = $\bar{1}.0511$
Anti-log. $\bar{1}.0511$ = .1125
∴ .0009 ÷ .008 = .1125
To raise a number to a power. In order to raise a number to a power
multiply the log. of the number by the exponent of the power and find
out the anti-log.
Thus $a^{n} = \text{Anti-log.}\,(n \times \log a)$
Example I
Find out the value of $(95.2)^{4}$
log. 95.2 = 1.9786
1.9786 × 4 = 7.9144
Anti-log. 7.9144 = 82040000
∴ $(95.2)^{4}$ = 82040000
Example II
Find out the cube of .0991
log. of .0991 = $\bar{2}.9961$
$\bar{2}.9961 \times 3 = \bar{4}.9883$
Anti-log. of $\bar{4}.9883$ = .0009727
∴ $(.0991)^{3}$ = .0009727
Note. In the second example above, 2 which is carried forward
from the mantissa to the characteristic is positive; it is set against the
product of 3 and $\bar{2}$ (that is, $\bar{6}$) and thus the characteristic of the
result is $\bar{4}$.
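Division and powers follow the same pattern as multiplication. The sketch below again uses `math.log10` rounded to four places in place of a log. table (our assumption), with hypothetical helper names; the answers are therefore close to, but not always identical with, those read from the 3-figure tables of this book.

```python
import math


def table_log(x, figures=4):
    """Log. of x rounded as a `figures`-figure table would give it."""
    return round(math.log10(x), figures)


def divide_by_logs(a, b, figures=4):
    """a / b = anti-log. (log. a - log. b)."""
    return 10 ** (table_log(a, figures) - table_log(b, figures))


def power_by_logs(a, n, figures=4):
    """a ** n = anti-log. (n x log. a)."""
    return 10 ** (n * table_log(a, figures))


print(divide_by_logs(0.0009, 0.008))   # compare with 0.0009 / 0.008
print(power_by_logs(95.2, 4))          # compare with 95.2 ** 4
print(power_by_logs(0.0991, 3))        # compare with 0.0991 ** 3
```

Because the rounding error of the log. is multiplied by the exponent, a power worked out by logs. is less exact than a product or quotient of the same figures.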
To extract the root of a number. To extract the root of a number
divide the log. of the number by the index of the root and find out the
anti-log.
Thus
$\sqrt[n]{a} = \text{Anti-log.}\left(\frac{\log a}{n}\right)$
Example I
Find out the value of $\sqrt[3]{92.4}$
log. 92.4 = 1.9657
Divided by 3: 1.9657 ÷ 3 = .6552
Anti-log. .6552 = 4.519
∴ $\sqrt[3]{92.4}$ = 4.519
Example II
Find out the value of $\sqrt[7]{.00487}$
log. .00487 = $\bar{3}.6875$
To divide $\bar{3}.6875$ by 7 we shall have to write it as $\bar{7} + 4.6875$ because
in $\bar{3}.6875$ the characteristic is negative and the mantissa is positive and
division is not possible with the figures as they are.
So
$(\bar{7} + 4.6875) \div 7 = \bar{1}.6696$
Anti-log. $\bar{1}.6696$ = .467 (approximately)
∴ $\sqrt[7]{.00487}$ = .467 (approximately)
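The device of rewriting a negative characteristic before dividing can be expressed in a few lines of Python. The function name `divide_bar_log` is our own; it relies on the fact that Python's `divmod` rounds the quotient downwards, so the amount borrowed is never negative.

```python
def divide_bar_log(characteristic, mantissa, n):
    """Divide a log. given as (characteristic, positive mantissa) by n.

    The characteristic is first lowered to a multiple of n and the
    amount borrowed is added to the mantissa, e.g. bar-3 and .6875
    become bar-7 and 4.6875 before dividing by 7.
    """
    quotient, borrowed = divmod(characteristic, n)
    return quotient, (mantissa + borrowed) / n


c, m = divide_bar_log(-3, 0.6875, 7)   # seventh root of .00487
print(c, round(m, 4))                  # new characteristic and mantissa
print(10 ** (c + m))                   # anti-log., i.e. the root itself

c2, m2 = divide_bar_log(1, 0.9657, 3)  # cube root of 92.4
print(c2, round(m2, 4))
```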
The utility of logarithms is very great in statistical calculations.
As has been said in the beginning, they help us in studying
proportionate changes. 10 to 100 is the same degree of relative change as 100
to 1000. In absolute figures these changes are different but if we find
out their logarithms, they would be 1 and 2 (for 10 and 100 respectively)
and 2 and 3 (for 100 and 1000 respectively) indicating that the relative
changes in the two cases are identical.
Questions
1. Define a statistical derivative and discuss its utility in statistical analysis.
2. What is meant by co-ordinate and subordinate derivatives ? Illustrate with
examples.
3. What precautions are necessary in the use of ratios and percentages ?
4. What do you understand by a crude birth-rate ? Is it an accurate measurement
of the population growth of a locality ? If not, how can it be modified to
give better results ?
5. What is a "standard population" ? How are birth rates and death rates
standardized ?
6. What do you understand by general fertility rate ? Is it an improvement
over standardized birth rates ?
7. What statistical data are necessary for the calculation of net reproduction
rate ? What is the deficiency in the existing Indian data in this respect ?
(M. A. Alld., 1951).
8. What is net reproduction rate ? Explain with the help of an example the
method of calculating it.
9. What are the various ways of the measurement of population growth ? In
this connection discuss in detail the calculation of net reproduction rate.
(M. Com. Allahabad, 1952).
10. Point out the ambiguity or mistake, if any, in the following statements :
(a) 99% of the people who drink die before reaching 100 years of age.
Therefore, drinking is bad for longevity.
(b) The rate of increase in the number of cows in India is greater than that of
the population. Therefore, the people of India are now getting more milk per head.
(M. A. Patna, 1943).

11. Below is given the fertility rate for 1000 women, by their age group, for a
certain country for 19;6 :

Age group     Fertility rate per     Age group     Fertility rate per
(years)       1000 women             (years)       1000 women
16-20              19                36-40              157
21-25             173                41-45               67
26-30         [illegible]            46-50                9
31-35             201

Assuming that the ratio of female babies to total births for the country and year
concerned is 48.8%, calculate the gross reproduction rate for the country and explain
what this rate means.
12. The following are the death-rates, per thousand, per annum, of two towns
in a certain year :

                        Town A                             Town B
Ages (years)   Population  Deaths  Death-rate     Population  Deaths  Death-rate
                                   per 1000                           per 1000
Under 2            3000      192     64.0             5000      300     60.0
2-10              10000       70      7.0            12000       78      6.5
10-20             10000       40      4.0            10000       38      3.8
20-60             32500      260      8.0            25000      190      7.6
60 & over          8500      510     60.0             8000      460     57.5
Total             64000     1072     16.75           60000     1066     17.77

(a) For each age group the death-rate of town A is greater than that of
town B but the reverse is the case when all age-groups are grouped together. Why
is it so ?
(b) Calculate the standardized death-rate for town B taking the population
of town A as the standard. (B. Com. Andhra, 1944), (M. A. Punjab, 1954).
13. Compute crude and standardized death-rates in the following and find out
if the local population has a higher or lower death-rate :

Standard population Local population


Age group
Years
Population Deaths Population Deaths
-
Under 5 500 15 0 5000 60
5-15 I 2.000 14 15000 30
15-6s 2.0000 60 %0000

I
80
Above 6) 8000 320 8000 400

14. What are absolute and relative measurements ? Explain in this connection
the use of ratios, percentages and co-efficients. (B. Com. -" 6"0. 1941).
15. Write short notes on : (a) Derivative series. (b) Complex derivatives.
(c) Total fertility rate. (d) Male reproduction rate. (e) Fallacies in the use of
ratios and percentages.

A B
......__-----I-----I----------
No. of ClDcUdateI Suc:ceuful No. 0( c:aocU- Sac:cellCuJ
appeared data appeared
M. Sc.
M. A.
60 ,0
90
zoo
Z40
160
190

-_.I. ~---1---~~~-- -~-- _.__


100
Sc:. .fOO Joo ZOO 140

160
_____ :'0:. __
ToU! 800 '90 800 '90
(II y,.,.. T. D. C•• R4.. 1961).
17. The following table gives the result of certain examinations of three
universities in the year 19'7. Which is the best university ?

M.A.
---------r=- Percentages resultlln the otliveftlty
A B C
----- ... ----------
7
, ----- 0
M.Sc:.

I
70
B.A. 80
B.Bc. 70
B.Com. 60

(M. A. c.Jmtlti)
Measures of Central
Tendency 9
Need and meaning
We have discussed in the last chapter the utility of various statistical
derivatives like ratios, percentages and rates, etc., in reducing the quantum
of data and also in reducing the size of the figures. But these derivatives
are not enough for the proper condensation of figures and sometimes
there are many fallacies in their use. Condensation of data is necessary
in statistical analysis because a large number of big figures are not only
confusing to the mind but difficult to analyse also. In order to reduce the
complexity of data and to make them comparable it is essential that the various
phenomena which are being compared are reduced to one figure each.
If, for example, a comparison is made between the marks obtained by a
group of 200 students belonging to a university and the marks obtained
by another group of 200 students belonging to another university, it
would be impossible to arrive at any conclusion if the two series relating
to these marks are directly compared. On the other hand, if each of these
series is represented by one figure, comparison would be an extremely
easy affair. It is obvious that a figure which is used to represent a whole
series should neither have the lowest value in the series nor the highest
value, but a value somewhere between these two limits, possibly in the
centre, where most of the items of the series cluster. Such figures are
called Measures of Central Tendency or Averages. An average represents
a whole series and as such, its value always lies between the minimum
and maximum values and generally it is located in the centre or middle
of the distribution.
Objects. Measures of central tendency or averages give a bird's eye
view of the huge mass of statistical data which ordinarily are not easily intelligible.
They are devices to aid the human mind in grasping the true significance
of large aggregates of facts and measurements. They set aside the
unnecessary details of the data and put forward a concise picture of the
complex phenomena under investigation. If the human mind was capable
of grasping all the details of large numbers and their interrelationships,
averages would have no utility. But the human mind is not capable of
this. It is impossible to keep in mind, say, the details of heights, weights,
incomes and expenditures of even 200 students, not to speak of bigger figures.
This difficulty of keeping all the details in mind necessitates the use of
averages not only for grasping the central theme of the data, but also for the
facility of comparison and further analysis. Averages are thus extremely
helpful for purposes of comparison.
Why is an average a representative. The reason why an average is
a valid representative of a series lies in the fact that ordinarily most of the
items of a series cluster in the middle. On the extreme ends the number
of items is very small. In a population of 10,000 adults there would
hardly be any person who is 2 ft. high or whose height is above 8 ft.
There will be a small range within which these values would vary,
say 5 ft. to 6' 5". Even within this range a large number of persons
would have a height between, say, 5' 5" and 5' 10". In other class intervals
of height the number of persons would be comparatively small. Under
such circumstances if we conclude that the height of this particular
group of persons would be represented by, say, 5' 7", we can reasonably
be sure that this figure would, for all practical purposes, give us a
satisfactory conclusion. This average would satisfactorily represent
the whole group of figures from which it has been calculated. Ordinarily,
items with values less than the average cancel the items whose values
are more than the average. Thus the average of 3, 4 and 5 is 4. The
item before it is one less in value and the item after it is one more in
value than the average figure of 4. Thus the two deviations of -1
and +1 cancel each other.
Typical and descriptive averages. It should, however, be noted
that a series can be represented by an average only if the average is
really typical. Sometimes the average which is calculated is not truly
representative of the series. In such cases it should not be used to
represent the series. Averages which are representative are called
Typical Averages and those which are not representative and have only
a theoretical value are called Descriptive Averages.
Characteristics of a representative average. In whatever way we define
an average it is necessary to keep in mind the fact that an average is
a particular value of a variable and as such it has to be expressed in the
same unit in which the series is. If the variable refers to the weights
of students in pounds the average would also be a weight and in pounds.
Similarly the average of ratios and percentages should be in ratios and
percentages only. Averages are meant for condensing a frequency
distribution in one figure and it is necessary that they are in the same
unit in which the original series is. At this stage, it is necessary to decide
about the desiderata or the requirements for a good measure of central
tendency. A typical average should possess the following
characteristics:
(a) It should be rigidly defined. If an average is left to the estimation
of an observer and if it is not a definite and fixed value it cannot be
representative of a series. The bias of the investigator in such cases
would considerably affect the value of the average. If the average is
rigidly defined this instability in its value would be no more, and it
would always be a definite figure.
(b) It should be based on all the observations of the series. If some
of the items of the series are not taken into account in its calculation
the average cannot be said to be a representative one. As we shall
see later on there are some averages which do not take into account
all the values of a group and to this extent they are not satisfactory
averages.
(c) It should be capable of further algebraic treatment. If an average
does not possess this quality, its use is bound to be very limited. It
will not be possible to calculate, say, the combined average of two or
more series from their individual averages; further it will not be possible
to study the average relationships of various parts of a variable, if it is
expressed as the sum of two or more variables. Many other similar
studies would not be possible if the average is not capable of further
algebraic treatment.
(d) It should be easy to calculate and simple to follow. If the
calculation of the average involves tedious mathematical processes it will
not be readily understood and its use will be confined only to a limited
number of persons. It can never be a popular average. As such,
one of the qualities of a good average is that it should not be too abstract
or mathematical and there should be no difficulty in its calculation.
Further, the properties of the average should be such that they can be
easily understood by persons of ordinary intelligence.
(e) It should not be affected by fluctuations of sampling. If two
independent sample studies are made in any particular field, the averages
thus obtained should not materially differ from each other. No doubt,
when two separate enquiries are made, there is bound to be a difference
in the average values calculated, but in some cases this difference would
be great while in others comparatively less. Those averages in which
this difference, which is technically called "fluctuation of sampling",
is less are considered better than those in which this difference is
more.
One more thing to be remembered about averages is that the items
whose average is being calculated should form a homogeneous group. It is absurd
to talk about the average of a man's height and his weight. If the data
from which an average is being calculated are not homogeneous,
misleading conclusions are likely to be drawn. To find out the average
production of cotton cloth per mill, if big and small mills are not
separated, the average would be unrepresentative. Similarly, to study the wage
level in the cotton mill industry of India, separate averages should be
calculated for the male and female workers. Again, adult workers should
be studied separately from the juvenile group. Thus we see that as far
as possible, the data from which an average is calculated should be a
homogeneous lot. Homogeneity can be achieved either by selecting
only like items or by dividing the heterogeneous data into a number
of homogeneous groups.
Measures of various orders
Statistical series may differ from each other in the following three
ways:
1. They may differ in the values of the variable round which
most of the items cluster.
2. They may differ in the extent to which items are dispersed
round the central value.
3. They may differ in the extent of departure from a normal
distribution.
Accordingly there are three measures designed to study the
above differences. They are respectively known as:
1. Measures of the first order or measures of central tendency or
averages.
2. Measures of the second order, or measures of dispersion.
3. Measures of the third order or skewness, kurtosis, etc.
We shall study all these measures. In the present chapter, a study
of the measures of the first order, or measures of central tendency, is
being made. Measures of the second and third order would be studied
in the next two chapters.
Types of Averages. Measures of central tendency or averages are
usually of the following types:
(a) Mathematical Averages
1. Arithmetic Average or Mean.
2. Geometric Mean.
3. Harmonic Mean.
(b) Averages of Position
1. Median
2. Mode
Besides these there are less important averages like the Quadratic Mean.
There are also some averages which are mostly calculated by using
the technique of the arithmetic average in a modified form. Examples
of such averages are the Moving average and the Progressive average. Both of them
are used in the analysis of commercial statistics and their utility in the
analysis of a time series is very great.
Of the above mentioned five important averages, Arithmetic
Average, Median and Mode are the most popular ones. Geometric
mean and Harmonic mean come next. We shall study them in this
very order.

ARITHMETIC AVERAGE

Arithmetic Average or Mean of a series is the figure obtained by
dividing the total values of the various items by their number. If the heights
of a group of eleven persons are 64", 69", 63", 60", 65", 68", 62", 67",
70", 66" and 61", then to find the arithmetic average of the height of
these persons we shall add these figures and divide the total so obtained
by the number of items, which is 11. The total of the items in this case
is 715" and if it is divided by 11 we get the figure of 65". This is the
mean or arithmetic average of the series.
Calculation of the arithmetic average in a series of individual
observations
Direct Method. As has been said above, the simple arithmetic
average of a series is equal to the sum of the values of the variable divided by their
number. This method can be expressed in the shape of a mathematical
formula.
Suppose the values of a variable are respectively $m_1, m_2, m_3, \dots, m_n$
and their arithmetic average is represented by $a$, then

$$a = \frac{1}{n}\,(m_1 + m_2 + m_3 + \dots + m_n)$$

or

$$a = \frac{1}{n}\sum m \quad \text{or} \quad a = \frac{\sum m}{n}$$

Where
$a$ = Arithmetic average; $m$ = Values of the variable; $\sum$ = Summation
or total; $n$ = Number of items.
The following example would illustrate this formula.
Example 1. Calculate the simple arithmetic average of the
following items:
Size of items
20    50    72
28    53    74
34    54    75
39    59    78
42    64    79
Solution. Direct Method
Computation of arithmetic average

Size of items
    (m)
    20
    28
    34
    39
    42
    50
    53
    54
    59
    64
    72
    74
    75
    78
    79
n = 15    $\sum m = 821$

Arithmetic average or $a = \frac{\sum m}{n}$, where $\sum m$ represents
the summation of measurements and $n$ the number of items.

$$a = \frac{821}{15} = 54.73$$

Arithmetic average of the series = 54.73
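
The direct method of Example 1 can also be checked mechanically. The following is only a minimal sketch in Python; the function name `arithmetic_mean` is an illustrative choice and not part of the text, and exact fractions are used so that rounding happens only when the result is printed.

```python
from fractions import Fraction

def arithmetic_mean(items):
    """Direct method: total of the items divided by their number."""
    return Fraction(sum(items), len(items))

# Items of Example 1
items = [20, 28, 34, 39, 42, 50, 53, 54, 59, 64, 72, 74, 75, 78, 79]

mean = arithmetic_mean(items)
# Total, number of items, and the mean rounded to two decimal places
print(sum(items), len(items), round(float(mean), 2))  # 821 15 54.73
```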


Short-cut Method. The above method of the calculation of the
arithmetic average can be used only when the items are few and the size
of the figures is small. If it is not so, there would be considerable
difficulty in the calculation of the arithmetic average.
To remove this difficulty a short-cut method is used. The method
is based on an important property of the arithmetic average, which is
that the algebraic sum of the deviations of a series of individual observations
from their mean is always equal to zero. Thus the arithmetic average of
4, 6, 8, 10 and 12 is equal to 8. If the difference of each of these items
from the mean is calculated it would be -4, -2, 0, +2, +4. Their total
is zero. This will always be so. This can be easily proved.*
This being so we can assume any arbitrary mean and find out the
deviations of items from this assumed mean. The total of the
deviations will not, in general, be zero. If this total is divided by the number of items
and added to the assumed average we shall get the actual arithmetic
average.

* Proof. Suppose $m_1, m_2, m_3$, etc., stand for the values of a
variable and $d_1, d_2, d_3$, etc., for their respective deviations from the
mean, and $a$ stands for their arithmetic average and $n$ for the number
of items.
Then,

$$a = \frac{m_1 + m_2 + m_3 + \dots + m_n}{n}$$

$$m_1 + m_2 + m_3 + \dots + m_n = an$$

The number of items is equal to $n$.
If we subtract $a$, $n$ times, from each side of the equation (once for each item) we
get

$$(m_1 - a) + (m_2 - a) + (m_3 - a) + \dots + (m_n - a) = 0$$

But
$(m_1 - a) = d_1$, $(m_2 - a) = d_2$, $(m_3 - a) = d_3$ and so on.

$$\therefore\ d_1 + d_2 + d_3 + \dots + d_n = 0$$

Or $\sum d = 0$

Symbolically:

$$a = x + \frac{\sum dx}{n}$$

Where
$a$ = Actual arithmetic average; $x$ = Assumed arithmetic average;
$\sum dx$ = The sum of the deviations from the assumed mean; $n$ = Number
of items.
It should be remembered that the difference between the actual
arithmetic average and the assumed arithmetic average is equal to the
sum of the deviations from the assumed arithmetic average divided by
the number of items.
Symbolically:

$$a - x = \frac{\sum dx}{n}$$

If we solve Example No. 1 by this short-cut method it will give us
exactly the same answer as we got by the direct method. This alternative
method is illustrated below:
Calculation of arithmetic average
Short-cut method

Size of items    Deviation from an assumed mean (50)
    (m)                       (dx)
    20                        -30
    28                        -22
    34                        -16
    39                        -11
    42                         -8
    50                          0
    53                         +3
    54                         +4
    59                         +9
    64                        +14
    72                        +22
    74                        +24
    75                        +25
    78                        +28
    79                        +29
  n = 15                 $\sum dx = +71$

Arithmetic average or $a = x + \frac{\sum dx}{n}$, where $x$ represents the assumed
average, $\sum dx$ represents the total of the deviations from the
assumed average and $n$ the number of items.

$$a = 50 + \frac{71}{15} = 50 + 4.73 = 54.73$$

Arithmetic average of the series = 54.73


In the above solution, when deviations are measured from 50 as the
arbitrary mean, there is in each case an error. This error is a constant
figure and is equal to the difference between the actual arithmetic
average and the assumed arithmetic average (thus the first deviation
of -30 should have been -34.73 measured from the actual arithmetic
average). In the sum of all such deviations from the assumed mean,
the total error would be n times this constant error, since the error
is repeated once for every item included. If the sum of these deviations
is divided by n the actual amount of error is determined, and we can
calculate the actual arithmetic average.
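
The reasoning above implies that the choice of assumed mean does not affect the answer. The following Python sketch illustrates this for the data of Example 1; the function name `mean_by_assumed_mean` and the extra assumed means 40 and 60 are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def mean_by_assumed_mean(items, assumed):
    """Short-cut method: assumed mean plus the average deviation from it."""
    deviations = [m - assumed for m in items]
    return assumed + Fraction(sum(deviations), len(items))

# Items of Example 1
items = [20, 28, 34, 39, 42, 50, 53, 54, 59, 64, 72, 74, 75, 78, 79]

# Any assumed mean leads to the same arithmetic average
for assumed in (50, 40, 60):
    total_deviation = sum(m - assumed for m in items)
    result = mean_by_assumed_mean(items, assumed)
    print(assumed, total_deviation, round(float(result), 2))

# Deviations measured from the actual mean always total zero
actual = mean_by_assumed_mean(items, 50)
print(sum(m - actual for m in items))  # 0
```

With the assumed mean of 50 the total deviation is +71, as in the table above; other assumed means give different totals but the same final average.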
Calculation of arithmetic average in a discrete series
Direct Method. In a discrete series the values of the variable are
multiplied by their respective frequencies and the products so obtained
are totalled. This total is divided by the number of items which, in a
discrete series, is equal to the total of the frequencies. The resulting
quotient is the simple arithmetic average of the series.
Algebraically:
If $f_1, f_2, f_3$, etc., stand respectively for the frequencies of the
values $m_1, m_2, m_3$, etc.,

$$a = \frac{m_1 f_1 + m_2 f_2 + m_3 f_3 + \dots}{f_1 + f_2 + f_3 + \dots}$$

Or

$$a = \frac{\sum mf}{n} = \frac{\sum mf}{\sum f}$$

The following illustration would clarify the formula:
Example 2. The following table gives the number of children
born per family in 735 families. Calculate the average number of
children born per family.

Number of children    Number of    Number of children    Number of
born per family       families     born per family       families
       0                  96              7                  20
       1                 108              8                  11
       2                 154              9                   6
       3                 126             10                   5
       4                  95             11                   5
       5                  62             12                   1
       6                  45             13                   1
Solution. Direct method
Computation of the average number of children born per family

Number of children    Number of
born per family       families       m x f
      (m)               (f)           (mf)
       0                 96             0
       1                108           108
       2                154           308
       3                126           378
       4                 95           380
       5                 62           310
       6                 45           270
       7                 20           140
       8                 11            88
       9                  6            54
      10                  5            50
      11                  5            55
      12                  1            12
      13                  1            13
    Total        $\sum f = 735$   $\sum mf = 2166$

Arithmetic average or $a = \frac{\sum mf}{n}$, where $\sum mf$ represents the
sum of the products of the size of items and corresponding frequencies.

$$a = \frac{2166}{735} = 2.9$$

= 3 children approximately.
The average number of children born per family is equal to 3
approximately.
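
As a cross-check on Example 2, the frequency-weighted calculation can be sketched in Python as follows. The function name `frequency_mean` is an assumption made for illustration only.

```python
from fractions import Fraction

def frequency_mean(values, frequencies):
    """Direct method for a discrete series: sum of m*f divided by sum of f."""
    total = sum(m * f for m, f in zip(values, frequencies))
    return Fraction(total, sum(frequencies))

# Example 2: number of children born per family (0 to 13) and number of families
children = list(range(14))
families = [96, 108, 154, 126, 95, 62, 45, 20, 11, 6, 5, 5, 1, 1]

mean_children = frequency_mean(children, families)
print(sum(families))                                    # 735
print(sum(m * f for m, f in zip(children, families)))   # 2166
print(round(float(mean_children), 1))                   # 2.9
```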
Short-cut method. A short-cut method can be used in the discrete
series also. In this method the deviations of the items from an assumed
mean are first found out and they are multiplied by their respective
frequencies. The total of these products is divided by the total
frequencies and added to the assumed mean. The resulting figure is the
actual arithmetic average. For further simplification of the calculations
the deviations from the assumed mean may be divided by a common
factor to reduce their size. If this is done the sum of the products
of the deviations and frequencies is multiplied by this common factor
and then it is divided by the total frequencies and added to the assumed
average.
Algebraically:

$$a = x + \frac{\sum f\,dx}{n}$$

Where
$\sum f\,dx$ = the total of the products of the deviations from the assumed
average and the respective frequencies of the items.
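If the deviations are further divided by a common factor and if
this factor is represented by $i$,

$$a = x + \left(\frac{\sum f\,dx}{n} \times i\right)$$

where $dx$ now stands for the deviation divided by the common factor.
The following illustrations would clarify these rules: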
Example 3. The following data relate to sizes of shoes sold at
a store during a given week. Find the average size by the short-cut
method.

Size of shoes    No. of pairs    Size of shoes    No. of pairs
     4.5              1               8               95
     5                2               8.5             82
     5.5              4               9               75
     6                5               9.5             44
     6.5             15              10               25
     7               30              10.5             15
     7.5             60              11                4

Solution. Short-cut Method No. 1.
Computation of the average size of shoes

Size of shoes    No. of pairs    Deviations from the      Total deviation
                                 assumed mean (8)
    (m)              (f)               (dx)                    (fdx)
     4.5              1                -3.5                     -3.5
     5                2                -3.0                     -6.0
     5.5              4                -2.5                    -10.0
     6                5                -2.0                    -10.0
     6.5             15                -1.5                    -22.5
     7               30                -1.0                    -30.0
     7.5             60                -0.5                    -30.0
     8               95                 0                        0
     8.5             82                +0.5                    +41.0
     9               75                +1.0                    +75.0
     9.5             44                +1.5                    +66.0
    10               25                +2.0                    +50.0
    10.5             15                +2.5                    +37.5
    11                4                +3.0                    +12.0
                 n = 457                               $\sum f\,dx = +169.5$

Applying the short-cut method,
Arithmetic average or $a = x + \frac{\sum f\,dx}{n}$
where $x$ stands for the assumed average; $\sum f\,dx$ for the sum of the
products of the deviations from the assumed average and the frequencies;
and $n$ for the number of items.
We get,

$$a = 8 + \frac{169.5}{457} = 8 + 0.37 = 8.37$$

Thus the average size of shoes is 8.37
Example 4. The following table gives the heights of 350 men.
Calculate the mean height of the group.

Height in inches    Number of persons
       59                   1
       61                   2
       63                   9
       65                  48
       67                 131
       69                 102
       71                  40
       73                  17

Solution. Short-cut method No. 2.
Computation of the mean height of the group.

Height    No. of     Deviations from the     Step-        Total
          persons    assumed mean (67)       deviation    deviations
 (m)       (f)             (dx)              (dx / i)       (fdx)
  59         1              -8                  -4            -4
  61         2              -6                  -3            -6
  63         9              -4                  -2           -18
  65        48              -2                  -1           -48
  67       131               0                   0             0
  69       102              +2                  +1          +102
  71        40              +4                  +2           +80
  73        17              +6                  +3           +51
        n = 350                                     $\sum f\,dx = +157$

Arithmetic average or $a = x + \left(\frac{\sum f\,dx}{n} \times i\right)$, where $x$ represents
the assumed average, $f\,dx$ the product of the frequency and
step-deviation, and $i$ represents the common factor of deviations,

$$a = 67 + \left(\frac{157}{350} \times 2\right) = 67 + 0.897 = 67.90$$

The mean height of the group = 67.90" (approximately)
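
The step-deviation calculation of Example 4 can be sketched in Python and compared with the direct method. The function name `step_deviation_mean` and its parameter names are illustrative assumptions; the sketch also assumes integer values so that exact fractions can be used.

```python
from fractions import Fraction

def step_deviation_mean(values, frequencies, assumed, factor):
    """Short-cut method with step deviations: a = x + (sum(f*d) / n) * i."""
    steps = [Fraction(m - assumed, factor) for m in values]   # deviations in units of i
    total = sum(f * d for f, d in zip(frequencies, steps))    # sum of f * step-deviation
    return assumed + total / sum(frequencies) * factor

# Example 4: heights in inches and number of persons
heights = [59, 61, 63, 65, 67, 69, 71, 73]
persons = [1, 2, 9, 48, 131, 102, 40, 17]

short_cut = step_deviation_mean(heights, persons, assumed=67, factor=2)
direct = Fraction(sum(m * f for m, f in zip(heights, persons)), sum(persons))

print(float(short_cut))      # about 67.897
print(short_cut == direct)   # True
```

Both routes give the same exact value, which is what the common-factor adjustment is meant to guarantee.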
Calculation of the arithmetic average in a continuous series
The process of the calculation of the arithmetic average in a
continuous series is the same as in the case of a discrete series. In a
continuous series the midpoints of the various class intervals are written
down to replace the class intervals. Once this is done, there is no
difference between a continuous series and a discrete series. All the
three methods of the calculation of the arithmetic average discussed above
in connection with the discrete series can be used here as well. The
following examples would illustrate the point:
Example 5. Calculate the arithmetic average of the following by
the direct method:

Weekly wages     Number of labourers
(in rupees)
  11-13                  3
  13-15                  4
  15-17                  5
  17-19                  6
  19-21                  5
  21-23                  4
  23-25                  3

Solution. Direct method.
Computation of the average weekly wages of labourers.

Wages in     No. of        Mid-values of      Wages multiplied by
rupees       labourers     the wage groups    the no. of labourers
               (f)              (m)                  (mf)
 11-13          3               12                    36
 13-15          4               14                    56
 15-17          5               16                    80
 17-19          6               18                   108
 19-21          5               20                   100
 21-23          4               22                    88
 23-25          3               24                    72
             n = 30                           $\sum mf = 540$

Substituting the above data in the formula
Arithmetic Mean or $a = \frac{\sum mf}{n}$
We get,

$$a = \frac{540}{30} = 18 \text{ rupees.}$$

Thus the average weekly wage paid to a labourer is Rs. 18
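
For a continuous series the only additional step is replacing each class interval by its midpoint. A minimal Python sketch, using the data of Example 5 and the hypothetical function name `grouped_mean`, might look like this:

```python
from fractions import Fraction

def grouped_mean(classes, frequencies):
    """Direct method for a continuous series, using class midpoints."""
    midpoints = [Fraction(low + high, 2) for low, high in classes]
    total = sum(m * f for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)

# Example 5: wage classes in rupees and number of labourers
wage_classes = [(11, 13), (13, 15), (15, 17), (17, 19), (19, 21), (21, 23), (23, 25)]
labourers = [3, 4, 5, 6, 5, 4, 3]

print(grouped_mean(wage_classes, labourers))  # 18
```

The result depends on the assumption, implicit in the method, that the items of each class are adequately represented by the class midpoint.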
Example 6. The following table gives the marks obtained by a
set of students in a certain examination. Calculate the average marks
per student.

Marks     Number of students    Marks     Number of students
10-20             1             60-70            12
20-30             2             70-80            16
30-40             3             80-90            10
40-50             5             90-100            4
50-60             7

Solution. Short-cut method.
Computation of average marks per student

Marks     Mid       No. of      Deviation from     Step deviations    Total
          values    students    assumed mean       (factor 10)        deviation
                                (55)
 (m)      (m.v.)      (f)                             (dx)              (fdx)
10-20       15          1           -40                -4                 -4
20-30       25          2           -30                -3                 -6
30-40       35          3           -20                -2                 -6
40-50       45          5           -10                -1                 -5
50-60       55          7             0                 0                  0
60-70       65         12           +10                +1                +12
70-80       75         16           +20                +2                +32
80-90       85         10           +30                +3                +30
90-100      95          4           +40                +4                +16
                     n = 60                                   $\sum f\,dx = +69$

Arithmetic average or

$$a = x + \left(\frac{\sum f\,dx}{n} \times i\right) = 55 + \left(\frac{69}{60} \times 10\right) = 66.5 \text{ marks.}$$
Charlier's accuracy check
The accuracy of the calculation of the arithmetic average can be
checked easily with the help of the following formula given by Charlier:

$$\sum fd = \sum \{f(d+1)\} - \sum f$$

If the two sides of the above equation are equal it is a proof that
the calculations are all right. In the above example if +1 is added to
the deviations they would respectively become -3, -2, -1, 0, 1, 2, 3,
4 and 5 and the values of f(d+1) would be -3, -4, -3, 0, 7, 24, 48,
40 and 20. The total of f(d+1) would be +129. Substituting these
values in the equation given above we get
69 = 129 - 60 = 69
Thus $\sum fd = \sum \{f(d+1)\} - \sum f$
and it is a proof that the calculations are all right.
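
Charlier's check on Example 6 can be sketched in Python as below. The function name `charlier_check` is illustrative. Note that when both columns are computed by machine the two sides agree by algebra; the check is chiefly useful for catching slips in hand-computed columns, which could be entered in place of the computed ones.

```python
def charlier_check(frequencies, step_deviations):
    """Return both sides of Charlier's identity: sum(fd) and sum(f(d+1)) - sum(f)."""
    left = sum(f * d for f, d in zip(frequencies, step_deviations))
    right = (sum(f * (d + 1) for f, d in zip(frequencies, step_deviations))
             - sum(frequencies))
    return left, right

# Example 6: frequencies and step deviations from the assumed mean 55
students = [1, 2, 3, 5, 7, 12, 16, 10, 4]
steps = [-4, -3, -2, -1, 0, 1, 2, 3, 4]

left, right = charlier_check(students, steps)
print(left, right)  # 69 69
```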
Steps in short-cut method
The short-cut method of calculating the arithmetic average should be
used in all cases as it saves time and gives accurate results. The process
of calculating the arithmetic average by the short-cut method can be
summarized as follows:
(i) Assume as average the midpoint of a class which is in the
middle of the series. Technically any class can be chosen, but if the
class chosen is in the middle of the distribution there is considerable
facility in calculations.
(ii) Calculate the deviations of the items (midpoints in case of
continuous series) from the assumed mean.
(iii) Divide the deviations by a common factor or the magnitude
of the class interval. These deviations are known as step deviations
or deviations in class-interval units.
(iv) Multiply the deviations with the respective frequencies of
the various classes and total the products, taking into account the
algebraic signs (plus or minus).
(v) Divide this total by the total of the frequencies and if step
deviations have been taken multiply the result by the common factor
or the magnitude of the class-interval.
(vi) Add this figure to the assumed average and the resulting
figure would be the actual arithmetic average of the series.
Algebraic properties of the arithmetic average
The arithmetic average has three important mathematical
properties. They are:
(a) The total of the deviations of the items from the mean (taking plus
and minus signs) is equal to zero. We have already seen in previous pages
how important this property of the arithmetic average is, and how the
short-cut calculation of the arithmetic average is based on this rule. The algebraic
proof of this has also been given.
(b) If a series of observations consists of two or more component series
the mean of the whole series can be easily expressed in terms of the means of the
component series. If, for example, a series relating to wages in a particular
industry is divided in two parts, one relating to males and the other
relating to females, and if we know the number of observations in each
group and their respective means, we can find the combined mean of
the two series as follows:

$$a_{12} = \frac{n_1 a_1 + n_2 a_2}{n_1 + n_2}$$

Where
$a_{12}$ is the combined mean of the two series, $a_1$ and $a_2$ the means
of the two series respectively and $n_1$ and $n_2$ the number of observations
in the two series.
Thus if the average wage of male workers is Rs. 30 and their
number is 200 and the average wage of female workers is Rs. 25 and
their number is 100, the combined mean of the two series would be

$$\frac{(200 \times 30) + (100 \times 25)}{200 + 100} = \frac{6000 + 2500}{300} = \text{Rs. } 28.33$$
(c) The mean of all the sums and differences of corresponding observations
in two series (with equal number of observations) is equal to the sum or difference
of the means of the two series.
The following illustration would clarify the point:

                      Section A    Section B    A+B    A-B
                          5            8         13     -3
                          6           10         16     -4
                          7           12         19     -5
                          8           14         22     -6
                          9           16         25     -7
                         10           18         28     -8
                         11           20         31     -9
Total                    56           98        154    -42
Arithmetic Average        8           14         22     -6

In the above example the mean of the corresponding sums of the
two series is equal to 22 and the mean of the differences is -6. These
figures can be directly obtained by adding the means of the two series
(8+14) and by subtracting the mean of the second series from the mean
of the first one (8-14).
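
Properties (b) and (c) above can be verified numerically. The sketch below is in Python; the function names `mean` and `combined_mean` are illustrative choices, and the data are those of the wage example and of Sections A and B.

```python
from fractions import Fraction

def mean(items):
    """Arithmetic average of a list of numbers, kept as an exact fraction."""
    return Fraction(sum(items), len(items))

def combined_mean(sizes, means):
    """Property (b): combined mean = sum(n_k * a_k) / sum(n_k)."""
    return sum(n * a for n, a in zip(sizes, means)) / Fraction(sum(sizes))

# Property (b): 200 male workers at Rs. 30 and 100 female workers at Rs. 25
wage = combined_mean([200, 100], [30, 25])
print(round(float(wage), 2))  # 28.33

# Property (c): Sections A and B from the table above
section_a = [5, 6, 7, 8, 9, 10, 11]
section_b = [8, 10, 12, 14, 16, 18, 20]
sums = [x + y for x, y in zip(section_a, section_b)]
differences = [x - y for x, y in zip(section_a, section_b)]

print(mean(sums) == mean(section_a) + mean(section_b))         # True
print(mean(differences) == mean(section_a) - mean(section_b))  # True
```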
Merits of arithmetic average
The arithmetic average is the most popularly used measure of
central tendency. There are many reasons for its popularity. In the
beginning of this chapter we had laid down certain characteristics which
an ideal average should possess. We shall now see how far the
arithmetic average fulfils these conditions:
(i) The first condition that an average should be rigidly defined
is fulfilled by the arithmetic average. It is rigidly defined and a biased
investigator shall get the same arithmetic average from the series as an
unbiased one. Its value is always definite.
(ii) The second characteristic that an average should be based
on all the observations of a series is also found in this average. The
arithmetic average cannot be calculated if even a single item of a series is
left out.
(iii) The arithmetic average is also capable of further algebraic
treatment. While discussing the algebraic properties of the arithmetic
average, we have already seen in detail how various mathematical
processes can be applied to it for purposes of further analysis and
interpretation of data. It is on account of this characteristic of the
arithmetic average that:
(a) It is possible to find the aggregate of items of a series if
only its arithmetic average and the number of items are
known.
(b) It is possible to find the arithmetic average if only the
aggregate of items and their number are known.
(iv) The fourth characteristic laid down for an ideal average, that
it should be easy to calculate and simple to follow, is also found in the
arithmetic average. The calculation of the arithmetic average is simple
and it is very easily understandable. It does not require the arraying
of data which is necessary in case of some other averages. In fact this
average is so well known that to a common man "average" means an
arithmetic average.
Thus the arithmetic average
(a) is simple to calculate,
(b) does not need arraying of data,
(c) is easy to understand.
(v) The last characteristic of an ideal average, that it should be
least affected by fluctuations of sampling, is also present in the arithmetic
average to a certain extent. If the number of items in a series is large,
the arithmetic average provides a good basis of comparison, as in such
cases the abnormalities in one direction are set off against the
abnormalities in another direction.
Drawbacks of arithmetic average
Although the arithmetic average satisfies most of the conditions
of an ideal average, there are certain drawbacks also from which it suffers
and as such it should be used with caution. These drawbacks really
arise on account of the peculiar nature of this average and the technique
of its calculation. The points worth consideration in this respect are
as follows:
(i) Since the arithmetic average is calculated from all the items of a
series, sometimes the abnormal items may considerably affect this average,
particularly when the number of items is not large. For example,
if the income of a shopkeeper is Rs. 1,000 per month and the incomes of
his three assistants are Rs. 25, Rs. 35 and Rs. 40 per month respectively,
the average income of this group would be Rs. (1000+25+35+40)/4
or Rs. 275 per month. This is not at all a representative figure.
Similarly, if one player in cricket scores 300 runs and the remaining 10 players
score only 140 runs, the total is 440 runs and the average per player is
40 runs. It is not a representative figure as 10 players out of 11 have
scored on an average only 14 runs each.
(ii) Further, the fact that the arithmetic average cannot be
calculated without all the items of a series can also be said to be a drawback.
If out of 1000 items the values of only 999 items are known the arithmetic
average cannot be calculated. Other averages like median and mode do
not need complete data.
(iii) The arithmetic average is no doubt easy to calculate but in a
relative sense its calculation may be more difficult than that of mode or
median as they can be located merely by inspection.
(iv) Another point to be noted in this connection is that the
arithmetic average can be a figure which does not exist in the series
at all. The arithmetic average of 12, 14 and 19 is 15. No item of the
series has a value of 15.
(v) The arithmetic average sometimes gives results which appear
almost absurd. If we have to find out the number of children per
family, and if we use the arithmetic average, it is quite likely that we get
the average as 3.4 children. Obviously the result is absurd. A child
cannot be divided in fractions.
(vi) Sometimes the arithmetic average gives fallacious conclusions.
Suppose the incomes (in rupees) of two groups of persons are as follows:

              A        B
            1000      325
             100      300
              75      285
              25      290
Total       1200     1200

The average income of each of these two groups is Rs. 300. It would
appear from the averages that both the groups are economically
at the same level, and that the two series are almost similar to each other,
but this is not the case. The two series entirely differ from each
other so far as their composition is concerned.
(vii) The arithmetic average gives greater importance to bigger items
of a series and lesser importance to smaller items. It has
an upward bias. One big item among four
items, three of which are small, will push up the average considerably.
But the reverse is not true. If in a series of four items there are three
big items and one small item the average will not be pulled down very
much.
The above discussion thus leads us to the conclusion that though
the arithmetic average fulfils most of the conditions of an ideal average yet
it should be used with caution as it is likely to give erroneous conclusions
under certain conditions.
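
The effect of an abnormal item described in drawback (i) can be seen by computing the average with and without that item. The following Python sketch uses the shopkeeper and cricket figures from the text; the helper name `mean` is an illustrative choice.

```python
def mean(items):
    """Arithmetic average: total of the items divided by their number."""
    return sum(items) / len(items)

# Monthly incomes in rupees: the shopkeeper followed by his three assistants
incomes = [1000, 25, 35, 40]
print(mean(incomes))                 # 275.0
print(round(mean(incomes[1:]), 2))   # 33.33, the assistants alone

# Cricket: one player scores 300 runs, the other ten score 140 runs in all
print((300 + 140) / 11)  # 40.0
print(140 / 10)          # 14.0
```

The group figure of Rs. 275 is more than eight times the average of the three assistants, which is the sense in which it fails to be representative.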
MEDIAN
Median is the value of the middle item of a series when it is arrayed in
ascending or descending order of magnitude.
It divides the series in two equal parts.
The values of items in one part are less than the value of the median and
in the other part are more than it. If in a class there are 21 students and
if they stand in a line in accordance with their height, beginning with the
shortest amongst them and ending with the tallest, then the 11th student
would be in the centre and would divide them in two parts consisting of
ten students each. Students of one part will have heights less than the
height of the 11th student and of the other part more than this height.
The height of the 11th student is the median height. For ungrouped
data it may be convenient to find the value of the median by counting
$\frac{N+1}{2}$ items, beginning with the highest (or lowest) item in the
array. In grouped data this counting is abandoned.
Symbolically $M$ = size of the $\frac{n+1}{2}$th item
where $M$ stands for the median and $n$ for the number of items.
Location of median in a series of individual observations


Example 7. Find out the median of the following items:
5, 7, 9, 12, 10, 8, 7, 15, 21
Solution. Items arranged in ascending order of magnitude

Serial number    Size of items
      1                5
      2                7
      3                7
      4                8
      5                9
      6               10
      7               12
      8               15
      9               21

If $M$ represents the median, and $n$ the number of items,
$M$ = size of the $\frac{n+1}{2}$th item = size of the $\frac{9+1}{2}$th item
= size of the 5th item = 9
In the above example the number of items was odd and there was
no difficulty in finding out the middle item and its value. If the number
of items is even, say 10, the middle item of the ten items would be
the 5.5th item. In such a case the values of the 5th and 6th items would be
added and the total would be divided by 2; the resulting figure would
be the value of the median. The following example would clarify this
point:
Example 8. The following table gives the marks obtained by a
batch of 30 B. Com. students in a class-test in statistics. (Marks 100).

Roll No.    Marks obtained    Roll No.    Marks obtained
    1             33             16             24
    2             32             17             33
    3             55             18             42
    4             47             19             38
    5             21             20             45
    6             50             21             26
    7             27             22             33
    8             12             23             44
    9             68             24             48
   10             49             25             52
   11             40             26             30
   12             17             27             58
   13             44             28             37
   14             48             29             38
   15             62             30             35
Find the value of the median.
Solution. Marks obtained by 30 students arranged in ascending
order of magnitude:
Serial No.   Marks   Serial No.   Marks   Serial No.   Marks
     1         12        11         33        21         47
     2         17        12         35        22         48
     3         21        13         37        23         48
     4         24        14         38        24         49
     5         26        15         38        25         50
     6         27        16         40        26         52
     7         30        17         42        27         55
     8         32        18         44        28         58
     9         33        19         44        29         62
    10         33        20         45        30         68

If M represents the median, and n the number of items,
\[ M = \text{size of the } \tfrac{n+1}{2}\text{th item} = \text{size of the } \tfrac{30+1}{2}\text{th item} = \text{size of the 15.5th item} \]
\[ = \frac{\text{size of the 15th item} + \text{size of the 16th item}}{2} = \frac{38+40}{2} = 39 \text{ marks.} \]
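The counting rule used in Examples 7 and 8 can be sketched in Python as follows. This is only an illustration under the rule stated above (the median is the size of the (n+1)/2-th item, with the two neighbouring items averaged when that position ends in .5); the function name `median_individual` is an assumption made here and is not part of the text.

```python
def median_individual(values):
    """Median of individual observations: size of the (n + 1)/2-th item."""
    items = sorted(values)          # arrange in ascending order of magnitude
    n = len(items)
    position = (n + 1) / 2          # ends in .5 when n is even
    lower = int(position)           # serial number at or just below the position
    if position == lower:           # n odd: a single middle item exists
        return items[lower - 1]
    # n even: average the two items on either side of the position
    return (items[lower - 1] + items[lower]) / 2


# Example 7: nine items, so the 5th item of the array is the median
print(median_individual([5, 7, 9, 12, 10, 8, 7, 15, 21]))
```

For the 30 marks of Example 8 the same function averages the 15th and 16th items of the array.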
Location of median in discrete series
In a discrete series also the items are first arranged according to the
ascending or descending order of magnitude and their respective
frequencies are written against them. After this, the frequencies are
cumulated and then the value of the middle item can be easily located. The
following example illustrates the procedure:
Example 9. Find the median size of the shoes from figures given
below:
Size of shoes   Frequency   Size of shoes   Frequency
     4.5            1            8.5            82
     5              2            9              75
     5.5            4            9.5            44
     6              5           10              25
     6.5           15           10.5            15
     7             30           11               4
     7.5           60
     8             95

Solution: Calculation of the median size of the shoes:

Size of shoes   Frequency   Cumulative Frequency
     4.5            1               1
     5              2               3
     5.5            4               7
     6              5              12
     6.5           15              27
     7             30              57
     7.5           60             117
     8             95             212
     8.5           82             294
     9             75             369
     9.5           44             413
    10             25             438
    10.5           15             453
    11              4             457

Median = size of the $\frac{n+1}{2}$th pair, where n equals the total frequency
       = size of the $\frac{457+1}{2}$ or 229th pair = 8.5
It will be clear from the above figures that the value of items from
213th to 294th is 8.5. The value of the 229th item, thus, is also 8.5.
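The cumulation step of Example 9 can be sketched in Python. The sketch assumes only what the solution above does: frequencies are cumulated in ascending order of size and the median is the size against which the (n+1)/2-th item falls. The function name `median_discrete` is hypothetical.

```python
def median_discrete(sizes, freqs):
    """Median of a discrete series located through cumulative frequencies."""
    n = sum(freqs)                       # total frequency
    position = (n + 1) / 2               # serial number of the middle item
    cumulative = 0
    for size, f in sorted(zip(sizes, freqs)):
        cumulative += f                  # cumulate the frequencies
        if position <= cumulative:       # first size whose cumulative total covers it
            return size
    raise ValueError("the series has no items")


sizes = [4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
freqs = [1, 2, 4, 5, 15, 30, 60, 95, 82, 75, 44, 25, 15, 4]
print(median_discrete(sizes, freqs))     # the 229th pair
```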
Determination of median in a continuous series
When the median of a continuous frequency distribution has to
be determined there is one difficulty. The value of the median lies in
a class interval, and to get a definite figure, interpolation has to be done.
Suppose, for example, it is found that the value of the median lies in the
20 to 30 class interval whose frequency is 40. Now to find out the value
of the median we have to take recourse to interpolation and to apply a
particular formula. This formula, which we discuss below, is based on
the assumption that the frequencies of the class in which the median lies
are uniformly spread over the whole class-interval. In the above case
we shall presume that these 40 units are equally distributed in the whole
class interval of 20 to 30, or each of these ten values 20, 21, 22 and so on
has a frequency of 4 units.
The formula of interpolation to find out the median is:
\[ M = l_1 + \frac{l_2 - l_1}{f_1}\,(m - c) \]
where
M = median; $l_1$ = the lower limit of the class in which median lies;
$l_2$ = the upper limit of the class in which median lies; $f_1$ = the frequency
of the class in which median lies; m = middle item; c = cumulative
frequency of the group preceding the median group.
The following examples illustrate the above formula:
Example 10. Find the median of the following distribution.
Class-intervals   Frequencies   Class-intervals   Frequencies
     Rs.                             Rs.
     1-3               6            11-13             16
     3-5              53            13-15              4
     5-7              85            15-17              4
     7-9              56
     9-11             21            Total            245
Solution. Calculation of median
Class-intervals   Frequency   Cumulative frequency
     1-3               6               6
     3-5              53              59
     5-7              85             144
     7-9              56             200
     9-11             21             221
    11-13             16             237
    13-15              4             241
    15-17              4             245

Median = the value of the $\frac{n}{2}$, i.e., 122.5th item, which lies in the
5-7 group.
Applying the formula of interpolation,
\[ M = l_1 + \frac{l_2 - l_1}{f_1}\,(m - c) \]
we have
\[ M = 5 + \frac{7-5}{85}\,(122.5 - 59) = 6.5 \]
In the above example median is the value of the 122.5th item which lies
in the 5-7 group. In this group the number of items is 85. On the
presumption that these items are uniformly distributed in this class-interval,
we can calculate median by direct arithmetical process also. The 59th item
has the value of 5 and the next 85 items, up to the 144th, are spread over
two values from 5 to 7. From the 59th to the 122.5th there are 63.5 items.
The value of the 63.5th item after the 59th (or the value of the 122.5th
item) would exceed 5 (the value of the 59th item) by
$\frac{2}{85} \times 63.5$ or by 1.5. Thus the value of the 122.5th
item would be 5 + 1.5 or 6.5.
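A minimal Python sketch of the interpolation formula may help to check such calculations. It assumes, as the text does, that the median item is the n/2-th item and that frequencies are uniformly spread within the median class; the function name `median_continuous` and the representation of classes as (lower, upper) pairs are choices made for this illustration only.

```python
def median_continuous(classes, freqs):
    """M = l1 + (l2 - l1) / f1 * (m - c) for a continuous series."""
    m = sum(freqs) / 2                   # middle item, n/2
    c = 0                                # cumulative frequency of preceding classes
    for (l1, l2), f1 in zip(classes, freqs):
        if m <= c + f1:                  # the middle item lies in this class
            return l1 + (l2 - l1) / f1 * (m - c)
        c += f1
    raise ValueError("the series has no items")


# Data of Example 10
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13), (13, 15), (15, 17)]
freqs = [6, 53, 85, 56, 21, 16, 4, 4]
print(round(median_continuous(classes, freqs), 1))
```

The term (l2 - l1) / f1 * (m - c) is the same quantity as the excess of 2/85 x 63.5 worked out by direct arithmetic above.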
Graphic calculation of median
The median of a series can be calculated graphically also. For this
the series is cumulated and a cumulative frequency curve (called Ogive)
is drawn. A perpendicular is then drawn on the base line (called Abscissa)
at the middle item cutting the curve at a particular point. The value of
the median is read on the vertical line (called Ordinate) at the point of
intersection. This procedure would be illustrated in the chapter on
Graphs.
Merits of median
(i) It satisfies the first condition laid down in previous pages for
an ideal average as it is rigidly defined.
(ii) It can be easily calculated and it is understood without any
difficulty.
(iii) It is not affected by the values of the extreme items and as
such is sometimes more representative than arithmetic average. If the
incomes of five persons are Rs. 30, Rs. 35, Rs. 40, Rs. 45 and Rs. 1,000
the median would be Rs. 40 whereas the arithmetic average would be
Rs. 230. Median in such cases is a better average.
(iv) Even if the value of the extremes is not known median can
be calculated if the number of items is known.
(v) It can be located merely by inspection in many cases.
(vi) It gives best results in a study of those phenomena which are
incapable of direct quantitative measurement, for example intelligence.
It is impossible to measure intelligence quantitatively but it is possible to
arrange a group of persons in ascending or descending order of intelligence
and thus to locate a person whose intelligence can be said to be average.
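The income figures quoted under merit (iii) can be checked with Python's standard `statistics` module. This is only a verification sketch of the numbers given in the text; the variable names are illustrative.

```python
import statistics

# Incomes of five persons (Rs.), as quoted under merit (iii)
incomes = [30, 35, 40, 45, 1000]

mean_income = statistics.mean(incomes)       # pulled upward by the extreme item
median_income = statistics.median(incomes)   # unaffected by the extreme item

print(mean_income, median_income)
```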
Drawbacks of median
(i) Median may not be representative of a series in many cases.
This is specially so when there are wide variations between the values
of different items. For example, if the marks obtained by eleven students
are respectively 15, 16, 16, 18, 18, 20, 54, 60, 60, 60, and 72 the median
marks would be 20. Clearly the average is not representative of the series.
(ii) It is not suitable for further algebraic treatment. For exam-
ple, we cannot find out the total values of the items if we know their
number, and median.
(iii) When median has to be calculated in continuous series it
requires interpolation. The assumption of the interpolation, that all
the frequencies of the class-interval are uniformly spread over their
values in the class-interval, may not be actually true. In most cases it will
not be true.
(iv) If big or small items in a series are to receive greater
importance median would be an unsuitable average. Median ignores the
values of extreme items.
(v) Median is more likely to be affected by the fluctuations of samp-
ling than the arithmetic average.
(vi) The arrangement of items in ascending or descending order
is sometimes very tedious.
Comparison of mean and median
Both the mean and the median satisfy the conditions of rigid
definition and stability but so far as ease in calculation is concerned
median has a distinct advantage over mean. On the other hand, the
general fluctuations of sampling affect the median to a greater extent
than the mean, though there might be some cases where mean is affected
to a greater extent by such fluctuations than the median.
So far as the ease of algebraic treatment of these two averages is
concerned, mean is definitely superior to median. In case of mean,
when several series relating to one phenomenon are combined into one,
it is possible to find out the combined average from the averages of
various series and their number of observations. It is not possible in
case of median. However, if the component series are symmetrical¹
their means and medians would be identical and as such combined mean
and median would also be the same. But in case of asymmetrical
distribution the combined median would not coincide with the mean nor with
any other assignable value. The median of the sums or differences of the
corresponding items of two series is not equal to the sum or difference of
their medians, as is the case with arithmetic average. The calculated value
of the median subject to error is not necessarily the same as the true value
of the median, even if the error is zero, that is, if positive and negative
errors cancel each other.
On the other hand, median has certain advantages over the mean.
It is easily calculated and is readily obtained without even knowing
the value of all the items, provided they can be arrayed. Further in
some cases mean cannot be calculated due to the extreme class intervals
being infinite, like "less than 100" or "more than 10,000" etc.; but median
can be easily obtained in such distributions. Sometimes median may be
more representative than the arithmetic average, due to the fact that it is
not affected by the values of extreme items. If, for example, the values
of most of the items of a sample cluster round 200, median would not be
affected if suddenly one item, whose value is 3000, is included in the
sample. Mean in such cases is more affected by fluctuations of sampling
than the median. Further, median is generally the value of a particular
item of the series, whereas mean may not be the value of any item of the
series. In this sense median is a more natural average than the mean.
QUARTILES, DECILES AND PERCENTILES
It has been seen that the median divides an arrayed series in two
equal parts. The values of items in one part are more than the median
value, and the values of items in the other part less than the value of the
median. With a view to have a better study about the composition of
a series it may be necessary to divide it in four, five, six, seven, eight,
nine, ten or hundred parts. Usually the series are divided either in
four, ten or hundred parts. Just as one item divides the series in two
parts, three items would divide it in four parts, nine items in ten parts and
ninety-nine items in hundred parts. The values of these items are
respectively known as Quartiles, Deciles and Percentiles. A series can be
divided in five, seven or eight parts by Quintiles, Septiles and Octiles.
¹ For further explanation see chapters on Dispersion and Skewness.

There are thus three quartiles, nine deciles and ninety-nine
percentiles in a series. The second quartile, fifth decile and 50th percentile
is the median. The value of the item which divides the first half of a series
(with values less than the median) in two equal parts is called the First
Quartile or Lower Quartile and the value of the item which divides the latter
half of a series (with values more than the median) in two equal parts is
called the Third Quartile or Upper Quartile. The Second Quartile or the
Middle Quartile is the same thing as median.
The calculation of Quartiles, Deciles, Percentiles and other such
values is done by following the same rules with which the value of median
is determined.
Thus
$Q_1$ = the value of the $\frac{n}{4}$th item

$Q_3$ = the value of the $\frac{3n}{4}$th item

$D_1$ = the value of the $\frac{n}{10}$th item

$D_2$ = the value of the $\frac{2n}{10}$th item

$D_9$ = the value of the $\frac{9n}{10}$th item

$P_1$ = the value of the $\frac{n}{100}$th item

$P_2$ = the value of the $\frac{2n}{100}$th item

$P_{99}$ = the value of the $\frac{99n}{100}$th item

where $Q_1$ and $Q_3$ stand for the first and third quartiles, $D_1$, $D_2$ and
$D_9$ for the first, second and ninth deciles and $P_1$, $P_2$ and $P_{99}$ for
the 1st, 2nd and 99th percentiles respectively, and n stands for the number
of items in the series.
Location of quartiles, deciles and percentiles, etc., in a series of
individual observations
Example 11. From the data given in Example No. 8 calculate
the value of the quartiles, 6th decile and 70th percentile.

Solution. 1st Quartile or $Q_1$ = size of the $\frac{n}{4}$th item
    = size of the $\frac{30}{4}$ or 7.5th item
    = size of 7th item + $\frac{1}{2}$ (size of 8th item - size of 7th item)
    = 30 + $\frac{1}{2}$ (32 - 30)
    = 31 marks.
3rd Quartile or $Q_3$ = size of the $\frac{3n}{4}$th item
    = size of the $\frac{3(30)}{4}$ or 22.5th item
    = size of 22nd item + $\frac{1}{2}$ (size of 23rd item - size of 22nd item)
    = 48 + $\frac{1}{2}$ (48 - 48)
    = 48 marks.
6th Decile or $D_6$ = size of the $\frac{6n}{10}$th item
    = size of the $\frac{6(30)}{10}$ or 18th item
    = 44 marks.
70th Percentile or $P_{70}$ = size of the $\frac{70n}{100}$th item
    = size of the $\frac{70(30)}{100}$ or 21st item
    = 47 marks.
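The steps of Example 11 can be sketched in Python. The sketch follows the convention used in the solution above (position k·n/parts, with the fractional part of the position applied to the gap between two neighbouring items); the function name `partition_value` is hypothetical, and the treatment of a position falling before the first item is an assumption added here because the text does not cover that case.

```python
def partition_value(values, k, parts):
    """Value of the k-th partition point (parts=4 quartiles, 10 deciles, ...)."""
    items = sorted(values)
    n = len(items)
    position = k * n / parts             # e.g. 30/4 = 7.5 for the first quartile
    whole = int(position)                # serial number just below the position
    fraction = position - whole
    if whole < 1:                        # assumption: before the first item,
        return items[0]                  # take the first item's value
    if fraction == 0:
        return items[whole - 1]
    # interpolate between the two neighbouring items
    return items[whole - 1] + fraction * (items[whole] - items[whole - 1])


marks = [33, 32, 55, 47, 21, 50, 27, 12, 68, 49, 40, 17, 44, 48, 62,
         24, 33, 42, 38, 45, 26, 33, 44, 48, 52, 30, 58, 37, 38, 35]
print(partition_value(marks, 1, 4), partition_value(marks, 3, 4))
print(partition_value(marks, 6, 10), partition_value(marks, 70, 100))
```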

Location of quartiles, deciles, percentiles, etc., in a discrete series

Example 12. From the data given in Example 9 calculate the
lower and upper quartiles, 7th decile and 46th percentile, 3rd quintile
and 5th octile.
Solution. Lower Quartile = size of the $\frac{n}{4}$th pair
    = size of the $\frac{457}{4}$ or 114.25th pair
    = 7.5
Upper Quartile = size of the $\frac{3n}{4}$th pair
    = size of the $\frac{3(457)}{4}$ or 342.75th pair
    = 9
7th Decile = size of the $\frac{7n}{10}$th pair
    = size of the $\frac{7(457)}{10}$ or 319.9th pair
    = 9
46th Percentile = size of the $\frac{46n}{100}$th pair
    = size of the $\frac{46(457)}{100}$ or 210.22th pair
    = 8
3rd Quintile = size of the $\frac{3n}{5}$th pair
    = size of the $\frac{3(457)}{5}$ or 274.2th pair
    = 8.5
5th Octile = size of the $\frac{5n}{8}$th pair
    = size of the $\frac{5(457)}{8}$ or 285.6th pair
    = 8.5
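The six results of Example 12 can be reproduced with a short Python sketch. It assumes the convention of the solution above: the position is k·n/parts and the answer is the first size whose cumulative frequency reaches that position. The function name `partition_value_discrete` is hypothetical.

```python
def partition_value_discrete(sizes, freqs, k, parts):
    """Size against which the (k * n / parts)-th item falls in a discrete series."""
    position = k * sum(freqs) / parts
    cumulative = 0
    for size, f in sorted(zip(sizes, freqs)):
        cumulative += f                  # cumulate the frequencies
        if position <= cumulative:
            return size
    raise ValueError("k must not exceed parts")


sizes = [4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
freqs = [1, 2, 4, 5, 15, 30, 60, 95, 82, 75, 44, 25, 15, 4]

for label, k, parts in [("lower quartile", 1, 4), ("upper quartile", 3, 4),
                        ("7th decile", 7, 10), ("46th percentile", 46, 100),
                        ("3rd quintile", 3, 5), ("5th octile", 5, 8)]:
    print(label, partition_value_discrete(sizes, freqs, k, parts))
```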

Determination of quartiles, deciles, percentiles, etc., in a continuous
series

In a continuous series, as in the case of median, the values of
quartiles, deciles and percentiles, etc., lie in various class-intervals and the
actual values have to be interpolated by the use of algebraic formulae.
The formulae for the calculation of quartiles, deciles, percentiles, etc., are
almost the same as used in the calculation of median. The assumption
of interpolation is also the same, and it is that the frequencies of a
class-interval are uniformly spread over its values.
Thus
\[ Q_1 = l_1 + \frac{l_2 - l_1}{f_1}\,(q_1 - c) \]
where $l_1$ and $l_2$ are the lower and upper limits of the class in which
the first quartile lies, $f_1$ the frequency of this class, $q_1$ the quartile
number $\frac{n}{4}$ and c the cumulative frequency of the class preceding
the quartile class.
\[ Q_3 = l_1 + \frac{l_2 - l_1}{f_1}\,(q_3 - c) \]
where $l_1$ and $l_2$ stand for the lower and upper limits of the class
in which the 3rd quartile lies, $f_1$ for the frequency of this class, $q_3$ the
quartile number $\frac{3n}{4}$ and c the cumulative frequency of the class
preceding the quartile class.
Similarly the formulae can be derived for the calculation of deciles,
percentiles, etc.
Thus
\[ D_2 = l_1 + \frac{l_2 - l_1}{f_1}\,(d_2 - c) \quad \text{and} \quad P_{72} = l_1 + \frac{l_2 - l_1}{f_1}\,(p_{72} - c) \]
Example 13. From the data given below calculate the median and
quartiles.
Solution. Calculation of the median and quartile ages of married females.
  Age      Number of married females   Cumulative frequency
  0-5                  3                        3
  5-10                31                       34
 10-15               410                      444
 15-20              1809                     2253
 20-25              2446                     4699
 25-30              2223                     6922
 30-35              1723                     8645
 35-40              1292                     9937
 40-45               963                    10900
 45-50               762                    11662
 50-55               531                    12193
 55-60               317                    12510
 60-65               156                    12666
 65-70                59                    12725
 70-75                37                    12762

 Total            12,762
The median age of married females
= the age of the $\frac{n}{2}$th female, where n equals the total frequency
= the age of the $\frac{12762}{2}$th, i.e., 6381st married female,
who lies in the 25-30 age group. Applying the formula of interpolation
\[ M = l_1 + \frac{l_2 - l_1}{f_1}\,(m - c) \]
where M represents the median; $l_1$ and $l_2$ the lower and the upper limits
of the group in which median is situated; $f_1$ the frequency of median class;
m the number of the middle item or $\frac{n}{2}$ items and c the cumulative
frequency of the group lower than the one in which median is situated,
\[ M = 25 + \frac{30-25}{2223}\,(6381 - 4699) = 28.8 \text{ years approx.} \]
The lower quartile age of married females
= the age of the $\frac{n}{4}$th, i.e., 3190.5th married female, who lies
in the 20-25 age group.
By interpolation
\[ Q_1 = l_1 + \frac{l_2 - l_1}{f_1}\,(q_1 - c) \]
where $Q_1$ represents the lower quartile; $l_1$ and $l_2$ the lower and the upper
limits of the group in which lower quartile is situated; $f_1$ the frequency of
lower quartile class; $q_1$ the number of $\frac{n}{4}$ items; and c the
cumulative frequency of the group lower than the one in which the lower
quartile is situated,
\[ Q_1 = 20 + \frac{25-20}{2446}\,(3190.5 - 2253) = 21.9 \text{ years approx.} \]
The upper quartile age of married females
= the age of the $\frac{3n}{4}$th, i.e., 9571.5th married female, who
is situated in the 35-40 age group.
By interpolation,
\[ Q_3 = l_1 + \frac{l_2 - l_1}{f_1}\,(q_3 - c) \]
where $Q_3$ stands for the upper quartile; $l_1$ and $l_2$ for the lower and the
upper limits of the group in which upper quartile is situated; $f_1$ for the
frequency of upper quartile class; $q_3$ for the number of $\frac{3n}{4}$ items;
and c for the cumulative frequency of the group lower than the one in which
$Q_3$ is situated,
\[ Q_3 = 35 + \frac{40-35}{1292}\,(9571.5 - 8645) = 38.6 \text{ years approx.} \]
Example 14. From the data given in Example 10 calculate
(a) 8th decile and (b) 56th percentile.
Solution: (a) 8th decile
$D_8$ = size of the $\frac{8n}{10}$th item, where n equals 245
= size of the 196th item, which lies in the 7-9 group; applying the
formula of interpolation,
\[ D_8 = l_1 + \frac{l_2 - l_1}{f_1}\,(d_8 - c) \]
where $l_1$ and $l_2$ represent the lower and the upper limits of the group
in which 8th decile is situated, $f_1$ the frequency of the same group, $d_8$ the
value of the $\frac{8n}{10}$th item and c the cumulative frequency of the group
lower than the one in which 8th decile is situated.
We get
\[ D_8 = 7 + \frac{9-7}{56}\,(196 - 144) = 8.86 \]
(b) 56th percentile
$P_{56}$ = size of the $\frac{56n}{100}$th item
= size of the $\frac{56(245)}{100}$th item
= size of the 137.2th item, which lies in the 5-7 group.
Applying the formula of interpolation,
\[ P_{56} = l_1 + \frac{l_2 - l_1}{f_1}\,(p_{56} - c) \]
where $l_1$, $l_2$ and $f_1$ represent the lower and the upper limits and the
frequency of the group in which the 56th percentile is situated, $p_{56}$ the value
of the $\frac{56n}{100}$th item and c the cumulative frequency of the group lower
than the one in which $P_{56}$ is situated.
We get
\[ P_{56} = 5 + \frac{7-5}{85}\,(137.2 - 59) = 6.84 \]
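Since the same interpolation formula serves every quartile, decile and percentile, Examples 13 and 14 can be checked with one Python sketch. It assumes the position k·n/parts and uniform spread within the class, as the text states; the function name `partition_value_continuous` and the (lower, upper) representation of classes are choices made for this illustration.

```python
def partition_value_continuous(classes, freqs, k, parts):
    """l1 + (l2 - l1) / f1 * (position - c), with position = k * n / parts."""
    position = k * sum(freqs) / parts
    c = 0                                # cumulative frequency of preceding classes
    for (l1, l2), f1 in zip(classes, freqs):
        if position <= c + f1:           # the required item lies in this class
            return l1 + (l2 - l1) / f1 * (position - c)
        c += f1
    raise ValueError("k must not exceed parts")


# Data of Example 10, used again in Example 14
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13), (13, 15), (15, 17)]
freqs = [6, 53, 85, 56, 21, 16, 4, 4]
print(round(partition_value_continuous(classes, freqs, 8, 10), 2))    # 8th decile
print(round(partition_value_continuous(classes, freqs, 56, 100), 2))  # 56th percentile
```

Calling the same function with (1, 4), (2, 4) and (3, 4) on the age distribution of Example 13 gives the lower quartile, median and upper quartile ages.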

Graphic calculation of quartiles, deciles and percentiles, etc.

Like median, quartiles, deciles and percentiles can also be calculated
graphically with the help of cumulative frequency curves called Ogives.
The rule for drawing such curves and the procedure for reading the values
of quartiles, deciles and percentiles, etc., would be discussed in detail
in the chapter on Graphs.

Characteristics of quartiles, deciles and percentiles, etc.

It should be remembered that quartiles, deciles and percentiles, etc.
are not averages in the same sense in which mean and median are. An
average is representative of a whole series while quartiles, deciles and
percentiles are averages of parts of series. First quartile is the average of
the first half of a series arranged in ascending order. Similarly, the
third quartile is the average of the second half of the series. First decile
in the same way is the average of the first tenth part of a series and first
percentile of the first hundredth part.
These are, however, very helpful in understanding the formation
in a series. They tell us how various items are spread round the median.
Their special utility lies in a study of the dispersion of items from the
median. We shall discuss this point in greater details in the next chapter
and then the usefulness of this study would become more clear.

MODE

Mode is the most common item of a series. It represents the most typical
or frequent value of a series, a value which is in fact the fashion (la mode).
When one speaks of the "average student," "the most common wage,"
"the common man" or "the typical farm" and the like, he is unconsciously
referring to mode. If it is said that the most common wage in a particular
industry is Rs. 50 per month, what it means is that the largest number of
persons get this single figure of Rs. 50 as wage. Other figures of wage
are not as popular as this one, and the number of persons getting them is
less than the number getting Rs. 50 per month.
Methods of calculation. It appears from this definition that it must
be very easy to calculate the mode of a series. In fact it is not always
so. As we shall see later on, the most satisfactory method of calculating
mode is that of "curve fitting" which is an extremely difficult process.
In ordinary practice, however, mode is estimated by easier methods which
are comparatively very much less accurate than the method of curve
fitting. These methods are no doubt very simple and easy.

Mode cannot be determined from a series of individual observations
unless it is converted into either a discrete or continuous series. In a
discrete series the value of the variable against which the frequency is the
largest would be the modal value. Similarly in a continuous frequency
distribution the class-interval having the maximum frequency would
be the modal class. The exact location of mode in a class-interval is
done by interpolation, as in case of median, on the basis of certain
assumptions which we shall examine a little later. Location of the modal
value in a discrete series or of a modal class in a continuous series is
possible only if the concentration of items is at one particular point.
If, however, there are two or more values round which figures concentrate,
it becomes difficult to determine the value of mode. Such series are
called bi-modal, tri-modal and multi-modal depending on whether the
items concentrate at 2, 3 or more values.
Grouping method. In discrete and continuous series if the items
concentrate at more than one value, attempts are made to find out the
point of maximum concentration with the help of grouping method. In
this method values are first arranged in ascending order and the frequencies
against each value are written down. These frequencies are then added
in two's and the totals are written in lines between the values added.
Frequencies can be added in two's in two ways:
(a) By adding frequencies of item numbers 1 and 2; 3 and 4;
5 and 6 and so on.
(b) By adding frequencies of item numbers 2 and 3; 4 and 5; 6
and 7 and so on.
After this the frequencies are added in three's. This can be done in
three ways:
(a) By adding frequencies of item numbers 1, 2 and 3; 4, 5 and
6; 7, 8 and 9 and so on.
(b) By adding frequencies of item numbers 2, 3 and 4; 5, 6 and
7; 8, 9 and 10 and so on.
(c) By adding the frequencies of item numbers 3, 4 and 5; 6, 7
and 8; 9, 10 and 11 and so on.
If necessary frequencies can be added in four's and five's also. After
this the sizes of items containing the maximum frequencies are noted
down and the item which has the maximum frequency the largest
number of times is called the mode. If grouping has been done in case of
continuous series we shall be in a position to determine the modal class
by this process.
We shall now see how mode is determined by the grouping
method in a discrete series.

Location of mode in a discrete series

Example 15
Find out the mode of the following series:

Size   Frequency   Size   Frequency
  5       48        13       52
  6       52        14       41
  7       56        15       57
  8       60        16       63
  9       63        17       52
 10       57        18       48
 11       55        19       40
 12       50

Solution

Location of mode by grouping

Size of               Frequency (f)
item     (1)    (2)    (3)    (4)    (5)    (6)
  5      48     100           156
  6      52            108           168
  7      56     116                         179*
  8      60            123*   180*
  9      63*    120*                 175*
 10      57            112                  162
 11      55     105           157
 12      50            102           143
 13      52      93                         150
 14      41             98    161
 15      57     120*                 172
 16      63*           115                  163
 17      52     100           140
 18      48             88
 19      40

Each total is entered against the first item of the group it covers.
The frequencies in column (1) are first added in two's in columns (2)
and (3). Then they are added in three's in columns (4), (5) and (6). The
maximum frequency in each column is marked with an asterisk. It
will be observed that mode changes with the change in grouping. Thus
according to column (1) mode should be 9 or 16; according to column
(2) it should be either 9 or 10 or 15 or 16. To find out the point of
maximum concentration the data can be arranged in the shape of a table as
follows:

Analysis Table
Columns            Size of item containing maximum frequency
(1)                         9                    16
(2)                         9    10         15   16
(3)                    8    9
(4)                    8    9    10
(5)                         9    10   11
(6)               7    8    9
No. of times
the size occurs   1    3    6    3    1     1    2
Since the size 9 occurs the largest number of times it is the modal
size or mode is 9.
If we look at the frequencies in the original table, we shall find
that the frequency of 63, which is the maximum single frequency, is
against two values, 9 and 16. The series thus appears to be bi-modal
but the process of grouping leads us to the conclusion that the
concentration of items round 9 is more than the concentration round 16.
Even if the frequency against 16 was 64 instead of 63 probably
grouping would have disclosed that the concentration of items round about
9 is more, even though the individual frequency against 9 is only 63. It
is thus never safe to rely only on the inspection of a series and to locate
the mode at the point of maximum frequency. Mode is affected by the
frequencies of the neighbouring items also, and, therefore, grouping is
essential, as it reveals the true point of maximum concentration.
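The grouping and analysis tables of Example 15 can be reproduced with a short Python sketch. It assumes the procedure described above with groupings in one's, two's and three's only (not four's or five's): for each grouping the group or groups with the largest total are found, and every size inside them is tallied once. The function name `grouping_mode` is hypothetical.

```python
def grouping_mode(sizes, freqs):
    """Return (modal sizes, tally of how often each size falls in a maximum group)."""
    n = len(freqs)
    tally = [0] * n
    for width in (1, 2, 3):              # single items, two's, three's
        for start in range(width):       # columns (1); (2),(3); (4),(5),(6)
            groups = [(sum(freqs[i:i + width]), range(i, i + width))
                      for i in range(start, n - width + 1, width)]
            best = max(total for total, _ in groups)
            for total, members in groups:
                if total == best:        # every group tying for the maximum counts
                    for i in members:
                        tally[i] += 1
    top = max(tally)
    modes = [s for s, t in zip(sizes, tally) if t == top]
    return modes, dict(zip(sizes, tally))


sizes = list(range(5, 20))
freqs = [48, 52, 56, 60, 63, 57, 55, 50, 52, 41, 57, 63, 52, 48, 40]
modes, tally = grouping_mode(sizes, freqs)
print(modes)
print({s: t for s, t in tally.items() if t})   # the last row of the analysis table
```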
Determination of mode in a continuous series
In a continuous series the determination of mode involves two
steps. First, by the process of grouping, the class in which there is
maximum concentration has to be located. After this the value of
mode is interpolated by the use of a formula. It should be remembered
that mode does not always give satisfactory results in a continuous
series. If the size of the class-interval is changed the modal class also
changes in many cases. Suppose, for example, the magnitude of
class-intervals is 10 and mode lies in, say, 30-40 group. If this series is
regrouped in class-intervals having magnitude of only 5, it is quite likely
that the mode may lie in, say, 45-50 group. It would depend on the
distribution of items in various class intervals. For determining mode
in a continuous series, the class-intervals should not be very big in size,
but if the size of the class-intervals is very small the frequencies also
become very small, the distribution becomes irregular and the
determination of mode becomes very difficult. The series may even become
multi-modal.
It has already been said that the mode is affected by the
frequencies of the neighbouring classes. The formulae for the interpolation of
mode are based on this very assumption. If the frequency of the
preceding class is greater than the frequency of the succeeding class,
mode would be nearer the lower limit of the class-interval and if the
frequency of the succeeding class is more than the frequency of the
preceding class mode would be nearer the upper limit. To study this,
the proportions of frequencies in the preceding and succeeding classes
to the total frequencies in these two classes are found out.
If $f_0$ stands for the frequency of the preceding class and $f_2$ for
the frequency of the succeeding class these proportions would be
(a) $\dfrac{f_0}{f_0 + f_2}$
(b) $\dfrac{f_2}{f_0 + f_2}$
These proportions are multiplied by the magnitude of the
class-interval and mode is calculated in any of the following two ways:
either by adding $\frac{f_2}{f_0 + f_2} \times (l_2 - l_1)$ to the lower limit of the
modal class or by deducting $\frac{f_0}{f_0 + f_2} \times (l_2 - l_1)$ from the upper
limit of the modal class. Thus if Z stands for the mode,
\[ Z = l_1 + \frac{f_2}{f_0 + f_2} \times (l_2 - l_1) \]
\[ Z = l_2 - \frac{f_0}{f_0 + f_2} \times (l_2 - l_1) \]
Mode is also calculated by taking into account (i) the proportion
of difference between the frequency of the modal class and the frequency
of the preceding class, and (ii) the proportion of difference between the
modal frequency and the frequency of the succeeding class.
Thus if $f_1$ stands for the frequency of the modal class, and if we
take into account the lower limit of the modal class, the proportion of
the difference $(f_1 - f_0)$ is added to it and if we take into account the
upper limit, the proportion of the difference $(f_1 - f_2)$ is deducted
from it.
Thus
\[ Z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (l_2 - l_1) \]
\[ Z = l_2 - \frac{f_1 - f_2}{2f_1 - f_0 - f_2} \times (l_2 - l_1) \]

The two sets of formulae given above would give different values
of mode as they are based on different assumptions. In the first case
we take into account only the frequencies of the preceding and
succeeding classes whereas in the second case (i) the difference of the modal
frequency and the preceding frequency, and (ii) the difference of the
modal frequency and the succeeding frequency, are taken into account.
The second set of formulae is supposed to be better than the
first set and usually mode is interpolated by starting with the lower
limit. As such we shall be making use of the following formula in
the determination of mode in a continuous series.
\[ Z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}\,(l_2 - l_1) \]

Example 16. The following table gives the length of life of 150
electric lamps:
Life (hours)      Frequency of lamps
   0 to 400               4
 400 to 800              12
 800 to 1200             40
1200 to 1600             41
1600 to 2000             27
2000 to 2400             13
2400 to 2800              9
2800 to 3200              4
Calculate the mode.
Solution. Determination of mode by grouping
Life (hours)            Frequency of lamps
                (1)    (2)    (3)    (4)    (5)    (6)
   0-400          4     16            56
 400-800         12            52            93*
 800-1200        40     81*                         108*
1200-1600        41*           68*    81*
1600-2000        27     40                   49
2000-2400        13            22                    26
2400-2800         9     13
2800-3200         4

Each total is entered against the first class of the group it covers;
the maximum in each column is marked with an asterisk.

Columns     Size of group containing maximum frequency
(1)                               1200-1600
(2)                    800-1200   1200-1600
(3)                               1200-1600   1600-2000
(4)                               1200-1600   1600-2000   2000-2400
(5)         400-800    800-1200   1200-1600
(6)                    800-1200   1200-1600   1600-2000
No. of times
the size occurs  1         3          6           3           1
Therefore the 1200-1600 group is the modal group. Mode lies in
this group and by applying the formula of interpolation, viz.,
\[ Z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}\,(l_2 - l_1) \]
where Z stands for mode, $l_1$ and $l_2$ stand for the lower and upper
limits of the modal group, $f_1$ stands for frequency of the modal group,
$f_0$ stands for frequency of the group preceding the modal group, $f_2$ stands
for frequency in the group succeeding the modal group,
we get
\[ Z = 1200 + \frac{41 - 40}{82 - 40 - 27} \times 400 = 1226.67 \text{ hours} \]
Thus
The modal life of the lamp = 1226.67 hours.
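The two sets of interpolation formulae can be compared on the figures of Example 16 with a brief Python sketch. The function names `mode_by_differences` and `mode_by_neighbours` are labels invented for this illustration; the formulae themselves are those stated in the text, each started from the lower limit.

```python
def mode_by_neighbours(l1, l2, f0, f2):
    """First set: Z = l1 + f2 / (f0 + f2) * (l2 - l1)."""
    return l1 + f2 / (f0 + f2) * (l2 - l1)


def mode_by_differences(l1, l2, f0, f1, f2):
    """Second set: Z = l1 + (f1 - f0) / (2*f1 - f0 - f2) * (l2 - l1)."""
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * (l2 - l1)


# Modal class 1200-1600 with f0 = 40, f1 = 41, f2 = 27 (Example 16)
print(round(mode_by_differences(1200, 1600, 40, 41, 27), 2))
print(round(mode_by_neighbours(1200, 1600, 40, 27), 2))
```

The two results differ, which illustrates the remark that the two sets of formulae rest on different assumptions.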
Determination of mode by curve fitting
As has been said earlier, the above methods of the calculation
of mode are unsatisfactory. In most of the distributions, as they arise
in actual practice, these methods would not give satisfactory results.
The ideal method of calculating the mode is that of curve fitting. Since
there are many irregularities in the data which we normally come across,
it is necessary to remove them before determination of mode. These
irregularities are removed by the technique of curve fitting. Attempts
are made to fit an ideal curve which gives the closest possible fit to the
actual distribution. The value of the variable corresponding to the
maximum of this ideal curve is the value of the mode. The technique
of curve fitting is highly mathematical and should be left to the more
advanced students of this subject.
Determination of mode from mean and median
In a symmetrical distribution the mean, median and mode are
identical. We shall discuss in the next chapter the concept of a
symmetrical distribution which gives a normal curve. In actual practice,
however, symmetrical distributions are very rare, and data usually give
an asymmetrical curve. In distributions which moderately differ from
a symmetrical distribution, there is an empirical relationship between
mean, median and mode. This relationship holds good for most of
the moderately asymmetrical distributions. It is as follows:
Mode = Mean - 3 (Mean - Median)
It means that mean is usually on one end and mode on the other.
Median lies at a point one-third of the distance between mean and mode
from the mean towards the mode. The median is thus closer to mean
than mode. From this relationship we can estimate the value of mode
of a moderately asymmetrical distribution if we know the values of
mean and median. This relationship can also be expressed as:
(Median - Mode) = $\frac{2}{3}$ (Mean - Mode)
In most of the cases if the series is moderately asymmetrical, value
of the mode as estimated from the mean and median would not differ
significantly from the value calculated by other methods.
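The empirical relationship can be tried out with a small Python sketch. The mean and median used below are hypothetical figures chosen only to show the arithmetic, and the function name `mode_from_mean_median` is an assumption of this illustration; the estimate is only as good as the assumption of moderate asymmetry.

```python
def mode_from_mean_median(mean, median):
    """Empirical estimate: Mode = Mean - 3 (Mean - Median)."""
    return mean - 3 * (mean - median)


mean, median = 50.0, 48.0                # hypothetical values, not from the text
mode = mode_from_mean_median(mean, median)
print(mode)

# The second form of the relationship: Median - Mode = 2/3 (Mean - Mode)
print(median - mode, 2 / 3 * (mean - mode))
```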
Determination of mode by graphic method
Mode can also be located graphically. In discrete and continuous
series the point of maximum frequency, which would usually be the
apex of the curve, is observed, to find out the modal value. The value
of the variable against the apex of the curve would be the value of the
mode. However, when determining the value of mode graphically
it is better if the curve is smoothed for irregularities. We shall discuss
more about it in the chapter on Graphs.
Merits of mode
Of the many conditions laid down for an ideal average, mode
possesses only a few. They are as follows:
(i) It possesses the merit of simplicity. It can be determined
without much mathematical calculation. In a discrete series mode
can be located even by inspection. In this respect, like median, it has
an advantage over arithmetic average.
(ii) It is commonly understood. As has been said earlier, mode
is an average which people use in their day-to-day expressions. The
average size of the ready-made garment, the typical size of holdings,
the average number of road accidents are all examples of the common
use of mode.
(iii) Since mode is the most common item of a series it is not an
isolated example like the median. Unlike arithmetic average it cannot
be a value which is not found in the series.
(iv) Mode is not affected by the values of extreme items provided
they adhere to the natural law relating to extremes.
(v) For the determination of mode it is not necessary to know
the values of all the items of a series. If the point of norm or maximum
concentration is known it is enough. The values of extreme
items need not even be known, as usually there is very little
concentration round the extreme values.
Drawbacks of mode
Mode is an unsatisfactory average and has many drawbacks.
Some of them are as follows:
(i) Mode is ill-defined, indeterminate and indefinite. The very
first condition laid down for an ideal average, that it should be rigidly
defined, is not fulfilled by it.
(ii) Mode is not based on all the observations of a series and as
such the second condition is also not fulfilled by it.
(iii) Mode is not capable of further mathematical treatment.
(iv) Mode may be unrepresentative in many cases. If in a series
of 1000 items 20 have a particular value and other values have frequencies
less than 20, it does not necessarily mean that the value whose frequency
is 20 is the typical or average value. In such cases data should be
converted into class intervals of a bigger magnitude.
(v) In many cases it may be impossible to set a definite value of
mode. There may be 2, 3 or more modal values.
Comparison of mode with mean and median
From the above discussion about the merits and drawbacks of
mean, median and mode it is obvious that mode does not stand in
comparison either to mean or median. Mode no doubt possesses the
merit of being the most popular item of a series and has also the
advantage of easy calculation and common understandability, yet its
drawbacks are too many to be set off against these merits. Mean is
simple in calculation, its value is definite and can be easily determined.
It is amenable to algebraic treatment and is usually not affected much
by fluctuations of sampling. Median is more easily calculated than
even mean, and in certain cases it is as stable as mean, but if variations
in the values of items are not uniform, median is indeterminate, and is
almost incapable of algebraic treatment. Mode is hardly suitable for
most of the elementary studies as it is correctly determined only by
curve-fitting which is an extremely difficult process. It is
unrepresentative in many cases, and is not based on all the observations of a series.
Thus, of these three averages, mean has definite advantages over median
and mode, though there may be some cases where median or mode
may have preference over mean. Mode has its own importance and
it may be the reason for giving its value along with mean, but it should
be clearly understood that mode cannot replace mean and for that
matter neither can median do so. However, it should not be taken
to mean that median and mode are superficial averages and have no
independent virtues. There are certain fields in which median or
mode may give better results than the mean, but such cases are few
and the universality of mean cannot be challenged on account of these
cases. We shall discuss more about this point in a later section after
we have examined the other averages also.
GEOMETRIC MEAN
Geometric mean is the nth root of the product of n items of a series.
Thus if the geometric mean of 3, 6 and 12 is to be calculated it would
be equal to the cube root of the product of these figures. Similarly
the geometric mean of 8, 9, 12 and 16 would be the 4th root of the
product of these four figures.
Symbolically
$$g = \sqrt[n]{m_1 \times m_2 \times m_3 \times \dots \times m_n}$$
where g stands for the geometric mean, n for the number of items and
m for the values of the variable.
The calculation of the geometric mean by this process is possible
only if the number of items is very few. If the number of items is
large and their size is big, this method is more or less out of question.
In such cases calculations have to be done with the help of logs. In
terms of logs,
$$\log g = \frac{\log m_1 + \log m_2 + \log m_3 + \dots + \log m_n}{n}$$
or
$$g = \text{Anti-log}\left\{\frac{\log m_1 + \log m_2 + \log m_3 + \dots + \log m_n}{n}\right\}$$
or
$$g = \text{Anti-log}\left\{\frac{\sum \log m}{n}\right\}$$


Thus geometric mean is the anti-log of the arithmetic average of the
logs of the values of a variable. It is also possible to assume a log
mean and to find out the deviations from it and then calculate the
geometric mean. It should be noted that the value of the geometric
mean is always less than the value of the arithmetic average unless
all the items have equal value, in which case the geometric mean
and arithmetic average have identical values.
The following examples would illustrate the calculation of
geometric mean:
Example 17. Calculate the simple geometric mean from the
following items:
133, 141, 125, 173, 182.
Solution. Calculation of the geometric mean
Size of item    Logarithms
133             2.1239
141             2.1492
125             2.0969
173             2.2380
182             2.2601
n = 5           Σ logs = 10.8681
According to the formula, viz.,
$$\text{Geometric Mean} = \sqrt[n]{m_1 \times m_2 \times \dots \times m_n} = \text{Anti-log}\left(\frac{\log m_1 + \log m_2 + \dots + \log m_n}{n}\right)$$
$$= \text{Anti-log}\,\frac{10.8681}{5} = \text{Anti-log}\ 2.1736 = 149 \text{ (to the nearest whole number)}$$
Thus the geometric mean is 149.
Alternate Method
Size of item    Logs      Deviations from assumed
(m)                       log mean (2.000)
                          (dx)
133             2.1239    .1239
141             2.1492    .1492
125             2.0969    .0969
173             2.2380    .2380
182             2.2601    .2601
n = 5                     Σdx = .8681
$$\text{Geometric Mean} = \text{Anti-log}\left[\text{assumed log} + \frac{\sum \text{Deviations}}{n}\right]$$
$$= \text{Anti-log}\left[2 + \frac{.8681}{5}\right] = \text{Anti-log}\ 2.1736 = 149 \text{ (to the nearest whole number)}$$
Thus the geometric mean is 149.
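Both methods of Example 17 can be checked with a few lines of Python. This is a minimal sketch, not the text's own procedure: the function names are hypothetical, and `math.log10` takes the place of the four-figure log tables, so intermediate figures may differ slightly from the printed ones.

```python
import math


def geometric_mean(items):
    """Anti-log of the arithmetic average of the logs (base 10)."""
    logs = [math.log10(m) for m in items]
    return 10 ** (sum(logs) / len(logs))


def geometric_mean_assumed(items, assumed_log=2.0):
    """Same result via deviations from an assumed log mean."""
    deviations = [math.log10(m) - assumed_log for m in items]
    return 10 ** (assumed_log + sum(deviations) / len(deviations))


items = [133, 141, 125, 173, 182]
g = geometric_mean(items)
print(round(g))  # 149

# The assumed-log-mean method gives the same value.
assert abs(g - geometric_mean_assumed(items)) < 1e-9
```

The assumed log mean only shifts the arithmetic; any convenient value gives the same answer.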
Example 18.
Calculate the geometric mean of the following two series:
(a)       (b)
2574      .8974
475       .0570
75        .0081
5         .5677
.8        .0002
.08       .0984
.005      .0854
.0009     .5672
Solution. Calculation of geometric mean
          Series A                      Series B
     Size of items   Logarithms    Size of items   Logarithms
          (m)                           (m)
(a)      2574         3.4106          .8974         1̄.9530
(b)       475         2.6767          .0570         2̄.7559
(c)        75         1.8751          .0081         3̄.9085
(d)         5         0.6990          .5677         1̄.7541
(e)        .8         1̄.9031          .0002         4̄.3010
(f)       .08         2̄.9031          .0984         2̄.9932
(g)      .005         3̄.6990          .0854         2̄.9317
(h)     .0009         4̄.9542          .5672         1̄.7538
              Σ log = 2.1208               Σ log = 1̄0̄.3512
(A bar over the characteristic indicates that the characteristic is negative.)
$$\text{Geometric Mean} = \text{Anti-log}\left[\frac{\sum \log m}{n}\right]$$
Series A. According to the formula, we have
$$g = \text{Anti-log}\,\frac{2.1208}{8} = \text{Anti-log}\ .2651 = 1.841$$
Series B.
$$g = \text{Anti-log}\,\frac{-16 + 6.3512}{8} = \text{Anti-log}\ \bar{2}.7938 = .06220$$
Algebraic properties of geometric mean
Geometric mean possesses certain mathematical properties and they
are as follows:
(i) Just as in case of arithmetic average the sum of the items
remains unchanged if each item is replaced by the arithmetic average,
similarly in case of geometric mean the product of the items remains
unchanged if each item is replaced by the geometric mean. Thus the
total of 2, 4 and 8 is 14 and the arithmetic average is 14/3. If in place
of these figures we substitute the arithmetic average the total would
still remain 14. Similarly in case of geometric mean the product of
these three figures 2, 4 and 8 is 64 and the geometric mean is 4. If in
place of these numbers the geometric mean is written the product would
still remain 64.
(ii) On account of the above property of the geometric mean, it
is possible to calculate the combined geometric mean of two or more
series if only their geometric means and the number of items are known.
The formula for finding out the combined geometric mean is:
$$g_{1.2} = \text{anti-log}\left[\frac{n_1 \log g_1 + n_2 \log g_2}{n_1 + n_2}\right]$$
where $g_{1.2}$ stands for the combined geometric mean, $n_1$ and $n_2$ for
the number of items in the two series respectively and $g_1$ and $g_2$ for
the geometric means of these two series.
Suppose there are two series A and B with the following values, and
we have to find out their combined geometric mean:
A      B
133    125
141    173
       182
The log of the geometric mean of series A is 2.13655 and of series B it
is 2.19833. If these logs are multiplied by the respective number of items
of the two series, namely 2 and 3, their values would become 4.2731 and
6.5950 respectively. The combined geometric mean would be:
$$\text{anti-log}\left[\frac{4.2731 + 6.5950}{2 + 3}\right] = \text{anti-log}\left[\frac{10.8681}{5}\right] = \text{anti-log}\ 2.1736 = 149$$
If we calculate the geometric mean of the five items together we shall
get this very figure. It can be verified from the answer to Example
17, in which the geometric mean of these five items has been calculated.
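The combined geometric mean of the two series can be verified as follows. This is an illustrative sketch under the assumption that logs are taken to base 10 with `math.log10` rather than from tables; the names `gm` and `combined_gm` are hypothetical.

```python
import math


def gm(items):
    """Geometric mean via the average of base-10 logs."""
    return 10 ** (sum(math.log10(m) for m in items) / len(items))


def combined_gm(g1, n1, g2, n2):
    """anti-log[(n1 log g1 + n2 log g2) / (n1 + n2)]"""
    return 10 ** ((n1 * math.log10(g1) + n2 * math.log10(g2)) / (n1 + n2))


a = [133, 141]
b = [125, 173, 182]
combined = combined_gm(gm(a), len(a), gm(b), len(b))
print(round(combined))  # 149

# Same figure as the geometric mean of all five items taken together.
assert abs(combined - gm(a + b)) < 1e-9
```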
(iii) Just as in the case of arithmetic average the sum of the deviations
from the mean on either side is always equal, similarly in case of
geometric mean the product of the corresponding ratios on either side
is always equal. If the ratios of the geometric mean to the figures
which are equal to or less than it are multiplied together, this product
would be equal to the product of the ratios of the figures more than the
geometric mean to the geometric mean.
Thus the geometric mean of 3, 6, 8 and 9 is equal to 6. The
product of the ratios of items equal to it or less than it would be
equal to the product of the ratios of items more than it.
Thus
$$\frac{g}{3} \times \frac{g}{6} = \frac{8}{g} \times \frac{9}{g}$$
or
$$\frac{6}{3} \times \frac{6}{6} = \frac{8}{6} \times \frac{9}{6}$$
This property of the geometric mean is very important. It
indicates that geometric mean measures relative changes. If the price
of a commodity has gone up from 100 to 1000 and of another commodity
has fallen from 100 to 10 there is no relative change in the price
level. The rise in the price of one commodity has been set off against
the fall in the price of the other. In such cases arithmetic average
would give an erroneous conclusion. The arithmetic average of the
original prices of the two commodities would be (100 + 100)/2 or 100 and
the arithmetic average after the changes in prices would be (1000 + 10)/2
or 505, indicating that the prices have gone up. The geometric mean
of the original prices would be 100 and the geometric mean of the
new prices would be $\sqrt{1000 \times 10}$ or 100. It indicates that relatively
there has been no change in the price level as the rise in the price of
the first commodity has been counter-balanced by the fall in the price
of the other.
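The contrast between the two averages in the price illustration can be reproduced directly. A minimal sketch, with hypothetical helper names; the geometric mean is computed through natural logs, so it is compared after rounding.

```python
import math


def arithmetic_mean(items):
    return sum(items) / len(items)


def geometric_mean(items):
    # exp of the mean of natural logs; equivalent to the nth root of the product
    return math.exp(sum(math.log(m) for m in items) / len(items))


old_prices = [100, 100]
new_prices = [1000, 10]

# Arithmetic average suggests a rise from 100 to 505.
print(arithmetic_mean(old_prices), arithmetic_mean(new_prices))  # 100.0 505.0

# Geometric mean shows no relative change: 100 before and after.
print(round(geometric_mean(old_prices), 6), round(geometric_mean(new_prices), 6))  # 100.0 100.0
```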
(iv) The geometric mean of the ratios of corresponding observations
in two series is equal to the ratio of their geometric means. Thus
if there are two series as follows:
                   A     B     A/B
                   3     2     1.500
                   6     4     1.500
                   8     4     2.000
                   9     8     1.125
Geometric mean     6     4     1.500
In the above example the geometric means of the two series A
and B are respectively 6 and 4 and their ratio is as 1.5 : 1. The
geometric mean of the ratios A/B as calculated in the third column is also
the same figure, i.e. 1.5. Thus the geometric mean of the ratios of
the corresponding values of two series can be directly calculated by
finding out the ratio of their geometric means.
(v) The geometric mean of the products of corresponding items
in two series is equal to the product of their geometric means. Thus
if in the above example we multiply the corresponding items of A and
B series the products would be respectively 6, 24, 32 and 72 and their
geometric mean equal to 24. The product of the geometric means of the
two series A and B is also (6 × 4) or 24.
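Properties (iv) and (v) can be confirmed on the series A and B used above. The sketch below is illustrative only; `geometric_mean` is a hypothetical helper, and floating-point results are compared with `math.isclose` rather than exact equality.

```python
import math


def geometric_mean(items):
    return math.exp(sum(math.log(m) for m in items) / len(items))


a = [3, 6, 8, 9]
b = [2, 4, 4, 8]

ratios = [x / y for x, y in zip(a, b)]      # 1.5, 1.5, 2.0, 1.125
products = [x * y for x, y in zip(a, b)]    # 6, 24, 32, 72

g_a, g_b = geometric_mean(a), geometric_mean(b)

# (iv) G.M. of the ratios equals the ratio of the G.M.s (1.5)
assert math.isclose(geometric_mean(ratios), g_a / g_b)

# (v) G.M. of the products equals the product of the G.M.s (24)
assert math.isclose(geometric_mean(products), g_a * g_b)
```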
(vi) Another mathematical property of the geometric mean is
useful in calculating the average rate of increase of any sum at
compound interest or in calculating the average rate of increase of a
population. In fact in all cases where changes in quantity are directly
proportionate to the quantity itself, or where we are dealing with the average
of ratios as in case of index numbers of prices, the use of geometric mean
is almost inevitable.
Thus if $P_0$ represents the principal at the beginning of a period,
$P_n$ the principal at the end of the period, r the rate of interest and n the
number of years,
$$P_n = P_0 (1 + r)^n$$
and
$$r = \sqrt[n]{\frac{P_n}{P_0}} - 1$$
Thus if Rs. 1,000 at compound interest become Rs. 1,500 at the end
of 10 years there has been an increase of 50% and the simple rate of
interest is 5%. The compound rate would be
$$r = \sqrt[10]{\frac{1500}{1000}} - 1 = \sqrt[10]{1.5} - 1 = 1.041 - 1 = .041 \text{ or } 4.1\%$$
Whenever we have to find out the average of the rates of increase
or decrease, such problems arise. If we calculate the mean of the
rates of increase or decrease the study would be inaccurate, as mean
measures absolute changes, but if the geometric mean of the rates of
increase or decrease is calculated the results would be accurate, as
geometric mean measures relative changes.
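The compound rate in the illustration above can be checked with a short calculation. This is a sketch only; `compound_rate` is a hypothetical name for the formula r = (Pn / P0)^(1/n) - 1.

```python
def compound_rate(p0, pn, years):
    """Average compound rate: r = (pn / p0) ** (1 / years) - 1."""
    return (pn / p0) ** (1 / years) - 1


r = compound_rate(1000, 1500, 10)
print(round(r * 100, 1))  # 4.1

# Compounding Rs. 1,000 at this rate for 10 years gives back Rs. 1,500.
assert abs(1000 * (1 + r) ** 10 - 1500) < 1e-6
```

The simple rate of 5% would overstate the growth, since 1000 × (1.05)^10 is well above 1,500.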
Merits of geometric mean
Besides the above-mentioned mathematical properties the geometric
mean has many other merits. We shall now examine the worth of
this average by finding out how many conditions of an ideal average
(laid down earlier) it satisfies.
(i) The geometric mean is rigidly defined and its value is a precise
figure.
(ii) It is based on all the observations of a series. Like
arithmetic average it cannot be calculated if even a single value of a series
is missing.
(iii) It is capable of further algebraic treatment. As we have
seen above, various types of mathematical relationships can be
established between data when a relative study is being made with the help of
geometric mean.
(iv) Geometric mean is not much affected by the fluctuations of
sampling. It gives comparatively more weight to smaller items. In
this respect it is better than the arithmetic average and a single big figure
does not push its value very much.
Thus out of five conditions laid down for an ideal average
geometric mean satisfies four.
Drawbacks of geometric mean
(i) Geometric mean is neither easy to calculate nor is it simple
to understand. This is a major drawback of this average.
(ii) If any value in a series is zero the geometric mean would also
be zero. In such cases geometric mean cannot be calculated.
Similarly if a value is negative geometric mean may become an imaginary
figure.
(iii) Like arithmetic average it may be a value which does not
exist in the series.
(iv) The property of giving more weight to smaller items may in
some cases prove to be a drawback of the geometric mean. In some
cases smaller items have to be given smaller weights and bigger items
bigger weights. In such cases geometric mean is not an ideal average.
The above discussion clearly indicates the scope and limitations
of the geometric mean. We shall discuss more about the properties of
the geometric mean in the chapter on Index Numbers.
HARMONIC MEAN

Harmonic mean of a series is the reciprocal of the arithmetic average of
the reciprocals of the values of its various items. The harmonic mean of
2, 4 and 8 would be equal to the reciprocal of
$$\frac{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}}{3}$$
Symbolically
$$h = \text{Reciprocal of } \frac{\frac{1}{m_1} + \frac{1}{m_2} + \frac{1}{m_3} + \dots + \frac{1}{m_n}}{n}$$
where h stands for the harmonic mean, $m_1$, $m_2$, etc., for the values
of the variable and n for the number of items. The following examples
would illustrate the formula:
Example 19. The annual incomes of fifteen families are given
below in rupees:
80, 2500, 90, 1200, 1450, 7200, 120, 1060, 150, 480, 360, 96, 200,
520, 60.
Calculate the Harmonic Mean.
Solution. Computation of the Harmonic Mean of the annual incomes of
fifteen families
Size of items    Reciprocals
Rs. (m)
80               .01250
2,500            .00040
90               .01111
1,200            .00083
1,450            .00069
7,200            .00014
120              .00833
1,060            .00094
150              .00667
480              .00208
360              .00278
96               .01042
200              .00500
520              .00192
60               .01667
Total            .08048
$$\text{Harmonic mean} = \text{Reciprocal of } \frac{\frac{1}{m_1} + \frac{1}{m_2} + \frac{1}{m_3} + \dots + \frac{1}{m_n}}{n} = \text{Reciprocal of } \frac{\sum \text{Reciprocals}}{n}$$
where $m_1, m_2, \dots, m_n$ represent the values of the n items of the
variable, and n is the number of items.
Substituting the values
$$h = \text{Reciprocal of } \frac{.08048}{15} = \text{Reciprocal of } .00536 = 186.5 \text{ rupees}$$
Example 20. Calculate the harmonic mean of the following
items:
1.0, 1.5, 5.0, 15.0, 250.0, .5, .05, .095, 1245.0, .009.
Solution. Calculation of the harmonic mean
Size of the items    Reciprocals
1.0                  1.0000
1.5                  .6667
5.0                  .2000
15.0                 .0666
250.0                .0040
.5                   2.0000
.05                  20.0000
.095                 10.5300
1245.0               .0008
.009                 111.0000
Total                145.4681
$$\text{Harmonic mean} = \text{Reciprocal of } \frac{\sum (\text{Reciprocals of the items})}{n}$$
$$= \text{Reciprocal of } \frac{145.4681}{10} = \text{Reciprocal of } 14.54681 = .06874$$
The above examples clearly show that the harmonic mean gives
a very great importance to small items of a series. In Example 19 above
if arithmetic average is calculated it would be 1038 whereas the
harmonic mean is only 186.5. In fact harmonic mean gives a value which
is smaller than not only the arithmetic average but also the geometric
mean.
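The three averages for the incomes of Example 19 can be compared side by side. The sketch below uses hypothetical helper names and exact reciprocals instead of five-place rounded ones, which is why the harmonic mean comes out at about 186.4 rather than the 186.5 obtained from the rounded table.

```python
import math


def arithmetic_mean(items):
    return sum(items) / len(items)


def geometric_mean(items):
    return math.exp(sum(math.log(m) for m in items) / len(items))


def harmonic_mean(items):
    """Reciprocal of the arithmetic average of the reciprocals."""
    return len(items) / sum(1 / m for m in items)


incomes = [80, 2500, 90, 1200, 1450, 7200, 120, 1060,
           150, 480, 360, 96, 200, 520, 60]

am = arithmetic_mean(incomes)
gm = geometric_mean(incomes)
hm = harmonic_mean(incomes)
print(round(am), round(hm, 1))  # 1038 186.4

# The harmonic mean is the smallest of the three, the arithmetic the largest.
assert hm < gm < am
```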
Reciprocal character of arithmetic average and harmonic mean
The price of a commodity can be quoted in two ways, either in
terms of money or in terms of quantities. Thus either we can say that
the price of mangoes is Rs. 1.50 per dozen or we can say that the price
is eight mangoes per rupee. Suppose mangoes are selling at the following
rates at three shops: 4 for a rupee, 5 for a rupee and 10 for a rupee.
We have to calculate the average price of the mangoes. The arithmetic
average of the figures given above (4, 5 and 10) is 19/3. This is the
average number of mangoes sold per rupee. Therefore, the price of a
mango would be 3/19 rupee or 15.8 paisa. If these quotations are in
terms of prices and not quantities they would be 25 paisa per mango at
the first shop, 20 paisa per mango at the second shop and 10 paisa per
mango at the third shop. The average of these prices (25, 20 and 10) is
18.3 paisa per mango. Thus there is a discrepancy between the averages
calculated above. It is due to the fact that we have calculated the
arithmetic average of "quantity prices" (so many mangoes per rupee).
If we calculate the harmonic mean of these quantity prices, it would
be equal to the reciprocal of (1/4 + 1/5 + 1/10)/3, or the reciprocal of
11/60, or it would be 60/11 mangoes per rupee. The price of one mango
then would be 11/60 rupee or 18.3 paisa. Thus we find that if we calculate
the harmonic mean of the quantity prices and the arithmetic mean of the
money prices there would be no discrepancy and the price per unit would
be the same in both the cases. Harmonic mean gives accurate results in
such cases. For one rupee we get 60/11 mangoes; therefore the price of a
mango is 11/60 rupee. The two are reciprocals of each other.
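The reciprocal relationship in the mango illustration can be checked with exact fractions. This is an illustrative sketch; the variable names are hypothetical and `fractions.Fraction` is used only to avoid rounding.

```python
from fractions import Fraction

mangoes_per_rupee = [4, 5, 10]          # "quantity prices" at the three shops
n = len(mangoes_per_rupee)

# Arithmetic average of the quantity prices: 19/3 mangoes per rupee.
am_quantity = Fraction(sum(mangoes_per_rupee), n)

# Harmonic mean of the quantity prices: 60/11 mangoes per rupee.
hm_quantity = n / sum(Fraction(1, q) for q in mangoes_per_rupee)

# Money prices (rupees per mango) and their arithmetic average: 11/60 rupee.
money_prices = [Fraction(1, q) for q in mangoes_per_rupee]
am_price = sum(money_prices) / n

# Only the harmonic mean of quantity prices agrees with the average money price.
assert 1 / hm_quantity == am_price
assert 1 / am_quantity != am_price
print(hm_quantity, am_price)  # 60/11 11/60
```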
Merits of harmonic mean
(i) Harmonic mean satisfies the test of rigid definition. Its
definition is precise and its value is always definite.
(ii) Like arithmetic average and geometric mean this average is
also based on all the observations of the series. It cannot be calculated
in the absence of even a single figure.
(iii) Harmonic mean is capable of further algebraic treatment.
(iv) Like geometric mean this average is also not affected very
much by fluctuations of sampling.
(v) It gives greater importance to small items and as such a single
big item cannot push up its value.
(vi) It measures relative changes and is extremely useful in
averaging certain types of ratios and rates.
Drawbacks of harmonic mean


(1) Harmonic mean is not readily understood nor can it be cal-
culated with ease.
(2) It gives a very high weightage to small items and for analysis
of economic data it is not very useful.
(3) It is usually a value which does not exist in a series.
(4) Generally it is not a good representative of a statistical series,
unless the phenomenon is such where small items have to be given a
very high weightage.
OTHER AVERAGES

Having discussed the chief features of the five main averages we
shall briefly discuss some of the minor and less important averages.
Quadratic mean
This is also known as Root Mean Square. It is calculated by taking
the square root of the average of the squares of the numbers. It is
useful when some items have negative values and others positive values
because in such cases the mean is not very representative.
Symbolically:
$$Q_m = \sqrt{\frac{m_1^2 + m_2^2 + m_3^2 + \dots + m_n^2}{n}}$$
where $Q_m$ stands for quadratic mean, $m_1$, $m_2$, $m_3$, etc., for the values
of the variable and n for the number of items. The following example
would illustrate the formula:
Example 21. Find out the quadratic mean of the following items:
10, 30, 40, 50 and 70.
Solution:
Calculation of quadratic mean
Size of items    Square of the size
(m)              (m²)
10               100
30               900
40               1600
50               2500
70               4900
n = 5            10000
$$Q_m = \sqrt{\frac{10000}{5}} = 44.72$$
The arithmetic average of the series would have been 40.
Quadratic mean is seldom used as an average except in case of finding out
the average of the positive and the negative deviations from a measure
of central tendency. In that case it is known as standard deviation. We
shall discuss it in the next chapter.
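Example 21 can be reproduced in a few lines. A minimal sketch; `quadratic_mean` is a hypothetical helper name for the root-mean-square formula given above.

```python
import math


def quadratic_mean(items):
    """Square root of the average of the squares (root mean square)."""
    return math.sqrt(sum(m * m for m in items) / len(items))


items = [10, 30, 40, 50, 70]
print(round(quadratic_mean(items), 2))  # 44.72
print(sum(items) / len(items))          # 40.0, the arithmetic average
```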
Moving average
Moving average is calculated by using the technique of simple
arithmetic average. It is useful in removing the irregularities of time
series and is usually calculated to study the long period trend. The first
thing to be decided in the calculation of moving average is the "period"
for which the average is to be calculated. The moving average may
be three-yearly, five-yearly or seven-yearly depending on the nature of
the series. We shall discuss this problem of periodicity of moving
average later in the chapter on Analysis of Time Series. For the present
we shall simply illustrate the technique of its calculation.
If a three yearly moving average is to be calculated the arithmetic
average of the first three years' figures would be found out and written
against the middle year (second year in this case). Then the first year's
figure would be dropped and the arithmetic average of second, third
and fourth years' figures would be calculated and written against the
third year. Similarly the arithmetic average of the figures of third,
fourth and fifth years would be written against the fourth year and so
on. The following example would illustrate the method of its cal-
culation.
Example 22. Calculate the three yearly moving average of the
following figures relating to the annual sales of a concern (in lakhs of
rupees).
Calculation of three yearly moving average
Year    Sales (in lakhs    3-Yearly moving    3-Yearly moving
        of rupees)         total              average
1945    8                  ...                ...
1946    9                  25                 8.3
1947    8                  24                 8.0
1948    7                  23                 7.7
1949    8                  24                 8.0
1950    9                  27                 9.0
1951    10                 30                 10.0
1952    11                 32                 10.7
1953    11                 34                 11.3
1954    12                 33                 11.0
1955    10                 ...                ...
Similarly, if a five yearly moving average has to be calculated
the first five figures (of years 1945 to 1949) would be added and their
average would be written against the third year or 1947, then the next
five figures leaving the first (of years 1946 to 1950) would be averaged
and the figure written against the middle year of 1948 and so on.
Moving average is very helpful in removing the fluctuations of
time series and giving an idea about the general trend.
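The moving totals and averages of Example 22 can be generated mechanically. The sketch below is illustrative: `moving_totals` is a hypothetical helper, and the centring of each average against the middle year assumes an odd period (3, 5, 7 and so on).

```python
def moving_totals(values, period):
    """Sum of each run of `period` consecutive values."""
    return [sum(values[i:i + period]) for i in range(len(values) - period + 1)]


years = list(range(1945, 1956))
sales = [8, 9, 8, 7, 8, 9, 10, 11, 11, 12, 10]   # lakhs of rupees

period = 3
totals = moving_totals(sales, period)

# Each average is written against the middle year of its run (odd period assumed).
centre_years = years[period // 2: len(years) - period // 2]
for year, total in zip(centre_years, totals):
    print(year, total, round(total / period, 1))
```

Calling `moving_totals(sales, 5)` gives the five yearly totals, the first of which (40, average 8.0) belongs to 1947.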

Progressive average
It is also calculated with the help of simple arithmetic average. It
is a cumulative average and is different from the moving average. In
the calculation of this average, figures of all previous years are added
and no figure is left out as in the case of moving average. Thus the
progressive average of the second year would be equal to the arithmetic
average of the figures of the first two years; the progressive average
of the third year would be equal to the arithmetic average of the figures
of the first three years and so on.
The following illustration would clarify the procedure:
Example 23. Calculate the progressive average of the data given
in Example 22:
Calculation of progressive average
Years   Sales (in lakhs    Progressive    Progressive
        of rupees)         total          average
1945    8                  8              8.0
1946    9                  17             8.5
1947    8                  25             8.3
1948    7                  32             8.0
1949    8                  40             8.0
1950    9                  49             8.2
1951    10                 59             8.4
1952    11                 70             8.8
1953    11                 81             9.0
1954    12                 93             9.3
1955    10                 103            9.4
Progressive average is used by business houses, particularly in early
years, with a view to comparing the current profits with those of
the past.
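The progressive totals and averages of Example 23 follow from a running sum. A minimal sketch using `itertools.accumulate` from the Python standard library; the variable names are hypothetical.

```python
from itertools import accumulate

sales = [8, 9, 8, 7, 8, 9, 10, 11, 11, 12, 10]   # lakhs of rupees, 1945 to 1955

# Progressive (cumulative) totals: every earlier figure is retained.
totals = list(accumulate(sales))

# Progressive average for the k-th year = total of the first k years / k.
averages = [total / (i + 1) for i, total in enumerate(totals)]

for year, total, avg in zip(range(1945, 1956), totals, averages):
    print(year, total, round(avg, 2))
```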
Relation between different averages
When different averages have been calculated from a given set of
observations it will be found that there is a relationship between their
values. Generally these relationships are of the following type:
(i) If a series is "normal" or "symmetrical" the values of its mean,
median and mode would be identical.
(ii) If a series is moderately asymmetrical the median would be
somewhere between the mean and the mode. Usually it is at a distance
one-third from the mean towards the mode, or
median = mean - 1/3 (mean - mode)
mode = mean - 3 (mean - median)
or
(median - mode) = 2/3 (mean - mode)
(iii) The arithmetic mean is greater than the geometric mean,
which in turn is greater than the harmonic mean; but if all the values
of a variable are equal the arithmetic mean, geometric mean and
harmonic mean would coincide.
(iv) The geometric mean of any two values is equal to the
geometric mean of their arithmetic average and harmonic mean. Thus
the arithmetic average of 4 and 16 is 10, geometric mean is 8 and the
harmonic mean is 6.4. The geometric mean of 10 and 6.4 is also 8.
This rule holds good only when there are two items in a series. If
there are more than two items this rule would hold good only if the
values of the items increase in geometric progression (like 2, 4, 8, 16 etc.).
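Relationships (iii) and (iv) can be checked numerically on the figures quoted above. This is only a sketch with hypothetical helper names; floating-point values are compared with `math.isclose`.

```python
import math


def am(items):
    return sum(items) / len(items)


def gm(items):
    return math.exp(sum(math.log(m) for m in items) / len(items))


def hm(items):
    return len(items) / sum(1 / m for m in items)


pair = [4, 16]
print(am(pair), round(gm(pair), 6), hm(pair))  # 10.0 8.0 6.4

# (iii) A.M. > G.M. > H.M. when the values are unequal
assert am(pair) > gm(pair) > hm(pair)

# (iv) G.M. of two values = G.M. of their A.M. and H.M.
assert math.isclose(gm(pair), math.sqrt(am(pair) * hm(pair)))

# The same holds for a series in geometric progression.
gp = [2, 4, 8, 16]
assert math.isclose(gm(gp), math.sqrt(am(gp) * hm(gp)))
```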
Selection of an average
The choice of an average is an important and difficult problem
which a statistician has to face. It is to be very cautiously made, as
if a wrong average has been chosen, inaccurate conclusions are likely
to follow. There are no hard and fast rules for the selection of a
particular average in different fields of statistical investigation. Selection
of a particular average should be done after giving consideration to the
nature and type of enquiry, as also to the object with which the investi-
gation has been conducted. No one average can be said to be good
for all types of enquiries and under all conditions.
In the selection of an average consideration must also be given
to the chief characteristics and limitations of various averages. Most of
the averages suffer from one limitation or the other and they have their
own merits and drawbacks as well. We have seen that the arithmetic
average is, generally speaking, better than other averages as it has many
properties which other averages do not have, but even arithmetic
average cannot be recommended for universal use. If, for example,
a very large number of items in a series have small values and only one
or two items have very big values, arithmetic average would give
fallacious conclusions. In such cases median, mode or geometric mean
would give much better results than the arithmetic average. However,
if the purpose of the investigation is to find out things like "average
output", "average imports or exports", "average cost of production"
or "average price" the arithmetic average would be an ideal one. In
economic and social studies it gives better results than other averages.
If the purpose of the enquiry is to study such phenomena which are
incapable of direct quantitative measurement, like intelligence or honesty,
etc., median has a distinct advantage over all other averages. If, however,
there are wide variations in any series median is the most unsuitable
average. Similarly, if the enquiry under question relates to, say,
"average size of ready-made clothes" or "size of typical farms," the
average to be used is mode. The use of mode is every day increasing
in business and commerce. Modal output per machine or modal time
needed to produce a commodity are very important concepts in the
business world of today. But mode is very often indeterminate and
unrepresentative and is entirely unsuitable for many enquiries. It is
not capable of further algebraic treatment and has limited use. If an
enquiry is being conducted to study the relative changes in the price
level at two periods, neither arithmetic average nor median or mode
would give satisfactory results. In such cases the best average is the
geometric mean. In the construction of index numbers the use of
geometric mean is almost universal. But geometric mean is entirely
useless if bigger items have to be given more weight or if a study of
absolute, rather than relative changes, is undertaken. Harmonic mean
similarly is the best average if small items have to be given more weight
or if we have to find out the average of certain types of rates, etc. If,
for example, we have to calculate the average speed of a person who
walks four miles per hour for the first mile and three miles an hour
for the second mile, arithmetic average would give inaccurate results.
Harmonic mean of these figures, which would be 3 3/7, is the correct
average. This person takes fifteen minutes to cover the first mile and
twenty to cover the second mile, or in thirty-five minutes he covers
two miles. The speed is 3 3/7 miles per hour.
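The walking-speed illustration can be verified with exact fractions. A minimal sketch, assuming (as in the text) that one mile is covered at each speed; the variable names are hypothetical.

```python
from fractions import Fraction

speeds = [4, 3]                       # miles per hour, one mile walked at each

# Harmonic mean of the two speeds.
harmonic = len(speeds) / sum(Fraction(1, s) for s in speeds)

# Direct check: total distance divided by total time.
total_time = sum(Fraction(1, s) for s in speeds)      # hours for the two miles
overall_speed = Fraction(len(speeds)) / total_time

print(harmonic)            # 24/7, i.e. 3 3/7 miles per hour
print(total_time * 60)     # 35 minutes for the two miles
assert harmonic == overall_speed

# The arithmetic average (3.5 miles per hour) overstates the speed.
assert Fraction(sum(speeds), len(speeds)) > harmonic
```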
The above discussion clearly shows that each type of average has
its own field of importance and usefulness. Before selecting an average
all these considerations should be kept in mind. In actual practice
two or three averages of a series may be necessary for a proper
understanding of its special features. A discriminate use of averages is
essential for sound statistical analysis. But all said and done, it has to be
admitted that arithmetic average would be found to be the ideal average
for a larger number of enquiries than any other average.
Limitations of averages
Even when an average has been selected very judiciously and is
ideal for a particular investigation, it should never be forgotten that
even the best average has its own limitations. An average is a single
figure representing a series, and no single figure can condense in itself
all the properties of the items which it represents. This is the reason
why conclusions which are drawn on the basis of a study of averages
are not always infallible. The average height of women may be less
than the average height of men but it does not mean that no woman
can be taller than a man. The well-known example of the
mathematician who calculated the average depth of a stream and, finding it lower
than the average height of his family members, attempted to cross it and
drowned with his family in the process, is an illustration on this point.
The average depth of the river may have been lower than the height
of the shortest member of the mathematician's family, but at some
point the depth of the stream must have been more than the height of
the tallest member in the group.
Average is a single figure and can be expected to represent a series
only as best as a single figure can. Averages do not throw light on the
formation of a series or distribution of frequencies round the various
values of a variable. It is for this reason that measures of dispersion
and skewness are calculated. Averages do not reveal the whole story of
a series. A student getting 30, 40 and 50 marks respectively in three
examinations would have the same average as another who gets 50, 40
and 30 marks respectively. The progress of the two students is in
different directions but on the basis of the averages they will be ranked
together.
In fact if wrong conclusions are drawn by the use of judiciously
selected averages, it is not the fault of the averages. The fault lies
with the person drawing the conclusions. The inherent limitations of
averages should always be kept in mind and they should not be expected
to reveal more than what they can.
WEIGHTED AVERAGE
Need and meaning. In the calculation of a simple average each
item of the series is considered equally important, but there may be
cases where all items do not have equal importance, and some of
them are comparatively more important than others. The fundamental
purpose of finding out an average is that it shall "fairly" represent,
so far as a single figure can, the central tendency of the many
varying figures from which it has been calculated. This being so,
it is necessary that if some items of a series are more important than
others, this fact should not be overlooked altogether in the calculation
of an average. If we have to find out the average income of the
employees of a certain mill and if we simply add the figures of the
income of the manager, an accountant, a clerk, a labourer and a watchman
and divide the total by five, the average so obtained cannot be
a fair representative of the income of these people. The reason is that
in a mill there may be one manager, two accountants, six clerks, one
thousand labourers and one dozen watchmen, and if it is so, the relative
importance of the figures of their income is not the same. Similarly,
if we are finding out the change in the cost of living of a certain group
of people and if we merely find the simple arithmetic average of the
prices of the commodities consumed by them, the average would be
unrepresentative. All the items of consumption are not equally important.
The price of salt may increase by 500% but this will not affect
the cost of living to the extent to which it would be affected if the
price of wheat goes up only by 50%. In such cases if an average has
to maintain its representative character, it should take into account
the relative importance of the different items from which it is being
calculated. The simple average gives equal importance to all the
items of a series. In this sense a simple average is also a weighted
average, because in a simple average the relative importance of all the
items is supposed to be the same. But in actual practice the importance
of various items is not always the same and in such cases the
simple arithmetic average and the weighted arithmetic average would
differ in value. Therefore, in order that an average may be a typical or
a representative average, it is necessary that the relative importance
of items is taken into account in its calculation. Thus if item A is
considered to be five times as important as item B, the weights of these
items respectively should be 5 and 1. Weights are figures which indicate
the relative importance of various items.
Difficulties in weighting. It is easy to say that in many cases it is
better to take into account the relative importance of items and to have
a weighted average rather than a simple average, but it is very difficult
to decide the relative importance of different items. If we have to
decide the relative importance of items, the problem that would arise
would be about the basis or criteria for determining the relative
importance. How weights should be assigned is a question very difficult to
answer. In fact no hard and fast rule can be laid down for the assignment
of weights, as the relative importance of items depends on the
nature and purpose of the investigation. In some cases the weights
are determined without much difficulty, and such cases are those where
weights are determined on the basis of some evidence associated with
the given data. If we have to decide the weights of the income figures of
a manager, an accountant, a clerk, a labourer and a watchman, the
simplest method would be to give them weights in accordance with
their number. Thus if there is one manager, two accountants, six
clerks, one thousand labourers and twelve watchmen, the weights
would also be these very figures respectively. To calculate the average
income of these people, if instead of finding out the simple arithmetic
average of the figures of their incomes we multiply their incomes by
their numbers (weights), and if the total of these products is divided
by the total of weights, we shall get the weighted arithmetic average
of the series. This average would be a better representative of the
series than the simple arithmetic average. Many writers like Secrist
and Kelly are of the opinion, and rightly too, that this is not a weighted
average. When values are multiplied by their frequencies and the
sum of their products is divided by the total of their frequencies, it
is in fact a simple arithmetic average of the series. In cases of discrete
and continuous series we have already seen that the arithmetic average
is calculated by multiplying the values by their respective frequencies.
Such writers are of the opinion that weights should be determined
by some such evidence which is not associated with the items themselves.
But it is neither easy nor safe to associate weights to various
items arbitrarily, as in such cases the weighted average may give misleading
conclusions. Weights have to be judiciously selected.
In fact the difficulties in the selection of proper weights are so many
that many writers are of the opinion that it is better to have a simple
average than to have a weighted average of doubtful fairness. Thus Bowley
says: "The discussion of the proper weights to be used... has occupied a
space in statistical literature out of all proportions to its significance,
for it may be said that no great importance need be attached to the
special choice of weights;" a little further he observes: "So we arrive
at a very important precept; in calculating averages give all care to
making the items free from bias and do not strain after exactness in
weighting."
But this is hardly a full statement of facts. Weights are used to
make items free from bias. Weighting is essential because items usually
constitute a heterogeneous group. Even Bowley admits that "paucity
of data may make the use of weights necessary or an attempt at fairness
of measurement may make weighting expedient". So we arrive at a
conclusion that even though a selection of proper weights is an extremely
difficult task, weighting of items is essential to make them free from
bias.
We shall now see how various averages can be weighted and
what special characteristics such averages possess.
WEIGHTED ARITHMETIC AVERAGE
In calculating the weighted arithmetic average each value of the
variable is multiplied by its weight and the products so obtained are
aggregated. This total is divided by the total of weights and the
resulting figure is the weighted arithmetic average.
Symbolically

$$a_1 = \frac{m_1 w_1 + m_2 w_2 + m_3 w_3 + \cdots + m_n w_n}{w_1 + w_2 + w_3 + \cdots + w_n}$$

where $a_1$ stands for the weighted arithmetic average, $m_1$, $m_2$, etc.,
for the values of the variable and $w_1$, $w_2$, etc., for their respective
weights.
The formula can be written in short as:

$$a_1 = \frac{\sum mw}{\sum w}$$

where $\sum mw$ stands for the sum of the products of the values and their
respective weights, and $\sum w$ for the sum of the weights. The following
illustration would clarify the formula:
Calculation of the weighted arithmetic average : direct method
Example 24. Calculate the weighted arithmetic average of the
price of tea from the following data, assuming the quantities sold as
weights:

    Price per pound      Quantities sold
        (Rs.)               (pounds)
         2.25                  14
         2.50                  11
         2.75                   9
         3.00                   6
Solution. Calculation of the weighted arithmetic average of the price
of tea:

    Price per pound    Quantities sold    Price x quantity
       in paisas         (in pounds)
          (m)               (w)                (mw)
          225                14               3,150
          250                11               2,750
          275                 9               2,475
          300                 6               1,800

    Total 1,050              40              10,175

Weighted arithmetic average, or

$$a_1 = \frac{\sum mw}{\sum w} = \frac{10,175}{40}$$

= 254.375 paisa per pound
= 2.54375 rupees per pound.
The simple arithmetic average of the prices would have been
1,050/4 or 262.5 paisa (2.625 rupees per pound).
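The direct method of Example 24 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `weighted_mean` and the variable names are assumptions introduced here, while the prices and quantities are those of the example.

```python
def weighted_mean(values, weights):
    """Return sum(m * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(m * w for m, w in zip(values, weights)) / total_weight


prices_paisa = [225, 250, 275, 300]  # price per pound (Example 24)
pounds_sold = [14, 11, 9, 6]         # quantities sold, used as weights

weighted = weighted_mean(prices_paisa, pounds_sold)
simple = sum(prices_paisa) / len(prices_paisa)

print(weighted)  # 254.375 paisa per pound
print(simple)    # 262.5 paisa per pound
```

The weighted figure is lower than the simple one because the cheaper grades were sold in larger quantities.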
Actual and estimated weights. In the above example the weights used
were actual. Many times actual weights are not available and estimated
weights have to be used, or even if actual weights are available estimated
weights are used for the sake of simplicity in calculation. It will be
observed that if there is not much difference between the actual and
estimated weights the results obtained by using the estimated weights
would not materially differ from the results obtained by using the actual
weights. In the above example if actual weights were not used and
if the estimated weights were respectively 15, 10, 10 and 5 the products
of price and the weights would have been respectively 3,375, 2,500,
2,750 and 1,500 and the total would have been 10,125. The total of the
weights is still 40 and the weighted arithmetic average would be
10,125/40 or 253.125 paisa or 2.531 rupees per pound. The difference
between the two averages is not much.
If the actual weights are used and if the values of the items are
slightly changed the average would be affected to a greater extent than
where values are correct and the weights estimated. Thus, if in the
above example the values are taken as 200, 275, 300 and 325 respectively
and the original weights are used, the products of the values and
weights would be respectively 2,800, 3,025, 2,700 and 1,950. Their
total would be 10,475 and the weighted arithmetic average would be
10,475/40 or 261.875 paisa or 2.619 rupees per pound. This error is
much more than the one in the previous case. Thus it should always
be remembered that an error in weight is less serious than a corresponding
error in the size of items. The reason for it is that the errors in weights
are usually unbiased and compensate each other while errors in the
values of items are generally biased ones. It is for this reason that
we had concluded above that attempt should be made to make the
items free from bias and we should not strain after exactness in
weights. According to King, "The items should be as exact as possible
and the weights used should be approximately accurate ......".
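The comparison between an error in the weights and an error in the values can be checked numerically. The sketch below reuses the tea prices of Example 24 with the estimated weights (15, 10, 10, 5) and the altered values (200, 275, 300, 325) quoted in the text; the helper `weighted_mean` and the variable names are assumptions made for illustration.

```python
def weighted_mean(values, weights):
    # Sum of value x weight, divided by the sum of the weights
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)


actual_prices = [225, 250, 275, 300]
actual_weights = [14, 11, 9, 6]
estimated_weights = [15, 10, 10, 5]    # weights slightly wrong
altered_prices = [200, 275, 300, 325]  # values slightly wrong

base = weighted_mean(actual_prices, actual_weights)
with_estimated_weights = weighted_mean(actual_prices, estimated_weights)
with_altered_values = weighted_mean(altered_prices, actual_weights)

error_from_weights = abs(with_estimated_weights - base)
error_from_values = abs(with_altered_values - base)

print(with_estimated_weights, error_from_weights)  # 253.125 1.25
print(with_altered_values, error_from_values)      # 261.875 7.5
```

In this instance the error caused by the altered values (7.5 paisa) is six times the error caused by the estimated weights (1.25 paisa).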
Short-cut method of calculating weighted arithmetic average
The method discussed above for the calculation of weighted
arithmetic average is sometimes found to be very tedious, particularly
when the size of items is big. In such cases a short-cut method can
be used. In this method, first an average is assumed and the deviations
of each item from the assumed average are multiplied by the respective
weights of the items. The sum of these products is then divided by
the total of weights and added to the assumed average. The resulting
figure is the actual weighted arithmetic average of the series.
Symbolically

$$a' = x' + \frac{\sum d'w}{\sum w}$$

where $a'$ stands for the weighted arithmetic average, $x'$ for the
assumed average, $\sum d'w$ for the sum of the products of the deviations
and the respective weights of items, and $\sum w$ for the total of the weights.
The following example would illustrate the formula:
Example 25. From the following table calculate the weighted average
price of tea.

    Price per lb.      Lbs. sold
      (Rs. p.)
        1.00              200
        1.35              275
        1.62              400
        1.75              150
        2.00              100
        2.25               75
        2.50               50
Solution. Calculation of the weighted average price of a lb. of tea

    Price in         Lbs. sold    Deviations from      Total
    paisas per lb.                assumed weighted     deviations
                                  average (175)
       (m)             (w)            (d')               (d'w)
       100             200            -75              -15,000
       135             275            -40              -11,000
       162             400            -13              - 5,200
       175             150              0                    0
       200             100            +25              + 2,500
       225              75            +50              + 3,750
       250              50            +75              + 3,750

    Σm = 1,247      Σw = 1,250                    Σd'w = -21,200

Substituting the above data in the formula,

$$a' = x' + \frac{\sum d'w}{\sum w}$$

where $a'$ stands for the weighted average, $x'$ for the assumed weighted
average, $w$ for weight and $d'$ for deviation from the assumed weighted
average, we get

$$a' = 175 + \frac{-21,200}{1,250} = 175 - 16.96 = 158.04 \text{ paisa}$$

Thus the weighted average price of a lb. of tea is 158.04 paisa
or 1.58 rupees.
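The short-cut method can be checked against the direct method with a small Python sketch. The function names are assumptions introduced for illustration; the data are those of Example 25. Note that the product for the second row is 275 x (-40) = -11,000, so the sum of the weighted deviations is -21,200 and both methods give 158.04 paisa.

```python
def weighted_mean_shortcut(values, weights, assumed):
    # Deviation of each value from the assumed average
    deviations = [m - assumed for m in values]
    # Weighted deviations, summed and divided by the total of the weights
    correction = sum(d * w for d, w in zip(deviations, weights)) / sum(weights)
    return assumed + correction


def weighted_mean_direct(values, weights):
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)


prices = [100, 135, 162, 175, 200, 225, 250]  # paisas per lb.
lbs_sold = [200, 275, 400, 150, 100, 75, 50]  # weights

shortcut = weighted_mean_shortcut(prices, lbs_sold, assumed=175)
direct = weighted_mean_direct(prices, lbs_sold)
other_assumed = weighted_mean_shortcut(prices, lbs_sold, assumed=200)

print(round(shortcut, 2), round(direct, 2), round(other_assumed, 2))
```

The choice of the assumed average only affects the amount of arithmetic; any assumed value leads to the same result.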

When to use weighted arithmetic average
Weighted arithmetic average, as we have seen, removes the bias
of items and gives a fair measure of central tendency. Though in
many cases the simple arithmetic average and weighted arithmetic
average give similar results, yet there are some special circumstances in
which the weighted arithmetic average must invariably be used. In
these cases a weighted arithmetic average is a much better measure than
the simple arithmetic average. These cases are as follows:

(a) When the importance of all the items in a series is not equal. We
have seen that simple arithmetic average gives equal importance to all
the items of a series. In many cases all the items may not be of equal
importance. If it is so, a simple arithmetic average would give us
misleading conclusions. The following example would clarify the
point : -

Example 26. An examination was held to decide the award of


a scholarship. The weights of various subjects were different. The
marks obtained by 3 candidates (out of 100 in each subject) are given
below:-

Subject Weight Marks A Marks B Marks C


Statistics 4 63 60 65
Mathematics 3 65 64 70
Economics 2 58 56 63
Hindi 1 70 80 52
Solution. Calculation of the weighted and simple arithmetic averages.

                                                       Marks x Weight
    Subject       Weight  Marks  Marks  Marks      A       B       C
                            A      B      C
    Statistics       4      63     60     65      252     240     260
    Mathematics      3      65     64     70      195     192     210
    Economics        2      58     56     63      116     112     126
    Hindi            1      70     80     52       70      80      52

    Total           10     256    260    250      633     624     648

Simple arithmetic average of marks

$$A = \frac{256}{4} = 64; \quad B = \frac{260}{4} = 65; \quad C = \frac{250}{4} = 62.5$$

Weighted arithmetic average of marks

$$A = \frac{633}{10} = 63.3; \quad B = \frac{624}{10} = 62.4; \quad C = \frac{648}{10} = 64.8$$

Thus on the basis of simple arithmetic average B should get the
scholarship but according to the weighted arithmetic average the
scholarship should be given to C and not to B. Since all the subjects in
which the candidates appear for examination are not of equal importance,
the result given by the weighted arithmetic average is more accurate
and C should get the scholarship.
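The reversal of ranking in Example 26 can be reproduced with a short Python sketch. The dictionary layout and the helper names are assumptions made for illustration; the marks and subject weights are those given in the example.

```python
subject_weights = {"Statistics": 4, "Mathematics": 3, "Economics": 2, "Hindi": 1}

marks = {
    "A": {"Statistics": 63, "Mathematics": 65, "Economics": 58, "Hindi": 70},
    "B": {"Statistics": 60, "Mathematics": 64, "Economics": 56, "Hindi": 80},
    "C": {"Statistics": 65, "Mathematics": 70, "Economics": 63, "Hindi": 52},
}


def simple_mean(candidate_marks):
    return sum(candidate_marks.values()) / len(candidate_marks)


def weighted_mean(candidate_marks, weights):
    total = sum(candidate_marks[s] * w for s, w in weights.items())
    return total / sum(weights.values())


simple = {name: simple_mean(m) for name, m in marks.items()}
weighted = {name: weighted_mean(m, subject_weights) for name, m in marks.items()}

best_simple = max(simple, key=simple.get)
best_weighted = max(weighted, key=weighted.get)

print(simple, best_simple)      # B ranks first on the simple average
print(weighted, best_weighted)  # C ranks first on the weighted average
```

B leads on the simple average only because of a high mark in Hindi, the subject with the smallest weight.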
(b) When the classes of the same group contain widely varying frequencies.
If we have to compare, say, the salaries of teachers in two towns A and
B, and if teachers at both these places are classified in similar groups and
it is found that the numbers in these groups widely differ, weighted
arithmetic average would give a much better idea about their salaries than
a simple arithmetic average. The following example would clarify
this point:
Example 27. Compute the weighted means of the salaries of teachers
in towns A and B. Compare them with the unweighted means.

                                  Town A                Town B
    Schools                  No. of    Rate of     No. of    Rate of
                             teachers  salary      teachers  salary
                                        (Rs.)                 (Rs.)
    1. Municipal school        25        30          34        40
    2. Govt. school            26        50          35        60
    3. Aided school            20        43          12        25
    4. Non-aided school        19        35          11        20
    5. Night school            10        32           8        25

    Total                     100       190         100       170
Solution. Computation of the weighted and unweighted means of the
salaries of teachers in towns A and B.

                                   Town A                      Town B
    Description of         Salary   No. of             Salary   No. of
    school                 (Rs.)    teachers           (Rs.)    teachers
                            (m)      (w)      (wm)      (m)      (w)      (wm)
    1. Municipal school      30       25       750       40       34      1360
    2. Govt. school          50       26      1300       60       35      2100
    3. Aided school          43       20       860       25       12       300
    4. Non-aided school      35       19       665       20       11       220
    5. Night school          32       10       320       25        8       200

    Total                   190      100      3895      170      100      4180
                            (Σm)     (Σw)     (Σwm)     (Σm)     (Σw)     (Σwm)

Weighted mean

$$\text{Town A:} \quad a = \frac{\sum wm}{\sum w} = \text{Rs. } \frac{3895}{100} = \text{Rs. } 38.95$$

$$\text{Town B:} \quad a = \frac{\sum wm}{\sum w} = \text{Rs. } \frac{4180}{100} = \text{Rs. } 41.8$$

Thus the weighted means of the salaries of teachers in towns A
and B are Rs. 38.95 and Rs. 41.8 respectively.
Unweighted mean

$$\text{Town A:} \quad a = \frac{\sum m}{n} = \text{Rs. } \frac{190}{5} = \text{Rs. } 38$$

$$\text{Town B:} \quad a = \frac{\sum m}{n} = \text{Rs. } \frac{170}{5} = \text{Rs. } 34$$

Thus the unweighted means of the salaries in towns A and B are
Rs. 38 and Rs. 34 respectively.
Thus we see that on the basis of simple arithmetic average we
would have concluded that the salaries of teachers in town A are on
an average higher than the salaries of teachers in town B. But the
weighted arithmetic average reveals an entirely opposite tendency.
According to weighted arithmetic average the salaries of teachers in
town B are higher than the salaries of teachers in town A. The conclusion
arrived at by the use of the weighted mean is the correct conclusion.
In cases like the above, where the variation in the number of items in
similar groups is of a high degree, the weighted arithmetic average
should be invariably used, as it gives correct conclusions. In the above
example the weighted arithmetic average is the correct average
because if we multiply the weighted arithmetic average by the total
number of teachers we get the amount of the total salary. Thus, 38.95
x 100 is equal to the total salary paid to the teachers in town A, and
similarly 41.8 multiplied by 100 gives us this figure for town B. If the
simple arithmetic average of the salaries is multiplied by the number
of teachers it would not give the correct figure of the total salary paid.
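The check described above, that the weighted mean multiplied by the number of teachers reproduces the total salary bill while the unweighted mean does not, can be sketched in Python. The tuple layout and the function name `summarize` are assumptions for illustration; the figures are those of Example 27.

```python
# Each row is (salary in Rs., number of teachers), taken from Example 27
town_a = [(30, 25), (50, 26), (43, 20), (35, 19), (32, 10)]
town_b = [(40, 34), (60, 35), (25, 12), (20, 11), (25, 8)]


def summarize(rows):
    """Return (weighted mean, unweighted mean, total salary, teachers)."""
    teachers = sum(n for _, n in rows)
    total_salary = sum(salary * n for salary, n in rows)
    weighted = total_salary / teachers
    unweighted = sum(salary for salary, _ in rows) / len(rows)
    return weighted, unweighted, total_salary, teachers


for name, rows in (("A", town_a), ("B", town_b)):
    weighted, unweighted, total_salary, teachers = summarize(rows)
    print(name, weighted, unweighted, total_salary, round(unweighted * teachers))
```

For town A the weighted mean of Rs. 38.95 times 100 teachers gives the actual bill of Rs. 3,895, whereas the unweighted mean of Rs. 38 would suggest Rs. 3,800; for town B the corresponding figures are Rs. 4,180 and Rs. 3,400.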
(c) Where there is a change either in the proportion of values of items
or in the proportion of their frequencies. If in Example 27 above the salaries
of the teachers are doubled in both the towns, the weighted arithmetic
averages would be rupees 77.9 and rupees 83.6 respectively. These
averages are in the same ratio as the original averages of Rs. 38.95
and Rs. 41.8. If the salaries remain the same but the number of teachers
in each category is doubled the weighted arithmetic averages would
remain the same, namely, Rs. 38.95 and Rs. 41.8. In these two cases
there is a change either in the values of the items or in the frequencies,
but the proportions of neither of them are affected. If the salaries
of the teachers change in such a way that the Municipal school teachers
get 20% more and Government school teachers 15% more, then the
original proportions of the values of items are disturbed. Similarly,
if the numbers of teachers are not doubled in all categories, but in some
categories they are doubled and in others trebled, the proportions of
frequencies also change. In cases like these, where the proportions of
either values or frequencies change, weighted arithmetic average should be
used. It should be remembered that it is not the absolute size of the weight
that matters: it is the relative size of the weights that actually affects the average.
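The point that only the relative size of the weights matters can be illustrated with the town A figures of Example 27. The sketch below is illustrative only; `weighted_mean` and the variable names are assumptions, and the uneven rise (20% for Municipal and 15% for Government school teachers) follows the case described in the text.

```python
def weighted_mean(values, weights):
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)


salaries = [30, 50, 43, 35, 32]  # town A, Rs.
teachers = [25, 26, 20, 19, 10]  # town A, used as weights

base = weighted_mean(salaries, teachers)

# Doubling every salary doubles the average; proportions are unchanged.
doubled_salaries = weighted_mean([2 * s for s in salaries], teachers)

# Doubling every weight leaves the average unchanged.
doubled_teachers = weighted_mean(salaries, [2 * n for n in teachers])

# An uneven rise disturbs the proportions of the values.
uneven_salaries = [30 * 1.20, 50 * 1.15, 43, 35, 32]
uneven = weighted_mean(uneven_salaries, teachers)

print(round(base, 2), round(doubled_salaries, 2), round(doubled_teachers, 2))
print(round(uneven, 2))
```

With the uneven rise the weighted average becomes Rs. 42.4, which is not a simple multiple of the original Rs. 38.95.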

(d) When ratios, percentages or rates are being averaged. Suppose the
heights of four groups of persons are measured and it is found that
5% of the persons in group A, 10% in group B, 8% in group C, and
4% in group D have heights less than 50" and it is required to find
out the percentage of people in all the groups combined together
whose heights would be less than 50". Simple arithmetic average of
these percentages would give a misleading conclusion. The reason is
that we do not know the number of persons in each group. In such
cases we should presume certain numbers in each group, and then on
that basis calculate the weighted arithmetic average, which gives the
correct results. If, suppose, the number of persons in these groups
were respectively 50, 70, 75 and 55, the weighted arithmetic average
can be calculated by taking these frequencies as weights of the various
percentages.
The percentage of people with heights less than 50" (in all
the groups combined together) would be:

$$\frac{(5 \times 50)+(10 \times 70)+(8 \times 75)+(4 \times 55)}{50+70+75+55} = \frac{250+700+600+220}{250} = \frac{1770}{250} = 7.08\%$$
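The pooling of percentages can be sketched as follows. The variable names are assumptions for illustration; the percentages and the presumed group sizes are those used in the text.

```python
percent_below_50 = [5, 10, 8, 4]  # groups A, B, C, D
group_sizes = [50, 70, 75, 55]    # presumed numbers of persons

pooled = sum(p * n for p, n in zip(percent_below_50, group_sizes)) / sum(group_sizes)
simple = sum(percent_below_50) / len(percent_below_50)

print(round(pooled, 2))  # 7.08 per cent for the combined groups
print(simple)            # 6.75, the unweighted average of the percentages
```

The unweighted figure of 6.75% understates the combined percentage because the two largest groups, B and C, also have the highest percentages.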
(e) When it is desired to calculate the average of a series from the averages
of its component parts. We have already discussed in the section on
simple arithmetic average how the means of two or more component
series can be combined in one. The method involves the calculation
of the weighted arithmetic average of the different means, using the number
of items in each case as the weights. Thus, if the average of a series
is 20 and the number of items in it is 10 and the average of another series
is 25 and the number of items in it is 15, the combined average of the
two series would be equal to the weighted average of these two averages,
the weights being 10 and 15 respectively (the number of items in each
case). The weighted arithmetic average would be:

$$\frac{(20 \times 10)+(25 \times 15)}{10+15} = 23$$

The simple arithmetic average of the two averages would be
(20 + 25)/2 or 22.5. This is an inaccurate average, as if it is multiplied
by the total frequency (now 25) it would not give the correct aggregate.
If, however, we multiply the weighted arithmetic average, or 23,
by the total frequency, or 25, the product would be 575 which is the
total of the aggregates of the two series (200+375).
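The combination of component means is the same weighted-average calculation with the numbers of items as weights. A minimal sketch, in which the function name `combined_mean` is an assumption introduced for illustration:

```python
def combined_mean(means, counts):
    """Mean of the pooled series, from component means and their sizes."""
    return sum(mean * n for mean, n in zip(means, counts)) / sum(counts)


means = [20, 25]   # averages of the two series
counts = [10, 15]  # number of items in each series

combined = combined_mean(means, counts)
simple_of_means = sum(means) / len(means)

print(combined, combined * sum(counts))                # 23.0 575.0
print(simple_of_means, simple_of_means * sum(counts))  # 22.5 562.5
```

Only the weighted figure reproduces the true aggregate of 575 (200 + 375).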
Discriminate weighting. We have seen that in many cases the simple
arithmetic average and weighted arithmetic average differ considerably,
and the question that arises is, which of the two averages should be used
in such cases to represent the series? For this, it is necessary to study
the weights of the items in relation to their size. Sometimes it would
be found that big items in a series are associated with big weights and
small items with small weights. In such cases the weighted arithmetic
average would be more than the simple arithmetic average. Thus the
simple arithmetic average of the natural numbers 1, 2, 3, 4, 5, 6, 7, 8, 9,
and 10 is 5.5, and if these numbers are associated with weights whose
respective values are 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 the weighted arithmetic
average would be 7.0.
If, on the other hand, big items are associated with small weights
and small items with big weights, the weighted arithmetic average
would be less than the simple arithmetic average. If the weights in
the above case were respectively 10, 9, 8, 7, 6, 5, 4, 3, 2, and 1 the
weighted arithmetic average would be 4.0 whereas the simple arithmetic
average is 5.5.
Chance weighting. If weights are indiscriminately associated with
values or, in other words, if big items are associated with both big and
small weights and similarly small items with both small and big weights,
the weighted average and the simple average would not materially differ.
Thus if for the values of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 the weights
were respectively 10, 3, 6, 4, 5, 8, 2, 1, 9, and 7 the weighted arithmetic
average would be about 5.4 and the simple arithmetic average is 5.5.
We arrive at an important conclusion by the study of the above
illustrations. If weights are biased in one direction or the other, the simple
arithmetic average and the weighted arithmetic average would significantly differ
from each other. But if weights are left to chance (or if there is chance weighting)
and big and small weights are associated with both big and small values, the
weighted and the unweighted average would be almost equal.
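The three illustrations above (weights rising with the values, weights falling with the values, and weights assigned without pattern) can be reproduced with a short Python sketch. The helper `weighted_mean` and the variable names are assumptions introduced here; the numbers are those used in the text.

```python
def weighted_mean(values, weights):
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)


values = list(range(1, 11))        # natural numbers 1 to 10
simple = sum(values) / len(values)

rising_weights = list(range(1, 11))       # big items, big weights
falling_weights = list(range(10, 0, -1))  # big items, small weights
chance_weights = [10, 3, 6, 4, 5, 8, 2, 1, 9, 7]

rising = weighted_mean(values, rising_weights)
falling = weighted_mean(values, falling_weights)
chance = weighted_mean(values, chance_weights)

print(simple)            # 5.5
print(rising, falling)   # 7.0 4.0
print(round(chance, 2))  # 5.38, close to the simple average
```

The same weights (the numbers 1 to 10 in some order) are used in all three cases, so the differences arise only from how the weights are paired with the values.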
Rational weighting. The question that arises now is, should weights
be purposively selected or should they be left to chance? If there is
purposive weighting, and if it is biased, the weighted average would
give a fallacious conclusion and as such it should not be used. If
weights are left to chance there would not be a significant difference
between the weighted and the unweighted averages, and then there
is no point in weighting an average. Thus weights should be neither
biased nor should they be left to chance.
Weights should be rationalised, or we should use what are called
"rational weights." But this does not solve the problem. We shall have
to decide what we mean by rational or logical weights. We shall
leave this point to be discussed in the chapter on Index Numbers.
However, it should be mentioned here that rational weights would
differ in different types of enquiries and the nature and purpose of
the enquiry would have to be carefully studied in deciding what would
be the rational or logical weights in a particular case.
If rational weights are used, and if there is a difference between
the simple and weighted averages, the weighted average should invariably
be used as it would represent a series much better than a simple
arithmetic average in such a case.
WEIGHTED GEOMETRIC MEAN
Just as the arithmetic average can be weighted, similarly it is possible
to calculate a weighted geometric mean. Weighted geometric mean is the
nth root of the product of various values raised to the power of their respective
weights. Thus if $m_1$, $m_2$, $m_3$, etc., stand for the values of a variable,
$w_1$, $w_2$, $w_3$, etc., for their respective weights, and $n$ for the sum of
the weights, the weighted geometric mean is

$$g = \sqrt[n]{m_1^{w_1} \times m_2^{w_2} \times m_3^{w_3} \times \cdots}$$

or, for calculating the geometric mean with the help of logarithms,

$$g = \text{anti-log} \left[ \frac{\sum (\log m \times w)}{\sum w} \right]$$

The following example would illustrate the formula:


Example 28. From the following data calculate the weighted
index number, using the geometric mean.

    Group             Index Number    Weight
    Food                  125            7
    Clothing              133            5
    Fuel and light        141            4
    House rent            173            1
    Miscellaneous         182            3

Solution. Calculation of the weighted index number (Direct method)

    Group             Index Number   Weight   Logs. of      Weight x log.
                          (m)         (w)     Index Nos.
    Food                  125          7       2.0969         14.6783
    Clothing              133          5       2.1239         10.6195
    Fuel and light        141          4       2.1492          8.5968
    House rent            173          1       2.2380          2.2380
    Miscellaneous         182          3       2.2601          6.7803

                                     Σw = 20                  42.9129

Weighted geometric mean

$$g = \text{anti-log} \left[ \frac{\sum (\log m \times w)}{\sum w} \right] = \text{anti-log} \left( \frac{42.9129}{20} \right) = \text{anti-log } 2.1456$$

= 140 (to the nearest whole number)
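The logarithmic calculation of Example 28 can be sketched in Python with the standard `math` module. The function name `weighted_geometric_mean` is an assumption introduced for illustration; the index numbers and weights are those of the example.

```python
import math


def weighted_geometric_mean(values, weights):
    # Weighted mean of the logarithms, then the anti-logarithm
    log_sum = sum(w * math.log10(m) for m, w in zip(values, weights))
    return 10 ** (log_sum / sum(weights))


index_numbers = [125, 133, 141, 173, 182]
weights = [7, 5, 4, 1, 3]

g = weighted_geometric_mean(index_numbers, weights)
print(round(g))  # 140
```

Working with full-precision logarithms instead of four-figure tables gives a value a little below 140, which rounds to the same whole number.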
The method discussed above is the direct method. We have
seen in the calculation of simple geometric mean that a short-cut method
can be used by assuming a geometric mean. Weighted geometric
mean can also be calculated by the short-cut method. The deviations
of the logs. from the log. of the assumed geometric mean are multiplied
by their respective weights, and the sum of the products is divided
by the total of the weights. The resulting figure is added to the log.
of the assumed geometric mean. The anti-log. of this figure would
give the actual weighted geometric mean of the series.
Symbolically

$$g' = \text{anti-log} \left( \text{log. of assumed mean} + \frac{\sum d'w}{\sum w} \right)$$

where $d'$ stands for the deviations of the logs. from the log. of the
assumed mean.
Example 28 would be solved by this method as follows:

Short-cut method

    Group            Index    Weight   Logarithms   Deviations    Weight x
                     Number                         from log.     Deviation
                                                    mean 2.000
    Food              125       7       2.0969        .0969         .6783
    Clothing          133       5       2.1239        .1239         .6195
    Fuel and light    141       4       2.1492        .1492         .5968
    House rent        173       1       2.2380        .2380         .2380
    Miscellaneous     182       3       2.2601        .2601         .7803

                             Σw = 20                          Σd'w = 2.9129

Weighted geometric mean

$$g' = \text{anti-log} \left[ \text{assumed log-mean} + \frac{\sum d'w}{\sum w} \right] = \text{anti-log} \left[ 2.000 + \frac{2.9129}{20} \right] = \text{anti-log } 2.14564$$

= 140 (to the nearest whole number)

Thus the weighted index number is 140.
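The short-cut method for the weighted geometric mean can be compared with the direct method in a few lines of Python. The function names are assumptions introduced for illustration; the assumed log-mean of 2.000 and the data are those of Example 28.

```python
import math


def weighted_gm_shortcut(values, weights, assumed_log):
    # Deviations of each logarithm from the log of the assumed mean
    deviations = [math.log10(m) - assumed_log for m in values]
    correction = sum(d * w for d, w in zip(deviations, weights)) / sum(weights)
    return 10 ** (assumed_log + correction)


def weighted_gm_direct(values, weights):
    log_sum = sum(w * math.log10(m) for m, w in zip(values, weights))
    return 10 ** (log_sum / sum(weights))


index_numbers = [125, 133, 141, 173, 182]
weights = [7, 5, 4, 1, 3]

shortcut = weighted_gm_shortcut(index_numbers, weights, assumed_log=2.0)
direct = weighted_gm_direct(index_numbers, weights)

print(round(shortcut), round(direct))  # 140 140
```

The correction is added to the assumed log-mean because all the deviations in this example are positive.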
Weighted geometric mean, as we shall see later on, is very useful
for the calculation of index numbers. In most of the index numbers
calculated nowadays only weighted geometric mean is used. We shall
discuss the utility of weighted geometric mean, particularly in the
calculation of index numbers, in another chapter.
WEIGHTED HARMONIC MEAN
Weighted harmonic mean is calculated almost in the same manner
as weighted geometric mean. The reciprocals of items are multiplied
by their respective weights and the sum so obtained is divided by the
total of the weights. The reciprocal of the resulting figure is the weighted
harmonic mean of the series.
Symbolically

$$h' = \frac{1}{\left[ \dfrac{\sum rw}{\sum w} \right]}$$

where $h'$ stands for the weighted harmonic mean, $r$ for the reciprocals
of the various items and $w$ for their respective weights.
The following illustration would clarify the formula.
Example 29. Calculate the weighted harmonic average of the
following items:

    Items      Weight
      1           5
       .5        10
     10.0        20
     45.0        10
    175.0        15
       .01        2
      4.0        15
     11.2         8
Solution. Computation of the weighted harmonic mean

    Items     Reciprocals    Weight    Weight x Reciprocals
      1          1.0000         5            5.0000
       .5        2.0000        10           20.0000
     10.0         .1000        20            2.0000
     45.0         .0222        10             .2220
    175.0         .0057        15             .0855
       .01     100.0000         2          200.0000
      4.0         .2500        15            3.7500
     11.2         .0893         8             .7144

    Total                      85          231.7719

The weighted harmonic mean

$$h' = \text{Reciprocal of } \frac{\sum rw}{\sum w} = \text{Reciprocal of } \frac{231.7719}{85} = \text{Reciprocal of } 2.727 = 0.3667$$
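The weighted harmonic mean of Example 29 can be verified with a short Python sketch. The function name `weighted_harmonic_mean` is an assumption introduced for illustration; the items and weights are those of the example.

```python
def weighted_harmonic_mean(values, weights):
    # Total of the weights divided by the sum of weight x reciprocal,
    # which is the reciprocal of the weighted mean of the reciprocals
    return sum(weights) / sum(w / x for x, w in zip(values, weights))


items = [1, 0.5, 10.0, 45.0, 175.0, 0.01, 4.0, 11.2]
weights = [5, 10, 20, 10, 15, 2, 15, 8]

weighted_reciprocal_sum = sum(w / x for x, w in zip(items, weights))
h = weighted_harmonic_mean(items, weights)

print(round(weighted_reciprocal_sum, 2))  # 231.77
print(round(h, 4))                        # 0.3667
```

The very small item .01 contributes 200 of the 231.77, which shows how strongly the harmonic mean is governed by the smallest values.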
Questions
1. What is meant by measures of central tendency? What are the characteristics
of a good measure of central tendency?
2. Define arithmetic average, geometric mean, median and mode. Which of
these is most representative and why? (M. Com., Agra, 1945).
3. What is a statistical average? What are the desirable properties for an
average to possess? Which of the averages you know possess most of these
properties? (M. A., Delhi, 19H).
4. What are the algebraic properties of the arithmetic average?
5. Define weighted average. How does it differ from a simple average? Is a
weighted average better than a simple one? Give reasons.
6. Discuss critically the use of weighted mean in statistics.
(B. Com., Calcutta, 1937).
7. What are the algebraic properties of the geometric mean? Is it a better
average than median and mode? If so, how?
8. Compare and contrast the relative merits and demerits of the various measures
of central tendency which you know.
9. Write a note on the limitations of averages.
10. Is there any relationship between mean, median and mode in a moderately
asymmetrical frequency distribution? Discuss.
11. What is the purpose served by an average? Discuss the special advantages
attached to the different averages and illustrate their use. (B. Com., Agra, 19'.,)
12. On what considerations would you select an average for studying a particular
phenomenon? In which cases is the geometric mean better than other averages?
13. Comment on the following statements:
(a) Median is more representative than mean because it is less affected
by the values of extremes.
(b) True value of mode cannot be calculated exactly in a continuous
frequency distribution.
(c) The harmonic mean of a series of fractions is the same as the reciprocal
of the arithmetic mean of the series.
14. Statistics help collective agreement of wage adjustments. What data are
required for the consideration of a revision in wage rates in a factory? Which
average will you utilise and why? (M. Com., Allahabad, 1943).
15. Compare the merits and demerits of the median and the mode. In which
of the following problems would they be most useful:
(a) Skill measurement (b) Size of holdings (c) Comparison of
intelligence (d) Marks obtained in an examination (e) Heights and weights
of students. (M. A., Agra, 1949).
16. "The figure of 2.2 children per adult female was felt to be in some respects
absurd and the Royal Commission suggested that the middle classes be paid money
to increase the average to a rounder and more convenient number." (Punch).
Commenting on the above statement discuss the limitations of the arithmetic
average.
17. Given below are the marks obtained (out of 200) by the 15 students in an
interview held by a Public Service Commission. Calculate the simple arithmetic
average of the series.
Size of series
180 lao 108
160 IZ4 loa
l}l 115 101
146 IU 8S
143 tIO 68
18. The following data give a distribution obtained by tossing ten pennies
1024 times and recording the number of heads that appeared on each toss. What
is the average number of heads per toss?

Number of heads    Frequency    Number of heads    Frequency

       0                1              6              209
       1               16              7              118
       2               42              8               53
       3              126              9                4
       4              199             10                3
       5
19. Given the following frequency distribution, calculate the Arithmetic
Average.

Monthly Wages (Rs.)    Workers    Monthly Wages (Rs.)    Workers

12.5-17.5                 2       37.5-42.5                 4
17.5-22.5                22       42.5-47.5                 6
22.5-27.5                19       47.5-52.5                 1
27.5-32.5                14       52.5-57.5                 1
32.5-37.5                 3
(M. Sc., Agriculture, Punjab, 1943).
20. The following table gives the height of 350 men. Calculate the mean height of
the group.
Height in inches    Number of    Height in inches    Number of
                    Persons                          Persons
59 1 67 131
61 2 69 102
63 9 71 40
65 48 73 17
21. The frequency distribution of cost of production of Gur in rupees per maund
for different holdings in two districts is given below. Find the average cost in each
district, and test whether there is any significant difference.
Cost in rupees    District    District
per maund            A           B
 2- 3                9           1
 3- 4               32          10
 4- 5               37          34
 5- 6               21          23
 6- 7               13          21
 7- 8                7          14
 8- 9                5          10
 9-10                2           9
10-11                1           5
11-12                2           2
12-13                1           1

Total              130         130

(I. C. S., 39).
22. The frequency distribution below gives the cost of production of sugarcane
in different holdings. Obtain the Arithmetic Mean.
Cost      Frequency    Cost      Frequency
 2- 6         1        18-          52
 6-           9        22-          36
10-          21        26-          19
14-          47        30-34         3
(Indian Audit and Accounts Service Exam., 1941).
23. The following table gives the population of males at different age-groups of
the U. K. and India at the time of the census of 1931.

Age-group U. K. Lakhs India Lakhs


0- 5 18 214
5-10 19 258
10-15 10 222
15-20 18 157
20-25 16 145
25-30 14 Hi\
30-40 27 257
40-50 25 184
50-60 19 120
Above 60 17 100
Compare the average age of males in the two countries, and account for the
difference, if any. (B. Com., Alld. 1936, 41).
24. The following table gives the male population of Kanpur and Jaipur in 1931 :-
Age group Population of males in Thousands
(years) Kanpur Jaipur
0-- 5 14 9
5--10 13 8
10--15 13 8
15--20 13 7
20--30 33 15
30-40 29 12
40--50 17 9
50-60 7 6
60-80 4 4
Calculate the average age of males at Kanpur and Jaipur separately, and account
for the difference, if any. (B. Com. Allahabad, 1952).
25. Find the average marks of a student from the following table :-
Marks         No. of Students
Below 10            25
  "   20            40
  "   30            60
  "   40            75
  "   50            95
  "   60           125
  "   70           190
  "   80           240
26. Compute the arithmetic mean from the follOWing data :--
Salary in Rs. Frequency
Below 50 30
50-- 70 16
70--100 19
100--110 20
110--120 10
120 and over 5
27. Find the average wage of a labourer from the following table
Wage in      No. of       Wage in      No. of
Rs. above    labourers    Rs. above    labourers
    0           650          40           300
   10           500          50           275
   20           425          60           250
   30           375          70           100
28. The following table gives the number of persons with different incomes in the
U. S. A. during the year 1929.
Income in No. of Income in No. of
thousands persons thousands persons in
of dollars in lakhs of dollars lakhs
Under 1 13 10- 25 27
1- 2 90 25- 50 6
2-- 3 81 50- 100 2
3- 5 117 100--1000 2
5--10 66
Calculate the average income per head. (B. Com. Luck., 1939).
29. Make a frequency table having grades of wages with class intervals of two
Annas each from the following data of daily wages received by 30 labourers in a certain
factory and then compute the average daily wages paid to a labourer.
Daily wages in annas,
14, 16, 16, 14, 22, 13, 15, 24, 12, 23,
14, 20, 17, 21, 18, 18, 19, 20, 17, 16,
15, 11, 12, 21, 20, 17, 18, 19, 22, 23.
(B. A. Hons. Punjab, 1945).
30. The following table gives the monthly average of automobile production
in the United States for the years 1926-1932 (unit 1,000 cars).
Year    Production    Year    Production
1926      358.4       1930      279.7
1927      283.4       1931      199.1
1928      363.2       1932      114.2
1929 446.5
Calculate the average per cent of change per year.
31. The following is the table of the age of 30 adult. persons

Digits (Division of class intervals)

Years 1 2 3 4 5 6 8

20-29
°
2 1 2 2 1
7
1 1
9 Total
10
30-39 2 1 2 1 2 8

40-49 2 2 1 1 6
50-59 1 2 1 4

60-69 1. 1 2

Thus there are two persons of 23 years, one of 57 years and so on.
Find out the mean of the series
(a) by using only totals of class intervals.
(b) by using the entire data
32. A candidate obtains the following percentages in an examination: Sanskrit
75 ; Mathematics 84 ; Economics 56 ; English 78 ; Politics 57 ; History 55 ;
Geography 47. It is agreed to give double weight to marks in English, Mathematics and
Sanskrit. What are the weighted and unweighted means ?
33. Explain what is meant by weighted average, and discuss the effect of weighting.
Calculate (i) the unweighted mean of the prices in column III and (ii) the mean
obtained by weighting each price by the quantity consumed.
I II III
Articles of food Quantity consumed Price in rupee per
maund
Flour 11.5 mds. 5.8
Ghee 5.6 mds. 58.4
Sugar .28 mds. 8.2
Potato .16 mds. 2.5
Oil .35 mds. 20.0
(M. A. Cal., 1937).
34. The following table gives the number of employees and their monthly earnings
in two factories of a particular city :

                       A                            B
Description     No. of      Monthly          No. of      Monthly
of workmen      employees   earnings (Rs.)   employees   earnings (Rs.)
(a)                 3           800              2           750
(b)                20           145             10           150
(c)                15            50             15            60
(d)                25            30             25            50
(e)                80            35             40            40
(f)               250            20            120            20
Compare the weighted averages.

35. Suppose that an automobile makes a 200 mile trip, covering the first 100
miles at the rate of 50 miles an hour and the second 100 miles at the rate of 40 miles
an hour. What is its average speed ?

36. A railway train runs for 30 minutes at a speed of 40 miles an hour and then,
because of repairs of the track, runs for 10 minutes at a speed of 8 miles an hour, after
which it resumes its previous speed and runs for 20 minutes except for a period of 2
minutes when it had to run over a bridge with a speed of 30 miles per hour. What
is its average speed ?
37. The following table indicates the increase in cost of living over July 1946,
for a working class family as at 1st January 1955, and the weights assigned to various
groups.

Group Percentage increase Weights


Food 29 7.5
Rent 54 2.0
Clothing 97.5 1.5
Fuel and lighting 75 1.0
Other items 75 0.5
Find out the weighted average of the increase in cost of living.
(B. Com. Allahabad, 1938).

38. The table shows the age distribution of married females according to sample
census of 1941 in the Baroda State.

Age      Number of married      Age      Number of married
         females                         females
0- 5 3 35-40 1292
5-10 31 40-45 963
10-15 410 45-50 762
15-20 1809 50-55 531
20-25 2446 55-60 317
25-30 2223 60-65 156
30-35 1723 65-70 59
70-75 37

Calculate the median age of married females and also the two quartiles.
(I. A. & A. S., etc., Exam., 1942)
39. Calculate the values of the median and the two quartiles for the following :-

Limits for percentage recovery      Factories in India
of sugar on cane                    (1935-36)
8.0-8.3 2
8.2- 5
8.4- 4
8.6- 11
8.8- 11
9.0- 11
9.2- 13
9.4- 10
9.6- 7
9.8- 6
10.0- 3
10.2- 1
10.4-10.6 1
85
(M. A. Punjab University, 1943).

40. Calculate the mean and median for the following distribution.

Weight of boys in      Number      Weight of boys in      Number
a certain class                    a certain class
100-104 4 140-144 500
105-109 14 145-149 430
110-114 60 150-154 260
115-119 138 155-159 128
120-124 206 160-164 66
125-129 298 165-169 28
130-134 380 170-174 12
135-139 450
2974
(Indian Audit and Accounts Service Exam., 1938)

41. The following table gives the distribution of the male and female population
of a certain area in India. By finding the mean age, the median age, and the upper
and lower quartile ages, make comments on the age distribution of the two sexes
in the area :-

Age group Male Female


0-9 2756 2787
10-19 2124 2032
20-29 1977 1724
30-39 1481 1485
40-49 1021 1022
50-59 610 1579
60-69 245 269
70-79 67 78
80-89 16 20
90-99 3 4
Total 10,000 10,000
(I. C. S., 38).
42. Calculate the average, median, and upper and lower quartile ages in the
following table :-
Age-group Population in thousand


(1881) (1931)
0- 4 3,520 3,280
5- 9 ~160 3~00
10-19 5,340 7,200
20-29 4,560 6,640
30-39 3,420 5,980
40-49 2,660 5,240
50-59 1,900 3,780
60-69 1,320 2,440
70-79 600 1,220
80 and over 120 320
(M. A., Agra, 1940).
43. Determine the quartiles and the median for the following table :-
Income                           No. of persons
Below Rs. 30                           69
Rs. 30 and below Rs. 40               167
Rs. 40  "    "   Rs. 50               207
Rs. 50  "    "   Rs. 60                65
Rs. 60  "    "   Rs. 70                58
Rs. 70  "    "   Rs. 80                27
Rs. 80 and over                        10
Total                                 603
(Bombay, 1942).
44. The following table classifies the she-buffaloes of India in 1940 according to
the yield of milk per day. Calculate from the data the mean and the median yield
of milk per she-buffalo (and its co-efficient of variation) :-
Yield per day in lbs. No. of she-buffaloes in
thousands
Upto 1 114
Above 1 to 2 2,005
2 to 3 7,706
3 to 4 4,590
4 to 5 2,080
5 to 6 240
6 to 7 3,580
Total 20,315
(P. C. S., 43).
45. Find the median and the quartiles
Amount of wages                          Number of workers
                                         receiving such rate of wages
Not exceeding 8 shillings                      85
Over  8 sh. but not exceeding 10 sh.           65
Over 10 sh.  "   "      "     12 sh.           59
Over 12 sh.  "   "      "     14 sh.           50
46. Amend the following table, and locate the median from the amended table.
Also measure the magnitude of the median so located :-
Size Frequency Size Frequency
10-15 10 30-35 28
15-17.5 15 35-40 30
17.5-20 17 45 and onwards 40
20-30 25
(B. Com. Allahabad, 1942).
47. The following table gives the marks obtained by 65 students in Statistics in
a certain examination :-
Examination marks Number of students
More than 70% 7
60% 18
50% 40
40% 40
30% 63
20% 65
Calculate the median of the above series.
48. Find out the median of the following series
Wages No. of labourers
Rs.
60-70 5
50-60 10
40-50 20
30--40 5
20-30 3
49. The following is the age distribution of candidates appearing at the
Matriculation and Intermediate Arts examinations of the Patna University in 1937.
Age in years     12-  13-  14-  15-  16-  17-  18-  19-  20-  21-  22-  Total
Matriculation      5   48  189  303  522  980  981  794  515  474    X   4811
Intermediate       X    X    X    5   45   87  127  150  155  127  175    871
Compare the median and modal ages of the Matriculation candidates with those
of I. A. candidates. (M. A. Patna, 1940)
50. The following table shows the frequency with which profits are made. What
is the Mode ?
                                                    Frequency
Exceeding Rs. 3,000 and not exceeding Rs.  4,000        83
    "      "  4,000  "   "      "      "   5,000        27
    "      "  5,000  "   "      "      "   6,000        25
    "      "  6,000  "   "      "      "   7,000        50
    "      "  7,000  "   "      "      "   8,000        75
    "      "  8,000  "   "      "      "   9,000        38
    "      "  9,000  "   "      "      "  10,000        18
51. Find the modal wage group from the following table :
Wages in Rupees No. of labourers
Above 30 520
40 470
50 399
60 210
70 105
" 80 45
90 7
52. Find out the median and the mode for the following table
No. of days absent      No. of students
Less than  5                  29
  "    "  10                 224
  "    "  15                 465
  "    "  20                 582
  "    "  25                 634
  "    "  30                 644
  "    "  35                 650
  "    "  40                 653
  "    "  45                 655
53. Find the median and mode from the following table :
Class      Frequency      Class      Frequency
 0- 3          4          18-20         24
 3- 6          8          20-24         14
 6-10         10          24-25         16
10-12         14          25-28         11
12-15         16          28-30         10
15-18         20          30-36          6
54. Find the modal wage from the following data :
Weekly Wage No. of wage-earners
Sh. d. Sh. d.
12 6 to 17 6 4
17 6 22 6 44
22 6 27 6 38
27 6 32 6 28
32 6 .. 37 6 6
37 6 42 6 8
42 6 47 6 12
47 6 .. 52 6 2
52 6 .. 57 6 2
(B. Com., Rajputana, 1949)
55. Find out the mode of the following series :-
Size of item Frequency Size of item Frequency
0-9.99 10 40-49.99 11
10-19.99 14 50-59.99 13
20-29.99 16 60-69.99 17
30-39.99 14 70-79.99 13
56. Calculate the geometric mean of the following figures : -
5, 10, 192, 14,374, 20,498, 1,20,674, 15,491

57. Compute the weighted geometric average of relative prices of the following
commodities for the year 1939 (Base year 1938-price 100) :-

Weight
Commodity Relative Price (value produced in 1938)
Corn 128.8 1,385
Cotton 62.4 819
Hay 117.7 842
Wheat 99.0 561
Oats 130.9 408
Potatoes 143.5 194
Sugar 125.6 142
Barley 150.2 100
Tobacco 101.1 103
Rye 116.2 25
Rice 117.5 17
Oil seeds 78.7 29
How does it differ from the unweighted geometric mean, and why ?
(B. Com., Alld. 1943)

58. The following table gives index numbers for various items entering the cost
of living. Find an index of the cost of living by computing a weighted average of
these items. The weights to be used are also given in the table :-
Table
Items Index Weight
1. Clothing 77.3 13
2. Food 74.5 43
3. Fuel and light 85.8 6
4. Housing 64.6 18
5. Sundries 92.5 20
59. Compute the geometric mean of the following series
Marks No. of students
0-10 5
10-20 7
20-30 15
30-40 25
40-50 8
60
60. The annual incomes of fifteen families are given below in rupees :-
80, 2500, 90, 1200, 1450, 7200, 120, 1060, 150, 480, 360, 96, 200, 520 and 60.
Calculate the Harmonic Mean.
61. The following table gives (a) the total number of persons possessing
holdings of different sizes and (b) the total area of land comprised in holdings of different
sizes in U. P. during the year ending on 30th June, 1945 :
                                        Total number       Total area in
Size of holdings in acres               of persons in      thousands
                                        thousands          of acres
Not exceeding .5                           2,643               925
Exceeding  .5 but not exceeding  1         1,696             1,556
    "       1  "   "      "      2         2,205             3,361
    "       2  "   "      "      3         1,430             3,373
    "       3  "   "      "      4           992             3,458
    "       4  "   "      "      5           703             3,150
    "       5  "   "      "      6           515             2,817
    "       6  "   "      "      7           378             2,446
    "       7  "   "      "      8           283             2,112
    "       8  "   "      "      9           216             1,830
    "       9  "   "      "     10           171             1,617
    "      10  "   "      "     12           206             2,264
    "      12  "   "      "     14           138             1,776
    "      14  "   "      "     16            96             1,424
    "      16  "   "      "     18            68             1,252
    "      18  "   "      "     20            51               972
    "      20  "   "      "     25            70             1,570
    "      over 25                           115             5,310
Grand Total                               12,276            41,113
(i) Calculate the average size of holdings in the U. P.
(ii) Assuming the minimum size of an economic holding to be 10 acres-
(1) Calculate the percentage of the area under uneconomic holdings in 1945 in
the U. P.
(2) Calculate the percentage of persons having uneconomic holdings in the
U. P. in 1945. (P. C. S. 1951)
62. (a) Define a 'weighted mean.'
If several sets of observations are combined into a single set, show that the mean
of the combined set is the weighted mean of the several sets.
(b) The number of asthma sufferers whose first attacks came at various ages is
given in the following table. Calculate the mean age at the first attack by any method.
TABLE
Age at first attack      Number of cases
 0- 5                         298
 5-10                         113
10-15                          64
15-20                          61
20-25                          70
25-30                          81
30-35                          77
35-40                          64
40-45                          53
45-50                          40
50-55                          35
55-60                          24
60-65                          20

(I. A. S. 1955)
63. Find the mean, mode, standard deviation and co-efficient of skewness for
the following :-
Year under 10, 20, 30, 40, 50, 60.
No. of persons 15, 32, 51, 78, 97, 109.
(P. C. S. 1952)
64. What are the desiderata for a satisfactory average? Point out the special
characteristics of the arithmetic mean, the median and the geometric mean.
Explain the step-deviation method for finding out the arithmetic mean of a
frequency distribution. Derive the useful formula and apply it to find the arithmetic
mean of the distribution.
Variate 5, 10, 15, 20, 25, 30, 35, 40, 45, 50.
frequency 20, 43, 75, 67, 72, 45, 39, 9, 8, 6.
(P. C. S. 1954)
65. The following table gives the monthly income of 24 families in a certain
locality :-
Serial No. of    Monthly income    Serial No. of    Monthly income
the family       in Rupees         the family       in Rupees
     1                60                13               96
     2               400                14               98
     3                86                15              104
     4                95                16               75
     5               100                17               80
     6               150                18               94
     7               110                19              100
     8                74                20               75
     9                90                21              600
    10                92                22               82
    11               280                23              200
    12               180                24               84
Calculate the arithmetic average, the median and the mode of the above incomes.
Which average would represent the above series the best? Give reasons.
(P. C. S. 1955),
66. Figures concerning the number of deaths in two towns in a particular year are
given below :-
                      Town A                       Town B
Age-group     No. of persons   Deaths      No. of persons   Deaths
in years      living                       living
 0-10               500          100           12,000        4,800
10-20             3,000          150            6,000          360
20-30             7,000          200            9,000          180
30-40            10,000          300           25,000          250
over 40          19,500          750           48,000          576
Total            40,000        1,500         1,00,000        6,166
Compare the health conditions in both towns.
(P. C. S. 1955)
67. You are given the following statistics of population and unemployment in :-
(a) Your country as a whole for a standardised age distribution.
(b) The local administrative area in which you live.
Calculate (i) the standardised unemployment rate in the country as a whole, (ii)
the standardised rate of unemployment in the local area and (iii) the crude rate of
unemployment in the local area.
                                 Age (Years)
                        16-30   30-45   45-60   60-75   Total
Standard population
  Age constitution       250     350     300     100    1,000
  Unemployment rate
  per cent                 5       8      12      15      -
Local population
  Age constitution       300     300     350      50    1,000
  Unemployment rate
  per cent                 4       9      12      20      -
(P. C. S. 1956).
68. Fifty items sold in Department A of the Corner Store had a mean price of 30
rupees. Seventy-five items sold in Department B had a mean price of 20 rupees. The
mean price of commodities sold in Departments A and B was 24 rupees. Is it right?
69. If x1 and x2 are two positive values of a variate, prove that their geometric
mean is equal to the geometric mean of their arithmetic and harmonic means.
70. (a) An examination candidate's percentages are : English, 73; French, 82;
Mathematics, 57; Science, 62; History, 60. Find the candidate's weighted mean if
weights of 4, 3, 3, 1, 1 respectively are allotted to the subjects.
(b) The average percentages for the same examination were 57, 52, 48, 55, 50
for the above subjects respectively. Find the weighted mean for the whole examination.
71. "The inherent inability of the human mind to grasp in its entirety a large body
of numerical data compels us to seek relatively few constants that will adequately des-
cribe the data."-R. A. Fisher.
Comment.
72. Find the average ages of men and women blood donors from the following
data : -
Age, years 10-19 20-29 30-39 40-49 50-59 60-69
Frequency, Men 3016 6894 9229 5714 3575 1492
Women 7845 16,008 13,107 9685 6374 2137
Age years 70-79 80-89 90 and over
Frequency, Men 170 9 1
Women 173 9
73. A candidate obtains the following percentages in an examination : Latin,
75 ; Mathematics, 84 ; French, 56 ; English, 78 ; Science, 57 ; History, 54 ;
Geography, 47. It is agreed to give double weight to the marks in English, Mathematics
and Latin. What is his weighted mean ?
74. The frequency distributions of real income in rupees of the employees of a
big industrial concern, in two different periods, are as given below
                       Frequency
Income in Rs.    Period 1    Period 2
0-50 90 200
50-100 150 400
100-150 100 120
150-200 80 100
200-250 70 150
over 250 10 30

500 1,000
The total income of 10 employees in the frequency class 'over 250' in Period 1 is
Rs. 3,000 and that of 30 employees in Period 2 is Rs. 18,000.
(a) Compute the mean and median incomes for the two periods.
(b) Write a very brief note on the relative economic conditions of the employees
in the two periods, supporting your statements by analysis of the given
data, if necessary.
(c) Every employee belonging to the top 25 per cent of the earners is required to
pay 1 per cent of his income to a worker's relief fund. Estimate the
increase in contributions to this fund from Period 1 to Period 2. (I. A. S. 1958)

75. The following are the monthly salaries in rupees of 30 employees of a firm :-
139, 126, 114, 100, 88, 62, 77, 99, 103, 144, 148, 63, 69, 148, 132, 118, 142,
116, 123, 104, 95, 80, 85, 106, 123, 133, 140, 134, 108, 129.
The firm gave bonus of Rs. 10, 15, 20, 25, 30 and 35 for individuals in the res-
pective salary groups-Exceeding 60 but not exceeding 75, exceeding 75 but not ex-
ceeding 90 and so on upto exceeding 135 but not exceeding 150. Find out the average
bonus paid per employee. (B. Com., B. H. U.)
76. For a certain group of 'Saree' weavers of Banaras, the median and quartile
earnings per week are Rs. 44.3. Rs. 43.0 and Rs. 45.9 respectively. The earnings for
the group range between Rs. 40 and Rs. 50. Ten percent of the group earn under
Rs. 42 per week, 13 percent earn Rs. 47 and over and 6 percent Rs. 48 and over. put
these data into the form of a frequency distribution and obtain an estimate of the mean
wage. (P. C. S., 19~6).
77. From a frequency distribution of marks in Accounts of 100 students, the mean
was found to be 35. Later it was discovered that the marks 35 were mis-read as 25.
Find the correct mean.
78. From the following data. find the missing frequency.
No. of Tablets : 4-8, 8-12, 12-16, 16-20, 20-24, 24-28, 28-32, 32-36, 36-40
No. of Persons cured 11 13 16 14 9 17 6 4
The average number of tablets given to cure fever was 20.

79. Calculate the Median, Quartiles, 6th Decile and 70th Percentile from the
following data :-
Marks less than 80 70 60 50 40 30 20 10
No. of Students 100 90 80 60 52 20 13
(B. Com., Raj., 1951).
80. (a) From the data given below, find the mode:

Age              20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60

No. of persons   70  80  180  150  120  70  10

(b) If the mode and the median of a moderately asymmetrical series are 16
inches and 20.2 inches respectively, compute the most probable median.
(B. Com., Delhi, 1960).

81. Recast the following cumulative table into the form of an ordinary
frequency distribution and determine the value of Mode by using the formula
Mean - Mode = 3 (Mean - Median).
No. of days absent    No. of students    No. of days absent    No. of students
Less than  5                29           Less than 30
  "    "  10               224             "    "  35
  "    "  15               465             "    "  40
  "    "  20               582             "    "  45
  "    "  25               634
(B. Com., Luckno1Jl, 1957)
82. A taxicab drives from a plain-town to a hill-station, 60 miles distant, at a
mileage rate of 10 miles per gallon of petrol and on the return trip at 15 miles per gallon.
Find the harmonic mean rate of mileage per gallon. Verify that this is the proper
average in this particular case.
83. An aeroplane flies around a square the sides of which measure 100 miles
each. The aeroplane covers at a speed of 100 miles per hour the first side, at 200 miles
per hour the second side, at 300 miles per hour the third side and at 400 m.p.h. the
fourth side. What is the average speed of the aeroplane around the square ?
84. A train moves first 10 miles at the rate of 10 m.p.h., next 20 miles at the rate
of 30 m.p.h., and then due to repairs in the track another 5 miles at the speed of 5
miles per hour. It covers the last 15 miles at the rate of 10 miles an hour. Find the
average speed of the train per hour.
85. The mean wage of 50 labourers working in a factory is Rs. 38. The mean
wage of 30 labourers working in the morning shift is Rs. 40. Find the mean wage
of remaining 20 labourers working in Evening shift.
86. The teachers of statistics reported mean examination marks of 37.5, 41 and
42 in their classes which consisted of 32, 25 and 17 students respectively. Determine
the mean marks for all the classes taken together.
87. The following table gives the distribution of the average weekly wages of
100 workers in a factory. Calculate (i) Average weekly total wage bill of these
workers; (ii) The weekly wage of a worker whose wage is greater than that of
75% workers.
Weekly wages 16-20 21-25 26-30 31-35 36-40 41-45 46-50
No. of workers 7 12 It 8
Weekly wages 56 -60
No. of workers
88. The monthly incomes of 8 families in rupees in certain locality are given
below. Calculate the Mean, the Geometric mean and Harmonic Mean, and confirm
that the -relationship a > g > h holds true.

Family A IB C D E I- F I G I H
Income : (Rs. J 70 r 10 500 1 75 8 1 25 0 1 8 I 42
(Sagar, B. Com., II, 1965)
89. Calculate 3, 4 and 5 yearly moving averages from the following data :-
Years 1951 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65
Value 18 I 20 I 22 I 25 I 30 I 37 I 38 I 38 I 40 I 43 I 45 I 4 6 r 4 8 I 49 I F
90. The age-distribution of the members of a certain children's club is as
follows:
Age on last birthday
(in yrs.) 6 71 8
Frequency
4
5 r151
-9 +-1-l....;8~.!.-3..!-5--'-1 9110 III 112
-4-1-+--3-"1---',1C-1-5-~1-7-~

There is a member A such that there are twice as many members older than A
as there are members younger than A. Estimate his age (in years up to two decimals).
(M. A. Eco., Delhi, 1963).
91. The arithmetic mean, the mode and the median of a group of 75 observations
were calculated to be 17, H, 19 respectively. It was later discovered that one ob-
servation was wrongly read as 43 instead of the correct value 53. Examine to what
extent the calculated values of the three averages will be affected by the discovery
of this error. (M. A. Eco., Delhi, 1963).
92. If the mode and the median of a moderately asymmetrical series are 16.6 and
15.6 respectively, what would be its most probable median? (B. Com., Agra, 1960).
93. Under what conditions is a weighted average (i) equal to the simple average, (ii)
greater than the simple average and (iii) less than the simple average? Illustrate your answer
with the help of examples.
94. (a) A train starts from rest and travels successive quarters of a mile at
average speeds of 12, 16, 24 and 48 miles per hour. The average speed over the whole
mile is 19.2 m.p.h. and not 25 m.p.h.
(b) The price of a commodity increased by 5 percent from 1954 to 1955,
by 8 percent from 1955 to 1956 and by 77 percent from 1956 to 1957. The average
increase from 1954 to 1957 is quoted as 26 percent and not 30 percent.
Explain the two statements as you would to a layman and verify the
arithmetic mean. (M. Com. Agra, 1962)
95. If the arithmetic mean of two numbers is 20 and their geometric mean is 16, find
the harmonic mean.
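The relation between the three means of two positive values (the geometric mean is the geometric mean of the arithmetic and harmonic means) can be checked numerically. The sketch below is only an illustration in Python; the function names are ours, and the pair 8 and 32 is chosen because its arithmetic mean is 20 and its geometric mean is 16.

```python
import math

def arithmetic_mean(a, b):
    """Simple arithmetic mean of two values."""
    return (a + b) / 2

def geometric_mean(a, b):
    """Geometric mean of two positive values."""
    return math.sqrt(a * b)

def harmonic_mean(a, b):
    """Harmonic mean of two positive values."""
    return 2 * a * b / (a + b)

x1, x2 = 8.0, 32.0                      # illustrative pair: A = 20, G = 16
A = arithmetic_mean(x1, x2)
G = geometric_mean(x1, x2)
H = harmonic_mean(x1, x2)

# For two positive values G squared equals A times H, so H = G*G / A.
print(A, G, H)
print(math.isclose(G * G, A * H))
print(math.isclose(H, G * G / A))
```

Because the relation holds for any two positive values, the harmonic mean can be obtained from the other two means without knowing the values themselves.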
Measures of Dispersion 10
Need and meaning. In the preceding chapters we have already
discussed why it is necessary to tabulate and classify statistical series
and to condense them into a single figure called average. The average
as we have already seen has its own limitations and even an ideal average
can represent a series only "as best as a single figure can". No doubt
averages have a very great utility in statistical analysis but they fail
to reveal the entire story of a phenomenon. There may be a dozen
series whose averages may be identical but which may differ from each
other in a hundred ways. Obviously in such cases further statistical
analysis of the data is necessary so that these differences between various
series may also be studied and accounted for. If this is done statistical
analysis would be more accurate and we shall be more confident of our
conclusions.
Suppose there are three series of nine items each as follows :

          Series A    Series B    Series C

             40          36           1
             40          37           9
             40          38          20
             40          39          30
             40          40          40
             40          41          50
             40          42          60
             40          43          70
             40          44          80
Total       360         360         360
Mean         40          40          40

In the first series the mean is 40 and the value of all the items
is identical. The items are not at all scattered, and the mean fully
discloses the characteristics of this distribution. However, in the
second case though the mean is 40 yet all the items of the series have
different values. But the items are not very much scattered as the
minimum value of the series is 36 and the maximum is 44. In this
case also the mean is a good representative of the series. Here the mean
cannot replace each item yet the difference between the mean and
other items is not very significant. In the third series also, the mean
is 40 and the values of different items are also different, but here the
values are very widely scattered and the mean is 40 times the
smallest value of the series and half of the maximum value. Obviously
the average does not satisfactorily represent the individual items in
this group. In order to have a correct analysis of these three series,
it is essential that we study something more than their averages because
the averages are identical and yet the series differ widely from each other in
their formation. The scatter in the first case is nil, in the second case
it varies within a small range, while in the third case the values range
over a very big span and are widely scattered. It is evident from
the above that the extent of the scatter round an average should
also be studied to throw more light on the composition of a series. The
name given to this scatter is dispersion.
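The comparison of the three series can be checked with a short computation. The sketch below is only an illustration in Python (the names `series` and `summarise` are ours, not part of the text); it shows that the three means coincide while the spread of the items does not.

```python
# Three series from the text: identical means, very different scatter.
series = {
    "A": [40] * 9,
    "B": [36, 37, 38, 39, 40, 41, 42, 43, 44],
    "C": [1, 9, 20, 30, 40, 50, 60, 70, 80],
}

def summarise(values):
    """Return (mean, smallest item, largest item) of a list of numbers."""
    mean = sum(values) / len(values)
    return mean, min(values), max(values)

for name, values in series.items():
    mean, low, high = summarise(values)
    print(f"Series {name}: mean = {mean}, items run from {low} to {high}")
```

The mean alone cannot tell the three series apart; the smallest and largest items already reveal how differently they are formed.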
Dispersion in a general sense. Dispersion, thus, refers to the variability
in the size of items. It indicates that the size of items in a series is not
uniform. The values of various items differ from each other. If this
variation is substantial dispersion is said to be considerable and if the
variation is little dispersion is insignificant. This is rather a general sense
in which this term is used. If there is a series in which the scatter of the
values is much, say, from 100 to 1000, this series would be said to have
more dispersion than the one in which the values range only from 100
to 200.
Dispersion in a precise sense. The term dispersion not only gives a
general impression about the variability of a series, but also a precise
measure of this variation. Usually in a precise study of dispersion, the
deviations of size of items from a measure of central tendency are found
out and then these deviations are averaged, to give a single figure
representing the dispersion of the series. This figure can be compared
with similar figures representing other series. It goes without saying
that such comparisons would give a better idea about the formation of
series than a mere comparison of their averages.
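The procedure just described can be sketched in a few lines of Python. This is only an illustration under two assumptions of ours: the arithmetic mean is taken as the measure of central tendency, and the signs of the deviations are ignored before averaging (otherwise deviations from the mean would always total zero). The function name `average_deviation` is hypothetical.

```python
def average_deviation(values):
    """Average of the absolute deviations of the items from their mean."""
    mean = sum(values) / len(values)
    deviations = [abs(v - mean) for v in values]   # signs ignored
    return sum(deviations) / len(deviations)

# Series B and C from the illustration above; both have a mean of 40.
series_b = [36, 37, 38, 39, 40, 41, 42, 43, 44]
series_c = [1, 9, 20, 30, 40, 50, 60, 70, 80]

print(average_deviation(series_b))
print(average_deviation(series_c))
```

The single figure obtained for Series C is ten times that for Series B, which is the kind of comparison that the averages by themselves cannot provide.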

Averages of second order. Since for a precise study of dispersion we
have to average deviations of the values of the various items, from their
average, various measures of dispersion are called Averages of the Second
Order. We have seen earlier that mean, median, mode, geometric mean
and harmonic mean, etc., are all averages of the first order. Since in the
calculation of measures of dispersion we average values derived by the
use of the averages of the first order, these measures are called averages
of the second order.

Absolute and relative dispersion. Dispersion or variation can be
expressed either in terms of the original units of a series or as an abstract
figure like a ratio or percentage. If we calculate dispersion of a series
relating to the income of a group of persons in absolute figures, it will
have to be expressed in the unit in which the original data are, say rupees.
Thus we can say that the average income of a group of persons is Rs. 120 per
month and the dispersion is Rs. 20. This is called Absolute Dispersion.
If, on the other hand, dispersion is measured as a percentage or ratio
of the average it is called Relative Dispersion. It is not expressed in the
unit of the original data. In the above case the average income would be
referred to as Rs. 120 per month and the relative dispersion as 20/120 or .167
or 16.7%. In a comparison of the variability of two or more series, it is
the relative dispersion that has to be taken into account, as the absolute
dispersion may be erroneous or unfit for comparison if the series are
originally expressed in different units.
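The conversion from absolute to relative dispersion is a single division. A minimal sketch in Python follows, using the figures of the example (a dispersion of Rs. 20 on an average of Rs. 120); the function name `relative_dispersion` is ours.

```python
def relative_dispersion(absolute_dispersion, average):
    """Absolute dispersion expressed as a ratio of the average."""
    return absolute_dispersion / average

ratio = relative_dispersion(20, 120)    # Rs. 20 on an average of Rs. 120
print(round(ratio, 3))                  # ratio form
print(round(ratio * 100, 1))            # percentage form
```

Since the rupee unit cancels in the division, the result is a pure number and can be set beside the relative dispersion of a series measured in any other unit.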
Measures of dispersion
The following measures of dispersion are in common use--
1. Range
2. Inter-Quartile-Range
3. Semi-Inter-Quartile-Range or Quartile Deviation
4. Average Deviation or Mean Deviation
5. Standard Deviation or Root-Mean-Square Deviation taken
from the mean.
We shall discuss them in turn.
RANGE
Range is the simplest possible measure of dispersion. It is the
difference between the values of the extreme items of a series. Thus if in a series
relating to the weight measurements of a group of students the lightest
student has a weight of 90 pounds and the heaviest of 240 pounds the
value of range would be 150 pounds. This figure indicates the variability
in the weights of students. The distance on the scale measuring 150
pounds would include the weight of every student. If the data are given
in the shape of continuous frequency distribution, range is the difference
between the lower limit of the smallest class and the upper limit of the
biggest class.
Range as calculated above is an absolute measure of dispersion which
is unfit for purposes of comparison, if the distributions are in different
units. For example the range of the weights of students cannot be
compared with the range of their height measurements as the range of
weights would be in pounds and that of heights in inches. Sometimes,
for purposes of comparison, a relative measure of range is calculated.
If range is divided by the sum of the extreme items, the resulting figure
is called "The Ratio of the Range" or "The Coefficient of the Scatter."
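The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the intermediate weights in the sample list are made up (only the extremes of 90 and 240 pounds come from the text).

```python
def value_range(values):
    """Absolute range: largest item minus smallest item."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range: the range divided by the sum of the extreme items."""
    low, high = min(values), max(values)
    return (high - low) / (high + low)


# Only the extremes (90 and 240) are taken from the text; the rest are invented.
weights_lb = [90, 112, 135, 150, 168, 240]

print(value_range(weights_lb))           # 150 pounds
print(coefficient_of_range(weights_lb))  # 150 / 330, a pure number
```

Because the coefficient is a ratio, it carries no unit and can be set beside the coefficient of a series of heights in inches.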
Merits, demerits and uses of range
A good measure of dispersion should possess the same qualities
which were laid down in the last chapter for a good measure of central
tendency. A good measure of dispersion should be rigidly defined,
easily calculated, readily understood and further, should be capable of
algebraic treatment and should not be affected much by the fluctuations
of sampling.
The only merits possessed by range are that it can be easily calculated
and readily understood. As against these, there are many drawbacks from
which it suffers. The most important point against range is that it is
affected very greatly by fluctuations of sampling. Its value is never
stable and it varies from sample to sample. In a class where normally
the heights of students range from 60 to 72 inches, if a dwarf whose
height is 36 inches is admitted, the range would shoot up from 12 inches to
36 inches. Thus a single variation in the value of an extreme item
affects the value of the range. Range is not based on all the observations
of the series. If the heights of the shortest and tallest students remain
unchanged and if the heights of all other students are changed, range
would remain unaffected. Thus range does not take into account the
composition of a series or the distribution of items within the extremes.
The range of a symmetrical and an asymmetrical distribution can be
identical, although the scatter of items within the two may be quite different.
In this way we find that range is a very unsatisfactory measure of
dispersion and should be used with extreme caution.
However, range as a measure of dispersion is commonly used in
some fields, particularly those where the variation is not much. In
quality control of manufactured products, range is used to study the
variation in the quality of the units manufactured. Even with the most
modern mechanical equipment there may be a small, almost insignificant,
difference in the different units of a commodity manufactured.
Thus, if a company is manufacturing bottles of a particular type, there
may be a slight variation in the size or shape of the bottles manufactured.
In such cases a range is usually determined, and all the units which fall
within these limits are passed as all right while those which fall outside
the limits are rejected. Variations in money rates and rates of exchange
etc., are also studied with range. However, it should never be forgotten
that range is a very rough measure of dispersion and is entirely
unsuitable for precise and accurate studies.
INTER-QUARTILE RANGE

Just as in case of range the difference of extreme items is found,
similarly, if the difference in the values of two quartiles is calculated,
it would give us what is called the Inter-Quartile Range. Inter-quartile
range is also a measure of dispersion. It has an advantage over range,
inasmuch as it is not affected by the values of the extreme items. In
fact 50% of the values of a variable are between the two quartiles and as
such the inter-quartile range gives a fair measure of variability. However,
the inter-quartile range suffers from the same defects from which
range suffers. It is also affected by fluctuations of sampling and is not
based on all the observations of a series. It is a measure of location,
and its value is not very stable. The inclusion or exclusion of a single
item may sometimes considerably affect its value. It does not take into
account the composition of a series. It is not capable of further algebraic
manipulation. But inter-quartile range is easy to calculate and is readily
understood.
Sometimes percentile range is also calculated. Since range is affected
by the values of extreme items, and since inter-quartile range leaves out
50% of the values, a percentile range which takes into account, say, the
90th and the 10th percentiles would give a better measure of dispersion
than either of these two. If the difference of the 90th and the 10th
percentiles is found out it will be called 10-90 percentile range. Unlike
range it has the advantage of not being affected by the values of the
extreme items of a series and it also does not leave aside 50% of the
values as the inter-quartile range does. A 10-90 percentile range would
leave only 20% of the values at the extremes. It, however, suffers from
most of those defects from which range and inter-quartile range suffer.

SEMI-INTER-QUARTILE RANGE

Semi-inter-quartile range, as the name suggests, is one half
of the inter-quartile range. In other words, it is one half of the difference
between the third quartile and the first quartile. Symbolically,

$$\text{Semi-inter-quartile range or Quartile deviation} = \frac{Q_3 - Q_1}{2}$$

where $Q_3$ and $Q_1$ stand for the upper and lower quartiles respectively.
In a symmetrical series median lies half way on the scale from $Q_1$
to $Q_3$. If, therefore, the value of the quartile deviation is added to the
lower quartile or subtracted from the upper quartile, in a symmetrical
series, the resulting figure would be the value of the median. But
generally series are not symmetrical and in a moderately asymmetrical
series $Q_1$ + quartile deviation or $Q_3$ - quartile deviation would not give
the value of the median. There would be a difference between the two
figures and the greater the difference, the greater would be the extent of
departure from normality.
Quartile deviation is an absolute measure of dispersion. If it
is divided by the average value of the two quartiles, a relative measure
of dispersion is obtained. It is called the Coefficient of Quartile Deviation.
Symbolically,

$$\text{Coefficient of quartile deviation} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
The following example would clarify the procedure of the calculation
of the quartile deviation and its coefficient:
Example 1. Calculate the Semi-Inter-Quartile Range and its
coefficient of the marks of 59 students in Economics given below.

Marks-group    No. of students
 0-10           4
10-20           8
20-30          11
30-40          15
40-50          12
50-60           6
60-70           3
Solution. Computation of the Semi-Inter-Quartile Range
Marks-group    No. of students    Cumulative frequency
 0-10           4                  4
10-20           8                 12
20-30          11                 23
30-40          15                 38
40-50          12                 50
50-60           6                 56
60-70           3                 59
First Quartile or
$Q_1$ = the marks of the 59/4, i.e., 14.75th student, which lie in the
20-30 marks-group.
By interpolation, $Q_1 = 20 + \frac{30-20}{11}(14.75 - 12) = 22.5$ marks
Third Quartile or
$Q_3$ = the marks of the 3(59)/4, i.e., 44.25th student, which lie in the
40-50 marks-group. By interpolation,

$Q_3 = 40 + \frac{50-40}{12}(44.25 - 38) = 45.2$ marks

Semi-inter-quartile range $= \frac{Q_3 - Q_1}{2} = \frac{45.2 - 22.5}{2} = 11.35$ marks.

Coefficient of the S.I. range $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45.2 - 22.5}{45.2 + 22.5} = .335$

Median, or
M = the marks of the 59/2, i.e., 29.5th student, which lie in
the (30-40) group
$M = 30 + \frac{40-30}{15}(29.5 - 23) = 34.33$ marks.
In the above example the quartile deviation is 11.35 marks. If
these marks are added to the lower quartile the resulting figure would
be 33.85 and if they are subtracted from the upper quartile it will again be
33.85. The actual value of the median is 34.33. It shows that the series
is not perfectly normal though the departure from normality is not much.
It, however, reveals that the dispersion of items on the two sides of the
median is almost equal.
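The interpolation used in Example 1 can be checked mechanically. The sketch below is a minimal illustration, not a standard library routine: `grouped_quantile` is a hypothetical helper that locates the class containing the required item and interpolates linearly within it, exactly as the worked solution does by hand.

```python
def grouped_quantile(classes, freqs, position):
    """Interpolate the value of the item at `position` (a 1-based count,
    possibly fractional) within a continuous frequency distribution."""
    cumulative = 0
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and position <= cumulative + f:
            return lower + (upper - lower) / f * (position - cumulative)
        cumulative += f
    raise ValueError("position exceeds the total frequency")


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [4, 8, 11, 15, 12, 6, 3]
n = sum(freqs)                                    # 59 students

q1 = grouped_quantile(classes, freqs, n / 4)      # 14.75th student
q3 = grouped_quantile(classes, freqs, 3 * n / 4)  # 44.25th student
median = grouped_quantile(classes, freqs, n / 2)  # 29.5th student

quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)

print(round(q1, 2), round(q3, 2), round(median, 2))
print(round(quartile_deviation, 2), round(coefficient, 3))
```

The helper takes the position of the item rather than a percentage, so the same function serves for quartiles, the median, or a 10-90 percentile range.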
Merits and drawbacks of quartile deviation
The quartile deviation possesses the merits of simple calculation and
easy understandability. It is commonly understood and its calculation
does not involve any mathematical intricacies. These are the points in
favour of quartile deviation but there are a large number of points which
go against it. Quartile deviation is neither based on all the observations
of the data, nor is it capable of further algebraic treatment. It is affected
to a considerable extent by the fluctuations of sampling. A change in
the value of a single item may in certain cases affect its value considerably.
Thus quartile deviation is not a very good measure of dispersion, particularly
for series in which the variation is considerable. However, for
rough studies, quartile deviation may give an approximate idea of the
extent of variability in a series.

AVERAGE DEVIATION OR MEAN DEVIATION

Method of limits and method of averaging deviations. All the above
mentioned measures of dispersion suffer from one serious defect, i.e.,
they are calculated by taking into account only two values of a series-
either the extreme values as in the case of range, or the values of quartiles
as in case of quartile deviation. They ignore the other values of the
series. They are not based on all the observations of the series. This
method of studying dispersion (by location of limits), is also called the
"method of limits." Range, inter-quartile range and quartile deviation are
all such measures in which dispersion is studied by the method of limits.
This is a serious drawback because in such calculations the composition
of the series is entirely ignored. It is possible that the range or the quartile
deviation of two series is identical and their composition very much
dissimilar. It is, therefore, always better to have such a measure of dis-
persion which takes into account all the observations of a series and is
calculated in relation to a central value. Range and quartile deviations,
as we have seen, are not calculated in relation to any average. If the
variations of items were calculated from an average, such a measure of
dispersion would throw light on the formation of the series and the dis-
persal of items round a central value. This method of calculating dis-
persion is called the method of averaging deviations. As the name suggests,
in this method, the deviations of items from a measure of central tendency
are averaged to study the dispersion of the series.
Mean deviation is such a measure of dispersion. "Mean deviation of a
series is the arithmetic average of the deviations of various items from a measure
of central tendency (either mean, median or mode)." Theoretically, deviations
can be taken from any of the three averages mentioned above but in actual
practice mean deviation is calculated either from mean or from median.
Mode is usually not considered, as its value is indeterminate, and it gives
erroneous conclusions. Between mean and median the latter is supposed
to be better than the former, because the sum of the deviations (signs
ignored) from the median is less than the sum of the deviations from the
mean. Therefore, the value of the mean deviation from median is never
greater than the value calculated from mean. In aggregating deviations the
algebraic signs are not taken into account. It should be remembered that if
algebraic signs were taken into account the sum of the deviations from the mean
should be zero and from the median would be nearly zero if the series is
moderately asymmetrical. Since the purpose of a measure of dispersion
is to study the variation of items from a central value, it does not matter
in the least if the plus and minus signs are ignored. However, leaving
out the plus and minus signs renders mean deviation incapable of further
algebraic treatment. Mean deviation is also known as the first moment
of dispersion. Symbolically
(i) $\delta = \frac{\sum d}{n}$
where $\delta$ stands for the mean deviation from mean, $d$ for the deviations
from the mean, and $n$ for the number of items.
(ii) $\delta_m = \frac{\sum d_m}{n}$
where $\delta_m$ stands for the mean deviation from median, $d_m$ for the deviations
from the median, and $n$ for the number of items.
(iii) $\delta_z = \frac{\sum d_z}{n}$
where $\delta_z$ stands for the mean deviation from mode, $d_z$ for deviations
from the mode, and $n$ for the number of items.
Mean deviation or first moment of dispersion, as calculated above,
would be an absolute measure of dispersion, expressed in the same units
in which the original data are. In order to transform it into a relative
measure, it is divided by the average from which it has been calculated.
It is then known as the Mean coefficient of dispersion.
Thus, mean coefficient of dispersion from mean, median and mode
would be respectively:

$$\frac{\delta}{a}, \quad \frac{\delta_m}{M} \quad \text{and} \quad \frac{\delta_z}{Z}$$
Calculation of mean deviation and its coefficient in a series of
individual observations
As has been said earlier, mean deviation should be calculated either
from arithmetic average or median, preferably from the latter. In the
illustrations given below we shall show how mean deviation can be calculated
by the direct and short-cut methods from mean as well as median.
In the direct method, as we have seen above, the mean deviation would
be calculated by totalling the deviations from the mean or median (plus
and minus ignored) and dividing this total by the number of items.
In the short-cut method the total of the values of items below the actual
mean or median and the total of the values above it are found
out. The former is subtracted from the latter and divided by the number
of items. The resulting figure is the required mean deviation, provided
the number of items below the average equals the number above it.
Symbolically
(i) $\delta_m = \frac{1}{n}(m_y - m_x)$
where $\delta_m$ stands for the mean deviation from median, $m_y$ for
the total of the values above the actual median, $m_x$ for the total of the
values below it, and $n$ for the number of items.
(ii) $\delta = \frac{1}{n}(a_y - a_x)$
where $\delta$ stands for the mean deviation from mean, $a_y$ stands for the
total of the values above the actual arithmetic average and $a_x$ for the total
of the values below it. The following example would illustrate these formulae:
Example 2. The following are the marks obtained by a batch of 9
students in a certain test:
Serial No.    Marks           Serial No.    Marks
              (out of 100)                  (out of 100)
1             68              5             54
2             49              6             38
3             32              7             59
4             21              8             66
                              9             41
Calculate the mean deviation of the series.
Solution. Direct method. Calculation of mean deviation of the series
of marks of 9 students (arranged in ascending order of magnitude).
Students    Marks    Deviations from median (49)
                     (+ and - signs ignored)
            (m)      (dm)
1           21       28
2           32       17
3           38       11
4           41        8
5           49        0
6           54        5
7           59       10
8           66       17
9           68       19
                     Σdm = 115

Median = value of the $\frac{n+1}{2}$th item = value of the 5th item = 49 marks

Mean deviation or $\delta_m = \frac{\sum d_m}{n}$
where $\sum d_m$ represents the summation of the deviations from the
median; and $n$, the number of items
$\delta_m = \frac{115}{9}$ marks = 12.8 marks
Short-cut method. Marks arranged in ascending order of magnitude

Marks
(m)
21
32
38
41
49
54
59
66
68
Sum of the items with values less than the median
= (21+32+38+41) = 132 ($m_x$)
Sum of the items with values more than the median
= (54+59+66+68) = 247 ($m_y$)

Mean Deviation $= \frac{1}{n}(m_y - m_x) = \frac{1}{9}(247 - 132) = \frac{1}{9} \times 115 = 12.8$ marks
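Both methods of Example 2 can be sketched in Python as below. The function names are hypothetical. The short-cut function carries one extra term that the text does not need here: it vanishes when equally many items lie on each side of the centre, as is the case for the median of these 9 marks.

```python
marks = [68, 49, 32, 21, 54, 38, 59, 66, 41]


def median(values):
    """Middle item of the sorted values (mean of the two middle items if even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean_deviation(values, centre):
    """Direct method: average of the deviations from `centre`, signs ignored."""
    return sum(abs(v - centre) for v in values) / len(values)


def mean_deviation_shortcut(values, centre):
    """Short-cut: (total above - total below) / n, plus a correction term
    that is zero when equally many items lie on each side of `centre`."""
    below = [v for v in values if v < centre]
    above = [v for v in values if v > centre]
    total = sum(above) - sum(below) + centre * (len(below) - len(above))
    return total / len(values)


m = median(marks)
print(m, round(mean_deviation(marks, m), 1), round(mean_deviation_shortcut(marks, m), 1))
# 49 12.8 12.8
```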
Example 3. Calculate the mean deviation (from the arithmetic
average) and its co-efficient for the following prices of a Government
security.
Rs. $100\tfrac{1}{2}$, $100\tfrac{1}{4}$, $100\tfrac{3}{8}$, $100\tfrac{5}{8}$, $100\tfrac{3}{4}$, $100\tfrac{1}{8}$, $100\tfrac{6}{16}$, $100\tfrac{10}{16}$, $100\tfrac{8}{16}$, $100\tfrac{1}{8}$

Solution
Calculation of Mean Deviation from the arithmetic average.
Prices Rs.      Deviations from arithmetic average
                (Rs. 100.425)
                (+ and - signs ignored)
(m)             (d)
100.500         .075
100.250         .175
100.375         .050
100.625         .200
100.750         .325
100.125         .300
100.375         .050
100.625         .200
100.500         .075
100.125         .300
Σm = 1004.250   Σd = 1.750

Arithmetic average $= \frac{\sum m}{n} = \frac{1004.25}{10}$ = Rs. 100.425

Mean Deviation from the Arith. Av., or $\delta = \frac{\sum d}{n} = \frac{1.750}{10}$ = Re. .175

Co-efficient of dispersion from the Arith. Average $= \frac{\delta}{a} = \frac{.175}{100.425} = .0017$ approx.
Thus the mean deviation of the given prices of Government security
is Re. .175 and its co-efficient is .0017 approx.
Short-cut method. Prices arranged in ascending order
Prices
100.125
100.125
100.250
100.375
100.375
100.500
100.500
100.625
100.625
100.750
Total 1004.250

Arithmetic average or $a = \frac{1004.250}{10} = 100.425$
Number of items smaller than arithmetic
average or $n_x$ = 5 and their total
or $a_x$ = 501.250
Number of items bigger than arithmetic
average or $n_y$ = 5 and their total
or $a_y$ = 503.000

Mean deviation $= \frac{1}{n}\left[(a_y - n_y \times a) + (n_x \times a - a_x)\right]$
$= \frac{1}{n}(a_y - a_x)$, since $n_x = n_y$
$= \frac{1}{10}(503.000 - 501.250) = \frac{1.750}{10}$
= .175 rupees.
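Example 3 can be reproduced with exact fractions, which avoids any rounding doubt over prices quoted in eighths and sixteenths. The sketch below is illustrative only; the variable names are assumptions, and the short-cut is written in its general form with the counts of items on each side of the average.

```python
from fractions import Fraction as F

# Fractional parts of the ten quoted prices, in rupees above Rs. 100.
parts = [F(1, 2), F(1, 4), F(3, 8), F(5, 8), F(3, 4),
         F(1, 8), F(6, 16), F(10, 16), F(8, 16), F(1, 8)]
prices = [100 + p for p in parts]
n = len(prices)

a = sum(prices) / n                               # arithmetic average
delta = sum(abs(p - a) for p in prices) / n       # direct method
coefficient = delta / a                           # mean coefficient of dispersion

# Short-cut: totals and counts of the items below and above the average.
below = [p for p in prices if p < a]
above = [p for p in prices if p > a]
shortcut = ((sum(above) - len(above) * a) + (len(below) * a - sum(below))) / n

print(float(a), float(delta), float(shortcut), round(float(coefficient), 4))
```

When the two counts are equal, as here, the terms in `a` cancel and the short-cut reduces to the difference of the two totals divided by the number of items.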

Calculation of mean deviation in discrete series

In a discrete series the deviations of items from median or mean
are multiplied with their respective frequencies and the total of these
products is divided by the number of items. The resulting figure is
the required mean deviation.

Symbolically (i) $\delta_m = \frac{\sum f d_m}{n}$

(ii) $\delta = \frac{\sum f d}{n}$
The following illustration would clarify the formula:
Example 4. Find mean deviation of the distribution given below:
No. of accidents    Persons having said
                    number of accidents
 0                  15
 1                  16
 2                  21
 3                  10
 4                  17
 5                   8
 6                   4
 7                   2
 8                   1
 9                   2
10                   2
11                   0
12                   2
Total              100
Solution. Calculation of the mean deviation

No. of       Persons having said     Deviation from             Total
accidents    number of accidents     Median (2)                 Deviations
                                     (+ and - signs ignored)
(m)          (f)                     (dm)                       (fdm)
 0           15                       2                         30
 1           16                       1                         16
 2           21                       0                          0
 3           10                       1                         10
 4           17                       2                         34
 5            8                       3                         24
 6            4                       4                         16
 7            2                       5                         10
 8            1                       6                          6
 9            2                       7                         14
10            2                       8                         16
11            0                       9                          0
12            2                      10                         20
             n = 100                                            Σfdm = 196

The value of median = 2

Mean deviation or $\delta_m = \frac{\sum f d_m}{n} = \frac{196}{100} = 1.96$ accidents.
There is no need to calculate mean deviation by a short-cut method
as the median value in a discrete series is usually a round number and
there is no difficulty in the calculation of mean deviation.
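The discrete-series calculation of Example 4 is a weighted sum and can be sketched as follows. The helper names are hypothetical, and the median is taken as the value of the (n + 1)/2-th item, located through cumulative frequencies.

```python
accidents = list(range(13))                        # 0, 1, ..., 12
persons = [15, 16, 21, 10, 17, 8, 4, 2, 1, 2, 2, 0, 2]


def discrete_median(values, freqs):
    """Value of the (n + 1)/2-th item; `values` must be in ascending order."""
    middle = (sum(freqs) + 1) / 2
    cumulative = 0
    for value, f in zip(values, freqs):
        cumulative += f
        if cumulative >= middle:
            return value


def discrete_mean_deviation(values, freqs, centre):
    """Sum of f * |value - centre| divided by the number of items."""
    total = sum(f * abs(v - centre) for v, f in zip(values, freqs))
    return total / sum(freqs)


m = discrete_median(accidents, persons)
print(m, discrete_mean_deviation(accidents, persons, m))   # 2 1.96
```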
Example 5. Calculate the mean deviation (from mean) of the
following series:
Marks    No. of students
 5        5
15        8
25       15
35       16
45        6
Solution. Calculation of mean deviation
Marks    Step deviation    No. of               Deviation from
         from as. av.      students             actual average
         (25)                                   (27)
(m)      (d')              (f)        (fd')     (d)               (fd)
 5       -2                 5         -10       22                110
15       -1                 8          -8       12                 96
25        0                15           0        2                 30
35       +1                16         +16        8                128
45       +2                 6         +12       18                108
                           Σf = 50    Σfd' = +10                  Σfd = 472

Arithmetic average or $a = 25 + \left(\frac{10}{50} \times 10\right) = 27$

Mean deviation $= \frac{\sum f d}{\sum f} = \frac{472}{50}$ marks = 9.44 marks.

While calculating the mean deviation from the mean it may be


found more convenient to use a short-cut method, by assuming an arith-
metic average. The process of calculating mean deviation by the short-
cut method involves the following steps : -
(1) Deviations of items are taken from an assumed mean and
multiplied by their respective frequencies and the products so obtained
are totalled.
(2) Number of items less than the actual arithmetic average are
multiplied by the difference between the actual and the assumed mean.
(3) Similarly, the number of items more than the arithmetic average
are multiplied by the difference between actual and the assumed mean.
(4) The latter (No. 3) is deducted from the former (No. 2) and the
balance is added to the sum of the products of deviations from the assumed
mean and their frequencies (No. 1).
(5) The resulting figure is divided by the number of items and it is
the value of the mean deviation.

Example 5 is solved below by this short-cut method:

Marks    Step deviations    Number of    (fd')            (fd')
         from as. av.       students     (+ & - signs     (with + & -
         (25)                            ignored)         signs)
(m)      (d')               (f)
 5       -2                  5           10               -10
15       -1                  8            8                -8
25        0                 15            0                 0
35       +1                 16           16               +16
45       +2                  6           12               +12
                            Σf = 50      Σfd' = 46        Σfd' = +10

Actual arithmetic average $= 25 + \left(\frac{10}{50} \times 10\right) = 27$

Total deviations from assumed average of 25
= 46 × 10 = 460

Adjustments
Number of items less than the actual arithmetic average (27) = 28
Number of items more than the actual arithmetic average (27) = 22
Difference between actual and assumed average = 2.
Total deviations from actual average or (Σfd)
= 460 + (28 × 2) - (22 × 2)
= 460 + 56 - 44 = 472

Mean deviation $= \frac{\sum f d}{n} = \frac{472}{50}$ = 9.44 marks
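The five steps of the short-cut method can be sketched as a single function. This is a minimal sketch under one stated assumption: no item of the series lies strictly between the assumed and the actual average, which holds in Example 5. The function name is hypothetical.

```python
def mean_deviation_via_assumed_mean(values, freqs, assumed):
    """Mean deviation from the actual mean, computed from deviations taken
    about an assumed mean and then adjusted (steps 1 to 5 of the text).
    Assumes no value lies strictly between the assumed and actual means."""
    n = sum(freqs)
    actual = sum(v * f for v, f in zip(values, freqs)) / n
    # Step 1: total of f * |deviation from the assumed mean|.
    total_assumed = sum(f * abs(v - assumed) for v, f in zip(values, freqs))
    # Steps 2 and 3: counts of items on each side of the actual mean.
    n_below = sum(f for v, f in zip(values, freqs) if v < actual)
    n_above = sum(f for v, f in zip(values, freqs) if v > actual)
    gap = actual - assumed
    # Step 4: adjust the total from the assumed mean to the actual mean.
    total_actual = total_assumed + n_below * gap - n_above * gap
    # Step 5: divide by the number of items.
    return total_actual / n


marks = [5, 15, 25, 35, 45]
students = [5, 8, 15, 16, 6]
print(mean_deviation_via_assumed_mean(marks, students, assumed=25))   # 9.44
```

Writing the adjustment with a signed `gap` lets the same lines serve when the assumed mean is above the actual mean.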
Calculation of mean deviation in continuous series
The calculation of mean deviation in continuous series is done by
the same procedure by which it is done in discrete series. In the short-cut
method also the same procedure is followed, provided the assumed
mean or median is in the same class-interval in which the actual mean or
median is. If the assumed average is in a different class interval, further
adjustments are necessary. The following examples would illustrate
these procedures : -
Example 6. Calculate the mean deviation (from median) from the
following data:
Class interval    Frequency    Class interval    Frequency
1-3                6            9-11             21
3-5               53           11-13             16
5-7               85           13-15              4
7-9               56           15-17              4
Solution. Direct and short-cut methods. The median of the above series
is 6.5.

Class       Mid-      Deviations     Fre-       Dev. ×       Dev. from     Dev. ×
interval    points    from actual    quency     frequency    assumed       frequency
                      median (6.5)                           median (6)
                      (dm)           (f)        (fdm)        (d'm)         (fd'm)
1-3          2        4.5             6          27.0         4             24
3-5          4        2.5            53         132.5         2            106
5-7          6         .5            85          42.5         0              0
7-9          8        1.5            56          84.0         2            112
9-11        10        3.5            21          73.5         4             84
11-13       12        5.5            16          88.0         6             96
13-15       14        7.5             4          30.0         8             32
15-17       16        9.5             4          38.0        10             40
Total                               245         515.5                      494

Direct Method.

Mean deviation $= \frac{515.5}{245} = 2.1$

Short-cut Method. Total of deviations from assumed median = 494
No. of items with values less than the actual median (6.5)
= (6+53+85) = 144
No. of items with values more than the actual median
= (56+21+16+4+4) = 101
Difference between actual and assumed median = (6.5 - 6) = .5
Total deviation from actual median (when actual and assumed
medians are in the same class interval)
= 494 + (144 × .5) - (101 × .5)
= 494 + 72 - 50.5 = 515.5

Mean deviation $= \frac{515.5}{245} = 2.1$
Example 7. Calculate the mean deviation (from mean) from the
following data:
Difference in age between husband and wife in a particular community:
Difference in years    Frequency    Difference in years    Frequency
0-5                    449          20-25                  109
5-10                   705          25-30                   52
10-15                  507          30-35                   16
15-20                  281          35-40                    4
Direct and short-cut methods
Calculation of the Mean Deviation
Difference    Mid-      Fre-      Deviation from    dx/i       Total deviations    Deviation from     Total
in years      values    quency    the as. av.       where      from the assumed    the av. (10.5)     deviations
                                  (12.5)            i = 5      average             (+ & - signs
                                                                                   ignored)
(m)           (m.v.)    (f)       (dx)                         (f dx/i)            (d)                (fd)
0-5            2.5      449       -10               -2         -898                 8                  3,592
5-10           7.5      705        -5               -1         -705                 3                  2,115
10-15         12.5      507         0                0            0                 2                  1,014
15-20         17.5      281        +5               +1         +281                 7                  1,967
20-25         22.5      109       +10               +2         +218                12                  1,308
25-30         27.5       52       +15               +3         +156                17                    884
30-35         32.5       16       +20               +4          +64                22                    352
35-40         37.5        4       +25               +5          +20                27                    108
Total                 2,123                                    Σf(dx/i) = -864                        Σfd = 11,340

Arithmetic average $= x + \left(\frac{\sum f (d_x/i)}{n} \times i\right) = 12.5 + \left(\frac{-864}{2123} \times 5\right) = 10.5$

Mean Deviation $= \frac{\sum f d}{n} = \frac{11340}{2123} = 5.3$

Short-cut Method. Total deviations from assumed average (± signs
ignored) = (2342 × 5) = 11710
Number of items smaller than actual arithmetic average (10.5) = 1154
Number of items bigger than actual arithmetic average (10.5) = 969
Difference between actual and assumed average = -2
Total deviations from actual arithmetic average (where actual and
assumed average are in the same class-interval)

= 11710 + (1154 × -2) - (969 × -2)
= 11710 - 2308 + 1938 = 11340

Mean deviation $= \frac{11340}{2123} = 5.3$
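For a continuous series the same arithmetic runs on the mid-values of the classes. The sketch below follows the direct method of Example 7, with the mean obtained through step deviations from the assumed average of 12.5. Rounding the mean to one decimal place mirrors the 10.5 used in the text and is an assumption of this sketch, not a general rule.

```python
classes = [(0, 5), (5, 10), (10, 15), (15, 20),
           (20, 25), (25, 30), (30, 35), (35, 40)]
freqs = [449, 705, 507, 281, 109, 52, 16, 4]
assumed, width = 12.5, 5

mids = [(lower + upper) / 2 for lower, upper in classes]
n = sum(freqs)

# Mean by step deviations: x + (sum of f * dx/i) / n * i
steps = [(m - assumed) / width for m in mids]
sum_f_steps = sum(f * s for f, s in zip(freqs, steps))
mean = assumed + sum_f_steps / n * width
mean_used = round(mean, 1)          # the text works with 10.5

# Direct method: total of f * |mid-value - mean|, signs ignored.
total = sum(f * abs(m - mean_used) for m, f in zip(mids, freqs))

print(n, sum_f_steps, mean_used, total, round(total / n, 1))
```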
When the assumed median or mean is not in the same class interval
in which the actual median or mean is, some further adjustments are
necessary. The following illustration would clarify them:

Example 8. The following table gives the age distribution of
students admitted to a college in the year 1955:

Calculate the mean deviation and its coefficient.

Age     Number of students admitted
        in the year
15-      0
16-      1
17-      3
18-      8
19-     12
20-     14
21-     14
22-      5
23-      2
24-      3
25-      1
26-      0
27-      1
Total   64
Solution. Computation of the Mean Deviation

Age-group    Mid-value    Frequency    Deviation    Total         Deviation from    Total
             of the                    from the     deviations    the average       deviations
             group                     as. av.      from the      (20.7)            from the
                                       (19.5)       as. av.       (+ & - signs      average
                                                                  ignored)
(m)          (m.v.)       (f)          (dx)         (fdx)         (d)               (fd)
15-16        15.5          0           -4             0           5.2                0
16-17        16.5          1           -3            -3           4.2                4.2
17-18        17.5          3           -2            -6           3.2                9.6
18-19        18.5          8           -1            -8           2.2               17.6
19-20        19.5         12            0             0           1.2               14.4
20-21        20.5         14           +1           +14            .2                2.8
21-22        21.5         14           +2           +28            .8               11.2
22-23        22.5          5           +3           +15           1.8                9.0
23-24        23.5          2           +4            +8           2.8                5.6
24-25        24.5          3           +5           +15           3.8               11.4
25-26        25.5          1           +6            +6           4.8                4.8
26-27        26.5          0           +7             0           5.8                0
27-28        27.5          1           +8            +8           6.8                6.8
                          n = 64                    Σfdx = +77                      Σfd = 97.4

Arithmetic average or $a = x + \frac{\sum f d_x}{n} = 19.5 + \frac{77}{64} = 20.7$ years.

Mean deviation or $\delta = \frac{\sum f d}{n} = \frac{97.4}{64} = 1.52$ years approx.

Mean coefficient of dispersion $= \frac{\delta}{a} = \frac{1.52}{20.7} = .07$
Short-cut Method. Total deviation from assumed average (± signs
ignored) = 111.
Note. Where actual and assumed averages are in different class
intervals a special adjustment is necessary. In such cases the frequency
of the class in which the actual mean lies is treated separately. It is
multiplied by the difference of the deviations of the mid-value from the
actual and assumed averages. The product so obtained is subtracted
from the total deviation from the assumed mean.
Thus, number of items smaller than the actual arithmetic average
(20.7) = 24 (frequency of mean class being ignored)
Number of items bigger than actual arithmetic average (20.7)
= 26
Frequency of the mean class = 14.
Difference between actual and assumed averages = 1.2
Difference of deviations of mid-value of mean class from the actual
and assumed averages = (20.5 - 20.7) - (20.5 - 19.5)
= .2 - 1
= .8 (± signs ignored)
Total deviations from actual mean
= 111 + (24 × 1.2) - (26 × 1.2) - (14 × .8)
= 111 + 28.8 - 31.2 - 11.2 = 97.4

Mean deviation $= \frac{97.4}{64} = 1.52$

Mean co-efficient of dispersion $= \frac{1.52}{20.7} = .07$
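The special adjustment described in the Note can be checked against the direct total. The sketch below is written for the case of Example 8, where the actual average (20.7, as rounded in the text) exceeds the assumed average (19.5); the variable names are assumptions made for illustration.

```python
mids = [15.5 + k for k in range(13)]               # 15.5, 16.5, ..., 27.5
freqs = [0, 1, 3, 8, 12, 14, 14, 5, 2, 3, 1, 0, 1]
assumed, actual = 19.5, 20.7                       # 20.7 is the rounded mean of the text
n = sum(freqs)

# Direct method: deviations taken from the actual average.
direct = sum(f * abs(m - actual) for m, f in zip(mids, freqs))

# Short-cut: start from the deviations about the assumed average ...
total_assumed = sum(f * abs(m - assumed) for m, f in zip(mids, freqs))
gap = actual - assumed                             # assumed here to be positive

# ... adjust the classes lying wholly on one side of both averages ...
below = sum(f for m, f in zip(mids, freqs) if m <= assumed)
above = sum(f for m, f in zip(mids, freqs) if m >= actual)

# ... and treat separately any class whose mid-value lies between the two.
special = sum(f * (abs(m - actual) - abs(m - assumed))
              for m, f in zip(mids, freqs) if assumed < m < actual)

shortcut = total_assumed + below * gap - above * gap + special
print(round(direct, 1), round(shortcut, 1), round(direct / n, 2))
```

Here `special` comes out negative, which is why the text subtracts the product 14 × .8 from the running total.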


Characteristics and use of mean deviation
(1) Mean deviation is rigidly defined and its value is precise and
definite. However, since mean deviation can be calculated from any
average, it is likely that in some investigations the mean has been used
as base, while in others either median or mode has been used as such.
If it is so, the comparison of the mean deviations would give inaccurate
results. Therefore, it should always be ascertained whether mean
deviation has been calculated by using the mean, median or mode.
(2) The calculation of mean deviation is not very difficult. No
doubt range and quartile deviation have an advantage over mean
deviation in this respect, still the calculation of mean deviation cannot
be said to be a complicated or difficult job.
(3) Mean deviation is readily understood. It is the average of
the deviations from a measure of central tendency.
(4) It is based on all the observations, and unlike quartile deviation
or range, it cannot be calculated in the absence of a single figure.
(5) It is not affected very much by the values of the extreme items.
We shall see later on how the standard deviation is affected by the values
of extremes much more than the mean deviation.
(6) Mean deviation ignores the algebraic signs of the deviations,
and as such, it is not capable of further mathematical treatment.
(7) Mean deviation is not a very accurate measure of dispersion,
particularly when it is calculated from the mode because mode can
be unrepresentative, and even when it is calculated from median, it
cannot be fully relied upon, because if the degree of variability in a
series is high, median is not a representative average. If mean deviation
is calculated from the arithmetic average, it is not very scientific
because the sum of the deviations from the mean (plus minus signs
ignored) is more than the sum of the deviations from the median.
Therefore, in many cases, mean deviation may give unsatisfactory
results. In fact, this measure of dispersion is not in common use and
generally dispersion is studied through standard deviation, which, as
we shall see, has many properties not possessed by any other measure
of dispersion. However, mean deviation has found favour with economists
and businessmen due to simplicity in calculation and on account
of the fact that standard deviation gives greater importance to the
deviations of the extreme values.
STANDARD DEVIATION

Meaning. The technique of the calculation of mean deviation is
mathematically illogical as in its calculation the algebraic signs are
ignored. This drawback is removed in the calculation of standard
deviation. One of the easiest ways of doing away with algebraic signs
is to square the figures and this process is adopted in the calculation
of standard deviation. Standard deviation is the square root of the arithmetic
average of the squares of the deviations measured from the mean. Thus in the
calculation of standard deviation, first the arithmetic average is calculated
and the deviations of various items from the arithmetic average
are squared. These squared deviations are totalled and the sum is
divided by the number of items. The square root of the resulting
figure is the standard deviation of the series. The standard deviation
is conventionally represented by the Greek letter sigma, $\sigma$.

Symbolically

$$\sigma = \sqrt{\frac{\sum d^2}{n}}$$

where $\sigma$ stands for the standard deviation, $\sum d^2$ for the sum of
the squares of the deviations measured from the arithmetic average
and $n$ for the number of items.
Difference between root-mean-square deviation and standard deviation.
Various terms like Mean Error, Mean Square Error and Error of Mean
Square are used to denote the value of standard deviation. We shall
be using the term standard deviation only as it is most popularly used.
Some writers use the term root-mean-square-deviation to denote the standard
deviation. This is technically wrong, because the standard deviation
is only one of the many values that the root-mean-square-deviation
can take. Root-mean-square-deviation is the square root of the arithmetic
average of the squares of deviations measured from any arbitrary value. If the
deviations are measured from the arithmetic average there is no difference
between root-mean-square-deviation and the standard deviation;
in other words, standard deviation is the root-mean-square-deviation
measured from the arithmetic average. If deviations are not measured
from the arithmetic average but from some other value we can find out
the value of the standard deviation from the value of the
root-mean-square-deviation. In fact the short-cut method of calculating the standard
deviation is based on the relationship between standard deviation and
root-mean-square deviation. We shall discuss this point a little later.
Calculation of standard deviation in a series of individual observations
Direct Method No. 1. In a series of individual observations the
deviation of each item from the arithmetic average is found out, and
is squared. The total of these squared deviations is divided by the
number of items. This figure is called the Second Moment about the Mean.
The square root of it is the required standard deviation. The following
example would clarify this procedure:
Example 9. Calculation of standard deviation of the height.
(Direct Method 1)

Height in inches    Deviations from    Deviations squared
                    mean (63")
(m)                 (d)                (d²)
60                  -3                  9
60                  -3                  9
61                  -2                  4
62                  -1                  1
63                   0                  0
63                   0                  0
63                   0                  0
64                  +1                  1
64                  +1                  1
70                  +7                 49
Σm = 630                               Σd² = 74

Arithmetic average or $a = \frac{\sum m}{n} = \frac{630}{10} = 63$"

Standard Deviation or $\sigma = \sqrt{\frac{\sum d^2}{n}} = \sqrt{\frac{74}{10}} = \sqrt{7.4} = 2.72$"
Direct Method 2. Standard deviation can be calculated by another
method also. In this method the squares of the values of items (not of
deviations) are totalled and from this figure the square of the total of
the values divided by the number of items, is subtracted. The resulting
figure is again divided by the number of items and its square root is the
required standard deviation.
Symbolically $\sigma = \sqrt{\dfrac{\Sigma m^2 - (\Sigma m)^2 / n}{n}}$
Where m stands for the values of the variable and n for the number
of items. Example No. 9 above would be solved in the following
manner by this method :-

Direct Method 2.

Size of item    Square of item
    (m)            ($m^2$)
    60              3,600
    60              3,600
    61              3,721
    62              3,844
    63              3,969
    63              3,969
    63              3,969
    64              4,096
    64              4,096
    70              4,900
$\Sigma m = 630$   $\Sigma m^2 = 39,764$

Standard deviation or $\sigma = \sqrt{\dfrac{\Sigma m^2 - (\Sigma m)^2 / n}{n}} = \sqrt{\dfrac{39,764 - (630)^2 / 10}{10}} = \sqrt{\dfrac{74}{10}} = 2.72$ inches
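Direct Method 2 needs only the total of the values and the total of their squares. A short Python sketch of the same arithmetic follows; the function name is an assumption made for illustration.

```python
from math import sqrt

def sd_from_squares(values):
    """Standard deviation by Direct Method 2:
    sqrt((sum of squares - (sum)^2 / n) / n)."""
    n = len(values)
    total = sum(values)                    # sum of m
    total_sq = sum(m * m for m in values)  # sum of m squared
    return sqrt((total_sq - total ** 2 / n) / n)

heights = [60, 60, 61, 62, 63, 63, 63, 64, 64, 70]
print(sum(m * m for m in heights))             # 39764
print(round(sd_from_squares(heights), 2))      # 2.72
```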
Short-cut method. Standard deviation can also be calculated by
a short-cut method. Here the deviations from an assumed average
are calculated and squared. Their sum is divided by the number of
items, or in other words, the arithmetic average of the squares of
deviations from the assumed average is found out. From this figure the
square of the arithmetic average of the deviations from the assumed
mean is subtracted. The square root of the resulting figure is the
standard deviation.

Symbolically $\sigma = \sqrt{\dfrac{\Sigma dx^2}{n} - \left(\dfrac{\Sigma dx}{n}\right)^2}$

Where dx stands for the deviation from the assumed mean.
Example No. 9 would be solved by this method as follows :-
*Proof
Let $\sigma = \sqrt{\dfrac{\Sigma d^2}{n}}$, $s = \sqrt{\dfrac{\Sigma dx^2}{n}}$ and $c = (a - x)$,
so that $\sigma^2 = \dfrac{\Sigma d^2}{n}$ and $s^2 = \dfrac{\Sigma dx^2}{n}$.
Then $\sigma^2 = s^2 - c^2$
As s would always be greater than σ, the root-mean-square deviation
from the mean would always be less than the root-mean-square
deviation from any other point.

Short-cut Method

Size of items    Deviations from assumed mean 62    Square of deviations
                             (dx)                        ($dx^2$)
     60                       -2                            4
     60                       -2                            4
     61                       -1                            1
     62                        0                            0
     63                       +1                            1
     63                       +1                            1
     63                       +1                            1
     64                       +2                            4
     64                       +2                            4
     70                       +8                           64
   Total                     +10                           84

$\sigma = \sqrt{\dfrac{\Sigma dx^2}{n} - \left(\dfrac{\Sigma dx}{n}\right)^2} = \sqrt{\dfrac{84}{10} - \left(\dfrac{10}{10}\right)^2} = \sqrt{8.4 - 1} = \sqrt{7.4} = 2.72$ inches
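This formula can be written in the following ways also :-

(i) $\sigma = \sqrt{\dfrac{\Sigma dx^2 - n(a - x)^2}{n}}$

(ii) $\sigma = \sqrt{\dfrac{\Sigma dx^2}{n} - (a - x)^2}$

Where dx stands for the deviations from the assumed average,
a for the actual arithmetic average and x for the assumed average.
Thus in the above example a = 63 and x = 62. Substituting the
values we get

(i) $\sigma = \sqrt{\dfrac{84 - 10(63 - 62)^2}{10}} = \sqrt{\dfrac{84 - 10}{10}} = \sqrt{7.4} = 2.72$ inches

(ii) $\sigma = \sqrt{\dfrac{84}{10} - (63 - 62)^2} = \sqrt{8.4 - 1} = \sqrt{7.4} = 2.72$ inches

*Proof (continued). Writing d for the deviation of an item from the actual
mean a, dx for its deviation from the assumed mean x, and c = (a - x):
$dx = d + c$
$(dx)^2 = (d + c)^2 = d^2 + 2cd + c^2$
$\Sigma (dx)^2 = \Sigma d^2 + 2c\,\Sigma d + nc^2$
but $\Sigma d = 0$
$\therefore \Sigma (dx)^2 = \Sigma d^2 + nc^2$
$\dfrac{\Sigma (dx)^2}{n} = \dfrac{\Sigma d^2}{n} + c^2$
$\dfrac{\Sigma d^2}{n} = \dfrac{\Sigma (dx)^2}{n} - c^2 = \dfrac{\Sigma (dx)^2}{n} - (a - x)^2$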
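The point of the short-cut method is that the answer does not depend on which assumed mean is chosen. A small Python sketch, given here only as an illustration (the function name and the extra assumed means 60 and 65 are not from the text), makes this visible:

```python
from math import sqrt

def sd_short_cut(values, assumed_mean):
    """Short-cut method: mean of squared deviations from an assumed mean,
    less the square of the mean deviation from that assumed mean."""
    n = len(values)
    dx = [m - assumed_mean for m in values]
    mean_of_squares = sum(d * d for d in dx) / n   # sum(dx^2) / n
    mean_of_dx = sum(dx) / n                       # sum(dx) / n
    return sqrt(mean_of_squares - mean_of_dx ** 2)

heights = [60, 60, 61, 62, 63, 63, 63, 64, 64, 70]
for x in (60, 62, 65):
    # Every assumed mean gives the same standard deviation.
    print(x, round(sd_short_cut(heights, x), 2))
```

With the assumed mean 62 used in the text, the two intermediate figures are 8.4 and 1, as in the worked solution.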

The standard deviation is an absolute measure of dispersion. For
purposes of comparison a relative measure of dispersion is calculated
by dividing the standard deviation by the arithmetic average. It is
called "standard coefficient of dispersion" or "coefficient of standard deviation".

Thus, Standard coefficient of dispersion $= \dfrac{\sigma}{a}$

In the above example its value would be $\dfrac{2.72}{63}$ or .04


Calculation of standard deviation in a discrete series
In discrete series the squares of the deviations from the arithmetic
average are multiplied by the respective frequencies of these items.
The total of these products is divided by the total of the frequencies
and the square root of this figure is the standard deviation of the series.
Symbolically

$\sigma = \sqrt{\dfrac{\Sigma f d^2}{n}}$

The following illustration would clarify this procedure :-
Example 10. Calculate the standard deviation from the following
data :-
Size of item    Frequency    Size of item    Frequency
     6              3            10              8
     7              6            11              5
     8              9            12              4
     9             13

Solution. Direct Method. Calculation of Standard Deviation

Size of    Frequency    Size x       Deviations from    Deviations    Frequency x square
items                   Frequency    the average (9)    squared       of deviations
 (m)         (f)         (mf)             (d)            ($d^2$)         ($fd^2$)
  6           3           18              -3                9              27
  7           6           42              -2                4              24
  8           9           72              -1                1               9
  9          13          117               0                0               0
 10           8           80              +1                1               8
 11           5           55              +2                4              20
 12           4           48              +3                9              36
           n = 48   $\Sigma mf = 432$                              $\Sigma fd^2 = 124$

Arithmetic average $= \dfrac{\Sigma mf}{n} = \dfrac{432}{48} = 9$

Standard Deviation $= \sqrt{\dfrac{\Sigma fd^2}{n}} = \sqrt{\dfrac{124}{48}} = 1.6$
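For a discrete series the same calculation simply weights every squared deviation by its frequency. The sketch below, in Python and with an illustrative function name, reproduces Example 10.

```python
from math import sqrt

def mean_sd_discrete(sizes, freqs):
    """Mean and standard deviation of a discrete frequency series."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(sizes, freqs)) / n
    # Each squared deviation is multiplied by the frequency of its item.
    sum_fd2 = sum(f * (m - mean) ** 2 for m, f in zip(sizes, freqs))
    return mean, sqrt(sum_fd2 / n)

sizes = [6, 7, 8, 9, 10, 11, 12]
freqs = [3, 6, 9, 13, 8, 5, 4]
mean, sd = mean_sd_discrete(sizes, freqs)
print(mean, round(sd, 1))  # 9.0 1.6
```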

In discrete series also, the standard deviation can be calculated by a
short-cut method. The deviations from an assumed mean are first squared
and multiplied by the respective frequencies of items. These products
are totalled and divided by the total of the frequencies. From this
figure the square of the difference between the actual and assumed
average is subtracted. The square root of the resulting figure is the
required standard deviation.
Symbolically

$\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - (a - x)^2}$

or $\sigma = \sqrt{\dfrac{\Sigma f dx^2 - n(a - x)^2}{n}}$

or $\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2}$

The following examples would illustrate these formulae :-

Example 11. The following table gives the number of finished
articles turned out per day by different numbers of workers in a factory.
Find the mean value and the standard deviation of the daily output
of finished articles.

Number of    Number of    Number of    Number of
articles     workers      articles     workers
   18            3           23           17
   19            7           24           13
   20           11           25            8
   21           14           26            5
   22           18           27            4

Solution. Calculation of standard deviation

Number of    No. of     Deviations from    Total         Deviations    Frequency x square
articles     workers    the assumed        deviations    squared       of deviations
                        average (22)
  (m)          (f)          (dx)             (fdx)        ($dx^2$)        ($f dx^2$)
   18           3            -4               -12            16              48
   19           7            -3               -21             9              63
   20          11            -2               -22             4              44
   21          14            -1               -14             1              14
   22          18             0                 0             0               0
   23          17            +1               +17             1              17
   24          13            +2               +26             4              52
   25           8            +3               +24             9              72
   26           5            +4               +20            16              80
   27           4            +5               +20            25             100
            n = 100                   $\Sigma fdx = +38$             $\Sigma f dx^2 = 490$

Mean value of the finished articles

$= x + \dfrac{\Sigma fdx}{n} = 22 + \dfrac{38}{100} = 22.38$ articles

Standard deviation

$\sigma = \sqrt{\dfrac{\Sigma f dx^2 - n(a - x)^2}{n}} = \sqrt{\dfrac{490 - 100(0.38)^2}{100}} = \sqrt{4.756}$
= 2.2 articles approx.
or

$\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2} = \sqrt{\dfrac{490}{100} - \left(\dfrac{38}{100}\right)^2} = \sqrt{4.9 - .144} = \sqrt{4.756}$
= 2.2
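The short-cut formulae for a discrete series can be sketched in Python as follows. This is only an illustration of the arithmetic of Example 11; the function name is an assumption, and the assumed mean 22 is the one used in the text.

```python
from math import sqrt

def mean_sd_short_cut(sizes, freqs, assumed_mean):
    """Mean and standard deviation of a discrete series from deviations
    taken about an assumed mean."""
    n = sum(freqs)
    sum_fdx = sum(f * (m - assumed_mean) for m, f in zip(sizes, freqs))
    sum_fdx2 = sum(f * (m - assumed_mean) ** 2 for m, f in zip(sizes, freqs))
    mean = assumed_mean + sum_fdx / n
    sd = sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2)
    return mean, sd, sum_fdx, sum_fdx2

articles = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
workers = [3, 7, 11, 14, 18, 17, 13, 8, 5, 4]
mean, sd, sum_fdx, sum_fdx2 = mean_sd_short_cut(articles, workers, 22)
print(sum_fdx, sum_fdx2)             # 38 490
print(round(mean, 2), round(sd, 1))  # 22.38 2.2
```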

Calculation of standard deviation in continuous series

The technique of the calculation of standard deviation in a continuous
series is exactly the same as it is in discrete series. The class
intervals are represented by their mid-points and once it is done, a
continuous series becomes a discrete series. However, since in continuous
series the class-intervals are usually of equal size the deviations
from the assumed average can be expressed in class interval units, or
in other words, step-deviations can be found out by dividing the deviations
by the magnitude of the class-intervals. If it is done, a slight
adjustment is necessary in the calculation of the standard deviation.
The formula for the calculation of standard deviation is then written
as follows :-
$\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2} \times i$
Where i stands for the common factor or the magnitude of the
class-interval, and dx stands for the deviations in class-interval units,
and other signs stand for what they stood in previous formulae. The
following examples would illustrate the calculation of standard deviation
in a continuous series by various methods.
Example 12. Calculate the standard deviation for the following table
giving the age distribution of 542 members of the House of Commons.
  Age      No. of Members
  20-             3
  30-            61
  40-           132
  50-           153
  60-           140
  70-            51
  80-             2
  Total         542
Solution. Calculation of the standard deviation of the age distribution
of 542 members of the House of Commons.

Age       Mid-     Frequency    Deviations from    Total         Square of     Frequency x
group     value                 the assumed        deviations    deviations    square
                                av. (55)
           (m)        (f)           (dx)             (fdx)        ($dx^2$)      ($f dx^2$)
20-30       25          3           -30               -90            900          2700
30-40       35         61           -20             -1220            400         24400
40-50       45        132           -10             -1320            100         13200
50-60       55        153             0                 0              0             0
60-70       65        140           +10             +1400            100         14000
70-80       75         51           +20             +1020            400         20400
80-90       85          2           +30               +60            900          1800
                    n = 542                 $\Sigma fdx = -150$          $\Sigma f dx^2 = 76500$

$(a - x) = \dfrac{-150}{542} = -.28$

Standard Deviation $= \sqrt{\dfrac{\Sigma f dx^2 - n(a - x)^2}{n}} = \sqrt{\dfrac{76500 - 542(.28)^2}{542}} = \sqrt{141.07} = 11.9$

The following method will also give us the same result.

Standard deviation of the age distribution $= \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2} = \sqrt{\dfrac{76500}{542} - \left(\dfrac{-150}{542}\right)^2} = \sqrt{141.07}$
= 11.9 years
Example 13. The following data relate to the ages of a group
of Government employees. Calculate the standard deviation.
  Age      Number of employees      Age      Number of employees
 50-55            25               30-35            80
 45-50            30               25-30           110
 40-45            40               20-25           170
 35-40            45
Solution. Calculation of standard deviation

 Age      No. of       Step deviation
          employees    from as. av. (37.5)
 (m)        (f)           (dx)        fdx     $f dx^2$    (dx+1)    $(dx+1)^2$    $f(dx+1)^2$
50-55        25            +3         +75       225         +4         16            400
45-50        30            +2         +60       120         +3          9            270
40-45        40            +1         +40        40         +2          4            160
35-40        45             0           0         0         +1          1             45
30-35        80            -1         -80        80          0          0              0
25-30       110            -2        -220       440         -1          1            110
20-25       170            -3        -510      1530         -2          4            680
     $\Sigma f = 500$         $\Sigma fdx = -635$   $\Sigma f dx^2 = 2435$              $\Sigma f(dx+1)^2 = 1665$

Standard deviation or $\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2} \times i = \sqrt{\dfrac{2435}{500} - \left(\dfrac{-635}{500}\right)^2} \times 5$

= 9.0 years.

Thus the steps in the calculation of standard deviation in a continuous
frequency distribution are as follows :-
(1) Assume an average at the mid-point of a class interval which
is preferably in the centre of the distribution.
(2) Measure the deviations of the mid-values of various class
intervals from the assumed mean and divide them by the magnitude
of the class interval to get step deviations or deviations in class interval
units (dx).
(3) Multiply the step deviations with the respective frequencies
of the classes (fdx). Total these products ($\Sigma f dx$).
(4) Square the deviations and multiply them by the respective
frequencies and obtain the aggregate of such products ($\Sigma f dx^2$).
(5) Divide $\Sigma f dx^2$ by the number of items or the total frequencies,
to obtain $\dfrac{\Sigma f dx^2}{n}$.
(6) Deduct $\left(\dfrac{\Sigma f dx}{n}\right)^2$ from $\dfrac{\Sigma f dx^2}{n}$. This will be the value of the
variance or square of the standard deviation.
(7) Extract the square root of the variance and it would be the
standard deviation in class interval units.
(8) Multiply the standard deviation so obtained by the magnitude
of the class-interval and the resulting figure would be the standard
deviation in original units.
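The eight steps above can be sketched in Python as below. This is an illustrative outline rather than a prescribed routine; the function name is an assumption, and the data are those of Example 13 with the assumed mean 37.5 and class interval 5.

```python
from math import sqrt

def sd_step_deviation(mid_values, freqs, assumed_mean, class_width):
    """Standard deviation of a continuous series by step deviations."""
    n = sum(freqs)
    # Step (2): deviations from the assumed mean in class-interval units.
    steps = [(m - assumed_mean) / class_width for m in mid_values]
    # Steps (3) and (4): the two totals.
    sum_fdx = sum(f * d for f, d in zip(freqs, steps))
    sum_fdx2 = sum(f * d * d for f, d in zip(freqs, steps))
    # Steps (5) and (6): variance in class-interval units.
    variance_in_steps = sum_fdx2 / n - (sum_fdx / n) ** 2
    # Steps (7) and (8): square root, then back to original units.
    return sqrt(variance_in_steps) * class_width

mid_values = [52.5, 47.5, 42.5, 37.5, 32.5, 27.5, 22.5]
freqs = [25, 30, 40, 45, 80, 110, 170]
print(round(sd_step_deviation(mid_values, freqs, 37.5, 5), 1))  # 9.0
```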
Charlier's check of accuracy
Just as in case of arithmetic average we check the accuracy of
calculations by a formula given by Charlier, similarly, the accuracy of
the calculations can be assured in case of standard deviation also by
the following rule :-
$\Sigma f(dx + 1)^2 - \Sigma f dx^2 - 2\Sigma f dx = N$
Substituting the values in the above formula in Example No. 13
we get
1665 - 2435 - 2(-635) = 500
1665 - 2435 - (-1270) = 500
1665 - 2435 + 1270 = 500
2935 - 2435 = 500
We thus find that the two sides of the equation are equal and this
is a proof of the correct calculation of the value of $\Sigma f dx^2$.
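Charlier's rule is a check on hand-computed totals, so a sketch of it takes the totals as given rather than recomputing them. The Python function below is illustrative only; its name and argument names are assumptions.

```python
def charlier_check(n, sum_fdx, sum_fdx2, sum_f_dx_plus_1_sq):
    """True when the totals satisfy
    sum f(dx+1)^2 - sum f dx^2 - 2 sum f dx = N."""
    return sum_f_dx_plus_1_sq - sum_fdx2 - 2 * sum_fdx == n

# Totals from Example 13.
print(charlier_check(500, -635, 2435, 1665))  # True
# A slip in one total (2345 written for 2435) is caught.
print(charlier_check(500, -635, 2345, 1665))  # False
```

The identity holds because $f(dx+1)^2 = f dx^2 + 2f dx + f$ for every class, so the three totals can only agree when each has been added up correctly (or when errors happen to cancel).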

Sheppard's correction for grouping

In the calculation of standard deviation in a continuous series
we take the mid-points of class intervals to represent the classes, or in
other words, we presume that all the frequencies are concentrated at
the mid-point of a class interval. This may not always be so. However,
if the distribution is symmetrical or even moderately asymmetrical,
and if the class intervals are not greater than 1/12th of the range,
the likelihood is that the assumption would not be far from the truth.
If a distribution is continuous and if the frequency tapers off to
zero in both the directions, a correction in the value of standard deviation
is usually done to remove the effects of grouping. These corrections are
known as Sheppard's corrections for grouping. They are as follows :-

$\sigma_c^2 = \sigma^2 - \dfrac{h^2}{12}$

Where $\sigma_c^2$ stands for the square of the standard deviation after
corrections, $\sigma^2$ for the square of the standard deviation before correction
and $h^2$ for the square of the magnitude of the class intervals.

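Thus if in Example No. 12 above, the standard deviation is corrected
for grouping, it would be as follows :-

Square of the standard deviation before correction $= 141.07$
$h = 10$
Therefore the corrected square $= 141.07 - \dfrac{10^2}{12} = 132.74$
Therefore the corrected standard deviation $= \sqrt{132.74} = 11.5$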
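Sheppard's correction is a one-line adjustment to the variance. A hedged Python sketch follows; the function name is illustrative, and the figures are those of Example 12.

```python
from math import sqrt

def sheppard_corrected_sd(variance, class_width):
    """Standard deviation after deducting h^2 / 12 from the
    uncorrected variance of a grouped distribution."""
    return sqrt(variance - class_width ** 2 / 12)

# Example 12: uncorrected variance 141.07, class interval 10.
print(round(sheppard_corrected_sd(141.07, 10), 1))  # 11.5
```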

It should be remembered that Sheppard's corrections are not applicable
to J-shaped, U-shaped or highly skewed distributions. Further they
should not be applied if the total frequency is not very large, say less
than 1000.

Standard deviation and the spread of observations


It should be noted that in a symmetrical or moderately asymmetrical
series a range which is six times the standard deviation usually covers
at least 99% of the observations. Thus, in Example No. 12 the standard
deviation (after Sheppard's corrections for grouping) is 11.5. Six times
of this gives a range of 69.0. It will be observed that all the observations
lie within a range of 70 (90-20). Similarly in Example No. 13, the
uncorrected standard deviation is 9, and six times this would give a
range of 54. All the observations are covered within a range which is
smaller than this. This property gives a concrete and definite meaning
to standard deviation. We shall discuss more about this property later
on in connection with normal frequency curve. In fact in a symmetrical
or moderately asymmetrical distribution, mean ± 1σ covers about
68% of all the values and mean ± 2σ about 95% and mean ± 3σ about
99% of the values of the variable.

Mathematical properties of standard deviation


(1) Just as it is possible to find out the mean of a series from
the means of its component parts, similarly the standard deviation of
a series can be found out from the standard deviations of its component
parts and their means.
We know that

$a_{12} = \dfrac{n_1 a_1 + n_2 a_2}{n_1 + n_2}$

Where $a_{12}$ stands for the mean of a series and $a_1$ and $a_2$ for the means
of its component parts, and $n_1$ and $n_2$ for the number of items in the
two component parts respectively.
If, further, $\sigma_1$ and $\sigma_2$ stand for the standard deviations of these
component parts and $\sigma_{12}$ for standard deviation of the whole series
then

$\sigma_{12} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$

where $d_1 = (a_1 - a_{12})$ and $d_2 = (a_2 - a_{12})$.
The following example would illustrate the formulae :-


Example 14. Find out the combined mean and standard deviation
from the following data

                        Series    Series
                          A         B
Number of items          100       500
Mean                      50        60
Standard deviation        10        11

Solution.
Combined mean or

$a_{12} = \dfrac{n_1 a_1 + n_2 a_2}{n_1 + n_2} = \dfrac{(100 \times 50) + (500 \times 60)}{100 + 500} = \dfrac{35,000}{600} = 58.3$

Combined standard deviation or

$\sigma_{12} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$

$d_1 = 50 - 58.3 = -8.3$
$d_2 = 60 - 58.3 = 1.7$

$\sigma_{12} = \sqrt{\dfrac{100[(10)^2 + (-8.3)^2] + 500[(11)^2 + (1.7)^2]}{100 + 500}} = \sqrt{\dfrac{16889 + 61945}{600}} = \sqrt{131.39} = 11.5$
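The combination rule extends to any number of component parts, which a short Python sketch makes explicit. The function name and the `(n, mean, sd)` tuple layout are assumptions made for illustration; the figures are those of Example 14.

```python
from math import sqrt

def combine_mean_sd(groups):
    """Combined mean and standard deviation of several component parts.
    Each group is a tuple (number of items, mean, standard deviation)."""
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * a for n, a, _ in groups) / total_n
    # n(sigma^2 + d^2) for each part, d being its mean less the combined mean.
    pooled = sum(n * (s ** 2 + (a - mean) ** 2) for n, a, s in groups)
    return mean, sqrt(pooled / total_n)

mean, sd = combine_mean_sd([(100, 50, 10), (500, 60, 11)])
print(round(mean, 1), round(sd, 1))  # 58.3 11.5
```

Because the sketch carries the combined mean at full precision instead of rounding it to 58.3, its intermediate figures differ slightly from a hand calculation, but the result agrees to one decimal place.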

Similarly the standard deviations of more than two component
parts can be combined in one.
If the number of observations in the two component parts is equal,
and if the means of the two parts are also identical then

$\sigma_{12} = \sqrt{\dfrac{\sigma_1^2 + \sigma_2^2}{2}}$

Thus, if in the above example, the number of items in each case
was 100 and if the mean in each case was 50 the combined standard
deviation by the first method would have been

$= \sqrt{\dfrac{100(100 + 0) + 100(121 + 0)}{100 + 100}} = \sqrt{\dfrac{10000 + 12100}{200}} = \sqrt{\dfrac{22100}{200}} = \sqrt{\dfrac{221}{2}} = 10.51$

If we apply the second rule then

$\sigma_{12} = \sqrt{\dfrac{\sigma_1^2 + \sigma_2^2}{2}} = \sqrt{\dfrac{100 + 121}{2}} = \sqrt{\dfrac{221}{2}} = 10.51$

(2) The standard deviation of the first n natural numbers is

$\sigma = \sqrt{\dfrac{1}{12}(n^2 - 1)}$

We can know from elementary algebra that the sum of the first
n natural numbers is

$\dfrac{n(n + 1)}{2}$

Thus the sum of 1, 2, 3, 4 and 5 $= \dfrac{5(5 + 1)}{2} = 15$

The mean of first n natural numbers $= \dfrac{n + 1}{2}$

Thus the mean of 1, 2, 3, 4 and 5 is $\dfrac{5 + 1}{2}$ or 3. Further it can
be easily proved that the sum of the squares of the first n natural
numbers is

$\dfrac{n(n + 1)(2n + 1)}{6}$

Thus the sum of the squares of natural numbers 1 to 5 would
be 1 + 4 + 9 + 16 + 25 = 55. It is equal to

$\dfrac{5(5 + 1)(10 + 1)}{6} = \dfrac{5 \times 6 \times 11}{6} = 55$

We have seen in Example No. 9 (Direct method No. 2) that

$\sigma = \sqrt{\dfrac{\Sigma m^2 - (\Sigma m)^2 / n}{n}}$

In case of natural numbers $\Sigma m^2 = \dfrac{n(n + 1)(2n + 1)}{6}$ and
$\Sigma m = \dfrac{n(n + 1)}{2}$. If we substitute these equations in the above formula
and simplify it we shall get

$\sigma = \sqrt{\dfrac{(n + 1)(2n + 1)}{6} - \dfrac{(n + 1)^2}{4}}$
$\sigma = \sqrt{\dfrac{1}{12}(n^2 - 1)}$

Thus if we calculate the standard deviation of natural numbers
1 to 5 it will be

$\sigma = \sqrt{\dfrac{1}{12}(5^2 - 1)} = \sqrt{2} = 1.414$

If we calculate the standard deviation by the direct method we
shall get exactly the same answer as follows :-

Size of item    Deviations from Mean (3)    Square of deviations
                          (d)                     ($d^2$)
     1                    -2                         4
     2                    -1                         1
     3                     0                         0
     4                    +1                         1
     5                    +2                         4
                                            $\Sigma d^2 = 10$

Standard deviation $= \sqrt{\dfrac{\Sigma d^2}{n}} = \sqrt{\dfrac{10}{5}} = \sqrt{2} = 1.414$
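The closed formula for the first n natural numbers can be compared with the direct method for several values of n. The Python sketch below is an illustration only; both function names are assumptions.

```python
from math import sqrt

def sd_natural_numbers(n):
    """Standard deviation of 1, 2, ..., n by the formula sqrt((n^2 - 1) / 12)."""
    return sqrt((n * n - 1) / 12)

def sd_direct(values):
    """Direct method, used here only as a cross-check."""
    mean = sum(values) / len(values)
    return sqrt(sum((m - mean) ** 2 for m in values) / len(values))

print(round(sd_natural_numbers(5), 3))  # 1.414
for n in (5, 10, 100):
    # The two methods agree for every n tried.
    print(n, abs(sd_natural_numbers(n) - sd_direct(range(1, n + 1))) < 1e-9)
```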
The standard deviation possesses many other mathematical properties
which are derived primarily from the above two rules.
Example 15. From the data below, giving the averages and
standard deviations of four sub-groups, calculate the average and the
standard deviation of the whole group.
Sub-group    No. of men    Average wage    S. D. wage
                             s.   d.         s.   d.
    A            50          61   0           8   0
    B           100          70   0           9   0
    C           120          80   6          10   0
    D            30          83   0          11   0
  Total         300
Solution.
(i) Total wages of 50 men in sub-group A = 50 x 61s. = 3050s.
Total wages of 100 men in sub-group B = 100 x 70s. = 7000s.
Total wages of 120 men in sub-group C = 120 x 80.5s. = 9660s.
Total wages of 30 men in sub-group D = 30 x 83s. = 2490s.
Total wages of 300 men in sub-groups A, B, C and D = (3050 +
7000 + 9660 + 2490)s. = 22,200s.
Average of the whole group $= \dfrac{22200}{300}$ s. = 74s.
(ii) We know that $s^2 = \sigma^2 + d^2$
where $s^2$ is the second moment about any arbitrary number and
d is the difference between the mean and this number.
Now mean of the whole group is 74.
$\therefore d_1 = 74 - 61 = 13$; $d_2 = 74 - 70 = 4$; $d_3 = 80.5 - 74 = 6.5$ and
$d_4 = 83 - 74 = 9$
Now $N\sigma^2 = N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2) + N_3(\sigma_3^2 + d_3^2) + N_4(\sigma_4^2 + d_4^2)$
or $300\sigma^2 = 50(64 + 169) + 100(81 + 16) + 120(100 + 42.25) + 30(121 + 81)$
$= 11650 + 9700 + 17070 + 6060 = 44480$
or $\sigma^2 = 148.3$
$\sigma = 12.2$
Example 16. For a frequency distribution of marks in history
of 200 candidates (grouped in intervals 0-5, 5-10, ... etc.), the mean
and standard deviation were found to be 40 and 15. Later it was
discovered that the score 43 was misread as 53 in obtaining the frequency
distribution. Find the corrected mean and standard deviation
corresponding to the corrected frequency distribution.
Solution. Let us reconstruct the original and wrong tables.
Let the frequency of the ith group be $f_i$, then:
Class-intervals    Mid-values    Frequency    Frequency
                                  (wrong)     (correct)
    0-5               2.5          $f_1$        $f_1$
    5-10              7.5          $f_2$        $f_2$
    ...               ...           ...          ...
   35-40             37.5          $f_8$        $f_8$
   40-45             42.5          $f_9$        $f_9 + 1$
   45-50             47.5          $f_{10}$     $f_{10}$
   50-55             52.5          $f_{11}$     $f_{11} - 1$
    ...               ...           ...          ...
                                    200          200
The value of the mean when 43 was misread as 53 is given by
$40 = \dfrac{1}{200}(2.5f_1 + 7.5f_2 + \ldots + 37.5f_8 + 42.5f_9 + 47.5f_{10} + 52.5f_{11} + \ldots)$
Let the value of the corrected mean be $\bar{x}$.
Then $\bar{x} = \dfrac{1}{200}(2.5f_1 + 7.5f_2 + \ldots + 37.5f_8 + 42.5(f_9 + 1) + 47.5f_{10} + 52.5(f_{11} - 1) + \ldots)$
Let $2.5f_1 + \ldots + 37.5f_8 + 47.5f_{10} + 57.5f_{12} + \ldots = s$
Then $40 = \dfrac{1}{200}(s + 42.5f_9 + 52.5f_{11})$ or $s + 42.5f_9 + 52.5f_{11} = 8000$
and $\bar{x} = \dfrac{1}{200}[s + 42.5(f_9 + 1) + 52.5(f_{11} - 1)]$
$= \dfrac{1}{200}[s + 42.5f_9 + 52.5f_{11} + 42.5 - 52.5]$
$= \dfrac{1}{200}(8000 - 10) = \dfrac{7990}{200} = 39.95$
The corrected mean corresponding to the corrected distribution
is 39.95.
Calculation of the corrected Standard Deviation.
When 43 was misread as 53, the second moment about 40, which
was thought to be equal to $\sigma^2$, is given by :
$(15)^2 = \dfrac{1}{200}[f_1(2.5 - 40)^2 + f_2(7.5 - 40)^2 + \ldots + f_9(42.5 - 40)^2 + f_{10}(47.5 - 40)^2 + f_{11}(52.5 - 40)^2 + \ldots]$
Let $f_1(37.5)^2 + f_2(32.5)^2 + \ldots + f_8(2.5)^2 + f_{10}(7.5)^2 + f_{12}(17.5)^2 + \ldots = s$
Then $225 = \dfrac{1}{200}[s + (2.5)^2 f_9 + (12.5)^2 f_{11}]$ or $s + 6.25f_9 + 156.25f_{11} = 45000$
Let the correct value of the second moment about 40 be $s_1^2$.
Then
$s_1^2 = \dfrac{1}{200}[s + 6.25(f_9 + 1) + 156.25(f_{11} - 1)]$
$= \dfrac{1}{200}[s + 6.25f_9 + 156.25f_{11} + 6.25 - 156.25]$
$= \dfrac{1}{200}[45000 - 150] = \dfrac{44850}{200} = 224.25$
$\sigma^2 = s^2 - d^2$ where d is the difference between the actual and
assumed mean.
In this example $s^2 = s_1^2 = 224.25$ and $d = (40 - 39.95) = 0.05$
$\therefore \sigma^2 = 224.25 - 0.0025$
$= 224.2475$
$\therefore \sigma = 14.97$
The corrected standard deviation corresponding to the corrected
distribution is 14.97.
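A correction of this kind can also be made without the unknown frequencies, by recovering the totals Σfm and Σfm² from the published mean and standard deviation and adjusting them. The Python sketch below uses that route, which is a different but equivalent arrangement of the working above (moments taken about zero instead of about the old mean); the function name is an assumption. The misread and correct values are entered as the mid-values of their classes, 52.5 and 42.5.

```python
from math import sqrt

def correct_mean_sd(n, mean, sd, wrong_value, right_value):
    """Mean and standard deviation after one wrongly entered value
    is replaced by the right one."""
    # Recover the totals implied by the published figures, then adjust them.
    total = n * mean - wrong_value + right_value
    total_sq = n * (sd ** 2 + mean ** 2) - wrong_value ** 2 + right_value ** 2
    new_mean = total / n
    new_sd = sqrt(total_sq / n - new_mean ** 2)
    return new_mean, new_sd

# Example 16: the score counted in class 50-55 belongs to class 40-45.
new_mean, new_sd = correct_mean_sd(200, 40, 15, wrong_value=52.5, right_value=42.5)
print(round(new_mean, 2), round(new_sd, 2))  # 39.95 14.97
```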
Example 17. The mean age and standard deviation of a group
of 100 persons (grouped in intervals 10-, 12-, ... etc.) were found
to be 32.02 and 13.18. Later it was discovered that the age 57 was
misread as 27. Find the corrected mean and standard deviation.
Solution. The age 57 belongs to the group 56-58 (mid-value
= 57) and the age 27 belongs to the group 26-28 (mid-value = 27).
Let the misread frequencies of these two groups be $f_1$ and $f_2$. Then
the corrected frequencies will be $(f_1 + 1)$ and $(f_2 - 1)$ respectively. All
other frequencies have been entered correctly.
Mid-value    Frequency (wrong)    Frequency (correct)
   57             $f_1$               $f_1 + 1$
   27             $f_2$               $f_2 - 1$
Value of the mean when 57 was misread as 27 is given by
$32.02 = \dfrac{1}{100}(s + 57f_1 + 27f_2)$ or $s + 57f_1 + 27f_2 = 3202$
Where s is the sum of the products of correct frequencies and
values. Let the correct value of the mean be $\bar{x}$, then
$\bar{x} = \dfrac{1}{100}\{s + 57(f_1 + 1) + 27(f_2 - 1)\}$
$= \dfrac{1}{100}\{s + 57f_1 + 27f_2 + 57 - 27\}$
$= \dfrac{1}{100}\{3202 + 30\} = \dfrac{3232}{100} = 32.32$
$\therefore$ Corrected mean = 32.32.

Value of the standard deviation when 57 was misread as 27 is given by :
$(13.18)^2 = \dfrac{1}{100}[s + f_1(57 - 32.02)^2 + f_2(27 - 32.02)^2]$
Where $s = \Sigma f(x - 32.02)^2$ for all correct values of f,
or $173.7124 = \dfrac{1}{100}(s + 624.0004f_1 + 25.2004f_2)$

It is the second moment about 32.02 when 57 was misread as 27.
Corrected second moment about 32.02 $= s_1^2$ is given by

$s_1^2 = \dfrac{1}{100}[s + 624.0004(f_1 + 1) + 25.2004(f_2 - 1)]$
$= \dfrac{1}{100}(s + 624.0004f_1 + 25.2004f_2 + 624.0004 - 25.2004)$
$= \dfrac{1}{100}(17371.24 + 598.8) = \dfrac{17970.04}{100} = 179.7004$

We know that $\sigma^2 = s^2 - d^2$ where d is the difference between the
actual and assumed mean.
In this example $s^2 = s_1^2 = 179.7004$ and $d = 32.32 - 32.02 = 0.3$
$\therefore \sigma^2 = 179.7004 - 0.09 = 179.6104$
$\therefore$ Corrected standard deviation $= \sqrt{179.6104} = 13.40$
Example 18. The mean and the standard deviation of 1000
values of a variate (grouped in intervals 2.5-, 7.5-, ... etc.) were found
to be 29.93 and 9.977. Later it was discovered that in calculating
these values the errata which was supplied with the data was not
considered. The errata read as follows :
Group 7.5-12.5 for frequency 30 read frequency 28
Group 17.5-22.5 for frequency 120 read frequency 121
Group 27.5-32.5 for frequency 200 read frequency 198
Group 32.5-37.5 for frequency 175 read frequency 176
Group 47.5-52.5 for frequency 25 read frequency 27
Calculate the corrected mean and standard deviation for the
corrected series.
Solution. The frequencies in all other groups were recorded
correctly. Let these frequencies be denoted by $f_i$ and let $\Sigma f_i x_i = s$.
The value of the mean when the errata was not considered is given
by :
$29.93 = \dfrac{1}{1000}[s + (30 \times 10) + (120 \times 20) + (200 \times 30) + (175 \times 35) + (25 \times 50)]$
$= \dfrac{1}{1000}[s + 300 + 2400 + 6000 + 6125 + 1250]$
$= \dfrac{1}{1000}[s + 16075]$    $\therefore s = 29930 - 16075 = 13855$

Correct value of the mean is given by :

$\bar{x} = \dfrac{1}{1000}[s + (28 \times 10) + (121 \times 20) + (198 \times 30) + (176 \times 35) + (27 \times 50)]$
$= \dfrac{1}{1000}[13855 + 280 + 2420 + 5940 + 6160 + 1350]$
$= \dfrac{30005}{1000} = 30.005$
Again let $\Sigma f_i(x_i - 29.93)^2 = T$, where $f_i$ are the correct frequencies.
Then second moment about 29.93, when errata was not
considered, is given by :

$(9.977)^2 = \dfrac{1}{1000}[T + 30(10 - 29.93)^2 + 120(20 - 29.93)^2 + 200(30 - 29.93)^2 + 175(35 - 29.93)^2 + 25(50 - 29.93)^2]$
or $99.5405 = \dfrac{1}{1000}[T + 38318.195]$
or $T = 99540.5 - 38318.195 = 61222.305$

Correct second moment about 29.93 is given by

$s_1^2 = \dfrac{1}{1000}[T + (28 \times 397.2049) + (121 \times 98.6049) + (198 \times .0049) + (176 \times 25.7049) + (27 \times 402.8049)]$
$= \dfrac{1}{1000}[61222.305 + 38453.695] = \dfrac{1}{1000} \times 99676 = 99.676$

We know that $\sigma^2 = s^2 - d^2$. Here $s^2 = s_1^2 = 99.676$ and $d = .075$
$\therefore \sigma^2 = 99.676 - .0056 = 99.6704$
$\therefore \sigma = 9.98$

Merits, demerits and uses of standard deviation

The standard deviation possesses most of the characteristics which
an ideal measure of dispersion should have. Thus:
(1) Standard deviation is rigidly defined and its value is always
definite.
(2) It is based on all the observations of the data.
(3) It is amenable to algebraic treatment and possesses many
mathematical properties. It is on account of these properties that
standard deviation is used in many advanced studies.
(4) It is less affected by the fluctuations of sampling than most
other measures of dispersion.
(5) The squaring of the deviations makes them positive and the
difficulty about algebraic signs which was experienced in case of mean
deviation is not found here.
(6) However, standard deviation is not easy to calculate, nor
is it easily understood. In any case it is more cumbersome in its
calculation than either quartile deviation or mean deviation.
(7) It gives more weight to extreme items and less to those which
are near the mean, because the squares of the deviations, which are big
in size, would be proportionately greater than the squares of those
deviations which are comparatively small. Thus deviations 2 and 8
are in the ratio of 1 : 4 but their squares, i.e., 4 and 64, would be in
the ratio of 1 : 16.
The above merits and demerits of the standard deviation show
that despite some drawbacks, it is the best measure of dispersion and
should be used wherever possible. However, the standard deviation
has not found favour with economists and businessmen because it gives
greater weight to extreme items and economists and businessmen are
more interested in the results of the modal class. Moreover, the
difficulties of its calculation are also responsible for its comparatively
lesser popularity with the common man. But it should always be kept
in mind that just as mean is the best measure of central tendency (leaving
exceptional cases), standard deviation is the best measure of dispersion,
excepting a few cases, where mean deviation or quartile deviation may
give better results.
OTHER MEASURES OF DISPERSION

Besides range, quartile deviation, mean deviation and standard
deviation there are some other measures of dispersion also. They
are not in common use and comparatively much less important than
others. They are :
Modulus
Modulus is the square root of twice the second moment of
dispersion about the mean. It is generally denoted by C.
Thus

$C = \sqrt{\dfrac{2\Sigma d^2}{n}}$

Modulus is equal to standard deviation multiplied by the square
root of 2 or
$C = \sigma \times \sqrt{2}$
Like standard deviation this measure is also based on the second
moment about the mean.
Precision
It is the reciprocal of modulus.
Thus
Precision $= \dfrac{1}{\sigma\sqrt{2}}$
Probable error
It is equal to .67449 x standard deviation.
Modulus, precision and probable errors are used in the theory of
errors of observations. We shall discuss them in chapters on Sampling.
Standard deviation should not be confused with the term "Standard
Error" which stands for the standard deviation of simple sampling.
The concept of standard error will also be discussed in detail in the
chapters on Sampling.
Variance
It is equal to the square of the standard deviation or in other words
it is the second moment about the mean.

Coefficient of variation
It stands for the percentage which the value of standard deviation
is to the value of the mean. In other words, if standard deviation
is divided by the mean and multiplied by 100 we get the coefficient
of variation. This measure was first suggested by Professor Karl
Pearson. According to him, coefficient of variation is the "percentage
variation in the mean, the standard deviation being treated as the total
variation in the mean."
Symbolically
Coefficient of variation or $V = \dfrac{\sigma}{a} \times 100$
= Coefficient of standard deviation x 100
Thus, if the mean of a series is 50 and the standard deviation is 10,
the coefficient of variation would be
$\dfrac{10}{50} \times 100$
or 20%
It means that the standard deviation is 20% of the mean.
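As a brief illustration, the coefficient of variation can be computed as below. This Python sketch is not from the original text and the function name is an assumption; the second call uses the mean 63 and standard deviation 2.72 of the heights in Example 9.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean

print(coefficient_of_variation(10, 50))              # 20.0
print(round(coefficient_of_variation(2.72, 63), 1))  # 4.3
```

Being a pure percentage, the measure can be compared across series recorded in different units.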
Gini's mean difference
Corrado Gini, an Italian statistician, has suggested that instead
of measuring dispersion from any measure of central tendency, the
mean difference between the values of all possible pairs of the variable
should be found out, and it would give a good measure of dispersion.
Thus, this measure of dispersion is equal to the mean difference
(regardless of algebraic signs) of each possible pair of the values of the variable.
Symbolically
Gini's mean difference $= \dfrac{g}{m}$
Where g stands for the total of the differences in the values of all
possible pairs of a variable and m stands for the total number of
differences. The total number of differences would be equal to $\frac{1}{2}n(n - 1)$.
The following example would illustrate the above formulae :-
Example 19. Find out Gini's mean difference from the following
items :-
22, 24, 26, 28, 30.
Solution
30-22 = 8      28-22 = 6      26-22 = 4      24-22 = 2
30-24 = 6      28-24 = 4      26-24 = 2
30-26 = 4      28-26 = 2
30-28 = 2
Total = 20     Total = 12     Total = 6      Total = 2
The sum of all the differences or
g = (20 + 12 + 6 + 2) = 40
The total number of differences $= \frac{1}{2}n(n - 1) = \frac{1}{2} \times 5(5 - 1) = 10$
Gini's mean difference $= \dfrac{g}{m} = \dfrac{40}{10} = 4$

The mean deviation of the above series = 2.4 and the standard
deviation = 2.8.
Gini's mean difference is always more than the mean deviation
as it gives greater importance to extreme variations. The value of
Gini's mean difference lies in the fact that it studies the variations
amongst the values of a variable rather than from a central value.
If the square root of the average of the squares of all differences is
found it would always be equal to $\sigma\sqrt{2\left(\dfrac{n}{n - 1}\right)}$ or nearly $\sigma\sqrt{2}$.
In other words, it would almost be equal to the value of modulus.
In the above example the average of the squares of all the
differences is $\dfrac{200}{10}$ or 20. Its square root is $\sqrt{20}$. The standard deviation
of the series $= \sqrt{8}$.

Thus $\sqrt{20} = \sqrt{8} \times \sqrt{2\left(\dfrac{5}{5 - 1}\right)} = \sqrt{\dfrac{40}{5}} \times \sqrt{\dfrac{5}{2}} = \sqrt{\dfrac{40}{5} \times \dfrac{5}{2}} = \sqrt{20}$
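Gini's mean difference lends itself to a direct enumeration of pairs. The following Python sketch, with illustrative function names, lists every pair once with `itertools.combinations` and also checks the root-mean-square relation stated above for Example 19.

```python
from itertools import combinations
from math import sqrt

def gini_mean_difference(values):
    """Mean of the absolute differences over all n(n-1)/2 pairs."""
    diffs = [abs(a - b) for a, b in combinations(values, 2)]
    return sum(diffs) / len(diffs)

def rms_difference(values):
    """Square root of the average squared difference over all pairs."""
    sq = [(a - b) ** 2 for a, b in combinations(values, 2)]
    return sqrt(sum(sq) / len(sq))

items = [22, 24, 26, 28, 30]
n = len(items)
mean = sum(items) / n
sd = sqrt(sum((m - mean) ** 2 for m in items) / n)

print(gini_mean_difference(items))  # 4.0
# The two sides of the relation rms = sd * sqrt(2n / (n - 1)).
print(round(rms_difference(items), 3), round(sd * sqrt(2 * n / (n - 1)), 3))
```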

Relationship between various measures of dispersion

For a normal distribution or even for a moderately asymmetrical
distribution the following relationships between quartile deviation,
mean deviation and standard deviation hold good :-

Measure of      Percentage of observations included within a      Size of various
dispersion      range of the stated multiple of the measure      measures of
                on either side of the mean                        dispersion in
                                                                  relation to
                ± 1 time      ± 2 times     ± 3 times             standard
                the measure   the measure   the measure           deviation

Quartile           50.0          82.3          95.7                 0.6745
deviation

Mean               57.5          88.9          98.3                 0.7979
deviation

Standard           68.3          95.4          99.7                 1.0000
deviation

Thus for a symmetrical or moderately asymmetrical distribution :-
(1) The quartile deviation is .6745 times the standard deviation or
roughly 2/3rd the value of the standard deviation.
(2) The mean deviation from mean is equal to .7979 times the
standard deviation or roughly 4/5th of the standard deviation.
(3) It follows from the above that a range six times the standard
deviation is equal to a range nine times the quartile deviation and 7.5
times the mean deviation. Within these ranges at least 99% of the
observations are covered.
(4) Mean ± 1 standard deviation would include 68.3% of the
cases.
Mean ± 2 standard deviation would include about 95.4% of
cases.
Mean ± 3 standard deviation would include 99.7% of the cases.
Mean ± 1 quartile deviation would include 50% of the cases.
Mean ± 1 mean deviation would include 57.5% of the cases.
(5) The probable error is .6745 times the standard deviation.
Thus mean ± 1 probable error would cover roughly 50% of all
the observations.
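For an exactly normal distribution these constants can be reproduced with Python's standard `statistics.NormalDist`. The sketch below is only an illustration and assumes strict normality, whereas the text applies the figures to moderately asymmetrical series as approximations; the helper name `coverage` is an assumption.

```python
from math import pi, sqrt
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

qd_ratio = z.inv_cdf(0.75)  # quartile deviation / standard deviation
md_ratio = sqrt(2 / pi)     # mean deviation / standard deviation

def coverage(k):
    """Percentage of cases within k standard deviations of the mean."""
    return 100 * (z.cdf(k) - z.cdf(-k))

print(round(qd_ratio, 4), round(md_ratio, 4))  # 0.6745 0.7979
for name, ratio in (("quartile", qd_ratio), ("mean", md_ratio), ("standard", 1.0)):
    # Coverage of mean +/- 1, 2 and 3 times each measure.
    print(name, [round(coverage(m * ratio), 1) for m in (1, 2, 3)])
```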

Choice of a measure dispersion


We have already studied the merits and demerits of various measures of dispersion and we are now in a position to make a comparative study of their qualities. It would help in the selection of an appropriate measure of dispersion for a particular problem under study. Range, as a measure of dispersion, suffers from serious drawbacks; it is an unstable measure, affected considerably by the fluctuations of sampling, and as such, its use cannot be advocated except in cases where the variation in the size of items is very little. The quartile deviation is a better measure than the range, as it is not affected too much by the values of extreme items. It is easily calculated and is readily understood. In these respects it is better than even the mean and standard deviations. But quartile deviation has no algebraic properties and its behaviour under fluctuations of sampling is freakish. As such its use can be recommended only in those cases where mean deviation or standard deviation cannot be easily calculated or its calculation is impossible, as in case of indefinite extreme classes (like more than 1000 or less than 100). Between mean deviation and standard deviation, the former has an advantage of comparatively simple calculation and easy understandability, but the mathematical properties possessed by the standard deviation are not found in this measure, and it is not easily amenable to further algebraic treatment. However, in cases where median is supposed to be an ideal average, the best measure of dispersion would probably be the mean deviation. In other cases the standard deviation scores over all other measures of dispersion. We have already seen that amongst the measures of central tendency, mean occupies a unique position, and the same position is occupied by the standard deviation amongst the measures of dispersion. Standard deviation is rigidly defined, is based on all the observations, is capable of algebraic treatment, and is not affected very much by fluctuations of sampling. However, it should be kept in mind that standard deviation gives comparatively greater importance to extreme variations, which should usually be ignored.
The above discussion leads us to the conclusion that though the choice of a measure of dispersion would depend on the nature, purpose and object of an investigation, yet for all practical purposes the standard deviation is a better measure of dispersion than others.
It should further be remembered that for comparison of variability of two series, we should always choose a relative measure of dispersion. Absolute measures of dispersion sometimes give very misleading conclusions. If, for example, the profits of two companies A and B during the last three years are as follows:
      A
  (Rupees)
    2,000
    3,000
    4,000

The range in both the cases is Rs. 2,000 and the mean deviation is Rs. 666.7 in both the cases. The absolute measures of dispersion are thus equal but the variation in the two series is, in reality, not identical. If, however, we calculate relative measures of dispersion this anomaly would be removed. The coefficient of range in the two cases would be 1/3 and 1/11 respectively and similarly the mean coefficient of dispersion would be 2/9 and 2/33 respectively. In comparing dispersion of two series, expressed in different units, the use of relative measures of dispersion is inevitable because absolute measures of dispersion in such cases would be in different units.
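The contrast between absolute and relative measures can be sketched in a few lines. In the sketch below the figures for company A are those given in the text; the figures for company B are an assumption, chosen only so that its range is Rs. 2,000 and its mean deviation Rs. 666.7 as quoted. The function names are illustrative.

```python
def coefficient_of_range(values):
    """(largest - smallest) / (largest + smallest)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

def mean_coefficient_of_dispersion(values):
    """Mean deviation expressed as a fraction of the mean."""
    mean = sum(values) / len(values)
    return mean_deviation(values) / mean

profits_a = [2000, 3000, 4000]       # from the text
profits_b = [10000, 11000, 12000]    # hypothetical series with the same range and mean deviation

for name, series in (("A", profits_a), ("B", profits_b)):
    print(name,
          "range:", max(series) - min(series),
          "mean deviation:", round(mean_deviation(series), 1),
          "coefficient of range:", round(coefficient_of_range(series), 4),
          "mean coefficient of dispersion:", round(mean_coefficient_of_dispersion(series), 4))
```

Both series show the same absolute dispersion, while the relative measures are smaller for the series with the larger values.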
Lorenz curve
Dispersion can be studied graphically also with the help of what is called the Lorenz Curve, after the name of Dr. Lorenz who first studied the dispersion of distribution of wealth by the graphic method. The technique of drawing a Lorenz Curve is not very difficult. In it the size of items and the frequencies are both cumulated and, taking the total as 100, percentages are calculated for the various cumulated values. These percentages are plotted on a graph paper. If there is proportionately equal distribution of the frequencies over various values of a variate, the points would lie in a straight line. This line is called the "Line of Equal Distribution." If, however, the distribution of items is not proportionately equal, it indicates variability, and the curve would be away from the line of equal distribution. The farther the curve is from this line, the greater is the variability in the series. The following example would illustrate the procedure of drawing a Lorenz curve:
Example 16. Draw a Lorenz curve from the following data:

Income in              Number of persons in thousands
thousand rupees      Group A      Group B      Group C
      10                 5            8           15
      20                10            7            6
      40                20            5            2
      50                25            3            1
      80                40            2            1
To draw the Lorenz Curve from the above data the size of the item and frequencies would have to be cumulated and then percentages would have to be calculated by taking the respective totals as 100. This is done in the following table:

        Income                   Group A                  Group B                  Group C
Income  Cumu-   Cumu-    No. of  Cumu-   Cumu-    No. of  Cumu-   Cumu-    No. of  Cumu-   Cumu-
        lated   lated %  persons lated   lated %  persons lated   lated %  persons lated   lated %
  10      10       5        5       5       5        8       8      32       15      15      60
  20      30      15       10      15      15        7      15      60        6      21      84
  40      70      35       20      35      35        5      20      80        2      23      92
  50     120      60       25      60      60        3      23      92        1      24      96
  80     200     100       40     100     100        2      25     100        1      25     100
Now the cumulative percentages would be plotted on a graph paper. Percentages relating to the number of persons would be shown on the abscissa and from left to right the scale would begin with 100 and end with 0. The income percentages would be shown on the ordinate and here the scale will begin with 0 at the bottom and go up to 100 at the top. The above percentages would give the following type of curve:

From the above figure it is clear that in the first group of persons, the distribution of income is proportionately equal so that 5% of the income is shared by 5% of the population, 15% of the income by 15% of the population and so on. It gives the line of equal distribution. In the second group the distribution is uneven so that 5% of the income is shared by 32% of the people and 15% of the income by 60% of the people. In the third group the distribution is still more unequal so that 5% of the income is shared by 60% of the people and 15% of the income by 84% of the people. The variation in group C is thus greater than the variation in group B. Curve C is thus at a greater distance from the line of equal distribution than curve B.
The Lorenz curve has a great drawback. It does not give any numerical value of the measure of dispersion. It merely gives a picture of the extent to which a series is pulled away from an equal distribution. It should be used along with some numerical measure of dispersion. It is very useful in the study of income distributions, distributions of land and wages, etc.
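The cumulation step behind a Lorenz curve can be sketched as follows, using the data of Example 16. The helper name `cumulative_percentages` is illustrative; plotting is left out so that the sketch needs no graphics library.

```python
def cumulative_percentages(values):
    """Cumulate the values and express each running total as a percentage of the grand total."""
    total = sum(values)
    running = 0
    percentages = []
    for value in values:
        running += value
        percentages.append(100 * running / total)
    return percentages

income = [10, 20, 40, 50, 80]            # thousand rupees
persons = {
    "A": [5, 10, 20, 25, 40],            # thousands of persons
    "B": [8, 7, 5, 3, 2],
    "C": [15, 6, 2, 1, 1],
}

income_pct = cumulative_percentages(income)
persons_pct = {group: cumulative_percentages(freq) for group, freq in persons.items()}

# Each (persons %, income %) pair is one point on that group's Lorenz curve.
for group, pct in persons_pct.items():
    print(group, list(zip(pct, income_pct)))
```

For group A every point has equal coordinates, which is the line of equal distribution; the points for groups B and C fall progressively farther from it.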

Questions
1. What is meant by dispersion? What are the methods of computing measures of dispersion? Illustrate the practical utility of such methods.
eM. C_., Ail••, 194').
2. Explain the meaning of the term dispersion and distinguish between absolute and relative measures of dispersion. (B. Com., Allahabad, 1946).
3. Discuss the various ways in which the differences in the characteristics of frequency distributions are generally measured. (B. Com., Lucknow, 1957).
4. Explain the various methods of describing the scatter of a frequency distribution and say what you know as to the relative worth of the relative measures. (B. Com., Nagpur, 1944).
5. Frequency distributions may either differ in the numerical size of their averages though not necessarily in their formations or they may have the same values of the average but differ in their respective formations.
Explain and illustrate how the measures of dispersion afford a supplement to the information about the frequency distributions given by the averages. (M. Com., Rajputana, 1952).
6. Define carefully the mean deviation, standard deviation and quartile deviation of any given distribution. In what problems should each be used? (M. A., Allahabad, 1940).
7. What are the mathematical properties of standard deviation? How is it a better measure of dispersion than the mean deviation or quartile deviation?
8. What is meant by Sheppard's Corrections? Under what conditions should these corrections be made?
9. Define dispersion. Why is it necessary to measure dispersion in order to make comparisons of frequency distributions?
10. What is range? What are its advantages and disadvantages as a measure of dispersion?
11. Find directly the standard deviation of the natural numbers from 1 to 10 and verify the answer obtained by a short cut method.
12. Write short notes on
(a) Lorenz Curve (b) Charlier's Check (c) Gini's Mean Difference (d) Precision (e) Modulus (f) Root Mean Square deviation.

13. The following table gives weights of one hundred persons. Compute the coefficient of dispersion by the Method of Limits.
Weight in lbs. of 100 persons
Class-interval No. of persons
85- 95 4
95-105 13
105-115 8
115-125 14
125-135 9
135-145 16
145-155 17
155-165 9
165-175 8
175-185 2

100

14. What are the different measures of dispersion? The following table gives the height of one hundred persons. Calculate the dispersion by Range Method.
Height of 100 persons in inches
Height in inches     Frequency
Below 62                 2
  "   63                 8
  "   64                19
  "   65                32
  "   66                45
  "   67                58
  "   68                85
  "   69                93
  "   70               100
15. The following are the marks obtained by a batch of 9 students in a certain
test : -
Serial Number Marks Serial number Marks
(out of 100) (out of 100)
1 68 5 54
2 49 6 38
3 32 7 59
4 21 8 66
9 41
Calculate the mean deviation of the series.

16. Summary of receipts and Passengers of a certain Motor Bus Company


Year Receipt Passengers
1925 2,~54 SO,010
1926 2,780 61,060
1927 3,011 70,005
1928 3,020 70,110
1929 3,541 83,001
1930 4,150 91,100
1931 5,000 1,00,000
(B. Com. Allahabad, 1932).
From the foregoing data, find out one measure of dispersion and state whether the variation in receipts is greater than in passengers.

17. Find Mean Deviation of the distribution given below:

No. of       Persons having said      No. of       Persons having said
accidents    number of accidents      accidents    number of accidents
    0               15                    7                2
    1               16                    8                1
    2               21                    9                2
    3               10                   10                2
    4               17                   11                0
    5                8                   12                2
    6                4
                                        Total            100

18. Calculate the mean deviation from the following data. What light does it throw on the social conditions of the community?
Difference in age between husband and wife in a particular community.
Difference in years   Frequency   Difference in years   Frequency
     0- 5               449            20-25               109
     5-10               705            25-30                52
    10-15               507            30-35                16
    15-20               281            35-40                 4

19. The following table gives the age distributions of students admitted to a college in the years 1914 and 1938. Find which of the two groups is more variable in age.
           Number of students admitted in
Age 1914 1938
15- 0 1
16- 1 6
17- 3 .4
18- 8 :2
19- ·12 ·5
20- 14 :0
21- 14 7
22- 5 .9·
23- .2 3
24- 3 0
25- 1 O.
.Q 1
26-
27- 1 0

Toral 6-4 1.48


(B. C_. ~. 19-42).

20. Calculate quartile deviation and its coefficient of A's monthly earnings for a year.
Months   Monthly earnings   Months   Monthly earnings
              Rs.                         Rs.
   1          139              7          160
   2          150              8          161
   3          151              9          162
   4          151             10          162
   5          157             11          173
   6          158             12          175
21. From the following table giving height of students calculate the Semi-Interquartile Range and the Coefficient of Quartile Deviation.

Height 'in inches No. of students Height in inches No. of students


53 25 63 24
55 21 65 22
57 28 67 18
59 20 69 23
61 18

22. Find out the Standard Deviation of the following items : -


8, 10, 12, 14, 16, 18, 20, 22, 24, 26.

23. Compute the standard deviation of the rainfall in the various jute-growing districts of Bengal from the following statement:

Districts       Rainfall in inches    Districts       Rainfall in inches
                   (1939 July)                           (1939 July)
24-Parganas          17.36            Rajshahi            21.23
Murshidabad          19.17            Dacca               27.10
Khulna               22.99            Chittagong          40.97
Burdwan              17.00            Cooch-Bihar         26.58
Midnapur             14.99            Hoogly              17.67
(B. A. Hons., Punjab, 1941).

24. Calculate the standard deviation of the following two series. Which shows greater deviation?

Series A Series B Series A Series B


192 83 260 126
288 87 348 126
236 93 291 101
229 109 330 102
184 124 243 108
(P. C. S. 1938).

25. Find standard deviation of the figures in the following table to show whether the variation is greater in the area or the yield.

                                      Yield in lacs of bales
Years      Area in lacs of acres      of 400 lbs. each
1914-15 1~ G
-16 114 51
-17 138 50
-18 154 45
-19 144 40
-20 153 53
-21 144 59
-22 in w
-23 136 63
1923-24 154 60

26. The index numbers of prices of cotton and coal shares in 1942 were as under:

             Index number of    Index number of
Month        prices of          prices of
             cotton shares      coal shares
January 188 131
February 178 130
March 173 130
April 164 129
May 172 129
June 183 120
July ~q4 127
August 185 127
September 211 130
October 217 137
November 232 140
December 240 142
Which of the two shares do you consider more variable in price?
(M. A. Agra, 1944).
27. The fluctuations in the rates of Kohinoor and Tata Deferred on the 7th March are given below. Find out which of the two shares shows greater variability.
Kohinoor-618, 619, 616, 623, 620, 624, 622, 625, 622, 625, 626, 625.
Tata deferred-2152l, 21321, 2134!, 2132t, 2145, 2142t, 21461. 2130, 21461,
21421, 2150, 2135, 2152t. (Bombay. 1955).
28. The following table gives the number of finished articles turned out per day by different number of workers in a factory. Find the mean value and the "standard deviation" of the daily output of finished articles, and explain the significance of "standard deviation":
Number of Number of Number of Number of
articles workers articles workers
18 3 23 17
19 7 24 18
20 11 25 8
21 14 26 5
22 18 27 4
(D. Com. Calcutta, 1937).
29. Calculate the Standard Deviation of the following data with regard to 2,298
families in the U. K.
Number of persons Number of Number of persons Number of
in the family families in the family families
1 165 7 77
2 552 8 41
3 580 9 20
4 433 10 8
5 268 11 5
6 148 12 1
Total 2,298
(M. A. A/M., 1942).
30. Find out the mean daily earnings and standard deviation of earnings from
the following data :
                                 Rs.
 36 men get at the rate of       5.0 per man per day
 40  "   "   "   "               5.5  "   "   "
 94  "   "   "   "               6.0  "   "   "
138  "   "   "   "               6.5  "   "   "
 80  "   "   "   "               7.0  "   "   "
 61  "   "   "   "               7.5  "   "   "
 25  "   "   "   "               8.0  "   "   "

31. Calculate the standard deviation for the following table giving the age distribution of 542 members of the House of Commons.
Age       No. of members
20-             3
30-            61
40-           132
50-           153
60-           140
70-            51
80-             2
Total         542
32. The following table gives the frequency distribution of expenditure on food per family per month among working class families in two localities. Find the arithmetic average and the standard deviation of the expenditure at both places.
Range of expenditure No. of families
in Rs. per month Place A Place B
Rs. 3- 6 28 39
6-- 9 292 284
" 9-12 389 401
" 12-15 212 202
15-18 59 48
18-21 18 21
~~ ~ 5
(P. C. S., 41).

33. Find the mean yield of paddy and the standard deviation for the distribution
of the results of 3,061 crop-cutting experiments shown in the following table --
Yield of paddy per acre in
Lbs. No. of experiments
0- 400 236
401- 800 481
801-1200 604
1201-1600 576
1601--2000 419
2001--2400 333
2401--2800 217
2801--3200 87
3201--3600 64
3601-4000 23
4001-4400 14
4401-4800 6
4801--5200 1

3061
(B. Com., Bombay, 1945).

34. Calculate the mean and standard deviation of the following series--
Marks Number of students Marks Number of students
1- 5 1 21-25 7
6-10 18 26-30 2
11-15 25 31-35 1
16--20 26

35. Find out the mean and standard deviation of the following data : -
Age under Number of persons Age under Number of persons
dying dying
10 15 50 100
20 30 60 110
30 53 70 115
40 75 80 125
36. Find out the co-efficient of variation of the following series:
                 Number of                     Number of
Income           persons      Income           persons
More than 1000       0        More than 500      600
  "    "   900      50          "    "   400     750
  "    "   800     110          "    "   300     850
  "    "   700     200          "    "   200     900
  "    "   600     400          "    "   100    1000
37. Calculate the standard deviation of the following series:
Marks            Number of students
More than  0          100
  "    "  10           90
  "    "  20           75
  "    "  30           50
  "    "  40           25
  "    "  50           15
  "    "  60            5
  "    "  70            0
38. Find out the mean and variance from the following data:
                                                  Factory A    Factory B
Wages                                             No. of       No. of
                                                  workers      workers
Not exceeding Rs. 40                                 30           45
Exceeding Rs.  40 but not exceeding Rs.  80          25           35
    "      "   80  "   "      "      "  120          30           25
    "      "  120  "   "      "      "  160          45           40
    "      "  160  "   "      "      "  200          25           25
    "      "  200  "   "      "      "  240          13           20
    "      "  240  "   "      "      "  280          24            5
    "      "  280  "   "      "      "  320           8            5
Total                                               200          200
39. A collar manufacturer is considering the production of a new style of collar to attract young men. The following statistics of neck circumferences are available based upon measurements of a typical group of college students:
Mid-value No. of students Mid-value No. of students
(inches) (inches)
12.5 4 15.0 29
13.0 19 15.5 18
13.5 30 16.0 1
14.0 63 16.5 1
14.5 66
Compute the Standard Deviation and use the criterion (Mean ± 3 Standard Deviation) to determine the largest and smallest size of collars he should make in order to meet the needs of practically all his customers, bearing in mind that collars are worn, on average, 3/4 inch larger than neck size. (D. Com., Raj., 1949).

40. Calculate the arithmetic average and the standard deviation of the following figures and state the percentages of cases which lie outside the mean at distance a ± σ, a ± 2σ, a ± 3σ, where σ stands for the standard deviation.

148, f45, 141, 116, 96, $II, 87, 89, 91, 91, 102, 95, 108, 120, 139.
41. Find the S. D. of the following frequency distribution : -
Exceeding But not exceeding Frequency
5.5 6.5 4
6.5 7.5 2
7.5 8.5 5
8.5 9.5 7
9.5 10.5 9
10.5 11.5 4
11.5 12.5 2
(M. A., Agra, 1934).
42. The following table relates to the profits and losses of 100 firms. Calculate the average profits and the standard deviation of profits.
Profits Rs.           Number of firms
 5000 to  6000               8
 4000 to  5000              12
 3000 to  4000              30
 2000 to  3000              10
 1000 to  2000               5
    0 to  1000               5
-1000 to     0               6
-2000 to -1000               8
-3000 to -2000               9
-4000 to -3000               7
43. In any two series, where /1 and /. represent the deviation from a trial average,
100,
X/l -180 ,E/11=245320
XJ.-250 ,EJ,I-4385Q
II .... 100
Calculate the coefficient of variation for the two series.
44. In any two samples, where the variates x1 and x2 are measured in the same units,
n1 = 36     (summation)  Σx1² = 49428
n2 = 49          "       Σx2² = 71258
Compute the values of the Standard Deviations of the two samples. What additional information is required to calculate the co-efficient of variation of the above two samples? Indicate the uses of such a coefficient. (B. Com., Lucknow, 43).
45. An analysis of the monthly wages paid to workers in two firms A and B, belonging to the same industry, gives the following results:
                                          Firm A    Firm B
Number of wage-earners                      586       648
Average monthly wage                 Rs.    52.5      47.5
Variance of the distribution of wages       100       121
(a) Which firm, A or B, pays out the larger amount as monthly wages?
(b) In which firm, A or B, is there greater variability in individual wages?
(c) What are the measures of (i) average monthly wage, and (ii) the variability in individual wages, of all the workers in the two firms, A and B, taken together?
(1. A. S., ,11., ~",.,., 1951).

46. The following table gives the marks obtained by 100 students:
Digits (Division of Class-interval)

Marks 0 1 2 3 4 5 6 7 8 9 Total
0-9 2 4 3 1 1 1 12
10-19-----1--5 3 4 2 1 15
- - -20--29 ! -
1 7 8 10 5 4 3 2 40
- - - 30-39 3 5 10 2 1 1 22
40--49-- 4 3 2 2 11
100

Thus, 4 marks were obtained by 3 students, 13 marks by 4 students, 35 marks by 2 students and so on.
Calculate the mean marks and standard deviation of marks:
(i) By using the totals only.
(ii) By using the whole data.

47. How do you calculate the co-efficient of variation of a distribution?


What is the justification for saying that about 68 per cent of the observed values
lie within one standard deviation of the mean value ?
The following marks Were given to a batch of candidates ; -

66, 62, 45, 79, 32, 51, 56, 60, 51, 49
25, 42, 54, 54, 58, 70, 43, 58, 50, 52
38, 67, 50, 59, 48, 65, 71, 30, 54, 55
82, 51, 63, 45, 53, 40, 35, 56, 70, 42
67, 55, 57, 30, 63, 42, 74, 58, 44, 55

Find the co-efficient of variation of the marks.
Also draw a cumulative frequency curve, and from this curve find the proportion of candidates receiving more than 50 marks. (I. A. S., 1953).
48. Explain the terms ; Frequency distribution, frequency polygon, frequency
histogram and frequency curve.
For a certain group of 'Jari' weavers of Banaras, the median and quartile earnings per week are Rs. 44.3, Rs. 43.0 and Rs. 45.9 respectively. The earnings for the group range between Rs. 40 and Rs. 50. Ten per cent of the group earn under Rs. 42 per week, 13 per cent earn Rs. 47 and over and 6 per cent Rs. 48 and over. Put these data into the form of a frequency distribution and obtain an estimate of the mean wage and the standard deviation. (P. C. S., 1956).
49. Compile a table showing the frequencies with which words of different
numbers of letters occur in the extract reproduced below (omitting punctuation
marks) treating as the variable the number of letters in each word, and obtain the
mean, median, and the co-efficient of variation of the distribution ;
Success in the examination confers no absolute right to appointment, unless
Government is satisfied, after such enquiry as may be considered necessary, that the
candidate is suitable in all respects for appointment to the public service.
(I. A. S .• 1947.)
50. The following table gives the frequency distribution of area under wheat in a sample of 282 villages in Meerut District during 1936-37. Calculate (a) the standard deviation, and (b) the semi-interquartile range of the distribution:

Bighas under    Frequency    Bighas under    Frequency
wheat                        wheat
    0-              3        1,000-             18
  100-              7        1,100-             14
  200-             10        1,200-             14
  300-             17        1,300-             16
  400-             33        1,400-              8
  500-             29        1,500-              8
  600-             27        1,600-              6
  700-             21        1,700-              5
  800-             23        1,800-              2
  900-             20        1,900-2,000         1
(I. A. S., 1949).

51. What are measures of dispersion of a distribution? Why is the standard deviation most commonly used as a measure of dispersion in statistics?

Goals scored by two teams A and B in a football season were as follows:

Number of goals scored     Number of Matches
in a match                    A        B
        0                    27       17
        1                     9        9
        2                     8        6
        3                     5        5
        4                     4        3

By calculating the co-efficient of variation in each case, find which team may be considered more consistent. (I. A. S., 1954).
52. Explain the method of computing the standard deviation of a frequency distribution from a working origin different from the arithmetical mean.
Calculate the standard deviation for the data given below using the interval 50-59 as working origin:

Class-interval Frequency
0- 9 2
10- 19 4
20- 29 23
30- 39 30
40- 49 40
50- 59 45
60- 69 35
70-79 25
80- 89 12
90- 99 9
100-109 6
110-119 10
120-129 3
130-139 1
140-149 1
150-159 3
Total 249

How would the value obtained above be modified if you have to adjust it for the reason that the data are grouped in class-intervals? (I. A. S., 1956).

53. The following is a record of the number of bricks laid each day for 20 days by two bricklayers A and B:
A- 725, 700, 750, 650, 675, 725, 675, 725, 625, 675,
   700, 675, 725, 675, 800, 650, 675, 625, 700, 650.
B- 575, 625, 600, 575, 675, 625, 575, 550, 650, 625,
   550, 700, 625, 600, 625, 650, 575, 675, 625, 600.
Calculate the co-efficient of variation in each case, and discuss the relative consistency of the two bricklayers. If the figures for A were in every case 10 more and those for B in every case 20 more than the figures given above, how would the answer be affected? (M. Com., Banaras, 1950).
54. A distribution consists of three components with frequencies of 200, 250 and 300 having means of 25, 10 and 15 and standard deviations of 3, 4 and 5 respectively. Find the mean and the standard deviation of the combined distribution. (M. Com., Banaras, 1954).
55. Suppose each measurement in a distribution is multiplied by 2. What
happens to the : -
(a) mean of the distribution
(b) variance  "   "      "
(c) standard deviation of "
(d) each of the three if .. is added to each measurement ?
56. Compute the values of arithmetic average, mode, median and standard
deviation for the following observations :
96, 8.... 10.3, 88, 92, 98, 100, 96, 87
92, 94.
57. Suppose a group of children have a distribution of I.Q. scores with mean 100 and standard deviation 10. If one child with I.Q. 70 is removed, what will be the effect on the mean and standard deviation?
58. Three distributions each of 100 members and standard deviation 4.5 units are located with their arithmetic means at 12.1, 17.1 and 22.1 units respectively. Find the standard deviation of the distribution obtained by combining the three.

59. The first of the two samples has 100 items with mean 15 and standard deviation 3. If the whole group has 250 items with mean 15.6 and standard deviation √13.44, find the standard deviation of the second group.
. , (M. A., Beo., Ik/~/, 1~"91
60. The mean and the standard deviation of a sample of 100 observations were calculated as 40 and 5.1 respectively by a student who took by mistake 50 instead of 40 for one observation. Calculate the correct mean and standard deviation.
61. Co-efficient of variation of two series are 60% and 70%. Their standard
deviatjons are :z [ {lnd z6. What are their arithmetic means?
62. Given: Number Mean Variance
IG~ ~
11 Group 60 5'
1 and II Group combincd 95 u .,
Find the missing items.
63. Indicate the extent of dispersion graphically for the data given in the following table:

Years
Income (in thousands)
I '.><
AII
B1
6
t6
55
8
ao
,6
JJ
(8
57
9
18
58
8
ZO
59
10

Z2
60 61
12

36
10
18
6z 6_
J4
zz
u.
110

64. The table given below gives the population and weekly earnings of two localities, A and B. Represent the data graphically to bring out the inequalities of distribution of earnings.

Weekly earning
(in Rs. I
o-:to I 2
20-40 6 S
40-00 8 zo
60-80 IS zS
So-IOO 20 4~

65. Find the actual class-intervals from the data given below:
dx    -3    -2    -1     0     1     2     3
f     10    15    25    25    10    10     5

Mean = 31 and Standard Deviation = 15.9.


11
Moments, Skewness and Kurtosis
MOMENTS

While discussing the calculation of mean deviation and standard deviation we have defined $\frac{\Sigma d}{n}$ and $\frac{\Sigma d^2}{n}$ (or $\frac{\Sigma fd}{n}$ and $\frac{\Sigma fd^2}{n}$ in case of discrete and continuous series) as the First and Second Moments about the mean respectively. If the deviations are not taken from the actual arithmetic average but from any other value $x$, then $\frac{\Sigma dx}{n}$ and $\frac{\Sigma dx^2}{n}$ (or $\frac{\Sigma fdx}{n}$ and $\frac{\Sigma fdx^2}{n}$) are known as the First and Second Moments about the value $x$. For the first and second moments, the moment about a value other than the mean is numerically greater than the moment about the mean. Thus the first moment about the mean is 0 because the sum of the deviations from the mean is always 0. The second moment about the mean is the variance or the square of the standard deviation. Just as we can calculate the first and second moments either about the mean or about any other value, similarly 3rd, 4th, 5th and nth moments can be calculated either about the mean or about any other value of the variate. Thus the third moment about the mean, or

$\pi_3 = \frac{\Sigma fd^3}{n}$

and the nth moment, or

$\pi_n = \frac{\Sigma fd^n}{n}$

Just as it is possible to calculate the second moment about the mean, or the variance, from the second moment about any other value, similarly, all other moments about the mean can be expressed in terms of moments about any other point. The following illustration would clarify these points:
Example 1. Calculate the first, second and third moments about the mean from the following data:
Size of item     Frequencies
     2               10
     4               15
     8                8
    10                7
Solution. Calculation of the moments about the mean

Size of   Fre-     Deviation    fdx    fdx²    fdx³    Deviation     d²      fd²      d³       fd³
item      quency   from assumed                        from actual
          (f)      average 4                           mean 5.35
                   (dx)                                (d)
   2       10        -2         -20     40     -80      -3.35      11.22    112.2    -37.6    -376.0
   4       15         0           0      0       0      -1.35       1.82     27.3    -2.46     -36.9
   8        8         4          32    128     512      +2.65       7.02     56.1     18.6     148.8
  10        7         6          42    252    1512      +4.65      21.62    151.3    100.7     704.9
Total      40                   +54    420    1944                          346.9              440.8
The first moment about the arbitrary origin (4), or

$\nu_1 = \frac{\Sigma fdx}{n} = \frac{54}{40} = 1.35$

The first moment about the mean, or

$\pi_1 = \nu_1 - \nu_1 = \frac{54}{40} - \frac{54}{40} = 0$

The second moment about the arbitrary origin (4), or

$\nu_2 = \frac{\Sigma fdx^2}{n} = \frac{420}{40} = 10.5$

The second moment about the mean, or

$\pi_2 = \nu_2 - \nu_1^2 = \frac{420}{40} - \left(\frac{54}{40}\right)^2 = 8.68$

The third moment about the arbitrary origin (4), or

$\nu_3 = \frac{\Sigma fdx^3}{n} = \frac{1944}{40} = 48.6$

The third moment about the mean, or

$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = \frac{1944}{40} - \frac{3 \times 54 \times 420}{40 \times 40} + 2 \times \left(\frac{54}{40}\right)^3 = 48.6 - 42.5 + 4.92 = 11.02$

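If we were to calculate the various moments about the mean by taking the actual arithmetic average we would have got the same answers. Thus

$\pi_2 = \frac{\Sigma fd^2}{n} = \frac{346.9}{40} = 8.67$ and $\pi_3 = \frac{\Sigma fd^3}{n} = \frac{440.8}{40} = 11.02$

which agree with the values obtained above up to rounding.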

Similarly the fourth moment about an arbitrary origin, or

$\nu_4 = \frac{\Sigma fdx^4}{n}$

and the fourth moment about the mean, or

$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4$
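The conversion from moments about an arbitrary origin to moments about the mean can be checked with a short sketch, using the data of Example 1. The function names `raw_moments` and `central_from_raw` are illustrative. Because the sketch works with unrounded figures, its third moment (about 11.00) differs slightly from the 11.02 obtained with rounded intermediate values.

```python
def raw_moments(sizes, freqs, origin, max_order=4):
    """Moments of orders 1..max_order about the given origin for a frequency table."""
    n = sum(freqs)
    return [
        sum(f * (x - origin) ** r for x, f in zip(sizes, freqs)) / n
        for r in range(1, max_order + 1)
    ]

def central_from_raw(v1, v2, v3, v4):
    """Moments about the mean expressed through moments about an arbitrary origin."""
    pi1 = v1 - v1
    pi2 = v2 - v1 ** 2
    pi3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3
    pi4 = v4 - 4 * v1 * v3 + 6 * v1 ** 2 * v2 - 3 * v1 ** 4
    return pi1, pi2, pi3, pi4

sizes = [2, 4, 8, 10]
freqs = [10, 15, 8, 7]

v = raw_moments(sizes, freqs, origin=4)        # moments about the arbitrary origin 4
pi = central_from_raw(*v)                      # converted to moments about the mean

mean = 4 + v[0]                                # actual mean = origin + first moment
direct = raw_moments(sizes, freqs, origin=mean)  # computed directly about the mean

print("about origin 4:", [round(m, 4) for m in v])
print("about the mean (converted):", [round(m, 4) for m in pi])
print("about the mean (direct):   ", [round(m, 4) for m in direct])
```

The converted and directly computed values agree, which is the point made in the worked example.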
β and γ Coefficients
Certain values derived from the moments are of special importance, particularly in a study of Kurtosis.
Thus

$\beta_1 = \frac{\pi_3^2}{\pi_2^3}$ and $\beta_2 = \frac{\pi_4}{\pi_2^2}$

and further

$\gamma_1 = +\sqrt{\beta_1}$ and $\gamma_2 = \beta_2 - 3$

Thus for Example No. 1,

$\beta_1 = \frac{(11.02)^2}{(8.68)^3}$ and $\gamma_1 = \sqrt{\frac{(11.02)^2}{(8.68)^3}}$

We shall see a little later how these measures are of importance in studying the departure of a curve from normality (in a study of Kurtosis).
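The coefficients can be evaluated for Example 1 with the sketch below. It assumes the usual definitions, β1 as the squared third moment divided by the cubed second moment and β2 as the fourth moment divided by the squared second moment; the function name `beta_gamma` is illustrative.

```python
import math

def beta_gamma(pi2, pi3, pi4=None):
    """Return (beta1, gamma1, beta2, gamma2); beta2 and gamma2 are None when pi4 is not supplied."""
    beta1 = pi3 ** 2 / pi2 ** 3
    gamma1 = math.sqrt(beta1)          # the text takes the positive root
    if pi4 is None:
        return beta1, gamma1, None, None
    beta2 = pi4 / pi2 ** 2
    gamma2 = beta2 - 3
    return beta1, gamma1, beta2, gamma2

# Second and third moments about the mean from Example 1.
beta1, gamma1, _, _ = beta_gamma(8.68, 11.02)
print(round(beta1, 4), round(gamma1, 4))
```

A fourth moment equal to three times the squared second moment gives γ2 = 0, the value against which departures from normality are judged.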
SKEWNESS

Need and meaning. In our studies so far, we have discussed the methods of measuring the central tendency of a frequency distribution and the methods of studying the concentration of items round the central value. These measures of central tendency and dispersion do not reveal whether the dispersal of values on either side of an average is symmetrical or not. If observations are arranged in a symmetrical order round a measure of central tendency, we get what is called a "symmetrical distribution." When plotted on a graph paper such a distribution gives a normal or ideal curve. A normal curve has many mathematical properties, which we shall study in a later chapter in which we shall discuss the various types of theoretical frequency distributions. For the present it would suffice to say that in a normal distribution the values of the mean, median and mode coincide and the quartiles are equidistant from the median. It is obvious that in such cases the sum of the deviations measured from the mean, median or mode would be 0. We have already mentioned in earlier chapters that the empirical relationships between various averages and measures of dispersion hold good only in a symmetrical distribution. A normal curve is a bell-shaped frequency curve in which the values on either side of a measure of central tendency are symmetrical.
In order to study a frequency distribution it would be of great use to know whether it would give a normal curve, and if not, to what extent it would deviate from a normal distribution. In fact measures of central tendency and measures of dispersion should always be supplemented by what are called measures of skewness. Skewness is the opposite of symmetry and its presence tells us that a particular distribution is not symmetrical or, in other words, it is skew. Thus averages tell us about the central value of a distribution, measures of dispersion tell us about the concentration of items round the central value, and measures of skewness tell us whether the dispersal of items from an average is symmetrical or asymmetrical.
The following figures give us an idea about the shape of symmetrical and asymmetrical curves.
Figure No. 1 gives the shape of an ideal symmetrical curve. It is bell-shaped, and in it there is no skewness. The value of mean, median and mode in such a curve would be identical.

[Figure 1. Symmetrical bell-shaped curve; the mean (a), median (M) and mode (Z) coincide.]

Figure No. 2 gives the shape of a moderately skewed curve. It is skewed to the right. In it the value of mean would be more than the values of median and mode. Median would have a value higher than the value of the mode. Such curves are called positively skew.

[Figure 2. Moderately skew curve, skewed to the right.]
Figure No. 3 also gives the shape of a moderately skew curve.
This curve is skewed to the left and in it, the value of mode would be
greater than the value of median and the value of median would be greater
than the value of the mean. Such curves are called negatively skew.

[Figure 3. Moderately skew curve, skewed to the left.]
UOlGlNTS. lOWNESS AND s:uaTOSIS 241

Tests of skewness
In order to find out whether a particular distribution is skew, certain
tests are usually applied. They are as follows :-
(a) In a skew distribution the values of mean, median and mode
would not coincide. The mean and mode would be pulled wide apart
and the median would usually lie between them. We have already seen
that in a moderately asymmetrical distribution:
$\text{Mean} = \text{Mode} + \tfrac{3}{2}(\text{Median} - \text{Mode})$
(b) In a skew distribution the two quartiles would not be equidistant
from the median, or in other words $(Q_3 - M) - (M - Q_1)$ would
not be 0.
(c) A skew distribution when plotted on graph paper would not
give a symmetrical bell-shaped curve.
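The relation quoted in test (a) is only a rearrangement of the empirical relationship for moderately asymmetrical distributions, (Mean - Mode) = 3 (Mean - Median). A short worked derivation, offered as a check rather than as part of the original text:

```latex
\begin{aligned}
\text{Mean} - \text{Mode} &= 3(\text{Mean} - \text{Median}) \\
2\,\text{Mean} &= 3\,\text{Median} - \text{Mode} \\
\text{Mean} &= \tfrac{3}{2}\,\text{Median} - \tfrac{1}{2}\,\text{Mode}
            = \text{Mode} + \tfrac{3}{2}(\text{Median} - \text{Mode})
\end{aligned}
```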
Measures of skewness
The above mentioned tests would indicate whether a particular
distribution is skew or not. If a particular distribution is found to be
skew the next problem that arises is to measure the extent of skewness.
Some distributions may be slightly different from the symmetrical
distribution while others may be very much different from it. Measures
of skewness are meant to give an idea about the extent of asymmetry
in a series.
First measures of skewness. The first measures of skewness are
based on the assumption that in a skew distribution the values of mean,
median and mode do not coincide. This being so, the difference
between any two of these values indicates the extent of skewness.
Thus first measures of skewness are :-
(i) Mean - Mode or (a - Z)
(ii) Mean - Median or (a - M)
(iii) Median - Mode or (M - Z)
The above measures of skewness are absolute measures. For purposes
of comparison it is necessary to have relative measures of skewness.
Relative measures of skewness are obtained by dividing the
absolute measures by any measure of dispersion. The absolute measures
of skewness should not be divided by a measure of central tendency or
average because here the problem is not to study the extent of skewness
in relation to the size of items, but it is to study the asymmetry in relation
to the dispersal of items round a central value. The purpose of studying
skewness is to find out how much more or less the items on one side
deviate from the items on the other side of a central value. Therefore,
absolute measures of skewness should be divided by a measure of dispersion
rather than a measure of central tendency. Relative measures of
skewness are known as coefficients of skewness.

Thus, writing $\delta$, $\delta_M$ and $\delta_Z$ for the mean deviation from the mean,
the median and the mode respectively, and $\sigma$ for the standard deviation,
Coefficient of skewness or
$j = \dfrac{a - Z}{\delta_Z}$ .... (i)
or $j = \dfrac{a - Z}{\delta}$ .... (ii)
If mode is ill-defined, median can be used in place of mode and then
$j = \dfrac{a - M}{\delta_M}$ .... (iii)
or $j = \dfrac{a - M}{\delta}$ .... (iv)
Skewness can also be studied by studying the difference of median
and mode. Thus,
$j = \dfrac{M - Z}{\delta_Z}$ .... (v)
or $j = \dfrac{M - Z}{\delta_M}$ .... (vi)
Karl Pearson has given a formula in which the denominator is not
the mean deviation but the standard deviation.
Thus $j = \dfrac{a - Z}{\sigma}$ .... (vii)
If mode is ill-defined, Karl Pearson is of the opinion that its value
should be estimated on the basis of the empirical relationship which
exists between the values of mean, median and mode in a moderately
asymmetrical distribution. We have seen that in a moderately
asymmetrical distribution
(Mean - Mode) = 3 (Mean - Median)
Thus $j = \dfrac{3(a - M)}{\sigma}$ .... (viii)
The value of the above coefficients of skewness would be 0 for a
symmetrical distribution and for skew distributions it would be a pure
number. These are the two properties of these coefficients and for these
reasons they are regarded as better than other measures. In theory there
are no limits to the values of the coefficient numbers (i), (ii), (iii), (iv),
(v), (vi) and (vii). In actual practice for moderately asymmetrical
distributions all these coefficients (excepting No. viii) vary between ±1. The
theoretical limits of coefficient number (viii) are ±3 (because the
theoretical limits of $\dfrac{a - M}{\sigma}$ are ±1) but they are never reached in actual
practice.
Second Measure of Skewness. The second measure of skewness is
based on the quartiles. It has been said above that in a skewed
distribution $(M - Q_1)$ and $(Q_3 - M)$ would not be equal. A measure of
skewness is thus derived by finding out the difference between these
two values.
Thus
Second measure of skewness $= (Q_3 - M) - (M - Q_1)$
$= Q_3 - 2M + Q_1$
$= Q_3 + Q_1 - 2M$
The above is an absolute measure of skewness. The relative
measure can be obtained by dividing this absolute measure by the sum
of $(Q_3 - M)$ and $(M - Q_1)$.
Thus the coefficient of skewness or
$j = \dfrac{(Q_3 - M) - (M - Q_1)}{(Q_3 - M) + (M - Q_1)} = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$ .... (ix)

This coefficient is also a pure number. Its theoretical limits are
±1. For a symmetrical distribution its value would be 0. However, the
second measure of skewness and its coefficient do not always give
dependable results. In many cases the value of this coefficient may be
zero and yet the distribution may not be perfectly symmetrical. The
reason for this lies in the fact that quartiles are not based on all the
observations of a series. Thus this measure of skewness should be used
with caution and for purposes of comparison, as far as possible, Karl
Pearson's coefficient of skewness should be used.
The following example illustrates some of the above formulae :-
Example 2. Calculate the coefficient of skewness from the following
data :-

Wages in Rupees      No. of labourers
0-10                 185
10-20                77
20-30                34
30-40                180
40-50                136
50-60                23
60-70                50
Solution. In the above example
                                    Rs.
Mean or a                         = 29
Mode or Z                         = 37.7
Median or M                       = 32.6
First Quartile or $Q_1$           = 9.3
Third Quartile or $Q_3$           = 42.8
Mean Deviation from mean or $\delta$ = 16.5
Standard Deviation or $\sigma$    = 18.9
Coefficient of skewness
$j = \dfrac{a - Z}{\delta} = \dfrac{29 - 37.7}{16.5} = -.53$ (No. ii)
$j = \dfrac{a - M}{\delta} = \dfrac{29 - 32.6}{16.5} = -.22$ (No. iv)
$j = \dfrac{a - Z}{\sigma} = \dfrac{29 - 37.7}{18.9} = -.46$ (No. vii)
$j = \dfrac{3(a - M)}{\sigma} = \dfrac{3(29 - 32.6)}{18.9} = -.57$ (No. viii)
$j = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{42.8 + 9.3 - 65.2}{42.8 - 9.3} = -.39$ (No. ix)
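The figures in the solution can be checked directly from the wage table. The following is a minimal Python sketch, assuming class mid-points represent each class and that the median and quartiles are found by linear interpolation within the class; the helper name `grouped_quantile` is illustrative only. The mode is not recomputed here because the method used to obtain 37.7 is not stated in this passage.

```python
# Grouped wage data from Example 2: (lower limit, upper limit, frequency)
classes = [(0, 10, 185), (10, 20, 77), (20, 30, 34), (30, 40, 180),
           (40, 50, 136), (50, 60, 23), (60, 70, 50)]

n = sum(f for _, _, f in classes)
mids = [((lo + hi) / 2, f) for lo, hi, f in classes]   # class mid-points

mean = sum(x * f for x, f in mids) / n
mean_dev = sum(abs(x - mean) * f for x, f in mids) / n          # about the mean
sd = (sum((x - mean) ** 2 * f for x, f in mids) / n) ** 0.5


def grouped_quantile(p):
    """Value below which a fraction p of the items lie (linear interpolation)."""
    target = p * n
    cum = 0
    for lo, hi, f in classes:
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    return classes[-1][1]


q1, median, q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))

coeff_iv = (mean - median) / mean_dev              # formula (iv)
coeff_viii = 3 * (mean - median) / sd              # formula (viii)
coeff_ix = (q3 + q1 - 2 * median) / (q3 - q1)      # formula (ix)

print(round(mean, 1), round(median, 1), round(q1, 1), round(q3, 1))
print(round(mean_dev, 1), round(sd, 1))
print(round(coeff_iv, 2), round(coeff_viii, 2), round(coeff_ix, 2))
```

Under these assumptions the computed mean, median, quartiles, mean deviation and standard deviation agree with the values used in the solution to one decimal place.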
Positive and negative skewness
As has been said earlier, if a curve is skewed to the right the value
of the mean would be more than the value of either median or mode.
In such cases skewness is positive.
If, on the other hand, a curve is skewed to the left, the value of
mean would be less than the values of median and mode. In such
cases skewness is said to be negative.
In the example solved above, skewness is negative as mean has
a value less than the values of median and mode. Further, the degree
of skewness is high, as a coefficient of skewness of about .4 indicates a
high degree of skewness.

KURTOSIS

We have seen above that measures of skewness tell us whether a
particular distribution differs from a normal or symmetrical distribution
and if so, to what extent. Another measure to test how near a
particular frequency distribution conforms to the normal curve is
kurtosis. It indicates whether a distribution is more flat-topped or more
peaked than the normal distribution. The figure below shows a normal
curve and two other curves in which kurtosis is present.
In this figure curve No. 1 is a normal curve. It is also called
Mesokurtic. Curve No. 2 is more peaked than the normal curve. Such
curves are known as Leptokurtic. Curve No. 3 is more flat-topped than
the normal curve. Such curves are known as Platykurtic.

Figure 4.
Measures of kurtosis
Kurtosis is measured by the coefficient $\beta_2$ or its derivation $\gamma_2$. We
have seen in connection with the study of moments that
$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$
In other words $\beta_2$ is equal to the fourth moment about the mean
divided by the square of the second moment about the mean.
$\gamma_2 = \beta_2 - 3$
The standard value of $\beta_2$ is taken as 3 and the curves with values
of $\beta_2$ less than 3 are called platykurtic and curves with values of $\beta_2$ more
than 3 are called leptokurtic. In a normal or mesokurtic curve the value
of $\beta_2$ is equal to 3. As such for a normal curve the value of $\gamma_2 = 0$, and
in curves which are more flat-topped or more peaked than the normal
curve the value of $\gamma_2$ would be either a minus or a plus figure. The
bigger the value of $\gamma_2$ in a frequency distribution, the greater is its
departure from normality.
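The definitions above translate directly into a short calculation. The sketch below is illustrative only: the function names and the five-item sample are assumptions, not data from the text.

```python
def central_moment(values, k):
    """k-th moment about the arithmetic mean of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def kurtosis_measures(values):
    """Return (beta_2, gamma_2): beta_2 = mu_4 / mu_2**2, gamma_2 = beta_2 - 3."""
    mu2 = central_moment(values, 2)
    mu4 = central_moment(values, 4)
    beta2 = mu4 / mu2 ** 2
    return beta2, beta2 - 3


def describe(gamma2, tol=1e-9):
    """Name the type of curve from the sign of gamma_2."""
    if abs(gamma2) <= tol:
        return "mesokurtic"
    return "leptokurtic" if gamma2 > 0 else "platykurtic"


sample = [1, 2, 3, 4, 5]          # hypothetical illustrative data
beta2, gamma2 = kurtosis_measures(sample)
print(beta2, gamma2, describe(gamma2))
```

For this sample the second moment is 2 and the fourth moment is 6.8, so $\beta_2$ is 1.7 and $\gamma_2$ is negative, which the text would describe as platykurtic.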
Dispersion, skewness and kurtosis contrasted
Now that we have studied dispersion, skewness and kurtosis, it
will not be out of place to compare and contrast them, as all these measures
are meant to study the formation of a frequency distribution. Dispersion
studies the scatter of items round a central value or among themselves.
It does not show the extent to which deviations cluster below an average
or above it. Measures of skewness study this point. They tell us about
the cluster of deviations above and below a measure of central tendency.
In a normal distribution the deviations below and above an average are
equal while in an asymmetrical distribution they are not equal. Kurtosis
studies the concentration of items at the central part of a series. If the
items concentrate too much in the centre the curve becomes leptokurtic,
and if the concentration in the centre is comparatively little the curve
becomes platykurtic.
Thus we find that measures of dispersion, skewness and kurtosis
study three different aspects of a frequency distribution. Measures of
dispersion throw light on the span within which values of a variable lie.
They study the size of a series. Measures of skewness throw light on
the shape of the series and the size of variation on either side of a central
value. Kurtosis studies the frequencies of a series at the central values.
The theory of skewness and kurtosis has not a very great impor-
tance in economic and social studies, as in these cases a normal distri-
bution is usually out of question, but the importance of these studies is
very great in biological studies and studies relating to other physical
sciences.

Questions
1. Define moments and discuss the method of calculating moments of
dispersion about the mean.
2. How would you calculate the value of a moment about the mean from the
value of the moment about any arbitrary value ?
3. What is skewness ? How does it differ from dispersion ? What are the
various measures of skewness which you know ?
4. What is kurtosis ? What purpose does it serve ? Is the study of kurtosis
useful in economic and social sciences ? If not, why ?
5. Find the Second Moment of Dispersion and a coefficient of skewness from
the data in the following series :-
Size of item Frequl;ncy Size of item Frequency

3 7·5 8S
7 8·S 32-
za 9·S 8
60
6. Find out the mean wage and a coefficient of skewness for the following :-

3~
40 ..,.
men get at the rate of Rs.

..
•• ,.••
5-5 0 ....
4-50 per man
"••
48
roo
u." .... ..
.f
6-50
7-5 0 .. .
...
f' f'
.f 8-5 0 f •
..•• ..
....
87 9-5 0
..
'f
43
2:& .. ff 20--5 0
Ir-50 .. ..
7. Find the coefficient of skewness from the following data :-
Heights of school boys at age 5
Height in No. Height in No. Height in No.
inches inches .inches
28 1 ~6 166 44 567
29
30
0 n 3'44 45 233
I ;8 740 46 89
31 3 39 U 67 47 27
32 S 40 1670 48 19
33 13 41 1614 49 4
34 I 40 42 154 1 50 4
H i 59 4~ 102.8 SI 1

(Bombay, 1935).
8. Find out coefficient of dispersion and a coefficient of skewness from the
following table giving wages of 230 persons and explain their significance.
Wages No. of personS Wages No. of pecsons
Rs. 70--So 12 lIo--UO 50
SOP-90 18 120--130 45
90-100 ~S 130--140 20
.. lOO-lIO 42. 140-150 S
(B. Com., AgTa, 1940).
9. The following table gives the distribution of population in towns A and B
in age groups. Compare the variation and skewness of their frequencies.
Age-group Population ,in thousands
A B
0-10 is 10
10-2.0 16 12.
Z0-3° IS 2.4
30 -40 IZ '2
40-"-5 0 10 29
50-""60 5 II
60-70 Z 3
Above-70 I I
(B. Com., AgTa, 1947).
10. From the following table state in which section the coefficient of skewness
is higher.
Marks obtained Students of sec. A Students of Sec. B.
0.--10 I 0
10--2.0 4 ~
20--3 0 10 10
S0--40 2Z 13
40--5 0 So 42
50--60 35 So
, $0--70 to 10
70--S0 7 8
S0-90 I 2
11. Weekly earnings of two groups of workers in factory A and B are given
and you are asked to compute coefficient of skewness by Pearson's method.

V,eekl,W.p No. ofworketl of No. ofworkers of


Re. PactotyA facto". B
I-u
u,---16 1II 50
16-Z0
1O-"S4 10
"
40
'070
Z4-:r.' a,
1&-32 ,0
,Z-$'
36 -41)
46
,0 I'"
IS
40-44 60 III
44-48 70 10
Total 310 ~to

12. From the following table, compute quartile deviation as well as the coefficient
of skewness :-

Size. P~uency Si:!:e F~


+-'
1-11
110-16
16-ao
EO
II
30
a+-a'
,a_,6
sll-sz
56-40
,
u
10

a
:0-:4 IJ

13. Find the mean deviation and standard deviation of the following table giving
the marks obtained by 500 candidates. Calculate also a coefficient of skewness.
lJur.>hcr of marks No. of candidates Number of marb No. of C2ndidsres
f . ~ 10 ,0 Below SO tl)lf.
.. 20 70 •• 60 3S..
.. 50 120 .. 70 4a~
.. 40 ,6. .. 80 100
14. Find the standard and quartile deviations and coefficient of skewness for
the following :-

0-,
Class
,-6
..
PI:Cq'l1coq
8
CbIa
I&-JIG
1l0-:4
Prequency
a4
as
O-IO 10 z.t-a, 20

1_1,
10-13

tria
14
16
u
,o--,a
as-,o 14
u
JZ-J' 7
15. Find out the mean deviation, standard deviation and quartile deviation from
the following table. Also calculate a coefficient of skewness.

Wageain Ra, No. of labOureH


aboTe 0 61S
..
•• 10
20
~oo
4ZJ
.••
••
JO
40
,0
,19
109
75
,0,
..•• 60
70

16. Calculate Karl Pearson's coefficient of skewness from the following data :-
Mub Namber of 1Itudents Mala NlImber of atudca..
AIac"e 0 1,0 Abo"c ,0 70
10 140 •• 60 ,0

..••., :: I: .. I:
,t.
t
40. 80
(8. C-•• AJItI•• I",).
17. Calculate the value of the 3rd moment about the mean from the data given
in question No. 16 above.
II. Fied out the muca of tJa
aod),. from the data 8ival in question No. I,
above.
19. Find the mean, mode, standard deviation and a co-efficient of skewness
for the following :-
Years under 10, 20, 30, 40, 50, 60.
No. of peaona I" Ja, )1, 7 8, 97. ICl9.
(P.CoS., 19'11).
20. Calculate Mean, S.D. and Karl Pearson's Co-efficient of skewness from
the following data grouped into unequal step intervals.

t
Clads
p~
1002.0
24
J-9
fJ,
'-4
1-2 .. J
1121
0

• al. (,,> KArl Pcamon', eo-dilcleo.t of Sltewnea of IS distributbl II + .,... Its


Standard Dc-vi.ation b 6., and m=n is :19.6. F'sed the .Mode and Meclilul of the die-
trilaioG.
(b) If the Il.lOde of the abo_ is ~.I. -whll.t will be the Stmdard Deriuion ?
(.8. C-.. DoJN. Jp60).
n. Po~ ckta Are ~ to an e.::cnomist (or the p;upoec of~. The
dill. tcEer to the lciGsth of life of & sample Good.-pr T,tea.- Do you find that the
data arc PJaty-IrattiC ?
D - roo. :z6b: ... ,0. ~,. ... 1967.&, ~ - 29&,.1.
~ - '"SO",
&,. Cc) Ira a ~ ~t the S.D. of which Is 11.8. the or.a1ue oE
~ II mote than mam by 4. what ww be ita co-ef11cient of IkeWDeu ~
(» In a &equ.eccy dhtn"blltion. KuJ. Peuecm'G c::o-dlident of akewneu f&
'f'CIled thd tIlfQ dfItribi:tkIa WIllI Ikcwed to the left to &Q ~ of ;8. 118 man film:
..... lea than i.ta moiW ftlue bJ,4-I. WhAt 'wt!a its ctaftduct c1eridIon?
(I) 10 a diztribalioa. the diaCrence of the two qusttilea fa IJ, their IIWD is
~, and diC~ 11; 20, Sod the Co-cfBciaIt of IIkewr.aat.
&4. Whit:h sroap '0 tnQl.'C ~1 ~ted;
C") (I) Mac- &a; Mcdi:IIn ... &40 $ D. - 10
(ii) :Meso - u; l:\fedim ... an S.D•.- IS
(j) "Wbethet t:bc Il'OI.tp II nqptimylJkew'ed or poeitiYely II1tctRd if:
McWm ==~ Mean - &6. (.D. C-.. ,...,,_. 196"
Index Numbers 12
Need and meaning
Index numbers are a specialised type of average. They measure the
central tendency of a time series or a spatial series. We have already
seen that averages are used to compare two or more series as they
represent their central tendencies. But there is a very great limitation
in the use of averages. Averages can be used to compare only those
series which are expressed in the same units. If the units in which two
or more series are expressed are different, or if the series are composed
of different types of items, averages cannot be used to compare them.
Moreover, statistical series are generally affected by multiple causation
and it is not easy to study their effects separately. For this reason it
is not possible to directly measure the effects of such factors. In places,
where it is difficult to measure directly the variation in the effects of
a group of factors, or where such variations are entirely incapable of
direct quantitative study, relative variations are measured, and thus an
idea about the change in the effects of factors is obtained. Index numbers
are meant to study the changes in the effects of such factors which cannot
be measured directly. Thus according to Bowley, "Index numbers are
used to measure the changes in some quantity which we cannot observe
directly ......" For example, changes in business activity in a country
are not capable of direct measurement but it is possible to study relative
changes in business activity by studying the variations in the values of
some such factors which affect business activity, and which are capable
of direct measurement.
The reason why we study the effects of the changes in such factors
lies in the fact that though events differ from each other yet there is a
certain amount of similarity in their effects. Due to this similarity we
attempt to study the events in a general fashion. As Bowley puts it,
"...... the method of index numbers is at once applicable to the
disentanglement of that which is common to the whole group from those
variations which are special to individual items". For example, changes
in the general price level in a country are incapable of direct measurement.
But due to similarity in the changes of price levels of different
commodities we can have an idea about the change in the general
price level, if we study the changes in prices of different commodities.
We presume that there is nothing like a general price level, and so the
question of its direct measurement does not arise. General price level is
an imaginary concept and is affected by a multiplicity of causes whose
absolute effects cannot be measured correctly. In such cases we can
measure relative changes in the price level of different commodities and
thus can have an idea about the problem under investigation. This
difficulty is not confined only to the measurement of general price level.
In other fields like cost of living and industrial activity, etc., similar diffi-
culty arises. These things are also not capable of direct quantitative
measurement. We can only study their relative changes by studying the
variations in certain other factors which are connected with these
problems.
The changes mentioned above can be either in relation to time or
in relation to place. For example, the cost of living of a certain group
of persons may vary over two periods of time, or the cost of living may
be different at two different places at the same time. The technique
of index numbers is employed to measure both these types of changes.
It should be noted, however, that index numbers measure only relative
changes in the values of a phenomenon.
Technique of common denominator. The question that arises at this
stage is, how can relative changes be measured. And the answer to
this question is, that the relative changes are measured by the use of a
common denominator. Thus if the price of wheat in the year 1950
was Rs. 20 per maund and in 1955 was Rs. 10 per maund we can reduce
them to a common denominator of 100. If the price of the year 1950
is taken as 100 the price of 1955 would be 50. Thus the index number
of the price of wheat in the year 1955 would be 50 as compared to 100
of the year 1950. The technique of common denominator is specially
helpful when we have to average the price level of a large number of
commodities which are expressed in different units. Thus, if to measure
general price level we take into account the prices of wheat, iron and
steel and cotton these prices would be expressed as so many rupees per
maund in the case of wheat, per ton in case of iron and steel and per bale
in case of cotton. Obviously these prices cannot be averaged as they
are expressed in different units. Or, in other words, we cannot measure
their absolute level. But if these prices are expressed as percentages
of the prices of some previous period, it would be possible to study
their relative changes and to average them into one figure, because
now they will have a common denominator. Suppose in the year 1950
the prices of wheat, iron and steel and cotton were respectively Rs. 20
per maund, Rs. 100 per ton and Rs. 200 per bale and in the year 1955
these prices were respectively Rs. 10 per maund, Rs. 80 per ton and
Rs. 150 per bale, the prices of 1955 can be expressed as percentages of
the prices of the year 1950. These percentages would be respectively
50, 80 and 75. These percentages are index numbers of the prices
of these commodities in the year 1955 and they show the relative
changes in the price levels of these commodities over the prices of the
year 1950. If we want to measure the changes in the general price level
and if we take into account only these three commodities we shall have
to average these index numbers or price relatives, as they are popularly
known. It is possible to average the price relatives though their original
prices were incapable of being averaged on account of the difference
in units in which they were expressed. The arithmetic average of
these price relatives would be 68.3, indicating that the general price
level in the year 1955 was 68.3% of the price level of the year 1950.
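The arithmetic of this illustration can be reproduced in a few lines. This is a minimal Python sketch using the prices quoted above; the variable names are illustrative, and the unweighted arithmetic mean is used because that is the average the passage applies here.

```python
# Prices quoted in the text: 1950 is the base year, 1955 the current year.
base_prices = {"wheat (per maund)": 20,
               "iron and steel (per ton)": 100,
               "cotton (per bale)": 200}
current_prices = {"wheat (per maund)": 10,
                  "iron and steel (per ton)": 80,
                  "cotton (per bale)": 150}

# Price relative = current price as a percentage of the base-year price.
price_relatives = {item: current_prices[item] * 100 / base_prices[item]
                   for item in base_prices}

# Simple (unweighted) arithmetic average of the price relatives.
index_number = sum(price_relatives.values()) / len(price_relatives)

for item, relative in price_relatives.items():
    print(item, relative)
print("index number for 1955 (1950 = 100):", round(index_number, 1))
```

Because each relative is a pure percentage, the differing units (maund, ton, bale) no longer prevent the three figures from being averaged.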
It is difficult to give a comprehensive definition of the term index
number, because in actual practice index numbers are calculated by a
variety of methods and they cannot be summarized in one definition.
However, from a general view-point we can say that an index number
is a relative measure of the central tendency of a group of items. It compares
the level of a phenomenon on a certain date with its level on some
previous date or the levels of a phenomenon at different places on the
same date. It should be remembered, however, that the technique of
index numbers is used to measure the relative changes in the level of a
phenomenon, where it is difficult to measure its absolute change or
where absolute change is incapable of measurement.
Construction of wholesale price index numbers
We have seen in earlier sections that the necessity of the construction
of index numbers which measure relative variation in the general
price level is very great, because general price level is an imaginary
concept incapable of direct quantitative measurement. The study of price
changes is a very important one in the modern economic set-up as it
affects practically all branches of economic activity. The variations
in the general price level are studied with the help of index numbers
of wholesale prices. The construction of wholesale price index numbers
involves certain problems which are not easy to tackle. These
problems have to be tackled very cautiously as any mistake in their
handling is liable to affect the index numbers so constructed. The
problems are as follows :-
(1) The first problem which the maker of an index number of
wholesale prices has to face is that of the selection of items from which
the index number is to be constructed. A decision has also to be taken
about the arrangements to be made for obtaining the price quotations
of the items selected, from various centres.
(2) It has been said earlier that index numbers measure relative
changes of a phenomenon and usually the changes relate to the level
of the phenomenon on some previous date. The changes in the general
price level thus would relate to the price level of some previous period.
Usually this period is of one year and it is called the base year of the
index number. Various considerations have to be taken into account
in the selection of the base year and so the second problem in the
construction of index numbers is the selection of the base year and the conversion
of current prices to price relatives based on the prices of the base year.
(3) The next step in the construction of wholesale price index
numbers is to average these price relatives of the various commodities.
Which average should be used for this purpose is a very important
and pertinent question and thus the third problem which the maker
of a wholesale price index number has to face is with regard to the selection
of the average.
(4) All the items used in the construction of an index number
are not of equal importance and as such if the index number is to be a
representative one, weights should be assigned to various items in relation
to their importance. We have already seen in the chapter on
Measures of Central Tendency how difficult it is to take a decision
about the correct weights. Thus the most difficult problem which
has to be faced in the construction of index numbers is the problem
of weighting the index number and of selecting suitable weights.
Object of the index number. We shall discuss all these problems in
detail in the following pages, but one thing should always be kept in
mind and it is that the most important factor which should be taken
into account in the solution of the above problems is the object with
which a particular index number is being constructed. The selection
of the items, the base year, the average and the weights is considerably
affected by the purpose with which a particular index number is
constructed. If the purpose of a price index number is to measure the
general tendency of the price level we shall have to select the items out
of a very large number, but if its purpose is to study the general tendency
of the prices of agricultural commodities only, the selection of items
would be from a smaller number. Similarly the selection of the base
year would be affected by the object of the index number. If an index
number is being constructed to study the changes in the general price
level during the war period, the base year should be the year immediately
preceding the period of the war. If, on the other hand, the purpose
of the index number is to study the changes in the general price level
in the post-war period the base year should be the year immediately
succeeding the war period. Similarly, weights of various items considerably
depend on the object with which an index number is computed.
Wheat prices may receive a very high weightage in an index number
constructed with the purpose of measuring general price level of food
articles, but its weight in a general purpose index number of wholesale
prices would not be as high as in the previous case. Thus, we arrive
at a very important conclusion that in the construction of an index number
all the problems mentioned above should be viewed in the light of the object
with which a particular index number is prepared. We shall now study the four
problems mentioned above, in that very order.
SELECTION OF ITEMS, THEIR NUMBER AND PRICE QUOTATIONS
(1) Selection of the commodities
Characteristics of items. The problem of the selection of items
arises on account of the fact that it is not possible to take into account
all the items whose price changes a general purpose wholesale price
index number attempts to represent. Since a general purpose wholesale
price index number represents the price changes of all commodities in
general, technically in its construction the price changes of all the items
should be studied. But it is neither possible nor necessary to take into
account all the items, and only a few representative items are selected
from the whole lot. Various items are divided in different classes and
one or two items from each class are selected in such a manner that
they adequately represent their group. A representative commodity
should possess the following two characteristics :-
(a) It should be representative of the tastes, habits, customs and necessities
of the people to whom the index number relates. If such commodities are
selected which do not have these characteristics the index number would
not serve the purpose for which it is meant. A general wholesale
price index number is meant to give an idea about the changes in the
general price level in the country so that the effects of price variation
on various economic and social problems may be properly studied.
If the items selected are not representative of the tastes, habits and
customs of the society, the conclusions derived from their study would
never be representative and would not be correctly applicable to the
problems of the people for whom it is meant. For example, if in a
wholesale price index number, constructed in our country, items like
strawberries, foreign wines, felt hats, bows, heavy machines, refrigerators
and costly automobiles are included, the index number would not
represent correctly the price changes, as the variations in the prices of
these commodities hardly matter much, so far as a vast majority of
Indians are concerned. It should be remembered that even a general
purpose wholesale price index number is meant for a particular society
which may have its own tastes and habits. The index number should
take all these factors into account and the selection of the items should
be made accordingly.
(b) It should be stable in quality and preferably should be standardized
or graded. If a commodity is not stable in quality it is unfit for inclusion
in an index number. The reason is that an index number is meant to
measure the change in the price level of the same commodities week
after week or month after month as the case may be. If a commodity
is not stable in quality each time when its price is quoted, it would
technically refer to the price of something different from the one whose
price was quoted the previous time. Comparison of the relative price
level is possible only if the quality of the commodity does not change
much. This is the reason why in the construction of index numbers
it is better to have such items which are standardized or graded.
Number of items. There is no hard and fast rule about the exact
number of items that a good index number should have. Theoretically
the larger the number of items, the more accurate would be the
results disclosed by an index number. But a very large number of
items involves considerable difficulty and trouble in calculation. Many
times an index number may become erratic if the number of items is
very large and if they are not properly managed. It can be said that
the number of items from which an index number should be constructed
should be fairly large consistent with ease in handling them. However,
in the construction of a sensitive index number of price the number of
items selected is generally very small. Sensitive index numbers of prices
are meant to measure changes in the prices of those commodities which
are very sensitive to change. In sensitive price index numbers even
a small change in the prices is accounted for. They are not fit for general
purpose studies but they have their own utility. In India during the
second war period and even after the war the Economic Adviser's
Office used to publish one such index number of prices. It was constructed
by using only 23 commodities. In other countries such index
numbers are constructed from 15 to 20 commodities only. The general
purpose index number of prices issued by the Economic Adviser's Office
in India takes into account 78 commodities. In other countries general
purpose index numbers are constructed from a large number of items.
The British Board of Trade Wholesale Price Index Number is constructed
from 200 items and the U. S. Bureau of Labour Statistics
Wholesale Price Index Number is constructed from 450 items.
Varieties of a commoJity. In the selection of items and in deciding
about their n mber, the different varieties of a commodity included
in the index number should be properly studied. Ordinarily all those
varieties which are in common use should be included. Thus three
or four varieties of wheat may be included in an index number of
wholesale price. If the prices of these varieties are averaged before
their inclusion in the index number, the commodity in question does
not receive any extra weightage but if these varieties are included as
so many items, and if their prices are not averaged but entered separately,
the commodity in question receives a weightage equal to the number of
varieties included. Calcutta Wholesale Price Index number and Bombay
Wholesale Price Index Number (now defunct) used to give weights to
various commodities in this fashion. The Economic Adviser's Index
Number of Wholesale Prices in our country takes into account 215
varieties in all though the number of items is only 78, but in this index
number the prices of various varieties of a commodity are first averaged
into one price and this average price is included in the index number,
so that a commodity does not receive extra weightage on account of
the inclusion of more than one variety. The various varieties selected
should not only be generally in common use but also stable in character.
We have already said in an earlier section that if there is no stability in
the quality, the price quotations at different periods would technically
refer to different commodities and as such cannot be compared. If
there is no change in the price of a commodity but if its quality changes
the index number would record a change even if it is not there.
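The effect of the two treatments of varieties can be seen in a small sketch. The price relatives below are hypothetical figures chosen only for illustration; they are not taken from any published index.

```python
from statistics import mean

# Hypothetical price relatives (base = 100), for illustration only.
wheat_varieties = [120.0, 130.0, 110.0]   # three varieties of wheat
rice = 100.0                              # a single variety of rice

# (a) Varieties averaged first: wheat enters the index as one item.
index_averaged = mean([mean(wheat_varieties), rice])

# (b) Varieties entered separately: wheat in effect receives a weight of 3.
index_separate = mean(wheat_varieties + [rice])

print(index_averaged)   # 110.0
print(index_separate)   # 115.0
```

In case (b) the index is pulled towards the wheat relatives, which is the extra weightage described above.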
Classification of items. With a view to giving separate information
about the different groups of commodities, the items selected are classified
in groups and separate index numbers are calculated for different groups.
This helps in studying separately the price variations of different groups.
Thus, if food articles constitute one group, it would be possible to study
the price change of this group separately from the changes of other
groups. It goes without saying that such studies are very useful. The
classification of items in this fashion increases the homogeneity of data.
If these groups are further subdivided into smaller groups, the homogeneity
of data increases further and it becomes possible to study the
changes in the price level of the sub-groups separately and in detail.
The Economic Adviser's Index Number of Wholesale Prices in our
country classifies the commodities in five major groups, namely: (1) Food
Articles, (2) Industrial Raw Materials, (3) Semi-Manufactures, (4)
Manufactures and (5) Miscellaneous. Each of these major groups is
further sub-divided in a number of smaller groups. Thus the food
articles group is divided in three sub-groups, namely, (a) Cereals,
(b) Pulses, and (c) Others. Other groups are also divided likewise
and the total number of sub-groups in which the 5 major groups are
divided is 18.
(2) Obtaining price quotations
Selection of representative places and persons. After the commodities
have been selected and classified the next problem that arises is that
of the collection of their prices. The prices of a commodity vary from
place to place and even at one place from shop to shop. Just as it is
not possible to include all the commodities in an index number, similarly
it is not possible to obtain price quotations from all places where a
particular commodity is purchased or sold. Even from selected places
price quotations cannot be collected from all shops. Therefore, a selection
of representative places and of representative persons has to be done.
Generally such places are chosen where a particular commodity is
purchased or sold in large quantities. The prices ruling at such places
usually affect the prices at other places where the commodity in question
is purchased or sold in comparatively smaller quantities. After the
selection of places in this fashion, the next thing is to appoint some
representatives who would supply the price quotations from time to time.
This work can be done either by appointing special staff for the purpose
or by giving the work to some selected individuals or institutions of that
particular locality. Information published in journals or magazines
about the prices ruling in various places can also be utilized. In appointing
representatives or in selecting individuals or institutions who would
supply price quotations care should be exercised, as if they are biased
the information supplied by them would not be representative and
trustworthy. To keep a check it is always better to appoint more than
one person, preferably three or four in each selected locality.
How is the price to be quoted. After this a decision has to be taken
about the manner in which prices would be quoted and about the definition
of the word price. Prices can be quoted in two ways: either by
expressing the quantity of commodity per unit of money or by expressing
the quantity of money per unit of commodity. The second one is
technically called Price and the first one Inverse Price. There should be
no doubt about them. They are not the same thing. There is always
an inverse ratio between them; if one increases the other decreases and
vice versa. For example, suppose the price of a commodity is Rs. 5 per
40 kgs. It can also be expressed as 8 kgs. per rupee. If the price
increases from Rs. 5 per 40 kgs. to Rs. 8 per 40 kgs. the inverse price
would be reduced from 8 kgs. a rupee to 5 kgs. a rupee. In the construction
of wholesale price index numbers, prices should be quoted
in quantity of money per unit of commodity. Or, in other words,
prices should be money prices and not commodity prices.
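The arithmetic of the example can be checked with a few lines. The function name `inverse_price` is our own label for illustration, not a standard term of any library.

```python
def inverse_price(rupees: float, kgs: float) -> float:
    """Quantity of commodity (in kgs.) obtained per rupee."""
    return kgs / rupees

# Price rises from Rs. 5 per 40 kgs. to Rs. 8 per 40 kgs.
before = inverse_price(5, 40)
after = inverse_price(8, 40)

print(before)  # 8.0 kgs. a rupee
print(after)   # 5.0 kgs. a rupee
```

As the money price rises, the inverse price falls, which is the inverse ratio referred to above.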
Should be wholesale price. So far as the meaning of the word price is
concerned, in the construction of wholesale price index numbers it
obviously means Wholesale Price. The reason for it is that the wholesale
prices are more stable than retail prices and in one locality there is one
wholesale price of a commodity, but the retail price varies from shop
to shop. Besides this, the wholesale prices are more quickly affected
by changes in demand or supply or by other similar factors than retail
prices. Wholesale prices are more sensitive than retail prices. Retail
prices depend on wholesale prices, and, therefore, there is always a time
lag between changes in wholesale prices and changes in retail prices.
As such retail prices cannot give a correct picture of the price level at
a particular moment. But this does not solve the problem of definition
altogether, because we shall have to define the term Wholesale Price.
Wholesale price can be ex-factory price or price including incidental
expenses or price at which wholesalers in a market purchase or sell a
commodity. Again wholesale price may be at one level at the opening
of the market and at another level at the close of the market. A decision
about the exact definition of the term wholesale price would depend on
the purpose of the index number. Generally, wholesale price index
numbers take into account the price at which wholesalers in the market
purchase or sell a commodity. If the price of a commodity is controlled,
then controlled prices are taken into account even though the price in
the black-market may be much higher.
Frequency of price quotations. Yet another thing associated with
the price quotations is a decision with regard to their number. The
question that arises in this connection is, how many quotations should
be obtained per week or per month, as the case may be. There is
no hard and fast rule about the frequency with which price quotations
should be obtained. In general the larger the number of quotations
the better it is. But too many quotations also complicate the problem
of the construction of index numbers. Ordinarily if an index
number is constructed every week one quotation per week is considered
enough. For a monthly index number at least four quotations per
month should be taken. The Economic Adviser's index number of
wholesale prices in our country is a weekly index number and is based
on one-day-a-week prices, on or about Friday. In deciding about the
frequency of quotations, the fact which should be kept in mind is that
the number should be such that the agency supplying the quotations can
easily and regularly send them. If the frequency is too great, the agency
supplying them may not send the quotations regularly. In the absence
of actual quotations, prices have to be estimated and the index number
becomes comparatively inaccurate.
Averaging quotations. The last thing connected with the price quotations
is to find out their average. If an index number is to be published
monthly and if the prices are obtained from twenty places and if the
frequency of the quotations is four per week, then every week eighty
quotations per commodity shall be received. They will have to be averaged
first. This average would give the average weekly price of the commodity
for the whole country. Four such averages would have to be
calculated for each month. These weekly averages would again be
averaged together, to give the monthly average price of the commodity
for the whole country. This monthly average price would be used
for the construction of the index number.
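The two-stage averaging described above can be sketched as follows. The quotations are hypothetical and, to keep the illustration short, only four quotations per week are shown instead of eighty; the function name `monthly_average_price` is likewise our own.

```python
from statistics import mean

def monthly_average_price(quotes_by_week):
    """Average each week's quotations, then average the weekly averages."""
    weekly_averages = [mean(week) for week in quotes_by_week]
    return weekly_averages, mean(weekly_averages)

# Hypothetical quotations (Rs.) for one commodity: 4 weeks, 4 quotations each.
quotes = [
    [10.0, 10.5, 9.5, 10.0],
    [10.5, 11.0, 10.5, 10.0],
    [11.0, 11.0, 10.5, 11.5],
    [11.5, 12.0, 11.5, 11.0],
]

weekly, monthly = monthly_average_price(quotes)
print(weekly)   # [10.0, 10.5, 11.0, 11.5]
print(monthly)  # 10.75
```

The monthly figure so obtained is the price that would enter the index number for that commodity.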
SELECTION OF THE BASE

We have already discussed that index numbers measure the relative
changes in the level of a phenomenon as compared to the level of the same
phenomenon on a previous date. This previous date or the period on
which the current variations are based is known as the Base Period of the
Index Number. In the construction of index numbers the selection of the
base period is a very important step. There are two methods by which
the base period can be selected. They are discussed below.
Fixed base method. The first method is known as the Fixed Base
Method. As the name suggests, in this method the base period is fixed.
A particular year is generally chosen arbitrarily and the prices of the subsequent
years are expressed as relatives of the prices of the base year.
Sometimes instead of choosing a single year as the base, a period of a few
years is chosen and the average price of this period is taken as the base
year's price. Fixed base can be used for an indefinite period. The year
which is selected as a base should be a normal year. Or, in other words,
the price level in this year should neither be abnormally low nor abnormally
high. If an abnormal year is chosen as the base the price relatives
of the current year calculated on its basis would give misleading conclusions.
If, for example, a year in which war was at its peak, say the
year 1943, is chosen as a base year, the comparison of the price level of
subsequent years to the prices of 1943 is bound to give erroneous conclusions.
The reason is that the price level in the year 1943 was
abnormally high. In order to remove this difficulty associated with the
selection of a normal year, the average price of a few years is sometimes
taken as the base price. The idea is that if a few years' average is taken,
abnormalities in one direction would be set off against abnormalities in
another direction. Sometimes when an index number is being constructed
for a past period, the average of the whole period may be taken as the
base. Thus if in the year 1956 we think of constructing an index number
to study the variations in the price level during the period 1900-1955,
the average of the prices of all these years may be taken as the base year's
price. But this is possible only if the index number relates to the past.

Chain base method. The second method of selecting the base is
known as Chain Base Method. In this method there is no fixed base period.
The year immediately preceding the one for which price relatives have to
be calculated is assumed as the base year. Thus for the year 1956 the base
year would be 1955, for 1955 it would be 1954, for 1954 it would be
1953 and so on. In this way there is no fixed base. It goes on changing.
The chief advantage of this method is that the price relatives of a year can
be compared with the price level of the immediately preceding year.
Businessmen and others are more interested in comparisons of this type
rather than in comparisons relating to the distant past. Yet another
advantage of the Chain Base Method is that under it, it is possible to
include new items in an index number or to delete old items which are
no longer important. In Fixed Base Method it is not possible. But
Chain Base Method has a drawback and it is that with it comparisons
cannot be made over a long period.
Calculation of price relatives. In the calculation of price relatives
the price of the base year is assumed as 100 and the prices of other
years are expressed as percentages of the price of the base year.
(a) Price relative in fixed base
Suppose the prices of a commodity A for the last twelve years are
as follows:

TABLE I
Prices of Commodity A

Year    Price (Rs.)    Year    Price (Rs.)
1940       7.37        1946      11.00
1941       8.56        1947      10.50
1942       9.06        1948       9.37
1943       9.62        1949      10.12
1944       9.94        1950      10.62
1945      10.37        1951      10.00
If the year 1940 is chosen as the base year then the price in this year
would be represented by 100. Other prices would be expressed as percentages
of this price. A simple formula is used for calculating the price
relatives. It is:

$$\text{Current year's price relative} = \frac{\text{Current year's price}}{\text{Base year's price}} \times 100$$
The figures obtained by the use of the above formula are price
relatives of the current year. The price relatives for the data given in
table No. I, calculated in this manner would be as given below in table
No. II. In this table in column 3 the price relatives have been given on
the base of 1940 and in column 4 on the base of 1951.

TABLE II
Price Relatives of the Data given in Table I

Year    Price       Price relatives    Price relatives
        (Rs. P.)    (1940=100)         (1951=100)
(1)     (2)         (3)                (4)
1940     7.37       100                 74
1941     8.56       116                 86
1942     9.06       123                 91
1943     9.62       131                 96
1944     9.94       135                 99
1945    10.37       141                104
1946    11.00       149                110
1947    10.50       142                105
1948     9.37       127                 94
1949    10.12       137                101
1950    10.62       144                106
1951    10.00       136                100
It is obvious that the price relatives give a much better idea about
the changes in the price level than the original figures given in table
No. I. These price relatives can be called index numbers of a simple
type.
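The fixed base calculation can be sketched in a few lines, using the prices of commodity A from Table I. The function name `price_relatives` is our own, and rounding to the nearest whole number is an assumption made to match the figures as tabulated.

```python
# Prices of commodity A (Rs.), as given in Table I.
prices = {
    1940: 7.37, 1941: 8.56, 1942: 9.06, 1943: 9.62,
    1944: 9.94, 1945: 10.37, 1946: 11.00, 1947: 10.50,
    1948: 9.37, 1949: 10.12, 1950: 10.62, 1951: 10.00,
}

def price_relatives(prices, base_year):
    """Express every year's price as a percentage of the base year's price."""
    base_price = prices[base_year]
    return {year: round(price / base_price * 100)
            for year, price in prices.items()}

relatives_1940 = price_relatives(prices, 1940)
relatives_1951 = price_relatives(prices, 1951)

print(relatives_1940[1946], relatives_1951[1946])  # 149 110
```

Changing the `base_year` argument is all that is needed to re-express the same series on another fixed base.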
(b) Price relatives in chain base
As has been said earlier, in chain base method the previous year
is taken as the base and price relatives are calculated in relation to the
price of the immediately preceding year. In the above example, if
chain base method was used the price relatives of the year 1941 would
be based on the prices of the year 1940 and of 1942 on the prices of
1941. Such relatives are called Link Relatives. The following formula
is used for their calculation:

$$\text{Link relative of the current year} = \frac{\text{Current year's price}}{\text{Previous year's price}} \times 100$$

The data given in table No. I can be converted into link relatives
as follows:

TABLE III
Calculation of Link Relatives

Year    Price (Rs.)    Calculation of Link Relatives    Link Relatives
(1)     (2)            (3)                              (4)
1940     7.37                                           100
1941     8.56           8.56 × 100 / 7.37               116
1942     9.06           9.06 × 100 / 8.56               106
1943     9.62           9.62 × 100 / 9.06               106
1944     9.94           9.94 × 100 / 9.62               103
1945    10.37          10.37 × 100 / 9.94               104
1946    11.00          11.00 × 100 / 10.37              106
1947    10.50          10.50 × 100 / 11.00               95
1948     9.37           9.37 × 100 / 10.50               89
1949    10.12          10.12 × 100 / 9.37               108
1950    10.62          10.62 × 100 / 10.12              105
1951    10.00          10.00 × 100 / 10.62               94
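The link relatives can be computed by pairing each year with the one before it. This is only a sketch: the function name `link_relatives` is our own, the first year is shown as 100 as in the table, and results are rounded to whole numbers.

```python
# Prices of commodity A (Rs.), as given in Table I.
prices = {
    1940: 7.37, 1941: 8.56, 1942: 9.06, 1943: 9.62,
    1944: 9.94, 1945: 10.37, 1946: 11.00, 1947: 10.50,
    1948: 9.37, 1949: 10.12, 1950: 10.62, 1951: 10.00,
}

def link_relatives(prices):
    """Each year's price as a percentage of the preceding year's price."""
    years = sorted(prices)
    links = {years[0]: 100}          # first year has no predecessor
    for previous, current in zip(years, years[1:]):
        links[current] = round(prices[current] / prices[previous] * 100)
    return links

links = link_relatives(prices)
print(list(links.values()))
# [100, 116, 106, 106, 103, 104, 106, 95, 89, 108, 105, 94]
```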

CHOICE OF AVERAGE
If we have to study the price variations of only one commodity
the simple price relatives are the relevant index numbers and the problem
of a choice of average does not arise. But in actual practice the technique
of index numbers is used to study the changes in general price level and
in such cases more than one commodity has to be taken into account.
When the price relatives of all such commodities have been calculated,
the next problem is to average them into a single figure. Theoretically
any average can be used for the purpose but in practice a choice has to be
made from amongst the mean, median and geometric mean only. Which
of these three averages should be used is a question which requires
slightly detailed analysis. We shall discuss this problem a little later.
For the present we give below an illustration to show how various
averages can be used in the calculation of index numbers.
TABLE IV
Prices of Commodities

Commodity    Price 1940    Price 1941    Price 1942
             (Rupees)      (Rupees)      (Rupees)
A            10            20            30
B            11            22            33
C            12             6            12
D            13             6.5          13
E            14             7.0          28

TABLE V
Averaging Price Relatives in Fixed Base Method

Commodity          Price 1940=100    Price Relative 1941    Price Relative 1942
A                  100               200                    300
B                  100               200                    300
C                  100                50                    100
D                  100                50                    100
E                  100                50                    200
Total              500               550                   1000
Mean               100               110                    200
Median             100                50                    200
Geometric Mean     100                87                    178.2

TABLE VI
Averaging Link Relatives in Chain Base Method

                   Link Relatives
Commodity          1940              1941                   1942
A                  100               200                    150
B                  100               200                    150
C                  100                50                    200
D                  100                50                    200
E                  100                50                    400
Total              500               550                   1100
Mean               100               110                    220
Median             100                50                    200
Geometric Mean     100                87                    204.6
In the above illustration, mean, median and geometric mean have
been used for averaging price relatives. For the years 1940 and 1941
the price relatives and the link relatives are the same. They would always
be the same for the fixed base year and the year immediately succeeding
it. For the year 1942, however, the averages of the price relatives and
link relatives are different because the price relatives of 1942 are based on
1940 while the link relatives are based on the prices of 1941.
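The three averages for the year 1942 can be recomputed from the prices of Table IV with the standard `statistics` module. This is a sketch only; the variable names are our own, and a geometric mean computed in this way may differ slightly in the first decimal place from the tabulated figures.

```python
from statistics import mean, median, geometric_mean

# Prices for 1940, 1941 and 1942 (Rupees), as in Table IV.
prices = {
    "A": [10, 20, 30],
    "B": [11, 22, 33],
    "C": [12, 6, 12],
    "D": [13, 6.5, 13],
    "E": [14, 7, 28],
}

# Fixed base relatives for 1942 (1940 = 100) and link relatives (1941 = 100).
fixed_1942 = [p[2] / p[0] * 100 for p in prices.values()]
link_1942 = [p[2] / p[1] * 100 for p in prices.values()]

for label, relatives in (("fixed base", fixed_1942), ("chain base", link_1942)):
    print(label,
          mean(relatives),
          median(relatives),
          round(geometric_mean(relatives), 1))
```

The mean and median agree with Tables V and VI; only the averaging step differs between the two methods, the relatives themselves being formed as before.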

Mean an unsuitable average. Now we shall discuss the choice of
average for the construction of index numbers. As has been said earlier
the choice has to be made between mean, median and geometric mean.
Mean has the advantage of common understandability and ease in cal-
culation, but as has been discussed in previous chapters, mean is affected
considerably by the values of big items. It gives comparatively greater
weight to bigger values and as such if there is a substantial rise in the
price of even one commodity, the value of the mean would shoot up very
high. Moreover, mean measures absolute changes and in the construc-
tion of index numbers we study relative changes. From this point of
view mean is an unsuitable average. We shall discuss this point in
further details a little later.

Median also unsuitable. Median has the advantage of easy computation.
It is not affected by the values of extreme items but median is
not representative if the number of items is small. In many cases it
cannot be easily determined and interpolation has to be done to estimate
its value. However, the most important point that goes against its
use in index numbers is that it also measures absolute changes. In the
calculation of index numbers we are concerned with relative changes and
as such median also is an unsuitable average. It will not give a correct
measure of the variation in the level of the phenomenon which is under
consideration.

Geometric mean the best average. Geometric mean no doubt suffers
from the drawback of difficulty of calculation but it has some
properties which are very useful in the construction of index numbers.
Geometric mean measures relative changes and since in the construc-
tion of index numbers we are concerned with relative changes, it is
only natural that geometric mean should receive a preference over other
measures of central tendency. Geometric mean does not give greater
importance to big items like the arithmetic average, nor is it affected much
by the values of the extreme items. Due to all these reasons geometric
mean is preferred for the calculation of index numbers. The following
table would illustrate how the geometric mean measures relative changes
whereas the arithmetic mean measures absolute changes.
TABLE VII
Calculation of Fixed Base Index Numbers by the Use of Geometric
Mean and Arithmetic Average

                      1953 (base year)     1954                 1955
Commodities           Price    Relative    Price    Relative    Price    Relative
                      (Rs.)                (Rs.)                (Rs.)
A                     400      100         800      200         600      150
B                     200      100         100       50         100       50
Total                          200                  250                  200
Arithmetic Average             100                  125                  100
Geometric Mean                 100                  100                   87

In the above table the price of commodity A for the year 1954
is double its price in 1953, and of commodity B it is half of the price
of the year 1953. If these two commodities are of equal importance
there should be no change in the value of the index number from 1953
to 1954. If the price of one commodity is doubled and that of the
other is halved there is no change in the general price level. However,
the arithmetic average of the price relatives shows that there is an increase
of 25% in the prices of 1954 as compared to the prices of 1953. The
geometric mean, however, does not reveal any change and it is thus a
correct measure. Similarly in the year 1955 when the price of commodity
A has gone up by 50% and of B has gone down by 50% arithmetic average
does not record any change. A 50% fall is never made good by a 50%
rise. A 50% fall can be compensated by a rise of 100% but since arithmetic
average measures absolute changes a 50% fall has been set off against
a 50% rise, and the index number shows no change. The geometric
mean, however, records a change and shows that the average price has
gone down. Thus, geometric mean is a better measure than arithmetic
average, so far as the construction of index numbers is concerned. It
has another advantage over the arithmetic average inasmuch as it makes
possible the replacement of commodities which have become obsolete
and the inclusion of new ones without affecting the balance of the index.
We shall see a little later that the index numbers which are constructed
by the use of geometric mean are reversible and satisfy the time reversal
test, and make base shifting an extremely easy task. Therefore, in the
construction of index numbers invariably the geometric mean should
be used. The Economic Adviser's Index Number of wholesale prices
in India also uses the geometric mean.
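The contrast between the two averages in Table VII can be verified directly. The sketch below uses the price relatives of commodities A and B from that table; the variable names are our own.

```python
from statistics import mean, geometric_mean

# Price relatives of commodities A and B (1953 = 100), as in Table VII.
relatives = {
    1954: [200, 50],   # A doubled, B halved
    1955: [150, 50],   # A up 50%, B down 50%
}

results = {}
for year, values in relatives.items():
    results[year] = (mean(values), round(geometric_mean(values)))
    print(year, "arithmetic:", results[year][0], "geometric:", results[year][1])
```

For 1954 the arithmetic average shows 125 while the geometric mean stays at 100; for 1955 the arithmetic average shows 100 while the geometric mean falls to about 87.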
PROBLEM OF WEIGHTING

Methods of computing index numbers discussed so far give what
are called unweighted indices. While calculating the average of various
price relatives in the illustrations given so far we have used either simple
arithmetic average or median or simple geometric mean. All the items
have been treated as equally important. An important question that
arises in this connection is, should index numbers be weighted or un-
weighted? We have already discussed in the chapter on Measures of
Central Tendency that where the relative importance of items is not equal
weighted average gives better results than an unweighted one. As such
index numbers should be weighted. But we have discussed in that
chapter how difficult it is to arrive at correct weights. Further we had
rejected the idea of chance weighting, as well as of arbitrary weighting,
and had come to the conclusion that weights should be rationalised. We
shall presently discuss what we mean by rational and irrational weights.
It is impossible to give a comprehensive definition of the term
"rational weights." Weights which are perfectly rational for one investigation
may be entirely unsuitable for another. In fact the purpose
of the index number and the nature of the data concerned with it, are two
important things on which a decision about the rational weights depends.
On the relation of weights to the purpose of index number Mitchell says
as follows:
"If rational weighting is worth striving after then with what methods
shall the weights of the different commodities be arrived at? That
depends upon the object of the investigation. If, for example, the aim
be to measure changes in the cost of living and the data be retail quotations
of consumer's commodities then the proportionate expenditure
upon the different articles as represented by collection of family budgets
make appropriate weights. If the aim be to study changes in the money
incomes of farmers then the data should be farm prices and the weights
should be proportionate to the total money receipt from the several products
and the list of commodities on which the index number shall be
based should be limited to the products of the farm. If the aim be to
construct a business barometer the data should be prices from the most
representative wholesale markets. The list should be confined to
commodities whose prices are most sensitive to changes in business
prospects and least liable to change from other causes and the weights
be logically adjusted to the relative faithfulness with which the quotations
included reflect business conditions."
It should be noted that in all the above illustrations the rational
weights are values. "In many index number calculations, the only
possible medium of comparison of the units which form the heterogeneous
group, is that of value measured in money. The reason is
simple. We can only compare tons of pig iron, bushels of wheat,
gallons of milk and barrels of beer if we express each in terms of money
values."
Should weights be fixed or fluctuating? We thus reach the conclusion
that in order that rational weighting be applied to index numbers the
most appropriate rational weights in the majority of cases are money values.
The next question that arises is, whether weights should be fixed or
fluctuating? The answer is that by changing the weights a more accurate
measure of importance is undoubtedly acquired. It may, however, be
said that in case weights are fluctuating, changes in an index number must
be interpreted not only in terms of prices but also in terms of weights.
When chain indices are used weights can be varied without confusion.
Thus we conclude that weighting of index numbers is essential, that weights
should be rational, that rational weights are usually money values and, lastly, that
weights should be fluctuating.
Implicit and explicit weighting. Index numbers can be weighted by
two methods. In the first method the weights are not explicitly assigned
to any commodity but the commodity to which greater importance is
attached is repeated a number of times. A number of varieties of such
commodities are included in the index number as separate items. Thus,
if in an index number wheat is to receive a weight of 3 and rice a weight
of 2, three varieties of wheat and two varieties of rice would be included.
In this method weights are not apparent, but items are implicitly weighted.
Such weights are known as Implicit Weights. Bombay Wholesale Price
Index Number in our country was weighted in this fashion. In the
second method weights are explicitly assigned to commodities. Only
one variety of a commodity is included in the construction of index number
but its price relative is multiplied by the figure of weight assigned to it.
Explicit weights are decided on some logical basis. For example, if
wheat and rice are to be weighted in accordance with the value of their
net output and if the ratio of their net output is 5 : 2, wheat would
receive a weight of five and rice of two. Such weights are called
Explicit Weights. The following methods are generally used in explicitly
weighting an index number:
(a) Weighted average of relatives method; and
(b) Weighted aggregative method.
Weighted average of relatives. In this method the unweighted index
number is converted into another which is weighted. The weights
used are values. The values are estimated on the basis of the aggregate
expenditure in the base year. Weights are in proportion of these values.
The aggregate expenditure of a commodity in the base year is calculated
by multiplying quantity with price. The index number for the current
year is calculated by dividing the sum of the products of the current year's
price relatives and base year's values by the total of the weights. In
other words, weighted arithmetic average of the price relatives gives the
required index number.
Symbolically

$$\text{Weighted index number of the current year} = \frac{\Sigma IV}{\Sigma V}$$

Where $I$ stands for the price relatives of the current year and $V$
for values of the base year. The following example would illustrate the
above procedure.
Example 1. The following table gives the prices of some commodities
in the base year and current year and the quantity sold in the
base year. Calculate the weighted index number by using the weighted
average of relatives.

Commodities    Unit     Base year's    Base year's    Current year's
                        quantity       price          price
A              Maund     7             16.0           19.6
B              Kilo      6              2.0            3.2
C              Dozen    16              5.6            7.0
D              Yard     21              1.5            1.4

Solution. The price relative of the current year

$$= \frac{\text{Current year's price}}{\text{Base year's price}} \times 100$$

and according to this rule the price relatives of the commodities A,
B, C and D for the current year would be respectively 122.5, 160, 125
and 93.3.
The values of the base year
= Quantity of the base year × Price of the base year, and according
to this rule the values in the base year for commodities A, B, C and D
would be respectively 112, 12, 89.6 and 31.5.
These figures are used in the following table for calculating
weighted index number:

Commodity    Price relative of         Values or       Weight × Price
             the current year (I)      weights (V)     relatives (IV)
A            122.5                     112.0           13,720
B            160.0                      12.0            1,920
C            125.0                      89.6           11,200
D             93.3                      31.5            2,939
Total                                  245.1           29,779

$$\text{Weighted index number of prices} = \frac{\Sigma IV}{\Sigma V} = \frac{29779}{245.1} = 121.5$$
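The working of Example 1 can be reproduced as a short sketch. The data are those of the example; the dictionary layout and the function name `weighted_relatives_index` are our own choices for illustration.

```python
# (base quantity q0, base price p0, current price p1), from Example 1.
data = {
    "A": (7, 16.0, 19.6),
    "B": (6, 2.0, 3.2),
    "C": (16, 5.6, 7.0),
    "D": (21, 1.5, 1.4),
}

def weighted_relatives_index(data):
    """Weighted arithmetic average of price relatives: sum(I*V) / sum(V)."""
    sum_iv = 0.0
    sum_v = 0.0
    for q0, p0, p1 in data.values():
        relative = p1 / p0 * 100      # I: price relative of the current year
        value = q0 * p0               # V: base year value used as weight
        sum_iv += relative * value
        sum_v += value
    return sum_iv / sum_v

index = weighted_relatives_index(data)
print(round(index, 1))  # 121.5
```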
Weighted aggregative method. In the second method the quantities
in the base year are taken as weights. Current year's prices are multiplied
by the base year's quantities and the sum of these products is divided
by the sum of the products of the base year's prices and base year's quantities.
The ratio so obtained is multiplied by 100 and the resulting figure
is the desired index number. This method is known as weighted aggregative
method because in it we calculate aggregate expenditure in the base
year as well as in the current year, on the assumption that the quantities of
the base year hold good for the current year also.
Symbolically

$$\text{Weighted index number of the current year} = \frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times 100$$

Where $p_1$ stands for the prices of the current year, $p_0$ for the prices
of the base year and $q_0$ for the quantities of the base year.
The weighted index number of the data given in Example 1 above
would be calculated by this method as follows:

Calculation of Index Number by the Weighted Aggregative Method

Commodity   Unit    Quantities of    Prices of        Prices of the    q0 × p0    q0 × p1
                    the base year    the base year    current year
                    (q0)             (p0)             (p1)
A           Maund    7               16.0             19.6             112.0      137.2
B           Kilo     6                2.0              3.2              12.0       19.2
C           Dozen   16                5.6              7.0              89.6      112.0
D           Yard    21                1.5              1.4              31.5       29.4
Total                                                                  245.1      297.8
                                                                       (Σp0q0)    (Σp1q0)

$$\text{Index number} = \frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times 100 = \frac{297.8 \times 100}{245.1} = 121.5$$
Thus both these methods have given the same result. In both
the above-mentioned methods instead of arithmetic average, geometric
mean can also be used. In that case instead of the weighted arithmetic
average of price relatives, their weighted geometric mean will have to
be calculated.
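That the two methods agree follows from the fact that each product I × V equals 100 × p1 × q0, so the two formulae are algebraically the same. The sketch below checks this on the data of Example 1 and also shows one possible way of forming a weighted geometric mean of the relatives; the function names, and the choice of base year values as weights for the geometric mean, are our own assumptions for illustration.

```python
from math import exp, log

# (base quantity q0, base price p0, current price p1), from Example 1.
data = {
    "A": (7, 16.0, 19.6),
    "B": (6, 2.0, 3.2),
    "C": (16, 5.6, 7.0),
    "D": (21, 1.5, 1.4),
}

def weighted_aggregative_index(data):
    """sum(p1*q0) / sum(p0*q0) * 100, with base year quantities as weights."""
    current = sum(q0 * p1 for q0, p0, p1 in data.values())
    base = sum(q0 * p0 for q0, p0, p1 in data.values())
    return current / base * 100

def weighted_relatives_index(data):
    """sum(I*V) / sum(V), with I = p1/p0*100 and V = p0*q0."""
    sum_iv = sum((p1 / p0 * 100) * (q0 * p0) for q0, p0, p1 in data.values())
    sum_v = sum(q0 * p0 for q0, p0, p1 in data.values())
    return sum_iv / sum_v

def weighted_geometric_index(data):
    """Weighted geometric mean of the relatives, weights V = p0*q0 (assumed)."""
    sum_v = sum(q0 * p0 for q0, p0, p1 in data.values())
    log_sum = sum(q0 * p0 * log(p1 / p0 * 100) for q0, p0, p1 in data.values())
    return exp(log_sum / sum_v)

aggregative = weighted_aggregative_index(data)
relatives = weighted_relatives_index(data)
geometric = weighted_geometric_index(data)

print(round(aggregative, 1), round(relatives, 1))  # 121.5 121.5
print(geometric < relatives)                       # True
```

The weighted geometric mean comes out somewhat lower than the weighted arithmetic average, as a geometric mean never exceeds the arithmetic mean taken with the same weights.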
Steps in the construction of Wholesale price index numbers
Thus the steps in the construction of wholesale price index numbers
are. as follows : -
(1) Sel.ect a suitable list of commodities and make arrangements
for Obtaining their price quotations regularly.
(2) Select a base year and convert current prices into price relatives
based on the prices of the base year. There can be either a fixed base or
a chain base. A fixed base can be either a particular year or the average
of a number of years.
(3) Select a measure of central tendency and obtain an average of
the price relatives. Choice of the average has to be made from amongst
mean, median and geometric mean. Geometric mean has certain
advantages over other averages in the construction of index numbers.
(4) If weights have to be used they can be either implicit or explicit.
Explicit weighting can be done either by the weighted average of relatives
method or the weighted aggregative method.

Construction of cost of living index numbers


Need. The necessity of the construction of cost of living index
numbers arises on account of the fact that wholesale price index numbers
measure the variations only in the general level of prices. These
variations do not throw light on the effects of rise and fall of prices on the
cost of living of different classes of people in a society. Different
groups of people consume different types of commodities and even the
same type of commodities are not consumed in the same proportion
by different classes of people. The relative importance of various
commodities thus is different in case of different types of people. In
order to measure the effects of rise and fall in the prices of various
commodities on the cost of living of different classes of people, separate
index numbers are constructed for different groups.
Difficulties in construction. The construction of such index
numbers is not an easy task. The main reason for difficulties in the
construction of cost of living index numbers is that, for such indices, price
variations have to be studied from the point of view of the consumers.
As such, many difficulties which are not experienced in the construction
of wholesale price index numbers are present here. Since consumers
purchase commodities in small quantities, from retail shops, wholesale
prices are not used in these index numbers. The collection of retail
prices, however, is a very tedious and difficult task as they vary from place
to place and even at one place from shop to shop and even at one shop
from customer to customer. As such the index numbers constructed
by the use of retail prices cannot be used for different places nor for
different classes of people at one place. Moreover, in the construction
of such index numbers it is not possible to exclude such commodities
which are not stable in quality because if people consume such things they
have to be included in the index number. Due to these reasons it is not
possible to say definitely that each time when an index number is being
constructed the prices refer to exactly the same things as in the past.

Another difficulty in the construction of cost of living index
numbers arises due to the fact that all the members of any particular group,
however small it may be, do not spend on various items of consumption in
one ratio and even one person does not spend on various items in the
same ratio at two periods of time. To remove this difficulty the concept
of an average family has to be introduced. The average family is an
imaginary concept and there may not be in actual practice a single family
which fully resembles this theoretical average family. It should be
remembered that a cost of living index number tells us about the variations in the
cost of living of only one group of persons living in a particular region. By region
we mean here, an area within which retail prices are almost equal and by groups
we mean classes distinguished from each other on the basis of incomes. Thus there
cannot be one cost of living index number for textile workers of the
whole country, because retail prices in different places differ and the
pattern of consumption is also not alike in different localities. Similarly
we cannot have a cost of living index number for the whole
population of a particular town because groups of persons with different
incomes spend on various commodities in different ways and the relative
importance of various commodities to all persons is not identical.
Construction. If a cost of living index number for a special group
of persons in a particular region is to be constructed the first thing that is
to be decided is the persons who would be included in that specific
group. It is extremely necessary to define this in an unambiguous
fashion. After this has been done the next step is to hold a family budget
enquiry relating to the persons in this group. A family budget enquiry is
held with a view to find out how much an average family of this group
spends on different items of consumption. The quantity of the commodities
consumed, as also the prices at which they are purchased, are noted down.
The enquiry is done on a random sample basis. Some families are
selected from the total number by lottery method and their family budgets
are scrutinized in detail. The items on which money is spent are
classified in certain groups. Generally these groups are (1) Food articles,
(2) Clothing, (3) Fuel and lighting, (4) House rent and (5) Miscellaneous.
These major classes are further subdivided into smaller classes. For
example, the food group can be sub-divided into (a) Cereals, (b) Pulses,
(c) Others. Usually these sub-classes are divided still further so that the
commodities included in each sub-group are individually mentioned.
These family budgets give an idea about the quantity of various
commodities consumed by an average family of this particular group and the prices
at which these commodities are purchased. In this way the total amount
spent on various items is calculated. The family budget enquiry thus
helps in deciding the commodities which should be included in the
construction of a particular cost of living index number. Only those
commodities which a group generally consumes are included in the index number.
The commodities selected should be preferably those whose retail price
quotations are easily available. The relative importance of various items
for different classes of people is not the same and that is why the cost of
living index numbers are always weighted. The extent to which a
particular class would be affected by changes in the price of a commodity
would depend on the importance which it gives to this commodity in its
family budgets. The relative importance of commodities is thus decided
on the basis of the amount spent on various items. Weighted index numbers
are then calculated by the use of either of the two methods discussed in
previous sections. They are known here as :
(1) Aggregate expenditure method or aggregative method, and
(2) Family budget method or weighted relatives method.
Aggregate expenditure method. In this method the quantities of
commodities consumed by the particular group in the base year are
estimated and these figures or their proportions are used as weights.
Then the total expenditure on each commodity for each year is
calculated. The price of the current year is multiplied by the quantity or
weight of the base year. These products are added. Similarly for the
base year total expenditure on each commodity is calculated by
multiplying the quantity consumed by its price in the base year. These
products are also totalled. The total expenditure of the current year is
divided by the total expenditure of the base year and the resulting figure
is multiplied by 100 to get the desired index number.
Symbolically
Current year's index number = $\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

where $p_1$ and $p_0$ stand for the prices of the current year and base
year and $q_0$ for the quantities consumed in the base year. The
following example would illustrate the above procedure :-
Example 2. Construct the cost of living index number for 1940
on the basis of 1939 from the following data using the Aggregate
Expenditure method.

Articles      Quantity consumed   Unit     Price in 1939   Price in 1940
              in 1939                      (Rs.)           (Rs.)
Rice           6 mds.             maund     5.75            6
Wheat          6 mds.             maund     5               8
Gram           1 md.              maund     6               9
Arhar pulse    6 mds.             maund     8              10
Ghee           4 seers            seer      2               1.5
Sugar          1 md.              maund    20              15
Salt          12 seers            maund    20.50           18
Oil           20 seers            maund     4               4.75
Clothing      50 yds.             yard       .50             .75
Firewood      12 mds.             maund      .50            1.125
Kerosene       1 tin              tin       4               5.125
House-rent     -                  house    10.75           12.75
Solution. Construction of Cost of Living Index Number

Articles     Quantity    Unit        Price in   Price in   Agg. Exp. in   Agg. Exp. in
             consumed                1939       1940       base year      current year
             (q0)                    (p0)       (p1)       (p0 q0)        (p1 q0)
                                     Rs.        Rs.        Rs.            Rs.
Rice          6 mds.     per md.      5.75       6          34.50          36
Wheat         6 mds.     per md.      5          8          30             48
Gram          1 md.      per md.      6          9           6              9
Arhar         6 mds.     per md.      8         10          48             60
Ghee          4 seers    per seer     2          1.5         8              6
Sugar         1 md.      per md.     20         15          20             15
Salt         12 seers    per md.     20.5       18           6.15           5.4
Oil          20 seers    per md.      4          4.75        2              2.375
Clothing     50 yds.     per yd.       .5         .75       25             37.50
Firewood     12 mds.     per md.       .5        1.125       6             13.5
Kerosene      1 tin      per tin      4          5.125       4              5.125
House-rent    -          per house   10.75      12.75       10.75          12.75
Total                                                      200.40         250.65

Index Number for 1940 = $\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{250.65}{200.4} \times 100 = 125$
Family budget method. In the family budget method the family
budgets of a large number of people are carefully studied and the
aggregate expenditure of the average family on various items is estimated.
These values are used as weights. Current year's prices are converted
into price relatives on the basis of base year's prices and these price
relatives are multiplied by the respective values of the commodities in
the base year. The total of these products is divided by the sum of the
values (or weights) and the resulting figure is the desired index number.
Symbolically
Current year's index number = $\frac{\sum IV}{\sum V}$

where $I$ stands for the current year's price relative and $V$ for the
values of the base year.
Example 3. Construct the cost of living index number for 1950
on the basis of 1946 from the following data using the Family Budget
Method.

Articles       Quantity consumed   Unit          Price in 1946   Price in 1950
               in 1946                           (Rs.)           (Rs.)
Rice            5 mds.             per md.        12              16
Bajra           5 mds.             per md.         8              10
Wheat           1 md.              per md.        10              20
Gram            1 md.              per md.         6              12
Arhar           5 mds.             per md.         8              12
Other pulses    2 mds.             per md.         6               8
Ghee            4 kilos            per kg.         2.5             4
Gur             2 mds.             per md.         5              10
Salt           12.5 kilos          per 40 kgs.     8              10
Oil            24 kilos            per 40 kgs.    40              50
Clothing       40 meters           per meter        .5             1
Firewood       10 mds.             per md.         1               1.6
Kerosene        1 tin              per tin         4               7
House-rent      -                  per house      24              30
Solution. Construction of Cost of Living Index Number

Articles       Quantities   Unit          Price in   Price in   Price relatives   Values consumed   Product of
               consumed                   1946       1950       for current       in base year      price relative
               in base                    (Rs.)      (Rs.)      year (I)          (weights) (V)     and weight (IV)
               year                                                               (Rs.)
Rice            5 mds.      per md.        12         16         133.3             60                7,998
Bajra           5 mds.      per md.         8         10         125               40                5,000
Wheat           1 md.       per md.        10         20         200               10                2,000
Gram            1 md.       per md.         6         12         200                6                1,200
Arhar           5 mds.      per md.         8         12         150               40                6,000
Other pulses    2 mds.      per md.         6          8         133.3             12                1,599.6
Ghee            4 kilos     per kg.         2.5        4         160               10                1,600
Gur             2 mds.      per md.         5         10         200               10                2,000
Salt           12.5 kilos   per 40 kgs.     8         10         125                2.5                312.5
Oil            24 kilos     per 40 kgs.    40         50         125               24                3,000
Clothing       40 meters    per meter        .5        1         200               20                4,000
Firewood       10 mds.      per md.         1          1.6       160               10                1,600
Kerosene        1 tin       per tin         4          7         175                4                  700
House-rent      -           per house      24         30         125               24                3,000
Total                                                                             272.5             40,010.1

Index Number of 1950 = $\frac{\sum IV}{\sum V} = \frac{40,010.1}{272.5} = 147$
In fact the two methods discussed above are the same as the weighted
aggregative method and weighted average of relatives method discussed
earlier in connection with weighting of wholesale price index numbers.
The results given by both these methods are always identical.
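Both cost of living calculations can be reproduced in a few lines. The sketch below is an illustration only, with names chosen for this example. It assumes 40 seers to the maund (inferred from the worked figures, where 12 seers of salt at Rs. 20.50 per maund cost Rs. 6.15) and treats house-rent as one unit. Quantities are expressed in the unit in which the price is quoted.

```python
SEERS_PER_MAUND = 40  # assumption inferred from the worked figures

# (article, quantity in the priced unit, base-year price, current-year price)
EXAMPLE_2 = [  # base 1939, current 1940
    ("Rice", 6, 5.75, 6),
    ("Wheat", 6, 5, 8),
    ("Gram", 1, 6, 9),
    ("Arhar pulse", 6, 8, 10),
    ("Ghee", 4, 2, 1.5),
    ("Sugar", 1, 20, 15),
    ("Salt", 12 / SEERS_PER_MAUND, 20.5, 18),
    ("Oil", 20 / SEERS_PER_MAUND, 4, 4.75),
    ("Clothing", 50, 0.5, 0.75),
    ("Firewood", 12, 0.5, 1.125),
    ("Kerosene", 1, 4, 5.125),
    ("House-rent", 1, 10.75, 12.75),  # one house assumed
]

EXAMPLE_3 = [  # base 1946, current 1950; salt and oil are priced per 40 kg
    ("Rice", 5, 12, 16),
    ("Bajra", 5, 8, 10),
    ("Wheat", 1, 10, 20),
    ("Gram", 1, 6, 12),
    ("Arhar", 5, 8, 12),
    ("Other pulses", 2, 6, 8),
    ("Ghee", 4, 2.5, 4),
    ("Gur", 2, 5, 10),
    ("Salt", 12.5 / 40, 8, 10),
    ("Oil", 24 / 40, 40, 50),
    ("Clothing", 40, 0.5, 1),
    ("Firewood", 10, 1, 1.6),
    ("Kerosene", 1, 4, 7),
    ("House-rent", 1, 24, 30),  # one house assumed
]


def aggregate_expenditure_index(rows):
    """Sum of p1*q0 over sum of p0*q0, times 100."""
    current = sum(q0 * p1 for _, q0, p0, p1 in rows)
    base = sum(q0 * p0 for _, q0, p0, p1 in rows)
    return current / base * 100


def family_budget_index(rows):
    """Sum of I*V over sum of V, with I = p1/p0*100 and V = p0*q0."""
    values = [q0 * p0 for _, q0, p0, p1 in rows]
    relatives = [p1 / p0 * 100 for _, q0, p0, p1 in rows]
    return sum(i * v for i, v in zip(relatives, values)) / sum(values)


index_1940 = aggregate_expenditure_index(EXAMPLE_2)
index_1950 = family_budget_index(EXAMPLE_3)
print(round(index_1940), round(index_1950))  # 125 147

# The two methods give the same figure when applied to the same data.
same_data_check = aggregate_expenditure_index(EXAMPLE_3)
```

With unrounded price relatives the sum of IV in Example 3 comes to 40,012.5 instead of 40,010.1; the difference arises only from rounding 133.33 to 133.3 and does not affect the index of 147.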

Errors in cost of living index numbers. Cost of living index
numbers are generally not very accurate. They only give a rough measure
of the effects of price variations on the cost of living of a certain group
of people in a particular locality. There are various reasons for
inaccuracies in such indices. The first reason is associated with the
selection of commodities and their price quotations. There is always a
likelihood that some such commodities are included in the index
number which are not representative while others which are representative
have been left out. There can be difficulties in obtaining accurate price
quotations also. These difficulties arise due to the fact that many
varieties of a commodity are usually in the market at a time and the
selection of one particular variety becomes very difficult. It is likely that
the quotations obtained at different periods refer to different varieties of
the same commodity. Again, retail prices also vary considerably from
shop to shop, and from person to person. Bargaining and higgling
in retail markets never allow one price to rule at a particular moment.
Besides this, all the families which come under a particular group do not
spend on various commodities in an identical manner, and an average
family may not always give representative values. The most important
source of error in cost of living index numbers, however, is the
possibility of the use of inaccurate weights. Weights may not be
representative and if it is so, index numbers would give misleading conclusions.
Due to the above mentioned reasons the cost of living index
numbers are generally not very satisfactory in character and they do not
accurately represent the changes in the cost of living of the group for
which they are constructed. Their results are not very dependable as
the distribution of expenditure over various items by different persons
in the same income groups is not always identical. The distribution
depends on the size of the family, its tastes and habits and the amount
of actual income. All these things are different in cases of different
families. Even for one family these things change with the passage of
time but the cost of living indices presume that there is no change in
these factors. Another assumption of the cost of living index numbers
is that the quantities of the base year are constant and hold good for
the current year also. This assumption is also very shaky. The
pattern of consumption is an ever-changing phenomenon. With changes
in the prices of different commodities, with new types of commodities
coming in the market and with changes in fashion, the relative quantities
of various commodities consumed also change. Just as prices change
from the base year to the current year, similarly quantities also change,
though the change in quantity is usually not as much as it is in case of
prices. However, it can be said that cost of living index numbers measure
the relative change in the cost of living on the basis of the standard of
living and consumption pattern of the base year. But there is
absolutely no reason to presume that the pattern of consumption and
standard of living of the base year represent the normal level of these
phenomena. This difficulty can be removed to a certain extent if statistical
data are regularly collected under changing circumstances, but this in
itself is a very difficult proposition.
INDICES OF INDUSTRIAL PRODUCTION
Index numbers of industrial production have become fairly common
these days. They tell about the relative increase or decrease in the
level of industrial production in a country in relation to the level of
production in the base year. Obviously these indices can be constructed
by studying the variations in the level of industrial output. As such
the first step in the construction of such index numbers is to find the
level of output of various industries of the country. It should be
remembered that these index numbers throw light on the changes in the quantum
of production, not in values. If the variations in the value of output are
to be studied, data about the value of industrial output will have to be
used for the purpose of constructing such index numbers. Thus indices
of industrial production are constructed either by studying changes in
the quantum of production or its value.
Usually important data about production are collected under the
following major heads :-
(1) Mining Industries-Coal, iron ore, copper, aluminium,
petroleum, etc.
(2) Metallurgical Industries-Iron and steel, rolling mills, etc.
(3) Mechanical Industries-Locomotives, ships, aeroplanes, etc.
(4) Textile Industries-Cotton, woollen, jute, silk, etc.
(5) Industries subject to excise duties-Sugar, match, tobacco,
breweries, etc.
(6) Miscellaneous-Cement, glass, soap, chemical, etc.
The data relating to the production of the above mentioned
industries are collected either monthly, quarterly or yearly. The
production of the base year is taken as 100 and the current year's production
is expressed as a percentage of the base year's production. These
percentages are multiplied by the relative weights assigned to various
industries. Weights are usually assigned on the basis of the relative
importance of different industries. The relative importance of industries
is usually decided on the basis of capital invested, the gross value of
production, turnover, net output, etc. Many other criteria of relative
importance can also be laid down. Usually weights in an index number
of industrial production are based on the values of net output of different
industries. The weighted arithmetic average or geometric mean of
the relatives gives the index number of industrial production. Such
index numbers can be constructed both for gross output as well as net
output.
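The procedure just described can be sketched as below. The industries, outputs and weights in this sketch are hypothetical figures invented purely for illustration (they do not come from the text), and the function name is likewise an assumption of this example. The weighted arithmetic average is used.

```python
# Hypothetical data for illustration only:
# industry -> (base-year output, current-year output, weight)
INDUSTRIES = {
    "Mining": (50.0, 60.0, 20),
    "Textiles": (200.0, 230.0, 50),
    "Cement": (10.0, 14.0, 30),
}


def production_index(industries):
    """Weighted arithmetic average of production relatives (base year = 100)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for base_output, current_output, weight in industries.values():
        relative = current_output / base_output * 100  # base year taken as 100
        weighted_sum += relative * weight
        total_weight += weight
    return weighted_sum / total_weight


index = production_index(INDUSTRIES)
print(round(index, 1))  # 123.5 for the hypothetical figures above
```

Here the relatives are 120, 115 and 140, and the weights 20, 50 and 30 stand in for whatever measure of relative importance (for instance net output) has been adopted.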
INDICES OF BUSINESS CONDITIONS
Business conditions never remain stationary. They are constantly
changing. Sometimes there is a boom and at others a depression.

Periods of business prosperity are usually followed by periods of
business decline and depression. Index numbers of business conditions are
meant to give an idea about the changes in the business conditions at
different periods. They are especially helpful for business forecasting.
The future course of business activities can be forecast with a fair
degree of confidence if accurate index numbers of business activity are
available. This is possible on account of the fact that business
fluctuations are periodic or cyclic in character. But business conditions
can be studied only on the basis of a detailed study of the whole
economic set up of a country, and as such, in constructing index numbers of
business conditions, statistical data have to be collected about a large
number of widely varying factors. The index numbers are rightly
called Economic Barometers as they measure the level of economic activity
in a country. Professor Pigou has given the following list of items
which should be included in the construction of an index number of
business conditions for England :-
(1) Unemployment percentage.
(2) Consumption of pig iron.
(3) Prices in England.
(4) Rates of discount of three months bills.
(5) Volume of manufactured goods.
(6) Agricultural production.
(7) Yield per acre of nine principal crops.
(8) Index of production from mines.
(9) Increase of bank credit.
(10) Credits outstanding.
(11) Annual increase in the aggregate money wage.
(12) Rate of real wages.
(13) General aggregate consumption.
(14) Proportion of reserve to liabilities of the Bank of England.
The above list of items gives a fair idea of the various factors
about which information has to be collected in the construction of an
index number of business activity. The values of the above mentioned
items can be expressed as relatives of their respective values in a base
year. These relatives are weighted by the respective weights of the
items. It is very difficult to decide the weights of items in an index
number of this type. The weighted average calculated in this manner
gives the desired index number of business conditions.

Relationship between fixed base and chain base index numbers


Sometimes index numbers constructed by using a fixed base are
converted to chain base index numbers, while at others chain base index
numbers are linked together to a common base. This is not a difficult
task. The following example would illustrate the procedure of
converting fixed base index numbers to chain base index numbers and vice
versa :-

Example 4. (a) From the fixed base index numbers given
below, prepare chain base index numbers :
1945   1946   1947   1948   1949   1950
 376    392    408    380    392    400
(b) From the chain base index numbers given
below, prepare fixed base index numbers :
1945   1946   1947   1948   1949   1950
  92    102    104     98    103    101
Solution. (a) Computation of chain base index numbers from
the given fixed base index numbers

Year    Fixed Base       Fixed base index numbers     Chain Base Index
        Index Numbers    changed to chain base        numbers
                         index numbers
(1)     (2)              (3)                          (4)
1945    376              -                            100
1946    392              392/376 × 100                104.3
1947    408              408/392 × 100                104.1
1948    380              380/408 × 100                 93.1
1949    392              392/380 × 100                103.2
1950    400              400/392 × 100                102.0
(b) By chaining together the chain base index numbers, we get
the fixed base index numbers.
Computation of fixed base index numbers from the given
chain base index numbers

Year    Chain base index numbers chained                              Fixed base
        to 1945 as base                                               index numbers
1945    92                                                            92.0
1946    92/100 × 102                                                  93.8
1947    92/100 × 102/100 × 104                                        97.6
1948    92/100 × 102/100 × 104/100 × 98                               95.6
1949    92/100 × 102/100 × 104/100 × 98/100 × 103                     98.5
1950    92/100 × 102/100 × 104/100 × 98/100 × 103/100 × 101           99.5
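Both conversions in Example 4 follow simple rules: each chain index is the current fixed base figure as a percentage of the preceding one, and each fixed base figure is the running product of the chain indices. A minimal sketch follows; the function names are assumptions of this illustration.

```python
FIXED = [376, 392, 408, 380, 392, 400]   # part (a), years 1945 to 1950
CHAIN = [92, 102, 104, 98, 103, 101]     # part (b), years 1945 to 1950


def fixed_to_chain(fixed):
    """Each year's index as a percentage of the preceding year's index."""
    chain = [100.0]  # the first year has no predecessor in the series
    for previous, current in zip(fixed, fixed[1:]):
        chain.append(current / previous * 100)
    return chain


def chain_to_fixed(chain):
    """Running product of the chain links, each divided by 100."""
    fixed = []
    level = 100.0
    for link in chain:
        level = level * link / 100
        fixed.append(level)
    return fixed


chain_from_fixed = [round(x, 1) for x in fixed_to_chain(FIXED)]
fixed_from_chain = [round(x, 1) for x in chain_to_fixed(CHAIN)]
print(chain_from_fixed)  # [100.0, 104.3, 104.1, 93.1, 103.2, 102.0]
print(fixed_from_chain)  # [92.0, 93.8, 97.6, 95.6, 98.5, 99.5]
```

Chaining the unrounded results of `fixed_to_chain` back together reproduces the original fixed base series up to the constant factor 376/100, which shows that the two forms carry the same information.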

BASE SHIFTING

Need. Sometimes it becomes necessary to shift the base of an
index number. If an index number is constructed on a very old base
and if it is found that in changed conditions the base year is no more
normal, it is evident that such a base year should be replaced by another
which is more representative. An index with 1914 as the base year
is not suitable for comparison in the year 1955. To compare the price
level of the year 1955 we should have a more recent and representative
base year. Before World War II, the year 1914 was a popular base year
but now most of the index numbers, all over the world, have 1939 as
the base year. Some index numbers are now constructed by using the
more recent base of a post-war year like 1947 or 1952. We have
mentioned earlier that no base year is good for all times and as such the
problem of shifting the base is a very common problem associated with
index numbers.
Reconstruction of series. There are two methods by which a base
year can be shifted. First is the method of reconstructing the entire
series. Here the prices of the new base year are taken as 100 and the
prices of all preceding and succeeding years are converted into price
relatives and all the index numbers reconstructed afresh. This is a
very tedious and difficult job. If in the year 1955 we are shifting the
base year of an index from 1914 to 1939, all the index numbers for the
last 40 years would have to be reconstructed. If the index number is a
weekly index number we shall have to prepare more than 2,000 index
numbers. Obviously the task is a very difficult one.
Short-cut method. In the second method a simple arithmetical rule
is followed. Thus if on the base of 1914 the index number of 1939
is 200 and of 1955, 400, we can easily say that the index number of 1955
on the base of 1939 is 200 and of 1914 on the base of 1939 is 50. But
the short-cut method cannot be applied in all cases. It can be used
only in those indices which use simple geometric mean in their
construction. In cases where mean or median has been used the short-cut method
would give inaccurate results. The reason for it is that geometric
mean measures relative changes which are always reversible while
arithmetic average and median measure absolute changes which are not
reversible. The following example would illustrate this point :-
Example 5. From the following data calculate the index number
for 1950 using 1939 as base and then shift the base year from 1939 to
1950 (1) by reconstructing the whole series and (2) by the short-cut
method, using both arithmetic average and geometric mean.

Commodities    Prices 1939    Prices 1950
               (Rupees)       (Rupees)
A                8             10
B               16             12
C               40             60

Solution. Construction of index numbers and shifting of base year

Commodity            Price in    Price in    Price relatives    Price relatives
                     1939        1950        for 1950 on        for 1939 on
                     (Rupees)    (Rupees)    1939 base          1950 base
A                      8          10          125                 80
B                     16          12           75                133
C                     40          60          150                 67
Total                                         350                280
Arithmetic Average                            117                 93
Geometric mean                                112                 89

Thus
Index number of 1950 by using arithmetic average = 117
and by using geometric mean = 112
If the base year is changed to 1950 then the index number of 1939
by using arithmetic average = 93
and by using geometric mean = 89
The above index numbers have been calculated by reconstructing
the series. If we follow the short-cut method then in case of
arithmetic average the index number of 1939 on 1950 would be
$\frac{100 \times 100}{117}$ or about 85. But as we have seen above, the actual
index number is 93. Thus in case of arithmetic average the short-cut
method has given a wrong result. In case of geometric mean the index
number of 1939 on 1950 base would be $\frac{100 \times 100}{112}$ or about 89. We find
that the reconstruction of the series has also given the same figure.
Thus in case of geometric mean the short-cut method has given an
accurate result. Geometric mean gives index numbers which are reversible
so far as time factor is concerned. The index number of the current year on
the base year and the index number of the base year on the current year are
reversible or they are in a fixed ratio. They are in fact reciprocals of each other.
Thus, if the current year's index is 400, the index of the base year based
on the current year should be 25. The figures 4 and .25 (index numbers
without the figure 100) are reciprocals of each other. In other
words, if prices in the current year are four times the prices of the base
year then the prices of the base year should be one-fourth of the prices
of the current year.
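The contrast between the two averages in Example 5 can be verified with unrounded figures. The sketch below is illustrative only: the names are assumptions, and it relies on `statistics.geometric_mean`, available in Python 3.8 and later.

```python
from statistics import geometric_mean, mean

PRICES_1939 = {"A": 8, "B": 16, "C": 40}
PRICES_1950 = {"A": 10, "B": 12, "C": 60}


def index(current, base, average):
    """Average of the price relatives of `current` on `base` (base = 100)."""
    relatives = [current[c] / base[c] * 100 for c in base]
    return average(relatives)


results = {}
for name, average in (("arithmetic", mean), ("geometric", geometric_mean)):
    forward = index(PRICES_1950, PRICES_1939, average)   # 1950 on 1939 base
    rebuilt = index(PRICES_1939, PRICES_1950, average)   # 1939 on 1950 base, series rebuilt
    short_cut = 100 * 100 / forward                      # 1939 on 1950 base, short-cut
    results[name] = (forward, rebuilt, short_cut)
    print(name, round(forward), round(rebuilt), round(short_cut, 1))
```

For the geometric mean the rebuilt figure and the short-cut figure coincide (about 89), while for the arithmetic average they differ (about 93 against about 86 when the unrounded 116.67 is used; the text's 85 comes from using the rounded 117).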
The above discussion leads us to the conclusion that the base year
can be shifted without reconstructing a series by a short-cut method
if geometric mean has been used in the computation of the index
number. In other cases when arithmetic average or median has been used
for the purpose of the construction of index numbers, base shifting is
possible only by the reconstruction of the entire series and by calculating
the index numbers afresh. The following example would further
show how the time series of index numbers can be switched to a new
base by the short-cut method :
Example 6.
The following are the index numbers of prices (Base 1939 = 100)

Year    Index Numbers    Year    Index Numbers
1939    100              1947    380
1940    110              1948    370
1941    120              1949    350
1942    200              1950    360
1943    320              1951    348
1944    400              1952    340
1945    410              1953    320
1946    400              1954    300

Shift the base from 1939 to 1946 and recast the index numbers.
Solution. Shifting of base from 1939 to 1946

Year    Index Nos.     Index Nos.
        1939 Base      1946 Base
1939    100            100 × 100/400 = 25
1940    110            110 × 100/400 = 27.5
1941    120            120 × 100/400 = 30
1942    200            200 × 100/400 = 50
1943    320            320 × 100/400 = 80
1944    400            400 × 100/400 = 100
1945    410            410 × 100/400 = 102.5
1946    400            400 × 100/400 = 100
1947    380            380 × 100/400 = 95
1948    370            370 × 100/400 = 92.5
1949    350            350 × 100/400 = 87.5
1950    360            360 × 100/400 = 90
1951    348            348 × 100/400 = 87
1952    340            340 × 100/400 = 85
1953    320            320 × 100/400 = 80
1954    300            300 × 100/400 = 75
As has been said earlier, this method should be used only when
simple geometric mean has been used in the construction of index
numbers.
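The short-cut rule of Example 6 amounts to dividing every figure by the index of the new base year and multiplying by 100. A minimal sketch, in which the function name is an assumption of this illustration:

```python
SERIES_1939_BASE = {
    1939: 100, 1940: 110, 1941: 120, 1942: 200,
    1943: 320, 1944: 400, 1945: 410, 1946: 400,
    1947: 380, 1948: 370, 1949: 350, 1950: 360,
    1951: 348, 1952: 340, 1953: 320, 1954: 300,
}


def shift_base(series, new_base_year):
    """Re-express every index number with `new_base_year` taken as 100."""
    base_value = series[new_base_year]
    return {year: value * 100 / base_value for year, value in series.items()}


series_1946_base = shift_base(SERIES_1939_BASE, 1946)
for year, value in series_1946_base.items():
    print(year, value)
```

The same function shifts the series back again, since `shift_base(series_1946_base, 1939)` returns the original figures.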
SPLICING OF INDEX NUMBERS
Need and meaning. Splicing of index numbers is done with a view
to secure continuity for the purposes of comparison. Thus, if an index
number which had a base year of 1914 is continued up to 1939 and
another index number (constructed from the same items) has been
started in the same year (1939) and has a base of 1939, then the second
index number can be spliced to the first one so that the spliced index
number would give us prices on 1914 base. This is very useful for
purposes of continuity in comparison. Splicing is based on the
principle of base shifting. The ratio of the two index numbers in a common
year is calculated and on that basis the indices of other years are easily
obtained.
The following example would illustrate the procedure :-
Example 7. Index number A given below was started in 1914 and
discontinued in 1939, in which year another index number B was started
which continues up to date. From the following data splice index
number B to index number A so that a continuous series of index numbers
from 1914 up to date may be available.
Splicing of index B to index A

Year    Index A    Index B    Index B spliced to A
1914    100
1938    180
1939    200        100        100 × 200/100 = 200
1940               150        150 × 200/100 = 300
1941               160        160 × 200/100 = 320

Thus in the above example the ratio of the two index numbers
in the year of overlap or 1939 is 2 : 1. If the index number B is
multiplied by 2 or 200/100 we shall get the spliced index number. These
spliced indices now would refer to 1914 base and a continuous comparison
of the index number from 1914 onwards would be possible.
It is also possible to splice index number A to index number B so that
comparisons may be possible on the base of 1939. This can be done by
multiplying the index number A by the ratio 100/200. The spliced index number
of 1939 base would be 50 for 1914, 90 for 1938 and 100 for 1939.

Splicing would give accurate results if geometric mean has been
used in the construction of index numbers.
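Splicing in either direction uses only the ratio of the two series in the year of overlap. The sketch below applies this to the figures of Example 7; the function names are assumptions of this illustration.

```python
INDEX_A = {1914: 100, 1938: 180, 1939: 200}   # base 1914, discontinued in 1939
INDEX_B = {1939: 100, 1940: 150, 1941: 160}   # base 1939
OVERLAP_YEAR = 1939


def splice_forward(old, new, overlap_year):
    """Continue the old series with the new one, expressed on the old base."""
    factor = old[overlap_year] / new[overlap_year]   # 200/100 in Example 7
    spliced = dict(old)
    for year, value in new.items():
        if year > overlap_year:
            spliced[year] = value * factor
    return spliced


def splice_backward(old, new, overlap_year):
    """Extend the new series backwards, expressing old figures on the new base."""
    factor = new[overlap_year] / old[overlap_year]   # 100/200 in Example 7
    spliced = {year: value * factor for year, value in old.items() if year < overlap_year}
    spliced.update(new)
    return spliced


on_1914_base = splice_forward(INDEX_A, INDEX_B, OVERLAP_YEAR)
on_1939_base = splice_backward(INDEX_A, INDEX_B, OVERLAP_YEAR)
print(on_1914_base)  # 1940 -> 300.0, 1941 -> 320.0
print(on_1939_base)  # 1914 -> 50.0, 1938 -> 90.0
```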
DEFLATING OF INDEX NUMBERS
Deflating of a series refers to its correction for price changes.
Thus a series of money wages can be corrected for price changes to
find out the level of real wages. Similarly, per capita income series can
be corrected for changes in the cost of living. A series thus corrected
would give per capita income in real terms. The corrected figures
can be converted into index numbers for purposes of comparison. The
following example would illustrate the procedure of deflating a series :-
Example 8. The following table gives the per capita income and
cost of living index number of a particular community. Deflate the
per capita income by taking into account the rise in the cost of living.

Year    Per Capita income    Cost of living Index
                             number Base 1939
1939     65                  100
1940     70                  110
1941     75                  120
1942     80                  130
1943     90                  150
1944    100                  200
1945    120                  250
1946    150                  350
Solution. Deflating per capita income

Year    Per capita    Cost of living    Deflated or real per
        income        index             capita income
        Rs.           Base 1939         Rs.
1939     65           100                65.0
1940     70           110                70 × 100/110 = 63.6
1941     75           120                75 × 100/120 = 62.5
1942     80           130                80 × 100/130 = 61.5
1943     90           150                90 × 100/150 = 60.0
1944    100           200               100 × 100/200 = 50.0
1945    120           250               120 × 100/250 = 48.0
1946    150           350               150 × 100/350 = 42.9
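The deflating rule used in Example 8 is a single division: the money figure is multiplied by 100 and divided by the cost of living index of the same year. A short sketch, with names assumed for this illustration:

```python
# year -> (per capita income in Rs., cost of living index, base 1939)
DATA = {
    1939: (65, 100),
    1940: (70, 110),
    1941: (75, 120),
    1942: (80, 130),
    1943: (90, 150),
    1944: (100, 200),
    1945: (120, 250),
    1946: (150, 350),
}


def deflate(money_value, price_index):
    """Money value corrected for price changes (price index base = 100)."""
    return money_value * 100 / price_index


real_income = {year: round(deflate(income, cost), 1) for year, (income, cost) in DATA.items()}
print(real_income)
```

Although money income more than doubles between 1939 and 1946, the deflated figure falls from Rs. 65.0 to Rs. 42.9, because the cost of living index rises faster than income.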

REVERSIBILITY TESTS

Fisher, who has done valuable researches on the construction of index
numbers, has laid down two tests for a good index number. They are
(1) Time Reversal Test, and
(2) Factor Reversal Test.
Time Reversal Test
In the words of Fisher: "The test is that the formula for calculating
an index number should be such that it will give the same ratio between one point
of comparison and the other no matter which of the two is taken as base." This
means that the index number should work both backwards as well as
forwards. Thus, if the index number of the current year is 400 then the
index number of the base year (based on the current year) should be
25. In other words, the two index numbers thus calculated (without
the figure 100) should be reciprocals of each other. The reciprocal of 4
is .25 and the reciprocal of .25 is 4. The product of these two ratios would
always be equal to one. Thus if $P_{01}$ represents the price change in the
current year and $P_{10}$ the price change of the base year (based on the current
year) the following equation should be satisfied :-
$P_{01} \times P_{10} = 1$
It should be remembered that those indices which use simple geo-
metric mean satisfy this test. Thus, in Example NO.5, the index number
of the year 1950 (by using the geometric mean) is 89 and the inde;x num-
ber of 1939 (based on 1950) is IIZ. Without the figure 100 thes(" ratio!!
are .89 and I. I 2 respectively. If they are multiplied the product would
be 1. If the index o"!lmbe:rs calculated by the us,: of arithmetic average
are multiplied in the above fashion the product would not be I.
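The contrast between the geometric and the arithmetic mean of price relatives can be checked numerically. The sketch below uses three hypothetical prices invented for illustration (they are not the data of Example No. 5), and the function names are likewise assumptions.

```python
from math import prod

def price_relatives(p_base, p_current):
    return [c / b for b, c in zip(p_base, p_current)]

def gm_index(p_base, p_current):
    """Simple geometric mean of price relatives (as a ratio, without the figure 100)."""
    rel = price_relatives(p_base, p_current)
    return prod(rel) ** (1 / len(rel))

def am_index(p_base, p_current):
    """Simple arithmetic mean of price relatives (as a ratio)."""
    rel = price_relatives(p_base, p_current)
    return sum(rel) / len(rel)

# Hypothetical prices of three commodities in two years.
prices_base = [4.0, 10.0, 5.0]
prices_current = [8.0, 5.0, 10.0]

gm_forward = gm_index(prices_base, prices_current)
gm_backward = gm_index(prices_current, prices_base)
am_forward = am_index(prices_base, prices_current)
am_backward = am_index(prices_current, prices_base)

print(gm_forward * gm_backward)  # equals 1 up to floating-point error
print(am_forward * am_backward)  # 1.5 for these figures, so the test fails
```

With these figures the arithmetic mean gives 1.5 forwards and 1.0 backwards, which are clearly not reciprocals of each other.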
Factor Reversal Test
In the words of Fisher: "Just as each formula should permit interchange of two items without giving inconsistent result, so it ought to permit interchanging the price and quantities without giving inconsistent result, i.e., the two results multiplied together should give the true value ratio."
It means that the change in price multiplied by the change in quantity should be equal to the total change in value. Change in value is the result of changes in price and changes in quantity, and as such the product of these changes should represent the total change in value. Thus, if the price of a commodity has doubled during a certain period and if in this period the quantity has trebled, the total value should be six times the former level. In other words, if $p_1$ and $p_0$ represent the prices and $q_1$ and $q_0$ the quantities in the current and the base years respectively, and if $P_{01}$ represents the change in price in the current year and $Q_{01}$ the change in quantity in the current year, then

$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

As we shall see later on, the only formula which satisfies the factor reversal test is the "Ideal" formula given by Fisher himself.
Circular Test
Another test applied in index number studies is the circular test. It is a sort of extension of the time reversal test. Suppose an index number is constructed for the year 1955 with the base of 1939 and another index number for 1939 on the base of 1914; then it should be possible for us to get directly an index number for 1955 on the base of 1914. If the index number calculated directly does not give an inconsistent value, the circular test is said to be satisfied. If $P_{01}$ represents the price change of the current year on the base year, $P_{12}$ the price change of some third year on the current year, and $P_{20}$ the price change of the base year on that third year, then the following equation should be satisfied:

$$P_{01} \times P_{12} \times P_{20} = 1$$

This test is not fulfilled by most of the common formulae used in the construction of index numbers. Even Fisher's ideal formula does not satisfy this test. This test is fulfilled by unweighted or fixed weighted aggregatives or by index numbers which use the simple geometric mean.
THE PROBLEM OF AN IDEAL INDEX NUMBER

We have now discussed most of the important problems associated with the construction of index numbers and have also laid down certain conditions which an ideal index number should satisfy. Thus we have concluded that, in general, an index number should be weighted, that weights should be fluctuating, and that the index numbers should satisfy the reversibility tests. At this stage arises the problem of choosing a formula which would satisfy all or most of the conditions laid down above. It should be kept in mind that no formula is 100% perfect and further that the selection of the formula would depend on the object with which a particular index number is constructed.
Hundreds of formulae can be used for the construction of index numbers. In fact in 1920 Fisher examined as many as 134 different formulae before evolving his own formula which he calls ideal. We shall examine below some of the important and well-known formulae and see how far they satisfy the conditions laid down.
Unweighted average of price relatives
If an unweighted average of price relatives is calculated, the index number so obtained is not a very good one, firstly because it is unweighted and secondly because it does not satisfy any of the reversibility tests unless the simple geometric mean has been used to average the price relatives. If the simple geometric mean has been used the formula would satisfy the time reversal test. The question of satisfying the factor reversal test does not arise here as the index number is unweighted and the factor reversal test relates to weighted index numbers.
Weighted average of price relatives
We have already discussed that price relatives can be weighted in a number of ways and different formulae emerge by the use of different types of weights. We shall examine some of them.
(1) LASPEYRES' FORMULA

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0}$$

In the above formula the weights are the quantities of the base year. The weights are fixed. This formula does not satisfy the time reversal test because when the index number of the base year (based on the current year) is calculated the quantities of the current year would be taken as weights. This will be clear from the following example:
Example 9.

Commodity   Price 1940   Quantity   Price 1950   Quantity   p0q0   p0q1   p1q0   p1q1
             (Rupees)      1940      (Rupees)      1950
                p0          q0          p1          q1
A               10           3          20           4       30     40     60     80
B                5           4          15           3       20     15     60     45
Total                                                        50     55    120    125

$$\text{Index number of 1950} = \frac{\sum p_1 q_0}{\sum p_0 q_0} = \frac{120}{50}$$

In the calculation of the index number of 1940 (on the base of 1950) the prices and quantities of 1950 would respectively become $p_0$ and $q_0$ and the prices of 1940 would become $p_1$.
Thus

$$\text{Index number of 1940} = \frac{55}{125}$$

The two price changes are not reciprocals of each other. This formula, however, is very popular on account of its practical utility. In actual practice it is not possible to have fresh weights each time an index number is constructed and as such fixed weights have to be used.
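The two Laspeyres indices of Example 9 can be recomputed exactly with Python's `fractions` module. The dictionary layout and the helper names `aggregate` and `laspeyres` are assumptions chosen for this sketch.

```python
from fractions import Fraction

# Data of Example 9 (1940 taken as year 0, 1950 as year 1).
p0 = {"A": 10, "B": 5}
q0 = {"A": 3, "B": 4}
p1 = {"A": 20, "B": 15}
q1 = {"A": 4, "B": 3}

def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(prices[c] * quantities[c] for c in prices)

def laspeyres(p_base, q_base, p_current):
    """Aggregate of current prices over base prices, both at base-year quantities."""
    return Fraction(aggregate(p_current, q_base), aggregate(p_base, q_base))

index_1950 = laspeyres(p0, q0, p1)  # 120/50
index_1940 = laspeyres(p1, q1, p0)  # 55/125, weights are now the 1950 quantities

print(index_1950, index_1940, index_1950 * index_1940)
```

The product of the two ratios is 132/125 rather than 1, which is the failure of the time reversal test described above.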
(2) PAASCHE'S FORMULA

$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1}$$

This formula differs from the previous one inasmuch as the weights used here are the quantities of the current year and as such the weights are not fixed. This index number is also not reversible. From the data given in Example 9

$$\text{Index number of 1950} = \frac{\sum p_1 q_1}{\sum p_0 q_1} = \frac{125}{55}$$

If the year 1950 is now taken as base and the index number of 1940 is calculated, the prices and quantities of 1950 would become respectively $p_0$ and $q_0$ and those of 1940 $p_1$ and $q_1$ respectively.
Thus

$$\text{Index number of 1940} = \frac{50}{120}$$

Again we find that the time reversal test is not satisfied as the two indices are not reciprocals of each other.
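A matching sketch for Paasche's formula, again on the data of Example 9. The tuple layout and the function name `paasche` are assumptions made for illustration.

```python
from fractions import Fraction

# Example 9, commodities A and B, as (price, quantity) pairs per year.
year_1940 = {"A": (10, 3), "B": (5, 4)}
year_1950 = {"A": (20, 4), "B": (15, 3)}

def paasche(base, current):
    """Current-year quantities weight both the current and the base prices."""
    numerator = sum(p * q for p, q in current.values())
    denominator = sum(base[c][0] * current[c][1] for c in current)
    return Fraction(numerator, denominator)

index_1950 = paasche(year_1940, year_1950)  # 125/55
index_1940 = paasche(year_1950, year_1940)  # 50/120

print(index_1950, index_1940, index_1950 * index_1940)
```

Here the product is 125/132, again different from 1.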
(3) DROBISCH AND BOWLEY'S FORMULA

This formula is the arithmetic cross of Laspeyres' and Paasche's formulae. In other words

$$P_{01} = \frac{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} + \dfrac{\sum p_1 q_1}{\sum p_0 q_1}}{2}$$

From the data given in Example 9

$$\text{Index number of 1950} = \frac{\dfrac{120}{50} + \dfrac{125}{55}}{2} = \frac{257}{110}$$

$$\text{Index number of 1940} = \frac{\dfrac{55}{125} + \dfrac{50}{120}}{2} = \frac{257}{600}$$

Here again the time reversal test is not satisfied.
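The arithmetic cross can be verified in the same way. In this sketch the lists hold commodities A and B of Example 9 in order; the function names are assumptions.

```python
from fractions import Fraction

# Example 9: prices and quantities for commodities A and B.
p_1940, q_1940 = [10, 5], [3, 4]
p_1950, q_1950 = [20, 15], [4, 3]

def agg(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))

def drobisch_bowley(p_base, q_base, p_cur, q_cur):
    """Arithmetic mean of the Laspeyres and Paasche ratios."""
    laspeyres = Fraction(agg(p_cur, q_base), agg(p_base, q_base))
    paasche = Fraction(agg(p_cur, q_cur), agg(p_base, q_cur))
    return (laspeyres + paasche) / 2

index_1950 = drobisch_bowley(p_1940, q_1940, p_1950, q_1950)
index_1940 = drobisch_bowley(p_1950, q_1950, p_1940, q_1940)

print(index_1950, index_1940, index_1950 * index_1940)
```

The product, 66049/66000, is close to 1 but not equal to it, so the formula fails the time reversal test only narrowly on these figures.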
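(4) FISHER'S IDEAL FORMULA
The formula given by Fisher is a geometric cross of Laspeyres' and Paasche's formulae.
In other words

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$

This formula satisfies both the time reversal test and the factor reversal test. From the data given in Example 9

$$\text{Index number of 1950} = \sqrt{\frac{120}{50} \times \frac{125}{55}}$$

And if 1950 is taken as base, in which case the prices and quantities of 1950 would respectively be $p_0$ and $q_0$, and the index of 1940 is calculated (the price and quantity of 1940 would now be $p_1$ and $q_1$ respectively), it would be

$$\text{Index number of 1940} = \sqrt{\frac{55}{125} \times \frac{50}{120}}$$

The two index numbers are reciprocals of each other and the time reversal test is satisfied.
Symbolically

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$

and

$$P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}}$$

Thus,

$$P_{01} \times P_{10} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = 1$$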
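A numerical check of the time reversal property of the ideal formula on the data of Example 9 might look as follows. The function name `fisher_ideal` is an assumption, and floating-point square roots are used, so the product equals 1 only up to rounding error.

```python
from math import sqrt

# Example 9: commodities A and B.
p_1940, q_1940 = [10, 5], [3, 4]
p_1950, q_1950 = [20, 15], [4, 3]

def agg(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))

def fisher_ideal(p_base, q_base, p_cur, q_cur):
    """Geometric mean of the Laspeyres and Paasche ratios."""
    laspeyres = agg(p_cur, q_base) / agg(p_base, q_base)
    paasche = agg(p_cur, q_cur) / agg(p_base, q_cur)
    return sqrt(laspeyres * paasche)

forward = fisher_ideal(p_1940, q_1940, p_1950, q_1950)   # sqrt(120/50 x 125/55)
backward = fisher_ideal(p_1950, q_1950, p_1940, q_1940)  # sqrt(55/125 x 50/120)

print(forward, backward, forward * backward)
```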

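This formula satisfies the factor reversal test also. In the factor reversal test, besides the index of price change, the index of quantity change has also to be calculated. In calculating the quantity index the weights are the prices; in other words, the positions of p and q are interchanged. Thus, the quantity index of the current year, or

$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} = \sqrt{\frac{\sum p_0 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_1 q_0}}$$

For the factor reversal test

$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

In Fisher's Ideal formula

$$P_{01} \times Q_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_0 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_1 q_0}} = \sqrt{\frac{\sum p_1 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_0}} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

Thus the factor reversal test is satisfied by the formula.

In Example 9 above

$$P_{01} = \sqrt{\frac{120}{50} \times \frac{125}{55}} \quad \text{and} \quad Q_{01} = \sqrt{\frac{55}{50} \times \frac{125}{120}}$$

Thus

$$P_{01} \times Q_{01} = \sqrt{\frac{120}{50} \times \frac{125}{55} \times \frac{55}{50} \times \frac{125}{120}} = \frac{125}{50}$$

The value of $\dfrac{\sum p_1 q_1}{\sum p_0 q_0}$ is also equal to $\dfrac{125}{50}$. Thus the factor reversal test is satisfied.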
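The factor reversal property can be checked by calling the same function twice with the roles of prices and quantities interchanged. This is a sketch on the data of Example 9; the name `ideal_index` and the argument order are assumptions.

```python
from math import sqrt

# Example 9: commodities A and B.
p0, q0 = [10, 5], [3, 4]   # 1940
p1, q1 = [20, 15], [4, 3]  # 1950

def agg(x, y):
    return sum(a * b for a, b in zip(x, y))

def ideal_index(x0, w0, x1, w1):
    """Fisher's ideal index of x, using w0 and w1 as the two sets of weights."""
    return sqrt(agg(x1, w0) / agg(x0, w0) * agg(x1, w1) / agg(x0, w1))

price_index = ideal_index(p0, q0, p1, q1)     # weights are quantities
quantity_index = ideal_index(q0, p0, q1, p1)  # weights are prices
value_ratio = agg(p1, q1) / agg(p0, q0)       # 125/50

print(price_index * quantity_index, value_ratio)
```

Both printed figures agree with the value ratio 125/50 = 2.5, up to floating-point error.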
This formula does not satisfy the circular test. Though Fisher has called it an ideal formula, its practical utility is not great, because fresh weights have to be used each time it is calculated and generally it is very difficult to have correct information about weights each time the index number is prepared. Moreover, the computation of an index number by this formula is laborious.

(5) FISHER'S ALTERNATE FORMULAE

(a) Due to difficulties of calculation in the ideal index number, Fisher has proposed another formula which gives very quick results. This formula has been supported by Edgeworth and Marshall also. It is as follows:

$$P_{01} = \frac{\sum (q_0 + q_1)\, p_1}{\sum (q_0 + q_1)\, p_0}$$

As is evident from the above formula, the prices of the current and base years are weighted by the total of the quantities of the current and base years. This index is based on arithmetically crossed weighted aggregatives. This index number satisfies the time reversal test as the weights remain constant in it, irrespective of the year which is taken as the base.
Thus for the data given in Example 9

$$\text{Index number of 1950} = \frac{\sum (q_0 + q_1)\, p_1}{\sum (q_0 + q_1)\, p_0} = \frac{245}{105}$$

and

$$\text{Index number of 1940} = \frac{105}{245}$$
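The arithmetically crossed weights can be sketched as below, using the data of Example 9. The function name `crossed_weight_index` is an assumption for this illustration.

```python
from fractions import Fraction

# Example 9: commodities A and B.
p_1940, q_1940 = [10, 5], [3, 4]
p_1950, q_1950 = [20, 15], [4, 3]

def crossed_weight_index(p_base, q_base, p_cur, q_cur):
    """Prices of both years weighted by the sum of the two years' quantities."""
    weights = [a + b for a, b in zip(q_base, q_cur)]
    numerator = sum(w * p for w, p in zip(weights, p_cur))
    denominator = sum(w * p for w, p in zip(weights, p_base))
    return Fraction(numerator, denominator)

index_1950 = crossed_weight_index(p_1940, q_1940, p_1950, q_1950)  # 245/105
index_1940 = crossed_weight_index(p_1950, q_1950, p_1940, q_1940)  # 105/245

print(index_1950, index_1940, index_1950 * index_1940)
```

Because the weights are the same whichever year is the base, the product of the two indices is exactly 1.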
The two indices are reversible, but in the computation of this index number also the current year's weights are needed. It has the same defect which Fisher's ideal index has.
(b) A variation is possible in the above-mentioned formula if instead of arithmetically crossed weighted aggregatives, geometrically crossed weighted aggregatives are taken into account. If it is done the formula becomes more scientific. Walsh considers this formula as the best, from the theoretical point of view.
Thus

$$P_{01} = \frac{\sum \sqrt{q_0 (q_1 p_1)}}{\sum \sqrt{q_0 (q_1 p_0)}}$$
The data given in Example 9 would be used in the following manner to construct index numbers in accordance with this formula.

Commodity   Price 1940   Quantity 1940   Price 1950   Quantity 1950   q0q1p1   √(q0q1p1)   q0q1p0   √(q0q1p0)
               (p0)          (q0)           (p1)          (q1)
A               10             3             20             4           240       15.5       120       10.9
B                5             4             15             3           180       13.4        60        7.7
Total                                                                             28.9                 18.6

$$\text{Index number of 1950} = \frac{28.9}{18.6} \quad \text{and} \quad \text{Index number of 1940} = \frac{18.6}{28.9}$$

This index number also satisfies the time reversal test, but here also the difficulty is the same as in the previous two formulae, i.e., it also requires current weights, which are very difficult to obtain.
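The computation in the table can be reproduced with the sketch below, which follows the formula as it is printed here, with the price inside the square root. That reading, and the function name, are assumptions; the form usually attributed to Walsh in the wider literature places the price outside the root, which is a remark from general knowledge rather than from this text. The table's figures are rounded, so the unrounded sums differ slightly in the last digit.

```python
from math import sqrt

# Example 9: commodities A and B.
p_1940, q_1940 = [10, 5], [3, 4]
p_1950, q_1950 = [20, 15], [4, 3]

def root_aggregate(prices, q_a, q_b):
    """Sum over commodities of sqrt(q_a * q_b * price), as in the table above."""
    return sum(sqrt(a * b * p) for p, a, b in zip(prices, q_a, q_b))

numerator = root_aggregate(p_1950, q_1940, q_1950)    # sqrt(240) + sqrt(180)
denominator = root_aggregate(p_1940, q_1940, q_1950)  # sqrt(120) + sqrt(60)

index_1950 = numerator / denominator
index_1940 = denominator / numerator

print(round(numerator, 1), round(denominator, 1), index_1950 * index_1940)
```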
(6) KELLY'S FORMULA

Truman L. Kelly in his book "Statistical Methods" has tried to estimate the probable errors of important types of formulae used in the construction of index numbers. He has on this basis graded different formulae in common use. According to him the two best index numbers which give most reliable results are those which use the weighted geometric mean and the weighted median in their computation. Fisher's ideal index number gets a place lower than these. Next in excellence are the index numbers using weighted arithmetic and harmonic averages. Last in the rank are those index numbers which use the unweighted arithmetic average of price relatives.
However, for practical purposes Kelly thinks that a ratio of aggregates with selected weights (not necessarily of the base year or the current year) gives the best index number.
Thus

$$P_{01} = \frac{\sum p_1 q}{\sum p_0 q}$$

In the above index q refers to the quantities of the year which is selected as the base for the weights. It may be any year, not necessarily the base year or the current year. This index number uses the fixed weight aggregative formula. It is the most popular index number formula throughout the world. It satisfies the time reversal test, as the weights in this case are fixed. They do not refer to either the base year or the current year. It satisfies the circular test also because the weights are constant. The circular test, as we have seen already, is satisfied either by (a) unweighted aggregatives $\left(\dfrac{\sum p_1}{\sum p_0}\right)$, or (b) by constant weighted aggregatives $\left(\dfrac{\sum p_1 q}{\sum p_0 q}\right)$, or (c) by the simple geometric average of relatives.
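That a fixed weight aggregative passes the circular test can be illustrated with a small sketch. The three years of prices and the fixed quantities below are hypothetical figures invented for the purpose, and the function name is an assumption.

```python
from fractions import Fraction

# Hypothetical prices of three commodities in three years, and fixed weights q.
prices = {
    0: [10, 5, 8],
    1: [20, 15, 6],
    2: [12, 9, 10],
}
fixed_q = [3, 4, 2]

def fixed_weight_index(base_year, given_year):
    """Ratio of aggregates, both valued at the same fixed quantities."""
    numerator = sum(p * q for p, q in zip(prices[given_year], fixed_q))
    denominator = sum(p * q for p, q in zip(prices[base_year], fixed_q))
    return Fraction(numerator, denominator)

p01 = fixed_weight_index(0, 1)
p12 = fixed_weight_index(1, 2)
p20 = fixed_weight_index(2, 0)

print(p01 * p12 * p20)  # the aggregates cancel round the circle
```

Each aggregate appears once as a numerator and once as a denominator, so the product round the circle is exactly 1 whatever prices are used.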
The only thing that goes against this index number is that it does not
take into account the weights either of base year or of current year. But
if weights have been chosen appropriately this criticism has hardly any
force. There is no sanctity about the weights either of the base year or
the current year. All that is necessary is that weights should be appropriate and should indicate the relative importance of the various commodities. However, if weights are revised after short intervals or if the base year is changed frequently even this lame criticism can be met.
The above discussion should not lead to the conclusion that any one index number is best for all purposes. The problems in which the technique of index numbers is used are so many and so diverse that no single formula can be advocated for universal application.
Uses and Limitations of Index Numbers
From the discussion so far it is obvious that the technique of index numbers is used to study all such problems as are capable of quantitative expression and which change with change in time. It is not necessary that the changes in the level of such phenomena should be regular. Further, it should also be clearly understood that index numbers measure relative changes and not absolute ones. Since the conditions laid down above are found in many phenomena, we find index numbers of various types in common use. Index numbers of price, cost of living, industrial production, business conditions, exports, imports, profits, etc. are only a few important types of index numbers. All these indices are very useful in their own fields. Wholesale price index numbers tell us about the changes in the general price level in a country, and throw light on the value of money. A study of the purchasing power of money is very important from various points of view. If it changes very frequently, the economic set-up of the country cannot remain stable, and that is why attempts are always made to keep it within limits. But before any steps can be taken in this direction, it becomes necessary to measure the purchasing power of money, which is not possible without the help of index numbers. Index numbers also help in a study of the comparative purchasing power of money in different countries of the world and of stability in their price levels. Cost of living index numbers tell us about the changes in the cost of living of different groups of people in a society. With such index numbers it is possible to study the changes in the level of real wages of the labourers. Indices of industrial production help in measuring the industrial progress made by a particular country. Similarly, index numbers of business activity throw light on the economic progress that various countries have made. Thus index numbers of various types are very useful in their own fields. They not only measure the relative changes in the level of the phenomenon which they measure but also help in forecasting its future trends.
But it should always be kept in mind that index numbers have their own limitations. They are only approximate indicators of the relative level of a phenomenon. There can be errors not only in the collection of data but also in the selection of the base, selection of representative commodities and selection of appropriate weights. Each step in the construction of index numbers is full of possibilities of errors of all types, but despite these dangers, it can safely be said that if an index number is not deliberately distorted it will show correctly at least the trend of the phenomenon which it is measuring. But indices constructed for one purpose should not be used in other places where they may not be fully appropriate and may give fallacious conclusions.

Questions
1. "It is really questionable, though bordering on heresy to put the question, whether we would be any worse off if the whole bag of tricks relating to index numbers were scrapped." Commenting on the above statement, discuss the utility of index numbers in modern times.
2. "The pernicious nature of tying wages to cost of living indexes is apparent. The whole scheme is positively Machiavellian in its acceptance of deception as a necessity in politics. And does it really work so well after all? The truth is that it is too inefficient even to keep the workers standardized." Comment.
3. "Index numbers are economic barometers." Explain this statement and mention what precautions should be taken in making use of any published index numbers. (B. Com., Allahabad, 19'z).
4. On what basis should commodities be selected for purposes of constructing a wholesale price index number?
5. Describe with illustrations the construction of a weighted index number of wholesale prices and show its importance. (B. Com., Nagpur, I9U).
6. Distinguish between fixed base and chain base index numbers. What are their respective merits and demerits?

7. Examine the claims of (a) geometric average and (b) chain base method in the technique of index number construction. Illustrate your answer with examples. (B. Com., Delhi, 19H).
8. Show, giving suitable examples, the importance of the use of index numbers in interpreting economic effects. (B. Com., Allahabad, 1946).
9. What points would you take into consideration in choosing the base and determining the weights in the preparation of cost of living index numbers? (B. Com., Agra, 1943).
10. What are the main sources of errors in cost of living index numbers? How can these errors be avoided? (B. Com., Allahabad, 1938).
11. Write a note on the construction of an index number of industrial production.
12. Explain Fisher's ideal method of constructing index numbers. Do you think that this method can be adopted in the construction of index numbers in this country? If not, why?
13. Discuss time reversal and factor reversal tests. Do you think it is necessary that a good index number should satisfy them? If so, why?
14. Define an index number. Explain the role of weights in the construction of an index of a general price level. (M. A., Rajputana, 1950).
15. "The real problem for the maker of index numbers is whether he shall leave weighting to chance or seek to rationalise it." (Mitchell)
Distinguish clearly between chance weighting and rational weighting and suggest a solution of the above problem. Also discuss whether Fisher's ideal formula offers a rational system of weighting. (M. Com., Allahabad, 19' I).
16. Discuss the problem of obtaining a perfect formula for the index number of prices. Explain fully what is meant by reversibility of an index number. (M. Com., Allahabad, 1940).
17. "... the method of index numbers is at once applicable to the disentanglement of that which is common to the whole group from those variations which are special to individual items." Elucidate. (M. Com., Rajputana, 1942).
18. "Index numbers seek to set aside the irregularity of individual instances and replace it by the regularity of big numbers." Comment. (M. A., Punjab, 1953).
19. Describe briefly the method you will adopt for the compilation of cost of living index numbers for working classes in an industrial area. (B. Com. Hons., Andhra, 1944).
20. "The discussion of the proper weights to be used has occupied a space in statistical literature out of all proportion to its significance. For it may be said that no great importance need be attached to the special choice of weights; one of the most convenient facts of statistical theory is that, given certain conditions, the same result is obtained with sufficient closeness whatever logical system of weights is applied." (Bowley). Discuss the above statement. (M. Com., Allahabad, 1948).
21. Write short notes on
(a) Base shifting, (b) Splicing of index numbers, (c) Deflating of index numbers.
22. You are required to construct a cost of living index for the textile workers of a city. What information will you collect for the purpose? Explain the method of constructing the index. (I. A. S., 1943).
23. Compute the index numbers for each year from the following average annual wholesale price of jute in Calcutta in rupees per bale of 400 lbs. for the period 1914 to 1930:
Year Rupeef! Year Rupees
I l4 78 192% 88
191} H 192.3 78
1916 67 192.4 ;'6
1917 56 192.5 Jl2.
1918 72 1926 99
[9!9 '<)2 1927 76
1920 9R 192.8 7}
192.1 94 192.9 71
193 0 5"
24. Construct appropriate index numbers to discuss the fluctuations in the export of raw cotton and raw jute from India for the period 1930-31 to 1935-36, using the average of the period 1926-30 as base.
Raw Cotton Qty. Cotton value Raw jute Quantity jute Value
Year (1000 tons) (lakhs of Rs.) (1000 tons) (Lakhs of Rs.)
1926 -3 0 609 5,941 826 2,9 24
(aver-age)
193 0 -3 1 701 4.633 620 1.288
[93 1-32. 42.3 2,;45 5 87 t,119
[93 2 -33 36 5 2,°37 J 63 973
19H-H 104 z,7H 74 8 1,093
[9H-35 6.13 3.495 752. [,087
1935-3 6 607 ,3,3 77 77 1 1,31 1
(I. C. S., 1939).
25. Use the following data of industrial production in India to compare the annual fluctuations in Indian industrial activity by the chain base method:
Index Numbers of Industrial Production in India
Year Index No. Index No.
12.Q 149
-2.1 rz2. 15 6
n6 -29 (37
I~O .-3 0 16.2
1%0 - 31 149
-3 z 160
--33 160
(M. Com., Lucknow, 19H).
26. The following table gives the average wholesale prices of the commodities A, B and C:

                      Average wholesale prices in Rupees
Commodity    1944    1945    1946    1947    1948    1949    1950    1951
1. A         50.6    61.6    66.8    71.0    70.6    72.0    72.0    7~.6
2. B          6.8     6.4    J.6      6.2     6.4     7.8     6.0     6.8
3. C         29.6    25.8    26.4    28.6    28.6    30.2    d.o     34.6

Find out the index numbers (i) by reference to 1944 as base year, (ii) by the chain base method.
27. You are given the following series of index numbers of wholesale prices of four commodities and a straight index based on the average. Calculate a new index for the five years based on the chain method.

.
Year A B C D Total Average

1946 308 47 6 :u.o 31 4 1,3 18 511


1947 u6 5U 3%8 %4 8 1,3 0 4 316
1948 348 444 40 0 4 16 1,608 402

I
1949 33 0 616 3 84 371 1,671 41 8
19,0 171 660 35.:& %40 1,424 35 6
19" 176 1 6~6 35" %40 1,408 ~S2

28. Which average would you use in computing the price index number from the following data for 1934 on the basis of 1930? Give reasons.

Commodity Unit Price in t9~0 Price in 1934


Rs. P. Rs. p.
t. Rice per 40 kga 4_ n 7 1%
z. Wheat ., 6%
5· Linseed
4· Gur
.... 3
6 ,0
4
4
6
H
87
1S
,. Cotton ..
6
17
:1:1
:IS 1:1 '94
6. Tobacco .. .. 15 00 II a:l
29. The following table shows the index numbers of wholesale prices of certain commodities in 1927 and 1937 (July, 1914, being taken as 100). Discuss critically how you would compare the average ratio of prices in 1937 to those in 1927, commenting on the relative advantages and disadvantages of alternative methods which may be used for this purpose.

Commodity                 Index Number of Prices
                            1927        1937
Jute (raw)                   93          ,6
Jute manufactures           146          67
Cotton (raw)                167          89
Cotton manufactures         159         117
Wool and silk               126         126
30. From the following data prepare a weighted index number of the food group for 1949 with 1939 as the base period.
Items in the food group Weights Price per leer in Price per leer In
1939 1949
Ra. as. p. Rs. as. p.
1. W'he:tf 40 0 5 0 7 6
z. Rice 20 0 % 0 0 10 0
3· Gram IS d I 0 0 S 6
,.
4. Arhar dal
Milk
6. MUltard Oil
6
10
5 0
()
z
%
3
6
0
0
0
0
9
10
8
0
0
0
0 S
Sugar 14 0
t Salt
3
I
0
0
4
I
0
0
0
0 ~ 0

100
(M. Com., Lucknow, 1950).

                Prices                   Quantity
Crops    Base year    1947       Base year    1947
1           12         20           50         120
2           10         12          100          70
3           14         15           60          70
4           16         18           30          50
5           18         20           40          40
6           :a         15           70          60
7           20         16           90         100
8           !l.-_      18           80          80
Find the index numbers for 1947 by (1) Base year weighting, (2) Current year weighting, (3) Fisher's Ideal Formula.
32. Given the following data, what index numbers would you use for purposes of comparison? Give reasons.
Rice Wheat Jowar
Year Price Quantity Price Quantity Price Quantity
19 2 7 9.3 100 6.4 It S·l 1
1934 4·5 90 3.7 40 2·7 3
Prices and quantities are given in arbitrary units.
(M. A., Cal., 1947).
33. Explain what is meant by Factor Reversal Test. Construct with the help of the data given below Fisher's Ideal Index, and show how it satisfies the factor reversal test.
Estimated total produce Harvest price per
in thous:md tons in mannM in fHRt.,.;,.....·
- -·dts"Trict Sarau -- Sa~-----

193 1 -3 2 193 2-33


Rs. P.
P. Re.
Winter Rice 17 26 3 50 3 I2
Barley 107 8~ 2 00 I 87
Maize 62 45 2 56 I 75
34. Prove using the following data that the Factor Reversal Test is satisfied by Fisher's Ideal Formula for Index No.
             Base year   Base year   Current year   Current year
Commodity      Price     Quantity       Price         Quantity
A 6 50 10 56
B 2 100 2 ud
C .4 60 6 60
D 10 12 24
P. 8 I2 26
(M. Com., Allahabad, 1946).
35. Construct with the help of the data given below Fisher's Ideal Index, and show how it satisfies the Time Reversal Test.
Commodity Base year Base year' Current year Current year
price quantity price quantity
Unit Rs. Rs.
Wheat p.rod. G 20 60
Ghee p. seer 2 6 10
Firewood p.md. I 20 25
Su,gar p.tive
seers '- 10 8
Cloth p. yd. I 40 30

36. The following figures show the imports of cotton piece-goods into India from Great Britain during 1913-14 and a few post-war years. Find (a) index numbers of quantity, (b) index numbers of value, and (c) index numbers of price, using the figures of 1913-14 as base.

Quanlil.J (million :l'IrdJ)


Year Grey White Coloured Grey White Coloured
(unbleached) (bleached) (printed (unblea- (bleached) (printed
or dyed) cbed) or dyed)
1913-14 15H 793 832 17.0 9·5 1I·9
1929-,0 926 474 48, 15.7 10.0 II·4
1930-31 365 272 246 5·2 4.7 5. 1
1931-32 .249 280 22, 2.9 4.0 ,.8
Ignore the effects of exchange variation between the two countries.
(I. C. S., 1938).
37. The following are the group index numbers and the group weights of an average working class family's budget. Construct the cost of living index number by assigning the given weights.
Group Index No. Weights
Food 48
Fuel and Lighting 220 10
Clothing %,0 8
Rent 10 12.
Miscellaneous 190 IS
(I. A. S., 1950).
38. Calculate a cost of living index from the following indices, the weights being Food 60, Fuel and Light 8, Clothing 12, Rent 16, and Miscellaneous 4. Total 100.

          (i)      (ii)     (iii)        (iv)             (v)
Year      Food     Rent     Clothing     Fuel and light   Miscellaneous
1924      100      100      100          100              100
1925      102      100      103          100               97
1926      106      102      105          101               98
1927      104      103      106          102               99
1928      107      105      105          101              102
39. Construct the cost of living index number for 1940 on the basis of 1939 from the following data using the Aggregate Expenditure Method.
Quantity
Article consumed in Unit Ptke in Price in
1939 1939 1940
Rs. P. Rs. p.
Rice
Wheat
.Gram
6
6
I
..
mds. maund

...
I,
5
S
6
75
00
00
8
9
6 00
00
00
Arhat pulse 6 " 8 00 IC 00
Ghee 4 "
seers seet 2 no 1 50
Sugar 1 md. maund 1.0 00 15 00
Salt
Oil
Oothing
I2 seers
20
50 yds.
. seer
,.
ya.rd
20
4
0
50
00
50
18
0
a
00
75
75
Firewood I2 mds. maund 0 50 1 Ill:
Kerosene I tin °tin 4 00 3 12
House-reot bouse 10 75 12- 75
40. Construct the cost of living Index Number for 1950 on the basis of 1946 from the following data using the Family Budget Method.

A.rticle Quantity Unit Price in Price in


consumed in 1946 195 0
1946
Rs. as.
Rice 5 mds. per md 12 16
Bajrn 5 mds. 8 10
Wheat
Gram
Arh9c
1 md.
I md.
S mds.
.. "" 10
6
8
zo
12
IZ
Other pulses l.mds. ,.per"seer 6 8
Ghee 4 seers. 2·5 4

.... .."
Gur 2mds. permd. S 10
Salt u.5 seers. S JO
Oil 24 seers 40 50
Clothing 40 yd". per yard ·5 I
Firewood 10 mds. per md. 1.6
I
Kerosene I tin per tin 74
House·rent per house 24

41. An enquiry into the budgets of the middle class families in a city in England gave the following information:

Expenses on       Food     Rent     Clothing     Fuel      Misc.
                  35%      15%      20%          10%       20%
Prices (1928)     £150     £30      £75          £25       £40
Prices (1929)     £145     £30      £65          {,z ..    [.4'

What changes in the cost of living figures of 1929 as compared with those of 1928 are seen? (B. Com., Lucknow, 1944).
42. An average family of industrial workers in a certain town consumed during August 1939, 1.5 maunds of foodgrain, 10 yds. of cloth, 2 maunds of fuel, and 1 tin of kerosene oil and paid Rs. 15 as house rent. Foodgrain then sold at an average price of Rs. 6 per maund, cloth at 8 as. per yard, and fuel at Rs. 2.4 per md., while a tin of kerosene sold at Rs. ,. By August 1943, the average prices of foodgrains and cloth had risen to three times and :r.j. times the pre-war average, respectively, fuel rose to Rs. 5 per maund and house rent to Rs. 20. The solitary exception was kerosene, whose price fell by 8 annas per tin.
Express in quantitative terms the rise that took place in the cost of living of industrial workers in the given town in August 1943, as compared with August 1939, making clear your method of approach. (M. Com., Allahabad, 1948).
43. Given below are two sets of indices, one with 1939 as base and the other with 1947 as base.

(a) Year    Index Nos.        (b) Year    Index Nos.
    1939       100                1947       100
    1940       120                1948       110
    1941       150                1949        90
    1942       200                1950        98
    1943       300                1951       101
    1944       350                1952       110
    1945       370                1953        98
    1946       380                1954        96
    1947       400

The index number (a) with 1939 base was discontinued in 1947. It is desired to splice the second index number (b) with 1947 base to the first index number for the sake of continuity. How will it be done, so that the combined series has a common base of 1939?
44. You are given a sufficient number of family budgets which show the total expenditure of each family, and the number of children, adult males, and adult females in each family. Explain how you will employ the data to derive the most suitable weights that may be given for the different costs of maintaining (a) one adult male, (b) one adult female, and (c) one child. (I. A. S., 1947).
45. What considerations should enter in the selection of the base period for the computation of a serial index number? When and how will you give effect to a shift in the base originally selected? Explain with illustrations the meanings of the terms: Factor reversal test, Fisher's Ideal Index Number. (I. A. S., 1952).
46. $Q_i$ and $P_i$ respectively denote the quantity purchased and the price per unit of each of n commodities (i = 1, 2, ..., n) in the 'base' year; $q_i$ and $p_i$, the corresponding measures in the current year.
What exactly do the following ratios indicate, and test whether each of them satisfies (a) the time reversal test and (b) the factor reversal test:
(,) ~pil ~Pi ; (#;) '4Qi pi/ ~Qi Pi ;
(;;) "»II pi/ IQi Pi ; (i,) I(Qi+gz) pi/ ~(Qi+!Ii) Pi.

(,)
X
"2qI
»liP;
pi] i
--
In all cases summation extends over all the n values of i, and the ratio is multiplied
by 100. (I.A.S., 1943).
47. Define an 'Index number'.
The average of wholesale prices was higher in 1937 than in 1936 by 15.1 per
cent, the index numbers for the two years being 108.7 and 94.4 respectively (1930 = 100).
This increase followed rises of 6.1, 1.0 and 2.8 per cent, each year being compared with
the preceding. In 1933 prices were the same as in 1932 but 1.1 per cent below 1931.
Prices in 1931 were 13.2 per cent below 1930.
From these data compute the index numbers for each year from 1930 to 1937.
(P.C.S., 1956).
48. Explain the methods by which index numbers of volume and price of national
production are prepared, and discuss to what extent each method is satisfactory.
(I.A.S., 1947).
49. You are required to construct a cost of living index for textile workers of a
city. Indicate what information you would collect for the purpose, and explain the
method of constructing the index.
(I.A.S., 1948).
50. How will you construct an index number of prices that will exhibit with great
sensitiveness movements in the general price level? Examine from this point of
view the (Indian) Economic Adviser's Index Number of Wholesale Prices. (I.A.S., 1949).
51. What are index numbers of prices and for what purposes are they used?
Describe the general method of construction of a wholesale price index number,
illustrating your remarks with the help of any official index in current use in India.
(I.A.S., 1955).
52. What is Fisher's Ideal Formula for preparing Index Numbers? What are
'Time Reversal' and 'Factor Reversal' tests?
Compute an appropriate index number for purposes of comparison from the
following data:

              Rice                 Wheat                Jowar
Year     Price  Quantity      Price  Quantity      Price  Quantity
1935       4       50           3       10           2        5
1945      10       40           8        5           4        4

(Prices and quantities are stated in arbitrary units.) (I.A.S., 1956).

53. One of the important problems during the Second Plan period is to keep a
watch on inflation of prices. For this it is convenient to define a suitable index number
of prices and study its short term changes. As is usual with the construction of index
numbers there are difficulties in (a) choice of a suitable formula, (b) choice of a base
period, (c) periodical collection of statistics, etc.
You are required to prepare a brief report, discussing the various issues involved
and giving your recommendations about building a series of index numbers of prices,
keeping in view the purpose for which this series is intended. Also suggest at what
interval the index number should be computed. You need not write an essay on the
various ways of computing index numbers but only justify the procedure you
recommend.
(I.A.S., 1958).

54. The expenditure of a certain business on materials can be grouped under
three main headings in the ratio of 6 : 5 : 3. If the average prices in these groups
rise by 42%, 35% and 28% respectively, by what percentage is expenditure on
materials increased if the same amount as before is purchased?
55. On a certain date, the Ministry of Labour Retail Price Index was 204.6.
Percentage increases in prices were: Rent and Rates 65, Clothing 220, Fuel and Light
110, and Miscellaneous 125. What was the percentage increase in the food group?
56. In a working class budget enquiry in towns A and B it was found in 1931
that an average working class family's expenditure on "food" and "other items" was
as follows:
                   Town A    Town B
"Food"              64%       50%
"Other items"       36%       50%
In 1947 the working class Cost of Living Index stood at 279 for Town A and
165 for Town B (Base year 1929 = 100). It was known that the rise in the prices of
all articles consumed by the working classes was the same for A and B. What was
the 1947 Index for (a) Food and (b) Other items?
57. Show that Fisher's Ideal Index Number fails to satisfy the circular test but
satisfies factor reversal and time reversal tests.
58. In 1950 a Statistical Bureau started an Index of Production based on 1944,
with the following results: Index Nos. 1944 = 100, 1950 = 120 and 1959 = 200. In
1960 the Bureau reconstructed the Index on a new plan with 1959 as base, i.e., 1959
= 100, and found the index of 1965 as 150. In 1966, the Bureau again reconstructed the
Index on yet another plan with base 1965, i.e., Index No. of 1965 = 100, and calculated
the Index No. of 1966 as 110. Splice these three series together so as to give a continuous
series with base 1965 = 100. Draw up a working table in parallel columns.
13. Diagrammatic Representation of Data
Need. It has already been discussed earlier that one of the most
important functions of the science of statistics is to simplify the com-
plexity of quantitative data and to make them easily intelligible. We
have seen how various statistical methods like tabulation, classification,
calculation of derivatives, averages and index numbers etc. reduce the
complexity of statistical data and make it possible to draw conclusions
from them. Tabulation and classification are meant for the systematic
presentation of data, so that they may not appear haphazard, unmanageable
and unintelligible. Measures of central tendency and index numbers help
in comparison of the data by converting them into single figures. All
these statistical methods have their own limitations. One more method
of making the data intelligible is to represent them by means of diagrams
and graphs. The special feature of diagrams and graphs is that they do
away with figures altogether and present dry and uninteresting statistical
facts in the shape of attractive and appealing pictures and charts. Figures
are usually avoided by the common man but pictures, diagrams and graphs,
etc. always attract and impress him. Diagrams and graphs give a bird's
eye-view of the whole mass of statistical data that have been collected
about any problem. Persons seeing diagrams and graphs have not to
bother themselves about figures, they have neither to think nor to do
some quiet mental arithmetic to understand the problem in question;
they have merely to see the picture and conclusions automatically follow.
Diagrammatic and graphic representation of data are methods of visual
aids and are appealing to the eye and the mind.
Usefulness. Diagrams and curves are thus very useful. The most
important advantage of diagrammatic presentation of data is that diagrams
are very attractive and interesting, so far as the common man is concerned.
It is a common fact that people, in general, avoid figures. To them they
are terse, dry and mischievous, but they are always impressed by pictorial
presentation of facts and are very much attracted by neat and good
diagrams and pictures. Even while reading general books, people skip
over figures but always search and see pictures if there are any. It is
a psychological fact and to take advantage of it, quantitative data are
presented in the shape of pictures and diagrams. Further, since people
see pictures carefully, their effect on the mind is more stable. The
impression created by a diagram or a picture is likely to last longer in the
mind than the effect created by a set of figures. Moreover, pictures,
diagrams, etc., are very catching, and this is the reason why modern
advertisements are mostly in the shape of attractive pictures and diagrams.
Figures are rarely used for purposes of advertisement. Modern
advertisement is based on human psychology and advertisers realise
that inherently people have a craving and love for beautiful pictures and
they take advantage of this fact.
Diagrams, besides being attractive, possess another merit and it
is that they simplify complexity. Generally, all figures and particularly
big ones are not easily understood, but if they are represented by diagrams
their importance is at once realised. The reason is that one has to tax
his brain in understanding figures and drawing conclusions from them
but in case of diagrams conclusions automatically follow. If, for
example, it is said the population of India is 36 crores and the land area
about 81 crore acres, in comparison to the total world population of
240 crores and total land area of 3,257 crore acres, the conclusions are
not evident. Even if it is said that the per capita figure of land in India
is 2.25 acres whereas the world average is more than 13.5 acres, the
comparison is not very clear and one will have to think about these figures.
If, however, these data are represented by means of a diagram or picture
it will be more impressive and readily intelligible. If two bars, whose
lengths are in the ratio of 2.25 to 13.5, are drawn and if
below the smaller bar is written "per capita land in India" and below the
bigger one "per capita land in the world", the comparison would become
more effective and the central idea of the data would be easily grasped
by even a layman.
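The arithmetic behind the two bars can be sketched in a few lines of Python. This is only an illustration: the world land area is read here as 3,257 crore acres, which is an assumption about the printed figure, and the 3-inch length of the longer bar is chosen purely for the example.

```python
# Figures in crores of persons and crores of acres.
# The world land area (3257) is an assumed reading of the printed figure.
india_population, india_land = 36, 81
world_population, world_land = 240, 3257


def per_capita(land, population):
    """Land per head in acres."""
    return land / population


india = per_capita(india_land, india_population)
world = per_capita(world_land, world_population)

# Length of the smaller bar when the bigger bar is drawn 3 inches long
# (the 3-inch length is a hypothetical choice, not taken from the text).
LONGEST_BAR = 3.0
india_bar = LONGEST_BAR * india / world

print(f"India: {india:.2f} acres per head")
print(f"World: {world:.2f} acres per head")
print(f"Bar for India: {india_bar:.2f} in. against {LONGEST_BAR:.2f} in. for the world")
```

The ratio of the two bar lengths is the same as the ratio of the two per capita figures, which is all that a one-dimensional diagram conveys.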
Since no efforts are necessary in understanding diagrams they save
time which is otherwise needed in drawing inferences from a set of
figures. They give conclusions at a glance and no mental
efforts are needed for the purpose.
Diagrams are of great help in all types of studies. To an economist
and to a businessman their importance is very great. Difficult econo-
mic theories can be easily understood if proper diagrams and graphs
are used. A businessman has hardly time to devote to the huge amount
of statistics which are available about his line of business, but if diagrams
and pictures are available he can easily draw his own inferences from
them in a very short time. Social reformers, politicians, administrators,
in fact, all types of people greatly benefit by the use of diagrams and
pictures in their respective fields.
Limitations. Due to the above mentioned reasons, the representation
of data by means of diagrams has become very popular nowadays.
It should, however, be remembered that diagrams should be used
preferably in those places where the purpose is to explain the significance
of some statistical fact to the general public because diagrams only give
an approximate picture of the data. For a student of statistics or for an
investigator who is doing an exhaustive analysis of figures, the utility
of diagrams is not much. They are meant mostly to explain and impress
quantitative facts on the general public, as figures are difficult for them
to understand and follow. Further, they should be used to compare
only such data as are technically comparable with each other. They
should either relate to the same phenomenon or to such phenomena as
are capable of measurement in one common unit. A single diagram or

picture has not much significance. It can be interpreted only when
there is another diagram or picture to which it can be compared. It
should also be remembered that diagrams are capable of misuse very
easily. If a wrong type of diagram has been used it can give fallacious
conclusions and one should always safeguard against such types of
diagrams and pictures.
Characteristics of diagrams and rules for drawing them
Diagrams only present data. Before laying down any rule for drawing
of diagrams it is necessary to throw some more light on the special
features of diagrammatic representation of data because directions for
drawing diagrams can be given only in the light of these special features.
It has already been said earlier that diagrams are meant only to give a
pictorial representation of the quantitative data with a view to make them
readily intelligible. Diagrams are not meant for anything beyond it. They
do not prove or disprove a particular fact. They are not suitable for
further analysis of data, which can be done only from figures. If
diagrams are properly drawn they can no doubt give proper emphasis to the
different characteristics of the quantitative data. They are capable of
focussing attention on the main findings of the enquiry in question.
Beyond this, however, they do nothing. As such, while deciding about
the directions for drawing diagrams, this fact should be kept in mind
that the purpose of diagrammatic representation of data is only to give
a pictorial view of the quantitative facts so that they may become more
attractive and readily intelligible.
Data should be homogeneous. Another point to be noted in this
connection is that the person drawing the diagrams should make himself
sure that the data are really capable of diagrammatic representation. All
types of figures cannot be suitably represented by means of diagrams.
Statistical data which are intended to be diagrammatically represented
should be homogeneous and comparable. If we have a set of figures
relating to the national income, the number of houses, and the number of
unmarried females, relating to our country, they cannot be presented in
the shape of a diagram. These figures are entirely unrelated. If, however,
the figures relate to the total population, its sex ratio, and its
marital status, very useful diagrams can be constructed from them as
the figures are now homogeneous and inter-related. A single figure is
also useless for the purpose of diagrammatic representation. A single
figure of Indian National Income in the year 1951 cannot be represented
diagrammatically unless there are other similar figures with which it
can be compared or unless this figure itself is divided in component parts
and their relationships are to be studied.
Diagrams are not substitutes for figures. Another very important
characteristic of diagrams is that though they represent figures yet they
are not substitutes for them. A figure is something constant whereas
the size of a bar or a square or a circle or a picture representing a set of
figures changes with changes in the scale with which they have been
drawn. If the same data are represented on two different scales their
size would differ and sometimes they may create a very misleading
impression on the minds of the public.
Selection of the scale. Thus, a very important thing in connection
with drawing of diagrams is the selection of a proper scale. No hard
and fast rule can be laid down for this purpose. However, the scale
should be such that the diagrams constructed by using it are neither
so big that they cannot be easily understood at one glance nor so small
that they require time for grasping the central theme of the data. In
deciding about the scale, the size of the paper on which a particular dia-
gram is being constructed should be kept in mind. The size of the
diagram should be such that all the important characteristics of the data
are properly emphasised and can be understood merely by looking at
it. If two or more diagrams are being drawn for purposes of compa-
rison the same scale should be used otherwise they would give mislead-
ing conclusions. The vertical scale is generally shown on the left hand
side and the horizontal scale at the bottom of the diagram. The scale
should always be very clearly indicated in the figure, as without it no
diagram is considered complete.
Should be attractive and self-explanatory. A good diagram should
be neat and clean. It should be appealing to the eye and should be
attractive. Diagrams that attract the attention of the common man
are said to be good. As stated earlier diagrams are meant for the
presentation of data and if they suitably present the data in a pictorial
fashion so that the attention is diverted towards them, they have achiev-
ed their objective. Each diagram should have a proper heading and it
should be complete in itself. It should so represent the data that there
is no necessity of reading foot-notes or explanatory memoranda to
understand them. Various points can be emphasised in a diagram by the
use of different colours, thick and thin lines, dots, and crossing, etc.
Selection of diagram. The last and probably the most important
point is the selection of the proper diagram to represent data. All
types of diagrams are not suitable for all types of data. If a mistake
is committed in the selection of the suitable type of diagram,
misleading impressions would be created. An inappropriate diagram would
give a distorted impression about a phenomenon. Extreme care should
be exercised in the selection of a particular type of diagram that would
be suitable to represent a given set of figures.
DIFFERENT TYPES OF DIAGRAMS

The following are the important types of diagrams in common
use:

(1) Dimensional Diagrams

(a) One-dimensional diagrams. They are in the shape of vertical or
horizontal lines or bars. The lengths of the lines or bars are in
proportion to the different figures they represent.

(b) Two-dimensional diagrams. They are in the shape of rectangles,
squares or circles. The areas of squares, rectangles or circles are in
proportion to the size of items which they represent.
(c) Three-dimensional diagrams. They are in the shape of cubes,
blocks, or cylinders. Here the volumes of cubes, blocks or cylinders
are in proportion to given values.
(2) Pictograms
Here the figures are represented by pictures. The size or the number
of pictures is in proportion to the given figures.
(3) Cartograms
Here maps are drawn and the figures representing the phenomena
at various places are shown by signs or symbols.
(4) Graphs and curves
Here the figures are shown in the shape of graphs or curves. They
can be either on natural scale or ratio scale.
The terms "diagram", "chart" and "graph" are very often used
as synonyms. We shall be using the term diagram for the first three
categories mentioned above. For the fourth category we shall use the
term graph. In the present chapter we shall study diagrams only, and
graphs would be studied in the next chapter.
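The difference between the three kinds of dimensional diagrams lies in which measurement is made proportional to the figure: length, area or volume. A rough Python sketch, using two hypothetical figures (100 and 400) that do not come from the text, shows how the linear measurements would be worked out in the two-dimensional and three-dimensional cases:

```python
import math


def square_side(value):
    """Side of a square whose area is proportional to the value."""
    return math.sqrt(value)


def circle_radius(value):
    """Radius of a circle whose area is proportional to the value."""
    return math.sqrt(value / math.pi)


def cube_edge(value):
    """Edge of a cube whose volume is proportional to the value."""
    return value ** (1.0 / 3.0)


# Hypothetical figures: the second is four times the first.
values = [100, 400]
sides = [square_side(v) for v in values]
radii = [circle_radius(v) for v in values]
edges = [cube_edge(v) for v in values]

print("ratio of sides :", sides[1] / sides[0])
print("ratio of radii :", radii[1] / radii[0])
print("ratio of edges :", edges[1] / edges[0])
```

A figure four times as large thus needs a bar four times as long, but a square or circle only twice as wide, and a cube with an edge only about 1.59 times as long.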
ONE-DIMENSIONAL DIAGRAMS

As has been said earlier, in these diagrams only the length of the bars
or lines is taken into account. Since only one dimension of the figure
is taken into account these diagrams are known as one-dimensional
diagrams. The bars which are drawn can be of any width or thickness.
It has no effect on the diagram. However, the thickness should not be
too much as otherwise bars would appear like rectangles and give a
misleading impression. Such diagrams are also known as Bar
Diagrams. Bar diagrams can be of three types:
(i) Simple bar diagrams.
(ii) Multiple bar diagrams.
(iii) Sub-divided bar diagrams.
In simple bar diagrams one bar represents only one figure and as
such there will be as many bars as the number of figures. Such
diagrams represent only one particular type of data. For example, the number
of students in a university year after year can be represented by such bars.
Each bar would represent their number in a particular year. Multiple
bar diagrams are prepared on the basis of simple bar diagrams. These
diagrams represent more than one type of data at a time. Thus, if for
each year two bars are constructed, one representing the number of male
students and the other representing the number of female students, the
diagram would be a multiple bar diagram. Similarly, there can be
diagrams in which 3, 4 or 5 or even more bars are constructed at the same

time side by side. Population data relating to occupational structure,
civil conditions, age, etc. are usually represented by such diagrams. The
third type of bar diagrams are those in which each bar is divided into
certain parts and each part represents a particular phenomenon. Thus,
if the number of students in a university for the last ten years are to be
shown on the basis of their distribution in various faculties, sub-divided
bar diagrams can be prepared. One bar would represent the total
number of students in each year and it will be divided in four or five parts
depending on the number of faculties in the university. The length
of these parts would be in proportion to the number of students in
various faculties. We shall now study in detail these three types of
bar diagrams.
Simple bar diagrams
The scale for the construction of simple bar diagrams should be
adjusted in such a manner that the longest bar may be easily accom-
modated on the paper on which the diagram is being drawn. Sufficient
space should be left on all sides of the diagram for writing heading,
scale, units, etc. Since thickness of bars is not taken into account
they should be so constructed that they look attractive and beautiful.
Very thin or very thick bars look bad and sometimes give misleading
impressions, but there is no hard and fast rule with regard to the width
or thickness of the bars. If the number of items is very large, bars
have to be thin and sometimes have to be replaced by lines. It should
further be remembered that whatever be the width of the bars it should
be uniform for all of them. Bars should further be equi-distant. The
space left between adjoining bars should always be equal. Bars can
be drawn either vertically or horizontally. In the former case one end
of the bars would be on the horizontal base line while in the latter it
would be on the vertical base line. Generally, the base line is taken
horizontally because vertical bars give a better comparison than
horizontal bars. Bar diagrams can be used in case of discrete series, series
of individual observations, spatial and condition series. They cannot
be used in continuous series. Bars are not suitable to represent
long period time series also. Wherever there is continuity in data bar
diagrams should not be used. This is the reason why bars are kept
separate from each other. If they are joined together the diagram would
indicate continuity in data and would also become distorted and ugly.
Special care should be taken to make the bars attractive. This can be
done by colouring them or by preparing designs in them. However,
the colour or designs of all bars representing one type of data should be
identical. Below are given some diagrams of this type:
TABLE I
Number of students in a University during the last eight years (excluding
ex-students and certificate examination students) ending 1965-66.

Year       Number of Students     Year       Number of Students
1958-59    3,737                  1962-63    5,066
1959-60    3,897                  1963-64    5,740
1960-61    4,183                  1964-65    6,199
1961-62    4,447                  1965-66    6,747
If the above data are to be represented by simple bar diagrams
we should first decide about the length of the biggest bar. The highest
figure in the above table is 6,747 and if this is represented by a bar 3"
in length then the lengths of other bars can be calculated accordingly.
The length of the smallest bar representing figure 3,737 (of the year
1958-59) would be (3 × 3,737) / 6,747 inches or 1.66". In this way the length
of other bars can also be calculated. In the diagram given below the
bars are vertical and placed at equal distances. Their thickness is also
identical.
Number of Students in a University
(1958-59 to 1965-66)

Fig. 1
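The calculation of bar lengths for a simple bar diagram of this kind can be sketched as follows. The sketch assumes, as in the text, that the longest bar is to be 3 inches; the variable names are illustrative only.

```python
# Number of students in each year, as given in Table I.
students = {
    "1958-59": 3737, "1959-60": 3897, "1960-61": 4183, "1961-62": 4447,
    "1962-63": 5066, "1963-64": 5740, "1964-65": 6199, "1965-66": 6747,
}

LONGEST_BAR_INCHES = 3.0          # length chosen for the biggest bar
largest = max(students.values())  # the highest figure in the table

# Every bar is scaled in the same proportion as the biggest one.
bar_lengths = {
    year: round(LONGEST_BAR_INCHES * number / largest, 2)
    for year, number in students.items()
}

for year, length in bar_lengths.items():
    print(f'{year}: {length:.2f}"')
```

Because every length is obtained from the same multiplier, the bars remain in proportion to the figures whatever length is chosen for the biggest bar.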
If the number of items is very large, bars have to be replaced by
simple lines. The technique of drawing such diagrams is the same
as in the previous case. The only difference is that the bars will have
no thickness. The data given below in Table II are represented by lines
in figure No. 2:

TABLE II
Heights of 32 students of M.Com. final class

Serial No.     Height               Serial No.     Height
of students    (feet and inches)    of students    (feet and inches)
 1             4' 11"               17             5' 5"
 2             5' 0"                18             5' 5"
 3             5' 0"                19             5' 6"
 4             5' 1"                20             5' 6"
 5             5' 2"                21             5' 7"
 6             5' 2"                22             5' 7"
 7             5' 2"                23             5' 7"
 8             5' 3"                24             5' 7"
 9             5' 3"                25             5' 7"
10             5' 3"                26             5' 8"
11             5' 3"                27             5' 8"
12             5' 4"                28             5' 9"
13             5' 4"                29             5' 10"
14             5' 4"                30             5' 10"
15             5' 5"                31             5' 10"
16             5' 5"                32             6' 0"
Heights of Students
[Line diagram: height in inches (0 to 80) on the vertical scale, serial number of student (1 to 32) on the horizontal scale]
Fig. 2

In both the diagrams given above there is a horizontal base. The base
can be on the vertical line also. In Table III the number of teachers in
the University during ten years ending 1966 has been given. These
figures are represented in diagram No. 3 which has been constructed on
a vertical base.
TABLE III
Number of Teachers in a University
Year       No. of Teachers
1956-57    121
1957-58    126
1958-59    132
1959-60    136
1960-61    141
1961-62    150
1962-63    162
1963-64    170
1964-65    188
1965-66    202
Number of Teachers in a University
(1956-57 to 1965-66)
[Horizontal bar diagram: one bar for each year, horizontal scale from 100 to 210 teachers]
Fig. 3
In all the above examples the figures given were already in
ascending order. If the figures are not in either ascending or descending
order they have to be so arranged for facility of comparison. But this
can be done only if the figures do not relate to a series of years. For
example, it is not possible to first write the figures of 1945 and then of
1948 and then of 1944 with a view to secure an ascending or
descending series. Such arrangement is possible only if the data are not in the
shape of time series.
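The rule about arrangement can be put as a small Python sketch. The function name and the town figures below are hypothetical and serve only to illustrate the rule; the two teacher figures are the first two rows of Table III.

```python
def arrange_for_bars(items, is_time_series):
    """Return (label, value) pairs in the order in which bars should be drawn.

    A time series keeps the order in which it is given (chronological);
    any other series is rearranged in ascending order of value.
    """
    if is_time_series:
        return list(items)
    return sorted(items, key=lambda pair: pair[1])


# Hypothetical figures for three places (not a time series).
towns = [("P", 45), ("Q", 30), ("R", 60)]
print(arrange_for_bars(towns, is_time_series=False))

# A time series is left in its chronological order.
teachers = [("1956-57", 121), ("1957-58", 126)]
print(arrange_for_bars(teachers, is_time_series=True))
```

A descending arrangement could be obtained in the same way by reversing the sorted list; the essential point is only that years are never shuffled.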

Multiple bar diagrams


The technique of simple bar diagrams can be extended to represent
two or more sets of inter-related data in one diagram. Multiple bar
diagrams thus supply information about more than one phenomenon.
In each of the three figures numbered 1, 2 and 3, the diagrams supplied
information about one problem only: in the first case about the
number of students in the University, in the second case about
the heights of 32 students and in the third case about the number of
teachers. Two or more inter-related phenomena can be represented
in the shape of multiple bar diagrams. Table IV below gives the data
relating to the values of imports and exports of Country A during four years
ending 1963-64:

TABLE IV

Values of Exports and Imports of Country A
(1960-61 to 1963-64)

Year       Exports              Imports
           (crores of rupees)   (crores of rupees)
1960-61    610.5                624.65
1961-62    955.39               742.78
1962-63    660.65               518.36
1963-64    565.5                527.98

The data given in the above table can be very suitably represented
by a multiple bar diagram. The figures of exports and imports each
year can be represented by two types of bars placed side by side. Such
diagrams facilitate comparison. The figures given in Table IV above
can be represented by a multiple bar diagram in the following manner:

Exports and Imports of Country A
(In hundred crore rupees)
[Multiple bar diagram: for each year an Exports bar and an Imports bar placed side by side]
Fig. 4

From the diagram given above it is possible to compare the relative
values of exports and imports in different years. It is possible to
have even three or more bars representing interrelated phenomena,
put close to each other for giving a good comparison. Thus, if figures
are available about the number of married, unmarried and widowed
persons per 1000 people aged 20 years or more, for five different places,
we can have three bars for each place. These three bars would represent
the three types of people: married, unmarried and widowed. Similarly
if we have figures of the population of four towns during the last
three censuses the data can be suitably represented by multiple bars.
Table V gives below the population of four towns during the last three
censuses:

TABLE V

Population of four towns in three censuses

Town    Population 1941    Population 1951    Population 1961
A
B
C       30,470             40,500             1,10,300
D       32,600             45,300             1,20,000
The above data can be represented very suitably by multiple bars.
For each one of the four towns three bars would be drawn: the first
representing the population of 1941, the second representing the
population of 1951 and the third representing the population of 1961. These
bars would be placed side by side to facilitate comparison.
Bars can be either on vertical base or horizontal base. In the following
figure they are on a vertical base so that the bars are horizontal:
Population of Towns A, B, C and D in 1941, 1951 and 1961
[Multiple bar diagram on a vertical base: three horizontal bars for each town, horizontal scale from 0 to 120 (in thousands)]
Fig. 5
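The layout of a multiple bar diagram on a vertical base can be imitated in plain text. The sketch below is only an illustration: it uses the figures for towns C and D as read from Table V (the 1961 figure for town C is an uncertain reading), and the scale of one mark per 5,000 persons is an assumption made for the example.

```python
# Population of towns C and D in three censuses, as read from Table V.
# The 1961 figure for town C (110300) is an uncertain reading of the table.
population = {
    "C": {1941: 30470, 1951: 40500, 1961: 110300},
    "D": {1941: 32600, 1951: 45300, 1961: 120000},
}

PERSONS_PER_MARK = 5000  # hypothetical scale: one '#' for every 5,000 persons


def grouped_bars(data, persons_per_mark=PERSONS_PER_MARK):
    """Return text lines with one horizontal bar per census, grouped by town."""
    lines = []
    for town, by_year in data.items():
        for year in sorted(by_year):
            marks = round(by_year[year] / persons_per_mark)
            lines.append(f"{town} {year} " + "#" * marks)
        lines.append("")  # blank line separates the groups of bars
    return lines[:-1]     # drop the blank line after the last group


chart = grouped_bars(population)
print("\n".join(chart))
```

The bars of one town are kept together and the same scale is used throughout, so that both the growth of each town and the difference between towns can be read off at a glance.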

Sub-divided bar diagrams

Sub-divided bar diagrams are used to present such data as
are to be shown in parts or which are totals of various sub-divisions.
If, for example, we have to show the expenditure incurred by the
Central Government and by Part A and Part B States (before
reorganisation) on the first Five Year Plan under six major heads we can represent
the data by means of sub-divided bars. Three bars would represent
the expenditure of Central Government, Part A States and Part B States
and the length of these bars would be in proportion to the total
expenditure incurred by these three types of Governments. Each bar then
would be divided into six parts and each part would represent the
expenditure on one of the six major items. Naturally the proportionate
lengths of these parts would be in the ratio of the expenditure incurred
by a particular government on the six major heads. The following
table gives the expenditure incurred by the Central Government, Part
A States and Part B States on the first Five Year Plan under six major
heads:

TABLE VI

Expenditure on the first Five Year Plan

(crores of Rupees)

Subject                          Central     Part A     Part B
                                 Govt.       States     States
Agriculture and Development      186.3       127.3      37.6
Irrigation and Power             265.9       206.1      81.5
Transport and Communication      409.5       56.5       17.4
Industries                       146.7       17.9       7.1
Social services                  191.4       192.3      28.9
Miscellaneous                    40.7        10.0       0.7
Total                            1240.5      610.1      173.2

Expenditure on the first Five Year Plan
[Sub-divided bar diagram: three bars for Central Govt., Part A States and Part B States, each divided into Miscellaneous, Industries, Agriculture, Social Service, Irrigation and Transport]
Fig. 6
In the above figure the lengths of the bars are in proportion to
the total expenditure incurred by the three types of government. Each
bar is then divided in six parts, each part representing the expenditure
on one of the six heads. It would be noted that the expenditure of
Central Government is highest on transport and next highest on
irrigation. The various parts within a bar are arranged in the order
in which the Central Government has spent money. The arrangement
of the parts in all the three bars is identical. This diagram not only
gives us information about the total expenditure of the three types of
governments on the first Five Year Plan but also the distribution of
this amount over various major heads. From this diagram we can
also compare the expenditure of various governments on any particular
head, say transport or agriculture or industries, etc. Such sub-divided
bars can be used to represent the time series also if the data are
available about various parts, period after period. For example, if we have
got the figures of the number of students of Allahabad University
during the last ten years divided facultywise we can have ten bars to
represent the total number of students in the last ten years and each bar can
be divided in four parts on the basis of the strength of students in each
of the four faculties of Arts, Science, Commerce and Law. The
following table gives such figures about the Allahabad University for
a period of six years.

TABLE VII

Number of Students in the Allahabad University

Year       Arts     Science    Law     Commerce    Total
1945-46    1783     590        391     401         3165
1946-47    2057     650        478     552         3737
1947-48    2096     723        438     640         3897
1948-49    2252     848        394     689         4183
1949-50    2267     900        554     726         4447
1950-51    2613     1,218      486     749         5066
These figures are shown on a vertical base in diagram number
7 below:

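Number of Students in Different Faculties
of Allahabad University
[Sub-divided bar diagram on a vertical base: one horizontal bar for each year from 1945-46, divided into Arts, Science, Law and Commerce; horizontal scale from 0 to 5000 students]
Fig. 7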
Sub-divided bar diagrams are also used to show the difference of
figures. In such cases one bar is superimposed on the other bar and
the balance thus becomes clear. Suppose we have to show the
balance of trade for a country for two years. Here one bar would be
drawn representing the imports or exports of the country and another
bar representing the exports or imports, as the case may be, would be
superimposed on it. In this way the balance of trade would be very
clearly shown in the diagram. The following table gives the exports
and imports and the balance of trade for two years. In the
year 1960-61 the imports exceeded exports and there was an unfavourable
balance of trade, while in the year 1963-64 exports exceeded imports
and the balance of trade was favourable. We shall first draw a bar
representing the exports and on it the bar representing the imports
would be superimposed.

TABLE VIII

Exports and Imports of Country A
(in crores of rupees)

Year        Exports    Imports    Balance of
                                  Trade
1960-61       610        624        -14
1963-64       565        527        +38
The diagram drawn from the above figures would be as follows :-

Exports, Imports and Balance of Trade of Country A

[Figure: superimposed bars of exports and imports for 1960-61 and 1963-64]
Fig. 8
From the above it is clear that in 1960-61 there was an unfavourable
balance of trade while in 1963-64 the balance was favourable. In
the case of 1960-61 the top of the bar is shaded, indicating that the
imports were more than the exports and the bar representing imports
is taller than the one representing exports. In 1963-64 the top of the
bar is unshaded, indicating that the exports were more than the imports.
Sometimes, instead of using sub-divided bars, only the difference
of figures is shown in the shape of bars. Bars representing positive
differences are shown on one side and those representing negative
differences on the other side. A vertical line in the centre of the paper
represents the point of equilibrium and on its right-hand side there are
positive differences while on the left-hand side the differences are
negative. The following table gives the number of houses in ten small
towns of India during the last two censuses. In some cases there is
an increase while in others there is a decrease. They have been arranged
in such a way that the towns in which there is an increase in the number
of houses are written first and the towns in which there is a decrease in
the number of houses are written after them. Within these groups
they have been arranged in accordance with the magnitude of the increase
or decrease.
TABLE IX
Number of Houses in Ten Small Towns in 1956 and 1966

Name of    Number of        Number of        Difference
Town       Houses (1956)    Houses (1966)

A              200              250              +50
B              300              340              +40
C              400          [illegible]      [illegible]
D          [illegible]      [illegible]      [illegible]
E              520              530              +10
F              300              290              -10
G              400              380              -20
H              280              250              -30
I              750              710              -40
J          [illegible]      [illegible]          -50
Differences in the number of houses in ten small
towns from 1956 to 1966

[Figure: horizontal bars for towns A to J drawn on either side of a central vertical line; scale -50, -25, 0, +25, +50]

Fig. 9
In all the above diagrams bars have been used to represent the actual
figures. Many times comparison of the data is done on a relative basis.
In such cases also, simple or multiple bar diagrams can be used. Even
sub-divided bars can be used for the purpose. If the data regarding the
cost of production of a particular commodity and its sale price are
available for a number of years, sub-divided percentage bars can be drawn
to show the percentage cost of each item to the total cost. It is also
possible to draw bar diagrams which show the percentage of profits
and cost to the total turnover. In the following table the data about the
cost, proceeds, profit and loss per chair in the years 1963, 1964 and 1965
are given.
TABLE X (a)
Cost, Proceeds, Profit or Loss per chair
during 1963, 1964 and 1965

Particulars                   1963     1964     1965
                              (Rs.)    (Rs.)    (Rs.)
Cost per chair
  (1) Wages                    12       10       11
  (2) Other costs               8        7        7
  (3) Polishing                 4        3        3
      Total cost               24       20       21
Proceeds per chair             25       20       20
Profit or Loss per chair       +1        0       -1
If the above data are to be represented on a percentage basis the
proceeds per chair in each of the three years would be taken as 100 and
the other figures would be expressed as percentages of this figure.
The percentages would be as follows :-
TABLE X (b)

Particulars         1963     1964     1965
                    (%)      (%)      (%)
Wages                48       50       55
Other costs          32       35       35
Polishing            16       15       15
Total cost           96      100      105
Proceeds            100      100      100
Profit or Loss       +4        0       -5
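
The conversion to a percentage basis can be checked with a short calculation. The following is only an illustrative sketch in Python; the rupee figures are those of Table X (a), while the names `chairs` and `as_percent_of_proceeds` are introduced here for the example and do not come from the text.

```python
# Table X (a): cost and proceeds per chair, in rupees.
chairs = {
    1963: {"Wages": 12, "Other costs": 8, "Polishing": 4, "Proceeds": 25},
    1964: {"Wages": 10, "Other costs": 7, "Polishing": 3, "Proceeds": 20},
    1965: {"Wages": 11, "Other costs": 7, "Polishing": 3, "Proceeds": 20},
}


def as_percent_of_proceeds(row):
    """Express every cost item as a percentage of the proceeds (taken as 100)."""
    proceeds = row["Proceeds"]
    pct = {item: 100 * amount / proceeds
           for item, amount in row.items() if item != "Proceeds"}
    pct["Total cost"] = sum(pct.values())
    # A positive figure is a profit, a negative figure a loss.
    pct["Profit or Loss"] = 100 - pct["Total cost"]
    return pct


percent_table = {year: as_percent_of_proceeds(row) for year, row in chairs.items()}
for year, pct in percent_table.items():
    print(year, {item: round(value) for item, value in pct.items()})
```

Because every figure is divided by the proceeds of its own year, the three bars of a percentage diagram come out equal in height whatever the actual sale prices were.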

It should be remembered that these percentages have been calculated
on the proceeds per chair. Thus in 1963 wages were 48% of
the sale price of the chair, in 1964 this percentage was 50 and in 1965
it was 55. These percentages can be represented in the shape of a
diagram as follows :-
Percentage Cost, Proceeds, Profit or Loss per
chair during 1963, 1964 and 1965

Fig. 10
In the above diagram first of all three bars of equal size have been
drawn. They represent the sale proceeds of each of the three years.
Then the percentage of profit for the year 1963, which is 4, is shown at the
bottom of the bar. From this level polishing charges, which are 16%, are
measured. From this level (which is now 20% from the base line) another
part is cut at a distance of 32% (or 52% from the base line). The
remaining portion of 48% represents the wages. Similarly, the bar for
the year 1964 has been drawn. There is no profit in this year and so this
bar contains only three divisions representing polishing charges, other
expenses and wages. In the year 1965 there is a loss of 5% so that the
total cost is 105% of the sale proceeds. This loss of 5% is shown below
the base line and from this level the percentages of other parts are marked.
The bar is thus divided according to the various items of cost. It
should be noted that out of the percentage of polishing expenses in 1965
(which is 15) 5% is below the base line and 10% above it.
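
The levels at which each bar is cut can be worked out mechanically. The sketch below, in Python, is one possible way of doing so and not a prescribed method; the function name `percentage_bar` is hypothetical, and the percentages are those of Table X (b), listed from the bottom of the bar upwards.

```python
def percentage_bar(costs_pct):
    """Return (label, lower, upper) levels in % of proceeds, measured from the base line.

    costs_pct lists (label, percentage) pairs from the bottom of the bar upwards.
    A profit is drawn first, starting at the base line; a loss pushes the
    bottom of the bar below the base line by the same amount.
    """
    profit = 100 - sum(pct for _, pct in costs_pct)
    segments = []
    level = min(profit, 0)          # a loss starts below the base line
    if profit > 0:
        segments.append(("Profit", 0, profit))
        level = profit
    for label, pct in costs_pct:
        segments.append((label, level, level + pct))
        level += pct
    return segments


bar_1963 = percentage_bar([("Polishing", 16), ("Other costs", 32), ("Wages", 48)])
bar_1965 = percentage_bar([("Polishing", 15), ("Other costs", 35), ("Wages", 55)])
print(bar_1963)
print(bar_1965)
```

In both years the top of the bar falls at 100, which is why the three bars are of equal height above the base line.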
Though bar diagrams can be used to show many sub-divisions
it is not worthwhile to use them if the number of such divisions is large,
because in such cases comparison becomes somewhat difficult. Due to
disparity of figures in different bars various sub-divisions may be thrown
wide apart from each other.
TWO-DIMENSIONAL DIAGRAMS
Rectangles
As has been said earlier, in such diagrams the size of the items is
represented by the area of the rectangle. As such, not only their lengths
are taken into account but also their breadths, because the area of a
rectangle is equal to the product of its length and breadth. When two
figures are to be shown by the areas of two rectangles, two methods
can be adopted: either their breadths may be kept equal and their lengths
in proportion to the two figures or their lengths can be kept equal and
their breadths in proportion to the size of the two figures. In both
the cases the area of the rectangles would be in proportion to the size of
the figures. Generally the lengths are kept equal and the breadths in
proportion to the size of items. The data given below in Table XI are
represented in figure 11 in the shape of rectangles:

TABLE XI (a)
Expenditure on the first Five Year Plan
(in crores of rupees)

Subject                          Total     Central     Part A
                                           Govt.       States
Transport and Communication       497       409.5        56.5
Agriculture and Development       361       186.3       127.3
Social services                   425       191.4       192.3
Irrigation and Power projects     561       265.9       206.1
Industries                        173       146.7        17.9
Miscellaneous                      52        40.7        10.0
Total                            2069      1240.5       610.1
If the length of the rectangles is kept equal, then their widths have to be
in proportion to the figures given in the totals row. In other words,
the breadths of the rectangles would be in the ratio of 2069 : 1240.5 : 610.1
or they would be in the ratio of 3.39 : 2.03 : 1. After deciding the widths
of the rectangles the percentage expenditures on various items as
compared to the expenditure on all the items would be calculated in all the
three cases separately. The totals would be taken as 100 and other
figures would be expressed as percentages of the total. The rectangles
would then be divided in accordance with these percentages. Table
XI (b) gives these percentages for the data given in Table XI (a) :-

TABLE XI (b)

Expenditure on Various Items
(in Percentages)

                              Total           Central Govt.      Part A States
Subject                    Per      Cum.      Per      Cum.      Per      Cum.
                           cent   per cent    cent   per cent    cent   per cent

Irrigation and Power       27.1     27.1     21.43    21.43     33.77    33.77
Transport and
Communication              24.0     51.1     33.01    54.44      9.28    43.05
Social services            20.5     71.6     15.44    69.88     31.52    74.57
Agriculture and
Development                17.5     89.1     15.02    84.90     20.87    95.44
Industries                  8.4     97.5     11.82    96.72      2.93    98.37
Miscellaneous               2.5    100.0      3.28   100.00      1.63   100.00

Total                     100.0             100.00             100.00
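
The widths of the rectangles and the levels at which they are divided can be worked out as sketched below in Python. This is only an illustration under the assumption that the items are taken in the order of Table XI (b); the names `spend`, `widths` and `cumulative_percentages` are introduced for the example, and results may differ from the printed table in the second decimal place.

```python
# Table XI (a): expenditure in crores of rupees, items in the order of Table XI (b).
heads = ["Irrigation and Power", "Transport and Communication",
         "Social services", "Agriculture and Development",
         "Industries", "Miscellaneous"]
spend = {
    "Total":         [561, 497, 425, 361, 173, 52],
    "Central Govt.": [265.9, 409.5, 191.4, 186.3, 146.7, 40.7],
    "Part A States": [206.1, 56.5, 192.3, 127.3, 17.9, 10.0],
}

totals = {name: sum(values) for name, values in spend.items()}
smallest = min(totals.values())
# Widths in proportion to the totals, the smallest total being taken as 1.
widths = {name: round(total / smallest, 2) for name, total in totals.items()}


def cumulative_percentages(values):
    """Running percentage of the column total, used to mark the divisions."""
    total = sum(values)
    running, out = 0.0, []
    for value in values:
        running += 100 * value / total
        out.append(round(running, 2))
    return out


cuts = {name: cumulative_percentages(values) for name, values in spend.items()}
print(widths)
for head, level in zip(heads, cuts["Central Govt."]):
    print(head, level)
```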
Percentage Expenditure on first Five Year Plan
by Central Government and
Part A States

[Figure: rectangles of unequal width divided by percentage; legend: Irrigation, Transport, Social Services, Agriculture, Industries]

Fig. 11

Instead of showing the percentages, the actual figures can also be
shown. In such cases both the width and the length of the rectangles
can be varied. Figure No. 11 above gives two types of comparisons.
Firstly, with it we can have an idea about the relative expenditure of the
Central Government and Part A States on the first Five Year Plan.
Secondly, the diagram discloses the manner in which the Central
Government and Part A States spend on various items. Rectangles can be used
to compare even three phenomena simultaneously. Thus if we know the
cost of production of two articles, their sale price and the quantities sold,
we can depict these data by rectangles. The following table gives such
information about a commodity produced by two factories A and B :-

TABLE XII
Cost of Production, Profits and Number of Units Produced by two
factories A and B

Particulars             Factory A      Factory B
                        (Rupees)       (Rupees)
Wages                     2000        [illegible]
Raw Materials             3000        [illegible]
Total cost                5000           3800
Profits                   4000           1000
Total Sales               9000           4800
Number of units
produced and sold         1000            800
In the above table the sale prices per unit in the case of Factories A and
B are respectively Rs. 9 and Rs. 6. Two rectangles would be drawn
whose breadths would be in the ratio of 9 : 6 and whose lengths would
be in the ratio of 1000 : 800. These rectangles would then represent the
total sale proceeds and within them various divisions would be made to
represent the amount of different items of cost and the amount of profit.
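
The dimensions of such rectangles follow directly from the table. The sketch below, in Python, is an assumption-laden illustration: the helper `rectangle` and the dictionary layout are hypothetical, Factory A's items are taken as wages 2000, raw materials 3000 and profits 4000, and Factory B's total cost of 3800 is derived here as sales less profits.

```python
factory_a = {"units": 1000, "sales": 9000,
             "parts": {"Profits": 4000, "Raw materials": 3000, "Wages": 2000}}
factory_b = {"units": 800, "sales": 4800,
             "parts": {"Profits": 1000, "Total cost": 3800}}  # 3800 = 4800 - 1000


def rectangle(factory):
    """Breadth = sale price per unit, length = units sold, so area = total sales."""
    price = factory["sales"] / factory["units"]
    # Each item takes a share of the breadth equal to its amount per unit,
    # so that the area of its strip equals its total amount.
    strips = {item: amount / factory["units"]
              for item, amount in factory["parts"].items()}
    return price, factory["units"], strips


price_a, length_a, strips_a = rectangle(factory_a)
price_b, length_b, strips_b = rectangle(factory_b)
print(price_a, length_a, strips_a)
print(price_b, length_b, strips_b)
```

The strips of each rectangle add up to the sale price per unit, which is a quick check that nothing has been left out of the division.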
Cost of Production, Sale Proceeds and Profits of a commodity produced
by factories A and B

[Figure: two rectangles, Factory A and Factory B, with the number of units on one scale, each divided into wages, materials and profits]

Fig. 12
The first rectangle is 3" long and the three items (profits, materials and
wages) whose parts are to be cut are in the ratio of 4000 : 3000 : 2000.
The vertical scale is divided in this very ratio to get the sub-divisions
of profits, materials and wages. Similar calculations have been made
in the case of the second rectangle representing the profits and the cost of
Factory B.
Squares
Amongst two-dimensional diagrams, sometimes squares give a
better comparison than rectangles and bars. They are specially useful
when some items of the series have values much higher than others.
In such cases if a bar diagram is drawn then the bars representing big
figures would be very big in size while the bars representing the smaller
figures would be comparatively very small. If, for example, two values
are in the ratio of 1600 : 100 one bar would be 16 times the length of
the other. In such cases, squares give a better comparison because in the
case of squares the area is taken into account. In the above case where
two figures are in the ratio of 16 : 1 the sides of the squares would be
in the ratio of 4 : 1 though their areas would be in the ratio of 16 : 1.
In the construction of squares first of all the square root of the
various figures is calculated and then squares are drawn with the lengths
of their sides in the same proportion as the square roots of the original
figures. The areas of the squares would be in the same proportion as
the ratio of the original figures. The following table gives the figures of
the production of coal in some countries in the year 1951. These
figures are represented in the shape of squares in figure 13.

TABLE XIII (a)

Figures of Coal Production in 1951

Country               Production
                      (00,00,000 tons)

U. S. A.                 130.1
Russia                    44.0
United Kingdom            16.4
India                      3.3
The square roots of the figures of production are calculated below
in table XIII (b) and the proportionate length of the side of the squares is
also calculated. The side of the square representing Indian coal
production, for which the square root is 1.82, has been taken as .25" and
the other figures have been calculated on this basis.
TABLE XIII (b)
Figures of Coal Production in 1951

Country            Production          Square root of        Lengths of Side
                   (00,00,000 tons)    figures of col (ii)   of Squares
(i)                (ii)                (iii)                 (iv)

U. S. A.              130.1              11.40                 1.56
Russia                 44.0               6.63                 0.91
United Kingdom         16.4               4.05                 0.55
India                   3.3               1.82                 0.25

Production of Coal in 1951

[Figure: four squares of decreasing size, one for each country]

Fig. 13
The scale in the above diagram is calculated as follows :-
The area of any square is first calculated. Thus the area of the
square representing the production of the United Kingdom is .55 × .55 or
.3025 square inch. This area represents a production of 16.4 million
tons. Therefore one square inch would roughly represent 54 million
tons.
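
The sides of the squares and the scale of the diagram can be computed as in the Python sketch below. It assumes, as the text does, that India's square is given a side of .25 inch; the variable names are introduced only for this example. Because unrounded square roots are used, a side may differ from the printed table by .01, and the scale comes out at roughly 53 million tons to the square inch.

```python
import math

# Table XIII (a): coal production in 1951, in million tons.
production = {"U.S.A.": 130.1, "Russia": 44.0, "United Kingdom": 16.4, "India": 3.3}

roots = {country: math.sqrt(tons) for country, tons in production.items()}

# Assumed scale, taken from the text: India's square has a side of 0.25 inch.
inch_per_root = 0.25 / roots["India"]
sides = {country: round(root * inch_per_root, 2) for country, root in roots.items()}

# Scale of the diagram: production represented by one square inch.
tons_per_sq_inch = production["India"] / 0.25 ** 2

print(sides)
print(round(tons_per_sq_inch, 1), "million tons per square inch")
```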
For facility of comparison, as also for saving space, the total of the
figures can be represented by one square and the other figures can be
shown in the shape of divisions. These divisions would be in the shape
of rectangles. The divisions can be made either horizontally or
vertically. The data given below in table XIV have been represented thus
in figure No. 14 below :-
TABLE XIV (a)
Production of Manganese (000 tons)

Country             Production
Russia                 2200
South Africa            870
Gold Coast              783
India                   747
French Morocco          316
Brazil                  179
Egypt                   167
Japan                   148
Total                  5410
The total production of 54,10,000 tons would be represented
by a square which would be divided in various parts to represent the
production of various countries. The square root of 5410 is approximately
equal to 73. If this quantity is represented by 3.7" then to show the
production of various countries it would be divided in parts. Such figures
obtained after calculation are shown in table XIV (b).
TABLE XIV (b)
Production of Manganese (000 tons)

Country             Production     Length in     Cumulative
                                   inches        Length in inches
Russia                 2200          1.51            1.51
South Africa            870          0.59            2.10
Gold Coast              783          0.54            2.64
India                   747          0.51            3.15
French Morocco          316          0.22            3.37
Brazil                  179          0.12            3.49
Egypt                   167          0.11            3.60
Japan                   148          0.10            3.70
Total                  5410          3.70
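
The lengths of the divisions are simply shares of the 3.7 inch side. A minimal Python sketch follows; the variable names are assumptions made for the illustration, and since each length is rounded independently here, a few entries differ from the printed table by .01.

```python
# Table XIV (a): production of manganese, in thousand tons.
production = {"Russia": 2200, "South Africa": 870, "Gold Coast": 783, "India": 747,
              "French Morocco": 316, "Brazil": 179, "Egypt": 167, "Japan": 148}

side = 3.7                       # side of the whole square, in inches
total = sum(production.values())

rows = []
running = 0
for country, tons in production.items():
    running += tons
    length = round(side * tons / total, 2)          # width of this country's strip
    cumulative = round(side * running / total, 2)   # distance from the edge
    rows.append((country, length, cumulative))

# Scale: the area of the square is side * side square inches.
lakh_tons_per_sq_inch = round(total / side ** 2 / 100, 2)

for row in rows:
    print(row)
print("1 sq. inch =", lakh_tons_per_sq_inch, "lakh tons")
```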
From the above figures the following type of diagram would be
constructed :-
Production of Manganese in Various Countries

[Figure: a square divided into strips, one for each country. Scale: 1 sq. inch = 3.95 lakh tons]

Fig. 14

Circle or Pie-Diagrams
Circles occupy a unique place amongst two-dimensional diagrams.
The reason for their popularity lies in the facility and ease with which
they can be drawn. The area of a circle is directly proportional to the
square of its radius. Thus if the radius of a circle is four times the radius of
another circle its area would be sixteen times the area of the other circle.
The areas of circles are always in the ratio of the squares of their radii.
Circles can be used at all places where squares are used. Just as in the
construction of squares the square roots of various items are calculated,
similarly in the construction of circles the square roots of various figures
are found out. In the case of squares their sides are kept in the ratios of these
square roots, and in the case of circles their radii are kept in this ratio. The
data given in table XV below are represented by circles in figure No. 15.
TABLE XV (a)

Production of Petroleum in Different Countries (00,00,000 Gallons)

Country             Production

U. S. A.               2200
Venezuela               622
Russia                  301
Saudi Arabia            268
Iran                    132

The square roots of these figures and the ratio of the radii of
various circles have been calculated in Table No. XV (b) below :-

TABLE XV (b)

Production of Petroleum in Different Countries

                                  Square root      Length of
Country           Production      of figures in    Radii in
                                  column 2         inches
(1)               (2)             (3)              (4)

U. S. A.            2200            46.9             1.02
Venezuela            622            24.9          [illegible]
Russia               301            17.4          [illegible]
Saudi Arabia         268            16.4             .35
Iran                 132            11.4          [illegible]
In the above table the square roots are divided by 46 to obtain the
radii of different circles.
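
The radii can be obtained in the same way for every country. The Python sketch below follows the rule just stated (square root divided by 46); the variable names are introduced for the example, and because values are rounded rather than cut off, a radius may differ from the printed table by .01.

```python
import math

# Table XV (a): production of petroleum, in million gallons.
production = {"U.S.A.": 2200, "Venezuela": 622, "Russia": 301,
              "Saudi Arabia": 268, "Iran": 132}

# Radius in inches = square root of the figure divided by 46.
radii = {country: round(math.sqrt(gallons) / 46, 2)
         for country, gallons in production.items()}

# Area of each circle in square inches, used for working out the scale.
areas = {country: math.pi * radius ** 2 for country, radius in radii.items()}

for country in production:
    print(country, radii[country], round(areas[country], 2))
```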
Production of Petroleum in Different Countries

[Figure: five circles of decreasing size, one for each country. Scale: 1 sq. inch = 660 million gallons]

Fig. 15
In the above diagram, for the calculation of the scale the area of any
circle can be calculated. Thus, the area of the last circle representing the
production of Iran is about .2 square inch. If .2 square inch
represents 132 million gallons, then one square inch would represent 660
million gallons. The areas of the circles are in proportion to the figures
of production. Circles look more beautiful than squares and are also
easy to draw. As such, wherever a choice has to be made between
squares and circles the latter should be preferred.
Just as in the case of squares it is possible to represent the aggregate
by one big square and various components by rectangles cut within it,
similarly, in the case of circles the aggregate can be represented by a big
circle and the various components by sectors cut inside it. Such
diagrams are known as Angular Diagrams. Sectors are not difficult to draw
and they look beautiful. This is the reason why they are preferred
to squares.
The areas of various sectors are in proportion to the angles which
they make at the centre of the circle. The angle at the centre of the
circle is of 360 degrees. This angle of 360 degrees represents the
aggregate. It can be divided into a number of smaller angles whose
degrees would be in proportion to the values of the components.
Table XVI (a) below gives the total expenditure on the first Five Year
Plan and its distribution amongst various types of States.
TABLE XVI (a)


Expenditure on the first Five Year Plan

Expenditure in crores
I -----------
Government
-------·-----1
of rupees

Central Government I 1,%4 1


Part A States 6I0
Part B States 173
Part C States '3 %
Jammu and Kashmir I;
----------------1----
Total
To represent this data first of all a circle with any radius would
have to be drawn. The angle at the centre, which would be of 360 degrees,
would represent 2,069 crores of rupees, which is the total expenditure.
The area of this circle thus represents the total expenditure. On this
basis the sector which is to represent an expenditure of 1,241 crores of
rupees would have an angle of (360 × 1,241) / 2,069 degrees. In this way we can
calculate the degrees of angles which the different sectors should have.
The total of the degrees of all these angles would be 360. Table XVI
(b) given below gives the degrees of angles of different sectors relating
to the data given in Table XVI (a) above.
TABLE XVI (b)

Expenditure on the first Five Year Plan

                           Expenditure in       Degree of angles
                           crores of rupees
Central Government              1,241                215.9
Part A States                     610                106.1
Part B States                     173                 30.1
Part C States                      32                  5.6
Jammu and Kashmir                  13                  2.3
Total                           2,069                360.0
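
The angles of the sectors, and the positions at which the dividing radii fall, can be calculated as in the following Python sketch. The figures are those of Table XVI (a); the names `angles` and `boundaries` are introduced only for the illustration.

```python
# Table XVI (a): expenditure in crores of rupees.
expenditure = {"Central Government": 1241, "Part A States": 610,
               "Part B States": 173, "Part C States": 32,
               "Jammu and Kashmir": 13}

total = sum(expenditure.values())

# Angle of each sector = 360 x component / aggregate.
angles = {government: round(360 * amount / total, 1)
          for government, amount in expenditure.items()}

# Positions of the dividing radii, measured from the first radius drawn.
boundaries = []
running = 0.0
for government, angle in angles.items():
    running += angle
    boundaries.append((government, round(running, 1)))

print(angles)
print(boundaries)
```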

After the degrees of various angles have been calculated in the
manner given above, the next thing is to draw any radius in the circle.
From this line an angle of 215.9 degrees would be marked and a line would be
drawn (passing through this point) from the centre up to the
circumference. The sector so formed would represent the expenditure of the
Central Government. From this second line an angle of 106.1 degrees
would be marked and a third line would be drawn forming the second
sector representing the expenditure of Part A States. In this way other
sectors would also be drawn. Figure No. 16 given below represents
the above data.

Expenditure of Various Governments on the first Five Year Plan

Fig. 16

If two aggregates and their components are to be compared, two
circles would have to be drawn with areas proportionate to the ratio
of the two aggregates. Each circle would then be divided into sectors
on the basis of the values of its components. The following table gives
the total expenditure of the Central Government and Part A States in
the first Five Year Plan and their distribution over various items :-

TABLE XVII (a)

Expenditure of Central Government and Part A States on first Five Year
Plan (in Crores of rupees)

Subject                          Central Government    Part A States

Agriculture and Development            186.3               127.3
Irrigation and Power                   265.9               206.1
Transport and Communication            409.5                56.5
Industries                             146.7                17.9
Social services                        191.4               192.3
Miscellaneous                           40.7                10.0
Total                                 1240.5               610.1
The square roots of 1240.5 and 610.1 are respectively 35.2 and
24.7. Therefore, the radii of the two circles would be in this ratio.
Roughly, if the bigger circle has a radius of 1.35" the radius of the smaller
one should be .9". The various angles representing the components
have been calculated in Table XVII (b) below :-

TABLE XVII (b)

Expenditure of Central Government and Part A
States on first Five Year Plan in
Crores of Rupees

Subject                        Central       Part A     Degrees       Degrees
                               Government    States     of angles     of angles
                                                        (Central      (Part A
                                                        Government)   States)
Agriculture and Development      186.3        127.3        54            75
Irrigation and Power             265.9        206.1        77           122
Transport and Communication      409.5         56.5       119            33
Industries                       146.7         17.9        43            11
Social services                  191.4        192.3        55           113
Miscellaneous                     40.7         10.0        12             6
Total                                                     360           360
From the data given above the following diagrams can easily be
drawn:
Expenditure of Central Government and Part A
States on first Five Year Plan
(Crores of rupees)

[Figure: two circles divided into sectors, the larger for the Central Govt. and the smaller for Part A States]

Fig. 17
Circles can be used at all places where squares and rectangles are
used. Angular diagrams look beautiful but if the number of components
is very large it becomes difficult to show them in this fashion.
In such cases smaller components should be merged and then the data
should be shown in the shape of a circular diagram. But merging of
components is not always desirable. It may sometimes create misleading
impressions. It should be remembered that various sectors in two
or more circles should always be kept in one order to facilitate
comparison.
THREE-DIMENSIONAL DIAGRAMS

Cylinders, Spheres, Cubes, etc., are known as three-dimensional
diagrams, as in the calculation of their volume three dimensions, length,
breadth and depth, have to be taken into account. We shall not consider
the construction of cylinders and spheres as it involves difficult
calculations. However, we shall give an example illustrating the representation
of data by means of cubes. Cubes are very useful in such cases where
the difference between the smallest and the biggest figure is very large.
Suppose two figures are in the ratio of 1 : 729. If bars are constructed
to represent them one bar would be 1" in length and the other 729",
that is, 60 feet and 9 inches. If these figures are represented by squares or
circles then the side of one of the squares or the radius of one of the
circles would be 1" and that of the other 27". If, however, cubes are
constructed their sides would be in the ratio of 1" : 9". Thus we find that in
cases where the range of the figures is very large cubes are very helpful in
representing the data diagrammatically. Three-dimensional diagrams are
volume diagrams whereas two-dimensional diagrams are surface diagrams and as
such the drawing of cubes, etc., is more difficult than the drawing of
rectangles or squares.

The following table gives the population of four towns in the last
census :-
TABLE XVIII
Population of Four Towns

Town        Population
A            5,00,000
B            1,00,000
C              50,000
D              10,000

If the above data are to be represented in the shape of cubes we
shall have to find out the cube-roots of these figures. The cube-roots
of these figures are respectively about 79.37, 46.42, 36.84 and 21.54.
The sides of the various cubes have to be in these ratios but since these
figures are still very big they can be divided by a common factor. If
these figures are divided by 50 they would respectively be 1.59, .93, .74
and .43. We can now have cubes whose sides are respectively 1.59",
.93", .74" and .43". The following figure represents the data given in
table XVIII above, in the shape of cubes :-
Population of Four Towns

Fig. I8-A
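
The cube-roots and the sides of the cubes can be verified with a few lines of Python. This is a sketch only; the divisor of 50 is the common factor used in the text, the variable names are assumptions made for the example, and the results are rounded to two decimal places, so they may differ slightly from figures worked out by hand.

```python
# Table XVIII: population of four towns.
population = {"A": 500_000, "B": 100_000, "C": 50_000, "D": 10_000}

cube_roots = {town: people ** (1 / 3) for town, people in population.items()}

# Sides of the cubes in inches, after dividing by the common factor 50.
sides = {town: round(root / 50, 2) for town, root in cube_roots.items()}

# The 1 : 729 illustration: length of bar, side of square, side of cube.
bar, square_side, cube_side = 729, round(729 ** 0.5), round(729 ** (1 / 3))

print({town: round(root, 2) for town, root in cube_roots.items()})
print(sides)
print(bar, square_side, cube_side)
```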

The construction of cubes is not difficult. Suppose a cube with
a side of 1" is to be constructed. The following procedure would have
to be followed :-
(1) Construct a square with a side of 1". It is represented by
ABCD in figure 18 (b) below.
(2) Find out the mid-point of the line AB and draw a perpendicular
1" in length, ½" below AB and ½" above it. It is represented by EF.
(3) Join AE and from B draw a line parallel to AE and equal
to it in length. It is represented by BH.
(4) Join EH and from the point H draw a line parallel to EF and
equal to it in length. It is represented by GH.
(5) Join C and G. Rub off line BF and FG. The required cube
is ADCGHB.
Three-dimensional diagrams are more difficult to construct than surface
diagrams. Cube roots of the figures cannot be calculated very easily and
though the drawing of a cube is not very difficult, cylinders and spheres
require very great care in construction. As such three-dimensional
diagrams, in general, and cylinders and spheres in particular are not very
popular. But it must be admitted that three-dimensional diagrams look
more beautiful than bars, squares, rectangles or circles.

Construction of Cube

[Figure: a cube drawn from the square ABCD with the construction points E, F, G and H]
Fig. 18-B
PICTOGRAMS

In pictograms the relative values of items are represented by pictures.
The number of pictures drawn or the sizes of different pictures are in
proportion to the values of various items which have to be represented.
Thus, if the numbers of cows in India, U. S. A. and China are in the ratio
of 10 : 4 : 1 we shall draw pictures of 10 cows representing their
number in India, 4 cows for U. S. A. and one for China. Similarly if in our
country the expenditure on education in a year is rupees four crores and
on police rupees sixteen crores, a pictogram can be constructed from
these figures. The square roots of these figures can be calculated and two
squares with sides of 2" and 4" respectively can be drawn. Inside the
squares, pictures of two money bags can be drawn. Thus the areas of
these squares would indicate the ratio of expenditure on these two items,
and the money bags would give a pictorial touch to the diagrams. The
following table gives the teacher-student ratio in University A :-

TABLE XIX
Teacher-student ratio in University A

Year          No. of students per teacher

1942-43                  16
1952-53                  21
1962-63              [illegible]

The above data can be represented in the shape of pictures, in the
following manner :
Teacher-student Ratio in University A

[Figure: pictogram of the number of students per teacher in each year]
Data relating to the distribution of population in various ages are
usually represented by pyramids. The following are two such pyramids
relating to the population of India in 1951 and of U. S. A. in 1950 :-

AGE-PYRAMIDS

[Figure: age pyramids for India (Census 1951) and U. S. A. (Census 1950), showing males and females by age group up to 75 and over]

Fig. 20
CARTOGRAMS
The regional distribution of data is usually shown by the use of
maps. The distribution of rainfall in various regions of India or the
production of coal in various parts of the country can be shown with
the help of maps. Similarly density of population in a particular country
can best be studied by drawing a map and putting down dots representing
a certain number of people. Thus, one lakh or ten lakhs of people can
be represented by a dot and the density of population in various regions
can be suitably pictured in this manner. The following map shows
the density of the Indian population :-

[Figure: dot map of India showing the density of population]

Fig. 21
Various methods by which statistical data can be represented by
means of diagrams and pictures have been discussed above. Which
particular diagram should be chosen for a particular type of data is a
question not very easy to answer. The selection of a diagram should
be done carefully as an inappropriate selection may misrepresent the
data and give misleading impressions. As has been said earlier, in the
construction of diagrams and pictures due attention must be paid to
neatness and display. Some types of data, particularly those relating
to time series spreading over a long period, can best be represented by
means of graphs or curves. We shall discuss these in the next chapter.

Questions
1. Write a note on the necessity and usefulness of diagrammatic representation
of statistical data.
2. What types of mistakes are commonly committed in the construction of
diagrams ? What precautions are necessary in this connection ?
3. Point out the usefulness of diagrammatic representation of facts and explain
the construction of any one of the different forms of diagrams you know.
(B. Com., Allahabad, 1945).
4. Write short notes on :-
(a) Surface diagrams (b) Volume diagrams (c) Pie diagrams (d) Bar diagrams
(e) Two-dimensional diagrams.
5. The following table gives the details of the cost of the construction of a
house in Allahabad :-
Land      [illegible]      Cement            800
Labour    [illegible]      Lime              800
Bricks    [illegible]      Stone             600
Iron         1800          Sand              200
Timber       1500          Other things     1500

Represent the above figures by a suitable diagram. (B. Com., Allahabad, 1945).
6. Represent the following data by vertical bars constructed on a percentage
basis :-
Proceeds, Cost, Profit or Loss (per pair of shoes) manufactured by Allahabad Shoe Company
in the years 1936 and 1940
                                 1940           1936
                                Rs. as.        Rs. as.
Proceeds per pair of shoes      12   8         10   0
Cost per pair
  Wages                          4   0          3   0
  Leather                        8   0          6   0
  Other costs                    1   0          0   8
  Total                         13   0          9   8
                                 Loss          Profit
Profit or Loss per pair         -0   8         +0   8
(B. Com., Allahabad)

7. Represent the following by sub-divided bars drawn on a percentage basis :-

Cost, Proceeds, Profit or Loss per chair during 1938, 1939 and 1940.

Particulars                    1938      1939      1940
                               Rs.       Rs.       Rs.
Cost per chair
  1. Wages                      4.5       7.5      10.5
  2. Other costs                3.0       5.1       7.0
  3. Polishing                  1.5       2.4       3.5
Total cost                      9.0      15.0      21.0
Proceeds per chair             10.0      15.0      20.0
Profit or Loss per chair       +1.0       0        -1.0

(B. Com., Allahabad, 1948).

8. The following table gives the population of various countries and
total world population in the year 1931 :-

Country                              Population
                                     ('000)

China                                [illegible]
India                                [illegible]
Russia                                1,61,000
America                               1,24,070
Germany                                 64,776
Japan                                   64,700
United Kingdom                          46,077
France                                  41,860
Italy                                   40,100
Others                               [illegible]
Total population of the World        20,11,800

Represent the above data by means of a circle divided into sectors.

9. Draw suitable diagrams to represent the following information:-


Factory Wages Materials Profit Units
Rs. Rs. Rs. Produced
A 2000 3000 1000 1000
B 1400 2400 1000 800
Show also the cost and profit per unit. (B. Com., Allahabad, 19P).
10. Represent the following figures by a simple bar diagram : -
Total Production of Foodgrains
(Million tons)
Year Total Production Year Total Production
1949-5 0 54.0 5 195%-53 57.48
19~O- U ,0.02 t9H-H 6'.42
195 1 -'-5& 51.14
11. Represent the following figures by multiple bar diagram:

Indices of the Quantum of Exports and Imports (Base 1948-49 = 100)

Year Imports Exports


1948-49 100 100
1949-$0 194 lOS
19SO-S 1 85 110
19P-P 100 89
19,z-B 74- 94
1955-'" 64- 94
12. Represent the following data by a suitable diagram:

Imports of Various Commodities in India (1948-49 to 1953-54)

Year   Food, drink and tobacco   Raw materials   Manufactures and semi-manufactures   Miscellaneous   Total
194 8 -49 9 1.9 8 126·93 z94·S 2 4·51 51 8 .00
1949-S o IZ2.,6 144.3S 288.S3 4.79 S60.03
19so -S 1 IH·81 12.S·77 314.7 8 2.6a 578 '9 8 '
1951-S 2 262.07 2,6.08 51 1·44 ,." 874'94
19S2,-n 175.6, 179. 16 276 ,37 4.3 1 6"·49
19'5-,4 92 .74 169·H 27 6•0 3 3·97 ,~·:t9

13. Represent the following data by a suitable diagram showing the difference
between proceeds and costs:
Year Total proceeds Total costs
1940 22.0 19·5
194 1 2,7·3 :n.7
1942 2,8.2 30 •0
1943 30 .3 2S·6
1944 P·7 26.1
1945 n·, 304. 2
(These figures are imaginary)

14. The following table gives the profit and loss of a concern. Represent
the data by sub-divided bar-diagrams.

Cost, Income, Profit and Loss Account of a Firm


19S o 1951 19S z 19H
Particulars (Rs.) (RB.) ma.) (RI.)
Cost per un it
II. Wages S'4 n 6'4 7'0
h. Raw 'Materials 3'8 2'6 3'0 3"
r. Salaries 1"0 0.8 0'8 0'6
d. Other costs a'S a', a'l a'4
Total 12'7 11'4 12', 13·'
Proceeds per unit 11'0 la" 12'9 13'0
Ptofit or lOIS pet: unit -1'7 --0'7 -0-6 -o'S
15. The following table gives the national income of India by industrial origin.
Represent the data by sub-divided bar diagrams:
National Income of India by Industrial Origin
(In hundred crores)
Particulars 1948-49 1949-50 1950-51
Agriculture 4:t·S 44·9 48 ,9
Mining, Manufacturing and
Hand trades 14.8 IS·0 I s.3
Commerce, Transport and
Communication • 16.0 16.6 16.9
Other services 13·4 13. 8 14.4
Net domestiG product at
factor cost 86·7 0
9 .5 95·5
Net earned income from
abroad -0.% -0.1 --0.1

National Income 86., yo. 1 95·3


Represent the above data diagrammatically on a percentage basis also.
16. The following table shows the monthly expenditure of three families.
Represent the data by a suitable diagram on percentage basis:
Items of Expenditure Family A Family B Family C
(Rs.) (Rs.) (Rs.)
Food actiele. 43 83 zo
Clothing 8 17 as
Recreation 3 10 1.1
Bducation S 9 IS
ax
Rent
Miscellaneous
10
6 t, 17
11
17. Represent the following data by rectangles:
Monthly budget of two families
Items of expenditure Family A Family B
(Rs.) (Rs.)
Food articles 180 no
Clothing 70 50
House rent 90 60
Fuel and lighting 5S 30
Mtscellaneoul 7' So
Total Expenditure ·n o 500
Saving :r., 10
18. Represent the data given in question No. 17 above by pie diagrams.
19. Diagrammatically compare the following statistics of textile production
and imports in India. What conclusions do you draw from these figures?
In Crores of Yards
1913-1 4 193 8 -39
Mill production 116.4 4:t6'9
Handloom '>roduction 106.0 19 1 •0
Importll ~t9·7 64·7
(B. Com., Allahabad, 1946).
20. The following table gives the quarterly foreign trade of India. Represent
the data by a suitable diagram.
In Crores of Rupees
Exports Imports Difference
<+) (-)
1952--53
Second quarter 48,7 38.9 + 9. 8
Third quarter 151.0 16 7.7 -10·7
Fourth. quarter 140 .3 13 8,3 + z~o
19H-H
First quarter 132.6 130•6 + z.o
Second quarter 119·9 164. 0 -44.1
Third quarter 13 0 .4 148.0 -17.6
Fourth quarter 148.8 12 4. 0 +z4· 8
19S4-SS
First :a=er
Secon quarter
13 2 .9 I29·6 + 3·3
II3·S 14S· z -3 1.7
21. Represent the following data by a suitable diagram:
Main headings of
income of the
1948-49
(lakhs
1949-So-
(lalehs of
19S 0
(lakhs of
-,1.
Central Government
Import duty /
of rupees)
7. z 74
rupees)
12..616
h pees
z,471
)
Production duty 1.06 3 6.78, 6,7S4
Income tax 13.988 n.'H iz.5'71
Other taxes 319 360 '. 661
22. Represent the following data by a circular diagram divided into sectors:

Population of Part A States of India

1951 (in lakhs)
Assam 90 .44
Uttar Pradesh 63 Z • 16
Orissa 146.46
West Bengal 248.10
Punjab Iz6.41
Bombay 3'9.,6
.Bihar 402•2 6
Madras '10 • 16
Madhya Pradesh z12·4 8
22-A. Prepare a cartogram and show the density of the population of India in
regions as per data given below:
State   Density of population per square mile   State   Density of population per square mile

Uttar Pradesh Madhya Pradesh 16 3


Bihar Hyderabad zZ7
Orisaa Rajasthan II7
West Bengal Punjab 33 8
Auam Pcpsu . 3"1
¥adras Vindhy~'Jrade8h 15I
Mysote Central I'R't\ia 171
Bombay Travancore Cochin lOIS
23. The following table gives the population of India on the basis of religion.
Represent the data by pie diagram constructed on a percentage basis:

Religion Number in lakhs


Hindu 20 3 1 .9
Sikh 62..2.
Jain 16.2.
Muslim 354.0
Christian 81.6
Other religions 20.1

24. Represent the following data by a pictogram:

Country Population in crores


China -------..._~
India 35·1
Pakistan 1. 6
U.S.A. IS·I
U.K. 5. 0

25. Utilise the following data to represent diagrammatically the relative increase
in note circulation towards the end of 194' in different countries:

Increase in Note Circulation in Millions of National Currency Units

Country In 1939 By the end of 194 ,


Canada 2.33 II 29
U.S.A. 759 8 2.8,01
U.K. 5H 13 80
Australia n 200
India 2.245 Iu09
(M. Com., Allahabad, 1948).

26. Show the details of monthly expenditure of two families given below by
means of two-dimensional diagrams:

Items of expenditure   Family A (Income Rs. 500 p. m.) (Rupees)   Family B (Income Rs. 400 p. m.) (Rupees)
Food 140 12.0
Clothing 80 80
House rent 100 60
Education 30 40
Fuel and JigbtinQ 40 ao
Miscellaneous 40 4°'
(M. A., Punjab, 1952).

27. Represent the following data by a suitable diagram:

Principal heads 193 5-39 1939-40


of revenue (Lakhs of (Lakhs ot
rupees) rupees)

Custom 4 0 50 4S S8
Central Excise Duty 868 Gsz
Corporation Tax 2. 0 4 13 S
Taxes on Income 1574 1410
Salt Bu 10SO
Opium so 46
Other heads ua 15 0
(B. Com., Nagpur, 1943).

28. The following table gives the birth rates and death rates of a few countries
in the world during the year 1931:
Country   Birth rate   Death rate
Egypt 44 207
Canada 2.4- II
U. S. A.. 19 II.
India
Japan
Germany
,.
33
16
2.4
19
11
France 18 16
Irish Free State 10 14-
United Kingdom 16 lIZ
Soviet Russla 40 II
A.ustralia 200 9
New Zealand 18 8
Palestine H 20J
Sweden IS 120
Norway 17 II

Represent the above figures by a suitable diagram.


29. The following table gives in arbitrary units the cost of production in a
factory in biennial averages:

~ ....b
M

...:..'"'" ....
~
..,_ .0- ...,
H .....
..... "".....~
.. .2- ..2-
.,;, ~ ..b .,;, b :......
0-
~ 0- '"
2- ~ '"
~

Material.
- --- --- --- -- -
57 20S 55 36 3S 38
---------
17
2020 1i)
Labour 10 8 II II II IZ 7 5 8
Over-head 14- 10 15 16 17 10 Ia 9 la

Total
---
61
--- ---
45 61
- 63 --- --- - - --- ---
63 70 41 31 46
Draw a graph of the different components of costs as percentage of the total.

30. Show by suitable diagrams the absolute as well as relative changes in the
student population of the colleges A and B in the different departments from 1940 to
1947:
Subject A B
1940 1947 1940 1947
Arts 300 3S o 100 2.00
Science 12.0 500 150 2.5 0
Commerce 2.00 6so 13 0 15 0
Law 100 300 100 12.0
(B. Com., Agra, 1948).
31. Indicate the diagrams you would consider most appropriate to use for
representing each of the following classes of statistical data, stating briefly the reason for
your choice:
(a) Distribution of a large number of candidates according to the number of
marks scored by each at a public examination.
(b) Marks scored by two selected candidates in each of the different subjects tested
at an examination.
(c) Total value of Indian Exports and Imports during the years 1938 to 1955.
(d) Distribution of Assets of all Indian Life Assurance Companies put together
as at January 19, 1956.
(e) Middle class Cost of Living Index Numbers in Bombay and Calcutta during
the years 1938 to 19H.
(f) Distribution of age, sex and civil condition of persons enumerated at the
Census in 1951. (I. A. S., 1956).
32. Diagrammatically compare the following statistics:

Average Size of Holdings in India

As Compared to Some Foreign Countries

Country Acres
India 7·S
Denmark 40.0
Holland 26.0
.Germany aI·S
France 20·S
Belgium 14·5
Britain 20.0
U.S.A. 145.0
(Source: Congress Agrarian Reforms Committee Report, 1950)
33. "Give me an undigested heap of figures and I cannot see the wood for the
trees. Give me a diagram and I am positively encouraged to forget detail until I have
a real grasp of the overall picture. Diagrams register a meaningful impression almost
before we think."-Moroney.
Discuss the utility of diagrams and elucidate the above statement.

34. The following table gives the number of students appearing at various
examinations from a college in 1958, 1961 and 1964:
BUIDlllations I Number of Students appearing
---- - \---1918 1961 --"19 64
a~ I ~ ~
B. Com. I 100 us
..B,..--,S,C.,-________ i ISO 2.50
Total I 4,0 675,
Represent the above data by a suitable diagram.
35. The table given below shows the percent of the work done in the
manufacturing sections as against the allotted quota.
Sections Monday Tuesday Wednesday Thursday Saturday Weekly
Jan. 2.5 2.6 2.7 2.8 30 Total
A ~ ~ n ~ 55
B 70 6S 80 85 100 100
C 6, 51 7S So 75
75 So 100 100
D
-110
Allotted quota for each workday 100%. Draw a Gantt chart from the above
data.
36. The following table gives information of outlay in the two five year plans
of India under major heads of development expenditure:

Heads of Expenditure Plan~


First plan Second pIan
(in crorcs of Rs.) i (in crores of

(a) AgriCillturc and Community Dev-e--:l'--o-pm-en-t


(II) Irrigation & Power
I
-I' ~,
---H-7---- i
661
~.)
~68
913
(,) ID.dustry & Mining ._ ·I7s. 89 0
(d) Transport and Communication 1J7 13 S'
(e) Social SemCC8 H3 9'U
(f) Miscellaneous _ _ _ ______ 69 99
Total 235!) 4 800

Represent the above information by a suitable diagram.


(B. Com., Agra, 1959)
37. Draw a single diagram to depict the following data
(in Lakhs.)

Self-supporting Non-eaming Earnin~ Depen- Total


dependants ents
- -2490
--
AgriCUltural 7 II 146 9 3 10
Non-Agricultural 673 61! 1076
334
-- Total \ 10 45 2.142 379 ~
(B. Com., Raj., 1959)
38. Represent the following data by a three dimensional diagram:
Newspaper Circulation (000)
The Chronicle ,2.800
The Student's News IitOO
The Student's Telegraph Z50
14
Graphic Representation of Data
Diagrams discussed in the last chapter are generally used for the
purpose of publicity and propaganda. From the purely statistical point
of view their importance is not much. They give only an approximate
and rough idea about the level of a phenomenon. From the statistical
point of view graphs and charts are much better than diagrams.
Diagrams can be used only in those places where two or more quantities
have to be compared. If the relationship between two variables is to
be studied, diagrams would be useless for the purpose. Such studies
can be made with the help of graphs only. When we study the
relationship between two variables the idea is either to study their cause and
effect relationship or to study the extent of change in one variable if the
other variable changes by a particular amount. Such studies cannot be
made by diagrams; graphs are, however, very useful for studying such
relationships. The special feature of graphs is that they are more obvious,
accurate and precise than diagrams. The drawing of graphs is also easier
than the drawing of diagrams. Graphs are very useful for studying time
series and frequency distributions.
Construction of Graphs
In the construction of graphs two simple lines are first drawn which
cut each other at right angles. These lines are called axes. The
horizontal line is called the abscissa or x-axis and the vertical line is called
the ordinate or y-axis. The point at which they cut each other is called the
Point of Origin. The following figure gives two such lines:

~-- ._-- ---- ... p


~:.-------
.,
·4
2 '3
'2

X' ·,0 x

3
...,
.J 4 :

••:--__ j-~--s:
348 PU"'OAMBNTALS OF STA'l'lSTlCS

In the above figure x'x is the abscissa or x-axis and y'y is the ordinate or
y-axis. The point of their intersection, or the point of origin, is O.
On the x-axis, positive values are shown on the right hand side of the
point of origin while negative values are recorded on the left hand side.
The value at the point of origin is 0. Thus from O to x there are positive
values +1, +2, +3, etc., and from O to x' there are negative values
-1, -2, -3, etc. On the y-axis positive values are shown above the point
of origin and negative values below it. Thus from O to y positive values
+1, +2, +3, etc., are shown while from O to y' negative values -1, -2, -3,
etc., are shown. The abscissa and ordinate thus divide the graph paper
into four parts numbered 1, 2, 3 and 4 as in the figure shown above. In the
above figure Ox and Oy measure positive values and Ox' and Oy' negative
values. If two values (one of x and another of y) are positive they would
be plotted somewhere in part No. 1. If both are negative values they would
be plotted in part No. 3. If the value of x is positive and of y negative
they would be plotted in part No. 4, and if the value of x is negative and
of y positive they would be plotted in part No. 2. These parts are known
as Quadrants.
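
The rule for placing a pair of values in one of the four quadrants can be sketched in a few lines of Python. This is only an illustration: the function name quadrant is hypothetical, and returning None for a point lying on an axis is an assumption, since the text does not discuss that case.

```python
def quadrant(x, y):
    """Return the quadrant number (1 to 4) for a point, or None if it lies on an axis."""
    if x > 0 and y > 0:
        return 1
    if x < 0 and y > 0:
        return 2
    if x < 0 and y < 0:
        return 3
    if x > 0 and y < 0:
        return 4
    return None  # assumption: a point on the x-axis, the y-axis or the origin has no quadrant


# Illustrative points, one for each quadrant.
for point in [(12, 8), (-10, 6), (-6, -6), (4, -5)]:
    print(point, "-> quadrant", quadrant(*point))
```

The signs of the two values alone decide the quadrant; their sizes matter only when the point is actually plotted to scale.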
For each axis a convenient scale is chosen. It is not necessary that
the scale of x-axis and y-axis should be identical. The scale indicates
the units of a variable which a fixed length of the axis would represent.
Thus in figure No. 1 above, on the x-axis 1" measures 10 units of the
variable and on y-axis 1" measures 5 units only. The values of two
variables x and y given below are plotted in figure No. 1 above:

TABLE I

Variable x    Variable y

    12             8
   -10             6
    -6            -6
     4            -5
Thus P is plotted at a point where x has a value of 12 and y of 8.
The distance of P from the ordinate is 1.2" indicating a value of 12 of
x-variable and its distance from the abscissa is 1.6" indicating a value of
8 of y-variable. Since both the values are positive the point P is in the
first quadrant. In x-axis the distance from the point of origin to any other
point is known as x-coordinate and in y-axis it is known as y-coordinate.
These two distances are called the co-ordinates of the point which is
plotted. Thus the co-ordinates of point P are 12 and 8; x-coordinate
is 12 and y-coordinate is 8. Co-ordinates are expressed in terms of x
and y. Q, R and S are points in which x and y co-ordinates are respectively
-10 and 6, -6 and -6, and 4 and -5. The co-ordinates of these points
are indicated by dotted lines in the above figure. In actual practice only
the points are plotted and lines are not drawn to show the co-ordinates.
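
The conversion from a value to a distance on the graph paper is a simple division by the scale. The sketch below, with the hypothetical helper paper_distance, applies the scales stated for figure No. 1 (1" = 10 units on the x-axis, 1" = 5 units on the y-axis) to the points P, Q, R and S.

```python
def paper_distance(value, units_per_inch):
    """Distance in inches from the point of origin at which a value is marked."""
    return value / units_per_inch


X_SCALE = 10  # 1 inch on the x-axis stands for 10 units
Y_SCALE = 5   # 1 inch on the y-axis stands for 5 units

points = {"P": (12, 8), "Q": (-10, 6), "R": (-6, -6), "S": (4, -5)}
for name, (x, y) in points.items():
    dx = paper_distance(x, X_SCALE)
    dy = paper_distance(y, Y_SCALE)
    # A negative distance means left of (or below) the point of origin.
    print(f"{name}: {dx:+.1f} in. along x, {dy:+.1f} in. along y")
```

For P this gives 1.2" along the x-axis and 1.6" along the y-axis, the distances quoted in the text.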

Choice of scale. In drawing of graphs some points must be carefully
noted. The first and the most important point is the choice of a scale.
The scale should be such that it can accommodate the whole data. In
this connection another point that arises is on which axis a particular
variable should be shown. Conventionally the independent variable
is shown on x-axis and the dependent variable on y-axis. Thus, in
plotting of time series the years and months are shown on x-axis and the
dependent variable on y-axis. The horizontal scale or the scale of x-axis
need not begin with 0. The y-axis, however, should have a scale
beginning with 0. The point of origin should represent 0 on the vertical line.
Equal space on y-axis should represent equal amounts in a natural scale.
In ratio scale it is not so. For the present we shall study only the
natural scale. There is no hard and fast rule with regard to the ratio of the
scale on abscissa and on the ordinate. In this connection Bowley says,
"It is difficult to lay down rules for the proper choice of scales by which
the figures should be plotted out. It is only the ratio between the
horizontal and vertical scales that need be considered. The figure must be
sufficiently small for the whole of it to be visible at once, if the figure is
complicated, related to long series of years and varying numbers, minute
accuracy must be sacrificed to this consideration. Supposing the
horizontal scale is decided, the vertical scale must be chosen so that the part
of the line which shows the greatest rate of increase is well inclined to the
vertical which can be managed by making the scale sufficiently small
and on the other hand, all important fluctuations must be clearly visible
for which the scale may need to be increased. Any scale which satisfies
both these conditions will fulfil its purpose."*

Thus, the scale chosen must be such as would permit the whole
data to be represented in an accurate manner so that the fluctuations are
clearly indicated. The respective sizes of the scale of x-axis and y-axis
cannot be rigidly laid down. It depends amongst other things on the
size of the paper also. Conventionally, however, y-axis is taken 1½ times
as long as x-axis but there is absolutely no rigidity about it.

* Bowley, A. L., Elements of Statistics, page 132 (1920 edition).


Plotting of data. When the scales have been decided and marked
on the graph paper the last thing to be done is to plot the data. On
the basis of the values of x and y co-ordinates various points should be
plotted on the graph paper. The next thing is to join these points.
The rule with regard to the joining of points is that if the figures relate to
a continuous variable the line joining the points should give as smooth
a curve as possible. By continuous variable we mean a variable
which can assume any value within a given range. For example, the
heights of persons can have any value within a specific range. The series
relating to the heights of some persons would be a continuous variable.
In such cases it should not appear as if the different parts of a curve are
not smooth and give an angular picture. If a curve is smooth it
indicates that the different values of the variable are continuous, that there
are no parts separate or different from each other and further that there
is no break or gap between them. If the variable is discrete then the
different points should be joined by straight lines. It would indicate
that there is no continuity between the value represented by one point
and that represented by another. It means that the variable can assume
only those values which are indicated by various points. It cannot
have any value between the points. Ordinarily it is very difficult to
smooth curves and the data are shown by joining the points with straight
lines. However, curves which show mathematical relationships can
always be smoothed and they should not be shown by straight lines.

GRAPHS OF TIME-SERIES OR HISTORIGRAMS

First we shall study the graphs relating to time series and then the
graphs representing frequency distributions. Graphs of time series
can be either on natural scale or on ratio scale.

If the absolute values of a variable are to be represented then natural
scale is used. In natural scale equal distances on an axis represent equal
values. If for x-axis the scale is 1" = 50 units, then a distance of 1"
anywhere on x-axis would represent 50 units. Similarly if on y-axis 1" = 10
units, a distance of 1" on y-axis would represent 10 units. If instead of
absolute values the relative values of a variable are to be shown, equal
distances on an axis would represent proportionately equal values of a
variable. First we shall discuss the graphs constructed on natural scale
and afterwards we shall discuss how ratio scale is used.
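
The contrast between the two scales can be seen in a small numerical sketch. It assumes that the ratio scale is constructed as a logarithmic scale, which is the usual way of making equal distances stand for equal proportionate changes; the function names and the choice of 10 as the starting value are hypothetical.

```python
import math


def natural_position(value, units_per_inch):
    """Natural scale: distance from zero is proportional to the value itself."""
    return value / units_per_inch


def ratio_position(value, base_value, inches_per_tenfold=1.0):
    """Ratio scale (assumed logarithmic): distance depends on value / base_value."""
    return inches_per_tenfold * math.log10(value / base_value)


# A rise from 10 to 20 and a rise from 100 to 200 are both a doubling.
pairs = [(10, 20), (100, 200)]
natural_gaps = [natural_position(b, 10) - natural_position(a, 10) for a, b in pairs]
ratio_gaps = [ratio_position(b, 10) - ratio_position(a, 10) for a, b in pairs]

print("natural scale gaps:", natural_gaps)                      # unequal distances
print("ratio scale gaps:", [round(g, 4) for g in ratio_gaps])   # equal distances
```

On the natural scale the second rise occupies ten times the space of the first, because it is ten times as large in absolute amount; on the ratio scale both occupy the same space, because both are a doubling.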

In graphs relating to time-series values of a variable at different
periods of time are shown. In other words, the graph shows the changes
in the values of a variable with the passage of time. If absolute values of
a variable are taken into account the graphs obtained by plotting them
are known as Absolute Historigrams. If the values are represented by
index numbers and if instead of the actual values their indices are plotted,
the graphs so obtained are called Index Historigrams. If the changes
in the values of two or more variables are shown, there would be two
or more historigrams and thus a comparative study of the changes would
be possible.
Absolute Historigrams-One Variable
In table II below the quantity of steel produced in a country in the
years 1958 to 1965 is given:
TABLE II
Production of Steel from 1958 to 1965 (in lakh tons)
Year    Production
1958      9.54
1959      8.90
1960      8.93
1961      8.57
1962      8.30
1963     10.04
1964     10.76
1965     11.03
If the above data are to be shown by means of a graph, first of all
x-axis and y-axis would have to be drawn. Since the values of both
the variables are positive we shall draw only one quadrant in which the
values of both x and y variables are positive. On the abscissa or x-axis
we shall show the years and on the ordinate or y-axis the figures of the
production of steel. In figure No. 2 given below on x-axis 1" represents
two years and on y-axis 1" represents four lakh tons of steel. Thus
against the year 1958 the point at a distance of 2.38" from the point of
origin would show the production of 9.54 lakh tons of steel.
Similarly against the year 1959 the point at a distance of 2.22" would
represent a production of 8.90 lakh tons. In the same way other
points can be plotted. The line joining these points would be the
desired graph. It would be as follows:
[Fig. 2: Production of Steel from 1958 to 1965 (in lakh tons). Vertical axis: production (lakh tons); horizontal axis: years.]

From the above graph the production of steel in each year can be
known. The graph can also reveal the changes in the production from
year to year. If these data were represented by a bar diagram it would
not have looked so impressive.
The only difference between absolute historigrams and index
historigrams is that in the former, actual values of the variable are plotted
whereas in the latter their index numbers are plotted. Absolute
historigrams tell us about the changes in actual values whereas index
historigrams tell us about the relative or percentage changes. If in the above
case the production of steel for the year 1958 is represented by 100 and the
production figures of other years are expressed as relatives and if these
indices are plotted the resulting graph would be an index historigram.
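
The conversion of actual values into relatives with 1958 as 100 can be sketched as follows. The helper index_numbers is a hypothetical name, and only the first four years of Table II are used for brevity.

```python
def index_numbers(values, base_position=0):
    """Express each value as a percentage of the value at base_position."""
    base = values[base_position]
    return [100 * v / base for v in values]


years = [1958, 1959, 1960, 1961]
production = [9.54, 8.90, 8.93, 8.57]  # lakh tons, first four years of Table II

for year, index in zip(years, index_numbers(production)):
    print(year, round(index, 2))
```

Plotting these relatives against the years, instead of the lakh-ton figures, would give the index historigram; its shape on a natural scale is the same as that of the absolute historigram, only the units on the vertical scale change.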
False Base Line
If the fluctuations in the values of a variable are very small as
compared to the size of items, a false base line is used. By its use even minor
fluctuations are magnified so that they are clearly visible on the graph.
If the size of items is big and if the vertical scale begins from zero the
curve would be mostly on the top of the paper and if the differences in
the values of various items are not much, it would, more or less, be of the
shape of a straight line. In false base line the scale from zero to the smallest
value of the variable is omitted. Whenever false base line is used
it should be very clearly indicated on the graph. Generally in such
cases the vertical scale is broken in two parts and some blank space is left
between them. The lower part of the vertical scale is kept very small
and it begins with zero. The upper part begins with a value equal or
nearly equal to the smallest value of the variable. To make the breaking
of the vertical scale prominent usually saw-tooth lines are used. In Figure
No. 3 which represents the data given in table III below such a false base
line has been used:
TABLE III
Total Supply of Money in India
(in hundred-million rupees)
Month       1951   1952    Month       1951   1952
January     19.7   18.7    July        20.1   18.3
February    20.2   19.0    August      19.4   18.1
March       20.6   18.9    September   19.0   17.9
April       20.9   18.9    October     19.0   17.9
May         20.9   18.7    November    18.7   17.9
June        20.4   18.5    December    18.8   17.9
[Fig. 3: Total Supply of Money in India, 1951 and 1952. Vertical axis: Rs. (hundred million), broken by a false base line and marked from 17.5 to 21.0; horizontal axis: months from January 1951 to December 1952.]

In the above graph 1" on the vertical scale represents 100 million
rupees. If a false base line was not used the vertical scale would have
been 21" long. If 1" was to represent 500 million rupees the size of the
vertical scale would still have been more than 4" but then the fluctuations
in the supply of money during these two years would not have been
very clear from the graph.
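
The lengths quoted above follow from dividing the span of the vertical scale by the number of units that one inch represents. In the sketch below the function axis_length is a hypothetical helper; the largest value in Table III is taken as 20.9 hundred million, that is 2,090 million rupees, and the false base of 1,750 million with the scale carried up to 2,100 million is an assumption chosen only for illustration.

```python
def axis_length(top_value, units_per_inch, base_value=0.0):
    """Length in inches of a vertical scale running from base_value to top_value."""
    return (top_value - base_value) / units_per_inch


highest = 2090.0  # largest value in Table III, in million rupees

print(axis_length(highest, 100))   # scale from zero, 1 inch = 100 million rupees
print(axis_length(highest, 500))   # scale from zero, 1 inch = 500 million rupees
# With an assumed false base line at 1,750 million and the scale carried up to 2,100:
print(axis_length(2100.0, 100, base_value=1750.0))
```

The first figure is about 21", the second a little over 4", and the third only 3.5", which shows how the false base line keeps the detailed scale of 100 million rupees to the inch without needing an unmanageably long axis.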

False base line should be used only when it is absolutely necessary
to do so. It is generally used to save space and to depict small
fluctuations sharply. Under such circumstances a graph in which false base
line has been used should be interpreted with caution as sometimes false
base line gives very misleading impressions.

Historigrams-Two or More Variables


If two or more variables which are to be shown on a graph are
expressed in the same unit there is no difficulty in their representation.
Both the horizontal and vertical scales in such cases would be common
for all the variables. The procedure of plotting these curves is the same
as in the case of historigrams of one variable. The only difference is
that now there would be two or three curves instead of one. One
advantage in such graphs is that it is possible to show the difference or
the sum of two or more variables by another curve drawn alongside
with the original ones. The data of this type given in table IV below
are plotted in figure No. 4.

TABLE IV

Average Weekly Income and Expenditure of a Skilled Labourer

Years      Income   Expenditure   Differences

1952-53     16.9       28.1         -11.2
1953-54     24.1       43.1         -19.0
1954-55     32.4       48.5         -16.1
1955-56     35.0       47.3         -12.3
1956-57     33.0       33.1          -0.1
1957-58     18.6       14.1          +4.5
1958-59     35.9       30.8          +5.1
1959-60     33.3       30.0          +3.3
1960-61     39.4       33.4          +6.0
1961-62     49.5       36.6         +12.9
1962-63     39.8       31.9          +3.9
[Fig. 4: Average Weekly Income and Expenditure of a Skilled Labourer. Vertical axis: Rs., running from -20 to 50; separate curves for Income, Expenditure and Differences.]
In the graph given above two quadrants, the first and the fourth,
are shown as some of the figures on the y-axis (relating to differences) are
negative.
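
The third curve in such a graph is derived point by point from the other two. A minimal sketch, using only the first three years of Table IV, shows the calculation and why the fourth quadrant is needed.

```python
income = [16.9, 24.1, 32.4]        # first three years of Table IV
expenditure = [28.1, 43.1, 48.5]

# The third curve is obtained point by point from the other two.
differences = [round(i - e, 1) for i, e in zip(income, expenditure)]
print(differences)

# Negative differences fall below the x-axis, i.e. in the fourth quadrant.
needs_fourth_quadrant = any(d < 0 for d in differences)
print(needs_fourth_quadrant)
```

Since expenditure exceeds income in each of these years, all three differences are negative and the difference curve begins below the x-axis.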
If the units in which different variables are expressed are different,
then also such graphs can be used. The technique of their construction
is the same as in the above case. The only difference is that in such
cases two or more scales relating to different units have to be shown.
If, however, there are only two variables in different units, one vertical
scale can be shown on the left hand side and the other on the right hand
side. The two historigrams can be plotted in this manner on the same
paper. To facilitate comparison the two scales are so adjusted that the
historigrams are close to each other. Thus, if the average value of both
the variables is kept near about the centre of the vertical scale the two
curves would afford a good comparison. False base line can be used for
the purpose of adjusting the scales. The following table gives the
monthly imports (volume and value) of liquor in India in the year 1941-42.
These data have been plotted in Figure No. 5.
TABLE V
Monthly Import of Liquor in India, 1941-42
Month   Volume (lakhs of gallons)   Value (lakhs of rupees)   Month   Volume (lakhs of gallons)   Value (lakhs of rupees)
April 4. 6 October 2.6., 4.6 3 1 •1
May 3·9 November 19·3 3·4 23. 2
June ;;6 11.1 December 2..1 IS·3
July 4. 1 2.6.6 January 2.·3 2.1'.1
August '3·3 2.1.0 February 1.6 16.7
September ;.6 2.;·4 March ;·s 19·C}
[Fig. 5: Monthly Import of Liquor in India, with separate vertical scales for volume (lakhs of gallons) and value (lakhs of rupees).]

The above historigram indicates that there is a positive relationship
between the volume and value of liquor imported in India. If the
volume rises, the values also rise, and if it falls the values also fall.

If relative changes in the values of a variable are to be shown, index
historigram can be plotted in place of absolute historigram. If two or
more variables are presented in the shape of index historigrams, their
base year should be common otherwise comparisons would be fallacious
and the graph would give a very misleading impression.
Sometimes mixed graphs are prepared to study the inter-related
variables. In such graphs one variable is usually shown by bar diagram
and the other is plotted in the shape of a curve. In figure No. 6 below
such a graph has been shown which represents the data given in
Table VI.

TABLE VI

Volume and Value of Exports of Lac from India in the
Second Half of the Year 1941

Month        Volume        Value
             (Cwts)        (Rupees)
             (000)         (00000)

July           96            50
August         56            33
September      59            43
October        32            23
November       60            48
December       22            19

The above figures can be represented by means of a mixed graph.
The volume would be represented by vertical bars and the value would
be shown in the shape of a historigram. On the vertical scale 1" would
represent 32,000 cwts. and 20,00,000 rupees. The scales are such that
the historigram would run through the bars making the graph beautiful
and facilitating the comparison between value and volume.

[Fig. 6: Volume and Value of Exports of Lac from India in the Second Half of the Year 1941. Vertical bars show volume (cwts); the curve shows value (Rs. 00,000).]
From the above graph it is possible to study the variations in the
volume of exports month after month and similarly the variations in
the values are also clear. The two variables are moving in the same
direction. Whenever there has been a rise in the volume, values have
also gone up and conversely with a fall in volume, values have also gone
down. This indicates that there is a positive relationship between the
two phenomena.
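
Whether two series move in the same direction can be checked by comparing the signs of their month-to-month changes. The sketch below does this for July to November 1941 as read from Table VI; the function name directions is hypothetical, and agreement of signs is only a rough check of the kind the eye makes on the graph, not a measure of the strength of the relationship.

```python
def directions(series):
    """Sign of each period-to-period change: +1 for a rise, -1 for a fall, 0 for no change."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


# July to November 1941, as read from Table VI.
volume = [96, 56, 59, 32, 60]   # thousand cwts
value = [50, 33, 43, 23, 48]    # lakhs of rupees

same_direction = directions(volume) == directions(value)
print(directions(volume))
print(same_direction)
```

Every fall in volume is matched by a fall in value and every rise by a rise, which is what the graph shows when the curve follows the tops of the bars up and down.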
Method of showing Range-Zone Graph
In some data the difference between the maximum and minimum
values of a variable has to be emphasised and presented graphically.
In such cases zone graphs are used. Zone graphs show the range of
variations. Figure No. 7 presents a zone graph based on the data given
in table VII.

TABLE VII
Average Prices of Gold in Bombay (Per Tola)

Year      Maximum      Minimum
          Rs. P.       Rs. P.
1950      122.00       12.1.;0
1951      121.80       UI.Z.S
1952      13).12       121.25
1953      132.12       n6.6z
1954      134.n        128.70
1955      136.80       133.20
1956      136.75       I3I.2,
1957      135.50       133.94
1958      135.20       134.25
1959      137.70       134.75
In order to plot the above data in such a manner that the difference
between the maximum and the minimum prices in different years is
clearly represented, it is necessary to use a false base line, as the difference
between the maxima and minima is not very much. In figure No. 7
below, the minimum and the maximum values have been plotted and the
difference between the two has been made prominent by drawing thin
bars between these values. The size of the bars represents the range of
variation in prices in different years.
Maximum and Minimum Prices of Gold in Bombay

Fig. 7

Instead of depicting the above data in the shape of zones it is also
possible to draw two historigrams, one representing the maximum
values and the other the minimum values. The space between these
curves would indicate the range of variation, and in order to make it
prominent it can be shaded. Table VIII given below gives the imaginary
maximum and minimum values of a variable X. These data are plotted
in the above fashion in figure No. 8.

TABLE VIII

Maximum and Minimum Values of "X"

Date      Maximum value      Minimum value

 1            52                 50
 l, 51
 3            56                 55
 4            53                 50
 5            51                 48
 6            52                 51
 7            53                 51
 8            55                 53
 9            56                 54
10            58                 54
11            51                 ,6
12            59                 54
13            56                 55
14            62                 58
15            60                 54
16            63                 62
17            61                 59
18            60                 51
19            64                 58
20            66                 63
21            62                 60
22            59                 '5
23            60                 '9
24            64                 63
25            66                 64
26            67                 62
27            65                 60
28            .1 8               n
29            58                 56
30            15                 ~2

Maximum and Minimum Values of "X"

(Horizontal axis: Dates)

Fig. 8
Method of showing difference
When the difference of two figures is to be shown prominently
the space between them is either coloured or cross-lined. Such graphs
are very attractive and appealing. Positive and negative differences
are indicated by different colours or different types of lines and
cross-lines. Such a graph has been shown in figure No. 9, which presents
the data given in table IX.
TABLE IX
India's Foreign Trade (January, 1953 to July, 1954)

Month           Imports                  Exports
                (in crores of rupees)    (in crores of rupees)
1953
January            43.5                     44.5
February           40.4                     39.2
March              47.1                     48.8
April              56.1                     38.9
May                51.4                     41.0
June               51.8                     40.0
July               50.0                     41.0
August             46.5                     49.4
September          45.5                     48.8
October            39.0                     48.7
November           39.4                     51.5
December           3~.9                     44.6
1954
January            40.1                     40.8
February           49.6                     46.6
March              47.8                     31.0
April              51.7                     38.7
May                45.7                     43.2
June               53.7                     46.7
July               45.5                     45.2

India's Foreign Trade (January, 1953 to July, 1954)

(Vertical axis: Crore Rupees. Horizontal axis: months, January 1953 to July 1954)

Fig. 9
From the above graph the favourable and unfavourable balance
of trade can be very easily studied.
Band Graphs
A band graph is a type of line graph which is used to present the
total for successive time periods broken up into sub-totals for the various
component parts of the total. The various component parts are plotted
one over another, and in this way there would be as many bands as the
number of parts. To distinguish the various parts from each other they are
either coloured in different colours or the space between them is filled
by cross-hatch, vertical or horizontal lines or different types of signs
and symbols. This type of graph is specially useful for studying total
cost divided into various component parts, or total sales, production,
consumption, exports or imports according to different states, districts
or regions. Table X below gives the imports of newsprint in India
from various countries :-
TABLE X
Import of Newsprint in India (1947-48 to 1952-53)

Country    1947-48   1948-49   1949-50   1950-51   1951-52   1952-53

Canada        6.6       8.2       7.1       6.2      11.9      10.6
Finland       5.5       5.9       2.4       8.0      12.9      10.9
Sweden        4.6       7.3       4.4       3.6       1.6       4.5
Norway        9.1      14.9       8.5      11.9      12.1      10.7
Others        5.4       7.8       4.7      24.0      18.5      12.6

Total        31.2      44.1      27.1      53.7      57.0      49.3

The above data have been presented in the shape of a band graph in
figure No. 10 below :-


Imports of Newsprint in India (1947-48 to 1952-53)

Fig. 10

From the above graph it is possible to study the trend of total
imports as also the imports from various countries. If the data are
given in the shape of percentages, then also band graphs can be used.
In such graphs the total in each year would be represented by 100 and
the curves representing the imports from various countries year after
year would be expressed as percentages of the totals in different years.
The data given in table XI below are shown on the basis of percentages
in figure No. 11.
TABLE XI
Number of Hindi Films produced in India (1940-50)

          No. of Hindi     Total No. of     Col. (1) as
Year      Films            Films            Percentage of
          (1)              (2)              Col. (2)

1940        86                                 50.3
1941        78                                 46.5
1942        97                                 59.5
1943       108                                 68.0
1944        86                                 68.2
1945        73                                 73.7
1946       155                                 77.5
1947       186                                 65.7
1948       148                                 55.9
1949       157                                 54.3
1950       IH                                  46.0

Number of Hindi Films produced in India (1940-50)

Fig. 11
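The construction of a band graph described above can be sketched in a few lines of Python. This is only an illustrative sketch: the figures are those of Table X as read here, and the function names `band_boundaries` and `percentage_shares` are hypothetical helpers, not anything from the text. The first function stacks the component parts one over another to give the upper edge of each band; the second converts each year's parts into percentages of that year's total, as needed for a percentage band graph.

```python
years = ["1947-48", "1948-49", "1949-50", "1950-51", "1951-52", "1952-53"]

# Imports of newsprint by country, as read from Table X.
imports = {
    "Canada":  [6.6, 8.2, 7.1, 6.2, 11.9, 10.6],
    "Finland": [5.5, 5.9, 2.4, 8.0, 12.9, 10.9],
    "Sweden":  [4.6, 7.3, 4.4, 3.6, 1.6, 4.5],
    "Norway":  [9.1, 14.9, 8.5, 11.9, 12.1, 10.7],
    "Others":  [5.4, 7.8, 4.7, 24.0, 18.5, 12.6],
}


def band_boundaries(parts):
    """Upper edge of each band when the parts are plotted one over another."""
    running = [0.0] * len(years)
    boundaries = {}
    for name, values in parts.items():
        running = [r + v for r, v in zip(running, values)]
        boundaries[name] = [round(r, 1) for r in running]
    return boundaries


def percentage_shares(parts):
    """Each part as a percentage of the total of its own year."""
    totals = [sum(column) for column in zip(*parts.values())]
    return {
        name: [100 * v / t for v, t in zip(values, totals)]
        for name, values in parts.items()
    }


boundaries = band_boundaries(imports)
shares = percentage_shares(imports)

for name in imports:
    print(name, boundaries[name])
```

The upper edge of the last band is the curve of total imports, so it should agree with the Total row of the table.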
TABLE XII

(Monthly sales of a retailer for three years, with the cumulative totals and the moving annual totals. The figures of this table are not legible.)


In the table given above the monthly figures of sales are first
cumulated. These figures are shown in the second row for each of the
three years. The moving annual total has been given in the third row for
each year. The figure given against December, 1953 is the total of the
figures of the twelve months ending December, 1953. Similarly, the
figure given against January, 1954 is the total of the twelve months
ending January, 1954, or in other words, the figure against January, 1954
is found out by dropping the figure of January, 1953 and adding the
figure of January, 1954 to the annual total of 1953. In this manner
other figures have also been calculated. The above data would be
plotted in the shape of a zee-chart as follows.

Z-Curve representing the Sales of a Retailer

Fig. 12
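The three rows needed for a zee-chart can be worked out mechanically. The following Python sketch uses hypothetical monthly sales figures (they are not the figures of the table above) and two illustrative helper functions. The moving annual total for a month is obtained exactly as described in the text: the figure of the same month of the previous year is dropped and the figure of the current month is added.

```python
def cumulative_totals(monthly):
    """Running total within one year (the second row of the table)."""
    totals, running = [], 0
    for value in monthly:
        running += value
        totals.append(running)
    return totals


def moving_annual_totals(previous_year, current_year):
    """Total of the twelve months ending with each month of current_year."""
    total = sum(previous_year)
    result = []
    for old, new in zip(previous_year, current_year):
        total = total - old + new  # drop last year's month, add this year's
        result.append(total)
    return result


# Hypothetical monthly sales, January to December (not from the book).
sales_1953 = [10, 9, 8, 7, 8, 9, 10, 11, 12, 13, 14, 15]
sales_1954 = [11, 10, 9, 8, 9, 10, 11, 12, 13, 14, 15, 16]

cumulative_1954 = cumulative_totals(sales_1954)
moving_1954 = moving_annual_totals(sales_1953, sales_1954)

print("Monthly   :", sales_1954)
print("Cumulative:", cumulative_1954)
print("Moving    :", moving_1954)
```

In December the cumulative total and the moving annual total coincide, which is why the two curves meet at the right-hand end and, together with the monthly curve, give the chart the shape of the letter Z.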

In order to preserve the proportion, if monthly data are given
the scale for cumulative and moving annual totals is about ten times
the scale on which the original data are plotted. In the above graph
the scale on the left hand side is used for plotting the monthly figures
of sale, and that on the right hand side for plotting the cumulative figures
and moving annual totals.
The thick lines which represent the monthly sale figures indicate
the seasonal fluctuations of the sales. It will be noted that every year
from January to April the sales show a downward tendency. There is
a slight recovery in the month of June and the sales reach their peak level
towards the close of the year. These seasonal fluctuations are quite
apparent from the curve of the monthly sales.
The cumulative curve, which is plotted month by month, shows
whether the year's sale total up-to-date is more or less than the similar
figure of the previous year. It will be noted that the cumulative figures
for each month in the year 1955 are higher than the respective figures of
the year 1954, indicating that the sales of the concern have an upward
tendency.

The moving annual total curve shows an upward trend, indicating
that year after year the sales of the concern are going up. It shows
the trend both month by month as well as year by year. In the
above case the sales in each month are more as compared to the sales
in the same month in the previous year, and the moving annual total
is also higher than in the previous year.

GRAPHS OF FREQUENCY DISTRIBUTIONS

Frequency distributions of all types are represented by means of
graphs for precisely the same reasons for which graphs are prepared
for other types of data. Such graphs are called frequency graphs. The
technique of plotting them is the same as explained in previous sections.
But which particular type of frequency graph would be constructed
for representing a given set of figures would depend on whether the
particular frequency distribution is discrete or continuous. We have
already said earlier that a discrete series is one in which an item cannot
assume any value in a particular class-interval. The value of each item
is fixed and definite. As against this, a continuous series is one in which
an item can assume any value within a particular class-interval. For
example, the number of rooms per house would give a discrete series,
as the rooms can be in whole numbers only. But a series relating to
heights of a group of students would give a continuous series, as within
a class-interval the height of an individual may be any figure. Height
is not capable of exact measurement. In actual practice discrete series
are more popular, as most of the phenomena relating to ordinary walks
of life are measured in well-defined units. The fractions of the units
are conventionally eliminated. Thus, even heights can be measured
correct to one-fourth of an inch or one-tenth of an inch, and a discrete
series may be obtained in this fashion. Discrete series are usually shown in
the shape of bar frequency diagrams or discontinuous curves. Continuous
series are usually shown by means of smoothed curves.

In the construction of frequency graphs the values of the variable
are measured on the x-axis and the corresponding frequencies on the y-axis.
The following types of graphs can be constructed to represent frequency
distributions :-
(1) Bar Frequency Curves. Such curves are used to depict discrete
series.
(2) Discontinuous Curves or Frequency Polygons. In such curves
the plotted points are joined by straight lines. The curves are not
smoothed. Such curves are also used to represent discrete series.
(3) Continuous Curves or Smoothed Frequency Curves. In such graphs
the frequency polygon obtained by joining the points is smoothed.
Such curves are used to represent continuous series.

We shall now discuss these in turn.

The rules for constructing the bar frequency curves are the same
as discussed in the previous chapter in connection with bar diagrams.
The values of the variable are shown on the base line and above each
value a bar representing the related frequency is drawn. The lengths of the
various bars are kept in proportion to the size of the respective
frequencies. Sometimes instead of thick bars only lines are drawn. Since they
do not look beautiful, thick bars should be used for the purpose. In
table XIII below the data relate to the number of rooms in houses.
The series is a discrete one and a bar frequency diagram represents the
data as shown in figure No. 13.

TABLE XIII
Number of Rooms in Houses

No. of Rooms      No. of Houses

     1               170
     2               183
     3               ;:9 1
     4               146
     5               105
     6                75
     7                42
     8
     9                2,
Number of Rooms in Houses

Fig. 13

Bars can be horizontal also, but vertical bars give better graphs.
The data can be shown by a discontinuous curve as well. In this case
instead of bars or lines only points would be plotted at various heights
representing the number of houses. These points would be joined
by straight lines and a frequency polygon would thus be obtained. In
table XIV below the frequencies of the different values of a variable are
given. These data are shown in the shape of a frequency polygon in
figure No. 14.
TABLE XIV
Values of a Variable and their Corresponding Frequencies

Values of a Variable      Frequency

     1                       3
     2                      11
     3                      3%
     4                      41
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16                       2

Values of a Variable and their Corresponding Frequencies

Fig. 14

Continuous frequency curves are used to depict such data which
give continuous series and in which an item can assume any value
of the variable within a particular class limit. In such cases, as far as
possible, the class intervals should be of uniform size, otherwise the
curves can give a misleading impression. Ordinarily, there should not
be more than 15 class intervals, though there is no hard and fast rule in
this respect. No class interval should be left out for the reason that
there is no frequency against it.

If the data are classified, the bar frequency diagram or frequency
polygon representing them has to be smoothed for the purpose of getting
a continuous frequency curve. The process of smoothing a curve
is discussed below.
The data given in table XV below have to be shown by a smoothed
curve :-
TABLE XV
Age distribution of a group of people

Age (in Years)      No. of people

  8-10                 12
 10-12                 15
 12-14                 19
 14-16                 26
 16-18                 38
 18-20                 51
 20-22                 )1
 22-24                 64
 24-26                 60
 26-28                 53
 28-30                 42
 30-32                 31
 32-34                 18
 34-36                 10
 36-38
 38-40

These data should first be plotted in the shape of either a bar
frequency curve or a frequency polygon. Figures No. 15 A and 15 B
represent the data in the shape of a bar frequency curve and a frequency
polygon respectively.
Age distribution of a group of people

Fig. 15 A

In plotting the data in the above manner it should be noted that
the various bars are joined together and no space is left blank between
them. This gives an impression of continuity of the distribution. It
is possible to obtain a frequency polygon from this graph. If the
mid-points at the top of each bar are joined together by straight lines, a
frequency polygon would be obtained. A frequency polygon can be
directly plotted also if instead of drawing bars only points are marked
against various values. However, it should be remembered that the
points in such a case should be plotted against the mid-values of each
class-interval. Thus in the above data the frequency of 12, which is
against the class interval of 8-10 years, would be plotted against the
value 9 years on the x-axis. If the data given in table XV are plotted in this
manner it would give a frequency polygon of the shape shown in Fig. 15 B.
Age distribution of a group of people          Age distribution of a group of people

(Vertical axis: No. of People. Horizontal axis: Age)

Fig. 15 B                                      Fig. 15 C
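The rule that the frequency of a class is plotted against its mid-value can be illustrated with a short Python sketch. Only the first five classes of Table XV are used, and the function name `polygon_points` is merely illustrative.

```python
# (lower limit, upper limit, frequency) for the first five classes of Table XV.
classes = [(8, 10, 12), (10, 12, 15), (12, 14, 19), (14, 16, 26), (16, 18, 38)]


def polygon_points(classes):
    """Return (mid-value, frequency) pairs, the vertices of the polygon."""
    return [((lower + upper) / 2, freq) for lower, upper, freq in classes]


points = polygon_points(classes)
for mid_value, freq in points:
    print(mid_value, freq)
```

Joining these points in order by straight lines gives the frequency polygon; the first point lies at 9 years, as stated in the text.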

The frequency polygon shown in Fig. 15 B can be smoothed by
freehand. The angularities can be removed and a smoothed frequency
curve of the type shown in Fig. 15 C can be obtained.
In smoothing a frequency polygon extreme care should be
exercised. It should never be forgotten that a smoothed curve is meant
to represent a frequency bar diagram or a frequency polygon, and as such
in smoothing a curve the main characteristics of the data should not be
obliterated. The top of the smoothed curve would generally be above
the highest point of the frequency polygon. As far as possible the curve
should be made regular without any angularities, but smoothing should
not distort the original shape of the curve. The extent to which
smoothing can be done would depend on the type of the data that are being
represented. There are some phenomena which ordinarily give a normal
or moderately skew curve. In such cases the polygon can safely be
smoothed to a considerable extent. Examples of this type of data are
distributions obtained by tossing of coins or the measurement of the leaves
of a big tree, etc. Frequency polygons representing economic or
social phenomena do not ordinarily give a normal curve. In such cases
smoothing should not be done beyond a certain limit, otherwise the
nature of the distribution would be misrepresented. In such cases only
minor irregularities should be removed. Another point to be kept in
mind while smoothing a curve is that the area under the smoothed curve
should be the same as the area of the polygon, or it should be equal to
the total of the frequencies. It is always better to first draw a frequency
bar diagram, then join the top mid-points of the bars to get a frequency
polygon, and after this to smooth the frequency polygon to get the
smoothed curve.

THEORETICAL FREQUENCY CURVES*

Normal frequency curve or normal curve of error

One of the most important properties of a normal curve is that
it is not skew, or, in other words, it is perfectly symmetrical. The
shape of such curves is like that of a bell and that is why they are
sometimes called bell-shaped frequency curves. In a normal distribution the
frequency decreases symmetrically on either side of the central value,
which has the maximum frequency. As has been said in the chapter
on Skewness, in such frequency distributions the values of mean, median
and mode coincide. A normal distribution takes all values
from −∞ to +∞ and in it the frequencies are in a definite mathematical
relationship. The mathematical relationship is that the logarithm of the
frequency at any distance x from the mean is less than the logarithm of the
frequency at the mean by a quantity which is proportional to x². Since this
relationship holds good for all distances from the centre of the
distribution, the frequencies in a normal curve always stand in definite
proportion to the frequency at the centre. It follows from it that the area
covered between two perpendiculars, one at the centre and the other
at any other point on the base line, would always be a fixed ratio of the
total area of the curve. This area relationship of a normal curve is of
very great significance in statistical analysis. We shall discuss it in a
later chapter.
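The two properties just stated can be checked numerically. The sketch below assumes the usual formula of the normal curve with mean 0 and standard deviation 1 (the formula itself is not given in this section), and the function names are illustrative. It shows that the fall in the logarithm of the frequency is proportional to the square of the distance from the mean, and that the area between the centre and any other perpendicular is a fixed proportion of the total area.

```python
import math


def normal_density(x, mean=0.0, sd=1.0):
    """Ordinate of the normal curve at x (total area under the curve = 1)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def area_from_mean(x, mean=0.0, sd=1.0):
    """Proportion of the total area lying between the mean and x."""
    z = abs(x - mean) / sd
    return 0.5 * math.erf(z / math.sqrt(2))


for x in (1.0, 2.0, 3.0):
    # Fall in the logarithm of the frequency, compared with the centre.
    drop = math.log(normal_density(0.0)) - math.log(normal_density(x))
    print(x, round(drop / x ** 2, 6), round(area_from_mean(x), 4))
```

The middle column is the same for every distance, which is the proportionality to x² stated above; the last column gives the fixed proportions of area.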

* For detailed description see chapter on Theoretical Frequency Distributions.


It should be remembered, however, that a perfectly symmetrical
curve may not necessarily be a normal curve. A perfectly symmetrical curve
may be flat-topped whereas a normal curve is never such. Again, in a
perfectly symmetrical curve values may taper off to zero, but in a normal
curve values never taper off to zero. A normal curve has a tendency
to slope towards the base line but it never touches it. Theoretically
the scale on the base line of a normal curve extends to infinity. The
data relating to economic and social phenomena never give a normal
curve. Such phenomena give moderately asymmetrical curves.

A normal curve can be used to represent data which are of a
theoretical nature and in which a particular type of mathematical relationship
holds good. For example, the distribution obtained by tossing twelve
coins 4096 times may give a symmetrical distribution and these data
may be represented by such a curve. Figure No. 16 below gives the
shape of a normal curve :-

Normal Curve of Error

Fig. 16

Moderately asymmetrical frequency curves
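The coin-tossing example mentioned above can be worked out exactly. The following Python sketch assumes fair coins and uses the binomial law (an assumption; the text does not derive it here) to give the theoretical number of times each count of heads would be expected in 4096 tosses of twelve coins.

```python
import math

coins = 12
tosses = 4096  # equal to 2 ** 12, so every expected frequency is a whole number

# Expected number of tosses showing 0, 1, 2, ..., 12 heads.
expected = [
    tosses * math.comb(coins, heads) // 2 ** coins
    for heads in range(coins + 1)
]

for heads, frequency in enumerate(expected):
    print(heads, frequency)
```

The frequencies rise to a single maximum at six heads and fall away symmetrically on either side, which is why such data may be represented by a bell-shaped curve.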

In such cases, as the name suggests, the curves are not symmetrical.
The frequencies of various values are not in any mathematical
relationship. They are the most common type of curves, as in actual practice
perfectly symmetrical curves are rarely obtained. Such curves may be
either positively skew or negatively skew. Figure No. 17 shows a
curve which is positively skew and Figure No. 18 a curve which is
negatively skew :-
Positively Skew Curve

Fig. 17

Negatively Skew Curve

Fig. 18

J-shaped or extremely asymmetrical frequency curves

In such curves skewness is very high and the size of the item
having the maximum frequency is generally in one corner (not in
the middle as in the case of symmetrical curves). Figure No. 19 below
shows a J-shaped curve :-
J-Shaped Curve

Fig. 19

U-shaped curves

In the U-shaped curve the values at the extremes have very high
frequencies and the values at the middle have very low frequencies.
Figure No. 20 shows such a curve :-

U-Shaped Curve

Fig. 20

Fig. 21

CUMULATIVE FREQUENCY CURVES OR OGIVES

The frequency curves which have been discussed so far related
to such series in which the frequency of each class was shown against it.
Sometimes it is more useful to cumulate these data and to draw what
are called cumulative frequency curves. It is important to note that
series can be cumulated in two ways. In the first method the
frequencies of all preceding class-intervals are added to the frequency of a class,
and in the second method the frequencies of all succeeding classes are added
to the frequency of a class. The series cumulated according to the
first method would show the frequency of classes less than a particular
value, whereas the series cumulated according to the second method
would show the frequency of classes more than a particular value. The
first method gives a series cumulated upwards and the second one
a series cumulated downwards.
The technique of drawing frequency curves and cumulative
frequency curves is more or less the same. The only difference is that in
the case of simple frequency curves the frequency is plotted against the
mid-point of a class interval, whereas in the case of a cumulative frequency
curve it is plotted at the upper or lower limit of a class interval,
depending upon the manner in which the series has been cumulated. It will be
plotted against the upper limit of a class-interval when the cumulation
has been done upwards or when a "less than" series has been obtained.
If, however, cumulation has been done downwards so that a "more than"
series is obtained, frequencies would be plotted against the lower limit
of class-intervals. Table No. XVI below refers to the age distribution
of a group of students. The figures have been cumulated upwards so
that against each class interval the total of the frequencies of that class
and all previous classes is written. The cumulative frequency curve
representing the data is plotted in figure No. 22. From it median and
quartiles have also been calculated.
TABLE XVI
Age distribution of a group of students

Age         Frequency      Cumulative
                           Frequency

 5-6                          40
 6-7                          96
 7-8                         156
 8-9                         222
 9-10                        306
10-11                        402
11-12                        494
12-13                        574
13-14                        638
14-15                        682
15-16                        702
16-17                        710

Age distribution of a group of students

(Horizontal axis: Age)

Fig. 22

The above graph shows that the total number of students
is 710; as such the value of the median will be the age of the 355th student,
of the first quartile the age of the 177.5th student, and of the third quartile
the age of the 532.5th student (in the graphic method the median is the
value of the (n/2)th item, and the first and third quartiles are similarly
the values of the (n/4)th and (3n/4)th items respectively).
To find the value of the median, first of all the point on the vertical line
which reads 355 would have to be located. From this point a line would
be drawn parallel to the base, touching the ogive at some point. From
this point of intersection another line would be drawn parallel to the
vertical scale, touching the base line. The value at the point where
the base line is touched would be the value of the median. In the above
graph the value of the median located in this fashion comes to about 10.5
years. In the same manner values of quartiles, deciles and percentiles
can be located.
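The graphic location of the median can be imitated by calculation if the ogive is taken to consist of straight lines joining the plotted points. The Python sketch below makes that assumption (a freehand curve would give slightly different readings) and uses the cumulative figures of Table XVI. The cumulative figure for the class 8-9 is taken as 222, which is an assumed reading of a doubtful entry, and the function name `value_at` is illustrative.

```python
# Upper limits of the classes, preceded by the lower limit of the first class.
limits = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
# "Less than" cumulative frequencies at those limits (222 is an assumed reading).
cumulative = [0, 40, 96, 156, 222, 306, 402, 494, 574, 638, 682, 702, 710]


def value_at(rank, limits, cumulative):
    """Value on the base line at which the ogive reaches the given rank."""
    for i in range(1, len(cumulative)):
        if cumulative[i] >= rank:
            below, above = cumulative[i - 1], cumulative[i]
            fraction = (rank - below) / (above - below)
            return limits[i - 1] + fraction * (limits[i] - limits[i - 1])
    raise ValueError("rank exceeds the total frequency")


n = cumulative[-1]
median = value_at(n / 2, limits, cumulative)
third_quartile = value_at(3 * n / 4, limits, cumulative)

print(round(median, 2))
print(round(third_quartile, 2))
```

The median found in this way agrees with the value of about 10.5 years read from the graph.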
Just as frequency curves are smoothed, similarly the ogives or
cumulative frequency curves can also be smoothed. The data given
in table XVII below have been cumulated by both the methods discussed
earlier, and the smoothed cumulative frequency curves have been shown in
figures Nos. 23 and 24 :-

TABLE XVII

Height in Inches   Frequency   Cumulative           Cumulative
                               Frequency (less      Frequency (more
                               than 58", 59",       than 57", 58",
                               etc.)                etc.)

57-58                  3            3                  638
58-59                  9           12                  635
59-60                 26           38                  626
60-61                 31           69                  600
61-62                 45          114                  569
62-63                 64          178                  524
63-64                 78          256                  460
64-65                 85          341                  382
65-66                 96          437                  297
66-67                 72          509                  201
67-68                 60          569                  129
68-69                 43          612                   69
69-70                 20          632                   26
70-71                  6          638                    6
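The two methods of cumulation can be reproduced with a short Python sketch. The frequencies are those of the table above, read so that the two cumulative columns agree with each other; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies of the classes 57-58, 58-59, ..., 70-71 inches.
frequencies = [3, 9, 26, 31, 45, 64, 78, 85, 96, 72, 60, 43, 20, 6]

# Cumulated upwards: each class plus all preceding classes ("less than" series).
less_than = list(accumulate(frequencies))

# Cumulated downwards: each class plus all succeeding classes ("more than" series).
more_than = list(accumulate(reversed(frequencies)))[::-1]

for f, less, more in zip(frequencies, less_than, more_than):
    print(f, less, more)
```

For any class the "less than" figure and the "more than" figure of the next class together make up the total of 638, which is a convenient check on the cumulation.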
Height distribution of a group of persons
(less than curve)
after smoothing

(Horizontal axis: Less than Height in inches)

Fig. 23

Height distribution of a group of persons
(more than curve)
after smoothing

(Horizontal axis: More than Height in inches)

Fig. 24
Cumulative frequency curves have some advantages over simple
frequency curves. Simple frequency curves cannot be compared with
each other unless the magnitude of the class intervals is uniform in all
the series. But there is no such restriction in the case of cumulative
frequency curves. Again, uneven class intervals of series may distort
simple frequency curves, but a cumulative frequency curve is not affected
by the unequal size of the class-intervals. Cumulative curves are well
adapted for interpolation. We have already seen how the values of
median, quartiles, deciles and percentiles, etc., can be interpolated from
such graphs.
The relationship between simple frequency curves and cumulative
frequency curves can be better understood from figure No. 25, which
has been constructed from the data given below in Table XVIII :-
TABLE XVIII
Daily Income of a group of persons

Income                           Cumulative
Rs.         No. of Persons       frequency
                                 (Less than)

 1-2              3                   3
 2-3             10                  13
 3-4             14                  27
 4-5             22                  49
 5-6             38                  87
 6-7             40                 127
 7-8             38                 165
 8-9             33                 198
 9-10            20                 218
10-11            11                 229
11-12            10                 239
12-13            11                 250
13-14             8                 258
14-15             4                 262
15-16             4                 266
16-17             3                 269

Total           269
Daily Income of a group of persons
(Simple and Cumulative Frequency Curves)

Fig. 25
The upper part of the above figure shows the method by which
an ogive is constructed. The area of the various rectangles representing the
frequencies in different classes is in proportion to the number of items
in their respective groups. Since the operation is cumulative, the base
of a rectangle is the cumulative frequency of all previous class intervals.
Thus, the base of the first rectangle is zero, that of the second 3 and that
of the third 13 and so on. The cumulative frequency curve passes
through the upper limits of the various class intervals as the data have
been cumulated upwards. The lower part of the figure gives simple
frequency bar graphs. From it a frequency polygon can be constructed,
which in turn can be smoothed to give the simple frequency curve.
Galton's method of locating median
Francis Galton has given a method by which the median can be
located graphically without cumulating the series. In this type of graph
the values of the variable are marked on the horizontal line and the
frequencies on the vertical line. The special feature of this graph is
that for each successive plotting of frequency the previously plotted
point is taken as the base. The frequencies are shown by plotting
points equal in number to the frequency. A curve is then drawn passing
through the middle of the various groups of points plotted in the above
manner. A line is then drawn parallel to the base line from that point
on the vertical scale whose value is the serial number of the median item.
This line touches the curve and from this point of intersection a
perpendicular is drawn towards the base line. The point at which the
perpendicular touches the base line gives the value of the median. The
data given in table XIX below are plotted in this fashion in Figure No. 26 :-

TABLE XIX
Marks obtained by 32 students

Marks     No. of students      Marks     No. of students

  31            1                45            2
  33            2                50            5
  34            4                54            3
  35            4                58            1
  40            5                60            2
  42            3

Marks obtained by 32 students

(Horizontal axis: Marks)

Fig. 26

In the above data the frequency against 31 is 1. Therefore, one
dot has been placed above the value of 31. Now for plotting the
frequency against the value of 33 the base would be at the level where the
previous point was plotted. If necessary a line can be drawn at the level
where the first point was plotted. From this new base line the frequency
of 2 against the value of 33 will be shown by plotting two points vertically.
These represent the marks of the second and third students and so these
points would be on the lines on which the frequencies are 2 and 3
respectively. Now for plotting the next frequency of 4, which is against
the value of 34, a new base line would be taken. This base line would
be at the level at which the last point (point No. 3) was plotted. In this
way the whole data would be plotted on the graph. After this a curve
would be drawn passing through the mid-points of these sets of points.
If there are three points in a set the curve would touch the second point.
If there are only two points it would pass through their middle. In this
way the curve can be very easily plotted. The value of the median can
be very easily located after this. In the above case the median is the
value of marks obtained by the 16th student. Therefore, from the point
where the vertical scale reads 16 a line parallel to the base has been
drawn. This line touches the curve and from this point a perpendicular
has been drawn towards the base line. It touches the base line at a point
showing the value of 41. This is the value of the median.
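Galton's construction can also be followed arithmetically. The sketch below is an illustration under one stated assumption: the curve through the middles of the sets of points is taken as a series of straight lines, whereas in the book it is drawn freehand. The frequency of 1 against 58 marks is inferred from the total of 32 students and does not affect the result. The function name `galton_median` is hypothetical.

```python
marks = [31, 33, 34, 35, 40, 42, 45, 50, 54, 58, 60]
students = [1, 2, 4, 4, 5, 3, 2, 5, 3, 1, 2]


def galton_median(values, freqs):
    """Median read from a curve through the middle of each stack of points."""
    target = sum(freqs) / 2  # serial number of the median item
    curve = []
    below = 0  # number of points already plotted under the current stack
    for value, freq in zip(values, freqs):
        # The stack occupies heights below+1 ... below+freq; take its middle.
        curve.append((value, below + (freq + 1) / 2))
        below += freq
    for (x0, h0), (x1, h1) in zip(curve, curve[1:]):
        if h0 <= target <= h1:
            return x0 + (target - h0) / (h1 - h0) * (x1 - x0)
    raise ValueError("median item lies outside the plotted curve")


print(galton_median(marks, students))
```

The middle of the stack above 40 marks is at a height of 14 and that of the stack above 42 marks at 18, so the line drawn at a height of 16 meets the curve halfway between them, at 41 marks.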

GRAPHS ON RATIO SCALE

Natural and Ratio Scale

All the graphs that have been discussed so far have been drawn on
natural or arithmetic scale. Such graphs indicate the absolute
changes in the values of a variable. Ratio scale is meant to study relative
changes in values. It tells us about the rate or ratio of change. The
usefulness of such studies is too well known to be emphasised. The
figure below discloses the difference between the natural scale and a ratio
scale.

Natural Scale            Ratio Scale
      5              32      320      3200
      4              16      160      1600
      3               8       80       800
      2               4       40       400
      1               2       20       200
      0               1       10       100

Fig. 27

It will be noticed that in the natural scale equal differences are
measured by equal distances on the scale. Thus the difference between 1
and 2 is shown by a certain distance, and the difference between 4 and
5 is also shown by the same distance. These differences are absolute and thus
the natural scale measures absolute differences. In the ratio scale the
difference between 1 and 2 is measured by the same distance as that between
8 and 16. 2 is two times 1 and 16 is also two times 8. Thus, the proportionate
change in the two values is equal. Ratio scale thus measures
proportionate changes. Natural scale is always in arithmetic progression,
whereas the ratio scale is in geometric progression. The difference between
natural and ratio scales is further clarified in the following table :-

TABLE XX

Monthly income of an Individual

Month    Income (Rupees)    Monthly Increase    Percentage Increase
  1           100                  -                    -
  2           200                 100                  100
  3           300                 100                   50
  4           400                 100                   33.3
  5           500                 100                   25
  6           600                 100                   20
  7           700                 100                   16.7
  8           800                 100                   14.3

From the above table it is clear that whether the income increases
from 200 to 300 or from 700 to 800 the absolute increase is equal in both
the cases. However, the relative changes are not equal. A change
from 100 to 200 shows an increase of 100% whereas an increase from
700 to 800 shows an increase of 14.3% only. Ratio scale very clearly
shows this difference.
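The distinction between absolute and relative change can be checked with a short computation. The following is only an illustrative sketch in Python; the function name is an assumption made for this example, and the income figures are those of Table XX.

```python
# Income series from Table XX (rupees per month).
incomes = [100, 200, 300, 400, 500, 600, 700, 800]


def absolute_and_relative_changes(values):
    """Return (absolute increase, percentage increase) for each step.

    Hypothetical helper written for this illustration only.
    """
    changes = []
    for previous, current in zip(values, values[1:]):
        absolute = current - previous
        percentage = round(100 * absolute / previous, 1)
        changes.append((absolute, percentage))
    return changes


# Month 2 is the first month for which an increase can be computed.
for month, (absolute, percentage) in enumerate(
        absolute_and_relative_changes(incomes), start=2):
    print(month, absolute, percentage)
```

The absolute increase stays at 100 throughout, while the percentage increase falls from 100 to 14.3, which is the difference a ratio scale is meant to bring out.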

Logarithmic Scale and Logarithmic Curve

The relative changes can be studied graphically in the following
two ways:
(1) By plotting the logarithms of the given values on a natural scale.
(2) By plotting the given values on a logarithmic scale.
In the second method only the vertical scale is in logarithmic ratio.
The horizontal scale is in arithmetic ratio. For this reason graphs drawn
in this manner are also called Semi-Logarithmic Graphs.
In Table XXI below, increments of two sums (A and B) of Rs. 100
and Rs. 1,000 respectively at 10% compound interest rates are shown.

TABLE XXI

Increment of two sums A and B at 10% Compound Interest

Year      A        B      Logarithm A    Logarithm B
  1      100     1000        2.00           3.00
  2      110     1100        2.04           3.04
  3      121     1210        2.08           3.08
  4      133     1330        2.12           3.12
  5      146     1460        2.16           3.16
  6      161     1610        2.20           3.20
  7      177     1770        2.24           3.24
  8      195     1950        2.28           3.28
  9      214     2140        2.32           3.32
 10      236     2360        2.36           3.36



The above data are plotted on a natural scale in Figure No. 28
below:

Fig. 28. Sums of Rs. 100 and Rs. 1,000 rising at Compound Interest rate
of 10% on Natural Scale (vertical axis: Rs.; horizontal axis: Years)
From the above figure, it appears that the sum of Rs. 1,000 is
increasing at a higher rate than the sum of Rs. 100. This is so on account
of the fact that natural scale measures absolute changes and the absolute
changes in case of curve B are no doubt more than the absolute changes
in curve A. If, however, the data are plotted on ratio scale, this anomaly
would disappear. In Figure No. 29 below the original data are not
plotted but their logarithms have been plotted on a natural scale.

Fig. 29. Sums of Rs. 100 and Rs. 1,000 rising at Compound Interest rate
of 10% (vertical axis: Logs; horizontal axis: Years)
From the above figure, it is clear that the two sums are increasing at
the same rate. This graph can be plotted on semi-logarithmic scale
also. In this case instead of logarithms, the original data would be plotted
on a semi-logarithmic paper as in Figure 30.
Fig. 30. Sums of Rs. 100 and Rs. 1,000 rising at Compound Interest rate
of 10% (vertical axis: Amount in Rs., on logarithmic scale; horizontal
axis: Years)
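The behaviour of the two sums on natural and ratio scales can also be verified numerically. The sketch below, in Python, is an illustration only: the function names are assumptions of this example, and the amounts are computed afresh rather than copied from Table XXI, so the logarithms are not rounded in the same way.

```python
import math


def compound_series(principal, rate, years):
    """Amount standing at the start of each year under annual compounding."""
    return [principal * (1 + rate) ** t for t in range(years)]


def steps(values):
    """Differences between consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


sum_a = compound_series(100, 0.10, 10)    # sum A, starting at Rs. 100
sum_b = compound_series(1000, 0.10, 10)   # sum B, starting at Rs. 1,000

# Natural scale: yearly absolute increases differ between A and B.
absolute_a = steps(sum_a)
absolute_b = steps(sum_b)

# Ratio scale: yearly increases in the logarithms are the same for both.
log_a = steps([math.log10(v) for v in sum_a])
log_b = steps([math.log10(v) for v in sum_b])

print([round(v) for v in sum_a])
print(round(absolute_a[0]), round(absolute_b[0]))
print(round(log_a[0], 4), round(log_b[0], 4))
```

Every absolute increase of B is ten times that of A, whereas the yearly increase in the logarithm is the same for both sums (the logarithm of 1.1), which is why the two curves run parallel on a ratio scale.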
Reading graphs on ratio scale

(1) If a logarithmic curve rises upwards, it indicates that the growth
is positive and values are increasing; if it falls downwards, it indicates
a reverse tendency.
(2) If a curve is a straight line, it indicates that the rate of growth
is constant.
(3) If the curve rises more steeply at one point than at the other,
it indicates that the rate of increase in the former case is more than the
rate of increase in the latter. Similarly, if a curve falls more steeply at
one point than at the other, it indicates that the rate of decrease is more
in the former case than in the latter.
(4) If two curves are parallel to each other, the rate of increase or
decrease in them is identical. If one curve is more steep than the other,
its rate of change is higher than that of the other curve.
Special features of ratio scale
(1) In ratio scale equal distances on the vertical scale represent
equal proportionate changes in values. Thus, the difference between
10 and 20, 100 and 200 and 1,000 and 2,000 would be measured by equal
vertical distances on the ratio scale.
(2) Ratio scale does not begin with zero. This is a great advantage.
We have seen that in natural scale, many times, a false base line has to
be used to accommodate big figures and this leads to inaccurate
representation of data. In ratio scale there is no such difficulty. However,
it is not possible to show zero or negative values on a ratio scale.
(3) It is not necessary to have a base line in ratio scale. Any
curve can be shifted upwards or downwards without affecting the
proportionate changes in its values. This is in fact a very great merit of
ratio scale as it facilitates comparison. Two or more curves which
have to be compared can be brought close to each other and studied.
(4) Ratio scale can easily represent such data in which the range is
very high. In a natural scale, it is not possible without false base line.
(5) It is not possible to study an aggregate in its various component
parts in a ratio scale.
(6) Two or more scales can be shown on the same paper. Similarly,
two or more series which are in different units can be presented
on one graph paper in ratio scale.
(7) Logarithmic graphs are very useful for purposes of showing
a series of indices or index historigrams. Index historigrams are meant
to study relative changes and as such should be presented on ratio scale.
Ratio scale, however, cannot study absolute changes.
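Features (1) and (2) above can be illustrated by a small sketch in Python. It assumes that a position on the ratio scale is given by the common logarithm of the value; the function names are hypothetical and serve this example only.

```python
import math


def ratio_scale_distance(low, high):
    """Vertical distance between two positive values on a log10 scale."""
    return math.log10(high / low)


def can_be_plotted(value):
    """A value has a place on the ratio scale only if its logarithm exists."""
    return value > 0


# Pairs quoted in feature (1): each pair stands in the ratio 1 : 2.
pairs = [(10, 20), (100, 200), (1000, 2000)]
distances = [ratio_scale_distance(low, high) for low, high in pairs]
print(distances)

# Feature (2): zero and negative values cannot be shown.
print(can_be_plotted(0), can_be_plotted(-5), can_be_plotted(5))
```

All three distances are equal because each pair has the same ratio, and zero or negative values are rejected because they have no logarithm.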
GRAPHS SHOWING FUNCTIONAL RELATIONSHIP
When the relationship between two variables is one of complete
dependence, it is said to be functional relationship. Thus, if the value of
y can be determined by given values of x, y is said to be a function of x.
Generally, such relationships are expressed by the equation y = f(x).
If, for example, the value of y is always two times that of x, y = 2x. In
this case x is an independent variable. The values of the independent
variable are shown on x-axis and those of the function (or dependent
variable) on y-axis. If it is done, a curve can be obtained to represent this
relationship. Functional relationship can be of various types. We
shall discuss below some simple types of functional relationships and
the graphs which are constructed to represent them.
Linear relationship
Linear relationships are those which always give a straight line when
plotted on graph paper. They can be obtained in a number of ways.
If two variables are related in such a manner that their values are always
equal, the relationship is obviously of the form y = x. Here, the values of
y and x are always equal. If such values are plotted on a graph paper,
a straight line would be obtained and if the horizontal and vertical
scales are equal, this line would run diagonally, bisecting the two right
angles at the left bottom and at the right top of the figure. Figure No.
31 below gives such a graph:
Fig. 31. Graph of the equation y = x

In the above graph when the value of x is 1, the value of y is 1 and
when the value of x is 2, the value of y is also 2. All corresponding values
of x and y are equal. Simple equations of the first degree also disclose
linear relationships. Such equations are of the form y = a + bx. In this
equation, a is a constant representing the distance from the point of
origin to the place where the line would touch the vertical scale or y-axis
and b is a constant representing the tangent of the angle which the line
would make with the horizontal base. If a = 1 and b = 3, the equation
would be y = 1 + 3x. To construct the graph, various values of x can be
assumed and the corresponding values of y can be calculated from the
above equation. Thus, the following type of series can be obtained.
TABLE XXII
Values of x and y in the equation y = 1 + 3x
   x      y
   1      4
   2      7
   3     10
   4     13
   5     16
If the above data are plotted on a graph paper, the following type
of figure would be obtained:

Fig. 32. Curve of the equation y = 1 + 3x

It is clear that the function is linear and that is why a straight line
has been obtained by plotting the values. Any two values of x and the
corresponding values of y were sufficient to locate the line. That the curve
represents the equation is proved by the fact that the above equation is
satisfied by the co-ordinates of various points of this curve. In a linear
relationship if one variable increases by a constant amount, the
corresponding increase in the other variable is also constant. In the above
case x variable always increases by 1 and y variable by 3. Such series
in which there is constant increment of this type are called arithmetic
series.
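The series for y = 1 + 3x and its constant increment can be reproduced with a few lines of Python. This is a minimal sketch; the function name is an assumption of this example.

```python
def linear(a, b, x):
    """Value of y in the first-degree equation y = a + bx."""
    return a + b * x


xs = [1, 2, 3, 4, 5]
ys = [linear(1, 3, x) for x in xs]          # a = 1, b = 3

# Increase in y for each unit increase in x.
increments = [later - earlier for earlier, later in zip(ys, ys[1:])]

print(ys)
print(increments)
```

The increments are all equal to b, which is what makes the series arithmetic and the graph a straight line.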
In physical sciences various types of linear relationships are found.
In economic and social phenomena such relationships are rare. However,
one example is that of the growth of money at simple rate of interest.
If y represents the sum to which Re. 1 would amount at the end of x
years at r rate of simple interest, the equation would be of the following
type:
y = 1 + rx
Thus in ten years at 5% simple interest, Rs. 100 would amount to
100 + (5 x 10) or Rs. 150. In this case r is constant and so the relationship
is linear. If plotted on a natural scale such data would always give a
straight line.
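The simple interest example can be worked out as follows. The sketch is in Python and the function name is hypothetical; it assumes the rate is quoted as a percentage per year.

```python
def simple_interest_amount(principal, rate_percent, years):
    """Amount to which a principal grows at simple interest."""
    return principal + principal * rate_percent * years / 100


# Rs. 100 at 5% simple interest, at the end of years 0 to 10.
amounts = [simple_interest_amount(100, 5, t) for t in range(11)]
yearly_increase = [later - earlier
                   for earlier, later in zip(amounts, amounts[1:])]

print(amounts[10])
print(yearly_increase)
```

The amount at the end of ten years is Rs. 150 and the yearly increase is the same every year, so the points lie on a straight line when plotted on a natural scale.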
Non-linear relationship
Non-linear functions are of many types. We shall discuss some
common forms of such relationships, particularly those which can fit
into economic and social phenomena. Parabolic and hyperbolic functions
are very common in physical sciences and they can fit in economic
data also. Parabolic curves are used to represent data to which laws
of increasing and decreasing returns apply. Demand curves and utility
curves are also parabolic in nature. The general form of equation in
such cases is y = ax^b. The curve is parabolic when the exponent b is
positive and hyperbolic if it is negative. In such curves there is no
constant term. The following examples would illustrate the difference
between parabolic curves and hyperbolic curves.
If the equation is y = x^2, the following series can be obtained:

   x      y
  -4     16
  -3      9
  -2      4
  -1      1
   0      0
   1      1
   2      4
   3      9
   4     16
The graph of the above data would be as follows:

Fig. 33. Parabolic curve of the equation y = x^2
The above figure is that of a parabolic curve. In such relationships
an important characteristic is that if x variable increases in geometric
progression y variable also increases in geometric progression. Thus,
if the values of x are 2, 4, 8 and 16, the corresponding values of y would
be 4, 16, 64 and 256. Both the series are in geometric progression.
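The geometric progression property of y = x^2 can be confirmed by computing the ratios between successive terms. The Python sketch below is illustrative; the helper name is an assumption.

```python
def successive_ratios(values):
    """Ratio of each term to the term before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


xs = [2, 4, 8, 16]
ys = [x ** 2 for x in xs]

print(ys)
print(successive_ratios(xs))   # common ratio of the x series
print(successive_ratios(ys))   # common ratio of the y series
```

The x series has a common ratio of 2 and the y series a common ratio of 4, so both are in geometric progression.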
If the relationship is y = x^(-1), the following series can be
obtained:
   x      y
        (x^(-1))

i
t z

I I

The graph of the above data would be of the following type:

Fig. 34. Hyperbolic curve of the equation y = x^(-1)

In economic data a common form of relationship is y = ab^x. In
these equations one of the variable quantities is an exponent and that is
why such equations are called exponential equations and the curves obtained
by them exponential curves.

If a = 1 and b = 2 in the relationship y = ab^x the following series can
be obtained:
   x      y
       (1 x 2^x)
   2      4
   4     16

If the above data are plotted on graph paper, the following type of
exponential curve would be obtained:
Fig. 35. Exponential curve representing the equation y = ab^x

It will be noted that in this case, x-variable increases in arithmetic
progression and y-variable in geometric progression. According to
Malthus, the relationship between food supplies and population should
give a curve of this type. In such a case, x-series would represent food
supplies and y-series, the population.
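The behaviour of the exponential equation with a = 1 and b = 2 can be sketched in Python as below. The range of x (0 to 5) is an assumption chosen for illustration, and the function name is hypothetical.

```python
def exponential(a, b, x):
    """Value of y in the exponential equation y = a * b**x."""
    return a * b ** x


xs = list(range(6))                       # 0, 1, 2, 3, 4, 5
ys = [exponential(1, 2, x) for x in xs]

x_steps = [later - earlier for earlier, later in zip(xs, xs[1:])]
y_ratios = [later / earlier for earlier, later in zip(ys, ys[1:])]

print(ys)
print(x_steps)    # constant difference: arithmetic progression
print(y_ratios)   # constant ratio: geometric progression
```

The x series moves by a constant difference of 1 while the y series is multiplied by 2 at every step, so x is in arithmetic and y in geometric progression.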
Curves based on the following relationship are also very
common:
y = a + bx + cx^2 + dx^3 + ...
Such series are called potential series. Though such equations do
not strictly give parabolas of the conical type yet they are popularly
known as parabolas of second order or third order depending on whether
the equation is carried up to x^2 or x^3 respectively. These parabolas
do not give curves of a uniform type. We shall study parabolic curves
in more detail in the chapters on Analysis of Time Series and
Interpolation.

Suppose in the relationship y = a + bx + cx^2, the value of a = 10, of
b = 2 and of c = 3, the following series can be obtained:
   x      y
   0     10 + (2 x 0) + (3 x 0^2) = 10
   1     10 + (2 x 1) + (3 x 1^2) = 15
   2     10 + (2 x 2) + (3 x 2^2) = 26
   3     10 + (2 x 3) + (3 x 3^2) = 43
   4     10 + (2 x 4) + (3 x 4^2) = 66
   5     10 + (2 x 5) + (3 x 5^2) = 95
If the above data are plotted on graph paper they would give a
curve of the following type:

Fig. 36. Parabolic curve of the equation y = a + bx + cx^2
The above curve can be used to study the relationship between
cost and return. With every constant increase of one unit in x-variable,
the y-variable increases by progressively increasing amounts. Thus,
if x-variable increases from 0 to 1, y increases by 5 units; if x increases
from 1 to 2, y increases by 11 units and if x increases from 2 to 3, y
increases by 17 units and so on. Such relationships can be used to
illustrate the law of increasing cost where with every increase in the
quantity produced cost rises more than proportionately.
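The progressively increasing increments of y = 10 + 2x + 3x^2 can be checked by taking first and second differences. The Python sketch below is illustrative, with hypothetical function names.

```python
def second_order_parabola(a, b, c, x):
    """Value of y in the equation y = a + bx + cx^2."""
    return a + b * x + c * x ** 2


def differences(values):
    """Differences between consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


ys = [second_order_parabola(10, 2, 3, x) for x in range(6)]
first = differences(ys)       # increase in y per unit increase in x
second = differences(first)   # growth of that increase

print(ys)
print(first)
print(second)
```

The first differences (5, 11, 17 and so on) keep growing, while the second differences are constant at 6, which equals 2c for a parabola of the second order.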
All the above mentioned types of curves showing functional
relationship can be plotted on ratio scale also. Thus, the functional
relationship y = x^2 can be expressed as log y = 2 log x; and y = ab^x can
be represented as log y = log a + x log b. Similarly other curves can
also be represented logarithmically.
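The two logarithmic forms can be verified numerically. The following Python sketch uses common logarithms and takes a = 1 and b = 2 for the exponential case; the test values of x are assumptions of this example.

```python
import math

xs = [1, 2, 3, 4, 5]
a, b = 1, 2

# y = x^2  should satisfy  log y = 2 log x
parabola_ok = all(
    math.isclose(math.log10(x ** 2), 2 * math.log10(x), abs_tol=1e-12)
    for x in xs
)

# y = a * b^x  should satisfy  log y = log a + x log b
exponential_ok = all(
    math.isclose(math.log10(a * b ** x),
                 math.log10(a) + x * math.log10(b),
                 abs_tol=1e-12)
    for x in xs
)

print(parabola_ok, exponential_ok)
```

In the first form log y is linear in log x, and in the second log y is linear in x, so the exponential curve becomes a straight line on semi-logarithmic paper.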
Questions
1. What points should be taken into account in the construction of graphs?
Discuss in detail.
2. What is false base line? Under what circumstances should it be used?
3. Differentiate clearly between natural scale and the logarithmic scale used
in graphical presentation of data.
4. Write short notes on:
(a) Ogive (b) Historigram (c) Frequency curve (d) Frequency polygon (e)
Zee-chart (f) Band and Zone curves (g) Normal curve of error.
5. Explain the essential difference between parabolic, hyperbolic and
exponential curves. Which of these is most common in economic data?
6. The following table gives the cost of living index numbers relating to
In~ labourers. Represent the data graphically : -
Year          Index Number
              (base 1944=100)
1946-47           100
1947-48           106
1948-49           120
1949-50           134
1950-51           137
1951-52           139
1952-53           145
1953-54           142
1954-55           146
7. The following table gives the values of Imports and of Exports of
(undivided) India for the years 1920-21 and 1921-22 in crores of rupees:
                  1920-21              1921-22
Month        Imports  Exports     Imports  Exports
April           22       28          26       18
May             24       28          21       20

l:t;
August
:~:~
~I 2.0
~~
2.I
~~
20
September       29       22          20       20
October         32       21          23       18
November        32       19          26       20
December        32       20          23       22
January         31       19          28       23
February        25       18          20       22
March           24       19          21       28
Plot the above figures on a graph paper, and show also the balance of trade.
(R. Co", ... Allabab1d. I9~8).

8. Study the following table and draw a graph on logarithmic scale to show
net supplies of cereals and adult population in (Undivided) India:

                                       Net Imports
Year      Production    Seed and      (+) or Exports    Adult
          of cereals    wastage       (-)               population
          '000 tons     '000 tons     '000 tons         '000

t9SS-3 6 54. 1 77 6.72:1. +1.793 %90. 84 6


1936-3') 59.n 8 7.445 +1.245 %94.9 1 7
I9H-3 8 ,8.739 7,34 2 + 63 0 29 8.9 87
193 8-39 54.468 6. 809 +z.o44 30 3.0 18
1939-40 57. 244 7. 1 56 +2,221 ~07,u8
1940-4 I
1941-42 ,6.,,0
54.808 6.85 1
7.069
+ 96 3
-r 43J%.
31 1 • 1 9 8
~15,%.69
1942 -43 ,8.726 7.34 1 - %.9% 3 1 !iI.399
1943-44 62.9 2 5 7. 86 5 + 29 8 ~23AIO
1944-45 59.5 2 7 7.441 + 693 !27A81

(B. CtI11I., AJ/aIMW, 1946),

9. What are the advantages of the Ratio Scale over the Natural Scale? Plot
the following data graphically on the logarithmic scale:
Year Total Notes issued Notes in circulation
in crores of rupees in crores of rupees
1933-34 167
1934-35 172
1935-36 167
1936-37 192
1937-38 185
1938-39 187
1939-40 237
1940-41 258
1941-42 410
1942-43 625
(B. Com., Nagpur, 1943).
10. Plot the following figures relating to population of India (undivided) so as
to show the proportionate increase in population from one period to another:
Year Population
(000,000's omitted)
t87J uo
1881 :I.~O
18 9 1 2.90
1901 '295
19 I1 ;15
I9 u 5to
19;1 HO
1941 390
(B.o.•• NIfdX#T. I~J).
11. Show the results of working of class I railways graphically and comment
thereon.
(In million. of
Capital outlay Gtpn earning)
I9.z~-24 464 JO
1924-1, 473 74
79 2 S-16 4 87 n
192 6- 2 7 50 S 71
19 27-.'l8 594 86
192 8-2 9 599 86
19 2 9-3 0 61 7 84
1930-3 1 62 7 77
193 1-3 2 63 1 71
193%-,33 63 8 70
19;;:-34 635 71
CB. CD1II., .AlII, IHO).
12. Represent graphically the data given below on a single sheet of graph paper
to bring out clearly the relative fluctuations in the prices of various articles. Draw
such conclusions as you can from the graphs.
rr/HJ/~rll!s prit6r ;11 lVInpw
(hi rupees per maund)
Year Rice Wheat Linseed Gur Cotton Tobacco
19 zB 7.7 7. 0 6·s H·I 17.3
192 9 5.5 8.0 7·' 29.8 17. 1
193 0 3.6 6.5 6.2 17.3 14.,
1931 2.7 4. 2 4. 2 13.3 II.6
193 2 3·4 3·5 3·5 14. 8 4-9
I9H ,.2 3.4 3. 1 u·9 4·9
1934 2.8 3.6 4.1 13·. 5.7
(M. C-.• Alltlht."lItl. 19'''),
13. The following table shows the total sales of gold by the Bank of England
on foreign account. Represent the data graphically on the logarithmic scale:
Year Pounds ('000)
1910 14,44'
19I:f 8,2.2.8
1912. 9,670
1913 7,94~
1914 8,027
1915 "3,076
1916 2,360
CB. CDm., AliIIWd, 1932).
14. The following table gives cost of living index numbers of Kanpur, Nagpur
and Calcutta. Represent the data graphically:
Year Kanpur Nagpur Calcutta
(1939=100) (1939=100) (1939=100)
1944 ~I4 267 Z79
1945 ,08 219 aS3
1946 ,a8 as, 27'
1947 378 3 ao 30 9
1948 471 37 2 339
1949 47 8 377 348
1950 434 37% ;49
19,1 4,1 391 370
191 2 441 ,80 HI
19H 413 3 8, ,49

15. When should false base line be used? The following data give the Index
number of industrial profits in India. Represent it graphically:
Year    Index Number      Year    Index Number
        (1929=100)                (1929=100)
1941        187           1946        229
1942        222           1947        192
1943        246           1948        260
1944        239           1949        182
1945        334           1950        247
16. The following table gives the indices of the supply of money relating to
certain countries. Represent the data on a logarithmic scale:
Base 1948=100
Year ending    U. K.    U. S. A.    France
1945             86        92         47
1946             97        99         62
1947             98       102         71
1948            100       100        100
1949            101       100        125
1950            102       108        144
1951            103       115        170
1952            105       119        192
1953            100       121        214
17. The following table gives the data relating to production, wholesale
prices, and cost of living in India. Represent them at one place by a suitable graph:
Year and     Indices of      Indices of     Indices of cost
quarter      Industrial      wholesale      of living
             production      prices
(1939=100)
1951-5 2
I II7·' 459·9 144
2 II7·7 439·9 145
3 IZl., 43.5. 6 145
4 u6.0 401.9 139
195 2 -H
1 u6·7 373. 2 140
2 u8.: 386.7 143
5 1H·6 381 •2 142
4 13 2 .7 387 .4 142
19$3-54
I 135.5 395.9
2 134'9 407.3
3 137·5 39 1 • 2
4 137·3 395'~
18. The following table gives the prices of gold and wheat and net export of
gold during the years 1931-32 to 1938-39:
Years      Average price     Average price     Net export
           of gold           of wheat          of gold
           (per tola)        (per maund)       (crores of Rs.)
           Rs. as.           Rs.
:l93:1-3 Z 2S 4 3·'
:l93 2 -H 30 I2 3.3
:1933-34 33 6 ·::.8
1934-35 35 8 3.1
1935-,6 35 4 3. 2
1936-37 36 0 3.9
1937-3 8 36 6 3.0
193 8-39 H r.a 3.4
Plot the above figures on a graph paper and comment upon the relationship.
(M. A., Agra, 1943).
19. The following table gives the production of sugar in Cuba, Java and
(undivided) India during 1930-39 in millions of quintals. Represent the figures by a
suitable diagram and comment on their relationship.
Year Cuba Java India
19 2 9-3 0 44 '"9
193 -3 1
0 30 28 20
193 1 -;2 2S 26 24
193 2-33 19 14 28
1933-34 22 Ii ;0
1934-35 2S S ;1
1",~6 ~ 6 ;6
193 6-37 29 14 40
1937-3 8 29 14 52
193 8-39 26 ~, %7
(M. A., Palna, 194').
20. The following table gives the proportion of married women in 1910 and
in 1920 from women of every age. Show graphically that the increase was most marked
for the women of younger years.
Age     Married women %     Married women %
             1910                1920
18 17.0 19.2
zo ;6.2 3 8-4
SO·7 52 .9
62.0 64. 2
65-7 67.8
(B. Com., Nagpllr, 1944).
21. Describe the Lorenz graph. How does it differ from an Ogive? Illustrate
your answer by fitting a Lorenz curve and an Ogive to the following data:
Percentage of age distribution of the male
pop"latiDfl iff British I"dia, 19P'
Age groups    Males      Age groups    Males
0-10          28.0       50-60          5.6
10-20         20.9       60-70          2.7
20-30         17.7       70-over        1.1
30-40         14.3       Mean Age      23.2
40-50          9.7       (M. A., Patna, 1940).
22. Write a short account of the use of graphic methods in statistics. Draw
a diagram to represent the data given in the following table showing the number of
rooms measured, in a certain investigation in which the size lay between the limits
given in the column on the left. (Area calculated to the nearest square foot):
Area of room Number Area of room Number
(Sq. ft.) (Sq. ft.)
zo and under 40 ~ zoo and under 220 18
40 ..
60
60
80
14
16
Z20
140"
Z4°
260
u
,
80
100
120
.. 100
120
140
~6
51
H
260
280
300
..
.. .. z80
300
3 20
a
I
a
140
160
180
•• 160
180
200
35
IS
2(,
310
340
360
..
..
..
. 34°
360
380
26

Read off the median and quartiles from your diagram and check your results
by actual calculation. (B. Com., Hons., Andhra, 1944).
23.
Distribution of firms in Woollen and Worsted Industries
in Yorkshire, according to number of operatives
Operatives    No. of firms      Operatives    No. of firms
1-20              380           301-340           24
21-60             320           341-380           18
61-100            182           381-420           11
101-140           147           421-460           16
141-180            92           461-500            9
181-220            66           501-700           19
221-260            39           701-900           15
261-300            30           901-              16
Total number of firms 1384
Represent this distribution graphically (by means of a cumulative diagram) and
from this graph estimate the median and quartiles of the group. (B. Com., London,
1930).
24. Represent the following frequency distribution by means of a graph.
Construct the cumulative frequency curve also.
Class interval    Frequency      Class interval    Frequency
0-5                   13         30-35                250
5-10                  42         35-40                237
10-15                135         40-45                135
15-20                237         45-50                 42
20-25                250         50-55                 13
25-30                256

25. The following table gives the figures of production of paper and paper
boards in India in 1000 tons. Represent these figures by a Z-curve.
:ar Jan. Feb. Mar. A~r. May June July Aug. Sep. Oct. Nov. Dec.
'49 8.1 7.7 8·7 9. 1 9·0 9. 1 9. 1 9.0 9. 0 8.4 8.z 8·9
50 8.3 8.5 9. 1 8·7 9·5 8·7 9·3 9·4 9·4 9.0 9·3 9·7
P 10.0 9.9 II.O 10.5 I1.Z II.O 10.9 10.1 II.~ II·4 II.4 n.l
26. The following table gives the wholesale price index numbers of certain
countries. Represent the data on logarithmic scale.
Base 1948=100
Year and Month    India    U.K.    U.S.A.
1953
January            103     150      105
February           104     148      105
March              105     150      105
April              105     152      105
May                108     151      105
June               110     150      105
July               111     150      106
August             112     149      106
September          110     149      106
October            107     148      106
November           106     149      105
December           106     149      105

27. What is an exponential curve? Construct the graph of the function y = 2^x
for the following values of x:
0, 1, 2, 3, 4, 5, 6.
28. What is the difference between a parabolic curve and a hyperbolic curve?
Construct a parabolic curve of the .functiony-x fOl the following valuel of x :-
2, 4, 6, 8, 10, 12.

29. What is a potential series? Construct a curve based on the following
relationship y = a + bx + cx^2 when the values of a, b and c are respectively
12, 1 and 2.
30. Assuming that x represents the market prices of a commodity and y the
quantities of the commodity demanded at the given prices, construct a Demand
Curve satisfying the following equation:
log y = 2 - 0.5x
or y = antilog (2 - 0.5x)
For x you may assume the values of Re. 1, Rs. 2, Rs. 3, Rs. 4 and Rs. 5
to arrive at the corresponding values of y before plotting the Demand Curve.
(M. Com., Raj., 1952).
31. Draw a cumulative frequency graph of the following distribution, showing
the monthly wages of a group of workmen, and hence or otherwise calculate the
values of (a) the mode, (b) the median, and (c) the two quartiles:
Wages in rupees No. of Workmen
zo-al 8
ZI 10
2a II
23 16
2-4 ao
25 25
26 15
27
28-2 9 ~
sa. (..) Below arc giTen the matb scoted by ao candidatel in a certain aamina-
tion in General English-30, a6. ;1. ao, 33, ,,0, 7, ;6; a8, 15. d. 2". 22. 21,28, a2. 25 •
..6,29,27·
Represent the data by a cumulative graph. Determine the range, upper and
lower quartiles, quartile deviation and median. Indicate the quartiles and median
on the graph.
(b) Define the geometric mean and the harmonic mean. Give a situation where
the harmonic mean is the appropriate average to be used. (P. C. S., 1913).
33. The populations of three towns in U. P. at the time of the last seven censuses
are given below in thousands:

Year Jhansi Saharanpur Bareilly


IS91 54 6~ 123
190 1 56 66 153
19 t 1 76 6, 12.9
1921 75 62 129
1931 93 79 1 ....
19,,1 103 lOS 193
1951 106 1"3 19S
Estimate graphically the population of these towns in 1956. (P.C.S., 1955)

34. Represent the following data graphically:

Advances made by Primary Agricultural Credit Societies

(In crores of rupees)

Year Bombay Madras All-India


- 1946- 47 "1·70 3·47 9.0 3 .
1947-48 :2.:2:2 4.40 10.4'
1948 -49
~949-50
19,0-51
,.'1,
~.z9

6,90
4.9 6
6.44
14·40
H'99
7. 6 5 :22.'90
1951- 52 8.12 7·B :24. 2 1

(Source: All India Rural Credit Survey, 1954)


3~· I~O individuals .ring at a moving target Illiss by the follow_ing. dJlemcce"
th~ pOll~tr.e (+) and negative (-) signs com:aponding to the shot bemg ltJ "vance
ot behtnd the ~et. Draw Ii Histogram-
Isbot is between +10 and +15 inchCll wide
3 shots are between+
..
~

0
..H+ ,
+10

... -,
20
2S
2Z
.. -y
-10
0

17
13 .. .. ., -IS
-20
•• -:-lo
.. -I,
10

7
.. .. -a, -3 0
" -zo
.. -.2.,
:& . -n •• -3 0

(SOUltOE: Hymsn Levy and B. E. Preide)


36. "Graphs are dynamic, dramatic. They may epitomise an epoch, each dot
a fact, each slope an event, each curve a history. Wherever there are data to record,
inferences to draw, or facts to tell, graphs furnish the unrivalled means whose power
we are just beginning to realise and apply." - Hubbard. Comment.
37. Locate the median by Galton Graph from the following distribution:
Size of item 0-10 10-20 20-30 30-40 40-50
Frequency 6 6 9 7
38. Draw a cumulative frequency graph of the following distribution, showing
the monthly wages of a group of workmen, and calculate the values of (a) the
Mode, (b) the median and (c) the two quartiles.
Wages
_N
__
in Rs. 20 I 21 I 22 I 2; I· 24 '25 '2.6
o._o_f_~~.e_r_S_ _ _...:.._8;._!...1_~~_(_~I !:_I_6_l-_20;.__1_2...::5_,-'_I:5_--,-_.;;.;/___6__
2.7 28
Analysis of Time Series 15
Meaning and need. In preceding chapters we have dealt with frequency
distributions and have discussed the methods of describing and presenting
them for the purpose of statistical analysis. We shall now deal with
such series in which the main problem is the analysis of the effects of
different factors spread over a long period of time. We shall discuss the
technique of the measurement of chronological variations. In the study
of economic problems such series have a unique place of importance.
Series relating to production of various commodities, their consumption,
prices, etc., are all time series spread over long periods. It is
necessary in various types of studies to find out the changes in the
values of such variables due to changes in time. Such studies which
relate to the analysis of series spread over a period of time come
under analysis of time series. A time series discloses relationship
between two variables. For example, the price level of a commodity
in different years, the temperature of a place on different days and the
production of a commodity in different months all indicate the
relationship between time changes and changes in the values of the
other variable.

In economic and commercial fields time series may relate to either
the problems of internal administration like sales, purchases, profits,
etc., or to the wider problems of the analysis of general economic
conditions like bank rates, quantity of money in circulation, prices of shares,
etc. In such types of problems it is generally thought convenient to
study the general tendency of a phenomenon and to isolate the regular
and irregular fluctuations. An economist very naturally likes to know
the general trend of the price level, whether it is upward or downward.
Similarly, a businessman likes to know whether his sales or costs are
gradually rising or falling. In such studies it becomes essential to isolate
fluctuations of all types, so that the rate of growth may be properly
studied. Table I below gives an imaginary time series relating to the
values of a variable in different years. It is plotted on graph paper in
Figure No. 1.

TABLE I
Values of X-Variable in different years

Year Value of X Year Value of X


-
1901 6p. 19 10 9°1
19°2. 60~ 19 17 10 42-
68 4 19 18 1002.
19°3
19 0 4 694 I 19 19 It I 3
19 0 5 679 192.0 IIH
1906 75 6 19 ZI 1139
19 0 7 777 19 22 1I 84
19 08 72.2- 191.3 1I95
19 0 9 781 19 2 4 1008
1910 89 8 192.5 847
191 I
19 1 2.
889
87 8
I 192.6
192.7
65 8
735
19 13 851- 19 28 786
19 14 879 191.9 ~ 86 7
TC)n 735 193 0 1020
Fig. 1. Values of X-Variable in different years (vertical axis: Value;
horizontal axis: Years)

Data which are available in the above fashion are usually affected
by a multiplicity of causes. The changes in the values of a variable
related to time can be the result of a large variety of factors, like changes
in the tastes and habits of people, changes in population, reduction in
cost of production, increase in incomes of people, etc. The value of
a variable changes due to the interaction of such forces. If these forces
were constant and not liable to change the value of the variable would
also be constant, and even if the equilibrium of their effects was slightly
disturbed and after this there was no change, the values of the variable
would also slightly change and then become constant. But in actual
practice things do not happen in this fashion. Reality is more complex
than this. These forces are never constant, and as such, due to the
effects of their constant interactions, values of the variables also go on
changing with the passage of time. Generally, we do not know much
about the variations in these factors or about the magnitude of such
variations. An idea about their effects is obtained only by a study of
the changes in the values of the related variable. Therefore, to study
the changes in these forces and the magnitude of such changes, we have
to study the variations in the values of related variables in chronological
order. In economics, the condition in which there is no change in the
effects of these forces is called static, and the condition in which there
is a change is called dynamic. The analysis of time series is done to
understand the dynamic conditions.

As has already been said, the effects of various factors and their
significance are incapable of detailed study. Their existence is recognised
through changes in the values of related variables. By studying
changes in the time series, an idea can be obtained about the changes in
the effects of various forces which interact simultaneously. The effects
of these forces can be classified roughly in some major categories.
These categories or classes are called the components of time series,
because a time series is the result of the combined effects of different
categories of forces. These components are as follows:

1. Long period: Secular trend
2. Short period: (a) Seasonal variations
                 (b) Cyclical fluctuations
3. Irregular or random fluctuations

In the succeeding pages these components are discussed in detail.
Secular trend. If figure No. 1, which represents the data given in
Table I, is carefully studied, it would reveal that the values of the variable
have a tendency to rise. If the figure is studied in parts it would
show that the values of the variable sometimes increase and at others
decrease, but the general tendency of the data is upwards. It can be said,
therefore, that the data have an upward trend or tendency. That component
of a time series which reveals the general tendency of the data is called
long period or secular trend. The secular trend can be either upward or
downward. It cannot be both ways. Secular trend is the effect of factors which
are more or less constant for a long time or which change very gradually
and slowly. Such factors are changes in population or in the tastes and
habits of people, etc. The effect of such factors is very slow and gradual. For
example, the effect of increase in population on prices or production
cannot be sudden or irregular. It would always be very slow,
gradual and regular.
In the analysis of time series the trend values are taken as normal
values. From these normal values, an idea is obtained about the different
types of fluctuations which may be both regular and irregular. The
concept of normal values is no doubt an empirical one but it is very useful
in studying economic events. It should be remembered that social sci-
ences are incapable, by their very nature, of adopting the experimental
methods in their studies, and as such the concept of normal values,
even though it is empirical, is very essential and helpful.
As has been said, trend values indicate the smooth, regular and
long term movement of a series, but it should not be concluded from
this statement that all time series give a definitely rising or falling trend.
Many series are such in which values fluctuate round a more or less
constant figure which does not change with the passage of time. An
example of such data is a series relating to the temperature of a particular
locality. The temperature would no doubt fluctuate in various seasons
but the general tendency of the temperature would hardly change with
the passage of time. Barometric readings generally fluctuate round a
more or less constant value.

Seasonal variations. If we look back at figure No. 1, we shall notice
that though the tendency of the series is clearly upward, yet the trend
is super-imposed by a variety of fluctuations, and that is why the curve
shows both upward and downward movements in short periods. These
fluctuations may be regular as well as irregular; they may be violent
or mild. One type of such fluctuations is characterised as seasonal
variations. These are the results of such factors as uniformly and regularly
rise and fall in magnitude. These variations can be per hour, per day,
per week or per month. Most series relating to economic data contain
such variations. Prices of commodities, their production, consumption,
bank clearings and interest rates, etc., all show seasonal variations.
These variations are periodic and regular. They are definite and precise
and can be foreseen. For example, prices of agricultural commodities
always go down at the time of harvest. This variation in agricultural
prices is expected to happen season after season and its effect and
magnitude can be reasonably estimated beforehand. The upward and
downward movements observed in the time series are usually due to
such factors.
Cyclical fluctuations. Cyclical variations are also regular but their
period is more than one year. In case of seasonal variations we have
noticed that the period within which such variations take place is a year
or less. Cyclical fluctuations are effects of business cycles. It is common
knowledge that in economic and commercial series business cycles
play a very important role. Series relating to prices, production and
wages, etc., are all affected by business cycles. However, the period
of business cycles is not as regular as the period of various seasons and
that is why they are not called periodic fluctuations but cyclical
fluctuations. But it should be remembered that though the length of period
of business cycle may vary, the sequence of change remains more or
less regular and this fact makes possible the study of cyclical fluctuations.

Random fluctuations. Mixed up with seasonal and cyclical fluctuations
are the effects of irregular and accidental events like strikes, lock-outs,
floods, wars, earthquakes, etc. There is no regular period or time
of their occurrence and that is why they are called random or chance
fluctuations. Sometimes they are very powerful and they may even
give rise to cyclical fluctuations. However, since they are irregular
in character it is very difficult to isolate them and to study them
by themselves.

Analysis of time series. The observed values in a time series are the
result of the interaction of the various components discussed above:
secular trend, seasonal variations, cyclical fluctuations and random
fluctuations. In the analysis of time series attempts are made to isolate
the effects of these forces and to study them separately. The importance
of such a study cannot be overemphasized. Economists and businessmen
have not only to study the short time fluctuations but have also
to observe the long period or secular trend of the data. But here many
difficulties have to be faced. These difficulties arise due to the limitations
of the science of statistics. If it were possible for a statistician to
carry on experiments like a physicist he would have been in a position
to isolate the effects of various factors and to study only one factor at a
time. But a statistician cannot do so. He is helpless in this respect.
The only course open to him is to study the effects of various factors
by the process of elimination. In the following pages this method has
been explained in detail.

MEASUREMENT OF TREND

The measurement of trend is generally done for two reasons. The
first is to study the manner in which the value of a variable behaves over
a long time. This study is possible only if the other components of the
time series are eliminated. In other words, for such a study trend values
have to be isolated from the effects of regular and irregular fluctuations.
The second reason for the measurement of trend is to study the fluctuations
in the value of the variable. Fluctuations both regular and irregular
can be studied only when trend values have been isolated from the
time series. A time series is the result of a combination of all the
components and as such, if from the original series the trend values are
subtracted, the remaining values would represent the fluctuations. Trend
values are usually measured by the following methods:

(a) Curve-fitting by inspection
(b) Moving average method
(c) Curve-fitting by mathematical equations

We shall study these methods in the following pages.
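The second purpose of measuring trend, stated above, rests on simple subtraction: original value less trend value gives the fluctuation. A minimal Python sketch of this step is given here. The figures and the names original, trend and fluctuations are hypothetical, chosen only for illustration; they are not taken from the text.

```python
# Hypothetical figures for illustration only (not taken from the text).
original = [12, 15, 13, 18, 16]   # observed values of the series
trend = [12, 13, 14, 15, 16]      # trend values obtained by any method

# Fluctuation = original value minus trend value, item by item.
fluctuations = [o - t for o, t in zip(original, trend)]
print(fluctuations)  # [0, 2, -1, 3, 0]
```

A positive figure means the original value lies above the trend, a negative figure that it lies below.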

Curve fitting by inspection

In this method the data are first plotted on graph paper and a
smoothed curve is fitted to the data merely by inspection. The curve
is fitted in such a manner that the general tendency of the figures
becomes clear. Such curves eliminate the other components, that is,
regular and irregular fluctuations.
From the point of view of simplicity this is the best method. It
saves time. In other methods complex mathematical processes have
to be used whereas in this method nothing of the type is needed. But
the main disadvantage of this method is that the trend curve so drawn
can be affected by the bias of the statistician. In such cases different
curves would be obtained from the same data by different persons. Due
to this shortcoming this method is not very popular. Usually trend
values are obtained by the application of mathematical formulae.

Moving average method

Since curve fitting by mere inspection does not usually give satisfactory
values of the trend, other methods are used for the purpose.
Moving average method is a simple device of reducing fluctuations
and obtaining trend values with a fair degree of accuracy. The technique
of moving average has already been discussed in the chapter on
Measures of Central Tendency. The first thing to be decided in this
method is the period of the moving average, that is, the number of
consecutive items whose average would be calculated each time. Suppose
it has been decided that the period of the moving average would be 5
(years, months, weeks or days as the case may be); then the arithmetic
average of the first five items (Nos. 1, 2, 3, 4 and 5) would be placed
against item No. 3 and then the arithmetic average of item Nos. 2, 3, 4, 5
and 6 would be placed against item No. 4. This process would be
repeated till the arithmetic average of the last five items has been
calculated.
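The procedure just described may be sketched in Python as follows. This is only an illustrative sketch for odd periods; the function name moving_average and the seven sample figures are assumptions made for the illustration and do not come from the text.

```python
def moving_average(values, period):
    """Centred moving average for an odd period.

    Returns a list as long as `values`; positions without a full
    window (the first and last period // 2 items) hold None.
    """
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        result[i] = sum(window) / period
    return result

# Seven hypothetical items, period 5: the first average (items 1 to 5)
# is placed against item No. 3, the next against item No. 4, and so on.
items = [10, 14, 12, 16, 18, 14, 20]
print(moving_average(items, 5))
# [None, None, 14.0, 14.8, 16.0, None, None]
```

The None entries mark the items at either end for which no moving average can be calculated.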

The most important question that arises here is about the period
of the average. Whether we should take a three-yearly (monthly or weekly)
moving average, or a five-yearly, seven-yearly or nine-yearly moving
average, or the moving average of some other period, is a question not
easy to answer. This question is very important because trend values
are affected by the period of the moving average. We have already
said that the purpose of moving average is to obtain trend values so
that all types of fluctuations are eliminated or in any case reduced to the
minimum. The period of moving average should be such as would
achieve these objectives. This statement, however, does not carry us
very far, for we have still to find out which period would be ideal for
realising the above mentioned aims. We shall study the trend values
obtained by various periods of moving average in different types of series
to arrive at some general conclusions.

Example 1. The following table gives a series in which there are no
fluctuations. The values of the variable increase by a constant amount
each year.

TABLE II

        Values of    3-Yearly   5-Yearly   7-Yearly   9-Yearly
Year    x-variable   moving     moving     moving     moving
                     average    average    average    average

1945         2         ...        ...        ...        ...
1946         4          4         ...        ...        ...
1947         6          6          6         ...        ...
1948         8          8          8          8         ...
1949        10         10         10         10         10
1950        12         12         12         12         12
1951        14         14         14         14         14
1952        16         16         16         16         ...
1953        18         18         18         ...        ...
1954        20         20         ...        ...        ...
1955        22         ...        ...        ...        ...

The data given in the above table are plotted below in figure No. 2.

Fig. 2. Linear Trend

It will be observed that whatever the period of moving average
the original series has been repeated. We arrive at an important conclusion
by this study. If a series contains no fluctuations but only general
trend, which when plotted on a graph paper gives a straight line, the moving
average will reproduce the series. Such trend is called Linear Trend. If
the data were in a reverse order then also the original series would
have been repeated with different periods of moving average. However,
this time the trend would have been downwards and not upwards
as in Figure 2.
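The conclusion of Example 1 can be checked with a few lines of Python. The sketch below takes the series 2, 4, ..., 22 of Table II and compares each moving average with the original figure against which it is placed; the function name moving_average is an assumption of the sketch.

```python
def moving_average(values, period):
    """Centred moving averages for an odd period (full windows only)."""
    half = period // 2
    return [sum(values[i - half : i + half + 1]) / period
            for i in range(half, len(values) - half)]

series = list(range(2, 23, 2))   # 2, 4, ..., 22 for the years 1945 to 1955

for period in (3, 5, 7, 9):
    half = period // 2
    middle_items = series[half : len(series) - half]
    # Every moving average equals the original figure it is placed against.
    print(period, moving_average(series, period) == middle_items)
# 3 True
# 5 True
# 7 True
# 9 True
```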

Example 2. The following table gives a series in which there are
no fluctuations but the curve of the data is convex to the base (a convex
curve is obtained when the values increase at a constantly increasing
rate).

TABLE III

        Values of    3-Yearly   5-Yearly   7-Yearly
Year    x-variable   moving     moving     moving
                     average    average    average

1945         2         ...        ...        ...
1946         4         4.3        ...        ...
1947         7         7.3         8         ...
1948        11        11.3        12         13
1949        16        16.3        17         18
1950        22        22.3        23         24
1951        29        29.3        30         31
1952        37        37.3        38         39
1953        46        46.3        47         ...
1954        56        56.3        ...        ...
1955        67         ...        ...        ...

In the above table, the values of x-variable increase by 2 in the
year 1946, by 3 in the year 1947, by 4 in the year 1948 and so on.
Three-yearly, five-yearly and seven-yearly moving averages of the data
have been calculated and all these along with the original series are
plotted in figure No. 3 below.

Fig. 3. Curvi-Linear Trend (Convex)

It will be observed from the above figure that moving average
figures have given curves which are parallel to the curve of the original
data but all these curves are above the original curve. In other words,
trend values are more than the original figures. The second thing to be
noted in the above graph is that the greater the period of the moving
average the farther the trend curve is from the curve of the original data.
So we arrive at another important conclusion. If a series contains no
fluctuations but only a general trend, which, when plotted on a graph paper,
gives a curve convex to the base, its moving average will give another curve
parallel to it but above it. Further, the longer the period of moving average,
the farther will be the trend curve from the curve of original data. Thus, in
such cases if the period of moving average is very long, the trend values
would be distorted and would considerably differ from the original figures.
Such trend as is shown in Fig. 3 is called Curvi-Linear Trend.
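Why the moving average curve runs parallel to the original one, and why the gap widens with the period, can be shown by a short piece of algebra. The working below is a sketch under an assumption not stated in the text, namely that the series follows a second-degree curve y_t = a + bt + ct^2; the symbols a, b, c, t and k are introduced here only for the illustration, the period of the moving average being 2k + 1.

```latex
% Assumed form of the series: y_t = a + bt + ct^2; period of moving average 2k+1
\frac{1}{2k+1}\sum_{j=-k}^{k} y_{t+j}
  = a + bt + c\left(t^{2} + \frac{1}{2k+1}\sum_{j=-k}^{k} j^{2}\right)
  = y_t + \frac{c\,k(k+1)}{3}
% since \sum j = 0 and \sum j^2 = k(k+1)(2k+1)/3 over j = -k, ..., k
```

The series of Table III has yearly increases of 2, 3, 4 and so on, which corresponds to c = 1/2. The gap is then 1/3 for the three-yearly average (k = 1), 1 for the five-yearly (k = 2) and 2 for the seven-yearly (k = 3), which agrees with the table (4.3 against 4, 8 against 7, 13 against 11). When c is negative the same expression places the moving average below the original curve.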
Example 3. The following table gives a series in which there are
no fluctuations but the curve of the data is concave to the base:

TABLE IV

        Values of    3-Yearly   5-Yearly
Year    x-variable   moving     moving
                     average    average

1946        19         ...        ...
1947        21        20.6        ...
1948        22        21.6        21
1949        22        21.6        21
1950        21        20.6        20
1951        19        18.6        18
1952        16        15.6        15
1953        12        11.6        11
1954         7         6.6        ...
1955         1         ...        ...

In the above table there is first an increase in the values of the
variable. The increase is at a declining rate. From 1946 to 1947
there is an increase of 2, from 1947 to 1948 of 1, and from 1948 to 1949
the increase is 0. After this there is a decreasing tendency in the figures
and the decrease is at an increasing rate. Thus, from 1949 to 1950 there
is a decrease of 1, from 1950 to 1951 a decrease of 2, from 1951 to 1952
of 3, and so on. In the above series three-yearly and five-yearly moving
averages have been calculated. The three curves, one of the original
data and the other two of trend values, have been plotted in figure
No. 4 below:

Fig. 4. Curvi-Linear Trend (Concave)
It will be observed from the above figure that the moving averages
have given curves which are parallel to the curve of the original
data. These curves are below the curve of the original series, indicating
that in such cases trend values are less than the original ones. Further,
the longer the period of moving average, the farther the trend curve is
from the curve of the original data. We thus conclude that if a series
contains only trend and no fluctuations and if the original figures give a
curve concave to the base, moving average would give another curve, parallel
to the original one but below it. Further, the longer the period of moving
average the farther would be the trend curve from the curve of the original
data. This is another example of Curvi-Linear Trend.
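The constant gap between the original figures of Table IV and their moving averages can be verified with the following Python sketch. The names moving_average and gaps_by_period are assumptions of the sketch; the gaps are rounded to two decimal places, whereas the table drops the later decimals (20.6 for 20.67).

```python
def moving_average(values, period):
    """Centred moving averages for an odd period (full windows only)."""
    half = period // 2
    return [sum(values[i - half : i + half + 1]) / period
            for i in range(half, len(values) - half)]

values = [19, 21, 22, 22, 21, 19, 16, 12, 7, 1]   # Table IV, 1946 to 1955

gaps_by_period = {}
for period in (3, 5):
    half = period // 2
    originals = values[half : len(values) - half]
    trend = moving_average(values, period)
    # Original figure minus trend value; positive means the trend lies below.
    gaps_by_period[period] = [round(o - t, 2) for o, t in zip(originals, trend)]

print(gaps_by_period[3])  # [0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33, 0.33]
print(gaps_by_period[5])  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

The gap is the same in every year, which is why the curves are parallel, and it is three times as large for the five-yearly average as for the three-yearly one.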

The above examples clearly show that if a series contains a linear
or curvi-linear trend, moving average either reproduces the series or
gives a curve parallel to the curve of the original data. If the period of
moving average is very long, the trend values are pulled away from the
original ones and, therefore, in such series where the fluctuations either
do not exist or are insignificant the period of moving average should
not be very long. But as we have said earlier moving average is primarily
meant for reducing or eliminating fluctuations and it is necessary
to study the effects of the period of moving average on regular and
irregular fluctuations. Series containing such fluctuations are worked out
below with different periods of moving average and we shall then be in a
position to judge the effects of the period of moving average on fluctuations.
Example 4. The following table contains a series of regular
fluctuations:

TABLE V

                       5-Yearly   7-Yearly   9-Yearly
Year    Fluctuations   moving     moving     moving
                       average    average    average

1935        +2           ...        ...        ...
1936        +2           ...        ...        ...
1937         0           +.2        ...        ...
1938        -1           -.6         0         ...
1939        -2           -.8         0         +.4
1940        -2           -.4         0         +.2
1941        +1           +.2         0         -.1
1942        +2           +.6         0         -.3
1943        +2           +.8         0         -.4
1944         0           +.2         0         -.1
1945        -1           -.6         0         +.3
1946        -2           -.8         0         +.4
1947        -2           -.4         0         +.2
1948        +1           +.2         0         -.1
1949        +2           +.6         0         -.3
1950        +2           +.8         0         -.4
1951         0           +.2         0         ...
1952        -1           -.6        ...        ...
1953        -2           ...        ...        ...
1954        -2           ...        ...        ...
The above data have been plotted in figure No. 5 below:

Fig. 5. Cyclical Fluctuations and their Moving Average
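The moving averages of Table V can be recomputed with the Python sketch below. It assumes that the fluctuations repeat the seven-year cycle +2, +2, 0, -1, -2, -2, +1 beginning in 1935; the figure for 1935 is taken as +2, the value which the five-yearly average of +.2 against 1937 requires. The names cycle, fluctuations and moving_average are assumptions of the sketch.

```python
def moving_average(values, period):
    """Centred moving averages for an odd period (full windows only)."""
    half = period // 2
    return [sum(values[i - half : i + half + 1]) / period
            for i in range(half, len(values) - half)]

cycle = [2, 2, 0, -1, -2, -2, 1]    # one full seven-year cycle, 1935 to 1941
fluctuations = (cycle * 3)[:20]     # 1935 to 1954, as in Table V

for period in (5, 7, 9):
    trend = moving_average(fluctuations, period)
    # Lowest and highest moving average for this period.
    print(period, round(min(trend), 1), round(max(trend), 1))
# 5 -0.8 0.8
# 7 0.0 0.0
# 9 -0.4 0.4
```

Only the period of seven years, which equals the length of the cycle, brings every moving average to zero.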
It will be observed from Table V as also from figure No. 5 that
seven-yearly moving average eliminates all the fluctuations while
five-yearly and nine-yearly moving averages only reduce them. The reason
is that there is a seven-yearly cycle in the fluctuations and that is why
a seven-yearly moving average has eliminated the fluctuations completely.
With a five-yearly moving average the range of the fluctuations, which
is ±2 in the original series, has been brought down to ±.8 and nine-yearly
moving average reduces the range to ±.4, whereas a seven-yearly moving
average reduces it to 0. Fourteen-yearly moving average will also
eliminate the fluctuations completely. In fact, if the period of moving
average is a multiple of 7 the fluctuations would be completely
eliminated. Other periods of moving average would merely reduce the
fluctuations. They cannot eliminate them. Thus, we arrive at another
conclusion that if a series contains cyclical fluctuations, a moving average
with a period less or more than the period of the cycle would only reduce the
fluctuations but if the period of moving average is the same as the period of
the cycle or its multiple the fluctuations would be completely eliminated.
Example 5. The following table contains a series of irregular
fluctuations:

TABLE VI

        Irregular      5-Yearly   7-Yearly   9-Yearly
Year    fluctuations   moving     moving     moving
                       average    average    average

1936        -2           ...        ...        ...
1937         0           ...        ...        ...
1938        +1           -.2        ...        ...
1939         0            0         -.43       ...
1940         0           -.2         0         -.55
1941        -1           -.2        -.43       -.44
1942        -1           -.8        -.71       -.55
1943        +1          -1.0        -.86       -.55
1944        -3          -1.0        -.71       -.22
1945        -1           -.6        -.14       -.33
1946        -1           -.2        -.14       -.22
1947        +1           +.2        -.29       -.33
1948        +3           +.4        -.14       -.33
1949        -1           +.2        +.14       -.11
1950         0           +.2        +.14       +.22
1951        -2           -.6        +.29       +.44
1952        +1            0          0         ...
1953        -1           +.2        ...        ...
1954        +2           ...        ...        ...
1955        +1           ...        ...        ...

The above data have been plotted in figure No. 6 below:

Fig. 6. Irregular Fluctuations and their Moving Average

It will be observed that so far as irregular fluctuations are concerned
moving average only reduces them, it cannot eliminate them, and the longer the
period of moving average the greater would be the reduction; but after a certain
limit an increase in the period of moving average will increase the fluctuations
also. In the above case five-yearly moving average has brought down
the range of fluctuations from ±3 in the original data to between +.4 and
-1.0, seven-yearly moving average has reduced it to between +.29 and -.86,
and the nine-yearly moving average to between +.44 and -.55.
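The narrowing of the range in Example 5 can be recomputed with the Python sketch below, which takes the twenty irregular fluctuations of Table VI. The names irregular and moving_average are assumptions of the sketch, and the results are rounded to two decimal places, so that 5/9 appears as 0.56 where the table drops the later decimals.

```python
def moving_average(values, period):
    """Centred moving averages for an odd period (full windows only)."""
    half = period // 2
    return [sum(values[i - half : i + half + 1]) / period
            for i in range(half, len(values) - half)]

irregular = [-2, 0, 1, 0, 0, -1, -1, 1, -3, -1,
             -1, 1, 3, -1, 0, -2, 1, -1, 2, 1]   # 1936 to 1955

for period in (5, 7, 9):
    trend = moving_average(irregular, period)
    # Lowest and highest moving average for this period.
    print(period, round(min(trend), 2), round(max(trend), 2))
# 5 -1.0 0.4
# 7 -0.86 0.29
# 9 -0.56 0.44
```

In no case does the range shrink to zero, which is the point of the example: irregular fluctuations are reduced but not eliminated.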
We are now in a position to summarize the effects of moving average
on trend values and on regular and irregular fluctuations. Our
conclusions are:
1. Moving average reproduces a linear trend exactly, whatever
the period of moving average (Example No. 1).
2. In a curvi-linear trend the moving average gives a curve parallel
to the original one, and the longer the period of moving average the greater
is the difference between the trend values and the original data (Example
Nos. 2 and 3).
3. Moving average with a period coinciding with the period of
cycle in a series or its multiple eliminates cyclical fluctuations, while
moving averages with periods less or more than it only reduce them
(Example No. 4).
4. Moving average cannot eliminate irregular fluctuations. It
only reduces them and up to a certain limit the longer the period of
moving average the greater would be the reduction (Example No. 5).
From the above it is clear that a long period moving average is
good from the point of view of reducing irregular fluctuations but such
a moving average is likely to distort trend values. We have seen that
the longer the period of moving average the farther away are trend values
from the original ones. We have, thus, to adopt a middle course.
The period of moving average should neither be too long nor too
short, so that neither are trend values distorted nor are irregular
fluctuations present in large magnitude. Whenever there is a cycle in
the series, the best period of moving average is the one which coincides
with the period of the cycle. It would eliminate the cyclical variations,
reduce irregular fluctuations and would give the best possible values
for the trend. But it is also very likely that the period of the cycle in
a series is not uniform. Sometimes the cycle may complete in five years,
at others in seven years and at still others in eight or nine years. Under
such circumstances, the average duration of the cycle should be calculated
and this should be the period of the moving average. The durations
of cycle can be found out by plotting the original data on a graph paper
and reading the time distances between various peaks. The average
of these time distances would give the average duration of the cycle.
The following table gives the values of a variable and its three-yearly,
five-yearly and seven-yearly averages which have been plotted in
Figure No. 7.

TABLE VII

Values of a Variable

        Values     3-Yearly   5-Yearly   7-Yearly
Year    (annual)   moving     moving     moving
                   average    average    average

1920      225        ...        ...        ...
1921      213        213        ...        ...
1922      201        210        215        ...
1923      215        213        219        222
1924      223        228        224        222
1925      245        234        229        225
1926      235        235        232        232
1927      225        231        237        239
1928      233        236        241        244
1929      249        249        246        245
1930      265        257        251        246
1931      259        257        252        252
1932      249        250        256        259
1933      241        251        260        263
1934      265        264        263        263
1935      285        275        266        263
1936      275        275        270        266
1937      265        266        271        274
1938      259        266        274        278
1939      275        277        277        277
1940      297        287        280        277
1941      289        289        283        281
1942      281        282        287        290
1943      275        283        291        294
1944      295        295        294        ...
1945      315        305        ...        ...
1946      305        ...        ...        ...
Fig. 7. Values of a variable and its three-yearly, five-yearly and seven-yearly moving averages
From the above graph it is clear that the first peak in the above
data is in the year 1925 and the second in 1930. The time distance
between these two peaks is five years. The third peak is in 1935 and
here also the time distance is the same. In this series, the time distance
between adjoining peaks is always five years. In other words, the cycle
has a perfectly uniform period. The best period of moving average
in the above series is, thus, five years. If the time distances between
adjoining peaks were not uniform then the arithmetic average of the various
time distances would have been calculated to obtain the average duration
of the cycle and the relevant period of the moving average.
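The reading of peaks from the graph can be imitated in Python, as in the sketch below. It uses the annual values as read from Table VII and treats as a peak any year whose value exceeds that of both its neighbours; this rule, and the names years, values, peaks and gaps, are assumptions of the sketch rather than the author's procedure.

```python
from statistics import mean

years = list(range(1920, 1947))
values = [225, 213, 201, 215, 223, 245, 235, 225, 233, 249, 265, 259, 249,
          241, 265, 285, 275, 265, 259, 275, 297, 289, 281, 275, 295, 315, 305]

# A peak is taken here as a year whose value exceeds both neighbours.
peaks = [years[i] for i in range(1, len(values) - 1)
         if values[i - 1] < values[i] > values[i + 1]]
# Time distance between each pair of adjoining peaks.
gaps = [later - earlier for earlier, later in zip(peaks, peaks[1:])]

print(peaks)       # [1925, 1930, 1935, 1940, 1945]
print(mean(gaps))  # 5
```

Where the distances are not uniform, the same average would give the period of moving average to be used.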
It should be noted that if the average duration of a cycle is an even
number of years, say six, then the average of the first six figures (1 to 6)
would be placed between the third and fourth items and similarly, the
average of item Nos. 2 to 7 would be placed between the fourth and
fifth items. The arithmetic average of these two moving average figures
would be kept against item No. 4. Similarly, the other trend values
would have to be found out. The calculation of moving average with
an even period thus involves two processes and considerably increases
the work of calculation. Generally, however, the period of cycles is
in odd figures and this difficulty does not arise.
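The two processes involved in an even period may be sketched in Python as follows. The function name centred_even_moving_average and the eight sample figures are hypothetical and serve only to illustrate the steps described above.

```python
def centred_even_moving_average(values, period):
    """Two-step moving average for an even period."""
    if period % 2 != 0:
        raise ValueError("period must be even")
    # Step 1: plain averages; each falls between two items.
    first = [sum(values[i : i + period]) / period
             for i in range(len(values) - period + 1)]
    # Step 2: average adjoining pairs to centre them on an item.
    centred = [(a + b) / 2 for a, b in zip(first, first[1:])]
    half = period // 2
    # Trend values exist for all but the first and last `half` items.
    return [None] * half + centred + [None] * half

items = [3, 5, 7, 9, 11, 13, 15, 17]   # hypothetical values
print(centred_even_moving_average(items, 6))
# [None, None, None, 9.0, 11.0, None, None, None]
```

With a period of six the first trend value falls against item No. 4, as stated in the text, and three items at each end are left without trend values.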

Limitations of moving average. The moving average method discussed
above suffers from some limitations. This method is suitable
when the series shows regular cyclic movements. We have seen that
when there is a regular cycle in the series, the moving average completely
eliminates the cyclical fluctuations and considerably reduces the irregular
ones. But if there is no cycle or if the cycle is of varying duration, moving
average method does not give very satisfactory results as in such cases
it cannot eliminate periodical fluctuations and it does not very much
reduce the random fluctuations, particularly if the average duration
of the cycle and consequently the period of moving average is a short
one. As such, this method cannot be recommended for universal
application. It is most suited when there are uniform periodical
movements in a series.
Another shortcoming of this method is that it cannot give trend
values for extreme items of the series. Thus, in a five yearly moving
average, there are no trend values for the first two and the last two items,
in a seven yearly moving average for the first three and the last 3 items
and in a nine yearly moving average for the first four and the last four
items. The greater the period of moving average the larger will be the
number of such items whose trend values cannot be obtained. In such
cases the trend values of extreme items are usually estimated by a free-
hand extension of the trend curve both in the beginning and at the end,
but such a procedure gives only the approximate values of the trend.
Another course which is sometimes adopted and which is even
less accurate than the first one, is to repeat the figures at the extreme
ends. Thus, in Table VII, 305 (the value against 1946) may be repeated
for the years 1947, 1948, 1949, etc., to obtain trend values right up to
the end of the series, that is, the year 1946. Thus, in three yearly moving
average, this value would have to be repeated once, in five yearly moving
average twice, and in -seven yearly moving average thrice. Both these
methods are, however, unsatisfactory.
Curve fitting by mathematical equations
For many types of data it is better to obtain trend values by fitting
a mathematical curve than by finding out the moving average. If the
increase or decrease in the values of a particular series is of equal absolute
amount year after year, or, if the increase or decrease is always a constant
percentage, as in case of compound interest sums, it is better to establish
mathematical equations of the increase or decrease, and to fit a trend
based on the values obtained by such equations. In many studies relating
to economic phenomena, the data conform to definite laws of growth
or decline, and in such cases the mathematical curve gives the best
possible trend values. No doubt actual values in the series would deviate
from the trend values so obtained but they cannot destroy the significance
of the equation which describes these changes.
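The two laws of change mentioned above may be written down as equations. The forms below are the usual ones for such cases and are offered only as a sketch; the symbols a, b, r and t are notation assumed here (a being the value at the origin of time and t the number of years from it), not symbols taken from the text.

```latex
% Equal absolute increase b per year (a straight line):
y_t = a + b\,t
% Equal percentage increase r per year, as in compound interest:
y_t = a\,(1 + r)^{t}
```

A decrease is represented in either equation by a negative value of b or r.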
In moving average method we presume that there is no law on
which the changes are based. Moving average gives the trend values as
obtained from the data themselves. As against this, in mathematical
curves, it is presumed that there is a definite law which governs the
changes and, as such, the trend values so obtained depend on the equation
representing this law, rather than on the data themselves. Moving
average method is in this respect more flexible because if the data change,
the moving average also changes, but in case of mathematical curve
unless the equation is changed, the trend values would not change.
For this reason, if it is established that particular data do not conform
to a particular mathematical relationship after some time, the equation
has to be changed, and even in one time series trend values may have to
be obtained by two or more different equations. Thus, if a series relating
to the production of coal conforms to a particular mathematical
equation and if it is found that due to new discoveries in the technique
of coal production that equation no more holds good, a new equation
may have to be established and the trend values of the latter period may
have to be obtained by the new equation.
In this technique of isolating the trend, the most important thing
is the selection of an appropriate equation to which the data in question
conform. This is by no means an easy task and if once a wrong equation
has been selected the entire analysis would be faulty.

Fitting straight line trend-Method of Least Squares


One of the best ways of obtaining trend values is the method of
least squares. With this method, a straight line trend is obtained. This
line is called the Line of the Best Fit. It is a line from which the sum of
the deviations of various points on either side is equal to zero. In other
words, if the vertical distances of the various points on one side of the
line are measured and totalled, this figure would be equal to that which
would be obtained if the vertical distances of the points on the other side
of the line were measured and totalled. Further, the sum of the
squares of these deviations is the least as compared to the sum
of squares of the deviations obtained by using any other line. We have
already discussed in earlier chapters that the sum of the deviations from
the arithmetic average is zero and that the sum of the squares of the
deviations measured from the arithmetic average is the least; the Line
of the Best Fit has both these properties in relation to the points of
the series. It is on account of the second property that this method is
known as the Method of Least Squares.

The question that naturally arises is how to obtain such a line
which would satisfy the above-mentioned conditions. We shall first
take an example and illustrate by a very simple and non-mathematical
technique how this can be done. Later on, we shall examine the
mathematical implications of the technique.

Example 6. The following table gives the world production of
gold. Fit a straight line trend by the Method of Least Squares.

TABLE VIII

World Production of Gold

(In crore Ounces)

Years      Production

1945       12.7
1946       10.1
1947       13.0
1948       13.2
1949       12.6
1950       14.2
1951       13.7

TABLE IX
Solution: Calculation of the Line of the Best Fit by the Method
of Least Squares

Year    Production   Deviations    Squares of   Products of   Trend Ordinates
                     from middle   deviations   cols. (2)
                     year                       and (3)
(1)     (2)          (3)           (4)          (5)           (6)
1945    12.7         -3            9            -38.1         12.786-(3 × .386)=11.628
1946    10.1         -2            4            -20.2         12.786-(2 × .386)=12.014
1947    13.0         -1            1            -13.0         12.786-(1 × .386)=12.400
1948    13.2          0            0              0           12.786+(0 × .386)=12.786
1949    12.6         +1            1            +12.6         12.786+(1 × .386)=13.172
1950    14.2         +2            4            +28.4         12.786+(2 × .386)=13.558
1951    13.7         +3            9            +41.1         12.786+(3 × .386)=13.944
Total   89.5                       28           +10.8

Trend ordinates given in column 6 have been calculated as follows:

(1) The arithmetic average of the production figures has been
calculated. It is 89.5/7 or 12.786 crore ounces. This is the middle
point of the Line of the Best Fit. Or, in other words, it is the trend
ordinate for the middle year, which is 1948.

(2) The deviations of years from the middle year have been
calculated. They are given in column No. 3 and the squares of these
deviations are shown in column No. 4.
(3) Deviations given in column No. 3 have been multiplied by
the production figures given in column No. 2 and the products have
been totalled. This is in column No. 5.
(4) The total of column No. 5 has been divided by the total of
column No. 4. This figure (10.8/28 = +.386) shows the average annual
rate of growth.
(5) The arithmetic average of the figures of column No. 2
(12.786) has been written against the middle year in column No. 6.
The procedure of finding trend values for other years is clarified in this
column. The trend values of the years before the middle year would
be less than the trend value of the middle year and similarly, the trend
values of the years after the middle year would be more than the trend
value of the middle year. The difference between the figures of
adjoining years would be .386. In this example, if the average annual
rate of growth had been negative, the trend values of the years
before the middle year would have been more than the trend value of
the middle year and the trend values of years after the middle year would
have been less than the trend value of the middle year.
The production figures of gold and the Line of the Best Fit obtained
by the Method of Least Squares have been shown in the graph given
below (Fig. 8).
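The five steps described above can also be followed mechanically. The short Python sketch below is only an illustration of that procedure for an odd number of consecutive years; the function name fit_trend_from_middle_year is our own and is not part of the method itself.

```python
# Gold production figures of Table VIII (crore ounces), 1945 to 1951.
years = [1945, 1946, 1947, 1948, 1949, 1950, 1951]
production = [12.7, 10.1, 13.0, 13.2, 12.6, 14.2, 13.7]


def fit_trend_from_middle_year(years, values):
    """Straight line trend for an odd number of consecutive years.

    Hypothetical helper written for this illustration only.
    """
    n = len(values)
    middle_year = years[n // 2]
    deviations = [year - middle_year for year in years]        # column 3
    squares = [d * d for d in deviations]                      # column 4
    products = [d * v for d, v in zip(deviations, values)]     # column 5
    average = sum(values) / n          # trend ordinate of the middle year
    growth = sum(products) / sum(squares)   # average annual rate of growth
    trend = [average + d * growth for d in deviations]         # column 6
    return average, growth, trend


average, growth, trend = fit_trend_from_middle_year(years, production)
for year, ordinate in zip(years, trend):
    print(year, round(ordinate, 3))
```

The printed ordinates may differ from those of column 6 in the third decimal place, because the sketch does not round the average and the rate of growth before using them.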

In the above example we have assumed a straight line as the
trend of the series. Such a trend line can be obtained by the use of
a simple mathematical device.
The equation of a straight line is $y = a + bx$ where a and b are
constants. To obtain definite values from the above equation, values
of a and b have to be determined. This can be done without much
difficulty. A simple illustration is enough to explain the procedure of
obtaining the values of a and b.

World Production of Gold and Its Line of Best Fit
(Crore Ounces, plotted against Years)
Fig. 8

Example No. 7. Fit a straight line trend by the Method of Least
Squares with the following data:

TABLE X
Values of x and y
x       y
1       3
2       4
3       6
4       9
5       10

In the above table, the values of x and y are not in any fixed
relationship and we have to obtain such an equation which would give the
most probable values of a and b. First, we shall obtain five equations for
the five sets of relationships disclosed by the above figures.
Thus, when $y = a + bx$
$3 = a + 1b$
$4 = a + 2b$
$6 = a + 3b$
$9 = a + 4b$
and $10 = a + 5b$
Any two of the above equations can be solved as simultaneous
equations to get the values of a and b. But these values would not
necessarily satisfy the remaining three equations. We have to obtain
such values of a and b which are most probable, taking into account the
relationships in all the above five equations. For this we shall have to
obtain two normal equations. The first normal equation can be obtained
by multiplying the five equations by the respective values of the
coefficients of a and adding them together. The second normal equation
can be obtained by multiplying the five equations by the respective values
of the coefficients of b and adding them together.
Since the value of the coefficient of a is unity in all the above five
cases, we shall simply add these five equations to get the first normal
equation. Thus, the first normal equation would be:
$32 = 5a + 15b$
To obtain the second normal equation we shall multiply equation
number 1 by 1, number 2 by 2, number 3 by 3 and so on, and after this
we shall add them. After multiplication, these equations would be:
$3 = a + 1b$
$8 = 2a + 4b$
$18 = 3a + 9b$
$36 = 4a + 16b$
$50 = 5a + 25b$
and their total would be:
$115 = 15a + 55b$
This is the second normal equation. Thus, the two normal
equations are:
$5a + 15b = 32$ ...... (i)
$15a + 55b = 115$ ...... (ii)
If these simultaneous equations are solved, the values of a and b
would be found to be respectively .7 and 1.9. These are the most
probable values of a and b respectively. From these values we can get
the equation which would give the line of the best fit.
$y = a + bx$
Substituting the values of a and b,
$y = .7 + 1.9x$
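The working by which the two normal equations yield these values is not shown above. One way of setting it out, by eliminating a, is sketched below; any other method of solving simultaneous equations would serve equally well.

```latex
\begin{aligned}
5a + 15b &= 32    && \text{(i)} \\
15a + 55b &= 115  && \text{(ii)} \\
15a + 45b &= 96   && \text{(i) multiplied by 3} \\
10b &= 19         && \text{(ii) minus three times (i)} \\
b &= 1.9 \\
5a &= 32 - (15 \times 1.9) = 3.5 \\
a &= 0.7
\end{aligned}
```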

In actual practice, it is not necessary to write down all the equations
as has been done above. We have only to insert the proper values in
the two equations given below, which hold good when $y = a + bx$.
$\Sigma(y) = na + b\,\Sigma(x)$ ...... (i)
$\Sigma(xy) = a\,\Sigma(x) + b\,\Sigma(x^2)$ ...... (ii)
where $\Sigma(y)$ = The total of the values of y
$\Sigma(x)$ = The total of the values of x
$\Sigma(xy)$ = The total of the products of the corresponding
values of x and y
$\Sigma(x^2)$ = The total of the squares of the values of x
and n = The number of pairs of values.
In the above example
$\Sigma(y) = 32$
$\Sigma(x) = 15$
$\Sigma(xy) = (1 \times 3) + (2 \times 4) + (3 \times 6) + (4 \times 9) + (5 \times 10)$
$= 3 + 8 + 18 + 36 + 50$
$= 115$
$\Sigma(x^2) = 1 + 4 + 9 + 16 + 25$
$= 55$
$n = 5$

Substituting the values in the above equations, we get:
$32 = 5a + 15b$
$115 = 15a + 55b$
Thus, the same normal equations have been obtained by using
the short-cut method also. When solved they would give the same
equation of the Line of the Best Fit, i.e.,
$y = .7 + 1.9x$
The trend values obtained by using the above equation would
be as follows:
when x = 1, y = .7 + (1 × 1.9) = 2.6
  "  x = 2, y = .7 + (2 × 1.9) = 4.5
  "  x = 3, y = .7 + (3 × 1.9) = 6.4
  "  x = 4, y = .7 + (4 × 1.9) = 8.3
  "  x = 5, y = .7 + (5 × 1.9) = 10.2
The above illustration clearly shows how with the help of a simple
mathematical equation it is easy to find out the trend values. We shall
now solve Example No. 6 by this method:
TABLE XI
World Production of Gold and Its Line of Best Fit
($y = a + bx$)
Year    Production in     x      xy       $x^2$    Trend (computed
        crore ozs. (y)                             values of y)
1945    12.7              1      12.7     1        11.629
1946    10.1              2      20.2     4        12.015
1947    13.0              3      39.0     9        12.401
1948    13.2              4      52.8     16       12.787
1949    12.6              5      63.0     25       13.173
1950    14.2              6      85.2     36       13.559
1951    13.7              7      95.9     49       13.945
Total   89.5              28     368.8    140
From the above table it is clear that
$n = 7$
$\Sigma(y) = 89.5$
$\Sigma(x) = 28$
$\Sigma(xy) = 368.8$
$\Sigma(x^2) = 140$
The normal equations are
$\Sigma(y) = na + b\,\Sigma(x)$
$\Sigma(xy) = a\,\Sigma(x) + b\,\Sigma(x^2)$
Substituting the values in these two equations, we get
$89.5 = 7a + 28b$ ...... (i)
$368.8 = 28a + 140b$ ...... (ii)
Multiplying the first equation by 4 and subtracting it from the
second equation, we have
$368.8 = 28a + 140b$
$358.0 = 28a + 112b$
$10.8 = 28b$
or $b = .386$
Substituting the value of b in equation (i), we get
$89.5 = 7a + 10.8$
or $89.5 - 10.8 = 7a$
or $78.7 = 7a$
or $a = 11.243$

The equation of the Line of Best Fit is $y = a + bx$. Substituting the
above values of a and b, we have
$y = 11.243 + .386x$
To find the trend values or computed values of y for different values
of x (from 1 to 7), we substitute the values of x in the above equation and
get the following results:
When x = 1, y = 11.243 +  .386 = 11.629
  "  x = 2, y = 11.243 +  .772 = 12.015
  "  x = 3, y = 11.243 + 1.158 = 12.401
  "  x = 4, y = 11.243 + 1.544 = 12.787
  "  x = 5, y = 11.243 + 1.930 = 13.173
  "  x = 6, y = 11.243 + 2.316 = 13.559
  "  x = 7, y = 11.243 + 2.702 = 13.945
These trend values are shown in the last column of the table giving
the solution of Example No. 6 by this method. It will be observed that
the trend values obtained by this method are practically the same as
those obtained by the method first illustrated. The difference of .001 in
trend values is due to the approximation of decimals.
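The substitution of totals into the two normal equations lends itself to a small routine. The Python sketch below is an illustration only: the function name fit_line is our own, and the closed-form expressions for a and b are simply what is obtained on solving the two normal equations algebraically.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for y = a + bx.

    Normal equations:  sum(y)  = n*a      + b*sum(x)
                       sum(xy) = a*sum(x) + b*sum(x^2)
    Hypothetical helper written for this illustration only.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Eliminating a from the two equations gives b; a then follows from (i).
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Example No. 7
a7, b7 = fit_line([1, 2, 3, 4, 5], [3, 4, 6, 9, 10])
print(round(a7, 3), round(b7, 3))

# Example No. 6, with the years numbered 1 to 7
gold = [12.7, 10.1, 13.0, 13.2, 12.6, 14.2, 13.7]
a6, b6 = fit_line([1, 2, 3, 4, 5, 6, 7], gold)
print(round(a6, 3), round(b6, 3))
```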
In the example solved above, the calculation of trend values would
become very easy if the deviations from the middle year are taken as the
values of x. This has been done in the following table:
TABLE XII
World Production of Gold and Its Line of Best Fit (in crores of Ounces)
Year    Production   Deviations    Square of    Col. 2 ×    Trend
        (y)          from middle   Deviations   Col. 3      Values
                     year (x)      ($x^2$)      (xy)
(1)     (2)          (3)           (4)          (5)         (6)
1945    12.7         -3            9            -38.1       11.628
1946    10.1         -2            4            -20.2       12.014
1947    13.0         -1            1            -13.0       12.400
1948    13.2          0            0              0         12.786
1949    12.6         +1            1            +12.6       13.172
1950    14.2         +2            4            +28.4       13.558
1951    13.7         +3            9            +41.1       13.944
Total   89.5                       28           +10.8
In the above data
$\Sigma(y) = 89.5$, $\Sigma(x) = 0$, $\Sigma(xy) = 10.8$, $\Sigma(x^2) = 28$, $n = 7$
The two normal equations are
$\Sigma(y) = na + b\,\Sigma(x)$
$\Sigma(xy) = a\,\Sigma(x) + b\,\Sigma(x^2)$

Substituting the values given above, we get
$89.5 = 7a + (b \times 0)$ ...... (i)
or $7a = 89.5$
or $a = 12.786$
$10.8 = (0 \times a) + 28b$ ...... (ii)
or $28b = 10.8$
or $b = .386$
The equation of the Line of the Best Fit is
$y = a + bx$
Substituting the above values, we get
$y = 12.786 + .386x$

With this equation, the trend values can be easily obtained. Thus,
for the year 1945, when the value of x is -3, the trend value would be
12.786 + (-3 × .386) or 12.786 - 1.158 or 11.628. Similarly, other
values can be calculated. These values are identical with those calculated
before.
This simplification is possible only when x's are consecutive
numbers. It will be so when there is an unbroken time series. If there is
a break in the time series, this simplification would not be of much help.
In fact in an unbroken time series where the sum of the deviations from
the middle year is equal to zero, the two normal equations are:
$\Sigma(y) = na$
$\Sigma(xy) = b\,\Sigma(x^2)$
It will be noted that in the above solution these simplified equations
are satisfied.
It should be kept in mind that if the deviations of these trend
figures from the original data are calculated, their sum would be zero.
In the above example, the positive and negative deviations both total
about 2.7 and their sum is zero. The sum of the squares of these
deviations would be about 6.1. The sum of the squares of such deviations
from values of trend other than these would be more than 6.1. The
Method of Least Squares is based on the principle that the sum of the
squares of the deviations from the Line of the Best Fit is the least.
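The two properties stated above can be checked numerically for the gold series. The Python sketch below is offered only as an illustration; the small shifts applied to a and b, and the horizontal line drawn through the average, are arbitrary choices of our own made for the sake of comparison.

```python
production = [12.7, 10.1, 13.0, 13.2, 12.6, 14.2, 13.7]
x = [-3, -2, -1, 0, 1, 2, 3]          # deviations from the middle year


def sum_of_squares(a, b):
    """Sum of squared deviations of the data from the line y = a + bx."""
    return sum((y - (a + b * xi)) ** 2 for xi, y in zip(x, production))


a = sum(production) / len(production)
b = sum(xi * y for xi, y in zip(x, production)) / sum(xi * xi for xi in x)

residuals = [y - (a + b * xi) for xi, y in zip(x, production)]
best = sum_of_squares(a, b)

# Lines obtained by shifting a and b slightly (arbitrary illustrative shifts).
others = [sum_of_squares(a + da, b + db)
          for da in (-0.1, 0.0, 0.1)
          for db in (-0.05, 0.0, 0.05)
          if (da, db) != (0.0, 0.0)]

# A horizontal line through the average also has deviations summing to
# zero, yet its sum of squares is larger than that of the fitted line.
flat = sum_of_squares(a, 0.0)

print(round(sum(residuals), 6), round(best, 2), round(flat, 2))
```

The last comparison suggests that a zero sum of deviations does not by itself single out the Line of the Best Fit; it is the least sum of squares which does so.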
The main limitation of the Method of Least Squares is that if some
items are added to the original series, a fresh equation has to be obtained
as the values of $x$, $xy$ and $x^2$, etc., would all change with the addition
of items. In case of moving average, this difficulty is not there.

Fitting a curve of the power series

The discussion relating to the isolation of trend so far has been
confined only to linear trend. In many cases a straight line trend may
not give satisfactory results and in such cases, sometimes, the series has
to be broken in parts and different equations have to be obtained for the
various segments. This practice of breaking the series, however,
cannot be justified as it violates the basic principle of trend fitting. If
a straight line trend does not give satisfactory results, the best
alternative is to try a curve of the power series.

The general form of the equation of power series is
$y = a + bx + cx^2 + dx^3 + \ldots$
This type of equation does not represent a curve of strictly
parabolic type but, as has been said earlier also, in common usage the
term parabolic curve is used to indicate curves obtained by equations
of the above type. If the equation is carried to the second power of x
it is called the parabola of second degree; if to the third power of x it
is called the parabola of the third degree and so on. Ordinarily, second
degree parabolas are used for the purpose of fitting parabolic trend.
In a parabola of second degree there are three unknown values and,
therefore, three simultaneous equations are obtained to secure the
required values.
The technique of obtaining equations is the same as detailed in
case of straight line trend. Firstly, all observed equations are multiplied
by the respective values of the coefficients of a and totalled. This gives
the first normal equation. Then the observed equations are multiplied
by the respective values of the coefficients of b and the sum of these
gives the second normal equation. Lastly, the observed equations are
multiplied by the respective values of coefficients of c and the total gives
the third normal equation. These three normal equations are
simultaneously solved to get the values of a, b and c.
In actual practice it is not necessary to multiply and add the
observed equations as mentioned above. The relevant values can be
substituted in the following normal equations:

$\Sigma(y) = na + b\,\Sigma(x) + c\,\Sigma(x^2)$ ...... (i)

$\Sigma(xy) = a\,\Sigma(x) + b\,\Sigma(x^2) + c\,\Sigma(x^3)$ ...... (ii)

$\Sigma(x^2y) = a\,\Sigma(x^2) + b\,\Sigma(x^3) + c\,\Sigma(x^4)$ ...... (iii)

The following example would illustrate the above method:


Example 8. Fit a parabola of the second degree to the data given
below:
TABLE XIII
Values of a Variable (y)
Year      Values of y

1945      17
1946      20
1947      19
1948      26
1949      24
1950      40
1951      35
1952      55
1953      51
1954      74
1955      69
Solution:
TABLE XIV
Fitting of a Parabola of the Second Degree
Year    Values   Deviations    xy      $x^2y$   $x^2$   $x^3$   $x^4$
        (y)      from middle
                 year (x)
1945    17       -5            -85     425      25      -125    625
1946    20       -4            -80     320      16      -64     256
1947    19       -3            -57     171      9       -27     81
1948    26       -2            -52     104      4       -8      16
1949    24       -1            -24     24       1       -1      1
1950    40        0              0     0        0        0      0
1951    35       +1            +35     35       1       +1      1
1952    55       +2            +110    220      4       +8      16
1953    51       +3            +153    459      9       +27     81
1954    74       +4            +296    1184     16      +64     256
1955    69       +5            +345    1725     25      +125    625
Total   430       0            +641    4667     110      0      1958

Substituting the values in the normal equations given above
$430 = (11 \times a) + (0 \times b) + (110 \times c)$ ...... (iv)
$641 = (0 \times a) + (110 \times b) + (0 \times c)$ ...... (v)
$4667 = (110 \times a) + (0 \times b) + (1958 \times c)$ ...... (vi)
or
$11a + 110c = 430$ ...... (vii)
$110b = 641$ ...... (viii)
$110a + 1958c = 4667$ ...... (ix)

From equation (viii), b = 5.8. By solving equations (vii) and (ix)
we shall get the value of a = 34.8 and of c = .43.
Thus the values of a, b and c are respectively 34.8, 5.8 and .43.
Equation of the parabola of the second degree would be $y = a + bx + cx^2$.
Substituting the values of a, b and c, we get
$y = 34.8 + 5.8x + .43x^2$
We can now obtain the values of y by substituting the various
values of x in the above equation.
Thus,
When x = -5, y = 34.8 + (5.8 × -5) + (.43 × 25)
               = 16.55
  "  x = -4, y = 18.48
  "  x = -3, y = 21.27
  "  x = -2, y = 24.92
  "  x = -1, y = 29.43
  "  x =  0, y = 34.80
  "  x =  1, y = 41.03
  "  x =  2, y = 48.12
  "  x =  3, y = 56.07
  "  x =  4, y = 64.88
  "  x =  5, y = 74.55
These are the required trend values. The original data and the
trend values are plotted below in figure 9:
Values of a Variable and its Trend
(Parabola of the Second Degree)

The greatest limitation of this method is that if certain items are
added to the series, fresh trend values have to be obtained because then
the values of $x$, $x^2$, etc., would all change.
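The fitting of the second degree parabola in Example 8 can likewise be reduced to a routine: form the totals, set up the three normal equations and solve them. The Python sketch below is an illustration only. The helper names solve_linear_system and fit_parabola are our own, and the elimination routine is just one of several ways in which three simultaneous equations may be solved.

```python
def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Bring the row with the largest entry in this column to the top.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def fit_parabola(xs, ys):
    """Values of a, b, c in y = a + bx + cx^2 from the three normal equations."""
    n = len(xs)

    def total(power):
        return sum(x ** power for x in xs)

    matrix = [[n,        total(1), total(2)],
              [total(1), total(2), total(3)],
              [total(2), total(3), total(4)]]
    rhs = [sum(ys),
           sum(x * y for x, y in zip(xs, ys)),
           sum(x * x * y for x, y in zip(xs, ys))]
    return solve_linear_system(matrix, rhs)


xs = list(range(-5, 6))                        # deviations from 1950
ys = [17, 20, 19, 26, 24, 40, 35, 55, 51, 74, 69]
a, b, c = fit_parabola(xs, ys)
print(round(a, 1), round(b, 1), round(c, 2))
```

Since the sketch does not round a, b and c before use, trend values computed from it may differ slightly from those worked out above with the rounded constants.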
MEASUREMENT OF SHORT PERIOD FLUCTUATIONS
In the measurement of short period fluctuations, the trend values
are isolated from the original data. As has been said earlier, there can
be four components of a time series. A time series is the total of these
components. If from the values of a time series the trend values are
subtracted, the remainder will be the values of the remaining three
components, namely, seasonal, cyclical and irregular fluctuations. The
totals of these three components are called the Short Period
Fluctuations. In the following table five yearly moving average has been
used to obtain trend values. These values are subtracted from the original
data and the resulting figures represent the short period fluctuations.
TABLE XV
Values of a Variable-Calculation of Short
Period Fluctuations
Years   Annual        5 Yearly      Deviations from
        Value         moving        the trend values
                      average       (Col. 2 minus Col. 3)
(1)     (2)           (3)           (4)
1920    225
1921    210
1922    201           215           -14
1923    215           219           -4
1924    223           224           -1
1925    245           229           +16
1926    235           232           +3
1927    225           237           -12
1928    233           241           -8
1929    249           246           +3
1930    265           251           +14
1931    259           252           +7
1932    [illegible]   256           [illegible]
1933    241           260           -19
1934    [illegible]   [illegible]   [illegible]
1935    285           266           +19
1936    275           270           +5
1937    [illegible]   272           [illegible]
1938    259           274           -15
1939    [illegible]   277           [illegible]
1940    297           280           +17
1941    289           283           +6
1942    281           287           -6
1943    275           291           -16
1944    [illegible]   [illegible]   +1
1945    [illegible]
1946    [illegible]

The figures given in Column No. 4 are obtained by deducting the
values of Column No. 3 from the values given in Column No. 2. These
figures indicate short period fluctuations of the data. They are plotted
in figure 10 below:

Short Period Fluctuations
Fig. 10
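The calculation of columns 3 and 4 of table XV can be sketched in Python as below. This is only an illustration: the function name moving_average is our own, only the first seven annual values (1920 to 1926) are used, and the trend is rounded to whole numbers as in the table.

```python
def moving_average(values, period):
    """Moving average for an odd period, placed against the middle year.

    Years for which a full window is not available are left as None.
    """
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(window) / period
    return result


# Annual values for 1920 to 1926.
annual_values = [225, 210, 201, 215, 223, 245, 235]

trend = moving_average(annual_values, 5)                      # column 3
fluctuations = [None if t is None else v - round(t)           # column 4
                for v, t in zip(annual_values, trend)]

for year, v, t, f in zip(range(1920, 1927), annual_values, trend, fluctuations):
    print(year, v, None if t is None else round(t), f)
```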

It is clear that the data shown in figure 10 above are totals of the
three components, seasonal, cyclical and irregular fluctuations. The
next problem (after the elimination of trend values) is to obtain separate
values of these three components. In other words, the problem is to
divide the short period fluctuations in three parts, the first representing
the seasonal fluctuations, the second representing cyclical fluctuations and
the third representing irregular fluctuations. The technique of this
analysis is discussed in the following pages.

Measurement of seasonal fluctuations


There are three important methods by which seasonal fluctuations
are measured. They are as follows:
1. Method of monthly averages to compute the seasonal
variation index.

2. Method of moving averages to compute seasonal variations
and indices of seasonal variations.
3. Method of link relatives.
The above methods are discussed below in this very order.

Seasonal variation index-By monthly averages

The procedure followed in this method can be explained with the
help of the following table:
TABLE XVI
Calculation of Seasonal Variations Index by Monthly Averages

Month     Production of a Commodity          Total     Monthly    Percentage
          1936   1937   1938   1939   1940             average    (seasonal
                                                                  index)
(1)       (2)    (3)    (4)    (5)    (6)    (7)       (8)        (9)
Jan.      131    145    208    238    263    985       197.0      99.7
Feb.      129    151    211    237    261    989       197.8      100.1
Mar.      121    149    207    225    250    952       190.4      96.3
Apr.      119    143    201    219    244    926       185.2      93.7
May       113    149    201    212    240    915       183.0      92.6
June      116    159    203    206    253    937       187.4      94.8
July      113    153    193    201    250    910       182.0      92.1
Aug.      123    165    205    210    255    958       191.6      97.0
Sep.      130    173    210    220    269    1002      200.4      101.4
Oct.      133    183    221    237    285    1059      211.8      107.2
Nov.      145    197    223    241    290    1096      219.2      110.9
Dec.      149    198    222    260    300    1129      225.8      114.2

Total                                        11858     2371.6     1200.0

Average                                      988       197.6      100.0
The procedure followed above can be explained in the following
four steps:
(1) Find out the total of the values of similar months. For
example, in table XVI above the totals of the figures of Januarys,
Februarys, etc., are given in column No. 7.
(2) Divide these totals by the number of years. This will give
the monthly average for various months (table XVI, column No. 8).

(3) Calculate the average of monthly totals. It can be done in
two ways: either divide the sum of the monthly totals by 12 or divide
the total of monthly averages by 12 (table XVI, last line).
(4) Calculate the percentages of various monthly averages,
assuming the average of monthly averages as 100. In table XVI, for
January this percentage
$= \dfrac{\text{Monthly Average for January}}{\text{Average of Monthly Averages}} \times 100$
$= \dfrac{197.0}{197.6} \times 100 = 99.7$
or
$= \dfrac{\text{Monthly Total for January}}{\text{Average of Monthly Totals}} \times 100$
$= \dfrac{985}{988} \times 100 = 99.7$
These are the percentage seasonal indices and they measure seasonal
fluctuation of the data (table XVI, column No. 9). These figures of
seasonal indices have been plotted below in figure No. 11:

Seasonal Variations Indices
Fig. 11
The seasonal fluctuations of the data given in table XVI are very
clearly visible in the above figure. In this example data relating only
to five years have been taken. In actual practice, data relating to a
larger number of years should be taken so that cyclical fluctuations may
not affect seasonal variations.
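The four steps of the method of monthly averages can be sketched in Python as below, using the production figures of table XVI. The sketch is an illustration only; the function name is our own, and since it does not round the average of monthly averages to 197.6, an index may differ from column No. 9 in the first decimal place.

```python
# Monthly production from table XVI, one row per year (January to December).
production = {
    1936: [131, 129, 121, 119, 113, 116, 113, 123, 130, 133, 145, 149],
    1937: [145, 151, 149, 143, 149, 159, 153, 165, 173, 183, 197, 198],
    1938: [208, 211, 207, 201, 201, 203, 193, 205, 210, 221, 223, 222],
    1939: [238, 237, 225, 219, 212, 206, 201, 210, 220, 237, 241, 260],
    1940: [263, 261, 250, 244, 240, 253, 250, 255, 269, 285, 290, 300],
}


def seasonal_indices_by_monthly_averages(table):
    """Hypothetical helper following steps (1) to (4) of the text."""
    years = sorted(table)
    months = len(table[years[0]])
    totals = [sum(table[y][m] for y in years) for m in range(months)]  # step 1
    averages = [t / len(years) for t in totals]                        # step 2
    grand_average = sum(averages) / months                             # step 3
    indices = [a / grand_average * 100 for a in averages]              # step 4
    return totals, averages, indices


totals, averages, indices = seasonal_indices_by_monthly_averages(production)
for total, average, index in zip(totals, averages, indices):
    print(total, round(average, 1), round(index, 1))
```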

Seasonal variations and their indices-By moving averages

In this method the following steps are taken to study seasonal
variations:
(1) Calculate the moving average of the data.
(2) Subtract the moving average figures from the original ones
and thus obtain the values of the total fluctuations.
(3) Tabulate the fluctuations obtained in the above manner
season-wise (month-wise if figures are monthly).
(4) Obtain average fluctuations for each season (or month if the
data are monthly).
These figures are seasonal variations.
(5) Subtract the seasonal variations from the total fluctuations
and the remainder would be the irregular fluctuations.
If instead of seasonal fluctuations indices of seasonal variations
are needed, the following procedure should be followed:
(1) Calculate the moving average of the data.
(2) Convert the original data into percentages based on the
respective values of the moving average.
(3) Tabulate these percentages season-wise. If monthly figures
for five years are available there will be five such percentages against
each month.
(4) Obtain the average of these percentages for each season. If
figures are monthly there would be 12 such averages.
(5) Obtain the average of the average percentages for various
seasons. In case of monthly figures the average of the average
percentages for 12 months would have to be calculated.
(6) This average (calculated as per rule No. 5 given above)
should be taken as 100 and the average monthly percentages (calculated
as per rule No. 4) should be expressed as percentages of this figure.
They are the required seasonal variation indices.
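The first set of steps given above (moving average, total fluctuations, seasonal variations and remaining fluctuations) can be sketched in Python as follows. The sketch is an illustration only: the function names are our own, the series is assumed to begin with the first season (January), and the moving averages are not rounded, so the figures it yields will be close to, but not identical with, hand-worked figures based on rounded moving averages.

```python
def centred_moving_average(values, period=12):
    """Moving average of an even period, centred against each month."""
    half = period // 2
    centred = [None] * len(values)
    for i in range(half, len(values) - half):
        first = sum(values[i - half:i + half]) / period            # e.g. Jan. to Dec.
        second = sum(values[i - half + 1:i + half + 1]) / period   # e.g. Feb. to Jan.
        centred[i] = (first + second) / 2
    return centred


def seasonal_variations(values, period=12):
    """Steps (1) to (5); the series is assumed to start with season 0."""
    centred = centred_moving_average(values, period)                # step 1
    deviations = [None if c is None else v - c
                  for v, c in zip(values, centred)]                 # step 2
    seasonal = []
    for season in range(period):                                    # steps 3 and 4
        group = [d for i, d in enumerate(deviations)
                 if d is not None and i % period == season]
        seasonal.append(sum(group) / len(group))
    remainder = [None if d is None else d - seasonal[i % period]
                 for i, d in enumerate(deviations)]                 # step 5
    return centred, deviations, seasonal, remainder


# Monthly production, January 1936 to December 1940.
monthly = [
    131, 129, 121, 119, 113, 116, 113, 123, 130, 133, 146, 149,
    146, 151, 149, 143, 149, 159, 153, 165, 173, 183, 197, 198,
    208, 212, 207, 201, 201, 203, 193, 205, 210, 221, 224, 222,
    238, 237, 226, 219, 212, 206, 201, 210, 220, 237, 241, 260,
    263, 261, 250, 244, 240, 253, 250, 255, 269, 285, 290, 300,
]

centred, deviations, seasonal, remainder = seasonal_variations(monthly)
print([round(s, 1) for s in seasonal])
```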

In table No. XVII below seasonal variations have been calculated
in accordance with the method given above:
TABLE XVII
Year and   Production   12-Monthly   Moving     Short time    Seasonal     Remaining
Month                   moving       average    fluctuation   variations   fluctuation
                        average      centred
(1)        (2)          (3)          (4)        (5)           (6)          (7)

1936
Jan.       131          ...          ...        ...           ...          ...
Feb.       129          ...          ...        ...           ...          ...
Mar.       121          ...          ...        ...           ...          ...
Apr.       119          ...          ...        ...           ...          ...
May        113          ...          ...        ...           ...          ...
June       116          127          ...        ...           ...          ...
July       113          128          127.5      -14.5         -17.1        +2.6
Aug.       123          130          129.0      -6.0          -9.1         +3.1
Sept.      130          132          131.0      -1.0          -4.4         +3.4
Oct.       133          134          133.0      0.0           +3.3         -3.3
Nov.       146          137          135.5      +10.5         +9.3         +1.2
Dec.       149          141          139.0      +10.0         +11.8        -1.8

1937
Jan.       146          144          142.5      +3.5          +15.2        -11.7
Feb.       151          148          146.0      +5.0          +13.9        -8.9
Mar.       149          152          150.0      -1.0          +3.8         -4.8
Apr.       143          156          154.0      -11.0         -5.4         -5.6
May        149          160          158.0      -9.0          -9.6         +0.6
June       159          164          162.0      -3.0          -7.8         +4.8
July       153          168          166.5      -13.5         -17.1        +3.6
Aug.       165          174          171.5      -6.5          -9.1         +2.6
Sept.      173          179          176.5      -3.5          -4.4         +0.9
Oct.       183          183          181.0      +2.0          +3.3         -1.3
Nov.       197          187          185.0      +12.0         +9.3         +2.7
Dec.       198          191          189.0      +9.0          +11.8        -2.8

1938
Jan.       208          195          193.0      +15.0         +15.2        -0.2
Feb.       212          198          196.5      +15.5         +13.9        +1.6
Mar.       207          201          199.5      +7.5          +3.8         +3.7
Apr.       201          203          202.0      -1.0          -5.4         +4.4
May        201          205          204.0      -3.0          -9.6         +6.6
June       203          207          206.0      -3.0          -7.8         +4.8
July       193          210          208.5      -15.5         -17.1        +1.6
Aug.       205          212          211.0      -6.0          -9.1         +3.1
Sept.      210          214          213.0      -3.0          -4.4         +1.4
Oct.       221          216          215.0      +6.0          +3.3         +2.7
Nov.       224          217          216.5      +7.5          +9.3         -1.8
Dec.       222          217          217.0      +5.0          +11.8        -6.8

1939
Jan.       238          218          217.5      +20.5         +15.2        +5.3
Feb.       237          218          218.0      +19.0         +13.9        +5.1
Mar.       226          219          218.5      +7.5          +3.8         +3.7
Apr.       219          220          219.5      -0.5          -5.4         +4.9
May        212          222          221.0      -9.0          -9.6         +0.6
June       206          225          223.5      -17.5         -7.8         -9.7
July       201          227          226.0      -25.0         -17.1        -7.9
Aug.       210          229          228.0      -18.0         -9.1         -8.9
Sept.      220          231          230.0      -10.0         -4.4         -5.6
Oct.       237          233          232.0      +5.0          +3.3         +1.7
Nov.       241          235          234.0      +7.0          +9.3         -2.3
Dec.       260          239          237.0      +23.0         +11.8        +11.2

1940
Jan.       263          243          241.0      +22.0         +15.2        +6.8
Feb.       261          247          245.0      +16.0         +13.9        +2.1
Mar.       250          251          249.0      +1.0          +3.8         -2.8
Apr.       244          255          253.0      -9.0          -5.4         -3.6
May        240          259          257.0      -17.5         -9.6         -7.9
June       253          262          260.5      -7.5          -7.8         +0.3
July       250          ...          ...        ...           ...          ...
Aug.       255          ...          ...        ...           ...          ...
Sept.      269          ...          ...        ...           ...          ...
Oct.       285          ...          ...        ...           ...          ...
Nov.       290          ...          ...        ...           ...          ...
Dec.       300          ...          ...        ...           ...          ...
The figures of seasonal variations given in col. (6) of table No.
XVII above have been calculated as below:

TABLE XVIII

Calculation of Seasonal Variations

Month       Deviations from Trend                      Seasonal variation
            1936     1937     1938     1939     1940   (Average of cols. 2,
                                                       3, 4, 5 and 6)
(1)         (2)      (3)      (4)      (5)      (6)    (7)
January     ...      +3.5     +15.0    +20.5    +22.0  +15.2
February    ...      +5.0     +15.5    +19.0    +16.0  +13.9
March       ...      -1.0     +7.5     +7.5     +1.0   +3.8
April       ...      -11.0    -1.0     -0.5     -9.0   -5.4
May         ...      -9.0     -3.0     -9.0     -17.5  -9.6
June        ...      -3.0     -3.0     -17.5    -7.5   -7.8
July        -14.5    -13.5    -15.5    -25.0    ...    -17.1
August      -6.0     -6.5     -6.0     -18.0    ...    -9.1
Sept.       -1.0     -3.5     -3.0     -10.0    ...    -4.4
Oct.        0.0      +2.0     +6.0     +5.0     ...    +3.3
Nov.        +10.5    +12.0    +7.5     +7.0     ...    +9.3
Dec.        +10.0    +9.0     +5.0     +23.0    ...    +11.8

In the calculation of seasonal variation indices the rules given
above can be easily followed. Thus if we have to find out the seasonal
variation index for January, the first thing to be done is to convert the
figure of production of January into a percentage based on the
corresponding figure of the moving average. Thus for January 1937 the
percentage would be (145 × 100)/142.5 or 101.7. This process will have
to be repeated for the production figure of January for all the years. All
these percentages for the month of January should then be averaged.
In this way the averages should be calculated for each month. After
this, the average of these monthly averages should be taken as 100 and
the monthly averages should be expressed as percentages of this figure.
The resulting figures would be the required indices of seasonal variation.
Figure No. 12 below shows the figures of production given in
table XVII and figure No. 13 shows their seasonal variations and
irregular fluctuations:

Figures of production and the calculated trend values
(Original data and Trend)
Fig. 12

Seasonal variation and irregular fluctuations
(Seasonal Variations and Irregular Fluctuations)
Fig. 13

The above graphs very clearly show the seasonal variations of the
data.
Method of link relatives
The following steps should be taken to calculate the seasonal
variation indices by this method:

(1) Calculate the link relatives of the seasonal figures. For
calculating link relatives divide the figure of each season by the figure
of the immediately preceding season and multiply by 100.
$\text{Link relative} = \dfrac{\text{Current season's figure}}{\text{Previous season's figure}} \times 100$
(2) Calculate the average of the link relatives for each season.

(3) Convert these averages into chain relatives on the base of the
first season.
(4) Calculate the chain relative of the first season on the base
of the last season. There will be some difference between this chain
relative of the first season and the chain relative calculated by the previous
method. This is due to the effect of long period changes. It is,
therefore, necessary to correct these chain relatives.
(5) For correction, the chain relative of the first season calculated
by the first method is deducted from the chain relative (of the first season)
calculated by the second method. The difference is divided by the
number of seasons. The resulting figure multiplied by 1, 2, 3 (and so
on) is deducted respectively from the chain relatives of the 2nd, 3rd,
4th (and so on) seasons. These are the corrected chain relatives.
(6) Corrected chain relatives expressed as percentages of their
average give the indices of seasonal variations.
The following table would illustrate the above procedure:

TABLE XIX

Quarterly Figures
Quarter    1940    1941    1942    1943    1944

1          4.5     4.8     4.9     5.2     6.0
2          5.4     5.6     6.3     6.5     7.0
3          7.2     6.3     7.0     7.5     8.4
4          6.0     5.6     6.5     7.2     6.6

The chain relatives would be calculated as follows:

TABLE XX
Calculation of Chain Relatives
                        Quarters
Year                    1        2        3        4
1940                    --       120      133      83
1941                    80       117      113      89
1942                    88       129      111      92
1943                    80       125      115      96
1944                    83       117      120      79
Arithmetic Average      82.8     121.6    118.4    88
Chain Relatives         100      121.6    143.9    126.6
Corrected chain
relatives               100      120.4    141.5    123.0
Seasonal Indices        82.5     99.3     116.7    101.5

The chain relatives in the above table are obtained thus:
Quarter 2 = (100 × 121.6)/100 = 121.6
Quarter 3 = (121.6 × 118.4)/100 = 143.9
Quarter 4 = (143.9 × 88)/100 = 126.6
The corrected chain relatives are:
Quarter 2 = 121.6 - 1.2 = 120.4
Quarter 3 = 143.9 - 2.4 = 141.5
Quarter 4 = 126.6 - 3.6 = 123.0

In the above table the figure for correction has been calculated as
follows:
Chain relative of first quarter (on the basis of first quarter) = 100
Chain relative of first quarter (on the basis of last quarter)
= (82.8 × 126.6)/100
= 104.8
The difference between these chain relatives = 104.8 - 100
= 4.8
Difference per quarter = 4.8/4 = 1.2
Seasonal variation indices have been calculated as follows:
Average of corrected chain relatives
= (100 + 120.4 + 141.5 + 123.0)/4
= 121.2
Seasonal variation indices
= (Corrected chain relatives × 100)/121.2
Thus the indices for the four quarters are (100/121.2) × 100 = 82.5,
(120.4/121.2) × 100 = 99.3, (141.5/121.2) × 100 = 116.7 and
(123.0/121.2) × 100 = 101.5.
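The six steps of the method of link relatives can be sketched in Python as below, using the quarterly figures of table XIX. The sketch is an illustration only: the function name link_relative_indices is our own, and since the link relatives are not rounded to whole numbers as they are in table XX, the indices obtained agree with the hand-worked ones only to about the first decimal place.

```python
# Quarterly figures of table XIX, one row per year (quarters 1 to 4).
quarterly = {
    1940: [4.5, 5.4, 7.2, 6.0],
    1941: [4.8, 5.6, 6.3, 5.6],
    1942: [4.9, 6.3, 7.0, 6.5],
    1943: [5.2, 6.5, 7.5, 7.2],
    1944: [6.0, 7.0, 8.4, 6.6],
}


def link_relative_indices(table):
    """Hypothetical helper following steps (1) to (6) of the text."""
    years = sorted(table)
    seasons = len(table[years[0]])
    series = [value for year in years for value in table[year]]

    # Step 1: link relative of each season on the preceding season.
    links = [[] for _ in range(seasons)]
    for i in range(1, len(series)):
        links[i % seasons].append(series[i] / series[i - 1] * 100)

    # Step 2: average link relative for each season.
    averages = [sum(group) / len(group) for group in links]

    # Step 3: chain relatives on the base of the first season.
    chain = [100.0]
    for s in range(1, seasons):
        chain.append(chain[-1] * averages[s] / 100)

    # Step 4: chain relative of the first season on the base of the last.
    first_on_last = chain[-1] * averages[0] / 100

    # Step 5: spread the difference evenly over the seasons.
    correction = (first_on_last - 100) / seasons
    corrected = [c - s * correction for s, c in enumerate(chain)]

    # Step 6: express corrected chain relatives as percentages of their average.
    mean = sum(corrected) / seasons
    return [c / mean * 100 for c in corrected]


indices = link_relative_indices(quarterly)
print([round(i, 1) for i in indices])
```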

Measurement of cyclical and irregular fluctuations

If from the original series trend values are isolated, the remainder
consists of regular and irregular fluctuations. If from these fluctuations
seasonal variations are also isolated, the remainder consists of cyclical and
irregular fluctuations. In figure No. 13 we have presented seasonal
variations and the remaining cyclical and irregular fluctuations.
Ordinarily, if the period of moving average coincides with the period
of cycle in the series and if the cycle is of a more or less uniform duration,
cyclical fluctuations are considerably reduced and sometimes eliminated
in the process of finding out the trend values. As such, if seasonal
variations are isolated from the total fluctuations, the remainder mostly
consists of irregular fluctuations. However, a certain amount of cyclical
fluctuations may also be there if they have not been completely eliminated.
There is no well-recognised method of separating irregular and cyclical
fluctuations. One method is by finding out the moving average of the
series of cyclical and irregular fluctuations. By this process the irregular
fluctuations would be reduced and cyclical data would become more
prominent. The period of moving average in this case would depend
on two factors: (1) the irregularity of the data and (2) the extent to which
the curve is to be smoothed. The more irregular the series is, the longer
should be the period of moving average so that irregularities in one
direction may be set off against irregularities in another direction. But
if the period is too long the curve would be very much smoothed. The
problem is to find a middle course between these two factors and in
taking a decision the object of the analysis should be kept in mind.
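The effect of the period on the smoothing can be seen from a small Python sketch. The function name and the figures are hypothetical, and the sketch handles only odd periods so that each average falls against the middle item.

```python
def moving_average(values, period):
    """Centred simple moving average for an odd period; ends are left as None."""
    if period < 1 or period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed[i] = sum(window) / period
    return smoothed


# Hypothetical cyclical-cum-irregular remainder after removing trend
# and seasonal variations.
remainder = [3, -2, 4, 1, -3, -5, -1, -4, 2, 5, 1, 4]

smoothed_short = moving_average(remainder, 3)  # irregularities only partly offset
smoothed_long = moving_average(remainder, 5)   # smoother, but more items lost at the ends
print(smoothed_short)
print(smoothed_long)
```

A longer period sets off more of the irregular movements against each other, but it also leaves more items at either end without an average and flattens the cyclical movement itself.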
So far as irregular fluctuations are concerned, there is no method
to isolate them. Since by nature they are irregular it is difficult to know
anything about them. After removing trend, seasonal and cyclical
fluctuations from the original data whatever remains constitutes the
irregular fluctuations. Since they are irregular in character, their scientific
analysis is more or less out of question. It should, however, not be
taken to mean that they are not important. They are sometimes very
significant and they can even give birth to cyclical and other types of
regular fluctuations.
Questions
1. Distinguish between secular trend, seasonal variations and cyclical
fluctuations. How would you measure secular trend in any given data?
(M. Com., Agra, 1946)
2. (a) Distinguish between regular and irregular fluctuations in a time series.
(b) Write a short note on the value of analysing time variations.
(M. A., Punjab, 1951).
3. Write a short essay on "Analysis of Time Series". (M. A., Patna, 1954).
4. Describe briefly the statistical procedure you would adopt in the analysis
of a time series and explain how you would isolate the secular trend.
(M. A., Patna, 1942).
5. Explain clearly what is meant by time series analysis. Indicate fully the
importance of such analysis in business. (B. Com., Lucknow, 19«).
6. Describe one method each of (a) eliminating the effect of trend from time
series (b) measuring the seasonal variations.
In measuring seasonal variation, can cyclical and erratic influences be
eliminated? How? (I. A. S., 1948).
7. What is meant by trend? How would you statistically eliminate the
influence of seasonal and cyclic factors on the long period movement of any series?
(B. Com., Bombay, 1936).
8. Discuss the claims and limitations of the method of moving averages as
applied to analysis of time series. (M. A., Delhi, 1953).
9. How would you find out the trend values in a series by the Method of Least
Squares? Explain the mathematical implications of the technique.
10. What do you understand by parabolic trend? How would you fit a
parabola of the 2nd degree to a time series to obtain trend values?
11. In an experiment designed to find the effects of seed rate on the yield of
wheat the following results were obtained :-
Seed rate (lbs. per acre)        40     50     60     70     80
Average yield of wheat
(lbs. per acre)                 850    862    858    817    768
Draw a graph and fit a second degree parabola.
12. Fit a straight line trend by the method of least squares and a parabolic trend
(by a parabola of the second degree) to the data relating to growth of reserves of
Cooperative Societies in India as given below. Plot the series and the trend on a
graph paper :-

Year         Reserves              Year         Reserves
             (Lakhs of rupees)                  (Lakhs of rupees)
1927-28        612                 1931-32       1001
1928-29        719                 1932-33       1106
1929-30        820                 1933-34       1231
1930-31        907
13. Explain how you will deal with a time series, and illustrate your remarks
with the help of the following series of annual figures for the period 1901-1930.
Period A.nnual Values
'90 I - 19 10 ot08,zz3.22 S.Z12..Z39,&4z.z38.z S2,2 57,:&50,
1911-19z0 Z73,Z70,z6B,z88.z84.:t8z,300.soS.Z9S.3 I 3
1921 - 193 0 3 I7.30 9,3 z9,335.3 z 7,34 S,344,HS.S62,S60
(I. C. S.. 1939).
14. Explain the use of moving averages in the analysis of time series. Find
out an approximate moving average for the following series :-
ll}Ol 506 19u 818
ll}OZ 620 1913 745
190 3 1036 1914 8.+s
lI}04 615 1915 U76
I~
l1}O6
s"
696
I~6
1917
~
814
1907 1II6 1915 929
l1}O8 738 1919 1560
lI}09 663 1920 961
1910 777 1921 926
1911 n89
(M. A., Calcutta)
15. The following are the figures for the infantile mortality rate in England
and Wales (deaths of infants under one year of age per 1,000 live births).
Year    Rate        Year    Rate
1922     77         1936     59
1923     69         1937     58
1924     75         1938     53
1925     75         1939     51
1926     70         1940     57
1927     70         1941     60
1928     65         1942     61
1929     74         1943     49
1930     60         1944     45
1931     66         1945     46
1932     65         1946     43
1933     64
1934     59
1935     57
Fit a simple moving average of fives to the series and apply a further simple
moving average of fives to the result.
16. Business Cycles in U. S. A. and England arranged in chronological order
(1796-1923) have had the following duration as measured to the nearest years :-
U.S.A.
6,6, 5, 3. 7, 3,3, 5,4, 3, r" t, ", 6, 4, 3, 5, 5,4.9, " 3. 2. 3, 4, 3. 4 ,2, " 5, 2, 3·

England
4,6,3,5,6,4. Z, 6,10,7,4,8,8,9,8,10,7,6,5, z.
Tabulate the above figures in classes of one year each and calculate the average
duration of the business cycle in each country separately. (B. Com., Lucknow, 1939).
17. The following table gives the Bank Clearings in the Bombay city for the
years 1916 to 1940 in millions of rupees. Find the trend.

19 16 52·7 192 9 94. 6


17 79·4 30 8,.0
18 76 .3 31 II 0.6
19 66.0 32 159. 6
ze 68.6 33 177·4
ZJ 93. 8 H 178•6
22 104.7 35 235.8
23 -B7· 2 36 243. 2
24 79·3 37 194·4
Z5 103. 6 38 21 7.9
26 97-3 39 Z14· c
27 9 2.4 40 "5 6 ,7
,8 100·7
(B. Com., Allahabad, 1943).
18. Assuming a ten-yearly cycle for the following series relating to Index Numbers
of the Retail Price of wheat in India (18n= 100), give the trend values, and
represent graphically the short time fluctuations with the trend removed.

Year Annual Average Ye'U Annual 'A vcrage


1906 111 1918 &70
'90 7 168 19 19 34 1
1'}O8 2.2.6 19 z0 510
1909 20 3 19u 560
1910 170 19 22 315
19 I1 ['53 192 3 H6
1912 170 19 2 4 Z4 6
1913 177 19 2 5 294
1914 20e. 19 2 6 281
191~ 127 19 2 7 267
1916 193 192 8 z64
19 17 20 5 192 9 262
(M. Com., Allahabad, 1944).
19. Below are given figures of production (in thousand maunds) of a sugar
factory :-
Year    Production in          Year    Production in
        thousand maunds                thousand maunds
1941        80                 1945        94
1942        90                 1946        99
1943        92                 1947        92
1944        83
(a) Find the slope of a straight line trend to these figures.
(b) Plot these figures on a graph and show the trend line.
(c) Do these figures show a rising trend or a falling trend? How do you arrive
at your conclusion? (M. Com., Lucknow, 1950).
20. Compute the trend of the Sterling Assets of the Reserve Bank of India by
the method of Least Squares.

Yeu Sterling Assets Year Sterling Assets


(crores 'f RI.) (crores of Rrs.)
~ 1936-37 8~ 1939-40 90
1937-3 8 9z 1940-41 169
193 8-39 7J 1941-4' 191
21. The following are the Monthly Index Numbers of Commodity Group
(Food and Tobacco) issued by the Economic Adviser to the Govt. of India.
Prices for week ending 19th August, 1939 = 100
Month Index Month Index
194'1 1942
Oct.
Nov.
JZ7·4 July -
August
In· 1i
IZ7·9 15 8 .9
Dcc. 12 7.5 SeptClI'''''r 161.0
194 2 Oct. 167. 2
Jan. 12.8.4 Nov 17 2 .4
Fcb. J32.' Dec. !,;,8·5
March 1,0.5 1943
April 1,6.1 Jan. 190.8
May 144.7 Fcb. '-7 0 •0
June IU ••
Fit a straight line trend to the above data by the method of Least Squares.

22. Study the short-time fluctuations of the following temperatures measured
in degrees Fahrenheit :-
Date 1941    Temp.        Date 1941    Temp.
Feb.  1       40          Feb. 11       78
      2       50               12       80
      3       44               13       60
      4       70               14       64
      5       52               15       61
      6       44               16       68
      7       36               17       86
      8       40               18       96
      9       56               19       94
     10       68               20       78
(B. Com., Allahabad, 1941)
23. Using the data given below, explain clearly how you would determine
seasonal fluctuations in a time series :-
Ycar Summer Monsoon Autumn Winter
30 81 61 II9
2 53 104 86 17 1
:J 41 153 99 :t:tl
4 56 172 129 135
S 67 20( 136 30 2-
(I. C. $., 1940).
24. Analyse the following figures of output of coal in Great Britain so as to
arrive at the extent of
(a) seasonal movement and
(b) irregular fluctuations :-
GREAT BRITAIN
Quantity Output of Coal (In Million Tons)
Year Quarters Output
1927 J 68,3
II 62.6
111 6'11.1
IV 63·3
19 28 I 65'.4
II n·9
6
III 5 .4
TV 6t·5
19 2 9 ~ I 68.1
II 62.7
III 62.8
IV 67.0
1930 J 70 • 1
II 59·1
TJJ S6.~
IV 61.6
IQ~t I 59·5
II 54.8
IIJ 5 J •1
IV ,8.0
(M.e_., AlltJl1obtla, ~94").
25. The following are the quarterly index numbers of Industrial Production
with lQ50= 100 (All items) published by the Board of Trade, U. K. By a moving
average of four, calculate a quarterly index corrected for seasonal effects.
Year Index Year Index
1928 193 0
I 106.0 I 10 7.6
II 100·4 n 100.0
III 97. 1 III 96.,
IV 10~·7 IV 96 •0
192.9 193 1
I 107.2. I 9 1.5
II 108.6 II 89. 1
III 107.3 III 86.4
IV IlO., IV 94. 1
193 2
I 9 1 .7
II 91.0
III 84.4
IV 9 1.7
26. The following is an index no. of the price of lead from 1926-1945 together
with the "Statist" wholesale price index of the period. Construct an index of lead
prices, "corrected" for changes in the wholesale price level.
Year Index of wholesale Index No. of
price Lead prices
192.6 u5 157
1927 122 US
19 22 119 109
19 2 9 114 II7
193 0 9 6 95
1931 82 71
193 z 79 63
1933 7 8 6,
1934 8y 61
1935 83 78
193 6 88 9'
1937 102 121
193 8 90 83
1939 94 8,
1940 128 127
1941 14z 129
1942 151 129
1943 115 129
1944 160 129
1945 164 42
27. The number (in hundreds) of letters posted in a certain city on each day of
a typical period of five weeks was as follows :-
Sun. Mon. Tues. Wed. Thura. Pri. Sat. Total
for each
week
lit week 18 161 170 164 153 181 76 92 3
and week 18 16, 179 157 168 195 85 967
~rd week 162 169 153 139 IS, S.1 9I1
~th week 182 170 16z 179 95 9 83
5th week 186 170 170 18z 120 101 7
Total for 886 814 79 2 922- 458 4801
1111 weeks ,
Calculate the average fluctuation indices within a week.
(B. Com., Andhra, 1942)
28. Describe one method each of
(a) eliminating the effect of trend from a time series, and
(b) measuring seasonal variation.
In measuring seasonal variation, can cyclical and erratic influences be
eliminated? How? (I. A. S., 1948).
29. The annual average price index of merino wool for the years 1900-1938
is tabulated below. Draw a graph of this data and then superimpose on it the graph
of a 5-year moving average. Briefly describe the trend.
0 % 3 4 6 7 8 9
190- 79 72, 87 9; 92, 98 101 100 8~ 101
'9 1- 101 94 98 IO~ 109 151 187 2S2 249 393
192-,64 117 185 210 273 204 179 190 181 134
'9~- 91 76 69 96 92 95 117 121. 8~

30. The revenue from Sales Tax in U. P. during 1958-59 to 1962-63 is shown
in the following table. Fit a straight line trend by the method of least squares and
exhibit the data as also the trend on a graph paper.
Years 18 58-59 59-60 60-61 61-62 62-63
Re...enue (RI. Lakha) 427 6u 52 1
31. Find out Trend, Short-time Oscillation, Seasonal Variations and Irregular
fluctuations from the following data :-
Year Summa Monsoon Autumn Winter
1. 62 II9
,....
J.. 86
99
u9
171
2.2.1
2.35
s· 136 5 0 2.
Correlation 16
Meaning. In the various types of analysis discussed so far in previous
chapters we have confined ourselves to such series where various items
assumed different values of one variable. We have discussed how
measures of central tendency and measures of dispersion and skewness
are calculated in such cases for purposes of comparison and analysis.
With the help of these measures such data can be easily understood.
There can, however, be such series also where each item assumes the
values of two or more variables. For example, if the heights and weights
of a group of persons are measured we shall get such a series where each
member of the group would assume two values, one relating to height
and the other relating to weight. If besides heights and weights, the chest
measurements were also taken, each member of the group would assume
three values relating to three different variables. In such cases we can
calculate averages, dispersion and skewness, etc., in accordance with
the rules given in previous chapters.
But sometimes it appears that the values of the various variables
so obtained are inter-related. It is likely that such a relationship may
be found in two series relating to the heights and weights of a group
of persons. It may be observed that weights increase with increase in
heights, so that tall people are generally heavier than short people.
Similarly, if data are collected about the prices of a commodity and the
quantities sold at different prices two series would be obtained. One
variable would be the various prices of the commodity and the other
variable would be the quantities sold at these prices. In two such series
we are again likely to find some relationship. With an increase in the
price of the commodity the quantity sold would generally decrease. We can
thus conclude that there is some relationship between price and demand.
Such relationships can be found in many types of series, for example,
prices and supply, heights and weights of persons, prices of sugar and
sugarcane, ages of husbands and wives, etc.
The term correlation (or co-variation) indicates the relationship between
two such variables in which with changes in the values of one variable, the
values of the other variable also change. Here the word relationship has
been used in the sense of mutual dependence. Correlation in two series
need not always be the result of their mutual inter-dependence. Changes
in one series may be the cause of changes in the other and there may
be cause and effect relationship between the two series. It is also likely
that the changes in the two series are the effects of a third factor which
affects both these series either in the same way or in different ways.
In physical sciences the study of correlation is not very difficult, as on
the basis of experiments, mathematical relationships can be established
between the values of two or more variables. For example, the effect
of heat on temperature can be reduced to a mathematical formula which
would disclose the relationship between these two variables. In economics
and other social sciences the study of correlation, like other studies, cannot
be so accurate and precise. Here the data are affected by a multiplicity
of causes, and it becomes difficult to ascertain which particular cause is
responsible for changes in the value of a variable. Thus apparently,
there may be a relationship between the price of a commodity and its supply,
so that whenever prices rise, supply also increases and vice versa. But the
changes in supply may not be the direct effect of the changes in price
or in any case price changes may not be the only cause of changes in the
supply. There are a variety of factors which affect the supply of a particular
commodity. Some of these factors are labour conditions, availability
of raw materials, means of transport, etc. Under such circumstances it
becomes difficult to say that the changes in supply are always due to
changes in price. Moreover, it is extremely difficult to establish a
mathematical relationship between the values of two or more variables in such
type of data. Economic laws are true only on an average and here too
the accuracy is not perfect. Therefore, in inexact sciences like economics
all that can be done is to find out whether, in general, the values of two
variables move in the same direction or in reverse directions. In either
case there is an indication of correlation. But it should always be
remembered that existence of correlation is no guarantee that the
relationship between the two variables would always be of the same type. Tall
husbands may, in general, have tall wives and short husbands may have
short statured wives, but it does not mean that every tall husband would
have a tall wife. Many tall husbands may have short wives and many
short statured husbands may have tall wives. A husband may even have
two wives, one tall and one short statured. Similarly there may be a
correlation between price and demand so that in general whenever there
is an increase in price the demand falls, and vice versa. But this does not
mean that whenever there is a rise in price the demand must fall. It is
possible that with the rise in price the demand may also go up. This is
so on account of the fact that in economic and social phenomena various
factors affect the data simultaneously and it is difficult, almost
impossible, to study the effects of these factors separately.
Positive and negative correlation. Correlation can be either positive or
negative. When the values of two variables move in the same direction
so that an increase in the value of one variable is associated with an
increase in the value of the other variable also, and a decrease in the value
of one variable is associated with a decrease in the value of the other
variable also, correlation is said to be positive. If, on the other hand,
the values of two variables move in different directions, so that with an
increase in the value of one variable the value of the other variable
decreases, and with a decrease in the value of one variable the value
of the other variable increases, correlation is said to be negative.
There are some data in which correlation is generally positive while
in others it is negative. Thus, generally price and supply are positively
correlated. When prices go up supply also increases and with the fall
in prices supply also decreases. The correlation between price and
demand is generally negative. With an increase in price the demand goes
down and with a decrease in price the demand generally goes up.
Linear and non-linear correlation. When the variations in the values
of two variables are in a constant ratio, correlation is said to be linear.
Thus if with a 10% increase in price each time, the supply also increases
by 20%, there is a linear relationship between these two variables.
Their relationship is of the type y = a + bx. We have seen in the previous
chapter that this is the equation of a straight line. If the corresponding
values of two such series are plotted on a graph paper a straight line
would be obtained. In economic and social data, however, such
relationships are very rare. They are found only in exact sciences. In economic
data the ratio of change in the two variables is generally not constant.
In such cases, the corresponding figures of the two variables would not
give a straight line. The correlation may be curvi-linear or non-linear.
Thus linear correlation is one where the ratio of variations in the related
variables is constant and non-linear correlation is one where this ratio
is fluctuating.
Correlation in two or more series can be studied by any one of the
following methods :-
(a) Scatter diagram.
(b) Correlation graph.
(c) Coefficient of correlation.
(d) Correlation table.
We shall now study them in turn.

SCATTER DIAGRAM

Diagrams and graphs can be drawn to have an idea about
the relationship between two or more variables. We have already
discussed in Chapters XII and XIII the technique of drawing diagrams
and graphs respectively. Suppose in a set of data each item assumes one value
each of the two variables x and y. For example, if the heights and
weights of 100 people are measured then each person assumes one value
in each of the two variables, height and weight. If height is represented
by x and weight by y and if, further, the x-variable is plotted on the
horizontal scale and the y-variable on the vertical scale, only one point would be
plotted for each set of the two values. On the horizontal scale it would
indicate the height and on the vertical scale the weight. In this way the
whole data about 100 people can be plotted in the shape of 100 points.
If these points show some trend either upward or downward the two
variables are said to be correlated. If the plotted points do not show any
trend, the two variables are not correlated. If the trend of the points is
upward, rising from left bottom and going up towards the right top,
correlation is positive. If, on the other hand, the tendency is reverse,
so that the points show a downward trend from the left top to the right
bottom, correlation is negative. The following scatter diagrams would
clarify these points :-
[Scatter diagrams of y against x. Fig. 1: Positive correlation. Fig. 2: Negative
correlation. Fig. 3: Absence of correlation.]

The three figures given above are scatter diagrams. They indicate
the scatter of various points. These points are not in any mathematical
relationship and as such they only indicate the trend of the data. Figure
No. 1 indicates positive correlation as it shows that the values of the two
variables move in the same direction. Figure No. 2 indicates negative
correlation as the values of the two variables are moving in reverse
directions. Figure No. 3 does not have any trend line and it shows that
there is no correlation between the two variables.
It is possible to have a line of the best fit in the above type of data.
If it is drawn by the method of least squares it would set a mathematical
relationship in the variations of the two variables. One advantage of
the line of the best fit is that with it, if the value of one variable is given,
it is possible to estimate the value of the other variable. The line of the
best fit can also be drawn by free-hand. It is rather difficult to draw such
a line by the free-hand method. Generally a piece of thread is stretched
through the plotted points to locate the best possible position for the
line.
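A line of the best fit of the type y = a + bx, drawn by the method of least squares, can be sketched in Python as follows. The function name and the heights and weights are hypothetical and are given only to show how the value of one variable may be estimated from the other.

```python
def fit_line_least_squares(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical heights (inches) and weights (lbs.) of six persons.
heights = [60, 62, 64, 66, 68, 70]
weights = [110, 120, 124, 136, 140, 152]

a, b = fit_line_least_squares(heights, weights)
estimated_weight = a + b * 65  # estimate of weight for a height of 65 inches
print(round(a, 2), round(b, 2), round(estimated_weight, 1))
```

A positive value of b corresponds to the upward trend of Figure No. 1 and a negative value to the downward trend of Figure No. 2.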

CORRELATION GRAPH

Graphs can also be used to study correlation between two series.
Graphs disclose whether there is any relationship between the two
variables, and if there is, whether it is positive or negative. If in a graph
the two curves representing the two variables show a similar tendency
it is an indication of positive correlation. If, on the other hand, the
two curves move in different directions, correlation is negative. The
data given in Table I below are plotted in figure No. 4.

TABLE I
Price and Supply of a Commodity
Year        Price per Maund      Supply
            (in rupees)          (in maunds)
1944             32              22,000
1945             45              29,000
1946             32              22,000
1947             29              19,000
1948             44              27,000
1949             69              43,000
1950             40              24,000
1951             29              18,000
1952             31              20,000
1953             39              23,000
1954             53              32,000
1955             43              26,000
Average          40.5            25,400
In representing the above data by a graph the ordinate will have
two scales, one which will show the price (in rupees) and the other
which will show the supply (in maunds). In adjusting the scales of the
two variables, care should be taken to see that their averages are at the
same level or in any case very close to each other. If this is done the
two curves would be close to each other and their relationship can be
easily studied. If need be, a false base line can be taken for the purpose.
[Fig. 4: Price and Supply of a Commodity. Two curves, price (per maund) and
supply (thousand maunds), plotted against years on separate vertical scales.]

The above figure clearly indicates that the two variables, price
and supply, are positively correlated. Their curves always move in
the same direction and this shows that the relationship is very close.
This graph, however, measures only absolute changes. If there is
really a good relation between the two series then the relative variations
in one would be related to the relative variations in the other. In order
to study the relationship between relative variations, either the data can
be plotted on a semi-logarithmic scale or the series may be converted into
index numbers and then plotted on a natural scale.
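The conversion into index numbers may be sketched in Python as below. The figures are those of Table I as read here (the prices for 1946 and 1954 are taken as 32 and 53, an assumption consistent with the printed average of 40.5); the function name and the choice of 1944 as the base year are also assumptions made only for illustration.

```python
def to_index_numbers(values, base):
    """Express each value as a percentage of the base value."""
    return [value / base * 100 for value in values]


# Table I, 1944 to 1955 (two prices assumed, see the note above).
prices = [32, 45, 32, 29, 44, 69, 40, 29, 31, 39, 53, 43]
supply = [22000, 29000, 22000, 19000, 27000, 43000,
          24000, 18000, 20000, 23000, 32000, 26000]

# Both series on the base 1944 = 100, so that relative changes are comparable.
price_index = to_index_numbers(prices, prices[0])
supply_index = to_index_numbers(supply, supply[0])

for year, p, s in zip(range(1944, 1956), price_index, supply_index):
    print(year, round(p, 1), round(s, 1))
```

Once both series stand on a common base of 100 they can be plotted on a single natural scale, and a rise of, say, one-tenth in either series occupies the same vertical distance.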
Graphic correlation of long period and short period changes. If correlation
between long time changes or trend of two variables is desired to be
studied separately from the correlation of their short time fluctuations,
each series must first be split up into parts, one showing only their
trend values, and the other only short time fluctuations. This can be
done by finding out trend values by the method of moving average or
least squares. From the original series these trend values should be
subtracted to obtain values of short time fluctuations. After this,
the trend values of the two variables should be plotted on one graph
and the fluctuations on the other. It is possible to draw them on one
graph paper also. Thus, the relationship between the long time changes
of two series can be studied separately from the relationship of their
short time changes. It is not unlikely that there is a positive correlation
between the trend values of the two variables and negative correlation
between their short time fluctuations.
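The splitting of each series into trend and short time fluctuations can be sketched in Python as follows. The two series are hypothetical and have been so constructed that their trends move together while their fluctuations move against each other; the function names and the three-yearly moving average are likewise assumptions. The coefficient used to summarise each relationship is the product-moment coefficient of correlation.

```python
import math


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_trend(values, period=3):
    """Return (trend, fluctuations) using a centred moving average of odd period."""
    half = period // 2
    trend, fluctuations = [], []
    for i in range(half, len(values) - half):
        t = sum(values[i - half:i + half + 1]) / period
        trend.append(t)
        fluctuations.append(values[i] - t)  # original value minus trend value
    return trend, fluctuations


# Hypothetical series: both rise over time, but a short-period disturbance
# pushes one up whenever it pushes the other down.
disturbance = [1, -1, 1, -1, 1, -1, 1, -1, 1]
series_a = [10 + 2 * t + d for t, d in enumerate(disturbance)]
series_b = [5 + t - d for t, d in enumerate(disturbance)]

trend_a, fluct_a = split_trend(series_a)
trend_b, fluct_b = split_trend(series_b)

r_trend = pearson_r(trend_a, trend_b)   # correlation of long period changes
r_fluct = pearson_r(fluct_a, fluct_b)   # correlation of short period changes
print(round(r_trend, 2), round(r_fluct, 2))
```

In this constructed example the trend values are positively correlated and the fluctuations negatively correlated, which is the situation described in the paragraph above.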
It should be remembered, however, that by a correlation graph
we can only have an idea about the direction of correlation, whether
it is positive or negative; we cannot know its extent or magnitude.

COEFFICIENT OF CORRELATION

Purpose of calculation. Coefficient of correlation is calculated to
study the extent or degree of correlation between two variables. As
has been said earlier, the fact that there is correlation between two
variables does not mean that their relationship is functional or constant.
If the value of a variable is known it is not always possible to obtain
the exact value of the other variable. This can be done only where there
is a functional relationship between the two variables. There are a few series
in which such a relationship exists, e.g., natural numbers and their squares
or square roots would always give a functional relationship. Similarly
a functional relationship would be obtained between two series, one relating
to radii of various circles and the other relating to their areas. In economic
data such relationships are rarely found. No doubt demand would
fall with an increase in price, but the relationship is not functional.
There is no constant ratio between the variations of the two series relating
to price and demand.
Perfect Correlation. If the relationship between two variables is
such that with an increase in the value of one, the value of the other
increases or decreases in a fixed proportion, correlation between them is
said to be perfect. If both the series move in the same direction and the
variations are proportionate there would be perfect positive correlation
between them. If, on the other hand, the two series move in reverse
directions, and the variations in their values are always proportionate, it is
an example of perfect negative correlation. It is also likely that there may
be no relationship between the variations of the two series in which case
there is said to be no correlation between them.
As has been said earlier, in economic data perfect positive or
negative correlation is usually not found, as the relationship between
economic series is rarely functional. In such data correlation is not perfect
as the related series are not completely dependent on each other. Perfect
correlation is obtained when there is complete mutual dependence
between the two series. The following graphs show perfect positive and
perfect negative correlations :-
[Fig. 5: Perfect Positive Correlation. Fig. 6: Perfect Negative Correlation.
In each figure the plotted points lie on a straight line.]
It would be observed from the above graphs that all the
corresponding values of x and y are in a straight line. Figure 5 indicates perfect
positive correlation between x and y as the variations in the values of the
two series are always in a fixed proportion and they move in the same
direction. Figure 6, on the other hand, shows a perfect negative
correlation between x and y as the variations between their values are in a constant
ratio and the two series move in reverse directions.
After knowing this, it is necessary to obtain such a measure of
correlation which can accurately indicate the degree of correlation in
quantitative terms. The measure should be such that its extreme values
represent perfect positive and perfect negative correlations and the value
in the middle, absence of correlation. Such a measure is given by the
coefficient of correlation.
The coefficient of correlation which we are going to discuss in
the following pages always varies between the two limits of +1 and -1.
When there is perfect positive correlation its value is +1 and when there
is perfect negative correlation its value is -1. Its mid-point is 0, which
indicates absence of correlation. As the value of this coefficient decreases
from the upper limit of +1, the extent of positive correlation between
the two variables also declines. When it reaches the value of 0 it indicates
complete absence of correlation and when it goes further down in negative
values (less than zero) it indicates negative correlation. When it reaches
the other limit of -1 there is evidence of perfect negative correlation
between the two series.
The above-mentioned points can be studied from the graphs which
have been given so far. When the values of the variables are like those
given in Figure 5 there is perfect positive correlation or the value of the
coefficient of correlation is +1; when they are like those given in Figure
1 there is positive correlation but it is not perfect, or the value of the
coefficient of correlation is less than +1 but more than 0. When the
values are like those given in Figure 3 there is no correlation between
the data or the value of the coefficient of correlation is 0; when the values
are like those given in Figure 2, there is negative correlation though not
perfect, which means that the value of the coefficient of correlation would
lie between 0 and -1. If, however,
the values of the variables are like those given in Figure 6 there is
perfect negative correlation or, in other words, the value of the coefficient of
correlation would be -1.
CALCULATION OF COEFFICIENT OF CORRELATION

(Karl Pearson's Formula)


Karl Pearson, the great biologist and statistician, has given a formula
for the calculation of coefficient of correlation. According to it the
coefficient of correlation of two variables is obtained by dividing the sum of the
products of the corresponding deviations of the various items of two series from
their respective means by the product of their standard deviations and the number
of pairs of observations.
Thus, if $x_1, x_2, x_3, \ldots, x_n$ are the deviations of various items
of the first variable from its mean value and $y_1, y_2, y_3, \ldots, y_n$ are the
corresponding deviations of the second variable from its mean value, the sum of the
products of these corresponding deviations would be $\Sigma xy$. If further,
the standard deviations of the two variables are respectively $\sigma_1$ and $\sigma_2$ and
if $n$ is the number of pairs of observations, Karl Pearson's coefficient
of correlation represented by $r$ would be

$$ r = \frac{\Sigma xy}{n\,\sigma_1 \sigma_2} \qquad (1) $$
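Formula (1) can be worked out step by step in Python. The sketch below is only illustrative: the function name is hypothetical, and the figures are those of Table I as read here (the prices for 1946 and 1954 are taken as 32 and 53, an assumption consistent with the printed average of 40.5).

```python
import math


def karl_pearson_r(first, second):
    """Coefficient of correlation r = (sum of xy) / (n * sigma_1 * sigma_2)."""
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Deviations of the items of each series from its own mean.
    x = [value - mean_1 for value in first]
    y = [value - mean_2 for value in second]
    sum_xy = sum(dx * dy for dx, dy in zip(x, y))
    # Standard deviations of the two series (divisor n).
    sigma_1 = math.sqrt(sum(dx * dx for dx in x) / n)
    sigma_2 = math.sqrt(sum(dy * dy for dy in y) / n)
    return sum_xy / (n * sigma_1 * sigma_2)


# Table I: price per maund (rupees) and supply (maunds), 1944 to 1955.
prices = [32, 45, 32, 29, 44, 69, 40, 29, 31, 39, 53, 43]
supply = [22000, 29000, 22000, 19000, 27000, 43000,
          24000, 18000, 20000, 23000, 32000, 26000]

r = karl_pearson_r(prices, supply)
print(round(r, 2))
```

For these figures the coefficient comes out close to +1, which agrees with the very close positive relationship shown by the correlation graph of price and supply.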

It is clear from the above formula that if $\Sigma xy$ is positive, the
coefficient of correlation would also be a positive figure, indicating
positive correlation between the two series. If, on the other hand, $\Sigma xy$
is negative, the coefficient of correlation would also be negative, indicating
that the correlation between the two series is negative. $\Sigma xy$ would
be positive if, generally, positive and negative deviations in one series
are associated with positive and negative deviations in the other series
also. The value of $\Sigma xy$ would be negative if, generally, the positive
deviations of one variable are associated with the negative deviations
in the other variable and vice versa. If positive and negative deviations
of one variable are indifferently associated with the deviations of the other
variable the value of $\Sigma xy$ would be 0 or near it, indicating absence of
correlation between the two series. The value of this coefficient of correlation
is always between +1 and -1. It cannot exceed unity*.
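* Proof:
Suppose the deviations of the x-series from its mean are $x_1, x_2, x_3, \ldots, x_n$
and those of the y-series $y_1, y_2, y_3, \ldots, y_n$, and their ratios, i.e.,
$\frac{x_1}{y_1}, \frac{x_2}{y_2}, \ldots, \frac{x_n}{y_n}$, are respectively $a_1, a_2, \ldots, a_n$. Then
$$\begin{aligned}
n^2\sigma_1^2\sigma_2^2 - (\Sigma xy)^2
&= n^2 \cdot \frac{\Sigma x^2}{n} \cdot \frac{\Sigma y^2}{n} - (\Sigma xy)^2 \\
&= \Sigma x^2 \times \Sigma y^2 - (\Sigma xy)^2 \\
&= (x_1^2 + x_2^2 + \ldots + x_n^2)(y_1^2 + y_2^2 + \ldots + y_n^2) - (x_1y_1 + x_2y_2 + \ldots + x_ny_n)^2 \\
&= \left\{\left(\tfrac{x_1}{y_1}y_1\right)^2 + \left(\tfrac{x_2}{y_2}y_2\right)^2 + \ldots + \left(\tfrac{x_n}{y_n}y_n\right)^2\right\}(y_1^2 + y_2^2 + \ldots + y_n^2) - \left\{\tfrac{x_1}{y_1}y_1^2 + \tfrac{x_2}{y_2}y_2^2 + \ldots + \tfrac{x_n}{y_n}y_n^2\right\}^2 \\
&= (a_1^2y_1^2 + a_2^2y_2^2 + \ldots + a_n^2y_n^2)(y_1^2 + y_2^2 + \ldots + y_n^2) - (a_1y_1^2 + a_2y_2^2 + \ldots + a_ny_n^2)^2 \\
&= y_1^2y_2^2(a_1^2 + a_2^2 - 2a_1a_2) + \ldots \\
&= \sum_{i<j} y_i^2 y_j^2 (a_i - a_j)^2
\end{aligned}$$
$= 0$ if $a_1 = a_2 = a_3 = \ldots = a_n$,
or positive ($> 0$) for all other values.
Thus
(i) When $a_1 = a_2 = \ldots = a_n$
$n^2\sigma_1^2\sigma_2^2 - (\Sigma xy)^2 = 0$
or $(\Sigma xy)^2 = n^2\sigma_1^2\sigma_2^2$
or $\frac{(\Sigma xy)^2}{n^2\sigma_1^2\sigma_2^2} = 1$
or $r^2 = 1$
or $r = \pm 1$
(ii) When $a_1$ is not equal to $a_2$ or $a_3$, etc.
$n^2\sigma_1^2\sigma_2^2 - (\Sigma xy)^2 > 0$
or $n^2\sigma_1^2\sigma_2^2 > (\Sigma xy)^2$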

(CALCULATION OF KARL PEARSON'S COEFFICIENT OF CORRELATION)


(DIRECT METHOD)

Series of individual Observations


Example 1. Find out the correlation coefficient between heights
of father and son from the following data:
Height of father
in inches           65   66   67   67   68   69   70   72
Height of son
in inches           67   68   65   68   72   72   69   71
Solution by Direct Method
Calculation of correlation coefficient between heights
of father and son
Height of    Deviation    Square     Height     Deviation    Square     Product
father in    from aver-   of devia-  of son     from aver-   of devia-  of devia-
inches       age height   tion       in inches  age height   tion       tions
             (68")                              (69")
   m1           x           x²          m2          y           y²         xy
   65          -3            9          67         -2            4          6
   66          -2            4          68         -1            1          2
   67          -1            1          65         -4           16          4
   67          -1            1          68         -1            1          1
   68           0            0          72         +3            9          0
   69          +1            1          72         +3            9          3
   70          +2            4          69          0            0          0
   72          +4           16          71         +2            4          8
Σm1 = 544       0        Σx² = 36    Σm2 = 552      0        Σy² = 44   Σxy = 24
n = 8                                n = 8

Average height of father $= \frac{\Sigma m_1}{n} = \frac{544}{8} = 68$ inches
Average height of son $= \frac{\Sigma m_2}{n} = \frac{552}{8} = 69$ inches

* Proof (continued):
or $1 > \frac{(\Sigma xy)^2}{n^2\sigma_1^2\sigma_2^2}$
or $1 > r^2$
or $1 > r$
or $r < 1$
Thus r or coefficient of correlation cannot exceed unity.
Standard deviation of the heights of fathers
$$\sigma_1 = \sqrt{\frac{\Sigma x^2}{n}} = \sqrt{\frac{36}{8}} = 2.12 \text{ inches}$$
Standard deviation of the heights of sons
$$\sigma_2 = \sqrt{\frac{\Sigma y^2}{n}} = \sqrt{\frac{44}{8}} = 2.34 \text{ inches}$$
Substituting the above values in Karl Pearson's formula, we
get,
$$r = \frac{+24}{8 \times 2.12 \times 2.34} = +.6$$
Thus the coefficient of correlation between the heights of fathers
and sons is +.6.
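The arithmetic of Example 1 can be checked mechanically. The following is a minimal sketch in Python; the function name `pearson_r` is our own choice and does not come from the text. It applies formula (i) as stated: deviations from the actual means, the sum of their products, and the two standard deviations.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient by the direct method: r = sum(xy) / (n * s1 * s2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]              # deviations of the first series
    dy = [y - mean_y for y in ys]              # deviations of the second series
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sigma_x = sqrt(sum(a * a for a in dx) / n)
    sigma_y = sqrt(sum(b * b for b in dy) / n)
    return sum_xy / (n * sigma_x * sigma_y)

fathers = [65, 66, 67, 67, 68, 69, 70, 72]     # heights in inches, Example 1
sons = [67, 68, 65, 68, 72, 72, 69, 71]

r = pearson_r(fathers, sons)
print(round(r, 2))   # 0.6
```

Carrying full precision gives about .603, which agrees with the hand calculation to the one decimal place quoted.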
In the above example we have calculated the values of the arithmetic
average and standard deviations of the two series. If in the above
mentioned formula in place of standard deviation we insert the formula
for its calculation, the work becomes easier, as then there will be no need
to calculate the standard deviation of the series separately. If it is done
the formula for the calculation of coefficient of correlation would be
(ii) $$r = \frac{\Sigma xy}{n \sqrt{\frac{\Sigma x^2}{n}} \sqrt{\frac{\Sigma y^2}{n}}}$$
If the above example is solved by this formula, we get
$$r = \frac{24}{8 \times \sqrt{\frac{36}{8}} \times \sqrt{\frac{44}{8}}} = +.6$$
The above formula can be reduced still further and written as
(iii) $$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}}$$
In the above example with this formula we get
$$r = \frac{24}{\sqrt{36 \times 44}} = +.6$$
The last mentioned formula is also known as the Product Moment
Formula of Coefficient of Correlation. The above three formulae give the
same value of coefficient of correlation. But in these methods we have to
take the deviations of various items from their actual arithmetic average.
If the arithmetic average is in fractions, the work of calculation becomes
very difficult and to remove this difficulty other methods are used.
We know that the sum of the products of corresponding deviations of
two series from their respective means is equal to the sum of the products
of the values of corresponding items (not their deviations) less the product
of the total values of the two series divided by the number of items.
Similarly, the sum of the squares of the deviations of items from their
means is equal to the sum of the squares of the values of items (not their
deviations) less the square of the total value divided by the number of
items. On this basis we can arrive at such a formula in which there is
no need to calculate the deviations and their squares and products, etc.
Example No. 1 has been solved below by this method:
Calculation of the Coefficient of Correlation between the
Heights of Fathers and Sons.

Height of    Height of    Square of the      Square of the     Product of
father       son          height of father   height of son     heights
  (x)          (y)             (x²)               (y²)            (xy)
   65           67             4225               4489            4355
   66           68             4356               4624            4488
   67           65             4489               4225            4355
   67           68             4489               4624            4556
   68           72             4624               5184            4896
   69           72             4761               5184            4968
   70           69             4900               4761            4830
   72           71             5184               5041            5112
Tx = 544     Ty = 552      Σ(x²) = 37028      Σ(y²) = 38132    Σ(xy) = 37560

Substituting the above values in the formula
(iv) $$r = \frac{\Sigma(xy) - T_x \cdot T_y / N}{\sqrt{\left(\Sigma(x^2) - T_x^2/N\right)\left(\Sigma(y^2) - T_y^2/N\right)}}$$
where $r$ stands for coefficient of correlation,
$\Sigma(xy)$ stands for the total of the products of items in the two series,
$T_x$ and $T_y$ stand for the totals of x and y-series respectively,
$\Sigma(x^2)$ and $\Sigma(y^2)$ stand for the totals of the squares of items in x and y
series respectively, and
$N$ stands for the number of items paired, we get
$$r = \frac{37560 - 544 \times 552 / 8}{\sqrt{\left(37028 - (544)^2/8\right)\left(38132 - (552)^2/8\right)}} = \frac{37560 - 37536}{\sqrt{36 \times 44}} = \frac{24}{\sqrt{36 \times 44}} = +.6$$
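As a check on the totals in the table above, the same coefficient can be computed from the raw items without taking any deviations. This is only an illustrative sketch: the helper name `pearson_r_raw` is hypothetical, and the variable names follow the symbols of formula (iv).

```python
from math import sqrt

def pearson_r_raw(xs, ys):
    """Formula (iv): works on the original items, no deviations needed."""
    n = len(xs)
    t_x, t_y = sum(xs), sum(ys)                    # Tx and Ty
    s_xx = sum(x * x for x in xs)                  # total of x squared
    s_yy = sum(y * y for y in ys)                  # total of y squared
    s_xy = sum(x * y for x, y in zip(xs, ys))      # total of products
    numerator = s_xy - t_x * t_y / n
    denominator = sqrt((s_xx - t_x ** 2 / n) * (s_yy - t_y ** 2 / n))
    return (t_x, t_y, s_xx, s_yy, s_xy), numerator / denominator

fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

totals, r_raw = pearson_r_raw(fathers, sons)
print(totals)            # (544, 552, 37028, 38132, 37560)
print(round(r_raw, 2))   # 0.6
```

The intermediate differences are 24, 36 and 44, the same $\Sigma xy$, $\Sigma x^2$ and $\Sigma y^2$ obtained earlier from deviations about the means.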
Short-cut methods
In these methods an assumed average is used for the calculation of
coefficient of correlation. Instead of taking deviations from the actual
arithmetic average (for calculating standard deviation and $\Sigma xy$) the
deviations in both the series are taken from assumed averages. The
sum of the products of such corresponding deviations ($\Sigma xy$) is later on
corrected by subtracting from this figure the product of the differences
between the actual and assumed averages in the two series, and the number
of pairs of observations. The standard deviation of the two series is
either calculated by the short-cut method or the relevant formula is
inserted in the formula of calculating coefficient of correlation. Thus
coefficient of correlation or,
(i) $$r = \frac{\Sigma xy - n(a_1 - x_1)(a_2 - x_2)}{n \sigma_1 \sigma_2}$$
Where,
$a_1$ = Actual arithmetic average of the first series.
$a_2$ = Actual arithmetic average of the second series.
$x_1$ = Assumed arithmetic average of the first series.
$x_2$ = Assumed arithmetic average of the second series.
$\Sigma xy$ = Sum of the products of deviations from the assumed averages.
The other symbols stand for the same things as in the first formula.
or
(ii) $$r = \frac{\Sigma xy - n\left(\frac{\Sigma x}{n}\right)\left(\frac{\Sigma y}{n}\right)}{n \sqrt{\frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2} \sqrt{\frac{\Sigma y^2}{n} - \left(\frac{\Sigma y}{n}\right)^2}}$$
or
(iii) $$r = \frac{\Sigma xy - \frac{\Sigma x \times \Sigma y}{n}}{\sqrt{\left(\Sigma x^2 - \frac{(\Sigma x)^2}{n}\right)\left(\Sigma y^2 - \frac{(\Sigma y)^2}{n}\right)}}$$
or
(iv) $$r = \frac{\Sigma xy \times n - (\Sigma x \times \Sigma y)}{\sqrt{\Sigma x^2 \times n - (\Sigma x)^2} \sqrt{\Sigma y^2 \times n - (\Sigma y)^2}}$$
All the above four methods would give the same value of the
coefficient of correlation. The following example illustrates the above
formulae.
Example 2. Calculate the coefficient of correlation from the
following table.
Year      Av. daily No.      Lakhs of bales
          of labourers       consumed by
          (in thousands)     mills
1925          368                22
1926          384                21
1927          385                24
1928          361                20
1929          347                22
1930          384                26
1931          395                26
1932          403                29
1933          400                28
1934          385                27
The above question has been solved below by all the four methods
given above.
Solution. Calculation of the Coefficient of Correlation between the Number
of Labourers and the Bales of Raw Materials consumed by the mills.

        Av. daily No. of labourers           Bales consumed by mills
        No. of      Deviation   Square      Bales     Deviation   Square      Product
Year    labourers   from the    of          in        from the    of          of
        (in thou-   as. av.     deviation   lakhs     as. av.     deviation   deviations
        sands)      (380)                             (25)
          m1          (x)         (x²)        m2        (y)         (y²)        (xy)
1925      368         -12          144        22        -3            9         +36
1926      384          +4           16        21        -4           16         -16
1927      385          +5           25        24        -1            1          -5
1928      361         -19          361        20        -5           25         +95
1929      347         -33         1089        22        -3            9         +99
1930      384          +4           16        26        +1            1          +4
1931      395         +15          225        26        +1            1         +15
1932      403         +23          529        29        +4           16         +92
1933      400         +20          400        28        +3            9         +60
1934      385          +5           25        27        +2            4         +10
n = 10              Σx = +12    Σx² = 2830            Σy = -5     Σy² = 91    Σxy = +390
Calculation of Coefficient of Correlation:

First Method
Arithmetic average of $m_1$ or
$$a_1 = 380 + \frac{12}{10} = 381.2 \text{ thousand labourers}$$
Arithmetic average of $m_2$ or
$$a_2 = 25 + \frac{-5}{10} = 24.5 \text{ lakh bales}$$
Standard deviation of $m_1$ or
$$\sigma_1 = \sqrt{\frac{2830}{10} - \left(\frac{12}{10}\right)^2} = 16.78 \text{ thousand labourers}$$
Standard deviation of $m_2$ or
$$\sigma_2 = \sqrt{\frac{91}{10} - \left(\frac{-5}{10}\right)^2} = 2.97 \text{ lakh bales}$$
Coefficient of correlation or
$$r = \frac{390 - 10\,[(381.2 - 380)(24.5 - 25)]}{10 \times 16.78 \times 2.97} = +.8$$
In the second, third and fourth methods there is no need to
calculate the arithmetic average or the standard deviation.

Coefficient of Correlation
Second Method
$$r = \frac{390 - 10\left(\frac{12}{10}\right)\left(\frac{-5}{10}\right)}{10 \sqrt{\frac{2830}{10} - \left(\frac{12}{10}\right)^2} \sqrt{\frac{91}{10} - \left(\frac{-5}{10}\right)^2}} = \frac{396}{10 \times 16.78 \times 2.97} = +.8$$

Coefficient of Correlation
Third Method
$$r = \frac{390 - \left(\frac{12 \times -5}{10}\right)}{\sqrt{\left(2830 - \frac{(12)^2}{10}\right)\left(91 - \frac{(-5)^2}{10}\right)}} = \frac{396}{\sqrt{2815.6 \times 88.5}} = +.8$$

Coefficient of Correlation
Fourth Method
$$r = \frac{390 \times 10 - (12 \times -5)}{\sqrt{2830 \times 10 - (12)^2} \sqrt{91 \times 10 - (-5)^2}} = \frac{3960}{\sqrt{28156 \times 885}} = +.8$$
The coefficient of correlation of +.8 indicates a high degree of
positive correlation between the number of labourers and the
consumption of cotton. It means that with an increase in the number
of labourers the consumption of cotton also increases and vice versa.
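The claim that the short-cut methods all give the same value can be illustrated with a short Python sketch of the fourth method. The function name `pearson_r_assumed` is hypothetical; the data are those of Example 2. The second call uses a different pair of assumed averages (0 and 0, chosen purely for illustration) to show that the choice of assumed average does not affect the result.

```python
from math import sqrt

def pearson_r_assumed(xs, ys, assumed_x, assumed_y):
    """Short-cut formula (iv): deviations are taken from assumed averages."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    s_x, s_y = sum(dx), sum(dy)
    s_xx = sum(d * d for d in dx)
    s_yy = sum(d * d for d in dy)
    s_xy = sum(a * b for a, b in zip(dx, dy))
    return (s_xy * n - s_x * s_y) / (
        sqrt(s_xx * n - s_x ** 2) * sqrt(s_yy * n - s_y ** 2)
    )

labourers = [368, 384, 385, 361, 347, 384, 395, 403, 400, 385]   # thousands
bales = [22, 21, 24, 20, 22, 26, 26, 29, 28, 27]                 # lakhs

r_380_25 = pearson_r_assumed(labourers, bales, 380, 25)   # averages used in the text
r_other = pearson_r_assumed(labourers, bales, 0, 0)       # any other choice
print(round(r_380_25, 2), round(r_other, 2))              # 0.79 0.79
```

To two decimals the value is .79, which is the +.8 of the worked solution when rounded to one place.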

Calculation of correlation in time series

We have discussed in the last chapter that in a time series there
are mainly two types of changes: long period and short period. When
a study of correlation is made in two time series, it becomes necessary
to study it separately for its main components. The reason is that
it is not necessary that the relationship between the long time changes
in the two series and the relationship between their short time
fluctuations should be similar. It is not unlikely that in two time series there
may be positive correlation between long period changes and negative
correlation between short period fluctuations, or there may be negative
correlation between long period changes and positive correlation between
short period fluctuations. Under such circumstances if correlation is
studied between two time series as a whole, misleading conclusions
would be arrived at. As such, as far as possible, time series should
first be divided into various components and then correlation should be
studied separately between corresponding components of the two
series.
(i) Correlation of long time changes. To study the correlation of
long time changes first the trend values of the two series must be
obtained. This can be done either by the method of moving average
or by the method of fitting straight line trend with least squares or by
fitting a parabolic curve to the data and obtaining parabolic trend
values. After this, the coefficient of correlation can be calculated
between the trend values of two series. No special method is needed
to calculate coefficient of correlation between such series. The
methods which have been discussed so far can be used to obtain
coefficient of correlation. Thus in studying correlation between
long time changes the only special thing is that instead of original
series trend values are used for the calculation of the coefficient of
correlation.
(ii) Correlation of short time oscillations. To study the correlation
of short time oscillations of two series it is necessary that trend values
are isolated from the time series and the values of short period
fluctuations obtained. Thus, two series would be obtained in which there
would be no trend values but only short time fluctuations. The
sum of the products of the corresponding short time fluctuations of
the two series gives the value of $\Sigma xy$. Thus the special point in the
calculation of coefficient of correlation between short time fluctuations
is that in such cases deviations are taken from the moving average
figures (or trend values) rather than from the arithmetic average as is
done in ordinary series. The figure obtained by dividing the sum of
the squares of these deviations by their number gives the value of the
variance. Its square root gives the value of the standard deviation of
the series. When the values of $\Sigma xy$ and $\sigma_1$ and $\sigma_2$ are thus obtained,
coefficient of correlation can be easily calculated. The following
example would illustrate the above procedure:

Example 3. Calculate the coefficient of correlation of short-time
oscillations from the following data. Assume five-yearly moving
average and ignore decimals. These figures are imaginary:

Year      Indices of Supply      Indices of Price
1937           101                    117
1938           108                     97
1939           10'                    102
1940           14,                    115
1941           I) 3                   2. 0 )
1942           186                    196
1943           202                    177
1944           207                    168
1945           204                    177
1946           198                    170
1947           200                    165
1948           208                    170
1949           232                    175
1950           228                    180
1951           222                    190
[Table: calculation of the five-yearly moving averages of the two series, the deviations of the original figures from them, and the squares and products of these deviations. The table is not legible in this copy.]

Coefficient of correlation
Direct Method, Formula No. (iii)
$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}} = \frac{\Sigma xy}{\sqrt{944 \times 4147}} = +.07$$
The value of +.07 indicates that there is hardly any correlation
between the short time oscillations of the two series. Probably this is
on account of the fact that the figures are imaginary. Ordinarily
short time fluctuations of supply and prices give a high degree of
correlation.
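The procedure for short time oscillations (centred moving average, decimals ignored, deviations from the moving average, then formula (iii)) can be sketched in Python as follows. The two short series below are hypothetical figures invented for this illustration and are not the data of Example 3; the function names are likewise our own.

```python
from math import sqrt

def moving_average(values, span=5):
    """Centred moving average; decimals are ignored (dropped) as in the text."""
    half = span // 2
    return [int(sum(values[i - half:i + half + 1]) / span)
            for i in range(half, len(values) - half)]

def short_time_deviations(values, span=5):
    """Deviations of the original figures from their moving average."""
    half = span // 2
    centre = values[half:len(values) - half]
    return [v - t for v, t in zip(centre, moving_average(values, span))]

# Hypothetical index numbers, for illustration only.
supply = [100, 104, 103, 110, 115, 113, 120, 126, 124]
price = [90, 88, 91, 86, 84, 87, 82, 79, 83]

dx = short_time_deviations(supply)
dy = short_time_deviations(price)
s_xy = sum(a * b for a, b in zip(dx, dy))
s_xx = sum(a * a for a in dx)
s_yy = sum(b * b for b in dy)
r_short = s_xy / sqrt(s_xx * s_yy)      # formula (iii) on the deviations
print(dx, dy, round(r_short, 2))
```

With a span of five, the first two and the last two years have no moving average, so the coefficient rests on four fewer pairs than the original series.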
(iii) Correlation of cyclical fluctuations. In example No. 3 above
we have calculated the coefficient of correlation between short time
fluctuations of two series. We know that short time fluctuations
consist of seasonal variations, cyclical fluctuations and irregular
fluctuations. It may be desired to study the correlation exclusively
between cyclical fluctuations of two series. For this, it is necessary to
obtain exclusive figures of cyclical fluctuations. After this has been
done, these cyclical fluctuations are divided by the standard deviations
of the series to which they relate. This is done to bring them to a
common denominator. These figures are then multiplied in pairs
and their products are totalled to obtain the value of $\Sigma xy$. This figure
divided by the number of pairs of values gives the required coefficient
of correlation. Here the formula for coefficient of correlation is
$$r = \frac{\Sigma xy}{n}$$
The reason for this modification in the formula is that the values
are already divided by their respective standard deviations and as such
there is no need of dividing $\Sigma xy$ by the product of the standard
deviations.
In actual practice instead of cyclical fluctuations the percentages
of cyclical fluctuations are used in the calculation of coefficient of
correlation. Cycle percents can be easily calculated by the following
procedure:
(1) Represent the original series by percentages based on trend
values. It means that the value against a particular year should be
divided by the corresponding trend value and multiplied by 100. These
are percentages of trends.
(2) Compute the seasonal variation indices by the methods
explained in the last chapter.
(3) From the percentages of the trend subtract the corresponding
seasonal variation percentages. The resulting figure would give the
cyclical variations.
(4) Divide the cyclical variations by the standard deviation of the
series and the resulting figures would be the cycle percents.
The cycle percents of the two series can be multiplied in pairs and
the products totalled to obtain the value of $\Sigma xy$. This figure divided
by the number of pairs of items would give the coefficient of
correlation. Thus the coefficient of correlation of the standard deviation
cycles is obtained by the formula $r = \frac{\Sigma xy}{n}$. Here x and y respectively
stand for the cyclical variations of x and y series divided by the
respective standard deviations. The following example explains the above
procedure:
Example 4. Compute the coefficient of correlation of the standard
deviation cycles of the two series A and B given below:
Calculation of Coefficient of Correlation of Standard Deviation Cycles

          Standard devia-      Standard devia-      Product of
Year      tion cycles A        tion cycles B        standard devia-
          series               series               tion cycles
            (x)                  (y)                  (xy)
1945        +1.2                 +1.1                 +1.32
1946        -0.2                 -0.6                 +0.12
1947        -1.0                 -1.1                 +1.10
1948        -0.5                 -0.1                 +0.05
1949        -0.4                 +0.1                 -0.04
1950        -0.1                 -0.3                 +0.03
1951        -2.1                 -1.1                 +2.31
1952        -0.3                 +0.1                 -0.03
1953        +0.5                 +0.4                 +0.20
1954        -0.8                 -0.4                 +0.32
1955        -0.5                 -0.3                 +0.15
n = 11                                                +5.53

Coefficient of correlation or
$$r = \frac{\Sigma xy}{n} = \frac{5.53}{11} = +.5027$$
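Since the figures of Example 4 are already expressed in standard deviation units, the coefficient is simply the mean of the pairwise products. A brief Python check, using only the values tabulated above, might look like this:

```python
# Standard deviation cycles of series A (x) and series B (y), 1945 to 1955.
x = [1.2, -0.2, -1.0, -0.5, -0.4, -0.1, -2.1, -0.3, 0.5, -0.8, -0.5]
y = [1.1, -0.6, -1.1, -0.1, 0.1, -0.3, -1.1, 0.1, 0.4, -0.4, -0.3]

sum_xy = sum(a * b for a, b in zip(x, y))   # total of the products
r_cycles = sum_xy / len(x)                  # r = sum(xy) / n

print(round(sum_xy, 2), round(r_cycles, 4))   # 5.53 0.5027
```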

Calculation of coefficient of correlation in grouped data


If the values of two variables are grouped and the frequencies
of different groups are given, double tabulation is necessary for
finding out the coefficient of correlation. Suppose two variables have
been grouped in class intervals with a magnitude of ten units and the
frequencies of classes are noted against them; the series then would be
of the following type:
Age of Husbands
(x-series)
Age in years        Number of Husbands
20-30                      5
30-40                     20
40-50                     44
50-60                     24
60-70                      7
Total                    100

Age of Wives
(y-series)
Age in years        Number of Wives
15-25                     17
25-35                     37
35-45                     15
45-55                     25
55-65                      6
Total                    100

In order to present the data given above by double tabulation it
is necessary to know which different values are assumed by various
items in the two series. Suppose any three items assume the values
40-50 in x-series and values 35-45 in y-series. The frequency of
these items would be 3. In the above table if 10 husbands whose ages
are between 40 and 50 years have wives whose ages are 35-45 years,
the frequency of this group in double tabulation would be 10. In this
way we can know the number of items which assume the two
particular values of these two variables. If these data are available they
can be shown in a two-way table. Such tables are also called
correlation tables. Suppose the data relating to the ages of husbands and
wives give the following correlation table:
                            Age of Husbands
                              (x-series)

Age of wives      20-30    30-40    40-50    50-60    60-70    Total
(y-series)
15-25               5        9        3       ...      ...       17
25-35              ...      10       25        2       ...       37
35-45              ...       1       12        2       ...       15
45-55              ...      ...       4       16        5        25
55-65              ...      ...      ...       4        2         6
Total               5       20       44       24        7       100
In this table there is more information than in the previous two
tables. In the previous two tables only the frequencies of items
against various values were given. In this table the number of items
which assume any two values of these two variables are also given.
For example, the number of items whose values are between 30-40
in x-series and between 25-35 in y-series is 10. Similarly the number
of items whose values are between 40-50 in x-series and between
35-45 in y-series is 12. The number of items against certain values
of x and y variables is 0. Thus there is no item whose value may be
between 20-30 in x-variable and between 25-35 in y-variable.
The calculation of coefficient of correlation in a two-way table
is done on the same basis on which it is done in case of simple series.
In the calculation of the value of $\Sigma xy$, however, the respective
frequencies of various groups have also to be taken into account. This will
be clear from the following example in which coefficient of correlation
between the ages of husbands and wives has been calculated on the basis
of data given in the last table.
Calculation of the Arithmetic Average and Standard
Deviation of x-series

Age of        No. of        Step devia-       Square of      Total
husbands      husbands      tions from        deviations     deviations
                            average of 45
                (f)           (dx)              (dx²)          (fdx)        (fdx²)
20-30            5             -2                 4             -10           20
30-40           20             -1                 1             -20           20
40-50           44              0                 0               0            0
50-60           24             +1                 1             +24           24
60-70            7             +2                 4             +14           28
Total          100                                               +8           92

Arithmetic average of x-series or
$$a_1 = x_1 + \left(\frac{\Sigma f dx}{\Sigma f} \times i\right)$$
where $x_1$ stands for the assumed average and $i$ for the magnitude
of the class-interval. Substituting the values we get
$$a_1 = 45 + \left(\frac{8}{100} \times 10\right) = 45.8 \text{ years}$$
Standard deviation of x-series or
$$\sigma_1 = \sqrt{\frac{\Sigma f dx^2}{\Sigma f} - \left(\frac{\Sigma f dx}{\Sigma f}\right)^2} \times i = \sqrt{\frac{92}{100} - \left(\frac{8}{100}\right)^2} \times 10 = \sqrt{.92 - .0064} \times 10$$
$$= \sqrt{.9136} \times 10 = .955 \times 10 = 9.55 \text{ years}$$

Calculation of Arithmetic Average and the Standard
Deviation of y-series

Age of        No. of        Step deviations      Square of      Total
wives         wives         from assumed         deviations     deviations
                            average age 40
                (f)           (dy)                 (dy²)          (fdy)        (fdy²)
15-25           17             -2                    4             -34           68
25-35           37             -1                    1             -37           37
35-45           15              0                    0               0            0
45-55           25             +1                    1             +25           25
55-65            6             +2                    4             +12           24
Total          100                                                 -34          154

Arithmetic average of y-series or
$$a_2 = x_2 + \left(\frac{\Sigma f dy}{\Sigma f} \times i\right) = 40 + \left(\frac{-34}{100} \times 10\right) = 36.6 \text{ years}$$
Standard deviation of y-series or
$$\sigma_2 = \sqrt{\frac{\Sigma f dy^2}{\Sigma f} - \left(\frac{\Sigma f dy}{\Sigma f}\right)^2} \times i = \sqrt{\frac{154}{100} - \left(\frac{-34}{100}\right)^2} \times 10$$
$$= \sqrt{1.54 - .1156} \times 10 = \sqrt{1.4244} \times 10 = 1.192 \times 10 = 11.92 \text{ years}$$
To find out the coefficient of correlation we have still to obtain
the value of $\Sigma xy$, the sum of the products of corresponding deviations
of the two series from their respective means. In the following table
the value of $\Sigma xy$ has been calculated by using assumed arithmetic
averages.
Calculation of the Value of Σxy
(the sum of the products of deviations from assumed averages)

                                   x-series (age of husbands)
Age of wives   Devia-    20-30    30-40    40-50    50-60    60-70    Total
(y-series)     tions
               dx  →      -20      -10        0      +10      +20
               dy  ↓
15-25           -20      2000     1800        0                        3800
                            5        9        3
25-35           -10               1000        0     -200                800
                                    10       25        2
35-45             0                  0        0        0                  0
                                     1       12        2
45-55           +10                           0     1600     1000      2600
                                              4       16        5
55-65           +20                                  800      800      1600
                                                       4        2
Total                    2000     2800        0     2200     1800      8800
                                                                       (Σxy)

In the above table the figures given at the left top of various
cells indicate the products of the deviations of x and y series from
their assumed averages and the corresponding frequencies. Thus,
in x-series, the deviation of the mid-point of 20-30 group from the
assumed average (of 45) is -20, and in y-series the figure of deviation
of the mid-point of 15-25 group from the assumed average (of 40) is
also -20. The number of items with these values is 5. The product
of these deviations and the frequency is (-20 × -20 × 5) 2000,
which is written at the left top corner in the relevant cell. The total
of all such products comes to 8800. This is the value of $\Sigma xy$ from
assumed averages.
The coefficient of correlation between the ages of husbands and
wives can now be calculated by using the following formula:
$$r = \frac{\Sigma xy - n(a_1 - x_1)(a_2 - x_2)}{n \times \sigma_1 \times \sigma_2}$$
Substituting the values we get
$$r = \frac{8800 - 100(45.8 - 45)(36.6 - 40)}{100 \times 9.55 \times 11.92} = +.79$$
Thus we find that there is a high degree of positive correlation
between the ages of husbands and wives.
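The grouped-data calculation can also be followed in code. The sketch below is an illustration under our own naming (`grouped_r` is hypothetical): it walks over the cells of the correlation table, weights each pair of mid-point deviations by the cell frequency, and applies the assumed-average formula in its fourth form. Carried at full precision the coefficient is about .795; the hand calculation above, with standard deviations cut to two decimals, gives the same figure to within the second decimal place.

```python
from math import sqrt

# Mid-points of the class intervals (husbands: x, wives: y).
x_mid = [25, 35, 45, 55, 65]
y_mid = [20, 30, 40, 50, 60]

# Correlation table: one row per wives' class, one column per husbands' class.
freq = [
    [5, 9, 3, 0, 0],      # wives 15-25
    [0, 10, 25, 2, 0],    # wives 25-35
    [0, 1, 12, 2, 0],     # wives 35-45
    [0, 0, 4, 16, 5],     # wives 45-55
    [0, 0, 0, 4, 2],      # wives 55-65
]

def grouped_r(x_mid, y_mid, freq, assumed_x, assumed_y):
    """Coefficient of correlation from a two-way frequency table."""
    n = s_x = s_y = s_xx = s_yy = s_xy = 0
    for j, row in enumerate(freq):
        for i, f in enumerate(row):
            dx = x_mid[i] - assumed_x      # deviation of husbands' mid-point
            dy = y_mid[j] - assumed_y      # deviation of wives' mid-point
            n += f
            s_x += f * dx
            s_y += f * dy
            s_xx += f * dx * dx
            s_yy += f * dy * dy
            s_xy += f * dx * dy
    r = (n * s_xy - s_x * s_y) / sqrt(
        (n * s_xx - s_x ** 2) * (n * s_yy - s_y ** 2)
    )
    return n, s_xy, r

n, sum_xy, r_grouped = grouped_r(x_mid, y_mid, freq, 45, 40)
print(n, sum_xy, round(r_grouped, 4))   # 100 8800 0.7953
```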
Short-cut method

The method of calculation of coefficient of correlation described
above takes a very long time because in it the actual arithmetic
averages and the standard deviations of both the series have to be
found out. These values generally come in fractions and the work of
calculation considerably increases. To remove this difficulty a
short-cut method is used. Here the deviations are taken from the assumed
average and they are divided by the magnitude of the class interval.
These step deviations are used for the calculation of values of $\Sigma xy$,
$\sigma_1$ and $\sigma_2$. At no stage are the deviations or these values multiplied
by the magnitude of the class interval. The coefficient of correlation
is not affected by this change because both the numerator and the
denominator are proportionately reduced. In this method only one
table is enough to calculate the coefficient of correlation. It is not
necessary to obtain the values of standard deviations of the two series,
as in the formula of correlation instead of $\sigma_1$ and $\sigma_2$ the formulae of
the calculation of standard deviation are inserted. In this way all the
calculations can be done at one stretch. Further, logarithm tables can
also be used to minimise the calculation work. The question solved
above by the direct method has been solved by the short-cut method
in the following table:
Calculation of the coefficient of correlation between the ages of husbands and wives.

                                          Ages of husbands
                       Age group       20-30  30-40  40-50  50-60  60-70
                       Mid-point         25     35     45     55     65
                       Deviation        -20    -10      0    +10    +20
                       Step dev. (dx)    -2     -1      0     +1     +2    Total
Ages of wives                                                                f     fdy    fdy²   fdxdy
Age      Mid-    Devia-   Step
group    point   tion     dev. (dy)
15-25     20     -20       -2            20     18      0                   17     -34     68      38
                                          5      9      3
25-35     30     -10       -1                   10      0     -2            37     -37     37       8
                                                10     25      2
35-45     40       0        0                    0      0      0            15       0      0       0
                                                 1     12      2
45-55     50     +10       +1                           0     16     10     25     +25     25      26
                                                        4     16      5
55-65     60     +20       +2                                  8      8      6     +12     24      16
                                                               4      2
Total f                                   5     20     44     24      7    100     -34    154      88
                                                                                  Σfdy   Σfdy²  Σfdxdy
fdx                                     -10    -20      0    +24    +14     +8  (Σfdx)
fdx²                                     20     20      0     24     28     92  (Σfdx²)
fdxdy                                    20     28      0     22     18     88  (Σfdxdy)

(In each cell the upper figure is the product fdxdy and the lower figure is the frequency.)
Substituting the above values in Pearson's formula
$$r = \frac{\Sigma f dx\, dy - n\left(\frac{\Sigma f dx}{n}\right)\left(\frac{\Sigma f dy}{n}\right)}{n \sqrt{\frac{\Sigma f dx^2}{n} - \left(\frac{\Sigma f dx}{n}\right)^2} \sqrt{\frac{\Sigma f dy^2}{n} - \left(\frac{\Sigma f dy}{n}\right)^2}}$$
We get,
$$r = \frac{88 - 100\left(\frac{8}{100}\right)\left(\frac{-34}{100}\right)}{100 \sqrt{\frac{92}{100} - \left(\frac{8}{100}\right)^2} \sqrt{\frac{154}{100} - \left(\frac{-34}{100}\right)^2}}$$
$$= \frac{(88 \times 100) - (8 \times -34)}{\sqrt{9200 - 64}\, \sqrt{15400 - 1156}} = \frac{8800 + 272}{\sqrt{9136}\, \sqrt{14244}} = \frac{9072}{\sqrt{9136 \times 14244}}$$
r = Antilog [log 9072 - ½(log 9136 + log 14244)]
r = Antilog [3.9577 - ½(3.9608 + 4.1536)]
r = Antilog [3.9577 - ½(8.1144)]
r = Antilog [3.9577 - 4.0572]
r = Antilog $\bar{1}.9005$ = .7952
Thus the coefficient of correlation between the ages of husbands
and wives is again +.79. It should be remembered that +.79 is a high
degree of positive correlation. It is, however, not perfect correlation,
whose value is +1 or -1.
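The short-cut calculation works entirely in step deviations, and a few lines of Python can confirm both the column totals and the final ratio. This is a sketch with variable names of our own (`s_fdx`, `s_fdxdy` and so on, mirroring the table headings); the frequencies are those of the correlation table of husbands' and wives' ages.

```python
from math import sqrt

step_x = [-2, -1, 0, 1, 2]      # step deviations of husbands' classes
step_y = [-2, -1, 0, 1, 2]      # step deviations of wives' classes
freq = [
    [5, 9, 3, 0, 0],
    [0, 10, 25, 2, 0],
    [0, 1, 12, 2, 0],
    [0, 0, 4, 16, 5],
    [0, 0, 0, 4, 2],
]

n = sum(sum(row) for row in freq)
col_totals = [sum(row[i] for row in freq) for i in range(5)]   # husbands' f
row_totals = [sum(row) for row in freq]                        # wives' f

s_fdx = sum(f * d for f, d in zip(col_totals, step_x))
s_fdx2 = sum(f * d * d for f, d in zip(col_totals, step_x))
s_fdy = sum(f * d for f, d in zip(row_totals, step_y))
s_fdy2 = sum(f * d * d for f, d in zip(row_totals, step_y))
s_fdxdy = sum(freq[j][i] * step_x[i] * step_y[j]
              for j in range(5) for i in range(5))

r_step = (s_fdxdy * n - s_fdx * s_fdy) / (
    sqrt(s_fdx2 * n - s_fdx ** 2) * sqrt(s_fdy2 * n - s_fdy ** 2)
)
print(s_fdx, s_fdx2, s_fdy, s_fdy2, s_fdxdy)   # 8 92 -34 154 88
print(round(r_step, 4))                        # 0.7953
```

Because the class interval of 10 cancels between numerator and denominator, the value is the same as that obtained from deviations measured in years.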
Assumptions of Pearsonian Correlation
The Pearsonian coefficient of correlation rests on two
assumptions. The first is that a large number of independent contributory causes
are operating in each of the two series correlated, so as to produce normal or
probability distribution. We know that such causes always operate
in chance phenomena like tossing of a coin or throw of a die. They
also operate in other types of data. For example, such forces are
usually found operating in phenomena like indices of price and supply,
ages of husbands and wives and heights of fathers and sons, etc.
The second assumption is that the forces so operating are not
independent of each other but are related in a causal fashion. If the forces
are entirely independent and unrelated there cannot be any correlation
between the two series. The forces must be common to both the
series. The height of an individual during the last ten years may
show an upward trend and his income during this period may also
show a similar tendency but there cannot be any correlation between
the two series because the forces affecting the two series are entirely
unconnected with each other. If the coefficient of correlation in such
series is calculated it may even be +.8, indicating a very high degree
of positive correlation, but such correlation is usually termed
nonsense correlation because the two series are affected by such sets of forces
which are entirely unconnected with each other.
In the words of Karl Pearson: "The sizes of the complex of
organs (something measurable) are determined by a great variety of
independent contributing causes, for example, climate, nourishment,
physical training and innumerable other causes which cannot be
individually observed or their effects measured." Karl Pearson further
observes, "The variations in intensity of the contributory causes are
small as compared with their absolute intensity and these variations
follow the normal law of distribution....
"... In a series relating to the heights of a large number of fathers
and sons it will be noted that the heights of both fathers and sons will
no doubt be affected by a multiplicity of causes yet they would tend to
conform to the normal probability curve. If the forces producing
this tendency conform to the normal probability curve and are not
independent of each other then it is said that the heights of sons are
correlated with those of their fathers."
Probable error of the coefficient of correlation
After the calculation of coefficient of correlation the next thing
is to find out the extent to which it is dependable. For this purpose
the probable error of the coefficient of correlation is calculated. The
Theory of Errors forms part of the Theory of Sampling and as such
we shall not discuss it in detail here. However, in chapters on Sampling,
the Theory of Errors would be fully examined. At this place it is
enough to write that if the probable error is added to and subtracted
from the coefficient of correlation it would give two such limits within
which we can reasonably expect the value of coefficient of correlation
to vary. It means that if from the same universe another sample
was selected on the basis of random sampling, the coefficient of
correlation between the two variables in this new sample could reasonably
be expected not to fall outside the limits so established. The formula
for finding out the probable error of Karl Pearson's coefficient of
correlation is:
Probable error of coefficient of correlation
$$= .6745\, \frac{1 - r^2}{\sqrt{n}}$$
where $r$ stands for the coefficient of correlation and $n$ for the
number of pairs of observations. If the value of the probable error
is subtracted from the value of the coefficient of correlation it would
give the lower limit, and if it is added, it would give the upper limit,
within which the coefficient of correlation can be expected to vary.
Thus, in the example solved above, relating to the ages of husbands
and wives, the value of the probable error would be,
$$\text{Probable error} = .6745 \times \frac{1 - (.79)^2}{\sqrt{100}} = .025$$
The coefficient of correlation of the above data should be written
as follows:
r = +.79 ± .025
The limits of the above coefficient of correlation would be
.79 - .025 or .765 and .79 + .025 or .815. If another sample of 100
husbands and wives was chosen at random from the same universe
from which the first sample came, the value of the coefficient of
correlation in the second sample can be expected to lie within these two limits.
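The probable error and the two limits are easily computed. The following Python sketch (the function name `probable_error` is our own) applies the formula to the husbands-and-wives example, with r = .79 and n = 100.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of Karl Pearson's coefficient: .6745 (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

r = 0.79
n = 100
pe = probable_error(r, n)
lower, upper = r - pe, r + pe
print(round(pe, 3), round(lower, 3), round(upper, 3))   # 0.025 0.765 0.815
```

Note that the probable error shrinks as n grows and as r approaches unity, since the numerator contains 1 - r² and the denominator the square root of n.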
Interpretation of coefficient of correlation
In order to conclude whether the coefficient of correlation is
significant or not, the following points should be kept in mind:
(i) If the coefficient of correlation is less than its probable error
it is not at all significant.
(ii) If the coefficient of correlation is more than six times its
probable error it is definitely significant.
(iii) If the probable error is not much and if the coefficient of
correlation is .5 or more it is generally considered to be significant.
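
Rules (i) and (ii) can be written as a small helper. The sketch below is an illustration only: the function name `interpret` is hypothetical, and because rule (iii) depends on a judgment about whether the probable error is "not much", the intermediate case is simply reported as inconclusive.

```python
def interpret(r: float, pe: float) -> str:
    """Apply rules (i) and (ii): compare |r| with its probable error."""
    ratio = abs(r) / pe
    if ratio < 1:
        return "not significant"      # rule (i): r is less than its P.E.
    if ratio > 6:
        return "significant"          # rule (ii): r exceeds six times its P.E.
    return "inconclusive"             # between the two limits; rule (iii) needs judgment

print(interpret(0.79, 0.025))   # r is over 30 times its probable error
print(interpret(0.02, 0.05))    # hypothetical weak correlation
```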
In the above example the coefficient of correlation is +.79 and
the probable error is .025. The coefficient of correlation is more than
30 times the probable error and as such is highly significant. It
means that ordinarily the higher the age of the husband the higher
would be the age of the wife and the lower the age of the husband the
lower would be the age of the wife. It does not mean, however, that
all old husbands would have old wives and all young husbands young
wives. It should be remembered that the coefficient of correlation expresses
the relationship between two series and not between individual items of the
series.
Sometimes the probable error may give misleading conclusions
particularly where the number of pairs of observations is small. In
order to draw dependable conclusions it is necessary that the number
of pairs of observations or the value of n should be fairly large.
Calculation of coefficient of correlation by the method of least
squares
A study of correlation between two series can be done with the
help of the line of the best fit as obtained by the method of least
squares. When a coefficient of correlation has to be obtained
between two series the first thing to be done is to get an equation which
would give the best possible values of the y-variable (relative) for
given values of x (subject). From this equation values of y for given
values of x are computed. This procedure is already discussed in
details in the chapter on Analysis of Time Series. The mathematical
equation describing the relationship between x and y variable consti-
tutes a measure of functional relationship but is only a measure of
their average relationship. If the relationship is perfect, all the plotted
points (of the original data) would lie in a line and the equation would
accurately describe the relationship. But we know that in economic
phenomena this is very rare. Relationship between the original series
is rarely functional. The method of least squares only gives the line
of the best fit. It only indicates the average relationship between the
two variables. In such cases many points may be far away from the
line of the best fit and the question that naturally arises is how far are
the results dependable? Therefore, in studying correlation by the
method of least squares, it becomes necessary to obtain a measure of
dispersion about the line that has been fitted so that an idea may be
obtained about the extent to which the actual values deviate from the
computed ones.
The standard deviation about the line of the best fit is called the
standard error of the estimate. It is usually represented by S. In the
calculation of S, deviations of actual values of y from the computed
values are calculated. The root mean square of these deviations gives
the value of S. In other words, the deviations should be squared and
totalled. The total should be divided by the number of items and the
square root of the resulting figure should be obtained to get the value
of the standard error of the estimate. Thus, if d represents the deviations
of the actual from the computed values of y, the value of S_y would be,

$$S_y = \sqrt{\frac{\sum d^{2}}{n}}$$

The value of S_y is interpreted in the same way as the standard
deviation about the mean. If about the line of relationship there is a
normal distribution, 68% of all the cases would lie within a range
of ± S, 95% within ± 2S and 99.7% within ± 3S. If the dispersion
about the line of equation is less, the value of S would also
be less; if it is more, the value of S would also be more. The value
of S is, therefore, an indicator of the significance of the computed
values.
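
The steps for the standard error of the estimate (square the deviations, total them, divide by the number of items, take the square root) can be sketched as follows. The function name and the four pairs of figures are hypothetical and are chosen only to show the arithmetic.

```python
import math

def standard_error_of_estimate(actual, computed):
    """Root mean square of the deviations d = actual - computed."""
    squared = [(y - yc) ** 2 for y, yc in zip(actual, computed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical actual values of y and the values computed from a fitted line.
actual = [3.0, 5.0, 7.0, 10.0]
computed = [3.2, 5.4, 7.6, 9.8]
s = standard_error_of_estimate(actual, computed)
print(round(s, 3))
```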
If the average deviation of the original data of y series is calculated
it also gives a measure of significance of the arithmetic average from
which deviations are calculated. The probable values of y can be
obtained either by the equation of the best fit or arithmetic average
of y-series and its standard deviation. The question that arises at this
stage is which of the two estimates is better. Usually a relationship
is established between the two measures which is also helpful
in the calculation of coefficient of correlation. Such a relationship
is obtained by dividing the standard error of the estimate by
the standard deviation of the y series, or in other words, such a relationship
is expressed by finding out the value of $S_y/\sigma_y$. This is usually
called measure of correlation. A better measure of correlation is obtained
by finding out the coefficient of correlation which is obtained by the
following formula:
Coefficient of correlation, or

$$r = \sqrt{1 - \frac{S_y^{2}}{\sigma_y^{2}}}$$

In this formula if there is no dispersion about the line of relationship,
the value of S_y would be 0 and the value of the coefficient of correlation
would be 1. The maximum value of S_y is that which is equal to σ_y.
In such cases $S_y^{2}/\sigma_y^{2}$ would be equal to 1 and the coefficient of correlation
would be equal to 0. It will indicate that there is no correlation between
values of x and y. Thus the two limits of this correlation are 0 and 1.
The higher the coefficient of correlation the greater the confidence that
may be placed on the computed values of y and as such the greater the
degree of correlation between the two series.
The following example would illustrate the above formula:-
Example 5. Calculate the coefficient of correlation between the
values of x and y by the method of least squares.
x      1     2     3     4     5
y    166   184   142   180   338
Solution. For obtaining the value of S_y we shall have to obtain
the computed values of y by the method of least squares. The technique
of the calculation of these values has been explained in the chapter on
Analysis of Time Series.

Computed Values of y by the Method of Least Squares

   x       y      xy      x²    Computed values of y
  (1)     (2)     (3)     (4)          (5)
   1     166     166       1          134
   2     184     368       4          168
   3     142     426       9          202
   4     180     720      16          236
   5     338    1690      25          270
  -----------------------------------------------
  15    1010    3370      55         1010
The equation of the line of best fit is y = a + bx and as we have seen
in the chapter on Analysis of Time Series the values of a and
b in the equation can be obtained by substituting the relative figures in
the following two normal equations.
$$\sum y = na + b\sum x \qquad \text{(i)}$$
$$\sum xy = a\sum x + b\sum x^{2} \qquad \text{(ii)}$$
Substituting the values in the above two equations we get
$$1010 = 5a + 15b \qquad \text{(i)}$$
$$3370 = 15a + 55b \qquad \text{(ii)}$$
If these two equations are solved simultaneously the value of a
would be 100 and the value of b 34, and the equation of the line of the
best fit would be
$$y = 100 + 34x$$
On the basis of this equation the computed values of y have been
given in column No. 5 of the above table.
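
The solution of the two normal equations can be verified numerically. The sketch below uses the figures of Example 5; the closed-form expressions for a and b are the usual algebraic solution of the two normal equations, and the variable names are illustrative only.

```python
xs = [1, 2, 3, 4, 5]
ys = [166, 184, 142, 180, 338]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Solving  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2  for a and b.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

computed = [a + b * x for x in xs]   # computed values of y from the fitted line
print(a, b, computed)
```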
In order to calculate the value of S_y we shall have to find out the
difference between the original and computed values of y and we shall
have to obtain the square of these deviations. For calculating the value
of σ_y we shall have to find out the deviations of the original values of y
from the arithmetic average of the series and we shall have to obtain the
square of these deviations also. This has been done in the following
table:
Calculation of the values of S_y and σ_y

                           Difference between       Deviations of original
               Computed    original and             values of y from the
               values of   computed values          mean of the series (202)
  (x)    (y)      (y)       (d)       (d²)           (dy)       (dy²)
   1     166      134       +32       1024           -36        1296
   2     184      168       +16        256           -18         324
   3     142      202       -60       3600           -60        3600
   4     180      236       -56       3136           -22         484
   5     338      270       +68       4624          +136       18496
  ---------------------------------------------------------------------
  15    1010     1010       ...      12640           ...       24200
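Standard error of the estimate or
$$S_y = \sqrt{\frac{\sum d^{2}}{n}} = \sqrt{\frac{12640}{5}} = \sqrt{2528} \approx 50.28$$
Standard deviation of y series or
$$\sigma_y = \sqrt{\frac{\sum d_y^{2}}{n}} = \sqrt{\frac{24200}{5}} = \sqrt{4840} \approx 69.6$$
Coefficient of correlation
$$r = \sqrt{1 - \frac{S_y^{2}}{\sigma_y^{2}}} = \sqrt{1 - \frac{2528}{4840}} = +.69$$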
The coefficient of correlation calculated by Karl Pearson's
formula or by the Product Moment formula would also be +.69.
This can be verified as follows:
According to Karl Pearson's formula
$$r = \frac{\sum xy}{n \times \sigma_x \times \sigma_y}$$
where x and y stand for deviations from the respective means.
The value of Σxy, if calculated, would be 340 and of σ_x 1.41.
Thus
$$r = \frac{340}{5 \times 1.41 \times 69.6} = +.69$$
It can easily be proved that
$$S_y = \sigma_y\sqrt{1-r^{2}}$$
In the above example
$$\sigma_y\sqrt{1-r^{2}} = 69.6\sqrt{1-.48} = 69.6\sqrt{.52} = 69.6 \times .721 \approx 50.2$$
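
The whole of Example 5 can be reproduced in a short script, which also confirms that the least-squares value of r agrees with Karl Pearson's formula. This is a sketch under the figures of the example (a = 100, b = 34); the variable names are illustrative. Note that the square root in the least-squares formula gives only the magnitude of r, and the sign has to be taken from the slope b.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [166, 184, 142, 180, 338]
n = len(xs)
a, b = 100, 34                                   # line of best fit from Example 5
computed = [a + b * x for x in xs]

mean_x, mean_y = sum(xs) / n, sum(ys) / n
s_y = math.sqrt(sum((y - yc) ** 2 for y, yc in zip(ys, computed)) / n)
sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)

r_least_squares = math.sqrt(1 - s_y ** 2 / sigma_y ** 2)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
r_pearson = sum_xy / (n * sigma_x * sigma_y)

print(round(s_y, 2), round(sigma_y, 2))
print(round(r_least_squares, 2), round(r_pearson, 2))
```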
Short-cut Method. The coefficient of correlation by the Method
of Least Squares can also be obtained directly without calculating the
values of S_y and σ_y. For this, besides the computed values of y the
value of Σy² has also to be obtained. In the above example the values
of y² would be as follows:

                                                               Total
Original values of y     166     184     142     180     338     1010
y²                     27556   33856   20164   32400  114244   228220

The short-cut formula for the calculation of the coefficient of
correlation is
$$r = \sqrt{\frac{a\sum y + b\sum xy - N c_y^{2}}{\sum y^{2} - N c_y^{2}}}$$
where c_y is the difference between the mean of y and the origin employed
in calculations. In the above example we have not assumed an
average and the origin is 0, and so the value of c_y would be equal to the
mean of y, that is, 202.
Substituting the values in the above formula we get
$$r = \sqrt{\frac{(100 \times 1010) + (34 \times 3370) - (5 \times 202 \times 202)}{228220 - (5 \times 202 \times 202)}} = +.69$$
It will be observed that in this method there is no need of finding
out the values of d, d², dy and dy². The table which is prepared for obtaining
the computed values of y is enough for finding out the coefficient
of correlation. Only the values of Σ(y²) and c_y have to be calculated.
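
The short-cut formula can be checked against the same data. The sketch below assumes, as in the text, that the origin is 0 so that c_y equals the mean of y; the variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [166, 184, 142, 180, 338]
a, b = 100, 34                      # constants of the line of best fit
n = len(ys)

sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_y2 = sum(y * y for y in ys)
c_y = sum_y / n                     # origin is 0, so c_y is the mean of y

numerator = a * sum_y + b * sum_xy - n * c_y ** 2
denominator = sum_y2 - n * c_y ** 2
r = math.sqrt(numerator / denominator)
print(sum_y2, numerator, denominator, round(r, 2))
```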

Calculation of the coefficient of correlation by rank differences


Sometimes such problems are faced where it is possible to arrange
the various items of a series in serial order but the quantitative measurement
of their values is difficult; for example, it is possible for a class
teacher to arrange his students in ascending or descending order of intelligence,
even though intelligence cannot be measured quantitatively.
No doubt, the quantitative study about the intelligence of students can
be made by holding an examination and assigning them marks, but
this method can never be said to be infallible. There are many such
attributes which are incapable of quantitative measurement; for
example, honesty, character, morality, etc. Besides this, serially arranged
data are used in such places also where due to paucity of funds or
lack of proper statistical machinery dependable measurements cannot
be obtained. Sometimes this method may be used to escape mathematical
calculations associated with other methods; for example, it is
more tedious to measure the heights of a group of persons than to
arrange them in ascending or descending order of height.
Suppose the values of a variable (height) are 70", 66", 65", 63",
73". If these figures are arranged in descending order the figure 73"
would receive the first rank, 70" the second rank, 66" the third rank,
65" the fourth rank and 63" the fifth rank. In this way the ranks
of the other variable are also obtained. The rank of the item whose
value is highest is 1 and so on. If there are two or more items having
the same value then there is some difficulty in finding out their ranks.
Suppose two items have equal value and their rank is 3. Now they
will be given the average rank of those ranks which they would have
got had there been a slight difference in their values. According to
this rule the rank of both these items would be (3+4)/2 or 3.5 and the rank
of the next item would be 5.
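
The ranking rule, including the average rank for equal values, can be sketched as a small function. The name `average_ranks` is illustrative; the first list is the set of heights given above, and the second list is a hypothetical variant with two equal values competing for the third rank.

```python
def average_ranks(values):
    """Rank in descending order (highest value = rank 1).
    Equal values share the average of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the group while the next item has the same value
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2     # average of first and last rank in the group
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

heights = [70, 66, 65, 63, 73]
print(average_ranks(heights))      # [2.0, 3.0, 4.0, 5.0, 1.0]

tied = [70, 66, 66, 63, 73]        # hypothetical: two items tie for rank 3
print(average_ranks(tied))         # [2.0, 3.5, 3.5, 5.0, 1.0]
```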

After assigning ranks to the various items of both the series the
differences of corresponding rank values are calculated. To calculate
the coefficient of correlation the following formula is used:
$$r = 1 - \frac{6\sum d^{2}}{n(n^{2}-1)}$$
or
$$r = 1 - \frac{6\sum d^{2}}{n^{3}-n}$$
where r stands for the coefficient of correlation, Σd² for the total
of the squares of the differences of corresponding ranks, and n for the
number of pairs of observations. The following example would illustrate
the above formula:
Example 6. Calculate the coefficient of rank correlation from the
following data:

   x      Rank      y      Rank    Difference      d²
                                   of Ranks (d)
  60        3       ?        1        +2            4
   ?       11       ?       10        +1            1
  40       10       ?        8        +2            4
  50        4      40        6        -2            4
  45        6      45        4        +2            4
  41        9      33        9         0            0
  22       12      12       12         0            0
  43        7      30       11        -4           16
  42        8      36        7        +1            1
  66        1      72        2        -1            1
  64        2      41        5        -3            9
   ?        5       ?        3        +2            4
  ------------------------------------------------------
 n = 12                                0           48

Coefficient of rank correlation or
$$r = 1 - \frac{6\sum d^{2}}{n^{3}-n} = 1 - \frac{6(48)}{12^{3}-12} = \frac{1428}{1716} = .83$$
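
The rank-difference formula can be applied directly to the two columns of ranks. In the sketch below the ranks are those read from the table of Example 6 (they reproduce Σd = 0 and Σd² = 48); the function name is illustrative.

```python
def rank_correlation(rank_x, rank_y):
    """Coefficient of rank correlation: 1 - 6*sum(d^2) / (n^3 - n)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

rank_x = [3, 11, 10, 4, 6, 9, 12, 7, 8, 1, 2, 5]
rank_y = [1, 10, 8, 6, 4, 9, 12, 11, 7, 2, 5, 3]

sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r = rank_correlation(rank_x, rank_y)
print(sum_d2, round(r, 3))
```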
Sometimes where there is more than one item with the same value
a common rank is given to such items. This rank, as has been said
earlier; is the average of the ranks which these items would have got had
they differed slightly from each other. When this is done, the coefficient
of rank correlation needs some correction, because the above formula is
based on the supposition that the ranks of various items are different and
that no rank is given to more than one item.
If in a series there are m items whose ranks are common, then for
correction of the coefficient of rank correlation $\frac{1}{12}(m^{3}-m)$ is added to
the value of (Σd²). If there are more than one such groups of items
with common rank, this value is added as many times as the number of
such groups. This procedure is clarified in the following example:

Example 7. Calculate the coefficient of rank correlation from
the following data:

Calculation of Coefficient of Rank Correlation

   x      Rank      y      Rank    Difference       d²
                                   of ranks (d)
  48        3      13      5.5       -2.5          6.25
  33        5      13      5.5       -0.5           .25
  40        4      24      1         +3.0          9.00
   9       10       6      8.5       +1.5          2.25
  16        8      15      4         +4.0         16.00
  16        8       4     10         -2.0          4.00
  65        1      20      2         -1.0          1.00
  24        6       9      7         -1.0          1.00
  16        8       6      8.5       -0.5           .25
  57        2      19      3         -1.0          1.00
  -------------------------------------------------------
 n = 10                                 0          41.00

In the above table in x-series the figure 16 occurs three times. The
rank of all these items is 8 which is the average of 7, 8 and 9, the ranks
which these items would have got had there been some difference
between their values. In y-series figures 13 and 6 both occur two times.
Their ranks are respectively 5.5 and 8.5. Due to these common ranks
the coefficient of rank correlation would have to be corrected.
For correction we shall add $\frac{1}{12}(m^{3}-m)$ to the value of (Σd²). In
x-series this value would be equal to $\frac{1}{12}(3^{3}-3)$ as the value 16
has occurred three times in this series. In y-series there are two such
groups of common ranks. In the first group this correction would be
$\frac{1}{12}(2^{3}-2)$ as the value 13 has occurred twice and for the second
group also the correction value would be $\frac{1}{12}(2^{3}-2)$ as the value 6 has
also occurred twice in this series.
Thus
$$r = 1 - \frac{6\left[\sum d^{2} + \frac{1}{12}(m^{3}-m)\right]}{n^{3}-n}$$
$$= 1 - \frac{6\left[41 + \frac{1}{12}(3^{3}-3) + \frac{1}{12}(2^{3}-2) + \frac{1}{12}(2^{3}-2)\right]}{10^{3}-10}$$
$$= +.73$$
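
The correction for common ranks can be sketched in code. The ranks below are those of Example 7; the helper `tie_correction` is an illustrative name, and it finds the groups of common ranks by counting how often each rank occurs.

```python
from collections import Counter

rank_x = [3, 5, 4, 10, 8, 8, 1, 6, 8, 2]
rank_y = [5.5, 5.5, 1, 8.5, 4, 10, 2, 7, 8.5, 3]
n = len(rank_x)

sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))

def tie_correction(ranks):
    """Sum of (m^3 - m)/12 over every group of m items sharing a rank."""
    return sum((m ** 3 - m) / 12 for m in Counter(ranks).values() if m > 1)

correction = tie_correction(rank_x) + tie_correction(rank_y)
r = 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)
print(sum_d2, correction, round(r, 2))
```

Here the correction is 2 for the group of three in the x-series and .5 for each of the two groups of two in the y-series, so that 44 is used in place of 41.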
Coefficient of concurrent deviation
Sometimes it is desired to study the correlation between two
series in a very casual manner, and in such cases no particular attention
is needed so far as precision is concerned. In such cases it is enough to
calculate the coefficient of concurrent deviations. In this method correlation
is calculated between the direction of deviations, not their magnitudes.
As such only the direction of deviations is taken into account in the
calculation of this coefficient, and their magnitude is ignored.
It has already been said earlier that if the short time fluctuations of
two time series are positively correlated or in other words if their devia-
tions are concurrent, their curves would move in the same direction
and would indicate positive correlation between them. Coefficient of
concurrent deviations is calculated on this very principle and ordinarily
it indicates the relationship between short time fluctuations only.
To calculate the coefficient of concurrent deviations, the deviations
are not calculated from any average or by the method of moving
averages but only their direction from the previous period is noted
down. The formula for the calculation of coefficient of concurrent
deviations is given below:
Coefficient of concurrent deviations or
$$r = \pm\sqrt{\pm\frac{2c-n}{n}}$$
where r stands for the coefficient of concurrent deviations, c for
the number of pairs of concurrent deviations and n for the number of
pairs of deviations. The value of this coefficient of correlation also
varies between ± 1. The plus, minus signs given in the formula
should be carefully noted. If the value of (2c - n)/n is negative its square
root cannot be calculated and so a minus sign is placed before it under
the root so that the square root may be calculated, and the minus sign
is kept before the value of the coefficient of correlation.
The following example illustrates the above formula:
Example 8. Compute the coefficient of concurrent deviations
from the following table, showing the output of steel, in tons, and the
number of unemployed persons in steel industry, in thousands, for 12
months.
                  Subject               Relative
Months         Output of steel       Unemployed in
               in 000's tons         steel industry, 000's
January             8.5                   60
February            9.2                   65
March               9.3                   61
April               8.5                   74
May                 7.2                   92
June                5.9                  157
July                5.1                  130
August              6.6                  106
September           7.9                   58
October             7.6                   80
November            8.2                   52
December            9.2                   42
Solution. Computation of the coefficient of correlation between the output
of steel and the number of unemployed persons in steel industry by method of
concurrent deviations

              Output of steel              Unemployed in steel industry
Months      Output in    Deviation from    Number     Deviation from    Product
            000's tons   preceding month   in 000's   preceding month
                              (x)                          (y)            (xy)
January        8.5                            60
February       9.2            +               65            +              +
March          9.3            +               61            -              -
April          8.5            -               74            +              -
May            7.2            -               92            +              -
June           5.9            -              157            +              -
July           5.1            -              130            -              +
August         6.6            +              106            -              -
Sept.          7.9            +               58            -              -
October        7.6            -               80            +              -
November       8.2            +               52            -              -
December       9.2            +               42            -              -
Number of pairs of deviations or n = 11
Number of concurrent deviations or c = 2
Substituting the above values in the formula
$$r = \pm\sqrt{\pm\frac{2c-n}{n}}$$
where r represents the coefficient of correlation,
we get,
$$r = \pm\sqrt{\pm\frac{(2 \times 2)-11}{11}} = \pm\sqrt{\pm(-.6364)} = -\sqrt{.6364} = -.798$$
Thus, the coefficient of correlation between the output of steel and
the number of unemployed persons in steel industry is -.798, which
indicates that there is a high degree of inverse correlation between the
two.
If there are concurrent deviations between two series (whether
positive or negative) xy would always be plus. The value of c is equal
to the number of times the deviations are concurrent. In the above
series there are only two concurrent deviations, which means that only
two times the movement of the two series has been in the same direction
and that is why there is a high degree of negative correlation between
them.
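
The method of concurrent deviations can be sketched in code with the steel figures of Example 8. The helper name `directions` is illustrative, and one point is an assumption not covered by the text: a period with no change in either series is treated here as not concurrent.

```python
import math

output = [8.5, 9.2, 9.3, 8.5, 7.2, 5.9, 5.1, 6.6, 7.9, 7.6, 8.2, 9.2]
unemployed = [60, 65, 61, 74, 92, 157, 130, 106, 58, 80, 52, 42]

def directions(series):
    """+1, -1 or 0 for a rise, fall or no change from the preceding period."""
    return [(cur > prev) - (cur < prev) for prev, cur in zip(series, series[1:])]

dx, dy = directions(output), directions(unemployed)
n = len(dx)                                           # number of pairs of deviations
c = sum(1 for p, q in zip(dx, dy) if p * q > 0)       # deviations in the same direction

ratio = (2 * c - n) / n
# Take the root of the absolute value and carry the sign of the ratio outside the root.
r = math.copysign(math.sqrt(abs(ratio)), ratio)
print(n, c, round(r, 3))
```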
Correlation table
Just as in case of scatter diagrams we plot the values of x and y
variables on the graph and study their trend lines, in the same way,
conclusions can be drawn about the relationship between two variables
which are presented in the shape of continuous series in a two-way or
correlation table. In a correlation table the number of items which assume
particular values of x and y variables are entered in the relevant
cells. Whenever correlation between grouped series has to be studied a
correlation table is necessary. The following table gives the figures
of production of pig iron and the figures of industrial production:
[Correlation table: Indices of Pig Iron Production (horizontal, in ascending
order) against Indices of Industrial Production (vertical, in descending order
from the top: 120-130, 110-120, 100-110, 90-100, 80-90, 70-80, 60-70, 50-60).
Column totals: 6, 4, 10, 29, 41, 58, 40, 16; grand total 204. The individual
cell frequencies are not legible.]
In the above table the figures are in a diagonal band almost in the
same style as the points are in a scatter diagram when there is positive
correlation between two variables. If in a correlation table the values
of x (or y) variable given horizontally are in ascending order and the
values of y (or x) variable given vertically in descending order from the
top, and if in such a table the frequencies are in a diagonal band rising
from left bottom towards the right top, there is an indication of a high
degree of positive correlation between the two series. In the above
table indices of pig iron production shown horizontally are in ascending
order and the indices of industrial production shown vertically in descending
order from the top. The frequencies are in a diagonal band
rising from left bottom to right top. This indicates that there is a high
degree of positive correlation between the two series. If in such a table
the frequencies are in a diagonal band which falls from left top to right
bottom it indicates a high degree of negative correlation. If there is
no trend in the frequencies either upward or downward it is an indication
of absence of correlation or of a very low degree of correlation
between two series. It should be remembered that if the values on the
vertical scale are arranged in ascending order from the top, the conclusion
with regard to the direction of correlation would be reverse. In
such cases a diagonal band rising from left bottom to right top would
indicate negative correlation and a diagonal band sloping from left top
to right bottom would indicate positive correlation.
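
The impression given by the diagonal band can be confirmed by computing the coefficient of correlation from the cell frequencies, taking the mid-point of each class as its value. The sketch below uses a small hypothetical table (not the pig iron table); its rows are in ascending order from the top, so a band sloping from left top to right bottom corresponds to positive correlation.

```python
import math

x_mid = [10, 20, 30]      # hypothetical class mid-points of x (columns, ascending)
y_mid = [5, 15, 25]       # hypothetical class mid-points of y (rows, ascending from the top)
freq = [                  # freq[i][j] = number of items in row i and column j
    [4, 1, 0],
    [1, 4, 1],
    [0, 1, 4],
]

n = sum(sum(row) for row in freq)
mean_x = sum(f * x for row in freq for f, x in zip(row, x_mid)) / n
mean_y = sum(f * y for row, y in zip(freq, y_mid) for f in row) / n

sxy = sum(f * (x - mean_x) * (y - mean_y)
          for row, y in zip(freq, y_mid) for f, x in zip(row, x_mid))
sxx = sum(f * (x - mean_x) ** 2 for row in freq for f, x in zip(row, x_mid))
syy = sum(f * (y - mean_y) ** 2 for row, y in zip(freq, y_mid) for f in row)

r = sxy / math.sqrt(sxx * syy)
print(n, mean_x, mean_y, round(r, 2))
```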
Lag and lead :
When there is cause and effect relationship between two series it
is not unlikely that there is a time lag between the changes in the values
of the subject and the relative. If, for example, it is established that a
rise in prices is accompanied by an increase in supply it is quite possible
that the change in the supply may take place three months or six months
after the changes in prices. The difference in the period of change in the
values of the subject and the relative is called time lag. If there is a
time lag of, say, one year between the price changes and the change in
the supply, it is essential that in the study of correlation the values of
prices should be paired with those values of supply series which are obtained
after one year from the change in price series. Thus the prices of
the year 1953 should be paired with the supplies of 1954 and the prices
of 1954 with the supply of 1955 and so on. The values of the relative
should always be lagged in such a way that they can be compared with
the values of the subject. The underlying principle of allowing for
the lagging effect is that the values paired should be such as would
give the highest value of the resulting coefficient of correlation. The
period of lag can be estimated by plotting the two series on a graph paper
and studying the period of peaks and troughs in the data. If there is a
lag of one year between the series of the price and supply the price curve
will lead by one year and the supply curve would lag by this period.
The peaks in the price curve would be one year earlier than the peaks
in the supply curve and similarly the troughs in the price curve would
be one year earlier than the troughs in the supply curve.
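
The principle of choosing the lag which gives the highest coefficient of correlation can be sketched as follows. The price and supply figures are hypothetical, constructed so that supply follows prices one year later; the function names are illustrative.

```python
import math

def pearson(xs, ys):
    """Karl Pearson's coefficient of correlation for paired values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_r(subject, relative, lag):
    """Pair the subject of year t with the relative of year t + lag."""
    return pearson(subject[: len(subject) - lag], relative[lag:])

prices = [10, 12, 15, 11, 9, 13, 16, 14]      # hypothetical subject series
supply = [50, 52, 56, 62, 54, 50, 58, 64]     # hypothetical relative series

results = {lag: lagged_r(prices, supply, lag) for lag in range(4)}
best_lag = max(results, key=results.get)
for lag, value in results.items():
    print(lag, round(value, 3))
print("lag with the highest r:", best_lag)
```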
Correlation and Determination

We have seen that correlation indicates the amount of variation
of one variable which is associated with or which is accounted for by
the variation in another variable. A more easily understood and in
certain cases a better measure to fulfil this purpose is the coefficient of
determination. It indicates the actual percentage of the portion of one
variable which is associated with the other or the percentage variation
in one variable which is accounted for by the other. Coefficient of
determination is the square of the coefficient of correlation. The
relationship between the coefficient of correlation and coefficient of
determination is of the following type:

    r       r²          r       r²
  1.00    1.00         .40     .16
   .90     .81         .30     .09
   .80     .64         .20     .04
   .60     .36         .10     .01
   .50     .25

The coefficient of determination is a better measure than the coefficient
of correlation. If we compare two coefficients of correlation
one of which is +.6 and the other +.3 we shall have the impression
that the correlation in the first case is twice as high as in the second but
the truth is that the correlation in the first case is four times as high as in
the second case. This fact is clearly indicated by the coefficient of
determination. The coefficient of determination in these cases would be
respectively +.36 and +.09. If the coefficient of determination is
+.81 it means that 81% of the variations in the relative series are due
to variations in the subject series and the remaining 19% due to other
factors. In case the coefficient of correlation is +.9 we cannot say that
90% of the variations of the relative series are due to the variations in
the subject series. We shall have to find out the square of the coefficient
of correlation to find out this percentage. As such sometimes the
coefficient of correlation may actually give misleading conclusions. In
coefficient of determination there is no such confusion.
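
The comparison of +.6 with +.3 can be verified in a few lines. This is a sketch only; the function name is illustrative.

```python
def coefficient_of_determination(r: float) -> float:
    """Square of the coefficient of correlation."""
    return r * r

for r in (0.9, 0.6, 0.3):
    share = coefficient_of_determination(r)
    print(f"r = {r:+.1f}  r^2 = {share:.2f}  ({share:.0%} of the variation accounted for)")

# +.6 looks twice as high as +.3, but it accounts for four times as much variation.
ratio = coefficient_of_determination(0.6) / coefficient_of_determination(0.3)
print(round(ratio, 2))
```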

Questions
1. What is meant by correlation? Does it always signify cause and effect
relationship between two variables? (M. Com., Rajputana, 19..)
2. Explain the meaning and significance of the concept of correlation. How
will you calculate it from a statistical point of view? (M. Com., Agra, 19..)
3. What are the special characteristics of Karl Pearson's coefficient of correlation?
What are the assumptions on which this formula is based?
4. How would you calculate the coefficient of correlation by the method of least
squares? Explain the underlying assumptions of such a correlation.
5. What is meant by correlation? Give the general rules for interpreting its
coefficient. (M. Com., .., 19..)
6. What is correlation? Explain how you would study correlation by:
(a) Graphs
(b) Correlation Table
(c) Karl Pearson's coefficient of correlation.
7. Discuss the problems involved in correlation analysis in the case of time series
and state how they can be solved. (M. A., Allahabad, 1950).
8. Write short notes on:
(a) Positive and negative correlation.
(b) Line of the best fit.
(c) Lag and Lead.
(d) Correlation table.
(e) Coefficient of determination.
9. Prove that Karl Pearson's coefficient of correlation cannot exceed ±1.
10. How would you calculate coefficient of correlation between:
(a) Long time changes.
(b) Short time changes.
(c) Cyclical fluctuations of two series.
11. Compute the coefficient of correlation from the following data:
1100-1000--900--400 1200 1400--600--1000
-3 600 35 00 %400 UOD-3600-UOO 1800 3000
12. The following data give the index numbers of industrial production
of Great Britain and the number of registered unemployed persons in the same
country during the years 1924-31:
            Industrial           Number of registered
Year        Production           Unemployed
            (Index Number)       (Hundred thousands)
1924          100                  11.5
1925          102                  12.0
1926          104                  14.0
1927          107                  11.1
1928          105                  12.5
1929          112                  12.2
1930          103                  19.1
1931           94                  26.4
Calculate the coefficient of correlation between production and the number of
unemployed. (B. Com., Lucknow, 1944)
13. The following table gives the value of exports of raw cotton from India
and the value of the imports of manufactured cotton goods into India during the
years 1913-14 to 1931-32:
(In crores of rupees)
Year        Exports of Raw Cotton      Imports of manufactured Cotton Goods
[915--[4 4Z ,6
19 1 7--18 4~ 49
[9 1 9--20 S. SJ
1921--12 ss Sa
1925--24 89 6,
1929--30 9i 76
['131-32 66 Sl
Calculate the coefficient of correlation between the value of the exports of raw
cotton and the value of the imports of cotton manufactured goods. (M. A., Calcutta,
1937). (B. Com., Nagpur, 1944).

14. Calculate the coefficient of correlation between the cost of living and the
weekly wage rates from the following data:
Year      Cost of living Index      Index of weekly wage Rates
1920 1,1 In
192 1: no 120
1922 102 99
1925 101 9'
1924 103 101
192 5
192 6
100
100
'.1
10Z
192 7 96 100
19 21 9' 99
1:9%9
1930
9'
87
99
9'
1931 84 96
193 2 81
(M. A.•• ...J:W-,
1937).
15. Calculate the coefficient of correlation between the values of x and y
given below:

!JC
78
19
u,
"
IH
97 1,6
69 :rn
'9
79
107
136
68 u3
61 108
(You may use 69 as working mean for x and 112 as that for y)
(M. A., Delhi, 19..)
16. Calculate the coefficient of correlation between infant mortality and
overcrowding from the following data:

i 109!
In£mt mortality
Percentage of
population ovcrcrwded
!
-1---1--
I
. 14.9· 6., ,.8
!
96
---- ---I
IU I 14Z

I
1,1 114 uS IOZ 109 1,6/ 122

n.z H.2 13.3 14.6 8.8 4.9 39. 8 1 6·3

(B. C,••• HHs. A1NlbrtI. 19"),


17. From the following table, find out how far the fluctuations in prices
correspond to the amount of money in circulation in India:
            Rupees and Notes       Index Numbers
Year        in circulation         of Prices
            (in crores)            (1873 = 100)
1912 248 137
191 3 2,6 143
191 4 248 147
191 ' z66 1,1
1916 297 1 84
191 7 53 8 196
1918 407 U$
191 9 46 !! 276
1920 4II 211
1921 59!! 260
(B. Com., Agr•• 19:n).
18. Calculate r from the following table, and indicate its probable error:
              Net area sown          No. of ploughs
              in lakhs of acres      in lakhs
U. P. 3~9 51
Madras 310 44
Bombay 285 12
Punjab 275 24
B.&O. 257 35
C. P. 245 16
Beilgal 140 46
Assam 64 II
S~d ~ 3
N. W. P. F. 23 2

Avera~e uo6 145


(P. C. S., I 94~).
19. The following table gives the results of the Matriculation Examination held
in 1936:
Age of
Candidates 13- 14- 15- 16- 17- 18- 19- 20- 21-
Percentage of
failures 39.11 40.6 43.4 34.:& 36 . 6 39. 2 4 8 ,9 47. 1 H·5
Calculate the coefficient of correlation and estimate its probable error. From
your results can you definitely assert that failure is correlated with age? (P. C. S., 1940).
20. The index numbers of prices of all commodities in Bombay and in Calcutta
were as under:
                Index number of         Index number of
Month           commodity prices        commodity prices
                in Calcutta             in Bombay
May 1942 169 Z04
June 1942 182 ZZII
July 1942 182 u~
A'Ilgust 1941 1911 u8
Sept. 1941 19 8 1119
October 1942 109 zH
NOT. 1941 117 149
I>~ 1942 238 266
January 1943 25 0 155
Pcb. 1943 153 25~
Do you think prices in Bombay and in Calcutta are correlated?
(M. A., Agra, 1944).
21. The following table gives the distribution of the total population and those
who are wholly or partially blind among them. Find out if there is any relation
between age and blindness:
Age        No. of persons in thousands        Blind
0--10 100 4~
10-20 60 40
zo-,o 40 40
30--40 36 4"
4Q--S o 24 36
,0--60 11 12
60-70 6 18
10-80 3 15
(B. COfll •• Agra, 1939).
22. Calculate the coefficient of correlation between the ages of 100 mothers and
daughters from the following data:

Age of                     Age of daughters in years
mothers in     5-10    10-15    15-20    20-25    25-30    Total
years
15-25            6       3       ...      ...      ...        9
25-35            3      16       10       ...      ...       29
35-45           ...     10       15        7       ...       32
45-55           ...     ...       7       10        4        21
55-65           ...     ...      ...       4        5         9
Total            9      29       32       21        9       100

23. The following table gives the number of students having different heights
and weights.

Heights in
Weight in pounds
Total
-
inches
8Q--90 90--100 lOO-IIO nO-I 20 Uo--I30

,cr.--:-H I 3,. 7 , 2 18

'SS-6o 2 4 10 ...-'7 4 1,

60-6, I , 12 10 7
-----
, - "
-
6'--71

Total
...
4 If
Do you find any relation between height and weight ?
8

57 I 28
6 3
16 I
20

100

(B. Com., Allahabad, 1940).


24. The following table gives the frequency according to age groups of marks
obtained by 67 students in an intelligence test:
Age, in years
.. ..
Teatmarka 18 20 21 Total
_I 19
'I

i .'
200-2 50
2,0-3 00
300- , , 0
4
3
2
,
4
6
2
4
8
'"
I,

,
2
,
II
14
21
,,0-400 I 4 6 10 al
Total 10 ,19 20 I 18 67
Is there any relationship between age and intelligence?
(B. Com., Agra, 1942).
25. Calculate from the data reproduced below, pertaining to 66 selected villages
in Meerut District, the value of r between "total cultivable area" and "the area
under wheat."
Total Cultivable Area' (in Bighas)

..g. 0-- ~oo- 1000- ISDO- ZOOO-


z.~oo
-
·Total
.t:J
i!:-; '0 12. 6
-- ... ...
I--'
t8
to '"
zOO- z:
.8j 40 0-
2
...
18
4
.4
7 S
I 2.7
14
Be: 600- ... . 1 ., Z
'"
I 4
...
..
<
~,5 800-1000

Total
'"
----
l4
---~
..
29
'

II
I
e - . -_ _ _

8
-----
4 -
2
-----
66
3

(I.A.S., 1949).


26. The correlation table given below shows the ages of husbands and wives
for 53 married couples living together on the census night of 1941. Calculate the
coefficient of correlation between the age of husband and that of his wife.
Age of Wife
'5- '2'S H- 43- H- 65-
25 35 45 5S 65 75 Total
Ageot
Husband
-----
.,
---- --------
...
---
,I~-a~ 1
'2 12
1 i
10
...
:t
...
...
... 2.
IS
2j-H
... 3 6 ... .. .
'"
IS
35-4'
4S-H ... ...4 ... a I '"
10
~~-6S ... ... 4 2- 8
6S-'U .. ,
'"
3 1 2.
---_ ---- --- ---- -,------
Total 3 '7 14 9 I 6 4 H
(1. A. S., 1910).
27. Compute the coefficient of correlation from the following correlation table
showing the age in years of the students and the marks obtained.
~x~s~'e~n~~~--~--------~------r-------~--------------~ -----

Age in years • Total of J


~Secies l6-r8 r8-%0 ZO-U U-%4- Frequea-
Marks cies for
x-series
t-----I----f----·1----I---- - . - - - - -
10--20 ~ ] 1 ... 4
t----I--...:.._-+----~---I---~......,. -------
'3 2 2 10

'., . . . 4 ': . . ..~, ~ .. , .. 6 18


......,.._ -'------
2 .,' 4 11
l------I--~---~---- -------
• 2 I. 5
2 1
-------
4
Total of • 10 II IS JZ
frequencies' -
for, series' , -
1M. A, A#garb, 1~1).

28. From the following table calculate the coefficient of correlation between the
ages of fathers and sons and estimate the probable error of the result obtained.
Age of Father. Age of SonI
Yean
-- %
1-'_- - - - - - - - - - - - - - - -
6 10 14 18 Z% %6 50 Total
S,-60 1 3 6 14
--- --
,0-"
----- -
8 10
-- 6
---
%
--' _._
1.6
~
4,-,0 13 8 4 2 27
- --- - - r - - -- --- --- - -----
+0-45 14 18

'S-40
--- ------
IS 20 8
3
------- "
--------
43
--- ---
30-35 6 12.%5 x6 59
2S-~0 IS 26 :0 I 6:
20-2, U- 10
--- ---
2-
<

34
Total 4~ 48
---
62 H 47 1.7 H ~ ,,00
29. Calculate the coefficient of correlation from the following data by the
method of rank differences.
- n. 18. 95. 70 • 60. So. 81.. SO.
_7-J20. I,... 1,0. II" IIO. 140. 14.2., 100
30. Calculate the coefficient of rank correlation of the following data:
_87, 22, ~3, 75, '37•
.)'-29. 63, S2, 46. 48.
31. Calculate the rank coefficient of correlation of the following data:
~-80. 78, n. 75. 68. 67, 60, 59·
_7--I:r.. 13. 14, 14. 14. 16. 15. 17.
32. The competitors in a beauty contest are ranked by three judges in the
following order:

Fint Judge- 1. 6. S. 10. 5. :I 4. 9. 7. I.


Second Judge-_ 3. ,. 8. 4. 7. 10, Z. t. 6. 9.
Third Judge- 6, 4. 9. 8, Y, 2. ~. la, 1, ,.
Use the rank correlation coefficient to discuss which pair of judges have the nearest
approach to common tastes in beauty. (M. A., Al!llWtJi, 19P).
33. Calculate the coefficient of concurrent deviation from the data given below:
Year Supply Pric:c
r9~ 1~ ~2
1944 ,64 2.10
1m I~ :I~
1946 r8z z~4
[947 166 ~6
r9411 (70 2\-:
1949 111 z'o
19~o r9:r. 190
I()~r r86 200
34. Compute the coefficient of correlation of the following table (by the method
of concurrent deviations) relating to the marks obtained by 12 students in History and
Geography respectively.

Students Maries in History Marks 10 GC'ography


A 65 30
B 40 55
C 55 68
D
B
n 28
b~ 16
F 80 25
G SS So
H ac.. 8,
I 85' aO
J 65 35
K 5S 45
L H 6,

35. Compute the coefficient of correlation, by the concurrent deviation method, of
the following table relating to the average number employed daily in thousands
and the average number of bales consumed daily in lakhs.

Year Qoarten Average number Average number of


employed daily bales consumed
(000'8) daily in laleh,
I aa
II as
III ao
IV 19
I
II
III
IV
195 1
I
n
III

36. Compute the coefficient of correlation of the short-time oscillations from the
following, ignoring decimals.

Yeat Supply Price


19:tI 80 146
19 22 R2 140
192 5 86 13 0
'19 a 4 91 ,
1'''
19:tS 83 153
19 26 8S u7
19 2 7 89 lIS
19 28
19:t9
96
95
9'
100

(_e. Co",., Al/abaJaJ, 194')'



37. The following table gives the wholesale price index numbers for Calcutta
and Karachi for the period 1927-41:
Year Calcutta Index Nos. Karachi Index Nos.
(base: July, 1914) (base: July, 1914)
192.7 148 137
192.8 145 137
192.9 141 133
193 0 116 108
19;1 96 9~

1932. 91 99
19H 87 97
1934 89 96
19;5 91 99
193 6 91 10Z
1937 10Z 108
193 8 95 104
19'9 108 108
1940 IlIO u6
1941 139 lZ0

Calculate the coefficient of correlation of the short-time oscillations between the
above two indices, taking five-yearly moving averages and ignoring decimals.

38 • Compute the coefficient of correlation of short-time oscillations between


demand and price from the following table as assuming ,-yearly cycle and ignoring
decimal·
Year Index of demand Index of price

1937 101 I17


193 8 108 97
1939 lOS lOa

1940 145 uS
[94 1 1S3 2.0S

1942 186 19 6
194' 202. 177
2.07 168
1944
194~ 2.0 4 177
1946 19 8 170
1947 zoo 16,
1948 208 170
1949 23 2 17'
228 180
19~o

195 1 22,2. 190

(Figures in the above table are arbitrary)



39. Compute the coefficient of correlation of short-time oscillations between
index numbers of Raw Cotton and Cloth from the following table:

Year and Date Index Number of Index Number of


month week Endin.~ Raw Cotton Cloth
---
195 1
7 96 104-
July 14 101 106
u 99 110
:t8 101 107

7 104 108
August 14 10 5 108
21 10.3 109
2.8 102. 10 9
I
7 107 108
Scpo 14 , 106 107
I 108
ZI lOS
t I 2.8 1I0 IIO

.'
Oct.
I

14
7
I
II,
,109
II 2.
107
2.1 U2 108
2.8 10 7
- 108

(Figures in the above table are arbitrary)


40. A sample of 147 seed-pods is being tested for any possible correlation
between 'length of pod' and 'number of seeds per pod', and the following is a record
of observations taken for the purpose:

Number of
seeds per pod

Numbetof
6 I I
L:J----
7 8

0
9

28
~
10

10
II

2
Total

147
pods 7 7

Length of
pod In em.
I
---
Number of
poda
Total number of seeds
In each category

S 6 46
6 33 2~8
7 H
" 4~7
8 57 51 4
9 13 113

I
10 2 19
II 1 9
-,.--
Total 147 1,176
Calculate the co-efficient of correlation and its probable error. You may use the
following results:

L~ngth of rod
if! em.
Number 0 seeds pet pod
Means Variance
1. 18 79
0·9"4
(I.A.S., 1952).

41. How do you find a coefficient of correlation when only ranks are known?
Persons in the income group Rs. 20,000 to Rs. 25,000 were asked to supply
30
before a specified date Returns of their annual income for some person concerned with
taxation. But when the date arrived, only 20 Returns of total income as noted below
had been received in the following order.
20.690 %.1.720 2.4,010 2.0,090 20,940 &1,5 10
20.340 220,42.0 2Z,180 2.2,600 2;2,94 0 23.080
u,840 23.,10 Z4.260 2;3.740 24,7 20 23.310
21,3 00 24,Ho
Do the data confirm the belief that persons with bigger incomes delay
submission of Returns more than the others? What is the measure of confidence you can
associate with your answer? You may assume that the square of the test criterion
t is equal to $(n-2)\rho^2/(1-\rho^2)$, where $\rho$ is the co-efficient of rank correlation and n the
number of observations on which the coefficient is based. (I.A.S., 1953).
42. What is rank correlation and what is the purpose for which it is used?
Obtain the formula for the Spearman coefficient of rank correlation
$$ \rho = 1 - \frac{6\,\Sigma d^2}{n(n^2-1)} $$
where n is the number of individuals ranked and d is the difference in the ranks
assigned to the same individual.
Twelve pictures submitted in a competition were ranked by two judges with
results shown in thc table below : -
TABLE
Picture
lUnk assigned by
1\

5
B

9
c
D~E FG: H-:__2_,-~ rL-
7
I
3 4 12 2 II 10 8
first Judge
--------+--1--1--,-- ---ro--- _--
Rank assigned by 5 8 9 II I 2 10 4 IZ 7 6
second Judge
Calculate $\rho$. Is there a lack of independence in these rankings?
(Assume that on the hypothesis of independence of two sets of n rankings,
$t = \rho\sqrt{(n-2)/(1-\rho^2)}$ follows the t distribution with n-2 degrees of freedom).
Extracts from Statistical Tables
Table t : The normal distribution
The area under the standard normal curve between x=o and s=o." is 0.13 68
and between x=o and X=I.15 il 0.~746.
Table II : The Chi-squared distribution
Degrees of freedom                                   1      2      3      4
Values of χ² significant at 5 per cent
level of probability                              3.84   5.99   7.81   9.49
Table III : The t distribution
Degrees of freedom                                  10     11     12
Values of t significant at 5 per cent
level of probability                              2.23   2.20   2.18
(I.A.S., 1956).
43. Following is a table showing height-weight frequency distribution.
Compute Karl Pearson's Coefficient of Correlation:
~-H~V~a-t~ia~b~le~(b~e~ig~b~t-.~In~ch~CI~)-----·--------
- - ' _ ' - - - ; ; ; - , -f - - - - ' - -
YVa.tiable ~o 6~ 64 66 68 70 72. 74
(weight, pounds) J~~
---I - J ,
~'o '~ I

-, %.2.0

no 5
4
1
4

~
3
1

2.00 2.
---
1 ,
f--
I 1
9 7
f..--
19{' I 3 8 16 ; S
do I S 8 IS I2. I
-
" 170 2. 8 18 2.6 8 I

160 19 40 20 ..
ISO S
1--,
1, 2.6 9 2.

140 I 4 6 S I

1,0 I ; I r

12.0
I--
t t
- ---I - -
44. The data of the table given below are from a paper on "Wool in the World
Economy." Calculate the coefficient of correlation.

Average wool A v:::fc real inconJ<:


consumption per per b (Intemation-
head (I lb. clean) al uaitl)
U.K. S·2.7 1,06 9
New-Zealand 4.6 , 1,2.02.
Australia 4. 6 ; 9 80
Belgium 4·55 600
Sweden 3-55 653
France 3·5.2.~ 684
Switzerland 3·2.2. t,018
Argentina ;.00 1,000
Germany 2.87 646
U.S.A. 2.66 I"SI
Czechoslovakia 2.04 455
Yugoslavia 1·74 HO
apan
italy
Hungary
1.%6
1.1:Z
1.12
353
H3
359
Poland 0·99 35%
U. S. S. R. 0.87 32.0
South Mrica 0.8; 2.{6
India 0.20 200
Cblna 0.10 100-120

45. From the following data find out if there is any relationship between density
of population and death rate.

Districts Area in Sq; miles Population No. of Deaths


A. 240 4 8 ,000 576
B. 300 15 0 ,000 2,25 0
C. 160 96•000 1,53 6
D. too 80,000 t,#O
E. 400 I,OO,OOO 1,3 00

46. The co-efficient of correlation between two series X and Y is .85. Their
co-variance is 6.5. The variance of X is 6.1. Find the Standard Deviation of the Y
series.

47. From the following information find the number of items:
r = .5, Σxy = 120, Standard Deviation of y series = 8 and Σx² = 90, where
x and y denote deviations from arithmetic average.
48. Compute the Co-efficient of Correlation of the short-term oscillations from
the following data:
Years 193 1 33 39
Supply 80 86 93
Pric~ 146 14~ 130 It7 t33 127 115 95 100
(Assume a three-year cycle and ignore decimals). (M. Com., Allahabad, 1945).
49. Compute the Co-efficient of Correlation by Karl Pearson's method and also
by the method of rank differences in respect of the pairs of series in (a) and (b) given
below. Do the two methods give identical results with these data?

a b
x y x y
100 400 21 58
200 600 17 56
300 700 u 64
4QO SOO 23 66
500 100 20 62
600 300 19 54
700 200 t8 60
50. Calculate the Co-efficient of Correlation from the following data by (i) Karl
Pearson's Method, (ii) Spearman's Ranking Method and (iii) the Concurrent Deviation
Method.

Subject 280 275 280 286


Relative 21 20 19 18 16

51. Mention the rules for interpretation of Karl Pearson's Co-efficient of
correlation. What is the significance of the co-efficient of correlation, r, for the following
values based on the numbers of observations (a) 50 and (b) 500: r = .2, .4, .9.
,Z. Given I Number of pairs of observations of" and y series ~ IS
Arithmetic average of:z: series
Standard Deviation of x series
Arithmetic average of y series
Standard Deviation of y series
Sum of products of deviations of ;z: and y series 122.
Find out (a) the Co-efficient of Correlation between x and y, (b) the probable error
of the Co-efficient. (M. A. Raj., 19( 3).
H. Given: r = .56, };xy = 60, oy ,,;, y. ~x2 = 90.

Find the number of items.

54. A student calculates the value of r as +.7 when the value of N is 5, and
concludes that r is highly significant. Is he correct?

55. Given: Covariance = 27, r = .6, variance of y = 25.
Find the value of the standard deviation of x.

56. The following table gives the results of two different intelligence tests. Find
the co-efficient of correlation between them.

Test B. (y) TestA ~x2


no-ug "120- 2 9 130 -39 140 -49 150-59 160-69 I Total
140-49 2Z 20 10 3 55
ISO-59 17 z8 zo 8 73
1(,0-69 9 18 28 10 3 68
170-79 5 9 10 10 5 5 44
180-89 I -75 I 7 6 I 16
Total 54 69 38 14 6 I Z56

(Nagpur. 1956 Supple).


57. Find the correlation co-efficient between age and playing habits of the
following students:
Age IS 16 17 18 19 20
No. of Students ,zoo
~i(n :1:00 tSo 120

48
100

30
80
12
Regular PI*yers 15 0 90
(T. D. C., II._".., Raj., 1962).
Chapter 17
Regression and Ratio of Variation
Meaning and use. It has been discussed in earlier chapters that
when a line of best fit is obtained for data, it gives the best possible
mean values of y for given values of x (when x is the independent
variable and y the dependent variable). It is also possible to fit another
straight line to the data to obtain the best possible mean values of x for
given values of y (taking y as the independent variable and x as the
dependent variable). If two such lines are plotted on graph paper (one
showing the best possible mean values of y for given values of x and
the other showing the best possible mean values of x for given values of
y), a study of correlation can be made from them. If there is perfect
correlation between the two series both lines coincide; in other words,
there is only one line, which gives the best possible mean values of y
for given values of x and the best possible mean values of x for given
values of y. The farther the lines are from each other, the lower is the
correlation between the two series. If they cut each other at right
angles there is no correlation between them.
These lines are called regression lines.
We have seen in the chapter on analysis of time series that the
line of best fit describes the change in a given series accompanying
a unit change in time. The regression lines describe the average
relationship between the two series. In fact there is no difference between
lines of best fit and regression lines, though the term "line of
best fit" is generally used when the x-series relates to time and the
y-series to the values of a variable. If both the x and y series are
variables, the lines of best fit are known as lines of regression. The
equations describing the regression lines are called regression equations.
We shall discuss later on how regression equations are obtained. The use of
these terms dates back to the early studies made by Francis
Galton. He studied the heights of fathers and sons and found that the
deviations of the mean heights of the sons from the mean height of the
race were less than the deviations of the mean heights of the fathers
from the mean height of the race. When the fathers were above
or below the mean, the sons tended to go back, or regress,
towards the mean. Regression thus implies going back or returning.
Galton studied the average relationship between these two variables
graphically and called the line describing this relationship the line of
regression. Since then these terms have been in common use. Regression
lines thus describe the average relationship between two variables. They
throw light on the correlation between two series. If the coefficient
of correlation between the heights of fathers and sons is +.7, it means
that if a group of fathers have heights which are more than average by
x inches, their sons would have heights which would be more than
average by .7x inches (taking the standard deviations of the two series
to be equal). This going back to the mean or average is called
regression.
Regression equations
Regression equations are algebraic expressions of the regression
lines. Since there are two regression lines there are two regression
equations. The regression line of x on y gives the best possible mean
values of x for given values of y, and similarly the regression line of y
on x gives the best possible mean values of y for given values of x. As
such, the regression equation of x on y is used to describe the
variation in the values of x for given changes in the values of y, and
similarly the regression equation of y on x is used to
describe the variation in the values of y for given changes in the values
of x. If the variations are studied from the respective means of the two
series, the regression equations are calculated as follows:
Regression equation of x on y
(i) x = a + by
The above equation is of the same type as we studied in earlier
chapters in connection with the line of best fit by the method of least
squares. This equation can also be written in terms of the coefficient
of correlation, the standard deviations and the means of the two series.
Thus
(ii) $$ x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}) $$
where $\bar{x}$ and $\bar{y}$ stand respectively for the mean values of the x and y
series, $\sigma_x$ and $\sigma_y$ for their standard deviations and r for the coefficient
of correlation.
Regression equation of y on x
(i) y = a + bx
or
(ii) $$ y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}) $$
The symbols stand for the same things as in the previous case.
The following example would illustrate the above formulae:
Example 1. Calculate the regression equations from the following
data:
x       y
1     166
2     184
3     142
4     180
5     338

This example was solved in the last chapter by the method of least
squares, and we had obtained the computed values of y by the equation
y = a + bx. The equation obtained was y = 100 + 34x. Now we shall
solve it by taking into account the values of the coefficient of correlation,
the standard deviations and the means of the two series. In the above
example the value of the coefficient of correlation, r, is +.69 and the
values of the standard deviations of the x and y series, $\sigma_x$ and $\sigma_y$, are
respectively 1.4 and 69.6. The values of the means of the two series are
respectively 3 and 202.
With the above figures the regression equation of y on x would be
$$ y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}) $$
$$ y - 202 = .69 \times \frac{69.6}{1.4}\,(x - 3) $$
$$ y - 202 = 34\,(x - 3) $$
$$ y - 202 = 34x - 102 $$
$$ y = 34x + 100 $$
Thus it will be noted that the regression equation of y on x obtained
by this method is exactly the same as that obtained by the equation
y = a + bx.
- ax
x-x=r --;ry(Y-:i')

X-3 =. 69 - 1·4
- (.,'
V -2ciZ
)
, 69.6 '
x-3=.or4 (Y-'202.)
x-~=.oI4y-Z.·82.S
X=.014Y+·17 2
From the above two regression equations we can calculate the
most probable values of x for given values of y and the most probable
values of y for given values of x.
The regression equation of y on x is y = 34x + 100. From this
we can compute the values of y for given values of x. Thus when
x = 1, y = (34 × 1) + 100 = 134
x = 2, y = (34 × 2) + 100 = 168
x = 3, y = (34 × 3) + 100 = 202
x = 4, y = (34 × 4) + 100 = 236
x = 5, y = (34 × 5) + 100 = 270
Similarly, from the regression equation of x on y we can calculate
the most probable values of x for given values of y. The regression
equation of x on y is
x = .014y + .172
Thus when
y = 166, x = (.014 × 166) + .172 = 2.496
y = 184, x = (.014 × 184) + .172 = 2.748
y = 142, x = (.014 × 142) + .172 = 2.160
y = 180, x = (.014 × 180) + .172 = 2.692
y = 338, x = (.014 × 338) + .172 = 4.904
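The two regression equations of Example 1 can also be checked numerically. What follows is a minimal Python sketch, not part of the original text: the function name `regression_lines` and the layout of its result are assumptions made for illustration. It applies the formulae given above, with the standard deviations computed by dividing by n.

```python
from math import sqrt

def regression_lines(xs, ys):
    """Return both regression lines as (intercept, slope) pairs, plus r.

    Hypothetical helper: it follows the text's formulae
    b_yx = r * sy / sx and b_xy = r * sx / sy.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard deviations with divisor n, as used in the text.
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    b_yx = r * sd_y / sd_x  # slope of the regression line of y on x
    b_xy = r * sd_x / sd_y  # slope of the regression line of x on y
    return {
        "y_on_x": (mean_y - b_yx * mean_x, b_yx),
        "x_on_y": (mean_x - b_xy * mean_y, b_xy),
        "r": r,
    }

# Data of Example 1.
x = [1, 2, 3, 4, 5]
y = [166, 184, 142, 180, 338]
lines = regression_lines(x, y)
print(lines["y_on_x"])  # intercept and slope of y on x
print(lines["x_on_y"])  # intercept and slope of x on y
```

Worked without intermediate rounding, the line of x on y comes out at roughly x = .01405y + .162, slightly different from the .014y + .172 obtained above, because the text rounds r and the standard deviations before multiplying.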
To plot the regression line of x on y we shall take the actual values
of y and the computed values of x, and similarly to plot the regression
line of y on x we shall take the actual values of x and the computed
values of y.
Thus for the regression line of x on y the data would be
y      166     184     142     180     338
x    2.496   2.748   2.160   2.692   4.904

Similarly for the regression line of y on x the data would be
x        1       2       3       4       5
y      134     168     202     236     270

These data are plotted in figure 1:

[Fig. 1. Regression Lines: x = .014y + .172 and y = 34x + 100, with x on the vertical scale and y on the horizontal scale]

From the above regression lines we can find out the values of x
for given values of y and the values of y for given values of x. Thus,
if we have to find out the value of y when the value of x is 1.5 we shall
use the regression line of y on x. From the point 1.5 on the vertical
scale a line parallel to the base would be drawn, touching the regression
line of y on x at point P. From this point a perpendicular towards the
base line would be drawn, touching it at point R. The value at point
R is 151. This would be the value of y when the value of x is 1.5. In
the same way we can find out the most probable value of x for a given
value of y. Here we shall take into account the regression line of x on y.
Suppose we have to find out the most probable value of x when y is
200. We shall draw a perpendicular from the base at the point where
the value is 200. This perpendicular touches the regression line of x
on y at point A. From this point a line parallel to the base is
drawn, touching the vertical scale at B. The value at this point is 2.972.
This is the most probable value of x when y is 200.
These values can also be calculated from the regression equations.
Thus, if we have to calculate the value of y when x is 1.5 we shall use
the regression equation of y on x. It is
y = 34x + 100
If x is 1.5 the value of y would be
(34 × 1.5) + 100 = 151
Similarly, if the value of x is to be calculated when the value of
y is 200, we shall use the regression equation of x on y. It is
x = .014y + .172
If y is 200 the value of x would be
(.014 × 200) + .172 = 2.972
Thus we notice that the computed values of x and y for given
values of y and x as obtained from the regression equations are the same
as those obtained graphically from the lines of regression.
Regression equation and regression coefficients
Regression equations are expressed in terms of the mean values of
the two series and indicate the variation in one series from its mean as
compared to a variation from the mean of the other series. A regression
coefficient gives the value by which one variable changes for a unit
change in the other variable. Just as there are two regression
equations, similarly there are two regression coefficients. The regression
coefficient of x on y indicates the value by which the x-variable
would change for a unit change in the value of the y-variable.
Similarly the regression coefficient of y on x indicates the value
by which the y-variable would change for a unit change in the
x-variable. The regression coefficients are calculated as follows:

$$ \text{Regression coefficient of } x \text{ on } y = r\,\frac{\sigma_x}{\sigma_y} $$
$$ \text{Regression coefficient of } y \text{ on } x = r\,\frac{\sigma_y}{\sigma_x} $$

In the last example the regression coefficient of x on y, or
$$ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = .69 \times \frac{1.4}{69.6} = .014 $$
and the regression coefficient of y on x, or
$$ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = .69 \times \frac{69.6}{1.4} = 34 $$
It means that, when both variables are measured as deviations from their means,
x = .014y and
y = 34x
If the y-variable changes by 1 the change in the x-variable is .014, and if
the x-variable changes by 1 the change in the y-variable is 34. The regression
coefficients indicate the slopes of the lines of regression. From the
regression coefficients it is very easy to calculate the coefficient of
correlation. This can be done as follows:
$$ b_{xy} \times b_{yx} = r\,\frac{\sigma_x}{\sigma_y} \times r\,\frac{\sigma_y}{\sigma_x} = r^2 $$
or
$$ r = \sqrt{b_{xy} \times b_{yx}} $$
Thus the coefficient of correlation is the square root of the product of the
two regression coefficients. It can also be said that the coefficient of
correlation is equal to the geometric mean of the two regression
coefficients. In the last example the value of $b_{xy}$ was .014 and of $b_{yx}$ 34.
Therefore,
$$ r = \sqrt{.014 \times 34} = \sqrt{.476} = .69 $$
In the last chapter we have seen that the value of the coefficient of
correlation is exactly this figure.
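The relation between r and the two regression coefficients can be sketched in a few lines of Python. This is an illustration only: the function name `correlation_from_regression` is an assumption, and the sign rule it applies (r takes the common sign of the two coefficients, since both equal r multiplied by a positive ratio of standard deviations) is added here rather than stated in the passage above.

```python
from math import copysign, sqrt

def correlation_from_regression(b_xy, b_yx):
    """Geometric mean of the two regression coefficients, signed like them.

    Hypothetical helper. Both coefficients are r times a positive ratio
    of standard deviations, so they must share the same sign.
    """
    product = b_xy * b_yx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    return copysign(sqrt(product), b_xy)

# Coefficients of the last example, as rounded in the text.
r = correlation_from_regression(0.014, 34)
print(round(r, 2))
```

Passing two negative coefficients returns a negative r, which the bare square root would otherwise hide.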
If only the regression coefficients are needed and there is no need
of the coefficient of correlation, they can be calculated directly in the
following way:
$$ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\Sigma dx\,dy}{n\,\sigma_x\,\sigma_y} \times \frac{\sigma_x}{\sigma_y} = \frac{\Sigma dx\,dy}{n\,\sigma_y^2} = \frac{\Sigma dx\,dy}{n \times \dfrac{\Sigma dy^2}{n}} = \frac{\Sigma dx\,dy}{\Sigma dy^2} $$
Similarly
$$ b_{yx} = \frac{\Sigma dx\,dy}{\Sigma dx^2} $$
where $\Sigma dx\,dy$ stands for the sum of the products of the deviations
of the two series from their respective means, and dx and dy for the deviations
of the x and y series from their respective means.
The following example would illustrate the above procedure:
Example 2. Calculate the regression coefficients from the following
data:
x       y
1     166
2     184
3     142
4     180
5     338
Solution. Calculation of Regression Coefficients
         x-Series                        y-Series
  x    Deviations             y    Deviations
       from mean   (dx²)           from mean    (dy²)    dxdy
         (dx)                        (dy)
  1       -2         4      166       -36        1296      72
  2       -1         1      184       -18         324      18
  3        0         0      142       -60        3600       0
  4       +1         1      180       -22         484     -22
  5       +2         4      338      +136       18496     272
Total      0        10                  0       24200     340
The value of
$$ b_{xy} = \frac{\Sigma dx\,dy}{\Sigma dy^2} = \frac{340}{24200} = .014 $$
The value of
$$ b_{yx} = \frac{\Sigma dx\,dy}{\Sigma dx^2} = \frac{340}{10} = 34 $$
It should be noted that the same values were obtained for $b_{xy}$ and
$b_{yx}$ when the coefficient of correlation and the standard deviations of the
two series were actually calculated.
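The direct method of Example 2 translates readily into code. The sketch below is illustrative only; the function name `regression_coefficients` is an assumption, and the sums it forms correspond to the Total row of the table above.

```python
def regression_coefficients(xs, ys):
    """Return (b_xy, b_yx) from deviations taken from the actual means.

    Hypothetical helper implementing b_xy = sum(dx*dy) / sum(dy^2)
    and b_yx = sum(dx*dy) / sum(dx^2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations of the x-series
    dy = [y - mean_y for y in ys]  # deviations of the y-series
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sum_dy2, sum_dxdy / sum_dx2

# Data of Example 2.
b_xy, b_yx = regression_coefficients([1, 2, 3, 4, 5], [166, 184, 142, 180, 338])
print(round(b_xy, 3), b_yx)
```

No standard deviation or coefficient of correlation is computed at any stage, which is the saving the direct method offers.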
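If instead of the actual arithmetic averages assumed arithmetic
averages are used, the formula would be
$$
\begin{aligned}
b_{xy} &= r\,\frac{\sigma_x}{\sigma_y} \\
&= \frac{\Sigma dx\,dy - n\left(\frac{\Sigma dx}{n}\right)\left(\frac{\Sigma dy}{n}\right)}{n\sqrt{\frac{\Sigma dx^2}{n} - \left(\frac{\Sigma dx}{n}\right)^2}\;\sqrt{\frac{\Sigma dy^2}{n} - \left(\frac{\Sigma dy}{n}\right)^2}} \times \frac{\sqrt{\frac{\Sigma dx^2}{n} - \left(\frac{\Sigma dx}{n}\right)^2}}{\sqrt{\frac{\Sigma dy^2}{n} - \left(\frac{\Sigma dy}{n}\right)^2}} \\
&= \frac{\Sigma dx\,dy - n\left(\frac{\Sigma dx}{n}\right)\left(\frac{\Sigma dy}{n}\right)}{n\left\{\frac{\Sigma dy^2}{n} - \left(\frac{\Sigma dy}{n}\right)^2\right\}} \\
&= \frac{\Sigma dx\,dy - n\left(\frac{\Sigma dx}{n}\right)\left(\frac{\Sigma dy}{n}\right)}{\Sigma dy^2 - n\left(\frac{\Sigma dy}{n}\right)^2}
\end{aligned}
$$
Similarly
$$ b_{yx} = \frac{\Sigma dx\,dy - n\left(\frac{\Sigma dx}{n}\right)\left(\frac{\Sigma dy}{n}\right)}{\Sigma dx^2 - n\left(\frac{\Sigma dx}{n}\right)^2} $$
Here dx and dy stand for the deviations of the x and y series from their assumed averages.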

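Example No. 2 has been solved below by this method:

Calculation of Regression Coefficients

         x-Series                          y-Series
  x    Deviations               y    Deviations
       from assumed  (dx²)           from assumed   (dy²)    dxdy
       average (2)                   average (200)
         (dx)                          (dy)
  1       -1           1      166       -34          1156     +34
  2        0           0      184       -16           256       0
  3       +1           1      142       -58          3364     -58
  4       +2           4      180       -20           400     -40
  5       +3           9      338      +138         19044    +414
Total     +5          15                +10         24220     350

$$ b_{xy} = \frac{\Sigma dx\,dy - n\left(\frac{\Sigma dx}{n}\right)\left(\frac{\Sigma dy}{n}\right)}{\Sigma dy^2 - n\left(\frac{\Sigma dy}{n}\right)^2} = \frac{350 - 5\left(\frac{5}{5}\right)\left(\frac{10}{5}\right)}{24220 - 5\left(\frac{10}{5}\right)^2} = \frac{350 - 10}{24220 - 20} = \frac{340}{24200} = .014 $$

$$ b_{yx} = \frac{\Sigma dx\,dy - n\left(\frac{\Sigma dx}{n}\right)\left(\frac{\Sigma dy}{n}\right)}{\Sigma dx^2 - n\left(\frac{\Sigma dx}{n}\right)^2} = \frac{350 - 5\left(\frac{5}{5}\right)\left(\frac{10}{5}\right)}{15 - 5\left(\frac{5}{5}\right)^2} = \frac{350 - 10}{15 - 5} = \frac{340}{10} = 34 $$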

It should be noted that both the regression coefficients arrived at by
the short-cut method are exactly the same as those arrived at by the direct
procedure.
An important point that should always be kept in mind about the
regression coefficients is that both of them cannot exceed unity at the same
time, though the value of one of the regression coefficients can be more
than unity. If the values of both the regression coefficients were more
than 1, the value of the coefficient of correlation, which is equal to the
square root of the product of the regression coefficients, would also be
more than 1. We know that the coefficient of correlation cannot exceed
unity, and as such if the values of both the regression coefficients
individually exceed unity it is certain that there is some mistake in
calculation.
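That the short-cut method gives the same coefficients whatever averages are assumed can be checked with a small sketch. The function name `regression_coefficients_assumed` and its parameter names are hypothetical; the arithmetic follows the assumed-average formulae given above.

```python
def regression_coefficients_assumed(xs, ys, assumed_x, assumed_y):
    """Return (b_xy, b_yx) from deviations taken from assumed averages.

    Hypothetical helper; the correction terms n * (sum/n) * (sum/n)
    remove the effect of the assumed averages not being the true means.
    """
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = sum_dxdy - n * (sum_dx / n) * (sum_dy / n)
    b_xy = numerator / (sum_dy2 - n * (sum_dy / n) ** 2)
    b_yx = numerator / (sum_dx2 - n * (sum_dx / n) ** 2)
    return b_xy, b_yx

x = [1, 2, 3, 4, 5]
y = [166, 184, 142, 180, 338]
# Assumed averages 2 and 200, as in the worked example.
print(regression_coefficients_assumed(x, y, 2, 200))
# Any other assumed averages lead to the same coefficients.
print(regression_coefficients_assumed(x, y, 4, 150))
```

The choice of assumed averages affects only the size of the intermediate figures, so values that keep the deviations small are the convenient ones.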

RATIO OF VARIATION

We have seen above that the regression equations give us an idea
about the relative variations of the two series. A similar study is made
by the calculation of the ratio of variation. The ratio of variation tells
us about the variation in the relative series as compared to a constant
variation in the subject. Although two series may have almost perfect
correlation, the proportionate variations in them may be different from
each other. Thus price and demand may have a high degree of negative
correlation, yet the variations in price may be proportionately different
from the variations in demand. If we wish to know the proportionate fall
in demand if price rises by, say, 1%, we can calculate the ratio of
variation of the two series.

Ratio of variation is the arithmetic average of the ratios of the percentage
deviations from the mean in the relative series as compared to those in the subject.
The variable in which the average percentage deviation is less is generally
taken as the relative series so that the value of the ratio of variation may be
less than unity.
The ratio of variation can thus be calculated as follows:
(1) Compute the percentage variation of the different values of the
relative from the mean value. In other words, the arithmetic average of
the relatives should be taken as 100, the other values should be
expressed as percentages of it, and the deviations of these values from the
mean value of 100 should be obtained.
(2) In the same way compute the percentage deviations of the
subject series from its mean.
(3) Divide the percentage variation of the relative series by the
corresponding figure of the percentage variation of the subject.
(4) Calculate the arithmetic average of the coefficients obtained;
this would be the desired ratio of variation.
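The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's procedure in full: the function name `ratio_of_variation` is hypothetical, the two short series in the usage lines are made-up figures, and items whose subject value coincides with its mean are skipped, because step (3) would otherwise divide by zero (the text does not say how such items are to be treated).

```python
def ratio_of_variation(subject, relative):
    """Average ratio of percentage deviations (relative / subject).

    Hypothetical helper. Assumption: items where the subject shows no
    deviation from its mean are left out of the average.
    """
    n = len(subject)
    mean_s = sum(subject) / n
    mean_r = sum(relative) / n
    ratios = []
    for s, r in zip(subject, relative):
        dev_s = 100 * s / mean_s - 100  # step (2): percentage deviation of subject
        dev_r = 100 * r / mean_r - 100  # step (1): percentage deviation of relative
        if abs(dev_s) < 1e-12:
            continue  # assumption: skip items with no deviation in the subject
        ratios.append(dev_r / dev_s)    # step (3)
    if not ratios:
        raise ValueError("subject series shows no variation")
    return sum(ratios) / len(ratios)    # step (4)

# Made-up figures for illustration only.
print(ratio_of_variation([80, 120], [90, 110]))
```

In the made-up figures the relative moves by 10 per cent whenever the subject moves by 20 per cent, so the ratio is one half.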

Thus in the calculation of the ratio of variation the proportional
variations of the two series are found out and then a ratio is obtained
to describe the proportionate variation in the relative as compared to a
variation of unity in the subject. In series which are irregular it is better
to study the ratio of variation graphically rather than by the
above-mentioned mathematical procedure, because the graph gives a better
picture of the relative variations of the two series.

Galton's Graph
Francis Galton, who first used the term regression in his studies
relating to the mean heights of fathers and sons, has given a graph to study
the ratio of variation between two series. This graph is known
after his name as Galton's Graph. This graph is very useful when the
variations in the series are irregular, as is generally the case in series
relating to economic data.

Galton's graph can be drawn as follows:

(1) Convert the x and y-series into index numbers based on their
respective mean values.
(2) The series in which the variations are less should be taken
as the relative series and the series in which the variations are more as the
subject series. This will give a ratio of variation whose value would be
less than unity.
(3) Plot the indices of the subject on the vertical scale and of the
relative on the horizontal scale, allowing for the time lag, if any.
(4) The plotted points would give a scatter diagram. Draw a
line of best fit by the free hand method. In drawing the line the following
points should be kept in mind:
(a) that the numbers of points on either side of the line are
approximately equal,
(b) that the points on either side of the line are equidistant
from the line drawn,
(c) that the line drawn passes through the mean values of both
the series. Since the mean values of both the series are
assumed as 100, the line should pass through the point
where the values of both x and y are read as 100 on their
respective scales.
After the line has been drawn in the above fashion the tangent
of the angle between y and x should be found out. This would give
the ratio of variation between y and x. The reciprocal of it would
indicate the ratio of variation between x and y. The following example
would illustrate the above rules.

Example 3. Calculate the ratio of variation between the following
two series by Galton's graph :-

Serial No.     Subject (x)     Relative (y)
    1              24              7.0
    2              40              9.0
    3              36              9.0
    4              50              9.5
    5              36             10.[?]
    6              26              [?]
    7              24              8.0
    8              34              7.0
    9              52             10.0
   10              52             13.0
   11              64             16.5
   12              56             11.0
   13              48             13.0
   14              40             11.0
   15              32              9.0
   16              26              7.0
 Total            640            160.0

The above data would have to be converted into index numbers
based on the mean value of the two series.

Arithmetic average of x-series = 640/16 = 40

Arithmetic average of y-series = 160.0/16 = 10.0

If the mean values of x and y-series are assumed as 100 the index
numbers of the other values would be as follows :-

Serial No.     Indices of x     Indices of y
               (40 = 100)       (10 = 100)
    1              60               70
    2             100               90
    3              90               90
    4             125               95
    5              90              [?]
    6              65              [?]
    7              60               80
    8              85               70
    9             130              100
   10             130              130
   11             160              165
   12             140              110
   13             120              130
   14             100              110
   15              80               90
   16              65               70
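The conversion to mean-based index numbers can be sketched in a few lines of Python. This is only an illustration: the function name `mean_based_indices` is an assumption, and the list holds the x-series of Example 3 as read from the table above.

```python
# x-series (subject series) of Example 3; the values total 640.
x = [24, 40, 36, 50, 36, 26, 24, 34, 52, 52, 64, 56, 48, 40, 32, 26]


def mean_based_indices(series):
    """Express each value as a percentage of the series mean (mean = 100)."""
    mean = sum(series) / len(series)
    return [100 * value / mean for value in series]


x_indices = mean_based_indices(x)
print(x_indices)
```

By construction the indices of any series average to 100, which is why the line of best fit in Galton's graph is made to pass through the point (100, 100).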
The above indices are plotted in the following graph :-

[Fig. 2. Galton's Graph Showing Ratio of Variation. Vertical scale:
x-series (subject); horizontal scale: y-series (relative).]

In the above graph x-series has been taken as the subject and
y-series as the relative, because the standard deviation of the indices
of x-series is more than the standard deviation of the indices of y-series.
The subject series is on the vertical scale and the relative series on the
horizontal scale. The line which has been drawn passes through the
averages of the two series. The average of both the series is 100. The
number of points on either side of the line is almost equal and they are
also equidistant from the line of regression. To find out the ratio of
variation we have to find out the tangent of the angle between y and x.
For this, any point may be taken on the vertical scale and from it a line
should be drawn parallel to the base touching the line of regression
at some point. In the above graph this line is represented by BC. It
touches the line of regression at the point C. The distance of BC divided
by the distance of BA (A being the point where the line of regression
touches the vertical scale) gives the ratio of variation. In the above
graph BC = 40 and BA = 60. Therefore

Ratio of variation = BC/BA = 40/60 = .66

Now we can say that if the subject series changes by 1% the relative
series changes by .66%. The difference between the ratio of variation
and unity is called the Ratio of Regression. In this example the ratio of
regression = (1 - .66) or .34.
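The same arithmetic can be checked with a short Python sketch. The function name is an assumption made for illustration. Note that 40/60 is 0.666..., which the text cuts to .66, so the ratio of regression printed here is 0.3333 rather than .34.

```python
def ratio_of_variation(bc, ba):
    """Tangent of the angle BAC, read off the graph as BC / BA."""
    return bc / ba


ratio = ratio_of_variation(bc=40, ba=60)
ratio_of_regression = 1 - ratio  # difference between unity and the ratio of variation
print(round(ratio, 4), round(ratio_of_regression, 4))
```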
Interpretation of Galton's Graph
Galton's graph can be interpreted as follows :-
(1) If all the plotted points lie on a straight line it indicates that
there is perfect positive or negative correlation between the two series.
If the plotted points lie on a well-defined curve, it indicates that there is
perfect correlation but there is a time lag between the two series. In the
figure plotted above since the points do not lie either on a straight line
or on a well-defined curve, correlation between x and y is not perfect.
(2) If the slope of the line is left downwards it indicates positive
correlation. In the above figure since the line of regression (or the line
of the best fit) slopes left downwards correlation between x and y series
is positive.
(3) If the line slopes right downwards, it indicates negative
correlation between x and y series.
(4) If the line of the best fit forms an angle of 45 degrees with both
the vertical and the horizontal scale it indicates that both the series change
in the same proportion and the ratio of variation is unity. In the above
figure the line of the best fit does not make an angle of 45 degrees with
the base and as such it indicates that the two series change in unequal
proportions. The Line of Equal Proportional Variation (with an angle
of 45 degrees) is also shown in the above figure.
(5) If the line of the best fit makes an angle of less or more than
45 degrees it indicates that the proportionate variations in the two
series are unequal. The line of the best fit is then called the Line of
Regression. The wider the divergence between the line of regression
and the line of equal proportional variation, the lesser is the correlation
between the two series.
(6) To find out the ratio of variation draw at any point a
horizontal line parallel to the base, touching the line of regression at some
point. In the above figure this line is represented by BC. The distance
from the point of intersection of this line and the line of regression
(C) to the ordinate axis from where the line begins (B), divided by the
distance between B and the point at which the line of regression touches
the vertical scale (A), gives the ratio of variation. In other words, the
ratio of variation is equal to the tangent of the angle BAC or is equal to
BC/BA.

Usefulness of the study of Regression

The study of regression is very useful in various types of analysis.
By its study we are able to obtain the most probable values of one series
for given values of the other related series. Thus, if we know that two
series relating to supply and price are correlated we can find out what
would be the effect on price if the supply of the commodity was increased
or decreased to a particular level. Similarly, with the help of a regression
equation it is possible to find out the increase in cost of living followed
by an increase in the general price level. Such studies are very helpful in
economic analysis. The utility of the study of regression is very great in
physical sciences where the data are generally in functional relationship.
There it is always possible to calculate exactly the value of one variable
for a given value of the other variable by studying their regression. In
economic and social data exact calculations are not possible and we
cannot generalize on the basis of sample studies. A study of regression
and ratio of variation is thus very useful from various points of view.
Ratio of variation should preferably be studied graphically. Study
of regression can be done either graphically by regression lines or
mathematically by regression coefficients and regression equations.
Ratio of variation is studied from the indices of the data whereas
regression can be studied from the original data.
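As a rough illustration of how a regression equation yields the most probable value of one series for a given value of the other, the sketch below evaluates the regression equation of y on x in its usual form, y = mean of y + r (sd of y / sd of x)(x - mean of x). All the figures are hypothetical and chosen only to keep the arithmetic simple; the function name is likewise an assumption.

```python
def estimate_y(x, mean_x, mean_y, sd_x, sd_y, r):
    """Most probable y for a given x from the regression equation of y on x."""
    b_yx = r * sd_y / sd_x  # regression coefficient of y on x
    return mean_y + b_yx * (x - mean_x)


# Hypothetical figures, not taken from the text.
estimate = estimate_y(x=54, mean_x=50, mean_y=60, sd_x=5, sd_y=10, r=0.5)
print(estimate)
```

When r is zero the estimate falls back to the mean of y, which is the sense in which a weaker correlation makes the regression equation less informative.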
Questions
1. Define regression. Why are there two regression lines when the coefficient
of correlation is not unity?
2. Explain the concept of regression and ratio of variation and state their
utility in the field of economic enquiries. (M. A., Punjab, 195[?]).
3. Explain with illustration or otherwise the meaning of the term regression
equations. Prove that r is the geometric mean between regression coefficients of y
on x and that of x on y. (I.C.S., 1948).
4. Write short notes on
(a) Ratio of Variation,
(b) Regression coefficients,
(c) Line of equal proportional variation.
5. How would you study regression graphically? Discuss the rules for
drawing Galton's graph. How is such a graph interpreted?
6. Plot a Galton's graph from the following table and show the ratio of
variation between bank clearings and immigrants for eight years :-

Year        Immigrants             Bank clearings
            (Tens of thousands)    (Millions of £)
  1              79                     49
  2              62                     40
  3              33                     25
  4              55                     31
  5              46                    [?]
  6              62                     34
  7              31                     34
  8              34                     28
Average          49                     35
7. Calculate Karl Pearson's coefficient of correlation and the regression equations
from the following data :-

Age of Husband     Age of Wife
     18                17
     19                17
     20                18
     21                18
     22                18
     23               [?]
     24               [?]
     25                20
     26                21
     27                22
(Allahabad, M. Com., 1951)
8. Write down the two regression equations that may be associated with the
following pairs of values :
(x) 152 114 138 154 144 153 141 117 136 154
(y) 193 300 414 594 676 549 483 481 659
(I. A. S., 1951).
9. The following marks have been obtained by a class of students in statistics
(out of 100).
Paper I   80, 45, 55, [?]6, 58, 60, 6[?], 68, 70, 75, 8[?].
Paper II  82, [?]6, 50, 48, 60, 62, 64, 6[?], 70, 74, 90.
Compute the coefficient of correlation for the above data. Find the lines of
regression and examine the relationship. (Indian Audit and Accts.
Examination, 1945).
10. Vital Statistics of the U. P. (in thousands)

Year    Fever    Respiratory    Dysentery and    Others    Total
                 diseases       diarrhoea
1931    1025        37              10             228      1300
1932     853        34              13             176      1076
1933     698        35              12             160       905
1934     970        47              18             260      1295

Find out 'r' of the deaths from the fevers and total deaths given above.
Calculate standard error of this coefficient and the line of regression of the deaths
from fevers on total deaths. (M. A., Agra, 1937).

n. Given the f'~llowing values of' Arithmetic Mean, Standard Deviatlon and
~dcnt of Correlation of 240 sets of values; find the regression equation of:J< in
(tems of ,:<., and.:l. •

:<.=4,91 O'J= 1.10 "18= -0.40


No=S94 O's=8$ ".,=-0.S6 (M. A., MaiD., Pltfljah. 1946).
12. The heights of fathers and sons are given in the following table :-
Height of father in inches  65 66 67 67 68 69 71 7[?]
Height of son in inches     67 68 64 68 72 7[?] 69 70
Form the two lines of regression and calculate the expected average height of
the son when the height of the father is 67.5 inches.
13. Find the most likely price in Bombay corresponding to the price of Rs. 70
at Calcutta from the following data :-
Average price at Calcutta        65
              at Bombay          67
Standard deviation at Calcutta   2.5
                   at Bombay     5.5
Coefficient of correlation is +.8 between the two prices of the commodity in the
two towns.
(M. Com., Agra, 1951).
14. The following statistical coefficients were deduced in the course of an
examination of the relationship between yield of wheat and the amount of rainfall :-

                       Yield in lbs.     Annual rainfall
                       per acre          in inches
Mean                     [?]               12.8
Standard Deviation       70.1               1.6
r between yield and rainfall      +0.52

From the above data, calculate (a) the most likely yield of wheat per acre when
the annual rainfall is 9.[?] inches, and (b) the probable annual rainfall for yield of
1400 lb. per acre.
(M. A., Agra, 19[?]8).
15. The following data are given for marks in English and Maths. in the
S. L. C. Exam. of the U. P. in a certain year :-
Mean marks in English                  = 39.[?]
Mean marks in Maths.                   = 47.6
Standard deviation of marks in English = 10.8
Standard deviation of marks in Maths.  = 16.9
r between marks in English and Maths.  = +0.42
Form the two lines of regression and explain why there are two equations of
regression. Calculate the expected average marks in Maths. of candidates who received
[?]0 marks in English. (U. P. C. S., 1941).
16. In the following table are recorded data showing the test scores made by
salesmen on an intelligence test and their weekly sales :-
Salesman     2 3 4 5 6 7 8 9 10
Test Scores  70 50 60 80 50 90 40 60 60
Sales (000)  [?].5 6.0 4.5 [?].0 4.5 2.0 5.[?] 5.0 4.[?]
Calculate the regression line of sales on test score, and estimate the most probable
weekly sales volume if a salesman makes a score of 70. What will be the sampling error
of your estimate? (I. A. S., 1948).
17. Given

Find out : -
(a}.1 (b) ,
18. In a partially destroyed laboratory record of an analysis of correlation data
the following results only are legible :-
Variance of x = 9
Regression Equations :-
8x - 10y + 66 = 0
40x - 18y = 214

What were (a) the mean values of x and y, (b) the standard deviation of y and (c)
the coefficient of correlation between x and y? (I. A. S., 1947).
19. Explain what is meant by a scatter diagram and line of regression. Why
should there be, in general, two lines of regression for each bivariate distribution?
You are given the following results for the heights (x) and weights (y) of 1,000
policemen of U. P. :
mean of x = 68.00 in., mean of y = 150.00 lb., r = +0.60
σx = 2.50 in., σy = 20.00 lb.
Estimate from the above data (a) the height of a particular policeman whose
weight is 200 lb., (b) the weight of a particular policeman who is 5 ft. tall. (P. C. S., 19[?]).
20. (a) Explain the concepts of correlation and regression.
(b) How is a regression equation, such as that of the timber volume of a tree on
its height and girth measurements, useful in predicting the total timber volume of
standing trees in a plantation?
(c) The correlation between marriage rate and the value of industrial exports
over a number of years is of the order of 0.95 for a certain country. Recalling the
purpose for which the correlation coefficient was introduced, what conclusions can you
draw about the association between marriage rate and industrial production?
(I. A. S., 1957).
21. Suppose a school class has an examination at the beginning and at the end
of the school year. What is meant by "regression of final grades on beginning grades"
and by "regression of beginning grades on final grades"? Which of these would be
more useful in practice? Might these regressions coincide?
22. Is the regression in the population always a straight line? If not, give an
example of a population where it is not.
23. Explain the difference between "regression" and "correlation" problems.
Can a correlation problem also be a regression problem? Can a regression problem
also be a correlation problem?
24. Label the following examples as regression or correlation type problems.
It is desired to study:
(a) The connection between I. Q. and weight of 15-year-old girls.
(b) The connection between the velocity of the Ganges River and its depth at
various points.
(c) The connection between the amount of winter snow and the barley yield
for some locality.
(d) The connection between tensile strength and hardness of aluminium.
(e) The connection between the size of brains and success in life.
(f) The effect of an anti-histamine drug upon length of time it takes to recover
from colds.
(g) The effect upon the size of fish of dumping waste material from mills into
a river.
(h) The connection between city size and amount of crime.
25. Measurements were taken of the ability of rats to run a maze before (x)
and after (y) a stimulus. In a sample of 300 rats it was found that :-
average (x)            = 16.0
        (y)            = 1[?].8
Standard deviation (x) = 4.
                   (y) = 3.7
r                      = .4
(a) Draw the estimated regression line of y on x.
(b) Estimate the mean y for x = 17.0.
(c) If a rat has x = 17, estimate by a 90 per cent interval his y value.
26. From the following data ascertain the ratio of variation between sales and
profits :-

~l
Year: 195 2 B 54 55 56 57 58 59 60 61 61.
Sales 72 84 66 60 48 42 54 63 51 57 69
(Lakhs Rs.)
Profits
(Lakhs Rs.) 42 P 48 46 30 2:1! 36 3 8 34 42 44 40
27. Plot a Galton graph from the following table and show the Ratio of
Variation.
Year
Subject Series
Relative Series
, :toI
I 160
I 2
30
170
f 503
180
I 4\5\1:>\71
100
190
100 12.0
ZOO
'150
Z10 2.2.0
8
160
:t~o
1]0
2.40
9

28. Two lines of regression are given: x + 2y - 5 = 0 and 2x + 3y - 8 = 0;
and σx² = 12. Calculate the mean values of x and y, σy² and r. (Raj., M. Com., 1965).
29. The ages of husband and wife in a community were found to have a
correlation co-efficient .8; the average of husband's age was 25 years and that of wife's
age 22 years; their standard deviations were 4 and 5 respectively. You are required
to draw two lines of regression and to measure (i) the expected age of husband when
wife's age is [?] years and (ii) the expected age of wife when husband's age is 33 years.
30. Show that the co-efficient of correlation is the geometric mean between the
two regression co-efficients. (M. A. Eco., Delhi, 1950).
31. Find the most likely price in Bombay corresponding to the price of Rs. 70
at Calcutta from the following data :
                        Calcutta
Average Price :           6[?]
Standard Deviation :      2.[?]
Co-efficient of correlation is .8 between the two prices of the commodity in two
towns. (M. A. Eco., Jiwaji, Gwalior, 1965).
32. Find the co-efficient of correlation when bxy (co-efficient of regression of
x on y) = .84 and byx (co-efficient of regression of y on x) = .4.
(Rajasthan, M. Com., 1965).
33. Find out σy and r from the following information :-
3x = y, 8y = 6x and σx = 4.
34. Find out the following :-
(i) Co-efficient of correlation,
(ii) The two regression equations,
(iii) Most likely value of x when y is 34,
(iv) Most likely value of y when x is 47,
(v) The regression co-efficients.
Subject Series  : 48 50 53 49 53 49
Relative Series : 36 32 33 38 35 30
35. Explain the circumstances when co-efficient of correlation would be more
than unity.
36. The two regression lines between height (x) in inches and weight (y) in
lbs. of male students are :-
4y - 15x + 530 = 0
20x - 3y - 975 = 0
Find the mean height and weight of the group and r. Also estimate the weight
when the boy has height 80 inches and height when weight is 167.5 lbs.
37. For 30 students of a class the regression equation of marks in History (x)
on the marks in Geography (y) is : 3y - 5x + 100 = 0.
The mean marks in Geography is 40 and the variance of marks in History is
4/9 of the variance of marks in Geography. Find the mean marks in History and the
co-efficient of correlation between marks in the two subjects.
38. Comment on the following :-
The co-efficient of correlation between x and y and x and z is the same. Hence
the rate of increase of x with respect to y is the same.
39. The values obtained in measurement of characters x and y on each of 35
individuals led to the following table :-
Mid-point of x : 10 15 20 25
Frequencies    : 4 6 8 10 7
Average of y   : 3 4 6 8 9
Obtain the linear regression equation of y on x. Can you determine from
this data the regression equation of x on y? If not, why?
18. Theory of Attributes and Consistence of Data
Meaning
It has been said in an earlier chapter that statistics deal with
quantitative data alone. Quantitative data may arise in any of the following
two ways :-
(a) In the first place an investigator may measure the actual
magnitude of some variable, such as height or weight of a group of individuals,
their income or expenditure, marks obtained by a group of students or
the number of labourers getting a particular amount as wage etc. In
all these cases the data are such that a fairly accurate quantitative
measurement is possible. In all the previous chapters we have discussed
various statistical methods applicable to such type of data which are
known as statistics of variables. Measures of central tendency, measures
of dispersion and skewness and correlation are some of the important
statistical methods used in the analysis of such data.
(b) In the second place data might be such that it may not be
possible for an investigator to measure their magnitude. In such
cases the observer can only study the presence or absence of a
particular quality in a group of individuals. Examples of such phenomena
are blindness, insanity, deaf-mutism, sickness, honesty, extravagance
etc. In such cases an observer cannot measure the magnitude of the
data; for example, he cannot measure the extent of blindness or honesty
in quantitative form. All he can do is to count the number of persons
who are blind or who are honest. He has to take this decision on the
basis of some standard definition of the term in question. Such data
in which the quantitative measurement of the magnitude is not possible
and in which only the presence or absence of an attribute can be studied
are called statistics of attributes.
In the present and the succeeding chapters we shall be discussing
some general aspects of the theory applicable to statistics of attributes.
It should be noted that the methods of statistical analysis applicable
to statistics of variables can be used to a certain extent in the analysis
of statistics of attributes also. For example, the presence or absence
of an attribute may be treated as changes in the values of a variable (which
has only two values).
Classification of data
In the analysis of statistics relating to attributes, the first thing
is the classification of data. Here data are classified on the basis of
presence or absence of particular attributes. If only one attribute, say
blindness, is being studied the population would be divided in two
classes, one consisting of those people in whom this attribute is present
and the other consisting of those in whom this attribute is not present.
Thus one class would be of the "blinds" and the other of "not-blinds."
If more than one attribute is taken into account the number of classes
would be more than two. If, for example, the attribute of deafness is
also studied, there would be a number of classes in which the universe
would be divided. There would be "blinds", "not-blinds", "deafs",
"not-deafs", "blind and not-deafs", "blind and deafs", "not-blinds and
deafs" and "not-blinds and not-deafs."
Classification is arbitrary and vague. It should be clearly understood
that when the universe is divided in, say, two classes, "blinds" and
"not-blinds", there is no clear-cut line of demarcation between them. It is
very difficult to lay down such a definition of the word blindness which
may give two clear-cut classes. In practice, the attribute of blindness
gradually transforms into attribute of sight and there are many cases
on the border line. The boundary in such cases is very vague and
uncertain. In some cases the line of demarcation may be arbitrary.
For example, when people are classified as tall and short, an arbitrary
figure is the dividing line. People above a certain height (decided
arbitrarily) are called tall and below it short. In all types of analysis
relating to statistics this point should always be kept in mind.
Classification by dichotomy. If only one attribute is being studied
the universe is divided in two parts, one in which the attribute is
present and the other in which it is not present. These classes are
mutually exclusive. Such a classification where the universe is divided
in two parts is called "Classification by Dichotomy." In actual analysis
usually there are more than two classes in which the universe is divided
and such classification is called manifold classification.
Notation and terminology. For the sake of convenience in analysis it
is necessary to use certain symbols to represent different classes and their
frequencies. Usually capital letters A, B and C etc., are used to denote
the presence of attributes and the Greek letters α, β and γ etc., are used
to denote absence of these attributes respectively. Thus, if A represents
the attribute of blindness α would represent absence of blindness, if B
represents deafness β would represent absence of deafness and if C
represents insanity γ would represent absence of insanity. The
units possessing a particular attribute represented by A would be
termed as belonging to Class A, and similarly those in whom this
attribute is absent would be termed as belonging to Class α.
If two attributes are being studied their combination can be
represented by the combination of the letters representing the two
attributes. Thus, if blindness is represented by A and deafness by B then
AB would represent blindness and deafness; Aβ would represent
blindness and absence of deafness; αB would represent absence of
blindness and presence of deafness; and αβ would represent absence
of blindness and absence of deafness.
The number of units in different classes are called "class frequencies."
Thus, if the number of blind and deaf people is 20 the frequency of class
AB is 20. Class frequencies are denoted by enclosing the class symbols
in brackets. Thus (AB) would represent the frequency of the class
AB.
If there is one attribute represented by A the total number of classes
is 3 (if the total or N is also taken as a class); they would be A, α and N.
If the number of attributes is two, represented by A and B, the total
number of classes (including N) would be 9. They would be N, A, B, α, β,
AB, Aβ, αB and αβ.
If the number of attributes is three the total number of classes
(including N) would be 27. The total number of classes is always
equal to 3ⁿ where n stands for the number of attributes. Thus, when
there are three attributes the total number of classes would be 3³ or
27; if the number of attributes is 4, the total number of classes would be
3⁴ or 81.
The attributes denoted by A, B and C, etc., are called positive
attributes and those represented by α, β, γ, etc., are called negative
attributes. Thus A, AB, ABC are all positive classes and α, β, γ are negative
classes. Aβ, αB, AβC are pairs of contrary classes.
If there are two attributes A and B then the classes AB, Aβ, αB
and αβ would be called "classes of second order"; and since there are no
further classes of higher orders they would be also called the "classes of
ultimate order." The classes A, α, B and β would be called "classes of the
first order," and N the "class of zero order." If there are three attributes,
the classes ABC, AβC, etc., would be classes of the third order or "classes
of the ultimate order", and AB, AC, BC, Aβ, βC, etc., would be called
classes of the second order. The frequencies in these classes are called
first order, second order and third order frequencies. The total number
of classes of the ultimate order is always equal to 2ⁿ where n represents
the number of attributes. Thus, if there are two attributes the total
number of ultimate order classes would be 4 and if there are three
attributes the number of ultimate order classes would be 8.
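The counts 3ⁿ and 2ⁿ can be verified by listing the classes mechanically. The sketch below is one possible way of doing so in Python; the names `NEGATIVE`, `all_classes` and `order` are assumptions introduced for illustration. Each attribute is treated as being either unspecified, present or absent in a class symbol, which is where the three choices per attribute come from.

```python
from itertools import product

# Symbols for the absence of each attribute, following the notation of the text.
NEGATIVE = {"A": "α", "B": "β", "C": "γ"}


def all_classes(attributes):
    """List every class symbol: each attribute is unspecified, present or absent."""
    classes = []
    states_per_attribute = ("unspecified", "present", "absent")
    for states in product(states_per_attribute, repeat=len(attributes)):
        symbol = ""
        for name, state in zip(attributes, states):
            if state == "present":
                symbol += name
            elif state == "absent":
                symbol += NEGATIVE[name]
        classes.append(symbol or "N")  # nothing specified means the whole universe
    return classes


def order(symbol):
    """Order of a class: the number of attributes specified in it (N has order 0)."""
    return 0 if symbol == "N" else len(symbol)


two = all_classes("AB")
three = all_classes("ABC")
print(len(two), len(three))
print([s for s in three if order(s) == 3])  # classes of the ultimate order
```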
In classifying statistical data according to attributes the following
simple rule should be kept in mind. Any class frequency can always be
expressed in terms of class frequencies of higher order. Thus the frequencies
of first order can be expressed in terms of the frequencies of the second
order which in turn can be expressed in terms of frequencies of the third
order and so on. On the basis of this rule we can set up various types
of relationships between the frequencies of different orders. If there
is one attribute only, represented by A, the frequency of the
universe or N can be divided in two classes (A) and (α). Thus
N = (A) + (α)
Now, if one more attribute B is taken into account the first order
classes, i.e., A and α can each be divided into two classes, one in which
attribute B is present and the other in which it is not present.
Thus
(A) = (AB) + (Aβ)
(α) = (αB) + (αβ)
Similarly
(B) = (AB) + (αB)
(β) = (Aβ) + (αβ)
If there is a third attribute C also, then each of the above second
order class frequencies can be divided into two classes, one in which C is
present and the other in which C is not present. Thus
(AB) = (ABC) + (ABγ)
(Aβ) = (AβC) + (Aβγ)
(αB) = (αBC) + (αBγ)
(αβ) = (αβC) + (αβγ)
From the above, it is clear that if higher order frequencies are given
we can find out the values of the lower order frequencies without much
difficulty.
Thus
(ABC) + (ABγ) = (AB)
(ABC) + (AβC) = (AC)
(ABC) + (αBC) = (BC)
(AβC) + (Aβγ) = (Aβ)
The frequencies of the other classes can be similarly calculated.
From the above we can easily conclude that
(A) = (AB) + (Aβ)
    = (ABC) + (ABγ) + (AβC) + (Aβγ)
The following examples would clarify the above rules.
Example 1. Given the following ultimate class frequencies, find
the frequencies of the positive and negative classes and the total number
of observations :-
(AB) = 250        (Aβ) = 120
(αB) = 200        (αβ) = 70
Solution
N = (A) + (α)
  = (AB) + (Aβ) + (αB) + (αβ)
  = 250 + 120 + 200 + 70
  = 640
(A) = (AB) + (Aβ)
    = 250 + 120
    = 370
(B) = (AB) + (αB)
    = 250 + 200
    = 450
(α) = (αB) + (αβ)
    = 200 + 70
    = 270
(β) = (Aβ) + (αβ)
    = 120 + 70
    = 190
When only two attributes are involved the class frequencies can
be very easily found out by a table of the following type. In this table
the given data can be filled in the relevant cells and frequencies of the
blank cells can be easily found out.

              A              α
B         (AB) 250       (αB) 200       (B) 450
β         (Aβ) 120       (αβ)  70       (β) 190
          (A)  370       (α)  270       (N) 640

From the above table the values of the positive and negative classes
and the total number of observations can be very easily found. Thus
the value of (B) would be the sum of 250 and 200 or 450; the value of (A)
would be the sum of 250 and 120 or 370; the value of (α) would be the
sum of 200 and 70 or 270 and the value of (β) would be the sum of 120
and 70 or 190. The value of N would be the sum of the values of (B)
and (β) or of (A) and (α). Thus the value of N would be 640.
From the above table it is also possible to find out the value of
(Aβ) if the values of (A) and (AB) are given. It would be equal to
(A) - (AB). Similarly (αβ) is equal to (α) - (αB) which in turn would be equal
to N - (A) - {(B) - (AB)} or it would be equal to N - (A) - (B) + (AB).
Similarly other calculations can also be made.
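The filling-in of the blank cells can be written as a small Python function. This is a minimal sketch under the assumption that N, (A), (B) and (AB) are the given figures; the function name and the dictionary keys are illustrative only. It is run here on the totals of Example 1, so the results can be compared with the table above.

```python
def two_attribute_table(n, a, b, ab):
    """Fill the remaining cells of the nine-square table from N, (A), (B), (AB)."""
    return {
        "AB": ab,
        "Aβ": a - ab,          # (Aβ) = (A) - (AB)
        "αB": b - ab,          # (αB) = (B) - (AB)
        "αβ": n - a - b + ab,  # (αβ) = N - (A) - (B) + (AB)
        "α": n - a,            # (α) = N - (A)
        "β": n - b,            # (β) = N - (B)
    }


table = two_attribute_table(n=640, a=370, b=450, ab=250)
print(table)
```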
Example 2. Given the following table, calculate the frequencies
of the remaining classes :-
(A) = 50          (AB) = 30
(B) = 40          (N) = 100
Solution
Here we have to find out the frequencies of the following
classes :-
(α); (β); (Aβ); (αB); (αβ).
Now
(α)  = N - (A)    = 100 - 50 = 50
(β)  = N - (B)    = 100 - 40 = 60
(Aβ) = (A) - (AB) = 50 - 30  = 20
(αB) = (B) - (AB) = 40 - 30  = 10
(αβ) = (α) - (αB)
     = N - (A) - (B) + (AB)
     = 100 - 50 - 40 + 30
     = 40
In a nine square table the given data would be written as follows :-

              A              α
B         (AB) 30        (αB)           (B) 40
β         (Aβ)           (αβ)           (β)
          (A)  50        (α)            N  100

If the blank cells are filled up the remaining class frequencies
would be very easily obtained. In the above two examples we had
taken only two attributes into account. If there are three attributes
the total number of class frequencies would be 3ⁿ or 27, out of which
frequencies of the ultimate class would be 2ⁿ or 8. There would be
12 frequencies of the second order, 6 of the first order and 1 of 0 order.
If the 8 ultimate class frequencies are given, we can easily calculate the
12 second order frequencies, 6 first order frequencies and the total
number of observations N.
Example 3. A number of labourers in a factory were examined
for the presence or absence of certain defects of which three chief
descriptions were noted :-
A - Physical weakness
B - Nerve signs
C - Mental dullness
Given the following ultimate frequencies, find the frequencies
of the positive classes including the whole number of observations
N :-
(ABC) = 75        (αBC) = 98
(ABγ) = 310       (αBγ) = 702
(AβC) = 106       (αβC) = 74
(Aβγ) = 489       (αβγ) = 8415
Solution
The positive classes including the whole number of observations
N, are
N        (AB)
(A)      (AC)
(B)      (BC)
(C)      (ABC)
Now,
N = (ABC) + (ABγ) + (AβC) + (Aβγ) + (αBC) + (αBγ) + (αβC) + (αβγ)
  = 75 + 310 + 106 + 489 + 98 + 702 + 74 + 8415
  = 10269
(A) = (ABC) + (ABγ) + (AβC) + (Aβγ)
    = 75 + 310 + 106 + 489
    = 980
(B) = (ABC) + (ABγ) + (αBC) + (αBγ)
    = 75 + 310 + 98 + 702
    = 1185
(C) = (ABC) + (αBC) + (AβC) + (αβC)
    = 75 + 98 + 106 + 74
    = 353
(AB) = (ABC) + (ABγ)
     = 75 + 310
     = 385
(AC) = (ABC) + (AβC)
     = 75 + 106
     = 181
(BC) = (ABC) + (αBC)
     = 75 + 98
     = 173
(ABC) = 75 (Given)
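The additions of Example 3 all follow one pattern: a class frequency of lower order is the sum of those ultimate frequencies that agree with it on every attribute it specifies. A short Python sketch of that pattern follows; the dictionary layout and the function name `frequency` are assumptions made for illustration, while the figures are those of the example.

```python
# Ultimate class frequencies of Example 3, keyed by
# (A present?, B present?, C present?).
ultimate = {
    (True, True, True): 75,        # (ABC)
    (True, True, False): 310,      # (ABγ)
    (True, False, True): 106,      # (AβC)
    (True, False, False): 489,     # (Aβγ)
    (False, True, True): 98,       # (αBC)
    (False, True, False): 702,     # (αBγ)
    (False, False, True): 74,      # (αβC)
    (False, False, False): 8415,   # (αβγ)
}


def frequency(ultimate, a=None, b=None, c=None):
    """Sum the ultimate classes that agree with the requested states.

    None means the attribute is left unspecified in the class symbol.
    """
    wanted = (a, b, c)
    return sum(
        count
        for states, count in ultimate.items()
        if all(w is None or w == s for w, s in zip(wanted, states))
    )


print(frequency(ultimate))                  # N
print(frequency(ultimate, a=True))          # (A)
print(frequency(ultimate, a=True, b=True))  # (AB)
```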
THEORY OF ATTRIBUTES AND CONSISTENCE OF DATA 535

In the above example we found out the positive class frequencies


from the frf quencies of the ultimate classes. It is also possible to find
out the ultimate class frequencies if the frequencies of a lower order
and one of the frequencies of the ultimate class are known. The follow-
ing example is one of this type : -
Example 4. Given the following frequencies of the positive
classes, find out all the class frequencies :-
N   = 12000     (AB)  = 453
(A) = 977       (AC)  = 284
(B) = 1185      (BC)  = 250
(C) = 596       (ABC) = 127
Solution
The classes for which frequencies have to be found out are :-
(α), (β), (γ), (Aβ), (αB), (αβ), (Aγ), (αC), (αγ), (Bγ), (βC), (βγ), (ABγ),
(AβC), (αBC), (Aβγ), (αBγ), (αβC), (αβγ)
Now
(α) = N − (A) = 12000 − 977
    = 11023
(β) = N − (B) = 12000 − 1185
    = 10815
(γ) = N − (C) = 12000 − 596
    = 11404
(ABγ) = (AB) − (ABC) = 453 − 127
      = 326
(AβC) = (AC) − (ABC) = 284 − 127
      = 157
(αBC) = (BC) − (ABC) = 250 − 127
      = 123
(Aβγ) = (Aβ) − (AβC)
      = (A) − (AB) − (AβC)
      = 977 − 453 − 157
      = 367
(αBγ) = (αB) − (αBC)
      = (B) − (AB) − (αBC)
      = 1185 − 453 − 123
      = 609
(αβC) = (βC) − (AβC)
      = (C) − (BC) − (AβC)
      = 596 − 250 − 157
      = 189

(Aβ) = (AβC) + (Aβγ) = 157 + 367
     = 524
(αB) = (αBC) + (αBγ) = 123 + 609
     = 732
(Aγ) = (ABγ) + (Aβγ) = 326 + 367
     = 693
(αC) = (αBC) + (αβC) = 123 + 189
     = 312
(Bγ) = (ABγ) + (αBγ) = 326 + 609
     = 935
(βC) = (AβC) + (αβC) = 157 + 189
     = 346
(αβγ) = (αβ) − (αβC)
      = (β) − (Aβ) − (αβC)
      = 10815 − 524 − 189
      = 10102
(αβ) = (αβC) + (αβγ) = 189 + 10102
     = 10291
(αγ) = (αBγ) + (αβγ) = 609 + 10102
     = 10711
(βγ) = (Aβγ) + (αβγ) = 367 + 10102
     = 10469
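The subtractions of Example 4 all come down to eight formulas giving the ultimate classes in terms of the positive classes. The Python sketch below is an illustration only, under the same assumed encoding as before (capital letter for presence, small letter for absence). The function name ultimate_from_positive is hypothetical.

```python
def ultimate_from_positive(n, a, b, c, ab, ac, bc, abc):
    """Return the eight ultimate class frequencies for three attributes.

    Keys use capitals for presence and small letters for absence,
    so "Abc" stands for (Aβγ).
    """
    return {
        "ABC": abc,
        "ABc": ab - abc,
        "AbC": ac - abc,
        "aBC": bc - abc,
        "Abc": a - ab - ac + abc,
        "aBc": b - ab - bc + abc,
        "abC": c - ac - bc + abc,
        "abc": n - a - b - c + ab + ac + bc - abc,
    }


# Figures of Example 4.
classes = ultimate_from_positive(12000, 977, 1185, 596, 453, 284, 250, 127)
for key, freq in classes.items():
    print(key, freq)

# The eight ultimate classes must add up to N.
print(sum(classes.values()))
```

Every class of the second order is then the sum of two ultimate classes, for instance (Aβ) = (AβC) + (Aβγ).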
Consistence of data
Meaning. It is obvious that in statistics of attributes when
frequencies of various classes are counted, no class frequency can be
negative. If any class frequency is negative the data are said to be
inconsistent. Such inconsistency may be due to wrong counting or
inaccurate additions or subtractions or may be the result of misprints.
In order to test whether a set of figures is consistent, various class
frequencies should be found and if no class frequency is negative
apparently the data are consistent. It should be remembered that
consistence of data is no proof of accurate counting or of correct
calculations, though the inconsistence of data proves that there is either
a mistake or a misprint in the figures. The easiest way to find out whether
the data are consistent is to obtain the ultimate class frequencies because
if there is any inconsistence, one or more ultimate class frequencies
would be negative. It is also possible to lay down rules for testing
the consistence of data. For example, the value of (A) cannot be
more than N otherwise (α) would have a negative value. Similarly
(A) cannot be less than 0 otherwise (A) itself would have a negative
value.
Thus
If there is only one attribute represented by A
(1) (A) ≮ 0 otherwise (A) will be negative.
(2) (A) ≯ N otherwise (α) will be negative since
N = (A) + (α)
If there are two attributes represented by A and B
(3) (AB) ≮ 0 otherwise (AB) will be negative.
(4) (AB) ≮ (A) + (B) − N otherwise (αβ) will be negative*
(5) (AB) ≯ (A) otherwise (Aβ) will be negative since
(A) = (AB) + (Aβ)
(6) (AB) ≯ (B) otherwise (αB) will be negative since
(B) = (AB) + (αB)
If there are three attributes represented by A, B and C
(7) (ABC) ≮ 0 otherwise (ABC) will be negative.
(8) (ABC) ≮ (AB) + (AC) − (A) otherwise (Aβγ) will be negative**
(9) (ABC) ≮ (AB) + (BC) − (B) otherwise (αBγ) will be negative.
(10) (ABC) ≮ (AC) + (BC) − (C) otherwise (αβC) will be negative.
(11) (ABC) ≯ (AB) otherwise (ABγ) will be negative
since (AB) = (ABC) + (ABγ)
(12) (ABC) ≯ (BC) otherwise (αBC) will be negative
since (BC) = (ABC) + (αBC)
(13) (ABC) ≯ (AC) otherwise (AβC) will be negative
since (AC) = (ABC) + (AβC)
(14) (ABC) ≯ (AB) + (BC) + (AC) − (A) − (B) − (C) + N otherwise
(αβγ) will be negative***

*Proof
(αβ) = (α) − (αB)
     = N − (A) − {(B) − (AB)}
     = N − (A) − (B) + (AB)
−(AB) = N − (A) − (B) − (αβ)
(AB) = −N + (A) + (B) + (αβ) = (A) + (B) − N + (αβ)
Now if (AB) is less than (A) + (B) − N it is obvious that (αβ)
would be negative.
**Proof
(Aβγ) = (Aβ) − (AβC)
      = (A) − (AB) − (AC) + (ABC)
−(ABC) = (A) − (AB) − (AC) − (Aβγ)
(ABC) = (AB) + (AC) − (A) + (Aβγ)
Now if (ABC) is less than (AB) + (AC) − (A) it is clear that (Aβγ)
would be negative.
Similarly other rules can be proved.
***Proof
(αβγ) = (αβ) − (αβC)
      = (α) − (αB) − (βC) + (AβC)
      = N − (A) − (B) + (AB) − (C) + (BC) + (AC) − (ABC)
(ABC) = (AB) + (AC) + (BC) − (A) − (B) − (C) + N − (αβγ)
Now if (ABC) is more than (AB) + (AC) + (BC) − (A) − (B) − (C) + N
it is clear that (αβγ) would be negative.

From the above rules relating to the consistence of data in case
of three attributes we can derive four new rules.
Thus
(ABC) ≮ 0
(ABC) ≯ (AB) + (AC) + (BC) − (A) − (B) − (C) + N
The upper limit of (ABC) cannot be less than the lower limit.
Therefore,
(AB) + (AC) + (BC) − (A) − (B) − (C) + N ≮ 0, or
(15) (AB) + (AC) + (BC) ≮ (A) + (B) + (C) − N
Similarly by combining the other sets of lower and upper limits
of (ABC) we can get
(16) (AB) + (AC) − (BC) ≯ (A)
(17) (AB) − (AC) + (BC) ≯ (B)
(18) −(AB) + (AC) + (BC) ≯ (C)
Incomplete data
The above rules relating to consistence of data are also used to
fill the gaps if the data are incomplete. With the help of these rules
it is possible to lay down the minimum and maximum limits of a
particular class frequency. Below are given some examples in which
data have been tested for consistence in accordance with the above
rules. In the last two examples the data are incomplete and with the
help of the above rules the limits within which the values of the missing
classes should vary are obtained :-
Example 5. To investigate the association between eye-colour of
husband and eye-colour of wife, the following data are available :-
Husbands with light eyes and wives with not-light eyes ... 414
Husbands with not-light eyes and wives with light eyes ... 260
Husbands with not-light eyes and wives with not-light eyes ... 238
Husbands with light eyes ... 400
Do you find any inconsistency in the above data?
Solution
Denoting
Husbands with light eyes by A and
Husbands with not-light eyes by α
Wives with light eyes by B
Wives with not-light eyes by β
the given data are
(Aβ) = 414   (αB) = 260   (αβ) = 238   (A) = 400
From the above,
(AB) = (A) − (Aβ) = 400 − 414
     = −14.

Thus AB has got a negative value. Hence the given data are
inconsistent as it is obvious that no class-frequency occurring by count-
ing real attributes can be negative.
Example 6. A labour welfare officer returns the following number
of workers observed with certain classes of defects amongst a number of
factory workers. A denotes development defects and B, low nutrition.
N = 600   (A) = 250   (αB) = 400   (Aβ) = 200
Do you find the data consistent ?
Solution
From the given data,
N = 600
(A) = 250
(AB) = (A) − (Aβ) = 250 − 200 = 50
(B) = (AB) + (αB) = 50 + 400 = 450
Now, according to the conditions of consistence
(AB) ≮ (A) + (B) − N
or (AB) ≮ 250 + 450 − 600
or (AB) ≮ 100
But the value of (AB) is 50 which is less than 100. Hence the given
data are inconsistent.
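Rules (3) to (6) together fix the least and the greatest value that (AB) can take. The following Python sketch is an illustration only, applied to the figures of Example 6. The function names ab_limits and is_consistent are hypothetical, and (A) and (B) are assumed to lie between 0 and N already.

```python
def ab_limits(a, b, n):
    """Return the least and greatest values (AB) can take, given (A), (B) and N."""
    lower = max(0, a + b - n)   # rules (3) and (4)
    upper = min(a, b)           # rules (5) and (6)
    return lower, upper


def is_consistent(a, b, ab, n):
    """True when (AB) lies within the limits allowed by (A), (B) and N."""
    lower, upper = ab_limits(a, b, n)
    return lower <= ab <= upper


# Example 6: N = 600, (A) = 250, (αB) = 400, (Aβ) = 200.
n, a = 600, 250
ab = a - 200        # (AB) = (A) − (Aβ)
b = ab + 400        # (B) = (AB) + (αB)

print(ab_limits(a, b, n))
print(is_consistent(a, b, ab, n))
```

Here the limits come out as 100 and 250, and the observed value of 50 falls outside them.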
Example 7. A market investigator returns the following data. Of
10000 people consulted
8110 liked chocolates,
7520 liked toffee,
4180 liked boiled sweets,
5700 liked chocolates and toffee,
3500 liked chocolates and boiled sweets,
3480 liked toffee and boiled sweets, and
2970 liked all the three.
Show that this information as it stands must be incorrect.
Solution
Denoting
persons who liked chocolates by A
persons who liked toffee by B
and persons who liked boiled sweets by C
the given data are :
(A) = 8110     (AC) = 3500
(B) = 7520     (BC) = 3480

(C) = 4180     (ABC) = 2970
(AB) = 5700    N = 10000
Now, according to the conditions of consistence,
(ABC) ≯ (AB) + (AC) + (BC) − (A) − (B) − (C) + N.
Substituting the given values, we get,
(ABC) ≯ 5700 + 3500 + 3480 − 8110 − 7520 − 4180 + 10000
or (ABC) ≯ 2870
So (ABC) can be either equal to or less than 2870. But as per the
data given it is 2970. Hence the data are incorrect.
Example 8. If in a village actually invaded by anthrax, 70 per
cent of the goats are attacked and 85 per cent have been inoculated with
vaccine, what is the lowest percentage of the inoculated that must have
been attacked ?
Solution
Denoting
the attribute of attack by A
and the attribute of inoculation by B,
the given data are
(A) = 70   (B) = 85   N = 100
We have to find the lowest percentage of (AB).
Now, according to the conditions of consistence,
(AB) ≮ 0, and
(AB) ≮ (A) + (B) − N, i.e., 70 + 85 − 100 = 55
Hence the value of (AB) cannot be less than 55.
Hence the lowest percentage of the inoculated that must have been
attacked is
(AB)/(B) × 100 = 55/85 × 100 = 65 per cent (approximately)
Example 9. Given that 50 per cent of the inmates of a college
hostel are girls, 60 per cent are between teens, 80 per cent intelligent,
35 per cent girls between teens, 45 per cent intelligent girls, and 42 per
cent between teens and intelligent, find the greatest and least possible
proportions of intelligent girls between teens.
Solution
Denoting the attributes 'girls', 'between teens' and 'intelligence'
by A, B and C, respectively, and letting N = 100, the given data are
(A) = 50     (AB) = 35
(B) = 60     (AC) = 45
(C) = 80     (BC) = 42
The greatest and the least proportions of (ABC) have to be found
out. According to one set of conditions of consistence:
(ABC) ≮ 0
(ABC) ≮ (AB) + (AC) − (A), or 35 + 45 − 50, or 30
(ABC) ≮ (AB) + (BC) − (B), or 35 + 42 − 60, or 17
(ABC) ≮ (AC) + (BC) − (C), or 45 + 42 − 80, or 7
So, the least possible value of (ABC) is 30. According to another
set of conditions of consistence
(ABC) ≯ (AB), i.e., 35
(ABC) ≯ (AC), i.e., 45
(ABC) ≯ (BC), i.e., 42
(ABC) ≯ (AB) + (AC) + (BC) − (A) − (B) − (C) + N
      ≯ 35 + 45 + 42 − 50 − 60 − 80 + 100
      ≯ 32
So, the greatest possible value of (ABC) is 32.
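Rules (7) to (14) can be applied in one step: the greatest of the four lower limits and the smallest of the four upper limits are the only ones that matter. The Python sketch below is an illustration only, using the figures of Examples 9 and 7. The function name abc_limits is hypothetical.

```python
def abc_limits(n, a, b, c, ab, ac, bc):
    """Return the least and greatest values (ABC) can take for the given data."""
    # Rules (7) to (10): (ABC) cannot fall below any of these.
    lower = max(0, ab + ac - a, ab + bc - b, ac + bc - c)
    # Rules (11) to (14): (ABC) cannot exceed any of these.
    upper = min(ab, ac, bc, ab + ac + bc - a - b - c + n)
    return lower, upper


# Example 9: percentages with N = 100.
print(abc_limits(100, 50, 60, 80, 35, 45, 42))

# Example 7: the reported (ABC) of 2970 lies above the upper limit.
lower, upper = abc_limits(10000, 8110, 7520, 4180, 5700, 3500, 3480)
print(lower, upper, lower <= 2970 <= upper)
```

If the lower limit ever exceeds the upper limit, the first and second order frequencies are themselves inconsistent, whatever value is reported for (ABC).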
One more general rule that can be laid down is that
for n attributes A, B, C, D, ..., M
(ABCD...M) ≮ [(A) + (B) + (C) + (D) + ... + (M)] − (n−1)N*
where N is the total frequency. The following example illustrates
this rule :-
Example 10. In a very hotly fought battle 70% at least of the
combatants lost an eye, 75% at least an ear, 80% at least a leg, 85%
at least an arm.
What percentage, at least, lost all four ?
Solution
Denoting the combatants who lost an eye by A, those who lost
an ear by B, those who lost a leg by C, those who lost an arm by D

*Proof
(AB) ≮ (A) + (B) − N (proof of it has already been given.)
Applying the above inequality to the universe of C, we get,
(ABC) ≮ (AC) + (BC) − (C)
      ≮ (A) + (C) − N + (B) + (C) − N − (C)
      ≮ (A) + (B) + (C) − 2N
Applying the above inequality to the universe of D, we get,
(ABCD) ≮ (AD) + (BD) + (CD) − 2(D)
       ≮ (A) + (D) − N + (B) + (D) − N + (C) + (D) − N − 2(D)
       ≮ (A) + (B) + (C) + (D) − 3N
∴ (ABCD...M) ≮ [(A) + (B) + (C) + (D) + ... + (M)] − (n−1)N

and the total number of combatants by N, and further assuming that
N = 100
(A) = 70   (B) = 75   (C) = 80   (D) = 85
We have to find out the least value of (ABCD).
(ABCD) ≮ [(A) + (B) + (C) + (D)] − (n−1)N
where n represents the number of attributes,
or
(ABCD) ≮ [70 + 75 + 80 + 85] − (4−1)100
       ≮ 310 − 300
       ≮ 10

Hence the least possible value of (ABCD) is 10 and since the value
of N has been assumed as 100, 10% at least of the combatants must have
lost an eye, an ear, a leg and an arm.
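The general rule lends itself to a one-line calculation. The Python sketch below is an illustration only. The function name least_joint_frequency is hypothetical, and the result is floored at 0 because no class frequency can be negative.

```python
def least_joint_frequency(frequencies, n_total):
    """Least possible frequency of the class possessing all the attributes.

    `frequencies` holds (A), (B), ..., (M) and `n_total` is N.
    """
    n_attributes = len(frequencies)
    bound = sum(frequencies) - (n_attributes - 1) * n_total
    return max(0, bound)


# Example 10: eye, ear, leg and arm, with N taken as 100.
print(least_joint_frequency([70, 75, 80, 85], 100))
```

With two attributes the same function reproduces rule (4), so Example 8 gives 70 + 85 − 100 = 55.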
Questions
1. Given the following frequencies of the positive classes, find the frequencies
of the ultimate classes.
(A) = 80, (B) = 100, (AB) = 70, N = 250
2. A number of labourers in a factory were examined for the presence or
absence of certain defects of which three chief descriptions were noted.
A-Physical weakness
B-Nerve signs
C-Mental dullness
Given the following ultimate frequencies, find the frequencies of the positive
classes including the whole number of observations N :-
(ABC) =  75     (αBC) =   98
(ABγ) = 310     (αBγ) =  702
(AβC) = 106     (αβC) =   74
(Aβγ) = 489     (αβγ) = 8415
3. Given the following frequencies find out the frequencies of the positive
and the ultimate classes.
N   = 29,002,525     (ABγ) = 82
(A) = 23,467         (AβC) = 380
(B) = 14,192         (αBC) = 500
(C) = 97,383         (ABC) = 25
4. Measurements are made on a thousand husbands and a thousand wives.
If the measurements of the husbands exceed the measurements of the wives in 789
cases for one measurement, in 741 cases for another and in 690 cases for both
measurements, in how many cases will both measurements on the wife exceed the
measurements on the husband?
5. In a Girls' High School there were 200 students. Their results in the
quarterly, half-yearly and the annual examination were as follows:
80 passed the quarterly examination,
75 passed the half-yearly examination,
and 96 passed the annual examination.

25 passed all three, 46 failed all three.
29 passed the first two and failed in the annual examination.
42 failed the first two but passed the annual examination.
Find how many students passed at least two examinations.
6. In a war between White and Red forces there are more Red soldiers than
White; there are more armed Whites than unarmed Reds; there are fewer armed Reds
with ammunition than unarmed Whites without ammunition. Show that there are
more armed Reds without ammunition than unarmed Whites with ammunition.
7. The following data are given in a report.
N = 1000
(AB) = 200
(Aβ) = 350
(αB) = 500
Show that there must be a misprint or mistake of some sort.
8. In an investigation measurements were made on a thousand husbands and
a thousand wives. From the records of the said investigation, it is found that the
measurements of the husbands exceeded the measurements of the wives in 800 cases for
one measurement, in 700 cases for another and in 420 cases for both measurements.
Show that the data are inconsistent.
9. The following are the proportions per 1000 of boys observed by an inves-
tigator for certain classes of defects amongst a number of school children :-
A = development defects
B = nerve signs
C = mental dullness
N =1000 (AB) = So
(.1\)= 52.0 (AC) =15':
(B),= 310 (BC) = S,
(C)=47 0 (ABC) = 25
Show that the information as it stands must be incorrect.
10. From the office of a railway hospital, the following information is
supplied to you :-
55 per cent of the patients are men, 60 per cent are "aged" (over 60), 75 per
cent non-able bodied, 37 per cent aged men, 43 per cent non-able bodied men, and
.0 per cent aged non-able-bodied- men.
Do you think that the information as it stands is correct ?

11. 1000 persons of London were asked by a B. B. C. investigator to give the
nationality of the music they liked. He returns the following data :-
S 70 liked English
650 liked French, and
480 liked German
440 liked English and French
360 liked French and German, and
140 liked English and German
125 liked all three.

Show that the information as it stands must be incorrect.


12. If a report gives the following frequencies as actually observed, show that
there must be a misprint or mistake of some sort, and that possibly the misprint
consists in the dropping of a 1 before the 85 given as the frequency (BC) :-
N = 1,000
(A) = 510     (AB) = 189
(B) = 490     (AC) = 140
(C) = 427     (BC) = 85

13. Prove that the following data are inconsistent :-
(N) = 1,000     (Aβ) = 483
(A) = 525       (Aγ) = 378
(B) = 312       (Bγ) = 226
(C) = 470       (ABC) = 25

14. The following information is given to you :-
100 children took three examinations. 40 passed the first, 39 passed the second
and 48 passed the third. 38 passed the first and the second, 35 passed the first and
the third, 37 passed the second and the third.
Show that the information as it stands must be incorrect.
15. The following summary appears in a report on a survey covering 1,000
fields. Scrutinize the numbers, and point out if there be any mistake or misprint
in them :-
Manured fields                                            510
Irrigated fields                                          490
Fields growing improved varieties                         427
Fields both irrigated and manured                         189
Fields both manured and growing improved varieties        140
Fields both irrigated and growing improved varieties       85
(I. A. S., 1949)
16. The following is a summary of the statistical features of a census of ration
cards :-
Item No.   Category                                        Total number of cards
                                                           belonging to the category
1          The whole of the census                         10,000
2          Permanent residents                              5,100
3          Males                                            4,900
4          Consumers of rice                                4,270
5          Permanent male residents                         1,890
6          Consumers of rice among permanent residents      1,400
7          Males consuming rice                               970
Show that the entry against item No. 7 is inconsistent with the entries against
all the previous items, namely, 1, 2, 3, 4, 5 and 6, taken together. (M. Com., Allahabad, 19,1.)
(I. A. S., 1947).
17. If, in the urban district of Bombay, 250 per thousand of the women between
2.0 and 2.5 years of age were returned as "employed" at a census and Sso per thousand
as married or widowed, what is the lowest proportion per thousand of the married
or widowed that must have been employed.
18. The following are the proportions per 1000 of girls observed for certain
classes of defects amongst a number of school-children:
A = development defects, B = nerve signs, C = mental dullness.
N = 1000     (AB) = 55
(A) = 68     (BC) = 36
(B) = 85
(C) = 69
Show that some defectively developed girls are dull and state how many at least
must be so.
19· Tbe following are the proportion. per,,.ooo ohrorken obaervecl for cmaiD
classes of defects amongst a number of factory workers.
A = development defects
B = nerve signs
D = mental dullness.
N ... ,00 (0) _400
(A)=4<IO (AB)-I70
(8)-$4' (8D)_228
Show that some dull workers do not exhibit development defects, and state
how many at least do not do so.
80. Among the adult population of certain toWD ,0 per a:nt of the populatioa
an: malea. 60 i'" ceot are wage C&II1et8, and ,0
per cent are 10
yeara of age or 0'Rf.
10 per cent of the maln are not wage earnera and 40 per cent 0 the males are IlI1der 40.
am we infer anything about what l)etcentage of the population of 40 JCUI Of 0Tef
are ...... eamen? .
21. (a) Obtain the total number of classes into which a universe can be divided
by three attributes each of which separately divides the universe into two classes.
Show further that any class-frequency of the first or second order can be expressed
in terms of the third-order class-frequencies.
(b) If, in a collection of houses actually invaded by small-pox, 70 per cent of
the inhabitants are attacked and 85 per cent have been vaccinated, what is the lowest
percentage of the vaccinated that must have been attacked ? (I. A. S., 19' ,).
22. A universe consists of three attributes each of which is divisible into
two parts. What are the different class-frequencies obtainable ?
At a competitive examination at which 600 graduatea appeared, ~ outnumbered
gkll by ,6. 'those quali!'ying for1nteniew esceeded in nuliJ'bex tho.e tiWin, to qoalifJ
by ,10. The number of Scienr.e graduate boJa intCfYicwed waa ,00 wilne aaaoDS
rlie Iuu gruluste girls there were z, who Caned to qualify for interview. Altogether
there 'W\ft only IH Am graduates and ~, among diem t'ailed to qualify. Bop who
&Wed to qwUi£y numbered 18. Find (.I) the number of boys who qualliied for: inter-
..-iew. (&) the total number of science gmduatc boys appearing and (I) the numbel ·of
edaxc gaduate litle who qualified. (P. C. S.• 19,6).
25. A study was made lIbout the studying habits of the students of z certaiD
uni'retlity, and the fonowing summary is given at one plac:c in the report :
Of the students surveyed 71 ~t were from we1J.to.do &milieI. " pc«eet
wetC boys and 60 percent were irreaular in their: stud,iea. Out of the irr:egulu ones
50 ~t ~ boys and two-thirdS were &om weU-tO-do &mniea. The percentage
of isteguJar boY' uom well-to-do families Wtal S. I. there any itlcon.isteccy itl die
data ~
Give p100f for: your answer. CAr-. M. 0.., 196&).
%4. There were 400 students in the B. Com. clS1I of .. UDiTeraity. Tbdr re-
sults in the varioua terminal cuminations ate given below :_ ,
(i) Tcnnina1-180 passed. fii) Terminal-l40 paned. (i;l) TCftDinal-180 pas.
lCd, 60 passed in all tl-~ tctmlnals; So tiailed in aU the three; 40 ~ in the tat and
and terminals but failed in the third; 70 failed in the tit and znd tc:'rminala and passed
in the third.
P'md out how many students passed at leut two e:uminations.
(M. 0 .•• Var-).
2,. A market investigator: retuma the foUowln!J:-
Of rooo people consulted. 8n lilced chocola~. " . liked tofFcc, and 418 liked
boiled sweets; '70 liked chocoiatea and toffee, ,,6 liked chocolates ana boiled sweet.;
and ~ liked tofl'ee and boDed sweets; %97 liked all thKC.
Show tbat thj.~ information as it stands must be inconect.
(M. A., DeJM. t960).
Association of Attributes
19
Expected values
When it is desired to study association between two attributes
A and B it becomes necessary to find out whether the attribute A is
more commonly found with attribute B than is ordinarily expected.
Thus in a study of association the first thing to be calculated is the ex-
pected value of (AB). This value is calculated on the basis of simple
rules of probability. We shall not go into a detailed study of probability
at this stage and leave it for full discussion in a later chapter, but it is
essential to point out here that the expectation of a particular event is
always equal to its probability of happening multiplied by the number
of observations. Thus, if a coin is tossed, the probability that it will fall
head upwards is clearly 1/2 and if there are 100 such tosses the expectation
of heads is equal to 1/2 × 100 or 50. Similarly the probability of drawing
a spade from a pack of cards is 13/52 because the number of spades is 13
and the number of cards is 52. If there are 100 such draws the expecta-
tion of spades would be 13/52 × 100 or 25. Thus the probability of the
happening of an event is equal to
Number of favourable cases / Total number of cases.
If in a population of 100 students, 20 are married, the probability of
coming across a married student is thus 20/100.
If two attributes A and B are studied in a universe and if the fre-
quency of A is represented by (A) and of B by (B),
the probability of A = (A)/N
and the probability of B = (B)/N

The combined probability of two independent events is equal to
the product of their individual probabilities. Thus, if a coin is tossed
two times the probability that both the times it will fall with head up-
wards is 1/2 × 1/2 or 1/4. Similarly, the probability that in two successive
draws a spade would be drawn (the first card being replaced before the
second is drawn) is 13/52 × 13/52 or 1/16. On this basis the com-
bined probability of (A) and (B) would be
(A)/N × (B)/N, and
the expectation of (A) and (B) combined would be
(A)/N × (B)/N × N

= (A) × (B) / N.
From the above it is clear that ordinarily if attributes A and
B are independent the expected frequency of (AB) would be equal to
(A) × (B) / N.

Criterion of independence
If there is no kind of relationship between the attributes A and
B we may expect to find the same proportion of A's in B's as in β's.
In other words, attribute A must be equally popular in B's and in
not-B's. If, for example, blindness and deafness are not associated, the
proportion of blind people amongst the deaf and amongst the not-
deaf must be equal. If, however, it is found that the proportion of
blind people amongst the deaf is more than their proportion amongst
the not-deaf, it indicates that blindness and deafness have some association.
Two attributes A and B are said to be independent if the
observed frequency of (AB) is equal to its expected frequency, i.e.,
(A) × (B) / N
Example 1. In a population of 200 students the number of
married is 80. Out of 60 students who failed, 24 belonged to the mar-
ried group. It is required to find out whether the attributes of marriage
and failure are independent.
Suppose the attribute of marriage is represented by A and failure
by B. The actual value of (AB) = 24
Expected value of (AB) = (A) × (B) / N = 80 × 60 / 200
                       = 24
The actual value of (AB) and its expected value are equal. It
means that attributes A and B are independent. We can look at this
problem from another point of view also. The percentage of married
students who failed is
(AB)/(A) × 100 = 24/80 × 100
               = 30
The percentage of unmarried students who failed is
(αB)/(α) × 100 = 36/120 × 100
               = 30

Thus, so far as marriage is concerned, it has nothing to do with
failure. 30% of the married students failed and the percentage of

failure amongst the unmarried students is also 30. The two attributes
are thus independent of each other.
Yet another way of looking at the above problem is to find out
whether α and β are associated, in other words, whether bachelorhood
is associated with success in examination. The actual value of (αβ) in
the above illustration
= (α) − (αB)
= N − (A) − (B) + (AB)
= 200 − 80 − 60 + 24
= 84
The value of (α) = N − (A) = 200 − 80
                 = 120
The value of (β) = N − (B) = 200 − 60
                 = 140
The expected value of (αβ)
= (α) × (β) / N = 120 × 140 / 200
= 84
Thus we find that α and β are also independent of each other.
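The test of independence used above can be sketched in a few lines of Python. This is an illustration only. The function names expected_frequency and is_independent are hypothetical, and exact fractions are used so that no rounding can disturb the comparison.

```python
from fractions import Fraction


def expected_frequency(a, b, n):
    """Expected frequency of (AB) when A and B are independent: (A) × (B) / N."""
    return Fraction(a * b, n)


def is_independent(a, b, ab, n):
    """True when the observed (AB) equals its expected value exactly."""
    return ab * n == a * b


# Example 1: N = 200, (A) = 80 married, (B) = 60 failed, (AB) = 24.
n, a, b, ab = 200, 80, 60, 24
print(expected_frequency(a, b, n), is_independent(a, b, ab, n))

# The same test applied to α and β (unmarried and passed).
alpha, beta = n - a, n - b
alpha_beta = n - a - b + ab
print(expected_frequency(alpha, beta, n), is_independent(alpha, beta, alpha_beta, n))
```

In observed data an exact equality of this kind is rare, so in practice the size of the difference matters more than a strict test of equality.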
We are now in a position to lay down the following general rules
for deciding whether any two attributes are independent or not. These
rules are satisfied by substituting the figures given in the above example.
In the above example the actual values of (AB), (αβ), (Aβ) and
(αB) are respectively 24, 84, 56 and 36.
Attributes A and B are independent, if
(i) (AB) = (A) × (B) / N          24 = 80 × 60 / 200
(ii) (αβ) = (α) × (β) / N         84 = 120 × 140 / 200
(iii) (Aβ) = (A) × (β) / N        56 = 80 × 140 / 200
(iv) (αB) = (α) × (B) / N         36 = 120 × 60 / 200

In all the above four cases the attributes A and B would be
found to be independent. From the above it can be easily concluded
that attributes A and B would be independent, if
(v) (AB)/(B) = (Aβ)/(β)           24/60 = 56/140
(vi) (AB)/(A) = (αB)/(α)          24/80 = 36/120
(vii) (Aβ)/(A) = (αβ)/(α)         56/80 = 84/120

(viii) (αB)/(B) = (αβ)/(β)        36/60 = 84/140

It can also be concluded that A and B would be independent if
(ix) (AB) × (αβ) = (Aβ) × (αB)*   24 × 84 = 56 × 36
(x) (AB)/(A) = (B)/N**            24/80 = 60/200
(xi) (AB)/(B) = (A)/N             24/60 = 80/200
(xii) (AB)/N = (A)/N × (B)/N      24/200 = 80/200 × 60/200
From the last equation we arrive at an important conclusion that
if the two attributes A and B are independent the proportion of

*Proof
(AB) × (αβ) = (A) × (B) / N × (α) × (β) / N
            = (A) × (β) / N × (α) × (B) / N
            = (Aβ) × (αB)
**In the above example (1) the value of (AB)/(A), or the proportion
of married students who failed, is 24/80 or 30%. This proportion
is not only equal to (αB)/(α), i.e., the proportion of unmarried who
failed, which is 36/120 or 30%, but is also equal to the proportion of
the total number of students who failed in the universe, or (B)/N, which
is 60/200 or 30%. It means that A and B are independent. The pro-
portion of B's in A's is equal to the proportion of B's in the universe,
or (AB)/(A) = (B)/N
Similarly other relationships can also be laid down.

AB's in the universe is equal to the product of the proportions of A's and B's
in the universe.
From the above rules it is very easy to find out the frequency of
(AB) in the universe where the attributes A and B are independent.
Example 2. If (A) = 50, (B) = 60 and N = 100, what would be the
value of (AB) if the two attributes A and B are independent?
If A and B are independent the value of (AB) would be equal to
(A) × (B) / N
Thus
(AB) = 50 × 60 / 100 = 30
In the above example the value of
(αβ) would be (α) × (β) / N
(α) = N − (A) = 100 − 50 = 50
and (β) = N − (B) = 100 − 60 = 40
Thus
(αβ) = 50 × 40 / 100 = 20
Similarly the value of (Aβ) would be
(A) × (β) / N = 50 × 40 / 100 = 20
and the value of (αB) would be
(α) × (B) / N = 50 × 60 / 100 = 30

Thus we can conclude that if A and B are independent, the nine cell
table would have the following form :-

            A                 α                 Total
B           (A) × (B) / N     (α) × (B) / N     (B)
β           (A) × (β) / N     (α) × (β) / N     (β)
Total       (A)               (α)               N
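The table of frequencies expected under independence can be built directly from (A), (B) and N. The Python sketch below is an illustration only, using the figures of Example 2. The function name independence_table and the key encoding (capital letter for presence, small letter for absence) are assumptions.

```python
def independence_table(a, b, n):
    """Expected frequencies of the four second-order classes under independence.

    Keys: "AB" for (AB), "aB" for (αB), "Ab" for (Aβ), "ab" for (αβ).
    """
    alpha, beta = n - a, n - b
    return {
        "AB": a * b / n,
        "aB": alpha * b / n,
        "Ab": a * beta / n,
        "ab": alpha * beta / n,
    }


# Example 2: (A) = 50, (B) = 60, N = 100.
table = independence_table(50, 60, 100)
print(table)

# The row and column totals reproduce (B), (β), (A) and (α).
print(table["AB"] + table["aB"], table["Ab"] + table["ab"])
print(table["AB"] + table["Ab"], table["aB"] + table["ab"])
```

Because the margins are always reproduced, only one of the four cells needs to be worked out by the formula, and the rest follow by subtraction.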

Association
In statistics the word association has a technical meaning. In
common parlance if A and B are found together in a large number of
cases they are said to be associated. But in statistics they cannot be said
to be associated until they are found together in a larger number of
cases than is expected if they are independent. Thus even if A and B are
found together in a very large number of cases they will not be said to
be associated unless this number is greater than the figure expected
when the attributes A and B are independent. This point should always
be kept in mind while drawing inferences from statistical data relating
to attributes.
A and B would be associated if they are not independent, or,
in other words, if (AB) is not equal to (A) × (B) / N.
If
(AB) > (A) × (B) / N
A and B are said to be positively associated or simply associated.
If on the other hand
(AB) < (A) × (B) / N
A and B are said to be negatively associated or simply disassociated.
It should be remembered that disassociation does not mean absence
of association. It means presence of negative association.
Example 3.
Given
(A) = 40   (B) = 30   (AB) = 20   N = 100
Study the association between A and B; α and β; A and β; and α
and B.
We can represent the given data in the shape of a table and obtain
the frequencies of the missing classes :-

            A            α            Total
B           (AB) 20      (αB) 10      (B) 30
β           (Aβ) 20      (αβ) 50      (β) 70
Total       (A) 40       (α) 60       N 100

The above values are those which have been observed. We can
now calculate the expected frequencies. The expected frequency of
(AB) = (A) × (B) / N = 40 × 30 / 100 = 12

(αβ) = (α) × (β) / N = 60 × 70 / 100 = 42

(Aβ) = (A) × (β) / N = 40 × 70 / 100 = 28

(αB) = (α) × (B) / N = 60 × 30 / 100 = 18

Now we can easily study the association between the various attributes
if we lay down the actual and the expected values together.
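Setting the actual and the expected values of Example 3 side by side can be sketched in Python as follows. This is an illustration only. The function name classify and the wording of the labels are assumptions, while the rule applied is the one stated above: more than expected means positive association, less than expected means negative association.

```python
def classify(observed, expected):
    """Compare an observed class frequency with its expected value."""
    if observed > expected:
        return "positively associated"
    if observed < expected:
        return "negatively associated"
    return "independent"


# Example 3: (A) = 40, (B) = 30, (AB) = 20, N = 100.
a, b, ab, n = 40, 30, 20, 100
alpha, beta = n - a, n - b

observed = {
    "A and B": ab,
    "A and β": a - ab,
    "α and B": b - ab,
    "α and β": n - a - b + ab,
}
expected = {
    "A and B": a * b / n,
    "A and β": a * beta / n,
    "α and B": alpha * b / n,
    "α and β": alpha * beta / n,
}

for pair in observed:
    print(pair, observed[pair], expected[pair],
          classify(observed[pair], expected[pair]))
```

In a table with fixed margins the four differences are equal in size (here 8), so once A and B are found to be positively associated the other three conclusions follow.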
"

There can be a situation in which the value of (AB) may not be 0
and yet there may be complete negative association between the two

attributes. We know that (AB) cannot be less than (A) + (B) − N. Thus
if (A) + (B) − N is more than 0, (AB) would also be more than 0. If
(AB) is just equal to (A) + (B) − N, there would be complete negative
association between A and B. In such cases the value of (αβ) only
would be 0. For example, if the value of (A) is 50, of (B) 60 and
N 100, the value of (A) + (B) − N would be 50 + 60 − 100 or 10. If the
value of (AB) is 10, the value of (Aβ) would be 40, of (αB) 50 and (αβ) 0.
Thus we conclude that if the value of (AB) or (αβ) is 0 there is complete
disassociation between the two attributes.
Intensity of association
In actual practice in most of the cases the value of (AB) would lie
somewhere between the two limits, one laid down by the expected
value and the other laid down by the value expected in perfect positive
or negative association. The intensity of association is indicated by the
extent to which the observed value of (AB) deviates from its expected
value towards the limit of perfect association. We shall discuss the
quantitative measurement of the intensity of association or disassociation
a little later.
Chance association
At this stage it is necessary to point out that if the value of (AB)
is found to be greater than the value of (A) × (B) / N it should not be at once
concluded that there is association between the two attributes.
It is quite possible that the difference between the observed
and the expected values is due to chance. Such association may be the
result of sampling fluctuations and the true association may be zero. As
such, unless the difference between the observed and the expected values
is very significant we should not conclude that there is any association
or disassociation between the two attributes. The question which
naturally arises here is how much divergence between the observed and
expected values can be termed as significant. We shall discuss this
question in detail in the chapters on Sampling. This point has been raised
here only to warn the student of this subject against drawing hasty
conclusions.
Coefficient of association
So far we have discussed that a rough idea about the extent of asso-
ciation or disassociation between two attributes can be had by finding
out the extent of the difference between their observed and expected fre-
quencies. For practical purposes it is enough to take a decision about
whether the two attributes in question are associated, disassociated or in-
dependent. But in some cases the difference between observed and ex-
pected frequencies may be due to what are called fluctuations of sampling.
Under such circumstances it becomes necessary to obtain an idea about the
extent to which the difference between the observed and expected fre-
quencies can be due to chance fluctuations. We shall discuss these
tests and their significance in a separate chapter entitled Sampling of
Attributes. For the present we shall discuss the possibility of obtaining
a coefficient of association which can give some idea about the
extent of association between two attributes. It would be convenient
if the coefficient of association is such that its value is 0 when the two
attributes are independent, +1 when they are perfectly associated and −1
when they are perfectly disassociated. Many such coefficients of asso-
ciation have been worked out by different authors but the one given by
Yule is easy and simple.
Yule's coefficient of association or

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$
We know that when the two attributes A and B are independent,
(AB)(αβ) = (Aβ)(αB). As such, if two attributes are independent,
the value of the numerator in the above formula would be 0 and the value
of the coefficient of association would also be 0. Similarly if there is
perfect association between the two attributes A and B the value of (Aβ)(αB)
would be 0 and since it will be so both in the numerator and the
denominator, it is evident that the value of the coefficient of association
would be +1. Similarly if there is perfect disassociation between two
attributes A and B the value of (AB)(αβ) would be 0 and since it will be
so both in the numerator and denominator, the coefficient of association
would be -1. The following example would illustrate the above formula:
Example 4. Calculate the coefficients of association from the
following data:
(1) (A)=60; (B)=80; (AB)=48; N=100
(2) (A)=60; (B)=80; (AB)=60; N=100
(3) (A)=60; (B)=80; (AB)=40; N=100
(4) (A)=60; (B)=80; (AB)=50; N=100
Solution:
In every case (α) = N-(A) = 100-60 = 40.
(1) In this case
(AB) = 48
(Aβ) = (A)-(AB)
     = 60-48 = 12
(αB) = (B)-(AB)
     = 80-48 = 32
(αβ) = (α)-(αB)
     = 40-32 = 8

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \frac{(48 \times 8) - (12 \times 32)}{(48 \times 8) + (12 \times 32)} = \frac{0}{768} = 0$$

Thus the two attributes A and B are independent.
(2) In this case
(AB) = 60
(Aβ) = (A)-(AB)
     = 60-60 = 0
(αB) = (B)-(AB)
     = 80-60 = 20
(αβ) = (α)-(αB)
     = 40-20 = 20

$$Q = \frac{(60 \times 20) - (0 \times 20)}{(60 \times 20) + (0 \times 20)} = \frac{1200}{1200} = +1$$

Thus there is perfect positive association between the attributes A
and B.
(3) In this case
(AB) = 40
(Aβ) = (A)-(AB)
     = 60-40 = 20
(αB) = (B)-(AB)
     = 80-40 = 40
(αβ) = (α)-(αB)
     = 40-40 = 0

$$Q = \frac{(40 \times 0) - (20 \times 40)}{(40 \times 0) + (20 \times 40)} = \frac{-800}{800} = -1$$

Thus there is perfect negative association between the two attributes
A and B.
(4) In this case
(AB) = 50
(Aβ) = (A)-(AB)
     = 60-50 = 10
(αB) = (B)-(AB)
     = 80-50 = 30
(αβ) = (α)-(αB)
     = 40-30 = 10

$$Q = \frac{(50 \times 10) - (10 \times 30)}{(50 \times 10) + (10 \times 30)} = \frac{200}{800} = +.25$$

Thus there is slight positive association between the attributes A and B.
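The four cases of Example 4 can also be checked mechanically. The following is only a minimal sketch in Python; the function name `yule_q` and its argument order are illustrative assumptions, not part of the text. It derives the second-order frequencies from N, (A), (B) and (AB) and then applies Yule's formula.

```python
def yule_q(n, a, b, ab):
    """Yule's coefficient of association Q from N, (A), (B) and (AB).

    Hypothetical helper for illustration only.
    """
    a_beta = a - ab                # (Aβ) = (A) - (AB)
    alpha_b = b - ab               # (αB) = (B) - (AB)
    alpha_beta = n - a - b + ab    # (αβ) = N - (A) - (B) + (AB)
    same = ab * alpha_beta         # (AB)(αβ)
    cross = a_beta * alpha_b       # (Aβ)(αB)
    return (same - cross) / (same + cross)


# The four cases of Example 4: (A)=60, (B)=80, N=100, with (AB) varying.
example_4 = {ab: yule_q(100, 60, 80, ab) for ab in (48, 60, 40, 50)}
for ab, q in example_4.items():
    print(f"(AB)={ab}: Q={q:+.2f}")
```

The sketch does not guard against a zero denominator, which would arise only if both products (AB)(αβ) and (Aβ)(αB) vanish.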
The chief characteristic of this coefficient of association is that it
is independent of the relative proportions of A's and α's in the data.
If all the terms containing A are multiplied by a constant, the value
of Q would not be affected. Similarly if all the terms containing B
or α or β are multiplied by a constant, the value of Q would remain
unaffected. Thus, if in the last example, the values of (AB) and (Aβ)
are multiplied by two, the frequencies would be
(AB) = 100
(Aβ) = 20
(αB) = 30
(αβ) = 10

$$Q = \frac{(100 \times 10) - (20 \times 30)}{(100 \times 10) + (20 \times 30)} = \frac{400}{1600} = +.25$$

Thus the value of the coefficient of association remains unaffected
even though all the terms containing A, i.e., (AB) and (Aβ), were multiplied
by 2. Similarly if (αB) and (αβ) were multiplied by a constant the
value of Q would not be affected.
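The invariance described above can be tried with several multipliers at once. This is a rough sketch under the assumption that the four second-order frequencies of case (4) of Example 4 are 50, 10, 30 and 10; the helper name `q_from_cells` is hypothetical.

```python
import math


def q_from_cells(ab, a_beta, alpha_b, alpha_beta):
    """Yule's Q from the four second-order frequencies (illustrative helper)."""
    same = ab * alpha_beta      # (AB)(αβ)
    cross = a_beta * alpha_b    # (Aβ)(αB)
    return (same - cross) / (same + cross)


base = q_from_cells(50, 10, 30, 10)
checks = []
for k in (2, 3, 10):
    q_a = q_from_cells(50 * k, 10 * k, 30, 10)   # every term containing A: (AB), (Aβ)
    q_b = q_from_cells(50 * k, 10, 30 * k, 10)   # every term containing B: (AB), (αB)
    checks.append(math.isclose(q_a, base) and math.isclose(q_b, base))

print(base, all(checks))
```

The multiplier cancels because it appears once in each product of the numerator and of the denominator.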
The comparison of the coefficients of association of two sets of data
can give an idea about the extent of association between two sets of
attributes.
Example 5. The following table gives the number of persons
suffering from certain infirmities in Bengal.

            Total          Insane    Deaf-mutes    Insane and
            numbers                                deaf-mutes
Males       2,60,00,000    12,650    21,301        545
Females     2,41,00,000     9,055    14,136        317

Study the association between insanity and deaf-mutism, separately
for males and females.
Solution:
Denoting insanity by A and sanity by α, deaf-mutism by B and its absence
by β, the given data are

                 Sex
          Males          Females
N     =   2,60,00,000    2,41,00,000
(A)   =   12,650         9,055
(B)   =   21,301         14,136
(AB)  =   545            317

Calculating the class frequencies of the second order from the above
we have

                              Sex
                        Males          Females
(AB)                 =  545            317
(Aβ) = (A)-(AB)      =  12,105         8,738
(αB) = (B)-(AB)      =  20,756         13,819
(αβ) = N-(A)-(αB)    =  2,59,66,594    2,40,77,126

Substituting the above values in Yule's formula for the coefficient
of association

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$

where Q represents the coefficient of association, we get,

$$Q \text{ for males} = \frac{(545 \times 2{,}59{,}66{,}594) - (12{,}105 \times 20{,}756)}{(545 \times 2{,}59{,}66{,}594) + (12{,}105 \times 20{,}756)} = +.96$$

and

$$Q \text{ for females} = \frac{(317 \times 2{,}40{,}77{,}126) - (8{,}738 \times 13{,}819)}{(317 \times 2{,}40{,}77{,}126) + (8{,}738 \times 13{,}819)} = +.97$$

Thus there is a positive association between insanity and deaf-mutism
for both the males and females of Bengal but the degree of
association is more for females than for males.
Coefficient of colligation
Another important coefficient which is also independent of the
relative proportions of A's and α's in the data (like Yule's coefficient of
association) is known as the Coefficient of Colligation. Coefficient of
Colligation or

$$Y = \frac{1 - \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}{1 + \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}$$

The following example illustrates the above formula:
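Example 6
Given
(AB) = 30
(Aβ) = 10
(αB) = 5
(αβ) = 5
Calculate the Coefficient of Colligation.
Solution
Coefficient of Colligation or

$$Y = \frac{1 - \sqrt{\dfrac{10 \times 5}{30 \times 5}}}{1 + \sqrt{\dfrac{10 \times 5}{30 \times 5}}} = \frac{1 - \sqrt{.33}}{1 + \sqrt{.33}} = \frac{1 - .57}{1 + .57} = \frac{.43}{1.57} = .27$$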
It can easily be proved that coefficient of association or

$$Q = \frac{2Y}{1 + Y^2}$$

In the above example the coefficient of association or

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \frac{(30 \times 5) - (10 \times 5)}{(30 \times 5) + (10 \times 5)} = \frac{100}{200} = .5$$

The value of

$$\frac{2Y}{1 + Y^2} = \frac{.54}{1.0729} = .5$$

Thus we find that the relationship between the Coefficient of
Association and the Coefficient of Colligation, Q = 2Y/(1+Y²), is satisfied by
the above example.
Uptil now we have considered the association of A <lnd B in the
universe as a whole without finding out the other attributes in the uni-
verse. However, the association between A and B may not be a direct
as:;ociation ; it may be the result of their association with a third attribute
C. Thus if A is positively associated with C and if B is also associated
with C, A may be found to be positively associated with B. This asso-
ciation between A and B is not direct. It is the effect of their association
with another attribute, C. To find out whether the association between
A and B is real, and not merely due to their association with a third attri-
bute C, it would be necessary to study the association of A and B in the
sub-population of C and 'Y' If A and B are associated in both the sub-po-
pul~t~on of C and 'Y it would indicate that A and B are really associated
with each other. The association of A and B in the sub-populations are called
partial i'J..fSociations to distinguish them /r(Jm total associatinn in the universe as
tJ whole. An illustration would make the point clear.
Suppose that an association is observed between "B. C. G. vaccination"
and "exemption from tuberculosis", which means that a larger
proportion of those people who are vaccinated are exempt from an attack
of tuberculosis than of those who are not vaccinated by B. C. G. It may
be argued that this association is not real. It is illusory. It may be due
to their common association with some third attribute, say, the economic
condition of the people. Most of the vaccinated people may be those
who are rich and well-to-do and most of the unvaccinated people may be
poor. Well-to-do people generally live in healthy surroundings and take
a rich diet and as such may be less exposed to the attack of tuberculosis.
On the other hand poor people generally live in unhygienic surroundings
and take a poor diet and as such are more liable to an attack of tuberculosis.
Thus the association between vaccination and exemption from tuberculosis
may not be a direct one. It may be due to the fact that the attribute of
vaccination is associated with the attribute of richness, hygienic conditions
and good diet and exemption from tuberculosis is also associated with
these conditions. In this way vaccination (or A) is associated with
exemption from tuberculosis (or B) due to their common association with
richness, hygienic conditions and good diet (or C).
In actual practice the ambiguity referred to above is of a more
complex nature because the population under discussion may contain
not only those units which possess a third attribute alone but a mixture
of units with and without it. For example, in the above case richness
and hygienic conditions may be present side by side with poverty and
unhygienic conditions at the place where the observations are being
made. If, however, the study is made only amongst those units where
the third attribute is either present throughout or absent throughout,
this type of ambiguity would not be there. Thus if the data in the above
illustration refer to only such a group of persons all of whom are rich and
live in hygienic conditions and take a rich diet or to such a group of persons
all of whom are poor and live in unhygienic conditions and take a poor diet,
there would not be any complication in the study of association between A and B.
In order to be sure about the association of A and B, generally their
associations in the sub-populations C and γ are separately studied and then
only conclusions are drawn about the association of A and B in the universe
at large. As has been said earlier the associations of A and B in the
sub-populations of C and γ are called partial associations and the association
of A and B in the universe as a whole, total association.
A and B would be positively associated in the sub-population of
C if

$$(ABC) > \frac{(AC)(BC)}{(C)}$$

and negatively associated if

$$(ABC) < \frac{(AC)(BC)}{(C)}$$

Similarly A and B would be positively associated in the sub-population
of γ if

$$(AB\gamma) > \frac{(A\gamma)(B\gamma)}{(\gamma)}$$

and negatively associated if

$$(AB\gamma) < \frac{(A\gamma)(B\gamma)}{(\gamma)}$$
It should be noted that the above formulae are derived from the
formula of total association by specifying the sub-population in which
the study of association is being made. Various other types of formulae
discussed earlier in connection with total association can be written down
in this manner for studying partial associations. The best study of partial
association is, however, done by finding out the proportions or percentages
as we did in some examples in the study of total associations. The
following example would clarify the study of partial association.
Example 7
Given:
N = 1000
Vaccinated or (A) = 80
Exempted from tuberculosis (B) = 70
Rich (C) = 100
(AB) = 30
(AC) = 30
(BC) = 45
(ABC) = 15
Study the association between vaccination and exemption from
tuberculosis. Find out if the association is due to the third attribute of
richness because it is likely that a larger number of rich people may be
taking vaccination than poor people and further rich people may be
more immune from attack of tuberculosis than poor people.
In this case we shall first study the association between the two
attributes vaccination (A) and exemption from tuberculosis (B) in the
population at large, and then we shall study the partial associations of
A and B in the sub-populations of rich and poor represented by C and γ
respectively.
(1) FOR THE WHOLE POPULATION
Percentage of people exempt from tuberculosis

$$= \frac{(B)}{N} \times 100 = \frac{70}{1000} \times 100 = 7$$

Percentage of vaccinated people exempt from tuberculosis

$$= \frac{(AB)}{(A)} \times 100 = \frac{30}{80} \times 100 = 37.5$$

(2) FOR RICH PEOPLE
Percentage of people exempt from tuberculosis

$$= \frac{(BC)}{(C)} \times 100 = \frac{45}{100} \times 100 = 45$$

Percentage of vaccinated people exempt from tuberculosis

$$= \frac{(ABC)}{(AC)} \times 100 = \frac{15}{30} \times 100 = 50$$

(3) FOR POOR PEOPLE
Percentage of people exempt from tuberculosis

$$= \frac{(B\gamma)}{(\gamma)} \times 100 = \frac{25}{900} \times 100 = 2.77$$

Percentage of vaccinated people exempt from tuberculosis

$$= \frac{(AB\gamma)}{(A\gamma)} \times 100 = \frac{15}{50} \times 100 = 30$$
From the above it is clear that there is a high degree of positive
association between vaccination and exemption from attack of tuberculosis
in the total population as also in the sub-population of the poor.
Amongst the rich, however, the association is not of a very high degree.
As such the idea that the association in the total population is due to the
fact that a larger number of rich people are vaccinated and are also exempt
from tuberculosis is incorrect, because in that case the association of A
and B in the sub-population of C should have been very high.
It should be noted that in the above case the percentages of people
exempt from tuberculosis in the total population as also in the
sub-population of the poor are very low (7% and 2.77% respectively) but in the
vaccinated group the percentages are very high (37.5% and 30% respectively).
In the sub-population of the rich the percentage of people exempt
from tuberculosis is fairly high (45%) but in this group also the percentage
of vaccinated people exempt from tuberculosis is higher by 5%; it
is 50%. Thus the attributes A and B are positively associated in the
total population as also in the sub-populations. The association between
A and B is thus not due to their common association with the third
attribute C.
It is possible to lay down a formula for the coefficient of partial
association by modifying the original formula for the calculation of the
coefficient of association. The only change that is introduced in the
formula is that the sub-population in which association is being studied
is also indicated. Thus the formula for the coefficient of association
between A and B is

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$

Now if we wish to study the partial association between A and B in
the sub-population of C we shall add the attribute C in each of the above
classes and the coefficient of association would then be

$$Q = \frac{(ABC)(\alpha\beta C) - (A\beta C)(\alpha BC)}{(ABC)(\alpha\beta C) + (A\beta C)(\alpha BC)}$$
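As an illustration, the coefficient of partial association can be worked out for the rich and the poor of Example 7. The sketch below is not from the text: the function name `partial_q` is hypothetical, and the frequencies (BC) = 45 and (ABC) = 15 are taken as assumptions implied by the percentages worked in that example (45 per cent of 100 rich exempt, 15 out of 30 vaccinated rich exempt).

```python
def partial_q(sub_total, a_in_sub, b_in_sub, ab_in_sub):
    """Yule's Q for A and B inside one sub-population (illustrative helper).

    sub_total is (C), a_in_sub is (AC), b_in_sub is (BC), ab_in_sub is (ABC);
    the same function serves the sub-population γ with (γ), (Aγ), (Bγ), (ABγ).
    """
    a_beta = a_in_sub - ab_in_sub                            # (AβC)
    alpha_b = b_in_sub - ab_in_sub                           # (αBC)
    alpha_beta = sub_total - a_in_sub - b_in_sub + ab_in_sub  # (αβC)
    same = ab_in_sub * alpha_beta
    cross = a_beta * alpha_b
    return (same - cross) / (same + cross)


# Rich: (C)=100, (AC)=30, (BC)=45, (ABC)=15.
q_rich = partial_q(100, 30, 45, 15)
# Poor: (γ)=900, (Aγ)=80-30=50, (Bγ)=70-45=25, (ABγ)=30-15=15.
q_poor = partial_q(900, 50, 25, 15)
print(round(q_rich, 2), round(q_poor, 2))
```

On these assumed figures the coefficient is small among the rich and large among the poor, which agrees with the reading of the percentages in the example.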

Number of partial associations
The total number of associations in case of n attributes is always
equal to

$$\frac{n(n-1)}{2} \times 3^{n-2}$$

Out of these n(n-1)/2 are total associations and the remainder, partial
associations. Thus in case of three attributes there are

$$\frac{3(3-1)}{2} \times 3^{3-2} = \frac{3 \times 2}{2} \times 3 = 9$$

associations and out of these 3(3-1)/2 or 3 are total associations and 6
are partial associations. The total associations would be between A and B,
B and C, and C and A. The partial associations would be between AB and C,
AB and γ, AC and B, AC and β, BC and A, and BC and α (that is, the
association of the first pair within the sub-population named last).
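The count can be understood by enumeration: each pair of attributes may be studied with every remaining attribute left unspecified, present or absent, which gives three choices per remaining attribute. The following sketch, with hypothetical function names, lists the associations on that reading and compares the count with the formula.

```python
from itertools import combinations, product


def association_counts(n):
    """Return (all associations, total associations, partial associations)."""
    total = n * (n - 1) // 2
    all_assoc = total * 3 ** (n - 2)
    return all_assoc, total, all_assoc - total


def enumerate_associations(attrs):
    """List every pair together with the sub-population in which it is studied."""
    out = []
    for pair in combinations(attrs, 2):
        others = [x for x in attrs if x not in pair]
        # Each remaining attribute is unspecified, present or absent.
        for states in product(("any", "present", "absent"), repeat=len(others)):
            condition = {o: s for o, s in zip(others, states) if s != "any"}
            out.append((pair, condition))
    return out


listed = enumerate_associations("ABC")
partial = [item for item in listed if item[1]]   # non-empty condition
print(association_counts(3), len(listed), len(partial))
```

An empty condition corresponds to a total association; any specified third attribute gives a partial association.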
Illusory associations
It is clear from the above discussion about partial associations
that sometimes misleading or illusory associations may be observed
between two attributes which are not directly associated but which
are both individually associated with a third attribute. Thus, if A and
B are two independent attributes but both of them are positively associated
with a third attribute C, it would appear as if A and B are directly
associated. If A is positively associated with C and B negatively associated
with C the association between A and B would appear to be negative.
Thus misleading conclusions may be arrived at, if the partial associations
are not studied. The following illustration would further clarify
this point:
Suppose out of 100 non-vegetarian patients, a new dietetic treatment
is tried on 80. Further if out of 30 patients who died only 10 were
from those who were under the new treatment, the coefficient of association
between the two attributes, i.e., New Treatment (A) and
Deaths (B) would be -1 (all the 20 untreated patients died, so that
(αβ) is 0). Suppose further that the same treatment was
tried upon vegetarian patients also. If out of 100 vegetarian patients
the treatment was tried upon 40 and if out of 60 deaths in this group 40
were those on whom the new treatment was tried there would be perfect
positive association between the new treatment and deaths, or the coefficient
of association between new treatment (A) and deaths (B) would
be +1. Thus there is perfect negative association in one case and perfect
positive association in the other. If further the results were published
without distinguishing between non-vegetarian and vegetarian
patients the conclusions would be highly misleading. In that case out
of 200 patients the new treatment is tried on 120 and out of 90 deaths 50
are from the group on which the treatment was tried. The coefficient
of association in this case between new treatment (A) and deaths (B)
would be -.17 indicating that there is a slight degree of negative
association between them.
It is thus evident from the above illustration that if the association
in the sub-populations is not studied separately misleading conclusions
are liable to be drawn. There may be cases where the apparent association
or disassociation between the two attributes in the universe at
large may be the result of association of the two attributes with
a third attribute. It may also be (as in the above illustration) that the
association between attributes in the universe at large may not appear
significant but there may be a high degree of negative or positive association
between the two attributes in the various sub-populations. In
the above illustration if the combined results are published they would
show that out of 120 patients on whom the new treatment was tried
only 50/120 × 100 or 41.7% died while in 80 patients on whom it was
not tried the percentage of death was 40/80 × 100 or 50. It would indicate
that the new treatment has some value. But we have seen that in
the non-vegetarian group out of 80 patients on whom the treatment
was tried only 10 or 12.5% died and in the vegetarian group the percentage
of death was 100. It means that the new dietetic treatment is
excellent for non-vegetarian patients but suicidal for vegetarian patients.
In this case it is necessary that the results are published separately for the
two types of patients.
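The figures of the illustration can be reproduced as follows. This is a sketch only; the helper names `cells` and `yule_q` are assumptions made for the illustration, with A standing for the new treatment and B for death.

```python
def cells(n, treated, deaths, treated_deaths):
    """Second-order frequencies (AB), (Aβ), (αB), (αβ) for one group."""
    ab = treated_deaths
    a_beta = treated - ab
    alpha_b = deaths - ab
    alpha_beta = n - treated - deaths + ab
    return ab, a_beta, alpha_b, alpha_beta


def yule_q(ab, a_beta, alpha_b, alpha_beta):
    same = ab * alpha_beta
    cross = a_beta * alpha_b
    return (same - cross) / (same + cross)


non_veg = cells(100, 80, 30, 10)     # non-vegetarian patients
veg = cells(100, 40, 60, 40)         # vegetarian patients
pooled = tuple(x + y for x, y in zip(non_veg, veg))   # groups merged cell by cell

q_non_veg = yule_q(*non_veg)
q_veg = yule_q(*veg)
q_pooled = yule_q(*pooled)
print(q_non_veg, q_veg, round(q_pooled, 2))
```

Two perfect but opposite associations in the sub-populations thus merge into a weak negative association in the pooled data.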

Illusory associations may arise in another way also, through the
personality of the observer. If the observer has not a very keen eye
it is likely that he observes the presence of an attribute (A) when he
observes the presence of another (B) or vice versa. In such cases the
attributes A and B are associated with a third attribute (C), the observer's
attention. Such data give rise to illusory associations between A and
B. Thus if an observer is subconsciously affected by the common
notion that musical talent is very common in blind people, he may
observe the attribute of musical talent (A) whenever he sees a blind
person (B). In those who are not blind he may not observe musical talent
as quickly as amongst the blind. In such a case even though there may be no
association between musical talent and blindness the data would reveal
a positive association between them.

CONTINGENCY AND CHI-SQUARE.

Manifold classification

We have discussed in the last chapter that classification of data
can be either dichotomous or manifold. For example instead of dividing
the universe in two parts, tall and not-tall, we may divide it in a
larger number of parts: very tall, tall, medium sized, short and very
short. Here the attribute tall and its counterpart not-tall have been
further divided into a number of sub-divisions. Similarly the two classes
heavy and not-heavy may be sub-divided as very heavy, heavy, normal,
light and very light.
Thus attribute A can be divided into a number of groups A1, A2,
A3, ... As and similarly the attribute B can be sub-divided as B1, B2, B3,
... Bt. It will be observed that each one of the classes A1, A2, A3, etc.,
of the first attribute would be divided into a number of heads like B1,
B2, B3, etc., when a second attribute B is taken into account. Such
classification is called Manifold Classification.

Number of classes
We shall confine our discussion to two attributes only. If attribute
A is sub-divided in s classes and attribute B in t classes we shall have
a table of the following type:

                               Attribute A
Attribute B   A1       A2       A3       ...   As       Total
-----------   ------   ------   ------   ---   ------   -----
B1            (A1B1)   (A2B1)   (A3B1)   ...   (AsB1)   (B1)
B2            (A1B2)   (A2B2)   (A3B2)   ...   (AsB2)   (B2)
B3            (A1B3)   (A2B3)   (A3B3)   ...   (AsB3)   (B3)
...           ...      ...      ...      ...   ...      ...
Bt            (A1Bt)   (A2Bt)   (A3Bt)   ...   (AsBt)   (Bt)
-----------   ------   ------   ------   ---   ------   -----
Total         (A1)     (A2)     (A3)     ...   (As)     N

In the above table the totals of various columns A1, A2, etc., and
the totals of various rows B1, B2, etc., would give the first order frequencies
and the frequencies in various cells would be second order frequencies.
The total of either (A1), (A2), etc., or (B1), (B2), etc., would give the grand
total N. Such a table is called a Contingency Table. The following contingency
table is 4 × 4 fold. It gives the details about the stature of
the parents and the stature of the sons.

                             Parents
Sons        V. Tall   Tall   Medium   Short   Total
-------     -------   ----   ------   -----   -----
V. Tall        20       30      20       2       72
Tall           14      125      85      12      236
Medium          3      140     165     125      433
Short           3       37      68     151      259
-------     -------   ----   ------   -----   -----
Total          40      332     338     290     1000

From the above table we can easily know that there are 40 very
tall parents and 20 of their children are very tall, 14 tall, 3 medium and
3 short. Similarly there are 290 short parents and 151 of their children
are short, 125 medium, 12 tall and 2 very tall.
Association in contingency tables
For studying association in such tables the easiest way is to convert
them into 2 × 2 fold tables by merging the various groups. For example,
in the above case, tall and very tall groups can be combined in one and
named "tall". Similarly the medium and short groups can be combined
in one and named "not tall." If it is done, the above table would become
2 × 2 fold and then association can be easily studied. The contingency
table given above can be reduced to a 2 × 2 fold table as follows:

                       Parents
Sons        Tall   Not tall   Total
--------    ----   --------   -----
Tall         189      119       308
Not tall     183      509       692
--------    ----   --------   -----
Total        372      628      1000

We can now trace the association between the stature of the parents
and that of the offspring from the above table by the method discussed
earlier. Thus the percentage of tall children in the universe is
308/1000 × 100 or 30.8. The percentage of tall children amongst tall
parents is 189/372 × 100 or 50.8. This indicates that there is a positive
association between the stature of the parents and the stature of the
offspring. Similarly, the percentage of not-tall children in the universe
is 692/1000 × 100 or 69.2. The percentage of not-tall children amongst
not-tall parents is 509/628 × 100 or 81.1. Here again there is indication
of positive association between the stature of the parents and offspring.
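The merging of groups described above can be done mechanically. The sketch below is an illustration under assumed names (`collapse` is hypothetical); rows stand for sons and columns for parents, in the order very tall, tall, medium, short. It also compares the observed tall-tall frequency with the frequency expected under independence.

```python
table = [
    [20, 30, 20, 2],      # very tall sons
    [14, 125, 85, 12],    # tall sons
    [3, 140, 165, 125],   # medium sons
    [3, 37, 68, 151],     # short sons
]


def collapse(table, row_split, col_split):
    """Merge the first row_split rows and col_split columns into one class each."""
    def block(rows, cols):
        return sum(table[r][c] for r in rows for c in cols)

    top, bottom = range(row_split), range(row_split, len(table))
    left, right = range(col_split), range(col_split, len(table[0]))
    return [[block(top, left), block(top, right)],
            [block(bottom, left), block(bottom, right)]]


two_by_two = collapse(table, 2, 2)
n = sum(map(sum, two_by_two))
tall_sons = sum(two_by_two[0])
tall_parents = two_by_two[0][0] + two_by_two[1][0]
expected_tall_tall = tall_sons * tall_parents / n   # (A)(B)/N
print(two_by_two, expected_tall_tall)
```

Since the observed tall-tall frequency exceeds the expected one, the merged table points to positive association.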
The above procedure of studying association in contingency
tables is not very accurate or convenient. In actual practice we are
concerned with finding out whether A's on the whole depend on B's.
We are not concerned with the association of individual A's and B's.
There is need of a coefficient which would summarize the extent of
dependence of one attribute on the other. Moreover the technique of
pooling sub-groups together is laborious and inconvenient particularly
if s and t are large.
Coefficient of contingency
If A and B are completely independent of each other in the universe
at large then the actual values (A1B1), (A2B2), etc., must be equal
to their expected values which are in turn equal to (A1)(B1)/N and
(A2)(B2)/N respectively. In other words, if the observed frequency in
each of the cells of a contingency table is equal to the expected frequency
of that cell, A and B would be completely independent of each
other. If these values are not equal in all the cells it is an indication of
association between the attributes A and B. In order to test the intensity
of association, the difference between the actual and expected frequencies
of various cells is calculated. With these differences the value
of Chi-Square (pronounced Ki-Square) is obtained. We shall discuss
the Chi-Square test in detail in the chapters on Sampling. The value of
Chi-Square is represented by

$$\chi^2 = \sum \left\{ \frac{(\text{Difference of actual and expected frequencies})^2}{\text{Expected frequencies}} \right\}$$

If f stands for the actual frequency of a class and f1 for the expected
frequency the value of χ² would be

$$\chi^2 = \sum \left\{ \frac{(f - f_1)^2}{f_1} \right\}$$

This value is called "Square Contingency" and if the mean of the
square contingency is calculated, it is called "Mean Square Contingency."
Thus
Square contingency = χ² and
Mean square contingency or

$$\phi^2 = \frac{\chi^2}{N}$$

(φ² is pronounced as Phi-Square)
χ² can also be calculated by the following formula:

$$\chi^2 = \sum \left\{ \frac{f^2}{f_1} \right\} - N$$
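The two ways of calculating Chi-Square agree because the observed and the expected frequencies both add up to N. A small sketch, with illustrative function names and the 2 × 2 stature table (189, 119, 183, 509) as assumed input, shows the expected frequencies being built from the marginal totals and both formulas being applied.

```python
import math


def expected_frequencies(table):
    """Expected cell frequencies: row total x column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (f - f1)^2 / f1 over all cells."""
    expected = expected_frequencies(table)
    return sum((f - f1) ** 2 / f1
               for row, erow in zip(table, expected)
               for f, f1 in zip(row, erow))


def chi_square_shortcut(table):
    """Sum of f^2 / f1 over all cells, minus N."""
    expected = expected_frequencies(table)
    n = sum(map(sum, table))
    return sum(f * f / f1
               for row, erow in zip(table, expected)
               for f, f1 in zip(row, erow)) - n


two_by_two = [[189, 119], [183, 509]]
chi2 = chi_square(two_by_two)
phi2 = chi2 / 1000   # mean square contingency
print(math.isclose(chi2, chi_square_shortcut(two_by_two)))
```

Expanding (f - f1)² / f1 gives f²/f1 - 2f + f1, and summing the last two terms over all cells gives -2N + N, which is the -N of the second formula.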
It is obvious that χ² and φ², which are sums of squares, cannot
have negative values. If, however, the actual and the expected values
are equal in all cases the value of χ² and φ² would be 0. The limits
of χ² and φ² vary in different cases and as such they are not suitable for
studying the association in contingency tables. Karl Pearson has given
the following formula for the calculation of the "Coefficient of Mean Square
Contingency." According to it the coefficient of mean square contingency or

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} = \sqrt{\frac{\phi^2}{1 + \phi^2}}$$

If χ² is calculated by the formula

$$\chi^2 = \sum \left\{ \frac{f^2}{f_1} \right\} - N$$

and if Σ{f²/f1} is represented by S, the coefficient of mean square
contingency or

$$C = \sqrt{\frac{S - N}{S}}$$

The above coefficient has a drawback and it is that it never reaches
the limit of 1. The limit of 1 is reached by it only if the number of
classes is infinite. Ordinarily its maximum value depends on the values
of s and t (i.e., the number of sub-divisions of the two attributes A
and B).
In a t × t table (2 × 2 or 3 × 3 or 4 × 4 fold table) the maximum
value of C is equal to

$$\sqrt{\frac{t - 1}{t}}$$

Thus in a 2 × 2 fold table the maximum value of C

$$= \sqrt{\frac{2 - 1}{2}} = .707$$

In a 3 × 3 fold table it is .816 and in a 4 × 4 fold table it is .866.
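The upper limits quoted above follow directly from the expression for the maximum of C in a t × t table. A short sketch, with the hypothetical helper name `max_c`:

```python
import math


def max_c(t):
    """Maximum of the coefficient of mean square contingency in a t x t table."""
    return math.sqrt((t - 1) / t)


limits = {t: round(max_c(t), 3) for t in (2, 3, 4)}
print(limits)
print(round(max_c(100), 3))   # the limit creeps towards 1 as t grows
```

The limit rises with t but stays below 1 for every finite number of classes.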

It is obvious from the above discussion that the maximum value
of C would depend on the manner in which data are classified. If
instead of 4 × 4 fold tabulation there is 3 × 3 fold tabulation the maximum
value of C would fall from .866 to .816. Thus strictly speaking the
coefficients calculated from different types of classification are not
comparable with each other.
Example 8. Discuss the resemblance of stature of parent and
offspring from the following:

                              Parent
Offspring    Very Tall   Tall   Medium   Short   Total
---------    ---------   ----   ------   -----   -----
Very Tall        20        30      20       2       72
Tall             14       125      85      12      236
Medium            3       140     165     125      433
Short             3        37      68     151      259
---------    ---------   ----   ------   -----   -----
Total            40       332     338     290     1000

Solution
Let us take as our hypothesis the supposition that the two attributes,
viz., stature of parent and stature of offspring, are independent. If this
be true, the theoretical cell frequencies (row total × column total ÷ N,
e.g., 72 × 40/1000 = 2.9 for the first cell) would be:

Stature of                  Stature of Parent
Offspring    Very Tall   Tall    Medium   Short   Total
---------    ---------   -----   ------   -----   -----
Very Tall       2.9       23.9    24.3     20.9      72
Tall            9.4       78.4    79.8     68.4     236
Medium         17.3      143.8   146.4    125.5     433
Short          10.4       85.9    87.5     75.2     259
---------    ---------   -----   ------   -----   -----
Total          40        332     338      290      1000

Substituting the observed and the theoretical frequencies in the
formula

$$\chi^2 = \sum \left\{ \frac{(f - f_1)^2}{f_1} \right\}$$

we get,

$$
\begin{aligned}
\chi^2 ={}& \frac{(20-2.9)^2}{2.9} + \frac{(30-23.9)^2}{23.9} + \frac{(20-24.3)^2}{24.3} + \frac{(2-20.9)^2}{20.9} \\
&+ \frac{(14-9.4)^2}{9.4} + \frac{(125-78.4)^2}{78.4} + \frac{(85-79.8)^2}{79.8} + \frac{(12-68.4)^2}{68.4} \\
&+ \frac{(3-17.3)^2}{17.3} + \frac{(140-143.8)^2}{143.8} + \frac{(165-146.4)^2}{146.4} + \frac{(125-125.5)^2}{125.5} \\
&+ \frac{(3-10.4)^2}{10.4} + \frac{(37-85.9)^2}{85.9} + \frac{(68-87.5)^2}{87.5} + \frac{(151-75.2)^2}{75.2} \\
={}& \frac{292.41}{2.9} + \frac{37.21}{23.9} + \frac{18.49}{24.3} + \frac{357.21}{20.9} \\
&+ \frac{21.16}{9.4} + \frac{2171.56}{78.4} + \frac{27.04}{79.8} + \frac{3180.96}{68.4} \\
&+ \frac{204.49}{17.3} + \frac{14.44}{143.8} + \frac{345.96}{146.4} + \frac{.25}{125.5} \\
&+ \frac{54.76}{10.4} + \frac{2391.21}{85.9} + \frac{380.25}{87.5} + \frac{5745.64}{75.2} \\
={}& 325.15
\end{aligned}
$$
Substituting the value of χ² and N in Pearson's formula

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$

where C represents the Coefficient of Mean Square Contingency,
we get,

$$C = \sqrt{\frac{325.15}{1000 + 325.15}} = .495$$

Thus, the coefficient of contingency is .495 which indicates that
the association between the stature of parent and the stature of offspring
is significant. From inspection of the table, the contingency is positive
which means that there is resemblance between the statures of the two.
It should be remembered that in a 4 × 4 fold table the maximum
value of C = .866.
If C is calculated by the formula C = √((S-N)/S) we shall get the same
value as obtained by the above method.
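The whole of Example 8 can be repeated in a few lines. The sketch below is an illustration with assumed variable names; it keeps the expected frequencies unrounded, so the value of Chi-Square it gives differs slightly from the hand calculation, which rounds each expected frequency to one decimal place. It also computes C from S to show that both formulas give the same value.

```python
import math

observed = [
    [20, 30, 20, 2],      # very tall offspring
    [14, 125, 85, 12],    # tall offspring
    [3, 140, 165, 125],   # medium offspring
    [3, 37, 68, 151],     # short offspring
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Theoretical frequencies under independence: row total x column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

pairs = [(f, f1) for row, erow in zip(observed, expected) for f, f1 in zip(row, erow)]
chi2 = sum((f - f1) ** 2 / f1 for f, f1 in pairs)
s = sum(f * f / f1 for f, f1 in pairs)

c = math.sqrt(chi2 / (n + chi2))     # Pearson's formula
c_from_s = math.sqrt((s - n) / s)    # the same coefficient from S
print(round(chi2, 1), round(c, 3), round(c_from_s, 3))
```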
Tschuprow's coefficient
Since the Pearsonian coefficient of mean square contingency
does not reach the maximum limit of 1 and since this is a drawback
Tschuprow has suggested the coefficient T. It is calculated as follows:

$$T = \sqrt{\frac{C^2}{(1 - C^2)\sqrt{(s-1)(t-1)}}}$$

In Example No. 8 above the value of T would be

$$T = \sqrt{\frac{.495^2}{(1 - .495^2)\sqrt{(4-1)(4-1)}}} = \sqrt{\frac{.245}{.755 \times 3}} = \sqrt{\frac{.245}{2.265}} = \sqrt{.108} = .329$$

Thus both the coefficients indicate that there is association between
the statures of the parent and the offspring.
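The calculation of T can be sketched as below. The function name `tschuprow_t` is hypothetical, and the form of T in terms of C, s and t is assumed from the worked figures of Example No. 8, namely T² = C² / ((1 - C²) √((s-1)(t-1))).

```python
import math


def tschuprow_t(c, s, t):
    """Tschuprow's T from Pearson's C for an s x t table (assumed form, see lead-in)."""
    c2 = c * c
    return math.sqrt(c2 / ((1 - c2) * math.sqrt((s - 1) * (t - 1))))


t_value = tschuprow_t(0.495, 4, 4)
print(round(t_value, 3))
```

Since C²/(1 - C²) equals φ², the same value is obtained from φ² divided by √((s-1)(t-1)).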
Questions
1. How would you distinguish between "Association" and "Correlation"
as the terms are used in statistics? (M. Com., Allahabad, 1948).
2. "99% of the people who drink wine die before they reach the age of 80
years. Therefore taking of wine is bad for long life." Do you agree with the
above conclusion? If not, why? Give reasons in support of your answer.
3. What is meant by (i) Criterion of independence (ii) Complete association
(iii) Complete disassociation?
4. How would you measure the intensity of association of two attributes?
5. What is meant by partial association? Illustrate how partial association
can give rise to misleading conclusions.
6. How would you study association in contingency tables? Point out the
chief drawbacks of the Pearsonian coefficient of mean square contingency.
7. Investigate the association between darkness of eye-colour in father and son
from the following data:
Fathers with dark eyes and sons with dark eyes            ...
Fathers with dark eyes and sons with not dark eyes        190
Fathers with not dark eyes and sons with dark eyes        890
Fathers with not dark eyes and sons with not dark eyes    7,820
8. Can vaccination be regarded as a preventive measure for small-pox from
the data given below?
Of 1,482 persons in a locality exposed to small-pox, 368 in all were attacked.
Of 1,482 persons, 343 had been vaccinated and of these only 35 were attacked.
(M. Com., Allahabad, 1944).
9. In an anti-malarial campaign in a certain area, quinine was administered
to 812 persons out of a total population of 3,248. The number of fever cases is
shown below:
Treatment        Fever    No Fever
Quinine            20        792
No-quinine        220      2,216
Discuss the usefulness of quinine in checking malaria. (P. C. S., 1941).
10. Can it be concluded from the data given in the following table that
... feeding is conducive to malocclusion of teeth?
Malocclusion of the Teeth in Infants

                 Normal Teeth    Malocclusion

11. Do you find any association between the tempers of brothers and sisters
from the following data:
Good natured brothers and good natured sisters    1230
Good natured brothers and sullen sisters           850
Sullen brothers and good natured sisters            30
Sullen brothers and sullen sisters                 580
12. Explain the method of finding association between two attributes. Out of
70 thousand literates in a particular district of India, number of criminals was 100.
Out of 930 thousand illiterates in the same district, number of criminals was 15 thousand.
On the basis of these figures, do you find any association between illiteracy and
criminality? (M. A., Agra, 1941).
13. (a) Write a short note on the use of Coefficient of Association in analysing
economic statistics.
(b) From the figures given in the following table, compare the association
between literacy and unemployment in rural and urban areas, and give reasons for the
difference, if any :-
                                    Urban        Rural
Total Adult Males                   25 lakhs     200 lakhs
Literate Males                      10   "        40   "
Unemployed Males                     5   "        12   "
Literate and unemployed Males        3   "         4   "
(M. A., A''''., 1937).
(M. A., Patna, 1943).
14. The following table gives the number of literates and criminals in three
cities of U. P.
                                     Kanpur   Allahabad   Agra
Total number (thousands)              244       184        230
Literates (in thousands)               40        47         H
Literate criminals (in hundreds)        3         2          2
Illiterate criminals (in hundreds)     40        20         24
Compare the degree of association between criminality and illiteracy in each of
the three towns. (M. A., Allahabad, 1944).
15. A census revealed the following figures of the blind and the insane in two
age-groups in a certain population.
Age-group Age-group
15-25 years Over 75 years
Total population 2,70,000 1,60,200
Number of blind 1,000 2,000
Number of insane 6,000 1,000
Number of insane among the blind 19 9
(a) Obtain a measure of the association between blindness and insanity in each
of the two age groups.
(b) Do you consider that blindness and insanity are associated or disassociated
with each other in the two age-groups, or more in one age-group than in the other?
(U. P. C. S., 1948).
(M. Com., Allahabad, 1950).
16. Calculate the co-efficient of association between extravagance in father and
sons from the following data :
Extravagant fathers with extravagant sons 327
Extravagant fathers with miserly sons 545
Miserly fathers with extravagant sons 741
Miserly fathers with miserly sons 255
(M. A., Lucknow, 1947).
17. In December 1897, there was an outbreak of plague in a jail in Bombay.
Of 127 persons who were uninoculated, 10 contracted plague, 6 of them dying. Of
147 persons who were inoculated, 3 contracted plague and there were no deaths.
Trace the association between (a) inoculation and contracting the disease (b)
inoculation and mortality among persons who have contracted the disease.
18. The following table shows the distribution of the temper in pairs of sisters
in an exhaustive school enquiry :-
                         First Sister
Second Sister     Good natured   Sullen   Total
Good natured          1040         180     1220
Sullen                 160         120      280
Total                 1200         300     1500

Trace the association, if any, in the distribution of tempers in first sisters and
second sisters. (M. Com., Raj., 1952).
19. Find out the coefficient of association between the type of college training
and success in teaching from the following table :-
Institution          Successful   Unsuccessful   Total
Teachers' College        58            42          100
University               49            51          100
Total                   107            93          200

(M. A., Allahabad, 1950).


20. The following table shows the association, among 1000 school boys, between
their general ability and their mathematical ability. Calculate the coefficient of
contingency between the two.
                              General ability
Mathematical ability     Good     Fair     Poor
Good                      44       22        4
Fair                     265      257      178
Poor                      41       91       98

(M. A. (Maths), Punjab, 1945.)

21. Given the following contingency table for Hair colour and Eye colour,
find the value of C. Is there good association between the two?
                     Hair colour
Eye colour     Fair    Brown    Black    Total
Blue            15       5       20       40
Grey            20      10       20       50
Brown           25      15       20       60
Total           60      30       60      150

22. The following table shows the association among 1000 criminals between
their weight and mentality. Calculate the coefficient of contingency between the two:
                              Weights in lbs.
Mentality     90-120   120-130   130-140   140-150   150 upward   Total
Normal          50       102       198       210        240         800
Weak            30        38        72        30         30         200
Total           80       140       270       240        270        1000
23. The data in the following table were obtained in a cross between a rust
resistant and a susceptible variety of oats. The F3 families were compared
for reaction to rust in the seedling stage, and in the field under ordinary epidemic
conditions.
Classification of Seedling and Field Reactions of 900 F3
Families of Oats
Seedling Reaction
Field Reaction
Resistant Segregating Susceptible

Resistant 112 51 37
Segregating 47 404 49
Susceptible 13 In

IZ

Test the significance of the association in the table and calculate the coefficient of
contingency.
24. 1000 subjects of English, French, German, Italian and Spanish nationality
were asked to name their preference among the music of those five nationalities. The
results were as follows :-
Nationality of music preferred

J
English French German Italian Spanish Total
..... Engl~sh
-
:g"
Ul
32 16 75 47 30 zoo
----
French 10 67 4z 41 40 zoo
'0 ,
Gerrnt lZ z3 107 36 Z2 zco
~ Italian
.-------
16 ~
Zo 44 76 44
-.-
zoo
'D
S
Spanish
~ 30 66
01 8 zoo
Z H 43
Total 78 it 179 \
29 8 243 202 1000

Discuss the association between the nationality of the subject and the nationality
of the music preferred.
25. Examine critically the following statements and the inferences drawn.
State whether the inference is or is not valid in each case, giving reasons. It may be
that to test the validity of the inference in any case, some additional information is
needed; in that case state what this additional information is and how it should be
analysed to examine the truth of the inference drawn.

(a) Statement: Eighty-five per cent of the girls reading in University B are
short-sighted and wear glasses whereas only 25 per cent of the boys have this eye
defect and wear glasses.

Inference: There is a strong association between eye condition (defective or not)
and sex, with short-sightedness being almost a characteristic of the female sex.
(b) Statement: The acreage under food and cash crops in a tahsil for the years
1953-1955 was as shown below :-
Year        Acreage (in units of a thousand acres)
            Food crops     Cash crops
1953           100             20
1954           120             38
1955           150             62

Inference: During recent years there has been a tendency to convert the land
under food crops to cash crops.
(c) Statement: Out of all the 600 children of a school, who were vaccinated,
only 20 were attacked with tuberculosis. Out of all the 300 children of another school,
none of whom were vaccinated, 30 were attacked with tuberculosis.

Inference: Vaccination has some effect in providing an immunity against
tuberculosis. (I. A. S., 1957).
26. Explain clearly what you understand by association and illusory association.
In a state with a total population of 70,000 adults, H.OOO are males and out of a
total of 6,000 graduates, 700 are females. Out of 1,200 graduate employees of the
state, 200 are females. Is there any sex bias in education among people? The State
holds that no distinction is made in appointments in respect of sex. How far is their
claim substantiated by the data given above? (P. C. S., 1953).
27. The male population of a certain state in India is HI lakhs. The number
of literate males is 66 lakhs and the number of male criminals is H thousands. If the
number of literate male criminals is 6 thousands, calculate the co-efficient of association
between literacy and criminality in this state. (P. C. S., 1955).
28. When are two attributes said to be (a) independent, (b) positively associated,
(c) negatively associated? What are the conditions to be satisfied by class
frequencies in each of the above cases?
Investigate the association between darkness of eye-colour in father and son
from the following data :-

Combination

Fathers with dark eyes and sons with dark eyes
Fathers with dark eyes and sons with not dark eyes
Fathers with not dark eyes and sons with dark eyes
Fathers with not dark eyes and sons with not dark eyes

What would have been the frequency of 'fathers with dark eyes and sons with
dark eyes', for the same total number, had there been complete independence?
(I. A. S., I9H).
29. In a town of about 1,00,000 population, 52,000 males and 48,000
females were distributed as follows :-
                                  Males             Females
                              (In thousands)     (In thousands)
Educated and employed               38                  6
Educated and unemployed              2                 14
Uneducated and employed              8                 18
Uneducated and unemployed            4                 10
Is there any connection between education and employment in the two groups
as well as in the total population?
Interpolation 20
Meaning. Ordinarily statistical data relating to the values of two
interrelated variables are not available in the shape of continuous
series. Such data are found in the form of discrete series, so that
corresponding values of the y-variable are available only for some given
values of the x-variable. Sometimes it becomes necessary to obtain the value
of the y-variable for a particular value of the x-variable which is not
available in the given series. The process of estimating such unknown values
of the y-variable corresponding to particular values of the x-variable is
called Interpolation.
Suppose the interrelated values of two variables x and y are
as follows :-

x        y
1       3.2
2       4.1
3       5.6
4       6.8
5       7·'
6       7.9

In the above table we have given some values of the x-variable and
the corresponding values of the y-variable. Thus if x is 2, y is 4.1; if x
is 3, y is 5.6. Here we do not know what the value of the y-variable would
be if the value of the x-variable is 2.4. The technique of estimating the value
of y when x is 2.4 is called interpolation. Interpolation can be
done only under certain assumptions, which we shall discuss a little later.
We can then define interpolation as the technique of obtaining the most
likely estimate of a certain quantity under certain assumptions. In interpolation
the value of x for which the corresponding value of y is to be estimated
always lies within the lowest and highest values of the x series. Thus
in the above table if we have to find the value of y when
the value of x is 8 or 10, the process would not be called interpolation,
because these values of x lie outside the minimum and maximum values of
the x series. In such cases the term used is Extrapolation.
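The distinction can be sketched in a few lines of Python. The sketch assumes the simplest possible rule, a straight line between the two tabulated neighbours of x = 2.4 (the points x = 2, y = 4.1 and x = 3, y = 5.6 from the table above). The function name `linear_interpolate` is purely illustrative, and the methods actually developed in this chapter follow later.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    if not (min(x0, x1) <= x <= max(x0, x1)):
        # Outside the tabulated range the estimate would be an extrapolation.
        raise ValueError("x lies outside the given values: this is extrapolation")
    fraction = (x - x0) / (x1 - x0)   # share of the interval already covered
    return y0 + fraction * (y1 - y0)


# Tabulated neighbours of x = 2.4, taken from the table above.
estimate = linear_interpolate(2, 4.1, 3, 5.6, 2.4)
print(round(estimate, 2))  # 4.7
```

Asking the same function for x = 8 raises an error, which mirrors the point made in the text: a value outside the lowest and highest x is a matter for extrapolation.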
Need. We have already mentioned the necessity of interpolation
in the calculation of median and mode in a continuous series.

When statistical data are available in the shape of class intervals and
class frequencies it becomes inevitable to use the technique of interpolation
for the calculation of the values of median and mode. It
will be remembered that interpolation is always done under certain
assumptions: in interpolating the median we assumed that the frequencies
of the median class are evenly distributed over the magnitude of that
class, and in interpolating the mode we assumed that the modal value is
affected by the values and frequencies of the adjoining classes. But it is
not only for the purpose of locating the median or mode that methods of
interpolation are used. The use of such methods is necessary in a large
variety of studies. As a matter of fact, whenever we have to fill up gaps
in statistical data the technique of interpolation has to be used. Gaps in
statistical data may arise due to various reasons. In many cases it is not
possible to collect the whole data about the problem under study. Even if
it were possible to collect the whole data it may not be worth while to do
so, due to the large expenditure involved or due to organisational
difficulties of a complex nature. A population census, for example,
is not conducted every year because it involves a huge expenditure of
money and there are considerable difficulties in organising it. As such,
population censuses are conducted only once in ten years in almost
all countries of the world. Now if we wish to know the population
of a country in between these census years we shall have to use the
technique of interpolation. In India the last census of population
was held in 1951 and prior to it in 1941. If we have to estimate the
population of India in 1940 or 1947 we shall have to use the technique of
interpolation. If, on the other hand, we wish to know the population
of India in 1954 or 1955, we shall have to extrapolate it because the last
available figure relates to 1951 only. Besides this, gaps in statistical
data may also arise because, for some special reason,
no statistics were collected in a particular week, month or year,
as the case may be. In some cases the collected data may have been
destroyed or lost, and in such cases also the technique of interpolation
has to be used to fill up the gaps in statistical information. In all such
cases where there are gaps in data we have only two alternatives before
us: (a) to fill up the gaps by imaginary figures, or by the most likely
figures according to our intuition or judgment, or (b) to fill up the gaps
by the most likely figures as estimated on the basis of the available data.
Obviously the latter alternative is better and is likely to give us a
dependable estimate.

Assumptions
As has been said above, interpolation of figures is done on certain
assumptions. They are as follows :-
(i) There are no sudden jumps in figures from one period to
another. In other words, the data are in the shape of a continuous
or smoothed curve. If, for example, we are interpolating the
figure of population of India in the year 1933 and we are given the
figures of Indian population in the years 1911, 1921, 1931, 1941 and 1951,
our presumption would be that the population in this country has
grown smoothly and there are no violent ups and downs in these
figures.
(ii) The second assumption is that the rate of change of the figures
is uniform.
It means that in our example of interpolation of Indian
population our second assumption would be that the rate of growth
of Indian population has been uniform throughout the period 1911 to
1951.

On the basis of the above assumptions missing figures can be
interpolated from a given set of data with a fair degree of accuracy. One
question that may arise here is about the extent of the accuracy of
interpolated figures. The accuracy of the interpolated figures actually
depends on two factors. The first is our knowledge about the
fluctuations in the figures. We can have an idea about fluctuations in the
figures by general inspection of the data available. If we feel that the
fluctuations of the given figures are regular and the first assumption of
interpolation is satisfied, the interpolated figure would be fairly accurate.
The second factor is our knowledge about the course of events relating
to the problem under investigation. If we have some external knowledge
about the special factors which have affected the particular phenomena
under study we can modify the interpolated figures in the light
of these factors and can thus have a more dependable estimate. Thus,
if we interpolate the population of India for the year 1947 and if we
know that due to the partition of the country the figure of population
was affected in that year, we can modify the interpolated figure and thus
make it more accurate.
It should not be forgotten that interpolated figures are only best
possible estimates under certain assumptions and that they are obtained
by methods which are entirely different from those by which actual
figures are got. Interpolated figures are not perfect substitutes for the
original figures. They are only the best possible substitutes on certain
hypotheses.

Methods of interpolation

Broadly speaking there are two types of methods of interpolation.
They are :-
(i) Graphic methods
(ii) Algebraic methods
We shall discuss these below.

GRAPHIC METHODS
Graphic Methods in continuous time series
If statistical data are available in sufficient quantity they can be
plotted on graph paper. A continuous smoothed curve can then
be drawn passing through the plotted points. This curve would
disclose the inter-relation of the two variables, and if we know the value
of one variable we can estimate the corresponding value of the other
variable. This method will become clear from the following example :-
Example 1. The following table gives the population of England
and Wales. The figures are for every twenty years beginning from
1811 :-

Year        Population
            (in crores)
1811          1.02
1831          1.39
1851          1.79
1871          2.27
1891          2.90
1911          3.61
1931          4.00
The above figures are plotted below in figure 1 :-

[Fig. 1. Population of England and Wales (in crores), plotted against the years 1811 to 1931.]
In the above figure the plotted points are very clear. The curve
is not obtained merely by joining the plotted points by straight lines,
for in that case it would not have been smooth. Now suppose
we have to interpolate the population figures for the years 1861 and
1881. For this we shall first locate these years on the x-axis, on which
the years are shown. From these points two ordinates shall be drawn
up to the curve, and from the points where they meet the curve lines
shall be drawn across to the y-axis. We can now read the values at the
points where these lines touch the y-axis. They would be the interpolated
figures of the population for the years 1861 and 1881. From the above graph
these figures are 2.0 crores and 2.6 crores for the years 1861 and 1881
respectively. For these years the actual figures as obtained from the
censuses were respectively 2.007 and 2.597 crores. It is clear that the
difference between the interpolated figures and the actual figures is not
much.

The curve drawn in the above figure is a freehand curve. We
can draw a mathematical curve also. For example, we can fit a
parabolic curve to the given data. In the chapter relating to Graphic
Presentation of Data we have already discussed the method of fitting
a parabolic curve. We shall discuss the mathematical implications of
the parabolic curve method again in this chapter when we deal with
the algebraic methods of interpolation.

If the data relating to population figures were not available for
all these years but for only two years, then also it would have been
possible to interpolate the figure for any year in between the two years
for which the figures were given. Suppose we are given the population
figures for only two years, i.e., 1911 and 1931. These figures are
3.61 crores and 4.00 crores respectively. Since we do not know the
figures of the other years the line joining them would be a straight line.
It would mean that the population figures from 1911 to 1931 increase
at a uniform rate. Now if we wish to know the population for the year
1921 we can follow the same method as discussed above and draw an
ordinate at the point where the x-axis reads 1921. Then from the point
where it touches the plotted line we shall draw another line across to the
y-axis and read the value. It is clear from ordinary arithmetic that the
population of 1921 would be equal to the mean of the population figures
of 1911 and 1931. In other words, it would be

$$\frac{3.61 + 4.00}{2} = 3.805 \text{ crores.}$$

The actual figure of population for 1921 as disclosed by the census was 3.789 crores.
Thus here again the difference between the interpolated and the actual figure
is not much. The error is only .42%.
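The arithmetic of the straight-line case can be checked with a short Python sketch. It assumes nothing beyond the three figures quoted in the text (3.61 and 4.00 crores for 1911 and 1931, and the census figure of 3.789 crores for 1921). The function name `interpolate_on_line` is illustrative only.

```python
def interpolate_on_line(year0, pop0, year1, pop1, year):
    """Read off the population for `year` from the straight line joining two points."""
    rate = (pop1 - pop0) / (year1 - year0)   # uniform yearly increase
    return pop0 + rate * (year - year0)


estimate_1921 = interpolate_on_line(1911, 3.61, 1931, 4.00, 1921)
actual_1921 = 3.789                          # census figure quoted in the text
error_pct = (estimate_1921 - actual_1921) / actual_1921 * 100

print(f"{estimate_1921:.3f} crores, error {error_pct:.2f}%")
# 3.805 crores, error 0.42%
```

Because 1921 lies midway between 1911 and 1931, the straight line gives exactly the mean of the two figures, as the text states.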

Graphic methods in serjes showing periodicity

The graphic method of interpolation can also be used in such cases
as disclose some type of periodicity. We have discussed
in the chapter on Analysis of Time Series the various types of periodic
movements which may be expected in a time series. Seasonal and
cyclical movements in time series are examples of periodic movements.
Thus prices of foodgrains always show a downward tendency at the
time of harvesting, and this is a periodic movement repeated at
the time of each harvest. Now if we have to interpolate graphically
the price of wheat at a particular period we can use this knowledge to
fill in the gaps in the given data in a better manner. Suppose the figure for the month of
March (which is the harvesting time of wheat) is missing. Now instead
of joining the points representing the prices of February and April by
a straight line we can draw a curve showing the downward tendency
during the month of March, because we know that in this month the
prices must have been lower than the prices of either February or April.
Similarly if there are cyclical movements in the series we can fill up the
gaps in the given figures more satisfactorily than is possible in cases
where no such tendency is obvious.
Graphic method for correlated series
If the two series in question are correlated, either positively or
negatively, and if some of the figures in any one of the two series are missing,
the graphic method of interpolation would give fairly accurate estimates
of the missing figures. The data relating to both the series would first
be plotted on graph paper. The series whose figures are complete
would give a complete curve and the one whose figures are missing
would naturally give an incomplete curve. Now since we know that
there is correlation between the two series we can confidently complete
the incomplete curve with a fair degree of accuracy. If there is positive
correlation between the two series the two curves should show similar
movements, and if the correlation is negative we shall complete the
incomplete curve in such a manner that the two curves show opposite
tendencies and move in reverse directions. Once the two curves are
plotted and smoothed it is easy to interpolate any missing figure. This
method can easily be applied to series relating to price and demand,
production and exports, retail prices and cost of living, etc.

ALGEBRAIC METHODS

Various algebraic methods have been developed by which
interpolation and extrapolation of figures can be done. The general
assumptions under which the interpolation of figures is done have already been
discussed earlier and all the formulae of interpolation which we shall
discuss below are based on them. The important algebraic methods
used for interpolation can be grouped in the following three categories :-
(i) Methods of curve fitting
(ii) Methods of finite differences
(iii) Methods applicable in case of unequal intervals of
arguments.
In each of these categories there are various types of formulae
that can be used for the purpose of interpolation. We shall discuss some
of them here.
METHODS OF CURVE FITTING

We know that various types of curves can be fitted to a statistical
series. If the movements in a series are of a uniform and regular type
we can fit a straight line by the method of least squares and thus
estimate the value of y for a given value of x. However, for the purpose
of interpolation the best curve is the parabolic one and here we shall
discuss only this curve.

Suppose there are two series x and y. We presume that the values
of y depend on the values of x in such a manner that when x is given y
can be estimated. In a parabolic curve, as we have seen in the chapter
on Analysis of Time Series, the relationship is of the type

$$y = a + bx + cx^2 + dx^3 + \dots$$

where a, b, c and d are constants to be determined. The equation
$y = a + bx + cx^2 + dx^3$ is a parabola of the third order. If we are given four
values of y, we can fit a parabola of the third order to the series. Similarly
if 8 values of y are available, we can fit a parabola of the 7th order
to such a series.

In the equation $y = a + bx + cx^2$ we can find the values of the
constants a, b and c from the data given, and if we then substitute
values of x in the equation we can estimate the corresponding values of y.
The following example will illustrate the above
method :-
Example 2. The following table gives the population of India :-

Years        Population
             (in crores)
 (x)            (y)
1911           30.3
1921           30.5
1931           33.8
1941           38.9

We have to interpolate the population of the year 1926.
In the above table we are given four values of the y-variable and
as such we can fit a parabola of the 3rd order. It would be of the following
type

$$y = a + bx + cx^2 + dx^3$$

Now if we can find the values of a, b, c and d in the above equation,
we can complete it and then it would be possible to interpolate
the population for the year 1926.
In this data we shall take the years as the x-series and the population
figures as the y-series. We have seen in the chapter on Analysis of Time
Series that in place of given values of x we can write down their deviations
from any point of origin. If we take 1926 as the year of origin
the deviations of the years 1911, 1921, 1931 and 1941 would be respectively
-15, -5, +5 and +15. We can further reduce their size by
dividing them by a common factor and writing them as -3, -1, +1
and +3 respectively. The deviation at the point of origin or 1926
would be 0. Thus the data given in the above example can be written as
follows :-

  x        y
 -3       30.3
 -1       30.5
  0       y0
 +1       33.8
 +3       38.9

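Now all these points are on the curve $y = a + bx + cx^2 + dx^3$ and
as such in place of x and y we can substitute the figures given. If this
is done we shall obtain the following five equations:

$$
\begin{aligned}
30.3 &= a + b(-3) + c(-3)^2 + d(-3)^3, &\text{or}\quad 30.3 &= a - 3b + 9c - 27d &&\text{(i)}\\
30.5 &= a + b(-1) + c(-1)^2 + d(-1)^3, &\text{or}\quad 30.5 &= a - b + c - d &&\text{(ii)}\\
y_0 &= a + b(0) + c(0)^2 + d(0)^3, &\text{or}\quad y_0 &= a &&\text{(iii)}\\
33.8 &= a + b(1) + c(1)^2 + d(1)^3, &\text{or}\quad 33.8 &= a + b + c + d &&\text{(iv)}\\
38.9 &= a + b(3) + c(3)^2 + d(3)^3, &\text{or}\quad 38.9 &= a + 3b + 9c + 27d &&\text{(v)}
\end{aligned}
$$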
According to equation (iii) $y_0 = a$, and as such we shall find
the value of a, which would be equal to the value of $y_0$ which has to be
interpolated.
We can now utilise equations (i), (ii), (iii), (iv) and (v) given above
to calculate the value of a.
Adding equations (ii) and (iv) we get
$$64.3 = 2a + 2c \qquad \text{(vi)}$$
Adding equations (i) and (v) we get
$$69.2 = 2a + 18c \qquad \text{(vii)}$$
Multiplying equation (vi) by 9 we get
$$578.7 = 18a + 18c \qquad \text{(viii)}$$
Subtracting equation (vii) from (viii) we get
$$509.5 = 16a, \quad\text{or}\quad a = \frac{509.5}{16} = 31.8 \text{ crores}$$
Thus the interpolated figure of the population of India for the year
1926 is 31.8 crores.
The most important drawback of this method is that when the
number of items is large it results in too many equations which have to
be solved simultaneously. It is a difficult task to solve a large number
of equations and as such this method should be used only in those cases
where the number of items is small. Judged from the purely mathematical
point of view this method is probably the best because it can be
applied in all types of continuous series.
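The result of Example 2 can be cross-checked by a short Python sketch. Two routes are shown. The first repeats the elimination used in the text, in which 9 × (ii + iv) − (i + v) leaves 16a. The second is a swapped-in technique, not the one used in the text: the Lagrange form of the same third-order parabola, which passes through the same four points and therefore must give the same value at x = 0. The function name `lagrange_value` is illustrative only.

```python
def lagrange_value(xs, ys, x):
    """Value at x of the unique polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


coded_years = [-3, -1, 1, 3]             # (year - 1926) / 5, as in the text
population = [30.3, 30.5, 33.8, 38.9]    # crores, from Example 2

# Route 1: the elimination of the text, 9*(ii + iv) - (i + v) = 16a.
a_text = (9 * (30.5 + 33.8) - (30.3 + 38.9)) / 16

# Route 2 (assumed alternative): evaluate the same cubic at x = 0 directly.
a_direct = lagrange_value(coded_years, population, 0)

print(round(a_text, 2), round(a_direct, 2))  # 31.84 31.84
```

Both routes give about 31.84 crores, which the text rounds to 31.8. The Lagrange form avoids solving simultaneous equations, which is the drawback noted above when the number of items is large.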
METHOD OF FINITE DIFFERENCES

To illustrate the meaning of "differences" let us take the following
table in which the logarithms of some numbers are given :-

Number     Logarithm     Differences
  (1)         (2)            (3)
  100        2.0000
  101        2.0043        +.0043
  102        2.0086        +.0043
  103        2.0129        +.0043

The differences in column 3 have been calculated by subtracting the
logarithm of a figure from the logarithm of the immediately succeeding
figure. If we have to find out the logarithm of 101.5 we can do so by
simple interpolation as follows :-

Log 101           =  2.0043
.5 × .0043        = +.00215
                     -------
Log 101.5         =  2.00645
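The simple interpolation of log 101.5 amounts to adding a proportional part of the tabulated difference. The Python sketch below repeats that step and, as an added check that is not part of the text, compares the answer with the logarithm computed directly by the standard library.

```python
import math

log_101 = 2.0043        # tabulated logarithm of 101
difference = 0.0043     # tabulated first difference (log 102 - log 101)
fraction = (101.5 - 101) / (102 - 101)   # 101.5 lies half-way between 101 and 102

log_101_5 = log_101 + fraction * difference
print(round(log_101_5, 5))            # 2.00645
print(round(math.log10(101.5), 5))    # 2.00647, for comparison
```

The two values agree to four decimal places, which is the accuracy of the table itself.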
In the above table the figures were tabulated very closely and
the difference was equal. In many cases the differences may not be equal
and consequently it may not be possible to interpolate figures by this
simple method. In such cases we have to proceed to higher differences
in the hope that they would ultimately be equal and may vanish at some
stage. In other words, we presume that the differences are finite and are
capable of being eliminated.
Now take the following table relating to the squares of certain
numbers :-

No.    Square    1st Difference    2nd Difference    3rd Difference
 1        1           +3                +2                  0
 2        4           +5                +2
 3        9           +7
 4       16

(Each difference is entered against the earlier of the two figures from which it is obtained.)
In this case the first differences are not equal but the second
differences are constant and consequently the third differences vanish.
In practice the differences are indicated by the sign $\Delta$. Thus first
differences would be indicated by $\Delta^1$, second differences by $\Delta^2$, third
differences by $\Delta^3$ and so on. The first difference in each column is called
the Leading Difference. Thus the leading differences in the above table are
+3, +2 and 0. If we are given the leading term (the first figure in the
column of squares) and the leading differences we can build up the whole
table. Thus 2+0 (second and third leading differences) is equal to 2,
which is the value of the second entry in the second difference column;
3+2 is equal to 5, which is the value of the second entry in the first
difference column. In this way we can find out all the differences, and
when the differences of the first column are obtained we can easily find
out the values of the various terms.
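The building up of a table from its leading term and leading differences can be sketched in Python as follows. The function names `difference_table` and `rebuild` are illustrative only. The data are the squares 1, 4, 9, 16 from the table above.

```python
def difference_table(values):
    """Return [values, first differences, second differences, ...]."""
    columns = [list(values)]
    while len(columns[-1]) > 1:
        previous = columns[-1]
        columns.append([b - a for a, b in zip(previous, previous[1:])])
    return columns


def rebuild(leading_term, leading_differences, count):
    """Rebuild `count` terms from the leading term and the leading differences."""
    # state[0] is the current term, state[k] the current k-th difference.
    state = [leading_term] + list(leading_differences)
    terms = []
    for _ in range(count):
        terms.append(state[0])
        # Each entry grows by the difference of the next higher order.
        for k in range(len(state) - 1):
            state[k] += state[k + 1]
    return terms


squares = [1, 4, 9, 16]
table = difference_table(squares)
leading = [column[0] for column in table[1:]]
print(leading)                  # [3, 2, 0]
print(rebuild(1, leading, 4))   # [1, 4, 9, 16]
```

Since the third difference is zero, the same leading figures also continue the series beyond the table (25, 36 and so on), which is extrapolation rather than interpolation.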

For further studies we shall indicate the various differences as
follows :-

$$
\begin{array}{cc|ccccc}
x & y & \Delta^1 & \Delta^2 & \Delta^3 & \Delta^4 & \Delta^5 \\
\hline
x_0 & y_0 & \Delta^1_0 & \Delta^2_0 & \Delta^3_0 & \Delta^4_0 & \Delta^5_0 \\
x_1 & y_1 & \Delta^1_1 & \Delta^2_1 & \Delta^3_1 & \Delta^4_1 & \\
x_2 & y_2 & \Delta^1_2 & \Delta^2_2 & \Delta^3_2 & & \\
x_3 & y_3 & \Delta^1_3 & \Delta^2_3 & & & \\
x_4 & y_4 & \Delta^1_4 & & & & \\
x_5 & y_5 & & & & &
\end{array}
$$

(Each difference is entered against the earlier of the two figures from which it is obtained.)


It is obvious from the above table that there is a certain relationship
between the various differences and consequently in the values of y.
If we know the values of y we can calculate the differences and conversely
if we know the values of the various differences we can find out
the values of y. Thus, the relationship is of the following type :-

$$
\begin{aligned}
\Delta^1_0 &= y_1 - y_0 \\
\Delta^2_0 &= \Delta^1_1 - \Delta^1_0 \\
&= (y_2 - y_1) - (y_1 - y_0) \\
&= y_2 - 2y_1 + y_0 \\
\Delta^3_0 &= \Delta^2_1 - \Delta^2_0 \\
&= (\Delta^1_2 - \Delta^1_1) - (\Delta^1_1 - \Delta^1_0) \\
&= \Delta^1_2 - \Delta^1_1 - \Delta^1_1 + \Delta^1_0 \\
&= (y_3 - y_2) - (y_2 - y_1) - (y_2 - y_1) + (y_1 - y_0) \\
&= y_3 - y_2 - y_2 + y_1 - y_2 + y_1 + y_1 - y_0 \\
&= y_3 - 3y_2 + 3y_1 - y_0
\end{aligned}
$$

From the above relationships it is clear that

$$
\begin{aligned}
y_0 &= y_0 \\
y_1 &= y_0 + \Delta^1_0 \\
y_2 &= y_1 + \Delta^1_1 \\
&= (y_0 + \Delta^1_0) + (\Delta^2_0 + \Delta^1_0) \\
&= y_0 + 2\Delta^1_0 + \Delta^2_0 \\
y_3 &= y_2 + \Delta^1_2 \\
&= (y_0 + 2\Delta^1_0 + \Delta^2_0) + (\Delta^2_1 + \Delta^1_1) \\
&= (y_0 + 2\Delta^1_0 + \Delta^2_0) + (\Delta^3_0 + \Delta^2_0) + (\Delta^2_0 + \Delta^1_0) \\
&= y_0 + 3\Delta^1_0 + 3\Delta^2_0 + \Delta^3_0
\end{aligned}
$$
The numerical coefficients in the expressions for $y_0$, $y_1$, $y_2$ and $y_3$
are respectively as follows :-
1
1 + 1
1 + 2 + 1
1 + 3 + 3 + 1
These are the terms of the expansion of the following binomials:
$(1+1)^0$, $(1+1)^1$, $(1+1)^2$ and $(1+1)^3$. We can now have a generalization
as follows :-

$$y_x = y_0 + x\,\Delta^1_0 + \frac{x(x-1)}{1 \times 2}\,\Delta^2_0 + \frac{x(x-1)(x-2)}{1 \times 2 \times 3}\,\Delta^3_0 + \dots$$

This important equation is called Newton's formula.
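For whole-number values of x the coefficients in Newton's formula are the ordinary binomial coefficients, so the formula can be checked against a table whose terms are already known. The Python sketch below does this for the table of squares 1, 4, 9, 16, whose leading term is 1 and whose leading differences are 3, 2 and 0. The function name `term_from_leading` is illustrative only.

```python
from math import comb


def term_from_leading(y0, leading_differences, n):
    """y_n = y0 + C(n,1)*D1 + C(n,2)*D2 + ... for a whole number n."""
    total = y0
    for k, diff in enumerate(leading_differences, start=1):
        total += comb(n, k) * diff   # comb(n, k) = n(n-1)...(n-k+1) / k!
    return total


# Leading term and leading differences of the table of squares.
y0, leading = 1, [3, 2, 0]
print([term_from_leading(y0, leading, n) for n in range(4)])  # [1, 4, 9, 16]
```

For n = 3 the coefficients are 1, 3, 3, 1, exactly as in the expression for the fourth term derived above.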
The above formula of Newton should be used when the figure
to be interpolated is in the beginning of the table. The reason for
this is that in this formula we take only leading differences into account
and leading differences, as we have seen, are always in the beginning.
The value of x in the above formula is equal to the value of x for
which we have to interpolate the value of y, minus the value of $x_0$, divided
by the difference between adjoining values of the x series. It means that
this formula can be applied only in those cases where the x series advances
by equal intervals. There are a number of formulae of interpolation
based on finite differences :-

Newton's formula
We have already shown above how Newton's formula is derived
and how it is based on binomial expansion. We give below examples
to illustrate the use of this formula.

Example 3. Estimate, by Newton's method of interpolation, the
expectation of life at age 22 from the following data :-
Age                              10     15     20     25     30     35
Expectation of life (in years)   35.4   32.2   29.1   26.0   23.1   20.4
Solution. Interpolation of the expectation of life at age 22 by
Newton's method.

 Age      Expectation                  Differences
 (x)      of life in      First   Second   Third   Fourth   Fifth
          years (y)        Δ¹      Δ²       Δ³      Δ⁴       Δ⁵
 10 x0    35.4 y0
                          -3.2
 15 x1    32.2 y1                 +0.1
                          -3.1             -0.1
 20 x2    29.1 y2                  0.0              +0.3
                          -3.1             +0.2              -0.5
 25 x3    26.0 y3                 +0.2              -0.2
                          -2.9              0.0
 30 x4    23.1 y4                 +0.2
                          -2.7
 35 x5    20.4 y5

$$x = \frac{\text{Age of interpolation} - \text{Age of origin}}{\text{Distance between adjoining ages}} = \frac{22-10}{15-10} = \frac{12}{5} = 2.4$$
Substituting the given values in Newton's formula of interpolation
$$y_x = y_0 + x\,\Delta^{1}_{0} + \frac{x(x-1)}{1\times 2}\,\Delta^{2}_{0} + \frac{x(x-1)(x-2)}{1\times 2\times 3}\,\Delta^{3}_{0} + \frac{x(x-1)(x-2)(x-3)}{1\times 2\times 3\times 4}\,\Delta^{4}_{0} + \frac{x(x-1)(x-2)(x-3)(x-4)}{1\times 2\times 3\times 4\times 5}\,\Delta^{5}_{0}$$
where y_x represents the thing to be interpolated, that is, the expectation
of life at age 22 in this case.
We get
$$\begin{aligned}
y_x = 35.4 &+ (2.4 \times -3.2) + \frac{2.4(2.4-1)}{1\times 2}\times 0.1 + \frac{2.4(2.4-1)(2.4-2)}{1\times 2\times 3}\times(-0.1) \\
&+ \frac{2.4(2.4-1)(2.4-2)(2.4-3)}{1\times 2\times 3\times 4}\times 0.3 + \frac{2.4(2.4-1)(2.4-2)(2.4-3)(2.4-4)}{1\times 2\times 3\times 4\times 5}\times(-0.5)
\end{aligned}$$
or y_x = 35.4 - 7.68 + .168 - .0224 - .01008 - .005376
or y_x = 27.85 years
Thus, the expectation of life at age 22 is 27.85 years.
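The arithmetic of Example 3 can be checked mechanically. The following is only a minimal sketch in Python, not part of the original text; the function names `leading_differences` and `newton_forward` are chosen here for illustration, and the x series is assumed to advance by equal intervals as the formula requires.

```python
def leading_differences(ys):
    """Return [y0, Δ¹0, Δ²0, ...], the leading differences of equally spaced values."""
    leading = [ys[0]]
    row = list(ys)
    while len(row) > 1:
        # each new row holds the differences of adjoining items of the previous row
        row = [b - a for a, b in zip(row, row[1:])]
        leading.append(row[0])
    return leading


def newton_forward(xs, ys, x_target):
    """Newton's formula: y0 + x·Δ¹0 + x(x-1)/2!·Δ²0 + ... (xs equally spaced)."""
    x = (x_target - xs[0]) / (xs[1] - xs[0])
    total, coeff = 0.0, 1.0  # coeff holds x(x-1)...(x-k+1)/k!
    for k, diff in enumerate(leading_differences(ys)):
        total += coeff * diff
        coeff *= (x - k) / (k + 1)
    return total


# Data of Example 3 (age, expectation of life in years)
ages = [10, 15, 20, 25, 30, 35]
life = [35.4, 32.2, 29.1, 26.0, 23.1, 20.4]
print(round(newton_forward(ages, life, 22), 2))  # 27.85
```

The loop builds each binomial coefficient from the previous one, so the number of terms used always equals the number of given values.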
Example 4. Find out by interpolation from the following data
the number of workers earning Rs. 24 or more but less than Rs. 25.
Earning less than (Rs.)     Number of workers
        20                        296
        25                        599
        30                        804
        35                        918
        40                        966
Solution. Estimation of the number of workers earning less than
Rs. 24 by Newton's method.

 Earning in Rs.       Number of               Differences
      (x)             workers (y)    First   Second   Third   Fourth
                                      Δ¹      Δ²       Δ³      Δ⁴
 Less than 20 x0       296 y0
                                     +303
     ,,    25 x1       599 y1                 -98
                                     +205              +7
     ,,    30 x2       804 y2                 -91              +18
                                     +114              +25
     ,,    35 x3       918 y3                 -66
                                     +48
     ,,    40 x4       966 y4

$$x = \frac{\text{Earning of interpolation} - \text{Earning of origin}}{\text{Difference between two adjoining earnings}} = \frac{24-20}{25-20} = \frac{4}{5} = .8$$
Substituting the above values in Newton's formula of interpolation
$$y_x = y_0 + x\,\Delta^{1}_{0} + \frac{x(x-1)}{1\times 2}\,\Delta^{2}_{0} + \frac{x(x-1)(x-2)}{1\times 2\times 3}\,\Delta^{3}_{0} + \frac{x(x-1)(x-2)(x-3)}{1\times 2\times 3\times 4}\,\Delta^{4}_{0}$$
where y_x represents the thing to be interpolated, that is, the number
of workers earning less than Rs. 24, in this case.
We get,
$$\begin{aligned}
y_x = 296 &+ (.8 \times 303) + \frac{.8(.8-1)}{1\times 2}\times(-98) + \frac{.8(.8-1)(.8-2)}{1\times 2\times 3}\times 7 \\
&+ \frac{.8(.8-1)(.8-2)(.8-3)}{1\times 2\times 3\times 4}\times 18
\end{aligned}$$
= 296 + 242.4 + 7.84 + .224 - .3168
= 546
Thus the number of workers earning less than Rs. 24 = 546
and    ,,     ,,     ,,     ,,     ,,     less than Rs. 25 = 599
Therefore, the number of workers earning Rs. 24 or more but less
than Rs. 25 is (599 - 546) = 53.
Newton-Gauss formula

Another formula which is based on finite differences is given by
Newton and Gauss. It is suitable when the figure to be interpolated
is in the middle of the table. This formula is as follows :-
$$y_x = y_0 + x\,\Delta^{1}y_{0} + \frac{x(x-1)}{1\times 2}\,\Delta^{2}y_{-1} + \frac{(x+1)\,x\,(x-1)}{1\times 2\times 3}\,\Delta^{3}y_{-1} + \frac{(x+1)\,x\,(x-1)(x-2)}{1\times 2\times 3\times 4}\,\Delta^{4}y_{-2} + \dots$$

The following example would illustrate the formula :-

Example 5. Estimate the value of y if x is 3.75 from the following
table :-
   x        y
  2.5     24.145
  3.0     22.043
  3.5     20.225
  4.0     18.644
  4.5     17.262
  5.0     16.047

Solution. In this method y0 is generally the value of the item
immediately preceding the interpolated value :-

   x        y                              Differences
                          First           Second          Third           Fourth
                           Δ¹              Δ²              Δ³              Δ⁴
  2.5   24.145  y₋₂
                        -2.102 Δ¹y₋₂
  3.0   22.043  y₋₁                     +.284 Δ²y₋₂
                        -1.818 Δ¹y₋₁                    -.047 Δ³y₋₂
  3.5   20.225  y₀                      +.237 Δ²y₋₁                     +.009 Δ⁴y₋₂
                        -1.581 Δ¹y₀                     -.038 Δ³y₋₁
  4.0   18.644  y₁                      +.199 Δ²y₀                      +.006 Δ⁴y₋₁
                        -1.382 Δ¹y₁                     -.032 Δ³y₀
  4.5   17.262  y₂                      +.167 Δ²y₁
                        -1.215 Δ¹y₂
  5.0   16.047  y₃

Newton-Gauss formula
$$y_x = y_0 + x\,\Delta^{1}y_{0} + \frac{x(x-1)}{1\times 2}\,\Delta^{2}y_{-1} + \frac{(x+1)\,x\,(x-1)}{1\times 2\times 3}\,\Delta^{3}y_{-1} + \frac{(x+1)\,x\,(x-1)(x-2)}{1\times 2\times 3\times 4}\,\Delta^{4}y_{-2}$$
$$x = \frac{3.75 - 3.5}{4.0 - 3.5} = \frac{.25}{.5} = .5$$
Substituting the values we get,
$$\begin{aligned}
y_x = 20.225 &+ (.5 \times -1.581) + \left\{\frac{.5(.5-1)}{2}\times .237\right\} + \left\{\frac{(.5+1)\,.5\,(.5-1)}{6}\times(-.038)\right\} \\
&+ \left\{\frac{(.5+1)\,.5\,(.5-1)(.5-2)}{24}\times .009\right\}
\end{aligned}$$
= 20.225 - .7905 - .029625 + .002375 + .000211
= 19.407 approx.
Thus the value of y for x = 3.75 is 19.407.
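The choice of differences in the Newton-Gauss formula (first from y0, second and third from y₋₁, fourth from y₋₂) is easy to get wrong by hand. The sketch below, in Python, is an illustration only and not part of the original text; the names `difference_table` and `gauss_forward`, the `origin` index argument and the `max_order` cut-off are assumptions made for this example.

```python
def difference_table(ys):
    """table[k][i] is the k-th difference starting at the i-th item (table[0] = ys)."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


def gauss_forward(xs, ys, origin, x_target, max_order=4):
    """Newton-Gauss (forward) formula with y0 = ys[origin]; xs equally spaced."""
    x = (x_target - xs[origin]) / (xs[1] - xs[0])
    table = difference_table(ys)
    total, coeff = ys[origin], 1.0
    for k in range(1, max_order + 1):
        # successive factors are x, (x-1), (x+1), (x-2), (x+2), ...
        offset = -(k // 2) if k % 2 == 0 else k // 2
        coeff *= (x + offset) / k
        i = origin - k // 2  # Δ¹y0, Δ²y-1, Δ³y-1, Δ⁴y-2, ...
        if i < 0 or k >= len(table) or i >= len(table[k]):
            break  # the table has no further difference of this kind
        total += coeff * table[k][i]
    return total


# Data of Example 5; y0 is the item at x = 3.5, i.e. index 2
xs = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [24.145, 22.043, 20.225, 18.644, 17.262, 16.047]
print(round(gauss_forward(xs, ys, origin=2, x_target=3.75), 3))  # 19.407
```

With `max_order=4` the sketch stops at the fourth difference, as the worked solution does.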
Stirling's formula
Another formula suitable for interpolation when the missing number
is in the middle of the table is given by Stirling. It is as follows :-
$$y_x = y_0 + x\,\frac{\Delta^{1}y_{-1} + \Delta^{1}y_{0}}{2} + \frac{x^2}{2}\,\Delta^{2}y_{-1} + \frac{x(x^2-1)}{6}\times\frac{\Delta^{3}y_{-2} + \Delta^{3}y_{-1}}{2} + \frac{x^2(x^2-1^2)}{24}\,\Delta^{4}y_{-2} + \dots$$
The following example would illustrate the formula :-
Example 6. Using Stirling's formula interpolate the value of y
when x = 9.
   x      y
   4     40
   6     43
   8     48
  10     52
  12     57

Solution. Interpolation of the value of y when x = 9.

   x      y                         Differences
                    First         Second        Third         Fourth
                     Δ¹            Δ²            Δ³            Δ⁴
   4    40  y₋₂
                   +3 Δ¹y₋₂
   6    43  y₋₁                  +2 Δ²y₋₂
                   +5 Δ¹y₋₁                    -3 Δ³y₋₂
   8    48  y₀                   -1 Δ²y₋₁                    +5 Δ⁴y₋₂
                   +4 Δ¹y₀                     +2 Δ³y₋₁
  10    52  y₁                   +1 Δ²y₀
                   +5 Δ¹y₁
  12    57  y₂

$$\text{Value of } x = \frac{9-8}{10-8} = .5$$
Stirling's formula is
$$y_x = y_0 + x\,\frac{\Delta^{1}y_{-1} + \Delta^{1}y_{0}}{2} + \frac{x^2}{2}\,\Delta^{2}y_{-1} + \frac{x(x^2-1)}{6}\times\frac{\Delta^{3}y_{-2} + \Delta^{3}y_{-1}}{2} + \frac{x^2(x^2-1^2)}{24}\,\Delta^{4}y_{-2}$$
Substituting the values, we get,
$$y_x = 48 + .5\times\frac{5+4}{2} + \frac{.25}{2}\times(-1) + \frac{.5(.25-1)}{6}\times\frac{-3+2}{2} + \frac{.25(.25-1)}{24}\times 5$$
= 48 + 2.25 - .125 + .03125 - .039
= 50.117.

If the above problem is solved by the Newton-Gauss method we shall
get the same answer.
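Stirling's formula up to the fourth difference can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the original text: the names `difference_table` and `stirling` are invented here, y0 is identified by its index `origin`, and at least two items are assumed to lie on either side of y0.

```python
def difference_table(ys):
    """table[k][i] is the k-th difference starting at the i-th item (table[0] = ys)."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


def stirling(xs, ys, origin, x_target):
    """Stirling's formula up to the fourth difference; xs equally spaced."""
    x = (x_target - xs[origin]) / (xs[1] - xs[0])
    t, m = difference_table(ys), origin
    d1 = (t[1][m - 1] + t[1][m]) / 2      # mean of Δ¹y-1 and Δ¹y0
    d2 = t[2][m - 1]                      # Δ²y-1
    d3 = (t[3][m - 2] + t[3][m - 1]) / 2  # mean of Δ³y-2 and Δ³y-1
    d4 = t[4][m - 2]                      # Δ⁴y-2
    return (ys[m] + x * d1 + x**2 / 2 * d2
            + x * (x**2 - 1) / 6 * d3
            + x**2 * (x**2 - 1) / 24 * d4)


# Data of Example 6; y0 is the item at x = 8, i.e. index 2
print(round(stirling([4, 6, 8, 10, 12], [40, 43, 48, 52, 57], 2, 9), 3))  # 50.117
```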

Newton-Gauss (Backward) formula

This formula is suitable for interpolation of a missing figure which
is at the end of a table. The formula is as follows :-
$$y_x = y_0 - x\,\Delta^{1}y_{-1} + \frac{x(x-1)}{2}\,\Delta^{2}y_{-1} - \frac{(x+1)\,x\,(x-1)}{6}\,\Delta^{3}y_{-2} + \frac{(x+1)\,x\,(x-1)(x-2)}{24}\,\Delta^{4}y_{-2}$$
In this formula y0 is the figure succeeding the missing figure and
x is equal to the difference between the unit of x for which y is to be
interpolated and the unit succeeding it, divided by the difference between
the two adjoining values of x. The following example would illustrate
the formula :-

Example 7. The following table gives the population of a town
during the last 6 censuses. Estimate the population for the year 1936.

 Year      Population (000)
 1901            12
 1911            15
 1921            20
 1931            27
 1941            39
 1951            52

Solution. Interpolation of the population for the year 1936.

 Year    Population                 Differences
          (000) y       First    Second    Third    Fourth
                         Δ¹       Δ²        Δ³       Δ⁴
 1901     12  y₋₄
                         +3
 1911     15  y₋₃                 +2
                         +5                  0
 1921     20  y₋₂                 +2                 +3
                         +7                 +3
 1931     27  y₋₁                 +5                 -7
                        +12                 -4
 1941     39  y₀                  +1
                        +13
 1951     52  y₁

$$x = \frac{1941 - 1936}{1951 - 1941} = \frac{5}{10} = .5$$
$$y_x = y_0 - x\,\Delta^{1}y_{-1} + \frac{x(x-1)}{2}\,\Delta^{2}y_{-1} - \frac{(x+1)\,x\,(x-1)}{6}\,\Delta^{3}y_{-2}$$
$$= 39 - (.5 \times 12) + \frac{.5 \times (-.5) \times 1}{2} - \frac{1.5 \times .5 \times (-.5) \times (-4)}{6}$$
= 39 - 6 - .125 - .250
= 32.625 thousands.

Direct Binomial Expansion

In some cases it is possible to find out the missing figure by directly
raising a binomial without finding out the differences. Such cases are
those in which not only the x series advances by equal intervals but the value
of x for which the value of y is to be interpolated is one of the class limits
of the x series. Suppose we are given the following data :-

   x      y
  10      4   y0
  20      5   y1
  30      ?   y2
  40      7   y3
  50      8   y4
In the above table the x series advances by equal intervals of ten
units and the value of x for which the corresponding value of y has to
be interpolated is also one of the class limits of the x series. As such in
this case we can make use of a formula of direct binomial expansion.
Since 4 values of y are given we can presume that the fourth leading
difference would be 0. We have already illustrated that the leading
differences follow the law of Binomial Expansion and as such we shall
raise a binomial of the 4th order.
Thus
$$\Delta^{4}_{0} = 0$$
or
$$y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0$$
Substituting the values of y4, y3, etc., we get
8 - 28 + 6y2 - 20 + 4 = 0
or 6y2 = 36
or y2 = 6
The following example would further clarify this formula :-
Example 8. Obtain the missing figure in the following table.
Value of chi-square at 1% level of significance.
Degrees of Freedom    1,     2,     3,      4,      5,    6,      7,      8,      9
1% chi-square         6.64,  9.21,  11.34,  13.28,  ?,    16.81,  18.48,  20.07,  21.67

Solution. Estimation of the one per cent value of chi-square for
five degrees of freedom.
 Degrees of Freedom     1% chi-square
        1                   6.64    y0
        2                   9.21    y1
        3                  11.34    y2
        4                  13.28    y3
        5                    ?      y4
        6                  16.81    y5
        7                  18.48    y6
        8                  20.07    y7
        9                  21.67    y8
Since the known quantities are eight the eighth leading difference
will be zero.
$$\Delta^{8}_{0} = y_8 - 8y_7 + 28y_6 - 56y_5 + 70y_4 - 56y_3 + 28y_2 - 8y_1 + y_0 = 0$$
Substituting the given values, we get
21.67 - (8 × 20.07) + (28 × 18.48) - (56 × 16.81) + 70y4
    - (56 × 13.28) + (28 × 11.34) - (8 × 9.21) + 6.64 = 0
or 21.67 - 160.56 + 517.44 - 941.36 + 70y4 - 743.68
    + 317.52 - 73.68 + 6.64 = 0
or 70y4 = 160.56 - 21.67 - 517.44 + 941.36 + 743.68
    - 317.52 + 73.68 - 6.64
or 70y4 = 1056.01
or y4 = 15.09
Thus the interpolated value of chi-square at 1% level of significance
for five degrees of freedom is 15.09.
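The method of direct binomial expansion reduces to solving one linear equation, and this can be sketched in Python as below. The sketch is an illustration only and not part of the original text; the function name `missing_by_binomial` and the use of `None` to mark the missing figure are conventions assumed here. It presumes equally spaced x values and exactly one missing y.

```python
from math import comb


def missing_by_binomial(ys):
    """Solve Δⁿ0 = 0 for the single missing item (marked None) of an equally
    spaced series, where n is the number of known items."""
    n = len(ys) - 1
    gap = ys.index(None)
    # Δⁿ0 = Σ (-1)^(n-k) · C(n, k) · y_k ; sum the part that is already known
    known = sum((-1) ** (n - k) * comb(n, k) * y
                for k, y in enumerate(ys) if y is not None)
    gap_coefficient = (-1) ** (n - gap) * comb(n, gap)
    return -known / gap_coefficient


print(missing_by_binomial([4, 5, None, 7, 8]))  # 6.0
chi_square = [6.64, 9.21, 11.34, 13.28, None, 16.81, 18.48, 20.07, 21.67]
print(round(missing_by_binomial(chi_square), 2))  # 15.09
```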
METHOD USED IN UNEQUAL INTERVAL OF ARGUMENTS

If the x series advances by unequal intervals then we cannot apply
the formulae already discussed above. In such cases the most convenient
formula of interpolation is one given by the famous French
mathematician Lagrange. The formula is named after him as
Lagrange's formula. It is as follows :-
$$\begin{aligned}
y_x &= y_0\,\frac{(x-x_1)(x-x_2)(x-x_3)\dots(x-x_n)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)\dots(x_0-x_n)} \\
&+ y_1\,\frac{(x-x_0)(x-x_2)(x-x_3)\dots(x-x_n)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)\dots(x_1-x_n)} \\
&+ y_2\,\frac{(x-x_0)(x-x_1)(x-x_3)\dots(x-x_n)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)\dots(x_2-x_n)} \\
&+ \dots \\
&+ y_n\,\frac{(x-x_0)(x-x_1)(x-x_2)\dots(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)(x_n-x_2)\dots(x_n-x_{n-1})}
\end{aligned}$$
where y_x is the figure to be interpolated, x is the value in the x
series for which y is to be obtained, x0, x1, x2, ..., xn are the given
values of the x-variable and y0, y1, y2, ..., yn are the corresponding given
values of the y-variable. The following examples would illustrate the
formula :-
Example 9. Determine by Lagrange's formula the Canadian
national income for the year 1942 from the following data :-
Canadian National Income
 Year      Million Dollars
 1940          5112
 1941          6514
 1942           ?
 1943          9069
 1944          9685

Solution. Computation of the Canadian national income for 1942.

 Year          Canadian National Income
 (x)           in millions of dollars (y)
 1940  x0          5112  y0
 1941  x1          6514  y1
 1943  x2          9069  y2
 1944  x3          9685  y3
x = 1942, the year for which the Canadian national income has
to be interpolated.
Applying to the above data Lagrange's formula of interpolation
$$\begin{aligned}
y_x &= y_0\,\frac{(x-x_1)(x-x_2)(x-x_3)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)} + y_1\,\frac{(x-x_0)(x-x_2)(x-x_3)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)} \\
&+ y_2\,\frac{(x-x_0)(x-x_1)(x-x_3)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)} + y_3\,\frac{(x-x_0)(x-x_1)(x-x_2)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)}
\end{aligned}$$
where y_x is the quantity to be interpolated, x is the given quantity
in the x-variable corresponding to which y is to be interpolated, x0, x1,
x2, x3 are the given values of the variable x, and y0, y1, y2, y3
are the corresponding given values of the variable y.
We get,
$$\begin{aligned}
y_x &= 5112\,\frac{(1942-1941)(1942-1943)(1942-1944)}{(1940-1941)(1940-1943)(1940-1944)} + 6514\,\frac{(1942-1940)(1942-1943)(1942-1944)}{(1941-1940)(1941-1943)(1941-1944)} \\
&+ 9069\,\frac{(1942-1940)(1942-1941)(1942-1944)}{(1943-1940)(1943-1941)(1943-1944)} + 9685\,\frac{(1942-1940)(1942-1941)(1942-1943)}{(1944-1940)(1944-1941)(1944-1943)}
\end{aligned}$$
$$= 5112\,\frac{(1)(-1)(-2)}{(-1)(-3)(-4)} + 6514\,\frac{(2)(-1)(-2)}{(1)(-2)(-3)} + 9069\,\frac{(2)(1)(-2)}{(3)(2)(-1)} + 9685\,\frac{(2)(1)(-1)}{(4)(3)(1)}$$
or y_x = -852 + 4342.67 + 6046 - 1614.17
= 7922.5 million dollars.
Thus, the estimated Canadian national income for 1942 is 7922.5
million dollars.
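Lagrange's formula translates almost word for word into a short routine. The Python sketch below is given only as an illustration and is not part of the original text; the function name `lagrange` is chosen here, and the x values are assumed to be distinct but need not advance by equal intervals.

```python
def lagrange(xs, ys, x):
    """Lagrange's formula: sum over i of y_i · Π_{j≠i} (x - x_j)/(x_i - x_j)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Data of Example 9 (year, Canadian national income in million dollars)
years = [1940, 1941, 1943, 1944]
income = [5112, 6514, 9069, 9685]
print(round(lagrange(years, income, 1942), 1))  # 7922.5
```

At any of the given values of x the routine returns the corresponding given y, which is a convenient check on the data entry.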
Example 10. In the following table h is the height above sea level
and p the barometric pressure. Calculate p when h = 5280.
h = 0,   4763,  6942,  10593
p = 27,  25,    23,    20
Solution. Estimation of the barometric pressure when the height
above sea level is 5280.
 Height above sea level      Barometric pressure
         (x)                        (y)
          0   x0                   27  y0
       4763   x1                   25  y1
       6942   x2                   23  y2
      10593   x3                   20  y3
x = 5280, the height above sea level for which the barometric
pressure has to be interpolated.

Applying Lagrange's formula, we get
$$\begin{aligned}
y_x &= 27\,\frac{(5280-4763)(5280-6942)(5280-10593)}{(0-4763)(0-6942)(0-10593)} + 25\,\frac{(5280-0)(5280-6942)(5280-10593)}{(4763-0)(4763-6942)(4763-10593)} \\
&+ 23\,\frac{(5280-0)(5280-4763)(5280-10593)}{(6942-0)(6942-4763)(6942-10593)} + 20\,\frac{(5280-0)(5280-4763)(5280-6942)}{(10593-0)(10593-4763)(10593-6942)}
\end{aligned}$$
$$\begin{aligned}
\text{or } y_x &= 27\,\frac{(517)(-1662)(-5313)}{(-4763)(-6942)(-10593)} + 25\,\frac{(5280)(-1662)(-5313)}{(4763)(-2179)(-5830)} \\
&+ 23\,\frac{(5280)(517)(-5313)}{(6942)(2179)(-3651)} + 20\,\frac{(5280)(517)(-1662)}{(10593)(5830)(3651)}
\end{aligned}$$
or y_x = -.352 + 19.264 + 6.040 - .402
or y_x = 24.55
Thus, the estimated barometric pressure when the height above sea
level is 5280 is 24.55.
In all the above examples we have illustrated how figures can be
interpolated under certain assumptions. These formulae can also be
used for the purpose of extrapolation.

It is difficult to over-emphasize the importance of the study of
interpolation and extrapolation. Even though figures can be interpolated
and extrapolated only under certain assumptions it goes without saying
that the estimates obtained by various methods of interpolation are far
better than those which are obtained without taking into account the
data available about the problem under study. We have already indicated
in the beginning of this chapter how useful and how necessary
the study of interpolation is in various fields, and the only thing we would
like to add here is that many useful studies in various branches of human
activity would not have been possible in the absence of the methods of
interpolation.
Questions
1. Write a note on the necessity and usefulness of interpolation.
2. What are the assumptions under which figures can be interpolated?
3. How would you interpolate figures by graphic methods?
4. Why is the parabolic curve considered to be the best curve for the purpose of
interpolation of data?
5. What are the assumptions of the various algebraic methods of interpolation?
6. Illustrate how the methods of finite differences are based on the Binomial
Theorem. Show also how Newton's formula of finite differences is derived.
7. The following table shows the value of an immediate life annuity for every
£100 paid :-

 Age in years      Annuity (£)
     40               6.2
     50               7.2
     60               9.1
     70              12.0
Interpolate for the age 4:1.
(M. A., Calcutta, 1936).
8. The following table gives the census of population of an Indian State in 1901,
1911, 1921 and 1931.
Estimate the population of the State in 1924, making your method clear.
 Year          Population
               (in thousands)
 1901            2,797
 1911            2,935
 1921            3,047
193 1 3.3f4
(P. C. S., U. P., 1939).
9. Estimate the annual sales of pencils for 1942 from the following records
of wholesale merchants :-
 Year          Sales of pencils
               in lakhs of dozens
 1932            2.~
 1936            30
 1940            40
 1944            55
 1948            60
10. State Newton's formula for interpolation for equal intervals and the
assumptions underlying it. Use it to find the annual net premium at age 25 from the
table given below :-

 Age          Annual net premium
 20              .01427
 24              .01581
 28              .01772
 32              .01996
(I. A. S., 1950).

11. The following table gives the census of population of a certain town in
1891, 1901, 1911, 1921 and 1931. Estimate the population in 1925, making your
method clear :-

 Years         Population
 1891            98,754
 1901          1,32,285
 1911          1,68,076
 1921          1,95,690
 1931          2,46,050
(M. A., Ca"utta, 19 n),
12. The following are the annual premiums in a certain Life Insurance Company
for a policy of Rs. 500 payable at death with an agreed bonus :-
 Age next          Annual
 birthday          Premium
                   Rs. as.
    25              24-10
    30              27-11
    35              31-9
    40              36-6
    45              42-5
Calculate the premium at age 36.
(M. Com., Lucknow, 1942)
13. The following table gives the quantities of a certain brand of tea demanded
at prices noted against each. Estimate the probable demand when the price is
Rs. 1-14-0 a pound.
 Price of tea       Quantity demanded
 per lb.            in thousand
 Rs. as.            lbs.
  1-4                 82.5
  1-8                 70.8
  1-12                63.1
  2-0                 55.0
  2-4                 48.9
(M. A., Allahabad, 1942).
14. The Gross Profits of the Buland Sugar Co., Ltd., are given below :
 Years            Gross profit
                  (in lakhs of Rupees)
 1935-36            4.86
 1937-38           12.64
 1939-40           13.68
 1941-42           16.6~
 1943-44           13.19
Make an estimate for 1942-43 and 1944-45. (B. Com., Raj., 1944).

15. If l_x represents the numbers living at age x in a life table, find as accurately
as the data will permit l_x for values of x = 55, 42 and 47, given
' ..- 51 2 • ',.-439, "0= H6; '50= 245· (1. A. S., 1948).
16. From the following data, estimate the number of persons earning wages
between 60 and 70 rupees.
Wages in rupees No. of persons
in thousands
Below 40 25 0
40 60 no
60 80 100
80 100 70
tOO 120 ~o

(M. Co",., Agra, 1951,\.


17. Estimate the number of persons having incomes between 1000 and 1500 in
tbe table given below in the groups A and B
Income No. of persons No. of persons
in Ri. Group A GroupB
Below 500 5000
100 - 1000 45 00
1000-2000 4800
2000--3 000 2200
3000 -4000 1500
(B. Co",., Agra, 1947).
t8. The following table relate. to income earned per mon1th by a certan number
of workers in a big manufacturing concern.
Bamin8'l per month Number of workers
in rupeea
Up to 10 50
20 15 0
"
...
"
30
40
3°0
;00
.. 10
60
700
800
It is requited to .find out the number of workers (0) fajling ivithin Rs. 21- 31
earning group and (') earning above Rs. 42.
19' The following are the marks obtained by 49~ candidates in a certain
exa,mination : -
Not more than 40 marks 2 12 candidates
.. 45
50 .. 296
368
55 42 9
60 460
" 65 481
70 49°
n " 49~
"
Find out the number of candidates who secured more than 42 but not more
than 45 marks. (M. A., Ca/rulla, 1953),
ao. The foltowing ue' the a.mounts of income tax paid by 600 busineflamen of
L certain district of Uttar Pradesh in the year 19,0 : -
More than Rupees ,00 600
.. 1000 550
15 00 42 5
" 2000 275
" " 20,00 100
"" , , 3 000 ••• 25
Find out the number of businessmen who paid more than Re. 1,200 but not morc
than RI. 2,400 at income tax.
21. Extrapolate the population of a town for 1946 from the following data
about ita population during the preTious four censuses :-
Census Year Population in thousand.
1911 473
1921 468
195 1
194 1
4'.
4 84
(M. CD""., RoY., 19,0).
u. The following table gives tbe sales of a concern for the last few ,~.tI.
B.timate the lales for the yeu I9P.
YC8.1: Sales (Rs. 000)
1946 40
1948 43
195 0 48
19J2 J2
19'4 57
2.~.
Year Sales (Rs. 000)
19 2 1 25
193 0 ~~
193' ,')
1940 47
194' 51
19,0 64
From the above table estimate the sales in the year 1943' Use Newton's Ga .... s
(backwatd) formula.
2.4. Estimate the q_utflow of gold from India in 1937'38 from the data ghreo
below:
Average value of net exports of gold coin and bullion.
YC8.1: Re.
1933-34 89.,6,3:l,418
193<4-5' 52.,53,74, 60 7
1935-3 6 37.35,5 8,955
1936-37 2.7,84,61,I2.9
1938-39 2.3,26,02,668
(M. Com., Au"babad, 1948)
2.,. Below are given weighted index numbers of cost of living of labow:en
I industrial centre in India. Inteq)Qlate to find out the misting iodell: number
lin
Ir 1933 to the nearest integer, using all the figures:
Year Index
19~0 173
193 1 149
1932 145
1934 13 1
193' 141

The age of mothers and the avemge number of children borl(l. per mother
26.
are given in a table below.
Interpolate the average number of children born per
mother aged ~O-34.

Age of mother No. of children


in years born

15-1 9
10-z4
:1S-2 9
30-34
~5-39
40-44 (4~. P. C. S.).

(M. Com., Allahabad. 194'£,)


27. Tbe following table gives the population of a town at the time of the
last SIX censuses.
1881 75.401
1 891 82,984
1901 86,686
19 I1 86,547
19 2 1 93,09 1
193 1 1,27,~17

Estimate the popUlation for 1941.


zg. The number of students in a re<,:ognised institution of India for the last
nine years is given below. Estimate the numbers for the yea~ 1951-51.

Year No. of students


194 2 '43 IIIO
1943-44 IISO
1944-4' u8a
1945-46 noo
1946-47 12.10
1947-4 8 11 2 5
1948-49 U5 0
1949'5 0 uSo
195 0 -51 u60

~9. It is required to find the missing value in the following table. Establish
!lny suitable formula for interpolation, and find the missing value.

Serial No. Value


6·4577
2 3·4531
3 2.5 604.
4 2.15 2I
~
6 1'7 8 49
7 1.6874
8 1.61 77
9 1.5646
10 I.51~2

(1. C. S., 1944). '"


30. Interpolate the missing figures in the following table of rice cultivation :-

 Year          Acres in millions
 1911            76.6
 1912            78.7
 1913             ?
 1914            77.7
 1915            78.7
 1916             ?
 1917            80.6
 1918            77.6
 1919            78.7
(B. Com., Agra, 1937).
31. Interpolate the missing figure in the following table with the help of a
suitable formula :
 1911          1331
 1912          1728
 1913          2197
 1914           ?
 1915          3375
 1916          4096
 1917          4913
(M. A., Delhi, 1939).
~ 2. The annual sales of a concern are given below : -
Year Salea of cloth in
lakhs of yards

19 1 5
1920
19 2 5
193 0
1935
Assuming the conditions of the market to be the same, estimate the sales for th e
year 1940. eM. A., Palna, 1941). \

H. What do you understand by interpolation and extrapolation ~ What are


their uses? The following table gives the normal weights of babies during the first
twelve months of Ufe : '

Age in months z g 10 12
Weights in lbs. 7t 16 IS ZI

Estimate the weights of the baby at the age of 7 montha. (M. A., PaIno, 1940).
34. Determine by Lagrange's formula the percentage number of criminals
under 35 years.
 Age                   % number of criminals
 Under 25 years            52.0
   ,,  30   ,,             67.3
   ,,  40   ,,             84.1
   ,,  50   ,,             94.4
(M. A., Agra, 1934).

35. The following table gives the number of income tax assessees in U. P. :-
 Incomes not exceeding      Number of
 Rs.                        assessees
  2,500                       7,166
  3,000                      10,576
  5,000                      17,200
  7,500                      20,505
 10,000                      21,975

Estimate the number of assessees with incomes not exceeding Rs. 4000.

(M. A., Alld., 1944).


~6. Estimate the expectation of !if.. at the age of 16 years using the following
data :
Age in years Expectation of life
10 35.4 year~
15 3 2 .3
20 29. 2
2.S 26.0
30 ~2..z ..
35 20·4 .,

(P. C. J., 19S1).


57. The following are the number of deaths in four successive ten-years age
gftIUpi. Calculate the number of deaths at 45-50 and SO-H.

Age-group Deaths
.1.1- 13,2.1.9
35- 18,1391
45- 24,.1.25
55- 31,49 6
(P. C. S., 1'5.1.).
58. Find by algebraic method of interpolation, using all the infurmation given
the likely number for 1950 from the following table of index numbers of production
of certain article in India

Year Index Number


1948 100
1949 10 7
195 0
19P
195 2
(P. C. S., 19H)'
39. What are the assumptions on which the method of interpolation is based ?
Fm out Newton's formula for interpolation in case of equal intervals.
The following are the marks obtained by 492 candidates in a ,;ertain examination.
Not More than 40 marks :!IO Candidates
,f
,f "
It 45
So
253
30 7
.
,. 55 381
"
60 4 13
65 49 2

Find out the number of candidates (a) who secured more than 48 but not more
than So marks, (b) less than 48 but not less than 45 marks. (P. C. S., 1<)54)
40. The following figures show the valbe of a life annuity upon a single life
aged 20, at rates of interest varying from z.~ [,.} , per cent : -

Rateofinterest-z.,. 3. 0 • 5.'. 4·0.


Annuity value - 24.145. zz.045, 20.225. 18.644
Rate of Interest:"'" 4.5. 5.0
Annuity value - 17.262, 16.047

Calculate the intermediate values at 2.75 and 3.75 per cent after developing an
appropriate interpolation formula. (P. C. S., 1956).

41. Develop a formula which will help interpolation when observations are
known to be at unequal intervals.

The observed values of a function are respectively 168. I ZO, 72 and 65 at the four
positions ;, 7. 9 and 10 of the independent variables. What is the hest estimate you
an give for the variable of the function at the position 6 of the independent variable?
(1. A. S., 195 I).
42. The. following values are given in a table:

x .}
216.000
Z 22.6.9 81
3 ----
4 25 0 •047
26z,l44

Using any suitable 4lgcbrair. method, find the value of.} for X= ,.
Also draw a graph of the above points on a piece of squared paper and from tbl.
graph find the value ofy for X=4.4 cr.
A. S., 19H).
4~. (a) By coqstructing a difference table,lind the 7th term .. well a8 the
general term of the sequence :

0t o~ 2" 6. 1_. 20,H ................. 30.

(b) Given
Sin 4,°=0.7°7 1 ,
50°=0.7660,
55 0 =0. 81 9 Z ,
60°=0.8660.
Fi~d Sin p.o, by using any method of interpolation. (1. A. S., 195 ,).

44. Given log10 654 = 2.8156; log10 658 = 2.8182;
log10 659 = 2.8189; log10 661 = 2.8202.
Find log10 656 using two different interpolation formulae available for observations
at unequal intervals, say, Lagrange's formula and the formula for divided differences.
(I. A. S., 1956)

45. The length of the day was 12 hours on March 19th; 14 hours on April 18th
and 15 hours 40 minutes on May 18th. Required an approximate value of (a) the
length of the day on May 3rd, (b) the mean length of the day during the period March
19th to May 18th. (I. A. S., 1947).
. 46. The following table gives the population of Indore city at the time of last
61X censuses : -

1901 19 11 19 z1 1931 1941 19S1


99,880 51,z8S 1,07,948 1,47,100 2,03,695 3,10,8S9
Estimate the population for 1961. eM. CDm., ViA:rfJ1ll, 1961 )
47. Use some appropriate interpolation method and reconstruct the following.
frequency table with class-intervals halved :
.x o-z Z.-4 4-6
Frequency 3S S2 eM. A., Raj., 1961).

48. Interpolate the missing figures in the following table:


Year 19S6 51 S8 S9 60 61 62 63 64
No. of Students ItO ISO ZOO :uo 22S 260
49. The annual purchases of a company are given below : -
Years 1930 H 41 SO 60
Purchases of Coal
in thousands of tons) 2S 60 30 lItl 170
Estimate the purchases of the company for 196 5.
I
So. The following table relates to marks obtained by 130 competitors at the
I. A. S. Examination. Find out the number of competitot8 who secured ISt class
marks in the examination.
Marks out of zoo No. of competitors.
More than So 13 0
. 126
. .. ., 110
80
36
.. .. 140 14

p. From the table estimate India's population fof'the year 1896'by


(i) Parabolic curve method.
(i;) Newton's method, and
(m) Lagrange's method.
Year 1881
Population z53
(in Millions)
5z. The following figures give the number of students admitted in a·college
in the first few years of ita starting. Estimate the number of students in 1962 and
[913 and comment 00 the results.
Years 195 0
No. of Students 80
B. From \'be inforInition given below, estimate the total production of wheat
in eaCh year 1941 to 1948.
Two year Period :
Monthly production : 4
(000 Maunds)
H. The followirig figures relate to the production of a certain commodity it>
four years. Estimate the missing figure for I94~.
Year 1941 194~ 1947
Production 1%4 31 7
('000 Maunds)
What will be your est.imate if the production is expe.:ted to follow reciprocal
relation. Comment on the results.
55. The table of the function 3^x shows the values as 1, 3, 9 and 81, when x is equal
to 0, 1, 2, 4 respectively. Apply any method of finite differences and obtain the value
corresponding to x = 3. Explain why the resulting value differs from 3³ or 27.
Business Forecasting
Meaning and Need. Forecasting is a part of human conduct. Whatever
an individual does at present is in the expectation that certain events
will take place in future. This expectation about the happening of
certain events in future is generally based on past experience. Forecasts
made in this fashion may or may not be true. Inaccuracies in such
forecasts arise on account of either the utilisation of incorrect information
or the application of faulty reasoning even to correct data. Many
times people suffer huge losses due to incorrect forecasting but this
does not stop people from making forecasts. It is so because in various
fields of human activity forecasting has become more or less indispensable
and in many fields like business and commerce, success or failure depends
to a considerable extent on whether the forecasts are accurate or inaccurate.
In business and commerce the importance of forecasting is so great that
when one enters business, he really enters the profession of forecasting.
The principles on which business forecasting is done are the same on
which individual forecasts are made. However, in business there is
need for caution in forecasting, as an inaccurate forecast usually involves
huge loss. No doubt there are many experienced businessmen who
make correct forecasts without any knowledge of statistical methods
but their number is very small and usually they are such persons who
have considerable experience at their back and whose intuition generally
comes to be true. They are just like experienced sailors who forecast
the weather conditions quite accurately merely by looking at the sky.
But everybody cannot do so and in recent times considerable research
has been conducted in this field and attempts have been made to make
forecasting as scientific as possible and in this way to reduce the hazards
and risks usually associated with it. In the modern sense of the term the
problem of business forecasting refers to the analysis of the past and present
economic conditions with a view to draw inferences about the future course
of events.
Basis
The basis of scientific business forecasting is not so much the
estimation of certain figures of sale, production, profits, etc., as the analysis
of known data, internal and external, in a manner which will enable a
policy to be determined to meet possible future conditions to the best
advantage. Business forecasting gives recognition to the fact that the
science of statistics is not only useful for studying the past but also for
studying the present and predicting the future. There are two aspects
of scientific business forecasting. The first is the analysis of past conditions
and the second is the analysis of current conditions in relation to a probable
future tendency.
The analysis of past conditions or historical analysis of the problem
would reveal the course that business had followed in the past. For the
analysis of past conditions we have to study the various types of factors
which affect a time series relating to business. In the chapter on Analysis
of Time Series we had seen that there are four types of factors which
affect time series in general. They are secular trend, cyclical fluctuations,
seasonal variations and irregular factors. For analysing the past
conditions all these four types of factors would have to be studied very
carefully and their effects would have to be analysed in detail. The general
trend of the series would give an idea of the direction in which the series
was moving in the past and its probable future course over a long period.
The cyclical fluctuations would reveal whether the series under study is
passing through a period of boom or a period of depression. It would
also reveal the period of the trade cycle. These things are of very great
help in forecasting the future course of events. Thus, if a series is
passing through a period of boom and if it is felt that the peak point has been
reached it can be forecast that in future the series would move in the
reverse direction. Similarly, seasonal variations of the past would
indicate the course of events in the immediate future. Seasonal fluctuations
reveal the short period movements of a series and in day-to-day business
forecasts have to be made about the near future period. As such seasonal
variations have to be studied very closely and in detail so that the course
of events in the near future can be correctly forecast.
The analysis of the present conditions would reveal those factors
which are influencing the phenomena under study in a particular direction.
In the analysis of present conditions an attempt is made to study the
factors which affect and alter the sequential changes expected on the
basis of the analysis of past conditions. Such factors are new inventions,
changes in designs and fashions, changes in government's economic
policy, war, armistice, etc. The analysis of present conditions is done
with a view to have an idea about the future course of events. Suppose
by the study of past data we conclude that there is a trade cycle in every
seventh year. Now it is likely that due to the above mentioned factors
the duration of the trade cycle may change and actually we may expect
it earlier than seven years or we may not expect it till after seven years
even. Thus on the basis of the analysis of the present conditions it
would be possible for us to modify the conclusions which we arrive at
by a study of past events.
Technique
From what has been said above it is clear that for scientific business
forecasting it is necessary to have detailed information about the
past movements in the series relating to the phenomena under study as
also full information about the special factors affecting the problem at the
time of making the forecast. If, for example, a forecast has to be made
about the prices of wheat in future we should have complete information
about past movements in the prices of wheat and we should also know
the various factors which are affecting wheat prices at the present moment.
Detailed figures of production, consumption, exports, imports, etc.,
would have to be collected and the economic policy of the government,
the weather conditions, the area under cultivation and similar other
factors which can affect the prices of wheat in future would have to be
closely analysed. Usually series relating to this information are
converted into index numbers so that the relative movements may be
properly studied.
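
The conversion of raw series into index numbers can be illustrated by a small sketch in Python. It is only an illustration: the function name `to_index_numbers`, the choice of the first period as base and all the wheat figures are assumptions made for the example and are not taken from any actual record.

```python
def to_index_numbers(series, base_position=0):
    """Express every value as a percentage of the base-period value."""
    base = series[base_position]
    if base == 0:
        raise ValueError("base-period value must be non-zero")
    return [round(100.0 * value / base, 1) for value in series]


# Hypothetical figures for four successive years (illustration only).
wheat_price = [80.0, 84.0, 92.0, 100.0]       # rupees per quintal
wheat_output = [110.0, 121.0, 115.5, 132.0]   # lakh tonnes

price_index = to_index_numbers(wheat_price)
output_index = to_index_numbers(wheat_output)

# Both series are now on the common base of 100 and can be compared.
for year, (p, q) in enumerate(zip(price_index, output_index), start=1):
    print(f"Year {year}: price index {p}, output index {q}")
```

Since both series are expressed on a common base of 100, their relative movements can be compared directly although the original units (rupees and tonnes) are different.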
Business Barometers. The index numbers relating to business
conditions are called business barometers. These index numbers are
a modern device to study the trends, seasonal fluctuations, cyclical
movements and irregular fluctuations. Business barometers facilitate various
forms of business forecasting. It should be remembered that business
index numbers are not necessarily composite index numbers of all
commodities. Even a price index number or an index number of the
production of a particular commodity is a business index number since it
gives us the relative movements of a particular series. A business index
number may refer to general conditions of trade or finance or to a
particular trade or industry or to an individual business. Indices of
production, of prices and wages, of financial statistics of bank clearings,
money rates, and indices of prices of stocks and shares when plotted
graphically give a readier view of the movements, their seasonal variations
and long period trends. Thus with the help of business activity index
numbers it becomes comparatively easy to forecast the future course of
events.
However, it should not be forgotten that business barometers
have their own limitations and they are not sure roads to success. All
types of business do not follow the general trend. The trends disclosed
by index numbers of business activity may be different from those actually
observed in different types of businesses or different types of industries.
This is the reason why separate index numbers are prepared for various
types of business, industry, agriculture, transport, etc. Moreover many
times business barometers may give misleading conclusions. This may
be due to paucity of data from which they have been constructed or due
to the defects in the collection of the data, etc. If forecasts are made
exclusively on the basis of business barometers, it is not unlikely to
get faulty conclusions. Business forecasts should be made on the basis
of past experience as modified in the light of current conditions.
Economic barometers may sum up the past conditions all right but so far
as the effect of current conditions is concerned they cannot do much.
The tendency disclosed by indices of business activity should always be
modified in the light of current factors affecting the particular phenomenon
which is under study and then only one can hope to get fairly reliable
conclusions.
General assumptions
Whenever a business forecast is made there is always a fundamental
assumption behind it. It is an assumption of 'general orderliness
of data'. It means that there are no violent changes in the general
tendency of the series. The reader would remember that a similar
assumption was made in connection with interpolation of figures in
the last chapter. As a matter of fact this assumption is always made
in almost all types of statistical studies. "This orderliness is believed
in by all schools of thought. By this belief we mean an assumption
which appeals to common sense and which has been confirmed by
science though it is hardly anywhere expressly stated. It is a belief in a
general order, in a recurrent regularity or a slow but continuous change
and orderly development of the things and events of the world."
"The recognition of this belief is necessary if we are to understand the
manner in which the statistician moulds his investigation and arrives at a
statistical inference. He follows the method of the experimental
scientist when he selects as a basis of forecasting a past period for study
as nearly as possible like that of the present."
An obvious question that may arise at this stage is whether there
is any difference in the technique of forecasting and the theory of
probability. We have mentioned above that forecasting is based on general
orderliness of the data and we presume that whatever happened in the
past would happen in future also if conditions are similar. This would seem
to mean that there is hardly any difference between the theory of probability
and the technique of business forecasting. But in reality it is not so.
There is very great difference between the two. In theory of probability
we study only past conditions but in business forecasting we have to make
modifications in the light of current events and hence the theory of
probability is not the only basis for business forecasting. The theory of
probability holds good only for random data but when we forecast for a
particular year say 1955 then it is not a random year. We have to study
the effects of the year 1954 on 1955 and it is this thing that is ignored
in the theory of probability which does not take into account current
conditions.

THEORIES OF BUSINESS FORECASTING

It has been said earlier that business forecasting has in recent years
been made a more scientific and accurate proposition than what
it was formerly when business forecasts were made only on the basis
of experience and intuition. In modern times business forecasting
has been put on scientific footing so that risks associated with it have
been considerably minimized and the chance of precision increased.
In fact in most of the economically advanced countries there are
institutions which are staffed by experienced
persons and which undertake this highly specialized work of drawing
statistical inferences from the past data as modified in the light of current
conditions, with a view to study the future course of events. The
Harvard Economic Society, Brookmire Economic Service, Babson Statistical
Organisation of the United States of America, the London and Cambridge
Economic Service of the United Kingdom and the Swedish Board of Trade
are world famous institutions which undertake the work of business
forecasting. Thus, theories have been derived out of researches that
have been conducted by various individuals and institutions working
in this field. Some of the important theories of business forecasting
are discussed below:

Time-Lag or Sequence Theory


This is by far the most important theory of business forecasting.
It is based on the assumption that various businesses show similar
movements which are not simultaneous but successive. On this basis
various well-recognised sequences have been found out. Thus when
there is currency inflation in a country the first thing to be affected by
it is the foreign exchange rate. With the currency inflation foreign
exchange rate becomes unfavourable. After this wholesale prices begin
to rise and soon after the retail prices are also affected and they also move
in upward direction. With the rise in retail prices the cost of living goes
up and with it there is a demand for increased wages. After some time
wages also go up. We thus see that one factor (currency inflation in
the present case) has affected various fields of economic activity not
simultaneously but successively. There is time-lag between different
movements. Another sequence of this type is speculation, business
activity and money rates. Thus when speculative activity increases in a
country business activity also goes up and after a certain time-lag it is
followed by a rise in the money rates.
Harvard Economic Society of U. S. A., London and Cambridge
Economic Service of U. K., and Swedish Board of Trade base their
forecasts on this theory.
If further studies are conducted they would reveal that these
sequences can be increased still further. Thus when with inflation
wholesale prices rise they do not rise in all the industries and businesses
simultaneously. There are some industries and businesses which are affected
earlier than others. Still further studies may reveal that even in the same
industry some types of units record a change earlier than others. In
this way the chain of sequences can be increased to a considerable extent.
The most important thing so far as this theory is concerned is the
study of time-lag between various movements. If the time-lag is correctly
estimated forecasting becomes accurate and dependable. In the
estimation of time-lag between the movements in two series the first step is the
conversion of both the related series into index numbers. After this
the cyclical percentages of the two series are calculated by dividing the
cyclical variations by the standard deviation of the series.* The cyclical
percentages curve of one series is then superimposed on the other to
get an idea about the time-lag between them. Time-lag between two
series is also studied by calculating the coefficient of correlation between
the two series of cyclical percentages with different time-lags. The
time-lag which gives the highest value of coefficient of correlation is
considered to be the best estimate of the lag between two series.

* See Chapter on Correlation.
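
The second method, in which the coefficient of correlation is calculated for different time-lags and the lag giving the highest value is selected, can be sketched in Python as follows. This is a minimal illustration and not the procedure of any particular forecasting institution. The function names `correlation` and `best_lag` and both series of cyclical percentages are assumptions made for the example; the second series has been deliberately built to follow the first after two periods.

```python
from math import sqrt


def correlation(x, y):
    """Karl Pearson's coefficient of correlation between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def best_lag(leading, lagging, max_lag):
    """Correlate the leading series with the lagging series moved back by
    0, 1, ..., max_lag periods and return the lag with the highest value."""
    results = {}
    for lag in range(max_lag + 1):
        x = leading[:len(leading) - lag]   # leading series, periods 0 .. n-lag-1
        y = lagging[lag:]                  # lagging series, periods lag .. n-1
        results[lag] = correlation(x, y)
    return max(results, key=results.get), results


# Hypothetical cyclical percentages (illustration only).
wholesale = [0, 3, 5, 3, 0, -3, -5, -3, 0, 3, 5, 3]
retail = [-5, -3, 0, 3, 5, 3, 0, -3, -5, -3, 0, 3]   # follows wholesale by 2 periods

lag, table = best_lag(wholesale, retail, max_lag=4)
for k, r in table.items():
    print(f"lag {k}: r = {r:.3f}")
print("estimated time-lag:", lag, "periods")
```

With these made-up figures the highest coefficient is obtained at a lag of two periods, which is how the data were constructed. In actual work the series would be much longer and the lag so obtained would still have to be checked against current conditions.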
Once the time-lags between the movements of various series have
been estimated forecasting can be done easily. It should be remembered
that here also forecasting is not done mechanically and due adjustments
are made for the effects of the current economic conditions and other
special factors operating at that time. Thus, if there is currency
inflation in a country it can be forecasted that wholesale prices would go up
or that retail prices would increase or that the wages would record an
upward change but if there is an effective government control over
prices and wages, then despite inflation they may not record any change
or in any case the expected quantum of change. We thus see that the
effects of special factors operating at the time of making a forecast are
very important and have to be taken into account. In the above case
unless the forecasts are modified in the light of special factors operating,
the inferences are bound to be misleading and inaccurate.
Action and reaction theory
This theory is based on the assumption that every action has a
reaction after some time. It also assumes that the magnitude of reaction
is based on the magnitude of the original action. Thus if the price of
a commodity has gone up above the normal level there is every
likelihood that after some time it would go down below the normal level.
In making forecast according to this theory special study has to be made
about the normal level of the phenomena in question. Normal level
is not fixed for all times and since it is a dynamic concept, normal level
of phenomena has to be very carefully estimated at the time of making
forecasts. It is common knowledge that after a boom there is a
depression which is again followed by a boom and the cycle goes on in this
way. Thus for every action there is a reaction in the reverse direction.
This being so if it is felt that a particular phenomenon has gone above its
normal level it can be forecasted that after some time there would be a
tendency for it to move in the reverse direction below the normal level.
The extent of the movement and the time after which the reaction would
set in would have to be decided on the basis of past happenings as
modified in the light of current facts. Thus the basic nature of business
forecasting remains the same in all theories and the analysis of past and present
conditions has to be done in all cases.
Babson Statistical Organisation of the U. S. A. makes its forecasts
on the basis of this theory. It should be remembered that in this theory
forecasting is done on the basis of actual level of phenomena in relation
to its normal level.
Specific historical analogy theory
As the name of this theory suggests it is based on the study of
such past conditions which closely resemble those under which
forecasting is being made. What is done actually is that a time series relating
to the data in question is thoroughly scrutinized and from it such period
is selected in which conditions were similar to those prevailing at the
time of making the forecasts. The course which events took in the
past under similar circumstances is then studied and it gives an idea
about the likely course which the phenomena in question would follow
in future. The theory is thus based on the assumption that history
repeats itself and that whatever happened in the past under a set of
circumstances is likely to happen in future also if conditions are the same.
This theory also makes due adjustments for the special circumstances
which prevail at the time of making the forecasts but it is largely
dependent on past conditions.
Cross-cut analysis theory
This theory is different from the last one. It denies that history
repeats itself in economic life and according to it each factor affecting
a phenomenon should be studied separately and independently. In the
last theory we have seen that business forecasting was done on the basis
of the analysis of past conditions as modified in the light of current
situations. In it, the different factors affecting the problem under study
were not studied separately and independently. In historical analogy
theory, as also in other theories the conclusions were arrived at by the
study of the combined effects of the various factors affecting a
phenomenon as modified in the light of current conditions. In this theory
the combined effects of the various factors are not studied. The effect
of each factor is studied independently. The process is very difficult
and we have already discussed in the chapter on Analysis of Time Series
how difficulties arise in the separation of the effects of various types of
factors affecting a series. It is obvious that this theory makes little use
of historical data. It concentrates on the analysis of the present factors
affecting a phenomenon in question and as far as possible the effects of
each factor are studied separately.
Utility of business forecasting
For Controlling Business Cycles. From what has been said above it
is clear that the utility of business forecasting is very great not only to
businessmen and economists but to the society as a whole. It is common
knowledge that business cycles are always very harmful in their effects.
Abrupt rise and fall in price level is injurious not only to businessmen
but to all types of persons. Industry, trade, agriculture, etc., all suffer
from the painful effects of depression. Trade cycles increase the risk
of business, create unemployment, induce speculation and discourage
capital formation. They spread from country to country and in a short
time the entire economic body of the whole world is in the grip of trade
cycles. The Great Depression of the "thirties" is a very well-known
illustration on this point. Business forecasting reduces the risk
associated with business cycles. If businessmen, industrialists and economists
know in advance that a period of depression is expected in the
near future they can plan in such a manner that the intensity of
depression is reduced and its harmful effects are minimized. Similarly
if businessmen could know in advance that a boom is to set in they
can plan their policy in such a manner as to take the maximum
advantage of the situation. Business forecasting is thus very useful for the
purpose of controlling business cycles.
For Making Profits. Besides this, businessmen make forecasts for
the purpose of making profits. As has been said earlier when a person
enters business he enters the profession of forecasting. In business,
forecasting has to be done at every stage. A businessman may dislike
statistics or statistical theories of business forecasting but he cannot do
without making forecasts. A businessman has to forecast the future
level of prices and the extent of demand and his success or failure de-
pends on the accuracy of the forecasts that he makes. The amount
of stock to be kept by a businessman or the quantum of goods to be
produced by an industrialist entirely depend on what they feel about
the future course of events. It is thus obvious that in business and
commerce forecasting is indispensable and it plays a very important
part in the determination of various policies.

As has been discussed earlier business forecasting has lately been
developed into a scientific technique. Various types of researches are
being conducted in economically advanced countries to make business
forecasting more accurate with a view to control trade cycles.
Specialised institutions have been set up in these countries which carry on
experiments in this field and also make actual forecasts. In our country
unfortunately there are no such specialised institutions to do this job.
We have no doubt indices of industrial and agricultural production as
also indices of prices and of cost of living, etc., but there is no
co-ordinated scheme of making business forecasts. One of the possible
reasons for the absence of a well co-ordinated agency of business
forecasting in our country is that these indices are recent and for correct
forecasting, a time series of such indices, spread over a long period is
needed. Moreover it is also a fact that our indices are not complete
and are inadequate from various points of view.
It should, however, be kept in mind that business forecasting is
not a sure road to success. It has its own limitations. The
assumptions under which business forecasts are made may not always be
satisfied and when it is so the forecasts are likely to be misleading.
Human behaviour is a very uncertain and unpredictable thing and as
such the element of risk in forecasting should never be minimized.
It should not be forgotten that though history repeats itself it does not
repeat itself with a mathematical precision and particularly in
economic and social fields where new factors always come in. Business
forecasting only discloses what is likely to happen in future under
certain assumptions, nothing more than this.

Questions
1. What is meant by business forecasting? What are the assumptions on which
business forecasts are made?
2. Discuss the important theories of business forecasting. How does analysis
of time series help in forecasting of economic events? (M. Com., A/I•. , 195%).
3. What is meant by business forecast? Explain the major classes of
methods used in forecasting. (M. Com., Lucknow, 1944).
4. What is meant by business activity index? How will you weigh the
various series of which such index will be composed? (M. Com., Lucknow, 1947).
5. What do you understand by time-lag theory of business forecasting? What
are the general assumptions in this theory?
6. Write a note on the usefulness of analysis of time series in business
forecasting.
7. What important theories of business forecasting are known to you? Give
a critical estimate of each of them.
8. What is meant by business barometers? Write a note on their limitations.
9. What is the utility of business forecasting in the modern world? What
are its limitations?
10. In what fields of business facts must be studied to judge the position of
a business in regard to business cycles? Why is a careful analysis of business cycles
important in the field of marketing? (M. Com., Lucknow, 1943)
Interpretation of Data
22
In previous chapters we have discussed the various methods of
collection and analysis of data. An attempt has been made in those
chapters to analyse the various methods which are used by statisticians
to collect statistical material and the technique used by them for their
detailed analysis with a view to draw inferences from them. All such
rules of collection and analysis of data are briefly termed as "statistical
methods". The task of the statistician does not end after the collec-
tion and analysis of data have been done and he has to draw inferences
from the analysis that he has done. In drawing inferences it is
necessary to exercise extreme care otherwise misleading conclusions may be
drawn and the whole purpose of the collection and analysis of data
may be destroyed.
Meaning and Need. Interpretation of data refers to that part of
the science of statistics which is associated with the drawing of
inferences from the collected facts after an analytical study. Interpretation
is an extremely important and useful branch of the science of statistics
because it makes possible the use of collected data. Statistical facts have
by themselves no utility and interpretation makes it possible for us to
utilise collected data in various fields of activity. The usefulness and
utility of collected information lies in its proper interpretation. All
statistics are collected with a view to draw certain conclusions about the
problem which is being studied. In all such sciences where the method
of induction is used, statistical methods are important tools but as is
true of all other methods, statistical methods also depend to a considerable
extent on the nature and the use to which they are put. If statistical
methods are misused it is natural that the conclusions obtained would be
inaccurate and undependable and if, on the other hand, a proper use of
statistical methods has been made there is no reason why the inferences
drawn would not be fairly accurate and trustworthy. It is, therefore,
extremely essential that various statistical methods are very carefully
used in the analysis of data. Mistakes are committed in the analysis of
data either deliberately or unconsciously. As a scientist it is the duty
of the statistician to see that mistakes are as few as possible. Deliberate
mistakes are due to bias and prejudice and they can be eliminated
completely if the statistician is careful in the selection of his staff, so that only
such persons are given the task of collection and analysis of data who can
work impartially. In previous chapters, whenever we have discussed
the methods of collection of data or their analysis we have indicated the
various sources from which errors can arise. But it is not enough that
the statistician only knows the possible sources of errors or mistakes
as this knowledge would not by itself reduce the magnitude of the errors.
For minimizing errors the statistician has to be very careful at every
stage and has to do his job without preconceived notions of any type.
Whatever is true of collection and analysis of data is also true of
interpretation. Even if the data are properly collected and analysed wrong
interpretation would lead to inaccurate conclusions. If a particular
group of persons are affected by the conclusions drawn from a set of
figures they would naturally like to interpret them in such a manner that
their selfish ends are fulfilled. It is, therefore, absolutely essential
that the work of interpretation of data is assigned exclusively to such
persons who are not only familiar with the use of statistical methods
but are also impartial and who can look at the problem under investigation
in a correct perspective.
Before interpreting the data the statistician should carefully note
the following things : -
(i) That the collected data are appropriate for a study of the
problem under investigation and that they are trustworthy also.
(ii) That the collected data are sufficient for the particular
problem and for drawing inferences.
(iii) That the data are homogeneous.
(iv) That the data have been properly analysed by the use of
appropriate statistical methods.
The statistician must invariably satisfy himself about all these things.
They relate to the collection and analysis of data. After the statistician
has satisfied himself about these things he should proceed to interpret
the data and to draw inferences from them. Here also he has to be very
cautious as errors can arise in the process of the interpretation of data
also. In interpretation, errors can arise due to two reasons:
(a) False generalisations.
(b) Wrong interpretation of statistical measures (like averages,
index numbers, measures of dispersion, correlation, etc.) cal-
culated from the data.
False generalisations
The reason for such types of mistakes generally lies in the fact that
the people draw conclusions about the whole by only studying a part.
It is not always necessary that whatever is true of the part must be true
of the whole also. Sometimes it may be that the changes recorded
in part are very similar to changes in the whole of the data but it is not
so invariably. Moreover to draw generalizations about the whole it is
necessary to know the movements recorded in various parts. It is likely
that in one part the movement is in one direction while in the other it is
in reverse direction. Under such circumstances conclusions drawn on
the basis of the study of a part would not be applicable to the whole.
Such generalisations are mostly done by propagandists and advertisers.
They magnify the conclusions drawn from parts and it appears as if the
statements made are true of the whole; actually they are not. Thus if
a manufacturer of a medicine advertises that 90% of the patients who
used a particular medicine manufactured by his company are cured of a
particular disease, it appears from the statement that the medicine is
really a very good one. But it may be that the percentage of cured
amongst those who used some other medicine is 99 or even 100. It
may also be a fact that 99% of those persons who do not take any
medicine for that particular disease are automatically cured. Yet
another possible snag in the statement may be that the patients on whom
that medicine is tried are those who have just contracted the disease and
are thus in very early stages of it. Thus it is obvious that if conclusions
are drawn from parts they may not be true about the whole.
Some examples of how false generalisations are commonly made
are given with a view to make the student of this subject careful and
cautious in accepting the generalisations made by other people.
Suppose it is said that since the per capita income of our country has
increased, therefore, there is economic progress in the country and we
are financially better off than in previous years when the per capita
income was at a lower figure. The argument appears to be all right but
if we go a little deep in the matter we shall realise that inferences drawn
from the fact that the per capita income of the country has gone up
may not be correct. Thus if the per capita income has increased on
account of the fact that the price level has gone up there may not be
any progress and financially people may be worse off. If the per capita
income of a country is doubled and if the prices increase fourfold then
in terms of goods and services the income per capita has gone down.
Again it is also likely that rich people might have become richer and
the poor poorer so that only a small section of the people are better
off while a large majority of people are actually worse off than in the
previous period. It is also possible that the calculation of per capita income
at two different periods may have been done by two different methods
and this may be responsible for the increase in the figure of the national
income in the latter period. It is also likely that the increase in per
capita income may be of a very temporary nature due to special
circumstances. Thus in times of war when the production is at its peak point
and the prices also go up the figure of the per capita income generally
increases but it hardly indicates any solid type of economic progress. It is
thus obvious that the statement which appeared only innocent and logical
in the beginning may actually be very mischievous and illogical.
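
The arithmetic behind the case of doubled income and fourfold prices can be verified by a short sketch in Python. The function name `real_income_index` is an assumption made for the illustration; the two ratios are those mentioned in the text.

```python
def real_income_index(money_income_ratio, price_ratio):
    """Income in terms of goods and services, previous period = 100."""
    return 100.0 * money_income_ratio / price_ratio


# Per capita money income is doubled while prices increase fourfold.
real = real_income_index(money_income_ratio=2.0, price_ratio=4.0)
print(real)   # 50.0
```

The index of 50 shows that although the money income per capita has doubled, its command over goods and services is only half of what it was in the earlier period.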
Take another example. Suppose it is argued that since the
imports in the country are increasing year after year, therefore, people
are consuming more goods and it is an indication that the economic
condition of the people is improving. This statement also appears
to be logical at the first thought but if we think about it carefully we
may find many pitfalls in it. Thus it is possible that along with an
increase in the imports the quantity of re-exports may also have gone
up in which case there is no increase in the per capita consumption of
the goods. Even if the quantity of the re-exports has not increased
there may have been a decline in the consumption of homemade goods
and there may not have been any increase in the per capita
consumption. It is also likely that the population of the country has increased
and the additional imports are accounted for by the increase in
population. If none of these factors have changed and the per capita
consumption of goods has really gone up it is no proof that the economic
condition of the people has improved. It may be that the additional
imports are mostly of luxury goods consumed by a handful of rich
persons who may have become richer whereas the vast majority of the
people may be consuming the same quantity of goods as formerly or
even less. It is very obvious that the conclusion which appears all
right at the beginning may not at all be accurate.
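
How a rise in imports may leave the per capita consumption unchanged can be seen from the following sketch in Python. All the figures are hypothetical and the function name `per_capita_consumption` is an assumption made for the illustration; consumption is simply taken here as imports less re-exports plus homemade goods consumed.

```python
def per_capita_consumption(imports, re_exports, home_goods, population):
    """Goods available for consumption per head of population."""
    return (imports - re_exports + home_goods) / population


# Hypothetical figures (quantities in lakh units, population in lakhs).
before = per_capita_consumption(imports=100.0, re_exports=10.0,
                                home_goods=400.0, population=50.0)
after = per_capita_consumption(imports=130.0, re_exports=30.0,
                               home_goods=390.0, population=50.0)

print(before, after)
```

In this made-up case the imports have gone up by 30 per cent but, because re-exports have risen and the consumption of homemade goods has fallen, the consumption per head is the same in both the periods.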
Wrong interpretation of statistical measures
Wrong interpretation of index numbers. Mistakes in interpretation
of data may also arise due to wrong interpretation of statistical measures
calculated from the data which have been collected and analysed. Thus
if index numbers have been calculated from the collected data and if they
are not properly interpreted, wrong conclusions are bound to be arrived
at. It has already been said that index numbers only reveal a general
tendency and further that index numbers constructed for one purpose
may not necessarily be suitable for other purpose. If some conclusions
are arrived at by interpreting index numbers without keeping in mind
their limitations it is quite likely that they may not be accurate. Similarly
if index numbers constructed for one purpose are used in such problems
which are of a different nature wrong conclusions are likely to be drawn.
Thus it may be wrong to say that since the general price level has
increased, therefore, the cost of living must have gone up. It has already
been pointed out that general purpose wholesale price index numbers
and cost of living index numbers are constructed in two different ways
and serve different purposes. One cannot be used in place of the other.
Similarly it would be wrong interpretation of index numbers to say that
since the general price level has gone up, therefore, the quantity of money
in the country has also increased. The general price level does not merely
depend on the quantity of money in circulation. It also depends on the
quantity of goods and the velocity of the circulation of money, etc.
Wrong interpretation of correlation. Just as wrong interpretation
of index numbers gives wrong conclusions, similarly if the coefficient
of correlation or the coefficient of association is not properly
interpreted wrong conclusions are likely to be arrived at. The
coefficient of correlation also indicates only a general tendency and
that is why we had mentioned in an earlier chapter that a trend line
can be obtained for studying the correlation between two series. It was
also mentioned in the chapter on Correlation that the coefficient of
correlation should be interpreted very carefully because it does not
fully disclose the mutual dependence of two variables. Moreover
correlation does not necessarily mean a cause and effect relationship
between two series. Suppose in the state of U.P. there is negative
correlation between the 'area under sugarcane' and the 'area under
foodgrains'. If from this we conclude that the cultivation of sugarcane
is increasing at the cost of cultivation of foodgrains it would be a
clear example of wrong interpretation of the coefficient of correlation.
Again if we conclude that people prefer sugar to foodgrains it would
also be a wrong interpretation of the coefficient of correlation. It is
likely that due to import of foodgrains at low prices from foreign
countries their cultivation inside the country has become less
profitable and so sugarcane is being cultivated in place of foodgrains.
It may also be that due to the increase in the number of sugar mills
the price of sugarcane has gone up and it may have become more
profitable to produce sugarcane than to produce foodgrains. It is also
likely that due to the construction of canals those people who could
not produce sugarcane for want of adequate water have started producing
it. The preference for sugarcane cultivation over the cultivation of
foodgrains may also be due to changes in the climatic conditions. Thus
before any conclusion is arrived at from the coefficient of correlation
in the present case it is absolutely essential to take into account all
these factors. If they are ignored it is obvious that the conclusion
would not be dependable.
Take another example. Suppose the proportion of child accidents
is less in those localities where there are parks and more in those
where there are no parks. It means there is negative correlation
between the number of parks and the number of child accidents. From
this it cannot be concluded that to reduce the number of accidents the
number of parks must be increased. It is possible that in those
localities where there are parks rich people live, the number of
children is small and the number of servants who look after them is
large. It is also likely that in such localities there are gardens
attached to the houses and the children do not come out on the roads
frequently. It is thus clear that unless the coefficient of correlation
is interpreted very carefully misleading conclusions are likely to be
drawn.
Wrong interpretation of association. In the case of the coefficient of
association also, unless proper precaution is taken wrong conclusions
are likely to be drawn. In the chapter on Association of Attributes,
while discussing partial association, we had mentioned that the
association between two attributes may be the result of their common
association with a third attribute and there may not be any direct
association between them. Thus, if there is a positive association
between inoculation and exemption from attack of small-pox it should
not be immediately concluded that inoculation is useful in preventing
the disease. It may be that most of the inoculated people are rich and
live in healthy surroundings, where the chances of getting the disease
are small. Thus the apparent association between inoculation and
exemption from small-pox may be due to the common association of both
these attributes with a third attribute, namely, the economic status of
the people.
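The point about a common third attribute can be illustrated numerically. The sketch below is only an illustration in Python: the counts are entirely hypothetical (they are not taken from any survey), the names `yule_q`, `rich`, `poor` and `pooled` are assumptions made for this example, and Yule's coefficient Q = (ad - bc)/(ad + bc) is assumed as the measure of association. Within each economic group the association is nil, yet the pooled figures show a marked positive association.

```python
from fractions import Fraction

def yule_q(a, b, c, d):
    """Yule's coefficient of association for the 2x2 table [[a, b], [c, d]]."""
    return Fraction(a * d - b * c, a * d + b * c)

# HYPOTHETICAL counts, assumed only for illustration.
# Each entry is (exempt from attack, attacked).
rich = {"inoculated": (760, 40), "not_inoculated": (190, 10)}
poor = {"inoculated": (140, 60), "not_inoculated": (560, 240)}

# Pool the two economic groups by adding the corresponding cells.
pooled = {
    key: tuple(x + y for x, y in zip(rich[key], poor[key]))
    for key in rich
}

def q_for(table):
    """Association between inoculation and exemption in one table."""
    a, b = table["inoculated"]
    c, d = table["not_inoculated"]
    return yule_q(a, b, c, d)

print(q_for(rich), q_for(poor), q_for(pooled))  # 0 0 1/2
```

In these assumed figures the whole of the pooled association arises because most inoculated persons belong to the group with the lower attack rate.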
The foregoing examples clearly show that interpretation of data is an
extremely difficult task and unless proper precautions are taken
misleading conclusions are likely to be drawn. A statistician should
always keep in mind the limitations of the science of statistics. Statistics
generally do not reveal the entire story of a phenomenon. We do not
take into account all the factors that affect a particular phenomenon
and from a study of a few factors it is very risky to draw generalisa-
tions. The laws of statistics are true only on an average and as such
a statistician can never afford to be dogmatic about his conclusions.
Statistical conclusions are always based on certain assumptions and in
many cases the assumptions do not hold good and faulty conclusions
are thus arrived at.
Effect of wrong interpretations. Wrong interpretation of statistical
data promotes distrust of the science of statistics. People lose faith
in statistics and begin to look at them with suspicion. Unfortunately
statistics do not bear a trade mark of their quality and as such good,
bad and indifferent statistics get mixed up and create confusion. Wrong
interpretation of statistical data leads people to use for the science
of statistics phrases like 'tissues of falsehood' and 'lies of the
first order'. The common man begins to believe that 'statistics can
prove anything' and that 'an ounce of truth can produce tons of
statistics'. It is, therefore, essential that students of this subject
should realise the dangers of wrong interpretation of statistical data,
should take adequate precautions in drawing inferences and, further,
should not accept other people's statistics at their face value.
Questions
1. What do you understand by interpretation of data ? Write a detailed
note on its utility.
2. What are the mistakes commonly committed in interpretation ? What
precautions are necessary to avoid them ?
3. What conclusions would you draw regarding the economic activities of
the people living in U.S.S.R. from the study of the figures given in the
following table ?

                              1919  [illegible]  1931  1932  1933  1934  1935
Industrial production          126      164       203   231   250   300   369
Output of investment goods     131      195       240   279   307   382   481
Output of consumer's goods     122      147       171   190   200   230   274
Net imports                   [figures illegible]
Net exports                   [illegible; legible fragments: 144, 100, 45]
(B. Com., Allahabad, 1939).

4. Interpret the data given below and illustrate any two series given by a
suitable diagram :-

                                            Percentage of
Country                      World land   World cultivated   World production   World
                                area           area             of cereals      population
Asia excluding U.S.S.R.         18.6           32.9               31.0            53.1
North America                   17.3           21.2               21.5             8.2
U.S.S.R.                        16.1           16.8               22.0             7.6
Europe excluding U.S.S.R.        3.7           16.3               16.0            17.9
Mid and South America           13.2            5.7                4.5             5.0
Africa                       [illegible]        5.6                4.0             7.7
Oceania                      [illegible]    [illegible]            1.0             0.5
Total                          100.0          100.0              100.0           100.0
(M. A., Allahabad, 1952).
5. How far do you agree with the conclusions drawn in the following cases :-
(a) It is observed that intelligent fathers have intelligent sons, and intelligent
grandfathers have intelligent grandsons; therefore, intelligence is hereditary.
(b) Two series, quantity of money in circulation and general price index,
are found to possess positive correlation of a fairly high order. It is concluded that
one is the cause and the other the effect in a direct causal relationship.
(c) It is observed that generally death rates in two towns are identical. It is
inferred from this that the populations of both the towns are equally healthy.
(M. A., Rajputana, 1959).

6. Point out the ambiguity or mistakes, if any, in the following statements :-

(a) The death rate in the American Navy during the Spanish-American War
was 9 per 1000 while in the city of New York for the same period it was 16 per 1000.
It was safer then to be a sailor in the American Navy than to live in the city of
New York.

(b) The per capita income for India in 1931-32 according to the estimates
framed by Dr. V. K. R. V. Rao was Rs. 65.00. The estimate for 1948-49 framed by the
National Income Committee was Rs. 225. In 1948-49 India was, therefore, four
times more prosperous than in 1931-32.
(c) The examination result of school X was 75% in a particular year. In the
same year and at the same examination only 400 out of a total of 600 students were
successful in school Y. The teaching standard of the former school was decidedly
better. (B. Com., Delhi, 1953)

7. Study the following table very carefully :-

[Table: six columns of figures by year; the column headings are illegible]

1931-32    15.9    222.5     8.5    0.7    2.0    [illegible]
1932-33    17.9    261.2    16.4    1.4    2.2    [illegible]
1933-34    17.3    257.0    30.2    2.7    1.6    [illegible]
1934-35    18.4    275.8    36.9    3.2    1.2    17.2
1935-36    22.5    333.6    55.3    5.3    1.0    20.4
1936-37    25.2    380.2    63.0    6.1    0.8    24.1
1937-38    22.2    319.4    57.9    5.3    1.0    18.8
1938-39    16.5    145.4    35.9    3.2    1.0     7.3
1939-40    19.1    217.4    70.3    6.6    1.0     9.2
1940-41    25.4    286.4    52.0    5.1    1.2    15.5
1941-42    17.5    145.5    38.8    3.8    0.8     7.5

Write a short review based on the above table of the sugar economy of Uttar
Pradesh during the period 1931 to 1942. (M. A., Agra, 1944).
8. Interpret the following results relating to two colleges A and B and find
out which of the two is better :-

                         College A                       College B
Examination    No. of candidates   Successful   No. of candidates   Successful
                   appeared                         appeared
M. A.                 30               25             190           [illegible]
M. Com.               50               45         [illegible]           85
B. A.                200              150             100               70
B. Com.              120               75              80               50
Total                400              295         [illegible]          215

9. The following table gives the figures of value of Industrial Production and
Indices of Wholesale General Prices for a particular area for ten years :

Year       Value of Industrial Production   Indices of Wholesale
                 in lakhs of tons                  Prices
1931-32                 60                          100
1932-33                 36                           91
1933-34                 45                           87
1934-35                 58                           79
1935-36                 84                           91
1936-37                 93                           91
1937-38                 86                          102
1938-39                 84                           95
1939-40                 82                          108
1940-41                 80                          120
Comment on the above figures.
10. The following table gives two series of index numbers. The first
relates to the price level of those commodities which the agriculturist of Uttar
Pradesh sells and the second relates to the general price level of those commodities
which he purchases. Analyse the two series and find out
(a) Whether the economic condition of the agriculturist of Uttar Pradesh was
favourable or unfavourable to him month by month in 1948.
(b) Whether at the end of 1948 he was better or worse off as compared to (i)
1932 ; (ii) Beginning of 1948.

Months of        Series A        Series B
1948             1939=100        1939=100
January            434             310
February           420             323
March              374             332
April              354             351
May                417             390
June               438             387
July               474             395
August             495             405
September          500             392
October            499             393
November           485             392
December           485             378

11. Interpret the following data :-

Indices of Industrial Disputes in India
1939=100

Year    Indices of the      Indices of number of    Indices of
        number of disputes  labourers involved      man-days lost
1939         100                  100                   100
1940          78                  171                   152
1941          88                   71                    67
1942         171                  189                   116
1943         176                  128                    47
1944         162                  137                    69
1945         202                  183                    81
1946         401                  479                   255
1947         446                  450                   332
1948         310                  259                   157
1949         227                  168                   136
1950         201                  176                   257

12. Discuss the soundness of the following arguments and indicate what
additional information, if any, would be needed to test the matter more effectively.
Your answers must be brief and to the point. (You may assume that all the facts
stated are correct.)
(a) Wages have risen much more than salaries compared with pre-war days,
so any action to remove the excess of demand should be concentrated on them.
(b) The percentage of women in the various ranks of state service falls
consistently as one goes up the hierarchy; this must mean either that there is a
prejudice against promoting women, or that they have less ability to do responsible work.
(c) The mortality rates of miners for a particular period were above the male
average in each age-group, but those of miners' wives were even further above the
female averages; it follows that high rates for miners were not due to unhealthy
working conditions but to low incomes.
(d) The futility of diphtheria immunization is shown by the fact that there
were over 5,000 cases of diphtheria amongst immunized children in a particular
period. (P. C. S., 1953).
13. The following figures have been taken from the Census of India Report,
1951 :

Age-group 5 to 14

Zone             Number of     Number of       Number of     Number of
                 males in      married males   females in    married females
                 thousands     in thousands    thousands     in thousands
North India      [illegible]   [illegible]     [illegible]       1,568
East India         10,935      [illegible]       10,253          1,759
South India         9,256           87            9,213            421
West India       [illegible]       128            5,010            535
Central India       6,750          494            6,427          1,364

On the basis of the foregoing figures, write a critical note on the extent of early
marriages in different zones of India. (P. C. S., 1955).

14. Comment on the following :-

(a) A building contractor employs three categories of workers, men,
women and boys, and pays them a daily wage of Rs. 3, 2 and 1 respectively. The
average of the three wage rates is Rs. 2. If the contractor employs 50 workers (30 men,
15 women and 5 boys), the total wage bill of the contractor will be Rs. 100.
(b) Nearly all the A's are B's, and therefore A and B must be associated.
(c) The rate for a certain commodity in the first month is 4 Kg. per rupee and
in the second month is 6 Kg. per rupee. Thus the average price is (4+6)/2 = 5 Kg.
for a rupee.
15. From the following data, show whether there is any relationship between age
and blindness.

Age       No. of Persons     Blinds
          (in thousands)
0-10          100              55
10-20      [illegible]         40
20-30          40              40
30-40      [illegible]         47
40-50          24          [illegible]
50-60      [illegible]         36
60-70           6              22
70-80           3              18
16. Two students A and B get the following marks :

                                 A       B
First Terminal Examination      45%     65%
Second   "         "            55%     55%
Annual Examination              65%     45%

Since the average percentage of both of them is 55, the progress of both the
students is identical. Comment.
23. PROBABILITY
In the preceding chapters we have studied the analysis of such data as
are obtained from sources which are beyond our control, for example,
fluctuations in prices, distributions of income, etc. In this chapter
we shall study the analysis of artificial data which are obtained by
methods which are more or less in our control. The analysis of these
data gives certain generalisations which are of very great use in
statistics. These generalisations are studied under the theory of
probability.
The mathematical measure of probability is generally defined in
the following manner :-
"If an event can happen in m ways and fail to happen in n ways,
and each of these ways is equally likely, the probability or the chance
of its happening is $p = \dfrac{m}{m+n}$ and that of its failing to
happen is $q = \dfrac{n}{m+n}$."
This is also expressed by saying that the chances are m to n that
the event will, or n to m that the event will not, happen.

For example, if in a lottery there are 4 prizes and 16 blanks the
chance that a person who holds one ticket will get a prize is
$\dfrac{4}{4+16}$ or $\dfrac{1}{5}$ and the chance that he will not get
a prize is $\dfrac{16}{4+16}$ or $\dfrac{4}{5}$. In other words, the
odds are 4 to 16 or 1 to 4 that he will get a prize.

Since an event can either happen or not happen, the probability of
its happening and the probability of its not happening total to 1. In
other words, p + q = 1. It means that (1 - p) = q and (1 - q) = p. It
also means that if the happening of an event is certain p = 1 and if
its not happening is certain q = 1.
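
The definition can be verified on the lottery illustration with a few lines of Python. This is a minimal sketch; the function name `probability` is an assumption made for this illustration, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def probability(favourable, unfavourable):
    """p = m / (m + n) for m favourable and n unfavourable equally likely ways."""
    return Fraction(favourable, favourable + unfavourable)

# Lottery with 4 prizes and 16 blanks, one ticket held.
p = probability(4, 16)   # chance of getting a prize
q = 1 - p                # chance of not getting a prize

print(p, q)  # 1/5 4/5
```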

Thus we can conclude that the probability of the happening of an
event is equal to the number of ways favourable to the event, divided
by the total number of ways in which the event can happen and not
happen. In the illustration given above the number of ways favourable
to the event was 4, because there were four prize tickets, and the
total number of ways in which a person could have possessed one
ticket out of 20 was 20. The probability of winning a prize, then, is
equal to the number of favourable ways divided by the total number
of ways, favourable and unfavourable, or $\dfrac{4}{20}$.

It is obvious from the above that to calculate the probability of
the happening or not happening of an event we shall have to calculate
the number of favourable ways and the total number of ways in which
an event can happen. For this, it is necessary to know the formulae
of Permutations and Combinations. We discuss below some of the
important rules of permutations and combinations.
PERMUTATIONS AND COMBINATIONS

The word permutation refers to the arrangements which can be
made by taking some or all of a number of things. The word
combination, on the other hand, refers to the groups or selections
which can be made by taking some or all of a number of things.
The permutations which can be made by taking the digits 1, 2, 3,
4, two at a time, are 12 in number. They are
1 2; 1 3; 1 4; 2 3; 2 4; 3 4;
2 1; 3 1; 4 1; 3 2; 4 2; 4 3.
Each of the above sets presents a different arrangement of two digits.
The combinations which can be made by taking the digits 1, 2, 3, 4
two at a time, are only 6 in number. They are 1 2; 1 3; 1 4; 2 3; 2 4;
3 4.
Each of the above sets presents a different group of two digits.
It is obvious from the above that combinations refer only to
the different groups or selections whereas permutations refer to the
arrangement of items in different groups also. In the above case, each
of the six combinations of two digits can be arranged in two ways and
so the number of permutations is 6 × 2 or 12. A combination can thus
result in a number of permutations. Thus abc is one combination of 3
letters and it can have the following six permutations :
abc acb bca bac cab cba
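
The two lists above can be reproduced by direct enumeration. The following is a small sketch using Python's standard `itertools` module; the variable names are chosen only for this illustration.

```python
from itertools import combinations, permutations

digits = [1, 2, 3, 4]

# Arrangements of two digits (order matters) and groups of two (order ignored).
perms = list(permutations(digits, 2))
combs = list(combinations(digits, 2))

print(len(perms), len(combs))  # 12 6

# The single combination "abc" gives rise to six permutations.
abc = ["".join(p) for p in permutations("abc")]
print(abc)
```

Each of the 6 groups can be arranged in 2 ways, which is why the first count is exactly twice the second.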
Permutations and combinations are generally studied in the
background of a fundamental rule. It is as follows :-
If an operation can be performed in m ways and, having been performed
in one of these ways, a second operation can then be performed in n
ways, the number of ways of performing the two operations would be
m × n.
Example 1. There are five different roads connecting two towns
A and B. In how many ways can a person go from A to B by one road
and return by another ?
The journey from A to B can be made in five ways as the man
can go by any of the five roads. The journey from B to A can be
performed in four ways only as he cannot return by the way by which
he went. The total number of ways of making the outward and inward
journey would be 5 × 4 = 20.

This principle can be extended to cases where there are more than
2 operations to be performed.
Example 2. There are six doors in a room. Four persons have
to enter it. In how many ways can they enter from different doors ?
The first person can enter from any of the six doors, the second
from any of the remaining five doors, the third from any of the remaining
four doors, and the fourth from any of the remaining three doors.
Thus the total number of ways by which they can enter through different
doors is
6 × 5 × 4 × 3 = 360
(i) The number of permutations of n dissimilar things taken r at a time.
This is the same thing as finding out the number of ways in which
r places can be filled when there are n dissimilar things. The first
place can be filled up in n ways, the second in (n-1) ways, the third
in (n-2) ways and so on. The total number of permutations would be
$n \times (n-1) \times (n-2) \times \cdots \times (n-r+1)$.
In example No. 2 the value of n was 6 and of r, 4.
Thus the number of ways in which four people can enter a six-door
room from different doors is
$n(n-1)(n-2)(n-3)$
or
$6\,(6-1)(6-2)(6-3) = 360$
The above rule is written in short form as $^{n}P_{r}$.
Thus
$^{n}P_{r} = n(n-1)(n-2)\cdots(n-r+1) = \dfrac{n!}{(n-r)!}$
The sign ! is read as factorial.
$n! = n(n-1)(n-2)(n-3)\cdots 1$
In example No. 2, the number of ways in which four persons can
enter a room from six different doors
$= {}^{6}P_{4} = \dfrac{6!}{(6-4)!} = \dfrac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{2 \times 1} = 360$
In actual practice it is not necessary to write all the terms as written
above. $\dfrac{6!}{(6-4)!}$ actually means 6 × 5 × 4 × 3. We need not
further write × 2 × 1 because the denominator will also have these
terms and they will cancel each other.
Example 3. In how many ways can ten seats be occupied by four
students ? Here
$^{n}P_{r} = {}^{10}P_{4} = \dfrac{10!}{(10-4)!} = 10 \times 9 \times 8 \times 7 = 5040$
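
Examples 2 and 3 can be checked with a short Python sketch. The helper `n_p_r` is a name assumed for this illustration; it applies the factorial formula directly, and the result is compared with the standard library function `math.perm` (available in Python 3.8 and later).

```python
from math import factorial, perm

def n_p_r(n, r):
    """Number of permutations of n dissimilar things taken r at a time: n!/(n-r)!."""
    return factorial(n) // factorial(n - r)

doors = n_p_r(6, 4)    # Example 2: four persons, six doors
seats = n_p_r(10, 4)   # Example 3: four students, ten seats

print(doors, seats)  # 360 5040
```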
(ii) The number of combinations of n dissimilar things taken r at a time.
We know that r dissimilar things can be arranged in r! ways. As
such, if the total number of permutations of n dissimilar things taken r
at a time is divided by r! we shall get the total number of combinations
of n dissimilar things taken r at a time.
If this number is $^{n}C_{r}$, it is equal to $\dfrac{^{n}P_{r}}{r!}$.
Thus
$^{n}C_{r} = \dfrac{n!}{(n-r)!\; r!}$
It means that $^{n}P_{r} = {}^{n}C_{r} \times r!$
Example 4. In how many ways can five persons be chosen out of
eight ?
Here we are concerned with the number of selections of a group
or combination of five persons, and not with the arrangement of the
selected units.
$^{8}C_{5} = \dfrac{8!}{(8-5)!\; 5!} = \dfrac{8 \times 7 \times 6}{3 \times 2 \times 1} = 56$
If the five selected persons have to be despatched to five different
stations, the different ways in which this arrangement is possible would
be 56 × 5! = 56 × 120 = 6720.
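
The relation between combinations and permutations in Example 4 can be confirmed as follows. This is a minimal sketch using `math.comb` and `math.perm` from the Python standard library (Python 3.8 and later); the variable names are assumptions made for this illustration.

```python
from math import comb, factorial, perm

groups = comb(8, 5)                 # ways of choosing five persons out of eight
postings = groups * factorial(5)    # each group sent to five different stations

print(groups, postings)  # 56 6720
```

The second figure equals the number of permutations of eight things taken five at a time, as the rule requires.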
(iii) The number of ways in which m + n + p things can be divided into
three groups containing m, n and p things respectively.

This is equal to
$\dfrac{(m+n+p)!}{m!\; n!\; p!}$
If, however, m = n = p the formula becomes $\dfrac{(3m)!}{m!\; m!\; m!}$.
But this formula regards as different all the possible orders in which
the three groups can occur in any one mode of division. When
m = n = p such a distinction is not possible. Since there are 3! such
orders corresponding to each mode of sub-division, the number of
different ways in which three equal sub-divisions can be made is
$\dfrac{(3m)!}{m!\; m!\; m!\; 3!}$
Example 5. In how many ways can twelve books be allotted to
three shelves which can hold 2, 4 and 6 books respectively ?
There are three groups in which twelve books have to be divided.
The groups are to be of 2, 4 and 6 books.
Thus
$\dfrac{(m+n+p)!}{m!\; n!\; p!} = \dfrac{(2+4+6)!}{2!\; 4!\; 6!} = \dfrac{12!}{2!\; 4!\; 6!} = \dfrac{12 \times 11 \times 10 \times 9 \times 8 \times 7}{4 \times 3 \times 2 \times 2} = 13860$

If, however, the shelves were of equal size so that each could
contain four books the answer would be
$\dfrac{(m+n+p)!}{m!\; n!\; p!\; 3!}$ or $\dfrac{(3m)!}{m!\; m!\; m!\; 3!}$, since m = n = p,
$= \dfrac{12!}{4!\; 4!\; 4!\; 3!} = 5775$
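
Both answers of Example 5 can be reproduced with the sketch below. The function name `divisions` is an assumption made for this illustration; it evaluates the formula $(m+n+p)!/(m!\,n!\,p!)$ for any group sizes, and the division by 3! for equal groups is then applied separately, following the text.

```python
from math import factorial

def divisions(*sizes):
    """Ways of dividing sum(sizes) things into groups of the given sizes."""
    total = factorial(sum(sizes))
    for size in sizes:
        total //= factorial(size)   # exact at every step
    return total

unequal = divisions(2, 4, 6)                     # shelves holding 2, 4 and 6 books
equal_ordered = divisions(4, 4, 4)               # order of the three groups counted
equal_groups = equal_ordered // factorial(3)     # order of the groups ignored

print(unequal, equal_ordered, equal_groups)  # 13860 34650 5775
```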
(iv) The number of ways in which n things may be arranged among
themselves when p of the things are exactly alike and of one kind, q of
the things are exactly alike of a second kind, r of the things are
exactly alike of a third kind, and the rest are all different.
If n things are all different, the total number of ways in which
they can be arranged is n!. If, however, some of them are exactly
similar to each other, the number of permutations is reduced. Thus
the number of ways in which n things can be arranged, if p are of one
kind, q of a second kind, and r of a third kind, is
$\dfrac{n!}{p!\; q!\; r!}$
Example 6. In how many ways can the letters of the word
"combination" be arranged ?
If all the eleven letters in the word "combination" were different
the answer would have been 11!. In this case, however, there are two
o's, two i's and two n's. The two letters in each of these groups can be
interchanged without producing a new arrangement. As such the
answer would be
$\dfrac{11!}{2!\; 2!\; 2!} = 49,89,600$
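
Rule (iv) lends itself to a short general routine. The sketch below is an illustration only; the function name `arrangements` is assumed, and `collections.Counter` from the Python standard library is used to count how many times each letter occurs.

```python
from collections import Counter
from math import factorial

def arrangements(word):
    """n! divided by the factorial of the count of every repeated letter."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total

print(arrangements("combination"))  # 4989600
```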
CALCULATION OF PROBABILITY

Simple events
We have already pointed out that if an event can happen in m
ways and fail to happen in n ways the probability of its happening or p
is $\dfrac{m}{m+n}$ and the probability of its not happening or q is
$\dfrac{n}{m+n}$.
Since the event can either happen or not happen p + q = 1; hence
p = (1 - q) and q = (1 - p).
It is clear that (m+n) indicates the total number of ways in which
an event can take place, and m indicates the number of ways favourable
to the happening of the event. Therefore, to find out the probability
of the happening of the event, we have to find out the number of ways
favourable to the event and the total number of ways in which the event
can take place and divide the former by the latter.
Example 7. Find the chance of throwing a number greater than
two with an ordinary dice.
A dice has six faces marked 1 to 6. Numbers greater than two
can be 3, 4, 5 or 6. Thus, there are four ways favourable to the
happening of the event and the total number of ways in which the event
can happen is 6. The desired probability therefore is $\dfrac{4}{6}$ or
$\dfrac{2}{3}$.
Example 8. In a bag there are four red and three black balls.
What is the probability that if they are drawn one at a time, the first
will be red, the second black, the third red and so on ?
The question is like the one relating to the arrangement of 7 balls
(4 red and 3 black) in seven places. The four red balls are to occupy
the four odd places, i.e., 1, 3, 5 and 7, and the three black balls are
to occupy the three even places, i.e., 2, 4 and 6.
Four red balls can be arranged in four places in 4! ways and
similarly three black balls can be arranged in three places in 3! ways.
The total number of arrangements of seven balls in seven places is
equal to 7!.
Thus the desired probability is
$\dfrac{4!\; 3!}{7!} = \dfrac{1}{35}$
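
The result of Example 8 can also be obtained by brute-force enumeration of all 7! orders in which the balls may be drawn. This is only a sketch: the labels `R1` to `R4` and `B1` to `B3` are assumptions introduced so that the balls can be told apart when counting equally likely orders.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

balls = ["R1", "R2", "R3", "R4", "B1", "B2", "B3"]
pattern = "RBRBRBR"   # red in the odd places, black in the even places

# Count the orders whose colour sequence matches the required pattern.
favourable = sum(
    1
    for order in permutations(balls)
    if "".join(ball[0] for ball in order) == pattern
)
total = factorial(len(balls))
chance = Fraction(favourable, total)

print(favourable, total, chance)  # 144 5040 1/35
```

The count of 144 favourable orders agrees with 4! × 3!.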
Example 9. From a bag containing four black and five red balls
a draw of three balls is made. What is the probability that all of them
would be black ?
The total number of ways in which three balls can be drawn out
of 9 is $^{9}C_{3}$.
The number of ways in which three black balls can be drawn out
of four is $^{4}C_{3}$.
Therefore the desired probability $= \dfrac{^{4}C_{3}}{^{9}C_{3}}$

$^{4}C_{3} = \dfrac{4 \times 3 \times 2}{3 \times 2 \times 1} = 4$ and
$^{9}C_{3} = \dfrac{9 \times 8 \times 7}{3 \times 2 \times 1} = 84$
Therefore
$\dfrac{^{4}C_{3}}{^{9}C_{3}} = \dfrac{4}{84} = \dfrac{1}{21}$
Example 10. A has three tickets in a lottery in which there are 3
prizes and 6 blanks. B has one ticket in a lottery in which there is one
prize and two blanks. Who has a better chance of winning a prize ?
A will not get any prize if all his three tickets are blanks. Three
tickets can be chosen out of 9 in $^{9}C_{3}$ ways and 3 blank tickets
can be chosen out of 6 in $^{6}C_{3}$ ways.
Thus A's chance of not getting a prize
$= \dfrac{^{6}C_{3}}{^{9}C_{3}} = \dfrac{20}{84} = \dfrac{5}{21}$
B will not get a prize if his ticket is blank. One ticket can be
drawn out of three in $^{3}C_{1}$ ways and one blank ticket can be
drawn out of two in $^{2}C_{1}$ ways.
Thus the chance of B not getting a prize
$= \dfrac{^{2}C_{1}}{^{3}C_{1}} = \dfrac{2}{3}$
Thus A's chance of not getting a prize is $\dfrac{5}{21}$ and B's
chance of not getting a prize is $\dfrac{2}{3}$.
In other words A's chance of getting a prize (either 1 or 2 or
3) is
$1 - \dfrac{5}{21}$ or $\dfrac{16}{21}$
and B's chance of getting a prize is
$1 - \dfrac{2}{3}$ or $\dfrac{1}{3}$

The respective chances of A and B of winning a prize are in
the ratio of 16 : 7. Thus A has a better chance of winning a prize
than B.
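
The method of Example 10, finding first the chance of getting no prize and subtracting it from 1, can be sketched in Python as follows. The function name `chance_of_prize` and its parameter names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb

def chance_of_prize(prizes, blanks, tickets):
    """Chance of at least one prize = 1 - chance that every ticket held is a blank."""
    no_prize = Fraction(comb(blanks, tickets), comb(prizes + blanks, tickets))
    return 1 - no_prize

a = chance_of_prize(prizes=3, blanks=6, tickets=3)   # A's lottery
b = chance_of_prize(prizes=1, blanks=2, tickets=1)   # B's lottery

print(a, b, a / b)  # 16/21 1/3 16/7
```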
Example 11. There are three events A, B and C, of which one
must, and only one can, happen. The odds are 8 to 3 against
A, 5 to 2 against B. Find the odds against C.
The probability of A's happening is $\dfrac{3}{11}$.
The probability of B's happening is $\dfrac{2}{7}$.
∴ The probability of C's happening is
$1 - \left(\dfrac{3}{11} + \dfrac{2}{7}\right)$ or $\dfrac{34}{77}$
because one and only one of the events can and must happen.
Therefore odds against C are 43 to 34.
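
The conversion between odds and probabilities used in Example 11 can be sketched as below. The two helper names are assumptions made for this illustration; odds of "a to b against" an event are taken to mean a probability of b/(a+b), as in the example.

```python
from fractions import Fraction

def prob_from_odds_against(against, in_favour):
    """Odds of `against` to `in_favour` against an event give p = in_favour / total."""
    return Fraction(in_favour, against + in_favour)

def odds_against(p):
    """Return the odds against an event of probability p as a pair (against, in favour)."""
    return (p.denominator - p.numerator, p.numerator)

p_a = prob_from_odds_against(8, 3)   # 3/11
p_b = prob_from_odds_against(5, 2)   # 2/7
p_c = 1 - (p_a + p_b)                # exactly one of A, B, C must happen

print(p_c, odds_against(p_c))  # 34/77 (43, 34)
```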
Compound events
Up till now we have been discussing the probability of the
happening or not happening of single events. Such events are called
simple events. When two or more events occur together, their joint
occurrence is called a Compound Event. If, for example, there is a bag
containing six white and twelve black balls and if 2 successive draws
of four balls are made, and if we have to find out the probability of
getting four white balls in the first draw and four black balls in the
second draw, we shall be dealing with a compound event.
In such cases two or more events may be either independent of
each other or the second or third event may be dependent on the first
and second respectively. For example, if in the above illustration the
balls are replaced after the first draw, then the probability of the
second event is not dependent on the first, because the position at the
time of the second draw is the same as at the time of the first draw:
each time the bag contains six white and twelve black balls. If,
however, the balls are not replaced after the first draw, the
probability of the second event is affected by the first. In the above
case, if four white balls have been drawn in the first draw, at the
time of the second draw there would be only two white and twelve black
balls. The probability of drawing four black balls thus would depend
on the happening of the first event.
Thus, events can be either independent or dependent, depending on
whether the probabilities of successive events are not affected or
affected by the happening of the previous events.
Multiplication of probabilities
To find the probability of the combined happening if two or more inde-
pendent events if the probabilities of their separate happenings are known.
Suppose the first event can happen in tli 'ways and fail to happen
in n ways each of which is equally likely; and suppose a second
independent event can happen in m' ways and fail to happen 'in tI ways
each t)f which is equally likely. Now each of the (m+n) cases can be
associated with each of the (If.' +n') cases, and the number of compound
cases each equally likely to occur, would be (m+n) (m' +n').
Now mm' would' denote the number of ways in which both the
events happen, nn' the number of ways in which both fail to happen,
I1In' the number of cases 1n which the first event happens and the second
does not happen and nm' the number of ways in which the first event
does not happen but the second happens.
According to the above fundamental rule of probability
mm'
(lII+n) (111' +ni) would be the chance of both events happening
PROBABILITY

nn'
,--.:--would be the chance of both events failing;
(m+n) (n;' +n')
mn' wopld be the chance of the first happening and seconA
~~~ .---- \
(m+n) (m'+n') failing; and
nm' would be the chance of the first failing and the second
(m+n) (m' +n') happening.
If the respective chances of the happening of two events are denoted
by $p$ and $p'$, the probability that both of them would happen
would be $pp'$. Similarly the chance that neither of them would happen
would be $(1-p)(1-p')$; the chance that the first would happen and the
second would fail to happen would be $p(1-p')$; and the chance that the
first event does not happen and the second happens would be $p'(1-p)$.
If $p$ is the probability of the happening of an event in one trial,
the probability of its happening in each of two trials would be $p^2$, in
each of three trials $p^3$, and in each of $n$ trials $p^n$, because the
probability of its happening in each of these trials is equal and
independent of the previous events.
If $p_1$, $p_2$ and $p_3$ are the probabilities of the happening of three
independent events, the probability that at least one of them would
happen is
$1-(1-p_1)(1-p_2)(1-p_3)$, because the chance that all the three
events fail to happen is $(1-p_1)(1-p_2)(1-p_3)$, and except in this case
some of the events (either 1 or 2 or 3) must happen.
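The two rules just stated can be checked with exact fractions. The following is only a sketch; the function names and the sample probabilities 1/2, 1/3 and 1/4 are assumptions chosen for illustration and do not come from the text.

```python
from fractions import Fraction
from math import prod


def all_happen(probs):
    """Chance that every one of several independent events happens."""
    return prod(probs, start=Fraction(1))


def at_least_one(probs):
    """Chance that at least one of several independent events happens."""
    return 1 - prod((1 - p for p in probs), start=Fraction(1))


# Hypothetical probabilities of two independent events.
p, p_dash = Fraction(1, 2), Fraction(1, 3)
both = all_happen([p, p_dash])        # pp'
neither = (1 - p) * (1 - p_dash)      # (1-p)(1-p')
first_only = p * (1 - p_dash)         # p(1-p')
second_only = p_dash * (1 - p)        # p'(1-p)

# The four compound cases exhaust all possibilities.
total = both + neither + first_only + second_only
print(total)

# Three hypothetical independent events.
some_happen = at_least_one([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)])
print(some_happen)
```

Using `Fraction` keeps every result exact, so the four compound cases can be seen to add up to unity.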
Example 12. A die is thrown three times; what is the chance
that on the first throw it falls with number 1 upwards, on the
second with either number 1 or number 2, and on the third with either
number 1, or number 2, or number 3 upwards ?
The probability of the first event is $\frac{1}{6}$
The probability of the second event is $\frac{2}{6}=\frac{1}{3}$
The probability of the third event is $\frac{3}{6}=\frac{1}{2}$
Therefore the probability of the compound event is
$\frac{1}{6}\times\frac{1}{3}\times\frac{1}{2}=\frac{1}{36}$
Example 13. A bag contains 5 red and 8 green balls. Two draws
of three balls each are made, the balls being replaced after the first draw.
What is the chance that in the first draw all the balls were red and in
the second, green ?
The number of ways in which three balls can be drawn out of
13 is ${}^{13}C_3$.
The number of ways in which three red balls can be drawn out of
5 is ${}^{5}C_3$.
The number of ways in which three green balls can be drawn out
of 8 is ${}^{8}C_3$.
Therefore the probability of the first event, i.e., draw of 3 red
balls $=\dfrac{{}^{5}C_3}{{}^{13}C_3}$
Similarly the probability of the second event, i.e., draw of 3
green balls $=\dfrac{{}^{8}C_3}{{}^{13}C_3}$
The probability of the compound event, i.e., of 3 red balls in the
first draw and of three green balls in the second draw
$$=\frac{{}^{5}C_3}{{}^{13}C_3}\times\frac{{}^{8}C_3}{{}^{13}C_3}=\frac{5}{143}\times\frac{28}{143}=\frac{140}{20449}$$
Example 14. It is 8 to 5 against a person who is now 40 years old
living till he is 70, and 4 to 3 against a person now 50 living till he is
80. Find the probability that one at least of these persons would be
alive 30 years hence.
The probability that the first person would die within the next
thirty years $=\frac{8}{13}$
The probability that the second person would die within the
next 30 years $=\frac{4}{7}$
Therefore the chance that both would die within the next 30
years $=\frac{8}{13}\times\frac{4}{7}=\frac{32}{91}$
Hence the probability that one at least of these persons would
be alive 30 years hence $=1-\frac{32}{91}=\frac{59}{91}$
In all the above examples we have assumed that the successive
events are independent of the previous ones. If, however, the
probability of the successive events is affected by the previous ones, the
same rules apply for the calculation of the probability of the compound
events. If the probability of the happening of the first event is $p$ and of
the happening of the second dependent event is $p'$, the probability that
both would happen is $pp'$. In Example No. 13 we had presumed that
the balls are replaced after the first draw so that the probability of the
second event was independent of the first one. If, however, in that
example the balls were not replaced after the first draw the probability
of the second event would be dependent on the first event. The question
then would be as follows :-
Example 15. There are five red and eight green balls in a bag.
Two successive draws of three balls are made, the balls not being
replaced after the first draw. Find the chance that the first draw would give
three red and the second three green balls.
Here the probability of the first event is (as calculated in Example
No. 13) $=\dfrac{{}^{5}C_3}{{}^{13}C_3}$
The probability of the second event, however, would be different
from the one calculated in Example No. 13. The balls are now not
replaced after the first draw, so that the bag would contain only two red
and 8 green balls at the time of the second draw.
The number of ways in which 3 balls can be drawn out of 10 is ${}^{10}C_3$.
The number of ways in which 3 green balls can be drawn out of
8 is ${}^{8}C_3$.
Therefore the probability of drawing 3 green balls would be
$\dfrac{{}^{8}C_3}{{}^{10}C_3}$
The probability of the compound event, i.e., 3 red balls in the
first draw and three green in the second draw, would be
$$\frac{{}^{5}C_3}{{}^{13}C_3}\times\frac{{}^{8}C_3}{{}^{10}C_3}=\frac{5}{143}\times\frac{7}{15}=\frac{7}{429}$$
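The difference between drawing with and without replacement in Examples 13 and 15 can be reproduced with a few lines of Python. This is a sketch only; the helper name `prob_all_one_colour` is an assumption made for illustration.

```python
from fractions import Fraction
from math import comb


def prob_all_one_colour(colour_count, total_count, draw_size):
    """Chance that draw_size balls drawn together are all of one colour."""
    return Fraction(comb(colour_count, draw_size), comb(total_count, draw_size))


# First draw: 3 red balls out of a bag of 5 red and 8 green.
p_red_first = prob_all_one_colour(5, 13, 3)

# Example 13: balls replaced, so the second draw is again from 13 balls.
with_replacement = p_red_first * prob_all_one_colour(8, 13, 3)

# Example 15: the three red balls stay out, leaving 2 red and 8 green.
without_replacement = p_red_first * prob_all_one_colour(8, 10, 3)

print(with_replacement, without_replacement)  # 140/20449 7/429
```

Only the denominator of the second factor changes between the two cases, which is exactly the effect of the dependence described above.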
Example 16. In a game of bridge what is the probability that a
specified player would hold all the four kings ?
In a game of bridge a player must have 13 cards, which can be
drawn from 52 cards in ${}^{52}C_{13}$ ways.
All the 4 kings can be drawn out of 4 in ${}^{4}C_4$ ways.
The nine remaining cards can be drawn out of the 48 remaining cards in
${}^{48}C_9$ ways.
The required probability would be
$$\frac{{}^{4}C_4\times{}^{48}C_9}{{}^{52}C_{13}}=\frac{11}{4165}$$

Addition of probabilities
If a set of events is of such a character that when one of them
happens the others cannot happen, the events are said to be mutually
exclusive. For example, if three persons run a race only one of them can
win, the other two cannot, assuming a dead heat to be impossible
(that is, assuming that two or more persons do not cover the distance in
exactly the same time).
The rule for finding out the probability of the happening of
mutually exclusive events is as follows :-
"If an event can happen in different ways which are mutually exclusive the
probability that it will happen is the sum of the probabilities of its happening
in these different ways."
Thus, if the probabilities of $n$ mutually exclusive events are
$p_1, p_2, p_3, \ldots, p_n$,
then the probability that some one of these events would
happen would be
$$p_1+p_2+p_3+\cdots+p_n$$
Example 17. The odds against A solving a problem are 8 to 6
and the odds in favour of B solving the same problem are 14 to 10.
What is the probability that if both of them try the problem would be
solved ?
The problem would be solved if,
(i) Both A and B solve it.
(ii) A solves it but B fails to solve it.
(iii) B solves it but A fails to solve it.
The probability that A and B both solve the
problem $=\frac{6}{14}\times\frac{14}{24}=\frac{1}{4}$
The probability that A solves it but B fails
to solve it $=\frac{6}{14}\times\frac{10}{24}=\frac{5}{28}$
The probability that B solves it but A
fails to solve it $=\frac{8}{14}\times\frac{14}{24}=\frac{1}{3}$
Therefore the probability that the problem
would be solved (either by both, or by A or by B)
would be $=\frac{1}{4}+\frac{5}{28}+\frac{1}{3}=\frac{64}{84}=\frac{16}{21}$
This problem can be solved easily in the following way also :-
The probability that neither A nor B is able
to solve the problem $=\frac{8}{14}\times\frac{10}{24}=\frac{5}{21}$
Therefore the probability that the problem is
solved (either by both or by A or by B) $=1-\frac{5}{21}=\frac{16}{21}$
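Both routes to the answer of Example 17 can be compared directly. The sketch below assumes a small helper, `prob_from_odds_in_favour`, which is not part of the text; it converts odds of a to b in favour into the probability a/(a+b).

```python
from fractions import Fraction


def prob_from_odds_in_favour(a, b):
    """Odds of a to b in favour of an event give probability a/(a+b)."""
    return Fraction(a, a + b)


# Odds of 8 to 6 against A are odds of 6 to 8 in favour of A.
p_a = prob_from_odds_in_favour(6, 8)
p_b = prob_from_odds_in_favour(14, 10)

# Addition of the three mutually exclusive ways of solving the problem.
by_addition = p_a * p_b + p_a * (1 - p_b) + (1 - p_a) * p_b

# Shorter route: one minus the chance that both fail.
by_complement = 1 - (1 - p_a) * (1 - p_b)

print(by_addition, by_complement)  # 16/21 16/21
```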
There can be cases where both the rules of multiplication of
probability and of addition of probability may have to be used. The
following are examples of this type :-
Example 18. A bag contains five white and three black balls, and
four are successively drawn out and not replaced. What is the chance
that they are alternately of different colours ?
Beginning with white
The probability of drawing a white ball $=\frac{5}{8}$
The probability of drawing a black ball $=\frac{3}{7}$
The probability of drawing a white ball again $=\frac{4}{6}$
The probability of drawing a black ball again $=\frac{2}{5}$
Therefore the probability of the compound
event $=\frac{5}{8}\times\frac{3}{7}\times\frac{4}{6}\times\frac{2}{5}=\frac{1}{14}$
Beginning with black
The probability of drawing a black ball $=\frac{3}{8}$
The probability of drawing a white ball $=\frac{5}{7}$
The probability of drawing a black ball again $=\frac{2}{6}$
The probability of drawing a white ball again $=\frac{4}{5}$
Therefore the probability of the compound event
$=\frac{3}{8}\times\frac{5}{7}\times\frac{2}{6}\times\frac{4}{5}=\frac{1}{14}$
The above two events are mutually exclusive; therefore the required
chance that four successively drawn balls are alternately of different
colours (without mentioning the colour with which to begin)
$=\frac{1}{14}+\frac{1}{14}=\frac{1}{7}$
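The result of Example 18 can also be confirmed by brute force, listing every ordered draw of four balls and counting those that alternate in colour. This is an illustrative check, not the method of the text; the labelling of the balls by position in a string is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import permutations

# Balls 0-4 are white, balls 5-7 are black.
balls = "W" * 5 + "B" * 3

# Every ordered draw of four distinct balls is equally likely.
orders = list(permutations(range(len(balls)), 4))

# Keep the draws in which each ball differs in colour from the next.
alternating = [
    order for order in orders
    if all(balls[order[i]] != balls[order[i + 1]] for i in range(3))
]

chance = Fraction(len(alternating), len(orders))
print(len(alternating), len(orders), chance)  # 240 1680 1/7
```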
Example 19. From 30 tickets marked with the first thirty numerals
one is drawn at random. It is then replaced and a second draw
is made. Find the chance that (a) in the first draw it is a multiple of
5 or of 7 and (b) in the second it is a multiple of 3 or of 7.
(a) The chance that the number is a multiple of 5 is $\frac{6}{30}$
and the chance that it is a multiple of 7 is $\frac{4}{30}$.
These events are mutually exclusive. Hence the required chance
is $\frac{6}{30}+\frac{4}{30}=\frac{1}{3}$
(b) The chance that the number is a multiple of 3 is $\frac{10}{30}$
and the chance that it is a multiple of 7 is $\frac{4}{30}$.
But 21 is a common multiple of both 3 and 7. Hence
the probability that it is either a multiple of 3 or of 7
$=\frac{10}{30}+\frac{4}{30}-\frac{1}{30}=\frac{13}{30}$
The probability of the compound event $=\frac{1}{3}\times\frac{13}{30}=\frac{13}{90}$
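Counting the tickets directly shows why the common multiple 21 must be allowed for in part (b) of Example 19 but no such correction arises in part (a). The helper name `chance` below is an assumption of this sketch.

```python
from fractions import Fraction

tickets = range(1, 31)  # tickets numbered 1 to 30


def chance(condition):
    """Share of tickets satisfying the given condition."""
    favourable = sum(1 for t in tickets if condition(t))
    return Fraction(favourable, len(tickets))


# (a) multiple of 5 or of 7: no ticket up to 30 is a multiple of both.
first_draw = chance(lambda t: t % 5 == 0 or t % 7 == 0)

# (b) multiple of 3 or of 7: ticket 21 is counted once, not twice.
second_draw = chance(lambda t: t % 3 == 0 or t % 7 == 0)

compound = first_draw * second_draw
print(first_draw, second_draw, compound)  # 1/3 13/30 13/90
```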
Example 20. A bag contains 5 red and 3 black balls, and a
second one 4 red and 5 black balls. One of the bags is chosen at
random and a draw of 2 balls is made from it. Find the chance that
one is red and the other black.
The probability that the first bag is selected and a draw of two
balls gives one red and one black ball
$=\frac{1}{2}\times\dfrac{{}^{5}C_1\times{}^{3}C_1}{{}^{8}C_2}=\frac{15}{56}$
The probability that the second bag is selected and a draw of two
balls gives one red and one black ball
$=\frac{1}{2}\times\dfrac{{}^{4}C_1\times{}^{5}C_1}{{}^{9}C_2}=\frac{5}{18}$
Since the events are mutually exclusive the probability of the
event $=\frac{15}{56}+\frac{5}{18}=\frac{275}{504}$

Example 21. What is the chance of drawing a one-rupee coin
from a purse, one compartment of which contains 2 one-rupee and 4
one-anna coins, the second 10 one-rupee coins, the third 10 one-anna
coins and some pies, and the fourth 7 one-rupee coins and 5 four-anna
coins ?
The chance that the first compartment of the purse is chosen and
a one-rupee coin is drawn or $p_1=\frac{1}{4}\times\frac{2}{6}=\frac{1}{12}$
The chance that the second compartment is chosen and a
one-rupee coin is drawn or $p_2=\frac{1}{4}\times 1=\frac{1}{4}$
The chance that the third compartment is chosen and a one-rupee
coin is drawn or $p_3=\frac{1}{4}\times 0=0$
The chance that the fourth compartment is chosen and a one-rupee
coin is drawn or $p_4=\frac{1}{4}\times\frac{7}{12}=\frac{7}{48}$
Since the event, that is the draw of a one-rupee coin from the purse,
can happen in the above four different ways, the chance that the
one-rupee coin will be drawn from the purse is
$=p_1+p_2+p_3+p_4=\frac{1}{12}+\frac{1}{4}+0+\frac{7}{48}=\frac{23}{48}$
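The working of Example 21 combines both rules: each compartment contributes the product of the chance of choosing it and the chance of then drawing a one-rupee coin, and the four products are added. A minimal sketch follows; it assumes, as the worked solution does, that each compartment is equally likely to be chosen, and the variable names are illustrative.

```python
from fractions import Fraction

# Share of one-rupee coins in each of the four compartments.
# The third compartment holds no one-rupee coin, so its share is 0
# whatever the number of other coins in it.
rupee_share = [Fraction(2, 6), Fraction(1), Fraction(0), Fraction(7, 12)]

# Assumption: every compartment is equally likely to be chosen.
p_compartment = Fraction(1, len(rupee_share))

# Multiply within each compartment, then add across compartments.
chance_rupee = sum(p_compartment * share for share in rupee_share)
print(chance_rupee)  # 23/48
```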
The probability of the happening of an event in one trial being known,
to find the probability of its happening exactly 1, 2, 3, ..., r times in n
trials.
The probability that an event will happen exactly $r$ times in $n$
trials is
$$\frac{n(n-1)(n-2)\cdots(n-r+1)}{1\times2\times3\times\cdots\times r}\,p^{r}q^{n-r}=\frac{n!}{r!\,(n-r)!}\,p^{r}q^{n-r}={}^{n}C_{r}\,p^{r}q^{n-r}$$
where $p$ stands for the probability of its happening and $q$ for the
probability of its not happening in a single trial.
The probability that an event will happen $r$ times and fail to happen
$n-r$ times in $n$ trials, in one specified order, is equal to $p^{r}q^{n-r}$,
but such a series can happen in ${}^{n}C_{r}$ different orders, all mutually
exclusive. Therefore the probability of the happening of an event
exactly $r$ times in $n$ trials is ${}^{n}C_{r}\,p^{r}q^{n-r}$.
If we expand $(p+q)^{n}$ by the binomial theorem we can get the
probabilities of the happening of the event exactly $n$ times, $n-1$ times,
$n-2$ times, etc., in $n$ trials. We shall discuss the binomial theorem in
detail in the next chapter. For the present we shall obtain the
probability of the happening of events of this type by ${}^{n}C_{r}\,p^{r}q^{n-r}$.
Example 22. What is the probability of obtaining exactly three heads in
five throws with a single coin ?
The probability of getting a head in a single throw or $p=\frac{1}{2}$
And the probability of not getting a head in a single throw or
$q=\frac{1}{2}$
We want heads in three cases out of five so $r=3$ and $n=5$.
The desired probability would be obtained by the formula :-
${}^{n}C_{r}\,p^{r}q^{n-r}$
Substituting the values in the above formula we get
$${}^{5}C_{3}\,p^{3}q^{2}={}^{5}C_{3}\left(\tfrac{1}{2}\right)^{3}\left(\tfrac{1}{2}\right)^{2}=\frac{5\times4}{2\times1}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}=\frac{10}{32}=\frac{5}{16}$$
It should be noted that the
probability of three heads is $\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}$
and of two tails is $\frac{1}{2}\times\frac{1}{2}$
The probability of 3 heads
and 2 tails in one specified order would be
$\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}$
or $\frac{1}{32}$
But there can be 10 different orders of three heads and two tails
as follows :-
If H stands for head and T for tail
HHHTT HTHTH
HHTTH THHHT
HHTHT THTHH
HTHHT THHTH
HTTHH TTHHH
Therefore total probability would be
$10\times\frac{1}{32}$ or $\frac{5}{16}$
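The formula for exactly r happenings in n trials, and the count of ten orders used in Example 22, can both be reproduced as follows. The function name `exactly` is an assumption of this sketch.

```python
from fractions import Fraction
from itertools import product
from math import comb


def exactly(r, n, p):
    """Chance of an event happening exactly r times in n trials."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


half = Fraction(1, 2)

# Every order of five throws that shows exactly three heads.
orders = ["".join(throws) for throws in product("HT", repeat=5)
          if throws.count("H") == 3]

print(len(orders))           # 10
print(exactly(3, 5, half))   # 5/16
```

Each of the listed orders has chance 1/32, and the ten of them together give the same 5/16 as the formula.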
Example 23. Three coins are tossed simultaneously. What is the
probability that they will fall two heads and one tail ?
The chance that a coin falls head or $p$ is $\frac{1}{2}$
and the chance that it falls tail or $q$ is also $\frac{1}{2}$.
The probability that out of 3 coins there will be exactly two heads
and one tail
$={}^{3}C_{2}\,p^{2}q=3\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}=\frac{3}{8}$
Example 24. Five coins whose faces are marked 2, 3 are thrown.
What is the chance of obtaining a total of 12 ?
To make the total of 12, two coins must fall with the mark 3
up and three coins must fall with the mark 2 up. No other
arrangement would give a total of 12.
The probability that a coin in one throw would fall with the
mark 3 up is $\frac{1}{2}$ and the probability that it will fall with the mark 2 up
is also equal to $\frac{1}{2}$.
Therefore the chance that two coins would fall with mark 3 up
and three coins with mark 2 up
$={}^{5}C_{2}\,p^{2}q^{3}=\frac{5\times4}{2\times1}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}=\frac{10}{32}=\frac{5}{16}$
The probability that an event would happen at least r times in n trials is
$$p^{n}+{}^{n}C_{1}\,p^{n-1}q+{}^{n}C_{2}\,p^{n-2}q^{2}+{}^{n}C_{3}\,p^{n-3}q^{3}+\cdots+{}^{n}C_{r}\,p^{r}q^{n-r}$$
Since the probability of the happening of an event exactly
$n$ times in $n$ trials is $p^{n}$, exactly $n-1$ times is ${}^{n}C_{1}\,p^{n-1}q$, exactly $n-2$
times is ${}^{n}C_{2}\,p^{n-2}q^{2}$ and exactly $r$ times is ${}^{n}C_{r}\,p^{r}q^{n-r}$, and since all these
events are mutually exclusive, and since in any of them the event
happens at least $r$ times, the required probability must be the sum of
all these terms.
In Example No. 22 if we wish to find out the probability of at
least 3 heads in 5 throws of a single coin it means that we have to find
out the probability of either 5 or 4 or 3 heads.
The probability of 5 heads is $p^{5}=\left(\frac{1}{2}\right)^{5}=\frac{1}{32}$
The probability of 4 heads is ${}^{5}C_{4}\,p^{4}q$
$=5\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}=\frac{5}{32}$
The probability of 3 heads is ${}^{5}C_{3}\,p^{3}q^{2}$
$=\frac{5\times4}{2\times1}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}=\frac{10}{32}$
Therefore the probability of
at least three heads $=\frac{1}{32}+\frac{5}{32}+\frac{10}{32}=\frac{16}{32}=\frac{1}{2}$
Example 25. The chance that a ship safely reaches a port is $\frac{1}{5}$.
Out of 5 ships expected what is the probability that 3 at least would
arrive safely ?
The probability of the safe arrival of a ship or $p=\frac{1}{5}$ and hence
$q=\frac{4}{5}$.
We want the probability of the safe arrival of at least three ships
out of five, which means either all the five ships or four ships or three
ships.
The probability of the safe arrival of all the five ships
$=p^{5}=\left(\frac{1}{5}\right)^{5}=\frac{1}{3125}$
The probability of safe arrival of four ships
$={}^{5}C_{4}\,p^{4}q=5\times\frac{1}{5}\times\frac{1}{5}\times\frac{1}{5}\times\frac{1}{5}\times\frac{4}{5}=\frac{20}{3125}$
The probability of safe arrival of 3 ships
$={}^{5}C_{3}\,p^{3}q^{2}=\frac{5\times4}{2\times1}\times\frac{1}{5}\times\frac{1}{5}\times\frac{1}{5}\times\frac{4}{5}\times\frac{4}{5}=\frac{160}{3125}$
Therefore the probability that either 5 or 4 or 3 ships arrive
safely, which means the same thing as at least 3 ships arrive safely,
$=\frac{1}{3125}+\frac{20}{3125}+\frac{160}{3125}=\frac{181}{3125}$
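The rule for "at least r times in n trials" is simply a sum of the exact-count terms from r up to n. A short sketch, with the function name `at_least` assumed for illustration, reproduces the figures for the coin and for the ships:

```python
from fractions import Fraction
from math import comb


def at_least(r, n, p):
    """Chance of an event happening at least r times in n trials."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(r, n + 1))


# At least 3 heads in 5 throws of a coin.
heads = at_least(3, 5, Fraction(1, 2))

# At least 3 safe arrivals out of 5 ships, each with chance 1/5.
ships = at_least(3, 5, Fraction(1, 5))

print(heads, ships)  # 1/2 181/3125
```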
Mathematical Expectation
Mathematical expectation is defined as the product of the probability of
the happening of an event and the amount of money one is to get if the event
in question happens. Thus, if A is to get Rs. 20 if a particular event
happens and if the probability of the happening of the event is $\frac{1}{5}$, his
expectation would be $\frac{1}{5}\times 20$ or Rs. 4.
Thus, if $p$ represents the chance of success and $M$ the amount
which a person is to receive in the event of success, his expectation
would be $pM$.
Example 26. A and B throw with one die for a prize of Rs. 33
which is to be won by the player who first throws six. If A has the
first throw what are their respective expectations ?
The probability of A's success in the first throw $=\frac{1}{6}$
The probability of A's success in his second throw $=\left(\frac{5}{6}\right)^{2}\times\frac{1}{6}$
because A will get a second chance only when he himself fails in the
first chance and B also fails in his first chance, the probabilities of which
are $\frac{5}{6}$ and $\frac{5}{6}$.
The probability of A's success in his third throw would be
$\left(\frac{5}{6}\right)^{4}\times\frac{1}{6}$
because A will get his third chance only after both A and B have failed
in two chances each, the probability of which is $\frac{5}{6}\times\frac{5}{6}\times\frac{5}{6}\times\frac{5}{6}$.
Thus A's chance of success is the sum of the infinite series
$$\frac{1}{6}\left\{1+\left(\frac{5}{6}\right)^{2}+\left(\frac{5}{6}\right)^{4}+\cdots\right\}$$
Similarly B's chance of success is the sum of the infinite series
$$\frac{5}{6}\times\frac{1}{6}\left\{1+\left(\frac{5}{6}\right)^{2}+\left(\frac{5}{6}\right)^{4}+\cdots\right\}$$
In other words A's chance : B's chance as $\frac{1}{6}$ :
$\left(\frac{5}{6}\times\frac{1}{6}\right)$ or $\frac{1}{6}$ : $\frac{5}{36}$
Thus the chances of A and B are in the ratio of 6 : 5.
Or their respective probabilities of success are $\frac{6}{11}$ and $\frac{5}{11}$.
Therefore A's and B's expectations would be
$\left(\frac{6}{11}\times 33\right)$ or Rs. 18 and $\left(\frac{5}{11}\times 33\right)$ or Rs. 15 respectively.
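The two infinite series in Example 26 are geometric, with common ratio equal to the chance that both players miss in one round. The sketch below sums them in closed form and also adds up the first 200 terms as a numerical check; the variable names and the choice of 200 terms are assumptions made for illustration.

```python
from fractions import Fraction

p_six = Fraction(1, 6)            # chance of throwing six
both_miss = (1 - p_six) ** 2      # A and B both fail in one round

# Closed-form sums of the two geometric series.
chance_a = p_six / (1 - both_miss)
chance_b = (1 - p_six) * p_six / (1 - both_miss)

prize = 33
expectation_a = chance_a * prize
expectation_b = chance_b * prize
print(chance_a, chance_b, expectation_a, expectation_b)  # 6/11 5/11 18 15

# Numerical check: partial sum of the first 200 terms of A's series.
partial_a = float(sum(p_six * both_miss**k for k in range(200)))
print(partial_a)
```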
Inverse Probability
Up till now we have been concerned with problems in which
our knowledge of the various causes which may produce an event was
enough to enable us to determine the chances of the happening of the
event. We shall now discuss some problems of a reverse type. For
example, we may know that an event has happened as a result of some
one of a certain number of causes, and we may have to find out the
probability of a particular cause being the true one. Such problems
are known as problems of inverse probability.

Suppose a bag contains three black balls and four white ones.
Another bag contains four black balls and five white ones. A black ball
has been drawn from one of the bags and we have to find out the
probability that it came from the first bag. It is clearly a question
of inverse probability. Here we know that a black ball must have come
either from the first bag or from the second one and we have to find
out the probability of the first case.
If a very large number of draws are made from the first bag
(the ball being replaced after each draw) and if this number is denoted
by $N$, then we can reasonably expect that $\frac{3}{7}N$ times we shall get the
black ball and $\frac{4}{7}N$ times the white ball. Similarly if the draws were
made from the second bag we shall get a black ball $\frac{4}{9}N$ times and a
white ball $\frac{5}{9}N$ times. This is an application of the general theorem which
is due to James Bernoulli and was published in Ars Conjectandi in 1713,
eight years after the death of the author. It should be remembered
that this theorem holds good only when $N$ is large. In ten draws from
the first bag (with replacement) it is not impossible to get a black ball
each time, but it will not happen so if the number of draws is, say, 1000.
In a large number of draws the number of times we shall get a black
ball would be more or less equal to $pN$ or $\frac{3}{7}N$.
In the above illustration the probability that the ball came from
the first bag would be equal to the probability of the favourable events
divided by the sum of the probabilities of all possible events. In other
words the probability would be
$$P=\frac{\frac{3}{7}N}{\frac{3}{7}N+\frac{4}{9}N}=\frac{3}{7}\times\frac{63}{55}=\frac{27}{55}$$

The fundamental theorem of inverse probability may be stated as
follows :-
"An event is known to have proceeded from one of n mutually exclusive
causes whose probabilities are $P_1, P_2, \ldots, P_n$. Furthermore let $p_1, p_2, \ldots, p_n$ be
the respective probabilities that when one of the n causes exists the event will
then have followed. The probability that the event proceeded from the mth cause
is then
$$P=\frac{P_{m}\,p_{m}}{P_{1}p_{1}+P_{2}p_{2}+\cdots+P_{n}p_{n}}$$"
In the illustration discussed above the probability of the selection
of any bag out of the two is $\frac{1}{2}$ or $P_1=P_2=\frac{1}{2}$. Further the probability
of drawing a black ball from the first bag or $p_1=\frac{3}{7}$
and the probability
of drawing a black ball from the second bag or $p_2=\frac{4}{9}$.
If these figures are substituted in the formula given, the
probability that the ball came from the first bag would be
$$\frac{\frac{1}{2}\times\frac{3}{7}}{\left(\frac{1}{2}\times\frac{3}{7}\right)+\left(\frac{1}{2}\times\frac{4}{9}\right)}=\frac{27}{55}$$
Example 27. Suppose a black ball has been drawn from one of
three bags, the first containing three black balls and seven white, the
second five black balls and three white, and the third eight black balls and
four white; what is the probability that it was drawn from the first
bag ?
If an event is known to have proceeded from one of $n$ mutually
exclusive causes whose probabilities are $P_1, P_2, \ldots, P_n$, and
furthermore if $p_1, p_2, \ldots, p_n$ are the respective probabilities that the event
follows when one of the $n$ causes exists, then the probability that the
event proceeded from the mth cause is
$$P=\frac{P_{m}\,p_{m}}{P_{1}p_{1}+P_{2}p_{2}+\cdots+P_{n}p_{n}}$$
In the given problem :-
$P_1=P_2=P_3=\frac{1}{3}$ since it is just as probable that the ball was drawn
from one bag as another;
also $p_1$, i.e., the probability of drawing a black ball from the first
bag $=\frac{3}{10}$
and $p_2$, i.e., the probability of drawing a black ball from the second
bag $=\frac{5}{8}$
and $p_3$, i.e., the probability of drawing a black ball from the third
bag $=\frac{8}{12}$
Therefore the probability that a black ball drawn belongs to the
first bag
$$=\frac{\frac{1}{3}\times\frac{3}{10}}{\left(\frac{1}{3}\times\frac{3}{10}\right)+\left(\frac{1}{3}\times\frac{5}{8}\right)+\left(\frac{1}{3}\times\frac{8}{12}\right)}=\frac{36}{191}$$
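The formula of inverse probability translates into a very short function. The sketch below is illustrative only; the name `inverse_probability` and its argument layout (a list of the probabilities of the causes, a list of the probabilities of the event under each cause, and the index of the cause in question, counted from 0) are assumptions.

```python
from fractions import Fraction


def inverse_probability(causes, event_given_cause, m):
    """Chance that the event proceeded from cause m (index from 0)."""
    total = sum(P * p for P, p in zip(causes, event_given_cause))
    return causes[m] * event_given_cause[m] / total


# Two bags: 3 black of 7, and 4 black of 9; each bag equally likely.
two_bags = inverse_probability(
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(3, 7), Fraction(4, 9)],
    0,
)

# Example 27: three bags with 3 of 10, 5 of 8 and 8 of 12 balls black.
three_bags = inverse_probability(
    [Fraction(1, 3)] * 3,
    [Fraction(3, 10), Fraction(5, 8), Fraction(8, 12)],
    0,
)

print(two_bags, three_bags)  # 27/55 36/191
```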
Example 28. (a) The probability that a certain event happened was
1/10, and A, who is accurate in 49 cases out of 50, said that it happened.
B agrees with A in stating that the event happened. B is accurate
in 9 cases out of 10. What is the probability that it actually did occur ?
(b) Suppose C, who is accurate in 7 cases out of 10, denies that
the event mentioned above happened; what is the probability that it
happened ?
(a) Probability that the event happened and that A and B were
right
$=\frac{1}{10}\times\frac{49}{50}\times\frac{9}{10}=\frac{441}{5000}$
Probability that the event did not happen and that A and B were
wrong
$=\frac{9}{10}\times\frac{1}{50}\times\frac{1}{10}=\frac{9}{5000}$
Therefore the probability that the event actually did occur
$$=\frac{441/5000}{441/5000+9/5000}=\frac{441/5000}{450/5000}=\frac{441}{450}=\frac{49}{50}$$
(b) The probability that the event happened and A and B were
right and C wrong
$=\frac{1}{10}\times\frac{49}{50}\times\frac{9}{10}\times\frac{3}{10}=\frac{1323}{50000}$
Probability that the event did not happen and A and B were
wrong and C right
$=\frac{9}{10}\times\frac{1}{50}\times\frac{1}{10}\times\frac{7}{10}=\frac{63}{50000}$
Therefore the probability that the event actually did occur
$$=\frac{1323/50000}{1323/50000+63/50000}=\frac{1323/50000}{1386/50000}=\frac{1323}{1386}=\frac{21}{22}$$
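The reasoning of Example 28 extends to any number of witnesses: multiply the prior chance of the event by each witness's chance of saying what he said if the event happened, do the same for the case in which it did not happen, and compare the two products. The following sketch assumes, as the worked solution does, that the witnesses speak independently; the function name and the (accuracy, asserts_event) pairs are illustrative.

```python
from fractions import Fraction


def chance_event_occurred(prior, testimonies):
    """testimonies: (accuracy, asserts_event) pairs, one per witness."""
    happened = prior          # event happened and all statements fit
    not_happened = 1 - prior  # event did not happen and all statements fit
    for accuracy, asserts_event in testimonies:
        if asserts_event:
            happened *= accuracy
            not_happened *= (1 - accuracy)
        else:
            happened *= (1 - accuracy)
            not_happened *= accuracy
    return happened / (happened + not_happened)


prior = Fraction(1, 10)
a = (Fraction(49, 50), True)   # A says it happened
b = (Fraction(9, 10), True)    # B agrees
c = (Fraction(7, 10), False)   # C denies it

part_a = chance_event_occurred(prior, [a, b])
part_b = chance_event_occurred(prior, [a, b, c])
print(part_a, part_b)  # 49/50 21/22
```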

Questions

1. According to the Life Tables, out of one lakh of persons living at age
10, 82,134 survive to age 40, of whom 837 die in a year. What is the probability for a
man of 40 of surviving one more year ?
2. In 128 litters, each of five puppies, the number of males were distributed
as follows : -
No. of males per litter 0 1 2 3 4 5
No. of litters 3 22 44 38 16 5
Calculate the probability of a male birth.
3. Eight balls numbered from 1 to 8 are placed in a bag and two drawn at
random. What is the probability that they are numbered 1 and 2 ?
4. Find out the probability that a man asked to form a two-digit number out
of 2, 3, 5, 7, 9 would form 79, when
(a) repetitions are not allowed.
(b) repetitions are allowed.
5. What is the chance of throwing a number greater than 4 with an ordinary
die whose faces are numbered from 1 to 6 ?
6. In a single throw with two dice find the chance of throwing (1) eight, (2)
eleven.
7. Compare the chance of throwing 4 with one dice, 8 with two dice, and 12
with three dice.
8. A and B throw with three dice; if A throws 14, find B's chance of throw-
ing a higher number.
9. From a pack of 52 cards two are drawn at random; find the chance that
one is a king and the other a queen.
10. In shuffling a pack of cards, four are accidentally dropped; find the chance
that the missing cards should be one from each suit.
11. If n people are seated at a round table, what is the probability that two
named individuals will be neighbours ?
12. A bag contains 4 white, 5 red, and 6 green balls. Three balls are drawn
at random. What is the probability that a white, a red and a green ball are drawn ?
13. Find the chance of drawing a king, a queen and a knave in that order from
a pack of cards in three consecutive draws, the cards drawn not being replaced.
14. Goddard, the captain of the West Indies Cricket team, is reported to have
observed the rule of calling "heads" every time the toss was made during the five
matches of the Test-series with the Indian team, what is the probability of his winning
the toss in all the five matches ?
How will the probability be affected if he had made a rule of tossing a coin privately
to decide whether to call "heads" or "tails" on each occasion ? (I. A. S., 1950).
15. A lady declares that, by tasting a cup of tea made with milk, she can
discriminate whether milk or tea infusion was first put into the cup. It is proposed to
test this claim by means of an experiment with 12 cups of tea, 6 made in one way
and 6 in the other way, and presenting them in random order to her.
Calculate the probability that on the null hypothesis the lady would judge
correctly all the 12 cups, it being known to her that 6 are of each kind.
If, however, the 12 cups were presented to the lady in 6 pairs, each pair to
consist of either kind, and the presentation be again in random order, how will the
probability of correctly judging every cup on the null hypothesis be altered ?
Which of the two designs would you prefer and why ? (I. A. S., 1949).

16. In a given race the odds in favour of three horses A, B, C are 1:3, 2:3
and 2:5 respectively. Assuming that a dead heat is impossible, find the chance that
one of them will win the race.
17. Find the probability of throwing 6 at least once in four throws with a
single die.
18. A and B throw alternately with a pair of dice. A wins if he throws 6
before B throws 7, and B if he throws 7 before A throws 6.
If A begins, show that his chance of winning is 30/61.
19. A, B and C in order toss a coin. The first one who throws a head wins.
What are their respective chances ?
20. A, B and C, in order, draw from a pack of cards, replacing their cards after
each draw. If the first man to draw a heart wins, what are their respective chances ?
21. (a) Given n independent events with respective probabilities of occurrence
$p_1, p_2, \ldots, p_n$,
write down the probability of at least one of these events happening.
(b) What is the probability of getting 9 cards of the same suit in one hand
at a game of bridge ? (I. A. S., 1951).
22. A problem in statistics is given to three students, R. G: T. whose chances
of solving it are i, ! and t. What is the probability that the problem will be solved?
23. A has six shares in a lottery in which there are three prizes and ten blanks.
B has two shares in a lottery in which there are four prizes and eight blanks. Who
has the better chance to win a prize ?
24. A has four shares in a lottery in which there are four prizes and ten blanks,
B has three in which there are three prizes and four blanks. Who has the better
chance of winning exactly one prize ? Who of winning two prizes ?
25. A can hit a target 3 times in 5 shots, B 2 times in 5 shots, C 3 times in 4
shots. They fire a volley. What is the probability that 2 shots hit ?
(M. A., Punjab, 1945).
26. A is one of five horses entered for a race, and is to be ridden by one of
the two jockeys B and C; it is 3 to 1 that B rides A, in which case all the horses are
equally likely to win; if C rides A his chance is doubled. What are the odds in
favour of his winning ?
27. Fourteen quarters and one five-dollar gold piece are in one purse and fifteen
quarters are in another. Ten coins are taken from the first and put into the second,
and then ten coins are taken from the second and put into the first. Which purse
is probably the more valuable ?
28. A man draws from an urn containing two balls, one white and one black.
If he draws a white ball he wins. If he fails to draw a white ball, the draw is replaced,
another black ball is added and he draws again. If he fails to draw a white ball in
the next draw, the process is repeated. What are his respective chances of winning
in 2, 3, 4, 5, 7, and 10 trials ?
29. In each of a set of games it is 2 to 1 in favour of the winner of the previous
game; what is the chance that the player who wins the first game shall win three at
least of the next four ?
30. If the chance that a vessel arrives safely at a port is 'l"!'
find the chance
that out of 5 vessels expected 4 at least will arrive safely.
31. There are 9 coins in a bag, 5 of which are sovereigns and the rest are
unknown coins of equal values; find what they must be if the probable value of
a draw is 12 shillings.
32. A pays B 1 Ih. to guess the number of heads in a single toss of 4 coins;
what expectation should B place on each of the possibilities: no head, one head, two
heads, etc. ?
33. A bag contains 5 balls which are just as likely to be white as coloured.
Two white balls are drawn from the bag; what is the probability that all are white ?
34. Suppose A is known to tell the truth in five cases out of six and he states
that a white ball was drawn from a bag containing 9 black and one white ball; what is
the probability that the white ball was really drawn ?
35. A, B and C run a race. The odds are 8 to 3 against A, 5 to 2 against B.
Assuming a dead heat to be impossible, find the odds against C.
36. The odds against A solving a problem are 8 to 6, and the odds in favour of
B solving the same problem are 14 to 10. What is the probability that if both of
them try the problem will be solved ?
37. Four persons draw each a card from an ordinary pack; find the chance
that no cards are of equal value.
38. A, B and C, in order, cut from a pack of cards, replacing them after each
cut. The first to cut a diamond is to win; what are their respective chances ?
39. A and B, who are players of equal skill, leave a badminton game when A
had scored 12 points and B 13. If the game was to finish at 15, and the winner was
to get Rs. 32, what share ought each to take ?
40. P has in his pocket one sovereign and four shillings. He takes out two
coins at random and promises to give one coin to R and the other to Q. What is
the worth of Q's expectation ?
41. There are three balls in a bag, but it is not known of what colour they
are; one ball drawn from the bag is of red colour. What is the probability that all
balls are red ?
42. There are 6 balls in a bag. Their colours are not known. A draw of
3 balls is made and all of them are white. What is the probability that no white ball
is left in the bag ?
43. A bag contains 5 balls. Their colours are not known. A ball is taken
out and found to be white. It is replaced and another draw is made. This ball also
happens to be white. This is also replaced, and a simultaneous draw of two balls
is made. What is the chance that both of them would be white ?
44. A speaks the truth in 75% of the cases and B in 80% of the cases. In what
percentage of cases are they likely to contradict each other in stating the same fact ?
45. A and B are tWo very weak students of statistics and their chances of solVing
i
- a problem correctly are and ii respectively. -they are given a question and obtain
the same answer. If the probability of their making a common mistake is TU1ST find
the chance that their answer was correct.
46. A and B play for a prize of Rs. 324. A is to throw a die first and is to win
if he throws 6. If he fails, B is to throw and is to win if he throws 6 or 5. If he
fails, A is to throw again and to win if he throws 6 or 5 or 4, and so on. Find their
respective expectations.
47. Eight mice are selected at random from a large number and then divided
into two groups of four each, group A and group B. Each mouse in group A is
given a dose a of a certain poison which is expected to kill one in four. Each mouse
in group B is given a dose b of another poison which is expected to kill one in two.
Show that, nevertheless, there may be fewer deaths in group B than in group A and
find the probability of this happening. (M. Com., Allahabad, 1954).
48. Define 'mutually exclusive events'; state and prove the theorem of addition
of probabilities concerning mutually exclusive events.
A, B and C in order toss a coin. The first one to throw a head wins; what are
their respective chances of winning ? Assume that the game may continue indefinitely.
(I. A. S., 1955).

49. There are m points on a line consisting of the ro)ours black and white.
If p and fJ are the probabilities of a point being black and white respectively so that
p+!l=1, find the expectations of obtaining
(i) a black-black join
(i) a black-white join, and
(iii) a white-white join,
a join being defined as the line joining adjacent points. (1. A. S., 1949).
50. A special type of automatic telephone dial can be designed to consist
always of 3 integers subject, however, to two restrictions, namely (a) that zero
cannot be dialled first and (b) that two consecutive integers cannot be dialled, the
lower one first and the next higher one immediately after. (For this purpose '9'
and '0' are not consecutive integers.)
How many subscribers can be served and in how many of their numbers will
there be at least one zero? (I. A. S., 1953).
51. State and prove the theorem of multiplication of probabilities.
p is the probability that a man aged x will die in a year. Find the probability
that out of 5 men, A, B, C, D, E, each aged x, A will die in the year and be the first
to die. (I. A. S., 1954).
52. The following table gives the values of p(x, j), the probability that a person
aged x will survive up to (j-1) years more but die before the end of the j-th year :-

Age x    j = 1    j = 2    j = 3    j = 4
70        .25      .20      .15      .10
75        .30      .25      .20      .15

(a) Find the chance that out of 10 persons, each now aged 70, exactly 5 will
die before the end of 2 years.
(b) I pay to each person aged 75 an amount of Rs. 1,400 and ask him to pay me
a certain amount at the end of every year so long as he is alive, up to a maximum of
4 years. Find the value of the yearly instalment which I must receive from each if
I intend not to gain or lose in this transaction in the long run. Ignore all
considerations of investment of money, interest etc. (I. A. S., 1957).
53. A fiction and a non-fiction book are selected from a bookshelf containing
12 fiction and 30 non-fiction books. In how many ways can the choice be made?
54. It is considered that the only way to maintain peace between 6 countries
is to have non-aggression pacts between every possible pair of countries. How
many pacts are necessary ?
55. A customer buying a dozen eggs always examines a sample of 3 to see if
they are fresh. In how many ways can she pick the sample? If the dozen includes
3 bad eggs, in how many ways can she take a sample which includes at least one bad
egg?
56. I have 24 friends whose birthdays are equally likely to fall on any day of
the year and who will send me an invitation to their birthday. What is the chance
that I will receive two or more invitations for the same day?
57. Suppose it is 9 to 7 against a person A, who is now 35 years of age, living
till he is 65, and 3 to 2 against a person B, now 45, living till he is 75: find the chance
that one at least of these persons will be alive 30 years hence. (B. A., Punjab, 1958).
58. The following mortality table shows the number of survivors to various ages
of 1,00,000 newly born males.

Age    Survivors      Age    Survivors
0      1,00,000       60     67,787
10     93,601         70     46,739
20     92,293         80     19,86[?]
30     90,0[??]       90     2,81[?]
40     86,880         100    6
50     80,5[?]1
(i) Find the probability of a newly born infant in this population living till he is
60 years old. (ii) Find the probability of a 20 years old in this population living until
he is 50 years old.
59. Three groups of children contain respectively 3 girls and 1 boy; 2 girls and
2 boys; 1 girl and 3 boys. One child is selected from each group at random. Show
that the chance that the three selected at random consist of 1 girl and 2 boys is 13/32.
(M. Com., Raj., 1965).
60. A factory using quality control methods mass produces an article and past
records show that on the average 4 articles are found defective out of every batch of
100. What is the maximum number of defective articles likely to be encountered in
a batch of 100?
It is brought to your notice that recently several batches of 100 were turned out
containing 11 to 15 defectives. What inference would you draw?
61. A and B stand in a ring with ten other persons. If the arrangement of 12
persons is at random, find the chance that there are exactly three persons between
A and B. (Agra, M. St., 1951).
62. 'P' is the probability that a man aged X will die in a year. Find the
probability that out of five men A, B, C, D and E, each aged X, A will die in the year
and be the first to die. (I. A. S., 1954)
24
Theoretical Frequency Distributions

In most of the examples relating to frequency distributions in
earlier chapters (excepting Chapter 24) we have taken the data arising
out of either actual observations or experiments. For example, series
relating to height measurements or marks obtained by students or wages
of labourers are all obtained by observation or measurement of data.
It is also possible to start on a certain assumption and to deduce what
the frequency or the frequency distribution of a particular universe
shall be. Such distributions which are not obtained by actual observations or
experiments but are mathematically deduced on certain assumptions are called
Theoretical Frequency Distributions.
Theoretical frequency distributions are of many types but three
of them are of very great importance in statistical analysis. They are :-
(1) Binomial Distribution, due to James Bernoulli (published in
1713, eight years after his death).
(2) Normal Distribution, due to Demoivre (often associated with
the names of Laplace and Gauss who discussed it at the close of the
18th century).
(3) Poisson Distribution, due to S. D. Poisson (published in 1837).
We shall discuss these in turn.
BINOMIAL DISTRIBUTION

It is a particular case of multinomial distribution and is of very
great importance in research and problems connected with probability
and sampling. In the last chapter we discussed the procedure of finding
out the probability of the happening of exactly r events in n trials, if the
probability of happening of the event in a single trial was known. We
had seen that the desired probability was obtained by $^nC_r\,p^r q^{n-r}$. This
formula is based on the binomial theorem. This will be evident from
the following lines.
Suppose two coins a and b are tossed simultaneously. There are
four possible outcomes as follows :-
Coins
a    b        H = Head
H    H        T = Tail
H    T
T    H
T    T

Thus the coins can fall in four ways: either a and b fall heads, or a
heads and b tails, or a tails and b heads, or both a and b tails. If p stands
for the probability of a coin falling head and q for falling tail, then,
The probability of two heads is $p \times p = p^2$
The probability of one head and one tail is $(p \times q) + (q \times p) = pq + pq = 2pq$
The probability of two tails is $q \times q = q^2$
The reader who has done even elementary algebra will at once
recognize the terms of the expansion $(p+q)^2$, which are $p^2 + 2pq + q^2$.
This gives us a clue to follow up.
Let us now analyse the sample of three coins. If three coins a,
b and c are tossed simultaneously the results would be as follows :-

Type of event             Ways of arising    Probability    Probability of the
                          a    b    c        of way         type of result
Three heads               H    H    H        $p^3$          $p^3$
Two heads and one tail    H    H    T        $p^2q$
                          H    T    H        $pqp$          $3p^2q$
                          T    H    H        $qp^2$
One head and two tails    H    T    T        $pq^2$
                          T    H    T        $qpq$          $3pq^2$
                          T    T    H        $q^2p$
Three tails               T    T    T        $q^3$          $q^3$

Thus the probabilities of three heads, two heads and one tail, one
head and two tails, and three tails are respectively $p^3$, $3p^2q$, $3pq^2$ and $q^3$.
These are the terms in the expansion $(p+q)^3$.
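The counting of "ways of arising" in the tables above can be checked by listing every outcome. The following is only an illustrative sketch in Python; the function name `ways_by_heads` is our own and does not come from the text.

```python
from itertools import product
from math import comb

def ways_by_heads(n):
    """For n coins, count how many of the 2**n outcomes give 0, 1, ..., n heads."""
    counts = [0] * (n + 1)
    for outcome in product("HT", repeat=n):   # e.g. ('H', 'T', 'H')
        counts[outcome.count("H")] += 1
    return counts

print(ways_by_heads(2))  # [1, 2, 1]
print(ways_by_heads(3))  # [1, 3, 3, 1]
```

The counts are the numerical coefficients of $(p+q)^2$ and $(p+q)^3$; each way with r heads has probability $p^r q^{n-r}$.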
Thus we arrive at a very simple rule of finding out the probabilities
of three heads, two heads, one head and zero head. Their respective
probabilities would be as follows :-

3 Heads $= p^3 = (\tfrac{1}{2})^3 = \tfrac{1}{8}$
2 Heads $= 3p^2q = 3(\tfrac{1}{2})^2(\tfrac{1}{2}) = \tfrac{3}{8}$
1 Head $= 3pq^2 = 3(\tfrac{1}{2})(\tfrac{1}{2})^2 = \tfrac{3}{8}$
0 Head $= q^3 = (\tfrac{1}{2})^3 = \tfrac{1}{8}$
The terms of the binomial expansion are
$$(p+q)^n = p^n + np^{n-1}q + \frac{n(n-1)}{1 \times 2}\,p^{n-2}q^2 + \frac{n(n-1)(n-2)}{1 \times 2 \times 3}\,p^{n-3}q^3 + \dots + q^n$$
The numerical coefficients in this expression can be obtained
without actual multiplication. They would be respectively $^nC_0$, $^nC_1$,
$^nC_2$, etc. They can also be obtained from Pascal's Triangle as given
below :-
Pascal's Triangle
Numbers in the
sample or power     Coefficients in the expansion of $(p+q)^n$
of the exponent
 1                  1   1
 2                  1   2   1
 3                  1   3   3   1
 4                  1   4   6   4   1
 5                  1   5  10  10   5   1
 6                  1   6  15  20  15   6   1
 7                  1   7  21  35  35  21   7   1
 8                  1   8  28  56  70  56  28   8   1
 9                  1   9  36  84 126 126  84  36   9   1
10                  1  10  45 120 210 252 210 120  45  10   1

It is clear from the above that each term in the triangle is derived
by adding the two terms in the line above, which lie on either side of
it. Thus, as is indicated by three small triangles, in line four 6 is
obtained by adding 3 and 3 in the third line; in line ten, 120 is obtained
by adding 36 and 84 and 45 is obtained by adding 36 and 9.
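The rule of adding the two terms in the line above can be written out as a short routine. This is a minimal sketch in Python; the name `pascal_rows` is ours and is used only for illustration.

```python
def pascal_rows(max_power):
    """Return the coefficient rows of (p+q)^n for n = 1, 2, ..., max_power."""
    rows = []
    row = [1]                        # the row for power 0
    for _ in range(max_power):
        # every inner term is the sum of the two terms lying above it
        inner = [row[k] + row[k + 1] for k in range(len(row) - 1)]
        row = [1] + inner + [1]
        rows.append(row)
    return rows

for power, row in enumerate(pascal_rows(10), start=1):
    print(power, row)
```

In line ten the routine gives 45 = 9 + 36 and 120 = 36 + 84, in agreement with the triangle.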
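Now if we have to raise a binomial $(p+q)^3$ where $p = \tfrac{1}{4}$ and $q = \tfrac{3}{4}$,
we can do it in the following ways :-
$$(p+q)^3 = p^3 + 3p^{3-1}q + \frac{3(3-1)}{1 \times 2}\,p^{3-2}q^2 + \frac{3(3-1)(3-2)}{1 \times 2 \times 3}\,q^3$$
$$= p^3 + 3p^2q + 3pq^2 + q^3$$
$$= (\tfrac{1}{4})^3 + 3(\tfrac{1}{4})^2(\tfrac{3}{4}) + 3(\tfrac{1}{4})(\tfrac{3}{4})^2 + (\tfrac{3}{4})^3$$
$$= \frac{1}{64} + \frac{9}{64} + \frac{27}{64} + \frac{27}{64}$$
We would obtain the same results if instead of actual multiplication
of n, (n-1), (n-2) etc., for finding out the numerical coefficients
of various terms we substitute $^3C_3$, $^3C_2$, $^3C_1$ and $^3C_0$.
In such a case
$$(p+q)^3 = {}^3C_3\,p^3 + {}^3C_2\,p^2q + {}^3C_1\,pq^2 + {}^3C_0\,q^3$$
$$= p^3 + 3p^2q + 3pq^2 + q^3$$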
If we consult Pascal's triangle for the power 3 we shall find that
the numerical coefficients are 1, 3, 3, 1, and from these figures we can
easily write down the expansion of the binomial $(p+q)^3$ as
$$1p^3 + 3p^2q + 3pq^2 + 1q^3$$
We have thus seen that a binomial can be raised without much
difficulty with the help of either $^nC_r\,p^r q^{n-r}$ or Pascal's triangle. The
expansion of a binomial in the above fashion would give us the
probabilities of r successes, r-1 successes, r-2 successes, etc., in n trials. If,
however, such n trials are repeated N times the chances of r successes,
or r-1 successes etc., would be obtained by the expansion of $N(p+q)^n$.
Thus, if four coins are tossed simultaneously 256 times the number of
times we can expect to get four heads, three heads, etc., would be
obtained by the various terms of the expansion $256\,(p+q)^4$. If p stands
for the chance of getting a head in a single throw of coin and q for
getting a tail their values would be $p = \tfrac{1}{2}$, $q = \tfrac{1}{2}$ and
$$256\,(p+q)^4 = 256\,(p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4)$$
Since $p = q = \tfrac{1}{2}$
$$256\,(\tfrac{1}{2}+\tfrac{1}{2})^4 = [256(\tfrac{1}{2})^4] + [256 \times 4(\tfrac{1}{2})^3(\tfrac{1}{2})] + [256 \times 6(\tfrac{1}{2})^2(\tfrac{1}{2})^2] + [256 \times 4(\tfrac{1}{2})(\tfrac{1}{2})^3] + [256(\tfrac{1}{2})^4]$$
$$= 16 + 64 + 96 + 64 + 16$$
Thus the number of times we can expect 4, 3, 2, 1 and 0 heads
in 256 trials would be respectively 16, 64, 96, 64 and 16.
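The terms of $N(p+q)^n$ can be computed directly from $^nC_r\,p^r q^{n-r}$. The sketch below, in Python, is only an illustration; the function name `expected_frequencies` is our own, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from math import comb

def expected_frequencies(N, n, p):
    """Terms of N(p+q)^n, listed from n successes down to 0 successes."""
    q = 1 - p
    return [N * comb(n, r) * p**r * q**(n - r) for r in range(n, -1, -1)]

# Four coins tossed 256 times with p = q = 1/2
freqs = expected_frequencies(256, 4, Fraction(1, 2))
print([int(f) for f in freqs])   # [16, 64, 96, 64, 16]
```

The same function can be used for any other value of p, for example `Fraction(1, 4)`, to see how the frequencies cease to be symmetrical.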
These are their theoretical or expected frequencies in the above
case. The frequencies actually obtained may be different from these.
However, if the value of N is large, which means if in the above case
instead of 256 throws the number of throws is 2560 or 25600, the
difference between the expected and observed frequencies would not
be much.
General form of binomial distribution
The general form of binomial distribution depends on the following
two things :-
(1) The values of p and q; and
(2) The value of the exponent n.
If $p = q = \tfrac{1}{2}$ the distribution obtained would be symmetrical and
the expected frequencies on either side of the central value would be
identical. Thus in the case where four coins were tossed 256 times,
the frequencies for 4, 3, 2, 1 and 0 heads were obtained as 16, 64, 96,
64 and 16 respectively. The distribution is thus symmetrical. If,
however, p is not equal to q the distribution would not be symmetrical
but skew. In the above illustration if p was equal to $\tfrac{1}{4}$ and q to $\tfrac{3}{4}$, the
frequencies obtained would have been respectively 1, 12, 54, 108 and
81. It can be verified by the expansion of $256\,(\tfrac{1}{4}+\tfrac{3}{4})^4$.
Clearly this distribution is skew. If the value of p is increased
from $\tfrac{1}{4}$ to $\tfrac{1}{2}$ the position of the maximum frequency would gradually
advance and the two tails of the distribution would become less skew
until the value of p is equal to $\tfrac{1}{2}$, when the two tails of the distribution
would be perfectly symmetrical.
If p = q the effect of increasing the value of the exponent n would
be that the value of mean would increase and so would the value of
measures of dispersion. If p is not equal to q, an increase in the
value of n would not only raise the value of the mean and dispersion
but would also reduce the asymmetry of the distribution. It means
that if p and q are not equal then also we can obtain a more or less
symmetrical distribution if the value of n is large. If $p = \tfrac{3}{10}$ and
$q = \tfrac{7}{10}$ the expansion $10{,}000\,(p+q)^{20}$ would give us the following
distribution :-

Number of      Expected Frequency
Successes      (approximated)
 0                  8
 1                 68
 2                278
 3                716
 4               1304
 5               1789
 6               1916
 7               1643
 8               1144
 9                654
10                308
11                120
12                 39
13                 10
14                  2
15                  0
16                  0
17                  0
18                  0
19                  0
20                  0

Here even though p is not equal to q the distribution is not very
skew. It is on account of the fact that the exponent n has a fairly
high value of 20. If it is raised still further the distribution would
become more symmetrical. If in the above case the value of n was only
3, then the various frequencies in the expansion of $10{,}000\,(\tfrac{3}{10}+\tfrac{7}{10})^3$
would have been 270, 1890, 4410 and 3430. In other words, the
distribution would have been as follows :-

Number of Successes    Expected Frequency
0                      3430
1                      4410
2                      1890
3                       270

It is very obvious that this distribution is many times more skew
than the first one, where the value of n was 20.
Thus we conclude that the type of distribution that we shall
obtain would depend on the values of p and q as also on the value of
the exponent n. We shall get a symmetrical distribution if p is equal
to q and a skew distribution if p and q are unequal. However, when
p and q are unequal but the value of exponent n is large the distribution
would not be very skew.
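The effect of the exponent n on skewness can be put in figures. The sketch below, in Python, recomputes the two tables for p = 3/10 and compares a measure of skewness. The measure used, $(q-p)/\sqrt{npq}$, is the standard moment coefficient of skewness of a binomial distribution; it is not derived in this chapter and is brought in here only as an assumption for illustration. The function names are our own.

```python
from math import comb, sqrt

def binomial_table(N, n, p):
    """Expected frequencies for 0, 1, ..., n successes, rounded to whole numbers."""
    q = 1 - p
    return [round(N * comb(n, r) * p**r * q**(n - r)) for r in range(n + 1)]

def skewness(n, p):
    """Moment coefficient of skewness of a binomial distribution (standard result)."""
    q = 1 - p
    return (q - p) / sqrt(n * p * q)

print(binomial_table(10000, 3, 0.3))        # frequencies for 0, 1, 2, 3 successes
print(binomial_table(10000, 20, 0.3)[:8])   # first eight frequencies for n = 20
print(skewness(3, 0.3), skewness(20, 0.3))  # roughly 0.50 against 0.20
```

With p and q unchanged, the skewness falls as n rises, which is the conclusion reached above.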

Comparison of actual and expected frequencies

It has already been observed that if the number of experiments
is large, or if the value of N is large, the difference between the
frequencies observed and expected would not be very significant. It
is on this assumption that various types of sampling experiments are
conducted. A comparison of actual and expected frequencies can be
done either graphically or algebraically. In the first method both the
actual and expected frequencies are plotted on a graph and if the two
curves are similar and close to each other the difference is supposed
to be insignificant and the fit is said to be good. If, on the other
hand, the curves are pulled away from each other the fit is poor and
the difference between actual and expected frequencies significant. In
algebraic methods the actual differences between the observed and
expected frequencies are found out and then the various tests of
significance are applied to conclude whether the differences are
significant or not. We shall discuss these tests in a later chapter on Sampling.
For the present we give below the actual and expected frequencies of
a particular event.
Suppose 12 coins are tossed 4096 times and suppose the falling
of head is said to be success. The number of 12, 11, 10, 9, 8, etc.,
successes would be obtained by the expansion of $4096\,(p+q)^{12}$.
Since $p = q = \tfrac{1}{2}$ the expansion would be $4096\,(\tfrac{1}{2}+\tfrac{1}{2})^{12}$. This would
give us the theoretical or expected frequencies. The actual frequencies
may differ from the expected frequencies. The following table gives
the expected and actual frequencies of 4096 throws of 12 coins :-
Number of Heads    Expected Frequency    Actual Frequency
12                      1                [illegible]
11                     12                [illegible]
10                     66                [illegible]
 9                    220                [illegible]
 8                    495                [illegible]
 7                    792                847
 6                    924                948
 5                    792                731
 4                    495                430
 3                    220                198
 2                     66                 60
 1                     12                  7
 0                      1                  0
We can compare the actual and the expected frequencies by plotting
them on a graph paper. The above frequencies are plotted below
in figure No. 1, and it would be seen that the two curves are remarkably
close to each other. The fit is thus excellent and the difference
between the actual and expected frequencies does not appear to be
significant.
[Figure: theoretical frequency and actual frequency curves plotted against the Number of Successes (0 to 12)]
Fig. 1
Mean and standard deviation of binomial distribution
The mean and standard deviation of such theoretical frequency
distributions, where we know the number of independent events and
the probability of the happening of the event in question, can be very
easily calculated. If M stands for the mean of such distribution, n for
the number of independent events and p for the probability of the
happening of the event in a single trial,
$$M = np$$
In the question relating to 12 coins solved above the value of the
mean of the theoretical frequencies would thus be
$$M = 12 \times \tfrac{1}{2} = 6$$
The mean of the actual frequencies comes to 6.14. Thus the two
means are not very different from each other.
The value of the standard deviation of the expected frequencies in
such cases is
$$\sigma = \sqrt{npq}$$
In the above example the standard deviation of the theoretical
frequencies would be :
$$\sigma = \sqrt{12 \times \tfrac{1}{2} \times \tfrac{1}{2}} = \sqrt{3} = 1.73$$
The standard deviation calculated from the actual frequencies
comes to 1.71. Thus a binomial distribution has mean $np$ and standard
deviation $\sqrt{npq}$.
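The rules $M = np$ and $\sigma = \sqrt{npq}$ can be verified by computing the mean and standard deviation of the expected frequencies in the ordinary way. The following Python sketch does this for the 12 coins tossed 4096 times; the variable names are ours.

```python
from math import comb, sqrt

n, p, N = 12, 0.5, 4096
q = 1 - p

# expected frequency of r heads, r = 0, 1, ..., 12
freq = [N * comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

# mean and standard deviation worked out from the frequency table itself
mean = sum(r * f for r, f in enumerate(freq)) / N
sd = sqrt(sum(f * (r - mean) ** 2 for r, f in enumerate(freq)) / N)

print(mean, n * p)             # both are 6
print(sd, sqrt(n * p * q))     # both are the square root of 3, about 1.73
```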
NORMAL DISTRIBUTION

In figure No. 1 of this chapter the curve which represents the
theoretical frequency distribution is a perfectly symmetrical figure. It
is a 12 sided polygon. Since there are 13 points representing the 13
frequencies there are 12 sides in the figure. If the value of the exponent
n was not 12 but 8, there would have been an 8 sided figure and
if the value of the exponent was 100 there would have been a 100 sided
figure.
If in the binomial $(\tfrac{1}{2}+\tfrac{1}{2})^n$ the value of n is infinity, we would have
a very large number of points on the graph and a perfectly smooth
symmetrical curve would be obtained. Even if p and q were not
equal but if the value of the exponent was very large, we would
still get a curve almost perfectly smooth and symmetrical. Such smooth
and symmetrical curves are called Normal Probability Curves or Normal
Curves of Error.
A normal curve possesses many important properties and is of
extreme importance in the theory of errors. The relationships in the
normal curve are as follows :-
(1) The arithmetic average, median and mode coincide.
(2) The first and third quartiles are equidistant from the median.
(3) Mean deviation is .7979 of standard deviation.
(4) Semi-inter-Quartile-Range is equal to the probable error
which is .6745 of the standard deviation.
(5) The ordinate at the mean is the highest ordinate. The height
of the ordinate at a distance of one standard deviation from the mean
is 60.653% of the height of the mean ordinate and the heights of other
ordinates at various sigma distances from the mean are also in a fixed
relationship with the height of the mean ordinate.
(6) The points of inflection (the points where the curvature
changes its direction) are each one standard deviation from the mean
ordinate.
(7) The curve is asymptotic to the base line. It means that it
continues to approach but never reaches the base line.
(8) The most important relationship in the normal curve is the
area relationship. Since an ordinate at a given sigma distance from the
mean has always the same relationship with the mean ordinate, it follows
that the area of the curve enclosed between the mean ordinate and an
ordinate at a certain sigma distance from the mean would always be the
same proportion of the total area of the curve. Thus the area enclosed
between mean ordinate and an ordinate at a distance of one standard
deviation from the mean is always 34.134% of the total area of the
curve. It means that the area enclosed between two ordinates at one
sigma distance from the mean on either side would always be 68.268%
of the total area. Similarly the area between two ordinates at two
sigma distance from the mean ordinate on either side would be 95.45%
of the total area. Ordinates at three sigma distances from the mean on
either side would enclose 99.73% of the total area. The following figure
shows the area relationship in a normal curve :-
Area Relationship in a Normal Curve

Fig. 2
The above figure shows the area enclosed by ordinates at 1, 2 and 3
sigma distances from the mean ordinate. The following table shows the
area relationship in a normal curve in more detail :-
Area of the Normal Curve between Mean Ordinate and Ordinates at
various Sigma distances from the Mean as Percentage of the Total Area.

Distance from the      Percentage of total
Mean Ordinate          Area
0.5 σ                  19.146
1.0 σ                  34.134
1.5 σ                  43.319
1.96 σ                 47.500
2.0 σ                  47.725
2.5 σ                  49.379
2.5758 σ               49.500
3.0 σ                  49.865
Thus the two ordinates at 1.96 σ distance from the
mean on either side would enclose 47.5+47.5 or 95% of the total area,
and two ordinates at 2.5758 σ distance from the mean on either side
would enclose 49.5+49.5 or 99% of the total area. The area enclosed
between ordinates at 3 σ distance from the mean on either side would
be 49.865+49.865 or 99.73% of the total area.
We have specifically mentioned the area enclosed between the
ordinates at 1.96 σ distance and 2.5758 σ distance because in various tests
of significance these figures are most commonly used. Various tests of
significance are made by taking into account 95%, 99% and 99.73% of
the area of normal curve.
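The area figures quoted above, and the ratios .7979 and .6745 given among the properties of the normal curve, can be reproduced with the standard library of Python. This is a sketch under our own naming; the identity used, area between the mean and z sigma = $\tfrac{1}{2}\,\mathrm{erf}(z/\sqrt{2})$ of the total, is a standard result and is not derived in the text.

```python
from math import erf, pi, sqrt
from statistics import NormalDist

def area_mean_to(z):
    """Percentage of the total area lying between the mean ordinate
    and the ordinate at a distance of z sigma from the mean."""
    return 50 * erf(z / sqrt(2))

for z in (0.5, 1.0, 1.5, 1.96, 2.0, 2.5, 2.5758, 3.0):
    one_side = area_mean_to(z)
    print(z, round(one_side, 3), round(2 * one_side, 2))

mean_dev_ratio = sqrt(2 / pi)                # mean deviation as a fraction of sigma
quartile_ratio = NormalDist().inv_cdf(0.75)  # semi-inter-quartile range over sigma
print(round(mean_dev_ratio, 4), round(quartile_ratio, 4))
```

The second column of the printout corresponds to the table of areas, and the third to the areas enclosed on both sides of the mean.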
Equation of normal curve
The equation of this curve can be written in various ways. The
simplest form is as follows :-
$$y = y_0\,e^{-\frac{x^2}{2\sigma^2}}$$
Where y = the computed height of an ordinate at a distance of x from
the mean.
$y_0$ = the height of the maximum ordinate at the mean. It is a
constant in the equation.
e = a constant having the value of 2.71828 (it is the base of
the natural or Napierian logarithms).
σ = standard deviation.
x = any given value of the variable expressed as a
deviation from the mean.
The maximum ordinate or
$$y_0 = \frac{Ni}{\sigma\sqrt{2\pi}}$$
Where N = total number of items in the sample
i = class interval
π = the constant 3.1416; $\sqrt{2\pi} = 2.5066$
Thus
$$y_0 = \frac{Ni}{2.5066\,\sigma}$$
The equation of the normal curve can now be written as
$$y = \frac{Ni}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}$$
or
$$y = \frac{Ni}{2.5066\,\sigma}\,2.71828^{-\frac{x^2}{2\sigma^2}}$$
With the above formula a normal curve can be fitted to any given
frequency distribution. The height of the mean ordinate can be
calculated by the formula $y_0 = \frac{Ni}{2.5066\,\sigma}$ and then heights of other ordinates
at various σ distances from the mean ordinate can be calculated by the
formula
$$y = \frac{Ni}{2.5066\,\sigma}\,2.71828^{-\frac{x^2}{2\sigma^2}}$$
The following example would illustrate the procedure of fitting a
normal curve.
Example 1. Fit a normal curve to the following frequency
distribution relating to the heights of certain children :-
Heights (inches)    No. of children
40.5-42.5            1
42.5-44.5            4
44.5-46.5            2
46.5-48.5           15
48.5-50.5           13
50.5-52.5           14
52.5-54.5           23
54.5-56.5           10
56.5-58.5            7
58.5-60.5           10
60.5-62.5            3
62.5-64.5            0
64.5-66.5            1

Total              106

In the above frequency distribution the number of items is 106
and class interval is 2. The standard deviation is 4.7.
The first thing that we should do now is to find the height of the
mean ordinate. The formula of finding the height of the mean ordinate is
$$y_0 = \frac{Ni}{2.5066\,\sigma}$$
Now substituting the values in the above formula we get
$$y_0 = \frac{106 \times 2}{2.5066 \times 4.7} = \frac{212}{11.78102}$$
= 17.995 or approximately 18. Thus the height of the mean
ordinate is 18.
The height of the ordinates at one σ distance from the mean
would be
$$y = \frac{Ni}{2.5066\,\sigma}\,2.71828^{-\frac{(4.7)^2}{2(4.7)^2}} = 18 \times 2.71828^{-\frac{1}{2}}$$
$$= 18 \times \frac{1}{\sqrt{2.71828}} = 18 \times \frac{1}{1.6487}$$
$$= 10.9175$$
Thus the heights of the ordinate at one σ distance from the mean
on either side would be 10.9175. Similarly the heights of other
ordinates can be calculated and these points can be plotted to obtain a
normal curve.
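The calculation of Example 1 can be repeated mechanically. The sketch below, in Python, uses N = 106, i = 2 and σ = 4.7 as given in the example; the function names `mean_ordinate` and `ordinate` are our own. It keeps the unrounded value of the mean ordinate, so its result at one σ differs very slightly from the figure obtained above after rounding the mean ordinate to 18.

```python
from math import exp, pi, sqrt

def mean_ordinate(N, i, sigma):
    """Height of the maximum ordinate, y0 = N i / (sigma * sqrt(2 pi))."""
    return N * i / (sigma * sqrt(2 * pi))

def ordinate(N, i, sigma, x):
    """Height of the ordinate at a deviation x from the mean."""
    return mean_ordinate(N, i, sigma) * exp(-x**2 / (2 * sigma**2))

N, i, sigma = 106, 2, 4.7
y0 = mean_ordinate(N, i, sigma)
y1 = ordinate(N, i, sigma, sigma)        # ordinate at one sigma from the mean
print(round(y0, 2), round(y1, 2))
print(round(18 * exp(-0.5), 4))          # with y0 rounded to 18, as in the text
```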
In actual practice these calculations are not done. Only the
height of the mean ordinate is calculated and the heights of other
ordinates are seen from mathematical tables. It has already been said
earlier that the heights of the ordinates at various sigma distances from
the mean are in a fixed relationship to the height of the mean ordinate.
It has also been said that the height of the ordinate at one sigma distance
from the mean is 60.653% of the height of the mean ordinate. Thus
if the height of the mean ordinate is 18 the height of the ordinate at
one sigma distance would be $\frac{18 \times 60.653}{100} = 10.9175$. This is exactly the
same figure which we obtained by the use of the formula. When the
heights of various ordinates have been found in the above fashion a
normal curve can be drawn from them.
The following table gives the heights of ordinates at various sigma
distances from the mean ordinate as percentages of the height of the
mean ordinate :-
Distance from          Percentage height of the ordinate
Mean ordinate          as compared to the height
(in σ)                 of the mean ordinate
 .25                   96.923
 .50                   88.250
 .75                   75.484
1.00                   60.653
1.25                   45.783
1.50                   32.465
1.75                   21.627
2.00                   13.534
2.25                    7.956
2.50                    4.394
2.75                    2.280
3.00                    1.111
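The percentages in the above table follow from the equation of the normal curve: at a distance of z sigma the ordinate is $e^{-z^2/2}$ of the mean ordinate. A short Python sketch (the function name `ordinate_percent` is ours) reproduces the table:

```python
from math import exp

def ordinate_percent(z):
    """Height of the ordinate at z sigma as a percentage of the mean ordinate."""
    return 100 * exp(-z * z / 2)

for k in range(1, 13):
    z = k / 4                       # .25, .50, ..., 3.00
    print(f"{z:4.2f}  {ordinate_percent(z):7.3f}")
```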
Basic assumptions of normal curve

Normal distribution and consequently the normal curve discussed
above deal with occurrences based on pure chance and the logical
assumptions underlying them are :-
(1) The causal forces which affect individual events are independent
of each other.
(2) The causal forces affecting individual events are numerous
and of approximately equal weight.
(3) The operation of causal forces is such that it produces
deviations about the mean which are equal in number and size on either side.
It means that the deviations below the mean are equal in number and
magnitude to the deviations above the mean.

These three assumptions hold good generally in the realm of pure
chance. They also hold good in case of physical and biological sciences.
The leaves of a tree, for example, vary in size in accordance with the
above rules. There will be a few very short and a few very long
leaves; as they tend towards the mean length, their frequencies are
generally in accordance with the normal frequency distribution.
Similarly the heights of trees of the same variety and age in the same
locality would conform to the laws of normal curve. The heights of
men of matured ages belonging to the same race and living in similar
natural conditions would also give a normal frequency distribution.
The similarity between pure chance occurrences and the distribution
of natural phenomena extends the application of scientific and statistical
methods to the realm of physical and biological sciences. In fact
wherever these laws apply definite conclusions and sound generalizations
can be obtained and then statistical measures no more remain
pure descriptive devices. Forecasting becomes an easy and accurate
job under such circumstances.

In economic and social data, however, normal distributions are
very rare on account of the fact that the three basic principles laid
down above do not hold good in such cases. Here the causes affecting
a phenomenon are not always independent of each other. A coin
when tossed has equal chance of falling either head or tail but in human
relationships events are not thus fully free or independent. Various
types of relationships exist in human beings which affect the
independence of various factors affecting them. The child of a mentally
deficient parent has not the same chance to be mentally normal or
superior as others. The ages of husbands and wives are highly correlated
and the chance that a husband of 25 years will have a wife of 50
years is not the same as the chance of his having a wife of [illegible] years.
Similarly there is association between economic status and cultural
development, so that the children of such parents who are poor have not
the same chance of normal culture and development as the children of
the rich. Thus, forces affecting human beings are not independent in
character. Further, they are not evenly balanced nor do they tend to
produce variations of equal size and magnitude on either side of the
average. The values of mode, median and mean do not coincide here,
and as such the distribution which is obtained is not symmetrical but
skew either to the right or to the left.
Besides this, there are other difficulties also in social sciences.
The data are not only very complex, variable and skew to a marked
extent, but the complexity, variability and skewness are not uniform and
permanent. They are subject to trends, cycles and similar other complex
changes. Under such circumstances it becomes difficult to do
dependable researches and to arrive at solid conclusions. But this does
not mean that the properties of the normal distribution and various
types of inferences that can be drawn from them are useless in case of
social and economic phenomena. In fact as we shall see in subsequent
chapters, it is generally in these fields that the properties of the normal
distribution are utilized in drawing various types of inferences. Of
course these inferences are drawn under certain assumptions and with
varying degrees of dependability but this does not matter much in
economic and social studies which are basically characterised as belonging
to 'inexact sciences'.
POISSON DISTRIBUTION

We have seen in earlier sections that even when p and q are
unequal, a binomial distribution tends to be a normal distribution
provided the value of the exponent n is sufficiently large so that (p-q) becomes
very small as compared to $\sqrt{npq}$. If, however, p is indefinitely small,
the limits of the series are found in a different way. Here we presume
that n is very large and that the average of the series or np is a finite
number. If the average of the series is represented by a or in other
words if np = a and if the above conditions are satisfied the binomial
distribution assumes a very convenient form.
The probability of n successes (n here standing for the number of
successes and not for the exponent) is then obtained by the following
rule :-
$$P_n = e^{-a}\,\frac{a^n}{n!}$$
where e is the base of the natural logarithms and has a value of 2.7183.
The above equation was given by Poisson in 1837 and is known after
his name.
As an example let us consider the following data collected by
Bortkewitch and quoted by R. A. Fisher showing the chance of a cavalry
man being killed by a horse kick in the course of a year. The data are
based on the records of 10 Army Corps for 20 years and thus give 200
readings :-
Number of    Number of years the           Number of total deaths
Deaths       number of deaths occurred     (Col. 1 × Col. 2)
(1)          (2)                           (3)
0            109                             0
1             65                            65
2             22                            44
3              3                             9
4              1                             4
5              0                             0
6              0                             0

Total        200                           122

The total number of deaths is 122. It is in 20 years in 10 army
corps so that the average number of deaths per year per army corps is
$\frac{122}{20 \times 10} = .61$. It is the mean and as such would be represented by a.
Now if we have to find 'out the theoretical frequencies in the


above case for 0 death, x death, 2. deaths, etc., it ~lI,n be very easily
done by the equation of Poisson distribution ,given above,
Thus the probability of 0 death is
aO
pO=e- 'Of
a

Now e-a=2·7x83-·61
I

I
- Anti-log (.6x x1dg 2.7183)
I
-Anti-log (.61 x 0.4346)
I
= Anti-log (.265106)
I 1000
= 1.841 = 18 4 1
0°= I because any figure raised to the power 0 is equal to I.
o 1=1
a O
Therefore pO or e-aOT
TaEOlU!TICAL FREQUENCY DISTlUBUTIONS

1000 1 1000
== 1841 X 1= 1841
This is the probability of 0 -deaths and therefore the number of
1000
deaths e'YfVocted
-r-
in 7.00 readings would be - - X
Is41
7000= 108.7

Similarly the probability of one death or


a1
pl=rGIT
1000 .61 1000 61
=--x-=--
IS41 1 lS41
x--
100

and the expected number of deaths in 200 readings would be


1000 61
-S-X
1 41
- - X3- 00 =66·3
100

Similarly the probability oi two deaths or


a2
p2=r4
21
= ~X .37 2 1
1841 2

1000 372 I
= 18 4 1 X iOOOO
and the expected number of deaths in 200 readings would be
1000 ~721
-8- X - - X 200= 20.2
1 41 20000

In the same manner we can find out the expected frequencies of 3
and 4 or more deaths. They would be respectively 4.1 and .7.
Now we can compare the actual and the expected frequencies
from the following table :-

Number of deaths per      Frequencies expected      Frequencies
year per corps            in 200 readings           observed

0                               108.7                   109
1                                66.3                    65
2                                20.2                    22
3                                 4.1                     3
4                                  .7                     1

Total                           200.0                   200

It will be observed that the Poisson distribution gives a very
close fit to the data.
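
The fitting steps worked out by hand above can be checked with a short program. What follows is only a sketch: the function name `poisson_expected` is our own and not part of the text, and the last class is treated as "4 or more deaths", as in the table.

```python
import math

# Observed data from the text: deaths per corps per year -> number of readings.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}


def poisson_expected(observed):
    """Return (mean, expected frequencies); the last class means 'that many or more'."""
    total = sum(observed.values())
    mean = sum(x * f for x, f in observed.items()) / total
    xs = sorted(observed)
    # p_x = e^(-a) * a^x / x! for every class except the open-ended last one.
    probs = [math.exp(-mean) * mean**x / math.factorial(x) for x in xs[:-1]]
    probs.append(1.0 - sum(probs))  # remaining probability: 4 or more deaths
    return mean, {x: total * p for x, p in zip(xs, probs)}


mean, expected = poisson_expected(observed)
print(f"mean a = {mean:.2f}")
for x in sorted(expected):
    print(x, observed[x], round(expected[x], 1))
```

The program uses the exact value of e rather than four-figure logarithms, so intermediate figures may differ in the last place from the hand calculation, but the expected frequencies rounded to one decimal agree with the table.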

Mean and Standard Deviation of Poisson Distribution

We know that the mean and standard deviation in Binomial
distribution are respectively $np$ and $\sqrt{npq}$. It is easy to deduce the
values of mean and standard deviation of Poisson distribution where
$p = \frac{a}{n}$ and q is very nearly equal to 1 (since p is very small).

In Poisson's distribution
$$\text{Mean} = np = n \times \frac{a}{n} = a$$
and the standard deviation or
$$\sigma = \sqrt{npq} = \sqrt{n \times \frac{a}{n} \times 1} = \sqrt{a}$$
Thus the standard deviation of the Poisson distribution of
example No. 1 solved above would be $\sqrt{.61} = .781$. For the actual
frequencies the value of standard deviation comes to .78 which is
very close to the expected value of the standard deviation.
Thus, it is clear that the above series is an excellent example of
Poisson distribution.
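
The comparison of the two standard deviations can be verified as below. This is a minimal sketch using the observed horse-kick frequencies quoted earlier; the variable names are our own.

```python
import math

# Observed frequencies from the horse-kick example: deaths -> number of readings.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(observed.values())
mean = sum(x * f for x, f in observed.items()) / n

# Standard deviation of the actual (observed) frequency distribution.
variance = sum(f * (x - mean) ** 2 for x, f in observed.items()) / n
observed_sd = math.sqrt(variance)

# Standard deviation implied by the Poisson distribution: square root of the mean.
poisson_sd = math.sqrt(mean)

print(round(observed_sd, 2), round(poisson_sd, 3))
```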
In fact the Poisson distribution applies to such cases where we
can count the number of times an event happens, but where it is futile
to find out the number of times it did not happen. Thus in the
above illustration we can count the number of times a man was
killed by a horse kick but it is meaningless and out of the question to
find out the number of times a man was not killed by a horse kick.
Similarly it is possible to count the number of goals scored in football
matches by a particular team but it is not possible to know the number
of times goals were not scored by the said team. Here p or the
probability of the happening of the event is very small and so q is almost
equal to unity. In such cases we cannot use the binomial expansion
$(p+q)^n$ because the value of p is unknown. As such Poisson's
distribution uses the equation

$$e^{-a}\left(1 + a + \frac{a^2}{2!} + \frac{a^3}{3!} + \dots\right) = e^{-a} \times e^{a} = 1$$
It can be known even from elementary algebra that any figure
raised to the power a multiplied by the same figure raised to the power
-a is equal to unity. Thus $5^2 \times 5^{-2} = 25 \times \frac{1}{25} = 1$. Once this is
understood we can substitute the value of the mean of the series in place of
a and can get the probability of 0, 1, 2, 3, etc., successes. There is only
one constant in this equation and it is a. In binomial equations there
are two constants n and p.

The following example would further clarify Poisson's
distribution :-
Example 2. The following mistakes per page were observed in
a book :-

Number of mistakes        Number of times the
per page                  mistake occurred

0                               211
1                                90
2                                19
3                                 5
4                                 0

Total                           325

Fit a Poisson curve to the above data.
In the above example the mean, or the value of a, is .44. The
probability of 0 mistake or
$$p_0 = e^{-a}\,\frac{a^0}{0!} = 2.7183^{-.44}$$
Therefore
$$p_0 = \frac{2093}{3250}$$
and in 325 pages the expected frequency against 0 mistake per page
would be
$$\frac{2093}{3250} \times 325 = 209.3$$
Similarly
$$p_1 = 2.7183^{-.44} \times \frac{.44}{1} = \frac{2093}{3250} \times \frac{44}{100}$$
and the expected frequency against one mistake per page would be
$$\frac{2093}{3250} \times \frac{44}{100} \times 325 = 92.1$$
Similarly
$$p_2 = \frac{2093}{3250} \times \frac{(.44)^2}{2} = \frac{2093}{3250} \times \frac{1936}{20000}$$
and the expected frequency against two mistakes per page would be
$$\frac{2093}{3250} \times \frac{1936}{20000} \times 325 = 20.3$$
Similarly the frequencies against 3 mistakes and four mistakes would be
respectively 3.0 and 0.3.
The observed and expected frequencies would then be as
follows :-

Number of mistakes        Number of times        Number of times
per page                  observed               expected

0                              211                   209.3
1                               90                    92.1
2                               19                    20.3
3                                5                     3.0
4                                0                     0.3

Total                          325                   325.0
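
The successive terms of the Poisson series need not each be computed afresh, since every term is the preceding one multiplied by a/(x+1). The sketch below applies this shortcut to Example 2; the function name `poisson_frequencies` is our own, and here the last class is taken as exactly 4 mistakes rather than "4 or more".

```python
import math


def poisson_frequencies(mean, total, max_x):
    """Expected frequencies for x = 0..max_x using p(x+1) = p(x) * mean / (x + 1)."""
    p = math.exp(-mean)  # probability of 0 occurrences
    freqs = []
    for x in range(max_x + 1):
        freqs.append(total * p)
        p *= mean / (x + 1)  # step from p(x) to p(x+1)
    return freqs


# Observed frequencies of 0, 1, 2, 3 and 4 mistakes per page (Example 2).
observed = [211, 90, 19, 5, 0]
total = sum(observed)
mean = sum(x * f for x, f in enumerate(observed)) / total

expected = poisson_frequencies(mean, total, max_x=4)
for x, (obs, exp) in enumerate(zip(observed, expected)):
    print(x, obs, round(exp, 1))
```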

It is clear from the two examples given above that in Poisson
distribution we presume q almost equal to unity and do not try to find
out its value. In the above examples no attempt was made to find the
number of people who were not killed by horse kicks or the number
of cases in which there were no mistakes in the book.
It should be remembered that our presumption is that the
expectation of the event is constant. We have already said that the value of
a is a constant in Poisson distribution. If it is not constant but varies
from trial to trial, Poisson's equation would not give a good fit and
should not be used. Therefore, it should not be applied to the number
of suicide cases in a community because the number of suicides is not
constant. They vary with the stress of times, as was evident from the
suicide wave all over the world during the depression of 1930. Even
in the case of the number of goals scored by a football team per match the
expectation is not constant, but depends amongst other things on the
standard of the opposite team. In such cases where the expectation
varies in different trials a modified form of Poisson distribution is
used which allows for the variability in expectation. We shall,
however, leave it to be discussed in more advanced works.
Questions
1. What is meant by theoretical frequency distribution ? Point out the
chief characteristics of the Binomial, Normal and Poisson distributions.
2. Discuss the various types of relationships which hold good in a normal
distribution. What is meant by area relationship in a normal curve and of what use
is it in the theory of sampling ?
3. When does a binomial distribution tend to become a Poisson distribution? In
which cases would you apply a Poisson distribution in place of binomial distribution?
4. How are the values of mean and standard deviation calculated in Binomial,
Normal and Poisson distributions ? Illustrate your answers with examples.
5. What is the proportion of the area of the normal curve included between
m ± 1σ, m ± 2σ and m ± 3σ ?
6. Two types of electric bulbs have the same average life of 2000 hours. Their
standard deviations are, however, 15 and 20 hours respectively. In each case what
is the chance that the bulbs would not burn longer than 1800 hours ?
7. How would you fit a normal curve to a given frequency distribution ?
8. Fit a normal curve to the following data and find out its mean and standard
deviation :-
Results obtained by W.F.R. Weldon, of 4096 throws of 12 dice each, a throw of
4, 5 or 6 being called a success :-
Success        Frequency
0                   0
1                   7
2                  60
3                 198
4                 430
5                 731
6                 948
7                 847
8                 536
9                 257
10                 71
11                 11
12                  0
9. Assuming that half the population in India is vegetarian so that the chance
of an individual being a vegetarian is half and assuming further that 500 investigators
each take 12 individuals to see whether they are vegetarians, how many investigators
would you expect to report that four people or less were vegetarians ?
10. A card is drawn from a pack of 52 cards and then replaced. The process
is repeated 10 times and the number of black cards drawn is noted. One thousand
such experiments (of drawing 10 cards in each experiment) are conducted and the
results obtained are given below :-
Number of Black        Frequency
Cards
0                          0
1                          8
2                         46
3                        117
4                        210
5                        245
6                        206
7                        120
8                         39
9                          6
10                         4
What theoretical frequency distribution would be expected to apply to the
above data and why ? Calculate the theoretical frequencies of the distribution and
see if the fit is good.
11. The district of Rangoon was divided in 100 zones and the number of
direct hits on the residential houses during the flying bomb raids in the last War was
recorded. Results are given below :-
Number of Hits        Number of Zones
0                          23
1                          35
2                          23
3                          12
4                           4
5                           2
6                           1

Total                     100

Which theoretical frequency distribution should apply in the above case and
why ? Calculate the theoretical frequencies in that distribution and compare them
with the observed ones.
12. Articles are produced by a factory in large quantities and 3% of them are
found to be defective. They are despatched in batches of equal number. How large
should a batch be to ensure that
(a) not more than 1 in 5;
(b) not more than 1 in 10;
contains more than three defective articles.
13. If the chance of being killed by a motor accident during a year is 1/3000
use Poisson distribution to calculate the probability that out of 500 persons at least
one would die of motor accident in a year.
14. A person can hit a target one out of twenty times. Use Poisson's
distribution to determine how many trials should be had in order to have 99% chance of
hitting the target at least ten times.
15. Male and female children are born in approximately equal numbers. If
twins are born, in what relative proportion would you expect
(a) two boys;
(b) two girls;
(c) one of each.
16. In what circumstances may a Poisson distribution be used? Give the
general term of Poisson distribution and derive the mean and variance of the
distribution. (P. C. S. 1953).
17. Derive the normal distribution as the limiting form of the symmetrical
binomial distribution.
Assume the mean height of soldiers to be 68.22 in. with a variance of 10.8
(in.)². How many soldiers in a regiment of 1,000 would you expect to be over six
feet tall? (I. A. S., 1956).
18. Give an example of the Poisson distribution, explaining the underlying
stochastic model responsible for it.
In a city with 400 census blocks, each having approximately the same
population, the frequency distribution of the number of cholera cases is as follows :-
No. of cases              0      1      2      3      4
No. of city blocks      160    146     64     25      5
Examine by an appropriate goodness of fit test whether the occurrence of
cholera cases is distributed at random all over the city. (I. A. S., 1957).
19. From records of 10 Prussian army corps kept over 20 years the following
data were obtained showing the number of deaths caused by the kick of a horse.
Determine the average number of deaths per army corps per annum, and calculate the
theoretical Poisson frequencies.
Number of deaths        Frequency of
per army corps          occurrence
per annum
0                           109
1                            65
2                            22
3                             3
4                             1
Total                       200
20. A London district was divided up into 200 sub-areas and the number of
direct hits on dwelling houses during the flying bomb raids was recorded :
Number of hits            0     1     2     3     4     5     6     Total
Number of sub-areas      46    71    48    23     9     3     0      200
(a) Why should the Poisson distribution be expected to apply ?
(b) Calculate the theoretical figures given by the Poisson distribution.
(M. Com., Allahabad, 1959).
21. A ruler is marked in inches only. 50 people estimate the length of the line
to the nearest tenth of an inch with the following results :
Estimate of length     6.2"  6.3"  6.4"  6.5"  6.6"  6.7"  6.8"  6.9"  7.0"
Number of people         1     2     5    11    12    10     7     1     1
Fit a normal frequency curve to the data.
22. The following data show suicides of women in eight German states
during fourteen years :
No. of suicides in
a state per year       0    1    2    3    4    5    6    7    8    9   10   Total
Observed frequency     9   19   17   20   15   11    8    2    3    5    3    112
Fit a Poisson distribution to the data.
Theory of Sampling 25
Meaning and use
It has been discussed in an earlier chapter that statistical data
can be collected either by census enquiry or by sample enquiry. In a
census enquiry all the units of a universe have to be studied whereas in
a sample enquiry only a selected number of units are observed and
conclusions are drawn about the universe from their study. If, for example,
we wish to know the monthly expenditure of students reading in the
universities in this country, either we can find out the monthly
expenditure of each student who reads in any university of the country and
then on the basis of this census data, have an idea about the average
monthly expenditure of students, or we can select some students
reading in this country and study their monthly expenditure and thus
obtain an idea about the monthly expenditure of the university students
of the country, in general. Many difficulties are generally faced in a
census enquiry, particularly when the field of enquiry is large. A
sample enquiry is more adaptable than a census enquiry, because in
it a small number of trained investigators can collect the whole data
whereas in census enquiry a large army of investigators may have to be
appointed. It is difficult to give proper training to a large number
of investigators and as such data collected by them may not always
be very dependable. Besides this a sample enquiry needs less time
and money also. Moreover, sample enquiry is much more scientific
than a census enquiry because in it the extent of the reliability of the
results can be known whereas this is not always possible in census
enquiries.
Sample surveys are also advocated for the reason that census
enquiries are in many cases either impossible or unnecessary. Thus to
find out the production of wheat in India, in any year, census enquiry
is more or less impossible. Even if it were possible it is not at all
necessary. A grain merchant does not examine each grain of wheat
that he purchases. He simply takes out a handful and from it gets an
idea about the quality of the whole consignment. Similarly a fruit
merchant does not examine each and every apple, mango or guava he
purchases. He inspects only a few of them. The corn merchant and
the fruit dealer are not conversant with the theory of sampling; they
simply believe that the sample gives them a correct idea about the
universe. This belief is built after years and years of experience. There
is further a belief that the larger is the size of the sample inspected
the better is the idea obtained about the universe. If a fruit dealer
inspects only five apples out of 500 which he is purchasing he may
not get an accurate idea about the condition of the whole lot. In such
cases he will inspect some more and satisfy himself.
Theory of sampling gives a scientific basis to such belief. In
the present chapter we shall give a general idea about the theory of
sampling and the technique used in it and in subsequent chapters we
shall discuss the practical problems involved, in detail.
Types of universe
Finite and Infinite Universes. Before discussing the methods by which
samples can be selected and the results analysed, it will be better to
give an idea about the various types of universes from which samples
can come. Broadly speaking the universes can be of two types, Finite
and Infinite. By finite universe we mean a population which contains
a definite number of units. Thus the number of students in the Indian
universities is a finite universe. Similarly the population of the Indian
Union is a finite universe. As against this an infinite universe is one,
in which the number of units is infinite. Thus the length of leaves
of a tree or the height distribution of the Indian population or the
production of wheat in India would give infinite universes. Even
though it may be possible to measure the leaves of a tree or the heights
of all the persons of India or the production of wheat in this country,
the actual values would always vary within certain limits. The series
that we shall get in such cases would be continuous, as exact measurements
are not possible. As we shall see later on, infinite populations are better
for sampling studies. In the last chapter we have already noted that
the probabilities of various events can be better estimated if the universe
is infinite.
Hypothetical and Existent Universes. Universes can be classified as
hypothetical and existent. Hypothetical universe is one which does not
consist of concrete objects. For example, if a dice is tossed each throw
is an individual unit and we can construct a universe by throwing the
dice a large number of times and recording its results. Such universes
consist of an infinite number of items, because we can go on throwing
the dice any number of times we like, unless of course, it wears out.
Existent universe, as the name suggests, refers to a population of
concrete objects, like the number of persons having a certain income
or the number of books with a certain number of pages, etc. We
have noted in previous chapters that in the hypothetical universe the
values of p and q remain constant in various trials, and this is a very
important property of such populations. The probability of a dice
falling with number 6 upwards will always remain 1/6 in all possible
throws, and this property enables us to fit a particular curve to such
data with a high degree of accuracy.
Objects of Sampling
Now that we have some idea about the types of universes from
which samples can be chosen we shall discuss in brief the main
objects of sampling studies. It is obvious that the most important aim
of sampling studies is to obtain maximum information about the phenomena
under study with the least sacrifice of money, time and energy. If the sample
study has been made in such a manner that we can obtain a large variety
of information about the phenomena to which the sample relates, it
would be easy for us to have an idea about similar information relating
to the universe. If, for example, the sample studies relating to
expenditure of selected students in Indian universities have been done
properly, they would give us an idea about the distribution of expenditure
of all the students in Indian universities. Thus the aim of sampling
studies is to obtain the best possible values of the parameters. (The word
parameter is used to indicate various statistical measures like mean,
standard deviation, correlation, etc., in the universe. As against this
the term statistics refers to the statistical measures relating to the sample.)
This aim is best achieved if the sample studies are made in such a way
that they disclose a mathematical relationship between the values of
the distribution. For example, if it is found out that a particular
frequency distribution obtained by a sample study conforms to Binomial,
Normal or Poisson distribution, the parameter values can be very easily
estimated and a high degree of reliance can be placed on them.
Thus a large part of sampling theory is devoted to finding out
some constants of the universe. If they are found out, a very accurate
idea about the parent distribution is obtained from the sampling studies.
Even if only the mean and standard deviation of the universe can be
estimated by some mathematical relationship observed in the sample it
is enough to have an idea about many other parameter values.
Precision in sampling
Since the main aim of sampling studies is to obtain information
about the problem under study in the universe at large, and since
sampling studies are made only from a few units collected out of a
large number constituting the universe, an obvious question that arises
is, "to what extent can we depend on the sample estimates" ? It is
clear that if a sample fails to reveal the main characteristics of the
universe it does not serve the purpose for which it is meant. As such the
question relating to the reliability and dependability of the sample
estimates is a very vital and important question in theory of sampling.
If the sampling studies reveal unmistakably that the observed frequency
distribution conforms to some theoretical frequency distribution the
problem is solved to a considerable extent because then it is possible to
estimate the parameter values with a high degree of accuracy. If,
however, no such mathematical relationship is disclosed the problem has to
be very carefully investigated. The conclusions in the sampling studies are
based not on certainties but on probabilities. The probabilities of some
events are high and of others low and the degree of accuracy of
sampling estimates, naturally, depends on the degree of probability with
which they are made. Thus if out of 1000 people 999 have heights
below 74" we can say with a high degree of confidence that the height
of the 1000th person would also be below 74". Here the probability
of the statement being true is very high, almost touching the realm of
certainty. Similarly the probability of the accuracy of the statements
that a man cannot jump higher than 12 feet or that a man cannot live
for more than 150 years, is so high that they are never questioned.
Theoretically speaking a man can be more than 74" high, can jump
higher than 12 feet and can live more than 150 years. As against such
events there are others whose probability of happening is very low and
we cannot make any assertion with even a fair degree of accuracy.
Theory of sampling makes an attempt to indicate the degree of
reliance that can be placed on various estimates obtained from
sampling studies. This is done by assigning limits within which the estimate
is expected to vary. These limits vary with the degree of confidence
which we wish to achieve in our assertions. Thus if we want to assert
a fact with a very high degree of confidence, the limits which shall be
placed will be wide so that the chance of the estimate going beyond
them is minimum. It means that the degree of confidence which can
be put in any estimate is expressed in terms of probability. We can
thus make a statement that the probability that the average monthly
expenditure of university students in India would be within the limits
of Rs. 60 and Rs. 120, is .99. It would mean that the degree of
confidence that we place in the estimates is very high because the probability
of the actual figure being beyond these limits is very low, 1-.99 or
.01.
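
How the limits widen as the desired degree of confidence rises can be seen from the sketch below. It rests on assumptions that are not in the text: the sample figures are invented for illustration, the function name `confidence_limits` is our own, and the sample mean is taken to be approximately normally distributed.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical monthly expenditures (Rs.) of a sample of students; not from the text.
sample = [72, 95, 88, 110, 64, 101, 79, 93, 85, 120, 70, 98, 91, 83, 105, 76]


def confidence_limits(values, confidence):
    """Limits for the universe mean, assuming the sample mean is roughly normal."""
    m = mean(values)
    standard_error = stdev(values) / math.sqrt(len(values))
    # Normal deviate that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * standard_error, m + z * standard_error


for confidence in (0.90, 0.95, 0.99):
    low, high = confidence_limits(sample, confidence)
    print(f"{confidence:.2f}: Rs. {low:.1f} to Rs. {high:.1f}")
```

The higher the probability demanded, the further apart the two limits fall, which is the point made in the paragraph above.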
The accuracy or precision of estimates depends on a variety
of factors. The first is the manner in which the estimate is made from the
sample data. This leads us to the theory of estimation. The second
is the manner in which the sample was obtained. This leads us to the study
of technique of sampling. A third factor is the size of the sample. If
the size of the sample is small much reliance cannot be placed on the
estimate.
We shall now discuss the various types of sampling and see what
is the extent of confidence that can be placed on various samples under
different types of sampling.
Types of sampling
Samples can be selected from a universe in the following three
manners :-
(1) By random sampling.
(2) By purposive sampling.
(3) By mixed sampling.
Random sampling. As has been discussed in earlier chapters random
sampling is one where the individual units constituting the sample are
selected at random. By random selection we mean that the selection
has been done in such a manner that the probability of the inclusion
of each item of the universe, in the sample, is equal. The selection is
thus entirely objective. The first and by far the most important
question that arises here is, how to obtain such a sample. We shall discuss
a little later, how a random sample can be selected and how selections
which appear at first thought quite random, are in reality not so.
Purposive sampling. Purposive sampling means selecting the items
of the sample in accordance with some purposive principle. Here the
probability of the inclusion of some units of the universe in the sample
is very high, while the probability of the inclusion of others is very low.
In this method some criterion of selection is first laid down and items
are selected in accordance with it.
Mixed sampling. In mixed sampling there is a mixture of random
sampling and purposive sampling. The universe is first divided in some
groups on the basis of purposive sampling and then from each
sub-division certain items are selected in accordance with random sampling.
If the population is first divided into "strata" by purposive method
and then from each stratum some units are selected by random
sampling the method is called Stratified Sampling.
Thus, if we have to select 50 hostellers from a university with a
view to study their monthly expenditure, either we can choose them at
random from the total number of hostellers in that university in which
case we shall make the selection on the basis of random sampling; or
we can select 50 hostellers whose monthly expenditure to our
knowledge is neither too high nor too low in which case we shall be making
the selection on the basis of purposive sampling; or, we can first divide
students hostel-wise and then from each hostel we can select a few
students on the basis of random sampling in which case we shall be
making the selection on the basis of mixed sampling.
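
The three ways of choosing 50 hostellers can be sketched as follows. Everything in this sketch is assumed for illustration: a universe of 200 hostellers in 5 hostels, made-up expenditure figures, the limits of Rs. 80 and Rs. 120 for "neither too high nor too low", and the function names themselves.

```python
import random

rng = random.Random(0)  # fixed seed so that the sketch gives repeatable samples

# Hypothetical universe: 200 hostellers spread over 5 hostels, with invented
# monthly expenditures between Rs. 50 and Rs. 150.
hostellers = [
    {"id": i, "hostel": f"H{i % 5 + 1}", "expenditure": 50 + (i * 37) % 101}
    for i in range(200)
]


def random_sample(universe, size):
    """Random sampling: every unit has the same chance of inclusion."""
    return rng.sample(universe, size)


def purposive_sample(universe, size, low=80, high=120):
    """Purposive sampling: take units satisfying a criterion laid down in advance."""
    chosen = [u for u in universe if low <= u["expenditure"] <= high]
    return chosen[:size]


def mixed_sample(universe, per_stratum):
    """Mixed sampling: divide into strata (hostels), then draw at random in each."""
    strata = {}
    for u in universe:
        strata.setdefault(u["hostel"], []).append(u)
    sample = []
    for hostel in sorted(strata):
        sample.extend(rng.sample(strata[hostel], per_stratum))
    return sample


print(len(random_sample(hostellers, 50)))
print(len(purposive_sample(hostellers, 50)))
print(len(mixed_sample(hostellers, 10)))  # 10 from each of the 5 hostels
```

Only the first and third functions leave the choice of individual units to chance; in the second the investigator's criterion decides which units enter the sample.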
Bias in sampling
Human bias. Selection of the sample by any of these methods may
be affected by what is called human bias. Mankind is in fact a most
imperfect instrument for making a representative choice. Wherever
there is any scope for personal judgment, bias automatically creeps in.
Bias may be either deliberate or subconscious. If the investigator is
prejudiced and if he has the chance of selecting the sample he will
select such units as will suit his conclusion. Even if there is no
deliberate bias, subconsciously the selection may be prejudiced. Even
trained and totally unbiased investigators may not be able to make a
purely random choice and subconscious bias may make the sample
different from what the investigator intended it to be.
Bias and purposive sampling. In purposive sampling the chance of
human bias affecting the selections is very great. It is so on account of
the fact that the human factor plays a very important role in the selection
of items and we have already said that human beings are most imperfect
instruments for making a random selection. Thus, if an investigation
is conducted about the expenditure of hostel students in a university,
an investigator may select consciously or subconsciously such students
who spend either less or more than the average, depending on the
direction of his bias.
Bias and random sampling. In random sampling the chances of
human bias are minimised. Here the bias may be due to the fact that
selection is not purely random. As we shall see a little later the
selection of a purely random sample is a very difficult task and one can
never be sure that a particular sample is a perfectly random sample.
Another reason of bias in random sampling may arise when a selected
unit is, for some reason, deleted and a new unit is substituted in its
place. Thus, if a particular student selected at random is not available
for questioning or if a particular house selected at random in a
house-to-house enquiry is vacant, and if a new unit is substituted in such cases,
the sample no more remains a random sample, and the results may be
biased.
Further, the results of random sample enquiry may be rendered
inaccurate if the selected units are not properly investigated or even
where they are investigated properly, faulty reasoning is applied to
draw inferences.
The foregoing discussion clearly shows that if a sample is to be
made free from bias it is necessary that all personal choice should be
eliminated. The human factor must have the least say so far as the
selection of the sample is concerned. The technique of selection should
therefore, be such, that no room is left for the personal whims of the
investigators. As we shall see in the following paragraphs the selection
of samples is now-a-days done by mechanical aids and through such
devices that the human factor has the least chance of affecting the
choice.
Other types of bias which arise due to substitution of new units
in place of selected ones or due to incomplete investigation of the
sampled units can be removed if only proper care is taken in
conducting the enquiry.
SELECTING A RANDOM SAMPLE

(1) In a finite universe


It has already been said that it is not an easy task to select a sample
which is purely random and many sampling methods which appear to be
perfectly random are in reality not so. If we have to select ten students
from a hostel with a view to study their monthly expenditure and if
there are 100 rooms in the hostel and if we select every tenth room as
a sample unit, so that the student occupying it is included in the sample,
it would appear as if the selection is perfectly random. But it may not
be so. Suppose every tenth room in the hostel is a cubicle whose
rent is higher than that of other rooms, the sample would consist of all such
students who are spending more than the average students. It is clear
from this example that one should exercise extreme care to see that the
selection is really random.
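
The danger described above can be shown with a small numerical sketch. The hostel below is hypothetical: the expenditure figures (Rs. 150 for the occupant of every tenth room, Rs. 90 for the others) are invented purely to show the direction of the bias.

```python
import random
from statistics import mean

# Hypothetical hostel: rooms 1 to 100, every tenth room being a costlier cubicle.
rooms = {room: (150 if room % 10 == 0 else 90) for room in range(1, 101)}

universe_mean = mean(rooms.values())

# Selecting every tenth room picks the cubicles and nothing else.
systematic = [rooms[r] for r in range(10, 101, 10)]

# A lottery-style draw of ten rooms, for comparison (seeded for repeatability).
rng = random.Random(1)
lottery = [rooms[r] for r in rng.sample(sorted(rooms), 10)]

print("universe mean    :", universe_mean)
print("every tenth room :", mean(systematic))
print("lottery draw     :", mean(lottery))
```

Because the mode of selection is tied to the very property under study, the every-tenth-room sample overstates the average expenditure, whereas the lottery draw has no such built-in leaning.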
One fundamental rule which should always be kept in mind is that
the mode of the selection of sample should bear no relation to that property of
the parent population which is under study. Thus in the above example the
mode of selecting ten students should be such that it is not affected by
the expenditure of the inmates of the hostel. If this rule is observed
there is no reason why only those students will be selected who spend
either more or less than the average. In such cases the likelihood is that
all types of students will come in the sample and they would be in the
same proportion in the sample in which they are in the universe. Bias
creeps in the samples only when the procedure of selection is related to
that property of the universe about which information is sought.
Bearing in mind the above rule the selection of random sample
can be made in any of the following ways :-
(i) By drawing lots or by lottery method.
(ii) By arranging the units of the universe in a particular order
(geographical, numerical, alphabetical, etc.) and selecting every 10th,
20th, 100th or nth unit depending on the size of the universe and the
size of the sample to be selected.
(iii) By using table of random numbers.
We shall discuss these methods in turn.
Lottery sampling. In this method the various units of the universe
are represented by small chits of paper which are folded and mixed
together. From this, the required numbers are picked out blindfolded.
Thus the names of 100 hostellers or their serial numbers can be written
on small chits of paper which can be folded in such a manner that they
are indistinguishable from each other and then ten folded chits can be
drawn from this lot at random. This method of selection is
independent of the expenditure of the 100 hostellers and as such should give a
representative sample. If, however, some chits are folded less and
others more, so that some are bigger in size than the others, selection
may not be random, because now each unit has not an equal chance of
being included in the sample. A blindfolded person is more likely
to draw bigger chits than smaller ones and the sample may lose its
characteristic of randomness.
To avoid the above difficulty we can construct a card population.
Here the name or the number of each of the 100 students would be
written on a card. All cards would be identical in size, shape and
design. They would then be shuffled a number of times and then 10
cards would be drawn at random from them. This method would give
us better results than the first one, but if the names or numbers on the
cards are written in adhesive ink, so that on some cards it is more
thick than on others, the probability of the selection of various cards
would not be identical and the sample may not be fully representative.
Moreover, this method can be used only when the universe is not very
large. If the number of items in the universe is very large it would be
very difficult to construct a card population.
Serial, geographical or alpbabetical arrangemenl. A second method of
selecting a random sample is by arranging the units of the universe in
some order aJld selecting every loth, 20th, Iooth or nth unit depending
THEORY OF SAMPLING

on the size of the universe and the size of the sample. Thus, if the~
are 100 rOams in a hostel each with a serial number we can select every
tenth room and the student Hving in it can be included in the sample.
Another arrangement is that the names of the students are written in
alphabetical order and then every tenth student is selected from the
list. Similarly, the units of the universe can be arranged geographically
and the selection of the sample done in accordance with the above
procedure. Thus, if we have to select 10 villages out of 100 in a particular
district, the villages can be arranged geographically so that the
names of villages in different tahsils are noted down and every tenth
village selected.
The above mentioned methods would give a random sample unless
every tenth unit is of a variety different from the common lot, in which
case the sample would no longer be random.
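Selection of every tenth unit from an ordered list can be sketched as below. This is an illustration only: the function name is hypothetical, and the random choice of the starting point (used when no start is supplied) is an assumption not stated in the text, which simply takes every tenth room.

```python
import random


def systematic_sample(units, interval, start=None, seed=None):
    """Take every `interval`-th unit of an ordered list, beginning at index `start`."""
    if start is None:
        # Assumption: pick the starting position at random within the first interval.
        start = random.Random(seed).randrange(interval)
    return units[start::interval]


rooms = list(range(1, 101))                               # hostel rooms numbered 1 to 100
sample = systematic_sample(rooms, interval=10, start=9)   # rooms 10, 20, ..., 100
print(sample)
```

If the rooms happened to be arranged so that every tenth one differed in kind from the rest (say, a corner room), this routine would pick only rooms of that kind, which is the danger noted above.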

Random Numbers. Due to the difficulties involved in the two
methods discussed above the method of random sampling numbers
has become very popular. A number of people have constructed random
sampling numbers of which those constructed by L. H. C. Tippett are
most widely used. The tables constructed by Tippett consist of 10400
four-digit numbers. They are constructed out of 41600 digits taken
from census reports by combining them in fours. The first forty numbers
given in the table are reproduced below :-

2952  6641  3992  9792  7979  5911  3170  5624
4167  9524  1545  1396  7203  5356  1300  2693
2370  7483  3408  2762  3563  1089  6913  7691
0560  5246  1112  6107  6008  8126  4433  8716
2754  9143  1405  9025  7002  6111  8816  6446
The technique of selecting a random sample with the help of these
numbers is very simple. Suppose we have to select a sample of 20
from a population of 8000. We shall first number the various units
of the population from 1 to 8000. Now we have to select 20 numbers
between 1 and 8000. We shall open any page of Tippett's tables
and select the first 20 numbers which do not exceed 8000. If the first
page is used the numbers selected would be 2952, 6641, 3992, 7979,
5911, 3170, 5624, 4167, 1545, 1396, 7203, 5356, 1300, 2693, 2370,
7483, 3408, 2762, 3563, and 1089. The units with the above numbers
would constitute the sample.
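The selection just described can be written out as a short routine. The sketch below uses only the first three rows of the extract as read from the table above; the function name is hypothetical, and the rule of skipping a number that repeats is an assumption added so that no unit is chosen twice.

```python
# First three rows of the extract from Tippett's tables, as read above.
TIPPETT_EXTRACT = [
    2952, 6641, 3992, 9792, 7979, 5911, 3170, 5624,
    4167, 9524, 1545, 1396, 7203, 5356, 1300, 2693,
    2370, 7483, 3408, 2762, 3563, 1089, 6913, 7691,
]


def select_units(random_numbers, population_size, sample_size):
    """Read the table in order, keeping numbers that label a unit of the population."""
    chosen = []
    for number in random_numbers:
        # Keep a number only if it is a valid serial number and not already chosen.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("not enough usable random numbers in the table")


sample = select_units(TIPPETT_EXTRACT, population_size=8000, sample_size=20)
print(sample)
```

The numbers 9792 and 9524 are passed over because no unit of the population bears them; the remaining twenty are the ones listed in the text.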
If the universe is small, say of 100 units only, and if we have to
take a sample of 10 we can assign numbers 1 to 100 to the various units
of the universe and then select ten such figures which do not exceed 100.
It should be remembered that though Tippett's numbers are of four
digits in many cases they are less than 100 or even 10. Thus 0008 is a
four-digit number but less than 10 in value. A better procedure,
however, would be to utilise a larger number of these random numbers.
There are 10400 numbers in all. Now if instead of assigning one
number to each unit of the universe we assign it, say 104 numbers, so
that the first unit has numbers 0001 to 0104 and the second unit 0105
to 0208, we shall be in a position to use all the numbers of the table.
Having done this we can select any ten numbers from the tables and
they would constitute the sample. If we get two or more numbers say
in 0105 to 0208 group we can ignore the numbers after that unit has
been selected.
A question that arises at this stage is: what is the guarantee that
these numbers are really random? No proof of it can be given but
experience has shown that the numbers have given very satisfactory
results. Thus the proof of their randomness lies only in the success
of thousands and thousands of repeated investigations that have been
conducted by using them throughout the world. Another set of 1,00,000
numbers has been constructed by Kendall and Babington Smith by
using a randomizing machine. They are also very popular and have
given correct results in a large number of investigations.
(2) In an infinite universe

The methods of selecting a random sample discussed above are
appropriate only in a finite universe, where it is possible to assign a
card or a number to each unit constituting the population. In cases
where the universe is infinite these methods cannot be applied. For
example, if we have to take a sample of wheat or of flour or of cement
from a bag it is not possible in any of these cases to assign a number
to each grain or particle constituting the universe and as such the
methods of constructing card population or of random sampling numbers
cannot be used. In such cases a thorough mixing of the grain or
cement may be done and by dividing and sub-dividing the lot in parts,
a sample of an adequate size can be obtained. The contents of the
bag after thorough mixing may be divided in two equal parts of which
one may be selected and this may further be divided in two parts after
mixing. In this way the process can be continued till one of the sub-
divisions is equal to the size of the desired sample.
(3) In hypothetical universe

Selection of a random sample in the hypothetical population presents
an even more difficult task. Here not only can we not apply the
technique of card populations and random sampling numbers but
even the process of continuous sub-division of the universe followed
in case of infinite population is not of any use. Suppose we wish
to have a random sample of the possible throws of a dice; we can
neither construct card population nor use random sampling numbers. It
is also not possible to use the technique of continuous sub-division of
the universe. In such cases the only thing possible is to take the sample
by actually throwing a dice a number of times and observing the
results. The only point to be kept in mind here is that sampling
conditions should remain unchanged throughout the experiments. In
such cases, no doubt, there is no proof that the sample appropriately
represents the universe, but since we know how a dice behaves we have
ample justification in presuming that the sample has been taken
from a universe which is in a way existent.

SELECTING PURPOSIVE AND MIXED SAMPLES

We have already pointed out that in purposive sampling the
investigator chooses some units from the universe in accordance with
certain procedure previously laid down. It is clear that in this method
the bias of the investigator can play a very important role and destroy
the representativeness of the sample. But there may be cases where a
purposive sample may give better results than a random sample. Suppose
we have to select only five students out of 5000 with a view to
study their height measurements. In this case it is quite likely that a
random sample may give us most unrepresentative results. A purposive selection
in such a case may give better results because here the investigator
would select such five students who have a normal or mean height
according to his judgment. But it should be remembered that as the
size of the sample is increased the chances are, that a random sample
would give better results than a purposive sample, because the larger
the size of the sample the greater are the chances of operation of the
law of compensation. In case of purposive sampling as the size of the
sample increases the chances of bias affecting the results also increase.
In purposive sampling the bias is generally cumulative. It should be
remembered that bias arising due to chance is compensatory and the
bias arising out of human behaviour is generally cumulative.
Thus, if we have to choose between a purposive sample and a
random sample we shall weigh the uncertainties of the former with
those of the latter. In most of the studies since the size of the sample
is not very small, selections are made on the basis of random sampling.
Moreover, even in case of small sized samples, purposive sampling can
give better estimates only about the mean of a series and not about its
dispersion or skewness, because in purposive sampling only those units
would be selected whose values are round about the mean value.
In practice sometimes a mixture of the two methods may give
the best possible results. The universe may be divided into strata and
then from each stratum some units may be selected by random sampling.
In this way stratified sampling may give us the advantages of both
these methods.
To reduce the chances of errors we can also have multi-stage sampling.
This is also a variety of mixed sampling. Here first the universe is
divided in first-stage sampling-units from which a sample is selected.
The selected first-stage-samples are then sub-divided in second-stage-
units from which another sample is selected. Third stage and fourth-
stage sampling is done in the same manner, if necessary. Thus for the
purpose of urban surveys a sample of towns may be taken first and then
in each of the selected towns a sub-sample of households may be taken
and if need be from each of the selected households a third-stage-sample
of individuals may be obtained.
Sometimes different proportions of the different strata are included
in the sample. This is called utilization of variable sampling fraction.
Suppose the students of a university are first divided in four strata
faculty-wise and then from each stratum we take random samples
keeping in mind the proportions of students in the four faculties, we
shall be making use of variable sampling fractions. Thus, if in the
Faculty of Arts there are 2000 students and in the Faculty of Science
only 1000 the samples from these strata would also be in the ratio of
2 : 1.
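The division of a sample among strata in the ratio of their sizes may be sketched as follows. Only the two faculty sizes come from the text; the total sample of 300 and the function name are hypothetical, chosen so that the arithmetic is easy to follow.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split `total_sample` among strata in proportion to their sizes."""
    population = sum(stratum_sizes.values())
    # Rounding is an assumption; with awkward sizes the parts may not add up exactly.
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}


allocation = proportional_allocation({"Arts": 2000, "Science": 1000}, total_sample=300)
print(allocation)
```

With these figures the Arts stratum contributes 200 students and the Science stratum 100, that is, samples in the ratio 2 : 1 as stated above.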

Conducting a sample enquiry

Collection of data. After a sample has been selected the next step
is to collect the required data from the sample units. For this purpose
we shall have to select an appropriate method of collection of data. We
have already discussed in an earlier chapter the various methods which
are used in the collection of data and their merits and limitations. The
reliability of the results of a sample survey depends to a considerable
extent on the manner in which data have been collected and on the
reliability of the collected information. The fact that a particular
sample is a purely random sample is absolutely no proof that the inferences
drawn from its study are also dependable. Sample surveys
may give most unreliable and inaccurate results even though the
samples are purely random, if the data are not collected properly. If
the investigator or the informant is biased or if the questionnaire
adopted is unsatisfactory or if the method of collection of data is not
appropriate, the results of the survey are bound to be misleading.
Therefore, it is necessary that adequate care be taken in the collection of
data and one should not think that since the sample selected is random
the results ought to be satisfactory.
After the data have been collected, statistical inferences are drawn
from them. It has already been said that sample studies are meant to
draw certain conclusions regarding the universe as a whole, and as such
the generalisations from sample studies should be very carefully made.
The question that naturally arises here is how far would the results of
the sample hold good for the universe as a whole. It is obvious that in
such studies one cannot be dogmatic about the inferences. Usually
the conclusions are stated in a very general form. As has been said
before, the extent of confidence associated with a particular generalisation
is expressed in terms of probability, and invariably certain limits
are laid down within which results are expected to vary. In inductive
reasoning we apply to the unknown (universe) the results of the known
(sample). It is no doubt a leap in the dark, but statistical methods
relating to sampling definitely reduce the hazards and dangers involved
in such an attempt.
We have pointed out that in natural phenomena there is a certain
type of regularity in all processes and when we apply the results of a
sample to universe we do so on the assumption that a certain type of
uniformity exists in the universe so far as the particular problem which
is under study is concerned. Thus when on the basis of a sample
study we generalize that the monthly expenditure of the hostel students of
a university is say Rs. 85, we presume that the distribution of expenditure
of hostel students in general, obeys certain laws, and there is a certain
type of uniformity in the series. But the accuracy of this assumption
cannot be proved in any manner and as such there is always an a priori
element in all types of statistical induction.
As we shall see in later chapters, most of the generalisations from
sampling studies are based on the presumption that the values in the
universe are symmetrically distributed and would give a normal fre-
quency curve. We have seen in the last chapter that in a normal fre-
quency distribution we can safely lay down limits within which certain
values are most likely to vary, and this fact is in reality the main stay of
all sampling studies. Various tests of significance are used to find out
whether a particular result significantly differs from the expected one.
We shall study them in succeeding chapters.
Questions

1. Show the necessity of the uses of the method of random sampling in any
extensive investigation. How would you make use of these methods in carrying out
an economic survey of the rural areas of C. P. ? (B. Com., Nagpur, 1948).
2. What is meant by "sample" methods of enquiry? When is it adopted
and what are its advantages? Describe the test that may be applied to determine
whether the sample is representative or not.
(B. Com., Hons., Andhra, 1944).
3. Discuss the special features of the different types of universes from which
samples can be drawn.
4. What are the main objects of sampling? Compare and contrast the merits
and drawbacks of Sample and Census studies.
5. What do you understand by "random sample"? Is it a synonym for
"representative sample"? Why is a random sample supposed to speak for the 'population'?
To what types of enquiries is the technique of random sampling specially applicable?
(M. A., Rajputana, 1951).
6. Discuss the various methods of judging the reliability of various types of
sampling studies.
7. "Random sampling owes its importance to the fact that we can assess the
results obtained from it in terms of probability."
Elucidate this statement and also discuss the technique of random sampling
investigation. (M. A., Allahabad, 1950).
8. What do you understand by the term "bias"? How can bias in samples
be reduced? Is it possible to completely eliminate bias in sampling studies? If
not, why?
9. Discuss the relative advantages and disadvantages of the method of complete
enumeration and the method of random sample survey in social and economic
enquiries. (M. A., Punjab, 1952).
10. Discuss the important methods of selecting random samples from different
types of universes.
11. How are purposive and mixed samples selected? What are the chief
dangers in the selection of samples by these methods?
12. "One of the aims of statistics is to describe population (through sample)
and to this end statistical constants are calculated".
Discuss fully the above statement and show how this is sought to be achieved.
(M. A., Patna, 1940).
13. Write a note on the respective merits and demerits of :-
(a) Random sampling,
(b) Purposive sampling,
(c) Mixed sampling.
14. Compare the relative advantages and disadvantages of the method of
complete enumeration and the method of random sample survey. Explain with
reasons the method you will adopt in enquiries relating to : -
(a) Area under rice.
(b) Cost of production of sugarcane in Bihar. (M. A., Patna, 1942).
15. Write a note on the theory of sampling.
16. How would you conduct a sample survey? What special points should
be kept in mind in the : -
(a) selection of a sample,
(b) collection of data.
17. What do you understand by "statistical induction"? What precautions
are necessary in drawing inferences from a sample survey ?
18. What is meant by "precision" in connection with sampling studies? Discuss
how the precision of sampling studies is estimated at various levels of significance.
19. Discuss how the theory of sampling is based on the theory of probability.
20. Write short notes on :
(a) Inertia of large numbers,
(b) Multi-stage-sampling,
(c) Stratified sampling,
(d) Utilisation of variable sampling fractions.
21. If a sample is obtained by selecting every tenth item, what possible bias
could result? Give examples. Why is this not a random sample ?
22. Suppose it is desired to estimate the mean family size in a certain town.
Would the recording of the family size of a random selection of high-school students
be a reasonable way to obtain data?
Sampling of Attributes 26
We have already pointed out the distinction between statistics of
variables and statistics of attributes in earlier chapters, and have also
discussed various statistical methods which are used in these two types
of statistics for the purpose of analysis of data. In sampling studies
also, we shall discuss the statistics of attributes separately from the sta-
tistics of variables.
The sampling of attributes may be understood as drawing a
sample from a universe which consists of A's and a's. Thus if we
are studying the problem of blindness and if this attribute is represent-
ed by A then a would represent the absence of blindness. In order to
find out percentage of blind people in a particular universe we may
take a sample and study the percentage of blind people in it, so that we
may be in a position to draw certain conclusions about their percentage
in the universe. For the sake of convenience here also we shall call
the drawing of an individual on sampling an "event", and the presence
of attribute A a "success" (its probability being represented by p) and its
absence a "failure" (its probability being represented by q). Thus, if we
have taken a sample of 200 people and if we find that out of them 8 are
blind we will say that the number of events was 200 and out of it there
were 8 successes and 192 failures. The probability of success or
$p = \frac{8}{200}$ and $q = \frac{192}{200}$.
Simple sampling
Before proceeding further we shall lay down certain assumptions,
which we presume would hold good in the sample which is under
study. The sampling which satisfies these assumptions would be called
"Simple Sampling." Thus by simple sampling we shall mean a random
sample in which the following conditions hold good :-
(1) The probabilities of drawing individuals with attribute A, or the
chances of success of various events, are independent of whether previous trials have
been made or not. It means that the proportion of A's at each draw of a
sample unit is identical. This assumption holds good in case of toss-
ing a coin or drawing a ball or a card provided that before the second
and subsequent draws the ball or card drawn previously is re-
placed. Thus the probability of a coin falling "heads" is identical in all
throws and similarly the probability of drawing a black ball from a
bag containing three black and four white balls is identical for all
draws provided there is replacement each time. In actual practice
this condition would not hold good in drawing samples relating to
attributes from a "finite" population. For example, if in the universe
of 10,000 people there are 100 blind persons the probability of drawing a
blind person in the first event is $\frac{100}{10000}$, in the second $\frac{99}{9999}$ and in the
third $\frac{98}{9998}$ and so on (supposing a blind person to have been drawn
each time). It will be noticed, however, that if the number
of items is very large there will be no material difference in the
probabilities of various events even if this condition does not hold
good.
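How little the probability changes from draw to draw in a large universe can be verified directly. The sketch below is illustrative only; the function name is hypothetical, and it assumes the case described above in which every earlier draw produced a blind person who was not replaced.

```python
from fractions import Fraction


def successive_probabilities(population, blind, draws):
    """Chance of a blind person at each draw when each earlier draw was blind
    and was not put back into the universe."""
    return [Fraction(blind - k, population - k) for k in range(draws)]


without_replacement = successive_probabilities(10000, 100, 3)
with_replacement = [Fraction(100, 10000)] * 3      # the simple-sampling ideal

for draw, (a, b) in enumerate(zip(without_replacement, with_replacement), start=1):
    print(draw, float(a), float(b))
```

The three probabilities without replacement differ from 0.01 only from the fourth decimal place onwards, which is why the first condition is treated as approximately satisfied when the universe is large.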
(2) The probability (or p) of drawing an individual with attribute A
remains constant and is the same for all samples. This condition would hold
good only if the proportion of A's in the universe remains constant
each time a sample is drawn. If a dice is tossed at two different places
or at two different times the probability of success (if coming of
No.6 is taken as success) would be identical. This cannot be said
about sampling of attributes if the two samples have been drawn at
two different places (of the same universe) or at two different times. The
proportion of blind in the same universe would not be identical either
at two places or at the same place at different times. In the analysis of
sampling of attributes we presume that this would be so.
The simple sampling is a particular type of random sampling in
which the above conditions hold good.
It should, however, be kept in mind that in actual practice in
most of the data that we shall come across, these conditions would
not hold good and statistical inferences will have to be made with the
hypothesis that these conditions are satisfied by the data. In certain
biological studies and studies relating to physical sciences these conditions
do hold good. In economic and social studies, however, the
limitations imposed by these conditions do not leave much room for
the application of the rules which we shall discuss below and which
apply in case of simple sampling only.
Mean and standard deviation in simple sampling of attributes
We have already mentioned in earlier chapters that if the probability
of the happening of an event in one trial is known we can find the
probability of its happening r times in n trials by the expansion of
a binomial. If p denotes the chance of success of an event and if 1 - p
or q denotes the chance of its failure and if we take N samples with
n events the frequencies of samples with n, (n-1), (n-2), ... successes
are the terms in the series $N(p+q)^n$ or
$$N\left\{p^n + n\,p^{n-1}q + \frac{n(n-1)}{1 \times 2}\,p^{n-2}q^2 + \dots + q^n\right\}$$
We have mentioned in the chapter on Theoretical Frequency
Distributions that the mean and standard deviation of such series are
given by the following rules :-
Mean or $M = np$
Standard Deviation or $\sigma = \sqrt{npq}$
If instead of recording the number of successes we record the
proportion of successes or $\frac{1}{n}$th of the number in each sample the
mean proportion of success or
$$M' = p$$
and standard deviation or
$$\sigma' = \sqrt{\frac{pq}{n}}$$
The following examples would illustrate the above rules :-
Example 1. Suppose four coins are tossed simultaneously 1600
times and falling of heads are called successes. We have thus 1600
samples of four tosses each.

Suppose the following distribution has been obtained :-

Successes    Frequency
    4            90
    3           420
    2           580
    1           410
    0           100
  Total        1600

In the above case the mean of the series is 1.994 and the standard
deviation 0.99. If the coins are unbiased and if they are properly
thrown the value of p or the probability of success would be 1/2 and the
frequencies of 4, 3, 2, 1 and 0 successes would be the various terms of the
expansion $1600\left(\frac{1}{2}+\frac{1}{2}\right)^4$, or they would be respectively 100, 400, 600, 400
and 100. For this theoretical or expected distribution, the value of the mean
would be $np = 4 \times \frac{1}{2} = 2$ and the value of the standard deviation
would be $\sqrt{npq} = \sqrt{4 \times \frac{1}{2} \times \frac{1}{2}} = 1$.
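The figures of Example 1 can be checked with a few lines of arithmetic. The sketch below is an illustration only: it assumes that the frequency of two successes is 580 (the value required for the frequencies to total 1600), and the variable names are chosen here for convenience.

```python
from math import comb, sqrt

# Observed distribution of Example 1: number of heads -> number of samples.
observed = {4: 90, 3: 420, 2: 580, 1: 410, 0: 100}
samples = sum(observed.values())                   # N, the number of samples
n, p = 4, 0.5                                      # four coins, unbiased

# Mean and standard deviation of the observed series.
mean_obs = sum(k * f for k, f in observed.items()) / samples
var_obs = sum(f * (k - mean_obs) ** 2 for k, f in observed.items()) / samples
sd_obs = sqrt(var_obs)

# Expected frequencies: the terms of N(p + q)^n.
expected = {k: samples * comb(n, k) * p ** k * (1 - p) ** (n - k)
            for k in range(n, -1, -1)}
mean_exp = n * p                                   # np
sd_exp = sqrt(n * p * (1 - p))                     # sqrt(npq)

print(mean_obs, sd_obs)
print(expected, mean_exp, sd_exp)
```

The expected frequencies come out as the binomial coefficients 1, 4, 6, 4, 1 each multiplied by 100, and the observed mean falls short of the expected mean of 2 by about .006.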
We thus find that there is some difference between observed
values of the mean and standard deviation and their expected values.
These differences may be due to what we have been calling fluctuations
of sampling. The question that arises here is, to what extent can such
differences be assigned to sampling fluctuations and consequently
ignored? We know that samples which are classified according to frequencies
of attributes give rise to a binomial distribution. A binomial
distribution gives a single humped type of curve when p and q are
equal, and even when they are unequal, if the value of the exponent n
is large, it gives us a distribution which very closely resembles a normal
distribution. We, however, know that in a normal frequency
distribution 99.73% of the items lie within the limits given by mean
± 3 standard deviation. In other words, $np \pm 3\sqrt{npq}$ would cover a
very large majority of the cases. In fact in a perfectly normal distribution
99.73% of the total area of the curve would lie within this range.
If, therefore, a sample gives us such a value of p which is within this
range, the difference between the expected and observed values should be
ascribed to fluctuations of sampling. If, however, the sample gives
us such a value of p which lies out of this range, it is most unlikely
that the difference is due to fluctuations of sampling. In other words,
if the difference between the observed value of the mean and its expect-
ed value is more than three times of the standard deviation of simple
sampling, the difference is significant and could not have arisen due
to sampling fluctuations.
In the above case the difference between the expected and observed
means is only .006 which is much less than the value of even one standard
deviation of simple sampling, and as such, the difference is insignificant
and could have arisen due to sampling fluctuations.
The following examples would further illustrate the above pro-
cedure :-
Example 2. 12 dice are thrown 3086 times and a throw of a 2, 3 or
4 is reckoned as a success. Suppose that 19142 throws of a 2, 3 or 4
have been made out. Do you think that this observed value deviates
from the expected value? If so, can the deviation from the expected
value be due to fluctuations of simple sampling?

Solution
The total number of throws = 3086 × 12 = 37032.
The chance of success, that is of throwing a
2, 3 or 4 with one dice in one throw = 1/2.
Hence the expected value of successes = 1/2 × 37032 = 18516.
The observed value of successes is 19142.
Thus, the observed number of successes is in excess of the expected
number by (19142 - 18516) = 626.
The standard deviation of simple sampling is
$$\sigma = \sqrt{npq} = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times 37032} = 96.2$$
The deviation observed is 6.5 times of this figure and it is, therefore,
most improbable that it is due to fluctuations of sampling.
Example 3. A certain cross of the pea gave 5321 yellow and 1804
green seeds. The expectation is 25 per cent. of green seeds on a Mendelian
hypothesis. Can the divergences from the expected values have
arisen from fluctuations of simple sampling only?
Solution
The total number of pea seeds examined = (5321 + 1804) = 7125.
The expectation of green seeds is 25 per cent. of the total.
Therefore, the expected result is 1781 green seeds. But the observed
result is 1804 green seeds, and so it is in excess of the expected result
by 23. The standard deviation of simple sampling is
$$\sigma = \sqrt{npq} = \sqrt{0.25 \times 0.75 \times 7125} = 36.6$$
The divergence from expectation is thus only 0.6 times of this
and hence may very well have arisen from fluctuations of simple
sampling.
Example 4. Balls are drawn from a bag containing equal numbers
of black and white balls, each ball being returned before drawing
another. In 2250 drawings, 1018 black and 1232 white balls have been
drawn. Do you suspect some bias on the part of the drawer?
Solution
The expectation of drawing a white ball in one draw is 1/2, since
the bag contains equal numbers of black and white balls.
In 2250 drawings the expected number of white balls is 1125.
But the number of white balls drawn is 1232. Thus the numerical
difference from the expected result is 107.
The standard deviation of simple sampling is
$$\sigma = \sqrt{npq} = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times 2250} = 23.7$$
The divergence from the expectation is thus about 4.5 times of
this and hence it is not probable that it arose due to fluctuations of
sampling.
Explanation of the deviation must be sought somewhere else, and
it seems reasonable to suspect that the drawer was biased.
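The procedure of Examples 2 to 4 is the same each time: find the expected number np, the standard deviation of simple sampling, and the ratio of the observed deviation to it. A minimal sketch follows; the function and dictionary names are hypothetical, and the limit of three standard deviations is the rule stated earlier in this chapter.

```python
from math import sqrt


def deviation_in_standard_errors(observed, n, p):
    """Return (observed - np) / sqrt(npq) for a count of successes in n events."""
    expected = n * p
    sigma = sqrt(n * p * (1 - p))
    return (observed - expected) / sigma


# (observed successes, number of events, hypothetical probability of success)
examples = {
    "Example 2 (dice)": (19142, 3086 * 12, 0.5),
    "Example 3 (green seeds)": (1804, 5321 + 1804, 0.25),
    "Example 4 (white balls)": (1232, 2250, 0.5),
}

ratios = {name: deviation_in_standard_errors(*args) for name, args in examples.items()}
significant = {name: abs(r) > 3 for name, r in ratios.items()}

for name in examples:
    print(name, round(ratios[name], 2), significant[name])
```

Only the deviation in Example 3 lies within three standard deviations, in agreement with the conclusions reached above.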
In the above examples we have calculated the standard deviation
of numbers of the simple sampling. We can similarly calculate the
standard deviation of the proportions of the simple sampling by the
use of the formula already given. The following examples would illustrate
the rules regarding the calculation of the standard deviation of
proportions of simple sampling.
Example 5. A group of scientific men reported 1,705 sons and
1,527 daughters. Do these figures conform to the hypothesis that the
sex ratio is 1/2?
Solution
The total number of observations is 1705 + 1527 = 3232.
The number of sons is 1705.
Therefore the observed male ratio is $\frac{1705}{3232}$ or 0.5275.
On the given hypothesis the male ratio ought to be 0.5000.
Thus the observed male ratio is in excess of the theoretical ratio
by 0.0275.
The standard deviation of the proportion is
$$s = \sqrt{\frac{pq}{n}} = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times \frac{1}{3232}} = 0.0088$$
The divergence from hypothesis is thus about 3.13 times of this
standard error and it is, therefore, most improbable that it arose as a
sampling fluctuation. Hence it can be definitely said that the figures
given do not conform to the given hypothesis.
Example 6. 12 dice were thrown 6500 times, 4, 5 or 6 being
reckoned as a "success". What proportion of success do you expect?
If in actual observation the proportion of success is found to be 0.5016,
find the standard deviation of the proportion with the given number
of throws and state whether you would regard the excess of successes
as probably significant of bias in the dice.
Solution
The total number of throws = 6500 × 12 = 78000.
The expected proportion of success is 1/2 or 0.5000.
The observed proportion of successes is 0.5016, and thus is in
excess of the expected proportion by .0016.
The standard deviation of the proportion is
$$s = \sqrt{\frac{pq}{n}} = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times \frac{1}{78000}} = 0.00179$$
The deviation observed is only 0.9 times this figure and it is, therefore,
probable that it arose as a sampling fluctuation. Therefore, the
excess of the proportion of successes is not significant of bias in the
dice.
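The same test can be carried out on proportions, as in Examples 5 and 6. The sketch below is an illustration under the conditions of simple sampling; the function name is hypothetical.

```python
from math import sqrt


def proportion_deviation(observed_proportion, n, p):
    """Return the standard error sqrt(pq/n) and the deviation measured in it."""
    standard_error = sqrt(p * (1 - p) / n)
    return standard_error, (observed_proportion - p) / standard_error


# Example 5: 1705 sons among 3232 children, hypothesis p = 1/2.
se_5, ratio_5 = proportion_deviation(1705 / 3232, 3232, 0.5)

# Example 6: observed proportion 0.5016 in 6500 x 12 throws, hypothesis p = 1/2.
se_6, ratio_6 = proportion_deviation(0.5016, 6500 * 12, 0.5)

print(round(se_5, 4), round(ratio_5, 2))
print(round(se_6, 5), round(ratio_6, 2))
```

The first deviation exceeds three times its standard error while the second is less than one standard error, which matches the two conclusions drawn above.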
Standard errors
The standard deviation of simple sampling is briefly called Standard
Error. The term standard error has in reality a wider meaning
than merely the standard deviation of simple sampling. But for the
sake of convenience the term can be defined as mentioned above. Thus
if the difference between the expected and observed frequencies is more
than three times the standard error the difference is said to be significant,
which means that such a difference could not have arisen due to
fluctuations of sampling or the probability of such a difference arising
due to chance is very very low. If the difference is less than three
times the standard error it could have arisen due to fluctuations of
sampling. If the difference is less than say twice the standard error,
the probability of its arising due to chance is fairly high and it can be
ignored. If the difference is more than twice the standard error but
less than thrice the standard error then theoretically speaking it could
have arisen due to sampling fluctuation, but the probability of its arising
due to chance is very low. As such even though such a difference can
be ignored it is never safe to do so.
In all the examples solved above we know the probability of the
happening of the event in the universe. In many cases it is not known.
In such cases usually the proportion of success in the sample or the
value of p in the sample is taken as an estimate of the proportion
of success in the universe. The assumption is justifiable only if the
size of n is large and neither p nor q is very small. It is obvious that
if n is large and if the value of neither p nor q is very small, which
means that the difference between them is not much, the distribution
would be very much like a normal one. We have discussed this point
already in the chapter on Theoretical Frequency Distributions. Another
course open to us in such cases is to take the highest value of pq. It
is clear that the value of pq can never exceed 1/2 × 1/2 or 1/4. As such if
we do not know the value of p in the universe and if the value of p
in the sample is very small or very large (so that the value of q is very
large or very small) then instead of taking the sample values of p and q
the value of pq may be taken as 1/4 which is their maximum value.
Example 7. 400 children are chosen in an industrial town of
Northern India, and 150 are found to be underweight. Assuming the
conditions of simple sampling, estimate the percentage of children who
are underweight in that industrial town and assign limits within which
the percentage probably lies.
Solution
Taking the observed values of p and q in the sample, we have p,
that is the chance of getting an underweight child, $= \frac{150}{400} = \frac{3}{8}$ and
q, that is the chance of failure, $= \frac{5}{8}$.
Total number of children examined, or n, is 400.
Substituting the above values of p, q and n in the given formula
we get,
Standard error of the proportion of children who are underweight
$= \sqrt{\frac{pq}{n}} = \sqrt{\frac{3}{8} \times \frac{5}{8} \times \frac{1}{400}} = 0.024 = 2.4$ per cent.
Whatever may be the percentage of children who are underweight
in the population, a simple sample should give a percentage
within three times this standard error.
Hence taking 3/8 (or 37.5%) to be the estimate of the proportion of
children who are underweight, we have that the limits are 37.5 per
cent. ± (3 × 2.4) per cent., that is 30.3 and 44.7 per cent. approximately.
If, however, we feel that the value of p in the sample should not
be taken as its value in the universe we can proceed in a different
manner. Whatever be the values of p and q in the universe the
value of pq cannot exceed 1/4 and hence the standard error cannot
exceed $\sqrt{\frac{1}{2} \times \frac{1}{2} \times \frac{1}{400}}$ or .025 or 2.5%.

On this basis the limits would be 37.5 ± (3 × 2.5) per cent. or 30%
and 45%.
We find that the difference between the two results is very little.
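The two routes taken in Example 7 can be sketched in a few lines of Python. This is only an illustration: the function name `proportion_limits` and the flag `use_max_pq` are hypothetical names chosen here, and results are returned as fractions rather than percentages.

```python
import math

def proportion_limits(successes, n, k=3.0, use_max_pq=False):
    """Estimate a proportion and its k-standard-error limits (as fractions)."""
    p = successes / n
    # Either the sample value of pq, or its largest possible value 1/4.
    pq = 0.25 if use_max_pq else p * (1 - p)
    se = math.sqrt(pq / n)
    return p, se, p - k * se, p + k * se

# Example 7: 150 underweight children out of 400.
for flag in (False, True):
    p, se, low, high = proportion_limits(150, 400, use_max_pq=flag)
    print(f"max pq={flag}: p={p:.3f} se={se:.4f} limits={low:.3f} to {high:.3f}")
```

The unrounded standard error differs slightly from the 2.4 per cent used in the text, so the printed limits may differ in the last digit.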
Example 8. 500 eggs are taken at random from a large con-
signment, and 50 are found to be bad. Estimate the percentage of
bad eggs in the consignment and assign limits within which the per-
centage probably lies.
Solution

Proportion of bad eggs in the sample, or p, $= \frac{1}{10}$ and so $q = \frac{9}{10}$.

Standard error of the proportion of bad eggs
$= \sqrt{\frac{pq}{n}} = \sqrt{\frac{1}{10} \times \frac{9}{10} \times \frac{1}{500}} = 0.013 = 1.3$ per cent.
Whatever may be the percentage of bad eggs in the consignment,
a simple sample should give a percentage within three times the stan-
dard error.

Here taking 1/10 (or 10%) to be the estimate of bad eggs we have
that the limits are 10 per cent. ± 3.9 per cent., that is, 6.1 per cent. and
13.9 per cent. approximately.
In this example the value of p in the sample is very small and
hence if we take the maximum value of pq and find the maximum value
of standard error it would be $\sqrt{\frac{1}{2} \times \frac{1}{2} \times \frac{1}{500}}$ or .022 or 2.2%.
On this basis the limits within which the percentage of bad eggs
in the consignment should vary would be (10 ± 6.6)% or 3.4% and
16.6%. Now whatever be the values of p and q in the universe the
percentage of bad eggs in the consignment should lie within these two
limits.
In such cases where p is very small the binomial distribution actually
tends to become a Poisson distribution and as such the value of
standard deviation is $\sqrt{np}$ rather than $\sqrt{npq}$ because q is more or less
equal to unity. But $\sqrt{np} = \sqrt{M}$, where M is the mean. Therefore where p is very
small and the distribution is like a Poisson distribution the standard
deviation of simple sampling is equal to the square root of the mean. In the
above example then, the value of the standard deviation would be
$\sqrt{\frac{1}{10} \times 500}$ or 7.07 or $\left(\frac{7.07}{500} \times 100\right)$% or 1.41%. For Poisson distribution
also the rule is that mean ± 3 standard deviation includes a large
majority of observations. On this basis the limits in this question
would be 10% ± (3 × 1.41)% or 5.77% and 14.23%. But sometimes
this criterion gives the lower limit as a negative percentage which is
out of question in actual practice.
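The Poisson treatment of a small proportion can be sketched as below. The function name `poisson_limits_percent` is hypothetical, and reporting a negative lower limit as zero is an assumption of this sketch rather than a rule given in the text.

```python
import math

def poisson_limits_percent(p, n, k=3.0):
    """Limits (in per cent) for a small proportion, treating the count as Poisson."""
    mean_count = n * p                 # M = np
    sd_count = math.sqrt(mean_count)   # Poisson: standard deviation = sqrt(mean)
    sd_percent = sd_count / n * 100
    centre = p * 100
    # Assumption of this sketch: a negative lower limit is reported as 0.
    lower = max(centre - k * sd_percent, 0.0)
    return sd_percent, lower, centre + k * sd_percent

# Example 8: p = 1/10 in a sample of 500 eggs.
sd_pct, low, high = poisson_limits_percent(0.10, 500)
print(f"s.d. = {sd_pct:.2f}%, limits = {low:.2f}% to {high:.2f}%")
```

Because the code does not round the standard deviation to 1.41 before multiplying by three, its limits may differ from the hand calculation in the second decimal place.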
Example 9. A life insurance company founded upon the Indian
Experience Table has a thousand policies averaging Rs. 2,000 on lives
at age 25. From the experience table it is found that of 89,032 alive at
age 25, 88,000 are alive at age 26. Find the upper and lower values of the
amount that the company will have to pay out in insurance during the year.
Solution
The chance that a man of age 25 would not be alive at age 26 is
$p = 1 - \frac{88000}{89032} = \frac{1032}{89032}$
Therefore, the expected number of deaths among 1000 insured persons
is
$1000 \times \frac{1032}{89032} = 12$ approx.
The standard error of simple sampling is

$\sigma = \sqrt{npq} = \sqrt{\frac{1032}{89032} \times \frac{88000}{89032} \times 1000} = 3.4$

The actual number of deaths would lie between the expected
number of deaths ± three times this standard error, i.e., 12 ±
(3 × 3.4) or 1.8 and 22.2.
The average amount of a policy is 2,000 rupees. Therefore, the
lower and upper values of the amount that the company will have
to pay out in insurance during the year are Rs. 2,000 × 1.8 and
Rs. 2,000 × 22.2 or Rs. 3,600 and Rs. 44,400.
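The arithmetic of Example 9 can be checked with a short sketch. The function name `payout_limits` and the flag `round_like_text` are hypothetical; the flag rounds the expected deaths to a whole number and the standard error to one decimal, as the hand calculation does.

```python
import math

def payout_limits(alive_start, alive_end, policies, avg_amount,
                  k=3.0, round_like_text=True):
    """Lower and upper payout from expected deaths ± k standard errors."""
    p = 1 - alive_end / alive_start            # chance of dying within the year
    expected = policies * p                    # np
    sd = math.sqrt(policies * p * (1 - p))     # sqrt(npq)
    if round_like_text:
        # Mirror the hand calculation: 12 deaths, standard error 3.4.
        expected, sd = round(expected), round(sd, 1)
    return avg_amount * (expected - k * sd), avg_amount * (expected + k * sd)

low, high = payout_limits(89032, 88000, 1000, 2000)
print(f"Rs. {low:,.0f} to Rs. {high:,.0f}")
```

Setting `round_like_text=False` gives somewhat lower limits, since the unrounded expected number of deaths is nearer 11.6 than 12.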
Standard error and size of sample
It will be observed from the above examples that the standard
error depends entirely on the value of p and the size of the sample.
As such the range within which p lies is not dependent on the size of
the universe. This is the reason why the size of the universe is not
indicated anywhere in the formula of the standard error. But standard
error is affected by the size of the sample. If, therefore, p is constant
and if n is changed the value of standard error of p would also change.
The value of the standard error varies inversely as the square root of n.
Therefore, if n becomes larger the value of the standard error becomes
smaller. The standard error decreases in proportion to the square
root of the number of items in the sample. If the value of p is 1/2 and
the value of n is 100 the value of standard error would be $\sqrt{\frac{1}{2} \times \frac{1}{2} \times \frac{1}{100}}$
or .05 or 5%. If we wish to reduce the standard error to one-half of
its magnitude, that is 2.5%, the value of n should increase four-fold and
not two-fold only. Thus $\sqrt{\frac{1}{2} \times \frac{1}{2} \times \frac{1}{400}} = .025$ or 2.5%.
Standard error and precision
Standard error gives us an idea about the unreliability of a sample.
The greater the standard error the greater is the departure of actual
frequencies from the expected ones and consequently greater is the
unreliability of the sample. The reciprocal of the standard error, or
$\frac{1}{\text{standard error}}$, is a measure of reliability of the sample. We have seen
in the chapter on Measures of Dispersion that this value is called
Precision. The reliability or precision of an observed proportion varies
as the square root of the number of items in the sample. To double
the precision (which means the same thing as reducing the standard
error to one-half) the number of observations should be increased
fourfold and to treble the precision the number of observations should be
increased nine-fold.
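The square-root relation between sample size, standard error and precision can be tabulated with a small sketch. The helper names `standard_error`, `precision` and `sample_size_for` are hypothetical, introduced here only for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion under simple sampling."""
    return math.sqrt(p * (1 - p) / n)

def precision(p, n):
    """Precision, taken as the reciprocal of the standard error."""
    return 1 / standard_error(p, n)

def sample_size_for(p, target_se):
    """Sample size at which the standard error equals target_se."""
    return p * (1 - p) / target_se ** 2

# With p = 1/2, n = 100, 400 and 900 give precisions in the ratio 1 : 2 : 3.
for n in (100, 400, 900):
    print(n, round(standard_error(0.5, n), 4), round(precision(0.5, n), 1))
```

In practice the value returned by `sample_size_for` would be rounded up to the next whole number of observations.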
Standard error of the difference between proportions of two samples
In the questions done so far we have tried to study the difference
between the actual proportions as observed and the expected proportions
in the universe. There may be cases where two samples have been
taken from distinct materials or different populations and they give $p_1$
and $p_2$ as the proportions of A's, the number of observations in the
two samples being $n_1$ and $n_2$ respectively. The question that may arise
here is whether the difference in the two proportions disclosed by the
two samples is significant or there is no real difference between them
and the observed difference is due to fluctuations of sampling, the two
populations being similar so far as the proportion of A's in them is
concerned. In such cases we do not have any idea about the proportions
of A's in the universes from which the samples have been drawn.
However, in such cases we can proceed on the Null Hypothesis, i.e., on
the hypothesis that there is no difference in the values $p_1$ and $p_2$ and
whatever difference is there is due to sampling fluctuations. We can
further assume the value of p in the universe, or $p_0$, as the weighted mean
proportion in the two samples taken together. In other words
$p_0 = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$
This is the best possible estimate of $p_0$ that we can have in the
given circumstances. The standard errors in the two samples would be
$\sqrt{\frac{p_0 q_0}{n_1}}$ and $\sqrt{\frac{p_0 q_0}{n_2}}$. On the hypothesis that $p_1$ is in reality equal to $p_2$
the standard error of the difference would be

$\text{s.e.}_{1-2} = \sqrt{p_0 q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

If the observed difference between $p_1$ and $p_2$ is more than three
times the standard error of the difference it is significant, otherwise it
could have arisen due to chance fluctuations and as such can be
ignored.
The following examples would illustrate the above rule :-
Example 10. In a random sample of 1000 persons from town A,
400 are found to be consumers of rice. In a sample of 800 from town B,
400 are found to be consumers of rice. Discuss the question whether
the data reveal a significant difference between A and B so far as the
proportion of rice consumers is concerned.
Solution
In the two towns together, the percentage of rice consumers is
$p_0 = \frac{(400 + 400) \times 100}{1000 + 800} = 44.4$
and therefore
$q_0 = (100 - 44.4) = 55.6$
In town A it is 40 per cent. whereas in town B it is 50 per cent.
The difference in the percentages of rice consumers in the two towns is
10. Assuming that the samples taken are simple samples, the standard
error of sampling for the difference between percentages observed in
the samples of the given sizes would be:

$\text{s.e.}_{1-2} = \sqrt{p_0 q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(44.4 \times 55.6)\left(\frac{1}{1000} + \frac{1}{800}\right)} = 2.357$ per cent.

The actual difference, which is 10 per cent., is over 4.2 times this
standard error. So it can be concluded that the data reveal a significant
difference between A and B so far as the proportion of rice consumers
is concerned.
Example 11. The following table gives the proportion of dark-
coloured people in two cities.

City    Total Population observed    Percentage of dark-coloured
A       250                          42
B       450                          33

Can the difference observed in the percentage of dark-coloured
people be due to the fluctuations of sampling?
Solution
In the two cities together the percentage of dark-coloured people is

$p_0 = \frac{(105 + 149) \times 100}{250 + 450} = 36.3$ approx.
and therefore $q_0 = (100 - 36.3) = 63.7$ approx.
If this were the true percentage, the standard error of sampling
of the difference between percentages observed in samples of the given
sizes would be

$\text{s.e.}_{1-2} = (p_0 q_0)^{1/2} \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2} = \sqrt{(36.3 \times 63.7)\left(\frac{1}{250} + \frac{1}{450}\right)} = 3.8$ per cent approx.

The actual difference is 9 per cent and is 2.4 times the standard
error; hence the difference observed can be attributed to the fluctuations
of sampling, though it cannot be very definitely said so. Usually
if the actual difference is more than twice the standard error of the
difference it is considered to be significant.
Example 12. One thousand articles from a factory are examined
and found to be three per cent defective. Fifteen hundred similar
articles from a second factory are found to be only 2 per cent defective.
Can it reasonably be concluded that the product of the first factory is
inferior to that of the second?
Solution
In the two factories together, the percentage of defective articles
is
$p_0 = \frac{(30 + 30) \times 100}{1000 + 1500} = 2.4$
If this were the true percentage, the standard error of sampling
for the difference between percentages observed in samples of the given
sizes would be

$\text{s.e.}_{1-2} = (p_0 q_0)^{1/2} \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2} = (2.4 \times 97.6)^{1/2} \left(\frac{1}{1000} + \frac{1}{1500}\right)^{1/2} = 0.625$ per cent approx.

The actual difference is 1 per cent and is only 1.6 times this standard
error and so could have arisen due to fluctuations of simple
sampling.
Hence, it cannot be reasonably concluded that the product of the
first factory is inferior to that of the second.
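The pooled test used in Examples 10 and 12 can be sketched as follows. The function name `pooled_difference_test` is hypothetical; it works with counts and fractions instead of percentages, which leaves the ratio of difference to standard error unchanged.

```python
import math

def pooled_difference_test(x1, n1, x2, n2):
    """Difference of two sample proportions against its pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)                  # weighted mean proportion
    se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    diff = abs(p1 - p2)
    return diff, se, diff / se

# Example 10: rice consumers, 400 of 1000 against 400 of 800.
# Example 12: defective articles, 30 of 1000 against 30 of 1500.
for label, args in (("Example 10", (400, 1000, 400, 800)),
                    ("Example 12", (30, 1000, 30, 1500))):
    diff, se, ratio = pooled_difference_test(*args)
    verdict = "significant" if ratio > 3 else "may be due to sampling"
    print(f"{label}: diff={diff:.3f} se={se:.5f} ratio={ratio:.1f} -> {verdict}")
```

The threshold of three standard errors follows the rule stated in the text; a different multiple could be passed in if a less strict criterion were wanted.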
Sometimes we may come across cases where the proportions of
A's are not the same in the two materials or universes from which the
samples have been chosen, but $p_1$ and $p_2$ are the real proportions. In
such cases we may be interested in finding out whether the difference
would vanish if further samples were taken. Such a situation usually
arises in questions where association between attributes is studied. The
proportion of A's in the universe of B and in the universe of β may be
different from each other and we can presume that $p_1$ and $p_2$ are the
real proportions and then we can test our hypothesis. We may then
find out whether further samples would also indicate the difference in
the proportion of A's in the universe of B's and β's or whether the
difference has arisen only in the present case due to sampling fluctuations.
Here the standard errors of the two proportions $p_1$ and $p_2$ would
be respectively as follows :-

$\text{s.e.}_1 = \sqrt{\frac{p_1 q_1}{n_1}}$ and $\text{s.e.}_2 = \sqrt{\frac{p_2 q_2}{n_2}}$

As such the standard error of the difference of $p_1$ and $p_2$ would be

$\text{s.e.}_{1-2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$

If the actual difference between $p_1$ and $p_2$ is more than three times
the standard error of the difference, it is significant; otherwise it may
have arisen due to sampling fluctuations and further samples may not
indicate any significant difference in the values of $p_1$ and $p_2$.
Example 13. Out of 1000 blind people 100 are found to be
deaf also. Out of 4000 not-blinds the number of deafs was found to
be 200. Do you think that further samples would reveal a similar difference
between the proportions of deaf people amongst the blinds and
not-blinds?
Solution
Here the value of
$p_1 = \frac{100}{1000}$ or .1 or 10%
and of
$q_1 = .9$ or 90%.
The value of
$p_2 = \frac{200}{4000}$ or .05 or 5%
and of
$q_2 = .95$ or 95%.

Assuming that $p_1$ and $p_2$ are the real proportions of the deaf
people in the two universes (blinds and not-blinds respectively), the
standard error of the difference or

$\text{s.e.}_{1-2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{10 \times 90}{1000} + \frac{5 \times 95}{4000}}$ per cent.

= 1.009%

The actual difference between the two proportions is 5% which
is more than three times the standard error of the difference. It indicates
that if further samples were taken there would still be found a
significant difference between the percentage of deafs in blinds and the
percentage of deafs in not-blinds. The difference in the proportions
has thus not arisen in this sample alone, by fluctuations of sampling.
The difference is genuine. It indicates that there is a real association
between blindness and deafness.
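When the two proportions are treated as real and distinct, each sample contributes its own pq/n term. A minimal sketch of that calculation for Example 13 follows; the function name `unpooled_difference_se` is hypothetical.

```python
import math

def unpooled_difference_se(p1, n1, p2, n2):
    """Standard error of p1 - p2 when each proportion keeps its own variance."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Example 13: 100 deaf among 1000 blind, 200 deaf among 4000 not-blind.
p1, n1 = 100 / 1000, 1000
p2, n2 = 200 / 4000, 4000
se = unpooled_difference_se(p1, n1, p2, n2)
ratio = abs(p1 - p2) / se
print(f"se = {se * 100:.3f}%  difference/se = {ratio:.2f}")
```

The standard error comes out at about 1 per cent, so a 5 per cent difference is close to five standard errors.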
Sometimes the proportion of A's in one sample is compared not
with the proportion of A's in another sample but with the proportion
of A's in the two samples taken together. Thus if $p_1$ and $p_2$ represent
the proportions of A's in the two samples drawn from two universes
and if in lieu of comparing $p_1$ with $p_2$ it is compared with the proportion
of A's in the two samples taken together or with $p_0$ where
$p_0 = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$
the standard error of the difference of $p_1$ and $p_0$ is given by the
following formula :-
$\text{s.e.}_{1-0} = \sqrt{\frac{p_0 q_0}{n_1 + n_2} \times \frac{n_2}{n_1}}$
Such questions arise when in testing association between two
attributes A and B, instead of comparing the proportion of A's amongst
B's with the proportion of A's amongst β's, the proportion of A's in
B's is compared with the proportion of A's in the universe. Thus, if in
Example 13 we wish to know whether the percentage of deaf people
amongst the blind differs significantly from the percentage of deaf people
in the universe, we shall get the following results.

$\text{s.e.}_{1-0} = \sqrt{\frac{p_0 q_0}{n_1 + n_2} \times \frac{n_2}{n_1}}$
In the present case $p_0$ or the proportion of deafs in the universe
is equal to $\frac{300}{5000}$ or .06 or 6%, and $q_0$ therefore = 100 − 6 or 94%.
The values of $n_1$ and $n_2$ are respectively 1000 and 4000. Substituting
these figures in the above formula we get

$\text{s.e.}_{1-0} = \sqrt{\frac{6 \times 94}{1000 + 4000} \times \frac{4000}{1000}} = 0.67$ per cent approx.
The actual difference between the two percentages is 10 − 6 or 4.
This difference is more than three times the standard error of the
difference and as such is highly significant. It confirms our previous
conclusion that the association between blindness and deafness is real;
otherwise the percentage of deafs in blinds
could not have been materially different from the percentage of deafs
in the universe as a whole.
In all the above cases we have presumed that conditions of
simple sampling hold good. If the limitations imposed by simple
sampling are removed all these formulae would need modification. We
shall, however, not discuss such problems as they are of an advanced
nature and are usually needed only by advanced students of the subject.
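Comparing one sample with the two samples taken together can be sketched in the same way. The function name `sample_vs_combined_se` is hypothetical; the formula is the one given above with $n_2/n_1$ as the final factor.

```python
import math

def sample_vs_combined_se(x1, n1, x2, n2):
    """Standard error of p1 - p0, where p0 pools both samples."""
    p0 = (x1 + x2) / (n1 + n2)
    return p0, math.sqrt(p0 * (1 - p0) / (n1 + n2) * (n2 / n1))

# Example 13 again: blind sample (100 of 1000) against all 5000 persons.
p1 = 100 / 1000
p0, se = sample_vs_combined_se(100, 1000, 200, 4000)
print(f"p0 = {p0:.2f}  se = {se * 100:.2f}%  ratio = {(p1 - p0) / se:.1f}")
```

A ratio well above three agrees with the conclusion reached from the direct comparison of the two samples.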
Questions
1. What is meant by simple sampling? What limitations does it impose on
the samples selected from different types of universes?
2. How would you calculate the mean and standard deviation in simple
sampling of distributions?
3. What are the various tests of significance generally used in sampling of
attributes?
4. How would you calculate the standard error of the difference between two
proportions and what are the rules of interpreting it?
5. A die is tossed 960 times and it falls with 6 upward 184 times. Is the die
biased?
6. A coin is tossed 400 times and it turns up head 216 times. Discuss whether
the coin may be an unbiased one, and explain briefly the theoretical principles you
would use for this purpose. (I. A. S.).
7. In tossing a hundred pennies a student gets 66 heads. Do you think that
he has used sufficient care to obtain a random toss each time?
8. In breeding certain stocks, 408 hairy and 126 glabrous plants were obtained.
If the expectation is one-fourth glabrous, is the divergence significant, or might it have
occurred as a fluctuation of sampling?
9. In a sample of 100 in Central U. P., 60 are found to be wheat eaters and
40 rice eaters. Can we assume that both the food articles are equally popular?
10. In a group of 50 first cousins there were found to be 27 males and 23
females. Ascertain if the observed proportions are inconsistent with the hypothesis
that the sexes should be in equal proportion.
11. 1000 ladies are chosen at random from the inhabitants of Bombay State,
and it is found that 55% of them have dark eyes and the remainder have eyes of some
other colour. What can be inferred about the proportion of dark-eyed ladies in the
whole population of Bombay State?
12. In a locality containing 18,000 families, a sample of 840 families was selected
at random. Of these 840 families, 206 families were found to have a monthly income
of Rs. 50 or less. It is desired to estimate how many out of the 18,000 families have
a monthly income of Rs. 50 or less. Within what limits would you place your estimate?
(U. P. C. S., 1943).
13. Explain the terms (a) Statistic (b) Parameter (c) Standard error of a statistic.
A random sample of 500 pine-apples was taken from a large consignment and
65 were found to be bad. Estimate the proportion of the bad pine-apples in the
consignment, as well as the standard error of the estimate. Deduce that the percentage
of bad pine-apples in the consignment almost certainly lies between 8.5 and 17.5.
(I. A. S., 1954).
14. A sample of 1000 days is taken from meteorological records of a certain
district, and 120 of them are found to be foggy. What are the probable limits to the
percentage of foggy days in the district?
15. In a newspaper article of 1600 words in Hindustani 64% of the words are
found to be of Hindi origin. Assuming that simple sampling conditions hold good,
estimate the proportion of Hindi words in the writer's vocabulary and assign limits
to that proportion.
16. A dealer takes 100 samples from a consignment of 10,000 items of certain
goods and finds that there are 50 items of grade I, worth Rs. 5 per thousand, 30 items
of grade II, worth Rs. 4 per thousand, and 20 items of grade III, worth Rs. 3 per
thousand. Within what limits should the value of the consignment be fixed?
17. A man buys 1,000 sacks of potatoes. He finds that from 1,000 potatoes
chosen from the sacks at random, 400 are of class A, worth Rs. 10 a sack; 250 are of
class B, worth Rs. 7 per sack; 200 are of class C, worth Rs. 5 a sack; and 150 are of
class D, worth Rs. 4 per sack. What are the upper and lower bounds for the value
of the potatoes?
18. In town A, 19400 persons were observed and 27% of them were found
to be short-sighted. In town B, 29750 persons were observed and 30% were found
to be short-sighted. Can the difference observed in the percentage of short-sighted
persons be attributed solely to the fluctuations of sampling?
19. One type of aircraft is found to develop engine trouble in 5 flights out
of a total of 100 and another type in 7 flights out of a total of 200 flights. Is there a
significant difference in the two types of aircraft so far as engine defects are concerned?
20. In 1910, for white males in the age group of 30-34 years, the number
dying in Chicago was 902 out of 1,06,307; in New York 2130 out of 2,21,598 (United
States Life Tables). Are the death rates in the two cities significantly different?
21. In 1910, in the original registration states, the number of white males
dying between the age of 30 and 31 was 1609 out of a population of 2,53,445 of white
males in this age group; the corresponding figures for white females of the same age
were 1302 out of 2,39,912 (United States Life Tables). Is there a significant
difference between the death rates of the two sexes at this age?
22. In a certain association table the following frequencies were obtained :-
(AB)=927; (Aβ)=642;
(αB)=296; (αβ)=357.
Can the association of the table have arisen as a fluctuation of simple sampling,
the true association being zero?
23. The figures for antitoxin treatment in a hospital, for a certain period, in
the treatment of diphtheria were :-
                        Cases    Deaths
Antitoxin treatment      228       37
Ordinary treatment       337       28
Can it be concluded that there were significantly more deaths in the group
treated by antitoxin?
24. The following table relates to the hair colour of girls at Bombay.
                             Of Dark        Total       Percent
                             Hair Colour    observed    Dark
Bombay                       21,537         49,507      43.5
Film Sector of Bombay         4,008          9,743      41.1
Non-Film Sector of Bombay    17,529         39,764      44.1
Do you regard the difference observed in the percentages of girls of dark hair
colour in Bombay and its Film Sector as significant?
25. The subject under investigation is the measure of dependence of Tamil
on words of Sanskrit origin. One newspaper article reporting the proceedings of
the Constituent Assembly contained 2025 words of which 729 words were declared
by a literary critic to be of Sanskrit origin. A second article by the same author
describing atomic research contained 1600 words of which 640 words were declared by
the same critic to be of Sanskrit origin. Assuming that simple sampling conditions
hold, estimate the limits for the proportion of Sanskrit terms in the writer's vocabulary
and examine whether there is any significant difference in the dependence of this writer
on words of Sanskrit origin in writing on these two subjects. (I. A. S., 1947).
26. Show how you would test the significance of the difference between the
prevalence of a certain attribute in two given populations from each of which you
could take large samples.
In a random sample of 500 men from a particular district of U.P., 300 are found
to be smokers. Out of 1,000 men from another district, 550 are smokers. Do the
data indicate that the two districts are significantly different with respect to the
prevalence of smoking among men? (P. C. S., 1953).
27
Sampling of Variables
(Large Samples)
Nature of the problem. In the last chapter we studied the sampling
of attributes and were concerned with the question whether a particular
member of the sample possessed an attribute A or did not possess
it. Now we shall be discussing the sampling of variables and here we
shall come across individuals which can assume any value of a
variable. Thus in a series relating to heights we are no more concerned
with the question whether a particular individual is tall or not tall
but we have before us individuals who can have any height ranging
from the lowest to the highest. Under such circumstances we cannot
classify the items of a sample in two groups, one possessing an attribute
and the other not possessing it, because in statistics of variables the
values of various items of the sample can range within wide limits.
Theoretically speaking the limits are infinite but for the sake of
convenience and practical considerations the range is limited.
Objects of study. The aims of sampling studies in statistics of
variables are the same as in case of statistics of attributes. Here also we
compare the actual or observed frequencies with those expected under certain
assumptions and try to find out whether the difference can be attributed
to chance. As in sampling of attributes here too we try to obtain one
or two constants for the universe (mean or standard deviation) because if
they are obtained, an idea about the type of parent distribution is easily
formed. In sampling studies relating to variables, as in sampling studies
of attributes, our third aim is to assess the reliability of our estimates.
Sampling distribution
In statistics of variables generally the question of finding out the
values of p and q does not arise and as such it is very difficult to obtain
the expected frequencies. If, however, we take a large number of
samples from the same universe and calculate any function (mean or
standard deviation, etc.), we shall have a series relating to the values of
the function. If, say, 100 samples are taken and if the mean value of
each sample is found out, we shall have a series relating to the means
of 100 samples. Similarly we can have a series relating to the standard
deviations of 100 samples. Such series relating to the values of a function
are called Sampling Distributions. There is one very important
characteristic of all sampling distributions and it is that they give a more
or less normal distribution. If the number of samples used in the
sampling distribution is large, almost invariably, the sampling distribution
would be a normal distribution even though the parent distribution
from which the samples have been taken is not normal. This is a very
important and useful characteristic which helps in the analysis of samples
relating to variables. Such a distribution relating to any function
obtained from a large number of sampling studies is, as we have said
above, called a sampling distribution. Thus, if the mean expenditure
of university students is estimated in 100 samples there will be 100 mean
values and this series of means will be called a Sampling Distribution of
Means. Similarly a series of 100 standard deviations obtained through
100 sample studies would be called a Sampling Distribution of Standard
Deviations. These sampling distributions, as has been said, would conform
to the normal distribution.
Sampling distributions are of very great theoretical importance
since they resemble normal distributions. If we have a sampling distribution
of means relating to the expenditure of university students
we can estimate the mean of the universe from this distribution and
can lay down limits within which the actual mean would vary. We
know that in the normal distribution there is a mathematical relationship
between the area enclosed within the mean ordinate and the ordinates
at various sigma distances from the mean and as such we can
calculate the probability of a particular mean value significantly differing
from the mean of the universe. The actual mean of the universe in
such cases would almost certainly lie within the two limits given by
mean ± 3σ. The area of the curve enclosed within these limits is
.9973 or 99.73% of the total area. Thus the area outside these limits
is only .0027 or .27%. The probability of the actual mean being
outside these limits is thus only .0027 which is a very insignificant
figure.
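A small simulation can illustrate how a sampling distribution of means behaves when the parent distribution is far from normal. This is a sketch under stated assumptions: the parent is taken to be an exponential distribution with mean 1 (a choice made here, not in the text), and the function name `sampling_distribution_of_means` and the fixed seed are hypothetical.

```python
import random
import statistics

def sampling_distribution_of_means(num_samples, sample_size, seed=0):
    """Draw repeated simple samples from a skewed parent and return their means."""
    rng = random.Random(seed)
    return [
        # Assumed parent: exponential with mean 1 and standard deviation 1.
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

means = sampling_distribution_of_means(num_samples=1000, sample_size=100)
centre = statistics.fmean(means)    # should be close to the parent mean, 1
se = statistics.pstdev(means)       # standard deviation of the sample means
inside = sum(abs(m - 1.0) <= 3 * se for m in means) / len(means)
print(f"mean of means={centre:.3f} s.d. of means={se:.3f} within 3 s.d.={inside:.3f}")
```

With samples of 100 the standard deviation of the means should be near 1/√100 = 0.1, and nearly all sample means should fall within three such deviations of the parent mean.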
It should, however, be remembered that a sampling distribution
would be a normal one only if both the number of samples and their
size are large. If the sampling distribution does not give a normal
distribution it is an indication that the samples are biased or inadequate
or there is something wrong with the sampling technique. It should
also be remembered that in actual practice all sampling distributions are
based on simple sampling. Thus conditions of simple sampling discussed
in the last chapter are assumed to hold good in all cases which
we shall discuss below.
The standard deviation of the sampling distribution would be
called Standard Error.
Simple sampling of variables
As has been said above we shall study sampling of variables
under simple sampling conditions only. Thus our assumptions would
be:
1. That we are drawing our samples from precisely the same
record.
2. That each member of our sample at each draw is drawn from
the same record; and
3. That the drawing of each member of the sample is independent
of the draws of all other members.
If these conditions hold good then the sampling distributions can
be safely represented by the mean and standard deviation of the data
of the sample, as in case of simple sampling of attributes. In other
words, we can then safely presume that mean ± 3σ would cover almost
all cases and the value of the parameter would not lie outside these
limits. Here again we shall use the standard error to give us an idea
about the precision of our results.
Since we have presumed that the sampling distribution is normal and
further that simple sampling conditions are satisfied we can use the
mean of the sample in place of the unknown parameter mean. In such
cases we shall have to make certain approximations. For example, if
the sample mean is used as an estimate of the population mean and if
the standard deviation of the population is estimated by taking the
deviations from the sample mean the formula should be that standard
deviation is equal to $\sqrt{\frac{\Sigma d^2}{n-1}}$ instead of $\sqrt{\frac{\Sigma d^2}{n}}$. If n is large we can
use the value $\sqrt{\frac{\Sigma d^2}{n}}$ as standard deviation, since this approximation
would not materially affect the results.
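The effect of dividing by n − 1 instead of n can be seen in a short sketch. The function name `sd_from_deviations` and its `ddof` argument are hypothetical, and the eight values are made-up illustrative data.

```python
import math

def sd_from_deviations(values, ddof=0):
    """Standard deviation from deviations about the sample mean.

    ddof=0 divides the sum of squared deviations by n, ddof=1 by n - 1.
    """
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - ddof))

data = [2, 4, 4, 4, 5, 5, 7, 9]          # illustrative values only
print(sd_from_deviations(data), sd_from_deviations(data, ddof=1))

# The two divisors differ by the factor sqrt(n / (n - 1)), which tends to 1.
for n in (10, 100, 1000):
    print(n, round(math.sqrt(n / (n - 1)), 4))
```

For a sample of 1000 items the two versions differ by about one part in two thousand, which is why the simpler divisor is acceptable for large samples.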
As in the chapter on sampling of attributes here also we shall
refer to the standard deviation of the sampling distribution as the standard
error. In actual practice, however, sampling distributions are not available
and we have to make estimates of the parameter or function from
the values of one sample only. In such cases we shall assume the
mean or standard deviation of the sample as the mean or standard
deviation of the universe and then lay down limits within which the
particular parameter or function would be expected to vary. This being
so, the rest of the chapter is devoted to the calculation of the standard
errors of the various parameters which have to be studied.
Standard error of the mean
We know that the standard error of the sampling distribution of
the mean is equal to its standard deviation. In case only one sample mean
is available the standard error of the mean is calculated
by the formula:-
Standard Error of the Mean $= \frac{\sigma(\text{Population})}{\sqrt{n}}$
where n stands for the number of items in the sample.
In case the standard deviation of the population is not known the
standard deviation of the sample can be substituted in its place and then
the formula would be :-
Standard Error of the Mean $= \frac{\sigma(\text{Sample})}{\sqrt{n-1}}$
But if the size of the sample is large, or in other words if n is
large, $\sqrt{n-1}$ can be approximated as $\sqrt{n}$ as it would not make an
appreciable difference and the formula then becomes
Standard Error of the Mean $= \frac{\sigma(\text{Sample})}{\sqrt{n}}$
It should be observed that the above formula is derived without
any reference to the form of the parent distribution or the proportionate
size of the sample and is therefore of general application. The standard
error of the mean can give us two limits within which the parameter
mean is expected to fluctuate. Our supposition here is that the sampling
distribution of the mean would conform to the normal curve and even
when there is one sample, mean ± 3σ would give us the range within
which the mean value is expected to vary.
Suppose in a given case the sample mean has a value of 40 and the
number of items in the sample is 121. Suppose further that the standard
deviation of the sample is 16.5. Then the
Standard Error of the Mean $= \frac{16.5}{\sqrt{121}} = \frac{16.5}{11} = 1.5$

We can now say that the parameter mean is expected to range
within 40 ± (3 × 1.5), or 40 ± 4.5, that is, between 35.5 and 44.5. We
know that in a normal distribution mean ± 3σ covers .9973 of the total
area of the curve. In other words, the chances are .9973 out of one, or
99.73%, that the parameter mean would lie within this range. We can
also put it otherwise: the probability of the parameter mean being
outside these limits is 1 − .9973 or .0027 only. It is obvious that the
probability of the conclusion being wrong is very low. If, however,
we take a narrower range, say mean ± 2σ, the limits would be 40 ± 3, or 37
and 43. Now in a normal distribution mean ± 2σ covers .9545 of the
total area. Here the probability of the parameter mean falling outside
these limits (37 and 43) is 1 − .9545 or .0455. If we take the criterion
mean ± 1.96σ, the area covered within these limits is .9500 of the total
area, or the probability that the parameter mean would be outside this
range is 1 − .9500 or .05. Usually the criterion used is either 1.96σ or
2σ. In the former case (1.96σ) we shall be correct in 95% of cases and in
the latter case (2σ) we shall be correct in our assertion in 95.45% of cases.
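The arithmetic of this illustration can be checked with a short Python sketch. The function names are illustrative only and do not come from the text; the normal-curve areas are taken from `statistics.NormalDist` in the Python standard library, and the large-sample form σ/√n is assumed.

```python
import math
from statistics import NormalDist

def standard_error_of_mean(sample_sd, n):
    """Large-sample standard error of the mean: sd / sqrt(n)."""
    return sample_sd / math.sqrt(n)

def limits(mean, standard_error, k):
    """Lower and upper limits: mean - k * s.e. and mean + k * s.e."""
    return mean - k * standard_error, mean + k * standard_error

def area_within(k):
    """Proportion of a normal curve lying within k standard deviations of its mean."""
    return 2 * NormalDist().cdf(k) - 1

# Illustration from the text: sample mean 40, n = 121, sample s.d. 16.5
se = standard_error_of_mean(16.5, 121)
for k in (3, 2, 1.96):
    low, high = limits(40, se, k)
    print(f"k = {k}: {low:.2f} to {high:.2f}, area within = {area_within(k):.4f}")
```

Changing `k` shows directly how a narrower range is bought at the price of a larger chance of the assertion being wrong.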
The term Sampling Error is sometimes used to indicate the error
at a certain level of significance. Thus in the above illustration, where
the standard error is 1.5, the sampling error at 5% level of significance
would be 1.5 × 1.96 or 2.94. We know that mean ± 1.96σ gives two
limits which cover 95% of the total area of the curve, and the probability
of a value being outside this range is .05 or 5% only. The value 1.96
is the critical value at 5% level of significance. Thus sampling error is
equal to the standard error multiplied by the critical value. The critical
value at 1% level of significance is 2.5758, which means that mean ± 2.5758σ
covers 99% of the area of the curve. It should be noted that
the level of significance is inversely related to the extent of precision.
Thus 5% level of significance indicates less precision than 1% level of
significance. At 5% level of significance we will be accurate in 95% of
the cases and at 1% level of significance we shall be accurate in 99% of
the cases.
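Sometimes the term level of confidence is used in place of level of
significance; strictly, the one is the complement of the other, so that a
5% level of significance corresponds to a 95% level of confidence. The two
limits which are obtained at a certain level of significance (or level of
confidence) are given by mean ± sampling error. These limits are called
confidence limits, and the range between them the confidence interval.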
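The critical values quoted above can be reproduced from the normal curve. The following is a minimal sketch, assuming a two-sided criterion; the function names are illustrative and not taken from the text.

```python
from statistics import NormalDist

def critical_value(level_of_significance):
    """Two-sided critical value of the normal curve, e.g. 0.05 gives about 1.96."""
    return NormalDist().inv_cdf(1 - level_of_significance / 2)

def sampling_error(standard_error, level_of_significance):
    """Sampling error = standard error multiplied by the critical value."""
    return standard_error * critical_value(level_of_significance)

# Standard error of 1.5 from the illustration above
print(round(critical_value(0.05), 4), round(critical_value(0.01), 4))
print(round(sampling_error(1.5, 0.05), 2))
```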
The following examples would illustrate the use of the formula
relating to the standard error of the mean : -
Example 1. To know the mean weight of all 10-year-old boys
in the State of Rajasthan, a sample of 225 is taken. The mean weight
of this sample is found to be 67 pounds with a standard deviation of 12
pounds. Can you draw any inference from it about the mean weight of
the universe ?
Solution
Standard error of the sample mean (weight) is
$$\text{s.e.}_m = \frac{\sigma}{\sqrt{N}} = \frac{12}{\sqrt{225}} = .8 \text{ pound.}$$
Assuming that the conditions of simple sampling hold good the
mean weight of the universe in all probability would be lying within
the limit set by three times this standard error to the mean of the
sample.
So, the mean weight of all 10-year-old boys in the State of Rajasthan
lies between 67 pounds ± 2.4 pounds, that is, between 64.6 pounds and
69.4 pounds.
Example 2. An industry desires to make a survey of the mean
weekly wage of 10,000 of its workers. Since a study of all the
workers is impossible, a representative sample of 400 workers is selected.
The mean weekly wage of the 400 workers is Rs. 30 and the standard
deviation Rs. 2.50. If additional samples were selected, by how much
would the results differ from the above sample ?
Solution
Standard error of the mean weekly wage of the given sample is
$$\text{s.e.}_m = \frac{\sigma}{\sqrt{N}} = \frac{2.50}{\sqrt{400}} = \text{Re. } .125$$
If additional samples were selected, their means would not differ
from the mean weekly wage of this sample by more than three standard
errors of the mean of this sample. That is, the mean weekly wages
of all the additional samples would lie between Rs. 30 ± .375, or Rs.
29.625 and Rs. 30.375.
Example 3. Suppose that it has been determined that the average
pulse rate of males in the 20-25 years age group is 72 beats per minute
and that the standard deviation is 9.5 beats per minute. If a group
of 55 distance runners, all in the given age-group, were examined and
found to have an average pulse rate of 65, should this be regarded
as a significant deviation from the general average?
Solution
Standard error of the average pulse rate of 55 distance runners
is
$$\text{s.e.}_m = \frac{\sigma}{\sqrt{N}} = \frac{9.5}{\sqrt{55}} = 1.28 \text{ beats per minute}$$
The difference between the two average pulse rates is 7 beats per
minute, which is 5.47 times this standard error.
Hence the deviation of the average pulse rate of distance runners
from the general average is significant.
Example 4. Suppose that for tyres of a certain factory, the mean
mileage is 15180 and the standard deviation is 1248 miles.
Suppose, further, that a sample of 900 tyres shows a mean mileage
of 15232 miles with a standard deviation of 1210 miles. Ascertain if
the sample mean represents a significant divergence from the population
mean.
Solution
The standard error of the sample mean is
$$\text{s.e.}_m = \frac{\sigma_p}{\sqrt{N}}$$
where $\sigma_p$ is the standard deviation of the entire population
and N is the number of items in the sample.
Substituting the given values, we get
$$\text{s.e.}_m = \frac{1248}{\sqrt{900}} = 41.6 \text{ miles.}$$
The observed difference between the two means of mileage is
15232 − 15180, or 52 miles. This is only 1.25 times this standard error
and so could have arisen through the fluctuations of simple sampling.
Hence it can be said that the divergence of the sample mean from the
population mean is not significant.
Note. When the standard deviation of the population is given,
the standard deviation of the sample should not be used as the latter
is only a substitute for the former.
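The test applied in Examples 3 and 4 amounts to expressing the observed deviation as a multiple of the standard error. A minimal Python sketch follows; the function names and the default criterion of three standard errors are illustrative assumptions, and the population standard deviation of Example 4 is taken as 1248 miles, the figure used in its solution.

```python
import math

def deviation_in_standard_errors(sample_mean, population_mean, population_sd, n):
    """Observed deviation of a sample mean, expressed in standard errors of the mean."""
    standard_error = population_sd / math.sqrt(n)
    return abs(sample_mean - population_mean) / standard_error

def is_significant(ratio, criterion=3.0):
    """True when the deviation exceeds the chosen number of standard errors."""
    return ratio > criterion

runners = deviation_in_standard_errors(65, 72, 9.5, 55)        # Example 3
tyres = deviation_in_standard_errors(15232, 15180, 1248, 900)  # Example 4
print(round(runners, 2), is_significant(runners))
print(round(tyres, 2), is_significant(tyres))
```

The ratio for the runners comes out marginally below the 5.47 of the text because the sketch does not round the standard error to 1.28 before dividing.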
Example 5. The data concerning height measurement for a random
sample of individuals from a given population are as follows :-
Mean = 172
S. D. = 12
N = 65
If a large number of samples of the same size were selected at random
from the given population, what would be the limits of the 98%
confidence interval (2% level of significance) for the true mean ?
Solution
Standard error of the mean height of the sample is
$$\text{s.e.}_m = \frac{\sigma}{\sqrt{N}} = \frac{12}{\sqrt{65}} = 1.5 \text{ approx.}$$
At 2% level of significance the limits of the variation of the mean
would be 172 ± (1.5 × 2.32) = 172 ± 3.48, that is, 168.52 and 175.48 (98% of the
area of a normal curve is included within M ± 2.32σ).
Standard error of the median, quartiles, deciles, etc.
In a normal distribution the values of mean and median coincide,
but the standard error of the mean is to the standard error of the
median roughly as 100 is to 125. It is for this
reason that we have stated in the chapter on Measures of Central
Tendency that the median is in general more affected by fluctuations of
sampling than the mean.
The standard error of the median is
$$\text{s.e.}_{mdn} = 1.25331 \frac{\sigma}{\sqrt{n}}$$
The standard error of the quartiles (both first and third) is given
by the formula:
$$\text{s.e.}_q = 1.36263 \frac{\sigma}{\sqrt{n}}$$
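The standard errors of the various deciles are as follows :-
Standard error of
First and Ninth decile          $1.70942\,\sigma/\sqrt{n}$
Second and Eighth decile        $1.42877\,\sigma/\sqrt{n}$
Third and Seventh decile        $1.31800\,\sigma/\sqrt{n}$
Fourth and Sixth decile         $1.26804\,\sigma/\sqrt{n}$
Fifth decile or median          $1.25331\,\sigma/\sqrt{n}$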
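The multipliers for the median, quartiles and deciles need not be memorised. Under the assumption of a normal universe and a large sample, the standard error of the value below which a proportion p of the items lies is $\sqrt{p(1-p)}$ divided by the ordinate of the standard normal curve at that point, times $\sigma/\sqrt{n}$. This general formula is not stated in the text and is brought in here only as a check on the constants; the function names are illustrative.

```python
import math
from statistics import NormalDist

def quantile_se_coefficient(p):
    """Multiplier c in s.e. = c * sigma / sqrt(n) for the p-th quantile
    of a normal universe (large-sample approximation)."""
    standard_normal = NormalDist()
    z = standard_normal.inv_cdf(p)                 # position of the quantile
    return math.sqrt(p * (1 - p)) / standard_normal.pdf(z)

def quantile_standard_error(p, sigma, n):
    """Standard error of the p-th quantile for universe s.d. sigma and sample size n."""
    return quantile_se_coefficient(p) * sigma / math.sqrt(n)

for label, p in [("median", 0.5), ("quartiles", 0.25), ("1st/9th decile", 0.1),
                 ("2nd/8th decile", 0.2), ("3rd/7th decile", 0.3), ("4th/6th decile", 0.4)]:
    print(f"{label}: {quantile_se_coefficient(p):.5f}")
```

Because the normal curve is symmetrical, p and 1 − p give the same multiplier, which is why the first and ninth deciles (and so on) share a formula.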


The following examples would illustrate the above rules : -


Example 6. Suppose it has been found out that the median
weight of the soldiers of Rajput Regiment is […]4.7 lbs. If the number
of the soldiers is 7750 and the standard deviation of the weight is 21.3
lbs., find out the standard error of the median weight.
Solution
Standard error of the median weight
$$= 1.25331 \frac{\sigma}{\sqrt{N}}$$
Substituting the values for σ and N, we get
$$\text{s.e.}_{mdn} = \frac{1.25331 \times 21.3}{\sqrt{7750}} = .303 \text{ lbs.}$$
Thus, the standard error of the median weight is .303 lbs.
Example 7. Suppose it has been found out that the first quartile
height of 580 post-graduate students of a university is 65.71 inches and
the third quartile height for the same is 69.21 inches with a standard
deviation of 4.56 inches. Find out the standard errors of the two
quartiles.
Solution
The standard errors of the first and the third quartiles are given
by the formula
$$\text{s.e.}_q = 1.36263 \frac{\sigma}{\sqrt{n}}$$
Substituting the values of σ and N, we get
$$\text{s.e.}_q = \frac{1.36263 \times 4.56}{\sqrt{580}}$$
= Antilog [(log 1.36263 + log 4.56) − ½ (log 580)]
= Antilog [(.1344 + .6590) − ½ (2.7634)]
= Antilog [.7934 − 1.3817]
= Antilog $\bar{1}.4117$
= .258
Thus, the standard error of the quartiles is .258 inch.
Example 8. Suppose in U. S. A., 2,50,000 marriages were contracted
in a certain year in which the ages of the bridegrooms varied
from 10 years to 80 years. If this distribution has got a standard
deviation of 8 years, what would be the standard error of the 1st,
2nd, 3rd, etc., decile ages of the bridegrooms if they are calculated
from the given data ?
Solution
The standard errors of the 1st and the 9th deciles are given by
the formula
$$\text{s.e.}_{\text{1st or 9th decile}} = 1.70942 \frac{\sigma}{\sqrt{N}}$$
Substituting the values of σ and N, we get for the age of bridegrooms
$$\text{s.e.}_{\text{1st and 9th decile}} = \frac{1.70942 \times 8}{\sqrt{250000}} = .027 \text{ year.}$$
The standard errors of the 2nd and the 8th deciles are given by
the formula
$$\text{s.e.}_{\text{2nd and 8th decile}} = 1.42877 \frac{\sigma}{\sqrt{N}}$$
Substituting the values of σ and N, we get for the age of bridegrooms
$$\text{s.e.}_{\text{2nd and 8th decile}} = \frac{1.42877 \times 8}{\sqrt{250000}} = .023 \text{ year.}$$
The standard errors of the 3rd and the 7th deciles are given by
the formula
$$\text{s.e.}_{\text{3rd and 7th decile}} = 1.31800 \frac{\sigma}{\sqrt{N}}$$
Substituting the values of σ and N, we get for the age of bridegrooms
$$\text{s.e.}_{\text{3rd and 7th decile}} = \frac{1.31800 \times 8}{\sqrt{250000}} = .021 \text{ year.}$$
The standard errors of the 4th and the 6th deciles are given by
the formula
$$\text{s.e.}_{\text{4th and 6th decile}} = 1.26804 \frac{\sigma}{\sqrt{N}}$$
Substituting the values of σ and N, we get for the age of bridegrooms
$$\text{s.e.}_{\text{4th and 6th decile}} = \frac{1.26804 \times 8}{\sqrt{250000}} = .020 \text{ year.}$$
The standard error of the 5th decile is the same as that of the median
and is given by the formula
$$\text{s.e.}_{mdn} = 1.25331 \frac{\sigma}{\sqrt{N}}$$
Substituting the values we get,
$$\text{s.e.}_{mdn} = \frac{1.25331 \times 8}{\sqrt{250000}} = .020 \text{ year.}$$

Standard error of various measures of dispersion and skewness


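Standard error of the mean deviation:
$$\text{s.e.}_{m.d.} = .6028 \frac{\sigma}{\sqrt{N}}$$
Standard error of the quartile deviation:
$$\text{s.e.}_{q.d.} = .78672 \frac{\sigma}{\sqrt{N}}$$
Standard error of the standard deviation:
$$\text{s.e.}_{\sigma} = \frac{\sigma}{\sqrt{2n}}$$
Standard error of the variance:
$$\text{s.e.}_{\sigma^2} = \sigma^2 \sqrt{\frac{2}{n}}$$
Standard error of the coefficient of variation (V):
$$\text{s.e.}_{V} = \frac{V}{\sqrt{2n}} \sqrt{1 + \frac{2V^2}{10^4}}$$
Standard error of Karl Pearson's coefficient of skewness:
$$\text{s.e. of the coefficient of skewness} = \sqrt{\frac{3}{2n}}$$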
Some of the above formulae are illustrated below in the following
examples :-
Example 9. For the height distribution of 7575 females, the
standard deviation is found to be 2.46 inches. Within what limits may this
standard deviation be taken to be correct ?
Solution
Taking the universe to be normal, the standard error of the standard
deviation of the height is
$$\text{s.e.}_{\sigma} = \frac{\sigma}{\sqrt{2N}} = \frac{2.46}{\sqrt{2 \times 7575}} = 0.02 \text{ approx.}$$
Hence, assuming that the sampling is simple, we may say that the S. D.
in the universe almost certainly lies in the range 2.46 ± 0.06 (i.e., Standard
Deviation ± 3 s.e.σ).
Example 10. To find out the average age of bridegrooms in
U. S. A. 202,851 marriage cases were studied. The average age of
bridegrooms for this sample was found to have a standard deviation of
2.6556. Estimate the standard deviation for the average age in the
population and assign limits within which this will probably lie.
Solution
Assuming the parent distribution for the age of bridegrooms to be
normal, the standard error of the standard deviation of the sample is
$$\text{s.e.}_{\sigma} = \frac{\sigma}{\sqrt{2N}} = \frac{2.6556}{\sqrt{2 \times 202851}} = .0042 \text{ approx.}$$
If the sampling is simple, the standard deviation of the universe
cannot deviate from the standard deviation of the sample by more
than three times the standard error of the standard deviation of the
sample. Hence, we may say that the S. D. in the universe almost certainly
lies in the range 2.6556 ± 0.0126, that is, between 2.6430 and 2.6682.
Example 11. What would be the standard error of the variance if
in a series the number of items is 100 and the standard deviation is 5 ?
Solution
Variance = σ²
In this problem
Variance = 5² = 25
The standard error of variance is
$$\text{s.e.}_{\sigma^2} = \sigma^2 \sqrt{\frac{2}{n}}$$
Substituting the values we get
$$\text{s.e.}_{\sigma^2} = 25 \sqrt{\frac{2}{100}} = 3.535$$
Thus the standard error of the variance in the problem is 3.535.
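The two formulae used in Examples 9 and 11 are simple enough to check by machine. The sketch below assumes a normal universe, as the examples do; the function names are illustrative.

```python
import math

def se_standard_deviation(sd, n):
    """Standard error of a standard deviation: sd / sqrt(2n)."""
    return sd / math.sqrt(2 * n)

def se_variance(sd, n):
    """Standard error of a variance: sd^2 * sqrt(2 / n)."""
    return sd ** 2 * math.sqrt(2 / n)

se_height_sd = se_standard_deviation(2.46, 7575)   # Example 9
se_var = se_variance(5, 100)                       # Example 11
print(round(se_height_sd, 4), round(se_var, 3))
print(2.46 - 3 * se_height_sd, 2.46 + 3 * se_height_sd)  # limits at three standard errors
```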
Example 12. The following is the frequency table of Six Months
Prime Commercial Paper Rates, January, 1931 to December, 1941.
Class Interval          Frequencies
2.50 to 2.99%            2
3.00 to 3.49             7
3.50 to 3.99            20
4.00 to 4.49            30
4.50 to 4.99            20
5.00 to 5.49            10
5.50 to 5.99             5
6.00 to 6.49             6
Calculate the standard error of the coefficient of variation for the
above table.

Solution
Calculation of the coefficient of variation.
Size of item     Mid-value  Frequency  Deviation from    Square of   Frequency ×  Frequency ×
                                       the assumed       deviation   deviation    square of
                                       average (4.25)                             deviation
2.50 to 2.99%    2.75%       2         −1.50             2.25        −3.0          4.50
3.00 to 3.49     3.25        7         −1.00             1.00        −7.0          7.00
3.50 to 3.99     3.75       20         −.50               .25        −10.0         5.00
4.00 to 4.49     4.25       30          0                 0            0           0.00
4.50 to 4.99     4.75       20         +.50               .25        +10.0         5.00
5.00 to 5.49     5.25       10         +1.00             1.00        +10.0        10.00
5.50 to 5.99     5.75        5         +1.50             2.25        +7.5         11.25
6.00 to 6.49     6.25        6         +2.00             4.00        +12.0        24.00
                          n = 100                                    Σfdx =       Σfdx² =
                                                                     +19.5        66.75
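Arithmetic average for the distribution (x being the assumed average):
$$a = x + \frac{\Sigma f d_x}{n} = 4.25 + \frac{19.5}{100} = 4.4 \text{ (approx.)}$$
Standard deviation:
$$\sigma = \sqrt{\frac{\Sigma f d_x^2}{n} - \left(\frac{\Sigma f d_x}{n}\right)^2} = \sqrt{\frac{66.75}{100} - \left(\frac{19.5}{100}\right)^2} = .79$$
Coefficient of variation is
$$V = \frac{\sigma \times 100}{a} = \frac{.79 \times 100}{4.4} = 17.95$$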
The standard error of the coefficient of variation is given by the
formula
$$\text{s.e.}_{V} = \frac{V}{\sqrt{2n}} \sqrt{1 + \frac{2V^2}{10^4}}$$
Substituting the values of V and n in the formula,
$$\text{s.e.}_{V} = \frac{17.95}{\sqrt{2 \times 100}} \sqrt{1 + \frac{2 \times (17.95)^2}{10^4}} = 1.3$$
Thus, the standard error of the coefficient of variation for the
given distribution is 1.3.
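The whole of Example 12 can be worked through in a few lines of Python. This is only a sketch: the frequencies are those read from the table of the example (2, 7, 20, 30, 20, 10, 5, 6), the mid-values stand for the class intervals, and the function names are illustrative.

```python
import math

midpoints = [2.75, 3.25, 3.75, 4.25, 4.75, 5.25, 5.75, 6.25]
frequencies = [2, 7, 20, 30, 20, 10, 5, 6]

def grouped_mean_and_sd(midpoints, frequencies):
    """Number of items, mean and standard deviation of a grouped distribution."""
    n = sum(frequencies)
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)) / n
    return n, mean, math.sqrt(variance)

def se_coefficient_of_variation(cv_percent, n):
    """s.e. of V = V / sqrt(2n) * sqrt(1 + 2 V^2 / 10^4), with V in per cent."""
    return cv_percent / math.sqrt(2 * n) * math.sqrt(1 + 2 * cv_percent ** 2 / 10 ** 4)

n, mean, sd = grouped_mean_and_sd(midpoints, frequencies)
cv = 100 * sd / mean
se_cv = se_coefficient_of_variation(cv, n)
print(n, round(mean, 3), round(sd, 4), round(cv, 2), round(se_cv, 2))
```

Since the sketch carries the mean as 4.445 rather than the rounded 4.4, its coefficient of variation comes out a little below 17.95; the standard error still rounds to 1.3.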
Standard error of coefficient of correlation, regression and association
The standard error of the coefficient of correlation:
$$\text{s.e.}_{r} = \frac{1 - r^2}{\sqrt{n}}$$
The standard error of the regression coefficient:
$$\text{s.e.}_{b} = \frac{\sigma_x \sqrt{1 - r^2}}{\sigma_y \sqrt{n}}$$
The standard error of the regression estimate (y on x):
$$\text{s.e.}_{yx} = \sigma_y \sqrt{1 - r^2}$$
The standard error of the regression estimate (x on y):
$$\text{s.e.}_{xy} = \sigma_x \sqrt{1 - r^2}$$
The standard error of the coefficient of association:
$$\text{s.e.}_{Q} = \frac{1 - Q^2}{2} \sqrt{\frac{1}{(AB)} + \frac{1}{(A\beta)} + \frac{1}{(\alpha B)} + \frac{1}{(\alpha\beta)}}$$
The following examples illustrate some of the above formulae :-
Example 13. To study the correlation between the stature of the
father and the stature of the son, a sample of 900 is taken from the
universe of fathers and sons. The sample study gives the correlation
between the two to be +.67. Within what limits does it hold true for
the universe ?
Solution
The standard error of the correlation coefficient between the stature
of the father and the stature of the son is
$$\text{s.e.}_{r} = \frac{1 - r^2}{\sqrt{N}} = \frac{1 - (0.67)^2}{\sqrt{900}} = 0.018$$
If the sampling was simple, the correlation in the universe cannot
deviate from the correlation in the sample by more than thrice this
standard error, i.e., .054. Hence, the correlation in the universe most
probably lies within .67 ± .054, that is, between .616 and .724.
Example 14. To find out the correlation between the yield of
milk per week and percentage of butter fat, a sample study of 12250
buffaloes is made. This gives the value of r to be −.082. Find out the
standard error of the estimate.
Solution
The standard error of the calculated r in the sample is
$$\text{s.e.}_{r} = \frac{1 - r^2}{\sqrt{N}} = \frac{1 - (-0.082)^2}{\sqrt{12250}} = 0.0089$$
The correlation observed is nine times its standard error and,
small though it is, could not have arisen from sampling fluctuations.
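Examples 13 and 14 both rest on the same one-line formula, which may be sketched in Python as follows. The function name is illustrative, and the sample size of Example 14 is taken as 12250.

```python
import math

def se_correlation(r, n):
    """Large-sample standard error of a correlation coefficient: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

se_stature = se_correlation(0.67, 900)     # Example 13
se_milk = se_correlation(-0.082, 12250)    # Example 14
print(round(se_stature, 4), round(se_milk, 4))
print(round(abs(-0.082) / se_milk, 1))     # the observed r in units of its standard error
```

The second example shows why a large sample matters: a correlation as small as .082 is still some nine times its standard error when N runs into thousands.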
Example 15. Find out the regression coefficient of Hapur prices
over Karachi prices from the following data and also calculate its
standard error, assuming n to be 900.
                                        Hapur     Karachi
                                        Rs.       Rs.
Av. price per maund of wheat            18        12
Standard deviation                       3         2
r between prices at Hapur and Karachi = +.67
Solution
Regression coefficient of Hapur prices (x) over Karachi prices (y):
$$b_{xy} = r \frac{\sigma_x}{\sigma_y} = .67 \times \frac{3}{2} = 1.005$$
The standard error of this regression coefficient is
$$\text{s.e.}_{b} = \frac{\sigma_x \sqrt{1 - r^2}}{\sigma_y \sqrt{n}} = \frac{3 \sqrt{1 - (.67)^2}}{2 \sqrt{900}} = .037$$
Thus, the regression coefficient of Hapur prices over Karachi
prices is 1.005 and its standard error is .037.
Example 16. For a given group of adults, the coefficient of correlation
between height and weight is .6; standard deviations of height
and of weight are 3 inches and 12 lbs. respectively and the means in
height and weight for the entire group are 69 inches and 145 lbs.
respectively. Find out the best estimate of the weight of an individual
who is 72 inches tall. Assign limits to this estimate in which in all
probability his actual weight would be lying.
Solution
The regression equation of weight (Y) on height (X) is
$$(Y - A_y) = r \frac{\sigma_y}{\sigma_x} (X - A_x)$$
or
$$Y = r \frac{\sigma_y}{\sigma_x} (X - A_x) + A_y$$
Substituting the given values, we get
$$Y = \frac{.6 \times 12}{3} (X - 69) + 145 = 2.4X - 165.6 + 145 = 2.4X - 20.6$$
Estimating the weight of an individual whose height is 72
inches, we get
$$Y_{72} = 2.4 \times 72 - 20.6 = 152.2 \text{ lbs.}$$
The standard error of this estimate is
$$\text{s.e.}_{yx} = \sigma_y \sqrt{1 - r^2} = 12 \sqrt{1 - (.6)^2} = 9.6 \text{ lbs.}$$
Thus, the best estimate of the weight of an individual who is
72 inches tall is 152.2 lbs., and this estimate is not expected to deviate from his
actual weight by more than three times this standard error. Hence in
all probability his actual weight would be lying between 152.2 lbs. ±
28.8 lbs., that is, between 123.4 lbs. and 181.0 lbs.
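The estimate and its limits in Example 16 can be reproduced with the following sketch. The function names are illustrative; the criterion of three standard errors is the one used in the example.

```python
import math

def regression_estimate(x, mean_x, mean_y, sd_x, sd_y, r):
    """Best estimate of Y for a given X from the regression of Y on X."""
    return mean_y + r * sd_y / sd_x * (x - mean_x)

def se_regression_estimate(sd_y, r):
    """Standard error of the estimate of Y from X: sd_y * sqrt(1 - r^2)."""
    return sd_y * math.sqrt(1 - r ** 2)

# Heights in inches, weights in lbs., as given in Example 16
weight = regression_estimate(72, mean_x=69, mean_y=145, sd_x=3, sd_y=12, r=0.6)
se = se_regression_estimate(12, 0.6)
print(round(weight, 1), round(se, 1))
print(round(weight - 3 * se, 1), round(weight + 3 * se, 1))
```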
Example 17. Two groups of children, one belonging to the
professional class, 125 in number, and the other belonging to the
labouring class, 124 in number, are compared and the following
results are obtained :-
                          Poor children    Well-to-do children
                          per cent         per cent
Below normal weight       55               13
Above normal weight       11               48
Find the coefficient of association between the weight of the
children and their social status. Also calculate the standard error of
this coefficient of association.
Solution
Denoting poor children by A, well-to-do children by α, below
normal weight by B, and above normal weight by β,
we get,
(AB) = 55    (αB) = 13
(Aβ) = 11    (αβ) = 48
Substituting these values in the formula
$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$
where Q represents the coefficient of association,
we get,
$$Q = \frac{(55 \times 48) - (11 \times 13)}{(55 \times 48) + (11 \times 13)} = \frac{2640 - 143}{2640 + 143} = +.89$$
The standard error of this coefficient of association is
$$\text{s.e.}_{Q} = \frac{1 - Q^2}{2} \sqrt{\frac{1}{(AB)} + \frac{1}{(A\beta)} + \frac{1}{(\alpha B)} + \frac{1}{(\alpha\beta)}}$$
$$= \frac{1 - (.89)^2}{2} \sqrt{\frac{1}{55} + \frac{1}{11} + \frac{1}{13} + \frac{1}{48}} = .047 \text{ (approx.)}$$
Thus the coefficient of association between the weight of the
children and their social status is +.89 and its standard error is about .047.
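The coefficient of association and its standard error depend only on the four cell frequencies, so the working of Example 17 can be checked as below. The function names are illustrative; the cell values 55, 11, 13 and 48 are those of the example, and the standard error is evaluated at Q = .89, the rounded value used in the solution.

```python
import math

def coefficient_of_association(ab, a_beta, alpha_b, alpha_beta):
    """Q = [(AB)(alpha beta) - (A beta)(alpha B)] / [(AB)(alpha beta) + (A beta)(alpha B)]."""
    agree = ab * alpha_beta
    disagree = a_beta * alpha_b
    return (agree - disagree) / (agree + disagree)

def se_coefficient_of_association(q, ab, a_beta, alpha_b, alpha_beta):
    """s.e. of Q = (1 - Q^2) / 2 * sqrt of the sum of reciprocals of the four cells."""
    return (1 - q ** 2) / 2 * math.sqrt(1 / ab + 1 / a_beta + 1 / alpha_b + 1 / alpha_beta)

q = coefficient_of_association(55, 11, 13, 48)
se_q = se_coefficient_of_association(0.89, 55, 11, 13, 48)
print(round(q, 3), round(se_q, 3))
```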

Up till now we have discussed the various formulae for the calculation
of the standard error of a measure calculated from a sample
and on its basis we have tried to lay down the limits within which the
parameter value of that particular measure is expected to vary. We may,
many times, come across problems of a slightly different type where the
results of two samples may be before us and we may then have to test
whether there is a significant difference between the results of the two
samples and to find out whether two such samples could have come
from the same universe. We shall discuss below some problems of
this type.
Standard error of the difference of sample means
Suppose two samples have given us two mean values and we have
to find out whether there is a significant difference between the two
values, or whether they could have come from one universe or from
universes having the same mean and standard deviation. Here we shall
calculate the standard error of the difference of the two sample means
and then we shall find out whether the difference between them is more
than, say, three times the standard error of the difference. If the actual
difference between the two means is more than thrice the standard error
of the difference it is said to be significant; otherwise the difference can
be due to fluctuations of sampling.

There is no hard and fast rule of taking the criterion of three
standard errors only. As has been said earlier, mean ± 3 standard errors
would cover 99.73% of the cases, and the probability of a difference
equal to or more than three standard errors arising out of chance would
only be 1 − .9973 or .0027. We can test the results at 1.96 standard
errors, in which case, as we have seen earlier, the probability of getting a
difference equal to or more than 1.96 standard errors due to sampling
fluctuations would be .05.
The formulae for the standard error of the difference of two
sample means are as follows :-
(1) Where the standard deviation of the universe is not known,
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
(2) Where the standard deviation of the population is known,
$$\sqrt{\sigma_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
(3) Where the mean of sample number 1 is compared with the combined
mean of the two samples,
$$\sqrt{\sigma_p^2 \times \frac{n_2}{n_1 (n_1 + n_2)}}$$
In all the above cases we presume that simple sampling conditions
hold good. If, however, the two samples are from such universes
between which there is a correlation, the standard error of the difference
of sample means would be
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - 2r \frac{\sigma_1}{\sqrt{n_1}} \times \frac{\sigma_2}{\sqrt{n_2}}}$$

The following examples would illustrate the above formulae :-
Example 18. From the data given below, find out whether there
is a real difference in favour of the soldiers of Scotch extraction in
matter of weight, or is the difference so slight that it may be attributable
to chance?
                                   N        Mean              σ
Soldiers of Scotch extraction      1821     144.93 pounds     17.41 pounds
Soldiers of French extraction       746     142.16 pounds     16.04 pounds
Solution
Since the two samples are drawn quite independently from different
universes, the standard error of the difference of the two mean
weights is
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Substituting the given values we get
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\frac{(17.41)^2}{1821} + \frac{(16.04)^2}{746}} = .715$$
The observed difference between the mean weights of soldiers of
Scotch extraction and French extraction is 2.77 lbs., which is more than
three times this standard error. Hence the difference in weight is a
real one in favour of Scotch extraction.
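The three formulae for independent samples can be set down as small Python functions, shown here with the figures of Example 18. This is a sketch only; the function names are illustrative and not taken from the text.

```python
import math

def se_diff_means_sd_unknown(sd1, n1, sd2, n2):
    """Formula (1): each sample's own standard deviation is used."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def se_diff_means_sd_known(sd_population, n1, n2):
    """Formula (2): the population standard deviation is known."""
    return math.sqrt(sd_population ** 2 * (1 / n1 + 1 / n2))

def se_mean_vs_combined_mean(sd_population, n1, n2):
    """Formula (3): mean of sample 1 compared with the combined mean of both samples."""
    return math.sqrt(sd_population ** 2 * n2 / (n1 * (n1 + n2)))

# Example 18: Scotch and French soldiers
se = se_diff_means_sd_unknown(17.41, 1821, 16.04, 746)
difference = 144.93 - 142.16
print(round(se, 3), round(difference / se, 2))
```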
Example 19. A random sample of 200 villages was taken from
Gorakhpur district and the average population per village was found
to be 485 with a standard deviation of 50. Another random sample of
200 villages from the same district gave an average population of 510
per village with a standard deviation of 40. Is the difference between
the averages of the two samples statistically significant? Give reasons.
Solution
Supposing the two samples have been drawn quite independently,
the standard error of the difference of the two means is
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Substituting the given values, we get,
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\frac{(50)^2}{200} + \frac{(40)^2}{200}} = \sqrt{20.5} = 4.53$$
The observed difference of the two average populations per
village is 510 − 485, or 25, which is thus 5.5 times this standard error
and so could not have arisen through fluctuations of simple sampling.
Hence, the difference between the averages of the two samples is
statistically significant.
Example 20. If 60 new entrants in a given university are found
to have a mean height of 68.60 inches, and 50 seniors a mean height
of 69.51 inches, is the evidence conclusive that the mean height of the
seniors is greater than that of the new entrants? Assume the standard
deviation of height to be 2.48 inches.
Solution
Since the two samples are independent and come from the same
universe under simple sampling conditions, the standard error of the
difference of two mean heights is
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\sigma_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Substituting the given values, we get,
$$\text{s.e.}_{m_1 - m_2} = \sqrt{(2.48)^2 \left(\frac{1}{60} + \frac{1}{50}\right)} = .47$$
The observed difference of the two mean heights is .91 inches,
which is only two times this standard error and so could have arisen
through the fluctuations of simple sampling. Hence it cannot be
said that the mean height of the seniors is greater than that of the new
entrants.
Example 21. The mean produce of wheat of a sample of 100
fields comes to 200 lbs. per acre with a standard deviation of 10 lbs.
Another sample of 150 fields gives the mean at 220 lbs. with a standard
deviation of 12 lbs. Assuming the standard deviation of the yield at
11 lbs. for the universe, find out if there is a significant difference between
the mean yields of the two samples.
Solution
Supposing the samples are independent and come from the same
universe, the standard error of the difference of the mean yields would
be :
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\sigma_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Substituting the given values, we get,
$$= \sqrt{(11)^2 \left(\frac{1}{100} + \frac{1}{150}\right)} = 1.42$$
The observed difference between the two means is 20 lbs., which
is more than thrice the standard error of the difference of means. Hence
the difference is significant and could not have arisen due to fluctuation
of sampling.
Example 22. A random sample of 100 villages in a district gives
the mean population of 500 persons per village. Another sample of
150 villages from the same district gives the mean at 504. If the
standard deviation of the population of villages in that district
is 20, find out if the mean of the first sample is significantly different
from the combined mean of the two samples taken together.
Solution
The combined mean of the two samples
$$= \frac{(100 \times 500) + (150 \times 504)}{100 + 150} = 502.4$$
The difference between the first sample mean and the combined
mean thus is (502.4 − 500)
= 2.4
The standard error of the difference of the first sample mean and
the combined mean is
$$= \sqrt{\sigma_p^2 \times \frac{n_2}{n_1 (n_1 + n_2)}}$$
Substituting the values we get
$$= \sqrt{(20)^2 \times \frac{150}{100 (100 + 150)}} = \sqrt{2.4} = 1.549$$
The observed difference is less than thrice the standard error of
the difference and as such it could have arisen due to fluctuation of
sampling. Hence, the difference is not significant.
Example 23. In an intelligence test administered to 60 fathers and
their 100 children, the following results were obtained.
Fathers: mean score 114; S.D. = 13
Sons: mean score 110; S.D. = 11
Assuming the r between the two to be +.75, calculate the standard
error of the difference of the two means and state whether the difference
is significant.
Solution
Since the two samples are related, the standard error of the difference
of their mean scores is
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - 2r \frac{\sigma_1}{\sqrt{n_1}} \times \frac{\sigma_2}{\sqrt{n_2}}}$$
Substituting the values, we get,
$$\text{s.e.}_{m_1 - m_2} = \sqrt{\frac{(13)^2}{60} + \frac{(11)^2}{100} - 2 \times .75 \times \frac{13}{\sqrt{60}} \times \frac{11}{\sqrt{100}}} = \sqrt{1.2575} = 1.12$$
Thus the standard error of the difference of the two mean scores
is 1.12. The observed difference is 114 − 110, or 4, which is more than
three times this standard error and so could not have arisen through the
fluctuations of simple sampling. Hence, the difference is significant.
Standard error of the difference between two sample medians
The standard error of the difference of two sample medians is
obtained by the formula
$$\text{s.e.}_{mdn_1 - mdn_2} = \sqrt{\text{s.e.}_{mdn_1}^2 + \text{s.e.}_{mdn_2}^2}$$
The following example illustrates the formula :-
Example 24. Two samples of 100 and 80 students are taken
with a view to find out their average monthly expenditure. It is found
out that the median monthly expenditure for the first group is Rs. 85
and for the second group is Rs. 100.
The standard deviation for the first group is Rs. 7 and for the
second, Rs. 8.
Examine if the difference between the median monthly expenditure
of the two samples is statistically significant.
Solution
Assuming that the conditions of simple sampling hold true, the
standard error of the median monthly expenditure is
$$\text{s.e.}_{mdn} = 1.25331 \frac{\sigma}{\sqrt{N}}$$
Substituting the values, we get,
$$\text{s.e.}_{mdn} \text{ for the 1st sample} = \frac{1.25331 \times 7}{\sqrt{100}} = .877$$
and
$$\text{s.e.}_{mdn} \text{ for the 2nd sample} = \frac{1.25331 \times 8}{\sqrt{80}} = 1.121$$
Supposing that the two samples have been drawn quite independently,
the standard error of the difference of their median monthly
expenditure is
$$\text{s.e.}_{mdn_1 - mdn_2} = \sqrt{\text{s.e.}_{mdn_1}^2 + \text{s.e.}_{mdn_2}^2}$$
Substituting the calculated values, we get
$$\text{s.e.}_{mdn_1 - mdn_2} = \sqrt{(.877)^2 + (1.121)^2} = \sqrt{2.0258} = 1.42$$
The observed difference of the median monthly expenditure of
the two samples is Rs. 100 − 85 or Rs. 15, which is more than ten times
this standard error and so could never have arisen through the fluctuations
of sampling.
Hence, the difference of the median monthly expenditure of the
two samples is statistically significant.
Standard error of the difference between two sample standard deviations
The standard error of the difference between two standard deviations
is calculated by the following formulae :
(1) Where population standard deviation is not known,
$$\sqrt{\frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}}$$
(2) Where population standard deviation is known,
$$\sqrt{\frac{\sigma_p^2}{2} \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
(3) Where standard deviation of sample 1 is compared with the
combined standard deviation of the two samples,
$$\sqrt{\frac{\sigma_p^2}{2} \times \frac{n_2}{n_1 (n_1 + n_2)}}$$
The following examples would illustrate the above formulae.
Example 25. In a sample of 1000 the mean is found to be 17.5
and the standard deviation 2.5. In another sample of 800 the mean is
18 and the standard deviation 2.7. Assuming that the samples are
independent, discuss whether the two samples could have come from
universes which have the same standard deviation.
Solution
Supposing that the samples are drawn independently, the standard
error of the difference of their standard deviations is
$$\text{s.e.}_{\sigma_1 - \sigma_2} = \sqrt{\frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}}$$
Substituting the values, we get
$$= \sqrt{\frac{(2.5)^2}{2 \times 1000} + \frac{(2.7)^2}{2 \times 800}} = .088$$
The observed difference of the standard deviations of the two
samples is 0.2, which is less than three times this standard error and
so might have arisen as sampling fluctuation. Hence, the two samples
may have come from universes which have the same standard deviation.
Example 26. For two groups of left-handed and right-handed
students it was found that no significant difference existed between
the two mean I. Qs. For left-handed students, σ1 was 15.2, while
for right-handed students, σ2 was 15.5. Is the difference between these
measures significant? It will be remembered that N1 = N2 = 68.
Solution
Supposing that the samples are drawn independently, the standard
error of the difference of their standard deviations is
$$\text{s.e.}_{\sigma_1 - \sigma_2} = \sqrt{\frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}}$$
Substituting the given values, we get
$$\text{s.e.}_{\sigma_1 - \sigma_2} = \sqrt{\frac{(15.2)^2}{2 \times 68} + \frac{(15.5)^2}{2 \times 68}} = 1.86$$
The observed difference between the standard deviations of the
two samples is 15.5 − 15.2 = 0.3. This is only .16 times the standard
error and so might have arisen as sampling fluctuations. Hence, the
difference between these measures cannot be taken as significant.
Exam}/' Z7. The standard deviation of the height of 100
M. Com. students is 3' and of another zoo M. A. students is 4". Test
the significance of the difference of the standard deviations on the
assumption that the standard deviation of the height of post-graduate
students is 3.f·.
FUNDAMENTALS OF STATISTICS

follllio"
Supposing that the two samples are independent and have come
from a universe which is normal, the standard error of the difference
of the standard deviations is

s.e.al-aa=
J up?
- -(
~
1 I) ==;,
-+-
"1 "2
J
(H)2 ( I
--
~
-+--
100
I )
~oo

= j 6.U'X3
10 0 = V.o9 18n

==·3°3
The observed difference in the two standard deviations is I
which is slightly more than thrice the standard ertor of the diffcrc~e
of deviations, and hence the difference is signi6cant.
Example 28. The standard deviation of the weights of 746
United States soldiers of French extraction was shown to be 16.03
pounds. At the time of demobilization weight measurements were
made not only of the United States soldiers of French extraction, but
of 80,000 soldiers of all types. For the entire group the value of $\sigma_p$
was 17.06 pounds. Does the observed $\sigma$ for the French differ significantly
from this value?
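Solution
Supposing that the distribution is normal, the standard error of
the difference of the standard deviations of weight of the soldiers of
French extraction and of all types is

$$\text{s.e.}_{\sigma_p-\sigma_1}=\sqrt{\frac{\sigma_p^{2}}{2}\times\frac{n_2}{n_1(n_1+n_2)}}$$

where $n_1 = 746$ is the number of soldiers of French extraction and
$n_2 = 79254$ is the number of the remaining soldiers.
Substituting the values, we get

$$\text{s.e.}_{\sigma_p-\sigma_1}=\sqrt{\frac{(17.06)^{2}}{2}\times\frac{79254}{746\,(746+79254)}}=.44$$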
The observed difference of the standard deviations is 1.03 pounds,
which is only 2.3 times this standard error and so can be attributed to
the fluctuations of sampling. Hence, the data do not reveal a significant
difference between the two standard deviations.
Example 29. The standard deviation of the wages of 400 textile
workers is Rs. 12.8; another sample of 600 textile workers gives the
standard deviation at Rs. 14.7. Find out if the standard deviation of
the first sample significantly differs from the combined standard deviation
of the two samples, which is Rs. 14.
Solution
Supposing that the distribution is normal, the standard error of
the difference of the standard deviation of the first sample and the
combined standard deviation of the two samples is

$$\text{s.e.}_{\sigma_p-\sigma_1}=\sqrt{\frac{\sigma_p^{2}}{2}\times\frac{n_2}{n_1(n_1+n_2)}}$$

Substituting the values, we get

$$\text{s.e.}_{\sigma_p-\sigma_1}=\sqrt{\frac{(14)^{2}}{2}\times\frac{600}{400\,(400+600)}}=\sqrt{\frac{98\times 600}{400\times 1000}}=\sqrt{.147}=.38$$
The observed difference between the standard deviation of the
first sample and the combined standard deviation is 1.2, which is more
than thrice the standard error. Hence the difference is significant.
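Examples 28 and 29 both compare the standard deviation of one sample with that of the combined group. A minimal sketch of that calculation is given below; the function name is an assumption made for illustration, and n2 is taken to be the number of items outside the first sample.

```python
import math

def se_sample_vs_combined_sd(sigma_p, n1, n2):
    """Standard error of the difference between the standard deviation of a
    sample of n1 items and that of the combined group of n1 + n2 items."""
    return math.sqrt(sigma_p ** 2 / 2 * n2 / (n1 * (n1 + n2)))

# Example 28: 746 soldiers of French extraction out of 80,000 in all.
se_28 = se_sample_vs_combined_sd(17.06, 746, 80000 - 746)
# Example 29: 400 workers in the first sample, 600 in the second.
se_29 = se_sample_vs_combined_sd(14, 400, 600)

print(round(se_28, 2), round((17.06 - 16.03) / se_28, 1))  # 0.44 2.3
print(round(se_29, 2), round((14 - 12.8) / se_29, 1))      # 0.38 3.1
```

The first ratio falls short of 3 and the second exceeds it, in line with the two conclusions drawn above.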

1. Under what assumptions would you test the significance of mean and
standard deviation in large samples?
2. What do you understand by simple sampling of variables? What limitations
are imposed by the assumption that simple sampling conditions are satisfied
by a particular sampling study?
3. Calculate the standard error of the mean from the following data collected
in one of the many random sample inquiries conducted to find out average earning
of a particular class :-
Earnings per month        Number of Persons
in rupees
up to 10                        50
up to 20                       150
up to 30                       300
up to 40                       500
up to 50                       700
up to 60                       800
up to 70                       900
up to 80                      1000
(M. C,a•• ~"". 1951).
4. The following frequency table shows the distribution of 1110 observations
made on 149 monthly price series during ten business cycles :
Duration of cycle          Frequency
(in months)
7.50 to 12.49 months 7
12.50 to 17.49 27
17.50 to 22.49 61
22.50 to 27.49 115
27.50 to 32.49 139
32.50 to 37.49 186
37.50 to 42.49 167
42.50 to 47.49 124
47.50 to 52.49 122
52.50 to 57.49 67

57.50 to 62.49 52
62.50 to 67.49 15
67.50 to 72.49 15
72.50 to 77.49 8
77.50 to 82.49 2
82.50 to 87.49 2
87.50 to 92.49 o
92.50 to 97.49 1
Calculate the average duration of the cycle for this distribution and state
within what limits your average is correct.
5. To test if a small electric current affected the growth of wheat seedlings,
100 pairs of plants were grown in parallel boxes and one member of each pair
was treated by receiving a small electric current. The mean difference in heights
between the treated and the untreated (treated - untreated) is +4 mm. with a standard
deviation of 2 mm.
Does the electric current exercise any influence on the growth of the plants ?
6. If the standard deviation of pulse rate in adults is 8 and the normal pulse
rate is 70 would you say a high pulse rate was diagnostic if a group of 64 people
suffering from a disease were found to have a pulse rate of 75?
7. The following table shows the frequency distribution of yield of wheat in
maunds per acre in 998 irrigated fields selected at random in the province of Punjab.
Limits in mds. No. of fields
0- 4 ~ 45
4- 8 184
8-12 281
12--16 228
16-20 155
20-24 71
24-28 22
28--32 5
32--36 1
Calculate the sampling error of the mean.
8. Suppose that the standard deviation of stature in men is 2.48 inches. One
hundred male students in a large university are measured and their average height is
found to be 68.52 inches. Determine the 98 per cent. confidence limits for the mean
height of the men of the university.
9. The average breaking strength of steel rods is specified to be 18.5 thousand
pounds. To test this a sample of 100 rods was tested and gave the- following results:-
Breaking Strength No. of rods.
15,000-16,000 12
16,000-17,000 20
17,000-18,000 40
18,000-19,000 20
19,000-20,000 8
Do the results of the test justify the hypothesis ?
10. A sample of 900 members is found to have a mean of 3.4 cm. Can it be
reasonably regarded as a simple sample from a large population with mean 3.2 cm.
and standard deviation 2.3 cm. ?
11. A certain colliery is supposed to supply coal of ash content about 15%.
One hundred samples of coal of the company when analysed gave an ash content
of 16.8% with a standard deviation of 8%. Do these results justify the company's claim?
12. What is the standard error of the standard deviation if the parent
distribution is (a) normal, (b) not normal ?
It is known that the mean and standard deviation of a variable are respectively
100 and 10 in the universe. It is, however, considered sufficient to draw a sample of
sufficient size but such as to ensure that the mean of the sample would be, in all
probability, within 0.01% of the true value. How much would the cost be (exclusive of
overhead charges) if the charges for drawing 100 members of a sample be one rupee?
(I. A. S., 1947).

13. The following table gives the frequency distribution of weights for adult
males born in Punjab.

Weight in lbs.        Number of men within
                      given limits of weight
90-                        2
100-                      34
110-                     152
120-                     390
130-                     867
140-                    1623
150-                    1559
160-                    1326
170-                     787
180-                     476
190-                     263
200-                     107
210-                      85
220-                      41
230-                      16
240-                      11
250-                       8
260-                       1
270-                       0
280-                       1
Total                   7749
Find out the semi-interquartile range and calculate its standard error.
14. 8500 males of the 20-25 age group were examined and were found to
have a pulse rate of 71 beats per minute. The standard deviation was 9.5 beats per
minute. Find out the standard error of this standard deviation assuming the uni-
verse to be normal.

15. The pearson's coefficient of skewness( 7 Z )for a distribution is+·12.


Find out its standard error if the number of items in the distribution is 150.
16. Calculate the coefficient of skewness of the following distribution and
also calculate its standard error.
Size of item    Frequency
3                  2
4                  4
5                  5
6                  8
7                 10
8                  7
9                  8
10                 6
17. The following is the distribution of Bradstreet's Commodity Prices (1897-
1913), expressed as percentages of trend.
Percentage of   Frequency   Percentage of   Frequency
Trend                       Trend
85 1 100 14
86 0 101 30
87 2 102 15
88 0 103 15
89 .. 104 22
90 2 105 12
91 1 106 4
92 3 107 7
93 6 108 2
94 10 109 0
95 8 110 3
96 13 111 1
97 5 112 1
98 16 113 1
99 15 114 1
(a) What would be the limits of error in the calculation of the variance of the
above distribution ?
(b) What is the probable error for the second moment about the mean for the
data of the distribution ?
18. Three experiments were conducted to determine the relationship between
laterality of hand and laterality of eye. The correlations between (1) difference of
strength of grip and (2) difference in visual acuity were
~.02410 ~3234
-0.00738
'ub~~
.c003 su~
+0.02M2 1447 subjCCtll •
Find the standard errors of the three correlation coefficients, and hence show
that it cannot be concluded that there is any significant correlation between laterality
of hand and laterality of eye.
19. From the following four samples giving the stature in inches for 400 adult
males, 100 in England, 100 in Scotland, 100 in Wales and 100 in Ireland, find out
what would be the standard error of the mean of a sample which consists of all these
400 men.
Stature in inches for adult males born in
                      England   Scotland   Wales   Ireland
Mean                   67.31     68.55     66.62    67.78
Standard Deviation      2.56      2.50      2.25     2.17
20. To a certain number of students chosen at random from a given school
population, a test of general intelligence (x) and a test of silent reading
comprehension (y) are administered. Suppose that the following measures are derived
from the distribution of scores on these two tests :-
$A_x = 102$; $\sigma_x = 12$; $A_y = 80$; $\sigma_y = 10$;
and $r_{xy} = +.8$
With the help of this data, estimate the score that a student would make in
general intelligence if he makes a score of 56 on the silent reading comprehension test.
Also calculate the standard error of your estimate.
21. What is meant by the standard error and what are its practical uses ?
Intelligence tests on two groups of boys and girls give the following results. Examine
if the difference is significant :-
Girls. Mean 84; S.D. 10; No. 121.
Boys. Mean 81; S.D. 12; No. 81. (P. C. S., 1943).

22. The intelligence quotient (I.Q.) ratings of a group of 68 left-handed
students showed a mean of 110.52. A similar group of 68 right-handed students had
a mean I.Q. of 109.48. Are the left-handed students actually above the right-handed
in respect of I.Q. rating, or is the difference so slight as perhaps to have been due to
chance variations arising in sampling ? For left-handed students $N_1 = 68$, $\bar{x}_1 =
110.52$, $\sigma_1 = 15.2$. For right-handed students $N_2 = 68$, $\bar{x}_2 = 109.48$, $\sigma_2 = 15.5$.
23. The heights of a random sample of 1304 Scotchmen had a mean of 68.5456
inches and a standard deviation of 2.480 inches. A random sample of 6194
Englishmen gave an average height of 67.4375 inches, and a standard deviation of 2.586
inches. Is this difference significant ?
24. A random sample of 1000 men from Northern India shows their mean
wage to be Rs. 2/8 per day with standard deviation of Rs. 1/8. A sample of 1500
men from Southern India gives a mean wage of Rs. 2/11 per day with a standard
deviation of Rs. 2. Discuss the suggestion that the mean rate of wages varies as
between the two regions.
25. A random sample of 1000 farms in a certain year gives an average yield
of barley of 2000 lbs. per acre, with a standard deviation of 192 lbs. A random sample
of 1000 farms in the following year gives an average yield of 2100 lbs. per acre with
a standard deviation of 224 lbs. Show that these data are inconsistent with the
hypothesis that the average yields in the country as a whole were the same in the two years.
26. The nerage chest girth in 74,.59 males classed as gmdc: I 'Was found to be
35.S· with II stan<hrd deviation of 1.94". For 2146 malea classed as grade II the
average girth was 34.8" "'ith a .tandani deviation of 2.01'. Is then: II significant
difference in the chest girth of the two grades ?
27. 50 children are given a special diet for a certain period and a control group
of 50 other children have normal meals. Their gains in weight are:
Specials 7.2 lbs. Controls 5.7 lbs.
The standard deviation for the weight gain is 2 lbs. Would you conclude that
the special diet really promotes weight ?
28. In order to find out the average daily earnings of semi-skilled workers in
the shawl manufacture trade of Srinagar, an investigator took a random sample of
350 workers, and his investigation gave the average daily earnings to be Rs. 5-8-0.
He took another random sample of 300 workers and found the average earning to be
Rs. 5-12-0. Is there a significant difference between the average earnings of the first
group and the average earnings of both the groups taken together, assuming that the
standard deviation of the nerage daily eaminge il 10 Ill.
29. The following data relate to the weight (in pounds) of 100 husbands and
their 100 wives :
              Mean    Standard deviation
Husbands      125          8
Wives         110          6
Assuming the r between the two to be +.6, calculate the standard error of
the difference of the two means and state whether the difference is significant.
30. The following data relate to the height measurement of 100 husbands
and their 120 wives.
Mean Height Standard Deviation
Husbands 64" 5
Wives 61" 4
Assuming that the coefficient of correlation between the heights of husbands
and wives is +.6, find out if the difference in heights is significant.

31. In a wheat variety test conducted over a wide area, the mean difference
between two varieties was found to be 5.5 bushels to the acre. The standard error
of this difference was 1.4 bushels per acre, and was determined from 100 pairs of plots.
Set up the fiducial limits at the 5% probability level for the mean difference in
yield between the two varieties.
32. The median height of 100 M. Com. students is 66" with a standard deviation
of 3" and the median height of 121 M. A. (Economics) students is 64" with a standard
deviation of 4". Are M. Com. students taller than M. A. (Economics)
students?
33. For two groups of students it was found that there was no significant
difference in their mean weights. However, the standard deviation in the first
group (of 100 students) was 10 lbs. and in the second group (of 150 students) 13
Ibs. Is there a significant difference in the two standard deviations ?
34. To find out the mean weight of girl students two independent samples
of 100 each are taken which reveal no significant difference between the means; for
the first sample the standard deviation is 2.5 inches while for the second it is ·3.S
inches. Test the significance of the difference between these two measures taking
the standard deviation of the universe to be 2.8S.
35. The following data relate to the wages paid to workers in a certain industry.
1st sample 2nd sample;
Number of workers 586 648
Mean monthly wage Rs. 52.5 Rs. 47.S
Standard deviation of wages _ 10 11
Find out if the mean wage and the standard deviation of wages of the first sam-
ple significantly differ from the combined mean and the combined standard deviation
of the two samples. \
36. Two hundred observations drawn independently from a population have
a mean equal to 30 and standard deviation of 15.
To examine the null hypothesis that the hypothetical mean is 20 the procedure
followed is to compute $w = (30-20) \div (15/\sqrt{200})$ and to verify whether this
exceeds 1.96.
Explain clearly the basis of this test, assuming that the variable under study
in the population is not known a priori to follow a normal distribution.
If you are told that the distribution in the population is symmetrical, about 20
do you suspect the sampling procedure if 150 observations out of 200 exceed the
value 20 ?
(I. A. S., 1957).
37. The heights of a random sample of 1,304 Scotsmen had a mean of
68.5456", with standard deviation 2.480". A random sample of 6,194 Englishmen
gave an average height of 67.4375", and a standard deviation of 2.548". Is this
difference significant ?
38. In an ordnance factory two different methods of shell filling are compared.
The average and standard deviation of weights in a sample of 96 shells filled by an
old process are 1.26 lbs. and 0.013 lbs., and a sample of 72 shells filled by a new process
gave an average of 1.28 lbs. with a standard deviation of 0.011 lbs. Is the
difference in average weights significant ?
39. Two chemists estimate the acidity of a jar of dilute hydrochloric
acid. Their results are an average of 10.162 with a standard deviation of 0.23 based
on 15 determinations and 10.341 with standard deviation of 0.12 based on 24
determinations. What is the level of significance of the difference between the chemists'
estimates of the acidity ?
40. If it is known that the variance of the length of life of electric light bulbs
is 2,500 and if we obtain for 25 light bulbs a mean life of 500 hours, determine a 95
per cent confidence-interval estimate of the population mean.

41. The mean life of stockings used by an army was 40 days with a standard
deviation of 8 days. Assume the life of the stockings follows a normal distribution.
If 100,000 pairs are issued, how many would need replacement after 35 days ? After
46 days ?
42. On an examination a class of 18 students had a mean of 70 with $\sigma = 6$.
Another class of 21 had a mean of 77 with $\sigma = 8$, on the same examination. Is there
reason to believe that one class is significantly better than the other ? Consider
the classes as samples from some universe. What might the universe be ?
43. A machine is set to turn out ball bearings having a radius of 1 centimetre
(allowable error ±.01 centimetre). A sample of 10 ball bearings produced by this
machine has a mean radius of 1.004 centimetres with $\sigma = .003$. Is there reason to
suspect the machine is turning out ball bearings having a mean radius greater than
1.0 centimetre ?
44. A sample of 25 workmen showed an average increase in wages of 45 cents
per hour with $\sigma = 10$ cents. Give 90 per cent confidence limits for the mean wage
increase in the population from which the sample was drawn. State the assumptions
used.
28
Chi-square Test and Goodness of Fit
Meaning. In the 19th chapter, while discussing the study of association
in manifold classification or in contingency tables, we had mentioned
that the value of chi-square is used to study the divergence of
actual and expected frequencies, and that from this value the coefficient of
contingency is calculated to find out if there is any association between the
attributes in question. Thus chi-square is a measure of the actual divergence
of the observed and expected frequencies. It is obvious
that the importance of such a measure would be very great in sampling
studies, where we have invariably to study the divergence between
theory and fact. In sampling studies we never expect that there will
be perfect coincidence between expected and observed frequencies, and the
question that we have to tackle is about the extent to which the difference
between expected and observed frequencies can be ignored as arising
due to fluctuations of sampling. Chi-square, as we have seen, is a measure
of the actual difference between the expected and observed frequencies,
and as such, if there is no difference between expected and observed
frequencies, the value of chi-square is 0. If there is a difference between
the observed and the expected frequencies, then the value of chi-square
would be more than 0. But the difference between the expected and observed
frequencies may also be due to fluctuations of sampling, and a value
of chi-square which may arise due to sampling fluctuations should be
ignored in drawing inferences. Such values of chi-square under different
conditions are usually available in the shape of tables, and if the
calculated value of chi-square is more than that given in the table, it
indicates that the difference between expected and observed frequencies is
not solely due to sampling fluctuations, and that there is some other
reason for it. If, on the other hand, the calculated value of chi-square
is less than the table value, it indicates that the difference between expected
and observed frequencies may have arisen due to chance fluctuations
and can be ignored. In this way the chi-square test enables us to find out
whether the divergence between theory and fact, or between expected
and actual frequencies, is significant or not. If the calculated value of
chi-square is very small as compared to its table value, it indicates that
the divergence between actual and expected frequencies is very little
and consequently the fit is good. If, on the other hand, the calculated
value of chi-square is very big as compared to its table value, it indicates
that the divergence between expected and observed frequencies is very
great and consequently the fit is poor.
Before going into further details of the chi-square test we shall
introduce the reader to certain terms used in this connection.

Degrees of freedom
The term degrees of freedom refers to the number of "independent
constraints" in a set of data. We shall illustrate this concept with a few
examples. Suppose there is a 2 × 2 association table and the actual
frequencies of the various classes are as follows :-
              A            α         Total
B          (AB) 22      (αB) 38        60
β          (Aβ)  8      (αβ) 32        40
Total          30           70        100
Suppose that we presume that the two attributes A and B are
independent; then the expected frequency of the class (AB) would be

$$\frac{30\times 60}{100}=18.$$

Now once we decide the expected frequency of the
class (AB), the expected frequencies of the remaining three classes are
automatically fixed. Thus for the class (αB) the expected frequency must
be 60 - 18 or 42, and similarly for the class (Aβ) the frequency must
be 30 - 18 or 12, and for (αβ) it must be 70 - 42 or 28. It means then
that so far as this table is concerned we have only one choice of our
own and in the remaining three classes we have no freedom to fill in
the frequencies as we like. It means that we have only one degree of
freedom so far as this table is concerned. There is one independent
constraint here and three constraints are dependent. In such tables the
degrees of freedom are calculated by the formula:

$$\nu=(c-1)(r-1)$$

where $\nu$ stands for the degrees of freedom, $c$ for the number of
columns and $r$ for the number of rows.
Thus in a 2 × 2 table the degrees of freedom are (2 - 1)(2 - 1) or 1.
Similarly in a 3 × 3 table the degrees of freedom are (3 - 1)(3 - 1) or 4, and
in a 3 × 4 table the degrees of freedom are (3 - 1)(4 - 1) or 6.
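The way in which the marginal totals fix the expected frequencies can be sketched in a few lines. The helper names below are illustrative assumptions; the table is the 2 × 2 association table discussed above, with rows B and β and columns A and α.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence:
    row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def degrees_of_freedom(table):
    """(c - 1)(r - 1) for a contingency table with r rows and c columns."""
    return (len(table[0]) - 1) * (len(table) - 1)

observed = [[22, 38],   # row B:    (AB), (alpha B)
            [8, 32]]    # row beta: (A beta), (alpha beta)

print(expected_frequencies(observed))  # [[18.0, 42.0], [12.0, 28.0]]
print(degrees_of_freedom(observed))    # 1
```

Only the value 18 is computed freely from the margins; the other three cells follow from it by subtraction, which is what one degree of freedom means here.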
If the data are not given in the shape of contingency tables as
above but are given in the shape of a series of individual observations
or discrete or continuous series, then the degrees of freedom are
calculated in a different way. Take the following illustration relating to 1024
throws of ten coins each, which gives the following distribution :-
Number of Heads      Actual Frequencies
0                          2
1                         10
2                         38
3                        106
4                        188
5                        257
6                        226
7                        128
8                         59
9                          7
10                         3
Total                   1024

In this question if we write down the expected frequencies we
have freedom to write any ten figures we choose, but the eleventh
figure must be equal to 1024 minus the total of the ten figures we have
written, because the total of the expected frequencies must be equal to
the total of the actual frequencies. Thus, there are ten degrees of
freedom in the above question. In such cases the degrees of freedom are
equal to (n - 1), where n is the number of frequencies (or values in case
of a series of independent observations).
Levels of significance
We have mentioned in the beginning of this chapter that the
calculated values of χ² (chi-square) are compared with the table values,
to conclude whether the difference between expected and observed
frequencies is due to sampling fluctuations and as such insignificant, or
whether the difference is due to some other reason and as such significant.
In previous chapters on Sampling of Attributes and Sampling of
Variables we have seen that the divergence of theory and fact is always
tested in terms of certain probabilities. The probabilities indicate the
extent of reliance that we can place on the conclusions drawn. The same
technique is used in the case of the χ² test also, and the table values of χ² are
available at various probability levels. These levels are called levels
of significance. Generally two types of tables are available: one in
which the probability of a particular calculated value of χ² for given
degrees of freedom arising due to chance fluctuations is given, and
the other in which the independent values of χ² for given degrees of
freedom at certain levels of probability are given. By the independent
value of χ² we mean the value of χ² which can arise due to chance
fluctuations. Thus we can analyse our results in two ways. In the first
method we find out the probability of the observed value of χ² arising
due to chance fluctuations and see whether the probability is high or
low. Usually, if the value of p is .05 or more, the probability is regarded
as high, which means that the difference is
not significant and can be ignored. If the value of p is one, it indicates
that there is absolutely no difference between expected and actual
frequencies. If the value of p is less than .05, the probability is generally
considered to be low, and the conclusion in such cases is that the difference
between expected and observed frequencies is significant and could not have
arisen due to chance fluctuations. In the second method we find
out from the tables the independent value of χ² at a certain level of
significance. Usually the value of χ² at the 5% level of significance for
the given degrees of freedom is seen from the tables, and if the calculated
value of χ² is more than the table value it means that the difference
is significant, and if it is less it indicates that the difference could have
arisen due to chance fluctuations and as such can be ignored. The
second method is more convenient and easy and is generally used.
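The second method amounts to a single comparison with the table value. The sketch below assumes a small lookup of 5% table values for 1, 6 and 9 degrees of freedom (the figures quoted in the worked examples of this chapter); the names and the two calculated values passed in are purely illustrative.

```python
# 5% table values of chi-square, keyed by degrees of freedom.
# Only the entries needed for illustration are listed here.
TABLE_VALUE_5_PERCENT = {1: 3.841, 6: 12.592, 9: 16.919}

def difference_is_significant(calculated_chi_square, df):
    """Second method: the difference is significant when the calculated
    value exceeds the table value for the given degrees of freedom."""
    return calculated_chi_square > TABLE_VALUE_5_PERCENT[df]

print(difference_is_significant(3.0, 1))   # False: may be due to chance
print(difference_is_significant(9.4, 1))   # True: not due to chance alone
```

A fuller table, or a statistical library, would be needed for other degrees of freedom or other levels of significance.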
We shall now give below some illustrations in which the value of
χ² has been calculated and conclusions drawn from it. As we have
seen in the chapter on Association of Attributes, the value of χ² is
calculated by the formula

$$\chi^{2}=\sum\frac{(f-f')^{2}}{f'}$$

where $f$ stands for the observed frequency and $f'$ for the
corresponding expected frequency.
Example 1. The following table shows the number of people
interviewed by age-groups and the number in each age group estimated
to have peptic ulcers.
Age group          15-20  20-25  25-35  35-45  45-55  55-65  65-75  Total
Nos. interviewed    199    300   1128   1375   1089    625    155   4871
P. U. cases           1      8     38     96    105     56     12    316
Do these figures justify the hypothesis that peptic ulcer is equally
prevalent in all age groups ?
Solution
If peptic ulcer were equally prevalent in all age groups, then in each
age group $\left(\frac{316}{4871}\times 100\right)$ or 6.5% of the people should suffer from it.
On this basis the observed and expected frequencies would be as
follows :-
Age group        15-20  20-25  25-35  35-45  45-55  55-65  65-75
Observed cases      1      8     38     96    105     56     12
Expected cases     13    19.5    73     89     71    40.5    10

$$\chi^{2}=\frac{(1-13)^{2}}{13}+\frac{(8-19.5)^{2}}{19.5}+\frac{(38-73)^{2}}{73}+\frac{(96-89)^{2}}{89}+\frac{(105-71)^{2}}{71}+\frac{(56-40.5)^{2}}{40.5}+\frac{(12-10)^{2}}{10}=57.8$$

There are six degrees of freedom in the question. For six degrees
of freedom the table value of χ² at the 5% level of significance is 12.59.
The calculated value is much higher than this and as such the difference
is significant and the hypothesis is not justified.
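The summation in this example can be reproduced with a short sketch. The function name `chi_square` is an illustrative choice; the expected frequencies are the rounded figures listed in the solution, so small differences from a hand-worked total can arise from rounding.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [1, 8, 38, 96, 105, 56, 12]
expected = [13, 19.5, 73, 89, 71, 40.5, 10]   # 6.5% of the numbers interviewed, rounded

value = chi_square(observed, expected)
df = len(observed) - 1

print(round(value, 1), df)   # 57.8 6
print(value > 12.592)        # True: exceeds the 5% table value for 6 d.f.
```

The largest contributions come from the 25-35 and 45-55 age groups, which is where the observed cases depart most from an equal rate.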
We could have looked at this problem from another point of view
also. For six degrees of freedom, when p = .05 the table value of χ² is 12.592
and when p = .01 the table value of χ² is 16.812. When the value of χ²
for six degrees of freedom is 57.8, the value of p would be much less
than even .01. It means that the probability of such a value of χ²
arising due to chance is very little or, in other words, the difference
between observed and expected frequencies is highly significant.
Example 2. Two hundred digits were chosen at random from
a set of tables. The frequencies of the digits were as follows :-
Digit     Frequency
0            18
1            19
2            23
3            21
4            16
5            25
6            22
7            20
8            21
9            15
Use the χ² test to assess the correctness of the hypothesis that the digits were
distributed in equal numbers in the tables from which they were
chosen.
Solution
If we take the hypothesis that the digits were distributed in equal
numbers in the tables as correct, the expected frequencies of the digits
would be :
Digit        0   1   2   3   4   5   6   7   8   9
Frequency   20  20  20  20  20  20  20  20  20  20
Substituting the observed and the expected frequencies in the
formula

$$\chi^{2}=\sum\frac{(f-f')^{2}}{f'}$$

we get,

$$\chi^{2}=\frac{(18-20)^{2}}{20}+\frac{(19-20)^{2}}{20}+\frac{(23-20)^{2}}{20}+\frac{(21-20)^{2}}{20}+\frac{(16-20)^{2}}{20}+\frac{(25-20)^{2}}{20}+\frac{(22-20)^{2}}{20}+\frac{(20-20)^{2}}{20}+\frac{(21-20)^{2}}{20}+\frac{(15-20)^{2}}{20}$$

$$=\frac{1}{20}\{4+1+9+1+16+25+4+0+1+25\}=4.3$$

The number of degrees of freedom
= (n - 1) = (10 - 1)
= 9
The 5% value of χ² for 9 degrees of freedom is 16.919. The
calculated value of χ², which is 4.3, is thus much less than this figure.
Hence the hypothesis seems reasonable and correct.
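A brief sketch of the same test of equal distribution follows. The variable names are illustrative; the expected frequency of each digit is simply the total divided by the number of digits.

```python
observed = [18, 19, 23, 21, 16, 25, 22, 20, 21, 15]   # digits 0 to 9

total = sum(observed)
expected = total / len(observed)                       # 200 / 10 = 20 for every digit

chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1

print(total, df)             # 200 9
print(round(chi_square, 1))  # 4.3
print(chi_square < 16.919)   # True: within the 5% table value for 9 d.f.
```

Because every expected frequency is the same, the sum reduces to the total of the squared deviations (86) divided by 20.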
The χ² test is also used to study the association between attributes.
In such cases the expected frequencies are the independent values of
the classes, calculated on the assumption that the attributes in question
are independent of each other. The following examples would
illustrate the study of association by the χ² test.
Example 3. The following table is published in a memoir written
by Karl Pearson :-
                                  Eye colour in sons
                                  not light     light
Eye colour in     not light          230         148
fathers           light              151         471

Test whether the colour of the son's eyes is associated with that of
the father's (you may use the fact that the 5% value of chi-square for 1
degree of freedom is 3.84).
Solution
Let us take for our hypothesis the supposition that the eye colour
of sons and the eye colour of fathers are independent. If this be true,
the expected frequencies would be :
                                  Eye colour in sons
                                  not light     light
Eye colour in     not light          144         234
fathers           light              237         385
Substituting the observed and the theoretical frequencies in the
formula

$$\chi^{2}=\sum\frac{(f-f')^{2}}{f'}$$

we get,

$$\chi^{2}=\frac{(230-144)^{2}}{144}+\frac{(148-234)^{2}}{234}+\frac{(151-237)^{2}}{237}+\frac{(471-385)^{2}}{385}$$

$$=(86)^{2}\left\{\frac{1}{144}+\frac{1}{234}+\frac{1}{237}+\frac{1}{385}\right\}=133.37.$$

The number of degrees of freedom
= (c - 1)(r - 1) = (2 - 1)(2 - 1)
= 1.

The value of χ² at the 5% level of significance for 1 degree of freedom
is 3.841, which is very small in comparison with the calculated
value, 133.37. So this leads us to the definite conclusion that the
hypothesis is wrong and the colour of sons' eyes is associated with that
of the fathers to a great extent.
Example 4. In an experiment on the immunization of goats from
anthrax the following results were obtained. Derive your inference
on the efficacy of the vaccine.
                           Died of anthrax   Survived   Total
Inoculated with vaccine           2             10        12
Not inoculated                    6              6        12

Solution (1)
We take for our hypothesis the supposition that inoculation and
survival are independent attributes. If this be true, the expected
frequencies would be :-
                           Died of anthrax   Survived
Inoculated with vaccine           4              8
Not inoculated                    4              8
Substituting the observed and the expected frequencies in the
formula

$$\chi^{2}=\sum\frac{(f-f')^{2}}{f'}$$

we get,

$$\chi^{2}=\frac{(2-4)^{2}}{4}+\frac{(10-8)^{2}}{8}+\frac{(6-4)^{2}}{4}+\frac{(6-8)^{2}}{8}=(2)^{2}\left\{\frac{1}{4}+\frac{1}{8}+\frac{1}{4}+\frac{1}{8}\right\}=3.$$

The number of degrees of freedom is
= (c - 1)(r - 1) = (2 - 1)(2 - 1)
= 1.
The value of χ² at the 5% level of significance for one degree of
freedom is 3.841. The calculated value, which is 3, is thus less than it.
Hence there is no cause to suspect the hypothesis and the data do not
suggest that survival is associated with inoculation. Whatever
association there appears to be between these two from a direct
comparison may be due to fluctuations of sampling.
Solution (2)
There is a simple formula for calculating χ² for this type of table,
i.e., the (2 × 2) fold table, given by Brandt and Snedecor.
Representing the (2 × 2) fold table as follows :-
b1    b2    Tb
c1    c2    Tc
T1    T2    T
χ² is given by

$$\chi^{2}=\frac{(b_1c_2-b_2c_1)^{2}\,T}{T_1\,T_2\,T_b\,T_c}$$

We multiply diagonal frequencies and find the difference between
the two products. The difference is squared and multiplied by the
grand total, and the result is divided by the product of the sub-totals.
In our case, the table is as follows :
2     10    12
6      6    12
8     16    24

Substituting these values, we get

$$\chi^{2}=\frac{\{(2\times 6)-(6\times 10)\}^{2}\times 24}{8\times 16\times 12\times 12}=\frac{48\times 48\times 24}{8\times 16\times 12\times 12}=3.$$
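The shortcut for the (2 × 2) fold table translates directly into a few lines. The function name below is an illustrative assumption; its arguments follow the b1, b2, c1, c2 lettering of the table layout above.

```python
def chi_square_2x2(b1, b2, c1, c2):
    """Chi-square for a 2 x 2 table from the cross-products of the
    diagonals, the grand total and the four sub-totals."""
    tb, tc = b1 + b2, c1 + c2      # row sub-totals
    t1, t2 = b1 + c1, b2 + c2      # column sub-totals
    t = tb + tc                    # grand total
    return (b1 * c2 - b2 * c1) ** 2 * t / (t1 * t2 * tb * tc)

# Goats experiment: inoculated (2 died, 10 survived), not inoculated (6, 6).
value = chi_square_2x2(2, 10, 6, 6)
print(value)   # 3.0
```

The result agrees with the value obtained in Solution (1) from the expected frequencies, without having to compute those frequencies first.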
Example 5. In an experiment on immunization of cattle from
tuberculosis the following results were obtained:
                   Died or affected    Unaffected
Inoculated                12               26
Not inoculated            16                6
Examine the effect of vaccine in controlling susceptibility to
tuberculosis.
Solution
Percentage of inoculated who are unaffected is $\frac{26}{38} \times 100 = 68.4$
Percentage of not inoculated who are unaffected is $\frac{6}{22} \times 100 = 27.3$
Thus the comparison of the two percentages brings out clearly
the fact that inoculation and non-susceptibility to tuberculosis are
positively associated.
But it is just possible that this association may be due to
fluctuations of sampling. So, to test the significance of the association, we
proceed as follows:
Taking for our hypothesis the supposition that the two attributes
given in the problem are independent, the theoretical frequencies would
be:
                   Died or affected    Unaffected
Inoculated               17.7              20.3
Not inoculated           10.3              11.7
Substituting the observed and the theoretical frequencies in the
formula
$$\chi^2 = \sum \left\{ \frac{(f - f_1)^2}{f_1} \right\}$$
we get,
$$\chi^2 = \frac{(12-17.7)^2}{17.7} + \frac{(26-20.3)^2}{20.3} + \frac{(16-10.3)^2}{10.3} + \frac{(6-11.7)^2}{11.7}$$
$$= (5.7)^2 \left\{ \frac{1}{17.7} + \frac{1}{20.3} + \frac{1}{10.3} + \frac{1}{11.7} \right\} = 9.367$$
The number of degrees of freedom is
$(c-1)(r-1) = (2-1)(2-1) = 1$.
The value of $\chi^2$ for one degree of freedom at the 5% level of
significance is 3.841. The calculated value is thus greater than this value.
So the hypothesis is rejected and the association between the two
attributes is significant. Thus it is established that the vaccine is
effective in controlling susceptibility to tuberculosis.
Example 6. Two investigators draw samples from the same
town in order to estimate the number of persons falling in the
income-groups "poorer", "middle class", "well-to-do". (The limits of the
groups are defined in terms of money and are the same for both
investigators.) Their results are as follows:

Investigator                    Income-group
               "Poorer"   "Middle class"   "Well-to-do"   Total
A                 140          100              15          255
B                 140           50              20          210
Total             280          150              35          465
Show that the sampling technique of at least one of the
investigators is suspect.
Solution
Let us take as our hypothesis the supposition that the sampling
technique of both the investigators is the same and beyond suspicion.
If this be true, and further, as both the samples have been drawn from
one population, the two samples should contain almost the same
proportion of "poorer", "middle class" and "well-to-do" people. On this
basis the theoretical frequencies would be:
Investigator                    Income-group
               "Poorer"   "Middle class"   "Well-to-do"   Total
A                 154           82              19          255
B                 126           68              16          210
Total             280          150              35          465
Substituting the observed and the theoretical frequencies in the
formula
$$\chi^2 = \sum \left\{ \frac{(f - f_1)^2}{f_1} \right\}$$
we get,
$$\chi^2 = \frac{(140-154)^2}{154} + \frac{(100-82)^2}{82} + \frac{(15-19)^2}{19} + \frac{(140-126)^2}{126} + \frac{(50-68)^2}{68} + \frac{(20-16)^2}{16} = 13.38$$
The number of degrees of freedom
$= (c-1)(r-1) = (3-1)(2-1) = 2$.
The value of $\chi^2$ for 2 degrees of freedom at the 5% level of
significance is 5.991, which is much less than the calculated value. Hence the
hypothesis does not hold ground and suspicion arises about the sampling
technique of at least one of the investigators.

Example 7. The following table gives the results of a series of
controlled experiments. Discuss whether the treatment may be
considered to have any positive effect.
              Positive    No effect    Negative
Treatment         9           2            1
Control           3           6            3
Total            12           8            4
Solution
A simple formula for calculating $\chi^2$ for (2 × n) fold tables is
given by Brandt and Snedecor. Representing the (2 × n) fold table as
follows:
    1     2     3    ...    n
    b1    b2    b3   ...    bn    Tb
    c1    c2    c3   ...    cn    Tc
    T1    T2    T3   ...    Tn    T
$\chi^2$ is given by
$$\chi^2 = \frac{T^2}{T_b \cdot T_c} \left\{ \frac{b_1^2}{T_1} + \frac{b_2^2}{T_2} + \cdots + \frac{b_n^2}{T_n} - \frac{T_b^2}{T} \right\}$$
"Each frequency in either of the rows is squared and divided by
the corresponding sub-total. These are summated and the correction
term subtracted as shown in the formula. The remainder is multiplied
by the quotient of the square of the total frequency divided by the
product of two sub-totals on the right." In the given example the
table is:
              Positive    No effect    Negative    Total
Treatment         9           2            1         12
Control           3           6            3         12
Total            12           8            4         24
Substituting these values in the formula, we get
$$\chi^2 = \frac{(24)^2}{12 \times 12} \left\{ \frac{(9)^2}{12} + \frac{(2)^2}{8} + \frac{(1)^2}{4} - \frac{(12)^2}{24} \right\} = \frac{24 \times 24}{12 \times 12} \{ 6.75 + 0.5 + 0.25 - 6 \} = 4 \times 1.5 = 6$$
The number of degrees of freedom is
$(c-1)(r-1) = (3-1)(2-1) = 2$.
The value of $\chi^2$ for 2 degrees of freedom at the 5% level of
significance is 5.99. The calculated value is thus slightly greater than this
value. Hence the treatment may be considered to have some positive
effect.
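The (2 × n) shortcut follows the verbal rule quoted in the solution step by step. The sketch below is a hedged illustration rather than the authors' own procedure; `chi_square_2xn` is a hypothetical name, and either row may be passed as the first argument since the formula gives the same value for both.

```python
def chi_square_2xn(b, c):
    """Chi-square for a 2 x n table by the row-wise shortcut.

    Each frequency of row b is squared and divided by its column
    sub-total; the correction term Tb^2 / T is subtracted and the
    remainder is multiplied by T^2 / (Tb * Tc).
    """
    tb, tc = sum(b), sum(c)                              # row sub-totals
    total = tb + tc                                      # grand total
    col_totals = [bi + ci for bi, ci in zip(b, c)]       # column sub-totals
    bracket = sum(bi ** 2 / ti for bi, ti in zip(b, col_totals)) - tb ** 2 / total
    return total ** 2 / (tb * tc) * bracket


treatment = [9, 2, 1]
control = [3, 6, 3]
value = chi_square_2xn(treatment, control)
print(round(value, 3))  # 6.0
```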
Where the probability of the happening of an event is known and
the actual frequency of the number of times the event happened is
given, $\chi^2$ can be calculated directly from the following formula:
$$\chi^2 = \frac{\left( a - \dfrac{p}{q}\, b \right)^2}{\dfrac{p}{q}\, n}$$
where
a = observed frequency
b = total number of observations less a
p = probability of the happening of the event
q = probability of the not happening of the event
n = total frequency.
The following examples would illustrate this formula:
Example 8. In a certain hospital during a certain week, 50 babies
were born, out of whom 30 were males and 20 females. Using the $\chi^2$
test ascertain if this distribution is inconsistent with the observation
that the sex ratio among births is 1 : 1.
Solution
Representing
the chance of a male birth by p
the chance of a female birth by q
the number of male births by a
the number of female births by b
and the total number of observations by n
we have as per data
p = .50        b = 20
q = .50        n = 50
a = 30
Substituting these values in the formula
$$\chi^2 = \frac{\left( a - \dfrac{p}{q}\, b \right)^2}{\dfrac{p}{q}\, n}$$
we get,
$$\chi^2 = \frac{\left( 30 - \dfrac{.50}{.50} \times 20 \right)^2}{\dfrac{.50}{.50} \times 50} = 2$$
The number of degrees of freedom is 1. The value of $\chi^2$ for
1 degree of freedom at the 5% level of significance is 3.841. The calculated
value is less than this figure. Hence the observed sex ratio during the
given week is not significantly different from the general observation
of sex ratio, which is 1 : 1.
The value of $\chi^2$ in the above question could also have been
calculated in the usual manner as follows:

                  Observed     Expected
                  Frequency    Frequency
Male births          30           25
Female births        20           25
Total                50           50

$$\chi^2 = \sum \left[ \frac{(f - f_1)^2}{f_1} \right] = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} = \frac{25}{25} + \frac{25}{25} = 2$$
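That the direct formula and the usual method must agree can be checked numerically. The following sketch is illustrative only; both function names are hypothetical. It evaluates the same data by the two routes.

```python
def chi_square_known_p(a, b, p):
    """Direct formula: (a - (p/q) b)^2 / ((p/q) n), with q = 1 - p, n = a + b."""
    q = 1 - p
    n = a + b
    ratio = p / q
    return (a - ratio * b) ** 2 / (ratio * n)


def chi_square_usual(a, b, p):
    """Usual method: sum of (observed - expected)^2 / expected over both classes."""
    n = a + b
    observed = [a, b]
    expected = [n * p, n * (1 - p)]
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


# Example 8: 30 male and 20 female births, p = .50.
births_direct = chi_square_known_p(30, 20, 0.50)
births_usual = chi_square_usual(30, 20, 0.50)
print(round(births_direct, 4), round(births_usual, 4))  # 2.0 2.0
```

Both expressions reduce algebraically to $(a - np)^2 / npq$, which is why the two routes can never disagree.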
Example 9. Upon the basis of past experience the fatality rate
from malaria fever for a certain community was found to be 12.5 per
cent (that is, reported deaths from malaria fever ÷ reported cases of
malaria fever = .1250). A survey was made of certain congested areas.
The homes studied were selected as nearly as possible at random
and a fatality ratio of 30% (45 deaths) was found for 150 cases of
malaria fever. Using the $\chi^2$ test, determine if it represents a significant
departure from the population value of 12.5 per cent.
Solution
Representing
the chance of death when attacked by malaria fever by p
the chance of survival when attacked by malaria fever by q
the number of deaths from malaria fever by a
the number of survivals among those who were attacked by
malaria fever by b
and the number of observations by n
we have as per data:
p = .125       b = 105
q = .875       n = 150
a = 45
Substituting these values in the formula,
$$\chi^2 = \frac{\left( a - \dfrac{p}{q}\, b \right)^2}{\dfrac{p}{q}\, n}$$
we get,
$$\chi^2 = \frac{\left( 45 - \dfrac{.125}{.875} \times 105 \right)^2}{\dfrac{.125}{.875} \times 150} = 42$$
The number of degrees of freedom is 1. The value of $\chi^2$ for
one degree of freedom at the 1% level of significance is 6.64, which is much
less than the calculated value. Hence the sample fatality rate
represents a significant departure from the population fatality rate from
malaria fever.
By the formula
$$\chi^2 = \sum \left[ \frac{(f - f_1)^2}{f_1} \right]$$
we shall get the same value of $\chi^2$.
Thus

                        Observed Frequency    Expected Frequency
Number of deaths               45                   18.75
Number of survivals           105                  131.25
Total                         150                  150.00

$$\chi^2 = \frac{(45 - 18.75)^2}{18.75} + \frac{(105 - 131.25)^2}{131.25} = 36.75 + 5.25 = 42$$
Expected values
The above examples illustrate the use of $\chi^2$ in testing the
significance of the difference between the observed and the expected values.
The expected values must be very carefully obtained. Where we are
dealing with data about throws of a coin we know the theoretical
frequencies a priori. In many other cases the expected frequencies can be
calculated on the basis of past records. Thus we can know from
past records the incidence of malaria or tuberculosis, but in such cases
there is need of caution. The incidence of the disease in the past and
the present may not be alike. In association tables the expected
frequencies can be calculated on the hypothesis that the attributes in
question are independent of each other.
Conditions for the application of chi-square test
While applying the $\chi^2$ test the following conditions should be
observed:
1. The number of observations must be reasonably large. It is
necessary because if the number of items is small the values $f - f_1$, or
the differences between the observed and expected frequencies, would not
be normally distributed. In the $\chi^2$ test our presumption is that these
differences give a normal distribution. It is difficult to say what should
be the least number of observations, but as far as possible the number
should not be less than 50.
2. No theoretical or expected frequency should be very small.
Ordinarily they should not be less than 5. If the theoretical frequencies
are smaller than this number, then adjoining classes should be merged
together.
3. In the $\chi^2$ test we calculate the probability of getting on random
sampling a value of $\chi^2$ equal to or more than the observed value, and
if the probability is very low we conclude that there is a significant
divergence between observed and expected frequencies. Usually if p is
less than .05 the divergence between theory and fact is supposed to be
significant. If, however, the value of p is high it is no conclusive proof
that the difference between the observed and expected frequencies is
insignificant. All that can be said is that so far as the $\chi^2$ test is
concerned the difference is insignificant, and the data and the hypothesis are in
agreement. If the value of p is very large, say near unity, then also there
is reason to suspect the hypothesis because in actual practice there is
never a very close agreement between observed and expected
frequencies. Very close agreements are as rare as very great divergences.
The additive property of $\chi^2$
$\chi^2$ has a very useful property of addition. If a number of sample
studies have been conducted in the same field then the results can be
pooled together for obtaining an accurate idea about the real position.
Suppose ten experiments have been conducted to test whether a
particular vaccine is effective against a particular disease. Now here we
shall have ten different values of $\chi^2$ and ten different values of $\nu$
(degrees of freedom). We can add the ten values of $\chi^2$ to obtain one value and
similarly the ten values of $\nu$ can also be added together. Thus, we shall
have one value of $\chi^2$ and one value of the degrees of freedom. Now
we can test the results of all these ten experiments combined together
and find out the value of p.
Suppose five independent experiments have been conducted in a
particular field. Suppose in each case there was one degree of
freedom and the following values of $\chi^2$ were obtained:
Experiment Number    Value of $\chi^2$    Degrees of Freedom
       1                  4.3                   1
       2                  5.7                   1
       3                  2.1                   1
       4                  3.9                   1
       5                  8.3                   1

Now at the 5% level of significance (or for P = .05) the value of $\chi^2$
for one degree of freedom is 3.841. From the calculated values of $\chi^2$
given above we notice that in only one case, i.e., experiment No. 3,
the observed value of $\chi^2$ is less than the table value of 3.841. It
means that so far as this experiment is concerned the difference is
insignificant, but in the remaining four cases the calculated value of $\chi^2$ is
more than 3.841 and as such at the 5% level of significance the difference
between the expected and actual frequencies is significant. If we add
all the values of $\chi^2$ we get (4.3 + 5.7 + 2.1 + 3.9 + 8.3) or 24.3. The total
of the degrees of freedom is 5. It means that the calculated value of
$\chi^2$ for five degrees of freedom is 24.3. If we look in the table of $\chi^2$
we shall find that at the 5% level of significance for five degrees of freedom
the value of $\chi^2$ is 11.070. The calculated value of $\chi^2$, which is 24.3,
is much higher than this table value and as such we can conclude
that the difference between observed and expected frequencies is a
significant one. Even if we take the 1% level of significance (or P = .01)
the table value of $\chi^2$ is only 15.086. Thus the probability of getting
a value of $\chi^2$ equal to or more than 24.3 as a result of sampling
fluctuations is much less than even .01, or in other words the difference is
significant.
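The pooling described above amounts to two additions and one comparison with the table. A minimal sketch follows; the variable names are hypothetical, and the critical values are simply the table figures quoted in the text for five degrees of freedom, not values computed by the code.

```python
# Values of chi-square and degrees of freedom from the five experiments.
chi_values = [4.3, 5.7, 2.1, 3.9, 8.3]
dof_values = [1, 1, 1, 1, 1]

# Additive property: both the chi-square values and the degrees of
# freedom of independent experiments may be summed.
pooled_chi2 = sum(chi_values)
pooled_dof = sum(dof_values)

# Table values of chi-square for 5 degrees of freedom, as quoted in the text.
table_values = {0.05: 11.070, 0.01: 15.086}

print(round(pooled_chi2, 1), pooled_dof)  # 24.3 5
for level, critical in table_values.items():
    verdict = "significant" if pooled_chi2 > critical else "not significant"
    print(f"level {level}: table value {critical}, pooled result is {verdict}")
```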
$\chi^2$ Distribution
The $\chi^2$ distribution has some other useful properties also. When
there is only one degree of freedom, or when $\nu = 1$, the $\chi^2$ distribution
gives a normal curve with unit standard deviation for the positive
values of the variate. When the degrees of freedom are more than one, the
$\chi^2$ distribution gives a single humped type of curve. The curve in
such cases is at a tangent to the x-axis at the origin ($\chi^2 = 0$). As the
value of $\chi^2$ rises the curve also rises and reaches its maximum when
$\chi^2 = \nu - 1$. After this, if the value of $\chi^2$ increases, the curve falls very
slowly to 0. Thus when the degrees of freedom are more than 1 the
$\chi^2$ gives a curve skew to the right. However, as the degrees of
freedom increase the curve becomes more and more symmetrical. If the
degrees of freedom are large, say more than 30, then $\sqrt{2\chi^2}$ is
distributed almost normally about a mean $\sqrt{2\nu - 1}$ with unit standard
deviation. This property is of very great importance because when the
degrees of freedom are large it enables us to use the tables relating to
the normal distribution for the calculation of p. Suppose we have to find
out the value of p when $\chi^2 = 60.5$ and $\nu = 41$. The mean of the
distribution of $\sqrt{2\chi^2}$ would be
$$\sqrt{2\nu - 1} = \sqrt{82 - 1} = 9$$
The value of $\sqrt{2\chi^2}$ is $\sqrt{121} = 11$, and this value is 2.0 to the
right of the mean. Now if we consult the tables relating to the area of the
normal curve between the mean ordinate and the ordinate at various
sigma distances from the mean we shall find that the area between the
mean ordinate and an ordinate two sigma distances from the mean to
the right is .47725 of the total area. The area on the right of the mean
ordinate is one-half of the total area. Therefore the area to the right
of the ordinate at 2 sigma distances to the right of the mean is .50000 −
.47725 or .02275. This is the probability of the $\chi^2$ value being as high
as 60.5 due to fluctuations of sampling, when the degrees of freedom
are 41.
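The normal approximation for large degrees of freedom can be reproduced without tables. The sketch below is an illustration under the stated approximation, not an exact $\chi^2$ probability; the function name is hypothetical. It uses `math.erfc` from the Python standard library to obtain the area of the normal curve to the right of the computed sigma distance.

```python
import math


def chi_square_tail_normal_approx(chi2, dof):
    """Approximate P(chi-square >= chi2) for large degrees of freedom.

    sqrt(2 * chi2) is treated as normally distributed about
    sqrt(2 * dof - 1) with unit standard deviation.
    """
    z = math.sqrt(2 * chi2) - math.sqrt(2 * dof - 1)   # sigma distance from the mean
    return 0.5 * math.erfc(z / math.sqrt(2))           # upper-tail area of the normal curve


p = chi_square_tail_normal_approx(60.5, 41)
print(round(p, 5))  # 0.02275
```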
Goodness of fit
Since the $\chi^2$ test gives us an idea about the divergence between
observed and expected frequencies it is also described as a test of the
"goodness of fit." In other words, it tells us how good the fit is of the
expected and observed frequencies. Conventionally if the value of p
is small the fit is said to be poor. It means that there is considerable
divergence between theory and fact, and if the curve of the expected
frequencies is superimposed on the curve of observed frequencies there
would be a wide divergence between the two. The fit would not be
good. If, on the other hand, the value of p is high the fit is said to be
good. It means that there is no significant divergence between
observed and expected data, and if the curve of the expected frequencies is
superimposed on the curve of the observed frequencies there would
not be much divergence between the two. The fit would be good.
Questions
1. What is the $\chi^2$ test? How does it measure the significance of difference
between theory and fact?
2. What are the uses of the $\chi^2$ test and what are its limitations?
3. What are the conditions for the application of the $\chi^2$ test?
4. What are the important properties of the $\chi^2$ distribution? What do you
understand by the additive property of $\chi^2$?
5. What is meant by goodness of fit? When is a fit said to be good and
when poor?
6. Find $\chi^2$ for the following table:
                Passed    Failed
English            50        60
Hindi             200       180
History           350       400
Geography         150       210
a~a ~ ~
7. In a recent diet survey the following results were found in an Indian city:

Number of                   Hindus    Muslims
Families taking tea          1236       164
Families not taking tea       564        36

Discuss whether there is any significant difference between the two communities
in the matter of tea taking.
8. The table given below shows the data obtained during an epidemic of
cholera:
                   Attacked    Not Attacked    Total
Inoculated             31           469          500
Not Inoculated        185          1315         1500
Total                 216          1784         2000
Test the effectiveness of inoculation in preventing the attack of cholera.
[Five per cent. value of $\chi^2$ for one degree of freedom is 3.84.]
(Indian Audit and Accounts Service, 1941).
9. Two treatments A and B were tried to control a certain type of plant disease.
The following results were obtained : -
A : 200 plants were examined and 40 were found infected.
B : 200 plants were examined and 10 were found infected.
Is treatment B superior to treatment A ?
10. The following table shows fractional test meal results in a series of
pathologically verified cases of ulcer and cancer of the stomach:

                 Achlorhydria    Hypochlorhydria    Normal    Hyperchlorhydria    Total
Chronic ulcer          3                7              35             9             54
Cancer                22                2               6             0             30
Total                 25                9              41             9             84

Is there a significant difference in acidity conditions between chronic ulcer and
cancer?
11. In December 1954, there was an outbreak of cholera in a certain jail. Of
120 persons who were uninoculated, 10 contracted the disease and 4 of them died. Of
180 persons who were inoculated, 5 contracted the disease and none of them died.
Study the association between
(a) inoculation and contracting the disease, and
(b) inoculation and mortality among persons who have contracted the disease.
Use the $\chi^2$ test.
12. In the course of anti-malarial work in Birnagore in the third quarter of
1932, quinine was administered to 606 adults of a total population of 3,540. The
incidence of malarial fever is shown below. Discuss the preventive value of quinine.
                 Fever    No-Fever    Total
Quinine            19        587        606
Non Quinine       193       2741       2934
Total             212       3328       3540
You may use the 5% value of chi-square for n, the degrees of freedom, equal to 1,
the value being 3.841. (M. A., Calcutta, 1937).
13. The Fortune magazine of the U.S.A. published the following results of a
sample survey of public opinion regarding the election of Roosevelt as the president
of the U.S.A.:
Attitude towards election    Rich    Poor    Total
Favourable                    508    1559    2067
Unfavourable                  905    1114    2019
Total                        1413    2673    4086

Is attitude towards the election issue guided by the economic status of the voters?
What test would you apply? The following table of 1% values of chi-square is
reproduced for your use:
Degrees of freedom          1        2         3         4
1% value of chi-square    6.635    9.210    11.341    13.277
(M. A., Patna, 1944).
14. From the following table, showing the number of plants having certain
characters, test the hypothesis that the flower colour is independent of flatness of
leaf.
                 Flat leaves    Curled leaves    Total
White Flowers        99              36           135
Red Flowers          20               5            25
You may use the following table giving the value of $\chi^2$ (chi-square) for one
degree of freedom for different values of P.
P          .99        .95       .90     .50     .10      .05      .01
$\chi^2$   .000157    .00393    .0158   .455    2.706    3.841    6.635
(Punjab Certificate in Statistics, 1946).
15. The normal rate of infection for a certain disease in cattle is known to be
50%. In an experiment with seven animals injected with a new vaccine it was found
that none of the animals caught infection. Can the evidence be regarded as conclusive
(at the 1% level of significance) to prove the value of the new vaccine?
(Military Accounts and Indian Railway Accounts Service, 1942).
16. 12 dice were thrown 4096 times and a throw of 6 was reckoned as a success.
The observed frequencies were as given below:
Success         0      1      2      3     4     5     6    7 and over
Frequencies    447   1145   1181   795   380   115    24        8
Find the value of $\chi^2$ on the hypothesis that the dice were unbiased and hence
show that the data are consistent with this hypothesis so far as the $\chi^2$ test is
concerned.
17. The number of males in each of 106 eight-pig litters was found, and they
are given by the following frequency distribution:
Number of males per litter    0    1    2    3    4    5    6    7    8    Total
Frequency                     0    5    9   22   25   26   14    4    1     106
Assuming that the probability of an animal being male or female is even
(i.e., p = q = ½), and the frequency distribution follows the binomial law, calculate the
expected frequencies of the nine classes. Find also the value of $\chi^2$ to test the
goodness of fit. (Punjab, M. A., Maths., 1946).
18. Ten coins are tossed 1024 times and the following frequencies observed
No. of heads 0 1 2 3 4 5 6 7 8 9 10
Frequencies 2 10 38 106 188 257 226 128 59 7 3
How does this compare with a normal distribution ?
19. In experiments on pea-breeding, Mendel obtained the following frequencies
of seeds: 315 round and yellow; 101 wrinkled and yellow; 108 round and green; 32
wrinkled and green. Total, 556. Theory predicts that the frequencies should be
in the proportions 9 : 3 : 3 : 1.
Examine the correspondence between theory and experiment.
20. Genetic theory states that children having one parent of blood type M
and the other of blood type N will always be of one of the three types M, MN, N and
that the proportions of these types will on average be as 1 : 2 : 1. A report states
that out of 300 children having one M parent and one N parent, 30% were found to
be type M, 45% type MN and the remainder type N. Test the hypothesis by the $\chi^2$ test.
(I. A. S.).
21. Given the following actual and theoretical normal frequencies (total 400).
Test the goodness of fit (degrees of freedom 10).
Actual         4, 11, 17, 29, 43, 56, 58, 63, 61, 25, 20, 9, 4
Theoretical    4.6, 7.9, 16.8, 30.3, 44.7, 59.1, 65, 60.4, 47.5, 31.16, 18.25, 8.5, 5.3
22. Test the following data for goodness of fit (volume of trading on the New
York Stock Exchange expressed as percentage of straight line trend, 1897-1913).
Class Marks                0    1    2    3    4    5    6    7    8    9   10
Observed frequencies      11   35   50   48   24   15    9    7    3    1    1
Theoretical frequencies   15   29   48   43   35   21    9    3    1    0    0
Here n = 204. Is the theoretical frequency curve a good fit to the observed
data?
23. In a certain cross the types represented by BC, Bc, bC and bc are expected
to occur in a 9 : 3 : 3 : 1 ratio. The actual frequencies obtained were:
BC     Bc     bC     bc
102    16     35      7
Test the goodness of fit of observation to theory.
24. Test the goodness of fit of observation to theory for the following ratios:
         Observed Values      Theoretical Values
           A       a             A       a
(1)       134      34            3       1
(2)       240     120            3       1
(3)        76      56            1       1
(4)       243      13           15       1
25. (a) Describe some applications and the limitations of the chi-square test.
(b) If for one-half of n events the chance of success is p and the chance of failure
q, while for the other half the chance of success is q and the chance of failure p, what is
the standard deviation of the number of successes, the events being all independent?
Is the mean number of successes np in this case? (P. C. S., 1956).
26. Suppose that, in a public opinion survey, answers to the questions:
(a) Do you drink?
(b) Are you in favour of local option on sale of liquor?
were tabulated as below:
                        Question (a)
Question (b)      Yes      No      Total
Yes                56      31        88
No                 18       6        24
Total              74      37       112
Can you infer that opinion on option is dependent on whether or not an
individual drinks?
Values of $\chi^2$ on levels of significance P
Degrees of freedom    P = 0.05    P = 0.10
        1               3.84        2.71
        2               5.99        4.60
        3               7.82        6.25
(P. C. S., 1953).
27. Explain the technique of the $\chi^2$ test of goodness of fit, pointing out the
precautions to be observed in its applications.
Two sample polls of votes for two candidates A and B for a public office are
taken, one from among residents of urban areas, and the other from residents of rural
areas. The results are given below. Examine whether the nature of the area is related
to voting preference in this election.
               Votes for
            A        B       Total
Rural      620      380      1,000
Urban      550      450      1,000
Total    1,170      830      2,000
(I. A. S., 1956).
28. In an experiment in botany the results of crossing two hybrids of a species
of flower gave observed frequencies of descriptive categories of 120, 48, 36, 13.
Do these results disagree with theoretical frequencies which specify a 9 : 3 : 3 : 1
ratio?
29. Four coins were tossed simultaneously and the number of heads occurring
at each throw was noted. This was repeated 240 times with the following results:
No. of heads       0     1     2     3     4    Total
No. of throws     13    64    85    58    20     240
On the assumption that all the coins have an equal chance of coming down
heads or tails, calculate the theoretical distribution from the Binomial expansion of
$240\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^4$. Test the results for goodness of fit.
30. The following table gives the numbers of boys and girls whose hair colour
falls into five different groups. Is there any evidence of association between hair
colour and sex?
          Fair     Red    Medium    Dark    Jet Black    Total
Boys       592     119      849      504        36       2,100
Girls      544      97      677      451        14       1,783
Total    1,136     216    1,526      955        50       3,883
31. The numbers of fiction and non-fiction books issued by a library on six
days of a week are given below. Is there any evidence that the proportion of fiction
to non-fiction is associated with the day of the week?
               Mon.    Tues.    Wed.    Thur.    Fri.     Sat.    Total
Fiction         813    1,238    1,823     673     721    2,634    7,902
Non-Fiction      87      149      165      72      98      438    1,009
Total           900    1,387    1,988     745     819    3,072    8,911
32. The following data show the effect of vitamin B deficiency on the
sex-ratio of the offspring of rats. Is the effect significant?
                       Males    Females    Total
Vitamin B deficient     123       153       276
Vitamin B sufficient    145       150       295
Total                   268       303       571
33. Determine whether or not typhoid attack is independent of inoculation.
                  Attacked    Not attacked    Total
Inoculated            56          6,759        6,815
Not inoculated       272         11,396       11,668
Total                328         18,155       18,483
34. Sulphate content of boiler water may have some effect on the cracking of
boiler plates. Test the following data which were collected to examine this possibility.
SO4 (p.p.m.)     0     200    400    600
uncracked       37      43     26     44
cracked         12      30     19     15

35. Four hundred and ninety-two candidates for scientific posts gave
particulars of their university degrees and their hobbies. The degrees were in either
maths., chemistry or physics, and the hobbies could be classified roughly as music,
craftswork, reading or drama. Every candidate, therefore, represents one degree
and one hobby. Do you find any association between the two?

               Maths.    Chemistry    Physics    Total
Music            24          83          17       124
Craftswork       11          62          28       101
Reading          32         121          34       187
Drama            10          26          44        80
Total            77         292         123       492

The Sampling of Variables
(Small Samples)
:! 1 '
(3) Since the standard deviation of small samples tends to be


smaller than the standard deviation of the universe all errors based on
these smaller standard deviations also tend to be small. As such unless
corrections are made for the errors of small samples they would tend
to underestimate the actual error.
Tests of Significance
It is obvious that the estimates obtained from small samples are
hardly useful for estimating the parameter values. Thus it is difficult
to estimate the parameter mean or the standard deviation from a small sized
sample. Out of various estimates obtained by small samples some
would be better than others but we cannot place absolute reliance on
any one of them. As such, usually in the analysis of small samples we
do not try to estimate the parameter value but only try to find out if
the observed value of the sample could have arisen by sampling
fluctuations from some value which is given. Thus, if a sample of ten
units gives us the mean height of 64" we cannot estimate from this the
mean height of the universe. However, we can find out if this height
of 64" is consistent with the hypothesis that the true height of the
universe is 66".
Null hypothesis. The method of approaching the analysis
discussed in the last paragraph is based on the hypothesis of no difference. It
is called the "Null Hypothesis." Thus in the above illustration we shall
presume that there is no significant difference between the observed
height of 64" and the true height of 66". We shall then test whether
this hypothesis is satisfied by the data or not. If the hypothesis is
disproved the difference would be considered to be significant; if not,
the difference would be described as arising due to chance fluctuations.
Significance of a sample mean
We know that in case of large samples the standard error of the
mean is given by the formula $\frac{\sigma}{\sqrt{n}}$. When we test the significance
of a sample mean in case of large samples we find out the relationship
between the difference of the actual and observed mean and the standard
error of the mean. Thus, if the difference between actual and observed
means is up to a maximum of three times the standard error, the
difference is said to be insignificant. In actual practice, as we have seen, the
criterion is usually 1.96 times the standard error, which gives a value
of p = .05. If the relationship between the difference of observed and
expected means to the standard error is denoted by T, then
$$T = \frac{x - m}{s.e._m}$$
where $x$ stands for the mean of the sample, $m$ for the mean of the
universe and $s.e._m$ for the standard error of the mean.
If the value of T is more than 3 or 2 or 1.96, the difference is
supposed to be significant at various levels of p. If it is less it is
described as arising out of sampling fluctuations.
In case of small samples the analysis is more or less of the same
type. Here also the difference between the observed and actual mean
is found out and its relationship with the standard error of the mean is
calculated. But in case of small samples the standard deviation is
calculated by the formula $\sqrt{\frac{\Sigma d^2}{n-1}}$ instead of $\sqrt{\frac{\Sigma d^2}{n}}$. We have
mentioned in earlier chapters the concept of degrees of freedom, and have
also noted that in the calculation of standard deviation the denominator
should be $n-1$ and not $n$; but in case of large samples the use of $n$ in
place of $n-1$ does not involve any significant error and as such the
standard deviation in case of large samples is calculated by $\sqrt{\frac{\Sigma d^2}{n}}$
and the standard error by $\frac{\sigma}{\sqrt{n}}$. In case of small samples, however,
the use of $n$ in place of $n-1$ would make a significant difference and
as such the standard deviation in case of small samples is calculated
by $\sqrt{\frac{\Sigma d^2}{n-1}}$. After this the value of T is calculated in the usual
manner. In case of small samples we shall use "t" to denote the
relationship between the standard error of the mean and the difference
between the observed and actual means. Thus in small samples
$$t = \frac{x - m}{s}$$
where $x$ and $m$ stand respectively for the observed and actual
means and $s$ for the standard error of the mean, or $\frac{\sigma}{\sqrt{n}}$, where $\sigma$ has
been calculated by the formula $\sqrt{\frac{\Sigma d^2}{n-1}}$.
Thus
$$t = \frac{x - m}{\sqrt{\dfrac{\Sigma d^2}{n-1}} \Big/ \sqrt{n}} = \frac{(x - m)\sqrt{n}}{\sqrt{\dfrac{\Sigma d^2}{n-1}}} = \frac{x - m}{\sigma} \sqrt{n}$$
If the standard deviation has been calculated by the formula
$\sigma = \sqrt{\frac{\Sigma d^2}{n}}$, then the value of "t" can be written as
$$t = \frac{x - m}{\sigma} \sqrt{n - 1}$$
In case of large samples the value of t can directly tell us whether
the difference between actual and observed means is significant or not.
If the calculated value of t is more than 3, the difference is obviously
very significant; even if it is more than 1.96 the difference is significant
at the 5% level of significance. In case of small samples, since the
properties of normal distribution are not fully applicable, certain corrections*
have to be done in the table values of t.
Form of "t" distribution
The form of the t-distribution (or the table values of t), which is due
to "Student" (a famous mathematical statistician), is like a symmetrical
single humped curve and resembles a normal curve, only it is more
leptokurtic†. As in case of the $\chi^2$ distribution, here also the probability
p that on random sampling we shall get the value of t not greater than
some value, say $t_1$, is the area of the curve to the left of the ordinate at
the point $t_1$.
The following examples illustrate the significance test for a small
sample mean:
Example 1. Ten individuals are chosen at random from a
population and their heights are found to be, in inches, 63, 63, 64, 65, 66,
69, 69, 70, 70, 71. In the light of these data, discuss the suggestion that
the mean height in the universe is 65 inches.

Solution
Calculation of the average height and its standard deviation.

No. of          Height in inches    Deviation from       Square of
individuals                         the average (67)     deviation
                     (m)                 (d)              ($d^2$)
     1                63                  -4                 16
     2                63                  -4                 16
     3                64                  -3                  9
     4                65                  -2                  4
     5                66                  -1                  1
     6                69                  +2                  4
     7                69                  +2                  4
     8                70                  +3                  9
     9                70                  +3                  9
    10                71                  +4                 16

   n = 10        $\Sigma m = 670$                       $\Sigma d^2 = 88$
* This correction, which is given by the ratio of the standard error of
the small samples to the standard error of the true mean, is known and
is not to be calculated. It has been discovered and measured for all
sizes of small samples by mathematical statisticians and is available
in the shape of tables.
† See Chapter XI.

Average height of the sample

$$\bar{x} = \frac{\Sigma m}{n} = \frac{670}{10} = 67 \text{ inches}$$

Standard deviation of the sample is

$$\sigma = \sqrt{\frac{\Sigma d^2}{n-1}} = \sqrt{\frac{88}{10-1}} = 3.13 \text{ inches}$$

Substituting these values in the formula

$$t = \frac{\bar{x} - m}{\sigma}\sqrt{n}$$

where $\bar{x}$ represents the mean of the sample, $m$ the mean of the
universe, $\sigma$ the standard deviation of the sample and $n$ the
number of observations, we get

$$t = \frac{67 - 65}{3.13}\sqrt{10} = 2.02$$

The number of degrees of freedom is n - 1 = 9.
The value of t for 9 degrees of freedom at 5% level of significance
is 2.262. The calculated value of t is 2.02 and is less than the table
value, so at 5% level of significance this difference could have arisen
due to fluctuations of sampling, and the data are consistent with the
suggestion that the mean height in the universe is 65 inches.
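
The arithmetic of Example 1 can be checked with a short sketch. The
function name `one_sample_t` is only illustrative (it is not from the
text); the calculation follows the formula above, with the standard
deviation taken on the divisor n - 1.

```python
import math

def one_sample_t(values, m):
    """Return (t, degrees of freedom) for a sample mean tested against m."""
    n = len(values)
    mean = sum(values) / n
    # Standard deviation with divisor n - 1, as in the worked example.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    t = (mean - m) / sd * math.sqrt(n)
    return t, n - 1

heights = [63, 63, 64, 65, 66, 69, 69, 70, 70, 71]
t, df = one_sample_t(heights, 65)
print(round(t, 2), df)  # compare t with the table value 2.262 for 9 d.f.
```

The calculated t is then compared by hand with the table value for the
same number of degrees of freedom.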
We can interpret this result in terms of p also. On random sampling
the probability of getting a value of t numerically equal to or higher
than 2.02 for nine degrees of freedom is more than .05. When p = .1 and
degrees of freedom = 9 the value of t = 1.833. Thus the value of p in
the present problem is somewhere between .05 and .1. If the value of p
is .05 and t = 2.262 it means that the probability is 5%, or in only one
case out of 20 this difference could have arisen due to chance.
Similarly if p is .1 and t is 1.833 it means that the probability is
10%, or in only one case out of 10 such a difference could have arisen
due to chance. In the present example, it is in one case out of about
15 (between 10 and 20) that such a difference could have been due to
chance fluctuations. It means that we can say with a fair degree of
confidence that the difference may have arisen due to fluctuations of
sampling.
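
Where a table of t is not at hand, the probability p can be approximated
numerically. The sketch below is an assumption of this note rather than
the method of the text: it integrates the density of Student's
distribution by Simpson's rule and returns the two-sided probability
that matches the table values quoted above. The names `t_density` and
`two_sided_p` are illustrative.

```python
import math

def t_density(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Probability of a value numerically >= |t| (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1 - central

for t_value in (1.833, 2.02, 2.262):
    print(t_value, round(two_sided_p(t_value, 9), 3))
```

For 9 degrees of freedom the two table values should come out close to
.10 and .05, with the value for t = 2.02 lying between them.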
Example 2. Ten individuals are chosen at random from a population
and their heights are found to be, in inches, 63, 63, 66, 67, 68, 69,
70, 70, 71 and 71. In the light of these data discuss the suggestion
that the mean height in the population is 66 inches.

Solution

In the above question the value of the mean of the sample or $\bar{x}$
would come to 67.8 inches and the value of the standard deviation or
$\sigma$ would come to 3.011 inches.
As such the value of

$$t = \frac{\bar{x} - m}{\sigma}\sqrt{n} = \frac{67.8 - 66}{3.011}\sqrt{10} = 1.89$$

At 5% level of significance the value of t for 9 degrees of freedom
is 2.262. The calculated value is less than this and as such the
difference may be due to sampling fluctuations, and the data are
consistent with the suggestion that the mean height of the universe is
66 inches.
Example 3. A certain stimulus administered to each of 12 patients
resulted in the following increases of blood pressure: 5, 2, 8, -1,
3, 0, 6, -2, 1, 5, 0, 4. Can it be concluded that the stimulus will be
in general accompanied by an increase in blood pressure?

Solution

Calculation of the mean and the standard deviation of the sample

Increase in blood        Deviation from the       Square of deviation
pressure of 12 patients  assumed average (3)
        (m)                    (dx)                     (dx²)
         5                      +2                         4
         2                      -1                         1
         8                      +5                        25
        -1                      -4                        16
         3                       0                         0
         0                      -3                         9
         6                      +3                         9
        -2                      -5                        25
         1                      -2                         4
         5                      +2                         4
         0                      -3                         9
         4                      +1                         1

      n = 12                 Σdx = -5                 Σdx² = 107

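Mean of the sample

$$\bar{x} = a + \frac{\Sigma dx}{n} = 3 + \frac{-5}{12} = 2.58$$

where $a$ is the assumed average.

Standard deviation of the sample

$$\sigma = \sqrt{\frac{\Sigma dx^2}{n} - \left(\frac{\Sigma dx}{n}\right)^2} = \sqrt{\frac{107}{12} - \left(\frac{-5}{12}\right)^2} = 2.96$$

We take for our hypothesis the supposition that the stimulus does
not result in an increase in blood pressure, that is $m = 0$.
Now, substituting these values in the formula

$$t = \frac{\bar{x} - m}{\sigma}\sqrt{n-1}$$

we get

$$t = \frac{2.58 - 0}{2.96}\sqrt{12-1} = 2.89$$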

The number of degrees of freedom is 12 - 1 or 11. The value of t for
11 degrees of freedom at 5% level of significance is 2.201, which is
less than the calculated value of t. So the deviation is significant,
which means that the hypothesis is rejected. Hence it can be concluded
that the stimulus will be in general accompanied by an increase in
blood pressure.

Note: The original formula for the calculation of t is

$$t = \frac{\bar{x} - m}{\sigma}\sqrt{n}$$

In small samples, while calculating the standard deviation, the formula
used is $\sigma = \sqrt{\frac{\Sigma dx^2}{n-1}}$, but in this case we
have used the formula $\sqrt{\frac{\Sigma dx^2}{n}}$. In order to adjust
the difference we have modified the formula for the calculation of t by
using $\sqrt{n-1}$ instead of $\sqrt{n}$.
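
That the two routes described in the Note lead to the same value of t
can be verified with a small sketch. The function names are
illustrative only, and the data are the twelve increases in blood
pressure of Example 3.

```python
import math

def t_divisor_n_minus_1(values, m):
    """Standard deviation on divisor n - 1, multiplied by sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (mean - m) / sd * math.sqrt(n)

def t_divisor_n(values, m):
    """Standard deviation on divisor n, multiplied by sqrt(n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return (mean - m) / sd * math.sqrt(n - 1)

increases = [5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4]
print(t_divisor_n_minus_1(increases, 0), t_divisor_n(increases, 0))
```

Both functions give about 2.90; a hand calculation that first rounds the
mean and the standard deviation gives a figure very slightly lower.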

Example 4. Eleven school boys were given a test in drawing. They were
given a month's further tuition and a second test of equal difficulty
was held at the end of it. Do the marks give evidence that the students
have benefited by the extra coaching?

Boys      Marks        Marks
          1st Test     2nd Test
  1          23           24
  2          20           19
  3          19           22
  4          21           18
  5          18           20
  6          20           22
  7          18           20
  8          17           20
  9          23           23
 10          16           20
 11          19           17

Solution

Calculation of the mean and the standard deviation of the differences
between the marks of the two tests.

        Marks      Marks      Difference of marks      Deviation from    Square of
Boys    1st Test   2nd Test   (2nd Test - 1st Test)    the average (1)   deviation
                                      (m)                   (d)            (d²)
  1        23         24              +1                     0              0
  2        20         19              -1                    -2              4
  3        19         22              +3                    +2              4
  4        21         18              -3                    -4             16
  5        18         20              +2                    +1              1
  6        20         22              +2                    +1              1
  7        18         20              +2                    +1              1
  8        17         20              +3                    +2              4
  9        23         23               0                    -1              1
 10        16         20              +4                    +3              9
 11        19         17              -2                    -3              9

                                   Σm = 11                               Σd² = 50

Mean of the differences

$$\bar{x} = \frac{\Sigma m}{n} = \frac{11}{11} = 1$$

Standard deviation of the differences

$$\sigma = \sqrt{\frac{\Sigma d^2}{n-1}} = \sqrt{\frac{50}{11-1}} = 2.24$$

Now, let us proceed on the "Null Hypothesis," i.e., on the supposition
that the students have not been benefited by the extra coaching, that
is, the mean of the differences between the marks of the two tests is
zero, or $m = 0$.
Substituting these values in the formula

$$t = \frac{\bar{x} - m}{\sigma}\sqrt{n}$$

we get

$$t = \frac{1 - 0}{2.24}\sqrt{11} = 1.48$$

The number of degrees of freedom is 11 - 1 or 10.

The value of t for 10 degrees of freedom at 5% level of significance
is 2.228. The calculated value is considerably less than this value.
Hence the hypothesis is not rejected, and the marks do not give any
evidence that the students have benefited by the extra coaching.
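
The treatment of paired observations in Example 4 amounts to a
one-sample test on the differences. A minimal sketch, with the
illustrative name `paired_t` (not from the text):

```python
import math

def paired_t(first, second):
    """t for the mean of the paired differences (second - first) against zero."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean / sd * math.sqrt(n), n - 1

first_test = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
second_test = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 17]
t, df = paired_t(first_test, second_test)
print(round(t, 2), df)  # compare t with the table value 2.228 for 10 d.f.
```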

Example 5. Two laboratories carry out independent estimates of the
fat content of ice-cream made by a certain firm. A sample is taken from
each batch, halved, and the separate halves sent to the two
laboratories. They obtain the following results.

Percentage fat content in ice-cream

Batch No.   1    2    3    4    5    6    7    8    9    10
Lab. A      7    8    7    3    8    6    9    4    7     8
Lab. B      9    8    8    4    7    7    9    6    6     6

Is the testing reliable?


Solution
If the testing is reliable the mean of the differences in results
should not significantly differ from zero.
We shall calculate the mean and the standard deviation of the
differences of results.

Calculation of the mean and standard deviation of the differences of
results of the tests.

Difference of results    Deviation from mean    Square of deviation
(Lab. B - Lab. A)              (.3)
       (m)                      (d)                    (d²)
       +2                      +1.7                    2.89
        0                      - .3                     .09
       +1                      + .7                     .49
       +1                      + .7                     .49
       -1                      -1.3                    1.69
       +1                      + .7                     .49
        0                      - .3                     .09
       +2                      +1.7                    2.89
       -1                      -1.3                    1.69
       -2                      -2.3                    5.29

    Σm = +3                                       Σd² = 16.10

Mean of the differences is

$$\bar{x} = \frac{\Sigma m}{n} = \frac{3}{10} = .3$$

Standard deviation of the differences, or

$$\sigma = \sqrt{\frac{\Sigma d^2}{n-1}} = \sqrt{\frac{16.1}{10-1}} = 1.34$$

Taking the true mean as 0 and substituting the figures in the formula

$$t = \frac{\bar{x} - m}{\sigma}\sqrt{n}$$

we get

$$t = \frac{.3 - 0}{1.34}\sqrt{10} = .71$$

The value of t for 9 degrees of freedom at 5% level is 2.262. The
calculated value is much less than this, hence the difference is
insignificant and the testing can be regarded as reliable.

Significance of the difference between two sample means

To test the significance of the difference between two sample means
the value of t is calculated by the following formula:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of the samples, $n_1$
and $n_2$ the number of items in the two samples and $S$ the combined
standard deviation of the two samples. The value of $S$ is calculated
as follows:

$$S = \sqrt{\frac{\Sigma(x_1 - \bar{x}_1)^2 + \Sigma(x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}}$$

where $x_1$ and $x_2$ are the values of the various items in the two
samples.
It will be observed that this formula for the calculation of t is again
based on the formula for testing the significance of the difference of
two sample means when samples are large. The value of t is found by
obtaining the ratio between the actual difference between the two mean
values and the standard error of the difference of the two sample
means. As we have seen in earlier chapters, in the case of large
samples the standard error of the difference of two sample means is
equal to

$$\sqrt{\sigma_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \quad \text{or} \quad \sigma_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

In the case of small samples also the same calculations are done.
$\sigma_p$ is replaced by $S$, which is calculated in the manner
indicated above. In place of the sum of $n_1$ and $n_2$ the figure
which is used is 2 less than this, because the number of degrees of
freedom in such cases is equal to $n_1 + n_2 - 2$. The following
examples would illustrate the above formula:
Example 6. The figures below are for protein tests of the same
variety of wheat grown in two districts. In district 1 the average for
5 samples is 12.74, and in district 2 the average for 7 samples is
13.03. If these are the only figures available, test the significance
of the difference between the average proteins for the two districts.

Protein Results
District 1 ......  12.6  13.4  11.9  12.8  13.0
District 2 ......  13.1  13.4  12.8  13.5  13.3  12.7  12.4

Solution
Calculation of the standard error of the difference of the average
proteins for the two districts 1 and 2.

              District 1                               District 2
Protein   Deviation from   Square of the   Protein   Deviation from   Square of the
results   the average      deviation       results   the average      deviation
          (12.74)                                    (13.03)
 (x1)      (x1 - x̄1)       (x1 - x̄1)²       (x2)      (x2 - x̄2)       (x2 - x̄2)²
 12.6        -.14            .0196         13.1        +.07            .0049
 13.4        +.66            .4356         13.4        +.37            .1369
 11.9        -.84            .7056         12.8        -.23            .0529
 12.8        +.06            .0036         13.5        +.47            .2209
 13.0        +.26            .0676         13.3        +.27            .0729
                                           12.7        -.33            .1089
                                           12.4        -.63            .3969

         Σ(x1 - x̄1)² = 1.2320                      Σ(x2 - x̄2)² = .9943
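The difference between the average proteins of the two districts
is 13.03 - 12.74, or .29. The standard error of the difference is

$$S = \sqrt{\frac{\Sigma(x_1 - \bar{x}_1)^2 + \Sigma(x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{1.2320 + .9943}{5 + 7 - 2}} = \sqrt{.22263} = .472$$

Substituting these values in the formula

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

we get

$$t = \frac{12.74 - 13.03}{.472 \times \sqrt{\frac{1}{5} + \frac{1}{7}}} = -1.05$$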
The number of degrees of freedom is $n_1 + n_2 - 2 = 5 + 7 - 2 = 10$.
The value of t for 10 degrees of freedom at 5% level of significance
is 2.228. The calculated value is considerably less than this figure.
Hence it can be concluded that the average protein content of wheat
in the two districts does not differ significantly.
Note: ± signs have no significance while we compare the calculated
and the table values of t.
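
The two-sample calculation can be sketched as follows, using the protein
figures of Example 6. The name `pooled_t` is illustrative; the function
works from the unrounded means, so it gives a value of about 1.04 in
absolute terms, whereas a hand calculation with the means rounded to two
decimals gives about 1.05.

```python
import math

def pooled_t(sample1, sample2):
    """Return (t, degrees of freedom) using the combined standard deviation S."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

district1 = [12.6, 13.4, 11.9, 12.8, 13.0]
district2 = [13.1, 13.4, 12.8, 13.5, 13.3, 12.7, 12.4]
t, df = pooled_t(district1, district2)
print(round(t, 3), df)  # only the size of t matters against the table value
```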
Example 7. The heights of six randomly chosen sailors are, in
inches: 63, 65, 68, 69, 71 and 72. Those of ten randomly chosen
soldiers are: 61, 62, 65, 66, 69, 69, 70, 71, 72 and 73. Discuss the
light that these data throw on the suggestion that soldiers are, on an
average, taller than sailors.

Solution
Calculation of the standard error of the difference of the mean
heights of the sailors and the soldiers.

              Sailors                                  Soldiers
Height in   Deviation from   Square of     Height in   Deviation from   Square of
inches      av. height       deviation     inches      av. height       deviation
            (68 inches)                                (67.8 inches)
 (x1)        (x1 - x̄1)      (x1 - x̄1)²      (x2)        (x2 - x̄2)      (x2 - x̄2)²
  63            -5              25           61           -6.8            46.24
  65            -3               9           62           -5.8            33.64
  68             0               0           65           -2.8             7.84
  69            +1               1           66           -1.8             3.24
  71            +3               9           69           +1.2             1.44
  72            +4              16           69           +1.2             1.44
                                             70           +2.2             4.84
                                             71           +3.2            10.24
                                             72           +4.2            17.64
                                             73           +5.2            27.04

Σx1 = 408               Σ(x1 - x̄1)² = 60    Σx2 = 678          Σ(x2 - x̄2)² = 153.60
Mean height of the sailors

$$\bar{x}_1 = \frac{\Sigma x_1}{n_1} = \frac{408}{6} = 68 \text{ inches}$$

Mean height of the soldiers

$$\bar{x}_2 = \frac{\Sigma x_2}{n_2} = \frac{678}{10} = 67.8 \text{ inches}$$

Difference between the two mean heights is

$$\bar{x}_1 - \bar{x}_2 = 68 - 67.8, \text{ or } .2 \text{ inches}$$

The standard error of this difference is

$$S = \sqrt{\frac{\Sigma(x_1 - \bar{x}_1)^2 + \Sigma(x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{60 + 153.6}{6 + 10 - 2}} = \sqrt{15.26} = 3.9$$

Substituting these values in the formula

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

we get

$$t = \frac{.2}{3.9 \times \sqrt{\frac{1}{6} + \frac{1}{10}}} = .099$$

The number of degrees of freedom is
$n_1 + n_2 - 2 = 6 + 10 - 2$, or 14.
The value of t for 14 degrees of freedom at 5% level of significance
is 2.145. The calculated value is much less than this value. So the
difference between the mean heights of sailors and soldiers is not
significant. Hence the suggestion that soldiers are on the average
taller than sailors is not supported by these data.
Example 8. Eight pots growing three wheat plants each were exposed to
a high tension discharge, while nine similar pots were enclosed in an
earthed wire cage. The numbers of tillers in each pot were as follows:

Caged         17  26  18  25  27  28  26  23  17
Electrified   16  16  22  16  21  18  15  20

See whether electrification exercises any real effect on the tillering
by using the t test of significance.
Solution
Calculation of the standard error of the difference of the average
tillers of the two samples.

               Caged                                      Electrified
        Tillers   Deviation from   Square of         Tillers   Deviation from   Square of
Pots    in each   the average      deviation   Pots  in each   the average      deviation
        pot       tiller (23)                        pot       tiller (18)
         (x1)      (x1 - x̄1)      (x1 - x̄1)²          (x2)      (x2 - x̄2)      (x2 - x̄2)²
  1       17          -6              36         1     16          -2               4
  2       26          +3               9         2     16          -2               4
  3       18          -5              25         3     22          +4              16
  4       25          +2               4         4     16          -2               4
  5       27          +4              16         5     21          +3               9
  6       28          +5              25         6     18           0               0
  7       26          +3               9         7     15          -3               9
  8       23           0               0         8     20          +2               4
  9       17          -6              36

n1 = 9   Σ(x1) = 207   Σ(x1 - x̄1)² = 160      n2 = 8   Σ(x2) = 144   Σ(x2 - x̄2)² = 50
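Mean tiller of the caged sample

$$\bar{x}_1 = \frac{\Sigma(x_1)}{n_1} = \frac{207}{9} = 23$$

Mean tiller of the electrified sample

$$\bar{x}_2 = \frac{\Sigma(x_2)}{n_2} = \frac{144}{8} = 18$$

Difference between the two mean tillers is

$$\bar{x}_1 - \bar{x}_2 = 23 - 18 = 5$$

The standard error of this difference is

$$S = \sqrt{\frac{\Sigma(x_1 - \bar{x}_1)^2 + \Sigma(x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{160 + 50}{9 + 8 - 2}} = \sqrt{14} = 3.74$$

Substituting these values in the formula

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$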
We get

$$t = \frac{5}{3.74 \times \sqrt{\frac{1}{9} + \frac{1}{8}}} = 2.751$$

The number of degrees of freedom is

$$n_1 + n_2 - 2 = 9 + 8 - 2 = 15$$

The value of t for 15 degrees of freedom at 5% level of significance
is 2.131, which is less than the calculated value of t. Hence the
difference between the mean tillers of the two samples is significant,
which indicates that electrification does exert some effect on the
tillering.

Example 9. Two types of batteries are tested for their length of
life and the following results are obtained:

             No. in sample    Mean         Variance
Battery A         10          500 hours      100
Battery B         10          560 hours      121

Is there a significant difference in the two means?
Solution
On the hypothesis that the two samples are drawn from populations
with identical means and variance, the value of the combined standard
deviation, or

$$S \text{ or } \sigma_p = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(10 \times 100) + (10 \times 121)}{10 + 10 - 2}} = \sqrt{122.8} = 11.08$$

Substituting these values in the formula

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

we get

$$t = \frac{500 - 560}{11.08\sqrt{\frac{1}{10} + \frac{1}{10}}} = -12.1$$

For 18 degrees of freedom at 5% level of significance the value of
"t" is 2.1. The calculated value being numerically much higher than
this, the difference is highly significant.
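
When only the sample sizes, means and variances are available, as in
Example 9, the same test can be run from the summary figures. In the
sketch below the name `pooled_t_from_summary` is illustrative, and the
variances are assumed to have been computed on the divisor n, which is
what the formula of the solution implies.

```python
import math

def pooled_t_from_summary(n1, mean1, var1, n2, mean2, var2):
    """t from summary figures; var1 and var2 are assumed to use the divisor n."""
    s = math.sqrt((n1 * var1 + n2 * var2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = pooled_t_from_summary(10, 500, 100, 10, 560, 121)
print(round(t, 2), df)  # compare the size of t with the table value for 18 d.f.
```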

Significance of the coefficient of correlation

In the case of large samples the standard error of the coefficient of
correlation or r is given by the formula

$$\frac{1 - r^2}{\sqrt{n}}$$

In the case of small samples the standard error of the coefficient of
correlation is calculated in the same way as in the case of large
samples, with the difference that instead of $1 - r^2$, $\sqrt{1 - r^2}$
is used and instead of $\sqrt{n}$, $\sqrt{n-2}$ is used, because in the
calculation of the coefficient of correlation two degrees of freedom
are lost and therefore the number of degrees of freedom is n - 2. Thus
the standard error of the coefficient of correlation in small samples
is given by the formula:

$$\frac{\sqrt{1 - r^2}}{\sqrt{n-2}}$$

The value of t is calculated by finding out the ratio between the
coefficient of correlation and its standard error.
Thus

$$t = \frac{r}{\sqrt{1 - r^2} / \sqrt{n-2}} = \frac{\sqrt{n-2}}{\sqrt{1 - r^2}} \times r$$
The following examples would illustrate the above formula:
Example 10. Use the t test to find whether a correlation coefficient
of +.5 is significant if n = 51.
Solution
Substituting the values of r and n in the formula

$$t = \frac{\sqrt{n-2}}{\sqrt{1 - r^2}} \times r$$

we get

$$t = \frac{\sqrt{51 - 2}}{\sqrt{1 - (.5)^2}} \times .5 = 4.04$$

For 49 degrees of freedom at 5% level of significance the value
of t = 2.01. The calculated value being much higher than this, the
correlation is significant.
Example 11. A random sample of 15 from a normal universe gives a
correlation coefficient of -0.5. Is this significant of the existence
of correlation in the universe?

Solution
Substituting the values of r and n in the formula

$$t = \frac{\sqrt{n-2}}{\sqrt{1 - r^2}} \times r$$

we get

$$t = \frac{\sqrt{15 - 2}}{\sqrt{1 - (-0.5)^2}} \times (-0.5) = -2.08$$

The number of degrees of freedom is 15 - 2 or 13. The value of t for
13 degrees of freedom at 5% level of significance is 2.16. The
calculated value of t is numerically smaller than this, hence the
correlation of the sample is not significant enough to warrant the
existence of correlation in the universe.
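
The t test for a coefficient of correlation reduces to a single line of
arithmetic, sketched below with the illustrative name `correlation_t`.
Run on the figures of Examples 10 and 11 it gives about 4.04 and -2.08.

```python
import math

def correlation_t(r, n):
    """Return (t, degrees of freedom) for a sample correlation r from n pairs."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t, n - 2

print(correlation_t(0.5, 51))   # Example 10
print(correlation_t(-0.5, 15))  # Example 11
```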
Z-Transformation
Fisher has also given a method of testing the significance of the
coefficient of correlation in small samples. In this method the
coefficient of correlation in the sample or r is transformed into Z.
The value of Z is calculated by the formula

$$Z = \frac{1}{2}\log_e \frac{1+r}{1-r} = 1.1513\,\log_{10}\frac{1+r}{1-r}$$

* $\log_e$ denotes the natural or Napierian system of logs. The
relationship between the natural log and $\log_{10}$, which is
ordinarily used, is $\log_e x = 2.3026 \times \log_{10} x$.

The standard error of Z, or

$$\text{s.e.}_Z = \frac{1}{\sqrt{n-3}}$$

Similarly the coefficient of correlation in the universe or $\rho$ can
be transformed into $\zeta$. The value of $\zeta$ is calculated exactly
in the same manner as the value of Z; of course here the coefficient of
correlation of the universe is taken into account.
If the coefficient of correlation of the universe is not known it is
supposed to be 0, and then the value of $\zeta$ also becomes 0.
To test the significance of a coefficient of correlation the difference
between Z and $\zeta$ is calculated and the ratio of this difference to
the standard error of Z is then found out. If the difference is more
than thrice or twice (depending on the level of significance) the
standard error, it is supposed to be significant. The following
examples would illustrate the above rules:
Example 12. A correlation coefficient of 0.5 is discovered in a
sample of 19 pairs. Apply the Z test to find out if this is
significantly different
(a) from zero
(b) from 0.3
Solution
Substituting the given value of r in the formula

$$Z = \log_{10}\frac{1+r}{1-r} \times 1.1513$$

we get

$$Z = \log_{10}\frac{1 + .5}{1 - .5} \times 1.1513 = \log_{10} 3 \times 1.1513 = .4771 \times 1.1513 = .549$$

The standard error of Z is

$$\text{s.e.} = \frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{19-3}} = .25$$

Now if we assume the correlation in the universe, i.e., $\rho$, to be
zero then $\zeta$ is also equal to zero. The deviation of Z from
$\zeta$ is more than twice the standard error of Z. Hence the
hypothesis is rejected and we may conclude that the given correlation
is significantly different from zero.

Note: For the Z test the coefficient of correlation of a sample is
converted into Z by the formula
$Z = \log_{10}\frac{1+r}{1-r} \times 1.1513$. Similarly the coefficient
of correlation of the universe or population is converted into $\zeta$.
The formula used for the purpose is the same as above.

Now, if the supposition is that the correlation in the universe, i.e.,
$\rho = +0.3$, then

$$\zeta = \log_{10}\frac{1+\rho}{1-\rho} \times 1.1513 = \log_{10}\frac{1 + .3}{1 - .3} \times 1.1513 = \log_{10} 1.857 \times 1.1513 = .2688 \times 1.1513 = .309$$

The deviation of Z from $\zeta$ is, therefore,
0.309 - 0.549 = -0.240.
This is only about .96 times the standard error of Z, which is 0.25,
and so could have arisen from the fluctuations of sampling. Hence it
can be concluded that the given correlation coefficient is not
significantly different from +0.3.
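
The Z-transformation and the comparison with a supposed universe value
can be sketched as below. The names `fisher_z` and `z_ratio` are
illustrative; natural logarithms are used directly, which is equivalent
to multiplying the common logarithm by 1.1513.

```python
import math

def fisher_z(r):
    """Fisher's transformation: half the natural log of (1 + r) / (1 - r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_ratio(r, rho, n):
    """Difference between the transformed values, in units of the s.e. of Z."""
    standard_error = 1 / math.sqrt(n - 3)
    return (fisher_z(r) - fisher_z(rho)) / standard_error

print(round(fisher_z(0.5), 3))          # Z for r = 0.5
print(round(z_ratio(0.5, 0.0, 19), 2))  # (a) against zero
print(round(z_ratio(0.5, 0.3, 19), 2))  # (b) against 0.3
```

A ratio beyond about 2 (or 3, for the stricter level) is read as
significant, in line with the rule stated before Example 12.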

Significance of the difference between two sample coefficients of
correlation

The Z test can be used to test the significance of the difference of
two sample coefficients of correlation. The coefficients of correlation
are first transformed into Z by the formula already given and then the
standard error of the difference of the two Zs is calculated by the
following formula

$$\text{s.e.}_{Z_1 - Z_2} = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}$$

If the actual difference between the two Zs is more than thrice or
twice (depending on the level of significance) the standard error of
the difference, it is said to be significant, otherwise not.
The t test can also be applied at this stage. The value of t is equal
to

$$t = \frac{Z_1 - Z_2}{\text{s.e.}_{Z_1 - Z_2}}$$

The calculated value of t is then compared with the table value to
conclude whether the difference is significant or not.
The following example would illustrate the above formula:

Example 13. Apply the "Z" test to determine the significance of the
difference between two correlation coefficients given by
r1 = 0.6, r2 = 0.5
n1 = 9, n2 = 12
Solution
If r1 = 0.6,

$$Z_1 = \log_{10}\frac{1+r}{1-r} \times 1.1513 = \log_{10}\frac{1 + 0.6}{1 - 0.6} \times 1.1513 = \log_{10} 4 \times 1.1513 = .6021 \times 1.1513 = 0.693$$

Again, if r2 = 0.5,

$$Z_2 = \log_{10}\frac{1+r}{1-r} \times 1.1513 = \log_{10}\frac{1 + 0.5}{1 - 0.5} \times 1.1513 = \log_{10} 3 \times 1.1513 = .4771 \times 1.1513 = 0.549$$

The difference of Z1 and Z2 is
Z1 - Z2 = 0.693 - 0.549 = 0.144.

The standard error of this difference of Z1 and Z2 is

$$\text{s.e.}_{Z_1 - Z_2} = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}} = \sqrt{\frac{1}{9 - 3} + \frac{1}{12 - 3}} = .53 \text{ approx.}$$

The observed difference of the two Zs is only .27 times this standard
error and so could be attributed to sampling fluctuations. Hence the
difference between the two correlations is not significant.
We can also calculate t at this stage

$$t = \frac{Z_1 - Z_2}{\text{s.e.}_{Z_1 - Z_2}}$$

Substituting the calculated values we get

$$t = \frac{0.693 - 0.549}{.53} = .27$$

The number of degrees of freedom is $n_1 + n_2 - 6 = 9 + 12 - 6 = 15$.
The value of t for 15 degrees of freedom at 5% level of significance
is 2.131. The calculated value of t is much less than this figure.
Hence the difference is insignificant.
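
The comparison of two sample coefficients of correlation in Example 13
can be sketched in the same way. The name `compare_correlations` is
illustrative; the function returns the difference of the two Zs divided
by its standard error.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Ratio of Z1 - Z2 to the standard error of the difference."""
    z1 = 0.5 * math.log((1 + r1) / (1 - r1))
    z2 = 0.5 * math.log((1 + r2) / (1 - r2))
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error

ratio = compare_correlations(0.6, 9, 0.5, 12)
print(round(ratio, 2))  # well below 2, so the difference is not significant
```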
We have discussed above the tests of significance of the mean and the
coefficient of correlation only. Tests of significance of the standard
deviation and other measures relating to small samples are also
conducted in a similar fashion.
Questions
1. Why is the analysis of small samples done by rules different from those
with which large samples are analysed?
2. What is the effect of the smallness in the size of the sample on various
standard measures that are calculated in the analysis of samples?
3. How would you test the significance of the mean of a small sample?
4. What is meant by Z-transformation?
5. How would you test the significance of the difference of
(a) two sample means
(b) two coefficients of correlation in small samples?
6. Find "Student's" t for the following variate values in a sample of 10: -6,
-4, -3, -2, -2, 0, 1, 2, 3, 5; taking m to be zero.
7. A machine which produces mica insulating washers for use in electrical
devices is set to turn out washers having a thickness of 10 mils. (1 mil = 0.001 inch.)
A sample of washers has an average thickness of 9.52 mils. with a standard deviation
of 0.60 mil. Find out the significance of such a deviation.
8. The sleep of 10 patients was measured for the effects of two soporific drugs
referred to in the following table as (1) and (2). From the data given below show that
there is a significant difference between the effects of the two drugs.

Additional hours of sleep gained by use of soporifics (1) and (2).

                                      Difference
Patient       (1)         (2)         (2)-(1)
   1         + .7         1.9           1.2
   2         -1.6          .8           2.4
   3         - .2         1.1           1.3
   4         -1.2          .1           1.3
   5         - .1         -.1           0
   6          3.4         4.4           1.0
   7          3.7         5.5           1.8
   8           .8         1.6            .8
   9          0           4.6           4.6
  10          2.0         3.4           1.4
Mean          .75         2.33          1.58
S.D.         1.70         1.90          1.17
9. The following table shows the average yearly mortality in automobile
accidents per 100,000 population in 5 cities of a country for two three-year periods.
            Rate per 100,000      Rate per 100,000
City        1939-41 period        1950-52 period
 A              26.6                  28.5
 B              18.1                  21.3
 C              15.0                  17.7
 D              27.6                  35.8
 E              20.9                  25.4
Determine whether or not there was a significant increase in the mortality in
automobile accidents from the first period to the second.
10. A farmer grows crops on two fields, A and B. On A he puts Rs. 10 worth
of manure per acre and on B Rs. 20 worth. The net returns per acre, exclusive of the
cost of manure, on the two fields in five years are:
Year Field A, Rs. per acre Field B, R~. per acre
~1
1 34
2 28
3 42 49
4 37 38
5 44 50

Other things being equal, discuss the question whether it is likely to pay the
farmer to continue the more expensive dressing.
11. Why should there be different formulae for testing the significance of the
difference in means when the samples are (a) 'small' (b) 'large'?
The yields of two types, "Type 17" and "Type 51", of grams in pounds per acre
at 6 replications are given below. What comments would you make on the differences
in the mean yields? You may assume that if there be 5 degrees of freedom and
P = 0.2, t is approximately 1.476.
                Yield in pounds      Yield in pounds
Replication     "Type 17"            "Type 51"
1 20.50 24.86
2 24.60 26.39
3 2~06 28.19
4 29.98 30.75
5 30.37 29.97
6 23.83 22.04
(I. A. S., 1951).

12. Find out the reliability of the sample mean of the following data:
Breaking strength of 10 specimens of 104-inch diameter hand-drawn copper
wire.
Specimen        Breaking strength
                in pounds
2 572
3 750
4 568
5 572
6 570
7 570
8 572
9 596
10 584
13. How can the "t" test be applied for testing the significance of the difference
between two sample means? Calculate the value of t in the case of two characters
A and B whose corresponding values are given below:
A 41, 49, 34, 36, 49, 50, 36, 20, 18
B 46, 44, 30, 35, 26, 28, 29

14. The ash content of coal from two different mines was analysed, five analyses
being made of the coal from the first mine, four of that from the second mine. Are
we justified in supposing that the two mines consist of coal with the same percentage
of ash content on the basis of the results obtained, which are recorded in the
following tables?

Table A          Table B
per cent         per cent
ash content      ash content
  24.3             18.2
  20.8             16.9
  23.7             20.2
  21.3             16.7
  17.4
15. Is there a significant difference in the strength of lead in two number
2 pencils manufactured by R. G. T. Guteba & Co.?
Pencil (a)                     Pencil (b)
Test    Strength in            Test    Strength in
        kilograms                      kilograms
1       1.60                   1       1.78
2 1.72 2 1.48
3 1.68 3 1.72
4 1.50 4 1.62
Total 6.50 Total 6.60
16. Below are given the gains in weight (pounds) of hogs fed on two different
diets. Twelve animals were fed on diet A, 15 on diet B. Is either diet superior?
Gains in weight on diet A: 25,32, 30, 34, 24, 25, 14, 32, 2.~, 30, 31, 35
Gains in weight on diet B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35,
18, 21, 35, 29, 22
17. The means of two random samples of sizes 9 and 7 respectively are 196.42
and 198.82 respectively. The sums of the squares of the deviations from the mean are
26.94 and 18.73 respectively. Can the samples be considered to have been drawn
from the same normal population?

d.f.      5% value of t      1% value of t
 13          2.160              3.012
 14          2.145              2.977
 15          2.131              2.947
 16          2.120              2.921
(B. A. and B. Sc., Lucknow, 1940)
18. Mitchell conducted a paired feeding experiment with pigs on the relative
value of limestone and bonemeal for bone development. The results are given below.
Ash content in percentage of pairs of scapulas of pigs fed on Limestone and
Bonemeal.
Pair      Limestone      Bonemeal
 1          49.2           51.5
 2          53.3           54.9
 3          50.6           52.2
 4          52.0           53.3
 5          46.8           51.6
 6          50.5           54.1
 7          52.1           54.2
 8          53.0           54.3
Mean        50.94          53.14
Determine the significance of the difference between the means in two ways:
(1) by assuming that the values are paired, and (2) by assuming that the values are not
paired. On the basis of your results, discuss the effect of pairing.
19. Two yarns spun to the same count are tested for their strength. The
following results are obtained:
No. in Sample Sample mean Sample Standard Dev.
Yarn A 9 42 7.5
Yarn B ,4 50 6.5
Is the difference in mean strength significant?
20. Apply the t test to find whether correlation is significant, if r = .6 and n = 38.
21. A correlation coefficient of +.5 is discovered in a sample of 19 pairs. Is
it significant? Use the t test.
22. A random sample of 15 from a normal universe gives a correlation of +.5.
Use the 'Z' test to find if it is significant of correlation in the universe.
23. The correlation coefficient between mathematics aptitude and language
aptitude for a group of 20 boys is 0.42. For a group of 25 girls the correlation is
0.75. Is the difference significant?
24. In a test examination given to two groups of students, the marks obtained
were as follows:
First Group: 18, 20, 36, 50, 49, 36, 34, 49, 41.
Second Group: 29, 28, 26, 35, 30, 44, 46.
Examine the significance of the difference between the arithmetic averages of the
marks secured by the students of the above two groups. (P. C. S., 1951).
25. What is Fisher's t-test for small samples? Explain, giving the suitable
formula, how this test can be applied for testing the significance of the difference
between two sample means. Hence calculate the value of t in the case of two
characters A and B whose corresponding values are given below:
A 16, 10, 8, 9, 9, 8.
B 8, 4, 5, 9, 12, 4. (P. C. S., 1952).

26. What is Student's t-distribution? Explain its use in small samples.

Ten individuals are chosen at random from a population and their heights are
found to be, in inches, 63, 63, 66, 67, 68, 69, 70, 70, 71, 71. In the light of these data,
discuss the suggestion that the mean height in the universe is 66 inches.
Given that, for 9 degrees of freedom,
P = .947 for t = 1.8
P = .955 for t = 1.9
(P. C. S., 1954).
27. What is the t-test in small sample theory? How does Student's t differ from
Fisher's t?
A group of seven-week old chickens, reared on a high protein diet, weigh
12, 15, 11, 16, 14, 14, 16 ounces; a second group of 5 chickens, similarly treated
except that they receive a low protein diet, weigh 8, 10, 14, 10, 13 ounces. Calculate
the value of t and test whether there is significant evidence that additional protein has
increased the weight of the chickens.
Values of 't' on levels of significance 'P'
Degrees of freedom      P = 0.05      P = 0.10
       10                 2.23          1.81
       11                 2.20          1.80
       12                 2.18          1.78
(P. C. S., 1956).
28. A set of 15 observations gives mean = 68.57, standard deviation = 2.40;
another of 7 observations gives mean = 64.14, standard deviation = 2.70.
Use the t-test to find whether the two sets of data were drawn from populations
with the same mean, it being assumed that the standard deviations in the two
populations were (a) equal, (b) not equal. (I. A. S., 1952).
29. Define Student's t and write down (without proof) its sampling
distribution.
For a random sample of 10 pigs, fed on diet A, the increases in weight in pounds
in a certain period were:
10, 6, 16, 17, 13, 12, 8, 14, 15, 9.
For another random sample of 12 pigs, fed on diet B, the increases in the same
period were:
7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17.
Test whether diets A and B differ significantly as regards their effects on increase
in weights. You may use the following extract from statistical tables:
Degrees of freedom      Values of t significant at
                        5 per cent level
                        of probability
       19                     2.09
       20                     2.09
       21                     2.08
       22                     2.07
       23                     2.07
(I. A. S., 1954).
30. The following data relate to the stature and shoulder height of 10 boys
taken once in the morning at 6 a.m. and again in the night at about 10 p.m. on the same
day. Examine by an appropriate test whether the morning averages differ significantly
from the night averages. It is believed that certain portions increase in length in a
relaxed position of the body so that the measurements in the morning immediately
after we get up from bed would have higher values. Can you indicate which portion
of the body, if any, does so on the basis of the measurements provided?

Serial No.         Stature               Shoulder height
of boys       Morning    Night         Morning    Night

1 177.3 176.1 164.6 164.9


2 165.9 165.1 154.2 154.3
3 174.1 173.1 161.1 161.6
4 170.4 169.3 158.8 158.2
5 166.6 165.9 153.0 152.9
6 16'7.9 161.4 156.8 156.8
7 163.3 162.6 152.3 151.9
I i60.... 159.7 146.2 146.8
9 lSi.• 157.9 147.4- 146.9
10 160.7 159.9 148.5 148.0

(C. A. S •• 1957).
31. A certain stimulus is to be tested for its effect on blood pressure. Twelve
men have their blood pressure measured before and after the stimulus. The results
are as follows.
Man    Before    After
  1     120       128
  2     124       1[??]
  3     130       131
  4     118       127
  5     140       132
  6     128       125
  7     140       141
  8     135       137
  9     126       118
 10     130       132
 11     126       129
 12     127       135

Is there reason to believe that the stimulus would, on the average, raise
blood pressure five points ?
32. What assumptions are made about the population when an experiment is
carried out and analyzed by use of a t test?
33. If it is not necessary to pair observations and you decide to pair, do
you have a different level of significance than if you did not pair? Discuss the
effect of pairing on your chance of making a correct inference both in the case where
you should pair and in the case where it is not necessary.
34. Suppose that 10 samples of 9 observations each have variances as follows :
23.5 30.6 29.3 27.5 27.5
26.3 29.8 30.7 22.3 26.5
Is there reason to doubt, at the 5 per cent. level of significance, that these samples
are from populations having equal variances ?
35. A fertilizer mixing machine is set to give 10 pounds of nitrate for every
100 pounds of fertilizer. Ten 100 pound bags are examined. The percentages of
nitrate are as follows :
9, 12, 11, 10, 11, 9, 11, 12, 9, 10
Is there reason to believe that the mean is not equal to 10 per cent. ?

36. The following data give paired yields of two varieties of wheat. Each
pair was planted in a different locality. Test the hypothesis that the mean yields are
equal. Find a 90 per cent. confidence interval for the difference in the mean yields.
I     45 32 58 57 60 38 47 51 42 38
II    47 34 60 59 63 44 49 53 46 41
Explain why pairing is necessary in this problem.
37. A group of 50 boys and a group of 50 girls were given a test in arranging
different shaped blocks. The times are recorded in the frequency table below. Analyse
the data. If this was an industrial experiment, give a 90 per cent. confidence
interval for the mean saving in time if girls perform the task rather than boys.

40 51
Frequency of girls
Frequency of boys 1
Analysis of Variance 30
By now the reader would be fairly familiar with the methods of calculation and use of standard deviation in various types of studies relating to dispersion, correlation, regression and sampling errors. In the chapter on Measures of Dispersion it was pointed out that the standard deviation is the square root of the arithmetic average of the squared deviations taken from the mean. It was also mentioned there that the square of the standard deviation, or the arithmetic average of the squares of deviations taken from the mean, is called "Variance." This name was first used by R. A. Fisher, who developed very elaborately its theory and uses. Variance is in many cases a better and more convenient measure of variation than the standard deviation, and in the present chapter we shall discuss some of the simple methods by which the variance of a set of series is analysed.
In the previous three chapters we discussed the methods of determining whether two samples have come from the same universe or from two universes which are significantly different from each other. One of the methods by which this study is done is the calculation of the standard error of the difference of the means of the two samples; another method is that of the χ² test; and in the case of small samples the method that we follow is that of the t test. Here we shall discuss a method which measures the significance of the difference between several means at one time. This method is the analysis of variance.
We know that variance or
$$V = \sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$
where $x$ stands for the values of individual items, $\bar{x}$ for the mean of the series and $n$ for the total number of items.
In case of small samples, however, we mentioned in the chapter on small samples that the standard deviation should be calculated by dividing the sum of the squares of the deviations between the values of items and the mean value by the degrees of freedom rather than by the total number of items. Degrees of freedom in such cases are equal to $n - 1$. In other words, in small samples, standard deviation or
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
and variance or
$$V = \frac{\sum (x - \bar{x})^2}{n - 1}$$
We shall now illustrate how the total variance of a series can be divided into components and analysed in detail. Suppose four samples of five items each have been taken from a universe and they give different values of the mean. In these four samples there are in all twenty items. Now if we wish to satisfy ourselves that the samples in reality have come from the same universe we must measure the divergence between their means and the mean of the universe or the combined mean of all the four samples. If there is a significant divergence between the combined mean and the means of the samples then we shall think that the samples have not come from the same universe. Here there are two difficulties. One is that there are four sample means and we have to test them for significance of difference at one and the same time. The second difficulty is that the variation in the values of these twenty items may be due not only to the fact that the samples differ from each other but also to the fact that within each sample the items differ from each other. Thus there are two types of variation in the data: one between the various samples and the other within the various samples.
Now if the variations within the samples and between the samples are not significantly different from each other then the samples belong to the same universe, because variation between the samples is just like variation within the samples. In such a case if we had taken only one sample of 20 items it would not have made any difference so far as variation of items is concerned. If, however, the variation between samples is much greater than the variation within the samples, it means that the samples come from different types of universes; otherwise there would not have been a significant difference in the variations between the samples and within the samples. In the analysis of variance, therefore, we find out the relationship of the variation between the samples and the variation within the samples.
Suppose the values of the items in the four samples were as follows :

         Sample 1   Sample 2   Sample 3   Sample 4
             4          6         12         10
             6          8         16         10
             2          6         14         10
             6         10          8          6
             2          0         20          4
Total       20         30         70         40
Mean         4          6         14          8

Total number of items in the samples or
N = 20
The grand total of the values of all the items of all the samples or
T = 20 + 30 + 70 + 40 = 160
The grand mean of all the items of all the samples
= 160/20 or 8
We shall now study the total variation of these 20 items and we shall also study the variance between the samples and within the samples.
Total variation
Here we shall find out the squares of the deviations of each of these 20 items from the grand average.
Squares of the deviations of various items from the grand average of 8.

         Sample 1   Sample 2   Sample 3   Sample 4
            16          4         16          4
             4          0         64          4
            36          4         36          4
             4          4          0          4
            36         64        144         16
Total       96         76        260         32

Grand total of squares
= 96 + 76 + 260 + 32 = 464
Degrees of freedom
= 20 - 1 = 19
Variance between the samples
Here we shall presume that each item of a sample is equal to the average of that sample and then study the variance between different samples. In other words, we shall calculate the squares of the deviations of the means of the various samples from the grand average. If the value of each item in the first sample is taken as 4, in the second sample as 6, in the third sample as 14 and in the fourth sample as 8, and if the squares of the deviations of these values from the grand average are calculated, they would be as follows :

         Sample 1   Sample 2   Sample 3   Sample 4
            16          4         36          0
            16          4         36          0
            16          4         36          0
            16          4         36          0
            16          4         36          0
Total       80         20        180          0

Sum of the squares between the samples is
80 + 20 + 180 + 0 = 280
Variance between the samples is
280/(4 - 1) = 93.3
(because there are four samples and the degrees of freedom are 4 - 1).
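Variance within the samples
Here we shall find out the squares of the deviations of each item from the mean of its own sample.

         Sample 1   Sample 2   Sample 3   Sample 4
             0          0          4          4
             4          4          4          4
             4          0          0          4
             4         16         36          4
             4         36         36         16
Total       16         56         80         32

Grand total of the sum of squares
= 16 + 56 + 80 + 32 = 184
Variance within the samples
= 184/(20 - 4) = 11.5
(because there are twenty items in four samples and the degrees of freedom are 20 - 4).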
All these results can be tabulated as follows :

Source of Variation    Sum of Squares    Degrees of freedom    Variance
Between Samples             280                  3             280/3  = 93.3
Within Samples              184                 16             184/16 = 11.5
Total                       464                 19

It will be observed that the sum of the squares between the sam-
ples and within the samples is equal to the total of the squares of all
the items from the grand average. Similarly the degrees of freedom in
the total variation is equal to the sum of the degrees of freedom between
the samples and within the samples.
Now if we presume that the difference in the variance between the samples and within the samples is insignificant, we shall be proceeding on the null hypothesis. We shall then have to find out the extent of difference in the variance between the samples and within the samples which can be ignored as arising due to fluctuations of sampling, and on that basis we shall draw our conclusion as to whether the difference in the present problem is significant or not. In the analysis of variance we do not study the absolute difference in the variance between the samples and within the samples but the ratio of these variances is obtained. In the above problem the ratio of variances or
$$F = \frac{93.3}{11.5} = 8.1$$
Now if we look at Snedecor's table for the value of F for the given degrees of freedom (3 for the greater variance and 16 for the smaller) at 5% level of significance it is 3.24. The calculated value of F is higher than this and as such the difference is significant. It means then that the variance between the samples is significantly greater than the variance within the samples. In other words, such samples could not have come from the same universe or the mean values of various samples are significantly different from each other.
It should be remembered that
$$F = \frac{\text{Variance between samples}}{\text{Variance within samples}}$$
Ordinarily variance between samples would be greater than variance within samples. If the case is reverse and the variance between the samples is less than the variance within the samples the position of the numerator and the denominator should be interchanged and conclusions drawn accordingly, but this will happen very rarely.
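The direct method worked through above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `one_way_anova` and the dictionary keys are assumptions made here, not part of the text, and the data are the four samples of the example.

```python
def one_way_anova(samples):
    """Direct method: sums of squared deviations, then the variance ratio F."""
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # Squared deviations of every item from the grand average
    ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)
    # Every item replaced by its own sample mean
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Squared deviations of every item from its own sample mean
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    f = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f,
    }


samples = [
    [4, 6, 2, 6, 2],       # Sample 1
    [6, 8, 6, 10, 0],      # Sample 2
    [12, 16, 14, 8, 20],   # Sample 3
    [10, 10, 10, 6, 4],    # Sample 4
]
result = one_way_anova(samples)
print(result)
```

For these figures the sketch gives sums of squares of 464, 280 and 184 with 3 and 16 degrees of freedom, and F of about 8.1, which agrees with the table above.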
Short-cut method
The method of calculating the ratio of variance between the samples and within the samples as discussed above is a very long one. A short-cut method can also be used to calculate the value of F. The problem solved above by the direct method has been solved below by the short-cut method :

         Sample 1   Sample 2   Sample 3   Sample 4
             4          6         12         10
             6          8         16         10
             2          6         14         10
             6         10          8          6
             2          0         20          4
Total       20         30         70         40

The grand total or T is
= 20 + 30 + 70 + 40 = 160

We shall now compute what is called "The Correction Factor."
The Correction Factor
$$\frac{T^2}{N} = \frac{160 \times 160}{20} = 1280$$
The next step is to find out the squares of all the items (not of their deviations from the mean as in the previous case). They would be as follows :

         Sample 1   Sample 2   Sample 3   Sample 4
            16         36        144        100
            36         64        256        100
             4         36        196        100
            36        100         64         36
             4          0        400         16
Total       96        236       1060        352

Grand total of the sum of squares
= 96 + 236 + 1060 + 352 = 1744
Total sum of squares is obtained by subtracting the correction factor from the grand total of squares. Thus
Total sum of squares
= 1744 - 1280 = 464
It will be observed that it is equal to the figure obtained by the first method.
The sum of the squares between the samples is obtained by finding out the sum of the squares of the sample totals and dividing it by the number of items which make up each sample total and then subtracting from it the correction factor.
Thus the sum of squares between the samples
$$= \frac{20^2 + 30^2 + 70^2 + 40^2}{5} - 1280$$
$$= \frac{400 + 900 + 4900 + 1600}{5} - 1280$$
$$= \frac{7800}{5} - 1280$$
$$= 1560 - 1280 = 280$$
This figure is also the same as obtained by the first method.
The sum of squares within the samples is found out by subtracting the sum of the squares between the samples from the total sum of squares. Thus the sum of squares within the samples
= 464 - 280 = 184
This figure is also the same as obtained in the previous case.

When these three figures have been obtained the table of analysis of variance can be set up exactly in the same manner as done in the previous case and the value of F can be calculated.
The variance ratio or F has a very important property that its value remains unchanged if all the figures are either multiplied or divided by a common factor or if a common figure is added to or subtracted from each figure.
Thus if in a problem the figures are big or otherwise inconvenient they can be reduced in magnitude either by division or by subtraction of a common figure. The value of the variance ratio would remain unaffected. If in the above example all the values are multiplied by 10 or divided by 2, or if 5 is added to all the values or if 3 is deducted from all the values, the value of F would remain unaffected.
Suppose we added 1 to all the values. They would then be :

         Sample 1   Sample 2   Sample 3   Sample 4
             5          7         13         11
             7          9         17         11
             3          7         15         11
             7         11          9          7
             3          1         21          5
Total       25         35         75         45

Grand total or T = 180
$$\text{Correction factor or } \frac{T^2}{N} = \frac{180 \times 180}{20} = 1620$$
The squares of the above items would be

         Sample 1   Sample 2   Sample 3   Sample 4
            25         49        169        121
            49         81        289        121
             9         49        225        121
            49        121         81         49
             9          1        441         25
Total      141        301       1205        437

Grand total of sum of squares
= 141 + 301 + 1205 + 437 = 2084
Total sum of squares
= 2084 - 1620 = 464
Sum of squares between the samples
$$= \frac{25^2 + 35^2 + 75^2 + 45^2}{5} - 1620$$
$$= \frac{625 + 1225 + 5625 + 2025}{5} - 1620$$
$$= \frac{9500}{5} - 1620$$
$$= 1900 - 1620 = 280$$
Sum of squares within the samples
= 464 - 280 = 184
It will thus be observed that we have obtained the same figures for the sum of squares and consequently the value of F would remain unchanged. If we multiply or divide all the figures by a constant, the value of the sum of the squares may not be the same but the value of F would remain unchanged. If in this example all the original figures are divided by 2 the total sum of squares would be 116, the sum of squares between the samples would be 70 and the sum of squares within the samples would be 46. The value of F would then be 70/3 ÷ 46/16 or 8.1. This is the value which we obtained from the original series also. Thus the division, multiplication, addition or subtraction of a common figure does not affect the value of F.
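The property stated above can be checked numerically. The following is a minimal sketch, assuming a small helper `f_ratio` (a name chosen here for illustration) and the four samples of the worked example; it compares F for the original figures with F after adding 1, dividing by 2 and multiplying by 10.

```python
def f_ratio(samples):
    """Variance ratio F = variance between samples / variance within samples."""
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ss_between / (len(samples) - 1)) / (ss_within / (n_total - len(samples)))


original = [
    [4, 6, 2, 6, 2],
    [6, 8, 6, 10, 0],
    [12, 16, 14, 8, 20],
    [10, 10, 10, 6, 4],
]
variants = {
    "original": original,
    "plus 1": [[x + 1 for x in s] for s in original],
    "divided by 2": [[x / 2 for x in s] for s in original],
    "times 10": [[x * 10 for x in s] for s in original],
}
f_values = {label: f_ratio(data) for label, data in variants.items()}
for label, value in f_values.items():
    print(label, round(value, 2))
```

The sums of squares change when the figures are scaled, but both the numerator and the denominator of F change by the same factor, so the ratio is the same in every case.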
The first method of the calculation of the value of F is applicable in all types of cases. The short-cut method in the form discussed above is applicable only to those cases where the number of items in all the samples is equal. If the number of the items in various samples is unequal some modification is necessary in the method of calculating the sum of squares between the samples. The method that should be followed to calculate the sum of squares in such cases is as follows :
(1) Find out the square of each sample total.
(2) Divide the square of each sample total by the number of items in the sample.
(3) Find the total of the quotients thus obtained from individual samples.
(4) Subtract the correction factor from this figure and the resulting figure would be the sum of the squares between the samples.
The sum of the squares within the samples and the total sum of squares can be calculated by the short-cut method already discussed.
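The four steps above can be sketched as a short Python function. This is only an illustration: the name `shortcut_sums_of_squares` is an assumption, and the unequal samples used at the end are hypothetical figures made up for the sketch, not data from the text.

```python
def shortcut_sums_of_squares(samples):
    """Short-cut method; each sample may have a different number of items."""
    n_total = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)
    correction = grand_total ** 2 / n_total          # T^2 / N

    # Total sum of squares: squares of all items less the correction factor
    total_ss = sum(x * x for s in samples for x in s) - correction
    # Steps (1) to (4): (sample total)^2 / (items in sample), summed, less T^2/N
    between_ss = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    within_ss = total_ss - between_ss
    return total_ss, between_ss, within_ss


# Equal samples of the worked example
equal = [[4, 6, 2, 6, 2], [6, 8, 6, 10, 0], [12, 16, 14, 8, 20], [10, 10, 10, 6, 4]]
print(shortcut_sums_of_squares(equal))

# Hypothetical unequal samples, assumed only for illustration
unequal = [[4, 6, 2], [6, 8, 6, 10], [12, 16]]
print(shortcut_sums_of_squares(unequal))
```

When every sample has the same number of items, dividing each squared total by that number is the same as dividing their sum by it, so the function reduces to the calculation shown earlier.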
It would be observed that when the values of items obtained in various samples are available a part of the total variation is due to variation within the samples and a part of it is due to variation between samples. If there was no variation within the samples, which means that all the values within each sample were identical, then the entire variation would be between the samples. Similarly if there was no variation between the samples so that the mean values of various samples were identical then the entire variation would be within the samples. In actual practice both these variations are found together and in the analysis of variance an attempt is made to measure these variations separately.
The study of analysis of variance is extremely useful in various types of experiments in the field of economics, sociology, education, psychology, etc. In the field of agricultural experiments also the analysis of variance is extremely useful. By the analysis we can find out whether the results of various types of experiments significantly differ from each other. We give below an example which illustrates the usefulness of the analysis of variance in the field of agriculture :

The following table gives the results of experiments on four varieties of a crop in 5 blocks of plots.

                        Block
Variety      1      2      3      4      5
   A        32     34     33     35     37
   B        34     33     36     37     35
   C        31     34     35     32     36
   D        29     26     30     28     29

Prepare the table of analysis of variance to test the significance of difference between the yields of the four varieties.
Solution
Calculation of the variance within the varieties and between the varieties.
The columns of the table are : (1) variety ; (2) number of the block ; (3) yield of the block ; (4) mean block yield of each variety ; (5) deviation of the block yield of each variety from its mean block yield ; (6) squares of (5) ; (7) deviation of the mean block yield of each variety from the mean block yield of all varieties (32.8) ; (8) squares of (7).

  (1)      (2)     (3)     (4)      (5)      (6)      (7)      (8)
Variety   Block   Yield   Mean     Dev.    Square    Dev.    Square
   A        1      32              -2.2     4.84
            2      34              -0.2     0.04
            3      33     34.2     -1.2     1.44     +1.4     1.96
            4      35              +0.8     0.64
            5      37              +2.8     7.84
   B        1      34              -1.0     1.00
            2      33              -2.0     4.00
            3      36     35.0     +1.0     1.00     +2.2     4.84
            4      37              +2.0     4.00
            5      35               0.0     0.00
   C        1      31              -2.6     6.76
            2      34              +0.4     0.16
            3      35     33.6     +1.4     1.96     +0.8     0.64
            4      32              -1.6     2.56
            5      36              +2.4     5.76
   D        1      29              +0.6     0.36
            2      26              -2.4     5.76
            3      30     28.4     +1.6     2.56     -4.4    19.36
            4      28              -0.4     0.16
            5      29              +0.6     0.36
Total                                       51.20             26.80

From the above table,
the sum of squares of deviations for within the varieties
= 51.20
and the sum of squares of deviations for between the varieties
= 26.8 × 5 = 134.0
The number of degrees of freedom for within the varieties is
n1 = k (n - 1) = 4 (5 - 1) = 16
and the number of degrees of freedom for between the varieties is
n2 = (k - 1) = 4 - 1 = 3.
With the help of the above data, the table of the analysis of variance is set up as follows :

Source of variation    Sum of squares    Degrees of freedom    Variance (col. 2 ÷ col. 3)
       (1)                  (2)                 (3)                     (4)
Within varieties            51.2                 16                      3.20
Between varieties          134.0                  3                     44.67
Total                      185.2                 19

From the above table
$$F = \frac{\text{Variance between varieties}}{\text{Variance within varieties}} = \frac{44.67}{3.20} = 13.96$$
The value of F at 5% level of significance for the given degrees of freedom is only 3.24. It indicates that the difference in yield due to varieties is significant.
Thus by the analysis of variance we are in a position to conclude that there is a significant difference in the yields of various varieties of wheat.
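The arithmetic of this example can be verified with a short Python sketch. The variable names are assumptions made for illustration; the yields are those of the four varieties in the five blocks, and the 5% table value of 3.24 is the one quoted in the text for 3 and 16 degrees of freedom.

```python
yields = {
    "A": [32, 34, 33, 35, 37],
    "B": [34, 33, 36, 37, 35],
    "C": [31, 34, 35, 32, 36],
    "D": [29, 26, 30, 28, 29],
}

n_total = sum(len(v) for v in yields.values())
grand_mean = sum(sum(v) for v in yields.values()) / n_total
variety_means = {name: sum(v) / len(v) for name, v in yields.items()}

# Deviations of each block yield from the mean of its own variety
within_ss = sum(
    (x - variety_means[name]) ** 2 for name, v in yields.items() for x in v
)
# Deviations of each variety mean from the grand mean, once per block
between_ss = sum(
    len(v) * (variety_means[name] - grand_mean) ** 2 for name, v in yields.items()
)

df_within = n_total - len(yields)    # 4 (5 - 1) = 16
df_between = len(yields) - 1         # 4 - 1 = 3
f_value = (between_ss / df_between) / (within_ss / df_within)

TABLE_F_5_PERCENT = 3.24  # value quoted in the text for 3 and 16 degrees of freedom
print(round(within_ss, 2), round(between_ss, 2), round(f_value, 2))
print("significant at 5%:", f_value > TABLE_F_5_PERCENT)
```

The sketch treats the layout as a one-way classification by variety, exactly as the solution above does; it does not separate out any variation between blocks.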
Table values of F
R. A. Fisher and George W. Snedecor have worked out the limits of the chance sample occurrences of the variance ratio for various combinations of the degrees of freedom. These ratios (or the table values of F) at various levels of significance are available in tables prepared by them. By comparing the observed values of F with the table values we can conclude whether the difference between the samples could have arisen due to chance fluctuations. If the results exceed the table value at p = .05 they are said to be significant and if they exceed the table value at p = .01 they are said to be highly significant. The higher the calculated value of F is above the table value the more definite can one be about the dependability of his conclusions.

Questions
1. Define variance. How is it related to the standard deviation? Discuss
its utility and usefulness in various types of statistical analysis.
2. What do you understand by
(a) Variance between samples
(b) Variance within samples.
3. What is the relationship of variance between the samples and variance
within the samples?
4. What is the meaning of F coefficient? How is it computed?
5. Indicate the usefulness of the study of analysis of variance in various fields
of economic activity.
6. The figures given below are yields in bushels per acre of 6 plots of
gram. Three of these plots are of variety A and three of variety B.
A 30 32 22
B 20 18 16

Set up a table of analysis of variance and calculate F to find out the significance
of difference between the yields of two varieties.
7. The following table gives the yields of four plots each of three varieties
of wheat. Find out if the variety differences are significant.

Variety        Plot yields         Total
   A       26   27   28   31        112
   B       18   19   21   22         80
   C       16   18   19   19         72
Total                               264

8. Set up a table of analysis of variance for yield of three strains of wheat
planted in five randomised blocks.
Strains          Blocks
   A       20   21   23   16   20
   B       18   20   17   15   25
   C       25   28   22   28   32
9. Set up a table of analysis of variance for :
Plots            Varieties
            a      b      c      d
  1        200    230    250    300
  2        190    270    300    270
  3        240    150    145    180
(Punjab C. S., 1946).
10. The following table gives the strength of lead in Number 2 pencils
manufactured by "Company D."

Strength in kilograms
Test    Pencil 1    Pencil 2    Pencil 3    Pencil 4    Pencil 5
  1       1.82        1.70        1.70        1.82        1.92
  2       1.56        1.36        1.68        1.98        1.85
  3       1.78        1.54        2.02        1.84        1.64
  4       1.76        1.92        1.92        1.64        1.75
Prepare the table of analysis of variance to test the significance of difference
between the strength of pencils.
11. The following table gives the length of Cuckoo's eggs deposited in the
nests of three species of birds.

Length in millimetres
Order of
measurement    Tree-pipit    Pied Wagtail    Wren
     1            22.7           23.0        19.8
     2            23.3           23.4        22.1
     3            24.0           24.0        21.5
     4            23.6           23.3        20.9
     5            22.1           23.4        22.0
     6            21.8           22.4        21.2
     7            21.1           21.8        22.3
     8            23.4           21.8        21.0
     9            23.8           24.9        20.3
    10            23.2           24.0        20.9
Prepare the table of analysis of variance to test the significance of difference
between the lengths of these species.
12. Set up a table of analysis of variance and calculate F from the results of
the following four samples :
Sample 1   Sample 2   Sample 3   Sample 4
    2          3          6          5
    3          4          8          5
    1          3          7          5
    3          5          4          3
    1          0         10          2
13. The following data relate to the results obtained by 4 investigators investigating
a common problem. Each of the 4 investigators had taken a sample of 6 items.
Do these results significantly vary from each other?
 A     B     C     D
66    42    54    78
82    66    90    54
60    30    60    60
50    60    81    42
60    36    60    71
90    48    51    49
14. Explain the terms (a) the null hypothesis, (b) the level of significance.
Two random samples drawn from two normal populations are
Sample 1 : 20, 16, 26, 27, 23, 22, 18, 24, 25, 19 ;
Sample 2 : 27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37.
Obtain estimates of the variances of the populations and test whether the
two populations have the same variance.
(Extract from Statistical Tables) The Variance Ratio.
5 and 1 per cent. points of F.
(ν1 is the number of degrees of freedom for the greater estimate of variance
and ν2 for the smaller).

                        ν1
  ν2                8        12
   8    (5%)      3.44      3.28
        (1%)      6.03      5.67
   9    (5%)      3.23      3.07
        (1%)      5.47      5.11
  10    (5%)      3.07      2.91
        (1%)      5.06      4.71

(I. A. S., 1955).

15. Give examples of situations in fields of application where analysis of
variance might be used.
Twenty pigs are divided at random into four lots with five pigs in each. Each
lot is given a different feed. The weight gain in pounds by each of the pigs for a
fixed length of time is given in the following table. What do you infer from it ?

Feed A    Feed B    Feed C    Feed D
 133       163       210       195
 144       148       233       184
 135       152       220       199
 149       146       226       187
 143       157       229       193

16. An agency wished to determine whether five makes of automobiles would
average the same number of miles per gallon. A random sample of three cars of
each make was taken from each of three cities, and each car had a test run with 1
gallon of gasoline. The table records the number of miles travelled. Perform the
analysis of variance and state fully your conclusions.

                                  Cities
Make       Allahabad              Bombay                Delhi
 A     20.3  19.8  21.4      21.6  22.4  21.3      19.8  18.6  21.0
 B     19.5  18.6  18.9      20.1  19.9  20.5      19.6  18.3  19.8
 C     22.1  23.0  22.4      20.1  21.0  19.8      22.3  22.0  21.6
 D     17.6  18.3  18.2      19.5  19.2  20.3      19.4  18.5  19.1
 E     23.6  24.5  25.1      17.6  18.3  18.1      22.1  24.3  23.5
Designs of Experiments 31
Meaning and Need
Data are the fundamentals of statistics and designs are the forerunners of data. Design is a plan for the collection and analysis of facts and figures. Preparation of statistical designs should be done very carefully as any error committed at this stage is likely to upset the entire investigation. Generally the need for a well-thought-out plan is not realised by many and the importance which this problem deserves is not given to it, with the result that many inquiries do not serve the purpose for which they are conducted. They give misleading conclusions and are very apt examples of the misuse of statistical methods. The technique of the collection of data and the methodology of their analysis have a great bearing on the reliability of the results arrived at. Therefore, in any statistical enquiry the problem of collecting and analysing data must be very carefully considered. The selection of an efficient design requires careful planning in advance of data collection and analysis. Thoughtlessness in the selection of a design is very likely to make the statistical enquiry entirely useless. A detailed analysis of the actual data involves huge cost and labour and therefore it is imperative that an efficient design must be selected. This can be done only by visualizing the analysis of data obtainable in different plans and keeping in mind their standard error and cost. The design which gives the smallest sampling error is supposed to be the best design for a particular investigation. We have discussed the technique of the collection of data and the theory of sampling in earlier chapters and a repetition of the same is not necessary here. Presently we shall confine ourselves to a brief study of some of the important designs of experiments.
Experimental Designs
Experimental designs concern the arranging of treatments in such a manner that inferences and conclusions regarding the effects of these treatments can be easily drawn and their reliability measured. Experiments are made with a view to find the validity of a particular hypothesis and to have an idea about the extent of the reliability that can be placed on a particular conclusion arrived at. If a physician wants to know whether a particular drug which has been invented will be beneficial in the treatment of a particular disease or if a farmer wants to know whether a new type of fertiliser will give him better yields he will frame his investigation in terms of some suitable hypothesis. After this he will design an experiment to find out whether the hypothesis which he has presumed is correct or whether it is wrong and consequently has to be rejected. The selection of the design will have a very great bearing on the accuracy of the ultimate results. If a wrong design is selected it is quite likely that the conclusions arrived at may be absolutely fallacious. In the course of investigation or experimentation many complications and doubts are bound to arise and unless they are removed the experimental design will not be adequate.
An example will make the above point clear. Suppose a farmer wants to compare the different effects of two fertilizers on the yield of wheat. To do this he selects a 20 acre field and divides it in two equal parts. He dresses one part with one type of fertilizer and the other part with the other type of fertilizer. He sows the entire field and measures the yield of both parts at harvesting time. On the basis of this experiment he tries to find out whether there is any significant difference in the effects of the two fertilizers on the yields of wheat. On first thought it appears as if there is nothing wrong with this design of experiment and the conclusion which the farmer arrives at should be reliable and dependable. A closer examination will, however, reveal that this design of experiment contains many loopholes and consequently the results may not be infallible. Here the fault lies in the fact that over such a large field there are almost certain to be considerable variations in the natural fertility of the soil. On one part of the field there may be greater fertility than on the other and if it is so, the difference in the two yields will not be totally due to the difference between the two types of fertilizers. Another weakness in this design is that since there is only one measurement for each fertilizer it is not possible to find out if the observed difference between the yields of the two fertilizers is a significant one or not. It is thus very clear that the selection of the design of experiment should be done with utmost care as any inappropriate design is likely to give such inferences as may ultimately be found to be inaccurate.
The method of trial and error or, to be more appropriate, the method of random sampling is likely to help considerably to do away with the inadequacies of experimental designs. By a random selection of experimental units it is possible to remove the ambiguity about the causal interpretation of the observed associations. Random sampling is the most essential ingredient of all experimental designs. Besides, there are many devices for increasing the precision of the inferences and the calculations arrived at.
Comparison in Pairs
In an experimental design a better comparison can be made between the effects of two treatments if their results are grouped in pairs and the analysis is carried out on the basis of the differences between the values of each pair. An example would illustrate this point better. Suppose a firm of engineers wants to find out whether the steel pipes which are coated with lead are less affected by corrosion than the uncoated pipes. For this purpose they take 10 coated pipes and 10 uncoated pipes and they bury them under ground at 10 places in pairs so that at each place there is one lead coated pipe and one uncoated pipe. They are taken out after some time and the effects of corrosion on each pair are noted down separately. At this stage two types of analysis can be conducted. One is that the average effect of corrosion on the 10 coated pipes is calculated and a similar calculation is done for the 10 uncoated pipes. These two averages can be compared and an inference can be drawn whether the effects of corrosion on the two types of pipes are significantly different from each other or not. This design of experiment is not a satisfactory one because in this method the effect of variations between soil types has not been eliminated. The effect of corrosion may be due to (1) soil type which may vary from place to place; (2) the type of pipe coating and (3) random variations. In the above design it is not possible to eliminate the effects of variations in the type of soil which may be there at the different places where pipes were buried and as such it is difficult to say that the difference in the effects of corrosion in the two types of pipes is entirely due to the type of coating on them.
However, if the difference in the corrosion effects of each of the
10 pairs is found out and the analysis of data is then done, this difficulty
will not be there. In this case we shall have 10 figures representing
the differences in corrosion effects, and we shall find out the average of
these differences and proceed on the null hypothesis that there
is no real difference in the corrosion effects. We have already illustrated
this technique by numerical examples in the chapter on the analysis of small
samples. In this design, if it is found that the hypothesis has been
disproved and should be rejected and that there is a significant difference
in the corrosion effects, we are on more solid ground than in the
previous case. The reason is that the effects of variations between
the soil types have been eliminated in this case, because each pair is in
the same type of soil.
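
The paired analysis described above may be sketched in Python as follows. The corrosion figures are purely hypothetical and serve only to show the arithmetic; the names used are our own. The value 2.262 is the tabulated 5 per cent. point of t for 9 degrees of freedom (two-sided).

```python
import math
import statistics

# Hypothetical corrosion scores (illustrative only, not from the text):
# one coated and one uncoated pipe buried together at each of 10 places.
coated = [42, 37, 51, 46, 39, 44, 48, 35, 40, 45]
uncoated = [47, 40, 58, 49, 45, 46, 55, 38, 47, 50]

# Work with the 10 within-pair differences, so that the effect of the
# soil at each place cancels out of the comparison.
diffs = [u - c for c, u in zip(coated, uncoated)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)            # divisor n - 1
t_stat = mean_diff / (sd_diff / math.sqrt(n))

t_crit = 2.262                               # 5 per cent. point of t, 9 d.f.
significant = abs(t_stat) > t_crit

print(f"mean difference = {mean_diff:.2f}")
print(f"t = {t_stat:.2f} on {n - 1} degrees of freedom")
print("reject null hypothesis" if significant else "do not reject null hypothesis")
```

Because each difference is taken within one place, a place with unusually corrosive soil raises both members of the pair and leaves the difference unaffected.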
In certain studies the problem before the investigator is to test the
significance of the differences among a group of means. This is usually
solved by the technique of analysis of variance. We have already
discussed this technique in an earlier chapter and it need not be
repeated here. Sometimes a more elementary and less accurate
test than the analysis of variance is used, and this depends on the
distribution of the range. The technique which is used is known by the
name of standardised range. The range of a sample is standardised
by dividing it by the standard deviation. The standardised range is calculated
for different samples and it forms a statistical distribution from which
the significance of any observed result can be calculated.
Latin Squares
This is an experimental design which is mostly used in agricultural
experiments where nature plays a very important role and renders even
fairly reliable results very difficult to obtain. In agricultural investi-
gations a number of experiments have to be carried out simultaneously
and inferences have to be drawn under conditions which are different
from those obtaining in other types of studies.
An example would make the above point clear. Suppose an
experiment has to be made in which the effects of 4 different types of
fertilizers on the yield of wheat are to be tested. In such a case the most
important point is to take into account the varying fertility of the soil
in the different blocks in which the experiment has to be conducted. If
this factor is ignored then the results obtained would not be dependable,
because the difference, if any, between the various fertilizers would not be
exclusively on account of their intrinsic worth but also on account of
the difference in the fertility of the soil on which the experiments have
been conducted. To remove these difficulties the technique of Latin
squares is adopted, and the experiments are conducted in such a manner
that the ultimate results are not affected by the varying fertility of the
soils of the various blocks. In the above example the field on which
the experiment will be conducted shall be divided into 16 similar small
plots which will then be treated with fertilizers (A), (B), (C) and (D)
as shown below :-

        A    B    C    D
        B    D    A    C
        C    A    D    B
        D    C    B    A

The above arrangement has been made in such a manner that the
effects of differences in soil fertility would have no bearing on the
ultimate conclusions and inferences which are arrived at. The arrangement
of the fertilizers in the above experiment is such that :-
(i) Each fertilizer appears 4 times in the design.
(ii) Each fertilizer is used once and only once in each row, and
(iii) Each fertilizer is used once and only once in each column.
Under the above arrangement a detailed comparison can be made
of all the fertilizers by comparing the average results of the 4 plots
devoted to each fertilizer.
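
The three conditions listed above can be checked mechanically. The following is only a sketch in Python; the function name `is_latin_square` is our own and the square is the one shown above.

```python
# The 4 x 4 arrangement of fertilizers A, B, C and D given in the text.
square = [
    ["A", "B", "C", "D"],
    ["B", "D", "A", "C"],
    ["C", "A", "D", "B"],
    ["D", "C", "B", "A"],
]


def is_latin_square(grid):
    """Return True if every symbol occurs exactly once in each row and column."""
    n = len(grid)
    symbols = set(grid[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


# Count how often each fertilizer appears in the whole design.
counts = {s: sum(row.count(s) for row in square) for s in "ABCD"}

print(is_latin_square(square))
print(counts)
```

Conditions (ii) and (iii) together imply condition (i), which is why the count of 4 for each fertilizer follows automatically once the row and column checks pass.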
The great advantage of this type of design, which is only one of its
kind, is that it enables differences in fertility gradients in the field to be
eliminated in the comparison of the effects of the 4 fertilizers on the
yield of wheat. Hence even if fertility differs from one part of the field
to another, every fertilizer is exposed to each row and each column
equally, and the difference in fertility does not distort the comparison
of their respective yields. In practice, the results of individual fertilizers
as well as the mean results of all the fertilizers in each section would be
used in analysing this design.
This method of analysis, however, suffers from one defect, and it is
that although each row and each column represents all 4 fertilizers
equally, there may be considerable difference in the row and column means
both up and across the field. However, this defect can be removed
by adjusting the results so that the means of the rows and columns are
the same as the field mean.
Factorial Designs
In economic and social phenomena usually a large number of
factors affect a particular problem, and this multiplicity of causes creates
certain difficulties in the way of analysis of results and also in making
generalisations. In such cases attempts are made to study the effects
of a single factor at a time by keeping the other factors constant as far as
possible. It is not always possible to do so on account of the nature of
economic science, where the various factors operating at a time are so
intermingled that their effects cannot be studied individually. However,
there are certain methods by which an idea can be obtained, though not
very precisely, about the effects of a single factor by keeping aside the
effects of other factors which are kept constant. Usually in such cases
the experimental design is so arranged that the effects of irrelevant factors
may be eliminated by analysis. This process is known as Balancing.
In the experiment referred to in earlier pages of this chapter about the
corrosion of steel pipes, the factor whose effect was to be measured was
the coating of pipes, and the other irrelevant factor, namely the type of
soil, was eliminated by pairing the results. Similarly in the experiment
on fertilizers the difference in soil fertility was eliminated by using
a Latin square design.
But the process mentioned above should not be used for those
experiments where the effects of varying more than one factor are to be
determined. The following example would illustrate the point.
Suppose that a university wants to evaluate 3 methods of presenting lectures.
Suppose that these methods are called A, B and C respectively. It
also wants to evaluate different times of the day at which the lectures
might be given, namely morning, mid-day and afternoon. These two
factors, method and time of presentation, can be evaluated
simultaneously by the design shown below :-
                      Method
Time             A      B      C      Total

Morning          1      1      1        3
Mid-day          1      1      1        3
Afternoon        1      1      1        3

Total            3      3      3        9
Here 540 students offering the subject can be assigned on the basis of
random sampling to the 9 sections shown above. Each section will
contain 60 students. All the 540 students will take the same examination
and then the performance of each section will be measured by
calculating its mean score in the examination.
It is possible now to compare the methods of presentation of
lectures by comparing the average performance of the 3 sections exposed
to each method. It is also possible to compare the times of day by
comparing the average performance of the 3 sections at each time of
the day.
This type of design is called a Two-Factor Factorial Design.
Similarly, factorial experiments can be made with more than 2 factors at a time.
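
The assignment and the two sets of comparisons may be sketched in Python as below. This is only an illustration: the examination scores are generated artificially so that the sketch can run, and all names are our own.

```python
import random
from statistics import mean

methods = ["A", "B", "C"]
times = ["Morning", "Mid-day", "Afternoon"]
cells = [(t, m) for t in times for m in methods]     # the 9 sections

rng = random.Random(0)            # fixed seed so the sketch is repeatable
students = list(range(540))
rng.shuffle(students)             # random assignment of students

# Deal the shuffled students into 9 sections of 60 each.
sections = {cell: students[i * 60:(i + 1) * 60] for i, cell in enumerate(cells)}

# Hypothetical examination scores, invented only to make the sketch runnable.
score = {s: rng.gauss(60, 10) for s in students}
section_mean = {cell: mean(score[s] for s in members)
                for cell, members in sections.items()}

# Each method (or time) is judged by the average of its 3 section means.
method_mean = {m: mean(section_mean[(t, m)] for t in times) for m in methods}
time_mean = {t: mean(section_mean[(t, m)] for m in methods) for t in times}

print(method_mean)
print(time_mean)
```

Every section contributes once to a method mean and once to a time mean, which is how a single experiment serves both comparisons.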
Factorial design provides us with two advantages. Firstly it
provides equivalent accuracy with less labour and as such is a source of
economy. In the example discussed above, two factors, method and
time, have been evaluated in one single experiment. If instead of
adopting this design the method of varying the factors one at a time was
used, two experiments would have been necessary and it would not have
been possible to appraise both these factors in one trial. Another
advantage of this design is that it gives information about such effects
as cannot be obtained by treating one single factor at a time. In
experimental work it is sometimes necessary to study cases where the
effect of one factor varies with changes in other factors, and this study
cannot be made if only one factor is varied at a time.

Questions
1. Discuss the need and utility of planning statistical experiments.
2. What is meant by 'Experimental Designs'?
3. Write short notes on
(i) Latin square.
(ii) Factorial Designs.
Statistical Quality Control 32
Meaning
It is a well-known fact that all repetitive processes, no matter how
carefully arranged, are not exactly identical and contain some variability
which cannot be assigned to any particular cause. Even in the
manufacture of commodities by highly specialised machines it is not unusual
to come across differences between various units of production. For
example, in the manufacture of bottles, corks and cartons, even though
highly efficient machines are used, some difference may be noticed in
various units. It is difficult to say what actually is the cause of it, but
the fact remains that such variations are not uncommon. If the
difference is not much it can be ignored and the product can be passed
as acceptable. But if it is beyond certain limits the article has to be
rejected and the cause of such variations has to be investigated. In
statistical quality control limits are laid down within which the
repetitive process (whether it is the manufacture of corks and bottles, or
the estimation of printers' errors or of complaints from customers) should
vary. If the variation is within the specified limits there is no cause
for alarm, but if it crosses the limits, then the hypothesis that the process
is under control has to be rejected and fresh decisions have to be taken.
The upper and the lower limits thus laid down are known as the Upper
Control Limit (U.C.L.) and the Lower Control Limit (L.C.L.).
Statistical quality control thus is a well-accepted and widespread
process on the basis of which it is possible to understand the
fundamental principles and techniques by which statistical decisions are made.
Statistics by itself is a science in the application of which certain important
decisions have to be arrived at. It has already been pointed out that
broadly speaking statistics refers to the process of collection, analysis
and interpretation of certain figures relating to some facts. At each
of these stages, before the actual application of the statistical technique,
it is essential that some decisions are arrived at, because without them
the study would not serve the purpose for which it is meant. Statistical
quality control enables us to understand the important idea of statistical
decision making. On account of this importance statistical quality
control is applied in various types of studies.
Statistical quality control also provides an opportunity to introduce
the important idea of sequential sampling. It enables us not only to
take decisions about accepting or rejecting a particular inference, but
it also helps in deciding to withhold judgment and to suggest the
collection of more data to arrive at a more fundamental decision.
The idea of statistical quality control became very important and
widespread during World War II, and in America, England and many
European countries there are specialised societies and institutes for the
study of this problem, and intensive research is being done to improve
its technique.
Purpose
The purposes for which statistical quality control is used are
two-fold, namely
(1) Process control and
(2) Acceptance inspection.
In both these cases samples are taken through various sampling
procedures which we have already discussed in earlier chapters. On the
basis of the data collected certain decisions are arrived at. In drawing
inferences and arriving at conclusions, as we have already discussed, the
size of the samples, their probable distribution and the characteristics of
the population from which the samples have been taken are kept in
mind. Samples taken may be single, double, multiple or sequential.
Sequential sampling is, however, more popular than others.
In process control, as the name itself suggests, an attempt is made
to find out if a particular process is within control. If the process is
within control then it can be expected that its results in future would be
as they had been in the past. Process control thus aims at helping to
predict future performance. The criterion used for this purpose is whether
the latest results of the process are within the range of variation laid
down by the control limits. To arrive at this conclusion, the help of a
control chart is taken. As we have
already pointed out, variations among the items may be due to chance or
may be due to other factors. If the variations among the items are such
as occur in random samples, they may be attributed to chance and hence
individual cases need not be separately scrutinised. If the process is in
control and the performance is not satisfactory then it should be remedied
by effecting some changes in the process itself. But on the other hand if
the process is out of control and the performance is not satisfactory then
there is room for further investigation, and specific causes must be
located for the variations which have been noticed. Under such
circumstances it would become necessary to remove these specific causes
if an improvement is to be effected in the future performance. In the
chapter on Business Forecasting we have noticed that predictions about
the future are generally based on past results, and here also if a process
is in control it is relatively safe to predict the future on the basis of
past performance. It should be remembered, however, that process control
is not meant for deciding whether the process itself is satisfactory or not.
It only aims at finding out whether the process is in a state of statistical
control and whether predictions can be made on this basis. If the process
is out of control the question of making predictions about the future does
not arise, because variations in items are not necessarily random here and
may be due to certain causes which need investigation. Predictions about
the future can be made only if the process is in control, because then the
variations can be attributed to chance.
Acceptance inspection, as the name suggests, is a method by which
it is possible to decide whether an existing group of things should be
accepted or rejected on the basis of a sample study. It is obvious that the
universe in such cases is finite and consists of the items which make up
a lot. A random sample from a lot is taken and the sample material
is inspected by using definite statistical standards. On the basis of the
sample study a decision is taken about the quality of the entire lot, and
on the basis of such decision the lot is either accepted or rejected. In
acceptance inspection the standards are fixed according to what is
required of the product, whereas in process control the standards depend
on the inherent capability of the process. In acceptance inspection,
for example, we can take a decision that if 5% or less of the articles in
the sample are defective the lot will be accepted, and if the percentage
exceeds 5 it will be rejected. In process control it is not possible to lay
down any such limit in the beginning, and the limits will depend on the
capability of the process itself.
With this preliminary background we now proceed to examine the
'Process Control' and 'Acceptance Inspection' in some detail.
PROCESS CONTROL

Evolution
In old days when there was no division of labour and no
standardisation of the goods produced, and when an artisan produced a commodity
from beginning to end, there was no problem of process control. Each
unit was produced as an independent unit and there was no uniformity
among corresponding parts of the units of production. Later on,
with a change in the technique of production and with the advent of division
of labour and mechanisation, instead of one whole commodity being
produced at one place or by one person, various parts of the commodity
began to be produced separately and in large numbers. Then it became
necessary that the various corresponding parts produced must be
identical so that they could be assembled together without any difficulty.
But it was found that due to variations in the raw materials consumed
or the tools used there was no perfect uniformity in the articles produced.
The recognition of this variation in the goods produced amounted
to admitting that some variations in process measurement were
unavoidable and could not be escaped. This led to the concept of "tolerance
limits", and variations then began to be permitted if they did not exceed
certain prescribed limits. Later, more studies were made in this
technique of laying down limits and thus maintaining a certain amount of
uniformity in the articles produced. In recent years many researches
have been conducted to improve the technique of process control to
enable manufacturers to make more confident and accurate predictions
about the quality of the commodity produced.
Control Chart Technique
The technique of process control is, as has been pointed out earlier,
to prepare control charts and to see whether a product is within the limits
laid down. A control chart is a graph which presents lines defining
the range of expected variability. It also gives a running record of
quality measurements by which every new measurement can be graphically
compared with the past performance and thus can be properly evaluated.
The control chart is thus an important medium through which statistical
methods are applied in the working of modern manufacturing units.
Chart of individual observations
It has already been discussed in earlier chapters that if a sample
is taken from a normally distributed population then practically all the
items (99.7%) have a value within the mean ± 3 times the standard
deviation. If M stands for the mean and σ for the standard deviation then
M ± 3σ covers 99.7 per cent. of cases, M ± 2σ about 95 per cent. of cases and
M ± 1σ about 68 per cent. of cases. Control charts are prepared on this
very basis. The upper and the lower limits can be indicated by the
values M + 3σ and M - 3σ respectively. For individual observations the
control chart would then be of the following type.

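[Figure: control chart for individual observations. Horizontal lines mark the UCL (M + 3σ), the Mean (M) and the LCL (M - 3σ); the observed values x are plotted in succession.]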

In the above graph UCL stands for Upper Control Limit, LCL for
Lower Control Limit and x represents the values obtained in successive
observations. If the population is normally distributed, successive
observations, if plotted, would mostly fall within these limits. On an
average only about 3 out of 1000 would fall outside these limits.
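
The percentages quoted above follow from the normal curve and may be verified with a short Python sketch. The function names and the process values M = 50, σ = 2 are our own illustrative assumptions.

```python
import math


def coverage(k):
    """Probability that a normal observation lies within k standard
    deviations of its mean."""
    return math.erf(k / math.sqrt(2))


def control_limits(mean, sigma, k=3):
    """Return (LCL, UCL) placed k standard deviations either side of the mean."""
    return mean - k * sigma, mean + k * sigma


for k in (1, 2, 3):
    print(f"M ± {k}σ covers {coverage(k):.2%} of cases")

# Hypothetical process with M = 50 and σ = 2 (illustrative values only).
lcl, ucl = control_limits(50, 2)
print(f"LCL = {lcl}, UCL = {ucl}")
```

The proportion falling outside M ± 3σ is 1 minus the coverage, that is roughly 0.0027, which is the source of the figure of about 3 in 1000.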

The construction of this chart will be made clearer by an illustrative
example. If during a certain season the average proportion of books
not returned by the due date in a particular library is known and stable,
we can plot a control chart resembling the above chart on which we
shall plot the results of samples for different days, say the number of
books not returned among a sample of 100 due on a given day. The
UCL and LCL are set so that, if the process is in control, only about 3
plotted points in 1000, on an average, will fall outside the limits. If a
point falls outside the limits, some cause other than chance must be
assigned to it and care should be taken to change the process. In our
example if a point falls outside the limits, there might have been an
increase in thefts or in errors in the checking of returned books.
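
The library example may be worked out as below. The text does not give figures, so the proportion of 10 per cent. and the daily counts are assumptions made only for illustration; the limits use the usual binomial standard deviation of a count, which is likewise our choice of method and not stated in the text.

```python
import math

p = 0.10      # assumed stable proportion of books not returned (hypothetical)
n = 100       # books due on a given day that are sampled

centre = n * p
sigma = math.sqrt(n * p * (1 - p))       # binomial standard deviation of the count
ucl = centre + 3 * sigma
lcl = max(0.0, centre - 3 * sigma)       # a count cannot fall below zero

# Hypothetical daily counts of unreturned books in samples of 100.
daily_counts = [9, 12, 7, 11, 10, 21, 8, 13]

# Days whose count falls outside the control limits call for investigation.
out_of_control = [(day, x) for day, x in enumerate(daily_counts, start=1)
                  if x > ucl or x < lcl]

print(f"centre = {centre:.1f}, LCL = {lcl:.1f}, UCL = {ucl:.1f}")
print("days to investigate:", out_of_control)
```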
The appropriate distance of the control limits can be determined from
the acceptable false alarm rate. If the process is in control, there will
be false alarms from 3 of every 1000 samples, on an average. If this false
alarm rate is too high, the limits can be set farther apart, but in doing so
the risk of not detecting an important change when it occurs is increased.
Generally 3 false alarms in 1000 samples is regarded as normal. However, in some
cases it may be economically advantageous to accept more false alarms
in order to reduce the risk of missing a change which should be corrected.
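
The balance between the two risks can be illustrated numerically. The sketch below assumes normally distributed observations; the function names and the shift of 2σ in the mean are our own hypothetical choices.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def false_alarm_rate(k):
    """Chance that a point from a process in control falls outside M ± kσ."""
    return 2 * (1 - phi(k))


def miss_rate(k, shift):
    """Chance that a point stays inside M ± kσ after the mean has moved
    by `shift` standard deviations."""
    return phi(k - shift) - phi(-k - shift)


for k in (2.0, 2.5, 3.0):
    alarms = 1000 * false_alarm_rate(k)
    missed = miss_rate(k, 2.0)
    print(f"limits at {k}σ: {alarms:.1f} false alarms per 1000, "
          f"chance of missing a 2σ shift = {missed:.2f}")
```

Narrower limits raise the number of false alarms but lower the chance that a real change passes unnoticed, and wider limits do the reverse.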
Chart of averages
In practical usage, means of small samples are used and control
charts are very rarely used for single observations, for the
following reasons :-
(i) Individual observations are more susceptible to variations than
means, and hence single observations are less able to discriminate
between in-control and out-of-control processes.
(ii) Individual items are seldom normally distributed and so the
risks may not be at all what they seem from the normal curve.
(iii) If observations are grouped on a rational basis, for example
items produced by one machine are kept separate in a group, then the
average of the variabilities within groups provides an appropriate measure
of the variability to be expected between groups if the process is in
control.
Sometimes 2σ is used instead of 3σ ("two-sigma limits instead of
three-sigma limits"). This is advantageous from the consumer's point of
view, as his risk is reduced as the distance between UCL and LCL is
reduced. It may be noted that an LCL does not always exist; for example,
in sampling for proportion defective, the true proportion may be so
low that zero defectives in the sample would not be unusual.
Relationship between the control chart and the normal distribution. With
fixed control limits, a change in M will be equivalent to a vertical shift
of the normal distribution, and this will increase the probability of a point
falling outside the limits. An increase in σ
will also increase this probability, and a decrease in σ will decrease it.
Thus the control chart for averages tends to detect process changes that
involve changes in the mean or increases in the variability, but in no
case will it detect decreases in variability.
Control Chart of Range
Sometimes separate control charts are used for variability, and here
the range, R, is the usual measure of variability. To make a distinction
between them, we call control charts based on means X̄ control
charts and control charts based on the range R control charts or R
charts. Usually some objections are raised to the range as a measure
of dispersion. But these objections are unimportant here. In
particular, all samples are of the same size, so variations in the range
correspond with variations in the variability. The mean range
within groups, rather than the mean standard deviation, is usually used
for setting control limits on X̄.
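
A sketch of how the mean range may be used to set limits on both charts is given below. The measurements are hypothetical, and the factors A2, D3 and D4 are the standard published control-chart factors for samples of size 5; they are not given in the text and are brought in here as an assumption.

```python
from statistics import mean

# Standard control-chart factors for samples of size 5 (from published tables).
A2, D3, D4 = 0.577, 0.0, 2.114

# Hypothetical measurements: 6 rational groups of 5 items each.
samples = [
    [10.2, 9.9, 10.1, 10.0, 9.8],
    [10.1, 10.3, 9.9, 10.0, 10.2],
    [9.7, 10.0, 10.1, 9.9, 10.0],
    [10.0, 10.2, 10.1, 9.8, 10.1],
    [9.9, 10.0, 10.3, 10.1, 9.9],
    [10.1, 9.8, 10.0, 10.2, 10.0],
]

xbars = [mean(s) for s in samples]              # points for the X-bar chart
ranges = [max(s) - min(s) for s in samples]     # points for the R chart
grand_mean = mean(xbars)
mean_range = mean(ranges)

# Limits for the chart of means, based on the mean range within groups.
xbar_limits = (grand_mean - A2 * mean_range, grand_mean + A2 * mean_range)
# Limits for the chart of ranges.
r_limits = (D3 * mean_range, D4 * mean_range)

print("X-bar chart limits:", [round(v, 3) for v in xbar_limits])
print("R chart limits:", [round(v, 3) for v in r_limits])
```

With samples of this size the factor D3 is zero, so the R chart here has no effective lower limit.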
Control chart and analysis of variance. Analysis of variance is a method
used for testing the significance of the differences among a number of
sample means, and thus it is a method of testing the null hypothesis that
a group of population means are equal. In principle, a similarity may be
seen between the control chart for means and the analysis of variance. In both
these methods the variability within samples is used to deduce the
variability to be expected among or between the samples. Then, if the actual
variability among samples agrees, except for an allowance for sampling
fluctuations, with the amount consistent with the within-sample variability,
the null hypothesis (that the process is in control, or that all samples are
from populations having the same mean) is accepted; otherwise, the null
hypothesis is rejected. However, the control chart differs from the
analysis of variance procedure for testing a set of means in the
following manner :-
(i) The variability within samples usually is measured by ranges
instead of standard deviations.
(ii) The test criterion is an extreme value of any individual mean,
rather than a swollen standard deviation among the set of means.
Control chart for defects per unit. This type of control chart applies
to two rather specialized situations :-
(i) One is the case where a count is made of the number of defects
of such types as blemishes in a painted or plated surface of a given area,
weak spots in the insulation of rubber-covered wire of a given length,
or imperfections in a bolt of cloth.
(ii) The other is the case of inspection of fairly complex assembled
units, such as radio sets, aircraft engines, or machine guns, in which
there are a great many opportunities for occurrences of defects of various
types and the total number of defects of all types found by the inspectors
is recorded for each unit.
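
A chart of this kind may be sketched as follows. The text gives no formula, so the sketch assumes, as is usual for counts of defects, that the counts follow the Poisson distribution, whose standard deviation is the square root of its mean; the blemish counts are hypothetical.

```python
import math

# Hypothetical counts of blemishes on 10 painted panels of equal area.
defects = [4, 6, 3, 5, 7, 4, 2, 5, 16, 4]

c_bar = sum(defects) / len(defects)      # average defects per unit
sigma = math.sqrt(c_bar)                 # Poisson assumption: variance = mean
ucl = c_bar + 3 * sigma
lcl = max(0.0, c_bar - 3 * sigma)        # a count cannot be negative

# Units whose count falls outside the limits.
flagged = [(unit, c) for unit, c in enumerate(defects, start=1)
           if c > ucl or c < lcl]

print(f"average = {c_bar:.1f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
print("units to investigate:", flagged)
```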
Selection of control limits. Selection of control limits is an important
task in the study of statistical quality control. By whatever
method they are chosen, they must not be confusing, ambiguous or
unsystematic. They must be clear, well defined and forceful, so that they may
enable us to take a firm decision about our objective in future. It may
be noted here that the statistical quality control chart by itself makes use
of well thought out and tested rules. As its principles are not loose it
tries to avoid the indecision, inconsistency and arbitrariness of haphazard
quality control. It is undoubtedly true that statistical quality control
is mainly based on the fact that there are variations in repeated
random samples from a fixed population, but these variations are such
that we can plan for them well in advance. It must be properly seen that
all basic principles are systematically adhered to.
In American quality control work control limits at M ± 3σ are
most commonly used. The reason for the common usage of 3σ instead
of 1 or 2 or some other multiple of σ is that 3 is both a conservative
figure and a round number. A good statistician aims at having very
few false alarms, and he is prepared to take every pain to avoid the
still greater trouble which may arise in future due to false results.
His difficulty will be eased only when he has a perfect knowledge of
such false alarms, and this knowledge can be had only through
conservative control limits which result in few false alarms.
ACCEPTANCE INSPECTION

Meaning and technique
Acceptance inspection is a technique of judging the quality of a
group of things usually called an inspection lot. The practical utility of this
technique is very great in our modern life, particularly when large scale
production is being carried on in gigantic factories and the scope of
international trade has considerably widened. The principle of mass
production for mass consumption so as to constitute a mass market has
been accepted everywhere. Production technique has become a very
complicated and integrated affair, and in such a situation it is necessary
to decide whether a particular lot supplied by a producer is of acceptable
quality or whether a lot that is ready and complete for shipment is of
a satisfactory quality level. It may also be necessary to decide if an
advance payment can be made to a supplier who has sent a number of
invoices. All these are vital and important questions and a decision
about them must be taken very intelligently. The technique of
acceptance inspection, as the name itself suggests, is one that can practically be
applied as a test of decision.
In the technique of acceptance inspection past experience is not
of much help. Here every lot is inspected separately and distinctly
and decisions are made on the basis of current knowledge received about
the lot by current inspection.
In process inspection some point in the process is selected for study.
At this point sample studies are done to find if the process is in control or
out of control. Detailed information is gathered and a decision is then
taken about the rejection or acceptance of the process in question.
If the process is in control and if the quality level at which the process
is in control at the time of production of the particular lot is known,
there is no point in having acceptance inspection so far as this particular
lot is concerned. If the quality level of the process is not fairly high it
will be necessary to find out whether the various items of the lot are
acceptable or not. But if the quality level of the process is fairly high there is no
necessity to examine the items and the lot, because the chances of getting
a defective article are insignificant. It would be a rare chance that too
many defective articles are found in a lot in these circumstances.
A recent tendency is that big purchasing units save the time,
energy and cost necessary to examine the various lots which they have
purchased by asking the manufacturers to supply their control charts.
They have only to check the control chart, and they need not go through
the bother of acceptance inspection and need not prepare
acceptance charts themselves. As a result of this, the manufacturers are now
more careful about their control charts and they try to improve the
quality of their goods and lower their cost.
When a lot is rejected it does not mean that all the items of the
lot are destroyed or scrapped. All that it means is that it is carefully
examined and the defective articles are removed, so that the remaining
items conform to a particular quality level. The rejected articles can
be put to alternative use or sold at a lower price. Rejection thus keeps
a manufacturer on his toes, so that he is always careful about the quality
of goods he supplies, and it ultimately results in improving the quality
level of the goods produced or supplied.
Sampling technique
Acceptance inspection is usually done on the basis of sampling.
A census inspection or 100 per cent. inspection is not only uneconomical
but impracticable and useless also. A 100 per cent. inspection is no
guarantee of absolutely good quality of the lots purchased. The results
with a scientific sampling procedure may be better than those with a
100 per cent. inspection. We have already discussed this point in the
chapter relating to the theory of sampling and it need not be elaborated
here. It should, however, be remembered that only a good sampling
procedure can give good results. A sampling procedure established
without recognition of the laws of probability would usually give a very
unsatisfactory degree of protection against acceptance of defective
articles. Poor sampling procedures are in the long run more costly
also, and hence the need for a really good inspection technique cannot
be over-emphasised. But "it should be recognised that although
modern sampling inspection procedures are generally superior to
the traditional sampling methods, anyone who uses acceptance sampling
must face the fact that whenever a portion of the stream of products
submitted for acceptance is defective, some defective items are likely
to be passed by any sampling acceptance scheme."
Sometimes sampling for acceptance purposes is not done on the
basis of definite rules regarding the size and frequency of the sample. In
many cases the inspector permits current decisions on acceptance to be
influenced by his knowledge of the past quality history of the product
which is being sampled. Although it may be sound so far as it goes,
such an informal system for determining the basis of acceptance has its
limitations. Inspectors' memories of past quality history may be short
or inaccurate. A particular inspector who has this past knowledge
may die or resign or be transferred to another job. It may be also that
on account of confidence about the past history the inspector may become
negligent in inspection and fail to discover when the quality has changed
for the worse. Due to these limitations it is necessary to have definite
rules regarding the size and frequency of the sample and the basis of rejection
or acceptance. But it is also true that in many cases such formal rules
may give less quality protection than the informal scheme of letting the
inspector use his judgment. Since sampling is a problem involving the
laws of chance, the determination of a good sampling acceptance scheme
requires consideration of the mathematics of probability. One reason
why many common sampling acceptance plans are really bad is that
the people who specify them do not realise how little protection they
afford.
Sampling plans
Lot-by-lot acceptance using single sampling by attributes. In acceptance
inspection a defective article is one that fails to conform to specifications
in one or more quality characteristics. A common procedure
in acceptance sampling is to consider each submitted lot of a product
separately and to base the decision on acceptance or rejection of the lot
on the evidence of one or more samples chosen at random from the
lot. If the decision is always made on the evidence of only one sample,
the acceptance plan is described as a single sampling plan.
In any plan for single sampling usually three numbers are specified.
One is the total number of articles N in the lot from which the sample
is to be drawn. The second is the number of articles n in the random
sample drawn from the lot. The third is the acceptance number c.
This acceptance number is the maximum allowable number of defective
articles in the sample. More than c defectives will cause the rejection
of the lot. For instance, a sampling plan may be as follows :-
N = 500
n = 50
c = 5
These three numbers may be interpreted as saying: Take a
random sample of 50 from a lot of 500. If the sample contains more
than 5 defectives reject the lot, otherwise accept the lot.
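The rule just stated can be put into a few lines of Python. This is only a sketch: the function names are ours, and the calculation assumes that the sample is drawn at random without replacement, so that the number of defectives in the sample follows the hypergeometric distribution. The numbers of defectives D tried at the end are illustrative and do not come from any actual lot.

```python
from math import comb


def accept_lot(defectives_in_sample: int, c: int) -> bool:
    """Single sampling rule: accept when the sample holds at most c defectives."""
    return defectives_in_sample <= c


def prob_accept(N: int, n: int, c: int, D: int) -> float:
    """Chance of acceptance when the lot of N articles holds D defectives.

    The sample of n is drawn at random without replacement, so the count of
    defectives in it is hypergeometric. Assumes 0 <= c <= n <= N.
    """
    ways = sum(comb(D, k) * comb(N - D, n - k) for k in range(c + 1))
    return ways / comb(N, n)


# The plan quoted in the text: N = 500, n = 50, c = 5.
for D in (0, 25, 50, 100):  # illustrative numbers of defectives in the lot
    print(D, round(prob_accept(500, 50, 5, D), 4))
```

A lot with no defectives is always accepted, and the chance of acceptance falls as D rises, but it does not fall to zero for moderately bad lots.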
No sampling plan can give complete protection against the acceptance
of defective product. A practical difficulty in devising an ideal
sampling plan is that it is not possible to change the laws of chance.
Double sampling. "Double sampling involves the possibility of
putting off the decision on the lot until a second sample has been taken.
A lot may be accepted at once if the first sample is good enough, or
rejected at once if the first sample is bad enough. If the first sample
is neither good enough nor bad enough, the decision is based on the
evidence of the first and second samples combined. In general, double
sampling schemes will involve less total inspection than single sampling
for any given quality protection. They also have certain psychological
advantages based on the idea of giving a second chance to doubtful
lots."
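The logic of a double sampling plan may be sketched as follows. The text gives no numerical plan, so the parameter names here are assumptions made for illustration: c1 is the acceptance number and r1 the rejection number for the first sample, and c2 is the acceptance number for the two samples combined.

```python
from typing import Optional


def double_sampling_decision(d1: int, c1: int, r1: int, c2: int,
                             d2: Optional[int] = None) -> str:
    """Decide on a lot under a double sampling plan.

    d1 -- defectives found in the first sample
    d2 -- defectives found in the second sample (None if not yet taken)
    """
    if d1 <= c1:
        return "accept"              # first sample good enough
    if d1 >= r1:
        return "reject"              # first sample bad enough
    if d2 is None:
        return "take second sample"  # doubtful lot gets a second chance
    # Decision rests on both samples combined.
    return "accept" if d1 + d2 <= c2 else "reject"


# Hypothetical plan: c1 = 1, r1 = 4, c2 = 4.
print(double_sampling_decision(0, 1, 4, 4))
print(double_sampling_decision(2, 1, 4, 4))
print(double_sampling_decision(2, 1, 4, 4, d2=1))
```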
Multiple or sequential sampling. Just as double sampling plans may
defer the decision on acceptance or rejection until a second sample has
been taken, other plans may permit any number of samples before a
decision is reached. Plans permitting from three up to an unlimited
number of samples are described as multiple or sequential.
Average amount of sampling. One of the most important considerations
in choosing among these sampling plans is the number of observations
required for a decision. For the single sampling plan the size of
the sample is fixed, so the amount of inspection required is known.
In the double, multiple and sequential sampling plans, however, there
is variation in the number of items needed to make a decision. The
average sample number or ASN will depend on the quality of material
submitted for inspection. If the lot is of a high quality, it will, on the
average, be accepted early and the ASN will be small. If the lot is
of poor quality, it will, on the average, be rejected early and the ASN
will again be small. The ASN will be larger for lots of intermediate
quality, where the appropriate decision is less obvious.
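The behaviour of the ASN can be checked numerically for a double sampling plan. The sketch below rests on two assumptions not made in the text: the lot is taken to be large enough for the binomial distribution to apply, and the plan (n1 = 50, c1 = 1, r1 = 4, n2 = 100) is hypothetical. A second sample is needed only when the first count falls strictly between c1 and r1, so the ASN is n1 plus n2 times the probability of that event.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k defectives in a sample of n at fraction defective p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def asn_double(p: float, n1: int, c1: int, r1: int, n2: int) -> float:
    """Average sample number of a double sampling plan at fraction defective p."""
    # Second sample is drawn only if c1 < d1 < r1.
    p_second = sum(binom_pmf(d, n1, p) for d in range(c1 + 1, r1))
    return n1 + n2 * p_second


# Hypothetical plan, tried at good, intermediate and poor quality.
for p in (0.0, 0.01, 0.04, 0.10, 0.30):
    print(f"{100 * p:5.1f} per cent defective: ASN = {asn_double(p, 50, 1, 4, 100):.1f}")
```

The printed figures are smallest at the two ends of the quality range and largest in between, as the paragraph above describes.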
Conflicting interests of consumers and producers in the selection of sampling
plan. There are always two parties to an acceptance procedure: the
party which submits the product for acceptance or rejection and the
party to which it is submitted. These parties are usually referred
to as producer and consumer or seller and buyer respectively. Generally
speaking the interests of these two parties are conflicting. The
producer or seller wants protection against the rejection of too much
good product and the consumer or buyer wants protection against
acceptance of too much defective product. Some of the sampling plans
may be favourable to the producer and the others to the consumer and as
such the selection of the sampling acceptance plan has a great importance.
The sampling plan which is finally selected must be such that
both producer's as well as consumer's interests are protected and a fair
judgment arrived at. It should, however, be kept in mind that a substantial
rejection of good product, in an effort to exclude bad product,
is not really in the interest of the consumer. The consumer is interested
in quality. He is also interested in cost. In the long run the costs
incidental to the rejection of good product tend to be passed on by
the producer to the consumer. Sometimes the consumer may be interested
in having the product immediately and any good product
which he rejects may not be available for his immediate use.

O. C. Curve
By O. C. curve is meant Operating Characteristic Curve. The
O. C. curve of an acceptance sampling plan shows the ability of the plan
to distinguish between good and bad lots. In judging acceptance
plans it is desirable to compare their performance over a range of
possible quality levels of the submitted product. For any given percentage
of defective articles in a submitted lot the O. C. curve shows
the probability that such a lot will be accepted by the given sampling
plan. In other words the O. C. curve shows the percentage of submitted
lots that would be accepted if a large number of lots of any specified quality
were submitted for inspection. The O. C. curve can be thought
of as showing the probability of accepting lots over a stream of products
having a certain percentage of defectives.
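The points of an O. C. curve can be computed directly. The following is a sketch for the plan n = 50, c = 5 quoted earlier, under the assumption (ours, not the text's) that the lot is large compared with the sample, so that the binomial distribution may be used. The quality levels tried are illustrative.

```python
from math import comb


def prob_accept(p: float, n: int, c: int) -> float:
    """Probability that a sample of n holds at most c defectives
    when the fraction defective submitted is p (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def oc_curve(n: int, c: int, qualities):
    """Return (fraction defective, probability of acceptance) pairs."""
    return [(p, prob_accept(p, n, c)) for p in qualities]


for p, pa in oc_curve(50, 5, [0.0, 0.02, 0.05, 0.10, 0.15, 0.20, 0.30]):
    print(f"{100 * p:5.1f} per cent defective -> probability of acceptance {pa:.3f}")
```

Plotting the second column against the first gives the O. C. curve of the plan.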
Since the interests of the producers and consumers or sellers and
buyers conflict, they require separate and different types of sampling
plans. The seller wants a sampling plan which does not insist on
perfect accuracy and passes one or two defective items which are below
the fixed standard. The buyer will, however, be interested in such a
sampling plan which rejects almost all defective items. He will not
like to have a single defective item in the lot. The seller wants that even
if bad articles are excluded all good articles must be included. But the
buyer does not care if any good articles are not included. What he is
particular about is that no bad article should be included.
With any given sampling plan, the acceptance or rejection of a lot
is a matter of chance, since it depends on a random sample; the probability
of acceptance, however, depends on the true quality of the lot. It is
important to understand that the probability of acceptance is the probability
that if a lot of a certain quality is offered it will be accepted.
It is not the probability that if a lot is accepted it will be of a certain
quality. The latter would be proportional to the product of two
probabilities :-
(i) the probability that a lot of the stated quality will be
offered, and
(ii) the probability that if a lot of the stated quality is offered
it will be accepted.
Only the second of these probabilities can be controlled by an
inspection plan. For example, if all lots submitted are of the same
quality, that will be the quality of the lots accepted. This principle
of inspection has been expressed by inspection men thus: quality cannot
be inspected into the product, it must be built in.
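The distinction between the two probabilities can be illustrated with a small calculation. In the sketch below the figures for the qualities offered and for the chances of acceptance are invented purely for illustration; the function merely multiplies the two probabilities named above for each quality and rescales the products so that they add up to one.

```python
def quality_of_accepted_lots(offered: dict, accept: dict) -> dict:
    """offered[q] -- probability that a lot of quality q is offered
    accept[q]  -- probability that such a lot, once offered, is accepted
    Returns the probability that an accepted lot is of quality q."""
    joint = {q: offered[q] * accept[q] for q in offered}
    total = sum(joint.values())
    return {q: joint[q] / total for q in joint}


# Hypothetical stream: 80% of lots are 2% defective, 20% are 15% defective.
offered = {0.02: 0.8, 0.15: 0.2}
accept = {0.02: 0.99, 0.15: 0.22}   # assumed chances of acceptance
print(quality_of_accepted_lots(offered, accept))

# If every lot offered is of the same quality, so is every lot accepted.
print(quality_of_accepted_lots({0.05: 1.0}, {0.05: 0.96}))
```

The second call shows the point made in the text: the plan cannot improve on a stream of lots that are all alike.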
The O. C. curve represents the relationship between the probability
of acceptance and the quality of the lot. Usually the percentage of lots
accepted decreases as poorer and poorer quality is considered. If the
lot has zero percentage of defective articles it is certain to be accepted and
if it has 100 per cent. defectives it is equally certain to be rejected. Hence
the O. C. curve has to start with 100 per cent. and go down to zero per cent.
Generally 100 per cent. probability means certainty but even with 100
per cent. inspection there would in actual practice be no certainty of
accepting all good lots and rejecting all bad ones, because inspection
involves the human factor. The human mind may be subconsciously biased
or negligent.
The size of the sample has a direct relationship with the accuracy
of its result. We have discussed this point in chapters on sampling.
Other things remaining the same, the larger the size of the sample the
greater is its accuracy and the more reliable is the conclusion arrived at from its
study. The larger the sample the steeper is the O. C. curve and the
smaller is the zone between the qualities that are almost always accepted
and the qualities that are almost always rejected. But the larger the sample
size the greater is the inspection cost and hence a judicious inspection
is always a problem of balancing these two factors, viz. accuracy and cost.
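The effect of sample size on the steepness of the O. C. curve can be seen by comparing plans whose acceptance number is the same proportion of the sample. The three plans below (all with c equal to 10 per cent. of n) and the two quality levels are hypothetical, and the binomial model again assumes a lot that is large compared with the sample.

```python
from math import comb


def prob_accept(p: float, n: int, c: int) -> float:
    """Binomial probability of at most c defectives in a sample of n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def discrimination(n: int, c: int, good: float = 0.05, bad: float = 0.15) -> float:
    """Difference between the chances of accepting a good and a bad lot.
    The nearer this is to 1, the steeper the O. C. curve between the two."""
    return prob_accept(good, n, c) - prob_accept(bad, n, c)


for n, c in ((20, 2), (50, 5), (200, 20)):
    print(f"n = {n:3d}, c = {c:2d}: "
          f"accept 5% lots {prob_accept(0.05, n, c):.3f}, "
          f"accept 15% lots {prob_accept(0.15, n, c):.3f}, "
          f"difference {discrimination(n, c):.3f}")
```

The larger sample separates the two qualities more sharply, but it also means four or ten times as many articles to inspect, which is the balance of accuracy and cost referred to above.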

Conclusion
"Without quality control you, as a producer or purchaser, are in the
same position as the man who bets on a horse race - with one exception,
the odds are not posted. Statistical quality control will give you the
odds on which you wish to place your money, your man power, your
tools and your materials. It will tell you at what level and with what
variation you are operating and, more important, it will tell you when
your process, tools or materials change from that level and range of
variability - possibly most important of all will be the change in outlook
on your purchases or production, and the inspection of both. The
dazzling light which statistical quality control throws on everything
surrounding the characteristic being examined many times reveals
startling facts, sometimes good, sometimes bad. You will be shocked
out of your complacency. Your philosophy will change for the
better. Variability will be recognised as a natural inherent characteristic
of your production or, if you are buying, of the incoming material."
FRANK M. STEADMAN

Questions
1. Discuss the need and utility of Statistical Quality Control.
2. What do you understand by "Process Control" ? How does it differ from
"Acceptance Inspection" ?
3. Write a note on the utility of Control Charts in Statistical Quality Control.
4. How would you select a sampling plan for acceptance inspection ?
5. Write short notes on :
(i) O. C. Curve,
(ii) Sequential Sampling,
(iii) Selection of Control Limits,
(iv) Relationship of Control Chart and Normal Distribution.
Growth of Statistics in India 33
SECTION (1)
STATISTICAL ORGANISATION

Early beginnings. As in other countries of the world, in India
also, the early beginnings of statistics were due to the interest of the state
in certain types of figures. In ancient times the kings and the ruling
chiefs used to collect statistics of population with a view to have an idea
about their man-power for purposes of waging wars. Some statistics
also emerged out of the day-to-day activities of the government and
were of the nature of by-products of administrative activity. As early
as 300 B. C., India had such statistics. In the times of Mauryan kings
and later, during the regime of Ashoka and in the Gupta Dynasty various
types of statistics relating to the economic problems of the country
emerged out of the administrative activity of the Government. Kautilya's
Arthashastra contains many valuable facts and figures about Indian
economic conditions of that time. In the Mughal period, particularly
during the regime of Akbar, India had a well-organised economic set-up
and Ain-i-Akbari contains very interesting and useful statistical
information about the Indian economy during the period 1556 to 1601
A. D. However, in those days there were no statistical organisations
in the country for collection and analysis of data and most of the
statistics emerged as a result of the working of various laws relating to
land revenue and other types of taxes.
18th century. With the fall of the Mughal Empire, when the East
India Company came to power then also no necessity was felt for the
collection of statistics and there was no statistical organisation in the
country. Statistics relating to this period emerged out of the accounts
of exports and imports maintained by the government. Later on, towards
the close of the 18th century, the Ryotwari system of land tenure
was introduced in certain parts of the country and then it became
necessary to collect certain statistics about the cost of cultivation, value
of agricultural produce and agricultural prices, etc. These statistics
were collected by land revenue officers and were specifically meant for
utilization by the Revenue Department.
19th century. In the 19th century it was on account of various
famines, particularly due to the famine of 1860, that some attention was
paid to the collection of statistics, though even during this period no
statistical organisation was set up and only the Revenue and Administrative
Reports of various provinces contained some statistical information.
However, it was in the year 1868 that the Statistical Abstract of British
India was first published from London. It continued to be published
every year from London till 1923, when its publication began from
India. In the year 1874 Sir John Strachey, the then Governor of North
Western Province (now called Uttar Pradesh) wrote to the Secretary of
State for India suggesting the creation of a department for the
collection of statistical information regarding trade and agriculture and
the appointment of a Director of Agriculture and Commerce. It was
in accordance with these recommendations that a Department of Agriculture
and Commerce was set up in this province in the year 1875.
One of the main functions of this department was to collect trade statistics
and to suggest ways and means of improving the agricultural
statistics of the country. A little later the Indian Famine Commission
recommended the appointment of a Director of Agriculture in each
province and the appointment of Statistical Officers to assist him in his
work. In accordance with these recommendations, Agricultural Departments
were opened in various provinces and the Central Agricultural
Department which was created in 1871 but which was closed due to
financial stringency arising out of the Afghan War, was also revived
with a view to co-ordinate the work of the various provincial agricultural
departments. Though these agricultural departments were primarily
concerned with the improvements of agriculture yet they collected
valuable statistical information about various agricultural problems.
In the year 1881 the first population census was taken but since it was
not country-wide and was not complete it is usually left out of account.
Population censuses at that time and even as late as 1941 did not need
any permanent staff or department and as such no statistical machinery
was established for this purpose. The Census Department and the
Census Staff had a very short duration of service and used to be disbanded
soon after the census operations were over. In the year 1881
the Imperial Gazetteer of India was also published for the first time. It
contained economic statistics of the different parts of the country. During
the last few years of the 19th century various departments of the
Government of India started collecting and publishing statistical information
relating to their subject. It was in the year 1894 that the first
crop forecast of wheat production was made in this country and in
subsequent years forecasts were made of other agricultural commodities
also. A publication entitled Reports of Agricultural Statistics of British
India was brought out by the Revenue and Agriculture Departments in
the year 1886. The statistics of foreign trade were published by the
Finance and Commerce Department and in 1895 a Statistical Bureau was
set up to deal with the agricultural statistics and the statistics of foreign
trade. This Bureau was headed by the Director-General of Statistics.
20th century. In the year 1905 the office of the Director-General
of Commercial Intelligence was created to maintain liaison between the
Government and the businessmen. The Director-General of Commercial
Intelligence was also to look after the work of the Statistical Bureau
which was formerly under the Director-General of Statistics. In the
year 1906 this department brought out the first issue of the Indian
Trade Journal. In the year 1912 when the headquarters of the Government
of India were shifted from Calcutta to New Delhi it was thought
desirable to separate Statistics from Commercial Intelligence. This
separation dislocated the work of both the departments and ultimately
in the year 1922 they were merged again. The designation of the head
of this department was then changed to Director-General of Commercial
Intelligence and Statistics.
During the first World War the folly of not industrializing this country
and not keeping adequate figures about various economic problems
was realised by the British Government. The country got an impetus
for industrial development during this period and consequently the
question of collection of statistics also came to the forefront. The
Indian Economic Enquiry Committee was appointed in the year 1924
under the chairmanship of Sir M. Viswesvarayya with a view to survey
the then existing statistical material in this country and to make recommendations
for improvement. In its report the Committee recommended
that the statistics collected by all Central and Provincial departments
should be placed under the supervision of a Central Authority and
further that each province should have a Statistical Bureau. The Royal
Commission on Agriculture agreed with the observations of the Indian
Economic Enquiry Committee but was of the opinion that the Central
Statistical Authority should merely be a co-ordinating agency and that
the statistics should be collected by various departments separately. In
1931 the Royal Commission on Labour emphasised the necessity of
collecting various types of labour statistics and it suggested that there
should be legislation to facilitate the work of the collection of data. The
Royal Commission suggested the creation of an Imperial Council of
Agricultural Research. Its primary duty was to promote, guide and
co-ordinate agricultural research and to act as a clearing-house for
information in regard not only to research but other general matters
also connected with agriculture and animal husbandry. It was also
to take over the publication work done by the Imperial Agricultural
Department. Though the Government of India did not accept all the
recommendations of the Commission in this connection yet by a resolution
passed on the 4th of August, 1930, the Secretariat of the Council
of Agricultural Research was constituted as a department of the Govern-
ment of India. This department though primarily concerned with agri-
cultural research had a statistical establishment also and it co-ordinated
the agricultural research carried on at different places. In 1933 a Statistical
Research Bureau was set up in New Delhi for the purpose of analysis
and interpretation of economic statistics. In 1933 Messrs. Bowley and
Robertson were appointed to conduct an economic census of India.
They recommended the appointment of a permanent economic staff with
a Director of Statistics at the centre. The work of this organisation
was to co-ordinate the statistics collected by the provincial and central
departments and also to conduct a census of production and census
of population. These recommendations could not be implemented
by the Government, but in 1938 an Office of the Economic Adviser to
the Government of India was created. The functions of the Economic
Adviser included collection and analysis of economic statistics. The
Statistical Research Bureau started in 1933 was merged with this office.
The second World War again found India without a well-organised
statistical machinery. With the outbreak of War in 1939, need was
felt of collecting statistics about a large number of problems and this
resulted in setting up of small statistical organisations in various departments
of the Government both at the Centre as well as in the provinces.
These statistical units collected and analysed statistics and advised the
Government on matters relating to their respective fields. The Govern-
ment felt difficulty in the collection of industrial statistics and to over-
come it an Industrial Statistics Act was passed in 1942. The Directorate
of Industrial Statistics conducted the first Census of Manufactures in
1946 after the Census of Manufacturing Industries Rules were passed.
The Labour Bureau started constructing cost of living index numbers
for certain urban and rural areas with the base year of 1939. In 1947
the Economic Adviser's Office also started publishing the General
Purpose Wholesale Price Index Number. This index number replaced
the earlier index number which was issued by the office of the Economic
Adviser and which was of a sensitive type being constructed out
of 23 commodities only. A National Income Committee was set up
in the year 1949 and it has given estimates of India's national income
for a number of years. National Sample Surveys were conducted
for the first time in the year 1950 (and are still being carried on). A
Statistical Unit was established in 1949 to co-ordinate all activities relating
to statistics collected in this country and later on this unit developed
into the Central Statistical Organisation in the year 1950.
In the year 1951 an international statistical conference was held at
Calcutta to study statistical problems which were common to all countries
and to suggest improvements with a view to bring about conceptual
uniformity in the data collected. The Collection of Statistics Act
was passed in 1953 and it empowered the Government to collect all
types of statistics relating to any matter. In the year 1956 the All India
Agricultural Labour Enquiry was conducted to collect up-to-date wage
statistics and other important facts relating to labour conditions in the
country. The All India Rural Credit Survey which was conducted
in the year 1951-52 to collect statistics of rural indebtedness and other
problems of rural finance was followed up in subsequent years and
very valuable figures were collected. The Indian Statistical Institute,
Calcutta and the Indian Council of Agricultural Research have
been doing very valuable work in the field of statistical research and
in the year 1960 the Indian Statistical Institute was declared
an institute of national importance recognised by the Government.
Present Position. The present position of the statistical organisation
of India can best be understood in the background of the Indian
Constitution. According to the present Constitution there are some
items over which the Central Government has exclusive control while
there are others which are under the direct jurisdiction of the State
Governments. There are some items which are under the jurisdiction
of both the Central Government and of the State Governments. The
important items of the Union List are Defence, Railways, Posts and
Telegraph, Currency and Foreign Exchange, Banking, Trade and
Commerce with foreign countries, Census, Customs and Excise Duties
and Income Tax. The Central Government is responsible for the collec-
tion of statistics with regard to all these items. The States List includes
items like Public Health, Agriculture, Livestock, Irrigation, Forest and
Fisheries, etc., and with regard to these items statistics are collected by
the State Governments though the Central Government has also a right
to frame laws relating to any of these items. The Concurrent List includes
Vital Statistics, Economic and Social Planning, Trade Unions,
Social Insurance, Labour Welfare, Relief and Rehabilitation, Price Con-
trol, etc., and with regard to these items both the Central Government
and the State Governments can frame laws and collect statistics. It
should not be taken to mean, however, that there is a rigid line of demarcation
between the fields of operation of the Central and the State
Governments. In fact there is co-ordination in the work of the
Centre and the States. The Centre acts as a co-ordinating agency and
publishes the statistics collected by various states on an all-India basis.
The above survey clearly indicates that the statistical organisation
in this country has gradually been decentralised. Formerly we had a
highly centralised system of the collection of statistics and the Department
of Commercial Intelligence and Statistics was the pivot round
which the wheel of statistical organisation revolved. All important
statistics were collected, compiled and published by the Department of
Commercial Intelligence and Statistics. It used to publish statistics
relating to Agriculture, Inland and Foreign Trade and Prices, etc. With
the expansion of the scope of economic statistics in this country this
single department could not cope with the situation and it was thought
advisable to decentralize the system and thus to distribute the huge task of
the collection of statistics to a number of statistical units. With this
view in mind the statistical organisation of the country was gradually
decentralized. At present, so far as the centre is concerned each ministry
has a statistical unit (some have more than one) which is responsible
for collection and compilation of statistics relating to the subject of
the ministry. There are about 90 full-fledged statistical organisations
attached to the various ministries at the centre. Important amongst
them are as follows :-
1. Ministry of Food and Agriculture
(a) Directorate of Economics and Statistics.
(b) Directorate of Marketing and Inspection.
(c) Statistical Wing of the I. C. A. R.
(d) Statistical Branches of (i) Forest Research Institute, (ii)
Central Rice Research Institute, (iii) Central Marine and
Fisheries Research Station, etc.
(e) Institute of Agricultural Research Statistics.
2. Ministry of Commerce and Industry
(a) Department of Commercial Intelligence and Statistics.
(b) Office of the Economic Adviser to the Govt. of India.
(c) Directorate of Industrial Statistics.
(d) Statistical Sections of offices of (i) the Textile Commissioner,
(ii) the Iron and Steel Controller, and (iii) the Chief Controller
of Imports and Exports.
3. Ministry of Finance
(a) Department of Research and Statistics of the Reserve Bank
of India.
(b) Research and Statistics Section, Department of Company
Law Administration.
(c) Statistical Branch (Income-tax) of the Central Board of Revenue.
(d) Statistics and Intelligence Branch (Customs and Central
Excise), Central Board of Revenue.
4. Ministry of Labour and Employment
(a) Labour Bureau.
(b) Statistical Unit in the Department of Mines.
(c) Statistical Branch, Agricultural Labour Enquiry.
(d) Statistical Section of the Directorate of Resettlement and
Employment.
5. Ministry of Home Affairs
Office of the Registrar-General and Census Commissioner of
India.
6. Ministry of Health
7. Cabinet Secretariat
Central Statistical Organisation: Directorate of National
Sample Survey and National Income Unit.
8. Planning Commission
Specific technical studies being conducted in the different
divisions and by the Evaluation Organisation.
Similarly under other Central Ministries also there are statistical
units which collect and analyse statistics relating to the subject which the
ministry deals in.
In the States also there is a decentralized type of statistical organisation.
There are more than 101 statistical units working in different
states of the country. Almost all the states in the country have either
a Directorate of Economics and Statistics or Bureau of Statistics.
The state statistical organisation is responsible for the collection and
publication of statistics relating to the state concerned and it has District
Statistical Officers for exercising supervision and processing of
data collected from various sources. The Directorate also co-ordinates
the statistics collected by other units in the State. The Inter-State
co-ordination of statistics is done by the C. S. O.
Decentralization of the statistical organisation necessitates the
presence of an efficient co-ordinating agency so that statistics may be
collected uniformly throughout the country and there may not be unnecessary
duplication. Accordingly the Central Statistical Organisation
was set up under the Cabinet Secretariat at the Centre in 1951.
The main functions of this Organisation are :-
(a) To advise the various Ministries and other Government agencies
about statistical matters,
(b) To co-ordinate the statistical work of different ministries
and Government agencies with a view to avoid duplication and to
maintain uniformity,
(c) To lay down standardised definitions of various terms with a
view to have a uniform collection of statistics so that comparability
may be achieved not only in the national field but in the international
field also, and
(d) To supply statistical data to U. N. O. and other international
bodies on behalf of the Government of India.
Besides the C. S. O. the Directorate of N. S. S. is the most important
statistical organisation in the country. It was set up in the
year 1950 and is by far the most important agency for the continuous
collection of reliable statistical data on random sample basis.
Apart from these government organisations, the non-government
organisations worth mention are: (i) Indian Statistical Institute,
(ii) National Council of Applied Economic Research, (iii) Indian
Institute of Economic Growth, (iv) Gokhale Institute of Politics and
Economics, (v) Institute of Applied Man-power Research. The I. S. I.
was set up in 1931 and ever since has been doing researches in statistical
methods and imparting training for statistical assignments. In
1960, the Government of India passed the Indian Statistical Institute
Act and recognized it as an institute of national importance.
The above account of the gradual development of statistical organisation
in India clearly indicates that it is only very recently that we
have in our country a statistical organisation worth the name. The
Government has realised the importance of the collection of statistics
and is keen to see that statistics are collected in our country almost in
the same fashion as they are collected in other countries of the world.
It is common knowledge that economic planning cannot progress
successfully in the absence of adequate and accurate statistical data and
the Government is now doing its best to improve the situation in this
direction. Formerly considerable difficulty was felt by the Government
in the collection of statistics due to the reluctance of people to part
with factual information. To overcome this difficulty the Government
passed an Act named as the Collection of Statistics Act in the year 1953.
It gives legal powers to the Government to collect any type of statistical
information regarding industrial and commercial units. All types
of industries and commercial concerns are covered under this Act.
This is the first comprehensive legislation of its kind in India and it
gives a clear indication of the Government's intentions regarding collection
of statistics in future.
IMPROVEMENT IN METHODOLOGY, SCOPE AND COVERAGE
Statistical research. Not only has there been a considerable
improvement in the statistical organisation of this country in recent
years but attempts have also been made to promote statistical research
and to evolve scientific methods for the collection and analysis of
data. The Indian Council of Agricultural Research, Delhi (formerly
known as the Imperial Council of Agricultural Research) has been
carrying on experiments in the field of random sampling and has evolved
a new technique of estimating the yield of various crops. The new
method, which is based on random sampling, is now being followed in
most of the states for the estimation of crop yields. Similar
experiments have also been conducted by the Indian Statistical
Institute, Calcutta, and the technique evolved by them is being used
for making crop forecasts in West Bengal. These institutions impart
statistical training to students and hold various diploma examinations
in statistics. Recently two sessions of International Statistical
Conferences were also held in our country and many foreign experts came
here and gave a series of lectures on various types of statistical
methods. The conferences were followed by short periods of training in
which students from neighbouring countries also participated. Besides
this, various researches are being conducted for the purpose of
collecting data, particularly in industry, and this is a new
development so far as our country is concerned.
Improved technique. It is gratifying to note that the technique of
the collection and analysis of statistics in our country is fast
improving and the results of various researches and experiments are
being utilised for replacing the old and out-of-date methods. Our last
population census held in 1951 was very much different from previous
censuses. The crop forecasts, as we have already said above, are now
gradually being made by the technique of random sampling and the
traditional method of Normal Yield and Annawari Estimate is gradually
being scrapped. Various random sample surveys have been conducted in
our country in recent years and the Government is trying to fill up the
gaps in statistical information relating to various problems. The
All-India Agricultural Labour Enquiry Committee conducted a very useful
survey in the year 1950-51 under the Ministry of Labour. In April 1967,
the Government appointed a National Labour Commission to enquire into
the working conditions in factories and in fields, union issues, wage
policy, incentive schemes, social security and labour legislation.
Similarly an all-India Rural Credit Survey was organised by the Reserve
Bank of India in 1951-52. A more comprehensive survey known as Rural
Investment and Credit Survey (1961-62) has recently been completed by
the Reserve Bank of India. National Sample Surveys have been conducted
since 1951 by the Ministry of Finance and eleven rounds
of it have been completed. Recently National Sample Surveys on a
"matching" basis have also been started by the State Directorates of
Economics and Statistics. Information about a large variety of problems
relating to rural and urban population has been collected by the
N. S. S. The National Income Committee has given the final estimates of
India's national income for the years 1948-49 to 1950-51. They have
also used the improved methods in making these estimates. Various
indices relating to the wholesale prices, consumers' prices, security
prices, industrial production, cost of living, earnings of labourers
and industrial profits, etc., are now being compiled on scientific
lines.
Better coverage. It should be noted that not only are better methods
being used at present but the scope of statistics has also been
considerably enlarged. This has been done in two ways. Firstly, we are
now collecting statistics about a larger geographical area of the
country. Now that most of the former Indian states have merged with the
Indian Union, statistics are being collected about all of them and the
coverage of Indian statistics has thus considerably increased.
Secondly, we are now collecting statistics about a larger number of
problems. Formerly the position of industrial statistics, statistics of
income and cost, etc., was very precarious. Now attempts are being made
to fill up these gaps. In many cases information has been collected by
random sample surveys, and in others census methods have been used.
Thus we are now holding an annual census of manufactures, and
statistics about many other problems are being collected through the
N. S. S.
Non-official statistics. Besides the above mentioned improvements
in official statistics certain improvements have been made in
non-official statistics also. The non-official statistics in our
country, as in other countries also, are much less in quantum than
statistics collected by the Government. Non-official and semi-official
statistics are collected by Universities, Research Institutes, Boards
of Economic Enquiry, Trade Unions, Manufacturers' Associations,
Chambers of Commerce, Stock Exchanges and Commercial Journals and
Magazines. Both the quantity and the quality of such data are gradually
improving in our country and it shows that the apathy of people to
statistics is gradually disappearing and they are taking more and more
interest in statistics. It is a very healthy sign and it augurs well
for the future development of statistics in this country.
However, the above survey of Indian statistics should not be taken
to mean that now there are no drawbacks or shortcomings in our
statistics. Even now there are many pitfalls and inaccuracies in our
statistical data and we should like to warn the student of this subject
once again against accepting all statistics as good and dependable. We
shall point out the shortcomings of various types of statistics as and
when we discuss some of the important statistics collected in our
country in the following sections.
SECTION (2)

POPULATION STATISTICS

Old and New Concepts. Population statistics are supposed to be the
oldest statistics, as in ancient days the ruling kings and chiefs used
to collect figures of the number of people in their territories for
getting an idea about their military strength. In most of the
countries, therefore, the earliest form of economic statistics
available relates to the number of people and their occupation, etc.
The importance of population statistics in earlier times was very great
on account of the fact that in those days the strength of the
population and consequently of the army was a very vital factor in the
establishment and expansion of empires. Even today the importance of
population statistics is very great. Nowadays the utility of these
statistics is not so much from the point of view of finding out the
number of people as from other points of view. The concept of
population statistics has undergone a considerable change and today a
census is not a mere counting of heads. In the words of Sardar
Vallabhbhai Patel: "It involves extraction of information which plays a
vital role in the determination of many of our administrative policies.
The facts elicited during the course of this operation yield valuable
scientific data of sociological importance. In many matters it provides
useful guide for the effectiveness or otherwise of our economic
policies."
Importance. Thus we find that the concept of population statistics
has undergone a complete change and now population statistics deal not
only with the number of people, their birthplace, age, nationality, sex
and civil condition but with more important facts relating to means of
livelihood, occupation, economic status, dependency, employment, etc.
All these are important economic problems and in the age of economic
planning every government likes to know about the success or failure
of its policies followed in respect of these problems. Population
statistics are very helpful for this purpose. They tell us whether the
policy followed in a particular field has brought about the desired
results or not.
For a country like ours the importance of population statistics is
still greater because for some time past we have been witnessing a
constant race between food supply and population. The Census
Commissioner of the 1951 census has already sounded a note of warning
that in future, if conditions continue like this, our population
problem may become too gigantic to be easily solved. Under such
circumstances the importance of population statistics increases
considerably and in fact our future economic policies have to be
decided in the light of the facts revealed by the census of 1951.
The statistics of birthplace, nationality, age, sex, marital status
and economic characteristics have their own importance and it would
be futile to elaborate the utility of all these types of statistical
data. Suffice it to say that population statistics are fundamental and
basic statistics and one of the most important sets of figures that a
country should possess. Broadly speaking, population statistics can be
studied under three categories. They are:
(a) Population Census,
(b) Vital Statistics, and
(c) Demographic Surveys.
Population census deals with the counting of the number of people
on a particular day and noting down the characteristics of the people
counted. Practically all the countries of the world hold a population
census, generally once in 10 years. Vital statistics deal with the
record of births and deaths. If vital statistics are accurately
maintained the population of a country can be known without holding a
population census. In fact the founder of population statistics, John
Graunt, used vital statistics exclusively to find out the population of
England in the year 1662 because a population census was not taken in
England in those days. At present in most of the countries of the world
both population census and vital statistics are used to study
population growth and in fact unless both these types of figures are
available growth of population cannot be properly studied. Demographic
surveys are meant to collect information about particular problems in
certain regions and they supply very valuable information and in many
cases they fill up the gaps of data relating to inter-censal periods.
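The remark that an accurate record of births and deaths can stand in for a census rests on simple bookkeeping: a later population equals an earlier count plus births minus deaths. A minimal sketch of that bookkeeping is given below. The function name and all figures are hypothetical, and the optional migration term is an added assumption, since the text above speaks only of births and deaths.

```python
def estimate_population(base_population, births, deaths, net_migration=0):
    """Carry a base count forward using registered births and deaths.

    net_migration is an added assumption; the text mentions only
    births and deaths.
    """
    return base_population + births - deaths + net_migration


# Hypothetical figures, for illustration only.
base = 1_000_000
estimate = estimate_population(base, births=40_000, deaths=27_000)
print(estimate)  # 1013000
```

Any under-registration of births or deaths passes straight into such an estimate, which is why the two sources are best used together.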
In India we have both the population census and vital statistics.
Demographic surveys have been started only very recently. Our
statistics of population have not been very satisfactory and we shall
examine their shortcomings in the following pages.
POPULATION CENSUS
Census procedure up to 1931 census
Census Act. Population census in India was considered a decennial
operation for which haphazard temporary arrangements used to be made.
Shortly before a census was scheduled to take place an Act was passed
empowering the Central Government to appoint a Census Commissioner at
the top of the census organisation and Superintendents of Census
Operations in every province. This Act also made it incumbent upon
various types of institutions (non-official and semi-official) to
assist the Government in holding the population census. By this Act
individuals were also under legal obligation to give accurate answers
to the questions put to them by Census Enumerators, and any obstruction
and restraint in the discharge of duties of the census officials and
falsification and misrepresentation were punishable with a fine.
Census staff. Population census staff thus consisted of the Census
Commissioner at the top, State Census Commissioners in charge of
various States, District Census Officers in charge of various
districts, Charge Superintendents under whose jurisdiction usually a
Tahsil was kept, Circle Superintendents who were in charge of a town or
a city or a part of it, and finally the Block Enumerators who were in
charge of a few streets or a certain number of houses in a particular
town or village. Indian States had their own officers. To hold the
population census the Government used to divert its own permanent staff
for conducting census operations and usually the District Census
Officers were persons of the rank of Deputy Collector and Charge
Superintendents usually Tahsildars and Naib-Tahsildars. In rural areas
a charge was usually the circle of a Kanungo. Circle Inspectors were
usually appointed from the clerical staff of various government
departments and the Block Enumerators were either teachers of Municipal
or District Board Schools or other low-paid government servants. On the
rural side the census staff was usually appointed from the revenue
department and generally the Patwari worked as a Block Enumerator.
Training. After the appointment of staff a nominal training was
given to them. Training was given usually in two ways. The staff
above the enumerators was first supplied with census manuals which gave
detailed information and instructions as regards the census procedure
and the duties of the various officials. Apart from this, the staff was
also trained orally. Some enumerators and supervisors were asked to
fill up sample schedules which were corrected by their immediate
officers. This training, as has been said, was very nominal and it
struck an almost farcical note when we think that nearly two million
people were necessary to conduct the population census in this country.
House numbering. The actual census work began with the numbering
of houses. It was a very important work and it used to be done much
before the actual census date. The definition of the word "house" for
the purpose of population census is different from its ordinary
meaning. In the various population censuses held in India the
definition of the word "house" has not been uniform. In the census of
1931 and in other censuses before that also the house had been defined
on the basis of "chulha." It is easily intelligible and based on a
well-known and deep-rooted custom of the people that the members of the
joint family eat food cooked from the same chulha. The counting of
houses, therefore, amounted to counting the number of families which
had a common cooking place.
Population count. After the numbering of the houses a preliminary
census used to take place. Usually this was done a few weeks before
the actual census date. The enumerator used to go with the schedule
to the various houses in his block and used to fill up the forms
himself. This work was carefully checked by supervisors and other
officers. The actual census usually related to a particular night. On
the night of the census the preliminary record was made up-to-date.
Names of persons who had left the houses or had died were struck off
from the list and those who had come from outside or were born were
entered in the list. Some difficulty arose in connection with the
persons who on the census
night were travelling by trains or boats or were working in forests,
etc. Special arrangements were made for all these and similar other
cases. All persons travelling by rail who purchased tickets after
7 p. m. on the night of the census were enumerated on the platform if
there was time and, if not, then on the trains. Those alighting at any
station during the night were enumerated there unless they could
produce a pass showing that they had already been counted. All trains
used to be stopped and every carriage visited at about 6 a. m. on the
following morning in order to include travellers who had escaped notice
till then. Similarly other special arrangements were made for other
cases of this nature.
On the next morning each enumerator prepared a statement showing
the population of the block and handed it over to his supervisor who,
after checking it, prepared a total for his circle and handed it, in
turn, to the Charge Superintendent. The Charge Superintendent
similarly prepared a total for his charge and sent it to the District
Census Officer, who in turn sent the figures to the provincial
Superintendent. The district figures were totalled and provincial
totals were soon obtained.
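The passing of totals from block to circle, charge, district and province is a simple roll-up, and may be sketched as follows. The place names and figures are hypothetical and the function `roll_up` is our own illustration; the actual work was of course done by hand on paper statements.

```python
from collections import defaultdict

# Hypothetical block returns: (district, charge, circle, block, persons).
block_returns = [
    ("District A", "Charge 1", "Circle 1", "Block 1", 412),
    ("District A", "Charge 1", "Circle 1", "Block 2", 388),
    ("District A", "Charge 1", "Circle 2", "Block 3", 501),
    ("District A", "Charge 2", "Circle 3", "Block 4", 297),
    ("District B", "Charge 3", "Circle 4", "Block 5", 640),
]


def roll_up(returns):
    """Total block figures at circle, charge, district and province level."""
    circles = defaultdict(int)
    charges = defaultdict(int)
    districts = defaultdict(int)
    for district, charge, circle, _block, persons in returns:
        circles[(district, charge, circle)] += persons
        charges[(district, charge)] += persons
        districts[district] += persons
    province = sum(districts.values())
    return dict(circles), dict(charges), dict(districts), province


circles, charges, districts, province = roll_up(block_returns)
print(districts)  # {'District A': 1598, 'District B': 640}
print(province)   # 2238
```

Since every level is a plain sum of the level below, an officer can check his total against the statements handed to him before passing it upward.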
De facto census. It should be observed that population census in
India till 1931 was held on a de facto basis. Under this system persons
are counted wherever they are found on the census night. This system
suffers from various drawbacks. There is always a possibility, in this
system, of double counting and there is no method of verification if
people under-estimate or over-estimate the number of persons in their
houses. Actually there was a considerable over-estimation in the number
of people, particularly in Bengal where the strength of the Hindus and
Muslims was almost equal, because the seats in the legislature in those
days were divided on the basis of communal representation. Besides this
a de facto count does not give a correct picture of the economic
status, distribution of population and other similar subjects. It
requires a huge army of enumerators because the whole operation of
census from start to finish has to be completed in one night. Moreover
there are always difficulties in the selection of the census night
because it should be a moonlit night, as in villages there are no
electric lights. It should be neither very cold nor very hot and should
be such a night on which there is no major festival of the country when
people are expected to be out of their homes. Due to these difficulties
the de facto system of holding the population census was replaced by
the de jure system of holding the population census in 1941.
Changes in 1941:
(i) The most important change in the census of 1941 was that
one night enumeration was abandoned and population census was held for
the first time in our country on the basis of normal residence. We
switched over from de facto to de jure count. As has been mentioned
earlier, in de jure count people are not counted wherever they are
found on census night but on the basis of their normal residence and
for this
a period of enumeration is fixed, and if during this period a person
has been at his normal residence even for some time he is counted at
the place of normal residence even though he is not present there on
the day to which the census relates. In 1941 census this period of
enumeration was one week.
(ii) Another important change was that the old schedules were
abolished and the enumeration was conducted directly on slips which
were later on sorted for purposes of tabulation. Prior to the Census of
1941, data used to be copied from the schedules to the slips and then
tabulation was done. This meant duplication of work and also increased
the chances of error in copying. In the Census of 1941, therefore,
enumeration was done directly on slips. One slip was assigned to each
individual on which the entire information relating to him was noted
down.
(iii) Another innovation of this census was that a two per cent
random sample of all the slips was taken out for verification of census
data at a later date. Every 50th slip was taken out and kept separately
from other slips for the purpose of analysis and verification. However
the sample could not be analysed due to the emergency created by war
at that time but these slips were later on analysed by the Indian
Statistical Institute, Calcutta, when the National Income Committee
wanted to have certain information in the year 1949.
(iv) In the census of 1941 another change introduced was the
extension of the house list. In 1931 census, the house list was based
on the list of houses irrespective of the definition which was used for
the word 'house' for the purposes of census. In 1941 census, at the
time when houses were being numbered, a preliminary type of census was
held giving information about the size of the family, sex ratio and age
distribution.
(v) In 1941 census there was complete centralisation of printing and
for the first time mechanical tabulation was introduced. However all
the tabulation work could not be done by machines but an experiment was
made in this direction by using Government machines in spare time and
the experiment was no doubt a success.
(vi) Besides the above mentioned changes in the technique of
holding the population census, certain new items about which no
information was available in this country were included in the
questionnaire of 1941 Census. Two new questions were introduced to
study the rate of population growth in the country and they related to:
(a) the number of children born to a woman, and
(b) age at the time of first child birth.
This information was needed for the calculation of the net reproduction
rate.
(vii) In 1941 census the occupational classification was revised and
made more scientific and realistic. However, due to difficulties
created by war, statistics of occupational classification could not be
tabulated and as such the census report of 1941 does not give any
information
on this point. The Indian Statistical Institute, Calcutta, later on
gave some rough idea about the occupational structure of population in
1941 from the sample slips which were kept aside. This work was done at
the instance of the National Income Committee in 1949.
(viii) For the first time in 1941 separate figures were collected
about persons who could read but not write.
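The two per cent sample described in item (iii) amounts to pulling out every 50th slip, which gives a sampling fraction of 1/50. A minimal sketch of that selection is given below, assuming the slips lie in a fixed order and that the first slip taken is the 50th; the serial numbers and the function name are hypothetical.

```python
def every_nth(slips, n=50):
    """Return every n-th slip: the 50th, 100th, 150th, ... when n is 50."""
    return slips[n - 1::n]


slips = list(range(1, 1001))     # hypothetical slip serial numbers 1..1000
sample = every_nth(slips)
print(len(sample))               # 20
print(sample[:3])                # [50, 100, 150]
print(len(sample) / len(slips))  # 0.02
```

A selection at a fixed interval of this kind is usually called a systematic sample; it behaves like a random one only when the order of the slips is unrelated to the characteristics under study.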
Population Census, 1951
General. The population census of India held in March, 1951
was the first census conducted after the independence of the country
and as such has a special significance. Many changes of a far-reaching
nature were made in the technique of holding the census and information
was collected about a large number of new problems. Many items which
were considered to be less useful were deleted from the list of
questions and new items, particularly those relating to economic
characteristics of the population, were added. Reports of the last
census are very exhaustive. They have been issued in 17 volumes which
are divided in 63 parts. The first of these volumes contains an
all-India census report and is divided in 5 parts. The other 16 volumes,
which are divided in 58 parts, contain the State Census Reports. In
addition to these, 307 district census hand-books have been prepared
and more than a dozen brochures dealing with economic and demographic
data have been published. The total cost of the census has come to
about 149 lakhs of rupees. A staff of about 7 lakh people conducted the
census operations. Out of these 5,93,518 were census enumerators,
80,006 supervisors and 9,854 charge officers. The period of enumeration
in this census was 3 weeks, from 9th February to 1st March. "During
these 21 days about 6 lakhs of census workers visited 644 lakhs of
occupied houses and made enquiries. The information supplied to them by
about 7 crore citizens was recorded in 3,569 lakhs of census slips,
each of which was a dossier of one person."
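The figures quoted above can be cross-checked with a little arithmetic. The sketch below uses only numbers given in the text, with one lakh taken as 100,000; the ratio of slips to occupied houses is our own derived illustration and is not a figure quoted from the census report.

```python
LAKH = 100_000

enumerators = 593_518      # 5,93,518 in Indian digit grouping
supervisors = 80_006
charge_officers = 9_854

staff = enumerators + supervisors + charge_officers
print(staff)                   # 683378
print(round(staff / LAKH, 2))  # 6.83, i.e. "about 7 lakh"

parts = 5 + 58                 # all-India parts plus State parts
print(parts)                   # 63

slips = 3_569 * LAKH
occupied_houses = 644 * LAKH
print(round(slips / occupied_houses, 2))  # 5.54 slips per occupied house
```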
National Register. Besides the extension of the enumeration period
from one week to three weeks another important innovation of the
census was the preparation of the National Register of Citizens. From
the census slips information about individuals was copied on registers.
Every village and every ward of a town has now a register of its own
which is considered to be a part of the National Register. This
Register is available for reference to authorised persons either for
administrative purposes or for any social or economic enquiry.
Unauthorised persons have no access to it and like other census records
it is not admissible as evidence in any court of law. This Register
serves a very useful purpose. It is possible to extract local census
information from it and it serves as a framework for social and economic
surveys conducted on random sample basis. It is extremely useful for
the maintenance of electoral rolls. This Register has introduced a new
and important check on census enumeration and it will improve the
quality of enumeration to a considerable extent.
Permanent Act. There are other factors as well that lend new
significance to the census of 1951. Besides being more thorough and
scientific in nature than previous censuses it has the unique privilege
of having been conducted under a permanent Census Act and through a
permanent office of the Registrar-General and Census Commissioner.
This fills up a lacuna of long standing in the field of population data
and it shall greatly enhance the accuracy, effectiveness and usefulness
of such enquiries in future.
Advantages of de jure count were fully realised in the census of
1941 and, as has been said, the period of enumeration in the 1951 census
was increased from one week to three weeks. This change added to
the convenience and efficiency of the enumeration as the enumerators
got full 21 days to do their job. The house list which was prepared
earlier was also checked during this period by the block enumerators.
The normal residence of a person was the criterion for counting.
Unless a person was absent from the place of his normal residence for
the whole of the duration of the census enquiry (9th February to 1st
March) he was counted at the place where he normally resided. The
reference date of 1951 census was 1st March; therefore to bring the
information correct up to March 1st, the first three days of March were
allotted for a re-check operation.
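The de jure rule of 1951 can be put as a single test: a person is counted at his normal residence unless his absence covers the whole enumeration period. A minimal sketch follows, using the period stated above (9th February to 1st March, 1951); the function name and the sample absences are hypothetical.

```python
from datetime import date

PERIOD_START = date(1951, 2, 9)
PERIOD_END = date(1951, 3, 1)


def counted_at_normal_residence(absent_from, absent_until):
    """True unless the absence covers the whole enumeration period."""
    away_throughout = absent_from <= PERIOD_START and absent_until >= PERIOD_END
    return not away_throughout


# Length of the enumeration period, counting both end days.
print((PERIOD_END - PERIOD_START).days + 1)  # 21

# Away for part of the period only: counted at normal residence.
print(counted_at_normal_residence(date(1951, 2, 20), date(1951, 3, 10)))  # True
# Away from before the start until after the end: not counted there.
print(counted_at_normal_residence(date(1951, 1, 15), date(1951, 3, 5)))   # False
```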
Households. For the first time in 1951, population was counted
on the basis of "households" and not "houses". A distinction was
made between houses and households. A house was defined as a
dwelling place with a separate main entrance and a "household" was
defined as a group of people who live together and take food from a
common kitchen. As will be observed later on, this distinction was
very helpful in making a study about the size of the households in this
country. It has been a common feeling for some time past that the
joint family system in India is fast breaking up and the population
census of 1951 throws light on this problem by studying the size of the
households.
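The distinction between a house and a household may be illustrated with a small count. In the sketch below each person is tagged with a house number and with the kitchen from which he takes his food; persons sharing both form one household. All records are hypothetical.

```python
from collections import Counter

# Hypothetical records: (person, house number, kitchen within that house).
persons = [
    ("A", 1, "k1"), ("B", 1, "k1"), ("C", 1, "k1"),
    ("D", 1, "k2"), ("E", 1, "k2"),
    ("F", 2, "k1"),
]

houses = {house for _, house, _ in persons}
household_sizes = Counter((house, kitchen) for _, house, kitchen in persons)

print(len(houses))                       # 2 houses
print(len(household_sizes))              # 3 households
print(sorted(household_sizes.values()))  # [1, 2, 3]
```

House 1 here shelters two households of sizes 3 and 2, which is exactly the kind of detail a count by houses alone would conceal.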
Yet another important departure in 1951 census from the practice
followed in previous censuses was that the information about races,
castes and tribes was entirely omitted except in case of "special groups
or backward classes." This was in keeping with the general policy of
the Government of India not to encourage sectionalism on the basis of
caste, race, etc.
We shall now discuss the information that was collected at the
last census.
Information collected
(1) NAME AND RELATIONSHIP TO HEAD OF HOUSEHOLD
The head of the household was defined as a person on whom falls
the chief responsibility for the maintenance of the household. The
term head of the household was not meant exclusively for
self-supporting persons. The head of a household could be either
self-supporting,
dependent or partly dependent. Actual relationship for wife, son,
daughter, brother, sister, father, mother, son-in-law, daughter-in-law,
brother's wife was to be mentioned. Other relatives were classed in
one group. Unrelated persons living in a household like domestic
servants, etc., were to be noted down separately.
(2) NATIONALITY, RELIGION AND SPECIAL GROUPS
Nationality of all persons was to be recorded in full. So far as
religion was concerned information was gathered about Hindus,
Muslims and people belonging to other religions as also about those
who did not profess to have any religion. Special groups were meant
for Anglo-Indians and for all non-Hindus, except Anglo-Indians, as
also for people belonging to scheduled classes and backward castes.
The enumeration of scheduled classes and backward castes was necessary
under the Constitution of India and special provision was therefore
made for counting them; otherwise the Government had decided not to
attach any importance to the caste of the people.
(3) CIVIL CONDITIONS
Statistics were collected about unmarried, married, widowed, and
divorced people. Formerly we did not collect statistics about the
last category. A person was treated as unmarried only if he or she
never married in accordance with any religious rites or by registration
or according to any custom or form of marriage prevalent in the
community to which he belonged. Prostitutes, concubines, etc., were
treated as unmarried.
(4) AGE
Age was recorded as on last birthday, i.e., the actual number of
completed years.
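"Age on last birthday" means the number of years completed on the reference date, so a person whose birthday falls after that date is returned one year younger. A minimal sketch is given below, taking 1st March, 1951 as the reference date as stated earlier; the dates of birth are hypothetical, since the enumerators recorded the age reported and not a date of birth.

```python
from datetime import date


def age_last_birthday(born, on):
    """Completed years of age on the given date."""
    years = on.year - born.year
    if (on.month, on.day) < (born.month, born.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


reference = date(1951, 3, 1)  # reference date of the 1951 census
print(age_last_birthday(date(1920, 3, 1), reference))  # 31
print(age_last_birthday(date(1920, 3, 2), reference))  # 30
```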
(5) BIRTHPLACE
Statistics of birthplace were collected on the basis of districts.
For persons born in foreign countries the name of the country was
mentioned.
(6) DATE OF ARRIVAL OF DISPLACED PERSONS AND THEIR DISTRICT
OF ORIGIN IN PAKISTAN
The partition of the country and the subsequent influx of refugees
called for a special item concerning displaced persons in the
enumeration slip. A displaced person was defined as any person who has
entered India having left or being compelled to leave his home in
Western Pakistan on or after 17th March, 1947, or his home in East
Pakistan on or after 15th October, 1946, on account of the setting up
of the two Dominions of India and Pakistan.
(7) MOTHER TONGUE
Mother tongue was defined as the language spoken from the
cradle. In case of infants and deaf-mutes the language of the mother
was taken to be their mother tongue.
(8) BILINGUALISM
The languages recorded under this question were to be Indian
languages. The second language was recorded only if the person
enumerated could speak it fluently and habitually for domestic or
business purposes. Learned languages like Sanskrit, etc., were not
to be entered. If a person was fluent in several Indian languages
only the one in which he was most fluent was to be recorded.
(9) (a) DEPENDENCY (b) EMPLOYMENT
The census was concerned with two economic characteristics of
every individual: his economic status and his means of livelihood. The
ninth question related to economic status and questions 10 and 11 to
the principal and secondary means of livelihood. Economic status
was studied in two parts, namely, dependency and employment. A
person could be either "self-supporting" or "earning dependent" or
"non-earning dependent." A self-supporting person was defined as one
who earned enough income at least to maintain himself (not necessarily
his family). A dependent could be either an earning dependent or a
non-earning dependent. A person who earned an income which was
not sufficient to maintain his own self was classed as an earning
dependent and a person who did not earn any income in cash or in kind
was classed as non-earning dependent.
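The three dependency categories turn on two comparisons: whether a person earns anything at all, and whether what he earns is enough to maintain himself. A minimal sketch is given below. Treating "enough to maintain himself" as a rupee figure is our simplifying assumption, and the function name and amounts are hypothetical.

```python
def dependency_status(income, own_maintenance_cost):
    """Classify a person by the three dependency categories of question 9(a)."""
    if income <= 0:
        return "non-earning dependent"
    if income >= own_maintenance_cost:
        return "self-supporting"
    return "earning dependent"


print(dependency_status(0, 300))    # non-earning dependent
print(dependency_status(120, 300))  # earning dependent
print(dependency_status(450, 300))  # self-supporting
```

Note that the test looks only at the person's own maintenance, not that of his family, as the definition requires.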
So far as employment was concerned information was collected
only about self-supporting persons. A self-supporting person could
be either an employer or an employee or an independent worker or
none of these. An employer was defined as one who necessarily employed
other persons in order to carry on the business from which he secured
his livelihood. An employee was defined as a person who ordinarily
works for some other person for a salary or wage in cash or in kind
as a means of earning his livelihood. An independent worker was
defined as one who was not employed by anyone else and who also
did not employ anyone else in order to earn his livelihood.
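The four employment categories for self-supporting persons can be sketched as a short chain of tests. The three yes-or-no inputs and the order in which they are examined are our assumptions for illustration; the census instructions quoted above give only the definitions.

```python
def employment_status(employs_others, works_for_wage, works_on_own_account):
    """Classify a self-supporting person under question 9(b)."""
    if employs_others:
        return "employer"
    if works_for_wage:
        return "employee"
    if works_on_own_account:
        return "independent worker"
    return "none of these"


print(employment_status(True, False, False))   # employer
print(employment_status(False, True, False))   # employee
print(employment_status(False, False, True))   # independent worker
print(employment_status(False, False, False))  # none of these
```

The last category would cover a self-supporting person who lives on income not derived from work of any of the three kinds.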
(10) PRINCIPAL MEANS OF LIVELIHOOD
The information about this item was collected about every
individual. For a self-supporting person his principal means of
livelihood was recorded and for a dependent, whether earning or
non-earning, the principal means of livelihood of the self-supporting
person on whom he was dependent was noted down. Principal means of
livelihood was defined as the one which provided the largest income for
a self-supporting person who had more than one means of livelihood. In
case of self-supporting persons having only one means of livelihood,
principal means of livelihood meant the same thing as means of
livelihood.
(11) SECONDARY MEANS OF LIVELIHOOD
For a self-supporting person who had more than one means of
livelihood the name of the livelihood next in importance to his prin-
cipal means of livelihood was also noted down. For an earning depen-
dent the means of livelihood which provided the earning was noted.

(12) LITERACY AND EDUCATION


Statistics were collected about three categories of persons, namely
(a) those who can neither read nor write, (b) those who can read but
cannot write, and (c) those who can both read and write. If a person
who could both read and write had passed some examinations the highest
examination passed by him was also noted down. The test for reading
was the ability to read a simple letter and similarly the test for writing
was the ability to write a simple letter.
(13) UNEMPLOYMENT
This was a special question which was entered in the enumeration
slip of Uttar Pradesh. Statistics were collected about persons who
were gainfully employed on the date of enumeration and also for persons
unemployed and seeking employment. The period of continuous
unemployment in years and months was also noted down.
(14) SEX
Information was collected about males and females. Eunuchs
and hermaphrodites were classed as males.
There were thirteen questions common to all parts of India and
one question was included by each State according to its own liking.
In Uttar Pradesh this special question related to unemployment. In
Bombay and some other States the question related to fertility.
A perusal of the above topics about which information was
collected at the 1951 census indicates that the list was fairly exhaustive.
It included almost all items which the United Nations Organisation
wanted the various countries of the world to include in their lists. The
United Nations Organisation had given a list of twelve topics out of
which eleven are included in our list. The only topic about which
information was not collected in the 1951 census on an all-India basis
related to fertility. Even about fertility certain states like Bombay and
Madras had included a special question in their lists.
Population Census of 1961
Census of 1961 was the ninth decennial population census of our
country and was the second taken after independence. The 1951
census coincided with the beginning of the first five year plan and the
census of 1961 with the beginning of the third five year plan. This
coincidence has a great significance, as population statistics, in general,
and statistics relating to occupational distribution of the population
and the employment status of the economically active population,
in particular, throw light on the changing pattern of the country's
economy and also help in assessing the economic progress in terms of
national and per capita income and level of employment. The 1951 census
was conducted at a time when the country had hardly forgotten the
ill-effects of partition and was busy in settling down after the great up-
heaval which the independence brought in its train. As such the census
of 1961 can be treated as the first comprehensive and complete census
which our country had in normal conditions after achieving freedom
and after establishing the country's economy with the two five year
plans.
The 1961 census was the second conducted after the passing of a per-
manent Census Act and the appointment of a permanent Registrar Gene-
ral and Census Commissioner of India in 1941. It cannot be said
about this census that it was conducted after hurriedly passing an Act
and appointing an officer on an ad hoc basis to do the job. Prepara-
tions for the census had begun long before the actual census opera-
tions started and many Committees and Conferences had discussed
the details of the census work.
Special features of 1961 census
(i) Like the censuses of 1941 and 1951 the population census of
1961 was also conducted on a de jure basis and people were counted
on the basis of their normal residence.
(ii) The period of enumeration was 19 days beginning on 10th
February and ending on 28th February, 1961.
(iii) The distinction between house and household which was
made in 1951 census for the first time was retained and made slightly
more elaborate. In this census there were three categories namely
(a) Building, (b) Census House and (c) Households. What was defined
in 1951 census as 'house' was divided in two categories of 'building'
and 'census house' in the 1961 census. The definition of household
remained unchanged. In 1961 census the word "building" referred
to an entire structure raised on ground. The term "census house"
referred to "a building or part of a building having a separate main
entrance not necessarily leading to a road or lane." Thus one building
could have a number of census houses if each one had a separate en-
trance. A "household" was defined as a group of persons who or-
dinarily lived together and took food from a common mess. However,
it was not necessary that all members of a household should be relatives.
Persons living in hospitals or hostels could constitute a household and
there could also be one-person households. Just as a building could
have a number of census houses similarly a census house could have
a number of census households.
(iv) For the first time in the census of 1961 separate slips were
used for households and individuals. In a household slip informa-
tion was collected about the households engaged in (a) cultivation,
(b) household industries or (c) employed as labourers in either cultivation
or household industries or both. Such statistics had never been collected
in the past and they would throw light on the occupational pattern of
the Indian population living particularly in rural areas. A large variety
of useful information was collected on individual slips and one slip was
meant for one individual.
(v) In 1961 census the house list was also extended considerably
to include a large variety of information. At the time when the house
list was being prepared information was collected about the purpose
for which a house was used (namely for residence or for shop or work-
shop or school or any other institute etc.) In case a house was used
as a workshop or factory, further information about the number of
persons employed, type of work done, and kind of fuel or power
used was also noted down. Details were also obtained about the des-
cription of the house (type of walls, type of roof, etc.). Separate sex-
wise figures about the number of persons below and above 10 years
of age living in a house were also obtained. Thus the house list which
was prepared in the months of September and October, 1960 and which
was checked again in December, 1960, contained very useful infor-
mation.
(vi) The most important change which was introduced in 1961
census related to economic characteristics about which statistics were
obtained. The occupational classification adopted in this census was
different from those adopted in earlier censuses. For the first time
in this census the whole population of the country was divided in
two broad categories of "Working" and "Not working." The 1951
classification of economic status (self-supporting, earning dependents
and non-earning dependents) was entirely dropped.
Statistics about principal and subsidiary means of livelihood which
were collected for the whole population in 1951 census were obtained
only for certain classes of people in the census of 1961 as it was thought
that for a large majority of Indian population, particularly of rural
areas, this distinction between principal and subsidiary means of liveli-
hood was meaningless. Moreover the basis on which the distinction
between principal and subsidiary means of livelihood was made in
1951 and earlier censuses was changed. Formerly the criterion adopted
was that of income, so that the principal means of livelihood was
supposed to be one from which the largest share of income was derived
but in 1961 census income was replaced by time and the principal means
of livelihood was supposed to be one in which a person devoted a major
part of his working time.
(vii) In 1961 census some of the questions which were asked
in 1951 census and which were no longer important were entirely drop-
ped, for example the question relating to displaced persons from Pa-
kistan, which was included in 1951 census, was dropped in the census
of 1961.
(viii) Many other minor changes were also made in the census
of 1961, for example, in earlier censuses prostitutes and concubines
were treated as unmarried but in the census of 1961 the marital status
indicated by them was noted down. Similarly the category of divorced
persons was extended and renamed as 'separated or divorced' so that
it could include persons who were not formally divorced but who were
living separately without any intention of a reunion.
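The three-level distinction of feature (iii) above, in which one building may contain several census houses and one census house several households, can be pictured as a nested record. The Python sketch below is merely illustrative; the class names, field names and the sample figures are our assumptions and not part of the census schedules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Household:
    # Persons who ordinarily live together and share a common mess;
    # they need not be relatives, and one person alone is allowed.
    members: List[str]


@dataclass
class CensusHouse:
    # A building, or a part of one, with its own separate main entrance.
    households: List[Household] = field(default_factory=list)


@dataclass
class Building:
    # An entire structure raised on the ground.
    census_houses: List[CensusHouse] = field(default_factory=list)

    def household_count(self):
        return sum(len(ch.households) for ch in self.census_houses)

    def population(self):
        return sum(len(hh.members)
                   for ch in self.census_houses
                   for hh in ch.households)


# Hypothetical building: two separate entrances, three households in all.
example = Building([
    CensusHouse([Household(["A", "B", "C"]), Household(["D"])]),
    CensusHouse([Household(["E", "F"])]),
])
print(example.household_count(), example.population())
```

Counting households and persons then amounts to summing over the two lower levels, which is the reason for keeping the three terms distinct.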
Information collected in 1961 census
Information in the last census was collected on two different types
of slips namely Household slips and Individual slips. We shall discuss
the information obtained through these slips separately.
Household slips. The household slip was meant to collect the
following information about census households.
(1) Is the household an institution. In the household slip it was to
be mentioned whether the enumerated household was an institution
like jail, asylum, a religious institution, hostel, hotel, hospital and
boarding house etc. If it was any such institution then it was to be
clearly specified in the slip. The idea was to find out the number of
households which were different types of institutions as distinct from
households which were family units.
(2) Name of the head of the household. The head of the household
was supposed to be the person on whom fell the chief responsibility
for the maintenance of the various members constituting a household.
Thus the head of the household was not necessarily the eldest member
of the family or a male. The head of the household could have been
a very young person of either sex. However, the enumerators were
instructed not to go in detail of this question and were asked to write
the name of such person, as head of the household, as was given by the
informant. In hostels, hospitals etc. the Superintendent was taken as
the head of the household and such households were classed as "House-
holds of unrelated persons."
(3) Does the household belong to scheduled castes or tribes. A list of
such castes and tribes residing in different districts of each State was
supplied to the enumerators and they were to find out if any household
belonged to these categories.
(4) Households engaged in cultivation and/or household industries and
details of persons working in either or both cultivation and household industries.
This section of the household slip was divided in the following three
parts: (a) Cultivation, (b) Household Industry and (c) Workers at
cultivation and/or household industries.
It is clear from the above account that for the first time in our
country an attempt was made to find out the number of households
and also the number of people who were working in agriculture or house-
hold industries or in both. Distinction was also made between whether
they were owners or only hired labourers. These statistics would be
extremely useful in studies relating to the basic structure of the Indian
economy.
Individual slips. On individual slips statistical information was
collected separately about each single individual of the country. One
slip was filled for one individual only and the following information
was collected.
(1) (a) Name. The name of the person to whom the slip related
was noted down. If the name of a lady was not disclosed then in place
of name she was to be referred to as "the wife or mother or daughter of
so and so." If the informant was a lady and she did not want to speak
her husband's name then he was referred to as "the husband of so and so."
A newly born baby who was not given any name was recorded as "child".
(b) Relationship with the head of the household. Actual relationship
to the head of the household was mentioned. As mentioned earlier
the head of the household was one on whom fell the chief responsi-
bility for the maintenance of family members. Head of the household
was not necessarily the eldest member nor necessarily a male. In case
the slip related to such a household as a hotel or hospital where the
Superintendent was taken as the head of the household, and the indi-
viduals were not related to him, it was mentioned that the individuals
were not related to the head.
(2) Age on last birth day. The age of each individual was recorded
as on last birth day. It was recorded in complete years only. The age
of children below one year was recorded as zero.
(3) Marital status. People were classified as
(i) Never married,
(ii) Married,
(iii) Widowed,
(iv) Separated or divorced.
A married person was one who was married either once or more
than once and whose wife or husband was alive on the date of census.
Even those persons who were recognised by custom or society as
married or who were living as husbands and wives even though no
formal marriage was performed, were treated as married for the pur-
pose of census. A widowed person was one whose husband or wife
was dead and who had not married again. Such couples which were
divorced either by a decree of law court or through recognised social or
religious custom and who had not remarried were treated as divorced.
Those husbands and wives who had separated without any intention
of reunion were also included in the category of separated and divorced.
Prostitutes were classified according to the answers which they gave in
reply to the question about marital status. In earlier censuses they were
treated as unmarried.
(4) (a) Place of birth. Data about place of birth were collected
on the following basis :
(i) Born in village or town in which enumerated.
(ii) Born in another village or town of the district in which
enumerated.
(iii) Born in another district of the State in which enumerated.
(iv) Born in another State of India.
(v) Born in another country.
(vi) Born on sea, air, railways, or road vehicles.
In case of item No. (iii) above the name of the district of birth
and in item No. (iv) the name of the State of birth and in item No. (v)
the name of the country of birth were noted down.
(b) Whether born in village or in town. This information was collect-
ed separately from the information of question (4) (a) above. Per-
sons born in places which were not considered a town at the time of
their birth but were in the category of town at the time of census were
considered to have been born in town.
(c) Duration of residence if born elsewhere. This information was
collected about those persons only who were not born in the village
or town in which they were enumerated. If a person was born in any
other village or town of the same district where he was enumerated or
if he was born in another district (of any State) or in any other country,
his length of residence at the place of enumeration (in complete years)
was noted down. If the period of stay was less than one year, the
length of residence was recorded as zero.
(5) (a) Nationality. If a person had a nationality other than
Indian then the name of the country of his nationality was noted down.
(b) Religion. Data were collected about all religions but sym-
bols were assigned only to Hindu, Muslim, Christian, Jain, Buddhist and
Sikh religions. For other religions the actual name of the religion
was noted down.
(c) Scheduled castes and scheduled tribes. The answer to this enquiry
was recorded only if a person belonged to the scheduled castes or scheduled
tribes. A list of such castes and tribes was prepared districtwise and
supplied to the enumerators and they were to write down the actual
caste or tribe to which the person belonged and not in general terms
like Harijans or untouchables or scheduled castes. Scheduled caste
persons could have belonged only to Hindu or Sikh religions though
Scheduled tribes could belong to any religion.
(6) Literacy and education. The following information was collect-
ed about literacy and education.
(a) Number of persons who could neither read nor write or who
could read but not write. Such persons were treated as illiterate.
(b) Persons who could both read and write. Only such persons
were treated as literate and the test of literacy was whether a person
could read and write a simple letter.
(c) Standard of education. If a person was literate (that is, he
could both read and write) and he had passed some examination, a
further enquiry about the highest examination passed was made and
the answer recorded.
(7) (a) Mother tongue. For purposes of census mother tongue
was supposed to be the language in which a person's mother spoke
to him or her in childhood or the language commonly spoken in the
family. If the mother of a person died in his childhood, the language
commonly spoken in the family during his childhood period was taken
as mother tongue. For infants and deaf and dumb people the language
spoken by their mother was recorded as their mother tongue.
(b) Any other language. If a person knew one or more languages
(either Indian or foreign) other than the mother tongue, they were also
noted down. However not more than two such languages were
recorded for a person.
(8-11) Working Population. Those who were classed as Working
could belong to any one of the following categories:
(8) Working as cultivators.
(9) Working as agricultural labourer.
(10) Working in any household industry and
(11) Working in occupations and classes other than those men-
tioned in 8, 9 and 10 above.
Main occupation. If a person belonged to more than one category,
say, he worked as cultivator as well as in household industry, he was
included in both these classes. All such people who were entered
in more than one class were further asked a question as to which of
these categories was their main work and which ranked as No. 2 or
No. 3 as the case may be. For the purpose of this census the main
occupation was that in which the largest amount of time was devoted.
It should be remembered that in earlier censuses when a distinction
was made between principal and subsidiary means of livelihood the
criterion was not time devoted, but income, so that the principal
means of livelihood was taken as one from which a person earned the
largest amount of income. In this census, however, the criterion was
changed from income to time.
Prisoners who were undertrials and were not convicted were
supposed to belong to that occupation to which they belonged before
their arrest. Similarly patients in hospitals were supposed to belong
to that category of work to which they belonged before they were ad-
mitted as patients. However, convicted prisoners and lunatics in asylums
were classed as "not working."
(12) Not Working Population. All persons who did not do any
work and consequently were not included in categories associated with
items 8, 9, 10 and 11 were classified in this group. Eight categories
of such persons were mentioned and they were as follows:
(i) Wholetime students or children going to school who
did not do any work like making articles at home for
sale or who did not help in household industry or busi-
ness or cultivation.
(ii) Persons engaged in unpaid domestic work (like house-
wives) and who did not do any other work like making
articles for sale and who did not even assist in household
cultivation, business, trade or industry.
(iii) Dependents including infants and children not going to
school. and persons permanently disabled due to illness,
old age etc.
(iv) Retired people who were not re-employed, rentiers, per-
sons living on agricultural and non-agricultural royalty,
rent or dividends, persons of independent means for
which they did not have to work and who did not do
any other work.
(v) Beggars, vagrants and independent women whose source
of income was not disclosed or other persons whose
source of income was not known.
(vi) Convicts in jail (not under-trials) or inmates of penal,
mental or charitable institutes.
(vii) Persons who were not employed in work at any time
before but who were seeking work for the first time.
(viii) Persons who were employed in work formerly but were
without work at the time of enumeration and who were
in search of work.
Persons who could not be classified in any one of the eight ca-
tegories were included in category (v). Persons who were not working
but had been offered work which they had not joined at the time of
enumeration were included in category (iii).
(13) Sex. People were classified as males or females. Eunuchs
and hermaphrodites were treated as males.
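The change of criterion from income to time, noted under "Main occupation" above, can alter the answer for the same person. The short Python sketch below illustrates this with invented figures; the function names, the field names and the amounts are our assumptions and are not census data.

```python
def principal_by_income(activities):
    """Earlier censuses: the activity yielding the largest income."""
    return max(activities, key=lambda a: a["income"])["name"]


def main_by_time(activities):
    """1961 census: the activity on which the most time was spent."""
    return max(activities, key=lambda a: a["months"])["name"]


# Hypothetical person working in two of the categories 8 to 11.
activities = [
    {"name": "cultivation", "income": 400, "months": 7},
    {"name": "household industry", "income": 600, "months": 5},
]

print(principal_by_income(activities))  # household industry
print(main_by_time(activities))         # cultivation
```

The same person would thus have been returned under household industry by the income test and under cultivation by the time test, which is one reason why the occupational figures of 1961 are not directly comparable with those of earlier censuses.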
General criticism of Indian population census
Data not comparable. It is a sad commentary on the organisation
of Indian population statistics that the data collected in the last eight
censuses are not strictly comparable with each other because the de-
finitions of the various terms used and the classifications under which
the data have been published have been changing from census to cen-
sus. The area covered in various censuses differs. In the year 1881
the census covered an area of 1,382,624 square miles, in 1941 of 1,581,410
square miles and in 1951 census the area covered was only 1.18 million
square miles (the reduction in area was due to the partition of the
country).
Figures inaccurate. Besides this the figures collected at the time
of the census are not very accurate either. There are various reasons
for the inaccuracies in Indian population census data. One of them
is the indifferent attitude of the Indian population towards census opera-
tions in general. This apathy is on account of the fact that population
census in India is a very temporary affair and according to one writer
the Indian population census is like a comet which appears on the
Indian horizon once in 10 years, attracts much attention and passes
away unnoticed. It has already been said that there was no inter-
censal activity in our country till recently and on account of this the
general interest of the people in population census was very short-
lived. It should not be forgotten that it takes two to make the cen-
sus, the enumerator and the citizen, and the role of the latter is the
more important of the two. After all the accuracy of the census data
would depend on the replies given by the citizen rather than on the
efficiency of the enumerator. It is gratifying to note that after the
census of 1951 the population census department has not been com-
pletely abolished and is carrying on certain demographic surveys the
findings of some of which have been published very recently. It is
necessary that such surveys are conducted very frequently so that
people are always in touch with the activities of the census department
and if this is done there is no doubt that the general character of the
Indian figures would considerably improve in future.
An inexpensive census. The Indian census is in fact the cheapest
census in the world. In other countries enumerators are paid on the
basis of the number of people counted by them. In our country the
enumerators were given a certificate for the work done. In the 1951
census a special medal was given to efficient workers. There has
been a strong feeling that the million or so workers engaged in
census operation cannot be expected to put their heart in the work
unless they are paid something. In 1961 census a token payment
of Rs. 4 per supervisor and Rs. 16 per enumerator in an average
block was made. Besides the question of payment the next
question is that of training. Our enumerators are not trained and
there is need of having better trained enumerators. The type of
training that they get is hardly any good and particularly in rural areas
the Lekhpals etc., do not take adequate care to see that the returns are
filled up accurately. It is, therefore, extremely necessary that we should
appoint, as far as possible, literate and well-trained enumerators to conduct
census operations in future.
We have already pointed out that in the last census a sample
verification of census figures was conducted on a random sampling
basis and it was found that there was an under-estimation in the original
figures. On the basis of sample verification it was estimated that the
error was of the magnitude of 1.1 per cent, i.e., for every 1000 persons about
11 were omitted. It goes without saying that if both the citizen as well
as the enumerator realise their responsibilities such error in counting
would not arise.
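The arithmetic of this correction is simple and may be sketched as follows. The Python lines below assume that the rate of 11 omissions applies to every 1000 persons actually counted; the function names and the count of one million are hypothetical and serve only to show the calculation.

```python
def omitted_persons(counted, omitted_per_thousand=11):
    """Estimated omissions, taking the rate per 1000 persons counted."""
    return counted * omitted_per_thousand / 1000


def corrected_count(counted, omitted_per_thousand=11):
    """Counted population plus the estimated omissions."""
    return counted + omitted_persons(counted, omitted_per_thousand)


# Hypothetical district in which 1,000,000 persons were enumerated.
print(omitted_persons(1_000_000))   # 11000.0
print(corrected_count(1_000_000))   # 1011000.0
```

If the rate were instead expressed per 1000 of the true population the corrected figure would be slightly higher, so the basis of the rate should always be stated with the estimate.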
Indian age-returns are admittedly unsound. One very important cause
of it is the ignorance of people. Ignorance is something quite natural in
a population which is illiterate and which does not keep any systematic
record of age. Besides ignorance, indifference is also responsible for the
unreliability of age statistics in India. As the Census Superintendent
of Central India pointed out in 1931, "Indifference arises from the
outlook on life. The average man or woman in India matures early
and is short-lived. Life presses heavily on them and fatalism over-
powers them. Childhood, adolescence, middle life, and old age are well
marked stages in life and the Hindu sociological system has laid down
conduct of life and presented rules for the observance of customs and
practices. It matters not if the present age is not known."
There are some other reasons also which are responsible for the
deliberate misstatements about age in Census Reports. In case of un-
married girls who have reached puberty the age returns are definitely
wrong. The reason for it lies in the fact that high class Hindus feel
shy to admit that they have unmarried daughters already pubescent
who should have been married by that time according to the custom
of the community. Enumeration may also be short at this age on
account of the practice found in certain parts of India of secluding
girls at the age of puberty. At this age the returns regarding the
ages of males are also wrong, and the reasons of it may lie in the desire
of a person to appear definitely either as a boy or as a man. Early
marriage is also responsible for the wrong returns at this age, because
there is a tendency to exaggerate the age of boys just married and to
understate the ages of those who are not married. Widowers and
bachelors who have advanced in age and who wish to marry usually
understate their ages; in case of elderly women the returns are usually
correct but recently married girls and particularly those who have
become mothers tend to overstate their ages because motherhood
implies some elderliness. Old persons of both sexes generally overstate
their ages as it gives them a sort of pride. Some people overstate their
ages on account of superstition. Many people believe that by telling
the age correctly the length of life would be reduced. Sometimes wrong
figures are given for fear that they may be used in the court of law as
evidence of age. Moreover there is a tendency to quote the age in
round figures. The most popular digits are 0 and 5. This defect can
be removed to a certain extent by grouping the age returns in such a
way that the effect of the bias is lessened if not totally removed. It can
be achieved if there are alternately 3-yearly and 7-yearly classes so that
their midpoints always end in 0 or 5. This was actually done in the 1931
census. Deliberate misstatements of age can be reduced if the persons
are made to understand and appreciate the fact that all ages are merged
in groups and that their names disappear from the age records. If there
is any doubt about the age of any person it can be verified by hastily
asking him his age on the happening of a particular important event.
In the last census the enumerators were equipped for this purpose with
a calendar of important events-events of such local importance that
everyone could be expected to remember one or other of them. The
appointment of female enumerators can also be helpful to a certain
extent as they can easily see the family members themselves and can
verify certain facts. In this respect the 1951 Census Report makes an
interesting observation that "we do not apparently share one weakness
which is prominently observed in some other countries. Our women
folk, speaking generally and in large numbers, are not keen on being
recorded as younger than they are."
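The grouping into alternate 3-yearly and 7-yearly classes mentioned above may be illustrated by a short Python sketch. We assume here that the classes run 0-3, 4-6, 7-13, 14-16, 17-23 and so on, so that every class after the first has a midpoint ending in 0 or 5; the function names and the sample of reported ages are hypothetical.

```python
from collections import Counter


def age_class(age):
    """Return (low, high) of the alternating 3- and 7-year class for an age."""
    if age < 4:
        return (0, 3)
    # The pattern repeats every ten years from age 4: 4-6, 7-13, 14-16, ...
    start = 4 + 10 * ((age - 4) // 10)
    if age <= start + 2:
        return (start, start + 2)      # 3-year class, midpoint ends in 5
    return (start + 3, start + 9)      # 7-year class, midpoint ends in 0


def midpoint(cls):
    low, high = cls
    return (low + high) / 2


# Hypothetical returns heaped on the popular digits 0 and 5.
reported = [28, 30, 30, 30, 32, 35, 35, 36]
grouped = Counter(age_class(a) for a in reported)
print(grouped)
```

Because each popular age sits at the centre of its class, persons who round their age up and those who round it down mostly remain in the class to which they truly belong, and the heaping does less harm to the grouped figures.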
Statistics of occupation. Our returns of occupational structure are
also not very satisfactory. The manner of classification of the popula-
tion according to occupations has not been uniform in various censuses.
Not only the number of classes in which the population was distributed
has been changing from census to census but even the definition of the
same classes has not been uniform throughout. The 1951 census,
however, presented an elaborate and detailed account of how people
obtain their means of livelihood and this was free from most of the
ambiguity and shortcomings of previous classifications and definitions.
There is, however, scope for further improvements.
Civil conditions. Similarly our statistics of marital status, literacy
and education, etc., are not absolutely accurate. In 1931 census on
account of the promulgation of the Child Marriage Restraint Act there
was considerable underestimation of the married population. Child
marriages in India are still fairly popular particularly on the rural side
and even at the last census the data relating to the marital status of the
population cannot be said to be collected very accurately. The 1961
Census has removed a long standing confusion by replacing the
category of 'civil condition' by the more appropriate term 'marital
status'.
It is gratifying to note, however, that the quality of the statistics
collected at the time of the population census is gradually improving
in our country and particularly in the last census attempts were made
to secure the co-operation of the people and they met with considerable
success. The last census was the second population census held in this
country after independence and naturally people were not as indifferent
towards census operations as they had been in the past and now that
the population census department has not been wound up and is
carrying on demographic surveys the situation would further improve.
(2) VITAL STATISTICS
We have examined in the preceding pages the data contained in
Indian Census Reports and have pointed out the chief drawbacks
and inaccuracies of these figures. So far as vital statistics are con-
cerned the condition is much worse than that relating to population
census data. The figures relating to births and deaths are grossly
inaccurate, misleading and unfit for statistical analysis. While making
use of the figures of births and deaths relating to our country we should
be extremely cautious and should not take these figures at their face
value. Vital statistics which include statistics of births and deaths
have been collected in our country under the Births, Deaths and Mar-
riages Registration Act of 1886. This Act provided only for voluntary
registration of births and deaths. At present some of our Part A states
have their own birth and death registration acts while in other states
such statistics are collected in accordance with the laws and bye-laws
framed by the Municipal and District Boards. Under these rules regis-
tration is compulsory.
Thus the system of registration of births and deaths is very defec-
tive in our country. Firstly there is no uniformity in the procedure of
registration in various states. It is necessary to collect these statistics
in a well-organised and uniform fashion throughout the country. In
GROWTH OF STATISTICS IN lNDIA 843

Eogland the Ministry of Health is responsible for the collection of these


figures. In our country generally speaking municipalities undertake the
work in urban areas and the village officials in rural areas. The village
headman has to run to Tahsils for reporting births and deaths and this
is an irksome duty which he generally neglects. In case of the birth he
generally waits for some time to see if the child would survive and thus
saves himself from the trouble of running to the Tahsil onCe again for
reporting the death. It is a general practice that the village headman
holds the reports of births and deaths for; some time and goes to the
TaMil once a week or even once a fortnight. Besides this, many times
he hesitates to report deaths as it may result in the unwelcome visits of
suspicious,-police officials.
In urban areas the condition is no better. Births and deaths are
to be reported to the Municipal Office and though failure to do so
within a certain period involves a small fine it is hardly ever realised
and generally no action is taken. At many places the Registration
Offices are far away from the residence of the people and they generally
avoid the trouble of going long distances for reporting a birth or a
death. Under such circumstances the Government should take active
steps to improve the situation. The system of free postcards prevalent
in some western countries can be introduced at least in urban areas.
It is extremely necessary to have accurate vital statistics as in their
absence it is difficult to study the growth of population. The importance
of vital statistics is somehow not realised by the people of our
country. It should be remembered that these statistics are extremely
essential for the calculation of birth rates, death rates, fertility rates,
fecundity rates and reproduction rates.
After the independence of the country the vital statistics of India
were being published through the Statistical Appendices to the Annual
Report of the Director-General of Health Services. Under the agency
of the Director-General of Health Services information was published
State-wise and split into rural and urban areas. The main heads under
which the data are collected and published are births, deaths, infantile
mortality, deaths by causes, maternity, death rates, vaccination statistics
and sickness and mortality of prisoners in jails. It is heartening to
know that recently the work of vital statistics collection has been taken
over by the Registrar-General and Census Commissioner.
(3) DEMOGRAPHIC SURVEYS
As has been pointed out earlier the Office of the Registrar-General
of India has become permanent now and recently various demographic
surveys have been conducted by it. This has filled up a very
serious gap in the population statistics of our country. The pattern of
fertility of women in India has been found to be a distinctive one
according to a country-wide sample study of births and deaths conducted
by the Registrar-General of India in 1952-53 and 1953-54, the reports
of which have been released in two Census of India papers: Paper
No. 1 of 1955 containing the Report of the U. P. Sample Census and Paper
No. 2 of 1955 containing the reports of other States.

It has been found that India has a higher age-specific fertility
(births per 1000 females in the same age groups) for all ages as well as
individual age groups when compared with U.S.A., Canada, England,
Belgium, Denmark, France and even Japan. It has also been found
that fertility in India follows a pattern which is entirely different from
the pattern experienced in other countries of the world. In our country
fertility is low in the age group 15-19 but it rises very sharply in the
next age group of 20-24 and rises slightly in the age group 25-29.
After this there is a decline. In the U. S. A., England and Japan
also the fertility in the age group 15-19 is very low. In these countries
also fertility rises sharply in the age group 20-24. In U. S. A.
it reaches the peak in this very period. In England and Japan fertility
rises further but slightly in the age group 25-29 and then there is a
decline. In the U. S. A. the decline starts from the age of 24. In our
country the decline in fertility even after the age of 29 is gradual while
in other countries the decline is very steep. This is the reason why
fertility in higher age groups, say 40-44 and even 45-49, is very high in our
country as compared to the other countries of the world.
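The age-specific fertility rate referred to above (births per 1000 females in the same age group) may be worked out as in the following small sketch. The counts used are purely hypothetical and are not figures from the sample census; the function name is likewise only illustrative.

```python
def age_specific_fertility(births, women):
    """Return births per 1000 women for each age group."""
    return {group: 1000 * births[group] / women[group] for group in births}


# Hypothetical counts for three age groups (not census figures).
births = {"15-19": 150, "20-24": 300, "25-29": 280}
women = {"15-19": 1200, "20-24": 1000, "25-29": 900}

rates = age_specific_fertility(births, women)

# The age group in which fertility reaches its peak.
peak_group = max(rates, key=rates.get)

for group, rate in rates.items():
    print(group, round(rate, 1))
print("Peak age group:", peak_group)
```

Comparing the rates group by group in this way is what allows one country's pattern of rise and decline to be set against another's.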
In the year 1952 Sri R. A. Gopalaswami, the Indian Census
Commissioner and Registrar-General, had prepared a scheme for the
improvement of population data. This scheme was divided into two
parts: the first part related to a simultaneous revision of the National
Register of Citizens and the electoral rolls and the second part related
to the sample census of births, deaths, etc. In accordance with these
recommendations a sample census of births and deaths was carried out
in all the States excepting Mysore, Hyderabad, Orissa, West Bengal,
Bhopal and Delhi. This study covered 20 states with a population of
27.83 crores (in 1951) or in other words 78% of the total population
of India. The enumeration took place between September, 1952 and
January, 1953. In Uttar Pradesh it took place in 1954. The first part
of the scheme relating to the simultaneous revision of the National
Register of Citizens and the electoral rolls could be tried only in
Madras, Coorg, Vindhya Pradesh, Madhya Pradesh and Madhya
Bharat.
Some of the findings of the sample census of births and deaths
are as follows :
(1) In a majority of the states the birth rate has been found to
be between 30 and 40. In almost all cases the birth rates of the sample
are less than those computed by the Census Actuary for 1941-1950. It
means that there has been a certain amount of under-estimation of births
in the sample census. The sample census was confined to 1% of the
households in the selected districts. It may also be that the birth rate
in India at present is less than the birth rate during the decade
1941-1950 and this may be the reason why the sample census has
obtained birth rates lower than those obtained by the Census Actuary.
But it would be highly dangerous to draw any conclusions about the
fall in birth rate from these data unless this tendency is confirmed by
other facts.
(2) Child birth index for completed maternity in India is placed
between 6 and 7, i.e., on an average a woman produces between 6 and
7 children during the reproductive age period. In Japan this figure is
5.3, in U. S. A. 3.3 and in England 2.6 only.
(3) For mothers of completed fertility the child loss varies from
about 20% to 33%. In other words, about 1/5th to 1/3rd of the children
born predecease their mothers, undoubtedly a very high figure
which shows the extent of colossal loss of human resources which takes
place in our country.
(4) In a majority of the States of India between 40 and 50% of the
births are of the first and second order. If the total of the births of
the first three orders is taken it works out to be between 60 and 70%. In
Japan this percentage is 74, in U. S. A. 76 and in England 85. According
to the Census Commissioner all births after the third lead to unwanted
increase in the size of the family in an undeveloped country like
ours and it amounts to what he calls improvidence. The proportion
of births of the fourth and higher orders has, therefore, been called the
incidence of Improvident Maternity. This incidence in our country
ranges between 40 and 50%. In Japan this incidence is 26.1%, in
U. S. A. 23.5% and in England 14.9% only. A very interesting feature
of this study is that the incidence of improvident maternity is higher
in the urban than in rural areas. It is rather a very unexpected
result.
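The incidence of improvident maternity, as defined above, is simply the proportion of births of the fourth and higher orders to all births. A minimal sketch of the calculation is given below; the birth counts by order are hypothetical and are used only to show the arithmetic.

```python
def improvident_maternity_incidence(births_by_order):
    """Percentage of all births that are of the fourth or a higher order."""
    total = sum(births_by_order.values())
    fourth_and_higher = sum(
        count for order, count in births_by_order.items() if order >= 4
    )
    return 100 * fourth_and_higher / total


# Hypothetical number of births classified by order of birth.
births_by_order = {1: 220, 2: 200, 3: 170, 4: 150, 5: 120, 6: 140}

incidence = improvident_maternity_incidence(births_by_order)
print("Incidence of improvident maternity:", incidence, "per cent")
```

With these assumed counts 410 births out of 1000 are of the fourth or a higher order, so the incidence works out to 41 per cent.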

(5) The proportion of deaths in younger ages is very high.
About 50% of the deaths occur amongst children under 10 years as
against only 9.7% in U. S. A. and 5.3% in England. The proportion
of such children to the total population is 26.1% in India, 19.6% in
U. S. A. and 15.7% in England. The age specific death rates for age
groups over 10 are also very high in India. The mortality rates for
females in the reproductive ages (15-44) are higher than those for
males in almost all the states of India. In U. S. A. and England the
tendency is the reverse. In these countries maternal mortality is very low
as compared to our country.

As has been pointed out, the statistics of births and deaths as
collected in our country are not complete and their coverage is very
poor. Many births and deaths go unregistered and this is the reason
why there is a difference in the figures of births and deaths based on
registration data and those calculated on the basis of data provided by
the population census. The following table which gives India's birth
and death rates per thousand for the last 50 years, in decennial averages,
illustrates this point.

Birth and Death Rates (Decennial Averages)

(per thousand of population)

                   Registered           Estimated by Reverse
                                        Survival method
Decade          Birth     Death         Birth     Death
                rate      rate          rate      rate
1901-10         37                      48.1      42.6
1911-20         37        34            49.2      48.6
1921-30         34        26            46.4      36.3
1931-40         34        23            45.2      31.2
1941-50         28        20            39.9      27.4

The following table shows the birth, death and infant mortality
rates since 1947 based on the registration data.

Birth, Death and Infant Mortality Rates

            Per thousand of population     Per thousand births
Year        Live birth      Death          Infant
            rate            rate           mortality
1947        26.6            19.7           146
1948        25.2            17.0           130
1949        26.4            15.8           123
1950        24.9            16.1           127
1951        24.9            14.4           130
1952        25.4            13.8           123
1953        24.8            13.0           125
1954        24.4            12.5           114
1955        27.0            11.7           103
1956        21.6             9.8           102
1957        21.5            11.0            98
1958        25.1            11.3            92
1959        25.7            12.1
1960        22.4             9.4
It goes without saying that the birth and death rates presented
in the above table are huge underestimates. Actually between 1941
and 1951 births had occurred at an average rate of 40 per thousand per
annum and deaths at an average of 27 per thousand per annum and
the natural increase in population at an average of 13 per thousand per
annum.
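The arithmetic behind these statements can be set out in a short sketch. The natural increase is the birth rate less the death rate, and the extent of under-registration may be judged by expressing the registered rate as a percentage of the estimated rate. The 1941-50 figures are those of the decennial table given earlier; the function names are only illustrative, and the "completeness" ratio is our own way of presenting the comparison, not a measure used in the official reports.

```python
def natural_increase(birth_rate, death_rate):
    """Natural increase per thousand = birth rate minus death rate."""
    return birth_rate - death_rate


def registration_completeness(registered_rate, estimated_rate):
    """Registered rate as a percentage of the estimated rate."""
    return 100 * registered_rate / estimated_rate


# Average rates per thousand per annum between 1941 and 1951 (from the text).
increase = natural_increase(40, 27)

# Decade 1941-50: registered rates against reverse-survival estimates.
birth_completeness = registration_completeness(28, 39.9)
death_completeness = registration_completeness(20, 27.4)

print("Natural increase per thousand:", increase)
print("Births registered (per cent of estimate):", round(birth_completeness, 1))
print("Deaths registered (per cent of estimate):", round(death_completeness, 1))
```

On these figures only about seven-tenths of the estimated births and deaths of the decade appear in the registration data.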
Vital statistics estimated from census reports. We have already
referred to the estimation of birth and death rates based on population
census data. Apart from general birth and death rates, child birth
index, child survival index and child loss index have also been estimated
on the basis of census data in certain States. In the States of Madhya
Pradesh and Travancore-Cochin, statistics were collected about the
average number of live children born to all mothers aged 45 and over
during their child-bearing period (child birth index), the average number
of children surviving (child survival index) and the average number
of children who had predeceased their mothers at the time of
enumeration (child loss index). These data are presented in the following
table:

Child Birth, Survival and Loss Index

Natural Division/State          Child Birth    Child Survival    Child Loss
                                Index          Index             Index
East Madhya Pradesh             6.1            3.6               2.5
North-West Madhya Pradesh       6.3            3.6               2.7
South-West Madhya Pradesh       6.6            3.6               3.0
Travancore-Cochin               6.6            4.6               2.0
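Since every child born either survives or predeceases the mother, the child birth index should equal the sum of the child survival index and the child loss index. The following sketch checks this for the four rows of the table and also expresses the child loss as a percentage of the children born; the variable names are only illustrative.

```python
# (child birth index, child survival index, child loss index) from the table.
indices = {
    "East Madhya Pradesh": (6.1, 3.6, 2.5),
    "North-West Madhya Pradesh": (6.3, 3.6, 2.7),
    "South-West Madhya Pradesh": (6.6, 3.6, 3.0),
    "Travancore-Cochin": (6.6, 4.6, 2.0),
}

# Check that birth index = survival index + loss index (to one decimal place).
consistent = {
    name: round(survival + loss, 1) == birth
    for name, (birth, survival, loss) in indices.items()
}

# Child loss as a percentage of children born.
loss_share = {
    name: round(100 * loss / birth, 1)
    for name, (birth, survival, loss) in indices.items()
}

for name in indices:
    print(name, consistent[name], loss_share[name])
```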
An experimental census was undertaken in 61 districts in different
zones of India in 1952-53 and an analysis was also done of the
registration data of 30 municipal towns in 1951. These studies revealed
that, generally speaking, first births accounted for more than one-fifth
of all births, second births for nearly another one-fifth, third births for
about one-sixth and fourth and higher order births for over two-fifths
of all births.
Some interesting facts disclosed by our censuses. It has been said
that a census is the X-ray examination of the people of a country.
Censuses reveal interesting facts about the population. We give below
some of the interesting information which has been collected in the
different censuses of India.

TABLE I

Population of India since 1901

1901        23,62,81,245
1911        25,21,22,410
1921        25,13,52,261
1931        27,90,15,498
1941        31,87,01,012
1951        36,11,29,622
1961        43,92,35,082
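
The decennial rate of growth implied by the census totals of Table I can be worked out as in the sketch below. The populations are those of the table written without the Indian digit grouping; the percentage growth figures are our own derived illustration and do not appear in the original table.

```python
# Census population of India, from Table I.
population = {
    1901: 236281245,
    1911: 252122410,
    1921: 251352261,
    1931: 279015498,
    1941: 318701012,
    1951: 361129622,
    1961: 439235082,
}

years = sorted(population)

# Percentage growth between each pair of successive censuses.
growth = {}
for start, end in zip(years, years[1:]):
    change = population[end] - population[start]
    growth[f"{start}-{end}"] = round(100 * change / population[start], 2)

for decade, rate in growth.items():
    print(decade, rate)
```

The only decade showing a fall in population is 1911-1921, while the growth between 1951 and 1961 is the largest of the series.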
TABLE II

Total population and land area in some countries in 1951.

                                India    U. S. A.    U. S. S. R.    World
Population (in crores)          36.1     15.1        19.4           240
Land area (in crore acres)      81.3     590.5       590.4          3251
Area per capita (in cents)
  (a) All land                  225      1264        3246           1354
  (b) Agricultural area         97       741         448            351
  (c) Arable land               97       302         287            126

TABLE III
Rural and urban population (1921-1961)

            Percentage of total population
            Rural       Urban
1921        88.6        11.4
1931        87.9        12.1
1941        86.1        13.9
1951        82.7        17.3
1961        82.0        18.0
TABLE IV
Size of rural and urban households in 1951.

Type of Household                       Number of Households
                                        Typical      Typical
                                        village      town
Small (3 members or less)               33           38
Medium (from 4 to 6 members)            44           41
Large (from 7 to 9 members)             17           19
Very large (10 or more members)          6            5
Total                                   100          100

TABLE V
Sex Ratio 1921 to 1961 (Females per 1000 males)

            Total          Rural          Urban
            population     population     population
1921        955            972            847
1931        950            969            839
1941        945            966            830
1951        946            966            860
1961        941            963            845

TABLE VI
Livelihood pattern in some other countries
(Per one thousand self-supporting persons)

Occupation                               India     U.S.A.     Great
                                                              Britain
Agriculture, Animal Husbandry,           706       128        50
  Forestry and Fishing
Mining, Manufactures and Commerce        153       456        555
Other Industries and services            141       416        395
Total                                    1000      1000       1000

TABLE VII
Population Census 1961

                           Population    Density         Sex ratio     Literacy
                           (in           (No. of         (Females      Rate
                           millions)     persons per     per 1,000     (per 1,000)
                                         square mile)    males)

States
Andhra Pradesh             35.98         339             981           212
Assam                      11.87         252             876           274
Bihar                      46.46         691             994           184
Gujarat                    20.63         286             940           305
Jammu and Kashmir           3.56                         878           110
Kerala                     16.90         1,127           1,022         408
Madhya Pradesh             32.37         189             953           171
Madras                     33.69         669             992           314
Maharashtra                39.55         333             936           298
Mysore                     23.59         318             959           254
Orissa                     17.55         292             1,002         217
Punjab                     20.31         430             864           242
Rajasthan                  20.16         153             908           152
Uttar Pradesh              73.75         649             909           176
West Bengal                34.93         1,032           878           293

Union Territories
Andaman and Nicobar
  Islands                  63,438 (b)    20              617           336
Delhi                       2.66         4,640           785           527
Himachal Pradesh            1.35         124             923           171
Laccadive, Minicoy and
  Amindivi Islands         24,108 (b)    2,192           1,020         233
Tripura                     1.14         283             932           202
ALL INDIA                  439.24 (c)    370 (d)         941           240
TABLE VIII
Some Projections of Indian Population

                        1961      1971      1981      2001
Kingsley Davis
  I                     405       451       492       648
  II                    384       381       427       459
  III                   394       430       476       521
Registrar-General
  I                     408       459       528
  II                    412       470       535
Coale and Hoover
  I                     422       524       662
  II                    418       490       641
  III                   422       517       588
S. N. Agarwal
  I                     423       526       628       788
                        +10.7     +13.3     +15.8     19.7
The above discussion relating to the vital statistics available in
our country clearly shows that the state of affairs is not satisfactory
even now. Our statistics of births, deaths, infantile mortality and
maternity pattern suffer from serious drawbacks and unless the
registration of vital statistics is done properly both in the urban as well as
rural areas it would be impossible to study accurately the pattern of
population growth of our country. The Registrar-General of India
should take early steps to see that an efficient machinery for the
collection of vital statistics is established throughout the country and the
quality of these statistics improves.
Utility of population statistics
It is difficult to over-emphasise the utility of the population statis-
tics in modern times. Though population data are primarily meant for
administration purposes they give valuable information for use by
economists, businessmen, sociologists, in fact all types of people. The
economist can make use of population statistics in a large number of ways.
He can study the occupational structure of the population and the rate at
which population is growing and on the basis of these studies he can
draw valuable conclusions. He can study the relationship between the
growth of population and food supply and he can also study whether
particular industries or occupations are overcrowded and whether the
economic policy of the state needs a change to bring about a proper
balance between various types of industries and occupations. A business
man similarly can utilise population statistics in a large variety of ways.
By studying the density of population he can find out the best markets
for the commodity in which he deals. He can make use of occupational
statistics to find out whether a certain area is inhabited mostly by
labourers or agriculturists or by people of a particular occupation. This
would enable him to estimate the demand for his goods at different
places. An industrialist can make use of the statistics relating to age
and sex of the population to find out the labour supply that he can
recruit from particular regions. A transport agency like the railways would
find from the Census Reports the areas which are densely populated
and in which there is scope for the development of transport. Similarly
other types of institutions engaged in trade and commerce would
find very valuable data in the Census Reports.
Various types of statistics collected at the time of population
census are useful in their own ways. Thus statistics of marital status
throw light on the pattern of population growth of a country. These
statistics are very important from the socio-medical point of view.
Statistics of marital status are also helpful in studying housing problems
particularly in urban areas. They also throw light on the problem of
dependency. Similarly statistics of age and sex are not only helpful in
making forecasts about population for the future but they are also very
helpful in studying the problems of fertility, dependency, etc. They also
give an idea of the extent of man power that can be mobilized in times of
war, if necessary. Besides this, if statistical data relating to other
problems are classified on the basis of age and sex they become more useful
and can be analysed in a better manner.
The importance of the study of literacy and education is also very
great particularly in a country where a large number of people are
supposed to be illiterate and uneducated. The Government can decide
about the particular policy that it should follow in matters of education
in various parts of the country on the basis of statistics collected at the
time of the census. In the last census in our country the Government
had collected statistics not only about whether a person was literate or
illiterate but also about the highest academic qualifications of the
educated persons. On the basis of these facts it is now easy for the
Government to lay down a policy with regard to the particular type of education
that is necessary and that should be developed in future. The
information about the languages spoken by the people is gathered in some
detail, and this helps in deciding the relative place of regional
languages as media of instruction for elementary and higher
education.
Statistics relating to the economic characteristics of the people
are probably the most important statistics collected at the time of the
census. The data relating to the means of livelihood and the economic
status of the population can be utilised in a large number of ways. It
is possible from these data to find out which occupations and which
industries are overcrowded and on the basis of such studies the
government can take decisions about the redistribution of population in different
occupations. Statistics relating to the economic status of the people
make it possible to find out the number of people who are "economically
active" and those who are "economically not active." This is very useful
information and it can be utilised in a number of ways. Statistics
relating to employment have their own importance. These statistics
make it possible for the government to find out not only the number
of people unemployed but also the manner in which people are employed
in different occupations. The extent of under-employment and
unemployment can best be studied if detailed statistics are collected about
them at the time of the population census.
It is obvious from the above survey that population statistics are
useful for all types of people and they can be utilised to solve a large
variety of problems. It is very gratifying to know that people in this
country are now gradually taking interest in statistics and if this
tendency continues there is every reason to believe that in future
population statistics would be put to a large number of uses. In other
countries of the world population statistics are utilised by businessmen,
industrialists, economists, sociologists, transport agencies, insurance
companies, and the government for solving a large variety of
problems connected with their respective fields and we can hope that in
future a similar use of the population statistics would be made in our
country also.
SECTION 3

AGRICULTURAL STATISTICS

Early beginnings
Meaning. In a broad sense the term agricultural statistics is used
to refer to all statistics which have any bearing on agricultural economy,
such as statistics of land utilisation, production, crops, livestock, forest
and fisheries, agricultural prices, wages, land revenue etc. In common
usage, however, this term is used to denote statistics of land utilisation
and of production of crops. We shall be using this term in this sense
throughout this section and shall discuss statistics of agricultural prices,
wages, etc., in another section dealing with price statistics.
Early beginnings. The collection of agricultural statistics is not
something new to this country. In ancient days the kings and chiefs
who ruled the country used to collect figures of yield of various crops
and the area under cultivation. This interest in agricultural
statistics was due to the fact that in those days most of the
public revenue was derived from land and as such it was necessary to
have records of area and yield of various crops to find out the amount
of land revenue. Ain-i-Akbari and some other documents of the Moghul
period clearly indicate the manner in which such statistics were collected.
The British people also realised the importance of having agricultural
statistics as India was in those days, and even is today, primarily an
agricultural country. From the point of view of revenue collection,
agricultural statistics were not so important for those areas where there was
Permanent Settlement but in Ryotwari areas where the amount of land
revenue depended upon net produce such statistics were absolutely
necessary. Ryotwari system or the system of temporary settlement was
introduced in some parts of the country as early as 1792 and for this
purpose statistics of land values, cost of cultivation, prices of produce
and crop yields had to be collected. Later on Ryotwari system was
introduced in various parts of the country and agricultural statistics
began to be collected more exhaustively, though on the same lines.
Crop forecasts. In those days and even till recently most of the
agricultural statistics of our country were related to crop forecasts that
were made from time to time. The history of the present crop forecasts
dates back to the year 1861. Famines and droughts in India were a
common phenomenon in those days and the government had to start the
collection of agricultural statistics with a view to apprising itself
of the real position. As has been mentioned in earlier sections it was
in the year 1875 that the Department of Agriculture and Commerce was
established in Uttar Pradesh (then known as North Western Provinces)
and later on, on the recommendations of the Indian Famine Commission,
similar departments were opened in other Provinces and the
Central Department of Agriculture, which was closed by the Government
due to the financial stringency created by the Afghan war, was also
revived to co-ordinate the activities of the various Provincial
Agricultural Departments. A Statistical Conference was held at Calcutta in
1883 and it strongly recommended early publication of crop forecasts
and in 1884 the Government of India issued instructions to the
Provincial Governments to make an experiment in this line and to start with
wheat. After that from time to time both the Central and the
Provincial Governments have been taking interest in the matter of crop
forecasts and in the year 1896 many other commodities were added to
wheat and crop forecasts of important commodities began to be made.
After the first World War certain improvements were made in the
agricultural statistics of the country and various researches were conducted
by the Indian Council of Agricultural Research and the Indian
Statistical Institute, Calcutta not only with a view to improving the methods
of agriculture followed in this country but also to collect agricultural
statistics on scientific lines. In the last few years the system of making
forecasts and collection of other agricultural statistics has undergone a
very great change and now-a-days crop forecasts which are issued are
made on scientific lines by the use of the random sampling technique.
We shall now discuss some of the important agricultural statistics
available in our country at present :
Area Under Crops
Two sets of acreage figures are available in our country at present
viz. (i) Official Series based on village records and (ii) the N. S. S.
Series based on Sample Surveys. We shall first discuss the Official
Series.
(1) Official Series. Fairly detailed statistics are available in the
Official Series about area under crops. The total area under crops
is, broadly speaking, divided into food crops and non-food crops.
Food crops are further sub-divided into food grains (consisting of
cereals and pulses), sugar, condiments and spices, fruits and vegetables
and other food crops. Statistics relating to non-food crops are
sub-divided into oil seeds, fibres, dyes and tanning materials, drugs and
narcotics, fodder crops, green manure crops and other non-food crops.
In subsequent pages we shall examine these statistics in detail
and shall discuss the methods by which statistics of area under various
crops are estimated in our country from village records. As has
been mentioned earlier there are two types of settlements in our
country, namely, (a) temporary settlements and (b) permanent
settlements. The system of temporary settlements was introduced in
our country in the year 1792, with a view to fixing land revenue for a
period which was subject to change at the time of the next settlement.
Ordinarily the interval between the two settlements was 25
to 30 years. In order to determine the land revenue and to make
estimates of forecasts, detailed statistics had to be collected about
land revenue, land value, cultivation costs, yield and value of produce
etc. Most of the temporarily settled areas are in U. P., Punjab and
Madras.
The other type of settlement, namely, the permanent settlement,
was mostly prevalent in Bihar, Bengal, Orissa and some parts of
eastern U. P. In such areas land revenue was permanently settled
and the question of revision ordinarily did not arise. After the
independence of the country the land tenure systems were changed and many
land reforms of far-reaching nature were made. The zamindari
system was abolished and an attempt was made to have owner
cultivators in the country. We shall discuss the collection of area
statistics in these two types of settlements separately, because whereas in
temporarily settled areas there was a permanent reporting agency
collecting figures of area under different crops regularly, in
permanently settled areas where there was no question of the revision
of land revenue, there was no such agency for the collection of area
statistics and figures relating to such tracts of land (which came under
permanent settlements) were in the nature of very rough estimates
and in many cases even guess work.
Temporarily settled areas
Technique of estimation. In temporarily settled areas the figures
of acreage are collected by the village accountant or the patwari and
are recorded by him in his register popularly known in northern
India as Khasra. The village accountant is known by different names
in various parts of the country such as Karnam in the south, Talathi in
Bombay and Karamchari in Bihar. In Uttar Pradesh patwaris
have been replaced by lekhpals. The work of the village accountant
is supervised by his immediate superior officer known by the name
Kanungo in northern India. He is expected to check at least 7 percent
of the khasra numbers or the entries made by the village accountant.
He is to select those fields for checking where numerous changes appear
to have occurred during the year. The selection is not based on
random sampling. These figures of acreage under various crops are
collected on the basis of complete enumeration. Most of the
geographical area of temporarily settled tracts is cadastrally surveyed and
detailed maps are available in tahsils and district headquarters. The
village accountant who is the primary reporting agency is expected
to conduct a field to field inspection to find out the amount of area
devoted to different crops.
The figures supplied by different primary reporting agencies
within a district are totalled to find out the total area devoted to
various crops in the district. State totals are obtained by adding
district figures.
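The process of totalling described above may be illustrated by a small sketch. The village reports, district names and acreages below are entirely hypothetical, and the function names are our own; the sketch only shows how district totals are built from village figures and State totals from district figures.

```python
from collections import defaultdict

# Hypothetical village reports: (district, village, crop, area in acres).
reports = [
    ("District A", "Village 1", "wheat", 120.0),
    ("District A", "Village 2", "wheat", 80.0),
    ("District A", "Village 2", "rice", 50.0),
    ("District B", "Village 3", "wheat", 150.0),
    ("District B", "Village 3", "rice", 70.0),
]


def district_totals(reports):
    """Add the village figures to get the area under each crop by district."""
    totals = defaultdict(lambda: defaultdict(float))
    for district, _village, crop, acres in reports:
        totals[district][crop] += acres
    return totals


def state_totals(by_district):
    """Add the district figures to get the State total for each crop."""
    totals = defaultdict(float)
    for crops in by_district.values():
        for crop, acres in crops.items():
            totals[crop] += acres
    return totals


by_district = district_totals(reports)
state = state_totals(by_district)

print(dict(state))
```

It will be seen that a village whose report is not received simply drops out of the district total, so that any gap in the primary reports passes unnoticed into the State figures.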
Shortcomings. Generally it is believed that area figures in the
temporarily settled tracts are fairly satisfactory. However absolute
reliance cannot be placed on them due to a variety of reasons. The
most important reason is the inefficiency and carelessness of the primary
reporting agency. The reports from many villages are not received
in time and in many cases they are not included in the compilation of the
figures. On account of this the coverage of the figures becomes incomplete.
There is some tendency in the primary reporting agency to avoid
a change. Many times field to field inspection is not done and either
the last year's figures are repeated or the registers are filled on the
basis of pure guess work. Messrs. Bowley and Robertson had suggested
that this could be avoided if the supervisors and higher officials
issued more detailed instructions and kept better supervision. The
suggestion, however, seems to be untenable because already the
Agricultural Manual gives very detailed instructions both to the primary
reporting agency and the supervisors with regard to the
manner in which area should be measured and recorded. No amount
of supervision can remove this tendency so long as the primary
reporting agency is unavoidably careless about these measurements.
In fact the lekhpal and the supervisors are so much preoccupied
with other work of miscellaneous type that it is too much to expect
more than a passing interest from them so far as this work is
concerned. Some people have suggested that the work of estimating the
area and yield should be done through some other agency, but there
are certain advantages in the statistical collection and reporting being
done by the agency which is responsible for the related administration
which in this case is the revenue agency. The data collected by such
agencies are likely to be superior to those collected through ad hoc
statistical agencies. Moreover when statistics are collected through
administrative agencies such figures as are immediately and urgently
needed become immediately available to the administrative agency.
Besides, in the course of the collection of statistics such agencies gather
a considerable amount of experience and knowledge (other than statistics
proper) and this can be utilised only when statistics are collected by
the administrative agency concerned. For these reasons we are of the
opinion that the collection of area and yield statistics should be done
by the Revenue Department and the practice should not be changed.
The National Income Committee was of the view that improvement in area
statistics could be brought about if the total coverage of figures were
spread over a number of years, say 5, so that every year the crop area
was reported from only 1/5th of the villages. This
suggestion was made with a view to reducing the work load of area reporting
to only 1/5th of its present burden. This suggestion would, no
doubt, reduce the burden of the primary reporting agency but the area
statistics obtained would not necessarily be representative of the total
area distributed over various crops. The Government of India have
recently brought into force a scheme of central supervision and random
checking of the work of lekhpals and it may improve the situation to
a certain extent. It would be worthwhile to collect some data on
the basis of random sample surveys relating to the area devoted to
various crops and to compare these figures with those obtained by
complete enumeration. This has been done recently through N.S.S.
and it has been established on the basis of such studies that even
in the temporarily settled areas, where there is supposed to be complete
enumeration, the figures of area reported have been underestimates.
Another source of error in the estimates of area is presented by
mixed crops. Till recently there was no uniform practice of reporting
such area. The areas covered by several crops in a field were estimated
in various ways in different States and the estimates in some
cases were based on formulae prescribed by the State authority in
individual cases as it was impracticable to prescribe a general method
of calculation. These formulae and ratios were very old and they
needed revision because the composition of crops in mixed farming
undergoes changes quite often. Crop cutting experiments have to be
conducted to find out the ratios and they have to be periodically revised.
The Technical Committee had suggested that in all cases the gross
unadjusted acreage of the mixture should be recorded separately for
each major crop mixture and published in Season and Crop Reports and
Crop Forecasts side by side with the net acreage of the components as
calculated at present. Where fixed ratios were used for estimating the
components of the mixture the Committee suggested that they should
be fixed for each district and their accuracy should be tested at periodical
intervals during the crop cutting surveys. In case of minor crop mixtures
the areas should be allocated to the various components according
to eye estimates, the practice followed in Punjab and some other parts
of the country. Recently a uniform procedure has been laid down
for adoption by all States, under which, for major crop mixtures, the
gross unadjusted acreage under the mixture has to be shown side by
side with the net acreage under the components calculated according
to the present practice. In respect of minor crop mixtures the areas
are to be allocated to the various components on the basis of predetermined
formulae.
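The recording rule for major crop mixtures can be illustrated with a small sketch. The mixture, its gross acreage and the fixed ratio used below are hypothetical figures chosen only for illustration, and the function name is an assumption of this example rather than any official procedure.

```python
def allocate_mixture(gross_acres, ratios):
    """Split the gross acreage of a crop mixture among its components.

    `ratios` maps each component crop to its relative share of the mixture.
    Shares are normalised, so 3:1 and 0.75:0.25 give the same result.
    """
    total_share = sum(ratios.values())
    if total_share <= 0:
        raise ValueError("ratios must sum to a positive number")
    return {crop: gross_acres * share / total_share
            for crop, share in ratios.items()}


# Hypothetical district return: 1,200 acres under a wheat-gram mixture,
# with an assumed fixed ratio of 3:1 for that district.
gross = 1200.0
net = allocate_mixture(gross, {"wheat": 3, "gram": 1})

# The gross figure and the net component figures are shown side by side.
print("Gross unadjusted acreage under the mixture:", gross)
for crop, acres in net.items():
    print("Net acreage under", crop, ":", acres)
```

Publishing both figures lets the reader re-allocate the gross acreage whenever the district ratio is revised after fresh crop cutting surveys.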
Another factor responsible for inaccuracy and confusion in the
figures of area under various crops is the uncertainty whether the area
means the area sown or the area successfully cropped. The present rule
is that it is the area sown and not the area harvested. However, if
the first sowing fails and the area is given to another crop, then in
subsequent forecasts such area is deducted from the area of the original
crop and added to the area of the subsequent crop sown. It is suggested
that if on the first sowing there is no germination, then even if the area
has not been devoted to any other crop such area should not be counted.
However, if there is germination but the crop is extremely bad such area
must be included. The Technical Committee suggested that the normal
percentage of harvested to sown areas should be estimated through
random sample surveys. This is indeed a very good suggestion to
find out the difference between the area sown and the area actually
harvested and cropped.
Area sown more than once is also responsible for some confusion
about statistics of area under various crops. Area sown more than
once should be counted separately each time. This practice is
being followed in our country. However, it is necessary to indicate
such areas separately as otherwise the total of figures of
area under various crops would be more than the total area on which
cultivation is done. The International Rice Commission (Bangkok)
had also recommended that the area sown with rice twice a year
should be shown separately. Inaccuracy in area figures also results
from the inclusion of field ridges and bunds in the measurements,
though they are neither sown nor cropped. The present practice
is that permanent bunds which are more than 6 feet wide are excluded
and the temporary bunds and field ridges within a field
separating several plots are included in the figure of area under a
particular crop. The magnitude of area under temporary bunds
is considerable in the hill districts, and in plains also they do
affect the statistics. It has, however, to be accepted that area under
field ridges and temporary bunds cannot be calculated accurately and
only approximate allowance can be made. In estimation by random
sampling crop cutting surveys such areas are always excluded and it is
suggested that a certain percentage, say 2 per cent, of the area should
be considered as occupied by ridges and the proper adjustments should
be made for this in the estimates of area under various crops as compiled
by the Revenue Departments also.
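The two adjustments discussed above (showing area sown more than once separately, and deducting a flat allowance for ridges and bunds) can be sketched as follows. The plot figures are hypothetical, and the 2 per cent default merely echoes the illustrative percentage suggested in the text.

```python
def area_totals(plots):
    """plots: list of (acres, times_sown) pairs, one pair per plot.

    Returns (gross sown area, net area, area sown more than once).
    """
    gross = sum(acres * times for acres, times in plots)
    net = sum(acres for acres, times in plots if times > 0)
    return gross, net, gross - net


def deduct_ridges(recorded_acres, allowance=0.02):
    """Deduct a flat allowance for field ridges and temporary bunds."""
    if not 0 <= allowance < 1:
        raise ValueError("allowance must be a fraction between 0 and 1")
    return recorded_acres * (1 - allowance)


# Hypothetical village: three plots, the first of which is sown twice.
plots = [(10.0, 2), (5.0, 1), (8.0, 1)]
gross, net, sown_more_than_once = area_totals(plots)
print("Gross sown area:", gross)
print("Net area under cultivation:", net)
print("Area sown more than once:", sown_more_than_once)
print("Gross area after ridge allowance:", deduct_ridges(gross))
```

Reporting the third figure separately explains why the crop-wise total exceeds the land actually under cultivation.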
Estimation of uncultivated patches in cultivated fields also presents
some difficulty and results in a certain amount of error in the area
figures. In temporarily settled areas, uncultivated areas are separately
shown. The lekhpal is also expected to make an eye estimate of the
uncultivated patches in cultivated fields and to make allowances for
these. According to the 'Manual' this area is to be measured by poles
and chains but this is rarely done.
Area under fruit and vegetables within the cultivated fields also presents
considerable difficulties in estimation. Statistics of area taken
up by fruits and vegetables should include only such tracts on which
there are orchards and groves. It would be enough if a complete
enumeration of fruit trees is done once in five years in respect of
other scattered trees. It is estimated that about 6 million acres of
cultivated area in our country is under fruits and vegetables. At
present, apart from some ad hoc figures of production of some important
fruits, no other statistics are available. There are no doubt difficulties,
on account of extensive mixed sowings in vegetables, in apportioning
areas to different constituents of a mixture, and a sound technique for the
collection of data has to be evolved. Recently the Ministry of Food
and Agriculture has given grants-in-aid to State Governments for taking
up pilot surveys for evolution of suitable techniques in respect of
minor crops and such studies are in progress in the States of Bombay,
Madhya Pradesh and Rajasthan.
Permanently settled areas
Technique of estimation. In permanently settled areas there are
no detailed surveys of land nor is there, in many cases, a permanent
revenue staff and as such the acreage figures under different crops
cannot be accurately estimated. There is no uniform system of
measuring the area under various crops and different practices are
followed in Bengal, Bihar, Orissa and certain parts of Madras.
The police chaukidar or the village headman usually collects these
primary statistics and there are hardly any senior officers corresponding
to supervisors and kanungos of the temporarily settled areas for supervi-
sion. Till recently, therefore, the estimates of area under various
crops reported for permanently settled regions were largely in the
nature of guess work. The village headman passed on the estimates to
the sub-divisional officer who in turn sent the estimates to the district
officer who modified them in the light of his own experience. The
Director of Agriculture modified the figures of district officers on the
basis of all the figures received from various districts. It is obvious
that the system was full of defects.
Need. If correct statistics about area under various crops have
to be collected for permanently settled regions it is necessary that
they should also have a permanent staff of the type that temporarily
settled areas have. Further, it is necessary to have detailed surveys of all
such areas and maps should be prepared for various tahsils and villages.
So far as area statistics are concerned it has been recognised on all hands
that there should be complete enumeration and that sampling methods should
not be followed.
Recent changes. Certain steps have been taken in this direction
in the States of Bihar, Orissa and Bengal. In 1944-45 the Bihar
Government established a primary agency of Karamcharis who have
since been recording the statistics of area on the basis of complete
enumeration. In West Bengal also a plot to plot survey was carried out
in 1944-45 and shortly afterwards a system of collecting acreage statistics
on the basis of random sampling surveys was introduced. Ad hoc
investigators were appointed to collect agricultural statistics and the
Central Government also gave financial assistance for the scheme.
It is expected that gradually all the permanently settled regions in
the country would follow a system of complete enumeration and would
have a permanent staff for the collection of agricultural statistics as
the temporarily settled areas have.
Indian States under British rule presented another problem in
this respect. A large part of their area was also not surveyed, but now
with the integration of these States reporting agencies have been set
up and in future we can hope that better and more extensive statistics
would be available with regard to these areas also.
Suggestions for the improvement of acreage statistics. In order that
statistics of area may be improved and more accurate figures may be
available, the method of complete enumeration should be extended to
permanently settled areas also. As has been pointed out above, in
certain tracts of land under the permanent settlements, area statistics are
being collected on a random sample basis for want of a primary reporting
agency. This affects the quality of these statistics. Complete enumeration
is the best method so far as area statistics are concerned and it has
received the approval of various expert committees and conferences.
In view of the fact that the basis of all economic development measures
is the village, it is necessary to have detailed statistics about this unit.
The random sampling method may give fairly reliable figures of area
under principal crops for the State as a whole, but it cannot be expected
to give satisfactory area statistics when the figures for smaller units
like district, tahsil or village are needed. The random sampling survey
method cannot be recommended for use in case of minor crops either.
It was due to these reasons that the Bengal Famine Enquiry Commission
and the Inter Departmental Committee on Official Statistics
advocated the adoption of the method of complete enumeration for
finding out area statistics. At present out of the total geographical
area of 806.3 million acres 550.7 million acres (or 68.3%) is under
complete enumeration and 23.1 million acres (or 2.9%) under sample
surveys. For 146.6 million acres (or 18.2%) there are only rough
estimates available. Thus 89.4% of the total geographical area is
said to be reporting and the remaining 85.9 million acres or 10.6%
is non-reporting.
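The arithmetic behind these coverage figures can be checked with a short sketch. The acreages are those quoted in the text; the function and variable names are assumptions introduced only for this illustration.

```python
def coverage_shares(total_acres, parts):
    """Return each category's acreage as a percentage of the total area."""
    return {name: round(100 * acres / total_acres, 1)
            for name, acres in parts.items()}


TOTAL = 806.3  # total geographical area in million acres, as quoted above
REPORTING = {
    "complete enumeration": 550.7,
    "sample surveys": 23.1,
    "rough estimates": 146.6,
}

# Reporting area is the sum of the three categories; the rest is non-reporting.
reporting_total = round(sum(REPORTING.values()), 1)
non_reporting = round(TOTAL - reporting_total, 1)

print(coverage_shares(TOTAL, REPORTING))
print("Reporting area:", reporting_total, "million acres")
print("Non-reporting area:", non_reporting, "million acres")
```

The reporting percentage in the text is the sum of the three rounded shares (68.3 + 2.9 + 18.2), so a share computed directly from the acreages may differ from it in the last digit.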
However, certain difficulties arise in case of unsurveyed areas where
the method of complete enumeration cannot be easily adopted. It
is, therefore, suggested that in such areas which have not been cadastrally
surveyed, the services of experienced patwaris and other
primary reporting agencies should be utilised wherever they exist
and they should be asked to prepare maps relating to these areas
after which the estimates of area of various fields should be obtained.
In such unsurveyed areas where reporting agencies do not exist, special
methods of making such estimates can be adopted. The difficulties are
no doubt aggravated by the fact that certain areas are not easily accessible.
In such areas which have not been cadastrally surveyed and
where no reporting agency exists, experiments can be made to
estimate the area under crops through aerial photography. Some
attempts in this direction were made by the statistical branch of the
Indian Council of Agricultural Research, in collaboration with the
Surveyor General of India. The earlier experiments made in 1947-48
unfortunately did not meet with much success as difficulty was
experienced in distinguishing different crops from the photographs
so obtained. Coloured filters were also used for the purpose but they
also did not give satisfactory results. However, recent experiments
have given much better results and they are being continued.
N. S. S. Series. Apart from the figures collected from village
records there is another set of figures collected through N. S. S. This
series was started very recently and it relates mainly to area under various
crops. N. S. S. collects these figures of area under various crops
during its regular rounds of survey. Estimates are given for the whole
country and also for certain population zones. These figures are based
on random sampling. There are differences between these figures and
the official figures obtained through village records.
The differences arise due to the following factors:
(i) Difference in methods.
(ii) Difference in coverage of crops.
(iii) Difference in seasons to which figures relate.
(iv) Difference in field work of the two agencies.
(v) Difference in the classification of area in food and fodder
crops.
(vi) Difference in the allocation of area under mixed crops,
and
(vii) Sampling error in the N. S. S. estimates.
It is, however, very difficult to find out the degree of reliability
of these two sets of figures, namely, the official figures and the N. S. S.
estimates. It is not possible to say which of these, if any, is accurate.
It is really a sad commentary on our agricultural statistics
that we are not in a position even to assess the accuracy of the data
which we possess.
The pre-final estimates of crop acreage are still based mainly
on impressions of the primary reporters. In 1955-56 the N. S. S.
started sample surveys for improving the pre-final estimates of acreage
but these N. S. S. estimates were found to suffer from large
sampling errors and moreover they were not available in
time to be of much use. However, the position in this regard is
gradually improving.
As in case of area statistics, two sets of figures are available at
present in our country about yield also. They are:
(i) Official series, and
(ii) N. S. S. series.
The official series is more comprehensive and covers a large
variety of crops, whereas the N. S. S. series is, at present, confined
to cereals only. Figures of yield in the official series were formerly
estimated on the basis of Normal Yield and Condition Factor. Normal
yield of crops in various districts was estimated on the basis
of crop cutting experiments and it gave an idea about the yield per
acre ordinarily expected in a particular area. This figure was
modified at the time of estimating the yield in a particular season
on the basis of actual conditions prevailing at the time of estimation.
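A minimal sketch of the traditional calculation follows, assuming the usual reading that the season's production is the area multiplied by the normal yield per acre and by the condition factor expressed as a fraction of the normal. All figures and names are hypothetical.

```python
def traditional_estimate(area_acres, normal_yield_per_acre, condition_factor):
    """Production = area x normal yield per acre x condition factor.

    `condition_factor` is the season's crop condition as a fraction of
    the normal crop (1.0 = normal, 0.75 = three quarters of normal).
    """
    if area_acres < 0 or normal_yield_per_acre < 0 or condition_factor < 0:
        raise ValueError("inputs must be non-negative")
    return area_acres * normal_yield_per_acre * condition_factor


# Hypothetical district: 50,000 acres, normal yield 800 lb per acre,
# season judged to be three quarters of normal.
production_lb = traditional_estimate(50_000, 800, 0.75)
print("Estimated production (lb):", production_lb)
```

Any error in the normal yield or in the condition factor carries through proportionately to the production estimate, which is why both concepts are examined at length.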
Changes have been made in this technique in recent years and the
system is gradually being replaced by another in which yield is estimated
through crop estimation surveys conducted by the States under the
over-all supervision of the N. S. S. In this new technique the concepts
of normal yield and condition or season factor are done away with, and
yield is directly estimated from crop cutting experiments. Figures
estimated both by the "traditional method" (based on normal yield and
condition factor) and the "random sample survey method" are available
at present.
Besides the above official series, N. S. S. itself collects yield statistics
during its normal rounds of surveys. These statistics are based
on random sample surveys but are different from the statistics of the
official series in which surveys are now conducted by different States
under the supervision of N. S. S.
1. Official figures-(A) Traditional method
Under this method, as has been pointed out, output statistics are
based on concepts of normal yield and condition factor which deserve
some discussion at length.
Normal Yield
Old concept. Till recently normal yield was defined by the Government
as "average outturn, on average soil, in a year of average character,
as deduced from a consideration of the information obtained on experiments
made during the year under review." There was a lot of confusion
with regard to the actual sense in which the word "normal" had
been used by the Government. It appears that the Government had
confused the word "normal" with the word "average" as at many places
the word "average" had been used to denote the meaning of "normal."
These two words have entirely different meanings. The word "average"
generally signifies a mean of past figures, though it may not necessarily
be an arithmetic average. It could be Mode or Median. The
word "normal" in connection with crop yields is actually intended to
mean a crop which ordinarily a cultivator expects. It neither indicates
a bumper crop nor an ordinary or poor crop. It is not necessary that
the normal crop should be an average crop also. It can be better than the
average and it can also be worse. Indian crops are generally sub-normal.
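To see how the three kinds of average can differ for the same record of past crops, consider the following sketch. The yield figures are invented for illustration and do not come from any official series.

```python
from statistics import mean, median, mode


def averages(past_yields):
    """Return the arithmetic mean, median and mode of past yields."""
    return {
        "mean": mean(past_yields),
        "median": median(past_yields),
        "mode": mode(past_yields),
    }


# Hypothetical yields per acre over seven past years; one bumper year
# pulls the arithmetic mean above the most commonly observed yield.
past_yields = [9, 10, 10, 10, 11, 12, 15]
for name, value in averages(past_yields).items():
    print(name, "=", value)
```

Whichever of these is adopted, it is computed from past figures, whereas the "normal" is a judgement about what a cultivator ordinarily expects and need not coincide with any of them.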
The concept of normal yield in the Government's mind was very vague
and very often the Government had issued contradictory statements with
regard to its meaning. Generally it is believed that by "normal" the
Government meant the "Mode" of previous crops. It will be worthwhile
to compare this definition of normal yield with the official instruction
of the Bureau of Statistics, U. S. Department of Agriculture. According
to it "the normal condition is not an average condition but a
condition above the average giving promise of more than an average
crop. Furthermore a normal condition does not indicate a perfect
crop or a crop that is, or promises to be, the largest in quantity and the
best in quality which the region reported upon may be considered capable
of producing. The normal indicates something less than this and thus
comes between the average and the possible maximum. The normal
may be described as the condition of perfect healthfulness, unimpaired
by drought, hail, insects or other injurious agency and with such growth
and development as may reasonably be looked for under these favourable
conditions."
The above definition gives a fairly good idea about the concept
of the word "normal" but all the same it is a hypothetical concept and
leaves much room for personal bias. It is a subjective estimate. Obviously
the concept of the word "normal" in the mind of the Indian
Government was not the same as that of the U. S. Bureau. The Government
of India's opinion about the "American normal" was that it was a
"full normal condition" and further that a normal condition was below
the full normal. It was felt in certain quarters that the system of normal
yield should be replaced by a system of average yield. There are a number
of countries which follow this system. Generally average yields in these
countries are estimated by finding out the moving average of yields over
a period of 5 to 10 years. The adoption of this technique in our country
presented two problems. Firstly, this system needed very efficient crop
cutting experiments and secondly, it involved a change in the standard
against which the condition factor was reported. It was felt by some people
that the village lekhpals and crop reporters might not understand the
implication of this change from normal to average yield and they might
continue to express the condition factor against a background of normal
yield. This risk was no doubt great but it was worth taking.
The Department of Agriculture in each State was responsible
for fixing the normal yield per acre for various crops in each district.
The estimate of normal yield was based on the system of crop cutting
experiments which were conducted according to certain rules laid down
by the State Government. According to these rules average plots of
land were selected by the officers of the Agriculture and Revenue De-
partments and on these fields the crop was sown and cropped before
them. The figures of yield thus arrived at were forwarded to the Direc-
tor of Agriculture who, after taking into account various other facts,
finally fixed the normal yield figures for various districts. Normal
yield figures were generally not changed for a period of 5 years. The
aim of the annual crop cutting experiments was to furnish tests of accuracy
of the original figures and to make it possible for the Agriculture
Department to revise these estimates if necessary after 5 years.
New concept. Recently the concept of normal yield has been
changed and it is now defined "as the moving average of actual yields
per acre as determined on the basis of the results of crop-cutting surveys
over a period of ten years." This is a much better definition and
it will certainly remove some of the shortcomings associated with the
yield estimates in our country.
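The new definition can be sketched as a simple ten-year moving average. The yearly yields below are hypothetical, and the function is only an illustration of the arithmetic, not the official computing procedure.

```python
def moving_average(yields, window=10):
    """Return the moving averages of `yields` over `window` consecutive years.

    The first value covers years 1..window, the next years 2..window+1,
    and so on. An empty list is returned if there are too few years.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of years")
    if len(yields) < window:
        return []
    return [sum(yields[i:i + window]) / window
            for i in range(len(yields) - window + 1)]


# Hypothetical actual yields per acre from crop cutting surveys, 12 years.
actual = [10.2, 9.8, 11.0, 10.5, 9.0, 10.8, 11.4, 10.1, 9.6, 10.6, 11.2, 10.9]
normal_yields = moving_average(actual, window=10)
print("Normal yield for each successive ten-year period:", normal_yields)
```

Each new year drops the oldest observation and brings in the latest, so the normal yield is revised annually instead of remaining fixed for five years.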
Shortcomings and suggestions. Apart from the vagueness of the term
'normal yield' and the subjectivity of the concept, the system of estimating
normal yield and conducting crop cutting experiments invited a
great deal of criticism not only from non-government people but also
from those who were responsible for these calculations. Mr. Allen,
Director of Agriculture, remarked as early as 1932 that "examining the
crop tests which I have had to do since taking over charge I have been struck
with the very flimsy and unreliable evidence on which such district
averages are based. In my opinion, at present the probable error is
in the neighbourhood of 20 per cent and the so-called average yield on
which the estimates of the total yield is based is far from reliable."
The most glaring defect of this system of calculation of normal
yield is associated with the method of selection of the typical field for crop
cutting experiments. It is left to the local experimenting officer to
select any field which he likes and this "purposive selection" is very
undesirable from the statistical point of view as it leaves ample scope
for personal prejudices and bias of the officers concerned. It has to
be admitted, however, that the task of the selection of a "typical field" is
by no means an easy one. This difficulty was foreseen as early as 1893
and it was for this reason that the then Chief Secretary of U. P. had
remarked that the difficulty in the selection of the typical field is so great
that these crop cutting experiments are seldom of any value. It was
for this reason that in 1932 Sir Malcolm Hailey, the then Governor of
U. P., remarked that "my own feeling is that these figures are valueless."
It should be understood that it is not easily possible to effect improvements
by giving more time to the experimenting officer as suggested
by some people because the nature of the defect is not related to the
amount of time at the disposal of the persons concerned. The only
solution of the problem lies in changing the system itself. The purposive
selection should be replaced by a system of selection based on
random sampling. However, random sample selection would necessitate
a large number of crop cutting experiments. The Royal Commission
on Agriculture in India, while denouncing the system of selection by
eye of the average fields as "statistically indefensible", also pointed
out "that before random selection of village and fields for crop cutting
experiments is introduced the means to carry out far more numerous
experiments must be provided for."
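Random selection of villages and fields, as opposed to purposive selection by the officer, can be sketched in two stages. The village names, field numbers and sample sizes below are hypothetical, and the two-stage design is an assumption made for illustration only.

```python
import random


def select_fields(fields_by_village, n_villages, n_fields, seed=None):
    """Draw villages at random, then fields at random within each village.

    Every village, and every field within a chosen village, has the same
    chance of selection, so the officer's judgement plays no part.
    """
    rng = random.Random(seed)
    villages = rng.sample(sorted(fields_by_village), n_villages)
    return {village: rng.sample(fields_by_village[village], n_fields)
            for village in villages}


# Hypothetical frame: six villages, each with fields numbered 1 to 20.
frame = {"village_%d" % i: list(range(1, 21)) for i in range(1, 7)}
chosen = select_fields(frame, n_villages=3, n_fields=4, seed=1)
for village, fields in chosen.items():
    print(village, "-> fields", fields)
```

Fixing the seed makes the draw reproducible for checking; in practice the point is simply that the fields are not picked by eye.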
Another reason for the inaccuracy of normal yield figures is that the
officials of the Revenue Department who carry on these experiments
are usually preoccupied with various types of miscellaneous work and they
do not pay much attention to this work entrusted to them. Though
it is expected that senior officials of the Revenue Department will look
after this work, yet in actual practice it is the supervisor or the kanungo
who selects the field and he generally takes care to see that the field
selected is near the camp of his officers to suit their convenience. It
would have been better if the services of trained Economic Intelligence
Inspectors were utilised for this purpose.
Usually the districts in which the crop cutting experiments are carried
on are more or less selected once for all and the number of crop cutting experiments
conducted is very small. According to Government rules not less
than four experiments should be made for each crop. This is the minimum
but for all practical purposes it is the maximum also. The number
of experiments should be more than this. The argument that by
increasing the number of such experiments the quality of work will go
down is futile and absolutely meaningless.
According to present rules the area of crop cut is generally 1/10th
of an acre and in some cases like sugarcane it is 1/40th of an acre. There
is no uniformity in the area of the crop cut and it is very difficult to say
whether it would be worthwhile to bring about uniformity in this
matter.
There is also a feeling that these figures of crop cutting experiments
are not properly utilised. They are kept in safe custody for a period
of five years till the question of revision of the normal yield figure comes
up. Such figures should be carefully examined as and when they are
received and a consolidated review of all the districts and for all the
crops should be prepared each year and sent to the officers of the dis-
trict where crop cutting experiments were made. Such reviews should
point out the mistakes committed in conducting crop cutting experi-
ments. When these figures would be scrutinised annually, it may not
be necessary to revise the normal yield figures rigidly after every five
years only. Figures could be revised as and when necessary, and the
annual scrutiny will guide the time when revision is needed.
Condition or Seasonal factor
Concept and technique of estimation. The figure of normal yield as
calculated on the basis of crop cutting experiments has to be adjusted
in the light of the conditions prevailing at the time of the estimation
of crop yields. This condition or seasonal factor denoted the condition
of the crop in a particular season in relation to the normal crop. It
was usually expressed in terms of annas, a fixed number of annas representing
the normal. It was a purely subjective estimate and was not
arrived at by any type of statistical calculation. Once or more during
the growth of the crop and again at the time of harvesting, the crop
reporters estimated the yield as so many annas, taking a definite
number of annas, for example 12 or 16, as standard. The tehsildar received
such statements from various crop reporters in his jurisdiction
and he assumed some sort of average from them, using also his general
knowledge of crop condition, and he reported a single figure for the
whole tehsil to the district officer. The district officer modified this
figure according to his knowledge and either proceeded like the tehsildar
and selected a single figure for the district as a whole or applied the
anna yield to the area sown for each tehsil separately and reported an
average for the whole district.
Defects. This system of estimating the seasonal factor was extremely
defective. The crop reporters were generally uneducated or
little educated people and the task assigned to them was one which
required very great ability and a perfectly unbiased and balanced state
of mind. As a matter of fact it is very difficult even for trained persons
to give an accurate idea about the condition of a crop as compared
to a normal crop, which itself is a vague term, and as such it should not
be any surprise if the untrained crop reporters could not give correct
estimates.
Crop reporters were in many cases found to be biased. It was
generally believed that the official crop reporters were pessimistic by
nature and it resulted in underestimation of the crop. Probably the
reason for this pessimism was that the idea of the word "normal" in
their minds was somewhat different from the one which the Government
intended it to be. Their idea of a normal crop probably was
'a crop which they longed to see but rarely saw.' When they compared
the actual crop with this concept of the normal yield underestimation
was bound to occur.
It was also felt that the figures of annawari estimates were sometimes
vitiated on account of the fact that in temporarily settled tracts
remission in the land revenue had to be granted when the seasonal
condition fell below a certain percentage of the normal. It was not
unlikely that when the seasonal conditions were on the border line for
the grant of remission, some overzealous crop reporters pitched their
estimates too high, thinking that they would displease their superiors
by the possible loss of revenue.
Moreover, the number of reports received about the condition
factor was not adequate and there was also a tendency in the crop reporters
to favour even figures. The position was further aggravated
by the fact that it was not possible to estimate the extent of error of these
estimates. These estimates were usually guess work and it was difficult
to say whether they were correct or incorrect and what was the
extent of error contained in them.
Besides these discrepancies in the collection of these figures there
were many drawbacks in the working of these figures for final
estimates. The methods by which the tehsildar arrived at the condition
figure of the tehsil and the district officer for the district were open
to objection. No recognised system of averaging was followed and this
was supposed to be a very serious drawback of these estimates. Messrs.
Bowley and Robertson had suggested that the arithmetic average of tehsil
figures should be calculated for arriving at the district condition
figure, but we continued to follow the old practice and no change was
introduced except in some states like Madras, where the arithmetic
average of the village figures was calculated to find out the estimate
for the 'firka' and the weighted arithmetic average of firka estimates
to find out the condition factor for the revenue circle. The weights
were proportional to the area under the crop in the various firkas. It
would have been better if the condition figure had been represented as
a percentage of the normal rather than in terms of annas.
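The weighted averaging followed in Madras can be illustrated by a short sketch. It is only an illustration: the function name and the firka figures below are hypothetical and are not taken from any official return. Each firka estimate is weighted by the area under the crop in that firka.

```python
def weighted_condition_factor(estimates, areas):
    """Area-weighted arithmetic average of condition estimates."""
    if not estimates or len(estimates) != len(areas):
        raise ValueError("need one crop area for each estimate")
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total crop area must be positive")
    return sum(e * a for e, a in zip(estimates, areas)) / total_area


# Hypothetical firka figures (made-up values for illustration only):
# condition estimates in annas and area under the crop in acres.
firka_annas = [12.0, 10.0, 8.0]
firka_acres = [1000.0, 3000.0, 2000.0]

circle_factor = weighted_condition_factor(firka_annas, firka_acres)
simple_average = sum(firka_annas) / len(firka_annas)
print(round(circle_factor, 2), round(simple_average, 2))
```

With these assumed figures the weighted average for the revenue circle falls below the simple average, because the firkas with the larger crop areas report the poorer conditions.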
Recently the Central Government has standardised the procedure for
arriving at the condition factor and recommended to the states to
adopt the new pattern.
The above discussion clearly shows that the yield statistics as
calculated by the traditional method of finding out the normal yield
and the condition factor were full of defects. Neither the normal yield
nor the condition factor could be correctly estimated and the yield
estimates were extremely faulty. At present the final estimates of crop
production even in the official series are made on the basis of random
sample surveys. Yet it is necessary to obtain accurate data about the
normal yield and the condition factor because the earlier crop estimates
are still based on them. It is gratifying to note that the normal yield
is now computed as a moving average of the actual average yields per
acre obtained by Crop Cutting Surveys, but these surveys should be
conducted on random sampling basis during the preceding years and the
condition factor for each district should be computed as the weighted
average of tehsil figures, weights being proportionate to area under the
crop in various tehsils. Certain improvements have been made in some
States but the general condition cannot be said to be satisfactory as
yet.
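The computation of the normal yield as a moving average may be sketched as follows. The length of the averaging period is not stated in the text, so the five-year window used here is an assumption, and the annual yields are hypothetical figures chosen only to show the arithmetic.

```python
def moving_average(values, window):
    """Averages of every run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical average yields per acre from crop cutting surveys in six
# successive years (illustrative values, not survey results).
annual_yields = [400.0, 360.0, 440.0, 420.0, 380.0, 450.0]

WINDOW = 5  # assumed averaging period; the text does not specify one
normal_yields = moving_average(annual_yields, WINDOW)
current_normal = normal_yields[-1]  # normal yield from the latest 5 years
print(normal_yields, current_normal)
```

Each year the oldest figure drops out and the newest comes in, so the normal yield follows the actual survey results gradually instead of resting on a subjective notion of a "normal" crop.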
1. Official figures-(B) Random sampling method
Due to the various defects pointed out above it was realised that
the system of crop estimation in our country needed a change. It was
also thought that crop estimation should be done on the basis of random
sampling surveys rather than on the basis of concepts like normal yield
and anna condition which were subjective in nature and always contained
an element of uncertainty and guess work. It is a matter of considerable
satisfaction that the scientific method of random sampling is fast
replacing the traditional system, so full of defects as we have just
seen. This system makes superfluous the determination of normal yield
and condition factor and gives the estimates of yield directly. Besides
being more straightforward and scientific, random sampling is relatively
less exposed to personal bias. Indirectly it takes into account the
effects of such factors as rainfall, soil, irrigation, methods of
cultivation, etc.
Technique. "The technique of random sampling consists, in principle,
of choosing a sample of elements out of a given totality of elements
comprising the population, in a manner as to offer each element of the
totality an equal chance of inclusion in the sample. The technique not
only ensures that the sample is representative of the population but
also provides the means of knowing how far one is likely to be in error
in estimating any characteristic of the population on the basis of this
sample. The advantages of such a method in yield estimation are that we
are able to obtain an unbiased estimate of the average yield per acre
and can determine in addition the margin of error by which the estimated
average yield is likely to depart from the true unknown value of the
yield for the tract surveyed."*
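The two properties mentioned in this passage, an unbiased estimate of the average and a measurable margin of error, can be seen in a minimal sketch of simple random sampling without replacement. The population of field yields below is artificial, and the standard error formula with the finite population correction is the usual textbook one, assumed here and not quoted from the source.

```python
import math
import random
import statistics


def srs_mean_and_standard_error(population, n, seed=0):
    """Draw a simple random sample of size n without replacement and
    return (sample mean, estimated standard error of that mean)."""
    rng = random.Random(seed)              # fixed seed for repeatability
    sample = rng.sample(population, n)     # every unit has an equal chance
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)       # sample variance, divisor n - 1
    fpc = 1 - n / len(population)          # finite population correction
    return mean, math.sqrt(fpc * s2 / n)


# Artificial population: yield per acre of 400 fields (hypothetical).
field_yields = [700 + 5 * (i % 40) for i in range(400)]

estimate, std_error = srs_mean_and_standard_error(field_yields, n=30)
print(estimate, std_error)
```

The standard error shrinks as the sample grows and becomes zero when every field is included, which is the sense in which the method lets us know how far the estimate is likely to be in error.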
The random sampling method for the estimation of yield statistics
is the most scientific method. The first random sampling scheme on
scientific lines as recommended by the Indian Council of Agricultural
Research was carried out by the Indian Central Cotton Committee in
1942 in Akola district of Central Provinces. In 1943-44 it was extended
to Buldana district also. In the year 1944-45 it covered the whole
cotton belt of Central Provinces and the estimates of the surveys were
found to be about 10% higher than the official estimates. The first
sample survey in food crops was carried out by the Indian Council of
Agricultural Research in the Punjab and Central Provinces in the year
1943-44. The area covered was more than 150 million acres. In some cases
the figures as obtained by random sample studies were similar to those
obtained by the traditional method while in others there was a wide
variation between them. At present the estimates of production of
principal food grains are based on the results of crop cutting surveys
based on random sampling, in most of the States. In case of cash crops
also the position is gradually improving.
* "Statistics of area and yield of crop in India" by Dr. P. V. Sukhatme,
Agricultural Situation in India, March, 1949.
The plan of random sample surveys to estimate crop yields consists
in selecting a certain number of villages from each tehsil (or taluka)
of a State, at random, and in selecting within each village, by the same
principle, two or three fields out of all the fields on which a
particular crop has been sown. In each selected field a plot, usually of
the size of 1/80th of an acre (33' × 16½'), is located at random. The
selected plot is properly marked and the experiment consists of cutting
the crop and having the produce threshed, winnowed and weighed. This is
usually done a few days before the crop is due for harvesting. Proper
allowance is made for the moisture content on the basis of experiments
conducted for this purpose. Once the yield per acre is estimated on this
basis it is simply to be multiplied by the area under cultivation to
obtain the figures of total yield of various crops. Thus this system
entirely does away with the necessity of obtaining normal yield
estimates and the estimates of condition factor. Under it the total
yield is estimated directly by multiplying the area by the estimated
yield per acre. The entire field work is carried out by the existing
staff of the Revenue and the Agriculture Departments under the technical
supervision of the statistical staff of the Indian Council of
Agricultural Research. The method of random sampling as applied to crop
cutting experiments has achieved great prominence. A major part of the
food crops of the country is covered by this system and this system has
been successfully extended to crops other than food grains, i.e.,
cotton, jute, paddy and certain oil seeds.
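The arithmetic of a crop cutting experiment may be sketched as below. The plot weights, the moisture allowance and the area figure are all hypothetical, and the plain average over plots is a simplification of the actual multi-stage design (villages, then fields, then plots). The factor of 80 follows from the plot size, since 33 feet by 16½ feet is 544.5 square feet and an acre contains 43,560 square feet.

```python
PLOTS_PER_ACRE = 80  # 43,560 sq ft per acre / (33 ft x 16.5 ft) = 80


def estimated_yield_per_acre(plot_weights, dry_fraction):
    """Mean plot produce, adjusted for moisture and raised to one acre.

    dry_fraction is the assumed share of harvested weight that remains
    after the allowance for moisture (a hypothetical parameter)."""
    if not plot_weights:
        raise ValueError("at least one plot is required")
    mean_plot_weight = sum(plot_weights) / len(plot_weights)
    return mean_plot_weight * dry_fraction * PLOTS_PER_ACRE


def estimated_total_yield(plot_weights, dry_fraction, area_acres):
    """Estimated yield per acre multiplied by the area under the crop."""
    return estimated_yield_per_acre(plot_weights, dry_fraction) * area_acres


# Hypothetical produce (kg) from five sample plots of 1/80 acre each.
plot_weights_kg = [10.0, 12.0, 11.0, 9.0, 13.0]

per_acre = estimated_yield_per_acre(plot_weights_kg, dry_fraction=0.9)
total = estimated_total_yield(plot_weights_kg, 0.9, area_acres=50_000)
print(per_acre, total)
```

No normal yield or condition factor enters the calculation at any stage, which is the point made in the paragraph above.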
This method has now gained official recognition from the Government
of India in a restricted sense. Final forecasts of crop yields are
now based on this method for such crops and in such States where the
experimental stage of these surveys is over.
Besides the Indian Council of Agricultural Research, the Indian
Statistical Institute, Calcutta, also started experiments on the basis
of random sampling methods. They have not confined their experiments
only to the yield estimate but have extended them to the estimation of
area statistics also. The principle underlying the surveys conducted by
the Indian Council of Agricultural Research and those conducted by the
Indian Statistical Institute is the same but there are some fundamental
differences in the details of the procedure between these two types of
surveys. For instance, the sampling unit in case of the I. C. A. R. is
the village whereas the Institute does not select random villages. They
think that as the villages are not of equal size they cannot, strictly
speaking, give every inch of land an equal chance of being included in
the sample. Another point of difference is the shape and size of the
field selected. The Statistical Institute selects, at random, plots of
1000 square feet to form the sample plots. The I. C. A. R. selects
rectangular plots which are larger in size than those selected by the
Institute. Again, while the Institute appoints special staff of field
investigators for conducting these surveys, the I. C. A. R. surveys are
conducted through the existing agencies of the Revenue and Agriculture
Departments.
As has been said earlier, even though the final estimates of crop
yields are now based on the results of crop cutting experiments by the
random sampling method, it is still necessary to obtain accurate data
regarding the condition factor and the normal yield for earlier crop
forecasts. As such it is necessary to effect the improvements that
have been suggested in the foregoing pages with regard to the estimation
of the normal yield and the annawari estimate. The Government is
conscious of this fact and efforts are being made to effect
improvements.
Defects. Though theoretically speaking the results based on the
new technique of crop cutting surveys conducted on random sampling
basis should be better than those under the traditional method, in which
subjectivity was present in ample measure, yet it cannot be asserted
with any degree of confidence that it is so. These surveys are not
conducted in a satisfactory manner and the supervision from outside
agencies leaves much to be desired. The supervising agencies are
generally overburdened with their own work, which is of a varied type,
and they do not attach much importance to this assignment. Probably they
do not realise the great importance which should be associated with this
work. It has been estimated that effective supervision covers not more
than 2% of the total number of these State crop cutting surveys. This
amount of supervision is very inadequate and it needs strengthening. Yet
another shortcoming of these estimates is that the surveys are not
conducted on a uniform pattern in all States. Some States like U. P.
follow the pattern of I. C. A. R. surveys while others like West Bengal
follow the I. S. I. pattern. It has been pointed out that these two
techniques differ considerably from each other. The sampling unit in
I. C. A. R. surveys is the village whereas in I. S. I. surveys the
village is not the unit. The shape and the size of the plot also differ
in these two types of surveys.
In order to have dependable statistics relating to agricultural
output it is necessary that these drawbacks are removed and the scope
and coverage of the official series is enlarged so as to include minor
crops and fruits and vegetables etc. about which no statistics are
available at present.
2. National Sample Survey series
The N. S. S. also collects data relating to yield of major cereal
crops during its regular rounds. These figures are collected on the
basis of sample surveys and are obtained through its general purpose
investigators who collect statistics about a large variety of diverse
items like consumers' income and expenditure, farm planning and
population planning etc. This agency appears to be unsuitable for this
purpose as the nature of its work is too varied and the task of
collection of crop cutting data requires investigators who are specially
trained for the job. Recently, the conduct of N. S. S. yield surveys has
come to be supervised by the officials of the State Bureaus of Economics
and Statistics and it is, therefore, hoped that this defect will be
gradually removed.
The figures collected by the States (the official series) differ from
those collected through the N. S. S. It should be remembered that
the figures in the official series are also now collected, for a large
part, on the basis of crop estimates on random sampling basis and yet
there is sometimes a wide divergence between these two sets of figures.
It is difficult to say which of the two is more dependable.
Our yield statistics are yet very defective and we still do not know how
much we produce, whether it is enough for our own requirements, whether
we can export if there is a surplus and how much to import if there is a
deficit. This is a very unsatisfactory state of affairs and at a time
when we are having a planned economic development this lacuna is really
one which deserves all attention.
Crop estimates
Crop estimates, which were formerly known as crop forecasts, have
been issued in this country ever since the close of the 19th century,
when the first forecast relating to wheat was issued. The purpose of
the crop estimates is to give an idea about the probable size of the crop
before it is actually harvested. Usually there are three estimates for
each crop but there is no hard and fast rule, as some crops have only
two or even one estimate and others have more than three. These
estimates are published on different dates of the year taking into
account the harvesting time of each crop. The first estimate is gene-
rally issued about a month after the completion of sowings. It is
meant to give an idea about the probable area sown and weather con-
ditions at the time of sowing. It also indicates how the germination
is proceeding. The second estimate is generally issued two months
after the first one and it includes the area of late sowings and indicates
the probable character of the crop and in some cases even the expected
percentage yield. The final estimate contains the data about the total
area sown and the yield expected to be harvested during the season.
The question of extending the scope of the forecasts of crops has
been receiving the attention of the Government and a number of crops
have been added to the list during the last few years. In the year
1944-45 all-India crop estimates were started for jowar, bajra and
maize. In the year 1948-49 barley, gram and ragi were added to the list.
Tobacco was included in the list in the year 1949-50. At present the
Government of India issues crop estimates for 23 crops grouped as under:
(a) Cereals: rice, jowar, bajra, maize, ragi, wheat and barley.
(b) Pulses: gram, pulses (tur), other kharif pulses and other rabi
pulses.
(c) Oilseeds: groundnut, sesamum, rape and mustard, linseed
and castor seeds.
(d) Fibres: cotton, jute and mesta.
(e) Plantations: tea, coffee and rubber.
(f) Miscellaneous: sugarcane, potatoes, tobacco, pepper, ginger
and chillies.
In respect of castor seed, pepper and sugar only one estimate is
issued in a year. For jowar, bajra, maize, ragi, jute and other kharif
and rabi pulses two estimates each are issued, and for wheat and cotton
five estimates each are issued. So far as the estimates prior to the
final estimates are concerned, even at present they are based on the
data supplied by the primary reporting agency, that is, the patwari. He
is required to furnish rough estimates of the area sown and information
about the general weather conditions at the time of sowing. However, for
the final estimate in the temporarily settled areas field to field
investigation is undertaken for keeping up-to-date and correct figures
of area and condition of the crop. Under the old method the patwari
had also to submit the estimates about the condition factor or the
annawari estimate. At present, however, the final estimates are based on
the random sampling surveys which are conducted by the Department
of Agriculture. Where random sampling crop cutting surveys are
conducted the question of having the estimates of normal yield and
crop condition does not arise. The Director of Agriculture gets the
figures of yield directly from his own staff. The final estimates are
revised, if necessary, in the light of subsequent information, and at
the time of the final forecasts in the next year the revised estimates
of the previous year are also published. N. S. S. figures are not used
for official crop estimates.
Shortcomings. These crop estimates, which are virtually statistics
of crop output at various stages, suffer from a number of defects.
It has already been pointed out that earlier crop estimates are based
on the traditional method of normal yield and seasonal factor and as
such suffer from all the defects and shortcomings associated with these
concepts. Even in the final estimate, where largely random sampling
data are used, there are certain defects which have already been pointed
out. Apart from these defects there is a deplorable lack of
punctuality in the publication of these forecasts. Sometimes these
estimates are published much after the crop has actually been harvested
and has reached the market. The utility of the crop estimates
is considerably reduced if they are published after actual harvesting.
The delay in publication is due to the fact that all-India and State
forecasts are released simultaneously and figures from many States are
not received in time because of late sowings in certain areas. This
delays the publication of all crop estimates. It would be better if
State estimates were released as and when they are ready. The delay in
the publication of forecasts can also be reduced to a certain extent if
the overburdened patwari is relieved of some of his duties to enable him
to submit the various figures within the time limit prescribed. A
certain amount of delay in the final crop estimate is also due to the
fact that in the random sampling method the estimate is made fairly
late, when the crop is almost ready for harvesting. If crop estimation
under this method was done a little earlier, probably this reason
for delay could also be avoided.
Another defect of our crop estimates is that they contain
information which is generally a month old whereas in other countries
crop forecasts generally relate to information which is only one week
old. If crop forecasts are released early, this period could be reduced
in our country as well.
The crop estimates do not receive due publicity in our country.
They are no doubt published in daily papers and magazines and also
broadcast by All India Radio, yet all this is not enough. The services
of the Information Department of the Government should be utilised
further to have better publicity.
At present there are too many estimates in respect of some crops
and too few for others. Two good crop estimates are better than 4 or 5
unsatisfactory ones and it would be better if efforts were made to have
only two or at the most three forecasts for each crop, of a more
satisfactory quality than the present ones.
It would also be better if the form on which the crop estimates
are submitted by the primary reporting agencies is standardised for
the country as a whole, and information is called for about the area
under irrigated crops and probable harvest prices also.
In order to judge the standard of accuracy achieved by the crop
estimates attempts should be made to estimate the output by other
methods also. In case of commodities like cotton and jute an estimate
could be obtained by finding out consumption of these goods by Indian
mills, figures of exports and imports and of stock at the beginning and
end of the season. Similar methods could be used for other crops also.
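One way of carrying out such a check is a simple commodity balance. The text names the ingredients but does not give the identity, so the formula assumed here (output equals mill consumption plus exports minus imports plus the rise in stocks) is our own reading, and it ignores consumption outside the mills. All figures are hypothetical.

```python
def implied_output(mill_consumption, exports, imports,
                   opening_stock, closing_stock):
    """Output implied by disposals and stock changes during the season.

    Assumed identity (not stated in the source):
        output = consumption + exports - imports + (closing - opening)
    """
    return (mill_consumption + exports - imports
            + (closing_stock - opening_stock))


# Hypothetical cotton figures for one season, in thousand bales.
balance_estimate = implied_output(
    mill_consumption=5000,
    exports=400,
    imports=600,
    opening_stock=1500,
    closing_stock=1700,
)

official_estimate = 4600  # hypothetical official crop estimate
gap_percent = 100 * (balance_estimate - official_estimate) / balance_estimate
print(balance_estimate, round(gap_percent, 1))
```

A wide gap between the two figures would suggest that either the crop estimate or one of the trade and stock figures needs scrutiny.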
In countries like the United Kingdom and the U. S. A. objectivity is
imparted to crop estimates by correlating them with weather conditions.
They have found by experience that there is a definite relationship
between crop yield and weather conditions and that crop forecasts based
on weather conditions combined with a study of soil fertility, manuring,
etc. are more accurate than subjective forecasts. We could also make
experiments in these directions.
One more suggestion can be given to impart objectivity to these
estimates which are at present subjective in nature. In some countries
like Japan, suitable systems of physical measurements during the
various stages of crop growth have been evolved and they are supposed
to give a dependable idea about the degree of accuracy of crop
forecasting. The importance of pre-harvest estimates is considerable in
the context of price policies and food administration and such a step,
if taken in our country, will be fully justified.
Land utilisation statistics


Indian Agricultural Statistics, published by the Directorate of
Economics and Statistics under the Ministry of Food and Agriculture,
gives detailed information about the manner in which land is utilised
in the country. Volume I of this publication gives the State-wise
statistics and Volume II the district-wise. These statistics are based
on the reports of acreage submitted by the patwari for purposes of
revenue collection. In the permanently settled areas, however, where
till recently there was no such agency as the village accountant, these
statistics were not available. Attempts are being made now to collect
land utilisation statistics in the permanently settled areas also.
Besides this, such statistics are not available for regions which are
inaccessible and which have not been mapped and surveyed properly. At
present such statistics are available for about 85% of the total
geographical area of the country.
Formerly land utilisation statistics were available under the
following five headings:
(1) Area under forests;
(2) Area not available for cultivation;
(3) Other uncultivated land excluding current fallows;
(4) Area under current fallows, and
(5) Net area sown.
The above classification was found to be unsatisfactory on account
of the fact that it did not throw light on the smaller categories of
land use, and since such statistics were necessary for agricultural
planning the classification was changed in the year 1950-51. At present
there are nine classes under which information is available. They are:
(1) Forests;
(2) Land put to non-agricultural uses;
(3) Barren and uncultivated land;
(4) Permanent pastures and other grazing land;
(5) Miscellaneous tree crops and groves;
(6) Cultivable waste;
(7) Other fallows;
(8) Current fallows; and
(9) Net Area sown.
Besides the statistics relating to the classification of area, another
important heading under which land utilisation statistics are available
relates to the classification of area sown crop-wise. These statistics
are fairly detailed so far as the temporarily settled areas are
concerned. Area under each crop is known in all temporarily settled
regions as there is a permanent revenue staff to collect these
statistics. As has been said earlier, attempts are being made to obtain
these statistics accurately for the permanently settled regions also.
These statistics are usually published under two broad headings, namely,
food crops and non-food crops. The food crops group includes cereals,
potatoes, sugarcane, spices, fruits and vegetables, etc. The non-food
crops group includes oilseeds, fibres, dyes, drugs and narcotics, green
manures, fodder crops, etc. In our country more than 80% of the
cultivated area is devoted to food crops and less than 20% to non-food
crops.
Besides these, statistics are also available about the area which is
irrigated. These statistics are available both according to the source
of irrigation and according to the crops irrigated. They are also
published in the Agricultural Statistics of India. This publication
gives details of the irrigated area classified according to different
sources of irrigation in the following manner:
(1) Government canals;
(2) Private canals;
(3) Tanks;
(4) Wells; and
(5) Other sources, mainly temporary bunds for storage of rain
water and streams too small to be classed as canals.
The total of the areas of all the above five items gives the total
irrigated area. It means that the irrigated areas sown more than once
are counted only once and only net areas are considered. Figures of
irrigated area under various crops are also available in this
publication. Separate figures are given for rice, wheat, barley, jowar,
bajra, maize, sugarcane and cotton. Miscellaneous crops are shown under
three sub-headings. The total area under each crop is the gross area
irrigated or, in other words, areas sown more than once in the year are
counted separately for each crop. The reason why the total area under
irrigated crops is greater than the total irrigated area is easily
explained by what has been said above. In the first case, i.e., in
estimating the total area under irrigated crops, an area irrigated at
both the harvests of the year will be counted twice whereas in the
second case, i.e., in calculating the total irrigated area, it will be
counted only once. It should be remembered that the classification of
area under irrigation between government canals and private canals is
not uniform throughout the country. In Uttar Pradesh the figures of both
the private and government canals are included in government canals but
in Madhya Pradesh the figures of government canals are included in
private canals. The figures are generally only rough estimates except
for land irrigated from government sources. Other figures are not
completely reliable.
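The difference between the net and the gross irrigated area can be made plain with a small sketch. The three plots and their acreages are hypothetical, and the record layout is an assumption made only for this illustration.

```python
def net_irrigated_area(plots):
    """Each irrigated plot is counted once, however many crops it bears."""
    return sum(p["acres"] for p in plots if p["irrigated_crops"])


def gross_irrigated_area(plots):
    """Each plot is counted once for every irrigated crop raised on it."""
    return sum(p["acres"] * len(p["irrigated_crops"]) for p in plots)


# Hypothetical village records (illustrative values only).
plots = [
    {"acres": 10.0, "irrigated_crops": ["rice", "wheat"]},  # two harvests
    {"acres": 5.0, "irrigated_crops": ["cotton"]},          # one harvest
    {"acres": 8.0, "irrigated_crops": []},                  # not irrigated
]

net_area = net_irrigated_area(plots)
gross_area = gross_irrigated_area(plots)
print(net_area, gross_area)
```

The ten acres bearing both rice and wheat enter the crop-wise totals twice but the source-wise total only once, which is why the total area under irrigated crops exceeds the total irrigated area.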
Publications on agricultural statistics
Agricultural statistics which have been discussed above are
published by the Directorate of Economics and Statistics under the
Ministry of Food and Agriculture in the following major publications:
(1) Agricultural Situation in India (monthly);
(2) Indian Agricultural Statistics (annual);
(3) Estimates of Area and Production of Principal Crops in India
(annual);
(4) Indian Land Revenue Statistics (annual);
(5) Average Yield Per Acre of Principal Crops in India
(quinquennial);
(6) Bulletin of Food Statistics (annual);
(7) Commodity Statistics Series including separate Publications
of Food-grains, Tea, Coffee, Rubber, Cotton, Jute, Tobacco,
Sugar, Lac (annual);
(8) Abstract of Agricultural Statistics;
(9) Indian Crop Calendar;
(10) Indian Agricultural Atlas.
Besides these, important agricultural statistics are also published
in:
(1) Statistical Abstract of India (annual);
(2) Monthly Abstract of Statistics;
(3) Monthly Survey of Business Conditions;
(4) Indian Trade Journal;
(5) Publications of the Reserve Bank;
(6) Commerce;
(7) Eastern Economist;
(8) Capital, and other important commercial magazines.
General shortcomings of agricultural statistics in India
From the foregoing discussion it is easy to enumerate the general
defects in the agricultural statistics relating to our country.
The Technical Committee appointed by the Government of India
under the chairmanship of Mr. W. R. Natu has classified the general
shortcomings of the Indian agricultural statistics under the following
heads:
(1) Gaps in coverage;
(2) Lack of uniformity in definition and classification;
(3) Defects in tabulation and processing;
(4) Defects of primary reporting agency;
(5) Defects of inspection, supervision and checking; and
(6) Defects of planning and co-ordination.
We shall now discuss them in turn:
Gaps in coverage. Gaps in the coverage of Indian agricultural
statistics are of two types, namely, (a) gaps in geographical coverage,
and (b) non-availability of statistics in respect of certain items. As
has been said earlier, agricultural statistics are not available for the
whole geographical area of the country. Even at present the geographical
area covered by the agricultural statistics discussed above is roughly
three-fourths of the total geographical area of the country. No
estimates are at present available for one-fourth of the area of the
country. Most of the non-reporting areas are not cadastrally surveyed
and this is a great impediment in the way of proper organisation and
collection of agricultural statistics in these areas. The chief handicap
in those areas which have been surveyed is the non-existence of any
primary reporting agency. It is, therefore, necessary that in areas
which have been cadastrally surveyed and where there is no primary
reporting agency one should be established. In areas where no surveys
have been undertaken and which have not been mapped, aerial photography
or some other method should be adopted for collecting statistics of area
and yield.
Non-availability of statistics in respect of certain items such as
minor cereals and pulses, principal condiments and spices, fruits and
vegetables and fodder and cattle feed, etc., is another type of
shortcoming relating to gaps in the coverage of agricultural statistics
in this country. Besides these, very reliable information is not
available regarding certain other items like cultivators' holdings,
animal husbandry products and agricultural labour. The need for
collecting reliable statistics about cost of production of principal
crops, utilisation of agricultural produce, rural indebtedness, etc.,
has also long been felt. It is gratifying to note that in recent years
the Government has been trying its best to remedy these defects. The
reporting area in the country has increased and it is likely to increase
in future also until the whole country is covered. Reporting agencies
are also being gradually set up in such areas which did not have them
and regular estimates are also being published for several minor crops.
We can, therefore, say that the position is gradually improving and we
can hope that in a few years' time gaps in coverage of agricultural
statistics of India will be completely filled up.
Lack of uniformity in definition, etc. It has been pointed out that
various terms and phrases are used in different senses in different
parts of the country and even at present there is lack of uniformity in
definitions of various terms in the classification of agricultural
statistics. Not only this, even the technique of analysis of the
collected figures is not uniform throughout the country. The methods of
obtaining area statistics differ between temporarily settled areas and
permanently settled areas. The acreage statistics of the country are not
of a uniform quality. The definitions of items entering the
classification of area adopted for the statistics published in
Agricultural Statistics of India are not uniform in all the States; for
example, the definitions of the terms fallow land, cultivable waste,
etc., differ in different States. The method of yield estimation also
varies from State to State. While some States followed the method of
normal yield, others like Punjab adopted the method of direct estimation
and now, as we have discussed earlier, final estimates of crops are
based on the random sampling method. Under the traditional method of
estimation of crop yield, the condition factor expressed in terms of
annawari estimates did not indicate the same anna equivalent in all the
States. Normal crop was indicated by 12 annas in some States and by 13.3
annas and 16 annas in other States. Besides this, even at present there
is divergence in the methods of recording area under mixed crops and
also in the methods of recording area relating to uncultivated patches
in a cultivated field and the area under field ridges and bunds. It is
gratifying to note that the Government is taking certain steps to remedy
these defects also.
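The effect of these differing anna conventions is easily seen if the same reported figure is turned into a percentage of the normal. The function below is only an illustrative sketch; its name is our own and the reported figure is hypothetical, while the three conventions are those mentioned above.

```python
def condition_as_percentage(reported_annas, normal_annas):
    """Express an annawari condition figure as a percentage of normal."""
    if normal_annas <= 0:
        raise ValueError("the anna equivalent of a normal crop must be positive")
    return 100.0 * reported_annas / normal_annas


# The same hypothetical report of 12 annas under the three conventions
# mentioned in the text (normal crop = 12, 13.3 or 16 annas).
reported = 12.0
percentages = {
    normal: round(condition_as_percentage(reported, normal), 1)
    for normal in (12.0, 13.3, 16.0)
}
print(percentages)
```

A report of 12 annas thus stands for a full normal crop in one State and only three-fourths of a normal crop in another, so the figures cannot be compared or averaged across States until they are reduced to a common percentage basis.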
Defects of tabulation. It is a well recognised fact that in our country
even at present a large quantity of the statistical data collected is
not properly utilised on account of lack of proper processing. Even at
present a lot of useful information is available at the primary source
but is not properly tabulated and as such cannot be properly utilised.
For example, in East Punjab, Delhi and Madhya Pradesh information
regarding transfers of agricultural property is collected for each
village by the revenue agency but this information is not properly
consolidated and tabulated and no figures are available for the State as
a whole. In many other States, if the data collected by the primary
agency were properly processed, very useful studies could be made, but
unfortunately this is not done and a large amount of data is wasted. It
is extremely necessary that the available statistics relating to
agriculture should be tabulated and accurately analysed so that correct
decisions can be taken about the policies to be followed for the
agricultural development of the country.
Defects of Primary Reporting Agency. The primary reporting agency,
so far as temporarily settled areas in this country are concerned, is the
village accountant or the patwari, who is known by different names in
various parts of the country. In permanently settled areas, as we have
discussed earlier, till recently there was no permanent reporting agency
but it is gradually being set up. The accuracy of agricultural statistics
depends to a very large extent on the manner in which the primary
reporting agency discharges its work. If the data collected by it are
biased or inaccurate for any reason the whole analysis of the agricultural
statistics relating to the country would be defective. It is extremely
necessary that the data collected at the source should be as accurate as
possible. We have already pointed out that the work done by the
primary reporting agency is unfortunately of an unsatisfactory character
in our country. The patwari is a low paid Government servant who
is burdened with a large variety of work and it is unreasonable to
expect that he would devote enough time to the collection of statistics
in a proper manner. In many cases the patwaris deliberately make
inaccurate entries in their registers and thus the agricultural statistics
become inaccurate at the very source at which they are collected.
Recently governments have taken steps to reduce the charge of the
patwaris and attempts are also being made to keep a proper supervision
on their work. In Uttar Pradesh, on account of the mass resignation of
the patwaris a couple of years ago, the government has replaced this
agency by what are called lekhpals. If proper training is given to these
people and if they are not overburdened with work it can be expected
that the statistics collected by them would be of a satisfactory character
and the defects associated with the primary reporting agency would be
considerably minimised if not completely eliminated.
Defects of inspection and supervision. It has been mentioned earlier
that the work of the patwari is checked by the kanungo, naib tahsildar
and tahsildars. These people, who are also burdened with heavy
administrative duties, do not always devote personal attention to such
inspection and the result is that the defects in the statistics collected by
the lekhpal remain unnoticed and the statistics are passed on to higher
stages without any correction or editing. At present the kanungos
and the naib-tahsildars, etc., have to check a certain percentage of the
entries made by the lekhpal in his register and they are expected to
check such cases in which there have been considerable changes from
the past. It is a wrong method of checking the work. The entries
which are checked by these people should be selected on the basis of
random sampling and then it would be possible to have some idea
about the extent of inaccuracy in the figures supplied by the patwari.
The tour programmes of these officials are generally communicated to
the lekhpal beforehand and there is no possibility of a surprise check
on his work. Usually the Tahsildar and even the District officers send
their programmes of inspection and tour to the lekhpal in the hope that
he would make proper arrangements for their stay and would look
after their comforts when they are on tour. This state of affairs entirely
defeats the purpose for which inspections are meant. It is gratifying
to note, however, that some State Governments are now insisting on
better supervision of the work of the primary reporting agencies and
are also gradually following the system of random sample checks on the
work done by the lekhpal. Schemes of central supervision and
checking have also been introduced by the Government of India recently.
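The idea of a random sample check can be sketched in a few lines. The example below is an illustration under stated assumptions and not a description of any official procedure: the register is a hypothetical list of (recorded area, area found on verification) pairs, the entries to be verified are drawn by simple random sampling without replacement, and the share of inaccurate entries is estimated together with its standard error (with the finite population correction).

```python
import math
import random


def sample_check(entries, sample_size, is_wrong, seed=None):
    """Estimate the share of inaccurate register entries from a random sample.

    entries     : list of register entries
    sample_size : number of entries the inspecting officer verifies
    is_wrong    : function returning True when a verified entry is inaccurate
    Returns (estimated proportion, estimated standard error).
    """
    n_total = len(entries)
    if not 1 < sample_size <= n_total:
        raise ValueError("sample_size must be between 2 and the number of entries")
    rng = random.Random(seed)
    sample = rng.sample(entries, sample_size)  # simple random sample, no replacement
    wrong = sum(1 for entry in sample if is_wrong(entry))
    p = wrong / sample_size
    fpc = 1.0 - sample_size / n_total          # finite population correction
    se = math.sqrt(fpc * p * (1.0 - p) / (sample_size - 1))
    return p, se


# Hypothetical register of 1000 entries in which every tenth entry is inaccurate.
register = [(10, 10 if i % 10 else 12) for i in range(1000)]

rate, std_err = sample_check(register, 100, lambda e: e[0] != e[1], seed=1)
print(f"estimated share of inaccurate entries: {rate:.2f} (s.e. {std_err:.3f})")
```

Because every entry has the same chance of being verified, the result says something about the register as a whole, which a check confined to entries showing large changes from the past cannot do.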
Lack of co-ordination. Yet another defect in the present system
of agricultural statistics is the lack of proper co-ordination of the
various types of data that are collected. In various States, Departments
of Agriculture, Food and Civil Supplies, etc., obtain agricultural
statistics independent of each other and these statistics are not
co-ordinated. There is duplication of work also. Various departments
conduct ad hoc surveys regarding the same problem, independent of
each other, and this not only means duplication of work, wastage of
money and time, but the statistics thus collected by various departments
have been many times found to be widely differing from each other.
It is extremely necessary that statistics of agriculture collected in various
States should be well co-ordinated. As has been pointed out earlier,
attempts are being made by various State Governments to achieve this
objective and State Statistical Organisations are now gradually
co-ordinating the statistics collected by various departments. The
Central Statistical Organisation also co-ordinates the work of various
State Statistical Organisations.
Delay in publication. Besides the above-mentioned defects, another
defect of agricultural statistics in our country is the undue delay in
their publication. This is a common defect with a large majority of
official statistics of our country. Various committees and commissions
appointed from time to time have repeatedly pointed out this
drawback and have emphasised the necessity of publishing statistics in time.
In the case of agricultural statistics the data are first collected by the
lekhpal and then again consolidated at the district headquarters and it is
after this that the data are consolidated at the State Headquarters. After
this the States send their figures to the Directorate of Economics and
Statistics under the Ministry of Food and Agriculture at the centre for
consolidation and publication on an all-India basis. If there is delay at
any of these stages in any part of the country the all-India statistics are
held up. India is a big country and in various areas the times of sowing
of various crops are different and even if the various agencies
associated with the collection of these statistics are efficient and
punctual, all-India statistics cannot be published soon. It is necessary that
strict punctuality must be observed at every stage and attempts should
be made to publish the statistics within the time limit set because if
statistics are published late much of their utility is lost and then they
have only an historical importance.
In conclusion it may be said that this unsatisfactory state of affairs
is partly due to the fact that the collection of agricultural statistics was
in the past treated only as incidental to the collection of land revenue.
The preparation of crop forecasts was taken up later on, at the
insistence of persons interested in trade. Actually it was at the insistence
of the cotton merchants of Manchester and Lancashire that crop
forecasts began to be issued in this country. It was never realised until
recently that every development plan has to be based on accurate
statistical information. The collection and compilation of statistics has,
therefore, remained more or less either a by-product of official
activity or a luxury which could be enjoyed in relatively easy times and
skipped over in times of difficulty. This is the reason why statistics
collected in our country are usually available in patches and their
compilation and processing have been haphazard. Even at present, as has
been said already, a good deal of data runs to waste for want of proper
processing. A lot of additional information can be derived without
any extra cost if there is proper planning and interpretation of statistics.
With all sorts of economic plans and schemes afoot, it can be
reasonably expected that these defects shall receive proper attention at the
hands of the government and the agricultural statistics of the country
would soon become more accurate and dependable.

Indices of agricultural production


A number of indices of agricultural production are available in
our country. We discuss some of them below:
(i) Ministry of Food and Agriculture Index of agricultural production.
This is the most important and most commonly used index of
agricultural production in our country. These indices were available
till 1949 when their compilation was suspended pending a
revision of the whole scheme. The base period of these indices was
the quinquennium 1934-35 to 1938-39. These series were based on
19 items which were divided in two major groups namely food grains
and non-food grains. It used a weighted geometric mean in its calculation
and the weights were proportional to the total value of each
commodity during the base period. To find out the average value of
products the average harvest price during the base period was taken
into account and where no price was available it was estimated from
wholesale prices. The general index was obtained by combining the
index numbers relating to the food grains group and non-food grains
group. Weights assigned to these groups were respectively 2 and
1. Later, the base period of the indices was changed to the triennium
1937-38 to 1939-40.
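The combination of the two group indices can be sketched as follows. The text states that the index used a weighted geometric mean and that the groups carried weights of 2 and 1; that the group indices themselves were combined geometrically is an assumption made here for illustration, and the two group index values are hypothetical.

```python
import math


def weighted_geometric_mean(values, weights):
    """Weighted geometric mean: exp( sum(w * ln v) / sum(w) )."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical group indices (base period = 100).
food_grains_index = 110.0
non_food_grains_index = 95.0

# Group weights of 2 and 1, as stated in the text.
general_index = weighted_geometric_mean(
    [food_grains_index, non_food_grains_index], [2, 1]
)
print(f"general index: {general_index:.1f}")
```

With weights 2 and 1 the result is the cube root of (110 x 110 x 95), so the food grains group pulls the general index twice as strongly as the non-food grains group.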
The revised index was available in the year 1950-51 with the base
year 1949-50. This base year was also provisional and after 1953-54
it was finalised and the final base now is the year ending June 1950.
These indices are published in the Annual Report on Currency and
Finance issued by the Reserve Bank of India and Agricultural Situation
in India. In the revised index there are a number of groups and
sub-groups. The first major group is the food-grain group which is
divided in two sub-groups namely (i) cereals and (ii) pulses. Cereals
are further sub-divided into five smaller groups of rice, wheat, jowar,
bajra and maize. The second major group is the non-food grains
group and this group is sub-divided into four sub-groups namely (i)
oilseeds, (ii) fibres, (iii) plantation crops and (iv) miscellaneous. Each
of these four groups is further sub-divided into a number of smaller
groups. Indices are available separately for each item, each
sub-group, each major group and for all commodities.
There are in all 26 commodities (for which regular crop forecasts
are issued) included in the series. The values of different items
of production during 1949-50 have been taken as weights and the
index number is computed as a weighted arithmetic average of production
relatives for individual crops. In working out the production
relatives the chain base method is used to allow for changes in coverage
as well as technique of estimation.
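A minimal sketch of the two steps, chaining and weighted averaging, is given below. It assumes that each year's link relative (previous year = 100) is computed from estimates that are comparable between the two adjacent years, which is the usual reason for chaining. The crops, link relatives and weights are hypothetical and are not taken from the official series.

```python
def chain_relatives(link_relatives):
    """Chain year-on-year link relatives (previous year = 100) to base year = 100."""
    chained = [100.0]
    for link in link_relatives:
        chained.append(chained[-1] * link / 100.0)
    return chained


def weighted_arithmetic_index(relatives, weights):
    """Weighted arithmetic average of production relatives."""
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)


# Hypothetical link relatives for two crops over two years after the base year.
links = {"rice": [105.0, 98.0], "wheat": [110.0, 100.0]}
# Hypothetical weights standing in for the base-year values of production.
weights = {"rice": 3.0, "wheat": 1.0}

chained = {crop: chain_relatives(values) for crop, values in links.items()}
crops = list(links)

index_series = [
    weighted_arithmetic_index(
        [chained[c][year] for c in crops], [weights[c] for c in crops]
    )
    for year in range(3)
]
print([round(x, 3) for x in index_series])
```

The first element of the series is the base year (100), and each later element is the weighted average of the chained relatives for that year.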
(ii) Reserve Bank of India index of Agricultural Production. This
index number was published in the Reserve Bank of India Bulletin.
The indices were available from 1939-40 onwards. Up to 1946-47,
before the partition of the country, the base period of the index
number was the average of the triennium 1936-37 to 1938-39. After
the partition of the country the index number of agricultural production
issued by the Reserve Bank of India related to the Indian Union
only and up to the year 1948-49 the base year of the index number
was the same as previously. From 1949-50 onwards the base year
of the index number had been the year 1948-49. The index number
was based on 17 commodities distributed over five major groups.
The index number was weighted and the weights were the values
of crop production. The various items and their weights were
as follows:
                                   Weight
 1.  Rice                           38
 2.  Jowar and bajra                12
 3.  Maize                           2
 4.  Ragi                            2
 5.  Wheat                          14
 6.  Barley                          4
 7.  Gram                            7
 8.  Sesamum                         1
 9.  Groundnut                       7
10.  Rape and mustard                2
11.  Linseed                         1
12.  Castor seed                     0.3
13.  Cotton                          3
14.  Jute                            2
15.  Tea                             4
16.  Coffee                          0.4
17.  Rubber                          0.1
The above mentioned commodities were divided in five groups
namely, foodgrains, beverages, oilseeds, fibre and others. It would be
observed that in this index number foodgrains received a total weightage
of 79 out of 100 and amongst the foodgrains rice received a very high
weightage of 38. In the Ministry of Food and Agriculture Index food
grains received a weightage of 66.9 only and rice only 35.3. Similarly
wheat receives a weightage of 14 in this index whereas in the other
index its weightage is only 8.5. There are differences in the weightage
of other items also.
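The group totals can be checked by adding up the individual weights. In the sketch below the allocation of commodities to the five groups is an assumption (the text names the groups but does not list their members), and the weight of 2 for maize is inferred from the stated foodgrains total of 79 rather than read directly from the table.

```python
# Weights of the Reserve Bank of India index, grouped for illustration.
# Assumptions: the group membership shown here, and maize = 2 (inferred).
RBI_WEIGHTS = {
    "foodgrains": {
        "rice": 38, "jowar and bajra": 12, "maize": 2, "ragi": 2,
        "wheat": 14, "barley": 4, "gram": 7,
    },
    "oilseeds": {
        "sesamum": 1, "groundnut": 7, "rape and mustard": 2,
        "linseed": 1, "castor seed": 0.3,
    },
    "fibres": {"cotton": 3, "jute": 2},
    "beverages": {"tea": 4, "coffee": 0.4},
    "others": {"rubber": 0.1},
}

# Total weight of each group, rounded to one decimal place.
group_totals = {
    group: round(sum(items.values()), 1) for group, items in RBI_WEIGHTS.items()
}
grand_total = round(sum(group_totals.values()), 1)

for group, total in group_totals.items():
    print(f"{group:<12}{total}")
print(f"{'all items':<12}{grand_total}")
```

On these figures the foodgrains group adds up to 79, as the text states, and the seventeen weights together come to 99.8, that is, 100 after allowing for rounding.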
(iii) Eastern Economist index of Agricultural Production. This index
is based on the average prices of 1936-37 to 1938-39 and includes
14 items spread over four major groups. This series is available from
1939-40 and was first published in the special budget number of
Eastern Economist of 1952-53.
The major groups and the items within them are as follows:
A. Foodgrains: (i) Rice, (ii) Wheat, (iii) Millets, (iv) Gram.
B. Fibres: (i) Cotton, (ii) Jute.
C. Oilseeds: (i) Sesamum, (ii) Groundnuts, (iii) Rape
and Mustard, (iv) Linseed.
D. Miscellaneous: (i) Sugarcane, (ii) Tobacco, (iii) Tea,
(iv) Coffee.
This index number is also a weighted index and the weights
are also proportional to the values of the commodities during the base
period.
(iv) F. A. O. index. The Food and Agriculture Organisation of
the United Nations Organisation also publishes a series of index
numbers of agricultural production relating to many countries of the
world including India. The base of these indices is the average
of the years 1934-38. A large number of commodities have been
included in these indices and they are divided in eleven groups.
This index number is also weighted. The commodity weights are
world prices. Each commodity price is calculated first in terms of
gold francs per metric ton. In 1934-38 the price of a metric ton of wheat
was 100 gold francs and on this basis the prices of all other
commodities are converted to wheat relative prices on the basis of which
weights are assigned. This system of weighting is a very complicated
one and is under scrutiny at present. The Food and Agriculture
Organisation of the United Nations Organisation is interested
in international comparisons of the indices of agricultural production
and that is why world prices are taken into account for assigning weights
to different items.
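The conversion to wheat relative prices can be sketched as below. Only the wheat price of 100 gold francs per metric ton comes from the text; the other prices and all quantities are hypothetical, and the way the weighted quantities are aggregated into an index (a price-weighted quantity aggregate compared with the base period) is an assumption made for illustration, not the F. A. O.'s documented procedure.

```python
WHEAT_PRICE = 100.0  # gold francs per metric ton in 1934-38 (from the text)


def wheat_relative_weights(prices_gold_francs, wheat_price=WHEAT_PRICE):
    """Express each commodity price relative to the price of wheat."""
    return {c: p / wheat_price for c, p in prices_gold_francs.items()}


def production_index(base_quantities, current_quantities, weights):
    """Index of weighted output: 100 * sum(w * q_current) / sum(w * q_base)."""
    base = sum(weights[c] * base_quantities[c] for c in weights)
    current = sum(weights[c] * current_quantities[c] for c in weights)
    return 100.0 * current / base


# Hypothetical world prices (gold francs per metric ton) and outputs (metric tons).
prices = {"wheat": 100.0, "rice": 150.0, "maize": 80.0}
base_output = {"wheat": 10.0, "rice": 20.0, "maize": 5.0}
current_output = {"wheat": 11.0, "rice": 22.0, "maize": 5.0}

weights = wheat_relative_weights(prices)
index = production_index(base_output, current_output, weights)
print(weights)
print(f"index of production: {index:.1f}")
```

Since every country's output is valued at the same set of world prices, indices built in this way can be compared between countries, which is the purpose the text attributes to the F. A. O. scheme.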
Live Stock and Poultry Statistics
In a country which is predominantly agricultural and in which
more than 80 percent of the population lives in villages and is
connected with rural economy, the necessity of a knowledge of cattle wealth
cannot be overemphasised. In India the statistics of livestock were
first collected at the instance of the Secretary of State for India and it
was in the year 1883 that the Statistical Conference prescribed a form
on which the details of cattle census were to be filled. Since then
figures of livestock began to be published quinquennially in
Agricultural Statistics of India. In temporarily settled areas it was the
village lekhpal who was entrusted with the task of reporting the
number of cattle in his area. The figures submitted by the patwari
were generally not reliable. In permanently settled areas the
condition was still worse. It was in the year 1916 that the Government
of India decided to improve the situation and to have a cattle census
for the whole of the country. A cattle census was held in the year
1920 and since then it is being held every fifth year. The last two
censuses which were due in the years 1950 and 1955 were respectively
held in the years 1951 and 1956. The current census was
conducted in the year 1961. Information relating to the 1951 and 1956
censuses is available in the publication entitled Indian Live Stock
Census. Besides containing data about population of live stock and
poultry this publication also contains information about the
agricultural implements and machinery of different types including tractors
in various States.
Shortcomings. These statistics are not considered very satisfactory
though it should be admitted that there has been a gradual
improvement both as regards coverage as well as the quality of data. The
1930 Live Stock Census had a coverage of SO percent of the area, the
1945 census 92 per cent of the area and the 1951 census 94.4 percent
of the area. In 1956 the coverage was still better.
Live stock censuses prior to 1951 related to undivided India and
in 1950 the Ministry of Food and Agriculture brought out a brochure
"Live Stock Statistics" and re-estimated the data for 1940 and 1945
on the basis of the present boundary of the Indian Union. In the 1951
census the data relate to all States except Orissa and some parts of
Rajasthan and Manipur. The statistics given in these publications
are not fully comparable from census to census as information is
sometimes not available for certain areas in various States. Moreover
formerly the number of princely Indian States taking part in the
enumeration was not the same at the time of every census and whatever
figures are available in previous reports relating to former Indian States
cannot be relied upon because there was no proper agency for the
collection of data there. Moreover this census was not held at the
same time in all States. However, the last two censuses were held
almost at the same time in all States.
Another drawback in the present live stock statistics is that the
classification of cattle is not uniform for the whole country. Moreover
the definitions of the words bull, bullocks, breeding bulls and other
bulls etc. are not uniformly followed in all States and as such figures
are not strictly comparable.
Recent improvements. In 1956 certain improvements were made
and, for the first time, the census was conducted on a household basis
on a uniform pro forma. A sample verification was also done by the
Directorate of National Sample Surveys in June-July 1956. The Indian
Council of Agricultural Research had also done sample verification
in certain areas after the census of 1950. However the quality of
these statistics is still very poor. Now it has been decided to conduct
each alternate livestock census in the same year in which human census
is done. This will improve the quality of these statistics to some extent.
Livestock Products
Formerly statistics relating to live stock products were available
in Indian Agricultural Statistics. These statistics were extremely defective
as not only their coverage was incomplete but also because their quality
was very poor. The methods adopted to estimate the output of live
stock products were extremely defective and generally speaking the
data were only of nominal importance. In 1951, however, these
statistics were published in Indian Livestock Statistics which contained
figures for 1947-48, 1948-49 and 1949-50.
Livestock products can be divided in three main categories:
(i) Edible livestock products (primary). Milk, eggs, meat
and poultry.
(ii) Edible livestock products (secondary). Ghee, butter,
curd, cream etc.
(iii) Non-edible livestock products. Hides and skins, wool
and hair, blood, ivory, bones and horns, dung and
fertilisers.
Livestock and livestock products statistics are published not
only in the 'Indian Livestock Census' (quinquennial) and 'Indian
Livestock Statistics' referred to above, but also in 'Agricultural Statistics
of India', 'Abstract of Agricultural Statistics in India' and 'Statistical
Abstract of India'. Market reports issued by the Directorate of
Marketing and Inspection also contain some statistics on the subject.
Fisheries Statistics
Statistics relating to fisheries are also very inadequate and un-
dependable in our country as there is no organised machinery for
their collection. The available data about fishery fall in the following
categories :
(i) Data available in market reports.
(ii) Data available with Fisheries Research Institutes and Stations.
(a) Central Inland Fisheries Research Station at Calcutta.
(b) Central Marine Fisheries Research Station at Mandapam
Camp.
(c) Deep Sea Fisheries Station at Bombay.
(d) Offshore Stations at Tuticorin, Cochin and
Vishakhapatnam.
(e) Central Fisheries Technological Research Station at
Cochin.
(f) Fisheries Extension Units at 10 different places.
(iii) Data available with Fisheries Development Adviser and in State
Gazettes. Figures of total catch and price of some varieties of fish are
available in some statistical publications of certain States.
(iv) Data about consumption of fish collected by N. S. S. The N. S. S.
gives some figures about the consumption of fish in certain types of
households and these statistics are collected in the usual rounds which
are annually held. They are published in the reports issued by
the Directorate of N. S. S.
The above discussion clearly shows that the data available about
fisheries are not only inadequate and non-comparable but highly
unorganised and un-coordinated also.
It is, however, gratifying to note that our five year plans have
given attention to the development of fisheries. There are two types
of schemes namely (i) relating to marine fishing and (ii) relating to
inland fisheries. The F. A. O. is also assisting the schemes and it is
hoped that in future we shall have much better statistics emerging from
these developmental schemes.
A word about the future development of fishery statistics is
necessary. It would be worthwhile to collect these statistics on a uniform
basis throughout the country. This can be done if an All-India census
of persons, boats and nets engaged in fishing is conducted every year.
This should be combined with intensive studies of their operational
economies. This would give an idea about the total catch and other
problems. Data relating to processing and marketing of fish and fish
products, if collected regularly, can also give an idea about the output
of fisheries in our country.
Forest Statistics
Before 1947-48 forest statistics were published in the "Annual
Returns of Statistics relating to Forest Administration in India" issued
by the central Forest Department, Government of India. Now these
statistics are published in the "Indian Forest Statistics" issued by the
Directorate of Economics and Statistics in the Ministry of Food and
Agriculture, Government of India. This publication contains fairly
comprehensive statistics about the forest economy of India (excluding
Jammu and Kashmir). These figures are based on the returns
submitted by forest departments of different States and they relate to
the year ending 31st March. Broadly speaking the following types of
data are available in this publication:
(i) Area under forests.
(ii) Volume of timber and firewood.
Besides giving the above statistics the publication "Indian Forests
Statistics" also contains useful information relating to employment
given by forests, revenue and expenditure of forests departments and
foreign trade in forests products.
"A Review of Forests Administration in India" is also published
once in every five years. This mainly deals with the changes in forests
area in different ownerships, fluctuations in forests produce and their
causes, progress of survey work in different States etc. It is in the
nature of a commentary on the changes in forests statistics.
"Indian Agricultural Statistics" also contains certain statistics
relating to forestry. These figures however differ from those published
in "Indian Forests Statistics." The reason for this is that the term 'area
under forests' is not used in the same sense in the two publications.
Moreover the data published in the "Indian Forests Statistics" relate to
the year ending 31st March and those published in the "Indian
Agricultural Statistics" relate to the year ending 30th June. Forest statistics
are also published in "Statistical Abstract of India."
Speaking generally our forest statistics can be said to be
satisfactory though there is usually considerable delay in the publication
of these figures and it should be avoided. At the close of the year 1961
the latest figures available are those relating to 1957-58 and there is thus
a lag of more than three years.
Mines and Mineral Statistics
Statistics about minerals are collected in our country by the
(i) Chief Inspector of Mines.
(ii) Geological Survey of India (G. S. I.)
(iii) Indian Bureau of Mines.
(iv) The Coal Commissioner.
(v) The Salt Controller.
(vi) The Petroleum Division.
The Chief Inspector of Mines collects statistics about all such
mines which come under the Indian Mines Act. Statistics collected
relate to number of mines, employment, wages, and hours of work. In
case of coal besides these statistics, data are also collected about stock
position. These statistics are published in the Annual Report of the Chief
Inspector of Mines.
The statistics collected by the Geological Survey of India relate
primarily to mines not covered by the Indian Mines Act. Data
collected by it relate to the quantity and value of mineral produce and daily
average number of persons employed. G. S. I. also publishes
State-wise consolidated statements relating to the quantity and value of all
minerals produced in India in the "Records of Geological Survey of India."
Abridged figures are also published in "Indian Minerals."
The Indian Bureau of Mines gets figures from both the above sources
and also collects statistics relating to cost of fuel, electricity used, cost
of machinery, depreciation and cost of operation per ton etc. Since
1955 the Bureau of Mines has been publishing mineral statistics in
"Mineral Production of India" which is an annual publication.
The Coal Commissioner published detailed statistics of coal
distribution. Other coal statistics were compiled by the Department
of Commercial Intelligence and Statistics but now they are compiled
by the Indian Bureau of Mines also. The Salt Commissioner is
primarily concerned with statistics of production and stock of salt and the
Petroleum Division prepares statistics of output, consumption and
distribution of petrol and petroleum products. These figures are, however,
treated as confidential.
The coverage of our mineral statistics is on the whole quite
satisfactory. There are, however, minor discrepancies in the figures
published from different sources and they should be removed.
Cost of Agricultural Production
Statistics relating to cost of production of agricultural
commodities in our country are extremely scanty and of highly doubtful quality.
Till recently the only comprehensive material on the subject in our
country was contained in the Report on Cost of Production of Crops in
Principal Sugarcane and Cotton Tracts of India. This study was done by the
Indian Council of Agricultural Research.
The Directorate of Economics and Statistics under the Ministry
of Food and Agriculture conducted some farm management studies
in collaboration with some research institutes at six typical regions of
the country in the States of Bombay, Madhya Pradesh, U. P., Madras,
Punjab and West Bengal, in 1954-55. The scheme is being continued
but so far only one report (relating to 1954-55) has been published. A
second report is shortly expected.
It is high time that the N. S. S. takes up this work of collecting
cost of cultivation statistics on a more extensive scale to fill up this
important gap in our agricultural statistics. Intensive farm
management and cultivation cost studies should also be undertaken by research
workers and institutes independently.
Agricultural Savings and Investments
Data relating to agricultural savings and investments are very
meagre in our country. Some figures are available in the Rural Credit
Survey conducted by the Reserve Bank of India but the coverage of
these figures is limited and the data are not exhaustive. There is a
common feeling that our agricultural sector has accumulated substantial
savings in recent years but nothing is known about it nor about the
investment of these savings.
It is gratifying to note that the National Council of Applied
Economic Research is undertaking a pilot survey of rural savings
and investments.
Utilisation of Agricultural Produce
Agricultural produce can be utilised for various purposes as for
seeds, for farm consumption as fodder, for farmer's own consumption,
for payments in kind for services or goods obtained and for disposal
in the market. Unfortunately such statistics are very meagre in our
country.
Some useful information relating to some of these items,
particularly about trading and marketable surplus, is available in the Market
Reports relating to different commodities. The Commodity Series which
are brought out by the Directorate of Economics and Statistics under
the Ministry of Food and Agriculture, Government of India about
cotton, jute, sugar, oilseeds, tobacco, lac, tea, coffee and rubber also
contain information on some of these items. "Bulletin of Food Statistics"
and "Food Situation in India" also contain statistics of consumption and
stock.
These statistics are, however, not enough and are not collected
on a uniform pattern on a regular basis.
Agricultural Co-operation
Statistics relating to co-operation in our country are published by
the Reserve Bank of India. Statistics are available about the following
types of co-operative organisations:
(i) State and Central Banks, (ii) State non-credit societies,
(iii) Central and Primary Land Mortgage Banks,
(iv) Agricultural Credit Societies and (v) Non-agricultural
Credit Societies.
Detailed statistics relating to their number, membership, share
capital, reserve and other funds, working capital, loans advanced, loans
recovered and loans due are published in "Statistical Tables relating to
Co-operative Movements in India."
Statistics relating to non-credit societies, supervising unions, State
Unions and State Institutes and societies under liquidation are published
in the Review of Co-operative Movement in India, issued by the Reserve
Bank, and in Statistical Abstract of India and some other publications
issued by the Directorate of Economics and Statistics, Ministry of Food
and Agriculture, Government of India.
Statistics relating to other sources of agricultural finance are almost
non-existent in our country and as such the relative importance of
co-operation in the field of agricultural finance cannot be properly
analysed. Some figures are, however, available in the report of the
Committee of Direction set up by the Reserve Bank of India. Some
data have also been collected through N. S. S. and they throw some
light on the problems of agricultural finance and rural indebtedness.
Agricultural Holdings
Statistics relating to cultivators' holdings are not separately
collected in our country. Figures about the size of holding can be
obtained from the records of the village patwari or other revenue agencies
in areas where there is complete enumeration and field to field
measurement. The Technical Committee had recommended that the present
livestock census should be completely integrated and conducted as a
part of the quinquennial census of holdings which should be held in our
country.
The last population census of 1961 contained a detailed schedule
relating to agricultural households and an attempt was made for the
first time to collect statistics of cultivators' holdings. The N. S. S. is
conducting an All India Survey of Land Holdings in its current (sixteenth)
round. These statistics are being collected in connection with the
suggestion of F. A. O. for agricultural census. The survey will,
however, provide data on a regional basis and State basis only. Data will
not be available for smaller units like villages or even tahsils.
Land Revenue
Statistics of land revenue are published at present in "Indian Land
Revenue Statistics." Formerly these figures were published in Indian
Agricultural Statistics.
Agricultural Implements and Machinery
Statistics relating to agricultural implements and machinery are
collected once in five years along with the livestock census and are
published in the "Indian Livestock Statistics". These figures are very
undependable and their coverage is extremely poor. It is necessary
that such figures are collected on a more comprehensive basis along
with the quinquennial census of agricultural holdings which has been
suggested earlier.
Area under improved inputs
Various improvements have been made in the technique of production
in recent years but we do not have any idea about the area in
which improved agricultural practices are used and their results. Better
quality of seeds, better manures and fertilizers and better irrigation
facilities have been made available in certain areas but their results
and those of improved agricultural practices adopted in sowing, harvesting
and other agricultural operations are not known at all. Millions of
rupees are annually spent on these items and yet nothing has been done
to find out the result achieved by this expenditure on improvements.
Studies should be done on a comprehensive scale and on a regular basis
to find out the results of each input and improved practice. This would
help in the input-output analysis of the agricultural sector of our
economy. These techno-economic data on production estimates in agriculture
are very badly needed in our country.
Agricultural employment
Statistics relating to employment provided by the agricultural
sector and the total population dependent on agriculture are collected
at the time of population census. However, the classification of population
under various categories within the agricultural sector has not
been uniform in all censuses and as such figures are not strictly comparable.
SECTION 4

INDUSTRIAL SECTOR
Need. In modern times the economic structure of almost all important
countries of the world is increasingly dominated by large scale
industries and economic development is measured more by industrial
development than anything else. As such the importance of statistics relating
to the industrial sector of various countries is continuously increasing.
Since proper industrial development is not possible in the absence of
reliable and adequate data, statistics relating to the size of industrial
units, their capital structure, employment provided by them, impact
of rationalisation and productivity movements and various types of
input-output ratios have assumed great significance in modern times,
when there is almost a mad race between various countries for achieving
supremacy in the industrial field. Economically advanced nations
of the world like U. S. A., U. K., Germany and U. S. S. R. now col-
lect comprehensive statistics relating even to the minutest problem
associated with development and growth of industry.
The availability of statistical data relating to industries in our
country has always been very poor. As stated earlier, the attitude
of the British Government towards industrial development was never
sympathetic and the question of having any efficient system of collec-
tion of industrial statistics never arose. In modern times the importance
of industrial statistics has considerably increased and in our country also
keen interest has been shown by the Government in matters relating
to industrial development and the question of having adequate and
dependable industrial statistics has also received the attention it deserves.
The passing of the Industrial Statistics Act, 1942, the Census of Manufacturing
Industries Rules in 1945, the Collection of Statistics Act in
1953 and the Collection of Statistics (Central) Rules in 1959 are impor-
tant steps that have been taken in recent years to improve the indus-
trial statistics of our country.
Data collected in other countries
Before actually examining the industrial statistics available in our
country during the British period and after independence it would
be better to have an idea about the nature of industrial statistics col-
lected in other countries of the world. Broadly speaking the available
industrial statistics in other countries can be classified under the following
heads:
(i) Capital structure. Under this heading statistics relating both
to fixed as well as working capital are collected. Not only figures
relating to authorised, issued and paid-up capitals are available but
details of investments in land, buildings, plant and machinery, furniture
and other fixed assets are also noted down. Figures of expenditure
on extensions and replacements of these assets as also the amount of
depreciation and repairs during a year are regularly collected. Working
capital figures are available on the basis of their distribution over
items like raw materials, fuels, finished and semi-finished products,
cash in hand and bank etc. Foreign capital investments and figures
relating to the manner in which internal capital is raised (by issue of
shares, debentures, reinvestment of profits and loans) are also available.
(ii) Employment. Statistics are collected about the number of
persons employed and wages and salaries paid to them. Employees
are classified in a number of ways, usually by the nature of work (skilled,
unskilled, technical, non-technical, clerical, supervisory and
administrative etc.). Figures of wages and other emoluments paid are
collected in detail. Figures of total man-hours worked, average
employment per working day, and average wages and salaries are also
estimated. Figures of industrial disputes, absenteeism, labour turnover
etc. are usually available as a result of the working of various labour
legislations. Social security statistics generally emerge from the working
of social security measures.
(iii) Inputs. Very detailed statistics of different industrial inputs
are collected in other countries. Input and output analysis has assumed
great significance in recent years and such statistics are now collected
in great detail. Figures relating to both quantity and value of
each industrial input are obtained. Industrial inputs may be either items
of raw materials, chemicals, packing material and consumable stores
or of fuels, lubricants and electricity consumed etc. Apart from these
two major categories of inputs, (i) materials and (ii) fuel and power, there are
other inputs also like various expenses on inward transport, printing,
advertising, warehousing, purchase agency service, local rates and taxes
etc. about which detailed statistics for every unit are collected. These
figures are of very great help in estimating the net output or value added
by a group of units and also in setting up certain techno-economic
ratios which are essential for correct analysis and interpretation of facts
and figures.
(iv) Outputs. Just as statistics of inputs are collected in detail,
statistics of output are also obtained comprehensively. Figures relating
to quantity and value of the main product, by-products and subsidiary
products manufactured in a year are obtained. These figures give an
idea about the gross output from which the total inputs have to be
deducted to arrive at the figure of net output.
(v) Other data. Apart from figures of capital structure, industrial
employment, input and output, figures are also collected about the
potential expansion and maximum capacity of the units from various
points of view like output, employment etc. These figures are very
helpful for the purpose of economic planning. Figures of stock and
supply of various goods and the utilisation of selected articles (generally
scarce) are also obtained. These statistics are of immense help in input
and output studies and in locating the point where economy is possible
and rationalisation schemes can be put through.
The points discussed above indicate only the broad headings
under which information is collected in other countries of the world
and should be available in our country also. In actual practice each of
these major classes is sub-divided into a large number of smaller classes
depending on the nature of the data required. These statistics are
usually collected either by sending schedules through post to the
industrial units and asking them to fill them up and to return them
to the relevant statistical authority or by deputing factory inspectors
who go from factory to factory and collect the information needed.
We shall now discuss the data that are available in our country
and shall examine their drawbacks and suggest ways of improvement.
We shall not study labour statistics in this section but shall discuss
them later on in connection with statistics of wages. Industrial statistics
available in our country prior to independence were extremely meagre
and whatever improvements have been made are of a comparatively
recent origin. As such we shall examine the available statistics in two
parts namely, one relating to the pre-independence period and the
other relating to post-independence period.
Pre-Independence period
Statistical data available in India about large scale industries before
the year 1947 can be studied under three headings, namely:
1. General statistics.
(a) Number of factories.
(b) Labourers employed and wages paid.
(c) Capital invested.
2. Statistics of output and costs.
3. Statistics of power consumed.
General statistics. So far as general statistics were concerned data
were available about the number of factories, the number of persons
employed by them and the amount of capital invested in them. These
figures were available in :
(1) Large Industrial Establishment in India, which was issued by
the Department of Commercial Intelligence and Statistics at that time.
Now this publication is issued by the Labour Bureau, Ministry of Labour.
(2) Statistical Abstracts of (British) India,
(3) Statistics of Factories, and
(4) Report on the Working of Joint Stock Companies.
For the purpose of statistics factories were divided into ten major
groups, namely (i) Textiles, (ii) Engineering, (iii) Mineral and Metal,
(iv) Food, Drink and Tobacco, (v) Chemicals, dyes etc., (vi) Paper and
Printing, (vii) Processes relating to wood, stone and glass, (viii) Processes
connected with skins and hides, (ix) Gins and Presses, (x) Miscellaneous.
Each of these ten groups was further sub-divided into a number
of smaller groups and the number of factories in each of the major
and minor groups was given both districtwise as well as provincewise.
Separate sets of tables for seasonal and perennial factories were given.
A seasonal factory was taken to mean a factory which did not work
for more than 180 days in a year. These figures were compiled from
the returns of Provincial Factories Department. Figures relating to
Indian States were specially collected. The average number of persons
employed daily was calculated by dividing the total attendance of all
working days in the factory by the number of working days. These
figures were published in the Large Industrial Establishment as well
as in the Statistical Abstract. The figures in the Abstract related
to those factories only to which the Factories Act of 1934 applied.
These figures were not fully comparable with those published in the
Large Industrial Establishment because the Factories Act applied in
some cases to those factories also which employed less than 20 persons
and such factories were ignored in Large Industrial Establishment.
Figures were available provincewise and the provincial figures were
classified under three headings, namely adults, adolescents and children.
Separate figures were available for males and females for the first two
groups. These publications also contained some information about the
capital invested in various factories. Separate figures were available
for authorised and paid-up capital and debentures. However, no
separate figures were available for fixed and working capital and the
amount of money spent on land, buildings, plant and machinery and
other fixed assets was not known.
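The rule for average daily employment stated above can be illustrated with a short sketch. The attendance figures are hypothetical and the function names are assumptions introduced only for illustration; the 180-day limit for a seasonal factory is the one quoted in the text.

```python
def average_daily_employment(daily_attendance):
    """Total attendance of all working days divided by the number of working days."""
    working_days = len(daily_attendance)
    if working_days == 0:
        raise ValueError("no working days recorded")
    return sum(daily_attendance) / working_days


def is_seasonal(working_days_in_year, limit=180):
    """A seasonal factory did not work for more than `limit` days in a year."""
    return working_days_in_year <= limit


# Hypothetical attendance register for five working days.
attendance = [120, 118, 125, 130, 107]
print(average_daily_employment(attendance))
print(is_seasonal(150), is_seasonal(300))
```

A factory that worked exactly 180 days is treated here as seasonal, since the definition speaks of not working for more than 180 days.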
Output and Cost Statistics. So far as statistics of output and cost
were concerned, there were hardly any statistics worth the name available
in our country prior to the year 1946. Statistics of inputs were
particularly conspicuous by their absence and even statistics of output
were extremely faulty and inadequate. There was no legal binding
on any industrial unit to supply information about output and cost
and whatever information was available during this period was collected
on a voluntary basis. As such figures were neither complete nor
comparable. The situation with regard to the cotton mill industry was
somewhat better due to the passing of the Cotton Industry (Statistics)
Act in the year 1926. According to this Act a cotton mill was under
legal obligation to supply statistical information which was published
in Monthly Statistics of Cotton Spinning and Weaving in Indian Mills. Under
this Act figures were collected about particulars of all cotton goods
manufactured, description and weight of all yarn spun, amount of
cotton pressed and consumption of Indian cotton in Indian mills.
Figures of production of some industries were available in another
publication named Monthly Statistics of the Production of Certain Selected
Industries in India. This publication contained information about the
production of jute manufactures, paper, iron and steel, petrol and kerosene
oil, cement, paints and heavy chemicals and wheat flour mills
in India. In all these cases figures were supplied voluntarily by the
factories. These figures were not comparable month after month
because the number of factories supplying information was not the
same each time. Besides these statistics information was also available
about the production of sugar and match industries. The production
figures of sugar and match-boxes were based on the reports of the excise
authorities under the Sugar and Match (Excise duty) Act 1934. Certain
statistics were also available about the production of distilleries and
breweries.
Statistics of Power. So far as statistics of power consumed were
concerned Monthly Survey of Business Conditions in India (now merged
with Indian Trade Journal since 1951) used to give monthly statistics
of the electric power generated and consumed in India. Upto October
1942 information was given in a detailed form and the figures of consumption
were given under seven heads, namely, Domestic, Commercial,
Industrial, Tramways, Electric Railways, Street lighting and
Miscellaneous. Since November 1942 only total figures of the energy
generated and total units sold for consumption began to be given.
Upto the year 1943 these statistics were supplied by the Economic Adviser's
Office but since January 1944 statistics relating to the electric power
generated and consumed began to be supplied by the Electrical Commissioner
to the Government of India.
The above survey of industrial statistics available in India prior
to independence clearly indicates that the situation was highly unsatisfactory.
The data available were extremely inadequate and their
quality was very poor. Production figures were not comparable year
to year as the number of units supplying the data was not uniform.
Statistics relating to inputs were very inadequate and figures relating
to the quantity and value of raw materials used in production, added
value of manufacture, value of fuels and power consumed, number of
engines, horse power and kind of power used, value of land, building,
machinery etc. and figures relating to various items of cost were
almost non-existent.
Post-independence period
After the independence of the country the Indian Government
realised that unless the country was industrially developed economic
development was not possible and particularly when the government
decided to proceed with the 5-year plans it became absolutely essential
to collect detailed statistics about the various problems associated with
industries and it is not surprising, therefore, to find that the condition
of industrial statistics of our country has considerably improved in
recent years. We shall discuss below some of the important achievements
of the government in the field of industrial statistics in recent
years.
(1) Annual Census of Manufactures
The Government was conscious of the fact that industrial statistics
in India were inadequate and that their necessity was acute from more
than one point of view. The need for a legislation for the collection
of such statistics was a long felt one and it was in 1942 that the Industrial
Statistics Act was passed. The Act covered the then British India and
provincial governments were free to frame their own rules and to decide
the dates on which their rules were to come into force. Though
the Act was passed in 1942 it was only late in 1945 that the Directorate
of Industrial Statistics was set up at the Centre to enforce it. The first
stage for implementing the Act was the notification by the State Governments
of the rules for conducting an industrial census. Census
of Manufacturing Industries Rules were uniformly adopted by all the
States in the year 1946 and in the same year the first Census of Manufactures
was conducted. Annual censuses have been conducted since
then and the results published in the Census of Manufacturing Industries.
This Act is applicable to all factories which employ 20 or more persons
or employ 10 or more and use power. In accordance with this Act
the factories are under legal obligation to supply the requisite information
to the government and failure to do so is punishable with fine.
All information required under section 3 of these rules was to be
furnished in English and it was to be treated as confidential by the
Statistical Authority.
Under these rules industries were classified in 63 categories out
of which 29 were selected in schedule I about which data were to be
collected as per form which was a part of schedule II of the above mentioned
rules. Thus the details of the statistics to be collected were
part of the rules which could not be even modified without a tedious
legal procedure. About the remaining 34 industries statistics were
to be collected at a later date. The 29 industries included initially in
schedule I were as follows:
1. Wheat flour, 2. Rice milling, 3. Biscuit making, 4.
Fruits and Vegetables Processing, 5. Sugar, 6. Distilleries
and Breweries, 7. Starch, 8. Vegetable Oils,
9. Paints and Varnishes, 10. Soap, 11. Tanning,
12. Cement, 13. Glass and Glass-ware, 14. Ceramics,
15. Plywood and Tea Chests, 16. Paper and
Paper board, 17. Matches, 18. Cotton Textiles, 19.
Woollen Textiles, 20. Jute Textiles, 21. Chemicals,
22. Aluminium, Copper and Brass, 23. Iron and
Steel, 24. Bicycles, 25. Sewing Machines, 26. Producer
Gas plants, 27. Electric lamps, 28. Electric
Fans, 29. General Engineering and Electrical Engineering.
As per schedule II for each of the above 29 industries a form was
prescribed on which statistics were collected. The data collected about
these 29 industries were, generally speaking, of uniform pattern. However
all the forms were not identical because various items of inputs
and outputs in different industries were different. The data collected
under these rules were as follows:
General Information including items like name of the factory, its
location, present address, address of proprietor, managing agent, etc.
Capital Structure as on 31st December. Detailed information was
to be supplied about paid-up capital, productive capital and the manner
in which fixed capital had been invested. Detailed statistics about
working capital were also collected.
Employment and Wages. Persons employed, salaries and wages
paid, man-hours worked during the year ending 31st December.
Power consumed. Fuel, electricity, coal, gas, lubricating material
and water purchased at any time and consumed during the year ending
31st December.
Materials consumed. Materials purchased and consumed during
the year ending 31st December in the manufacture of products and
by-products made for sale and the work given out during the same
year.
Output. Quantity and value of products and by-products.
The following table gives the data collected for a few years:
CENSUS OF MANUFACTURING INDUSTRIES

                                                         1947     1954     1957

 1. Registered factories in existence                   5,632    7,067    7,727
 2. Factories from which returns were received          4,872    6,637    6,780
 3. Fixed capital employed (in crores)                  177.2    355.6      544
 4. Working capital employed                            226.3    432.2      600
 5. Total capital employed                              403.5    787.8     1144
 6. Number of workers employed (in lakhs)               14.87    15.34    16.76
 7. Number of persons other than workers
    employed (in lakhs)                                  1.45     1.81     2.19
 8. Total number of persons employed                    16.32    17.15    18.95
 9. Wages paid to workers (in crores)                   108.9    171.2      199
10. Salaries paid to persons other than workers
    (in crores)                                          22.5     41.1       55
11. Money value of other benefits                         4.3      6.3       16
12. Total salaries and wages paid (in crores)           135.7    218.6      270
13. Value at factory of materials etc. consumed
    including cost of transport (in crores)             485.4    883.4     1206
14. Value of work done for factories by other
    concerns (in crores)                                  2.8      6.3       10
15. Depreciation (in crores)                             12.6     25.0       40
16. Total of materials and fuels consumed and
    depreciation (in crores)                            500.8    914.7     1256
17. Factory value of products and by-products,
    commission and transport charges                    737.2   1286.5     1710
18. Value of work done for customers (in crores)          5.7      1.1       14
19. Total of product and by-product for sale
    (in crores)                                         742.9   1287.6     1724
20. Value added by manufacture (in crores of
    rupees) (19-16)                                     242.1    372.9      468

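The totals in the table are linked by simple identities: item 16 is the sum of items 13 to 15, item 19 is the sum of items 17 and 18, and item 20 is item 19 less item 16, as the note (19-16) indicates. A minimal check of these identities, using the figures of the last column as read from the table (the variable names are illustrative assumptions, not terms from the census forms):

```python
# Figures in crores of rupees, last column of the table.
materials_consumed = 1206       # item 13
work_by_other_concerns = 10     # item 14
depreciation = 40               # item 15
products_value = 1710           # item 17
other_work_value = 14           # item 18

# Item 16: total of materials and fuels consumed and depreciation.
total_inputs = materials_consumed + work_by_other_concerns + depreciation

# Item 19: total of product and by-product for sale.
gross_output = products_value + other_work_value

# Item 20: value added by manufacture = item 19 - item 16.
value_added = gross_output - total_inputs

print(total_inputs, gross_output, value_added)
```

The same arithmetic applied to the other two columns gives a quick consistency check on any figure that is doubtful in print.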

These data collected annually were published in the Census of Indian
Manufactures (CM). From 1946 onwards the annual census was conducted
on a statutory basis and the results published. In 1953 the Collection
of Statistics Act was passed and this Act became operative from
10th November 1956 and on this date the Industrial Statistics Act of
1942 and the Census of Manufacturing Industries Rules of 1945 became
inoperative. The new rules under the Collection of Statistics
Act could not be framed till 1959 and so for the years 1957 and 1958
the annual census of manufactures was again done on a voluntary basis.
In 1959, however, the new rules became operative and data were again
collected on statutory basis.
Drawbacks. The information collected under the 1945 Rules was
much better and more comprehensive than the data available prior
to the passing of this legislation. However it suffered from various
defects.
(1) The most important drawback of these statistics was the
adoption of defective concepts and definitions of various terms like "manufacturing
process," "workers" and "wages" etc. The definitions of
these terms were borrowed from the Factories Act and the Payment of Wages
Act. They were not suitable for purposes of economic analysis.
(2) The forms of the Schedules and Questionnaires were not flexible
and no change could be made in them without a very tedious and long
legal formality. This defect has been removed in the new rules framed
in 1959 under the Collection of Statistics Act.
(3) Their coverage was not complete and only 29 industries out of 63
were covered. The idea was to include the remaining 34 industries
at a later date but these industries could not be included till the day the
Industrial Statistics Act became inoperative. In fact from 1951 onwards
data were available with regard to 28 industries, as no unit continued
in the Producer Gas Plants industry.
(4) The forms on which statistics were collected were not suitable
for Government owned factories and they did not supply information. With
the expansion of the public sector the number of such units increased
fast and there was need of bringing them within the orbit of the census.
The factories attached to Training Institutes also came in a different cate-
gory as they did not employ labour permanently on regular basis and
as such were left out.
(5) Even for the industries covered the forms used for the collection
of statistics were not very suitable and there were genuine difficulties in
supplying all the information called for. In fact these forms were
designed on the lines of similar forms used in U. K. and U. S. A.
but conditions in those countries were different as their factories were
of more advanced type and maintained detailed accounts of inputs
and outputs. Our forms should have been more simple and such as
could be easily understood. Moreover the information had to be
supplied in English only and this also created some difficulty.
On the whole, however, it can be said that the data collected
under CM were not very unsatisfactory. The details were published
Statewise as well as industrywise. There was however a great delay
in publication of these figures.
(2) Sample Survey of Manufacturing Industries (SSMI)
Apart from the annual census of manufactures conducted by
the State Governments the Directorate of National Sample Survey
conducted a Sample Survey of Manufacturing Industries (SSMI) since
the year 19F. It covered all establishments registered under section
2m(i) and 2m(ii) of the Factories Act of 1948, that is those using power
and employing 10 or more workers and those not using power and
employing 20 or more workers. Its scope was further extended to
cover establishments registered or licensed under the Industries (Development
and Regulation) Act of 1951 as amended from time to time.
This was done for the first time in the fourth round which related to
the year 1954. The SSMI covered the whole of the country with the
exception of Andaman and Nicobar islands. Units under the Ministries
of Defence and Railways, Government of India, were excluded
from its purview. The survey was spread over all the 63 industries
which were classified for the census of manufactures. In the year
1954 there were 32,767 factories in all (covered by 2m(i) and 2m(ii)
of the Factories Act 1948) and the sample size was 3,567 factories
(though out of these data could not be collected for 111 factories).
Thus the SSMI sample was about 10 per cent. of the total number.
In later rounds the number of industries was further increased and in
the eighth round relating to the year 1958 as many as 162 types of
industries were covered by the survey. In this round data were collected
for two reference years, 1957 and 1958. About 8,000 factories
and 7,250 scheduled undertakings were covered in the eighth round.
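The statement that the 1954 sample was about 10 per cent. of the total can be verified directly from the two figures quoted in the text. The sketch below is only an arithmetic illustration; the variable names are assumptions.

```python
# Figures for the year 1954 as quoted in the text.
total_factories = 32767   # factories covered by the Factories Act, 1948
sample_size = 3567        # factories selected in the SSMI sample

# Sampling fraction: sample size divided by the total number of units.
sampling_fraction = sample_size / total_factories
percentage = round(sampling_fraction * 100, 1)

print(percentage)
```

The fraction works out to a little under 11 per cent., which the text rounds to about 10 per cent.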
The chief items contained in the questionnaire of the SSMI were
as follows:
Capital Structure
(i) Value of fixed assets such as land and buildings and
machinery etc.
(ii) Value of working capital consisting of stock of fuels,
raw materials, products, by-products, semi-finished
products and cash in hand etc.
(iii) Rent of fixed assets secured on lease.
(iv) Duration of working period.
Employment and Wages. Employment figures with necessary
breakdowns showing wages and salaries paid to different classes of
employees.
Inputs. Value and quantity of consumption of fuels, raw materials
and chemicals etc. including services received from other
units.
Output. Value and quantity of products and by-products of the
factory and services rendered to customers.
The SSMI reports give the details of the concepts and definitions
followed for the collection and compilation of these figures.
As has been pointed out the coverage of SSMI was wider than that
of CM (census of manufactures). It covered all industries and its geographical
coverage was also better. The data were collected under SSMI
through trained investigators who visited the sample units and got the
forms filled up. Thus both the quality and coverage of SSMI were
better than those of CM. It was for this reason that the National
Income Committee preferred the SSMI data to CM figures in the
estimation of national income.
There is no comparable published material which can be used
as a comprehensive external check to assess the accuracy of the figures
of SSMI. Only in case of factories using power the figures of CM
can be compared with SSMI estimates for the first 29 groups of industries
although their coverage is not identical. Whereas CM covers
all the factories using power and engaging 20 or more workers on any
working day in the year SSMI covers factories using power and engaging
10 or more workers. As a result the SSMI results are expected
to be higher than the corresponding CM figures.
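The difference in coverage can be made concrete with a small sketch. It takes the thresholds as 20 workers with power for CM and, for SSMI, 10 workers with power or 20 workers without power, as described in this section; the function names and the sample factories are hypothetical.

```python
def covered_by_cm(uses_power: bool, workers: int) -> bool:
    """CM coverage as compared in the passage: uses power, 20 or more workers."""
    return uses_power and workers >= 20


def covered_by_ssmi(uses_power: bool, workers: int) -> bool:
    """SSMI coverage: 10 or more workers with power, 20 or more without power."""
    threshold = 10 if uses_power else 20
    return workers >= threshold


# Hypothetical factories given as (uses_power, workers).
factories = [(True, 25), (True, 12), (False, 30), (False, 15)]

# Factories that fall within SSMI but outside CM.
only_ssmi = [f for f in factories if covered_by_ssmi(*f) and not covered_by_cm(*f)]
print(only_ssmi)
```

Every factory covered by CM under this rule is also covered by SSMI, which is why the SSMI totals are expected to be the higher of the two.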

It may be observed that successive estimates of SSMI for the
years 1951-54 have similar movements with those of CM but the CM
figures are in all cases lower than the SSMI figures. As noted earlier
the SSMI had a wider coverage than CM. These annual surveys were
carried on upto the year 1958.

Collection of Statistics Act 1953


It has already been mentioned that from the year 1946 onwards
industrial statistics in our country were collected in accordance with the
Census of Manufacturing Industries Rules under the Industrial Statistics
Act. According to these rules the companies which were incorporated
outside British India were not required to submit copies of
annual balance-sheets, profit and loss accounts and Directors' report.
They were to submit only the information required on the forms sent
to them. This was a serious drawback of that legislation. Under
the British rule such a thing could be tolerated but when the country
became independent this shortcoming of the rules created many difficulties.
The coverage of the Industrial Statistics Act and the Census
of Manufacturing Industries Rules was also limited and the Government
could not obtain data about units which were not included in schedule
I (which included only 29 industries out of 63) and also about firms
not covered by the Factories Act of 1934. In the year 1951 the Government
wanted to know the number of Indian and foreign people
employed by Indian and foreign firms and for this the Government
issued a notification calling upon all undertakings to supply the information.
Response was very poor and the Government then
realised the urgency of having a comprehensive legislation which would
empower them to collect all types of statistics from various industrial
and business units. With these objectives the Government enacted
the Collection of Statistics Act in 1953.
The jurisdiction of this Act extended to the whole of the Indian
Union except the State of Jammu and Kashmir and it was to come into
force on such a date as the Central Government may notify. This
Act came into force on 10th November 1956.
It is applicable to all commercial concerns and factories and industrial
concerns.
Under this Act the Government could collect statistics relating
to any of the following items:

(a) Any matter relating to any industry or class of industries.
(b) Any matter relating to any commercial or industrial concern
and in particular relating to factories.
(c) Any of the following matters so far as they relate to the
welfare of labour and conditions of labour namely:
(i) Price of commodities,
(ii) Attendance.
(iii) Living conditions including housing, water supply
and sanitation.
(iv) Indebtedness.
(v) Rents and dwelling houses.
(vi) Wages and other earnings.
(vii) Provident and other funds provided for labour.
(viii) Benefits and amenities provided for labour.
(ix) Hours of work.
(x) Employment and unemployment.
(xi) Industrial and labour disputes.
(xii) Trade unions.
This Act debars the State Governments from collecting statistics
relating to items specified in List I (7th Schedule of the Constitution).
These are items included in the Union List. Provision has also been
made in this Act to avoid conflict between the State action and the
action taken by the Central Government under this Act in respect of
the same matter.
This Act gives the same powers to the statistical authority (named
by the Government) to collect statistics and to have the right of access
to records or documents as were given by the Industrial Statistics
Act of 1942. The penalties for non-compliance are also the same as in
the Industrial Statistics Act. In fact all these provisions of the 1942
Act have been taken almost verbatim in the Collection of Statistics Act
of 1953.
This Act replaces the Industrial Statistics Act of 1942, which was
found wanting because it did not enable the Government to collect
statistics on matters not covered by it.
Under this Act rules were to be framed regarding the form and
manner in which the information and returns may be furnished, the
particulars which they should contain, the interval at which and the
authority to which such information and returns may be furnished, and
the manner in which the right of access to documents and the right of
entry may be exercised.
(3) Collection of Statistics (Central) Rules 1959-Annual Survey of
Industries
Although the Collection of Statistics Act 1953 was brought into
force with effect from 10th November 1956, the Collection of Statistics
(Central) Rules could be passed only in 1959.
These rules are, generally speaking, on the same lines as those of
Census of Manufacturing Industries Rules (1945) framed under the
Industrial Statistics Act (1942). However there is one significant
improvement in these rules. In the 1945 Rules the details of items about
which data could be collected were specified in various returns, forms
and schedules and they did not give a very clear picture of the situation.
In the 1959 Rules the items about which data can be statutorily collected
are mentioned in sufficient detail in section 4 of the rules. The
present rules are flexible and can meet the changed situations as and when
they arise and the schedules can be modified without entailing detailed
legal formalities.
The annual survey of industries which is now conducted under
these rules has replaced both the CMI and SSMI. There was a lot of
duplication in the collection of data by these two agencies and though
the schedule and forms used by them were not identical yet generally
speaking they were similar. This duplication has now been avoided. In
a way the powers of the State Governments have been taken away so
far as collection of industrial statistics is concerned: even though
the State agencies have recently been nominated as representatives
of the Directorate of the N S S, the overall responsibility for the
collection and processing of these statistics is not with the State
Governments. At present data relating to the years 1959 and 1960 are being
collected on a statutory basis under the new rules and the integrated
scheme is known as the Annual Survey of Industries (A. S. I.). In ASI two
types of enquiries are being conducted, namely:
(i) Census in respect of all factories employing on any day
50 or more workers with the aid of power and 100 or more
workers without the aid of power;
(ii) Sample survey in respect of factories employing 10 to
49 workers with the aid of power and 20 to 99 workers
without the aid of power, and industrial concerns, which
happened to be selected in the probability sample for
the survey year under consideration.
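The size rule that separates the two enquiries can be sketched in a few lines of Python. This is only an illustration, not the official procedure: the function name and return labels are hypothetical, and the census thresholds are assumed to be 50 or more workers with power and 100 or more without power, the latter being inferred as the complement of the sample range of 20 to 99 workers.

```python
def asi_enquiry(workers: int, uses_power: bool) -> str:
    """Classify a factory by the ASI size thresholds (illustrative only)."""
    if workers < 0:
        raise ValueError("workers must be non-negative")
    # Assumed census thresholds: 50+ workers with power, 100+ without power.
    census_min = 50 if uses_power else 100
    # The sample ranges start at 10 workers with power and 20 without power.
    sample_min = 10 if uses_power else 20
    if workers >= census_min:
        return "census"
    if workers >= sample_min:
        # Only units drawn in the probability sample are actually surveyed.
        return "sample frame"
    return "outside these size classes"


if __name__ == "__main__":
    for workers, power in [(60, True), (60, False), (15, True), (15, False)]:
        print(workers, power, asi_enquiry(workers, power))
```

A factory in the "sample frame" class is surveyed only if it happens to be selected for the year in question, which is why the label is not simply "sample".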
Scope and coverage. ASI extends to all factories registered and
licensed under the Factories Act of 1948 and "Industrial Concerns"
as defined in the Collection of Statistics Act 1953 with the following
exceptions:
Iron ore mining, metal mining except iron ore mining, stone
quarrying, clay and sandpits, salt mining and quarrying, chemicals
and fertilizers and mineral mining and non-metallic mining and
quarrying not elsewhere classified.
Like CMI and SSMI the ASI also does not cover establishments,
owned, managed or controlled by the Ministry of Defence and Railways
and also workshops attached to training institutions.
The geographical coverage of ASI is the whole of Indian Union
except Jammu and Kashmir about which statistics are being collected
on a voluntary basis.

Unified form. As has been mentioned earlier the CMI and SSMI
were using different schedules and forms for collection of industrial
statistics. Now in ASI one single form of return has been designed
to meet the requirements of both the census and the sample surveys.
Data collected under ASI relate to:
Capital structure. Details of fixed and working capital and
transactions relating to fixed capital (replacements, improvements and
expansions) during the year.
Employment and wages. Average employment and emoluments
during the year, employment by categories etc.
Inputs. Raw materials, chemicals, packing materials and consumable
stores consumed during the year. Work done by other concerns
for repairs and manufacturing processes.
Fuels and lubricants (excluding intermediate products consumed
during the year).
Other expenses not included in materials and fuel and lubricants
consumed.
Output. Quantity and value of manufactured products, by-products
and intermediate products produced during the year. Work
done for other concerns on repairs and manufacturing processes. Value
of semi-finished goods including work in progress.
Stocks. Stock of raw materials, fuels, products and by-products
at the end of the accounting year.
Installed capacity. Installed capacity of production during the
year, its basis of estimation, spare capacity and expected additional
production.
Power equipment. Prime movers (steam engine, internal combustion
engine and other prime movers) as at the end of the year. Also
electric motors (AC and DC) at the end of the year.
It will be noticed that this information is very similar to
that collected under the SSMI. However more detailed information
is being collected about these items and information is also being
collected about a number of new items which were included in neither
SSMI nor CMI. For the first time we shall be collecting statistics
relating to:
(a) Equipment other than power equipment installed.
(b) Skilled, semi-skilled and unskilled workers.
(c) Installed capacity of production.
(d) Sales effected during the year classified by the type of
consumers.
(e) Labour and management relations.
(f) Training facilities given by the factories, and
(g) Industrial research.
General Shortcomings. These statistics also suffer from many defects
and there is considerable room for improvement.
(1) As in CMI, the concepts and definitions of various terms have
been borrowed wholesale from the Factories Act and Payment of
Wages Act etc. We have already pointed out that these definitions are
not suitable for industrial purposes and that fresh definitions and
concepts ought to have been laid down for purposes of industrial statistics.
(2) Besides these conceptual drawbacks, the distinction between
skilled, semi-skilled and unskilled labour has to be very carefully
interpreted. 'Ex-factory value' has not been satisfactorily defined. It
is not clear whether the values have to be calculated at 'market price'
or at 'factory price'.
(3) Similarly the concept of intermediate products needs proper
classification.
(4) Finally the industrial classification as attempted under these
rules also needs improvement as well as expansion.
(4) Monthly Statistics of output
Both CMI and SSMI gave annual figures of industrial output
and these figures were available quite late, usually not earlier than the
third quarter of the following year. They could not meet the quick
requirements of data for studying immediate and short term changes
in output and employment though their utility for long term planning
could not be questioned. To meet these short term needs the Directorate
of Industrial Statistics collects monthly statistics relating to the
production and installed capacity of certain selected industries. These
figures are not collected under any statutory authority and are
voluntarily supplied by industrial units themselves. It has already been
mentioned that such statistics were collected even before the independence
of the country and were published in Monthly Statistics of Production
of Selected Industries in India which was then brought out by the
Director General of Commercial Intelligence and Statistics. Now
this publication is brought out by the Directorate of Industrial
Statistics under the Ministry of Commerce and Industry. This Directorate
also makes use of data collected and furnished by various bodies
like the offices of the Coal Commissioner, Chief Inspector of Mines, Indian
Tea Board, Salt Commissioner, Textile Commissioner, Iron and Steel
Controller and Geological Survey of India etc. These bodies estimate
the monthly production of industries on the basis of returns submitted
by various factories. These returns are submitted on a voluntary
basis except in case of coal, sugar, vegetable oils, cotton textile and
iron and steel where the supply of such figures is not left at the option
of the factories.
At present more than 90 industries are included in the publication
and they are divided in three main categories, namely:
(i) Mining and quarrying.
(ii) Manufactures.
(iii) Electric light and power.
Since these statistics are compiled from voluntarily supplied
information they are neither complete nor comparable month after month
because the number of factories supplying information is not always
the same. However for major industries like cotton and jute textiles,
sugar, iron, steel etc. the coverage is fairly wide and in some cases almost
the whole output is covered. In other cases statistics are obtained
mainly from big units.
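The comparability problem noted above can be illustrated with a small Python sketch. The unit names and output figures are entirely hypothetical, and the "matched units" comparison shown is a common remedy rather than anything the text says the Directorate applies. The raw totals suggest a fall in output merely because one unit did not report in the second month.

```python
def percent_change(previous: float, current: float) -> float:
    """Percentage change from previous to current."""
    if previous <= 0:
        raise ValueError("previous total must be positive")
    return (current - previous) / previous * 100.0


def raw_change(prev_returns: dict, curr_returns: dict) -> float:
    """Change in total output over whichever units happened to report."""
    return percent_change(sum(prev_returns.values()), sum(curr_returns.values()))


def matched_change(prev_returns: dict, curr_returns: dict) -> float:
    """Change in output over units that reported in both months."""
    common = prev_returns.keys() & curr_returns.keys()
    if not common:
        raise ValueError("no unit reported in both months")
    prev_total = sum(prev_returns[unit] for unit in common)
    curr_total = sum(curr_returns[unit] for unit in common)
    return percent_change(prev_total, curr_total)


# Hypothetical voluntary returns (output in tons); unit C skips February.
january = {"A": 100.0, "B": 80.0, "C": 20.0}
february = {"A": 102.0, "B": 81.6}

if __name__ == "__main__":
    print(f"raw change:     {raw_change(january, february):.1f} per cent")
    print(f"matched change: {matched_change(january, february):.1f} per cent")
```

With these made-up figures the raw totals show a decline while the two units that reported in both months show a small rise, so the apparent fall comes entirely from the change in respondents.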
This publication also contains figures of installed capacity. In
case of textiles installed capacity is estimated in terms of spindles and
looms. In some cases it is in terms of output. Figures of installed
capacity are estimated generally by the agencies responsible for the
collection of data but in some cases like sugar the estimates of installed
capacity are made by the Directorate of Industrial Statistics itself. In
case of iron and steel and electricity generation the capacity is estimated
on the basis of continuous operation of the plant throughout the year,
adjustments however being made for shut-down. In most other cases
installed capacity is estimated by taking into account the duration of
working of each industry.
A general study relating to the growth of industrial statistics in
our country and the data available at present about large scale
industries has already been made in the preceding chapter. It is now
proposed to examine the statistics relating to some of the important
industrial problems like finance, inputs, outputs, employment and wages
and taxes and profits etc. Presently we shall confine our study to
statistics of industrial finance only.
At present the industrial finance statistics of our country are
mainly available from the following sources:
(i) Directorate of Industrial Statistics which had the data
collected under the annual CMI.
(ii) Directorate of NSS which had the data collected earlier
under SSMI and at present under ASI.
(iii) Department of Company Law Administration (Ministry
of Finance) which has data compiled from the annual
returns filed by Joint Stock Companies.
(iv) Office of the Controller of Capital Issues.
(v) Finance Corporations.
(vi) Reserve Bank of India.

C. M. I. DATA

The data collected under the annual CMI (upto 1958) related to
the figures of paid-up and productive capital of the manufacturing in-
dustries.
These figures gave only a very general and superficial idea about
the productive investment in our large scale industries. These figures,
however, were available industrywise. They were not at all enough
for statistical analysis of the capital structure of our industrial economy.
They did not give an idea about the sources of industrial finance and
the form in which it was obtained. The coverage of these figures, as
has been pointed out earlier, was very limited: not only were a large
number of industries not covered by CMI at all, but even for the
industries covered the data were not complete.
SSMI DATA (UPTO 1958)
The Directorate of NSS while conducting the annual sample sur-
vey of manufacturing industries also collected data relating to the ca-
pital structure of the units covered.
The data collected under ASI are similar to those collected under
SSMI. In the ASI schedule there is a provision for reporting actual
expenditure incurred during the year on plant and machinery and tools
bought for installation and this expenditure is to be shown classified
under the following headings:
A-(1) new and (2) secondhand.
B-(1) indigenous and (2) imported.
Only expenditure incurred on major additions and alterations or
replacements is to be reported. The idea is that only such expenses
should be included which either extend the normal economic life or
raise the productivity of the assets. However under this heading
there is no mention of the materials and fuels consumed and labour
and other costs incurred by the establishments in the manufacture of
machines for their own use. In many cases big industrial units
manufacture parts of the plants and tools for their own use and such
increments to capital are being ignored. This is a serious lacuna. In fact
there is an instruction that care should be taken not to include (in the
relevant block) any material used on capital account, that is, for additions
to capital. This cannot be justified on any account and it would give
an incorrect picture of the value created or added in the establishment
and it is also contrary to the recommendations of the Statistical
Commission, United Nations, in International Standards in Basic Industrial
Series.
DEPARTMENT OF COMPANY LAW ADMINISTRATION DATA
The figures obtained by the Company Law Administration relate
to the number of companies registered in any year and their capital.
Figures are available about the authorised, issued, called-up and paid-
up capital. Figures relating to calls, unpaid and forfeited shares are
also given. Figures of new registration and liquidation of companies
are also published. These statistics are given in two publications,
namely:
(i) Monthly Blue Book on Joint Stock Companies in India.
(ii) Joint Stock Companies in India (Annual).
Monthly Abstract of Statistics and Statistical Abstract of India
(annual) also publish these figures.
Companies are classified and separate statistics are available about
Banking companies, Loan and Insurance companies, Transit and Transport
companies, Trade and Manufacturing companies, Mills and Presses,
Plantation companies, Mining and Quarrying companies, Estate and
Building companies, Breweries and Distilleries, Hotels, Theatres and
Entertainment companies etc.
Statistics relating to the capital of companies engaged in
manufacturing industries available in these publications are not adequate from
many points of view. They no doubt give figures of share capital but
industrial finance in modern times is not confined to share capital alone.
A large part of finance nowadays is obtained in the shape of loans from
finance institutions, Government and banks. Self financing has also
become very important and quite a substantial percentage of the total
financial requirements of the industrial corporations are met from their
own resources by the process of ploughing back profits. For having a
complete picture of industrial finance in any country it is necessary to
have detailed statistics about the contribution of all these agencies.
DATA PUBLISHED BY THE OFFICE OF THE CONTROLLER OF
CAPITAL ISSUES
It has been mentioned earlier that the Industries (Development and
Regulation) Act was passed in 1951 and all new and existing
undertakings and any subsequent expansion of undertakings were required
to be licensed. The office of the Controller of Capital Issues under the
Ministry of Finance receives applications from new units for consent
to raise a certain amount of capital and also from existing units for raising
capital for expansion. This office scrutinizes these applications and
then gives consent for the amount it deems fit. The statistics relating
to the amount of capital for the issue of which permission has been
sought and the amount for which consent has been given are published
by it in Quarterly Statistics on the Working of Capital Issue Control. These
figures are available separately for industrial and non-industrial units.
Issues are classified as (1) initial, that is issued for the first time, and
(2) existing, that is issued by existing units. Figures are classified
according to the manner in which capital is to be raised, that is through
ordinary shares, preference shares, debentures and bonus issues. Figures
are separately given for non-government companies and government companies.
DATA AVAILABLE FROM FINANCE CORPORATIONS

(i) Industrial Finance Corporation of India. The Industrial Finance
Corporation brings out a report on its working every year which contains
detailed statistics about its financial position and the assistance
given by it. Till March 1960 loans sanctioned by the Corporation to
industrial concerns amounted to 72.18 crores of rupees. This includes
loans amounting to 7.84 crores of rupees granted in 1959-60. About
2/3rd of the loans sanctioned were in respect of new undertakings which
commenced production after independence. Loans worth 47.48 crores
of rupees were actually disbursed.
As mentioned in the last para there is a lag between the loans
sanctioned and the loans actually disbursed. Till March 1961 only
about 66 per cent of the loans sanctioned were disbursed. The reason
for this is that the loans sanctioned by the Corporation are not
withdrawn by the industrial concerns immediately. They withdraw the
amount gradually to save interest charges. Legal formalities are also
responsible, to some extent, for the difference in the figures of loans
sanctioned and disbursed. Attempts are, however, being made to
remove the delay caused by legal formalities.
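The size of this lag is easily expressed as a ratio of loans disbursed to loans sanctioned. The short Python sketch below takes the cumulative totals as 72.18 crores sanctioned and 47.48 crores disbursed, which is how the figures quoted above are read here; both values and the function name should be treated as assumptions made for illustration.

```python
def disbursement_rate(sanctioned: float, disbursed: float) -> float:
    """Loans disbursed as a percentage of loans sanctioned."""
    if sanctioned <= 0:
        raise ValueError("sanctioned amount must be positive")
    if not 0 <= disbursed <= sanctioned:
        raise ValueError("disbursed amount must lie between 0 and sanctioned")
    return disbursed / sanctioned * 100.0


if __name__ == "__main__":
    # Assumed cumulative totals, in crores of rupees.
    rate = disbursement_rate(sanctioned=72.18, disbursed=47.48)
    print(f"{rate:.1f} per cent of sanctioned loans disbursed")
```

On these assumed figures the ratio comes to roughly two-thirds, which agrees with the proportion of about 66 per cent mentioned in the text.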
(ii) State Finance Corporations. Besides the Industrial Finance
Corporation of the Government of India there are at present 15 State
Finance Corporations in the States of Madras, Punjab, Bombay, Gujarat,
Kerala, West Bengal, Assam, U. P., Bihar, Rajasthan, M. P., Andhra,
Orissa, Mysore and Jammu and Kashmir. The capital of these
corporations is fixed by the State concerned subject to a minimum of
Rs. 50 lakhs and a maximum of Rs. five crores. The share capital of a
State Finance Corporation can be subscribed by the public to the
extent of 25 per cent and the rest is subscribed by the State Government,
Reserve Bank, Scheduled Banks, Co-operative Banks, Life Insurance
Corporation and other finance institutions. They can also issue bonds
and debentures to augment their financial resources.
(iii) The National Industrial Development Corporation. This was
established on October 20, 1954 as a private limited company with an
authorised capital of rupees one crore of which Rs. ten lakhs have been issued
and the amount has been provided entirely by the Government of India.
The main objective of this Corporation is to help in the establishment and
development of industries, particularly those which will fill up the
gaps in the industrial structure of the country.
(iv) Industrial Credit and Investment Corporation. This was set up in
January 1955 to assist industrial enterprise in the private sector. This
Corporation has been registered as a private limited company. It has
an authorised capital of Rs. 25 crores and the present issued capital
is Rs. 5 crores.
It has received a loan of rupees ten crores from the Government of
India and credit to the extent of 10 million dollars (4.76 crores of rupees)
from the World Bank. The annual report of the corporation contains
all details about its activities and industry-wise figures of financial
assistance given by it are also available.

(v) Refinance Corporation for Industry (Private) Ltd. This was set
up in June 1958 and it provides relending facilities to industrial
concerns against loans given to them by banks. Till March 1960 the
Refinance Corporation had sanctioned assistance to the extent of Rs.
4.16 crores.
DATA AVAILABLE FROM RESERVE BANK OF INDIA

The statistics published by the Reserve Bank of India about
industrial finance are of two types, namely:
(i) Statistics relating to the direct participation of the Reserve
Bank in the industrial finance and
(ii) Statistics relating to the participation of scheduled and
non-scheduled banks in the financing of Indian industries.
It, however, invests in the share capital of the financial institutions
which directly provide financial assistance to industries. It also gives
advances and loans to such institutions. In this way the Reserve Bank
of India participates in the finance of Indian industries.
The Reserve Bank of India also publishes statistics relating to the
role of scheduled and non-scheduled banks in the financing of Indian
industries.
The Reserve Bank of India publishes details of the assets and the
liabilities of the scheduled banks and the figures relating to the
classification of the loans and advances are also available with it.
Besides these figures which are regularly published by the Reserve
Bank of India, some other industrial finance statistics are also available
with it. The Department of Research and Statistics of the Reserve Bank of
India conducts special studies relating to certain problems associated
with industrial finance in India. One very useful study conducted by
it relates to the extent of self financing in Indian industries. The Reserve
Bank of India analysed the balance-sheets and profit and loss accounts
of 750 Joint Stock Companies during the five years from 1951 to 1955
(both inclusive) and arrived at certain conclusions. These 750
companies covered units in manufacturing, mining, plantation, electricity,
shipping, trading and real estate sectors. All units were limited
companies registered in India with rupee capital and having a paid-up capital
of not less than Rs. five lakhs each. The paid-up capital of all these 750
companies was 2/3rd of the total paid-up capital of all public limited
companies in the sectors which were covered.
From 1955 onwards this study was conducted in respect of 1001
companies and the Reserve Bank Bulletin of September 1961 gives
all details relating to these studies.
FOREIGN CAPITAL STATISTICS

Statistics relating to foreign capital are regularly published by
the Reserve Bank of India. Foreign capital comes to the country
either (1) in the shape of loans and grants or (2) as permanent or long
term investment in industrial enterprises. Such capital usually comes
from the following sources:
(i) Foreign Governments.
(ii) Private foreign investors.
(iii) International agencies like IMF and World Bank etc.
(iv) U. N. O. agencies like FAO, UNESCO, ILO, etc.
(v) Private philanthropic agencies like Ford Foundation and
Rockefeller Foundation etc.
Purposewise distribution of foreign loans and credits is available
in the Reserve Bank Bulletins on the following basis:
(i) Railway development.
(ii) Power projects.
(iii) Steel and steel projects.
(iv) Orissa Iron Ore Project.
(v) Port developments.
(vi) Transport.
(vii) Industrial development.
(viii) Agricultural development.
(ix) Wheat loans.
Under each of the above heads separate figures are available for
the contribution of I. B. R. D. and D. L. F. Allocations for private
and public sectors are also shown separately.
Reserve Bank publications give separate figures for various
countries and different agencies within a country. Assistance for specific
purposes is also indicated. External assistance is classified as follows:
(i) Loans and credit.
(a) repayable in foreign currencies.
(b) repayable in rupees.
(ii) Grants.
(iii) Other assistance.
Foreign Business Investments. Figures relating to foreign business
investments are also published by the Reserve Bank of India. The
Reserve Bank Bulletins give the tradewise and countrywise foreign
business investments in India during the last few years.
These foreign business investments represent all investments of
a long term nature by such people who are not residents of India. Thus
they consist of (i) net foreign liabilities of branches of foreign
incorporated companies working in India and (ii) shares and debentures of
Indian companies held by foreigners. Most of these foreign investments
are by private foreign agencies though in recent years Indian
companies have also borrowed from institutions like the World Bank.
Private foreign investments include reinvestments of profits also.
INDUSTRIAL OUTLAY AND INVESTMENT IN
5 YEAR PLANS
Statistics relating to proposed outlays in industries and actual
investment made both in public and private sectors are available in
the five year plans. Investment targets and figures of actual investments
are available industrywise also.
The above brief study of the statistics of industrial finance clearly
shows that though we are collecting a large variety of statistics, they are
by no means adequate for detailed analysis. Our figures only give us a
superficial idea about the total investments and the main sources from
which finance flows to the industrial sector but for analytical study
these statistics are inadequate. We should collect and publish
industrial finance statistics in a coordinated fashion so that all basic data are
available on a uniform basis at one place. The data collected should
be such as would facilitate the study of problems associated with the
capital structure of our industries as well as the dividend policies
followed by them.
INDICES OF INDUSTRIAL PRODUCTION

Index of Industrial Production


This index was first compiled by the Economic Adviser to the
Government of India, Ministry of Commerce and Industry, as an
interim series (1937 = 100) using arithmetical average. Since then it has
been revised several times. The base year has been changed to 1960.
It is now published in two series: the monthly series has 324 items
and the annual series 449 items. The number of items and their
weights are given below:
                                              No. of items      Weights
                                           Monthly    Annual
                                            Series    Series

I.   Mining and quarrying                       35        35       9.72
II.  Manufacturing                             288       413      84.91
     Food manufacturing industries               8        15      12.09
     Beverages and tobacco industries            2         2       2.22
     Manufacture of textiles                    19        20      27.06
     Manufacture of footwear, other
       wearing apparel and made-up
       textile goods                             2         3       0.21
     Manufacture of wood and cork
       except manufacture of furniture           6         6       0.80
     Manufacture of furniture and fixtures       -         1       0.39
     Manufacture of paper products               6         6       1.61
     Manufacture of leather and fur
       products except footwear and
       other wearing apparel                     5         5       0.43
     Manufacture of rubber products             26        26       2.22
     Manufacture of chemicals and
       chemical products                        83       138       7.26
     Manufacture of products of
       petroleum and coal                        1         9       1.45
     Manufacture of non-metallic
       mineral products except
       products of petroleum and coal           16        16       3.85
     Basic metal industries                     26        27       7.38
     Manufacture of metal products
       except machinery and
       transport equipment                      15        18       2.51
     Manufacture of machinery
       except electrical machinery              43        71       3.38
     Manufacture of electrical
       machinery, apparatus,
       appliances and supplies                  17        21       3.05
     Manufacture of transport equipment          8        15       7.77
     Miscellaneous manufacturing
       industries                                5        14       1.23
III. Electricity                                 1         1       5.37

Total                                          324       449     100.00
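Since the index is described as an arithmetical average, the way group indices combine into the general index can be sketched as a weighted arithmetic mean. The Python lines below are an illustration only and not the official compilation procedure. The manufacturing weight is taken as 84.91, the value at which the three group weights add up to 100.00 and which equals the sum of the eighteen manufacturing sub-group weights; the group index values are hypothetical.

```python
# Group weights of the 1960-base series (manufacturing taken as 84.91).
WEIGHTS = {
    "Mining and quarrying": 9.72,
    "Manufacturing": 84.91,
    "Electricity": 5.37,
}


def weighted_index(group_indices: dict, weights: dict) -> float:
    """Weighted arithmetic mean of group indices."""
    if group_indices.keys() != weights.keys():
        raise ValueError("indices and weights must cover the same groups")
    total_weight = sum(weights.values())
    weighted_sum = sum(group_indices[g] * weights[g] for g in weights)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical group indices for some month (1960 = 100).
    indices = {
        "Mining and quarrying": 120.0,
        "Manufacturing": 150.0,
        "Electricity": 200.0,
    }
    print(f"General index: {weighted_index(indices, WEIGHTS):.1f}")
```

Because manufacturing carries about 85 per cent of the weight, the general index stays close to the manufacturing index even when the other two groups move very differently.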

"CAPITAL" Index of Industrial Activity

The Capital is an economic weekly published from Calcutta.
It compiles and publishes an index of industrial activity every
month. This is a major effort to prepare an indicator of industrial
activity in the country. The base year of the index was 1935 when
the series was started in 1938. It has been changed to 1953. The
constituents and weights assigned in the series have also been
completely revised. The weights assigned to various items are as follows:
Weights.

A. Industrial Production
1. Sugar .... 63
2. Tea 4.95
3. Jute Textiles 8.11
4. Cotton Textiles
(i) yarn 7.75
(ii) woven material 19.39
5. Cement 1.54
6. Coal 6.01
7. Iron and steel
(i) pig iron and ferro-alloys 1.91
(ii) finished steel 3.S1
8. Paper and board 1.27
9. Tyres 2.13
10. Automobiles 1.45
11. Bicycles 0.45
12. Electricity 2.11
65.52
B. Transportation
Railways: net ton-miles
100.00
Index Number of Industrial Profits
The office of the Economic Adviser under the Ministry of Commerce
and Industry used to publish an index number of industrial profits
till April, 1951 when this work was transferred to the Ministry
of Finance. The index number is now published by the Company
Law branch under the Ministry of Finance. This index is based on
the following eight industries:
(i) Cotton, (ii) Jute, (iii) Cement, (iv) Tea, (v) Iron and steel, (vi)
Paper, (vii) Sugar and (viii) Coal.
The technique of construction of this index is very simple. A
number of companies have been selected from the list available in the
Investors Year Book and the profits of these companies are found out
and an index for each industry is calculated on the chain base system.
These chain relatives are also linked to the year 1939. Formerly they
used to be linked to the year 1918.
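The chain base system mentioned here can be sketched as follows. Each year's profits are first expressed as a percentage of the preceding year's (the link or chain relative), and the links are then multiplied together to give an index on a fixed base year. The Python lines below are a minimal illustration with hypothetical profit figures; the function names are not taken from any official source.

```python
def link_relatives(values: list) -> list:
    """Each value as a percentage of the value for the preceding year."""
    links = []
    for previous, current in zip(values, values[1:]):
        if previous <= 0:
            raise ValueError("link relatives need positive preceding values")
        links.append(current / previous * 100.0)
    return links


def chain_index(values: list, base: float = 100.0) -> list:
    """Chain the link relatives back to the first year taken as the base."""
    index = [base]
    for link in link_relatives(values):
        index.append(index[-1] * link / 100.0)
    return index


if __name__ == "__main__":
    # Hypothetical profits of the selected companies in four successive years.
    profits = [50.0, 60.0, 54.0, 81.0]
    print([round(x, 1) for x in link_relatives(profits)])
    print([round(x, 1) for x in chain_index(profits)])
```

The practical attraction of the chain base is that each link compares only two adjacent years, so the list of companies can be revised from year to year as long as the same companies are used within each pair of years.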
This index number is extremely defective and should not be used
for general purposes. Some industries have been well represented
in the index while the representation of others is very poor. Percentages
of paid up capital of companies included in the index to the total
paid up capital of all the companies in that industry for the year 1939-40
were 46.5 in case of cotton, 91.7 in case of jute, 80.t in case of cement,
).6 in case of tea, 71.2 in case of iron and steel, 31.6 in case of sugar,
to in case of paper and 63.7 in case of coal. It is obvious from these
figures that whereas some industries like paper and jute are well
represented others like cotton, sugar and tea are not represented adequately.
Besides being unrepresentative another drawback of this index
is that the definition of the term profit used in the series is very defective.
There is no uniformity in the returns submitted by the companies and
they use the term 'profit' in different senses.
Apart from the statistics recently published by the Reserve Bank
about 1001 companies and the above mentioned index of industrial
profits, no other statistics relating to industrial profits are available
in our country. No doubt the companies have to file a copy of their
Trading and Profit and Loss Account and Balance Sheet with the
Registrar of Joint Stock Companies, but these valuable data are not
officially put to any use. At a time when we are trying to have planned
economic development in our country, studies relating to profits of
industrial enterprises are of special consequence. This is particularly
so in our country where our economic plans aim at the establishment
of a socialistic pattern of society. Studies relating to profits of the
private sector are extremely necessary and it would be worth while
to have exhaustive data about this problem.

EMPLOYMENT STATISTICS

At present there are three sources from which industrial employment
statistics emerge. They are:
(i) Labour Bureau, Government of India.
(ii) Census of Manufactures.
(iii) Sample Survey of Manufacturing Industries (upto 1958).
(iv) Annual Survey of Industries Act (from 1959 onwards).
Labour Bureau Statistics of Employment
These statistics are collected by the Chief Inspector of Factories
on a half-yearly and annual basis under the provisions of the Factories
Act 1948 and are consolidated by the Labour Bureau, Government
of India.
The following employment statistics are published by the Labour
Bureau:
(i) Number of working factories and average daily employment.
Similar statistics are available about employment in mines (ini-
tially collected by the Chief Inspector of Mines) and in plantations (ini-
tially collected by the Ministry of Food and Agriculture).
The Indian Labour Year Book also publishes employment statistics
relating to Railways, Post and Telegraphs, Ports, Shops and Commercial
Establishments, Central Government establishments and also Agricul-
ture.
(ii) Number of registrations and placements effected by the employment
exchanges and the number of employers using employment exchanges.
(iii) Number of persons undergoing training in the training centres and
the number of training centres.
(iv) Statistics relating to Labour Absenteeism.
Figures of absenteeism are available for cotton, woollen, engineer-
ing, leather, gold mining, coal mines, plantations and some other
industries for a number of centres.
(v) Statistics relating to labour turn-over.
Industrial Relations Statistics
It is proposed to study here statistics relating to trade unions,
industrial disputes and machinery for prevention and settlement
of disputes.
Trade Union Statistics

The Labour Bureau compiles and publishes detailed statistics
relating to trade unions in the country. The scope and coverage
of these statistics is however limited, as all trade unions in our
country are not registered and the available statistics relate only to
trade unions registered under the Trade Unions Act of 1926.
Even for registered unions statistics available are not comparable
and temporal comparisons cannot be continuously made. The
comparability of various breakdowns of these statistics is affected
by the fact that industrial classification has not remained unchanged.
The statistics published by the Labour Bureau about unions in India
are as follows:
(i) Number of registered Trade Unions and membership of the
Unions submitting returns. Membership figures are given
sexwise and average membership per union is also
calculated. Statewise figures are also available separately
for workers' union and employers' unions. These
figures are classified according to industries.
(ii) Trade Union Finances. Statistics are available relating
to sources of income and various items of expenditure
of the registered trade unions.
Industrial Dispute Statistics
The Labour Bureau maintains All India statistics of industrial
disputes resulting in work stoppages. Both 'strikes' and 'lockouts'
are covered. Political strikes, sympathetic demonstrations etc. are
left out as they do not relate to any demands of the labour lying within
the competence of the management. Only those work stoppages
are included in these statistics which involve 10 or more workers whether
directly or indirectly.
These statistics are collected for all states through State Labour
Departments. The information is obtained on a voluntary basis on uniform
lines laid down for the purpose. Important data collected are
as follows:
(i) Number of workers involved directly or indirectly. It refers to the
maximum number affected on any day during the duration of the
stoppage of work.
(ii) Number of man-days lost. These are obtained by adding up
the actual vacancies caused directly or indirectly by the work stoppage
in each shift of each potential working day (excluding Sundays and other
holidays) during the period of work stoppage.
(iii) Disputes classified by industries. Industrywise classification of
disputes is available in the publications of the Labour Bureau.
(iv) Disputes classified by cause. The Labour Bureau also publishes
statistics relating to disputes classified by cause.
(v) Classification of terminated industrial disputes by results.
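The counting rule for man-days lost given in item (ii) above may be illustrated by a small sketch. The record layout, the figures and the function names are all invented for illustration, and the shift vacancies are taken as a stand-in for the number of workers affected; the Labour Bureau's actual returns are not reproduced here.

```python
# Hypothetical record of a single work stoppage: one entry per calendar day,
# with the vacancies caused in each shift. All figures are invented.
stoppage = [
    {"day": "Monday", "working_day": True, "vacancies_per_shift": [120, 80]},
    {"day": "Tuesday", "working_day": True, "vacancies_per_shift": [110, 75]},
    {"day": "Sunday", "working_day": False, "vacancies_per_shift": [0, 0]},
]


def man_days_lost(records):
    """Add up the vacancies in every shift of every potential working day."""
    return sum(
        sum(r["vacancies_per_shift"]) for r in records if r["working_day"]
    )


def workers_involved(records):
    """Maximum number affected on any single working day of the stoppage.

    Assumption: the day's total vacancies are used as the number affected.
    """
    return max(
        sum(r["vacancies_per_shift"]) for r in records if r["working_day"]
    )


print(man_days_lost(stoppage))
print(workers_involved(stoppage))
```

Sundays and holidays are skipped by the `working_day` flag, so they add nothing to the total.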
Industrial Disputes, Prevention and Settlement Machinery Statistics
The Labour Bureau publishes statistics relating to the number of :
(a) Workers Committees,
(b) Production Committees.
(c) Joint Committees,
in different industries. These figures are available both statewise as well
as industrywise.
SOCIAL SECURITY AND LABOUR WELFARE STATISTICS

Social Security Statistics


A number of social security benefits are enjoyed by Indian
labour under various legislations. Some of them are given below:
(i) Under Workmen's Compensation Act 1923. The Labour Bureau
publishes statistics of:
(a) the number of injuries for which compensation was paid,
(b) the amount of compensation paid.
These figures are available statewise. This Act is administered
by the State Governments, and they compile these statistics and send
them to the Labour Bureau which publishes them.
(ii) Employees State Insurance Scheme. Statistics relating to the working
of the E. S. I. scheme are published annually in the Indian Labour
Year Book. The following details are available:
(a) Rates of weekly contribution of employees.
(b) Rates of benefits applicable under various types of disablements
(temporary, permanent partial, permanent total).
(c) Dependants benefits.
(d) Medical benefits.
(e) Cash benefits paid during the year under various heads
(sickness benefits, maternity benefits, various types of
disablement etc.)
(f) Area where E. S. I. Act has been enforced: Due to various
difficulties this Act was not simultaneously enforced at
all places. Gradually the area of its operation is in-
creasing.
(g) Number of employees covered by the Act.
(h) Sources of income and items of expenditure of E. S. I. Fund.
(iii) Maternity Benefits Act. There are a number of Maternity
Benefits Acts in certain States and the Labour Bureau publishes statewise
figures of the number of women
(a) who claim maternity benefits, (b) who were paid benefits in
full or part, and (c) the total amount paid.
(iv) Coal Mines Provident Fund and Bonus Schemes Act 1948. The
Labour Bureau publishes statistics relating to the provident fund
contributions and the number of workers who earn bonus, the amount
of bonus etc.
(v) The Employees Provident Fund Act of 1952. This Act provides
for the institution of compulsory provident fund for employees in
factories. Statistics relating to this Act are also published by the Labour
Bureau.
There are a number of other labour legislations in our country
and the Indian Labour Year Book publishes statistics arising out of the
working of such legislations.
Labour Welfare Statistics
Labour welfare includes "such services, facilities and amenities
as may be established in, or in the vicinity of, undertakings to enable
the persons employed therein to perform their work in healthy and
congenial surroundings" (Indian Labour Year Book 1959, page 222,
based on I. L. O. concepts). Labour welfare includes inter alia canteen
facilities, rest and recreation facilities, transport facilities to and from
workshop and medical and educational facilities etc.
The Indian Labour Year Book contains details of the facilities
available for all the above schemes and the extent of their utilisation in
different States, industries and in other types of employment.
SECTION 5

PRICE STATISTICS

Need. The importance of having accurate statistics relating to
prices is very great because price variations affect all individuals in
some form or the other. Changes in the price level affect different
classes of people in different ways and as such not only is it necessary to
have detailed statistics of wholesale prices relating to the whole country
but also statistics of retail prices relating to different towns, cities or
regions of the country. The data available in India relating to prices
can be studied under two headings, namely, wholesale prices and retail
prices. Wholesale prices can relate to either harvest prices or prices
of other commodities or the prices of various types of securities. Retail
prices generally relate to the prices paid by consumers for different
types of commodities at various centres in the country. Statistical
data are available either in the shape of prices of the commodities or
in the shape of index numbers to measure the general price level or
prices in different localities or regions.
Publications. Formerly, the data relating to prices were published
in Prices and Wages and the Government of India Gazette. The statistics
contained in these publications were extremely defective and were
not comparable either. They generally contained prices of raw materials.
Later on Indian Trade Journal and Wholesale Prices of Certain Selected
Articles of Trade at Selected Stations in India which were issued by the
Department of Commercial Intelligence and Statistics started giving
publicity to prices of agricultural commodities and manufactured
goods. Monthly Survey of Business Conditions in India published by the
Economic Adviser's office also contained certain statistics relating
to the prices of industrial and agricultural commodities. Later on in
the year 1951 this publication was merged with the Journal of Industry
and Trade which is a monthly publication issued by the Ministry of
Commerce and Industries. The Agricultural Situations in India which
is a monthly publication of the Directorate of Economics and Statistics
in the Ministry of Food and Agriculture also publishes prices of certain
agricultural commodities in details. The Bulletin of Agricultural Prices
and Indian Agricultural Prices Statistics which are both issued by the Ministry
of Food and Agriculture also contain statistics of harvest prices,
procurement prices for foodgrains, wholesale market prices, etc.
Harvest Prices
Meaning. It is absolutely essential to have statistics relating to
harvest prices because these statistics not only help the Government in
deciding its policy but they also help traders, businessmen and
agriculturists in a number of ways. It is slightly difficult to define the term
"harvest prices" and unfortunately this term is used in a number of
senses in various parts of the country. Strictly speaking the term
"harvest prices" refers to the wholesale prices received by the farmer
at the time of the harvest. In India in certain states the wholesale
prices of a few important markets are taken as harvest prices and in
other states the practice in this respect is different. Up to 1946-47
statistics of harvest prices were published in Indian Agricultural Statistics
and later on all data relating to agricultural prices began to be
published in Indian Agricultural Prices Statistics issued by the Directorate
of Economics and Statistics under the Ministry of Food and Agriculture.
These statistics are obviously defective and give misleading
conclusions. In order to remove these defects a new scheme was started
in the year 1950. According to it harvest prices have been defined
"as the average wholesale prices on which the commodity is disposed
of by the agriculturist to the trader at the village site during the
specified harvest period." The average wholesale price is calculated as
follows: Some villages are chosen as representative of each district
and price statistics are collected from them. The prices refer to the
most common variety of a commodity and are generally those ruling
on Fridays. The simple arithmetic average of these prices indicates
the district average. The average prices of various districts are used
to estimate the harvest prices for the state as a whole. District prices
are weighted in proportion to the quantity produced in various districts
and the weighted arithmetic average gives the harvest prices of
the state.
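The two averaging steps described above may be illustrated by a short sketch. The district names, village prices and production figures are invented for the purpose, and the function names are our own; this is not the official computing procedure.

```python
def district_average(village_prices):
    """Simple arithmetic average of the prices reported by sample villages."""
    return sum(village_prices) / len(village_prices)


def state_harvest_price(districts):
    """Arithmetic average of district prices, weighted by district production."""
    total_production = sum(d["production"] for d in districts)
    weighted_sum = sum(
        district_average(d["village_prices"]) * d["production"]
        for d in districts
    )
    return weighted_sum / total_production


# Hypothetical data: prices per unit in sample villages, production in tons.
districts = [
    {"name": "District A", "village_prices": [10.0, 12.0, 11.0], "production": 300.0},
    {"name": "District B", "village_prices": [14.0, 16.0], "production": 100.0},
]

print(state_harvest_price(districts))
```

In this illustration the district averages are 11 and 15, and since District A produces three times as much as District B the state figure works out to 12, which is nearer to the price of the larger producer.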
Defects. The above discussion clearly shows that despite this new
scheme the statistics of harvest prices are extremely defective even
now. First of all it cannot be said in the absence of standardization
that the prices refer to same quality of the commodity each time and
as such it is not possible to compare the price quotations of different
places or of the same place at different times. Besides this the data
collected are not properly analysed and as such they cannot be put to as
much use, as would have been possible if there was proper tabulation,
classification and analysis of collected information. These statistics are
not available regularly and this is another defect from which they suffer.
It is extremely essential that statistics of prices are available regularly
in the shape of a time-series and then only it is possible to utilise such
data fully.
Formerly the Department of Commercial Intelligence and Statistics
used to collect certain statistics of harvest prices. These statistics
were published in Agricultural Statistics of India, Volume I and also in the
Indian Trade Journal. These prices were based on the information
collected through non-official agencies and they represented the average
of the weekly quotations during harvest period at the important market
centres adjoining the major producing areas of each crop. The prices
were collected through the branches of the Imperial Bank of India and
help was also taken of the revenue departments. These figures were
extremely defective and were published very late. At present these
prices have been renamed as harvest seasonal prices and are published in
Agricultural Situations in India. These prices should be used with caution
because there is no uniformity in the quality of the commodities
for which figures are given and these prices have been found to differ
widely from similar figures collected by various trade associations and
technical journals in the country.
Other prices
Besides the prices of agricultural commodities at harvest time the
prices of other commodities are also available in a large number of
magazines and journals some of which are official while others are
unofficial or semi-official. Monthly Survey of Business Conditions in India
used to publish the prices of a large number of commodities like cotton,
jute, iron, steel, sugar, coal, foodgrains, oilseeds, tea, etc. As has been
said earlier this publication has been merged with the Journal of Industry
and Trade which is a monthly publication of the Department of Industry
and Commerce. This publication gives the prices of agricultural and
non-agricultural commodities and the Economic Adviser's index number
of wholesale prices, which we shall discuss later on in this very
section, is based on the prices available in this publication.
During the period of war when prices of most of the commodi-
ties were controlled the Government used to publish the controlled
prices. Statistical data were published about the prices at which the
Government used to purchase the commodities and the prices at which
it sold the commodities to the public. Besides the above statistics
data relating to the prices of securities are also published in the shape
of bulletins by the Reserve Bank. Every week, the Reserve Bank of India
Bulletin is published and it gives the prices of securities, gold, silver,
etc. The Weekly Bulletin of Statistics and the Monthly Abstract of
Statistics, as also the Statistical Abstract of India which are brought out by
the Central Statistical Organisation also contain statistics relating to
securities and bullion. The above account clearly shows that there
has been some improvement in the price statistics relating to our country
in recent years. Most of the price statistics relating to our country
are published by the Office of the Economic Adviser and the Direct-
orate of Economics and Statistics. These departments obtain price
statistics from various state governments and also non-official organi-
sations like Chambers of Commerce, trading firms, etc. The state
governments have also effected certain improvements in the collection
of price statistics and this work is now done by the Economic In-
telligence Inspectors who are specially appointed to collect statistical
data. Formerly this work was done by the patwaris and kanungos who
did not devote much time to this work and supplied statistics of a very
poor quality. In order that these statistics may be collected uniformly
throughout the country the Central Government has laid down certain
instructions which are followed by most of the states.
Despite the improvements that have been made in recent years in
price statistics there are yet a large number of defects from which
they suffer. First of all due to the absence of standardization, the prices
available at different periods are not strictly comparable with each
other. Moreover price statistics are not published at uniform intervals
and the data are not collected in such a manner that price index numbers
may be easily constructed from them. In other countries of the world
a large variety of price indices are published and various types of per-
sons utilise them with profit. This thing is not possible in our country
unless price statistics are improved considerably both in quality as well
as in quantity.
PRICE INDEX NUMBERS

Index number of Harvest prices

Index numbers of harvest prices of principal crops are published
by the Directorate of Economics and Statistics under the Ministry of
Food and Agriculture. The prices on which the index numbers are
based are the average of the weekly quotations relating to the harvest
period at important marketing centres, close to the major producing
areas for each crop. The commodities covered by this index are 15
and include rice, wheat, jowar, bajra, barley, maize, gram, linseed,
sesamum, rape and mustard, groundnut, raw sugar, cotton, jute and
tobacco. The base year of the index number is 1938-39. This index
number uses the chain base method of construction and current year's
prices are expressed as price relatives of the prices of the immediately
preceding year. They are, however, linked with the base year.
The index number is based on the price relatives of various states.
The price relative of a commodity for the whole state is arrived at by
calculating the simple geometric mean of the different price relatives at
different centres. The prices of various varieties are first geometrically
averaged into one price which is used for the purpose of calculating
the price relative. Thus the procedure works out as follows:
(i) The price relative of each variety of a commodity at each
centre is calculated on the basis of price of the immediately preceding
year.
(ii) Simple geometric mean of the price relatives of the various
varieties of a commodity at each centre is calculated to give the price
relative of that commodity for that centre.
(iii) Simple geometric average of the different price relatives at
different centres is calculated to give a single price relative for the
commodity in question for the whole of the state.
(iv) The price relatives of various states are combined into one by
calculating weighted geometric mean. The weights are decided on the
basis of the figures of the current year's production in various states.
It would be observed that the weights used in the index number are
moving.
As has been said earlier these chain relatives are linked to a common
year also. The base period is 1938-39.
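The four steps and the final linking to the base year may be sketched as follows. The figures are invented and the function names are our own; the sketch only illustrates the averaging and chaining arithmetic described above and does not claim to reproduce the Directorate's working sheets.

```python
import math


def geometric_mean(values):
    """Simple (unweighted) geometric mean."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def weighted_geometric_mean(values, weights):
    """Geometric mean with weights, e.g. current year's production."""
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)


def commodity_link_relative(states):
    """Combine variety relatives (preceding year = 100) into one relative.

    Each state holds 'centres', a list of centres, each centre being a list
    of price relatives of the varieties quoted there (step i).
    """
    state_relatives, weights = [], []
    for state in states:
        # Step (ii): one relative per centre from its varieties.
        centre_relatives = [geometric_mean(v) for v in state["centres"]]
        # Step (iii): one relative per state from its centres.
        state_relatives.append(geometric_mean(centre_relatives))
        weights.append(state["production"])
    # Step (iv): weighted geometric mean over states.
    return weighted_geometric_mean(state_relatives, weights)


def chain_to_base(link_relatives, base=100.0):
    """Link successive year-on-year relatives to the common base year."""
    index = [base]
    for relative in link_relatives:
        index.append(index[-1] * relative / 100.0)
    return index


# Hypothetical data for one commodity in two states.
states = [
    {"centres": [[100.0, 121.0], [110.0]], "production": 1.0},
    {"centres": [[110.0]], "production": 3.0},
]
print(commodity_link_relative(states))
print(chain_to_base([110.0, 120.0]))
```

Because the production weights are those of the current year, they change from one link to the next, which is the sense in which the weights are said to be moving.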
GENERAL WHOLESALE PRICES

Wholesale prices of various commodities in our country are collected
by the office of the Economic Adviser to the Government of India and
by the Directorate of Economics and Statistics and Statistical Bureaux
in the states. These bodies get the price data both from official sources
like customs house and the State Bank and their own agencies and
non-official sources like chambers of commerce and other business and trade
organisations. The State Governments have also a specially qualified
staff for the collection of these statistics. Economic Intelligence
Inspectors in most of the States collect prices at regular intervals from
various important markets. The staff of the Marketing Department
also collects price statistics.
Wholesale Prices of Certain Staple Articles of Trade at Selected Stations
in India. The office of the Economic Adviser, Government of India
(Ministry of Commerce and Industry) collects these prices every week.
These weekly prices are used for the compilation of wholesale price
index numbers. Monthly figures of prices are also available in the
Monthly Abstract of Statistics. The weekly prices are available in "Index
Number of Wholesale Prices in India."
So far as staple articles of trade are concerned their number is more
than 50 and the price data are collected from different markets in the
country.
Apart from the above statistics, data are also available about the
prices of all those items which enter the Economic Adviser's Index
Number of Wholesale Prices. There are at present 112 commodities
included in the index and the weekly prices of each one of these
commodities are available in the publication entitled "Index Number of
Wholesale Prices in India."
The Directorates of Economics and Statistics in certain states collect
weekly wholesale prices of certain articles at different centres of the
state. These prices are published in the bulletins issued by the State
Statistical Bureaux or by the Directorate itself. Prices are generally
collected on Fridays and the agency for the collection of these prices
is generally that of Economic Intelligence Inspectors.
The data available about wholesale prices in our country are not
very adequate. Most of the figures relate to only those items which
enter the Economic Adviser's Index Number of Wholesale Prices. There
are a large number of other commodities which are not included in this
index but which are important from many points of view, about which
no regular statistics are collected either by the states or by the Central
Government. It is necessary that price data should be collected about
all important commodities at different centres of the country and they
should be regularly published.
General Wholesale Price Index Numbers. General purpose wholesale
price index numbers are compiled by the office of the Economic
Adviser to the Government of India. The following indices brought
out by this office deserve special mention:
(i) Economic Adviser's (sensitive) Index Number of Wholesale Prices.
This index number was started during the period of the Second World War.
It was a sensitive index number of wholesale prices which was constituted
by taking into account the prices of 23 commodities only. These
23 commodities were divided in four groups namely (a) food and tobacco
group, (b) other agricultural commodities group, (c) raw materials
(non-agricultural) group and (d) manufactured articles group.
The base period of this index was the week ending 19th August
1939. The prices on which the index was based were all India wholesale
prices and several varieties were included in case of some commodities
but price relatives of all the varieties were averaged into a single
price relative for being included in the index. The index number
was an unweighted one and the average used was the simple geometric
mean. First the index number for each group was worked out
by calculating the simple geometric mean of price relatives included
and later on, the general index was obtained by averaging the price
relatives of all the 23 commodities in a similar manner.
An index called the primary commodity index was also worked out
by averaging the commodities included in the first three groups and
another index of chief articles of exports was also calculated by taking
into account 14 items out of the list of 23.
This index number came in for a lot of criticism as the number
of items included in the series was very small. This was not a repre-
sentative index and it excluded important items like millets, pulses,
gur, salt etc. On the other hand unimportant items like groundnut
and copra were included in the food group. In view of this criticism
this index number was discontinued in December 1947.
Economic Adviser's index number of wholesale prices (Base
August, 1939)
In view of the criticism of the war time index number issued
by the Economic Adviser's office, a new scheme for the compilation
of a wholesale price index number suitable for general purposes was
prepared in the year 1944 and the idea was to construct a wholesale
price index number in five stages. The scheme was started in February,
1944 when index number of the food group began to be published and
it was completed in the beginning of the year 1948 when the index
number relating to the last group, namely, the miscellaneous group
began to be published and the five index numbers relating to the five
major groups were also combined in one.
This new index number is a weighted geometric mean of the
price relatives of 78 commodities which are arranged in 18 sub-groups
and five important economic groups.
The index numbers (all the six, one for each of the five groups
and one combined) are now published every week. Weekly index
numbers are calculated from one day a week prices on or about Friday.
In order to secure representative character for the index number
particularly from the point of view of markets included, several varieties
have been included in case of many commodities but their quotations
are first geometrically averaged so that at the time of actual compilation
of the index number each commodity has only one price relative.
In all a total of 215 quotations are taken into account in the compilation
of the index number. For the most part prices are those charged by
manufacturers or importers or those prevailing in wholesale markets.
The weekly quotations of various commodities are first converted
into price relatives. Simple geometric mean of the price relatives of
several quotations gives the commodity index. Weighted geometric mean of
the various commodity indices within a sub-group gives the sub-group
index and the weighted geometric mean of the sub-group indices gives
the group index and again the weighted geometric mean of the group
indices gives the all-commodities index or the general index. The weights
assigned to various commodities are in proportion to the total values
of the commodities as determined from the estimated quantities marketed
and the prices prevailing in the year 1938-39. For the sake of
convenience in case of agricultural commodities and other industrial raw
materials the estimated quantities retained by the producers have been
left out of account. In regard to manufactured or semi-manufactured
articles it has been presumed that the whole production was put on the
market.
The weights of the various groups and sub-groups and items are
as follows : -
Weights of Major Groups
Food articles 31
Industrial raw materials 18
Semi-manufactures 17
Manufactures 30
Miscellaneous 4

Total 100
In the food articles group there are three sub-groups, namely,
cereals, pulses and others, with weights 59, 8 and 33 respectively.
Similarly in the industrial raw materials group there are four sub-groups,
namely, fibres, oilseeds, minerals and others with weights of 53, 30, 10
and 7 respectively. Other groups are similarly divided in sub-groups.
In the semi-manufactures group there are seven sub-groups and in the
manufactures group there are three sub-groups. There is no sub-group
in the miscellaneous group.
Each of the sub-groups is sub-divided into a number of items.
Thus in the cereals sub-group the number of items is 4. In the sub-
group relating to pulses the number of items is 2 and so on.
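The successive weighted geometric averaging of sub-group and group indices may be illustrated by the following sketch. The group weights and the weights of the industrial raw materials sub-groups are those stated in the text; every index value is hypothetical, and the function name is our own. It is an illustration of the arithmetic only and not the Economic Adviser's actual computing scheme.

```python
import math

# Weights as stated in the text.
GROUP_WEIGHTS = {
    "Food articles": 31,
    "Industrial raw materials": 18,
    "Semi-manufactures": 17,
    "Manufactures": 30,
    "Miscellaneous": 4,
}
RAW_MATERIAL_SUBGROUP_WEIGHTS = {
    "Fibres": 53,
    "Oilseeds": 30,
    "Minerals": 10,
    "Others": 7,
}


def weighted_geometric_mean(indices, weights):
    """Weighted geometric mean of the indices named in `indices`."""
    total = sum(weights[name] for name in indices)
    log_sum = sum(weights[name] * math.log(indices[name]) for name in indices)
    return math.exp(log_sum / total)


# Hypothetical sub-group indices for one week (base = 100).
raw_material_subgroups = {
    "Fibres": 120.0,
    "Oilseeds": 110.0,
    "Minerals": 105.0,
    "Others": 100.0,
}
raw_materials_index = weighted_geometric_mean(
    raw_material_subgroups, RAW_MATERIAL_SUBGROUP_WEIGHTS
)

# Hypothetical group indices; the raw materials figure comes from above.
group_indices = {
    "Food articles": 115.0,
    "Industrial raw materials": raw_materials_index,
    "Semi-manufactures": 108.0,
    "Manufactures": 104.0,
    "Miscellaneous": 100.0,
}
general_index = weighted_geometric_mean(group_indices, GROUP_WEIGHTS)
print(round(raw_materials_index, 1), round(general_index, 1))
```

The same function serves at every level, since the commodity indices within a sub-group are combined in exactly the same manner with the commodity weights.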
Criticism. The above index number is the most popularly used
wholesale price index number of our country. Though it is the best
index number that we have relating to wholesale prices yet it can be
criticised on a large number of grounds. Firstly the weights used in
the index number are very old and out of date and they are not
appropriate for current times. During World War II and after that also,
manufacturing industries of the country have developed considerably
and it is necessary to redistribute the weights between various
commodities, sub-groups and groups, etc. Besides this the manner in which the
weights have been selected is also defective. The weights are based on
the gross market values of the commodities and not on the basis of net
output. There is a certain amount of duplication in certain cases, for
example raw cotton is first taken into account and later when cotton
cloth and yarn are treated the value of raw cotton is again included.
The number and the character of quotations used in the index number
are also not representative. There are as many as 8 quotations for
shoes and only 3 for rice. The grouping of the index number is also
not free from defect. The food articles group index is very often
referred to as food index though strictly speaking it is not food index.
Food Index should include cereals or foodgrains only. Food articles
group included in this index number includes pulses, tea, coffee, gur
and salt, etc. As such the index number of the food articles group
should not be used to represent the variations in the prices of food.
The number of items included in the index is only 78 and for a
vast and heterogeneous country like India this number is inadequate.
This index number is meant to be a general purpose index number and
as such it should include a larger number of items than is done at present.
In view of all these defects it is necessary that this index number
is completely overhauled. The base period of the index number should
be changed. The number of quotations should be increased. The
weights should be redistributed and the grouping should also be
more logical and scientific.
Economic adviser's (revised) index number of wholesale prices
With a view to remove the defects of the Wholesale Price Index
Number, the Office of the Economic Adviser has recently issued a
revised index number. The revised series includes as many as 112
commodities and 555 individual quotations as compared with 78 commo-
dities and 2.;0 quotations in the old index number. The additional
commodities included in the revised index number are as follows : -
New items. Maize, Barley, Ragi, Potatoes, Onions, Oranges,
Bananas, Milk, Ghee, Fish, Eggs, Meat, Sugarcane, Hemp, Foreign
cotton, Tanning materials, Lubricating oils, Aviation spirit, Diesel oil,
Electricity, Bamboos, Aluminium, Tin, Lead, German silver, Handloom
cloth, Hosiery goods, Coal tar products, Medicines, Tools, Bobbins,
Leather belting, Cycles, Plywood tea chests, Pottery goods and Lime.
Markets. The choice of the markets has been made on the basis
of the place of the commodities in the national economy and the
l',uND.AMENTALS OF STATISTICS 927
representative character of the markets. The recommendations of the
Agricultural Prices Enquiry Committee (Thapar Committee), and the
opinions of the leading Chambers of Commerce and leading
manufacturers and various government departments have been given due
consideration in making the final selection of the markets.
Base year. The main consideration for the selection of the base
year of the new index has been that it should be a post-war and
post-partition year of narrow fluctuations in prices and further that it
should be as near as possible to the commencement of the Five Year Plan.
It was found that the year ending August 1949 and the fiscal year
1952-53 were periods of minimum price fluctuations. The year 1952-53
was finally selected because it was found to be a year of general
stability and because the Standing Committee of the Departmental
Statisticians had earlier recommended the adoption of this year as the
base year of all official index numbers.
New groups. The revised index number has two new groups, viz.
(i) Liquor and tobacco and (ii) Fuel, power, light and lubricants. The
miscellaneous group of the old index number has been broken up and
apportioned into the other groups identifiable in terms of commodities.
Weights. "The weights assigned to various commodities are
based on the estimates of marketed values of domestic produce and the
values of imports inclusive of duty. As regards manufactures, weights
have been fixed in accordance with the data for gross value of products
as obtained at the Third Census of Indian Manufactures, 1948; imports
have also been taken into account. In regard to intermediate
products, only the portion produced for sale has been considered. In
the case of electricity, the weight is based on the energy sold by the
electricity undertakings and valued at the average all-India rate.
Petroleum data are based on consumption figures. The weights refer
to the post-partition period 1948-49. Such data are available for all
the commodities included in the index for the year 1952-53. The
weight base is thus different from the price comparison base."
According to the new weighting of the groups given below, the relative
importance of the groups has changed :
                                              Revised      Old
(1) Food articles                               50.4      31.0
(2) Liquor and tobacco                           2.1       ..
(3) Fuel, power, light and lubricants            3.0       ..
(4) Industrial raw materials                    15.5      18.0
(5) Manufactured articles                       29.0      47.0*
(6) Miscellaneous                                ..        4.0
    Total                                      100.0     100.0

* Semi-manufactures 17.0 and Manufactures 30.0.



From the above table it is clear that the weight of the food group
has increased in the new index and the weights of non-food groups
have decreased.
Average. This index number does not use the geometric mean; it
uses the weighted arithmetic average instead. The general indices of the
two series can be linked on the basis : 100 of the new series = 380.6
(being the average for 1952-53) of the old series.
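The linking rule just stated can be sketched in a few lines of Python. This is only an illustration: the function names are our own, and the reading of 105.0 on the revised series is hypothetical; the link factor of 380.6 is the one quoted above.

```python
# Link factor quoted in the text: 100 on the new series = 380.6 on the old.
OLD_PER_100_NEW = 380.6


def new_to_old(index_new: float) -> float:
    """Express a revised-series general index on the old base."""
    return index_new * OLD_PER_100_NEW / 100.0


def old_to_new(index_old: float) -> float:
    """Express an old-series general index on the revised base (1952-53 = 100)."""
    return index_old * 100.0 / OLD_PER_100_NEW


# Hypothetical reading of 105.0 on the revised series, carried back to the old base.
example_old = new_to_old(105.0)
```

Linking of this kind only splices the general indices; it does not make the two series comparable in coverage or weighting.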
Calcutta wholesale price index number
This index number is monthly; at present it is compiled from
69 commodities. Formerly it used to be compiled from 72
commodities which were divided into 16 groups. A separate index number is
worked out for each group by finding out the simple arithmetic average
of the price relatives of the articles included in the group. The base
of the index number is July 1914 and the price quotations relate to the
wholesale prices of the commodities prevailing in Calcutta. Though
the index number of each individual group is calculated by the use of
simple arithmetic average yet an element of weighting is introduced by
taking more than one quotation for some items within a group. Thus
"cereals" includes four varieties of rice, and only one each of wheat,
barley, maize and oats.
The general index number is arrived at by calculating the simple
arithmetic average of all the individual price relatives included in the
compilation. It can be said that this general index number is also a
weighted index number, the weights in each case being equal to the
number of items included in each group. This index number should
be used with caution. The price quotations on which it is based refer
to one day in a month and as such cannot be representative of the prices
of the whole month. The weights of the index number have not always
been the same because the number of varieties for which quotations
are available from time to time has not been the same. Two items were
dropped under sugar from February 1934 and one item under cotton
manufactures from September 1936, as their quotations were no longer
available. This index number is unfit for being used in discussing pro-
blems of an all-India character, and there is a proposal to discontinue
the index number as it does not serve a very useful purpose. A similar
index number used to be published for Bombay but it was later dis-
continued. The Calcutta Wholesale Price Index Number was
compiled by the Department of Commercial Intelligence and Statistics under
the Ministry of Commerce and Industry. Now this index number is
being issued on a temporary basis and may be discontinued any time.
It is published in the Indian Trade Journal.
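The implicit weighting described above can be checked with a small Python sketch. The price relatives below are hypothetical; only the structure (four quotations for rice and one each for wheat, barley, maize and oats) follows the text.

```python
from statistics import mean

# Hypothetical price relatives (base = 100) for the "cereals" group:
# four rice quotations, one quotation each for the other cereals.
cereals = {
    "rice": [120.0, 125.0, 130.0, 125.0],
    "wheat": [110.0],
    "barley": [100.0],
    "maize": [105.0],
    "oats": [95.0],
}

# Simple arithmetic average of every individual price relative.
all_relatives = [r for quotes in cereals.values() for r in quotes]
simple_average = mean(all_relatives)

# The same figure written as a weighted average of commodity relatives,
# each commodity weighted by its number of quotations.
counts = {name: len(quotes) for name, quotes in cereals.items()}
weighted_average = (
    sum(counts[name] * mean(quotes) for name, quotes in cereals.items())
    / sum(counts.values())
)
```

The two figures coincide, so rice carries four times the influence of any other cereal although no explicit weights are assigned. If a quotation is dropped, the implicit weights change with it.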
Labour Bureau Series of Consumer Price Indices
Labour Bureau publishes consumer price indices for the follow-
ing centres in different States :
Assam : Gauhati, Silchar, Tinsukhia.
Bihar: Jamshedpur, Jharia, Dehri-on-Sone, Monghyr.

Maharashtra : Akola.
Delhi : Delhi.
Madhya Pradesh: Jubbulpore, Bhopal, Satna.
Madras and Kerala : Plantation Centres (covering four centres
only).
Mysore : Mercara.
Orissa : Cuttack, Berhampur.
Punjab : Ludhiana.
Rajasthan : Ajmer, Beawar.
West Bengal: Kharagpur.
The base period of most of these indices was the year 1944, but
it has been shifted to 1949. The base year of the Bhopal index is 1951,
of Satna and Mercara 1953, and of Beawar August 1951 to July 1952. The
four plantation centres covered in Madras-Kerala are Gundalpur,
Kullakamby, Vayithiri and Valparie and the base period for them is January
to June 1949. Most of the other indices were started before 1949 and
formerly had different bases. The All India Consumer Price Index Number
also has 1949 as base.
The items included in these indices are distributed over five major
groups namely :
(i) Food
(ii) Fuel and lighting
(iii) House rent
(iv) Clothing, bedding and foot-wear
(v) Miscellaneous.
The items falling under each group naturally differ from centre
to centre depending on the consumption pattern in various parts of
the country. These items were selected on the basis of the family bud-
get enquiry conducted in 1943-44 for 15 centres and for the remaining
5 centres the enquiries were conducted later on. The scope of these
enquiries and the sampling frame and fraction were not identical for
all centres and this is a major shortcoming of this series.
The prices are collected through Economic Intelligence Inspec-
tors or Marketing Inspectors, though in certain centres other agencies
are also used for this purpose. The prices are obtained on a weekly
basis though they do not refer to the same day of the week for
all centres.
Index numbers are constituted in two stages. First, group in-
dices are computed and then they are combined into the general index.
Both the group indices and the general index are weighted. In case
of group indices weights have been assigned to items falling in the
group in proportion to expenditure on items in relation to the total
expenditure on the group concerned. In case of general index, weights
are assigned to various groups in proportion to expenditure on them
in relation to the total expenditure on all the groups combined together.
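The two stages described above may be sketched as follows. The budget figures and price relatives are hypothetical and the function names are our own; the sketch only illustrates the procedure of weighting items within a group and then groups within the general index.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of `values` with the given `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# group -> list of (base-period expenditure on the item, price relative).
# All figures are hypothetical.
budget = {
    "Food": [(30.0, 120.0), (20.0, 110.0)],
    "Fuel and lighting": [(10.0, 105.0)],
    "House rent": [(10.0, 100.0)],
    "Clothing, bedding and foot-wear": [(15.0, 130.0), (5.0, 110.0)],
    "Miscellaneous": [(10.0, 100.0)],
}


def group_index(items):
    """Stage 1: item relatives weighted by expenditure within the group."""
    return weighted_mean([rel for _, rel in items], [exp for exp, _ in items])


def general_index(groups):
    """Stage 2: group indices weighted by total expenditure on each group."""
    indices = [group_index(items) for items in groups.values()]
    totals = [sum(exp for exp, _ in items) for items in groups.values()]
    return weighted_mean(indices, totals)


food = group_index(budget["Food"])
general = general_index(budget)
```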

It should, however, be remembered that now with the shifting of
the base year to 1949 the weight base differs from the price base. The
price relatives are based on 1949 price levels but the weights have
remained unchanged. This procedure cannot be justified on any ground
and weights should also change and should relate to the year 1949.
Probably due to difficulties in obtaining a weighting diagram and
conducting family budget inquiries this has not been done so far. It is,
however, expected that in the new series which is expected to come
out shortly the price base and the weight base will be identical.
The All India Average Working Class Index is arrived at by calculating
the weighted mean of the indices of various centres. In its
calculation, besides the indices compiled by the Labour Bureau, some
indices compiled by the States are also taken into account. Weights
are assigned in proportion to the factory employment at various centres.
Where more than one centre has been chosen within a particular
State, these indices are first combined into one index so that for no State
is there more than one figure. These State figures are combined to
get the All India Index.
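The combination of centre indices into State figures and then into the All India figure can be sketched as below. The employment figures and centre indices are hypothetical. The text does not say how the State figures themselves are weighted; the sketch assumes each State carries the total factory employment of its centres.

```python
def weighted_mean(pairs):
    """pairs: iterable of (weight, value); returns the weighted arithmetic mean."""
    pairs = list(pairs)
    total_weight = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total_weight


# state -> {centre: (factory employment, centre index)}; figures hypothetical.
centres = {
    "Bihar": {"Jamshedpur": (40_000, 110.0), "Jharia": (10_000, 120.0)},
    "Delhi": {"Delhi": (50_000, 105.0)},
}


def state_figures(data):
    """One (employment, index) figure per State."""
    figures = {}
    for state, by_centre in data.items():
        employment = sum(w for w, _ in by_centre.values())
        figures[state] = (employment, weighted_mean(by_centre.values()))
    return figures


def all_india_index(data):
    """Combine the State figures; State weight = its total employment (assumed)."""
    return weighted_mean(state_figures(data).values())


bihar = state_figures(centres)["Bihar"][1]
all_india = all_india_index(centres)
```

Under this assumption the result is the same as a single employment-weighted mean over all centres; the intermediate State step merely ensures one figure per State.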
Drawbacks. These indices suffer from a number of defects. It
has already been pointed out that out of the 20 series of indices
maintained by the Labour Bureau 16 have a common base shifted to 1949 =
100. The weighting diagrams of these indices are based on the family
budget enquiries conducted from 1943 to 1945. The centres for which
these indices are being computed were not selected on the basis of any
random sampling scheme to represent urban areas but the selection
was done on the basis of the industrial importance of towns in different
regions. The bases of the series relating to Mercara, Bhopal, Beawar
and Satna are 1953, 1951, August 1951 to July 1952 and 1953
respectively. The original base of the Mercara index was July 1948 to December
1948, but now it has been shifted to 1953. The shifting of the base
period of these indices is open to criticism. It has already been pointed
out that the base period should synchronise with the period during
which family budget enquiries are conducted. At present when the
base year of these indices has been shifted to 1949 there is not a single
series in which the family budget enquiry was conducted during this
period. This is a serious drawback of the present series.
In the Labour Bureau series families were selected for family
budget enquiries on the basis of both pay-roll sampling and tenement
sampling as sampling frames. The size of the samples selected was not
uniform in all the series and it was decided on the basis of practical
considerations. Information was collected by the interview method
either from the head of the family or the housewife. The size of the
sample should have been uniform for all centres and data relating to
expenses on various items should have been collected from other mem-
bers of the family also. The figures collected were with reference to the
last week or the last month preceding the date of interview. For cer-
tain items like food, fuel and lighting and items falling in the miscella-
neous groups the reference period was the last week and for others it
was the last month. Though family budgets of workers living singly
(without family) were separately collected the indices are based on the
family budgets relating to the workers who live with their families
only. This is also a shortcoming of these indices as even those workers
who live singly should have been included. Weights have been de-
termined in the Labour Bureau series on the basis of expenditure as
recorded in the average budgets but items like interest on debts or re-
mittances to dependents were not taken into account. Even the ex-
penditure on consumption groups was not comprehensively noted.
Expenditure on furniture and utensils etc. was omitted in the determi-
nation of weights. This cannot be justified on any ground.
In certain indices (Jharia, Mercara and Madras plantation workers)
house rent has not been included on the ground that workers live in
houses provided free by the employers or in houses owned by them.
This again is open to criticism. Similarly the fuel and lighting group is
omitted from the series relating to Madras plantation workers. The
practice of omitting certain consumption groups from the indices on
the ground that no expenses are incurred on them is unscientific. Such
items should be properly valued and should be included both in the
income as well as expenditure of the workers.
So far as the selection of individual items is concerned, they were
selected on the basis of (i) importance of the item in the
consumption group and (ii) existence of a suitable pricing unit. In assigning
weights to priced items the expenditure on unpriced items was added,
this imputation being done on the basis of price behaviour. The
expenditure on such items which were neither priced nor imputed
was assumed to follow the same trend as the group to which they
belonged and as such were included in the group weights. Probably
this was the only alternative which could be followed in the
compilation of these series though theoretically it cannot be justified.
Retail prices relating to the base year are arrived at in the Labour
Bureau series by finding out the simple average of the monthly average
prices of the 12 months of the year 1944. Current prices are collected
from two selected shops in each of the working class localities and
they relate to a particular day in the week. House rent is taken as
constant in the Labour Bureau series. These data are collected by
the part-time servants of the Central Government who are subordi-
nate employees of the State Governments and do price collection
work in addition to their normal duties. It should be noted that these
people are not trained investigators and the supervision exercised
over their work is not satisfactory. This affects the quality of the data
collected.
All India Average Working Class Consumer Price Index is of
little significance in a vast country like India where consumption
pattern and price trends widely differ from region to region. The
series of the price indices on which the All India Index is based
cannot be regarded as representative of the whole country as the
coverage of all these indices is not satisfactory. To construct an All India
Index it is necessary that a large number of family budget enquiries
are conducted throughout the country and a larger number of regional
indices are compiled and then combined into one index.
Thus the Labour Bureau indices are, strictly speaking, not
comparable either on the basis of time or space. The period of family
budget enquiries in these indices, the frequency of price quotations, the
classification of items etc. are not uniform. Weight base and the
price base are also not identical and the quality of primary data on which
these indices are based is also not above criticism.
New Scheme of Labour Bureau
Due to the above-mentioned shortcomings a new series of
consumer price indices is being prepared by the Labour Bureau. As
mentioned earlier the family budget enquiries relating to workers at 50
industrial centres have already been completed in August-September
1959 and the details of the scheme are being worked out. Very soon
we may have a new and uniform series of consumer price indices for a
large number of centres in the country.
State Series
Various State Governments compile cost of living index
numbers and publish them in their own Labour Gazettes or Bulletins. At
present about 41 index numbers are being compiled in different States of the
country in addition to the 20 indices compiled by the Labour Bureau,
Government of India. There is considerable diversity in the scope
and construction of these indices and they are not comparable with each
other. The base periods of these indices are different. Some index
numbers have as old a base as 1921 while there are others whose base
periods are very recent. Besides these diversities in the base period
the number of items included in these indices, their technique of
construction and the method of collecting primary data, all differ from each
other and as such these indices are in no way comparable.
Most of these series are now maintained with the base 1949 = 100
also. In case of other series which have a base period of 1949 or those
following 1949 there is no change in the base periods. With a view
to arrive at the indices calculated on the original base periods,
conversion factors have been worked out. The index calculated with the
new base of 1949 when multiplied by the respective conversion factor
gives the index with the original base. These conversion factors are
published by the Labour Bureau in its Gazettes.
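The use of a conversion factor can be illustrated with a short Python sketch. The text states only that the 1949-based index multiplied by the factor gives the index on the original base; how the factor is derived is not stated, and the sketch assumes it is the average 1949 index on the original base divided by 100. The figures are hypothetical.

```python
def conversion_factor(avg_1949_on_original_base: float) -> float:
    """Assumed derivation: the 1949 average on the original base, over 100."""
    return avg_1949_on_original_base / 100.0


def to_original_base(index_on_1949_base: float, factor: float) -> float:
    """Index on the original base = index on base 1949 x conversion factor."""
    return index_on_1949_base * factor


# Hypothetical centre whose index averaged 300 in 1949 on its original base.
factor = conversion_factor(300.0)
original = to_original_base(110.0, factor)
```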
Most of the State series of indices were already subjected to
criticism on the ground that the base periods and the period of the family
budget enquiries on the basis of which weighting diagrams were arrived
at differed. After the shifting of the base to 1949 this criticism becomes
more valid.
The number of items priced and included in these indices varies
between 20 and 75. Items of consumption expenditure have been
mostly represented under the groups :
(i) Food (ii) Fuel and lighting (iii) Clothing (iv) House rent
and (v) Miscellaneous. Hyderabad city index includes a
sixth group of "intoxicants" also.
We discuss below some of the State indices.
Bombay Working Class Cost of Living Index Number (Revised)
The working class cost of living index number for Bombay city
was first published in the year 1921. As in the absence of any family
budget enquiry it was not possible to assign weights to various items,
this index number was constructed on the aggregate consumption
method. The first family budget enquiry in the working class was
held by the Bombay Labour Office from May 1921 to April 1922 in
Bombay city. A second enquiry was conducted between May 1932
and June 1933 and the results of the second enquiry have been used
in the compilation of the revised index number. Three per cent.
sample of the working class families was taken in the Bombay city
Enquiry and if the sample tenement was vacant or occupied by the
persons out of the scope of the enquiry the next tenement fulfilling the
conditions was taken up. Information was collected by the inter-
view method. Members of the Bombay Labour Office paid house
to house visit to collect information regarding the amount of money
spent on various items. These items are distributed over five major
groups which have the following weights :

Group                     No. of items     Weight
                          in the group

1. Food                        28             47
2. Fuel and lighting            4              7
3. Clothing                     6              8
4. House rent                   1             13
5. Miscellaneous                7             14

   Total                       46             89

The group weights, except that of miscellaneous group, are
arrived at by finding out the expenditure on various groups (as
recorded in the average budget) in relation to the total expenditure
(both consumption and non-consumption) as recorded in the average
budget. Thus the group weights for all the groups except the
miscellaneous group, include expenditure on both priced and unpriced
items. The assumption is that the imputed expenditure on unpriced
items will follow the price trend recorded by the respective index
numbers. The weight for the miscellaneous group is determined by
expressing the actual expenditure on the items included as a
percentage of the total family expenditure. The weight for the
miscellaneous group thus does not cover expenditure on unpriced and
unimputed items in this group which include non-consumption
expenditure. It is assumed that they follow the price trend of the general
index number instead of the miscellaneous group index. This is
the reason why the total of the weights does not add up to 100. It
should be remembered that the total of the group weights in this
index does not represent the percentage which the consumption
expenditure bears to the total family expenditure. The price quotations
of all the articles except for clothing, four varieties of fish, brinjals
and pumpkins are collected weekly by the Labour Office from two
shops in each of the twelve different industrial areas covered. Prices
of all clothing articles except khans are collected from four different
cotton mills having retail shops in Bombay city. Prices of brinjals,
pumpkins and fish are taken from municipal records.

The base year of the index number was originally the year ending
June 1934. The method used in the compilation of the index number
resembles that of the British Ministry of Labour. The index number
for each group is first calculated by averaging the group figures
together after weighting them by the percentages that each item of a
group bears to the total expenditure on the group. The final index
number is calculated by averaging the group indices after weighting
them by the percentages that expenditure on each group bears
to the total expenditure on all the groups. The average used in both
cases is the arithmetic average.
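Since the Bombay group weights total 89 and not 100, the general index has to be obtained by dividing the weighted sum of group indices by 89. A minimal Python sketch follows; the group weights are those of the table above (47, 7, 8, 13 and 14), while the group indices and the names used are hypothetical.

```python
# Group weights as given in the Bombay table; they total 89, not 100.
BOMBAY_WEIGHTS = {
    "Food": 47,
    "Fuel and lighting": 7,
    "Clothing": 8,
    "House rent": 13,
    "Miscellaneous": 14,
}


def general_index(group_indices, weights=BOMBAY_WEIGHTS):
    """Weighted arithmetic average of group indices over the actual weight total."""
    total = sum(weights.values())
    return sum(weights[g] * group_indices[g] for g in weights) / total


# Hypothetical group indices for some month.
groups = {
    "Food": 130.0,
    "Fuel and lighting": 110.0,
    "Clothing": 120.0,
    "House rent": 100.0,
    "Miscellaneous": 115.0,
}
G = general_index(groups)

# Check of the stated assumption: if the uncovered 11 per cent of expenditure
# moves with the general index G, a full 100-point weighting reproduces G.
uncovered = 100 - sum(BOMBAY_WEIGHTS.values())
full = (sum(BOMBAY_WEIGHTS[g] * groups[g] for g in groups) + uncovered * G) / 100.0
```

Dividing by 89 is thus equivalent to assuming that the unpriced and unimputed expenditure follows the price trend of the general index, which is the assumption the text describes.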
In October J~4j the general index was shifted to a new base of
August 1939. The formula adopted for this purpose was very simple.
It was as follows :
$$I_a = \frac{I_b \times 100}{I_c}$$
where $I_a$ indicates the index on the new base, $I_b$ represents the
index on the old base and $I_c$ the index for August 1939 with a base
of 1934.
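The base-shifting rule (index on the old base, multiplied by 100 and divided by the old-base index of the new base period) can be applied to a whole series as in the following sketch. The monthly figures are hypothetical and the function name is our own.

```python
def shift_base(index_old_base: float, new_base_period_index: float) -> float:
    """New-base index = old-base index x 100 / old-base index of the new base period."""
    return index_old_base * 100.0 / new_base_period_index


# Hypothetical indices on the old base; August 1939 becomes the new base period.
old_series = {"August 1939": 105.0, "January 1942": 147.0, "June 1942": 210.0}
base_value = old_series["August 1939"]

new_series = {
    month: shift_base(value, base_value) for month, value in old_series.items()
}
```

By construction the new base period itself works out to 100, and the ratios between any two months are unchanged by the shift.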
Later on the Labour Bureau shifted the base of this index to 1949
and in this shifting also a conversion factor has been calculated for the
purpose.
Kanpur Working Class Cost of Living Index Number
Kanpur working class consumer price index number is maintained
by the office of the Labour Commissioner, Government of Uttar
Pradesh. This series is based on the results of a family budget enquiry
conducted in the year 1938-39 among the working class families.
Tenement sampling was used as the sampling frame and a 10 per
cent. sample was collected for study. In all I4Z% budgets were collected
but due to the difficulties created by the outbreak of the Second
World War only 300 budgets, all relating to only one locality of Kanpur
city, namely Juhi, could be processed. The average budget was,
therefore, derived on the basis of the analysis of only 300 family bud-
gets. Items included in this index number are divided in the five
usual groups. Expenditure on household requisites is not included
in these five groups and the index number covers only about 69 per
cent. of the total expenditure of the working class families. The
weights of various items in this index number have been worked out
on the basis of relative expenditure on different items. Though
within the group the total weights of individual items add up to
100 the total of the group weights is only 69. In other words
only 69 per cent of the total family expenditure has been accounted
for by the items of consumption. expenditure.
Prices are collected from 10 selected shops in various labour
localities of Kanpur. Prices are inclusive of all sorts of taxes the
incidence of which is borne by the consumers. The selection of shops
for price collection has been done on the basis of their popularity,
situation and volume of purchases made by the workers. Formerly
the prices related to every Sunday but now they are collected on every
Saturday. The weekly prices are averaged to obtain the monthly
prices of various commodities. It should be noted that unlike most
of the series of consumer price indices the rent index for Kanpur has
been kept upto date and this has been possible by conducting special
enquiries every six months. In the computation of the index,
Laspeyres' formula (which is the most common formula for the
compilation of these indices) is used and the index is computed as a weighted
average of the price relatives in two stages. First group indices are
compiled and then they are merged into a general consumer price
index number. The following are the group weights in this index :

Group                     Number of items     Weight

1. Food                         11              42
2. Fuel and lighting             2               6
3. Clothing                      2               8
4. House-rent                    1               9
5. Miscellaneous                 6

   Total                        21
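That Laspeyres' formula can be computed as a weighted average of price relatives, the weights being base-period expenditure, may be verified with a short Python sketch. The basket below is hypothetical and the function names are our own.

```python
# (item, base price p0, base quantity q0, current price p1); all hypothetical.
basket = [
    ("wheat", 2.0, 10.0, 2.5),
    ("rice", 4.0, 5.0, 5.0),
    ("kerosene", 1.0, 8.0, 1.5),
]


def laspeyres_aggregate(items):
    """Laspeyres index in aggregative form: 100 x sum(p1*q0) / sum(p0*q0)."""
    numerator = sum(p1 * q0 for _, _, q0, p1 in items)
    denominator = sum(p0 * q0 for _, p0, q0, _ in items)
    return 100.0 * numerator / denominator


def laspeyres_from_relatives(items):
    """The same index as an average of price relatives weighted by p0*q0."""
    weights = [p0 * q0 for _, p0, q0, _ in items]
    relatives = [100.0 * p1 / p0 for _, p0, _, p1 in items]
    return sum(w * r for w, r in zip(weights, relatives)) / sum(weights)


aggregate_form = laspeyres_aggregate(basket)
relatives_form = laspeyres_from_relatives(basket)
```

The two forms agree because each weight multiplied by its price relative reduces to 100 times the current price times the base quantity. The family budget supplies the expenditure weights directly, which is why the weighted-relatives form is the one used in practice.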
Working Class Consumer Price Index Calcutta
This index was originally initiated by the Controller of Civil
Supplies Department, Government of West Bengal. At present it is
maintained by the office of the Labour Commissioner, Government
of West Bengal. A special feature of this index is that no large
scale survey of the family budgets was conducted to obtain the
weighting diagram. Instead, the results of a limited survey of family
budgets conducted by the Burma Shell Co. among its workers were
utilised for the purpose. This survey was conducted by the Burma
Shell Co. in 1939-40. The consumption expenditure was reported
under the usual five groups namely food, fuel and lighting, clothing,
house rent and miscellaneous.
Prices are collected for this index from two selected shops situated
in each of the two important industrial zones of Calcutta, namely
Cossipore and Kidderpore. Prices are collected weekly by the clerks
working in the office of the Labour Commissioner. The house rent
is kept fixed in this index and the formula used in its calculation is
Laspeyres' formula. The general index is obtained by combining
the group indices which are first compiled as weighted averages of
the price relatives.
A very big shortcoming of this index is that the miscellaneous
group index is kept unchanged at 150. This figure was worked out
for the month of March 1943 by the Civil Supplies Department. Later
on when this index was taken over by the Labour Commissioner's
office, details of items included in the miscellaneous group and their
weights were not available and hence this group index has remained
unchanged.
Shortcomings. All the State indices suffer from a number of
drawbacks. In most cases a temporal or spatial comparison is out of
question even after shifting of the bases to 1949. There is no
uniformity in their base periods. The items included in the series and
their classification, the technique of obtaining price data, the
frequency of price quotation, and even the system of weighting differ.
The family budget enquiries on the basis of which weights have been
arrived at do not synchronise with the base years, and in many cases
the consumption expenditure covered is not full. As has been
pointed out, in certain indices, house rent is altogether absent or not fully
represented. The technique of assigning weights in many cases is
not scientific and generally speaking the quality of these indices is
not very satisfactory.
It is, however, gratifying to note that many States are working
on schemes for starting new series of consumer price indices of working
class and the Labour Bureau, Government of India, is also very shortly
going to issue an entirely new set of such indices for about 50 centres.

Consumer Price Index Numbers. At present the Labour Bureau
compiles consumer price index numbers relating to a large number of
centres in the country. These indices have already been discussed
earlier. Besides these indices, the Labour Bureau also publishes the
consumer prices of certain commodities at different centres. These
prices are the same which are taken into account for the compilation
of consumer price indices.
It is very obvious from the above that the data relating to retail
prices in our country are not only very inadequate but of very poor
quality also. Thapar Committee had made some very useful sugges-
tions regarding the collection of retail price statistics, particularly re-
lating to the agricultural sector. Some of the suggestions have already
been discussed in an earlier chapter relating to the statistics of the
agricultural sector of the country. This Committee was of the opinion
that there should be no duplication of organisations for the collection
of wholesale and retail prices and that only one organisation should
collect both types of data. They were also of opinion that the scope
and coverage of price data available in the country should be widened.
Their suggestions with regard to compilation of parity index numbers
of prices have already been discussed. They very rightly pointed out
that the existing data about "future" prices in our country were very
inadequate and recommended that such prices should be regularly
collected and published. The centres for which price data are available
at present are very few and it is necessary that the list of centres is also
considerably enlarged. This is essential in view of the great disparity
in prices which prevails in various parts of our country.
Suggestions for Improvement
A number of suggestions can be given for the general improve-
ment of price statistics available in our country. At present several
varieties of a commodity are available in the market and it becomes
difficult to obtain the prices of the same variety each time. It is ne-
cessary to have standardisation of commodities. In the field of indus-
tries "quality marking" schemes and statistical quality control are
being experimented upon and have met with some success. In agri-
cultural commodities the Marketing Department of the Government of
India has certain schemes for standardisation of commodities and the
"Ag Mark" scheme is one of the most popular ones.
Another suggestion that can be given is that price data should
be available at regular intervals. At present there are sometimes
uneven gaps in the availability of price statistics which should be
avoided. It will be worthwhile to carry out surveys relating to
production, consumption and sale of commodities at regular intervals and in
this direction the efforts of the Directorate of N. S. S. deserve special
mention. The quality of price data in our country would be improved
if there is a classification of markets and separate quotations are avail-
able for various types of markets in the country. The figures relating
to regulated markets should also be available separately.
Another drawback which can be removed without much
difficulty relates to the conceptual differences about various terms.
Different State Governments do not collect statistics on a comparable
basis and the figures are many times not fit for inter-state comparisons.
The days of the week to which the prices relate, the frequency of
quotations, the selection of commodities etc. differ from index to index
rendering them non-comparable. With a better coordination of
statistical data such a defect can be easily removed.
It is also necessary that proper attention is paid to the selection
of the representative centres from which data are collected. The
consideration for the selection of a centre should not be the convenience
of obtaining the price quotations but the importance of the market
centre for a particular commodity.
There is scope for improvement in the method of quoting
prices and obtaining modal price quotations. Prices should be quoted
as the actual prices paid inclusive of all the taxes, terminals etc. Only
those price quotations should be taken into account which relate to
modal prices, that is, prices at peak periods of marketing, at which
the largest number of transactions take place.
The quality of price statistics would also improve if there is
standardisation of weights and measures. The Government has now taken
adequate steps in this direction and the metric system of weights and
measures has been adopted. This will go a long way in improving the
quality of price statistics of our country. Till recently the prices were
quoted in different weights and measures and comparison was not an
easy job.
A word about the agency for the collection of price data is also
necessary. At present a number of Government and non-government
orga-nisations collect statistics and they are published indifferent journals.
It is necessary that price data should be collected -by properly trained
investigators on a uniform basis throughout the country. Various
state governments are making some efforts in this direction by getting
the price data through'the agency of Economic Intelligence Inspectors
and Marketing Inspectors but the situation is yet far from satisfactory.
It is, however, necessary that as soon as possible regulated markets
should be formed in the country and this would also facilitate the task
of collecting price data on a scientific and comparable basis. There
should be proper supervision over the work of the field staff collecting
price statistics and the data collected should be properly processed be-
fore it is published.
Lastly it can be said that the utility of price statistics available in
the country can be considerably enhanced if they are published promptly
and without delay. Price data which are stale have only a historical
importance and unless these figures are available without much delay
they lose much of their utility. Since the processing of data and its
publication takes a long time, it would be worthwhile if cyclostyled
weekly bulletins of prices are issued about local and regional prices.
The All India prices and Price Indices could be published even after
some time but local and regional prices should be available as soon as
possible with the least amount of time lag.

SECURITY PRICE STATISTICS


Prices of Government and non-government securities are published
in a large number of papers every week. The various stock
exchanges in the country bring out their bulletins containing prices
quoted for different types of securities and the non-official technical
journals like Capital, Commerce etc. contain detailed figures relating
to the prices of important securities of all types.
A series of index numbers relating to security prices are also available.
A brief account of these is given below:
INDICES OF SECURITY PRICES
(a) Economic Adviser's Series
Index numbers of security prices were published formerly by the
Office of the Economic Adviser to the Government of India and also
by a few non-official agencies. The Economic Adviser's index number
of security prices had the year 1927-28 as base and it utilised quotations
of about 150 scrips. This series had become rather old and obsolete
and as such this index number was discontinued in the year 1949.
(b) Old Series of the Reserve Bank of India
The Reserve Bank of India considered it desirable to construct
a new series of General Purpose Index Number of Security Prices, making
it as broad-based as possible by selecting a large representative set
of scrips from important stock exchanges. From January, 1946 the
Reserve Bank of India started a weekly series of such indices with the
year 1938 as base. The quotations of scrips used in these indices were
obtained from the lists published by the Stock Exchanges of Bombay,
Calcutta and Madras. Scrips were examined from the point of
view of both:
1. Importance of the concern, and
2. Activity of the scrips in the market.
In all 398 scrips were selected and they were divided into three major
groups, namely:
1. Government and semi-government securities;
2. Fixed dividend industrial securities;
3. Variable dividend industrial securities.
The first group was divided into three sub-groups, the second into
nine sub-groups and the third into nineteen sub-groups.
For obtaining the prices of the base year the arithmetic average
of daily price quotations in the year 1938 was calculated. For each
selected scrip weekly average prices were first calculated starting from the
first week of January, 1946. Price relatives were then worked out for each
scrip on the basis of the prices of the immediately preceding week. For
the first week of January, 1946 the price relatives were based on the
average prices of 1938. Unweighted geometric mean of the price relatives
of all scrips falling within each sub-group was then calculated to
give the sub-group link relatives at each of the three centres, namely
Bombay, Calcutta and Madras. From the link relatives of the sub-groups
two sets of index numbers were prepared:
1. The regional index numbers, and
2. All India index numbers.
So far as regional index numbers were concerned, for each centre
the link relative for each of the main groups was formed by computing
the weighted arithmetic average of the sub-group link relatives. Weights
used in case of fixed dividend industrials and variable dividend industrials
were proportional to the total paid-up capital of all companies
quoted on the stock exchange and belonging to the particular sub-group.
In case of Government securities weights were proportional
to the amounts of loans outstanding. Regional index numbers for
each of the groups and sub-groups for each centre were formed by serial
multiplication of the corresponding link relatives.
So far as all-India index numbers were concerned, all-India sub-group
link relatives were first formed by calculating the weighted arithmetic
average of the sub-group link relatives at each of the three centres.
From the all-India sub-group link relatives index numbers were obtained
for each sub-group by serial multiplication. Similarly all-India group
link relatives were formed by weighting the sub-group link relatives
and all-India group index numbers were obtained from them by serial
multiplication.
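The chain of steps described for the old Reserve Bank series (price relatives on the preceding week, an unweighted geometric mean within each sub-group, a weighted arithmetic average across sub-groups, and serial multiplication of the link relatives) may be illustrated by the following minimal sketch. The function names, the two sub-groups and all the prices and weights are hypothetical and are assumed only for illustration; they are not the Reserve Bank's actual figures or procedure in detail.

```python
from math import prod


def geometric_mean(values):
    """Unweighted geometric mean of a list of positive numbers."""
    return prod(values) ** (1.0 / len(values))


def subgroup_link_relative(prev_prices, curr_prices):
    """Geometric mean of price relatives (current week / preceding week)."""
    relatives = [curr / prev for prev, curr in zip(prev_prices, curr_prices)]
    return geometric_mean(relatives)


def group_link_relative(subgroup_links, weights):
    """Weighted arithmetic average of sub-group link relatives."""
    total = sum(weights)
    return sum(link * w for link, w in zip(subgroup_links, weights)) / total


def chain_index(link_relatives, base=100.0):
    """Serial multiplication of link relatives, starting from the base level."""
    series = []
    level = base
    for link in link_relatives:
        level *= link
        series.append(level)
    return series


# Hypothetical data: two scrips in each of two sub-groups.
textiles = subgroup_link_relative([100.0, 50.0], [110.0, 55.0])  # both rise 10%
jute = subgroup_link_relative([80.0, 40.0], [72.0, 36.0])        # both fall 10%

# Hypothetical weights, standing in for paid-up capital of each sub-group.
group_link = group_link_relative([textiles, jute], weights=[3.0, 1.0])

# Three successive weekly link relatives chained from a base of 100.
weekly_index = chain_index([group_link, 1.0, 0.5])
```

With these assumed figures the group link relative for the first week is about 1.05, so the chained index moves from the base of 100 to about 105 and then follows the later link relatives.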
(c) New Series of the Reserve Bank of India
In the year 1952-53 it was realised that the series of indices of security
prices published by the Reserve Bank of India had become obsolete
and required certain modifications. The pre-war base period had
become out of date and some new securities quoted on the stock exchanges
of the country had become very influential and deserved inclusion
in the indices. The Government of India and State Governments
had also floated certain loans in the market and these scrips were also
to be included in the index number. These reasons led to the revision
of the Reserve Bank series in August, 1953. Though these revised indices
were published for the first time in August, 1953 they were worked
backwards up to the first week of April, 1953.
The important changes that were made in the new series are of
various types. Firstly a number of new scrips have been added and a
few old scrips have been dropped. For variable dividend industrial
securities quotations have also been taken from Delhi Stock Exchange
and a new series of index numbers for debentures is also constructed.
At present the variable dividend industrial securities relate to four centres,
namely, Bombay, Calcutta, Madras and Delhi and others to three centres
only. Minor changes have also been made in the groupings. Now
there are four main groups as follows:
1. Government and semi-government securities with three sub-groups.
2. Debentures of industrial concerns with eight sub-groups.
3. Preference shares with nine sub-groups.
4. Variable dividend securities with five sub-groups which are
further divided in eighteen still smaller groups.
Weights of industrial securities are now proportional to the market
value of shares during the base period instead of being proportional to
the paid-up capital. It was thought that market values correctly represent
the relative importance of different industries and since this
is the practice in U.S.A., France, Norway, Finland, New Zealand, etc.
it was adopted in our country also. There has been a slight change in
the technique of the construction of index numbers also, and it is that
the chain index numbers of sub-groups are now weighted instead of
weighting the relatives.
SECTION 6
WAGE STATISTICS
Statistics of wages in India have not been collected at regular
intervals either by official or non-official committees or commissions
or by any government department under statutory sanction as has been
done in other countries of the world. This state of affairs was emphatically
criticised by the Royal Commission on Labour which pleaded
for better and more statistics about wage level in various industries in
different centres. Agricultural wages present another problem and the
data available about them are extremely scanty and defective. It is
supposed that the condition of agricultural labourers is worse than that
of the industrial labourers and it is necessary to solve the problems of
agricultural labour and to study the various aspects of the agricultural
economy from the statistical point of view.
In the matter of collecting statistics of wages with regard to industrial
labourers the state of Bombay had taken the lead and had conducted
comprehensive surveys to enquire into the wage levels of certain
industries. Similar surveys had been conducted in Bihar and some
other provinces. Certain statistics were collected under the Payment
of Wages Act of 1936. The Report of the Labour Investigation Committee
appointed by the Government of India (popularly known as
Rege Committee) also contained statistics of wages relating to certain
industrial centres of the country. Some earlier statistics relating to
wages are available in the publication Prices and Wages. This publication
used to give the results of wages census in respect of some urban
and rural occupations. Its publication was suspended long ago. These
figures are extremely defective. The rates have been quoted between
very wide ranges and the frequency of employment is not given. Moreover
the unit of time for which wages have been recorded is not uniform.
At present, so far as statistics of industrial wages are concerned, they
are available in the Annual Reports of the Chief Inspector of Mines and the
Annual Reports of the Working of Factories Act. The Annual Reports
of the Working of the Trade Unions Act and the Workmen's Compensation
Act as also the Employees' State Insurance Act also contain certain statistics
of wages. The Labour Gazettes of the various States and certain
publications brought out by the Labour Bureau also contain statistics
of industrial wages. Besides these the Annual Census of Manufactures
also publishes statistics about the level of wages in different industries
of the country. Indian Labour Gazette published by the Labour Bureau
contains statistics about the employment in factories, employment
exchange statistics, wages and earnings, minimum wages in certain industries
and average weekly earnings of certain types of labour.
Wage statistics are obtained from (i) the Labour Bureau, (ii) CM,
(iii) SSMI and (iv) ASI (from 1959 onwards).
Labour Bureau data of wages in manufacturing industries
The Labour Bureau publishes statistics relating to per capita
average annual earnings collected under the Payment of Wages Act,
1936. Under this the statistical authority in various States only collects
the returns from individual factories as defined in section 2(m) of the
Factories Act of 1948 and sends them to the Labour Bureau for consideration,
processing and publication.
Apart from the above mentioned statistics relating to manufacturing
industries the Labour Bureau also publishes statistics relating to:
(a) Per capita average annual earnings collected under the Mines
Act (by the Chief Inspector of Mines) and Index Numbers of Nominal
Earnings of Employees in different Mining Industries (Base December
1951 = 100).
(b) Earnings of workers in plantation industries.
(c) Average annual earnings of certain types of staff working in
Government Railways, Docks, and some ad hoc figures relating to earnings
of workers in nationalised motor transport.
(d) Wages of working journalists.
(e) Minimum Wages fixed or revised under the Minimum Wages
Act of 1948.
(f) Average Wages of casual agricultural labour.
CM, SSMI and ASI Wages Statistics
The CM contained the following wage statistics (up to 1958) relating
to 'workers' and 'other employees' of the manufacturing units
covered by it:
(i) Total salaries and wages paid in cash during the year less fines
and deductions for absence or damage or loss.
(ii) Total money value of any privilege or benefits or contribution
not paid in cash but which was capable of being estimated in terms
of money and which accrued to individual employees and not to a group
of employees.
All employees not covered by 'workers' were classed in the second
category (of persons other than workers).
Under SSMI also similar figures were collected on a random
sample basis for a large number of industries.
Under the ASI workers are classified as skilled, semi-skilled and
unskilled and separate figures of wages are being collected for each category.
Wage statistics collected under ASI are of two types, namely:
(i) For part I of the form which is applicable to certain types of
factories the term wages includes all contractual payments and excludes
all ex-gratia payments like profit sharing bonus etc.
(ii) For part II of the form which is applicable to other types of
factories the term wages includes all payments made to labour including
ex-gratia payments like profit sharing bonus etc.
It has already been mentioned earlier that it would have been
better if there was only one definition of wages for both types of units
covered by the ASI.
Both in CM and ASI the definition of wages has been taken verbatim
from the Payment of Wages Act. This definition hardly meets the
requirements of an industrial census as the wage payments made do
not relate to quantum or amount of productive work done in the establishment
because wages according to this definition include all additional
remuneration including termination of service benefits. Moreover
it includes all contractual payments only and excludes all ex-gratia payments
like profit sharing bonus etc. which is not correct from the point
of view of industrial census or survey.
Labour Bureau Index of Earnings of Factory Workers
Just as the Labour Bureau has started constructing an all-India
index number of the cost of living of working class people, similarly an
all-India Index of Earnings of Factory Workers has also been compiled
by it with a view to enable comparative study of the trends of earnings
and cost of living. This index was published for the first time in February,
1953. Similar index numbers are being published by various
States for industrial workers as well as for middle class families, Government
employees, etc. The Labour Bureau Index Number of Earnings
of Factory Workers is compiled on an all-India basis. This index
is an annual index and is divided in three parts and is computed in respect of:
1. All industries for each state,
2. Each industry for all states combined, and
3. All industries for all states.
The base year of the index number was 1939 and the series is
available from the year 1944 onwards. The data on which it is based
are those collected by the Labour Bureau under the Payment of Wages
Act. As such the index number is based on the data relating to only
such factories which are covered by this Act. The figures of average
earnings have been limited by the term wages as defined in the Payment
of Wages Act. According to it wages include "all remuneration capable
of being expressed in terms of money which would if the terms of
the contract of employment were fulfilled be payable and includes any
bonus or additional remuneration of the nature aforesaid, but it does
not include value of any housing accommodation, supply of light,
water, medical attendance and other amenities provided by the employer,
contributions paid by the employer for any pension fund or
provident fund, etc." It would have been better if the base year of this
index number was 1944 because the all-India cost of living indices had
the base year of 1944 but so far as earnings were concerned the year
1944 was considered to be an abnormal year and as such it could not be
selected as a base. However, for the purpose of comparison of the cost
of living and the earnings the Labour Bureau shifted the base of the
index of earnings to 1944 and worked out a new series. The indices
of earnings were deflated for changes in the cost of living and thus a
series relating to the real earnings of the factory workers was published.
Now the base of this index is shifted to 1949 for the purpose of compiling
the series relating to net earnings.
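The two operations mentioned above, shifting the base of an index and deflating an index of money earnings by a cost of living index to obtain real earnings, may be sketched as follows. This is only an illustrative sketch: the function names are assumed, and the index values used are hypothetical and are not the Labour Bureau's published figures.

```python
def shift_base(index_by_year, new_base_year):
    """Re-express an index series so that the new base year equals 100."""
    base_value = index_by_year[new_base_year]
    return {year: 100.0 * value / base_value
            for year, value in index_by_year.items()}


def real_earnings_index(earnings_index, cost_of_living_index):
    """Deflate money earnings by the cost of living (both on the same base)."""
    return {year: 100.0 * earnings_index[year] / cost_of_living_index[year]
            for year in earnings_index
            if year in cost_of_living_index}


# Hypothetical index of money earnings on base 1939 = 100.
earnings_1939_base = {1939: 100.0, 1944: 200.0, 1945: 220.0}

# Shift the base to 1944 so that it matches the cost of living series.
earnings_1944_base = shift_base(earnings_1939_base, 1944)

# Hypothetical cost of living index on base 1944 = 100.
cost_of_living = {1944: 100.0, 1945: 125.0}

real_earnings = real_earnings_index(earnings_1944_base, cost_of_living)
```

In this assumed example money earnings rise by 10 per cent between 1944 and 1945 while the cost of living rises by 25 per cent, so the index of real earnings falls below 100 in 1945.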
Agricultural Wages
So far as agricultural wages are concerned statistics relating to
them are still very meagre and defective. The available material consists
of either the data published in Prices and Wages or the data collected
through some socio-economic enquiries.
Present Position
(a) It was in the year 1950 that the Directorate of Economics and
Statistics in the Ministry of Food and Agriculture, Government of
India, prepared a scheme for the collection of these statistics on a uniform
basis throughout the country. At present statistics are collected
on the basis of this scheme. Agricultural labourers are classified as
follows:
(i) Skilled Labour: (a) carpenters, (b) blacksmiths, (c) cobblers.
(ii) Field Labour: It includes ploughmen, sowers, transplanters,
weeders and reapers etc.
(iii) Other agricultural labour: It includes coolies, load-carriers
and well-diggers etc.
(iv) Herdsmen.
For all these classes of agricultural labourers wage statistics are
collected separately for males, females and children. These statistics
are published in:
(i) Agricultural Wages in India, and
(ii) Agricultural Situation in India.
Both these publications are brought out by the Directorate of
Economics and Statistics in the Ministry of Food and Agriculture,
Government of India.
(b) Statistics under Minimum Wages Act. This Act which was
passed in the year 1948 lays down the fixation of minimum wages for
employees in a number of employments both agricultural as well as
non-agricultural. This is a protective measure to improve the income
of agricultural labourers. Under this legislation minimum wages
for agricultural workers have been fixed throughout the States of Kerala,
Orissa, Punjab, Rajasthan, Delhi and Tripura and for specified areas
in the States of Assam, Andhra Pradesh, Bihar, Bombay, Himachal
Pradesh, Madhya Pradesh, Mysore, Uttar Pradesh and West Bengal.
Minimum wages have also been fixed by the Central Government
in certain agricultural demonstration farms and military farms under
the Central Ministries of Food and Agriculture and Defence respectively.
All these statistics are published by the Labour Bureau, Government
of India.
(c) Ad hoc data collected through Agricultural Labour Enquiries. Two
agricultural labour enquiries were conducted recently in our country,
the first in 1950-51 and the second in 1956-57. They have brought
out very useful and interesting data on agricultural wages during these
two periods.
The first enquiry was conducted in 800 villages selected on the
basis of stratified random sampling and covered a sample of 11,000
agricultural labour families. Its report was published in 1954-55. The
second enquiry was conducted in 3,600 villages and it covered a sample
of 28,600 agricultural labour households. The report of the second
enquiry was published in 1960. These reports contain a wide variety
of data relating to occupational structure, employment, wages and earnings,
income, expenditure and indebtedness.
(d) 1961 Census data. In the population census of 1961, as has
been mentioned earlier, statistics were collected about the number of
agricultural labour households. Though no wage statistics were collected
during the census, yet the data collected would provide a sound
basis for all agricultural labour wage studies which may be conducted
in future. An agricultural labourer was defined in the last census as a
person who worked on somebody else's land in lieu of payment (in
cash or kind) and who had no responsibility connected with supervision
or direction of the work, who had no ownership or tenancy right
on land on which he worked and who was not responsible for profit or
loss of cultivation.
Labour Bureau Consumer Price Index Numbers for Agricultural Labour
(interim series). The Minimum Wages Act of 1948 (which is applicable
inter alia to employment in agriculture) requires fixation as well as
revision of minimum wages in accordance with changes in cost of living.
As such this work was given to the Ministry of Labour and Employment.
Now the Labour Bureau under this Ministry compiles indices
of consumer prices for agricultural labourers.
SECTION 7

TRADE AND TRANSPORT STATISTICS


Trade Statistics of India unlike other statistics are neither very
meagre nor very defective. The Department of Commercial Intelligence
and Statistics has been collecting statistics of trade ever since it
was created and even at present this work is done by it. However,
most of the trade statistics even at present arise out of the compilations
made in the course of administrative laws such as those relating to taxation
of goods entering or leaving the country or in the returns submitted
by the railways etc. Statistics of foreign and inland trade are available
in the following publications:
1. Accounts relating to the Foreign Trade and Navigation of India.
(Monthly)
It was, however, felt that the entire foreign trade statistics of the
country should be available in one publication and at one place because
the then existing publications did not give a complete picture of our
exports and imports. In 1952 therefore the figures contained in (i)
Accounts Relating to the Foreign Sea and Air-borne Trade and Navigation
of India and (ii) Accounts Relating to Trade of India by Land with Foreign
Countries were combined into one publication which was entitled "Accounts
relating to the Foreign (Sea, Air and Land) Trade and Navigation of
India." Initially all the figures of exports and imports by land route
could not be merged with similar figures of exports and imports through
sea and air on account of difficulties of classification. In the beginning
our overland trade with Pakistan, Afghanistan, Iran and Burma only
was included in this publication and figures relating to our trade with
Tibet, Sikkim and Bhutan could not be included. However, later on,
they were also taken up, though in their case, only the quantity exported
or imported (in maunds) was noted down and figures relating to value
were not shown due to certain difficulties. In April, 1956 the name of
this publication was slightly changed and it came to be known as "Accounts
relating to the Foreign Trade and Navigation of India", the words "Sea,
Air and Land" having been dropped. Earlier in 1952 the duty tables
(relating to the collection of customs duties) contained in this publication
were deleted and they began to be published separately.
The data contained in the "Accounts relating to the Foreign Trade
and Navigation of India" are as follows:
PART A - Foreign Trade by Sea and Air:
(i) Imports, exports and re-exports of merchandise.
(ii) Special trade statistics of opium and other dangerous
drugs.
(iii) Trade in treasure: gold, silver, Indian Government coins
and currency notes.
(iv) Goods under bond.
(v) Overall balance of trade in merchandise and treasure.
(vi) Declared values per unit of principal articles of imports
and exports.
(vii) Trade by countries and currency areas classified by groups
of commodities.
(viii) Commoditywise trade (value and quantity) by principal
countries and also trade in treasure.
(ix) Countrywise trade in principal articles: Trade at Ports,
Transit trade and Shipping in foreign trade.
(x) Index numbers of volume and average unit declared
values of exports and imports.
PART B - Shipping:
Details about vessels entered and cleared with cargoes.
PART C - Foreign Trade by Land:
(i) Land frontier trade with Pakistan, Afghanistan, Iran and
Burma (both quantity and value).
(ii) Land frontier trade with Nepal, Tibet, Sikkim and Bhutan
(quantity in maunds).
(iii) Over-all summary of foreign trade by Land.
2. Foreign Trade by Sea and Air. Figures regarding foreign trade
by Sea and Air relate to trade registered at Indian Ports (both Sea and
Air). The number of trade ports included in the publications has been
gradually increasing. In 1949 the trade ports of Kerala and Kathiawar
and in 1950 the air port of Delhi were included in the list. Foreign
trade returns of Cochin were formerly included in Madras Central Zone
but since April, 1953 they have been included in the return of Kerala.
Articles of trade are mentioned in these publications in alphabetical
order and exports are mainly divided in two categories, namely:
(i) Exports of Indian merchandise, and
(ii) Re-exports of foreign merchandise.
The articles of exports and imports which were formerly classified
in four categories, were later classified in the following five categories:
(i) Food and drink and tobacco.
(ii) Raw materials and produce and articles mainly unmanufactured.
(iii) Articles wholly or mainly manufactured.
(iv) Living animals.
(v) Postal articles not specified under imports and exports.
From 1st January, 1957 our country adopted a new trade classification
and it is much better and more exhaustive and scientific than the
earlier one. The classification of goods under the new system is more
logical and scientific and the following are the main categories under
the new scheme:
(i) Food.
(ii) Beverages and tobacco.
(iii) Crude materials, inedible, except fuels.
(iv) Mineral fuels.
(v) Animal and vegetable oils and fats.
(vi) Chemicals.
(vii) Manufactured goods.
(viii) Machinery and transport equipment.
(ix) Miscellaneous manufactured articles.
Each of the above nine classes is further sub-divided into a number
of smaller sub-divisions. Due to the adoption of this new classification
figures relating to the period prior to 1957 are not strictly comparable
with those of later years.
With regard to each item of export and import, figures are available
not only of value but also of quantity exported or imported. Figures
of quantity represent the net weight exclusive of packing etc.
Value of the goods imported or exported was formerly based on the
wholesale cash price less trade discount, for which like goods could be
sold at the time and place of import or export as the case may be. In
case of export no deductions were made but in case of import, from the
value thus calculated deductions were made for import duty payable
thereon. If it was difficult to ascertain the wholesale cash price the value
generally represented the cost at which goods of like nature and quality
could be delivered at such place. However if the goods were subject
to duty on tariff valuation, this value was taken as the correct value even
though there might be a great difference between this valuation and the
actual value.
With effect from April, 1951 the basis of valuation has been changed.
Now the export values conform strictly to f.o.b. basis inclusive of
export duty. Imports are now valued at c.i.f. basis.
In making use of these figures the following things should be
remembered:
(i) That these are based on declarations of importers and
exporters in the Bills of Entry and Shipping Bills
respectively which are accepted practically without question.
The policy of the Government has been that
the goods should not be detained on account of mis-statements
affecting 'statistics only' (not revenue) and
therefore to a very great extent the accuracy of these figures
is dependent on the correctness of the declarations made
by traders, particularly in such matters which do not
affect the Government revenue.
(ii) Exports do not include the goods purchased by the
Government and shipped on Government chartered
vessels.
(iii) Export figures relate only to Indian merchandise as
foreign merchandise exported from India is classed under
re-exports.
(iv) Import figures include all goods landed in India irrespective
of their final destination and dutiable articles
whether in passengers' luggage or imported by letter
post.
Both imports and exports are classified in different tables according
to the countries and also according to the nature of the articles.
All the above information is published on a monthly basis with a
time lag of about eight weeks and the "Annual Statements of the various
Sea and Air Borne Trade of India" Volumes I and II contained
annual figures in a more detailed fashion.
Weekly figures of imports and exports of particular articles (by
sea) are published in the weekly supplement to the Monthly Abstract of
Statistics.
It is a monthly publication brought out by the Department of
Commercial Intelligence and Statistics and it contains details of quantity
and value of the principal items of import and export trade by Sea, Air
and Land with 52 selected countries which are grouped on geographical
basis. The following are the main divisions:
(i) Western Hemisphere: U.S.A. and other countries.
(ii) Western Europe: U.K., E.E.C. countries, E.F.T.A.
countries other than U.K. and others.
(iii) Eastern Europe: U.S.S.R. and others.
(iv) Middle East.
(v) Other Africa.
(vi) Other Asia (excluding U.S.S.R.): South East Asia,
Far East Asia.
(vii) Oceania.
These statistics are classified according to countries to which imports
and exports are credited, that is, imports to the country of consignment
and exports to the country of final destination and these
countries are classified on a politico-geographic basis. Figures include
trade by sea, air and land but figures of treasure are excluded.
These statistics are also published in the "Monthly Statistics of
Foreign Trade of India" issued by the Department of Commercial Intelligence
and Statistics, Government of India.
Indices of Foreign Trade
Indices of foreign trade of India are also compiled by the Department
of Commercial Intelligence and Statistics. These series relate
to:
(i) Unit Value Indices of Imports.
(ii) Volume Indices of Imports.
(iii) Unit Value Indices of Exports.
(iv) Volume Indices of Exports.
(v) Index of the Terms of Trade (ratio of export price index
to import price index).
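The index of the terms of trade listed last is simply the ratio of the export price (unit value) index to the import price (unit value) index, expressed as a percentage. A minimal sketch is given below; the function name and the index values are hypothetical and are used only to show the arithmetic.

```python
def terms_of_trade_index(export_unit_value_index, import_unit_value_index):
    """Ratio of export price index to import price index, as a percentage."""
    return 100.0 * export_unit_value_index / import_unit_value_index


# Hypothetical unit value indices on a common base period.
favourable = terms_of_trade_index(120.0, 100.0)   # export prices rose faster
adverse = terms_of_trade_index(90.0, 120.0)       # import prices rose faster
```

A value above 100 indicates that export prices have risen relative to import prices since the base period, and a value below 100 indicates the opposite.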
These indices are compiled on a monthly basis and annual indices
are also computed from these figures. They are published in "Accounts
Relating to Foreign Trade and Navigation of India" and are also reproduced
in the monthly bulletin of Reserve Bank of India.
Formerly the base of these indices was the year 1948-49 and the
indices related to (i) Average Unit Declared Values of Imports and
Exports and (ii) The Volume of Exports and Imports. Later on, the
base was shifted to the year 1952-53. It was again shifted a little later,
and at present the base period of all indices is the year 1958. After the
adoption of the new trade classification in 1957 the old base created
many difficulties and for this reason the new base of 1958 was adopted.
These indices cover all foreign trade: Sea, Air and Land. Indices
of exports exclude re-exports. The items of imports and exports are
classified according to the new trade classification (adopted in 1957)
and distributed over nine major groups. Separate indices are available
not only for each of these nine groups but also for each item included
in a group. A general index combining all items is also prepared separately
for exports and imports. The unit value indices of exports
and imports are prepared by the aggregative method and current quantities
are used as weights. In the volume indices the values of base
period are used as weights. The quantum indices are indirectly calculated
from the value ratio and price indices used in unit value indices.
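The weighting scheme described above may be sketched as follows: an aggregative unit value index using current quantities as weights, a volume index using base period values as weights, and a quantum index obtained indirectly by dividing the value ratio by the unit value index. The function names and the two-commodity data are hypothetical, and the sketch is only one plausible reading of the method, not the Department's exact procedure.

```python
def unit_value_index(p0, p1, q1):
    """Aggregative unit value index with current quantities as weights."""
    current_value = sum(p * q for p, q in zip(p1, q1))
    base_priced_value = sum(p * q for p, q in zip(p0, q1))
    return 100.0 * current_value / base_priced_value


def volume_index(p0, q0, q1):
    """Weighted average of quantity relatives, weights = base period values."""
    weights = [p * q for p, q in zip(p0, q0)]
    relatives = [curr / base for base, curr in zip(q0, q1)]
    return 100.0 * sum(w * r for w, r in zip(weights, relatives)) / sum(weights)


def quantum_index_indirect(p0, q0, p1, q1):
    """Value ratio divided by the unit value index (indirect method)."""
    value_ratio = (sum(p * q for p, q in zip(p1, q1))
                   / sum(p * q for p, q in zip(p0, q0)))
    return 100.0 * value_ratio / (unit_value_index(p0, p1, q1) / 100.0)


# Hypothetical unit values and quantities for two export commodities.
base_prices, base_quantities = [10.0, 20.0], [5.0, 2.0]
current_prices, current_quantities = [12.0, 18.0], [6.0, 3.0]

uv_index = unit_value_index(base_prices, current_prices, current_quantities)
vol_index = volume_index(base_prices, base_quantities, current_quantities)
quantum = quantum_index_indirect(base_prices, base_quantities,
                                 current_prices, current_quantities)
```

With these assumed figures the indirect quantum index agrees with the base-weighted volume index, because dividing the value ratio by a current-weighted price index always leaves a base-weighted quantity index.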
3. Statistics of Foreign Sea Borne Trade of India by Countries and
Currency Areas.
4. CrutOfllS dlld Excise ReveII[(e Slain/un! of Il1diall UI/ion.
This publication contains the statistics about the gross collection
of custom duties on both exports and imports. These statistics were
formerly published in the "Accounts Relating to the Foreign Sea and
Air Borne Trade and Navigation of India."
5. Indo-Pakistan Trade Statistics.
6. Exports of Indian Artware and Sports Goods-Monthly.
7. Account Relating to the Coasting Trade and Navigation of India,
Monthly.
This publication gives figures relating to trade among different
ports of the same state and also of trade between maritime states.
For purposes of coastal trade statistics, the Indian Coast was divided
formerly into the following maritime blocks: (1) West Bengal, (2)
Orissa, (3) Madras (including Andhra), (4) Travancore Cochin, (5)
Cochin Port, (6) Bombay, (7) Saurashtra, (8) Okha and (9) Kutch. From
April 1957 the following nine maritime blocks, each corresponding to
a maritime state or a union territory following reorganisation of states,
have been adopted :
(i) West Bengal.
(ii) Orissa.
(iii) Andhra Pradesh.
(iv) Madras.
(v) Kerala.
(vi) Mysore.
(vii) Bombay.
(viii) Andaman and Nicobar Islands and
(ix) Laccadive, Minicoy and Amindivi Islands.
Trade between ports in the same maritime block is classed as
"internal trade" and that between one maritime block and another as
"external trade."
The publications entitled "Accounts Relating to the Coasting
Trade and Navigation of India" and "Statistical Abstract of India"
publish statistics relating to:
(i) Quantity as well as value of imports and exports of Indian
and foreign merchandise.
(ii) Import and Export of treasures and
(iii) Shipping in the coasting trade.
Whereas the figures contained in the Accounts relating to the
Coasting Trade and Navigation of India have a time lag of about
seven months, the figures contained in the Statistical Abstract
have a time lag of about two years.
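The distinction drawn above between "internal" and "external"
coasting trade can be sketched as a simple lookup. The port-to-block
table below is a small hypothetical extract chosen for illustration;
it is not the official list of ports, and the function name is an
assumption.

```python
# Hypothetical extract: port -> maritime block (post-April 1957 blocks).
MARITIME_BLOCK = {
    "Calcutta": "West Bengal",
    "Visakhapatnam": "Andhra Pradesh",
    "Kakinada": "Andhra Pradesh",
    "Cochin": "Kerala",
}


def classify_coastal_trade(origin_port, destination_port, block_of=MARITIME_BLOCK):
    """Return 'internal trade' when both ports lie in the same maritime
    block, otherwise 'external trade'."""
    if block_of[origin_port] == block_of[destination_port]:
        return "internal trade"
    return "external trade"
```

A consignment from Kakinada to Visakhapatnam would thus count as
internal trade, while one from Calcutta to Cochin would count as
external trade.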
8. Accounts Relating to the Inland (Rail and River Borne) Trade of
India.
For the purpose of registration of inland trade, at present the
country has been divided into 36 trade blocks roughly representing
the former states of the Indian Union with the addition of the chief
port towns of Bombay, Madras, Calcutta and Cochin. The Andhra
ports, the Saurashtra ports and the ports in Madras other than Madras
Port have also been treated separately.
The figures published with regard to the internal trade relate
to the actual imports into these blocks as the internal trade within the
trade blocks is excluded from the scope of these statistics. Separate
figures are available about the trade in merchandise and treasure.
Figures of movements of selected articles by rail and river between these
blocks and port towns are published by the Department of Commercial
Intelligence and Statistics. The registration of the rail-borne trade
is done by the Railway Audit Office. This office registers the goods
carried for delivery to consignees on their own lines and in some cases
for delivery on connected lines. Trade carried on between the stations
in the same block is not registered. The required information is
collected from the invoices which show the details of the goods, that is,
place of destination, the nature of the goods and their gross weight.
A certain percentage which varies according to the class of goods is
taken to represent the weight of packing material and it is deducted
from the gross weight to arrive at the net figure. In tables only the
net weight is recorded.
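The deduction for packing material described above amounts to a
percentage allowance on the gross weight. A minimal sketch follows;
the classes of goods and the allowance percentages are hypothetical,
since the actual rates used by the Railway Audit Office are not given
here.

```python
# Hypothetical packing allowances, as a percentage of gross weight.
PACKING_ALLOWANCE_PERCENT = {
    "cotton bales": 5.0,
    "tea chests": 20.0,
}


def net_weight(gross_weight, goods_class, allowance=PACKING_ALLOWANCE_PERCENT):
    """Gross invoice weight less the packing allowance for the class of goods."""
    packing = gross_weight * allowance[goods_class] / 100.0
    return gross_weight - packing
```

For example, under these assumed rates a gross invoice weight of 200
units of tea in chests would be tabulated as 160 units net.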
The river borne trade formerly used to represent the trade carried
by inland steamers as well as country boats. Later on it was thought
desirable to delete the figures relating to the trade carried through
country boats. The collection of figures relating to the trade carried
on through country boats required elaborate arrangements and even
then the quality of statistics obtained was highly unsatisfactory. At
present the river borne trade represents only the trade carried by
inland steamers between various blocks. The trade carried by steamers
is registered by steamer agents. The trade partly carried by rail and
partly by river when booked through and carried by steamers running
in connection with railways is generally recorded by the Railway and
is treated as rail-borne trade.
These statistics relate only to the quantities and not the value of
goods transported.
9. Raw Cotton Trade Statistics
This publication deals with the imports and exports of raw cotton
from one State to another. It is a monthly publication and it gives
figures of the quantities moved from one block to another. The
figures relate to almost all the varieties of cotton, and intra-block
movement is excluded.
10. Review of Trade of India.
This publication was formerly brought out by the Department
of Commercial Intelligence and Statistics but now it is published by
the Office of the Economic Adviser under the Ministry of Commerce
and Industry. This publication gives a review of India's trade both
foreign as well as inland and it contains useful tables relating to the
foreign trade, coastal trade, inland trade, balance of trade, etc.
11. Statistical Abstract of India.
Statistical Abstract of India contains important tables relating
both to the foreign as well as inland trade of India.
1. Railways
Railway statistics are compiled and published in our country by
the Railway Board. Formerly they were available in the "Annual
Report of the Railway Board on Indian Railways." Now besides this annual
report, the Railway Board also publishes "Monthly Railway Statistics"
and this publication contains all relevant statistics relating to the
working of Indian Railways. Some of the data contained in this publication
are also reproduced in the Monthly Abstract of Statistics. Important
data contained in these publications relate to the following:
(i) Passenger Traffic and Earnings.
(ii) Freight Traffic and Earnings.
(iii) Wagons loaded.
(iv) Labour employed and wages paid.
The figures relating to passenger traffic and earnings are given
separately for each class of traffic and for each railway. So far as freight
traffic is concerned figures are classified commoditywise and figures of
total tonnage are also available. Statistics relating to passenger miles
and ton miles are also published and the density of traffic as measured
by the number of passenger miles and ton miles to the length of
running track is also calculated.
Figures of mileage published relate to both route mileage and track
mileage. In route mileage, double or triple lines are counted only once
but in calculating the track mileage they are counted the proper number
of times and besides this, the transportation and commercial sidings
are also taken into account.
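The difference between route mileage and track mileage, and the
density of traffic mentioned earlier, can be sketched as follows. The
section lengths, siding length and traffic figure are invented, and
the function names are assumptions made for illustration only.

```python
def route_mileage(sections):
    """Each section is (length_in_miles, number_of_tracks).
    Double or triple lines are counted only once."""
    return sum(length for length, _tracks in sections)


def track_mileage(sections, sidings_miles=0.0):
    """Each line is counted the proper number of times, and
    transportation and commercial sidings are added."""
    running = sum(length * tracks for length, tracks in sections)
    return running + sidings_miles


def traffic_density(unit_miles, running_track_miles):
    """Passenger miles (or ton miles) per mile of running track."""
    return unit_miles / running_track_miles


# Hypothetical railway: 100 miles of single line, 50 miles of double line.
sections = [(100.0, 1), (50.0, 2)]
```

On these assumed figures the route mileage is 150 while the track
mileage, with 10 miles of sidings, is 210.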
In addition to these, statistics relating to capital outlay and earnings
of railways are also available. These figures include total capital at
charge, gross earnings, working expenses, net earnings, percentage
of working expenses to gross earnings and percentage of net earnings
to total capital at charge etc.
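The two percentages named above follow directly from the published
totals. A minimal sketch, with invented figures and a hypothetical
function name, is given below.

```python
def railway_financial_ratios(gross_earnings, working_expenses, capital_at_charge):
    """Derive net earnings and the two percentage ratios from the totals."""
    net_earnings = gross_earnings - working_expenses
    return {
        "net_earnings": net_earnings,
        # percentage of working expenses to gross earnings
        "working_expenses_pct_of_gross": 100.0 * working_expenses / gross_earnings,
        # percentage of net earnings to total capital at charge
        "net_earnings_pct_of_capital": 100.0 * net_earnings / capital_at_charge,
    }


# Invented figures in crores of rupees, for illustration only.
ratios = railway_financial_ratios(500.0, 400.0, 2000.0)
```

Here working expenses absorb 80 per cent of gross earnings and net
earnings represent 5 per cent of the capital at charge.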
Details regarding number of locomotives and wagons of various
kinds on the last day of the official year, average load per train, and
repairs etc., are also available.
The available railway statistics published by the Railway Board
are open to criticism. As early as 1937 the Indian Railway Enquiry
Committee criticised the then existing railway statistics and offered
valuable suggestions for improvement. Some of these suggestions
have been put in operation and the quality of statistics has considerably
improved since then. However even now some of these published
statistics are misleading. Statistics relating to the working of railways
include figures of coaching earnings per train mile, cost of hauling a
passenger train one mile, cost of hauling goods unit (1 ton) one mile,
profits on working a passenger train one mile etc. Such cost statistics
which are very technical in nature do not mean what they purport to
mean. The expert critic can make no use of them and they lead the
non-expert to false conclusions. Similarly figures relating to net
passenger ton mile, passenger station to station statistics are also very
confusing.
There is need for revision in the concept and nature of railway
statistics. In earlier days when road competition did not exist
and the traffic came to the railways as a matter of course it was but
natural to give greater importance to the statistics relating to the
working of railways and comparatively less importance to the character and
volume of business. At present when competition with road is keen
it is necessary to give greater importance to commercial statistics
rather than to operating statistics. The country needs more information
relating to, say, cotton traffic on railways than the statistics relating
to items like shunting operations etc.
The time lag of railway statistics at present is about 6 months.
Formerly it used to be 10 to 12 months. If possible this should be
further reduced. There is considerable scope for improvement in the
existing railway statistics. There is scope for additional statistics on
some points on which at present either no information is collected or
the existing information is extremely meagre. A study of variable
costs incurred in connection with different classes of traffic would be
of practical use to railway administration. More attention should be
given to the cost of the working of the goods shed and information
should be available both about the handling costs and the clerical costs.
It is gratifying to note that the Government is conscious of the need
for bringing about a co-ordinated development of the different modes of
transport in the country and it constituted a Committee on Transport
Policy and Coordination (Neogy Committee) to advise on long term
transportation policy, and against the background of this policy, to
define the role of various means of transport in the next 5 to 10 years.
This Committee submitted its preliminary report to the Planning
Commission in February, 1961. The report contains detailed factual
material pertaining to road-rail co-ordination and it has raised many issues
which are of considerable importance from the point of view of
formulating a long term transport policy for the country. The final report
of the Committee is expected shortly and the programmes for transport
development in the third plan will be reviewed in the light of the final
recommendations.

2. Roads and Road Transport


The statistics of road and road transport in our country were
formerly published in the "Indian Road" which was a monthly publication
issued by the Department of Commercial Intelligence and Statistics and
the figures were reproduced in the Statistical Abstract and some data
were also available in the Agricultural Statistics of India. Now these
statistics are published by the Ministry of Transport and Communications
in "Basic Road Statistics" which is an annual publication. The
data contained are:
(i) Road mileage.
(ii) Growth of Traffic.
(iii) Road Finance and Taxation.
(iv) Road expansion.
The time lag of these figures is about 4 years which renders them
unfit for current economic analysis. The statistics contained in the
above publication are fairly detailed. Details are given about roads
maintained by the Public Works Department and local bodies like
Municipal and District Boards. Separate figures are available about serviced
and unserviced roads and these are further classified into a number of
categories. Separate figures of expenditure on maintenance, repairs
and minor improvements as well as on new constructions and
administration and other charges are also available.
These figures are reproduced in the Statistical Abstract of India
and in some other publications. "Road Facts of India" also contains
statistics about roads and road transport. Details relating to the
number of motor cars, motorcycles, lorries and buses running in different
states are also available. Statistics relating to the number of bullock
carts in various states are published in Live Stock Statistics and are based
on periodical census. However these statistics are not fully reliable
or complete.
3. Inland Water Transport
The statistics relating to navigation canals are published in the
Statistical Abstract of India and also in the Indian Agricultural Statistics.
The data relate to
(i) Length of the canals open for navigation.
(ii) Total number of boats plying cargo.
(iii) Total number of boats plying passengers.
(iv) Quantity of cargo carried.
(v) Value of cargo carried.
(vi) Number of passengers carried.
(vii) Quantity of cargo carried by rafting.
Prospects of inland water transport in our country are very good
as it is cheaper than railways. During the course of the first five year
plan Ganga Brahmaputra Board was set up. During the second five
year plan the Government agreed to give conservancy grants to joint
steamer companies. During the third five year plan, not only loan
assistance is being provided to joint steamer companies but many new
schemes have also been taken up for the development of inland water
transport. Many State Governments are also implementing
developmental schemes in their territories and it is expected that more detailed
and exhaustive information regarding the development of inland
waterways in our country would be available in future.
4. Shipping
The statistics relating to shipping in our country are available in
the following publications:
(i) Monthly Abstract of Statistics.
(ii) Accounts Relating to Foreign Trade and Navigation of
India.
(iii) Accounts relating to the Coasting Trade of India (Monthly)
The data published relate to :
(i) Tonnage of vessels entered and cleared with cargo.
(a) in foreign trade.
(b) in coasting trade.
(ii) Nationality of vessels.
(iii) Country from which and to which ships arrive and go.
(iv) Ports into or from which ships enter or clear.
(v) Number cleared and tonnage of ships built in various parts
of the country.
(vi) Number and tonnage of vessels registered in India.
(vii) Loss of tonnage and number of lives lost.
(viii) Number of passengers carried on long and short voyages.
(When a ship is continuously out of the port for 120 hours or more it
is a long voyage otherwise a short one. Long voyage figures are given
under two heads according as the destination is in India or outside India.
Short voyage figures are classified according as the destination is within
the same state, beyond the state but in India and outside India.)
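The rule for long and short voyages stated above lends itself to a
short sketch. The function name and the destination labels are
assumptions made for illustration; only the 120-hour limit and the
heads of classification come from the text.

```python
def classify_voyage(hours_out_of_port, destination):
    """destination is one of 'same state', 'other state in India'
    or 'outside India' (labels chosen for this sketch)."""
    if hours_out_of_port >= 120:
        # Long voyages are shown under two heads only.
        head = "outside India" if destination == "outside India" else "in India"
        return ("long", head)
    # Short voyages keep all three heads.
    return ("short", destination)
```

A voyage of exactly 120 hours to a port in the same state is therefore
a long voyage with destination in India, while one of 119 hours keeps
the finer three-way classification.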
5. Civil Air Transport
The statistics of Civil Air Transport are published by the Director
General of Civil Aviation, Ministry of Transport and Communications,
Government of India. The data are contained in the following
publications:
(i) Monthly News Letter of Civil Aviation.
(ii) Monthly Abstract of Statistics.
The figures relate to the following:
(i) Passenger miles flown.
(ii) Revenue and nonrevenue load.
(iii) Passengers carried.
(iv) Miles flown.
(v) Hours flown.
Separate figures are available for internal and international
services.
6. Communications
Statistics relating to Post Offices and Telegraph Offices and
Telephones are compiled in our country by the Director General of Posts
and Telegraphs in the Ministry of Transport and Communications.
They are published in the Monthly Abstract of Statistics and the annual
figures are available in the Annual Report of the Post and Telegraph
Department.
The following statistics are available in these publications :
(a) Post Offices
(i) Number of Post offices and letter boxes.
(ii) Number of postal articles handled.
(iii) Mileage over which mails are carried.
(iv) Total Number and amount of money orders.
(v) Post Office Savings Bank and cash certificates.
(vi) Postal Insurance Fund.
(vii) Dead Letter Office. Number of letters dealt with.
(viii) Capital expenditure, receipts and charges.
(b) Telegraphs
The following information is published :
(i) Telegraph Lines.
(ii) Number of telegrams handled and their receipts.
(iii) Number of Telegraph offices.
Besides the above information statistics relating to technical
details are also provided in the report.
(c) Telephones
The following information is available about the telephones in
the annual report of the Posts and Telegraph Department :
(i) Number of exchanges.
(ii) Total lines and total telephones connected.
(iii) Number of Departmental non-exchange telephones.
(iv) Total line and wire mileage.
(v) Revenue earned from rents and telephone call fees.
(vi) Number of licensed telephone companies, exchanges, sub-exchanges
and telephones.
(vii) Strength of the staff employed in the various branches of the
department and statements relating to the receipts and expenditure of
the departments.

(d) Wireless and Radio


Statistics of wireless and radio broadcasting are available in the
Annual Report of the Post and Telegraph Department and the Report on the
Progress of Broadcasting in India issued by the Director General, All-India
Radio. The following information is available :
(i) Number of wireless stations in operation.
(ii) Number of messages handled.
(iii) Number of wireless licences.
(iv) Receipts and payments of the radio telegraph branch of the post
and telegraph departments.
(v) Number of radio telegrams sent and received and the amount accrued
to the Government.
(vi) Number of broadcast receiving licences.
(vii) Number of licences issued, renewed, lapsed and the total number
in force.
(viii) Details of the programme transmissions, hours etc.
(ix) Details regarding short wave and medium wave transmissions and
of the power of transmitters etc.
SECTION 8
FINANCIAL STATISTICS
Financial statistics can broadly speaking be divided in two heads,
namely, (1) Statistics relating to Banking, Currency, Exchange and
Bullion and (2) Statistics relating to Public Finance.
Statistics relating to banking, currency, etc., are published mostly
by the Reserve Bank of India in the following publications :
(1) Statement of Affairs of the Reserve Bank of India
This is a weekly statement issued by the Reserve Bank and is
divided in two parts giving separate figures of assets and liabilities of
the Banking and Issue Departments of the Reserve Bank. The
Banking Department is concerned with banking activities only and as such
its assets consist of notes, rupee coins, subsidiary coins, bills discounted,
balances held abroad, loans and advances to the governments, other
loans and advances, investments etc. The liabilities of the Banking
Department consist of the paid up capital, reserve fund, deposits with
various governments, banks, etc., bills payable and other liabilities.
So far as the Issue Department is concerned the assets consist of gold
and bullion, sterling securities, rupee coin, rupee securities, internal bills
of exchange, etc. The liabilities of the Issue Department are the notes
in the Banking Department and the notes in circulation.
Apart from the above statement which gives the assets and
liabilities of the Reserve Bank detailed statistics are available about other
activities performed by the Reserve Bank. These statistics generally
relate to:
(i) Loans and advances made to scheduled banks.
(ii) Statistics relating to the transactions in foreign currency.
(iii) Statistics relating to the movement of funds by telegraphic transfers.
Figures are separately available for telegraphic transfers issued and paid
at Bombay, Calcutta, New Delhi, Kanpur, Madras, Bangalore and
Nagpur.
(iv) Clearing House Statistics. Figures are available about cheque
clearances at various branches of the Reserve Bank and other centres.
Figures relate to both number as well as amount.
(v) Money rates. Figures are available about the bank rate and
the rates at which the Reserve Bank advances loans to scheduled banks
for general banking purposes, for financing bona fide commercial and
trading transactions and the rates at which the Reserve Bank advances
loans to State Co-operative Banks for general banking purposes, bona
fide commercial and trade transactions, seasonal agricultural operations
and marketing of crops, Co-operative Sugar Factories, cottage
industries and medium term loans for agricultural purposes.
(2) Statement of Affairs of the Scheduled Banks
This is a consolidated statement about the position of the
scheduled banks and is issued by the Reserve Bank every week. It generally
relates to their position as at the close of each Friday. Statistics
available relate to demand liabilities, time liabilities, total cash, balances with
Reserve Bank, advances in India, bills discounted in India, investments
in India, etc. For the sake of comparison figures are given for the past
three or four weeks.
(3) Reserve Bank of India Bulletin (Monthly)
So far as non-scheduled banks are concerned they are governed
by the Indian Companies Act according to which a banking company
which is not a scheduled bank has to transfer at least 20% of the
declared profits to the reserve until this equals the paid-up capital. The
non-scheduled banks (if they are limited companies) are also required
to maintain a cash reserve of at least 1½% of time and 5% of demand
liabilities and to file a statement with the Registrar of Joint Stock
Companies. Statistics relating to non-scheduled banking companies are
published in the Reserve Bank of India Bulletin as also in Statistical
Tables relating to Banks in India.
(4) Report on the Trend and Progress of Banking in India.
This is an annual report prepared by the Reserve Bank and it
reviews the important events in the banking field during the year
concerned. Their effects on the nation's economy are reviewed in detail.
It contains some very useful information from which important
conclusions can be derived. This publication contains statistics, amongst
other things, on the liabilities and assets of the Reserve Bank,
consolidated position of the scheduled banks, interest and money rates,
analysis of the investment of banks, cheque clearings, velocity of circulation
of deposit money, etc.
(5) Statistical Tables Relating to Banks in India.
This is also an annual publication and gives detailed statistics
about the working of Indian and foreign banks. This publication gives
statistics not only about the scheduled banks of the country but about
the non-scheduled banks also. Statistics about the co-operative banks
are also included in it.
(6) Statistical Statements Relating to the Co-operative Movement
in India.
This is also an annual publication. The information published
relates to provincial and central co-operative banks and credit societies,
land mortgage banks and societies and other types of co-operative
societies. The information published relates to the number of banks, paid
up capital, reserve and other funds, deposits and loans held, loans
outstanding, cash balances, etc.
(7) Report on Currency and Finance


This report is published in two parts. Part one contains a review
of world economic development and part two gives a comprehensive
survey of the economic situation in the country. A very large variety
of statistical information is available in this publication and statistics
relating to practically all important types of banking and currency
developments during the year under review are available. So far as currency
and bullion are concerned this publication contains detailed statistics
about the supply of money, absorption of currency, circulation of notes,
trends in currency circulation, paper currency reserves, etc. Figures
are also available about gold and silver reserves with Reserve Bank.
Figures about the purchase and sale of silver, the price of gold and silver,
their production, imports and exports are also available.
Public Finance
So far as statistics of public finance are concerned they are
available in the budgets of the Central and State Governments. The budgets
of the Governments give a detailed account about the public revenue
and public expenditure. Statistics are separately available about the
railway budget and the general budget of the Central Government and
about the budget of the various state governments.
Statistics available about public finance are very detailed and relate
to the revenues of the government, expenditure on revenue account,
receipts of the government and disbursements of the government. As
has been said earlier figures are available separately about railway finance.
Statistics of income and expenditure of local bodies like municipal and
district boards are also available. Figures of public debt, burden of
taxation, incidence of taxation, etc., are also worked out and are
available in various publications of the Reserve Bank viz., the combined
Finance and Revenue Account of the Central and State governments
(Annual) and in many non-official publications also.
The Economic Division of the Ministry of Finance brings out
every year a document entitled "An Economic Classification of the Central
Budget." The classification follows the technique of social accounting
and groups together like items, after eliminating all accounting
transactions.
This classification is a modest attempt in the direction of preparing
an "Economic Budget" of the Central Government. For preparing an
economic budget even of the public sector such classification has to be
adopted for state budgets and commercial undertakings of the
Government operated by public corporations, autonomous bodies, etc. The
task of having such data for the private sector is still more complicated
and it will be a long time before an integrated economic budget for
all the economic activities of the country can be prepared.
INSURANCE STATISTICS
Detailed statistics relating to life and general insurance are
available in our country. Life Insurance was nationalised in the year 1956
and at present statistics relating to life insurance are available from the
Department of Insurance under the Ministry of Finance. The
information is available under the following heads :
(i) Amount of Income and Outgo of Indian Life Insurance.
(ii) Amounts of Life Insurance Fund, Paid-up capital and total assets
of Indian insurers.
(iii) New business and business at close.
(iv) Rates of dividend, results of valuation etc.
(v) Postal Life Insurance business.
Apart from the statistics relating to life insurance business figures
are available about the general insurance. Among general insurance
the most important lines which are popular in our country are fire,
accident, motor, burglary, theft etc., and statistics regarding these are
available in the Insurance Year Book and certain other publications as
well.
Balance of Payments
There was no arrangement for compilation of reliable statistics
in this field till the year 1948. With the establishment of International
Monetary Fund it became necessary for member countries to collect
balance of payments statistics on a uniform basis and it is since 1948
that the Reserve Bank of India is regularly publishing comparable
statistics on balance of payments through its Department of Research and
Statistics.
The balance of payments statistics available in our country relate
to the following:
(i) India's overall balance of payments (current account).
(ii) India's overall balance of payments (capital account).
(iii) Regional summary of the balance of payments (on current
account) : Information is available separately about:
1. Sterling area.
2. Dollar area.
3. O.E.E.C. countries.
4. Rest of non-sterling area.
(iv) India's balance of payments on current account with selected
countries.
(v) India's foreign exchange reserves and liabilities.
(vi) Foreign business investments in India.
(vii) India's international investment position.
(viii) Distribution of foreign liabilities and assets.
The Reserve Bank of India brought out a Bulletin entitled, "India's
balance of payments from 1948-49 to 1955-56" in January 1957.
This booklet contains a detailed account of statistics relating to India's
balance of payments for the above period. The statistics relating to the
above period and 1955-56 onwards are available in the bulletin issued
by the Reserve Bank of India. These statistics are collected from
exchange control records which suffer from certain defects in the coverage,
classification and timings. In regard to coverage the glaring defect
relates to barter transactions which the exchange control does not cover.
Then there are other transactions which are omitted, for instance,
investments in kind, earnings ploughed back by nonresident companies,
short term trade credit etc. The gap in the private transactions is larger
than in the official transactions because the latter have been covered
through other sources.
So far as the classification of the transactions is concerned the
results of the exchange control records are far from satisfactory.
According to the booklet issued by the Reserve Bank of India the
reasons for the unsatisfactory classification are (a) incomplete classification
of receipts (b) recordings of the transactions on a net basis and not gross
basis and (c) splitting of the same transaction under different categories.
Classification of receipts is incomplete mainly due to the rule under
which purpose of the transaction need not be disclosed if the individual
amount is not above Rs. 25,000/-. Such transactions have roughly
been estimated to be to the tune of Rs. 50 crores per year. This defect
has been partly removed with the help of two surveys: the travel survey
conducted on a continuous basis since 1952 and the survey of unclassified
receipts relating to the quarter (July-September 1955) conducted by the
Reserve Bank of India.
The transportation transactions are good examples of the
remaining two defects mentioned above. Exchange control records
the individual merchandise transaction exactly in the same way as the
transactions are reported to the exchange control which may be either
f.o.b., c.f. or c.i.f. Thus one cannot get a true picture of this item.
Exchange control records only such transactions under this head which
are not included in the merchandise data. Exchange control statistics
do not enter satisfactorily the residual transport transactions included
under the item "transportation." They include certain transactions
which must be excluded altogether and which are often entered on a net
basis, receipts and payments offsetting each other.
The export figures of the data are recorded neither on shipment
basis nor the payment basis. They are recorded when the documents
are received which may occur within varying periods from the date of
shipments. In a majority of cases, however, the figures are entered in
the same month in which the shipment takes place. In case of imports
the figures more or less correspond to the time when the exchange is
released to the non-residents except where the gaps in the statistics are
filled from independent sources (for example, data relating to foreign
aid). The releases of exchange may however not coincide with the time
when the goods arrive in India and to that extent the statistics
presented are not fully accurate. With regard to other transactions the figures
correspond to the time of transfer of funds to and from India.
GROWTH OF STATISTICS IN INDIA 965
Although the source of the statistics of India's balance of pay-
ments, that is, exchange control records, suffers from above mentioned
defects, there seems to be at present no way out except to accept them as
the final choice. Although the Reserve Bank of India is making efforts
to improve the quality of these statistics by conducting surveys from
time to time, we cannot as yet say that our balance of payments
statistics are fully dependable.
Apart from the data available from the Reserve Bank about India's
balance of payments, certain statistics relating to merchandise imports
and exports (based on customs data) are published by the Director General
of Commercial Intelligence and Statistics. However, figures of trade
in merchandise as derived from customs and the figures published by
the Reserve Bank, which are derived from the Exchange Control
Department, vary considerably because of differences in timing, coverage
and basis of valuation. The customs data relate to the physical movement
of the commodities whereas the documents of the Exchange Control
Department relate to actual "payments" rather than to "accruals".
The major cause of discrepancy between the two sets of figures lies
in the different basis of valuation adopted. The data based on customs
returns are more useful for a study of the balance of trade than
the balance of payments, and as such one has to rely almost exclusively
on the Reserve Bank data so far as the balance of payments is concerned.
It can reasonably be hoped that as a result of the efforts which are being
made by the Reserve Bank of India the balance of payments statistics
of our country will improve in future.
SECTION 9
NATIONAL INCOME STATISTICS
Importance
National income statistics along with their various breakdowns are
the most important statistical measures of the economic activity of a
nation and are very useful in analysing current economic conditions. If
these statistics are available in the shape of a time series their importance
increases considerably, for then it is possible not only to describe
a nation's economy at a particular time but also to compare the changes
in it at different periods. National income statistics thus throw light
on the relative importance of the various sections of a nation's economy.
The contribution of the various component parts of a country's economy
towards the national income and the statistics of per capita income
in different sectors indicate their relative importance. A balanced
economic development of a country is not possible in the absence of such
facts and figures. The creation of the United Nations Organization
and other international institutions has opened an entirely new avenue
for national income statistics. These statistics play an important role
in the field of international economic relations and are necessary for
international comparisons of the burden of taxation, war efforts, effects
of war and other similar things. The problem of the development
of economically backward countries and the progress made by them
is always studied against the background of national income and per capita
income statistics.
Methods of Calculation
Products Method. The concept of national income can be discussed
from three sides, viz., production, income and expenditure. From the
side of production, national income can be said to represent the national
product or the total of the net values added in a particular period
in all branches of economic activity (including services). Net income
from abroad is also included in the total. If the valuation of the output
of goods and services is done at market prices the national income
is said to be national income at market prices, and if the valuation
is done at prices which are equal to the payments received by the various
factors of production only, national income is said to be national income
at factor cost. In the latter case (national income at factor cost)
the net values of output are calculated net of indirect taxes but subsidies
are included so that the price is just equal to the payments made
to the various factors of production. Another point that should be noted
in this connection is that the output must be the result of the labour
and capital invested by the residents of the country only. The term
residents is in itself ambiguous and many difficulties arise when, say,
the permanent residence of a person is in one country, his place of
work in a second country and the location of his employer in a third country.
There is no uniformity in the procedure followed in this respect
by various countries.
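
The relation between the two valuations described above can be put as a small calculation. The following is only a minimal sketch: the function name and all the figures are hypothetical and are used merely to show the arithmetic of taking out indirect taxes and adding back subsidies.

```python
def national_income_at_factor_cost(at_market_prices, indirect_taxes, subsidies):
    """Convert a market-price total to factor cost.

    Indirect taxes are taken out and subsidies are added back, so that the
    result equals the payments made to the factors of production.
    """
    return at_market_prices - indirect_taxes + subsidies


# Hypothetical figures (Rs. crores), chosen only to illustrate the arithmetic.
market_price_total = 10_000.0
factor_cost_total = national_income_at_factor_cost(
    market_price_total, indirect_taxes=800.0, subsidies=100.0
)
print(factor_cost_total)
```

With these assumed figures the factor cost total is lower than the market price total by the excess of indirect taxes over subsidies.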
Incomes Method. From the side of income the term national
income refers to the total of the distributive shares. In other words,
it is equal to the aggregate of the payment accrued to various factors
of production in a particular period in the shape of wages and salaries,
interest, rent, profits, etc. National income according to this method
is calculated by totalling the income of all the residents of a country
during a specified period. Here again the term income is rather ambiguous.
Gross income is not difficult to calculate but there is
a diversity of opinion with regard to the items which should be deducted
from the figure of gross income to arrive at net income.
Expenditure Method. From the side of expenditure the term
national income is equivalent to net national expenditure. In other
words, it is equal to the total expenditure on final consumption goods
plus investments (both domestic and foreign) and hoardings, if any.
In this field also various terms like expenditure, final consumption
goods, savings, hoardings and investment have different interpretations.
Difficulties in the calculation of India's national income
After dealing briefly with the importance of national income
statistics and the concept of the same, we now propose to deal with the
special difficulties that are felt in the calculation of the national income
of India. Ordinarily there are difficulties in the calculation of national
income in all countries, but in our country the difficulties are of a peculiar
nature. Broadly speaking we can divide the difficulties into two main
categories, viz., those which arise due to the inadequacy and inaccuracy
of statistical data and those which arise due to the peculiar character
of our economy.
Data Inadequate and Inaccurate. Statistical data relating to output
and cost of agricultural and industrial production are hopelessly inadequate
in our country. We have, no doubt, some statistics with regard
to output of agricultural commodities but this information is also
inadequate and incomplete. Data regarding the production of milk,
meat, vegetables, fruits, etc., do not exist, and on the whole, our food
statistics are unsatisfactory. In the field of industrial production, the
situation has improved only very recently, when the Census of Manufacturing
Industries Rules were passed. We have now fairly good
statistics of the output of various large scale industries of the country.
However, at present, we are collecting statistics of only 29 industries
out of the 63 industries that we have. It is necessary to collect statistics
with regard to the remaining industries also. Cottage and small
scale industries present another sphere where few statistics exist. These
industries occupy an important place in the economic life of our country
and we cannot afford to ignore them. Thus we find that the situation
with regard to output statistics is not satisfactory in our country. The
condition of the statistics of cost is worse than that of output statistics.
We do not have any data with regard to cost of agricultural output
and even with regard to the cost of industrial output, the situation
is not very happy. It is well known that statistics of output and cost
are very important and essential statistics from the point of view of the
calculation of national income but unfortunately these statistics are not
satisfactorily collected in our country and this adds to the difficulties
in the calculation of national income, as in the absence of complete
statistics, sample surveys have to be conducted to fill in the gaps.
The situation in other spheres is equally bad. Statistics of occu-
pational classification are not at all satisfactory. The calculation of the
working population is not easily possible in our country and the estimation
of national income by the rather customary method of industrial
origin cannot be conveniently and accurately done. There has been no
uniformity with regard to the classification of population in different
censuses in our country and sometimes the classification has been very
incomplete and unsatisfactory. In the census of 1951, however, an
attempt was made to collect statistics about occupations and economic
status in a more well-defined manner than was done in previous cen-
suses.
Statistics of income, consumers' expenditure, investment, hoarding
and capital formation are very meagre in our country. These statistics
are very important in the estimation of national income of a
country and their inadequacy and gross inaccuracies make the task of
national income computation very difficult. In fact these statistics
practically do not exist in our country and that is the reason why at
least for some years to come we cannot think of estimating India's national
income by the method of "Census of Incomes" or by the method
of "Census of Expenditure, Investments and Savings."
Besides these items, our statistics of exports and imports, balance
of payments, and other business and financial statistics are neither very accurate
nor adequate. In a nutshell, what Bowley and Robertson said
about 20 years ago still holds good to a certain extent. They were of the
opinion that most of our statistics are unnecessarily diffuse, gravely
inexact, incomplete or misleading .... and further that the "situation
cries out for overhaul" .... under the control of a well-qualified sta-
tistician.
Besides these difficulties which arise due to non-availability of
statistical data or their inaccuracy, there are other problems which
arise primarily due to some peculiarities of the Indian economy. These
difficulties are generally felt in the estimation of national income of
under-developed countries. We will examine some of them here.

Barter Economy. One peculiarity of our economic system is the
prevalence of "barter" to a considerable extent. The existence of non-monetary
transactions on a large scale considerably complicates the estimation
of national income. The value of output cannot be easily calculated
under such a situation. On the rural side the barter economy is more important
than the money economy and it becomes more or less necessary
to divide the national income into two main branches, viz., the monetary
section and the non-monetary section.
Defective Occupational Classification. Another peculiarity of our
economy is the absence of a clear-cut line of demarcation between
different occupations. People follow more than one occupation and it
becomes difficult to have a proper occupational classification. Agriculturists,
in general, besides their main occupation of agriculture follow
some other occupation of the nature of a cottage or small scale
industry, and under such circumstances the estimation of national income
by industrial origin presents many difficulties. In fact, from the
point of view of countries like ours there is need of a revision of the
popularly used classification of national income by industrial origin.
Most of our national income is derived from small scale enterprises
(including small scale agriculture) and as such it would be worth while
to have a classification of national income on the basis of the "size and
character of the enterprise" rather than on the basis of different types
of industries.
Regional Diversities. The above mentioned difficulties are aggravated
by the fact that the diversities in different parts of the country are
so great that data which are collected in a particular region cannot be
used to draw conclusions regarding any other region or regarding the
whole of the country. Our country is like a sub-continent and there
are great diversities in the climatic conditions, tastes, habits, customs
and environments of the various parts of the country. Just as statistics
collected in France cannot be utilised to draw conclusions about Germany,
so statistics collected in Bombay cannot be utilised for
drawing inferences about Uttar Pradesh or about the whole of the
country. In fact there is a very great heterogeneity between different
parts of our country and this situation makes the collection of statistics
necessary for the whole of the country and the results of the enquiries
conducted in some parts of the country cannot be utilized, without
considerable risk of inaccuracies, for the analysis of the particular pro-
blem throughout the country.
Indifference of people. Another thing which complicates the situation
is the ignorance and indifference of the Indian people towards
statistics. In other countries, many statistics are collected by investigators
directly from the citizens. People in other countries are statistics-minded
and understand the implications and the importance of the
figures they supply. In our country even the producers of goods are
generally not in a mood to supply information about the output or
cost of the commodities produced by them. This peculiar psychology
of our people is an additional difficulty in the calculation of national
income, for which statistics have to be collected about many problems.
Partition. Last, though not the least, are the difficulties created by
the partition of the country. All the statistics that we had before the
partition of the country are now practically of no use and fresh figures
have to be collected about all the problems connected with national
income.
Technique suitable to Indian conditions
At the very outset it must be made clear that the method suitable
to Indian conditions is more dictated by the availability of statistical
data than by any type of logical reasoning. Various estimates of India's
national income have been made from time to time and in each case the
availability of data at the time of calculation has been the guiding force
in the determination of the technique of calculation.
One more thing which should be mentioned here is that, even
today, out of the three recognized methods of national income calculation
(viz., Census of Products Method, Census of Incomes Method,
and Census of Expenditure, Investment and Savings Method), none
can be exclusively applied in all the sectors of our economy. It is
necessary that a combination of at least the first two methods should be
made, and in some fields the first should be applied and in others the
second. This is what has been actually done so far
by various authors.
Dadabhai Naoroji, Cromer and Barbour, Lord Curzon, Digby, Findlay
Shirras, Shah and Khambhata, Wadia and Joshi, Vakil and Muranjan and Dr.
V. K. R. V. Rao are the important persons who have estimated the
national income of India. Of these, the most popular estimates used
in recent years and recognized internationally are the estimates of Dr.
V. K. R. V. Rao. We shall briefly outline the method followed by him
in his calculations, which relate to the year 1931-32.
V. K. R. V. Rao's Estimates. Dr. Rao combined both the Census
of Products and the Census of Incomes Methods. On the basis of the
occupational census he found out the total number of earners in the
country, the total of whose incomes was to constitute the national
income of India. These people were divided into two main sections:
those whose incomes were to be evaluated by the Products Method and
those whose incomes were to be evaluated by the Incomes Method.
Agriculture, Pastures, Mines, Forest, Fishing and Hunting were the
occupations classified in the first section and for these the Products
Method was used. In the second section the occupations covered were
industry, trade, transport, public forces and administration, professions
and the liberal arts and domestic services. In this section the Census of
Incomes Method was used. Income tax statistics were utilised as far as
possible and for the remaining parts ad hoc surveys were conducted by
Dr. Rao, and other published material was also judiciously used. To
these totals, the income from house property and other miscellaneous
items which could not be specifically classified was added. From this
gross income, money values of goods and services consumed in the
course of production and net increase in the country's foreign indebtedness
were deducted and the national income figure was thus arrived at.
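
The order of the steps described above can be sketched as a short calculation. This is only an illustration of the arithmetic: the function name is hypothetical and every figure is invented, none being taken from Dr. Rao's estimates.

```python
def combined_method_income(products_sectors, incomes_sectors,
                           house_property_and_misc,
                           goods_and_services_used_up,
                           net_increase_in_foreign_debt):
    """Add the two groups of sector totals and the unclassified items,
    then subtract the two deductions named in the text."""
    gross = (sum(products_sectors.values())
             + sum(incomes_sectors.values())
             + house_property_and_misc)
    return gross - goods_and_services_used_up - net_increase_in_foreign_debt


# Hypothetical figures (Rs. crores); the sector names follow the text.
products_sectors = {          # valued by the Products Method
    "agriculture": 900, "pastures": 150, "mines": 50,
    "forest": 30, "fishing and hunting": 20,
}
incomes_sectors = {           # valued by the Incomes Method
    "industry": 300, "trade": 250, "transport": 100,
    "public forces and administration": 120,
    "professions and liberal arts": 80, "domestic services": 50,
}
national_income = combined_method_income(
    products_sectors, incomes_sectors,
    house_property_and_misc=100,
    goods_and_services_used_up=300,
    net_increase_in_foreign_debt=50,
)
print(national_income)
```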
Dr. Rao's method of national income calculation was the best
under the conditions and circumstances in which the estimate was made.
His results are considered to be much better than those of his predecessors
because of the many ad hoc enquiries conducted by him to
gather information about many problems on which no statistics existed.
Scheme suggested by Messrs. Bowley and Robertson
In November 1933 the Government of India invited Dr. A. L.
Bowley and Mr. D. H. Robertson to examine the data available for
estimating the national income and wealth of India and to make suggestions
for their improvement. They submitted their report in 1934 and
were of the opinion that the then available statistics of the country were very
scanty and highly defective and unfit for national income estimation.
They suggested a scheme for the estimation of India's national income,
the outline of which is discussed here. They defined national income
as "the money measure of the aggregate of goods and services accruing
to the inhabitants of a country during a year including net decrements
from their individual or collective wealth." The authors discussed both
the Census of Products and Census of Incomes Methods in detail and
recommended a cautious combination of both of them for estimating
the total income of the country. The scheme suggested by them was
primarily based on the Census of Products Method, though a small part
of the scheme relating to urban areas was dependent on the collection of
income figures. They were in favour of distinguishing the rural income
from the urban income as, in their opinion, the nature of products and
the methods of investigation suitable for rural and urban areas were
different from each other.

For estimating the rural income they recommended estimation of the
quantity and value of all goods and services arising from land or rendered
in the villages by the method of intensive surveys in selected
villages. They suggested random sampling for the selection of villages.
For urban income they suggested that in the first instance surveys of
large towns should be conducted on the basis followed in other countries.
They also recommended an Intermediate Urban Population Census.
These three enquiries were to be supplemented by a Census of Production
applied to factories using power, mines and some other industries.
Despite the fact that the scheme suggested by these experts was fairly
comprehensive, no action was taken by the Government on it and we
did not have any estimate of national income on the basis of the scheme
suggested by them.
The three methods of calculation of national income discussed above
can be summarised in the following manner:*

Net national product at              National income at             Net national expenditure
factor cost                          factor cost                    at factor cost

1. Net products:                     3. Wages and salaries incl.     9. Consumers' expenditure on
   (a) Agriculture, manufacturing,      other labour income.            goods and services.
       trade, etc.                   4. Income of unincorporated    10. Government expenditure on
   (b) Banks and other financial        enterprises.                    goods and services (2)
       intermediaries.               5. Rents (1)                   11. Net investments:
   (c) Insurance companies,          6. Interest (1)                    (a) Domestic capital
       pension and social            7. Dividends (1)                       formation:
       security funds.               8. Undistributed profits of            (i) Private.
   (d) Government.                      corporations.                       (ii) Public.
2. Net income from abroad.                                              (b) Net foreign investment.
                                                                    12. Less: Net business
                                                                        insurance premiums.
                                                                    13. Less: Allowances by
                                                                        enterprises for bad debts.
                                                                    14. Less: Indirect taxes
                                                                        minus subsidies.

*Based on 'National Income Statistics', Statistical Office of the United Nations.
(1) Net rent, interest and dividends accruing to (a) persons, (b) social security
funds, pension funds, funds of life insurance companies, and non-profit institutions.
Income received from abroad is included, while income paid abroad is excluded.
(2) Includes all expenditures on noncapital goods and services.

All three definitions or concepts would give the same figure
of national income provided the different items are treated in a like
manner in all the cases. There is, however, a great diversity of
opinion with regard to the meaning of various terms used by different
countries and as such international comparisons should be made with
a great amount of caution. It is not possible, in a short space, to
discuss the various items over which differences of opinion exist, but
the most important of them are the non-monetary items like unpaid
services of housewives, services of durable consumers' goods like
furniture, etc., rental value of owner-occupied houses, farmers' consumption
of their own products and payments in kind. As a general rule
national income is confined to those items only which have a monetary
value, but a better estimate of national income would certainly be one
which includes all types of goods and services. These items are differently
treated by different countries, like many other items (including
international transactions and capital gains and losses) about which
opinions are divided. Generally, however, most of the countries include
services in the calculation of national income though no country
in the world includes all types of services. Unpaid services of housewives
are probably nowhere taken into account in the calculation of
national income.
Now-a-days many countries calculate the national income by using
all the three methods separately or in a combined form. The United States, the
United Kingdom, Australia, France, Sweden and some other countries
calculate national income separately by the three methods and get fairly
consistent results.
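
The statement that the three methods should give the same figure when the items are treated alike can be checked with a small cross-tabulation. The sketch below is only an illustration: the item names are abbreviated from the summary of the three methods, the helper names are hypothetical, and every figure is invented and deliberately chosen so that the three sides agree.

```python
# All figures are hypothetical (Rs. crores), chosen so that the totals agree.
product_side = {
    "net products, all sectors": 9_400,
    "net income from abroad": -100,
}
income_side = {
    "wages and salaries incl. other labour income": 4_000,
    "income of unincorporated enterprises": 3_800,
    "rents": 500,
    "interest": 300,
    "dividends": 200,
    "undistributed profits of corporations": 500,
}
expenditure_side = {
    "consumers' expenditure on goods and services": 8_300,
    "government expenditure on goods and services": 900,
    "net domestic capital formation": 1_000,
    "net foreign investment": -150,
    "less: net business insurance premiums": -20,
    "less: allowances for bad debts": -30,
    "less: indirect taxes minus subsidies": -700,
}


def side_total(side):
    """Sum the items of one side; deductions are entered as negatives."""
    return sum(side.values())


def largest_gap(*totals):
    """Spread between the highest and lowest total, a rough consistency check."""
    return max(totals) - min(totals)


totals = [side_total(s) for s in (product_side, income_side, expenditure_side)]
print(totals, largest_gap(*totals))
```

In practice the three totals seldom agree exactly, and the size of the gap gives a rough idea of how consistently the items have been treated.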
National Income Committee. In August 1949 the Government of
India appointed a committee to calculate the national income of India
and its report has been published. The committee has calculated the
national income for a number of years and the method that it has followed
in its calculations is broadly on the same lines on which Dr. Rao
made his estimates. The method followed by the committee is as
follows:
The committee has also combined the Inventory and the Incomes
Methods like its many predecessors in the line. First of all the com-
mittee has estimated the working force for the year 1948-49 and its
distribution in various occupations. Occupational classification is on
the basis of the classification of the economy by industry (including
agriculture, services, etc.). The Inventory Method or the Census of
Products Method has been applied in the sphere of agriculture, forestry
and animal husbandry, hunting, fishing, mining and industry. The Incomes
Method has been applied in the field of Transport, Trade, Public
Force, Public Administration, Professional and Liberal Arts and Domestic
Services. In some cases like Professional and Liberal Arts, statistics
of consumers' expenditure have also been utilised for the purpose
of income calculation. In urban areas the income from house property
is estimated from the municipal records and in the rural areas the income
is estimated on the basis of an average rate of return based on
the estimated values of houses. After finding out the net income from
various sources in the above manner the committee has made adjustments
for earned income from abroad and has thus finally arrived at
the figure of national income for the year 1948-49.
Comparison of Dr. Rao's and the N.I.C. Methods
Though the methods followed by Dr. Rao and the National Income
Committee appear to be similar in principle, yet there are many
differences in detail. In calculating the working force Dr. Rao included
both working dependents and subsidiary workers and assigned them
weights of l and I' respectively as compared to 1 for the principal earner.
The National Income Committee has included only working dependents
in its calculation of the working force and has excluded subsidiary workers.
The committee was of the opinion that the weights adopted by Dr. Rao
were more or less of an arbitrary character.
The National Income Committee has estimated the income from
industries by the Inventory Method whereas Dr. Rao calculated it by
the Incomes Method. Probably this is due to the fact that now we have
better statistics of industrial production and cost of output than were
available to Dr. Rao. In accordance with the Census of Manufacturing
Industries Rules we collect valuable information which has been utilised
by the committee in its estimates. Besides these there are other points
where the methods of Dr. Rao and the National Income Committee
have differed, though there is no difference of a fundamental character.
Current national income estimates.
After the First and the Final Reports of the National Income
Committee, the CSO regularly published a white paper every year
giving the national income statistics for the previous year. This was
done till 1966 following the pattern and methodology set by the
NIC in its two reports.
In 1967 the CSO brought out the revised series of national
product, 1960-61 to 1964-65. The major changes introduced in the
revised series are:
1. Modifications in the industrial classification for the measure-
ment and presentation of national product estimates.
2. Use of 1961 population census data.
3. More detailed sector-wise estimates for both organized and
unorganized sectors of the economy.
4. The base year for compiling constant price estimates has been
shifted from 1948-49 to 1960-61.
5. All the available sources of data have been utilized to present
a comprehensive series. It has drawn data from the
Government Budgets, Reserve Bank of India, Annual
Survey of Industries, National Sample Surveys and the 1961
Census.
6. A thorough revision has been made of the estimates of agri-
cultural output, yield, live-stock products and by-products.
7. For large scale manufacturing the revised index of industrial
Production (1960=100) has been used.
8. In unorganized sectors like small scale industries, road trans-
port, trade, hotels and other services, national sample
survey data have been supplemented with the Census figures.
In the revised series both the gross and net product at current
and at 1960-61 constant prices have been prepared. For large scale
manufacturing, agriculture and mining the series adopts the 'product
approach' and for the unorganized sectors the 'income approach' is
used. For the construction industry estimates have been prepared by the
'commodity-flow' and 'income' methods.
Margin of error in the estimation of NIC. The national income
committee made an attempt in its final report to estimate the margin
of error in the figures published by it. The type of data available to
the committee, as has been pointed out above, were not very satisfactory
and the margin of error naturally has been estimated by the NIC itself
at a very substantial figure. The Committee could not follow any
recognised technique in the estimation of a statistically valid margin
of error in the absence of suitable data and was compelled to use more
or less arbitrary methods to get some idea of the uncertainties of the sector
totals and the national aggregate. In some cases where estimates
were prepared by two independent methods the difference between
the two gave them an idea of the extent of error. In some cases completely
independent information and estimates were available for certain
States; the NIC checked its own estimates against those of the
States, and the difference between the two gave them an idea of the
margin of error. Occasionally the basic statistics used also gave them
some idea of the sampling error. Thus the figures of earnings used
in the sector of small enterprises showed a coefficient of variation of
the order of 50 per cent. This gave some idea of the margin of error
involved. In many cases, however, estimates were made merely on
subjective grounds. The margins of error estimated by the Committee
vary from 10 per cent in the case of mining, factory establishment, communications,
railways, organised banking and insurance, and Government
services to 33.3 per cent in 'other commerce and transport', professions
and liberal arts, domestic servants, house property and small enterprise.
In agriculture the error was estimated to be 20 per cent and
in forestry and fishing 25 per cent. All these margins of error relate
to the revised estimates of national income for the year 1948-49 and the
Committee estimated that the overall percentage of error should be
taken as 10 for the aggregate domestic product.
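
The rough checks mentioned above (the gap between two independent estimates, and the coefficient of variation of basic data) can be illustrated by a short calculation. The sketch below is not the Committee's procedure, which by its own account was more or less arbitrary. The function names, the sample figures and the sector values are all hypothetical, and the rule for combining sector margins (square root of the sum of squared absolute errors) rests on the assumption, not made in the text, that sector errors are independent of one another.

```python
import math
import statistics


def discrepancy_pct(estimate_a, estimate_b):
    """Gap between two independent estimates as a percentage of their mean."""
    return 100.0 * abs(estimate_a - estimate_b) / ((estimate_a + estimate_b) / 2.0)


def coefficient_of_variation_pct(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def combined_margin_pct(sector_values, sector_margins_pct):
    """Margin of the total, ASSUMING independent sector errors."""
    total = sum(sector_values.values())
    squared = sum(
        (sector_values[name] * sector_margins_pct[name] / 100.0) ** 2
        for name in sector_values
    )
    return 100.0 * math.sqrt(squared) / total


def worst_case_margin_pct(sector_values, sector_margins_pct):
    """Margin of the total if all sector errors ran in the same direction."""
    total = sum(sector_values.values())
    absolute = sum(
        sector_values[name] * sector_margins_pct[name] / 100.0
        for name in sector_values
    )
    return 100.0 * absolute / total


# Hypothetical sector values; the percentage margins are those quoted in the text.
values = {"agriculture": 50.0, "mining": 10.0, "small enterprise": 40.0}
margins = {"agriculture": 20.0, "mining": 10.0, "small enterprise": 33.3}

print(discrepancy_pct(110.0, 90.0))
print(coefficient_of_variation_pct([50.0, 100.0, 150.0]))
print(combined_margin_pct(values, margins), worst_case_margin_pct(values, margins))
```

Under the independence assumption the margin of the total comes out smaller than the weighted sum of the sector margins, which is one reason why an aggregate may be given a smaller percentage of error than most of its parts.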
Deflation of India's income as done by NIC. Statistics of national
income published by the NIC were both at current prices as well as
constant prices. For estimating the domestic product at constant
prices the Committee did not use either the wholesale price or the cost
of living index as a national deflator. Purely on theoretical grounds
national income should be deflated either by using cost of living indices
or by estimating national income in terms of quantum of output.
The Committee attempted to obtain indications of the measure of change
in national real output by covering each important category of distribution
of the national income and estimating it by the method appropriate
for that category. The Committee, however, felt difficulty in this
approach because output had been estimated in certain sectors on the
basis of average earnings and number of workers rather than on the
product approach. However, in all sectors where estimation was based
on the product approach the Committee evaluated physical production
for the current year at the base year's prices as far as possible. This
procedure was applied to the sectors of Agriculture, Animal Husbandry,
Fishery, Forestry, Mining and factory establishment, covering more
than 55 per cent of the net output in 1950-51. The Committee, however,
retained the cost-product ratios of the base year for lack of adequate
data. This naturally was a major source of error in their estimation.
In the small enterprise sector, since output figures could not be obtained,
the NIC used an index of consumers' prices for the purpose of deflation.
In other fields different indicators were used; for example, in the sector
of communications the number of letters etc. carried and the number
of documents sent were used as indicators of the movement of output.
In the case of banks the number of cheques and for insurance the
number of policies were taken as indicators.
It is clear from the above that the procedure followed by the NIC
for deflating national income was not very satisfactory but there can
be no doubt about the fact that the Committee did its best under the
limitations with which it was working. Despite the drawbacks in the
procedure adopted by the NIC the figures of national income based on
the prices of 1948-49 do give some indication of the changes in real
output.
Suggestions for improvement given by the NIC. The final report
of the NIC gave some useful suggestions for the improvement of
national income statistics and pointed out the directions in which these
should be made. Some of the suggestions are given below:
(i) Since the basic weakness in the income approach was found to
be the statistics of occupational distribution, it was suggested that all
wage and salary data should be collected to check up census figures and
to throw light on limitations of one or the other. There had been
some improvement in respect of statistics of occupational distribution
in the census of 1951, but the Committee felt that further
improvements were needed in this direction.
(ii) The available data relating to employment in factories
covered by the Census of Manufactures and by the Labour Bureau, in
corporate business, public enterprise etc., could be improved and
should be brought out on a regular and continuous basis.
(iii) Regular statistics should be collected about employees'
compensation, wages, salaries, provident fund, pension etc., by the
Labour Bureau. The Committee felt that it was easier to collect these
statistics than to collect the statistics of income. The Committee also
suggested that some studies should be undertaken to find out employers'
compensation to farm labour and domestic servants.
(iv) In the agricultural sector statistics needed a lot of improvement
and the reporting agency should be introduced in all non-reporting
areas. The yield from agricultural crops should be estimated
on the basis of crop-cutting experiments for various crops. The
collection of agricultural prices should be systematised and made more
prompt. The Committee suggested that traders' prices at certain types
of markets should be collected instead of village prices or producers'
prices. Regarding certain aspects of agriculture, animal husbandry,
trade, transport and small enterprise the Committee suggested that
structural studies should be undertaken by research institutes and
academic bodies.
(v) The Committee felt that there were many gaps in the income-tax
data due to evasion etc., and it strongly recommended the improvement
of these statistics. The Committee was also of opinion that
agricultural incomes must also be included for income-tax purposes.
(vi) The Committee suggested that statistics should be properly
coordinated and that for this purpose an advisory committee should be
set up and statistical bureaus should be set up in the States for
collecting important data. The Central Statistical Organisation should
perform the job of coordination and guidance.
(vii) For filling in gaps the Committee suggested the strengthening
of the NSS and the maintenance of a close liaison between the
Research Programmes Committee, the Indian Statistical Institute and the
National Sample Surveys for undertaking coordinated and planned schemes
for improvement in national income statistics. The Committee also
suggested that the National Income Unit should continue to undertake the
work of estimation of national income but that for better coordination
the National Income Unit should be transferred from the Ministry of
Finance to the Central Statistical Organisation.
Many of these suggestions have been accepted and even implemented
by the government. The coverage of our statistics and their
quality have both improved. Statistical bureaus have been set up in
different States and the National Income Unit is now a part of the
Central Statistical Organisation, which is doing the work of coordination
of statistics collected in the country and also guiding the work of the
statistical bureaus in various States. Periodically a national income
conference is held in our country to consider the various methodological
improvements in the calculation of the national income of India. Joint
conferences of Central and State statisticians are also held from time to
time and many State statistical bureaus have now started estimating
incomes relating to their States. We thus find that national income
statistics in our country are gradually improving in all directions.
Earlier Estimates of Indian National Income
More than a dozen estimates of India's national income have been
made so far. It is not possible for us to deal with all of them at this
place. However, we give below the important per capita estimates
made by different authors along with the year or years to which they
relate:

                             Year for which        Per capita
    Author                   estimate is made      income in rupees
    ---------------------------------------------------------------
 1. Dadabhai Naoroji         1868                  20
 2. Baring and Barbour       1882                  27
 3. Lord Curzon              1897-98               30
 4. Digby                    1899                  17.5
 5. B. N. Sharma             1911                  50
 6. Vakil and Muranjan       1910-14               58.5
 7. Wadia and Joshi          1913-14               44.5
 8. F. Shirras               1921                  107
 9. Shah and Khambatta       1921-22               67
10. V. K. R. V. Rao          1931-32               65
11. V. K. R. V. Rao          1942-43               114

Need of Caution. A brief analysis of the above figures will clearly
show that there are wide variations in the figures even if they relate to
practically the same period. This means that the figures cannot be
very much relied upon. Comparison of the figures of the per capita
income over different periods should be made with a great amount of
caution. An increase in the per capita income figure does not imply
that there has been economic progress. For studying problems of
this nature it is the real income which should be compared and not
the money income. Due to changes in the value of money itself a
higher money value may in reality mean a lower real value. In terms
of the price level of 1939 the per capita income of 1953-54, which is Rs.
283-9 (at current prices), would be much less than this figure. For
purposes of comparison the various figures of per capita income should be
corrected for price changes over different periods and per capita
income at constant prices should be compared. Another point which
should be remembered in this connection is that the definitions and
concepts used by different authors have not been the same. Some
authors have deliberately excluded "services" from their calculations
while others have included them. There are other similar differences
as well. Yet another point to be noted is that the geographical limits
to which these estimates apply have also been changing. Some authors
have included former Indian States in their calculation while most of
them have excluded them. The separation of Burma from India in
1937 and the creation of Pakistan after Indian Independence have
substantially changed the boundaries of the Indian Union, and the latest
estimates (of the National Income Committee and C.S.O.) have taken
into account the present area of the Indian Union.
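
The distinction between money income and real income drawn above can be
illustrated by a small sketch. The two money incomes are V. K. R. V.
Rao's per capita figures from the table; the price index numbers are
purely hypothetical and are assumed only to show the arithmetic of
correcting for price changes.

```python
# Sketch: comparing per capita incomes at constant prices.
# The price index values are hypothetical and chosen for illustration only.


def real_income(money_income, price_index, base_index=100.0):
    """Express a money income at the prices of the base period."""
    return money_income * base_index / price_index


money_income_1931 = 65.0    # rupees, Rao's estimate for 1931-32
money_income_1942 = 114.0   # rupees, Rao's estimate for 1942-43

price_index_1931 = 100.0    # base period (assumed)
price_index_1942 = 200.0    # assumed doubling of prices (hypothetical)

real_1931 = real_income(money_income_1931, price_index_1931)
real_1942 = real_income(money_income_1942, price_index_1942)

# Money income rose, but under the assumed price rise real income fell.
print(real_1931, real_1942)
```

Under these assumed index numbers the higher money figure corresponds
to a lower real income, which is why only constant-price figures should
be compared.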
Special features of India's national income
The most important characteristic of the Indian economy as
revealed by the figures published by the National Income Committee
and C.S.O. is its essentially rural character. Agriculture and allied
occupations contributed roughly 49% of the national income in 1960-61.
Another very important fact revealed by these figures is the prominence
of small enterprise in this country. Small enterprises, consisting of
agriculture (other than plantations), fishery, small enterprises and hand
trades, professions and liberal arts and domestic services, contributed
more than 60% of the national income, whereas the larger enterprises,
consisting of agriculture (plantation, etc.), forestry, mining, factory
establishments, railways, communications and organised banking and
insurance, contributed only 15% of the national income in 1960-61.
This is a very significant revelation, and should be taken into account
when priorities are fixed and resources allocated. For purposes
of planning a more detailed study of this phenomenon would be very
helpful. Another important breakdown of the national income
indicates the share of the government in the generation of the net domestic
product.
The net output of Government enterprises, which was 2.8 per cent
of the total output in 1948-49, has been gradually increasing and it
reached the figure of 4.2 per cent in 1961-62. The net output of Government
administration has also increased during this period, from 4.6 per cent
in 1948-49 to 6.9 per cent in 1961-62. These figures clearly indicate that the
public sector in the country is gradually and constantly expanding and
the relative importance of the private sector is declining. Yet the share
of the private sector, at 88.9 per cent in 1961-62, is still about eight
times that of the public sector.
Besides the above-mentioned breakdowns there are many other
ways in which the national income figures can be looked at. At
present, in the absence of complete statistical information, such analysis
is not possible in our country. However, the information would be
of use for future analysis. The first thing about which information should
be collected in future is the extent of the non-monetised economy in our
country. National income is the money measure of goods and services
produced in a certain period in a country and it has to take into account
both the monetised and the non-monetised sections of the national
economy. However, if the non-monetised section constitutes a big part
of the whole economy the calculation of its monetary worth becomes
very difficult. In our country the prevalence of barter, the payment
of wages in kind and the farmer's consumption of his own produce are
all responsible for the fact that an important part of our economy is
still non-monetised. The actual extent of this phenomenon is
a point which requires detailed investigation. It is felt that more than
half of our production (agricultural production, in any case) belongs to
the non-monetised section and less than half is exchanged for
money and comes to the market. Another point which requires careful
consideration in future is the division of income between
two main categories, viz., urban and rural. The distinction between urban
and rural incomes is rather difficult to make. The definitions of the
terms cannot be easily framed. Whether it is the income generated in
rural areas or the income earned by the inhabitants of such
areas which should be called rural income is a question not easy to
answer. It may be mentioned in this connection that in the year 1934
Bowley and Robertson had recommended the estimation of India's
national income in two main groups, viz., rural income and urban
income, but the recommendations could not be implemented by the
government. It is necessary that we should study this problem and try
to find out the share of these two important sectors in the income of
the country.
Another important breakdown of the national income which
should be properly analysed and studied in future is with regard to
consumer's expenditure, savings, and investment. In other countries these
breakdowns are available in great detail and family budget studies are
very exhaustively made to find out the consumer's preferences and
necessities. Such studies are very essential for purposes of planning.
In conclusion it may be said that there are hopeful developments
which indicate improvement in national income statistics in the
country. The preparation and publication of the official estimates of
national income is now a continuing task of the CSO. It has a
permanent National Income Unit (NIU) with the exclusive responsibility
in this area. A uniform, scientific and detailed classification has
been prepared, and standard methods of estimation have been worked
out. The labours of the NIU shall fill up a long-standing gap in the
available statistical information in the country.
National income, national accounts, social accounts
Man has always tried to develop an orderly understanding of his
affairs. In his quest, he moved from personal to institutional, and
from national to social viewpoints.
National income statistics are the first major attempt to present
the sum total of all the income of a country generated from the
productive activity of the citizens working within the country and
even in foreign countries. Among other things, this proved to
be a useful indicator of the participation of a country in world
production.
National product statistics are another way of looking at
national income statistics. All incomes of the citizens of a
country arise when they receive payment for their contribution to
production activity. This income for the contribution to production
activity is the payment made to them as the providers of the
factors of production. Looked at this way, national income is identical
with the national product because the value of the production in
a nation will be the same as the payment made to the factors which
contributed towards this production. Hence, national income will
be equal to the value of all products at factor cost. The value of
production may be measured at factor cost, i.e., payment
to the factors of production; or at market price, i.e., the price which
users paid for these products. So we learn that:
1. National income is the same as net national product at factor
cost.
2. Net national product at factor cost plus depreciation is
equal to the gross national product at factor cost.
3. Gross national product at factor cost plus indirect taxes
(net) is equal to the gross national product at market price.
This is popularly mentioned as the GNP.
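
The three relations above amount to two additions, which may be sketched
as below. The rupee figures are hypothetical and the function names are
introduced here only for illustration.

```python
# Sketch of the identities linking national income to GNP at market price.
# All figures are hypothetical (say, crores of rupees).


def gnp_at_factor_cost(nnp_at_factor_cost, depreciation):
    """Relation 2: NNP at factor cost plus depreciation."""
    return nnp_at_factor_cost + depreciation


def gnp_at_market_price(gnp_factor_cost, net_indirect_taxes):
    """Relation 3: GNP at factor cost plus indirect taxes (net)."""
    return gnp_factor_cost + net_indirect_taxes


national_income = 1000.0       # relation 1: equal to NNP at factor cost
depreciation = 60.0            # hypothetical
net_indirect_taxes = 90.0      # hypothetical (indirect taxes less subsidies)

gnp_fc = gnp_at_factor_cost(national_income, depreciation)
gnp_mp = gnp_at_market_price(gnp_fc, net_indirect_taxes)

print(gnp_fc, gnp_mp)
```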
The national income and product figures are aggregates and reveal
the totals of income and product. They are useful but unable to
reflect the basic structure of the economy. The reason is that these
statistics do not throw light on the constituents of these totals.
Moreover, the totals of income and products generated in the country
do not show the economic activity which resulted in these totals of
income and product. Economic activity is not an isolated and
stagnant performance. It is a network of flows. It is continuous,
interrelated human effort passing, shifting and transferring income and
products from one activity to another, from production to consumption
and accumulation. The sum total of all these activities makes up the
national economic activity.
National accounts have been developed to present a more faithful
and finished picture of the national economic activity. They seek
to give an orderly understanding of the structure of the economy
and the flow of economic activities in the country. The national accounts
technique tries to bring forward what the nation has been doing to
obtain the income and product totals. In understanding the nature
of national accounts, three things must be clearly made out:
1. National accounts or statements record the transactions (or
activities) in the areas of production, consumption, accumulation and
foreign transactions by sectors.
2. Sectors are the institutional subdivisions of the economy, namely,
by ownership, form of organization, industry, etc.
3. The transactions are presented in the standard design developed
by the science of accountancy. The object is to arrange and structure
the transactions in such a way that they reflect the inflows and outflows
generated by the activities.
We find that national income is the preparation and presentation
of the totals of income and products generated in the country. This
deals with the quantum and concerns itself with the magnitude.
National accounts, on the other hand, seek to detail out these totals
and to create a structure which permits compilation and comparison
of all economic effort and which would reflect the economic activity in
the national economy. The concept of social accounting, in turn, is a
projection of the national accounting approach into the more comprehensive
area of a structure which is so comprehensive that it covers total social
activity so far as it can be quantified. This would try to eliminate
the artificial distinction between economic and non-economic
activities. Social accounts call for a more elaborate structure and are
concerned with comprehensive, orderly and consistent presentation of the
facts of human endeavour in any society. Once the problems of
detailed data collection, classification and computation are solved,
social accounting will truly mirror and measure the national effort and
achievements.
In our country we do not have enough statistics to have a
meaningful set of national and social accounts. However, the C.S.O.
is bringing out a simple set of national accounts for the country. We
need more exhaustive information for the preparation of social
accounts and efforts are being made to fill up the gaps in statistical
data so that such accounts may be prepared for the country.
The National Income Committee attempted to present a sample set
of social accounts of India for the year 1948-49. These accounts were
derived from three kinds of information:

(i) Net domestic product at factor cost built up from the
estimate of the net value added in each branch of activity;
(ii) A detailed classification on economic lines of the transactions
of public authority, that is, of Central Government enterprises, State
Government and local authorities; and
(iii) The transactions between the Indian Union and the rest of the
world.
With these types of information the Committee tried to fill in
numerically the conceptual structure of our national income. It attempted
to distinguish between
(a) the different types of economic activity, in three basic forms:
production, consumption and additions to wealth;
(b) different classes of transactors, particularly between Government
and private sectors and, within the latter, between different forms of
organisation such as enterprises and households; and
(c) different classes of transactions, particularly those which
involved transfer of commodities and services and those which took the
form of simply unilateral payments.
The Committee presented four accounts each for the three sectors
of the economy, namely, enterprises, households and Government.
The four accounts presented for each of these three sectors related to
production (operating account), consumption (appropriation account),
addition to wealth (resting account) and external account.
However, in the final report of the National Income Committee
these accounts were dropped as it was felt that the statistical data
available in the country were not adequate for the presentation of national
accounts. Since then the national account estimates have not been
published by the CSO either. It would be worthwhile to collect
statistics and to present a simple set of national or social accounts because
they reflect in a nutshell the entire economic activities of a country and
give a bird's eye view of the economy as a whole.
SECTION 10
NATIONAL SAMPLE SURVEYS
Beginning. With the attainment of freedom the full impact of the
large gaps in the statistical information available in our country was felt
and an urgent need arose to tone up its quality and quantity. At the
instance of the Prime Minister Pt. Jawaharlal Nehru, Prof. P. C.
Mahalanobis prepared an abstract scheme of organizing a National Sample
Survey (NSS) which was approved in principle by the Government of
India in January 1950. A Directorate of National Sample Survey was
set up under the Ministry of Finance to collect economic and social
information from the whole country on the basis of random sampling.
The general approach in the planning of NSS was entrusted to the Indian
Statistical Institute together with the Gokhale Institute of Politics and
Economics, Poona, running under the direction of Prof. D. R. Gadgil.
As the experience and the field of activity of the Calcutta and Poona
Institutes had been very different it was not unnatural that certain
differences of opinion emerged at the early stages. However, as a result
of joint discussion, a plan for the sample design in its more restricted
sense of allocation, and the methods of selection of the sample units,
etc., was evolved.
Meaning. The idea of using the sampling method for collecting
economic data was not new. It can be said to be a projection of the
"Scheme for an Economic Census of India" prepared by Professors
A. L. Bowley and D. H. Robertson in 1934. Sampling is of course
the only possible approach if it is desired to collect data about various
facts relating to agriculture, industry, trade and services, etc. The only
possible method is to send round investigators to collect information
from a comparatively small number of individual households, that is,
from a sample of households, selected in such a way that it would be
possible to estimate, on the basis of the sample, the required information
for the country as a whole. Where the scope of survey is wide and the
territory to be covered is large, the sample survey is the only practical
means for obtaining information at short intervals. This is especially
true of a rural economy of the size of India covering a vast area of 12.6
lakhs of square miles.
Method. For administrative purposes India was till recently divided
into 29 states (some of which are very small), about 300 districts,
about 2500 tahsils (or equivalent units), 3000 towns, and about 5,86,000
villages in round figures. The total area of India is 1.26 million square
miles broken up into some hundreds of millions of "plots" of land, and
at the 1951 census the population of India was 36 crores (7 crore households)
of whom 6 crores in round figures live in towns and 30 crores in
villages. Turning now to the livelihood pattern we find that less than
one-third of the people are self-supporting and of these again, less than
a third derive their income principally from non-agricultural sources.
As regards the sampling design, it can be stated very briefly that
the rural area of the country was divided into about 250 geographical
strata from which about a thousand sample villages were selected. In
the first three rounds the villages were directly selected within a stratum,
but in subsequent rounds two tahsils were selected in each stratum and
two villages were selected within each sample tahsil (an administrative
unit of about 500 square miles on an average). Finally a sample of
households was taken up for the household enquiry and a sample of
clusters of plots for the land utilization survey. From the third round
the survey was extended to urban areas with a stratification of towns
by size.
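
The two-stage design of the later rounds (two tahsils per stratum, two
villages per sample tahsil) may be sketched as below. The sampling frame
is entirely hypothetical, and simple random selection with equal
probabilities is assumed here only for illustration; the text does not
state the selection probabilities actually used by the NSS.

```python
import random


def two_stage_sample(frame, tahsils_per_stratum=2, villages_per_tahsil=2, seed=0):
    """Select tahsils within each stratum, then villages within each tahsil.

    `frame` maps stratum -> {tahsil -> list of villages}. Equal-probability
    selection without replacement is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, tahsils in frame.items():
        # First stage: choose tahsils within the stratum.
        for tahsil in rng.sample(sorted(tahsils), tahsils_per_stratum):
            # Second stage: choose villages within the selected tahsil.
            for village in rng.sample(tahsils[tahsil], villages_per_tahsil):
                sample.append((stratum, tahsil, village))
    return sample


# Hypothetical frame: 250 strata, 10 tahsils each, 20 villages per tahsil.
frame = {
    f"stratum-{s}": {
        f"tahsil-{s}-{t}": [f"village-{s}-{t}-{v}" for v in range(20)]
        for t in range(10)
    }
    for s in range(250)
}

sample = two_stage_sample(frame)
print(len(sample))  # 250 strata x 2 tahsils x 2 villages
```

With 250 strata this gives 1,000 sample villages, which agrees with the
"about a thousand" mentioned above.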
A general idea of the nature of information collected is given
below:
(a) Sample villages: general economic information, and weekly
prices of selected commodities; rates of daily wages of skilled and
unskilled workers, etc.;
(b) Households: (i) General particulars: age, sex, marital
status, economic and employment status; births, deaths, etc.; details
regarding holdings, use of land under various categories; livestock, real
assets, loans, savings, housing conditions, etc.;
(ii) Consumer expenditure: on a very large number of items;
(iii) Household enterprises: agriculture and animal husbandry,
acreage and production of different crops; particulars of industry, crafts
and trade, including fixed capital, machinery and tools, fuel, power, raw
materials, quantity and value of production, source of finance, etc.;
(c) Utilization of land: survey of sample of revenue plots;
(d) Crop survey: crop acreage and estimates of the yield of crop
per acre by direct crop-cutting experiments; and
(e) Sample survey of manufacturing establishments (with 10
operatives or more with power, or 20 operatives or more without power):
covering practically all groups of industry over the whole of India.
The statistical work of NSS (including the preparation of the
sample design and schedules, the processing and tabulation of the data
and the preparation of the reports) is being done in the ISI. The first
report was published in December, 1952 and since then a chain of
reports and technical papers have been continuously coming out. In
addition, special surveys have been conducted and reports and data in
tabular form have been supplied from time to time to different agencies,
for example, the Fact Finding Committee of the Ministry of
Rehabilitation (survey of economic condition of refugees of West Bengal); the
Press Commission (habits of newspaper reading); the Taxation Enquiry
Commission (household consumption by expenditure levels); the
Ministry of Rehabilitation (survey of refugees in Bombay); the Ministry
of Works, Housing and Supply (survey of housing); and some good
work in connection with the Mysore Population Study conducted by the
United Nations and the Ministry of Health, and so on.
The information is collected in the NSS mainly by the "interview
method" in which the investigators visit each household included in the
sample and make direct enquiries from the householders. In case of
crops and certain other items, the investigators collect the data by their
own direct observations. The investigators are employed on a whole-time
basis and work throughout the year; in addition there is a whole-time
inspecting and auxiliary staff. The total strength of the field
branch of the NSS is about 600, who work under the direct control of
the Department of Economic Affairs, Ministry of Finance.
First Round. The first round of the NSS was started in October,
1950 and was completed in March, 1951. In this round a sample of
1833 villages (out of a total of about 5,60,000 for the whole of India)
was selected for investigation. The sample was divided into two
groups of villages, each of which was scattered throughout the country,
and two different sets of schedules were used in the two groups. One
set (which was prepared by ISI) was employed for collecting
information in the first group of 1189 villages, and the second set of schedules
(prepared by the Gokhale Institute of Politics and Economics) was
used in the other group of 644 villages. Both the schedules covered
comprehensively the demographic and economic characteristics of the
households. The Poona Schedules, however, contained somewhat
less detail. There was one important difference in regard to the
period of time for which the particulars were to be collected. Most of
the information collected in the schedules drawn up by ISI related to
the one-year period from July, 1949 to June, 1950. But most of the
items of the Poona Schedules referred either to the day of visit or to some
short period immediately preceding that day, and as such many items
in them are liable to seasonal variations. It is for this reason that the
Report on the Poona Schedules of the National Sample Survey (1950-51)
states that the tables have been purposefully presented as relating
to only one month (four weeks) or only three days as the case may be
and have not been inflated to be brought to an annual basis. The use of
these two different sets of schedules had this advantage that in the very
first round of the survey it was possible to obtain all-India data for two
different periods (July, 1949 to June, 1950 and September, 1950 to
February, 1951) for the purpose of comparison and verification.
However, this expectation was belied as a result of the highly critical
attitude which the Poona Report adopted about the whole venture.
Subsequent rounds. The second round (April-June 1951) also
covered only the rural areas. In the third round (August-November 1951)
the urban areas (including the big cities of Bombay, Calcutta, Delhi, and
Madras) were also covered. We have noted earlier that in the fourth
round (April-October, 1952) the design for the urban areas remained
broadly the same as in the third round but the design for the rural areas
was completely changed. The fifth round also included industrial
production, which covered all household and non-household enterprises
in both rural and urban areas other than those employing
10 or more workers with power or 20 or more without power on any
day in the year. The sixth round, which commenced in May, 1953,
sought to collect comprehensive information on a wide range of
subjects: consumer expenditure; other household expenditure; possessions
including land, tanks, agricultural implements, livestock, poultry,
etc.; fertility, births, deaths and diseases; manufacturing establishments;
agriculture and animal husbandry; small scale industry and handicraft;
transport, trade, professions and services, etc. With time and
experience the scope of the surveys became wider and included more
and more subjects of enquiry. The 20th round (July 1965-June 1966)
included Goa, Daman, Diu and Pondicherry. The survey now covers almost
all the aspects of the life and living of the people in the country.
In July 1969 the 24th round of the survey started.
Assessment of results. It would be instructive to note here some
of the criteria for the assessment of the results of the NSS. The main
difficulty arises out of the fact that the information on many aspects of
economic conditions in the rural areas of India has been obtained for
the first time. In most cases similar data are not available for the
purpose of direct comparison. There can be three broad approaches which,
of course, are complementary and must be used jointly and
simultaneously. Firstly, the great scientific merit of the sampling method is
that in a properly designed sample survey it is possible to calculate the
margin of error of the results from the sample data themselves. Such
calculations supply a powerful tool to judge the significance of comparisons
based on the NSS estimates. Secondly, in a properly designed sample
survey it is also possible to study (for example, by using an
inter-penetrating network of sub-samples), and sometimes to eliminate, the effect
of non-sampling errors which arise from bias, recording mistakes, and other
disturbing factors operating at the stage of collection of the primary
information. NSS has used this method with considerable success.
Thirdly, special "quality" checks may be carried out by highly trained
and experienced workers (including senior statisticians) who would
themselves go out to the field and directly collect some of the critical
primary data. The results thus obtained can be used to determine the
reliability of the data collected by the regular investigators. All these
methods are internal in the sense that information would be collected
by the NSS itself in many different ways with a view to improving the
reliability of the results. A second broad approach is to use external
checks by comparing the NSS results with data obtained from entirely
independent sources. External checks are of great value, and it is
intended to include test items in the NSS schedules from time to time
with the deliberate intention of using such information for purposes
of comparison with data obtained from independent sources.
988 FUNDAMENTALS OF STATISTICi

In the case of all scientific investigation there exists a final criterion,
namely, the reliability of forecasts made on the basis of present knowledge.
In the case of the NSS also, inferences drawn in a valid manner
on available evidence will have to be judged by future events. The
validity of the NSS results thus will have to be assessed by a piecing
together of a large mass of evidence based partly on internal consistency,
partly on external checks, and ultimately on the accuracy and
social usefulness of the estimates and forecasts. The different strands
of evidence cannot be expected to be all concordant. Some of the results
may be contradictory. It may easily happen, especially in the initial
stages, that certain items of information are reliable while data on certain
other items are still untrustworthy. Even in abstract principle all the
results cannot be always accurate. The theory of probability demands
that forecasts made on the basis of sampling must sometimes prove
wrong. In the case of a big enterprise like NSS some reasonable time
must elapse before a proper assessment can be made of its performance.
All that can be said at present is that the results of the NSS are definitely
encouraging and provocative of thought.
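The remark that forecasts based on sampling must sometimes prove wrong can be made concrete with a small simulation. This is a sketch under assumed conditions (a normal population with a known mean, and the usual large-sample 95 per cent interval); the function name `interval_misses` and all the figures are hypothetical.

```python
import random
import statistics

def interval_misses(true_mean, sd, n, trials, seed=1):
    """Count how often a 95% interval from a sample of size n misses true_mean."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # 1.96 standard errors on either side of the sample mean.
        if not (m - 1.96 * se <= true_mean <= m + 1.96 * se):
            misses += 1
    return misses

trials = 2000
misses = interval_misses(true_mean=50.0, sd=10.0, n=100, trials=trials)
print(f"{misses} of {trials} intervals missed the true value")
```

Even with a perfectly conducted survey, roughly one interval in twenty is expected to miss the true value, so an occasional wrong forecast is no evidence of faulty work.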
So far as the present endeavours go it is hoped that it would be
possible in future to conduct three or four regular rounds of survey
every year with a standard programme of work covering important
sectors of information. These regular surveys would be designed with
a view to supplying, from year to year, repeated or seasonal estimates
of important economic factors to indicate economic trends. It is also
proposed to conduct from time to time, may be once in each year, special
surveys relating to particular subjects such as education, transport,
health, etc., in connection with the Ministries concerned. It is intended
in addition to keep a certain amount of work-load free for ad hoc
enquiries at short notice. In this way it is hoped that the NSS would
supply in future a continuous stream of essential information relating to
the economic progress of the country for purposes of planning.
SECTION 11

THE PRESENT POSITION


Now that we have briefly discussed the statistics relating to some
of the important economic problems of our country we are in a position
to have a correct view of the present position in regard to the statistical
data available in our country. One of the main shortcomings of
the Indian statistics has been their inadequacy and incompleteness. We
have already seen how statistical data relating to agricultural and industrial
production, prices, wages, and cost of living, etc., are not complete.
Our statistics of income, wealth, indebtedness, etc., are also very meagre.
As early as 1925 the Indian Economic Enquiry Committee had pointed
out that so far as statistics relating to finance, population, trade, transport
and communication, education and vital statistics are concerned the
statistics are more or less complete while statistics relating to agriculture,
dairy farming, forest, fisheries, minerals, large-scale industries and
small-scale industries are satisfactory in some respects but incomplete or
totally wanting in others. They had further observed that statistics relating to
income, wealth, cost of living, indebtedness, wages and prices were not
at all satisfactory. These observations of the Indian Economic Enquiry
Committee were perfectly true till even recently. Even today
when so many improvements have been made in the coverage of Indian
statistics the situation is far from satisfactory. The Government is no
doubt doing its best to collect statistics on as many problems and as
quickly as possible, but a lot of work has yet to be done. The National
Sample Surveys have no doubt expedited the collection of statistics in
our country and it can be expected that as a result of the activities of this
organisation the gaps in the coverage of Indian official statistics would
be filled up soon.
Due to the integration of the former Indian states the gaps in the
geographical coverage of the statistics are gradually being filled up and
trained staff is being appointed in those areas which had formerly no
agency for the collection of statistics. We have already pointed out
that in the permanently settled areas which had no permanent staff for
the collection of statistics, trained investigators are being appointed to do
the job. Similarly, in former Indian states, investigators have been
appointed to do the work of collection of statistics. These improvements
are of a far-reaching effect and we can reasonably hope that before
long the statistical data available in the country would be adequate for
all practical purposes.
Another drawback of our statistics is that whatever data are available
are not always accurate. The agency for the collection of official
data is not always trustworthy. We have pointed out how the primary
reporting agencies in the villages do not do their work satisfactorily
due to a variety of reasons. In this direction also certain improvements
have been effected in recent years and the government is gradually
appointing Intelligence Inspectors and other trained personnel for the
collection of statistics as also for their proper analysis and interpretation.
Our statistics besides being inadequate and inaccurate are not properly
analysed. Scientific methods are not always applied for the analysis
of statistical data. As has been mentioned earlier most of our statistics
result from the administrative activity of the Government and statistical
analysis suitable for administrative purposes is not necessarily suitable
for economic interpretations also. Official statistics before being used
for other purposes should be properly analysed, otherwise they may give
misleading conclusions. It is gratifying to note that the government is
now taking steps for the proper analysis and interpretation of the statistics
collected by it. Various statistical units attached to different ministries
at the Centre as also in the States now do the job of analysis and
interpretation of data. These statistical units usually consist of trained
personnel and there has been a considerable improvement in the analysis
and interpretation of official statistics in our country in recent years.
Besides these drawbacks, another shortcoming of our statistics
has been that their scope, significance and methods of compilation are not
always fully known. Some years ago the Economic Adviser's Office under
the Ministry of Commerce and Industry published three "Guides to
Current Official Statistics." These publications related to the official
statistics about production, trade, transport and public finance, etc. It
is necessary that these "guides" be brought up to date. In recent years
numerous changes have taken place in the technique and organisation
of the collection of statistics and it is necessary that the general public
should know how various official statistics are collected and what their
scope and agency of collection are. However, in recent years most of the
government publications which contain statistics collected by the official
agencies also give in brief the technique, scope and methods of their
compilation. Certain footnotes and explanatory memoranda or introductory
notes are also given in these publications for further clarification
of the data collected.
Yet another shortcoming of the Indian statistics has been lack of
co-ordination. There was till recently, and to a certain extent even now,
duplication and overlapping in the collection of statistics. Moreover,
statistics collected were not properly co-ordinated. However, this
defect has been removed to a considerable extent in recent years. In
most of the states the work of the various statistical units attached to
different ministries is co-ordinated by the special statistical organisations
and at the Centre the Central Statistical Organisation co-ordinates the
work of the various state organisations as also of the statistical units
attached to the ministries at the Centre.
ance. The utility of statistical data is considerably reduced and in many
cases the statistical data become more or less useless for all practical
purposes if they are not published in time. Though some improvement
has been noticed in recent years so far as this drawback is concerned
yet many government publications see the light of the day when it is
too late to utilise them. They have only an academic value and
businessmen, economists, and other persons usually treat them as things of
the past. Even our crop forecasts come very late and many times they
are published when the crop has been harvested and is actually in the
market. The delay in the publication of government statistics arises
partly on account of the fact that it takes a long time to collect the
statistics relating to the whole country and it is not surprising that
statistics of an all-India character are not published in time. Now that the
statistical organisation of the country has been decentralized and most of the
ministries at the Centre as well as in the States have their own statistical
organisations, statistics relating to different subjects and different regions
can be made available without much delay. In fact some of the ministries
have started bringing out monthly publications in which recent
data collected and analysed by them are available. It is necessary that
attempts should be made to bring out the statistics collected within a
reasonable time so that they serve the purpose for which they are meant.
The above survey of the past and the present position of the statistical
data available in our country clearly indicates that in recent years
the situation has considerably improved, particularly after the independence
of the country and more particularly after the beginning of the
first Five Year Plan. Serious attempts have been made by the government
to improve the statistics relating to various types of problems. It
does not, however, mean that there is no scope for further improvement.
In fact there is considerable scope for further improvement, and the economic
development of the country in future and the success of the
Five Year Plans would depend to a considerable extent on the availability
and proper analysis and interpretation of statistical data relating
to various economic problems.

Questions
1. Write a brief critical note on the 1951 census of population.
(B. Com., Allahabad, 1952).
2. Discuss the possible value of Census Reports to producers, manufacturers
and businessmen. How can the Indian Census Reports be made more useful to these
people? (M. Com., Allahabad, 1948),
(B. Com., Nagpur, 1945).
3. Discuss the main features of population statistics in India. What
suggestions would you offer to make them more reliable and useful? (M.A., Allahabad, 1951).
4. "The Phoenix system is in fact a financial mistake as well as an intellectual
crime." (Census Commissioner, 1941).
How far do you agree with the above criticism of the Indian census? In what
respect is the system of conducting the 1951 census an improvement over the
1941 census? (M. A., Punjab, September, 1951).
5. What statistical data are necessary for calculation of the net reproduction
rate? What is the deficiency in the existing Indian data in this respect?
(M. A., Allahabad, 19'1).
6. "The available agricultural statistics in India are incomplete and inaccurate
in so far as (a) data relating to acreage and production in non-reporting areas are
wanting, (b) information in respect of permanently settled areas is far from
satisfactory, and (c) the level of accuracy of output figures leaves much to be desired".
Comment on this statement, suggesting measures for effecting improvements
in each of these directions. (B. Com., Allahabad, 1949 and 1951).
7. Define a normal yield and describe the official method of determining it.
What do you consider to be the defects of the method and how would you remove
them? (M. A., Raj., 1950)
8. What do you understand by the term "Indian Agricultural Statistics"?
Outline their shortcomings and give concrete suggestions to remedy them.
(M. A., Raj., 1951.)
9. Why are agricultural statistics in the temporarily settled areas in India
said to be comparatively more reliable than those in the permanently settled areas?
(M. A., Punjab, April, 1951).
10. Write a lucid note on either the system of crop forecasting in India or
the adequacy and reliability of data available on agricultural prices and wages in
India. (M.A., Punjab, September, 1952).
11. What important statistics of food production are available in India? How
are they compiled, and in what official publications are they found?
(M. Com., Allahabad, 1952)
12. What is meant by Census of Production? Why is such census taken?
How far is the Industrial Statistics Act adequate from the point of view of
holding this census in India? (M. Com., Allahabad, 1947).
13. Write a lucid note on the nature and scope of industrial statistics in
India. (B. Com., Allahabad, 1953).
14. Explain the importance of "Price Statistics", and examine the nature and
scope of the data relating to them available in India. (M. Com., Allahabad, 1947).
15. Which publication would you consult to find (i) the number of co-operative
and land mortgage banks in different Indian States, (ii) the rail-borne inland
trade for the U. P., (iii) the annual absorption of small coins in India, and (iv) the
wheat area irrigated in U. P. and Bihar? (M. A., Allahabad, 19P).
16. What statistics are necessary to keep under review the purchasing
power of a country's currency? Write a critical account of such statistics in
India. (M. A., Agra, 1945).
17. What are the methods usually adopted for measuring the national income
of a country? Which of these in your opinion is suited to Indian conditions?
(M. A., Allahabad, 1950).
18. What is national income? What statistical methods of its estimation are
known to you? Give a lucid account of the method actually adopted in any of the
recent official or non-official estimates framed for India.
(M.A., Punjab, April, 1952).
19. What are the special problems of national income estimation in India?
Describe briefly the various methods followed for the calculation of Indian income.
(M. Com., Allahabad, 1952).
20. Describe briefly the method followed by the National Income Committee
for framing estimates of national income of India 1948-49. How far does this method
differ from the one recommended by the Bowley Robertson Committee?
(B. Com., Allahabad, 1911)
21. Write notes on :-
(a) N.S.S.
(b) C.S.O.
(c) National Income Unit;
(d) Reserve Bank Index of Security Prices;
(e) Consumer Price Index Numbers (Labour Bureau);
(f) Economic Adviser's Index Number of Wholesale Prices (now interim
series).
22. 'Planning without statistics is a ship without rudder and compass.' In
the light of this statement explain the importance of statistics as an effective aid to
national planning in India. (B. Com., Banaras, 1958).
23. In a communique issued by the Government of India the main features of
the 1961 Census have been described as the collection of economic data for planning
purposes, which indicates that the preparations for the huge task are now afoot. In
the light of the above objective what suggestions can you give for the next population
census of India?
MATHEMATICAL TABLES
Logarithms

                                                     |     Mean Differences
      0    1    2    3    4    5    6    7    8    9 |  1  2  3  4  5  6  7  8  9

10 0000 0043 0086 0128 0170 0212 0253 0294 0334 0374 |  4  8 12 17 21 25 29 33 37
11 0414 0453 0492 0531 0569 0607 0645 0682 0719 0755 |  4  8 11 15 19 23 26 30 34
12 0792 0828 0864 0899 0934 0969 1004 1038 1072 1106 |  3  7 10 14 17 21 24 28 31
13 1139 1173 1206 1239 1271 1303 1335 1367 1399 1430 |  3  6 10 13 16 19 23 26 29
14 1461 1492 1523 1553 1584 1614 1644 1673 1703 1732 |  3  6  9 12 15 18 21 24 27
15 1761 1790 1818 1847 1875 1903 1931 1959 1987 2014 |  3  6  8 11 14 17 20 22 25
16 2041 2068 2095 2122 2148 2175 2201 2227 2253 2279 |  3  5  8 11 13 16 18 21 24
17 2304 2330 2355 2380 2405 2430 2455 2480 2504 2529 |  2  5  7 10 12 15 17 20 22
18 2553 2577 2601 2625 2648 2672 2695 2718 2742 2765 |  2  5  7  9 12 14 16 19 21
19 2788 2810 2833 2856 2878 2900 2923 2945 2967 2989 |  2  4  7  9 11 13 16 18 20
20 3010 3032 3054 3075 3096 3118 3139 3160 3181 3201 |  2  4  6  8 11 13 15 17 19
21 3222 3243 3263 3284 3304 3324 3345 3365 3385 3404 |  2  4  6  8 10 12 14 16 18
22 3424 3444 3464 3483 3502 3522 3541 3560 3579 3598 |  2  4  6  8 10 12 14 15 17
23 3617 3636 3655 3674 3692 3711 3729 3747 3766 3784 |  2  4  6  7  9 11 13 15 17
24 3802 3820 3838 3856 3874 3892 3909 3927 3945 3962 |  2  4  5  7  9 11 12 14 16
25 3979 3997 4014 4031 4048 4065 4082 4099 4116 4133 |  2  3  5  7  9 10 12 14 15
26 4150 4166 4183 4200 4216 4232 4249 4265 4281 4298 |  2  3  5  7  8 10 11 13 15
27 4314 4330 4346 4362 4378 4393 4409 4425 4440 4456 |  2  3  5  6  8  9 11 13 14
28 4472 4487 4502 4518 4533 4548 4564 4579 4594 4609 |  2  3  5  6  8  9 11 12 14
29 4624 4639 4654 4669 4683 4698 4713 4728 4742 4757 |  1  3  4  6  7  9 10 12 13
30 4771 4786 4800 4814 4829 4843 4857 4871 4886 4900 |  1  3  4  6  7  9 10 11 13
31 4914 4928 4942 4955 4969 4983 4997 5011 5024 5038 |  1  3  4  6  7  8 10 11 12
32 5051 5065 5079 5092 5105 5119 5132 5145 5159 5172 |  1  3  4  5  7  8  9 11 12
33 5185 5198 5211 5224 5237 5250 5263 5276 5289 5302 |  1  3  4  5  6  8  9 10 12
34 5315 5328 5340 5353 5366 5378 5391 5403 5416 5428 |  1  3  4  5  6  8  9 10 11
35 5441 5453 5465 5478 5490 5502 5514 5527 5539 5551 |  1  2  4  5  6  7  9 10 11
36 5563 5575 5587 5599 5611 5623 5635 5647 5658 5670 |  1  2  4  5  6  7  8 10 11
37 5682 5694 5705 5717 5729 5740 5752 5763 5775 5786 |  1  2  3  5  6  7  8  9 10
38 5798 5809 5821 5832 5843 5855 5866 5877 5888 5899 |  1  2  3  5  6  7  8  9 10
39 5911 5922 5933 5944 5955 5966 5977 5988 5999 6010 |  1  2  3  4  5  7  8  9 10
40 6021 6031 6042 6053 6064 6075 6085 6096 6107 6117 |  1  2  3  4  5  6  8  9 10
41 6128 6138 6149 6160 6170 6180 6191 6201 6212 6222 |  1  2  3  4  5  6  7  8  9
42 6232 6243 6253 6263 6274 6284 6294 6304 6314 6325 |  1  2  3  4  5  6  7  8  9
43 6335 6345 6355 6365 6375 6385 6395 6405 6415 6425 |  1  2  3  4  5  6  7  8  9
44 6435 6444 6454 6464 6474 6484 6493 6503 6513 6522 |  1  2  3  4  5  6  7  8  9
45 6532 6542 6551 6561 6571 6580 6590 6599 6609 6618 |  1  2  3  4  5  6  7  8  9
46 6628 6637 6646 6656 6665 6675 6684 6693 6702 6712 |  1  2  3  4  5  6  7  7  8
47 6721 6730 6739 6749 6758 6767 6776 6785 6794 6803 |  1  2  3  4  5  5  6  7  8
48 6812 6821 6830 6839 6848 6857 6866 6875 6884 6893 |  1  2  3  4  4  5  6  7  8
49 6902 6911 6920 6928 6937 6946 6955 6964 6972 6981 |  1  2  3  4  4  5  6  7  8
50 6990 6998 7007 7016 7024 7033 7042 7050 7059 7067 |  1  2  3  3  4  5  6  7  8
51 7076 7084 7093 7101 7110 7118 7126 7135 7143 7152 |  1  2  3  3  4  5  6  7  8
52 7160 7168 7177 7185 7193 7202 7210 7218 7226 7235 |  1  2  2  3  4  5  6  7  7
53 7243 7251 7259 7267 7275 7284 7292 7300 7308 7316 |  1  2  2  3  4  5  6  6  7
54 7324 7332 7340 7348 7356 7364 7372 7380 7388 7396 |  1  2  2  3  4  5  6  6  7
                                                     |     Mean Differences
      0    1    2    3    4    5    6    7    8    9 |  1  2  3  4  5  6  7  8  9

55 7404 7412 7419 7427 7435 7443 7451 7459 7466 7474 |  1  2  2  3  4  5  5  6  7
56 7482 7490 7497 7505 7513 7520 7528 7536 7543 7551 |  1  2  2  3  4  5  5  6  7
57 7559 7566 7574 7582 7589 7597 7604 7612 7619 7627 |  1  2  2  3  4  5  5  6  7
58 7634 7642 7649 7657 7664 7672 7679 7686 7694 7701 |  1  1  2  3  4  4  5  6  7
59 7709 7716 7723 7731 7738 7745 7752 7760 7767 7774 |  1  1  2  3  4  4  5  6  7
60 7782 7789 7796 7803 7810 7818 7825 7832 7839 7846 |  1  1  2  3  4  4  5  6  6
61 7853 7860 7868 7875 7882 7889 7896 7903 7910 7917 |  1  1  2  3  4  4  5  6  6
62 7924 7931 7938 7945 7952 7959 7966 7973 7980 7987 |  1  1  2  3  3  4  5  6  6
63 7993 8000 8007 8014 8021 8028 8035 8041 8048 8055 |  1  1  2  3  3  4  5  5  6
64 8062 8069 8075 8082 8089 8096 8102 8109 8116 8122 |  1  1  2  3  3  4  5  5  6
65 8129 8136 8142 8149 8156 8162 8169 8176 8182 8189 |  1  1  2  3  3  4  5  5  6
66 8195 8202 8209 8215 8222 8228 8235 8241 8248 8254 |  1  1  2  3  3  4  5  5  6
67 8261 8267 8274 8280 8287 8293 8299 8306 8312 8319 |  1  1  2  3  3  4  5  5  6
68 8325 8331 8338 8344 8351 8357 8363 8370 8376 8382 |  1  1  2  3  3  4  4  5  6
69 8388 8395 8401 8407 8414 8420 8426 8432 8439 8445 |  1  1  2  2  3  4  4  5  6
70 8451 8457 8463 8470 8476 8482 8488 8494 8500 8506 |  1  1  2  2  3  4  4  5  6
71 8513 8519 8525 8531 8537 8543 8549 8555 8561 8567 |  1  1  2  2  3  4  4  5  5
72 8573 8579 8585 8591 8597 8603 8609 8615 8621 8627 |  1  1  2  2  3  4  4  5  5
73 8633 8639 8645 8651 8657 8663 8669 8675 8681 8686 |  1  1  2  2  3  4  4  5  5
74 8692 8698 8704 8710 8716 8722 8727 8733 8739 8745 |  1  1  2  2  3  4  4  5  5
75 8751 8756 8762 8768 8774 8779 8785 8791 8797 8802 |  1  1  2  2  3  3  4  5  5
76 8808 8814 8820 8825 8831 8837 8842 8848 8854 8859 |  1  1  2  2  3  3  4  5  5
77 8865 8871 8876 8882 8887 8893 8899 8904 8910 8915 |  1  1  2  2  3  3  4  4  5
78 8921 8927 8932 8938 8943 8949 8954 8960 8965 8971 |  1  1  2  2  3  3  4  4  5
79 8976 8982 8987 8993 8998 9004 9009 9015 9020 9025 |  1  1  2  2  3  3  4  4  5
80 9031 9036 9042 9047 9053 9058 9063 9069 9074 9079 |  1  1  2  2  3  3  4  4  5
81 9085 9090 9096 9101 9106 9112 9117 9122 9128 9133 |  1  1  2  2  3  3  4  4  5
82 9138 9143 9149 9154 9159 9165 9170 9175 9180 9186 |  1  1  2  2  3  3  4  4  5
83 9191 9196 9201 9206 9212 9217 9222 9227 9232 9238 |  1  1  2  2  3  3  4  4  5
84 9243 9248 9253 9258 9263 9269 9274 9279 9284 9289 |  1  1  2  2  3  3  4  4  5
85 9294 9299 9304 9309 9315 9320 9325 9330 9335 9340 |  1  1  2  2  3  3  4  4  5
86 9345 9350 9355 9360 9365 9370 9375 9380 9385 9390 |  1  1  2  2  3  3  4  4  5
87 9395 9400 9405 9410 9415 9420 9425 9430 9435 9440 |  0  1  1  2  2  3  3  4  4
88 9445 9450 9455 9460 9465 9469 9474 9479 9484 9489 |  0  1  1  2  2  3  3  4  4
89 9494 9499 9504 9509 9513 9518 9523 9528 9533 9538 |  0  1  1  2  2  3  3  4  4
90 9542 9547 9552 9557 9562 9566 9571 9576 9581 9586 |  0  1  1  2  2  3  3  4  4
91 9590 9595 9600 9605 9609 9614 9619 9624 9628 9633 |  0  1  1  2  2  3  3  4  4
92 9638 9643 9647 9652 9657 9661 9666 9671 9675 9680 |  0  1  1  2  2  3  3  4  4
93 9685 9689 9694 9699 9703 9708 9713 9717 9722 9727 |  0  1  1  2  2  3  3  4  4
94 9731 9736 9741 9745 9750 9754 9759 9763 9768 9773 |  0  1  1  2  2  3  3  4  4
95 9777 9782 9786 9791 9795 9800 9805 9809 9814 9818 |  0  1  1  2  2  3  3  4  4
96 9823 9827 9832 9836 9841 9845 9850 9854 9859 9863 |  0  1  1  2  2  3  3  4  4
97 9868 9872 9877 9881 9886 9890 9894 9899 9903 9908 |  0  1  1  2  2  3  3  4  4
98 9912 9917 9921 9926 9930 9934 9939 9943 9948 9952 |  0  1  1  2  2  3  3  4  4
99 9956 9961 9965 9969 9974 9978 9983 9987 9991 9996 |  0  1  1  2  2  3  3  3  4
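Any entry of the logarithm table can be checked by direct computation. The sketch below is an illustration only; the helper names `mantissa`, `log_row` and `four_figure_log` are hypothetical and are not part of the tables. It regenerates one row of four-figure mantissas and shows the four-figure logarithm of a number.

```python
import math

def mantissa(n3):
    """Four-figure mantissa for a three-digit number such as 253 (read as 2.53)."""
    return round(math.log10(n3 / 100) * 10000)

def log_row(first_two):
    """One table row: mantissas for first_two followed by third digits 0 to 9."""
    return [mantissa(first_two * 10 + d) for d in range(10)]

def four_figure_log(x):
    """log10 of a positive number rounded to four decimals."""
    return round(math.log10(x), 4)

# Row 25 of the table, printed with leading zeros as in the printed layout.
print(" ".join(f"{m:04d}" for m in log_row(25)))
print(four_figure_log(253.7))
```

The mean-difference columns of the printed table are a rounded interpolation aid for the fourth digit, so a result read from the table may differ by one unit in the last place from the directly computed value.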
ANTILOGARITHMS

                                                      |     Mean Differences
       0    1    2    3    4    5    6    7    8    9 |  1  2  3  4  5  6  7  8  9

.00 1000 1002 1005 1007 1009 1012 1014 1016 1019 1021 |  0  0  1  1  1  1  2  2  2
.01 1023 1026 1028 1030 1033 1035 1038 1040 1042 1045 |  0  0  1  1  1  1  2  2  2
.02 1047 1050 1052 1054 1057 1059 1062 1064 1067 1069 |  0  0  1  1  1  1  2  2  2
.03 1072 1074 1076 1079 1081 1084 1086 1089 1091 1094 |  0  0  1  1  1  1  2  2  2
.04 1096 1099 1102 1104 1107 1109 1112 1114 1117 1119 |  0  1  1  1  1  2  2  2  2
.05 1122 1125 1127 1130 1132 1135 1138 1140 1143 1146 |  0  1  1  1  1  2  2  2  2
.06 1148 1151 1153 1156 1159 1161 1164 1167 1169 1172 |  0  1  1  1  1  2  2  2  2
.07 1175 1178 1180 1183 1186 1189 1191 1194 1197 1199 |  0  1  1  1  1  2  2  2  2
.08 1202 1205 1208 1211 1213 1216 1219 1222 1225 1227 |  0  1  1  1  1  2  2  2  3
.09 1230 1233 1236 1239 1242 1245 1247 1250 1253 1256 |  0  1  1  1  1  2  2  2  3
.10 1259 1262 1265 1268 1271 1274 1276 1279 1282 1285 |  0  1  1  1  1  2  2  2  3
.11 1288 1291 1294 1297 1300 1303 1306 1309 1312 1315 |  0  1  1  1  2  2  2  2  3
.12 1318 1321 1324 1327 1330 1334 1337 1340 1343 1346 |  0  1  1  1  2  2  2  2  3
.13 1349 1352 1355 1358 1361 1365 1368 1371 1374 1377 |  0  1  1  1  2  2  2  3  3
.14 1380 1384 1387 1390 1393 1396 1400 1403 1406 1409 |  0  1  1  1  2  2  2  3  3
.15 1413 1416 1419 1422 1426 1429 1432 1435 1439 1442 |  0  1  1  1  2  2  2  3  3
.16 1445 1449 1452 1455 1459 1462 1466 1469 1472 1476 |  0  1  1  1  2  2  2  3  3
.17 1479 1483 1486 1489 1493 1496 1500 1503 1507 1510 |  0  1  1  1  2  2  2  3  3
.18 1514 1517 1521 1524 1528 1531 1535 1538 1542 1545 |  0  1  1  1  2  2  2  3  3
.19 1549 1552 1556 1560 1563 1567 1570 1574 1578 1581 |  0  1  1  1  2  2  3  3  3
.20 1585 1589 1592 1596 1600 1603 1607 1611 1614 1618 |  0  1  1  1  2  2  3  3  3
.21 1622 1626 1629 1633 1637 1641 1644 1648 1652 1656 |  0  1  1  2  2  2  3  3  3
.22 1660 1663 1667 1671 1675 1679 1683 1687 1690 1694 |  0  1  1  2  2  2  3  3  3
.23 1698 1702 1706 1710 1714 1718 1722 1726 1730 1734 |  0  1  1  2  2  2  3  3  4
.24 1738 1742 1746 1750 1754 1758 1762 1766 1770 1774 |  0  1  1  2  2  2  3  3  4
.25 1778 1782 1786 1791 1795 1799 1803 1807 1811 1816 |  0  1  1  2  2  2  3  3  4
.26 1820 1824 1828 1832 1837 1841 1845 1849 1854 1858 |  0  1  1  2  2  3  3  3  4
.27 1862 1866 1871 1875 1879 1884 1888 1892 1897 1901 |  0  1  1  2  2  3  3  3  4
.28 1905 1910 1914 1919 1923 1928 1932 1936 1941 1945 |  0  1  1  2  2  3  3  4  4
.29 1950 1954 1959 1963 1968 1972 1977 1982 1986 1991 |  0  1  1  2  2  3  3  4  4
.30 1995 2000 2004 2009 2014 2018 2023 2028 2032 2037 |  0  1  1  2  2  3  3  4  4
.31 2042 2046 2051 2056 2061 2065 2070 2075 2080 2084 |  0  1  1  2  2  3  3  4  4
.32 2089 2094 2099 2104 2109 2113 2118 2123 2128 2133 |  0  1  1  2  2  3  3  4  4
.33 2138 2143 2148 2153 2158 2163 2168 2173 2178 2183 |  0  1  1  2  2  3  3  4  4
.34 2188 2193 2198 2203 2208 2213 2218 2223 2228 2234 |  1  1  2  2  3  3  4  4  5
.35 2239 2244 2249 2254 2259 2265 2270 2275 2280 2286 |  1  1  2  2  3  3  4  4  5
.36 2291 2296 2301 2307 2312 2317 2323 2328 2333 2339 |  1  1  2  2  3  3  4  4  5
.37 2344 2350 2355 2360 2366 2371 2377 2382 2388 2393 |  1  1  2  2  3  3  4  4  5
.38 2399 2404 2410 2415 2421 2427 2432 2438 2443 2449 |  1  1  2  2  3  3  4  4  5
.39 2455 2460 2466 2472 2477 2483 2489 2495 2500 2506 |  1  1  2  2  3  3  4  5  5
.40 2512 2518 2523 2529 2535 2541 2547 2553 2559 2564 |  1  1  2  2  3  4  4  5  5
.41 2570 2576 2582 2588 2594 2600 2606 2612 2618 2624 |  1  1  2  2  3  4  4  5  5
.42 2630 2636 2642 2649 2655 2661 2667 2673 2679 2685 |  1  1  2  2  3  4  4  5  6
.43 2692 2698 2704 2710 2716 2723 2729 2735 2742 2748 |  1  1  2  3  3  4  4  5  6
.44 2754 2761 2767 2773 2780 2786 2793 2799 2805 2812 |  1  1  2  3  3  4  4  5  6
.45 2818 2825 2831 2838 2844 2851 2858 2864 2871 2877 |  1  1  2  3  3  4  5  5  6
.46 2884 2891 2897 2904 2911 2917 2924 2931 2938 2944 |  1  1  2  3  3  4  5  5  6
.47 2951 2958 2965 2972 2979 2985 2992 2999 3006 3013 |  1  1  2  3  3  4  5  5  6
.48 3020 3027 3034 3041 3048 3055 3062 3069 3076 3083 |  1  1  2  3  4  4  5  6  6
.49 3090 3097 3105 3112 3119 3126 3133 3141 3148 3155 |  1  1  2  3  4  4  5  6  6
                                                      |     Mean Differences
       0    1    2    3    4    5    6    7    8    9 |  1  2  3  4  5  6  7  8  9

.50 3162 3170 3177 3184 3192 3199 3206 3214 3221 3228 |  1  1  2  3  4  4  5  6  7
.51 3236 3243 3251 3258 3266 3273 3281 3289 3296 3304 |  1  2  2  3  4  5  5  6  7
.52 3311 3319 3327 3334 3342 3350 3357 3365 3373 3381 |  1  2  2  3  4  5  5  6  7
.53 3388 3396 3404 3412 3420 3428 3436 3443 3451 3459 |  1  2  2  3  4  5  6  6  7
.54 3467 3475 3483 3491 3499 3508 3516 3524 3532 3540 |  1  2  2  3  4  5  6  6  7
.55 3548 3556 3565 3573 3581 3589 3597 3606 3614 3622 |  1  2  2  3  4  5  6  7  7
.56 3631 3639 3648 3656 3664 3673 3681 3690 3698 3707 |  1  2  3  3  4  5  6  7  8
.57 3715 3724 3733 3741 3750 3758 3767 3776 3784 3793 |  1  2  3  3  4  5  6  7  8
.58 3802 3811 3819 3828 3837 3846 3855 3864 3873 3882 |  1  2  3  4  4  5  6  7  8
.59 3890 3899 3908 3917 3926 3936 3945 3954 3963 3972 |  1  2  3  4  5  5  6  7  8
.60 3981 3990 3999 4009 4018 4027 4036 4046 4055 4064 |  1  2  3  4  5  6  6  7  8
.61 4074 4083 4093 4102 4111 4121 4130 4140 4150 4159 |  1  2  3  4  5  6  7  8  9
.62 4169 4178 4188 4198 4207 4217 4227 4236 4246 4256 |  1  2  3  4  5  6  7  8  9
.63 4266 4276 4285 4295 4305 4315 4325 4335 4345 4355 |  1  2  3  4  5  6  7  8  9
.64 4365 4375 4385 4395 4406 4416 4426 4436 4446 4457 |  1  2  3  4  5  6  7  8  9
.65 4467 4477 4487 4498 4508 4519 4529 4539 4550 4560 |  1  2  3  4  5  6  7  8  9
.66 4571 4581 4592 4603 4613 4624 4634 4645 4656 4667 |  1  2  3  4  5  6  7  9 10
.67 4677 4688 4699 4710 4721 4732 4742 4753 4764 4775 |  1  2  3  4  5  7  8  9 10
.68 4786 4797 4808 4819 4831 4842 4853 4864 4875 4887 |  1  2  3  4  6  7  8  9 10
.69 4898 4909 4920 4932 4943 4955 4966 4977 4989 5000 |  1  2  3  5  6  7  8  9 10
.70 5012 5023 5035 5047 5058 5070 5082 5093 5105 5117 |  1  2  4  5  6  7  8  9 11
.71 5129 5140 5152 5164 5176 5188 5200 5212 5224 5236 |  1  2  4  5  6  7  8 10 11
.72 5248 5260 5272 5284 5297 5309 5321 5333 5346 5358 |  1  2  4  5  6  7  9 10 11
.73 5370 5383 5395 5408 5420 5433 5445 5458 5470 5483 |  1  3  4  5  6  8  9 10 11
.74 5495 5508 5521 5534 5546 5559 5572 5585 5598 5610 |  1  3  4  5  6  8  9 10 12
.75 5623 5636 5649 5662 5675 5689 5702 5715 5728 5741 |  1  3  4  5  7  8  9 10 12
.76 5754 5768 5781 5794 5808 5821 5834 5848 5861 5875 |  1  3  4  5  7  8  9 11 12
.77 5888 5902 5916 5929 5943 5957 5970 5984 5998 6012 |  1  3  4  5  7  8 10 11 12
.78 6026 6039 6053 6067 6081 6095 6109 6124 6138 6152 |  1  3  4  6  7  8 10 11 13
.79 6166 6180 6194 6209 6223 6237 6252 6266 6281 6295 |  1  3  4  6  7  9 10 11 13
.80 6310 6324 6339 6353 6368 6383 6397 6412 6427 6442 |  1  3  4  6  7  9 10 12 13
.81 6457 6471 6486 6501 6516 6531 6546 6561 6577 6592 |  2  3  5  6  8  9 11 12 14
.82 6607 6622 6637 6653 6668 6683 6699 6714 6730 6745 |  2  3  5  6  8  9 11 12 14
.83 6761 6776 6792 6808 6823 6839 6855 6871 6887 6902 |  2  3  5  6  8  9 11 13 14
.84 6918 6934 6950 6966 6982 6998 7015 7031 7047 7063 |  2  3  5  6  8 10 11 13 15
.85 7079 7096 7112 7129 7145 7161 7178 7194 7211 7228 |  2  3  5  7  8 10 12 13 15
.86 7244 7261 7278 7295 7311 7328 7345 7362 7379 7396 |  2  3  5  7  8 10 12 13 15
.87 7413 7430 7447 7464 7482 7499 7516 7534 7551 7568 |  2  3  5  7  9 10 12 14 16
.88 7586 7603 7621 7638 7656 7674 7691 7709 7727 7745 |  2  4  5  7  9 11 12 14 16
.89 7762 7780 7798 7816 7834 7852 7870 7889 7907 7925 |  2  4  5  7  9 11 13 14 16
.90 7943 7962 7980 7998 8017 8035 8054 8072 8091 8110 |  2  4  6  7  9 11 13 15 17
.91 8128 8147 8166 8185 8204 8222 8241 8260 8279 8299 |  2  4  6  8  9 11 13 15 17
.92 8318 8337 8356 8375 8395 8414 8433 8453 8472 8492 |  2  4  6  8 10 12 14 15 17
.93 8511 8531 8551 8570 8590 8610 8630 8650 8670 8690 |  2  4  6  8 10 12 14 16 18
.94 8710 8730 8750 8770 8790 8810 8831 8851 8872 8892 |  2  4  6  8 10 12 14 16 18
.95 8913 8933 8954 8974 8995 9016 9036 9057 9078 9099 |  2  4  6  8 10 12 15 17 19
.96 9120 9141 9162 9183 9204 9226 9247 9268 9290 9311 |  2  4  6  8 11 13 15 17 19
.97 9333 9354 9376 9397 9419 9441 9462 9484 9506 9528 |  2  4  7  9 11 13 15 17 20
.98 9550 9572 9594 9616 9638 9661 9683 9705 9727 9750 |  2  4  7  9 11 13 16 18 20
.99 9772 9795 9817 9840 9863 9886 9908 9931 9954 9977 |  2  5  7  9 11 14 16 18 20
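The antilogarithm table reverses the logarithm table, and the two together turn multiplication into addition. The sketch below illustrates this under stated assumptions; the helper names `antilog_row` and `multiply_with_logs` are hypothetical and the figures are chosen only for illustration.

```python
import math

def antilog_row(first_two):
    """Row of the antilog table for mantissas .xy0 to .xy9, where xy = first_two."""
    return [round(10 ** ((first_two * 10 + d) / 1000) * 1000) for d in range(10)]

def multiply_with_logs(a, b):
    """Multiply two positive numbers the table way: add four-figure logs, take the antilog."""
    total = round(math.log10(a), 4) + round(math.log10(b), 4)
    return 10 ** total

print(antilog_row(50))
print(multiply_with_logs(25.3, 4.17), "against the exact product", 25.3 * 4.17)
```

Because each logarithm is rounded to four figures, the product found this way agrees with the exact product only to about four significant figures.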

POWERS, ROOTS & RECIPROCALS

 n    n²      n³      √n      ∛n      1/n   √(10n)     1/√n 1/√(10n)

 1     1       1  1.0000  1.0000  1.00000   3.1623  1.00000  0.31623
 2     4       8  1.4142  1.2599  0.50000   4.4721  0.70711  0.22361
 3     9      27  1.7321  1.4422  0.33333   5.4772  0.57735  0.18257
 4    16      64  2.0000  1.5874  0.25000   6.3246  0.50000  0.15811
 5    25     125  2.2361  1.7100  0.20000   7.0711  0.44721  0.14142
 6    36     216  2.4495  1.8171  0.16667   7.7460  0.40825  0.12910
 7    49     343  2.6458  1.9129  0.14286   8.3666  0.37796  0.11952
 8    64     512  2.8284  2.0000  0.12500   8.9443  0.35355  0.11180
 9    81     729  3.0000  2.0801  0.11111   9.4868  0.33333  0.10541
10   100    1000  3.1623  2.1544  0.10000  10.0000  0.31623  0.10000
11   121    1331  3.3166  2.2240  0.09091  10.4881  0.30151  0.09535
12   144    1728  3.4641  2.2894  0.08333  10.9545  0.28868  0.09129
13   169    2197  3.6056  2.3513  0.07692  11.4018  0.27735  0.08771
14   196    2744  3.7417  2.4101  0.07143  11.8322  0.26726  0.08452
15   225    3375  3.8730  2.4662  0.06667  12.2474  0.25820  0.08165
16   256    4096  4.0000  2.5198  0.06250  12.6491  0.25000  0.07906
17   289    4913  4.1231  2.5713  0.05882  13.0384  0.24254  0.07670
18   324    5832  4.2426  2.6207  0.05556  13.4164  0.23570  0.07454
19   361    6859  4.3589  2.6684  0.05263  13.7840  0.22942  0.07255
20   400    8000  4.4721  2.7144  0.05000  14.1421  0.22361  0.07071
21   441    9261  4.5826  2.7589  0.04762  14.4914  0.21822  0.06901
22   484   10648  4.6904  2.8020  0.04545  14.8324  0.21320  0.06742
23   529   12167  4.7958  2.8439  0.04348  15.1658  0.20851  0.06594
24   576   13824  4.8990  2.8845  0.04167  15.4919  0.20412  0.06455
25   625   15625  5.0000  2.9240  0.04000  15.8114  0.20000  0.06325
26   676   17576  5.0990  2.9625  0.03846  16.1245  0.19612  0.06202
27   729   19683  5.1962  3.0000  0.03704  16.4317  0.19245  0.06086
28   784   21952  5.2915  3.0366  0.03571  16.7332  0.18898  0.05976
29   841   24389  5.3852  3.0723  0.03448  17.0294  0.18570  0.05872
30   900   27000  5.4772  3.1072  0.03333  17.3205  0.18257  0.05774
31   961   29791  5.5678  3.1414  0.03226  17.6068  0.17961  0.05680
32  1024   32768  5.6569  3.1748  0.03125  17.8885  0.17678  0.05590
33  1089   35937  5.7446  3.2075  0.03030  18.1659  0.17408  0.05505
34  1156   39304  5.8310  3.2396  0.02941  18.4391  0.17150  0.05423
35  1225   42875  5.9161  3.2711  0.02857  18.7083  0.16903  0.05345
36  1296   46656  6.0000  3.3019  0.02778  18.9737  0.16667  0.05270
37  1369   50653  6.0828  3.3322  0.02703  19.2354  0.16440  0.05199
38  1444   54872  6.1644  3.3620  0.02632  19.4936  0.16222  0.05130
39  1521   59319  6.2450  3.3912  0.02564  19.7484  0.16013  0.05064
40  1600   64000  6.3246  3.4200  0.02500  20.0000  0.15811  0.05000
41  1681   68921  6.4031  3.4482  0.02439  20.2485  0.15617  0.04939
42  1764   74088  6.4807  3.4760  0.02381  20.4939  0.15430  0.04880
43  1849   79507  6.5574  3.5034  0.02326  20.7364  0.15250  0.04822
44  1936   85184  6.6332  3.5303  0.02273  20.9762  0.15076  0.04767
45  2025   91125  6.7082  3.5569  0.02222  21.2132  0.14907  0.04714
46  2116   97336  6.7823  3.5830  0.02174  21.4476  0.14744  0.04663
47  2209  103823  6.8557  3.6088  0.02128  21.6795  0.14586  0.04613
48  2304  110592  6.9282  3.6342  0.02083  21.9089  0.14434  0.04564
49  2401  117649  7.0000  3.6593  0.02041  22.1359  0.14286  0.04518
50  2500  125000  7.0711  3.6840  0.02000  22.3607  0.14142  0.04472

POWERS, ROOTS & RECIPROCALS

n  n²  n³  √n  ∛n  1/n  √(10n)  1/√n  1/√(10n)
51  2601  132651  7.1414  3.7084  0.01961  22.5832  0.14003  0.04428
52  2704  140608  7.2111  3.7325  0.01923  22.8035  0.13868  0.04385
53  2809  148877  7.2801  3.7563  0.01887  23.0217  0.13736  0.04344
54  2916  157464  7.3485  3.7798  0.01852  23.2379  0.13608  0.04303
55  3025  166375  7.4162  3.8030  0.01818  23.4521  0.13484  0.04264
56  3136  175616  7.4833  3.8259  0.01786  23.6643  0.13363  0.04226
57  3249  185193  7.5498  3.8485  0.01754  23.8747  0.13245  0.04189
58  3364  195112  7.6158  3.8709  0.01724  24.0832  0.13131  0.04152
59  3481  205379  7.6811  3.8930  0.01695  24.2899  0.13019  0.04117
60  3600  216000  7.7460  3.9149  0.01667  24.4949  0.12910  0.04082
61  3721  226981  7.8102  3.9365  0.01639  24.6982  0.12804  0.04049
62  3844  238328  7.8740  3.9579  0.01613  24.8998  0.12700  0.04016
63  3969  250047  7.9373  3.9791  0.01587  25.0998  0.12599  0.03984
64  4096  262144  8.0000  4.0000  0.01563  25.2982  0.12500  0.03953
65  4225  274625  8.0623  4.0207  0.01538  25.4951  0.12403  0.03922
66  4356  287496  8.1240  4.0412  0.01515  25.6905  0.12309  0.03892
67  4489  300763  8.1854  4.0615  0.01493  25.8844  0.12217  0.03863
68  4624  314432  8.2462  4.0817  0.01471  26.0768  0.12127  0.03835
69  4761  328509  8.3066  4.1016  0.01449  26.2679  0.12039  0.03807
70  4900  343000  8.3666  4.1213  0.01429  26.4575  0.11952  0.03780
71  5041  357911  8.4261  4.1408  0.01408  26.6458  0.11868  0.03753
72  5184  373248  8.4853  4.1602  0.01389  26.8328  0.11785  0.03727
73  5329  389017  8.5440  4.1793  0.01370  27.0185  0.11704  0.03701
74  5476  405224  8.6023  4.1983  0.01351  27.2029  0.11625  0.03676
75  5625  421875  8.6603  4.2172  0.01333  27.3861  0.11547  0.03651
76  5776  438976  8.7178  4.2358  0.01316  27.5681  0.11471  0.03627
77  5929  456533  8.7750  4.2543  0.01299  27.7489  0.11396  0.03604
78  6084  474552  8.8318  4.2727  0.01282  27.9285  0.11323  0.03581
79  6241  493039  8.8882  4.2908  0.01266  28.1069  0.11251  0.03558
80  6400  512000  8.9443  4.3089  0.01250  28.2843  0.11180  0.03536
81  6561  531441  9.0000  4.3267  0.01235  28.4605  0.11111  0.03514
82  6724  551368  9.0554  4.3445  0.01220  28.6356  0.11043  0.03492
83  6889  571787  9.1104  4.3621  0.01205  28.8097  0.10976  0.03471
84  7056  592704  9.1652  4.3795  0.01190  28.9828  0.10911  0.03450
85  7225  614125  9.2195  4.3968  0.01176  29.1548  0.10847  0.03430
86  7396  636056  9.2736  4.4140  0.01163  29.3258  0.10783  0.03410
87  7569  658503  9.3274  4.4310  0.01149  29.4958  0.10721  0.03390
88  7744  681472  9.3808  4.4480  0.01136  29.6648  0.10660  0.03371
89  7921  704969  9.4340  4.4647  0.01124  29.8329  0.10600  0.03352
90  8100  729000  9.4868  4.4814  0.01111  30.0000  0.10541  0.03333
91  8281  753571  9.5394  4.4979  0.01099  30.1662  0.10483  0.03315
92  8464  778688  9.5917  4.5144  0.01087  30.3315  0.10426  0.03297
93  8649  804357  9.6437  4.5307  0.01075  30.4959  0.10370  0.03279
94  8836  830584  9.6954  4.5468  0.01064  30.6594  0.10314  0.03262
95  9025  857375  9.7468  4.5629  0.01053  30.8221  0.10260  0.03244
96  9216  884736  9.7980  4.5789  0.01042  30.9839  0.10206  0.03227
97  9409  912673  9.8489  4.5947  0.01031  31.1448  0.10153  0.03211
98  9604  941192  9.8995  4.6104  0.01020  31.3050  0.10102  0.03194
99  9801  970299  9.9499  4.6261  0.01010  31.4643  0.10050  0.03178
100  10000  1000000  10.0000  4.6416  0.01000  31.6228  0.10000  0.03162
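
Any row of the table above can be checked by direct computation. The following is a minimal Python sketch, not part of the original text; the function name `table_row` is hypothetical, and the rounding (four decimals for roots, five for reciprocals) is assumed from the printed entries.

```python
import math


def table_row(n):
    """Recompute one row: n, n^2, n^3, sqrt(n), cbrt(n), 1/n, sqrt(10n), 1/sqrt(n), 1/sqrt(10n)."""
    return (
        n,
        n ** 2,
        n ** 3,
        round(math.sqrt(n), 4),           # square root, 4 decimals
        round(n ** (1 / 3), 4),           # cube root, 4 decimals
        round(1 / n, 5),                  # reciprocal, 5 decimals
        round(math.sqrt(10 * n), 4),      # square root of 10n
        round(1 / math.sqrt(n), 5),       # reciprocal of the square root
        round(1 / math.sqrt(10 * n), 5),  # reciprocal of the square root of 10n
    )


if __name__ == "__main__":
    print(table_row(16))
```

The √(10n) column extends the range of the table: for instance, √1.6 can be read as √160 divided by 10.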

SQUARE ROOTS FROM 1 TO 10

                                                                       Mean Differences
       0      1      2      3      4      5      6      7      8      9     1 2 3  4 5 6  7 8 9
1.0  1.000  1.005  1.010  1.015  1.020  1.025  1.030  1.034  1.039  1.044   0 1 1  2 2 3  3 4 4
1-1 1'049 1'054 1'°58 1.063 1·068 1'°72 1'°77 1·082 1:086 "0<)[ Oil 223 344
1'2 1'0<)5 1'100 1' 105 1'109 l'Il4 1'1l8 1'122 1'127 1'13 1 "136 ° I I 223 344
1'3 1'14° 1'145 1'149 1'153 1'158 1'162 1'166 1'170 1'175 {'{79 OIl 223 334
1-4 1' 183 1'187 1'192 1'196 1'200 1'204 1'208 1'212 1'217 1'221 oI I 222 334
1·5 1'225 1'229 1'233 1'237 {'241 1'245 {'249 1'253 1'257 1"261 OIl 222 334
1-6 1' 265 1'269 1'273 1'277 1'281 1' 28 5 1'288 1'292 1'296 ('3OO OIl 222 333
1'7 1'3°4 1'3°8 1'311 1'3 15 1'3 19 1'3 23 1'327 1'330 1'334 1'338 OIl 2::1 2 333
1,8 1'34 2 1'345 1'349 1'353 1'35 6 1'360 1'364 1'367 1'371 1'375 011 122 333
1'9 1'37 8 1'382 1'386 1'38 9 1'393 1'396 1'400 1'4°4 1'4°7 1'411 011 122 333
2·0 1'4 14 1'418 1'421 1'42 5 1'428 1'432 1'435 1'439 1'442 1'446 011 122 233
2'1 ['449 1'453 1'456 1'459 1'46 3 1'466 1'470 1'473 1'476 1'480 oI I 122 233
2,2 1'48 3 1'487 1'490 1'493 1'497 1'500 1'5°3 1'5°7 1'5 1P 1'5 13 011 122 233
2,3 1'5 17 1'5 10 1'52 3 1'526 1'530 1'533 1'536 1'539 1'543 1'546 011 122 233
2.4  1.549  1.552  1.556  1.559  1.562  1.565  1.568  1.572  1.575  1.578   0 1 1  1 2 2  2 3 3
2.5  1.581  1.584  1.587  1.591  1.594  1.597  1.600  1.603  1.606  1.609   0 1 1  1 2 2  2 3 3
2,6 1·612 1·616 1.61 9 1·622 1·625 1·628 1.631 1,634- 1·637 1·640122 oI I 223
2,7 1,64-3 1°64 6 1,649 1'652 1,655 1'6S/) 1 0661 1,664 1'667 1,670122 OIl 223
2·8 1·673 1.676 1·679 1·682 1.685 1,688 1.691 1'694 1'697 1'700112 011 223
2·9 1'7°3 1'706 1°709 1'712 1'7 1 5 1'718 1'720 1'7 23 1'726 1°729[[ 2 Oil 223
3·0 1'732 1'735 1'738 1'74 1 1°744 1'746 1'749 1'75 2 1'755 1'758I I2 011 223
3,1 1'7 61 1'764 1'766 1'769 1'77 2 1'775 1'778 1'780 1'7 8 3 1'786I I 2 ° I I 223
3.2  1.789  1.792  1.794  1.797  1.800  1.803  1.806  1.808  1.811  1.814   0 1 1  1 1 2  2 2 2
3,3 1.817 1'81 9 1·822 1"82 5 1·828 1·83° 10833 1.836 1,838 1'841112 Oil 222
3·4 1·844 1·847 1,849 1.85 2 1·855 1·857 1·860 1'863 1,865 1·8681I 2 ° I I 222
3·5 1·871 1'873 1°876 1·879 1·881 1°884 1·887 10889 1.892 1·895 112 all 222
a'6 {·897 I'goo l'g03 1'90 5 l°goB 1°910 1 °9 13 ['916 1'9 18 1'921 II Z OIl 222
3·7 1'924 1'926 1'9 29 1'93 1 1'934 1'936 1°939 1'94 2 1'944 1'947 II 2 ° I I 222
8·8 1'949 '.95 2 ['954 1°957 1'960 1'962 1°96 5 1'96 7 1'97° 1'97 2 II Z ° I I 222
39 1'975 1'977 1'980 1'982 1'98 5 1'987 1'990 1'992 1'995 1'997 I I2 01 J 222
...·0 2'000 2'002 Z'005 2'007 2'010 2'012 20015 2'017 2'020 2'022 II I 00 I 222
4'1 2' 02 5 2'027 2'°30 2'°3 2 2'035 2'°37 2 0040 2'042 2'045 2'°47 1I I 001 222
4,2 2'°49 2'°5 2 2'°54 2'°57 2'°59 2·062 2°064 2'066 2'069 2'071 I I I 001 222
4·3 2'°74 2'076 2'078 2'OSI 2'083 2·086 2'088 2'090 2'093 2'°95 II I 001 222
4·4 2'098 2'100 2'IO,a 2'1°5 2'107 2'lIO 2°112 2'114 2'117 2'119 II I 00 I 222
4·5 Z'121 2'124- 2'126 20128 2°13 1 2'133 2'135 2'138 2'14° 2'142 I I I 001 222
4·6 2'145 2'147 2'149 2'15 2 2'154 2'156 201 59 2'161 2' 163 2'166 II J 001 22:1
4·7 2'168 2'17° 2'173 2'175 2'177 2'179 2'182 2' 184 2'186 2' 189 I I 1 00 I 222
4·8 2'191 2'193 2'195 2'198 2'200 2'202 2'2°5 2'2°7 2' 209 2'211 I J I 001 222
4·9 2' 21 4 2'216 2'218 2'220 2'223 2'225 2'227 2'229 2'23 2 2'234 I I J 00 I 222
5·0 2·236 2'23 8 2'241 2 0243 2'245 2'247 2'249 2'252 2'254 2'256 001 J I I 222
5-1 2'258 2'261 2' 263 2' 26 5 2'267 2'269 2°272 2'274- 2'276 2 0278 001 I I I 222
S'lI 2'280 2'283 2028 5 2'28 7 2' 289 2'291 2 0293 2'296 2'298 2°3 00 001 I I I 222
5·3 2'3°2 2'30 4 2'3°7 2'3°9 2'3 11 2'3 13 2'3 15 2'3 17 2'3 19 2'3 22 001 I I ! 222
6-4 2 °3 24 2'3 26 2 °3 28 2 °33 0 2°33 2 2'335 2°337 2'339 2'34 1 2'343 001 I I I [22

Square Roots from 1 to 10

                                                                       Mean Differences
       0      1      2      3      4      5      6      7      8      9     1 2 3  4 5 6  7 8 9
5.5  2.345  2.347  2.349  2.352  2.354  2.356  2.358  2.360  2.362  2.364   0 0 1  1 1 1  1 2 2
6,8 2'366 iz'369 2'371 2'373 iz'375 2'377 2'379 ~'381 2'3 83 2'3 8 5 001 1 I I I 2 2
6'7 2'387 12'390 2'392 2'394 2'396 2'398 2'400 2'402 :!'404 :Z'406 001 I I 1 I 2 2
1,8 2'408 ~'410 2'4 12 2'415 ~'417 z'4 19 z'4 21 ~'423 2'4 25 2'42 7 001 I 1 I 1 2 2
1'8 2'429 2'43 1 2'433 2'435 ~'437 2'439 2'441 2'443 2'445 2'447 001 I I 1 I 22
$,0 2'449 2'452 2'454 2'45 6 ~'4S8 2'460 2'462 2'464 2'466 2'468 001 I I I I 2 2
8-1 2'470 l;z'47 2 2'474 2'476 ~'478 2'480 2'482 2'484 2'486 2'488 001 1 I 1 122
8'2 2'~~ P.'492 2'494 2'4'}6 2'498 2'500 2'S02 2'504 2'506 2'508 001 1 I 1 I 2 2
8'8 2'510 j2'S12 2'5 14 2'5 16 2'5 18 2'5 20 2'S22 2'5 24 2'526 2'5 28 001 1 I 1 I 2 2
8,. 2'530 2'53 2 2'534 2'536 2'538 2'540 2'542 2'544 2'546 2'548 001 I I 1 122
6.5  2.550  2.551  2.553  2.555  2.557  2.559  2.561  2.563  2.565  2.567   0 0 1  1 1 1  1 2 2
6.6  2.569  2.571  2.573  2.575  2.577  2.579  2.581  2.583  2.585  2.587   0 0 1  1 1 1  1 2 2
6''1 2'5 88 2',90 2'592 2'594 2'596 Z'598 2'600 2·602 2,604 2,606 001 I I I I 2 2
8,8 2,608 iI'610 2,612 2' 61 3 2.61 5 2,617 2'619 2,621 2' 62 3 2,625 001 I I I 122
8'9 2,627 2'629 2.631 2,63 2 2,634 2,636 2'63 8 2,640 2,642 2,644 001 I I I 12:1
7,0 2·646 2,648 2,65° 2-651 2,653 2,655 2'657 2,659 2,661 2'663 00 I I I I 122
7,1 2,665 2·666 2,668 2,670 12'67 2 2,674 2'67 6 2,678 2,680 2,681 001 I I I 1 J 2
Nt 2,683 2,685 2' 687 2' 689 2'691 2'693 2'694 2,696 2'69~ 2'700 001 I I I I I 2
7'8 2'7°2 2'704 2'706 2'707 ~'709 2'7 lI 2'7 13 2'7 1 5 2'717 2'7 18 001 I I I I I 2
7'. 2'720 2'722 2'7 24 2'7 26 ~'728 2'7 29 2'73 1 2'733 2'735 2'737 001 I I J 1 J 2
7,& 2'739 2'1'40 2'742 2'744 2'746 2'74 8 2'750 2'75 1 2'753 2'755 00 I I I I I I 2
7'8 2'757 2'759 2'760 2'762 2'7 64 2'7 66 2'7 68 2'769 2'77 1 2'773 00 I I I I 1 I 2
7'7 2'775 2'777 2'778 2'7 80 ~'782 2'784 2'7 86 2'7 8 7 2'7 89 2'79 1 001 I J J 1 I 2
7,8 2'793 2:795 2'796 2'798 2,800 2,802 2,804 2' 80 5 2,807 2,809 001 I I I I I 2
7,0 2,8n 2,812 2' 81 4 2,816 2,818 ::'820 2'821 2' 82 3 2' 82 5 2,827 001 I I I I , 2
8,0 2,828 a,830 2,832 2'834 2,835 2,837 2'839 2,841 2'843 2,844 001 I I I I I 2
8'~ 2,846 a,848 2,850 2'8S1 2'853 :1'855 2'857 2'858 2'860 2,862 00' I I J I I 2
8'2 2,864 a' 86 5 2,867 2'869 2,87 1 2,872 2'874 2,876 2,877 2.879 001 I 1 I I I 2
8,8 2,881 a·883 2,884 2·886 2·888 2·890 2.891 2·893 2,895 2,897 001 1 I J I I 2
8'4 2,898 a'goo 2'902 2'903 2'905 2'907 2'909 2'910 2'9 12 2'9 14 001 1 I I 1 I 2
8·6 2'9 15 a'9'7 2'919 2'921 2'9 22 2'924 2'926 2.'927 2'929 2'93 1 001 1 1 I I I 2
8,6 2'933 ~'934 2'93 6 2"938 2'939 :1'94 1 2'943 2'944 2'94 6 2'948 001 I I I I I 2
8,7 2'95 0 a'95 1 2'953 2'955 2'95 6 2'95 8 2'960 2'961 2'9 6 3 2'9 6 5 o 0 I I I I I I 2
a'8 2'9 66 2'968 2'970 2'972 2"973 2'975 2'977 2'97 8 2'980 2'982 00 I 1 J J 1 I 2
8"0 2'983 a'98 s 2'987 2'988 2'990 2'99 2 2'993 2'995 2'997 2'998 001 I I T 1 I 2
9-0 3'000 3'002 3'003 3'005 3'007 3'008 3'010 3'012 3'0 13 3'01 5 000 I I I I I I
9.1  3.017  3.018  3.020  3.022  3.023  3.025  3.027  3.028  3.030  3.032   0 0 0  1 1 1  1 1 1
9.2  3.033  3.035  3.036  3.038  3.040  3.041  3.043  3.045  3.046  3.048   0 0 0  1 1 1  1 1 1
9'3 3'050 3'05 1 3'053 3'055 3'056 3-0 58 3'059 3-061 3'063 3.064 000 I , I 1 I I
0,4 3'066 3,068 3'069 3'?J.I 3'072 3'074 3'076 3'077 3'079 3,081 000 , I I I , 1
9,6 3'082 3' 08 4 3'085 3:'087 3'08 93:090 3'092 3'09.~ 3'r,S 3'097 000 , I I I I I
9'8 3'ogB 3'100 3'102 3' 103 3'105 3;106 3'108 3'11'0 3' II 3' 1I 3 000 I t I 1 I I
9" 3'114 3'116 3'11,8 3'Il9 3'121 3'122 3' 124 3'126 3'127 3' 129 0 0 0 t 1 I I I I
9.8  3.130  3.132  3.134  3.135  3.137  3.138  3.140  3.142  3.143  3.145   0 0 0  1 1 1  1 1 1
9.9  3.146  3.148  3.150  3.151  3.153  3.154  3.156  3.158  3.159  3.161   0 0 0  1 1 1  1 1 1

SQUARE ROOTS FROM 10 TO 100

                                                                       Mean Differences
       0      1      2      3      4      5      6      7      8      9     1 2 3  4 5 6  7 8 9
10 3'162 3'178 3'194 3'209 3'225 3'240 3'256 3'271 3'286 3'302 235 68 9 II 12 14
11 3'3 17 3'33 2 3'347 3'362 3'37 6 3'39 1 3'406 3'421 3"435 3'450 134 679 101213
12 3'464 3'479 3'493 3'5°7 3'5 21 3'536 3'550 3'564 3'578 3'59 2 134 678 10 II 13
13 3,606 3,619 3,633 3,647 3'661 3,674 3'688 3'701 3"715 3'7 28 134 57 8 10 II 12
14 3'742 3'755 3'768 3'782 3'795 3,808 3,821 3,834 3'847 3'860 134 57 8 911 12
15 3'873 3,886 3,899 3'9 12 3'924 3'937 3'950 3'962 3'975 3'987 134 5 68 9 10 II
16 4'000 4'012 4'02 5 4'037 4'05 0 4,062 4'074 4'087 4'099 4'111 124 56 7 91011
17 4' 12 3 4'135 4'147 4'159 4'17 1 4' 183 4'195 4'207 4'219 4'23 1 124 56 7 810 II
18 4'243 4'254 4'266 4'278 4'290 4'3°1 4'3 13 4'324 4'336 4'347 12 3 56 7 8 910
19 4'359 4'370 4'3 82 4'393 4'4°5 4'416 4'4 27 4'43 8 4'450 4'461 12 3 56 7 8 910
20 4'472 4'483 4'494 4'506 4'517 4'5 28 4'539 4'550 4'561 4'572 12 3 46 7 8 910
21 4'5 83 4'593 4,604 4'61 5 4,626 4,637 4,648 4,658 4,669 4'680 12 3 45 6 8 910
22 4,69° 4'7°1 4'712 4'722 4'733 4'743 4'754 4'764 4'775 4'7 8 5 12 3 45 6 7 8 9
23 4'796 4,806 4,817 4,827 4,837 4,848 4,858 4,868 4,879 4'889 12 3 45 6 7 8 9
24 4,899 4'909 4'919 4'930 4'94° 4'950 4'960 4'970 4;980 4'990 I :1 3 456 7 8 9
26 5'000 5'010 5'020 5'03 0 5'040 5'05 0 5'060 5'°70 5'079 5' 08 9 12 3 45 6 7 8 9
26 5'099 5'109 5'119 5'128 5'138 5'148 5'15 8 5' 167 5'177 5' 187 12 3 456 7 8 9
27 5'196 5'206 5' 21 5 5'225 5'235 5'244 5'254 5' 263 5'273 5'282 123 456 7 8 9
28 5'29 2 5'3° 1 5'3 10 5'3 20 5'3 29 5'339 5'348 5'357 5'3 67 5'376 12 3 456 7 7 8
29 5'38 5 5'394 5'404 5'4 13 5'422 5'43 1 5'441 5'450 5'459 5'468 12 3 455 6 7 8
30 5'477 5"486 5"495 5'5 0 5 5'514 5'5 2 3 5'53 2 5'541 5'55 0 5'559 12 3 445 6 7 8
31 5'5 68 5'577 5'5 86 5'595 5' 604 5'612 5,621 5'63° 5'639 5'648 12 3 345 6 7 8
32 5,657 5,666 5,675 5' 68 3 5,692 5'7° 1 5'7 10 5'718 5'727 5'736 12 3 345 6 7 8
33 5'745 5'753 5'7 62 5'77 1 5'779 5'788 5'797 5' 805 5,814 5'822 12 3 345 6 7 8
34 5'83 1 5,84° 5,848 5,857 5,865 5,874 5,882 5,891 5,899 5'908 I,. 3 345 6 7 8
35 5'9 16 5'9 2 5 5'933 5'941 5'95 0 5'95 8 5'967 5'975 5'983 5'99 2 122 34S 6 7 8
36 6'000 6'008 6'017 6' 02 5 6'033 6'042 6'050 6'°5 8 6'066 6'075 122 345 6 7 7
37 6' 08 3 6'09~ 6'099 6'107 6'II6 6' 124 6'13 2 6'140 6'148 6'156 122 345 6 7 7
38 6'164 6'173 6'181 6' 189 6'197 6' 20 5 6' 21 3 6'221 6'229 6'237 [22 345 6 6 7
39 6'245 6'253 6'261 6' 26 9 6'277 6' 28 5 6'293 6'301 6'3 09 6'3 17 122 345 6 6 7
40 0'3 25 6'33 2 6'340 6'34 8 6'35 6 6'3 64 6'372 6'380 6'3 8 7 6'395 122 34S 6 6 -;
41 6'40 3 6'4 11 6'4 19 6'4 27 6'434 6'44 2 6'450 6'45 8 6'4 6 5 6'473 122 345 5 6 7
42 6'481 6'488 6'496 6'5 04 6'5 12 6'5 19 6'5 2 7 6'535 6'54 2 6'55 0 [22 345 5 6 7
43 6'557 6'5 65 6'573 6'580 6'588 6'595 6' 603 6'611 6,618 6'626 122 345 5 6 7
44 6,633 6'641 6,648 6,65 6 6,663 6,67 1 6·678 6,686 6,693 6'701 122 345 5 6 7
45 6'708 6'716 6'7 23 6'73 1 6'73~ 6'745 6'753 6'760 6'768 6'775 I 1 2 344 5 6 7
46 6'782 6'790 6'797 6' 804 6,812 6,819 6,826 6'8341 6 '84 1 6,848 I I 2 344 5 6 7
~7 6,856 6' 86 3 6,87° 6,877 6,885 6,892 6,899 6'9°7 6'9 14 6'921 [ I 2 344 5 6 7
,48 6'9 28 6'935 6'943 6'95 0 6'957 6'9 64 6'97 1 6'979 6'9 86 6'993 I 12 344 5 6 6
49 7·000 7'007 7' 01 4 7'021 7' 02 9 7'03 6 7'°43 7'°5° 7'°57 7' 064 I J 2 344 5 6 6
60 7'07 1 7'07 8 7' 08 5 7'09 2 7'099 7'106 7' I l 3 7'120 7' 12 7 7'134 I J 2 344 5 6 6
51 7'14 1 7'14 8 7'155 7'162 7' 169 7'176 7' J8 3 7'19° 7'197 7'204 I I 2 344 5 6 6
52 7'2U 7'218 7'225 7'23 2 7'239 7'246 7'253 7'259 "266 "273 1 J 2 334 5 6 6
53 7'280 7'287 7'294 7'3° 1 7'3 08 7'3 14 7'3 21 7'3 28 7'335 7'342 112 334 S S 6
54 7'348 7'355 7'3 62 7'369 7'376 7'3 82 7'3 89 7'396 7'40 3 7'409 I I 2 334 5 5 6

SQUARE ROOTS FROM 10 TO 100

                                                                       Mean Differences
       0      1      2      3      4      5      6      7      8      9     1 2 3  4 5 6  7 8 9
55   7.416  7.423  7.430  7.436  7.443  7.450  7.457  7.463  7.470  7.477   1 1 2  3 3 4  5 5 6
56 7"483 7'490 7'497 7'5°3 7'5 10 7'5 17 7'5 23 7'53° 7'537 7'543 I I 2 334 55 6
67 7'55° 7'55 6 7'5 63 7'57 0 7'57 6 7'5 83 7'589 7'596 7' 603 7' 609 I I 2 334 55 6
58 7,616 7,622 7,629 7'635 7,642 1,649 7. 655 7,662 7,668 7'675 I I 2 334 55 6
59 7,681 7,688 7,694 7'7°1 7'7°7 7'714 7'720 7'7 27 7'733 7'740 I I 2 334 456
60 7'746 7'75 2 7'759 7'7 65 7'77 2 7'778 7'785 7'79 1 7'797 7'804 1 I 2 334 45 6
61 7·810 7,8 17 7.823 7.829 7,836 7'84~ 7'849 7,855 7,861 7,868 I I 2 334 45 6
62 7·874 7,880 7,887 7.893 7·899 7'906 7'9 12 7'9 18 7'9 25 7'93 1 I I 2 334 45 6
83 7'937 ,'944 7'95° 7'95 6 j'962 7'9 69 7'975 7'9 81 7'98 7 7'994 I I 2 334 45 6
64 8'000 8·006 8'012 8'01 9 8'02 5 8'°3 1 8'°37 8'044 8'°5° 8'°5 6 112 234 45 6
66 8·062 8'068 8'075 8'081 8'087 8'°93 8'099 8'106 8'112 g'I18 I I 2 234 45 6
66 8' 124 8'13° 8'136 8'14 2 8'149 8'155 8'161 8'167 8'173 8'179 1 1.2 234 455
67 8' 185 8'191 8'198 8'20" 8'210 8'216 8'222 8'228 8'234 8'24° I I 2 234 455
68 8'246 8'25 2 8'258 8'264 8'27° 8'27 6 8' 28 3 8'289 8'295 ·8'3°1 I 1 2 234 455
69 8'307 8'3 13 8'319 8'3 25 8'33 1 8'337 8'343 8'349 8'355 8'3 61 I I 2 234 455
70 8'367 8'373 8'379 8'385 8'390 8'396 8'402 8'408 8'414 8'420 I I 2 234 455
'11 8'4 26 8'43 2 8'438 8'444 8'450 8'45 6 8'462 8'468 8'473 8'479 I I 2 234 455
'11 8'48 5 8'491 8'497 8'503 8'509 8'S 1 5 8'5 21 8'526 8'53 2 8'538 I I 2 233 455
'13 8'544 8'550 8'55 6 8'5 62 8'5 67 8'573 8'579 8'58 5 8'59 1 8'597 I I 2 233 455
74 8·602 8·608 8·614 8·620 8·626 8.631 8'637 8,643 8'649 8·654 11 2 233 455
76 8·660 8·666 8.672 8·678 8'683 8·689 8'695 8'7°1 8'706 8'712 112 233 455
76 8'7 18 8'7 24 8'729 8'735 8'74 1 8'746 8'75 2 8'75 8 8'7 64 8'7 69 I 1 2 233 455
7'1 8'775 8'781 8'786 8'79 2 8'798 8·8°3 8'809 8·8r 5 8,820 8·826 112 233 445
78 8.832 8·837 8·843 8·849 8'854 8·860 8·866 8'87 1 8·877 8·883 1 I 2 233 445
79 8·888 8'894 8·899 8'905 8'911 8'9 16 8'922 8'927 8'933 8'939 I 1 2 233 445
80 8'944 8'95° ,8'955 8'961 8'967 8'972 8'978 8'983 8'989 8'994 11 2 233 445
81 9'000 9·006 9'011 9'01 7 9'022 9'028 9'°33 9'°39 9'°44 9'°5° I 1 2 233 445
82 9'°55 9'061 9,066 9'°7 2 9'°77 9'o8~ 9.088 9'094 9'099 9'10~ I 1 2 233 445
83 9'110 9'116 9'121 9'127 9'13 2 9'13 9'143 9'149 9'154 9'160 I I 2 233 445
64 9' 16 5 9'171 9'176 9'182 9'187 9'19 2 9' l gB 9'2°3 9'209 9' 21 4 1 1 2 233 445
85 9'220 9'225 9'23° 9'236 9'24 1 9'247 9'252 9'257 9'263 9'268 112 233 445
88 9'274 9'279 9' 28 4 9'290 9'295 9'3°1 9'306 9'3 11 9'3 17 9'3 22 1 1 2 233 445
87 9'3 27 9'333 9'33 8 9'343 9'349 9'354 9'359 9'3 65 9'37° 9'375 I 1 2 233 4405
88 9'3 81 9'386 9'39 1 9'397 9'4°2 9'4°7 9'4 13 9'4 18 9'4 23 9'4 29 1 1 2 233 445
89 9'434 9'439 9'445 9'45° 9'455 9'460 9'466 9'471 9'476 9'482 1 1 2 233 445
90 9'487 9'492 9'497 9'5°3 9'508 9'5 13 9'5 18 9'5 24 9'5 29 9'534 112 233 445
91 9'539 9'545 9'55° 9'555 9'5 60 9'5 66 9'57 1 9'57 6 9'5 81 9'586 1 1 2 233 445
92 9'592 9'597 9.602 9.607 9.612 9'618 9.623 9,628 9.633 9.638 I 1 2 233 445
93 9·644 9.649 9·654 9.659 9·664 9.670 9.675 9.680 9.685 9'690 I 1 2 233 445
H 9'695 9'701 9'706 9'711 9'716 9'721 9'726 9'73 1 9'737 9'7~2 11 2 233 445
96 9'747 9'75 2 9'757 9'762 9'767 9'77 2 9'778 9'78 3 9'788 9'793 1 1 2 233 445
96 9'798 9'8~3- 9·808 9.81 3 9.818 9.823 9.829 9.83.... 9.839 9.844 1 1 2 233 445
97 9·849 9·854 9.859 9.864 9.869 9.874 9.879 9.814 9.889 9·894 I I 1 233 445
98 9.899 9'905 9'9 10 9'9 1 5 9'9 z0 9'9 z5 9'930 9'935 9'940 9'945 01 1 223 344
99 9'95° 9'955 9'960 9'965 9'97° 9'975 9'g80 9'985 9'990 9'995 01 1 223 344
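
The square-root tables above are read in two steps: the first three figures of the number select a row and column, and the fourth figure selects a mean difference, which is added to the last figure of the tabulated entry. The sketch below illustrates this for row 10 only. The function name `sqrt_from_row_10` and the integer encoding of the argument are assumptions made for the illustration, not part of the book.

```python
# Row "10" of the table in thousandths: columns 0-9, then mean differences 1-9.
ROW_10 = [3162, 3178, 3194, 3209, 3225, 3240, 3256, 3271, 3286, 3302]
MEAN_DIFF_10 = [2, 3, 5, 6, 8, 9, 11, 12, 14]


def sqrt_from_row_10(digits):
    """Read the square root of a number from 10.00 to 10.99.

    `digits` is the number written as a four-figure integer, e.g. 1047 for 10.47.
    """
    if not 1000 <= digits <= 1099:
        raise ValueError("this sketch only holds row 10 (10.00 to 10.99)")
    column = (digits // 10) % 10   # third figure picks the column
    fourth = digits % 10           # fourth figure picks the mean difference
    value = ROW_10[column]
    if fourth:
        value += MEAN_DIFF_10[fourth - 1]  # square roots: differences are ADDED
    return value / 1000


if __name__ == "__main__":
    print(sqrt_from_row_10(1047))
```

For 10.47 the entry under column 4 is 3.225 and the mean difference under 7 is 11, so the table reading is 3.225 + 0.011 = 3.236.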

RECIPROCALS OF NUMBERS FROM 1 TO 10

[Numbers in difference columns to be subtracted, not added.]

                                                                  Mean Differences
       0      1     2     3     4     5     6     7     8     9    1 2 3  4 5 6  7 8 9
1.0  1.000  9901  9804  9709  9615  9524  9434  9346  9259  9174
1-1 1'909 1 9009 89 29 8850 877 2 8696 862( 8547 8475 84()o3
1-2 ,8333 8264 81 97 81 30 8065 8000 7937 7874 781 3 775 2
1.3  .7692  7634  7576  7519  7463  7407  7353  7299  7246  7194
1.4  .7143  7092  7042  6993  6944  6897  6849  6803  6757  6711   5 10 14  19 24 29  33 38 43
1015 ,6667 662 3 6579 653 6 6494 6452 64 10 6369 63 29 628 9 4 813 17 212 5 293338
1'6 '02 50 6211 61 73 61 35 6og8 6061 6024 5988 595 2 '59 17 4 7 II 15 1822 262933
1,' '5882 5848 581 4 57 80 5747 57 14 5682 5650 561 8 5587 3 610 13 1620 23 262 9
1'8 '555 6 55 25 5495 5464 5435 540 5 537 6 5348 53 19 52 91 3 6 9 n IS 17 202 3 26
1-9 '5 263 5236 5208 5181 5155 5128 5102 5076 5051 502 5 358 II 13 16 1821 24
2,0 '5000 4975 4950 49 26 4902 4878 4854 4831 4808 47 85 257 1012 14 17 19 21
2'1 '4762 4739 47 1 7 4695 4673 4651 46 30 4608 45!i7 4566 2 4 7 9 tI 13 1517 20
2,2 1'4545 45 25 45 05 44 84 44 64 4444 44 25 4405 43 86 43 67 2 4 6 8lt>I2 14 1618
2'31'4348 43 29 43 10 4 292 4274 4255 4237 4219 4202 4184 2 4 5 7 9 II 13 1416
2,4 '4167 4 149 41 32 4 1I 5 4098 4082 406 5 4049 4032 4016 2 3 5 7 810 121 3 15
2,51'4000 3984 3968 3953 3937 39 22 3906 3891 3876 3861 2 3 5 6 8 9 II 12 14
26 '3846 383 1 381 7 3802 3788 3774 3759 3745 373 1 37.17 I 3 4 6 7 8 10 I I 13
2.7  .3704  3690  3676  3663  3650  3636  3623  3610  3597  3584   1 3 4  5 7 8  9 11 12
2.8  .3571  3559  3546  3534  3521  3509  3497  3484  3472  3460   1 2 4  5 6 7  9 10 11
2'9 '3448 3436 34 25 34 13 3401 3390 337 8 3367 3356 3344 1 2 3 5 6 7 8 9 IO
3,0 '3333 33 22 33II 3300 3289 3279 3268 3257 3247 3236 I 2 3 4 5 6 7 9 10
3'1 '3 226 321 5 3205 3 195 318 5 3175 3 165 3155 3145 3135 I 2 3 4 5 6 7 8 9
3,2 '3 12 5 3I l 5 3106 3096 3086 3077 3067 3058 3049 3040 I 2 3 4 5 6 7 8 9
3,3 '3 030 302 1 301 2 300 3 2994 2985 297 6 29 67 2959 295 0 I 2 3 4 4 5 67 8
3,4 '294 1 2933 29 24 29 15 2907 2899 28 90 2882 28 74 286 5 I 2 3 3 ~- 5 678
3,6 , ' 28 57 28 49 28 41 28 33 282 5 281'1 2809 2801 2793 2786 I 2 :2 3 4 5 6 6 7
3,6 '2778 2770 2762 2755 2747 2740 273 2 27 25 27 17 2710 I 2 2 3 4 5 5 6 7
3" '2703 2695 2688 2681 26 74 2667 2660 26 53 2646 2639 I 1 2 3 4 4 5 6 6
3'8 ' 26 32 262 5 2618 26Il 2604 2597 259 1 2584 2577 2571 1 1 2 3 3 4 S 5 6
3'9 '2564 2558 2551 2545 253 8 253 2 2525 25 19 25 13 2506 I I :2 3 3 4 4 5 6
4,0 '2500 2494 2488 2481 2475 24 69 2463 2457 245 1 2445 I 1 :2 2 3 4 4 5 5
4,1 '2439 2433 242712421 24 15 2410 2404 2398 2392 23117 I I :2 2 3 3 4 5 5
4'2 '23 81 2375 2370 23 64' 235 8 2353 2347 2342 233 6 233 1 ..I I :2 2 3 3 4 4 5
4,3 '2326 23 20 23 15 z309 23 04 2299 2294 2288 228 3 2278 I I 2 2 3 3 4 4 5
4'4 '2273 2268 2262'2257 225 2 2247 2242 2237 223 2 2227 I I :2 2 3 3 4 4 5
4,5 '2222 221 7 2212 2208 2203 2198 21 93 2188 218 3 2179 0 1 I 2 2, 3 ~a "t ~
4,6 ' 21 74 2169 216 5 n60 21 55 2151 21 46 21 41 213'r 21 32 o I 1 2 2 3 3 4 4
4" '2128 212 3 211 9 211 4 2110 2105 2101 2096 2092 2088 o I I 2 2 3 3 4 4
4'8 ' 208 3 2079 2075 20~0 2q66 2062 2058 2a53 2049 2045 0 1 I 2 2 3 3 3 4
4,9 '204 1 2037 2033 2028 202 4 2020 2016 2012 2008 2004 0 I I 222 3 3 4
5,0 '2000 1996 1199 2 1988 1984 1980 1976 1972- 1969 1965 0 I I 222 3 3 4
5'1 '1961 1957\1953 1949 1946 1942 1938 1934 193 1 1927 0 I I 2 2 2 3 3 3
6'2 '19 23 19 19 19 16 19 12 1908 1905 190- 1898 1894 1890 0 1 ! 1 2 2 3 3 3
6'3 '1887 188i!1880 1876 1873 18~11866 1862 18 59 18 55 0 1 1 122 2 3 3
5,4 ' 18 52 1848 1845 1842 18 38 1835 18 32 1828 182 5 1821 0 I 1 I :2 2 2 3 3

RECIPROCALS OF NUMBERS FROM 1 TO 10

[Numbers in difference columns to be subtracted, not added.]

                                                                  Mean Differences
       0      1     2     3     4     5     6     7     8     9    1 2 3  4 5 6  7 8 9
5.5  .1818  1815  1812  1808  1805  1802  1799  1795  1792  1789   0 1 1  1 2 2  2 3 3
5,6 '1786 1783 1779 1776 1773 1770 1767 1764 1761 1757 a I I 122 233
5,7 '1754 175 1 1748 1745 1742 1739 1736 1733 1730 17~7 o I I I 12 223
5,8 '17 24 17 21 1718 17 15 1712 1709 1706 li04 1701 1698 a l l I I 2 223
5,9 '1695 1692 1689 1686 1684 1681 1678 1675 1672 1669 01 1 112 223
6,0 '1667 1664 1661 1658 1656 1653 1650 1647 164S 1642 01 1 I 1 2 223
6-1 . ' 1639 1637 1634 163 1 1629 1626 162 3 1621 1618 1616 o I I I 12 222
6·2 ' 161 3 1610 1608 1605 1603 1600 1597 1595 1592 1590 Oil 112 222
6,3 '15 87 1585 1582 1580 1577 1575 157 2 1570 1567 1565 001 I 1 1 222
6,4 '15 62 15 60 m8 1555 15,53 1550 1548 1546 1543 1541 001 I I I 2.22
6,6 '153 8 1536 1534 153 1 15 29 152 7 15 24 15 22 15 20 15 17 001 I I I 222
6,6 'ISIS 15 13 ISH 1508 1506 1504 1502 1499 1497 1495 001 1 I I 222
6'7 '1493 1490 1488 1486 1484 1481 1479 1477 1475 1473 001 1 1 1 222
6,8 '1471 1468 1466- 1464 1462 1460 1458 145 6 1453 145 1 001 I I I 222
6·9 '1449 1447 1445 1443 1441 1439 1437 1435 1433 143 1 001 I I I 222
7·0 '14 29 1427 1425 1422 1420 14 1 8 1416 1414 1412 14 10 001 I I I 122
H '14 08 1406 1404 1403 1401 1399 1397 1395 1393 139 1 001 1 1 1 122
7,2 '13 89 138 7 1385 1383 1381 1379 1377 1376 1374 137 2 001 I I I 122
"7·3'1370 1368 1366 1364 1362 13 61 1359 1357 1355 1353 001 I I I 122
7·4 '135 1 1350 1348 1346 1344 1342 1340 1339 .1337 1335 001 I 1 I I I 2
7,5 '1333 1332 1330 1328 1326 132 5 132 3 1321 1319 1318 001 I I I 112
7,6'13 16 13 14 13 12 13 11 1309 130 7 130 5 1304 1302 1300 001 I I I 1 I 2
7,7 '1299 12 97 1295 1294 1292 12go 128 9 128 7 1285 1284 000 I 1 1 1 1 I
7,8 '1282 1280 1279 1277 1276 1274 127 2 1271 1269 1267 000 1 1 1 1 I 1
7·9'1266 .1264 1263 1261 1259 12 58 12 56 u55 1253 1252 000 1 I 1 1 1 I
8,0 '1250 1248 1247 1245 1244 1242 1241 1239 1238 1236 000 1 1 1 I I I
8'1 ' 12 35 12 33 12 32 1230 1229 1227 122 5 1224 1222 1221 000 I I 1 I I I
8,2 '1220 1218 1217 121 5 1214 1212 1211 1209 1208 1206 000 1 I 1 I 1 1
8,3 ' 1205 1203 1202 1200 1199 1198 119 6 119S I! 93 1192 000 1 1 1 I I I
8,4 , II go 1189 1188 1186 1185 1183 1182 !l81 1179 1178 000 I 1 1, I 1 1
8,5 '117 6 1175 1174 1172 1171 1170 u68 1167 1166 1104 000 J I I I 1 1
8·6 ' 116 3 1161 1160 1159 1157 1156 1155 1153 1I52 lIS' 000 1 1 I I t ,
8,7 '1149 1148 1I47 1I45 1I44 1143 U4 2 1140 1139 1138 000 1 1 1 I 1 1
8·8 '1136 Il35 1134 1133 1131 1130 1129 1127 1126 1125 000 1 1 1 1 J I
8·9 '1124 1122 lUI 1120 1119 1,117 Ill6 IllS lll4 1112 000 I I I I I 1
9,0 '1111 1110 1109 1107 1106 II 05 11 0 4 1103 IIOI 1100 000 1 I I I I I
9·1 '1099 1098 1096 1095 1094 1093 1092 logo 1089 1088 000 011 I I I
9'l! '1087 1086 1085 1083 1082 1081 1080 1079 1078 1076 000 o I I I 1 I
9,3 ' 1075 1074 1073 1072 1071 1070 1068 1067 1066 1065 000 o I I I 1 I
9·4 '1064 1063 1062 1060 1059 •1058 1057 1056 1055 1054 000 o I I I I I
9,5 '1053 1052 1050 1049 1048 1047 1046 1045 1044 1043 000 01 I r 1 I
9,6 '104 2 1041 1039 1038 1037 1036 1035 1034 1033 1032 000 o I I 1 1 I
9,7 ' 103 1 1030 102911028 102 7 1026 102 5 102 4 1022 1021 000 01 1 1 I I
9·8 '1020 1019 1018 1017 1016 1015 1014 101 3 1012 1011 000 o I I I I I
9'9 /'1010 1009 1008 1007 1006 1005 1004 1003 1002 1001 000 001 I I I
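
Because a reciprocal falls as the number rises, the mean differences in the reciprocal table are subtracted, as the note under the table title says. The following sketch shows the reading for row 2.5 only; the function name `reciprocal_from_row_2_5` and the integer encoding of the argument are hypothetical choices made for the illustration.

```python
# Row "2.5" of the table in ten-thousandths: columns 0-9, then mean differences 1-9.
ROW_2_5 = [4000, 3984, 3968, 3953, 3937, 3922, 3906, 3891, 3876, 3861]
MEAN_DIFF_2_5 = [2, 3, 5, 6, 8, 9, 11, 12, 14]


def reciprocal_from_row_2_5(digits):
    """Read the reciprocal of a number from 2.500 to 2.599.

    `digits` is the number written as a four-figure integer, e.g. 2537 for 2.537.
    """
    if not 2500 <= digits <= 2599:
        raise ValueError("this sketch only holds row 2.5 (2.500 to 2.599)")
    column = (digits // 10) % 10   # third figure picks the column
    fourth = digits % 10           # fourth figure picks the mean difference
    value = ROW_2_5[column]
    if fourth:
        value -= MEAN_DIFF_2_5[fourth - 1]  # reciprocals: differences are SUBTRACTED
    return value / 10000


if __name__ == "__main__":
    print(reciprocal_from_row_2_5(2537))
```

For 2.537 the entry under column 3 is .3953 and the mean difference under 7 is 11, so the table reading is .3953 − .0011 = .3942.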

TABLE I. Ordinates and Areas of the Standard Normal Curve
(In terms of σ units)

x/σ   Area   Ordinate     x/σ   Area   Ordinate     x/σ   Area   Ordinate
.00  .0000  .3989    .50  .1915  .3521    1.00  .3413  .2420
.01  .0040  .3989    .51  .1950  .3503    1.01  .3438  .2396
.02  .0080  .3989    .52  .1985  .3485    1.02  .3461  .2371
.03  .0120  .3988    .53  .2019  .3467    1.03  .3485  .2347
.04  .0160  .3986    .54  .2054  .3448    1.04  .3508  .2323
.05  .0199  .3984    .55  .2088  .3429    1.05  .3531  .2299
.06  .0239  .3982    .56  .2123  .3410    1.06  .3554  .2275
.07  .0279  .3980    .57  .2157  .3391    1.07  .3577  .2251
.08  .0319  .3977    .58  .2190  .3372    1.08  .3599  .2227
.09  .0359  .3973    .59  .2224  .3352    1.09  .3621  .2203
.10  .0398  .3970    .60  .2257  .3332    1.10  .3643  .2179
.11  .0438  .3965    .61  .2291  .3312    1.11  .3665  .2155
.12  .0478  .3961    .62  .2324  .3292    1.12  .3686  .2131
.13  .0517  .3956    .63  .2357  .3271    1.13  .3708  .2107
.14  .0557  .3951    .64  .2389  .3251    1.14  .3729  .2083
.15  .0596  .3945    .65  .2422  .3230    1.15  .3749  .2059
.16  .0636  .3939    .66  .2454  .3209    1.16  .3770  .2036
.17  .0675  .3932    .67  .2486  .3187    1.17  .3790  .2012
.18  .0714  .3925    .68  .2517  .3166    1.18  .3810  .1989
.19  .0753  .3918    .69  .2549  .3144    1.19  .3830  .1965
.20  .0793  .3910    .70  .2580  .3123    1.20  .3849  .1942
.21  .0832  .3902    .71  .2611  .3101    1.21  .3869  .1919
.22  .0871  .3894    .72  .2642  .3079    1.22  .3888  .1895
.23  .0910  .3885    .73  .2673  .3056    1.23  .3907  .1872
.24  .0948  .3876    .74  .2703  .3034    1.24  .3925  .1849
.25  .0987  .3867    .75  .2734  .3011    1.25  .3944  .1826
.26  .1026  .3857    .76  .2764  .2989    1.26  .3962  .1804
.27  .1064  .3847    .77  .2794  .2966    1.27  .3980  .1781
.28  .1103  .3836    .78  .2823  .2943    1.28  .3997  .1758
.29  .1141  .3825    .79  .2852  .2920    1.29  .4015  .1736
.30  .1179  .3814    .80  .2881  .2897    1.30  .4032  .1714
.31  .1217  .3802    .81  .2910  .2874    1.31  .4049  .1691
.32  .1255  .3790    .82  .2939  .2850    1.32  .4066  .1669
.33  .1293  .3778    .83  .2967  .2827    1.33  .4082  .1647
.34  .1331  .3765    .84  .2995  .2803    1.34  .4099  .1626
.35  .1368  .3752    .85  .3023  .2780    1.35  .4115  .1604
.36  .1406  .3739    .86  .3051  .2756    1.36  .4131  .1582
.37  .1443  .3725    .87  .3078  .2732    1.37  .4147  .1561
.38  .1480  .3712    .88  .3106  .2709    1.38  .4162  .1539
.39  .1517  .3697    .89  .3133  .2685    1.39  .4177  .1518
.40  .1554  .3683    .90  .3159  .2661    1.40  .4192  .1497
.41  .1591  .3668    .91  .3186  .2637    1.41  .4207  .1476
.42  .1628  .3653    .92  .3212  .2613    1.42  .4222  .1456
.43  .1664  .3637    .93  .3238  .2589    1.43  .4236  .1435
.44  .1700  .3621    .94  .3264  .2565    1.44  .4251  .1415
.45  .1736  .3605    .95  .3289  .2541    1.45  .4265  .1394
.46  .1772  .3589    .96  .3315  .2516    1.46  .4279  .1374
.47  .1808  .3572    .97  .3340  .2492    1.47  .4292  .1354
.48  .1844  .3555    .98  .3365  .2468    1.48  .4306  .1334
.49  .1879  .3538    .99  .3389  .2444    1.49  .4319  .1315
.50  .1915  .3521    1.00  .3413  .2420    1.50  .4332  .1295
TABLE I. Ordinates and Areas of the Standard Normal Curve (Concluded)
(In terms of σ units)

x/σ   Area   Ordinate     x/σ   Area   Ordinate     x/σ   Area   Ordinate
1.50  .4332  .1295    2.00  .4772  .0540    2.50  .4938  .0175
1.51  .4345  .1276    2.01  .4778  .0529    2.51  .4940  .0171
1.52  .4357  .1257    2.02  .4783  .0519    2.52  .4941  .0167
1.53  .4370  .1238    2.03  .4788  .0508    2.53  .4943  .0163
1.54  .4382  .1219    2.04  .4793  .0498    2.54  .4945  .0158
1.55  .4394  .1200    2.05  .4798  .0488    2.55  .4946  .0154
1.56  .4406  .1182    2.06  .4803  .0478    2.56  .4948  .0151
1.57  .4418  .1163    2.07  .4808  .0468    2.57  .4949  .0147
1.58  .4429  .1145    2.08  .4812  .0459    2.58  .4951  .0143
1.59  .4441  .1127    2.09  .4817  .0449    2.59  .4952  .0139
1.60  .4452  .1109    2.10  .4821  .0440    2.60  .4953  .0136
1.61  .4463  .1092    2.11  .4826  .0431    2.61  .4955  .0132
1.62  .4474  .1074    2.12  .4830  .0422    2.62  .4956  .0129
1.63  .4484  .1057    2.13  .4834  .0413    2.63  .4957  .0126
1.64  .4495  .1040    2.14  .4838  .0404    2.64  .4959  .0122
1.65  .4505  .1023    2.15  .4842  .0395    2.65  .4960  .0119
1.66  .4515  .1006    2.16  .4846  .0387    2.66  .4961  .0116
1.67  .4525  .0989    2.17  .4850  .0379    2.67  .4962  .0113
1.68  .4535  .0973    2.18  .4854  .0371    2.68  .4963  .0110
1.69  .4545  .0957    2.19  .4857  .0363    2.69  .4964  .0107
1.70  .4554  .0940    2.20  .4861  .0355    2.70  .4965  .0104
1.71  .4564  .0925    2.21  .4864  .0347    2.71  .4966  .0101
1.72  .4573  .0909    2.22  .4868  .0339    2.72  .4967  .0099
1.73  .4582  .0893    2.23  .4871  .0332    2.73  .4968  .0096
1.74  .4591  .0878    2.24  .4875  .0325    2.74  .4969  .0093
1.75  .4599  .0863    2.25  .4878  .0317    2.75  .4970  .0091
1.76  .4608  .0848    2.26  .4881  .0310    2.76  .4971  .0088
1.77  .4616  .0833    2.27  .4884  .0303    2.77  .4972  .0086
1.78  .4625  .0818    2.28  .4887  .0297    2.78  .4973  .0084
1.79  .4633  .0804    2.29  .4890  .0290    2.79  .4974  .0081
1.80  .4641  .0790    2.30  .4893  .0283    2.80  .4974  .0079
1.81  .4649  .0775    2.31  .4896  .0277    2.81  .4975  .0077
1.82  .4656  .0761    2.32  .4898  .0270    2.82  .4976  .0075
1.83  .4664  .0748    2.33  .4901  .0264    2.83  .4977  .0073
1.84  .4671  .0734    2.34  .4904  .0258    2.84  .4977  .0071
1.85  .4678  .0721    2.35  .4906  .0252    2.85  .4978  .0069
1.86  .4686  .0707    2.36  .4909  .0246    2.86  .4979  .0067
1.87  .4693  .0694    2.37  .4911  .0241    2.87  .4979  .0065
1.88  .4699  .0681    2.38  .4913  .0235    2.88  .4980  .0063
1.89  .4706  .0669    2.39  .4916  .0229    2.89  .4981  .0061
1.90  .4713  .0656    2.40  .4918  .0224    2.90  .4981  .0060
1.91  .4719  .0644    2.41  .4920  .0219    2.91  .4982  .0058
1.92  .4726  .0632    2.42  .4922  .0213    2.92  .4982  .0056
1.93  .4732  .0620    2.43  .4925  .0208    2.93  .4983  .0055
1.94  .4738  .0608    2.44  .4927  .0203    2.94  .4984  .0053
1.95  .4744  .0596    2.45  .4929  .0198    2.95  .4984  .0051
1.96  .4750  .0584    2.46  .4931  .0194    2.96  .4985  .0050
1.97  .4756  .0573    2.47  .4932  .0189    2.97  .4985  .0048
1.98  .4761  .0562    2.48  .4934  .0184    2.98  .4986  .0047
1.99  .4767  .0551    2.49  .4936  .0180    2.99  .4986  .0046
2.00  .4772  .0540    2.50  .4938  .0175    3.00  .4987  .0044
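
The two columns of Table I can be reproduced from the formula of the standard normal curve: the ordinate at x/σ = z is exp(−z²/2)/√(2π), and the area is measured from the mean (z = 0) up to z. The sketch below is an illustration only, using the error function from Python's standard library; the function names `ordinate` and `area` are hypothetical.

```python
import math


def ordinate(z):
    """Height of the standard normal curve at z (in sigma units)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area(z):
    """Area under the standard normal curve between the mean (0) and z."""
    return 0.5 * math.erf(z / math.sqrt(2))


if __name__ == "__main__":
    for z in (0.5, 1.0, 1.96, 3.0):
        print(z, round(area(z), 4), round(ordinate(z), 4))
```

Since the curve is symmetrical, the area between −z and +z is twice the tabulated area; for z = 1.96 this gives about 0.95.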

TABLE OF χ² AND t

Degrees of          χ²                    t
freedom         PROBABILITY          PROBABILITY
               .05      .01         .05      .01

 1            3.84     6.64       12.71    63.66
 2            5.99     9.21        4.30     9.92
 3            7.82    11.34        3.18     5.84
 4            9.49    13.28        2.78     4.60
 5           11.07    15.09        2.57     4.03
 6           12.59    16.81        2.45     3.71
 7           14.07    18.48        2.36     3.50
 8           15.51    20.09        2.31     3.36
 9           16.92    21.67        2.26     3.25
10           18.31    23.21        2.23     3.17
11           19.68    24.72        2.20     3.11
12           21.03    26.22        2.18     3.06
13           22.36    27.69        2.16     3.01
14           23.68    29.14        2.14     2.98
15           25.00    30.58        2.13     2.95
16           26.30    32.00        2.12     2.92
17           27.59    33.41        2.11     2.90
18           28.87    34.80        2.10     2.88
19           30.14    36.19        2.09     2.86
20           31.41    37.57        2.09     2.84
21           32.67    38.93        2.08     2.83
22           33.92    40.29        2.07     2.82
23           35.17    41.64        2.07     2.81
24           36.42    42.98        2.06     2.80
25           37.65    44.31        2.06     2.79
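
One common use of the χ² column is to compare a computed χ² statistic with the tabulated value for the appropriate degrees of freedom. The sketch below is illustrative only: the observed frequencies are hypothetical (60 throws of a die), the names `chi_square` and `CHI2_5_PERCENT` are not from the book, and only the first five 5 per cent values of the table are copied in.

```python
# 5 per cent values of chi-square for 1 to 5 degrees of freedom, copied from the table.
CHI2_5_PERCENT = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 throws of a die, 10 expected on each face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

statistic = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1          # 6 classes, one linear constraint
critical = CHI2_5_PERCENT[degrees_of_freedom]   # 11.07 for 5 degrees of freedom
significant = statistic > critical

if __name__ == "__main__":
    print(round(statistic, 2), critical, significant)
```

Here the computed value is far below 11.07, so the divergence between observed and expected frequencies would not be judged significant at the 5 per cent level.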
