
Unit 1:

Introduction; Meaning and scope; Classification of data; Frequency distribution; Measures of central tendency; Dispersion; Skewness; Kurtosis; Probability theory; Bayes theorem
INTRODUCTION:
In the modern world of computers and information technology, the importance of statistics is very well recognised by all disciplines. Statistics originated as a science of statehood and slowly and steadily found applications in Agriculture, Economics, Commerce, Biology, Medicine, Industry, planning, education and so on. Today there is hardly any walk of human life where statistics cannot be applied.
Origin and Growth of Statistics:
The words 'Statistics' and 'Statistical' are derived from the Latin word 'Status', meaning a political state. The theory of statistics as a distinct branch of scientific method is of comparatively recent growth. Research, particularly into the mathematical theory of statistics, is proceeding rapidly, and fresh discoveries are being made all over the world.
Meaning of Statistics:
Statistics is concerned with scientific methods for collecting, organising, summarising, presenting and analysing data, as well as with deriving valid conclusions and making reasonable decisions on the basis of such analysis. Statistics is concerned with the systematic collection of numerical data and its interpretation. The word 'statistics' is used to refer to
1. Numerical facts, such as the number of people living in a particular area.
2. The study of ways of collecting, analysing and interpreting the facts.
Definitions:
Statistics has been defined differently by different authors over time. In the olden days statistics was confined only to state affairs, but in modern days it embraces almost every sphere of human activity. Therefore a number of old definitions, which were confined to a narrow field of enquiry, have been replaced by newer definitions, which are much more comprehensive and exhaustive. Secondly, statistics has been defined in two different ways: as statistical data and as statistical methods. The following are some of the definitions of statistics as numerical data.
1. Statistics are the classified facts representing the conditions of people in a state. In particular they are the facts,
which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.
2. Statistics are measurements, enumerations or estimates of natural phenomena, usually systematically arranged, analysed and presented so as to exhibit important interrelationships among them.
Definitions by A.L. Bowley:
'Statistics are numerical statements of facts in any department of enquiry placed in relation to each other.' - A.L. Bowley. Bowley also called statistics 'the science of counting in one of its departments'; obviously this is an incomplete definition, as it takes into account only the aspect of collection and ignores other aspects such as analysis, presentation and interpretation.
Bowley gives another definition for statistics, which states that statistics may rightly be called the science of averages. This definition is also incomplete: although averages play an important role in understanding and comparing data, statistics provides many other measures besides averages.
Definition by Croxton and Cowden:
Statistics may be defined as 'the science of collection, presentation, analysis and interpretation of numerical data'. It is clear that the definition of statistics by Croxton and Cowden is the most scientific and realistic one. According to this definition there are four stages:
1. Collection of Data: This is the first step and the foundation upon which the entire analysis rests. Careful planning is essential before collecting the data. There are different methods of collection of data, such as census, sampling, primary, secondary, etc., and the investigator should make use of the correct method.

2. Presentation of data: The mass of data collected should be presented in a suitable, concise form for further analysis. The collected data may be presented in tabular, diagrammatic or graphic form.
3. Analysis of data: The presented data should be carefully analysed for making inferences, using tools such as measures of central tendency, dispersion, correlation, regression, etc.
4. Interpretation of data: The final step is drawing conclusion from the data collected. A valid conclusion must be
drawn on the basis of analysis. A high degree of skill and experience is necessary for the interpretation.
Functions of Statistics:
There are many functions of statistics. Let us consider the following five important functions.
Condensation:
Generally speaking, by the word 'to condense' we mean to reduce or to lessen. Condensation is mainly applied to make a huge mass of data easily understandable by providing only a few summary measures. If, for a particular class in a Chennai school, only the individual marks in an examination are given, no purpose will be served. Instead, if we are given the average mark in that particular examination, it definitely serves a better purpose. Similarly the range of marks is another measure of the data. Thus statistical measures help to reduce the complexity of the data and consequently to understand any huge mass of data.
Comparison:
Classification and tabulation are the two methods that are used to condense the data. They help us to compare data
collected from different sources. Grand totals, measures of central tendency, measures of dispersion, graphs and diagrams, coefficients of correlation, etc. provide ample scope for comparison. If we have one group of data, we can
compare within itself. If the rice production (in Tonnes) in Tanjore district is known, then we can compare one
region with another region within the district. Or if the rice production (in Tonnes) of two different districts
within Tamilnadu is known, then also a comparative study can be made. As statistics is an aggregate of facts and
figures, comparison is always possible and in fact comparison helps us to understand the data in a better way.
Estimation:
One of the main objectives of statistics is to draw inferences about a population from the analysis of a sample drawn from that population. The four major branches of statistical inference are
1. Estimation theory
2. Tests of Hypothesis
3. Non Parametric tests
4. Sequential analysis
In estimation theory, we estimate the unknown value of the population parameter based on the sample observations.
Suppose we are given a sample of the heights of one hundred students in a school; based upon the heights of these 100 students, it is possible to estimate the average height of all the students in that school.
Tests of Hypothesis:
A statistical hypothesis is some statement about the probability distribution, characterising a population on the basis
of the information available from the sample observations. In the formulation and testing of hypothesis, statistical
methods are extremely useful. Whether crop yield has increased because of the use of new fertilizer or whether the
new medicine is effective in eliminating a particular disease are some examples of statements of hypothesis and
these are tested by proper statistical tools.
Scope of Statistics:
Statistics is not a mere device for collecting numerical data, but a means of developing sound techniques for their handling and analysis, and for drawing valid inferences from them. Statistics is applied in every sphere of human activity, social as well as physical, such as Biology, Commerce, Education, Planning, Business Management, Information
Technology, etc. It is almost impossible to find a single department of human activity where statistics cannot be
applied. We now discuss briefly the applications of statistics in other disciplines.
Statistics and Industry:
Statistics is widely used in many industries. In industries, control charts are widely used to maintain a certain
quality level. In production engineering, to find whether the product is conforming to specifications or not,
statistical tools, namely inspection plans, control charts, etc., are of extreme importance. In inspection plans we have to resort to some kind of sampling, which is a very important aspect of statistics.
Statistics and Commerce:
Statistics is the lifeblood of successful commerce. No businessman can afford losses either by understocking or by overstocking his goods. In the beginning he estimates the demand for his goods and then takes steps to adjust his output or purchases accordingly. Thus statistics is indispensable in business and commerce. As so many multinational companies have entered the Indian economy, the size and volume of business is increasing. On one side stiff competition is increasing, whereas on the other side tastes are changing and new fashions are emerging. In this connection, market surveys play an important role in exhibiting present conditions and in forecasting likely future changes.
Statistics and Agriculture:
Analysis of variance (ANOVA), one of the statistical tools developed by Professor R.A. Fisher, plays a prominent role in agricultural experiments. Tests of significance based on small samples can show whether the difference between two sample means is significant. In analysis of variance, we are concerned with testing the equality of several population means. For example, five fertilizers are applied to five plots each of wheat, and the yield of wheat on each of the plots is given. In such a situation, we are interested in finding
out whether the effect of these fertilisers on the yield is significantly different or not. In other words, whether the
samples are drawn from the same normal population or not. The answer to this problem is provided by the technique
of ANOVA and it is used to test the homogeneity of several population means.
Statistics and Economics:
Statistical methods are useful in measuring numerical changes in complex groups and interpreting collective phenomena. Nowadays statistics is used abundantly in almost every economic study. Both in economic theory and practice, statistical methods play an important role. Alfred Marshall said, 'Statistics are the straw out of which I, like every other economist, have to make the bricks.' It may also be noted that statistical data and techniques of statistical analysis are immensely useful in solving many economic problems, such as those relating to wages, prices, production, distribution of income and wealth and so on. Statistical tools like index numbers, time series analysis, estimation theory and the testing of statistical hypotheses are extensively used in economics.
Statistics and Education:
Statistics is widely used in education. Research has become a common feature in all branches of activities. Statistics is necessary for the formulation of policies to start new courses, to consider the facilities available for new courses,
etc. There are many people engaged in research work to test the past knowledge and evolve new knowledge. These
are possible only through statistics.
Statistics and Planning:
Statistics is indispensable in planning. In the modern world, which may be termed the world of planning, almost all organisations in the government seek the help of planning for efficient working, for the formulation of policy decisions and for their execution. In order to achieve these goals, statistical data relating to production, consumption, demand, supply, prices, investments, income, expenditure, etc. and various advanced statistical techniques for processing, analysing and interpreting such complex data are of importance. In India, statistics plays an important role in planning at both the central and the state government levels.
Statistics and Medicine:
In the medical sciences, statistical tools are widely used. In order to test the efficacy of a new drug or medicine, the t-test is used; to compare the efficacy of two drugs or two medicines, the two-sample t-test is used. More and more applications of statistics are at present found in clinical investigation.
Statistics and Modern Applications:
Recent developments in the fields of computer technology and information technology have enabled statistical models to be integrated into the decision-making procedures of many organisations. There are many software packages available for solving problems in design of experiments, forecasting, simulation, etc. SYSTAT, a software package, offers more scientific and technical graphing options than any other desktop statistics package. SYSTAT supports all types of scientific and technical research in various diversified fields, as follows:
1. Archeology: Evolution of skull dimensions
2. Epidemiology: Tuberculosis
3. Statistics: Theoretical distributions
4. Manufacturing: Quality improvement
5. Medical research: Clinical investigations.
6. Geology: Estimation of Uranium reserves from ground water
Limitations of statistics
Statistics with all its wide application in every sphere of human activity has its own limitations. Some of them are
given below.
1. Statistics is not suitable for the study of qualitative phenomena:
Since statistics is basically a science that deals with sets of numerical data, it is applicable only to those subjects of enquiry which can be expressed in terms of quantitative measurements. As a matter of fact, qualitative phenomena like honesty, poverty, beauty, intelligence, etc. cannot be expressed numerically, and statistical analysis cannot be applied directly to such qualitative phenomena. Nevertheless, statistical techniques may be applied indirectly by first reducing the qualitative expressions to quantitative terms. For example, the intelligence of a group of students can be studied on the basis of their marks in a particular examination.
2. Statistics does not study individuals:
Statistics does not give any specific importance to individual items; in fact it deals with an aggregate of objects. Individual items, when taken individually, do not constitute statistical data and do not serve any purpose for a statistical enquiry.
3. Statistical laws are not exact:
It is well known that mathematical and physical sciences are exact. But statistical laws are not exact and statistical
laws are only approximations. Statistical conclusions are not universally true. They are true only on an average.
4. Statistics may be misused:
Statistics must be used only by experts; otherwise statistical methods are dangerous tools in the hands of the inexpert. The use of statistical tools by inexperienced and untrained persons may lead to wrong conclusions. Statistics can easily be misused by quoting wrong figures. As King aptly says, 'statistics are like clay of which one can make a god or a devil as one pleases'.
5. Statistics is only one of the methods of studying a problem:
Statistical methods do not provide a complete solution to a problem, because problems have to be studied against the background of a country's culture, philosophy or religion. Thus a statistical study should be supplemented by other evidence.
FREQUENCY DISTRIBUTION

In statistics, a frequency distribution is a tabulation of the values that one or more variables take in a sample.
Univariate frequency tables
Univariate frequency distributions are often presented as lists ordered by value, showing the number of times each value appears. For example, if 100 people rate their agreement with a statement on a five-point scale, on which 1 denotes strong agreement and 5 strong disagreement, the frequency distribution of their responses might look like this:
Rank   Degree of agreement   Number
1      Strongly agree        20
2      Agree somewhat        30
3      Not sure              20
4      Disagree somewhat     15
5      Strongly disagree     15

This simple tabulation has two drawbacks. When a variable can take continuous values, or when the number of possible values is too large, the table construction is cumbersome, if not impossible. A slightly different tabulation scheme based on ranges of values is used in such cases. For example, if we consider the heights of the students in a class, the frequency table might look like the one below.

Height range     Number of students   Cumulative number
4.5 - 5.0 feet   25                   25
5.0 - 5.5 feet   35                   60
5.5 - 6.0 feet   20                   80
6.0 - 6.5 feet   20                   100

A frequency distribution shows a summarised grouping of data divided into mutually exclusive classes and the number of occurrences in each class. It is a way of organising and summarising raw (unorganised) data, e.g. to show the results of an election, the income of people in a certain region, sales of a product within a certain period, student loan amounts of graduates, etc.
Some of the graphs that can be used with frequency distributions are histograms, line graphs, bar charts and pie
charts. Frequency distributions are used for both qualitative and quantitative data.
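As an illustration, the minimal Python sketch below tabulates a small set of invented responses coded on the five-point scale described above; the counts it prints are the entries that would fill such a frequency table:

    from collections import Counter

    # invented responses, coded 1 (strongly agree) to 5 (strongly disagree)
    responses = [1, 2, 2, 3, 1, 5, 4, 2, 3, 5, 1, 2]

    freq = Counter(responses)          # maps each value to its frequency
    for value in sorted(freq):
        print(value, freq[value])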
Joint frequency distributions
Bivariate joint frequency distributions are often presented as two-way tables:
Two-way table with marginal frequencies
         Dance   Sports   TV   Total
Men      2       10       8    20
Women    16      6        8    30
Total    18      16       16   50

The total row and total column report the marginal frequencies or marginal distribution, while the body of the
table reports the joint frequencies.
Applications
Managing and operating on frequency-tabulated data is much simpler than operating on raw data. There are simple algorithms to calculate the median, mean, standard deviation, etc. from these tables.
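As a sketch of such an algorithm, the grouped height data in the table above can be summarised from the class midpoints and frequencies alone, assuming (as is usual for grouped data) that the observations in each class are concentrated at its midpoint:

    import math

    # class midpoints (feet) and frequencies taken from the height table above
    midpoints   = [4.75, 5.25, 5.75, 6.25]
    frequencies = [25, 35, 20, 20]

    n    = sum(frequencies)                                          # 100 students
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    var  = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)) / n
    sd   = math.sqrt(var)

    print(round(mean, 3), round(sd, 3))   # roughly 5.425 ft and 0.53 ft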
Hypothesis testing is founded on the assessment of differences and similarities between frequency distributions. This assessment involves measures of central tendency (averages), such as the mean and the median, and measures of variability or dispersion, such as the variance.
A frequency distribution is said to be skewed when its mean and median are different. The kurtosis of a frequency
distribution is the concentration of scores at the mean, or how peaked the distribution appears if depicted graphically, for example in a histogram. If the distribution is more peaked than the normal distribution it is said to be leptokurtic; if less peaked it is said to be platykurtic.
Frequency distributions are also used in frequency analysis to crack codes and refer to the relative frequency of
letters in different languages.

A bar chart or bar graph is a chart with rectangular bars with lengths proportional to the values that they
represent. Bar charts are used for comparing two or more values taken over time or under different conditions,
usually on small data sets. The bars can be horizontally oriented (also called bar chart) or vertically oriented (also
called column chart). Sometimes a stretched graphic is used instead of a solid bar. It is a visual display used to
compare the amount or frequency of occurrence of different characteristics of data and it is used to compare groups
of data.

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalised to 1. If the lengths of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.
As an example we consider data collected by the U.S. Census Bureau on time taken to travel to work. The census found that there were 124 million people who work outside of their homes. The reported travel times cluster at multiples of five minutes; such rounding is a common phenomenon when collecting data from people.

Histogram of travel time, US 2000 census: the area under the histogram equals the total number of cases, and the bar heights are the Quantity/width values from the table below.
Data by absolute numbers (quantities in thousands)
Interval   Width   Quantity   Quantity/width
0          5       4180       836
5          5       13687      2737
10         5       18618      3723
15         5       19634      3926
20         5       17981      3596
25         5       7190       1438
30         5       16369      3273
35         5       3212       642
40         5       4122       824
45         15      9200       613
60         30      6461       215
90         60      3435       57

This histogram shows the number of cases per unit interval, so that the height of each bar is the quantity in the interval divided by the interval's width. The area under the histogram represents the total number of cases (124 million). This type of histogram shows absolute numbers.
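The Quantity/width column, and the total of roughly 124 million cases, can be reproduced directly from the table; a minimal Python sketch:

    # widths and quantities (in thousands) taken from the census table above
    widths     = [5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 30, 60]
    quantities = [4180, 13687, 18618, 19634, 17981, 7190,
                  16369, 3212, 4122, 9200, 6461, 3435]

    heights = [q / w for q, w in zip(quantities, widths)]   # bar heights (cases per minute)
    print(heights)          # matches the Quantity/width column, up to rounding
    print(sum(quantities))  # 124089 thousand, i.e. about 124 million cases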

A pie chart (or circle graph) is a circular chart divided into sectors, illustrating relative magnitudes or frequencies. In a pie chart, the arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents. Together, the sectors create a full disk. It is named for its resemblance to a pie which has been sliced.

MEASURES OF CENTRAL TENDENCY.


1) Define Measures of Central Tendency.
A number or quantity which is representative of a set of data is called a measure of central tendency. This single value describes the characteristic of the entire mass of data and is also called the central value or average.

2) What are the properties of a Good Average?
i) It should be easy to compute and understand.
ii) It should be based on all items.
iii) It should not be affected by the extreme observations.
iv) It should be rigidly defined.
v) It should be capable of further mathematical treatment.

3) What are the various Measures of Central Tendency or Averages?


The following are the various Measures of Averages:
i) Arithmetic Mean
ii) Median
iii) Mode
iv) Geometric Mean
v) Harmonic Mean
4) Define Arithmetic Mean.
The Arithmetic Mean is obtained by adding together all the items and dividing the total by the number of items. The arithmetic mean is the "standard" average, often simply called the "mean".
For example, the arithmetic mean of the six values 34, 27, 45, 55, 22, 34 is (34 + 27 + 45 + 55 + 22 + 34) / 6 = 217 / 6, which is approximately 36.17.
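The same calculation can be checked with Python's standard statistics module (a minimal sketch):

    from statistics import mean

    values = [34, 27, 45, 55, 22, 34]
    print(mean(values))   # 36.1666..., i.e. 217 / 6, roughly 36.17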

5) Write the Merits and Demerits of Arithmetic Mean.
MERITS
i) It is rigidly defined.
ii) It is easy to compute.
iii) It is based on all the observations.
iv) It is suitable for further algebraic treatment.
v) It is least affected by fluctuations of sampling.
vi) It is easy to understand.
vii) It provides a good basis for comparison.
viii) The mean is a more stable measure of central tendency.
DEMERITS
i) It is affected by extreme items.
ii) It cannot be determined by inspection.
iii) It is unrealistic.
iv) It may lead to a false conclusion.
v) It cannot be accurately determined if even one of the values is not known.
vi) It is not useful for the study of qualities like intelligence, honesty and character.
vii) It gives greater importance to bigger items of a series and lesser importance to smaller items.
6) Define Median.
Median is the middle value of the distribution.
7) Write the Merits and Demerits of Median.
MERITS
i. It is easy to understand.
ii. It is easy to compute.
iii. It is quite rigidly defined.
iv. It eliminates the effect of extreme items.
v. It is amenable to further algebraic processes.
vi. The median can be calculated even for qualitative phenomena, i.e. honesty, character, etc.
vii. The median can sometimes be found by simple inspection.
viii. Its value generally lies in the distribution.
ix. It can be calculated for distributions with open-end classes.
x. It can be determined graphically.
DEMERITS
i. A typical representative of the observations cannot be computed if the distribution of items is irregular. For example, for 1, 2, 3, 100 and 500, the median is 3.
ii. Where the number of items is large, the prerequisite process of arraying the items is tedious.
iii. It ignores the extreme items.
iv. In the case of a continuous series, the median is estimated, but not calculated.
v. It is more affected by fluctuations of sampling than the mean.
vi. The median is not amenable to further algebraic manipulation.

8) What are the uses of Median?


The Median is useful for distributions containing open-end intervals since these intervals do not enter its
computation. Also since the median is affected by the number rather than the size of items, it is frequently
used instead of the mean as a measure of central tendency in cases where such values are likely to distort
the mean.
9) Define Mode.
The Mode or the Modal Value is that value in a series of observations which occurs with the greatest
frequency.
10) Write the Merits and Demerits of Mode.
MERITS
i. It is easy to understand.
ii. It is easy to calculate.
iii. It is usually an actual value, as it occurs most frequently in the series.
iv. It is not affected by extreme values, as the average is.
v. It is simple and precise.
vi. It is the most representative average.
vii. The value of the mode can be determined by the graphic method.
viii. Its value can be determined in an open-end class interval without ascertaining the class limits.
DEMERITS
i. It is not suitable for further mathematical treatment.
ii. It may not give weight to extreme items.
iii. In a bimodal distribution there are two modal classes and it is difficult to determine the value of the mode.
iv. It is difficult to compute when there are both positive and negative items in a series, or when one or more items are zero.
v. It is stable only when the sample is large.
vi. The mode is influenced by the magnitude of the class intervals.
vii. It does not give an aggregate value, as the average does.
11) What are the Uses of Mode?
The Mode is employed when the most typical value of a distribution is desired. It is the most meaningful
measure of central tendency in case of highly skewed or non-normal distributions, as it provides the best
indication of the point of maximum concentration.
12) What is the Relationship between Mean, Median and Mode?
Karl Pearson has given an approximate empirical relation connecting mean, median and mode. This
relation is true only in the case of distributions which are moderately asymmetrical. The relation is
Mode = 3 Median - 2 Mean.
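For moderately skewed data the relation can be used to estimate the mode from the mean and the median; a minimal sketch with invented data (the estimate is approximate, since the relation itself is only empirical):

    from statistics import mean, median, mode

    data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8]          # invented, mildly right-skewed
    estimated_mode = 3 * median(data) - 2 * mean(data)
    print(mode(data), estimated_mode)               # actual mode 4 vs. estimate 3.2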
13) Explain Geometric Mean.
Geometric Mean is defined as the Nth root of the product of N items.
14) Write the Merits and Demerits of Geometric Mean.
MERITS
i. It is based on all observations.
ii. It is rigidly defined.
iii. It is capable of further algebraic treatment.
iv. It is the average most suitable when large weights have to be given to small items and small weights to large items.
v. It is less affected by extreme values.
vi. It is useful in studying economic and social data.
vii. It is suitable for averaging ratios, rates and percentages.
DEMERITS
i. It is difficult to understand.
ii. Non-mathematical persons find it difficult to calculate.
iii. The Geometric Mean cannot be computed if any item in the series is negative or zero.
iv. It has restricted application.
15) What are the Uses of Geometric Mean?
i. The Geometric Mean is highly useful in averaging ratios, percentages and rates of increase between two periods (a small sketch follows this list).
ii. The Geometric Mean is important in the construction of index numbers.
iii. In the economic and social sciences, where we want to give more weight to smaller items and smaller weight to larger items, the geometric mean is appropriate.
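A minimal sketch of use (i): the average rate of increase over several periods is the geometric mean of the period-to-period growth factors (the growth figures are invented):

    from statistics import geometric_mean

    factors = [1.10, 1.20, 1.05]                   # 10%, 20% and 5% increases (invented)
    avg_factor = geometric_mean(factors)
    print(round((avg_factor - 1) * 100, 2))        # about 11.49% average increase per period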
16) Define Harmonic Mean.

The Harmonic Mean is the reciprocal of the Arithmetic Mean of the reciprocals of the values of the various items in the variable.
17) Write the Merits and Demerits of Harmonic Mean.
MERITS
i. It is rigidly defined.
ii. It is based on all the observations of the series.
iii. It is suitable in the case of a series having wide dispersion.
iv. It is suitable for further mathematical treatment.
v. It gives less weight to large items and more weight to small items.
DEMERITS
i. It is difficult to calculate and is not easily understandable.
ii. All the values must be available for computation.
iii. It is not popular.
iv. It is usually a value which does not exist in the series.
18) What are the Uses of Harmonic Mean?
The Harmonic Mean is useful for computing the average rate of increase in the profits of a concern, the average speed at which a journey has been performed, or the average price at which an article has been sold.
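A minimal sketch of the average-speed use, assuming equal distances are covered at each speed (the speeds are invented):

    from statistics import harmonic_mean

    speeds = [40, 60]                  # km/h over two equal stretches (invented)
    print(harmonic_mean(speeds))       # 48.0, the correct average speed for the whole journey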
19) Define Dispersion.
Dispersion is the measure of the variation of the items.
20) What are the properties of a good Measure of Variation?
i. It should be rigidly defined.
ii. It should be easy to compute and understand.
iii. It should be based on each and every value of the distribution.
iv. It should be subject to further mathematical treatment.
v. It should not be affected by extreme items.
21) Explain the Types of Dispersion.
There are two types of Dispersion. They are
1. Absolute Dispersion
2. Relative Dispersion

1. Absolute Dispersion

When the Dispersion is expressed in terms of original data, it is called Absolute Dispersion. Suppose we
are given the monthly income of 7 workers, such as 100, 180, 120, 190, 110, 80, 70. Here the difference between
maximum and minimum income is Rs.120 and this is called absolute dispersion.
2. Relative Dispersion
When Dispersion is expressed in abstract number such as ratio or percentage, it is called Relative
Dispersion.

22) What are the methods of studying Variations?


The following are the methods of studying Variation:
The Range
The Inter Quartile Range and the Quartile Deviation
The Mean Deviation or Average Deviation
The Standard Deviation
The Lorenz Curve
23) Define Range.
Range is defined as the difference between the value of the smallest item and the value of the largest item
included in the distribution. Symbolically,
Range = L - S
where L = Largest item and S = Smallest item.

24) Write the Merits and Demerits of Range.


Merits :
It is simple to compute.
It is easy to understand.
It gives a rough but quick answer.
Demerits :
It is not reliable because it is affected by the extreme items.
It is not suitable for mathematical treatment.
It gives only general idea about dispersion.
It cannot be computed when distribution has open end classes.
It is not based on all the observations of the series.
25) What are the Uses of Range?
Range is used in industries for the statistical quality control of the manufactured product by the
construction of control chart.
Range is useful in studying the variations in the prices of stocks, shares and other commodities that are sensitive to price changes from one period to another.
The meteorological department uses the range for weather forecasts, since the public is interested in knowing the limits within which the temperature is likely to vary on a particular day.
26) Define Inter Quartile Range and Semi Inter Quartile Range (or) Quartile Deviation.
The distance between the first and the third quartiles is called the Inter Quartile Range. Symbolically,
Inter Quartile Range = Q3 - Q1
The Semi Inter Quartile Range or Quartile Deviation is defined as half the distance between the third and first quartiles. Symbolically,
Semi Inter Quartile Range (or) Q.D. = (Q3 - Q1) / 2
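A minimal sketch computing the quartiles, the Inter Quartile Range and the Quartile Deviation with the standard library (the data are invented; note that different quartile conventions give slightly different values):

    from statistics import quantiles

    data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12]        # invented observations
    q1, q2, q3 = quantiles(data, n=4)              # quartiles, default 'exclusive' method
    iqr = q3 - q1                                  # Inter Quartile Range
    qd  = iqr / 2                                  # Quartile Deviation
    print(q1, q3, iqr, qd)                         # 4.0 9.25 5.25 2.625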
27) Write the Merits and Demerits of Quartile Deviation.
Merits:
It is simple to understand.
It is easy to compute.
It is not influenced by the extreme values.
It can be found for distributions with open-end classes.
Demerits:
It ignores the first 25% of the items and the last 25% of the items.
It is a positional average; hence it is not amenable to further mathematical treatment.
Its value is affected by sampling fluctuations.
It gives only a rough measure.
It does not show the scatter around the average.
28) Define Mean Deviation.
Mean Deviation is the average difference between the items in a distribution and the median or
mean of that series. It is also known as the average deviation.
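A minimal sketch of the computation, taking deviations about the mean and about the median (the data are invented):

    from statistics import mean, median

    data = [4, 7, 9, 10, 15]                       # invented observations
    m, med = mean(data), median(data)
    md_about_mean   = sum(abs(x - m)   for x in data) / len(data)
    md_about_median = sum(abs(x - med) for x in data) / len(data)
    print(md_about_mean, md_about_median)          # 2.8 and 2.8 for this data set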
29) Write the Merits and Demerits of Mean Deviation.
Merits:
It is simple to understand.
It is easy to compute.
Mean Deviation is a calculated value.
It is not much affected by the fluctuations of sampling.
It is based on all items of the series and gives weight according to their size.
It is less affected by the extreme items.
It is rigidly defined.
It is flexible because it can be calculated from any measure of central tendency.
It is a better measure for comparison.
Demerits:
It is not capable of further algebraic treatment.
Algebraic positive and negative signs are ignored: in the Mean Deviation, +5 and -5 have the same meaning. This makes it mathematically unsound and illogical.
It is not a very accurate measure of dispersion.
It is not suitable for further mathematical calculation.
It is rarely used and is not as popular as the Standard Deviation.
30) What are the Uses of Mean Deviation?
It is useful in marketing problems.
It is useful while using small samples.
It is used in statistical analysis of economic business and social phenomena.
It is useful in calculating the distribution of wealth in a community or a nation.
It is useful in forecasting business cycles.
It will help to understand the Standard Deviation.

31) Define Standard Deviation.


Standard Deviation is the square root of the mean of the squared deviation from the arithmetic mean. It is
also known as root mean square deviation.
32) Write down the difference between Mean Deviation and Standard Deviation.
Mean Deviation
Standard Deviation
1. Deviations are calculated from mean, Deviations are calculated only from mean
median or mode.
2. Algebraic signs are ignored while
calculating Mean Deviation.
3. It is simple to calculate.
Algebraic signs are taken into account.
4. It lacks mathematical properties,
because algebraic signs are ignored.

It is difficult to calculate.
It is mathematically sound because
algebraic signs are taken into account.
.
33) Write the Merits and Demerits of Standard Deviation.
Merits:
It is rigidly defined.
Its value is always definite.
It is based on all the observations.
The actual signs of deviations are used.
As it is based on Arithmetic Mean, it has all the merits of Arithmetic Mean.
It is the most important and widely used measure of dispersion.
It is possible for further algebraic treatment.
It is less affected by the fluctuations of sampling and hence stable.
Squaring the deviations makes all of them positive, so there is no need to ignore the signs (as in the Mean Deviation).
It is the basis for measuring the coefficient of correlation, sampling and statistical inferences.
The Standard Deviation provides the unit of measurement for the normal distribution.
It can be used to calculate the Combined Standard Deviation of two or more groups.
The coefficient of variation is considered to be the most appropriate method for comparing the
variability of two or more distributions, and this is based on mean and standard deviation.
Demerits:

It is not easy to understand.
It is difficult to calculate.
It gives more weight to extreme values, because the values are squared.
It is affected by the value of every item in the series.
As it is an absolute measure of variability, it cannot be used for the purpose of comparison.
It has not found favour with economists and businessmen.
34) What are the Uses of Standard Deviation?
Standard Deviation is the best Measure of Dispersion.
It is widely used in statistics because it possesses most of the characteristics of an ideal Measure of
Dispersion.
It is widely used in sampling theory and by biologists.
It is used in Coefficient of Correlation and in the study of symmetrical frequency distribution.
35) Define Coefficient of Variation.
According to Karl Pearson, the Coefficient of Variation is the percentage variation in the mean, the Standard Deviation being considered as the total variation in the mean. The Standard Deviation is an absolute measure of dispersion; the corresponding relative measure is known as the Coefficient of Variation. Symbolically,
Coefficient of Variation (C.V.) = (Standard Deviation / Mean) x 100

36) What are the Uses of Coefficient of Variation?


It is used in problems where we want to compare the variability of two or more series. The series (or group) for which the C.V. is greater is said to be more variable, or conversely less consistent, less uniform, less stable or less homogeneous. On the other hand, the series for which the C.V. is smaller is said to be less variable, or more consistent, more uniform, more stable or more homogeneous.
For example, the average height for adult men in the United States is about 70 inches (178 cm), with a standard deviation of around 3 in (8 cm). This means that most men (about 68 percent, assuming a normal distribution) have a height within 3 in (8 cm) of the mean (67-73 in, 170-185 cm), whereas almost all men (about 95%) have a height within 6 in (15 cm) of the mean (64-76 in, 163-193 cm). If the standard deviation were zero, then all men would be exactly 70 in (178 cm) tall. If the standard deviation were 20 in (51 cm), then men would have much more variable heights, with a typical range of about 50 to 90 in (127 to 229 cm).
Example
Consider a population consisting of the following eight values:
2, 4, 4, 4, 5, 5, 7, 9
There are eight data points in total, with a mean (or average) value of 5:
(2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5
To calculate the population standard deviation, first compute the difference of each data point from the mean, and square the result:
(2 - 5)^2 = 9, (4 - 5)^2 = 1, (4 - 5)^2 = 1, (4 - 5)^2 = 1, (5 - 5)^2 = 0, (5 - 5)^2 = 0, (7 - 5)^2 = 4, (9 - 5)^2 = 16
Next average these values and take the square root to give the standard deviation:
sqrt((9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8) = sqrt(32 / 8) = sqrt(4) = 2
Therefore, the above population has a population standard deviation of 2.


The above assumes a complete population. If the 8 values are obtained by random sampling from some parent
population, then computing the sample standard deviation would use a denominator of 7 instead of 8.
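The calculation can be checked with the standard library (a minimal sketch; pstdev divides by the population size 8, stdev by the sample denominator 7):

    from statistics import pstdev, stdev

    values = [2, 4, 4, 4, 5, 5, 7, 9]
    print(pstdev(values))   # 2.0   (population standard deviation)
    print(stdev(values))    # about 2.14 (sample standard deviation)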

37) Define Variance.
The square of the Standard Deviation is called the Variance.

The coefficient of variation (CV) is a normalised measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation to the mean: CV = standard deviation / mean.
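A minimal sketch of the computation, expressed as a percentage as in question 35 (the data are invented):

    from statistics import mean, pstdev

    data = [12, 15, 14, 10, 9]                     # invented observations
    cv = pstdev(data) / mean(data) * 100           # coefficient of variation in per cent
    print(round(cv, 2))                            # about 19.0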

38) Explain Lorenz Curve.


Lorenz Curve is a graphical method of studying dispersion. It is used to study the variability in the
distribution of profits, wages, revenue, etc., and to study the degree of inequality in the distribution of income and
wealth between countries or between different periods. It is a cumulative percentage curve in which the cumulative percentage of items is plotted against the cumulative percentage of another variable such as wealth, profit, turnover, etc.

Procedure for drawing Lorenz Curve:

Obtain the cumulative values of both the variable and the frequencies.
Find the percentages of these various cumulative values.
Take the percentage of cumulative frequencies on the X-axis, ranging from 0 to 100.
Take the percentage of cumulative values of the variable on the Y-axis, ranging from 0 to 100.
Draw a straight line joining the points (0, 0) and (100, 100).
Plot the various points corresponding to X and Y.
Join these plotted points by a smooth curve. This curve is called the Lorenz Curve.

The straight line joining the points (0, 0) and (100, 100) is known as the line of equal distribution. The Lorenz Curve always lies below the line of equal distribution. If the curve is far away from the line of equal distribution, there is greater variability (inequality). The curve will coincide with the line of equal distribution if the distribution is uniform.
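A minimal sketch of the procedure, using the seven workers' incomes from the absolute-dispersion example earlier; plotting the printed (X, Y) points together with the line joining (0, 0) and (100, 100) gives the Lorenz Curve:

    # monthly incomes of the 7 workers from the dispersion example, in ascending order
    incomes = sorted([100, 180, 120, 190, 110, 80, 70])

    total = sum(incomes)
    running = 0
    for k, income in enumerate(incomes, start=1):
        running += income
        cum_persons = 100 * k / len(incomes)       # X: cumulative % of persons
        cum_income  = 100 * running / total        # Y: cumulative % of income
        print(round(cum_persons, 1), round(cum_income, 1))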

MEASURES OF SKEWNESS AND KURTOSIS


39) Explain Symmetrical Distribution & Asymmetrical Distribution.
SYMMETRICAL DISTRIBUTION:
In a Symmetrical Distribution, the values of mean,
Median and mode coincide. The spread of the
frequencies is the same on both sides of the centre
point of the curve.
ASYMMETRICAL DISTRIBUTION:
A distribution which is not symmetrical is called

a skewed distribution and such a distribution could


either be positively Skewed or negatively skewed.
POSITIVELY SKEWED DISTRIBUTION:
In a positively skewed distribution, the value of the mean is the greatest and that of the mode the least; the median lies in between the two.
NEGATIVELY SKEWED DISTRIBUTION:
In a negatively skewed distribution, the value of the mode is the greatest and that of the mean the least; the median lies in between the two.
40) Define Asymmetrical or Skewed.
When a series is not symmetrical it is said to be asymmetrical or skewed.
41) Write the Characteristics of Dispersion and Skewness.
DISPERSION
1. It shows the spread of individual values about the central value.
2. It is useful to study the variability in the data.
3. It judges the truthfulness of the central tendency.
4. It is a type of average of deviations (an average of the second order).
5. It shows the degree of variability.
SKEWNESS
1. It shows the departure from symmetry.
2. It is useful to study the concentration in the lower or higher values.
3. It judges the differences between the central tendencies.
4. It is not an average, but is measured by the use of the mean, the median and the mode.
5. It shows whether the concentration is in the higher or the lower values.

42) What are the important measures of relative skewness?


There are four important measures of relative skewness, namely
Karl Pearson's coefficient of skewness.
Bowley's coefficient of skewness.
Kelly's coefficient of skewness.
The measure of skewness based on moments.
43) Define Moments.
The arithmetic means of the various powers of the deviations from the mean in any distribution are called the moments about the mean, or the central moments, of the distribution.
44) Explain measure of skewness based on moments.

SKEWNESS
Skewness is a measure of the asymmetry of a distribution.

1. negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure.
It has relatively few low values. The distribution is said to be left-skewed. Example (observations):
1,1000,1001,1002,1003
2. positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure.
It has relatively few high values. The distribution is said to be right-skewed. Example (observations):
1,2,3,4,100.
In a skewed (unbalanced, lopsided) distribution, the mean is farther out in the long tail than is the median . If there is
no skewness or the distribution is symmetric like the bell shaped normal curve, then the mean = median = mode.
Many textbooks teach a rule of thumb stating that the mean is right of the median under right skew, and left of the
median under left skew. This rule fails with surprising frequency. It can fail in multimodal distributions, or in distributions where one tail is long but the other is heavy. Most commonly, though, the rule fails in discrete
distributions where the areas to the left and right of the median are not equal. Such distributions not only contradict
the textbook relationship between mean, median, and skew, they also contradict the textbook interpretation of the
median.

Definition
Skewness, based on the third standardized moment, is written as γ₁ and defined as
γ₁ = μ₃ / σ³
where μ₃ is the third moment about the mean and σ is the standard deviation.

This is analogous to the definition of kurtosis. The skewness of a random variable X is sometimes denoted Skew[X].
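A minimal sketch computing the moment coefficient of skewness γ₁ = μ₃ / σ³ for the two small example data sets given above:

    def skewness(data):
        n = len(data)
        m = sum(data) / n
        mu2 = sum((x - m) ** 2 for x in data) / n   # second central moment (variance)
        mu3 = sum((x - m) ** 3 for x in data) / n   # third central moment
        return mu3 / mu2 ** 1.5                     # gamma_1 = mu3 / sigma^3

    print(skewness([1, 1000, 1001, 1002, 1003]))    # negative: left-skewed
    print(skewness([1, 2, 3, 4, 100]))              # positive: right-skewed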

KURTOSIS
Kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure of the "peakedness" of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly-sized deviations.

Definition
The fourth standardized moment is defined as
β₂ = μ₄ / σ⁴
where μ₄ is the fourth moment about the mean and σ is the standard deviation. This is sometimes used as the definition of kurtosis in older works, but is not the definition used here.
Kurtosis is more commonly defined as the fourth cumulant divided by the square of the second cumulant, which is equal to the fourth moment around the mean divided by the square of the variance of the probability distribution, minus 3:
γ₂ = μ₄ / σ⁴ - 3
which is also known as excess kurtosis. The "minus 3" at the end of this formula is often explained as a correction to make the kurtosis of the normal distribution equal to zero. Another reason can be seen by looking at the formula for the kurtosis of the sum of random variables. Because of the use of the cumulant, if Y is the sum of n independent random variables, all with the same distribution as X, then Kurt[Y] = Kurt[X] / n, while the formula would be more complicated if kurtosis were defined as μ₄ / σ⁴.
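A minimal sketch computing β₂ = μ₄ / σ⁴ and the excess kurtosis γ₂ = β₂ - 3 from the moments (for a normal distribution β₂ = 3 and the excess kurtosis is 0); the eight values from the standard-deviation example above are reused:

    def kurtosis(data):
        n = len(data)
        m = sum(data) / n
        mu2 = sum((x - m) ** 2 for x in data) / n   # variance
        mu4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
        beta2 = mu4 / mu2 ** 2                      # mu4 / sigma^4
        return beta2, beta2 - 3                     # (kurtosis, excess kurtosis)

    print(kurtosis([2, 4, 4, 4, 5, 5, 7, 9]))       # (2.78125, -0.21875)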
