
ABF 102

BUSINESS STATISTICS Vol-1

ACeL
Amity University


Preface
The importance of Business Statistics, as a field of study and practice, is being increasingly realized in schools, colleges, universities, and commercial and industrial organizations both in India and abroad. It is a technical and practical subject, and learning it means familiarizing oneself with many new terms and concepts. As this Study Material is intended to serve beginners in the field, I have tried to keep it simple. It is intended for students of the BFIA course of Amity University, is student oriented, and is written in a teach-yourself style.
The primary objective of this Study Material is to facilitate a clear understanding of the subject of Business Statistics. It contains a wide range of theoretical and practical questions varying in content, length and complexity. Most of the illustrations and exercise problems have been taken from various university examinations, and there are enough illustrations to support a better grasp of the subject. Care has been taken to ensure the accuracy of the formulae and of the answers to the exercise questions. For the convenience of students I have also included multiple-choice questions and a case study.

I hope that this Material will prove useful to both students and teachers. The contents of this Study Material are divided into eight chapters covering various aspects of the syllabus of BFIA and other related courses. At the end of this Material, three assignments related to the subject matter have been provided.

I have taken a considerable amount of help from various literature, journals and media. I express my gratitude to all those who have devoted their lives to knowledge, especially Statistics, from whom I could learn; on the basis of that learning I am now trying to pass my knowledge on to others through this material. It is by God's loving grace that He brought me into this world and blessed me with loving and caring parents, my respected father Mr. Manohar Lal Arora and my loving mother Mrs. Kamla Arora, who have supported me in this Study Material. Words may not be enough to express my deep sense of gratitude and indebtedness to Dr. Shipra Maitra, Director (Amity College of Commerce & Finance, Amity University, Noida), for her benevolent guidance, constructive criticism and constant encouragement throughout the period I have been involved in this Study Material. I am thankful to my beloved wife Mrs. Deepti Arora, without whose constant encouragement, advice and material sacrifice this achievement would have been a far-off dream.

By- Dr. Adarsh Arora

Table of Contents
Preface
CHAPTER ONE: INTRODUCTION TO STATISTICS
  1.1 Introduction
  1.2 Meaning of Statistics
  1.3 Origin and Growth of Statistics
  1.4 Definitions
    1.4.1 Definition by Florence Nightingale
    1.4.2 Definitions by A.L. Bowley
    1.4.3 Definition by Croxton and Cowden
    1.4.4 Definition by Horace Secrist
    1.4.5 Definition by Professor Secrist
    1.4.6 Definition by Croxton and Cowden
  1.5 Characteristics of Statistics
    1.5.1 Statistics are aggregate of facts
    1.5.2 Statistics are numerically expressed
    1.5.3 Statistics are affected to a marked extent by multiplicity of causes
    1.5.4 Statistics are collected in a systematic order
    1.5.5 Statistics must be collected for a predetermined purpose
    1.5.6 Statistics should be placed in relation to each other
  1.6 Functions of Statistics
    1.6.1 Condensation
    1.6.2 Comparison
    1.6.3 Forecasting
    1.6.4 Estimation
    1.6.5 Tests of Hypothesis
  1.7 Scope of Statistics
    1.7.1 Statistics and Industry
    1.7.2 Statistics and Commerce
    1.7.3 Statistics and Agriculture
    1.7.4 Statistics and Economics
    1.7.5 Statistics and Education
    1.7.6 Statistics and Planning
    1.7.7 Statistics and Medicine
    1.7.8 Statistics and Modern applications
  1.8 Limitations of statistics
    1.8.1 Statistics is not suitable to the study of qualitative phenomenon
    1.8.2 Statistics does not study individuals
    1.8.3 Statistical laws are not exact
    1.8.4 Statistics table may be misused
    1.8.5 Statistics is only one of the methods of studying a problem
  1.9 Distrust of Statistics
  1.10 Uses of Statistics
    1.10.1 To present the data in a concise and definite form
    1.10.2 To make it easy to understand complex and large data
    1.10.3 For comparison
    1.10.4 In forming policies
    1.10.5 Enlarging individual experiences
    1.10.6 In measuring the magnitude of a phenomenon
  1.11 Types of Statistics
  1.12 Common Mistakes Committed in Interpretation of Statistics
  Chapter One: End Chapter Quizzes
CHAPTER TWO: PRIMARY AND SECONDARY DATA
  2.1 Primary Data
  2.2 Sources of Primary Data
    2.2.1 Direct personal investigations
    2.2.2 Indirect oral investigations
    2.2.3 Information through correspondence
    2.2.4 Mailed questionnaire method
    2.2.5 Schedule to be filled in by the enumerator
  2.3 Secondary Data
  2.4 The nature of secondary sources of information
  2.5 Sources of Secondary data
    2.5.1 Internal sources of secondary data
    2.5.2 External sources of secondary information
    2.5.3 Examples of Sources of External Secondary Data
  2.6 The problems of secondary sources
  2.7 Difference between Primary & Secondary Data
  Chapter Two: End Chapter Quizzes
CHAPTER THREE: MEASURES OF DISPERSION
  3.1 Meaning
  3.2 Definitions
  3.3 Types of Dispersion
    3.3.1 Absolute Dispersion
    3.3.2 Relative Dispersion
  3.4 Features of an ideal measure of dispersion
  3.5 Methods of measuring Dispersion
    3.5.1 Range
    3.5.2 Quartile Deviations
    3.5.3 Mean Deviation
    3.5.4 Standard Deviation (S.D.)
    3.5.5 Co-efficient of Variation (C.V.)
  Chapter Three: End Chapter Quizzes
CHAPTER FOUR: MEASURES OF SKEWNESS
  4.1 Skewness
  4.2 Definitions
  4.3 Difference between Skewness and Dispersion
  4.4 Tests of Skewness
  4.5 Methods of measurement of Skewness
  Chapter Four: End Chapter Quizzes
CHAPTER FIVE: CORRELATION
  5.1 Introduction
  5.2 Definitions
  5.3 Coefficient of Correlation
  5.4 Types of Correlation
  5.5 Degrees of Correlation
  5.6 Techniques in Determining Correlation
    5.6.1 Rating Scales
  5.7 Methods of Determining Correlation
    5.7.3 Spearman's Rank Correlation Coefficient
  Chapter Five: End Chapter Quizzes

CHAPTER ONE : INTRODUCTION TO STATISTICS


1.1 Introduction

In the modern world of computers and information technology, the importance of statistics is well recognised by all disciplines. Statistics originated as a science of statehood and slowly and steadily found applications in agriculture, economics, commerce, biology, medicine, industry, planning, education and so on. Today there is hardly any walk of human life where statistics cannot be applied.

Statistics is a discipline concerned with: designing experiments and other data collection, summarizing information to aid understanding, drawing conclusions from data, and estimating the present or predicting the future. Today, statistics has become an important tool in the work of many academic disciplines such as medicine, psychology, education, sociology, engineering and physics, to name just a few. Statistics is also important in many aspects of society such as business, industry and government. Because of the increasing use of statistics in so many areas of our lives, it has become very desirable to understand and practise statistical thinking. This is important even if you do not use statistical methods directly.

Examples of Statistics: unemployment rate, consumer price index, rate of violent crimes, infant mortality rates, poverty rate of a country, batting average of a baseball player, on-base percentage of a baseball player, salary rates, standardized test results.

1.2 Meaning of Statistics

The word 'Statistics' is derived from the Latin word 'Status', which means a "political state." Clearly, statistics is closely linked with the administrative affairs of a state, such as facts and figures regarding the defence force, population, housing, food, financial resources etc. What is true of a government is also true of industrial administration units, and even of one's personal life.

The word statistics has several meanings. In the first place, it is a plural noun which describes a collection of numerical data, such as employment statistics, accident statistics, population statistics, statistics of births and deaths, of income and expenditure, of exports and imports etc. It is in this sense that the word 'statistics' is used by a layman or a newspaper.

Secondly, the word statistics, as a singular noun, is used to describe a branch of applied mathematics whose purpose is to provide methods of dealing with collections of data and extracting information from them in compact form by tabulating, summarizing and analyzing the numerical data or a set of observations. The various methods used are termed statistical methods, and the person using them is known as a statistician. A statistician is concerned with the analysis and interpretation of the data and with drawing valid, worthwhile conclusions from them. It is in this second sense that we are writing this guide on statistics.

Lastly, the word statistics is used in a specialized sense. It describes the various numerical items which are produced by applying statistics (in the second sense) to statistics (in the first sense). Averages, standard deviations etc. are all statistics in this specialized third sense.

1.3 Origin and Growth of Statistics:

The words 'Statistics' and 'Statistical' are both derived from the Latin word Status, meaning a political state. The theory of statistics as a distinct branch of scientific method is of comparatively recent growth. Research, particularly into the mathematical theory of statistics, is proceeding rapidly, and fresh discoveries are being made all over the world.

1.4 Definitions :

Statistics has been defined differently by different authors over a period of time. In the olden days statistics was confined to state affairs only, but in modern days it embraces almost every sphere of human activity. Therefore a number of old definitions, which were confined to a narrow field of enquiry, have been replaced by newer definitions that are much more comprehensive and exhaustive. Secondly, statistics has been defined in two different ways: as statistical data and as statistical methods. The following are some of the definitions of statistics as numerical data.

1. Statistics are the classified facts representing the conditions of people in a state. In particular they are the facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.

2. Statistics are measurements, enumerations or estimates of natural phenomena, usually systematically arranged, analysed and presented so as to exhibit important interrelationships among them.
1.4.1 Definition by Florence Nightingale

the most important science in the whole world: for upon it depends the practical application of every other science and every art: the one science essential to all political and social administration, all education, all organization based on experience, for it only gives results of our experience.

1.4.2 Definitions by A.L. Bowley:

"Statistics are numerical statement of facts in any department of enquiry placed in relation to each other." - A.L. Bowley

In another of his definitions, Bowley says that "statistics may be called the science of counting." This is obviously an incomplete definition, as it takes into account only the aspect of collection and ignores other aspects such as analysis, presentation and interpretation. Bowley gives yet another definition, which states that "statistics may be rightly called the science of averages." This definition is also incomplete: averages play an important role in understanding and comparing data, but statistics provides many other measures as well.
1.4.3 Definition by Croxton and Cowden:

Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data from the logical analysis. It is clear that the definition of statistics by Croxton and Cowden is the most scientific and realistic one. According to this definition there are four stages:

1. Collection of data: This is the first step and the foundation on which the rest of the investigation is built. Careful planning is essential before collecting the data. There are different methods of collecting data, such as census, sampling, primary, secondary, etc., and the investigator should make use of the correct method.

2. Presentation of data: The mass of data collected should be presented in a suitable, concise form for further analysis. The collected data may be presented in tabular, diagrammatic or graphic form.

3. Analysis of data: The data presented should be carefully analysed in order to draw inferences from it, using tools such as measures of central tendency, dispersion, correlation, regression etc.

4. Interpretation of data: The final step is drawing conclusions from the data collected. A valid conclusion must be drawn on the basis of the analysis. A high degree of skill and experience is necessary for interpretation.
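The four stages can be followed on a very small example. The sketch below is only an illustration: the sales figures are made up, and the choice of a frequency table for presentation and of the mean and standard deviation for analysis are assumptions, not part of Croxton and Cowden's definition.

```python
from collections import Counter
from statistics import mean, pstdev

# 1. Collection: hypothetical daily sales counts (made-up data for illustration).
daily_sales = [12, 15, 12, 18, 15, 12, 20, 15, 18, 12]

# 2. Presentation: a frequency table of the collected values.
frequency_table = sorted(Counter(daily_sales).items())

# 3. Analysis: one measure of central tendency and one measure of dispersion.
average = mean(daily_sales)
spread = pstdev(daily_sales)


# 4. Interpretation: a plain-language reading of the analysis.
def interpret(avg, sd):
    return f"Sales average {avg:.1f} units a day and typically vary by about {sd:.1f} units."


for value, count in frequency_table:
    print(f"{value:>3} | {'*' * count}")
print(interpret(average, spread))
```

Each stage feeds the next: the frequency table is built only from what was collected, and the interpretation uses only the figures produced by the analysis.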

1.4.4 Definition by Horace Secrist:

Statistics may be defined as the aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner, for a predetermined purpose and placed in relation to each other. The above definition seems to be the most comprehensive and exhaustive.
1.4.5 Definition by Professor Secrist :

The word statistics in the first sense is defined by Professor Secrist as follows:

"By statistics we mean aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other."

This definition gives all the characteristics of statistics, which are: aggregate of facts; affected by multiplicity of causes; numerically expressed; estimated according to reasonable standards of accuracy; collected in a systematic manner; collected for a predetermined purpose; placed in relation to each other.
1.4.6 Definition by Croxton and Cowden :

The word 'statistics' in the second sense is defined by Croxton and Cowden as follows:

"The collection, presentation, analysis and interpretation of the numerical data."

This definition clearly points out four stages in a statistical investigation, namely:

1) Collection of data
2) Presentation of data
3) Analysis of data
4) Interpretation of data

In addition to these, one more stage, i.e. organization of data, is suggested.

1.5 Characteristics of Statistics:


1.5.1 Statistics are aggregate of facts :

A single fact is not called statistics. To become statistics, there must be more than one fact. The data may relate to production, sales, employment, births, deaths etc.

1.5.2 Statistics are numerically expressed :

Only those statements which can be expressed numerically are statistics. Statistics does not deal with qualitative statements such as "students of MBA are intelligent." On the other hand, a statement such as "the sales of Escorts Ltd. are Rs. 354 crores" is a statistical fact stated numerically.

1.5.3 Statistics are affected to a marked extent by multiplicity of causes :

Statistical data are affected to a great extent by various causes. For instance, the production of wheat depends upon the quality of seed, rainfall, quality of soil, fertilizer used, method of cultivation etc.

1.5.4 Statistics are collected in a systematic order :

Statistical data are collected in a systematic manner. This means that the investigator has to chalk out a plan keeping in view the objective of data collection, and determine the statistical unit, the technique of data collection and so on.
1.5.5 Statistics must be collected for a predetermined purpose :

The objective of data collection must be predetermined and well established. A mere statement of purpose is insufficient.

1.5.6 Statistics should be placed in relation to each other :

The statistical data must be comparable. This is possible only when the data are homogeneous.

1.6 Functions of Statistics:

There are many functions of statistics. Let us consider the following five important functions.
1.6.1 Condensation:

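Generally speaking, to condense means to reduce or to lessen. Condensation mainly aims at making a huge mass of data understandable by providing only a few observations. If, in a particular class in a Chennai school, only the marks in an examination are given, no purpose will be served. Instead, if we are given the average mark in that examination, it definitely serves the purpose better. Similarly, the range of marks is another measure of the data. Thus, statistical measures help to reduce the complexity of the data and consequently to understand any huge mass of data.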
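Condensation can be illustrated by reducing a list of examination marks to two summary figures, the average and the range. The marks below are made-up values and the function name `condense` is purely illustrative.

```python
from statistics import mean

# Hypothetical examination marks for one class (illustrative values only).
marks = [45, 62, 78, 55, 90, 38, 71, 66, 59, 84]


def condense(values):
    """Reduce a list of observations to its average and its range."""
    return {"average": mean(values), "range": max(values) - min(values)}


summary = condense(marks)
print(summary)
```

Ten separate marks are replaced by two numbers that are far easier to take in, at the cost of losing the detail of individual scores.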

1.6.2 Comparison:

Classification and tabulation are the two methods that are used to condense the data. They help us to compare data collected from different sources. Grand totals, measures of central tendency, measures of dispersion, graphs and diagrams, the coefficient of correlation etc. provide ample scope for comparison. Even if we have only one group of data, we can make comparisons within it. If the rice production (in tonnes) in Tanjore district is known, then we can compare one region with another region within the district. Or, if the rice production (in tonnes) of two different districts within Tamil Nadu is known, a comparative study can again be made. As statistics is an aggregate of facts and figures, comparison is always possible, and in fact comparison helps us to understand the data better.
1.6.3 Forecasting:

By forecasting, we mean predicting or estimating beforehand. Given the rainfall data of the last ten years for a particular district in Tamil Nadu, it is possible to predict or forecast the rainfall for the near future. In business, too, forecasting plays a dominant role in connection with production, sales, profits etc. The analysis of time series and regression analysis play an important role in forecasting.
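As a rough illustration of forecasting from ten years of data, the sketch below fits a straight-line trend by least squares and extends it one year ahead. The rainfall figures are invented, and a linear trend is only one simple assumption among the many time series and regression methods available.

```python
def fit_trend(values):
    """Fit y = a + b*t by least squares, with t = 0, 1, 2, ... for successive periods."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps_ahead=1):
    """Extend the fitted trend line steps_ahead periods beyond the last observation."""
    intercept, slope = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical annual rainfall (mm) for the last ten years in one district.
rainfall_mm = [910, 875, 940, 1005, 960, 990, 1030, 985, 1045, 1060]
print(f"Trend forecast for next year: {forecast(rainfall_mm):.0f} mm")
```

Such a forecast is only as good as the assumption that the past trend continues, which is why it should be read as an estimate and not as a certainty.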
1.6.4 Estimation:

One of the main objectives of statistics is to draw inferences about a population from the analysis of a sample drawn from that population. The four major branches of statistical inference are:

1. Estimation theory
2. Tests of hypothesis
3. Non-parametric tests
4. Sequential analysis

In estimation theory, we estimate the unknown value of a population parameter on the basis of the sample observations. Suppose we are given a sample of the heights of one hundred students in a school. Based upon the heights of these 100 students, it is possible to estimate the average height of all students in that school.
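A minimal sketch of this kind of estimation is given below. It uses the sample mean as the estimate of the population mean and reports the standard error as a guide to its precision; the heights are invented, and only ten are listed where the text speaks of one hundred, purely to keep the example short.

```python
from math import sqrt
from statistics import mean, stdev


def estimate_mean(sample):
    """Return the point estimate of the population mean and its standard error."""
    n = len(sample)
    return mean(sample), stdev(sample) / sqrt(n)


# Hypothetical heights (cm) of a sample of students; a real sample would be larger.
heights_cm = [152, 160, 148, 171, 165, 158, 162, 155, 168, 159]

estimate, std_error = estimate_mean(heights_cm)
print(f"Estimated average height: {estimate:.1f} cm (standard error {std_error:.1f} cm)")
```

The larger the sample, the smaller the standard error, which is why a sample of 100 students gives a more dependable estimate than a sample of 10.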
1.6.5 Tests of Hypothesis:

A statistical hypothesis is a statement about the probability distribution characterising a population, made on the basis of the information available from the sample observations. Statistical methods are extremely useful in the formulation and testing of hypotheses. Whether crop yield has increased because of the use of a new fertilizer, or whether a new medicine is effective in eliminating a particular disease, are examples of hypotheses that are tested with proper statistical tools.
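The fertilizer question can be sketched as a comparison of two sample means. The example below computes Welch's t statistic, which is one possible test and is not named in the text above; the yields are invented. To reach a decision, the statistic would still have to be compared with a critical value from t tables at the chosen level of significance.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


# Hypothetical crop yields (quintals per plot), with and without the new fertilizer.
with_fertilizer = [24.1, 25.3, 26.0, 24.8, 25.9, 26.4]
without_fertilizer = [22.5, 23.1, 24.0, 22.8, 23.6, 23.3]

t_value = welch_t(with_fertilizer, without_fertilizer)
print(f"t statistic: {t_value:.2f}")
```

A large positive value of the statistic is evidence against the hypothesis that the fertilizer makes no difference to the yield.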

1.7 Scope of Statistics:

Statistics is not merely a device for collecting numerical data; it is also a means of developing sound techniques for handling and analysing such data and for drawing valid inferences from them. Statistics is applied in every sphere of human activity, social as well as physical, such as Biology, Commerce, Education, Planning, Business Management, Information Technology, etc. It is almost impossible to find a single department of human activity where statistics cannot be applied. We now discuss briefly the applications of statistics in other disciplines.
1.7.1 Statistics and Industry:

Statistics is widely used in many industries. Control charts are widely used to maintain a certain quality level. In production engineering, statistical tools such as inspection plans and control charts are of extreme importance for finding out whether or not a product conforms to specifications. In inspection plans we have to resort to some kind of sampling, a very important aspect of Statistics.
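The idea behind a control chart can be sketched as follows. This is a deliberately simplified version that sets the limits at three standard deviations on either side of the mean of some baseline measurements; control charts used in practice usually estimate the variation from subgroup ranges or moving ranges instead. The measurements and function names are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Lower limit, centre line and upper limit from in-control baseline measurements."""
    centre = mean(baseline)
    sigma = stdev(baseline)
    return centre - 3 * sigma, centre, centre + 3 * sigma


def out_of_control(measurements, limits):
    """Return the measurements that fall outside the control limits."""
    lower, _, upper = limits
    return [x for x in measurements if x < lower or x > upper]


# Hypothetical shaft diameters (mm) measured while the process was running normally.
baseline_mm = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.00]
limits = control_limits(baseline_mm)

# New measurements are checked against the limits set from the baseline.
new_mm = [10.01, 9.99, 10.12, 10.00]
print("Limits:", tuple(round(v, 3) for v in limits))
print("Out of control:", out_of_control(new_mm, limits))
```

A point outside the limits does not prove that the product is defective; it signals that the process should be investigated.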
1.7.2 Statistics and Commerce:

Statistics are the lifeblood of successful commerce. No businessman can afford either to understock or to overstock his goods. At the outset he estimates the demand for his goods and then takes steps to adjust his output or purchases accordingly. Thus statistics is indispensable in business and commerce. As many multinational companies have entered the Indian economy, the size and volume of business is increasing. On one side competition is stiffening, while on the other side tastes are changing and new fashions are emerging. In this connection, market surveys play an important role in showing present conditions and forecasting likely changes in the future.
1.7.3 Statistics and Agriculture:

Analysis of variance (ANOVA), one of the statistical tools developed by Professor R. A. Fisher, plays a prominent role in agricultural experiments. In tests of significance based on small samples, the t-statistic is adequate for testing the significance of the difference between two sample means. In analysis of variance, we are concerned with testing the equality of several population means.

For example, suppose five fertilisers are each applied to five plots of wheat and the yield of wheat on each plot is recorded. In such a situation, we are interested in finding out whether the effects of these fertilisers on the yield are significantly different or not; in other words, whether the samples are drawn from the same normal population or not. The answer to this problem is provided by the technique of ANOVA, which is used to test the homogeneity of several population means.
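The fertiliser example can be sketched numerically. The short Python program below is only an illustration: the yields are made-up figures, and the function name one_way_anova is a hypothetical helper, not part of any library. It computes the F ratio as the variation between fertiliser means divided by the variation within each fertiliser group.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation between the group means, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical wheat yields on five plots for each of five fertilisers
yields = {
    "A": [20, 21, 19, 20, 20],
    "B": [22, 23, 21, 22, 22],
    "C": [25, 24, 26, 25, 25],
    "D": [21, 20, 22, 21, 21],
    "E": [27, 28, 26, 27, 27],
}

f_ratio, df_between, df_within = one_way_anova(list(yields.values()))
print(f"F = {f_ratio:.1f} on ({df_between}, {df_within}) degrees of freedom")
```

With these invented figures the F ratio is far above the tabulated 5% value for (4, 20) degrees of freedom (about 2.87), so the hypothesis that all five fertilisers give the same mean yield would be rejected. Real data would rarely separate so cleanly.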
1.7.4 Statistics and Economics:

Statistical methods are useful in measuring numerical changes in complex groups and interpreting collective phenomena. Nowadays statistics is used abundantly in any economic study. Both in economic theory and practice, statistical methods play an important role. Alfred Marshall said, "Statistics are the straw out of which I, like every other economist, have to make the bricks." It may also be noted that statistical data and statistical tools are immensely useful in solving many economic problems such as wages, prices, production, distribution of income and wealth and so on. Statistical tools like index numbers, time series analysis, estimation theory and testing of statistical hypotheses are extensively used in economics.
1.7.5 Statistics and Education:

Statistics is widely used in education. Research has become a common feature in all branches of activity. Statistics is necessary for the formulation of policies to start new courses, for the consideration of facilities available for new courses, etc. There are many people engaged in research work to test past knowledge and evolve new knowledge. These are possible only through statistics.

1.7.6 Statistics and Planning:

Statistics is indispensable in planning. In the modern world, which can be termed the world of planning, almost all organisations in the government seek the help of planning for efficient working, for the formulation of policy decisions and for their execution. To achieve these goals, statistical data relating to production, consumption, demand, supply, prices, investments, income, expenditure, etc., and various advanced statistical techniques for processing, analysing and interpreting such complex data are of importance. In India statistics plays an important role in planning and commissioning at both the central and state government levels.
1.7.7 Statistics and Medicine:

In medical sciences, statistical tools are widely used. To test the efficacy of a new drug or medicine, the t-test is used; to compare the efficacy of two drugs or two medicines, the two-sample t-test is used. More and more applications of statistics are now being used in clinical investigation.
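As an illustration of the two-sample comparison mentioned above, the following Python sketch computes the pooled-variance t statistic for two small groups. The recovery scores are invented for the example, and the helper name pooled_t is hypothetical; the pooled form also assumes that both groups have roughly equal variance.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample pooled-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, df


# Hypothetical improvement scores for patients given drug A and drug B
drug_a = [8, 10, 12, 9, 11]
drug_b = [5, 7, 9, 6, 8]

t_stat, df = pooled_t(drug_a, drug_b)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

For these made-up scores the statistic exceeds the tabulated two-sided 5% value for 8 degrees of freedom (about 2.31), which would suggest a real difference between the two drugs.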
1.7.8 Statistics and Modern applications:

Recent developments in computer technology and information technology have enabled statistics to integrate their models and thus make statistics a part of the decision-making procedures of many organisations. Many software packages are available for solving problems in design of experiments, forecasting, simulation, etc. SYSTAT, a software package, offers more scientific and technical graphing options than any other desktop statistics package. SYSTAT supports all types of scientific and technical research in various diversified fields, as follows:

1. Archeology: Evolution of skull dimensions
2. Epidemiology: Tuberculosis
3. Statistics: Theoretical distributions
4. Manufacturing: Quality improvement
5. Medical research: Clinical investigations
6. Geology: Estimation of Uranium reserves from ground water

1.8 Limitations of statistics:

Statistics, with all its wide application in every sphere of human activity, has its own limitations. Some of them are given below.

1.8.1 Statistics is not suitable to the study of qualitative phenomena: Since statistics is basically a science that deals with a set of numerical data, it is applicable only to those subjects of enquiry which can be expressed in terms of quantitative measurements. As a matter of fact, qualitative phenomena like honesty, poverty, beauty, intelligence, etc., cannot be expressed numerically, and statistical analysis cannot be directly applied to them. Nevertheless, statistical techniques may be applied indirectly by first reducing the qualitative expressions to accurate quantitative terms. For example, the intelligence of a group of students can be studied on the basis of their marks in a particular examination.
1.8.2 Statistics does not study individuals: Statistics does not give any specific importance to individual items; in fact it deals with an aggregate of objects. Individual items, when taken individually, do not constitute statistical data and do not serve any purpose for a statistical enquiry.
1.8.3 Statistical laws are not exact: It is well known that mathematical and physical sciences are exact. But statistical laws are not exact; they are only approximations. Statistical conclusions are not universally true. They are true only on an average.

1.8.4 Statistics may be misused: Statistics must be used only by experts; otherwise, statistical methods are the most dangerous tools in the hands of the inexpert. The use of statistical tools by inexperienced and untrained persons might lead to wrong conclusions. Statistics can be easily misused by quoting wrong figures. As King aptly says, "statistics are like clay of which one can make a God or Devil as one pleases."
1.8.5 Statistics is only one of the methods of studying a problem: Statistical methods do not provide a complete solution to a problem, because problems have to be studied against the background of a country's culture, philosophy or religion. Thus the statistical study should be supplemented by other evidence.

1.9 Distrust Of Statistics

It is often said that "statistics can prove anything." There are three types of lies: lies, damned lies and statistics, wicked in the order of their naming. A Paris banker said, "Statistics is like a miniskirt, it covers up essentials but gives you the ideas." Thus by "distrust of statistics" we mean a lack of confidence in statistical statements and methods. The following reasons account for such views about statistics. Figures are convincing and, therefore, people easily believe them. They can be manipulated in such a manner as to establish foregone conclusions. The wrong representation of even correct figures can mislead a reader. For example, John earned $4000 in 1990-1991 and Jem earned $5000. Reading this, one would form the opinion that Jem is decidedly a better worker than John. However, if we carefully examine the statement, we might reach a different conclusion, as Jem's earning period is unknown to us. Thus while working with statistics one should not only avoid outright falsehoods but also be alert to detect possible distortion of the truth.

1.10 Uses of Statistics :


1.10.1 To present the data in a concise and definite form: Statistics helps in classifying and tabulating raw data for processing and further tabulation for end users.

1.10.2 To make it easy to understand complex and large data: This is done by presenting the data in the form of tables, graphs, diagrams, etc., or by condensing the data with the help of means, dispersion, etc.

1.10.3 For comparison: Tables and measures of means and dispersion can help in comparing different sets of data.

1.10.4 In forming policies: It helps in forming policies like a production schedule, based on the relevant sales figures. It is used in forecasting future demands.

1.10.5 Enlarging individual experiences: Complex problems can be well understood by statistics, as the conclusions drawn by an individual are more definite and precise than mere statements on facts.

1.10.6 In measuring the magnitude of a phenomenon: Statistics has made it possible to count the population of a country and to measure industrial growth, agricultural growth and the educational level (of course in numbers).
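The idea of condensing data with the help of means and dispersion (uses 1.10.2 and 1.10.3) can be illustrated with a few lines of Python. The marks below are invented, and the helper name summarise is hypothetical; the sketch simply reduces a list of values to its mean, range and standard deviation.

```python
from statistics import mean, pstdev


def summarise(values):
    """Condense a list of numbers into a few summary measures."""
    return {
        "mean": mean(values),                  # central value
        "range": max(values) - min(values),    # simplest measure of spread
        "std_dev": pstdev(values),             # population standard deviation
    }


# Hypothetical examination marks of five students
marks = [35, 42, 50, 58, 65]

summary = summarise(marks)
print(f"mean = {summary['mean']}, range = {summary['range']}, "
      f"standard deviation = {summary['std_dev']:.2f}")
```

Two sets of marks can then be compared by looking at their summaries side by side rather than at every individual figure.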

1.11 Types of Statistics

As mentioned earlier, for a layman or people in general, statistics means numbers: numerical facts, figures or information. The branch of statistics wherein we record and analyze observations for all the individuals of a group or population and draw inferences about the same is called "Descriptive statistics" or "Deductive statistics". On the other hand, if we choose a sample and, by statistical treatment of this, draw inferences about the population, then this branch of statistics is known as "Statistical Inference" or "Inductive Statistics".

In our discussion, we are mainly concerned with two ways of representing descriptive statistics: Numerical and Pictorial.

1. Numerical statistics are numbers. But some numbers are more meaningful, such as the mean, standard deviation, etc.
2. When numerical data is presented in the form of pictures (diagrams) and graphs, it is called pictorial statistics. It makes confusing and complex data or information easy, simple and straightforward, so that even the layman can understand it without much difficulty.

1.12 Common Mistakes Committed In Interpretation of Statistics

1.12.1 Bias: Bias means prejudice or preference of the investigator, which creeps in consciously or unconsciously in proving a particular point.

1.12.2 Generalization: Sometimes, on the basis of the little data available, one could jump to a conclusion, which leads to erroneous results.

1.12.3 Wrong conclusion: The characteristics of a group, if attached to an individual member of that group, may lead us to draw absurd conclusions.

1.12.4 Incomplete classification: If we fail to give a complete classification, the influence of various factors may not be properly understood.

1.12.5 There may be a wrong use of percentages.

1.12.6 Technical mistakes may also occur.

1.12.7 An inconsistency in definition can even exist.

1.12.8 Wrong causal inferences may sometimes be drawn.

1.12.9 There may also be a misuse of correlation.

Chapter One: End Chapter Quizzes

1. The statement, "Statistics is both a science and an art", was given by
a- R. A. Fisher
b- Tippett
c- L. R. Connor
d- A. L. Bowley

2. The word statistics is used as
a- Singular
b- Plural
c- Singular and plural both
d- none of the above

3. "Statistics provides tools and techniques for research workers", was stated by
a- John I. Griffin
b- W. I. King
c- A. M. Mood
d- A. L. Boddington

4. Out of various definitions given by the following workers, which definition is considered to be most exact?
a- R. A. Fisher
b- A. L. Bowley
c- M. G. Kendall
d- Cecil H. Meyers

5. Who stated that there are three kinds of lies: lies, damned lies and statistics?
a- Mark Twain
b- Disraeli
c- Darrell Huff
d- G. W. Snedecor

6. Which of the following represents data?
a- a single value
b- only two values in a set
c- a group of values in a set
d- none of the above

7. Statistics deals with
a- qualitative information
b- quantitative information
c- both (a) and (b)
d- none of (a) and (b)

8. Relative error is always
a- positive
b- negative
c- positive and negative both
d- zero

9. The statement, "Designing of an appropriate questionnaire itself wins half the battle", was given by
a- A. R. Ilersic
b- W. I. King
c- H. Huge
d- H. Secrist

10. Who originally gave the formula for the estimation of errors of the type
a- L. R. Connor
b- W. I. King
c- A. L. Bowley
d- A. L. Boddington

CHAPTER TWO: PRIMARY AND SECONDARY DATA

2.1 Primary Data

The foundation of statistical investigation lies in data, so utmost care must be taken while collecting data. If the collected data are inaccurate and inadequate, the whole analysis and interpretation will also become misleading and unreliable. The method of collection of data depends upon the nature, object and scope of the statistical enquiry on the one hand and the availability of time and money on the other.

Data, or facts, may be derived from several sources. Data can be classified as primary data and secondary data. Primary data is data gathered for the first time by the researcher. So if the investigator himself collects the data for the purpose of his enquiry and uses the data, it is called collection of primary data. These data are original in nature. According to Horace Secrist, "By primary data are meant those data which are original, that is, those in which little or no grouping has been made, the instances being recorded or itemized as encountered. They are essentially raw material."

2.2 Sources of Primary Data


Primary data may be collected by using the following methods, namely :
2.2.1 Direct personal investigations: Under this method the investigator personally contacts the informants and collects the data. This method of data collection is suitable where the field of enquiry is limited or the nature of the enquiry is confidential.

2.2.2 Indirect oral investigations: This method is generally used in those cases where informants are reluctant to give information, so information is gathered from those who possess information on the problem under investigation. These informants are called witnesses. This method of investigation is normally used by enquiry committees and commissions.

2.2.3 Information through correspondence: Under this method, the investigator appoints local agents or correspondents in different parts of the field of enquiry. They send information on specific issues to the investigator on a regular basis. This method is generally adopted by television news channels, newspapers and periodicals.

2.2.4 Mailed questionnaire method: Under this method, the investigator prepares a questionnaire containing questions on the problem under investigation. These questionnaires are mailed to various informants, who are requested to return them by mail after answering the questions. A covering letter is also enclosed requesting the informants to reply before a specific date.

2.2.5 Schedule to be filled in by the enumerator: Under this method, enumerators are appointed areawise. They contact the informants and fill in the information in the schedules themselves. The enumerators should be honest, painstaking and tactful as they have to deal with people of different natures.

2.3 Secondary Data


Secondary data is data taken by the researcher from secondary sources, internal or external. The researcher must thoroughly search secondary data sources before commissioning any effort to collect primary data. Once primary data are collected and published, they become secondary data for other investigators.

Hence, the data obtained from published or unpublished sources are known as secondary data. There are many advantages in searching for and analyzing such data before attempting the collection of primary data. In some cases, the secondary data itself may be sufficient to solve the problem. Usually the cost of gathering secondary data is much lower than the cost of organizing primary data. Moreover, secondary data has several supplementary uses. It also helps to plan the collection of primary data, in case that becomes necessary. Blair has rightly defined secondary data as "those already in existence and which have been collected for some other purpose than the answering of the question at hand."

Secondary data is of two kinds, internal and external. Secondary data, whether internal or external, is data already collected by others for purposes other than the solution of the problem on hand. Business firms always have a great deal of internal secondary data with them. Sales statistics constitute the most important component of secondary data in marketing, and the researcher uses them extensively. All the output of the MIS of the firm generally constitutes internal secondary data. This data is readily available; the market researcher gets it without much effort, time and money.

2.4 The nature of secondary sources of information


Secondary data is data which has been collected by individuals or agencies for purposes other than those of our particular research study. For example, if a government department has conducted a survey of, say, family food expenditures, then a food manufacturer might use this data in the organisation's evaluations of the total potential market for a new product. Similarly, statistics prepared by a ministry on agricultural production will prove useful to a whole host of people and organisations, including those marketing agricultural supplies.

No marketing research study should be undertaken without a prior search of secondary sources (also termed desk research). There are several grounds for making such a bold statement.

Secondary data may be available which is entirely appropriate and wholly adequate to draw conclusions and answer the question or solve the problem. Sometimes primary data collection simply is not necessary.

It is far cheaper to collect secondary data than to obtain primary data. For the same level of research budget a thorough examination of secondary sources can yield a great deal more information than can be had through a primary data collection exercise.

The time involved in searching secondary sources is much less than that needed to complete primary data collection.

Secondary sources of information can yield more accurate data than that obtained through primary research. This is not always true, but where a government or international agency has undertaken a large scale survey, or even a census, this is likely to yield far more accurate results than custom designed and executed surveys when these are based on relatively small sample sizes.

It should not be forgotten that secondary data can play a substantial role in the exploratory phase of the research, when the task at hand is to define the research problem and to generate hypotheses. The assembly and analysis of secondary data almost invariably improves the researcher's understanding of the marketing problem, the various lines of inquiry that could or should be followed and the alternative courses of action which might be pursued.

Secondary sources help define the population. Secondary data can be extremely useful both in defining the population and in structuring the sample to be taken. For instance, government statistics on a country's agriculture will help decide how to stratify a sample and, once sample estimates have been calculated, these can be used to project those estimates to the population.
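The last point, projecting sample estimates to the population with the help of published figures, can be sketched as follows. The stratum sizes stand in for government statistics and the sample means for survey results; all numbers, and the helper name project_to_population, are hypothetical.

```python
def project_to_population(strata):
    """Weight each stratum's sample mean by its known population size."""
    total = sum(s["population_size"] * s["sample_mean"] for s in strata)
    population = sum(s["population_size"] for s in strata)
    return total, total / population


# Hypothetical: number of farms per stratum (from published statistics) and
# average spending per farm on a product (from the researcher's own sample)
strata = [
    {"name": "small farms", "population_size": 6000, "sample_mean": 50.0},
    {"name": "medium farms", "population_size": 3000, "sample_mean": 120.0},
    {"name": "large farms", "population_size": 1000, "sample_mean": 400.0},
]

estimated_total, estimated_mean = project_to_population(strata)
print(f"Estimated total spending: {estimated_total:,.0f}")
print(f"Estimated spending per farm: {estimated_mean:.1f}")
```

A simple average of the three sample means would give 190 per farm, well above the weighted figure of 106, which is why the population size of each stratum matters when projecting.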

2.5 Sources of Secondary data


Secondary sources of data may be divided into two categories: internal sources and external sources.

2.5.1 Internal sources of secondary data


Sales data: All organisations collect information in the course of their everyday operations. Orders are received and delivered, costs are recorded, sales personnel submit visit reports, invoices are sent out, returned goods are recorded and so on. Much of this information is of potential use in marketing research, but surprisingly little of it is actually used. Organisations frequently overlook this valuable resource by not beginning their search of secondary sources with an internal audit of sales invoices, orders, inquiries about products not stocked, returns from customers and sales force customer calling sheets. For example, consider how much information can be obtained from sales orders and invoices:

Sales by territory
Sales by customer type
Prices and discounts
Average size of order by customer, customer type, geographical area
Average sales by sales person
Sales by pack size and pack type, etc.

This type of data is useful for identifying an organisation's most profitable products and customers. It can also serve to track trends within the enterprise's existing customer group.
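A minimal sketch of how such figures might be drawn from invoice records is given below. The invoice records, field names and helper functions (sales_by, average_order_by) are all hypothetical and stand in for whatever a firm's own system holds.

```python
from collections import defaultdict


def sales_by(invoices, key):
    """Total invoice amount for each value of the chosen field."""
    totals = defaultdict(float)
    for invoice in invoices:
        totals[invoice[key]] += invoice["amount"]
    return dict(totals)


def average_order_by(invoices, key):
    """Average invoice amount for each value of the chosen field."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for invoice in invoices:
        totals[invoice[key]] += invoice["amount"]
        counts[invoice[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical invoice records
invoices = [
    {"territory": "North", "customer_type": "Retail", "amount": 1200.0},
    {"territory": "North", "customer_type": "Wholesale", "amount": 4800.0},
    {"territory": "South", "customer_type": "Retail", "amount": 800.0},
    {"territory": "South", "customer_type": "Retail", "amount": 1000.0},
    {"territory": "South", "customer_type": "Wholesale", "amount": 5200.0},
]

territory_sales = sales_by(invoices, "territory")
average_orders = average_order_by(invoices, "customer_type")
print(territory_sales)
print(average_orders)
```

The same two helpers could be pointed at other fields, such as sales person or pack size, provided the records carry them.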

Financial data: An organisation has a great deal of data within its files on the cost of producing, storing, transporting and marketing each of its products and product lines. Such data has many uses in marketing research, including allowing measurement of the efficiency of marketing operations. It can also be used to estimate the costs attached to new products under consideration and the level of utilisation (in production, storage and transportation) at which an organisation's unit costs begin to fall.

Transport data: Companies that keep good records relating to their transport operations are well placed to establish which are the most profitable routes and loads, as well as the most cost effective routing patterns. Good data on transport operations enables the enterprise to perform trade-off analysis and thereby establish whether it makes economic sense to own or hire vehicles, or the point at which a balance of the two gives the best financial outcome.

Storage data: The rate of stockturn and stockhandling costs help in assessing the efficiency of certain marketing operations and the efficiency of the marketing system as a whole. More sophisticated accounting systems assign costs to the cubic space occupied by individual products and the time period over which the product occupies the space. These systems can be further refined so that the profitability per unit, and rate of sale, are added. In this way, the direct product profitability can be calculated.

2.5.2 External sources of secondary information


The marketing researcher who seriously seeks after useful secondary data is more often surprised by its abundance than by its scarcity. Too often, the researcher has secretly (sometimes subconsciously) concluded from the outset that his/her topic of study is so unique or specialised that a search of secondary sources is futile. Consequently, only a cursory search is made with no real expectation of success. Cursory searches become a self-fulfilling prophecy. Dillon et al. give the following advice: "You should never begin a half-hearted search with the assumption that what is being sought is so unique that no one else has ever bothered to collect it and publish it. On the contrary, assume there are scores of secondary data that should help provide definition and scope for the primary research effort."

The same authors support their advice by citing the large numbers of organisations that provide marketing information, including national and local government agencies, quasi-government agencies, trade associations, universities, research institutes, financial institutions, specialist suppliers of secondary marketing data and professional marketing research enterprises. Dillon et al. further advise that searches of printed sources of secondary data begin with referral texts such as directories, indexes, handbooks and guides. These sorts of publications rarely provide the data in which the researcher is interested but serve in helping him/her locate potentially useful data sources.

The main external sources of secondary data are:

(1) Government (federal, state and local)
(2) Trade associations
(3) Commercial services
(4) National and international institutions.

Government statistics: These may include all or some of the following:

Population censuses
Social surveys, family expenditure surveys
Import/export statistics
Production statistics
Agricultural statistics.

Trade associations: Trade associations differ widely in the extent of their data collection and information dissemination activities. However, it is worth checking with them to determine what they do publish. At the very least one would normally expect that they would produce a trade directory and, perhaps, a yearbook.

Commercial services: Published market research reports and other publications are available from a wide range of organisations which charge for their information. Typically, marketing people are interested in media statistics and consumer information which has been obtained from large scale consumer or farmer panels. The commercial organisation funds the collection of the data, which is wide ranging in its content, and hopes to make its money from selling this data to interested parties.

National and international institutions: Bank economic reviews, university research reports, journals and articles are all useful sources to contact. International agencies such as World Bank, IMF, IFAD, UNDP, ITC, FAO and ILO produce a plethora of secondary data which can prove extremely useful to the marketing researcher.

2.5.3 Examples of Sources of External Secondary Data


Following are some examples of sources of external secondary data:

The Internet is a great source of external secondary data. Many published statistics and figures are available on the internet either free or for a fee.

The yellow pages of telephone directories and stand-alone yellow pages have become an established source of elementary business information. Tata Press, which first launched a stand-alone yellow pages directory for Mumbai City, and GETIT yellow pages have been leading in this field. Today, yellow pages publications are available for all cities and major towns in the country. New Horizons, a joint venture between the Living Media group of publications and Singapore Telecom, has been publishing stand-alone directories for specific businesses. Business India data base of the Business India publications had been publishing the Delhi Pages directory.

The Thomas Register is the world's most powerful industrial buying guide. It ensures a fast, frictionless flow of information between buyers and sellers of industrial goods and services. This purchasing tool is now available in India. The Thomas Register of Indian Manufacturers, or TRIM, is India's first dedicated manufacturer-to-manufacturer register. It features 120,000 listings of 40,000 industrial manufacturers and industrial service categories. It is available in print, in CD form and on the internet.

The Source Directory brought out by Mumbai-based Source Publishers is another example. It covers contact information on advertising agencies and related services and products, music companies, market research agencies, marketing and sales promotion consultants, publications, radio stations, cable and satellite stations and telemarketing services, among others. It currently has editions for metro cities.

The Industrial Product Finder (IPF): IPF details the many applications of new products and tells what is available and from whom. Most manufacturers of industrial products ensure that a description of their product is published in IPF before it hits the market.

Phone data service: Agencies providing phone data services have also come up in major cities in recent times. Melior Communication, for example, offers a tele-data service. Basic data on a number of subjects/products can be had through a call to the agency. The service is termed the Tell me Business through phone service. Its main aim, like that of yellow pages, is to bring buyers and sellers of products together. It also provides some elementary databank support to researchers.

2.6 The problems of secondary sources


Whilst the benefits of secondary sources are considerable, their shortcomings have to be acknowledged. There is a need to evaluate the quality of both the source of the data and the data itself. The main problems may be categorized as follows:

Definitions: The researcher has to be careful, when making use of secondary data, of the definitions used by those responsible for its preparation. Suppose, for example, researchers are interested in rural communities and their average family size. If published statistics are consulted then a check must be done on how terms such as family size have been defined. They may refer only to the nuclear family or include the extended family. Even apparently simple terms such as farm size need careful handling. Such figures may refer to any one of the following: the land an individual owns, the land an individual owns plus any additional land he/she rents, the land an individual owns minus any land he/she rents out, all of his land or only that part of it which he actually cultivates. It should be noted that definitions may change over time and where this is not recognised erroneous conclusions may be drawn. Geographical areas may have their boundaries redefined, units of measurement and grades may change and imported goods can be reclassified from time to time for purposes of levying customs and excise duties.

Measurement error: When a researcher conducts fieldwork she/he is possibly able to estimate inaccuracies in measurement through the standard deviation and standard error, but these are sometimes not published in secondary sources. The only solution is to try to speak to the individuals involved in the collection of the data to obtain some guidance on the level of accuracy of the data. The problem is sometimes not so much error as differences in the levels of accuracy required by decision makers. When the research has to do with large investments in, say, food manufacturing, management will want to set very tight margins of error in making market demand estimates. In other cases, having a high level of accuracy is not so critical. For instance, if a food manufacturer is merely assessing the prospects for one more flavour for a snack food already produced by the company then there is no need for highly accurate estimates in order to make the investment decision.

Source bias: Researchers have to be aware of vested interests when they consult secondary sources. Those responsible for their compilation may have reasons for wishing to present a more optimistic or pessimistic set of results for their organisation. It is not unknown, for example, for officials responsible for estimating food shortages to exaggerate figures before sending aid requests to potential donors. Similarly, and with equal frequency, commercial organisations have been known to inflate estimates of their market shares.

Reliability: The reliability of published statistics may vary over time. It is not uncommon, for example, for the systems of collecting data to have changed over time but without any indication of this to the reader of published statistics. Geographical or administrative boundaries may be changed by government, or the basis for stratifying a sample may have altered. Other aspects of research methodology that affect the reliability of secondary data are the sample size, response rate, questionnaire design and modes of analysis.

Time scale: Most censuses take place at 10 year intervals, so data from this and other published sources may be out-of-date at the time the researcher wants to make use of the statistics. The time period during which secondary data was first compiled may have a substantial effect upon the nature of the data. For instance, the significant increase in the price obtained for Ugandan coffee in the mid-90s could be interpreted as evidence of the effectiveness of the rehabilitation programme that set out to restore coffee estates which had fallen into a state of disrepair. However, more knowledgeable coffee market experts would interpret the rise in Ugandan coffee prices in the context of large scale destruction of the Brazilian coffee crop, due to heavy frosts, in 1994, Brazil being the largest coffee producer in the world.

Whenever possible, marketing researchers ought to use multiple sources of secondary data. In this way, these different sources can be cross-checked as confirmation of one another. Where differences occur an explanation for these must be found or the data should be set aside.

2.7 Difference between Primary & Secondary Data


The difference between primary data and secondary data can be studied under the following points. Primary research entails the use of first-hand data to assess the market. The popular ways to collect primary data are surveys, interviews and focus groups, all of which put the company in direct contact with potential customers. Secondary research, by contrast, is a means of reprocessing and reusing information that has already been collected, as a guide to improving a service or product. Both primary and secondary data are useful for businesses, but they differ from each other in various respects.

1. Timeliness: Secondary data relates to a past period. Hence it may lack timeliness and be of limited value. Primary data is more useful because it reflects the latest information.
2. Source: Secondary data is obtained from an organization other than the one directly concerned with the current research project. It was collected and analyzed by that organization to meet the requirements of its own research objectives. Primary data is collected by the researcher specifically to meet the research objective of the project in hand.
3. Availability: Secondary data, though old, may be the only possible source of the desired data on subjects for which primary data cannot be collected at all. For example, survey reports or confidential records already collected by a business group can offer information that cannot be obtained from original sources.
4. Suitability: The form in which secondary data are accumulated and delivered may not meet the exact needs and particular requirements of the current research study. Many a time, adjustments made to fit the needs of the investigator may not be sufficient. To that extent the usefulness of secondary data is lost. Primary data is completely tailor-made and there is no problem of adjustment.
5. Cost and time: Secondary data is available easily, quickly and inexpensively. Primary data takes a lot of time to collect and the unit cost of such data is relatively high.

Chapter Two: End Chapter Quizzes


1. Statistical results are
a- cent per cent correct
b- not absolutely correct
c- always incorrect
d- misleading

2. Data taken from the publication "Agricultural Situation in India" will be considered as
a- primary data
b- secondary data
c- primary and secondary data
d- neither primary nor secondary

3. The mailed questionnaire method of enquiry can be adopted if respondents
a- live in cities
b- have high income
c- are educated
d- are known

4. Statistical data are collected for
a- collecting data without any purpose
b- a given purpose
c- any purpose
d- none of the above

5. Method of complete enumeration is applicable for
a- Knowing the production
b- Knowing the population
c- Knowing the quantum of export and import
d- All the above

6. A statistical population may consist of
a- an infinite number of items
b- a finite number of items
c- either of (a) and (b)
d- none of (a) and (b)

7. Which of the following examples does not constitute an infinite population?
a- Population consisting of odd numbers
b- Population of weights of newly born babies
c- Population of heights of 15-year-old children
d- Population of heads and tails in tossing a coin successively

8. Which of the following can be classified as a hypothetical population?
a- All labourers of a factory
b- Female population of a factory
c- Population of real numbers between 0 and 100
d- Students of the world

9. A study based on complete enumeration is known as
a- sample survey
b- pilot survey
c- census survey
d- none of the above

10. Statistical results are
a- absolutely correct
b- not true
c- true on average
d- universally true

CHAPTER THREE : MEASURES OF DISPERSION


3.1 Meaning
The items of different distributions may vary from the average in different ways even though the distributions have the same value of mean. Hence, measures of central tendency alone cannot give a complete description of a distribution. They have to be supplemented by some other measures.

3.2 Definitions
"Dispersion is the measure of the variation of the items." ---- A.L. Bowley
"Dispersion is the measure of the extent to which the individual items vary." ---- L.R. Connor
Dispersion is also described as the arithmetic mean of the deviations of the values of the individual items from the particular measure of central tendency used. Thus dispersion is also known as the "average of the second degree." Prof. Griffin and Dr. Bowley said the same about dispersion.

3.3 Types of Dispersion
Dispersion can be divided into the following types:

3.3.1 Absolute Dispersion: It is measured in the same statistical unit in which the original data exist, e.g., kg, rupee, years etc.

3.3.2 Relative Dispersion: Absolute dispersion cannot be used to compare two series, especially when the statistical unit is not the same. Hence, absolute dispersion has to be converted into a relative measure of dispersion. Relative dispersion is measured in ratio form. It is also called the coefficient of dispersion.

The measures of central tendency (i.e. means) indicate the general magnitude of the data and locate only the center of a distribution of measures. They do not establish the degree of variability, spread or scatter of the individual items, or their deviation from (or difference with) the means.

i) According to Neiswanger, "Two distributions of statistical data may be symmetrical and have common means, medians and modes and identical frequencies in the modal class. Yet with these points in common they may differ widely in the scatter or in their values about the measures of central tendencies."
ii) Simpson and Kafka said, "An average alone does not tell the full story. It is hardly fully representative of a mass, unless we know the manner in which the individual items scatter around it. A further description of a series is necessary, if we are to gauge how representative the average is."

From this discussion we now focus our attention on the scatter or variability, which is known as dispersion. Let us take the following three sets.

Students    Group X    Group Y    Group Z
1           50         45         30
2           50         50         45
3           50         55         75
Mean        50         50         50

Thus, the three groups have the same mean, i.e. 50. In fact, the medians of groups X and Y are also equal. Now if one were to say that the students from the three groups are of equal capabilities, it would be a totally wrong conclusion. Close examination reveals that in group X all students have marks equal to the mean, and students from group Y are very close to the mean, but in the third group Z the marks are widely scattered. It is thus clear that a measure of central tendency alone is not sufficient to describe the data.

3.4 Features of an ideal measure of dispersion
An ideal measure of dispersion must possess the following features:
1. Simple to understand
2. Easy to compute
3. Well defined measure
4. Based on all the items of data
5. Capable of algebraic treatment
6. Should not be affected by the extreme items

3.5 Methods of measuring Dispersion


Dispersion can be calculated by using any of the following methods:
3.5.1 Range
3.5.2 Quartile Deviation
3.5.3 Mean Deviation
3.5.4 Standard Deviation
3.5.5 Co-efficient of Variation

3.5.1 Range
In any statistical series, the difference between the largest and the smallest values is called the range.

Thus Range (R) = L - S

where L is the largest value and S is the smallest value.

Coefficient of Range: This is the relative measure of the range. It is used in the comparative study of dispersion.

Co-efficient of Range = (L - S) / (L + S)

Example (Individual Series)
Find the range and the co-efficient of the range of the following items: 110, 117, 129, 197, 190, 100, 100, 178, 255, 790.

Solution
R = L - S = 790 - 100 = 690
Co-efficient of Range = (L - S) / (L + S) = (790 - 100) / (790 + 100) = 690/890 = 0.775

Example (Discrete Series)
Find the range and the co-efficient of the range of the following items:

x    8    10    12    13    14    17
f    3     8    12    10     6     4

Solution
Range = L - S = 17 - 8 = 9
Coefficient of Range = (L - S) / (L + S) = (17 - 8) / (17 + 8) = 9/25 = 0.36

Example (Continuous Series)
Find the range and the co-efficient of the range of the following items:

X (Marks)       0-10    10-20    20-30    30-40    40-50
F (Students)       5        8       12        6        4

Solution
Range = L - S = 50 - 0 = 50
Coefficient of Range (Relative Range) = (L - S) / (L + S) = (50 - 0) / (50 + 0) = 50/50 = 1
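The three range examples above can be checked with a few lines of Python. This is only a sketch for self-study: the function name value_range is made up for illustration and does not come from the study material.

```python
def value_range(values):
    """Return (range, coefficient of range) for a list of observations."""
    largest, smallest = max(values), min(values)
    r = largest - smallest
    coeff = r / (largest + smallest)
    return r, coeff

# Individual series
items = [110, 117, 129, 197, 190, 100, 100, 178, 255, 790]
r_ind, c_ind = value_range(items)

# Discrete series: the frequencies do not affect the range, only the x values do
x_discrete = [8, 10, 12, 13, 14, 17]
r_disc, c_disc = value_range(x_discrete)

# Continuous series: L is the upper limit of the last class,
# S is the lower limit of the first class
class_limits = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
lower = class_limits[0][0]
upper = class_limits[-1][1]
r_cont = upper - lower
c_cont = r_cont / (upper + lower)

print(r_ind, round(c_ind, 3))
print(r_disc, round(c_disc, 2))
print(r_cont, c_cont)
```

Note that in the continuous series the coefficient comes out as 1 simply because the lowest class starts at 0.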

3.5.2 Quartile Deviations

If we concentrate on the two extreme values (as in the case of the range), we do not get any idea about the scatter of the data within the range (i.e. between the two extreme values). If we discard the extremes, the limited range thus available might be more informative. For this reason the concept of the interquartile range was developed. It is the range which includes the middle 50% of the distribution. Here 1/4 (one quarter) of the observations at the lower end and 1/4 (one quarter) of the observations at the upper end are excluded.

Now the lower quartile (Q1) is the 25th percentile and the upper quartile (Q3) is the 75th percentile. It is interesting to note that the 50th percentile is the middle quartile (Q2), which is in fact what you have studied under the title "Median". Thus, symbolically,

Inter-quartile range = Q3 - Q1

If we divide (Q3 - Q1) by 2 we get what is known as the semi-interquartile range, or quartile deviation:

Q.D. = (Q3 - Q1)/2, where Q1 = first quartile and Q3 = third quartile

Relative or Coefficient of Q.D.: To find the coefficient of Q.D., we divide the semi-interquartile range by half the sum of the two quartiles. Symbolically:

Coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1)

Example (Individual Series)
Find the quartile deviation and its co-efficient from the following items:
X (marks): 5, 8, 10, 12, 15, 9, 11, 12, 15, 20

Solution

S. No.    X (Marks)    Revised X (in ascending order)
1         5            5
2         8            8
3         10           9
4         12           10
5         15           11
6         9            12
7         11           12
8         12           15
9         15           15
10        20           20

Q1 = (N + 1)/4th item, where N = number of items in the data
Q1 = (10 + 1)/4 = 11/4 = 2.75th item
2.75th item = 2nd item + (3rd item - 2nd item) x 75/100 = 8 + (9 - 8) x 0.75 = 8 + 0.75 = 8.75

Q3 = 3(N + 1)/4th item = 3(10 + 1)/4 = 33/4 = 8.25th item
8.25th item = 8th item + (9th item - 8th item) x 25/100 = 15 + (15 - 15)/4 = 15 + 0 = 15

Q.D. = (Q3 - Q1)/2 = (15 - 8.75)/2 = 3.125
Coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1) = (15 - 8.75) / (15 + 8.75) = 6.25/23.75 = 0.26

Example (Discrete Series)
Find the quartile deviation and its co-efficient from the following data:

Solution

Central size of items (x)    Frequency (f)    c.f.
2                            2                2
3                            3                5
4                            5                10
5                            6                16
6                            8                24
7                            12               36
8                            16               52
9                            7                59
10                           5                64
11                           4                68

N = 68
Q1 = (N + 1)/4th item = (68 + 1)/4th item = 69/4 = 17.25th item
The 17.25th item lies in c.f. 24, and the value of X against it is 6, so Q1 = 6
Q3 = 3(N + 1)/4th item = 3(68 + 1)/4th item = (3 x 69)/4 = 51.75th item
The 51.75th item lies in c.f. 52, and the value of X against it is 8, so Q3 = 8
Q.D. = (Q3 - Q1)/2 = (8 - 6)/2 = 1
Coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1) = (8 - 6)/(8 + 6) = 2/14 = 0.143
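The quartile deviation steps used in the two examples above can be sketched in Python as follows. The helper names (item_at_position, discrete_item) are hypothetical and chosen only for this illustration; the code follows the (N + 1)/4 positioning rule used in this section, which is not the only convention in use.

```python
def item_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    whole = int(position)            # e.g. 2 for the 2.75th item
    fraction = position - whole      # e.g. 0.75
    lower_value = sorted_values[whole - 1]
    if fraction == 0 or whole >= len(sorted_values):
        return lower_value
    upper_value = sorted_values[whole]
    return lower_value + fraction * (upper_value - lower_value)

def discrete_item(xs, fs, position):
    """For a discrete series, the x whose cumulative frequency first reaches the position."""
    cumulative = 0
    for x, f in zip(xs, fs):
        cumulative += f
        if cumulative >= position:
            return x
    raise ValueError("position is beyond the last item")

# Individual series
marks = sorted([5, 8, 10, 12, 15, 9, 11, 12, 15, 20])
n = len(marks)
q1 = item_at_position(marks, (n + 1) / 4)
q3 = item_at_position(marks, 3 * (n + 1) / 4)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

# Discrete series
xs = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
fs = [2, 3, 5, 6, 8, 12, 16, 7, 5, 4]
total = sum(fs)
dq1 = discrete_item(xs, fs, (total + 1) / 4)
dq3 = discrete_item(xs, fs, 3 * (total + 1) / 4)
dqd = (dq3 - dq1) / 2
dcoeff = (dq3 - dq1) / (dq3 + dq1)

print(q1, q3, qd, round(coeff_qd, 2))
print(dq1, dq3, dqd, round(dcoeff, 3))
```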
3.5.3 Mean Deviation

"Average deviation (mean deviation) is the average amount of variation (scatter) of the items in a distribution from either the mean or the median or the mode, ignoring the signs of these deviations." ---- Clark and Senkade

Individual Series
Steps:
(1) Find the mean or median or mode of the given series.
(2) Using any one of the three, find the deviations (differences) of the items of the series from it, i.e. xi - x̄, xi - Me or xi - Mo, where x̄ = Mean, Me = Median and Mo = Mode.
(3) Find the absolute values of these deviations, i.e. ignore their positive (+) and negative (-) signs: |xi - x̄|, |xi - Me| or |xi - Mo|.
(4) Find the sum of these absolute deviations, i.e. Σ|xi - x̄|, Σ|xi - Me| or Σ|xi - Mo|.
(5) Find the mean deviation using the following formula:

M.D. = Σ|xi - A| / n

where A is the average used (mean, median or mode) and n is the number of items. For a frequency distribution, M.D. = Σ fi |xi - A| / Σ fi.

Note that:
(i) Generally, M.D. obtained from the median is the best for practical purposes.
(ii) Co-efficient of M.D. = M.D. / (the average from which the deviations are taken)

Merits and Demerits of Mean Deviation

Merits
1. It is a better technique of dispersion in relation to range and quartile deviation.
2. This method is based on all the items of the data.
3. The mean deviation is less affected by the extreme items in relation to standard deviation.

Demerits
1. This method lacks algebraic treatment, as signs are ignored while taking deviations from an average.
2. Mean deviation cannot be considered a scientific method, as it ignores signs.

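Example
Calculate the mean deviation and its co-efficient for the following salaries: $1030, $500, $680, $1100, $1080, $1740, $1050, $1000, $2000, $2250, $3500 and $1030.

Calculations:
Arranged in ascending order, the salaries are: 500, 680, 1000, 1030, 1030, 1050, 1080, 1100, 1740, 2000, 2250, 3500 (n = 12).

i) Median (Me) = Size of [(n + 1)/2]th item = Size of 6.5th item = (1050 + 1080)/2 = $1065
ii) M.D. = Σ|xi - Me| / n = 6380/12 = $531.67
iii) Co-efficient of M.D. = M.D. / Me = 531.67/1065 = 0.499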
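As a check on the salary example, the mean deviation about the median can be computed directly from the twelve salaries listed in the question. The sketch below is illustrative only; the function names are assumptions made for this example, and the median is taken as the average of the two middle items because the number of items is even.

```python
def median(values):
    """Median of an individual series: middle item, or average of the two middle items."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

def mean_deviation(values, centre):
    """Average of the absolute deviations of the values from the chosen centre."""
    return sum(abs(v - centre) for v in values) / len(values)

salaries = [1030, 500, 680, 1100, 1080, 1740, 1050, 1000, 2000, 2250, 3500, 1030]

me = median(salaries)
md = mean_deviation(salaries, me)
coeff_md = md / me

print(me, round(md, 2), round(coeff_md, 3))
```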

Example (Continuous Series)
Calculate the mean deviation and the coefficient of mean deviation from the following data using the mean. The data show the difference in ages between boys and girls of a class.

Diff. in years    No. of students
0 - 5             449
5 - 10            705
10 - 15           507
15 - 20           281
20 - 25           109
25 - 30           52
30 - 35           16
35 - 40           4

Calculation:
1) Mean: x̄ = Σ fi xi / Σ fi, where xi is the mid-value of each class
2) M.D. = Σ fi |xi - x̄| / Σ fi
3) Co-efficient of M.D. = M.D. / x̄
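The calculation for this continuous series can be sketched in Python as below. It assumes, as is usual for grouped data, that every item in a class is represented by the class mid-value; the variable names are illustrative.

```python
# Class intervals (difference in years) and number of students in each class
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40)]
freqs = [449, 705, 507, 281, 109, 52, 16, 4]

# Mid-value of each class stands in for all items in that class (assumption)
mids = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n

# Mean deviation about the mean, weighted by class frequency
md = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / n
coeff_md = md / mean

print(n, round(mean, 2), round(md, 2), round(coeff_md, 2))
```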

3.5.4 Standard Deviation (S. D.)


It is the square root of the arithmetic mean of the squared deviations of the various values from their arithmetic mean. It is denoted by s.d.

Thus, s.d.(x) = √[ Σ fi (xi - x̄)² / n ], where n = Σ fi

Merits:
(1) It is rigidly defined and based on all observations.
(2) It is amenable to further algebraic treatment.
(3) It is less affected by sampling fluctuations than other measures of dispersion.
(4) It is less erratic.

Demerits:
(1) It is difficult to understand and calculate.
(2) It gives greater weight to extreme values.

Note that the variance V(x) = Σ fi (xi - x̄)² / n and s.d.(x) = √V(x). Then, equivalently,

V(x) = Σ fi xi² / n - (Σ fi xi / n)²

and

s.d.(x) = √[ Σ fi xi² / n - (Σ fi xi / n)² ]

3.5.5 Co-efficient of Variation (C.V.)
To compare the variations (dispersion) of two different series, a relative measure of standard deviation must be calculated. This is known as the co-efficient of variation or the co-efficient of s.d. Its formula is

C.V. = (s.d. / x̄) x 100

Thus it is defined as the ratio of the s.d. to the mean, expressed as a percentage.

Remark: It is given as a percentage and is used to compare the consistency or variability of two or more series. The higher the C.V., the higher the variability; the lower the C.V., the higher the consistency of the data.

Example
Calculate the standard deviation and its co-efficient from the following data.

A     B     C     D     E     F     G     H     I     J
10    12    16    8     25    30    14    11    13    11

Solution

No.       xi           (xi - x̄)    (xi - x̄)²
A         10           -5          25
B         12           -3          9
C         16           +1          1
D         8            -7          49
E         25           +10         100
F         30           +15         225
G         14           -1          1
H         11           -4          16
I         13           -2          4
J         11           -4          16
n = 10    Σxi = 150                Σ(xi - x̄)² = 446

Calculations:
i) x̄ = Σxi / n = 150/10 = 15
ii) s.d. = √[ Σ(xi - x̄)² / n ] = √(446/10) = √44.6 = 6.68
iii) C.V. = (s.d. / x̄) x 100 = (6.68/15) x 100 = 44.5% (approx.)
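The standard deviation and co-efficient of variation for the ten items A to J can be verified with the short Python sketch below. It divides by n (not n - 1), matching the definition of s.d. used in this chapter; the variable names are illustrative.

```python
import math

values = [10, 12, 16, 8, 25, 30, 14, 11, 13, 11]   # items A to J

n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the arithmetic mean
sum_sq_dev = sum((v - mean) ** 2 for v in values)

sd = math.sqrt(sum_sq_dev / n)   # population form: divide by n
cv = sd / mean * 100             # co-efficient of variation, in percent

print(mean, sum_sq_dev, round(sd, 2), round(cv, 1))
```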
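Example
Calculate the s.d. of the marks of 100 students.

Marks    No. of students (fi)    Mid-values (xi)    fi xi            fi xi²
0-2      10                      1                  10               10
2-4      20                      3                  60               180
4-6      35                      5                  175              875
6-8      30                      7                  210              1470
8-10     5                       9                  45               405
         n = 100                                    Σ fi xi = 500    Σ fi xi² = 2940

Solution
1) x̄ = Σ fi xi / n = 500/100 = 5
2) s.d. = √[ Σ fi xi² / n - x̄² ] = √(2940/100 - 5²) = √(29.4 - 25) = √4.4 = 2.1 (approx.)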
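The grouped-data calculation above uses the shortcut form of the variance, V(x) = Σ fi xi²/n - x̄². A minimal Python sketch of the same working, with illustrative variable names, is:

```python
import math

mids = [1, 3, 5, 7, 9]          # mid-values of the classes 0-2, 2-4, ..., 8-10
freqs = [10, 20, 35, 30, 5]     # number of students in each class

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2   # shortcut formula for the variance
sd = math.sqrt(variance)

print(n, sum_fx, sum_fx2, mean, round(sd, 2))
```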

Chapter Three:- End Chapter Quizzes

1. Which of the following is not a measure of dispersion?
a- mean deviation
b- quartile deviation
c- standard deviation
d- average deviation from mean

2. Which of the following is a unit less measure of dispersion?
a- standard deviation
b- mean deviation
c- coefficient of variation
d- range

3. Which one of the given measures of dispersion is considered best?
a- standard deviation
b- range
c- variance
d- coefficient of variation

4. For comparison of two different series, the best measure of dispersion is
a- range
b- mean deviation
c- standard deviation
d- none of the above

5. Out of all measures of dispersion, the easiest one to calculate is
a- standard deviation
b- range
c- variance
d- quartile deviation

6. Mean deviation is minimum when deviations are taken from
a- mean
b- median
c- mode
d- zero

7. Sum of squares of the deviations is minimum when deviations are taken from
a- mean
b- median
c- mode
d- zero

8. Which measure of dispersion is least affected by extreme values?
a- range
b- mean deviation
c- standard deviation
d- quartile deviation

9. The average of the sum of squares of the deviations about mean is called
a- variance
b- absolute deviation
c- standard deviation
d- mean deviation

10. Quartile deviation is equal to
a- interquartile range
b- double interquartile range
c- half of the interquartile range
d- none of the above

CHAPTER FOUR:-MEASURES OF SKEWNESS


4.1 Skewness
Voluminous raw data cannot be easily understood. Hence, we calculate the measures of central tendency and obtain a representative figure. From the measures of variability, we can know whether most of the items of the data are close to or away from these central tendencies. But these statistical means and measures of variation are not enough to draw sufficient inferences about the data. Another aspect of the data is its symmetry. In the chapter "Graphic display" we have seen that a frequency curve may or may not be symmetrical about the mode. This symmetry is studied through the knowledge of "skewness." One more aspect of the curve that we need to know is the flatness or otherwise of its top. This is understood by what is known as "kurtosis."

4.2 Definitions
Different authorities have defined skewness in different manners. Some of the definitions are as under:
According to Croxton and Cowden, "When a series is not symmetrical, it is said to be asymmetrical or skewed."
It may happen that two distributions have the same mean and standard deviation. For example, see the following diagram.

Although the two distributions have the same means and standard deviations, they are not identical. Where do they differ? They differ in symmetry. The left-hand distribution is a symmetrical one, whereas the distribution on the right-hand side is asymmetrical or skewed. For a symmetrical distribution, values at equal distances on either side of the mode have equal frequencies. Thus the mode, median and mean all coincide. Its curve rises slowly, reaches a maximum (peak) and falls equally slowly (Fig. 1). But for a skewed distribution, the mean, mode and median do not coincide. Skewness is positive or negative according to whether the mean and median lie to the right or to the left of the mode. A positively skewed distribution curve (Fig. 2) rises rapidly, reaches the maximum and falls slowly. In other words, the tail as well as the median are on the right-hand side. A negatively skewed distribution curve (Fig. 3) rises slowly, reaches its maximum and falls rapidly. In other words, the tail as well as the median are on the left-hand side.

Size    Frequency (first series)    Frequency (second series)    Frequency (third series)
1       12                          4                            3
2       13                          6                            7
3       14                          12                           8
4       15                          10                           10
5       14                          8                            12
6       13                          7                            6
7       12                          3                            4

4.3 Difference between Skewness and Dispersion
Dispersion refers to the spread or variation of items in a series, while skewness refers to the direction of variation in a series. Thus, with skewness we measure the lack of symmetry in the distribution. Skewness may be positive or negative, depending on whether the mean and median lie to the right or to the left of the mode.

4.4 Tests of Skewness
1. The values of mean, median and mode do not coincide. The greater the difference between them, the greater the skewness.
2. Quartiles are not equidistant from the median, i.e. (Q3 - Me) ≠ (Me - Q1).
3. The sum of positive deviations from the median is not equal to the sum of the negative deviations.
4. Frequencies are not equally distributed at points of equal deviation from the mode.
5. When the data are plotted on a graph, they do not give the normal bell-shaped form.

4.5 Methods of measurement of Skewness
1. First measure of skewness
It is given by Karl Pearson.

Measure of skewness: Skp = Mean - Mode, i.e. Skp = x̄ - Mo

Co-efficient of skewness: J = (x̄ - Mo) / s.d.

Pearson has suggested the use of the following formula if it is not possible to determine the mode (Mo) of a distribution. For a moderately skewed distribution, (Mean - Mode) = 3 (Mean - Median), so

Skp = 3 (x̄ - Me)

Thus J = 3 (x̄ - Me) / s.d.

Note:
i) Although a co-efficient of skewness generally lies within ±1, Karl Pearson's co-efficient based on the median lies within ±3.
ii) If J = 0, then there is no skewness.
iii) If J is positive, the skewness is also positive.
iv) If J is negative, the skewness is also negative.
Unless any other indication is given, you must use Karl Pearson's formula.

Example
Find Karl Pearson's coefficient of skewness from the following data:

Marks above        0      10     20     30     40     50     60     70     80
No. of students    150    140    100    80     80     70     30     14     0
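One way to work this example is sketched below in Python. Two assumptions are made that the text does not state: first, since the two largest class frequencies are equal the mode is treated as ill-defined, so the median-based formula J = 3 (Mean - Median) / s.d. is used; second, the median of the grouped data is located at the (n/2)th item and found by linear interpolation within its class. All names in the code are illustrative.

```python
import math

# "More than" cumulative data: number of students with marks above each limit
lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80]
more_than = [150, 140, 100, 80, 80, 70, 30, 14, 0]

# Class frequency = difference between successive "more than" counts
freqs = [more_than[i] - more_than[i + 1] for i in range(len(more_than) - 1)]
classes = [(lower_limits[i], lower_limits[i + 1]) for i in range(len(freqs))]
mids = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2
sd = math.sqrt(variance)

def grouped_median(classes, freqs):
    """Median by interpolation in the class containing the (n/2)th item."""
    half = sum(freqs) / 2
    cumulative = 0
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("no class contains the median")

median = grouped_median(classes, freqs)

# Karl Pearson's coefficient of skewness using the median
skewness = 3 * (mean - median) / sd

print(freqs)
print(round(mean, 2), median, round(sd, 2), round(skewness, 3))
```

A negative result indicates negative skewness, in line with note iv) above.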

Note: You will generally find different values of J when it is calculated by Karl Pearson's formula and by Bowley's formula. The value of J by Bowley's formula always lies within ±1.

Example
The following table gives the frequency distribution of 291 workers of a factory according to their average monthly income in 1945-55.

Income group ($)    No. of workers
Below 50            1
50-70               16
70-90               39
90-110              58
110-130             60
130-150             46
150-170             22
170-190             15
190-210             15
210-230             9
230 & above         10

Solution

Income group    f      c.f.
Below 50        1      1
50-70           16     17
70-90           39     56
90-110          58     114
110-130         60     174
130-150         46     220
150-170         22     242
170-190         15     257
190-210         15     272
210-230         9      281
230 & above     10     291
                n = Σf = 291

Calculations:
1) Median = Size of [(n + 1)/2]th item = Size of [(291 + 1)/2]th item = Size of 146th item, which lies in the (110-130) class interval.

Me = l + [(146 - c.f.) / f] x i, where l = lower limit of the median class, c.f. = cumulative frequency of the preceding class, f = frequency of the median class and i = class width

= 110 + [(146 - 114) / 60] x 20 = 110 + 10.67 = 120.67

Chapter Four: End Chapter Quizzes


1. For a positive skewed distribution, which of the following inequalities is not true?
a- median > mode
b- mode > mean
c- mean > median
d- mean > mode

2. For a negatively skewed distribution, the correct inequality is
a- mode < median
b- mean < median
c- mean < mode
d- none of the above

3. In case of a positive skewed distribution, the relation between mean, median and mode that holds is
a- median > mean > mode
b- mean > median > mode
c- mean = median = mode
d- none of the above

4. For a positive skewed frequency curve, the inequality that holds is
a- Q1 + Q3 > 2Q2
b- Q1 + Q2 > 2Q3
c- Q1 + Q3 > Q2
d- Q3 - Q1 > Q2

5. If a moderately skewed distribution has mean 30 and mode 36, the median of the distribution is
a- 10
b- 35
c- 20
d- zero

6. First and third quartile of a frequency distribution are 30 and 75. Also its coefficient of skewness is 0.6. The median of the frequency distribution is
a- 40
b- 39
c- 41
d- 38

7. For negatively skewed distribution, the correct relation between mean, median and mode is
a- mean = median = mode
b- median < mean < mode
c- mean < median < mode
d- mode < mean < median

8. In the case of positive skewed distribution, the extreme values lie in the
a- left tail
b- right tail
c- middle
d- any where

9. The extreme values in a negatively skewed distribution lie in the
a- middle
b- right tail
c- left tail
d- whole curve

10. Which of the following statements is true for measures of deviation?
a- mean deviation does not follow algebraic rule
b- range is a crudest measure
c- coefficient of variation is a relative measure
d- all the above statements

CHAPTER FIVE: CORRELATION


5.1 Introduction
So far we have considered only univariate distributions. From the averages, dispersion and skewness of a distribution, we get a fairly complete idea of its structure. Many a time, we come across problems which involve two or more variables. If we carefully study the figures of rainfall and production of paddy, of accidents and motor cars in a city, of demand and supply of a commodity, or of sales and profit, we may find that there is some relationship between the two variables. On the other hand, if we compare the figures of rainfall in America and the production of cars in Japan, we may find that there is no relationship between the two variables. If there is any relation between two variables, i.e. when one variable changes the other also changes in the same or in the opposite direction, we say that the two variables are correlated.

W. J. King: "If it is proved that in a large number of instances two variables tend always to fluctuate in the same or in the opposite direction, then it is established that a relationship exists between the variables. This is called a correlation."

The correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of the variation in peoples' weights is related to their heights.

Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect there are correlations, but not know which are the strongest. An intelligent correlation analysis can lead to a greater understanding of your data.

Correlation analysis means the study of the existence, magnitude and direction of the relation between two or more variables. Correlation is very important in technology and in statistics. The famous astronomer Bravais, Sir Francis Galton, Karl Pearson (who used this concept in Biology and in Genetics), Prof. Neiswanger and many others have contributed to this great subject.

5.2 Definitions
"An analysis of the covariation of two or more variables is usually called correlation." ---- A. M. Tuttle
"Correlation analysis attempts to determine the degree of relationship between variables." ---- Ya Lun Chou
"The effect of correlation is to reduce the range of uncertainty of one's prediction." ---- Tippett

5.3 Coefficient of Correlation


The main result of a correlation is called the correlation coefficient (or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables are related. If r is close to 0, it means there is little or no linear relationship between the variables. If r is positive, it means that as one variable gets larger the other gets larger. If r is negative it means that as one gets larger, the other gets smaller (often called an "inverse" correlation).

While correlation coefficients are normally reported as r = (a value between -1 and +1), squaring them makes them easier to understand. The square of the coefficient (or r square) is equal to the percent of the variation in one variable that is related to the variation in the other. After squaring r, ignore the decimal point. An r of .5 means 25% of the variation is related (.5 squared = .25). An r value of .7 means 49% of the variance is related (.7 squared = .49).

A correlation report can also show a second result of each test: statistical significance. In this case, the significance level will tell you how likely it is that the correlations reported may be due to chance in the form of random sampling error. If you are working with small sample sizes, choose a report format that includes the significance level. This format also reports the sample size.

A key thing to remember when working with correlations is never to assume a correlation means that a change in one variable causes a change in another. Sales of personal computers and athletic shoes have both risen strongly in the last several years and there is a high correlation between them, but you cannot assume that buying computers causes people to buy athletic shoes (or vice versa).

The second caveat is that the Pearson correlation technique works best with linear relationships: as one variable gets larger, the other gets larger (or smaller) in direct proportion. It does not work well with curvilinear relationships (in which the relationship does not follow a straight line). An example of a curvilinear relationship is age and health care. They are related, but the relationship doesn't follow a straight line. Young children and older people both tend to use much more health care than teenagers or young adults. Multiple regression (also included in the Statistics Module) can be used to examine curvilinear relationships, but it is beyond the scope of this article.

Correlation Example
Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem. Perhaps we have a hypothesis that how tall you are affects your self esteem (incidentally, I don't think we have to worry about the direction of causality here -- it's not likely that self esteem causes your height!). Let's say we collect some information on twenty individuals (all male -- we know that the average height differs for males and females so, to keep this example simple, we'll just use males). Height is measured in inches. Self esteem is measured based on the average of 10 1-to-5 rating items (where higher scores mean higher self esteem). Here's the data for the 20 cases (don't take this too seriously -- I made this data up to illustrate what a correlation is):

Person    Height    Self Esteem
1         68        4.1
2         71        4.6
3         62        3.8
4         75        4.4
5         58        3.2
6         60        3.1
7         67        3.8
8         68        4.1
9         71        4.3
10        69        3.7
11        68        3.5
12        67        3.2
13        63        3.7
14        62        3.3
15        60        3.4
16        63        4.0
17        65        4.1
18        67        3.8
19        63        3.4
20        61        3.6

Now, let's take a quick look at the histogram for each variable:

And, here are the descriptive statistics:

Variable       Mean     StDev       Variance    Sum     Minimum    Maximum    Range
Height         65.4     4.40574     19.4105     1308    58         75         17
Self Esteem    3.755    0.426090    0.181553    75.1    3.1        4.6        1.5

Finally, we'll look at the simple bivariate (i.e., two-variable) plot:

You should immediately see in the bivariate plot that the relationship between the variables is a positive one (if you can't see that, review the section on types of relationships) because if you were to fit a single straight line through the dots it would have a positive slope or move up from left to right. Since the correlation is nothing more than a quantitative estimate of the relationship, we would expect a positive correlation. What does a "positive relationship" mean in this context? It means that, in general, higher scores on one variable tend to be paired with higher scores on the other and that lower scores on one variable tend to be paired with lower scores on the other. You should confirm visually that this is generally true in the plot above.
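The positive relationship described above can be put into a number with a short Python sketch. The formula used here is the standard product-moment (Pearson) formula, r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² Σ(y - ȳ)²]; it is stated as an assumption because the text has not yet given a formula for r at this point. The function name pearson_r is illustrative.

```python
import math

# Height (inches) and self esteem for the 20 cases in the table
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(heights, esteem)
r_squared = r ** 2   # share of variation in one variable related to the other

print(round(r, 2), round(r_squared, 2))
```

The coefficient comes out positive and fairly strong, which agrees with the upward-sloping pattern expected in the bivariate plot.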

5.4 Types of Correlation

5.4.1 Positive and negative correlation
A) If two variables change in the same direction (i.e. if one increases the other also increases, or if one decreases the other also decreases), then this is called a positive correlation. For example: advertising and sales.
B) If two variables change in the opposite direction (i.e. if one increases, the other decreases and vice versa), then the correlation is called a negative correlation. For example: T.V. registrations and cinema attendance.

5.4.2 Linear and non-linear correlation
The nature of the graph gives us an idea of the type of correlation between two variables. If the graph is a straight line, the correlation is called a "linear correlation", and if the graph is not a straight line, the correlation is non-linear or curvi-linear. For example, if variable x changes by a constant quantity, say 20, then y also changes by a constant quantity, say 4. The ratio between the two always remains the same (1/5 in this case). In the case of a curvi-linear correlation this ratio does not remain constant.

5.5 Degrees of Correlation


Through the coefficient of correlation, we can measure the degree or extent of the correlation between two variables. On the basis of the coefficient of correlation we can also determine whether the correlation is positive or negative and also its degree or extent.
5.5.1 Perfect correlation: If two variables change in the same direction and in the same proportion, the correlation between the two is perfect positive. According to Karl Pearson, the coefficient of correlation in this case is +1. On the other hand, if the variables change in opposite directions and in the same proportion, the correlation is perfect negative and its coefficient of correlation is -1. In practice we rarely come across these types of correlation.

5.5.2 Absence of correlation: If two series of two variables exhibit no relation between them, or a change in one variable does not lead to a change in the other variable, then we can say that there is no correlation between the two variables. In such a case the coefficient of correlation is 0.

5.5.3 Limited degrees of correlation: If two variables are neither perfectly correlated nor completely uncorrelated, we term the correlation a limited correlation. It may be positive or negative, and its coefficient lies strictly between the limits -1 and +1.

High degree, moderate degree and low degree are the three categories of this kind of correlation. The following table shows the degree of correlation indicated by the coefficient of correlation.

Degree of correlation     Positive          Negative
Absence of correlation    0                 0
Perfect correlation       +1                -1
High degree               +0.75 to +1       -0.75 to -1
Moderate degree           +0.25 to +0.75    -0.25 to -0.75
Low degree                0 to +0.25        0 to -0.25
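As a rough illustration, the bands in the table above can be turned into a small lookup function. The function name degree_of_correlation is hypothetical. The table lists 0.25 and 0.75 in two bands each; assigning those boundary values to the higher band is an assumption made here, not something the table states.

```python
def degree_of_correlation(r):
    """Label a correlation coefficient using the bands of the degree table.

    Assumption: the boundary values 0.25 and 0.75 are placed in the higher band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "absence of correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {direction}"
    if size >= 0.75:
        label = "high degree"
    elif size >= 0.25:
        label = "moderate degree"
    else:
        label = "low degree"
    return f"{label} {direction}"


for value in (1, 0.9, 0.6, 0.1, 0, -0.3, -1):
    print(value, "->", degree_of_correlation(value))
```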

5.6 Techniques in Determining Correlation


There are several different correlation techniques. The Survey System's optional Statistics Module includes the most common type, called the Pearson or product-moment correlation. The module also includes a variation on this type called partial correlation. The latter is useful when you want to look at the relationship between two variables while removing the effect of one or two other variables.

Like all statistical techniques, correlation is only appropriate for certain kinds of data. Correlation works for quantifiable data in which numbers are meaningful, usually quantities of some sort. It cannot be used for purely categorical data, such as gender, brands purchased, or favorite color. The following are the techniques for determining correlation:

5.6.1 Rating Scales


Rating scales are a controversial middle case. The numbers in rating scales have meaning, but that meaning isn't very precise. They are not like quantities. With a quantity (such as dollars), the difference between 1 and 2 is exactly the same as between 2 and 3. With a rating scale, that isn't really the case. You can be sure that your respondents think a rating of 2 is between a rating of 1 and a rating of 3, but you cannot be sure they think it is exactly halfway between. This is especially true if you labeled the mid-points of your scale (you cannot assume "good" is exactly halfway between "excellent" and "fair").

Most statisticians say you cannot use correlations with rating scales, because the mathematics of the technique assumes the differences between numbers are exactly equal. Nevertheless, many survey researchers do use correlations with rating scales, because the results usually reflect the real world. Our own position is that you can use correlations with rating scales, but you should do so with care. When working with quantities, correlations provide precise measurements. When working with rating scales, correlations provide general indications.

Calculating the Correlation


Now we're ready to compute the correlation value. The formula for the correlation is:

\[
r = \frac{N \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[N \sum x^2 - \left(\sum x\right)^2\right]\left[N \sum y^2 - \left(\sum y\right)^2\right]}}
\]

where N is the number of pairs of scores and each sum runs over all N cases.

We use the symbol r to stand for the correlation. Through the magic of mathematics it turns out that r will always be between -1.0 and +1.0. If the correlation is negative, we have a negative relationship; if it's positive, the relationship is positive. You don't need to know how we came up with this formula unless you want to be a statistician. But you probably will need to know how the formula relates to real data, that is, how you can use the formula to compute the correlation. Let's look at the data we need for the formula. Here's the original data with the other necessary columns:

Person   Height (x)   Self Esteem (y)   x*y      x*x     y*y
1        68           4.1               278.8    4624    16.81
2        71           4.6               326.6    5041    21.16
3        62           3.8               235.6    3844    14.44
4        75           4.4               330      5625    19.36
5        58           3.2               185.6    3364    10.24
6        60           3.1               186      3600    9.61
7        67           3.8               254.6    4489    14.44
8        68           4.1               278.8    4624    16.81
9        71           4.3               305.3    5041    18.49
10       69           3.7               255.3    4761    13.69
11       68           3.5               238      4624    12.25
12       67           3.2               214.4    4489    10.24
13       63           3.7               233.1    3969    13.69
14       62           3.3               204.6    3844    10.89
15       60           3.4               204      3600    11.56
16       63           4                 252      3969    16
17       65           4.1               266.5    4225    16.81
18       67           3.8               254.6    4489    14.44
19       63           3.4               214.2    3969    11.56
20       61           3.6               219.6    3721    12.96
Sum =    1308         75.1              4937.6   85912   285.45

The first three columns are the same as in the table above. The next three columns are simple computations based on the height and self esteem data. The bottom row consists of the sum of each column. This is all the information we need to compute the correlation. Here are the values from the bottom row of the table (where N is 20 people) as they are related to the symbols in the formula:

\[
N = 20, \quad \sum xy = 4937.6, \quad \sum x = 1308, \quad \sum y = 75.1, \quad \sum x^2 = 85912, \quad \sum y^2 = 285.45
\]

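Now, when we plug these values into the formula given above, we get the following (I show it here tediously, one step at a time):

\[
\begin{aligned}
r &= \frac{20(4937.6) - (1308)(75.1)}{\sqrt{\left[20(85912) - (1308)^2\right]\left[20(285.45) - (75.1)^2\right]}} \\
  &= \frac{98752 - 98230.8}{\sqrt{\left[1718240 - 1710864\right]\left[5709 - 5640.01\right]}} \\
  &= \frac{521.2}{\sqrt{7376 \times 68.99}} \\
  &= \frac{521.2}{\sqrt{508870.24}} \\
  &= \frac{521.2}{713.35} \\
  &\approx 0.73
\end{aligned}
\]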

So, the correlation for our twenty cases is .73, which is a fairly strong positive relationship. I guess there is a relationship between height and self esteem, at least in this made up data!
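The same arithmetic can be checked with a short Python sketch. It uses the standard raw-sums form of the product-moment formula, which is the form that matches the x*y, x*x and y*y columns of the table; the function name pearson_from_sums is hypothetical.

```python
from math import sqrt

# Height (x) and self esteem (y) for the twenty people in the table.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def pearson_from_sums(xs, ys):
    """Pearson's r from the column sums: N, sum x, sum y, sum xy, sum x*x, sum y*y."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_xx = sum(a * a for a in xs)
    sum_yy = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


r_height_esteem = pearson_from_sums(height, self_esteem)
print(round(r_height_esteem, 2))
```

Rounded to two decimals, the result agrees with the .73 obtained by hand.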

5.7 Methods of Determining Correlation


We shall consider the following most commonly used methods: (1) Scatter Plot, (2) Karl Pearson's coefficient of correlation, (3) Spearman's Rank correlation coefficient.

5.7.1 Scatter Plot (Scatter diagram or dot diagram): In this method the values of the two variables are plotted on graph paper. One is taken along the horizontal (x-axis) and the other along the vertical (y-axis). By plotting the data, we get points (dots) on the graph which are generally scattered, hence the name Scatter Plot. The manner in which these points are scattered suggests the degree and the direction of correlation. The degree of correlation is denoted by r and its direction is given by the signs positive and negative.

i) If all points lie on a rising straight line, the correlation is perfectly positive and r = +1 (see fig.1).
ii) If all points lie on a falling straight line, the correlation is perfectly negative and r = -1 (see fig.2).
iii) If the points lie in a narrow strip, rising upwards, the correlation is high degree positive (see fig.3).
iv) If the points lie in a narrow strip, falling downwards, the correlation is high degree negative (see fig.4).
v) If the points are spread widely over a broad strip, rising upwards, the correlation is low degree positive (see fig.5).
vi) If the points are spread widely over a broad strip, falling downwards, the correlation is low degree negative (see fig.6).
vii) If the points are spread (scattered) without any specific pattern, the correlation is absent, i.e. r = 0 (see fig.7).

Though this method is simple and gives a rough idea about the existence and the degree of correlation, it is not reliable. As it is not a mathematical method, it cannot measure the degree of correlation.

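5.7.2 Karl Pearson's coefficient of correlation: It gives the numerical expression for the measure of correlation. It is denoted by r. The value of r gives the magnitude of the correlation and its sign denotes the direction. It is defined as

\[
r = \frac{\sum xy}{N \sigma_x \sigma_y}
\]

where \( x = x_i - \bar{x} \) and \( y = y_i - \bar{y} \) are the deviations from the means, \( \sigma_x \) and \( \sigma_y \) are the standard deviations of the two series, and N = number of pairs of observations.

Note: r is also known as the product-moment coefficient of correlation.

\[
\text{OR} \quad r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}
\]

\[
\text{OR} \quad r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}
\]

Now the covariance of x and y is defined as

\[
\operatorname{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N}
\]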

Example: Calculate the coefficient of correlation between the heights of father and his son for the following data.

Height of father (cm):   165   166   167   168   167   169   170   172
Height of son (cm):      167   168   165   172   168   172   169   171

Solution: n = 8 (pairs of observations)

Height of      Height of     x = xi - x̄    y = yi - ȳ    xy         x²         y²
father (xi)    son (yi)
165            167           -3            -2            6          9          4
166            168           -2            -1            2          4          1
167            165           -1            -4            4          1          16
167            168           -1            -1            1          1          1
168            172           0             3             0          0          9
169            172           1             3             3          1          9
170            169           2             0             0          4          0
172            171           4             2             8          16         4
Σxi = 1344     Σyi = 1352    0             0             Σxy = 24   Σx² = 36   Σy² = 44

Calculation:

\[
\bar{x} = \frac{\sum x_i}{n} = \frac{1344}{8} = 168, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{1352}{8} = 169
\]

Now,

\[
r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{24}{\sqrt{36 \times 44}} = \frac{24}{\sqrt{1584}} = \frac{24}{39.80} \approx 0.6
\]

Since r is positive and approximately 0.6, the correlation is positive and moderate (i.e. direct and reasonably good).

Example: From the following data compute the coefficient of correlation between x and y.

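Example: If the covariance between x and y is 12.3 and the variances of x and y are 16.4 and 13.8 respectively, find the coefficient of correlation between them.

Solution: Given

Covariance = cov(x, y) = 12.3
Variance of x \( (\sigma_x^2) \) = 16.4
Variance of y \( (\sigma_y^2) \) = 13.8

Now,

\[
r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{12.3}{\sqrt{16.4 \times 13.8}} = \frac{12.3}{\sqrt{226.32}} = \frac{12.3}{15.04} \approx 0.82
\]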
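The two worked examples above (father and son heights, and the covariance example) can be reproduced with a short Python sketch. The function names pearson_by_deviations and pearson_from_covariance are hypothetical; the first uses deviations from the means, the second divides the covariance by the product of the standard deviations.

```python
from math import sqrt

# Heights (cm) from the father-and-son example.
father = [165, 166, 167, 168, 167, 169, 170, 172]
son = [167, 168, 165, 172, 168, 172, 169, 171]


def pearson_by_deviations(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y taken as deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_xx * sum_yy)


def pearson_from_covariance(cov_xy, var_x, var_y):
    """r = cov(x, y) / (sigma_x * sigma_y), given the covariance and the two variances."""
    return cov_xy / sqrt(var_x * var_y)


r_heights = pearson_by_deviations(father, son)            # 24 / sqrt(36 * 44)
r_from_cov = pearson_from_covariance(12.3, 16.4, 13.8)    # 12.3 / sqrt(16.4 * 13.8)
print(round(r_heights, 2), round(r_from_cov, 2))
```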

5.7.3 Spearman's Rank Correlation Coefficient

This method is based on the ranks of the items rather than on their actual values. The advantage of this method over the others is that it can be used even when the actual values of the items are unknown. For example, if you want to know the correlation between honesty and wisdom of the boys of your class, you can use this method by giving ranks to the boys. It can also be used to find the degree of agreement between the judgements of two examiners or two judges. The formula is:

\[
R = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}
\]

where R = Rank correlation coefficient
D = Difference between the ranks of two items
N = The number of observations.

Note: \( -1 \le R \le 1 \).
i) When R = +1: Perfect positive correlation or complete agreement in the same direction.
ii) When R = -1: Perfect negative correlation or complete agreement in the opposite direction.
iii) When R = 0: No correlation.

Computation:
i. Give ranks to the values of the items. Generally the item with the highest value is ranked 1 and the others are given ranks 2, 3, 4, ... according to their values in decreasing order.
ii. Find the difference D = R1 - R2, where R1 = Rank of x and R2 = Rank of y. Note that ΣD = 0 (always).
iii. Calculate D² and then find ΣD².
iv. Apply the formula.

Note: In some cases there is a tie between two or more items. In such a case each tied item is given the average of the ranks involved. If two items have ranks 4th and 5th respectively, then they are each given (4 + 5)/2 = 4.5th rank. If three items are of equal rank, say 4th, then they are each given (4 + 5 + 6)/3 = 5th rank. If m is the number of items of equal rank, the factor \( \frac{1}{12}(m^3 - m) \) is added to ΣD². If there is more than one such case, this factor is added as many times as the number of such cases; then

\[
R = 1 - \frac{6 \left[ \sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots \right]}{N^3 - N}
\]

Example: Calculate Rank Correlation from the following data.

Student No.:      1    2    3    4    5    6    7    8    9    10
Rank in Maths:    1    3    7    5    4    6    2    10   9    8
Rank in Stats:    3    1    4    5    6    9    7    8    10   2

Solution:

Student No.   Rank in Maths (R1)   Rank in Stats (R2)   D = R1 - R2   D² = (R1 - R2)²
1             1                    3                    -2            4
2             3                    1                    2             4
3             7                    4                    3             9
4             5                    5                    0             0
5             4                    6                    -2            4
6             6                    9                    -3            9
7             2                    7                    -5            25
8             10                   8                    2             4
9             9                    10                   -1            1
10            8                    2                    6             36
N = 10                                                  ΣD = 0        ΣD² = 96

Calculation of R:

\[
R = 1 - \frac{6 \sum D^2}{N (N^2 - 1)} = 1 - \frac{6 \times 96}{10 (100 - 1)} = 1 - \frac{576}{990} = 1 - 0.5818 = 0.4182
\]
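The rank-difference steps of this example can be sketched in Python as follows. The function name spearman_from_ranks is hypothetical, and the sketch assumes the ranks are already assigned and contain no ties.

```python
# Ranks of the ten students from the example above.
maths_rank = [1, 3, 7, 5, 4, 6, 2, 10, 9, 8]
stats_rank = [3, 1, 4, 5, 6, 9, 7, 8, 10, 2]


def spearman_from_ranks(r1, r2):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)) for untied ranks."""
    n = len(r1)
    differences = [a - b for a, b in zip(r1, r2)]
    sum_d2 = sum(d * d for d in differences)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


rank_differences = [a - b for a, b in zip(maths_rank, stats_rank)]
rank_corr = spearman_from_ranks(maths_rank, stats_rank)   # 1 - 576/990
print(sum(rank_differences), round(rank_corr, 4))
```

The differences sum to zero, as the computation steps say they always should, which is a quick check that the ranks were entered correctly.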

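Example: Calculate R of 6 students from the following data.

Marks in Stats:      40   42   45   35   36   39
Marks in English:    46   43   44   39   40   43

Solution:

Marks in Stats   R1   Marks in English   R2    D = R1 - R2   D² = (R1 - R2)²
40               3    46                 1     2             4
42               2    43                 3.5   -1.5          2.25
45               1    44                 2     -1            1
35               6    39                 6     0             0
36               5    40                 5     0             0
39               4    43                 3.5   0.5           0.25
N = 6                                          ΣD = 0        ΣD² = 7.50

Here m = 2, since in the series of marks in English the value 43 is repeated twice. Therefore

\[
R = 1 - \frac{6 \left[ \sum D^2 + \frac{1}{12}(m^3 - m) \right]}{N^3 - N} = 1 - \frac{6 \left[ 7.5 + \frac{1}{12}(2^3 - 2) \right]}{6^3 - 6} = 1 - \frac{6 \times 8}{210} = 1 - 0.2286 = 0.7714
\]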
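A minimal Python sketch of the tied-rank procedure is given below: it assigns average ranks to tied marks, adds the factor (m³ - m)/12 for each group of m tied values, and applies the adjusted formula. The function names average_ranks, tie_correction and spearman_with_ties are hypothetical.

```python
from collections import Counter

# Marks of the six students from the example above.
stats_marks = [40, 42, 45, 35, 36, 39]
english_marks = [46, 43, 44, 39, 40, 43]


def average_ranks(values):
    """Rank 1 goes to the highest value; tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys):
    """R = 1 - 6 * (sum(D^2) + tie corrections) / (N^3 - N)."""
    n = len(xs)
    r1, r2 = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


english_ranks = average_ranks(english_marks)
r_with_ties = spearman_with_ties(stats_marks, english_marks)   # 1 - 48/210
print(english_ranks, round(r_with_ties, 4))
```

The two students with 43 marks in English each receive rank 3.5, matching the R2 column of the solution table.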

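Example: The value of Spearman's rank correlation coefficient for a certain number of pairs of observations was found to be 2/3. The sum of the squares of the differences between the corresponding ranks was 55. Find the number of pairs.

Solution: We have R = 2/3 and ΣD² = 55. Substituting in the formula,

\[
\frac{2}{3} = 1 - \frac{6 \times 55}{N (N^2 - 1)} \quad \Rightarrow \quad \frac{330}{N (N^2 - 1)} = \frac{1}{3} \quad \Rightarrow \quad N (N^2 - 1) = 990
\]

Since 10 × (10² - 1) = 10 × 99 = 990, we get N = 10. Therefore the number of pairs of observations is 10.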
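The number of pairs in this example can also be found by direct search, as in the short Python sketch below. The function name pairs_from_rank_correlation and the search limit max_n are hypothetical; exact fractions are used so that 2/3 is compared without rounding error.

```python
from fractions import Fraction


def pairs_from_rank_correlation(r, sum_d2, max_n=1000):
    """Return the N >= 2 for which 1 - 6*sum_d2 / (N*(N^2 - 1)) equals r, or None."""
    target = Fraction(r)
    for n in range(2, max_n + 1):
        if 1 - Fraction(6 * sum_d2, n * (n * n - 1)) == target:
            return n
    return None


number_of_pairs = pairs_from_rank_correlation(Fraction(2, 3), 55)
print(number_of_pairs)
```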

Example: A panel of two judges, A and B, graded dramatic performances by independently awarding marks as follows:

Solution:

The equation of the line of regression of y on x is

y - 33 = 0.74 (x - 33)

Inserting x = 38, we get

y - 33 = 0.74 (38 - 33)
y - 33 = 0.74 × 5
y - 33 = 3.7
y = 3.7 + 33
y = 36.7 = 37 (approximately)

Therefore, Judge B would have given 37 marks to the 8th performance.

Chapter Five: End Chapter Quizzes


1. The idea of product moment correlation was given by
a- R. A. Fisher
b- Sir Francis Galton
c- Karl Pearson
d- Spearman

2. Correlation coefficient was invented in the year
a- 1910
b- 1890
c- 1908
d- none of the above

3. The unit of correlation coefficient is
a- kg/cc
b- per cent
c- non-existing
d- none of the above

4. The correlation between two variables is of order
a- 2
b- 1
c- 0
d- none of the above

5. Coefficient of co-current deviation depends on
a- the signs of the deviations
b- the magnitude of deviation
c- both (a) and (b)
d- none of (a) and (b)

6. If each group consists of one observation only, the value of correlation ratio is
a- 1
b- 0
c- between 1 and 0
d- between -1 and 1

7. From a given (2 × c) contingency table, the appropriate measure of association is
a- correlation ratio
b- biserial correlation
c- intraclass correlation
d- tetrachoric correlation

8. Another name of autocorrelation is
a- biserial correlation
b- serial correlation
c- Spearman's correlation
d- none of the above

9. If the correlation coefficient between two variables is positive, it means that
a- far apart
b- coincident
c- near to each other
d- none of the above

10. If the correlation between the two variables is unity, there is
a- perfect correlation
b- perfect positive correlation
c- perfect negative correlation
d- no correlation