
Business Statistics

Contents
1. Meaning and Scope
2. Collection of Data
3. Classification and Tabulation
4. Diagrammatic and Graphic Representation
5. Averages
6. Dispersion
7. Skewness and Kurtosis
8. Correlation
9. Linear Regression Analysis
10. Index Numbers
11. Time Series Analysis
12. Theory of Probability
13. Random Variable, Probability Distribution and Mathematical Expectation
14. Theoretical Distributions
15. Sampling Theory and Design of Sample Surveys
16. Interpolation and Extrapolation

Quantitative Decision Making

Learning Objectives
Basic Statistics and its application in the day-to-day life of a Manager
Various aspects of quantitative techniques and their application in Decision making Also frequently used models of Statistical analysis
Understand:
Complexity of managerial decisions
Quantitative techniques
Need for using a quantitative approach in decisions
Role of statistical methods in data analysis
Brief idea of various statistical methods
Know the areas of application of the quantitative approach in business and management.

Introduction
Individual business prior to Industrial revolution and need for info----Decisions based on past experience and intuition.
Marketing of products Test marketing of products The manager (also the owner) Progress of work Any other fact the owner needed to know

Intuition alone has no place in decision making


Becomes highly questionable when decisions involve the choice among several courses of action each of which can achieve several management objectives simultaneously.

Also in:

Statistical methods used in Marketing, Finance, Production and personnel


Regional planning Transportation Public health Communication Military agriculture

QT: A group of statistical and OR (programming) techniques


QT approach in decision making :
Problems be defined, analyzed and solved in a conscious, rational, systematic, scientific manner based on ;
Data, facts, info, and logic (and not whims and guesses)
QT provides the decision maker with a scientific method, based on quantitative data, for identifying a course of action that achieves the optimal value of a predetermined objective or goal. Numbers, symbols or mathematical formulae are used to represent the models of reality.

Statistics and different senses


Statistical Data
Numerical or quantitative aspects

Statistical Methods
Collect, organize /classify, present, analyze and interpret

Functions of Statistical Methods


Data Collection Organize: segregate/condense Presentation: orderly manner: graphs/charts Analysis Interpretation
examples

Statistics: Characteristics of Data: Common to refer data in quantitative form as Data.


Not all numerical data is statistical. For numerical description to be statistics:
Aggregate of facts
Affected to a marked extent by a multiplicity of causes (controllable/uncontrollable)
Enumerated or estimated according to a reasonable standard of accuracy
Collected in a systematic manner for a pre-determined purpose
Placed in relation to each other
Numerically expressed

Types of Statistical data


Secondary Primary

OR : a mathematical model to represent the situation under study.


Helps to:
Either to predict the performance of a system Or determine the action or control needed to optimize the performance.

Classification of Statistical Methods into three categories


Descriptive Statistics
Data Collection Presentation

Inductive statistics
Statistical inference Estimation

Statistical decision Theory


Analysis of business Decision

Descriptive Statistics
Used for re-arranging, grouping, and summarizing sets of data
Changes in price index, Yield by wheat using different charts and graphs
having large quantities of numerical data for easy understanding Various types of averages, central tendency and dispersion, trends, index numbers.

Inductive Statistics
The development of some criteria which can be used to derive info about the nature of entire population or universe from the nature of the small sample.
Include :
probability, probability distribution, sampling and sampling distribution, various methods of testing hypothesis :correlation, regression, factor analysis, time series analysis.

Statistical Decision Theory; 4 different states of decision environment


State of decision and Consequence
Certainty: Deterministic
Risk: Probabilistic
Uncertainty: Unknown
Conflict: Influenced by an opponent
Subjective approach (uses probabilities) Also known as Bayesian approach,

Models in OR
Based on Purpose:
Descriptive: describe the behavior of a system (behavior of demand for an inventory item)
Explanatory: explain behavior through relationships (wages, promotion policy)
Predictive: predict outcomes (stock prices for any given level of earnings per share)
Prescriptive (normative): provide norms for comparing alternative solutions (allocation)

Based on Degree of Abstraction


Physical, Graphic, Schematic, Analog, Mathematical

Based on Degree of certainty, and risk


Deterministic: Linear programming, transportation and assignment models Probabilistic: simulation models, decision theory

Based on Specified behavior characteristics


Static, Dynamic, Linear, Non-linear

Based on Procedure (method) of solution


Analytical, Simulation

Classification of models help in understanding the nature and role of models


Abstract or Physical
Static : linear programming Dynamic model
Linear or non-linear
Stable, unstable unstable( Constrained) Unstable (explosive) Transient steady state, Transient (non existent)

Ref:

Various Statistical Techniques


Measure of Central tendency Measure of Dispersion: Correlation

Regression analysis:
Time Series Analysis

Index Numbers
Sampling and Statistical Inference

Measure of Central tendency


Mean:
common arithmetic average
Divide the sum of the values of the observations by the number of items observed.

Median:
The item that lies in the middle when the observations are arranged in ascending/descending order. Not affected by the values of extreme observations.
Divides the number of households into two equal parts. (50% of all households have income below median income)

Mode:
Category that has max number of observation, (that occurs more frequently)
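
The three averages described above can be compared on a small data set. This is a minimal sketch using Python's standard `statistics` module; the income figures are hypothetical and only illustrate how one extreme value pulls the mean but not the median.

```python
import statistics

# Hypothetical monthly household incomes (illustrative values only)
incomes = [12, 15, 15, 18, 20, 22, 95]

mean_income = statistics.mean(incomes)      # sum of values / number of values
median_income = statistics.median(incomes)  # middle value of the sorted list
mode_income = statistics.mode(incomes)      # most frequently occurring value

print("mean:", mean_income)
print("median:", median_income)
print("mode:", mode_income)
```

The single large income (95) raises the mean well above the median, which is why the median is often preferred for income data.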

Measure of Dispersion:
spread away from central tendency (mean/mode/median) :
Range, mean deviation, Standard deviation. The data spread in symmetrical or asymmetrical pattern: skewness

Frequency distribution in the shape of a peak: measure called: Kurtosis

Correlation
Dependent variable associated with changes in other independent variable.
Sales as the dependent variable and advertising budget as the independent variable.
Could be casual or causal relationships
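
The strength of association between two variables is commonly summarized by a correlation coefficient. The notes do not name a specific coefficient, so the sketch below assumes Pearson's r; the advertising and sales figures are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the two means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: advertising budget (independent) and sales (dependent)
advertising = [10, 20, 30, 40, 50]
sales = [25, 48, 61, 90, 102]

r = pearson_r(advertising, sales)
print("r =", round(r, 4))
```

A value of r close to +1 indicates a strong positive association; it does not by itself show that the relationship is causal.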

Regression analysis: determining the causal relationship between two variables
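
A regression relationship between two variables can be sketched by fitting a straight line. The fitting method is not stated in the notes; the example assumes ordinary least squares, and the function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising budget (independent) and sales (dependent)
advertising = [10, 20, 30, 40, 50]
sales = [25, 48, 61, 90, 102]

slope, intercept = fit_line(advertising, sales)
predicted_sales = intercept + slope * 60   # prediction for a budget of 60
print(slope, intercept, predicted_sales)
```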


Use of multivariate statistical techniques for determining causal relationships involving two or more variables:
Multi-regression analysis, Discriminant analysis, factor analysis

Time Series Analysis


A set of data (arranged in some desired manner) recorded either at successive points in time or over successive periods of time. The changes are considered the resultant of the combined effect of several forces.
The force components:
Editing time series data
Secular trend
Periodic changes (cyclical/seasonal variations)
Irregular or random variations
Cost of living, growth of agricultural /food production, seasonal requirements of items, impact of war, strikes

Index Numbers: a relative number representing net result of change in a group of variables
Stated as a percentage: the value for the given (current) year relative to the value for the base year
production, sales price, volume of employment,
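
As an illustration of a relative number comparing a current year with a base year, the sketch below computes a simple aggregate price index. The choice of the simple aggregate method and all prices are assumptions made for this example; the notes do not specify a formula.

```python
# Hypothetical prices per unit in the base year and the current year
base_prices = {"wheat": 20.0, "rice": 30.0, "sugar": 50.0}
current_prices = {"wheat": 25.0, "rice": 33.0, "sugar": 52.0}

# Simple aggregate price index: total current prices as a percentage
# of total base-year prices (base year = 100)
price_index = sum(current_prices.values()) / sum(base_prices.values()) * 100

print("price index:", round(price_index, 1))
```

An index above 100 means the group of prices has risen, on balance, since the base year.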

Sampling and Statistical Inference


Sampling for reasons Schemes for drawing samples are classified as :
Random Sampling Schemes
Every element has an equal chance (probability) of being selected

Non-random sampling schemes


Drawing samples based on choice or purpose of selectors
Sampling analysis using various tests: Z (normal distribution), Student's t distribution, F distribution, χ² (chi-square) distribution

Advantages to Management
Definiteness Condensation Comparison Formulation of policies Formulating and testing hypothesis Prediction

Application of techniques in Business and Management


Management
Marketing Production Finance, accounting and Investment Personnel

Economics Research and Development Natural science

Marketing
Marketing research info Building and maintaining an extensive market Sales forecasting

Production
PPC and analysis Machine performance evaluation QC Inventory control

Finance, accounting and Investments


Financial forecast, budget preparation Fin Investment decision Selection of securities Auditing function Credit policies, credit risk, delinquent account

Personnel
Labour turnover rate Employment trends Performance appraisal Wage rates and incentive plans

Economics
Measurement of Gross National Product and input-output analysis
Determination of business cycles, seasonal fluctuations
Comparison of market price, cost and profit of individual firms
Analysis of population
Operational studies of public utilities
Formulation of appropriate economic policies and evaluation of their effects

Research and Development


Development of new product lines Optimal use of resources Evaluation of existing products

Natural science
Diagnosing based on inputs Efficacy of certain drugs Study of plant life

Exercise/ Assignments
1. Comment on the statement: Statistics are numerical statements of facts, but all facts numerically stated are not statistics
2. Explain the distinction between: Descriptive and Prescriptive models
3. Presentation topic: Formulate a business problem and analyze it by applying the major phases of statistics

Functions and Progressions

Learning Objectives:
Insight into different aspects of the types of functional relationships among business variables Their applications in various fields of management
Need to Identify/define relationships among business variables Define functional relationships Various types of functional relationships Use of graph to depict functional relationships Managerial applicability Progression and application..

Introduction
For decision problems which use mathematical tools, the first requirement is to identify or formally define all significant interactions or relationships among primary factors (also called variables). The relationships usually are stated in the form of an equation or inequation.
Study mathematical problems in the context of managerial problem

Definitions
Variables: A variable is something whose magnitude can vary or which can assume various values. Represented by symbols (first letter of the name)
Discrete variable: subject to counting (houses, machines)
Continuous variable: subject to measurement (temperature, height)

Constant and Parameters:


A constant: Remains fixed in the context of a given problem or situation
An Absolute ( or numerical) Constant retains same value in all problems
The absolute (or numerical) value of b is denoted by |b|, regardless of its algebraic sign: |b| = |-b|

An Arbitrary (or parametric) constant or parameter retains same value throughout any particular problem, but may assume different values in different problems

P21 (ex1)

Types of Function
Linear Functions:
The power of independent variable is 1 A function with only one independent variable is called a Single variable function.
(P21(1)

A single variable function can be linear or non-linear.


(p 22)

A linear function with one variable can always be graphed in two dimensional plane (or space). The graph of such functions is always a straight line. (P22ex2

Polynomial functions:
Polynomial function of degree 1 is called a linear function Polynomial function of degree 2 is called a Quadratic function (p23-ab

Absolute Value Functions: ( p23(3
Inverse Function: (P 23
Step function: For different values of an independent variable x in an interval, the dependent variable y=f(x) takes a constant value, but takes different values in different intervals. (p24-5)
Algebraic and Transcendental functions

Activity
P 25 activity B -1a&b assignment

Business Application
Linear Function ( P27-ex3 assignment Quadratic function ( P27-ex4 assignment
Activity D (Page 28-b_assignment

Sequence and Series


If for every positive integer,n, --------related to some number-----sequence
Installment buying, simple and compound interest problems Annuities and present values Mortgage payments

Arithmetic progression (AP)


Arithmetic progression: A sequence whose term increases or decreases by a constant number called Common difference of an AP and is denoted by d
P29 ex6 assignment
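
The definition of an AP leads to the standard formulas for the n-th term, a + (n-1)d, and for the sum of the first n terms, n(2a + (n-1)d)/2. A minimal sketch follows; the function names and the savings figures are hypothetical.

```python
def ap_term(a, d, n):
    """n-th term of an AP with first term a and common difference d."""
    return a + (n - 1) * d

def ap_sum(a, d, n):
    """Sum of the first n terms of the AP."""
    return n * (2 * a + (n - 1) * d) / 2

# Hypothetical case: saving 100 in month 1 and 10 more each following month
first, diff, months = 100, 10, 12
print(ap_term(first, diff, months))   # amount saved in month 12
print(ap_sum(first, diff, months))    # total saved over 12 months
```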

Geometric progression (GP)


A geometric progression: A sequence whose terms increase or decrease by a constant ratio, called the common ratio of the GP and denoted by r
P29 ex7 assignment P31 ex 8
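
For a GP the n-th term is a·r^(n-1) and the sum of the first n terms is a(r^n - 1)/(r - 1) when r is not 1. The sketch below writes the common ratio as `r` (a naming choice for this example) and applies the idea to compound interest with hypothetical figures.

```python
def gp_term(a, r, n):
    """n-th term of a GP with first term a and common ratio r."""
    return a * r ** (n - 1)

def gp_sum(a, r, n):
    """Sum of the first n terms of the GP."""
    if r == 1:
        return a * n
    return a * (r ** n - 1) / (r - 1)

print(gp_term(2, 3, 4))   # terms are 2, 6, 18, 54
print(gp_sum(2, 3, 4))    # 2 + 6 + 18 + 54

# Compound interest: year-end balances form a GP with ratio (1 + rate).
principal, rate, years = 1000.0, 0.10, 3   # hypothetical values
balance = principal * (1 + rate) ** years
print(round(balance, 2))
```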

Concept of Maxima and Minima with managerial applications


Page 55 ex18 assignment

Descriptive Statistics

Data Collection and analysis

Contents
Collection of data:
Need and significance of data collection Primary and secondary data Different methods of collecting primary data Edit primary data and know sources of secondary data and its use Census versus sample

Classification and presentation of collected data

Treatment of data through central tendency measurements,


Deviations and different measures of variation.

Introduction
The need for data collection
Statistical data is a set of facts expressed in quantitative form. The use of facts expressed as measurable quantities can help a decision maker to arrive at better decisions.

Primary and Secondary Data


Distinguish between Primary and------

Methods of collecting Primary Data


Observation Questionnaire
Personal interview Mail Telephone
Designing/Preparing questionnaire Pre-testing a questionnaire Editing the primary data.

Important points in Designing a questionnaire


Covering letter
Number of questions to be kept to a minimum (15-40)
Simple, short, and unambiguous questions
Questions of a sensitive and personal nature to be avoided
Answers to the questionnaire should not require calculations
Logical arrangement
Cross-checks and footnotes

Editing Primary Data to ensure:


completeness Consistency Accuracy Homogeneity

Sources of secondary data


Published Sources Unpublished Sources

Precautions in use of secondary Data


Because of bias, inadequate sample size, errors of definitions, computational errors Hence to consider:
Suitability Reliability Adequacy

Census (complete enumeration) and Sample


Advantages and disadvantages of census (Physical destruction)

Exercises/Assignments
1. Distinguish between Primary and Secondary data. Indicate the situations in which each of these----?
2. Distinguish between census and sampling methods of data collection. Compare merits/demerits. Why is sampling unavoidable in certain situations?

Presentation of Data

Learning objectives
Understand the need and significance of presentation of data Necessity of classifying data and various types of classification Construct frequency distribution of discrete and continuous data Frequency distribution in the form of :bar diagrams, histograms, frequency polygon, and ogives

Classification Discrete frequency Distribution Continuous frequency distribution Choosing the classes Cumulative and Relative frequencies Charting data

Introduction
After understanding the various ways of data collection:
The successful use of Data collected depends on:
The manner in which it is arranged, displayed and summarized.

Presentation of data can be displayed either in tabular form or through charts


In tabular form , it is necessary to classify the data before the data is tabulated. Hence to understand: classification , tabulation and charting of data.

Classification of data
After the data has been systematically collected and edited, The first step in presentation of data is Classification

Classification is the process of arranging the data according to points of similarities and dissimilarities

Principal objectives of classification


To condense the mass of data in such a way that salient features can be easily noticed To facilitate comparisons between attributes of variables To prepare data to be presented in tabular form To highlight significant features of data at a glance

Some Common Types of Classification


Geographical Classification
Production of wheat state-wise

Chronological Classification
Sales figures of a company for last six years

Qualitative Classification
Dichotomous Classification
An attribute divided into two classes, one possessing and the other not possessing it (basis of employment)

Manifold Classification : divided into several classes (educational level)

Quantitative Classification : according to characteristics that can be measured (employees as per monthly salaries)
Discrete : limited to certain numerical value of a variable Continuous: Take all values of the variable

Examples
Chronological classification Discrete frequency distribution Continuous frequency distribution
P14,15

Construction of a Discrete Frequency distribution


Place all possible values of the variable in ascending order in one column Then prepare another column of Tally mark to count the number of times a particular value of the variable is repeated
To facilitate counting use blocks of 5 Tally marks with a space left in-between blocks

The frequency column refers to numbers of tally marks, a particular class will contain
p15
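
The tally procedure above can be sketched with `collections.Counter`. The data (number of children per family) are hypothetical, and the tally column is drawn in blocks of five as the notes suggest.

```python
from collections import Counter

# Hypothetical raw data: number of children in each of 12 families
children = [0, 2, 1, 3, 2, 2, 1, 0, 4, 2, 1, 3]

freq = Counter(children)   # counts how often each value occurs

print("Value  Tally        Frequency")
for value in sorted(freq):                 # values in ascending order
    count = freq[value]
    blocks, rest = divmod(count, 5)        # blocks of 5 tally marks
    parts = ["|||||"] * blocks + (["|" * rest] if rest else [])
    print(f"{value:>5}  {' '.join(parts):<12} {count}")

print("Total frequency:", sum(freq.values()))
```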

Construction of a Continuous Frequency distribution


Class limits: the lowest and highest values of a class, e.g. in 60-69 the lower limit is 60 and the upper limit is 69
Class interval: the width, span or size of a class, e.g. 20-10=10
Class frequency: the number of observations falling within a particular class is called the class frequency, or simply the frequency. The total frequency (sum of all frequencies) indicates the total number of observations considered in a given frequency distribution.
Class mid-point: the sum of two successive lower limits divided by 2.
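
A continuous frequency distribution can be built by counting the observations in each class. This sketch assumes exclusive classes (lower limit included, upper limit excluded) of equal width; the marks and the class layout are hypothetical.

```python
# Hypothetical marks of 15 students
marks = [12, 18, 25, 27, 33, 35, 36, 41, 44, 47, 52, 58, 29, 38, 45]

lowest, width, n_classes = 10, 10, 5
# Exclusive classes: 10-20, 20-30, ..., 50-60
classes = [(lowest + i * width, lowest + (i + 1) * width)
           for i in range(n_classes)]

# Class frequency: observations with lower <= value < upper
frequencies = [sum(lo <= m < hi for m in marks) for lo, hi in classes]
# Class mid-point: half-way between the class limits
midpoints = [(lo + hi) / 2 for lo, hi in classes]

for (lo, hi), f, mid in zip(classes, frequencies, midpoints):
    print(f"{lo}-{hi}  frequency={f}  mid-point={mid}")
print("Total frequency:", sum(frequencies))
```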

Assignments
1. What do you understand by classification of data?
2. Why is classification of data required?
3. Illustrate the difference between qualitative and quantitative data.

Types of class interval: Methods


Exclusive and Inclusive (on whether upper limit is included or excluded) ----(p16) Open end (p17)
Generally opt for exclusive method But If Inclusive is suggested, minor adjustments required to determine class interval
Correction factor: Lower limit of second class-upper limit of first class, divided by 2 Deduct the correction value from lower limit and add to upper limit
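
The correction-factor adjustment can be shown on a few inclusive classes. The class limits below are hypothetical; the arithmetic follows the rule stated above.

```python
# Hypothetical inclusive classes: 60-69, 70-79, 80-89
inclusive = [(60, 69), (70, 79), (80, 89)]

# Correction factor: (lower limit of second class - upper limit of first class) / 2
correction = (inclusive[1][0] - inclusive[0][1]) / 2

# Deduct the correction from each lower limit and add it to each upper limit
adjusted = [(lo - correction, hi + correction) for lo, hi in inclusive]

print("correction factor:", correction)
for lo, hi in adjusted:
    print(f"{lo} - {hi}")
```

After the adjustment the upper boundary of each class coincides with the lower boundary of the next, so the classes are continuous.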

Guidelines for choosing the class


The number of classes should not be too small or too large (5 to 15)
If possible, the widths of the intervals should be numerically simple, like 5, 10, 25 (values like 3, 7, 9 should be avoided)
It is desirable to have classes of equal width (classes with unequal class intervals can be formed, as in income distributions)
The starting point of a class should be 0, 5, 10, or a multiple of these (e.g. 3-13 is not allowed)
The class interval should be determined considering the minimum value, the maximum value and the number of classes to be formed
(p18)

Activity
Distinguish between:
1. Discrete and continuous frequency distribution
2. Class limits and class intervals
3. Inclusive and exclusive methods

Cumulative and Relative frequencies


Rather than listing the actual frequency opposite each class, it may be appropriate to list either cumulative frequencies or relative frequencies or both.
Cumulative frequencies: cumulates the frequencies, starting from either lowest or highest values. (p18-19) Relative Frequencies: Very often, the frequencies in a frequency distribution are converted to relative frequencies to show percentage for each class. The frequency of class is divided by the total number of observations (total frequency).To get the percentage for each class, multiply the relative frequency by 100. (p19)
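
Both kinds of derived frequency can be computed directly from the class frequencies. A minimal sketch with hypothetical frequencies for five classes:

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest
frequencies = [4, 6, 10, 12, 8]
total = sum(frequencies)

# Cumulating from the lowest class upward
less_than = list(accumulate(frequencies))
# Cumulating from the highest class downward
more_than = list(accumulate(reversed(frequencies)))[::-1]

# Relative frequency = class frequency / total frequency
relative = [f / total for f in frequencies]
# Percentage = relative frequency x 100
percentages = [100 * r for r in relative]

print(less_than)
print(more_than)
print(relative)
```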

Important advantages in looking at Relative frequencies (percentages)


1. Facilitates a comparison of two or more sets of data.
2. Constitutes the basis for understanding the concept of probability.

Activity
Explain the concept of relative frequency

Charting of Data

Popular Methods of Charting frequency distribution


Bar Diagram Histogram Frequency Polygon Ogive or Cumulative frequency curve

Bar diagram
Most popular Example: Population, per capita income, sales and profits A bar is a thick line whose width is shown to attract the viewer. A bar diagram may be either vertical or horizontal.
DRAWING A BAR DIAGRAM:
Take characteristic (or attributes) under consideration on X-axis and the corresponding value on the Y-axis. It is desirable to mention the value depicted by the bar on the top of the bar. The gap between one bar and the other is kept equal. Also width of bars are same. The only difference is in length of the bars.
That is why this type of diagrams are known as one dimensional. (P20)

Histograms
One of the most commonly used and easily understood methods of graphic representation of frequency distribution. A histogram is a series of rectangles having areas that are in the same proportion as the frequencies of a frequency distribution
CONSTRUCTING HISTOGRAM:
On horizontal axis or X-axis, we take class limits of variables, and on vertical axis or Y-axis, we take frequencies of class intervals shown on horizontal axis If class intervals are of equal width, then the vertical bars of equal widths.(P20-21) On the other hand if the class intervals are unequal , the frequencies have to be adjusted according to width of class interval (P 21-22)
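
When class intervals are unequal, one common adjustment is to scale each frequency to the narrowest class width so that the areas of the rectangles stay proportional to the frequencies. The sketch below assumes that convention; the classes and frequencies are hypothetical.

```python
# Hypothetical classes with unequal widths: (lower limit, upper limit, frequency)
classes = [(0, 10, 5), (10, 20, 8), (20, 40, 12), (40, 70, 9)]

# Use the narrowest class as the standard width
base_width = min(hi - lo for lo, hi, _ in classes)

# Adjusted frequency (rectangle height) = frequency x base width / class width
adjusted = [f * base_width / (hi - lo) for lo, hi, f in classes]

for (lo, hi, f), height in zip(classes, adjusted):
    print(f"{lo}-{hi}: frequency={f}, height={height}")
```

With these heights, the area of each rectangle (height times width) is proportional to the original class frequency.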

Activity
Draw a sketch of a histogram and a bar diagram and explain the difference between the two.

Frequency Polygon
A graphical presentation of frequency distribution A polygon is a many sided closed figure,
A frequency polygon is constructed by:
taking the mid points of upper horizontal points of each rectangle on the histogram and connecting these mid-points by straight lines. In order to close the polygon, an additional class is assumed at each end, having zero frequency. (p22-23)
The histogram is usually associated with discrete data and a frequency polygon is appropriate for continuous data. (But the distinction is not always followed) The frequency polygon and frequency curve have a special advantage over histogram particularly when to compare two or more frequency distributions

Activity
What is the procedure for making a frequency polygon? Illustrate.

Ogives or Cumulative frequency Curve


A graphical presentation of a cumulative frequency distribution .
There are two methods:
Less than ogive:
The upper limits of the various classes are taken on the X-axis, and the frequencies obtained by cumulating the preceding frequencies on the Y-axis. By joining these points we get the less than ogive.

More than ogive.


By taking the lower limits on the X-axis and the cumulative frequencies on the Y-axis, and joining these points, we get the more than ogive.

The shape of the less than ogive is a rising curve, whereas the shape of the more than ogive is a falling one.

Activity
With the help of an example , explain the concept of less than ogive and more than ogive.

Types of Data
Data refers to known facts or things used as basis for inference or reckoning. Types of Data:
Qualitative: concerned with qualities and non-numerical characteristics. Quantitative: concerned with numerical characteristics.
Discrete: take only one of a range of distinct values (no of employees). Continuous: take any value within a given range (time, length) (P160-161BR)

The Concept of Level of Measurements


Scales of Measurement
Nominal level (Classificatory/ named) Data: Ordinal level (Ranking/ordered) data: Interval level (Numerical) data Ratio level (Numerical) data: represent highest level of precision.

Nominal level (Classificatory/ named) Data: And Implications for Data handling Methodologies
Classification of data: Statements of equality or differences (according to variable occupation) Although mode could be used, very few statistics can be applied to data collected in this form

Ordinal level (Ranking/ordered) data: And Implications for Data handling Methodologies
Can be classified in terms of equality or differences
Permits you to order individual data and make decisions such as whether this score is greater or lesser than another (employee grades or ranked choices)
Since the arithmetic mean cannot be calculated, the use of many other statistics is also excluded.

Interval level (Numerical) data And Implications for Data handling Methodologies
Have characteristics of both Nominal and Ordinal scales, but also provides additional info regarding the degree of difference between individual data items within a set of group. Most measures of human characteristics have interval properties. (Interval between IQ Scores/ assignment marks) However precision in interval scale is limited. Also some statistics such as geometric mean are excluded from use with data collected in this form.

Ratio level (Numerical) data: represent highest level of precision. And Implications for Data handling Methodologies
A Mathematical number system (height, weight, time) Ratio Scale allow ratio as well as interval decision (allowing us to say something is so many times big/bright/heavy) Any statistics can be used on data collected in this form. (Some scales such as temp may appear to have ratio properties, but in fact are only interval scales) (Centigrade)

Parametric and non-parametric methods (assumptions about parameters of the data)


Associated with every data analytic method, there is a set of assumptions that underlie the use of that method. t-test (to compare the means of two samples of data) as one of the most popular (p133-RM)
non-parametric methods;
For research in social sciences in mind Valid for use with nominal or ordinal level. For very small samples (less than n.=10), though the power of any test weakens with very small samples.

Measures of central Tendency



Learning objectives:
Concept and significance of measures of central tendency. Computing: arithmetic mean, weighted arithmetic mean, median, mode, geometric mean, and harmonic mean. Computing several quantiles: quartiles, deciles, and percentiles Relationships among various averages.

Significance of measure of central tendency


The objective is to find one representative value which can be used to locate and summarize the entire set of varying values. To find some central value around which the data tend to cluster
Average income Average sales figure may be compared with that of another

Properties of a Good measure of central tendency


Easy to understand Simple to compute Based on all observations Uniquely defined Capable of further algebraic treatment It should not be unduly affected by extreme values.

Important measures of central tendency commonly used by Business and Industry.


arithmetic mean, weighted arithmetic mean, median, quantiles mode, geometric mean, harmonic mean.

Arithmetic Mean (or Mean or Average)


In statistics term average refers to any of the measure of central tendency
The Arithmetic mean is defined as being equal to the sum of numerical values of each and every observation divided by the total numbers of observations. Eg; Average monthly salary ..ungrouped data When observations are classified into a frequency distribution, The midpoint of a class interval would be treated as the representative average value of that class.

(P-31 .)
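
For grouped data, each class mid-point stands in for the observations of its class, so the mean is the frequency-weighted average of the mid-points. A minimal sketch with hypothetical classes:

```python
# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
distribution = [(10, 20, 5), (20, 30, 8), (30, 40, 12), (40, 50, 10), (50, 60, 5)]

total_frequency = sum(f for _, _, f in distribution)
# Mid-point of each class multiplied by its frequency, then summed
weighted_total = sum(f * (lo + hi) / 2 for lo, hi, f in distribution)

grouped_mean = weighted_total / total_frequency
print("mean of grouped data:", grouped_mean)
```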

Mathematical properties of Arithmetic mean


The sum of deviations of observations from AM is always zero The sum of squared deviations of observations from the mean is minimum Arithmetic means of several sets of data may be combined into a single AM for combined sets of data.
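
These three properties can be checked numerically. The data sets below are hypothetical, and the check of the second property compares the mean against only one other reference value.

```python
group_a = [2, 4, 6, 8]
group_b = [10, 20]

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

# Property 1: deviations from the mean sum to zero
sum_deviations = sum(x - mean_a for x in group_a)

# Property 2: squared deviations are smaller about the mean than about
# another value (here 4, chosen arbitrarily for comparison)
ss_about_mean = sum((x - mean_a) ** 2 for x in group_a)
ss_about_other = sum((x - 4) ** 2 for x in group_a)

# Property 3: combined mean from the group means and group sizes
combined_mean = (len(group_a) * mean_a + len(group_b) * mean_b) / (len(group_a) + len(group_b))

print(sum_deviations, ss_about_mean, ss_about_other, combined_mean)
```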

AM
Advantages:
Easily computed Readily understood Almost all properties of a good measure of central tendency.

Disadvantages
Distorted by Extreme values Open end distribution and assigning mid point value.

Weighted Arithmetic mean


Arithmetic mean gives equal importance (or weight) to each observation. In some cases all observations do not have same importance

Useful in problems relating to construction of index numbers.


P33,34
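
The weighted arithmetic mean is the sum of each value times its weight, divided by the sum of the weights. A minimal sketch with hypothetical price relatives and weights, of the kind used when constructing an index number:

```python
# Hypothetical price relatives and the weights (importance) attached to them
values = [70, 80, 90]
weights = [1, 2, 3]

weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
simple_mean = sum(values) / len(values)

print("weighted mean:", round(weighted_mean, 2))
print("simple mean:", simple_mean)
```

Because the largest value carries the largest weight here, the weighted mean lies above the simple mean.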

Median
Divides the distribution into two equal parts. 50% of the observations in distribution are above the value of median ------ The median is the value of the middle observation when the series is arranged in ascending or descending order.

P34,,35

Mathematical Property of Median


Sum of absolute deviations about the median is minimum Easy to determine and easy to explain Affected by number of observations and not by value of observation, hence less distorted as a representative value than AM It may be computed for an open- end distribution
Disadvantages:
Less familiar than AM As a positional average its values are not determined by each and every observation. Not capable of algebraic treatment

Quantiles
Related positional measures of central tendency The most familiar quantiles are
Quartiles:
Values which divide the total data into 4 equal parts Since 3 points divide the distribution into 4 equal parts, we have 3 quartile. Q1(25% of observations are smaller and ----), Q2,Q3

Deciles
Values which divide the total data into ten equal parts. Since 9 points divide the distribution into 10 equal parts, we have 9 Deciles denoted as D1, D2---D9

Percentiles:
Values which divide the total data into 100 equal parts. Since 99 points divide the distribution into 100 equal parts, we have 99 percentiles denoted as P1, P2----P99 P36,37
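
Quartiles, deciles and percentiles can be obtained with `statistics.quantiles` (Python 3.8 or later). Note that its default positional rule may differ slightly from the formula used in a particular textbook, so treat the cut points as one reasonable convention. The data are hypothetical.

```python
import statistics

# Hypothetical observations (20 values)
data = [12, 15, 17, 18, 21, 22, 24, 25, 27, 30,
        31, 33, 36, 38, 40, 42, 45, 47, 50, 55]

quartiles = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # D1 ... D9
percentiles = statistics.quantiles(data, n=100)  # P1 ... P99

print("number of quartiles:", len(quartiles))
print("number of deciles:", len(deciles))
print("number of percentiles:", len(percentiles))
print("Q2:", quartiles[1], "median:", statistics.median(data))
```

Q2, D5 and P50 all coincide with the median.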

Locating Quantiles graphically:


To locate the median graphically, draw the less than ogive (cumulative frequency curve)
Take the variable on the X-axis and the cumulative frequency on the Y-axis
Locate the N/2-th observation on the Y-axis
Draw a horizontal line from that point to the cumulative frequency curve
From where it meets the curve, draw a perpendicular to the X-axis
The point where the perpendicular meets the X-axis is the median value.
Same way values of Q1---, D1---,P1---, etc can be found
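
Reading a value off the less than ogive can be imitated numerically. The sketch assumes the ogive is drawn with straight line segments between class boundaries, which is equivalent to linear interpolation within a class; the function name `read_ogive` and the distribution are hypothetical.

```python
from itertools import accumulate

def read_ogive(boundaries, freqs, target):
    """X-value at which the less than ogive reaches cumulative frequency `target`."""
    cum = [0] + list(accumulate(freqs))   # ogive height at each class boundary
    for i, f in enumerate(freqs):
        if f > 0 and cum[i] <= target <= cum[i + 1]:
            width = boundaries[i + 1] - boundaries[i]
            # Straight-line interpolation inside the class
            return boundaries[i] + (target - cum[i]) / f * width
    raise ValueError("target outside the range of cumulative frequencies")

# Hypothetical distribution: classes 10-20, 20-30, ..., 50-60
boundaries = [10, 20, 30, 40, 50, 60]
freqs = [5, 8, 12, 10, 5]
n = sum(freqs)

median = read_ogive(boundaries, freqs, n / 2)   # N/2-th observation
q1 = read_ogive(boundaries, freqs, n / 4)       # N/4-th observation
print("median:", round(median, 2), "Q1:", q1)
```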

p38

MODE
Most commonly observed value in a set of data----P39

Locating the mode graphically


Construct a histogram p40

Relationship among Mean, Median and Mode


A distribution in which mean, median and mode coincide is known as Symmetrical (bell shaped) distribution If a distribution is skewed, ( not symmetrical), then mean, median and mode are not equal. In a moderately skewed distribution, distance between mean and median is approx , one third the distance between mean and mode
Mode=3median-2mean p41
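
The empirical relation can be used to estimate the mode of a moderately skewed distribution from its mean and median. The figures below are hypothetical.

```python
# Hypothetical mean and median of a moderately skewed distribution
mean_value = 50.0
median_value = 48.0

# Empirical relation from the notes: Mode = 3 Median - 2 Mean
estimated_mode = 3 * median_value - 2 * mean_value

print("estimated mode:", estimated_mode)
# Equivalent statement: (mean - median) is one third of (mean - mode)
print(mean_value - median_value, (mean_value - estimated_mode) / 3)
```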

Geometric Mean
Geometric mean like arithmetic mean is a calculated average. Very useful in averaging ratios and percentages. Also in determining the rate of increase or decrease Also capable of further algebraic treatment
GM is more difficult to compute and interpret
Cannot be computed if any observation is zero or negative
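
The use of the geometric mean for averaging rates of change can be sketched with `statistics.geometric_mean` (Python 3.8 or later). The yearly growth factors are hypothetical.

```python
import statistics

# Hypothetical yearly growth factors: +10%, +20%, -5%
growth_factors = [1.10, 1.20, 0.95]

gm = statistics.geometric_mean(growth_factors)
average_rate = (gm - 1) * 100   # average yearly rate of change, in percent

print("average growth factor:", round(gm, 4))
print("average yearly rate (%):", round(average_rate, 2))
```

Applying the average growth factor three times gives the same overall change as the three yearly factors applied in turn, which the arithmetic mean would not.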

Harmonic Mean
A measure of central tendency for data expressed as rates (km/hr, tonnes/day , Km/ltre) Defined as the reciprocal of arithmetic mean of reciprocal of individual observations.
Harmonic mean like arithmetic mean and geometric mean is computed from each and every observations It is specially used for averaging rates
Cannot be computed when one or more observations have zero value or when there are both positive and negative observations
Rarely used in dealing with business problems.
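
A typical use of the harmonic mean is averaging speeds over equal distances. The sketch uses `statistics.harmonic_mean` with hypothetical speeds.

```python
import statistics

# Hypothetical trip: the same distance covered at 40 km/hr and then at 60 km/hr
speeds = [40, 60]

hm = statistics.harmonic_mean(speeds)   # reciprocal of the mean of reciprocals
am = statistics.mean(speeds)

print("harmonic mean speed:", round(hm, 2))
print("arithmetic mean speed:", am)
```

The harmonic mean gives the true average speed for the whole trip, which is lower than the arithmetic mean of the two speeds.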

Measures of variation and skewness


Having understood the various measures used to provide a single representative value of a given set of data, we know that this single value alone cannot adequately describe a set of data. Hence we study two important characteristics of a distribution:
Variation Skewness

Measure of Variation( Dispersion)


A measure of variation (dispersion) describes the spread or scattering of the individual values around the central value.
Illustration (p47)

Significance of Measuring variation


1. Determines the reliability of an average by pointing out how far the average is representative of the entire data.
2. Determines the nature and cause of variation in order to control the variation itself.
3. Enables comparison of two or more distributions with regard to their variability.
4. Measuring variability is of great importance to advanced statistical analysis (as in sampling or statistical inference).

Properties of a Good measure of variation


A good measure of variation should possess, as far as possible, the same properties as a good measure of central tendency. Some of the well-known measures of variation, which provide a numerical index of the variability of the given data, are:
Range
Average or mean deviation
Quartile deviation or semi-interquartile range
Standard deviation

Absolute and Relative measures of variation


Measures of absolute variation are expressed in the units of the original data. When two sets of data are expressed in different units of measurement, their absolute measures of variation are not comparable, and measures of relative variation are used instead. Relative measures are also needed when comparing two sets of data that have the same unit of measurement but different means.

Range
The difference between the highest (numerically largest) value and the lowest value in a set of data: R = H - L. The range is very easy to calculate and gives some idea of the variability of the data. However, it is a crude measure of variation, as it uses only the two extreme values.
The concept of range is used in statistical quality control (SQC) and in studying variations in the prices of shares, debentures and other commodities that are very sensitive to price changes from one period to another. It is also a good indicator in weather forecasting.

For grouped data, the range may be approximated as the difference between the upper limit of the highest class and the lower limit of the lowest class. The relative measure corresponding to the range, called the coefficient of range, is obtained by applying the formula given on pp. 48-49.
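
A minimal Python sketch of both measures follows. The notes do not reproduce the formula for the coefficient of range, so the sketch assumes the usual textbook form (H - L) / (H + L); the data values are hypothetical.

```python
def data_range(values):
    """Absolute measure: R = H - L."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure, assuming the usual form (H - L) / (H + L)."""
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest)


# Hypothetical daily share prices.
prices = [12, 7, 25, 18, 30]
print(data_range(prices))                      # 23
print(round(coefficient_of_range(prices), 4))  # 23 / 37
```

Only the two extreme values enter either calculation, which is why the range is described as a crude measure.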

Quartile deviation or Semi-interquartile range


Computed as half the difference between the third quartile and the first quartile: QD = (Q3 - Q1) / 2. The corresponding relative measure is called the coefficient of quartile deviation.
QD is superior to the range because it is based on the middle 50% of observations rather than on the two extreme values. Another advantage is that it is the only one of these measures of variability that can be used for open-end distributions.
The disadvantage is that it ignores the first and last 25% of observations.

(pp. 49-50)
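
The quartile deviation can be sketched with Python's standard `statistics.quantiles`. Two assumptions are made here: the coefficient of quartile deviation takes the usual textbook form (Q3 - Q1) / (Q3 + Q1), and the quartiles follow the library's default "exclusive" convention (positions based on n + 1). Other quartile conventions can give slightly different values. The data are hypothetical.

```python
import statistics


def quartile_deviation(values):
    """Return (Q1, Q3, QD, coefficient of QD) for ungrouped data."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    qd = (q3 - q1) / 2               # semi-interquartile range
    coefficient = (q3 - q1) / (q3 + q1)  # assumed textbook form
    return q1, q3, qd, coefficient


# Hypothetical observations.
data = [10, 20, 30, 40, 50, 60, 70]
q1, q3, qd, coeff = quartile_deviation(data)
print(q1, q3, qd, coeff)  # 20.0 60.0 20.0 0.5
```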

Average Deviation or Mean Deviation


An improvement over the previous two measures in that it considers all observations in the given set of data. It is computed as the mean of the absolute deviations from the mean or the median: all deviations are treated as positive regardless of sign. Theoretically, there is an advantage in taking the deviations from the median, because the sum of absolute deviations is a minimum when measured from the median. In actual practice, however, the arithmetic mean is more popular. The corresponding relative measure, called the coefficient of average deviation, is obtained by dividing the average deviation by the particular average (mean or median) used in computing it. (p. 51)
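
The following Python sketch computes the average deviation about both the mean and the median, with the corresponding coefficients. The data are hypothetical and the helper name is an assumption.

```python
import statistics


def average_deviation(values, centre):
    """Mean of the absolute deviations of the values from a chosen centre."""
    return sum(abs(v - centre) for v in values) / len(values)


# Hypothetical observations with one large value.
data = [3, 5, 6, 8, 18]
mean = sum(data) / len(data)       # 8.0
median = statistics.median(data)   # 6

ad_mean = average_deviation(data, mean)      # deviations 5, 3, 2, 0, 10
ad_median = average_deviation(data, median)  # deviations 3, 1, 0, 2, 12

print(ad_mean, ad_median)                    # 4.0 3.6
print(ad_mean / mean, ad_median / median)    # coefficients of average deviation
```

The deviation about the median (3.6) is smaller than the deviation about the mean (4.0), illustrating the minimum property mentioned above.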

Advantages and disadvantages (of Average Deviation)


Though a good measure of variability, its use is limited. It may be used when the only purpose is to measure and compare variability among several sets of data.
Its major disadvantage is its lack of mathematical properties: ignoring the signs of the deviations in its calculation makes it algebraically inconsistent.

Standard Deviation
The most widely used and most important measure of variation. In computing the average deviation, the signs of the deviations are ignored. The standard deviation overcomes this problem by squaring the deviations, which makes them all positive. It is also known as the root mean square deviation. The square of the standard deviation is called the variance.
The standard deviation and the variance become larger as the variability or spread within the data becomes greater. A standard deviation is readily comparable with other standard deviations: the greater the standard deviation, the greater the variability. While other measures have special uses, the standard deviation is the one commonly used to measure variability, and it is the only measure possessing the mathematical properties needed for advanced statistical work. (p. 53)
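
A minimal Python sketch of the "root mean square deviation" idea follows. It assumes the population form, in which the squared deviations are divided by N; sample formulas divide by n - 1 instead. The data are hypothetical.

```python
import math
import statistics


def variance_and_sd(values):
    """Population variance and standard deviation (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    # Squaring removes the signs of the deviations.
    variance = sum((v - mean) ** 2 for v in values) / n
    return variance, math.sqrt(variance)


# Hypothetical observations with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
variance, sd = variance_and_sd(data)
print(variance, sd)  # 4.0 2.0

# Cross-check against the standard library's population standard deviation.
print(statistics.pstdev(data))
```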

Coefficient of Variation (C.V)


A frequently used relative measure of variation. It is simply the ratio of the standard deviation to the mean, expressed as a percentage: C.V. = (standard deviation / mean) x 100.
(p. 54)
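
The comparison that the coefficient of variation makes possible can be sketched as follows. The two series and their summary figures are hypothetical.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """C.V. = standard deviation / mean, expressed as a percentage."""
    if mean == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return 100.0 * sd / mean


# Hypothetical series: B has the larger standard deviation in absolute terms...
cv_a = coefficient_of_variation(sd=5.0, mean=50.0)
cv_b = coefficient_of_variation(sd=10.0, mean=200.0)
print(cv_a, cv_b)  # 10.0 5.0

# ...but A is the more variable series relative to its mean.
print("A" if cv_a > cv_b else "B")
```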

Skewness
Measures of central tendency and variation do not reveal all the characteristics of a given set of data. Two distributions having the same mean and standard deviation may differ widely in the shape of their distribution.
A distribution of data may be symmetrical or not (asymmetrical, or skewed). Skewness thus refers to a lack of symmetry in a distribution.

One method of detecting skewness is to consider the tails of the distribution


Symmetrical distribution:
No extreme values in a particular direction, so that low and high values balance each other.
Mean=median=mode

Negatively skewed distribution


When the longer tail of the distribution extends towards the lower values (the left-hand side), the skewness is negative. The mean is pulled down by some extremely low values.

Positively skewed Distribution


When the longer tail of the distribution extends towards the higher values (the right-hand side), the skewness is positive. The mean is pulled up by some unusually high values. (p. 55)

Relative skewness
In order to compare the skewness of two or more distributions, a coefficient of skewness is computed (Karl Pearson's method or Bowley's method).

In practice, the value of the coefficient of skewness, SK, usually lies between -1 and +1.
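
The notes name the two methods without giving formulas. The Python sketch below assumes their usual textbook forms: Karl Pearson's median-based coefficient, 3 (Mean - Median) / SD, and Bowley's quartile-based coefficient, (Q3 + Q1 - 2 Median) / (Q3 - Q1). The data are hypothetical, the standard deviation is the population form, and the quartiles follow the default convention of `statistics.quantiles`.

```python
import statistics


def pearson_skewness(values):
    """Karl Pearson's coefficient, assumed form 3 * (mean - median) / SD."""
    mean = sum(values) / len(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return 3 * (mean - median) / sd


def bowley_skewness(values):
    """Bowley's coefficient, assumed form (Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hypothetical data with a longer tail towards the higher values.
data = [2, 4, 4, 4, 5, 5, 7, 9]
sk_pearson = pearson_skewness(data)
sk_bowley = bowley_skewness(data)
print(sk_pearson, round(sk_bowley, 2))  # both positive: positively skewed
```

Both coefficients are positive for this data set, in line with the description of a positively skewed distribution, and both fall within the range of -1 to +1 mentioned above.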
