
REPORT ON

Use of Central Tendency and Dispersion in Business Decision


Course Title: Business Statistics
Course Code: STS201
Submitted To: Mr. Raihanul Hasan, Senior Lecturer
Submitted By:

Date of submission: 26-12-12

BBA PROGRAM

STATE UNIVERSITY OF BANGLADESH

We can use single numbers, called summary statistics, to describe the characteristics of a data set. Two of these characteristics are particularly important to decision makers:

1. Central tendency
2. Dispersion

Measures of central tendency and dispersion provide a convenient way to describe and compare sets of data.

Central Tendency: Central tendency is the middle point of a distribution. Measures of central tendency are also known as measures of location. They yield information about the center, or middle part, of a group of numbers. They do not describe the span of the data set or how far values lie from the middle.

Dispersion: Dispersion is the spread of the data in a distribution, that is, the extent to which the observations are scattered.

Objectives:
To use summary statistics to describe a collection of data.
To use the mean, median and mode to describe how data bunch up.
To use the range, variance and standard deviation to describe how data spread out.

MEASURES OF CENTRAL TENDENCY

Measures of central tendency include three important tools: the mean (average), the median and the mode.

Mean

The arithmetic mean is the most common measure of central tendency. For a data set, the mean is the sum of the observations divided by the number of observations; it describes the central location of the data. For a set of n observations $x_1, x_2, \dots, x_n$, the arithmetic mean is defined as:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

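The weighted arithmetic mean is used when one wants to combine average values from samples of the same population with different sample sizes. If observation $x_i$ carries weight $w_i$, then:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$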

Example 1: Find the mean of the following observations and weights.

Observations (xi)   Weights (wi)   xiwi
12                  2              24
15                  5              75
20                  7              140
22                  6              132
30                  1              30
Total               21             401

Mean = 401/21 = 19.10

Advantages of the mean:
It can be specified using an equation, and therefore can be manipulated algebraically.
It is the most sufficient of the three estimators.
It is the most efficient of the three estimators.
It is unbiased.

Disadvantages of the mean:
It is very sensitive to extreme scores (i.e., low resistance).
Its value is unlikely to be one of the actual data points.
It requires an interval scale.

Is there anything else about the distribution that we would want to convey to someone if we were describing it to them?
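The arithmetic and weighted means above can be checked with a few lines of Python. This is a minimal sketch; the function names `arithmetic_mean` and `weighted_mean` are illustrative choices, not part of any library, and the data are the observations and weights from Example 1.

```python
def arithmetic_mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Sum of x_i * w_i divided by the sum of the weights w_i."""
    if len(xs) != len(ws):
        raise ValueError("each observation needs exactly one weight")
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


# Example 1 from the text
observations = [12, 15, 20, 22, 30]
weights = [2, 5, 7, 6, 1]

print(sum(x * w for x, w in zip(observations, weights)))  # 401
print(round(weighted_mean(observations, weights), 2))      # 19.1
print(arithmetic_mean(observations))                       # 19.8
```

The unweighted mean (19.8) is higher than the weighted mean because it gives the value 30 the same influence as every other observation, whereas in Example 1 that value carries a weight of only 1.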

Median

The median of a finite list of numbers is found by arranging all the observations from the lowest value to the highest value and picking the middle one. If there is an even number of observations, there is no single middle value, so the mean of the two middle values is usually taken.

For an odd number of observations:
Median = value of the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation

For an even number of observations:
Median = average of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ observations

Consider the following sample of test scores: 100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85, 80, 79, 76, 72, 67, 66, 45. The "middle" score of this group is 87. Why? There are 21 scores, so the median is the (21 + 1)/2 = 11th score in the ordered set. The eleventh score is 87, and ten scores lie on either side of it. Thus 87 is in the middle of this set of scores, and it is the median.

If there were an even number of scores, say 20, the median would fall halfway between the tenth and eleventh scores in the ordered set: we would add those two scores together and divide by two.

Advantages of the median:
It is unbiased.
It is unaffected by extreme scores (i.e., high resistance).
It does not require an interval scale: as long as the scores can be ordered along some continuum, the median can be found.

Disadvantages of the median:
It cannot be specified using an equation, so it cannot be manipulated algebraically.
It is the least sufficient of the three estimators.
It is less efficient than the mean.
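The odd and even rules for the median can be sketched in Python as follows. This is an illustrative implementation (the name `median` is our own; the standard library's `statistics.median` applies the same rule), tested on the 21 test scores from the text and on the first 20 of them.

```python
def median(values):
    """Median: the ((n+1)/2)-th value for odd n, else the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # The ((n+1)/2)-th observation (1-based) sits at 0-based index n // 2
        return ordered[n // 2]
    # Average of the (n/2)-th and (n/2 + 1)-th observations (1-based)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


scores = [100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87,
          85, 85, 85, 80, 79, 76, 72, 67, 66, 45]
print(len(scores), median(scores))   # 21 87
print(median(scores[:-1]))           # 20 scores: 87.0
```

Sorting first means the function does not depend on the order in which the scores are listed; the test scores above happen to be given from highest to lowest.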

Mode

The mode is the most frequently occurring value, that is, the most common value in a distribution. For example, the mode of 3, 4, 4, 5, 5, 5, 8 is 5. Note that the mode may be very different from the mean and the median.

With continuous data, such as response times measured to many decimal places, the frequency of each value is one, since no two scores will be exactly the same. The mode of continuous data is therefore normally computed from a grouped frequency distribution. Table 3 shows a grouped frequency distribution for the target response time data. Since the interval with the highest frequency is 600-700, the mode is taken as the midpoint of that interval (650).

Range       Frequency
500-600     3
600-700     6
700-800     5
800-900     5
900-1000    0
1000-1100   1

Table 3: Grouped frequency distribution

Advantages of the mode:
It represents a number that actually occurred in the data.
It represents the largest number of scores, so if an observation is chosen at random, the probability of getting that score is greater than the probability of getting any other score.
It is unaffected by extreme scores (i.e., high resistance).
It is unbiased.
It does not require an interval scale.

Disadvantages of the mode:
It depends on how the data are grouped.
It cannot be specified using an equation, so it cannot be manipulated algebraically.
It is less sufficient than the mean.
It is less efficient than the mean.
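Both ways of finding the mode described above, counting raw values and taking the midpoint of the modal class, can be sketched in Python. The helpers `mode` and `grouped_mode` are hypothetical names; the grouped version assumes the simple midpoint rule used in the text rather than the interpolation formula some textbooks use for grouped data.

```python
from collections import Counter


def mode(values):
    """Most frequently occurring value (the first one encountered if there is a tie)."""
    return Counter(values).most_common(1)[0][0]


def grouped_mode(classes):
    """Midpoint of the class with the highest frequency.

    `classes` is a list of ((lower, upper), frequency) pairs.
    """
    (lower, upper), _ = max(classes, key=lambda item: item[1])
    return (lower + upper) / 2


print(mode([3, 4, 4, 5, 5, 5, 8]))  # 5

# Table 3: grouped frequency distribution of response times
table3 = [((500, 600), 3), ((600, 700), 6), ((700, 800), 5),
          ((800, 900), 5), ((900, 1000), 0), ((1000, 1100), 1)]
print(grouped_mode(table3))  # 650.0
```

Because the grouped result depends on where the class boundaries fall, regrouping the same response times into different intervals can move the mode, which is the first disadvantage listed above.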

Percentiles

Percentiles are measures of central tendency that divide a group of data into 100 parts. At least n% of the data lie below the nth percentile, and at most (100 - n)% of the data lie above it. For example, the 90th percentile indicates that at least 90% of the data lie below it and at most 10% lie above it. The median and the 50th percentile have the same value. Percentiles are applicable to ordinal, interval and ratio data, but not to nominal data.

Calculation:
1. Organize the data into an ascending ordered array.
2. Calculate the percentile location:
$$i = \frac{P}{100}\,n$$
where P is the desired percentile and n is the number of observations.
3. Determine the percentile's location and its value. If i is a whole number, the percentile is the average of the values at positions i and i + 1. If i is not a whole number, the percentile is at position i + 1, taking only the whole-number part of i + 1.

For example:
Raw data: 14, 12, 19, 23, 5, 13, 28, 17
Ordered array: 5, 12, 13, 14, 17, 19, 23, 28
Location of the 30th percentile:
$$i = \frac{30}{100}(8) = 2.4$$
The location index i is not a whole number, so i + 1 = 2.4 + 1 = 3.4, whose whole-number part is 3. The 30th percentile is therefore at the 3rd location of the ordered array, so the 30th percentile is 13.
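The three-step location rule can be written as a short Python function. This is a sketch of the textbook rule described above, not of any library routine; the name `percentile` is our own, and clamping at P = 0 and P = 100 is an added assumption because the rule does not cover those edge cases.

```python
import math


def percentile(values, p):
    """p-th percentile using the location rule i = (p / 100) * n.

    If i is a whole number, average the values at positions i and i + 1;
    otherwise take the value at position floor(i) + 1 (positions are 1-based).
    """
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100            # multiply first so whole-number cases stay exact
    if i.is_integer():
        k = int(i)
        # Assumption: clamp p = 0 and p = 100, which the rule leaves open
        if k == 0:
            return ordered[0]
        if k >= n:
            return ordered[-1]
        return (ordered[k - 1] + ordered[k]) / 2
    # Position floor(i) + 1 (1-based) is index floor(i) (0-based)
    return ordered[math.floor(i)]


raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw, 30))  # 13
print(percentile(raw, 50))  # 15.5, the median
```

Statistical software often uses interpolation instead of this rule (for example, NumPy's `percentile` interpolates linearly by default), so results from a package can differ slightly from hand calculations.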

Quartiles

Quartiles are measures of central tendency that divide a group of data into four subgroups:
Q1: 25% of the data set is below the first quartile.
Q2: 50% of the data set is below the second quartile.
Q3: 75% of the data set is below the third quartile.
Q1 equals the 25th percentile, Q2 is located at the 50th percentile and equals the median, and Q3 equals the 75th percentile. Quartile values are not necessarily members of the data set.

E.g. Ordered array: 106, 109, 114, 116, 121, 122, 125, 129

In each case below, i is a whole number, so the quartile is the average of the values at positions i and i + 1.

Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$
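Since quartiles are just the 25th, 50th and 75th percentiles, the same location rule reproduces the worked example. This is an illustrative sketch with hypothetical function names; it assumes 0 < P < 100, so a whole-number location k always satisfies 1 ≤ k < n.

```python
import math


def percentile(values, p):
    """Textbook location rule i = (p / 100) * n, assuming 0 < p < 100."""
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100
    if i.is_integer():
        k = int(i)
        # Whole-number location: average the values at positions k and k + 1 (1-based)
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise take position floor(i) + 1 (1-based), i.e. index floor(i)
    return ordered[math.floor(i)]


def quartiles(values):
    """Q1, Q2 and Q3 as the 25th, 50th and 75th percentiles."""
    return tuple(percentile(values, p) for p in (25, 50, 75))


data = [106, 109, 114, 116, 121, 122, 125, 129]
print(quartiles(data))  # (111.5, 118.5, 123.5)
```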

Comparing the Mean, the Mode and the Median

The information obtained from these three measures of central tendency is similar in the sense that all reflect some aspect of the data values that is typical of the whole distribution. They differ, however, in the kind of typicality they report and in how sensitive they are to changes in the values of the observations.

The mean represents the balance point, or center of gravity, of the distribution. Its value changes whenever any of the data values in the distribution changes.

The mode represents the most frequent, or most probable, single value in the distribution. If a datum changes from a non-modal value to the modal value, the mode remains the same, even though the mean would (and the median might) change.

The median represents the middle score of the distribution. If a datum changes in a way that does not alter its position relative to the other values, the median remains the same, even though the mean would (and the mode might) change.

Users of Central Tendency

Many jobs and tasks require some use of central tendency, and specialists in diverse fields use it on a regular basis.

Statistician

Statisticians rely heavily on measures of central tendency as they examine data and produce visualizations that help others understand complex numerical information. For example, a statistician working in pure mathematics may produce algorithms that predict the results of a trial based on the central tendencies exhibited by past data sets. In another case, sports statisticians use central tendency to produce advanced metrics that talent scouts, managers and fans use to analyze and enjoy a game.

Management Analyst

A management analyst is a business professional who uses central tendency to analyze the internal workings of a company. These analysts examine typical values and averages in a number of business-related areas, including payroll, expenditures, revenue, profit margins and sales. Management analysts compare this central tendency data with information from other businesses and economic markets, producing strategic proposals for business leaders to consider or adopt in the pursuit of growth. For example, a management analyst may examine a business's expenditures and determine that rising payroll costs are due to a rising median salary, with more high-wage positions available to employees. This information could lead the business to alter its pay structure to achieve a better balance between compensation and payroll savings.

MEASURES OF DISPERSION

Measures of dispersion summarize how much the points in a data set vary, i.e. how spread out or how volatile they are. In measuring dispersion, it is necessary to know both the amount of variation and the degree of variation. The former is given by absolute measures of dispersion, expressed in the units of the original observations, while the latter is given by relative measures of dispersion.

Absolute measures can be divided into positional measures, based on only some items of the series, such as (i) the range and (ii) the quartile deviation or semi-interquartile range, and measures based on all items in the series, such as (i) the mean deviation and (ii) the standard deviation. The relative measure corresponding to each of these is called the coefficient of the respective measure. For comparisons between two or more series of varying size or number of items, varying central values, or different units of measurement, only relative measures can be used.

The following are the important methods of studying variation:
1. Range
2. Mean deviation
3. Standard deviation and variance (which is closely related to the standard deviation)
4. The coefficient of variation

Range

The range is the simplest of the summary measures of variation. It is also the crudest and the most prone to error. It is computed as the difference between the largest value (H) and the smallest value (L) in a data set:

Absolute range: Range = H - L

Relative range:
$$\text{Coefficient of range} = \frac{H - L}{H + L}$$
that is, the difference between the two extremes divided by their sum.

For example, for the data set {2, 2, 3, 4, 14}:
Range = 14 - 2 = 12
$$\text{Coefficient of range} = \frac{14 - 2}{14 + 2} = \frac{12}{16} = 0.75$$

Example: Given the following data: 3, 6, 9, 11. How do we compute the sample range?
Solution: H = 11, L = 3, so Range = H - L = 11 - 3 = 8.
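The absolute and relative range reduce to a maximum, a minimum and one division. A minimal sketch follows; the helper names are our own, and `value_range` is used so as not to shadow Python's built-in `range`.

```python
def value_range(values):
    """Absolute range: highest value minus lowest value (H - L)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range: (H - L) / (H + L)."""
    high, low = max(values), min(values)
    return (high - low) / (high + low)


print(value_range([2, 2, 3, 4, 14]))           # 12
print(coefficient_of_range([2, 2, 3, 4, 14]))  # 0.75
print(value_range([3, 6, 9, 11]))              # 8
```

Because only the two extremes enter the calculation, changing the single outlier 14 changes both results, which illustrates why the range is called the crudest measure.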

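Mean Deviation

The mean deviation can be calculated about any measure of central tendency, viz. the mean, the median or the mode. Accordingly, the mean deviation can be of the following types:
Mean deviation about the mean
Mean deviation about the median
Mean deviation about the mode

For n observations with mean $\bar{x}$:
$$\text{Mean deviation about mean} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$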
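The formula above generalizes to the median and the mode by replacing $\bar{x}$ with the chosen centre. The sketch below uses the standard library's `statistics.mean`, `median`, `mode` and `pstdev`; the function `mean_deviation` and its `about` parameter are hypothetical, and the data are the values 4, 6, 5, 5 from the variance example in this report. On multimodal data, `statistics.mode` returns the first mode in Python 3.8 and later (earlier versions raise an error).

```python
from statistics import mean, median, mode, pstdev


def mean_deviation(values, about="mean"):
    """Average absolute deviation of the values from their mean, median or mode."""
    centres = {"mean": mean, "median": median, "mode": mode}
    centre = centres[about](values)
    return sum(abs(x - centre) for x in values) / len(values)


data = [4, 6, 5, 5]
for about in ("mean", "median", "mode"):
    print(about, mean_deviation(data, about))   # 0.5 in each case

# Property checks: the signed deviations from the mean sum to zero, and the
# mean deviation does not exceed the (population) standard deviation.
print(sum(x - mean(data) for x in data))     # evaluates to zero
print(mean_deviation(data) <= pstdev(data))  # True
```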

Properties of the mean deviation about the mean:
The average absolute deviation from the mean is less than or equal to the standard deviation.
The sum of the signed deviations of any data set from its mean is always zero, which is why the absolute values of the deviations are used.
The mean absolute deviation is the average absolute deviation from the mean and is a commonly used measure of dispersion.

Variance and standard deviation

Variance and standard deviation are the most common of all the measures of variation. The variance is a measure of statistical dispersion, indicating how the possible values of a variable are spread around the mean; thus it indicates the variability of the values. A smaller variance implies a smaller variation from the mean. The positive square root of the variance is called the standard deviation. For n observations with mean $\bar{x}$:

$$\text{Variance} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \text{S.D.} = \sqrt{\text{Variance}}$$

Let us consider an example with the values 4, 6, 5, 5:

Values (xi)   xi - mean   (xi - mean)^2
4             -1          1
6             1           1
5             0           0
5             0           0
Total = 20                2

Mean = 20/4 = 5
$$\text{Variance} = \frac{2}{4} = \frac{1}{2}$$
$$\text{S.D.} = \sqrt{\tfrac{1}{2}} \approx 0.71$$

Uses of standard deviation:
To determine, with a great deal of accuracy ...
To describe how far individual items in a distribution depart from the mean of the distribution.
As an indicator of financial risk.
In quality control, for constructing quality control charts and in process capability studies.
In comparing populations, such as household incomes in two cities or employee absenteeism at two plants.
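The variance example divides the sum of squared deviations by n (here 4), i.e. it uses the population formula. The sketch below reproduces it with hypothetical helper names and cross-checks the result against `statistics.pvariance` from the standard library. Note that `statistics.variance` would instead divide by n - 1 (the sample variance) and give a different number.

```python
import math
import statistics


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n, as in the example)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / n


def standard_deviation(values):
    """Positive square root of the variance."""
    return math.sqrt(population_variance(values))


values = [4, 6, 5, 5]
print(population_variance(values))           # 0.5
print(round(standard_deviation(values), 2))  # 0.71

# Cross-check against the standard library's population variance
print(math.isclose(population_variance(values), statistics.pvariance(values)))  # True
```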

CHEBYSHEV'S THEOREM

Chebyshev's theorem applies to any distribution, regardless of shape. It places lower limits on the percentages of observations that lie within a given number of standard deviations from the mean: at least $1 - \frac{1}{k^2}$ of the elements of any distribution lie within k standard deviations of the mean, for any k > 1.

Number of standard deviations (k)   Distance from the mean   Minimum proportion of values falling within distance
k = 2                               2σ                       1 - 1/2² = 0.75
k = 3                               3σ                       1 - 1/3² = 0.89
k = 4                               4σ                       1 - 1/4² = 0.94
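The table can be regenerated from $1 - 1/k^2$, and the guarantee can be checked on real data. In this sketch the helpers `chebyshev_bound` and `proportion_within` are hypothetical, and the data are the 21 test scores from the median section; the standard deviation is the population one (dividing by n), matching the variance formula used in this report.

```python
import math


def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations: 1 - 1/k**2."""
    return 1 - 1 / k ** 2


def proportion_within(values, k):
    """Observed share of values within k (population) standard deviations of the mean."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sum(abs(x - mu) <= k * sigma for x in values) / n


for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k), 2))   # prints 2 0.75, then 3 0.89, then 4 0.94

# Illustration: the 21 test scores from the median section
scores = [100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87,
          85, 85, 85, 80, 79, 76, 72, 67, 66, 45]
for k in (2, 3, 4):
    observed = proportion_within(scores, k)
    # The theorem guarantees observed >= bound, whatever the shape of the data
    print(k, round(observed, 3), observed >= chebyshev_bound(k))
```

The observed proportions are usually well above the bound, because Chebyshev's theorem must hold even for the least favourable distribution shape.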

Coefficient of Variation

The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage:
$$CV = \frac{\sigma}{\mu} \times 100\%$$
It is essentially a relative comparison of a standard deviation to its mean, and it is useful for comparing standard deviations computed from data with different means.

Suppose the average prices of stock A over five weeks are 57, 68, 64, 71 and 62. To compute the coefficient of variation for these prices, first determine the mean and standard deviation: μ = 64.40 and σ = 4.84. The coefficient of variation is:
$$CV = \frac{4.84}{64.40} \times 100\% = 7.5\%$$
That is, the standard deviation is 7.5% of the mean.
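The stock A figures can be reproduced with the standard library. The σ = 4.84 in the example matches the population standard deviation (`statistics.pstdev`, dividing by 5); the sample version would be larger. The function name `coefficient_of_variation` is an illustrative choice.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sigma / mu * 100


prices_a = [57, 68, 64, 71, 62]
print(statistics.mean(prices_a))                     # 64.4
print(round(statistics.pstdev(prices_a), 2))         # 4.84
print(round(coefficient_of_variation(prices_a), 1))  # 7.5
```

Because the result is a percentage of the mean, the same function can compare the relative variability of two stocks trading at very different price levels.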

Financial investors sometimes use the coefficient of variation, the standard deviation, or both as measures of risk. Imagine a stock whose price never changes. An investor bears no risk of losing money from the price going down, because no variability occurs in the price. Suppose, in contrast, that the price of the stock fluctuates widely. An investor who buys at a low price and sells at a high price can make a nice profit. However, if the price drops below what the investor paid for it, the stock owner is subject to a potential loss. The greater the variability, the greater the potential for loss. Hence, investors use measures of variability such as the standard deviation or the coefficient of variation to determine the risk of a stock.

