Part 2: Statistics
Learning Objectives
- To understand summary statistics: central tendency, dispersion, skewness and kurtosis.
- To use the mean, median and mode to describe how data bunch up.
- To use the range, variance and standard deviation to describe how data spread out.
- To explore computer-based software (SPSS) to analyze data and to see other useful ways to summarize data.
Standard Notation
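Measure              Sample       Population
Mean                 $\bar{X}$    $\mu$
Standard deviation   $S$          $\sigma$
Variance             $S^2$        $\sigma^2$
Size                 $n$          $N$

Summary measures: Variation (range, variance, standard deviation); Shape (skew).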
Arithmetic Mean
Frequency distribution: data grouped by classes. Each observation falls into exactly one class.
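What to do to find the A.M. of grouped data?
- Calculate the midpoint of each class.
- Round the midpoint up where needed to make it a whole number.
- Multiply each midpoint by the frequency of its class.
- Sum all the results.
- Divide by the total number of observations in the sample (the sum of the frequencies).

$\bar{X} = \frac{\sum (f \times x)}{n}$

Where, $f$ = frequency of each class, $x$ = midpoint of each class, $n$ = number of observations in the sample.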
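The grouped-data steps above can be sketched in a few lines of Python. The class limits and frequencies below are hypothetical values chosen only for illustration, and the optional rounding of midpoints is skipped.

```python
# Hypothetical frequency distribution: (lower limit, upper limit, frequency).
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

def grouped_mean(classes):
    """Estimate the arithmetic mean from class midpoints and frequencies."""
    n = sum(f for _, _, f in classes)                          # total observations
    total = sum(((lo + hi) / 2) * f for lo, hi, f in classes)  # sum of f * midpoint
    return total / n

mean_estimate = grouped_mean(classes)
print(mean_estimate)
```

The result is an estimate, because every observation in a class is treated as if it sat at the class midpoint.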
Codes simplify the calculation of the mean for grouped data and eliminate the problem of large, inconvenient midpoints.
- Assign small whole-number codes (consecutive integers) to the midpoints.
- Assign zero to the middle midpoint, or to the one nearest the middle of the frequency distribution.
- Assign negative integers to midpoints smaller than that one and positive integers to larger ones.
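A minimal sketch of the coding approach, assuming equal class widths and the usual textbook relation mean = x0 + w × (sum of code × frequency) / n, where x0 is the midpoint coded zero and w is the class width. That relation is not shown on the slide, and the midpoints and frequencies are hypothetical.

```python
midpoints = [5, 15, 25]   # hypothetical class midpoints (equal class width assumed)
freqs = [4, 10, 6]        # hypothetical class frequencies

def coded_mean(midpoints, freqs):
    """Mean of grouped data using consecutive integer codes."""
    width = midpoints[1] - midpoints[0]             # class width w
    zero_idx = len(midpoints) // 2                  # midpoint that gets code 0
    codes = [i - zero_idx for i in range(len(midpoints))]
    n = sum(freqs)
    mean_code = sum(u * f for u, f in zip(codes, freqs)) / n
    return midpoints[zero_idx] + width * mean_code  # decode back to data units

coded = coded_mean(midpoints, freqs)
direct = sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)
print(coded, direct)
```

Both routes give the same mean; the coded route only keeps the intermediate numbers small.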
Example:
Cons:
- Affected by extreme values.
- Cannot be computed for a data set that has open-ended classes, for example:
  Class in minutes: 4.2-4.5, 4.6-4.9, 5.0-5.3, 5.4 and above
  Frequency:
- Tedious to compute, because every data point is used in the calculation. Example: what if there are 600 data points?
Weighted Mean
Example:
Q. Bob goes to the Buy the Weigh Nut store and creates his own bridge mix. He combines 1 pound of raisins, 2 pounds of chocolate-covered peanuts, and 1.5 pounds of cashews. The raisins cost $1.25 per pound, the chocolate-covered peanuts cost $3.25 per pound, and the cashews cost $5.40 per pound. What is the cost per pound of this mix?
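The question above can be worked as a weighted mean, with the pounds of each ingredient as the weights and the prices per pound as the values. The function name is illustrative only.

```python
weights = [1.0, 2.0, 1.5]      # pounds of raisins, peanuts, cashews
prices = [1.25, 3.25, 5.40]    # dollars per pound, in the same order

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

cost_per_pound = weighted_mean(prices, weights)
print(round(cost_per_pound, 2))
```

The total cost is $15.85 for 4.5 pounds, so the mix costs about $3.52 per pound. A plain average of the three prices would ignore how much of each ingredient went into the mix.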
KUSOM | Managing Information | PGDM
Geometric Mean
Why? To measure the mean of data that change over a period of time, such as a growth rate.
Growth factors: 1.07, 1.08, 1.10, 1.12, 1.18
A.M. of the growth factors: (1.07 + 1.08 + 1.10 + 1.12 + 1.18)/5 = 1.11
This implies an interest rate of 11 percent per year. If the bank paid interest at a constant rate of 11 percent for 5 years, then $100 x 1.11 x 1.11 x 1.11 x 1.11 x 1.11 = $168.51, which differs from the $168 in the previous table.
The correct mean growth rate is the G.M.: 10.93 percent per year, which is close to the incorrect A.M. of 11 percent.
In some situations the A.M. and G.M. are very close, but even a small difference can lead to a poor decision.
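The comparison can be checked numerically. This sketch takes the five growth factors from the A.M. calculation above and computes the geometric mean as the fifth root of their product; the variable names are illustrative.

```python
import math

growth_factors = [1.07, 1.08, 1.10, 1.12, 1.18]

def geometric_mean(factors):
    """n-th root of the product of n growth factors."""
    return math.prod(factors) ** (1 / len(factors))

gm = geometric_mean(growth_factors)
am = sum(growth_factors) / len(growth_factors)

final_actual = 100 * math.prod(growth_factors)   # $100 compounded at the actual rates
final_gm = 100 * gm ** 5                         # compounded at the G.M. rate
final_am = 100 * am ** 5                         # compounded at the A.M. rate

print(round(gm, 4), round(final_actual, 2), round(final_am, 2))
```

Compounding at the G.M. reproduces the actual final balance of about $168, while compounding at the A.M. overstates it at about $168.51.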
In highly inflationary economies, banks must pay high interest rates to attract savings. Suppose that over 5 years in an unbelievably inflationary economy banks pay interest at annual rates of 100, 200, 250, 300 and 400 percent, which correspond to growth factors of 2, 3, 3.5, 4 and 5. The initial deposit is $100 for 5 years. Calculate the A.M. and G.M. of the growth factors. What is the error from using the A.M.?
Median
Median: Example
Class in $        Frequency   Cumulative Frequency
0-49.99           78          78
50.00-99.99       123         201
100.00-149.99     187         388
150.00-199.99     82          470
200.00-249.99     51          521
250.00-299.99     47          568
300.00-349.99     13          581
350.00-399.99     9           590
400.00-449.99     6           596
450.00-499.99     4           600
TOTAL             600
Solution: Median = $126.35
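One way to arrive at this value is linear interpolation inside the median class. The sketch below assumes the common textbook formula median = L + ((n + 1)/2 - (F + 1)) / f × w, where L is the lower limit of the median class, F the cumulative frequency below it, f its frequency and w the class width. The slide does not show which formula was used, so treat this as an assumption.

```python
# (lower class limit in $, frequency) for the account-balance classes above
classes = [(0.0, 78), (50.0, 123), (100.0, 187), (150.0, 82), (200.0, 51),
           (250.0, 47), (300.0, 13), (350.0, 9), (400.0, 6), (450.0, 4)]
width = 50.0  # class width in $

def grouped_median(classes, width):
    """Interpolate the median within the class that contains it."""
    n = sum(f for _, f in classes)
    position = (n + 1) / 2            # rank of the median observation
    cumulative = 0                    # observations in the classes below
    for lower, f in classes:
        if cumulative + f >= position:
            return lower + (position - (cumulative + 1)) / f * width
        cumulative += f
    raise ValueError("empty frequency distribution")

median_estimate = grouped_median(classes, width)
print(round(median_estimate, 2))
```

Computed without intermediate rounding, the estimate is about $126.34. The $126.35 on the slide is consistent with rounding the ratio 98.5/187 to 0.527 before multiplying by the class width.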
Pros:
- Can be calculated from grouped data with open-ended classes, unless the median falls in an open-ended class.
- Can be calculated even when the data are qualitative descriptions, such as color or sharpness.
Cons: Time-consuming for a large data set.
Tips: Use common sense to select the statistical tool.
Mode
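The value that is repeated most often in the data set.
Risk in the mode of ungrouped data: the most frequent value may not represent the data well.
Example: delivery trips per day made by the Redix concrete plant in a 20-day period. Mean: 6.7, Mode: 15. Here the mode is far above the mean.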
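- Group the data in a frequency distribution.
- Assume the mode is in the class with the most items (i.e. the class with the highest frequency).
- Use the formula below to get a single value from the modal class:

$Mo = L_{Mo} + \frac{d_1}{d_1 + d_2} \times w$

Where, $L_{Mo}$ = lower limit of the modal class, $d_1$ = frequency of the modal class minus the frequency of the class directly below it, $d_2$ = frequency of the modal class minus the frequency of the class directly above it, $w$ = width of the modal class.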
Mode: Example
Then, Lmo = 4; d1 = 8 - 1 = 7; d2 = 8 - 6 = 2; w = 3
Mo = 4 + (7 / (7 + 2)) x 3 = 6.33 is the estimate of the mode.
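The same arithmetic as a small function, using the values given in the example. The function and parameter names are illustrative only.

```python
def grouped_mode(lower_limit, d1, d2, width):
    """Estimate the mode inside the modal class.

    d1: modal-class frequency minus the frequency of the class below it
    d2: modal-class frequency minus the frequency of the class above it
    """
    return lower_limit + d1 / (d1 + d2) * width

mode_estimate = grouped_mode(lower_limit=4, d1=8 - 1, d2=8 - 6, width=3)
print(round(mode_estimate, 2))
```

Because d1 is much larger than d2, the estimate is pulled toward the upper end of the modal class.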
A data set can have two or more modes. A distribution with two modes is called bimodal.
Pros:
- Not affected by extreme values.
- Can be used with open-ended classes.
Cons: Useless when the data set has no mode, because every value or class occurs equally often.
HINTS
- If you are averaging a small group of factory wages that are fairly near each other, the A.M. is accurate and fast.
- If there are 500 new houses in a development, all within $10,000 of each other in value, and the data are skewed, the median is much quicker and accurate.
- The effects of inflation and interest require the G.M.
- Common sense: although the average number of children is 1.65, a children's park manager will make a better decision by using the modal value of 2 kids.
Reliability of central location: in widely dispersed data, as in curve C, the mean is less reliable.
Widely dispersed data bring more problems. Identify wide dispersion early, before further analysis.
Range
Dispersion : Range
Range = Value of highest observation - Value of lowest observation
- Ignores the nature of the variation among all observations other than the highest and lowest.
- Heavily influenced by extreme values.
- The range is likely to vary widely from one sample to the next in the same population, because it depends only on the highest and lowest values.
- Open-ended classes have no range: there is no highest or lowest value.
Interfractile Range
In a frequency distribution, a given fraction or proportion of the data lies at or below a fractile. The interfractile range is a measure of the spread between two fractiles in a frequency distribution.
Types of Fractiles
- Deciles: 10 equal parts
- Quartiles: 4 equal parts
- Percentiles: 100 equal parts
Quartiles divide a rank-ordered data set into four equal parts. The values that divide the parts are called the first, second, and third quartiles: Q1, Q2, and Q3, respectively.
- Q1 is the "middle" value in the first half of the rank-ordered data set.
- Q2 is the median value in the set.
- Q3 is the "middle" value in the second half of the rank-ordered data set.
Interquartile Range = Q3 - Q1
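A short sketch of the quartile definitions above, using the median of each half of the rank-ordered data. The data values are hypothetical, and leaving the overall median out of both halves when the number of observations is odd is one common convention assumed here; other quartile conventions give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # first half of the ordered data
    upper = ordered[half + (n % 2):]      # second half; skips the median if n is odd
    return median(lower), median(ordered), median(upper)

data = [15, 3, 9, 1, 13, 7, 11, 5]        # hypothetical observations
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                             # interquartile range
print(q1, q2, q3, iqr)
```

The interquartile range covers the middle half of the data, so it is not affected by the extreme values that drive the range.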
Example:
Chebyshev's theorem
No matter what the shape of the distribution, at least 75% of the values will fall within +/- 2 standard deviations from the mean of the distribution, and at least 89% of the values will lie within +/- 3 standard deviations from the mean.
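Both percentages follow from the general form of the theorem: at least 1 - 1/k^2 of the values lie within k standard deviations of the mean, for any k greater than 1. A minimal sketch, with an illustrative function name:

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

bound_2 = chebyshev_lower_bound(2)
bound_3 = chebyshev_lower_bound(3)
print(bound_2, round(bound_3, 3))
```

For k = 2 the bound is 75%, and for k = 3 it is about 88.9%, which the slide rounds to 89%.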
Bell-shaped frequency distribution curve
Coefficient of Variation
- Independent of the unit in which the measurement has been taken.
- Expressed as a percent.
- Used for comparison between data sets with different units or widely different means.
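A minimal sketch of the unit-independence property, assuming the usual definition: coefficient of variation = (standard deviation / mean) × 100. The slide does not show the formula, the population standard deviation is used here as an assumption, and the height values are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """Standard deviation as a percentage of the mean (population form)."""
    return pstdev(data) / mean(data) * 100

heights_cm = [150, 160, 170, 180, 190]      # hypothetical heights in centimetres
heights_m = [h / 100 for h in heights_cm]   # the same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(round(cv_cm, 2), round(cv_m, 2))
```

Changing the unit rescales the mean and the standard deviation by the same factor, so the ratio stays the same. That is what makes the measure usable across data sets recorded in different units.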
Chapter Review