Learning Goals
- The concept of central tendency
- Various measures of central tendency
- Choosing an appropriate measure in a given situation
She further asks, "Was this week hotter than the last week?"
Also, you have the data on last week's temperatures (in degrees Celsius): 28, 30, 30, 32, 33, 38, 42. Clearly, she would not be happy if you were to present these numbers instead of a yes/no/same!
In addition, it is possible to compare two (or more) sets of data by simply comparing their averages (central tendency).
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
AM of this week's temperatures = (36 + 35 + 36 + 34 + 37 + 40 + 39)/7 = 36.71
AM of last week's temperatures = (28 + 30 + 30 + 32 + 33 + 38 + 42)/7 = 33.29
So you tell your friend: on average, this week was hotter!
- Most common measure: average GPA or marks in a semester, average expenditure for the last two months, and similar figures most often refer to the AM.
- In statistical literature, however, "average" need not necessarily represent the AM.
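The comparison of the two weeks can be reproduced with a few lines of R. This is a minimal sketch; the vector names `this_week` and `last_week` are illustrative and not part of the original slides.

```r
# Daily temperatures in degrees Celsius, copied from the example above
this_week <- c(36, 35, 36, 34, 37, 40, 39)
last_week <- c(28, 30, 30, 32, 33, 38, 42)

# Arithmetic mean: sum of the observations divided by their number
am_this <- mean(this_week)   # 257 / 7
am_last <- mean(last_week)   # 233 / 7

# TRUE means this week was hotter on average
hotter_this_week <- am_this > am_last
hotter_this_week
```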
Situation 1: Consider the following example of the salary break-up in a small firm visiting your campus for placement:

Table 1
Employee                      Salary (Monthly)
CEO (only 1)                  3,00,000
Senior Analyst (10 of them)   70,000
Junior Analyst (20 of them)   50,000
[...]                         35,000
[...]                         15,000
Situation 1 continued
Arithmetic Mean = (3,00,000 + 10*70,000 + 20*50,000 + 2*35,000 + 2*15,000)/35 = 60,000
However, more than half the employees, 24 out of 35, get a salary of 50,000 or less! The AM doesn't seem representative: the salary of the CEO pulls it up. Also, as a college graduate, you know that you are not going to be a CEO or an intern, so you are hardly interested in the values of extreme observations (too high a value for the CEO and too small a value for an intern).
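A minimal R sketch of the Situation 1 calculation, using the salaries from Table 1 and the head counts that appear in the AM formula (1, 10, 20, 2, 2). The variable names are illustrative only.

```r
# Monthly salaries and the number of employees drawing each salary
salary <- c(300000, 70000, 50000, 35000, 15000)
count  <- c(1, 10, 20, 2, 2)

# Expand to one entry per employee, then take the plain arithmetic mean
all_salaries <- rep(salary, times = count)
am_salary <- mean(all_salaries)

# Number of employees earning 50,000 or less
n_at_or_below_50k <- sum(all_salaries <= 50000)

am_salary
n_at_or_below_50k
```

The mean sits above what most employees actually earn, which is the point of the example.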
Situation 2
Qualitative data: the colors of flowers in your garden are 3 blue, 7 yellow, 8 purple and 15 red. Which one is the most representative color? Clearly, you cannot find the AM of this data!

Limitations of AM:
- Affected by extreme observations in a dataset (in Situation 1, the salary of the CEO). This can be corrected by using a trimmed mean or a winsorized mean. For further reading see http://en.wikipedia.org/wiki/Trimmed_estimator and http://en.wikipedia.org/wiki/Winsorising
- Gives equal importance to all observations. This can be corrected by using a weighted arithmetic mean. For example, suppose your mid-term exam has a 40% weightage and the end-term a 60% weightage; then scores of 75% in the mid-term and 65% in the end-term yield a weighted average of (0.40*75 + 0.60*65) = 69.
- Cannot be used to summarize qualitative data.
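Both corrections are available in base R. The sketch below reuses the exam weights and the Situation 1 salaries; the 10% trimming fraction is an assumption chosen only for illustration, since the slides do not specify one.

```r
# Weighted AM: mid-term 75% with weight 0.40, end-term 65% with weight 0.60
weighted_avg <- weighted.mean(c(75, 65), w = c(0.40, 0.60))

# Salaries from Situation 1, one entry per employee
all_salaries <- rep(c(300000, 70000, 50000, 35000, 15000),
                    times = c(1, 10, 20, 2, 2))

plain_am <- mean(all_salaries)

# Assumption: drop 10% of the observations from each end before averaging
trimmed_am <- mean(all_salaries, trim = 0.10)

weighted_avg
plain_am
trimmed_am
```

With the CEO and the lowest salaries trimmed away, the mean moves closer to what a typical employee earns.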
Some Comments:
If the total number of values, n say, is odd, then the median is the ((n+1)/2)th value of the data arranged in ascending order. If n is even, the median is the AM of the (n/2)th and ((n/2)+1)th values (a convention!).
Scores of 9 students: 40, 37, 41, 38, 31, 37, 44, 45, 42. Arrange the marks in ascending order: 31, 37, 37, 38, 40, 41, 42, 44, 45. The median is the (9+1)/2 = 5th value, which is 40.
Add the score of another student: 48. Now the median of the scores of 10 students = AM of the 5th and 6th values = (40 + 41)/2 = 40.5.
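The same two medians can be checked in R, whose `median()` function follows the convention described above for an even number of values. A minimal sketch with illustrative variable names:

```r
scores <- c(40, 37, 41, 38, 31, 37, 44, 45, 42)

# Sorting shows which value sits in the middle position
sorted_scores <- sort(scores)
sorted_scores

# Odd n: the (n + 1) / 2 = 5th sorted value
median_9 <- median(scores)

# Even n: the AM of the 5th and 6th sorted values
median_10 <- median(c(scores, 48))

median_9
median_10
```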
Applied Statistics and Computing Lab
$$\text{Average speed} = \frac{4d}{(d/10) + (d/5) + (d/8) + (d/6)}$$
The HM of 10, 5, 8 and 6 is 4/(1/10 + 1/5 + 1/8 + 1/6) = 480/71 ≈ 6.76, which is precisely the average speed of this man! The HM may not be a very commonly used measure of central tendency, but it is the appropriate average when the variable is of the form x per unit of y.
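A short R sketch of why the HM is the average speed here. It assumes, as the expression with 4d suggests, that the four stretches have the same length d; the value d = 1 is arbitrary because d cancels.

```r
speeds <- c(10, 5, 8, 6)

# Harmonic mean: reciprocal of the mean of the reciprocals
hm <- 1 / mean(1 / speeds)

# Average speed = total distance / total time, for four equal stretches
d <- 1                                # arbitrary stretch length (assumption)
avg_speed <- (4 * d) / sum(d / speeds)

hm
avg_speed
```

Changing `d` to any other positive value leaves `avg_speed` unchanged.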
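For two numbers a and b:
$$AM = \frac{a+b}{2}, \qquad GM = \sqrt{ab}, \qquad HM = \frac{2}{(1/a) + (1/b)} = \frac{2ab}{a+b}$$
Check: $(AM \times HM) = (GM)^2$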
But this identity is guaranteed only for two numbers. For any number of positive observations, AM >= GM >= HM, with equality holding when all observations are equal.
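Both statements can be verified numerically in R. In this sketch the pair a = 4, b = 9 is a hypothetical choice made only for illustration, and the four-value vector reuses the speeds 10, 5, 8 and 6.

```r
# Two numbers (hypothetical values): AM * HM equals GM^2
a <- 4
b <- 9
am2 <- (a + b) / 2
gm2 <- sqrt(a * b)
hm2 <- 2 * a * b / (a + b)
identity_holds <- isTRUE(all.equal(am2 * hm2, gm2^2))

# More than two positive numbers: only the ordering AM >= GM >= HM is guaranteed
x  <- c(10, 5, 8, 6)
am <- mean(x)
gm <- exp(mean(log(x)))
hm <- 1 / mean(1 / x)
ordering_holds <- (am >= gm) && (gm >= hm)

identity_holds
ordering_holds
```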
Median and Mode: each may remain unchanged even after the alteration of several observations.
Some Applications
An investor is deciding whether to invest this year in a stock that yielded bimonthly returns of 15%, 4%, 5%, 7%, 10% and 10% last year. He computes the GM.

A computer sales representative sells the brands and numbers of computers shown.

Brand             No. of computers sold
IBM PS(2)/M30     500
IBM PS(2)/M50     410
IBM PS(2)/M70     250
Compaq            506

The sales representative is interested in the most popular brand, so he computes the mode. Problem: the mode ignores the importance of the other brands.
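Both applications can be sketched in R. The slides say only that the investor "computes the GM"; taking the GM of the growth factors (1 + return) rather than of the raw percentages is an assumption made here, as it is the usual way to average compounding returns. All variable names are illustrative.

```r
# Bimonthly returns from last year, as fractions
returns <- c(0.15, 0.04, 0.05, 0.07, 0.10, 0.10)

# Assumption: average the growth factors geometrically, then subtract 1
growth    <- 1 + returns
gm_return <- exp(mean(log(growth))) - 1
am_return <- mean(returns)

# Computers sold per brand
sold <- c("IBM PS(2)/M30" = 500, "IBM PS(2)/M50" = 410,
          "IBM PS(2)/M70" = 250, "Compaq" = 506)

# Mode: the brand with the largest count
modal_brand <- names(sold)[which.max(sold)]

# What the mode ignores: the three IBM models taken together
ibm_total <- sum(sold[grepl("^IBM", names(sold))])

gm_return
modal_brand
ibm_total
```

Compounding `gm_return` over six periods reproduces the total growth of the year, which the arithmetic mean of the returns does not.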
Applications continued
A manufacturing company claims: "On average, we ship parts within 37 hours of order entry." But a careful look at the data shows that for the worst-off 10% of customers, the shipping time was within 89 hours of order entry.
Is the simple average misleading? Look at the data on the worst-off 1, 5, 10 or 25% of customers. Use quantiles!
Word of caution:
- A typical representation (average) can, in certain situations, result in gross misrepresentation.
- Choose the average depending on the business situation at hand.
Source: Thriving on Chaos, Tom Peters
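The idea of looking at the worst-off customers can be sketched with R's `quantile()` function. The shipping times below are invented purely for illustration; they are not the company's data and are not meant to reproduce the 37-hour or 89-hour figures.

```r
# Hypothetical shipping times in hours (made-up values, for illustration only)
shipping_hours <- c(12, 18, 20, 22, 24, 25, 26, 28, 30, 31,
                    33, 35, 36, 38, 40, 45, 60, 75, 89, 95)

avg_hours <- mean(shipping_hours)

# Upper quantiles describe the experience of the worst-off customers:
# the 0.90 quantile separates the slowest 10% of orders from the rest
worst_off <- quantile(shipping_hours, probs = c(0.75, 0.90, 0.95, 0.99))

avg_hours
worst_off
```

When a few orders take very long, the upper quantiles lie well above the mean, which is exactly what a single average hides.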
Illustration Using R
We take the age variable from the bodymeasurement dataset.
Objective: Compare the ages of males and females using various measures of central tendency.
R-Code

    # Assumes the data frame `age`, with columns Age and Gender, is already loaded
    Age <- age$Age
    Gender <- age$Gender

    Age.male <- Age[Gender == "Male"]        # ages of men only
    summary(Age.male)                        # min, quartiles, median, mean and max
    Age.female <- Age[Gender == "Female"]    # ages of women only
    summary(Age.female)

    AM <- mean(Age.male)                     # arithmetic mean
    AM
    GM <- exp(mean(log(Age.male)))           # geometric mean (ages must be positive)
    GM
    HM <- 1 / mean(1 / Age.male)             # harmonic mean (ages must be positive)
    HM
    Mode <- names(sort(-table(Age.male)))[1] # most frequent age, returned as a string
    Mode
    quantile(Age.male, c(.5, .25, .75, .46, .79))  # any quantile can be requested
    dotchart(Age.male)                       # dot charts help spot extreme observations
    dotchart(Age.female)
Conclusion
But Sherlock Holmes did not just talk about the average: "you can say with precision what an average number will be up to". So, what exactly is this precision that he is referring to? The investor in our example can calculate his average return, and in the ages data we can find the average age of men and women, but how precise a representation is this average? In the next module, on dispersion, you will be able to answer these questions and more.
Thank you