(8b) Grouped Data - Central Tendency and Dispersion

CENTRAL TENDENCY AND DISPERSION: For grouped data
Applied Statistics and Computing Lab Indian School of Business
Learning goals
- Understanding data with class intervals
- Learning to evaluate various measures of central tendency and dispersion for grouped data
Introduction
- We studied measures of central tendency and dispersion for discrete data
- The data was represented in the form of a list
- How do we deal with data with class intervals?
- Can we find a value that represents a given class interval?
- Class intervals could emerge from both discrete as well as continuous data
- We would look at a dataset consisting of N observations, distributed across n classes
Class mark
- The class mark is the midpoint of a class interval
- It is calculated as the arithmetic mean of the class limits
- E.g. if we are looking at the number of students whose scores lie between 60 and 70 (60 is the lower limit and 70 is the upper limit), then $\frac{60 + 70}{2} = 65$ is the class mark or the midpoint of the class interval 60-70
- The class mark cannot be determined for open-ended classes (intervals with no stated limit on one side, e.g. "110 and above")
- In case of overlapping classes (where the upper limit of a class and the lower limit of the next one are equal), we assign the shared value to the class in which it is the lower limit; e.g. with classes 60-70 and 70-80, a score of 70 is counted in 70-80
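The two rules above (midpoint of the limits, and a shared boundary going to the class in which it is the lower limit) can be sketched in a few lines of Python. The function names `class_mark` and `assign_class` and the boundary list are illustrative assumptions, not part of the lecture.

```python
def class_mark(lower, upper):
    """Midpoint of a class interval: the arithmetic mean of its limits."""
    return (lower + upper) / 2


def assign_class(value, boundaries):
    """Return (lower, upper) of the class that holds `value`.

    `boundaries` is a sorted list of shared class limits, e.g.
    [50, 60, 70, 80] for the overlapping classes 50-60, 60-70, 70-80.
    A value equal to a shared limit goes to the class in which it is
    the lower limit (hypothetical helper, for illustration only).
    """
    for lower, upper in zip(boundaries, boundaries[1:]):
        if lower <= value < upper:
            return (lower, upper)
    return None  # value falls outside every class


print(class_mark(60, 70))                  # 65.0
print(assign_class(70, [50, 60, 70, 80]))  # (70, 80)
```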
Cumulative frequency
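- Cumulative frequency is the frequency of values up to the upper limit of the corresponding class interval
- For the $i$-th class, denote its lower and upper limits as $l_i$ and $u_i$, its frequency as $f_i$, its cumulative frequency as $F_i$ and its class mark as $x_i$

Class #   Class interval   Frequency   Cumulative frequency                   Class mark
1         $l_1 - u_1$      $f_1$       $F_1 = f_1$                            $x_1 = \frac{l_1 + u_1}{2}$
2         $l_2 - u_2$      $f_2$       $F_2 = f_1 + f_2$                      $x_2 = \frac{l_2 + u_2}{2}$
3         $l_3 - u_3$      $f_3$       $F_3 = f_1 + f_2 + f_3$                $x_3 = \frac{l_3 + u_3}{2}$
...
i         $l_i - u_i$      $f_i$       $F_i = f_1 + f_2 + \dots + f_i$        $x_i = \frac{l_i + u_i}{2}$
...
n         $l_n - u_n$      $f_n$       $F_n = f_1 + f_2 + \dots + f_n = N$    $x_n = \frac{l_n + u_n}{2}$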
Example
- Weights from the body measurement data used earlier
- Weight values are given up to one decimal place

Class interval   Frequency   Cumulative frequency   Class mark
40-49.9          27          27                     44.95
50-59.9          124         151                    54.95
60-69.9          120         271                    64.95
70-79.9          115         386                    74.95
80-89.9          87          473                    84.95
90-99.9          25          498                    94.95
100-109.9        8           506                    104.95
110-119.9        1           507                    114.95
Total            507
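As a quick check of the example table, the cumulative frequencies and class marks can be recomputed from the class limits and frequencies. This is a minimal sketch; the variable names are assumptions chosen for illustration.

```python
from itertools import accumulate

# Class limits and frequencies of the weights example
lower = [40, 50, 60, 70, 80, 90, 100, 110]
upper = [49.9, 59.9, 69.9, 79.9, 89.9, 99.9, 109.9, 119.9]
freq = [27, 124, 120, 115, 87, 25, 8, 1]

# Running total of the frequencies gives the cumulative frequency
cum_freq = list(accumulate(freq))

# Class mark = midpoint of the limits (rounded to tidy up float noise)
marks = [round((l + u) / 2, 2) for l, u in zip(lower, upper)]

for row in zip(lower, upper, freq, cum_freq, marks):
    print(row)
```

The last cumulative frequency equals the total number of observations, 507.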
Summation of all values

- In an ungrouped data set, the value of each observation is considered
- In grouped data, that is not possible
- How can we account for all the values of a dataset?
- As the class mark or the midpoint is considered to represent every value belonging to that particular class interval, it serves as a proxy for all the values in that class
- We can repeat the class mark as many times as the number of values belonging to that class interval, which is nothing but the frequency
- Hence, sum of all values $\approx \sum_{i=1}^{n} f_i x_i$, where $f_i$ is the frequency and $x_i$ the class mark of the $i$-th class
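The idea of repeating each class mark as many times as its frequency can be illustrated directly. The sketch below uses the class marks and frequencies of the weights example; the variable names are illustrative assumptions.

```python
marks = [44.95, 54.95, 64.95, 74.95, 84.95, 94.95, 104.95, 114.95]
freq = [27, 124, 120, 115, 87, 25, 8, 1]

# Repeat every class mark f_i times: a stand-in for the unknown raw values
proxy_values = [x for x, f in zip(marks, freq) for _ in range(f)]

total_by_repetition = sum(proxy_values)
total_by_formula = sum(f * x for x, f in zip(marks, freq))

print(len(proxy_values))        # one proxy value per observation
print(total_by_repetition, total_by_formula)
```

Both totals agree up to floating-point rounding, which is why the weighted sum of class marks can replace the sum of the raw values in the formulae for grouped data.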
Means
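For data consisting of N observations distributed across n distinct class intervals, with class marks $x_i$ and frequencies $f_i$ (so that $N = \sum_{i=1}^{n} f_i$),

Arithmetic mean $= \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N} \sum_{i=1}^{n} f_i x_i$
Geometric mean $= \left( \prod_{i=1}^{n} x_i^{f_i} \right)^{1/N}$
Harmonic mean $= \frac{N}{\sum_{i=1}^{n} f_i / x_i}$

For the weights data,
Arithmetic mean = 69.15
Geometric mean = 67.88
Harmonic mean = 66.64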
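The three means can be evaluated from the class marks and frequencies of the weights table: the arithmetic mean as the frequency-weighted average of the class marks, the geometric mean as the N-th root of the product of class marks raised to their frequencies, and the harmonic mean as N divided by the sum of frequency over class mark. This is a minimal sketch with illustrative variable names. Evaluated on the class marks, the arithmetic mean comes out at about 69.35, so the results are close to, though not identical to, the figures quoted above; the quoted figures may have been obtained from the raw observations, which is an assumption here.

```python
import math

marks = [44.95, 54.95, 64.95, 74.95, 84.95, 94.95, 104.95, 114.95]
freq = [27, 124, 120, 115, 87, 25, 8, 1]
N = sum(freq)

# Arithmetic mean: frequency-weighted average of the class marks
arithmetic_mean = sum(f * x for x, f in zip(marks, freq)) / N

# Geometric mean: computed through logarithms to avoid a huge product
geometric_mean = math.exp(sum(f * math.log(x) for x, f in zip(marks, freq)) / N)

# Harmonic mean: N over the frequency-weighted sum of reciprocals
harmonic_mean = N / sum(f / x for x, f in zip(marks, freq))

print(round(arithmetic_mean, 2), round(geometric_mean, 2), round(harmonic_mean, 2))
```

As expected for positive data, harmonic mean < geometric mean < arithmetic mean.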
Median
- How do we determine the value that has 50% of the data on each of its two sides?
- Initially we can at least determine the class interval in which the value would lie, the median class
- Let $u$ be the upper limit and $l$ be the lower limit of the median class
- Let $f$ indicate the frequency of the median class and $F$ indicate the cumulative frequency of the class preceding the median class, then
  Median $= l + \frac{(u - l)\left(\frac{N}{2} - F\right)}{f}$
- This is obtained by linear interpolation, under the assumption that the cumulative frequency increases at a uniform rate from the lower to the upper limit of the class
Median (contd.)
- Total 507 observations
- The $\frac{N}{2} = \frac{507}{2} = 253.5$th observation splits the data into 2 equal halves

Class interval   Frequency   Cumulative frequency
40-49.9          27          27
50-59.9          124         151
60-69.9          120         271    <- Median class, as the 253.5th observation would lie in this interval
70-79.9          115         386
80-89.9          87          473
90-99.9          25          498
100-109.9        8           506
110-119.9        1           507
Total            507

Median $= l + \frac{(u - l)\left(\frac{N}{2} - F\right)}{f} = 60 + \frac{(69.9 - 60)\left(\frac{507}{2} - 151\right)}{120} = 68.46$
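The worked example can be reproduced with a short function that first locates the median class from the cumulative frequencies and then interpolates within it: lower limit plus class width times (N/2 minus the preceding cumulative frequency) over the class frequency. This is a sketch under the same linear-interpolation assumption; the function name `grouped_median` is illustrative.

```python
from itertools import accumulate


def grouped_median(lower, upper, freq):
    """Median of grouped data by linear interpolation in the median class."""
    N = sum(freq)
    cum = list(accumulate(freq))
    # Median class: first class whose cumulative frequency reaches N/2
    m = next(i for i, F in enumerate(cum) if F >= N / 2)
    F_prev = cum[m - 1] if m > 0 else 0  # cumulative frequency before it
    return lower[m] + (upper[m] - lower[m]) * (N / 2 - F_prev) / freq[m]


lower = [40, 50, 60, 70, 80, 90, 100, 110]
upper = [49.9, 59.9, 69.9, 79.9, 89.9, 99.9, 109.9, 119.9]
freq = [27, 124, 120, 115, 87, 25, 8, 1]

median = grouped_median(lower, upper, freq)
print(round(median, 2))  # 68.46
```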
Quantiles
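- Suppose $k$ is the number of quantiles: $k = 4$ for quartiles, $k = 10$ for deciles and $k = 100$ for percentiles
- The $j$-th quantile is the $\left(\frac{jN}{k}\right)$-th value of the data
- Note that $\frac{jN}{k}$ is not the numerical value of the quantile; it is only the position corresponding to the quantile when the data is organised in ascending order
- For the median, i.e. the 2nd quartile, it was the $\left(\frac{2N}{4}\right) = \left(\frac{N}{2}\right)$-th value
- Using cumulative frequencies, we can then determine the class to which the given quantile belongs
- With $l$ and $u$ the lower and upper limits of the quantile class, $f$ its frequency and $F$ the cumulative frequency of the preceding class,
  $Q_j = l + \frac{(u - l)}{f} \left[ \frac{jN}{k} - F \right]$
- $k = 4$ and $j = 2$ gives the median
- Interquartile range $= Q_3 - Q_1 = \left\{ l_{Q_3} + \frac{(u_{Q_3} - l_{Q_3})}{f_{Q_3}} \left[ \frac{3N}{4} - F_{Q_3} \right] \right\} - \left\{ l_{Q_1} + \frac{(u_{Q_1} - l_{Q_1})}{f_{Q_1}} \left[ \frac{N}{4} - F_{Q_1} \right] \right\}$
  where the $l$s and $u$s refer to the lower and upper limits of the corresponding quantile classes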
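The same interpolation generalises from the median to any quantile: find the class containing position jN/k, then interpolate within it. Below is a minimal sketch applied to the weights table; the function name `grouped_quantile` and its argument order are assumptions made for illustration.

```python
from itertools import accumulate


def grouped_quantile(j, k, lower, upper, freq):
    """j-th of k quantiles (e.g. j=1, k=4 is the first quartile)."""
    N = sum(freq)
    position = j * N / k  # position of the quantile, not its value
    cum = list(accumulate(freq))
    # Quantile class: first class whose cumulative frequency reaches the position
    q = next(i for i, F in enumerate(cum) if F >= position)
    F_prev = cum[q - 1] if q > 0 else 0
    return lower[q] + (upper[q] - lower[q]) * (position - F_prev) / freq[q]


lower = [40, 50, 60, 70, 80, 90, 100, 110]
upper = [49.9, 59.9, 69.9, 79.9, 89.9, 99.9, 109.9, 119.9]
freq = [27, 124, 120, 115, 87, 25, 8, 1]

q1 = grouped_quantile(1, 4, lower, upper, freq)
q2 = grouped_quantile(2, 4, lower, upper, freq)  # the median
q3 = grouped_quantile(3, 4, lower, upper, freq)
iqr = q3 - q1
print(round(q1, 2), round(q2, 2), round(q3, 2), round(iqr, 2))
```

With k = 4 and j = 2 the function returns the same value as the median formula, about 68.46 for the weights data.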
Mode
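- We can easily identify the class interval with the highest frequency: the modal class
- How do we determine the value which has the highest density?
- Formula given by:
  Mode $= l + (u - l) \left[ \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \right]$
  where $l$ and $u$ are the lower and upper limits of the modal class, $f_m$ is the frequency of the modal class, and $f_{m-1}$ and $f_{m+1}$ are the frequencies of the classes preceding and following it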
Mode (contd.)
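Class interval   Frequency
40-49.9          27
50-59.9          124    <- Modal class, class interval with the highest frequency
60-69.9          120
70-79.9          115
80-89.9          87
90-99.9          25
100-109.9        8
110-119.9        1
Total            507

Mode $= l + (u - l) \left[ \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \right] = 50 + (59.9 - 50) \left[ \frac{124 - 27}{(124 - 27) + (124 - 120)} \right] = 59.51$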
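The worked calculation above (excess of the modal frequency over the preceding class, divided by the sum of its excesses over both neighbours, scaled by the class width) can be sketched as follows. The function name `grouped_mode` is illustrative; treating a missing neighbour as frequency 0 and requiring a single modal class are assumptions of this sketch, not statements from the lecture.

```python
def grouped_mode(lower, upper, freq):
    """Mode of grouped data, interpolated inside the modal class."""
    m = freq.index(max(freq))  # modal class: highest frequency (first one if tied)
    f_m = freq[m]
    # Assumption: a missing neighbour (first or last class) counts as frequency 0
    f_prev = freq[m - 1] if m > 0 else 0
    f_next = freq[m + 1] if m < len(freq) - 1 else 0
    d1 = f_m - f_prev  # excess over the preceding class
    d2 = f_m - f_next  # excess over the following class
    # d1 + d2 is zero only when both neighbours tie with the modal class
    return lower[m] + (upper[m] - lower[m]) * d1 / (d1 + d2)


lower = [40, 50, 60, 70, 80, 90, 100, 110]
upper = [49.9, 59.9, 69.9, 79.9, 89.9, 99.9, 109.9, 119.9]
freq = [27, 124, 120, 115, 87, 25, 8, 1]

mode = grouped_mode(lower, upper, freq)
print(round(mode, 2))  # 59.51
```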

Absolute deviations
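For data consisting of N observations distributed across n distinct class intervals,
Mean absolute deviation about a value $A$ $= \frac{\sum_{i=1}^{n} f_i \lvert x_i - A \rvert}{\sum_{i=1}^{n} f_i} = \frac{1}{N} \sum_{i=1}^{n} f_i \lvert x_i - A \rvert$
where $x_i$ is the class mark and $f_i$ the frequency of the $i$-th class; $A$ is typically the mean or the median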
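For grouped data, an absolute deviation measure weights each class mark's distance from a chosen centre by the class frequency and divides by N. The sketch below assumes that reading (mean absolute deviation about a value `center`) and applies it to the weights table, with the grouped arithmetic mean as the centre; `grouped_mad` is a hypothetical helper name.

```python
def grouped_mad(marks, freq, center):
    """Frequency-weighted mean absolute deviation of class marks about `center`."""
    N = sum(freq)
    return sum(f * abs(x - center) for x, f in zip(marks, freq)) / N


marks = [44.95, 54.95, 64.95, 74.95, 84.95, 94.95, 104.95, 114.95]
freq = [27, 124, 120, 115, 87, 25, 8, 1]

# Grouped arithmetic mean, used here as the centre
mean = sum(f * x for x, f in zip(marks, freq)) / sum(freq)
mad_about_mean = grouped_mad(marks, freq, mean)
print(round(mad_about_mean, 2))
```

Passing a grouped median instead of the mean as `center` gives the deviation about the median with no other change.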

Central moments
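For grouped data consisting of N observations distributed across n distinct class intervals,
$r$-th central moment: $\mu_r = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{x})^r$
where $x_i$ is the class mark, $f_i$ the frequency of the $i$-th class and $\bar{x}$ the arithmetic mean
Variance $= \mu_2 = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{x})^2$
Standard deviation $= \sqrt{\mu_2}$
Coefficient of skewness and kurtosis can be calculated accordingly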
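A central moment of grouped data is the frequency-weighted average of powers of the deviations of the class marks from the grouped mean. The sketch below computes the variance and standard deviation this way for the weights table. The skewness and kurtosis lines use the moment-based coefficients (third moment over variance to the power 3/2, fourth moment over variance squared); that choice of definition is an assumption, since the slides do not spell it out here.

```python
def central_moment(marks, freq, r):
    """r-th central moment of grouped data, using class marks as proxies."""
    N = sum(freq)
    mean = sum(f * x for x, f in zip(marks, freq)) / N
    return sum(f * (x - mean) ** r for x, f in zip(marks, freq)) / N


marks = [44.95, 54.95, 64.95, 74.95, 84.95, 94.95, 104.95, 114.95]
freq = [27, 124, 120, 115, 87, 25, 8, 1]

variance = central_moment(marks, freq, 2)
std_dev = variance ** 0.5

# Assumed moment-based coefficients of skewness and kurtosis
skewness = central_moment(marks, freq, 3) / variance ** 1.5
kurtosis = central_moment(marks, freq, 4) / variance ** 2

print(round(variance, 2), round(std_dev, 2), round(skewness, 2), round(kurtosis, 2))
```

The first central moment is zero by construction, which is a convenient sanity check on any implementation.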

Conclusion
- We can verify that the values obtained with the formulae for grouped data are very close to the values obtained by considering the data as ungrouped
- In many situations, describing data using class intervals is more insightful
- Therefore these formulae can be useful for quick hand calculation
- In this age of extensive computational power, these measures can be calculated without dividing the data into class intervals
- Yet, these formulae are important from a theoretical point of view
Thank you
