Sample range
The raw values of any given investigation are initially not in any order. They
form what is referred to as raw data or scores. When the data are arranged in
either ascending or descending order they constitute an array.
Members of an array are distinguished by a star in the subscript notation
discussed earlier. In this case, an array of a random variable x is denoted by
$x_i^* = x_1^*, x_2^*, x_3^*, \dots, x_n^*$, and its sample range is $R = x_n^* - x_1^*$.
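The step from raw scores to an array and its range can be sketched in a few lines of Python. The function name `sample_range` and the five scores are illustrative assumptions, not part of the notes.

```python
def sample_range(raw_scores):
    """Return the ascending array x* and the sample range R = x*_n - x*_1."""
    array = sorted(raw_scores)          # raw data arranged in ascending order
    return array, array[-1] - array[0]  # largest member minus smallest member


# Illustrative raw scores (assumed values, in no particular order).
scores = [12, 8, 25, 26, 10]
array, R = sample_range(scores)
print(array, R)
```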
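The summation theory summary
This summary has been found to be useful in the derivation of statistical
formulae.
(i) $\sum C = C + C + C + \dots + C$ (N terms of C) $= NC$.
(ii) $\sum X \cdot Y = \sum XY$ and $\sum X \cdot X = \sum X^2$.
(iii) $\sum CX = C \sum X$, or $\sum \frac{1}{C} X = \frac{1}{C} \sum X$ (division by C).
(iv) $\sum (X + Y) = \sum X + \sum Y$, and $\sum (AX + BY) = A \sum X + B \sum Y$.
(v) $\sum X = N \bar{X}$ (from the formula for the mean).
(vi) $\sum (X - X_0)/N = \bar{X} - X_0$, i.e. $\bar{X} = X_0 + \sum (X - X_0)/N$ (used in the assumed mean formula).
(vii) $\sum (X - \bar{X})^2 = \sum X^2 - N \bar{X}^2 = \sum X^2 - (\sum X)^2 / N = SS_{xx}$.
(viii) $\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - (\sum X)(\sum Y)/N = SS_{xy}$.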
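Identities (vii) and (viii) can be checked numerically. The sketch below compares the definition form with the shortcut form; the two small lists `X` and `Y` are assumed values chosen only for illustration.

```python
# Assumed illustrative data; any two equal-length lists would do.
X = [1.0, 3.0, 4.0, 5.0, 7.0]
Y = [2.0, 1.0, 6.0, 4.0, 9.0]
N = len(X)

mean_x = sum(X) / N
mean_y = sum(Y) / N

# (vii): sum of squared deviations, definition form versus shortcut form.
ss_xx_definition = sum((x - mean_x) ** 2 for x in X)
ss_xx_shortcut = sum(x * x for x in X) - sum(X) ** 2 / N

# (viii): sum of cross products of deviations, definition versus shortcut.
ss_xy_definition = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_xy_shortcut = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / N

print(ss_xx_definition, ss_xx_shortcut)
print(ss_xy_definition, ss_xy_shortcut)
```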
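The arithmetic mean is given by $\bar{x} = \frac{1}{n}(x_1 + x_2 + x_3 + \dots + x_n)$. It is a balance point of all the values (fair).
For example, the mean of $12, $8, $25, $26, and $10 is
$\bar{x} = \frac{12 + 8 + 25 + 26 + 10}{5} = \frac{81}{5} = 16.2$, that is, $16.2. Its value does not depend on array position.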
5
$102.99 .
Geometric mean ( g m )
This is a mean that is generally given by $g_m = \sqrt[n]{x_1 \times x_2 \times x_3 \times \dots \times x_n}$.
In business it is used to average proportional increases or decreases (rates of
growth or decay). For example, if student enrollment in a course, on a yearly
basis, was 84, 97, 116, and 129, then $g_m = \sqrt[4]{84 \times 97 \times 116 \times 129} \approx 105.08$. To
compute the proportional rise in number, we find the $p_i$ and then the g m of the multipliers:
$p_1 = \frac{97 - 84}{84} = 0.155$, $p_2 = \frac{116 - 97}{97} = 0.196$, and $p_3 = \frac{129 - 116}{116} = 0.112$;
then each year's multiplier is given as $(1 + p)$: 1.155, 1.196, and 1.112. Using the
multipliers geometric mean formula, $g_m(\text{multipliers}) = \sqrt[n]{(1 + p_1)(1 + p_2) \dots (1 + p_n)}$,
gives $g_m(\text{multipliers}) = \sqrt[3]{1.155 \times 1.196 \times 1.112} = 1.153823333 \approx 1.154$. The actual
average rise is $r = g_m(\text{multipliers}) - 1$, giving $r \approx 15.4\%$. Note that the g m of the
proportions does not give the correct answer for the average rise. The use of
multipliers is the accepted way; avoid
$g_m = \sqrt[3]{0.155 \times 0.196 \times 0.112} = 0.150407189$.
Alternatively, multipliers of the form $(100 + p_i\,\%)$ could be used to get r.
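The multiplier method for the enrollment figures can be sketched as follows. The function names are hypothetical, and `math.prod` assumes Python 3.8 or later. The multipliers are computed without intermediate rounding, so the result agrees with the hand calculation only to about three decimal places.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1.0 / len(values))


def average_growth_rate(series):
    """Average proportional rise per period, via the g m of the multipliers (1 + p)."""
    multipliers = [later / earlier for earlier, later in zip(series, series[1:])]
    return geometric_mean(multipliers) - 1.0


enrollment = [84, 97, 116, 129]
gm = geometric_mean(enrollment)     # g m of the raw yearly figures
r = average_growth_rate(enrollment) # average yearly rise as a proportion
print(round(gm, 2), round(100 * r, 1))
```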
Self check exercise
If it is known that the price of an item has increased by 6%, 13%, 11%, and
15% in each of the four successive years, find the geometric rise of the price.
Work out the geometric multipliers first, after which you compute r. (11.2%)
(iii) The mean of the rise or fall in value within a period is also given by
$r = \sqrt[n]{V_n / V_0} - 1$, where $V_0$ is the initial value and $V_n$ is the value after n periods.
22
1 0.270981615 .
2
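The harmonic mean (h m) is defined as
$h_m = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$. *Use the $x^{-1}$ function to calculate it.
For example, the mean of the speeds of a KQ plane that flies to different
destinations at 100, 200, and 300 Km per hour is given by
$h_m = \frac{3}{\frac{1}{100} + \frac{1}{200} + \frac{1}{300}} = 163\tfrac{7}{11} \approx 163.64$ Km per hour.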
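A minimal sketch of the harmonic mean calculation for the three speeds follows. The function name is hypothetical, and the exact fraction is included only to confirm the mixed-number answer.

```python
from fractions import Fraction


def harmonic_mean(values):
    """n divided by the sum of the reciprocals of n non-zero values."""
    return len(values) / sum(1.0 / v for v in values)


speeds = [100, 200, 300]  # Km per hour, from the worked example
hm = harmonic_mean(speeds)

# Exact arithmetic: 3 / (1/100 + 1/200 + 1/300) as a fraction.
hm_exact = 3 / sum(Fraction(1, v) for v in speeds)
print(round(hm, 2), hm_exact)
```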
Compute the Harmonic mean for (i) 2, 4, and 8 (3.43) (ii) 2, 4, and 6.
(3.27) (iii) Ksh15/min, Ksh12/min and Ksh7/min. (10.24)
The root mean square value (r m s)
This is also referred to as the quadratic mean. It is denoted by
$rms = \sqrt{\frac{1}{n} \sum x_i^2}$. For example, the r m s of 1, 3, 4, 5 and 7 is given by
$rms = \sqrt{\frac{1}{5}(1^2 + 3^2 + 4^2 + 5^2 + 7^2)} = 2\sqrt{5} = 4.472135955 \approx 4.47$ units.
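The same r m s value can be reproduced with a short function. The name `root_mean_square` is an assumption made for illustration.

```python
import math


def root_mean_square(values):
    """Square root of the mean of the squared values (quadratic mean)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


rms = root_mean_square([1, 3, 4, 5, 7])
print(round(rms, 2))
```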
For data with odd size, n, the median is given by $\tilde{x} = x^*_{(n+1)/2}$.
For data with even size, n, the median is given by $\tilde{x} = \frac{x^*_{n/2} + x^*_{n/2+1}}{2}$. Thus, the
median lies between the two values at the middle of the array. E.g. the
median of 1, 2, 3, and 4 is between 2 and 3, which is 2.5.
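The odd and even rules translate directly into code once the data are sorted into an array. This is a minimal sketch with a hypothetical function name; the starred positions are 1-based, so they are shifted by one for Python's 0-based indexing.

```python
def median(raw_values):
    """Median of raw data using the odd/even position rules on the sorted array."""
    array = sorted(raw_values)
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 in 1-based terms is index n // 2.
        return array[mid]
    # Even n: average of positions n/2 and n/2 + 1 (indices mid - 1 and mid).
    return (array[mid - 1] + array[mid]) / 2


print(median([1, 2, 3, 4]))
```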
Self check exercise
What makes the median of the numbers: 3, 4, 4, 5, 6, 8, 8, 8, and 10 to be the
5th number in the array with a value of 6? (n is odd)
Why is the median of 5, 6, 6, 6, 6, 7, 7, 8, 9, and 10 given as 6.5? (n is even)
Modal value ( $\hat{x}$ )
This is the value that occurs with the greatest frequency. For example, in the
case of the values: 80, 40, 40, 30, 50, and 40 the mode is 40. However, in the
cases where the frequencies are the same we have no mode. E.g. 17, 18, 35,
43, 42, and 45, have no modal value.
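A sketch of the modal value rule follows, using `collections.Counter` from the standard library. Returning an empty list when all frequencies are the same follows the "no mode" convention above. Returning every tied value when several share the greatest frequency is an assumption, since the notes do not cover that case.

```python
from collections import Counter


def modes(values):
    """Values with the greatest frequency; empty list if all frequencies are equal."""
    counts = Counter(values)
    top = max(counts.values())
    # More than one distinct value, all equally frequent: no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([80, 40, 40, 30, 50, 40]))
print(modes([17, 18, 35, 43, 42, 45]))
```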
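Grouped data aspects
For grouped data the arithmetic mean is $\bar{x} = \frac{\sum f x}{\sum f}$.
When an assumed mean $x_0$ is used, $\bar{x} = x_0 + \frac{\sum f (x - x_0)}{\sum f} = x_0 + \frac{\sum f d_1}{n}$, where $d_1 = x - x_0$ and $n = \sum f$.
With coding, the mean is given by $\bar{x} = x_0 + \frac{\sum f d_2}{n} \times c$, where $d_2 = \frac{x - x_0}{c}$, or by $\bar{x} = x_0 + \frac{\sum f d_3}{n} \times \frac{1}{c}$, where $d_3 = (x - x_0) \times c$.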
Note that the coding methods are used to reduce bulkiness, or to replace values with many decimal places,
so that the calculation works with manageable whole numbers. The d3 coding method is especially
useful for enlarging tiny values expressed with many decimal places into manageable whole numbers. To
decide between the d2 and d3 coding methods, first work out the d1 column, then
examine the pattern of the values obtained to see whether there is a common divisor (GCD) to use in d2 or a
common multiplier to use in d3. An example where d2 coding is decided on is shown below.
Let x0 = 25 and c = 10.

Class    Mid-mark (x)   Frequency (f)   d1 = (x - x0)   d2 = d1/c   fd2
0-10          5              12             -20            -2       -24
10-20        15              18             -10            -1       -18
20-30        25              27               0             0         0
30-40        35              20              10             1        20
40-50        45              17              20             2        34
50-60        55               6              30             3        18
Total                   Σf = 100                                 Σfd2 = 30

Hence $\bar{x} = 25 + \frac{30}{100} \times 10 = 28$ (28 units).
In the above table the d1 column values have a GCD of c = 10, giving d2 = d1/c. This
is why the d2 coding is applicable in this table. Try: the example on circular bolt diameters, shown
overleaf. Which of the codings would be appropriate for computing the mean of the diameters? Give a
reason for your answer, the table format, and how you decide on the value of c to use in the tabulation.
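The d2 coding calculation for the class table with x0 = 25 and c = 10 can be checked with a short script. The function name `coded_mean` is hypothetical. The direct calculation Σfx/Σf is included only as a cross-check.

```python
def coded_mean(midmarks, freqs, x0, c):
    """Grouped mean via d2 coding: x0 + (sum(f * d2) / n) * c, with d2 = (x - x0) / c."""
    n = sum(freqs)
    d2 = [(x - x0) / c for x in midmarks]
    sum_fd2 = sum(f * d for f, d in zip(freqs, d2))
    return x0 + (sum_fd2 / n) * c


midmarks = [5, 15, 25, 35, 45, 55]
freqs = [12, 18, 27, 20, 17, 6]

mean_coded = coded_mean(midmarks, freqs, x0=25, c=10)
mean_direct = sum(f * x for f, x in zip(freqs, midmarks)) / sum(freqs)
print(mean_coded, mean_direct)
```

Any other choice of x0 should give the same mean, which is a quick way to confirm that the coding only simplifies the arithmetic.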
Circular bolts diameters summary
Diam.: 0.9747-0.9749, 0.9750-0.9752, 0.9753-0.9755, 0.9756-0.9758, 0.9759-0.9761, 0.9762-0.9764, 0.9765-0.9767, 0.9768-0.9770, 0.9771-0.9773, 0.9774-0.9776, 0.9777-0.9779, 0.9780-0.9782
Freq.: 15, 42, 68, 49, 25, 18, 12
Exercise
Use a suitable table, based on coding method, to find the arithmetic mean of the following data:
X:  20-25   25-30   30-35   35-40   40-45   45-50
Y:      2      14      29      43      33       9
Check your results using the assumed mean formula (or coding d1).
X     f      1/X    f(1/X) = f/X    Log X    f Log X
3     20     1/3        6.66        0.477      9.54
5     40     1/5        8.00        0.699     27.96
7     30     1/7        4.29        0.845     25.35
9     10     1/9        1.11        0.954      9.54
Total: Σf = 100, Σ(f/X) = 20.06, Σ f log X = 72.39
In this case the frequency data computation formulae used are respectively described as follows.
Using $h_m = \frac{\sum f}{\sum (f/x)}$, we have $h_m = \frac{100}{20.06} \approx 4.98$. In the case of g m, $g_m = \text{anti-log}\left[\frac{\sum f \log x}{n}\right] = 10^{\frac{\sum f \log x}{n}}$. Thus, $g_m = 10^{72.39/100} = 5.295414984 \approx 5.295$.
Note that, for the sake of accuracy, the f/x and f log x values may be added directly on the calculator without rounding the individual column entries first.
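The two frequency formulae can be sketched as below for the table values X = 3, 5, 7, 9 with f = 20, 40, 30, 10. The function names are hypothetical. Because the code keeps full precision in the f/x and f log x columns, its results differ slightly in the third decimal place from the hand-rounded figures.

```python
import math


def grouped_harmonic_mean(xs, fs):
    """h m for frequency data: sum(f) / sum(f / x)."""
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


def grouped_geometric_mean(xs, fs):
    """g m for frequency data: anti-log of sum(f * log10(x)) / n."""
    n = sum(fs)
    return 10 ** (sum(f * math.log10(x) for x, f in zip(xs, fs)) / n)


xs = [3, 5, 7, 9]
fs = [20, 40, 30, 10]
hm = grouped_harmonic_mean(xs, fs)
gm = grouped_geometric_mean(xs, fs)
print(round(hm, 3), round(gm, 3))
```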
Find the geometric and harmonic means of: X:
f:
2 3
10
20 40
5
50
30 25 20
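1. Median, $\tilde{x} = L_1 + \left(\frac{n/2 - (\sum f)_1}{f_{med}}\right) c$. Note that (n+1)/2 is used for an odd sum of frequencies, $\sum f$.
2. Modal value, $\hat{x} = L_1 + \left(\frac{d_1}{d_1 + d_2}\right) c$, where the $d_i$ are differences in frequencies.
3. Quartile values, $Q_1 = L_1 + \left(\frac{n/4 - (\sum f)_1}{f_{Q_1}}\right) c$ and $Q_2 = L_1 + \left(\frac{2n/4 - (\sum f)_1}{f_{Q_2}}\right) c$.
In each formula $L_1$ is the lower boundary of the class containing the required value, $(\sum f)_1$ is the cumulative frequency below that class, the f in the denominator is the frequency of that class, and c is the class width.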
Write an equation for each of the following: Q3, P4 and D10. Note that the median formula is a guide or
basis for writing these formulae. Also, the (n+1)/2 rule used on odd-size discrete data does not apply.
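Since the quartile, decile and percentile formulae all follow the median pattern, one function can cover them. The sketch below is one possible implementation, not the notes' own: the name `grouped_quantile` is hypothetical, equal class widths are assumed, and the target position is taken as fraction × n without the (n+1)/2 adjustment. It is applied to the classes 0-10, ..., 50-60 with frequencies 12, 18, 27, 20, 17, 6 used in the coding table.

```python
def grouped_quantile(lower_bounds, freqs, width, fraction):
    """Interpolated value below which `fraction` (0 < fraction <= 1) of the data lies.

    Implements L1 + ((fraction * n - cumulative frequency below class) / f) * c.
    """
    n = sum(freqs)
    target = fraction * n
    cumulative = 0  # cumulative frequency below the current class
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie in the interval (0, 1]")


lower_bounds = [0, 10, 20, 30, 40, 50]
freqs = [12, 18, 27, 20, 17, 6]

median = grouped_quantile(lower_bounds, freqs, 10, 0.50)
q1 = grouped_quantile(lower_bounds, freqs, 10, 0.25)
q3 = grouped_quantile(lower_bounds, freqs, 10, 0.75)
print(round(q1, 2), round(median, 2), round(q3, 2))
```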
Exercise
1. Compute the median and modal value of:
(i)  X: 90-100, 80-89, 70-79, 60-69, 50-59, 40-49, 30-39
     Y: 9, 32, 43, 21, 11, 3, 1
(ii) X: 10-19, 20-29, 30-39, 40-49, 50-59
     Y: 10, 12, 20
2. Find Q1 and Q3 for the data in 1 (i), then determine the 1st percentile of the same data.
3. Compute the missing frequencies and then the arithmetic mean of the data shown overleaf
Class: 0-10 10-20 20-30 30-40 40-
Freq.: 14 27 15
Let $\sum f = 100$, $\tilde{x} = 24$, and x = 24.
4. Given $n_1 = 20$ and $n_2 = 30$ with $\bar{x}_1 = 64$ and $\bar{x}_2 = 47$, find the combined mean of the two sets of values.
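[Figure: skewness sketches showing the relative locations of the mean ($\bar{x}$), median ($\tilde{x}$) and mode ($\hat{x}$), including the symmetrical data case.]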
For moderately skewed data the three averages are related by: (i) median ≈ $\frac{1}{3}$(2 mean + mode), and (ii) mean ≈ $\frac{1}{2}$(3 median − mode).
For example, the estimate of the mode of a distribution with a mean of 52 and a
median of 54 is given by $\hat{x} \approx 3\tilde{x} - 2\bar{x} = 3(54) - 2(52) = 58$. Thus $\hat{x} \approx 58$.
The obtained value and the other two can be used to sketch the skewness of
the parent data. This is done by using the relative locations of the three
averages on the real number line, as indicated in the skewness sketches shown above.
In most cases, the mode is located at the hump or pile-up of the data and
the median lies between the other two values.
Hence, skew ness, s k =
(s.d)
z , where
n
xx
Alternatively, $s_k = \frac{Q_1 + Q_3 - 2\tilde{x}}{Q_3 - Q_1}$ (Bowley's).
The Kurtosis, $k = \frac{Q}{P_{90} - P_{10}}$, where Q is the semi-inter-quartile range given
by $Q = \frac{1}{2}(Q_3 - Q_1)$.
Alternatively,
3.
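$\sum x_i = n\bar{x}$ and $n = n_1 + n_2 + n_3 + \dots + n_k$.
For two groups the combined mean is $\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$.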
Illustration
The mean weight of 25 male students in a class is 64 kg, and the mean
weight of 35 female students in the same class is 58 kg. Find the
combined mean weight of the class.
Solution
Let the sizes of the two genders be $n_1 = 25$ and $n_2 = 35$, and their
average weights be $\bar{x}_1 = 64$ and $\bar{x}_2 = 58$.
Using the combined mean formula, $\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$, we have
$\bar{x}_c = \frac{25 \times 64 + 35 \times 58}{25 + 35} = \frac{3{,}630}{60} = 60.5$ kg.
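The combined mean is a size-weighted average of the group means, which the following sketch computes for any number of groups. The function name `combined_mean` is hypothetical.

```python
def combined_mean(sizes, means):
    """Combined mean of k groups: sum(n_i * mean_i) / sum(n_i)."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)


# Male and female students from the illustration.
class_mean = combined_mean([25, 35], [64, 58])
print(class_mean)
```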
Illustration
In computing the average price at which 200 items were sold by a
vendor, it was initially found to be Ksh. 40. It was later discovered that
during the summation of the data the prices 43 and 35 had been misread
as 34 and 53. Use this information to compute the corrected mean of the
data.
Solution
Let the number of items be n = 200, and the mean price $\bar{x} = 40$.
Using the correction of the summation formula above,
$\sum x_c = 200 \times 40 + 43 + 35 - 34 - 53 = 7{,}991$.
The corrected mean is given as $\bar{x}_c = \frac{\sum x_c}{n} = \frac{7{,}991}{200}$ = Ksh. 39.955.
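The correction amounts to rebuilding the sum from the reported mean, removing the misread values and adding the true ones. A minimal sketch, with hypothetical function and argument names:

```python
def corrected_mean(n, reported_mean, misread_values, true_values):
    """Mean after replacing misread values in the original sum with the true values."""
    corrected_sum = n * reported_mean - sum(misread_values) + sum(true_values)
    return corrected_sum / n


# 43 and 35 had been misread as 34 and 53.
price_mean = corrected_mean(200, 40, misread_values=[34, 53], true_values=[43, 35])
print(price_mean)
```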
mean weight of the male students is 70 Kg. Find the mean weight
of the female students. Hence, find the ratio of the two genders.
( 60, )
7. The average time IT students spent in the computer lab over two days
is 5 hrs. The rest of the students spent 4 hrs over the same
period. If the combined average time these students stayed in the lab is
5 hrs, find the ratio of IT students to the rest of the students. (1:1)
8. (a) Two samples of sizes 60 and 40, with means of 3 and 5, are put
together to see what their combined mean would be. Find the
combined mean of the two samples. (3.8)
(b) It is later realized that 3 and 5 were not the correct
means of the two samples and that the correct means were 6 and 4,
respectively. Compute the correct combined mean of the samples.
(5.2)
9. (a) In a review of some data of size $n_1 = 50$ and mean $\bar{x}_1 = 30$, it
was discovered that two of the values used, 19 and 18, had by
mistake been entered as 16 and 28 respectively. Compute the
corrected mean of this data. (29.86)
(b) This data was combined with that of a second sample
of size $n_2 = 60$ and mean $\bar{x}_2 = 25$. What is the combined
mean of the two samples? (27.21)
Applications of the median, the mode and skewness
Apart from indicating the centre of the data, the two statistics have
the following uses.
a) The median
(i) Estimating the mean and modal value of the data
The estimation is based on the moderate skewness relationships
between the three averages, discussed earlier under skewness.
As mentioned earlier, for naturally occurring events whose distribution
is normal or bell-shaped, the respective averages coincide. For instance, the
median, mode, quartile 2 (Q2), decile 5 (D5), and percentile 50 (P50)
coincide with the mean. They are said to be equivalent.
However, for a skewed distribution they do not coincide. Instead the
mean, median and mode are related by the respective equations derived
earlier under skewness. Hence, estimation of one of the averages given
the other two is easily done using the relevant estimation relationship.
For example, given that the mean is 27 and the mode is 30 units, the median
is estimated using the relationship $\tilde{x} \approx \frac{2}{3}\bar{x} + \frac{1}{3}\hat{x} = \frac{2}{3} \times 27 + \frac{1}{3} \times 30 = 28$.
The estimated median is $\tilde{x} = 28$. Using the
obtained results, the skewness of the data is described as negative.
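The estimation relationships between the three averages can be collected into small helper functions. This is a sketch under the moderate-skewness assumption stated above. The function names are hypothetical, and the direction of skew is judged simply by comparing the mean with the median.

```python
def estimate_median(mean, mode):
    """Median from mean and mode: (2 * mean + mode) / 3."""
    return (2 * mean + mode) / 3


def estimate_mode(mean, median):
    """Mode from mean and median: 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


def estimate_mean(median, mode):
    """Mean from median and mode: (3 * median - mode) / 2."""
    return (3 * median - mode) / 2


def skew_direction(mean, median):
    """Negative when the mean lies below the median, positive when above."""
    if mean < median:
        return "negative"
    if mean > median:
        return "positive"
    return "symmetrical"


median_estimate = estimate_median(mean=27, mode=30)
print(median_estimate, skew_direction(27, median_estimate))
```

The three functions are rearrangements of one equation, so feeding the output of one into another should return the starting value.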
(ii) The median is a reference for commenting on skewness
In using the obtained averages to comment generally on the
skewness of some given data, the median is used as a reference. In doing so
it takes the place of zero or the origin on the number line, such that a value
that is located above it is said to be on the higher side and vice-versa.
For example, for a positive skewness where the mean, median and
mode are given as 27.6, 26.03 and 23.31 units, the related comments on
the skewness may be as follows.
(i) The mean of the data used is slightly on the higher side, because its
value (27.6) is slightly greater than that of the median of 26.3 units.
(ii) Using the mode value of 23.31 it is evident that most of the values
used are on the lower side, because the mode value is less than that
of the median.
The interpretation of this, in terms of salary, would be that most of the
workers are underpaid, but on the average they are overpaid. This is
attributed to a few of the workers being slightly overpaid. Draw a
skewness sketch that clearly communicates these conclusions.
b) The mode
The mode is mainly used to indicate the value or class that occurs most
frequently. It is not based on all the values in an observation and it is
not affected by extreme values. It can also be obtained graphically. But
it is not capable of further mathematical treatment apart from the
estimation of the mean or the median, given the other averages.
It should be noted that the quartiles, deciles, and percentiles are
mainly used in dividing the data into the required portions.
Statistics like Kurtosis are also defined in terms of quartiles and
percentiles, as discussed earlier under measures of shape.
c) Skewness and kurtosis
These are measures of shape that are used to describe collectively, or in
summary, the relative locations of the averages involved. Skewness
indicates the locations of the mean, median and mode, while
Kurtosis has to do with the semi-inter-quartile and percentile ranges.
For example, negatively skewed data has the mode on the higher side
and the mean (average) on the lower side, and vice-versa. On the other
hand, leptokurtic data has shorter ranges than platykurtic data.
Exercise
(a) List the merits of using skewness to give a summary
description of data over the use of the related averages.
(b) Describe wages or salaries that have a mesokurtic shape.
(c) Describe the locations of the averages associated with
skewness. Use a sketch to show their locations.
A SUMMARY OF MEASURES OF CENTRAL TENDENCY
The summary in tabular form is as shown below.
Category                            | List of the 4 categories           | Examples of computations
Most popular data representative    | Arithmetic mean and other means    | $\bar{x}$ = 2.5, h m = 1.92, g m ≈ 2.2134
Positional measure                  |                                    | $\tilde{x} = L_1 + \left(\frac{n/2 - (\sum f)_1}{f_{med}}\right) c$, for grouped data cases
Data partitioner                    | Quartiles, Deciles and Percentiles | $Q_1 = P_{25}$. Using $P_i = X^*_{i \cdot n}$: $Q_1 = X^*_{0.25 \times 4} = X^*_1$; $(X^*_1 + X^*_2)/2 = 1.5$