
Chapter two

Examples
Example one
Example 2
 Twenty pre-school children had their Hb (haemoglobin) level tested. Their
results are shown below. Construct an ungrouped frequency distribution.

11 15
10 7
9 8
16 6
15 12
9 15
12 14
13 16
14 13
7 10
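A minimal Python sketch of the tally, assuming the 20 values are read row by row from the two columns above. An ungrouped frequency distribution simply lists each distinct value with the number of times it occurs.

```python
from collections import Counter

# The 20 Hb values from the two columns above, read row by row
hb = [11, 15, 10, 7, 9, 8, 16, 6, 15, 12,
      9, 15, 12, 14, 13, 16, 14, 13, 7, 10]

# Ungrouped frequency distribution: each distinct value and its count
freq = Counter(hb)
for value in sorted(freq):
    print(value, freq[value])
```

The counts should add up to 20, which is a quick check that no value was missed.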
Example 3
The blood glucose level, in milligrams per deciliter, for 30
patients is shown below. Construct a frequency distribution
for the data set, using six classes.
55 115 111
63 97 90
84 81 82
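The listing above shows only 9 of the 30 glucose values, so the sketch below is only an illustration of the procedure and uses just those 9 values. The class-width rule is an assumption: width = range // k + 1, which is the smallest whole-number width for which k classes starting at the minimum cover every value up to the maximum. The function name `grouped_frequency` is hypothetical.

```python
def grouped_frequency(data, k):
    """Split whole-number data into k equal-width classes starting at min(data).

    Assumed rule: width = range // k + 1, the smallest whole-number width
    for which k classes reach from the minimum past the maximum.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) // k + 1
    table = []
    for i in range(k):
        lower = lo + i * width
        upper = lower + width - 1               # whole-number class limits
        count = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, count))
    return table

# Only the 9 values printed above; the full data set has 30
shown = [55, 115, 111, 63, 97, 90, 84, 81, 82]
for lower, upper, count in grouped_frequency(shown, 6):
    print(f"{lower}-{upper}: {count}")
```

With all 30 values the minimum, maximum, and therefore the class limits may differ.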
Time series graph

Examples
US labor force

Year    1960  1970  1980  1990  2000  2008
Women     11    10     8     9    11    15
Men       34    28    18    15    17    22
Chapter 3

Examples
The mean
Mean of grouped data
The median
The mode in grouped data
 Mode ≈ Mean - 3(Mean - Median)   (an empirical relationship; equivalently, Mode ≈ 3 Median - 2 Mean)
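A short sketch of the empirical relationship, using hypothetical summary values (mean 50, median 48) that are not from the text. It also checks that the two algebraic forms agree. The relation is only an approximation, intended for moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Approximate mode from the empirical relation Mode = Mean - 3(Mean - Median)."""
    return mean - 3 * (mean - median)

# Hypothetical summary values, for illustration only
mean, median = 50.0, 48.0
print(empirical_mode(mean, median))   # 44.0
print(3 * median - 2 * mean)          # 44.0, the equivalent form
```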
Chapter 4

Examples
Range

 The range for a set of data items is the difference between the
largest and smallest values.
 Although the range is the easiest of the numerical measures
of variability to compute, it is not widely used because it is
based on only two of the items in the data set and thus is
influenced too much by extreme data values.
 Range = max - min
Interquartile Range

 A form of the range that avoids the dependence on extreme
values in the data set is the interquartile range (IQR), or Q-spread.
 This descriptive measure of variability is simply the
difference between the third quartile (the 75th percentile data item)
and the first quartile (the 25th percentile data item).
 In effect, it is showing the range for the middle 50% of the
data and, as such, is not affected by the extreme values in the
data set.
 IQR = Q3-Q1

 Position of Q1 in the ordered data = ¼ N
 Position of Q3 in the ordered data = ¾ N
Example one

I. The following are 25 final averages in a math class:


46 64 72 79 89
49 66 74 79 91
53 66 75 80 94
60 67 76 83 95
61 71 79 88 98
•What is the range?
•What is the interquartile range?
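A minimal sketch for this example, reading Q1 = ¼N and Q3 = ¾N as positions in the ordered data and rounding a fractional position up to the next item (the "nearest-rank" convention). That rounding rule is an assumption: textbooks also use interpolation or the medians of the two halves, which give slightly different quartiles. The helper name `nearest_rank` is illustrative.

```python
import math

scores = [46, 64, 72, 79, 89,
          49, 66, 74, 79, 91,
          53, 66, 75, 80, 94,
          60, 67, 76, 83, 95,
          61, 71, 79, 88, 98]

def nearest_rank(data, p):
    """Value at position ceil(p * N) of the ordered data (positions start at 1)."""
    ordered = sorted(data)
    pos = math.ceil(p * len(ordered))
    return ordered[pos - 1]

data_range = max(scores) - min(scores)   # largest minus smallest
q1 = nearest_rank(scores, 0.25)          # position ceil(6.25) = 7
q3 = nearest_rank(scores, 0.75)          # position ceil(18.75) = 19
iqr = q3 - q1
print(data_range, q1, q3, iqr)           # 52 66 83 17
```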
Average Absolute Deviation from the Mean
 Obviously, there are limitations in using range or interquartile range as
measures of variability.
 It would seem reasonable that any useful measure of variability should
measure the spread around the mean, since the mean is the “balance
point” of a distribution.
 If you find the difference between each data item and the mean, you
will get negative values for items that are less than the mean and
positive values for items greater than the mean.
 If you then sum up all of these differences, you will get zero; this
illustrates a special property of the mean.
 However, by taking the absolute value of each difference, you will get
the distance of each item from the mean, and the sum of these
distances would measure the total spread around the mean.
 If you were to include more data items, equally spread around
the mean, you would increase the total of the distances even
though the new distribution might be less variable.
 Therefore, it is important to divide the total absolute deviation
by the number of data items; this will give an average absolute
deviation from the mean.
 Average Absolute Deviation = X X
N
 This average absolute deviation gives the average distance of any
data item from the mean and thus is a good measure of spread.
Example 2

Given the following data: 5, 7, 11, 12, 13, 18.


•What is the average absolute deviation from the
mean?
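A small Python sketch of the calculation for the data 5, 7, 11, 12, 13, 18. It also confirms the property mentioned above: the signed differences from the mean sum to zero, so the absolute values are needed.

```python
data = [5, 7, 11, 12, 13, 18]
n = len(data)
mean = sum(data) / n                   # 66 / 6 = 11.0

signed = [x - mean for x in data]      # -6, -4, 0, 1, 2, 7
abs_dev = [abs(d) for d in signed]     # distances from the mean

print(sum(signed))                     # 0.0: the mean is the balance point
aad = sum(abs_dev) / n                 # total distance divided by N
print(aad)                             # 20 / 6, about 3.33
```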
Standard Deviation
 If you were to calculate the average absolute deviation of a distribution
using a value other than the mean, you could possibly get a smaller
average absolute deviation.
 This result is one of the reasons that the average absolute deviation is
not the best measure of variability.
 Instead, calculate the average of the squared differences from the
mean; this is the variance of a distribution.
 If you were to calculate the average of the squared differences of a
distribution by using a value other than the mean, you would always
get a larger value.
 The mean is the one number that minimizes the average of the squared
differences in a distribution.
 Variance = $\dfrac{\sum (X - \bar{X})^2}{N}$
Example 3

Given the following data: 5, 7, 11, 12, 13, 18.


•What is the variance?
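A minimal sketch that computes the variance (average squared difference from the mean, dividing by N) and checks the claim that any other center gives a larger average squared difference. The helper `mean_squared_diff` is illustrative. `statistics.pvariance` is the standard-library population variance, which also divides by N.

```python
import statistics

data = [5, 7, 11, 12, 13, 18]
n = len(data)
mean = sum(data) / n                          # 11.0
squared = [(x - mean) ** 2 for x in data]     # 36, 16, 0, 1, 4, 49
variance = sum(squared) / n                   # 106 / 6, about 17.67

def mean_squared_diff(values, center):
    """Average squared difference from any chosen center."""
    return sum((x - center) ** 2 for x in values) / len(values)

print(variance)
print(mean_squared_diff(data, 12) > variance)   # a center other than the mean gives more
print(statistics.pvariance(data))               # same population variance
```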
 There are still two slight inconveniences in using variance as our
measure of variability.
 First, variance does not give an estimate of the distance of a typical
data item from the mean; squaring the deviations puts it on a different scale, usually making it much larger.
 Second, if the data items have a unit of measurement associated with
them, then the variance would not have the same unit of
measurement; it would have square units.
 By taking the square root of variance, we get standard deviation, which
is the measure of variability that we want.

 Standard Deviation = $\sqrt{\dfrac{\sum (X - \bar{X})^2}{N}}$
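The step from variance to standard deviation is a single square root. A minimal sketch with the same six data values, using the standard-library population functions (which divide by N, matching the formula above):

```python
import math
import statistics

data = [5, 7, 11, 12, 13, 18]
variance = statistics.pvariance(data)   # average squared deviation, in squared units
std_dev = math.sqrt(variance)           # back to the original units
print(round(variance, 2), round(std_dev, 2))   # 17.67 4.2
```

`statistics.pstdev(data)` returns the same standard deviation directly.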
The two commonly used indicators of variability are the
variance and the standard deviation.
•Higher values for both of these indicators indicate a
larger amount of variability than do lower numbers.
• Zero stands for no variability at all (e.g., for the data 3,
3, 3, 3, 3, 3, the variance and standard deviation will
equal zero).
•When you have no variability, the numbers are a
constant (i.e., the same number).
•The variance tells you (exactly) the average squared
deviation from the mean, in "squared units."
•The standard deviation is just the square root of the
variance (i.e., it brings the "squared units" back to
regular units).
•The standard deviation tells you (approximately) how
far the numbers tend to vary from the mean. (If the
standard deviation is 7, then the numbers tend to be
about 7 units from the mean. If the standard deviation
is 1500, then the numbers tend to be about 1500
units from the mean.)
 If data are normally distributed, then an easy rule to apply to the
data is what we call “the 68, 95, 99.7 percent rule.” That is . . .

 Approximately 68% of the cases will fall within one standard
deviation of the mean.

 Approximately 95% of the cases will fall within two standard
deviations of the mean.

 Approximately 99.7% of the cases will fall within three standard
deviations of the mean.
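The three percentages can be checked against the standard normal curve. A short sketch using Python's `statistics.NormalDist`, which computes the area to the left of a value:

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1
for k in (1, 2, 3):
    inside = std_normal.cdf(k) - std_normal.cdf(-k)   # area within k SDs of the mean
    print(k, round(inside * 100, 2))
```

The exact areas are about 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7.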
Measures of Relative Deviation
 When the deviation of observations within a series is to be
measured, the standard deviation is the best measure.
 But the size of the standard deviation depends upon the size
of the mean as well as the unit of measurement of
observation.
 Hence to compare the variations of two or more variables
which are in different units as well as with marked
differences in the size of the means, comparison with
standard deviation is not suitable.
 As an example, the haemoglobin level and the body weight of a
group of students have different means and are measured in
different units.
 Haemoglobin level is expressed in gm %, while body weight is
expressed in kilograms. Further, the mean haemoglobin level is a
small number while the mean body weight is a large one, and the
sizes of the standard deviations will also differ.
 In order to compare the variability of such variables, the
standard deviation is expressed as a percentage of the
mean value; this quantity is known as the Coefficient of
Variation.
 It has no unit; it is expressed as a percentage.
 Coefficient of Variation = (Standard Deviation / Mean) × 100
Example 5
 The mean and standard deviation of the haemoglobin level of
a group is 12.6 gm % and 1.5 gm% respectively while the
mean and standard deviation of the body weight of the same
group is 50 kg and 2.2 kg respectively.
 To compare the variability of these two sets of observations,
the coefficient of variation is calculated for each.
 From these values it can be seen that the variation is greater
for haemoglobin level than for body weight of the group,
although the absolute value of the standard deviation was higher
for body weight.
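The two coefficients can be computed directly from the figures in Example 5. A minimal sketch (the function name is illustrative); it gives roughly 11.9% for haemoglobin and 4.4% for body weight, consistent with the conclusion above.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (SD / mean) x 100. The units cancel."""
    return sd / mean * 100

cv_hb = coefficient_of_variation(1.5, 12.6)    # haemoglobin, gm %
cv_weight = coefficient_of_variation(2.2, 50)  # body weight, kg
print(f"Hb CV = {cv_hb:.2f}%, weight CV = {cv_weight:.2f}%")
```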
Identifying outliers
Example
 Check the following data set for outliers.
 5, 6, 12, 13, 15, 18, 22, 50
Chapter 7

examples
Normal distribution
Example
 Find the area under the standard normal distribution for
each of the following, as a percentage.
 Between Z = 0 and Z = 1.5
 Between Z=0 and Z=-2
 To the right of Z = 2
 To the left of Z = 2
 Between Z= 1.5 and 2.5
 Between Z= - 1.5 and -2.5
 Between Z = 1.5 and – 1.5
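These areas can be checked without a printed table. A sketch using `statistics.NormalDist().cdf`, which gives the area to the left of z; every requested area is a difference of two such values. Results may differ from a four-decimal table in the last digit.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # area to the left of z under the standard normal curve

areas = {
    "0 to 1.5": phi(1.5) - phi(0),
    "0 to -2": phi(0) - phi(-2),
    "right of 2": 1 - phi(2),
    "left of 2": phi(2),
    "1.5 to 2.5": phi(2.5) - phi(1.5),
    "-2.5 to -1.5": phi(-1.5) - phi(-2.5),
    "-1.5 to 1.5": phi(1.5) - phi(-1.5),
}
for label, area in areas.items():
    print(f"{label}: {area * 100:.2f}%")
```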
Z score
 Using the data presented in the table (mean 70.07, SD 10.27), find the
percentage of students whose scores fall between the mean and 85.
 (1) Convert 85 to a Z score:
Z = (85-70.07)/10.27 = 1.45
 (2) Look up the Z score (1.45) in Column A, finding the
proportion (.4265).
 (3) Convert the proportion (.4265) to a percentage (42.65%); this is the
percentage of students scoring between the mean and 85 in the course.
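The three steps can be sketched in Python. As an assumption, Z is rounded to two decimals to mimic a table lookup, and `statistics.NormalDist` stands in for the printed table; the function name is illustrative.

```python
from statistics import NormalDist

def pct_between_mean_and(x, mean, sd):
    """Percentage of a normal distribution lying between the mean and x."""
    z = round((x - mean) / sd, 2)            # step 1, rounded like a table entry
    area = abs(NormalDist().cdf(z) - 0.5)    # step 2: area between Z = 0 and z
    return z, round(area * 100, 2)           # step 3: convert to a percentage

print(pct_between_mean_and(85, 70.07, 10.27))   # (1.45, 42.65)
```

Because of the `abs()`, the same helper also works for scores below the mean.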
Finding the Area Between the Mean and a
Negative Z Score

 Using the data presented in Table 10.1, find the percentage of
students scoring between 65 and the mean (70.07).
 (1) Convert 65 to a Z score:
Z = (65-70.07)/10.27=-.49
 (2) Because the curve is symmetric, the area between the mean and
Z = -0.49 equals the area between the mean and Z = +0.49, so look up
.49 in the standard normal table; the proportion is .1879.
 (3) Convert the proportion (.1879) to a percentage (18.79%); this is the
percentage of students scoring between 65 and the mean (70.07).
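A short check of the symmetry argument, using `statistics.NormalDist` in place of the table (Z is rounded to two decimals as in step 1). The area between -0.49 and 0 equals the area between 0 and +0.49.

```python
from statistics import NormalDist

phi = NormalDist().cdf
z = round((65 - 70.07) / 10.27, 2)   # -0.49
left_side = phi(0) - phi(z)          # area between z and the mean
mirror = phi(-z) - phi(0)            # area between the mean and +0.49
print(z, round(left_side, 4), round(mirror, 4))
```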
Example 1
 One student has their exam result in mathematics and a second
student has their exam result in English.
 The second student has a higher mark than the first student;
however, given that the exam marks for English and mathematics
have different distributions, it is not possible to say that the second
student has gained a higher achievement.
 In order to make a judgment as to whether the second student has
done better than the first, we need to judge their mark according
to the mean and standard deviation of each set of marks.
 For each value, in this case a student’s exam mark, a Z score
expresses how far that mark is from the mean exam mark in
units of standard deviation.
 The formula for calculating Z scores is:
$$z_i = \frac{x_i - \bar{x}}{s}$$
 Where:
 $z_i$ = individual Z score
 $x_i$ = individual observed value, for example, exam mark
 $\bar{x}$ = mean for the set of data
 $s$ = standard deviation.
 A positive Z score means that the observed data is above the
mean. A negative Z score means that the observed data is
below the mean.
 Student One:
 Mathematics exam mark of 60%. Mean 50%. Standard
deviation = 5.6

 Student Two:
 English exam mark of 70%. Mean 66%. Standard deviation =
10.5
 Student One’s mathematics exam mark converted into a Z
score:
$$z = \frac{60 - 50}{5.6} \approx 1.79$$
 Student Two’s English exam mark converted into a Z score:
$$z = \frac{70 - 66}{10.5} \approx 0.38$$
 Both students have a positive Z score, which means that they
both did above average in their respective exams. Student
One has a higher Z score than Student Two. Although Student
Two gained the higher exam mark, Student One actually did
better in relation to the other students sitting the exam in
mathematics.
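A minimal sketch of the comparison, applying the Z-score formula to both students (the function name is illustrative):

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard-deviation units."""
    return (x - mean) / sd

z_one = z_score(60, 50, 5.6)    # Student One, mathematics
z_two = z_score(70, 66, 10.5)   # Student Two, English
print(round(z_one, 2), round(z_two, 2))
```

Student One's Z score (about 1.79) is well above Student Two's (about 0.38), even though Student Two's raw mark is higher.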
Example 2

 The average pregnancy lasts 266 days, with a standard
deviation of 16 days.
 Laura gave birth after 273 days
 Let’s convert this to a Z-score:
$$z_i = \frac{x_i - \bar{x}}{s}$$
 X = 273, mean = 266, SD = 16
$$z = \frac{273 - 266}{16} = \frac{7}{16} = +0.4375$$
 Laura’s pregnancy was longer than average, which resulted in
a POSITIVE Z-score.

Example
 Converting a Z-score to a “raw” score:
 The length of Ellen’s pregnancy results in a Z-score of –1.25
 How many days was she pregnant?

$$X = \bar{x} + z s$$
 Z = -1.25, mean = 266, SD = 16
 X = 266 + (-1.25)(16)
 X = 246 days
 Ellen’s pregnancy was shorter than average. This was expected as
her Z-score was NEGATIVE
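The two conversions are inverses of each other: raw score to Z divides the distance from the mean by the SD, and Z to raw score multiplies back and adds the mean. A minimal sketch with the pregnancy figures (helper names are illustrative only):

```python
def to_z(x, mean, sd):
    """Raw score to Z score."""
    return (x - mean) / sd

def to_raw(z, mean, sd):
    """Z score back to a raw score: X = mean + z * SD."""
    return mean + z * sd

MEAN, SD = 266, 16                 # pregnancy length in days
print(to_z(273, MEAN, SD))         # Laura: 0.4375
print(to_raw(-1.25, MEAN, SD))     # Ellen: 246.0
```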
Example 3

 The average time it takes for a certain pain reliever to begin
to reduce symptoms is 30 minutes, with a standard deviation of
4 minutes. Assume the variable is normally distributed. If 40
patients are randomly selected, approximately how many will
experience pain relief in less than 25 minutes?
Solution

 Let’s convert this to a Z-score:
$$z_i = \frac{x_i - \bar{x}}{s} = \frac{25 - 30}{4} = -1.25$$
 Look up the table: the area between Z = 0 and Z = -1.25 is 0.3944,
so the area to the left of Z = -1.25 (less than 25 minutes) is
0.5 - 0.3944 = 0.1056.
 Therefore about 0.1056 × 40 = 4.2, i.e. approximately 4 patients
will have their pain reduced in less than 25 minutes.
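A quick check of this calculation with `statistics.NormalDist`, modelling relief time as normal with mean 30 and SD 4. "Less than 25 minutes" is the whole area to the left of 25, not the area between 25 and the mean.

```python
from statistics import NormalDist

relief = NormalDist(mu=30, sigma=4)   # minutes until symptoms start to ease
p_under_25 = relief.cdf(25)           # area to the LEFT of z = -1.25
expected = p_under_25 * 40            # out of 40 randomly selected patients
print(round(p_under_25, 4), round(expected, 1))
```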
Determining the normality
 A normally shaped or bell-shaped distribution is only one of many
shapes that a distribution can assume; however, it is very important
since many statistical methods require that the distribution of values
(shown in subsequent chapters) be normally or approximately
normally shaped.
 There are several ways statisticians check for normality. The easiest
way is to draw a histogram for the data and check its shape.
 If the histogram is not approximately bell shaped, then the data are not
normally distributed.
 Skewness can be checked by using the Pearson coefficient of skewness
(PC), also called Pearson’s index of skewness. The formula is
$$PC = \frac{3(\bar{X} - \text{median})}{s}$$
where $s$ is the standard deviation.
 If the index is greater than or equal to 1 or less than or equal
to -1, it can be concluded that the data are significantly
skewed.
 In addition, the data should be checked for outliers by using
the formula for detecting the outliers.
 A survey of 18 high-technology firms showed the number of
days’ inventory they had on hand. Determine if the data are
approximately normally distributed.
 5 29 34 44 45 63 68 74 74 81 88 91 97 98 113 118 151 158
 4

 Since the histogram is approximately bell-shaped, we can say that the distribution is
approximately normal.
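The histogram is only one of the checks described above. A sketch of the other two for the inventory data, assuming PC = 3(mean - median)/s with the sample standard deviation, and a 1.5 × IQR outlier rule with quartiles taken as medians of the lower and upper halves (an assumed convention):

```python
import statistics

days = [5, 29, 34, 44, 45, 63, 68, 74, 74, 81, 88, 91, 97, 98, 113, 118, 151, 158]

mean = statistics.mean(days)        # 79.5
med = statistics.median(days)       # 77.5
s = statistics.stdev(days)          # sample SD (divides by n - 1), about 40.45
pc = 3 * (mean - med) / s           # Pearson coefficient of skewness
print("PC =", round(pc, 2))         # between -1 and 1: not significantly skewed

# Outlier fences at 1.5 x IQR beyond the quartiles
ordered = sorted(days)
n = len(ordered)
q1 = statistics.median(ordered[: n // 2])
q3 = statistics.median(ordered[(n + 1) // 2 :])
iqr = q3 - q1
outliers = [x for x in ordered if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print("Q1 =", q1, "Q3 =", q3, "outliers:", outliers)
```

Under these assumptions PC is about 0.15 and no value falls outside the fences, which supports the conclusion drawn from the histogram.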
Chapter 8

Examples
Example 1
Example 2
Z test
Steps for solving hypothesis testing
using z test
Example one
t Test for a Mean
 When the population standard deviation is unknown, the z test is not normally used for
testing hypotheses involving means. A different test, called the t test, is used.
 The distribution of the variable should be approximately normal.
 The t distribution is similar to the standard normal distribution in the
following ways.
 1. It is bell-shaped.
 2. It is symmetric about the mean.
 3. The mean, median, and mode are equal to 0 and are located at the center of the
distribution.
 4. The curve never touches the x axis.

 The t distribution differs from the standard normal distribution in the
following ways.
 1. The variance is greater than 1.
 2. The t distribution is a family of curves based on the degrees of freedom, which is a number
related to sample size.
 3. As the sample size increases, the t distribution approaches the normal distribution.
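Points 1 and 3 can be illustrated numerically. For more than 2 degrees of freedom the variance of the t distribution is df / (df - 2), which is always above 1 and shrinks toward 1 (the variance of the standard normal) as df grows. The sketch assumes df = n - 1, the usual value for a t test on a single mean; the function name is illustrative.

```python
def t_variance(df):
    """Variance of Student's t distribution, finite only for df > 2."""
    if df <= 2:
        raise ValueError("variance is finite only for df > 2")
    return df / (df - 2)

for n in (5, 10, 30, 100):
    df = n - 1                     # t test on a single mean: df = n - 1
    print(n, df, round(t_variance(df), 4))
```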
Example 1
Example 3
