
Statistics 3

DR Taher

Measures of central tendency

S  Measures of central tendency provide information


about typical or average values of a data set.
There are many such measures but we will
consider only the mean, median, and mode as
these are most commonly used.
Mean

S  MEAN Among the three measures of central tendency, the


mean is the most popular and widely used. It is sometimes
called the arithmetic mean.

S  If we compute the mean of the population, we call it the


parametric or population mean, denoted by μ (read “mu”).

S  If we get the mean of the sample, we call it the sample mean
and it is denoted by (read “x bar”).
Mean
Question

S  Scores of 15 students in community medicine


quiz consist of 25 items. The highest score is 25
and the lowest score is 10. Here are the scores:
25, 20, 18, 18, 17, 15, 15, 15, 14, 14, 13, 12, 12,
10, 10. Find the mean in the following scores
Answer

S  Σ x = 25+ 20+ 18+18+17+15+15+ 15+ 14+14+ 13+ 12+


12+ 10+10 = 288

S  N=15

S  Mean = 288/15

= 15.2
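The same arithmetic can be checked in a few lines of Python. This is only a sketch of the calculation for the quiz scores listed in the question; the variable names are illustrative.

```python
# Quiz scores of the 15 students, as listed in the question
scores = [25, 20, 18, 18, 17, 15, 15, 15, 14, 14, 13, 12, 12, 10, 10]

total = sum(scores)   # Σx, the sum of all scores
n = len(scores)       # N, the number of students
mean = total / n      # arithmetic mean

print(total, n, mean)  # 228 15 15.2
```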
Mean

S  Advantages:-
S  Easy to calculate

S  It is unique in that a data set has one and only one mean.

S  Its value is influenced by all observations in the data set.

S  This is usually the best choice to describe data unless there is an
outlier.

S  Disadvantages:-
S  It affects by the outlying values

S  Can not be used in qualitative variables


outliers

S  Example:

S  Find the mean for the following data

2,5,6,8,11,14

Mean= 46/6 = 7.7

S  If we substitute 14 by 100 to the previous data find the mean


2,5,6,8,11,100

Mean = 132/6 = 22
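A short Python sketch of this example shows how a single outlying value pulls the mean. The helper name `arithmetic_mean` is illustrative and not part of the slides.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

original = [2, 5, 6, 8, 11, 14]
with_outlier = [2, 5, 6, 8, 11, 100]   # 14 replaced by 100

mean_original = arithmetic_mean(original)
mean_with_outlier = arithmetic_mean(with_outlier)

print(round(mean_original, 1))   # 7.7
print(mean_with_outlier)         # 22.0
```

Changing one observation out of six roughly triples the mean, which is why the mean is a poor summary when outliers are present.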
Median

S  Median:- the median is the value that divides a dataset into
two equal parts so that the number of values that are greater
than or equal to the median is equal to the number of values
that are less than or equal to the median.

S  Median is what divides the data in the distribution into two
equal parts.

S  It is also known as the middle value or the 50th percentile.

S  Fifty percent (50%) lies below the median value and 50% lies
above the median value.
Median

S  Middlemost or most central item in the set of ordered


numbers; it separates the distribution into two equal halves

S  If odd n, middle value of sequence

S  if X = [1,2,4,6,9,10,12,14,17]

S  then 9 is the median


If even n, average of 2 middle values

S  if X = [1,2,4,6,9,10,11,12,14,17]
then 9.5 is the median; i.e., (9+10)/2
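The odd and even cases can be written as one small function. This is a minimal sketch of the rule described here; the function name `median` is chosen for illustration (Python's standard library offers `statistics.median` with the same behaviour for these inputs).

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

odd_case = median([1, 2, 4, 6, 9, 10, 12, 14, 17])
even_case = median([1, 2, 4, 6, 9, 10, 11, 12, 14, 17])

print(odd_case)   # 9
print(even_case)  # 9.5
```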
Median

S  Advantages :- Median is not affected by extreme values.


Uniqueness. Suitable for ordinal data, open ended tables,
and when data is not normally distributed.

S  Disadvantage:- Doesn’t take care of all variables. Not


suitable for symmetrical data.
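To illustrate the first advantage, the sketch below reuses the outlier example from the mean slides (2, 5, 6, 8, 11, 14, with 14 replaced by 100) and computes the median with Python's standard `statistics.median`.

```python
from statistics import median

original = [2, 5, 6, 8, 11, 14]
with_outlier = [2, 5, 6, 8, 11, 100]   # 14 replaced by 100

median_original = median(original)          # (6 + 8) / 2
median_with_outlier = median(with_outlier)  # still (6 + 8) / 2

print(median_original)       # 7.0
print(median_with_outlier)   # 7.0
```

The extreme value changes the largest observation only, so the two middle values, and therefore the median, stay the same.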
Mode

S  The mode is the most frequently occurring number in a


distribution

S  if X = [1,2,4,7,7,7,8,10,12,14,17]

then 7 is the mode

S  Easy to see in a simple frequency distribution

S  Mode is not affected by extreme values

S  This is usually the best choice to describe data if you want to
select the most popular value or item.
Mode

S  It is classified as unimodal, bimodal, mulitimodal

S  Unimodal is a distribution of scores that consists of only one


mode.

S  Bimodal is a distribution of scores that consists of two modes

S  Mulitimodal is a distribution of scores that consists of more


than two modes
Mode

S  Example 1: what is the mode for the following data 1, 3, 3,


3, 4, 4, 6, 6, 6, 9

S  There are two modes: 3 and 6 (Bimodal)

S  Example 2: what is the mode for the following data 1, 3, 3,


3, 4, 4, 6, 6, 6, 9,10,10,10
S  There are 3 modes: 3,6 and 10 (multimodal)
Mode

S  When all values appear the same number of times the idea
of a mode is not useful. But we could group them to see if
one group has more than the others.
Mode

S  Example: {4, 7, 11, 16, 20, 22, 25, 26, 33}

S  Each value occurs once, We can try groups of 10:

S  0-9: 2 values (4 and 7)

S  10-19: 2 values (11 and 16)

S  20-29: 4 values (20, 22, 25 and 26)

S  30-39: 1 value (33)

S  In groups of 10, the "20s" appear most often, so we could choose 25 as
the mode.
S  You could use different groupings and get a different answer!
Quartile

S  A statistical term describing a division of observations into four


defined intervals based upon the values of the data and how they
compare to the entire set of observations.

S  The first quartile Q1 is the number below which lies the 25 percent of
the bottom data.

S  The second quartile Q2(the median) divides the range in the middle
and has 50 percent of the data below it.

S  The third quartile Q3 has 75 percent of the data below it and the top
25 percent of the data above it.
Quartile

S  Q1= n+1/4

S  Q3= 3(n+1)/4

S  Example: 5, 8, 4, 4, 6, 3, 8

S  Put them in order: 3, 4, 4, 5, 6, 8, 8

S  Q1= 7+1/4 = 8/4 = 2

S  Q1 is the second number 3, 4, 4, 5, 6, 8, 8

S  Q1= 4
Measures of variation (Dispersion)

S  Dispersion :

How tightly clustered or how variable


the values are in a data set.

S  Example

S  Data set 1:

S  [0,25,50,75,100]

S  Data set 2: [48,49,50,51,52]

S  Both have a mean of 50, but data set


1 clearly has greater Variability than
data set 2.
Range

S  The simplest of our methods for measuring dispersion is


range.  Range is the difference between the largest value and
the smallest value in the data set.  While being simple to
compute, the range is often unreliable as a measure of
dispersion since it is based on only two values in the set. 

S  RANGE = (X largest – X smallest)


Range

S  Example:

S  Data set 1: [1,25,50,75,100]; R: 100-1 = 99

S  Data set 2: [48,49,50,51,52]; R: 52-48= 4

S  The range ignores how data are distributed and only takes
the extreme scores into account
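The sketch below illustrates that last point. Data set 2 is taken from the slide; the second list is a hypothetical data set invented here so that the two can be compared.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

data_set_2 = [48, 49, 50, 51, 52]   # from the slide
clustered = [48, 48, 48, 48, 52]    # hypothetical: same extremes, different spread

range_2 = value_range(data_set_2)
range_clustered = value_range(clustered)

print(range_2)          # 4
print(range_clustered)  # 4
```

Both lists have a range of 4 even though their values are distributed very differently between the extremes.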
Dispersion: Interquartile Range

S  Difference between third & first quartiles • Interquartile


Range = Q3 - Q1

S  Spread in middle 50%

S  Not affected by extreme values


Variance and standard deviation
Standard deviation

S  The standard deviation enables us to determine, with a great


deal of accuracy, where the values of a frequency
distribution are located in relation to the mean
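As an illustration, the sketch below computes the standard deviation of the two data sets used earlier to introduce dispersion. The slides do not state a formula, so this assumes the usual sample formula: the square root of the sum of squared deviations from the mean, divided by n - 1. The function name `sample_sd` is illustrative.

```python
import math

def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator; an assumption here)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)   # sample variance
    return math.sqrt(variance)

data_set_1 = [0, 25, 50, 75, 100]
data_set_2 = [48, 49, 50, 51, 52]

sd_1 = sample_sd(data_set_1)
sd_2 = sample_sd(data_set_2)

print(round(sd_1, 2))  # 39.53
print(round(sd_2, 2))  # 1.58
```

Both data sets have a mean of 50, but the standard deviation of the first is about 25 times larger, reflecting how much further its values sit from the mean.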
Dispersion: Standard Deviation
Normal Distribution

S  Many human traits, such as intelligence, personality, and


attitudes, also, the weight and height, are distributed among
the populations in a fairly normal way
Characteristics of a Normal
Distribution

S  The shape of distribution resemble a bell figure

S  The normal distribution is symmetrical about its mean

S  Mean= Median= Mode

S  The highest point is at its mean

S  The height of the curve decreases as one moves away from
the mean in either direction, approaching, but never
reaching zero
Characteristics of a Normal
Distribution

S  More properties of normal curves Empirical rule

S  About 68.3% of the area under a normal curve is within one
standard deviation (SD) of the mean

S  About 95.5% is within two SDs

S  About 99.7% is within three SDs


Standard normal curve

S  There is only one standardized normal curve

S  The curve is smooth, bell shaped, perfectly symmetrical curve

S  The total area under the curve =1

S  The mean= zero. standard deviation=1

S  The mean, median, and the mode are all coincide

S  The distance of value x from the mean of curve in units of standard
deviation is called relative deviate or standard normal vitiate “Z”
score
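In symbols, Z = (x - mean) / SD. A minimal sketch follows; the numbers in the usage line (a value of 130 from a distribution with mean 100 and SD 15) are hypothetical and chosen only to make the arithmetic easy.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd

# Hypothetical values for illustration only
z = z_score(130, mean=100, sd=15)
print(z)  # 2.0
```

A Z score of 2.0 means the value lies two standard deviations above the mean; a negative Z score would place it below the mean.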
Symmetry: Skew

S  Skewness is a measure of the asymmetry of the probability


distribution. Roughly speaking, a distribution has positive
skew (right- skewed) if the right (higher value) tail is longer
and negative skew (left-skewed) if the left (lower value) tail
is longer (confusing the two is a common error).
Symmetrical vs. Skewed Frequency
Distributions

S  Symmetrical distribution:

S  Approximately equal numbers of observations above and below


the middle
S  Skewed distribution:

S  One side is more spread out that the other, like a tail

S  Direction of the skew:

S  Positive or negative (right or left)


Symmetrical vs. Skewed
Skewed Frequency Distributions

S  Negatively skewed :

Skewed left

Tail trails to the left

Mode > Median > Mean

S  Positively skewed :

Skewed right

Tail trails to the right

Mean > Median > Mode
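The two orderings can be seen on small data sets. Both lists below are hypothetical examples made up for this sketch, and the ordering is a rule of thumb that usually, but not always, holds for distributions with a single mode.

```python
from statistics import mean, median, mode

# Hypothetical data with a long tail to the right (positively skewed)
right_skewed = [1, 2, 2, 2, 3, 4, 5, 9, 17]
# Its mirror image, with a long tail to the left (negatively skewed)
left_skewed = [1, 9, 13, 14, 15, 16, 16, 16, 17]

right = (mean(right_skewed), median(right_skewed), mode(right_skewed))
left = (mean(left_skewed), median(left_skewed), mode(left_skewed))

print(right)  # (5, 3, 2)     mean > median > mode
print(left)   # (13, 15, 16)  mode > median > mean
```

In each case the mean is pulled furthest toward the tail, the mode stays at the peak, and the median falls between them.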


Relation between mean, median and
mode
