MATH30-6
Probability and Statistics
Objectives
At the end of the lesson, the students are expected to:
• Define and differentiate the various measures for describing data;
• Describe a given set of data using these measures; and
• Interpret the values that arise from computation.
Measures of Describing Data
• Measure of Central Tendency
- Also known as Measure of Center, Measure of Central
Location
- Locates the center of the dataset using the mean, median, or mode
- The midrange is rarely used. It is calculated by adding
the highest data value to the lowest data value and
dividing the sum by 2.
• Measure of Position
- Measure that locates the kth element of the distribution
- Also called the quantiles or fractiles of the distribution
• Measure of Variation
- Measure of how the data is distributed about the
mean.
• Measure of Shape
- Measure of the degree of symmetry of a distribution.
The Mean
• The most widely used measure for describing ratio-level data.
• May be classified as
- Arithmetic mean
- Weighted mean
- Geometric mean
- Harmonic mean
- Trimmed mean
- Quadratic or Root Mean Square (RMS)
Arithmetic Mean
For Discrete Case
Sample mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Population mean
$$\mu = \frac{\sum X}{N}$$
Characteristics
• All values are used.
• It is unique.
• The arithmetic mean is the only measure of central
tendency where the sum of the deviations of each
value from the mean is zero.
• It is calculated by summing the values and dividing by
the number of values.
• Every set of interval-level and ratio-level data has a
mean.
• The mean is affected by unusually large or small data
values.
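Two of the characteristics above can be checked numerically. The sketch below uses a small hypothetical sample (not taken from the lesson) to show that the deviations from the mean sum to zero and that one unusually large value pulls the mean upward.

```python
from statistics import mean

values = [2, 4, 6, 8, 10]            # hypothetical sample, chosen for illustration
x_bar = mean(values)                 # arithmetic mean of the sample

# Deviations of each value from the mean; these always sum to zero.
deviations = [x - x_bar for x in values]
deviation_sum = sum(deviations)

# Adding one unusually large value shifts the mean noticeably.
with_outlier = values + [100]
mean_with_outlier = mean(with_outlier)

print(x_bar, deviation_sum, round(mean_with_outlier, 2))
```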
Arithmetic Mean
6-1/205 Will the sample mean always correspond to one
of the observations in the sample?
Weighted Mean
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n}$$
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
Example:
1. The Carter Construction Company pays its hourly
employees $16.50, $19.00, or $25.00 per hour. There
are 26 hourly employees: 14 are paid at the $16.50 rate,
10 at the $19.00 rate, and 2 at the $25.00 rate. What is
the mean hourly rate paid to the 26 employees?
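The Carter Construction example can be worked as a weighted mean, with the hourly rates as the values and the numbers of employees as the weights. The variable names below are illustrative only.

```python
# Hourly rates (values) and number of employees paid each rate (weights).
rates = [16.50, 19.00, 25.00]
counts = [14, 10, 2]

# Weighted mean: sum of w_i * x_i divided by the sum of the weights.
weighted_total = sum(w * x for w, x in zip(counts, rates))
weighted_mean = weighted_total / sum(counts)

print(round(weighted_mean, 2))  # 18.12
```

The unweighted average of the three rates would be about $20.17, which overstates the typical pay because only 2 of the 26 employees earn the top rate.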
The Median
• The midpoint of the values after they have been
ordered from the smallest to largest
• There are as many values above the median as below it
in the data array.
• For an even number of values, the median is the
arithmetic average of the two middle values.
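Sample median (values ordered from smallest to largest)
$$\tilde{x} = \begin{cases} x_{(n+1)/2}, & \text{if } n \text{ is odd,} \\ \dfrac{x_{n/2} + x_{n/2+1}}{2}, & \text{if } n \text{ is even.} \end{cases}$$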
Characteristics
• There is a unique median for each data set.
• It is not affected by extremely large or small values and
is therefore a valuable measure of central tendency
when such values occur.
• It can be computed for ratio-level, interval-level, and
ordinal-level data.
• It can be computed for an open-ended frequency
distribution if the median does not lie in an open-
ended class.
Example:
1. Find the median of
1.8, 2.1, 1.7, 1.6, 0.9, 2.7, and 1.8
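A short sketch of this example: the data are sorted, and since n = 7 is odd, the median is the value in position (n + 1)/2 = 4. The result is cross-checked against Python's standard `statistics.median`.

```python
from statistics import median

data = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]

ordered = sorted(data)               # [0.9, 1.6, 1.7, 1.8, 1.8, 2.1, 2.7]
n = len(ordered)

# n is odd here, so the median is the ((n + 1) / 2)-th ordered value.
position = (n + 1) // 2              # 1-based position: 4
middle_value = ordered[position - 1]

print(middle_value, median(data))    # 1.8 1.8
```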
The Mode
• The value of the observation that appears most
frequently
Characteristics
• Used to find the most frequently occurring score
• A quick approximation of the average
• An inspection average
• The least reliable of the three measures because it is
undefined for some data sets
• The only measure of central location that can be used
for nominal data
• Usually used in polls
• A distribution with two modes is bimodal; with three,
trimodal. In general, it is multimodal.
Example:
1. In a certain poll, responses were coded as follows:
1 = Yes, 2 = No, 0 = Undecided. What is the modal choice?
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 2
$$\hat{x} = 1 \text{ (Yes)}$$
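The modal choice in the poll can be found by tallying the coded responses. The `labels` dictionary simply restates the coding given in the example.

```python
from collections import Counter

# Coded poll responses: 1 = Yes, 2 = No, 0 = Undecided.
responses = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
             0, 0, 0, 0,
             2, 2, 2, 2, 2]
labels = {1: "Yes", 2: "No", 0: "Undecided"}

counts = Counter(responses)                      # frequency of each code
modal_code, modal_count = counts.most_common(1)[0]

print(labels[modal_code], modal_count)           # Yes 12
```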
Measures of Location
• Quantiles (or Fractiles) are points taken at regular
intervals from the cumulative distribution function of a
random variable.
• Dividing ordered data into q essentially equal-sized
data subsets is the motivation for q-quantiles; the
quantiles are the data values marking the boundaries
between consecutive subsets.
• There are q − 1 of the q-quantiles, one for each integer
k satisfying 0 < k < q.
Quartiles
• Dividing the dataset into 4 groups.
Deciles
• Dividing the dataset into 10 groups.
Percentiles
• Dividing the dataset into 100 groups.
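As a rough illustration of q − 1 cut points for q groups, the sketch below uses `statistics.quantiles` from the Python standard library on hypothetical data (the integers 1 to 20, not from the lesson). Its default interpolation rule (`method="exclusive"`) is only one of several conventions, so other software may report slightly different values.

```python
from statistics import quantiles

data = list(range(1, 21))            # hypothetical ordered data: 1, 2, ..., 20

quartile_points = quantiles(data, n=4)      # 3 cut points -> 4 groups
decile_points = quantiles(data, n=10)       # 9 cut points -> 10 groups
percentile_points = quantiles(data, n=100)  # 99 cut points -> 100 groups

print(len(quartile_points), len(decile_points), len(percentile_points))
print(quartile_points[1])            # second quartile, i.e. the median
```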
Quartile
• Any of the three fractiles obtained by dividing the set
of data into four equal parts
Program       Q1      Q2      Q3
STATDISK      4.5     12.5    24.5
Minitab       3.75    12.5    26.25
Excel         5.25    12.5    22.75
TI-83 Plus    4.5     12.5    24.5
Algorithm used in the QUARTILE() function in Excel
1. Find the kth smallest member in the array of values,
where:
k = (quart/4)*(n − 1) + 1
If k is not an integer, truncate it but store the fractional
portion (f) for use in step 3.
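Only step 1 of the algorithm is shown above. The sketch below assumes the remaining steps take the next larger value and interpolate linearly between the kth and (k+1)th smallest values using the stored fraction f; this is an assumption about the missing steps, not something stated in the lesson. The function name `excel_style_quartile` and the test data are hypothetical.

```python
import math

def excel_style_quartile(values, quart):
    """Quartile via k = (quart/4)*(n - 1) + 1 with linear interpolation (assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    k = (quart / 4) * (n - 1) + 1    # step 1: 1-based position, may be fractional
    k_int = math.floor(k)            # truncated position
    f = k - k_int                    # fractional portion kept for interpolation
    lower = ordered[k_int - 1]       # kth smallest value
    if f == 0:
        return lower                 # k is an integer, no interpolation needed
    upper = ordered[k_int]           # (k+1)th smallest value (assumed step 2)
    return lower + f * (upper - lower)   # assumed step 3

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical data
q1, q2, q3 = (excel_style_quartile(data, q) for q in (1, 2, 3))
print(q1, q2, q3)                        # 3.25 5.5 7.75
```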
Range
Grades in Statistics
Males          Females
Jon    100     Ann    84
Ron     65     Ria    86
Dan     75     Let    85
Tom     85     Bel    82
Bob     95     Nel    83
Range   35     Range   4
Conclusion: The grades of the males are more scattered
(range 35), while the grades of the females are more
tightly clustered (range 4). The females' grades are more
homogeneous.
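The two ranges in the grades example are simply the largest value minus the smallest value in each group, as this short check shows.

```python
# Grades in Statistics from the example.
males = [100, 65, 75, 85, 95]
females = [84, 86, 85, 82, 83]

# Range = highest value - lowest value.
range_males = max(males) - min(males)
range_females = max(females) - min(females)

print(range_males, range_females)    # 35 4
```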
• Sample variance (s²), computational form
$$s^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}$$
• Sample standard deviation (s)
- Positive square root of s²
$$s = \sqrt{s^2}$$
The quantity n − 1 is often called the degrees of freedom
associated with the variance estimate.
Variance and Standard Deviation
• Population variance (σ²)
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
$$\sigma = \sqrt{\sigma^2}$$
Variance
Determine the variance in the previous example, treating
the data first as a sample and then as a population.
Grades in Statistics
Males          Females
Jon    100     Ann    84
Ron     65     Ria    86
Dan     75     Let    85
Tom     85     Bel    82
Bob     95     Nel    83
x̄       84     x̄      84
Males
$$s^2 = \frac{(100-84)^2 + (65-84)^2 + (75-84)^2 + (85-84)^2 + (95-84)^2}{5-1} = 205$$
$$\sigma^2 = \frac{(100-84)^2 + (65-84)^2 + (75-84)^2 + (85-84)^2 + (95-84)^2}{5} = 164$$
Females
$$s^2 = \frac{(84-84)^2 + (86-84)^2 + (85-84)^2 + (82-84)^2 + (83-84)^2}{5-1} = 2.5$$
$$\sigma^2 = \frac{(84-84)^2 + (86-84)^2 + (85-84)^2 + (82-84)^2 + (83-84)^2}{5} = 2$$
Conclusion: The males' grades showed more variability. The
higher the variance, the more spread out the values are
about the mean.
Standard deviations
Males: s = 14.3178, σ = 12.8062
Females: s = 1.5811, σ = 1.4142
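The variances and standard deviations for the grades example can be reproduced with Python's `statistics` module, which offers separate sample (`variance`, `stdev`) and population (`pvariance`, `pstdev`) functions. This is only a cross-check of the hand computation.

```python
from statistics import pstdev, pvariance, stdev, variance

males = [100, 65, 75, 85, 95]
females = [84, 86, 85, 82, 83]

# Sample versions divide by n - 1; population versions divide by N.
s2_males, sigma2_males = variance(males), pvariance(males)
s2_females, sigma2_females = variance(females), pvariance(females)

print(s2_males, sigma2_males)        # sample and population variance, males
print(s2_females, sigma2_females)    # sample and population variance, females
print(round(stdev(males), 4), round(pstdev(males), 4))
print(round(stdev(females), 4), round(pstdev(females), 4))
```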
Mean Absolute Deviation
$$\text{MAD} = \frac{\sum_{i=1}^{n} \left| x_i - \bar{x} \right|}{n}$$
Measures of Variation
Example:
$$\bar{x}_A = \frac{12 + 6 + 13 + 2 + 5 + 0 + 9 + 6 + 10 + 7}{10} = 7$$
$$\bar{x}_B = \frac{8 + 10 + 9 + 12 + 5 + 1 + 4 + 7 + 9 + 3}{10} = 6.8$$
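The question posed for data sets A and B is not shown in the example. Assuming it asks for a comparison of variability using the mean absolute deviation, a minimal sketch is given below; the function name `mean_absolute_deviation` is illustrative.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Average absolute distance of the values from their arithmetic mean."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)

data_a = [12, 6, 13, 2, 5, 0, 9, 6, 10, 7]
data_b = [8, 10, 9, 12, 5, 1, 4, 7, 9, 3]

mad_a = mean_absolute_deviation(data_a)
mad_b = mean_absolute_deviation(data_b)

print(round(mad_a, 2), round(mad_b, 2))  # 3.2 2.84
```

Under this assumption, data set A is slightly more variable than data set B even though the two means are close.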
• Kurtosis
- The degree of peakedness exhibited by the distribution
Skewness
Pearson's Coefficient of Skewness for a sample (due to Karl
Pearson), using the mode
$$Sk_1 = \frac{\bar{x} - \hat{x}}{s}$$
Interpretation of values:
1. Sk < 0, “negatively skewed” or “skewed to the left”
2. Sk = 0, symmetrical
3. Sk > 0, “positively skewed” or “skewed to the right”
Pearson's Coefficient of Skewness for a sample (due to Karl
Pearson), using the median
$$Sk_2 = \frac{3\left(\bar{x} - \tilde{x}\right)}{s}$$
Interpretation of values:
1. Sk < 0, “negatively skewed” or “skewed to the left”
2. Sk = 0, symmetrical
3. Sk > 0, “positively skewed” or “skewed to the right”
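Both coefficients can be illustrated with data set A from the Measures of Variation example (mean 7, unique mode 6). The function names are illustrative, and `statistics.mode` is only meaningful here because the data have a single most frequent value.

```python
from statistics import mean, median, mode, stdev

def pearson_skewness_mode(values):
    """Sk1 = (mean - mode) / s, using the sample standard deviation."""
    return (mean(values) - mode(values)) / stdev(values)

def pearson_skewness_median(values):
    """Sk2 = 3 * (mean - median) / s, using the sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

data_a = [12, 6, 13, 2, 5, 0, 9, 6, 10, 7]

sk1 = pearson_skewness_mode(data_a)
sk2 = pearson_skewness_median(data_a)

# Both values are positive, so the data are skewed to the right.
print(round(sk1, 4), round(sk2, 4))
```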
Skewness
• A measure of the asymmetry of the frequency distribution
[Figure: leptokurtic, mesokurtic, and platykurtic distribution curves]
KURT() Function in Excel
Relative Kurtosis
$$K = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$
Interpretation of values:
1. K < 0, “platykurtic” or “relatively flat”
2. K = 0, “mesokurtic” or having the same kurtosis as the
Normal Distribution. K is measured relative to the Normal
Distribution, whose kurtosis of 3 has already been subtracted.
3. K > 0, “leptokurtic” or “relatively peaked”
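The relative kurtosis formula can be sketched directly in code. The function name `relative_kurtosis` and the sample data are hypothetical (the data are not from the lesson), and s is taken to be the sample standard deviation.

```python
from statistics import mean, stdev

def relative_kurtosis(values):
    """Relative (excess) kurtosis K computed from the formula above."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    x_bar = mean(values)
    s = stdev(values)                          # sample standard deviation
    fourth_powers = sum(((x - x_bar) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_powers - correction

sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]        # hypothetical data
k = relative_kurtosis(sample)

# K is slightly negative here, so this sample is mildly platykurtic.
print(round(k, 4))                             # -0.1518
```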
Summary
• The measures of central tendency are mean, median,
and mode. Midrange is rarely used. Midrange is found
by adding the highest data value to the lowest data
value and dividing the sum by 2.
• Different types of means (arithmetic, weighted,
geometric, harmonic, etc.) are computed depending on
the nature of data.
• The measures of location are quartiles, deciles, and
percentiles.
• The measures of variation tell us about how the data is
distributed about the mean.
• The measures of shape refer to either skewness or
kurtosis.
References
• Montgomery and Runger. Applied Statistics and Probability for Engineers, 6th Ed. © 2014
• Microsoft® Excel
• Walpole, et al. Probability and Statistics for Engineers and Scientists, 9th Ed. © 2012, 2007, 2002
• http://irving.vassar.edu/faculty/wl/econ209/dessript.pdf
• http://www.preciousheart.net/chaplaincy/Auditor_Manual/10descsd.pdf