Dr. Ghamsary
Chapter 2
Page 1
Elementary Statistics
M. Ghamsary, Ph.D. Chapter 02
Descriptive Statistics
Grouped vs Ungrouped Data
Ungrouped data: have not been summarized in any way; they are also called raw data.
Grouped data: have been organized into a frequency distribution.
Raw Data: When data are collected in original form, they are called raw data.
The following are the scores on the first test of the statistics class in fall of 2004.
76 62 68 69 79 90 79 86 52 97
78 55 96 89 73 66 88 92 94 50
71 89 78 88 58 76 59 92 93 88
86 66 81 85 85 70 55 62 80 60
80 72 82 86 99 63 75 83 78 61
Grouped Data: When the raw data are organized into a frequency distribution, they are called grouped data.
Frequency Distribution: is the organization of raw data in table form, using classes and frequencies.
Class     Frequency
50-59     6
60-69     9
70-79     12
80-89     15
90-99     8
Class: The number of classes in the above table is 5.
Class Limits: represent the smallest and largest data values in each class.
Lower Class Limit: the lowest number in each class. In the above table, 50 is the lower class limit of the first class, 60 is the lower class limit of the 2nd class, etc.
Upper Class Limit: the highest number in each class. In the above table, 59 is the upper class limit of the first class, 69 is the upper class limit of the 2nd class, etc.
Class Width: for a class in a frequency distribution, it is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class. In the above table the class width is 60 - 50 = 10.
Class Boundaries: are used to separate the classes so that there are no gaps in the frequency distribution.

Class     Class Boundaries   Frequency
50-59     49.5-59.5          6
60-69     59.5-69.5          9
70-79     69.5-79.5          12
80-89     79.5-89.5          15
90-99     89.5-99.5          8
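As a rough illustration, the frequency distribution and class boundaries can be rebuilt from the raw scores with a few lines of Python. This is only a sketch: the function name `frequency_table` and its parameters are made up for this example, and the class limits (first lower limit 50, width 10, five classes) are taken from the table in these notes.

```python
# Test 1 scores from the raw-data listing (50 values).
scores = [
    76, 62, 68, 69, 79, 90, 79, 86, 52, 97,
    78, 55, 96, 89, 73, 66, 88, 92, 94, 50,
    71, 89, 78, 88, 58, 76, 59, 92, 93, 88,
    86, 66, 81, 85, 85, 70, 55, 62, 80, 60,
    80, 72, 82, 86, 99, 63, 75, 83, 78, 61,
]


def frequency_table(data, first_lower=50, width=10, n_classes=5):
    """Return rows of (lower limit, upper limit, lower boundary, upper boundary, frequency)."""
    rows = []
    for k in range(n_classes):
        lower = first_lower + k * width
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        # Boundaries sit half a unit outside the limits, so adjacent classes touch.
        rows.append((lower, upper, lower - 0.5, upper + 0.5, freq))
    return rows


table = frequency_table(scores)
for lower, upper, lb, ub, freq in table:
    print(f"{lower}-{upper}   {lb}-{ub}   {freq}")
```

The half-unit offset for the boundaries assumes the scores are whole numbers, as they are here.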
1. The Histogram
o Making decisions about a process, product, or procedure that could be improved after examining the variation. (Example: Should the school invest in a computer-based tutoring program for low-achieving students in Algebra I after examining the grade distribution? Are more of the out-of-specification shafts too big rather than too small?)
o Displaying the variation in the process in an easy way. (Example: Which units are causing the most difficulty for students? Is the variation in a process due to parts that are too long or parts that are too short?)
[Figure: Histogram of Test1 with fitted normal curve. x-axis: Test1; y-axis: Frequency. Mean = 76.8, StDev = 12.98, N = 50]
[Figure: plot of Percent versus Test1]
[Figure: Scatterplot of f vs x. x-axis: Midpoints x; y-axis: f]
[Figure: Scatterplot of Cumulative f vs x. y-axis: Cumulative f]
[Figure: bar chart of Frequency by Grade]
5. Pie Chart
o A pie chart is a way of summarizing a set of categorical data or displaying the different values of a given variable (example: percentage distribution).
o Pie charts usually show the component parts of a whole. Often a segment of the drawing is separated from the rest of the pie in order to emphasize an important piece of information.
Pie chart of Grade (count and percent of the 50 students):

Grade   Count   Percent
A       8       16.0%
B       15      30.0%
C       12      24.0%
D       9       18.0%
F       6       12.0%
6. Pareto Charts
A Pareto chart is used to graphically summarize and display the relative importance of the differences between groups of data.
[Figure: Pareto chart. y-axis: Frequency]
7. Dot plot
A dot plot displays each data value as a dot above its position on a number line, so that repeated values stack up and the shape of the distribution can be seen.
[Figure: Dotplot of Test1. x-axis: Test1]
8. Stem-Leaf
o The stem-and-leaf plot summarizes the shape of a set of data (the distribution) and provides extra detail regarding individual values.
o Stem-and-leaf plots are usually used when there are large amounts of numbers to analyze. Series of scores on sports teams, series of temperatures or rainfall over a period of time, and series of classroom test scores are examples of when stem-and-leaf plots could be used.

Stem   Leaf
5      025589
6      012236689
7      012356688899
8      001235566688899
9      02234679
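The stem-and-leaf display for the Test 1 scores can be reproduced with a short Python sketch. The helper name `stem_and_leaf` is hypothetical, and the code assumes two-digit whole-number scores, so the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict

# Test 1 scores from the raw-data listing (50 values).
scores = [
    76, 62, 68, 69, 79, 90, 79, 86, 52, 97,
    78, 55, 96, 89, 73, 66, 88, 92, 94, 50,
    71, 89, 78, 88, 58, 76, 59, 92, 93, 88,
    86, 66, 81, 85, 85, 70, 55, 62, 80, 60,
    80, 72, 82, 86, 99, 63, 75, 83, 78, 61,
]


def stem_and_leaf(data):
    """Map each stem (tens digit) to a string of its sorted leaves (ones digits)."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(str(x % 10))
    return {stem: "".join(digits) for stem, digits in sorted(leaves.items())}


plot = stem_and_leaf(scores)
for stem, leaf in plot.items():
    print(stem, "|", leaf)
```

The number of leaves on each stem equals the class frequency, which is why the display doubles as a sideways histogram.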
Types of Distributions:
There are several different kinds of distributions, but the following are the most commonly used in statistics.
o Symmetric, normal, or bell-shaped
o Positively skewed, right-tailed, or skewed to the right side
o Negatively skewed, left-tailed, or skewed to the left side
o Uniform
[Figure: histogram of a symmetric (bell-shaped) distribution]
[Figure: histogram of a positively skewed distribution]
[Figure: histogram of a negatively skewed distribution]
[Figure: histogram of a uniform distribution]
Data set (Sex: 1 = Female, 0 = Male)

Students 1-25
Test1:  76 62 68 69 79 90 79 86 52 97 78 55 96 89 73 66 88 92 94 50 71 89 78 88 58
Sex:    1  1  1  1  0  0  1  1  0  1  1  1  1  1  0  0  1  0  1  1  0  0  1  0  1
Grade:  C  D  D  D  C  A  C  B  F  A  C  F  A  B  C  D  B  A  A  F  C  B  C  B  F

Students 26-50
Test1:  76 59 92 93 88 86 66 81 85 85 70 55 62 80 60 80 72 82 86 99 63 75 83 78 61
Sex:    1  1  1  1  0  0  0  1  0  0  1  1  1  1  1  1  1  0  1  1  1  1  1  0  1
Grade:  C  F  A  A  B  B  D  B  B  B  C  F  D  B  D  B  C  B  B  A  D  C  B  C  D
[Figures: charts of Count and Percent by Grade (F, D, C, B, A) and by Sex (Male, Female)]
[Figure: plot of Test1]
Numerical measurements:
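Mean:
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, or simply $\bar{x} = \frac{\sum x}{n}$.
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, or simply $\mu = \frac{\sum x}{N}$.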
Note: The sample mean, $\bar{x}$, is an unbiased estimate of the population mean, $\mu$.
Example1: Find the mean of 10, 7, 3, 12, 18.
$\bar{x} = \frac{10 + 7 + 3 + 12 + 18}{5} = \frac{50}{5} = 10$
Example2: Find the mean of 10, 7, 3, 12, 18, 13, 17, 15, 25, 30.
$\bar{x} = \frac{10 + 7 + 3 + 12 + 18 + 13 + 17 + 15 + 25 + 30}{10} = \frac{150}{10} = 15$
Example3: Find the mean of the scores on test #1, 2004, in the data set in this chapter.
$\bar{x} = \frac{76 + 62 + \cdots + 78 + 61}{50} = \frac{3840}{50} = 76.8$
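The first two examples can be checked with a small Python sketch. The function name `mean` is just an illustrative helper, and Example 2 uses the ten values that appear in its sum (ending in 30).

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


example1 = [10, 7, 3, 12, 18]
example2 = [10, 7, 3, 12, 18, 13, 17, 15, 25, 30]

print(mean(example1))
print(mean(example2))
```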
Median: is defined to be the midpoint of the data set that is arranged from smallest to largest.
Example4: Find the median of 10, 7, 3, 12, 15.
Solution: First we must sort the data set as follows: 3, 7, 10, 12, 15. The median is 10.
Mode: is defined to be the value in the data set that occurs most frequently.
Example7A: Find the mode of 10, 7, 3, 12, 15, 3. Mode is 3.
Example7B: Find the mode of 10, 7, 3, 10, 15, 3. Modes are 3 and 10.
Example7C: Find the mode of 10, 7, 3, 10, 10, 3. Mode is 10.
Example7D: Find the mode of 10, 7, 3, 10, 7, 3. There is no mode, since all values occur with the same frequency.
Example7E: Find the mode of 10, 7, 3, 12, 15, 18. There is no mode, since no value occurs more than once.
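The median and mode rules used in these examples can be sketched in Python as follows. The helper names `median` and `modes` are illustrative. Two conventions are assumptions here: for an even number of values the median is taken as the average of the two middle values, and "no mode" is reported as an empty list whenever every distinct value occurs equally often, which is how Examples 7D and 7E are treated.

```python
from collections import Counter


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(median([10, 7, 3, 12, 15]))
print(modes([10, 7, 3, 10, 15, 3]))
```

Note that Python's built-in `statistics.multimode` would return every value for Example 7D rather than "no mode", so a custom helper is used to match the convention of these notes.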
Example 8: Find the mean, the median, and the mode of the data set:
10, 17, 13, 12, 15, 18, 10, 17, 14, 16, 35, 28, 22, 17, 23, 12, 15, 28, 10, 20
Solution: First we must sort the data set:
10, 10, 10, 12, 12, 13, 14, 15, 15, 16, 17, 17, 17, 18, 20, 22, 23, 28, 28, 35
o Mean: $\bar{x} = \frac{10 + 10 + 10 + 12 + \cdots + 28 + 28 + 35}{20} = \frac{352}{20} = 17.6$
o Median: there are 20 values, so the median is the average of the 10th and 11th sorted values, (16 + 17)/2 = 16.5
o Mode: 10 and 17 (each occurs three times)
Example 9: Find the mean, the median, and the mode of data set:
25, 42, 18, 37, 25, 18, 40, 57, 64, 66, 85, 86, 92, 85, 88, 92, 67, 33, 75, 85, 48, 60, 80, 60, 50
Example10: Find the mean, the median, and the mode of data set:
12.37, 13.33, 32.67, 12.37, 26.45
Solution: First we need to find the class marks (midpoints) and then we use the following formula:
$\bar{x} = \frac{\sum [x \cdot f]}{n}$,
where
x: is the midpoint or class mark,
f: is the frequency, and
n: is the number of data points.

Class     Frequency f   Class mark x   x.f
50-59     6             54.5           327
60-69     9             64.5           580.5
70-79     12            74.5           894
80-89     15            84.5           1267.5
90-99     8             94.5           756
          n = Σf = 50                  Σ x.f = 3825

So the mean is
$\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$
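The grouped-data mean above can be checked with a brief Python sketch. The name `grouped_mean` and the tuple layout (lower limit, upper limit, frequency) are choices made for this example only.

```python
# (lower class limit, upper class limit, frequency) for each class
classes = [(50, 59, 6), (60, 69, 9), (70, 79, 12), (80, 89, 15), (90, 99, 8)]


def grouped_mean(rows):
    """Mean of grouped data: sum of (class mark * frequency) divided by n."""
    n = sum(f for _, _, f in rows)
    total = sum((lo + hi) / 2 * f for lo, hi, f in rows)  # class mark = midpoint
    return total / n


print(grouped_mean(classes))
```

The grouped mean (76.5) differs slightly from the raw-data mean (76.8) because every score in a class is treated as if it were equal to the class mark.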
Frequency 4 10 12 20 8 6
Weighted Average (Mean): The formula above is also called the weighted average or weighted mean:
$\bar{x} = \frac{\sum [w \cdot x]}{\sum w}$
Example12: Find the GPA of John, who has the following courses with the corresponding units and grades.
English   5 units with the grade of A
Math      3 units with the grade of F
Spanish   2 units with the grade of D
Solution: In this problem, x will be the value of the grades and w is the number of units,
$\bar{x} = \frac{\sum [w \cdot x]}{\sum w}$
Example13: A teacher is teaching 3 classes. There are 30 students in the first class, with an average of 70 on the final exam. The second class has 40 students with an average of 60 on the final exam. The 3rd class has 20 students with an average of 80 on the final exam. Find the weighted average of the three classes combined.
Solution: Let x be the class average and w be the number of students.
$\bar{x} = \frac{\sum [w \cdot x]}{\sum w} = \frac{30(70) + 40(60) + 20(80)}{30 + 40 + 20} = \frac{6100}{90} \approx 67.8$
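Both weighted-average examples can be worked through with a short Python sketch. The helper `weighted_mean` is illustrative. For the GPA example, the grade values are an assumption: the usual 4-point scale (A = 4, B = 3, C = 2, D = 1, F = 0) is used, since the notes do not state the scale.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 12: GPA, ASSUMING a 4-point scale (not stated in the notes).
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
grades = ["A", "F", "D"]   # English, Math, Spanish
units = [5, 3, 2]
gpa = weighted_mean([grade_points[g] for g in grades], units)

# Example 13: class averages weighted by class size.
combined = weighted_mean([70, 60, 80], [30, 40, 20])

print(round(gpa, 2), round(combined, 1))
```

Under the assumed scale, the GPA works out to (5·4 + 3·0 + 2·1)/10 = 2.2.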
Measures of Variation
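o Range
o Variance
o Standard Deviation
The Range: is defined to be the highest value minus the lowest value in the data set.
The Variance: is defined by the following:
Sample (sample variance): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, or $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Population (population variance): $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, or $\sigma^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N}$
The Standard Deviation: is the positive square root of the variance:
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$, and
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$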
Example14A: Find the range, variance, and the standard deviation of the following data set.
3, 0, 7, 5, 15.
Solution:
o Range: Largest - Smallest = 15 - 0 = 15
o Variance: the mean is $\bar{x} = \frac{3 + 0 + 7 + 5 + 15}{5} = 6$, so
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(3-6)^2 + (0-6)^2 + (7-6)^2 + (5-6)^2 + (15-6)^2}{5 - 1} = \frac{9 + 36 + 1 + 1 + 81}{5 - 1}$

x     x - x̄          (x - x̄)²
3     3 - 6 = -3     9
0     0 - 6 = -6     36
7     7 - 6 = 1      1
5     5 - 6 = -1     1
15    15 - 6 = 9     81
      Σ(x - x̄) = 0   Σ(x - x̄)² = 128

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{128}{5 - 1} = \frac{128}{4} = 32$
o Standard deviation: As we know, the standard deviation is the positive square root of the variance.
standard deviation = $\sqrt{\text{Variance}} = \sqrt{32} \approx 5.66$
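Using the second formula, $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$:
$\sum x = 3 + 0 + 7 + 5 + 15 = 30$ and $\sum x^2 = 9 + 0 + 49 + 25 + 225 = 308$,
then we have
$s^2 = \frac{308 - \frac{(30)^2}{5}}{5 - 1} = \frac{308 - 180}{4} = \frac{128}{4} = 32$,
the same as above.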
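A small Python sketch can confirm that the definition and the shortcut formula give the same sample variance for Example 14A. The two function names are illustrative only.

```python
import math


def sample_variance(values):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut: (sum of x^2 - (sum of x)^2 / n) divided by n - 1."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)


data = [3, 0, 7, 5, 15]
value_range = max(data) - min(data)
variance = sample_variance(data)
std_dev = math.sqrt(variance)

print(value_range, variance, round(std_dev, 2))
```

The shortcut avoids computing each deviation, which is convenient by hand, although the definition form is usually the numerically safer choice on a computer.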
----------------------------------------------------------------------------------
Example14B: Find the range, variance, and the standard deviation of the following data set.
10, 17, 13, 12, 15, 18, 10, 17, 14, 16, 28, 22, 17, 23, 12, 15, 28, 10, 20, 35
Solution:
Example15A: Find the standard deviation for the following grouped data.

Class       50-59   60-69   70-79   80-89   90-99
Frequency   6       9       12      15      8

Solution: First we modify the above formula for the variance. We need to find the class marks (midpoints) and then we use the following formula:
$s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f}{n - 1}$, or $s^2 = \frac{\sum x^2 f - \frac{(\sum x f)^2}{n}}{n - 1}$,
where
$\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$

f     x      x.f      (x - x̄)²   (x - x̄)².f
6     54.5   327      484        2904
9     64.5   580.5    144        1296
12    74.5   894      4          48
15    84.5   1267.5   64         960
8     94.5   756      324        2592
n = Σf = 50   Σ x.f = 3825       Σ (x - x̄)².f = 7800
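After substitution in $s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f}{n - 1}$ we get
$s^2 = \frac{7800}{49} = 159.18$ and $s = \sqrt{159.18} \approx 12.6$.

Using the second formula, $s^2 = \frac{\sum x^2 f - \frac{(\sum x f)^2}{n}}{n - 1}$:

f     x      x.f      x².f
6     54.5   327      (54.5)².6 = 17821.5
9     64.5   580.5    (64.5)².9 = 37442.25
12    74.5   894      (74.5)².12 = 66603
15    84.5   1267.5   (84.5)².15 = 107103.75
8     94.5   756      (94.5)².8 = 71442
n = Σf = 50   Σ x.f = 3825   Σ x².f = 300412.5

$s^2 = \frac{300412.5 - \frac{(3825)^2}{50}}{50 - 1} = \frac{300412.5 - 292612.5}{49} = \frac{7800}{49} = 159.18$, and hence the standard deviation will be $s = \sqrt{159.18} \approx 12.6$, which is the same as the above result.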
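The grouped-data variance can be verified both ways with a short Python sketch. The function names are hypothetical, and the class marks and frequencies are those of Example 15A.

```python
import math

midpoints = [54.5, 64.5, 74.5, 84.5, 94.5]   # class marks x
freqs = [6, 9, 12, 15, 8]                    # frequencies f


def grouped_variance(xs, fs):
    """Definition form: sum of (x - mean)^2 * f, divided by n - 1."""
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n
    return sum((x - mean) ** 2 * f for x, f in zip(xs, fs)) / (n - 1)


def grouped_variance_shortcut(xs, fs):
    """Shortcut form: (sum of x^2 f - (sum of x f)^2 / n), divided by n - 1."""
    n = sum(fs)
    sum_xf = sum(x * f for x, f in zip(xs, fs))
    sum_x2f = sum(x * x * f for x, f in zip(xs, fs))
    return (sum_x2f - sum_xf ** 2 / n) / (n - 1)


s2 = grouped_variance(midpoints, freqs)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 1))
```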
Example15B: Find the standard deviation for the following grouped data.

Class       00-04   05-09   10-14   15-19   20-24   25-29
Frequency   4       10      12      20      8       6
Question 1. What will happen to the mean, median, mode, range, and standard deviation if we add a fixed number, c, to all values in the data set?
Answer. The mean, median, and mode will increase by c units, but the range and standard deviation will not change.
Question 2. What will happen to the mean, median, mode, range, and standard deviation if we subtract a fixed number, c, from all values in the data set?
Answer. The mean, median, and mode will decrease by c units, but the range and standard deviation will not change.
Question 3. What will happen to the mean, median, mode, range, and standard deviation if we multiply all values in the data set by a fixed number, c?
Answer. The mean, median, and mode will be multiplied by c, and so will the range and standard deviation (for a negative c, the range and standard deviation are multiplied by |c|).
Example 16:

          X      X+7          X-7         X*7
          15     15+7=22      15-7=8      15*7=105
          13     13+7=20      13-7=6      13*7=91
          15     15+7=22      15-7=8      15*7=105
          15     15+7=22      15-7=8      15*7=105
          22     22+7=29      22-7=15     22*7=154
Mean      16     16+7=23      16-7=9      16*7=112
Median    15     15+7=22      15-7=8      15*7=105
Mode      15     15+7=22      15-7=8      15*7=105
Range     9      9            9           9*7=63
Sd        3.46   3.46         3.46        3.46*7=24.22
y = ax + b
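Adding a constant and multiplying by a constant are both special cases of the linear transformation y = ax + b. The following Python sketch applies it to the data of Example 16; the helper names `summary` and `transform` are made up for this illustration.

```python
import statistics

x = [15, 13, 15, 15, 22]


def summary(values):
    """Mean, median, mode, range, and sample standard deviation (rounded to 2 places)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "sd": round(statistics.stdev(values), 2),
    }


def transform(values, a, b):
    """Apply y = a*x + b to every value."""
    return [a * v + b for v in values]


base = summary(x)
shifted = summary(transform(x, 1, 7))   # X + 7
scaled = summary(transform(x, 7, 0))    # X * 7

print(base)
print(shifted)
print(scaled)
```

Computing the standard deviation of X*7 directly gives about 24.25; the 24.22 in the table comes from multiplying the already rounded value 3.46 by 7.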
Empirical Rule
If the distribution of the data is bell-shaped or normal, then
o Approximately 68% of scores are within one standard deviation of the mean. They fall in the interval $(\bar{x} - 1s, \bar{x} + 1s)$.
o Approximately 95% of scores are within two standard deviations of the mean. They fall in the interval $(\bar{x} - 2s, \bar{x} + 2s)$.
o Approximately 99.7% of scores are within three standard deviations of the mean. They fall in the interval $(\bar{x} - 3s, \bar{x} + 3s)$.
Example17. Suppose IQ scores are normally distributed with a mean of $\mu = 100$ and a standard deviation of $\sigma = 15$. Then by the empirical rule
o Approximately 68% of scores are in the interval 100-15 to 100+15, or 85 to 115.
o Approximately 95% of scores are in the interval 100-2(15) to 100+2(15), or 70 to 130.
o Approximately 99.7% of scores are in the interval 100-3(15) to 100+3(15), or 55 to 145.
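The three intervals can be generated for any mean and standard deviation with a tiny Python sketch. The function name `empirical_intervals` is an illustrative choice, and the percentages are the approximate ones quoted by the empirical rule.

```python
def empirical_intervals(mean, sd):
    """Return {k: (lower, upper, approximate percent)} for k = 1, 2, 3 standard deviations."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}


iq = empirical_intervals(100, 15)
for k, (low, high, pct) in iq.items():
    print(f"about {pct}% of scores lie between {low} and {high}")
```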
Coefficient of Variation
The coefficient of variation is defined to be the standard deviation divided by the mean:
Coefficient of variation (CV) = $\frac{s}{\bar{x}}$.
If $\bar{x}$ is 0 or close to 0, then this measure should not be used.
Normally this measure is used when we have 2 or more groups of data with different units.
Example18.

Class     Mean   Standard deviation   CV
Class A   129    11                   11/129 = .085 or 8.5%
Class B   150    25                   25/150 = .167 or 16.7%
Class C   60     15                   15/60 = .25 or 25.0%
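The three coefficients of variation can be recomputed with a few lines of Python. The function name and the dictionary layout are illustrative only.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the mean; undefined when the mean is 0."""
    if mean == 0:
        raise ValueError("CV should not be used when the mean is 0")
    return sd / mean


# class name -> (mean, standard deviation)
classes = {"A": (129, 11), "B": (150, 25), "C": (60, 15)}
cv_percent = {
    name: round(100 * coefficient_of_variation(sd, mean), 1)
    for name, (mean, sd) in classes.items()
}
print(cv_percent)
```

Class C has the largest relative spread even though its standard deviation is smaller than that of Class B.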
Measures of Position
Standard Scores
$z = \frac{x - \bar{x}}{s}$ or $z = \frac{x - \mu}{\sigma}$
where $\bar{x}$ or $\mu$ is the mean and $s$ or $\sigma$ is the standard deviation. This value, z, measures the deviation from the mean in number of standard deviations, and it has no unit.
Example19. Suppose John is taking 3 classes with the following scores. In which class does he have the better score?

Class     Test score         Mean   Standard deviation   z
Class A   English = 145      129    11                   (145-129)/11 = 1.45
Class B   Physics = 190      150    25                   (190-150)/25 = 1.60
Class C   Statistics = 88    60     15                   (88-60)/15 = 1.87

The largest z-score is in Statistics, so relative to the rest of the class his best score is in Statistics.
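The comparison can be sketched in Python as follows; the names `z_score` and `tests` are illustrative, and each entry holds (score, class mean, class standard deviation).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd


# subject -> (John's score, class mean, class standard deviation)
tests = {
    "English": (145, 129, 11),
    "Physics": (190, 150, 25),
    "Statistics": (88, 60, 15),
}
z = {name: round(z_score(*values), 2) for name, values in tests.items()}
best = max(z, key=z.get)   # subject with the highest standard score
print(z, best)
```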
Percentiles
The percentile corresponding to a given score (X) is denoted by P and it is given by the following formula:
$P = \frac{\text{number of scores below } X}{n} \cdot 100$
Example20. John has a score of 88 in a class of 20 students. Find the percentile rank of his score.
81, 65, 75, 76, 78, 62, 63, 65, 70, 90, 61, 75, 76, 79, 58, 88, 82, 95, 90, 67.
Solution: In any problem of finding a percentile, we must first sort the data set from smallest to largest.
58, 61, 62, 63, 65, 65, 67, 70, 75, 75, 76, 76, 78, 79, 81, 82, 88, 90, 90, 95.
Sixteen of the 20 scores are below 88, so $P = \frac{16}{20} \cdot 100 = 80$.
So John's score is at the 80th percentile, which means 80% of all scores are below 88.
Step2: Compute L = p% of n, where L is the location of the score.
In this example L = 12% of 20 = 0.12(20) = 2.4, which is rounded up to 3.
Step3: Go to the sorted data set and pick the score at the 3rd position, which is 62.
It is usually written as P12 = 62.
If L is a whole number, use the average of the scores at the Lth and (L+1)th locations.
Example22. In the data set of Example 20, find the score corresponding to the 40th percentile.
Step1: Sort the data, as before:
58, 61, 62, 63, 65, 65, 67, 70, 75, 75, 76, 76, 78, 79, 81, 82, 88, 90, 90, 95
Step2: L = 40% of 20 = 0.40(20) = 8, which is a whole number, so we are going to pick the average of the 8th and 9th scores.
Step3: The 8th score is 70 and the 9th score is 75, so P40 = (70 + 75)/2 = 72.5.
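The location rule used in these percentile examples can be sketched in Python. Both function names are hypothetical. The percentile-rank helper assumes the rank is the percent of scores strictly below the given score, which is what the worked answer of 80 for a score of 88 implies; other textbooks use slightly different formulas.

```python
import math

scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]


def percentile_rank(data, x):
    """Percent of scores strictly below x (ASSUMED convention, see lead-in)."""
    below = sum(1 for v in data if v < x)
    return 100 * below / len(data)


def score_at_percentile(data, p):
    """Score at the p-th percentile using the location L = p% of n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    loc = p * len(ordered) / 100
    if loc == int(loc):
        k = int(loc)
        # Whole-number location: average the L-th and (L+1)-th scores.
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round the location up and take that score.
    return ordered[math.ceil(loc) - 1]


print(percentile_rank(scores, 88))
print(score_at_percentile(scores, 12))
print(score_at_percentile(scores, 40))
```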
Inter-Quartile Range (IQR): is the difference between the 3rd and 1st quartiles. It is denoted by IQR and it is defined by IQR = Q3 - Q1.
Outlier: An outlier is an extremely high or an extremely low data value. To check for outliers we compute Q1 - 1.5(IQR) and Q3 + 1.5(IQR). If
o the suspected score is below Q1 - 1.5(IQR), or
o the suspected score is above Q3 + 1.5(IQR),
then the score is said to be an outlier.
41, 41, 41, 44, 46, 46, 47, 48, 49, 51, 51, 51, 52, 53, 55, 55, 61, 86
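As an illustration of the outlier check, the Python sketch below applies the 1.5(IQR) rule to the eighteen values listed above. Reading the listing as one data set in sorted order is an assumption, the function names are hypothetical, and the quartiles are located with the same L = p% of n rule used for percentiles in these notes.

```python
import math

# ASSUMPTION: the eighteen listed values form a single data set.
data = [41, 41, 41, 44, 46, 46, 47, 48, 49,
        51, 51, 51, 52, 53, 55, 55, 61, 86]


def value_at(ordered, p):
    """Score at location L = p% of n (average of L-th and (L+1)-th if L is whole)."""
    loc = p * len(ordered) / 100
    if loc == int(loc):
        k = int(loc)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(loc) - 1]


def find_outliers(values):
    """Return (list of outliers, (lower fence, upper fence)) using the 1.5*IQR rule."""
    ordered = sorted(values)
    q1 = value_at(ordered, 25)
    q3 = value_at(ordered, 75)
    iqr = q3 - q1
    fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    flagged = [v for v in ordered if v < fences[0] or v > fences[1]]
    return flagged, fences


flagged, fences = find_outliers(data)
print(flagged, fences)
```

With these values Q1 = 46 and Q3 = 53, so the IQR is 7 and only scores outside 35.5 to 63.5 are flagged.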
Five commonly used Statistics: The five numbers in any data set that are used frequently are the minimum, Q1, Q2 (the median), Q3, and the maximum. For the ordered data set of Example 20 (n = 20):
1. Minimum: is 58.
2. Q1: L = 25% of 20 = 0.25(20) = 5. Since this is a whole number, we use the average of the 5th and 6th observations. In the above ordered data set the 5th score is 65 and the 6th score is 65, so their average is also 65. So Q1 = 65.
3. Q2: L = 50% of 20 = 0.50(20) = 10. Again, since this is a whole number, we use the average of the 10th and 11th observations. In the above ordered data set the 10th score is 75 and the 11th score is 76, so their average is (75+76)/2 = 75.5. So Q2 = 75.5.
4. Q3: L = 75% of 20 = 0.75(20) = 15. This is a whole number, so we use the average of the 15th and 16th observations. In the above ordered data set the 15th score is 81 and the 16th score is 82, so their average is (81+82)/2 = 81.5. So Q3 = 81.5.
5. Maximum: is 95.
So the five statistics are 58, 65, 75.5, 81.5, and 95.
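The five statistics can be computed with a short Python sketch that follows the same location rule (L = p% of n, averaging the Lth and (L+1)th scores when L is a whole number). The function names are illustrative, and the data are the 20 scores of Example 20.

```python
import math

scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]


def value_at(ordered, p):
    """Score at location L = p% of n in already sorted data."""
    loc = p * len(ordered) / 100
    if loc == int(loc):
        k = int(loc)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(loc) - 1]


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    ordered = sorted(data)
    return (
        ordered[0],
        value_at(ordered, 25),
        value_at(ordered, 50),
        value_at(ordered, 75),
        ordered[-1],
    )


summary = five_number_summary(scores)
print(summary)
```

Statistical software often uses interpolation rules for quartiles, so a package may report slightly different values for Q1 and Q3 than this hand method.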
[Figure: plot of C1, axis from 60 to 100]
Example27. For the data set below, use a computer to find the descriptive statistics and plot all appropriate charts for all variables that were discussed so far. (Sex: 1 = Female, 0 = Male)

Students 1-25
Test1:  76 62 68 69 79 90 79 86 52 97 78 55 96 89 73 66 88 92 94 50 71 89 78 88 58
Sex:    1  1  1  1  0  0  1  1  0  1  1  1  1  1  0  0  1  0  1  1  0  0  1  0  1
Grade:  C  D  D  D  C  A  C  B  F  A  C  F  A  B  C  D  B  A  A  F  C  B  C  B  F

Students 26-50
Test1:  76 59 92 93 88 86 66 81 85 85 70 55 62 80 60 80 72 82 86 99 63 75 83 78 61
Sex:    1  1  1  1  0  0  0  1  0  0  1  1  1  1  1  1  1  0  1  1  1  1  1  0  1
Grade:  C  F  A  A  B  B  D  B  B  B  C  F  D  B  D  B  C  B  B  A  D  C  B  C  D
[Figure: plot of Test1]