MANUAL
IN
STATISTICS
… statistics made simple …
18th edition
TABLE OF CONTENTS
4 Weighted Means
6 Probability
7 Normal Distribution
8 Test of Hypothesis I
9 Test of Hypothesis II
4 Weighted Means
5 Sampling
7 Probability
8 Normal Distribution
9 Estimation
10 Test of Hypothesis
11 Two-way ANOVA
Sources/ References:
Concepts, sample problems and information given in this manual were taken from the following:
2. Graduate Research Manual – Guide to thesis and Dissertations (Aquinas Graduate School)
7. Manual on Training on Microcomputer-Based for the Social Sciences (Richie Fernando Hall AdeNU,
2005)
12. http://statistics.about.com/od/Descriptive-Statistics/a/What-Is-Kurtosis.htm
Statistics Handouts
Page 4 of 92
Population vs Sample
A population is the set of all entities or elements under study. A sample is a
subset of the population.
Parameters vs Statistics
Parameters refer to the descriptive measures or characteristics of a population,
while statistics refer to the characteristics of a sample.
Census vs Survey
A census is the process of gathering information from every element of the
population, while a survey is the process of gathering information from every
element of the sample.
MEASUREMENT SCALES
a. Nominal – simply labels, names or categories. Numbers are assigned for
identification purposes only; no meaning can be attached to the
magnitude or size of such numbers. Examples are gender, civil status,
telephone numbers, etc.
b. Ordinal – whereas nominal scales only classify, ordinal scales both
classify and order the classes. Examples are job position, military
ranks, etc.
c. Interval – quantitative but has no true zero point. Examples are IQ, room
temperature, etc.
d. Ratio – quantitative and has a true zero point. Examples are number of
children, physics test scores, etc.
SUMMATION NOTATION
For a given universe, suppose we observe a variable, say X. We may denote the
first value as X1, the second as X2 and so on. In general, Xi is the observation on
variable X made on the ith individual.
Given a set of N observations or data values represented by X1, X2, …, XN, we express
their sum as
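\[ \sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N \]
\[ \sum_{i=1}^{N} cX_i = c \sum_{i=1}^{N} X_i \]
Theorem 2. If c is a constant, then
\[ \sum_{i=1}^{N} c = Nc \]
\[ \sum_{i=1}^{N} (aX_i \pm bY_i) = a \sum_{i=1}^{N} X_i \pm b \sum_{i=1}^{N} Y_i \]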
A. From all patients admitted to a hospital, the following information is collected:
1. name of patient
2. age
3. sex
4. body temperature
5. blood pressure
6. amt. of deposit
7. first time to see a doctor regarding ailment? (yes/no)
8. heartbeat per minute
9. weight
10. height
11. no. of glasses of fluid intake per day
12. no. of meals taken in a day
B. The following information is of interest for selected students of AdeNU who are cigarette
smokers.
1. age when first smoked
2. average no. of sticks consumed per day
3. main source of allowance
4. amt. of weekly allowance
5. Is your father a smoker? (yes/no)
6. occupation of father
7. brand of cigarette
8. position in the family
Data Set 1. Data on head circumference (in cm) and foot length (in cm) of 8 newborn
babies.
Baby no.                  1     2     3     4     5     6     7     8
Head circumference (x)    31.5  33    37.5  38.5  35    32    38    34
Foot length (y)           5.6   6.2   6.8   6.6   6.4   5.4   6.0   6.1
Data Set 2. Data on height (cm) and weight (lbs) of 8 stat students.
Student no. 1 2 3 4 5 6 7 8
Height(x) 168 141 165 180 165 156 150 147
Weight (y) 110 90 120 125 142 97 105 110
1. Survey Method – questions are asked to obtain information, either through self
administered questionnaire or interview (personal, telephone or internet)
4. Use of Existing Studies – e.g., census, health statistics, and weather bureau
reports
Two types:
documentary sources – published or written reports, periodicals,
unpublished documents, etc.
field sources – researchers who have done studies on the area of interest
are asked personally or directly for information needed
Advantages:
This method is appropriate only if there are few numbers to be presented.
Gives emphasis to significant figures and comparisons
Disadvantages:
It is not desirable to include a big mass of quantitative data in a “text” or
paragraph, as the presentation becomes incomprehensible.
Paragraphs can be tiresome to read especially if the same words are repeated
so many times
Advantages:
More concise than textual presentation
Easier to understand
Facilitates comparisons and analysis of relationship among different categories
Presents data in greater detail than a graph
a. Heading – consists of a table number, title and head note. The title explains
what is presented, where the data refer to and when the data apply.
b. Box Head – contains the column heads, which describe the data in each
column, together with the needed classifying and qualifying spanner heads.
c. Stub – the classifications or categories found at the left. It describes the
data found in the rows of the table.
e. Source Note – an exact citation of the source of the data presented in the table
(should always be included when the figures are not original)
Illustration:
HEADING:      Table 4.4
              Philippines Crime Volume and Rate by Type in 1991
BOXHEAD:      1991
              Type     Volume     Crime Rate
              Total    11,326     195
SOURCE NOTE
Guidelines:
Title should be concise, written in telegraphic style, not in a complete sentence.
Column labels should be precise.
Categories should not overlap.
Unit of measure must be clearly stated.
Show any relevant totals, subtotals, percentages, etc.
Indicate if the data were taken from another publication by including a source
note
Tables should be self-explanatory, although they may be accompanied by a
paragraph that will provide an interpretation or direct attention to important
figures
Advantages:
main feature and implication of a body of data can be grasped at a glance
can attract attention and hold the reader’s interest
simplifies concepts that would otherwise have been expressed in so many words
can readily clarify data, frequently bring out hidden facts and relationship
a. Line Chart – graphical presentation of data especially useful for showing trends over
a period of time.
b. Pie Chart – a circular graph that is useful in showing how a total quantity is
distributed among a group of categories. The “pieces of the pie” represent the
proportions of the total that fall into each category.
c. Bar Chart – consists of a series of rectangular bars where the length of the bar
represents the quantity or frequency for each category if the bars are arranged
horizontally. If the bars are arranged vertically, the height of the bar represents the
quantity
d. Pictorial Unit chart – a pictorial chart in which each symbol represents a definite
and uniform value
In creating a stem-and-leaf display, we divide each observation into two parts, the
stem and the leaf. For example, we could divide the observation 244 as follows:
Stem Leaf
2 ⋮ 44
Alternatively, we could choose the point of division between the units and tens,
whereby
Stem Leaf
24 ⋮ 4
The choice of the stem and leaf coding depends on the nature of the data set.
Example: Typing speeds (net words per minute) for 20 secretarial applicants
68 72 91 47
52 75 63 55
65 35 84 45
58 61 69 22
46 55 66 71
Stem Leaf
2 ⋮ 2
3 ⋮ 5
4 ⋮ 5 6 7
5 ⋮ 2 5 5 8
6 ⋮ 1 3 5 6 8 9
7 ⋮ 1 2 5
8 ⋮ 4
9 ⋮ 1
Note: The stem-and-leaf display should include a reminder indicating the units of the
data values (for example, 2 ⋮ 2 represents 22 net words per minute).
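The display above can also be produced programmatically. The following is a minimal sketch in Python, assuming the split is made between the tens and the units; the function name stem_and_leaf and the leaf_unit parameter are illustrative names, not part of the manual.

```python
from collections import defaultdict

# Typing speeds (net words per minute) of the 20 applicants in the example.
speeds = [68, 72, 91, 47, 52, 75, 63, 55, 65, 35,
          84, 45, 58, 61, 69, 22, 46, 55, 66, 71]


def stem_and_leaf(values, leaf_unit=10):
    """Split each value into a stem (value // leaf_unit) and a leaf (value % leaf_unit)."""
    groups = defaultdict(list)
    for value in sorted(values):          # sorting keeps the leaves in ascending order
        groups[value // leaf_unit].append(value % leaf_unit)
    return dict(groups)


display = stem_and_leaf(speeds)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Changing leaf_unit to 100 would place the split between the hundreds and the tens, as in the 2 ⋮ 44 coding of the observation 244.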
Example:
Data Set. Given below is the distribution of statistics test scores of 50 students (the perfect score is 70 and
the passing score is 60% of it).
5 20 21 24 27 30 35 38 45 55
8 20 21 25 28 30 35 39 47 58
10 20 23 25 29 32 36 40 48 59
18 20 23 25 29 35 36 40 49 60
19 21 23 26 30 35 37 40 50 70
First find: c' = R/K = 65/7 = 9.29 ≈ 9
The class size should have the same precision as the raw data and should take the
value nearest to c'. Hence, c = 9.
4. Enumerate the classes or categories based on the quantities calculated in steps 1-3
bearing in mind that:
a) the lowest class must include the lowest observed value and the highest class,
the highest observed value. (The lowest value of the data is the lower class limit of
the first class).
b) each observation will go into one and only one class (none of the values can
fall into possible gaps between successive classes, and the classes do not
overlap).
Classes Frequency
5 - 13 3
14 - 22 9
23 - 31 15
32 - 40 13
41 - 49 4
50 - 58 3
59 - 67 2
68 - 76 1
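The construction of the classes and frequencies above can be sketched in Python as follows. This is only an illustration under the values used in the text (K = 7 and a unit of precision of 1); the variable names are assumptions made for the example.

```python
# Statistics test scores of the 50 students in the data set.
scores = [5, 20, 21, 24, 27, 30, 35, 38, 45, 55,
          8, 20, 21, 25, 28, 30, 35, 39, 47, 58,
          10, 20, 23, 25, 29, 32, 36, 40, 48, 59,
          18, 20, 23, 25, 29, 35, 36, 40, 49, 60,
          19, 21, 23, 26, 30, 35, 37, 40, 50, 70]

lowest, highest = min(scores), max(scores)
R = highest - lowest          # range: 70 - 5 = 65
K = 7                         # approximate number of classes, as used in the text
c = round(R / K)              # class size: 65 / 7 = 9.29, rounded to 9

# The first lower class limit is the lowest value; keep adding classes
# until the highest observed value is covered.
classes = []
lower = lowest
while lower <= highest:
    classes.append((lower, lower + c - 1))   # unit of precision is 1
    lower += c

# Count how many observations fall within each pair of class limits.
frequencies = [sum(lo <= x <= hi for x in scores) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo:>2} - {hi:<2}  {f}")
```

Because c is rounded down from 9.29, eight classes rather than seven are needed to reach the highest score of 70, which matches the table above.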
Note:
If data Unit of precision
is a whole number 1
has 1 decimal place 0.1
has 2 decimal places 0.01
2. Class Mark (CM) = the center of a class. It is the midpoint of the class interval,
about which the observations in a class tend to cluster.
\[ CM = \frac{LTCB + UTCB}{2} \quad \text{or} \quad CM = \frac{LL + UL}{2} \]
\[ RF = \frac{\text{frequency}}{N} \times 100\% \]
A. On organizing data: Construct an FDT for the given data. Show computations for R, K and c.
Table 1.
Blood Glucose of 20 individuals of the Honolulu Heart Center, 1969
1 107
2 145
3 237
4 91
5 185
6 106
7 177
8 120
9 116
10 105
11 109
12 186
13 257
14 218
15 164
16 158
17 117
18 130
19 132
20 138
Table 2.
Socio-Economic Characteristics of 30 Countries as of January 1997
1 Japan 80
2 Australia 78
3 Canada 78
4 Hongkong 78
5 Italy 78
6 Switzerland 78
7 France 77
8 US 77
9 Britain 76
10 Germany 76
11 New Zealand 76
12 Singapore 76
13 Brunei 75
14 Taiwan 75
15 Macau 73
16 Fiji 72
17 Malaysia 72
18 South Korea 72
19 Sri Lanka 72
20 China 71
21 Mexico 71
22 Saudi Arabia 70
23 Russia 69
24 Thailand 69
25 Iran 68
26 Brazil 67
27 Philippines 67
28 Turkey 67
29 Vietnam 67
30 Egypt 64
I. Measure of Location – value within the range of the data which describes its
location or position relative to the entire set of data. The more common measures
are measures of central tendency, percentile, decile and quartile.
B. Percentile (Pi) – divides the data set into 100 equal parts, each part having one
percent of all the data values. For example, if Patrick received a rating of 90th
percentile in the National Secondary Achievement Test, this means that 90%
of the students who took the test had scores lower than Patrick’s.
C. Decile (Di) – divides a data set into ten equal parts, each part having ten
percent of all data values. The first decile is the 10th percentile, the second
decile is the 20th percentile, and so on, up to the tenth decile which is the
100th percentile.
D. Quartile (Qi) – divides a data set into four equal parts, each part having
twenty-five percent of all data values. The first quartile is the 25th percentile,
the second is the 50th percentile, the third is the 75th percentile, and the
fourth quartile is the 100th percentile.
II. Measure of Dispersion – describes the extent to which the data are dispersed.
The more commonly used measures are:
A. Range (R)
- not a stable measure of variation because it can fluctuate greatly
with a change in just a single score, either the highest or the
lowest
- easiest to compute but the LEAST SATISFACTORY because its
value is dependent only upon the two extremes
B. Variance (s² / σ²)
- considers the position of each observation relative to the mean of
the set; denoted by σ² (or s² for a sample)
D. Coefficient of Variation ( CV )
- used to compare the variability of 2 or more sets of data even
when the observations are expressed in different units of
measurement.
III. Measure of Skewness (SK) – describes the extent of departure of the distribution
of the data from symmetry.
SK > 0, Positively Skewed
- the bump on the left indicates that the mode corresponds to a low value
- the tail extending to the right means that the mean, which is sensitive to
each score value, will be pulled in the direction of the extreme scores and
will have a high value
- the median, which is unaffected by extreme values, will have a value between
the mode and the mean
K= 3
A distribution that is peaked in the same way as any
normal distribution, not just the standard normal
distribution, is said to be mesokurtic. The peak of a
mesokurtic distribution is neither high nor low,
rather it is considered to be a baseline for the two
other classifications.
Data Set 1: 115 115 120 120 120 125 125 130 300
Data Set 2: 115 115 120 120 120 125 125 125 130 130
Data 1 Data 2
1. Mean
\[ \mu = \frac{\sum X_i}{N} \]
2. Median
4. Variance
\[ \sigma^2 = \frac{\sum X_i^2}{N} - \mu^2 \]
where \mu is the mean of the ungrouped data
6. Coefficient of Variation
\[ CV = \frac{\sigma}{\mu} \times 100\% \]
7. Measure of Skewness
\[ SK = \frac{3(\text{Mean} - \text{Median})}{\sigma} \]
8. \[ P_i = \frac{i}{100} \times N \]
9. \[ D_i = \frac{i}{10} \times N \]
10. \[ Q_i = \frac{i}{4} \times N \]
Note:
MEDIAN
If n is odd, the median position equals (n+1)/2, and the value of the [(n+1)/2]th
observation in the array is taken as the median, i.e.,
\[ Md = X_{(n+1)/2} \]
If n is even, the mean of the two middle values in the array is the median, i.e.,
\[ Md = \frac{X_{n/2} + X_{n/2+1}}{2} \]
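As an illustration of the ungrouped measures, the sketch below applies the mean, median, variance, coefficient of variation and skewness formulas to Data Set 1 and Data Set 2. It assumes the population form of the variance (sum of squares over N minus the squared mean) and the skewness measure 3(mean - median)/σ; the function names median and describe are chosen only for this example.

```python
from math import sqrt

data_1 = [115, 115, 120, 120, 120, 125, 125, 130, 300]
data_2 = [115, 115, 120, 120, 120, 125, 125, 125, 130, 130]


def median(values):
    """Middle value of the array (odd n) or mean of the two middle values (even n)."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                     # the ((n + 1) / 2)th value, counting from 1
    return (xs[mid - 1] + xs[mid]) / 2     # mean of the (n/2)th and (n/2 + 1)th values


def describe(values):
    """Return the ungrouped descriptive measures as a dictionary."""
    n = len(values)
    mean = sum(values) / n
    variance = sum(x * x for x in values) / n - mean ** 2
    sd = sqrt(variance)
    md = median(values)
    return {
        "mean": mean,
        "median": md,
        "variance": variance,
        "cv": sd / mean * 100,             # in percent
        "sk": 3 * (mean - md) / sd,
    }


summary_1 = describe(data_1)
summary_2 = describe(data_2)
print(summary_1)
print(summary_2)
```

The single extreme value of 300 in Data Set 1 pulls the mean well above the median, so its skewness measure is positive, while Data Set 2 has equal mean and median and a skewness of zero.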
Data Set
40 256.7 1783.83
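2. median (Md)
\[ Md = LTCB_{Md} + c \left( \frac{\frac{N}{2} - {<}CF_b}{f_{Md}} \right) \]
NOTE: the middle class is the class which contains the (N/2)th value of the array
4. Variance (\sigma^2)
\[ \sigma^2 = \frac{\sum f_i X_i^2}{N} - \mu^2 \]
where \mu is the mean of the grouped data
6. Coefficient of Variation
\[ CV = \frac{\sigma}{\mu} \times 100\% \]
7. Measure of Skewness
\[ SK = \frac{3(\text{mean} - \text{median})}{\sigma} \]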
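7. Percentiles
\[ P_i = LTCB_{P_i} + c \left( \frac{\frac{i}{100} N - {<}CF_b}{f_{P_i}} \right) \]
8. Deciles
\[ D_i = LTCB_{D_i} + c \left( \frac{\frac{i}{10} N - {<}CF_b}{f_{D_i}} \right) \]
where LTCB_{D_i} = LTCB of the D_i class
c = class size
<CF_b = <CF of the class preceding the D_i class
f_{D_i} = frequency of the D_i class
N = total number of observations
9. Quartiles
\[ Q_i = LTCB_{Q_i} + c \left( \frac{\frac{i}{4} N - {<}CF_b}{f_{Q_i}} \right) \]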
FORMULAS:
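1. Mean (\mu)
\[ \mu = \frac{\sum f_i X_i}{N} \]
2. Median (Md)
\[ Md = LTCB_{Md} + c \left( \frac{\frac{N}{2} - {<}CF_b}{f_{Md}} \right) \]
4. \[ P_i = LTCB_{P_i} + c \left( \frac{\frac{i}{100} N - {<}CF_b}{f_{P_i}} \right) \]
5. \[ D_i = LTCB_{D_i} + c \left( \frac{\frac{i}{10} N - {<}CF_b}{f_{D_i}} \right) \]
6. \[ Q_i = LTCB_{Q_i} + c \left( \frac{\frac{i}{4} N - {<}CF_b}{f_{Q_i}} \right) \]
7. Variance (\sigma^2)
\[ \sigma^2 = \frac{\sum f_i X_i^2}{N} - \mu^2 \]
9. \[ CV = \frac{\sigma}{\mu} \times 100\% \]
10. \[ SK = \frac{3(\text{mean} - \text{median})}{\sigma} \]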
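The median, percentile, decile and quartile formulas for grouped data share one pattern, so a single helper can cover all of them. The sketch below is one possible reading of those formulas, applied to the frequency distribution of the 50 test scores built earlier in this manual; the function name grouped_fractile and its parameters are hypothetical.

```python
# (lower class limit, upper class limit, frequency) for the 50 test scores.
fdt = [(5, 13, 3), (14, 22, 9), (23, 31, 15), (32, 40, 13),
       (41, 49, 4), (50, 58, 3), (59, 67, 2), (68, 76, 1)]


def grouped_fractile(table, fraction, unit=1):
    """Value below which the given fraction of observations falls.

    Implements LTCB + c * (fraction * N - <CFb) / f, where the class used is
    the first one whose cumulative frequency reaches fraction * N.
    """
    n = sum(f for _, _, f in table)
    target = fraction * n
    class_size = table[1][0] - table[0][0]      # c, from successive lower limits
    cum_before = 0                              # <CFb of the preceding class
    for lower, _, f in table:
        if cum_before + f >= target:
            ltcb = lower - unit / 2             # lower true class boundary
            return ltcb + class_size * (target - cum_before) / f
        cum_before += f
    raise ValueError("fraction must lie between 0 and 1")


median = grouped_fractile(fdt, 1 / 2)     # Md
p90 = grouped_fractile(fdt, 90 / 100)     # P90
d6 = grouped_fractile(fdt, 6 / 10)        # D6
q1 = grouped_fractile(fdt, 1 / 4)         # Q1
print(median, p90, d6, q1)
```

For the median, N/2 = 25 falls in the class 23 - 31 (LTCB = 22.5, <CFb = 12, f = 15), giving 22.5 + 9(13/15) = 30.3.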
THE BOXPLOT
Definition. The boxplot is a graph that is very useful for displaying the following
features of the data:
Location
Spread
Symmetry
extremes
outliers
Remarks:
1. The height of the rectangle is arbitrary and has no specific meaning. If several
boxplots appear together, however, the height is sometimes made proportional to
the different sample sizes.
2. If an outlying observation is less than Q1 – 3 IQR or greater than Q3 + 3 IQR, it is
identified with a circle at its actual location. Such an observation is called a far
outlier.
Examples:
1. Data Set A: 1 15 21 22 24
10 18 22 23 25
14 20 22 24 28
2. Data Set B: 3 10 11 12 19
8 10 12 16 19
9 10 12 16 30
More Problems:
1. Suppose a teacher assigns the following weights to the various course requirements:
Assignment 15%
Project 25%
Midterms 20%
Finals 40%
The maximum score a student may obtain for each component is 100. Sheila obtains
marks of 83 for the assignment, 72 for the project, 41 for the midterms and 49 for the finals. Find her
mean mark for the course.
2. Two of the quality criteria in processing butter cookies are the weight and color development
in the final stages of oven browning. Individual pieces of cookies are scanned by a
spectrophotometer calibrated to reflect yellow-brown light. The readout is expressed in per
cent of a standard yellow-brown reference plate and a value of 41 is considered optimal
(golden-yellow). The cookies were also weighed in grams at this stage. The means and
standard deviations of 30 sample cookies are presented below.
Mean sd
Color 41.1 10
Weight 17.7 3.2
3. The following are weight losses (in pounds) of 25 individuals who enrolled in a five-week
weight-control program:
2 3 3 4 4 4 5 5 6 7 7 8 8
8 9 9 9 9 10 10 10 11 11 11 12
Compute the 3rd quartile, 7th decile, and 89th percentile.
A. Using your raw data set and the FDT you constructed in exercise # 2, compute for
the appropriate descriptive measures (ungrouped and grouped). Show solution for
grouped data only.
B. Construct these tables in your workbooks and summarize the values obtained.
ungrouped Grouped
IV. Fractiles
P90 D6 Q3
Ungrouped Grouped Ungrouped Grouped Ungrouped Grouped
C. Interpret the obtained values for your mean, median and mode (ungrouped data
only).
Weighted Means
A weighted mean is a statistical measure often computed when data are gathered from a survey questionnaire
that uses the Likert scale.
A Likert scale is a psychometric scale commonly used in questionnaires and is the most widely
used scale in survey research. When responding to a Likert questionnaire item, respondents specify
their level of agreement to a statement [1]. A Likert item is simply a statement the respondent is asked
to evaluate according to any kind of subjective or objective criteria.
Generally, the level of agreement or disagreement is measured. Often five ordered response levels
are used, although many psychometricians advocate using seven or nine levels. A recent empirical
study [2] found that a 5- or 7-point scale may produce slightly higher mean scores relative to the
highest possible attainable score, compared to those produced from a 10-point scale, and this
difference was statistically significant.
[1] http://en.wikipedia.org/wiki/Likert_scale
[2] Dawes, John (2008). "Do Data Characteristics Change According to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales". International Journal of Market Research 50 (1): 61–77.
[Table: responses under scale points 5, 4, 3, 2, 1 for A. GENERATION OF WASTE]
[Table: responses under scale points 5, 4, 3, 2, 1 with Weighted Means for A. GENERATION OF WASTE]
Table 3
Adjectival Interpretation of the Likert Scale (cumulative mean)
Table 4
Adjectival Interpretation of the Likert Scale (per item)
Table 5.
Extent of Solid Waste Management in AdeNU (faculty and students), 2007
Indicator | Weighted Mean | Interpretation
A. GENERATION OF WASTE
2. Introduces strategies on how to apply the 4R's (Reuse, Recycle, Reduce and Respond) of Solid Waste Management | 1.68 | Very Low
3. Provides campaign to patronize the use of reusable and recycled materials | 2.08 | Low
4. Rejects products which are harmful to the environment such as foam, styrofoam, CFC aerosols, oil-based paints, pesticides, insecticides, plastics, wood preservatives, glues and adhesives | 1.52 | Very Low
5. Encourages the use of unused side of old papers or recycles its own paper (as shown by the exam papers used, handouts, memo, letters, etc.) | 1.86 | Low
6. Encourages or requires the use of refillable inks for pens, ballpens, printers, etc. | 1.47 | Very Low
7. Allows the use of old notebooks from previous years instead of requiring new ones | 1.56 | Very Low
8. Encourages to reuse envelopes, boxes, packaging materials and folders | 2.04 | Low
9. Repairs or disposes defective computers in laboratories or offices | 1.46 | Very Low
Generation of Waste
The extent of performance of SWM practices of students and faculty in the area of generation of
wastes is given in Table 5. The results show that the respondents’ means, based on the nine (9) indicators
used, ranged from 1.46 to 2.08, or from “very low” to “low” ratings. The respondents gave an overall mean
that resulted in “very low” for the following indicators: “provides information through campaigns or
seminars about SWM (1.67)”, “introduces strategies on how to apply the 4R's of Solid Waste Management
(1.68)”, “rejects products which are harmful to the environment such as foam, Styrofoam, CFC aerosols,
oil-based paints, pesticides, insecticides, plastics, wood preservatives, glues and adhesives (1.52)”,
“encourages the use of refillable ink (1.47)”, “allows the use of old notebooks (1.56)” and “repairs or
disposes defective computers (1.46)”. The “very low” rating also implied that almost none of the respondents
The indicators “provides campaign to patronize the use of reusable and recyclable
materials (2.08)”, “encourages the use of unused side of old papers or recycles its own paper (1.86)”,
“encourages or requires the use of refillable materials (3.2)”, and “encourages to reuse envelopes, boxes,
packaging materials and folders (2.04)” had an overall mean of “low”. Only 25% of the respondents
The students and faculty gave an overall weighted mean that resulted in “very low”. In totality,
the cumulative mean score was 1.7. The result implied that almost none of the indicators were
Survey results reveal that there was a need for an intensive information campaign about SWM and
that the University had yet to implement strategies on how to apply the 4R’s. Such an outcome presents
an opportunity to promote waste-saving measures among the student and teaching population in the
A. For the raw data given, obtain the weighted mean for each item and the
cumulative/total weighted mean.
C. What are the highest and lowest weighted means obtained? Interpret the values.
D. Conclusion. Discuss the result of the test based on the objective of the
study.
Problem Set
Thesis title: Portable Games and Devices towards Aggressive Behavior of the First Year BS Digital
Animation Students of Ateneo de Naga University
Objective: To determine the level of influence of playing Portable Games and Devices on the behavior
specifically aggressiveness of the respondents
Table 1
Results from the Standard Questionnaire by Buss and Perry.
Indicators | 5 | 4 | 3 | 2 | 1 | Weighted Means
1. Some of my friends think I am a hothead. | 18 | 12 | 15 | 12 | 13 |
2. If I have to resort to violence to protect my rights, I will. | 17 | 21 | 10 | 15 | 7 |
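The weighted mean of a Likert item multiplies each scale point by the number of respondents who chose it, adds the products, and divides by the total number of respondents. A minimal Python sketch using the counts in Table 1 is given below; the function name weighted_mean and the dictionary layout are assumptions made for illustration.

```python
def weighted_mean(counts):
    """counts maps each scale point (5 down to 1) to the number of respondents choosing it."""
    total_respondents = sum(counts.values())
    weighted_sum = sum(point * n for point, n in counts.items())
    return weighted_sum / total_respondents


# Counts taken from Table 1 (70 respondents per item).
item_1 = {5: 18, 4: 12, 3: 15, 2: 12, 1: 13}
item_2 = {5: 17, 4: 21, 3: 10, 2: 15, 1: 7}

wm_1 = weighted_mean(item_1)   # (5*18 + 4*12 + 3*15 + 2*12 + 1*13) / 70
wm_2 = weighted_mean(item_2)   # (5*17 + 4*21 + 3*10 + 2*15 + 1*7) / 70
print(round(wm_1, 2), round(wm_2, 2))
```

The adjectival interpretation of each value still has to be read from the interpretation table that applies to the study.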
Lesson # 5 – Sampling
\[ n = \frac{5000}{1 + 5000(0.05)^2} \]
\[ = \frac{5000}{1 + 5000(0.0025)} \]
\[ = \frac{5000}{1 + 12.5} \]
\[ = \frac{5000}{13.5} \]
\[ = 370.37 \approx 370 \]
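The computation above follows the pattern n = N / (1 + N e²), commonly known as Slovin's formula, with N = 5000 and a margin of error e = 0.05. A small Python sketch of the same arithmetic follows; the function name sample_size is a hypothetical name used only here.

```python
def sample_size(population, margin_of_error):
    """Sample size n = N / (1 + N * e^2)."""
    return population / (1 + population * margin_of_error ** 2)


n_5000 = sample_size(5000, 0.05)   # 5000 / 13.5
n_365 = sample_size(365, 0.05)     # 365 / 1.9125
print(round(n_5000), round(n_365))
```

The result is rounded to the nearest whole respondent, as done in the worked computation.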
SAMPLING METHODS
every element in the population not all elements are given a equal
chosen sample
to find out how her faculty feel elements for the sample
She places all 150 names of the university wants to find out how
thoroughly , and then draws the bookstore provides. Every day for
out the names of 25 individuals two weeks during her lunch hour,
completed questionnaires.
respondents according to gender where there are 219 females and 146 males.
Using stratified sampling, how many respondents will be obtained from each
stratum?
\[ n = \frac{365}{1 + 365(0.05)^2} \]
\[ = \frac{365}{1 + 365(0.0025)} \]
\[ = \frac{365}{1 + 0.9125} \]
\[ = \frac{365}{1.9125} \]
\[ = 190.849 \approx 191 \]
Population of 365
Researcher identifies 2 subgroups or strata
219 females (60% = 219/365)     146 males (40% = 146/365)
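With the sample size of 191 and the two strata above, each stratum can be given a share of the sample proportional to its share of the population. The sketch below assumes this proportional allocation with rounding to the nearest whole respondent; the variable names are illustrative.

```python
strata = {"female": 219, "male": 146}    # stratum sizes in the population of 365
sample_total = 191                       # sample size computed for N = 365

population = sum(strata.values())

# Each stratum receives sample_total * (stratum size / population size).
allocation = {
    name: round(sample_total * size / population)
    for name, size in strata.items()
}
print(allocation)
```

Rounding each stratum separately happens to add up to 191 here; in other cases the rounded counts may need a small adjustment so that they sum to the required sample size.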
sampling, the population is divided into primary stage units (PSU) then a
sample of PSUs is drawn. In the second stage of sampling, each selected PSU
The process of subsampling can be carried to a third stage fourth stage and so
stage.
large universe
R1 R2 R3 R4 R5
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
Choose randomly 1 city for each province
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
[Figure: sampling diagrams showing populations divided into groups (with 25%, 50% and 25% drawn), CLUSTER SAMPLING and TWO-STAGE SAMPLING]
D. Snowball
a. Some might use this technique because they just want to get a “feel” of the
market before launching or producing a certain product.
Examples.
1. If there are eight doors providing access to a building, in how many ways can a
person enter the building by one door and leave by a different door?
2. How many even three-digit numbers can be formed from the digits 1, 2, 5, 6 and 9 if
each digit can be used only once?
3. How many positive integers of three different digits can be formed from the integers
1, 2, 3, 4 and 5?
4. How many different arrangements, each consisting of five different letters, can be
formed from the letters of the word “PERSONAL” if each arrangement is to begin and
end with a vowel?
5. How many different arrangements of five distinct books each can be made on a shelf
with space for five books?
6. Suppose that there are 3 math books and 3 physics books. How many different
arrangements of the six books can be made on a shelf if books on the same subject
are to be kept together?
Examples:
1. A bus has six vacant seats. If three additional passengers enter the bus, in how
many different ways can they be seated?
2. In how many ways can 3 boys and 3 girls be seated in a row containing six seats if
a. a person may sit in any seat
b. boys and girls must sit in alternate seats?
Theorem 2. If we are given n elements, of which exactly m1 are alike of one kind, exactly m2
are alike of a second kind, …, and exactly mk are alike of a kth kind, and if n = m1 +
m2 + … + mk, then the number of distinguishable permutations that can be made of the
n elements taking them all at one time is
\[ \frac{n!}{m_1! \, m_2! \, m_3! \cdots m_k!} \]
Examples:
1. Determine the number of different nine-digit numerals that can be formed from the
digits 6,6,6,5,5,5,4,4 and 3.
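Theorem 2 can be checked numerically for the nine-digit example. The following Python sketch counts how many times each digit repeats and divides n! by the factorial of each count; the function name distinguishable_permutations is an assumption for this illustration.

```python
from collections import Counter
from math import factorial


def distinguishable_permutations(items):
    """n! / (m1! * m2! * ... * mk!), where each m is the count of one kind of element."""
    counts = Counter(items)              # e.g. {6: 3, 5: 3, 4: 2, 3: 1}
    result = factorial(len(items))
    for m in counts.values():
        result //= factorial(m)          # each division is exact
    return result


digits = [6, 6, 6, 5, 5, 5, 4, 4, 3]
numerals = distinguishable_permutations(digits)   # 9! / (3! 3! 2! 1!)
print(numerals)
```

The same function applies to letters, for example distinguishable_permutations("COOL") evaluates 4!/2!.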
Examples:
1. A football conference consists of 10 teams. If each team plays every other team, how
many conference games are played?
2. A student has twelve posters to pin up on the walls of her room, but there is space
for only 7. In how many ways can she choose the posters to be pinned up?
3. How many committees of five can be formed from 7 sophomores and 5 freshmen if
each committee is to consist of 3 sophomores and 2 freshmen?
at most 3 sophomores?
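Selections in which order does not matter can be counted with the combination formula nCr = n! / (r!(n - r)!). The sketch below uses Python's math.comb for three of the examples above; the variable names are illustrative.

```python
from math import comb

# Example 1: each pair of the 10 teams plays one game.
conference_games = comb(10, 2)

# Example 2: choose 7 of the 12 posters.
poster_choices = comb(12, 7)

# Example 3: 3 of the 7 sophomores and 2 of the 5 freshmen,
# combined with the multiplication rule.
committees = comb(7, 3) * comb(5, 2)

print(conference_games, poster_choices, committees)
```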
2. Solve problems requiring the applications of the concept of permutation and combination.
1. How many different outcomes are possible in a roll of 2 dice? In tossing 5 coins? In
rolling 2 dice and tossing 3 coins simultaneously?
2. How many distinct permutations can be made from the word COOL? List them
down.
3. A package of 10 Game Boy sets contains 3 defective sets. If 5 sets are to be picked out
randomly and sent to a customer for inspection, in how many ways can the
customer find at least two defective sets?
4. How many different telephone numbers can be formed from a seven-digit number if
the first digit cannot be zero?
5. A college freshman must take a science course, a humanities course, and a math
course. If she may select any of 6 science courses, any of 4 humanities, and any of 4
math courses, how many ways can she set her program?
6. A shelf contains 3 books in red binding, 4 books in blue and 2 in green. In how
many different orders can they be arranged if all the books of the same color must
be kept together?
7. How many different numbers greater than 200 can be formed from the digits 1, 2, 3, 4
and 5 (a) if repetitions are not allowed? (b) if repetitions are allowed?
8. How many committees of 5 can be selected from 12 republicans and 8 democrats (a)
if it must contain 2 republicans and 3 democrats? (b) if it must contain at least 3
republicans?
9. There are 8 baseball teams in a league. How many games will be played if each team
plays each of the other teams 40 times?
10. In how many ways can one make a selection of 5 black balls, 3 red balls, and 2
white balls from a box containing 8 black balls, 7 red balls and 5 white balls?
11. The tennis squad of one college consists of 8 players and that of another consists of 10
players. In how many ways can a doubles match between the 2 institutions be
arranged?
12. In how many ways can one make a selection of 4 novels, 3 biographies and 6 detective
stories from a shelf containing 10 novels, 8 biographies and 10 detective stories?
Lesson #7 – Probability
PROBABILITY
\[ P(E) = \frac{n(E)}{n(S)} \]
where n(E) and n(S) are the numbers of elements in E and S, respectively.
Furthermore, if P(E)= 0 then the event will never happen or it is an “impossible” event.
If P(E) = 1, the event is certain to happen or it is a “sure” event.
Examples:
1. Determine the probability of each of the following events:
a. Obtaining a 4 on a throw of a single die
b. Obtaining a head on a toss of a coin
2. a. If 2 dice are thrown, what is the probability of obtaining a sum of 8? a sum of 3?
      1      2      3      4      5      6
1   (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
2   (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
3   (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
4   (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
5   (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
6   (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)
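The table of 36 outcomes can be enumerated directly, and P(E) = n(E)/n(S) then becomes a matter of counting. The sketch below is a minimal illustration in Python; the helper name probability is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered pairs (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(E) = n(E) / n(S) for equally likely outcomes; event is a test on one outcome."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


p_sum_8 = probability(lambda outcome: sum(outcome) == 8)   # (2,6) (3,5) (4,4) (5,3) (6,2)
p_sum_3 = probability(lambda outcome: sum(outcome) == 3)   # (1,2) (2,1)
print(p_sum_8, p_sum_3)
```

Fractions are used so that the results stay exact; Fraction reduces 2/36 to 1/18 automatically.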
4. If French, Spanish, Russian and English books are placed at random on a shelf
with space for 4 books, what is the probability that the Russian and English books
will be next to each other?
What is the probability that only the male will be alive at age 65?
What is the probability that at least one of the two will be alive at age 65?
Example. What is the probability that in a single toss of two dice, the sum
will be 4 or 7?
Example. Take a math class with 52 students, 27 of whom are males and
the rest are females. A total of 21 of the males and 15 of the females got a
grade above 90. What is the probability that a student chosen at
random either has a grade above 90 or is a male?
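Both examples can be worked with the addition rule for probabilities. The sketch below assumes the standard forms P(A or B) = P(A) + P(B) for mutually exclusive events and P(A or B) = P(A) + P(B) - P(A and B) otherwise; the variable names are chosen for illustration.

```python
from fractions import Fraction

# Two dice: a sum of 4 and a sum of 7 cannot occur together (mutually exclusive).
p_sum_4 = Fraction(3, 36)            # (1,3) (2,2) (3,1)
p_sum_7 = Fraction(6, 36)            # (1,6) (2,5) (3,4) (4,3) (5,2) (6,1)
p_4_or_7 = p_sum_4 + p_sum_7

# Math class: "grade above 90" and "male" overlap, so the overlap is subtracted once.
students = 52
p_male = Fraction(27, students)
p_above_90 = Fraction(21 + 15, students)       # 21 males and 15 females
p_male_and_above_90 = Fraction(21, students)
p_above_90_or_male = p_above_90 + p_male - p_male_and_above_90

print(p_4_or_7, p_above_90_or_male)
```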
When the data are presented in the form of frequencies and are classified
according to qualitative rather than quantitative categories, they are called
qualitative data in contingency tables.
Illustration:
                 Vegetarian Status
Gender     Vegetarian     Non-Vegetarian     Total
Male           20               23             43
Female         22               25             47
Total          42               48             90
Exercise # 6 - Probability
Objectives:
At the end of the exercise, the student is expected to be able to apply the different operations on probability.
1. On a throw of two dice, what is the probability of obtaining a sum that is at most 5?
2. If a single card is drawn from a deck of 52 playing cards, what is the probability of
each of the following events: (a) obtaining a red card; (b) obtaining a heart; and (c)
obtaining an ace or a spade?
4. A number of two different digits is to be formed from the digits 1, 2, 3, 4 and 5.
Determine the probability of each of the following events:
a. the number is odd
b. the number is greater than 25
5. A couple is planning to have three children. Find the probabilities that the couple
will have
a. two girls and one boy
b. at least two boys
c. no boys
d. at most two girls
e. two boys followed by a girl
What is the probability that a patient chosen at random from among the 150 will be:
a. pregnant
b. female or elderly
c. female and elderly
d. male or a child
e. male provided that he is elderly
f. child given male
PROBABILITY DISTRIBUTIONS
Definition. A function whose value is a real number determined by each element in the
sample space is called a random variable.
Remark. We shall use an uppercase letter, say X, to denote a random variable and its
corresponding lowercase letter, x in this case, for one of its values.
Sample Points X Y
HHH 3 3
HHT 2 1
HTH 2 1
HTT 1 -1
THH 2 1
THT 1 -1
TTH 1 -1
TTT 0 -3
Definition. A random variable defined over a discrete sample space is called a discrete
random variable.
Definition. A table or formula listing all possible values that a discrete random
variable can take on, along with the associated probabilities, is called a
discrete probability distribution.
Remark. The probabilities associated with all possible values of a discrete random
variable must sum to 1.
Examples. For Experiment #1, the discrete probability distributions of the random
variables X and Y are
x 0 1 2 3
P(X = x) 1/8 3/8 3/8 1/8
Y -3 -1 1 3
P(Y = y) 1/8 3/8 3/8 1/8
Definition. The function with values f(x) is called a probability density function for the
continuous random variable X, if
*the total area under its curve and above the horizontal axis is equal to 1; and
*the area under the curve between any two ordinates x=a and x=b gives the
probability that X lies between a and b.
Remarks:
Expected Values
x x1 x2 … xn
P(X = x) f(X1) f(X2) … f(Xn)
Examples:
x 0 1 2 3
P(X = x) 1/8 3/8 3/8 1/8
Y -3 -1 1 3
P(Y = y) 1/8 3/8 3/8 1/8
x x1 x2 … xn
P(X = x) f(X1) f(X2) … f(Xn)
The variance of X is
\[ \sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i), \quad \text{where } \mu = E(X) \]
Example:
E(X) = 1.5
\sigma^2 = 0.75
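The values E(X) = 1.5 and a variance of 0.75 can be reproduced from the probability distribution of X. The sketch below assumes the usual definitions E(X) = Σ x f(x) and Var(X) = Σ (x - μ)² f(x), and applies them to both X and Y from the tables above; the function names are illustrative.

```python
from fractions import Fraction

# Discrete probability distributions of X and Y from the tables above.
dist_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
dist_y = {-3: Fraction(1, 8), -1: Fraction(3, 8), 1: Fraction(3, 8), 3: Fraction(1, 8)}


def expected_value(dist):
    """E(X) = sum of x * f(x) over all possible values x."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * f(x), with mu = E(X)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


print(expected_value(dist_x), variance(dist_x))   # 3/2 and 3/4
print(expected_value(dist_y), variance(dist_y))
```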
Example. A used car dealer finds that in any day, the probability of selling no car is
0.4, one car is 0.2, two cars is 0.15, 3 cars is 0.10, 4 cars is 0.08, five cars is 0.06 and
six cars is 0.01. Let g(X) = 500 + 1500X represent the salesman’s daily earnings, where
X is the number of cars sold. Find the salesman’s expected daily earnings.
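One way to work this example is to weight each possible earning g(x) by its probability. The sketch below uses only the figures stated in the problem; the variable names are illustrative.

```python
# Probabilities of selling 0 to 6 cars in a day, as given in the example.
sales_dist = {0: 0.40, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.08, 5: 0.06, 6: 0.01}

def earnings(cars_sold):
    """g(X) = 500 + 1500X, the salesman's daily earnings."""
    return 500 + 1500 * cars_sold

# A valid discrete distribution must sum to 1.
assert abs(sum(sales_dist.values()) - 1.0) < 1e-9

expected_cars = sum(x * p for x, p in sales_dist.items())
expected_earnings = sum(earnings(x) * p for x, p in sales_dist.items())
print(round(expected_cars, 2), round(expected_earnings, 2))
```

Because g is linear, the result equals 500 + 1500 E(X); with E(X) = 1.48 the expected daily earnings come to 2720.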
In making use of the properties of the normal curve to solve certain types of
statistical problems, one must first learn how to find areas under the normal curve.
The first step in finding areas under the normal curve is to convert the normal
curve of any given variable into a standardized normal curve by using the formula:
$$Z = \frac{X - \bar{X}}{S}$$
WORDED PROBLEMS:
1. Given a normal distribution with mean 350 and standard deviation s=40, find the
probability that x assumes a value greater than 362.
2. An electrical firm manufactures light bulbs that have a length of life that is
normally distributed with mean equal to 800 hours and a standard deviation of
40 hours. Find the probability that a bulb burns between 778 and 834 hours.
3. On an examination the average grade was 74 and the standard deviation was 7. If
12% of the class are given A’s, and the grades are curved to follow a normal
distribution, what is the lowest possible A and the highest possible B? Find D6.
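Problems 1 and 2 can be checked by standardizing each value and reading the area from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed Z table; `z_score` is an illustrative helper, not part of the handout.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardize x: Z = (x - mean) / sd."""
    return (x - mean) / sd

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Problem 1: mean 350, standard deviation 40, P(X > 362).
z1 = z_score(362, 350, 40)
p_greater = 1 - standard_normal.cdf(z1)

# Problem 2: mean 800, standard deviation 40, P(778 < X < 834).
z_low = z_score(778, 800, 40)
z_high = z_score(834, 800, 40)
p_between = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(z1, 2), round(p_greater, 4))
print(round(z_low, 2), round(z_high, 2), round(p_between, 4))
```

Values read from a printed table may differ in the last digit because of rounding.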
Objectives: At the end of the exercise the student should be able to:
1. Find probabilities using the standard normal probability curve;
2. Apply the concepts of finding areas under the normal probability curve in solving
problems.
II. Find the unknown constant a given the area under the normal curve.
a. P(z < a) = 0.25
b. P(z > a) = 0.99
i. the lowest passing grade if the lowest 10% of the students are given F’s;
ii. the highest B if the top 5% of the students are given A’s;
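Finding a constant from a given area is the reverse of a table lookup. As a hedged sketch for items II.a and II.b, `NormalDist.inv_cdf` returns the z value that has a given area to its left; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# a. P(z < a) = 0.25, so a is the value with 25% of the area to its left.
a_left = standard_normal.inv_cdf(0.25)

# b. P(z > a) = 0.99 means P(z < a) = 1 - 0.99 = 0.01.
a_right = standard_normal.inv_cdf(1 - 0.99)

print(round(a_left, 3), round(a_right, 3))
```

Both constants are negative because each leaves less than half of the area to its left.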
Lesson # 9 – Estimation
ESTIMATION
- refers to any process by which sample information is used to predict or estimate the
numerical value of some population measure.
- Two types of estimators: point estimator and interval estimator. A point estimator
yields a numerical value of the estimate. An interval estimate gives a range or band
of values within which the value of the parameter is estimated to lie.
$$P(\bar{X} - k < \mu < \bar{X} + k) = 1 - \alpha$$
Where
$\bar{X}$ = point estimate of the mean
$k = Z_{\alpha/2}\,(s.e.)$
$s.e. = \dfrac{\sigma}{\sqrt{n}}$
α = level of significance
1 − α = level of confidence
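The interval formula can be sketched as a small function. The numbers used here are those of the IQ example in this lesson (sample mean 110, population standard deviation 16, n = 400, table value 1.96); the function name `z_interval` is illustrative.

```python
from math import sqrt

def z_interval(sample_mean, sigma, n, z_crit):
    """Return (lower, upper) = sample_mean -/+ k, with k = z_crit * sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    k = z_crit * standard_error
    return sample_mean - k, sample_mean + k

low, high = z_interval(sample_mean=110, sigma=16, n=400, z_crit=1.96)
print(round(low, 3), round(high, 3))
```

Here the standard error is 16/20 = 0.8, so k = 1.568 and the interval runs from 108.432 to 111.568.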
Example.
1. The mean IQ of a random sample of 400 high school students is 110. The standard
deviation of the population of IQ scores is 16. If the population is normally distributed,
find:
a. a .95 confidence interval estimate of $\mu$
$Z_{\alpha/2} = 1.96$
$Z_{\alpha/2} = 1.64$
2. Find the .90 confidence interval estimate of the mean weight of all the pupils in a
certain school if a random sample of 25 pupils has a mean weight of 70 lbs with a
standard deviation of 15 lbs. Assume the population weights to be normally distributed.
$t_{\alpha/2} = 1.711$
3. The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0,
10.2 and 9.6 liters. Find a 95% confidence interval for the mean content of all such
containers, assuming an approximate normal distribution for container contents.
( $t_{\alpha/2} = 2.447$ )
4. The mean and standard deviation for the quality grade-point averages of a
random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Find
the 99% confidence interval for the mean of the entire senior class. Interpret the
obtained confidence interval. ( $Z_{\alpha/2} = 2.575$ )
5. The manager of a home delivery service for pizza pies wants an estimate of the
average time it takes to deliver an order within the town proper of the City of Naga. A
sample of 25 deliveries had a Mean time of 15 minutes and a standard deviation of 4
minutes. Construct a 95% confidence interval for the average time for all deliveries.
Interpret the interval obtained. ( Z = 1.96 )
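When only the sample standard deviation is known and the sample is small, the same interval is built with a t value in place of Z. The sketch below applies this to the pupils' weights (t = 1.711) and the sulfuric acid containers (t = 2.447), using the table values quoted in the problems; `t_interval` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample_mean, s, n, t_crit):
    """Return (lower, upper) = sample_mean -/+ t_crit * s / sqrt(n)."""
    k = t_crit * s / sqrt(n)
    return sample_mean - k, sample_mean + k

# Pupils' weights: n = 25, mean 70 lbs, s = 15 lbs, 90% confidence.
weights_ci = t_interval(70, 15, 25, t_crit=1.711)

# Containers: the mean and sample standard deviation come from the raw data.
contents = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
acid_ci = t_interval(mean(contents), stdev(contents), len(contents), t_crit=2.447)

print(tuple(round(v, 3) for v in weights_ci))
print(tuple(round(v, 3) for v in acid_ci))
```

`stdev` divides by n − 1, which is the sample standard deviation that the t interval expects.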
Type I error (α error) – when we reject the null hypothesis when in fact the
null hypothesis is true.
Type II error (β error) – when we accept the null hypothesis when in fact the
null hypothesis is false.
Definition. When the rejection region is located at only one extreme of the range
of values for the test statistic, the test is ONE-TAILED. If Ha is a statement of
non-equality represented by the sign ≠, then the hypothesis is
non-directional, thus we have a two-tailed test.
Case 1. Z Test
A. Ha: $\mu < \mu_0$ or
B. Ha: $\mu > \mu_0$
C. Ha: $\mu \neq \mu_0$
$$Z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$
k. Decision Rule: At a level of significance α,
i. Ho: μ = 28.5
Ha: μ > 28.5
iv. Computation:
$$Z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{29.2 - 28.5}{3 / \sqrt{16}} = 0.933$$
$Z_{\alpha} = 1.645$
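The computation can be reproduced in a few lines. The sketch assumes a right-tailed test (Ha: μ > 28.5) and a 5% level of significance, which is the level that corresponds to the critical value 1.645; the level itself is an assumption, since it is not stated with the computation.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu_0, sigma, n):
    """Z_c = (sample_mean - mu_0) / (sigma / sqrt(n))."""
    return (sample_mean - mu_0) / (sigma / sqrt(n))

z_c = z_statistic(sample_mean=29.2, mu_0=28.5, sigma=3, n=16)

alpha = 0.05  # assumed level of significance
z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for a right-tailed test

reject_h0 = z_c > z_critical
print(round(z_c, 3), round(z_critical, 3), reject_h0)
```

Since 0.933 does not exceed 1.645, the computed value does not fall in the rejection region under these assumptions.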
Example 2. For the past five years, the mean height of AdeNU students has been 60
inches with a standard deviation of 4 inches. A simple random sample of 100
is taken from the present students. It was found that the mean height is 65
inches. Is there reason to believe that the mean height of present AdeNU
students is different from that of the past five years, at the 5% level of significance?
Case 2. T Test
D. Ha: $\mu < \mu_0$ or
E. Ha: $\mu > \mu_0$
F. Ha: $\mu \neq \mu_0$
$$T_c = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$$
i. Ho: μ = 6
Ha: μ < 6
iii. Decision Rule: Reject Ho if $T_c < -T_{\alpha}$; otherwise accept Ho.
v. Computation:
$$T_c = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{5.8 - 6}{0.16 / \sqrt{8}} = -3.536$$
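The same computation as a short sketch. The critical value is an assumption: 1.895 is the usual table value for a one-tailed test with 7 degrees of freedom at α = 0.05, but the level of significance for this example is not shown here, so it is passed in as a parameter.

```python
from math import sqrt

def t_statistic(sample_mean, mu_0, s, n):
    """T_c = (sample_mean - mu_0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu_0) / (s / sqrt(n))

def reject_left_tailed(t_c, t_alpha):
    """Decision rule for Ha: mu < mu_0. Reject Ho when T_c < -T_alpha."""
    return t_c < -t_alpha

t_c = t_statistic(sample_mean=5.8, mu_0=6, s=0.16, n=8)
T_ALPHA_ASSUMED = 1.895  # assumed: alpha = 0.05, df = 7, from a t table

print(round(t_c, 3), reject_left_tailed(t_c, T_ALPHA_ASSUMED))
```

Under that assumed critical value, −3.536 falls in the rejection region.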
1. A certain brand of powdered milk is advertised as having a net weight of 250 grams. If
the net weights of a random sample of 10 cans are 253, 248, 252, 245, 247, 249, 251,
250, 247 and 248 grams, can it be concluded that the average
net weight of the cans is less than the advertised amount? Use α = 0.01 and assume
that the net weight of this brand of powdered milk is normally distributed.
2. In a time and motion study, it was found that the average time required by workers
to complete a certain manual operation was 26.6 minutes. A group of 20 workers was
randomly chosen to receive a special training for two weeks. After the training it was
found that their average time was 24 minutes with a standard deviation of 3
minutes. Can it be concluded that the special training speeds up the operation? Use
α = 0.05.
3. The manager of an appliance store, after noting that the average daily sales was only
12 units, decided to adopt a new marketing strategy. Daily sales under this strategy
were recorded for 90 days after which period the average was found to be 15 units
with a standard deviation of 4 units. Does this indicate that the new marketing
strategy increased the daily sales? Employ α = 0.01.
4. The daily wages in a particular industry are normally distributed with a mean of
P66.00. In a random sample of 144 workers of a very large company in this industry,
the average daily wage was found to be P62.00 with a standard deviation of P12.50.
Can this company be accused of paying inferior wages at the 0.01 level of
significance?
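When raw data are given, as in problem 1 (powdered milk), the sample mean and standard deviation have to be computed first. The sketch below does this with the standard library. The critical value 2.821 (one-tailed, α = 0.01, 9 degrees of freedom) is taken from a standard t table and is not given in the problem, so treat it as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

net_weights = [253, 248, 252, 245, 247, 249, 251, 250, 247, 248]  # grams
MU_0 = 250               # advertised net weight
T_ALPHA_ASSUMED = 2.821  # assumed table value: alpha = 0.01, df = 9

n = len(net_weights)
x_bar = mean(net_weights)
s = stdev(net_weights)   # sample standard deviation (divides by n - 1)
t_c = (x_bar - MU_0) / (s / sqrt(n))

# Left-tailed test, Ha: mu < 250. Reject Ho if T_c < -T_alpha.
reject_h0 = t_c < -T_ALPHA_ASSUMED
print(x_bar, round(s, 3), round(t_c, 3), reject_h0)
```

The sample mean is 249 grams and the computed statistic is about −1.27, which does not reach the assumed rejection region.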
III. ANOVA
Sample Problems:
a. A researcher wishes to know if there are differences in the average preparation time of
four methods of preparing a solvent.
b. An agriculturist may compare the average yields of three corn varieties used by Los
Banos
c. A consumer wishes to know if the different brands of gasoline in the market are equally
good with respect to average mileage
d. A medical researcher is interested in comparing the effectiveness of 3 different
treatments to lower the cholesterol of patients with high values
e. An ecologist wants to compare the amount of certain pollutant in five rivers
SAMPLE PROBLEMS
A. Dependent or Paired
a. Ho: The weights before and after are equal therefore the procedure is not
effective.
Ha: The weights before and after are not equal therefore the procedure is
effective.
B. Independent
2. Some statistics students complain that pocket calculators give other students
an advantage during statistics examinations. To check this contention, a simple
random sample of 45 students was randomly assigned to two groups, 23 to
use calculators and 22 to perform calculations by hand. The students then
took a statistics examination that required a modest amount of arithmetic.
The results are shown below:
With Calculator: 85 86 89 84 82 83 90 91 86 90 87 87 92 85 86 89 88 88 89 90 85 89 90
Without Calculator: 86 88 90 92 86 85 88 89 85 91 86 85 92 84 83 88 90 91 86 90 86 87
Do the data provide sufficient evidence to indicate that the students taking
this particular examination obtain higher scores when using a calculator? Test at
α = 10%.
b. Decision rule: Reject Ho if T-computed > critical value, otherwise accept Ho.
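A hedged sketch of the T-computed value for the calculator problem, assuming the pooled (equal-variance) two-sample t test is the intended method. The handout does not show the formula at this point, so that choice and the function name `pooled_t` are assumptions.

```python
from math import sqrt
from statistics import mean, variance

with_calc = [85, 86, 89, 84, 82, 83, 90, 91, 86, 90, 87, 87,
             92, 85, 86, 89, 88, 88, 89, 90, 85, 89, 90]
without_calc = [86, 88, 90, 92, 86, 85, 88, 89, 85, 91, 86,
                85, 92, 84, 83, 88, 90, 91, 86, 90, 86, 87]

def pooled_t(sample_1, sample_2):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / standard_error, df

t_c, df = pooled_t(with_calc, without_calc)
print(round(mean(with_calc), 2), round(mean(without_calc), 2), round(t_c, 3), df)
```

The sample mean with a calculator is slightly lower than the mean without one, so T-computed is negative and cannot exceed a positive critical value in a right-tailed test.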
ANOVA
3. A study was conducted to compare the three teaching methods. Three groups
of 6 students were chosen and each group is subjected to one of three types of
teaching method. The grades of the students taken at the end of the semester
are given as:
b. Decision rule: Reject Ho if F-computed > critical value, otherwise accept Ho.
f. Conclusion: There is evidence to say that the three methods are not equal.
We can also conclude that Method III is more effective since its students got higher
grades compared to the other two methods.
4. It is believed that people with high blood pressure need to watch their weight.
A random sample of 300 subjects was classified according to their weight and
blood pressure. At the 5% level of significance, is there sufficient evidence to
conclude that a person’s weight is related to his blood pressure?
Blood Pressure
Weight High Normal Low
Overweight 40 34 18
Normal 36 77 27
Underweight 16 33 19
b. Decision rule: Reject Ho if χ²-computed > critical value, otherwise accept Ho.
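The chi-square value for the weight and blood pressure table can be computed from the observed counts and the expected counts (row total × column total ÷ grand total). In the sketch below, the critical value 9.488 (α = 0.05, 4 degrees of freedom) comes from a standard chi-square table rather than from the handout, and the names are illustrative.

```python
# Rows: Overweight, Normal, Underweight. Columns: High, Normal, Low blood pressure.
observed = [
    [40, 34, 18],
    [36, 77, 27],
    [16, 33, 19],
]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2_c, df = chi_square_independence(observed)
CRITICAL_VALUE_ASSUMED = 9.488  # assumed table value: alpha = 0.05, df = 4
reject_h0 = chi2_c > CRITICAL_VALUE_ASSUMED
print(round(chi2_c, 2), df, reject_h0)
```

With these counts the statistic is about 12.75 on 4 degrees of freedom, which exceeds the assumed critical value.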
Subject 1 2 3 4 5 6 7 8 9 10 11 12
Initial Weight 120 141 130 162 150 148 135 140 129 120 140 130
3-Month Weight 123 143 140 162 145 150 140 143 130 118 141 132
Source: Basic Statistics for Health Sciences by Kuzma
d. Ho:
Ha:
e. Test Statistic:
f. Decision Rule:
h. Decision:
i. Conclusion:
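For the weight data of the 12 subjects, the computation step can be sketched as a paired (dependent) t test on the differences. That a paired test is intended is an assumption based on the before and after layout of the table; the hypotheses and critical value are left to be filled in as in the outline above.

```python
from math import sqrt
from statistics import mean, stdev

initial = [120, 141, 130, 162, 150, 148, 135, 140, 129, 120, 140, 130]
three_month = [123, 143, 140, 162, 145, 150, 140, 143, 130, 118, 141, 132]

# Difference for each subject: 3-month weight minus initial weight.
differences = [after - before for before, after in zip(initial, three_month)]

n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)          # sample standard deviation of the differences
t_c = d_bar / (s_d / sqrt(n))     # tests Ho: mean difference = 0
df = n - 1

print(differences)
print(round(d_bar, 3), round(s_d, 3), round(t_c, 3), df)
```

The computed value, about 1.745 with 11 degrees of freedom, is then compared with the table value for the chosen α and direction of Ha.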
2. An investment analyst claims to have mastered the art of forecasting the price
changes of gold. The following table gives the actual gold price changes and the
changes forecasted by the investment analyst (in %) for a simple random
sample of 8 months. Use α = 5%.
Month 1 2 3 4 5 6 7 8
Actual Price Changes 7.3 -2.1 8.5 -1.5 9.2 6.7 -4.8 -0.8
Forecasted Changes 14.9 -19.7 7.0 -5.3 1.0 -0.8 -8.3 6.7
a. Ho:
Ha:
b. Test Statistic:
o. Decision Rule:
q. Decision:
r. Conclusion:
a. Ho:
Ha:
b. Test Statistic:
c. Decision Rule:
e. Decision:
f. Conclusion:
Academic
Performance
Passed 31 45 4 80
Failed 1 4 15 20
Total 32 49 19 100
a. Ho:
Ha:
b. Test Statistic:
c. Decision Rule:
e. Decision:
f. Conclusion:
Example 1. A research study was conducted to examine the impact of eating a high
protein breakfast on adolescents’ performance on a physical education
fitness test. Half of the subjects received a high protein breakfast and half were given
a low protein breakfast. All of the adolescents, both male and female, were given a
fitness test with high scores representing better performance. Test scores are
recorded below.
Males Females
High Protein Low Protein High Protein Low Protein
10 5 5 3
7 4 4 4
9 7 6 5
6 4 3 1
8 5 2 2
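The F ratios for this example can be sketched from the sums of squares of a balanced two-way design. The code assumes the four columns are, in order, Males/High, Males/Low, Females/High and Females/Low, as the header suggests; `two_way_anova` is an illustrative helper, not a library function.

```python
from statistics import mean

# cells[(gender, breakfast)] = fitness scores (5 per cell, balanced design).
cells = {
    ("male", "high"): [10, 7, 9, 6, 8],
    ("male", "low"): [5, 4, 7, 4, 5],
    ("female", "high"): [5, 4, 6, 3, 2],
    ("female", "low"): [3, 4, 5, 1, 2],
}

def two_way_anova(cells):
    """F ratios for factor A, factor B and the A x B interaction (balanced data)."""
    factor_a = sorted({a for a, _ in cells})
    factor_b = sorted({b for _, b in cells})
    all_values = [x for values in cells.values() for x in values]
    grand_mean = mean(all_values)

    def ss_between(groups):
        # Sum over groups of (group size) * (group mean - grand mean)^2.
        return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    ss_a = ss_between([[x for b in factor_b for x in cells[(a, b)]] for a in factor_a])
    ss_b = ss_between([[x for a in factor_a for x in cells[(a, b)]] for b in factor_b])
    ss_cells = ss_between(list(cells.values()))
    ss_ab = ss_cells - ss_a - ss_b
    ss_within = sum((x - grand_mean) ** 2 for x in all_values) - ss_cells

    df_a, df_b = len(factor_a) - 1, len(factor_b) - 1
    df_within = len(all_values) - len(cells)
    ms_within = ss_within / df_within
    return {
        "F_A": (ss_a / df_a) / ms_within,              # gender
        "F_B": (ss_b / df_b) / ms_within,              # breakfast type
        "F_AB": (ss_ab / (df_a * df_b)) / ms_within,   # interaction
        "df_within": df_within,
    }

result = two_way_anova(cells)
print({name: round(value, 3) for name, value in result.items()})
```

Each F ratio has 1 numerator degree of freedom and 16 denominator degrees of freedom here, and is compared with the tabulated F value at the 5% or 1% level.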
5% 1%
Interpretation:
Seatwork:
1. Different typing skills are required for secretaries depending on whether one is
working in a law office, an accounting firm, or a mathematical research group at a
major university. In order to evaluate candidates for these positions, an employment
agency administers three distinct standardized typing samples. A time penalty has been
incorporated into the scoring of each sample based on the number of typing errors. The
mean and standard deviation for each test, together with the score achieved by a recent
applicant, are given in the table below. For what type of position does this applicant seem
to be best suited?
2. Researchers have sought to examine the effect of various types of music on agitation
levels in patients who are in the early and middle stages of Alzheimer’s disease. Patients
were selected to participate in the study based on their stage of Alzheimer’s disease.
Three forms of music were tested: easy listening, Mozart, and piano interludes. While
listening to music, agitation levels were recorded for the patients with a high score
indicating a higher level of agitation. Scores are recorded below.
21 9 29 22 14 15
24 12 26 20 18 18
22 10 30 25 11 20
18 5 24 18 9 13
20 9 26 20 13 19
3. A study examining differences in life satisfaction between young adult, middle adult
and older adult men and women was conducted. Each individual who participated in
the study completed a life satisfaction questionnaire. A high score on the test indicates
a higher level of life satisfaction. Test scores are recorded below.
Males Females
Young Adult Middle Adult Older Adult Young Adult Middle Adult Older Adult
4 7 10 7 8 10
2 5 7 4 10 9
3 7 9 3 7 12
4 5 8 6 7 11
2 6 11 5 8 13
Mean = 3 6 9 5 8 11
Scale ( +/ -) Decision
Table. Results of the AdNU Entrance Examination for 20 Examinees
No. SAI RPM Math English
1 52 25 47 21
2 84 40 48 11
3 113 90 58 29
4 92 90 47 14
5 98 80 54 17
6 91 80 56 19
7 52 15 52 18
8 116 40 68 38
9 101 60 69 22
10 83 15 48 16
11 65 10 52 16
12 96 95 54 19
13 94 80 54 15
14 89 65 56 20
15 91 45 54 21
16 92 80 64 17
17 101 95 58 33
18 97 95 56 17
19 89 80 56 11
20 96 95 58 27