
Statistics Handouts

Page 1 of 92

MANUAL IN STATISTICS
… statistics made simple …
18th edition

Ms. Yumi Vivien V. De Luna, MSME


Subject Teacher

TABLE OF CONTENTS

Exercise No. Title Page

1 Variables and the Summation Notation 6

2 Frequency Distribution Table 17

3 Numerical Descriptive Measures 31

4 Weighted Means 38

5 FPC, Combination and Permutation 54

6 Probability 60

7 Normal Distribution 68

8 Test of Hypothesis I 77

9 Test of Hypothesis II 80

Lesson No. Title Page

1 Methods of Data Collection and Presentation 7

2 Frequency Distribution Table 14

3 Numerical Descriptive Measures 19

4 Weighted Means 32

5 Sampling 40

6 FPC, Combinations and Permutations 51

7 Probability 55

8 Normal Distribution 66

9 Estimation 69

10 Test of Hypothesis 72

11 Two-way ANOVA 84

12 Pearson Moment Correlation 88



Sources/References:

Concepts, sample problems and information given in this manual were taken from the following:

1. Fundamental Statistics for College Students by Pagoso, et al.

2. Graduate Research Manual – Guide to thesis and Dissertations (Aquinas Graduate School)

3. How to Design and Evaluate Research Education by Fraenkel and Wallen

4. Introduction to Statistics by Walpole

5. Introduction to Statistical Methods by Parel, Alonzo, et al.

6. Laboratory Manual in Statistics I, UPLB

7. Manual on Training on Microcomputer-Based for the Social Sciences (Richie Fernando Hall AdeNU, 2005)

8. Statistics for the Health Sciences by Kuzma

9. Applied Basic Statistics by Flordeliza Reyes

10. Fundamental Concepts and Methods in Statistics by George Garcia

11. Simplified Statistics for Beginners by Dr. Cesar Bermundo

12. http://statistics.about.com/od/Descriptive-Statistics/a/What-Is-Kurtosis.htm

I. Statistics and its Scope

STATISTICS encompasses all the methods and procedures used in the collection, presentation, analysis and interpretation of data.

DESCRIPTIVE STATISTICS comprises those methods concerned with collecting and describing a set of data so as to yield meaningful information.

STATISTICAL INFERENCE comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.

• Population vs Sample
Population is the set of all entities and elements under study. Sample is a subset of the population.

• Parameters vs Statistics
Parameters refer to all descriptive measures or characteristics of the population, while statistics refer to sample characteristics.

• Census vs Survey
Census is the process of gathering information from every element of the population, while survey is the process of gathering information from every element of the sample.

II. Variables and Their Levels of Measurement

A variable is an observable characteristic of a person or object which is capable of taking several values or of being expressed in several different categories. It can be either quantitative (discrete or continuous) or qualitative.

MEASUREMENT SCALES
a. Nominal – simply labels, names or categories. Number assignment is used for identification purposes only; no meaning can be attached to the magnitude or size of such numbers. Examples are gender, civil status, telephone numbers, etc.
b. Ordinal – whereas nominal scales only classify, ordinal scales not only classify but also order the classes. Examples are job position, military ranks, etc.
c. Interval – quantitative but has no true zero point. Examples are IQ, room temperature, etc.
d. Ratio – quantitative and has a true zero point. Examples are number of children, physics test scores, etc.

SUMMATION NOTATION

For a given universe, suppose we observe a variable, say X. We may denote the first value as $X_1$, the second as $X_2$, and so on. In general, $X_i$ is the observation on variable X made on the ith individual.

Given a set of N observations or data values represented by $X_1, X_2, \ldots, X_N$, we express their sum as

\[ \sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N \]

where Σ is the summation symbol;
i is the index of the summation;
$X_i$ is the summand;
1 is the lower limit; and
N is the upper limit.

Theorem 1. If c is a constant, then

\[ \sum_{i=1}^{N} cX_i = c \sum_{i=1}^{N} X_i \]

Theorem 2. If c is a constant, then

\[ \sum_{i=1}^{N} c = Nc \]

Theorem 3. If a and b are constants, then

\[ \sum_{i=1}^{N} (aX_i \pm bY_i) = a \sum_{i=1}^{N} X_i \pm b \sum_{i=1}^{N} Y_i \]
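The three summation theorems can be checked numerically. The sketch below uses small made-up values for X, Y and the constants a, b, c (none of these numbers come from the manual); it only illustrates that both sides of each theorem agree.

```python
# Numerical check of the three summation theorems on hypothetical data.
X = [2, 4, 6, 8]        # made-up observations X_1..X_N
Y = [1, 3, 5, 7]        # made-up observations Y_1..Y_N
N = len(X)
a, b, c = 3, 2, 5       # arbitrary constants

# Theorem 1: sum(c * X_i) equals c * sum(X_i)
lhs1 = sum(c * x for x in X)
rhs1 = c * sum(X)

# Theorem 2: adding the constant c a total of N times equals N * c
lhs2 = sum(c for _ in range(N))
rhs2 = N * c

# Theorem 3 (with the plus sign): sum(a*X_i + b*Y_i) equals a*sum(X_i) + b*sum(Y_i)
lhs3 = sum(a * x + b * y for x, y in zip(X, Y))
rhs3 = a * sum(X) + b * sum(Y)

print(lhs1, rhs1)
print(lhs2, rhs2)
print(lhs3, rhs3)
```

Each printed pair shows the left-hand and right-hand side of one theorem; the two numbers in a pair should match.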

Exercise # 1 – Variables and the Summation Notation

At the end of this exercise, the student must be able to:

1. identify different types of variables
2. classify data according to level of measurement
3. employ summation notation

I. Identify the level of measurement.

A. From all patients admitted to a hospital, the following information is collected:
1. name of patient
2. age
3. sex
4. body temperature
5. blood pressure
6. amt. of deposit
7. first time to see a doctor regarding ailment? (yes/no)
8. heartbeat per minute
9. weight
10. height
11. no. of glasses of fluid intake per day
12. no. of meals taken in a day

B. The following information is of interest for selected students of AdeNU who are cigarette smokers.
1. age when first smoked
2. average no. of sticks consumed per day
3. main source of allowance
4. amt. of weekly allowance
5. Is your father a smoker? (yes/no)
6. occupation of father
7. brand of cigarette
8. position in the family

II. Instruction will be given by your teacher.

Data Set 1. Data on head circumference (in cm) and foot length (in cm) of 8 newborn babies.

Baby no.                  1     2     3     4     5     6     7     8
Head circumference (x)    31.5  33    37.5  38.5  35    32    38    34
Foot length (y)           5.6   6.2   6.8   6.6   6.4   5.4   6.0   6.1

Data Set 2. Data on height (in cm) and weight (in lbs) of 8 stat students.

Student no.               1     2     3     4     5     6     7     8
Height (x)                168   141   165   180   165   156   150   147
Weight (y)                110   90    120   125   142   97    105   110

Lesson #1 – Methods of Data Collection and Presentation

METHODS OF DATA COLLECTION


Various methods for data gathering are available. A researcher should be able to use the most appropriate one.

1. Survey Method – questions are asked to obtain information, either through a self-administered questionnaire or an interview (personal, telephone or internet).

Ways, Advantages and Disadvantages

Personal Interview
Advantages:
• flexibility in obtaining answers
• more in-depth answers
• can observe the respondent’s behavior
Disadvantages:
• expensive
• field interviews are hard to control
• errors in interviewing
• time consuming

Mailed Questionnaires
Advantages:
• wider geographic distribution of respondents possible
• respondents can answer at their convenience
• no personal interviewer’s bias
• centralized control of people doing the survey
• relatively inexpensive
• respondent may be more candid if he/she can answer anonymously
Disadvantages:
• response rate may be low
• hard to obtain in-depth information
• usable mailing list may be unavailable
• respondent may not be the addressee
• cannot observe respondent’s behavior

Phone Interview
Advantages:
• relatively inexpensive
• fast
• centralized control of people doing the survey
• respondents may be more candid
Disadvantages:
• unlisted telephone numbers
• outdated telephone directory
• interview time needs to be relatively short
• selected sample may not have telephones

2. Observation Method – makes possible the recording of behavior, but only at the time of occurrence (e.g., observing reactions to a particular stimulus, traffic count).

Advantages over the Survey Method:
• does not rely on the respondent’s willingness to provide information
• certain types of data can be collected only by observation (e.g., behavior patterns of which the subject is not aware or which the subject is ashamed to admit)
• the potential bias caused by the interviewing process is reduced or eliminated

Disadvantages compared with the Survey Method:
• things such as awareness, beliefs, feelings and preferences cannot be observed
• the observed behavior patterns can be rare or too unpredictable, thus increasing the data collection costs and time requirements

3. Experimental Method – a method designed for collecting data under controlled conditions. An experiment is an operation where there is actual human interference with the conditions that can affect the variable under study. This is an excellent method of collecting data for causation studies. If properly designed and executed, experiments will reveal, with a good deal of accuracy, the effect of a change in one variable on another variable.

4. Use of Existing Studies – e.g., census, health statistics, and weather bureau reports

Two types:
• documentary sources – published or written reports, periodicals, unpublished documents, etc.
• field sources – researchers who have done studies on the area of interest are asked personally or directly for the information needed

5. Registration Method – e.g., car registration, student registration, and hospital admission

METHODS OF DATA PRESENTATION

1. Textual Form – data are incorporated into a paragraph.

Advantages:
• This method is appropriate only if there are few numbers to be presented.
• Gives emphasis to significant figures and comparisons

Disadvantages:
• It is not desirable to include a big mass of quantitative data in a “text” or paragraph, as the presentation becomes incomprehensible.
• Paragraphs can be tiresome to read, especially if the same words are repeated many times.

2. Tabular Presentation – systematic organization of data in rows and columns

Advantages:
• More concise than textual presentation
• Easier to understand
• Facilitates comparisons and analysis of relationships among different categories
• Presents data in greater detail than a graph

PARTS OF A STATISTICAL TABLE:

a. Heading – consists of a table number, title and head note. The title explains what is presented, where the data refer to and when the data apply.

b. Box Head – contains the column heads, which describe the data in each column, together with the needed classifying and qualifying spanner heads.

c. Stub – the classifications or categories found at the left. It describes the data found in the rows of the table.

d. Field – the main part of the table

e. Source Note – an exact citation of the source of the data presented in the table (should always be included when the figures are not original)

Illustration:

[HEADING]
Table 4.4
Philippines Crime Volume and Rate by Type in 1991

[BOXHEAD]
                                  1991
Type                     Volume       Crime Rate

[STUB]                   [FIELD]
Total                    11,326       195

Index Crimes             77,261       124
   Murder                8,707        8,707
   Homicide              8,068        8,069
   Physical Injury       21,862       21,862
   Robbery               13,817       13,817
   Theft                 22,780       88,780
   Rape                  2,026        2,026

Non Index Crimes         44,065       71

[SOURCE NOTE]
Source: Philippines National Police

Guidelines:
• The title should be concise and written in telegraphic style, not as a complete sentence.
• Column labels should be precise.
• Categories should not overlap.
• The unit of measure must be clearly stated.
• Show any relevant totals, subtotals, percentages, etc.
• Indicate if the data were taken from another publication by including a source note.
• Tables should be self-explanatory, although they may be accompanied by a paragraph that provides an interpretation or directs attention to important figures.

3. Graphical Presentation – a graph or chart is a device for showing numerical values or relationships in pictorial form

Advantages:
• main features and implications of a body of data can be grasped at a glance
• can attract attention and hold the reader’s interest
• simplifies concepts that would otherwise have been expressed in many words
• can readily clarify data and frequently brings out hidden facts and relationships

Common Types of Graph

a. Line Chart – graphical presentation of data especially useful for showing trends over a period of time.

b. Pie Chart – a circular graph that is useful in showing how a total quantity is distributed among a group of categories. The “pieces of the pie” represent the proportions of the total that fall into each category.

c. Bar Chart – consists of a series of rectangular bars. If the bars are arranged horizontally, the length of each bar represents the quantity or frequency for each category. If the bars are arranged vertically, the height of each bar represents the quantity.

d. Pictorial Unit Chart – a pictorial chart in which each symbol represents a definite and uniform value

THE STEM-AND-LEAF DISPLAY

The stem-and-leaf display is an alternative method for describing a set of data. It presents a histogram-like picture of the data, while allowing the experimenter to retain the actual observed values of each data point. Hence, the stem-and-leaf display is partly tabular and partly graphical in nature.

In creating a stem-and-leaf display, we divide each observation into two parts, the
stem and the leaf. For example, we could divide the observation 244 as follows:

Stem Leaf
2 ⋮ 44

Alternatively, we could choose the point of division between the units and tens,
whereby

Stem Leaf
24 ⋮ 4

The choice of the stem and leaf coding depends on the nature of the data set.

Steps in Constructing the Stem-and-Leaf Display

1. List the stem values, in order, in a vertical column.
2. Draw a vertical line to the right of the stem values.
3. For each observation, record the leaf portion of that observation in the row corresponding to the appropriate stem.
4. Reorder the leaves from lowest to highest within each stem row. Maintain uniform spacing for the leaves so that the stem with the largest number of observations has the longest line.
5. If the number of leaves appearing in each row is too large, divide the stem into two groups, the first corresponding to leaves beginning with digits 0 through 4 and the second corresponding to leaves beginning with digits 5 through 9. This subdivision can be increased to five groups if necessary.
6. Provide a key to your stem-and-leaf coding so that the reader can recreate the actual measurements from your display.

Example: Typing speeds (net words per minute) for 20 secretarial applicants

68 72 91 47
52 75 63 55
65 35 84 45
58 61 69 22
46 55 66 71

Stem | Leaf (unit = 1)

2 | 2
3 | 5
4 | 5 6 7
5 | 2 5 5 8
6 | 1 3 5 6 8 9
7 | 1 2 5
8 | 4
9 | 1

Note: The stem-and-leaf display should include a reminder indicating the units of the data values.

Example:

Unit = 0.1:  1 | 2 represents 1.2
Unit = 1:    1 | 2 represents 12
Unit = 10:   1 | 2 represents 120
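The construction steps can also be sketched in code. The following is a minimal illustration, assuming the split is made between the tens and the units digit (unit = 1); the function name `stem_and_leaf` is a hypothetical helper, not something defined in this manual.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (tens) and list their sorted leaves (units)."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)      # e.g. 68 -> stem 6, leaf 8
        rows[stem].append(leaf)
    # Stems in ascending order, leaves from lowest to highest within each stem
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}

# Typing speeds (net words per minute) of the 20 secretarial applicants
speeds = [68, 72, 91, 47, 52, 75, 63, 55, 65, 35,
          84, 45, 58, 61, 69, 22, 46, 55, 66, 71]

display = stem_and_leaf(speeds)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 6 | 8 represents 68 (unit = 1)")
```

The printed rows should agree with the display shown for the typing-speed example.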

Lesson #2 – Frequency Distribution Table

Data Set. Given below is the distribution of statistics test scores of 50 students. (The perfect score is 70 and the passing score is 60% of it.)

5 20 21 24 27 30 35 38 45 55
8 20 21 25 28 30 35 39 47 58
10 20 23 25 29 32 36 40 48 59
18 20 23 25 29 35 36 40 49 60
19 21 23 26 30 35 37 40 50 70

Steps in the construction of a frequency distribution:

1. Determine the range R of the distribution.

R = highest observed value – lowest observed value
  = 70 – 5
  = 65

2. Determine the number of classes, K, desired, using the square root rule:

$K = \sqrt{N}$, where N = total number of observations
$K = \sqrt{50} = 7.07$
$K \approx 7$

• The number of classes is to be rounded off to the nearest WHOLE NUMBER.

3. Calculate the class size, c.

First find: $c' = R/K = \dfrac{65}{7} = 9.29 \approx 9$

The class size is to have the same precision as the raw data and should take the value nearest to c'. Hence, c = 9.

4. Enumerate the classes or categories based on the quantities calculated in steps 1-3, bearing in mind that:

a) the lowest class must include the lowest observed value, and the highest class, the highest observed value (the lowest value of the data is the lower class limit of the first class); and
b) each observation will go into one and only one class (none of the values can fall into possible gaps between successive classes, and the classes do not overlap).

• Successive lower class limits may be obtained by adding c to the preceding lower class limit. The same applies to the upper class limits.

I. Tally the observations to determine the class frequency, or the number of observations falling into each class.

Classes Frequency
5 - 13 3
14 - 22 9
23 - 31 15
32 - 40 13
41 - 49 4
50 - 58 3
59 - 67 2
68 - 76 1
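The steps for building the classes and tallying the frequencies can be sketched as follows. This is only an illustration under the assumptions that the data are whole numbers (unit = 1) and that Python's built-in `round` is an acceptable stand-in for rounding to the nearest whole number; the variable names are chosen for this sketch.

```python
import math

# Statistics test scores of the 50 students
scores = [5, 20, 21, 24, 27, 30, 35, 38, 45, 55,
          8, 20, 21, 25, 28, 30, 35, 39, 47, 58,
          10, 20, 23, 25, 29, 32, 36, 40, 48, 59,
          18, 20, 23, 25, 29, 35, 36, 40, 49, 60,
          19, 21, 23, 26, 30, 35, 37, 40, 50, 70]

N = len(scores)
R = max(scores) - min(scores)     # step 1: range
K = round(math.sqrt(N))           # step 2: square root rule
c = round(R / K)                  # step 3: class size (whole-number data)

# Step 4: enumerate classes starting from the lowest observed value,
# then tally how many observations fall in each class.
classes = []
lower = min(scores)
while lower <= max(scores):
    upper = lower + c - 1         # upper limit for data recorded in units of 1
    freq = sum(lower <= s <= upper for s in scores)
    classes.append((lower, upper, freq))
    lower += c

for lower, upper, freq in classes:
    print(f"{lower:>2} - {upper:<2}  {freq}")
```

Because the classes must reach the highest observed value (70), the loop may produce one more class than K suggests, as happens in the tally table above.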

II. Add other informative columns.

1. True Class Boundaries (TCB) – remove discontinuity between classes and consider the true range of values.

(Lower TCB) LTCB = LL – 0.5(unit)
(Upper TCB) UTCB = UL + 0.5(unit)

• A unit depends on the precision of the data.

Example. 1st class: LTCB = 5 – 0.5(1) = 4.5
                    UTCB = 13 + 0.5(1) = 13.5

Note:
If data                   Unit of precision
is a whole number         1
has 1 decimal place       0.1
has 2 decimal places      0.01

2. Class Mark (CM) – the center of a class. It is the midpoint of the class interval, about which observations in a class tend to cluster.

\[ CM = \frac{LTCB + UTCB}{2} \quad \text{or} \quad CM = \frac{LL + UL}{2} \]

3. Relative Frequency (RF) – proportion of observations falling in one class (in %)

\[ RF = \frac{\text{frequency}}{N} \times 100\% \]
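As an illustration of these three columns, the sketch below computes the true class boundaries, class marks and relative frequencies for the eight classes of the test-score example. It assumes whole-number data (unit = 1); the tuple layout and variable names are choices made for this sketch only.

```python
# (lower limit, upper limit, frequency) for each class of the test-score example
classes = [(5, 13, 3), (14, 22, 9), (23, 31, 15), (32, 40, 13),
           (41, 49, 4), (50, 58, 3), (59, 67, 2), (68, 76, 1)]

unit = 1                                  # precision of whole-number data
N = sum(freq for _, _, freq in classes)   # total number of observations

rows = []
for ll, ul, freq in classes:
    ltcb = ll - 0.5 * unit                # lower true class boundary
    utcb = ul + 0.5 * unit                # upper true class boundary
    cm = (ll + ul) / 2                    # class mark (midpoint of the class)
    rf = 100 * freq / N                   # relative frequency in percent
    rows.append((ltcb, utcb, cm, rf))

for (ll, ul, freq), (ltcb, utcb, cm, rf) in zip(classes, rows):
    print(f"{ll:>2} - {ul:<2}  {ltcb:>5} - {utcb:<5}  CM={cm:<5} f={freq:<3} RF={rf}%")
```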

FREQUENCY DISTRIBUTION TABLE

Classes      True Class Boundaries (TCB)    CM    Freq    RF (%)    CF         RCF
LL – UL      LTCB – UTCB                                            <    >     <    >
5 – 13       4.5 – 13.5                     9     3       6
14 – 22      13.5 – 22.5                    18    9       18
23 – 31      22.5 – 31.5                    27    15      30
32 – 40      31.5 – 40.5                    36    13      26
41 – 49      40.5 – 49.5                    45    4       8
50 – 58      49.5 – 58.5                    54    3       6
59 – 67      58.5 – 67.5                    63    2       4
68 – 76      67.5 – 76.5                    72    1       2
Total                                             50      100

Exercise # 2 – Frequency Distribution Table


Objectives:
At the end of the exercise, the student is expected to:
1. describe the different methods of data presentation;
2. organize data by constructing a frequency distribution table

A. On organizing data: Construct an FDT for the given data. Show computations for R, K and c.

Table 1.
Blood Glucose of 20 individuals of the Honolulu Heart Center, 1969

ID no.   Blood Glucose (in mg)

1 107
2 145
3 237
4 91
5 185
6 106
7 177
8 120
9 116
10 105
11 109
12 186
13 257
14 218
15 164
16 158
17 117
18 130
19 132
20 138

Table 2.
Socio-Economic Characteristics of 30 Countries as of January 1997

Obsn. No.   Country   Life Expectancy

1 Japan 80
2 Australia 78
3 Canada 78
4 Hongkong 78
5 Italy 78
6 Switzerland 78
7 France 77
8 US 77
9 Britain 76
10 Germany 76
11 New Zealand 76
12 Singapore 76
13 Brunei 75
14 Taiwan 75
15 Macau 73
16 Fiji 72
17 Malaysia 72
18 South Korea 72
19 Sri Lanka 72
20 China 71
21 Mexico 71
22 Saudi Arabia 70
23 Russia 69
24 Thailand 69
25 Iran 68
26 Brazil 67
27 Philippines 67
28 Turkey 67
29 Vietnam 67
30 Egypt 64

Lesson #3 – Numerical Descriptive Measures

NUMERICAL DESCRIPTIVE MEASURES

I. Measure of Location – a value within the range of the data which describes its location or position relative to the entire set of data. The more common measures are measures of central tendency, percentiles, deciles and quartiles.

A. Measure of Central Tendency – describes the “center” of the data. It is a single value about which the observations tend to cluster. The common measures are the mean, median and mode.

1. Mean ($\bar{x}$/$\mu$) – sum of the observations divided by the total number of observations

Characteristics:
• interval statistic
• calculated average
• value is determined by every case in the distribution
• affected by extreme values

When to Use:
• variables are in at least interval scale
• value of each score is desired
• values are considerably concentrated or close to each other

2. Median (Md) – middle value of an array

Characteristics:
• ordinal statistic
• rank or position average
• not affected by extreme values

When to Use:
• ordinal interpretation is needed
• middle score is desired
• we want to avoid the influence of extreme values

3. Mode (Mo) – observation which occurs most frequently in the data set

Characteristics:
• nominal statistic
• inspection average
• not unique; a data set may have more than one mode
• most “popular” score
• unaffected by extreme values
• represents the majority

When to Use:
• nominal interpretation is needed
• quick approximation of central tendency is desired

B. Percentile (Pi) – divides the data set into 100 equal parts, each part having one percent of all the data values. For example, if Patrick received a rating of 90th percentile in the National Secondary Achievement Test, this means that 90% of the students who took the test had scores lower than Patrick’s.

C. Decile (Di) – divides a data set into ten equal parts, each part having ten percent of all data values. The first decile is the 10th percentile, the second decile is the 20th percentile, and so on, up to the tenth decile, which is the 100th percentile.

D. Quartile (Qi) – divides a data set into four equal parts, each part having twenty-five percent of all data values. The first quartile is the 25th percentile, the second is the 50th percentile, the third is the 75th percentile, and the fourth quartile is the 100th percentile.

II. Measure of Dispersion – describes the extent to which the data are dispersed. The more commonly used measures are:

A. Range (R)
- not a stable measure of variation because it can fluctuate greatly with a change in just a single score, either the highest or the lowest
- easiest to compute but the LEAST SATISFACTORY because its value is dependent only upon the two extremes

B. Variance ($s^2$/$\sigma^2$)
- considers the position of each observation relative to the mean of the set; denoted by $\sigma^2$

C. Standard Deviation (s/$\sigma$)
- best measure of variation
- important as a measure of heterogeneity or unevenness within a set of observations
- used when comparing two or more sets of data having the same units of measurement

D. Coefficient of Variation (CV)
- used to compare the variability of 2 or more sets of data even when the observations are expressed in different units of measurement

III. Measure of Skewness (SK) – describes the extent of departure of the distribution of the data from symmetry.

SK = 0, Symmetric Distribution
• the median is the score point which bisects the total area; half of the area falls to the left and half to the right
• the mode is the score point with the highest frequency; this point on the x-axis corresponds to the tallest point of the curve
• the mean is the score point on the x-axis that corresponds to the point of balance

SK > 0, Positively Skewed
• the bump on the left indicates that the mode corresponds to a low value
• the tail extending to the right means that the mean, which is sensitive to each score value, will be pulled in the direction of the extreme scores and will have a high value
• the median, which is unaffected by extreme values, will have a value between the mode and the mean

SK < 0, Negatively Skewed
• the mean will have a lower numerical value than the median because the extremely low scores will pull the mean to the left
• the bump usually occurs at the right, indicating that the mode has a high numerical value
• the median will still be in the middle

IV. Measure of Kurtosis – measures the degree of peakedness of the distribution of the data, denoted by k. If the distribution of the data is bell-shaped, k = 3. If the shape of the distribution is relatively peaked, k > 3. If the shape is relatively flat, k < 3.

k = 3: A distribution that is peaked in the same way as any normal distribution, not just the standard normal distribution, is said to be mesokurtic. The peak of a mesokurtic distribution is neither high nor low; rather, it is considered to be a baseline for the two other classifications.

k > 3: A leptokurtic distribution is one that has kurtosis greater than a mesokurtic distribution. Leptokurtic distributions are identified by peaks that are thin and tall. The tails of these distributions, to both the right and the left, are thick and heavy. Leptokurtic distributions are named by the prefix "lepto" meaning "skinny."

k < 3: The third classification for kurtosis is platykurtic. Platykurtic distributions are those that have a peak lower than a mesokurtic distribution. Platykurtic distributions are characterized by a certain flatness to the peak, and have slender tails. The name of these types of distributions comes from the meaning of the prefix "platy" meaning "broad."
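The manual does not state a formula for k. One common choice, assumed here purely for illustration, is the moment-based kurtosis (the fourth central moment divided by the square of the second central moment), for which a normal distribution gives k = 3. The two small data sets below are made up for this sketch.

```python
def kurtosis(data):
    """Moment-based kurtosis m4 / m2**2 (an assumed formula; equals 3 for a normal curve)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5]                    # evenly spread, hypothetical values
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 9]      # tall center with long tails, hypothetical values

print(kurtosis(flat))     # below 3: platykurtic by the classification above
print(kurtosis(peaked))   # above 3: leptokurtic by the classification above
```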

FORMULAS FOR UNGROUPED DATA

Data Set 1: 115 115 120 120 120 125 125 130 300
Data Set 2: 115 115 120 120 120 125 125 125 130 130

Numerical Measures (compute each for Data 1 and Data 2)

1. Mean: $\mu = \dfrac{\sum X_i}{N}$

2. Median

3. Mode is determined by mere inspection.

4. Variance: $\sigma^2 = \dfrac{\sum X_i^2}{N} - \mu^2$, where $\mu$ is the mean of the ungrouped data

5. Standard Deviation: $\sigma$ = positive square root of the variance

6. Coefficient of Variation: $CV = \dfrac{\sigma}{\mu} \times 100\%$

7. Measure of Skewness: $SK = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$


Numerical Measures (compute each for Data 1 and Data 2)

8. $P_i = \dfrac{i}{100} \times N$

9. $D_i = \dfrac{i}{10} \times N$

10. $Q_i = \dfrac{i}{4} \times N$

Note:

MEDIAN

If n is odd, the median position equals (n + 1)/2, and the value of the [(n + 1)/2]th observation in the array is taken as the median, i.e.,

\[ Md = X_{\frac{n+1}{2}} \]

If n is even, the mean of the two middle values in the array is the median, i.e.,

\[ Md = \frac{X_{\frac{n}{2}} + X_{\frac{n}{2}+1}}{2} \]

where n is the number of observations.
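The ungrouped formulas for the mean, median, mode, variance, standard deviation, CV and SK can be tried out on the two data sets with a short sketch like the one below. The function name `describe` and the dictionary keys are choices made for this illustration; the variance uses the divide-by-N form given in the formulas above.

```python
import math
import statistics

def describe(data):
    """Ungrouped mean, median, mode(s), variance, SD, CV and SK (divide-by-N formulas)."""
    x = sorted(data)                       # the array: observations in ascending order
    n = len(x)
    mean = sum(x) / n
    if n % 2 == 1:
        median = x[(n + 1) // 2 - 1]       # the [(n+1)/2]th value (lists are 0-indexed)
    else:
        median = (x[n // 2 - 1] + x[n // 2]) / 2
    variance = sum(v * v for v in x) / n - mean ** 2
    sd = math.sqrt(variance)
    return {
        "mean": mean,
        "median": median,
        "modes": statistics.multimode(x),  # all most-frequent values
        "variance": variance,
        "sd": sd,
        "cv": sd / mean * 100,             # in percent
        "sk": 3 * (mean - median) / sd,
    }

data1 = [115, 115, 120, 120, 120, 125, 125, 130, 300]
data2 = [115, 115, 120, 120, 120, 125, 125, 125, 130, 130]

summary1 = describe(data1)
summary2 = describe(data2)
for name, summary in (("Data 1", summary1), ("Data 2", summary2)):
    print(name, {k: (round(v, 4) if isinstance(v, float) else v) for k, v in summary.items()})
```

Comparing the two summaries shows how the single extreme value 300 in Data Set 1 pulls the mean above the median, while the median itself is barely affected.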



FORMULAS FOR GROUPED DATA

Data Set

TCB (LTCB – UTCB)    CM (Xi)    Freq (fi)    fiXi     fiXi²      <CF
2.65 – 3.75          3.2        5            16                  5
3.75 – 4.85          4.3        4            17.2                9
4.85 – 5.95          5.4        8            43.2                17
5.95 – 7.05          6.5        3            19.5                20
7.05 – 8.15          7.6        12           91.2                32
8.15 – 9.25          8.7        8            69.6                40
Total                           40           256.7    1783.83

Numerical Measures and Computation

1. Mean

\[ \mu = \frac{\sum f_i X_i}{N} \]

where $f_i$ = frequency of the ith class
$X_i$ = class mark of the ith class
N = total no. of observations
K = number of classes

2. Median (Md)

NOTE: the median class is the class which contains the (N/2)th value of the array.

\[ Md = LTCB_{Md} + c\left(\frac{\frac{N}{2} - {<}CF_b}{F_{Md}}\right) \]

where $LTCB_{Md}$ = LTCB of the median class
c = class size
${<}CF_b$ = <CF of the class preceding the median class
$F_{Md}$ = frequency of the median class
N = total number of observations

3. Mode (Mo)

NOTE: the modal class is the class with the highest frequency.

\[ Mo = LTCB_{Mo} + c\left(\frac{F_{Mo} - F_b}{2F_{Mo} - F_b - F_a}\right) \]

where $LTCB_{Mo}$ = LTCB of the modal class
c = class size
$F_{Mo}$ = frequency of the modal class
$F_b$ = frequency of the class preceding the modal class
$F_a$ = frequency of the class following the modal class

4. Variance ($\sigma^2$)

\[ \sigma^2 = \frac{\sum f_i X_i^2}{N} - \mu^2 \]

where $f_i$ = frequency of the ith class
$X_i$ = class mark of the ith class
N = total number of observations
$\mu$ = mean of the grouped data

5. Standard Deviation ($\sigma$) = positive square root of the variance

6. Coefficient of Variation

\[ CV = \frac{\sigma}{\mu} \times 100\% \]

7. Measure of Skewness

\[ SK = \frac{3(\text{mean} - \text{median})}{\sigma} \]
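The grouped-data formulas for the mean, median, mode, variance, standard deviation, CV and SK can be applied to the six-class data set with a sketch such as the one below. One assumption is made explicit: the median class is taken to be the first class whose <CF reaches N/2. Variable names are chosen for this illustration.

```python
import math
from itertools import accumulate

ltcb = [2.65, 3.75, 4.85, 5.95, 7.05, 8.15]   # lower true class boundaries
cm = [3.2, 4.3, 5.4, 6.5, 7.6, 8.7]           # class marks X_i
freq = [5, 4, 8, 3, 12, 8]                    # class frequencies f_i
c = 1.1                                       # class size
N = sum(freq)
cf = list(accumulate(freq))                   # less-than cumulative frequencies (<CF)

# Mean: sum of f_i * X_i divided by N
mean = sum(f * x for f, x in zip(freq, cm)) / N

# Median: assumed median class = first class whose <CF is at least N/2
m = next(k for k, total in enumerate(cf) if total >= N / 2)
cf_before = cf[m - 1] if m > 0 else 0
median = ltcb[m] + c * (N / 2 - cf_before) / freq[m]

# Mode: modal class = class with the highest frequency
k = freq.index(max(freq))
f_before = freq[k - 1] if k > 0 else 0
f_after = freq[k + 1] if k < len(freq) - 1 else 0
mode = ltcb[k] + c * (freq[k] - f_before) / (2 * freq[k] - f_before - f_after)

# Variance, standard deviation, coefficient of variation, skewness
variance = sum(f * x ** 2 for f, x in zip(freq, cm)) / N - mean ** 2
sd = math.sqrt(variance)
cv = sd / mean * 100
sk = 3 * (mean - median) / sd

print(round(mean, 4), round(median, 4), round(mode, 4))
print(round(variance, 4), round(sd, 4), round(cv, 2), round(sk, 4))
```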


8. Percentiles

\[ P_i = LTCB_{P_i} + c\left(\frac{\frac{i}{100}N - {<}CF_b}{F_{P_i}}\right) \]

where $LTCB_{P_i}$ = LTCB of the $P_i$ class
c = class size
${<}CF_b$ = <CF of the class preceding the $P_i$ class
$F_{P_i}$ = frequency of the $P_i$ class
N = total number of observations

9. Deciles

\[ D_i = LTCB_{D_i} + c\left(\frac{\frac{i}{10}N - {<}CF_b}{F_{D_i}}\right) \]

where $LTCB_{D_i}$ = LTCB of the $D_i$ class
c = class size
${<}CF_b$ = <CF of the class preceding the $D_i$ class
$F_{D_i}$ = frequency of the $D_i$ class
N = total number of observations

10. Quartiles

\[ Q_i = LTCB_{Q_i} + c\left(\frac{\frac{i}{4}N - {<}CF_b}{F_{Q_i}}\right) \]

where $LTCB_{Q_i}$ = LTCB of the $Q_i$ class
c = class size
${<}CF_b$ = <CF of the class preceding the $Q_i$ class
$F_{Q_i}$ = frequency of the $Q_i$ class
N = total number of observations
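Since the percentile, decile and quartile formulas differ only in the divisor (100, 10 or 4), a single helper can cover all three. The sketch below is an illustration under the assumption that the fractile class is the first class whose <CF reaches (i/parts)N; `grouped_fractile` is a hypothetical name, and the data are the six classes of the grouped data set.

```python
from itertools import accumulate

def grouped_fractile(i, parts, ltcb, freq, c):
    """ith fractile of grouped data; parts = 100 (percentile), 10 (decile) or 4 (quartile)."""
    N = sum(freq)
    cf = list(accumulate(freq))            # less-than cumulative frequencies
    target = i * N / parts                 # (i/parts) * N
    # Assumed fractile class: first class whose <CF is at least the target
    k = next(idx for idx, total in enumerate(cf) if total >= target)
    cf_before = cf[k - 1] if k > 0 else 0
    return ltcb[k] + c * (target - cf_before) / freq[k]

ltcb = [2.65, 3.75, 4.85, 5.95, 7.05, 8.15]   # lower true class boundaries
freq = [5, 4, 8, 3, 12, 8]                    # class frequencies
c = 1.1                                       # class size

q3 = grouped_fractile(3, 4, ltcb, freq, c)
d6 = grouped_fractile(6, 10, ltcb, freq, c)
p90 = grouped_fractile(90, 100, ltcb, freq, c)
print(round(q3, 4), round(d6, 4), round(p90, 4))
```

Calling the helper with i = 2 and parts = 4 reproduces the grouped median, since the second quartile is the 50th percentile.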

FORMULAS:

1. Mean: $\mu = \dfrac{\sum f_i X_i}{N}$

2. Median: $Md = LTCB_{Md} + c\left(\dfrac{\frac{N}{2} - {<}CF_b}{F_{Md}}\right)$

3. Mode: $Mo = LTCB_{Mo} + c\left(\dfrac{F_{Mo} - F_b}{2F_{Mo} - F_b - F_a}\right)$

4. $P_i = LTCB_{P_i} + c\left(\dfrac{\frac{i}{100}N - {<}CF_b}{F_{P_i}}\right)$

5. $D_i = LTCB_{D_i} + c\left(\dfrac{\frac{i}{10}N - {<}CF_b}{F_{D_i}}\right)$

6. $Q_i = LTCB_{Q_i} + c\left(\dfrac{\frac{i}{4}N - {<}CF_b}{F_{Q_i}}\right)$

7. Variance: $\sigma^2 = \dfrac{\sum f_i X_i^2}{N} - \mu^2$

8. Standard Deviation: $\sigma = \sqrt{\text{variance}}$

9. $CV = \dfrac{\sigma}{\mu} \times 100\%$

10. $SK = \dfrac{3(\text{mean} - \text{median})}{\sigma}$

THE BOXPLOT

Definition. The boxplot is a graph that is very useful for displaying the following features of the data:
• location
• spread
• symmetry
• extremes
• outliers

Steps in Constructing a Boxplot

1. Construct a rectangle with one end at the first quartile and the other end at the third quartile.
2. Put a vertical line across the interior of the rectangle at the median.
3. Compute the interquartile range (IQR), lower fence (FL) and upper fence (FU), given by:
IQR = Q3 – Q1
FL = Q1 – 1.5 IQR
FU = Q3 + 1.5 IQR
4. Locate the smallest value contained in the interval [FL, Q1]. Draw a line from this value to Q1.
5. Locate the largest value contained in the interval [Q3, FU]. Draw a line from this value to Q3.
6. Values falling outside the fences are considered outliers and are usually denoted by “x”.

Remarks:

1. The height of the rectangle is arbitrary and has no specific meaning. If several boxplots appear together, however, the height is sometimes made proportional to the different sample sizes.
2. If an outlying observation is less than Q1 – 3 IQR or greater than Q3 + 3 IQR, it is identified with a circle at its actual location. Such an observation is called a far outlier.

Examples:

1. Data Set A: 1 15 21 22 24
10 18 22 23 25
14 20 22 24 28

2. Data Set B: 3 10 11 12 19
8 10 12 16 19
9 10 12 16 30
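The quantities needed to draw a boxplot (quartiles, IQR, fences, whisker ends and outliers) can be computed with a sketch like the following. The quartile convention is an assumption: `statistics.quantiles` with its default "exclusive" method is used here, and a different convention may give slightly different quartiles. The function name `boxplot_summary` is a choice made for this illustration.

```python
import statistics

def boxplot_summary(data):
    """Quartiles, IQR, fences, whisker ends and outliers for a boxplot."""
    x = sorted(data)
    # Assumed quartile convention: Python's default 'exclusive' method
    q1, q2, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    fl = q1 - 1.5 * iqr                       # lower fence
    fu = q3 + 1.5 * iqr                       # upper fence
    inside = [v for v in x if fl <= v <= fu]  # values within the fences
    outliers = [v for v in x if v < fl or v > fu]
    far = [v for v in outliers if v < q1 - 3 * iqr or v > q3 + 3 * iqr]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "fl": fl, "fu": fu,
            "whiskers": (min(inside), max(inside)),
            "outliers": outliers, "far_outliers": far}

data_a = [1, 15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28]
data_b = [3, 10, 11, 12, 19, 8, 10, 12, 16, 19, 9, 10, 12, 16, 30]

summary_a = boxplot_summary(data_a)
summary_b = boxplot_summary(data_b)
print(summary_a)
print(summary_b)
```

The whisker ends correspond to steps 4 and 5 of the construction, and the `outliers` list holds the values that would be marked with “x”.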

More Problems:

1. Suppose a teacher assigns the following weights to the various course requirements:

Assignment 15%
Project 25%
Midterms 20%
Finals 40%

The maximum score a student may obtain for each component is 100. Sheila obtains marks of 83 for assignment, 72 for project, 41 for midterms and 49 for the finals. Find her mean mark for the course.

2. Two of the quality criteria in processing butter cookies are the weight and color development
in the final stages of oven browning. Individual pieces of cookies are scanned by a
spectrophotometer calibrated to reflect yellow-brown light. The readout is expressed in per
cent of a standard yellow-brown reference plate and a value of 41 is considered optimal
(golden-yellow). The cookies were also weighed in grams at this stage. The means and
standard deviations of 30 sample cookies are presented below.

Mean sd
Color 41.1 10
Weight 17.7 3.2

Which of the two quality criteria is more varied?

3. The following are weight losses (in pounds) of 25 individuals who enrolled in a five-week
weight-control program:

2 3 3 4 4 4 5 5 6 7 7 8 8
8 9 9 9 9 10 10 10 11 11 11 12

Compute for the 3rd quartile, 7th decile, and 89th percentile.

Exercise #3 – Numerical Descriptive Measures


Objectives:
At the end of the exercise, the student is expected to identify and compute appropriate numerical descriptive measures for ungrouped and grouped data, specifically:
• measure of central tendency;
• measure of dispersion; and
• measure of skewness

A. Using your raw data set and the FDT you constructed in exercise # 2, compute for
the appropriate descriptive measures (ungrouped and grouped). Show solution for
grouped data only.

B. Construct these tables in your workbooks and summarize the values obtained.

I. Measure of Central Tendency

Mean Median Mode


ungrouped grouped ungrouped grouped ungrouped grouped

II. Measure of Dispersion

Range Variance Standard Deviation Coeff. Of Variation


ungrouped grouped Ungrouped grouped ungrouped Grouped ungrouped Grouped

III. Measure of Skewness

ungrouped Grouped

IV. Fractiles

P90 D6 Q3
Ungrouped Grouped Ungrouped Grouped Ungrouped Grouped

C. Interpret the obtained values for your mean, median and mode (ungrouped data
only).

Lesson # 4 – Weighted Means

Weighted Means

• A weighted mean is a statistical measure obtained when data are gathered from a survey questionnaire
that uses a Likert scale. Each scale value is multiplied by the number of respondents who chose it, and
the sum of these products is divided by the total number of respondents.

• A Likert scale is a psychometric scale commonly used in questionnaires and is the most widely
used scale in survey research. When responding to a Likert questionnaire item, respondents specify
their level of agreement with a statement.[1] A Likert item is simply a statement that the respondent is
asked to evaluate according to any kind of subjective or objective criteria.

• Generally, the level of agreement or disagreement is measured. Often five ordered response levels
are used, although many psychometricians advocate using seven or nine levels. A recent empirical
study[2] found that a 5- or 7-point scale may produce slightly higher mean scores relative to the
highest possible attainable score, compared to those produced from a 10-point scale, and this
difference was statistically significant.

• Strategies: 5 - Very Effective, 4 - Effective, 3 - Moderately Effective/Undecided, …
• Practices: 5 - Highly Observed/Always/Fully Aware, 4 - Observed/Sometimes/Aware, …
• Traits/Attitudes: 5 - Very Evident, 4 - Somewhat Evident, 3 - Undecided, 2 - Somewhat Inevident,
1 - Not Evident

[1] http://en.wikipedia.org/wiki/Likert_scale
[2] Dawes, John (2008). "Do Data Characteristics Change According to the Number of Scale Points Used? An Experiment Using 5-Point, 7-Point and 10-Point Scales". International Journal of Market Research 50 (1): 61–77.

Table 1. Illustration of a Likert Scale Questionnaire

Research Title: Solid Waste Management of Ateneo de Naga University


Below is a list of Solid Waste Management practices. Please check the box with the appropriate
number corresponding to your chosen answer as to how these practices are observed.
Scale: 5 - Very High
4 - High
3 - Moderate
2 - Low
1 - Very Low

Indicator                                                       5    4    3    2    1
A. GENERATION OF WASTE
Ateneo de Naga University
1. Provides information through campaigns or seminars
   about solid waste generation
2. Introduces strategies on how to apply the 4R's (Reuse,
   Recycle, Reduce and Respond) of Solid Waste Management
3. Provides campaign to patronize the use of reusable and
   recycled materials
4. Rejects products which are harmful to the environment
   such as foam, styrofoam, CFC aerosols, oil-based paints,
   pesticides, insecticides, plastics, wood preservatives,
   glues and adhesives
5. Encourages the use of unused side of old papers or
   recycles its own paper (as shown by the exam papers
   used, handouts, memo, letters, etc.)
6. Encourages or requires the use of refillable inks for
   pens, ballpens, printers, etc.
7. Allows the use of old notebooks from previous years
   instead of requiring new ones
8. Encourages to reuse envelopes, boxes, packaging
   materials and folders
9. Repairs or disposes defective computers in laboratories
   or offices

Table 2. Tallied Data

Indicator                                                       5    4    3    2    1   Weighted Mean
A. GENERATION OF WASTE
Ateneo de Naga University
1. Provides information through campaigns or seminars           0    6   12   38   64
   about solid waste generation
2. Introduces strategies on how to apply the 4R's (Reuse,       2    8   10   29   71
   Recycle, Reduce and Respond) of Solid Waste Management
3. Provides campaign to patronize the use of reusable and       6    8   22   38   46
   recycled materials
4. Rejects products which are harmful to the environment        0    5    7   34   74
   such as foam, styrofoam, CFC aerosols, oil-based paints,
   pesticides, insecticides, plastics, wood preservatives,
   glues and adhesives
5. Encourages the use of unused side of old papers or           7    6   12   33   62
   recycles its own paper (as shown by the exam papers
   used, handouts, memo, letters, etc.)
6. Encourages or requires the use of refillable inks for        1    1    4   41   73
   pens, ballpens, printers, etc.
7. Allows the use of old notebooks from previous years          2    3    4   42   69
   instead of requiring new ones
8. Encourages to reuse envelopes, boxes, packaging              6   11   18   27   53
   materials and folders
9. Repairs or disposes defective computers in laboratories      0    2    3   43   72
   or offices
Cumulative Weighted Mean
Source: Valenzuela 2007, p.66

Table 3
Adjectival Interpretation of the Likert Scale (cumulative mean)

Rating Scale   Range         Interpretation
5              4.20 – 5.00   Very High – Almost all indicators are practiced
4              3.40 – 4.19   High – 75% of the indicators were practiced
3              2.60 – 3.39   Moderate – 50% of the indicators were practiced
2              1.80 – 2.59   Low – 25% of the indicators were practiced
1              1.00 – 1.79   Very Low – almost none of the indicators were practiced

Table 4
Adjectival Interpretation of the Likert Scale (per item)

Rating Scale   Range         Interpretation
5              4.20 – 5.00   Very High – Almost all respondents practice the said indicator
4              3.40 – 4.19   High – 75% of the respondents
3              2.60 – 3.39   Moderate – 50% of the respondents
2              1.80 – 2.59   Low – 25% of the respondents
1              1.00 – 1.79   Very Low – almost none of the respondents…

Table 5
Extent of Solid Waste Management in AdeNU (faculty and students), 2007

Indicator                                                   Weighted Mean   Interpretation
A. GENERATION OF WASTE
Ateneo de Naga University
1. Provides information through campaigns or seminars           1.67        Very Low
   about solid waste generation
2. Introduces strategies on how to apply the 4R's (Reuse,       1.68        Very Low
   Recycle, Reduce and Respond) of Solid Waste Management
3. Provides campaign to patronize the use of reusable and       2.08        Low
   recycled materials
4. Rejects products which are harmful to the environment        1.52        Very Low
   such as foam, styrofoam, CFC aerosols, oil-based paints,
   pesticides, insecticides, plastics, wood preservatives,
   glues and adhesives
5. Encourages the use of unused side of old papers or           1.86        Low
   recycles its own paper (as shown by the exam papers
   used, handouts, memo, letters, etc.)
6. Encourages or requires the use of refillable inks for        1.47        Very Low
   pens, ballpens, printers, etc.
7. Allows the use of old notebooks from previous years          1.56        Very Low
   instead of requiring new ones
8. Encourages to reuse envelopes, boxes, packaging              2.04        Low
   materials and folders
9. Repairs or disposes defective computers in laboratories      1.46        Very Low
   or offices

Cumulative Weighted Mean                                        1.70        Very Low
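
The weighted means in Table 5 can be reproduced from the tallies in Table 2. The sketch below is only an illustration: the function names are hypothetical, the lower bound of each range in Tables 3 and 4 is assumed to be the cut-off, and the cumulative weighted mean is assumed to be the average of the nine item means. Values may differ from the table in the last decimal place because of rounding.

```python
# Counts of respondents who chose 5, 4, 3, 2 and 1 for each of the nine items in Table 2.
SCALE = (5, 4, 3, 2, 1)
TALLIES = [
    (0, 6, 12, 38, 64),
    (2, 8, 10, 29, 71),
    (6, 8, 22, 38, 46),
    (0, 5, 7, 34, 74),
    (7, 6, 12, 33, 62),
    (1, 1, 4, 41, 73),
    (2, 3, 4, 42, 69),
    (6, 11, 18, 27, 53),
    (0, 2, 3, 43, 72),
]


def weighted_mean(counts, scale=SCALE):
    """Sum of (scale value x frequency) divided by the total number of responses."""
    return sum(value * freq for value, freq in zip(scale, counts)) / sum(counts)


def interpret(mean):
    """Adjectival rating; assumes the lower bound of each range is the cut-off."""
    for lower, label in ((4.20, "Very High"), (3.40, "High"),
                         (2.60, "Moderate"), (1.80, "Low")):
        if mean >= lower:
            return label
    return "Very Low"


item_means = [weighted_mean(counts) for counts in TALLIES]
# Assumption: the cumulative weighted mean is the average of the item means.
cumulative_mean = sum(item_means) / len(item_means)

for number, mean in enumerate(item_means, start=1):
    print(f"Item {number}: {mean:.2f} ({interpret(mean)})")
print(f"Cumulative weighted mean: {cumulative_mean:.2f} ({interpret(cumulative_mean)})")
```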



Generation of Waste

The extent of performance of SWM practices of students and faculty in the area of generation of
wastes is given in Table 5. The results show that the respondents’ means, based on the nine (9)
indicators used, ranged from 1.46 to 2.08, or from “very low” to “low” ratings. The respondents gave an
overall mean that resulted in “very low” for the following indicators: “provides information through
campaigns or seminars about solid waste generation (1.67)”, “introduces strategies on how to apply the
4R's of Solid Waste Management (1.68)”, “rejects products which are harmful to the environment such as
foam, styrofoam, CFC aerosols, oil-based paints, pesticides, insecticides, plastics, wood preservatives,
glues and adhesives (1.52)”, “encourages or requires the use of refillable inks (1.47)”, “allows the use
of old notebooks (1.56)” and “repairs or disposes defective computers (1.46)”. The “very low” rating
also implies that almost none of the respondents observe the mentioned practices.

The indicators “provides campaign to patronize the use of reusable and recycled materials (2.08)”,
“encourages the use of unused side of old papers or recycles its own paper (1.86)” and “encourages to
reuse envelopes, boxes, packaging materials and folders (2.04)” had an overall mean of “low”. Only
about 25% of the respondents observe the mentioned indicators.

The students and faculty gave an overall weighted mean that resulted in “very low”. In totality,
the cumulative mean score was 1.70. The result implies that almost none of the indicators were
being observed under the generation component of SWM.

Survey results reveal that there was a need for an intensive information campaign about SWM and
that the University had yet to implement strategies on how to apply the 4R’s. Such an outcome presents
an opportunity to promote waste-saving measures among the student and teaching population of
AdeNU, in line with the future promotion of the 4R’s.



Exercise # 4 -Weighted Means

A. For the raw data given, obtain the weighted mean for each item and the
cumulative/total weighted mean.

B. Interpret the cumulative/total weighted mean.

C. What are the highest and lowest weighted means obtained? Interpret the values.

D. Conclusion. Discuss the result of the test based on the objective of the
study.

Rating Scale   Range of the    Interpretation
               Likert Scale
5              4.20 – 5.00     Extremely Characteristic of Me – Almost all indicators are evident.
4              3.40 – 4.19     Somewhat Characteristic of Me – 75% of the indicators are evident.
3              2.60 – 3.39     Neither Un/Characteristic of Me – 50% of the indicators are evident.
2              1.80 – 2.59     Somewhat Uncharacteristic of Me – 25% of the indicators are evident.
1              1.00 – 1.79     Extremely Uncharacteristic of Me – almost none of the indicators are evident.

Problem Set
Thesis title: Portable Games and Devices towards Aggressive Behavior of the First Year BS Digital
Animation Students of Ateneo de Naga University
Objective: To determine the level of influence of playing Portable Games and Devices on the behavior
specifically aggressiveness of the respondents
Table 1
Results from the Standard Questionnaire by Buss and Perry.

Indicators                                              5    4    3    2    1   Weighted Mean
1. Some of my friends think I am a hothead.            18   12   15   12   13
2. If I have to resort to violence to protect my       17   21   10   15    7
   rights, I will.
3. When people are especially nice to me, I wonder     14   17   15   17    7
   what they want.
4. I tell my friends openly when I disagree with       17   28   10   10    5
   them.
5. I have become so mad that I have broken things.     10   17   14   15   14
6. I can’t help getting into arguments when            16   18   14   13    9
   people disagree with me.
7. I wonder why sometimes I feel so bitter about        9   23   15   17    6
   things.
8. Once in a while, I can’t control the urge to        12   16   10   16   16
   strike another person.
9. I am an even-tempered person.                       18   21   15   13    3
10. I am suspicious of overly friendly strangers.      11   19   17   13   10

Cumulative Weighted Mean



Lesson # 5 – Sampling

SAMPLE SIZE DETERMINATION


Slovin’s Formula:

$$n = \frac{N}{1 + Ne^{2}}$$

where n = sample size
      N = population size
      e = margin of error (usually at 5%)

A researcher would want to make a socio-economic survey of a school with a
population of 5000 students. If he allows a margin of error of 5%, how many students
must he take into the sample?

$$
\begin{aligned}
n &= \frac{5000}{1 + 5000(0.05)^{2}} \\
  &= \frac{5000}{1 + 5000(0.0025)} \\
  &= \frac{5000}{1 + 12.5} \\
  &= \frac{5000}{13.5} \\
  &= 370.37 \approx 370
\end{aligned}
$$
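
The same computation can be checked with a few lines of Python. This is a minimal sketch; the function name `slovin` is a hypothetical helper, not part of any library.

```python
def slovin(population_size, margin_of_error=0.05):
    """Sample size from Slovin's formula: n = N / (1 + N * e^2)."""
    return population_size / (1 + population_size * margin_of_error ** 2)


n = slovin(5000, 0.05)
# Unrounded value, then the value rounded to a whole number of students.
print(round(n, 2), round(n))
```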

Important: Samples should be as large as a researcher can obtain with a reasonable
expenditure of time and energy. A recommended minimum number of subjects is 100
for a descriptive study, 50 for a correlational study, and 30 in each group for experimental
and causal-comparative studies.

SAMPLING METHODS

Random Sampling Methods

• Every element in the population has an equal chance of being chosen.

• Example: The dean of a school of education in a large midwestern university wishes
to find out how her faculty feel about the sabbatical leave requirements at the university.
She places all 150 names of the faculty in a hat, mixes them thoroughly, and then draws
out the names of 25 individuals to interview.

Nonrandom Sampling Methods

• Not all elements are given an equal chance of being included in the sample.

• Some elements may be deliberately ignored (that is, given no chance at all) in the
choice of elements for the sample.

• Example: The manager of the campus bookstore at a local university wants to find out
how students feel about the services the bookstore provides. Every day for two weeks
during her lunch hour, she asks every person who enters the bookstore to fill out a short
questionnaire she has prepared and drop it in a box near the entrance before leaving. At
the end of the two-week period, she has a total of 235 completed questionnaires.

I. RANDOM SAMPLING METHODS

A. Simple Random Sampling (SRS) – a method of selecting n units out of the N
units in the population in such a way that every distinct sample of size n has
an equal chance of being drawn.

Required : complete list of the elements of the population

Features : each and every member of the population has an equal
and independent chance of being chosen

When to use : population size is not very large
population is homogeneous

Procedures : i. Lottery method/Chip-in-the-box/Fish-in-the-bowl
ii. Table of Random Numbers
iii. Calculator/computer-generated random numbers

Illustration: Table of Random Numbers

011723 223456 222167


912334 379156 233989
086401 016265 411148
059397 022334 080675
666278 106590 879809
051965 004571 036900
063045 786326 098000
560132 345678 356789
727009 344870 889567
000037 121191 258700
667899 234345 076567
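
Procedure iii (computer-generated random numbers) can be sketched with Python's standard `random` module. The example reuses the dean's draw of 25 names from 150 faculty members; the name list and the seed value are made up for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical sampling frame: a complete list of the 150 faculty members.
faculty = [f"Faculty {i:03d}" for i in range(1, 151)]
chosen = simple_random_sample(faculty, 25, seed=2024)
print(chosen)
```

As with the lottery method, a complete list of the population is required before the draw can be made.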

B. Stratified Sampling – the population of N units is first divided into
subpopulations called strata. Then a simple random sample is drawn from
each stratum, the selection being made independently in different strata.

Required : complete list of the elements of the population

Features : representatives from each stratum or subgroup of the population
are randomly chosen as elements of the sample

When to use : population size is large; population is heterogeneous but
elements can be grouped into homogeneous strata; when we want
representatives from each stratum or subgroup

Procedure: Given a population N = 365, the researcher grouped the
respondents according to gender, where there are 219 females and 146 males.
Using stratified sampling, how many respondents will be obtained from each
stratum?

N = 365; use Slovin’s formula to get the sample size n

$$
\begin{aligned}
n &= \frac{365}{1 + 365(0.05)^{2}} \\
  &= \frac{365}{1 + 365(0.0025)} \\
  &= \frac{365}{1 + 0.9125} \\
  &= \frac{365}{1.9125} \\
  &= 190.849 \approx 191
\end{aligned}
$$

Population of 365

The researcher identifies 2 subgroups or strata:

219 females (219/365 = 60%)          146 males (146/365 = 40%)

Using Slovin’s formula, we compute the required sample size n,
then we multiply it by the percentage of each stratum:

191 x 0.60 = 114.6, or 115 females   191 x 0.40 = 76.4, or 76 males
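
The allocation above can be sketched in Python as follows. The helper names are hypothetical, and simple rounding of each share is an assumption; with other strata sizes the rounded shares may not add up exactly to n and would need a manual adjustment.

```python
def slovin(population_size, margin_of_error=0.05):
    """Sample size from Slovin's formula: n = N / (1 + N * e^2)."""
    return population_size / (1 + population_size * margin_of_error ** 2)


def proportional_allocation(strata_sizes, sample_size):
    """Give each stratum a share of the sample equal to its share of the population."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


strata = {"female": 219, "male": 146}
n = round(slovin(sum(strata.values())))       # 191 for N = 365
allocation = proportional_allocation(strata, n)
print(n, allocation)
```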



C. Cluster Sampling – a method of sampling where a sample of distinct groups,
or clusters, of elements is selected and then a census of every element in the
selected clusters is taken.

Features : the population is grouped into clusters or small units
composed of population elements; each cluster contains
as varied a mixture as possible and, at the same time, one
cluster is as nearly alike as the other

: sometimes referred to as an area sample because it is
frequently applied on a geographical basis; for example, blocks in a
community or city are occupied by heterogeneous groups

When to use : large population

: a list of all members of the population is not available;
only a population list of clusters is required

Procedure : There are 50 barangays in Naga City.
Randomly choose 3 barangays, then take a census of every element in the
chosen barangays.



D. Multi-stage Sampling – the population is divided into a hierarchy of sampling
units corresponding to the different sampling stages. In the first stage of
sampling, the population is divided into primary stage units (PSU), then a
sample of PSUs is drawn. In the second stage of sampling, each selected PSU
is subdivided into second-stage units (SSU), then a sample of SSUs is drawn.
The process of subsampling can be carried to a third stage, a fourth stage and so
on, by sampling the subunits instead of enumerating them completely at each
stage.

Features : this technique uses several stages or phases in getting the
sample from the general population

When to use : conducting nationwide surveys or any survey involving a
large universe

Illustration of Multistage Sampling:

Philippines (17 regions)

Choose randomly 5 regions

R1 R2 R3 R4 R5

Choose randomly 2 provinces for each region

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
Choose randomly 1 city for each province

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10

Choose randomly 2 barangays for each city

Then choose randomly 5 households for each barangay
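
The stages above can be sketched in Python. Only the number of units drawn at each stage (5 regions, 2 provinces, 1 city, 2 barangays, 5 households) comes from the illustration; the sizes of the sampling frames at each stage are hypothetical placeholders, since real counts differ from place to place.

```python
import random


def multistage_sample(seed=None):
    """Draw households in five stages; returns (region, province, city, barangay, household)."""
    rng = random.Random(seed)
    households = []
    for region in rng.sample(range(1, 18), 5):              # 5 of the 17 regions
        for province in rng.sample(range(1, 7), 2):         # hypothetical: 6 provinces per region
            city = rng.choice(range(1, 5))                  # hypothetical: 4 cities per province
            for barangay in rng.sample(range(1, 11), 2):    # hypothetical: 10 barangays per city
                for household in rng.sample(range(1, 201), 5):  # hypothetical: 200 households
                    households.append((region, province, city, barangay, household))
    return households


sample = multistage_sample(seed=1)
print(len(sample))  # 5 x 2 x 1 x 2 x 5 households
```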



[Figure: populations of lettered elements. One panel illustrates SIMPLE RANDOM SAMPLING; the other illustrates STRATIFIED SAMPLING, with strata making up 25%, 50% and 25% of the population and a sample drawn from each stratum in the same proportions.]

[Figure: populations grouped into clusters of lettered elements. One panel illustrates CLUSTER SAMPLING; the other illustrates TWO-STAGE SAMPLING.]

II. Non-random sampling

A. Convenience - the researcher chooses the sample at his or her own convenience.
Example. To find out how students feel about food service in the student
union at an East Coast university, the manager stands outside the main
door of the cafeteria one Monday morning and interviews the first 50
students who walk out of the cafeteria.

B. Purposive - researchers use their judgement to select a sample that they
believe will provide the data they need.
Example. A graduate student wants to know how retired people aged 65
and over feel about their “golden years”. He has been told by one of his
professors, an expert on aging and the aged population, that the local
Association of Retired Workers is a representative cross section of retired
people aged 65 and over. He decides to interview a sample of 50 people who
are members of the association to get their views.

C. Quota - the researcher sets a sample size, then chooses the respondents
without setting criteria and proceeds to fill the prescribed
quota. The choice is left to the researcher’s own convenience or preference.

D. Snowball

REASONS FOR USING NON-RANDOM SAMPLING

a. Some might use this technique because they just want to get a “feel” of the
market before launching or producing a certain product.

b. Lack of logistics or inadequate knowledge in the use of random methods.

c. The validity of the sample is based on the soundness of the judgement of
whoever makes the choice.
Example. One would naturally use judgement instead of randomness
in the choice of people who will work for a company.

Lesson # 6 – FPC, Permutations and Combinations

Definition. FUNDAMENTAL PRINCIPLE OF COUNTING. If one event can occur in m
different ways, and if, after it has happened in one of these ways, a second event can
occur in n different ways, then both events can occur, in the order stated, in m x n
different ways.

Examples.

1. If there are eight doors providing access to a building, in how many ways can a
person enter the building by one door and leave by a different door?

2. How many even three-digit numbers can be formed from the digits 1, 2, 5, 6 and 9 if
each digit can be used only once?

3. How many positive integers of three different digits can be formed from the integers
1, 2, 3, 4 and 5?

4. How many different arrangements, each consisting of five different letters, can be
formed from the letters of the word “PERSONAL” if each arrangement is to begin and
end with a vowel?

5. How many different arrangements of five distinct books each can be made on a shelf
with space for five books?

6. Suppose that there are 3 math books and 3 physics books. How many different
arrangements of the six books can be made on a shelf if books on the same subject
are to be kept together?

7. In how many ways can a 10-question true-false exam be answered?
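
Counts obtained with the fundamental principle can be checked by listing every possibility. The sketch below is only a brute-force check of Examples 1, 2, 4 and 7 using Python's `itertools`; it is not the intended pencil-and-paper solution, and the variable names are made up for illustration.

```python
from itertools import permutations, product

# Example 1: ordered pairs (door in, door out) with two different doors: 8 x 7.
doors = sum(1 for _ in permutations(range(8), 2))

# Example 2: three-digit numbers without repeated digits that end in 2 or 6.
even_numbers = [p for p in permutations("12569", 3) if p[-1] in "26"]

# Example 4: five-letter arrangements of PERSONAL that begin and end with a vowel.
vowel_arrangements = [p for p in permutations("PERSONAL", 5)
                      if p[0] in "AEO" and p[-1] in "AEO"]

# Example 7: each of the 10 questions can be answered in 2 ways.
answer_sheets = sum(1 for _ in product("TF", repeat=10))

print(doors, len(even_numbers), len(vowel_arrangements), answer_sheets)
```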



Definition. PERMUTATION (nPr). Let S be a set containing n elements and suppose r is
a positive integer such that r ≤ n. Then a permutation of r elements of S is an
arrangement, in a definite order and without repetition, of r elements of S.

Theorem 1. The number of permutations of n elements taken r at a time is given by
either of the following formulas:
a. nPr = n(n-1)(n-2) … (n-r+1)
b. nPr = n! / (n-r)!

Special case: nPn = n!

Examples:
1. A bus has six vacant seats. If three additional passengers enter the bus, in how
many different ways can they be seated?

2. In how many ways can 3 boys and 3 girls be seated in a row containing six seats if
a. a person may sit in any seat
b. boys and girls must sit in alternate seats?
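
Theorem 1 and the two examples can be checked with a short Python sketch. The function `n_p_r` and the labels B1 to G3 are hypothetical names used only for this illustration.

```python
import math
from itertools import permutations


def n_p_r(n, r):
    """Number of permutations of n elements taken r at a time: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)


# Example 1: three passengers choose among six vacant seats.
bus = n_p_r(6, 3)

# Example 2a: six people in six seats (special case nPn = n!).
any_seat = n_p_r(6, 6)

# Example 2b: list every seating and keep those where neighbours differ in sex.
people = ["B1", "B2", "B3", "G1", "G2", "G3"]
alternating = sum(
    1 for row in permutations(people)
    if all(row[i][0] != row[i + 1][0] for i in range(len(row) - 1))
)

print(bus, any_seat, alternating)
```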

Theorem 2. If we are given n elements, of which exactly m1 are alike of one kind, exactly m2
are alike of a second kind, …, and exactly mk are alike of a kth kind, and if n = m1 +
m2 + … + mk, then the number of distinguishable permutations that can be made of the
n elements taking them all at one time is

$$\frac{n!}{m_1!\, m_2!\, m_3! \cdots m_k!}$$

Examples:
1. Determine the number of different nine-digit numerals that can be formed from the
digits 6,6,6,5,5,5,4,4 and 3.

2. How many permutations can be formed from the word TENNESSEE?
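
Theorem 2 can be sketched in Python by counting how many times each element repeats. The function name is hypothetical; `collections.Counter` and `math.factorial` are standard-library tools.

```python
from collections import Counter
from math import factorial


def distinguishable_permutations(items):
    """n! divided by the factorial of the count of each repeated element."""
    total = factorial(len(items))
    for count in Counter(items).values():
        total //= factorial(count)
    return total


# Example 1: the digits 6,6,6,5,5,5,4,4,3 give 9! / (3! 3! 2! 1!).
numerals = distinguishable_permutations("666555443")

# Example 2: TENNESSEE has 1 T, 4 E's, 2 N's and 2 S's, giving 9! / (1! 4! 2! 2!).
words = distinguishable_permutations("TENNESSEE")

print(numerals, words)
```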



Definition. COMBINATION (nCr). Let S be a set containing n elements, and suppose r is
a positive integer such that r ≤ n. Then a combination of r elements of S is a subset of S
containing r distinct elements.

Theorem 3. The number of combinations of n elements taken r at a time is given by

nCr = nPr / r!
    = n! / [(n-r)! r!]

Theorem 4. nCr = nC(n-r)

Examples:

1. A football conference consists of 10 teams. If each team plays every other team, how
many conference games are played?

2. A student has twelve posters to pin up on the walls of her room, but there is space
for only 7. In how many ways can she choose the posters to be pinned up?

3. How many committees of five can be formed from 7 sophomores and 5 freshmen if
each committee is to
a. consist of 3 sophomores and 2 freshmen?
b. consist of at least 3 sophomores?
c. consist of at most 3 sophomores?
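
The examples can be checked with `math.comb`, which computes nCr directly (available in Python 3.8 and later). The helper `committees` is a hypothetical name; it counts committees of five with a given number of sophomores.

```python
from math import comb

# Example 1: each game is a choice of 2 teams out of 10.
games = comb(10, 2)

# Example 2: choose 7 of 12 posters (Theorem 4: same as choosing the 5 left out).
posters = comb(12, 7)


def committees(sophomores):
    """Committees of 5 with the given number of sophomores (7 available, 5 freshmen)."""
    return comb(7, sophomores) * comb(5, 5 - sophomores)


exactly_3 = committees(3)
at_least_3 = sum(committees(k) for k in range(3, 6))   # 3, 4 or 5 sophomores
at_most_3 = sum(committees(k) for k in range(0, 4))    # 0, 1, 2 or 3 sophomores

print(games, posters, exactly_3, at_least_3, at_most_3)
```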

Exercise #5 – FPC, Combinations and Permutations


Objectives:
At the end of the exercise, the student is expected to be able to:

1. Count the number of ways an event may possibly occur by:


a. listing all possible outcomes in the sample space corresponding to the event; and
b. using the method of counting.

2. Solve problems requiring the applications of the concept of permutation and combination.

I. Show complete solution for each.

1. How many different outcomes are possible in a roll of 2 dice? In tossing 5 coins? In
rolling 2 dice and tossing 3 coins simultaneously?
2. How many distinct permutations can be made from the word COOL? List them
down.
3. A package of 10 Game Boy sets contains 3 defective sets. If 5 sets are to be picked out
randomly and sent to a customer for inspection, in how many ways can the
customer find at least two defective sets?
4. How many different telephone numbers can be formed from a seven-digit number if
the first digit cannot be zero?
5. A college freshman must take a science course, a humanities course, and a math
course. If she may select any of 6 science courses, any of 4 humanities courses, and any of 4
math courses, in how many ways can she set her program?
6. A shelf contains 3 books in red binding, 4 books in blue and 2 in green. In how
many different orders can they be arranged if all the books of the same color must
be kept together?
7. How many different numbers greater than 200 can be formed from the digits 1, 2, 3, 4
and 5 (a) if repetitions are not allowed? (b) if repetitions are allowed?
8. How many committees of 5 can be selected from 12 Republicans and 8 Democrats (a)
if it must contain 2 Republicans and 3 Democrats? (b) if it must contain at least 3
Republicans?
9. There are 8 baseball teams in a league. How many games will be played if each team
plays each of the other teams 40 times?
10. In how many ways can one make a selection of 5 black balls, 3 red balls, and 2
white balls from a box containing 8 black balls, 7 red balls and 5 white balls?
11. The tennis squad of one college consists of 8 players and that of another consists of 10
players. In how many ways can a doubles match between the 2 institutions be
arranged?
12. In how many ways can one make a selection of 4 novels, 3 biographies and 6 detective
stories from a shelf containing 10 novels, 8 biographies and 10 detective stories?

Lesson #7 – Probability

PROBABILITY

• SAMPLE SPACE is the set of all possible outcomes of a given experiment.

• A subset of the sample space of an experiment is called an EVENT associated with
the experiment.

Definition. PROBABILITY OF AN EVENT. If S is the sample space of an experiment and
E is an event associated with the experiment, the probability of E, denoted by P(E), is
defined by

P(E) = n(E) / n(S)

where n(E) and n(S) are the numbers of elements in E and S, respectively, and all
outcomes in S are assumed to be equally likely.

Furthermore, if P(E) = 0 then the event will never happen or it is an “impossible” event.
If P(E) = 1, the event is certain to happen or it is a “sure” event.

Examples:
1. Determine the probability of each of the following events:
a. Obtaining a 4 on a throw of a single die
b. Obtaining a head on a toss of a coin

2. If 2 dice are thrown, what is the probability of obtaining a sum of 8? A sum of 3?

        1      2      3      4      5      6
   1  (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
   2  (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
   3  (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
   4  (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
   5  (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
   6  (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)

3. Determine the probability of each of the following events:
a. Drawing a heart from a deck of 52 playing cards
b. Drawing 4 spades in succession from a deck of 52 playing cards if after each
card is drawn it is not replaced in the deck

4. If French, Spanish, Russian and English books are placed at random on a shelf
with space for 4 books, what is the probability that the Russian and English books
will be next to each other?
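
The definition P(E) = n(E) / n(S) can be sketched in Python by listing the sample space and counting the outcomes that belong to the event. The helper `probability` is a hypothetical name, and the sketch assumes that all outcomes are equally likely; it checks Examples 2 and 4.

```python
from fractions import Fraction
from itertools import permutations, product


def probability(event, sample_space):
    """n(E) / n(S) for equally likely outcomes; event is a true/false test on an outcome."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


# Example 2: the 36 ordered pairs obtained when 2 dice are thrown.
dice = list(product(range(1, 7), repeat=2))
p_sum_8 = probability(lambda roll: sum(roll) == 8, dice)
p_sum_3 = probability(lambda roll: sum(roll) == 3, dice)

# Example 4: the 24 ways of arranging four books on the shelf.
shelves = list(permutations(["French", "Spanish", "Russian", "English"]))
p_adjacent = probability(
    lambda shelf: abs(shelf.index("Russian") - shelf.index("English")) == 1,
    shelves,
)

print(p_sum_8, p_sum_3, p_adjacent)
```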

CONJUNCTION AND DISJUNCTION PROBABILITIES

Definition. CONJUNCTION PROBABILITY. This type of probability is associated with
events happening together, one event and another event occurring at the same time.
Events, however, may be independent or dependent.

Case 1. P(A and B) = P(A) x P(B)

When the occurrence of one event does not influence the probability of the
occurrence of the other event, these events are said to be independent.

Example. At birth the probability that a US female will survive to age 65 is
approximately 7/10. The probability that a male will survive to age 65 is
approximately 3/5. What is the probability that both the male and the female will
be alive at age 65?

What is the probability that only the male will be alive at age 65?

What is the probability that at least one of the two will be alive at age 65?

Case 2. P(A and B) = P(A) x P(B|A), where P(B|A) is the probability of B given
that A has occurred

When the occurrence of one event is conditioned by the other event, these
events are said to be dependent (conditional).

Example. Suppose a box contains 30 fuses, 5 of which are defective. What is
the probability of drawing at random two defective fuses in succession if
the first fuse that has been drawn is not returned before making the
second draw?
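
Both cases can be worked out with exact fractions in Python. This is a minimal sketch of the two examples; the variable names are made up for illustration, and the survival of the male and of the female is assumed to be independent, as in Case 1.

```python
from fractions import Fraction

# Case 1 (independent events): survival to age 65.
p_female = Fraction(7, 10)
p_male = Fraction(3, 5)

p_both = p_female * p_male                            # 21/50
p_only_male = p_male * (1 - p_female)                 # 9/50
p_at_least_one = 1 - (1 - p_female) * (1 - p_male)    # 22/25

# Case 2 (dependent events): two defective fuses drawn without replacement.
p_first_defective = Fraction(5, 30)
p_second_given_first = Fraction(4, 29)   # one defective fuse is already out of the box
p_two_defective = p_first_defective * p_second_given_first   # 2/87

print(p_both, p_only_male, p_at_least_one, p_two_defective)
```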

Definition. DISJUNCTION PROBABILITY. This type of probability is associated with
several events that happen either separately or simultaneously. Disjunction probability
is concerned with an “either or” relationship.

Case 3. P(A or B) = P(A) + P(B)

When the events do not have common sample points, they are said to be
mutually exclusive.

Example. What is the probability that in a single toss of two dice, the sum
will be 4 or 7?

Case 4. P(A or B) = P(A) + P(B) – P(A and B)

There are also cases of joint events which are not mutually exclusive
because there are some elements common to both events.

Example. What is the probability of getting a sum of 8 or a sum greater
than 7 in a throw of two dice?

Example. Take a math class with 52 students, 27 of whom are males and
the rest are females. A total of 21 of the males and 15 of the females got a
grade above 90. What is the probability that a student chosen at
random either has a grade above 90 or is a male?
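
The three examples can be checked in Python by counting outcomes of two dice and by applying the Case 4 formula to the class counts. The helper `p` is a hypothetical name, and equally likely outcomes are assumed.

```python
from fractions import Fraction
from itertools import product

dice = list(product(range(1, 7), repeat=2))   # the 36 outcomes of two dice


def p(event):
    """Probability of an event on two dice, by counting favorable outcomes."""
    return Fraction(sum(1 for roll in dice if event(roll)), len(dice))


# Case 3: a sum of 4 and a sum of 7 cannot happen together, so just add.
p_4_or_7 = p(lambda r: sum(r) == 4) + p(lambda r: sum(r) == 7)

# Case 4: every sum of 8 is also a sum greater than 7, so subtract the overlap.
p_8 = p(lambda r: sum(r) == 8)
p_gt_7 = p(lambda r: sum(r) > 7)
p_8_and_gt_7 = p(lambda r: sum(r) == 8 and sum(r) > 7)
p_8_or_gt_7 = p_8 + p_gt_7 - p_8_and_gt_7

# Case 4: 52 students, 27 males, 21 + 15 = 36 with a grade above 90, 21 of them male.
p_above_90_or_male = Fraction(36, 52) + Fraction(27, 52) - Fraction(21, 52)

print(p_4_or_7, p_8_or_gt_7, p_above_90_or_male)
```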

PROBABILITIES INVOLVING QUALITATIVE DATA IN CONTINGENCY TABLE

When the data are presented in the form of frequencies and are classified
according to qualitative rather than quantitative categories, they are called
qualitative data in contingency tables.

Illustration:

                  Vegetarian Status
Gender      Vegetarian   Non-Vegetarian   Total
Male            20             23           43
Female          22             25           47
Total           42             48           90

1. To find the probability of a single event from qualitative data, simply
divide the subtotal of the desired event by the grand total.

P(A) = subtotal / grand total

Example. The probability that a person is a vegetarian

2. To find the conjunction probability of two events from
qualitative data, divide the observed frequency where the two events
intersect by the grand total.

P(A and B) = observed freq. of the intersection of the two events / grand total

Example. The probability that a person is female and a vegetarian

3. To find the conditional probability of one event given another (dependent
events) from qualitative data, divide the observed frequency where the two
events intersect by the subtotal of the event which is used as a condition.

P(A | B) = observed freq. of the intersection of the two events / subtotal of the conditioning event B

Example. The probability of getting a male at random provided that he is a
non-vegetarian

4. To find the disjunction probability of two events:

P(A or B) = (subtotal of 1st event / grand total) + (subtotal of 2nd event / grand total)
            – (observed freq. of the intersection / grand total)

Example. The probability of getting a female or a person who is a
non-vegetarian
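
The four rules can be applied to the vegetarian illustration with a short Python sketch. The dictionary layout and the helper `subtotal` are assumptions made for this illustration; the counts come from the table above.

```python
from fractions import Fraction

# Observed frequencies from the contingency table (gender, vegetarian status).
table = {
    ("Male", "Vegetarian"): 20,
    ("Male", "Non-Vegetarian"): 23,
    ("Female", "Vegetarian"): 22,
    ("Female", "Non-Vegetarian"): 25,
}
grand_total = sum(table.values())


def subtotal(label):
    """Row or column subtotal for one category, e.g. 'Female' or 'Vegetarian'."""
    return sum(count for key, count in table.items() if label in key)


# Rule 1: single event.
p_vegetarian = Fraction(subtotal("Vegetarian"), grand_total)
# Rule 2: conjunction (female and vegetarian).
p_female_and_veg = Fraction(table[("Female", "Vegetarian")], grand_total)
# Rule 3: conditional (male, given non-vegetarian).
p_male_given_nonveg = Fraction(table[("Male", "Non-Vegetarian")], subtotal("Non-Vegetarian"))
# Rule 4: disjunction (female or non-vegetarian).
p_female_or_nonveg = (Fraction(subtotal("Female"), grand_total)
                      + Fraction(subtotal("Non-Vegetarian"), grand_total)
                      - Fraction(table[("Female", "Non-Vegetarian")], grand_total))

print(p_vegetarian, p_female_and_veg, p_male_given_nonveg, p_female_or_nonveg)
```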

Exercise # 6 - Probability

Objectives:
At the end of the exercise, the student is expected to be able to apply the different operations on probability

II. Show complete solution for each.

1. On a throw of two dice, what is the probability of obtaining a sum that is at most 5?

2. If a single card is drawn from a deck of 52 playing cards, what is the probability of
each of the following events: (a) obtaining a red card; (b) obtaining a heart; and (c)
obtaining an ace or a spade?

3. A committee of 5 is to be selected from 12 seniors and 8 juniors. What is the
probability that the committee will consist of at most 3 juniors?

4. A number of two different digits is to be formed from the digits 1, 2, 3, 4 and 5.
Determine the probability of each of the following events:
a. the number is odd
b. the number is greater than 25

5. A couple is planning to have three children. Find the probabilities that the couple
will have
a. two girls and one boy
b. at least two boys
c. no boys
d. at most two girls
e. two boys followed by a girl

6. Classification of Patients in a Hospital

            Pregnant   Elderly   Children   Total
Male            0         27        35        62
Female         28         49        11        88
Total          28         76        46       150

What is the probability that a patient chosen at random from among the 150 will be:

a. pregnant
b. female or elderly
c. female and elderly
d. male or a child
e. male provided that he is elderly
f. child given male

PROBABILITY DISTRIBUTIONS

Concept of a Random Variable

Definition. A function whose value is a real number determined by each element in the
sample space is called a random variable.

Remark. We shall use an uppercase letter, say X, to denote a random variable and its
corresponding lowercase letter, x in this case, for one of its values.

Example (Experiment #1): An experiment consists of tossing a coin 3 times and
observing the result. The possible outcomes and the values of the random variables X
and Y, where X is the number of heads and Y is the number of heads minus the
number of tails, are

Sample Points X Y
HHH 3 3
HHT 2 1
HTH 2 1
HTT 1 -1
THH 2 1
THT 1 -1
TTH 1 -1
TTT 0 -3

DISCRETE AND CONTINUOUS PROBABILITY DISTRIBUTIONS

Definition. If a sample space contains a finite number of possibilities or an unending
sequence with as many elements as there are whole numbers, it is called a
discrete sample space.

Definition. A random variable defined over a discrete sample space is called a discrete
random variable.

Definition. If a sample space contains an infinite number of possibilities equal to the
number of points on a line segment, it is called a continuous sample space.

Definition. A random variable defined over a continuous sample space is called a
continuous random variable.

Discrete Probability Distributions

Definition. A table or formula listing all possible values that a discrete random
variable can take on, along with the associated probabilities, is called a
discrete probability distribution.

Remark. The probabilities associated with all possible values of a discrete random
variable must sum to 1.

Examples. For Experiment #1, the discrete probability distributions of the random
variables X and Y are

x           0     1     2     3
P(X = x)   1/8   3/8   3/8   1/8

y          -3    -1     1     3
P(Y = y)   1/8   3/8   3/8   1/8
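The two tables above can be reproduced by listing the eight sample points and counting how often each value occurs. The following is a minimal sketch, assuming the eight outcomes are equally likely; the helper name `distribution` is hypothetical and used only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 outcomes of tossing a coin 3 times (assumed equally likely).
sample_space = ["".join(t) for t in product("HT", repeat=3)]

def distribution(value_of):
    """Return {value: probability} for a random variable on the sample space."""
    counts = Counter(value_of(point) for point in sample_space)
    return {v: Fraction(c, len(sample_space)) for v, c in sorted(counts.items())}

# X = number of heads; Y = number of heads minus number of tails.
dist_x = distribution(lambda s: s.count("H"))
dist_y = distribution(lambda s: s.count("H") - s.count("T"))

print(dist_x)
print(dist_y)
```

In each dictionary the probabilities sum to 1, as the Remark requires.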

Continuous Probability Distribution

Definition. The function with values f(x) is called a probability density function for the
continuous random variable X, if

- the total area under its curve and above the horizontal axis is equal to 1; and
- the area under the curve between any two ordinates x = a and x = b gives the
probability that X lies between a and b.

Remarks:

1. A continuous random variable has a probability of zero of assuming exactly any of
its values; that is, if X is a continuous random variable, then P(X = x) = 0 for all real
numbers x.
2. Example: the continuous random variable X that can assume values between 0 and 2
has a density function given by

$$f(x) = \begin{cases} 0.5 & \text{for } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$
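The two conditions in the definition can be checked numerically for this density. The sketch below approximates areas with a midpoint rule; the helper `area`, the number of steps, and the interval from 0.5 to 1.5 are illustrative assumptions, not part of the handout.

```python
def f(x):
    """Density from the remark: 0.5 on (0, 2), 0 elsewhere."""
    return 0.5 if 0 < x < 2 else 0.0

def area(a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = area(-1, 3)   # whole curve; should be close to 1
prob = area(0.5, 1.5)      # P(0.5 < X < 1.5); should be close to 0.5

print(round(total_area, 4), round(prob, 4))
```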

Expected Values

Definition. Let X be a discrete random variable with probability distribution

x          x1      x2      …   xn
P(X = x)   f(x1)   f(x2)   …   f(xn)

The mean or expected value of X is

$$\mu = E(X) = \sum_{i=1}^{n} x_i f(x_i)$$

Examples:

1. Find the mean of the random variables X and Y of Experiment No. 1

x 0 1 2 3
P(X = x) 1/8 3/8 3/8 1/8

E(X) = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8) = 12/8 or 1.5

y -3 -1 1 3
P(Y = y) 1/8 3/8 3/8 1/8

E(Y) = (-3)(1/8) + (-1)(3/8) + (1)(3/8) + (3)(1/8) = 0
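The same sums can be written as a short computation. This is a minimal sketch; the function name `expected_value` is a hypothetical helper, and exact fractions are used so that the results match the hand computation.

```python
from fractions import Fraction

def expected_value(dist):
    """E(X) = sum of x * f(x) over all values x of the distribution."""
    return sum(x * p for x, p in dist.items())

eighth = Fraction(1, 8)
dist_x = {0: 1 * eighth, 1: 3 * eighth, 2: 3 * eighth, 3: 1 * eighth}
dist_y = {-3: 1 * eighth, -1: 3 * eighth, 1: 3 * eighth, 3: 1 * eighth}

mean_x = expected_value(dist_x)   # 3/2, that is 1.5
mean_y = expected_value(dist_y)   # 0

print(mean_x, mean_y)
```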



Definition. Let X be a random variable with mean μ. Then the variance of X is

$$\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2]$$

Definition. Let X be a discrete random variable with probability distribution

x          x1      x2      …   xn
P(X = x)   f(x1)   f(x2)   …   f(xn)

The variance of X is

$$\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)$$

Example:

In experiment No. 1, find the variance of X.

Using the definition of Var(X),

E(X) = 1.5

$$\mathrm{Var}(X) = E[(X - \mu)^2] = \sum_{i=1}^{4} (x_i - 1.5)^2 f(x_i)$$

$$= (0 - 1.5)^2 (1/8) + (1 - 1.5)^2 (3/8) + (2 - 1.5)^2 (3/8) + (3 - 1.5)^2 (1/8)$$

= 0.75
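The variance computation above follows directly from the definition and can be sketched as below. The function name `variance` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction

def variance(dist):
    """Var(X) = sum of (x - mu)^2 * f(x), where mu = E(X)."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Distribution of X (number of heads in 3 tosses) from Experiment No. 1.
dist_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

var_x = variance(dist_x)   # 3/4, that is 0.75
print(float(var_x))
```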

Example. A used car dealer finds that in any day, the probability of selling no car is
0.4, one car is 0.2, two cars is 0.15, 3 cars is 0.10, 4 cars is 0.08, five cars is 0.06 and
six cars is 0.01. Let g(X) = 500 + 1500X represent the salesman’s daily earnings, where
X is the number of cars sold. Find the salesman’s expected daily earnings.
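One way to approach this example is to apply the expected-value sum to g(X) instead of X, that is, to add up g(x) times P(X = x). The sketch below assumes this reading of the problem; the variable names are illustrative.

```python
# Probability of selling x cars in a day, as given in the example.
probs = {0: 0.40, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.08, 5: 0.06, 6: 0.01}

def g(x):
    """Daily earnings when x cars are sold: g(X) = 500 + 1500X."""
    return 500 + 1500 * x

# E[g(X)] = sum of g(x) * P(X = x) over all x.
expected_earnings = sum(g(x) * p for x, p in probs.items())

print(round(expected_earnings, 2))   # about 2720
```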

Lesson # 8 – Normal Distribution

PROPERTIES OF A NORMAL CURVE

The normal distribution is represented by a normal curve. A normal curve is a
bell-shaped figure with the following six properties:
1. It is symmetrical about the mean.
2. The mean is equal to the median, which is also equal to the mode.
3. The tails or ends are asymptotic relative to the horizontal line.
4. The total area under the normal curve is equal to 1 or 100%.
5. The normal curve area may be subdivided into at least three standard scores
each to the left and to the right of the vertical axis.
6. Along the horizontal line, the distance from one integral standard score to the
next integral standard score is measured by the standard deviation.

AREA UNDER THE NORMAL CURVE

In making use of the properties of the normal curve to solve certain types of
statistical problems, one must first learn how to find areas under the normal curve.

The first step in finding areas under the normal curve is to convert the normal
curve of any given variable into a standardized normal curve by using the formula:

$$Z = \frac{X - \bar{X}}{S}$$

where Z = standard score
$\bar{X}$ = mean
S = standard deviation
X = given value of a particular variable
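As a rough illustration of the conversion, the sketch below computes a standard score and the area to its right, using the numbers of the first worded problem that follows (mean 350, standard deviation 40, value 362). It relies on `statistics.NormalDist` from the Python standard library in place of a printed table of areas, which is an assumption about the tools available; the helper name `z_score` is hypothetical.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standard score: Z = (X - mean) / S."""
    return (x - mean) / sd

z = z_score(362, 350, 40)             # 0.3
p_greater = 1 - NormalDist().cdf(z)   # area under the curve to the right of z

print(round(z, 2), round(p_greater, 4))
```

The area to the left of z is `NormalDist().cdf(z)`, and the area between two standard scores is the difference of their two `cdf` values.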

WORDED PROBLEMS:

1. Given a normal distribution with mean 350 and standard deviation s=40, find the
probability that x assumes a value greater than 362.

2. An electrical firm manufactures light bulbs that have a length of life that is
normally distributed with mean equal to 800 hours and a standard deviation of
40 hours. Find the probability that a bulb burns between 778 and 834 hours.

3. On an examination the average grade was 74 and the standard deviation was 7. If
12% of the class are given A’s, and the grades are curved to follow a normal
distribution, what is the lowest possible A and the highest possible B? Find D6.

4. The quality grade-point averages of 300 college freshmen follow approximately a
normal distribution with a mean of 2.1 and a standard deviation of 0.8. How
many of these freshmen would you expect to have a score

a. between 2.45 and 3.55?
b. greater than 3.85?
c. less than 1.75?

Exercise # 7 – Normal Distribution

Objectives: At the end of the exercise the student should be able to:
1. Find probabilities using the standard normal probability curve;
2. Apply the concepts of finding areas under the normal probability curve in solving
problems.

I. Find the probability.

a. P(z < -1.257)    f. P(z > 0.85)    k. P(1.33 < z < 1.56)
b. P(z < 1.65)      g. P(z > 0.69)    l. P(-1.48 < z < 2.04)
c. P(z < 0.92)      h. P(z > 3.01)    m. P(-0.58 < z < 1.05)
d. P(z < -2.02)     i. P(z > 2.84)    n. P(-0.92 < z < 0.07)
e. P(z < -1.24)     j. P(z > 0.53)    o. P(-1.45 < z < 1.87)

II. Find the unknown constant a given the area under the normal curve.
a. P(z < a) = 0.25
b. P(z > a) = 0.99
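Finding the constant from a given area is the reverse of reading an area from the table. The sketch below uses `inv_cdf` from `statistics.NormalDist` in the Python standard library; the areas 0.975 and 0.05 are illustrative values chosen for this sketch, not the ones in the items above.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# P(z < a) = 0.975: a is the point with area 0.975 to its left.
a_left = std_normal.inv_cdf(0.975)

# P(z > a) = 0.05: the area to the left of a is 1 - 0.05.
a_right = std_normal.inv_cdf(1 - 0.05)

print(round(a_left, 2), round(a_right, 3))
```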

III. Solve the following problems.

a. Given a normally distributed variable X with mean 18 and standard deviation
2.5, find
i. P(X < 15);
ii. P(17 < X < 21);
iii. the value of k such that P(X < k) = 0.2578;
iv. the value of k such that P(X > k) = 0.1539.

b. If a set of grades on a statistics exam are approximately normally distributed
with a mean of 74 and a standard deviation of 7.9, find

i. the lowest passing grade if the lowest 10% of the students are given F’s;
ii. the highest B if the top 5% of the students are given A’s.

c. A soft drink machine is regulated so that it discharges an average of 200
milliliters per cup. If the amount of drink is normally distributed with σ = 15
milliliters,
i. What is the probability that a cup contains between 180 and 230
milliliters?
ii. How many cups will likely overflow if 220-milliliter cups are used for
the next 1000 drinks?
iii. Below what value do we get the smallest 35% of the drinks?

Lesson # 9 – Estimation

ESTIMATION

- Estimation refers to any process by which sample information is used to predict or
estimate the numerical value of some population measure.

- The formula, function or procedure used in estimating a population parameter is
called an estimator. The value obtained with the use of the estimator is the
estimate.

- Two types of estimators: point estimator and interval estimator. A point estimator
yields a single numerical value as the estimate. An interval estimator gives a range or
band of values within which the value of the parameter is estimated to lie.

INTERVAL ESTIMATION OF THE POPULATION MEAN

An interval estimate of μ (or any parameter) incorporates a measure of the
confidence in the reliability of the range or interval of values within which the
parameter is estimated to lie. Thus, an interval estimate is also called a confidence
estimate, and its limits, confidence limits.

$$P(\bar{X} - k < \mu < \bar{X} + k) = 1 - \alpha$$

where

$\bar{X}$ = point estimate of the mean

$k = Z_{\alpha/2} \, (s.e.)$

$s.e. = \dfrac{\sigma}{\sqrt{n}}$

α = level of significance

1 − α = level of confidence

Example.

1. The mean IQ of a random sample of 400 high school students is 110. The standard
deviation of the population of IQ scores is 16. If the population is normally distributed,
find:
a. a .95 confidence interval estimate of μ

$Z_{\alpha/2} = 1.96$









b. a .90 confidence interval estimate of μ

$Z_{\alpha/2} = 1.64$

2. Find the .90 confidence interval estimate of the mean weight of all the pupils in a
certain school if a random sample of 25 pupils has a mean weight of 70 lbs with a
standard deviation of 15 lbs. Assume the population weights to be normally distributed.

$t_{\alpha/2} = 1.711$
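The limits in Examples 1 and 2 both have the form mean ± (critical value)(standard error). The following is a minimal sketch of that computation using the critical values quoted above; the function name `confidence_interval` is a hypothetical helper, and treating the t case with the same formula (sample standard deviation in place of σ) is an assumption of this sketch.

```python
from math import sqrt

def confidence_interval(mean, sd, n, critical):
    """Return (lower, upper) = mean -/+ critical * sd / sqrt(n)."""
    k = critical * sd / sqrt(n)   # critical value times the standard error
    return mean - k, mean + k

iq_95 = confidence_interval(110, 16, 400, 1.96)       # Example 1a
iq_90 = confidence_interval(110, 16, 400, 1.64)       # Example 1b
weight_90 = confidence_interval(70, 15, 25, 1.711)    # Example 2, t value

for label, (low, high) in [("IQ .95", iq_95), ("IQ .90", iq_90), ("weight .90", weight_90)]:
    print(f"{label}: {low:.3f} to {high:.3f}")
```

A wider interval goes with a higher level of confidence: the .95 limits for the mean IQ lie farther apart than the .90 limits.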

3. The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0,
10.2 and 9.6 liters. Find a 95% confidence interval for the mean content of all such
containers, assuming an approximate normal distribution for container contents.
( $t_{\alpha/2} = 2.447$ )

4. The mean and standard deviation for the quality grade-point averages of a
random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Find
the 99% confidence interval for the mean of the entire senior class. Interpret the
obtained confidence interval. ( $Z_{\alpha/2} = 2.575$ )

5. The manager of a home delivery service for pizza pies wants an estimate of the
average time it takes to deliver an order within the town proper of the City of Naga. A
sample of 25 deliveries had a mean time of 15 minutes and a standard deviation of 4
minutes. Construct a 95% confidence interval for the average time for all deliveries.
Interpret the interval obtained. ( $Z_{\alpha/2} = 1.96$ )

6. A random sample of 12 students in a certain dormitory showed an average weekly
expenditure of P400 for snack foods, with a standard deviation of P50.25. Construct a
90% confidence interval for the average amount spent each week on snack foods by
female students living in this dormitory, assuming the expenditure to be approximately
normally distributed. Interpret your confidence interval.
( $t_{\alpha/2} = 1.796$ )

Lesson # 10- Test of Hypothesis

COMMON TERMS IN INFERENTIAL STATISTICS

- A HYPOTHESIS is a statement which aims to explain facts about the real
world. A test of hypothesis is a two-way decision problem. It is a procedure to
substantiate or invalidate a claim which is stated as a null hypothesis.

Definition. A NULL HYPOTHESIS (Ho) is the hypothesis that we hope to accept
or reject; it must always express the idea of non-significance of difference.
An ALTERNATIVE HYPOTHESIS (Ha) is the hypothesis that is accepted when Ho
is rejected.

- TYPE I and TYPE II ERROR

Decision      Ho is TRUE         Ha is TRUE
Reject Ho     Type I error       Correct decision
Accept Ho     Correct decision   Type II error

Type I error (α error) – we reject the null hypothesis when in fact the
null hypothesis is true.

Type II error (β error) – we accept the null hypothesis when in fact the
null hypothesis is false.

- ONE-TAILED AND TWO-TAILED TEST

Definition. When the rejection region is located at only one extreme of the range
of values for the test statistic, the test is ONE-TAILED. If Ha is a statement of
non-equality represented by the sign ≠, then the hypothesis is non-directional,
and thus we have a TWO-TAILED test.

Steps in Test of Hypothesis:

i. State the hypotheses, Ho and Ha.
ii. Determine the appropriate test statistic to use.
iii. Choose the level of significance and formulate the decision rule.
iv. Compute the value of the test statistic from the sample data.
v. Make a decision (reject or accept Ho) in accordance with the decision rule formulated.
vi. Draw a conclusion in relation to the objective of the original problem.

I. Mean of a Single Population

Case 1. Z Test

a. Hypotheses: Ho: μ = μ0 against

A. Ha: μ ≠ μ0 or
B. Ha: μ < μ0 or
C. Ha: μ > μ0

b. Test Statistic: Z Test

c. Computation:

$$Z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

d. Decision Rule: At a level of significance α,

A. For Ha: μ ≠ μ0, reject Ho if $|Z_c| > Z_{\alpha/2}$; otherwise accept Ho.

B. For Ha: μ < μ0, reject Ho if $Z_c < -Z_{\alpha}$; otherwise accept Ho.

C. For Ha: μ > μ0, reject Ho if $Z_c > Z_{\alpha}$; otherwise accept Ho.


Example 1. The weight of crabs is normally distributed with mean 28.5
ounces and standard deviation of 3 ounces. A new breeder claims that he
can breed crabs yielding a mean weight of more than 28.5 ounces. A random
sample of 16 crabs from the new breeder had a mean weight of 29.2
ounces. At α = 5%, do the data support the breeder's claim?

i. Ho: μ = 28.5
Ha: μ > 28.5

ii. Test Statistic: Z Test

iii. Decision Rule: Reject Ho if $Z_c > Z_{\alpha}$; otherwise accept Ho.

iv. Computation:

$$Z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{29.2 - 28.5}{3 / \sqrt{16}} = 0.933$$

$Z_{\alpha} = Z_{0.05} = 1.645$

v. Decision: Since $Z_c < Z_{\alpha}$ (0.933 < 1.645), accept Ho.

vi. Conclusion: At the 5% level of significance, there is not enough evidence to
support the new breeder's claim; that is, the mean weight of the sample is not
significantly greater than the mean of 28.5 ounces.
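The computation and decision steps of Example 1 can be sketched as follows. The critical value is obtained from `statistics.NormalDist` in the Python standard library rather than from a printed table, which is an assumption about the tools available; the function name `z_statistic` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Zc = (sample mean - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

alpha = 0.05
z_c = z_statistic(29.2, 28.5, 3, 16)          # about 0.933
z_alpha = NormalDist().inv_cdf(1 - alpha)     # about 1.645, right-tailed test

# Decision rule for Ha: mu > mu0 is to reject Ho if Zc > Z_alpha.
reject_h0 = z_c > z_alpha

print(round(z_c, 3), round(z_alpha, 3), "reject Ho" if reject_h0 else "accept Ho")
```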

Example 2. For the past five years, the mean height of AdeNU students has been 60
inches with a standard deviation of 4 inches. A simple random sample of 100
is taken from the present students. It was found that the mean height is 65
inches. Is there reason to believe that the mean height of present AdeNU
students is different from that of the past five years at the 5% level of significance?

Case 2. T Test

a. Hypotheses: Ho: μ = μ0 against

A. Ha: μ ≠ μ0 or
B. Ha: μ < μ0 or
C. Ha: μ > μ0

b. Test Statistic: T Test

c. Computation:

$$T_c = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$$

d. Decision Rule: At a level of significance α,

A. For Ha: μ ≠ μ0, reject Ho if $|T_c| > T_{[\alpha/2,\, n-1]}$; otherwise accept Ho.

B. For Ha: μ < μ0, reject Ho if $T_c < -T_{[\alpha,\, n-1]}$; otherwise accept Ho.

C. For Ha: μ > μ0, reject Ho if $T_c > T_{[\alpha,\, n-1]}$; otherwise accept Ho.

Example 3. A soft drink vending machine is set to dispense 6 ounces per
cup. The machine is tested eight times, yielding a mean cup fill of 5.8
ounces with a standard deviation of 0.16 oz. Is there evidence at the 5% level of
significance that the machine is underfilling cups? Assume normality.

i. Ho: μ = 6
Ha: μ < 6

ii. Test Statistic: T Test

iii. Decision Rule: Reject Ho if $T_c < -T_{[\alpha,\, n-1]}$; otherwise accept Ho.

iv. Computation:

$$T_c = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{5.8 - 6}{0.16 / \sqrt{8}} = -3.536$$

$-T_{[\alpha,\, n-1]} = -T_{[0.05,\, 7]} = -1.895$



v. Decision: Since -3.536 < -1.895, reject Ho.

vi. Conclusion: At the 5% level of significance, there is evidence to say that the
machine is underfilling the cups.
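The arithmetic of Example 3 can be checked with a few lines. This is a minimal sketch; the critical value 1.895 is copied from the example (a t table value), and the function name `t_statistic` is hypothetical.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """Tc = (sample mean - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / sqrt(n))

t_c = t_statistic(5.8, 6, 0.16, 8)   # about -3.536
t_critical = 1.895                   # T[0.05, 7], taken from the example

# Decision rule for Ha: mu < mu0 is to reject Ho if Tc < -T[alpha, n-1].
reject_h0 = t_c < -t_critical

print(round(t_c, 3), "reject Ho" if reject_h0 else "accept Ho")
```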

Example 4. The monthly output of a plywood manufacturer was measured in
nine randomly selected months. The results obtained (in tons) are 100, 120,
100, 102, 130, 140, 150, 140 and 145. Test the hypothesis that the mean
monthly output is 140 tons against the alternative that it is not 140 tons at the
10% level of significance. Assume that the monthly output is a normal random
variable.

Exercise # 8 – Test of Hypothesis ( Z and T Test)

A. Carry out a complete test of hypothesis for the following problems.

1. A certain brand of powdered milk is advertised as having a net weight of 250 grams. If
the net weights of a random sample of 10 cans are 253, 248, 252, 245, 247, 249, 251,
250, 247 and 248 grams, can it be concluded that the average
net weight of the cans is less than the advertised amount? Use α = 0.01 and assume
that the net weight of this brand of powdered milk is normally distributed.

2. In a time and motion study, it was found that the average time required by workers
to complete a certain manual operation was 26.6 minutes. A group of 20 workers was
randomly chosen to receive special training for two weeks. After the training it was
found that their average time was 24 minutes with a standard deviation of 3
minutes. Can it be concluded that the special training speeds up the operation? Use
α = 0.05.

3. The manager of an appliance store, after noting that the average daily sales was only
12 units, decided to adopt a new marketing strategy. Daily sales under this strategy
were recorded for 90 days, after which period the average was found to be 15 units
with a standard deviation of 4 units. Does this indicate that the new marketing
strategy increased the daily sales? Employ α = 0.01.

4. The daily wages in a particular industry are normally distributed with a mean of
P66.00. In a random sample of 144 workers of a very large company in this industry,
the average daily wage was found to be P62.00 with a standard deviation of P12.50.
Can this company be accused of paying inferior wages at the 0.01 level of
significance?

II. Two Population Means – T Test

A. Dependent or Paired / Independent

i. Ho: The population mean of A is equal to the population mean of B.
Ha: The population means are not equal.
ii. Decision rule: Reject Ho if p-value < level of significance
or t-computed > t-value; otherwise accept Ho.

III. ANOVA
Sample Problems:
a. A researcher wishes to know if there are differences in the average preparation time of
four methods of preparing a solvent.
b. An agriculturist may compare the average yields of three corn varieties used by Los
Banos.
c. A consumer wishes to know if the different brands of gasoline in the market are equally
good with respect to average mileage.
d. A medical researcher is interested in comparing the effectiveness of 3 different
treatments to lower the cholesterol of patients with high values.
e. An ecologist wants to compare the amount of a certain pollutant in five rivers.

i. Ho: There is no difference between groups.
Ha: There is a difference between groups.
ii. Decision rule: Reject Ho if p-value < level of significance
or F-value > critical value; otherwise accept Ho.

IV. Chi-Square Test of Independence

This test is usually applied to enumeration data or data in contingency tables. It
tests the association or independence of one variable from another variable.

i. Ho: The two variables are independent.
Ha: The two variables are dependent.
ii. Decision rule: Reject Ho if p-value < level of significance
or χ² value > critical value; otherwise accept Ho.

SAMPLE PROBLEMS

Two Population Means - T test

A. Dependent or Paired

1. In a study of the effectiveness of physical exercise in weight reduction, a
simple random sample of 8 persons engaged in a prescribed program of
physical exercise for one month showed the following results:

Weight Before   209  178  169  212  180  192  158  180
Weight After    196  171  170  207  177  190  159  180

At the 1% level of significance, do the data provide evidence that the prescribed
program of exercise is effective?

a. Ho: The weights before and after are equal; therefore, the program is not
effective.

Ha: The weights before and after are not equal; therefore, the program is
effective.

b. Decision rule: Reject Ho if T-computed > critical value; otherwise accept Ho at
the 1% level of significance.

c. Test Statistic: T-test on Two Populations

d. Computation: T-computed = 2.07
Critical value = 3.499

e. Decision: Accept Ho.

f. Conclusion: At the 1% level of significance, there is not sufficient evidence to say
that the program is effective.
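The handout reports only the final T-computed value. One way to arrive at it, assuming the usual paired-difference formula (mean of the differences divided by their standard error), is sketched below; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [209, 178, 169, 212, 180, 192, 158, 180]
after = [196, 171, 170, 207, 177, 190, 159, 180]

# Work with the weight lost by each person (before minus after).
diffs = [b - a for b, a in zip(before, after)]

# t = mean difference / (sample standard deviation of differences / sqrt(n))
t_c = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

print(round(t_c, 2))   # about 2.07, below the critical value 3.499
```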

B. Independent

2. Some statistics students complain that pocket calculators give other students an
advantage during statistics examinations. To check this contention, a simple
random sample of 45 students was randomly assigned to two groups, 23 to
use calculators and 22 to perform calculations by hand. The students then
took a statistics examination that required a modest amount of arithmetic.
The results are shown below:

With Calculator: 85 86 89 84 82 83 90 91 86 90 87 87 92 85 86 89 88
88 89 90 85 89 90

Without Calculator: 86 88 90 92 86 85 88 89 85 91 86 85 92 84 83 88 90
91 86 90 86 87

Do the data provide sufficient evidence to indicate that the students taking
this particular examination obtain higher scores when using a calculator? Test at
α = 10%.

a. Ho: The mean scores are equal.
Ha: The mean scores are not equal.

b. Decision rule: Reject Ho if T-computed > critical value; otherwise accept Ho.

c. Test Statistic: T-test on Two Populations

d. Computation: T-computed = 0.25
Critical value = 1.303

e. Decision: Accept Ho.

f. Conclusion: At the 10% level of significance, there is not enough evidence to say
that the use of calculators will assure students of higher scores.
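The handout does not state which two-sample formula produced T-computed. The sketch below assumes the pooled-variance (equal variances) t statistic, which gives a value whose magnitude is close to the reported 0.25; the function name `pooled_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

with_calc = [85, 86, 89, 84, 82, 83, 90, 91, 86, 90, 87, 87,
             92, 85, 86, 89, 88, 88, 89, 90, 85, 89, 90]
without_calc = [86, 88, 90, 92, 86, 85, 88, 89, 85, 91, 86,
                85, 92, 84, 83, 88, 90, 91, 86, 90, 86, 87]

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance (assumed equal variances)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))

t_c = pooled_t(with_calc, without_calc)
print(round(t_c, 2))
```

The sign of the result depends on the order of subtraction. Here the group with calculators has the slightly lower mean, so the statistic is negative.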

ANOVA

3. A study was conducted to compare three teaching methods. Three groups
of 6 students were chosen and each group was subjected to one of the three types of
teaching method. The grades of the students taken at the end of the semester
are given as:

             Group I     Group II    Group III
             Method A    Method B    Method C
Student 1       84          70          90
Student 2       90          75          95
Student 3       92          90         100
Student 4       96          80          98
Student 5       84          75          88
Student 6       88          75          90

a. Ho: The three teaching methods are equal.
Ha: The three teaching methods are not equal.

b. Decision rule: Reject Ho if F-computed > critical value; otherwise accept Ho.

c. Test Statistic: F-test ANOVA

d. Computation: F-computed = 13.121
Critical value = 3.68

e. Decision: Reject Ho.

f. Conclusion: There is evidence to say that the three methods are not equal.
We can also conclude that Method C (Group III) is more effective since its students got
higher grades compared to the other two methods.
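The F-computed value can be reproduced from the grades with the usual one-way ANOVA sums of squares. This is a minimal sketch under that assumption; the function name `one_way_f` is hypothetical.

```python
from statistics import mean

groups = {
    "Method A": [84, 90, 92, 96, 84, 88],
    "Method B": [70, 75, 90, 80, 75, 75],
    "Method C": [90, 95, 100, 98, 88, 90],
}

def one_way_f(samples):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1                  # 3 groups -> 2
    df_within = len(all_values) - len(samples)     # 18 grades -> 15
    return (ss_between / df_between) / (ss_within / df_within)

f_c = one_way_f(list(groups.values()))
print(round(f_c, 3))   # about 13.121, above the critical value 3.68
```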

Chi-Square Test of Independence

4. It is believed that people with high blood pressure need to watch their weight.
A random sample of 300 subjects was classified according to their weight and
blood pressure. At the 5% level of significance, is there sufficient evidence to
conclude that a person’s weight is related to his blood pressure?

                   Blood Pressure
Weight         High    Normal    Low
Overweight      40       34       18
Normal          36       77       27
Underweight     16       33       19

a. Ho: Weight is independent of blood pressure, or weight is unaffected by
blood pressure, or the two variables weight and blood pressure are
independent.

Ha: Weight is dependent on blood pressure, or weight is affected by blood
pressure, or the two variables weight and blood pressure are dependent.

b. Decision rule: Reject Ho if χ²-computed > critical value; otherwise accept Ho.

c. Test Statistic: Chi-square Test

d. Computation: χ²-computed = 12.75
Critical value = 9.49

e. Decision: Reject Ho.

f. Conclusion: At the 5% level of significance, there is evidence to say that weight
is related to blood pressure. For overweight persons, most of them
(approximately 40% of the actual population) will have high blood pressure.
Persons of normal weight are most likely to have normal blood pressure. Those
who are underweight are also most likely to have normal blood pressure.
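The χ²-computed value can be reproduced by comparing each observed count with the count expected under independence, (row total)(column total) / (grand total). The sketch below assumes this standard formula; the function name `chi_square` is hypothetical.

```python
# Rows: Overweight, Normal, Underweight. Columns: High, Normal, Low.
observed = [
    [40, 34, 18],
    [36, 77, 27],
    [16, 33, 19],
]

def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

chi2_c, df = chi_square(observed)
print(round(chi2_c, 2), df)   # about 12.75 with 4 degrees of freedom
```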

Exercise # 9 – Test of Hypothesis (T-test, ANOVA and Chi-Square Test)


Objectives:
At the end of the exercise, the student is expected to be able to apply the appropriate statistical procedure
in performing test of hypothesis of various problems

Carry out a complete test of hypothesis for the following problems.

1. As part of a study to determine the effects of a certain oral contraceptive on
weight gain, 12 healthy females were weighed at the beginning of a course of
oral contraceptive usage. They were reweighed after three months. Do the
results suggest evidence of weight gain? Use α = 0.05.

Subject           1    2    3    4    5    6    7    8    9   10   11   12
Initial Weight   120  141  130  162  150  148  135  140  129  120  140  130
3-Month Weight   123  143  140  162  145  150  140  143  130  118  141  132
Source: Basic Statistics for Health Sciences by Kuzma

a. Ho:

Ha:

b. Test Statistic:

c. Decision Rule:

d. Computation: Computed value = 1.75
Critical value = 2.201

e. Decision:

f. Conclusion:

2. An investment analyst claims to have mastered the art of forecasting the price
changes of gold. The following table gives the actual gold price changes and the
changes forecasted by the investment analyst (in %) on a simple random
sample of 8 months. Use α = 5%.

Month                    1      2     3     4     5     6     7     8
Actual Price Changes    7.3   -2.1   8.5  -1.5   9.2   6.7  -4.8  -0.8
Forecasted Changes     14.9  -19.7   7.0  -5.3   1.0  -0.8  -8.3   6.7

a. Ho:

Ha:

b. Test Statistic:

c. Decision Rule:

d. Computation: Computed value = 1.15
Critical value = 2.365

e. Decision:

f. Conclusion:

3. Four groups of 4 patients each were subjected to four different types of
treatment for the same ailment. The following data are on the number of days
that elapsed before they were completely cured. What conclusions may be drawn
about the four types of treatment?

             Treatment A   Treatment B   Treatment C   Treatment D
Patient 1        10            11             3             6
Patient 2         9            11             4            10
Patient 3         6            18             5             8
Patient 4         7             6             7            11

a. Ho:

Ha:

b. Test Statistic:

c. Decision Rule:

d. Computation: Computed value = 3.474


Critical value = 3.49

e. Decision:

f. Conclusion:

4. Test if there is a significant association between academic performance and IQ.

Table. Academic Performance and IQ of 100 Students

Academic                       IQ
Performance     High    Average    Low    Total
Passed           31       45         4      80
Failed            1        4        15      20
Total            32       49        19     100

a. Ho:

Ha:

b. Test Statistic:

c. Decision Rule:

d. Computation: Computed value = 51.25
Critical value = 5.99

e. Decision:

f. Conclusion:

Lesson # 11 - TWO-FACTOR ANOVA

Example 1. A research study was conducted to examine the impact of eating a high
protein breakfast on adolescents' performance during a physical education
fitness test. Half of the subjects received a high protein breakfast and half were given
a low protein breakfast. All of the adolescents, both male and female, were given a
fitness test with high scores representing better performance. Test scores are
recorded below.

           Males                          Females
High Protein   Low Protein    High Protein   Low Protein
     10             5               5              3
      7             4               4              4
      9             7               6              5
      6             4               3              1
      8             5               2              2

Statistical test results:

                                                               F-critical
Treatment                                            F-value    5%     1%
between (protein level)                               *8.89    4.49   8.53
within (gender)                                      *20.00    4.49   8.53
among (interaction between protein level and gender)   2.22    4.49   8.53

Ho: There is no difference in performance between the two protein levels.
There is no difference in performance between the two genders.
There is no interaction between protein level and gender.

Interpretation:

At the 5% level of significance, it can be concluded that there is a significant difference
in performance for both protein level and gender. There was no significant
interaction effect. Based on these data, it appears that a higher protein diet results in
better fitness test scores. Additionally, young men seem to have significantly higher
fitness test scores than young women.
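The three F-values in the results table can be reproduced from the scores. The sketch below assumes the standard sums of squares for a balanced two-factor design with 5 scores per cell; the variable names are illustrative.

```python
from statistics import mean

# Scores per cell, keyed by (gender, protein level), from the table above.
cells = {
    ("male", "high"): [10, 7, 9, 6, 8],
    ("male", "low"): [5, 4, 7, 4, 5],
    ("female", "high"): [5, 4, 6, 3, 2],
    ("female", "low"): [3, 4, 5, 1, 2],
}
genders, proteins, n = ["male", "female"], ["high", "low"], 5

grand = mean(v for scores in cells.values() for v in scores)
gender_mean = {g: mean(v for p in proteins for v in cells[(g, p)]) for g in genders}
protein_mean = {p: mean(v for g in genders for v in cells[(g, p)]) for p in proteins}

ss_gender = n * len(proteins) * sum((m - grand) ** 2 for m in gender_mean.values())
ss_protein = n * len(genders) * sum((m - grand) ** 2 for m in protein_mean.values())
ss_inter = n * sum(
    (mean(cells[(g, p)]) - gender_mean[g] - protein_mean[p] + grand) ** 2
    for g in genders for p in proteins
)
ss_within = sum((v - mean(s)) ** 2 for s in cells.values() for v in s)

# Each effect has 1 degree of freedom; the error term has 4 * (5 - 1) = 16.
ms_within = ss_within / (len(cells) * (n - 1))
f_protein = ss_protein / 1 / ms_within
f_gender = ss_gender / 1 / ms_within
f_inter = ss_inter / 1 / ms_within

print(round(f_protein, 2), round(f_gender, 2), round(f_inter, 2))
```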

Seatwork:

1. Different typing skills are required for secretaries depending on whether one is
working in a law office, an accounting firm, or for a mathematical research group at a
major university. In order to evaluate candidates for these positions, an employment
agency administers three distinct standardized typing samples. A time penalty has been
incorporated into the scoring of each sample based on the number of typing errors. The
mean and standard deviation for each test, together with the score achieved by a recent
applicant, are given in the table below. For what type of position does this applicant seem
to be best suited?

Sample        Applicant’s Score    Mean       Standard Deviation
Law           141 sec              180 sec    30 sec
Accounting    7 min                10 min     2 min
Scientific    33 min               26 min     5 min

2. Researchers have sought to examine the effect of various types of music on agitation
levels in patients who are in the early and middle stages of Alzheimer’s disease. Patients
were selected to participate in the study based on their stage of Alzheimer’s disease.
Three forms of music were tested: easy listening, Mozart, and piano interludes. While
listening to music, agitation levels were recorded for the patients, with a high score
indicating a higher level of agitation. Scores are recorded below.

       Early Stage Alzheimer’s                   Middle Stage Alzheimer’s
Piano Interlude   Mozart   Easy Listening   Piano Interlude   Mozart   Easy Listening
      21             9          29                22             14          15
      24            12          26                20             18          18
      22            10          30                25             11          20
      18             5          24                18              9          13
      20             9          26                20             13          19

3. A study examining differences in life satisfaction between young adult, middle adult
and older adult men and women was conducted. Each individual who participated in
the study completed a life satisfaction questionnaire. A high score on the test indicates
a higher level of life satisfaction. Test scores are recorded below.

                   Males                               Females
         Young     Middle    Older           Young     Middle    Older
         Adult     Adult     Adult           Adult     Adult     Adult
           4         7        10               7         8        10
           2         5         7               4        10         9
           3         7         9               3         7        12
           4         5         8               6         7        11
           2         6        11               5         8        13

Mean =     3         6         9               5         8        11

Lesson # 12 – Pearson Moment Correlation


The Pearson product-moment correlation is one of the measures of correlation. It
quantifies the strength as well as the direction of the relationship between two variables.
The correlation coefficient (r) has the following interpretation:

Scale (+/-)     Decision
1.00            Perfect Relationship
0.80 – 0.99     Very Strong Relationship
0.60 – 0.79     Strong Relationship
0.40 – 0.59     Moderate Relationship
0.20 – 0.39     Weak Relationship
0.01 – 0.19     Very Weak Relationship
0.00            No Relationship

Table.
Results of 20 AdNU Entrance Examinees
No. SAI RPM Math English
1 52 25 47 21
2 84 40 48 11
3 113 90 58 29
4 92 90 47 14
5 98 80 54 17
6 91 80 56 19
7 52 15 52 18
8 116 40 68 38
9 101 60 69 22
10 83 15 48 16
11 65 10 52 16
12 96 95 54 19
13 94 80 54 15
14 89 65 56 20
15 91 45 54 21
16 92 80 64 17
17 101 95 58 33
18 97 95 56 17
19 89 80 56 11
20 96 95 58 27
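The handout gives the interpretation scale but not the computing formula for r. The sketch below assumes the standard product-moment formula and applies it to the Math and English columns of the table; the function names `pearson_r` and `interpret` are hypothetical, and rounding |r| to two decimals before reading the scale is an assumption made so that every value falls in one of the listed ranges.

```python
from math import sqrt

def pearson_r(x, y):
    """r = sum of cross-deviations / sqrt(product of the sums of squared deviations)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def interpret(r):
    """Map r to the handout's scale, using |r| rounded to two decimals."""
    size = round(abs(r), 2)
    if size == 1.00:
        return "Perfect Relationship"
    for lower, label in [(0.80, "Very Strong"), (0.60, "Strong"), (0.40, "Moderate"),
                         (0.20, "Weak"), (0.01, "Very Weak")]:
        if size >= lower:
            return label + " Relationship"
    return "No Relationship"

# Math and English columns of the table, examinees 1 to 20.
math_scores = [47, 48, 58, 47, 54, 56, 52, 68, 69, 48,
               52, 54, 54, 56, 54, 64, 58, 56, 56, 58]
english_scores = [21, 11, 29, 14, 17, 19, 18, 38, 22, 16,
                  16, 19, 15, 20, 21, 17, 33, 17, 11, 27]

r = pearson_r(math_scores, english_scores)
print(round(r, 2), interpret(r))
```

The sign of r gives the direction of the relationship, and the scale is applied to its absolute value.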