
St. Paul University Philippines
Graduate School

A Course Presentation in Statistics

Course Content
• Basic Concepts in Statistics
• Measures of Central Tendency
• Measures of Variability
• Correlation and Regression Analysis
• Test of Hypothesis
– Z – Test
– T – Test
– Chi – Square Test
– Analysis of Variance (ANOVA)
• Exploring SPSS

Course Requirements
• Reaction Paper / Film Clip Analysis: Lies, Damned Lies and Statistics: The Misapplication of Statistics in Everyday Life
• Problem Set
• Final Examination
Statistics defined . . .
• STATISTICS is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on the data.

Main Divisions
Descriptive Statistics
- summarizes or describes the important characteristics of a known set of population data
Inferential Statistics
- uses sample data to make inferences (or generalizations) about a population

Population vs. Sample
• A POPULATION is the complete collection of elements (scores, people, measurements, and so on)
• A SAMPLE is a portion / subset of elements drawn from a population

Parameter vs. Statistic
• A PARAMETER is a numerical measurement describing some characteristic of a population
• A STATISTIC is a numerical measurement describing some characteristic of a sample
Qualitative vs. Quantitative Data
• Qualitative (categorical or attribute) data can be separated into different categories that are distinguished by some non-numerical characteristic
• Quantitative data consist of numbers representing counts or measurements

Discrete vs. Continuous Data
• Discrete data result from either a finite number of possible values or a countable number of possible values (that is, the number of possible values is 0, 1, 2, or more)
• Continuous data result from infinitely many possible values that can be associated with points on a continuous scale in such a way that there are no gaps or interruptions

Dependent vs. Independent Variable
• Dependent variable – the variable that is being affected
- the variable that is being explained
• Independent variable – the variable that affects
- the variable that explains

Nominal Level of Measurement
• The nominal level of measurement is characterized by data that consist of names, labels or categories only. The data cannot be arranged in an ordering scheme
• Examples: gender of employees, civil status, nationality, religion, etc.
Ordinal Level of Measurement
• The ordinal level of measurement involves data that may be arranged in some order, but differences between data values are either meaningless or cannot be determined.
• Examples: good, better or best speakers; 1 star, 2 star or 3 star movie; rank of an employee

Interval Level of Measurement
• The interval level of measurement is like the ordinal level, with the additional property that meaningful amounts of differences between data can be determined. However, there is no inherent (natural) zero starting point
• Examples: body temperature, year (2007, 2008, 2013, etc.)

Ratio Level of Measurement
• The ratio level of measurement is the interval level modified to include the inherent zero starting point. For values at this level, differences and ratios are meaningful.
• Examples: weights, lengths, distance traveled

Visual Summary of the Scales of Measurement
Are there named categories?
  YES → Nominal scale of measurement
  NO → Are the scores ranked?
    YES → Ordinal scale of measurement
    NO → Are there equal intervals with a meaningful zero point?
      YES → Ratio scale of measurement
      NO → Interval scale of measurement
Measures of Central Tendency (UNGROUPED DATA)
Mean, Median, Mode

The Mean
• Two Forms
– Simple mean
– Weighted mean
The mean takes the symbol $\bar{X}$.

Arithmetic Mean (Mean)
• the "balancing point" of a set of scores
• the "average score"

The Mean
If you have a population:
- Total number of cases is N
- Sum of the scores is ΣX
- Compute the mean of the population: $\mu = \frac{\sum X}{N}$
If you have a sample:
- Total number of cases is n
- Sum of the scores is ΣX
- Compute the mean of the sample: $\bar{X} = \frac{\sum X}{n}$
Simple Arithmetic Mean
$\bar{X} = \frac{\sum x}{n}$
Where:
x = an individual score
n = the number of scores/cases
Σx (sigma x) = sum of the individual score values

Example:
Consider the following data set:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Solution:
$\bar{X} = \frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10} = \frac{55}{10}$
Mean = 5.5
Example:
• The following data represent the ages of the mothers of Paulinian Graders randomly selected from four different grade levels who attended a session on Counseling. What is the mean age of the mothers per grade level?
• Grade 1: 35, 37, 45, 54, 39, 48
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Solution:
• To obtain the mean age of the mothers of the Grade 1 pupils, we have
$\bar{X} = \frac{35 + 37 + 45 + 54 + 39 + 48}{6} = \frac{258}{6}$
$\bar{X} = 43$
**This means that the mothers of the Grade 1 pupils are relatively young.
Example:
• Find the mean of the other grade levels. Round off your answers to the nearest hundredth.

Answers:
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
ANSWER: 53.73
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
ANSWER: 50
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47
ANSWER: 52.44
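
The simple mean computation for the four grade levels can be checked with a short sketch. The names `ages` and `simple_mean` are illustrative only and are not part of the slides.

```python
# Ages of the mothers per grade level, copied from the example.
ages = {
    "Grade 1": [35, 37, 45, 54, 39, 48],
    "Grade 2": [54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63],
    "Grade 3": [56, 48, 39, 48, 55, 57, 41, 56],
    "Grade 4": [53, 47, 49, 59, 60, 45, 53, 59, 47],
}

def simple_mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

# Round to the nearest hundredth, as the exercise asks.
means = {grade: round(simple_mean(x), 2) for grade, x in ages.items()}
for grade, value in means.items():
    print(grade, value)
```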

Weighted Mean
$\bar{X}_w = \frac{w_1X_1 + w_2X_2 + w_3X_3 + \dots + w_nX_n}{w_1 + w_2 + w_3 + \dots + w_n}$
Where:
w = weight per item value
x = individual score values
The denominator is the total of the weights.

Example:
• The following are the responses of 30 randomly chosen respondents in one item of a research questionnaire.

Verbal Description     Weight   No. of Responses
Very strongly agree    5        7
Strongly agree         4        11
Agree                  3        9
Disagree               2        2
Strongly disagree      1        1

** Find the weighted response of the respondents and interpret the result.
Solution:
• To obtain the weighted response, we have
$\bar{X}_w = \frac{5(7) + 4(11) + 3(9) + 2(2) + 1(1)}{30} = \frac{111}{30}$
$\bar{X}_w = 3.70$ → strongly agree

Interpretation of Values
Range          Verbal Description
4.20 – 5.00    Very strongly agree
3.40 – 4.19    Strongly agree
2.60 – 3.39    Agree
1.80 – 2.59    Disagree
1.00 – 1.79    Strongly disagree
Exercise:
• Construct a Likert scale to interpret items of a questionnaire with weights 1 – 4.
• Assume the following descriptions were used:
4 – always
3 – sometimes
2 – seldom
1 – never

Example:
• The following are the grades of one student in one summer term.

Subject       No. of Units   Grade
Statistics    3              98
PE            2              90
Chemistry     5              93

** Find the weighted average of the student.
** What would the student's average have been if all his subjects were of equal weight?
Characteristics of the Mean
• an interval statistic
• calculated average
• value is determined by every case in the distribution
• affected by extreme values
• most widely used
• most sensitive measure
• sum of the deviations about the mean is zero
(– 1) + (– 2) + (– 2) + 1 + 4 = 0
[Figure: five scores A to E on a number line from 3 to 9, with their deviations from the mean marked (–1), (–2), (–2), (+1) and (+4)]
Median
• the value that lies in the middle after ranking all the scores
• positional measure
• the midpoint or the 50th percentile of a distribution
• the value at which 1/2 of the ordered scores fall above and 1/2 of the scores fall below

n = odd: 1 2 3 4 5 → Median = 3
n = even: 1 2 3 4 → Median = 2.5
Example:
[Figure: an ordered set of observations with the middle one labeled "I am the 4th observation. I am the median."]

Example
5.40 1.10 0.42 0.73 0.48 1.10
0.42 0.48 0.73 1.10 1.10 5.40
(even number of values – no exact middle, shared by two numbers)
$\frac{0.73 + 1.10}{2}$
MEDIAN is 0.915

Example
5.40 1.10 0.42 0.73 0.48 1.10 0.66
0.42 0.48 0.66 0.73 1.10 1.10 5.40
(in order – odd number of values)
exact middle: MEDIAN is 0.73

Characteristics of the Median
• an ordinal statistic
• rank or position average
• not affected by extreme values
• can be subjected to a few mathematical computations
• less widely used than the mean
• represents a typical score
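
The two median examples (an even and an odd number of values) can be reproduced with a small sketch. The function name `median` is illustrative; Python's standard library also provides `statistics.median`.

```python
def median(scores):
    """Middle value of the ranked scores; average of the two middle values if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

even_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
odd_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]

print(round(median(even_case), 3))  # middle shared by 0.73 and 1.10
print(median(odd_case))             # exact middle value
```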

Exercise
• The following data represent the ages of the mothers of Paulinian Graders randomly selected from four different grade levels who attended a session on Counseling. What is the median of the ages of the mothers per grade level?
• Grade 1: 35, 37, 45, 54, 39, 48
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Mode
• the value which occurs most frequently in a given data set
• does not involve any calculation or ordering of data

Example
Consider the following data set:

Observation   Value/Score
1             5
2             7
3             3
4             8
5             7

Examples
a. 5.40 1.10 0.42 0.73 0.48 1.10 → Mode is 1.10
b. 27 27 27 55 55 55 88 88 99 → Bimodal: 27 & 55
c. 1 2 3 6 7 8 9 10 → No Mode
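
Finding the mode amounts to counting how often each value occurs. The sketch below assumes the convention used in the examples: a data set in which no value repeats has no mode. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value occurs more than once
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 7, 3, 8, 7]))                        # observation table
print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))   # example a
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))   # example b (bimodal)
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))              # example c (no mode)
```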

Characteristics of the Mode
• a nominal statistic
• an inspection average
• most frequently occurring value
• cannot be manipulated mathematically
• rarely used
• most "popular" score

Which is best?

Mode
Advantages: Quick and easy to calculate.
Disadvantages: May not be representative of the whole sample.

Median
Advantages: Fairly easy to calculate. Half of the scores lie above the median.
Disadvantages: Tedious to find for a large set of numbers or for a set that is not in order.

Mean
Advantages: Takes all numbers into account.
Disadvantages: Can be affected by outliers.

When to use . . .
Mean
- an interval interpretation is needed
- the value of each score is desired
- further statistical computation is expected
Median
- an ordinal interpretation is needed
- the middle score is desired
- avoidance of the influence of extreme values is needed
Mode
- a nominal interpretation is needed
- a quick approximation of a central tendency measure is desired
- the most frequently occurring score is needed

Measures of Central Tendency (GROUPED DATA)
Mean, Median, Mode
The Mean
i.) Class mark method
$\bar{X} = \frac{\sum f X_m}{n}$
Where:
Xm – class mark / class midpoint
f – frequency
n – number of cases / observations

ii.) Coded – deviation method
$\bar{X} = AM + \left( \frac{\sum f d}{n} \right) i$
Where:
AM – assumed mean (Xm of the class where the zero deviation is set)
f – frequency
d – deviation
n – number of cases / observations
i – class size/width

Example
**Find the mean, median and mode of the following data set:

X          F
24 – 26    3
21 – 23    12
18 – 20    10
15 – 17    6
12 – 14    6
9 – 11     5
6 – 8      5
3 – 5      3

The Median
$Md = X_{LB} + \left( \frac{\frac{n}{2} - cfp}{f} \right) i$
Where:
XLB – lower boundary of the median class
cfp – cumulative frequency preceding the median class
n – number of cases
f – frequency of the median class
i – class size/width
The Mode
$Mo = X_{LB} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i$
Where:
XLB – lower boundary of the modal class
∆1 – difference between frequency of the modal class and frequency below it
∆2 – difference between frequency of the modal class and frequency above it
i – class size/width

Exercise
**Find the mean, median and mode of the following data set:

X          F
56 – 62    4
49 – 55    9
42 – 48    12
35 – 41    12
28 – 34    10
21 – 27    8
14 – 20    6
7 – 13     4
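
The grouped-data formulas for the mean (class mark method), median and mode can be sketched as below, using the example frequency distribution with classes 3 – 5 through 24 – 26. This is an illustration under stated assumptions: class limits are whole numbers, so each lower boundary is the lower limit minus 0.5, and "below"/"above" the modal class mean the next lower and next higher class. All function names are hypothetical.

```python
# (lower limit, upper limit, frequency), lowest class first.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]

def class_width(classes):
    # Distance between consecutive lower limits (class size i).
    return classes[1][0] - classes[0][0]

def grouped_mean(classes):
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes):
    n = sum(f for _, _, f in classes)
    cfp = 0  # cumulative frequency preceding the current class
    for lo, _, f in classes:
        if cfp + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            return (lo - 0.5) + (n / 2 - cfp) / f * class_width(classes)
        cfp += f

def grouped_mode(classes):
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # modal class (first one if tied)
    below = freqs[k - 1] if k > 0 else 0
    above = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1, d2 = freqs[k] - below, freqs[k] - above
    return (classes[k][0] - 0.5) + d1 / (d1 + d2) * class_width(classes)

print(grouped_mean(classes), grouped_median(classes), round(grouped_mode(classes), 2))
```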

Other Measures of Position (QUANTILES)
1. Quartile (Qk) – divides the distribution into 4 equal parts
2. Decile (Dk) – divides the distribution into 10 equal parts
3. Percentile (Pk) – divides the distribution into 100 equal parts

The Quartile
$Q_k = X_{LB} + \left( \frac{\frac{kn}{4} - cfp}{f} \right) i$
Where:
XLB – lower boundary of the quartile class
cfp – cumulative frequency preceding the quartile class
n – number of cases
f – frequency of the quartile class
i – class size/width

The Decile
$D_k = X_{LB} + \left( \frac{\frac{kn}{10} - cfp}{f} \right) i$
Where:
XLB – lower boundary of the decile class
cfp – cumulative frequency preceding the decile class
n – number of cases
f – frequency of the decile class
i – class size/width

The Percentile
$P_k = X_{LB} + \left( \frac{\frac{kn}{100} - cfp}{f} \right) i$
Where:
XLB – lower boundary of the percentile class
cfp – cumulative frequency preceding the percentile class
n – number of cases
f – frequency of the percentile class
i – class size/width
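
Because the quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below applies it to the earlier example distribution (classes 3 – 5 through 24 – 26, n = 50). It assumes whole-number class limits, so the lower boundary is the lower limit minus 0.5, and the name `grouped_quantile` is hypothetical.

```python
# (lower limit, upper limit, frequency), lowest class first.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]

def grouped_quantile(classes, k, parts):
    """k-th quantile when the distribution is divided into `parts` equal parts."""
    n = sum(f for _, _, f in classes)
    width = classes[1][0] - classes[0][0]  # class size i
    target = k * n / parts                 # kn/4, kn/10 or kn/100
    cfp = 0                                # cumulative frequency preceding the class
    for lo, _, f in classes:
        if cfp + f >= target:
            return (lo - 0.5) + (target - cfp) / f * width
        cfp += f

q1 = grouped_quantile(classes, 1, 4)      # first quartile
d6 = grouped_quantile(classes, 6, 10)     # sixth decile
p78 = grouped_quantile(classes, 78, 100)  # 78th percentile
print(round(q1, 2), round(d6, 2), round(p78, 2))
```

Calling `grouped_quantile(classes, 2, 4)` gives the second quartile, which coincides with the grouped median.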

Exercise
**Using the frequency distribution below, find:
1. Q1
2. D6
3. D3
4. P78
5. P3

X          F
56 – 62    6
49 – 55    9
42 – 48    10
35 – 41    12
28 – 34    10
21 – 27    8
14 – 20    6
7 – 13     4

Measures of Variability
• The statistical tool used to describe the degree to which scores / observations are scattered.
• It is used to determine the degree of consistency / homogeneity of scores.
1. range
2. mean absolute deviation
3. semi – interquartile range / quartile deviation
4. variance
5. standard deviation

Formulas (Ungrouped Data)
1. Range
R = HOV – LOV
2. Mean absolute deviation
$MAD = \frac{\sum |X - \bar{X}|}{n}$
3. Semi – interquartile range / quartile deviation
$QD = \frac{Q_3 - Q_1}{2}$
4. Variance
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
5. Standard deviation
$s = \sqrt{s^2}$

Exercise:
• Given the following data, find the range, MAD, variance and the standard deviation.
20, 26, 40, 39, 35

Application:
• Two seemingly equally excellent students are vying for an academic honor where only one can be chosen to get the award. The following are their grades, which are used as a basis for giving the award.
• Student A: 90, 92, 92, 94, 95
• Student B: 90, 91, 93, 94, 95
• Who do you think deserves the award? Why?
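
The ungrouped formulas for the range, MAD, variance and standard deviation can be sketched as follows. Function names are illustrative; the variance uses the n – 1 divisor from the formulas above.

```python
from math import sqrt

def value_range(x):
    return max(x) - min(x)  # HOV - LOV

def mean_absolute_deviation(x):
    m = sum(x) / len(x)
    return sum(abs(v - m) for v in x) / len(x)

def sample_variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def sample_sd(x):
    return sqrt(sample_variance(x))

data = [20, 26, 40, 39, 35]
print(value_range(data), mean_absolute_deviation(data),
      sample_variance(data), round(sample_sd(data), 3))

# Application: the smaller standard deviation marks the more consistent student.
student_a = [90, 92, 92, 94, 95]
student_b = [90, 91, 93, 94, 95]
print(round(sample_sd(student_a), 3), round(sample_sd(student_b), 3))
```

Both students have the same mean (92.6), so the comparison rests on variability alone.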

Guiding Principle
• The lesser the value of the measure, the more consistent, the more homogeneous and the less scattered are the observations in the set of data.

Formulas (Grouped Data)
1. Range
R = HOV – LOV
2. Mean absolute deviation
$MAD = \frac{\sum f |X_m - \bar{X}|}{n}$
3. Semi – interquartile range / quartile deviation
$QD = \frac{Q_3 - Q_1}{2}$
4. Variance
$s^2 = \frac{\sum f (X_m - \bar{X})^2}{n - 1}$
5. Standard deviation
$s = \sqrt{s^2}$

Exercise:
**Using the frequency distribution below, find:
1. Range
2. MAD
3. QD
4. Variance
5. Standard Deviation

X          F
56 – 62    6
49 – 55    9
42 – 48    10
35 – 41    12
28 – 34    10
21 – 27    8
14 – 20    6
7 – 13     4

Tests of Hypothesis

Hypothesis
• A statement or tentative theory which aims to explain facts about the real world
• An educated guess
• It is subject to testing. If it is found to be statistically true, it is accepted. Otherwise, it gets rejected.

Kinds of Hypotheses
1. Null Hypothesis (Ho)
• It serves as the working hypothesis
• It is that which one hopes to accept or reject
• It must always express the idea of no significant difference
2. Alternative Hypothesis (H1 or Ha)
• It generally represents the hypothetical statement that the researcher wants to prove.

Types of Alternative Hypotheses (Ha)
1. Directional hypothesis
• expresses direction
• one – tailed
• uses the order relation "greater than" or "less than"
2. Non – directional hypothesis
• does not express direction
• two – tailed
• uses "not equal to"

Type I and Type II Errors
When making a decision about a proposed hypothesis based on the sample data, one runs the risk of making an error. The next slide summarizes the possibilities.

Type I and Type II Errors
• A Type I error is the mistake of rejecting the null hypothesis when it is true.
• The symbol α (alpha) is used to represent the probability of a Type I error.
• A Type II error is the mistake of failing to reject the null hypothesis when it is false.
• The symbol β (beta) is used to represent the probability of a Type II error.

Level of Significance
The probability of making a Type I error (alpha error) in a test is called the significance level of the test. The significance level of a test is the maximum value of the probability of rejecting the null hypothesis (Ho) when in fact it is true.

Critical Region
The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis.
[Figure: sampling distribution showing the region of acceptance, the region of rejection, the critical value and the P-value]

Critical Value
A critical value is any value that separates the critical region (where we reject the null hypothesis) from the values of the test statistic that do not lead to rejection of the null hypothesis. The critical values depend on the nature of the null hypothesis, the sampling distribution that applies, and the significance level α.

P - Value
The P-value (probability value) is the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming that the null hypothesis is true. The null hypothesis is rejected if the P-value is very small, such as 0.05 or less.

Two-tailed, Right-tailed and Left-tailed Tests
• The tails in a distribution are the extreme regions bounded by critical values.

Two-tailed Tests
Given:
H0: = ; H1: ≠
Right – tailed Tests
Given:
H0: = ; H1: >

Left – tailed Tests
Given:
H0: = ; H1: <

Steps in Hypothesis Testing
1. Formulate the null hypothesis (Ho) that there is no significant difference between the items compared. State the alternative hypothesis (Ha), which is used in case Ho is rejected.
2. Set the level of significance of the test, α.
3. Determine the test to be used.
• Z – TEST – used if the population standard deviation is given
• T – TEST – used if the sample standard deviation is given
4. Determine the tabular value of the test.
***For a Z – test, the table below summarizes the critical values at varying significance levels

Type of Test    Level of Significance
                0.10       0.05       0.025      0.01
One – Tailed    ± 1.28     ± 1.645    ± 1.96     ± 2.33
Two – Tailed    ± 1.645    ± 1.96     ± 2.24     ± 2.58

Steps in Hypothesis Testing
4. Determine the tabular value of the test.
***For a T – test, one must first compute the degrees of freedom (df), then look up the tabular value in the table of Student's t-distribution.
i. For a single sample
df = n – 1
ii. For two samples
df = n1 + n2 – 2
5. Compute z or t as needed, using the formula that fits the case:
• For z – test
i. Sample mean compared with a population mean
ii. Comparing two sample means
iii. Comparing two sample proportions
• For t – test
i. Sample mean compared with a population mean
ii. Comparing two sample means
6. Compare the computed value with its corresponding tabular value, then state your conclusions based on the following guidelines:
• Reject Ho if the absolute computed value is equal to or greater than the absolute tabular value
• Accept Ho if the absolute computed value is less than the absolute tabular value

Decision Criterion
Traditional Method:
***Reject H0 (Accept H1) if the test statistic falls within the critical region.
***Fail to reject H0 (Accept Ho) if the test statistic does not fall within the critical region.

Decision Criterion
P - value method:
***Reject Ho (Accept H1) if P-value ≤ α (where α is the significance level, such as 0.05)
***Fail to reject H0 (Accept Ho) if P-value > α

Decision Criterion
Another option:
Instead of using a significance level such as 0.05, simply identify the P-value and leave the decision to the reader.

Z - TEST
1. Sample Mean ($\bar{X}$) Compared with a Population Mean (μ)
$Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$
Where:
$\bar{X}$ – sample mean
μ – population mean
n – number of items in the sample
σ – population standard deviation

Z - TEST
2. Comparing Two Sample Means ($\bar{X}_1$ & $\bar{X}_2$)
$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Where:
$\bar{X}_1$ – mean of the first sample
$\bar{X}_2$ – mean of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
σ – population standard deviation

Z - TEST
3. Comparing Two Sample Proportions (p1 & p2)
$Z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$
Where:
p1 – proportion of the first sample
p2 – proportion of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
q1 = 1 – p1
q2 = 1 – p2

T - TEST
4. Sample Mean ($\bar{X}$) Compared with a Population Mean (μ)
$t = \frac{(\bar{X} - \mu)\sqrt{n - 1}}{s}$
Where:
$\bar{X}$ – sample mean
μ – population mean
n – number of items in the sample
s – sample standard deviation

T - TEST
5. Comparing Two Sample Means ($\bar{X}_1$ & $\bar{X}_2$)
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[ \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \right] \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]}}$
Where:
$\bar{X}_1$ – mean of the first sample
$\bar{X}_2$ – mean of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
s1 – standard deviation of the first sample
s2 – standard deviation of the second sample

Example 1
Data from a school census show that the mean weight of college students is 45 kilos with a standard deviation of 3 kilos. A sample of 100 college students was found to have a mean of 47 kilos. Are the college students really heavier than the rest, using the 0.05 level of significance?
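
As an illustration of the one-sample z-test, the sketch below works through Example 1 (population mean 45, population standard deviation 3, sample of 100 with mean 47). The function name is hypothetical, and the critical value 1.645 is the one-tailed 0.05 entry from the table of z critical values in the steps.

```python
from math import sqrt

def z_one_sample(sample_mean, pop_mean, pop_sd, n):
    """z for a sample mean compared with a population mean."""
    return (sample_mean - pop_mean) * sqrt(n) / pop_sd

z = z_one_sample(sample_mean=47, pop_mean=45, pop_sd=3, n=100)

critical = 1.645  # one-tailed test at the 0.05 level
reject_h0 = abs(z) >= critical
print(round(z, 2), "reject Ho" if reject_h0 else "accept Ho")
```

Since the computed value exceeds the tabular value, the decision rule in step 6 leads to rejecting Ho.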

Example 2
A researcher wishes to find out whether or not there is a significant difference in the monthly allowance of morning and afternoon students in his school. By random sampling, he took a sample of 239 students in the morning session. The students were found to have a mean monthly allowance of P142.00. The researcher also took a sample of 209 students in the afternoon session. They were found to have a mean monthly allowance of P148.00. The population of students in that school has a standard deviation of P40.00. Is there a significant difference between the two samples at 0.01 level?

Example 3
A sample survey of television programs in Metro Manila shows that 80 out of 200 men and 75 out of 250 women dislike the "May Bukas Pa" program. One would like to know whether the difference between the two sample proportions, 80/200 = 0.40 and 75/250 = 0.30, is significant or not at 0.05 level.
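
Examples 2 and 3 can be worked with the two-sample z formulas, as sketched below. Both are treated here as two-tailed tests (an assumption, since each asks only whether the difference is significant), with tabular values 2.58 at the 0.01 level and 1.96 at the 0.05 level. Function names are illustrative.

```python
from math import sqrt

def z_two_means(mean1, mean2, pop_sd, n1, n2):
    """z for comparing two sample means with a known population SD."""
    return (mean1 - mean2) / (pop_sd * sqrt(1 / n1 + 1 / n2))

def z_two_proportions(p1, n1, p2, n2):
    """z for comparing two sample proportions."""
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / sqrt(p1 * q1 / n1 + p2 * q2 / n2)

# Example 2: morning vs. afternoon allowance, 0.01 level.
z_allowance = z_two_means(142, 148, 40, 239, 209)
reject_allowance = abs(z_allowance) >= 2.58

# Example 3: men vs. women who dislike the program, 0.05 level.
z_program = z_two_proportions(80 / 200, 200, 75 / 250, 250)
reject_program = abs(z_program) >= 1.96

print(round(z_allowance, 3), reject_allowance)
print(round(z_program, 3), reject_program)
```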

Example 4
A researcher knows that the average height of Filipino women is 1.525 meters. A random sample of 26 women was taken and was found to have a mean height of 1.56 meters, with a standard deviation of 0.10 meters. Is there reason to believe that the 26 women are significantly taller than the rest, using the 0.05 level of significance?

Example 5
Beta company is manufacturing steel wire with an average tensile strength of 50 kilos. The laboratory tests 16 pieces and finds that the mean is 47 kilos with a standard deviation of 15 kilos. Are the results in accordance with the hypothesis that the population mean is 50 kilos?

Example 6
It is known from the records of the city schools that the standard deviation of math test scores on the ABC test is 5. A sample of 200 students from the system was taken and it was found that the sample mean is 75. Previous tests showed the population mean to be 70. Is it safe to conclude that the sample is significantly different from the population at 0.01 level?

Example 7
Two rice varieties are being considered for yield and a comparison is needed. Thirty hectares were planted with the rice varieties exposed to fairly uniform conditions. The results are tabulated below:

                   Variety A      Variety B
Average yield      80 sack/hec    85 sack/hec
Sample variance    5.90           12.10

Is there a significant difference in the yield of the two varieties at 0.05 level of significance?

Example 8
A manufacturer of flashlight batteries claims that the average life of his product will exceed 40 hours. A company is willing to buy a very large shipment of batteries provided the claim is true. A random sample of 36 batteries is tested, and it was found that the sample mean is 45 hours. If the population of batteries has a standard deviation of 5 hours, is it likely that the batteries will be bought?

Example 9
A company is trying to decide which of two brands of tires to buy for their trucks. They would like to adopt Brand C unless there is some evidence that Brand D is better. An experiment was conducted where 16 tires from each brand were used. The tires were run under uniform conditions until they wore out. The results are:
Brand C: $\bar{X}_1$ = 40,000 km, s1 = 5,400 km
Brand D: $\bar{X}_2$ = 38,000 km, s2 = 3,200 km
What conclusion can be drawn?
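
The two-sample t formula can be illustrated with the tire data of Example 9. The sketch only computes t and the degrees of freedom; the function name is hypothetical, and the tabular value still has to be read from a Student's t table at whatever significance level is chosen, since the example does not state one.

```python
from math import sqrt

def t_two_means(mean1, s1, n1, mean2, s2, n2):
    """t for comparing two sample means using the pooled variance."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Brand C is sample 1, Brand D is sample 2.
t = t_two_means(40_000, 5_400, 16, 38_000, 3_200, 16)
df = 16 + 16 - 2  # n1 + n2 - 2

print(round(t, 4), df)
```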

Example 10
All freshmen in a particular school were found to have a variability in grades expressed as a standard deviation of 3. Two samples among these freshmen, made up of 20 and 50 students each, were found to have means of 88 and 85 respectively. Based on their grades, is the first group really brighter than the second group, using 0.01 level of significance?

Analysis of Variance (F - Test)
- A test that was developed by Ronald A. Fisher
- A technique in inferential statistics designed to test whether or not more than two samples (or groups) are significantly different from each other

Analysis of Variance
Steps:
1. Compute the sum of squares
$TSS = \sum x^2 - \frac{(\sum x)^2}{N}$
$SSB = \frac{1}{r} \sum_j \left( \sum_i x_{ij} \right)^2 - \frac{(\sum x)^2}{N}$
$SSW = TSS - SSB$

2. Compute degrees of freedom
dft = rk – 1 = N – 1
dfb = k – 1
dfw = dft – dfb

Analysis of Variance
3. Compute the mean sum of squares
$MSSB = \frac{SSB}{df_b}$
$MSSW = \frac{SSW}{df_w}$
4. Compute the F – Ratio
$F = \frac{MSSB}{MSSW}$

Summary Table for ANOVA

Sources of Variation   Sum of Squares   Degree of Freedom (df)   Mean Sum of Squares   F – Ratio
Between Column         SSB              dfb                      MSSB                  F
Within Column          SSW              dfw                      MSSW
Total                  TSS              dft

Exercise
1. The weights in kilograms of three groups of 5 members each are shown in the table below. Is there unusual variation among the groups? (use α = 0.05)

Members   Group
          A     B     C
1         50    60    53
2         48    40    55
3         55    50    40
4         50    60    40
5         46    52    47

Exercise
2. The following are the mileage obtained after several road tests were run using 5 different kinds of gasoline on a Toyota car.

Road Test   Type of Gasoline
            A     B     C     D     E
1ST         35    61    38    65    56
2ND         31    63    54    60    69
3RD         42    50    47    57    70
4TH         48    42    60    55    50
5TH         40    49    55    60    48

Is there a significant difference among the mileage yields, at 1% level?
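
The ANOVA steps (sums of squares, degrees of freedom, mean sums of squares, F-ratio) can be sketched as below for the weights of the three groups in Exercise 1. The sketch assumes groups of equal size r, as in the SSB formula; the function name and the dictionary keys are illustrative.

```python
def one_way_anova(groups):
    """Follow the four ANOVA steps for k groups of equal size r."""
    k = len(groups)
    r = len(groups[0])                    # members per group (equal sizes assumed)
    scores = [x for g in groups for x in g]
    n_total = len(scores)                 # N = rk
    correction = sum(scores) ** 2 / n_total

    tss = sum(x * x for x in scores) - correction
    ssb = sum(sum(g) ** 2 for g in groups) / r - correction
    ssw = tss - ssb

    dft = n_total - 1
    dfb = k - 1
    dfw = dft - dfb

    mssb = ssb / dfb
    mssw = ssw / dfw
    return {"TSS": tss, "SSB": ssb, "SSW": ssw,
            "dfb": dfb, "dfw": dfw, "F": mssb / mssw}

group_a = [50, 48, 55, 50, 46]
group_b = [60, 40, 50, 60, 52]
group_c = [53, 55, 40, 40, 47]

result = one_way_anova([group_a, group_b, group_c])
print({name: round(value, 4) for name, value in result.items()})
```

The computed F is then compared with the tabular F at (dfb, dfw) degrees of freedom for the chosen significance level.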

Exercise
3. Below are the bowling scores of four groups of four members each. At 5% significance level, find out if there is unusual variation among the groups.

Members   Group
          A      B      C      D
1         98     100    87     90
2         78     95     92     93
3         95     90     105    95
4         110    85     88     97

Chi – Square Test (X2)
- Used to test significant difference or relationship
- Used if data are in frequencies (enumeration data)
USES:
1. to test the goodness of fit of a normal curve; that is, to find out whether or not a sample distribution conforms with the hypothetical normal distribution
2. to find out whether or not an observed proportion is equal to some given ideal or expected proportion
3. to test the independence of one variable from another variable.

Formulas:
i. For a 2 x 2 table (with Yates' correction for continuity)
$X^2 = \sum \frac{(|OF - EF| - 0.5)^2}{EF}$
ii. For a non 2 x 2 table
$X^2 = \sum \frac{(OF - EF)^2}{EF}$

Exercise
1. Test the hypothesis that educational attainment does not depend on socio – economic status for the following 100 persons in a particular community.

Socio – economic status   Educational Attainment
                          Finished College   Did Not Finish College
Poor                      18                 10
Middle Class              28                 25
Rich                      14                 5
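
The chi-square computation for a contingency table can be sketched as follows, using the data of Exercise 1. The slides do not state how the expected frequencies (EF) are obtained; the sketch assumes the usual rule for a test of independence, EF = (row total × column total) / grand total, with df = (rows – 1)(columns – 1). The function name and the `yates` flag are hypothetical.

```python
def chi_square(observed, yates=False):
    """Chi-square statistic for a table of observed frequencies (OF).

    Set yates=True for a 2 x 2 table to apply the continuity correction.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, of in enumerate(row):
            ef = row_totals[i] * col_totals[j] / grand_total  # assumed EF rule
            diff = abs(of - ef)
            if yates:
                diff -= 0.5
            total += diff ** 2 / ef
    return total

# Rows: Poor, Middle Class, Rich. Columns: finished college, did not finish.
observed = [[18, 10], [28, 25], [14, 5]]
x2 = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(x2, 3), df)
```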

Exercise
2. At 1% significance level, does college academic grade depend on the high school NSAT results for the following 200 students?

Academic Grade   NSAT Rating
                 Low    Average   High
Above 85         13     25        21
75 – 85          18     31        38
Below 75         14     20        20

Exercise
3. At ABC Company, there are 28 males and 32 females. Out of the 28 males, 10 hold executive posts and the others do clerical work. Of the 32 females, only 5 hold executive positions and the others do clerical work. Prepare a contingency table, then test the hypothesis that position is independent of sex.

Exercise
4. To determine whether type of personality is related to academic performance, a random sample of 180 high school students from a certain college was taken and the data are as follows:

            Low Average   Average   High Average
Introvert   35            30        25
Extrovert   31            23        36

Is there a significant relationship between personality type and academic performance?

Correlation and Regression Analysis

Regression Analysis
- concerned with the problem of estimation and forecasting
FORMULA:
y = a + bx
Where:
y – predicted score
a – y-intercept
b – slope of the line

Regression Analysis
$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
$a = \bar{Y} - b\bar{X}$
Where:
$\bar{Y}$ – mean of the y values
$\bar{X}$ – mean of the x values

Correlation Analysis
- concerned with the relationship between the changes in the variables
Formula: Pearson Product Moment Correlation (r)
$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$

Range of Values: r = [-1, 1]
(+) r – shows a direct or positive relationship
(–) r – shows a negative or inverse relationship
r = 0 – indicates no relationship
r = 1 – perfect positive relationship
r = -1 – perfect negative relationship

Interpretation:

Pearson r           Qualitative Description
± 1                 Perfect Correlation
± 0.91 – ± 0.99     Very High
± 0.71 – ± 0.90     High
± 0.41 – ± 0.70     Marked
± 0.21 – ± 0.40     Slight/Low
0 – ± 0.20          Negligible

Testing the Significance of r
$t = r \sqrt{\frac{n - 2}{1 - r^2}}$

Exercise
1. It is generally known that the number of road accidents is inversely proportional to road width. The following data show the result of a study indicating the number of accidents occurring per hundred thousand vehicles.

Road width (in feet) (x)    75   52   60   33   22
Number of accidents (y)     40   84   55   92   90

a. draw a scatter diagram
b. find the equation of the LSRL
c. predict accident frequency for a road whose width is 55 feet; 48 feet
d. find the degree of relationship between road width and accident frequency.

Exercise
2. The following table shows the final grades of ten students in Algebra and Statistics.

Algebra (x)       75   80   93   65   87   71
Statistics (y)    82   78   86   72   91   80

a. draw a scatter diagram
b. find the equation of the LSRL
c. predict grade in Statistics if grade in Algebra is 78; 82; 89; 95; 100
d. find the degree of relationship between grades in Algebra and Statistics
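
The regression and correlation formulas can be sketched with the road-width data of Exercise 1. The function names are illustrative; `least_squares_line` returns the intercept a and slope b of the least squares regression line (LSRL), and `t_for_r` applies the formula for testing the significance of r.

```python
from math import sqrt

def least_squares_line(x, y):
    """Return (a, b) for the line y = a + bx."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

def pearson_r(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

def t_for_r(r, n):
    return r * sqrt((n - 2) / (1 - r ** 2))

road_width = [75, 52, 60, 33, 22]
accidents = [40, 84, 55, 92, 90]

a, b = least_squares_line(road_width, accidents)
r = pearson_r(road_width, accidents)
t = t_for_r(r, len(road_width))

print(round(a, 3), round(b, 4), round(r, 4), round(t, 3))
for width in (55, 48):
    print(width, round(a + b * width, 2))  # predicted accident frequency
```

The negative slope and negative r agree with the inverse relationship described in the exercise.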

Pilar B. Acorda
Email Address : pbacorda@yahoo.com
Mobile Number: 09359547319
