
[RES] 1.01 REVIEW OF STATISTICAL ANALYSIS

Dr. Adversario || June 29, 2015

Transcribers: Campbell, Candare, Caraan, Carasig, Carballo


Editors: Adviento

OUTLINE
I. Descriptive Analysis
   A. Qualitative and Quantitative measures
      1. Measures of Frequency
      2. Measures of Location
      3. Measures of Central Tendency
      4. Measures of Dispersion
   B. Tabular and Graphical Presentation
II. Inferential Analysis
   A. Estimation
   B. Hypothesis Testing
III. Factors to be considered in choosing the proper statistical tests
IV. Types of Variables and Levels of Measurement
V. Assumption of Distribution
VI. Test for Difference Between Group Proportions
VII. Test for Difference of Group Means/Medians
VIII. Statistical Tools To Investigate Relationship Between Variables
IX. Analysis of 2x2 tables
X. Measures of Effects/Association
XI. Measuring the Accuracy of the Diagnostic Tests

Legend: Remember (Exams) | Lecturer | Book | Previous Trans | Trans Comm

A. Measures of Frequency

1. Count: absolute number of persons/elements with the characteristic

2. Ratio: single number representing the relative size of 2 numbers
   (a/b) × k, where k is a constant multiplier

3. Proportion: special type of ratio where the numerator is part of the denominator
   [a/(a + b)] × k, e.g., k = 100 for a percentage

4. Rate: frequency of occurrence of events in a given interval of time
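The four frequency measures above can be sketched in Python. This is a minimal illustration rather than the lecturer's method: the function names and all counts are hypothetical, and the rate helper assumes the common convention of dividing the events in the interval by the population at risk, then scaling by k.

```python
def count(values: list[str], characteristic: str) -> int:
    """Count: absolute number of elements with the characteristic."""
    return sum(1 for v in values if v == characteristic)


def ratio(a: float, b: float, k: float = 1) -> float:
    """Ratio: (a / b) x k, the relative size of two numbers."""
    return a * k / b


def proportion(a: float, b: float, k: float = 100) -> float:
    """Proportion: a / (a + b) x k; the numerator a is part of the denominator."""
    return a * k / (a + b)


def rate(events: int, population_at_risk: int, k: float = 1000) -> float:
    """Rate: events in a time interval per k persons at risk (assumed convention)."""
    return events * k / population_at_risk


# Hypothetical ward with 30 male and 20 female patients
sexes = ["M"] * 30 + ["F"] * 20
males = count(sexes, "M")
females = count(sexes, "F")
print("Male-to-female ratio:", ratio(males, females))
print("Percent male:", proportion(males, females))
# Hypothetical: 12 new cases in one year among 4,000 residents, expressed per 1,000
print("Rate per 1,000 per year:", rate(12, 4000))
```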


B. Measures of Location
1. Percentile: one of the 99 values of a variable which divide the distribution into 100 equal parts
2. Decile: one of the 9 values of a variable which divide the distribution into 10 equal parts
3. Quartile: one of the 3 values of a variable which divide the distribution into 4 equal parts
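A minimal Python sketch of the three measures of location, using the standard-library `statistics.quantiles` function on hypothetical weights. It also checks the identities P25 = Q1, P50 = D5 = Q2 (the median) and P75 = Q3. `quantiles` uses its default "exclusive" interpolation method; other software may interpolate slightly differently.

```python
import statistics

# Hypothetical weights (kg) of 12 patients, invented for illustration
weights = [10, 11, 12, 13, 13, 14, 15, 16, 16, 17, 18, 20]

percentiles = statistics.quantiles(weights, n=100)  # 99 cut points: P1 ... P99
deciles = statistics.quantiles(weights, n=10)       # 9 cut points: D1 ... D9
quartiles = statistics.quantiles(weights, n=4)      # 3 cut points: Q1, Q2, Q3

# P25 = Q1, P50 = D5 = Q2 (the median), P75 = Q3
print("Q1, Q2, Q3:", quartiles)
print("P25, P50, P75:", percentiles[24], percentiles[49], percentiles[74])
print("D5:", deciles[4], "median:", statistics.median(weights))
```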

LEARNING OUTCOMES
I. Determine the appropriate descriptive measure for summarizing data
II. Determine the appropriate method for presenting data
III. Determine the appropriate statistical test for analyzing data
Descriptive Statistics
Method to summarize and present data in a form which will make it easier to analyze and interpret
Consists of the collection, organization, summarization, and presentation of data

P25 = D2.5 = Q1
P50 = D5 = Q2
P75 = D7.5 = Q3

Inferential Statistics
Method to make generalizations and conclusions about a target population based on results from a sample
INFERENCES from samples to populations
Uses probability

I. DESCRIPTIVE ANALYSIS

Summarizing Figures
Qualitative Measures
  Frequency
  Location
Quantitative Measures
  Central Tendency
  Dispersion
Tabular presentation
Graphical presentation

C. Measures of Central Tendency

Mean: average
Median: middlemost
Mode: most frequent

Interpretation

Mean: The average weight of the patients is 14.4 kg

Median: Half of the patients weighed less than 15.85 kg, while the other half weighed 15.85 kg or more

Mode: The most frequent weights of the patients are 12.6 kg and 16 kg (the distribution is bimodal)
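The three measures of central tendency can be computed with Python's standard `statistics` module. This is a minimal sketch on invented weights (not the data behind the interpretation above); `multimode` returns every value tied for the highest count, so it reports both modes of a bimodal distribution.

```python
import statistics

# Hypothetical weights (kg) of 7 patients, invented for illustration
weights = [12.6, 12.6, 13.0, 15.5, 16.0, 16.0, 17.2]

mean = statistics.mean(weights)          # average
median = statistics.median(weights)      # middlemost value
modes = statistics.multimode(weights)    # most frequent value(s); may be more than one

print(f"Mean: {mean:.1f} kg")
print(f"Median: {median} kg")
print(f"Mode(s): {modes}")
```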


Choice of the Measures of Central Tendency depends on:

scale of measurement

nature of the distribution

D. Measures of Dispersion

Range
Variance
Standard Deviation
Coefficient of Variation
Interquartile Range

Coefficient of Variation
Measure of relative dispersion which expresses the standard deviation as a percentage of the mean: CV = (s / x̄) × 100%
Used when:
Comparing 2 different variables
Comparing 2 different populations on the same variable
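A minimal Python sketch of the five measures of dispersion on a hypothetical sample. It assumes the sample formulas (variance and standard deviation divide by n - 1, as `statistics.variance` and `statistics.stdev` do) and the default "exclusive" quartile method of `statistics.quantiles`, so Q1, Q3 and the IQR may differ slightly from other software.

```python
import math
import statistics

# Hypothetical sample of 8 measurements, invented for illustration
data = [10, 12, 13, 15, 15, 16, 18, 21]

value_range = max(data) - min(data)
variance = statistics.variance(data)      # sample variance (divides by n - 1)
sd = statistics.stdev(data)               # sample standard deviation
cv = sd / statistics.mean(data) * 100     # coefficient of variation, in percent
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                             # spread of the central 50%

print(f"Range: {value_range}")
print(f"Variance: {variance}")
print(f"SD: {sd:.2f}")
print(f"CV: {cv:.1f}%")
print(f"IQR: {iqr}")
```

Because the CV is unit-free, it is the measure to reach for when comparing spreads measured on different scales, as the summary that follows notes.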

Summary

Proportions and percentages are used to summarize nominal and ordinal data
Percentiles are useful to compare an individual observation with a
norm
The median is used for ordinal data or skewed numerical data
The range is used with numerical data when the purpose is to
emphasize extreme values
The standard deviation is used when the mean is used
The coefficient of variation is used when the intent is to compare
distributions measured on different scales
The interquartile range is used to describe the central 50% of a
distribution, regardless of its shape
Variance and standard deviation can be used to directly compare two samples with the same units of measure

Tabular and Graphical Presentation

Figures
Visual presentation of results
o Graphs
o Diagram
o Photograph
o Pen and ink drawings
o Flow Charts
o Schematics
o Maps

Types of Graphs

Pie Graph
Bar Graph
o Vertical
o Horizontal
o Histogram
Frequency Polygon
Component
Scatterplot
Line Graph

Choosing the appropriate graph

GRAPH | NATURE OF VARIABLE | PURPOSE
Histogram or frequency polygon | Quantitative continuous | Graphic representation of a frequency distribution
Bar (horizontal or vertical) | Qualitative or quantitative discrete | Comparison of absolute or relative counts between categories
Pie or component bar | Qualitative | Breakdown of a group total where the number of categories is not too many
Line graph | Time series | Shows trend of data or changes with time
Scatterplot | Quantitative | Correlate data between two variables

Figure Checklist

Is the figure necessary?
Are the data plotted accurately?
Is the grid scale correctly proportioned?
Are parallel figures or equally important figures prepared according to the same scale?
Avoid 3D figures for 2D data
Avoid non-data ink (ticks, grids, frames) and chart junk
Avoid optical illusions: broken lines, markers, hatching fill patterns, improper aspect ratios


Summarizing Figure: Box Plot

Example:

Area of Practice: Reasons for Referral | NCR No. (%) | Non-NCR No. (%) | Total
Co-management | 27 (48.2143) | 29 (51.7857) | 56
CP Clearance | 23 (63.8888) | 13 (36.1111) | 36
Diagnostic Procedures | 22 (56.4103) | 17 (43.5897) | 39
Ventilator Management | 18 (43.9024) | 23 (56.0976) | 41
Peri-operative evaluation for thoracic surgery | 9 (50.0000) | 9 (50.0000) | 18
Weaning from ventilator | 6 (30.0000) | 14 (70.0000) | 20

What is wrong with the table?
Too many decimal places
"Area of Practice" can be merged into one heading
Gridlines should be limited to three to highlight the figures

Table 13. Most Common Reasons for Referral to One's Specialty in Area of Practice
Reasons for Referral | NCR No. (%) | Non-NCR No. (%) | Total
Co-management | 27 (48.2) | 29 (51.8) | 56
CP Clearance | 23 (63.8) | 13 (36.1) | 36
Diagnostic Procedures | 22 (56.4) | 17 (43.6) | 39
Ventilator Management | 18 (43.9) | 23 (56.1) | 41
Peri-operative evaluation for thoracic surgery | 9 (50) | 9 (50) | 18
Weaning from ventilator | 6 (30) | 14 (70) | 20

II. INFERENTIAL ANALYSIS
A. Estimation

The process by which a statistic computed for a random sample is used to approximate (estimate) the corresponding parameter
o Parameter: numerical constant obtained by observing the total population
o Statistic: numerical variable obtained by observing a random sample from the population

Measure | Parameter | Statistic
Mean | μ | x̄
Variance | σ² | s²
Standard deviation | σ | s
Proportion | P | p
Difference between two means | μ1 - μ2 | x̄1 - x̄2
Difference between two proportions | P1 - P2 | p1 - p2

Point estimate: single numerical value used to approximate the population parameter
Interval estimate: consists of 2 numbers, a lower limit and an upper limit, which serve as the bounding values within which the parameter is expected to lie with a certain degree of confidence
A point estimate is more precise, but an interval estimate is more likely to be correct because it gives a range of values.

Research Objective | Results
To estimate the prevalence of parasitism among Filipino children 1-5 years old | 70% (61%, 79%)
To determine the average dental caries score among public elementary school children | 15% (14.7%, 15.3%)

Estimate: 70% and 15% are the point estimates for each objective, and the values inside the parentheses are the interval estimates.

Example 1: Estimation of Population Mean

A municipal health officer was interested in identifying factors affecting the utilization of health services in his area. Among the factors that he considered was the accessibility of the Rural Health Unit. He interviewed a random sample of 25 patients and asked about the distance travelled in going from their homes to the clinic. His findings showed a mean travel distance of 7 km. What is the point estimate of the mean distance travelled by the population of patients served by the clinic?
The point estimate of the mean distance travelled by the patients from their homes to the clinic is 7 km, which is the result obtained from the sample.
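The notes give only the point estimate for Example 1, and Example 2 asks for a 95% interval estimate that the notes do not show. A minimal Python sketch of both, assuming the usual textbook formulas: a t-based interval for a mean and the normal-approximation (Wald) interval for a proportion. The standard deviation of 2.5 km is hypothetical (the notes do not report one), and the function names are illustrative.

```python
import math
from statistics import NormalDist


def mean_interval(mean: float, sd: float, n: int, t_crit: float) -> tuple[float, float]:
    """Interval estimate for a mean: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


def proportion_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float, float]:
    """Point estimate and normal-approximation (Wald) interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


# Example 1: n = 25, sample mean = 7 km.
# sd = 2.5 km is HYPOTHETICAL; 2.064 is the t value for 95% confidence with 24 df.
low, high = mean_interval(mean=7.0, sd=2.5, n=25, t_crit=2.064)
print(f"Mean distance: point estimate 7 km, 95% CI ({low:.2f}, {high:.2f}) km")

# Example 2: 123 of 300 adults had regular dental check-ups.
p_hat, p_low, p_high = proportion_interval(123, 300)
print(f"Proportion: point estimate {p_hat:.0%}, 95% CI ({p_low:.1%}, {p_high:.1%})")
```

Under these assumptions the Example 2 interval comes out at roughly 35.4% to 46.6%. The Wald interval is the simplest choice; with small samples or proportions near 0 or 1, other intervals behave better.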

Example 2: Estimation of the Population Proportion

A survey was conducted to study the dental health practices of adults in a certain urban population. Of 300 adults randomly selected and interviewed, 123 indicated that they had regular dental check-ups twice a year. What is the point and the 95% interval estimate of the population proportion who had regular dental check-ups?

The point estimate of the population proportion who had regular dental check-ups is 41% (123/300 × 100).

B. Hypothesis Testing

Process of hypothesis testing

Type of Error | Definition | Meaning
Type I error (α error) | Probability of rejecting a true H0 | Concluding there is a difference when none exists
Type II error (β error) | Probability of not rejecting a false H0 | Concluding that no difference exists when there is one

STEPS IN HYPOTHESIS TESTING

1. Stating the null hypothesis, H0, and the alternative hypothesis, H1
Example: A study to compare the performance in Anatomy of 2 groups of students, those using cadavers for demonstration and those using models. If the parameter for evaluating student performance is the proportion who obtain a grade of 2.0 or better, then H0 and H1 are formulated as follows:
H0: No difference between those using cadavers and models
H1 (two-tailed test): there is a difference
H1 (one-tailed test): those using models have greater performance, or vice versa

2. Stating the level of significance, α
When we arbitrarily set the level of significance at α, we are setting the probability that we shall erroneously reject a true H0 to be at most equal to α; e.g., if we set α = 0.05, the probability that we are rejecting a true hypothesis is at most only 5%

3. Choosing the test statistic and determining its sampling distribution
Depends on the sampling distribution of the sample statistic
Probability distribution tables of the different test statistics:
o normal table: z statistic
o t table: t statistic
o χ² table: χ² statistic
Factors to be considered in choosing the appropriate statistical test:
o Objectives of the study
o Type of variable
o Level of measurement
o Whether the samples are related or independent
o Assumption on the distribution

4. Determining the critical region
Critical region: set of values of the test statistic which will lead us to reject the null hypothesis
Critical region for a two-tailed z test with α = 0.05
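A minimal Python sketch of steps 4 to 6 for the cadaver-versus-model example, assuming a pooled two-proportion z test (a common choice for comparing two independent proportions; the notes do not name the test at this point). The counts of 45 of 60 and 36 of 60 students are hypothetical, and the helper names are illustrative.

```python
import math
from statistics import NormalDist


def two_tailed_critical_z(alpha: float = 0.05) -> float:
    """Boundary of the critical region: reject H0 if z >= z_crit or z <= -z_crit."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-sample z statistic for H0: P1 = P2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


alpha = 0.05
z_crit = two_tailed_critical_z(alpha)          # about 1.96
# HYPOTHETICAL counts: 45 of 60 students using models, 36 of 60 using cadavers
z = two_proportion_z(45, 60, 36, 60)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
reject_h0 = abs(z) >= z_crit                   # does z fall in the critical region?
print(f"z_crit = {z_crit:.2f}, z = {z:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

With these made-up counts, z is about 1.75, which falls outside the critical region (p is about 0.08), so H0 is not rejected: there is not sufficient evidence to say that the two groups differ.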

Example: H0: PM = PC; H1: PM ≠ PC; level of significance (α) = 0.05, where PM and PC are the proportions of students using models and cadavers, respectively, who obtain a grade of 2.0 or better. What is the critical region (CR)?
Divide the CR into 2 equal parts, α/2 = 0.025, such that one part is located in each tail end of the sampling distribution of the test statistic
From the normal table, the z value corresponding to a tail probability of 0.025 is 1.96
CR: z ≥ 1.96 or z ≤ -1.96

5. Computing the test statistic

6. Making the statistical decision, i.e., whether or not to reject the null hypothesis
Rejecting or not rejecting the null hypothesis (H0):
1. If the computed value of the test statistic falls in the critical region, then we reject H0
2. If the probability (p-value) of getting the computed test statistic under H0 is low, we can say that the sample data cannot support H0, and thus we can reject H0

7. Drawing conclusions about the population

STATISTICAL DECISION | CONCLUSION
Reject the null hypothesis (H0) | State the alternative hypothesis (H1)
Do not reject the null hypothesis (H0) | There is not sufficient evidence to say (state the alternative hypothesis)
NOTE: We don't accept the null hypothesis

III. FACTORS TO BE CONSIDERED IN CHOOSING THE APPROPRIATE STATISTICAL TEST

Objectives of the study
Type of variable
Level of measurement
Whether the samples are related or independent
Assumption on the distribution

OBJECTIVES | EXAMPLE OF RESEARCH OBJECTIVE
To compare the level of a parameter (mean) with a pre-specified value (i.e., standard value, national figure, previous results) | To determine if the average life span of Filipinos, which was 65 years in 1995, has changed over the years since then
To compare the level of a parameter (proportion) with a pre-specified value (i.e., standard value, national figure, previous results) | To determine the prevalence of breast cancer among Filipino women aged 50-54 years old if the prevalence based on previous studies is about 5%
To compare the parameter (mean) between 2 groups | To assess the health effects of vehicular emissions on vulnerable population groups by looking at mean blood lead levels between schoolchildren and street child vendors
To compare the parameter (proportion) between 2 groups | To determine success rate, defined as the proportion who had 1 otitis media episode in the first year of treatment, between medically and surgically treated groups
To compare the parameter (mean) between 2 or more groups | To compare the hypoalgesic effect as measured through pain scores of true (distal & proximal to the tourniquet) and sham acupuncture
To compare the parameter (proportion) between 2 or more groups | To compare the prevalence of current smoking among different income groups categorized into quintiles
To determine whether two or more quantitative variables are related | To determine whether systolic blood pressure of patients in the recumbent and standing positions vary with each other
To determine whether two or more qualitative variables are related | To determine if there is an association between gender and smoking status

IV. TYPES OF VARIABLES AND LEVELS OF MEASUREMENT


A. TYPES OF VARIABLES
1. QUANTITATIVE
Variables that can be measured and ordered according to quantity or amount, or whose values can be expressed numerically
o age, height, weight, no. of correct answers
Discrete: integers/whole numbers
Continuous: fractions/decimals
2. QUALITATIVE
Categories are simply used as labels to distinguish one group from another
o sex, urban-rural classification, religion, region in the country, occupation, marital status, disease status

LEVELS OF MEASUREMENT
1. NOMINAL
Numbers or names which represent a set of mutually exclusive and exhaustive classes to which individuals or objects may be assigned
o sex, regions, race, occupation, patient ID no.
2. ORDINAL
Classes can be ordered or ranked
o dehydration status: none, some, severe; socio-economic status: low, middle, high
3. INTERVAL
Exact/equal distance between two categories can be determined
Zero point is arbitrary and does not mean absence of the characteristic
o temperature, calendar time, IQ
4. RATIO
Zero point is fixed
o weight, height, blood pressure, number of seizure recurrences, number of pre-natal visits

Study Variable | Type of Variable | Level of Measurement
Average dental caries score among public elementary school children | Quantitative | Ratio
Prevalence of parasitism among Filipino children 1-5 yrs old | Qualitative | Nominal
Mean blood lead levels between schoolchildren and street child vendors | Quantitative | Ratio
Success rate between medically vs surgically treated groups for otitis media | Qualitative | Nominal

Number of samples and whether they are related or independent

Study Objective | Number of Samples | Type of Sample
To compare the mean blood lead levels between schoolchildren and street child vendors | 2 | Independent
To compare the performance of 10 pairs of students matched by IQ, one subjected to programmed materials and the other subjected to a lecture type of learning process | 2 | Related
To compare the prevalence of current smoking among different income groups categorized into quintiles | >2 | Independent
To determine change in the level of knowledge on breast cancer at baseline and after the distribution of the DOH health education material | 2 | Related

V. ASSUMPTION OF DISTRIBUTION
Parametric
o Assumptions: random selection, normality, homoscedasticity
o Numerical data: interval, ratio
Non-Parametric
o Few assumptions
o Non-numerical data: nominal, ordinal
o Smaller sample size

VI. TEST FOR DIFFERENCE BETWEEN GROUP PROPORTIONS

No. of Groups | Independent | Related
2 | Chi-square/Fisher's | McNemar
>2 | Chi-square | Cochran Q
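A minimal Python sketch of the two 2-group tests in the table above, assuming the standard textbook statistics: Pearson's chi-square without continuity correction for independent groups, and McNemar's statistic (b - c)^2 / (b + c) for related (paired) data, where b and c are the two discordant-pair counts. All counts are hypothetical. With 1 degree of freedom the 5% critical value is 3.841 (that is, 1.96 squared). Fisher's exact test, usually preferred when expected counts are small, is not implemented here.

```python
import math
from statistics import NormalDist


def chi_square_statistic(table: list[list[int]]) -> float:
    """Pearson chi-square (no continuity correction) for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value for a chi-square with 1 df, using chi2 = z**2."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


def mcnemar(b: int, c: int) -> float:
    """McNemar statistic from the discordant pairs b and c (no continuity correction)."""
    return (b - c) ** 2 / (b + c)


# HYPOTHETICAL: success/failure counts among medically vs surgically treated groups
table = [[30, 20],   # medical: success, failure
         [40, 10]]   # surgical: success, failure
chi2 = chi_square_statistic(table)
p = p_value_1df(chi2)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}, reject H0 at 0.05: {chi2 >= 3.841}")

# HYPOTHETICAL paired data: 5 changed from correct to incorrect, 15 from incorrect to correct
print(f"McNemar chi-square = {mcnemar(5, 15):.2f}")
```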


VII. TEST FOR DIFFERENCE BETWEEN GROUP MEANS/MEDIANS

No. of Groups | Type of Sample | Parametric (Interval/Ratio) | Non-Parametric (Ordinal)
2 | Independent | Independent t-test | Wilcoxon Mann-Whitney
2 | Related | Paired t-test | Wilcoxon signed rank
>2 | Independent | One-way ANOVA | Kruskal-Wallis
>2 | Related | Two-way ANOVA | Friedman

VIII. STATISTICAL TOOLS TO INVESTIGATE RELATIONSHIP BETWEEN VARIABLES VARY ACCORDING TO THE LEVEL OF MEASUREMENT:
Nominal: Cramer coefficient, Phi coefficient, Kappa coefficient of agreement, Chi-square test of association
Ordinal: Spearman rank-order correlation
Interval/Ratio: Pearson product moment correlation (simple and multiple), Linear regression (simple and multiple)

EXAMPLES

Study Objective | Level of Measurement | No. of Samples | Type of Sample | Test Statistic
To determine if the average dental caries score among public elementary school children has improved since last year's survey | Ratio | 1 | - | t-test for 1 mean
To determine if the prevalence of parasitism among 6-year-old children entering school differs from the national figure, which is 70% | Nominal | 1 | - | z-test for 1 proportion
To assess the health effects of vehicular emissions on vulnerable population groups by looking at mean blood lead levels between schoolchildren and street child vendors | Ratio | 2 | Independent | Independent t-test
To compare the rate of success for the treatment of otitis media, defined as the proportion who had 1 otitis media episode in the first year of treatment, between medically and surgically treated groups | Nominal | 2 | Independent | Chi-square test
To compare the performance of 10 pairs of students matched by IQ, one subjected to programmed materials and the other subjected to a lecture type of learning process | Ordinal | 2 | Related | Wilcoxon signed ranks test
To compare the hypoalgesic effect as measured through pain scores of true (distal & proximal to the tourniquet) and sham acupuncture | Ratio | >2 | Independent | ANOVA
To compare the prevalence of current smoking among different income groups categorized into quintiles | Nominal | >2 | Independent | Chi-square test
To determine change in the level of knowledge on breast cancer at baseline and after the distribution of the DOH health education material | Nominal | 2 | Related | McNemar's change test
To determine whether systolic BP of patients in standing and recumbent positions vary with each other | Ratio | 1 | - | Pearson correlation
To determine if there is a relationship between gender and smoking status | Nominal | 1 | - | Chi-square test of association
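The decision tables in Sections VI to VIII can be read as a simple lookup. A minimal Python sketch, assuming the function name, its parameters, and the convention that 3 stands for "more than 2 groups" (all hypothetical); it only encodes the tests listed in these notes, including two-way ANOVA for more than two related groups, and is not a complete test-selection procedure.

```python
# Lookup tables transcribed from Sections VI-VIII of these notes.
# Key: (number of groups, related?), where 3 stands for "more than 2 groups".
PROPORTION_TESTS = {
    (2, False): "Chi-square/Fisher's exact test",
    (2, True): "McNemar test",
    (3, False): "Chi-square test",
    (3, True): "Cochran Q test",
}
MEAN_TESTS = {  # value: (parametric test, non-parametric test)
    (2, False): ("Independent t-test", "Wilcoxon Mann-Whitney test"),
    (2, True): ("Paired t-test", "Wilcoxon signed rank test"),
    (3, False): ("One-way ANOVA", "Kruskal-Wallis test"),
    (3, True): ("Two-way ANOVA", "Friedman test"),
}
RELATIONSHIP_TOOLS = {
    "nominal": "Chi-square test of association",
    "ordinal": "Spearman rank-order correlation",
    "interval": "Pearson correlation",
    "ratio": "Pearson correlation",
}


def choose_test(level: str, groups: int = 1, related: bool = False,
                relationship: bool = False, parametric: bool = True) -> str:
    """Suggest a test from the level of measurement, number of groups, and sample type."""
    level = level.lower()
    if relationship:
        return RELATIONSHIP_TOOLS[level]
    if groups < 1:
        raise ValueError("groups must be at least 1")
    if groups == 1:
        # One sample compared with a pre-specified value
        if level == "nominal":
            return "z-test for 1 proportion"
        if level in ("interval", "ratio"):
            return "t-test for 1 mean"
        raise ValueError("one-sample test for this level is not covered in the notes")
    key = (2 if groups == 2 else 3, related)
    if level == "nominal":
        return PROPORTION_TESTS[key]
    parametric_test, nonparametric_test = MEAN_TESTS[key]
    # Ordinal data, or numerical data that fail the parametric assumptions
    if level == "ordinal" or not parametric:
        return nonparametric_test
    return parametric_test


print(choose_test("ratio", groups=2))                  # blood lead levels
print(choose_test("ordinal", groups=2, related=True))  # 10 matched pairs of students
print(choose_test("nominal", groups=2, related=True))  # knowledge before and after
print(choose_test("nominal", relationship=True))       # gender and smoking status
```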

Example 1

What is the proportion of non-significant reduction in cry/fuss duration among those whose mothers were on a low-allergen diet?

Hypothesis: Among breastfed infants with colic presenting in the 1st 6 weeks of life, elimination of multiple, major allergenic food proteins from the maternal diet is associated with a reduction in crying and fussing.
A randomized, controlled trial of a low-allergen maternal diet was conducted among exclusively breastfed infants presenting with colic. The primary endpoint was the duration of crying/fussing, measured in minutes per 48 hours, taken at baseline (days 1 & 2) and on days 8 & 9.

Variable: Mean cry/fuss duration, min/48h | Low-allergen diet | Control diet
Days 1 & 2 | 690 | 631
Days 8 & 9 | 431 | 509

What is the objective of the study?
What is the level of measurement?
How many samples/groups are involved?
Are they related or independent?
What is the appropriate test statistic?

Between groups at baseline, and between groups on days 8 & 9: comparison of means between groups; Ratio; 2 samples; Independent; Independent t-test
Within each group between baseline and on days 8 & 9: Paired t-test

IX. ANALYSIS OF 2X2 TABLES

Data Layout for Cohort
Exposure Status | With Disease | Without Disease | Total
Exposed | A | B | A + B
Unexposed | C | D | C + D

Data Layout for Case Control
Exposure Status | Cases | Controls
Exposed | A | B
Unexposed | C | D
Total | A + C | B + D

X. MEASURES OF EFFECT/ASSOCIATION
Relative Risk (RR): ratio of the incidence of disease in the exposed to the incidence of disease in the unexposed
Odds Ratio (OR): ratio of 2 odds, the odds of exposure among the cases and the odds of exposure among the controls

XI. MEASURING THE ACCURACY OF THE DIAGNOSTIC TEST

Data Lay-out
Diagnostic or Screening Test | Gold Standard: Disease Present | Disease Absent | Total
Positive Test | True Positive (a) | False Positive (b) | TP + FP (a + b)
Negative Test | False Negative (c) | True Negative (d) | FN + TN (c + d)
Total | TP + FN (a + c) | FP + TN (b + d) | TP + FP + FN + TN (a + b + c + d)
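A minimal Python sketch of the 2x2 measures, using the cell letters of the layouts above (row 1: exposed or test positive; row 2: unexposed or test negative). RR and OR follow the definitions in Section X. The notes show only the diagnostic lay-out, so sensitivity, specificity and the predictive values use their standard definitions as an assumption. The class name `TwoByTwo` and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cells a, b (first row) and c, d (second row) of a 2x2 table."""
    a: int
    b: int
    c: int
    d: int

    def relative_risk(self) -> float:
        """Cohort layout: [a / (a + b)] / [c / (c + d)]."""
        return (self.a * (self.c + self.d)) / (self.c * (self.a + self.b))

    def odds_ratio(self) -> float:
        """Case-control layout: (a / c) / (b / d) = ad / bc."""
        return (self.a * self.d) / (self.b * self.c)

    def sensitivity(self) -> float:
        """TP / (TP + FN) = a / (a + c)."""
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        """TN / (FP + TN) = d / (b + d)."""
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        """Positive predictive value: TP / (TP + FP) = a / (a + b)."""
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        """Negative predictive value: TN / (FN + TN) = d / (c + d)."""
        return self.d / (self.c + self.d)


# All counts below are HYPOTHETICAL.
cohort = TwoByTwo(a=30, b=70, c=10, d=90)
case_control = TwoByTwo(a=40, b=20, c=60, d=80)
screening = TwoByTwo(a=90, b=20, c=10, d=180)

print(f"RR = {cohort.relative_risk():.2f}")
print(f"OR = {case_control.odds_ratio():.2f}")
print(f"Sensitivity = {screening.sensitivity():.2f}, Specificity = {screening.specificity():.2f}")
print(f"PPV = {screening.ppv():.2f}, NPV = {screening.npv():.2f}")
```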
