
Inferential Statistics

Inferential statistics
—  Inferential statistics infer from the sample
to the population.
—  They use sample data to make decisions
or inferences about a population.
—  They help assess the strength of the
relationship between your independent
(causal) variables and your dependent
(effect) variables.
Purposes
—  Estimating population parameter from
sample data
—  Testing hypotheses
Hypothesis Testing

—  Hypothesis: a premise or claim that we
want to test.
—  Hypothesis testing: the process of deciding
statistically whether the findings of an
investigation reflect chance or real effects
at a given level of probability.
—  It is also called significance testing.
Elements of hypothesis testing

—  Null hypothesis
—  Alternative hypothesis
—  Identify the level of significance
—  Test statistic
—  Identify the p-value
—  Conclusion
Hypothesis Testing

—  H0: There is no association between the
exposure and the disease of interest.
—  H1: There is an association between the
exposure and the disease of interest.
Hypothesis Testing

—  Hypothesis: hygiene procedures are
effective in preventing colds.
—  State two hypotheses:
—  Null: H0: Hand-washing has no effect on
bacteria counts.
—  Alternative: Ha: Hand-washing has an
effect on bacteria counts.
Hypothesis Testing

—  Two types of error can distort the
observed association between exposure
and disease:
—  Type I error: observing a difference when
in truth there is none.
—  Type II error: failing to observe a
difference when there is one.
Example - Efficacy Test for New
drug
—  Drug company has new drug, wishes to
compare it with current standard treatment
—  Federal regulators tell company that they
must demonstrate that new drug is better
than current treatment to receive approval
—  Firm runs clinical trial where some patients
receive new drug, and others receive
standard treatment
—  Numeric response of therapeutic effect is
obtained (higher scores are better).
Example - Efficacy Test for New
drug
—  Type I error - Concluding that the new
drug is better than the standard (HA)
when in fact it is no better (H0).
Ineffective drug is deemed better.
—  Type II error - Failing to conclude that
the new drug is better (HA) when in fact
it is. Effective drug is deemed to be no
better.
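
The two error types can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not in the slides: a one-sample, one-sided Z test with a known standard deviation stands in for the drug comparison, and the effect size, sample size, and seed are made up. The helper names one_sided_z_test and rejection_rate are hypothetical.

```python
import random
from statistics import NormalDist, mean

def one_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Return True if H0 (mean <= mu0) is rejected in favour of mean > mu0."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p_value = 1 - NormalDist().cdf(z)
    return p_value <= alpha

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, trials=2000, seed=1):
    """Fraction of simulated trials in which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if one_sided_z_test(sample, mu0, sigma):
            rejections += 1
    return rejections / trials

# H0 is true (no improvement): every rejection is a Type I error.
type_i_rate = rejection_rate(true_mean=0.0)
# H0 is false (assumed improvement of 0.5): every non-rejection is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mean=0.5)
print(type_i_rate, type_ii_rate)
```

Under these assumptions the Type I rate should sit close to the chosen significance level of 0.05, while the Type II rate depends on the assumed effect size and sample size.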
p-value

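—  When you perform a hypothesis test in
statistics, the p-value is the probability of
obtaining results at least as extreme as
those observed if the null hypothesis were
true. It helps you determine the
significance of your results.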
—  A small p-value (typically ≤ 0.05) indicates
strong evidence against the null
hypothesis, so you reject the null
hypothesis.
—  A large p-value (> 0.05) indicates weak
evidence against the null hypothesis, so
you fail to reject the null hypothesis.
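
The decision rule above can be sketched in a few lines. The example assumes the test statistic is a Z score, so the p-value comes from the standard normal distribution; the function names two_sided_p_value and decide are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a Z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the rule: small p-value -> reject H0, otherwise fail to reject."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p = two_sided_p_value(2.5)  # 2.5 is an illustrative Z score
print(round(p, 4), decide(p))
```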
Confidence interval (CI)

—  A related, but more informative, measure
known as the confidence interval (CI) can
also be calculated.
—  CI = a range of values within which the
true population value falls, with a certain
degree of assurance (probability).
Confidence Interval - Definition

—  A range of values for a variable
constructed so that this range has a
specified probability of including the true
value of the variable
—  A measure of the study’s precision
Confidence interval

◦ A 95% CI means that, in 95 out of 100
repeated samples, the true value of the
effect (mean, risk, rate) lies within about
2 standard errors (more precisely, 1.96)
of the sample estimate
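
As a rough illustration of the 2-standard-error rule, the sketch below computes a 95% CI for a mean using the normal multiplier. The data are made up, and mean_confidence_interval is a hypothetical helper; for a sample this small a t multiplier would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation CI: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = stdev(sample) / sqrt(len(sample))     # standard error of the mean
    centre = mean(sample)
    return centre - z * se, centre + z * se

scores = [10, 12, 14, 16, 18]  # illustrative values only
low, high = mean_confidence_interval(scores)
print(round(low, 2), round(high, 2))
```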
Interpreting Results

—  Confidence Interval: Range of values around a
point estimate that has a specified probability of
including the true value of the parameter.
—  Confidence level: refers to the percentage of all
possible samples that can be expected to include the
true population parameter. For example, suppose all
possible samples were selected from the same
population, and a confidence interval were computed
for each sample. A 95% confidence level implies that
95% of the confidence intervals would include the
true population parameter.
—  Confidence Limits: The upper and lower end points
of the confidence interval.
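
The repeated-sampling meaning of the confidence level can be checked by simulation. This is a sketch with made-up population values, assuming a normal population whose standard deviation is known; coverage is a hypothetical function name.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(true_mean=50.0, sigma=10.0, n=40, level=0.95, trials=2000, seed=7):
    """Fraction of simulated CIs that contain the true population mean."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sigma / sqrt(n)              # sigma is treated as known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = mean(sample)
        if centre - margin <= true_mean <= centre + margin:
            hits += 1
    return hits / trials

print(coverage())  # expected to be close to 0.95
```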
Real Life Example of a
Confidence Interval
—  The U.S. Census Bureau routinely uses
confidence levels of 90% in their surveys. One
survey of the number of people in poverty in
1995 stated a confidence level of 90% for the
statistic “The number of people in poverty in the
United States is 35,534,124 to 37,315,094.” That
means that if the Census Bureau repeated the
survey many times using the same techniques,
about 90 percent of the intervals computed this
way would contain the true number of people in
poverty. The stated figure (35,534,124 to
37,315,094) is the confidence interval.
Selection of Tests of Significance
Hypothesis Testing
—  Test statistic:
—  If n > 30, we use the Z test
—  If n < 30, we use the t test
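
A minimal sketch of this rule for a one-sample test of a mean follows. The statistic has the same form in both cases; only the reference distribution changes. The slide does not say what to do at exactly n = 30, so treating it as a small sample is an assumption here, and one_sample_statistic is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_statistic(sample, mu0):
    """Return (test name, statistic) for H0: population mean == mu0."""
    n = len(sample)
    statistic = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    # Assumption: n = 30 falls on the t-test side of the rule.
    test = "Z test" if n > 30 else "t test"
    return test, statistic

print(one_sample_statistic([10, 12, 14, 16, 18], mu0=12))  # illustrative data
```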
Hypothesis testing for difference
between two independent means
—  The independent-samples t test is used
—  Example: a study of the effect of age on
practicing breast self-examination (BSE)
—  H0: there is no age difference between
women who perform BSE and women
who do not
—  Ha: there is an age difference
—  Level of significance = 0.05
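
The test statistic for this example could be computed as sketched below, assuming the classic pooled-variance (equal variances) form of the independent-samples t test. The ages are invented for illustration and are not data from the study; pooled_t_statistic is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for two independent samples."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, df

ages_bse = [25, 30, 35, 40, 45]     # made-up ages, women who perform BSE
ages_no_bse = [30, 35, 40, 45, 50]  # made-up ages, women who do not
t, df = pooled_t_statistic(ages_bse, ages_no_bse)
print(t, df)
```

With these invented numbers |t| is below 2.306, the two-sided critical value of the t distribution at alpha = 0.05 with 8 degrees of freedom, so H0 would not be rejected.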
Hypotheses testing for paired
samples
—  The paired-samples t test is used for analysis
—  Example: to study the level of security in
Libyan hospitals before and after the 17
February revolution, we asked 206 doctors
who work in emergency departments to rate
security on a scale from 1 to 10, where
1 = the worst security and 10 = the best
—  H0: there is no difference in security level
before and after the revolution
—  Ha: there is a difference before and after
—  Level of significance (alpha) = 0.05
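
A paired test works on the within-person differences. The sketch below computes the paired t statistic by hand; the five before/after ratings are invented for illustration and are not the survey data, and paired_t_statistic is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) from the paired differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1

before = [6, 7, 5, 8, 6]  # made-up security ratings before
after = [4, 5, 4, 5, 3]   # made-up ratings from the same doctors after
t, df = paired_t_statistic(before, after)
print(round(t, 3), df)
```

Here |t| exceeds 2.776, the two-sided critical value at alpha = 0.05 with 4 degrees of freedom, so these invented ratings would lead to rejecting H0.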
Chi square test
—  Tests the significance of the difference
between two proportions
—  Can be used for more than two groups
—  Example: in the same study of doctors, we
want to know whether the doctors' gender
is associated with violence
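
For a table of counts, the chi-square statistic compares observed counts with the counts expected if there were no association. The sketch below uses an invented 2 x 2 table (not the study's data) and no continuity correction. The p-value helper is valid only for 1 degree of freedom, where the chi-square statistic is the square of a standard normal variable. Both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df1(stat):
    """p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return 2 * (1 - NormalDist().cdf(sqrt(stat)))

# Made-up counts: rows = gender, columns = exposed to violence (yes, no).
table = [[30, 20], [20, 30]]
stat, df = chi_square_statistic(table)
print(stat, df, round(p_value_df1(stat), 4))
```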
Correlation

—  Finding the relationship between two
quantitative variables without being able
to infer causal relationships
—  Correlation is a statistical technique used
to determine the degree to which two
variables are related
Simple Correlation coefficient (r)

—  Statistic showing the degree of relation
between two variables
—  It is also called Pearson's correlation or
product moment correlation coefficient.
—  It measures the nature and strength of
the relationship between two variables of
the quantitative type.
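
A minimal sketch of Pearson's r, computed from the standard product-moment formula (the sum of cross-products of deviations divided by the square root of the product of the two sums of squares). The data are made up and pearson_r is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # sum of squares of x
    syy = sum((b - my) ** 2 for b in y)                   # sum of squares of y
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # illustrative values only
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))
```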
Scatter diagram

—  Rectangular coordinates
—  Two quantitative variables
—  One variable is called independent (X)
and the second is called dependent (Y)
—  Points are not joined
—  No frequency table
Scatter plots

—  The pattern of data is indicative of the
type of relationship between your two
variables:
—  Positive relationship
—  Negative relationship
—  No relationship
Simple Correlation coefficient (r)

—  The sign of r denotes the nature of
association,
—  while the value of r denotes the strength
of association.
Simple Correlation coefficient (r)
—  If the sign is positive, the relation is
direct (an increase in one variable is
associated with an increase in the other
variable, and a decrease in one variable is
associated with a decrease in the other
variable).
—  If the sign is negative, the relation is
inverse or indirect (an increase in one
variable is associated with a decrease in
the other).
—  If r = 0, there is no association or
correlation between the two variables.
—  If 0 < r < 0.25: weak correlation.
—  If 0.25 ≤ r < 0.75: intermediate
correlation.
—  If 0.75 ≤ r < 1: strong correlation.
—  If r = 1: perfect correlation.
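
The sign and strength rules can be combined into one small helper. The slides state the cut-offs for positive r only; applying them to the absolute value of r, so that negative coefficients are graded the same way, is an assumption here, and describe_correlation is a hypothetical name.

```python
def describe_correlation(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "none", "no correlation"
    direction = "direct" if r > 0 else "inverse"
    size = abs(r)  # assumption: cut-offs apply to the magnitude of r
    if size < 0.25:
        strength = "weak"
    elif size < 0.75:
        strength = "intermediate"
    elif size < 1:
        strength = "strong"
    else:
        strength = "perfect"
    return direction, strength

print(describe_correlation(0.8))
print(describe_correlation(-0.1))
```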
