
Null Hypothesis Significance Testing
Putting our Hypothesis to the Test

We predict the IV will affect the DV. How do we know if it did? Inferential statistics.

"In God we trust. All others bring data." - W. Edwards Deming
"There are three kinds of lies: lies, damned lies, and statistics." - Mark Twain

The purpose of statistics in psychology
- To determine if our data support our hypothesis
- To make psychological conclusions based on empiricism, not intuition
- To help us make inferences about a large group (a population) based on observations of a smaller subset of that group (a sample)

Hypothesis: Students with Obsessive-Compulsive Disorder (OCD) will take longer to complete an exam compared to students without OCD.
I conduct a study. Students with OCD took, on average, 42.4 minutes to complete the exam. Students without OCD took 38.6 minutes.

Can I conclude that students with OCD take longer to complete exams? What other info will you need to reach such a conclusion?
1.
2.

Example
Hypothesis: People underestimate the number of calories that they consume
Sample: 104 college students (60% female)
Measures & Procedure:
- Participants are offered snacks (healthy and unhealthy) while they sit in a waiting room for 15 minutes
- After waiting for the study to begin, they complete a series of questionnaires, which includes a question about the number of calories they ate while waiting
- The researchers record the actual number of calories eaten

Results
[Chart of estimated vs. actual calories consumed]

But are these differences meaningful? Can we conclude that people tend to underestimate how many calories they have consumed? We can't just eyeball it. We need to empirically determine if the groups differ. We must use inferential statistics.

Null Hypothesis Significance Testing (NHST)
- Uses statistics to confirm whether observed differences/relationships in a sample are likely to reflect true differences/relationships in the population
- Uses probability theory to determine the likelihood that there is NO TRUE DIFFERENCE in the population, based upon what we found in our sample
- In other words, the likelihood that any difference we see in our groups occurred by chance alone, and not because of the effect of the IV
- If the differences are large enough that they are unlikely to have happened by chance alone, we have found support for our hypothesis (i.e., we have an effect!)

NHST allows us to determine:
- if the difference between groups is likely to be a true difference
- if the association between variables is likely to be a true association

NHST
Null Hypothesis (H0): The IV has no effect on the DV; there is no difference between groups; there is no association between variables.
Example: People's estimates of the calories they consumed do not differ from the actual calories consumed.
Alternative Hypothesis (H1): The IV does have an effect on the DV; there is a difference between groups; there is an association between variables.
Example: People's estimates of the calories they consumed are lower than the actual calories consumed (i.e., people underestimate).

An Either/Or Proposition
- We either reject H0, which means it is likely that we have a true difference
- Or we fail to reject H0, meaning it is unlikely that there is a true difference
Limits:
- NHST is based on probability, so data can't prove a hypothesis, only support it
- NHST only tells us whether there is likely to be an effect; it tells us nothing about the magnitude of the effect

A test of the probability of obtaining our results if H0 is true, NOT the probability that H1 is true
- p-value: the probability of obtaining the observed effect if H0 were true; in other words, the probability of obtaining the observed effect by chance
- Example: p = .03 means there was only a 3% chance of obtaining this effect if there were no real effect in the population
- We are typically interested in p < .05: a probability of less than 5% that the observed effect would occur if there were no real effect in the population
- There is still a chance that the observed effect is not a real-world effect

α (alpha) level: testing for statistical significance
p < .05 means: there is less than a 5% probability that the observed effect would occur if there were no real effect in the population
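To make the decision rule concrete, here is a minimal Python sketch of obtaining a p-value from an independent-samples t-test; the data are simulated with made-up means and SDs loosely echoing the OCD exam-time example, so nothing here is the lecture's actual analysis.

```python
# Minimal sketch: p-value from an independent-samples t-test on
# simulated (hypothetical) exam-time data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ocd = rng.normal(loc=42.4, scale=8.0, size=30)      # exam minutes, OCD group
no_ocd = rng.normal(loc=38.6, scale=8.0, size=30)   # exam minutes, comparison

t, p = stats.ttest_ind(ocd, no_ocd)
print(f"t = {t:.2f}, p = {p:.3f}")
# Decision rule: if p < .05 (our alpha), reject H0; otherwise fail to reject.
```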

Common α levels are p < .05, .01, and .001; p < .10 is sometimes called a trend toward significance (use with great care)
- You should report the exact p-value
- The smaller the probability, the greater the chance that the effect will be replicated in another study

Choosing the p-value
- Do so before you begin a study
- Avoids experimenter bias. Ex: "p = .09, but it's a really cool result! I'll just set my p-value at .10 instead of the traditional .05." SKETCHY and UNETHICAL
- Avoids inflating results: simply conducting a lot of analyses increases the likelihood of obtaining a false effect, as the simulation sketched below shows
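The inflation from running many analyses can be seen in a toy simulation. The sketch below assumes 20 independent tests per "study," data with no real effect, and α = .05; these values are illustrative, not from the lecture.

```python
# Toy simulation: run 20 t-tests per study on data with NO real effect
# and count how often at least one comes out "significant" at p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_tests, alpha = 2_000, 20, 0.05

false_positive_runs = 0
for _ in range(n_sims):
    pvals = [stats.ttest_ind(rng.normal(size=25), rng.normal(size=25)).pvalue
             for _ in range(n_tests)]
    if min(pvals) < alpha:
        false_positive_runs += 1

print(f"At least one false positive in {false_positive_runs / n_sims:.0%} of runs")
# Theory agrees: 1 - (1 - .05)**20 is about .64, i.e., roughly 64% of runs.
```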

Two Possible Outcomes
- p < .05: Reject the null hypothesis. Supports the alternative hypothesis (the study hypothesis).
- p > .05: Fail to reject the null hypothesis. Does not support the alternative hypothesis.
Remember: we can only support, not prove, either H0 or H1.

Estimated vs. Actual Calories: the stats
But are these differences meaningful? t(103) = 5.79, p < .001
Reject H0: the difference between estimated and actual calories is very unlikely to have occurred by chance alone.
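If you want to see where a statistic like t(103) = 5.79 comes from: with 104 participants, df = 103 is consistent with a paired-samples test comparing each person's estimated calories to their actual calories. The sketch below simulates data in that shape; the means, SDs, and the paired-test assumption are mine, not the study's.

```python
# Hedged sketch of a paired-samples t-test (df = n - 1 = 103) on
# simulated calorie data in which people underestimate on average.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
actual = rng.normal(loc=200, scale=60, size=104)             # calories eaten
estimated = actual - rng.normal(loc=40, scale=70, size=104)  # self-reports

t, p = stats.ttest_rel(actual, estimated)
print(f"t({len(actual) - 1}) = {t:.2f}, p = {p:.4g}")
```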

Can we conclude that people do tend to underestimate how many calories they have consumed?

Some Practice Examples
A researcher wants to test the hypothesis that people will be more likely to give money to a woman who is begging than to a man who is begging.

A researcher wants to test the following hypothesis: People are more likely to clap at the end of a movie when a large crowd is present compared to when there is a small crowd or none at all.

Error
- Because NHST relies on probability, there is always the chance for error
- Hypotheses are supported or not supported, but never proven or disproven

                               Reject H0         Fail to Reject H0
Null hypothesis is true        Type I error      Correct
Null hypothesis is false       Correct           Type II error

Type I Error
- When you reject the null hypothesis, but it is actually true
- You find statistically significant results to support your hypothesis, but the effect isn't real
- Most likely to occur when running many analyses
- Can reduce Type I error by using a stricter p-value (say, p < .01 instead of p < .05)
Why is this a problem?

[Jelly bean cartoon: xkcd's "Significant," in which one of 20 tested jelly bean colors comes up significant at p < .05 purely by chance]

Type II Error
- You fail to reject the null hypothesis when it is actually false
- You fail to find statistical significance for a real effect
- Usually caused by small sample size
Why is this a problem?

Estimated vs. Actual Calories

Effect Size
- Very important! NHST only tells us whether or not there was an effect; it tells us nothing about the magnitude of the effect
- Effect size: an estimate of the size of an effect that is mostly independent of sample size (the p-value, by contrast, is very dependent on sample size)
- There are many types of effect sizes for different statistical analyses; two common ones are Cohen's d and Pearson's r
Pearson's r
- The correlation between two variables
- Small: r > .10; Medium: r > .24; Large: r > .37
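As an illustration, here is a short sketch that computes Pearson's r with scipy and labels it using the benchmarks above; the variables (hours studied, exam score) and all data are invented for the example.

```python
# Sketch: Pearson's r between two hypothetical variables, labeled with
# the slide's benchmarks (small > .10, medium > .24, large > .37).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
hours_studied = rng.normal(10, 3, size=100)
exam_score = 60 + 2.0 * hours_studied + rng.normal(0, 10, size=100)

r, p = stats.pearsonr(hours_studied, exam_score)
label = "large" if r > .37 else "medium" if r > .24 else "small" if r > .10 else "negligible"
print(f"r = {r:.2f} ({label}), p = {p:.3g}")
```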

Cohen's d
- The size of the difference between the means of two groups
- Determined by the size of the difference between the two groups and the amount of variability within the groups
- Small: d = .20; Medium: d = .50; Large: d = .80
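A minimal sketch of Cohen's d for two independent groups, using the common pooled-standard-deviation formula; the data reuse the hypothetical exam-time numbers from the OCD example, so the result is illustrative only.

```python
# Cohen's d: the mean difference divided by the pooled standard deviation.
import numpy as np

def cohens_d(g1, g2):
    n1, n2 = len(g1), len(g2)
    # Pool the variances, weighting each group by its degrees of freedom
    pooled_var = ((n1 - 1) * np.var(g1, ddof=1) +
                  (n2 - 1) * np.var(g2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(g1) - np.mean(g2)) / np.sqrt(pooled_var)

rng = np.random.default_rng(3)
ocd, no_ocd = rng.normal(42.4, 8, 30), rng.normal(38.6, 8, 30)
print(f"d = {cohens_d(ocd, no_ocd):.2f}")  # ~.50 would be a medium effect
```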

Why are Effect Sizes Important?
- Help to determine if a statistically significant effect is meaningful in applied terms
- Useful in comparing effects across studies (meta-analysis)
- Necessary for power analyses / determining the necessary sample size for a study
- Can also be used with pilot data to determine if it's worthwhile to collect a larger sample

Effect Size and Statistical Significance
- Not the same thing
- Effect size is mostly independent of sample size; statistical significance is very much affected by sample size
- You should report both


Power and Sample Size
- Just because an effect is not significant does not mean it isn't there
- Power = the probability that the null hypothesis will be correctly rejected when it is false
- Power = the ability to detect statistically significant effects
- Power = 1 - β, where β is the Type II error rate
- Generally want power = .80 or larger (the simulation sketched below illustrates the idea)
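Here is a toy simulation of what power means, under assumed values (a true effect of d = 0.5, 30 participants per group, α = .05): the proportion of simulated studies that correctly reject H0 is the estimated power.

```python
# Toy power simulation: how often does a study detect a REAL effect?
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
sims = 5_000
hits = sum(
    stats.ttest_ind(rng.normal(0.5, 1, 30), rng.normal(0.0, 1, 30)).pvalue < 0.05
    for _ in range(sims)
)
print(f"Estimated power ~ {hits / sims:.2f}")  # around .5, well below .80
```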

Determinants of Power
- Significance level: a stricter α (e.g., .01 instead of .05) means less power
- Effect size: it's easier to detect a large effect than a small one
- Sample size: the most important determinant; the larger the sample size, the greater the likelihood of detecting a significant effect

Can you have too much power? Yep! With a large enough sample size, any difference will be statistically significant. Example: split all Americans into two randomly assigned groups; do they differ on weight?

We generally worry more about Type I error than Type II (we demand p < .05, while power only needs to be .80). As a result, Type II errors are much more common.

Power Analyses
- Before you begin a study, you should conduct a power analysis to determine the necessary sample size, as in the sketch below
- Need to know the effect size of interest
- Need to know the planned p-value
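As an illustration of an a-priori power analysis, this sketch uses statsmodels to solve for the required sample size per group; the inputs (d = 0.5, α = .05, target power = .80) are assumed for the example, not taken from the lecture.

```python
# A-priori power analysis for a two-group t-test: solve for n per group
# given the effect size of interest, alpha, and the target power.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Need about {n_per_group:.0f} participants per group")  # ~64
```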

Some Practice
p = .09, d = .83, n = 20
- What conclusion should be drawn?
- What type of error is most likely?
- Why would that error have occurred?
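One way to reason about this practice item is a quick post-hoc power check. The sketch below assumes a two-group design and reads n = 20 as the total N (10 per group); if n = 20 were per group, power would be higher but still below .80.

```python
# Post-hoc power for the practice item (assumed two-group design).
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.83, nobs1=10, alpha=0.05)
print(f"Power ~ {power:.2f}")
# Low power with a large observed d suggests a Type II error: a real
# effect may have been missed because the sample was too small.
```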
