SPSS TUTORIAL
There is a data set posted on the BlackBoard site that you can use to follow along.
In SPSS, you can enter your data and calculate descriptive and inferential statistics using
fairly simple steps.
Statistics
IQ
N Valid 88
Missing 0
(the Valid row tells you how many pieces of data were entered for this variable [this is
also the N, or sample size]; the Missing row tells you how many data points for this
variable were not entered/answered)
IQ
Frequency   Percent   Valid Percent   Cumulative Percent
Valid 75 1 1.1 1.1 1.1
79 1 1.1 1.1 2.3
81 2 2.3 2.3 4.5
82 3 3.4 3.4 8.0
83 2 2.3 2.3 10.2
84 2 2.3 2.3 12.5
85 3 3.4 3.4 15.9
86 2 2.3 2.3 18.2
88 3 3.4 3.4 21.6
89 2 2.3 2.3 23.9
90 1 1.1 1.1 25.0
91 3 3.4 3.4 28.4
92 2 2.3 2.3 30.7
93 2 2.3 2.3 33.0
94 1 1.1 1.1 34.1
95 6 6.8 6.8 40.9
96 2 2.3 2.3 43.2
97 1 1.1 1.1 44.3
98 2 2.3 2.3 46.6
99 1 1.1 1.1 47.7
100 3 3.4 3.4 51.1
101 2 2.3 2.3 53.4
102 3 3.4 3.4 56.8
103 2 2.3 2.3 59.1
104 1 1.1 1.1 60.2
105 3 3.4 3.4 63.6
106 4 4.5 4.5 68.2
107 3 3.4 3.4 71.6
108 3 3.4 3.4 75.0
109 3 3.4 3.4 78.4
110 1 1.1 1.1 79.5
111 4 4.5 4.5 84.1
112 1 1.1 1.1 85.2
114 1 1.1 1.1 86.4
115 2 2.3 2.3 88.6
118 3 3.4 3.4 92.0
120 2 2.3 2.3 94.3
121 1 1.1 1.1 95.5
127 1 1.1 1.1 96.6
128 1 1.1 1.1 97.7
131 1 1.1 1.1 98.9
137 1 1.1 1.1 100.0
Total 88 100.0 100.0
(this table lets you know how many people scored/responded with a certain score
[Frequency] and the percentage of total participants who answered that same way)
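If you want to see what SPSS is doing behind the scenes, the Frequency, Percent, and Cumulative Percent columns can be reproduced with a few lines of code. This is only an illustrative Python sketch with made-up scores, not the BlackBoard data set:

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (value, frequency, percent, cumulative percent),
    mirroring the layout of an SPSS FREQUENCIES table."""
    n = len(scores)
    counts = Counter(scores)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        pct = 100.0 * counts[value] / n
        cumulative += pct
        rows.append((value, counts[value], round(pct, 1), round(cumulative, 1)))
    return rows

# Hypothetical IQ scores, not the course data set
for row in frequency_table([95, 100, 100, 105, 105, 105]):
    print(row)
```

Note that with no missing data, Percent and Valid Percent are identical, which is why the sketch computes only one percent column.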
[Histogram of IQ: Frequency on the y-axis (0 to 14), IQ on the x-axis in bins of 5 from
75.0 to 135.0. Std. Dev = 12.98, Mean = 100.3, N = 88.0]
(this is a histogram of the data; it gives you a visual view of the distribution/variability
of the data, as well as the central tendency, including how NORMAL the data are. Again,
NORMAL data will look somewhat like a bell curve)
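To get a feel for how a histogram summarizes a distribution, here is a rough text-based version in Python. The scores below are hypothetical, and the 10-point bin width is an arbitrary choice:

```python
from collections import Counter

def text_histogram(scores, bin_width=10):
    """Print a crude text histogram: one '*' per score in each bin.
    A quick way to eyeball whether data look roughly bell-shaped."""
    bins = Counter((s // bin_width) * bin_width for s in scores)
    for start in sorted(bins):
        print(f"{start:4d}-{start + bin_width - 1:<4d} {'*' * bins[start]}")
    return bins

# Hypothetical scores, not the course data set
text_histogram([75, 82, 88, 91, 95, 95, 100, 101, 104, 108, 112, 121])
```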
OUTLIERS/ANOMALIES/IMPOSSIBLE NUMBERS
So what happens when we find an outlier or “odd” numbers?
Summarizing the data is similar to the above step (in fact, you can do them both at the
same time!).
Click on ANALYZE, then DESCRIPTIVE STATISTICS, and then FREQUENCIES
again. From here click on STATISTICS. Now you can choose any of the statistics listed
here. I would recommend the ones we talked about in class, such as MEAN, MEDIAN,
MODE, RANGE, STANDARD DEVIATION, and STANDARD ERROR OF THE MEAN
(S.E. OF THE MEAN). You will get a similar output with the addition of a table that
looks like this:
Statistics
IQ
N Valid 88
Missing 0
Mean 100.26
Std. Error of Mean 1.384
Range 62
[Error-bar chart of IQ: the y-axis shows the 95% CI for IQ (roughly 97 to 103); N = 88]
(the dot in the middle represents the SAMPLE MEAN and the bars above and below
represent the UPPER and LOWER CI)
You can also go to ANALYZE, then DESCRIPTIVE STATISTICS, and then EXPLORE.
Then move variable of interest into the DEPENDENT LIST to the right. Make sure you
have STATISTICS or BOTH checked for DISPLAY. Then hit OK. You will see a print
out like this:
Descriptives
Statistic Std. Error
IQ Mean 100.26 1.384
95% Confidence Interval for Mean
    Lower Bound 97.51
    Upper Bound 103.01
5% Trimmed Mean 99.78
Median 100.00
Variance 168.609
Std. Deviation 12.985
Minimum 75
Maximum 137
Range 62
Interquartile Range 18.50
Skewness .394 .257
Kurtosis -.163 .508
(this gives us a numerical summary of the CI versus a visual display, as seen above)
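The CI numbers in that table can be reproduced by hand: the bounds are the mean plus/minus a critical value times the standard error. Here is an illustrative Python sketch with made-up data. One simplification to note: it uses the normal (z) critical value of about 1.96, while SPSS uses the t distribution, so for small samples its interval will be slightly narrower than SPSS's.

```python
import math
from statistics import NormalDist, mean, stdev

def ci_of_mean(sample, level=0.95):
    """Approximate confidence interval for a sample mean.
    NOTE: uses the normal (z) critical value for simplicity; SPSS uses
    the t distribution, so results differ slightly for small samples."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical data, not the course IQ variable
low, high = ci_of_mean([90, 95, 100, 105, 110])
```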
[Error-bar chart: 95% CI for IQ plotted separately for each DROPOUT group
(group 0: N = 78; group 1: N = 10); y-axis roughly 80 to 110]
(this shows the MEAN and CI for the variable [IQ] across the different groups
[DROPOUT] as well as the CI for each group)
Similar to the example above (CI OF SINGLE MEAN), you can go to ANALYZE, then
DESCRIPTIVE STATISTICS, and then EXPLORE. Put the variable of interest in the
DEPENDENT LIST and the variable you wish to group it by in the FACTOR LIST. Hit
OK. You should have a print out like this:
Descriptives
DROPOUT Statistic Std. Error
IQ 0 Mean 101.65 1.467
95% Confidence Interval for Mean
    Lower Bound 98.73
    Upper Bound 104.58
5% Trimmed Mean 101.25
Median 102.00
Variance 167.918
Std. Deviation 12.958
Minimum 75
Maximum 137
Range 62
Interquartile Range 17.50
Skewness .281 .272
Kurtosis -.180 .538
1 Mean 89.40 2.130
95% Confidence Interval for Mean
    Lower Bound 84.58
    Upper Bound 94.22
5% Trimmed Mean 89.50
Median 89.50
Variance 45.378
Std. Deviation 6.736
Minimum 79
Maximum 98
Range 19
Interquartile Range 13.50
Skewness -.267 .687
Kurtosis -1.377 1.334
(this gives us numeric info on the lower and upper CI for both groups [0 and 1])
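Conceptually, the grouped EXPLORE output just splits the variable of interest by the FACTOR variable and computes descriptives within each group. A minimal Python sketch with hypothetical IQ and DROPOUT values:

```python
from statistics import mean, stdev

def group_stats(values, groups):
    """Mean and SD of `values` split by the parallel `groups` labels,
    like EXPLORE with a variable in the FACTOR LIST."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    return {g: (mean(vs), stdev(vs)) for g, vs in by_group.items()}

# Hypothetical IQ scores paired with hypothetical DROPOUT codes (0/1)
stats = group_stats([100, 110, 90, 85, 95], [0, 0, 0, 1, 1])
```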
PART II: TESTS OF STATISTICAL SIGNIFICANCE AND THE “ANALYSIS
STORY”
Last time we discussed CIs and how they can be used to support our hypotheses
First step-- assume that the groups do not differ. This is called the Null Hypothesis (H0)
Assume the independent variable did not have an effect
Step 2-- Probability theory: estimate likelihood of observed outcome, while assuming H0
is true
This is what we mean by “statistically significant”—statistical significance is different
from scientific significance or practical/clinical significance
(Note: you don’t have to necessarily know the differences between these, but just know
that something can be statistically significant without being practically significant)
If the observed outcome is very unlikely under H0, you reject H0 and conclude there is an effect of the IV on the DV!
So, the difference between means is larger than what would be expected if error variation
(random chance) alone caused the outcome
Due to the nature of probability testing, it is possible that errors can occur with our
findings!
Types of errors:
Type I error: null hypothesis is rejected when it really is true
We observe statistically significant finding (p < .05)
But in truth, there is no effect of IV
Probability of making Type I error = alpha (α)
Setting level of significance at p < .05 indicates researchers accept probability of Type I
error as 5% for any given experiment
Type II error: null hypothesis is retained when it really is false
We fail to find a statistically significant effect, but in truth the IV did have an effect
Because of the possibility of Type I and Type II errors, researchers are always tentative
about their claims
Use words such as “findings support the hypothesis” or “consistent with the hypothesis”
Never say the hypothesis was proven!!!
There is important info about statistical power and sensitivity in the chapter that you
should be familiar with; in the interest of getting through the material, I am leaving you
to cover that information on your own
Note the Levene's Test for Equality of Variances. This tests whether the variances
(standard deviation/standard error of the mean) for the 2 groups are equal (we want them
to be equal). If this test is NOT SIGNIFICANT (p > .05, which is what we want), then we
look at the top row: EQUAL VARIANCES ASSUMED. If this test IS SIGNIFICANT
(p < .05), we use the lower row: EQUAL VARIANCES NOT ASSUMED. What would
we do in this case?
Next we look at the t value, and whether it’s significant. In this case, the t value is
significant, so we would reject the null hypothesis—we have support that there are
differences between the groups.
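For intuition, the t value in the "equal variances not assumed" row can be computed by hand: it is the difference between the group means divided by a standard error built from each group's own variance (Welch's formula). An illustrative Python sketch with made-up group scores (no p-value here, since that would require the t distribution's degrees of freedom and CDF):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Independent-samples t statistic without assuming equal variances
    (the 'equal variances not assumed' row of the SPSS output)."""
    va, vb = variance(a), variance(b)   # each group's own variance
    na, nb = len(a), len(b)
    se = math.sqrt(va / na + vb / nb)   # combined standard error
    return (mean(a) - mean(b)) / se

# Hypothetical scores for two groups
t = welch_t([100, 105, 110, 115], [90, 92, 94, 96])
```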
I have created a new variable in the data set called ADDLVL. This is a variable that
groups participants into 3 groups based on how many ADHD-related problems the
participants had (1 = lower, 2 = middle, 3 = higher). I want to see if these 3 groups differ
on their IQ.
Go to ANALYZE, then COMPARE MEANS, and then ONE-WAY ANOVA. Put the
variable of interest in the DEPENDENT LIST (in this case, IQ) and the grouping variable
in the FACTOR section. You don’t have to make any other changes yet (although I
would go to OPTIONS and click on DESCRIPTIVES... it can never hurt).
You should see an output something like this:
Descriptives
IQ
Group   N   Mean   Std. Deviation   Std. Error   95% CI for Mean (Lower Bound, Upper Bound)   Minimum   Maximum
1.00 24 113.71 10.407 2.124 109.31 118.10 90 137
2.00 51 96.29 10.356 1.450 93.38 99.21 75 120
3.00 13 91.00 6.819 1.891 86.88 95.12 79 102
Total 88 100.26 12.985 1.384 97.51 103.01 75 137
ANOVA
IQ
                Sum of Squares   df   Mean Square   F        Sig.
Between Groups  6257.442         2    3128.721      31.616   .000
Within Groups   8411.547         85   98.959
Total           14668.989        87
Here, we can see that the ANOVA (F-test) is significant (F = 31.616, p = .000). This tells
us that at least 2 of our groups differ significantly from each other. BUT WHICH ONES?
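For intuition, the F value in that table is just the between-groups mean square divided by the within-groups mean square. A Python sketch with made-up groups (the F statistic only, not its p-value):

```python
from statistics import mean

def one_way_f(*groups):
    """One-way ANOVA F statistic: between-groups mean square over
    within-groups mean square."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    # Between-groups sum of squares: group means vs. the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-groups sum of squares: scores vs. their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups
f = one_way_f([110, 115, 120], [95, 100, 105], [88, 90, 92])
```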
To figure this out, we need to run POST HOC analyses. This basically means “after
this,” or in this case “after this test.” POST HOC analyses should ONLY be run if the
ANOVA/F-TEST was significant at the first step (like above). If it was not, there is no
point in running POST HOCs.
There are many different types of POST HOCs that use different assumptions and meet
certain requirements. For this course, I will not make you figure out the many different
ways in which POST HOCs are different. The most common POST HOC is Tukey’s- so
feel free to use this POST HOC should you need it for your analysis.
You calculate POST HOCs similar to running a regular ANOVA. Like before, go to
ANALYZE, then COMPARE MEANS, and ONE-WAY ANOVA. Now- click on POST
HOC. From here, click the desired test (Tukey’s in this case). Then hit OK. You should
have a new table that looks like this:
Multiple Comparisons
Dependent Variable: IQ
Tukey HSD
(I) addlvl   (J) addlvl   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
1.00 2.00 17.41 2.462 .000 11.54 23.29
3.00 22.71 3.426 .000 14.54 30.88
2.00 1.00 -17.41 2.462 .000 -23.29 -11.54
3.00 5.29 3.091 .206 -2.08 12.67
3.00 1.00 -22.71 3.426 .000 -30.88 -14.54
2.00 -5.29 3.091 .206 -12.67 2.08
* The mean difference is significant at the .05 level.
From here, we can see that GROUP 1 is significantly different from GROUP 2 and
GROUP 3. We can also see GROUP 2 and GROUP 3 are NOT significantly different
from each other. Look at the MEAN DIFFERENCE (I-J) as well as the DESCRIPTIVES
(i.e., MEAN) from before to determine HOW they differ. In this case, GROUP 1 had
higher IQ than GROUP 2 and GROUP 3.
CORRELATION
Next we will run a correlation test. Correlations are used to see whether or not 2 or more
variables are related. So, you can use a correlation to determine the relation between
how a person scores on 2 items/variables/measures. By using this procedure, you can
also use the correlation/relation to predict one variable using another. We’ll talk more
about this in class.
Correlations
                           IQ       GPA
IQ    Pearson Correlation  1        .497**
      Sig. (2-tailed)      .        .000
      N                    88       88
GPA   Pearson Correlation  .497**   1
      Sig. (2-tailed)      .000     .
      N                    88       88
** Correlation is significant at the 0.01 level (2-tailed).
In a correlation table, what you want to look for is the PEARSON value (r value) and the
SIG row. In the table, you see that SPSS runs the correlation test for how IQ and GPA
are related, but also for how IQ is related to IQ and how GPA is related to GPA (see
above). When a variable is correlated with itself, the PEARSON value will be 1 (a
perfect correlation). You can ignore those scores.
What you’re interested in looking at is how IQ is related to GPA. In this case, the
PEARSON value between these is .497 and the sig (p) is .000. In this case, the
correlation between these 2 variables IS SIGNIFICANT.
The PEARSON/r value tells us the DIRECTION and the STRENGTH of the relationship.
You might remember from Statistics that a correlation can be NEGATIVE or POSITIVE.
A POSITIVE correlation means that both variables move in the SAME DIRECTION: as
one variable INCREASES, the other INCREASES; or as one variable DECREASES,
the other DECREASES. An example of this is as "time spent studying for exam"
INCREASES, "exam score" INCREASES. Alternatively, as "time spent studying for
exam" DECREASES, "exam score" DECREASES.
On the SPSS print out, you can determine whether the correlation is POSITIVE or
NEGATIVE by looking at the PEARSON value—is it a POSITIVE or NEGATIVE
number? In this case, it is POSITIVE, meaning as IQ increases, GPA increases.
The other thing to look for in a correlation is the STRENGTH of the relationship. A
correlation score (this is our PEARSON or r value) can range anywhere between -1
and +1.
The closer the correlation is to 1 or -1, the STRONGER it is—the variables are more
closely related and it will be easier to find a significant relationship. The closer they are
to 0, the WEAKER the relationship is, and it may be harder to find a significant
relationship.
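The PEARSON/r value itself is straightforward to compute by hand. An illustrative Python sketch with made-up paired scores; because the second variable is exactly 10 times the first, the relationship is perfectly positive:

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired variables:
    covariance of x and y scaled by their spreads, so r falls in [-1, +1]."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical paired scores: a perfect positive relationship
r = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])  # r = 1.0
```

The sign of the result gives the DIRECTION of the relationship and its distance from 0 gives the STRENGTH, exactly as described above.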
You can also run a correlation between 3 or more variables. You would follow the above
procedures (ANALYZE, CORRELATE, BIVARIATE), but now you would just add
additional variables. Make sure all the same things are clicked as noted above and hit
OK. You will now get a print out that looks like this:
Correlations
                             IQ       GPA      ADDSC
IQ      Pearson Correlation  1        .497**   -.632**
        Sig. (2-tailed)      .        .000     .000
        N                    88       88       88
GPA     Pearson Correlation  .497**   1        -.615**
        Sig. (2-tailed)      .000     .        .000
        N                    88       88       88
ADDSC   Pearson Correlation  -.632**  -.615**  1
        Sig. (2-tailed)      .000     .000     .
        N                    88       88       88
** Correlation is significant at the 0.01 level (2-tailed).
Now you will notice the table has gotten bigger, and the test will test the correlation
between each pair of variables separately (e.g., IQ—GPA, IQ—ADDSC, GPA—IQ, GPA—
ADDSC, ADDSC—IQ, ADDSC—GPA). You would read the table the same way as
above, focusing on the PEARSON/r value and the Sig level.
What can we determine from adding the new variable into this correlation?
We’ll now talk a little bit about how to COMPUTE and TRANSFORM data and
variables in SPSS. I will not include this info on BlackBoard-- this material will not be
included on the exam, but might be helpful for the final project.