
Statistics with the SPSS Package

4. Descriptive statistics
From the Analyze menu choose the Descriptive Statistics option.

4.0 Choosing your data set

If you only wish to analyse a subset of the data, use the Select Cases option from the Data menu,
highlight If condition is satisfied and click on If. Suppose we only wish to analyse the sub-sample
made up of males. In the Select Cases field we input the condition sex="m". The quotation marks
are necessary when the data are entered in non-numerical form. Note that the observations which
do not satisfy this condition are crossed out in the first column.

To return to the complete data set, use Select Cases from the Data menu again and click on the All
Cases option.
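
For readers who also work outside SPSS, the same kind of case selection can be sketched in Python. This is only an illustration: the records and the helper name select_cases are hypothetical and are not part of SPSS or of the data set used in these notes.

```python
# Hypothetical records; the values are made up for illustration only.
cases = [
    {"sex": "m", "height": 181},
    {"sex": "f", "height": 165},
    {"sex": "m", "height": 174},
    {"sex": "f", "height": 170},
]


def select_cases(records, condition):
    """Return only the records for which condition(record) is true."""
    return [record for record in records if condition(record)]


# Equivalent in spirit to the SPSS condition sex="m".
males = select_cases(cases, lambda record: record["sex"] == "m")
print(len(males), "of", len(cases), "cases selected")
```

The original list is left untouched, which mirrors the way SPSS keeps the unselected cases in the data set and only excludes them from the analysis.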

4.1 Frequency tables

These are used to illustrate the distribution of qualitative random variables or discrete random
variables which take a relatively small number of values (e.g. the results of die rolls).

Choose Frequencies from the Descriptive Statistics option.

For example, the table below gives the number of students in 3 departments and the percentage
share of each department (in terms of the total number of students).

department

                       Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Chemistry             31      31.0            31.0                 31.0
        French                32      32.0            32.0                 63.0
        Mech. Eng.            37      37.0            37.0                100.0
        Total                100     100.0           100.0
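
The percentage and cumulative percentage columns follow directly from the counts. A minimal Python sketch, using the three counts from the table above (the variable names are illustrative):

```python
from itertools import accumulate

# Counts taken from the frequency table above.
counts = {"Chemistry": 31, "French": 32, "Mech. Eng.": 37}

total = sum(counts.values())
# Percent = share of each department in the total number of students.
percents = {dept: 100 * n / total for dept, n in counts.items()}
# Cumulative percent = running sum of the percent column.
cumulative = dict(zip(counts, accumulate(percents.values())))

for dept in counts:
    print(f"{dept:<12}{counts[dept]:>5}{percents[dept]:>8.1f}{cumulative[dept]:>8.1f}")
```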

4.2 Contingency Tables


The crosstabs option presents a contingency table illustrating the relationship between two
qualitative traits. One variable is placed in the rows field and one variable in the columns field. If
we use the cells option and highlight the expected option, we obtain the average number of
observations that we expect in a cell given that the traits are independent. Considering the relation
between sex and department we obtain the following contingency table.

sex * department Crosstabulation

                                       department
                             Chemistry   French   Mech. Eng.   Total
sex     f   Count                   19       16           15      50
            Expected Count        15.5     16.0         18.5    50.0
        m   Count                   12       16           22      50
            Expected Count        15.5     16.0         18.5    50.0
Total       Count                   31       32           37     100
            Expected Count        31.0     32.0         37.0   100.0

In order to describe the nature of any possible association between the two variables, it is necessary
to compare the count (number observed) with the expected count. Here, it can be seen that females
are more likely to study chemistry (there are more females studying chemistry than you would
expect if there were no relation between sex and department). Similarly, males are more likely to
study mechanical engineering. Since there will always be variation resulting from the fact that we
are observing a sample, such a description is qualitative and should not be used to state whether an
association is significant or not (see chi-squared ($\chi^2$) tests of independence).
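
The expected counts can be reproduced by hand: under independence, the expected count in a cell is the row total multiplied by the column total, divided by the total number of observations. A short Python sketch using the observed counts from the crosstabulation (variable names are illustrative):

```python
# Observed counts from the crosstabulation above.
observed = {
    "f": {"Chemistry": 19, "French": 16, "Mech. Eng.": 15},
    "m": {"Chemistry": 12, "French": 16, "Mech. Eng.": 22},
}

row_totals = {sex: sum(row.values()) for sex, row in observed.items()}
departments = list(observed["f"])
col_totals = {d: sum(observed[sex][d] for sex in observed) for d in departments}
n = sum(row_totals.values())

# Under independence: expected = (row total * column total) / n.
expected = {
    sex: {d: row_totals[sex] * col_totals[d] / n for d in departments}
    for sex in observed
}

for sex in observed:
    for d in departments:
        diff = observed[sex][d] - expected[sex][d]
        print(f"{sex} {d:<11} observed={observed[sex][d]:>2} "
              f"expected={expected[sex][d]:>5.1f} difference={diff:+.1f}")
```

A positive difference marks a cell with more observations than independence would suggest (e.g. females in Chemistry), a negative difference a cell with fewer.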

4.3 Descriptive Statistics


The descriptives option calculates various measures of centrality (mean) and dispersion (range,
standard deviation and variance) for quantitative variables. By default the mean, standard
deviation, minimum value and maximum value are displayed. By choosing the options menu, the
variance, range and standard error of the mean can also be displayed.

Descriptive Statistics

                       N   Range   Minimum   Maximum      Mean   Std. Error   Std. Deviation   Variance
                                                                    of Mean
height               100      65       139       204    170.40        1.340           13.398    179.495
weightbef            100      57        43       100     67.15        1.049           10.487    109.987
weightaft            100      57        42        99     68.29        1.068           10.676    113.970
bmi                  100    8.66     19.37     28.03   23.0575       .17887          1.78872      3.200
Valid N (listwise)   100

Here, there are 100 observations of the body mass index (BMI). The smallest is 19.37, the largest is
28.03 and the range is 28.03-19.37=8.66. The mean BMI is 23.0575 with a standard error of
0.17887 (this is an approximation of the average error obtained when the sample mean is used to
estimate the unknown population mean BMI). The standard deviation of the BMI in the sample is
1.78872 and the variance is $1.78872^2 = 3.200$.
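
The relationships between these figures can be checked with a few lines of Python. The sketch below uses the summary values reported for bmi; the formula for the standard error (standard deviation divided by the square root of N) is the standard one and reproduces the value in the table.

```python
import math

# Summary figures for bmi from the Descriptive Statistics table.
n = 100
minimum, maximum = 19.37, 28.03
std_dev = 1.78872

value_range = maximum - minimum       # largest minus smallest observation
variance = std_dev ** 2               # variance is the squared standard deviation
std_error = std_dev / math.sqrt(n)    # standard error of the mean

print(f"range={value_range:.2f} variance={variance:.3f} std_error={std_error:.5f}")
```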

It should be noted that the median and the other quartiles can be calculated by highlighting the
corresponding statistics in the Frequencies window. When Frequencies is used to analyse a
continuous variable, the option display frequency tables should be unmarked.

5. Tests regarding the population mean or two population means

Such tests are carried out using the Compare Means option on the Analyze menu.

5.1 Tests regarding a population mean

We choose the one-sample t test option and place the appropriate variable in the test variable field.
Suppose we wish to carry out the following test
H0: the mean height of all Irish students is 170cm
Against the alternative
HA: the mean height of all Irish students is not 170cm

Input height into the field for the test variable. The test value to be input at the bottom of this
window is 170. We obtain the following output:

One-Sample Test

                                  Test Value = 170
                                                        95% Confidence Interval
                                             Mean          of the Difference
             t    df   Sig. (2-tailed)   Difference       Lower       Upper
height    .299    99              .766         .400       -2.26        3.06

t is the realisation of the test statistic (a measure of the distance between the mean from the null
hypothesis and the sample mean). df is the number of degrees of freedom, which is the number of
observations minus 1. Sig. is the p-value for the test. The confidence interval given is a confidence
interval for the difference between the population mean and the mean from the null hypothesis. By
adding the test value to the end points of this interval we obtain a 95% confidence interval for the
mean height of all Irish students. Hence, the 95% confidence interval for this mean is
[170-2.26, 170+3.06] = [167.74, 173.06]
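
The figures in the one-sample output can be reproduced from the descriptive statistics for height. The sketch below is an illustration rather than a description of SPSS internals; in particular, the critical value 1.984 (two-sided, 95%, 99 degrees of freedom) is an assumption taken from standard t tables.

```python
import math

# Figures reported earlier for height.
n = 100
sample_mean = 170.40
std_dev = 13.398
test_value = 170

# Assumption: two-sided 95% critical value of the t distribution with
# 99 degrees of freedom, taken from standard tables (about 1.984).
T_CRIT_95_DF99 = 1.984

std_error = std_dev / math.sqrt(n)
mean_difference = sample_mean - test_value
t_statistic = mean_difference / std_error

# Interval for the difference, then shifted by the test value to give
# an interval for the population mean itself.
diff_lower = mean_difference - T_CRIT_95_DF99 * std_error
diff_upper = mean_difference + T_CRIT_95_DF99 * std_error
mean_lower = test_value + diff_lower
mean_upper = test_value + diff_upper

print(f"t = {t_statistic:.3f}")
print(f"difference: [{diff_lower:.2f}, {diff_upper:.2f}]")
print(f"mean: [{mean_lower:.2f}, {mean_upper:.2f}]")
```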

The conclusion of the test is based on the p-value. We normally test at a significance level of 5%
(0.05). If the p-value is less than 0.05, we reject the null hypothesis at this significance level (we
have evidence that the null hypothesis is false). In addition, if p<0.01, we have strong evidence that
the null hypothesis is false and if p<0.001, we have very strong evidence that the null hypothesis is
false. In this example p=0.766>0.05, thus there is no evidence that the null hypothesis is false.
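
The decision rule described above can be written as a small Python function. The function name and the returned wording are illustrative only; the thresholds are the ones given in the text.

```python
def describe_evidence(p_value):
    """Translate a p-value into the wording used in these notes."""
    if p_value < 0.001:
        return "very strong evidence against H0"
    if p_value < 0.01:
        return "strong evidence against H0"
    if p_value < 0.05:
        return "evidence against H0"
    return "no evidence against H0"


# The height example: p = 0.766.
print(describe_evidence(0.766))
```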

We can also use the duality between confidence intervals and tests to carry out a test. SPSS gives
the appropriate confidence interval for the difference between the population mean and its
hypothesised value (here 170). If 0 lies within this interval, we do not reject the null hypothesis at
the corresponding significance level. Here a 95% confidence interval is given. We can use this to
test the null hypothesis at a significance level of 5%. Since 0 belongs to the confidence interval, we
do not reject the null hypothesis that the mean height of all Irish students is 170.
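
The same duality can be expressed as a one-line check: the null hypothesis is rejected exactly when the hypothesised difference (here 0) lies outside the confidence interval. The function name below is hypothetical.

```python
def reject_by_interval(lower, upper, null_difference=0.0):
    """Reject H0 when the hypothesised difference lies outside the interval."""
    return not (lower <= null_difference <= upper)


# 95% interval for the difference in the height example.
height_rejected = reject_by_interval(-2.26, 3.06)
print("reject H0:", height_rejected)
```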

Note: By default a 95% confidence interval is calculated for the difference between the population
mean and its hypothesised value. We can change the confidence level by opening the options menu
in the one-sample t test window and choosing the appropriate confidence level.

In cases where we reject H0, we are interested in how the population mean appears to differ from
the mean from the null hypothesis. If t is positive, then the population mean appears to be larger
than the mean from the null hypothesis. If t is negative, then the population mean appears to be
smaller than the mean from the null hypothesis. Note that we can calculate the means of the
variables which interest us using the means option in the compare means menu.

5.2 Tests regarding the difference between 2 population means

5.2.1 For dependent samples

Such tests are used when pairs of observations are made on one group of individuals (e.g. mass
before and after a diet, blood pressure before and after treatment). In such a case we have pairs of
observations $(X_i, Y_i)$, where the i-th pair of observations is made on the i-th member of the test
group. We wish to test the hypothesis

H0: $\mu_X - \mu_Y = 0$ (i.e. there is no difference between the population means)

against

HA: $\mu_X - \mu_Y \neq 0$ (i.e. there is a difference between the population means)

We wish to test whether the mean weight of students changes during their studies. The samples
should be entered in 2 columns. We choose the paired-sample t-test option and select the pair of
variables we are interested in (weightbef - weight at the start of studies and weightaft - weight after
studies). The output is as follows:

Paired Samples Statistics

                       Mean     N   Std. Deviation   Std. Error Mean
Pair 1   weightbef    67.15   100           10.487             1.049
         weightaft    68.29   100           10.676             1.068

Paired Samples Test

                                                 Paired Differences
                                                                      95% Confidence Interval
                                                                         of the Difference
                                   Mean   Std. Deviation   Std. Error Mean    Lower    Upper        t   df   Sig. (2-tailed)
Pair 1   weightbef - weightaft   -1.142            1.966              .197   -1.533    -.752   -5.810   99              .000

The first table gives the two sample means together with the sample standard deviations. The second
table gives the difference between the sample means (i.e. the mean weight before minus the mean
weight after for the samples). The confidence interval is a 95% confidence interval for the difference
between the two corresponding population means (i.e. the mean weight of all Irish students at the
start of their studies minus the mean weight of all Irish students at the end of their studies). As
before, the confidence level may be changed within the options menu.

Sig. gives the p-value for the test. As before, the conclusion of the test is based on this p-value.
SPSS displays .000, which means that p<0.001 rather than exactly 0, so we reject the null hypothesis
(we have very strong evidence that the null hypothesis is false). In order to see how the mean weight
of students changes during their studies, we return to the first table. It can be seen that on average
students are heavier at the end of their studies. Our final conclusion is:

We have very strong evidence that on average students are heavier after their studies than when
they start their studies.

Again we can use duality to carry out the hypothesis test (here at a significance level of 5%). Since
0 does not belong to this confidence interval, we have evidence that the mean weight of students
changes during their studies.

It should be noted that this test is in essence a one-sample test. If we calculate the amount by which
the weight of each student changes (i.e. $D_i = X_i - Y_i$), we can test the hypothesis

H0: $\mu_D = 0$
HA: $\mu_D \neq 0$

This may be done in SPSS by defining the variable d = weightbef-weightaft and using the
one-sample t-test option with 0 as the test value. The results are exactly the same as the results
obtained using the paired samples option.
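
The reduction to a one-sample test can be sketched in Python. The five pairs of weights below are hypothetical and serve only to show the calculation; the helper one_sample_t is likewise an illustrative name. The last lines cross-check the t statistic against the mean and standard deviation of the differences that SPSS reported for the real data.

```python
import math
import statistics

# Hypothetical weights (kg) for five students; not taken from the data set.
weight_before = [60.0, 72.5, 55.0, 80.0, 66.0]
weight_after = [61.5, 73.0, 57.0, 80.5, 68.0]


def one_sample_t(values, test_value=0.0):
    """Return (t, df) for a one-sample t test of the mean against test_value."""
    n = len(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - test_value) / std_error, n - 1


# d = weightbef - weightaft, as in the text.
differences = [b - a for b, a in zip(weight_before, weight_after)]
t_statistic, df = one_sample_t(differences)
print(f"t = {t_statistic:.3f}, df = {df}")

# Cross-check with the summary figures SPSS reported for the real data.
t_reported = -1.142 / (1.966 / math.sqrt(100))
print(f"t from reported mean and standard deviation = {t_reported:.2f}")
```

Because the reported mean and standard deviation are rounded, the cross-check agrees with the SPSS value of -5.810 only to about two decimal places.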

5.2.2 For independent samples

Such tests are used when one type of observation is made on two groups of individuals (for
example, the weight of British and Irish students, the height of males and females). All the
observations should be entered in one column and a grouping variable defines which group an
individual belongs to.

The independent samples t-test should be chosen. We wish to test the hypothesis

H0: $\mu_X - \mu_Y = 0$ (i.e. there is no difference between the population means)

against

HA: $\mu_X - \mu_Y \neq 0$ (i.e. there is a difference between the population means).

Suppose we want to test the hypothesis that the mean BMI of males is equal to the mean BMI of
females. BMI is chosen as the test variable, sex is chosen as the grouping variable. In order to
define the 2 groups, it is necessary to click on the define groups option. We define the labels of the
2 groups in this window. The output is as follows:

Group Statistics

       sex    N      Mean   Std. Deviation   Std. Error Mean
bmi    m     50   22.8234          1.88803            .26701
       f     50   23.2916          1.66969            .23613

Independent Samples Test

                                    Levene's Test for
                                  Equality of Variances                      t-test for Equality of Means
                                                                                                                 95% Confidence Interval
                                                                                        Mean       Std. Error       of the Difference
                                       F     Sig.        t       df   Sig. (2-tailed)   Difference   Difference      Lower       Upper
bmi   Equal variances assumed       .527     .469   -1.314       98              .192      -.46821       .35644   -1.17555      .23914
      Equal variances not assumed                   -1.314   96.556              .192      -.46821       .35644   -1.17569      .23927

The first table gives the sample means (22.8234 for males and 23.2916 for females). We may be
interested in whether the variance of the BMI depends on sex. The result of Levene's test for
equality of variances may be read from the first two columns (F and Sig.) of the second table. The
null hypothesis is that the variance does not depend on sex; the alternative is that the variance
depends on sex. The p-value for this test is 0.469 (>0.05), hence we do not reject the null
hypothesis.

Now we test whether the mean BMI depends on sex. We can always use the equal variances
not assumed row (if there is no significant difference between the variances, the results of both
tests are almost identical). The p-value for this test is 0.192 (>0.05), hence we do not reject the null
hypothesis that the mean BMI does not depend on sex. The final two columns give a
confidence interval for the difference between the mean BMI of all males and the mean BMI of all
females (the mean for males minus the mean for females). The 95% confidence interval for this
difference is [-1.17569, 0.23927]. Since 0 is contained in this interval, at a significance level of 5%
we do not reject the null hypothesis. Again, the confidence level may be changed using the options
window.
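
The t statistic, the standard error of the difference and the fractional degrees of freedom in the equal variances not assumed row can be reproduced from the Group Statistics table. The sketch below assumes that this row corresponds to what is commonly called Welch's t test, with the Welch-Satterthwaite approximation for the degrees of freedom; under that assumption the results agree with the SPSS output to the precision of the rounded inputs.

```python
import math

# Group summaries from the Group Statistics table.
n_m, mean_m, sd_m = 50, 22.8234, 1.88803
n_f, mean_f, sd_f = 50, 23.2916, 1.66969

# Squared standard error of each group mean.
var_m = sd_m ** 2 / n_m
var_f = sd_f ** 2 / n_f

# Standard error of the difference (males minus females) and the t statistic.
std_error_diff = math.sqrt(var_m + var_f)
t_statistic = (mean_m - mean_f) / std_error_diff

# Welch-Satterthwaite approximation for the degrees of freedom.
df = (var_m + var_f) ** 2 / (var_m ** 2 / (n_m - 1) + var_f ** 2 / (n_f - 1))

print(f"t = {t_statistic:.3f}, df = {df:.3f}, SE = {std_error_diff:.5f}")
```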
