
INTRODUCTION TO SPSS

OFID WORKSHOP
March 2008
Adil Yousif, Ph. D.

Types of Variables:
Quantities are classified into two categories: constants and variables.
There are two types of variables:
Qualitative variables, which are either nominal or ordinal (string in SPSS)
Quantitative variables, which are numerical (numeric in SPSS)

Variable Names
The following rules apply to variable names:
Each variable name must be unique; duplication is not allowed.
Variable names can be up to 64 bytes long, and the first character must be a letter or one
of the characters @, #, or $. Subsequent characters can be any combination of letters,
numbers, nonpunctuation characters, and a period (.). In code page mode, 64 bytes
typically means 64 characters.
Variable names cannot contain spaces.
The period, the underscore, and the characters $, #, and @ can be used within variable
names. For example, A._$@#1 is a valid variable name.
Variable names ending with a period should be avoided, since the period may be
interpreted as a command terminator.
Variable names ending in underscores should be avoided, since such names may
conflict with names of variables automatically created by commands and procedures.
Reserved keywords cannot be used as variable names. Reserved keywords are ALL,
AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, and WITH.

Variable names can be defined with any mixture of uppercase and lowercase characters,
and case is preserved for display purposes.
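As a rough illustration of the naming rules above, here is a small Python sketch (Python is not part of SPSS; the function name and the exact regular expression are my own, written to match the rules as stated):

```python
import re

# Reserved keywords listed above; they cannot be used as variable names.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

def is_valid_name(name: str) -> bool:
    """Check a candidate name against the SPSS variable-name rules above."""
    if not 1 <= len(name) <= 64:
        return False
    if name.upper() in RESERVED:      # reserved keywords are forbidden
        return False
    # First character: a letter or @, #, $.
    # Subsequent characters: letters, digits, period, underscore, $, #, @.
    return re.fullmatch(r"[A-Za-z@#$][A-Za-z0-9._$#@]*", name) is not None

assert is_valid_name("A._$@#1")      # the valid example from the text
assert not is_valid_name("my var")   # spaces are not allowed
assert not is_valid_name("WITH")     # reserved keyword
```

Note that this sketch does not flag names ending in a period or underscore, which the text recommends avoiding but does not forbid.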

Data Editor
The Data Editor provides a convenient, spreadsheet-like method for creating and editing
data files. The Data Editor window opens automatically when you start a session.
The Data Editor provides two views of your data:
Data View. This view displays the actual data values or defined value labels.
Variable View. This view displays variable definition information, including defined
variable and value labels, data type (for example, string, date, or numeric), measurement
level (nominal, ordinal, or scale), and user-defined missing values.
In both views, you can add, change, and delete information that is contained in the data
file.

Entering Data
In Data View, you can enter data directly in the Data Editor. You can enter data in any
order. You can enter data by case or by variable, for selected areas or for individual cells.
The active cell is highlighted.
The variable name and row number of the active cell are displayed in the top left corner
of the Data Editor.
When you select a cell and enter a data value, the value is displayed in the cell editor at
the top of the Data Editor.

Data values are not recorded until you press Enter or select another cell.
To enter anything other than simple numeric data, you must define the variable type
first.
If you enter a value in an empty column, the Data Editor automatically creates a new
variable and assigns a variable name.

Descriptives
The Descriptives procedure displays univariate summary statistics for several variables in
a single table and calculates standardized values (z scores). Variables can be ordered by
the size of their means (in ascending or descending order), alphabetically, or by the order
in which you select the variables (the default).
The statistics calculated by the Descriptives procedure are: sample size, mean, minimum,
maximum, standard deviation, variance, range, sum, standard error of the mean, and
kurtosis and skewness with their standard errors.

To obtain Descriptive Statistics

From the menus choose:


Analyze
Descriptive Statistics
Descriptives...

Select one or more variables.


Optionally, you can:

Select Save standardized values as variables to save z scores as new variables.

Click Options for optional statistics and display order.
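The z scores saved by Save standardized values as variables are simply each value minus the mean, divided by the sample standard deviation. A minimal Python sketch (not SPSS syntax; the data values are made up for illustration):

```python
import statistics

data = [12.0, 15.0, 11.0, 14.0, 13.0]   # made-up sample

mean = statistics.mean(data)
sd = statistics.stdev(data)             # sample (n - 1) standard deviation
z_scores = [(x - mean) / sd for x in data]

# By construction, z scores have mean 0 and standard deviation 1.
assert abs(statistics.mean(z_scores)) < 1e-9
assert abs(statistics.stdev(z_scores) - 1) < 1e-9
```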

For a one-sample Z-test:


Get the sample mean from the Descriptives procedure above.
From the menus choose Transform.
Select Compute.
Use the given values of the population mean and standard deviation to calculate the Z
value. (Note that the two-sided critical values of Z are 1.65 for α = 0.10, 1.96 for
α = 0.05, and 2.58 for α = 0.01.)
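The computation itself is Z = (sample mean − population mean) / (population sd / √n). A Python sketch (not SPSS; the figures are invented for illustration):

```python
import math

sample_mean = 52.0   # taken from the Descriptives output
pop_mean = 50.0      # hypothesized population mean
pop_sd = 8.0         # known population standard deviation
n = 64               # sample size

# Z = (sample mean - population mean) / (population sd / sqrt(n))
z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

assert z == 2.0
# |Z| = 2.0 > 1.96, so the null hypothesis is rejected at alpha = 0.05.
```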

Histograms
The histogram plot shows the distribution of a single numeric variable.
To obtain a histogram
From the menus, choose:
Graphs Legacy Dialogs
Histogram
Select a numeric variable for Variable in the Histogram dialog.

Select Display normal curve to display a normal curve on the histogram.


To panel the chart, move one or more categorical variables into the Panel by group box.
Select Titles to define lines of text to be placed at the top or bottom of the plot.

Boxplots
Boxplot allows you to make selections that determine the type of chart you obtain.
Boxplots show the median, interquartile range, outliers, and extreme cases of individual
variables. To obtain simple and clustered boxplots:

From the menus, choose:


Graphs Legacy Dialogs
Boxplot

In the Boxplot initial dialog box, select the icon for simple or clustered.

Select an option under Data in Chart Are.

Select Define.

Select variables and options for the chart.


P.S. From the Frequencies procedure you can also obtain the following plots: bar
charts, pie charts, and histograms.

To obtain Frequency Tables:

From the menus choose:


Analyze
Descriptive Statistics
Frequencies...

Select one or more categorical or quantitative variables.

Bivariate Correlations
The Bivariate Correlations procedure computes Pearson's correlation coefficient,
Spearman's rho, and Kendall's tau-b with their significance levels. Correlations measure
how variables or rank orders are related. Pearson's correlation coefficient is a measure of
linear association. Two variables can be perfectly related, but if the relationship is not
linear, Pearson's correlation coefficient is not an appropriate statistic for measuring their
association.
For each variable the statistics that are displayed are: number of cases with nonmissing
values, mean, and standard deviation. For each pair of variables the statistics are:
Pearson's correlation coefficient, Spearman's rho, Kendall's tau-b, cross-product of
deviations, and covariance.
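As a sketch of what Pearson's r and Spearman's rho measure (Python, not SPSS; the helper functions and data are my own — Spearman's rho is Pearson's r computed on the ranks of the data):

```python
import statistics

def pearson_r(x, y):
    """Pearson's correlation: covariance scaled by the standard deviations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(v):
    # Simple ranks (ties are not averaged, which is fine for this example).
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]          # a perfectly linear relationship
assert abs(pearson_r(x, y) - 1.0) < 1e-9
assert abs(spearman_rho(x, y) - 1.0) < 1e-9
```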

Scatterplots
Scatterplot allows you to specify the type of scatterplot you want. To obtain
scatterplots:

From the menus, choose:


Graphs Legacy Dialogs
Scatter/Dot

In the Scatterplot dialog, select the icon for simple, overlay, matrix, 3-D, or simple
dot plot.

Select Define.

Select variables and options for the chart


To obtain Bivariate Correlations
From the menus choose:
Analyze
Correlate
Bivariate...

Select two or more numeric variables.


P.S. Before calculating a correlation coefficient, screen your data for outliers (which
can cause misleading results) and evidence of a linear relationship.

Linear Regression
Linear Regression estimates the coefficients of the linear equation, involving one or more
independent variables, that best predict the value of the dependent variable.
The statistics calculated for each variable are: number of valid cases, mean, and standard
deviation. For each model the statistics shown are: regression coefficients, correlation
matrix, part and partial correlations, multiple R, R², adjusted R², change in R², standard
error of the estimate, analysis-of-variance table, predicted values, and residuals. Also
available are 95% confidence intervals for each regression coefficient, the variance-covariance
matrix, variance inflation factor, tolerance, Durbin-Watson test, distance measures (Mahalanobis,
Cook, and leverage values), DfBeta, DfFit, prediction intervals, and casewise diagnostics.
Plots: scatterplots, partial plots, histograms, and normal probability plots.
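For the simple one-predictor case, the estimated coefficients and R² can be sketched in Python (not SPSS; the data are made up for illustration):

```python
import statistics

# Estimate b0 and b1 in y = b0 + b1*x by least squares.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

mx, my = statistics.mean(x), statistics.mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)

b1 = sxy / sxx               # slope
b0 = my - b1 * mx            # intercept

# R-squared: proportion of variance in y explained by the model.
predicted = [b0 + b1 * a for a in x]
ss_res = sum((b - p) ** 2 for b, p in zip(y, predicted))
ss_tot = sum((b - my) ** 2 for b in y)
r_squared = 1 - ss_res / ss_tot

assert abs(b1 - 1.95) < 1e-9
assert r_squared > 0.99
```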

To obtain a Linear Regression Analysis

From the menus choose:


Analyze
Regression
Linear...

In the Linear Regression dialog box, select a numeric dependent variable.

Select one or more numeric independent variables.


Optionally, you can:

Group independent variables into blocks and specify different entry methods for
different subsets of variables.

Choose a selection variable to limit the analysis to a subset of cases having a


particular value(s) for this variable.

Select a case identification variable for identifying points on plots.

Select a numeric WLS Weight variable for a weighted least squares analysis.

One-Sample T-Test
The One-Sample T-Test procedure tests whether the mean of a single variable differs
from a specified constant. For each test variable the statistics calculated are: mean,
standard deviation, and standard error of the mean. Also displayed are the average
difference between each data value and the hypothesized test value, a t test of whether
this difference is 0, and a confidence interval for the difference (you can specify the
confidence level).
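The t statistic itself is the mean difference from the test value divided by the standard error of the mean, with n − 1 degrees of freedom. A Python sketch (not SPSS; invented data):

```python
import math
import statistics

data = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]   # made-up sample
test_value = 5.0                         # hypothesized constant

n = len(data)
mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean

# t = (sample mean - test value) / SE, with n - 1 degrees of freedom.
t = (mean - test_value) / se
df = n - 1

assert abs(mean - 5.05) < 1e-9
assert df == 5
```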

To obtain a One-Sample T- Test

From the menus choose:


Analyze
Compare Means
One-Sample T-Test...

Select one or more variables to be tested against the same hypothesized value.

Enter a numeric test value against which each sample mean is compared.

Optionally, you can click Options to control the treatment of missing data and the level of
the confidence interval.

Paired-Samples T-Test
The Paired-Samples T-Test procedure compares the means of two variables for a single
group. It computes the differences between values of the two variables for each case and
tests whether the average differs from 0.
The statistics calculated for each variable are: mean, sample size, standard deviation, and
standard error of the mean. For each pair of variables the statistics are: correlation,
average difference in means, t test, confidence interval for the mean difference (you can
specify the confidence level), and the standard deviation and standard error of the mean
difference.
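A Python sketch of the paired computation — the per-case differences are tested against 0 exactly as in a one-sample t test (not SPSS; the before/after scores are invented):

```python
import math
import statistics

before = [80.0, 75.0, 90.0, 85.0, 70.0]   # made-up first measurement
after  = [85.0, 79.0, 91.0, 88.0, 76.0]   # made-up second measurement

# Per-case differences (after - before), then a one-sample t test vs. 0.
diffs = [b - a for a, b in zip(before, after)]
n = len(diffs)
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
t = mean_d / se_d                              # df = n - 1

assert abs(mean_d - 3.8) < 1e-9
assert t > 0
```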

To obtain a Paired-Samples T- Test

From the menus choose:


Analyze
Compare Means
Paired-Samples T- Test...

Select a pair of variables and move them into the Paired Variables list.


Optionally, you can click Options to control the treatment of missing data and the level of
the confidence interval.

Independent-Samples T-Test
The Independent-Samples T-Test procedure compares the means for two groups of cases.
Ideally, subjects should be randomly assigned to the two groups.
The statistics calculated for each variable are: sample size, mean, standard deviation, and
standard error of the mean. For the difference in means the statistics are: mean, standard
error, and confidence interval (you can specify the confidence level). The inference tests
are: Levene's test for equality of variances, and both pooled- and separate-variances t
tests for equality of means.
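The pooled- and separate-variances statistics can be sketched in Python (not SPSS; the two groups are invented for illustration):

```python
import math
import statistics

g1 = [23.0, 25.0, 21.0, 24.0, 22.0]   # made-up group 1
g2 = [27.0, 29.0, 26.0, 30.0, 28.0]   # made-up group 2

n1, n2 = len(g1), len(g2)
m1, m2 = statistics.mean(g1), statistics.mean(g2)
v1, v2 = statistics.variance(g1), statistics.variance(g2)

# Pooled-variances t (assumes equal variances), df = n1 + n2 - 2.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Separate-variances (Welch) t, used when variances differ.
t_welch = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

# With equal group sizes and equal variances the two statistics coincide.
assert abs(t_pooled - t_welch) < 1e-9
```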

To obtain an Independent-Samples T-Test

From the menus choose:


Analyze
Compare Means
Independent-Samples T-Test...

Select one or more quantitative test variables. A separate t test is computed for each
variable.

Select a single grouping variable, and click Define Groups to specify two codes for
the groups that you want to compare.
Optionally, you can click Options to control the treatment of missing data and the level of
the confidence interval.

Chi-Square Test
The Chi-Square Test procedure tabulates a variable into categories and computes a
chi-square statistic. This goodness-of-fit test compares the observed and expected
frequencies in each category to test either that all categories contain the same proportion
of values or that each category contains a user-specified proportion of values.
Statistics. Mean, standard deviation, minimum, maximum, and quartiles. The number and
the percentage of nonmissing and missing cases, the number of cases observed and
expected for each category, residuals, and the chi-square statistic.
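A Python sketch of the goodness-of-fit statistic for the default equal-proportions test (not SPSS; the category counts are invented):

```python
# Chi-square = sum over categories of (observed - expected)^2 / expected.
observed = [18, 22, 27, 13]                 # made-up counts in 4 categories
total = sum(observed)
expected = [total / len(observed)] * len(observed)   # equal proportions: 20 each

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                      # degrees of freedom

assert abs(chi_square - 5.3) < 1e-9
assert df == 3
```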

To Obtain a Chi-Square Test

From the menus choose:


Analyze
Nonparametric Tests
Chi-Square...

Select one or more test variables. Each variable produces a separate test.
Optionally, you can click Options for descriptive statistics, quartiles, and control of the
treatment of missing data.

One-Way ANOVA
The One-Way ANOVA procedure produces a one-way analysis of variance for a
quantitative dependent variable by a single factor (independent) variable. Analysis of
variance is used to test the hypothesis that several means are equal. This technique is an
extension of the two-sample t test.

In addition to determining that differences exist among the means, you may want to know
which means differ. There are two types of tests for comparing means: a priori contrasts
and post hoc tests. Contrasts are tests set up before running the experiment, and post hoc
tests are run after the experiment has been conducted. You can also test for trends across
categories.
Statistics. For each group: number of cases, mean, standard deviation, standard error of
the mean, minimum, maximum, and 95%-confidence interval for the mean. Levene's test
for homogeneity of variance, analysis-of-variance table and robust tests of the equality of
means for each dependent variable, user-specified a priori contrasts, and post hoc range
tests and multiple comparisons: Bonferroni, Sidak, Tukey's honestly significant
difference, Hochberg's GT2, Gabriel, Dunnett, Ryan-Einot-Gabriel-Welsch F test
(R-E-G-W F), Ryan-Einot-Gabriel-Welsch range test (R-E-G-W Q), Tamhane's T2, Dunnett's
T3, Games-Howell, Dunnett's C, Duncan's multiple range test, Student-Newman-Keuls
(S-N-K), Tukey's b, Waller-Duncan, Scheffé, and least significant difference.
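The F statistic at the heart of the ANOVA table can be sketched in Python (not SPSS; the three groups are invented): it is the between-groups mean square divided by the within-groups mean square.

```python
import statistics

groups = [                        # made-up data for three groups
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]

k = len(groups)                   # number of groups
n = sum(len(g) for g in groups)   # total number of cases
grand_mean = statistics.mean([x for g in groups for x in g])

# Between-groups and within-groups sums of squares.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

# F = between-groups mean square / within-groups mean square.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_statistic = ms_between / ms_within

assert abs(f_statistic - 27.0) < 1e-9   # large F: the group means differ
```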

To Obtain a One-Way Analysis of Variance

From the menus choose:


Analyze
Compare Means
One-Way ANOVA...

Select one or more dependent variables.

Select a single independent factor variable.
