
Cathy Chen Marketing Research Term 1, 2014-15

SPSS Tutorial I (Lecture 6)

1. Opening up Datasets

The first thing you will probably want to do is to work with a dataset. To do this, first
download the datasets from SMU eLearn to a directory that you have access to (e.g.
D:\yourwork). Remember where you saved the files, because you will need to point
SPSS to that location later.

Once your dataset has been downloaded, return to SPSS. If you see the following screen,
select More Files from the list and click OK.




If you do not see this dialog box, select File from the main menu and then
select Open; the dialog box should then open up. You will also do this when you
open new datasets. You can have multiple datasets open, and you can close each of
these datasets without exiting SPSS. If, however, you close the final dataset, you risk
also closing SPSS entirely; a dialog box will appear to warn you before this
happens.

Each dataset can be worked on individually. The output from all datasets is stored in a
single output sheet. For example, if you run Descriptive Statistics from the Boston
dataset, then the variables for the Boston dataset will appear in the dialog box.

The organization of data in SPSS

Data in SPSS is handled and organized very much like an Excel spreadsheet. Each cell
represents a data point. The data can be nominal (i.e. categories such as male/female or
purple/green/blue), interval (e.g. 1, 2, 3, 4, ... for a Likert-type scale), or ratio (think of
this as continuous, such as 122.523). It is also possible to have many other formats, such
as free-form text, date/time, currency and so on.

Typically we will organize the data so that each column represents a variable and each
row represents a sampling unit. Sometimes we will use multiple rows for repeated
observations on the same sampling unit.
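
The row-and-column layout described above can be pictured outside SPSS as well. The short Python sketch below is only an illustration: the variable names and values are made up and do not come from any of the course datasets. Each dictionary is one sampling unit (a row) and each key is a variable (a column).

```python
# Hypothetical data: each dict is one sampling unit (row),
# each key is a variable (column). Names and values are invented.
rows = [
    {"id": 1, "sex": "female", "satisfaction": 4, "spend": 122.523},
    {"id": 2, "sex": "male", "satisfaction": 2, "spend": 87.10},
    {"id": 3, "sex": "female", "satisfaction": 5, "spend": 140.00},
]


def column(rows, name):
    """Return all values of one variable, in row order."""
    return [row[name] for row in rows]


sex = column(rows, "sex")        # a nominal variable
spend = column(rows, "spend")    # a ratio (continuous) variable
```

Here "sex" is nominal, "satisfaction" is a Likert-type interval score and "spend" is a ratio variable, matching the three data types listed earlier.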

If you want to insert a variable, right-click a column. To insert a row, right-click
the grey cell next to the row of interest (see below):



If you want to change the number in a cell you can double click on it to edit, or, as in
Excel, just press the F2 function key.
2. Use of Output

Output windows in SPSS are organized like trees or directories with many levels of
subdirectories. You can export this output to other applications by right clicking on the
object you want, then select Export. When you run one of the applications discussed
below, look for the output in the output window. You may need to scroll down to find it
(it will typically be the last one run).

You can also delete specified output sections. For example, if you wish to rerun the
analysis and don't want the old copy lying around, delete the one you don't want by
selecting it from the sub-directories and pressing the Delete key once.


3. MKTG103 Lecture Material

3.1 Visualizing Data (Lecture 6)

Much of the graphical presentation of data is handled via an interactive tool called Chart
Builder. To use this tool, go to the main menu bar, select Graphs and click on
Chart Builder. You then use your mouse to drag and drop the different objects
to be graphed together. There is a wide variety of ways to graph data
here, and they are organized in categories (such as line graphs, bar graphs,
boxplots etc). See the illustration:




3.2 Trying some graphics.

Try the following graphics we did in class:

1) Scatter plot: open the file lecture6-boston. From the Graphs menu, select
Chart Builder and click once on the category called Scatter/Dot. You should see a
number of variables. Select MEDV for the Y-axis (the vertical one) and AGE for
the X-axis (the horizontal one). You can add annotations such as legends, colors,
titles, labels for the axes etc. with the tabs on the left. Play with some of these and create
your own unique scatter plot. When you are done, consider what the relationship
between the two variables is.

Harder: try to do a scatter plot of CRIM (crime rate) versus MEDV (median value of
homes). What do you see? Try a log transformation of the crime rate variable.
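
To see why a log transformation can help with a heavily skewed variable, here is a minimal Python sketch. The values are hypothetical stand-ins for a skewed variable such as CRIM, not the actual Boston data, and the helper spread_ratio is an illustrative name, not an SPSS function.

```python
import math

# Hypothetical right-skewed values standing in for a variable such as CRIM.
crime = [0.01, 0.05, 0.2, 1.5, 9.0, 88.0]

# Natural log transform; only valid for strictly positive values.
log_crime = [math.log(x) for x in crime]


def spread_ratio(values):
    """Largest gap between neighbouring sorted values divided by the smallest gap."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return max(gaps) / min(gaps)


# The raw values bunch up near zero with a few very large ones;
# after the transform the points are spaced far more evenly.
print(spread_ratio(crime), spread_ratio(log_crime))
```

On a scatter plot, the effect is that the points no longer pile up against one edge of the axis, which makes any relationship with the other variable easier to see.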

2) Frequency distribution: Run a histogram on the variable MEDV. Select bar-charts.
Drag the variable MEDV to the X-axis. When you are ready to view it, click OK.
Interpret the distribution.
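
A histogram is just a count of how many observations fall into each bin. The Python sketch below shows the idea with hypothetical values (not the real MEDV column) and an assumed bin width of 10; SPSS chooses its own bin width unless you change it.

```python
from collections import Counter

# Hypothetical home values (in thousands); not the real MEDV column.
values = [12.0, 17.5, 19.9, 21.0, 22.4, 23.1, 24.8, 31.2, 36.0, 50.0]
bin_width = 10  # assumed bin width

# Map each value to the lower edge of its bin, e.g. 23.1 -> 20.
counts = Counter(int(v // bin_width) * bin_width for v in values)

# Print a crude text histogram, one row per non-empty bin.
for lower in sorted(counts):
    print(f"{lower:>3}-{lower + bin_width:<3} {'*' * counts[lower]}")
```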

3) Density plot: plot Y=MEDV as a density plot. This is located under Scatter/Dot;
select the plot with the vertical columns of circles. This is very similar to the
histogram, but with finer divisions for the bins. Interpret this plot: what do you see that
is not obvious in the histogram?


3.3 Descriptive Statistics

Another way to get a feel for the data is to run some basic descriptive statistics. SPSS
has these organized by type under the Analyze menu. Select Descriptive Statistics
from this menu and select the one that you want to use.

4) Present the descriptive statistics for the variable MEDV.
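
For reference, the basic descriptive statistics can be computed by hand as in the following Python sketch. The values are hypothetical and are not the MEDV column; the point is only to show what each reported number means.

```python
import statistics

# Hypothetical values; not the real MEDV column.
values = [12.0, 17.5, 19.9, 21.0, 22.4, 23.1, 24.8, 31.2, 36.0, 50.0]

summary = {
    "n": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "std_dev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    "minimum": min(values),
    "maximum": max(values),
}

for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Note that the mean lies above the median here because of the single large value, which is one quick numerical sign of a right-skewed distribution.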

3.4 Q-Q Plot: visualizing if the data come from a known (e.g. Normal) distribution.

You can examine a histogram and get a rough sense of whether the data come from a
normal distribution or not. A more formal, theoretically sound way of doing this is to use
a Q-Q plot.

The Q-Q (Quantile-Quantile) plot lets you see visually whether the data appear to come
from a normal distribution. It does this by systematically comparing the data with
what you would expect them to look like if they were normally
distributed. The technique plots the order statistics of the observed data (e.g. each of the
percentiles) against the quantiles (think of them as order statistics) of the normal
distribution. Where do the parameters (mean and standard deviation) of this normal
distribution come from? They are estimated from the same sample of data.
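
The construction just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of SPSS internals: the data are hypothetical, the function name qq_points is invented, and the plotting position (i - 0.5) / n is one common convention that SPSS may not use by default.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each ordered observation with the matching quantile of a fitted normal."""
    ordered = sorted(sample)
    n = len(ordered)
    # Mean and standard deviation are estimated from the same sample.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # (i - 0.5) / n is an assumed plotting position; other formulas exist.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical values; not taken from the Boston dataset.
sample = [12.0, 17.5, 19.9, 21.0, 22.4, 23.1, 24.8, 31.2, 36.0, 50.0]
points = qq_points(sample)

for expected, observed in points:
    print(f"expected {expected:7.2f}   observed {observed:7.2f}")
```

If the data were normally distributed, the pairs would fall close to a straight 45-degree line; systematic curvature at the ends points to skewness or heavy tails.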

Another version of this is the P-P plot. It plays the same role but plots the cumulative
probabilities instead of the quantiles. Both plots can also be used to check the data
against distributions other than the normal.

The Q-Q plot is not one of the graphics directly available in SPSS's
Chart Builder. To run it, go to Analyze->Descriptive Statistics->Q-Q plots. You
should see the following:



5) Run the Q-Q plot on MEDV first. Later select AGE. What are your conclusions
about the distribution of these two variables?

Harder: Now select CRIM and run a Q-Q plot. What do you see? Can you think of a
way to transform and rerun the Q-Q plot? Is there a way to do this in one step?


3.5 One Way and Two Way Chi-Squared - Relationship between Nominal Variables
(Lecture 6)

Command 1:



a) One Way Chi-Square Test:
a. Analyze -> Nonparametric Tests -> Chi-Square


This is useful if you have a nominal variable, and you want to test hypotheses about how
this data is distributed.

Exercise: Download the file lecture6_chisq1 (Canopy of Care Charity) and run the one
way Chi-Square test on pressure. Interpret the results: what are your conclusions?
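
To make the one way Chi-Square test concrete, here is a minimal Python sketch of the calculation. The counts are hypothetical (they are not the "pressure" data), and the null hypothesis is assumed to be that all categories are equally likely. The p-value shortcut used at the end is exact only when there are 2 degrees of freedom; for other cases, rely on the SPSS output or a Chi-Square table.

```python
import math

# Hypothetical counts for a three-level nominal variable
# (not the actual "pressure" data).
observed = {"low": 18, "medium": 30, "high": 12}

n = sum(observed.values())
expected = n / len(observed)  # null hypothesis: all categories equally likely

# Chi-square statistic: sum over categories of (observed - expected)^2 / expected.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

# For df = 2 ONLY, the upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.4f}")
```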


3.6 Lecture 6 (Hypothesis Testing) - Both Means and Proportions

3.6.1: Test against a theory

We now move on to hypothesis testing for ratio (continuous, or not nominal) variables,
for example income, sales, height, weight, distance from an object, annual precipitation
etc. We may have some prior belief about what the value of such a variable should be.
For example, we may expect the salary of a brand manager at a Fortune 500 firm to be
$100,000 per annum.
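
A test of a sample mean against a believed value can be sketched as follows. The salaries are hypothetical (in thousands of dollars), the variable names are illustrative, and the critical value 2.262 is the standard two-tailed 5% table value for a t distribution with 9 degrees of freedom, so it applies only to a sample of 10.

```python
import math
import statistics

# Hypothetical brand-manager salaries in thousands of dollars.
sample = [96, 104, 99, 91, 108, 95, 102, 89, 97, 94]
mu_0 = 100  # hypothesised mean under the null hypothesis

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)       # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)

# One-sample t statistic: distance of the sample mean from mu_0 in standard errors.
t_stat = (sample_mean - mu_0) / standard_error
df = n - 1

T_CRITICAL = 2.262  # two-tailed 5% critical value for df = 9, from a t table
reject_null = abs(t_stat) > T_CRITICAL

print(f"mean = {sample_mean}, t = {t_stat:.3f}, df = {df}, reject H0: {reject_null}")
```

SPSS reports the exact p-value for the t statistic, so in practice you compare that p-value with your significance level rather than looking up a critical value.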

Command:

a) One Sample t-test (Test of a single mean and proportion)
a. Analyze -> Compare Means -> One Sample t-test

Exercises:

1. Open the Lecture6-QualityMotors dataset. Quality Motors is an automobile
dealership that regularly advertises in its local market area. It has claimed that a certain
make and model of car averages 30 miles to a gallon of fuel and mentions this figure may
vary with driving conditions. A local consumer group wishes to verify the advertising
claim. To do so, it selects a sample of recent purchasers of this make and model of
automobile. It asks them to drive their cars until two tanks of gasoline are used up and
record the mileage. The group then calculates and records the miles per gallon for each
car. The dataset contains the results.

a. Formulate a statistical hypothesis to test the consumer group's purpose.
b. Calculate the mean miles per gallon. Compute the sample variance and
sample standard deviation.
c. According to your hypothesis, construct the appropriate statistical test using a .05
significance level.
d. What is the p value? At what level of confidence do you reject the null
hypothesis?

2. Open the Lecture6_tvshare dataset. A TV station is trying to determine their
market share for a talk show program so that they can accurately price commercials for
their station. If their market share is the same as the past level of 8%, they will leave
prices unchanged. If, however, market share is greater, they will increase prices
accordingly.

a. Can you formulate a hypothesis to help the TV station decide whether to modify
their price? In SPSS can you test this hypothesis?
b. What will be your conclusion based on this test? First state the hypothesis.
Remember the difference between one and two-tailed tests. Which one is this?
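
The logic of a one-tailed test of a proportion can be sketched as below. This is a normal-approximation z-test written out by hand, which is a different calculation from the One Sample t-test dialog in SPSS, and the sample size and count are hypothetical; they are not taken from the Lecture6_tvshare dataset.

```python
import math
from statistics import NormalDist

# Hypothetical survey result; NOT the Lecture6_tvshare data.
n_households = 500
n_watching = 52
p_0 = 0.08  # past market share, the value under the null hypothesis

p_hat = n_watching / n_households

# Standard error of the sample proportion, computed under the null hypothesis.
standard_error = math.sqrt(p_0 * (1 - p_0) / n_households)
z = (p_hat - p_0) / standard_error

# One-tailed p-value: probability of a share this large or larger if H0 is true.
p_value_one_tailed = 1 - NormalDist().cdf(z)

print(f"share = {p_hat:.3f}, z = {z:.3f}, one-tailed p = {p_value_one_tailed:.4f}")
```

A two-tailed p-value would be twice this figure, which is why the choice between a one-tailed and a two-tailed test can change the conclusion.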

3.6.2: Test through comparison of two means

We now examine the question of whether two samples have the same mean or proportion.

Command 1:

b) Differences in means (Test of differences in means and proportions)
a. Analyze -> Compare Means -> Independent Samples t-test

The assumptions for the two-sample means test are:
a. The samples are independent.
b. The samples have equal variances.
c. Both samples are drawn from a Normal distribution.

Command 2:

c) Differences in means (Test of differences in means and proportions)
a. Analyze -> Compare Means -> Paired Samples t-test

Exercises:

1. Open the Lecture6_internet usage dataset. Data from 30 respondents were collected
to examine Internet usage behavior. The data include sex (1 = male, 2 = female),
familiarity with the Internet (1 = very unfamiliar, 7 = very familiar), Internet usage in
hours per week, attitude toward the Internet and attitude toward technology, both
measured on a 7-point scale (1 = very unfavorable, 7 = very favorable), and whether the
respondents have done shopping or banking on the Internet (1 = Yes, 2 = No).

a. Suppose we want to determine whether Internet usage was different for males as
compared to females. What analysis should be conducted?

b. Suppose we are interested in determining whether the respondents differed in their
attitude toward the Internet and attitude toward technology. What analysis should be
conducted?

2. Open the Lecture6-ad campaign dataset. A new and an old ad campaign were
shown to two independent groups. Each group consisted of 64 participants. In the
group that watched the old campaign, 25 indicated they would like to buy the product
based on the campaign. In the group that watched the new campaign, 32 indicated they
would buy the product. Is the response to the two ad campaigns significantly
different? Can you formulate the hypothesis and test it in SPSS?
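
As a cross-check on your SPSS result, the comparison of two proportions can be worked out by hand. The Python sketch below uses the counts given in the exercise and a pooled two-proportion z-test with a normal approximation. This is an assumed choice of method: the t-test that SPSS runs on the dataset will give a similar, but not identical, figure.

```python
import math
from statistics import NormalDist

# Counts taken from the exercise text.
n_old, buy_old = 64, 25
n_new, buy_new = 64, 32

p_old = buy_old / n_old
p_new = buy_new / n_new

# Pooled proportion under the null hypothesis that both campaigns perform equally.
pooled = (buy_old + buy_new) / (n_old + n_new)
standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))

z = (p_new - p_old) / standard_error

# Two-tailed p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"old = {p_old:.3f}, new = {p_new:.3f}, z = {z:.3f}, p = {p_value:.4f}")
```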
