
Chapter 2

Turning Data into Information


2.1 Raw Data

2.2 Types of Variables
The Problem: Randomly Pick S or Q

A group of 92 college students is given a list of 8 questions:

1. What is your sex (male or female)?
2. How many hours did you sleep last night?
3. Randomly pick a letter -- S or Q.
4. What is your height in inches?
5. Randomly pick a number between 1 and 10.
6. What's the fastest you've ever driven a car (mph)?
7. What is your right handspan in centimeters?
8. What is your left handspan in centimeters?

For questions 7 and 8 a centimeter ruler was provided. For question 3, some students were asked to randomly pick a letter -- S or Q; others were asked to pick Q or S.

The data is recorded in the file pennstate1.sav. The variables in the file are:

sex  hrssleep  sqpick  height  randnumb  fastest  rtspan  lftspan  form

Each of these variables holds the students' responses to one of the questions; the variable form records which version of question 3 (SorQ or QorS) a student received. From a statistical standpoint, these variables can be labeled as categorical or quantitative. In SPSS, each variable is assigned one of several variable types. The variable type can be seen by using the Variable View tab at the bottom of the SPSS Data Editor screen.

The variable sex is recorded as Male or Female. It is treated as a string variable. Other string variables are sqpick (S or Q) and form (SorQ or QorS). All other variables in the data set are treated as numeric. SPSS makes no distinction between discrete and continuous variables.

To view all types of data used in SPSS, click on the variable type of one of the variables. A gray box with 3 dots in it will appear. Click on the box. A window opens up. The window is shown on the next page:

Here the type of data can be specified and/or changed, when typing in your own data, or when recoding an existing data set.
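
The storage types described above can be written down as a small lookup table. The Python sketch below is not part of SPSS; the names storage_type and variables_of_type are made up for illustration. It records only what the text states: sex, sqpick, and form are string variables, and the remaining variables are numeric. The storage type does not settle the statistical role of a variable: randnumb is stored as numeric, yet Example 2.3 summarizes it as a categorical variable.

```python
# Storage types as described in the text for pennstate1.sav.
# The names storage_type and variables_of_type are illustrative only.
storage_type = {
    "sex": "string",
    "hrssleep": "numeric",
    "sqpick": "string",
    "height": "numeric",
    "randnumb": "numeric",
    "fastest": "numeric",
    "rtspan": "numeric",
    "lftspan": "numeric",
    "form": "string",
}


def variables_of_type(types, wanted):
    """Return the variable names stored with the given type, in file order."""
    return [name for name, kind in types.items() if kind == wanted]


print("String variables: ", variables_of_type(storage_type, "string"))
print("Numeric variables:", variables_of_type(storage_type, "numeric"))
```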

2.3 Summarizing One or Two Categorical Variables


Example 2.1 Seatbelt Use by 12th Graders

How often do you wear a seatbelt when driving a car? This is one of many questions asked in a biennial nationwide survey of American high school students. Survey questions concern potentially risky behaviors such as cigarette smoking, alcohol use, and so on. For the question about seatbelt use when driving, possible answers were Always, Most times, Sometimes, Rarely, and Never. The data is recorded in the file youthrisk.sav, supplied with this manual.

Note: The youthrisk data set contains responses for n = 3042 students. The student version of SPSS can only handle a maximum of 1500 cases. The steps described below apply to the student version of SPSS and can be followed to summarize any other data set. However, the output shown was produced with the full version of SPSS and cannot be reproduced if only the student version is available.

1. Steps to prepare the data:
   a. Open the data set youthrisk.sav.
2. Steps to create a frequency table with 1 categorical variable:
   a. To obtain a frequency table listing the number of students who gave each of the 5 possible answers to the seatbelt question, we use the Analyze menu.
   b. Scroll down to the Descriptive Statistics submenu and select the Frequencies option. A window opens up. The window is shown below:
   c. Select the variable Seatbelt and move it into the Variable(s): box, as shown above.
   d. Click OK.

The SPSS Output

Frequencies

Statistics
Seatbelt
N   Valid      3042
    Missing       0

Seatbelt
                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Always            1686      55.4            55.4                 55.4
        Most times         578      19.0            19.0                 74.4
        Sometimes          414      13.6            13.6                 88.0
        Rarely             249       8.2             8.2                 96.2
        Never              115       3.8             3.8                100.0
        Total             3042     100.0           100.0
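
The Percent and Cumulative Percent columns follow directly from the frequencies. The Python sketch below recomputes them from the five counts in the Seatbelt table; it is an illustration of the arithmetic, not SPSS code, and the function name frequency_table is made up for this example.

```python
# Counts taken from the Seatbelt frequency table.
counts = {
    "Always": 1686,
    "Most times": 578,
    "Sometimes": 414,
    "Rarely": 249,
    "Never": 115,
}


def frequency_table(counts):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, frequency in counts.items():
        running += frequency
        percent = round(100 * frequency / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((category, frequency, percent, cumulative))
    return rows


for category, frequency, percent, cumulative in frequency_table(counts):
    print(f"{category:<12}{frequency:>6}{percent:>8.1f}{cumulative:>8.1f}")
```

Because there are no missing responses (Missing = 0), the Valid Percent column equals the Percent column.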


3. Steps to create a frequency table with 2 categorical variables:
   a. To obtain a frequency table of the number of students who gave each of the 5 possible answers to the seatbelt question, separated by gender, we use the Analyze menu.
   b. Scroll down to the Descriptive Statistics submenu and select the Crosstabs option. A window opens up. The window is shown below:
   c. Select the variable Gender and move it into the Row(s): box.
   d. Select the variable Seatbelt and move it into the Column(s): box.
   e. Click on the Cells button at the bottom of the window. A second window opens up. The window is shown below:
   f. Select the Row option under the Percentages heading, as shown above. Click Continue.
   g. Click OK.

The SPSS Output

Crosstabs

Gender * Seatbelt Crosstabulation

                                                Seatbelt
                                   Always   Most times   Sometimes   Rarely   Never    Total
Gender   Female   Count               915          276         167       84      25     1467
                  % within Gender   62.4%        18.8%       11.4%     5.7%    1.7%   100.0%
         Male     Count               771          302         247      165      90     1575
                  % within Gender   49.0%        19.2%       15.7%    10.5%    5.7%   100.0%
Total             Count              1686          578         414      249     115     3042
                  % within Gender   55.4%        19.0%       13.6%     8.2%    3.8%   100.0%
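
The "% within Gender" rows are row percentages: each count divided by the total for that gender. The Python sketch below recomputes them from the counts in the crosstabulation. It illustrates the arithmetic only; the name row_percentages is made up for this example.

```python
# Counts from the Gender * Seatbelt crosstabulation, in column order.
answers = ["Always", "Most times", "Sometimes", "Rarely", "Never"]
crosstab = {
    "Female": [915, 276, 167, 84, 25],
    "Male": [771, 302, 247, 165, 90],
}


def row_percentages(table):
    """Divide each count by its row total and express it as a percentage."""
    result = {}
    for group, row in table.items():
        row_total = sum(row)
        result[group] = [round(100 * count / row_total, 1) for count in row]
    return result


percentages = row_percentages(crosstab)
for group, row in percentages.items():
    cells = "  ".join(f"{a}: {p}%" for a, p in zip(answers, row))
    print(f"{group:<7}{cells}")
```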

Example 2.3 Humans Are Not Good Randomizers

As part of the survey described in Section 2.2, the students were asked to "Randomly pick a number between 1 and 10." The data is recorded in the file pennstate1.sav.

1. Steps to prepare the data:
   a. Open the data set pennstate1.sav.
2. Steps to create a Pie Chart of one categorical variable:
   a. To create a pie chart, we use the Graphs menu.
   b. Scroll down to the Pie option. A window opens up. The window is shown below:
   c. Select the Summaries for groups of cases option (the default).
   d. Click Define.
   e. A window opens up. The window is shown below:
   f. Select the variable randnumb and move it into the Define Slices by: box.
   g. In the Slices Represent box, select % of cases.
   h. Click OK.


The SPSS Output


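[Pie chart of randnumb: one slice for each value from 1 to 10, labeled with its percentage of cases. Slice labels shown: 1.05%, 3.16%, 7.37%, 11.58%, 10.0%, 4.74%, 11.05%, 29.47%, 9.47%, 12.11%.]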

Note: To obtain the percentages for each slice, use the following steps:
   I. Double click on the graph. The Chart Editor opens up.
   II. Click on the Elements menu and scroll down to the Show Data Labels option.
   III. Close the Chart Editor.

3. Steps to create a Bar Graph of one categorical variable:
   a. To create a bar graph, we use the Graphs menu.
   b. Scroll down to the Bar option. A window opens up. The window is shown below:
   c. Select the Simple option at the top and the Summaries for groups of cases option at the bottom of the window (both the default).
   d. Click Define. A window opens up. The window is shown below:
   e. Select the variable randnumb and move it into the Category Axis: box.
   f. If you wish to display percentages, rather than counts, on the y-axis, select the % of cases option under the Bars Represent heading.
   g. Click OK.


The SPSS Output

Graph
[Bar graph: Count (y-axis, 0 to 60) for each value of randnumb (x-axis, 1 to 10).]
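
Both the pie chart and the bar graph summarize the same tally: how many cases fall in each category, as a count or as a percentage of cases. The Python sketch below shows that tally on a short list of hypothetical responses, invented for illustration and not taken from pennstate1.sav. The function name tally is also made up for this example.

```python
from collections import Counter

# Hypothetical responses, invented for illustration only.
# These are NOT the values in pennstate1.sav.
responses = [7, 3, 7, 5, 8, 7, 2, 9, 7, 4, 6, 3]


def tally(values, categories):
    """Return {category: (count, percent of cases)} for every category."""
    counts = Counter(values)
    n = len(values)
    return {c: (counts[c], round(100 * counts[c] / n, 1)) for c in categories}


table = tally(responses, range(1, 11))
for value, (count, percent) in table.items():
    # One '#' per case gives a rough text version of the bar graph.
    print(f"{value:>2} {'#' * count:<5} {count} ({percent}%)")
```

Categories with no responses are kept with a count of 0, so every value from 1 to 10 appears on the axis.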

Example 2.4 Lighting the Way to Nearsightedness

A survey of 479 children found that those who had slept with a nightlight or in a fully lit room before the age of 2 had a higher incidence of nearsightedness later in childhood. The raw data consisted of two categorical variables, each with three categories. The data can be found in Example 2.2 of the book. The data is not recorded in an existing file, so the data set needs to be created.

1. Steps to prepare the data:
   a. Open a new data file.
   b. Enter the 9 frequencies (155, 153, 34, 15, 72, 36, 2, 7, 5) in the first column.
   c. In the second column, indicate how the children slept: 1 = darkness, 2 = nightlight, 3 = full light (if the order indicated above is used, type 1, 2, 3, 1, 2, 3, 1, 2, 3).
   d. In the third column, indicate the incidence of nearsightedness: 1 = none, 2 = some, 3 = high (if the order indicated above is used, type 1, 1, 1, 2, 2, 2, 3, 3, 3).
   e. Use the Variable View tab at the bottom of the SPSS Data Editor window to name each of the three variables (e.g. count, sleep, eyesite).
   f. Save the file.
2. Steps to prepare the data for a clustered bar graph:
   a. The count variable needs to be designated as a weighting variable. For this we use the Data menu.
   b. Scroll down to the Weight Cases option. A window opens up. The window is shown below:
   c. Select the Weight cases by option.
   d. Select the variable count and move it into the Frequency Variable: box.
   e. Click OK.
3. Steps to create a Bar Graph of two categorical variables:
   a. To create a bar graph, we use the Graphs menu.
   b. Scroll down to the Bar option. A window opens up. The window is shown below:
   c. Select the Clustered option at the top and the Summaries for groups of cases option at the bottom of the window.
   d. Click Define. A window opens up. The window is shown below:
   e. Select the variable eyesite and move it into the Category Axis: box.
   f. Select the variable sleep and move it into the Define Clusters by: box.
   g. Under the Bars Represent heading, select the % of cases option.
   h. Click OK.

Note: SPSS creates clustered bar graphs in a slightly different way than the book presents them. To obtain the distribution of myopia level after sleeping in darkness, combine the three blue bars, one from each eyesite category. To obtain the distribution of myopia level after sleeping with a night light, combine the three green bars. To obtain the distribution of myopia level after sleeping in full light, combine the three tan bars.
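
Weighting by count makes each of the nine rows stand for that many children. The Python sketch below applies the same idea to the nine rows entered in step 1 and computes the distribution of myopia level within each sleep condition. It assumes that the bars in the graph show percentages within each sleep group, which is how the note above reads them; the function name distribution_within_sleep is made up for this example.

```python
# The nine rows entered in step 1, as (count, sleep, eyesite).
rows = [
    (155, 1, 1), (153, 2, 1), (34, 3, 1),
    (15, 1, 2), (72, 2, 2), (36, 3, 2),
    (2, 1, 3), (7, 2, 3), (5, 3, 3),
]
sleep_labels = {1: "Darkness", 2: "Night light", 3: "Full light"}
eyesite_labels = {1: "None", 2: "Some", 3: "High"}


def distribution_within_sleep(rows):
    """Return {(sleep, eyesite): percent of that sleep group's children}."""
    totals = {}
    for count, sleep, _ in rows:
        # Each row contributes its count, not 1: this is the weighting step.
        totals[sleep] = totals.get(sleep, 0) + count
    return {
        (sleep, eyesite): round(100 * count / totals[sleep], 1)
        for count, sleep, eyesite in rows
    }


distribution = distribution_within_sleep(rows)
for sleep, sleep_name in sleep_labels.items():
    parts = ", ".join(
        f"{name}: {distribution[(sleep, eyesite)]}%"
        for eyesite, name in eyesite_labels.items()
    )
    print(f"{sleep_name:<12}{parts}")
```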


The SPSS Output

Graph
[Clustered bar graph: Percent (y-axis, 0.0% to 100.0%) by eyesite (x-axis: None, Some, High), with one bar per sleep condition (Darkness, Night light, Full light) in each cluster. Cases weighted by count.]

2.4 Finding Information in Quantitative Data


Example 2.5 Right Handspans

As part of the survey described in Section 2.2, the students were asked to measure the span of their right hands (in cm). The data is recorded in the file pennstate1.sav.

1. Steps to prepare the data:
   a. Open the data set pennstate1.sav.
2. Steps to obtain the five-number summary:
   a. To obtain the five-number summary, we use the Analyze menu.
   b. Scroll down to the Descriptive Statistics submenu and select the Explore option. A window opens up. The window is shown below:
   c. Select the variable rtspan and move it into the Dependent List: box. To obtain separate summaries for females and males, as in the output below, also move the variable sex into the Factor List: box.
   d. Select the Statistics option under the Display heading on the bottom left of the window.
   e. Click on the Statistics button. A second window opens up. The window is shown below:
   f. Select both the Descriptives and the Percentiles options, as shown above. Click Continue.
   g. Click OK.


The SPSS Output (edited for length)

Explore

Descriptives

         sex                             Statistic   Std. Error
rtspan   Female   Mean                     20.0170       .17377
                  Median                   20.0000
                  Variance                   3.110
                  Std. Deviation           1.76357
                  Minimum                    12.50
                  Maximum                    23.25
                  Range                      10.75
                  Interquartile Range         2.00
         Male     Mean                     22.5575       .15627
                  Median                   22.5000
                  Variance                   2.125
                  Std. Deviation           1.45759
                  Minimum                    18.00
                  Maximum                    26.00
                  Range                       8.00
                  Interquartile Range         2.00

Percentiles

                                                      Percentiles
                                    sex        10      25      50      75      90
Weighted Average      rtspan        Female   18.00   19.00   20.00   21.00   22.00
(Definition 1)                      Male     21.00   21.50   22.50   23.50   24.50
Tukey's Hinges        rtspan        Female           19.00   20.00   21.00
                                    Male             21.75   22.50   23.50

The five-number summary may be obtained by combining the minimum and maximum in the first output box with the 25th, 50th, and 75th percentiles in the second output box. In this example the five-number summaries, using the Weighted Average percentiles, are:

          Males    Females
Min       18.0     12.5
Q1        21.5     19.0
Median    22.5     20.0
Q3        23.5     21.0
Max       26.0     23.25
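
The same summary can be sketched outside SPSS. The Python example below has two parts. The first computes a five-number summary from raw values with the standard library function statistics.quantiles and method="exclusive", which interpolates at position (n + 1)p; it is assumed here, not confirmed, that this corresponds to the Weighted Average (Definition 1) rule in the output. The sample values are hypothetical, invented for illustration. The second part takes the five-number summaries reported above and recovers the Range and Interquartile Range shown in the Descriptives box. The function names are made up for this example.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the (n + 1)p percentile rule."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="exclusive")
    return (ordered[0], q1, median, q3, ordered[-1])


def spread(summary):
    """Return the range and interquartile range of a five-number summary."""
    low, q1, _, q3, high = summary
    return {"range": high - low, "iqr": q3 - q1}


# Part 1: hypothetical handspans in cm, invented for illustration only.
sample = [18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0]
print(five_number_summary(sample))

# Part 2: five-number summaries reported in the text.
reported = {
    "Females": (12.5, 19.0, 20.0, 21.0, 23.25),
    "Males": (18.0, 21.5, 22.5, 23.5, 26.0),
}
for group, summary in reported.items():
    print(group, spread(summary))
```

For the reported summaries, the range and interquartile range agree with the Range and Interquartile Range rows of the Descriptives output.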
