
CHAPTER 2

PREPARING QUESTIONNAIRE DATA


What are data?
In surveys, data are numeric values representing responses to questions.
Data are also information used as a basis for drawing conclusions and for decision making.
It is assumed that there is too much information to digest and understand without tools such
as computers and statistics.
It is assumed that we will analyze data to simplify or reduce the information; we are then
better able to describe what is known.
Sources of data: We collect them ourselves (primary research) or look at data others have
collected in their research (secondary research).
The word "data" is plural. "Datum" is the singular form.
Variables
Each variable occupies one column.
Normally, variables are assigned to columns (left-to-right) in the order they appear in a
survey questionnaire.
In SPSSWIN, the columns are set up to provide a default format of 8.2 - eight characters wide
with two decimal positions. This default variable width can be changed.
While survey data are normally numeric, it is possible to have alphanumeric ("string")
variables (letters, characters) but these cannot be operated on mathematically.
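
To make the 8.2 default concrete, the short Python sketch below mimics that display format with an ordinary format specification. It only illustrates what "eight characters wide with two decimal positions" looks like; it is not SPSSWIN code, and the helper name `fmt_8_2` and the sample values are made up for the illustration.

```python
# Illustration only: mimic an "8.2" display format
# (8 characters wide, 2 decimal positions).

def fmt_8_2(value):
    """Return the value right-aligned in 8 characters with 2 decimals."""
    return format(value, "8.2f")

for v in [1, 23.5, 99]:          # made-up coded responses
    shown = fmt_8_2(v)
    print(repr(shown), len(shown))   # each string is 8 characters long
```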
Case
A unit of analysis about which a number of different pieces of information (variables) have
been gathered; basic unit of analysis for which measurements are obtained; e.g., an
individual, a TV show, a city, a county, etc. A questionnaire can also be thought of as a case.
One row in the SPSSWIN Data Editor (spreadsheet). Each case starts on a new row (line).
Every case contains the same variables in the same order (in the same columns).
All responses are converted to numeric codes (e.g., Male = 1, etc.).
Data file
The scores/values for every variable for one participant/questionnaire are called a case; a set
of cases makes up a data file.
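
The row-and-column layout described above can be sketched in Python as a list of rows. This is a minimal illustration, not the SPSSWIN file format; the variable names, the values, and the use of 9 for "missing" are assumptions chosen for the example.

```python
# Hypothetical three-variable data file: each row is one case,
# each column position is one variable, in questionnaire order.
variables = ["id", "gender", "q1"]   # assumed variable names

data_file = [
    [1, 1, 4],   # case 1
    [2, 2, 5],   # case 2
    [3, 1, 9],   # case 3 (9 = assumed code for "missing")
]

# Every case must contain the same variables in the same order.
assert all(len(case) == len(variables) for case in data_file)

def column(name):
    """Return all values of one variable (one column) across cases."""
    idx = variables.index(name)
    return [case[idx] for case in data_file]

print(column("gender"))  # [1, 2, 1]
```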
Coding data from surveys:
After the researcher has completed a survey (questionnaires have been completed by
respondents in mail surveys or by interviewers in telephone and face-to-face surveys), the
first step in data analysis is to code the questionnaires. This involves translating checkmarks
and other notations on the questionnaire to numeric values that can later be entered into a
data file. This process can be tedious but it is a critical step that must be done carefully to
avoid creating random errors.

NOTE: Chapter 8 provides specific examples of the data coding process.
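
As a rough sketch of what coding means, the Python below translates marked answers into numeric values using a small codebook. The codebook contents, the variable names, the function names, and the "missing" value of 9 are all assumptions made for this illustration; an actual codebook depends on the questionnaire.

```python
# Hypothetical codebook: maps the answer marked on the questionnaire
# to the numeric code that is later entered in the data file.
MISSING = 9  # assumed "missing" code

CODEBOOK = {
    "gender": {"Male": 1, "Female": 2},
    "q1": {"Strongly Agree": 1, "Agree": 2, "Neutral": 3,
           "Disagree": 4, "Strongly Disagree": 5},
}

def code_response(variable, answer):
    """Translate one marked answer to its numeric code.
    Unanswered (None) or unrecognized answers get the missing code."""
    return CODEBOOK[variable].get(answer, MISSING)

def code_questionnaire(qid, answers):
    """Return one case: the ID followed by the coded variables."""
    return [qid] + [code_response(v, answers.get(v)) for v in CODEBOOK]

case = code_questionnaire(101, {"gender": "Male", "q1": "Agree"})
print(case)  # [101, 1, 2]
```

Keeping the codes in one place makes it easier to apply them the same way to every questionnaire.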


There are several alternative approaches to coding data.
1.
The researcher can create a special coding form on which the numeric values from the
questionnaire are written. The coding forms are then used for data entry. One drawback of
this approach is the possibility of introducing errors while writing the data to the coding
form.
2.
The numeric value for each variable can be written on the questionnaire adjacent to the
survey question(s). This is the approach recommended in Chapter 8.
3.
Data coding can be done on an optical-scan sheet instead of a special coding form as in #1
above. The sheets are then scanned and a data file (ASCII) is created. This approach would
require the tedious process of "bubbling in" the appropriate value for each variable. This
approach is also limited because op-scan sheets typically have limited values (e.g., 1 - 5)
for each answer.
In some instances mail and other self-administered questionnaires are printed on optical
scan forms. Respondents mark their answers and return the questionnaire for scanning. This
speeds up the data processing, but general problems of optical scanning remain.
4.
Computer-assisted telephone interviewing (CATI) programs write the data associated with
respondents' answers directly to a data file, thus eliminating the steps of data coding and
data entry.
Special problems in coding survey data. NOTE: Many of these examples are
discussed in more depth in Chapter 8.
1.

Problem: Respondent gives multiple answers to a single-answer item.


What to do: Treat the item (variable) as if it were skipped. Assign a value for
"missing" (e.g., "9" or "99").

2.

Problem: Respondent does not answer a question (such missing data are also referred
to as "item nonresponse").
What to do: Assign a value for "missing" (e.g., "9" or "99").

3.

Problem: A question provides a list and requests the respondent to rank the top
three. Instead, the respondent gives several items "1's", "2's" and "3's".
What to do: Treat the items (variables) as if they were skipped. Assign a value for
"missing" (e.g., "9" or "99").

4.

Problem: A question provides a list and requests the respondent to rank the top
three. Instead, the respondent just checks (✓) some of the items in the list.

What to do: Treat the items (variables) as if they were skipped. Assign a value for
"missing" (e.g., "9" or "99").
5.

Problem: A question provides a list of eight items and requests the respondent to
rank the top three. Instead, the respondent ranks the entire list from "1" to "8".
What to do: Code the items with ranks "1", "2" and "3" with those values. All others
would receive a code for "missing", in this case a "9".

6.

Problem: A respondent writes his/her own answer to a fixed response item. [Fixed
response or closed end questions provide a limited set of specific responses from
which the respondent is to choose one (unless it is a "check all that apply" format).]
What to do: Treat the item (variable) as if it was skipped. Assign a value for "missing"
(e.g., "9" or "99").

7.

Problem: Respondent's handwriting is illegible. The number(s) or words cannot be
interpreted.
What to do: Treat the item (variable) as if it was skipped. Assign a value for "missing"
(e.g., "9" or "99").

8.

Problem: A hybrid fixed response question provides a set of answers plus an "other"
after which the respondent can write in an answer that is not included in the set of
responses. You find that a large number of respondents are writing in the same
answer. Left as is, these essentially identical write-in answers will be collapsed into
the generic "other."
What to do: Assign a code number to the answer and assign that code number
instead of the code number for "other."
Example (code numbers appear in parentheses):
Which of the following was the primary reason for your attending SC98?
Attend technical presentations (1)
Attend exhibits (2)
Meet people (3)
Meet people (4)
Make technical presentation (5)
Exhibit products (6)
Other: ________ (7)

Code of "9" reserved for "missing"

In this example, if a large number of respondents wrote in basically the same answer on the line
following "other", the value of "8" could be assigned to those responses instead of a "7".
9.

Problem: A question asks for a numeric answer (whole number; e.g., "How many
times do you attend a movie theatre in a typical month?") but the respondent gives a
range (e.g., "4 - 6").
What to do: You could either use the midpoint of the range or treat the item (variable)
as if it was skipped, assigning a value for "missing" (e.g., "9" or "99"). The midpoint
works better when the value range is limited (such as "4 - 6" and not "3 - 15"). Be
consistent in using one approach or the other.

10.

Problem: A question asks for a numeric answer (a whole number) but the respondent
provides a fraction or decimal.
What to do: Round up to the next whole number.
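
Several of the rules above can be expressed as small functions, which helps in applying them consistently. The Python below is a sketch under stated assumptions, not a prescribed procedure: all function names are hypothetical, "missing" is assumed to be 9 for one-digit items and 99 for counts, a plain check mark is represented as the string "x", items left unranked are coded as "missing", and a fractional midpoint is rounded up in line with rule 10.

```python
import math

MISSING = 9         # assumed "missing" code for one-digit items
MISSING_COUNT = 99  # assumed "missing" code for numeric (count) items

def code_single_answer(marked_codes):
    """Problems 1 and 2: exactly one answer may be marked.
    Zero or multiple marked answers are coded as missing."""
    return marked_codes[0] if len(marked_codes) == 1 else MISSING

def code_top_three(ranks):
    """Problems 3-5: `ranks` maps item -> rank written by the respondent
    (None if blank, "x" for a plain check mark).
    Ranks are kept only if 1, 2 and 3 each appear exactly once."""
    written = [r for r in ranks.values() if r is not None]
    valid = all(written.count(k) == 1 for k in (1, 2, 3))
    if not valid:
        # Repeated ranks or plain check marks: treat all items as skipped.
        return {item: MISSING for item in ranks}
    # Keep ranks 1-3; any other rank (4, 5, ...) or blank becomes missing.
    return {item: (r if r in (1, 2, 3) else MISSING)
            for item, r in ranks.items()}

def code_whole_number(text, use_midpoint=True):
    """Problems 9 and 10: a whole number was requested."""
    text = text.strip()
    if "-" in text:                      # a range such as "4 - 6"
        if not use_midpoint:
            return MISSING_COUNT
        low, high = (float(part) for part in text.split("-"))
        return math.ceil((low + high) / 2)
    return math.ceil(float(text))        # round fractions/decimals up

print(code_whole_number("4 - 6"))   # 5
print(code_whole_number("2.5"))     # 3
```

Whichever choice is made for ranges (midpoint or missing), fixing it once in a parameter such as `use_midpoint` is one way to stay consistent across all questionnaires.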

Survey Questionnaires: Issues relevant to data coding


1.

Assign each questionnaire an identification number (ID #). It can be the first variable
(first column in the data file).

2.

Provide space (e.g., wider left margin) in which to write the numeric values for the
variables.

3.

For multiple response items, provide clear directions (e.g., "Check all that apply").

4.

For ranking items, provide clear (and redundant) directions for proper ranking (i.e.,
#1, #2, #3).

5.

Golden rule of fixed response items: response sets should be mutually exclusive
and exhaustive. Using the hybrid ("other") form can help.

6.

Decide on a value for "missing" for every variable (e.g., "9", "99" or "999").

7.

Don't ask respondents to calculate what the computer can later calculate.

8.

Use codes consistently (e.g., "1" always used for "Strongly Agree").
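
Items 6 and 7 can be illustrated together with a short Python sketch: each variable has its own agreed "missing" value, and totals are computed later from the coded data rather than requested from the respondent. The variable names, the value 99, and the function `total_outings` are hypothetical and serve only as an example.

```python
# Hypothetical per-variable "missing" codes (item 6).
MISSING_CODES = {"movies_month": 99, "concerts_month": 99}

def total_outings(case):
    """Item 7: add the parts by computer instead of asking the
    respondent for a total. Returns None if any part is missing."""
    parts = []
    for var, missing in MISSING_CODES.items():
        value = case[var]
        if value == missing:
            return None          # do not add a missing code into a total
        parts.append(value)
    return sum(parts)

case = {"id": 1, "movies_month": 2, "concerts_month": 1}
print(total_outings(case))  # 3
```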
