Data Manipulation

ADVANCED DATA ANALYSIS
Submitted By:
Anitha Kumari
MBA 2nd year,
Section A, DMS, PU
Topic:
DATA MANIPULATION
DATA MANIPULATION
Using SPSS
Concatenating & Merging
Randomize Sample
Creating an ID variable
Data Restructuring
CONCATENATING SPSS FILES
Concatenating refers to placing two things on top of one another.
Concatenating data files basically involves adding cases to an existing data file.
This option works most cleanly when the two files have the same number of variables
and the same variable names.
From the menu in the Data Editor: Data -> Merge Files -> Add Cases
The cases to add can come from an open dataset or from an external SPSS data file.
You do have an option to rename variables by clicking the "Rename" button, but it
actually works better if the variables in the two files are the same prior to merging.
The option to "Indicate case source as variable" will indicate which data file your cases
came from.
If the data set to be merged has more variables, they will show up in the Unpaired
Variables box.

Number of observations = number of observations in file A + number of observations in file B
What about the data type for the variable Salary?
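The case-count rule can be illustrated outside SPSS with a minimal Python sketch. It is not SPSS syntax; it assumes each file is a list of cases, and the function name `add_cases`, the indicator name `source`, and the data values are hypothetical.

```python
def add_cases(file_a, file_b):
    """Stack file_b under file_a and tag each case with its source file."""
    # This sketch insists on identical variable names in both files.
    if set(file_a[0]) != set(file_b[0]):
        raise ValueError("variable names differ between the two files")
    # source = 0 marks cases from file A, source = 1 marks cases from file B.
    stacked = [dict(row, source=0) for row in file_a]
    stacked += [dict(row, source=1) for row in file_b]
    return stacked


# Hypothetical data; names and values are illustrative only.
file_a = [{"id": 1, "salary": 57000}, {"id": 2, "salary": 40200}]
file_b = [{"id": 3, "salary": 21450}]

combined = add_cases(file_a, file_b)
print(len(combined))  # 3 = 2 cases from file A + 1 case from file B
```

The `source` column plays the role of the "Indicate case source as variable" option.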
Merging Files: one-to-one
This example combines the working data file, EmployMain.sav, with an external file,
EmployBeginSalary.sav, that contains two variables: employee ID (id) and beginning
salary (salbegin).
From the menu in the Data Editor (with EmployMain.sav opened): Data -> Merge Files
-> Add Variables
The variable id has identical values to the variable with the same name in
EmployMain.sav, whereas the variable salbegin is unique to the external data file,
EmployBeginSalary.sav.
Variable names that appear in both datasets are listed in the box labeled Excluded
Variables (here, id).
At least one variable must be common to both files in order to perform a merge.
In this example, only the variable id appears in the Key Variables box because it is the
only variable that is in both data files.
Both data sets must be sorted by the same variable(s).
Resulting (merged) data set
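As a rough illustration of what the one-to-one merge produces, here is a minimal Python sketch (not SPSS syntax). The function name `add_variables` and all data values are hypothetical; only the variable names id and salbegin come from the example. Because it uses a dictionary lookup, this sketch does not need the files to be sorted, unlike the SPSS merge.

```python
def add_variables(main, external, key):
    """One-to-one merge: attach the external file's variables to each case."""
    lookup = {row[key]: row for row in external}
    # One-to-one means every key value occurs at most once in each file.
    if len(lookup) != len(external) or len({row[key] for row in main}) != len(main):
        raise ValueError("key values must be unique in both files")
    # Cases without a match keep only their own variables.
    return [{**row, **lookup.get(row[key], {})} for row in main]


# Hypothetical stand-ins for EmployMain.sav and EmployBeginSalary.sav.
employ_main = [{"id": 1, "salary": 57000}, {"id": 2, "salary": 40200}]
employ_begin = [{"id": 2, "salbegin": 18750}, {"id": 1, "salbegin": 27000}]

merged = add_variables(employ_main, employ_begin, key="id")
print(merged[0])  # {'id': 1, 'salary': 57000, 'salbegin': 27000}
```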
Merging Files: match on the basis of a particular variable
For example, you may want to add to your data file a data column containing the average
salary for a person's job category.
The dataset for this example, MeanSalary.sav, contains a variable for job category,
jobcat, and a variable representing the average salary for that job category,
meansalary.
Both files need to be sorted on the key variable, jobcat in this example.
Specify a variable on which the two files can be matched.
This assumes that the EmployMain.sav file is open in the Data Editor.
The non-active dataset is the external file.
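The table-lookup merge differs from the one-to-one merge in that many cases can share one key value. A minimal Python sketch (not SPSS syntax) of that idea follows; the function name `add_from_table` and the data values are hypothetical, while jobcat and meansalary are the variable names from the example.

```python
def add_from_table(cases, table, key):
    """Many-to-one merge: copy the table row for each case's key value."""
    # The lookup table must have one row per key value (one per job category).
    lookup = {row[key]: row for row in table}
    if len(lookup) != len(table):
        raise ValueError("key values must be unique in the lookup table")
    return [{**case, **lookup.get(case[key], {})} for case in cases]


# Hypothetical stand-ins for EmployMain.sav and MeanSalary.sav.
employees = [
    {"id": 1, "jobcat": 3},
    {"id": 2, "jobcat": 1},
    {"id": 3, "jobcat": 1},
]
mean_salary = [{"jobcat": 1, "meansalary": 27800}, {"jobcat": 3, "meansalary": 63900}]

with_means = add_from_table(employees, mean_salary, key="jobcat")
print(with_means[1]["meansalary"])  # 27800
```

Every employee in the same job category receives the same meansalary value, and the number of cases does not change.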
RANDOMLY SELECT THE SUBSET
From the menu in the Data Editor: Transform -> Random Number Generators.
This step tells SPSS to start at a random place in its table of random numbers.
When doing research involving random numbers (for example, when randomly
assigning cases to experimental treatment groups), you should explicitly set the random
number seed value if you want to be able to reproduce the same results.
From the menu in the Data Editor: Data -> Select Cases
Select "Random sample of cases".
For example, we could select 60% of cases for the model building, and 40% for the
model validation.
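The effect of a fixed seed on a 60/40 split can be sketched in Python (not SPSS syntax). The function name `split_sample` and the seed value are hypothetical. One assumption to note: this sketch draws exactly 60% of the cases, which may differ from how SPSS draws an approximate percentage.

```python
import random


def split_sample(cases, fraction, seed):
    """Randomly split cases into a building set and a validation set."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    n_build = round(len(cases) * fraction)
    chosen = set(rng.sample(range(len(cases)), n_build))
    build = [case for i, case in enumerate(cases) if i in chosen]
    validate = [case for i, case in enumerate(cases) if i not in chosen]
    return build, validate


cases = list(range(1, 11))  # ten hypothetical case numbers
build, validate = split_sample(cases, 0.6, seed=12345)
again, _ = split_sample(cases, 0.6, seed=12345)

print(len(build), len(validate))  # 6 4
print(build == again)  # True: the same seed selects the same cases
```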
CREATING AN ID VARIABLE
Often we need to construct an ID variable as the identifier for the observation.
The ID variable is very useful for any data merging, grouping, or stacking.
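The data file, smoking.sav, does not have an ID variable, only the case numbers.
From the menu in the Data Editor: Transform -> Compute
$casenum is a system variable that holds the sequence number of each case. Computing a
new variable equal to $casenum stores that number as the identifier for each case.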
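The idea of storing the case number as a permanent ID can be sketched in Python (not SPSS syntax). The function name `add_id` and the smoking data are hypothetical.

```python
def add_id(cases, name="id"):
    """Store each case's current position (starting at 1) as an ID variable."""
    return [{name: number, **case} for number, case in enumerate(cases, start=1)]


# Hypothetical stand-in for smoking.sav, which has no ID variable.
smoking = [
    {"smoker": "yes", "age": 34},
    {"smoker": "no", "age": 51},
    {"smoker": "no", "age": 27},
]

with_id = add_id(smoking)
# After sorting, the position changes but the stored ID stays with its case.
by_age = sorted(with_id, key=lambda case: case["age"])
print([case["id"] for case in by_age])  # [3, 1, 2]
```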
RESTRUCTURE DATA
In order to analyze the data, either in SAS or SPSS, each observation (not each subject)
must be on its own line.
In the following example, Anxiety 2.sav, we need to restructure the 4 treatments from
columns to rows.
We would like to turn the selected variables trial1 to trial4 into cases.
From the menu in the Data Editor: Data -> Restructure

Step 1
It is typical to sort your data by treatment in order to do the analysis for repeated
measurements.
Restructure Data
Step 2
The data set will be restructured from wide to long format.
Variables: trial1 to trial4
Cases: a new variable, score
Step 3
Step 3 (completed)
Step 4
Step 5
You can use the old variable names as the values of the new index variable.
The default name is Index.
Step 6
Step 7
Figure 30: Anxiety.sav
Figure 31: RestructureAnxiety.sav
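The wide-to-long restructuring can be sketched in Python (not SPSS syntax). The function name `vars_to_cases`, the variable names subject and trial, and the data values are hypothetical; trial1 to trial4 and score come from the example.

```python
def vars_to_cases(rows, id_var, value_vars, value_name="score", index_name="trial"):
    """Turn each listed variable of a wide row into its own case (long format)."""
    long_rows = []
    for row in rows:
        for index, var in enumerate(value_vars, start=1):
            # One output case per subject and trial; the index records which trial.
            long_rows.append({id_var: row[id_var], index_name: index, value_name: row[var]})
    return long_rows


# Hypothetical wide data: one row per subject, one column per trial.
wide = [
    {"subject": 1, "trial1": 18, "trial2": 14, "trial3": 12, "trial4": 6},
    {"subject": 2, "trial1": 19, "trial2": 12, "trial3": 8, "trial4": 4},
]

long_rows = vars_to_cases(wide, "subject", ["trial1", "trial2", "trial3", "trial4"])
print(len(long_rows))  # 8 = 2 subjects x 4 trials
print(long_rows[0])  # {'subject': 1, 'trial': 1, 'score': 18}
```

Each observation now sits on its own line, which is the layout required for the analysis.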
