Submitted By:
Anitha Kumari
MBA 2nd year,
Section A, DMS, PU
Topic:
DATA MANIPULATION
Using SPSS
Concatenating & Merging
Randomize Sample
Creating an ID variable
Data Restructuring
Concatenating data files basically involves adding cases to an existing data file.
In order to use this option, the two files must be identical in number of variables and
variable names.
From the menu in the Data Editor: Data -> Merge Files -> Add Cases
You do have an option to rename variables by clicking the "Rename" button, but it
actually works better if the variables in the two files are the same prior to merging.
The option to "Indicate case source as variable" will indicate which data file your cases
came from.
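Outside SPSS, the logic of Add Cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are invented, and the indicator name source01 with a 0/1 coding is a choice made for this sketch.

```python
# Sketch of "Add Cases": stack two files that share the same variable names.
# The rows below are made-up illustration data, not taken from any .sav file.
file_a = [{"id": 1, "salary": 57000}, {"id": 2, "salary": 40200}]
file_b = [{"id": 3, "salary": 21450}, {"id": 4, "salary": 21900}]


def add_cases(active, external):
    """Append the cases of `external` to `active`, tagging each case's source."""
    # Mirror the requirement in the text: both files must have the same variables.
    if set(active[0]) != set(external[0]):
        raise ValueError("variable names differ between the two files")
    merged = []
    for source, rows in ((0, active), (1, external)):
        for row in rows:
            # source01 (assumed name) plays the role of the case-source indicator.
            merged.append({**row, "source01": source})
    return merged


combined = add_cases(file_a, file_b)
for case in combined:
    print(case)
```

Each output case carries every original variable plus the indicator, so the origin of a case stays traceable after stacking.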
If you happen to have more variables in the data set to be merged, they will show up in
From the menu in the Data Editor: Data -> Merge Files (with EmployMain.sav opened)
The variable id has the same values as the variable of the same name in
EmployMain.sav, whereas the variable salbegin is unique to the external data file,
EmployBeginSalary.sav.
Variable names that appear in both datasets are listed in the box labeled Excluded
Variables (here, id).
At least one variable must be common to both files in order to perform a merge.
In this example, only the variable id appears in the Key Variables box because it is the
only variable present in both data files.
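The key-variable merge can also be pictured as a lookup by id. The following Python sketch is an assumption-laden illustration, not the SPSS procedure itself: the values are invented, and the function name merge_one_to_one is hypothetical.

```python
# Sketch of a one-to-one merge that adds variables, keyed on id.
# All values are invented for illustration.
employ_main = [
    {"id": 1, "jobcat": 3},
    {"id": 2, "jobcat": 1},
    {"id": 3, "jobcat": 1},
]
employ_begin_salary = [
    {"id": 1, "salbegin": 27000},
    {"id": 2, "salbegin": 18750},
    {"id": 3, "salbegin": 12000},
]


def merge_one_to_one(main_rows, extra_rows, key):
    """Add the variables of extra_rows to main_rows, matching cases on key."""
    extra_by_key = {row[key]: row for row in extra_rows}
    merged = []
    for row in main_rows:
        match = extra_by_key.get(row[key], {})
        # Variables from the main file win if a name appears in both files.
        merged.append({**match, **row})
    return merged


merged = merge_one_to_one(employ_main, employ_begin_salary, key="id")
print(merged[0])
```

A case without a match in the external file simply keeps its original variables in this sketch.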
For example, you may want to add to your data file a data column containing the average
salary for a person's job category.
The dataset for this example, MeanSalary.sav, is shown below, containing a variable for
job category, jobcat, and a variable representing the average salary for that job category,
meansalary.
Both files need to be sorted on the key variable, jobcat in this example.
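The reason for sorting on the key can be seen in a small Python sketch of a table-lookup merge, where the two sorted files are walked in step. The values are invented, and the code is only one way to picture the idea, not a description of how SPSS works internally.

```python
from operator import itemgetter

# Sketch of a table-lookup merge: each case picks up the mean salary
# of its job category. All values are invented for illustration.
employees = [
    {"id": 1, "jobcat": 3},
    {"id": 2, "jobcat": 1},
    {"id": 3, "jobcat": 1},
    {"id": 4, "jobcat": 2},
]
mean_salary = [
    {"jobcat": 2, "meansalary": 30900.0},
    {"jobcat": 1, "meansalary": 27800.0},
    {"jobcat": 3, "meansalary": 64000.0},
]

# Sort both files on the key variable first, as the text requires.
employees.sort(key=itemgetter("jobcat"))
mean_salary.sort(key=itemgetter("jobcat"))

# Walk the two sorted files in step; the lookup table has one row per jobcat.
lookup_pos = 0
for case in employees:
    while (lookup_pos < len(mean_salary)
           and mean_salary[lookup_pos]["jobcat"] < case["jobcat"]):
        lookup_pos += 1
    if (lookup_pos < len(mean_salary)
            and mean_salary[lookup_pos]["jobcat"] == case["jobcat"]):
        case["meansalary"] = mean_salary[lookup_pos]["meansalary"]
    else:
        case["meansalary"] = None  # no matching category in the lookup table

for case in employees:
    print(case)
```

Because both lists are ordered by jobcat, the lookup position only ever moves forward, which is why unsorted input would produce missed matches.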
From the menu in the Data Editor: Transform -> Random Number Generator.
This step tells SPSS to start at a random place in its table of random numbers.
When doing research involving random numbers (for example, when randomly
assigning cases to experimental treatment groups), you should explicitly set the random
number seed value if you want to be able to reproduce the same results.
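The effect of a fixed seed can be demonstrated in Python. This sketch assumes ten hypothetical cases and two group labels, and the seed value is arbitrary; the function name assign_groups is made up for the illustration.

```python
import random


def assign_groups(case_ids, groups, seed):
    """Randomly assign each case to a group; the same seed gives the same result."""
    rng = random.Random(seed)  # private generator, started from an explicit seed
    return {case_id: rng.choice(groups) for case_id in case_ids}


case_ids = list(range(1, 11))
groups = ["control", "treatment"]

first_run = assign_groups(case_ids, groups, seed=2000000)
second_run = assign_groups(case_ids, groups, seed=2000000)
print(first_run == second_run)  # True: the fixed seed reproduces the assignment
```

Running the function with a different seed would generally give a different assignment, which is the behaviour you get when the starting point is left random.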
For example, we could select 60% of cases for the model building, and 40% for the
model validation.
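A 60/40 split of this kind might look as follows in Python. The sketch assumes 100 hypothetical case numbers and draws exactly 60% of them; the names split_cases and build_fraction are invented for the example.

```python
import random


def split_cases(case_ids, build_fraction, seed):
    """Shuffle the case ids with a fixed seed, then cut them into two samples."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]


case_ids = range(1, 101)  # 100 hypothetical cases
building, validation = split_cases(case_ids, build_fraction=0.6, seed=12345)
print(len(building), len(validation))  # 60 40
```

Every case lands in exactly one of the two samples, so the validation set is never used to build the model.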
CREATING AN ID VARIABLE
The ID variable is very useful for any data merging, grouping, or stacking.
The data file, smoking.sav, does not have an ID variable, only the case number.
$casenum - a system variable that holds the sequence number of each case
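The idea of turning the case number into a stored ID can be sketched in Python. The rows and the variable names smoker and age are invented here and are not the contents of smoking.sav.

```python
# Sketch: create an id variable from each case's position in the file,
# the role $casenum plays in SPSS. The rows are invented, not from smoking.sav.
cases = [
    {"smoker": "yes", "age": 34},
    {"smoker": "no", "age": 51},
    {"smoker": "no", "age": 27},
]

# Case numbers start at 1. Storing the number as a real variable keeps it
# attached to the case even if the file is later sorted or filtered.
for case_number, case in enumerate(cases, start=1):
    case["id"] = case_number

cases.sort(key=lambda case: case["age"])
print([case["id"] for case in cases])  # [3, 1, 2]
```

After sorting, the position of each case has changed but its id has not, which is what makes the ID usable for merging, grouping, or stacking.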
RESTRUCTURE DATA
In order to analyze the data in either SAS or SPSS, each observation (not each subject)
must be on its own line.
In the following example, Anxiety 2.sav, we need to restructure the 4 treatments from
columns into rows.
We would like to turn the selected variables, trial1 through trial4, into cases.
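What this restructuring does to the data can be sketched in Python. The scores are invented, and the names subject, trial, and score are assumptions for the illustration; only trial1 to trial4 come from the example.

```python
# Sketch of restructuring "variables to cases" (wide to long).
# Scores are invented; only the names trial1..trial4 come from the text.
wide = [
    {"subject": 1, "trial1": 18, "trial2": 14, "trial3": 12, "trial4": 6},
    {"subject": 2, "trial1": 19, "trial2": 12, "trial3": 8, "trial4": 4},
]
trial_vars = ["trial1", "trial2", "trial3", "trial4"]

long = []
for row in wide:
    for index, name in enumerate(trial_vars, start=1):
        # One output case per (subject, trial); `trial` is the index variable.
        long.append({"subject": row["subject"], "trial": index, "score": row[name]})

print(len(long))  # 8: two subjects times four trials
for case in long[:4]:
    print(case)
```

Each subject now occupies four lines, one per observation, which is the layout the analysis requires.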
Restructure Data
Step 2
Step 3
Step 3 (completed)
Step 4
Step 5
You could use the old variable names as the values for this new variable.
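As a small illustration of that option, the Python sketch below stores the old variable name itself in the index variable. The single row and the names subject, trial, and score are hypothetical.

```python
# Sketch: use the old variable names as the values of the new index variable.
# The single row and the names subject/trial/score are hypothetical.
row = {"subject": 1, "trial1": 18, "trial2": 14, "trial3": 12, "trial4": 6}

long = [
    {"subject": row["subject"], "trial": name, "score": value}
    for name, value in row.items()
    if name.startswith("trial")
]
print([case["trial"] for case in long])  # ['trial1', 'trial2', 'trial3', 'trial4']
```

Keeping the names as values makes each long-format case self-describing, at the cost of a string index instead of a number.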
Step 6
Step 7
Figure 30: Anxiety.sav
Figure 31: RestructureAnxiety.sav