
ADVANCED DATA ANALYSIS

Submitted By:
Anitha Kumari
MBA 2nd year,
Section A, DMS, PU

Topic:
DATA MANIPULATION

DATA MANIPULATION
Using SPSS
Concatenating & Merging
Randomize Sample
Creating an ID variable
Data Restructuring

CONCATENATING SPSS FILES

Concatenating refers to placing two things on top of one another.

Concatenating data files basically involves adding cases to an existing data file.

In order to use this option, the two files must be identical in number of variables and
variable names.

From the menu in the Data Editor: Data -> Merge Files -> Add Cases

Cases can be added from an open dataset or from an external SPSS data file.

You can rename variables by clicking the "Rename" button, but it works better if the variable names in the two files already match before merging.

The option "Indicate case source as variable" adds a variable showing which data file each case came from.

If the data set to be merged contains additional variables, they will show up in the Unpaired Variables box.


Number of observations in the combined file = number of observations in file A + number of observations in file B
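The Add Cases operation can be sketched outside SPSS as well. The following is a minimal Python illustration, not SPSS syntax: the two lists of dictionaries stand in for the two data files, the values are invented, and the names `add_cases` and `source` are hypothetical (`source` mimics the "Indicate case source as variable" option).

```python
# Hypothetical rows standing in for two SPSS files with identical variables.
file_a = [
    {"id": 1, "salary": 57000},
    {"id": 2, "salary": 40200},
]
file_b = [
    {"id": 3, "salary": 21450},
]


def add_cases(first, second):
    """Stack the rows of `second` under `first` and tag each row with its source."""
    # Concatenation requires the same variable names in both files.
    if first and second and set(first[0]) != set(second[0]):
        raise ValueError("both files must have the same variable names")
    combined = [dict(row, source=0) for row in first]   # 0 = first file
    combined += [dict(row, source=1) for row in second]  # 1 = second file
    return combined


combined = add_cases(file_a, file_b)
print(len(combined), "cases after concatenation")
```

The combined list has as many rows as the two inputs together, which is the row-count rule stated above.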

What about the data type for the variable Salary?

Merging Files: one-to-one

Goal: combine the working data file, EmployMain.sav, with an external file, EmployBeginSalary.sav, that contains two variables: employee ID (id) and beginning salary (salbegin).

From the menu in the Data Editor: Data -> Merge Files -> Add Variables (with EmployMain.sav opened)

The variable id has the same values as the variable of the same name in EmployMain.sav, whereas the variable salbegin is unique to the external data file, EmployBeginSalary.sav.

Variable names that appear in both datasets are listed in the box labeled Excluded Variables (here, id).

At least one variable must be common to both files in order to perform a merge.

In this example, only the variable id appears in the Key Variables box because it is the only variable that is in both data files.

Both data sets must be sorted by the same variable(s).
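The one-to-one merge can be sketched in Python to show why a common, sorted key matters. This is an illustration under stated assumptions, not what SPSS does internally: the rows are invented, `add_variables` is a hypothetical helper, and the sketch assumes both files contain exactly the same set of id values.

```python
# Hypothetical rows standing in for EmployMain.sav and EmployBeginSalary.sav.
employ_main = [
    {"id": 2, "jobcat": 1},
    {"id": 1, "jobcat": 3},
]
employ_begin_salary = [
    {"id": 1, "salbegin": 27000},
    {"id": 2, "salbegin": 18750},
]


def add_variables(main, external, key):
    """Merge two files one-to-one on `key`, sorting both by the key first."""
    main_sorted = sorted(main, key=lambda row: row[key])
    external_sorted = sorted(external, key=lambda row: row[key])
    if len(main_sorted) != len(external_sorted):
        raise ValueError("this sketch assumes the same cases in both files")
    merged = []
    for left, right in zip(main_sorted, external_sorted):
        if left[key] != right[key]:
            raise ValueError(f"key mismatch: {left[key]} vs {right[key]}")
        merged.append({**left, **right})  # add the external variables
    return merged


merged = add_variables(employ_main, employ_begin_salary, key="id")
for row in merged:
    print(row)
```

Sorting both inputs by the key is what lets the rows be paired position by position.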

Resulting (merged) data set

Merging Files: match on the basis of a particular variable

For example, you may want to add to your data file a data column containing the average
salary for a person's job category.

The dataset for this example, MeanSalary.sav, is shown below, containing a variable for
job category, jobcat, and a variable representing the average salary for that job category,
meansalary.

Both files need to be sorted on the key variable, jobcat in this example.

Specify a variable on which the two files can be matched.

This example assumes that the EmployMain.sav file is open in the Data Editor.

The non-active dataset is the external file.
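Matching on a key variable works like a table lookup: every case picks up the value stored for its job category. The Python sketch below illustrates the idea with invented values; the variable names jobcat and meansalary come from the example, while the salary figures and the `lookup` dictionary are assumptions made for illustration.

```python
# Hypothetical rows standing in for EmployMain.sav (the active dataset).
employ_main = [
    {"id": 1, "jobcat": 3},
    {"id": 2, "jobcat": 1},
    {"id": 3, "jobcat": 1},
]
# Hypothetical rows standing in for MeanSalary.sav (the external lookup table).
mean_salary = [
    {"jobcat": 1, "meansalary": 27800.0},
    {"jobcat": 3, "meansalary": 63900.0},
]

# Build a lookup from the key variable to the value to be added.
lookup = {row["jobcat"]: row["meansalary"] for row in mean_salary}

# Each case receives the mean salary of its own job category;
# a category missing from the table yields None (a missing value).
merged = [dict(row, meansalary=lookup.get(row["jobcat"])) for row in employ_main]

for row in merged:
    print(row)
```

Unlike the one-to-one merge, several cases can share the same key value here, so the number of cases in the active dataset does not change.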

RANDOMLY SELECT THE SUBSET

From the menu in the Data Editor: Transform -> Random Number Generator.

This step tells SPSS to start at a random place in its table of random numbers.

When doing research involving random numbers (for example, when randomly assigning cases to experimental treatment groups), you should explicitly set the random number seed value if you want to be able to reproduce the same results.

Data -> Select Cases

Select Random sample of cases

For example, we could select 60% of cases for the model building, and 40% for the
model validation.
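The effect of fixing the seed can be sketched in Python. This is an illustration only: it uses Python's own random number generator rather than the one in SPSS, the helper `split_cases` and the seed value are hypothetical, and the split here is exact, whereas a percentage sample in SPSS is approximate.

```python
import random


def split_cases(cases, fraction, seed):
    """Randomly split `cases` into two parts; the same seed gives the same split."""
    rng = random.Random(seed)      # explicit seed makes the result reproducible
    shuffled = list(cases)         # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


cases = list(range(1, 11))         # ten hypothetical case numbers
build, validate = split_cases(cases, 0.6, seed=12345)
print("model building:", sorted(build))
print("model validation:", sorted(validate))
```

Calling `split_cases` again with the same seed returns the same two groups, which is the reason for setting the seed explicitly.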

CREATING AN ID VARIABLE

Often we need to construct an ID variable as the identifier for the observation.

The ID variable is very useful for any data merging, grouping, or stacking.

The data file, smoking.sav, has no ID variable, only the case number.

From the menu in the Data Editor: Transform -> Compute

$casenum is a system variable that holds the sequence number of each case, so it can be used as the identifier.
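The same idea, numbering cases in their current order, can be sketched in Python. The rows below are invented stand-ins for smoking.sav, and the variable name `smoker` is hypothetical.

```python
# Hypothetical rows standing in for smoking.sav, which has no ID variable.
smoking = [
    {"smoker": "yes"},
    {"smoker": "no"},
    {"smoker": "no"},
]

# Number the cases from 1 in their current order, like a case number.
with_id = [dict(id=number, **row) for number, row in enumerate(smoking, start=1)]

for row in with_id:
    print(row)
```

Because the ID is taken from the current row order, it should be created before any sorting or merging that would rearrange the cases.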

RESTRUCTURE DATA

In order to analyze the data, either in SAS or SPSS, each observation (not each subject)
must be on its own line.

In the following example, Anxiety 2.sav, we need to restructure these 4 treatments from
the columns to rows.

We would like to turn the selected variables, trial1 to trial4, into cases.

From the menu in the Data Editor: Data -> Restructure


Step 1

It is typical to sort your data by treatment in order to do a repeated-measures analysis.

Restructure Data
Step 2

The data set will be restructured from wide to long.

Variables: trial1 to trial4

Cases: stored in a new variable, score

Step 3


Completed

Step 4

Step 5
You could use the old variable names as the values for this new variable.

The default name for this new variable is Index.

Step 6

Step 7

Figure 30: Anxiety.sav

Figure 31: RestructureAnxiety.sav
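The wide-to-long restructuring can be sketched in Python as follows. This is a minimal illustration, not the Restructure wizard itself: the scores are invented, and the helper `wide_to_long`, the identifier `subject`, and the lowercase name `index` are assumptions; only trial1 to trial4 and score come from the example.

```python
# Hypothetical wide-format rows: one row per subject, one column per trial.
anxiety = [
    {"subject": 1, "trial1": 18, "trial2": 14, "trial3": 12, "trial4": 6},
    {"subject": 2, "trial1": 19, "trial2": 12, "trial3": 8, "trial4": 4},
]
trial_vars = ["trial1", "trial2", "trial3", "trial4"]


def wide_to_long(rows, value_vars, value_name="score", index_name="index"):
    """Turn each listed column into its own row (one observation per line)."""
    long_rows = []
    for row in rows:
        # Variables that are not restructured are repeated on every new row.
        fixed = {k: v for k, v in row.items() if k not in value_vars}
        for position, var in enumerate(value_vars, start=1):
            long_rows.append({**fixed, index_name: position, value_name: row[var]})
    return long_rows


long_rows = wide_to_long(anxiety, trial_vars)
for row in long_rows:
    print(row)
```

Two subjects with four trials each produce eight rows, and the index records which trial each score came from.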
