You are on page 1of 94

Introduction to SAS Essentials

Mastering SAS for Data Analytics


Alan Elliott and Wayne Woodward

1
Chapter 4: Preparing Data for Analysis

2
LEARNING OBJECTIVES
To be able to label variables with explanatory names
To be able to create new variables
To be able to use SAS IF-THEN-ELSE statements
To be able to use DROP and KEEP to select variables
To be able to use the SET statement
To be able to use PROC SORT
To be able to append and merge data sets
To be able to use PROC FORMAT
Going Deeper: To be able to find first and last values in a
group

3
The information in this chaper is about the DATA
Step
RECALL typical SAS program flow:

DATA Step defines the data set. All of the


information in this
chapter involve the
DATA Step

PROCS are procedures that do


statistical procedures

4
4.1 LABELING VARIABLES WITH EXPLANATORY
NAMES
SAS labels are used to provide descriptive names for
variables.
The LABEL statement uses the format:

LABEL VAR1= ' Label for VAR1 '


VAR2= 'Label for VAR2';

Use either single or double quotation marks in the LABEL


statement, but you must match the type within each
definition statement.
When SAS prints out information for VAR1, it also
includes the label, making the output more readable.
5
Labeling Variables
The following program (DLABEL. SAS): illustrates the use of
labels.
DATA MYDATA;
INFILE 'C:\SASDATA\BPDATA.DAT'; * READ DATA FROM FILE;
INPUT ID $ 1 SBP 2-4 DBP 5-7 GENDER $ 8 AGE 9-10 WT 11-13;
LABEL ID = 'Identification Number'
SBP= 'Systolic Blood Pressure'
DBP = 'Diastolic Blood Pressure'
AGE = 'Age on Jan 1, 2000'
WT = 'Weight' ;
PROC MEANS; VAR SBP DBP AGE WT;
RUN;

Notice that the LABEL statement is placed within the DATA step.

6
Hands On Exercise p 77
Output without labels:

And with labels:

7
CREATING VARIABLES IN THE SAS DATA STEP
Arithmetic Operators (Table 4.4) page 79.

8
Order of Operations
You may remember the following mnemonic from a math
class that can help you remember the order of
operations: Please excuse my dear Aunt Sally.

Parentheses Exponents Multiplication Division Addition


Subtraction

9
EXERCISE - CREATING A NEW VARIABLE
Open the file named DCALC.SAS, (p 79) create a new
variable by calculation:

DATA ROOMSIZE; Note the creation of a new variable


INPUT ROOM $ W L; named AREA by calculation
AREA=W*L;
Label L="Length" W="Width" AREA="Sq. Feet";
DATALINES;
LIVING 14 22
DINING 14 12
BREAKFAST 10 12
etcetera
;
RUN;
PROC PRINT Label; SUM AREA;
RUN;

10
RUN THE CODE AND OBSERVE RESULTS

Note that the resulting SAS


data set (named ROOMSIZE)
contains a column labeled
Sq. Feet
This column is the calculated
AREA variable, and it is
labeled as Sq. Feet because
of the Label statement:
Label AREA="Sq. Feet";

11
NOTE THE PLACEMENT OF CALCULATION

Note how the AREA calculation is within the


DATA ROOMSIZE; DATA step, after W and L have been
defined, and before the DATALINES
INPUT ROOM $ W L;
statement.
AREA=W*L;
Label L="Length" W="Width" AREA="Sq. Feet";
DATALINES;
LIVING 14 22
DINING 14 12
BREAKFAST 10 12
etcetera

12
ADD TO THE EXERCISE
Suppose ceilings are 8 feet tall, and you want to calculate
the total living area volume (cubic feet for air
conditioning purposes).
Add the calculation
VOL= W*L*8;
To the code. Also add a label to display VOL as Volume
and add VOL to the SUM Statement.
Rerun and observe the results.

DO THE EXERCISE

13
Your new code should look something like this:

Note new
Volume column
Results

14
Creating New Variables as Constant Values
Type in this program and run it:

DATA PI;
INPUT RADIUS;
PI=3.1415927;
NOTE that the value of PI
AREA=PI*RADIUS**2; is a constant used in
CIRCUM=2*PI*RADIUS; subsequent calculations.
DATALINES;
10
100
1000
;
PROC PRINT;RUN;

When you look at the output, notice that PI is a variable in the data set.

15
4.3 USING IF-THEN-ELSE CONDITIONAL STATEMENT
ASSIGNMENTS
Another way to create a new variable in the DATA step is
to use the IF-THEN- ELSE conditional statement construct
Format is:
IF expression THEN statement; ELSE statement;
Thus
IF SBP GE 140 THEN HIGHBP=1; ELSE HIGHBP=0;

Creates a variable named HIGHBP with the values


1 or 0.

16
Comparison Operators

IF SBP GE 140 THEN HIGHBP=1; ELSE HIGHBP=0;


A comparison operator tells SAS how to evaluate a
condition in this care Greater Than or Equal To.

17
Logical Operators

IF AGE GT 19 AND GENDER=M then


GROUP=1; Use either the Mnemonic (OR)
or the symbol for OR, |
or
IF TREATMENT EQ A | GROUP=2 THEN
CATEGORY=GREEN;
18
A more complex use of operators
IF TRT=A THEN GROUP=1;
ELSE IF TRT=B OR TRT=C THEN GROUP=2;
ELSE GROUP=3;

Uses an ELSEIF Clause

19
Do Hands-On Exercise p 83
Open the file DCONDITION.SAS, and run the program.

Note the IF
statement

20
RESULTS

Notice the STATUS column

21
ADD A NEW IF STATEMENT
IF AGE GT 65 THEN GROUP="SENIOR";
ELSE IF AGE GE 18 and AGE LE 65
THEN GROUP="ADULT";
ELSE GROUP="YOUTH";
This code creates a new
variable named GROUP
Note the use of an according to the IF ELSE
ELSEIF Clause IF and ELSE statement.

Exercise: Add this code to the DCONDITION.SAS code.

DO THE EXERCISE

22
RESULTS
Notice the new
column

23
Using IF to Assign Missing Values
Be Careful: Data sets often contain missing data codes to
record when data are missing. For example for the variable age
you might assign an impossible value, say -9, as a missing value
code. Then

IF AGE EQ -9 then AGE = . ;


IF GENDER ="X" then GENDER="";

In your DATA Step assigns the SAS missing value code . (dot) to
AGE when the value is -9. You MUST do this for SAS to know how
to handle missing values in statistical procedures. For character
variables a missing value is a blank.

24
In this exampleWhat could go wrong?
IF AGE GT 12 AND AGE LT 20 THEN TEEN=1;
ELSE TEEN = 0;

Suppose this is your data


ID AGE WHAT VALUE FOR TEEN?

001 12
002 20
003 19
004 .

25
A better way
IF AGE GT 12 AND AGE LT 20 THEN TEEN=1;ELSE TEEN = 0;
IF AGE = . THEN TEEN = .;

Guarantees that if AGE is already


missing, TEEN will also be coded
as missing.

26
Do Hands-On p 84
Uses another method to create TEEN

IF AGE=. Then TEEN=.;


ELSE IF AGE GT 12 and AGE LT 20 then TEEN=1;
ELSE TEEN=0;

27
Using IF and IF-THEN To Subset Data Sets
Data sets can be quite large. You may have a data set that
contains some group of subjects (records) that you want
to eliminate from your analysis. In that case, you can
subset the data so it will contain only those records you
need.
One method of eliminating certain records from a data
set is to use a subsetting IF statement in the DATA step.
The syntax for this statement is as follows:

IF expression;

28
Subsetting IF
For example, to select records containing the (character)
value F (only females) from a data set, you could use this
statement within a DATA step:

IF GENDER EQ 'F';

Note that you can use single or double quotation


marks("F" or 'F' ) in this statement.

29
Subsetting with IF DELETE
The opposite effect can be created by including the
statement THEN DELETE at the end of the statement:

IF expression THEN DELETE;

For example, to get rid of certain records (all males) in a


data set, you could use the code

IF GENDER EQ 'M' THEN DELETE;

Do Hands-on Example p 86
30
Hands on Example p 86 (DSUBSET1.SAS)

Note the subsetting IF


statement

EXERCISE using the structure IF expression THEN DELETE;


Change the IF statement to delete data where GP=A --- PAUSE --- do this
example, then come back to the tutorial.

31
Answer
Using the statement

IF GP = "A" then DELETE;

You get these results. Notice no


group A subjects in the listing

32
Using IF-THEN and DO for Program Control
Another use of the IF statement is to control the flow of
your SAS program in conjunction with a DO statement. In
this case, you can cause a group of SAS commands to be
conditionally executed by using the following type of
code:

IF expression THEN DO;


SAS Code to conditionally execute;
END;

33
Example of Using IF THEN - DO
Suppose you want to calculate BMI (Body Mass Index) for
subjects in a data set, but the formula is only relevant for
subjects older than 19 years. Plus, at the same time you
want to assign other values for this same group of
subjects. You could use the code:

IF AGE GT 19 THEN DO;


BMI=(WTLBS/HTINCH**2)*703; The code within
DO to END is
ISADULT=1; performed only
if AGE GT 10.
INCLUDEINSTUDY=Yes;
END;
You MUST end the DO phrase with an END;
34
Using @ and IF to Conditionally Read Lines in a Data
Set
For big data sets, it is often the case that you don't want to
read in all of the data. One method you could use to
conditionally read in certain records is to set up a test
condition and read in the record only if it meets that
condition.
To do this, you can use the @ (at) sign in your input
statement:
Notice the use of the @ here stops
input so you can use an IF statement
INPUT GP $ 5 AGE 6-9 @ ;
IF GP EQ "A" and AGE GE 10 THEN
INPUT ID $ 1 -3 GP $ 5
AGE 6-9 TIME1 10-14 TIME2 15-19;

35
4.4 USING DROP AND KEEP TO SELECT VARIABLES
The DROP and KEEP statements in the DATA step allow you to
specify which variables to retain in a data set:

DROP variables;
KEEP variables;

For example,
DATA MYDATA;
INPUT A B C D E F G;
DROP E F;
DATALINES;
.. . etc ...
Do Hands on Example p 88

36
Do Hands on Example p 88
Open the program file DKEEP.SAS.

DATA MYDATA;
INFILE 'C:\SASDATA\EXAMPLE.CSV' DLM=','
FIRSTOBS=2 OBS=26;
INPUT GROUP $ AGE TIME1 TIME2 TIME3
TIME4 SOCIO;
KEEP AGE TIME1 SOCIO;
;
PROC PRINT; Notice the KEEP Statement
RUN;

37
Run the code, and observe the results
Notice, only the variables in the KEEP statement are in
the resulting SAS data file.

Exercise:
Change the KEEP statement to

DROP GROUP AGE SOCIO;

Rerun the program and observe the


results Pause here

38
Results of the DROP statement:

39
Extra: Using DROP, KEEP, and RENAME in the DATA
statement
Make this change in the program. Take out the DROP
statement and modify the DATA statement:
Notice how you can RENAME, DROP, or KEEP
within the DATA statement. Note that this
version of the DROP statement uses DROP=

DATA MYDATA (RENAME=(SOCIO=STATUS) DROP=TIME2-


TIME4);
INFILE 'C:\SASDATA\EXAMPLE.CSV' DLM=','
FIRSTOBS=2 OBS=26;
INPUT GROUP $ AGE TIME1 TIME2 TIME3
TIME4 SOCIO;
PROC PRINT; Exercise:
RUN; Pause, enter this code, and return when
youve run the new code.
40
Results of the RENAME and DROP in the DATA
Statement

Notice how SOCIO was renamed


STATUS.

41
4.5 USING THE SET STATEMENT TO READ AN
EXISTING DATA SET
Suppose you have a big data set you want to use
modified. Dont modify your ORIGINAL data set modify
a copy.

Original Data set Modified Copy

42
Another way to ENTER data
Suppose you already have a data set named OLD. You can
make a copy using

DATA NEW; SET OLD;


Or
DATA NEW; SET C:\SASDATA\OLD;

Now the NEW data set is identical to OLD. You can now
modify NEW without changing the original data set.

43
Creating a Data Set from an Existing Data Set

44
Using SET - Example 1
Suppose you have a data set named ALL. You want to
create two subsets, FEMALE and MALE.

DATA MALES; SET ALL; Creates a data set with only Males
IF GENDER ='M';
RUN;
DATA FEMALES; SET ALL;
IF GENDER =F'; Creates a data set with only Females
RUN;

Now you have three data sets to work withALL, MALES


and FEMALES.

45
Using SET - Example 2
You receive a data set from the government, and you
need to modify it before using it:
This is the original data set
This is the new (copied) data set

DATA MYSASLIB.STUDY; SET MYSASLIB.GOV;


IF AGE =-9 then AGE=.;
IF SBP=-99 then SBP=.; * A NUMBER OF RECODES;
BMI = WT /( HT **2) * 703; * A CALCULATION;
*Etc;

Thus youre manipulating a copy of the original


data and not changing the original file.

46
Do Hands-On Exercise p 90
Using a subsetting IF statement
DSUBSET3.SAS

47
4.6 USING PROC SORT
The SORT procedure can be used in the DATA step to
rearrange the observations in a SAS data set or create a
new SAS data set containing the rearranged observations.
The Sorting Sequence is shown in the table:
Sorting sequence information for SAS data sets
Character
variables blank!"#$%&'()*+,-./0123456789:;<=>?@
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnop
qrstuvwxyz(|)~

Numeric
variables: Missing values first, then numeric values

Default
Ascending (or indicate Descending)
48
Syntax for PROC SORT
The syntax for PROC SORT is:

PROC SORT <options>; BY variable(s);

Common options for PROC SORT include:


DATA=datasetname;
OUT= outputdatasetname;

49
Example of PROC SORT
PROC SORT
DATA=MYDATA OUT=MYSORT;
BY RECTIME; OPTIONS Specifies that
a new resulting data set
be created named
Requires a BY variable. MYSORT

This example sorts the MYDATA data set by RECTIME and puts
the resulting data set into a new dataset named MYSORT. The
original data set is NOT CHANGED.
Do Hands-On Examples p 92 & 93. (DSORT1.SAS, DSORT2.SAS)

50
RESULTS:
Do Hands-On Example p 92
Open the file DSORT1.SAS

DATA MYDATA;
INPUT GROUP RECTIME;
DATALINES;
1 4.2
2 3.6
2 3.1
1 2.1
Note create a new Note SORT BY
1 2.8 data set named S1 RECTIME
2 1.5 Note sorted
1 1.8 column.
;
PROC SORT DATA=MYDATA OUT=S1; BY RECTIME;
Title 'Sorting Example - Ascending';
PROC PRINT DATA=S1;
RUN;

51
Exercise for PROC SORT
Change the PROC SORT statement to sort DESCENDING,
and save the results to a SAS data file named WORK.S2.

PROC SORT DATA=MYDATA OUT=S2;


BY DESCENDING RECTIME;

(Change the PROC PRINT so it will print the new data set.)

PAUSE:
Run this new code, then come back to the tutorial.

52
Results from PROC SORT
Note that now RECTIME
is descending.

53
Example p 93
Open DSORT2.SAS. Note the KEEP= and RENAME= statements.
This is similar to what we recently learned for DROP, KEEP, and
RENAME in the DATA Statement. Run this code.

PROC SORT DATA="C:\SASDATA\SOMEDATA"


OUT=ANALYSIS
(KEEP=ID GP AGE GENDER
RENAME=( ID=SUBJECT AGE=DXAGE));
BY ID;
RUN;
TITLE "Example of KEEP and RENAME in SORT";
PROC PRINT DATA=ANALYSIS;RUN;

54
Compare Before and After SORT and KEEP
Data Set Before SORT and KEEP

Data Set after SORT and KEEP


Note how ID is now SUBJECT
and AGE is now DXAGE.

55
4.7 APPENDING AND MERGING DATA SETS
Appending adds new records to an existing data set. (It is
sometimes called a vertical merge.)
OLD1
APPEND adds records to
the data set
OLD2

Merging adds variables to a data set through the use of a key


identifier that is present in both data sets (usually an
identification code.) (It is sometimes called a horizontal
merge.)
MERGE adds variables to the data set

OLD1 OLD2

56
APPENDING DATA SETS
Appending is accomplished by including multiple data set
names in the SET statement. For example,

DATA NEW; SET OLD1 OLD2;

NOTE: You can append many files:

DATA NEW; SET OLD1 OLD2 OLD3 OLD4 etc;

57
HANDS-ON EXAMPLE P 95
Open the file named DAPPEND1.SAS
The goal is to append data set
OLD1 with OLD2 using the
DATA step.

After the RUN statement, add


these statements

DATA NEW;SET OLD1 OLD2;


PROC PRINT DATA=NEW;RUN;

PAUSE the tutorial Enter


this new code and return

58
RESULTS OF APPENDING

The OLD2 data set is


added to OLD1
resulting in more
records for the NEW
data set

Note if variables in the


data sets dont match,
the resulting data set
will contain missing
values where variables
dont match.
59
Merging Data Sets by a Key Identifier
What is a Key Identifier?
Usually a variables at is an ID number, Subject Number, Patient
Number, etc
Must be unique for each person
There does not have to be the same number in each data set
to merge
Unmatching IDs will result in missing values
First, well look at a one-to-one match

60
Two Steps to a Merge: Sort, then Merge
The technique for merging the data sets using some key
identifier (such as patient ID) is as follows:

1. Sort each data set by the key identifier.


2. Within a DATA step, use the MERGE statement
along with a BY statement to merge
SORT each data set
Example: BY the key identifier
PROC SORT DATA=PRE; BY CASE;
PROC SORT DATA=POST; BY CASE;
DATA PREPOST; Then perform the
merge by the SAME
MERGE PRE POST; BY CASE;
key identifier

61
HANDS-ON EXAMPLE P 97
Open the file DMERGE1.SAS In this example, complete the
2nd PROC SORT by adding BY
CASE. Then do the MERGE.

DATA PREPOST;
MERGE PRE POST;BY CASE;
DIFF=POSTREAT-PRETREAT;
PROC PRINT DATA=PREPOST;RUN;

PAUSE the tutorial Enter


this new code and return

62
RESULTS

Note how the new


variable DIFF was
calculated during the
MERGE

63
Rename While Merging
As with the SORT statement, you can RENAME, DROP, and
KEEP variables during the MERGE. You can also merge
many files at a time. The following shows the syntax for
merging four data sets and performing a RENAME, DROP,
and KEEP on the third data set Note that the RENAME and DROP
DATA newdataset; are occurring in the DATA3 data
set in this example
MERGE datal data2
data3 (RENAME=(oldname=newname)
DROP=variables or KEEP=variables))
data4;
BY keyvar;RUN;

64
Hands On Example page 97 (DMERGE1.SAS)

DATA PREPOST;
MERGE PRE POST; BY CASE;
DIFF=POSTREAT - POSTTREAT;
TITLE 'Merge Example';
PROC PRINT DATA=PREPOST;
RUN; MERGE the data sets
PRE and POST, and
calculate DIFF.

RENAME the variable PRETREAT to BASELINE during the merge (part 4.)

65
Hands On Example continued
To RENAME a variable during the MERGE, use this code:

DATA PREPOST;
MERGE PRE (RENAME=(PRETREAT=BASELINE)) POST;
BY CASE;
DIFF=POSTREAT - BASELINE;
TITLE 'Merge Example';
PROC PRINT DATA=PREPOST; Note the RENAME of the
variable PRETREAT. Also
RUN; make sure you change the
variable name in the
calculation of DIFF.

66
Few-To-Many-Merge
A Few-To-Many merge is used when you have records in
one data set that you want to merge into some table that
contains (typically) a smaller number of categories.
Suppose you own an auto parts store. You sell products to
several kinds of buyers and each get a particular
discount.
You want to produce a report that shows the amount of
actual sales price for a number of purchases.

67
Data for Few-To-Many Merge (Hands-On p 99)

This is the
MANY data set.

This is the FEW


data set

68
How to set up the few-to many (match) merge
Define the Discounts (FEW) data set:
Repair Shops: 33% Discount
CONSUMERS 0% Discount
Other Auto Stores 40% Discount
Define the TYPE data set (The FEW)

DATA TYPE;
Note here that because you use a
FORMAT BUYERTYPE $8.; FORMAT statement to specify the
INPUT BUYERTYPE DISCOUNT; format of BUYERTYPE, you dont
DATALINES; have to indicate type in the INPUT
REPAIR .33 statement. Otherwise, that
CONSUMER 0 statement would have to be
STORE .40
INPUT BUYERTYPE $ DISCOUNT;
;

69
Define the MANY data set Note FORMAT
DATA SALES; Statement

FORMAT ITEM $20. BUYERTYPE $8.;


INPUT ITEM BUYERTYPE PRICE;
DATALINES; This is the MANY data
CARBCLEANER REPAIR 2.30 set in real life this may
be thousands of
BELT CONSUMER 6.99 transactions
MOTOROIL CONSUMER 14.34
CHAIN STORE 18.99
SPARKPLUGS REPAIR 28.99
CLEANER CONSUMER 1.99
WRENCH STORE 18.88
;
70
Prepare the two data sets for the merge
PROC SORT DATA=SALES; BY BUYERTYPE;
PROC SORT DATA=TYPE; BY BUYERTYPE;
And do the merge

DATA REPORT;
MERGE SALES TYPE; BY BUYERTYPE;
FINAL =ROUND(PRICE*(1-DISCOUNT),.01);
RUN;
PROC PRINT DATA=REPORT;RUN; * GET REPORT;

71
Few-to-Many Merge Results

Note: Final price


reflects the
proper discount

72
4.8 USING PROC FORMAT
The PROC FORMAT procedure allows you to create your
own custom formats.
These custom formats allow you to specify the
information that will be displayed for selected values of a
variable.
For example, suppose youve coded DISEASED and NOT
DISEASED as 0 and 1. You can create a format where 0
means DISEASED and 1 means NOT DISEASED so when
output is displayed the words instead of the number
codes appear.

73
Using PROC FORMAT
The steps for using formatted values are
1. Create a FORMAT definition using PROC FORMAT.
2. Apply the FORMAT to one or more variables. You can
apply a format (once it is defined in PROC FORMAT in a
DATA step or in a data analysis PROC statement.
3. For example: Choose any name for the format (similar
restrictions as for SAS variables.) We name them as
FMTsomething to make the name obvious.
PROC FORMAT;
VALUE FMTMARRIED 0 = "No" l = "Yes";
RUN;

74
Numeric and Character Formats
Define a format for a numeric
PROC FORMAT; variable.
VALUE fmtname1
number1=name1 For a character variable,
number2=name2 the format name must
start with a $, and the
etc; textnames must be in
quotes.
VALUE $fmtname2
textname1=name1
textname2=name2
etc;
RUN;

75
Example Numeric and Character Definitions
PROC FORMAT; Numeric format defined.
VALUE FMTMARRIED 0="No"
1="Yes";
VALUE $FMTGENDER M=Male
F=Female;
RUN;
Character format defined take
note of format name
$FMTGENDER and the values M
and F are in quotes

76
Ways to specify formats
Formats may also use ranges. For example, suppose that you
want to classify your AGE data using the designations Child,
Teen, Adult, and Senior. You could do this with the following
format:
Note different ways to indicate ranges.
PROC FORMAT;
Value FMTAGE LOW- 12 = 'Child'
13,14,15,16,17,18,19 = 'TEEN'
20 - 59 = 'Adult'
60 - HIGH = 'Senior';
RUN;

Do Hands On Exercise p 102 (DFORMAT1.SAS)

77
HANDS ON EXERCISE P 102
Open the file DFORMAT1.SAS Create the FORMAT

Use the FORMAT

Exercise Add a format for GENDER ($FMTGENDER) where


M=Male and F=FEMALE and add the format to the
FORMAT statement in PROC PRINT.
PAUSE Return when youve completed this exercise.
78
RESULTS

Results in this output

79
Assigning Formats to Many Variables
You can also assign the same format to several variables.
If you have questionnaire data with variables names Q1,
QS, Q7 where each question is coded as 0 and 1 for
answers Yes and No, respectively, and you have a format
called FMTYN, you could use that FORMAT in a procedure
as in the following example:

PROC PRINT;
FORMAT Q1 Q5 Q7 FMTYN. ;
RUN; Assigns the same format (FMTYN) to three
variables. Note the dot at the end of the
assigned format (REQUIRED)

80
Format Assignments (Data Set vs PROC)
Assign formats to variables within PROC STATEMENTS
Example: Assigning a FORMAT in a PROC makes the format
assignment only within that PROC (temporary)
PROC PRINT;
FORMAT GENDER $FMTGENDER. ;
RUN; Assigning a FORMAT in a DATA
Or in DATA statements statement makes the format
permanent in that data set.
DATA MYDATA;SET OLDDATA;
FORMAT GENDER $FMTGENDER. ;
RUN;

81
Creating Permanent Formats
In all the previous examples, formats were applied in a
PROC step and are considered temporary formats.
When you assign a format in a DATA step, you can also
store those formats in a (permanent) format catalog.
For example, to store an SAS format in a specified
permanent library location, you could use code such as
Creates a FORMAT LIBRARY

PROC FORMAT LIBRARY= MYSASLIB<.name>;

In this case, the MYSASLIB refers to an SAS library location


you have previously created.
82
FORMAT Libraries
For example, the code

PROC FORMAT LIBRARY = MYSASLIB;


VALUE FMTMARRIED 0="No" 1="Yes";
VALUE $FMTGENDER "F"="Female" "M"="Male";
RUN;

creates two subfolders in the MYSASLIB.FORMATS Formats


Library folder named FMTMARRIED and $FMTGENDER.

83
View Formats Folder
That is, when you create an SAS format catalog, a folder
icon appears in the designated SAS Library.
In this case, it is named FORMATS and appears in the
MYSASLIB library. You can verify its existence by
examining the MYSASLIB library using SAS Explorer.
If you double click on the FORMATS folder, you will see
sub folders named with the names of the formats you
have created.

84
Contents of a Format Folder
Click on the FMTMARRIED Formats folder to see its
contents the definition of the format:

85
Tell SAS About Your Formats
Once you have created permanent formats, you can use
them in both PROC and DATA step statements. To tell SAS
the location of a particular format, use the statement

OPTIONS FMTSEARCH=(proclib);

where PROCLIB is the name of the SAS Library where your


formats folder is located.

86
Using Stored SAS Formats
For example, if you have previously created and stored
the FMTMARRIED and $FMTGENDER formats in your
MYSASLIB. FORMATS folder, you could use the following
code to access those formats with PROC PRINT (or any
PROC.) Tells SAS where the formats are located

OPTIONS FMTSEARCH=(MYSASLIB.FORMATS);
PROC PRINT DATA="C:\SASDATA\SURVEY";
VAR SUBJECT MARRIED GENDER;
FORMAT MARRIED FMTMARRIED.
GENDER $FMTGENDER.;
RUN;
87
Discovering SAS Formats
To discover what formats are in a particular format
library, you can use the PROC CATALOG procedure as
shown here. This code displays all of the formats stored in
the MYSASLIB. FORMATS library.

PROC CATALOG CATALOG = MYSASLIB.FORMATS;


CONTENTS;
RUN;
QUIT;

88
Hands On Example p 106 (DFORMAT3.SAS)

LIBNAME MYSASLIB "C:\SASDATA";


PROC FORMAT LIBRARY = MYSASLIB.CAT1;
VALUE FMTMARRIED 0="No"
1="Yes";
VALUE $FMTGENDER "F" = "Female"
"M" = "Male";
RUN;

89
When Your SAS Format Library is Missing
Suppose you have a . SASB7DAT file that uses created formats, but
you do not have the format library? If you attempt to use that data
set, you will get the following error message in the log.

ERROR: The format FMTMARRIED was not found or


could not be loaded.

If this occurs, you must use the following OPTIONS statement (above
the code where you refer to the data set) to tell SAS to access the
data set, or run the procedure without using the defined formats:

OPTIONS NOFMTERR;
In this case, the output displays the raw values of the variables
instead of the assigned format labels.

90
Know the Difference: Format vs Label
A common mistake is to try to use Labels as Formats
or Formats as Labels. Make sure you know the
difference:

LABELS are descriptions for variables


Label AGE=AGE in 2013;

FORMATS are description for values


VALUE FMTMARRIED 0="No 1="Yes";

91
4.9 GOING DEEPER: FINDING FIRST AND LAST
VALUES
Suppose that you want to identify the first and last
person (ID) in each of those groups.
In an SAS DATA step, you identify the first and last values
by FIRST.GP and LAST.GP, where GP is the name of the
sorted grouping (or key) variable.
Do Hands on Example p 107. (DFINDFIRST.SAS)

92
4.10 SUMMARY
This chapter discussed several techniques for preparing
your data for analysis. In the next chapter, we begin the
discussion of SAS procedures that perform analyses on
the data.
Continue to Chapter 5: Preparing to Use SAS Procedures

93
These slides are based on the book:

Introduction to SAS Essentials


Mastering SAS for Data Analytics, 2nd Edition

By Alan C, Elliott and Wayne A. Woodward

Paperback: 512 pages


Publisher: Wiley; 2 edition (August 3, 2015)
Language: English
ISBN-10: 111904216X
ISBN-13: 978-1119042167

These slides are provided for you to use to teach SAS using this book. Feel free to
modify them for your own needs. Please send comments about errors in the slides
(or suggestions for improvements) to acelliott@smu.edu. Thanks.

94

You might also like