
Base SAS programming skills

-Archit Kumar SI BI BOFA


2008 Infosys Technologies Ltd. Strictly private and confidential. No part of this document should be reproduced or distributed without the prior permission of Infosys Technologies Ltd.

Contents

Introduction to SAS
Introduction to SAS programs
SAS data libraries
Producing list reports: PRINT procedure
Customizing report appearance
Creating HTML reports
Reading raw data files
Dropping and keeping variables
Concatenating SAS data sets
Producing summary reports
Introduction to graphics
Controlling input and output
Summarizing data
Reading and writing different types of data
Data transformation
  - Manipulating character values
  - Manipulating numeric values
  - Manipulating date values
Do loops in SAS
SAS arrays
Match merging two or more data sets
Using SQL queries in SAS
SAS macros
Basic efficiency techniques

Overview of SAS system


The functionality of the SAS system is built around four data-driven tasks:
1. Data access: addresses the data required by the application.
2. Data management: shapes data into the form required by the application.
3. Data analysis: summarizes, reduces, or otherwise transforms raw data into meaningful and useful information.
4. Data presentation: communicates information in ways that clearly demonstrate its significance.

Data Processing
The process of delivering meaningful information:
- Accessing data
- Transforming data
- Managing data
- Storing and retrieving data
- Analysis

Raw data or SAS data set -> DATA step -> SAS data set -> PROC step -> Report

Introduction to SAS program


/* Data step */
data work.staff;
   infile 'raw-data-file';
   input LastName  $ 1-20
         FirstName $ 21-30
         JobTitle  $ 36-43
         Salary      54-59;
run;

/* Proc step */
proc print data=work.staff;
run;

proc means data=work.staff;
   class JobTitle;
   var Salary;
run;

Fundamental Concepts
SAS data sets

Descriptor portion
   proc contents data=SAS-data-set;
   run;
PROC CONTENTS displays the following information about the data set:
- General information such as the data set name, number of observations and number of variables.
- Variable attributes such as name, type, length, position, informat and format.

Data portion
   proc print data=SAS-data-set;
   run;
The data portion holds the data values in tabular form. Variables correspond to fields (columns) and observations correspond to data lines (rows).

SAS variables
There are two types of variables:
- Character: contains any value, i.e. letters, numbers, special characters and blanks. Character values can be 1 to 32,767 characters long.
- Numeric: stored as floating-point numbers in 8 bytes of storage by default. Eight bytes of floating-point storage provide space for about 16 significant digits.

SAS variable names
- Can be up to 32 characters long.
- Can be uppercase, lowercase or mixed case.
- Must start with a letter or underscore.
- Subsequent characters can be letters, underscores or numeric digits.

Date values
SAS date values are stored as numeric values: the number of days between January 1, 1960 and the given date.

SAS - Syntax Rules


SAS statements
- Usually begin with an identifying keyword.
- Always end with a semicolon.
- Are free format: they can begin and end in any column.
- Can span multiple lines, and several statements can share one line.

Comments
- A block comment begins with /* and ends with */ and can span multiple lines.
- A comment statement begins with an asterisk (*) and ends with a semicolon.

SAS Data Libraries


A SAS data library is a collection of SAS files that SAS recognizes as a unit. SAS data sets are one type of SAS file.

Types of SAS libraries
- Temporary library: when SAS is invoked, it automatically provides a temporary library named WORK. Data sets created here are removed when the SAS session ends.
- Permanent library: SASUSER is a permanent library supplied with SAS. Additional permanent libraries can be created with the LIBNAME statement.

Syntax: LIBNAME libref 'SAS-data-library' <options>;

Rules for the libref
1. Must be 8 characters or fewer.
2. Must begin with a letter or underscore.
3. Remaining characters are letters, numbers or underscores.

e.g. libname Test_lib 'c:\workshop\prog1';

Once the libref is assigned, data sets in the library are referred to by the two-level name libref.data-set-name.
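A minimal sketch of assigning a libref and using a two-level name. The libref mylib and the data set name class_copy are hypothetical. The path is taken from the WORK library only so the sketch runs on any installation, which is an assumption made for illustration. In practice the path would be a folder of your own, and the data set would then persist across sessions. SASHELP.CLASS is a sample table shipped with SAS, not one of the course data sets.

```sas
/* Assumption: point the libref at the WORK folder so the example runs anywhere. */
libname mylib "%sysfunc(pathname(work))";

/* Create a data set through the two-level name libref.data-set-name. */
data mylib.class_copy;
   set sashelp.class;
run;

/* Inspect the descriptor portion of the new data set. */
proc contents data=mylib.class_copy;
run;

/* Release the libref when it is no longer needed. */
libname mylib clear;
```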

PRINT Procedure

General form of the PRINT procedure:
   proc print data=SAS-data-set;
   run;

By default, PROC PRINT prints every variable in the data set and adds an Obs column that shows the observation (row) number.

Features of the PRINT procedure
1. Titles and footnotes: discussed in subsequent slides.
2. Formatted values: discussed in subsequent slides.
3. Printing selected variables: the VAR statement prints only the listed variables, in the order in which they are written.
   proc print data=ia.empdata;
      var empname salary jobcode;
   run;
4. Suppressing the observation column: NOOBS option.
   proc print data=ia.empdata noobs;
   run;
5. Subsetting data: the WHERE statement selects only the observations that satisfy a condition.
   Syntax: where <condition>;
   The condition is built from operands (constants or variables) and operators (comparison, logical, special operators or functions).
   Comparison:   where Salary > 25000;
   Logical:      where Jobcode = 'A' and Salary = 25000;   (OR and NOT can be used similarly)
   BETWEEN:      where Salary between 5000 and 7000;
   CONTAINS (?): where LastName ? 'LAM';
   Example of a PROC step with a WHERE statement:
   proc print data=ia.empdata;
      var Jobcode Empid Salary;
      where Jobcode = 'A' and Salary between 20000 and 30000;
   run;
6. Column totals: the SUM statement prints a column total.
   proc print data=ia.empdata;
      var jobcode Salary Empid;
      sum Salary;
   run;

Special where statements


Additional special operators supported by the WHERE statement are:

LIKE: selects observations by comparing character values to a specified pattern. A percent sign (%) matches any number of characters and an underscore (_) matches exactly one character.
   e.g. where Code like 'E_U%';
   This selects Code values beginning with E, followed by any single character, followed by U, followed by any number of characters.

Sounds like (=*): selects observations that contain spelling variations of the specified word.
   e.g. where Name =* 'SMITH';
   This selects names such as SMYTHE and SMITT.

IS NULL or IS MISSING: selects observations in which the value of the variable is missing.
   e.g. where Flight is missing;
        where Flight is null;
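The special operators above can be tried with a short sketch like the following. It uses SASHELP.CLASS, a sample table shipped with SAS (an assumption made so the code runs without the course data). The patterns are illustrative only.

```sas
/* LIKE: J, then exactly one character, then any number of characters */
proc print data=sashelp.class noobs;
   where Name like 'J_%';
run;

/* Sounds-like: spelling variations of the given name */
proc print data=sashelp.class noobs;
   where Name =* 'Jon';
run;

/* IS MISSING works for numeric and character variables alike */
proc print data=sashelp.class noobs;
   where Weight is missing;
run;
```

If no observation satisfies the condition, PROC PRINT produces no report and writes a note to the log instead.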

Sequencing and Grouping observations

Sort procedure
The SORT procedure is used to sequence the observations. It:
1. Rearranges the observations in a SAS data set.
2. Can create a new data set containing the rearranged observations (OUT= option). Without OUT=, the input data set is replaced.
3. Can sort on multiple variables.
4. Does not generate printed output.
5. Treats missing values as the smallest possible value.
6. Sorts in ascending order by default.

Syntax
   proc sort data=input-data-set out=output-data-set;
      by <descending> by-variable(s);
   run;

e.g.
   proc sort data=ia.empdata out=work.jobsal;
      by jobcode descending salary;
   run;

Grouping data and printing subtotals and grand totals
Using a BY statement with PROC PRINT groups the output by the values of the BY variable.
   proc print data=ia.empdata;
      by jobcode;
      sum salary;
   run;
The above code groups the data by jobcode, and the SUM statement prints the sum of salary for each jobcode group (the subtotal) followed by the grand total.
Note: the data must be sorted or indexed by the BY variable before the BY statement can be used.

Page breaks
The PAGEBY statement puts each BY group on a separate page.
   proc print data=ia.empdata;
      by jobcode;
      pageby jobcode;
      sum salary;
   run;
PAGEBY must be used together with a BY statement, and only a variable that appears in the BY statement can be used in the PAGEBY statement.

Enhancing outputs

ID statement
The ID statement suppresses the Obs column and places the ID variable in the leftmost column instead. It can be combined with a BY statement: when the same variable appears in both the ID and BY statements, the output is grouped by that variable and its value is printed once per group in the leftmost column.
   proc print data=ia.empdata;
      id Jobcode;
      by Jobcode;
      pageby Jobcode;
      sum Salary;
   run;
The above code prints one page per Jobcode group, with Jobcode in place of the Obs column, and shows the sum of the Salary values for that group at the end of each page.
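Sorting, grouping and the ID statement can be combined as in this sketch. It assumes SASHELP.CLASS (a sample table shipped with SAS) in place of ia.empdata. The output data set name work.class_sorted is made up for the example.

```sas
/* Sort first: BY-group processing needs sorted (or indexed) data. */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex descending Age;
run;

/* One page per Sex group, Sex in the leftmost column, Weight subtotals. */
proc print data=work.class_sorted;
   id Sex;
   by Sex;
   pageby Sex;
   var Name Age Weight;
   sum Weight;
run;
```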

Customizing Report Appearance


Titles and Footnotes
1. Titles appear at the top of the page.
2. The default SAS title is The SAS System.
3. The null title statement, title;, cancels all titles.
4. Footnotes appear at the bottom of the page.
5. No footnote appears unless one is specified.
6. The null footnote statement, footnote;, cancels all footnotes.
7. More than one title or footnote can be specified by numbering the statements: title1, title2, ..., titlen, where n can be up to 10.
   e.g. title1 'First Line';
        title2 'Second Line';
8. A new TITLEn statement replaces the existing title with the same number and cancels all titles with higher numbers. Footnotes behave the same way.

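Column Labels
The LABEL statement assigns descriptive labels to variables. The LABEL option in the PROC PRINT statement displays those labels as column headings.
   proc print data=ia.empdata label;
      label lastname='Last Name'
            Firstname='First Name';
   run;
The SPLIT= option, used in place of LABEL in the PROC PRINT statement, also displays the labels and breaks each label onto a new line wherever the specified split character appears.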

SAS System Options
SAS system options are used to change the appearance of a report.

1. DATE: prints the date and time at which the SAS session began at the top of each page.
2. NODATE: does not print the date and time.
3. LINESIZE=width (LS=): specifies the line size.
4. PAGESIZE=n (PS=): specifies the number of lines per page.
5. NUMBER: prints the page number on the first line of each page of output.
6. NONUMBER: does not print page numbers.
7. PAGENO=n: specifies the beginning page number.

Example: options nodate nonumber ls=72;
The OPTIONS statement is a global statement; it does not belong to any DATA or PROC step.

Formatting Data Values


To apply a format to a specific SAS variable, use the FORMAT statement.
General form: FORMAT variable-name(s) format;
Example
   proc print data=ia.empdata;
      format Salary dollar11.2;
   run;
The above code prints the Salary values with a leading dollar sign, commas, a total width of 11 and 2 decimal places.

SAS Formats

Format       Example      Description
w.d          8.2          Standard numeric format: width 8, 2 decimal places
$w.          $5.          Standard character format: width 5
COMMAw.d     COMMA9.2     Commas in a number: width 9, 2 decimal places
DOLLARw.d    DOLLAR10.2   Dollar sign and commas: width 10, 2 decimal places

Date Formats
SAS dates are stored as the number of days between January 1, 1960 and the specified date, so date formats are used to print dates in a readable form. Some of the available date formats and the values they display (for the date 16 October 2001):

Format       Displayed value
MMDDYY6.     101601
MMDDYY8.     10/16/01
MMDDYY10.    10/16/2001
DATE7.       16OCT01
DATE9.       16OCT2001

User Defined Format


The FORMAT procedure can be used to define custom formats.
General form of PROC FORMAT:
   proc format;
      value format-name range1='label' ...;
   run;

Example
   proc format;
      value gender 1='Female'
                   2='Male'
                   other='Miscoded';
   run;
The above code defines a user-defined format named gender that displays the value 1 as Female, 2 as Male and any other value as Miscoded.

Assigning labels to character values and to ranges of character values (the names of character formats begin with $):
   proc format;
      value $grade 'A'='Good'
                   'B'-'D'='Fair'
                   'F'='Poor'
                   other='Miscoded';
   run;

Applying the format
   proc print data=ia.student;
      format CGPA $grade.;
   run;
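A short sketch of defining and applying user-defined formats. The format names $sexfmt and agegrp and their ranges are hypothetical, and SASHELP.CLASS (a sample table shipped with SAS) stands in for the course data.

```sas
proc format;
   /* Character format: the name starts with $ */
   value $sexfmt 'F'   = 'Female'
                 'M'   = 'Male'
                 other = 'Miscoded';
   /* Numeric format using ranges */
   value agegrp  low-12  = '12 or under'
                 13-15   = '13 to 15'
                 16-high = '16 or over';
run;

/* The stored values are unchanged; only the displayed values differ. */
proc print data=sashelp.class noobs;
   var Name Sex Age;
   format Sex $sexfmt. Age agegrp.;
run;
```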

Creating HTML reports


The Output Delivery System (ODS) is used to create output in a variety of forms. The ODS HTML statement opens, closes and manages the HTML destination.
General form:
   ods html file='HTML-file-specification';
      SAS code;
   ods html close;

Example
   ods html file='D:\odscode.html';
   proc print data=ia.empdata;
   run;
   ods html close;

Reading raw data file


Steps for creating SAS data set

1. Start a DATA step and name the SAS data set being created (DATA statement).
   DATA libref.SAS-data-set;
   e.g. data work.dwflax;
2. Identify the location of the raw data file to read (INFILE statement).
   INFILE 'filename';
   e.g. infile 'C:\workshop\dwflax.txt';
3. Describe how to read the data fields from the raw data file (INPUT statement).
   INPUT input-specifications;

Input specification
The input specifications:
- Name the SAS variables.
- Identify each variable as character or numeric.
- Specify the locations of the fields in the raw data file.
- Can be written as column, formatted, list or named input.

Example (column input)
   data work.dwflax;
      infile 'C:\workshop\dwflax.txt';
      input Flight $ 1-3
            Date $ 4-11
            Dest $ 12-14
            FirstClass 15-17;
   run;

Formatting Input
Formatted input is used to read data values by:
- Moving the input pointer to the starting position of the field.
- Specifying a variable name.
- Specifying an informat.

Pointer controls
   @n : moves the pointer to column n.
   +n : moves the pointer n positions forward.

An informat is specified in the following way:
   <$>informat-namew.<d>
Here $ indicates a character informat, w is the total width of the field, the period (.) is a required delimiter, and d is the number of decimal places.

Example
   data work.dfwlax;
      infile 'raw-data-file';
      input @1  Flight     $3.
            @4  Date       mmddyy8.
            @12 Dest       $3.
            @15 FirstClass 3.
            @18 Economy    3.;
   run;
The above code reads Flight as 3 characters starting at column 1, Date from column 4 using the mmddyy8. informat, Dest as 3 characters starting at column 12, FirstClass as a 3-digit number starting at column 15, and Economy as a 3-digit number starting at column 18.
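Formatted input can be tried without an external file by reading in-stream records with DATALINES. In this sketch the three data records are made up for illustration. The FORMAT statement and the PROC PRINT step are additions, so that the date is displayed readably.

```sas
data work.dfwlax;
   input @1  Flight     $3.
         @4  Date       mmddyy8.
         @12 Dest       $3.
         @15 FirstClass 3.
         @18 Economy    3.;
   format Date date9.;   /* display the stored day count as a date */
   datalines;
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
;
run;

proc print data=work.dfwlax noobs;
run;
```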

Reading SAS data sets


Steps for creating a SAS data set using another data set.

- DATA statement: starts a DATA step and names the SAS data set being created.
- SET statement: identifies the SAS data set being read.
- Assignment statements: create new variables or modify the values of existing variables.

Example
   data work.new_data;
      set ia.dwflax;
      Total = FirstClass + Economy;
   run;
The above code reads all the variables and observations from dwflax and creates a new variable named Total in new_data.

Operators

Operator   Action            Example           Priority
+          Addition          Sum = x + y       III
-          Subtraction       Diff = x - y      III
*          Multiplication    Mul = x * y       II
/          Division          Div = x / y       II
**         Exponentiation    Raise = x ** y    I
-          Negative prefix   Negative = -x     I

Operations of priority I are performed first, then II, then III. Priority I operations are evaluated right to left; priority II and III operations are evaluated left to right.

Using SAS functions

SUM function: calculates the sum of its arguments.
   e.g. Total = sum(FirstClass, Economy);
   The SUM function ignores missing arguments and adds the rest, whereas simple addition with + returns a missing value if any operand is missing.

TODAY(): obtains the current date value from the system clock.
MDY(month, day, year): uses numeric month, day and year values to return the corresponding SAS date value.
YEAR(SAS-date): extracts the year from a SAS date and returns a four-digit value.
QTR(SAS-date): extracts the quarter from a SAS date and returns a number from 1 to 4.
MONTH(SAS-date): extracts the month from a SAS date and returns a number from 1 to 12.
WEEKDAY(SAS-date): extracts the day of the week from a SAS date and returns a number from 1 to 7, where 1 is Sunday, 2 is Monday and so on.
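The difference between + and SUM, and the date functions listed above, can be checked in the log with a small DATA _NULL_ step. The values used here are made up for illustration.

```sas
data _null_;
   FirstClass = 20;
   Economy    = .;

   Plus  = FirstClass + Economy;       /* missing, because one operand is missing */
   Total = sum(FirstClass, Economy);   /* 20, the missing argument is ignored */
   put Plus= Total=;

   D = mdy(10, 16, 2001);              /* SAS date value for 16 October 2001 */
   Y = year(D);                        /* 2001 */
   Q = qtr(D);                         /* 4 */
   M = month(D);                       /* 10 */
   W = weekday(D);                     /* 3, a Tuesday */
   put D= date9. Y= Q= M= W=;
run;
```

SAS also writes a note to the log that a missing value was generated by the + operation.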

Dropping and Keeping variables


The DROP and KEEP statements control which variables are written to the new data set.
General form: DROP variables; or KEEP variables;
Example
   data test_new;
      set ia.dwflax;
      drop FirstClass Economy;
      Total = FirstClass + Economy;
   run;
The above code creates a new data set without the FirstClass and Economy variables but with the Total variable. DROP only affects what is written to the output data set, so FirstClass and Economy are still available for computing Total.

Conditional processing

IF-THEN/ELSE statements can be used to process rows conditionally.
Example
   data flightrev;
      set ia.dwflax;
      total = sum(Firstclass, Economy);
      if Dest = 'LAX' then
         revenue = sum(2000*Firstclass, 1200*Economy);
      else if Dest = 'DFW' then
         revenue = sum(1500*Firstclass, 900*Economy);
   run;

Executing set of conditional statements

DO and END statements can be used to execute a group of statements when a condition is true.
Example
   data flightrev;
      set ia.dwflax;
      total = sum(Firstclass, Economy);
      if Dest = 'DFW' then do;
         revenue = sum(1500*Firstclass, 900*Economy);
         city = 'Dallas';
      end;
      else if Dest = 'LAX' then do;
         revenue = sum(2000*Firstclass, 1200*Economy);
         city = 'Los Angeles';
      end;
   run;

Variable Lengths

At compile time, the length of a variable is determined the first time the variable is encountered, and longer values assigned later are truncated. To overcome this, specify the length of the variable with a LENGTH statement before the first assignment.

e.g. In the previous example, the first value assigned to city is Dallas, so the length of city is 6 and Los Angeles is truncated to Los An. To avoid this, specify the length of city before the IF condition:
   length city $ 11;
The $ indicates a character variable.
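The truncation described above can be reproduced with a self-contained sketch. The data set names no_length and with_length are hypothetical, and the iterative DO loop over two destination codes is only a device to avoid needing the course data.

```sas
/* Without LENGTH: City gets length 6 from 'Dallas', the first value seen at compile time. */
data work.no_length;
   do Dest = 'DFW', 'LAX';
      if Dest = 'DFW' then City = 'Dallas';
      else City = 'Los Angeles';        /* stored as 'Los An' */
      output;
   end;
run;

/* With LENGTH before the first assignment: City can hold 11 characters. */
data work.with_length;
   length City $ 11;
   do Dest = 'DFW', 'LAX';
      if Dest = 'DFW' then City = 'Dallas';
      else City = 'Los Angeles';
      output;
   end;
run;

proc print data=work.no_length;
run;
proc print data=work.with_length;
run;
```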

Deleting or Selecting Rows

Rows can be deleted using a DELETE statement with an IF condition.
Example: in the previous example, one more statement can be added after the total assignment:
   if total le 175 then delete;
This statement deletes the rows for which the value of total is less than or equal to 175.

Similarly, rows can be selected with a subsetting IF statement, without DELETE.
Example:
   if total gt 175;

In the same way, date values can be compared with a date constant written in the form 'ddMMMyyyy'd, e.g. '16OCT2001'd.

Concatenating SAS data sets


Steps for concatenating data sets
- Use the SET statement in a DATA step to concatenate SAS data sets.
- Use the RENAME= data set option to change the names of variables.
- Use the SET and BY statements to interleave data sets.

General form:
   DATA SAS-data-set;
      SET SAS-data-set-1 SAS-data-set-2 ...;
   run;
The above code is broadly similar to UNION ALL in an SQL query: the observations of the second data set are appended after those of the first, and duplicates are kept.

Example
   data newhires;
      set na1 na2;
   run;

If na1 and na2 have the same number of variables with the same names, newhires has all of those variables, with the data from na2 following the data from na1. If the variable names differ, the variables can be renamed with the RENAME= data set option. E.g. if na1 contains Name, Gender and Jobcode and na2 contains Name, Gender and Jcode, then Jcode can be renamed to Jobcode.

Example
   data newhires;
      set na1 na2(rename=(Jcode=Jobcode));
   run;

The data sets can also be interleaved using a BY statement:
   data newhires;
      set na1 na2(rename=(Jcode=Jobcode));
      by name;
   run;

The above code produces newhires in name order. Each input data set must already be sorted by name.
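Concatenation and interleaving can be compared side by side with two small in-stream data sets. The rows below are invented for illustration. The output name newhires_by_name is hypothetical.

```sas
data work.na1;
   input Name $ Gender $ Jobcode $;
   datalines;
Adams F FA1
Lee M FA2
;
run;

data work.na2;
   input Name $ Gender $ Jcode $;
   datalines;
Brown M FA1
Young F FA3
;
run;

/* Concatenate: all rows of na1, then all rows of na2 */
data work.newhires;
   set work.na1 work.na2(rename=(Jcode=Jobcode));
run;

/* Interleave: both inputs are already sorted by Name */
data work.newhires_by_name;
   set work.na1 work.na2(rename=(Jcode=Jobcode));
   by Name;
run;
```

The first step yields the order Adams, Lee, Brown, Young; the second yields Adams, Brown, Lee, Young.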

Merging Data Sets


The MERGE statement is used to merge corresponding observations from two or more data sets.
General form:
   DATA SAS-data-set;
      MERGE SAS-data-sets;
      BY by-variable(s);
   run;

The resulting data set contains the BY variable and all the other variables from the input data sets. Observations with the same BY value are combined into one observation. For a BY value that occurs in only one data set, the variables that come from the other data sets are missing. The MERGE statement therefore works much like a full outer join in SQL. The input data sets must be sorted or indexed by the BY variable.

Conditional merging

The IN= data set option is used to determine which data sets contribute to the current observation. With it, the merge can be made to behave like a left join, a right join, an inner join or any other condition.

Example
   data work.combine;
      merge ia.gercrew(in=Increw)
            work.gersched(in=Insched);
      by EmpId;
      if Insched = 1;
   run;

The IN= option creates a temporary variable that is 1 when the data set contributed to the current observation and 0 otherwise. The subsetting IF statement above writes an observation to the resulting data set only if work.gersched contributed to it.
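A self-contained sketch of a match-merge controlled by IN= variables. The data sets work.crew and work.sched and their rows are made up for illustration and are not the course data.

```sas
data work.crew;
   input EmpId $ Name $;
   datalines;
E01 Ana
E02 Ben
E03 Cai
;
run;

data work.sched;
   input EmpId $ Flight $;
   datalines;
E01 F100
E03 F200
E04 F300
;
run;

/* Both inputs are sorted by EmpId, as BY processing requires. */
data work.combine;
   merge work.crew(in=InCrew) work.sched(in=InSched);
   by EmpId;
   if InCrew and InSched;   /* keep only employees present in both (inner join) */
run;
```

Only E01 and E03 are written. Changing the condition to if InCrew; would keep E02 as well, with a missing Flight.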

Additional Features

In addition to one-to-one merges, there can be one-to-many and many-to-many merges. In a one-to-many merge, a unique BY value in one data set has several matches in the other data set. The result has that many observations, each repeating the values from the first data set alongside the different values from the second. In a many-to-many merge, several observations in the first data set match several observations in the second. Observations within a BY group are paired in order, and once the data set with fewer observations for that value is exhausted, its last observation is reused for the remaining observations of the other data set.

Summary Reports
The summary report procedures used are:
- PROC FREQ: calculates frequency counts.
- PROC MEANS: produces simple statistics.
- PROC REPORT: produces flexible detail and summary reports.

Proc Freq

PROC FREQ displays the frequency counts of the data values in a SAS data set. By default it:
- Analyzes every variable in the SAS data set.
- Displays each distinct data value.
- Calculates the number of observations in which each data value appears, and the corresponding percentage.
- Indicates for each variable how many observations have missing values.

Example
   proc freq data=ia.dfwlax;
   run;

Features of proc freq

The TABLES statement limits the report to the listed variables. SAS creates a separate frequency table for each variable named in the TABLES statement, with the names separated by spaces.
Example
   proc freq data=ia.dfwlax;
      tables economy flight;
   run;

The NLEVELS option displays the number of levels, i.e. the number of distinct values, of each variable.
The NOPRINT option suppresses the frequency tables. It is generally used with NLEVELS when only the number of levels is required.
Example
   proc freq data=ia.dfwlax nlevels;
      tables _all_ / noprint;
      title 'Number of levels';
   run;

Formats can also be applied when displaying frequency reports.

Cross tabular frequency


A cross-tabular frequency report analyzes all possible combinations of the distinct values of two variables.
Example
   proc format;
      value $codefmt 'FLTAT1'-'FLTAT2' = 'Flight Attendant'
                     'PILOT1'-'PILOT2' = 'Pilot';
      value money    low-<25000  = 'Less than 25,000'
                     25000-50000 = '25,000 to 50,000'
                     50000<-high = 'More than 50,000';
   run;
   proc freq data=ia.crew;
      tables jobcode*salary;
      format jobcode $codefmt. salary money.;
   run;

The CROSSLIST option, specified after a slash in the TABLES statement in the same way as NOPRINT, displays the crosstabulation in list form.

Proc Means

By default this procedure gives the number of observations, mean, standard deviation, minimum and maximum for every numeric variable in the SAS data set. Additional statistics that can be requested include range, median, sum and nmiss (number of missing values). The VAR statement limits the output to the listed variables, and the CLASS statement produces the statistics separately for each value of the class variable.

Example
   proc means data=ia.crew;
      var salary;
      class jobcode;
      title 'Salary for Job code';
   run;
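Requesting specific statistics and classifying the output might look like this sketch, which assumes SASHELP.CLASS (a sample table shipped with SAS) instead of ia.crew. The title text is invented.

```sas
/* Statistics keywords on the PROC MEANS statement replace the default set. */
proc means data=sashelp.class n mean median range nmiss maxdec=1;
   class Sex;            /* one row of statistics per value of Sex */
   var Height Weight;    /* analyze only these variables */
   title 'Height and weight by sex';
run;

title;                   /* cancel the title so it does not carry over */
```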

Proc Report
PROC REPORT enables you to:
- Create listing reports.
- Create summary reports using the SUM, GROUP and ORDER options.
- Enhance reports.
- Request separate subtotals and grand totals.

Extra features provided by the REPORT procedure in comparison to the PRINT procedure are:
1. Summary reports.
2. Cross-tabular reports.
3. Sorting data for the report.

Report procedure
The default listing displays:
- Each data value as it is stored in the data set.
- Variable names as report column headings.
- A default width for the columns.
- Character values left-justified.
- Numeric values right-justified.

Printing selected variables
The COLUMN statement prints the selected variables in the order in which they are specified.
Example
   title 'Salary Analysis';
   proc report data=ia.crew;
      column Jobcode Location Salary;
   run;

Define statement

Reports can be enhanced with the DEFINE statement and its attributes.
General form: DEFINE variable / <attribute-list>;

Functions of the DEFINE statement
1. FORMAT=: formats the variable. The default is the format stored in the SAS data set.
2. WIDTH=: sets the column width. The default is the variable length for character variables and 9 for numeric variables, or the width of the format stored in the data set.
3. ORDER: orders the rows by the values of the variable, in ascending order by default; DESCENDING must be requested explicitly. Repeated values are suppressed.
4. GROUP: groups the rows by the values of the variable. GROUP can be used with several variables, and the groups are shown in the order in which the variables are written. ORDER cannot be used together with GROUP. The sum of each numeric variable is displayed for each group; if GROUP is not used, the grand total of the numeric values is displayed.
5. SUM: prints the sum of all values.
6. MEAN: displays the mean of all values.
7. N: displays the number of non-missing values.
8. MAX: displays the maximum value.
9. MIN: displays the minimum value.

RBREAK

The RBREAK statement is used for the following purposes:
1. Adding a grand total at the top or the bottom of the report.
2. Adding a line before the grand total.
3. Adding a line after the grand total.

General form: RBREAK BEFORE | AFTER </ options>;

Options
1. SUMMARIZE: prints the total.
2. OL: prints a single line above the total.
3. DOL: prints a double line above the total.
4. UL: prints a single line below the total.
5. DUL: prints a double line below the total.
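The COLUMN, DEFINE and RBREAK statements might be combined as in this sketch. It assumes SASHELP.CLASS (a sample table shipped with SAS). The column headings and formats are invented. The NOWD option, which is not covered in the slides, is added so that the report runs non-interactively.

```sas
proc report data=sashelp.class nowd;
   column Sex Age Height;
   define Sex    / group 'Sex';                                /* one row per value of Sex */
   define Age    / analysis mean format=5.1 'Mean Age';
   define Height / analysis sum  format=8.1 'Total Height';
   rbreak after / summarize dol;   /* overall summary row, double line above it */
run;
```

The line-drawing options such as DOL affect the traditional listing output; other ODS destinations may ignore them.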

Introduction to Graphics Bar and Pie Charts

The GCHART procedure is used to produce a chart. To do so you:
1. Specify the form of the chart.
2. Identify the chart variable.
3. Optionally identify an analysis variable.

General form:
   proc gchart data=SAS-data-set;
      HBAR | VBAR | PIE chart-variable </ options>;
   run;

This produces a chart for the different values of the chart variable, with the length of each bar or the size of each pie slice depending on the frequency of that value. For numeric chart variables, SAS automatically divides the values into intervals, identifies the midpoints and creates one bar for each midpoint. To avoid this, use the DISCRETE option.

Options Contd.

SUMVAR=: specifies the summary (analysis) variable, whose statistic replaces the frequency as the bar length or slice size.
TYPE=: used along with SUMVAR= to specify which statistic of the summary variable is charted, e.g. MEAN or SUM.
Example
   proc gchart data=ia.crew;
      vbar Jobcode / sumvar=Salary type=mean;
   run;
The above code produces a vertical bar chart with Jobcode as the chart variable, where the height of each bar is the mean Salary for that Jobcode.

FILL=: used with pie charts to specify whether the pie slices are filled with a solid (FILL=S) or a cross-hatched (FILL=X) pattern.
EXPLODE=value: used with pie charts to pull the slice for that particular value away from the rest of the pie.

Producing PLOTS

The GPLOT procedure is used to plot one variable against another on coordinate axes.
General form:
   proc gplot data=SAS-data-set;
      PLOT vertical-variable*horizontal-variable </ options>;
   run;

You can:
1. Specify the symbol used to represent data.
2. Use different methods of interpolation.
3. Specify line styles, colors and thickness.
4. Draw reference lines within the axes.
5. Place one or more plot lines within the axes.

Example
   proc gplot data=ia.flight;
      where date between '02mar2001'd and '08mar2001'd;
      plot Boarded*Date;
      title 'Total Passengers for flight 114';
      title2 'between 02mar2001 and 08mar2001';
   run;
This plots Boarded against Date for the specified flight dates. By default the plotting symbol is a plus sign (+) and the points are shown without any interpolation.

Options

SYMBOL statement
The options that the SYMBOL statement can take include:
1. VALUE= (V=): specifies the plotting symbol, which can be plus (the default), star, diamond, square, triangle or none.
2. I=: specifies the interpolation method, e.g. I=join, I=needle or I=spline.
3. WIDTH= (W=): specifies the width of the line.
4. COLOR= (C=): specifies the color of the line.

Example
   proc gplot data=ia.flight;
      plot Boarded*Date;
      symbol value=square i=join w=2 c=red;
      title 'Total Passengers for flight 114';
   run;

Controlling Axis

The following options can be used with the PLOT statement:
1. HAXIS=: scales the horizontal axis.
2. VAXIS=: scales the vertical axis.
3. CAXIS=: specifies the color of both axes.
4. CTEXT=: specifies the color of the text on both axes.

Example
   plot Boarded*Date / vaxis=100 to 200 by 25 ctext=blue;

Outputting Observations

A SAS DATA step implicitly outputs the contents of the program data vector (PDV) to the data set at the end of each iteration. An explicit OUTPUT statement overrides this implicit output.
General form: OUTPUT <SAS-data-set-1> <SAS-data-set-2> ...;
The OUTPUT statement can be used to:
1. Create two or more SAS observations from each line of input.
2. Write observations to multiple SAS data sets.

Example
   data forecast;
      drop numemps;
      set prog2.growth;
      year = 1;
      Newtotal = Numemps * (1 + increase);
      output;
      year = 2;
      Newtotal = Newtotal * (1 + increase);
      output;
      year = 3;
      Newtotal = Newtotal * (1 + increase);
      output;
   run;

Writing to multiple data sets


The OUTPUT statement is used to write observations to the desired data sets.
Example
   data army navy airforce;
      drop type;
      set prog2.military;
      if type eq 'Army' then
         output army;
      else if type eq 'Navy' then
         output navy;
      else if type eq 'Air force' then
         output airforce;
   run;

The FIRSTOBS= and OBS= data set options control which observations are read from a data set.
OBS= option
   set prog2.military(obs=25);
   This reads only the first 25 observations of the input data set.
FIRSTOBS= option
   set prog2.military(firstobs=11 obs=25);
   This reads observations 11 through 25 of prog2.military, i.e. 15 observations.

Writing to an external file


Data can be written to an external file using either ODS or the FILE statement.

ODS method
   ods csvall file='raw-data-file';
   proc print data=prog2.maysales noobs;
      format listdate selldate date9.;
   run;
   ods csvall close;

FILE statement
   data _null_;
      set prog2.maysales;
      file 'raw-data-file';
      put description listdate : date9.;
   run;

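_N_ automatic variable and the END= option
_N_ is an automatic variable that counts the iterations of the DATA step. IsLast is not automatic: it is created by the END= option of the SET statement and is 1 when the last observation is being processed.
   data _null_;
      set prog2.maysales end=IsLast;
      file 'raw-data-file';
      if _N_ = 1 then put 'Description ' 'ListDate';
      put description listdate : date9.;
      if IsLast = 1 then put 'End of data';
   run;

Specifying a delimiter
The DLM= option is used to specify the delimiter in the file.
Example
   file 'raw-data-file' dlm=',';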

Summarizing data

Creating an accumulating variable
The RETAIN statement can be used to create a variable that holds a running sum of another numeric variable. The RETAIN statement:
1. Retains the value of the variable in the PDV across iterations of the DATA step.
2. Initializes the retained variable to missing if no initial value is specified.

Example
   data mnthtot;
      set prog2.daysales;
      retain mth2dte 0;
      mth2dte = mth2dte + saleamt;
   run;

The above code creates a new variable mth2dte holding a running sum of saleamt. However, if saleamt is missing in any observation, all subsequent values of mth2dte are missing. To avoid this, use the sum statement, variable + expression;, in place of RETAIN and the assignment. The sum statement retains the variable automatically, initializes it to 0 and ignores missing values.
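The difference between RETAIN with an assignment and the sum statement shows up as soon as one input value is missing. In this sketch the three sales amounts and the variable names RunRetain and RunSum are made up for illustration.

```sas
data work.daysales;
   input SaleAmt;
   datalines;
100
.
250
;
run;

data work.mnthtot;
   set work.daysales;
   retain RunRetain 0;
   RunRetain = RunRetain + SaleAmt;   /* 100, then missing, then missing */
   RunSum + SaleAmt;                  /* sum statement: 100, 100, 350 */
run;

proc print data=work.mnthtot;
run;
```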

Accumulating totals for a group of data

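To accumulate totals for each value of a particular variable, the data must first be sorted by that variable. The variable is then used as the BY variable, and the automatic FIRST.variable flag resets the total at the start of each group:

Example
   data work.divsal(keep=jcode divsal);
      set work.salary;
      by jcode;
      if first.jcode then divsal = 0;
      divsal + sal;
   run;

As written, every observation is output with the running total for its group. Adding the subsetting statement if last.jcode; would keep only the final total for each group.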
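A self-contained sketch of group totals with FIRST. and LAST. processing. The salary rows are invented for illustration. The subsetting IF on last.Jcode is an addition to the slide's example so that only one total row per group is written.

```sas
data work.salary;
   input Jcode $ Sal;
   datalines;
FA1 100
FA1 200
PT1 500
PT1 300
PT1 200
;
run;

/* The input is already sorted by Jcode, as BY processing requires. */
data work.divsal(keep=Jcode DivSal);
   set work.salary;
   by Jcode;
   if first.Jcode then DivSal = 0;   /* reset at the start of each group */
   DivSal + Sal;                     /* accumulate within the group */
   if last.Jcode;                    /* output only the last row of each group */
run;
```

The result has two observations: FA1 with 300 and PT1 with 1000.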

Reading delimited raw data file


Common delimiters are blanks, commas and tab characters. The default delimiter is a blank. To tell SAS how to read a data value, an informat can be specified. To specify an informat in list input, put a colon between the variable name and the informat name. The colon signals SAS to read from delimiter to delimiter rather than for a fixed width. The length of a variable can also be specified in advance with a LENGTH statement, which makes the colon and informat unnecessary for that variable.
Example
   data airplanes;
      length ID $ 5;
      infile 'raw-data-file';
      input ID $ Inservice : date9. passcap cargocap;
   run;
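List input with the colon modifier can be tried on in-stream, comma-delimited records. The two records below are made up for illustration, and the FORMAT statement is an addition so that the date prints readably.

```sas
data work.airplanes;
   length ID $ 5;
   infile datalines dlm=',';
   /* The colon reads InService up to the next delimiter using date9. */
   input ID $ InService : date9. PassCap CargoCap;
   format InService date9.;
   datalines;
50001,04JUL1989,10,22400
50002,03AUG1994,172,36400
;
run;

proc print data=work.airplanes noobs;
run;
```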

Delimiters and missing data


The DLM= option is used to specify the delimiter in the following manner:
   infile 'raw-data-file' dlm=':';
If a series of characters is given in the DLM= option, each of those characters is treated as a delimiter, e.g. dlm=':!'.

If a record has fewer values than the INPUT statement expects, SAS by default moves to the next record to read the remaining values. To prevent this, use the MISSOVER option, which sets the remaining variables to missing:
   infile 'raw-data-file' dlm=':' missover;
With MISSOVER, a value at the end of a record that is shorter than the width the INPUT statement asks for is also set to missing. To read such short values instead, use the TRUNCOVER option:
   infile 'raw-data-file' dlm=':' truncover;

Two consecutive delimiters are treated as one, so a missing value needs a placeholder: a period (.) for a numeric field and a blank for a character field.

If the placeholder is not present, we can use the DSD option. Features of the DSD option:
1. Sets the default delimiter to a comma.
2. Treats consecutive delimiters as missing values.
3. Enables SAS to read values with embedded delimiters if the value is enclosed in double quotes.

Example
   infile 'raw-data-file' dsd;
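As a rough illustration of DSD and MISSOVER working together, the following sketch reads comma-delimited records from in-line data. The data values are made up for the example, and in-line data replaces the external raw data file.

```sas
data airplanes;
    length ID $ 5;
    /* DSD: comma delimiter, consecutive commas mean a missing value.
       MISSOVER: a short record does not pull values from the next line. */
    infile datalines dsd missover;
    input ID $ InService : date9. PassCap CargoCap;
    format InService date9.;
    datalines;
50001,04JAN1989,,240
50002,11NOV1989,132
;
run;

proc print data=airplanes;
run;
```

In the first record PassCap is missing because of the two consecutive commas. In the second record CargoCap is missing because the record ends early.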

Controlling when a record loads


SAS loads a new record into the input buffer each time it executes an INPUT statement. Within an INPUT statement, a forward slash (/) moves the pointer to the next line.
   input Lname $20. Fname $10. / City $10. State $20.;
This reads Lname and Fname from the first line, then moves to the next line and reads City and State.

#n moves the pointer to line n of the current group of records.
   input #1 Lname $20. Fname $10. #2 City $10. State $20.;
This reads Lname and Fname from the first line and City and State from the second line. The cycle repeats for the 3rd and 4th records, and so on, until the end of the file is reached.

An IF statement can also be used to control how the rest of a record is read, based on the value of a field.

Example
   input salesid 5. Location $3.;
   if Location = 'USA' then input Saledate : mmddyy10. Amount;
   if Location = 'EUR' then input Saledate : date9. Amount : comma8.;

The above code reads salesid and Location first and then, depending on the value of Location, reads Saledate and Amount. However, because the first INPUT statement releases the record when it finishes, the second INPUT statement reads from the next record rather than from the rest of the current one.
For records whose Location satisfies neither condition, Saledate and Amount will be missing.

To avoid this, we can use the single trailing @. A trailing @ holds the raw data record in the input buffer until SAS
1. executes an INPUT statement with no trailing @, or
2. reaches the end of the DATA step and begins the next iteration.
   input var1 var2 var3 @;
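A sketch of the conditional read with a trailing @ added to the first INPUT statement, so that the second INPUT statement continues on the same record. The in-line records and the data set name sales are invented for illustration.

```sas
data sales;
    infile datalines;
    /* Trailing @ holds the record for the next INPUT statement */
    input SalesID 5. Location $3. @;
    if Location = 'USA' then
        input SaleDate : mmddyy10. Amount;
    else if Location = 'EUR' then
        input SaleDate : date9. Amount : comma8.;
    format SaleDate date9.;
    datalines;
10001USA 01/15/2008 1500
10002EUR 15JAN2008 2,300
;
run;

proc print data=sales;
run;
```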

Reading multiple observations from one record
Several observations can be read from a single record by using the double trailing @@, which holds the record across iterations of the DATA step.
   input var1 var2 var3 @@;

Data Transformation

SAS provides variable lists, which can be used to refer to a set of variables together.

Numbered range list
   X1-Xn            Specifies all variables from X1 to Xn inclusive. The list can begin and end with any number, as long as the rules for user-supplied variable names are not violated.

Name range lists
   X--a             Specifies all variables from X to a
   X-numeric-a      Specifies all numeric variables from X to a
   X-character-a    Specifies all character variables from X to a

Name prefix lists
   Sum(of REV:)     Calculates the sum of all the variables that begin with REV

Special SAS name lists
   _All_            All variables defined in a data step
   _Numeric_        All numeric variables in a data step
   _Character_      All character variables in a data step
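The three kinds of list can be compared side by side in a short sketch. The data set revenue and its variables are hypothetical and exist only for this example.

```sas
data revenue;
    input Region $ Rev1 Rev2 Rev3 Cost;
    TotalA = sum(of Rev1-Rev3);    /* numbered range list          */
    TotalB = sum(of Rev:);         /* name prefix list             */
    TotalC = sum(of Rev1--Cost);   /* name range list, adds Cost   */
    datalines;
East 10 20 30 5
West 15 25 35 10
;
run;

proc print data=revenue;
run;
```

For the first record TotalA and TotalB are both 60, while TotalC is 65 because the name range list also picks up Cost.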

SAS Functions

Substr function
Used to extract part of a string.
General form: Newvar = SUBSTR(string, start, <length>);
Here string can be a character constant or a variable name, start is the starting position, and length is the number of characters to extract. If length is omitted, all characters to the end of the string are extracted.

Right/Left function
Used for right or left justification.
General form: Newvar = RIGHT(argument);
The argument is right-justified, and its trailing blanks are moved to the start. LEFT does the reverse, moving leading blanks to the end.

Scan function
The SCAN function returns the nth word of a string.
General form: Newvar = SCAN(string, n, <delimiter>);
The delimiter can be omitted, in which case a blank and a set of common punctuation characters are used as delimiters.

Concatenation operator
This operator concatenates two or more strings. Either (!!) or (||) can be used.
General form: Newvar = String1 !! String2;

Trim function
This function removes trailing blanks from a string.
General form: Newvar = TRIM(argument);
If the argument is blank, TRIM returns a single blank. TRIM does not remove leading blanks; for that, combine LEFT and TRIM.
Example: Fullname = trim(left(Firstname)) !! ' ' !! Lastname;

CATX function
This function concatenates character strings, removes leading and trailing blanks, and inserts a separator between them.
General form: CATX(separator, string-1, ..., string-n);
Similarly, CAT concatenates without removing blanks, CATS concatenates and removes leading and trailing blanks, and CATT concatenates and removes trailing blanks only.
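The character functions described so far can be tried in a single DATA _NULL_ step that writes its results to the log. The names and values below are invented for the example.

```sas
data _null_;
    length FirstName LastName $ 12;
    FirstName = '  John';    /* note the leading blanks */
    LastName  = 'Smith';

    /* Two ways of building "John Smith" */
    Full1 = trim(left(FirstName)) !! ' ' !! LastName;
    Full2 = catx(' ', FirstName, LastName);

    /* First letter of the left-aligned first name */
    Initial = substr(left(FirstName), 1, 1);

    /* Second blank-delimited word of a constant */
    Second = scan('Base SAS programming', 2, ' ');

    put Full1= Full2= Initial= Second=;
run;
```

Both Full1 and Full2 should read John Smith; CATX reaches the same result without the separate TRIM and LEFT calls.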

Find function
This function searches for a specific substring within a string and returns its position if found, or 0 if not found.
General form: Position = FIND(target, value, <modifiers>, <start>);
- Modifiers can be I or T. I makes the search case-insensitive (by default it is case-sensitive). T makes the search ignore trailing blanks.
- Start identifies the starting position of the search. A positive value signifies a forward search and a negative value a backward search.

Index function
Works the same way as the FIND function, except that it does not have the modifiers and start arguments.

UPCASE function
Converts all letters in the argument to uppercase and has no effect on digits and special characters.
General form: NewVal = UPCASE(argument);

LOWCASE function
Converts the text to lowercase.

PROPCASE function
Converts the text to proper case, capitalizing the first letter of each word.

TRANWRD function
This function replaces every occurrence of one set of characters in a string with another set of characters.
General form: Desert = TRANWRD(Desert, 'Pumpkin', 'Apple');
This replaces Pumpkin with Apple in Desert. If the replacement text is longer than the text it replaces, the result can be truncated when the receiving variable is too short to hold it. When the result is assigned to a new variable whose length has not been specified, that variable is given a length of 200.

SUBSTR (left side)
If the SUBSTR function is used on the left side of an assignment statement, it replaces that substring of the text with the value on the right.
General form: SUBSTR(string, start, <length>) = value;

Manipulating numeric values

Round function
This function rounds a value to the nearest multiple of the round-off unit.
General form: NewVar = ROUND(argument, <round-off unit>);
The round-off unit is numeric and positive; for example, a unit of 0.01 rounds to two decimal places. If it is omitted, the value is rounded to the nearest integer.

CEIL function: returns the smallest integer greater than or equal to the argument.
FLOOR function: returns the greatest integer less than or equal to the argument.
INT function: returns the integer part of the argument.
MEAN function: returns the mean of the nonmissing arguments.
MIN function: returns the minimum nonmissing value.
MAX function: returns the maximum value.
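A short sketch that exercises the numeric functions on arbitrary sample values and writes the results to the log:

```sas
data _null_;
    x = 12.3456;
    y = -12.3456;

    r2  = round(x, 0.01);   /* 12.35 */
    r0  = round(x);         /* 12    */
    c   = ceil(x);          /* 13    */
    f   = floor(y);         /* -13   */
    i   = int(y);           /* -12   */
    avg = mean(1, ., 3);    /* 2: the missing value is ignored */
    lo  = min(4, ., 2);     /* 2     */
    hi  = max(4, ., 2);     /* 4     */

    put r2= r0= c= f= i= avg= lo= hi=;
run;
```

The negative value shows how FLOOR and INT differ: FLOOR moves down to -13, while INT simply drops the fractional part.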

Manipulating Date values

Creating SAS date values
The MDY function returns a SAS date from a month, day and year given separately.
General form: Newdate = MDY(month, day, year);
TODAY() returns the current system date.

Extracting information
We can extract the day, month or year from a SAS date using DAY(SAS-date), MONTH(SAS-date) or YEAR(SAS-date) respectively. QTR and WEEKDAY work similarly.

Calculating an interval in years
The YRDIF function calculates the difference in years between two SAS dates.
General form: Diff = YRDIF(sdate, edate, basis);
Basis can take the following values:
1. 'ACT/ACT' uses the actual number of days between the dates and the actual number of days in each year.
2. '30/360' assumes a 30-day month and a 360-day year.
3. 'ACT/360' takes the actual number of days and divides it by 360.
4. 'ACT/365' takes the actual number of days and divides it by 365.
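The date functions can be sketched as follows. The dates are arbitrary sample values; the results are written to the log.

```sas
data _null_;
    start = mdy(7, 25, 2008);

    d  = day(start);        /* 25   */
    m  = month(start);      /* 7    */
    y  = year(start);       /* 2008 */
    q  = qtr(start);        /* 3    */
    wd = weekday(start);    /* 6, where 1 = Sunday and 7 = Saturday */

    /* Actual days between the two dates divided by 365 */
    years = yrdif(start, mdy(1, 25, 2010), 'ACT/365');

    put start= date9. d= m= y= q= wd= years=;
run;
```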

Converting variable type

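The INPUT function is used to convert a character value to a numeric value.
General form: Numvar = INPUT(source, informat);
The variable that receives the result cannot be the variable being converted, because the type of an existing variable cannot be changed. To keep the original name, rename the source variable (for example with the RENAME= data set option), assign the converted value to a new variable with the original name, and drop the renamed variable.

The PUT function is used to convert a numeric value to a character value.
General form: Charvar = PUT(source, format);
The same rules apply to the PUT function. The format must match the type of the source, so a numeric format is used for a numeric source; the result is always character.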
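One way to keep the original variable name after a conversion is sketched below. The data sets raw and converted and their variables are hypothetical.

```sas
/* Hypothetical input: ID is stored as character */
data raw;
    input ID $ Amount;
    datalines;
00123 1500.5
00456 250
;
run;

data converted(drop = CharID);
    /* Rename the character ID so the name is free for the numeric copy */
    set raw(rename = (ID = CharID));
    ID         = input(CharID, 5.);   /* character to numeric */
    AmountText = put(Amount, 8.2);    /* numeric to character */
run;

proc contents data=converted;
run;
```

PROC CONTENTS should then list ID as numeric and AmountText as character.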

Automatic conversions

Automatic conversion from character to numeric takes place when a character value is used in
1. an assignment to a numeric variable,
2. an arithmetic operation,
3. a comparison with a numeric value, or
4. a function that takes a numeric argument.
The conversion produces a numeric missing value if the character value does not conform to standard numeric notation.

Automatic conversion from numeric to character takes place when a numeric value is used in
1. an assignment to a character variable,
2. a concatenation operation, or
3. a function that accepts character arguments.

Do loop Processing

A DO loop is used to eliminate redundant code and to perform repetitive work.
General form:
   DO index-variable = start TO stop <BY increment>;
      SAS statements
   END;

Example
data invest;
   do year = 2001 to 2003;
      Capital + 5000;
      Capital + (Capital * .075);
   end;
run;

The above code writes only the final value of Capital to the data set.
If we place an OUTPUT statement before the END of the DO loop, the value of Capital from every iteration is written to the data set.
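A minimal sketch of the same example with an OUTPUT statement inside the loop, so that each year's value is kept:

```sas
data invest;
    do Year = 2001 to 2003;
        Capital + 5000;
        Capital + (Capital * .075);
        output;    /* one observation per iteration */
    end;
run;

proc print data=invest;
run;
```

Because the step contains an explicit OUTPUT statement, SAS no longer writes an observation automatically at the end of the step, so the data set has exactly three observations (Capital = 5375, 11153.125 and 17364.609375).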

DO WHILE loop
Used for conditional iteration of a set of statements.
General form:
   DO WHILE(expression);
      SAS statements
   END;
The expression is evaluated at the top of the loop, and the statements execute only while it is true. If it is false at the start, the loop does not execute at all.

DO UNTIL loop
Used for conditional iteration of a set of statements.
General form:
   DO UNTIL(expression);
      SAS statements
   END;
The expression is evaluated at the bottom of the loop, so the statements always execute at least once, and the loop repeats until the expression becomes true.

Combining DO WHILE and DO UNTIL with the iterative DO
Adding an index variable with an upper bound guards against an infinite loop.
   DO index-variable = start TO stop <BY increment> WHILE | UNTIL (expression);
      SAS statements
   END;
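The combined form might be used as in the sketch below, which reuses the invest example. The target of 250000 and the limit of 30 years are arbitrary values chosen for illustration.

```sas
data invest;
    /* Stops when Capital passes 250000, or after 30 years at the latest */
    do Year = 1 to 30 until (Capital > 250000);
        Capital + 5000;
        Capital + (Capital * .075);
    end;
run;

proc print data=invest noobs;
run;
```

The single observation written at the end of the step shows the value of Capital when the loop stopped, together with the value of Year at that point.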

Nested Do loops

Rules for nesting DO loops:
1. Use a different index variable for each DO loop.
2. Make sure that every DO has a corresponding END.

Example
data invest;
   do Year = 1 to 5;
      Capital + 5000;
      do Quarter = 1 to 4;
         Capital + (Capital * (.075/4));
      end;
      output;
   end;
run;

SAS arrays

Creating variables with arrays

Example
data percent(drop = qtr);
   set donate;
   Total = sum(of qtr1-qtr4);
   array contrib(4) qtr1-qtr4;
   array percent(4);
   do qtr = 1 to 4;
      percent(qtr) = contrib(qtr) / total;
   end;
run;

In the above code, contrib refers to the existing variables qtr1 to qtr4, while percent has no variables listed, so SAS creates the new variables percent1 to percent4. The array variables can also be given a format when they are reported.

Example
proc print data=percent;
   var ID Percent1-Percent4;
   format Percent1-Percent4 percent6.;
run;

The PERCENTw.d format multiplies the value by 100 and adds a % sign at the end.

Assigning initial values

Example
data compare(drop = qtr goal1-goal4);
   set donate;
   array contrib(4) qtr1-qtr4;
   array diff(4);
   array goal(4) goal1-goal4 (10,15,5,10);
   do qtr = 1 to 4;
      diff(qtr) = contrib(qtr) - goal(qtr);
   end;
run;

The above code refers to the existing variables qtr1 to qtr4 through contrib, assigns initial values to the new array goal with variable names goal1 to goal4, and calculates the values of the diff array. Initial values are retained until new values are assigned. If fewer initial values are given than the array has elements, the remaining variables are set to missing.

Temporary arrays

A temporary array can be created when an array is needed only for calculation. In the previous example, goal is an intermediate array that is not required in the output data set.

For that we can use _TEMPORARY_ in place of the variable names. The array dimension must still be given.

Example
   array goal(4) _temporary_ (10,15,5,10);
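Putting the pieces together, the compare step could be written with a temporary array as sketched below. The in-line donate data is a small stand-in for the real data set, with values assumed for the example.

```sas
/* Stand-in for the donate data set */
data donate;
    input ID $ Qtr1-Qtr4;
    datalines;
E00224 12 33 22 .
E00367 35 48 40 30
;
run;

data compare(drop = Qtr);
    set donate;
    array contrib(4) Qtr1-Qtr4;
    array diff(4);
    /* Temporary array: no goal1-goal4 variables are created,
       so nothing extra has to be dropped */
    array goal(4) _temporary_ (10, 15, 5, 10);
    do Qtr = 1 to 4;
        diff(Qtr) = contrib(Qtr) - goal(Qtr);
    end;
run;

proc print data=compare;
run;
```

Compared with the earlier version, the DROP= list no longer needs goal1-goal4. Where a quarter value is missing, the corresponding diff value is missing too.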

Rotating SAS data set


Input Data Set

   ID       QTR1   QTR2   QTR3   QTR4
   E00224     12     33     22      .
   E00367     35     48     40     30

Output Data Set

   ID       QTR   Amount
   E00224     1       12
   E00224     2       33
   E00224     3       22
   E00224     4        .
   E00367     1       35
   E00367     2       48
   E00367     3       40
   E00367     4       30

SAS Program for rotation

data rotate(drop = Qtr1-Qtr4);
   set donate;
   array Contrib(4) Qtr1-Qtr4;
   do Qtr = 1 to 4;
      Amount = Contrib(Qtr);
      output;
   end;
run;

For every observation read from the donate data set, the values of Qtr1 to Qtr4 are available through Contrib. Inside the loop, these values are assigned to Amount one at a time, and on every iteration an observation is written to the output data set along with the current value of Qtr.

Conditional match merging of SAS data sets

Suppose we have two data sets: transact, which holds the week's transactions with account number, transaction type and amount as fields, and branches, which holds the account number and branch location for each account. Our objective is to create three data sets:

Newtrans, showing the week's transactions, with fields account number, transaction type, amount and branch.
Noactiv, showing accounts with no transactions this week, with fields account number and branch.
Noacct, showing transactions with no matching account number, with fields account number, transaction type and amount.

Solution
data Newtrans
     Noactiv(drop = trans amt)
     Noacct(drop = branch);
   merge transact(in = InTrans)
         branches(in = InBanks);
   by actnum;
   if InTrans and InBanks then output Newtrans;
   else if InBanks and not InTrans then output Noactiv;
   else if InTrans and not InBanks then output Noacct;
run;

Writing SQL queries in SAS data set


We can use SQL queries in SAS by enclosing them between PROC SQL; and QUIT;. When two data sets are joined with an SQL query, they need not be sorted, in contrast to the MERGE statement, where the input data sets must be sorted by the BY variable.

Example
proc sql;
   select T.Actnum, T.Trans, T.Amt, B.Branch
      from Transact T, Branches B
      where T.Actnum = B.Actnum;
quit;

No RUN statement is required for an SQL query.

SAS Macros
Macros construct input for the SAS compiler. Functions of the SAS macro processor:
- pass symbolic values between SAS statements and steps
- establish default symbolic values
- conditionally execute SAS steps
- invoke very long, complex code in a quick, short way.

Advantages of SAS macros
- substitute text in statements like TITLEs
- communicate across SAS steps
- establish default values
- conditionally execute SAS steps
- hide complex code that can be invoked easily.

Components of SAS macros

Macro variables:
- used to store and manipulate character strings
- follow SAS naming rules
- are NOT the same as DATA step variables
- are stored in memory in a macro symbol table.

Macro statements:
- begin with a % and a macro keyword and end with a semicolon (;)
- assign values, substitute values, and change macro variables
- can branch or generate SAS statements conditionally.

Automatic macro variables


Some of the automatic macro variables are

SYSDATE          Date on which the SAS session started, in DATE7. format.
SYSDAY           Day of the week on which the SAS session started.
SYSDSN/SYSLAST   Last data set built.

These are the most commonly used automatic macro variables.

Example
   footnote "this report was run on &SYSDAY, &SYSDATE";
The above code resolves to
   footnote "this report was run on Friday, 25JUL08";

Displaying macro variables


%PUT is used to display macro variables in the log.

Example
   %PUT **** SYSDAY = &SYSDAY;
   %PUT **** SYSTIME = &SYSTIME;
   %PUT **** SYSDATE = &SYSDATE;

The above code prints
   **** SYSDAY = Friday
   **** SYSTIME = 13:42
   **** SYSDATE = 25JUL08

Example of PROC CONTENTS using a macro variable
proc contents data=&SYSLAST;
   title "contents of &SYSLAST";
run;

User defined macro variables


Macro variables can be defined by using the %LET statement.
General form: %LET var_name = value;
The variable can then be referenced anywhere by prefixing its name with an ampersand (&).

Example
%LET NAME=PAYROLL;
PROC PRINT DATA=&NAME;
   TITLE "PRINT OF DATASET &NAME";
RUN;
The above code substitutes PAYROLL for &NAME in the PROC PRINT step and prints the data set.

%STR allows values that contain a semicolon (;).

Example
%LET CHART=%STR(PROC CHART;VBAR EMP;RUN;);
&CHART;

Defining and Using Macros


%MACRO and %MEND are used to define a macro. A macro is called by writing % followed by the macro name.

Example
%MACRO CHART;
   PROC CHART DATA=&NAME;
      VBAR EMP;
   RUN;
%MEND;
%CHART;

%CHART invokes the macro and runs the code inside the macro definition.

Parameterized Macro

Example
%MACRO CHART(NAME,BARVAR);
   PROC CHART DATA=&NAME;
      VBAR &BARVAR;
   RUN;
%MEND;
%CHART(PAYROLL,EMP);

The above macro resolves to
PROC CHART DATA=PAYROLL;
   VBAR EMP;
RUN;

Conditional Macro

%IF and %DO can be used inside a macro to execute a set of steps conditionally.

Example
%MACRO PTCHT(PRTCH,NAME,BARVAR);
   %IF &PRTCH=YES %THEN %DO;
      PROC PRINT DATA=&NAME;
         TITLE "PRINT OF DATASET &NAME";
      RUN;
   %END;
   PROC CHART DATA=&NAME;
      VBAR &BARVAR;
   RUN;
%MEND;
%PTCHT(YES,PAYROLL,EMP)

Transferring values between SAS steps


SYMGET and SYMPUT can be used to transfer values between DATA steps and PROC steps.

Example
%MACRO OBSCOUNT(NAME);
   DATA _NULL_;
      SET &NAME NOBS=OBSOUT;
      CALL SYMPUT('MOBSOUT',OBSOUT);
      STOP;
   RUN;
   PROC PRINT DATA=&NAME;
      TITLE "DATASET &NAME CONTAINS &MOBSOUT OBSERVATIONS";
   RUN;
%MEND;
%OBSCOUNT(PAYROLL);
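SYMGET is mentioned above but not shown, so here is a rough sketch of a round trip through a macro variable. It assumes the sample data set sashelp.class is available and uses CALL SYMPUTX, a variant of CALL SYMPUT that strips leading and trailing blanks from the stored value. The variable names are made up for the example.

```sas
data _null_;
    /* NOBS= is filled in at compile time, so the SET statement
       never has to execute; this also works for an empty data set */
    if 0 then set sashelp.class nobs=obscount;
    call symputx('mobsout', obscount);
    stop;
run;

/* The value is now available to the macro processor */
%put NOTE: sashelp.class contains &mobsout observations;

data _null_;
    /* SYMGET reads the macro variable back at execution time;
       it returns a character value, so convert it with INPUT */
    n = input(symget('mobsout'), 12.);
    put n=;
run;
```

A value created with CALL SYMPUT or CALL SYMPUTX cannot be referenced with an ampersand inside the same DATA step, because the reference is resolved before the step runs. SYMGET avoids this by fetching the value while the step executes.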

Efficiency Techniques
- Selecting observations: comparison between the IN operator, the OR operator and the WHERE statement when selecting observations.
- Reducing observation length: comparison between the SCAN and SUBSTR functions in terms of disk space usage.
- Indexing: use of an index by a WHERE statement as compared to an IF statement.
- Compressing: creating a data set from another sorted data set, depending on whether the input or the output is compressed.
- Subsetting external files: use of the IF statement at different stages while subsetting an external file.
- Concatenating data sets: comparison between simple concatenation, PROC APPEND, INSERT INTO in SQL and the OUTER UNION operator.
- Interleaving data sets: using a separate sort, a BY statement in the DATA step, and ORDER BY with a union in SQL.

Selecting Observations
When we want to test a variable against several values while subsetting, we can choose between the IN operator and a chain of OR conditions. In the timings below, the programs that use the IN operator required more CPU time, and the gap widened as the list of values grew. The difference becomes even more important when testing a huge set of records.

PROGRAM 1-A DATA PRODUCTSALES; SET DATA1.SALES; WHERE PRODUCT_ID IN ('111', '142', '152', '165', '166'); Run;

PROGRAM 1-B DATA PRODUCTSALES; SET DATA1.SALES; IF PRODUCT_ID = '111' OR PRODUCT_ID = '142' OR PRODUCT_ID = '152' OR PRODUCT_ID = '165' OR PRODUCT_ID = '166'; RUN;


PROGRAM 1-C DATA PRODUCTSALES; SET DATA1.SALES; WHERE PRODUCT_ID IN ('111', '142', '152', '165', '166', '411', '412', '417', '421', '423', '519', '525', '526', '733', '736'); RUN;

PROGRAM 1-D DATA PRODUCTSALES; SET DATA1.SALES; IF PRODUCT_ID = '111' OR PRODUCT_ID = '142' OR PRODUCT_ID = '152' OR PRODUCT_ID = '165' OR PRODUCT_ID = '166' OR PRODUCT_ID = '411' OR PRODUCT_ID = '412' OR PRODUCT_ID = '417' OR PRODUCT_ID = '421' OR PRODUCT_ID = '423' OR PRODUCT_ID = '519' OR PRODUCT_ID = '525' OR PRODUCT_ID = '526' OR PRODUCT_ID = '733' OR PRODUCT_ID = '736'; RUN;


Comparison on the basis of time

Program number   Method used and size of data   CPU time elapsed
1-A              5 values, IN operator          1.94 sec
1-B              5 values, OR operator          0.80 sec
1-C              15 values, IN operator         3.92 sec
1-D              15 values, OR operator         0.90 sec

Subsetting data in a DATA step is possible through the IF statement or the WHERE statement. Usually the WHERE statement is more efficient than the IF statement, because the IF statement is applied to data that is already in the Program Data Vector, whereas the WHERE statement is applied before the data is brought into the Program Data Vector. The following examples show this behavior.

PROGRAM 2-A DATA CLIENT; SET DATA1.CLIENT; IF LAST_NAME = 'VAN BRUSSELS'; RUN;

PROGRAM 2-B DATA CLIENT; SET DATA1.CLIENT; WHERE LAST_NAME = 'VAN BRUSSELS'; RUN;

PROGRAM 2-C DATA CLIENT; SET DATA1.CLIENT; IF SUBSTR (LAST_NAME, 1, 3) = 'VAN'; RUN;

PROGRAM 2-D DATA CLIENT; SET DATA1.CLIENT; WHERE SUBSTR (LAST_NAME, 1, 3) = 'VAN'; RUN;

PROGRAM 2-E DATA CLIENT; SET DATA1.CLIENT; WHERE LAST_NAME LIKE 'VAN%'; RUN;

There is an exception for the WHERE statement, however. The above examples show that using the SUBSTR function in a WHERE statement increases CPU time considerably compared with the corresponding IF statement. With a typical WHERE operator (LIKE), the same subset is created, but CPU time decreases and is again better than with the subsetting IF statement.

Comparison on the basis of time

Program number   Method used      CPU time elapsed (seconds)
2-A              IF               0.90
2-B              WHERE            0.07
2-C              IF SUBSTR        0.11
2-D              WHERE SUBSTR     0.22
2-E              WHERE LIKE       0.09

Reducing Observation Length


Several data manipulation functions can waste space: if no LENGTH statement defines the resulting variable, a lot of disk space might be wasted. Two examples illustrate this behavior. In the first example the variable INITIALS holds the result of concatenating two SUBSTR results, but its length equals the sum of the lengths of the contributing variables. As a result, every observation in the output table contains (length of first name + length of last name - 2) redundant blanks. If first name and last name each have a length of 20, every value of INITIALS carries 38 redundant blanks.

PROGRAM 1-A DATA CLIENT; SET DATA1.CLIENT; INITIALS = SUBSTR (FIRST_NAME, 1, 1) !! SUBSTR (LAST_NAME, 1, 1); RUN;

PROGRAM 1-B DATA CLIENT; SET DATA1.CLIENT; LENGTH INITIALS $ 2; INITIALS = SUBSTR (FIRST_NAME, 1, 1) !! SUBSTR (LAST_NAME, 1, 1); RUN;


Some functions, such as SCAN, create a result with a default length of 200 when no length has been specified for the receiving variable. The following example shows the space wasted in that case.

PROGRAM 1-C DATA CLIENT; SET DATA1.CLIENT; COUNTRY = SCAN (CLIENT_ID, 1, '-'); CITY = SCAN (CLIENT_ID, 2, '-'); NUMBER = SCAN (CLIENT_ID, 3, '-'); RUN;

PROGRAM 1-D DATA CLIENT; SET DATA1.CLIENT; LENGTH COUNTRY CITY $ 2 NUMBER $ 8; COUNTRY = SCAN (CLIENT_ID, 1, '-'); CITY = SCAN (CLIENT_ID, 2, '-'); NUMBER = SCAN (CLIENT_ID, 3, '-'); RUN;


Comparison on the basis of size

Program number   Method used       Length of variables in different cases
1-A              SUBSTR            20 + 20 = 40
1-B              SUBSTR Length     2
1-C              SCAN              3 x 200 = 600
1-D              SCAN Length       2 + 2 + 8 = 12

Indexing
Although an index can be used by a WHERE statement and not by a subsetting IF statement, we still find several programs that use an IF statement to subset an indexed table. The gain in CPU time becomes larger as the subset returned by the index becomes smaller. In the following examples, a simple index exists on each of the variables SHOP_ID and CUSTOMER_ID. SHOP_ID has only 7 distinct values, whereas CUSTOMER_ID has approximately 80,000 distinct values. Accessing the data through the index on SHOP_ID returns about 15% of the data, so there is only a small difference between the WHERE statement (using the index) and the IF statement (performing a sequential search).

PROGRAM 1-A DATA SALES_B_B; SET DATA1.SALES_INDEXED; IF SHOP_ID = 'B-B'; RUN;

PROGRAM 1-B DATA SALES_B_B; SET DATA1.SALES_INDEXED; WHERE SHOP_ID = 'B-B'; RUN;


Accessing the data through the index on CUSTOMER_ID returns less than 0.01% of the data and is extremely fast compared to the sub setting IF statement.

PROGRAM 2-A DATA SALES_12345; SET DATA1.SALES_INDEXED; IF CUSTOMER_ID = '12345'; RUN;

PROGRAM 2-B DATA SALES_12345; SET DATA1.SALES_INDEXED; WHERE CUSTOMER_ID = '12345'; RUN;

Comparison on the basis of time

Program number   Description              CPU time (seconds)
1-A              7 shops, IF              1.31
1-B              7 shops, WHERE           1.02
2-A              100.00 clients, IF       0.76
2-B              100.00 clients, WHERE    0.01

Compressing
Compression can be useful if disk space is a problem, but it must be applied selectively: both compressing and decompressing the data require CPU time. For this reason the COMPRESS = YES option should not be set in the global OPTIONS statement, where it would apply to every data set. The following examples illustrate the CPU cost of compression: an input SAS data set is sorted into an output SAS data set.

PROGRAM 1-A PROC SORT DATA = DATA1.CLIENT OUT = CLIENT; BY HOME_CITY; RUN;

PROGRAM 1-B PROC SORT DATA = DATA1.CLIENT OUT = CLIENT_COMPRESSED (COMPRESS = YES); BY HOME_CITY; RUN;

PROGRAM 1-C PROC SORT DATA = DATA1.CLIENT_COMPRESSED OUT = CLIENT; BY HOME_CITY; RUN;

PROGRAM 1-D PROC SORT DATA = DATA1.CLIENT_COMPRESSED OUT = CLIENT_COMPRESSED (COMPRESS = YES); BY HOME_CITY; RUN;

Comparison on the basis of time

Program   Description                                     CPU time (seconds)
1-A       Input not compressed, output not compressed     0.51
1-B       Input not compressed, output compressed         0.78
1-C       Input compressed, output not compressed         0.48
1-D       Input compressed, output compressed             0.80

Sub setting external files


The INPUT statement, which structures the contents of the input buffer into variables in the Program Data Vector, consumes a considerable amount of CPU time. If you only need a subset of the external file, read only the part of the input buffer needed for the subsetting condition, and read the rest of the input buffer only if the condition is met. The trailing @ in the INPUT statement holds the contents of the input buffer.

PROGRAM 1-A
DATA CLIENT; INFILE CLIENT; INPUT CLIENT_ID $ 1 - 14 LAST_NAME $ 16 - 35 FIRST_NAME $ 37 - 56 HOME_CITY $ 58 - 77 HOME_COUNTRY $ 79 - 93 ; RUN; DATA CLIENT_LONDON; SET CLIENT; IF HOME_CITY = 'LONDON'; RUN;


PROGRAM 1-B DATA CLIENT_LONDON; INFILE CLIENT; INPUT CLIENT_ID $ 1 - 14 LAST_NAME $ 16 - 35 FIRST_NAME $ 37 - 56 HOME_CITY $ 58 - 77 HOME_COUNTRY $ 79 - 93 ; IF HOME_CITY = 'LONDON'; RUN;

PROGRAM 1-C DATA CLIENT_LONDON; INFILE CLIENT; INPUT HOME_CITY $ 58 - 77 @; IF HOME_CITY = 'LONDON'; INPUT CLIENT_ID $ 1 - 14 LAST_NAME $ 16 - 35 FIRST_NAME $ 37 - 56 HOME_COUNTRY $ 79 - 93 ; RUN;


Comparison on the basis of time

Program number   Description                CPU time (min:sec)
1-A              DATA (Input), DATA (If)    4:22.80
1-B              DATA (Input If)            2:25.98
1-C              DATA (Input If Input)      0:15.91

EFFICIENTLY COMBINING DATA - CONCATENATING SAS DATA SETS


Many users are familiar with the APPEND procedure for adding a new table immediately to a master table, without reading / writing the master table. Still, they rarely code the APPEND procedure, because they are used to typing the DATA step, which is coded very fast. In the next example the traditional DATA step concatenation capabilities are compared with using the OUTER UNION CORR operator in the SQL procedure. The result can also be created using the SQL INSERT statement to add all observations of the second table to the end of the master table.
PROGRAM 1-A DATA SALES; SET SALES DATA1.SALES2003; RUN;

PROGRAM 1-B PROC APPEND BASE = SALES DATA = DATA1.SALES2003; RUN;

PROGRAM 1-C PROC SQL; INSERT INTO SALES SELECT * FROM DATA1.SALES2003; QUIT;

PROGRAM 1-D PROC SQL; CREATE TABLE SALES AS SELECT * FROM SALES OUTER UNION CORR SELECT * FROM DATA1.SALES2003; QUIT;


Comparison on the basis of time

Program number   Description               CPU time (seconds)
1-A              DATA (Set)                1.65
1-B              Append                    0.11
1-C              SQL (Insert into)         0.59
1-D              SQL (Outer Union Corr)    3.98

Interleaving dataset
You can concatenate two sorted input SAS data sets into a sorted result in several ways. The following example compares the traditional DATA step followed by a SORT procedure with a BY statement immediately specified in the DATA step and with the OUTER UNION CORR operator with an ORDER BY clause in the SQL procedure. As expected the SQL procedure requires more CPU time than the DATA step.
PROGRAM 1-A DATA SALES; SET DATA1.SALES_B DATA1.SALES_NL; RUN; PROC SORT DATA = SALES; BY SALES_DATE; RUN;

PROGRAM 1-B DATA SALES; SET DATA1.SALES_B DATA1.SALES_NL; BY SALES_DATE; RUN;

PROGRAM 1-C PROC SQL; CREATE TABLE SALES AS SELECT * FROM DATA1.SALES_B OUTER UNION CORR SELECT * FROM DATA1.SALES_NL ORDER BY SALES_DATE; QUIT;


Comparison on the basis of time

Program number   Description                          CPU time (seconds)
1-A              DATA (Set) - Sort                    6.15
1-B              DATA (Set By)                        2.10
1-C              SQL (Outer Union Corr Order By)      11.32
