Copyright 2012, SAS Institute Inc. All rights reserved.


Innovative Techniques: Doing More with Loops and Arrays

SAS Talks
April 12, 2012

Speakers
Stacy Hobson
Director, Customer Loyalty and Retention
SAS Institute

Art Carpenter
Author and Senior Consultant
California Occidental Consultants

Innovative Techniques:
Doing More with
Loops and Arrays
Arthur L. Carpenter
California Occidental Consultants
10606 Ketch Circle
Anchorage, AK 99515
(907) 865-9167
art@caloxy.com
www.caloxy.com
INTRODUCTION
Techniques involving DO loops and arrays give the programmer
powerful and diverse methods for solving a wide range of problems.

This presentation assumes that the audience has at least a
passing understanding of:
  the types of DO loops and their basic syntax
  the ARRAY statement and its various forms

The primary objective of this presentation is to demonstrate the
interaction of DO loops and arrays.

Shorthand Variable Naming
(2.6.1)
Numbered range lists name variables with a common prefix and a
numeric suffix. This form can create variables.
array vis {10} visit1 - visit10;
The colon operator can be used for named prefix lists.
array vis {10} visit:;

Shorthand Variable Naming
(2.6.1)
Specialized name lists address variables by their type.

_CHARACTER_   All character variables
_NUMERIC_     All numeric variables
_ALL_         All variables on the PDV

Since each of these lists pertains to the current list of
variables, they will not create variables. In each case
the resulting list of variables will be in the same order
as they are on the Program Data Vector.

DO Loop Specifications
(3.9.2)
Compound loop specifications
do count=1 to 3, 5 to 20 by 5, 26, 33; . . . end;
COUNT: 1, 2, 3, 5, 10, 15, 20, 26, 33
Character index value specifications
do month = 'Jan', 'Feb', 'Mar'; . . . end;

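A minimal sketch of the compound specification, assuming only a DATA _NULL_ step that writes each value of COUNT to the log. The PUT statement is added for illustration and is not part of the slide.

```sas
data _null_;
   /* comma-separated specifications are processed left to right */
   do count = 1 to 3, 5 to 20 by 5, 26, 33;
      put count=;   /* writes count=1, count=2, ... to the log */
   end;
run;
```

The log should list the same nine values shown for COUNT on the slide.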
Special DO Loop Forms
(3.9.3)
do count=1 to 3; . . . end;
COUNT exits the loop with a value of COUNT=4
do count=1 to 3 until(count=3); . . . end;
COUNT exits the loop with a value of COUNT=3
Incremented infinite loop with conditional exit
do k=1 by 1 until(x=5); . . . end;
Notice there is no TO specification

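The exit value of the index variable can be checked directly. The sketch below, which assumes nothing beyond a DATA _NULL_ step, runs both loop forms with empty bodies and writes COUNT to the log after each one.

```sas
data _null_;
   /* iterative DO only: the index is incremented before the stop test */
   do count = 1 to 3;
   end;
   put 'After DO 1 TO 3:                 ' count=;

   /* iterative DO with UNTIL: the condition is tested at the bottom of the loop */
   do count = 1 to 3 until(count = 3);
   end;
   put 'After DO 1 TO 3 UNTIL(COUNT=3):  ' count=;
run;
```

Comparing the two log lines shows whether the UNTIL clause ended the loop before the final increment.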
ARRAY Statement Forms
(3.10.1)
array list {3} aa bb cc;               /* array dimension = number of elements */
array list {1:3} aa bb cc;             /* these two array dimensions are the same */
array list {0:2} aa bb cc;             /* index range: LIST{1} references the variable BB */
array vis {16} visit1-visit16;
array vis {*} visit1-visit16;          /* dimension is calculated by SAS */
array visit {16} ;
array nvar {*} _numeric_;
array nvar {*} _character_;
array clist {3} $2 aa bb cc;
array clist {3} $1 ('a', 'b', 'c');    /* no variable list means that variables are created:
                                          CLIST1, CLIST2, CLIST3. The three array elements are
                                          initialized with these three character values. */
array clist {4:6} $1 ('a', 'b', 'c');

Temporary Arrays
(3.10.2)
No permanent variables are created.
array visdate {16} _temporary_;                /* the dimension must be specified */

array list {5} _temporary_ (11,12,13,14,15);   /* initial values are assigned as a list
                                                  inside of parentheses */
array list {5} _temporary_ (11:15);

array list {6} _temporary_ (6*3);              /* (3,3,3,3,3,3) */

array list {6} _temporary_ (2*1:3);            /* (1,2,3,1,2,3) */

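A small sketch for inspecting the initial values of temporary arrays in the log. The array names VALS and THREES and the variable VALUE are hypothetical; each array needs its own name within a single step. Only the comma list and the iteration factor are used here.

```sas
data _null_;
   /* hypothetical array names, chosen for this illustration */
   array vals   {5} _temporary_ (11,12,13,14,15);
   array threes {6} _temporary_ (6*3);

   do i = 1 to dim(vals);
      value = vals{i};          /* copy the element so it can be written to the log */
      put 'VALS   ' i= value=;
   end;
   do i = 1 to dim(threes);
      value = threes{i};
      put 'THREES ' i= value=;
   end;
run;
```

Because the arrays are temporary, no variables for their elements appear on the PDV; only I and VALUE do.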
TRANSPOSING DATA
(2.4)
This task is a simple introduction to the relationship between the
DO loop and the array defined by the ARRAY statement.

Most, but not all, SAS procedures prefer to operate against
normalized data, which tends to be tall and narrow, and often
contains classification variables that are used to identify
individual rows.

Data in non-normal form tends to have one column for each level
of one of the classification variables.

Converting between the two forms can be accomplished using
PROC TRANSPOSE or from within the DATA step.

This is one of the 'classic' uses of arrays in conjunction with DO
loops.

Normal Form

Obs  SUBJECT  VISIT  sodium
1    208      1      13.7
2    208      2      14.1
3    208      4      14.1
4    208      5      14.1
5    208      6      13.9
6    208      7      13.9
7    208      8      14.0
8    208      9      14.0
9    208      10     14.0
10   209      1      14.0
. . . . portions of the table are not shown . . . .

Transposing Rows to Columns
(2.4.2)
Normal Form

Obs  SUBJECT  VISIT  sodium
1    208      1      13.7
2    208      2      14.1
3    208      4      14.1
4    208      5      14.1
5    208      6      13.9
6    208      7      13.9
7    208      8      14.0
8    208      9      14.0
9    208      10     14.0
10   209      1      14.0
. . . . portions of the table are not shown . . . .

Non-Normal Form

Obs  SUBJECT  visit1  visit2  visit3  visit4  visit5  visit6  visit7  visit8  visit9  visit10  visit11  visit12  visit13  visit14  visit15  visit16
1    208      13.7    14.1    .       14.1    14.1    13.9    13.9    14      14      14.0     .        .        .        .        .        .
2    209      14.0    14.0    .       13.9    14.2    14.5    13.8    14      .       13.8     14       14.1     14.2     14.1     14       14.1
. . . . portions of the table are not shown . . . .

Transposing Rows to Columns
(2.4.2)
Commonly the process of transposing will involve the
use of an array and an iterative DO loop.
data lab_nonnormal(keep=subject visit1-visit16);
   set lab_chemistry(keep=subject visit sodium);
   by subject;
   retain visit1-visit16 ;
   array visits {16} visit1-visit16;            /* VISITS will become columns */
   if first.subject then do i = 1 to 16;
      visits{i} = .;
   end;
   visits{visit} = sodium;                      /* assign the value */
   if last.subject then output lab_nonnormal;
run;
SUBJECT remains as a CLASS variable.

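The same pattern can be tried on a few rows without the course data. This sketch assumes a hypothetical data set LAB holding four of the sodium values from the table above, a hypothetical output data set WIDE, and only four visits. It uses CALL MISSING instead of the clearing loop.

```sas
/* hypothetical test data: four rows taken from the normal-form table */
data lab;
   input subject visit sodium;
   datalines;
208 1 13.7
208 2 14.1
208 4 14.1
209 1 14.0
;
run;

data wide(keep=subject visit1-visit4);
   set lab;
   by subject;                                   /* LAB is already in SUBJECT order */
   array visits {4} visit1-visit4;
   retain visit1-visit4;                         /* hold values across the subject's rows */
   if first.subject then call missing(of visits{*});
   visits{visit} = sodium;                       /* VISIT selects the column */
   if last.subject then output;                  /* one row per subject */
run;
```

Subject 208 has no row for visit 3, so VISIT3 stays missing in the output row for that subject.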
Transposing Columns to Rows
(2.4.2)
Assume that the visits are to become the classification
variable.
data lab_normal(keep=subject visit sodium);
   set lab_nonnormal(keep=subject visit:);   /* VISITS exist as columns */
   by subject;
   array visits {16} visit1-visit16;
   do visit = 1 to 16;
      sodium = visits{visit};                /* use visit number as the array index */
      output lab_normal;                     /* the OUTPUT statement is inside of the DO loop */
   end;
run;

Transposing Columns to Rows
(2.4.2)
Obs  SUBJECT  visit1  visit2  visit3  visit4  visit5  visit6  visit7  visit8  visit9  visit10  visit11  visit12  visit13  visit14  visit15  visit16
1    208      13.7    14.1    .       14.1    14.1    13.9    13.9    14      14      14.0     .        .        .        .        .        .
2    209      14.0    14.0    .       13.9    14.2    14.5    13.8    14      .       13.8     14       14.1     14.2     14.1     14       14.1
. . . . portions of the table are not shown . . . .

Obs  SUBJECT  visit  sodium
1    208      1      13.7
2    208      2      14.1
3    208      3      .
4    208      4      14.1
5    208      5      14.1
6    208      6      13.9
7    208      7      13.9
8    208      8      14.0
9    208      9      14.0
10   208      10     14.0
11   208      11     .
12   208      12     .
13   208      13     .
14   208      14     .
15   208      15     .
16   208      16     .
17   209      1      14.0
. . . . portions of the table are not shown . . . .

ARRAY FUNCTIONS
(3.10.3)
Three functions have been designed to work with arrays.
DIM returns the dimension of the array.
data newchem(drop=i);
   set advrpt.lab_chemistry
       (drop=visit labdt);
   array chem {*} _numeric_;   /* the number of numeric variables determines the array dimension */
   do i=1 to dim(chem);        /* the DIM function returns the dimension */
      chem{i} = chem{i}/100;
   end;
run;
Although it does in this example, the index does not always vary
from 1 to the dimension.

ARRAY FUNCTIONS
(3.10.3)
LBOUND and HBOUND return the lower and upper array bounds.
data CloseHT;
   array heights {&lb:&hb} _temporary_;      /* array bounds unknown to the programmer */
   do until(done);                           /* the array HEIGHTS is loaded */
      set advrpt.demog(keep=subject ht) end=done;
      heights(subject)=ht;                   /* notice the use of ( ) instead of { }; don't use ( ) */
   end;
   done=0;
   do until(done);                           /* read the data again */
      set advrpt.demog(keep=subject ht) end=done;
      do Hsubj = lbound(heights) to hbound(heights);   /* step through the array one element at a time */
         closeHT = heights{hsubj};
         if (ht-1 le closeht le ht+1)
          & (subject ne hsubj) then output closeHT;
      end;
   end;
   stop;
run;

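A short sketch of the three array functions on an array whose index does not start at 1. The array YR, its bounds, its values, and the variables N, LO, HI and VALUE are all hypothetical.

```sas
data _null_;
   /* hypothetical array indexed by year rather than by 1..n */
   array yr {2010:2012} _temporary_ (5, 7, 9);

   n  = dim(yr);       /* number of elements */
   lo = lbound(yr);    /* lowest valid index */
   hi = hbound(yr);    /* highest valid index */
   put n= lo= hi=;     /* n=3 lo=2010 hi=2012 */

   /* looping from LBOUND to HBOUND works whatever the bounds are */
   do i = lbound(yr) to hbound(yr);
      value = yr{i};
      put i= value=;
   end;
run;
```

A loop written as DO I = 1 TO DIM(YR) would fail here with a subscript out of range error, which is why LBOUND and HBOUND are the safer loop limits.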
WORKING ACROSS OBSERVATIONS

Because SAS reads one observation at a time into the PDV, it is
difficult to remember the values from an earlier observation
(look-back) or to anticipate the values of a future observation
(look-ahead).

Without doing something extra, only the current observation is
available for use.
(3.1)
Processing Within Groups
The problems inherent with single observation processing are
especially apparent when we need to work with our data in
groups.

The BY statement can be used to define groups, but the
detection and handling of group boundaries is still an issue.

Fortunately there is more than one approach to this type of
processing.
(3.1.1)
BY Group Processing; First. and Last.
(3.1.1)
Count clinics and patient visits within each region.
data counter(keep=region clincnt patcnt);
   set regions(keep=region clinnum);
   by region clinnum;
   if first.region then do;             /* initialize the counters */
      clincnt=0;
      patcnt=0;
   end;

   if first.clinnum then clincnt + 1;   /* count items of interest (increment counters) */
   patcnt+1;

   if last.region then output;          /* write totals */
run;
The clinic counter can also be written without the IF:
clincnt + first.clinnum;

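A self-contained sketch of the counting step, using the form without the IF. The data set VISITS and its six rows are made up for illustration; the totals go to the log rather than to an output data set.

```sas
/* hypothetical test data, already sorted by REGION and CLINNUM */
data visits;
   input region $ clinnum $;
   datalines;
1 A
1 A
1 B
2 C
2 C
2 C
;
run;

data _null_;
   set visits;
   by region clinnum;
   if first.region then do;     /* reset the counters for each region */
      clincnt = 0;
      patcnt  = 0;
   end;
   clincnt + first.clinnum;     /* FIRST.CLINNUM is 1 on the first row of a clinic, else 0 */
   patcnt + 1;                  /* one row per patient visit */
   if last.region then put region= clincnt= patcnt=;
run;
```

With these rows the log shows clincnt=2 patcnt=3 for region 1 and clincnt=1 patcnt=3 for region 2.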
Transposing to Temporary Arrays
(3.1.1)
Used to handle more than one observation at a time.
data labvisits(keep=subject count meanlength);
   set advrpt.lab_chemistry;
   by subject;

   array Vdate {16} _temporary_;   /* temporary array to hold values of interest */
   retain totaldays count 0;

   if first.subject then do;       /* initialize the counters */
      totaldays=0;
      count = 0;
      do i = 1 to 16;
         vdate{i}=.;
      end;
   end;

Transposing to Temporary Arrays
(3.1.2)
Used to handle more than one observation at a time.
   vdate{visit} = labdt;                    /* load the array */

   if last.subject then do;
      do i = 1 to 15;                       /* process across values */
         between = vdate{i+1}-vdate{i};
         if between ne . then do;
            totaldays = totaldays+between;
            count = count+1;
         end;
      end;
      meanlength = totaldays/count;
      output;                               /* write the mean for this subject */
   end;
run;

Transposing to Arrays
(3.1.2)
Code commonly used to clear an array of values:
do i = 1 to 16;
   vdate{i}=.;
end;
The MISSING routine assigns missing values to its arguments:
call missing(of vdate{*});

Building a FIFO Stack
(3.1.7)
When processing across a series of observations for the
calculation of statistics, such as running averages, a stack can be
helpful.

A stack is a collection of values that have automatic entrance
and exit rules.

Values tend to rotate through a stack.

Stacks come in two basic flavors: First-In-First-Out (FIFO) and
Last-In-First-Out (LIFO).

In a FIFO stack the oldest value in the stack is removed to make
room for the newest value.

Building a FIFO Stack
(3.1.7)
A three day moving average of potassium levels is to be
calculated for each subject.
data Average(keep=subject visit labdt
                  potassium Avg3day);
   set labdates;
   by subject;

   * dimension of array is number of
   * items to be averaged;
   retain visitcnt .;
   array stack {0:2} _temporary_;   /* index the array starting at 0 */

Building a FIFO Stack
(3.1.7)
A three day moving average of potassium levels is to be
calculated for each subject.
   if first.subject then do;
      call missing(of stack{*});   /* clear the stack for each subject */
      visitcnt=0;
   end;
   visitcnt+1;
   index = mod(visitcnt,3);        /* the MOD function determines the array INDEX;
                                      3 is the number of items in the stack */
   stack{index} = potassium;
   avg3day = mean(of stack{*});
run;

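The rotation through the stack comes entirely from the MOD function. This sketch, which assumes seven visits, writes the index that each visit would use.

```sas
data _null_;
   do visitcnt = 1 to 7;
      index = mod(visitcnt, 3);   /* remainder after dividing by the stack size */
      put visitcnt= index=;
   end;
run;
```

The index cycles 1, 2, 0, 1, 2, 0, 1, so the fourth value overwrites the first one. This is why the STACK array is declared with bounds {0:2}.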
Using SET Statement Options
(3.8)
Although a majority of DATA steps use the SET statement, few
programmers take advantage of its full potential. The SET
statement has options that can be used to control how the data
are to be read.

END=       used to detect the last observation from the
           incoming data set(s)
KEY=       specifies an index to be used when reading
INDSNAME=  used to identify the current data source
NOBS=      number of observations
OPEN=      determines when to open a data set
POINT=     designates the next observation to read
UNIQUE     used with KEY= to read from the top of the index

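As one illustration of these options, the sketch below combines INDSNAME= and END= when two data sets are concatenated. The data sets A and B and the variables X, SRC, SOURCE and LAST are hypothetical.

```sas
/* two hypothetical one-row data sets */
data a; x = 1; run;
data b; x = 2; run;

data both;
   length source $41;                   /* room for libref.member */
   set a b indsname=src end=last;
   source = src;                        /* SRC is temporary, so copy it to keep it */
   put x= source=;
   if last then put 'Last observation has been read.';
run;
```

The variables named on INDSNAME= and END= are not written to the output data set, so any value worth keeping has to be copied to an ordinary variable.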
Using POINT= and NOBS=
The SET statement by default reads one observation after
another, first observation to last. The POINT= option makes it
possible to perform a non-sequential read.

The POINT= option identifies a temporary variable that indicates
the number of the next observation to read. The NOBS= option
also identifies a temporary variable, which after DATA step
compilation will hold the number of observations on the incoming
data set.

(3.8.1)
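A minimal sketch of a non-sequential read, assuming a hypothetical five-row data set NUMS. It reads the observations from last to first. The names PTR and TOTAL are illustrative.

```sas
/* hypothetical test data: N = 1 to 5 */
data nums;
   do n = 1 to 5;
      output;
   end;
run;

data reversed;
   /* TOTAL is assigned during compilation, so it can be used before SET executes */
   do ptr = total to 1 by -1;
      set nums point=ptr nobs=total;   /* PTR names the next observation to read */
      put n=;
      output;
   end;
   stop;                               /* POINT= never reaches end of file on its own */
run;
```

The log shows N running from 5 down to 1. Neither PTR nor TOTAL is written to REVERSED.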
Using POINT= and NOBS=
(3.8.1)
Randomly read a subset of observations from &DSN.
%macro rand_wo(dsn=,pcnt=0);
data rand_wo(drop=cnt totl);
   totl = ceil(&pcnt*obscnt);               /* total observations to read */
   array obsno {10000} _temporary_;

   do until(cnt = totl);
      point = ceil(ranuni(0)*obscnt);       /* randomly select the next observation to read */
      if obsno{point} ne 1 then do;
         set &dsn point=point nobs=obscnt;  /* the value of POINT is the next observation to read */
         output;
         obsno{point}=1;                    /* flag this observation, it has now been read */
         cnt+1;
      end;
   end;
   stop;
run;
%mend rand_wo;
%rand_wo(dsn=advrpt.demog,pcnt=.3)

Using the DOW Loop
(3.9.1)
The DOW loop, which is also known as the DO-W loop, was
named for Ian Whitlock who popularized the technique and first
demonstrated its efficiencies.
data implied;
   set big;
   output implied;
run;
The DATA step has an implied loop. The loop is executed once for
each incoming observation.
data dowloop;
   do until(eof);        /* the implicit loop is replaced with an explicit one (here a DO UNTIL) */
      set big end=eof;
      output dowloop;
   end;
   stop;                 /* terminate the step with a STOP */
run;

Using the DOW Loop
(3.9.1)
Merge a single observation summary data set onto the analysis
data. Calculate percent change.
proc summary data=advrpt.demog;
   var wt;
   output out=means mean=/autoname;
run;
data Diff1;
   if _n_=1 then set means(keep=wt_mean);   /* use the IF to control reading the summary data set */
   set advrpt.demog(keep=lname fname wt);
   diff = (wt-wt_mean)/wt_mean;
run;

Using the DOW Loop
(3.9.1)
Merge a single observation summary data set onto the analysis
data. Calculate percent change, using a DOW loop to merge.
data Diff2;
   set means(keep=wt_mean);                 /* the IF is not needed */
   do until(eof);
      set advrpt.demog(keep=lname fname wt)
          end=eof;
      diff = (wt-wt_mean)/wt_mean;
      output diff2;
   end;
   stop;                                    /* terminate the DATA step */
run;

Key Indexing: A Simple Hash
(6.7.2)
Use an array to hold values. Neither data set needs to be sorted.
data clinnames(keep=subject lname fname clinnum clinname);
   array chkname {999999} $35 _temporary_;
   do until(allnames);                         /* read and store the names, indexed by clinic number */
      set advrpt.clinicnames end=allnames;
      chkname{input(clinnum,6.)}=clinname;
   end;
   do until(alldemog);
      set advrpt.demog(keep=subject lname fname clinnum)
          end=alldemog;
      clinname = chkname{input(clinnum,6.)};   /* read a number and retrieve its corresponding name
                                                  from the array */
      output clinnames;
   end;
   stop;
run;
The same join written as a MERGE, which needs both data sets in
CLINNUM order:
data clinnames;
   merge demog
         clinicnames;
   by clinnum;
run;

Use a Hash Object
(6.7.2)
Hash objects also avoid the need to sort.
data hashnames(keep=subject clinnum clinname lname fname);
   if 0 then set advrpt.clinicnames;
   declare hash lookup(dataset: 'advrpt.clinicnames',   /* load the names in the LOOKUP hash object, */
                       hashexp: 8);                     /* indexed by clinic number */
   lookup.defineKey('clinnum');
   lookup.defineData('clinname');
   lookup.defineDone();

   * Read the primary data;
   do until(done);
      set advrpt.demog(keep=subject clinnum lname fname)
          end=done;
      if lookup.find() = 0 then output hashnames;       /* for each number retrieve the name */
   end;
   stop;
run;
The same join written as a MERGE, which needs both data sets in
CLINNUM order:
data hashnames;
   merge demog
         clinicnames;
   by clinnum;
run;

Many of the examples in this talk are based on material found in
the new SAS Press book Carpenter's Guide to Innovative SAS
Techniques by Art Carpenter, and are used with permission of the
author.

Q & A
Additional Resources
Carpenter's Guide to Innovative SAS Techniques

Upcoming Live Webinars
April 25: Live at SAS Global Forum: Building Better Business
Intelligence with SAS
May 10: Social Networks in Data Mining: Challenges and Applications

Upcoming Live Events
2012 SAS Global Forum livestream of the conference,
April 22-25, 2012

SAS Talks on support.sas.com

Follow along on Twitter using #sastalks

support.sas.com
