SUGI 28 Advanced Tutorials

Paper 19-28

Undocumented and Hard-to-find SQL Features

Kirk Paul Lafler, Software Intelligence Corporation

Abstract
The SQL Procedure contains many powerful and elegant language
features for advanced SQL users. This paper presents SQL topics that
will help programmers unlock the many hidden features, options, and
other hard-to-find gems found in the SQL universe. Topics include CASE
logic; the COALESCE function; SQL statement options _METHOD, _TREE,
and other useful options; dictionary tables; automatic macro
variables; and performance issues.

Finding the First Non-Missing Value
The SQL procedure provides a way to find the first non-missing value
in a column or list. Specified in a SELECT statement, the COALESCE
function inspects a column, or in the case of a list scans the
arguments from left to right, and returns the first non-missing or
non-NULL value. If all values are missing, the result is missing.

When coding the COALESCE function, all arguments must be of the same
data type. The example shows one approach to reporting the number of
minutes for each movie in the MOVIES table. In the event the LENGTH
column contains a missing value, a zero is assigned to prevent the
propagation of missing values.

SQL Code

PROC SQL;
  SELECT TITLE,
         RATING,
         (COALESCE(LENGTH, 0))
           AS Tot_Length
    FROM MOVIES;
QUIT;

Results

The SAS System

Title                       Rating   Tot_Length

Brave Heart                 R        177
Casablanca                  PG       103
Christmas Vacation          PG-13     97
Coming to America           R        116
Dracula                     R        130
Dressed to Kill             R        105
Forrest Gump                PG-13    142
Ghost                       PG-13    127
Jaws                        PG       125
Jurassic Park               PG-13    127
Lethal Weapon               R        110
Michael                     PG-13    106
National Lampoon's Vacat    PG-13     98
Poltergeist                 PG       115
Rocky                       PG       120
Scarface                    R        170
Silence of the Lambs        R        118
Star Wars                   PG       124
The Hunt for Red October    PG       135
The Terminator              R        108
The Wizard of Oz            G        101
Titanic                     PG-13    194

Summarizing data
Although the SQL procedure is frequently used to display or extract
detailed information from tables in a database, it is also a wonderful
tool for summarizing (or aggregating) data. By constructing simple
queries, data can be summarized down rows (observations) as well as
across columns (variables). This flexibility gives SAS users an
incredible range of power, and the ability to take advantage of
several SAS-supplied (or built-in) summary functions. For example, it
may be more interesting to see the average of some quantities rather
than the set of all quantities.

Without the ability to summarize data in SQL, users would be forced to
write complicated formulas and/or routines, or even write and test
DATA step programs to summarize data. To see how an SQL query can be
constructed to summarize data, two examples will be illustrated:
1) Summarizing data down rows and 2) Summarizing data across columns.

1. Summarizing data down rows
The first example shows a single aggregate result value being produced
when movie-related data is summarized down rows (or observations). The
advantage of using a summary function in SQL is that it will generally
compute the aggregate quicker than if a user-defined equation were
constructed, and it saves the effort of having to construct and test a
program containing the user-defined equation in the first place.
Suppose you wanted to know the average length of all PG and PG-13
movies in a database table containing a variety of movie categories.
The following query computes the average movie length and produces a
single aggregate value using the AVG function.

SQL Code

PROC SQL;
  SELECT AVG(LENGTH) AS Average_Movie_Length
    FROM MOVIES
      WHERE RATING IN ('PG', 'PG-13');
QUIT;

The result from executing this query shows that the average movie
length rounded to the hundredths position is 124.08 minutes.

Results

    Average_
Movie_Length
    124.0769

2. Summarizing data across columns
Being able to summarize data across columns often comes in handy when
a computation is required on two or more columns in each row. Suppose
you wanted to know the difference in minutes between each PG and PG-13
movie's running length with trailers (add-on specials for your viewing
pleasure) and without trailers.

SQL Code

PROC SQL;
  SELECT TITLE,
         RANGE(LENGTH_TRAIL, LENGTH) AS Extra_Minutes
    FROM MOVIES
      WHERE RATING IN ('PG', 'PG-13');
QUIT;

This query computes the difference in minutes between each movie's
length with trailers (LENGTH_TRAIL) and without trailers (LENGTH) and,
once computed, displays the range value for each row as Extra_Minutes.

Results

                               Extra_
Title                          Minutes
Casablanca                           0
Jaws                                 0
Rocky                                0
Star Wars                            0
Poltergeist                          0
The Hunt for Red October            15
National Lampoon's Vacation          7
Christmas Vacation                   6
Ghost                                0
Jurassic Park                       33
Forrest Gump                         0
Michael                              0
Titanic                             36

Case Logic
In the SQL procedure, a case expression provides a way of
conditionally selecting result values from each row in a table (or
view). Similar to an IF-THEN construct, a case expression uses a
WHEN-THEN clause to conditionally process some but not all the rows in
a table. An optional ELSE expression can be specified to handle an
alternative action should none of the expression(s) identified in the
WHEN condition(s) be satisfied.

A case expression must be a valid SQL expression and conform to syntax
rules similar to DATA step SELECT-WHEN statements. Even though this
topic is best explained by example, let's take a quick look at the
syntax.

CASE <column-name>
  WHEN when-condition THEN result-expression
  <WHEN when-condition THEN result-expression>
  <ELSE result-expression>
END

A column-name can optionally be specified as part of the
CASE-expression. If present, it is automatically made available to
each when-condition. When it is not specified, the column-name must be
coded in each when-condition. Let's examine how a case expression
works.

If a when-condition is satisfied by a row in a table (or view), then
it is considered true and the result-expression following the THEN
keyword is processed. The remaining WHEN conditions in the CASE
expression are skipped. If a when-condition is false, the next
when-condition is evaluated. SQL evaluates each when-condition until a
true condition is found or, in the event all when-conditions are
false, it then executes the ELSE expression and assigns its value to
the CASE expression's result. A missing value is assigned to a CASE
expression when an ELSE expression is not specified and each
when-condition is false.

In the next example, let's see how a case expression actually works.
Suppose a value of Short, Medium, or Long is desired for each of the
movies. Using the movie's length (LENGTH) column, a CASE expression is
constructed to assign one of the desired values in a unique column
called M_Length for each row of data. A value of Short is assigned to
the movies that are shorter than 120 minutes long, Long for movies
longer than 160 minutes long, and Medium for all other movies. A
column heading of M_Length is assigned to the new derived output
column using the AS keyword.

SQL Code

PROC SQL;
  SELECT TITLE,
         LENGTH,
         CASE
           WHEN LENGTH < 120 THEN 'Short'
           WHEN LENGTH > 160 THEN 'Long'
           ELSE 'Medium'
         END AS M_Length
    FROM MOVIES;
QUIT;

Results

The SAS System

Title                          Length   M_Length

Brave Heart                       177   Long
Casablanca                        103   Short
Christmas Vacation                 97   Short
Coming to America                 116   Short
Dracula                           130   Medium
Dressed to Kill                   105   Short
Forrest Gump                      142   Medium
Ghost                             127   Medium
Jaws                              125   Medium
Jurassic Park                     127   Medium
Lethal Weapon                     110   Short
Michael                           106   Short
National Lampoon's Vacation        98   Short
Poltergeist                       115   Short
Rocky                             120   Medium
Scarface                          170   Long
Silence of the Lambs              118   Short
Star Wars                         124   Medium
The Hunt for Red October          135   Medium
The Terminator                    108   Short
The Wizard of Oz                  101   Short
Titanic                           194   Long

In another example, suppose we wanted to determine the audience level
(general or adult audiences) for each movie. By using the RATING
column we can assign a descriptive value with a simple CASE
expression, as follows.

SQL Code

PROC SQL;
  SELECT TITLE,
         RATING,
         CASE RATING
           WHEN 'G' THEN 'General'
           ELSE 'Other'
         END AS Aud_Level
    FROM MOVIES;
QUIT;

Results

The SAS System

Title                       Rating   Aud_Level

Brave Heart                 R        Other
Casablanca                  PG       Other
Christmas Vacation          PG-13    Other
Coming to America           R        Other
Dracula                     R        Other
Dressed to Kill             R        Other
Forrest Gump                PG-13    Other
Ghost                       PG-13    Other
Jaws                        PG       Other
Jurassic Park               PG-13    Other
Lethal Weapon               R        Other
Michael                     PG-13    Other
National Lampoon's Vacat    PG-13    Other
Poltergeist                 PG       Other
Rocky                       PG       Other
Scarface                    R        Other
Silence of the Lambs        R        Other
Star Wars                   PG       Other
The Hunt for Red October    PG       Other
The Terminator              R        Other
The Wizard of Oz            G        General
Titanic                     PG-13    Other

SQL and the Macro Language
Many software vendors' SQL implementations permit SQL to be interfaced
with a host language. The SAS System's SQL implementation is no
different. The SAS Macro Language lets you customize the way the SAS
software behaves, and in particular extend the capabilities of the SQL
procedure. SQL users can apply the macro facility's many powerful
features by interfacing PROC SQL with the macro language to provide a
wealth of programming opportunities.

From creating and using user-defined macro variables and automatic
(SAS-supplied) variables, reducing redundant code, performing common
and repetitive tasks, to building powerful and simple macro
applications, SQL can be integrated with the macro language to improve
programmer efficiency. The best part is that you do not have to be a
macro language heavyweight to begin reaping the rewards of this
versatile interface between two powerful Base-SAS software languages.

Creating a Macro Variable with Aggregate Functions
Turning data into information, and then saving the results as macro
variables, is easy with summary (aggregate) functions. The SQL
procedure provides a number of useful summary functions to help
perform calculations, descriptive statistics, and other aggregating
computations in a SELECT statement or HAVING clause. These functions
are designed to summarize information and not display detail about
data. In the next example, the MIN summary function is used to
determine the shortest movie length in the MOVIES table, with the
value stored in the macro variable MIN_LENGTH using the INTO clause.
The results are displayed on the SAS log.

SQL Code

PROC SQL NOPRINT;
  SELECT MIN(LENGTH)
    INTO :MIN_LENGTH
      FROM MOVIES;
QUIT;
%PUT &MIN_LENGTH;

SAS Log Results

  PROC SQL NOPRINT;
    SELECT MIN(LENGTH)
      INTO :MIN_LENGTH
        FROM MOVIES;
  QUIT;
NOTE: PROCEDURE SQL used:
      real time 0.00 seconds

  %PUT &MIN_LENGTH;
97

Building Macro Tools
The Macro Facility, combined with the capabilities of the SQL
procedure, enables the creation of versatile macro tools and
general-purpose applications. A principal design goal when developing
user-written macros should be that they are useful and simple to use.
A macro that violates this tenet, whether through little applicability
to user needs or through complicated and hard-to-remember macro
variable names, is usually avoided.

As tools, macros should be designed to serve the needs of as many
users as possible. They should contain no ambiguities, consist of
distinctive macro variable names, avoid the possibility of naming
conflicts between macro variables and data set variables, and not try
to do too many things. This utilitarian approach to macro design helps
gain the widespread approval and acceptance of users.

Column cross-reference listings come in handy when you need to quickly
identify all the SAS library data sets a column is defined in. Using
the COLUMNS dictionary table, a macro can be created that captures
column-level information including column name, type, length,
position, label, format, informat, indexes, as well as a
cross-reference listing containing the location of a column within a
designated SAS library. In the next example, macro COLUMNS consists of
an SQL query that accesses any single column in a SAS library. If the
macro was invoked with a user-request consisting of
%COLUMNS(PATH,TITLE);, the macro would produce a cross-reference
listing on the user library PATH for the column TITLE in all DATA
types.

SQL Code

%MACRO COLUMNS(LIB, COLNAME);
  PROC SQL;
    SELECT LIBNAME, MEMNAME
      FROM DICTIONARY.COLUMNS
        WHERE UPCASE(LIBNAME)="&LIB" AND
              UPCASE(NAME)="&COLNAME" AND
              UPCASE(MEMTYPE)="DATA";
  QUIT;
%MEND COLUMNS;
%COLUMNS(PATH,TITLE);

Results

The SAS System

Library
Name      Member Name

PATH      ACTORS
PATH      MOVIES

Submitting a Macro and SQL Code with a Function Key
For interactive users using the SAS Display Manager System, a macro
can be submitted with a function key. This simple, but effective,
technique makes it easy to run a macro with the touch of a key anytime
and as often as you like. All you need to do is define the macro
containing the instructions you would like to have it perform, include
the macro in each session you want to use it in, and enter the SUBMIT
command as part of each macro statement to execute the macro. Then,
define the desired function key by opening the KEYS window, add the
macro name, and save. Anytime you want to execute the macro, simply
press the designated function key.

Suppose you wanted to determine the values of all automatic variables
set during the current session. In the next example, you enter and
save the following macro statement to inspect the values of current
automatic variable settings. By pressing the designated function key,
the macro is submitted, executed, and the results displayed.

SQL Code

SUBMIT "%PUT _AUTOMATIC_;";
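
The abstract lists automatic macro variables among the paper's topics. As a minimal sketch, and assuming the MOVIES table used throughout exists in the WORK library, PROC SQL itself sets macro variables such as SQLOBS (rows processed by the last statement) and SQLRC (a return code) that can be written to the log with %PUT. The output table name PG_MOVIES is hypothetical and is not part of the original paper.

```sas
/* Sketch only: inspect macro variables that PROC SQL sets.            */
/* Assumes WORK.MOVIES exists; PG_MOVIES is a hypothetical table name. */
PROC SQL;
  CREATE TABLE PG_MOVIES AS
    SELECT TITLE, LENGTH
      FROM MOVIES
        WHERE RATING = 'PG';
QUIT;

/* Write the values left behind by the CREATE TABLE statement to the log */
%PUT SQLOBS=&SQLOBS SQLRC=&SQLRC;
```

Checking SQLRC before later steps run is one way a macro tool could decide whether to continue after a query.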

Debugging SQL Processing
The SQL procedure offers a couple of new options for the debugging
process. Two options of critical importance are _METHOD and _TREE.
Specifying the _METHOD option on the PROC SQL statement displays the
hierarchy of processing that occurs. Results are displayed on the Log
using a variety of codes (see table).

Codes      Description
sqxcrta    Create table as Select
sqxslct    Select
sqxjsl     Step loop join (Cartesian)
sqxjm      Merge join
sqxjndx    Index join
sqxjhsh    Hash join
sqxsort    Sort
sqxsrc     Source rows from table
sqxfil     Filter rows
sqxsumg    Summary stats with GROUP BY
sqxsumn    Summary stats with no GROUP BY

In the next example a _METHOD option is specified to show the
processing hierarchy in a two-way equi-join.

SQL Code

PROC SQL _METHOD;
  SELECT MOVIES.TITLE, RATING, ACTOR_LEADING
    FROM MOVIES, ACTORS
      WHERE MOVIES.TITLE = ACTORS.TITLE;
QUIT;

Results

NOTE: SQL execution methods chosen are:
      sqxslct
          sqxjhsh
              sqxsrc( MOVIES )
              sqxsrc( ACTORS )

Another option that is useful for debugging purposes is the _TREE
option. In the next example the SQL statements are transformed into an
internal form showing a hierarchical layout with objects and a variety
of symbols. Inspecting the tree output can frequently provide a
greater level of understanding of what happens during SQL processing.

SQL Code

PROC SQL _TREE;
  SELECT MOVIES.TITLE, RATING, ACTOR_LEADING
    FROM MOVIES, ACTORS
      WHERE MOVIES.TITLE = ACTORS.TITLE;
QUIT;

Results

NOTE: SQL execution methods chosen are:
      sqxslct
          sqxjhsh
              sqxsrc( MOVIES )
              sqxsrc( ACTORS )

Tree as planned.
                           /-SYM-V-(MOVIES.Title:1 flag=0001)
                  /-OBJ----|
                  |        |--SYM-V-(MOVIES.Rating:6 flag=0001)
                  |        |--SYM-V-(MOVIES.Length:2 flag=0001)
                  |        \-SYM-V-(ACTORS.Actor_Leading:2 flag=0001)
         /-JOIN---|
         |        |                           /-SYM-V-(MOVIES.Title:1 flag=0001)
         |        |                  /-OBJ----|
         |        |                  |        |--SYM-V-(MOVIES.Rating:6 flag=0001)
         |        |                  |        \-SYM-V-(MOVIES.Length:2 flag=0001)
         |        |         /-SRC----|
         |        |         |        \-TABL[WORK].MOVIES opt=''
         |        |--FROM---|
         |        |         |                 /-SYM-V-(ACTORS.Title:1 flag=0001)
         |        |         |        /-OBJ----|
         |        |         |        |        \-SYM-V-(ACTORS.Actor_Leading:2 flag=0001)
         |        |         \-SRC----|
         |        |                  \-TABL[WORK].ACTORS opt=''
         |        |--empty-
         |        |        /-SYM-V-(MOVIES.Title:1)
         |        \-CEQ----|
         |                 \-SYM-V-(ACTORS.Title:1)
--SSEL---|

If you have surplus virtual memory, you can achieve faster access to
matching rows from one or more small input data sets. This technique
is referred to as a hash join, and the BUFFERSIZE= option can be used
to let the SQL procedure hash join larger tables. The default
BUFFERSIZE= value is 64000 when not specified. In the next example,
BUFFERSIZE=256000 is specified to utilize available memory to load
rows. The result is faster performance because of a hash join.

SQL Code

PROC SQL _METHOD BUFFERSIZE=256000;
  SELECT MOVIES.TITLE, RATING, ACTOR_LEADING
    FROM MOVIES, ACTORS
      WHERE MOVIES.TITLE = ACTORS.TITLE;
QUIT;

Results

NOTE: SQL execution methods chosen are:
      sqxslct
          sqxjhsh
              sqxsrc( MOVIES )
              sqxsrc( ACTORS )
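
To experiment with these debugging options without the paper's data, the following minimal sketch builds two tiny stand-in tables and runs the same equi-join with _METHOD, once with the default buffer size and once with BUFFERSIZE=256000. The table names MOVIES_DEMO and ACTORS_DEMO are hypothetical, and the ACTOR_LEADING values are placeholders rather than real data. Which join method the optimizer reports depends on the SAS release and table sizes, so no log output is shown.

```sas
/* Sketch only: MOVIES_DEMO and ACTORS_DEMO are hypothetical stand-ins */
/* for the paper's MOVIES and ACTORS tables.                           */
PROC SQL;
  CREATE TABLE MOVIES_DEMO (TITLE CHAR(30), RATING CHAR(5));
  INSERT INTO MOVIES_DEMO
    VALUES ('Jaws',    'PG')
    VALUES ('Rocky',   'PG')
    VALUES ('Titanic', 'PG-13');

  CREATE TABLE ACTORS_DEMO (TITLE CHAR(30), ACTOR_LEADING CHAR(30));
  INSERT INTO ACTORS_DEMO
    VALUES ('Jaws',    'Placeholder A')   /* placeholder, not real data */
    VALUES ('Rocky',   'Placeholder B')
    VALUES ('Titanic', 'Placeholder C');
QUIT;

/* Default buffer size: the log lists the execution methods chosen */
PROC SQL _METHOD;
  SELECT M.TITLE, RATING, ACTOR_LEADING
    FROM MOVIES_DEMO AS M, ACTORS_DEMO AS A
      WHERE M.TITLE = A.TITLE;
QUIT;

/* Larger buffer: compare the methods listed in the log with the run above */
PROC SQL _METHOD BUFFERSIZE=256000;
  SELECT M.TITLE, RATING, ACTOR_LEADING
    FROM MOVIES_DEMO AS M, ACTORS_DEMO AS A
      WHERE M.TITLE = A.TITLE;
QUIT;
```

Comparing the two log notes side by side shows whether the larger buffer changed the join method for a given pair of tables.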

Acknowledgments
I would like to thank Deb Cassidy of Cardinal Distribution for
accepting my abstract and paper, as well as the SUGI Leadership for
their support of a great Conference.

References
Lafler, Kirk; Ten Great Reasons to Learn the SQL Procedure; SAS Users
Group International; 1999.
Lafler, Kirk; Power SAS: A Survival Guide, First Edition; Apress,
Berkeley, CA, USA; 2002.
SAS Guide to the SQL Procedure: Usage and Reference, Version 6, First
Edition; SAS Institute, Cary, NC, USA; 1990.
SAS SQL Procedure User's Guide, Version 8; SAS Institute Inc., Cary,
NC, USA; 2000.
SAS SQL Programming Tips: Version 8; Software Intelligence
Corporation, Spring Valley, CA, USA; 2002.

Trademark Citations
SAS and SAS Certified Professional are registered trademarks of SAS
Institute Inc. in the USA and other countries. The symbol ® indicates
USA registration.

Bio
Kirk Paul Lafler is a SAS Consultant and SAS Certified Professional
with 25 years of SAS software experience. He has written four books
and over one hundred articles for professional journals and SAS User
Group proceedings. Kirk's popular SAS Tips column appears regularly in
the BASAS, SANDS, and SESUG Newsletters. His expertise spans
application design and development, training, and programming using
base-SAS, SQL, ODS, SAS/FSP, SAS/AF, SCL, FRAME, and SAS/EIS software.

Comments and suggestions can be sent to:

Kirk Paul Lafler
Software Intelligence Corporation
P.O. Box 1390
Spring Valley, California 91979-1390
E-mail: KirkLafler@cs.com
http://www.software-intelligence.com
Voice: 619.660.2400
