[Figure: SAS workflow. Stored data files, data cleaning, data transformation, data analysis, and publishing results.]
What is SAS?
SAS originally stood for Statistical Analysis System, developed by SAS Institute in Cary, NC.
A few years back, SAS officially dropped the name and is now simply known by its acronym, SAS (pronounced "sass").
SAS Interface
Explorer and Results Window
Program Editor
Log Window - View the log created by program execution
Output Window - View the values of the data set in this window
Data sources: Excel, Txt, Oracle, Access, Teradata
Creating a dataset
Using SAS
Cards or Datalines
Infile
Column Input/ Delimiters
Formats/ Informats
Observations
DATA Step
PROC Step
Procedures
SAS statements that can perform statistical analysis, create &
print reports and graphs
AIR   new_dist   calc_date
112   179200     12Feb07
118   188800     12Feb07
132   211200     12Feb07
Data Value - A data value is the basic unit of information, e.g. 112.
Column/Field/Variable - A set of data values that describes a given attribute makes up
a Variable, e.g. AIR, calc_date.
Types of Variables (Data Types)
A date is stored as a number in SAS.
Observation - All the data values associated with a case, a single entity, an account, a
subject, or an individual make up an Observation. Each row in a dataset is an Observation.
Example observation (columns Obs, DATE, AIR, new_dist, calc_date): new_dist = 179200, calc_date = 12Feb07
SAS Data Set - Is the special way that SAS organizes and stores the data. Appears as a
rectangular table with columns and rows.
data mydata;
set sashelp.air;
new_dist = air * 1.6 * 1000;
calc_date = today();
format calc_date date.;
run;
Rules for SAS Names
Can be up to 32 characters long
First character must be a letter (A, b, C, . . ., z) or underscore (_).
Subsequent characters can be letters, numeric digits (0, 1, . . ., 9), or
underscores.
SAS is NOT case-sensitive. SAS processes names as uppercase
regardless of how it is typed.
Blanks cannot appear in SAS names.
Special characters, except for the underscore, are not allowed. In file
reference, the dollar sign ($), pound sign (#), and at sign (@) can be
used.
SAS reserves a few names for automatic variables and variable lists.
For example, _N_ and _ERROR_ .
Dates are stored as numeric data: 01 JAN 1960 is stored as 0, 02 JAN 1960 as 1.
Data this_is_the_RISK_score;
THis_is_the_application_DATE = '02Jan1960'd;
run;
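The date rule above can be checked directly in the log. This is a minimal sketch; the variable names d0 and d1 are made up for illustration.

```sas
/* Sketch: SAS date constants are plain numbers counted from 01JAN1960. */
data _null_;
  d0 = '01JAN1960'd;   /* stored as 0 */
  d1 = '02JAN1960'd;   /* stored as 1 */
  put d0= d1=;         /* writes the raw stored numbers to the log */
  put d1= date9.;      /* same value, displayed through a date format */
run;
```

The first PUT writes d0=0 d1=1; the second writes d1=02JAN1960, showing that the format changes only how the number is displayed.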
2000
3000
2500
3200
8000
set sample_accounts;
run;
Data sample_accounts;
CARDS;
1234670 11-Sep-04 Z 2000
1234671 12-Sep-04 3000
1234672 13-Sep-04 Z 2500
1234673 14-Sep-04 T 3200
Run;
Getting Data In
Reading fixed formatted data from an external file.
A text file named sample_accounts.txt is saved in the d:\Mydata directory. Contents of
the file:
1234670 11-Sep-04 Z 2000
1234671 12-Sep-04   3000
1234672 13-Sep-04 Z 2500
1234673 14-Sep-04 T 3200
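One way such a file might be read is with INFILE and column pointers. This is only a sketch: the variable names (acct_no, tran_date, code, amount) and the column positions are assumptions, since the original layout of the file is not preserved here.

```sas
/* Sketch: read the fixed-format file with INFILE and column pointers.    */
/* Variable names and column positions are assumed, not from the source.  */
data sample_accounts;
  infile 'd:\Mydata\sample_accounts.txt' truncover;
  input @1  acct_no   $7.     /* account number in columns 1-7           */
        @9  tran_date date9.  /* e.g. 11-Sep-04                          */
        @19 code      $1.     /* single-letter code, blank on some rows  */
        @21 amount    4.;     /* amount in columns 21-24                 */
  format tran_date date9.;
run;
```

TRUNCOVER keeps a short record from spilling onto the next line, and reading by column position lets the blank code on the second record come through as a missing value instead of shifting the fields.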
DATA acctinfo;
INPUT acctnum $8. date mmddyy10. amount comma9.;
CARDS;
0074309801/15/2001$1,003.59
1028754301/17/2001$672.05
3320899201/19/2001$702.77
0345900601/19/2001$1,209.61
;
run;
2,341   23A   01JAN2004
6572    67B   15NOV2005
5,644   34B   29OCT2005
552     36A   12SEP2004
Format is just a representation of data. It does not change the value of the variable.
Datasets: Defining Variables
Common Ways of Variable Definition
Using an assignment statement
Data var_test;
id ='JK'; NProducts= 6; pro_price = 4.555;
tot_cost = NProducts*pro_price; final_price = tot_cost;
run;
Using an INPUT statement
DATA acctinfo;
INPUT acctnum $8. date mmddyy10. amount comma9.;
CARDS;
0074309801/15/2001$1,003.59
;
run;
Through a LENGTH statement
Data var_test;
length id $ 10;
length NProducts 4;
Run;
KEEP
Data target_data;
Set base_data;
keep var1 var2 var3;
Run;
DROP
Data target_data;
Set base_data;
drop var1 var2 var3;
Run;
A format is an instruction that SAS uses to write data values; it controls their
written appearance.
data format_test;
today =today();
dollar_amt =13400.5;
format dollar_amt dollar10.2 today monyy7.;
run;
proc print data = format_test;run;
Output:
Obs   today     dollar_amt
1     MAR2006   $13,400.50
Some formats:
$REVERJw. - Writes character data in reverse order and preserves blanks
$UPCASEw. - Converts character data to uppercase
MONYYw. - Writes date values as the month and the year in the form mmmyy or mmmyyyy
QTRw. - Writes date values as the quarter of the year
DOLLARw.d - Writes numeric values with dollar signs, commas, and decimal points
WORDSw. - Writes numeric values as words
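A few of the formats listed above can be tried without creating a data set. This is a sketch; the values and the variable names amt and d are chosen only for illustration.

```sas
/* Sketch: the same stored values written through different formats. */
data _null_;
  amt = 13400.5;
  d   = '15MAR2006'd;
  put amt dollar10.2;   /* $13,400.50 */
  put d monyy7.;        /* MAR2006 */
  put d qtr.;           /* 1 (first quarter) */
  put d=;               /* the stored number itself is unchanged */
run;
```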
Function Name - Description
LENGTH - Returns the length of an argument
LOWCASE - Converts all letters in an argument to lowercase
SCAN - Selects a given word from a character expression
SUBSTR - Extracts a substring from an argument
UPCASE - Converts all letters in an argument to uppercase
DATEPART - Extracts the date from a SAS datetime value
MONTH - Returns the month from a SAS date value
TODAY - Returns the current date as a SAS date value
MAX - Returns the largest value
MEAN - Returns the arithmetic mean (average)
SUM - Returns the sum of the nonmissing arguments
LOG - Returns the natural (base e) logarithm
SQRT - Returns the square root of a value
RANUNI - Returns a random variate from a uniform distribution
INPUT - Returns the value produced when a SAS expression that uses a specified informat expression is read
LAG - Returns values from a queue
PUT - Returns a value using a specified format
CEIL - Returns the smallest integer that is greater than or equal to the argument
ROUND - Rounds to the nearest round-off unit
TRUNC - Truncates a numeric value to a specified length
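Several of these functions can be exercised in one short step. This is a sketch with made-up values; the expected results are noted in the comments.

```sas
/* Sketch: a handful of the functions listed above, with results logged. */
data _null_;
  s = 'Data Step';
  len   = length(s);           /* 9 */
  up    = upcase(s);           /* DATA STEP */
  word2 = scan(s, 2);          /* Step */
  part  = substr(s, 1, 4);     /* Data */
  total = sum(10, ., 5);       /* 15: the missing argument is ignored */
  avg   = mean(10, ., 5);      /* 7.5: averaged over nonmissing values */
  up_i  = ceil(4.2);           /* 5 */
  r     = round(4.567, 0.01);  /* 4.57 */
  put len= up= word2= part= total= avg= up_i= r=;
run;
```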
DATA credit_limit;
MERGE intial_cl new_cl;
BY account;
RUN;
Output 1:
Obs   account   credit_limit
1     1002      3000
2     1003      4000
3     1004      5000
DATA credit_limit;
MERGE intial_cl(in=a) new_cl(in=b);
BY account;
IF a and b;
RUN;
Output 2:
Obs   account   credit_limit
1     1002      3000
2     1004      5000
WHERE Condition
DATA account_perf;
INPUT account current_os status_code $;
cards;
1002 300 A
1003 20 A
1004 1200 C
1005 800 Z
;
run;
Data perf;
set account_perf;
where current_os >100 and status_code ne 'Z';
run;
Output:
1002   300
1004   1200
If answer=9 THEN
do;
flag1='NINE';
flag2=9;
end;
ELSE
do;
flag1='NOT NINE';
flag2=0;
end;
IF x=0 THEN
IF y ne 0 THEN put 'XY 0N0';
ELSE put 'XY 00';
ELSE put 'X n0';
If answer=9 THEN
do;
flag1='NINE';
flag2=9;
end;
do i=1 to n by m;
...more SAS statements...
if i=10 then leave;
end;
if i=10 then put 'EXITED LOOP';
Example Iterative
do month='JAN','FEB','MAR';
do count=2,3,5,7,11,13,17;
do i=var1, var2, var3;
do i=1 to 10;
do i=1 to k-1, k+1 to n;
do i=n to 1 by -1;
Example DO UNTIL
n=0;
do until(n>=5);
put n=;
n+1;
end;
Example DO WHILE
n=0;
do while(n<5);
put n=;
n+1;
end;
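The practical difference between the two conditional loops is when the condition is tested: DO WHILE checks it at the top of the loop, DO UNTIL at the bottom. A small sketch, with a starting value chosen so that the difference shows:

```sas
/* Sketch: starting at n=5, the WHILE body never runs, the UNTIL body runs once. */
data _null_;
  n = 5;
  do while (n < 5);    /* tested before the first pass: skipped entirely */
    put 'WHILE body ' n=;
    n + 1;
  end;

  n = 5;
  do until (n >= 5);   /* tested after each pass: executes exactly once */
    put 'UNTIL body ' n=;
    n + 1;
  end;
run;
```

Only the line from the UNTIL loop appears in the log.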
Bangalore
abc 23 BNG
gtr 56 BNG
LGK 43 BNG
STY 21 BNG
Gurgaon
pqr 45 GGN
htr 47 GGN
gyt 23 GGN
Jaipur
lpq 77 JPR
AWQ 56 JPR
LVS 46 JPR
Subsetting Solution I
Subsetting IF
Data Bangalore;
Set city_info;
if city EQ 'BNG';
run;
Data Gurgaon;
Set city_info;
if city EQ 'GGN';
run;
Data Jaipur;
Set city_info;
if city EQ 'JPR';
run;
Subsetting Solution II
Output Statement
City_info
Name   age   city
abc    23    BNG
gtr    56    BNG
LGK    43    BNG
STY    21    BNG
pqr    45    GGN
htr    47    GGN
gyt    23    GGN
lpq    77    JPR
AWQ    56    JPR
LVS    46    JPR
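The code for the OUTPUT-statement solution is not preserved here. One way it might look is sketched below: a single DATA step names all three output data sets and routes each observation with OUTPUT, so city_info is read only once. The data set and variable names follow the City_info table; the exact form of the original solution is an assumption.

```sas
/* Sketch: split city_info into three data sets in one pass. */
data city_info;
  input name $ age city $;
  datalines;
abc 23 BNG
gtr 56 BNG
LGK 43 BNG
STY 21 BNG
pqr 45 GGN
htr 47 GGN
gyt 23 GGN
lpq 77 JPR
AWQ 56 JPR
LVS 46 JPR
;
run;

data Bangalore Gurgaon Jaipur;
  set city_info;
  if city eq 'BNG' then output Bangalore;
  else if city eq 'GGN' then output Gurgaon;
  else if city eq 'JPR' then output Jaipur;
run;
```

Compared with three separate subsetting-IF steps, this reads the source data once instead of three times.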
Appending Solution
data city_info;
set Bangalore Gurgaon Jaipur;
run;
data Bangalore;
set Bangalore end=end_of_data;
if end_of_data then total_BNG_tourist=_n_;
run;
Decisions based on the source of observation (in=)
data food_quality;
set Gurgaon (in=a) Jaipur (in=b) Bangalore(in=c);
length cafe_food $13.;
if a then cafe_food = 'Very Bad';
else if b then cafe_food = 'Just Bad';
else if c then cafe_food = 'Very Very Bad';
run;
data new;
set study (firstobs=5 obs=10);
run;
Selection of Observations
Processing records which satisfy a condition
data new;
set study;
where age > 21;
run;
data new;
set study;
if age > 21;
run;
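The two steps above select the same rows, but WHERE and the subsetting IF are not interchangeable. WHERE filters observations as they are read and can only reference variables that already exist in the input data set; the subsetting IF is evaluated inside the step and can test variables computed there. A sketch with made-up rows and a hypothetical variable age_next:

```sas
/* Sketch: WHERE versus subsetting IF on a computed variable. */
data study;                 /* made-up rows for illustration */
  input name $ age;
  datalines;
Ann 20
Bob 21
Cal 30
;
run;

data new_where;
  set study;
  where age > 21;           /* applied as rows are read from study */
run;

data new_if;
  set study;
  age_next = age + 1;       /* hypothetical derived variable */
  if age_next > 21;         /* subsetting IF can test it */
  /* "where age_next > 21;" would fail: age_next is not in study */
run;
```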
Merging Datasets
Match Merging
Match-merging combines observations from two or more SAS data sets into
a single observation in a new data set according to the values of a common
variable.
One-to-One Merging
USING SET
Data one2one;
Set animal;
Set plant;
Run;
USING MERGE
Data one2one;
Merge animal plant;
Run;
Sample
data animal;
input zoo $ code $;
datalines;
ant a
ape a
bird b
cat c
dog d
eagle e
;
run;
data plant;
input bot $ code $;
datalines;
apple a
banana b
coconut c
celery c
dewberry d
eggplant e
;
run;
Match Merging
Match merging requires data to be pre-sorted (or grouped) by the match keys:
/* Sorting the Two Sets */
proc sort data = test1;
by acct_no;
run;
proc sort data = test2;
by acct_no;
run;
/* Merging the Two Sets */
data merge1;
merge test1(in=a) test2(in=b);
by acct_no;
if a and b;
run;
A ∩ B
Match Merging
Merging the Two Datasets
A ∪ B
data merge1;
merge test1(in=a) test2(in=b);
by acct_no;
if a or b; /*Outer Join: Default */
run;
Left Join
data merge1;
merge test1(in=a) test2(in=b);
by acct_no;
if a; /* Left Join */
run;
Match Merging
Example
data merge1;
merge test1(in=a) test2(in=b);
by acct_no;
if b and NOT a;
run;
Example
data merge1;
merge test1(in=a) test2(in=b);
by acct_no;
if NOT (a and b);
run;
B ∩ A^c
(A ∩ B)^c
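The effect of the IN= conditions is easiest to see on two tiny data sets. In this sketch test1 and test2 reuse the names from the slides, but their contents and the variables x and y are made up for illustration.

```sas
/* Sketch: route merged rows by which input data set contributed them. */
data test1;
  input acct_no $ x;
  datalines;
A 1
B 2
;
run;

data test2;
  input acct_no $ y;
  datalines;
B 20
C 30
;
run;

/* Both inputs are already in acct_no order, so no PROC SORT is needed here. */
data both only_b not_both;
  merge test1(in=a) test2(in=b);
  by acct_no;
  if a and b       then output both;      /* A ∩ B: account B         */
  if b and not a   then output only_b;    /* B only: account C        */
  if not (a and b) then output not_both;  /* not in both: A and C     */
run;
```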
SAS Procedures
PROC CONTENTS
PROC PRINT
PROC DATASETS
PROC FORMAT
PROC SORT
PROC FREQ
PROC MEANS
PROC SUMMARY
PROC TABULATE
PROC IMPORT / EXPORT
PROC SQL
PROC UPLOAD
PROC DOWNLOAD
data test2;
input Acct_no $4. +4 Store $4. +4 Subdiv $1. +4
... Tran_Code $3. AMT;
cards;
2610 2909 B Alaska
9902 1495 A Arizona
4198 1241 B Dakota
3950 1444 A Arkansas
5304 2537 B Kansas
0097 1555 A Virginia
2054 1212 B Washington
2174 2739 B Wisconsin
;
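PROC CONTENTS
Data Set Name: WORK.SALES1
Member Type: DATA
Engine: V8
Created: 17:22 Monday, February 17, 2003
Last Modified: 17:22 Monday, February 17, 2003
Observations: 21
Variables: 8
Indexes: 0
Observation Length: 56
Deleted Observations: 0
Compressed: NO
Sorted: NO
Engine/Host Dependent Information
Data Set Page Size: 8192
Number of Data Set Pages: 1
First Data Page: 1
Max Obs per Page: 145
Obs in First Data Page: 21
Number of Data Set Repairs: 0
File Name: C:\DOCUME~1\004012\LOCALS~1\Temp\SAS Temporary Files\_TD1056\sales1.sas7bdat
Release Created: 8.0202M0
Host Created: WIN_PRO
Alphabetic List of Variables and Attributes
#   Variable    Type   Len   Pos
4   AMT         Num    8     8
1   Acct_no     Char   4     16
7   District    Char   12    28
8   Region      Char   15    40
5   Store       Char   4     23
6   Subdiv      Char   1     27
2   TRAN_DT     Num    8     0
3   Tran_Code   Char   3     20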
PROC PRINT
PROC SORT
Sorts the data set by the listed variable(s)
Sorts in ascending order by default
Default Sorting
proc sort data=sales1 out=sales2;
by acct_no tran_dt;
run;
To Sort Acct_no in descending order
proc sort data=sales1 out=sales2;
by descending acct_no tran_dt;
run;
To Remove Observations with Duplicate BY Values (NODUPKEY)
proc sort data=sales1 out=sales2 nodupkey;
by descending acct_no tran_dt;
run;
PROC DATASETS
PROC DATASETS is a utility procedure that helps to manage the SAS datasets in various
libraries. In a multi-user environment like ours we are constrained by system resources
such as the SAS workspace or the shared folders, so it is used to remove unnecessary files
and manage the datasets.
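A typical housekeeping sequence might look like the sketch below. The library path and the member name brand_new_cl appear in the slides, but the contents given to that data set here, and the exact sequence of statements, are assumptions for illustration.

```sas
/* Sketch: copy one member from mylib to work, then delete it from mylib. */
libname mylib 'D:\myfolder';

data mylib.brand_new_cl;       /* contents assumed for illustration */
  input account credit_limit;
  datalines;
1005 2500
;
run;

proc datasets library=mylib nolist;
  copy out=work;               /* copy from mylib to the work library ...  */
    select brand_new_cl;       /* ... only the member named here           */
  run;
  delete brand_new_cl;         /* remove the original to free shared space */
  run;
quit;
```

NOLIST suppresses the directory listing in the log, and QUIT ends the procedure, which otherwise stays active between run groups.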
libname mylib 'D:\myfolder';
DATA mylib.intial_cl;
INPUT account credit_limit;
DATALINES;
1002 2000
1003 4000
1004 3000
;
DATA mylib.new_cl;
INPUT account credit_limit;
DATALINES;
1002 3000
1004 5000
Line 4: copy from mylib library to work library the file specified
in select statement
1005 2500
(brand_new_cl)
Default
PROC MEANS
Basic Code
proc means data=sales1;
run;
Specific Statistics for Specific Variable
proc means data=sales1 N Nmiss Sum Mean;
var amt;
run;
Statistics for each Region (1st Option)
proc sort data = sales1 out=sales2;
by region;
run ;
proc means data=sales2 N Nmiss Sum Mean;
by Region;
var amt;
run;
PROC MEANS
Statistics for each Region (2nd Option)
proc means data=sales1 N Nmiss Sum Mean;
class Region;
var amt;
run;
PROC FREQ
Uni-Dimensional
proc freq data=sales1;
table district;
run;
Two-Dimensional
proc freq data=sales1;
table district*region;
run;
Two-Dimensional with only freq counts
proc freq data=sales1;
table district*region /norow nocol nopercent;
run;
PROC SUMMARY
Summary without NWAY option
Proc summary data = sales1;
class Region District Store;
var Amt;
output out = Sales_Summ sum =;
run;
Proc print data = Sales_Summ;run;
Summary with NWAY option
Proc summary data = sales1 nway missing;
class Region District Store;
var Amt;
output out = Sales_Summ sum =;
run;
PROC EXPORT
Exports data from SAS datasets to Excel,
Access, CSV files, Delimited files
proc export
data = dataset-name
dbms = dbms-name
outfile/outtable = 'path\filename' replace;
run;
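Filled in for a CSV file, the template above might read as follows. This is a sketch: sashelp.class is a sample data set shipped with SAS, and the output file name is hypothetical.

```sas
/* Sketch: export a data set to a CSV file, overwriting any existing file. */
proc export
  data    = sashelp.class
  outfile = 'd:\Mydata\class.csv'   /* hypothetical output path */
  dbms    = csv
  replace;
run;
```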
Data Manipulation
Sorting
Order by
Merging
SQL
Manipulations (Vertical Processing)
proc sql;
create table act_info1 as
select
acct_no as act_num,
count(amt) as cnt_trans,
sum(amt) as act_sales,
min(amt) as min_sales,
max(amt) as max_sales
from
sales1
group by
acct_no
order by
acct_no;
quit;
Output (act_num column): 97, 2174, 2610, 3950, 4198, 5304, 9902
data act_list;
proc sql;
create table act_info2 as
Select * from sales1 where acct_no in (select acct_no from act_list);
quit;
proc sql;
create table act_info3 as
Select * from sales1 a, act_list b where
b.acct_no = a.acct_no;
quit;
FTP
Proc Upload Statement
FTP
PROC UPLOAD
libname in remote-host-SAS-data-library;
proc upload data= dataname out = in.dataname;run;
data test1;
input acct $4. sale ret;
cards;
0001 20 5
0002 10 6
0001 30 5
;
rsubmit;
libname in1 '/home/rnayakar';
proc upload data = test1 out = in1.test1;
run;
Endrsubmit;
rsubmit;
proc print data = in1.test1;
run;
Endrsubmit;
PROC DOWNLOAD
libname in remote-host-SAS-data-library;
proc download data = in.dataname out = dataname1;
run;
rsubmit;
libname in1 '/home/rnayakar';
proc download data = in1.test1 out = test2;
run;
endrsubmit;
[Figure: remote SAS commands submitted through local SAS; invisible interaction over a TCP/IP socket.]
Data is found in character, numeric, and date/time formats. The PUT and INPUT functions are used to convert between data types.
PUT function
numeric to character
Zip code is stored as a number in zip_n variable e.g., 65401, 4567
data new;
set data1;
zip_c = put(zip_n, 5.); /* numeric format: 4567 becomes ' 4567' */
zip_c_1 = put(zip_n, z5.); /* z5. pads with leading zeros: '04567' */
run;
INPUT function
character to numeric
Zip code is stored as a character in zip_c variable e.g., 65401, 04567
data new;
set data2;
zip_n = input(zip_c, 8.);
run;
character to date
Date is stored as a character string in dt_c variable e.g. 12JUN2008
data new;
set data3;
dt_n = input(dt_c, date9.);
format dt_n date9.;
run;
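The three conversions can be tried together on literal values, without the input data sets. This is a sketch; the variable names zip_back and dt_n are used only for illustration.

```sas
/* Sketch: PUT turns numbers into text, INPUT turns text into numbers or dates. */
data _null_;
  zip_n    = 4567;
  zip_c    = put(zip_n, 5.);              /* right-aligned text, no leading zero */
  zip_c_1  = put(zip_n, z5.);             /* '04567' */
  zip_back = input('04567', 8.);          /* numeric 4567 */
  dt_n     = input('12JUN2008', date9.);  /* numeric SAS date value */
  put zip_c= zip_c_1= zip_back=;
  put dt_n= date9.;                       /* displayed as 12JUN2008 */
run;
```

As a rule of thumb, the format given to PUT must match the type of the source value (a numeric format for a numeric variable), while the informat given to INPUT describes how the character string should be read.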
Character Functions
Compress - Syntax: COMPRESS(source, characters-to-remove)
Left - Syntax: LEFT(argument)
Length - Syntax: LENGTH(argument)
Right - Syntax: RIGHT(argument)
Scan
Substr
Tranwrd
Trim - Use: Removes the trailing blanks from a character value. Syntax: TRIM(argument)
Numeric Functions
Sum - Use: Sums variables (it gives the required result even if one or two of the arguments are missing)
Max/Min
Mean - Use: Gives the average
Round
Date/Time Functions
Year - Syntax: YEAR(argument)
Hour - Syntax: HOUR(argument)
Minute - Syntax: MINUTE(argument)
Second - Syntax: SECOND(argument)
Datepart - Syntax: DATEPART(argument)
Timepart - Syntax: TIMEPART(argument)
MDY - Use: Returns a SAS date value from the numeric values for month, day, and year
HMS - Use: Returns a SAS time value from the numeric values for hour, minutes, and seconds
Today - Syntax: TODAY()
Date - Syntax: DATE()
Datetime - Syntax: DATETIME()
RANNOR(seed) - returns a random variate from a standard normal distribution
RANBIN(seed, n, p) - returns a random variate from a binomial distribution
RANUNI(seed) - returns a random variate from a uniform distribution
Selecting a percentage of observations from a large dataset
Sample Selection using Data step and random number function
data dev;
set total;
if ranuni(12345) <= 0.5;
run;
data devnew valnew;
set totalnew;
if ranuni(12345) <= 0.4 then output devnew;
else output valnew;
run;
BY-Group Processing
Key Words: By and Retain
Important Note: Dataset has to be sorted by the Key
variable
do;
first_purchase = amt_tran ;
sale_am = 0 ;
sale_no = 0 ;
end;
sale_am + amt_tran * (tran_code = 253) ;
sale_no + (tran_code = 253) ;
if last.acct_no then output ;
run ;
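The start of the step above is not preserved. A complete version might look like the following sketch, in which the data set names (sales_demo, acct_summary) and the sample rows are made up; the body follows the fragment shown.

```sas
/* Sketch: one summary row per account using BY, RETAIN, FIRST. and LAST. */
data sales_demo;               /* made-up transactions */
  input acct_no $ tran_code amt_tran;
  datalines;
0001 253 20
0001 100 5
0002 253 10
0002 253 30
;
run;

proc sort data=sales_demo;     /* BY-group processing needs sorted input */
  by acct_no;
run;

data acct_summary;
  set sales_demo;
  by acct_no;
  retain first_purchase;       /* keep the value across observations */
  if first.acct_no then
    do;
      first_purchase = amt_tran;
      sale_am = 0;
      sale_no = 0;
    end;
  sale_am + amt_tran * (tran_code = 253);  /* add amount only for code 253 */
  sale_no + (tran_code = 253);             /* count code-253 transactions  */
  if last.acct_no then output;             /* one row per account          */
run;
```

With these rows, account 0001 ends with first_purchase=20, sale_am=20 and sale_no=1, and account 0002 with first_purchase=10, sale_am=40 and sale_no=2. The sum statements (variable + expression) retain their variables automatically, which is why only first_purchase needs an explicit RETAIN.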