
SAS Training Session 3

Advanced Topics Using SAS

Sun Li, Centre for Academic Computing, lsun@smu.edu.sg

Outline
Using arrays in SAS
- Recoding variables, computing new variables, collapsing over variables
- Identifying patterns across variables using arrays
- Reshaping data between long and wide formats using arrays

Introduction to SAS Macro language (7 steps to get started)


- Macro variables and functions
- Defining and calling macro definitions
- Macro programs for iterative processing

Applied longitudinal studies


- Survival analysis
- Estimating multilevel models using SAS
- Panel data analysis

Using arrays in SAS


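- Recoding variables
- Computing new variables
- Collapsing over variables
- Identifying patterns across variables
- Reshaping data between long and wide formats

ARRAY array_name(n) variable_list;
DO index = 1 TO n;
  ... statements using array_name[index] ...
END;

Note: an array is only a temporary name for a group of variables that exists for the duration of one DATA step, and it is almost always processed with an iterative DO ... END loop that steps through its elements.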



DATA faminc;
  input famid faminc1-faminc12;    /* -999 codes a missing month */
  datalines;
1 3281 3413 3114 2500 2700 3500 3114 -999 3514 1282 2434 2818
2 4042 3084 3108 3150 -999 3100 1531 2914 3819 4124 4274 4471
3 6015 6123 6113 -999 6100 6200 6186 6132 -999 4231 6039 6215
;
RUN;

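**recoding variables;
DATA recode_missing;
  set faminc;
  array inc[12] faminc1 - faminc12;
  do i = 1 to 12;
    if inc[i] = -999 then inc[i] = .;    /* replace the missing-value code with . */
  end;
  drop i;
RUN;

Alternatively, let SAS count the elements: declare the array with (*) and the name-prefix list faminc:, and loop up to DIM(inc):
array inc(*) faminc:;
do i = 1 to dim(inc);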
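The same recoding can be applied to every numeric variable at once with the _NUMERIC_ variable list. A minimal sketch, assuming (as in faminc) that -999 is the missing-value code for every numeric variable; recode_all is a hypothetical data set name.

```sas
/* Recode -999 to missing in every numeric variable of faminc */
DATA recode_all;
  set faminc;
  array nums(*) _numeric_;     /* numeric variables defined so far: famid, faminc1-faminc12 */
  do i = 1 to dim(nums);       /* DIM() returns the number of elements (13 here) */
    if nums[i] = -999 then nums[i] = .;
  end;
  drop i;
RUN;
```

Because _NUMERIC_ also picks up ID variables such as famid, a narrower list (for example faminc:) is safer if an ID could legitimately equal -999.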



**computing new variables;
DATA tax_array;
  set recode_missing;
  array inc(12) faminc1-faminc12;
  array tax(12) taxinc1-taxinc12;          /* new vars */
  do month = 1 to 12;
    tax[month] = inc[month]*0.1;
  end;
  drop month;
RUN;

* collapsing over variables;
DATA quarter_array;
  set recode_missing;                      /* recoded data, so -999 is not added into the totals */
  array Afaminc(12) faminc1-faminc12;      /* existing vars */
  array Aquarter(4) incq1-incq4;           /* new vars */
  do q = 1 to 4;
    Aquarter[q] = Afaminc[3*q-2] + Afaminc[3*q-1] + Afaminc[3*q];
  end;
  drop q;
RUN;
/* example:
   For q=1: Aquarter[1] = Afaminc[3*1-2] + Afaminc[3*1-1] + Afaminc[3*1]
                        = Afaminc[1] + Afaminc[2] + Afaminc[3]
   For q=2: Aquarter[2] = Afaminc[3*2-2] + Afaminc[3*2-1] + Afaminc[3*2]
                        = Afaminc[4] + Afaminc[5] + Afaminc[6] */
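On recoded data, the + operator makes a quarter missing whenever one of its months is missing: for family 1, months 7 to 9 are 3114, . and 3514 after recoding, so that quarter's + total is missing. If a partial total is wanted instead, the SUM function skips missing arguments and gives 3114 + 3514 = 6628. A sketch under that assumption; quarter_sum is a hypothetical data set name, and which behaviour is right is an analysis decision.

```sas
/* Quarterly totals that ignore missing months */
DATA quarter_sum;
  set recode_missing;                      /* -999 already recoded to . */
  array Afaminc(12) faminc1-faminc12;
  array Aquarter(4) incq1-incq4;
  do q = 1 to 4;
    /* SUM() skips missing arguments; it returns . only if all three are missing */
    Aquarter[q] = sum(Afaminc[3*q-2], Afaminc[3*q-1], Afaminc[3*q]);
  end;
  drop q;
RUN;
```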



**identify patterns across variables using arrays;
DATA pattern;
  set recode_missing;                          /* -999 recoded to ., so codes are not read as incomes */
  length ever $ 4;
  array Afaminc(12) faminc1-faminc12;          /* existing vars */
  array Alowinc(2:12) lowinc2-lowinc12;        /* new vars: 1 = income below half of previous month */
  do m = 2 to 12;
    if missing(Afaminc[m]) or missing(Afaminc[m-1]) then Alowinc[m] = .;
    else if Afaminc[m] < (Afaminc[m-1] / 2) then Alowinc[m] = 1;
    else Alowinc[m] = 0;
  end;
  sum_low = sum(of lowinc:);
  if sum_low > 0 then ever='Yes';
  else if sum_low = 0 then ever='No';
  drop m sum_low;
RUN;

**reshaping from wide to long;
DATA long_array;
  set faminc;
  array Afaminc(12) faminc1 - faminc12;
  do month = 1 to 12;
    faminc = Afaminc[month];
    output;                                    /* one output row per month */
  end;
  drop faminc1-faminc12;
RUN;
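To check the pattern data set, it helps to write every flagged month to the log. In the data as entered, family 3's reported incomes never fall below half of a previously reported value, so any flag for family 3 can only come from the -999 codes. A minimal sketch; it only reads the pattern data set created above.

```sas
/* List every month flagged as a drop below half of the previous month */
DATA _null_;
  set pattern;
  array Alowinc(2:12) lowinc2-lowinc12;
  do m = 2 to 12;
    if Alowinc[m] = 1 then
      put famid= m= 'income below half of previous month';
  end;
RUN;
```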



**reshaping from long to wide;
PROC SORT data=long_array;
  by famid;
RUN;
DATA wide_array;
  set long_array;
  by famid;
  retain faminc1-faminc12;
  array Afaminc(12) faminc1-faminc12;
  if first.famid then do;              /* new family: clear values retained from the previous one */
    do i = 1 to 12;
      Afaminc[i] = .;
    end;
  end;
  Afaminc(month) = faminc;
  if last.famid then output;           /* write one row per family */
  drop month faminc i;
RUN;

FIRST.famid: equals 1 for the first observation of each unique value of the BY variable, 0 otherwise.
LAST.famid: equals 1 for the last observation of each unique value of the BY variable, 0 otherwise.
Note: to use FIRST.var_name or LAST.var_name, the data set must first be sorted by that variable, and the DATA step must include a BY var_name statement.
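A quick way to see FIRST. and LAST. at work is to print them at the group boundaries. A minimal sketch using long_array: it should write two lines per family, one at month 1 where FIRST.famid is 1 and one at month 12 where LAST.famid is 1.

```sas
/* Show where each famid group starts and ends */
PROC SORT data=long_array;
  by famid;
RUN;
DATA _null_;
  set long_array;
  by famid;                          /* the BY statement creates FIRST.famid and LAST.famid */
  if first.famid or last.famid then
    put famid= month= first.famid= last.famid=;
RUN;
```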

Introduction to SAS Macro language


7 steps to get started using SAS Macros
1. Write your program and make sure it works.
2. Use macro variables to facilitate text substitution.
3. Use simple macro functions.
4. Use the CALL SYMPUT routine and the SYMGET function to pass information to and from a DATA step.
5. Make the program into a macro definition.
6. Use parameters in the macro and specify the parameters when the macro is called.
7. Use the iterative macro language (%DO loops) within a macro definition to execute code iteratively.

SAS Macro Language Documentation



Step 1: Write your program and make sure it works.
Step 2: Use macro variables to facilitate text substitution.
Macro variables: all keywords in statements related to macro variables or macro programs begin with a percent sign (%). To refer to a macro variable in your program, prefix its name with an ampersand (&).
DATA USPopulation;
  ... ;
PROC MEANS data=USPopulation;
  var population year yearsq;
RUN;
PROC REG data=USPopulation;
  model Population=Year YearSq;
RUN;
QUIT;



**Step2: use macro variables to facilitate text substitution;
options symbolgen;             /* write each macro variable resolution to the log */
*defining a macro variable;
%let data=uspopulation;
%let indvar=year yearsq;

Define a macro variable with the %LET statement; display macro variable values as text in the SAS log with the %PUT statement.

*using a macro variable;
*double quotes vs single quotes;
title "the date is &sysdate9 and today is &sysday";     /* resolved inside double quotes */
title2 'the date is &sysdate9 and today is &sysday';    /* not resolved: printed literally */
PROC MEANS data=&data;
  var population &indvar;
RUN;
PROC REG data=&data;
  model Population=&indvar;
RUN;
QUIT;
*displaying text in log;
%put &sysdate9 is the date on which you invoked SAS.;
*displaying SAS system macro variables;
%put _automatic_;
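When a macro variable reference is immediately followed by other text, a period marks the end of the name, which is why the Step 6 example below writes new&data. with a trailing period. A minimal sketch; the values in the comments are what %PUT writes to the log, and the &= form requires SAS 9.3 or later.

```sas
%let data = uspopulation;
%put new&data;           /* newuspopulation */
%put &data.2;            /* uspopulation2: the period ends the reference */
%put &data..sas7bdat;    /* uspopulation.sas7bdat: the first period is consumed */
%put &=data;             /* DATA=uspopulation */
```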



Step 3: Use simple Macro functions
Many functions work with macro variables, including character-string functions (such as %SUBSTR and %UPCASE) and evaluation functions (%EVAL and %SYSEVALF).
**Step3: use simple Macro functions;
%let k = 1;
%let tot = &k + 1;
%put &tot;                          /* 1 + 1: %LET stores text, no arithmetic is done */

**%eval is only for integer evaluation;
%let tot = %eval(&k + 1.234);       /* ERROR: 1.234 is not an integer operand */
%let tot = %sysevalf(&k + 1.234);   /* floating-point evaluation */
%put &tot;                          /* 2.234 */
%put;                               /* blank line in the log */

%let tot = %eval(&k + 1);
%put &tot;                          /* 2 */
%put;
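A few more evaluations make the difference between the two functions visible. These use only standard %EVAL and %SYSEVALF behaviour; the values in the comments are what %PUT writes to the log.

```sas
%put %eval(7/2);             /* 3: integer division truncates */
%put %sysevalf(7/2);         /* 3.5 */
%put %sysevalf(7/2, ceil);   /* 4: optional conversion type */
%put %eval(10 > 9);          /* 1: a true comparison gives 1, false gives 0 */
```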



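Step 4: Use the CALL SYMPUT routine and the SYMGET function to pass information to and from a DATA step
CALL SYMPUT('new_macro_variable', value_in_string_format);
SYMGET('macro_variable')
Note: both take the macro variable name as a character expression, so a literal name must be enclosed in quotes (single quotes in these examples); without quotes, SAS treats it as a DATA step variable that holds the name. SYMGET returns a character value.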
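One point that often trips people up: an &name reference is resolved when a step is compiled, before CALL SYMPUT has run, so a macro variable created by CALL SYMPUT can only be referenced with & in later steps. A minimal sketch using faminc; nfam and faminc_n are hypothetical names. CALL SYMPUTX would strip the leading blanks itself.

```sas
/* Store the number of families in a macro variable */
DATA _null_;
  set faminc end=last;                        /* last = 1 on the final observation */
  if last then call symput('nfam', strip(put(_n_, 8.)));
RUN;
%put Number of families: &nfam;               /* resolved only after the step above has run */

/* Read the value back inside a later DATA step */
DATA faminc_n;
  set faminc;
  nfam = input(symget('nfam'), 8.);           /* SYMGET returns character; INPUT converts it */
RUN;
```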

Step 5: Make the program into a Macro definition
Step 6: Use parameters in the Macro and specify the parameters when the Macro is called
Start the macro definition with %MACRO macro_name; and end it with %MEND macro_name;. To invoke the macro, use %macro_name.
Note: no semicolon is needed after the macro call.
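Parameters can be positional or keyword. A minimal sketch with a hypothetical macro printit: ds is positional, and n= is a keyword parameter whose default can be overridden in the call.

```sas
/* ds: positional parameter; n=5: keyword parameter with a default */
%macro printit(ds, n=5);
  PROC PRINT data=&ds(obs=&n);
  RUN;
%mend printit;

%printit(faminc)           /* prints up to 5 observations */
%printit(faminc, n=2)      /* overrides the default */
```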



*Step4, 5 and Step 6;
%macro mexample(data, indvar);
  PROC MEANS data=&data;
    var population &indvar;
    output out=stats mean=avg;           /* avg = mean of the first VAR variable, population */
  RUN;
  PROC PRINT data=stats;
  RUN;
  DATA _null_;
    set stats;
    dt=put(today(), mmddyy10.);
    call symput('date', dt);
    call symput('average', put(avg,7.2));
  RUN;
  DATA new&data.;
    set &data;
    avg=symget('average')+0;             /* +0 converts the character value to numeric */
  RUN;
  PROC PRINT data=new&data;
  RUN;
%mend;
%mexample(uspopulation,year yearsq)



Step 7: Use the iterative SAS language within a Macro definition to execute code iteratively
DATA file1 file2 file3 file4;
  input a @@;
  if _n_ <= 3 then output file1;
  if 3 < _n_ <= 6 then output file2;
  if 6 < _n_ <= 9 then output file3;
  if 9 < _n_ <= 12 then output file4;
  datalines;
1 2 3 4 5 6 7 8 9 10 11 12
;
RUN;
%macro combine(num);
  DATA big;
    set %do i = 1 %to &num; file&i %end; ;   /* resolves to: set file1 file2 file3 file4; */
  RUN;
%mend;
%combine(4)

DATA logit;
  input v1-v5 ind1 ind2;
  datalines;
...
;
RUN;
%macro mylogit(num);
  %do i = 1 %to &num;
    title "dependent variable is v&i";
    PROC LOGISTIC data=logit des;
      model v&i = ind1 ind2;
    RUN;
  %end;
%mend;
%mylogit(5)
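%DO loops are not limited to numbered names such as file1-file4 or v1-v5. A sketch, with a hypothetical macro runeach, that loops over an arbitrary blank-separated list using %SCAN; it assumes the USPopulation data from Step 1 is available as uspopulation. OPTIONS MPRINT writes the generated statements to the log.

```sas
options mprint;                          /* show the generated SAS code in the log */
%macro runeach(vars);
  %local i v;
  %let i = 1;
  %do %while(%scan(&vars, &i) ne );      /* stop when the i-th word is empty */
    %let v = %scan(&vars, &i);
    title "Summary of &v";
    PROC MEANS data=uspopulation;
      var &v;
    RUN;
    %let i = %eval(&i + 1);
  %end;
%mend runeach;

%runeach(population year yearsq)
```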

Applied longitudinal data analysis


Longitudinal studies: studies in which individuals are measured repeatedly over time. The most commonly used longitudinal analysis models are:
- Survival analysis models, for studying event occurrence from a well-defined time origin to an endpoint.
- Multilevel models, for studying systematic individual change over time; the outcome is continuous longitudinal data.
- Panel data analysis models, for studying cross-sectional time-series data: changes within subjects over time and differences between subjects.

Reasons for using more sophisticated models for longitudinal data:
- Repeated observations on the same individual are usually (positively) correlated, so the independence assumption of ordinary regression does not hold.
- Predictors may change over time (time-varying predictors).

Survival analysis in SAS PROC PHREG


Recommended reading:
- Applied Survival Analysis, by Hosmer and Lemeshow
- Survival Analysis: Techniques for Censored and Truncated Data, by Klein and Moeschberger

Survival data: time-to-event data.
Reasons for using survival models:
- Survival times tend to be positively skewed and are unlikely to be normally distributed, and a normalizing transformation may not exist.
- Ordinary regression cannot handle time-varying covariates.
- Some durations are censored or truncated (right censoring, left censoring, right truncation, left truncation), so the exact event time is not observed for every subject.



Survival Model
Survival function:
$$S(t) = P(T > t) = 1 - F(t)$$
Hazard function:
$$h(t) = \frac{f(t)}{S(t)} \quad\Rightarrow\quad -\,d\log S(t) = h(t)\,dt$$
$$S(t) = \exp\bigl(-H(t)\bigr), \qquad H(t) = \int_0^t h(u)\,du$$
where $H(t)$ is the cumulative hazard function.
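As a check of these relationships, take the simplest case of a constant hazard (the exponential model). This is a standard textbook example rather than a model fitted in this session, and $\lambda = 0.02$ per month is a hypothetical value.

```latex
h(t) = \lambda
\;\Rightarrow\; H(t) = \int_0^t \lambda\,du = \lambda t
\;\Rightarrow\; S(t) = e^{-\lambda t},
\qquad f(t) = h(t)\,S(t) = \lambda e^{-\lambda t}

\lambda = 0.02:\quad S(12) = e^{-0.02 \times 12} = e^{-0.24} \approx 0.787
```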



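Kaplan-Meier Estimator:
$$\hat{S}(t) = \prod_{j \,:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right), \qquad t_{(1)} < t_{(2)} < \cdots < t_{(n)}$$
where the $t_{(j)}$ are the ordered distinct event times, $d_j$ is the number of individuals who experience the event at time $t_{(j)}$, and $n_j$ is the number of individuals who have not yet experienced the event (and have not been censored) just before $t_{(j)}$, i.e. the number at risk.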
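A small hand calculation makes the product concrete. With five hypothetical subjects whose times are 2, 3 (censored), 4, 4 and 6: at t = 2, n = 5 and d = 1, so the estimate is 0.8; the censored subject at 3 leaves the risk set without changing it; at t = 4, n = 3 and d = 2, so it becomes 0.8 x 1/3 = 0.267; at t = 6, n = 1 and d = 1, so it drops to 0. The sketch below feeds the same toy data to PROC LIFETEST, whose product-limit table should reproduce these values.

```sas
/* Hypothetical toy data: event = 1 (event observed) or 0 (censored) */
DATA toy;
  input t event @@;
  datalines;
2 1  3 0  4 1  4 1  6 1
;
RUN;
PROC LIFETEST data=toy method=km;
  time t*event(0);              /* event = 0 marks censored observations */
RUN;
```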

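Cox Regression:
$$h_i(t) = h_0(t)\exp(\beta^T x_i) \;\Rightarrow\; S_i(t) = S_0(t)^{\exp(\beta^T x_i)} \;\Rightarrow\; \log H_i(t) = \log H_0(t) + \beta^T x_i$$
where $h_0(t)$ is the baseline hazard function, and $\exp\bigl(\beta^T(x_i - x_j)\bigr)$ is the hazard ratio (HR) of individual $i$ relative to individual $j$ (sometimes called the incidence rate ratio); it does not depend on $t$.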
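The hazard ratio follows directly from the model: the baseline hazard cancels, which is why the Cox model can be fitted without specifying $h_0(t)$. For two individuals who differ by one unit in covariate $k$ only, the HR reduces to $\exp(\beta_k)$; the coefficient $\beta_k = -0.05$ below is hypothetical.

```latex
\frac{h_i(t)}{h_j(t)}
  = \frac{h_0(t)\exp(\beta^T x_i)}{h_0(t)\exp(\beta^T x_j)}
  = \exp\bigl(\beta^T (x_i - x_j)\bigr)

x_i - x_j = e_k \;\Rightarrow\; \mathrm{HR} = \exp(\beta_k),
\qquad \beta_k = -0.05:\; \exp(-0.05) \approx 0.951
```

Read as: each one-unit increase in that covariate lowers the hazard by about 4.9%, at every time t.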



Example data: telco.csv
Variable   Variable information
age        Age in years
marital    Marital status: 0 = unmarried, 1 = married
address    Years at current address
income     Household income in thousands
ed         Level of education: 1 = did not complete high school, 2 = high school degree, 3 = college degree, 4 = undergraduate, 5 = postgraduate
employ     Years with current employer
reside     Number of people in household
gender     Gender: 0 = male, 1 = female
tenure     Months with service
churn      Churn within last month: 0 = no, 1 = yes
custcat    Customer category: 1 = basic service, 2 = E-service, 3 = plus service, 4 = total service



**step1: import data into working library;
**step2: exploring the data - univariate analyses;
PROC LIFETEST data=sas3.telco plots=(s);
  time tenure*churn(0);
  strata custcat;
RUN;
PROC LIFETEST data=sas3.telco plots=(s);
  time tenure*churn(0);
  strata ed;
RUN;
PROC LIFETEST data=sas3.telco plots=(s);
  time tenure*churn(0);
  strata marital;
RUN;
PROC LIFETEST data=sas3.telco plots=(s);
  time tenure*churn(0);
  strata gender;
RUN;

General syntax:
PROC LIFETEST <DATA=SAS-data-set>;
  TIME time_var*censor_var(list);
  STRATA categorical_varlist;
RUN;
PROC PHREG <DATA=SAS-data-set>;
  MODEL tvar*cvar(list) = predictors;
  <program statements>
  TEST var_list;
  STRATA strata_varlist;
  BASELINE OUT=<> COVARIATES=<>;
RUN;
Here (list) gives the values of the censoring variable that mark censored observations; for example, churn(0) treats customers who have not churned as censored.



**step3: model building;
PROC PHREG data=sas3.telco;
  model tenure*churn(0)=marital address income ed employ custcat2 custcat3 custcat4;
  custcat2=(custcat=2);            /* dummies for customer category; basic service (1) is the reference */
  custcat3=(custcat=3);
  custcat4=(custcat=4);
  cust_categories: test custcat2, custcat3, custcat4;   /* joint test of the three dummies */
RUN;
PROC PHREG data=sas3.telco;
  model tenure*churn(0)=marital address employ addresst employt custcat2 custcat3 custcat4;
  custcat2=(custcat=2);
  custcat3=(custcat=3);
  custcat4=(custcat=4);
  employt=employ*log(tenure);      /* time-varying covariates: interactions with log(time) */
  addresst=address*log(tenure);
  time_varying: test employt, addresst;
RUN;
PROC PHREG data=sas3.telco;
  model tenure*churn(0)=marital address employ;
  strata custcat;                  /* separate baseline hazard for each customer category */
RUN;
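The time_varying test can be read from the model it implies. With employt = employ x log(tenure) in the model, the log hazard ratio for one extra year with the current employer becomes a function of time; the same holds for address. This is the standard interpretation of such interaction terms, written out here as a sketch.

```latex
\log h_i(t) = \log h_0(t) + \beta_1\,\mathrm{employ}_i + \beta_2\,\mathrm{employ}_i \log t + \cdots

\Rightarrow\quad \log \mathrm{HR}_{\mathrm{employ}}(t) = \beta_1 + \beta_2 \log t,
\qquad H_0:\ \beta_2 = 0 \ \text{(hazard ratio constant over time)}
```

If the coefficient on employt (or addresst) differs from 0, the effect of that covariate changes with tenure, i.e. the proportional hazards assumption does not hold for it.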



**step4: prediction;
DATA cov_pat;
  marital = 1;
  address = 1;
  employ = 3;
  custcat2 = 0;
  custcat3 = 1;
  custcat4 = 0;
RUN;
PROC PHREG data=sas3.telco;
  model tenure*churn(0)=marital address employ custcat2 custcat3 custcat4;
  custcat2=(custcat=2);
  custcat3=(custcat=3);
  custcat4=(custcat=4);
  baseline out=surv covariates=cov_pat survival=surv / nomean;
RUN;
goptions reset=all;
symbol c=red v=triangle h=.8 i=stepjll;
axis1 label=(a=90 'Survivorship function');
PROC GPLOT data=surv;
  plot surv*tenure=marital / vaxis=axis1;
RUN;
QUIT;
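With a single covariate pattern, the plot shows one curve. To compare groups, the COVARIATES= data set can hold several rows; the sketch below uses two hypothetical patterns that differ only in marital status. custcat itself is also supplied so that it agrees with the dummies, in case the programming statements are re-evaluated on the COVARIATES= data set; surv2 and cov_pat2 are hypothetical names.

```sas
/* Two hypothetical covariate patterns that differ only in marital status */
DATA cov_pat2;
  input marital address employ custcat custcat2 custcat3 custcat4;
  datalines;
0 1 3 3 0 1 0
1 1 3 3 0 1 0
;
RUN;
PROC PHREG data=sas3.telco;
  model tenure*churn(0)=marital address employ custcat2 custcat3 custcat4;
  custcat2=(custcat=2);
  custcat3=(custcat=3);
  custcat4=(custcat=4);
  baseline out=surv2 covariates=cov_pat2 survival=surv / nomean;  /* one curve per row */
RUN;
symbol1 c=red  v=none i=stepj;     /* marital = 0 */
symbol2 c=blue v=none i=stepj;     /* marital = 1 */
axis1 label=(a=90 'Survivorship function');
PROC GPLOT data=surv2;
  plot surv*tenure=marital / vaxis=axis1;
RUN;
QUIT;
```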

Multilevel modeling in SAS PROC MIXED


Recommended reading:
- Introduction to Multilevel Modeling, by Ita Kreft and Jan de Leeuw
- Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, by Tom Snijders and Roel Bosker
- Multilevel Analysis: Techniques and Applications, by Joop Hox

Multilevel data: nested data structure (example: hsb.sas7bdat)
Variable   Variable information
MATHACH    Math achievement score, student level (outcome variable)
SES        Socio-economic status of a student (student level)
MEANSES    Group mean of SES (school level)
SECTOR     Whether a school is public or Catholic (school level): 0 = public school, 1 = Catholic school



Nature of nesting:
- One-level nesting, two-level nesting, or more
- Cross-nested structures
- Multivariate dependent variables

Linear/continuous data: PROC MIXED
Non-linear data: PROC NLMIXED (skipped in this session)
- Dichotomous: logistic, probit
- Ordinal: logistic
- Multinomial: logistic
- Count: Poisson, negative binomial
- Censored/limited continuous: Tobit

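Hierarchical notation:
Level 1: $$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$$
Level 2: $$\beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + u_{1j}$$

Mixed model notation (substituting the level-2 equations into level 1):
$$Y_{ij} = \gamma_{00} + \gamma_{01} Z_j + \gamma_{10} X_{ij} + \gamma_{11} Z_j X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij}$$
where $i$ indexes individuals (level 1), $j$ indexes groups (level 2), $X_{ij}$ is a level-1 predictor and $Z_j$ is a level-2 predictor.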



PROC MIXED COVTEST <DATA=SAS-data-set>;
  CLASS variables;
  MODEL dep_var = predictors / SOLUTION;
  RANDOM variables / SUBJECT=var;
RUN;

Statement/option   Interpretation
COVTEST            Produces asymptotic standard errors and Wald Z-tests for the covariance parameter estimates.
CLASS              Identifies categorical variables.
SOLUTION           Prints the fixed-effects estimates in the output.
RANDOM             Identifies the elements of the model to be specified as random effects.
SUBJECT=           Identifies the grouping variable.



PROC SQL;
  create table hsb2 as
  select *, mean(ses) as meanses, ses-mean(ses) as cses    /* cses: group-centered SES */
  from sas3.hsb
  group by schoolid;
QUIT;
*model 1;
PROC MIXED covtest data=hsb2;
  class schoolid;
  model mathach= /solution;                 /* null model: random intercept only */
  random intercept /subject=schoolid;
RUN;
*model 2;
PROC MIXED covtest data=hsb2;
  class schoolid;
  model mathach= meanses sector /solution ddfm=bw notest;
  random intercept /subject=schoolid;
RUN;
*model 3;
PROC MIXED covtest data=hsb2;
  class schoolid;
  model mathach= meanses sector cses /solution ddfm=bw notest;
  random intercept cses /subject=schoolid;
RUN;
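Model 1 is the usual starting point because its two covariance parameters give the intraclass correlation, the share of outcome variance that lies between schools. In the Covariance Parameter Estimates table, the Intercept (subject schoolid) row estimates $\tau_{00}$ and the Residual row estimates $\sigma^2$. The numbers below are hypothetical, only to show the arithmetic.

```latex
\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^2},
\qquad \tau_{00} = \mathrm{Var}(u_{0j}),\quad \sigma^2 = \mathrm{Var}(r_{ij})

\tau_{00} = 9,\ \sigma^2 = 36:\quad \rho = \frac{9}{9 + 36} = 0.20
```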



*final model;
PROC MIXED covtest data=hsb2;
  class schoolid;
  model mathach= meanses sector cses meanses*cses sector*cses /solution ddfm=bw notest;
  random intercept cses /subject=schoolid;
RUN;
PROC UNIVARIATE data=hsb2;
  var meanses;
RUN;
DATA toplot;
  set hsb2;
  length strata $ 4;                  /* long enough for "High" */
  if meanses <= -0.323 then do;
    ms = -0.323; strata = "Low";
  end;
  else if meanses >= 0.327 then do;
    ms = 0.327; strata = "High";
  end;
  else do;
    ms = 0.032; strata = "Med";
  end;
  /* predicted score from the fixed-effect estimates of the final model */
  predicted = 12.1282 + 5.3367*ms + 1.2245*sector + 2.9407*cses
              + 1.0345*ms*cses - 1.6388*sector*cses;
RUN;
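Written in the mixed-model notation, the final model has two level-2 predictors (MEANSES and SECTOR) and the level-1 predictor CSES. The predicted line in toplot is this equation with the random effects set to 0 and the fixed effects replaced by the estimates that appear in the code.

```latex
\mathrm{MATHACH}_{ij} = \gamma_{00} + \gamma_{01}\,\mathrm{MEANSES}_j + \gamma_{02}\,\mathrm{SECTOR}_j
  + \gamma_{10}\,\mathrm{CSES}_{ij} + \gamma_{11}\,\mathrm{MEANSES}_j\,\mathrm{CSES}_{ij}
  + \gamma_{12}\,\mathrm{SECTOR}_j\,\mathrm{CSES}_{ij}
  + u_{0j} + u_{1j}\,\mathrm{CSES}_{ij} + r_{ij}

\hat{\gamma}_{00} = 12.1282,\ \hat{\gamma}_{01} = 5.3367,\ \hat{\gamma}_{02} = 1.2245,\
\hat{\gamma}_{10} = 2.9407,\ \hat{\gamma}_{11} = 1.0345,\ \hat{\gamma}_{12} = -1.6388
```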


goptions reset=all;
symbol1 v=none i=join c=red;
symbol2 v=none i=join c=blue;
axis1 order=(-4 to 3 by 1) minor=none label=("Group Centered SES");
axis2 order=(0 to 22 by 2) minor=none label=(a=90 "Math Achievement Score");
PROC SORT data=toplot;
  by strata sector cses;     /* BY processing needs sorted data; i=join connects points in data order */
RUN;
PROC GPLOT data=toplot;
  by strata;
  plot predicted*cses = sector / vaxis=axis2 haxis=axis1;
RUN;
QUIT;
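By default, the RANDOM statement uses TYPE=VC, which fixes the covariance between the random intercept and the random CSES slope at 0, although the hierarchical notation does not require that. A sketch of the final model with an unstructured covariance matrix instead; the estimates may differ slightly from those used in toplot.

```sas
/* Final model, allowing u0j and u1j to covary */
PROC MIXED covtest data=hsb2;
  class schoolid;
  model mathach = meanses sector cses meanses*cses sector*cses / solution ddfm=bw notest;
  random intercept cses / subject=schoolid type=un;  /* reports UN(1,1), UN(2,1), UN(2,2) */
RUN;
```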

Panel data analysis in SAS PROC TSCSREG


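Layout of panel data: n cross-sectional units observed over t time periods give nt rows (cases 11, 12, ..., 1t, 21, 22, ..., 2t, ..., nt), with one column per variable x1, x2, x3, ..., xj.
Case   x1   x2   x3   ...   xj
11     .    .    .    ...   .
12     .    .    .    ...   .
...
1t     .    .    .    ...   .
21     .    .    .    ...   .
...
nt     .    .    .    ...   .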



Panel data:
Also called cross-sectional time-series data: multiple cases (people, nations, firms, etc.) observed over two or more time periods.
- Cross-sectional information: differences between subjects (between-subject effects).
- Time-series information: changes within subjects over time (within-subject effects).

Two effects models:



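Panel data analysis models in SAS
Procedure      Model                                                  Option
PROC REG       LSDV1: drops one dummy variable (reference group)      (no option needed)
               LSDV2: all dummy variables, no intercept               /NOINT
               LSDV3: all dummy variables with a restriction          RESTRICT statement
PROC TSCSREG   Fixed effect (within effect)                           /FIXONE
               Two-way fixed effect (within effect)                   /FIXTWO
               Random effect                                          /RANONE
               Two-way random effect                                  /RANTWO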
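For reference, the models in this table can be written in the standard textbook form below (a sketch in conventional notation, not taken from the slides). The LSDV approach estimates each $\mu_i$ with a dummy variable: LSDV1 drops one dummy, LSDV2 drops the intercept $\alpha$, and LSDV3 keeps both and imposes $\sum_i \mu_i = 0$. In the random effect models, $\mu_i$ (and $\lambda_t$ in the two-way case) are treated as random with mean 0 and assumed uncorrelated with $x_{it}$.

```latex
\text{One-way:}\quad y_{it} = \alpha + \mu_i + x_{it}^T \beta + \varepsilon_{it}

\text{Two-way:}\quad y_{it} = \alpha + \mu_i + \lambda_t + x_{it}^T \beta + \varepsilon_{it}

\text{LSDV3 restriction:}\quad \sum_{i=1}^{n} \mu_i = 0
```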

PROC TSCSREG <DATA=SAS-data-set>;
  ID cross_sectional_id_var time_series_id_var;
  MODEL dependent_var = regressors / options;
RUN;
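PROC TSCSREG expects the input data to be sorted by the cross-sectional ID and then by the time ID within each cross section. A minimal sketch that sorts the example data first; liq is a hypothetical output data set name.

```sas
/* Sort by cross-section ID, then by time ID, before PROC TSCSREG */
PROC SORT data=sas3.liqassets out=liq;
  by state year;
RUN;
PROC TSCSREG data=liq;
  id state year;
  model d = y rd rt rs / fixone;
RUN;
```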



Example data: The Demand for Liquid Assets (liqassets)
Variable   Variable information
state      CA, DC, FL, IL, NY, TX, WA
year       1949-1959
d          Per capita demand deposits
t          Per capita time deposits
s          Per capita S & L association shares
y          Permanent per capita personal income
rd         Service charge on demand deposits
rt         Interest on time deposits
rs         Interest on S & L association shares



**one-way fixed effect model;
*LSDV1 in PROC REG;
PROC REG data=b;
  model d = y rd rt rs ds1-ds6;
  test ds1, ds2, ds3, ds4, ds5, ds6;     /* joint test: all state effects = 0 */
RUN;
QUIT;
*LSDV1: PROC TSCSREG;
PROC TSCSREG data=sas3.liqassets;
  id state year;
  model d = y rd rt rs / fixone;
RUN;
*one-way random effects model;
PROC TSCSREG data=sas3.liqassets;
  id state year;
  model d = y rd rt rs / ranone;
RUN;
**two-way fixed effect model;
*LSDV1 in PROC REG;
PROC REG data=b;
  model d = y rd rt rs ds1-ds6 d49-d58;
  test ds1, ds2, ds3, ds4, ds5, ds6;                          /* state effects */
  test d49, d50, d51, d52, d53, d54, d55, d56, d57, d58;      /* year effects */
  test ds1, ds2, ds3, ds4, ds5, ds6,
       d49, d50, d51, d52, d53, d54, d55, d56, d57, d58;      /* both together */
RUN;
QUIT;
*LSDV1: PROC TSCSREG;
PROC TSCSREG data=sas3.liqassets;
  id state year;
  model d = y rd rt rs / fixtwo;
RUN;
*two-way random effects model;
PROC TSCSREG data=sas3.liqassets;
  id state year;
  model d = y rd rt rs / rantwo;
RUN;
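The slides do not show how data set b is built. One possible construction, reusing the array technique from the first part of this session, is sketched below. It is hypothetical: it assumes state is stored as the two-letter codes listed above, year is numeric (1949-1959), and WA and 1959 serve as the reference categories, which is what dropping one dummy in LSDV1 requires.

```sas
/* Hypothetical construction of b: state dummies ds1-ds6 (WA = reference)
   and year dummies d49-d58 (1959 = reference) */
DATA b;
  set sas3.liqassets;
  array ds(6) ds1-ds6;
  array st(6) $ 2 _temporary_ ('CA' 'DC' 'FL' 'IL' 'NY' 'TX');
  array dyear(49:58) d49-d58;
  do k = 1 to 6;
    ds[k] = (state = st[k]);           /* 1 if the row belongs to state k, else 0 */
  end;
  do yr = 49 to 58;
    dyear[yr] = (year = 1900 + yr);    /* 1 if the row is from year 1900+yr, else 0 */
  end;
  drop k yr;
RUN;
```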

Thanks!
CAC statistical WIKI page:
http://research2.smu.edu.sg/CAC/StatisticalComputing/Wiki/SAS.aspx

Statistical consultation service: lsun@smu.edu.sg
