Sun Li
Centre for Academic Computing
lsun@smu.edu.sg
OUTLINE
Advanced procedures
Recoding variables
Reshaping data format between long and wide using arrays (self-study)
faminc1 * .10 ;
faminc2 * .10 ;
faminc3 * .10 ;
faminc4 * .10 ;
faminc5 * .10 ;
faminc6 * .10 ;
faminc7 * .10 ;
faminc8 * .10 ;
faminc9 * .10 ;
faminc10 * .10 ;
faminc11 * .10 ;
faminc12 * .10 ;
RUN;
incq1 = faminc1 + faminc2 + faminc3;
incq2 = faminc4 + faminc5 + faminc6;
incq3 = faminc7 + faminc8 + faminc9;
incq4 = faminc10 + faminc11 + faminc12;
Aquarter[1] =
=
Aquarter[2] =
=
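The quarterly sums can also be computed with arrays and a loop instead of four separate statements. The following is a minimal sketch, assuming one row of twelve monthly incomes: the data set `faminc_demo`, its values, the array name `Afaminc` and the loop index `q` are made up for illustration, while `Aquarter` and `incq1`-`incq4` follow the slides.

```sas
/* Hypothetical one-row input so the sketch runs on its own */
DATA faminc_demo;
  input faminc1-faminc12;
  datalines;
3000 3200 3100 2900 3050 3300 3400 3250 3150 3000 3100 3500
;
RUN;

DATA quarters;
  set faminc_demo;
  array Afaminc[12] faminc1-faminc12;   /* monthly incomes */
  array Aquarter[4] incq1-incq4;        /* quarterly totals */
  do q = 1 to 4;
    /* months 3q-2, 3q-1 and 3q belong to quarter q */
    Aquarter[q] = Afaminc[3*q-2] + Afaminc[3*q-1] + Afaminc[3*q];
  end;
  drop q;
RUN;

PROC PRINT data=quarters;
  var incq1-incq4;
RUN;
```

The loop index does the bookkeeping that the four hand-written assignments do explicitly, so the same step works unchanged if more years of monthly data are added to the arrays.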
SAS MACRO
LANGUAGE
%combine(4)
DATA logit;
input v1-v5 ind1 ind2;
datalines;
1 0 1 1 0 34 23
0 0 1 0 1 22 32
1 1 1 0 0 12 10
0 1 0 1 1 56 90
0 1 0 1 1 26 80
1 1 0 0 0 46 45
0 0 0 1 1 57 53
1 1 0 0 0 22 77
0 1 0 1 1 44 45
1 1 0 0 0 41 72
;
RUN;
%macro mylogit(num);
%local i;
/* Fit one logistic regression per outcome: v1, v2, ..., v&num */
%do i = 1 %to &num;
PROC LOGISTIC data=logit des;
model v&i = ind1 ind2;
RUN;
%end;
%mend mylogit;
%mylogit(5)
ADVANCED PROCEDURES
PROC GENMOD
Bayes' theorem:
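For reference, the theorem can be written for a parameter vector and observed data as follows. The symbols θ (parameters), y (data) and p (density) are notation chosen here, not taken from the slides.

```latex
\[
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)}
\;\propto\; p(y \mid \theta)\, p(\theta),
\qquad p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta .
\]
```

The posterior p(θ | y) is what the BAYES statement samples from; the normalizing constant p(y) does not need to be computed for MCMC.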
Gibbs sampling:
It is a special case of the Metropolis-Hastings algorithm, and
thus an example of a Markov chain Monte Carlo algorithm.
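To make the idea concrete, here is a minimal Gibbs sampler written as a DATA step. It is only a sketch under stated assumptions: the target is a standard bivariate normal with correlation 0.6 (chosen for illustration, not the model used later in the deck), and the names `gibbs`, `rho`, `sd` and `iter` are hypothetical. Each iteration draws one variable from its full conditional distribution given the current value of the other.

```sas
DATA gibbs;
  call streaminit(1);            /* reproducible random stream */
  rho = 0.6;                     /* assumed correlation of the target */
  sd  = sqrt(1 - rho**2);        /* conditional standard deviation */
  x = 0; y = 0;                  /* starting values */
  do iter = 1 to 5000;
    x = rand('normal', rho*y, sd);   /* draw x given y: N(rho*y, 1-rho^2) */
    y = rand('normal', rho*x, sd);   /* draw y given x: N(rho*x, 1-rho^2) */
    if iter > 1000 then output;      /* discard the first 1000 draws as burn-in */
  end;
  keep iter x y;
RUN;

/* The sample correlation of the retained draws should be close to rho */
PROC CORR data=gibbs;
  var x y;
RUN;
```

Every proposal is accepted because each draw comes from an exact conditional distribution, which is what makes Gibbs sampling a special case of Metropolis-Hastings.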
Bayesian Analysis of a Linear Regression Model using Gibbs sampling:
*Bayesian Analysis of a Linear Regression Model;
ODS html;
ODS graphics on;
PROC GENMOD data=surg;
model y = Logx1 X2 X3 X4 / dist=normal;
bayes seed=1;
ods output PosteriorSample=PostSurg;
RUN;
ODS graphics off;
ODS html close;
DATA prob;
set postsurg;
indicator = (logX1 > 0);
label indicator= 'log(Blood Clotting Score) > 0';
RUN;
PROC MEANS data = prob n mean;
var indicator;
RUN;
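The DATA step and PROC MEANS call above amount to a Monte Carlo estimate of a posterior probability: the mean of the 0/1 indicator is the fraction of posterior draws in which the logX1 coefficient is positive. In the notation below, M (number of retained draws) and β^{(m)} (the m-th posterior draw of that coefficient) are symbols introduced here for illustration.

```latex
\[
P(\beta_{\log X_1} > 0 \mid y)
\;\approx\; \frac{1}{M} \sum_{m=1}^{M} \mathbf{1}\{\beta_{\log X_1}^{(m)} > 0\}
\]
```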
PROC PHREG
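Cox Regression Model: \( h_i(t) = h_0(t)\exp(\beta^{T} x_i) \)
\( h_0(t) \) is the baseline hazard function.
For a single binary covariate x:
\[ h(t) = \begin{cases} h_0(t) & \text{if } x = 0 \\ h_0(t)\exp(\beta) & \text{if } x = 1 \end{cases} \]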
Bayesian Analysis:
The probability that the hazard of x=0 is greater than that of x=1 is:
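Because the baseline hazard is positive, comparing the two hazards reduces to the sign of the coefficient. A short derivation, assuming the single-binary-covariate Cox model with hazard h_0(t) for x=0 and h_0(t)exp(β) for x=1; M and β^{(m)} (posterior draws) are notation introduced here.

```latex
\[
h_0(t) > h_0(t)\exp(\beta)
\;\Longleftrightarrow\; \exp(\beta) < 1
\;\Longleftrightarrow\; \beta < 0
\]
\[
P\bigl(h(t \mid x=0) > h(t \mid x=1) \,\big|\, \text{data}\bigr)
= P(\beta < 0 \mid \text{data})
\;\approx\; \frac{1}{M}\sum_{m=1}^{M} \mathbf{1}\{\beta^{(m)} < 0\}
\]
```

This is the quantity estimated later in the deck by averaging an indicator of the form `(custcat1 < 0)` over the posterior sample.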
To study the probability of customers switching to other
telecommunication companies: telco.csv
Variable name   Variable information
age             Age in years
marital         Marital status: 0=unmarried 1=married
address
income
ed              Level of education: 1=didn't complete high school 2=high school degree 3=college degree 4=undergraduate 5=postgraduate
employ
reside
gender          Gender: 0=male 1=female
tenure
churn
custcat         Customer categories: 1=basic service 2=E-service 3=plus service 4=complete service
Bayesian Analysis of the Cox Regression Model:
DATA telco;
set sas3.telco;
RUN;
PROC PRINT data=telco (obs=10); RUN;
*Cox Regression Model;
PROC PHREG data=telco;
model tenure*churn(0)=marital address income ed employ custcat1
custcat2 custcat3;
custcat1=(custcat=1);
custcat2=(custcat=2);
custcat3=(custcat=3);
cust_categories: test custcat1, custcat2, custcat3;
RUN;
*Bayesian Analysis of the Cox Regression Model;
ODS html;
ODS graphics on;
PROC PHREG data=telco;
model tenure*churn(0)= custcat1 custcat2 custcat3;
custcat1=(custcat=1);
custcat2=(custcat=2);
custcat3=(custcat=3);
bayes seed=1 outpost=post;
RUN;
ODS graphics off;
ODS html close;
DATA New;
set Post;
Indicator=(custcat1 < 0);
label Indicator='Basic service < 0';
RUN;
PROC MEANS data=New(keep=Indicator) n mean;
RUN;
Self Study: Prediction
DATA cov_pat;
marital = 1;
address = 1;
employ = 3;
custcat2 = 0;
custcat3 = 1;
custcat4 = 0;
RUN;
PROC MIXED
Recommended reading:
The most common of these structures arises from the use of random-effects
parameters, which are additional unknown random variables assumed to affect
the variability of the data. The variances of the random-effects parameters,
commonly known as variance components, become the covariance parameters
for this particular structure.
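Hierarchical notation:
Level 1: \( Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \)
Level 2: \( \beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + u_{0j} \)
         \( \beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + u_{1j} \)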
Variable information
MATHACH
SES
MEANSES
CSES
SECTOR
PROC SQL;
create table hsb2 as
select *, mean(ses) as meanses, ses-mean(ses) as cses
from sas3.hsb
group by schoolid;
QUIT;
\[ Y_{ij} = \gamma_{00} + \gamma_{01} Z_j + \gamma_{10} X_{ij} + \gamma_{11} Z_j X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij} \]
The fixed effect refers to the overall expected effect of a student's
socioeconomic status on test scores; the random effect indicates
whether this effect differs between schools.
PROC MIXED COVTEST <DATA=SAS-data-set> ;
CLASS variables;
MODEL dep_var = predictors / SOLUTION ddfm=bw;
RANDOM variables / SUBJECT=var SOLUTION;
RUN;
MODEL : identifies the model elements.
CLASS : specifies the classification variables.
NOTEST : specifies no hypothesis test for fixed effects.
**Multilevel model (mixed model): PROC MIXED;
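A call of the following shape would fit the two-level model. This is a hedged sketch rather than the code from the original slide: the fixed-effect terms are inferred from the prediction equation used in the self-study exercise, `type=un` and the 0/1 coding of `sector` are assumptions, and the data set `hsb_sim` is simulated with made-up values so that the example runs without the course file.

```sas
/* Simulated stand-in for the HSB data (hypothetical values, not the real file) */
DATA hsb_sim;
  call streaminit(2024);
  do schoolid = 1 to 40;
    sector = (schoolid > 20);               /* assumed 0/1 school type */
    u0 = rand('normal', 0, 1.5);            /* school-level random intercept */
    u1 = rand('normal', 0, 0.5);            /* school-level random slope */
    school_shift = rand('normal', 0, 0.4);  /* shifts the school mean SES */
    do student = 1 to 30;
      ses = school_shift + rand('normal', 0, 0.7);
      mathach = 12 + u0 + (2 + u1)*ses + sector + rand('normal', 0, 6);
      output;
    end;
  end;
  drop u0 u1 school_shift;
RUN;

/* Same centering step as the PROC SQL call in the slides */
PROC SQL;
  create table hsb_sim2 as
  select *, mean(ses) as meanses, ses - mean(ses) as cses
  from hsb_sim
  group by schoolid;
QUIT;

PROC MIXED data=hsb_sim2 covtest noclprint;
  class schoolid;
  /* fixed part: school-level predictors, centered SES, cross-level interactions */
  model mathach = meanses sector cses meanses*cses sector*cses
        / solution ddfm=bw;
  /* random intercept and random cses slope for each school */
  random intercept cses / subject=schoolid type=un;
RUN;
```

With the real data, `hsb_sim2` would be replaced by `hsb2`. The RANDOM statement corresponds to the u0j and u1j terms of the combined model, and COVTEST adds tests for their variance components.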
Self Study: Prediction
Plot the predicted math achievement scores with meanses constrained to low,
medium and high values. Use the 25th, 50th and 75th percentiles of meanses to
define the low, medium and high strata.
PROC UNIVARIATE data=hsb2;
var meanses;
RUN;
DATA toplot;
set hsb2;
length strata $ 6; *long enough to hold "Medium" without truncation;
if meanses<=-0.323 then do;
ms=-0.323;
strata="Low";
end;
else if meanses>=0.327 then do;
ms=0.327;
strata="High";
end;
else do;
ms=0.032; strata="Medium" ; end;
predicted=12.1282+5.3367*ms+1.2245*sector+2.9407*cses+1.0345*ms*cses-1.6388*sector*cses;
RUN;
PROC SORT data=toplot;
by strata;
RUN;
goptions reset=all;
symbol1 i=join c=red ;
symbol2 i=join c=blue ;
axis1 order=(-4 to 3 by 1) label=("Group Centered SES");
axis2 order=(0 to 22 by 2) label=(a=90 "Math Achievement Score");
PROC GPLOT data = toplot;
by strata;
plot predicted*cses = sector / vaxis = axis2 haxis = axis1;
RUN;
QUIT;
PROC PANEL
Panel data structure: values of j variables (x1, ..., xj) are recorded
for each of n subjects (e.g. firms) at each of t time points, giving nt cases.
Cases (nt)   x1   x2   x3   ...   xj
11
12
...
1t
21
22
...
2t
31
32
...
3t
...
nt
Panel data:
            LSDV1        LSDV2     LSDV3
PROC REG    w/o dummy    /NOINT    RESTRICT
Airline
The data measure costs, prices of inputs, and utilization rates for six
airlines over the time span 1970-1984. This example analyzes the log
transformations of the cost, price and quantity, and the raw (not logged)
capacity utilization measure.
Variable name   Variable information
LF              Capacity utilization (raw, not logged)
lC              Log of cost
lQ              Log of quantity
lPF             Log of price
The following two-way fixed-effects (FIXTWO) model is considered first:
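A two-way fixed-effects specification for these data can be written as below. This is a sketch in generic notation, assuming the same regressors as the MODEL statements used for the random-effects fits (lQ, lPF, lF); the symbols β, γ_i (airline effect), α_t (year effect) and ε_it are introduced here and are not taken from the slides.

```latex
\[
lC_{it} = \beta_0 + \beta_1\, lQ_{it} + \beta_2\, lPF_{it} + \beta_3\, LF_{it}
          + \gamma_i + \alpha_t + \varepsilon_{it},
\qquad i = 1,\dots,6,\quad t = 1970,\dots,1984
\]
```

Here γ_i and α_t are treated as fixed constants to be estimated, in contrast to the random-effects models, where they are treated as random draws with their own variance components.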
Further analysis on random effects:
*random effect panel model;
PROC PANEL data=airline outest=estimates;
id I T;
RANONE:   model lC = lQ lPF lF / ranone vcomp=fb;
RANONEwk: model lC = lQ lPF lF / ranone vcomp=wk;
RANONEwh: model lC = lQ lPF lF / ranone vcomp=wh;
RANONEnl: model lC = lQ lPF lF / ranone vcomp=nl;
RANTWO:   model lC = lQ lPF lF / rantwo vcomp=fb;
RANTWOwk: model lC = lQ lPF lF / rantwo vcomp=wk;
RANTWOwh: model lC = lQ lPF lF / rantwo vcomp=wh;
RANTWOnl: model lC = lQ lPF lF / rantwo vcomp=nl;
RUN;
There are four ways of computing the variance components in the one-way
random-effects model. The Fuller and Battese method (FB, the default) uses a
"fitting of constants" approach to estimate them. The Wansbeek and Kapteyn
(WK) method uses the true disturbances, while the Wallace and Hussain (WH)
method uses ordinary least squares residuals. The Nerlove method (NL) is
guaranteed to give variance component estimates that are always positive.
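For orientation, the variance components being estimated are the two variances in the one-way random-effects model, written here in generic notation. The symbols y_it, x_it, ν_i, ε_it, σ²_ν and σ²_ε are assumptions of this sketch, not the slides' notation.

```latex
\[
y_{it} = x_{it}^{T}\beta + \nu_i + \varepsilon_{it},
\qquad \operatorname{Var}(\nu_i) = \sigma^2_{\nu},
\qquad \operatorname{Var}(\varepsilon_{it}) = \sigma^2_{\varepsilon}
\]
```

The four VCOMP= methods differ only in how they estimate the cross-section and error variances; the two-way model adds a third component for the time effect.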
Self-study: output results of these random model estimates in tables;
*self study;
DATA table;
set estimates;
VarCS = round(_VARCS_,.00001);
VarTS = round(_VARTS_,.00001);
VarErr = round(_VARERR_,.00001);
Int = round(Intercept,.0001);
lQ2 = round(lQ,.0001);
lPF2 = round(lPF,.0001);
lF2 = round(lF,.0001);
keep _MODEL_ _METHOD_ VarCS VarTS VarErr Int lQ2 lPF2 lF2;
RUN;
title 'Parameter Estimates';
title 'Variance Component Estimates';
THANKS!
CAC statistical WIKI page:
http://research2.smu.edu.sg/CAC/StatisticalComputing/Wiki/SAS.aspx