
Assignment 3

The Veterans Administration Lung Cancer data set includes the following variables:

trt: 1=standard, 2=test
celltype: 1=squamous, 2=smallcell, 3=adeno, 4=large
time: survival time
status: censoring status
karno: Karnofsky performance score (100=good)
diagtime: months from diagnosis to randomisation
age: in years
prior: prior therapy 0=no, 1=yes

Question 1
(a) To assess whether the relationship between a covariate and the logarithm of the hazard is
linear, we plot the martingale residuals against the covariate of interest.

Figure 1 Residuals to check for linearity

The LOWESS curves for age and for diagnosis time show no obvious systematic trend in either
the martingale residuals or the deviance residuals. The diagnosis-time panels are stretched out
by a few outliers; otherwise, the deviance residuals are roughly symmetric. The Karnofsky
performance score, on the other hand, shows a systematic decreasing trend, which suggests that
the linearity assumption may not hold for this variable. A non-linear relationship between
mortality and Karnofsky score is plausible. A Karnofsky score of 100 means that the patient can
carry out normal activity without special care, and a score of 0 means that the patient is dead,
so Karnofsky score and mortality are negatively related. This decrease could be exponential,
quadratic, or linear; the plots alone do not tell us which. If we assume that the relationship
between Karnofsky score and the log hazard is linear, then no modification of the effects of
age, diagnosis time, or Karnofsky score is needed. If we assume that the relationship is
non-linear, we need to construct a model that takes the non-linear relationship between
mortality and Karnofsky score into account.
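One hedged way to probe the functional form of the Karnofsky score is to add a quadratic term
and compare the two nested fits with a likelihood-ratio test. The sketch below assumes the
veteran data shipped with the survival package; the object names fit_lin and fit_quad are ours
for illustration, and the quadratic term is only one of the candidate shapes mentioned above,
not a recommended final model.

```r
library(survival)

# Linear Karnofsky effect (same covariates as the model in the Appendix)
fit_lin <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
                 data = veteran)

# Same model plus a quadratic Karnofsky term (an illustrative choice)
fit_quad <- update(fit_lin, . ~ . + I(karno^2))

# Likelihood-ratio comparison of the two nested models
lrt <- anova(fit_lin, fit_quad)
print(lrt)
```

A small p-value in the comparison would point towards curvature in the Karnofsky effect; a
large one would give no reason to move away from the linear term.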
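(b) The score residuals are each observation's contribution to the score vector. They indicate
how much observation i influences a parameter estimate, that is, roughly how the estimate would
change if i were removed from the model fit. They are calculated separately for each regression
coefficient.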
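The link between score residuals and influence can be checked numerically. The sketch below
assumes the full model from the Appendix and compares the dfbeta residuals returned by the
survival package with the score residual matrix multiplied by the estimated variance matrix of
the coefficients; the names approx_dfbeta and max_diff are ours.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
             data = veteran, method = "breslow")

r_score  <- residuals(fit, type = "score")   # n x p matrix, one column per coefficient
r_dfbeta <- residuals(fit, type = "dfbeta")  # approximate change in each coefficient

# Score residuals scaled by the coefficient variance matrix
approx_dfbeta <- r_score %*% fit$var
max_diff <- max(abs(approx_dfbeta - r_dfbeta))
print(max_diff)
```

If the two agree, max_diff is zero up to rounding, which is why large score residuals are read
as a sign of influential observations.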

Figure 2 The score residuals of each covariate

To identify outliers, we applied the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR.
Treatment: There are three outliers at the upper end in the standard group (in red) and three
outliers at the lower end in the test group (in blue). Individuals 44, 17, and 21 are red;
individuals 78, 75, and 73 are blue. The most noticeable outliers are individual 44 at the very
top of the standard group and individual 78 in the test group.
Celltype: Since celltype is coded as three dummy covariates, we have three sets of score
residuals. Each set of boxplots has different outliers, as seen above. The most frequent
outliers across the three cell-type plots are individuals 44 and 17.

Karnofsky performance score: There are a few outliers on the upper end (in red), and only one
outlier at the bottom (in blue). The outliers include individuals 100, 21, 118, 44, and 70. There
are several noticeable outliers. For instance, individual 118 has a Karnofsky score of 10, which is
the only case where the score equals 10 in the study. Individual 44 with a Karnofsky score of 40
has the highest score residual in the study.
Diagtime: There are several outliers at the upper end of the score residuals (in red) and a few
at the lower end (in blue). The most noticeable outlier is individual 12, at 58 months from
diagnosis to randomization, with a score residual of 45.77282. The outliers include
individuals 77, 118, 45, 81, 78, 58, 21, 8, 95, 44, 12, 106, 36, and 91.
Age: There are some outliers on the upper end of score residuals (in red), and a few in the
lower end (in blue). There are several noticeable blue colored outliers (individuals 44 and 118)
at the bottom right. The outliers include individuals 44, 17, 91, 58, 75, 9, 11, 73, 118, 85, 18, 89,
36, 53, 95, and 111.
Prior: There are two outliers in the group without prior therapy and a few outliers in the
group with prior therapy. The outliers include individuals 44, 17, 75, 73, 36, 70, and 91. The
most noticeable outliers are individual 44 at the very top of the group with no prior
therapy and individual 75 in the group with prior therapy.
In conclusion: Individuals 17 and 44 are outliers for almost all the covariates analyzed above,
and both have small-cell tumours. They are therefore considered influential subjects that
affect the results of the analysis.
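The 1.5*IQR rule used above can be sketched as follows. The helper name iqr_outliers is
hypothetical, and the sketch applies the fences to each column of score residuals as a whole,
which is an assumption on our part; the grouped boxplots apply the same rule within each group,
so the flagged individuals need not match the lists above exactly.

```r
library(survival)

# Indices of values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  which(x < q[1] - fence | x > q[2] + fence)
}

fit <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
             data = veteran, method = "breslow")
r_score <- residuals(fit, type = "score")

# One vector of flagged row numbers per regression coefficient
flagged <- lapply(seq_len(ncol(r_score)), function(k) iqr_outliers(r_score[, k]))
names(flagged) <- names(coef(fit))
print(flagged)
```

Individuals that appear in many of the vectors in flagged are the candidates for influential
subjects.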
(c) Investigate the appropriateness of the proportional hazards assumptions through
appropriate residual plots, and statistical tests.
The Residual-time Correlations:
> cox.zph(fit)
                      rho  chisq        p
trt               -0.0284  0.131 0.717536
celltypesmallcell  0.0128  0.026 0.871960
celltypeadeno      0.1428  2.980 0.084304
celltypelarge      0.1708  4.082 0.043339
karno              0.3057 12.815 0.000344
diagtime           0.1498  2.923 0.087323
age                0.1890  5.313 0.021166
prior             -0.1756  4.386 0.036242
GLOBAL                 NA 27.641 0.000548

The cox.zph function tests proportionality for each predictor in the model by correlating its
scaled Schoenfeld residuals with (transformed) time, which amounts to testing for a
covariate-by-time interaction. A p-value below 0.05 is taken as evidence against the
proportionality assumption. Here, celltypelarge, karno, age, and prior all have p-values below
0.05, so all four appear to violate the proportional hazards assumption. The GLOBAL test in the
last row tests all the interactions at once. Its p-value of 0.000548 indicates a violation of
the proportionality assumption for the model as a whole.
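The screening step described above can be sketched by pulling the p-values out of the cox.zph
result. The names zp, p_values, and flagged are ours. As far as we are aware, recent releases
of the survival package report a score test per model term and no longer print the rho column,
so the rows and numbers may differ somewhat from the output shown above.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
             data = veteran)

zp <- cox.zph(fit)            # default time transform is "km"
p_values <- zp$table[, "p"]   # one p-value per row, including the GLOBAL row

# Rows whose p-value falls below the 0.05 threshold used in the text
flagged <- names(p_values)[p_values < 0.05]
print(flagged)
```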

Schoenfeld Residuals Analysis:

Figure 3 Schoenfeld Residuals against time

Treatment: The residuals are mostly centered around the 0 line and there is no systematic
trend over time; thus, we conclude that the proportionality assumption holds.
Small Cell: The residuals are centering around 0 with no systematic trend over time; thus,
based on the plot, we conclude that the proportionality assumption holds.
Adeno: There does not seem to be a systematic trend for the residual for this covariate. Thus,
we can conclude that the proportionality assumption holds.
Large: There seems to be an upward trend over time for this covariate. Thus, the proportional
hazards assumption may not hold for this covariate.
Karno: The residuals lie mostly on the negative side, so their mean is more likely to be
negative than zero. Thus, the proportional hazards assumption might not hold for this
covariate.
Diagtime: The residuals are centered around 0 with some random fluctuations. There are no
systematic trends over time for this covariate. Thus, the proportionality assumption holds.
Age: There is some increasing trend in the residual for age over time even though the line is
mostly centered around 0. Thus, the proportionality assumption might not hold.
Prior: There is a systematic decreasing trend over time for the prior therapy effect. Thus, the
proportional hazards assumption might not hold for this effect.

Covariate-time interactions:
# Covariate-time interactions using tt function
# For each coefficient of fit, refit the model with an extra (covariate x time) term
ctt = c()
for (i in 1:length(names(coef(fit)))){
  cox = coxph(Surv(time,status)~trt+celltype+karno+age+prior+diagtime+
              tt(as.numeric(model.matrix(fit)[,i])),
              tt=function(x, t, ...)I(t*x),data=veteran)
  # row 9 is the tt term (8 main effects come first), column 5 is its p-value
  ctt = rbind(ctt,summary(cox)$coefficients[9,5])
}
results = cbind(names(coef(fit)),ctt)
colnames(results) = c("covariate","p_value")
results

> results
covariate p_value
[1,] "trt" "0.97796842372455"
[2,] "celltypesmallcell" "0.0292373532064322"
[3,] "celltypeadeno" "0.00185399162105382"
[4,] "celltypelarge" "0.0163737188345113"
[5,] "karno" "0.0171885773543171"
[6,] "diagtime" "0.8106179045302"
[7,] "age" "0.380977018238588"
[8,] "prior" "0.222710266155446"

A covariate-time interaction with a significant p-value indicates that the covariate effect is
not constant over time, so that effect violates the proportional hazards assumption. Here, the
three celltype terms and karno are statistically significant, which indicates a violation of
the proportional hazards assumption for these covariates. Unlike the residual-time
correlations, this method does not flag age or prior. In addition, celltypesmallcell and
celltypeadeno are significant here, whereas neither was significant in the residual-time
correlation test. Either method can be used to test proportionality, and the two methods do not
necessarily give the same results.
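One possible reason for the disagreement, offered here as an assumption rather than a
diagnosis, is the time scale: cox.zph uses a Kaplan-Meier transform of time by default, while
the tt terms above are linear in time. The sketch below reruns cox.zph with the identity
transform so that both approaches use the same time scale; the object names are ours.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
             data = veteran)

zp_km  <- cox.zph(fit)                          # default: transform = "km"
zp_lin <- cox.zph(fit, transform = "identity")  # g(t) = t, as in the tt() models

# Side-by-side p-values under the two time transforms
comparison <- list(km = zp_km$table[, "p"], identity = zp_lin$table[, "p"])
print(comparison)
```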

Question 2

Figure 4 Survival Curves from the Cox Model

R code for Figure 4


cox.fit = coxph(Surv(time, status)~trt, data=veteran)
plot(survfit(cox.fit,newdata=data.frame(trt=c(1,2))),lty=c(1,1),col=c('red','blue'), xlab="Time",
ylab="Survival Probability", main="Survival Curves of the Cox Model")
legend(800, 1, c("standard", "test"), col=c("red", "blue"), lty=c(1:1))

In the Cox model, the two fitted survival curves are almost identical. Notice that the two
curves do not cross. This is because the model assumes proportional hazards, so the fitted
curves cannot cross by construction; on their own they therefore cannot show whether the
assumption is violated.
Residual-time Correlation:
> cox.zph(cox.fit)
rho chisq p
trt -0.16 3.3 0.0691

From the cox.zph function, the p-value for the residual-time correlation is 0.0691. At a
significance level of 0.05, the test does not reject proportionality for the treatment effect,
although only narrowly. At a significance level of 0.1, the same p-value indicates a violation
of proportionality.

Covariate-time Interaction:
> cox.mod = coxph(Surv(time, status) ~ trt + tt(trt), tt=function(x,t, ...)
x * t, data=veteran)
> summary(cox.mod)
Call:
coxph(formula = Surv(time, status) ~ trt + tt(trt), data = veteran,
tt = function(x, t, ...) x * t)

n= 137, number of events= 128


             coef exp(coef)  se(coef)      z Pr(>|z|)
trt      0.378143  1.459572  0.247019  1.531   0.1258
tt(trt) -0.003486  0.996520  0.001622 -2.149   0.0316 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value of tt(trt) from the covariate-time interaction test is 0.0316, which indicates that
the treatment effect violates the proportionality assumption. The negative tt(trt) coefficient
means that the log hazard ratio of the test group relative to the standard group decreases as
time increases, so the relative risk in the test group falls over time. This agrees with the
Kaplan-Meier survival curves in Figure 5: as time goes on, the treatment effect increases and
the risk decreases.
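As a worked example of this interpretation, the fitted hazard ratio of test versus standard at
time t is exp(0.378143 - 0.003486 t), because trt is coded 1 and 2 and the groups differ by one
unit. The sketch below uses only the coefficients printed above; the names log_hr, t_equal, and
hr_table are ours. The fitted hazard ratio equals 1 at roughly 108 days and is below 1
afterwards.

```r
# Coefficients copied from the summary(cox.mod) output above
b_trt <- 0.378143    # log hazard ratio (test vs standard) at t = 0
b_tt  <- -0.003486   # change in the log hazard ratio per day

# Log hazard ratio of test vs standard at time t (in days)
log_hr <- function(t) b_trt + b_tt * t

# Time at which the fitted hazard ratio equals 1
t_equal <- -b_trt / b_tt

days <- c(0, 100, 200, 300)
hr_table <- data.frame(day = days, hr = exp(log_hr(days)))
print(round(t_equal, 1))
print(hr_table)
```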

Figure 5 Kaplan-Meier Survival Curves

Since the Kaplan-Meier survival curves in Figure 5 cross, this suggests a violation of the
proportional hazards assumption for the treatment effect. The survival curve for the test group
lies below that of the standard group until the two curves cross, at roughly 160 days. This
shows that the treatment effect is not constant over time and hence that the proportionality
assumption is violated.

Question 3
Add baseline age to the above model (Model 1)
coxph(formula = Surv(time, status) ~ trt + age, data = veteran)

        coef exp(coef) se(coef)     z    p
trt -0.00365   0.99635  0.18251 -0.02 0.98
age  0.00753   1.00756  0.00966  0.78 0.44

Add age to the model as a time-dependent covariate (Model 2)


coxph(formula = Surv(time, status) ~ trt + tt(age), data = veteran,
tt = function(x, t, ...) x + t/365)

            coef exp(coef) se(coef)     z    p
trt     -0.00365   0.99635  0.18251 -0.02 0.98
tt(age)  0.00753   1.00756  0.00966  0.78 0.44

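Suppose two individuals i and j are both assigned to the control group (trt = 0 in the notation
below). When age is treated as a time-dependent covariate, as in the tt function of Model 2,
the Cox model for individuals i and j is, respectively,

$$\lambda_i(t) = \lambda_0(t)\exp\left[\beta_1(0) + \beta_2\left(\mathrm{age}_i + \frac{t}{365}\right)\right] = \lambda_0(t)\exp\left[\beta_2\left(\mathrm{age}_i + \frac{t}{365}\right)\right]$$

$$\lambda_j(t) = \lambda_0(t)\exp\left[\beta_2\left(\mathrm{age}_j + \frac{t}{365}\right)\right]$$

Then the hazard ratio becomes the following:

$$\frac{\lambda_i(t)}{\lambda_j(t)} = \frac{\exp\left[\beta_2\left(\mathrm{age}_i + \frac{t}{365}\right)\right]}{\exp\left[\beta_2\left(\mathrm{age}_j + \frac{t}{365}\right)\right]} = \exp\left[\beta_2(\mathrm{age}_i - \mathrm{age}_j)\right]$$

The hazard ratio when we treat age as a fixed covariate:

$$\frac{\lambda_i(t)}{\lambda_j(t)} = \frac{\exp(\beta_2\,\mathrm{age}_i)}{\exp(\beta_2\,\mathrm{age}_j)} = \exp(\beta_2\,\mathrm{age}_i - \beta_2\,\mathrm{age}_j) = \exp\left[\beta_2(\mathrm{age}_i - \mathrm{age}_j)\right]$$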

Both hazard ratios are the same, even when age is treated as a time-dependent covariate. The
reason is that the elapsed time added to age is common to every individual still at risk at
time t, so it cancels: the age difference between two individuals is the same whether we use
baseline age or time-dependent age. This is why Model 1 and Model 2 give identical estimates.
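The equivalence can also be checked numerically. The sketch below refits the two models from
this question and compares their coefficients; the names m1, m2, and coef_diff are ours.

```r
library(survival)

# Model 1: baseline age as a fixed covariate
m1 <- coxph(Surv(time, status) ~ trt + age, data = veteran)

# Model 2: current age, that is, age at entry plus follow-up time in years
m2 <- coxph(Surv(time, status) ~ trt + tt(age), data = veteran,
            tt = function(x, t, ...) x + t / 365)

# Largest absolute difference between the two sets of coefficients
coef_diff <- max(abs(unname(coef(m1)) - unname(coef(m2))))
print(coef_diff)
```

The difference should be zero up to numerical tolerance, because adding the same quantity to
every member of a risk set leaves the partial likelihood unchanged.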

Question 4
(a) Separate treatment effects before and after 100 days of follow-up.

Below shows the long-format dataset from R:

vet = NULL
for (i in 1:nrow(veteran)) {
  if (veteran$time[i]<=100) {
    vet = rbind(vet, data.frame(veteran[i,], id=i, start=0, stop=veteran$time[i],
                                event=veteran$status[i], tr1=1, tr2=0))
  } else if (veteran$time[i] > 100) {
    vet = rbind(vet, data.frame(veteran[i,], id=i, start=0, stop=100, event=0, tr1=1, tr2=0))
    vet = rbind(vet, data.frame(veteran[i,], id=i, start=100, stop=veteran$time[i],
                                event=veteran$status[i], tr1=0, tr2=1))
  }
}
head(vet)
tail(vet)

Below are the results for the first and last two individuals:
> head(vet);tail(vet)
trt celltype time status karno diagtime age prior id start stop event tr1 tr2
1 1 squamous 72 1 60 7 69 0 1 0 72 1 1 0
2 1 squamous 411 1 70 5 64 10 2 0 100 0 1 0
21 1 squamous 411 1 70 5 64 10 2 100 411 1 0 1

trt celltype time status karno diagtime age prior id start stop event tr1 tr2
136 2 large 378 1 80 4 65 0 136 0 100 0 1 0
1361 2 large 378 1 80 4 65 0 136 100 378 1 0 1
137 2 large 49 1 30 3 37 0 137 0 49 1 1 0

Now we construct the following Cox model:

$$\lambda_i(t) = \lambda_0(t)\exp\left\{\beta_1\,\mathrm{trt}_i\,I_{\{t \le 100\}} + \beta_2\,\mathrm{trt}_i\,I_{\{t > 100\}}\right\}$$
> cox.vet
Call:
coxph(formula = Surv(start, stop, event) ~ trt:tr1 + trt:tr2,
data = vet)

          coef exp(coef) se(coef)     z     p
trt:tr1  0.399     1.490    0.228  1.75 0.080
trt:tr2 -0.680     0.507    0.316 -2.15 0.032

Likelihood ratio test=7.96 on 2 df, p=0.0187


n= 190, number of events= 128
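The same long-format data can be built without an explicit loop by using survSplit from the
survival package. This is a sketch of an alternative, not the code used for the results above;
the column names early and late and the object names vet2 and fit_split are ours, and they play
the role of tr1 and tr2.

```r
library(survival)

# Cut each subject's follow-up at day 100; rows become (tstart, time] intervals
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran, cut = 100, id = "id")

# Period indicators, mirroring tr1 and tr2 in the loop above
vet2$early <- as.numeric(vet2$tstart < 100)
vet2$late  <- 1 - vet2$early

# Separate treatment effects before and after day 100
fit_split <- coxph(Surv(tstart, time, status) ~ trt:early + trt:late, data = vet2)
print(coef(fit_split))
```

The coefficients should agree with the trt:tr1 and trt:tr2 estimates reported above, since the
intervals and the event indicator are defined in the same way.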

(b) Using the tt argument of the coxph function


> cox.tt
Call:
coxph(formula = Surv(time, status) ~ tt(trt), data = veteran,
    tt = function(x, t, ...) cbind(I(t <= 100) * x, I(t > 100) * x))

           coef exp(coef) se(coef)     z     p
tt(trt)1  0.399     1.490    0.228  1.75 0.080
tt(trt)2 -0.680     0.507    0.316 -2.15 0.032

Likelihood ratio test=7.96 on 2 df, p=0.0187


n= 137, number of events= 128

The two Cox models give the same coefficients and p-values. However, if we look at the
Kaplan-Meier curves, or any other summary of the same data, and then create new variables and
new hypotheses based on what we observed, we inflate the type I error. As the number of tests
increases, there is a higher chance that one of them will yield a significant result by chance
even though there is no difference in hazard rate between treatment and control. To proceed
this way, we need either to account for multiple comparisons or to test the same hypothesis in
an independent dataset that was not used to generate it. The results for the new dataset will
then be valid.

Question 5
Hazard Model
$$\lambda_i(t) = \lambda_0(t)\exp\left[\sum_{k=1}^{p}\beta_k\,x_{ki}(t)\right]$$

$\lambda_0(t)$ = the baseline hazard function
$x_{ki}(t)$ = the value of the kth covariate at time t, in the ith individual
$k = 1, 2, \ldots, p$ defines the kth covariate
$i = 1, 2, \ldots, n$ defines the ith individual
There are two parameters we estimate: the coefficient of the treatment effect before 100 days
of follow-up and the coefficient of the treatment effect after 100 days of follow-up.

For our case,

$$\lambda_i(t) = \lambda_0(t)\exp\left[\beta_1\,\mathrm{trt}_i\,I_{\{t \le 100\}} + \beta_2\,\mathrm{trt}_i\,I_{\{t > 100\}}\right],$$

where $I_{\{t \le 100\}}$ and $I_{\{t > 100\}}$ are the indicator functions of follow-up time up
to and including 100 days and follow-up time after 100 days, respectively; $\mathrm{trt}_i$ is
an indicator for whether the ith individual is in the treatment group or not; if
$\mathrm{trt}_i = 1$ then the ith individual is in the treatment group and if
$\mathrm{trt}_i = 0$ then the ith individual is in the standard group.

The two parameters of the model:

$\beta_1$ is the effect of the treatment on the hazard rate before 100 days of follow-up.
$\beta_2$ is the effect of the treatment on the hazard rate after 100 days of follow-up.
Cox Partial Likelihood Function

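$$PL(\beta) = \prod_{i=1}^{137}\left[\frac{\exp\left(\beta_1\,\mathrm{trt}_i\,I_{\{t_i \le 100\}} + \beta_2\,\mathrm{trt}_i\,I_{\{t_i > 100\}}\right)}{\sum_{j=1}^{137} Y_j(t_i)\exp\left(\beta_1\,\mathrm{trt}_j\,I_{\{t_i \le 100\}} + \beta_2\,\mathrm{trt}_j\,I_{\{t_i > 100\}}\right)}\right]^{\delta_i}$$

where $\delta_i = 1$ if the event has occurred; $\delta_i = 0$ if the lifetime is censored;
$Y_j(t_i) = I_{\{t_j \ge t_i\}}$ is an indicator for individual j being at risk at time $t_i$.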
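The partial likelihood above can be evaluated directly and compared with the value reported by
coxph. The sketch below is written under two assumptions: tied event times are handled in the
Breslow form, where every tied event uses the full risk set, and trt keeps its 1/2 coding from
the data, which gives the same partial likelihood as a 0/1 indicator because a constant shift
cancels within each risk set. The function name log_pl is hypothetical.

```r
library(survival)

# Log partial likelihood of the two-period treatment model (Breslow form for ties)
log_pl <- function(beta, time, status, trt) {
  total <- 0
  for (i in which(status == 1)) {                  # only events contribute (delta_i = 1)
    b <- if (time[i] <= 100) beta[1] else beta[2]  # coefficient in force at t_i
    at_risk <- time >= time[i]                     # Y_j(t_i)
    total <- total + b * trt[i] - log(sum(exp(b * trt[at_risk])))
  }
  total
}

# Same model fitted by coxph, with Breslow ties so that the two likelihoods match
fit_tt <- coxph(Surv(time, status) ~ tt(trt), data = veteran, method = "breslow",
                tt = function(x, t, ...) cbind((t <= 100) * x, (t > 100) * x))

by_hand <- log_pl(unname(coef(fit_tt)), veteran$time, veteran$status, veteran$trt)
print(c(by_hand = by_hand, coxph = fit_tt$loglik[2]))
```

Maximizing log_pl over the two coefficients, for example with optim, would be one way to
reproduce the Breslow estimates without coxph.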

Appendix
library(survival)
veteran
veteran$prior[veteran$prior==10]=1
veteran$smallcell <- as.numeric(veteran$celltype=="smallcell")
veteran$adeno <- as.numeric(veteran$celltype=="adeno")
veteran$large <- as.numeric(veteran$celltype=="large")

# Assessing Linearity: Martingale Residuals
fit=coxph(Surv(time,status)~trt+celltype+karno+diagtime+age+prior,data=veteran,
method='breslow')
r_mart = resid(fit,type="martingale")
par(mfrow=c(2,3))
plot(veteran$age, r_mart, xlab="Age", ylab="Martingale Residuals",
main="Martingale Residuals by Age")
lines(lowess(veteran$age,r_mart),col="red")
plot(veteran$diagtime, r_mart, xlab="Diagnosis Time", ylab="Martingale Residuals",
main="Martingale Residuals by Diagnosis Time")
lines(lowess(veteran$diagtime,r_mart),col="red")
plot(veteran$karno, r_mart, xlab="Karnofsky Performance Score", ylab="Martingale Residuals",
main="Martingale Residuals by Karnofsky Performance Score")
lines(lowess(veteran$karno,r_mart),col="red")

# Deviance Residuals (rescaled versions of martingale residuals)
# The LOWESS curve in each panel smooths the deviance residuals shown in that panel
r_dev=residuals(fit,type="deviance")
plot(veteran$age,r_dev,xlab="Age",ylab="Deviance Residuals",
main="Deviance Residuals by Age")
abline(0,0,lty=2,col='red')
lines(lowess(veteran$age,r_dev),col="red")
plot(veteran$diagtime,r_dev,xlab="Diagnosis Time",ylab="Deviance Residuals",
main="Deviance Residuals by Diagnosis Time")
abline(0,0,lty=2,col='red')
lines(lowess(veteran$diagtime,r_dev),col="red")
plot(veteran$karno,r_dev,xlab="Karnofsky Performance Score",ylab="Deviance Residuals",
main="Deviance Residuals by Karnofsky Performance Score")
abline(0,0,lty=2,col='red')
lines(lowess(veteran$karno,r_dev),col="red")

# Influential Points: Score Residual


par(mfrow=c(2,4))
r_score=residuals(fit,type="score")

plot(veteran$trt, r_score[,1], ylab="Score Residuals", xlab="Treatment",
main="Score Residuals by Treatment")
points(veteran$trt[44], r_score[,1][44], col = "red")
points(veteran$trt[17], r_score[,1][17], col = "red")
points(veteran$trt[21], r_score[,1][21], col = "red")
points(veteran$trt[78], r_score[,1][78], col = "blue")
points(veteran$trt[75], r_score[,1][75], col = "blue")
points(veteran$trt[73], r_score[,1][73], col = "blue")

plot(veteran$celltype, r_score[,2], ylab="Score Residuals", xlab ="Cell Type",
main="Score Residuals by Cell Type", names=c("squa", "small", "adeno", "large"))
points(veteran$celltype[44], r_score[,2][44], col = "blue")
points(veteran$celltype[17], r_score[,2][17], col = "blue")
points(veteran$celltype[21], r_score[,2][21], col = "blue")
points(veteran$celltype[36], r_score[,2][36], col = "blue")

plot(veteran$celltype, r_score[,3], ylab="Score Residuals", xlab ="Cell Type",
main="Score Residuals by Cell Type", names=c("squa", "small", "adeno", "large"))
points(veteran$celltype[118], r_score[,3][118], col = "blue")
points(veteran$celltype[125], r_score[,3][125], col = "blue")

plot(veteran$celltype, r_score[,4], ylab="Score Residuals", xlab ="Cell Type",
main="Score Residuals by Cell Type", names=c("squa", "small", "adeno", "large"))
points(veteran$celltype[44], r_score[,4][44], col = "red")
points(veteran$celltype[17], r_score[,4][17], col = "red")
points(veteran$celltype[73], r_score[,4][73], col = "red")
points(veteran$celltype[78], r_score[,4][78], col = "red")
points(veteran$celltype[75], r_score[,4][75], col = "red")
points(veteran$celltype[36], r_score[,4][36], col = "red")
points(veteran$celltype[21], r_score[,4][21], col = "red")
points(veteran$celltype[58], r_score[,4][58], col = "blue")

plot(veteran$karno, r_score[,5], ylab="Score Residuals",xlab="Karnofsky Performance Score",
main="Score Residuals by Karnofsky Performance Score")
points(veteran$karno[100], r_score[,5][100], col = "red")
points(veteran$karno[21], r_score[,5][21], col = "red")
points(veteran$karno[118], r_score[,5][118], col = "red")
points(veteran$karno[44], r_score[,5][44], col = "red")
points(veteran$karno[70], r_score[,5][70], col = "blue")

plot(veteran$diagtime, r_score[,6], ylab="Score Residuals", xlab="Diagnosis Time",
main="Score Residuals by Diagnosis Time")
points(veteran$diagtime[77], r_score[,6][77], col = "red")

points(veteran$diagtime[118], r_score[,6][118], col = "red")
points(veteran$diagtime[45], r_score[,6][45], col = "red")
points(veteran$diagtime[81], r_score[,6][81], col = "red")
points(veteran$diagtime[78], r_score[,6][78], col = "red")
points(veteran$diagtime[58], r_score[,6][58], col = "red")
points(veteran$diagtime[21], r_score[,6][21], col = "red")
points(veteran$diagtime[8], r_score[,6][8], col = "red")
points(veteran$diagtime[95], r_score[,6][95], col = "red")
points(veteran$diagtime[44], r_score[,6][44], col = "red")
points(veteran$diagtime[12], r_score[,6][12], col = "red")
points(veteran$diagtime[106], r_score[,6][106], col = "blue")
points(veteran$diagtime[36], r_score[,6][36], col = "blue")
points(veteran$diagtime[91], r_score[,6][91], col = "blue")

plot(veteran$age, r_score[,7], ylab="Score Residuals", xlab="Age",
main="Score Residuals by Age")
points(veteran$age[44], r_score[,7][44], col = "blue")
points(veteran$age[17], r_score[,7][17], col = "red")
points(veteran$age[91], r_score[,7][91], col = "red")
points(veteran$age[58], r_score[,7][58], col = "red")
points(veteran$age[75], r_score[,7][75], col = "red")
points(veteran$age[9], r_score[,7][9], col = "red")
points(veteran$age[11], r_score[,7][11], col = "red")
points(veteran$age[73], r_score[,7][73], col = "red")
points(veteran$age[118], r_score[,7][118], col = "blue")
points(veteran$age[85], r_score[,7][85], col = "blue")
points(veteran$age[18], r_score[,7][18], col = "blue")
points(veteran$age[89], r_score[,7][89], col = "blue")
points(veteran$age[36], r_score[,7][36], col = "blue")
points(veteran$age[53], r_score[,7][53], col = "blue")
points(veteran$age[95], r_score[,7][95], col = "blue")
points(veteran$age[111], r_score[,7][111], col = "blue")

plot(veteran$prior, r_score[,8], ylab="Score Residuals", xlab="Prior",
main="Score Residuals by Prior")
points(veteran$prior[44], r_score[,8][44], col = "red")
points(veteran$prior[17], r_score[,8][17], col = "red")
points(veteran$prior[75], r_score[,8][75], col = "blue")
points(veteran$prior[73], r_score[,8][73], col = "blue")
points(veteran$prior[36], r_score[,8][36], col = "blue")
points(veteran$prior[70], r_score[,8][70], col = "blue")
points(veteran$prior[91], r_score[,8][91], col = "blue")
# Influential Points: dfbetas (standardized dfbeta residuals)
r_dfbeta=residuals(fit,type="dfbetas")
plot(veteran$trt, r_dfbeta[,1], ylab="dfbetas", xlab="Treatment",
main="dfbetas by Treatment")
plot(veteran$celltype, r_dfbeta[,2], ylab="dfbetas", xlab ="Cell Type",
main="dfbetas by Cell Type", names=c("squa", "small", "adeno", "large"))
plot(veteran$celltype, r_dfbeta[,3], ylab="dfbetas", xlab ="Cell Type",
main="dfbetas by Cell Type", names=c("squa", "small", "adeno", "large"))
plot(veteran$celltype, r_dfbeta[,4], ylab="dfbetas", xlab ="Cell Type",
main="dfbetas by Cell Type", names=c("squa", "small", "adeno", "large"))
plot(veteran$karno, r_dfbeta[,5], ylab="dfbetas",xlab="Karnofsky Performance Score",
main="dfbetas by Karnofsky Performance Score")
plot(veteran$diagtime, r_dfbeta[,6], ylab="dfbetas", xlab="Diagnosis Time",
main="dfbetas by Diagnosis Time")
plot(veteran$age, r_dfbeta[,7], ylab="dfbetas", xlab="Age",
main="dfbetas by Age")
plot(veteran$prior, r_dfbeta[,8], ylab="dfbetas", xlab="Prior",
main="dfbetas by Prior")

# Scaled Schoenfeld Residuals
par(mfrow=c(2,4))
residuals(fit)
residuals(fit, type="scaledsch")
cox.zph(fit)
for (i in 1:8){
  plot(cox.zph(fit)[i])
  abline(0,0,lty=2,col='red')
}

# Covariate-time interactions using tt function
# For each coefficient of fit, refit the model with an extra (covariate x time) term
ctt = c()
for (i in 1:length(names(coef(fit)))){
  cox = coxph(Surv(time,status)~trt+celltype+karno+age+prior+diagtime+
              tt(as.numeric(model.matrix(fit)[,i])),
              tt=function(x, t, ...)I(t*x),data=veteran)
  # row 9 is the tt term (8 main effects come first), column 5 is its p-value
  ctt = rbind(ctt,summary(cox)$coefficients[9,5])
}
results = cbind(names(coef(fit)),ctt)
colnames(results) = c("covariate","p_value")
results

# Question 2
# Cox Model Survival Curves
cox.fit = coxph(Surv(time, status)~trt, data=veteran)

plot(survfit(cox.fit,newdata=data.frame(trt=c(1,2))),lty=c(1,1),col=c('red','blue'), xlab="Time",
ylab="Survival Probability", main="Survival Curves of the Cox Model")
legend(800, 1, c("standard", "test"), col=c("red", "blue"), lty=c(1:1))
cox.zph(cox.fit, global=TRUE)
cox.mod = coxph(Surv(time, status) ~ trt + tt(trt), tt=function(x,t, ...) x * t, data=veteran)

# Kaplan-Meier Survival Curves


km = survfit(Surv(time,status)~trt, data=veteran, conf.type="none")
plot(km,lty=c(1,1),col=c("red","blue"), xlab="Time", ylab="Survival Probability",
main="Kaplan-Meier Survival Curves")
legend(800, 1, c("standard", "test"), col=c("red", "blue"), lty=c(1:1))

# Question 3
coxfit1 = coxph(Surv(time, status)~trt + age, data=veteran)
coxfit2 = coxph(Surv(time, status)~trt + tt(age),tt=function(x,t,...) x+t/365, data=veteran)

# Question 4
# Estimate separate treatment effects for early and later parts of the follow-up
vet = NULL
for (i in 1:nrow(veteran)) {
  if (veteran$time[i]<=100) {
    vet = rbind(vet, data.frame(veteran[i,], id=i, start=0, stop=veteran$time[i],
                                event=veteran$status[i], tr1=1, tr2=0))
  } else if (veteran$time[i] > 100) {
    vet = rbind(vet, data.frame(veteran[i,], id=i, start=0, stop=100, event=0, tr1=1, tr2=0))
    vet = rbind(vet, data.frame(veteran[i,], id=i, start=100, stop=veteran$time[i],
                                event=veteran$status[i], tr1=0, tr2=1))
  }
}
head(vet);tail(vet)

cox.vet = coxph(Surv(start,stop,event)~trt:tr1+trt:tr2,data = vet)


cox.vet
# Using the tt argument of the coxph function
cox.tt = coxph(Surv(time,status)~tt(trt), data=veteran, tt=function(x,t, ...) cbind(I(t<=100)*x,
I(t>100)*x))
cox.tt
