
HW2 Solutions, Stat 798L

#3.6. We assume that time to relapse is Expon(lambda) and that time
to death following relapse is Weibull(theta, alpha) and
independent of time to relapse. Note that no distributional
assumption allows for the possibility of death preceding relapse,
so presumably such a death would be treated as a right-censored
observation of relapse time, although that never occurs in the
dataset. (It is questionable whether such censoring could be
independent of time to relapse.)
So the answers to parts (a) and (b) are obtained by extracting
from the full likelihood for all parameters those terms which
depend respectively upon lambda alone and upon (theta, alpha).
That joint likelihood is

Lik = lambda^6 * exp(-lambda*180) * theta^4 * alpha^4 *
      (6*4*3*13)^(alpha-1) * exp(-theta*(6^alpha + 4^alpha +
      3^alpha + 9^alpha + 13^alpha + 11^alpha))

For part (c), we are asked effectively to write the conditional
likelihood for death-times following relapse given that relapse
occurs: the likelihood now has only 6 patients' factors but,
because of the assumed independence of relapse-time and
death-time following relapse, this left-truncated likelihood
agrees precisely with the (theta,alpha) portion of the likelihood
already given as answer to part (b).

#3.8. (a) P(delta=1) = lambda/(lambda+theta).
(b) P(T>t) = P(X>t)*P(C>t) = exp(-(lambda+theta)*t), so that
    T ~ Expon(lambda+theta).
(c) P(delta=1, T>t) =
    int_t^{infty} exp(-theta*x) lambda exp(-lambda*x) dx =
    (lambda/(lambda+theta))*exp(-(lambda+theta)*t) = P(delta=1)*P(T>t),
which (since delta is binary) proves independence.
(d) Note that S = (lambda+theta)*(T_1+...+T_n) ~ Gamma(n,1)
and D=delta_1+...+delta_n ~ Binom(n, p) with p=lambda/(lambda+theta)
are independent by (c). Also E(D) = np, E(D^2) = (np)^2+np(1-p) and
E(1/S^k) = Gamma(n-k)/Gamma(n) for k=1,2. From these formulas, using
\hat{lambda} = (lambda+theta)*D/S, we derive
E(\hat{lambda}) = lambda*n/(n-1) and
Var(\hat{lambda}) = n*lambda*(n*lambda/(n-1)+theta)/((n-1)*(n-2))
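
The moment formulas in (d) can be checked by simulation. The following is only a rough sketch: the values of lambda, theta, n, the seed and the number of replicates are arbitrary choices made for illustration, not part of the problem.

```r
# Monte Carlo check of the mean and variance formulas for lambda-hat.
# All numeric settings below are illustrative assumptions.
set.seed(798)
lambda <- 2; theta <- 1; n <- 10; nrep <- 200000

X  <- matrix(rexp(n * nrep, rate = lambda), nrow = nrep)  # event times
C  <- matrix(rexp(n * nrep, rate = theta),  nrow = nrep)  # censoring times
Tm <- pmin(X, C)            # observed times T_i = min(X_i, C_i)
D  <- rowSums(X <= C)       # number of uncensored observations per sample
lam.hat <- D / rowSums(Tm)  # lambda-hat = D / (T_1 + ... + T_n)

mean.theory <- lambda * n / (n - 1)
var.theory  <- n * lambda * (n * lambda / (n - 1) + theta) / ((n - 1) * (n - 2))

round(c(mean.sim = mean(lam.hat), mean.theory = mean.theory,
        var.sim  = var(lam.hat),  var.theory  = var.theory), 4)
```

The simulated and theoretical values should agree up to Monte Carlo error.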

Extra Problem: as indicated in class, the book's notation q_i(t)
corresponds to our psi(t) = F_1'(t), and the book's f_i(t) to our
f_X(t) = -S_X'(t). So
   rho(t) = (f_X(t)/psi(t)-1)/(S_X(t)/S_T(t)-1).
Letting y = S_X(t) be the quantity for which we want to solve
uniquely, and noting that y = S_X(t) >= S_T(t) by definition,
we solve the last display for f_X(t) = -dy/dt and obtain the ODE
(with initial condition y=1 for t=0)
   -dy/dt = psi(t)*( rho(t)*(y/S_T(t)-1) + 1 )
uniquely determining y. Since y/S_T(t)-1 >= 0, the RHS of the
equation becomes larger with larger rho, which implies that
y = S_X(t) becomes smaller when rho(s) is increased
(simultaneously at all s-values).
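
As a numerical illustration of this monotonicity, the ODE can be integrated by a simple Euler scheme. The sketch below makes two assumptions that are not in the problem: rho is taken constant in t, and psi and S_T are given the forms they would have if X and C were independent Expon(1), in which case rho = 1 should reproduce y = exp(-t). The function names are hypothetical.

```r
# Euler integration of -dy/dt = psi(t) * (rho * (y / S_T(t) - 1) + 1), y(0) = 1.
# psi and S.T are assumed forms for illustration only (X, C indep. Expon(1)).
psi <- function(t) exp(-2 * t)   # F_1'(t) = f_X(t) * S_C(t)
S.T <- function(t) exp(-2 * t)   # S_T(t) = S_X(t) * S_C(t)

solve.y <- function(rho, t.max = 2, h = 1e-4) {
  tt <- seq(0, t.max, by = h)
  y  <- numeric(length(tt)); y[1] <- 1
  for (i in seq_along(tt)[-1]) {
    slope <- -psi(tt[i - 1]) * (rho * (y[i - 1] / S.T(tt[i - 1]) - 1) + 1)
    y[i]  <- y[i - 1] + h * slope          # one Euler step
  }
  y[length(tt)]                            # value of y = S_X at t.max
}

y.end <- sapply(c(0.5, 1, 2), solve.y)     # constant rho, an assumption
names(y.end) <- c("rho=0.5", "rho=1", "rho=2")
y.end
```

The three end values decrease as rho increases, in line with the argument above.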

#4.1. Aneuploid Tumors are the ones (52) with Profil=1


## Times are supposed to be in weeks, in which case we should use
## multiples of 52 [better than 48] per year
## But solutions in book use the units given, so relate to
## 11 and 61 weeks not months
> library(survival)
> Tong = read.table("Data/TongCanc.dat", col.names=c("Profil","Time","Status"))
> tongfit = survfit(Surv(Time,Status) ~ 1, data=Tong[Tong$Profil==1,])
## (a) Use event-time 4 at 11 wk, and event-time 14 at 51 wk
> matrix(round(c(tongfit$surv[c(4,14)], (tongfit$std.err*tongfit$surv)[
     c(4,14)]),4), ncol=2, dimnames=list(c("1yr","5yr"),c("Surv","SE")))
      Surv     SE
1yr 0.9038 0.0409
5yr 0.6538 0.0660

### matches book's answer

## (b)-(e)
> round(c(NelsAal = cumsum(tongfit$n.ev/tongfit$n.risk)[14], lgKM =
     -log(tongfit$surv[14]), SE = tongfit$std.err[14]),5)
NelsAal    lgKM      SE
0.41780 0.42488 0.10090   ## this was with Greenwood:
                          ## Tsiatis SE answer was .0992, matching book
> tongft2 = survfit(Surv(Time,Status) ~ 1, data=Tong[Tong$Profil==1,],
     error="tsiatis")
## Linear CI: this time using Greenwood SE to match book
> tongfit$surv[14]*(1 + 1.96*c(-1,1)*tongfit$std.err[14])
[1] 0.5245377 0.7831546
## Log-trans CI: called "log-log" in R, using Greenwood SE
> tongfit$surv[14]^exp(1.96*c(-1,1)*tongfit$std.err[14]/
log(tongfit$surv[14]))
[1] 0.5082759 0.7658557
## Arcsine-square-root CI, again using Greenwood SE
> sin(asin(sqrt(tongfit$surv[14]))+0.5*1.96*c(-1,1)*
     tongfit$std.err[14]*sqrt(1/(1/tongfit$surv[14]-1)))^2
[1] 0.5204761 0.7759203
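
The three interval formulas used above can be collected in one small helper. This is a sketch only: `km.ci` is a hypothetical name, and the inputs S = 0.6538 and sigma = 0.1009 are the rounded values printed above, so the results agree with the transcript only to about three decimals.

```r
# Pointwise 95% CIs for S(t) on three scales, given the KM estimate S
# and sigma = SE(S)/S (the relative standard error, as in tongfit$std.err).
# km.ci is a hypothetical helper name, not part of the survival package.
km.ci <- function(S, sigma, z = qnorm(0.975)) {
  lin  <- S * (1 + z * c(-1, 1) * sigma)               # linear
  llog <- S^exp(z * c(-1, 1) * sigma / log(S))         # log-log transform
  ang  <- asin(sqrt(S)) + 0.5 * z * c(-1, 1) * sigma * sqrt(S / (1 - S))
  asn  <- sin(pmax(0, pmin(pi / 2, ang)))^2            # arcsine-sqrt
  rbind(linear = lin, loglog = llog, arcsine = asn)
}

ci <- km.ci(S = 0.6538, sigma = 0.1009)   # rounded values from the output above
round(ci, 4)
```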

## (f) We want to consider bands for the time interval "3 to 6 years", but
## since the book translates that to times 36-71, we also do that,
## so the time range is 32 to 72, corresp. to the 12th to 18th evt-times.
> aLU = 1/(1+1/(52*tongfit$std.err[c(12,18)])) ## = c(.8278,.8567)
## with these values, the c05 constant is read off (approximately)
## from Table C.3b, p.464 using linear extrapolations as
## 2.6973-((.86-.8567)/.02)*.0322-((.8278-.6)/.02)*(.016) = 2.5
## Here used "tsiatis" in place of Greenwood to match book's answer
> for(i in 12:18) cat(round(tongft2$surv[i]^exp(2.5*c(-1,1)*
tongft2$std.err[i]/log(tongft2$surv[i])),5), "\n")
0.5056 0.82015
0.4861 0.80471
0.46686 0.78899
0.46686 0.78899
0.44707 0.77267
0.42757 0.75608
0.40837 0.73923

#4.2 Here we use only Gp=2 and 3 (resp. AML low and high risk), Dtime & Dind
## After storing the 137 lines of data in BMT.data, and storing as
## MS-DOS text to strip tabs:
> BMT = read.table("BMT.data", col.names=c("Gp","Dtime","Rtime",
     "Dind","Rind", paste("V",6:22,sep="")))[,c(1,2,4)] ## 137 x 3
> table(BMT[,1])
 1  2  3
38 54 45   ## only groups 2 (Low) and 3 (High) are of interest to us
## 54 patients in AML Low-risk group, but only 23 unique death times

NOTE: the notation sigma_S(t) in the book (Chapter 4, pp.105ff.)
does NOT denote the standard error of the Kaplan-Meier survival
function estimator, but rather the standard error divided by the
Kaplan-Meier estimator itself !! SE's below use Greenwood, not Tsiatis.
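
To make the distinction between sigma_S(t) and the standard error of the Kaplan-Meier estimator concrete, here is a small by-hand computation in base R. It is a sketch on made-up data (the six observations below are not from the BMT dataset) and the variable names are illustrative.

```r
# Kaplan-Meier estimate with Greenwood standard errors, computed by hand.
# The six (time, status) pairs are made-up illustration data.
time   <- c(2, 3, 3, 5, 7, 8)
status <- c(1, 1, 0, 1, 1, 0)     # 1 = death, 0 = censored

tj <- sort(unique(time[status == 1]))                       # distinct death times
dj <- sapply(tj, function(s) sum(time == s & status == 1))  # deaths at each tj
rj <- sapply(tj, function(s) sum(time >= s))                # number at risk at tj

S       <- cumprod(1 - dj / rj)                  # Kaplan-Meier estimator
sigma.S <- sqrt(cumsum(dj / (rj * (rj - dj))))   # the book's sigma_S(t)
se.S    <- S * sigma.S                           # Greenwood SE of S itself

round(cbind(time = tj, n.risk = rj, n.event = dj, surv = S,
            sigma.S = sigma.S, SE = se.S), 4)
```

The column `sigma.S` plays the role of the `std.err` component used in the transcripts, while `SE` corresponds to the product `surv*std.err`.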

##(a)
> BMTlo = survfit(Surv(Dtime, Dind) ~ 1, data=BMT[BMT$Gp==2,2:3])
> rbind(time= BMTlo$time[BMTlo$n.ev>0], surv = round(BMTlo$surv[
     BMTlo$n.ev>0],4), SE = round((BMTlo$surv*BMTlo$std.err)[BMTlo$n.event>0],4))

time 10.0000 35.0000 48.0000 53.0000 79.0000 80.0000 105.0000 222.0000 288.0000
surv  0.9815  0.9630  0.9444  0.9259  0.9074  0.8889   0.8704   0.8519   0.8333
SE    0.0183  0.0257  0.0312  0.0356  0.0394  0.0428   0.0457   0.0483   0.0507

time 390.0000 393.0000 414.0000 431.0000 481.0000 522.0000 583.0000 641.0000
surv   0.8148   0.7963   0.7778   0.7593   0.7407   0.7222   0.7037   0.6852
SE     0.0529   0.0548   0.0566   0.0582   0.0596   0.0610   0.0621   0.0632

time 653.0000 704.0000 1063.0000 1074.0000 1156.0000 2204.0000
surv   0.6667   0.6481    0.6258    0.6034    0.5811    0.4842
SE     0.0642   0.0650    0.0665    0.0678    0.0688    0.1054

### The std.err component given directly is for Nelson-Aalen not KM

> BMThi = survfit(Surv(Dtime, Dind) ~ 1, data=BMT[BMT$Gp==3,2:3])
> rbind(time= BMThi$time[BMThi$n.ev>0], surv = round(BMThi$surv[BMThi$
     n.ev>0],4), SE = round((BMThi$surv*BMThi$std.err)[BMThi$n.event>0],4))

time 2.0000 16.0000 62.0000 63.0000 73.0000 74.0000 80.0000 93.0000 97.0000
surv 0.9778  0.9556  0.9333  0.9111  0.8889  0.8667  0.8444  0.8222  0.8000
SE   0.0220  0.0307  0.0372  0.0424  0.0468  0.0507  0.0540  0.0570  0.0596

time 105.0000 121.0000 122.0000 128.0000 129.0000 153.0000 162.0000 164.000
surv   0.7556   0.7333   0.7111   0.6889   0.6667   0.6444   0.6222   0.600
SE     0.0641   0.0659   0.0676   0.0690   0.0703   0.0714   0.0723   0.073

time 168.0000 183.0000 195.0000 248.0000 265.0000 318.0000 341.0000 363.0000
surv   0.5778   0.5556   0.5333   0.5111   0.4889   0.4667   0.4444   0.4222
SE     0.0736   0.0741   0.0744   0.0745   0.0745   0.0744   0.0741   0.0736

time 392.000 469.0000 491.0000 515.0000 547.0000 677.0000 732.0000 1298.0000
surv   0.400   0.3778   0.3556   0.3333   0.3111   0.2889   0.2667    0.2370
SE     0.073   0.0723   0.0714   0.0703   0.0690   0.0676   0.0659    0.0649

## (b)
> round(cumsum(BMTlo$n.event/BMTlo$n.risk)[BMTlo$n.ev>0],4)
[1] 0.0185 0.0374 0.0566 0.0762 0.0962 0.1166 0.1375 0.1587 0.1805 0.2027
[11] 0.2254 0.2487 0.2725 0.2969 0.3219 0.3475 0.3738 0.4009 0.4286 0.4631
[21] 0.4988 0.5359 0.7025 ### Nelson-Aalen at distinct BMTlo death-times
> round(BMTlo$std.err[BMTlo$n.ev>0],4)
[1] 0.0187 0.0267 0.0330 0.0385 0.0435 0.0481 0.0525 0.0568 0.0609 0.0649
[11] 0.0688 0.0727 0.0766 0.0805 0.0844 0.0883 0.0922 0.0962 0.1003 0.1062
[21] 0.1123 0.1185 0.2176
### standard errors
> round(cumsum(BMThi$n.event/BMThi$n.risk)[BMThi$n.ev>0],4)
[1] 0.0222 0.0449 0.0682 0.0920 0.1164 0.1414 0.1670 0.1934 0.2204 0.2759
[11] 0.3054 0.3357 0.3669 0.3992 0.4325 0.4670 0.5027 0.5397 0.5782 0.6182
[21] 0.6599 0.7033 0.7488 0.7964 0.8464 0.8990 0.9546 1.0134 1.0759 1.1426
[31] 1.2140 1.2909 1.4021
> round(BMThi$std.err[BMThi$n.ev>0],4)
[1] 0.0225 0.0321 0.0398 0.0466 0.0527 0.0585 0.0640 0.0693 0.0745 0.0848
[11] 0.0899 0.0950 0.1002 0.1054 0.1107 0.1162 0.1217 0.1274 0.1333 0.1394
[21] 0.1458 0.1524 0.1594 0.1667 0.1744 0.1826 0.1913 0.2007 0.2108 0.2218
[31] 0.2339 0.2472 0.2739

## (c)
> plot(BMThi$time, cumsum(BMThi$n.event/BMThi$n.risk), pch=18)
> points(BMTlo$time, cumsum(BMTlo$n.event/BMTlo$n.risk), pch=5)
### "Crude" hazards (Cum hazard per unit time)
### For low-risk group:
> cumsum(BMTlo$n.event/BMTlo$n.risk)[BMTlo$n.ev>0][23]/
BMTlo$time[BMTlo$n.ev>0][23]
[1] 0.0003187612
### For hi-risk group:
> cumsum(BMThi$n.event/BMThi$n.risk)[BMThi$n.ev>0][33]/
BMThi$time[BMThi$n.ev>0][33]
[1] 0.001080166

### Average hazard intensities per unit time.

## (e)
Median survival-time estimators, each followed by three kinds of
confidence intervals:
> GreenwdSE.lo = BMTlo$surv*BMTlo$std.err ## Greenwood SE of KM, lo-risk
> BMTlo$time[min((1:54)[BMTlo$surv <= 0.5])]
[1] 2204
> range(BMTlo$time[(1:54)[abs(BMTlo$surv-0.5)< qnorm(0.975)*
     GreenwdSE.lo]]) ### 1063 2569 linear interval, lo-risk
> range(BMTlo$time[(1:54)[abs(log(-log(BMTlo$surv))-log(-log(0.5)))<
     qnorm(0.975)*GreenwdSE.lo/abs(BMTlo$surv*log(BMTlo$surv))]])
[1] 1063 2569 ### log-transformed interval, lo-risk
> range(BMTlo$time[(1:54)[abs(asin(sqrt(BMTlo$surv))-asin(sqrt(0.5)))<
     0.5*qnorm(0.975)*GreenwdSE.lo/sqrt(BMTlo$surv*(1-BMTlo$surv))]])
[1] 1063 2569 ### asin-sqrt-transformed interval, lo-risk

### To reinforce the method, we calculate the table of
### ("Delta-Method") normal deviates on the linear, loglog, and
### asinsqrt scales with the idea that only those times with
### absolute deviates less than 1.96 fall in the respective CI's

> cbind(Time= BMTlo$time, LinDev = (BMTlo$surv-0.5)/GreenwdSE.lo,
     Log2Dev=(log(-log(BMTlo$surv))-log(-log(0.5)))*BMTlo$surv*
     log(BMTlo$surv)/GreenwdSE.lo,
     Asinsqrt=(asin(sqrt(BMTlo$surv))-asin(sqrt(0.5)))*2*sqrt(
     BMTlo$surv*(1-BMTlo$surv))/GreenwdSE.lo)[BMTlo$n.ev>0,]
      Time     LinDev    Log2Dev   Asinsqrt
 [1,]   10 26.2441393  3.6130874  9.5367245
 [2,]   35 18.0144190  4.1158433  8.6967652
 [3,]   48 14.2581260  4.3215990  8.0459432
 [4,]   53 11.9511517  4.3948018  7.4918439
 [5,]   79 10.3284835  4.3917989  6.9987540
 [6,]   80  9.0932676  4.3389629  6.5483870
 [7,]  105  8.1026853  4.2508102  6.1298903
 [8,]  222  7.2782441  4.1361854  5.7361936
 [9,]  288  6.5726714  4.0008680  5.3623818
[10,]  390  5.9555175  3.8488329  5.0048694
[11,]  393  5.4061282  3.6829219  4.6609409
[12,]  414  4.9099031  3.5052296  4.3284764
[13,]  431  4.4561660  3.3173371  4.0057792
[14,]  481  4.0368840  3.1204610  3.6914616
[15,]  522  3.6458621  2.9155521  3.3843672
[16,]  583  3.2782180  2.7033628  3.0835164
[17,]  641  2.9300269  2.4844934  2.7880662
[18,]  653  2.5980765  2.2594271  2.4972814
[19,]  704  2.2796916  2.0285536  2.2105115
[20,] 1063  1.8923447  1.7262462  1.8513666
[21,] 1074  1.5267718  1.4237446  1.5046043
[22,] 1156  1.1781685  1.1201575  1.1677264
[23,] 2204 -0.1494589 -0.1504246 -0.1494094

> GreenwdSE.hi = BMThi$surv*BMThi$std.err ## Greenwood SE of KM, hi-risk
> BMThi$time[min((1:44)[BMThi$surv <= 0.5])]
[1] 265
> range(BMThi$time[(1:44)[abs(BMThi$surv-0.5)< qnorm(0.975)*
     GreenwdSE.hi]]) ### 162 469 linear interval, hi-risk
> range(BMThi$time[(1:44)[abs(log(-log(BMThi$surv))-log(-log(0.5)))<
     qnorm(0.975)*GreenwdSE.hi/abs(BMThi$surv*log(BMThi$surv))]])
[1] 153 469 ### log-transformed interval, hi-risk
> range(BMThi$time[(1:44)[abs(asin(sqrt(BMThi$surv))-asin(sqrt(0.5)))<
     0.5*qnorm(0.975)*GreenwdSE.hi/sqrt(BMThi$surv*(1-BMThi$surv))]])
[1] 162 469 ### asin-sqrt-transformed interval, hi-risk
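
The test-inversion idea behind these median intervals (keep every death time at which the Kaplan-Meier estimate is within z standard errors of 0.5) can be sketched in base R. The data below are made up for illustration and contain no tied times; only the linear scale is shown.

```r
# Median estimate and linear-scale 95% CI for the median, keeping the
# death times whose KM estimate lies within z*SE of 0.5.
# The (time, status) values are made-up illustration data without ties.
time   <- 1:12
status <- c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0)   # 1 = death, 0 = censored

tj <- time[status == 1]                        # death times (all distinct)
rj <- sapply(tj, function(s) sum(time >= s))   # number at risk at each tj
S  <- cumprod(1 - 1 / rj)                      # Kaplan-Meier estimator
SE <- S * sqrt(cumsum(1 / (rj * (rj - 1))))    # Greenwood SE of S

med    <- tj[min(which(S <= 0.5))]             # first time with S <= 0.5
lin.ci <- range(tj[abs(S - 0.5) < qnorm(0.975) * SE])
list(median = med, linear.CI = lin.ci)
```

The log-log and arcsine-square-root versions follow by replacing the inequality with the transformed ones used in the transcript.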

## (f)
Last jump before 300 days was at 288 days (the 9th jump) in the
low-risk group, and at 265 days (22nd jump) in the high-risk group.
> Shat = c(BMTlo$surv[9], BMThi$surv[22]) ### .8333 and .4889
> sigShat = c((BMTlo$surv*BMTlo$std.err)[9], (BMThi$surv*BMThi$std.err)[22])
> thet = exp(qnorm(.975)*sigShat/(Shat*log(Shat)))
> Shat[1]^c(1/thet[1],thet[1]) #Lo-risk log interval: 0.7042 0.9096
> Shat[2]^c(1/thet[2],thet[2]) #Hi-risk log interval: 0.3374 0.6241

> asnShat = asin(sqrt(Shat))
> Delt = 0.5*qnorm(.975)*sigShat/sqrt(Shat*(1-Shat))
> sin(c(max(0,asnShat[1]-Delt[1]),min(pi/2,asnShat[1]+Delt[1])))^2
### Lo-risk asin sqrt interval: (0.7233, 0.9198)
> sin(c(max(0,asnShat[2]-Delt[2]),min(pi/2,asnShat[2]+Delt[2])))^2
### Hi-risk asin sqrt interval: (0.3454, 0.6333)

## (g) EP bands, in each risk-group, tL=100, tU=400.
NB. In lo-risk group, the last death-times before tL, tU are resp. the
6th and 11th event-times; in hi-risk gp, resp. the 9th and 26th event-times.

> aLU = matrix(1/(1+1/c(54*(BMTlo$std.err)[c(6,11)]^2,
     45*(BMThi$std.err)[c(9,26)]^2)),
     nrow=2, dimnames=list(c("L","U"),c("lo","hi")))
> aLU
         lo  hi
L 0.1111111 0.2
U 0.2037037 0.6
## Get confidence coefs for alpha=.05 case from table via lin.interp.
## resp. 2.536 for "lo" and 2.767 for "hi"
> for (i in 6:11) cat(round(BMTlo$surv[i]^exp(2.536*c(-1,1)*
     BMTlo$std.err[i]/log(BMTlo$surv[i])),5), "\n")
0.71758 0.95906
0.69605 0.94819
0.67474 0.93674
0.65372 0.92478
0.63298 0.91237
0.61254 0.89955
### EP band for "lo" risk group
> for (i in 9:26) cat(round(BMThi$surv[i]^exp(2.767*c(-1,1)*
     BMThi$std.err[i]/log(BMThi$surv[i])),5), "\n")
0.56989 0.91526
0.52343 0.88571
0.50076 0.87015
0.47847 0.85413
0.45654 0.83767
0.43498 0.82079
0.41378 0.80351
0.39294 0.78585
0.37245 0.76782
0.35231 0.74942
0.33252 0.73068
0.31307 0.71159
0.29398 0.69215
0.27523 0.67237
0.25685 0.65225
0.23882 0.63179
0.22115 0.61098
0.20387 0.58981
### EP band for "hi" risk group
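
The two ingredients of the EP band in (g) can be wrapped in small helpers. This is a sketch: `ep.a` and `ep.band` are hypothetical names, and the illustration assumes no censoring before the time point, so that sigma_S^2 = (1-S)/(n*S); S = 48/54 is chosen to match the printed value 0.8889 at the 6th lo-risk event time. The coefficient 2.536 is the one read from the table above.

```r
# a = n*sigma^2/(1 + n*sigma^2) and the log-log-transformed EP band,
# given KM estimates S, relative SEs sigma = SE(S)/S, sample size n and
# a confidence coefficient c.ep taken from the book's table.
# ep.a and ep.band are hypothetical helper names.
ep.a <- function(sigma, n) n * sigma^2 / (1 + n * sigma^2)

ep.band <- function(S, sigma, c.ep) {
  theta <- exp(c.ep * sigma / log(S))    # theta < 1 because log(S) < 0
  cbind(lower = S^(1 / theta), upper = S^theta)
}

# Illustration (assumes no censoring before this time point):
n     <- 54
S     <- 48 / 54
sigma <- sqrt((1 - S) / (n * S))         # Greenwood sigma_S without censoring

a.val <- ep.a(sigma, n)
band  <- ep.band(S, sigma, c.ep = 2.536)
round(c(a = a.val, band[1, ]), 4)
```

Under this no-censoring assumption a reduces to 1 - S, which is why the aLU entries for the lo-risk group equal one minus the corresponding survival estimates.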
