A Survival Analysis
Laboratory Session 11
1 Learning Objective
2 Activities
3 Exercises
3.1 Life tables and Survival Curves
3.2 Cox Regression
4 Homework
5 References
5.1 Articles for Critical Appraisal
5.2 Required Reading
5.3 Suggested Reading
6 Output
1 Learning Objective
Upon completion of the course unit, students should be able to:
1. Understand the concept of survival analysis and its application in public health
research.
2. Understand the process of estimation and inference in survival analysis, especially
the interpretation of hazard coefficients and relative risk (RR) estimates.
3. Assess the best model for a survival analysis applied to public health data.
2 Activities
1. Discussion: survival analysis, including estimation of RR
2. Laboratory session:
(a) Understand the concept of survival analysis and its application to data
analysis.
(b) Understand the process of estimation and inference for the hazard coefficient and
relative risk (RR).
(c) Read computer output from a survival analysis.
(d) Read journal publications that use survival analysis models.
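As background for activity 2(b), the relationship between a hazard coefficient and the corresponding ratio estimate can be written out under the Cox proportional hazards model. The symbols below (baseline hazard h_0, coefficient beta, binary covariate x) are standard textbook notation and are not taken from this handout; the ratio shown is the hazard ratio, which this handout refers to as a relative risk.

```latex
% Cox proportional hazards model for a single binary covariate x
h(t \mid x) = h_0(t)\,\exp(\beta x)

% Hazard ratio comparing x = 1 with x = 0; the baseline hazard cancels
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp(\beta)

% Approximate 95% confidence interval, built on the coefficient scale
\exp\!\left(\hat{\beta} \pm 1.96 \,\mathrm{SE}(\hat{\beta})\right)
```

Because the interval is built on the coefficient scale and then exponentiated, it is not symmetric around the estimated hazard ratio.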
3 Exercises
3.1 Life tables and Survival Curves
This exercise uses the dataset "leukaemia".
1. First, set up the data for survival analysis. The time variable is weeks, the number
of weeks to relapse. The outcome variable is relapse, which is 1 if the subject had
a relapse at that time and 0 if they did not. Hence the command to set the data up
for survival analysis is stset weeks, fail(relapse).
2. Obtain a life table for the subjects on Drug A with the command sts list if
treatment1 == 1. What is the median survival in this group (at what time does
the survivor function reach 0.5)?
4. Obtain a life table for the subjects on standard treatment with the command sts
list if treatment1 == 0. What is the median survival in this group?
6. Do the answers to your previous questions suggest that Drug A is better, worse, or
the same as standard treatment?
7. Produce a Kaplan-Meier curve for each of the treatments with the command sts
graph, by(treatment1). Does this confirm your answer to the previous ques-
tion?
8. Add a horizontal line to the graph by adding the option yline(0.5) to the pre-
vious command. This line represents half of the group surviving and half having
a relapse: the point where it crosses the two survival curves should give you the
median survival times you calculated in earlier questions.
9. Add the option lost to the previous command. This will show how many subjects
were censored at each time point. How many subjects were lost to follow-up in the
two treatment arms? Does this agree with the results you got from sts list?
10. Add the option gwood to the previous command to obtain confidence bands for
the survival curve. (The odd name for this option is because the formulae used to
calculate the confidence bands were developed by Major Greenwood.) Why do
the confidence bands get wider over time?
11. Perform a logrank test to compare the survival on Drug A to that on standard
treatment, with the command sts test treatment1. Is the difference between
Drug A and standard treatment statistically significant?
12. Would you have had the same answer to the previous question if you had used a
Wilcoxon test in place of a logrank test? (You can do this by adding the option
wilcoxon to the previous command.)
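The main steps of this exercise can be collected into a short do-file. This is only a sketch: the file name leukaemia.dta and its location in the working directory are assumptions, and the stci command is an addition not mentioned in the exercise, included as one possible cross-check on the medians read from the life tables.

```stata
* Sketch of a do-file for Exercise 3.1.
* Assumption: the dataset is saved as leukaemia.dta in the working directory.
use leukaemia, clear

* Declare the survival-time structure: weeks is the time variable,
* and relapse == 1 marks a failure (relapse == 0 is censored).
stset weeks, failure(relapse)

* Life tables for each treatment arm.
sts list if treatment1 == 1
sts list if treatment1 == 0

* Kaplan-Meier curves by arm, with a reference line at S(t) = 0.5.
sts graph, by(treatment1) yline(0.5)

* Median survival time in each arm (cross-check, not part of the exercise).
stci, by(treatment1)

* Logrank test, followed by the Wilcoxon alternative.
sts test treatment1
sts test treatment1, wilcoxon
```

Keeping the commands in a do-file makes it easy to rerun the whole analysis after changing a single option.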
3.2 Cox Regression
1. Have a look at the survival curves by white blood cell count using sts graph,
by(wbc3cat). Does the white blood cell count affect survival?
3. Given that the proportion of subjects in the “High” cell count group is greater in the
standard treatment arm than in the Drug A arm, would you expect this to have
increased or decreased survival in this arm of the trial?
4. White blood cell count is a potential confounder, so we need to adjust for it. First,
we will perform an unadjusted Cox regression to obtain the hazard ratio before
adjusting. This is done with the command stcox treatment1. What is the
hazard ratio for Drug A, and its 95% confidence interval?
5. Now obtain the adjusted hazard ratio with the command xi: stcox treatment1
i.wbc3cat. What is the adjusted hazard ratio and its 95% confidence interval?
6. How did the confounding by white blood cell count affect the apparent effect of
Drug A? Is this what you expected from the earlier questions?
7. Now we need to test the proportional hazards assumption. First for treatment:
plot the observed Kaplan-Meier curves against the curves predicted by the Cox
model with stcoxkm, by(treatment1). Are the observed and predicted curves
close to each other?
8. Now we can test the same assumption for the effect of white blood cell count, with
stcoxkm, by(wbc3cat). Are the observed and predicted curves close to each
other?
9. To obtain a formal test, we need to store the scaled and unscaled Schoenfeld residu-
als by running the command xi: stcox treatment1 i.wbc3cat, sca(sca*)
sch(sch*). Now enter the command stphtest to get an overall test of propor-
tionality. Is the regression model valid?
10. Use the command stphtest, detail to obtain tests of proportionality for each
individual variable. Is there any evidence of non-proportional hazards? Which
variables, if any, show non-proportionality?
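The steps of this exercise can likewise be sketched as a do-file. It assumes the data have already been declared with stset as in Exercise 3.1. The final block is an alternative, not part of the exercise as written: it assumes Stata 11 or later, where estat phtest can be run directly after stcox and takes the place of stphtest, and where factor-variable notation makes the xi: prefix unnecessary.

```stata
* Sketch of a do-file for Exercise 3.2; assumes the data are already stset.

* Survival curves by white blood cell category.
sts graph, by(wbc3cat)

* Unadjusted hazard ratio for Drug A, then adjusted for white blood cell count.
stcox treatment1
xi: stcox treatment1 i.wbc3cat

* Observed Kaplan-Meier curves against Cox-predicted curves.
stcoxkm, by(treatment1)
stcoxkm, by(wbc3cat)

* Assumption: Stata 11 or later. Refit with factor-variable notation,
* then run the global and per-variable proportional hazards tests.
stcox treatment1 i.wbc3cat
estat phtest
estat phtest, detail
```

In older releases, the stphtest workflow described in questions 9 and 10 is the one to follow.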
4 Homework
A. Please read “Case–Control Study of Human Papillomavirus and Oropharyngeal
Cancer” by G. D’Souza and others, published 10 May 2007 in the New England Journal of
Medicine. This article can be downloaded from the following address:
http://content.nejm.org/cgi/reprint/356/19/1944.pdf
The investigators were looking for associations between HPV and oropharyngeal
cancer. Use Table 3 to answer questions 1-5.
1. In Table 3:
(a) There is no association between having Positive Oral HPV-16 infection and
oropharyngeal cancer because the confidence interval does not cross 1.0.
(b) There is a 33.3-fold increase in oropharyngeal cancer in patients seropositive
for E6 or E7 but it is not statistically significant.
(c) There is a 32.2-fold adjusted increase in the odds of oropharyngeal cancer
for those with Seropositive HPV-16 L1 serologic status and it is statistically
significant.
(d) The data cannot be interpreted because the numbers are too sparse.
(e) According to these data, patients who were seropositive for HPV-16 L1 were
less likely to develop oropharyngeal cancer.
2. Calculate the unadjusted risk ratio for the risk of oropharyngeal cancer in pa-
tients who were positive for oral HPV-16 infection.
(a) 17.6
(b) 11.4
(c) 3.06
(d) 8.0
(e) Cannot calculate from the information given
3. What statistical method was used to calculate the “adjusted odds ratios” given
in the table?
(d) Logistic regression
(e) Multiple 2x2 tables
4. The unadjusted odds ratio for HPV-16 L1 seropositivity is 17.6 but the adjusted
odds ratio is 32.2. How do you explain this difference?
(a) This is most likely an error—as adjustment for confounding should always
reduce the magnitude of the odds ratio.
(b) The change is irrelevant since both odds ratios are statistically significant any-
way.
(c) The unadjusted odds ratio was an underestimate—which could happen if
some of the confounders were inversely related to exposure or disease.
(d) Since both odds ratios are statistically significant, this indicates that there is
little confounding going on.
(e) The unadjusted odds ratio was artificially inflated due to confounding.
5. Which of the following represents the correct statistic for comparing oral HPV
infection prevalence in cases versus controls?
(a) $Z = \dfrac{32\%}{\sqrt{(0.32)(0.68)(300)}}$
(b) $Z = \dfrac{32\% - 4\%}{\sqrt{\dfrac{(0.13)(0.87)}{300}}}$
(c) $Z = \dfrac{32\% - 4\%}{\sqrt{\dfrac{(0.13)(0.87)}{100} + \dfrac{(0.13)(0.87)}{200}}}$
(d) $T_{df=298} = \dfrac{32 - 8}{\sqrt{\dfrac{32}{100} + \dfrac{8}{200}}}$
(e) None of the above; only a non-parametric test should be used here.
B. The following figure displays the hazard ratios derived from five different Cox
regression models. Use Table 4 from the NEJM article available online at the
following address:
http://content.nejm.org/cgi/reprint/356/11/1099.pdf
Please answer questions 1-4 below.
1. Which of the following is the result of an unadjusted (univariate) analysis?
(c) 1.023 (0.997-1.049)
(d) None of the above. Since this is a Cox-Regression, it is always an adjusted
(multivariate) analysis.
C. The following figure displays the Kaplan-Meier curves from a randomized trial
comparing botulism toxin A with botulism toxin B for the treatment of cervical
dystonia (n=122). Patients were followed until their pain returned or until they
were censored.
1. Which of the following can be concluded directly from the figure?
Figure 1: The Kaplan-Meier curves from a randomized trial comparing botulism toxin A
with botulism toxin B for the treatment of cervical dystonia
(a) Botulism toxin A is a better drug for treating cervical dystonia than toxin B.
(b) Botulism toxin B is a better drug for treating cervical dystonia than toxin A.
(c) The median time to return of pain was longer in the botulism toxin A group
than the B group.
(d) The median time to return of pain was longer in the botulism toxin B group
than the A group.
(e) There is a statistically significant difference between the treatments.
2. The authors also ran a univariate Cox regression to get the hazard ratio compar-
ing treatment A to treatment B for the outcome return of pain. The hazard ratio
from this model will be:
(a) =1.0
(b) > 1.0
(c) < 1.0
(d) ≥ 1.0
(e) ≤ 1.0
3. The median time to return of pain in the botulism toxin A group was approxi-
mately:
(a) 0 weeks
(b) 5 weeks
(c) 12 weeks
(d) 14 weeks
(e) 25 weeks
4. The estimate of survival from pain for the botulism toxin A group at 19 weeks is
about:
(a) 100%
(b) 80%
(c) 70%
(d) 50%
(e) 30%
5 References:
5.1 Articles for Critical Appraisal
1. D’Souza G, and others. Case–Control Study of Human Papillomavirus and Oropharyngeal
Cancer. New England Journal of Medicine. 10 May 2007. This article can be
downloaded from the following address:
http://content.nejm.org/cgi/reprint/356/19/1944.pdf
2. Bewick V, Cheek L, Ball J. Statistics review 12: Survival analysis. Critical Care. 2004;
8(5): 389 - 94. Available online http://ccforum.com/content/8/5/389
3. Moran JL, Solomon PJ. Statistics in review. Part 2: generalised linear models, time-
to-event and time-series analysis, evidence synthesis and clinical trials. Crit Care
Resusc. 2007; 9(2): 187-97.
5.3 Suggested Reading
1. Bull K, Spiegelhalter DJ. Tutorial in biostatistics: survival analysis in observational
studies. Stat Med. 1997; 16: 1041–1074.
2. Rosner, B. Design and Analysis Techniques for Epidemiologic Studies. Chapter 13.
Exercise of Fundamentals of Biostatistics, 5th ed. Belmont, CA: Duxbury Press, pp:
159–185.
3. Rosner, B. Hypothesis Testing: Time Person Data. Chapter 14. Exercise of Fundamen-
tals of Biostatistics, 5th ed. Belmont, CA: Duxbury Press, 2004; pp: 226–248.
6 Output
Achieve competencies in :
LOG SHEET
Name: ID:
Score : ____________________
Instructor,