A Logistic Regression
Laboratory Session 10
2 Activities
3 Exercises
4 Homework
5 References
5.1 Articles for Critical Appraisal
5.2 Required Reading
5.3 Suggested Reading
6 Output
List of Tables
1 Lung cancer data in tumor.dat
1 Learning Objective
Upon completion of the course unit, students should be able to:
1. Understand the concept of logistic regression and its application to data analysis
in public health research.
2. Understand the process of estimation and inference for logistic regression, especially
interpreting coefficients as odds ratios (OR).
3. Assess the best model for a logistic regression applied to public health data.
2 Activities
1. Discussion: multiple logistic regression, including estimation of the OR
2. Laboratory session:
(a) Understand the concept of logistic regression and its application to data
analysis in public health research.
(b) Understand the process of estimation and inference for logistic regression,
especially interpreting coefficients as odds ratios (OR).
(c) Read computer output from a logistic regression analysis.
(d) Read journal publications that use multiple logistic regression models.
3 Exercises
One dataset will be analyzed in this exercise. The dataset (tumor.dat) originates from
a clinical trial in which lung cancer patients were randomized to receive one of two
kinds of chemotherapy (sequential therapy or alternating therapy). The outcome was
classified into one of four categories: progressive disease, no change, partial remission, or
complete remission. The data were published in Holtbrügge and Schumacher (1991) and
also appear in Hand et al. (1994). The central question is whether there is any evidence
of a difference in the outcomes achieved by the two types of therapy.
Table 1: Lung cancer data in tumor.dat
Therapy      Sex     Progressive  No change  Partial    Complete
                     disease                 remission  remission
Sequential   Male    28           45         29         26
             Female   4           12          5          2
Alternating  Male    41           44         20         20
             Female  12            7          3          1
Assume the ASCII file tumor.dat contains the four by four matrix of frequencies
shown in Table 1. First read the data and generate variables for therapy and sex using
the egen function seq():
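The commands for this step are not reproduced in this copy. A minimal sketch, assuming tumor.dat holds only the sixteen frequencies, row by row as in Table 1. The variable names fr1 to fr4, fr and outc, the label names, the 0/1 and 1/2 codings, and the reshape step are all assumptions made for illustration:

```stata
* Read the four outcome columns of each row (assumed file layout)
infile fr1 fr2 fr3 fr4 using tumor.dat, clear
* Rows 1-2 are sequential, rows 3-4 alternating (0/1 coding is an assumption)
egen therapy = seq(), from(0) to(1) block(2)
* Within each therapy the first row is male, the second female (assumed coding)
egen sex = seq(), from(1) to(2) by(therapy)
label define tr 0 "seq" 1 "alt"
label values therapy tr
label define sx 1 "male" 2 "female"
label values sex sx
* One record per therapy x sex x outcome cell, with the count in fr
reshape long fr, i(therapy sex) j(outc)
```

After the reshape, outc takes the values 1 to 4 in the column order of Table 1. If the data are kept in this form, with one record per cell, the count has to be supplied as a frequency weight in later commands, for example by adding [fweight=fr] after the variable list.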
Please check that the data conversion is correct by tabulating these data as in Table 1:
table sex outc, contents(freq) by(therapy)
To be able to carry out ordinary logistic regression, we need to dichotomize the outcome,
for example, by considering partial and complete remission to be an improvement and
the other categories to be no improvement. The new outcome variable may be generated
as follows:
gen improve=outc
recode improve 1/2=0 3/4=1
or using
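The shorter alternative is not reproduced in this copy. A minimal sketch, assuming outc is coded 1 to 4 in the column order of Table 1 and has no missing values:

```stata
* A logical expression evaluates to 1 when true and 0 when false, so
* outcomes 3 and 4 (partial and complete remission) give improve = 1
gen improve = outc > 2
```

This replaces the previous approach rather than following it, because improve can only be generated once.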
The command logit for logistic regression behaves the same way as regress and all other
estimation commands. For example, automatic selection procedures can be carried out
using sw and post-estimation commands such as testparm and predict are available. First,
include therapy as the only explanatory variable:
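A minimal sketch of this first model, assuming the data hold one record per cell with the count in a variable called fr (a hypothetical name):

```stata
* fr is an assumed count variable; drop the weight if the data
* contain one record per patient instead
logit improve therapy [fweight=fr]
```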
The coefficient of therapy represents the difference in log odds between the therapies
and is not easy to interpret apart from its sign. Exponentiating the coefficient gives the
odds ratio, and exponentiating the 95% confidence limits of the coefficient gives the
confidence interval for the odds ratio. Fortunately, the command logistic may be used to
obtain the required odds ratio and its confidence interval directly (alternatively, we
could use the or option in the logit command).
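A minimal sketch of both routes, followed by a hand check. The count variable fr is an assumed name. The hand check uses the Table 1 totals summed over sex: 44 improved and 104 did not under alternating therapy, 62 improved and 89 did not under sequential therapy. It assumes therapy is coded 0 for sequential and 1 for alternating, so the odds ratio compares alternating with sequential:

```stata
* Odds ratio and 95% confidence interval in one step
logistic improve therapy [fweight=fr]
* The same table from logit with the or option
logit improve therapy [fweight=fr], or

* Hand check: crude odds ratio, about 0.607
display (44/104)/(62/89)
* Standard error of the log odds ratio, about 0.244
display sqrt(1/44 + 1/104 + 1/62 + 1/89)
* 95% confidence limits, about 0.376 and 0.980
display exp(ln((44/104)/(62/89)) - invnormal(0.975)*sqrt(1/44 + 1/104 + 1/62 + 1/89))
display exp(ln((44/104)/(62/89)) + invnormal(0.975)*sqrt(1/44 + 1/104 + 1/62 + 1/89))
```

With a single binary explanatory variable, the fitted odds ratio and its standard error should agree with this 2 x 2 table calculation.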
To test whether the inclusion of sex in the model significantly increases the likelihood,
save the current likelihood (and all the estimates) before fitting the larger model.
Including sex in the model
gives the OR and 95% confidence intervals adjusted for sex (a potential confounder).
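The steps above can be sketched as follows, assuming a count variable fr and using model1 as an arbitrary name for the stored estimates:

```stata
* Fit the model with therapy only and store its estimates
logistic improve therapy [fweight=fr]
estimates store model1
* Add sex: the odds ratio for therapy is now adjusted for sex
logistic improve therapy sex [fweight=fr]
* Likelihood-ratio test of the larger model against the stored one
lrtest model1 .
```

The period in the lrtest command refers to the most recently fitted model.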
Please discuss this result with your group and tutor.
4 Homework
1. Please use the ASCII data file from the Framingham study, or the framfull.txt data set
with its codebook fram.cod. Your research assistant will show you how to read
the data into Stata and explain the details of the data set. Consider only
those who did not survive to the end of the study, or consider only cause of death
within the sample.
Ask your tutor to produce equations for logistic regression models of factors associated
with cerebrovascular accident (cva), or stroke. Write up your dummy tables
and submit them to your tutor.
Please answer the following questions:
2. Read the following article: Semba RD, de Pee S, Ricks MO, Sari M, Bloem MW.
Diarrhea and fever as risk factors for anemia among children under age five living
in urban slum areas of Indonesia. International Journal of Infectious Diseases 2008; 12:
62-70.
a. Please rearrange Tables 2 and 3 so that the reader can easily read each OR and
its confidence interval.
b. Please rewrite models 1 to 3 from Tables 2 and 3 in terms of the original regression
coefficients and their standard errors (SE).
c. Can you add descriptive presentations of these findings in the form of graphs?
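For item b, a coefficient and its standard error can be recovered from a published OR and its 95% confidence interval, assuming the interval was computed as a Wald interval on the log-odds scale. A minimal sketch with made-up numbers (not taken from the article), intended to be run from a do-file:

```stata
* Illustrative values only: OR = 2.0 with 95% CI 1.25 to 3.2
local or = 2.0
local lo = 1.25
local hi = 3.2
* The coefficient is the log of the odds ratio
display "coefficient = " ln(`or')
* The CI spans 2 x 1.96 standard errors on the log scale
display "SE = " (ln(`hi') - ln(`lo'))/(2*invnormal(0.975))
```

If a published interval is not symmetric around the OR on the log scale, rounding or a different interval method is the likely cause, and the recovered SE is only approximate.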
5 References
5.1 Articles for Critical Appraisal
1. Sørensen TH, Olsen KR, Vedsted P. Association between general practice referral
rates and patients’ socioeconomic status and access to specialised health care: a
population-based nationwide study. Health Policy 2009; 92: 180–186.
2. Sparrow R. Targeting the poor in times of crisis: the Indonesian health card. Health
Policy and Planning 2008; 23: 188–199.
2. Bewick V, Cheek L, Ball J. Statistics review 14: Logistic regression. Critical Care 2005;
9(1): 112–118. This article is online at http://ccforum.com/content/9/1/112
6 Output
Achieve competencies in:
LOG SHEET
Name: ID:
Score : ____________________
Instructor,