
Biostatistics I: Basic for Public Health

Lecture No.: KUI-611


Starting Date: 20/10/2013

A Logistic Regression
Laboratory Session 10

Copyright © 2013, S.A. Wilopo, Department of Public Health,
Faculty of Medicine, Gadjah Mada University, Yogyakarta, Indonesia

Contents
1 Learning Objective
2 Activities
3 Exercises
4 Homework
5 References
5.1 Articles for Critical Appraisal
5.2 Required Reading
5.3 Suggested Reading
6 Output

List of Tables
1 Lung cancer data in tumor.dat

1 Learning Objective
Upon completion of the course unit, students should be able to:

1. Understand the concept of logistic regression and its application to data analysis
in public health research.

2. Understand the process of estimation and inference for logistic regression,
especially the interpretation of coefficients as odds ratios (OR).

3. Assess the best model for a logistic regression applied to public health data.

4. Read and interpret published research on these topics.

2 Activities
1. Discussion: a multiple logistic regression, including estimation of OR

2. Laboratory session:

(a) Understand the concept of logistic regression and its application to data
analysis in public health research.
(b) Understand the process of estimation and inference for logistic regression,
especially the interpretation of coefficients as odds ratios (OR).
(c) Read computer output from logistic regression analysis.
(d) Read journal publications that use multiple logistic regression models.

3 Exercises
One dataset will be analyzed in this exercise. The dataset (tumor.dat) originates from
a clinical trial in which lung cancer patients were randomized to receive one of two
kinds of chemotherapy (sequential therapy or alternating therapy). The outcome was
classified into one of four categories: progressive disease, no change, partial remission, or
complete remission. The data were published in Holtbrugge and Schumacher (1991) and
also appear in Hand et al. (1994). The central question is whether there is any evidence
of a difference in the outcomes achieved by the two types of therapy.

Table 1: Lung cancer data in tumor.dat
Therapy      Sex      Progressive   No change   Partial     Complete
                      disease                   remission   remission
Sequential   Male          28           45          29          26
             Female         4           12           5           2
Alternating  Male          41           44          20          20
             Female        12            7           3           1

Assume the ASCII file tumor.dat contains the four by four matrix of frequencies
shown in Table 1. First read the data and generate variables for therapy and sex using
the egen function seq():

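* Read the four frequency columns (one per outcome category) from tumor.dat
infile fr1 fr2 fr3 fr4 using tumor.dat

* therapy takes the values 0, 0, 1, 1 down the four rows
egen therapy=seq(), from(0) to(1) block(2)
* sex takes the values 1, 2 within each therapy group
egen sex=seq(), from(1) to(2) by(therapy)
label define t 0 seq 1 alt
label values therapy t
label define s 1 male 2 female
label values sex s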

block(2) causes each number in the sequence (from 0 to 1) to be repeated in blocks of
two, whereas by(therapy) causes the sequence to start from the lower limit every time
the value of therapy changes. We next reshape the data to long, placing the four levels
of the outcome into a variable outc, and expand the dataset by replicating each observation
fr times so that we have one observation per subject:

reshape long fr, i(therapy sex) j(outc)


expand fr

Please check that the data conversion is correct by tabulating these data as in Table 1:
table sex outc, contents(freq) by(therapy)
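
As a further check, the cell counts in Table 1 sum to 299, so the expanded dataset should contain 299 observations. The lines below are a minimal sketch that assumes the commands above were run unchanged. The tabulate call may also serve as an alternative where the table syntax above is not accepted (the table command was revised in Stata 17).

```stata
* Table 1 cell counts sum to 299, so one row per patient means 299 rows
count
assert _N == 299

* Cross-tabulate sex by outcome separately for each therapy group
bysort therapy: tabulate sex outc
```

If the assert line stops with an error, the reshape or expand step did not produce one row per patient.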

To be able to carry out ordinary logistic regression, we need to dichotomize the outcome,
for example, by considering partial and complete remission to be an improvement and
the other categories to be no improvement. The new outcome variable may be generated
as follows:

gen improve=outc

recode improve 1/2=0 3/4=1

or using

gen improve = outc>2
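
The dichotomization can be verified with a cross-tabulation. The sketch below also builds a check variable, improve_chk (a name introduced here for illustration only), with an explicit guard for missing values. Stata treats a missing numeric value as larger than any number, so outc>2 alone would code a missing outcome as 1. The tumor data have no missing outcomes, so the guard should change nothing here.

```stata
* Categories 1-2 should map to improve = 0 and categories 3-4 to improve = 1
tabulate outc improve

* Same recoding, but leaving improve_chk missing wherever outc is missing
gen improve_chk = outc>2 if !missing(outc)
assert improve == improve_chk
```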

The command logit for logistic regression behaves the same way as regress and all other
estimation commands. For example, automatic selection procedures can be carried out
using sw, and post-estimation commands such as testparm and predict are available. First,
include therapy as the only explanatory variable:

logit improve therapy

The coefficient of therapy represents the difference in log odds between the therapies
and is not easy to interpret apart from the sign. Exponentiating the coefficient gives the
odds ratio, and exponentiating the 95% confidence limits gives the confidence interval for
the odds ratio. Fortunately, the command logistic may be used to obtain the required odds
ratio and its confidence interval directly (alternatively, we could use the or option in
the logit command).
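
To make the link between the coefficient and the odds ratio concrete, the relationship can be written out as below. The symbols \beta_0, \beta_1 and SE are notation introduced here. The counts come from Table 1 after dichotomizing: 44 of the 148 alternating-therapy patients improved (104 did not), and 62 of the 151 sequential-therapy patients improved (89 did not). The rounded values are hand calculations that the logistic output should reproduce approximately.

```latex
\begin{align*}
\log\frac{\Pr(\text{improve}=1)}{1-\Pr(\text{improve}=1)} &= \beta_0 + \beta_1\,\text{therapy},
\qquad \mathrm{OR} = e^{\beta_1},\\
\text{95\% CI for OR} &= \left(e^{\hat\beta_1 - 1.96\,\mathrm{SE}(\hat\beta_1)},\;
e^{\hat\beta_1 + 1.96\,\mathrm{SE}(\hat\beta_1)}\right),\\
\widehat{\mathrm{OR}} &= \frac{44/104}{62/89} = \frac{44\times 89}{104\times 62} \approx 0.61,
\qquad \hat\beta_1 = \log\widehat{\mathrm{OR}} \approx -0.50,\\
\mathrm{SE}(\hat\beta_1) &= \sqrt{\tfrac{1}{44}+\tfrac{1}{104}+\tfrac{1}{62}+\tfrac{1}{89}} \approx 0.24,
\qquad \text{95\% CI} \approx (0.38,\ 0.98).
\end{align*}
```

With therapy coded 0 for sequential and 1 for alternating, an odds ratio below 1 means lower odds of improvement under alternating therapy than under sequential therapy.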
To test whether the inclusion of sex in the model significantly increases the likelihood,
the current likelihood (and all the estimates) can be saved using

estimates store model1

Including sex

logistic improve therapy sex

gives odds ratios and 95% confidence intervals, with the odds ratio for therapy now
adjusted for sex (a potential confounder). Please discuss this result with your group and tutor.
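
The likelihood stored as model1 can then be compared with the model that includes sex by a likelihood ratio test. This is a minimal sketch, assuming model1 was stored straight after the therapy-only model and that the model with therapy and sex is the most recent fit. The name model2 is introduced here for illustration.

```stata
* Store the model with therapy and sex (the most recent fit)
estimates store model2

* Likelihood ratio test of model1 (therapy only) against model2 (therapy + sex)
lrtest model1 model2
```

A small p-value would suggest that adding sex improves the fit. The test is only valid when both models are fitted to the same observations.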

4 Homework
1. Please use the ASCII data file from the Framingham study, the framfull.txt data set,
with its codebook fram.cod. Your research assistant will guide you on how to read
the data into Stata and will explain the details of the data set. Consider only
those who did not survive to the end of the study, or consider only cause of death
among the sample.
Ask your tutor to produce the equations for the logistic regression models of factors
associated with cerebrovascular accident (cva), or stroke. Write up your dummy tables
and submit them to your tutor.
Please answer following questions:

a. Estimate the regression coefficient for the effect of blood pressure on the risk of
having cva. What is the crude odds ratio (OR) of having cva, and its 95%
confidence interval?
b. Estimate the regression coefficient for the effect of blood pressure on the risk of
having cva, adjusted for other variables. What is the adjusted odds ratio (OR) of
having cva, and its 95% confidence interval?
c. Present the results of your analysis in a single table (not more than 7 models
or equations).
d. Which variables are confounding factors? How can you justify your answer?
e. Please write up a short report on the results of your analysis (not more than 2
pages).

2. Read the following article: Semba RD, de Pee S, Ricks MO, Sari M, Bloem MW.
Diarrhea and fever as risk factors for anemia among children under age five living
in urban slum areas of Indonesia. International Journal of Infectious Diseases 2008; 12:
62-70.

a. Please rearrange Tables 2 and 3 so that the reader can easily read each OR and
its confidence interval.
b. Please rewrite models 1 to 3 from Tables 2 and 3 in terms of the original regression
coefficients and their standard errors (SE).
c. Can you add descriptive presentations of these findings in the form of graphs?
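
For question b, a published odds ratio and its 95% confidence limits can be converted back to the coefficient scale. The sketch below assumes the interval was computed as a Wald interval on the log-odds scale, which is common but should be checked against the article's methods section. L and U denote the reported lower and upper confidence limits, notation introduced here.

```latex
\begin{align*}
\hat\beta &= \log(\mathrm{OR}),\\
\mathrm{SE}(\hat\beta) &= \frac{\log(U) - \log(L)}{2 \times 1.96}.
\end{align*}
```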

5 References:
5.1 Articles for Critical Appraisal
1. Sørensen TH, Olsen KR, Vedsted P. Association between general practice referral
rates and patients’ socioeconomic status and access to specialised health care: a
population-based nationwide study. Health Policy 2009; 92: 180–186.

2. Sparrow R. Targeting the poor in times of crisis: the Indonesian health card. Health
Policy and Planning 2008; 23: 188–199.

5.2 Required Reading
1. Lewis S. Regression analysis. Practical Neurology 2007; 7: 259–264.

2. Bewick V, Cheek L, Ball J. Statistics review 14: Logistic regression. Critical Care 2005;
9(1): 112–118. This article is online at http://ccforum.com/content/9/1/112

5.3 Suggested Reading
1. Rosner B. Design and Analysis Techniques for Epidemiologic Studies. Chapter 13.
Exercise of Fundamentals of Biostatistics, 5th ed. Belmont, CA: Duxbury Press, 2004;
pp: 159–185.

6 Output
Achieve competencies in:

1. Estimating and interpreting multiple logistic regression coefficients

2. Reading computer outputs from multiple logistic regression analysis

3. Reading journal publications that use logistic regression models

LOG SHEET

Name:                                   ID:

No  Activities                                Date  Signature  Comment
1.  Group discussion on the logistic
    regression coefficient
2.  Assignment: statistical calculation of
    the OR from data analyzed using
    logistic regression
3.  Assignment: reading a journal article
    using multiple logistic regression

Score : ____________________

Instructor,
