You are on page 1of 37

Chapter 3-14.

Case-Cohort Study Design

Comparison of Three Relative Effect Measures in Cohort Studies (Risk Ratio, Rate Ratio,
and Hazard Ratio)

This comparison, pages 1 to 6, was presented in Chapter 11. It is repeated here for review. The
topic of this chapter is how to do design and analyze case-control studies to obtain the same types
of effect estimates.

For illustration, we will use the following data in life table format from a hypothetical cohort
study.
Life Table of Hypothetical Data
Exposed Non-Exposed
Follow- Begin Disease Day- Begin Disease Day- Day-
up day N Cases Specific N Cases Specific Specific
Risk Risk Risk
Ratio
1 50 5 0.10 50 2 0.04 2.5
2 30 10 0.33 40 8 0.20 1.7
3 10 10 1.00 20 10 0.50 2.0
totals 90 25 110 20

These data can be entered into Stata using the following commands in the gregchapter4.do do-
file.

clear
input day exposure disease count
1 1 1 5
1 1 0 15
2 1 1 10
2 1 0 10
3 1 1 10
1 0 1 2
1 0 0 8
2 0 1 8
2 0 0 12
3 0 1 10
3 0 0 10
end
drop if count==0
expand count
drop count

_____________________

Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2010.

Chapter 3-14 (revision 16 May 2010) p. 1


To verify the data were corrected entered, we request a life table using

ltable day disease ,by(exposure) noadjust intervals(1) hazard

Beg. Cum. Std. Std.


Interval Total Failure Error Hazard Error [95% Conf. Int.]
-------------------------------------------------------------------------------
exposure 0
1 2 50 0.0400 0.0277 0.0400 0.0283 0.0048 0.1114
2 3 40 0.2320 0.0646 0.2000 0.0707 0.0863 0.3606
3 4 20 0.6160 0.0917 0.5000 0.1581 0.2398 0.8542
exposure 1
1 2 50 0.1000 0.0424 0.1000 0.0447 0.0325 0.2048
2 3 30 0.4000 0.0825 0.3333 0.1054 0.1598 0.5695
3 4 10 1.0000 . 1.0000 0.3162 0.4795 1.7085
-------------------------------------------------------------------------------

which agrees with the original table.

Chapter 3-14 (revision 16 May 2010) p. 2


Risk Ratio Analysis

This type of analysis ignores time-at-risk. For that reason, it assumes an equal follow-up time for
every study subject.

Life Table of Hypothetical Data


Exposed Non-Exposed
Follow- Begin Disease Day- Begin Disease Day- Day-
up day N Cases Specific N Cases Specific Specific
Risk Risk Risk
Ratio
1 50 5 0.10 50 2 0.04 2.5
2 30 10 0.33 40 8 0.20 1.7
3 10 10 1.00 20 10 0.50 2.0
totals 90 25 110 20

The risk ratio analysis uses partial information (shown in blue) from the complete data in the life
table.
Risk Ratio Analysis Data
Exposed Non-Exposed
Disease 25 (50%) 20 (40%)
Non-Disease 25 30
N 50 50

cs disease exposure

| exposure |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 25 20 | 45
Noncases | 25 30 | 55
-----------------+------------------------+----------
Total | 50 50 | 100
| |
Risk | .5 .4 | .45
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .1 | -.0940265 .2940265
Risk ratio | 1.25 | .8064465 1.937512
Attr. frac. ex. | .2 | -.2400079 .4838742
Attr. frac. pop | .1111111 |
+-----------------------------------------------
chi2(1) = 1.01 Pr>chi2 = 0.3149

Analyzing these data in this way, we do not demonstrate a significant effect (RR=1.25, p=0.315).
In fact, this crude RR underestimates each of the day-specific RR estimates.

Chapter 3-14 (revision 16 May 2010) p. 3


Rate Ratio Analysis

This type of analysis uses time-a-risk, but in a crude way. It does not assume an equal follow-up
time for each study subject. It assumes, however, that risk is constant across the follow-up time.

Life Table of Hypothetical Data


Exposed Non-Exposed
Follow- Begin Disease Day- Begin Disease Day- Day-
up day N Cases Specific N Cases Specific Specific
Risk Risk Risk (and
Rate*)
Ratio
1 50 5 0.10 50 2 0.04 2.5
2 30 10 0.33 40 8 0.20 1.7
3 10 10 1.00 20 10 0.50 2.0
totals 90 25 110 20
* Since the intervals are each one day, the day-specific person-time is just the day-specific beginning
sample size, and so the day-specific risk is also the day-specific rate).

The rate ratio analysis uses partial information (shown in blue) from the complete data in the life
table.

Rate Ratio Analysis Data


Exposed Non-Exposed
Disease 25 (50%) 20 (40%)
Person-Days 90 110

ir disease exposure day

| exposure |
| Exposed Unexposed | Total
-----------------+------------------------+----------
disease | 25 20 | 45
day | 90 110 | 200
-----------------+------------------------+----------
| |
Incidence Rate | .2777778 .1818182 | .225
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Inc. rate diff. | .0959596 | -.0389695 .2308887
Inc. rate ratio | 1.527778 | .8147248 2.900724 (exact)
Attr. frac. ex. | .3454545 | -.2274083 .6552584 (exact)
Attr. frac. pop | .1919192 |
+-----------------------------------------------
(midp) Pr(k>=25) = 0.0800 (exact)
(midp) 2*Pr(k>=25) = 0.1599 (exact)

Analyzing these data in this way, we almost demonstrate a significant effect (IRR=1.53,
p=0.080). Notice again, this crude IRR underestimates each of the day-specific IRR estimates.

Chapter 3-14 (revision 16 May 2010) p. 4


Hazard Ratio Analysis (Survival Analysis)

This type of analysis uses time-a-risk in a very complete way, using all of the information from
the life table. It does not assume an equal follow-up time for each study subject. It allows for
and models a changing risk across the follow-up time.

Exposed Non-Exposed
Follow- Begin Disease Day- Begin Disease Day- Day-
up day N Cases Specific N Cases Specific Specific
Risk Risk Risk
Ratio
1 50 5 0.10 50 2 0.04 2.5
2 30 10 0.33 40 8 0.20 1.7
3 10 10 1.00 20 10 0.50 2.0
totals 90 25 110 20

Analyzing these data using survival analysis,

stset day ,failure(disease==1)


stcox exposure

Cox regression -- Breslow method for ties

No. of subjects = 100 Number of obs = 100


No. of failures = 45
Time at risk = 200
LR chi2(1) = 4.65
Log likelihood = -174.40643 Prob > chi2 = 0.0310
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
exposure | 1.916208 .5796004 2.15 0.032 1.059198 3.466632
------------------------------------------------------------------------------

This time, we observe a significant effect (HR = 1.92, p = 0.032).

Chapter 3-14 (revision 16 May 2010) p. 5


Inefficient Use of Time in Rate Ratio Analysis

The rate ratio analysis only considers the ratio of cases to average person-time, without
distinguishing times to event and times to censored.

person-time = total time for subjects


= mean time  N

Suppose the individual times-at-risk for a sample are: 10, 20, and 30. The person-time is
computed as:

PT = total time for subjects


=10+20+30
= 60

which is equivalent to

PT = mean time  N
= (10+20+30)/3  3
= 20  3
= 60

Thus, if we had the scenario where events occurred early while censoring occurred later in one
study group, while in the other study group an equal number of events occurred later while
censoring occurred early, the person time could be equal for the two study groups and we would
erroneously conclude no difference in rates (rate ratio =1) between the two groups.

(let x----x denote time)

x-------------------------------------x (censored) Group A


x-----x (died)
x--------x (died)
x--------------------------------------------x (censored)

x-------------------------------------x (died) Group B


x-----x (censored)
x--------x (censored)
x--------------------------------------------x (died)

In this example, the person-time is equal and the rate ratio = 1, yet clearly Group A shows greater
risk for death (Group B survives longer).

Conclusion: Cox regression is sensitive to a changing risk across time, while Poisson regression
(or a rate ratio analysis) is not. Usually both approaches beat the risk ratio approach, since they
have the advantage of using more information, namely time, in the analysis.

Chapter 3-14 (revision 16 May 2010) p. 6


How This Relates to Case-Control Studies

It would be nice if we could gain the additional power of a hazard ratio analysis somehow in a
case-control study.

In a case-control study, we do not use time in the analysis. It would seem, then, that a case-
control study can do no better than the risk ratio analysis from a cohort study, which also does
not use time in the analysis.

If the rare-disease assumption is meet, the ordinary case-control study OR is approximately the
RR that would be obtained in a cohort study.

However, if a case-cohort study design is used, which we will see how to design in this chapter,
the OR provides an unbiased estimate of the RR, without the need for the rare-disease
assumption.

Furthermore, if a density case-control study design is used, which is also referred to by many as a
case-cohort study design, the OR provides an unbiased estimate of the HR, also without the need
for the rare-disease assumption. Thus, we can obtain the benefit of a HR analysis, which
improves our chances of demonstrating an exposure-disease association.

We will see how to conduct the sampling required for these three variants of the case-control
study design, use simulation to demonstrate what the OR estimates, and explain why it estimates
these effect measures.

Chapter 3-14 (revision 16 May 2010) p. 7


Dataset

We will use the dataset found in the Breslow and Day (1987, Appendix VIII and Appendix ID).
Men (n=679) employed in a nickel refinery in South Wales were investigated to determine
whether the risk of developing carcinoma of the bronchi and nasal sinuses (ICD = 160), which
had been associated with the refining of nickel from previous studies in the 1930s, was present in
this cohort. The data are in the file nickelrefinary.csv. The variables are:

CaseID Case ID
PrimaryICD Primary ICD Code
Exposure Nickel Exposure Level
DateBirth Date of Birth
AgeEmp Age First Employed
AgeBegFol Age Follow-up Began
AgeEndFol Age at Death or Loss

Executing the following commands in the do-file editor, we see up the variables and save them to
a new file (this has already been done, with nickelrefinary.dta in the datasets & do-files
subdirectory.

* -- set up variables -------------------------------

clear
set mem 10M
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "IntroEpiCourse\datasets & do-files"
insheet using nickelrefinary.csv

* -- set up tumor disease variable


gen tumor = cond(primaryicd==160,1,0)
replace tumor = . if primaryicd == .
label define tumorlab 0 "0. no tumor" 1 "1. tumor"
label values tumor tumorlab
label variable tumor "carcinoma of the bronchi and nasal sinuses"
tab tumor

* -- set up nickel exposure variable


gen nickel = cond(exposure>0,1,0)
replace nickel = . if exposure == .
label define nickellab 0 "0. no exposure" 1 "1. exposure"
label values nickel nickellab
label variable nickel "occupational exposure to nickel"
tab nickel

* -- set up time-at-risk variable


gen timerisk = ageendfol - agebegfol
label variable timerisk "time at risk"
sum timerisk

save nickelrefinary , replace

* -- end set up variables ---------------------------

Chapter 3-14 (revision 16 May 2010) p. 8


Creating a Dataset Without Rare Disease

So that we have a dataset where the rare disease assumption is not met, we next duplicate the
cases five times save this augmented dataset to a separate file. (This is for illustration only—of
course you would not do this in an actual data analyis.)

* -- set up data with 5 x cases


use nickelrefinary, clear
tab tumor
keep if tumor==1 // reduce to cases only
save tumorcases, replace // save cases to file
use nickelrefinary, clear // bring data back in
append using tumorcases
append using tumorcases
append using tumorcases
append using tumorcases
tab tumor
save nickelrefinary5xcases, replace

This has already been done. The file nickelrefinary5xcases.dta is in the datasets & do-files
subdirectory.

Population Effect Measures (Original Data With Rare Disease)

For illustration, we will assume our N=679 represents the population that we will sample from.
Doing this, we can determine the population effect measures that our samples will be estimates
of. Reading in the original data (not the 5 x cases dataset),

File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on nickelrefinary.dta
Open

use "C:\Documents and Settings\u0032770.SRVR\Desktop\


Biostats & Epi With Stata\datasets & do-files\
nickelrefinary.dta ", clear

* which must be all on one line, or use:


cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use nickelrefinary.dta, clear

Chapter 3-14 (revision 16 May 2010) p. 9


Computing the “population” (full cohort dataset before we take a sample) odds ratio,

Statistics
Observational/Epi. analysis
Tables for epidemiologists
case control odds ratio
Main tab: Case variable: tumor
Exposed variable: nickel
OK

cc tumor nickel

cc tumor nickel
Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+----------------------
Cases | 46 10 | 56 0.8214
Controls | 343 280 | 623 0.5506
-----------------+------------------------+----------------------
Total | 389 290 | 679 0.5729
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Odds ratio | 3.755102 | 1.824533 8.48588 (exact)
Attr. frac. ex. | .7336957 | .4519145 .8821572 (exact)
Attr. frac. pop | .6026786 |
+-----------------------------------------------
chi2(1) = 15.41 Pr>chi2 = 0.0001

Chapter 3-14 (revision 16 May 2010) p. 10


Computing the “population” (full cohort dataset before we take a sample) risk ratio,

Statistics
Observational/Epi. analysis
Tables for epidemiologists
Cohort study: risk ratio etc.
Main tab: Case variable: tumor
Exposure variable: nickel
OK

cs tumor nickel

| occupational exposure to|


| nickel |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 46 10 | 56
Noncases | 343 280 | 623
-----------------+------------------------+----------
Total | 389 290 | 679
| |
Risk | .1182519 .0344828 | .0824742
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .0837692 | .0454195 .1221188
Risk ratio | 3.429306 | 1.760546 6.679826
Attr. frac. ex. | .7083958 | .4319943 .8502955
Attr. frac. pop | .5818966 |
+-----------------------------------------------
chi2(1) = 15.41 Pr>chi2 = 0.0001

Chapter 3-14 (revision 16 May 2010) p. 11


Computing the “population” (full cohort dataset before we take a sample) rate ratio,

Statistics
Observational/Epi. analysis
Tables for epidemiologists
Incidence rate ratios
Main tab: Case variable: tumor
Exposed variable: nickel
Person-time variable: timerisk
OK

ir tumor nickel timerisk

| occupational exposure to|


| nickel |
| Exposed Unexposed | Total
-----------------+------------------------+----------
carcinoma of the | 46 10 | 56
time at risk | 7546.318 7801.739 | 15348.06
-----------------+------------------------+----------
| |
Incidence Rate | .0060957 .0012818 | .0036487
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Inc. rate diff. | .0048139 | .0028815 .0067463
Inc. rate ratio | 4.755696 | 2.367283 10.56925 (exact)
Attr. frac. ex. | .7897259 | .5775749 .9053859 (exact)
Attr. frac. pop | .6487034 |
+-----------------------------------------------
(midp) Pr(k>=46) = 0.0000 (exact)
(midp) 2*Pr(k>=46) = 0.0000 (exact)

Chapter 3-14 (revision 16 May 2010) p. 12


Computing the “population” (full cohort dataset before we take a sample) hazard ratio,

1) informing Stata we have survival time variables:

Statistics
Survival analysis
Set up and utilities
Declare data to be survival time data
Main tab: Time variable: timerisk
Failure event: Failure variable: tumor
Failure values: 1
OK

stset timerisk , failure(tumor==1)

2) requesting a Cox regression,

Statistics
Survival analysis
Regression models
Cox proportional hazards model
Model tab: Independent variables: nickel
OK

stcox nickel

Cox regression -- Breslow method for ties

No. of subjects = 679 Number of obs = 679


No. of failures = 56
Time at risk = 15348.05715
LR chi2(1) = 27.68
Log likelihood = -321.86045 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 5.022065 1.765996 4.59 0.000 2.520922 10.00472
------------------------------------------------------------------------------

Chapter 3-14 (revision 16 May 2010) p. 13


Population Effect Measures (Augmented Data With Frequent Disease)

We then do something similar with the augmented dataset

* 5 x cases sample stats


use nickelrefinary5xcases , clear
cc tumor nickel
cs tumor nickel
ir tumor nickel timerisk
stset timerisk , failure(tumor==1)
stcox nickel

. cc tumor nickel
Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+----------------------
Cases | 230 50 | 280 0.8214
Controls | 343 280 | 623 0.5506
-----------------+------------------------+----------------------
Total | 573 330 | 903 0.6346
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Odds ratio | 3.755102 | 2.635221 5.405224 (exact)
Attr. frac. ex. | .7336957 | .6205252 .8149938 (exact)
Attr. frac. pop | .6026786 |
+-----------------------------------------------
chi2(1) = 61.12 Pr>chi2 = 0.0000

. cs tumor nickel

| occupational exposure to|


| nickel |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 230 50 | 280
Noncases | 343 280 | 623
-----------------+------------------------+----------
Total | 573 330 | 903
| |
Risk | .4013962 .1515152 | .3100775
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .249881 | .1941373 .3056248
Risk ratio | 2.649215 | 2.013878 3.484987
Attr. frac. ex. | .6225296 | .5034455 .7130549
Attr. frac. pop | .5113636 |
+-----------------------------------------------
chi2(1) = 61.12 Pr>chi2 = 0.0000

Chapter 3-14 (revision 16 May 2010) p. 14


. ir tumor nickel timerisk

| occupational exposure to|


| nickel |
| Exposed Unexposed | Total
-----------------+------------------------+----------
carcinoma of the | 230 50 | 280
time at risk | 10233.57 8608.377 | 18841.95
-----------------+------------------------+----------
| |
Incidence Rate | .022475 .0058083 | .0148605
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Inc. rate diff. | .0166667 | .0133458 .0199877
Inc. rate ratio | 3.869473 | 2.839303 5.365002 (exact)
Attr. frac. ex. | .7415669 | .6478009 .8136068 (exact)
Attr. frac. pop | .6091442 |
+-----------------------------------------------
(midp) Pr(k>=230) = 0.0000 (exact)
(midp) 2*Pr(k>=230) = 0.0000 (exact)

. stset timerisk , failure(tumor==1)


. stcox nickel

Cox regression -- Breslow method for ties

No. of subjects = 903 Number of obs = 903


No. of failures = 280
Time at risk = 18841.95076
LR chi2(1) = 105.83
Log likelihood = -1683.4208 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 4.188323 .660423 9.08 0.000 3.074829 5.705048
------------------------------------------------------------------------------

Population Relative Effects

Considering our total sample as our population, we observed the following population effect
measures.

Population Relative Effect Actual Dataset Augmented Dataset


Measure with almost rare with frequent disease
disease (15% in unexposed
(3% in unexposed 60% in exposed)
12% in exposed)
Odds Ratio (OR) 3.76 3.76
Risk Ratio (RR) 3.43 2.65
Incidence Rate Ratio (IRR) 4.76 3.87
Hazard Ratio (HR) 5.02 4.19

Chapter 3-14 (revision 16 May 2010) p. 15


Classical Case-Control Study (Controls Are Sampled From Population Controls Only)

Most researchers choose their controls from the population controls only. First we will do this
for one sample.

We will use 2 controls for each case (2:1 sampling ratio). In real practice, you might choose a
greater number, such as 8 controls for each case. We use 2:1 in this illustration, in order to keep
the sample size much smaller than the population size, which makes the simulation more
believable.

Using the original dataset,

File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on nickelrefinary.dta
Open

use "C:\Documents and Settings\u0032770.SRVR\Desktop\


Biostats & Epi With Stata\datasets & do-files\
nickelrefinary.dta ", clear

* which must be all on one line, or use:


cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use nickelrefinary.dta, clear

Chapter 3-14 (revision 16 May 2010) p. 16


Crosstabulating disease and exposure,

Statistics
Summaries, tables & tests
Tables
Two-way tables with measures of association
Main tab: Row variable: tumor
Column variable: nickel
Cell contents: Within-column relative frequencies
OK

tabulate tumor nickel, column

carcinoma |
of the |
bronchi and | occupational exposure
nasal | to nickel
sinuses | 0. no exp 1. exposu | Total
------------+----------------------+----------
0. no tumor | 280 343 | 623
| 96.55 88.17 | 91.75
------------+----------------------+----------
1. tumor | 10 46 | 56
| 3.45 11.83 | 8.25
------------+----------------------+----------
Total | 290 389 | 679
| 100.00 100.00 | 100.00

From this population 2 × 2 table, we want to use all of the cases (n=56, the entire tumor row) and
twice as many controls (sample 112 controls from the “no tumor” row),

carcinoma |
of the |
bronchi and | occupational exposure
nasal | to nickel
sinuses | 0. no exp 1. exposu | Total
------------+----------------------+----------
0. no tumor | 280 343 | 623 <= sample 56 x 2 = 112 controls
| 96.55 88.17 | 91.75
------------+----------------------+----------
1. tumor | 10 46 | 56 <= use all 56 cases
| 3.45 11.83 | 8.25
------------+----------------------+----------
Total | 290 389 | 679
| 100.00 100.00 | 100.00

First, we set the random number generator seed so we can get the same sample if we need to
replicate our results later,

set seed 999

We now want to sample n=112 if tumor = 0 and just keep all of the cases (tumor = 1),

Chapter 3-14 (revision 16 May 2010) p. 17


Statistics
Resampling
Draw random sample
Main tab: Sample size: 112
by/if/in tab: Restrict to observations (if expression): tumor == 0
OK

sample 112 if tumor==0, count


<or>
sample 112 , count , if tumor==0

Seeing what we got,

tabulate tumor nickel, column

carcinoma |
of the |
bronchi and | occupational exposure
nasal | to nickel
sinuses | 0. no exp 1. exposu | Total
------------+----------------------+----------
0. no tumor | 56 56 | 112
| 84.85 54.90 | 66.67
------------+----------------------+----------
1. tumor | 10 46 | 56
| 15.15 45.10 | 33.33
------------+----------------------+----------
Total | 66 102 | 168
| 100.00 100.00 | 100.00

which is just what we wanted.

Chapter 3-14 (revision 16 May 2010) p. 18


Now, computing the sample odds ratio,

cc tumor nickel

Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+------------------------
Cases | 46 10 | 56 0.8214
Controls | 56 56 | 112 0.5000
-----------------+------------------------+------------------------
Total | 102 66 | 168 0.6071
| |
| Point estimate | [95% Conf. Interval]
|------------------------+------------------------
Odds ratio | 4.6 | 2.019615 11.17006 (exact)
Attr. frac. ex. | .7826087 | .5048562 .910475 (exact)
Attr. frac. pop | .6428571 |
+-------------------------------------------------
chi2(1) = 16.17 Pr>chi2 = 0.0001

In our sample, we get an OR of 4.60 (in contrast to the population OR of 3.76), which seems
rather off. However, we cannot judge if this is an unbiased estimate for the population OR,
because this estimate is subject to sampling variability.

Summarizing the steps to conducting a classical case-control study,

* -- classical sampling for case-control study


* sample controls from controls)
use nickelrefinary , clear
tab tumor nickel, col // observe 56 cases, 623 controls
set seed 999
sample 112 , count , if tumor==0
* used 2:1 sampling ratio selecting controls from control row
tab tumor nickel, col // 56 cases, 112 controls
cc tumor nickel // compute odds ratio

Chapter 3-14 (revision 16 May 2010) p. 19


Using a Monte Carlo simulation, we compute the OR from 1,000 separate samples, to determine
the long-run average OR. This will inform us whether or not the approach we used produces
unbiased estimates of the population OR.

* -- long-run average ordinary case-control study (control row


sampling)
clear
set obs 1
gen or=.
* create a file with 1 missing observation (a blank file)
save or_control_row, replace
*
set seed 999
forvalues i=1(1)1000{
quietly use nickelrefinary, clear
quietly gen or=. // variable to hold odds ratio
quietly sample 112 , count, if tumor==0 // control row sampling
quietly cc tumor nickel
quietly replace or=r(or) in 1/1
quietly keep or
quietly keep in 1/1
quietly append using or_control_row
quietly save or_control_row, replace
}
use or_control_row, clear
histogram or
sum or

The result is:


.8
.6
Density

.4
.2
0

2 4 6 8
or

Variable | Obs Mean Std. Dev. Min Max


-------------+--------------------------------------------------------
or | 1000 3.814517 .6563894 2.090909 7.965854

Using the mean of the 1,000 ORs as the long-run average, we get OR = 3.81.

Chapter 3-14 (revision 16 May 2010) p. 20


Performing a similar simulation for the augmented dataset, with a 1:1 sampling ratio to keep the
sample as small as possible relative to the population size, using the mean of the 1,000 ORs as
the long-run average, we get OR = 3.77.

The simulation results are:

Classic Case-Control Design (sample controls from controls only)


Almost Rare Disease Frequent Disease
(3% unexposed, 12% exposed) (15% unexposed, 60% exposed)
Relative Population Simulation Long Population Simulation Long
Effect Measures Run Average Measures Run Average
Measure OR OR
OR 3.76 3.81 3.76 3.77
RR 3.43 2.65
IRR 4.76 3.87
HR 5.02 4.19

We see that the OR is an unbiased estimate of the OR, regardless of the rare disease assumption.
The OR is not an unbiased estimate of the RR, however. If the rare disease assumption was met,
however, it would be a reasonable close estimate. Notice the OR=3.81 is much closer to the
RR=3.43 in the “almost rare disease” column.

Chapter 3-14 (revision 16 May 2010) p. 21


Let’s see why the odds ratio is not affected by our choice of sampling ratio (e.g, 2:1, 3:1, etc.).

Data Layout for Case-Control Study


(Stata’s cc and cci commands)
Exposure
Disease Exposed (1) Unexposed (0) Totals*
cases (1) a b M1
noncases (0) c d M0
Totals* m1 m0
*The uppercase Ns (sample sizes) are fixed by the researcher,
and the lowercase Ns are observed.

odds ratio = ad/bc

With a 1:1 sampling ratio,

1 M1 = M0 = 1  (c+d) = 1c + 1d

With a 2:2 sampling ratio,

2 M1 = M0 = 2  (c+d) = 2c + 2d

We see that both c and d are multiplied by the same constant.

This has no effect on the odds ratio, regardless of the constant k,

odds ratio = a(kd)/b(kc)


= ad/bc since the k’s cancel

Sampling in general,

Data Layout for Case-Control Study


Exposure
Disease Exposed (1) Unexposed (0) Totals*
cases (1) a b M1 <-select some n fraction of cases
noncases (0) c d M0 <- select some k fraction of controls
Totals* m1 m0
*The uppercase Ns (sample sizes) are fixed by the researcher,
and the lowercase Ns are observed.

odds ratio = (na)(kd)/(nb)(kc)


= ad/bc since both the n’s and k’s cancel

Chapter 3-14 (revision 16 May 2010) p. 22


Case-Cohort Study Design

Rothman (2002, pp.84-86) suggests sampling controls from the entire population, regardless of
case or control status. Thus some cases may be selected as controls as well. (Rothman does not
even mention the classical case-control design, where controls are sampled only from non-
diseased subjects.) When controls are selected this way, which is from the entire population at
risk, than the study design is called a case-cohort design, rather than a case-control design
(Rothmand and Greenland, 1998, p.108).

This time we sample from the total row

Data Layout for Case-Control Study


Exposure
Disease Exposed (1) Unexposed (0) Totals*
cases (1) a b M1
noncases (0) c d M0
Totals* m1 m0 <- select some k fraction of controls
*The uppercase Ns (sample sizes) are fixed by the researcher,
and the lowercase Ns are observed.

so choosing our controls as some fraction k of the total row, we have

c = km1
d = km0

Our odds ratio is then


a 1 a a
 
ad a(km0 )  a   km0  km1  k  m1 m1
OR       b  1 b  b  RR
bc b(km1 )  km1   b   
km0  k  m0 m0

So if we choose our controls as some fraction of the total row, our odds ratio is identically the
risk ratio. That is, in a case-cohort study, the OR directly estimates the RR, regardless of the rare
disease assumption.

Exercise. Look at the Cai methods paper.

Notice in the second sentence of the abstract, they point out that the controls are sampled
from the “total row of the full cohort 2 x 2 table” when they state,
...a case-cohort design, which consists of a small random sample of the whole cohort and
all of the disease subjects...”

In the second paragraph, they cite some studies that have used the case-cohort design.

Chapter 3-14 (revision 16 May 2010) p. 23


Conducting a case-cohort study on this full-cohort, we use all of the cases and take a random
sample of controls from the total row of the 2 x 2 table.

This is a bit more complex, so we will commands rather than menus, since it will be easier to see
what we are doing.

* -- case-cohort study (sample controls from total sample)


use nickelrefinary , clear
tab tumor nickel, col // observe 56 cases, 623 controls
keep if tumor==1 // reduce sample to cases
save tumorcases, replace // save cases to file
*
use nickelrefinary , clear
set seed 999
sample 112 , count // 2:1 sampling ratio, selecting controls from
// total row since we do not use the “if
// tumor==0” this time
replace tumor=0 if tumor==1 // set these all to control status
append using tumorcases // bring cases back in
tab tumor nickel, col // 56 cases, 112 controls
cc tumor nickel

Then using the Monte Carlo method to obtain the long-run average OR from the original (rare
disease) dataset

* -- long-run average ordinary case-control study (total row


sampling)
clear
set obs 1
gen or=.
save or_control_row, replace // create a file with 1 missing
observation
*
set seed 999
forvalues i=1(1)1000{
quietly use nickelrefinary, clear
quietly gen or=. // variable to hold odds ratio
quietly sample 112 , count // total row sampling
quietly replace tumor=0 if tumor==1 // set all to control status
quietly append using tumorcases // bring cases back in
quietly cc tumor nickel
quietly replace or=r(or) in 1/1
quietly keep or
quietly keep in 1/1
quietly append using or_control_row
quietly save or_control_row, replace
}
use or_control_row, clear
histogram or
sum or

with a similar simulation using the augmented (frequent disease) dataset, also in the do-file.

Chapter 3-14 (revision 16 May 2010) p. 24


The simulation results are:

Classic Case-Control Design (sample controls from controls only)


Almost Rare Disease Frequent Disease
(3% unexposed, 12% exposed) (15% unexposed, 60% exposed)
Relative Population Simulation Long Population Simulation Long
Effect Measures Run Average Measures Run Average
Measure OR OR
OR 3.76 3.76
RR 3.43 3.48 2.65 2.67
IRR 4.76 3.87
HR 5.02 4.19

For this sampling approach, we see that the sample OR is an unbiased estimator of the population
RR, as Rothman claims it should be.

For the case-cohort design, the rare-disease assumption is not required for the OR to be an
estimate of RR (Rothman and Greenland, 1998, p.110). We have demonstrated that to be the
case.

Chapter 3-14 (revision 16 May 2010) p. 25


Density Case-Control Study Where Controls Are Sampled From Cases & Controls Which
Have Same Or Longer Time-At-Risk (Risk Set Sampling)

Rothman (2002, pp.76-80) suggests sampling controls from the entire population, regardless of
case or control status, but also select the controls from subjects with similar or longer time at risk
as the cases, in a matched fashion. Again, some cases may be selected as controls as well. This
is called risk-set sampling.

In this study design, we want the OR to be an unbiased estimate of the hazard ratio, HR, where
HR is a type of weighted average of the day-specific risk ratios.

Using the hypothetical data table we used above,

Life Table of Hypothetical Data


Exposed Non-Exposed
Follow- Begin Disease Day- Begin Disease Day- Day-
up day N Cases Specific N Cases Specific Specific
Risk Risk Risk
Ratio
1 50 5 0.10 50 2 0.04 2.5
2 30 10 0.33 40 8 0.20 1.7
3 10 10 1.00 20 10 0.50 2.0
totals 90 25 110 20

In this design, we also use a type of “total row sampling”. That is we select our controls from the
“Begin N” column’s of the life table.

For the 5+2 cases that occurred on day 1, we sample our controls from the 50+50 persons still at
risk on day 1.

For the 10+8 cases that occurred on day 2, we sample our controls from the 30+40 persons still at
risk on day 2,

and so on.

We do this by forming risk sets. For every case, we form a risk set that includes all subjects with
an equal or longer follow-up time. Then we sample 2 controls from that risk set, if we use a 2:1
sampling ratio, that we match with that case.

This is identically sampling from the correct row of the Begin N column.

Just as we saw in the case-cohort study, proportionality is maintained, which guarantees that our
OR is an estimate of RR for each row of the life table.

Chapter 3-14 (revision 16 May 2010) p. 26


Since we will analyze the data with conditional logistic regression, which maintains the matches
with the case and controls on the same row of the life table, we maintain the day-specific RR
analysis. Finally, the OR that comes out of the conditional logistic regression is a type of
weighted average across the rows.

Cox regression, which computes the HR directly, also summarizes the day-specific RR, (which is
also called the day-specific HR), computing a type of weighted average which it reports as the
HR.

Thus, the conditional logistic regression from the case-control study does the same thing as the
Cox regression from a cohort study. (NOTE: the conditional logistic approach is biased and so
should not be used, as is pointed out below.)

Let’s do it.

Obtaining a risk-set sample (density sample) and computing the OR,


* --- density case-control study ----
use nickelrefinary, clear // begin with full dataset
stset timerisk , failure(tumor==1)
set seed 999 // set seed so can replicate the analysis
* following command selects 2 controls per case from risk set of
* subjects with same or longer time-at-risk
sttocc, number(2)
clogit _case nickel, group(_set) or

we get
Conditional (fixed-effects) logistic regression Number of obs = 168
LR chi2(1) = 17.65
Prob > chi2 = 0.0000
Log likelihood = -52.698159 Pseudo R2 = 0.1434

------------------------------------------------------------------------------
_case | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 4.951997 2.131085 3.72 0.000 2.130428 11.51049
------------------------------------------------------------------------------

Notice we used conditional logistic regression to obtain the odds ratio. Given our matching of
time-at-risk (from risk-set sampling), we had to obtain an odds ratio using a matched sample
approach since matched studies require a matched analysis (Rothman and Greenland, 1998, p.
98; Greenland and Thomas, 1982).

Chapter 3-14 (revision 16 May 2010) p. 27


Look at the data in Stata after running this. Notice Stata created three new variables, called
_case, _set, and _time. Interpret these.
list tumor nickel timerisk _case _set _time in 1/9 , sep(3)

+--------------------------------------------------------------------+
| tumor nickel timerisk _case _set _time |
|--------------------------------------------------------------------|
1. | 0. no tumor 1. exposure 35.6823 0 1 .70149994 |
2. | 0. no tumor 1. exposure 16.3425 0 1 .70149994 |
3. | 1. tumor 1. exposure .7014999 1 1 .70149994 |
|--------------------------------------------------------------------|
4. | 0. no tumor 1. exposure 9.000103 0 2 1.1797981 |
5. | 0. no tumor 1. exposure 6.370903 0 2 1.1797981 |
6. | 1. tumor 1. exposure 1.179798 1 2 1.1797981 |
|--------------------------------------------------------------------|
7. | 0. no tumor 1. exposure 19.6768 0 3 1.4220009 |
8. | 0. no tumor 0. no exposure 41.7836 0 3 1.4220009 |
9. | 1. tumor 0. no exposure 1.422001 1 3 1.4220009 |
+--------------------------------------------------------------------+

Notice that for each risk set, the follow-up time of the matched controls was greater than or equal
to the follow-up time of the case.

Chapter 3-14 (revision 16 May 2010) p. 28


To demonstrate that the OR using this study design provides an unbiased estimate of the HR, we
next obtain the long-run average.

* --- density case-control study (long-run average)----


clear
set obs 1
gen or=.
save or_density, replace // create a file with 1 missing observation
*
set more off // turn off scrolling prompt when display iteration
number
set seed 999
forvalues i=1(1)1000{
quietly use nickelrefinary, clear
quietly stset timerisk , failure(tumor==1)
quietly sttocc, number(2)
quietly clogit _case nickel, group(_set) or
quietly gen or = exp(_b[nickel]) in 1/1
// convert coefficient to OR
quietly keep or
quietly keep in 1/1
quietly append using or_density
quietly save or_density, replace
display `i' // display iteration number
}
set more on
use or_density, clear
*histogram or
sum or

We then do the same thing for the augmented dataset.

The simulation results are:

Density Case-Control Design


Almost Rare Disease Frequent Disease
(3% unexposed, 12% exposed) (15% unexposed, 60% exposed)
Relative Population Simulation Long Population Simulation Long
Effect Measures Run Average Measures Run Average
Measure OR OR
OR 3.76 3.76
RR 3.43 2.65
IRR 4.76 3.87
HR 5.02 5.42 4.19 4.43

We see that the OR is a biased estimate of the HR, and so the conditional logistic regression
model should not be used for the analysis. It is close though. Another approach is taught below.

Chapter 3-14 (revision 16 May 2010) p. 29


Terminology Inconsistencies

Not all authors use the study design names consistently. For example,

A)

Rothman (2002, p.84-86) uses the term case-cohort study design to refer to the situation when
follow-up is assumed equal for all subjects, or just simply ignored, so controls are simply
selected from all subjects at risk (from the total row of the 2 × 2 table).

Prentice (1986, p.2) calls this study design a case-cohort design: binary response.

B)

Rothman (2002, pp. 76-80) uses the term density case-control study design to refer to the
situation when follow-up is not equal for all subjects, so risk-set sampling is used to select
controls from all subjects at risk with equal or longer follow-up times (from the appropriate row
of a life table).

Prentice (1986, p. 4) calls this study design a case-cohort design: time to response data.

Some Methods Papers

Jewell (2004, pp.51-53) presents risk-set sampling (density case-control study design) as a way
to use a case-control study to obtain an estimate of the hazard ratio (HR).

Rothman (2002, pp.76-80) presents the density case-control study design as a way to obtain an
estimate of the incidence rate ratio (IRR). Although he does not say so, Rothman is apparently
making the assumption that risk is constant across time. Under that assumption, HR = IRR.

Prentice RL. (1986). A case-cohort design for epidemiologic cohort studies and diease prevention
trials. Biometrika 73:1-11.

King G, Zeng L. (2002). Estimating risk and rate levels, ratios and differences in case-control
studies. Statist Med 21:1409-1427.

Volovics A, van den Brandt PA. (1997). Methods for the analysis of case-cohort studies. Biom J
39(2):195-214.

Chapter 3-14 (revision 16 May 2010) p. 30


Some Studies That Used the Case-Cohort Approach

1) Rossing MA, Daling JR, Weiss NS, Moore DE, Self SG. (1996). Risk of breast cancer in a
cohort of infertile women. Gynecol Oncol 60(1):3-7.

Abstract
The purpose of this study was to assess: (1) the risk of breast cancer associated with use
of ovulation-inducing agents (such as clomiphene citrate) as treatment for infertility; and
(2) the risk associated with ovulatory abnormalities that result in infertility. We
performed a case-cohort study among 3837 women evaluated for infertility at clinics in
Seattle, Washington, at some time during 1974–1985. Computer linkage with a
population-based tumor registry was used to identify women diagnosed with breast cancer
before January 1, 1992. Data regarding infertility testing and treatment were abstracted
from the infertility clinic medical records for women who developed breast cancer and a
randomly selected subcohort. Twenty-seven women in the cohort developedin situor
invasive breast cancer, in comparison with an expected number of 28.8 cases
(standardized incidence ratio, 0.9; 95% confidence interval (CI), 0.6–1.4). Infertile
women with evidence of an ovulatory abnormality were at a risk of breast cancer similar
to that of women whose infertility was believed to be due to other causes. The risk among
women who had taken clomiphene was reduced relative to infertile women who had not
used this drug (adjusted relative risk, 0.5; 95% CI, 0.2–1.2), but the reduction in risk did
not increase with duration of use. The possibility that use of clomiphene as treatment for
infertility lowers the risk of breast cancer should be examined in other, larger studies.
Notice this study has a long and unequal follow-up, so the density case-control design is well-
suited for this study.

2) Savitz DA, Cai J, van Wijngaarden E, et al (2000). Case-cohort analysis of brain cancer and
leukemia in electric utility workers using a refined magnetic field job-exposure matrix.
American Journal of Industrial Medicine 38:417-425.

3) Voorrips LE, Goldbohm RA, Brants HA, et al. (2000). A prospective cohort study on
antioxidant and folate intake and male lung cancer risk. Cancer Epidemiol Biomarkers Prev
9:357-65.

In the 3rd paragraph of their Data Analysis section, they state they did something special to
adjust the variance estimates, using a software routine they developed:

“Because standard software was not available for case-cohort analysis, specific macros
were developed to account for the additional variance introduced by sampling from the
cohort instead of using the entire cohort (29).”

Chapter 3-14 (revision 16 May 2010) p. 31


Something similar is available in Stata, but you must update your Stata to get it. First use the
help facility to search on “case cohort”. Then click on the sbe41 link when you see this:

STB-59 sbe41 . . . . . . . . . . . . Ordinary case-cohort design and analysis


(help stcascoh, stselpre if installed) . . . . . . . . . V. Coviello
1/01 pp.12--18; STB Reprints Vol 10, pp.121--129
selects a sample from a cohort, prepares the dataset for
analysis using a Cox regression model, and computes the
Self-Prentice variance estimator of the parameters

This approach in Stata fits a Cox regression model to the data with an appropriate variance
estimate, so the p values and confidence intervals are correct.

What Researchers Actually Use

Usually when sampling is from a larger cohort, follow-up times are available. Rather than using
the risk set sampling and conditional regression approached described above, researchers instead
using Cox regression model with a special variance estimator (at least three such estimators have
been proposed). All of the example papers presented in this chapter followed this suitably
adapted Cox regression analysis approach.

After updating Stata to get the stcascho and stselpre commands,


use nickelrefinary, clear
stset timerisk , failure(tumor==1) id(caseid)
stcox nickel // full cohort
stcascoh, alpha(.2) seed(999) // sample 20% of the controls
stselpre nickel

The results are:

Chapter 3-14 (revision 16 May 2010) p. 32


1) full cohort
Cox regression -- Breslow method for ties

No. of subjects = 676 Number of obs = 679


------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 5.020635 1.765523 4.59 0.000 2.520176 10.00199
------------------------------------------------------------------------------

2) sampled cohort
. stcascoh, alpha(.2) seed(999) // .2 or 20% of the cohort

failure _d: tumor == 1


analysis time _t: timerisk
id: caseid

Total sample = 174


------------------------------------------------------------------------------
191 total obs.
0 exclusions
------------------------------------------------------------------------------
191 obs. remaining, representing
174 subjects
56 failures in single failure-per-subject data

Self Prentice Variance Estimate for Case-Cohort Design

Self Prentice Scheme

------------------------------------------------------------------------------
| Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 4.652655 1.846243 3.87 0.000 2.137624 10.12676
------------------------------------------------------------------------------

Prentice Scheme

------------------------------------------------------------------------------
| Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nickel | 4.587959 1.82057 3.84 0.000 2.1079 9.98594
------------------------------------------------------------------------------

Saving the Case-Cohort Sample


At this point, only the sample of N=191 subjects are in the data editor. Be sure to save this file if
you want to do something like a chart review to collect further predictor variables on this case-
cohort sample.

Chapter 3-14 (revision 16 May 2010) p. 33


Using simulation to check the unbiasedness of this approach, with the Prentice Scheme

clear
set obs 1
gen or=.
save hr_simulation, replace // create a file with 1 missing
observation
*
set more off // turn off scrolling prompt
* set seed 999 // doesn't work outside of shcascoh command
forvalues i=1(1)1000{
quietly use nickelrefinary, clear
quietly stset timerisk , failure(tumor==1) id(caseid)
quietly stcascoh, alpha(.1798) // sample 18% (n=112) of controls
quietly stselpre nickel // fit model
quietly matrix A=e(b)
quietly svmat A // creates variables from matrix columns
quietly gen hr = exp(A1) in 1/1 // convert coefficient to OR
quietly keep hr
quietly keep in 1/1
quietly append using hr_simulation
quietly save hr_simulation, replace
display `i' // display iteration number
}
set more on
use hr_simulation, clear
sum hr

we get

. sum hr

Variable | Obs Mean Std. Dev. Min Max


-------------+--------------------------------------------------------
hr | 1000 5.051037 1.065216 2.663914 9.633725

so the long-run HR=5.05, compared to the population HR=5.02, which is an unbiased estimate.

In contrast, the risk-set sampling, followed by conditional logistic regression, which was
demonstrated above and gave a long-run average HR=5.42, produces a biased estimate and so
should not be used.

Chapter 3-14 (revision 16 May 2010) p. 34


Sample Size Determination

We have seen that subjects can be included as both cases and controls in the case-cohort
approach. This overlap requires that the sample size be inflated to allow for this. Rothman,
Greenland, and Lash (2008) comment,

“Case-cohort designs have other advantages as well as disadvantags relative to alternative


case-conrol designs (Wacholder, 991). One disadvantage is that, because of the overlap of
membership in the case and control groups (controls who are sleeced may also develop
disease and enter the study as cases), one will need to select more controls in a case-
cohort study than in an ordinary case-control study with the same number of cases, if one
is to achieve the same amount of statistical precision. Extra controls are needed because
the stratistical precision of a study is strongly determined by the numbers of distinct cases
and noncases. Thus, if 20% of the source cohort members will become cases, and all
cases will be included in the study, one will have to select 1.25 times as many controls as
cases in a case-cohort study to ensure that there wil be as many controls who never
become cases in the study. On average, only 80% of the controls in such a situation will
remain noncases; the other 20% will become cases. Of course, if the disease is
uncommon, the number of extra controls needed for a case-cohort study will be small.”
------
Wacholder S. (1991). Practical considerations in choosing beteen the case-cohort and
nested case-control design Epidemiology 2:155-158.

The “1.25” comes from: 80%, or 4/5 of the controls are “controls only”. To get this sample size
back up to 100% controls only, with equals number of cases, you (5/4)(4/5) = 1, where 5/4 =
1.25.

Chapter 3-14 (revision 16 May 2010) p. 35


References

Breslow NE, Day NE. (1987). Statistical Methods in Cancer Research, Vol II: The Design and
Analysis of Cohort Studies, Lyon, France, IARC.

Cai J, Zeng D. (2004). Sample size/power calculation for case-cohort studies. Biometrics
60:1015-1024.

Dupont WD. (2002). Statistical Modeling for Biomedical Researchers: A Simple Introduction to
the Analysis of Complex Data. Cambridge UK, Cambridge University Press.

Greenland S, Thomas DC. (1982). On the need for the rare disease assumption in case-control
studies. Am J Epidemiol 116(3):547-553. with erratum in Am J Epidemiol
1990;131(6):1102.

King G, Zeng L. (2002). Estimating risk and rate levels, ratios and differences in case-control
studies. Statist Med 21:1409-1427.

Jewell NP. (2004). Statistics for Epidemiology. New York, Chapman & Hall/CRC.

National Heart, Lung, and Blood Institute. (1998). Clinical guidelines for the identification,
evaluation, and treatment of overweight and obesity in adults: the evidence report.
Bethesda, MD, National Heart, Lung, and Blood Institute.

Onyike CU, Crum RM, Lee HB, Lyketsos CG, Eaton WW. (2003). Is obesity associated with
major depression? Results from the third national health and nutrition examination
survey. Am J Epidemiol 158(12):1139-1153.

Prentice RL. (1986). A case-cohort design for epidemiologic cohort studies and diease prevention
trials. Biometrika 73:1-11.

Rothman KJ. (2002). Epidemiology: An Introduction. New York, Oxford University Press.

Rothman KJ, Greenland S. (1998). Modern Epidemiology, 2nd ed. Philadelphia, PA.

Rothman KJ, Greenland S, Lash TL. (2008). Case-control studies. In Rothman KJ, Greenland S,
Lash TL, Modern Epidemiology, Philadelphia, Lippincott Williams & Wilkins, 2008,
pp.111-127.

Savitz DA, Cai J, van Wijngaarden E, et al (2000). Case-cohort analysis of brain cancer and
leukemia in electric utility workers using a refined magnetic field job-exposure matrix.
American Journal of Industrial Medicine 38:417-425.

Volovics A, van den Brandt PA. (1997). Methods for the analysis of case-cohort studies. Biom J
39(2):195-214.

Voorrips LE, Goldbohm RA, Brants HA, et al. (2000). A prospective cohort study on antioxidant
Chapter 3-14 (revision 16 May 2010) p. 36
and folate intake and male lung cancer risk. Cancer Epidemiol Biomarkers Prev
9:357-65.

Chapter 3-14 (revision 16 May 2010) p. 37

You might also like