
Epidemiologic Study Design and Data Analysis

BIOINFORMATICS TECHNOLOGIES LFSC520

Shelley Harris, Ph.D. Associate Professor Department of Epidemiology and Community Health & Center for Environmental Studies saharris@vcu.edu

Outline
Introduction to epidemiology
Study designs and measures of risk in epidemiology
Statistical analysis programs
Epidemiological analysis programs
An example calculation
Some cautions

Shelley A. Harris

Digitally signed by Shelley A. Harris DN: CN = Shelley A. Harris, C = US, O = VCU Reason: I am the author of this document Date: 2005.02.02 16:42:43 -05'00'

Introduction to Epidemiology
Epidemiology is the study of patterns of disease occurrence and other health-related conditions in human populations and of the factors that influence these occurrences and conditions.

Leading Causes of Death


Figure: bar chart of percent of total deaths by cause. Heart diseases 33.5%, cancer 23.5%, cerebrovascular diseases 6.7%, accidents 4.3%, chronic obstructive lung diseases 4.0%, pneumonia & influenza 3.7%, diabetes 2.2%; cirrhosis of liver, HIV infection, suicide and homicide each account for 1.2 to 1.4%.

US data/Adapted from Cancer Journal for Clinicians, 1994.

Male Cancer Statistics


Site                     Estimated incidence   Estimated deaths
Melanoma of skin                  3%                  2%
Oral                              3%                  2%
Lung                             16%                 33%
Pancreas                          2%                  4%
Stomach                           2%                  3%
Colon & Rectum                   12%                 10%
Prostate                         32%                 13%
Urinary                           9%                  5%
Leukemia & Lymphomas              7%                  8%
All others                       14%                 20%

US data/Adapted from Cancer Journal for Clinicians, 1994.

Female Cancer Statistics


Site                     Estimated incidence   Estimated deaths
Melanoma of skin                  3%                  1%
Oral                              2%                  1%
Breast                           32%                 18%
Lung                             13%                 23%
Pancreas                          2%                  5%
Colon & Rectum                   13%                 11%
Ovary                             4%                  5%
Uterus                            8%                  4%
Urinary                           4%                  3%
Leukemia & Lymphomas              6%                  8%
All others                       13%                 21%

US data/Adapted from Cancer Journal for Clinicians, 1994.

A report from the National Cancer Institute (NCI) estimates that about 1 in 8 women in the United States (approximately 12.8 percent) will develop breast cancer during their lifetime.

Descriptive Epidemiologic Studies


correlational or ecologic studies used to determine patterns of disease or disability across different populations, geographical areas and time

Breast Cancer
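International incidence rates (per 100,000 women)

Figure: bar chart of breast cancer incidence for Sweden, US, Italy, Netherlands, United Kingdom, France, Germany, Spain and Japan; values shown on the chart: 129.5, 108.8, 108.6, 106.8, 94.3, 84.3, 76.4, 60.1 and 37.0.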

Adapted from International Opportunities in Cancer Management, SRI International, 1994.

Breast Cancer Rates in American Women

Figure: breast cancer rates by country (x-axis: Rate/10000, from 10 to 35) for England & Wales, Denmark, New Zealand, Canada, United States, Australia, France, Spain, Singapore, USSR, Hong Kong, Japan, China and Thailand.

Ecological or Correlational Studies


Generally inexpensive and quick
Take advantage of substantial international and temporal variations in disease rates
Relate observed differences in morbidity or mortality to the spatial and temporal distribution of risk factors: living habits, genetic composition of groups, occupational or environmental exposures

Ecological Studies - Breast Cancer?

Country          Rate of breast cancer   Per capita consumption of beer
Australia                25                         50
Germany                  20                         40
Holland                  18                         40
England                  17                         35
France                   15                         30
Canada                   10                         26
United States             7                         24

Ecological Studies - Heart Disease

Country          Rate of heart disease   Per capita consumption of beer
Australia                 7                         50
Germany                  10                         40
Holland                  15                         40
England                  17                         35
France                   18                         30
Canada                   20                         26
United States            25                         24
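The two ecological tables use identical beer-consumption figures, so a correlation coefficient shows how convincing each country-level association looks on its face. The sketch below computes Pearson's r by hand from the values transcribed from the two slides; the function name `pearson_r` is our own, and nothing here says anything about individual drinkers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Country-level values from the slides, in order Australia ... United States.
beer = [50, 40, 40, 35, 30, 26, 24]
breast_cancer = [25, 20, 18, 17, 15, 10, 7]
heart_disease = [7, 10, 15, 17, 18, 20, 25]

r_breast = pearson_r(beer, breast_cancer)
r_heart = pearson_r(beer, heart_disease)
print(f"beer vs breast cancer: r = {r_breast:.2f}")
print(f"beer vs heart disease: r = {r_heart:.2f}")
```

The same beer figures give r of about 0.97 with breast cancer and about -0.95 with heart disease: strong group-level correlations that, on their own, cannot show that beer causes one disease or prevents the other.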

Sperm Counts over time (years)

Figure: sperm counts (x 10^6, axis 0 to 10) plotted against year (0 to 8) with a fitted trend line; R² = 0.9793.

Popular Hypotheses:
Exposure to synthetic environmental estrogens is related to:
1) increases in breast and prostate cancer over time
2) differences in rates of breast and prostate cancer between counties
3) decreases in sperm quality/quantity observed over the last 50 years
Exposure to natural environmental estrogens accounts for:
1) differences in rates of breast and prostate cancer between counties, and differences in heart disease

An ecological fallacy is defined as:


The bias that may occur because an association observed between variables on a group level does not necessarily represent the association that exists at an individual level.
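A tiny hypothetical data set makes the fallacy concrete. In the sketch below (all numbers invented for illustration), three groups whose average exposure and average outcome rise together give a perfect positive group-level correlation, yet within every group the individual-level association is perfectly negative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical individuals in three groups: (exposures, outcomes).
groups = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Group level: correlate the group means, as an ecological study would.
group_exposure = [sum(x) / len(x) for x, _ in groups.values()]
group_outcome = [sum(y) / len(y) for _, y in groups.values()]
r_group = pearson_r(group_exposure, group_outcome)
print(f"group-level r = {r_group:.2f}")

# Individual level: the association inside each group.
r_within = {name: pearson_r(x, y) for name, (x, y) in groups.items()}
for name, r in r_within.items():
    print(f"within group {name}: r = {r:.2f}")
```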


Geographic Information Systems


An organized collection of computer hardware, software, geographic data, and personnel designed to efficiently capture, store, manipulate, analyze, and display all forms of geographically referenced data

Breast Cancer Sociodemographic and Lifestyle


Sociodemographic factors
SES: 1.1-2x (high / low)
Marital status: 1.1-2x (never / ever)
Residence: 1.1-2x (urban / rural)
Race/ethnicity: 1.1-2x (white, Hispanic / Asian)
Religion: 1.1-2x (Jewish / Mormon)
Obesity/body shape??
Low physical activity??

Lifestyle factors
Diet (high fat)?
High alcohol consumption??
Smoking, passive smoke???


Established/Suspected Breast Cancer Risk Factors


Age > 50: 4x
Family history of disease (1 relative premenopausal, or 2 relatives with any form): 4x
History of benign breast disease: 4x
BRCA1 or BRCA2 mutation: 4x
Ionizing radiation (especially between puberty and 20 years): 2-4x
Lifetime exposure to estrogen
Lifestyle factors?
Environmental factors?

Lifetime exposure to estrogen


Early menarche: 1.1-2x (11 / 15)
Nulliparity: 1.1-3x (nulliparous / parous)
Late age at first birth: 1.1-3x (>30 / <20)
Late onset of menopause: 1.1-2x (55 / 45)
Lactation?
Oral contraceptives??
Hormone replacement therapy??


Suspected Environmental/Occupational Causes of Breast Cancer


1) Low-level ionizing radiation
2) Solvent exposures
3) Electromagnetic fields (EMF)
4) Organochlorine compounds
5) Pesticides

Analytic Studies
To test hypotheses it is necessary to conduct analytic epidemiological studies. Analytic studies can be divided into two main types:
1) Experimental studies (clinical trials)
2) Observational studies


The Epidemiologic Study

Controlled assignment: Experimental studies
    Randomized assignment: Clinical trials (efficacy, effectiveness)
    Non-randomized assignment: Community trials
Uncontrolled assignment: Observational studies
    Sampling with regard to exposure, characteristic, or cause: Prospective studies (cohort, case-cohort)
    Sampling with regard to disease or effect: Cross-sectional and/or retrospective studies
        Exposure or characteristic at time of study: Cross-sectional studies
        History of exposure or characteristic prior to time of study: Retrospective studies (case-control)

Case-Control Studies
Subjects are selected into the study based on their disease status. Sometimes called retrospective or case-referent studies. The most common type of epidemiologic study.


Case-Control Studies
Figure: timeline from 1960 to 2030. In 2001, case and control groups are selected; investigators then follow back in time to determine exposure status.

Measures of Risk or Association


               Exposed   Not exposed   Totals
Cases             a           c          N1
Controls          b           d          N2
Totals            M1          M2         N

Odds Ratio (OR) = (a/b) / (c/d)

Odds of being a case if you are exposed = a/b
Odds of being a case if you are not exposed = c/d


                         Self-reported pesticide exposure
                         Yes      No     Totals
Breast cancer: Yes       499      19       518
Breast cancer: No        462      56       518
Totals                   961      75      1036

Odds Ratio (OR) = (499 / 462) / (19 / 56) = 3.18
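The odds ratio calculation can be sketched in a few lines of Python using the slide's cell labels (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls). The confidence interval is an addition not on the slide: it assumes the common Woolf (log-based) approximation with z = 1.96 for 95%, and the function name `odds_ratio` is our own.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio (a/b) / (c/d) with a Woolf log-based confidence interval.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    or_ = (a / b) / (c / d)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = exp(log(or_) - z * se_log)
    upper = exp(log(or_) + z * se_log)
    return or_, lower, upper


# Pesticide exposure and breast cancer, from the table above.
or_, lo, hi = odds_ratio(a=499, b=462, c=19, d=56)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The point estimate matches the slide's 3.18; an interval that excludes 1 is the usual informal check that the association is unlikely to be due to chance alone.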

Cohort Studies
Considered a natural experiment. Also called follow-up studies, incidence studies, or longitudinal studies.
1) Prospective cohort
2) Retrospective cohort


Cohort Studies
Figure: timeline from 1960 to 2030 contrasting retrospective and prospective cohort designs; in 2005, exposed and non-exposed groups are selected.

Measures of Risk
                 Exposed   Not exposed   Totals
Disease             a           c          N1
No disease          b           d          N2
Totals              M1          M2         N

Rate of disease in exposed = a/(a+b) = a/M1
Rate of disease in non-exposed = c/(c+d) = c/M2
Relative risk (RR) = (a/M1) / (c/M2)


                         [PCBs] in blood
                         High      Low      Totals
Breast cancer: Yes         20        5          25
Breast cancer: No        9980     9995      19975
Totals                  10000    10000      20000

Relative Risk (RR) = (20 / 10,000) / (5 / 10,000) = 4.0
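The relative risk follows the same pattern as the odds ratio, but uses the column totals M1 and M2 as denominators. As before, the confidence interval is an addition not on the slide; this sketch assumes the usual log-based (Katz) standard error, and `risk_ratio` is a hypothetical helper name.

```python
from math import exp, log, sqrt


def risk_ratio(a, m1, c, m2, z=1.96):
    """Relative risk (a/M1) / (c/M2) with a log-based confidence interval.

    a = diseased among exposed, m1 = total exposed,
    c = diseased among unexposed, m2 = total unexposed.
    """
    rr = (a / m1) / (c / m2)
    se_log = sqrt(1 / a - 1 / m1 + 1 / c - 1 / m2)  # SE of ln(RR)
    lower = exp(log(rr) - z * se_log)
    upper = exp(log(rr) + z * se_log)
    return rr, lower, upper


# High vs low blood PCBs and breast cancer, from the table above.
rr, lo, hi = risk_ratio(a=20, m1=10_000, c=5, m2=10_000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```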

Established Environmental Causes of Breast Cancer


1) Ionizing radiation


Some Statistical Analysis Software


SAS SPSS SYSTAT SPLUS

SAS is a large, general-purpose package descended from an original program that was designed to run on mainframe computers in "batch" mode, i.e. by the user submitting a batch of commands and then getting a pile of results in a separate output file (or window, now that Windows and Mac versions are available). Along with a slightly complicated approach to data management, this makes the program harder to learn, and compared with SPSS there is less capability to learn by experimenting with menus. On the other hand, the data processing capabilities are extremely powerful and the range of statistical procedures is wide.
University of Melbourne


SPSS is a well-known package particularly popular in the social sciences and psychology. It is a very large and somewhat cumbersome program but also very powerful and capable of performing almost all the standard methods of analysis. Recent Windows versions have a convenient user interface, but it can still be hard to keep track of exactly what you've done. The menu-based interface makes it relatively easy to learn, at least for simple applications.
University of Melbourne

S-PLUS is a program for specialist statisticians only. It is an interactive, object-oriented system, with both a wide range of built-in functions and complete programming capabilities for extending these. Probably its most useful feature for us is an extremely powerful and relatively easy-to-use capacity for graphics.
University of Melbourne


Epidemiological Analysis Software


Epi Info (free): http://www.cdc.gov/epiinfo
PEPI (not so free)
EGRET (not free)

Epi Info
Latest Version: Epi Info Version 3.3. With Epi Info and a personal computer, epidemiologists and other public health and medical professionals can rapidly develop a questionnaire or form, customize the data entry process, and enter and analyze data. Epidemiologic statistics, tables, graphs, and maps are produced with simple commands such as READ, FREQ, LIST, TABLES, GRAPH, and MAP. Epi Map displays geographic maps with data from Epi Info. A new version, Epi Info for Windows, retains many features of the familiar Epi Info for DOS, while offering Windows strengths such as ease of use, point-and-click commands, graphics, fonts, and printing. http://www.cdc.gov/epiinfo/


Egret
Egret software is a statistical package that specializes in offering modeling and graphics capabilities to investigators conducting epidemiological and biomedical studies. Egret is a user-friendly statistical package for epidemiologists.
Comprehensive Set of Models: Many Not Available Elsewhere
Contingency Tables
Logistic Regression
Conditional Logistic Regression*
Logistic Regression with Random Effects
Beta-Binomial Regression
Poisson Regression
Weibull Regression
Exponential Regression
Cox Proportional Hazards Regression
Cox Regression with Time-Dependent Covariates
Kaplan-Meier Analysis and Plots
Extensive Post-Fit Analysis with Plots, Including Delta-Betas and Hazard Functions
Plus a new spreadsheet-based data editor and a statistical scratchpad
Unlike other epidemiology software, Egret permits the case/control ratio to vary over strata without using an approximation for the conditional likelihood function.

Cancer: Surveillance, Epidemiology and End Results


The SEER Cancer Statistics reports, publications, public-use data and analysis software are available at the National Cancer Institute web site: http://SEER.Cancer.Gov/


SEER The small print


PC Software to Calculate Statistics from SEER and Other Data Sources

SEER*Stat statistical software can be used to view individual cancer records or calculate incidence, mortality, survival, and prevalence statistics from SEER and other cancer-related databases. All variables in the SEER public-use data are available for analysis. Statistics calculated in SEER*Stat can be viewed, printed, or exported for further analysis using other statistical software, including those described below.

Joinpoint is statistical software for the analysis of trends using models with several different lines that are connected at the "joinpoints." The software takes trend data (e.g., cancer rates) and fits the simplest joinpoint model that the data allow. Joinpoint is often used to analyze trends in rates calculated by SEER*Stat.

DevCan software uses life-table methods to compute the lifetime and age-conditioned probability of developing cancer and dying of cancer in the general population. Input data for the computations include cancer incidence and mortality rates as well as all-cause mortality rates. Data sets are supplied to estimate risks of developing and dying of cancer for over 20 cancer sites by race and sex. In addition, DevCan can be used to calculate the lifetime risk using rates calculated in SEER*Stat and exported for use in DevCan.
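To give a feel for what a joinpoint model is, here is a deliberately simplified sketch: it fits two straight lines joined at a single joinpoint k by ordinary least squares, y = b0 + b1*t + b2*max(0, t - k), and picks the candidate k with the smallest residual sum of squares. This is not the NCI Joinpoint algorithm; as we understand it, the real program can fit several joinpoints (often on log-transformed rates) and uses statistical tests to choose how many. The trend data and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_one_joinpoint(t, y, candidates):
    """Least-squares fit of y = b0 + b1*t + b2*max(0, t - k) for each candidate k.

    Returns (best_k, sse, [b0, b1, b2]) for the candidate with the smallest SSE.
    """
    best = None
    for k in candidates:
        rows = [[1.0, ti, max(0.0, ti - k)] for ti in t]  # design matrix
        # Normal equations: (X^T X) coef = X^T y
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
        coef = solve_linear(xtx, xty)
        sse = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
                  for r, yi in zip(rows, y))
        if best is None or sse < best[1]:
            best = (k, sse, coef)
    return best


# Hypothetical trend: rising by 1 per year until year 5, then falling by 1 per year.
t = list(range(11))
y = [10 + ti - 2 * max(0, ti - 5) for ti in t]
k, sse, coef = fit_one_joinpoint(t, y, candidates=range(1, 10))
print(f"joinpoint at t = {k}, slopes {coef[1]:.2f} then {coef[1] + coef[2]:.2f}")
```

Because the hypothetical series bends exactly at year 5, the grid search recovers that joinpoint with slopes of 1 before and -1 after.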

Class exercise: see handouts


Sample size calculation using:
1) Hand calculation
2) A free web-based program (find your own)
3) Examples in EXCEL and SAS


                                True difference
Conclusion of statistical test  Present                                   Absent
Different                       Correct (true positive), 1 - β = power    Incorrect: Type I (α) error (false positive)
Not different                   Incorrect: Type II (β) error              Correct (true negative)
                                (false negative)

Figure: sampling distributions under Ho for n=100 and n=1000, with the alpha, beta and power regions marked.
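The figure's message, that power depends on sample size, can be checked numerically. This sketch assumes a two-sided comparison of two proportions with the normal approximation and pooled variance (the same assumptions as the sample-size formula on the next slide), ignoring the negligible chance of rejecting in the wrong direction. The proportions 0.25 vs 0.20 and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n, k=1.0, alpha=0.05):
    """Approximate power to detect p1 vs p2 with n in group 1 and k*n in group 2."""
    z = NormalDist()
    p_bar = (p1 + k * p2) / (1 + k)  # weighted average proportion (assumption)
    q_bar = 1 - p_bar
    z_alpha = z.inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    # Distance of the alternative from the rejection cut-off, in standard errors
    z_beta = abs(p1 - p2) * sqrt(n) / sqrt((1 + 1 / k) * p_bar * q_bar) - z_alpha
    return z.cdf(z_beta)  # power = 1 - beta


power_100 = power_two_proportions(0.25, 0.20, n=100)
power_1000 = power_two_proportions(0.25, 0.20, n=1000)
print(f"n = 100:  power = {power_100:.2f}")
print(f"n = 1000: power = {power_1000:.2f}")
```

With the same true difference, the larger study has a much better chance of declaring it "different".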


Sample Size

$$ n = \frac{\left(1 + \frac{1}{k}\right)\,\bar{p}\,\bar{q}\,\left(Z_{\alpha/2} + Z_{\beta}\right)^{2}}{\left(p_1 - p_2\right)^{2}} $$
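The formula translates directly into code. The slide does not define its symbols, so this sketch assumes the usual conventions for this form of the formula: p1 and p2 are the two proportions being compared (e.g. exposure in cases and controls), k is the ratio of group sizes (e.g. controls per case), p̄ = (p1 + k p2)/(1 + k) and q̄ = 1 - p̄. The function name and example proportions are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size(p1, p2, k=1.0, alpha=0.05, power=0.80):
    """Size n of the first group (the second has k*n) needed to detect p1 vs p2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # Z_{alpha/2}, two-sided
    z_beta = z.inv_cdf(power)  # Z_beta, since power = 1 - beta
    p_bar = (p1 + k * p2) / (1 + k)  # weighted average proportion (assumption)
    q_bar = 1 - p_bar
    n = (1 + 1 / k) * p_bar * q_bar * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2
    return ceil(n)  # round up to whole subjects


print(sample_size(0.40, 0.20))  # 83 per group with k = 1
print(sample_size(0.40, 0.20, k=2))  # 58 cases, plus 2 * 58 = 116 controls
```

Raising k reduces the number of cases needed, at the cost of more controls; the gain shrinks quickly beyond a few controls per case.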

CRAP Detector #1
Beware the large sample size.
Effects can be statistically significant and biologically inconsequential

CRAP: Circular Reasoning or Anti-intellectual Pomposity


CRAP Detector #2
Beware the small sample size
It is hard to find significant differences, and a finding of "no difference" means nothing.
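Both detectors can be illustrated with a pooled two-proportion z-test (normal approximation) applied to hypothetical observed proportions. A trivial 1-point difference becomes highly "significant" with 100,000 subjects per group, while a 20-point difference is not significant with 20 per group.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(p1, p2, n):
    """Two-sided p-value of a pooled z-test comparing two proportions, n per group."""
    p_bar = (p1 + p2) / 2  # pooled proportion (equal group sizes)
    se = sqrt(p_bar * (1 - p_bar) * (2 / n))
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))


# Detector 1: 50% vs 51%, small and very large studies.
p_tiny_small_n = two_proportion_p_value(0.50, 0.51, 100)
p_tiny_large_n = two_proportion_p_value(0.50, 0.51, 100_000)
# Detector 2: 30% vs 50% with only 20 subjects per group.
p_big_small_n = two_proportion_p_value(0.30, 0.50, 20)

print(f"1-point difference, n = 100:     p = {p_tiny_small_n:.3f}")
print(f"1-point difference, n = 100,000: p = {p_tiny_large_n:.1e}")
print(f"20-point difference, n = 20:     p = {p_big_small_n:.3f}")
```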

Some Thoughts
Garbage in, garbage out.
Consult the biostatistician and the epidemiologist: we charge by the hour.

