Introduction to Epidemiology

TOPIC 1: INTRODUCTION TO EPIDEMIOLOGY
WHAT IS EPIDEMIOLOGY AND WHY IS IT IMPORTANT FOR PUBLIC HEALTH?
 Epidemiology is the study of the distribution and determinants of health states or events in
specified populations, and the application of this study to control and prevent health problems
 Associated with the collective health of people in a community
 Provides evidence of health problems as well as the effects and impacts of interventions to
address health problems
 Directs public health and health promotion action for the prevention, control and treatment of
disease
THE MEANING AND SCOPE OF EPIDEMIOLOGY
 There are six main areas in which epidemiology contributes to public health and health promotion
1. Determining the extent of ill health or disease in the community
o What is the burden of disease in the community?
o How is it distributed?
o These questions are critical in planning health services and programs
2. Identifying the cause of ill health and risk factors for disease
o The aim is to intervene and prevent morbidity and mortality from the disease, via
prevention programs
o If the causal factors or risk factors for ill health are identified it would be easier to
develop ways to reduce or eliminate exposure to these factors
3. Studying the natural history and prognosis of ill health.
o To know about the natural pattern of ill health, so that we know when and how to
intervene and whether that intervention makes a difference
4. Investigating disease outbreaks or epidemics
5. Evaluating existing and new preventive and therapeutic programs and services
o Do the health programs and services offered change the natural history of disease?
o Do they change the outcomes for the better?
6. Providing the foundation for developing public policy and regulation
o Information on the causes of ill health, and factors that help to maintain good health,
provide governments with the basis on which to implement policies that help to
maintain good health of the whole community
BRANCHES OF EPIDEMIOLOGY
 DESCRIPTIVE EPIDEMIOLOGY
o Focuses on the description of health states and events and their distribution
 Person, place and time are some of the key descriptors of health and illness in populations
o When considering the distribution or pattern of a condition in a population, we are asking:
 Who has the condition?
 Where are they?
 When did they have it?
 Person, place and time are some of the key ways that epidemiologists describe the health and
illness of populations
 PERSON
o The first step in an investigation is to count how many people are involved; the
number of people affected then needs to be related to the size of the population through the
calculation of risks or rates
o Compare the people affected at different times or different places.
o Since people are not usually homogeneous, the counts may be expressed in terms of
characteristics of the people in the population.
o Inherent characteristics of the population such as sex and age are very commonly used to
describe subgroups of a population
o Other characteristics of interest include socio-economic characteristics (e.g. education,
occupation or income) and health-related characteristics (e.g. smoking, alcohol
consumption or sun exposure)
 PLACE
o Describing the geographic distribution of a condition, likewise, can assist in delineating
those who may benefit from health-related interventions and provide pointers as to
possible determinants or risk factors.
o Place of residence, schools, workplaces or birthplaces are useful descriptors
o Administrative descriptors of place such as local government area, city, state or country
may be useful
o Analysis of data by place may provide clues as to the source of infection or environmental
exposure and to the possible means of transmission
 TIME
o Time trends can be used to:
 Predict what may occur in the future
 Provide clues as to what is causing a change in the condition’s occurrence
 Examine the effectiveness of policies or programs
o USES OF DESCRIPTIVE STUDIES
 Extent and distribution of health states in the community
 Trends in health states over time
 Natural history and prognosis of disease
o MAIN STUDY DESIGNS
 CROSS-SECTIONAL STUDIES (PREVALENCE STUDIES)
 The study type best suited to describe the situation in a defined population at a single point
in time, and provide a snapshot of the population at that time
 It is common for a cross-sectional study to include both descriptive and analytical aims
 MAIN FEATURES
o Population of interest:
 May be the general population or a specific subgroup (e.g. an age group, those who live in a
specific area, or those with certain characteristics in common, such as single mothers or car drivers)
o Study sample:
 It is not usually possible or necessary to measure the health characteristic(s) of every
member of the population of interest, so a subset or sample of the population is selected instead.
 The purpose of the study sample is to represent individuals from the population of interest,
so that the results measured in the sample can be generalized to the entire population of interest.
 The method of sample selection is crucial to accurately representing the population; the most
reliable way to achieve a representative sample is to select it randomly from the population of interest.
 REPEAT CROSS-SECTIONAL STUDIES
 A cross-sectional study is a snapshot of a single point in time, so it cannot provide
information on time trends or on the natural history or prognosis of a disease.
 An alternative way to obtain information that includes the passage of time is the repeat
cross-sectional studies approach.
 Instead of taking a single sample from a population at one point in time, a new sample is
drawn from the population of interest at each point in time.
 With repeated cross-sectional studies, measurements are made on different individuals at
each time point, so differences between time points may reflect differences between the
individuals sampled rather than the result of the passage of time.
 Regardless of this disadvantage, repeat cross-sectional studies are still a useful means of
investigating trends over time
 Many health authorities conduct surveys (a form of repeated cross-sectional studies) at
regular intervals to monitor the population's health status at each point in time and trends
over the entire period.

 LONGITUDINAL DESCRIPTIVE STUDIES
 They are useful for studying the natural history and prognosis of disease, and can be used to
describe changes over time in any health state.
 They follow a defined group through time to describe the changes over time
o E.g. natural history studies, monitoring of change over time
 They are a way of describing the occurrence of health-related events over time.
 Longitudinal studies follow a defined group over time; in effect, they resemble a cross-sectional
study in which the same participants are followed through time and the measures are repeated.
 The important features of longitudinal studies are similar to those of cross-sectional studies.
The process of sampling and obtaining study participants has the same characteristics as it
does when measures are being taken at only one point in time
 REPEAT CROSS-SECTIONAL STUDIES VS LONGITUDINAL STUDIES
 Repeat cross-sectional studies involve the selection of a new sample at each point in time, so
retention of a sample from one time period to the next is not an issue; however, the fact that
measurements are made on different individuals at each point in time is an issue.
 This means that there may be variability between time points that is not the result of the
passage of time. Regardless of this disadvantage, repeated cross-sectional studies are still a
useful means of investigating trends over time
 ANALYTICAL EPIDEMIOLOGY
o Addresses the “why?” questions
o Involves a range of more sophisticated methods for investigating the determinants of health
and illness.
RATES IN EPIDEMIOLOGY
 A rate is used as a general term in epidemiology

 Rate and risk are often used synonymously
 DEFINITIONS
o Rate: a measure of the frequency of occurrence of a phenomenon. It is an expression of
frequency with which an event occurs in a defined population in a specified time period
o It is essential for comparison between populations at different times, different places or
among different classes of people
 COMPONENTS
o The numerator
o The denominator
o The specified time in which events occur
o The multiplier, a power of 10 (converts the rate from a fraction or decimal to a whole number)
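The four components above can be put together in a short calculation. The following is a minimal sketch only: the function name `rate` and the figures used are hypothetical and are not taken from these notes.

```python
def rate(events: int, population: int, multiplier: int = 1000) -> float:
    """Return the number of events per `multiplier` people.

    events      -- the numerator (count of events in the time period)
    population  -- the denominator (size of the defined population)
    multiplier  -- a power of 10 that turns the fraction into a readable number
    The time period itself is not computed; it must be stated with the result.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return events * multiplier / population


# Hypothetical figures: 50 deaths in a population of 4,000 during one year.
deaths_per_1000 = rate(events=50, population=4000, multiplier=1000)
print(f"{deaths_per_1000:.1f} deaths per 1,000 population per year")
```

Stating the time period and the multiplier alongside the number is what makes the rate comparable between populations.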
Quiz
Use this short quiz to test your understanding of the material covered in Topic 1 (please note
answers will not be provided). If you have any questions please post your questions on the Topic 1
Discussion Board.
1. What is epidemiology?
2. What are five ways in which epidemiology contributes to public health and health
promotion?
3. What are the first steps in epidemiology?
4. What are some important issues that epidemiology can address?
5. Three essential characteristics we look for in descriptive statistics are person, time and
place. Give some examples of each, which may be of interest in epidemiological studies.
6. What are the main descriptive epidemiological study designs?
7. Name the advantages and disadvantages of each study.
TOPIC 2: DESCRIPTIVE STUDIES
POPULATION OF INTEREST, SAMPLING FRAMES AND SAMPLING
 In order for individuals to be randomly selected from a population there has to be some way of
identifying members of the population.
 Population of interest: all of the individuals in a group that you care about
 A sampling frame: a list of all the members in the population of interest
 The more complete and accurate the listing the better any sample selected using that listing will
represent the population of interest.
 The first step of selecting a sample is to define the population of interest
 The second step is identification of an appropriate sampling frame representative of the
population
 The third step is to draw a sample through random sampling, systematic sampling or some other
means
 The fourth step is recruiting the participants; usually not all of those selected from the sampling frame
are willing to take part in the study. Hence the individuals who participate are those
from whom the data will be collected.
 TYPES OF SAMPLES
o RANDOM
 Random sample: a type of probability sampling method in which everybody has an equal
chance of being selected
 An unbiased representation of the total population.
 TYPES OF RANDOM SAMPLE
 SIMPLE RANDOM SAMPLING
o Each person has the same (and known) chance of inclusion
o Involves some type of method to generate the sample
 e.g. picking names out of a hat
o May not be ideal when interested in a sub-group
 STRATIFIED RANDOM SAMPLING
o This involves dividing the population into subgroups (e.g. by gender or age) and then sampling
randomly within each subgroup, so that within each specified subgroup each person has the
same (and known) chance of inclusion
 CLUSTER RANDOM SAMPLING
o Used for populations that already exist in groups (e.g. schools, hospitals, suburbs, etc.); a
random selection of individuals within the clusters is used as a sample to represent the
population of interest
o Two stages are used (groups are randomly selected, then individuals are randomly
selected from the groups)
o SYSTEMATIC
 Systematic sampling: a type of probability sampling method in which sample members from a
larger population are selected according to a random starting point and a fixed periodic
interval.
 e.g. picking out every 3rd person from everybody who attended a service in a specific
time period
o CONVENIENCE
 Convenience sampling: a non-probability sampling technique that involves the selection of
members just because of their easy accessibility and proximity to the researcher
 e.g. passers-by
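The probability sampling methods described above can be sketched in a few lines of Python. This is an illustrative sketch, not a prescribed procedure: the sampling frame, the function names and the fixed seed are all hypothetical, and a real study would draw from an actual list of the population of interest.

```python
import random


def simple_random_sample(frame, n, seed=0):
    """Every member of the frame has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, interval, seed=0):
    """Random starting point, then every `interval`-th member."""
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return frame[start::interval]


def stratified_sample(frame, stratum_of, n_per_stratum, seed=0):
    """Split the frame into subgroups, then sample randomly within each."""
    rng = random.Random(seed)
    strata = {}
    for member in frame:
        strata.setdefault(stratum_of(member), []).append(member)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


# Hypothetical sampling frame: 100 people, each recorded as (id, sex).
frame = [(i, "F" if i % 2 == 0 else "M") for i in range(1, 101)]

srs = simple_random_sample(frame, 10)
sys_sample = systematic_sample(frame, interval=10)
strat = stratified_sample(frame, stratum_of=lambda m: m[1], n_per_stratum=5)

print(len(srs), len(sys_sample), {k: len(v) for k, v in strat.items()})
```

Convenience sampling has no equivalent here because it does not use a sampling frame or a random mechanism.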
 SAMPLE SELECTION
o The best way to achieve a sample that is representative of
the population is to choose randomly or through probability
sampling techniques from the population of interest.
o A sampling frame is used to achieve a random sample
o Sometimes random samples are not possible as there may not
be any possible sampling frame; hence alternatives such as
selective or systematic samples are used
 e.g. where attendees at a specific service are selected during a
specific time
o Avoid samples that exclude certain groups or groups that may
be unintentionally excluded
 This will result in the included being different from the not
included
FEATURES OF DESCRIPTIVE STUDIES
 ALL DESCRIPTIVE STUDIES

o Those who are selected and from whom data are collected become the study participants
o A random sample is only representative if all or most of those selected to participate actually
participate
o All descriptive studies should calculate a response rate
o RESPONSE RATE (OR PARTICIPATION RATE)
 This is used to determine the proportion of those selected to participate who actually
participated
\[
\text{Response Rate} = \frac{\text{those who participated}}{\text{those selected}} \times 100
\]
 A good response rate is 70% or more
 A bad response rate is 50% or less
 The actual study participants will only represent the selected sample if all (or most) of those
selected participate. The extent to which this occurs is called the
response rate or participation rate.
 LONGITUDINAL STUDIES
o In longitudinal studies participants are followed over a long period of time and during the
period individuals tend to drop out hence the retention rate is used.
o RETENTION RATE (OR DROP-OUT RATE)
\[
\text{Retention Rate} = \frac{\text{those who participated at the last time point}}{\text{those who participated at Time 1}} \times 100
\]
o Retention rates measure the study participants for whom measures are available at the end of
the study as a proportion of those who were in the study at the beginning. (TIME 1)
o The retention rate can be calculated for any time period during the study. Alternatively, we
could calculate the drop-out rate or loss-to-follow-up.
o The closer to 100% the retention rate the better.
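As a worked illustration of the response rate and retention rate, the sketch below uses made-up numbers for a hypothetical longitudinal study. Treating the drop-out rate as 100 minus the retention rate is an assumption made here for illustration; the notes mention the drop-out rate only as an alternative measure.

```python
def percentage(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


# Hypothetical study: 600 people selected, 420 took part at Time 1,
# and 336 of those were still taking part at the last time point.
selected = 600
participated_time_1 = 420
participated_last = 336

response_rate = percentage(participated_time_1, selected)
retention_rate = percentage(participated_last, participated_time_1)
drop_out_rate = 100 - retention_rate  # assumed complement of retention

print(f"Response rate:  {response_rate:.0f}%")
print(f"Retention rate: {retention_rate:.0f}%")
print(f"Drop-out rate:  {drop_out_rate:.0f}%")
```

With these numbers the response rate sits exactly at the 70% threshold that the notes describe as good.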
HOW TO DISTINGUISH BETWEEN A WELL-DONE STUDY AND A POOR QUALITY ONE
 QUESTIONS TO ASK YOURSELF

o What is the sampling frame?
o What is the method of sampling?
o What is the response rate (and if relevant, retention rate)?
o Rate the study
 e.g.
 They’ve done a pretty good job
 It’s not great, but may be the best that is possible, and it’s better than nothing
 They could’ve done better
MEASURES OF THE OCCURRENCE OF HEALTH STATES AND EVENTS
 PREVALENCE
o Prevalence: the measure of disease frequency in descriptive studies.
o Prevalence answers the question “What fraction of the group is affected at this moment in
time?”
o Prevalence measures existing health states at a point or period in time
o Prevalence is the proportion of people with a health state in a defined population at a given time.
It can be calculated via the following formula:
\[
\text{Prevalence} = \frac{\text{number of people with health state at a specified time}}{\text{number of people in the population at the specified time}} \times 10^{n}
\]
o THE TWO MEASURES OF PREVALENCE

 POINT PREVALENCE
 The prevalence measure relates to a specific point in time
 Point prevalence: the number of persons with the disease at a specified point in time
divided by the number of persons in the population at that time
 It is used to pinpoint how many people are suffering from a disease at one time, and then to
analyse a trend by seeing whether the number has decreased or increased the next time the point
prevalence is looked at
 Also helps determine whether further research needs to be conducted or funding is needed for
treatment of the disease
 Point prevalence captures all existing cases, both new and long-standing, at
the defined point in time
\[
\text{Point Prevalence} = \frac{\#\ \text{people with health state at a specified time}}{\#\ \text{people in the population at the specified time}} \times 10^{n}
\]
 PERIOD PREVALENCE
 What has existed over a period of time
 Period Prevalence: the proportion of a population that has the condition at some time during
a given period (e.g., 12-month prevalence), and includes people who already have the
condition at the start of the study period as well as those who acquired it during that period
\[
\text{Period Prevalence} = \frac{\#\ \text{people with health state at any time during the specified period}}{\#\ \text{people in the population during the specified period}} \times 10^{n}
\]
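The difference between point and period prevalence can be seen by applying both to the same records. The sketch below is illustrative only: the ten people and their onset and recovery days are invented, the function names are hypothetical, and the population is assumed to stay the same size throughout so that one denominator serves both measures.

```python
def has_condition_during(onset, recovery, start, end):
    """True if the person had the condition at any time in [start, end].

    onset    -- day the condition began, or None if it never occurred
    recovery -- day the condition ended, or None if it is ongoing
    """
    if onset is None:
        return False
    recovered_before_start = recovery is not None and recovery < start
    return onset <= end and not recovered_before_start


def prevalence(people, start, end, multiplier=100):
    """Cases present at any time in [start, end] per `multiplier` people."""
    cases = sum(has_condition_during(o, r, start, end) for o, r in people)
    return cases * multiplier / len(people)


# Hypothetical population of 10 people: (onset_day, recovery_day).
people = [
    (None, None), (None, None), (None, None), (None, None), (None, None),
    (5, 20),      # ill from day 5 to day 20
    (15, None),   # ill from day 15, still ill
    (40, 60),
    (70, 90),
    (100, None),  # becomes ill after the study period
]

point = prevalence(people, start=30, end=30)   # a single day
period = prevalence(people, start=1, end=90)   # a 90-day period

print(f"Point prevalence on day 30:     {point:.0f}%")
print(f"Period prevalence, days 1 to 90: {period:.0f}%")
```

Period prevalence is higher here because it counts everyone who had the condition at any time in the period, including people who had recovered before day 30 or became ill after it.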
INCIDENCE
 Incidence: the measure of disease that allows us to determine a person’s probability of being
diagnosed with a disease during a given period of time, therefore incidence is the measure of
new cases occurring in a defined time period.
 The best way to calculate the incidence is to follow a population or a group over time, so that
the occurrence of new cases can be measured.
 Incidence is best measured in longitudinal studies; however, it is possible to measure some
incidence rates in cross-sectional studies
o e.g. deaths and births are incident events that can only occur once for each person, and so can be
accurately measured in a defined time frame
 Incidence is also defined as the number/frequency of new health events in a defined population
at risk, over a specified time period
THE TWO MEASURES OF INCIDENCE
 CUMULATIVE INCIDENCE
o Cumulative incidence: the proportion of a defined at-risk group or population that develops a
new clinical condition or outcome over a given time period
o Cumulative incidence can also be defined as, the number of people with new health events in a
specified time period in a defined population
o Is the number of new cases of a health state in a defined population during a specified time
period
o Measures the proportion of at-risk individuals who develop a condition or outcome over a
specified time period
\[
\text{Cumulative Incidence} = \frac{\text{number of people with new health events in a specified time period}}{\text{total population at risk at the beginning of that time period}} \times 10^{n}
\]
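A short worked example may help with the denominator of cumulative incidence. This is a sketch with invented numbers and a hypothetical function name; it assumes that people who already have the condition at the start are not at risk of a new event and are therefore excluded from the denominator.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int,
                         multiplier: int = 1000) -> float:
    """New cases per `multiplier` people at risk at the start of the period."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    return new_cases * multiplier / at_risk_at_start


# Hypothetical community of 1,000 people followed for one year.
population = 1000
existing_cases_at_start = 50   # already have the condition, so not at risk
new_cases_during_year = 19

at_risk = population - existing_cases_at_start
ci = cumulative_incidence(new_cases_during_year, at_risk)

print(f"{ci:.0f} new cases per 1,000 people at risk over one year")
```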
 INCIDENCE RATE
o Note that the denominator for an incidence rate is person-time, which is the sum of the length
of time during which each person in the population is at risk
o Incidence: the number or frequency of new health events in a defined population at risk, over
a specified time period
o Number of new cases of a health state in a defined population during a specified time period in
relation to the total person-time at risk of the health state
o The incidence rate is an average rate, in the sense that we have ignored the possibility that the
rate may have been changing from month to month.
o For the present we are assuming that the fluctuations we observed are not meaningful for our
purposes.
o We are assuming that there is some underlying constant rate of disease occurrence, and our
average rate provides an estimate of that constant rate. But the rate could change over time,
for example, by season.
\[
\text{Incidence Rate} = \frac{\text{number of people with new health events in a specified time period}}{\text{total person-time at risk during that time period}} \times 10^{n}
\]
 When reporting incidence rates, always state the multiplier and units of the
denominator (i.e. per 10^n units of person-time)
 Person-time: the sum of the length of time during which each person in the
population is at risk, which forms the denominator
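Person-time is easiest to see when it is added up person by person. The following sketch uses a made-up follow-up of five people; the record layout and the function name are hypothetical. It assumes a person stops contributing time at risk once they become a case or leave the study.

```python
def incidence_rate(follow_up, multiplier=1000):
    """New cases per `multiplier` person-years at risk.

    follow_up -- list of (years_at_risk, became_case) pairs, one per person
    """
    person_years = sum(years for years, _ in follow_up)
    new_cases = sum(1 for _, became_case in follow_up if became_case)
    if person_years <= 0:
        raise ValueError("total person-time must be positive")
    return new_cases * multiplier / person_years


# Hypothetical five-year study of five people.
follow_up = [
    (5.0, False),  # followed for the full five years, no event
    (2.0, True),   # became a case after two years
    (3.0, True),   # became a case after three years
    (5.0, False),
    (5.0, False),
]

total_person_years = sum(years for years, _ in follow_up)
ir = incidence_rate(follow_up)

print(f"{total_person_years:.0f} person-years at risk")
print(f"{ir:.0f} new cases per 1,000 person-years")
```

The result is an average over the whole follow-up, in line with the note above that the rate is assumed to be constant over the period.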
Quiz
Sampling in descriptive studies.
1. Why use a sample?

2. What are important attributes for samples to be successful, and what are the types of
sampling commonly used?
3. What is a sampling frame, and what does it represent?
4. How would you calculate response rate, and when would you use it?
5. How would you calculate retention rate, and when would you use it?
6. When judging the quality of descriptive studies, there are four areas that should be
considered. What are they?
Prevalence and incidence (rates) are the principal measures of the frequency of states and events in
epidemiology.
1. What is the difference between incidence and prevalence?

2. Indicate whether each of the examples below is a measure of prevalence or incidence.
o A study of 4,000 children in selected rural areas of Tanzania looked at the rate of
mortality caused by diarrhea over a period of five years. The finding was 50 deaths from
diarrhea per 1000 children.
o Researchers were interested in examining the health status of elderly residents of a
local nursing home on February 26, 2007. It was found that 28% of the subjects had
diabetes mellitus Type II.
3. In a research study of coronary heart disease (CHD), 2000 men and 2000 women in each of
three age groups who were initially free of coronary heart disease (CHD) were recruited. The
results are presented in the following table:
AGE GROUP (YEARS)    CHD (MEN) IN 1 YEAR    CHD (WOMEN) IN 1 YEAR
<30                  38                     1
30-49                122                    42
50-59                268                    136
 Calculate the proportion of men and women who developed CHD in the different age
groups.
 Is this a measure of prevalence or incidence?
 How would you summarise these findings?
TOPIC 3: ANALYTICAL STUDIES
ANALYTICAL STUDIES
 Analytical studies aim to answer the “why?” questions, and aim to identify or describe cause and
effect or association between factors.
 THE TWO MAIN QUESTIONS THAT ANALYTICAL STUDIES ADDRESS ARE:
o Is there an association between two factors (such as cause and effect)?
o How strong is an association?
 A 2-by-2 table is constructed to make the planned comparisons between groups and generate
measures of association
 The relationship between the study factor (exposure) and the outcome factor
can be clearly summarized in a 2 x 2 table.
o STUDY FACTOR (Independent Variable)
 The factor (or factors) that we are interested in, that may be related to the outcome
o OUTCOME (Dependent Variable)
 The outcome of interest in our study
 Analytical studies help determine whether the study factor (exposure) is causing the
outcome factor, and so can help detect cause and effect relationships
TYPES OF ANALYTICAL STUDIES
 OBSERVATIONAL ANALYTICAL STUDIES

o CROSS-SECTIONAL
 Despite the fact that we discussed cross-sectional studies as being descriptive studies, they
can also be analytical studies.
 The main features of the study design of a cross-sectional study are still the same in both
descriptive and analytical studies, however, in analytical cross-sectional studies the study
factor (exposure) and outcome factor are both measured at this time point.
 The main difference between a descriptive and analytical cross-sectional study is the nature
of the research questions they address.
 Analytical cross-sectional studies set out to answer questions about associations between
study factors (exposures) and outcomes
 Descriptive cross-sectional studies only seek to describe the situation at the time they are
conducted
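 PREVALENCE RATE RATIO
 In analytical cross-sectional studies, two prevalence rates are generated
o One for the ‘exposed’ group
o The other for the ‘unexposed’ group
 These two prevalence rates are then compared via a measure of association called the
prevalence rate ratio (PRR)
\[
\text{Prevalence Rate Ratio (PRR)} = \frac{a / (a + b)}{c / (c + d)}
\]
 Here a and b are the cells of the 2 x 2 table for the exposed group (with and without the
outcome), and c and d are the corresponding cells for the unexposed group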
 INTERPRETATION OF PREVALENCE RATE
RATIOS
o PRR > 1: the exposure can be
considered a risk factor for the
outcome
 This occurs when the prevalence rate
in the exposed group is higher than
the prevalence rate in the unexposed
group
o PRR = 1: there is no association
between the exposure and the outcome
 This occurs when the prevalence rate in
the exposed group is equal to the
prevalence rate in the unexposed group
o PRR < 1: the exposure can be considered
to be protective against the outcome
 This occurs when the prevalence rate in
the exposed group is lower than the
prevalence rate in the unexposed group
 Times the low number by 100 to get a percentage
Example: Does cigarette advertising increase the rate of smoking among teenagers?
 Yes:Yes = Those exposed to the ads who then smoked
 Yes:No = Those exposed to the ads who did not smoke
 No:Yes = Those not exposed to the ads who did smoke
 No:No = Those not exposed to the ads who did not smoke
Example: Does owning a pet reduce the rate of depression?
 Yes:Yes = Those with a pet who did end up having depression
 Yes:No = Those with a pet who did not end up having depression
 No:Yes = Those without a pet who did end up having depression
 No:No = Those without a pet who did not end up having depression
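To show how the four cells of a 2 x 2 table lead to a prevalence rate ratio and its interpretation, here is a minimal sketch. The counts are invented for the cigarette advertising example, the function names are hypothetical, and the cell labels assume a = exposed with outcome, b = exposed without, c = unexposed with, d = unexposed without.

```python
def prevalence_rate_ratio(a: int, b: int, c: int, d: int) -> float:
    """PRR = prevalence in the exposed / prevalence in the unexposed."""
    prevalence_exposed = a / (a + b)
    prevalence_unexposed = c / (c + d)
    # Raises ZeroDivisionError if no unexposed person has the outcome.
    return prevalence_exposed / prevalence_unexposed


def interpret(prr: float, tolerance: float = 1e-9) -> str:
    """Map a PRR onto the three interpretations listed in the notes."""
    if abs(prr - 1) <= tolerance:
        return "no association"
    return "possible risk factor" if prr > 1 else "possibly protective"


# Hypothetical survey of 200 teenagers.
a, b = 30, 70   # exposed to ads: smokers, non-smokers
c, d = 15, 85   # not exposed:    smokers, non-smokers

prr = prevalence_rate_ratio(a, b, c, d)
print(f"PRR = {prr:.2f} ({interpret(prr)})")
```

Because exposure and outcome are measured at the same time in a cross-sectional study, a PRR above 1 indicates an association but cannot by itself show which came first.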
 STRENGTHS OF A CROSS-SECTIONAL ANALYTICAL STUDY
 Often based on representative sample of the population which helps with generalizability
 Carried out over short periods
o Relatively cheap and quick to conduct
 This is important for an urgent problem
 Useful for a new problem where little information exists
 WEAKNESSES OF A CROSS-SECTIONAL ANALYTICAL STUDY
 Cannot separate the direction of the cause and effect
o This is because the exposure and the outcome are measured at the same time
 Measuring prevalent cases leads to the inclusion of a higher proportion of cases with
an outcome of long duration
o ECOLOGICAL STUDIES
 Ecological Study: an observational study defined by the level at which data are analysed,
namely at the population or group level, rather than an individual level, where they look for
an association between the occurrence of disease and exposure to known or suspected
causes
 The measurement of outcomes in ecological studies is usually a population-based rate, such
as an incidence rate, prevalence rate or mortality rate
 Groups of individuals, areas or other larger units are analyzed.
 e.g. a study examining the association between the rate of obesity in local government
areas of Melbourne and the amount of publicly-accessible open space in those local
government areas is an ecological study
o This study uses an exposure (publicly-accessible open space) that is an attribute of areas
rather than individuals.
o The outcome factor is the rate of obesity in those areas
 Reasons why group or area level information is used, rather than information for individuals
 Group or area information may be the only data available, or the only data that is easily
obtained
 The exposure and/or the outcome may occur at a group or area level and have no
equivalent at the individual level
 TYPES OF ECOLOGICAL STUDIES
 Ecologic comparison study: examines exposure rates and disease rates among different
groups over the same time period
 Ecologic trend study: examines changes in exposure and changes in disease within the
same community, country, or other aggregate unit.
 EXAMPLES OF EXPOSURES THAT COULD BE USED IN ECOLOGICAL STUDIES INCLUDE:
 Environmental measures: such as level of pollution, annual rainfall, etc.,
 Lifestyle measures: such as annual sales of tobacco, number of gyms or facilities for
physical activities
 Economic measures: such as the unemployment rate or per capita income
 STRENGTHS OF AN ECOLOGICAL ANALYTICAL STUDY
 They can answer questions on cause and effect
 They can use existing data collected for different purposes
 Are relatively quick and easy to do
 Important type of study for phenomena that apply at group or area level
o e.g. Characteristics of physical or social environment
 Some exposures may vary very little, or not at all within a given group, so by comparing
groups ecologic studies allow us to study greater range of exposure
 There are certain variables that are only defined and can only be measured for groups
 The only way to study these exposures is at a group level
 Can generate hypotheses for analytical epidemiological studies (cohort, case-control)
 WEAKNESSES OF AN ECOLOGICAL ANALYTICAL STUDY
 Answers to questions on associations (such as cause and effect questions)
can be subject to the ecological fallacy
o Ecological fallacy: associations found at area level may not apply at individual level
 Observations made at the group level may not represent the exposure-disease
relationship at the individual level
 Occurs when incorrect inferences about the individual are made from group level
data
o e.g. Associations of smoking prevalence with individual and area level of social cohesion
 There is no information on the cross-classifications of exposures and outcomes
o Therefore, you can’t be sure that the individuals who got the disease are the ones who
were exposed
 There is no information on individual level variables which may be confounders
o In other words, there is no information on factors that can impact both the disease and
the exposure
o STATISTICAL ANALYSIS OF AN ECOLOGICAL ANALYTICAL STUDY
 Correlation: describes the linear relationship between variables, the extent of the
relationship is quantified by a correlation coefficient or r-value
 r-value can have values between –1 (strong negative correlation) and +1 (strong positive
correlation)
 Can quantify how much disease occurrence changes for every unit change in exposure level
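The correlation coefficient for area-level data can be computed directly from its definition. The sketch below is illustrative only: the six areas and their values are invented for the open space and obesity example, and `pearson_r` is a hypothetical helper written out by hand so the calculation is visible.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical data for six local government areas (one value per area).
open_space_percent = [5.0, 8.0, 12.0, 15.0, 20.0, 25.0]   # exposure
obesity_percent = [31.0, 29.5, 27.0, 26.0, 24.5, 21.0]    # outcome

r = pearson_r(open_space_percent, obesity_percent)
print(f"r = {r:.2f}")
```

Each data point is an area, not a person, so even a strong r says nothing about whether the individuals with more access to open space are the ones with lower obesity.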
o CASE-CONTROL STUDIES
o COHORT STUDIES
 EXPERIMENTAL ANALYTICAL STUDIES
o RANDOMISED TRIALS
o COMMUNITY TRIALS
Quiz
1. What is the primary objective of analytical epidemiological studies compared with
descriptive epidemiological studies?
2. What is an effective way of estimating the effect of exposure to a potential risk (study
factor) on the outcome factor in a population? Give an example.
3. Name two strengths and weaknesses of analytical cross-sectional studies.
4. Calculate the prevalence rate ratio for the following. What do you conclude?
5. What is the study factor(s) and outcome factor(s) for the following study abstract?
Bodnar, Cogswell and Scanlon (2002) Low Income Postpartum Women Are at Risk of Iron Deficiency.
The American Society for Nutritional Sciences, vol 132 (8), pp. 2298-2302.
We estimated the prevalence of postpartum iron deficiency, anaemia and iron deficiency anaemia in
the United States and compared risk of iron deficiency between women 0–24 mo postpartum (n =
680) and never-pregnant women, 20–40 y old (n = 587). We used data from National Health and
Nutrition Examination Survey, 1988–1994. Iron deficiency was defined as abnormal values for ≥2
of 3 iron status measures (serum ferritin, free erythrocyte protoporphyrin, transferrin saturation).
Iron deficiency prevalences for women 0–6, 7–12 and 13–24 mo postpartum were 12.7, 12.4 and
7.8%, respectively, and 6.5% among never-pregnant women. After adjustment for confounding, the
risk of iron deficiency among women with a poverty index ratio ≤ 130% who were 0–6, 7–12 and 13–
24 mo postpartum was 4.1 (95% confidence interval 2.0, 7.2), 3.1 (1.3, 6.5) and 2.0 (0.8, 4.1) times as
great, respectively, as never-pregnant women with a poverty index ratio > 130%, but risk was not
elevated for never-pregnant women with a poverty index ratio ≤ 130%. Compared with the same
referent, the risk of iron deficiency was not meaningfully different for women with a poverty index
ratio > 130% who were 0–6, 7–12 or 13–24 mo postpartum. Given that low income postpartum
women bear a substantially greater iron deficiency risk than never-pregnant women, more attention
should be given to preventing iron deficiency among low income women during and after
pregnancy.
TOPIC 4: ANALYTICAL STUDIES - COHORT STUDIES
COHORT STUDIES
 A ‘cohort’ is a group of individuals sharing a common characteristic

o E.g. students in HSH205 represent a cohort because they have the unit in common
 A cohort study is an observational analytical study in which a group of people exposed and a
group not exposed to a possible risk factor are followed up over time
o Exposure status is measured first, then participants are followed up until outcome status is
established
 The incidence of the outcome in each group (the exposed and not exposed) is compared
 A cohort study can be either prospective or retrospective
o These terms refer to the timing of the data collection, not the relationship between
exposure and outcome
o RETROSPECTIVE
 Both the exposure and the outcome have occurred at the time of the commencement of
the study
o PROSPECTIVE
 Exposure has occurred prior to the commencement of the study but the outcome has not
 This type of study begins with a group of people (a cohort) who do not have the outcome of interest
 They are then classified into groups according to exposure (or non-exposure) to a factor that
may be a risk factor for that outcome
 The participants are followed until an outcome status is established
 The relationship between exposure and outcome factor can then be assessed
 STRENGTHS OF COHORT STUDIES
o Particularly good when exposure is rare, as long as you ensure that you have sufficient exposed
participants
o Able to examine multiple effects of single exposure
o Establish temporal relationship between exposure and outcome
o Allow direct measurement of incidence of outcome in exposed and unexposed
 WEAKNESSES OF COHORT STUDIES
o Inefficient for rare diseases because a very large cohort is needed to observe sufficient cases of
the outcome
o Expensive and time consuming, and also cannot readily test recent hypotheses
o If retrospective, require existence of adequate records
o Validity seriously affected by loss to follow up
 E.g. A study that investigates whether asbestos is a risk factor for the outcome of interest,
asbestosis. The steps we would take are:
o A:
 Recruit a group of renovators of period homes as the study population
o B:
 Make sure the study population don’t have the outcome factor – that is, exclude any
renovators with asbestosis
o C:
 Classify individuals as ‘exposed’ or ‘unexposed’ to the study factor (in this case, the study
factor is asbestos)
o D:
 Follow the exposed and unexposed groups forward in time (20 years) until an outcome
occurs in some of the group members (outcome=asbestosis, a form of lung disease)
o E:
 Finally, observe and record incident cases of the outcome in both the exposed and
unexposed groups
DESIGN AND CONDUCT OF COHORT STUDIES
 In a cohort study the ideal

 SELECTION OF THE STUDY POPULATION
o The strategy used to select a sample of a population should be guided by considerations of
accuracy, completeness and practicality.
o Exposure: factor under investigation as a possible risk factor/ protective factor for the outcome
of interest
o CRITERIA FOR PARTICIPANTS
 Free from the study outcome
 At risk of developing the study outcomes
o EXPOSED
 COMMON EXPOSURES
 The aim is to have both the exposed and non-exposed groups drawn from the same source
population, so that the only difference between the groups is their exposure status
 Easily identifiable from the general population
 Select sample of general population
 When selecting participants, you must make sure to exclude individuals who have the
outcome of interest
 RARE EXPOSURES
 Select the group based on exposure
 Select comparable unexposed group
 OCCUPATIONAL EXPOSURES
 Select specific occupation groups e.g. nurses, miners, etc
 Unexposed may be a different group from the same workplace or a different workplace
 Occupation is the exposure
ASCERTAINMENT OF EXPOSURE AND OUTCOME STATUS
 EXPOSURE
o Pre-existing records
o Self-reported (questionnaires and interviews)
o Multiple sources
 OUTCOME
o Death registry
o Disease registries
o Hospital/GP records
o Diagnostic tests
o Self-report (questionnaires and interviews)
o Direct examination
MEASURES OF ASSOCIATION
 ANALYSIS
o BASIC ANALYSIS OF DATA FROM COHORT STUDIES
 Calculation of incidence rates of outcome of interest
 Rates compared between exposed and unexposed
o MEASURES OF ASSOCIATION
 Relative Risk (RR) and Rate Ratio (RR)
 Rate in exposed compared to rate in unexposed for outcome of interest
 Cohort studies use measures of association or impact to estimate the size of any association
between exposure and outcome, and indicate how much more likely people in an exposed group
are to develop the outcome than those in an unexposed group
 Most commonly used measures of association are called ‘relative risk’ measures (this term is
used for a number of different measures)
 There is a second group of measures of association known as the ‘difference measures’
o The most common of these measures is the risk difference which makes use of cumulative
incidence rates
 Analysis of a cohort study is via calculation of the incidence rate (or risk of developing the
outcome) in the exposed group compared to that in the unexposed group
 The type of incidence rate used determines the appropriate measure of association and is often
calculated as either a:
o Risk ratio: makes use of cumulative incidence rates
o Incidence rate ratio: makes use of person-time incidence rates and takes into account
differences in duration of follow-up of the study group
o Odds ratio: sometimes used as a measure of association when convenient to do so, but is more
commonly used in case control studies as it only provides an estimate of the risk ratio
 All the measures listed above are ‘relative measures’ (hence the term relative risk).
 RELATIVE RISK (RISK RATIO)
o Cohort studies can be analysed using cumulative incidence rates via a 2 x 2 contingency table
to calculate the relative risk (risk ratio)
$$\text{Relative Risk (Risk Ratio)} \; (RR) = \frac{a / (a + b)}{c / (c + d)}$$
o Here a and b are the exposed participants with and without the outcome, and c and d are the
unexposed participants with and without the outcome
o Pertains to the risk of an outcome in exposed persons relative to the risk in the unexposed
o Ratio > 1: greater risk in exposed group
o Ratio = 1: same in both groups
o Ratio < 1: the risk is lower in the exposed group
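As a worked illustration of the risk ratio, here is a minimal Python sketch. The counts are hypothetical (they do not come from any study in these notes), and the function name `risk_ratio` is only an illustrative choice; a and b are taken to be the exposed participants with and without the outcome, and c and d the unexposed participants with and without the outcome.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2 x 2 table.

    a, b: exposed participants with / without the outcome
    c, d: unexposed participants with / without the outcome
    """
    risk_exposed = a / (a + b)      # cumulative incidence in the exposed
    risk_unexposed = c / (c + d)    # cumulative incidence in the unexposed
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 200 exposed and 15 of 300 unexposed develop the outcome.
rr = risk_ratio(a=30, b=170, c=15, d=285)
print(f"RR = {rr:.2f}")
```

With these made-up counts the risks are 15% and 5%, so the ratio is above 1 and would be read as a greater risk in the exposed group.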
 ATTRIBUTABLE RISK (AR)
o Also known as the risk difference, which concerns the difference in absolute risk of the
exposed compared to the unexposed group.
$$\text{Attributable Risk} \; (AR) = \frac{a}{a + b} - \frac{c}{c + d}$$
o Therefore, AR measures excess risk in exposed individuals, assumed to be attributable to
exposure.
o ATTRIBUTABLE RISK PERCENTAGE (AR%)
 The excess can also be expressed as a percentage of the risk in the exposed group that is
attributable to the exposure to give the measure called the attributable risk percent (AR%).
The AR% measures the proportion of risk in the exposed which is assumed to be attributable
to exposure.
$$\text{Attributable Risk Percent} \; (AR\%) = \frac{\dfrac{a}{a + b} - \dfrac{c}{c + d}}{\dfrac{a}{a + b}} \times 100$$
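The two attributable risk measures can be sketched in Python as follows. This is an illustration only: the counts are hypothetical, the function names are not from any library, and a, b, c, d are assumed to be the cells of the 2 x 2 table (exposed with/without the outcome, unexposed with/without the outcome).

```python
def attributable_risk(a, b, c, d):
    """Risk difference: risk in the exposed minus risk in the unexposed."""
    return a / (a + b) - c / (c + d)


def attributable_risk_percent(a, b, c, d):
    """Excess risk as a percentage of the risk in the exposed group."""
    risk_exposed = a / (a + b)
    return attributable_risk(a, b, c, d) / risk_exposed * 100


# Hypothetical cohort: 30 of 200 exposed and 15 of 300 unexposed develop the outcome.
ar = attributable_risk(30, 170, 15, 285)
ar_percent = attributable_risk_percent(30, 170, 15, 285)
print(f"AR  = {ar:.3f}")
print(f"AR% = {ar_percent:.1f}")
```

In this made-up example the risks are 0.15 and 0.05, so the excess risk is 0.10 and about two thirds of the risk in the exposed group would be assumed to be attributable to the exposure.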
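 RATE RATIO
o As losses to follow up often invalidate cumulative risk measures, it is necessary to use rates
based on person-time of observation.
$$\text{Rate Ratio} = \frac{a / Y_1}{c / Y_0}$$
o Here a and c are the numbers of cases in the exposed and unexposed groups, and Y_1 and Y_0 are
the person-time at risk in the exposed and unexposed groups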
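A short Python sketch of a rate ratio based on person-time is given below. The numbers are hypothetical and the function name `rate_ratio` is an illustrative assumption; the calculation is simply cases divided by person-time in each group, then the ratio of the two rates.

```python
def rate_ratio(cases_exposed, person_time_exposed,
               cases_unexposed, person_time_unexposed):
    """Ratio of person-time incidence rates (exposed / unexposed)."""
    rate_exposed = cases_exposed / person_time_exposed
    rate_unexposed = cases_unexposed / person_time_unexposed
    return rate_exposed / rate_unexposed


# Hypothetical data: 40 cases in 10,000 person-years among the exposed,
# 20 cases in 20,000 person-years among the unexposed.
rate_exposed_per_1000 = 40 / 10_000 * 1000
rate_unexposed_per_1000 = 20 / 20_000 * 1000
irr = rate_ratio(40, 10_000, 20, 20_000)

print(f"Rate in exposed:   {rate_exposed_per_1000:.1f} per 1000 person-years")
print(f"Rate in unexposed: {rate_unexposed_per_1000:.1f} per 1000 person-years")
print(f"Rate ratio = {irr:.1f}")
```

Because each group contributes its own person-time, participants with different lengths of follow-up are handled without needing everyone to be observed for the same period.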
Quiz
1. Describe some key features of cohort study designs.
2. A study found that menopausal status increases the rate of coronary heart disease (CHD) in
women, based on the data below. Using this information calculate the rate ratio of CHD
associated with menopause.
                  Cases    Person years at risk    Rate per 1000 person years
Pre menopause     6        8234                    0.73
Post menopause    26       6989                    3.7
3. A cohort study was used to examine use of ‘The Pill’ and the development of breast cancer.
Calculate the incidence rate ratio of the following:
Use of “The Pill”    Breast cancer cases    Person years    Rate per 100,000 person years
Yes                  55                     48,222          114.05
No                   45                     55,358          81.29
Total                100
4. What is ‘attributable risk’? Calculate the attributable risk and attributable risk percent of the
following and interpret the outcomes.
Outdoor occupation    Melanoma +ve    Melanoma -ve    Total    5 year risk
Yes                   27              455             482      5.6%
No                    77              1831            1908     4.0%
Total                 104             2286            2390     4.4%

TOPIC 5: ANALYTICAL STUDIES—CASE
CONTROL STUDIES
CASE-CONTROL STUDIES
 A case-control study compares two groups of people: people with the outcome of interest (cases)
and a very similar group of people who do not have the outcome of interest (controls)
 Epidemiologists can then study the history of the people in each group to identify exposures or
study factors that may be associated with the outcome
FEATURES OF CASE-CONTROL STUDIES
 Begin with identification of a group of cases (individuals with the outcome factor) and a group of
controls (individuals without the outcome factor)
 The prevalence of exposure to a study factor (of interest) is measured in the two groups and
compared
 If the prevalence of exposure is higher in the cases than in the controls, the exposure might be a
risk factor for the outcome factor.
 If the prevalence of exposure is lower, then it might be a protective factor for the outcome
 Case-control studies hope to achieve the same goal as cohort studies, but usually at considerably
less cost and time, and are more useful in studying rare outcomes
 The comparability of cases and controls is essential to provide sound evidence of valid
statistical associations between the outcome and the exposure factor
 Controls should come from the same population that gave rise to the cases
HOW TO SELECT A CASE
 A definition of what constitutes a case is formulated. That definition has to be clear, concise
and accurate to ensure that cases are correctly classified and that non-cases are correctly
identified as non-cases. It should also state whether prevalent or incident cases are used
 Then all possible sources of cases are identified (e.g. hospital/clinical records, death
certificates, registries)
 When enrolling the selected cases into the study, important considerations are whether the
cases are prevalent or incident, and the effects of survival, referrals and refusals
 Case ascertainment must be independent of exposure status
HOW TO SELECT A CONTROL
 Controls have to be representative of the population from which cases arise. If a control had
developed the outcome of interest, they would have become a case in the study
 Controls must be sampled independently of exposure status, and it is acceptable to have more
controls than cases
 IDEAL CONTROL GROUP
o Representative of the population from which the cases arise
o Identical to cases with respect to all characteristics that influence likelihood or degree of
exposure and are also related to occurrence of outcome
o Comparable in that the presence of exposure can be measured in a manner identical to that
used in the cases
o Accessible in that exposure information is relatively cheap to obtain
CASE-CONTROL STUDIES
 Measure the odds of having an exposure or characteristic in the case and control populations
 These odds are then compared using the odds ratio, a measure of association
 The outcome is measured first, then the exposure history of participants is established
 The study group is categorized by presence or absence of the outcome factor
 Cases and controls are investigated to see if they have the study factor (exposure of interest)
 The cases and controls are then compared with respect to the frequency of one or more past
exposures.
 If the cases have substantially higher odds of exposure to a particular factor compared to the
control subjects, it suggests an association.
 This strategy is a better choice when the source population is large and ill-defined, and it is
particularly useful when the disease outcome is uncommon.
COHORT STUDIES
 Useful for rare exposures
 The cohort study design identifies a group of people exposed to a particular factor and a
comparison group that was not exposed to that factor, and measures and compares the incidence
of disease in the two groups
 The characteristic feature of a cohort study is that the investigator identifies subjects at a point
in time when they do not have the outcome of interest and compares the incidence of the
outcome of interest among groups of exposed and unexposed (or less exposed) subjects
 A higher incidence of disease in the exposed group suggests an association between that factor
and the disease outcome.
 This study design is generally a good choice when dealing with an outbreak in a relatively small,
well-defined source population, particularly if the disease being studied is fairly frequent.
ADVANTAGES
 Can investigate a wide range of possible risk factors or exposures
 Useful for rare diseases because you start with the outcome
 Do not suffer from losses to follow up as the outcomes and exposures have already occurred at
the commencement of the study
 They are suitable to test current hypotheses
 Consistency of measurement techniques is easily maintained. That is, the ascertainment of
outcome and exposure status is easily measured in the same way for each group
DISADVANTAGES
 Vulnerable to bias in selection of cases and controls. This occurs when the groups to be
compared differ systematically in relation to exposure and outcome
 Vulnerable to bias in measurement. This occurs when the recording or obtaining of data on
exposure differs between cases and controls
 Problems in sorting out sequence of events
 Not suitable for investigation of rare study factors or exposures due to the very large numbers
of cases and controls that would be required to detect a rare exposure
 No estimate of disease incidence
SOURCES OF EXPOSURE INFORMATION
 Face-to-face interviews
 Existing records i.e. medical records, death records etc
 Self-administered questionnaires
 Telephone interviews
 Computer-assisted telephone interviews (CATI)
 Tissue banks and databases on biochemical and environmental measures
MEASURE OF ASSOCIATION
 ODDS RATIO
o The measure of association commonly used in case-control studies is the odds ratio (OR). The
OR is the odds of exposure amongst the cases compared to the odds amongst the controls
$$\text{Odds Ratio} = \frac{a \times d}{b \times c}$$
o The odds ratio answers the question: what are the odds of having been exposed to study factor
X, given that outcome Y has developed?
o The Odds ratio can be used to estimate relative risk/risk ratio and has the same interpretation
as the relative risk/risk ratio.
o E.g. A group of 135 men under 55 years of age who are identified as having coronary heart
disease (the cases) are matched with a group of 1200 men under 55 years of age without
coronary heart disease (the controls). Cholesterol measures are taken from each group of men,
and those with cholesterol levels of 6.5 mmol/L or greater are identified. The results are shown
in a table below:
                       CASES    CONTROLS    TOTAL
CHOLESTEROL    YES     75       610         685 (total exposed)
>6.5 mmol/L    NO      60       590         650 (total unexposed)
TOTAL                  135      1200        1335
$$\text{Odds Ratio} = \frac{a \times d}{b \times c} = \frac{75 \times 590}{610 \times 60} = \frac{44250}{36600} = 1.21$$
This means that, for men under 55 years, the odds of coronary heart disease are 1.21 times as
great with a cholesterol level greater than 6.5 mmol/L as with a cholesterol level equal to or
less than 6.5 mmol/L
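The cholesterol example can be checked with a few lines of Python. This is a minimal sketch: the counts are those in the table above, while the function name `odds_ratio` is just an illustrative choice (a and b are the exposed cases and controls, c and d the unexposed cases and controls).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    return (a * d) / (b * c)


# Counts from the cholesterol and coronary heart disease example.
or_chd = odds_ratio(a=75, b=610, c=60, d=590)
print(f"Odds ratio = {or_chd:.2f}")
```

The cross-products are 44250 and 36600, which gives the same odds ratio of 1.21 after rounding to two decimal places.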
Quiz
1. Describe the features of a case-control study.
2. What are the advantages and disadvantages of a case-control study?
3. Locate a copy of the case-control study by Sanchez-Guerrero et al. (1995) entitled ‘Silicone breast
implants and the risk of connective-tissue diseases and symptoms’ (N Engl J Med. 1995 Jun
22;332(25):1666–70). The full text version is available online via the library.
Please answer the following questions after reading this journal article:
 What is the aim of this study?

 Why was this study conducted?
 How was the original cohort defined?
 How was the information on breast implantation collected?
 Was this a valid method?
 How was the information about connective tissue disease collected?
 Was this method valid?
 What measure of association was calculated?
 How was this calculated?
4. Cases represent 100 women with hypoglycaemia. What is the essential prerequisite of the control
group?
5. Is it possible to test the hypothesis that rates of hypoglycaemia are higher in single women than in
married, using the same 100 cases and controls?
6. Suppose you found out that 80% of the hypoglycaemia cases were married; does this demonstrate
that being married increases the risk of developing hypoglycaemia?
7. Assume that 90% of the control group are married. Estimate the odds ratio of hypoglycaemia for
single women. What is the conclusion?
8. Locate a copy of the case-control study by Teo KK, INTERHEART Study Investigators (2006).
Tobacco use and risk of myocardial infarction in 52 countries in the INTERHEART study: a case-
control study, Lancet, 368(9536): 647–58. The full text version is available online via the library.
After reading this paper, please discuss the following issues:
 What is the aim of this study?

 In this study, how was a case(s) defined?
 Were incident or prevalent cases used? Why?
 What was the recruitment strategy for the cases?
 Were exclusion criteria used for the controls? What were they, and why were they included?
 In Table 3, after Adjustment 1, what was the risk of AMI for current smokers compared to
those who gave up smoking more than 20 years before? What does this say about smoking
and risk of AMI?
 Describe two limitations of this study.
TOPIC 6: ANALYTICAL EXPERIMENTAL
STUDIES—RANDOMISED CONTROLLED TRIALS
AND COMMUNITY TRIALS
EXPERIMENTAL STUDIES
 In analytical observational studies we observe what is happening rather than intervening,
whereas in experimental analytical studies we intervene to see what happens to the outcome of
interest.
RANDOMISED CONTROLLED STUDIES (RCTs)
 Randomised controlled trials are a type of experimental study that randomly assigns study
participants to an intervention group or to a control group, in order to measure the effects of
the intervention.
 Participant characteristics (e.g. gender) are collected at baseline, and follow up is important
for this type of study
 RCTs commence with a population eligible for the intervention being studied. The individuals
are randomly allocated to receive the new intervention or not, and the outcome of the
intervention is assessed by comparing the intervention and control groups.
 In clinical medicine RCTs are used to measure the efficacy of a treatment and compare new
treatments with standard treatments (i.e. drug therapy, surgery)
 In public health and health promotion randomized control trials are used to evaluate health
promotion programs, evaluate preventative strategies and measure efficacy of new vaccines
 Randomised controlled trials include a control, or comparison, group so that outcomes in the
intervention group can be compared with those in the control group (which receives a standard
intervention or a placebo)
 The best way to assess a new intervention is to identify a group who would benefit from the
intervention (if it is effective), and then randomly allocate them to receive the intervention or
not to receive it
 RCTs are generally considered to give the best evidence on the effectiveness of interventions of
all epidemiological studies because the random allocation process results in close similarity of
the groups in all aspects other than the intervention received.
 Blinding is when participants do not know who is in the intervention group and who is in the
control group. When neither the participants nor the researchers know who received the
intervention and who did not, it is called a double-blind trial. Blinding avoids potentially
affecting the behaviour of participants in the trial as well as how participants' outcomes are
evaluated.
 MAIN ISSUES
o Maintenance and assessment of compliance, that is, whether all the participants are doing
what they are supposed to be doing
o Achieving high and uniform rates of ascertainment of outcomes. To do this you need to ensure
that accurate and complete data are collected for all groups, that the methods of data collection
are the same for each group, and that all groups are followed up in the same way
 MEASURES OF ASSOCIATION
o Relative risk (risk ratio), attributable risk and attributable risk percentage are the relevant
measures for analysis
o Intention to treat analysis
 Participants are analysed in the groups to which they were randomized, regardless of any
crossovers (participants changing between groups)
 Preserves randomization and avoids bias due to unplanned withdrawals or crossovers between
different groups of participants
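To make the idea of intention to treat analysis concrete, the sketch below compares it with an analysis by treatment actually received. Everything here is hypothetical: the eight records are invented for illustration and the helper `risk_by_group` is not from any library.

```python
# Hypothetical trial records:
# (group assigned at randomisation, group actually received, outcome occurred?)
records = [
    ("intervention", "intervention", False),
    ("intervention", "intervention", False),
    ("intervention", "intervention", True),
    ("intervention", "control", True),        # crossover
    ("control", "control", True),
    ("control", "control", True),
    ("control", "control", False),
    ("control", "intervention", False),       # crossover
]


def risk_by_group(records, group_index):
    """Return {group: proportion with the outcome}, grouping on the chosen field."""
    totals, events = {}, {}
    for record in records:
        group = record[group_index]
        totals[group] = totals.get(group, 0) + 1
        events[group] = events.get(group, 0) + int(record[2])
    return {group: events[group] / totals[group] for group in totals}


itt_risks = risk_by_group(records, 0)         # analysed as randomised
as_treated_risks = risk_by_group(records, 1)  # analysed by what was received

print("Intention to treat:", itt_risks)
print("As treated:", as_treated_risks)
```

In this made-up data the two crossovers change the picture: the groups have the same risk when analysed as randomised, but appear to differ when analysed by treatment received, which is the kind of distortion intention to treat analysis is meant to avoid.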
ADVANTAGES
 Provide the best evidence for the question ‘does an intervention work?’
 RCTs balance out differences between groups other than the intervention via the random
allocation process, thus the groups are similar except for the intervention
 Carry less risk of bias and confounding than other epidemiological designs
 Provide strong evidence of causal relationships between the intervention and the outcome
 If properly conducted (large sample size, correct randomization procedures), intervention and
control groups will be similar in all aspects apart from the intervention
DISADVANTAGES
 Ethical considerations (harmful interventions cannot be tested)
 Need to consider if it is ethical to withhold treatments/interventions from the control group.
 Is it safe/ethical to expose participants to the treatment/intervention, and do the benefits
outweigh the risks?
 Is it ethical not to conduct an RCT for a new intervention?
 The results depend on participant compliance
 Feasibility of treatment/intervention (is it feasible to conduct an RCT on common, widespread
interventions, e.g. vitamins?)
 Not cost effective and very expensive
COMMUNITY (CLUSTER) TRIAL
 Community (cluster) trials are essentially the same as RCTs, except that entire groups or
communities are assigned to intervention or control groups (e.g. schools, suburbs, towns,
clinics). This is advantageous in a range of situations:
o When the intervention can only be implemented at a group level e.g. mass media campaign,
changes to school playground
o Where it is likely that there will be contamination between the intervention and comparison
groups i.e. where the comparison group is likely to receive or have access to the intervention.
This may occur in situations where people responsible for the intervention are dealing with
participants in both the intervention and control groups e.g. health promotion workers, school
teachers, clinicians.
 Community trials start by identifying eligible participants, in this case ‘communities.’ To be able
to assess the impact of the intervention, it is necessary to have baseline measures for the
communities (e.g. data on incidence or prevalence of a particular health state, knowledge or
attitudes towards health behaviours).
 These communities are then assigned to an intervention or control group, that is, one
community will receive the intervention and the other will not
 After a specified period of time, the outcomes of interest are measured in both the intervention
and control communities.
 This is useful where ‘contamination’ between groups is likely to occur
o i.e. media or social marketing campaigns
o School-based intervention
o Intervention with GPs or other health professionals
 They allow researchers to select whole communities or groups from a range of groups or areas
to receive the intervention, which means that other groups or areas can then be used as the
control group
 The outcome of the intervention is assessed by comparing intervention and control groups
o These groups are comparable at the commencement of the study due to randomization.
Intervention should be the only source of difference
o Data are collected from all participants of the group or just a sample of the group
 Community trials can measure the effectiveness of intervention to change incidence or
prevalence in whole communities or populations
ADVANTAGES
 Community trials are the only way to evaluate group/community level interventions with the
rigour of RCTs
 Typically focus on effectiveness, evaluating outcomes under conditions of actual use
DISADVANTAGES
 Allocation of whole groups.
 Usually small numbers of groups/communities are randomized
 Difficult and expensive to conduct. Study power is limited, so larger sample sizes are required
to reduce this limitation
 Bias may occur if the communities are not comparable in terms of the population demographics
 Difficult to ascertain the effect of the intervention in community trials
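The allocation step of a community trial, where whole groups rather than individuals are randomised, can be sketched as below. This is only an illustration under stated assumptions: the school names are invented, the function `allocate_clusters` is hypothetical, and a simple shuffle-and-split is used in place of whatever allocation scheme a real trial would specify.

```python
import random


def allocate_clusters(clusters, seed=None):
    """Randomly split whole clusters into intervention and control arms."""
    rng = random.Random(seed)          # a seed makes the allocation reproducible
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Hypothetical clusters: six schools, each allocated as a whole unit.
schools = ["School A", "School B", "School C",
           "School D", "School E", "School F"]
allocation = allocate_clusters(schools, seed=42)

for arm, members in allocation.items():
    print(arm, sorted(members))
```

Every student in a school follows that school's allocation, which is what limits contamination between arms but also means the number of randomised units is the number of schools, not the number of students.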
EXAMPLES OF LARGE-SCALE INTERVENTION STUDIES
 In the early 1950s, a field trial of polio vaccine was conducted in which over 400 000 school
children were randomly assigned to receive either the vaccine or a placebo injection. Because
the trial showed the vaccine to be effective and safe, millions of children were subsequently
given it, resulting in a major decline in the incidence of polio throughout the world (Francis et
al., 1955).
 A randomised, controlled community trial was conducted to assess the effectiveness of vitamin
A supplementation to prevent childhood mortality in Indonesia. Children aged between 1–5
years in 229 villages were given two doses of vitamin A, while children of the same age in 221
control villages were not given vitamin A until after the study. Mortality among children in the
intervention villages was 50% lower than in the control villages (Sommer et al., 1986). Consider
the ethical implications!
 MRFIT – the Multiple Risk Factor Intervention Trial, in which almost 13,000 men at risk of CHD
were randomly allocated to either a special programme aimed to help them stop smoking and
reduce their blood pressure and cholesterol levels, or to a control group who received their
usual care. There was a non-significant, 7% reduction in CHD in the intervention group. The
authors concluded that this difference may have been small because many men in the control
group also changed their behaviour (e.g. by giving up smoking; MRFIT Research Group, 1982).
Quiz
An RCT was conducted to study the effects of dietary factors on the risk of myocardial infarction (MI)
in men under 70 years of age who had recovered from a previous MI. Patients admitted to several
different hospitals for MI were recruited upon their recovery.
Eligibility criteria: MI diagnosis; age (under 70 years); sex (male); restrictions regarding other
potential confounders; and should be able and willing to participate in the intervention. Eligible
study participants and their doctors were informed about the trial and asked to give their informed
consent. Each study participant was allocated a unique trial number, after which he was
randomly assigned to an intervention or control group.
Using this information answer the following questions:
1. Confounding is introduced by risk factors associated with the exposure; however, this is
reduced by randomisation. How does the number of study participants involved in the
randomisation influence the possibility of achieving this?
2. Confounding may also be dealt with by restrictions when selecting the study population, and
by methods used to control confounding in the data analysis phase. Compared with these
strategies, what is the principal advantage of random assignment of exposure?
3. Does randomisation prevent all sources of confounding? Explain.
4. To be eligible for a trial, study participants should not be exposed regardless of the
intervention. Therefore, men who had already intended to follow the intervention diet were
not included in this study. If they had been included, what difference would it make?
5. What is the main difference between RCTs and Community Trials?
TOPIC 7: DATA COLLECTION AND
MEASUREMENT IN EPIDEMIOLOGY
MEASUREMENT IN EPIDEMIOLOGY
 Measurement: is the assignment of numbers or labels to objects and events according to rules.
 In epidemiology, these objects and events are behaviours, characteristics, social and
environmental factors and so forth.
 There are four main scales of measurement; nominal scales, ordinal scales, interval scales and
ratio scales
COMMON TERMS
 Data: collection of facts/information obtained by measurement

 Data collection: gathering information
 Measurement: the assignment of numbers or labels to objects and events according to rules
SCALES OF MEASUREMENT
 Nominal scale: categories with no inherent order
 Ordinal scale: ordered categories
 Interval scale: ordered, with equal distance between each data point
 Ratio scale: ordered, with equal distance between each data point and a true zero
DATA COLLECTION
 QUESTIONNAIRES
o SELF ADMINISTERED QUESTIONNAIRES
 Standardized questionnaires in which the questions are worded in the same way and presented
in the same order for each participant
 Self-administered questionnaire simply means that the individual receives the questionnaire
either through mail, email or some other way and completes it themselves without the
assistance of a researcher.
ADVANTAGES
 Relatively cost effective compared with other forms of questionnaire data collection
 More likely to elicit “truth”, particularly on sensitive topics
 Decreases the likelihood of socially desirable responses
DISADVANTAGES
 Typically have low response rates
 Inability to seek clarification on questions
 Cannot seek detailed and complex information
 Respondents misunderstanding/not following instructions
o INTERVIEWER ADMINISTERED QUESTIONNAIRES
 A standardized questionnaire is given by the researcher
 The results can be affected by the presence of the interviewer in a number of ways
 Increases the response rate
 Increases the likelihood of obtaining complete data
 Ensures that there is no misunderstanding of the questions as the interviewer is able to
clarify questions to the individual's level of understanding
 Increases the likelihood of socially desirable responses
 The way a participant responds to question can be affected by the way in which the
interviewer asks the questions
 Interviewer could influence response
 They are costly and require training of interviewers
 Time consuming to conduct interviews
 RECORDS AND REGISTRIES
o Data collected for purposes other than epidemiology. Such as:
 Medical/Pharmaceutical records
 Births
 Deaths (General Record of Incidence of Mortality (GRIM))
 Disease Registers
 Cancer register
 National Diabetes Register
 Notifiable Disease register (HIV/AIDS, meningitis, legionnaires, TB, food poisoning)
 Australian Childhood Immunization Register (ACIR)
 Hospital morbidity databases
ADVANTAGES
 Low cost
 Quick as data already exists and has been collected previously
 High accuracy, in that it can be better than individuals' recall
 Provide data on exposures/outcomes prospectively, even those in the past, as individual records
can be tracked and linked in order of time to identify which came first, the exposure or the
outcome, without having to rely on individuals' recall
 Less influenced by social desirability
 High response rates
DISADVANTAGES
 Recording error due to the data being recorded incorrectly
 Differences in what and how much is recorded, especially in the case of medical records where
some doctors may have detailed notes, and some may have refined notes that cater to their own
understanding
 Inaccurate, incomplete reporting due to social desirability
 Record of treatment does not mean it was followed
 Data abstraction requires a standardized instrument and skilled/trained reviewers
 Data completeness and quality
 DIARIES
o Kept by the participants to record specific or specified events as instructed by the researcher
o Are commonly used but are not the best source of data for epidemiological studies
ADVANTAGES
 Assumed to be highly accurate because the participants are not required to ‘summarise’ their
behaviour
 Participants record exactly what they did and when
DISADVANTAGES
 Demand time and skill from the participants, which is time consuming and costly
 Participants need to be motivated to sustain activity over time and on a daily basis
 If the participant forgets, then recall bias is introduced if they retrospectively complete the diary
 Socially desirable responding
 PHYSIOLOGICAL MEASUREMENTS
o Measures taken from humans which are objective rather than subjective
o Can be a potential source of measurement error, whereby the measurement may not be
accurate due to equipment not being properly maintained or calibrated or due to error in the
way that the researcher collected the data
o They are not always possible and can be time consuming and costly
o E.g. blood pressure, blood glucose, cholesterol, genotypes and anthropometric measures
 ENVIRONMENTAL PHYSICAL AND/ OR CHEMICAL ANALYSIS
o Includes environmental agents that may be physical, chemical or biological, this can include air
quality, soil levels and water quality
o This type of data may be collected from local environments (schools, leisure centers, local
shops etc) and from personal environments (such as makeup, food, etc)
o ADVANTAGES AND LIMITATIONS
 Objective ‘scientific’ testing
 Past data sometimes available (i.e. weather reports)
 Measurement error
SOURCES OF DATA
 Common sources of population health data that are routinely collected include
o Census data
o Disease registries (such as births, deaths, cancer and infectious diseases)
 The collection of registry data is a national responsibility and often this data is provided to both
the UN and the World Health Organisation (WHO).
 Other sources of data include hospital records and regular surveys such as the National Health
Survey.
o ADVANTAGES
 The data have been previously collected and as a result are inexpensive to access
 Time effective and inexpensive
 Highly accessible and available from multiple sources
 Feasibility of both longitudinal and international comparative studies
o LIMITATIONS
 Often the data may be incomplete or of poor quality
 Secondary data sources may provide you with a vast amount of information, but quantity is not synonymous with appropriateness. This is simply because the data have been collected to answer a different research question or objective
 Lack of control over data quality
Quiz
1. Provide an example for each of the four levels of measurement in epidemiology.
2. What are the main advantages of self-administered questionnaires compared with
interviewer administered questionnaires?
3. Explore the Victorian Department of Health, Health Status of Victorians web page. Generate
a Burden of Disease Life Expectancy report for your local government area. Now repeat this
for Victoria. Is the life expectancy in your local government area higher or lower than that
for Victoria?
4. Explore the Australian Bureau of Statistics web site. What data sources do you think are
most helpful? Why? How do you think you could use these population health data as a
health practitioner?
TOPIC 8: ASSOCIATION AND CAUSATION
 To a large degree, the accuracy of results is determined by the design and conduct of the
study, and the analysis of the study’s data
 The accuracy of the results is also determined by the degree of absence of random error
(precision) and the degree of absence of systematic error (validity)
 Not all associations in epidemiological studies indicate causation and not all null associations
indicate lack of causation, this is due to the existence of random error and systematic error (also
called bias).
 The precision with which measurements are made affects the results of epidemiological studies
RANDOM ERROR (PRECISION)
 Precision can be improved by increasing the study size, i.e. the size of the study population, the
duration of the follow-up, or both. Sample size is influenced by several factors: the availability
of potential study participants, as well as financial and other practical considerations.
 However, precision should be assessed carefully in the planning phase of the study
 Statistical methods can be used to assess the probability of obtaining a result by chance alone and to
assess the range of values within which an estimate is likely to fall
 Accuracy indicates the extent to which the study findings are free of errors
 Validity is the extent to which a measurement reflects the truth; reliability is the consistency of a
measurement when it is repeated
 When collecting data from large samples mistakes tend to happen, and that is acceptable as long as the
mistakes are random, meaning that there is no discernible pattern; in some cases the
mistakes can cancel each other out in the long run
 Random measurement errors can affect the precision of a measure of association, but because
they are equally likely to increase or decrease a value, no bias is introduced
 THREE MAIN SOURCES
o BIOLOGICAL VARIATION
 Not something that can be controlled; it exists naturally within the sample or the population
o SAMPLING ERROR
 When a different sample from the same population gives a different estimate
 Sampling error is the random variation that can result when using sampling statistics to
estimate population parameters
 Sampling error can compromise precision, so there is always the possibility that the findings
based on the sample will not match those in the population of interest
 Type 1 error: finding an association when none exists (a false positive)
 Type 2 error: not finding an association when one does exist (a false negative)
o MEASUREMENT ERROR
 E.g. inconsistency in taking measurements the same way repeatedly
 Ask whether the results are affected by random error or chance, and whether they are threatened by
imprecise measurement (noise) or lack of power
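The link between study size and precision described in this section can be illustrated with a small sketch. It assumes a simple Wald (normal approximation) 95% confidence interval for a proportion; the function names and the counts are hypothetical and chosen only for illustration, not taken from any study.

```python
import math

def proportion_ci(cases, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (Wald method)."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)  # standard error shrinks as n grows
    return p - z * se, p + z * se

def ci_width(cases, n):
    """Width of the interval: a narrower interval means a more precise estimate."""
    lower, upper = proportion_ci(cases, n)
    return upper - lower

# Hypothetical counts: the same observed prevalence (10%) in two sample sizes.
small_width = ci_width(10, 100)
large_width = ci_width(1000, 10000)

print(f"n = 100:    interval width = {small_width:.4f}")
print(f"n = 10000:  interval width = {large_width:.4f}")
```

With the prevalence held constant, a 100-fold increase in sample size makes the interval about ten times narrower, because the standard error falls with the square root of the sample size. The estimate itself does not change; only the random error around it does.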
SYSTEMATIC ERROR (VALIDITY)
 Bias occurs if there is a systematic tendency by a study to produce results that deviate from the
truth. For this reason, it is vital for researchers to identify the potential size and direction of the
bias in interpreting a study’s outcome
 A study with small systematic error tends to have high accuracy.
 There are two main types of systematic error: selection bias (also called ascertainment bias) and
measurement bias (also called information or observation bias)
 SELECTION (ASCERTAINMENT) BIAS
o Selection bias occurs when the participants of a study are not representative of the target
population about which conclusions are to be drawn, that is, when there is a systematic difference
between the characteristics of the participants and those not selected for a study (e.g.
volunteer bias, whereby those who volunteer to participate in a study may be inherently
different from non-responders)
 E.g. when the study factor makes people unavailable for a study.
 E.g. in occupational epidemiology, the healthy worker effect, whereby workers who are
severely ill or disabled are unable to work and are excluded from employment, so the working
population is healthier than the general population
o One way to avoid this effect is to compare workers in a specific job with workers in other jobs
that differ in the occupational exposures
o Randomization of participants helps to ensure that the groups are comparable, provided that the study
is large enough
 MEASUREMENT (INFORMATION) BIAS
o Measurement bias occurs when individual measurements or classifications of the exposure or
the study outcome are inaccurate
o One example is when specimens from the intervention and control groups are analysed
by different laboratories that have different quality assurance practices; bias is then
introduced
o The validity of a survey instrument is also important as a means of reducing measurement bias
o Another example of measurement bias is recall bias, where cases and controls recall
information differently
o Recall bias can either exaggerate or underestimate the degree of effect
o If this type of bias occurs equally in the groups being compared, then there may be an
underestimate of the true strength of the relationship
o One approach to avoid recall bias is to appropriately frame the questions to aid accuracy of
recall
o If possible, conduct a study where information from medical records is obtainable (as
appropriate)
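The point that measurement error occurring equally in both groups tends to underestimate the strength of a relationship can be shown with a small calculation. This is a minimal sketch under stated assumptions: the true counts, the sensitivity (0.8) and the specificity (0.9) of the exposure measurement are all hypothetical, and the same error rates are applied to cases and controls.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected counts recorded as exposed / unexposed after imperfect measurement."""
    recorded_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    recorded_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return recorded_exposed, recorded_unexposed

# Hypothetical true exposure status (exposed, unexposed) in each group.
true_cases = (60, 40)
true_controls = (30, 70)
true_or = odds_ratio(true_cases[0], true_cases[1], true_controls[0], true_controls[1])

# The same measurement error is applied to both groups.
obs_cases = misclassify(*true_cases, sensitivity=0.8, specificity=0.9)
obs_controls = misclassify(*true_controls, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(obs_cases[0], obs_cases[1], obs_controls[0], obs_controls[1])

print(f"true odds ratio:     {true_or:.2f}")
print(f"observed odds ratio: {observed_or:.2f}")
```

In this example the observed odds ratio lies between 1 and the true odds ratio, so the association is weakened rather than reversed. If the error rates differed between cases and controls, as with recall bias, the observed value could move in either direction.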
CONFOUNDING
 Confounding refers to a mixing or muddling of effects that can occur when the relationship we
are interested in is confused by the effect of something else
 The association of interest, which is between the study and outcome factors, is confounded by a
variable, which also has a plausible explanation for its effect on the outcomes
 To place the above diagram into perspective, say that, as a researcher, you are interested in the
effect of smoking on lung cancer
 However, if not controlled for, environmental pollution, another known risk factor for lung
cancer, may also confound the relationship between smoking and lung cancer, as demonstrated
in the diagram below
 CONFOUNDING HAPPENS WHEN:
o An extraneous factor, itself a determinant or risk factor for the study outcome and associated with
the exposure, exists in the study population
o The effects of two exposures have not been separated and therefore it has been incorrectly
concluded that the effect is due to one factor rather than the other
o It creates the appearance of a cause and effect relationship that does not in fact exist
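One common way to check for confounding is to compare the crude measure of association with the measure calculated separately within each level (stratum) of the suspected confounder. The sketch below uses entirely hypothetical cohort counts, constructed so that the exposure has no effect within either stratum; the function and variable names are illustrative only.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# Hypothetical counts per stratum:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "confounder present": (80, 400, 20, 100),
    "confounder absent": (5, 100, 20, 400),
}

# Crude table: add the strata together, ignoring the confounder.
crude_counts = tuple(sum(column) for column in zip(*strata.values()))
crude_rr = risk_ratio(*crude_counts)

# Stratum-specific risk ratios: the confounder is held constant within each.
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

print(f"crude risk ratio: {crude_rr:.3f}")
for name, rr in stratum_rr.items():
    print(f"{name}: risk ratio = {rr:.3f}")
```

Here the crude risk ratio is well above 1, yet the risk ratio within each stratum is 1. The apparent effect arises only because the confounder is more common among the exposed and independently raises the risk of the outcome, which is the "appearance of a cause and effect relationship" described above.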
CRITERIA FOR CAUSATION
 Causality is assumed when one variable is shown to contribute to the development of the study
outcome, and its removal is shown to reduce the frequency of the outcome
 Criteria have been developed and can be used to test whether an association is likely to reflect a
causal relationship
Quiz
1. ‘Not all associations in epidemiological studies indicate causation and not all null
associations indicate lack of causation’. Explain what this statement means and describe the
reasons for your explanation.
2. What do the terms precision and validity refer to?
3. What are the two main types of systematic error? How do they occur? What ramifications do
they have in epidemiologic research?
4. List the effects of selection bias on the following analytical study designs:
a. cohort studies
b. case-control studies
c. randomised controlled trials.
5. What factors affect measurement bias?
6. Confounding:
a. Define.
b. How can it be prevented or controlled?
 1300660688

Introduction to Epidemiology

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Introduction to Epidemiology

Uploaded by

Copyright:

Available Formats

TOPIC 1: INTRODUCTION TO EPIDEMIOLOGY

WHAT IS EPIDEMIOLOGY AND WHY IS IT IMPORTANT FOR PUBLIC HEALTH?

THE MEANING AND SCOPE OF EPIDEMIOLOGY

 A rate is used as a general term in epidemiology

FEATURES OF DESCRIPTIVE STUDIES

 ALL DESCRIPTIVE STUDIES

HOW TO DISTINGUSIH BETWEEN A WELL-DONE STUDY AND A POOR QUALITY ONE

 QUESTIONS TO ASK YOURSELF

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑝𝑒𝑜𝑝𝑙𝑒 𝑤𝑖𝑡ℎ ℎ𝑒𝑎𝑙𝑡ℎ 𝑠𝑡𝑎𝑡𝑒 𝑎𝑡 𝑎 𝑠𝑝𝑒𝑐𝑖𝑓𝑖𝑒𝑑 𝑡𝑖𝑚𝑒

o THE TWO MEASURES OF PREVALENCE

# 𝑝𝑒𝑜𝑝𝑙𝑒 𝑤𝑖𝑡ℎ ℎ𝑒𝑎𝑙𝑡ℎ 𝑠𝑡𝑎𝑡𝑒 𝑎𝑡 𝑎 𝑠𝑝𝑒𝑐𝑖𝑓𝑖𝑒𝑑 𝑡𝑖𝑚𝑒

# 𝑝𝑒𝑜𝑝𝑙𝑒 𝑤𝑖𝑡ℎ ℎ𝑒𝑎𝑙𝑡ℎ 𝑠𝑡𝑎𝑡𝑒 𝑎𝑡 𝑎 𝑠𝑝𝑒𝑐𝑖𝑓𝑖𝑒𝑑 𝑡𝑖𝑚𝑒

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑝𝑒𝑜𝑝𝑙𝑒 𝑤𝑖𝑡ℎ 𝒏𝒆𝒘 ℎ𝑒𝑎𝑙𝑡ℎ

𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑝𝑒𝑜𝑝𝑙𝑒 𝑤𝑖𝑡ℎ 𝒏𝒆𝒘 ℎ𝑒𝑎𝑙𝑡ℎ

 It is essential that when

1. Why use a sample?

1. What is the difference between incidence and prevalence?

AGE GROUP (YEARS) CHD (MEN) IN 1 YEAR CHD (WOMEN) IN 1 YEAR

TYPES OF ANALYTICAL STUDIES

 OBSERVATIONAL ANALYTICAL STUDIES

Example: Does cigarette advertising increase

 Yes:Yes = Those exposed to the ads and

Example: Does owning a pet reduce the rate

 Yes:Yes = Those who have a pet and did

 A ‘cohort’ is a group of individuals sharing a common characteristic

DESIGN AND CONDUCT OF COHORT STUDIES

 In a cohort study the ideal

ASCERTAINMENT OF EXPOSURE AND OUTCOME STATUS

Cases Person years at risk Rates per 1000

Pre menopause 6 8234 0.73

Post menopause 26 6989 3.7

Yes 55 48,222 114.05

Outdoor Melanoma + ve Melanoma - ve Total 5 year risk

Yes 27 455 482 5.6%

No 77 1831 1908 4.0%

Total 104 2286 2390 4.5

FEATURES OF CASE-CONTROL STUDIES

HOW TO SELECT A CASE

HOW TO SELECT A CONTROL

𝑎×𝑑 This means that there is a 1.21 times greater

 What is the aim of this study?

After reading this paper, please discuss the following issues:

 What is the aim of this study?

RANDOMISED CONTROLLED STUDIES (RCTs)

Using this information answer the following questions:

 Data: collection of facts/information obtained b measurement

 Nominal scales: no inherent order

RANDOM ERROR (PRECISION)

 Precision can be improved by

The association of interest, which is between the

To place the above diagram into perspective, say,

However, if not controlled for, environmental

 CONFOUNDING HAPPENS WHEN:

CRITERIA FOR CAUSATION

You might also like