Usmle Review Lecture Epidemiology and Biostats Alaa Elmaoued and Nancy Nguyen

Alaa Elmaoued
Nancy Nguyen
 Epidemiology/population health
 Incidence vs. prevalence
 Measures of health status
 Survival analysis interpretation
 Composite health status indicators
 Population pyramids and impact of demographic changes
 Disease surveillance and outbreak investigation
 Communicable disease transmission
 Points of intervention
 Study design, types and selection of studies
 Descriptive studies
 Analytical studies: observational vs. interventional
 Systematic reviews and meta-analysis
 Obtaining and describing samples
 Methods to handle noncompliance
 Qualitative analysis
 Study interpretation
 Bias, confounding, and threats to validity
 Internal vs. external validity
 Statistical vs. clinical significance
www.usmle.org
 Rates: crude and adjusted
 Crude = overall (e.g. crude mortality rate)
 Adjusted = stratified by different categories (e.g. Age-adjusted mortality rates)
 Mortality
 Standardized mortality ratio (SMR) = (observed # of deaths per yr / expected # of deaths per yr) x 100
 If the SMR = 100, the # of observed deaths is equal to the # expected
 Population attributable risk (PAR) = incidence in the total population – incidence in the nonexposed group
 Population attributable risk percent (PAR%) = [(incidence in the total population – incidence in the nonexposed group) / incidence in the total population] x 100
 Reproductive rates
 Maternal mortality
 death of a woman while pregnant or within 42 days of termination of pregnancy, irrespective of the duration
and site of the pregnancy, from any cause related to or aggravated by the pregnancy or its management but
not from accidental or incidental causes
 Denominator is usually reported per 100,000 registered live births
 Neonatal mortality
 Death of a live-born baby within the first 28 days of life (death within the first 7 days is early neonatal mortality)
 Per 1,000 live births
 Infant mortality
 Death of a child less than 1 year of age
 Per 1,000 live births
 Under-5 mortality
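As a rough illustration of the SMR, PAR, and PAR% formulas above, the following Python sketch plugs in made-up numbers. The function names and every input value are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def smr(observed_deaths, expected_deaths):
    """Standardized mortality ratio, expressed on a base of 100."""
    return 100 * observed_deaths / expected_deaths


def par(incidence_total, incidence_unexposed):
    """Population attributable risk: excess incidence in the whole population."""
    return incidence_total - incidence_unexposed


def par_percent(incidence_total, incidence_unexposed):
    """PAR as a percentage of the incidence in the total population."""
    return 100 * par(incidence_total, incidence_unexposed) / incidence_total


# Hypothetical inputs: 120 observed vs. 100 expected deaths per year,
# incidence of 5% in the total population and 2% in the nonexposed group.
print("SMR  =", smr(120, 100))
print("PAR  =", round(par(0.05, 0.02), 4))
print("PAR% =", round(par_percent(0.05, 0.02), 1))
```

An SMR above 100 would mean more deaths were observed than expected; the PAR% here reads as the share of disease in the population that would disappear if the exposure were removed.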
 Y-axis represents the proportion of survivors and X-axis represents time moving forward
 Generally used to assess survival with death as the defining “event” but can also be used
for other health outcomes such as fertility
 Data is used to define the intervals rather than having a predetermined interval
 Makes full use of the data and is more accurate
 Accounts for some loss to follow-up
 Years of potential life lost (YPLL)
 Measure of premature mortality or early death (i.e. people who die younger have a greater loss of
future productive years than people who die at an older age)
 Based on life expectancy of the population
 Quality-adjusted life years (QALY)
 Measure of the quality of remaining life years
 Used to evaluate different healthcare interventions
 Quality of life is based on a scale from 0 to 1 where 0 is death and 1 is the best possible health state
 Disability-adjusted life years (DALY)
 Years of life lost to premature death AND years lived with a disability of specified severity and duration
 Measure of overall disease burden that combines mortality and morbidity
1. Define the outbreak and validate the existence of an outbreak
2. Examine the distribution of cases by time and place
3. Look for combinations (interactions) of relevant variables
4. Develop hypotheses based on: existing knowledge (if any), analogy to diseases of known etiology, findings from investigation of the outbreak
5. Test hypotheses
6. Recommend control measures
7. Prepare a written report of the investigation and the findings
8. Communicate findings to those involved in policy development and implementation and to the public
 Attack rate = # of people at risk in whom a certain illness develops/ total # of people at
risk
 Herd immunity 
 Reportable diseases: What types of diseases are reportable/notifiable?

Level       Definition                                                                  Example
Primary     Preventing the initial development of a disease                             Immunization
Secondary   Early detection of existing disease to reduce severity and complications   Screening for cancer
Tertiary    Reducing the impact of the disease                                          Rehabilitation for stroke
 The physical examination records of the entire incoming freshman class of
1935 at the University of Minnesota were examined in 1977 to see if their
recorded height and weight at the time of admission to the university were
related to the development of coronary heart disease by 1986. This is an
example of:
A. A cross-sectional study
B. A case-control study
C. A concurrent cohort study
D. A retrospective cohort study
E. An experimental study
Answer: D. A retrospective cohort study. The exposure data (height and weight) were recorded in 1935, long before the investigators began the study in 1977, and the cohort was then assessed for the outcome.
 Residents of three villages with three different types of water supply were asked to
participate in a survey to identify cholera carriers. Because several cholera deaths had
occurred recently, virtually everyone present at the time underwent examination. The
proportion of residents in each village who were carriers was computed and compared.
What is the proper classification for this study?
A. Cross-sectional study
B. Case-control study
C. Concurrent cohort study
D. Nonconcurrent cohort study
E. Experimental study
Answer: A. Cross-sectional study. Exposure (type of water supply) and outcome (carrier status) were determined at the same point in time.
 A case-control study is characterized by all of the following except:
A. It is relatively inexpensive compared with most other epidemiologic study designs
B. Patients with the disease (cases) are compared with persons without the disease (controls)
C. Incidence rates may be computed directly
D. Assessment of past exposure may be biased
E. Definition of cases may be difficult
Answer: C. Incidence rates cannot be computed directly in a case-control study, because subjects are selected on disease status rather than followed from a defined population at risk.
Cross-sectional study
 AKA prevalence study
 Both exposure and disease outcome are determined simultaneously
 Cannot establish temporal relationship between the exposure and onset of disease
Ecological study
 Based on aggregate or group data, not on individuals (e.g. cause of death in different countries)
Case-series/Case-report
 Case report = one person
 Case series = more than one
 Evaluates subjects with known exposure with similar treatment OR for exposure and outcome simultaneously
 No hypothesis testing
 Vulnerable to selection bias (select certain patients)
 No control/comparison group = low internal validity
 Vulnerable to Hawthorne effect
Cohort study
 Selection of subjects is based on exposure
 Groups are followed to compare incidence of disease or other health outcomes
 Prospective aka concurrent aka longitudinal cohort study
 Retrospective aka nonconcurrent aka historical cohort study
 Good for evaluating temporal/causal association
 Bad for rare diseases
 Expensive and time-consuming
 Problems with loss-to-follow-up
Case-control study
 Selection of subjects is based on disease or other health outcome
 Groups are evaluated to compare past exposure
 Incident > prevalent cases (survival vs. development)
 Matching
 Group = frequency match
 Individual = each case matched to a control
 Relatively inexpensive and does not require as much time
 Susceptible to recall bias
 Good for rare diseases
 Bad for rare exposures
 Randomized Controlled Trial
 Essentially the Gold Standard
 Unethical in a lot of cases!
 Double-blind
 Placebo-controlled
 Community intervention
Systematic Review
 A research study which aims to provide an exhaustive summary of current literature relevant to a research question.
 Crucial to EBM
Meta-analysis
 A statistical technique used to combine the results of all eligible studies in a systematic review into a single quantitative estimate or summary effect size
Effect sizes measure the strength of the relationship between two variables, thereby providing
information about the magnitude of the intervention effect
Heterogeneity is a value calculated to determine if individual studies are similar enough to compare
(prefer non-significant findings for heterogeneity)
Publication bias is particularly problematic for systematic reviews because not all studies are
published, depending on the significance and direction of effects detected.
[Figure: forest plot. Each square represents the result from an individual study; the horizontal line through it is its confidence interval. The center line marks 1.0 (no association). A separate marker shows the overall result from the meta-analysis.]
 Bias: any systematic error in the design, conduct, or analysis of a study that results in a
mistaken estimate of an exposure’s effect on the risk of disease
 Selection bias
 Error introduced when the study population does not represent the target population
 Can be introduced at any stage of a research study
 Information bias
 Occurs during data collection and can lead to misclassification
 Sampling bias or non-random sampling bias: a selection procedure that yields a non-representative sample, in which a parameter estimate differs from the true value in the target population
 Example is telephone random sampling, which would systematically exclude households without telephones
 Ascertainment bias
 Healthcare access bias
 Survivor treatment selection bias
 Recall bias
 If the presence of disease influences the perception of its causes or the search for exposure to the
putative cause
 Common in case-control studies where participants are aware of their disease status, but can also
occur in cohort studies
 Ecologic fallacy
 When analyses realized in an ecological group analysis are used to make inferences at the
individual level
 Hawthorne effect
 When individuals modify how they react or behave in response to their awareness of being observed
 Confounder: an extraneous variable that correlates (directly or inversely) with both the dependent
variable and the independent variable
 Example: Drinking coffee and pancreatic cancer
 Confounding is not an error in the study but can be considered a true phenomenon that is
identified in a study and must be understood
 One approach is to stratify…
 If the apparent association is due entirely to confounding, then stratifying the data by the confounding variable will give a measure of association equal to 1.0 within each stratum
 If you know of a possible confounder during the design phase of your study, you can match
cases to controls based on the confounding variable
 Internal validity
 The extent to which a study is able to make causal conclusions based on its design and its ability to
reduce systematic error
 Essentially how well you designed your study (confounding = red flag!)
 External validity
 Whether the findings of a study can be generalized to the rest of the population
 Example: hospital cohorts
 Sensitivity and Specificity
 Positive and Negative Predictive Values
 Incidence and Prevalence
 Odds Ratio
 Relative Risk
 Attributable Risk
 Relative Risk Reduction
 Absolute Risk Reduction
 Number Needed to Treat
 Number Needed to Harm
 t-Test
 ANOVA
 Chi-square
 Pearson Correlation Coefficient
 Error types
 Incidence RATE = Number of new cases / Population at risk
 Incidence looks at new cases during a time period
 Prevalence = Number of total existing cases / Population at risk
 Prevalence = incidence x duration of disease
 Chronic disease with long duration has a high prevalence
 Disease with short duration has low prevalence, which approximately equals the incidence of disease
 Smithville has a stable population of 100,000 and 2000 individuals in this community
have been diagnosed with disease X. Although 300 individuals in Smithville die each
year from all causes, 100 of those die from disease X. There are 50 new cases of the
disease each year.
 The annual incidence of this disease is represented by which of the following?
The incidence is represented by the number of new cases of the disease in a given
period divided by the susceptible population. Because the 2000 people with the
disease are no longer susceptible, they must be subtracted from the total population;
thus the incidence is 50/98,000.
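The Smithville arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation described above; the variable names are ours, and the numbers come straight from the question stem.

```python
population = 100_000        # stable population of Smithville
existing_cases = 2_000      # people already diagnosed with disease X
new_cases_per_year = 50     # new diagnoses each year

# People who already have the disease cannot become new cases,
# so they are removed from the denominator.
population_at_risk = population - existing_cases
annual_incidence = new_cases_per_year / population_at_risk

print(f"{new_cases_per_year}/{population_at_risk:,} "
      f"= {annual_incidence * 100_000:.1f} new cases per 100,000 at risk per year")
```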
A research group is studying sickle cell disease in a geographically isolated community of
6000 people. A genetic analysis is performed on every community member. At the beginning
of the year, it is determined that 10% are homozygous for hemoglobin S and therefore have
sickle cell disease, and 30% of the community is heterozygous for the mutant allele. Over the
course of the year, 100 infants are born, six of whom are diagnosed with sickle cell disease. Of
80 people who die during the year, three had sickle cell disease.
Which of the following is the current prevalence of sickle cell disease in this population?
Prevalence is the total number of cases in a population divided by the total population at risk of
the disease. Multiply the initial population (6000) by the initial prevalence (10%), yielding 600
cases. Over the course of the year, there was a net gain of 3 patients with sickle cell disease,
bringing the new total to 603. Likewise, the new population at risk is 6020, a net gain of 20 people.
Therefore, the current prevalence is 603/6020.
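The bookkeeping in this explanation (cases in, cases out, population in, population out) is easy to get wrong under time pressure, so here is a small Python sketch of it. Variable names are illustrative; the figures are those given in the question.

```python
initial_population = 6000
initial_prevalence = 0.10            # 10% homozygous for hemoglobin S

births, births_with_scd = 100, 6
deaths, deaths_with_scd = 80, 3

# Numerator: existing cases plus new cases minus cases lost to death.
initial_cases = round(initial_population * initial_prevalence)
current_cases = initial_cases + births_with_scd - deaths_with_scd

# Denominator: everyone currently in the community.
current_population = initial_population + births - deaths

prevalence = current_cases / current_population
print(f"{current_cases}/{current_population} = {prevalence:.1%}")
```

Note that the 30% heterozygous carriers do not enter the calculation, since they do not have sickle cell disease.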
 Be Sensitive to Positive people
 Sensitivity is how well a test identifies those who have the disease
 Sensitivity = True Positives/(True Positives + False Negatives) OR = 1 – false-negative rate
 SN-N-OUT
 A highly sensitive test Rules Out the disease if it is negative
β-Thalassemia major results from a homozygous genotype that leads to complete absence of both the β-globin
chains. A study subjected 100,000 participants to an intrauterine screening test; 87 tested positive for β-
thalassemia major, and the remaining 99,913 tested negative. In 7 of those 87 cases the results were shown to be
false positive. Ultimately, 100 of those originally screened were found to actually have the disease.
Which of the following is the correct sensitivity of the intrauterine screening test?
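One way to work this question is to rebuild the 2x2 counts from the stem and then apply the sensitivity formula. The Python sketch below does that; it assumes the 100 people who truly have the disease are all among the 100,000 screened, as the stem implies, and the variable names are ours.

```python
test_positive = 87      # positive screening results
false_positive = 7      # positives later shown not to have the disease
with_disease = 100      # screened participants who truly have the disease

true_positive = test_positive - false_positive      # 87 - 7 = 80
false_negative = with_disease - true_positive       # 100 - 80 = 20

sensitivity = true_positive / (true_positive + false_negative)
print(f"Sensitivity = {true_positive}/{with_disease} = {sensitivity:.0%}")
```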
 Positive predictive value (PPV): proportion of positive test results that are truly positive
 If the test result is positive in this patient, what is the probability that this patient truly has
the disease?
 PPV = TP / (TP + FP)
 PPV is directly related to prevalence
 High prevalence means high PPV
Investigators studying cardiovascular disease discover a new serum protein marker that is
correlated with the presence of ruptured atherosclerotic plaques. It is hoped that this serum marker
could be used as a screening test to identify whether a person has had a recent MI. In a phase III
clinical trial of 1400 subjects, the investigators find that of the 500 subjects who had an MI, 400 tested
positive for the serum marker, whereas 850 subjects who did not have an MI tested negative for the
marker.
If this marker were used to screen patients for recent MI, what is the probability that a person will
have had an MI given a positive serum protein analysis?
The question is asking to calculate the positive predictive value of the test, i.e., the
probability that a person with a positive serum marker on the screening test will indeed
have had a recent MI. There are 400 true positives and 900 – 850 = 50 false positives, so
PPV = 400/450 ≈ 89%.
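For readers who prefer to see the full 2x2 table filled in, the following Python sketch derives each cell from the numbers in the stem before computing the PPV. Variable names are illustrative only.

```python
total_subjects = 1400
had_mi = 500
true_positive = 400     # had an MI and tested positive
true_negative = 850     # no MI and tested negative

no_mi = total_subjects - had_mi                 # 900 subjects without MI
false_negative = had_mi - true_positive         # 100 missed MIs
false_positive = no_mi - true_negative          # 50 positives without MI

ppv = true_positive / (true_positive + false_positive)
print(f"PPV = {true_positive}/{true_positive + false_positive} = {ppv:.1%}")
```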
 Specificity is the proportion of people without the disease who test negative
 SP-P-IN
 A highly specific test, when positive, rules in the disease.
 Specificity = True Negatives / (True Negatives + False Positives) OR = 1 – false-positive rate
 Negative predictive value (NPV): proportion of negative test results that are true negatives
 If the test is negative, what is the probability that this patient does not have the disease?
 NPV = True Negatives / All people who tested negative (TN + FN)
 NPV is inversely correlated with prevalence
 High prevalence = Low NPV
 How to determine whether a certain disease is associated with a certain exposure
 To determine whether an association exists, we can use data from case-control and
cohort studies
 Odds ratio: used in case-control studies
 Odds that group with disease (cases) was exposed to a risk factor (a/c) divided by
the odds that group without the disease (controls) was exposed (b/d)
Researchers are investigating the relationship between cell phone use and brain cancer. Of 50
brain cancer patients, 30 admitted to using a cell phone for 10 years or more. Of 400 healthy
participants in the study, 250 were found to have used a cell phone for 10 years or more.
Which of the following is an appropriate conclusion to draw from this study regarding cell phone
use and brain cancer?
The clinical study described is a case-control study. Case-control studies compare those with the
disease (the cases) to those without the disease (the controls). The odds ratio
is then calculated as OR = (odds of exposure in disease group) / (odds of exposure in control group)
= [30/(50-30)] / [250/(400-250)] = 9/10. Because the OR is close to 1, these data do not suggest
that long-term cell phone use increases the odds of brain cancer.
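The odds ratio arithmetic can be reproduced exactly with Python's `fractions` module, which avoids rounding and returns the same 9/10. This is a minimal sketch; the `odds_ratio` helper and its parameter names are our own.

```python
from fractions import Fraction


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = Fraction(exposed_cases, unexposed_cases)
    odds_controls = Fraction(exposed_controls, unexposed_controls)
    return odds_cases / odds_controls


# 50 cases, 30 exposed; 400 controls, 250 exposed (from the question stem).
result = odds_ratio(30, 50 - 30, 250, 400 - 250)
print(result, "=", float(result))
```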
 Relative risk: used in cohort studies
 Risk of developing disease in the exposed group divided by risk in the unexposed
group.
 Attributable risk: defined as the difference in risk between exposed and unexposed groups, or the
proportion of disease occurrences that are attributable to the exposure
 Number needed to treat is defined as the number of patients who need to be
treated for 1 patient to benefit
 Number needed to harm is defined as the number of patients who need to be

exposed to a risk factor for 1 patient to be harmed
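The risk measures on the preceding slides fit together in a few lines of arithmetic. The Python sketch below uses entirely hypothetical cohort and trial numbers (they are not from the lecture) and relies on the standard relationships NNH = 1/attributable risk, NNT = 1/absolute risk reduction, and RRR = ARR divided by the control event rate.

```python
from fractions import Fraction

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
risk_exposed = Fraction(30, 100)
risk_unexposed = Fraction(10, 100)

relative_risk = risk_exposed / risk_unexposed           # risk ratio
attributable_risk = risk_exposed - risk_unexposed       # risk difference
number_needed_to_harm = 1 / attributable_risk

# Hypothetical trial: event rate 20% with placebo, 15% with treatment.
control_event_rate = Fraction(20, 100)
treated_event_rate = Fraction(15, 100)

absolute_risk_reduction = control_event_rate - treated_event_rate
relative_risk_reduction = absolute_risk_reduction / control_event_rate
number_needed_to_treat = 1 / absolute_risk_reduction

print("RR  =", relative_risk)
print("AR  =", attributable_risk)
print("NNH =", number_needed_to_harm)
print("ARR =", absolute_risk_reduction)
print("RRR =", relative_risk_reduction)
print("NNT =", number_needed_to_treat)
```

Exact fractions are used so that the results stay readable; with real data, NNT and NNH are conventionally rounded up to the next whole patient.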
 t-Test checks the difference between the means of 2 groups.
 ANOVA checks the difference between the means of 3 or more groups
 Chi-square checks the difference between 2 or more percentages or proportions of categorical outcomes; used for frequency data rather than for comparison of means.
A physician is studying the effects of drug A and drug B on cognitive performance in
Alzheimer patients. She administers a memory test to two groups of subjects (those
taking drug A and those taking drug B) and compares their mean scores. Which of
the following statistical tests would be most appropriate for this purpose?
A. ANOVA
B. Chi-square test
C. Linear regression analysis
D. t-Test
E. Multiple linear regression
 The t-Test is used to compare two means derived from two samples.
 The Pearson correlation coefficient (r) is always between -1 and +1
 The closer the absolute value of r is to 1, the stronger the linear correlation between the 2
variables.
 Positive r value means a positive correlation
 Negative r value means a negative correlation
 The coefficient of determination (r²) is what is usually reported (e.g. on graphs)
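To make the properties of r concrete, here is a small, dependency-free Python sketch that computes the Pearson correlation coefficient from its definition. The `pearson_r` function and the sample data are hypothetical and serve only to illustrate the sign and range of r.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical data: dose vs. two different responses.
dose = [1, 2, 3, 4, 5]
rising_response = [2.1, 3.9, 6.2, 8.0, 9.8]     # increases with dose
falling_response = [9.5, 8.1, 6.0, 4.2, 1.9]    # decreases with dose

r_up = pearson_r(dose, rising_response)
r_down = pearson_r(dose, falling_response)
print(f"r = {r_up:.3f}, r^2 = {r_up ** 2:.3f}")
print(f"r = {r_down:.3f}, r^2 = {r_down ** 2:.3f}")
```

The first pair gives an r close to +1 and the second an r close to -1, while r² is positive in both cases.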

 Type I (α) errors and Type II (β) errors indicate that you accepted the wrong
hypothesis.
Type I (α) error
• “False-positive” error
• You accepted your hypothesis (alternative hypothesis) rather than the null hypothesis, when the null hypothesis is actually true
• α is the probability of making a type I error; the p-value is judged against a preset α
Type II (β) error
• “False-negative” error
• You fail to reject the null hypothesis when it is actually wrong
• β is the probability of making a type II error
• Power = 1 – β
 A study with greater power has less type II error
 The power is the probability of rejecting the null hypothesis when it is in fact false (This
is what we want to happen)
 Conventionally, a study should have a power of 0.8 (or a β of 0.2) to be accepted.
 Important: Increasing the sample size is the most practical and important way of increasing the
power of a statistical test, i.e., there is power in numbers.
A medical resident decides to test the hypothesis that people with Alzheimer’s have
elevated serum sodium levels. The Type I error of this study was 0.078. What does
this analysis represent for the study?
A. Determines the power of a study to detect a significant change
B. Probability of Type I error is known as β
C. Represents the probability of incorrectly rejecting the null hypothesis
D. Most studies used a probability of error level of 0.10 to determine the significance
E. It is equal to 1- β
 α should be less than 0.05 to be acceptable
 USMLE Step 1 Qbook, Fifth edition
 USMLERx Qbank 2015
 First Aid 2015 edition
 USMLE Step I Secrets
…You should probably start running…
In a city with a population of 1 million, 10,000 individuals have SLE. There are
1,000 new cases of SLE each year and 200 deaths caused by the disease.
There are 2,500 deaths per year from all causes. Assuming no net
emigration or immigration to the city, the incidence of SLE in this city is
given by which of the following expressions?
A. 800/990,000
B. 800/1,000,000
C. 1,000/990,000
D. 1,000/1,000,000
E. 2,500/1,000,000
F. 10,000/1,000,000
Answer: C. 1,000/990,000
Don’t forget to subtract the prevalent cases of SLE! They are not part of the population at risk of becoming new cases.
Researchers are developing a screening test for awesomeness which has a
sensitivity of 95% and a specificity of 90%. If the prevalence of awesomeness is
10%, which of the following is the best estimate for the probability that a
person who tests negative for awesomeness is actually not awesome at all?
A. 45%
B. 50%
C. 85%
D. 90%
E. 95%
F. 99%
Answer: F. 99%
Start with a nice round number (1000 people) and apply the prevalence of 10%:

               Awesome        Not awesome     Total
Test positive  95 (sn=95%)    90              185
Test negative  5              810 (sp=90%)    815
Total          100            900             1000

Negative Predictive Value = TN/(TN+FN) = 810/815 = 99%
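The "start with a nice round number" approach translates directly into code. The sketch below builds the 2x2 counts for a hypothetical cohort of 1000 from the sensitivity, specificity, and prevalence given in the question; the function name and cohort size are our own choices.

```python
def two_by_two(sensitivity, specificity, prevalence, cohort=1000):
    """Return (TP, FP, FN, TN) counts for a cohort of the given size."""
    diseased = cohort * prevalence
    healthy = cohort - diseased
    true_pos = diseased * sensitivity
    false_neg = diseased - true_pos
    true_neg = healthy * specificity
    false_pos = healthy - true_neg
    return true_pos, false_pos, false_neg, true_neg


tp, fp, fn, tn = two_by_two(sensitivity=0.95, specificity=0.90, prevalence=0.10)

npv = tn / (tn + fn)    # probability a negative result is truly negative
ppv = tp / (tp + fp)    # probability a positive result is truly positive
print(f"NPV = {tn:.0f}/{tn + fn:.0f} = {npv:.1%}")
print(f"PPV = {tp:.0f}/{tp + fp:.0f} = {ppv:.1%}")
```

The same table also shows why the PPV is only about half here: at 10% prevalence, the 90 false positives nearly match the 95 true positives.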
A study is conducted to evaluate the average number of pizza slices
consumed by medical students during their first year. Results of 100
students surveyed show an average number of pizza slices of 110 with a
standard deviation of 20. Which of the following is the best estimate for the
95% confidence interval for the mean in this sample?
A. 70 to 130
B. 70 to 150
C. 85 to 115
D. 90 to 130
E. 105 to 115
F. 106 to 114
Answer: F. 106 to 114
CI = sample mean ± Z x (SD/√n), where the Z-score for a 95% CI = 1.96 ≈ 2
= 110 ± 2 (20/√100)
= 110 ± 2 (20/10)
= 110 ± 2 (2)
= 110 ± 4
= (106, 114)
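The confidence interval calculation can be sketched in Python as follows. The helper name is hypothetical; the default Z of 1.96 is the usual value for a 95% interval, and passing `z=2` reproduces the rounded shortcut used on the slide.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """Interval for the mean: mean ± z * (sd / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin


# Pizza question: mean 110 slices, SD 20, n = 100 students.
low, high = confidence_interval(110, 20, 100)
print(f"z = 1.96: ({low:.2f}, {high:.2f})")

low_approx, high_approx = confidence_interval(110, 20, 100, z=2)
print(f"z = 2:    ({low_approx:.0f}, {high_approx:.0f})")
```

Note that the standard error (SD/√n = 2), not the standard deviation (20), sets the width of the interval, which is why the answer is so much narrower than 70 to 150.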
A screening test used to detect cervical cancer has a sensitivity of 96%, a
specificity of 90%, a positive predictive value of 92%, and a negative
predictive value of 95%. A recent study on the impact of Gardasil suggests
that the prevalence of cervical cancer has declined. Given this information,
how will this impact the results of the screening test?
A. Decrease the sensitivity
B. Decrease the specificity
C. Increase the negative predictive value
D. Increase the positive predictive value
E. Increase the sensitivity
F. Increase the specificity
Answer: C. Increase the negative predictive value
A change in prevalence is a change in the population, not the screening exam; therefore you can eliminate answers A, B, E, and F, because sensitivity and specificity pertain to qualities of the TEST and not the population.
If the prevalence of a disease goes down, a larger share of negative results are true negatives and a smaller share of positive results are true positives; thus the NPV increases and the PPV decreases.
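The effect of prevalence on the predictive values can be seen by holding the test's sensitivity and specificity fixed and sweeping the prevalence. The Python sketch below does this with Bayes' rule; the function names and the list of prevalences are hypothetical, while the 96% sensitivity and 90% specificity come from the question.

```python
def ppv(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """P(no disease | negative test) via Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENSITIVITY, SPECIFICITY = 0.96, 0.90

# Sweep from high to low prevalence: the test itself never changes.
for prevalence in (0.50, 0.20, 0.05, 0.01):
    print(f"prevalence {prevalence:>4.0%}: "
          f"PPV {ppv(SENSITIVITY, SPECIFICITY, prevalence):.1%}, "
          f"NPV {npv(SENSITIVITY, SPECIFICITY, prevalence):.1%}")
```

As the prevalence falls, the printed PPV drops and the NPV climbs toward 100%, which matches the reasoning behind answer C.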
