
2007 ANNUAL MEETING

INSTRUCTIONAL COURSE LECTURE HANDOUT

Course Number: 123


Course Title: So Much to Read, So Little Time
Location: San Diego Convention Center
Room 31BC
Date & Start Time: 14-FEB-2007 10:30
INSTRUCTORS WHO CONTRIBUTED TO THIS HANDOUT:
Michael J Goldberg, MD
Paul Tornetta III, MD
Charles Turkelson, PhD
Alan Marc Levine, MD

The material presented at this course has been made available by the AAOS for educational purposes only.

The AAOS disclaims any and all liability for injury, loss or other damages resulting to any individual attending the course and for all claims that may arise from the use of techniques and strategies demonstrated therein. The material is not intended to represent the only methods, procedures or strategies for the situations discussed. Rather, the course is intended to present an array of approaches, views, statements and opinions which the faculty believe will be helpful to course participants.

Some drugs or medical devices demonstrated in Academy educational programs or materials have not been cleared by the FDA or
have been cleared by the FDA for specific uses only. The FDA has stated that it is the responsibility of the physician to determine the
FDA clearance status of each drug or device he or she wishes to use in clinical practice.
So Much to Read, So Little Time
Moderator: Alan M. Levine MD
Faculty: Paul Tornetta III MD
Charles Turkelson PhD
Michael Goldberg MD

ICL Number 123


Wednesday, February 14, 2007
10:30 AM - 12:30 PM

How can Review Articles Provide Relevant Information for
your Practice?

Alan M. Levine MD
Director of the Alvin and Lois Lapidus Cancer Institute
Chairman of the Council on Education AAOS
Editor Emeritus JAAOS

“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of that attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
Herbert Simon, Nobel Laureate Economist, 1971

Why do Orthopaedic Surgeons read the medical literature?


1. To keep current with new clinical developments.
2. To answer a question about a specific patient.
3. To review and reinforce previously learned information (“study for exams”).
4. To keep up with a subspecialty of interest.
R. Smith BMJ 1996

How do we sift through the piles of information discriminating what is correct from
incorrect and determining what is clinically useful?
(subtitle: There are many pitfalls out there!)

Factors affecting what we read in the literature
Publication Bias
	Negative and equivocal results are less likely to be published than positive results
	Negative results are less likely to be published in English than in other languages
	Statistical significance is often treated as proof, encouraging false positive results
	Publication bias can influence the results of systematic reviews and meta-analyses

Why are review articles needed?
	New information on evaluation or treatment
	New techniques with better results
	New laboratory studies or imaging modalities
	New diagnostic modalities available
	New technology

A variety of approaches exist for a given musculoskeletal problem
	Useful to highlight pros and cons of each approach
	Uses published results to allow the reader to differentiate
	Emphasizes how the approach has changed over time and areas in which insufficient data exist

Updating data on a field not taught during residency
	Synthesizes important points in areas newly relevant to Orthopaedic Surgery (i.e. growth factors, gene therapy, molecular biology)
	Allows transfer of data into the Orthopaedic literature

Educate physicians outside one’s area of specialty


Relevance of social issues to treatment
Problem solving in areas outside of one’s specialty (e.g. spine
surgery for those involved in joint reconstruction)
Novel uses of technology

Types of review articles
	Clinical Review Articles (“Updates”)
	Systematic Reviews
	Meta-analyses

Clinical Review Articles (“Updates”)
Advantages
	Summary of the literature, usually done by a respected authority in the field
	Selectively reviews the medical literature in order to discuss a broad or narrow topic
	Reference list supplies an easily accessible group of recent and classic articles for further review
	For a given clinical problem, often several solutions are discussed and the relative value of each assessed qualitatively
	Allows review of the literature in areas where few randomized prospective studies are available
	Often well illustrated and written in a readable style
	Allows rapid review of a number of different topics with a relatively short investment of time
	During the reading of a review article, areas requiring further in-depth reading often become evident

One year from 5 peer-reviewed and 5 high-circulation “throw-away” journals:
	394 reviews: 16 (4.1%) were peer-reviewed systematic reviews, 135 (34.3%) were peer-reviewed non-systematic reviews and 243 (61.7%) were in “throw-aways”.
	The peer-reviewed articles were judged to be less relevant to clinical practice and had fewer illustrations and tables.
	Rochon et al, JAMA 2002

Disadvantages
Unlikely to answer a specific clinical question
Usually don’t contain clinical research that was systematically
gleaned from the literature
Have some bias in the selection of articles included
Qualitatively summarize the literature rather than give relative
value differences between studies
May give recommendations more strongly influenced by authors’
opinion rather than the evidence
Montori et al Clin Orthop 413:43-54, 2003

Differing conclusions on the same topic
	106 reviews on the effect of passive smoking, 1980-95
	37% (39/106) concluded it was not harmful
		29/39 were written by authors with tobacco industry affiliations
		2/67 with conclusions of harmful were written by authors with tobacco affiliations
	Barnes DE et al, JAMA 1998

Suitable topics
Radiologic characteristics, natural history and current treatment
options for metaphyseal fibrous defects
Current and potential uses for gene therapy in Orthopaedic
Surgery

Systematic or evidence-based review
	Comprehensively examines the literature with a specified approach to identify all relevant information and formulate the best approach to diagnosis or treatment

Meta-analysis
	A specific type of systematic review in which quantitative methods are used to re-analyze the literature and answer a focused clinical question with statistical analysis of pooled data

Systematic reviews
What is the place of a systematic review in the hierarchy of evidence?
	Methodologically sound studies may not be able to determine with precision the relationship between interventions and results if the number of participants is small. Different studies may reach contradictory results or not be able to discriminate.
	Sometimes this problem can be resolved by doing very large randomized studies, but that is not always possible. However, collecting and synthesizing the results of a number of studies may allow appropriate conclusions to be drawn.

Elements of Evidence-based Surgical Practice
	Preferences, concerns and expectations of each patient
	Clinical expertise of surgeons (skills, experience and knowledge)
	Best research evidence that is relevant for clinical practice
	Sackett DL, J.R.Soc Med 1995

Role of Systematic Reviews and Meta-analyses in Evidence-based Medicine
	The Orthopaedic Surgeon cannot possibly read and critically analyze all the articles each year in the 18 major Orthopaedic Surgery journals
	Some important advances lie outside our normal reading patterns
	Wide variation in quality and type of literature
	Provides some standardization and evaluation of the quality of the results in the literature

Suitable topics
	Is there evidence whether non-operative or operative treatment is superior for an Achilles tendon rupture in a healthy 35-year-old?
	What is the optimal study for clearing the cervical spine in a polytrauma patient?
	Is there a difference in infection rates between 24 hours and 48 hours of prophylactic antibiotics for total hip arthroplasty patients?

Critical steps in a systematic review and meta-analysis
	Define the question and inclusion criteria for studies
	Conduct the literature search
	Apply the inclusion criteria to select the final studies
	Abstract the data
	Generate the analysis, pooling data only if appropriate

QUOROM Statement
	To ensure the quality of systematic reviews, a set of standards for publication of systematic reviews of randomized prospective studies was proposed, which mandated that certain elements be included in the report so that readers could assess the reproducibility of the data.
	Moher et al Lancet 354:1896-1900, 1999.

Meta-analyses in Orthopaedic Surgery
	Systematic review of all meta-analyses on Orthopaedic topics between 1969 and 1999.
		Only 41 met criteria, and 26/41 were published between 1994 and 1999.
		88% had some methodologic problem limiting the validity of conclusions.
		Studies associated with an epidemiologist or in non-surgical journals tended to be of higher quality. The mean quality scores of the studies were better in thrombosis prevention than in fracture or degenerative topics.
	Bhandari et al JBJS 83A:15-24, 2001
The Cochrane Library
	218 systematic reviews (of 4539 total) on topics related to Orthopaedic Surgery
	Parker MJ, Gurusamy K. Arthroplasties (with and without bone cement) for proximal femoral fractures in adults. Cochrane Database of Systematic Reviews 2006, Issue 3.
		Seventeen randomised and quasi-randomised controlled trials comparing different arthroplasties and their insertion with or without cement, for the treatment of hip fractures, involving 1920 patients were included.
		Authors' conclusions: There is limited evidence that cementing a prosthesis in place may reduce post-operative pain and lead to better mobility. There is insufficient evidence to determine the roles of bipolar prostheses and total hip replacement. Further well-conducted randomised trials are required.

Publication Bias
	Tendency of journal editors to publish positive rather than negative or uncertain results
	There is a consensus, however, that it is important to publish systematic reviews which find no evidence to guide practice.
		Denying uncertainty does not benefit patients
		Encourages larger randomized prospective studies to resolve the issue
		Systematic reviews with dramatic results tend to be methodologically weaker
		Admitting uncertainty helps clarify options and encourages research

User’s Guide for how to use a review article
	from AMA User’s Guide to the Medical Literature: A Manual for Evidence-based Clinical Practice, Chicago, AMA 2002
	Are the results valid?
		Did the review explicitly address a sensible clinical question?
		Was the search for relevant studies detailed and exhaustive?
		Were the primary studies of high methodologic quality?
		Were the assessments of studies reproducible?
	What are the results?
		Were the results similar from study to study?
		What are the overall results of the review?
		How precise were the results?
	How can I apply the results to patient care?
		How do I best interpret the results and apply them to practice?
		Were all clinically important outcomes considered?
		Are the benefits worth the costs and risks?

How to interpret discordant Systematic Reviews
	Differences
		Results can be different
		Conclusions drawn by the authors can be different
	Where do differences arise?
		Population, intervention or outcomes
		Selection criteria for studies
		Methods of data extraction
		Statistical analysis
	Decision steps (reconstructed from the slide’s flowchart):
	1. Do they ask the same question? If not, select the review with the most appropriate question.
	2. Do they use the same clinical trials? If not, compare the search strategies and selection criteria.
	3. Are the trials of the same quality? If not, select the review with the highest quality.
	4. If question, trials and quality match, compare data extraction and its application, and consider other factors.

References:
	Montori VM et al, Methodologic Issues in Systematic Reviews and Meta-analyses, Clinical Orthopaedics and Related Research 413:43-54, 2003.
	Bhandari M et al, Meta-Analyses in Orthopaedic Surgery: A Systematic Review of Their Methodology, JBJS 83A:15-24, 2001.
	Sauerland S, Role of Systematic Reviews and Meta-analyses in Evidence-based Medicine, World J Surg 29:582-87, 2005.
	Parker M et al, Systematic Reviews, Meta-analyses and Methodology, JBJS 83A:1433-4, 2001.

Methods to Facilitate Analysis of Articles in the Orthopaedic
Literature
Paul Tornetta III MD
Professor and Vice Chairman in the Department of Orthopaedic Surgery, Boston
University School of Medicine
Director of Orthopaedic Trauma for the Boston Medical Center.

Error
• All studies have error
• Critical analysis necessary
	♦ Appropriate question
	♦ Appropriate population
	♦ Selection bias
	♦ Technique bias
	♦ Outcomes measure
Standard Evaluation
• Just like an x-ray
• Same method each time
• Systematic approach
• Handouts

Standard Evaluation
• Study design (RCT, series)
• Methodology
	♦ Hypothesis (if there is one!)
	♦ Population
	♦ Intervention
• Outcomes assessed
• Results

Case Series
• Helpful if….
	♦ Same population
	♦ Reproducible intervention
	♦ High percentage f/u
	♦ Outcome measures important
• Arthritis after acetabular ORIF

Example: Tibia Fx
• Reamed vs unreamed nailing
	♦ Union (%)
	♦ Time to union (weeks)
• Null hypothesis: there is no difference in the union rate or time between the groups

The “p” Value
• Probability
• Coin toss
	♦ Heads 50% (p = .5)
	♦ Heads twice 25% (p = .25)
	♦ Heads ten times < 1/1000 (p < .001)

[Slides: histograms of time to union for a population and a sample]
Random Sample / Confidence Limits / Comparison Samples
[Slides: histograms of time to union for random samples, 95% confidence limits, and two samples being compared]

Statistics
• Poorly understood
• “Significant difference”
• Alpha error (type 1)
	♦ p value
	♦ p < 0.05

True Difference
• Study samples different
• P = 0.05
• 95% the difference is real

Alpha Error
• Study samples different
• The difference is not real
• 5%

Alpha Error
• Study samples different
• The difference is not real
• Confidence limits don’t overlap

Alpha Error
• Chance of incorrectly concluding a difference exists
• Sampling error
• Set at p = .05 or 5%

Alpha Error Rates
• 60 Orthopaedic journals
• 37% at risk for type 1 error
	♦ Conclusion that there is a difference when there is not
	♦ Primarily due to multiple evaluations
• 20 endpoints x p = .05 = 1!!

Example…
• Tumia, et al 2003
	♦ Compared brace to cast for Colles fracture
	♦ Reported multiple outcomes, not adjusted for numbers
		• Pain, grip, etc at multiple time points
	♦ Improvement at particular times
	♦ Probably random chance…
	♦ Fishing expedition
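The multiple-comparisons arithmetic behind “20 endpoints x p = .05 = 1” can be made concrete with a small Python sketch (added for illustration; it assumes independent tests, which real, correlated outcomes are not):

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of spuriously 'significant' results when the
    null hypothesis is true for every endpoint tested."""
    return n_tests * alpha

def p_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

print(expected_false_positives(20))       # 1.0 -- the slide's "20 x .05 = 1"
print(p_at_least_one_false_positive(20))  # ~0.64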

No True Difference
• Study samples are the same
• Study finds no difference
• 80%

Beta Error
• Study samples are the same
• The populations are different
• 20%

Beta Error (type 2)
• Converse
• Concluding no difference when one does exist
• Set at β = .2 or 20%

Conclusions of RCT’s
	Truth: Hypothesis True	Truth: Hypothesis False
Difference (Reject Hypothesis)	False Positive, α error	Correct, (1-β)
No Difference (Accept Hypothesis)	Correct, (1-α)	False Negative, β error

Power
• Power (1-β)
	♦ Strength of study
	♦ Desire > 80%
	♦ Determined by
		• Effect size (difference / SD)
		• Type 1 error rate
		• Sample size

Power / Sample size
• Related to “n”
[Slides: overlapping distributions A and B separate as sample size increases]

Power
• Should be built in at the beginning
• Can be evaluated post-hoc

Calculation of Power
• For continuous variables:
	♦ N = {[(Zα + Zβ) σ] / ∆}²
		• N = sample size
		• Zα = 1.96 (two-sided α = .05); Zβ = 0.84 for 80% power
		• ∆ = difference between treatments
		• Standard deviation (σ):
		  σ² = [(Ntreatment − 1)(σtreatment)² + (Ncontrol − 1)(σcontrol)²] / (Ntreatment + Ncontrol − 2)
• For dichotomous variables:
	♦ Zβ = [√n / (√2 σ)] D − Zα
		• σ = √{[PT(1 − PT) + PC(1 − PC)] / 2}
		• PT and PC = proportion of events
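For the continuous case, a minimal Python sketch of a per-group sample-size calculation follows (added for illustration; it uses the standard two-sample normal-approximation formula n = 2·[(Zα + Zβ)σ/∆]², and the σ = 30 days used below is an assumed value, not stated in the handout, chosen because it reproduces the tibia-healing table later in this section):

```python
from math import ceil

def n_per_group(sigma: float, delta: float,
                z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """Patients per group to detect a difference `delta` in a continuous
    outcome with standard deviation `sigma` (two-sided alpha = .05,
    power = .80): n = 2 * ((Za + Zb) * sigma / delta)**2, rounded up."""
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Assumed SD of 30 days; reductions of 30, 15 and 7 days in time to union
print(n_per_group(sigma=30, delta=30))  # 16
print(n_per_group(sigma=30, delta=15))  # 63
print(n_per_group(sigma=30, delta=7))   # 289
```

Note how halving the detectable difference roughly quadruples the required sample size.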

Power
• Very important concept!
• “No statistically significant difference”
• Need to demonstrate power is there!

Studies
• 196 Studies
	♦ 79 Eliminated
	♦ 43 Reported positive result
• 117 Studies underwent power analysis
	♦ “No statistically significant difference”

Outcomes
• Primary endpoints
	♦ Fracture healing
	♦ Functional outcome
• Secondary endpoints
	♦ Complications
	♦ Pain

Results: Power
• Primary outcomes
	♦ Power avg. 25% (2% - 99%)
• Secondary outcomes
	♦ Power avg. 19% (2% - 99%)
• 4 / 117 Mention power!

β Error Rates
Outcome Type	Power (1-β) Average	SD	Range	Total Type II Error Rate (β)
Primary (n=213)	24.65%	27.21%	2.24%-99.99%	90.61%
Secondary (n=127)	19.66%	21.31%	2.24%-99.99%	96.85%

Power
• Related to
	♦ Magnitude of the treatment effect
	♦ Designated type I error rate
	♦ Sample size

Example: Tibia Healing
Time to Healing, Control Group	Time to Healing, Treatment Group	% Reduction in Time to Healing	Number of patients needed per group
150 days	120	20%	16
150 days	135	10%	63
150 days	143	5%	289

Example: DVT
PE Rate, Control Group	PE Rate, Treatment Group	% Reduction in PE Risk	Number of patients needed per group
10%	8%	20%	3213
1%	0.8%	20%	35,001
0.1%	0.08%	20%	352,881
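The DVT column can be approximated with the dichotomous-outcome formula in Python (an illustrative sketch, not from the handout; small differences from the table’s 3213 and 35,001 arise from pooled- vs unpooled-variance variants of the formula):

```python
from math import ceil

def n_per_group_props(p1: float, p2: float,
                      z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """Patients per group to detect event rates p1 vs p2
    (two-sided alpha = .05, power = .80), unpooled-variance form:
    n = (Za + Zb)**2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)**2, rounded up."""
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2)

print(n_per_group_props(0.10, 0.08))    # ~3200 per group
print(n_per_group_props(0.01, 0.008))   # ~35,000 per group
```

The lesson of the table survives the rounding: the rarer the event, the more enormous the trial needed to show the same relative risk reduction.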

Example
• ORIF vs. Nonop calcaneus
• “No difference”
• Since risk associated with ORIF, would not be performed
• Power is 3%
• Conclusion may be flawed
• Difference might exist

Example: planning
• Mortality in elderly trauma patients
	♦ 423 Patients…4 centers
	♦ Early fixation = 11%
	♦ Late fixation = 18%
	♦ To prove it…. >1500
	♦ Can use this to plan future work

Example
• Price et al, 2003
	♦ Compare 2 TKA designs
	♦ Effect size .5 (4 points on scale)
	♦ Alpha = 0.05, Power = .85
	♦ Need 38 patients
	♦ Enrolled 40
	♦ Reached significance in 4 parameters….

Correct Effect Size
• Reaching statistical significance is not all!
• Must ask….does it matter?
• Clinically important
• Treatment difference / SD

Time to Union
• A = 250 ± 60 Days
• B = 260 Days
• 10 / 60 = 0.16
• P = .03
[Slide: overlapping distributions of time to union for A and B]

Statistical Tests…
• Depends on the type of data
	♦ Means (time to healing)
	♦ Proportions (infection rate)
• Depends on hypothesis
	♦ Differences in treatments
	♦ Correlations
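The time-to-union slide is a one-line calculation, sketched here in Python for clarity (added illustration, not original handout content):

```python
def effect_size(mean_a: float, mean_b: float, sd: float) -> float:
    """Standardized effect size: treatment difference / SD."""
    return abs(mean_a - mean_b) / sd

# The slide's numbers: 250 vs 260 days with SD 60 gives 10/60, about 0.17:
# a small effect, even though p = .03 was statistically significant.
print(effect_size(250, 260, 60))
```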

Student’s t-test
• Compares two independent means
	♦ Time to union in reamed vs unreamed tibia
• Assumes independence
• Assumes normal distribution
• P values, type 1, 2 errors….

Student’s t-test
• Paired vs unpaired
• Most unpaired
	♦ Fracture healing
• Price, et al 2003
	♦ Bilateral TKA
	♦ Compared two prostheses
	♦ Paired

Student’s t-test
• Must adjust for small numbers
• Very often left out…
	♦ Fisher’s exact
	♦ Leaves more room for sampling error

χ2 (Chi-square)
• Compares proportions
• Null: proportions equal
	♦ Infection rate in open tibias treated with reamed vs unreamed nails
	♦ Incidence of DVT using Lovenox vs. ASA prophylaxis
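The chi-square comparison of two proportions can be sketched with the standard 2x2 shortcut formula (an added illustration with invented counts; for small cell counts one would switch to Fisher’s exact test, as the slides note):

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 table of counts [[a, b], [c, d]]:
    n * (ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical infections: reamed 5/50 infected, unreamed 12/50 infected
chi2 = chi_square_2x2(5, 45, 12, 38)
print(chi2)  # ~3.47: below the 3.84 cutoff for p < .05 at 1 df
```

So a seemingly large difference (10% vs 24% infection) can still fail to reach significance with 50 patients per arm, which is exactly the power problem discussed earlier.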

χ2 (Chi-square)
[Slide: bar chart comparing two proportions, 55% vs 30%]

Variables
• Continuous
	♦ Time to union
• Discrete
	♦ Infection rate
	♦ Nonunion rate
• Continuous is better!!

Summary
• Define the question
• Define the population
• Define the outcomes
• Understand potential error
• Read the literature carefully and
with skepticism!!

References:

Brighton B, Bhandari M, Tornetta P, Felson T, Hierarchy of Evidence: From Case Reports to Randomized Controlled Trials.

Devereaux PJ, McKee M, Yusuf S, Methodologic Issues in Randomized Controlled Trials of Surgical Interventions.

Hartz A, Marsh JL, Methodologic Issues in Observational Studies.

Bernstein J, McGuire K, Freedman K, Statistical Sampling and Hypothesis Testing in Orthopaedic Research.

Bhandari M, Whang W, Kuo J, Devereaux PJ, Sprague S, Tornetta P, The Risk of False-Positive Results in Orthopaedic Surgical Trials.

Griffin D, Audige L, Common Statistical Methods in Orthopaedic Clinical Studies.

Levels of Evidence
Charles M. Turkelson, Ph.D.

The quality of a clinical study is related to how unbiased (valid) its results are. Bias can
be intentional or unintentional. Well-designed studies are less susceptible to bias than
poorly designed studies. Poorly designed studies do not rule out the possibility that something
other than the intervention of interest may have caused the observed results. Therefore,
one has less confidence in the results of biased studies.

Study quality is typically assessed using a formal system. Using such a system combats
the biases of those performing the quality assessment.

There is no gold standard for the true quality of a trial, so it is difficult to validate any
instrument used to evaluate study quality. (cf. Higgins JPT, Green S, editors. Cochrane
Handbook for Systematic Reviews of Interventions 4.2.5 [updated May 2005].
http://www.cochrane.org/resources/handbook/hbook.htm).

This does NOT mean “Do Nothing.” (After all, one treats sick patients, even if one can’t
make a diagnosis)

There is a large body of literature suggesting that the quality of the medical literature is
poor. Even the quality of randomized controlled trials (RCTs) is suspect (cf. Altman,
JAMA, 2002, 287:2765-2767). This highlights the need to assess quality.

Quality is not “applicability” (also called “generalizability” or “external validity”).


Studies with results that are applicable to a physician enroll patients and use interventions
similar to those he/she uses.

The results of the first RCTs published on a given topic can be more optimistic than those
of later RCTs (Trikalinos et al., J. Clin. Epidemiology, 2004, 57: 1124-1130). This is
even true of highly cited RCTs, such as those published in JAMA, NEJM, or Lancet. The
results of almost 1 in 4 (23%) highly cited RCTs are later shown to be too optimistic
(Ioannidis, JAMA, 2005, 294: 218-228). This early optimism may occur because the
patients enrolled in early RCTs are unlike those who are enrolled in later trials. In
particular, patients enrolled in earlier trials may be sicker than those enrolled in later
trials (Gehr et al., BMC Medical Research Methodology, 2006, 6:25). Such patients may
not be like those who routinely receive the intervention in clinical practice. Therefore, the
applicability of early RCTs is sometimes suspect.

NOTES:

Quality is not “reporting.” Incomplete reporting makes quality evaluation difficult. One
study of the oncology literature (Soares et al., BMJ, 2004, 328: 22-24) found that many
trials that performed an a priori power analysis, used adequate concealment of allocation,
or performed an intent-to-treat analysis did not report that they did so. In rheumatology,
77.4% of articles judged from the published report to have performed “inadequate”
random-sequence generation actually did so adequately. Similarly, 78.1% of articles judged
to have performed “inadequate” concealment of allocation actually did so (Hill et al., J.
Clin. Epidemiology, 2002, 55: 783-786). Similar failures to report positive aspects of
study design have also been found in obstetrics and gynecology articles (Schulz et al.,
BMJ, 1996, 312: 742-744) and in internal medicine (Devereaux et al., J. Clin.
Epidemiology, 2004, 57: 1232-1236).

Specific tools for gauging the quality of reporting are available. The CONSORT checklist
is one such tool for RCTs. The checklist is shown below and is available electronically at
http://www.consort-statement.org/Downloads/download.htm.

Because the CONSORT checklist is for reporting, some of its items do not necessarily
relate to how well a study was designed or conducted. For example, item 1 in the checklist
asks whether the paper’s title and abstract let the reader know that the study was
randomized.
PAPER SECTION and topic	Item	Description	Reported on Page #
TITLE & ABSTRACT	1	How participants were allocated to interventions (e.g.,
"random allocation", "randomized", or "randomly assigned").
INTRODUCTION 2 Scientific background and explanation of rationale.
Background
METHODS 3 Eligibility criteria for participants and the settings and
Participants locations where the data were collected.
Interventions 4 Precise details of the interventions intended for each group
and how and when they were actually administered.
Objectives 5 Specific objectives and hypotheses.
Outcomes 6 Clearly defined primary and secondary outcome measures
and, when applicable, any methods used to enhance the
quality of measurements (e.g., multiple observations, training
of assessors).
Sample size 7 How sample size was determined and, when applicable,
explanation of any interim analyses and stopping rules.
Randomization -- 8 Method used to generate the random allocation sequence,
Sequence including details of any restrictions (e.g., blocking,
generation stratification)
Randomization -- 9 Method used to implement the random allocation sequence
Allocation (e.g., numbered containers or central telephone), clarifying
concealment whether the sequence was concealed until interventions were
assigned.
Randomization -- 10 Who generated the allocation sequence, who enrolled
Implementation participants, and who assigned participants to their groups.

Blinding (masking) 11 Whether or not participants, those administering the
interventions, and those assessing the outcomes were blinded
to group assignment. When relevant, how the success of
blinding was evaluated.
Statistical methods 12 Statistical methods used to compare groups for primary
outcome(s); Methods for additional analyses, such as
subgroup analyses and adjusted analyses.
RESULTS 13 Flow of participants through each stage (a diagram is strongly
recommended). Specifically, for each group report the
Participant flow
numbers of participants randomly assigned, receiving
intended treatment, completing the study protocol, and
analyzed for the primary outcome. Describe protocol
deviations from study as planned, together with reasons.
Recruitment 14 Dates defining the periods of recruitment and follow-up.
Baseline data 15 Baseline demographic and clinical characteristics of each
group.
Numbers analyzed 16 Number of participants (denominator) in each group included
in each analysis and whether the analysis was by "intention-
to-treat". State the results in absolute numbers when feasible
(e.g., 10/20, not 50%).
Outcomes and 17 For each primary and secondary outcome, a summary of
estimation results for each group, and the estimated effect size and its
precision (e.g., 95% confidence interval).
Ancillary analyses 18 Address multiplicity by reporting any other analyses
performed, including subgroup analyses and adjusted
analyses, indicating those pre-specified and those
exploratory.
Adverse events 19 All important adverse events or side effects in each
intervention group.
DISCUSSION 20 Interpretation of the results, taking into account study
Interpretation hypotheses, sources of potential bias or imprecision and the
dangers associated with multiplicity of analyses and
outcomes.
Generalizability 21 Generalizability (external validity) of the trial findings.
Overall evidence 22 General interpretation of the results in the context of current
evidence.

Checklists also exist for the quality of reporting of:

(1) Diagnostic studies (the STARD statement, which is available at http://www.consort-statement.org/stardstatement.htm)

(2) Meta-analyses of RCTs (the QUOROM statement, which is available at http://www.consort-statement.org/QUOROM.pdf)

(3) Meta-analyses of observational studies (the MOOSE statement, which is available at http://www.consort-statement.org/News/news.html#moose)

NOTES:

There are several ways to assess quality. Quality can be assessed using evidence
hierarchies (“levels of evidence”), checklists, or scales. Checklists and scales are similar.
The primary difference is that in a scale, points are assigned to each question.

“Levels of evidence” approaches are common in the world of clinical practice guidelines.
Atkins et al. (BMC Health Services Research, 2004, 4:38) identified 57 organizations that
used this approach.

For the purposes of this talk, a “level of evidence” is defined as characterizing individual
studies. This is to avoid confusion of the term with “grades of recommendation” (or
“grades of evidence”), which applies to a body of literature, and may include
considerations about how to balance the benefits and harms of an intervention.

Presumably, higher levels of evidence are relatively immune to bias, and lower levels are
not. Therefore, one should have more confidence in the results of higher level studies.

“Levels of evidence” may be commonly used because they are relatively simple. In
orthopaedics, there is good agreement among reviewers about which level of evidence
should be assigned to a study. Agreement among reviewers with epidemiology training is
particularly good (Bhandari et al. JBJS, 2005, 86: 1717-1720).

Bridevaux et al. (Int. J. for Quality in Health, 2006, 18:177-182) found moderate (kappa
= 0.41) to almost perfect (kappa = 0.96) agreement among 8 gastroenterologists, 2
surgeons, and 4 primary care physicians (i.e., 14 clinicians) who rated the appropriateness
of 95 indications for screening for colorectal cancer. However, the experts overestimated
the level of evidence addressing 60% of the indications, while underestimating it in only
4%.

There are many different “levels of evidence” systems. The table below shows some of
them (from Upshur, JAMC, 2003, 169:672-673).

Source	Highest Level for a Treatment or Intervention
Scientific Advisory Council of the Osteoporosis Society of Canada	1+ = systematic review or meta-analysis of RCTs; 1 = one RCT with adequate power
Centre for Evidence-Based Medicine	1a = systematic review with homogeneity of RCTs; 1b = individual RCT with narrow confidence interval; 1c = all or none
Scottish Intercollegiate Guidelines Network	1++ = high quality meta-analyses, systematic reviews of RCTs, or RCTs with very low risk of bias; 1+ = well-conducted meta-analyses, systematic reviews of RCTs, or RCTs with low risk of bias; 1– = meta-analyses, systematic reviews of RCTs, or RCTs with high risk of bias
JBJS	High quality randomized trial with statistically significant difference, or no statistically significant difference but narrow confidence intervals, or systematic review of Level I RCTs (and study results were homogeneous)

The existence of so many systems is confusing (GRADE Working Group, BMJ, 2004,
328: 1490-1494).

Level I studies may not be well-reported. Poolman et al. (JBJS, 2006, BMC Medical
Research Methodology, 6:44) have found poor reporting of Level I orthopaedic studies.
This raises questions about whether these studies are really of high quality.

Checklists and scales are also used to assess study quality. Checklists consist of a series
of questions. Assigning points to each question converts a checklist into a scale. Scales
yield numerical quality scores. Twenty scales and 11 checklists for judging RCT quality
were developed between 1995 and 2000 (West S, King V, Carey TS, et al. Systems to
Rate the Strength of Scientific Evidence. Evidence Report/Technology Assessment No.
47 (Prepared by the Research Triangle Institute-University of North Carolina Evidence-
based Practice Center under Contract No. 290-97-0011). AHRQ Publication No. 02-
E016. Rockville, MD: Agency for Healthcare Research and Quality. April 2002).

There are at least 194 scales and checklists for evaluating the quality of non-randomized
studies (Deeks JJ, Dinnes J, D’Amico R, Sowden AJ, Sakarovitch C, Song F, et al., Health
Technol Assess 2003; 7(27), available on-line at
http://www.hta.nhsweb.nhs.uk/fullmono/mon727.pdf).

That there are so many checklists and scales raises the risk that different instruments will
lead to different appraisals of quality.

In general, methodologists recommend against using scales (Greenland, Am. J. Epidemiology, 1994, 140: 300-301; Tritchler, Statist. Med., 1999, 18: 2135-2145). How (or whether) to assign different weights to different questions is unknown (for example, should more weight be given to questions about randomization than to questions about blinding? Should both questions receive the same weight?). One result is that whether a trial is classified as high or low quality depends on the scale used (Brouwers, BMC Medical Research Methodology, 2005, 5:8). Also, the results of meta-analyses change depending on which scale is used to weight studies (Juni et al., JAMA, 1999, 282: 1054-1060).
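A toy calculation makes the Juni et al. point concrete. The effect sizes and quality scores below are invented for illustration; the sketch simply shows that two scales which rank the same studies differently produce different quality-weighted pooled estimates.

```python
# Hypothetical studies: (effect_size, score_on_scale_A, score_on_scale_B).
# Scale A rates the first trial highly; scale B rates the second highly.
studies = [
    (0.10, 5, 1),
    (0.60, 1, 5),
]

def pooled_effect(studies, score_index):
    """Quality-weighted mean effect, weighting each study by its scale score."""
    total_weight = sum(s[score_index] for s in studies)
    return sum(s[0] * s[score_index] for s in studies) / total_weight

print(pooled_effect(studies, 1))  # weighting by scale A: about 0.18
print(pooled_effect(studies, 2))  # weighting by scale B: about 0.52
```

The same two trials yield a pooled effect of roughly 0.18 under one scale and 0.52 under the other, so the meta-analytic conclusion hinges on the choice of quality instrument.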

NOTES:

Key Quality Domains for RCTs Identified by West et al. (available on-line at
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hstat1.table.71551)
Domain: Study Population
• Description of study population
• Specific inclusion and exclusion criteria
• Sample size justification

Domain: Randomization
• Adequate approach to sequence generation
• Adequate concealment method used
• Similarity of groups at baseline

Domain: Blinding
• Double-blinding (e.g., of investigators, caregivers, subjects, assessors, and other key study personnel as appropriate) to treatment allocation

Domain: Interventions
• Intervention(s) clearly detailed for all study groups (e.g., dose, route, timing for drugs, and details sufficient for assessment and reproducibility for other types of interventions)
• Compliance with intervention
• Equal treatment of groups except for intervention

Domain: Outcomes
• Primary and secondary outcome measures specified
• Assessment method standard, valid, and reliable

Domain: Statistical Analysis
• Appropriate analytic techniques that address study withdrawals, loss to follow-up, missing data, and intention to treat
• Power calculation
• Assessment of confounding
• Assessment of heterogeneity, if applicable

Domain: Funding or Sponsorship
• Type and sources of support for study

*Elements appearing in italics are those with an empirical basis. Elements appearing in bold are those considered essential to give a system a full Yes rating for the domain.

Key domains for Observational Studies Identified by West et al. (available on-line at
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hstat1.table.71573)
Domain: Comparability of Subjects
For all observational studies:
• Specific inclusion/exclusion criteria for all groups
• Criteria applied equally to all groups
• Comparability of groups at baseline with regard to disease status and prognostic factors
• Study groups comparable to non-participants with regard to confounding factors
• Use of concurrent controls
• Comparability of follow-up among groups at each assessment
Additional criteria for case-control studies:
• Explicit case definition
• Case ascertainment not influenced by exposure status
• Controls similar to cases except without condition of interest and with equal opportunity for exposure

Domain: Exposure or Intervention
• Clear definition of exposure
• Measurement method standard, valid and reliable
• Exposure measured equally in all study groups

Domain: Outcome Measurement
• Primary/secondary outcomes clearly defined
• Outcomes assessed blind to exposure or intervention status
• Method of outcome assessment standard, valid and reliable
• Length of follow-up adequate for question

Domain: Statistical Analysis
• Statistical tests appropriate
• Multiple comparisons taken into consideration
• Modeling and multivariate techniques appropriate
• Power calculation provided
• Assessment of confounding
• Dose-response assessment, if appropriate

Domain: Funding or Sponsorship
• Type and sources of support for study

*Elements appearing in italics are those with an empirical basis. Elements appearing in bold are those considered essential to give a system a Yes rating for the domain. For some domains, a Yes rating required that a majority of elements be met.

A different scheme is used by the Cochrane Collaboration. Cochrane identifies four key
sources of bias, and suggests ways they should be controlled in two different study
designs.
Source of Bias      Cohort Studies              Case-Control Studies
Selection Bias      Control for confounders     Matching
Performance Bias    Measurement of exposure     Measurement of exposure
Attrition Bias      Completeness of follow-up   Completeness of follow-up
Detection Bias      Blinding                    Case definition

To evaluate the results of a checklist, separate the questions in it according to the domain
that each addresses. If several explicit criteria are used to assess validity, it is desirable to
summarize these so as to derive an overall assessment of how valid the results of each
study are. A simple approach to doing this is to use three categories such as the
following:

Risk of bias: interpretation (relationship to individual criteria)
A. Low risk of bias: plausible bias unlikely to seriously alter the results (all of the criteria met)
B. Moderate risk of bias: plausible bias that raises some doubt about the results (one or more criteria partly met)
C. High risk of bias: plausible bias that seriously weakens confidence in the results (one or more criteria not met)
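The three-category summary reduces to a simple rule: all criteria met gives A; any criterion not met gives C; otherwise B. A minimal sketch, with hypothetical criterion names:

```python
def risk_of_bias(criteria):
    """Summarize per-criterion judgments ("met", "partly met", "not met")
    into the Cochrane-style A/B/C risk-of-bias category."""
    statuses = set(criteria.values())
    if statuses == {"met"}:
        return "A. Low risk of bias"        # all of the criteria met
    if "not met" in statuses:
        return "C. High risk of bias"       # one or more criteria not met
    return "B. Moderate risk of bias"       # one or more criteria partly met

# Hypothetical assessment of a single trial:
trial = {
    "allocation_concealment": "met",
    "blinded_outcome_assessment": "partly met",
    "complete_follow_up": "met",
}
print(risk_of_bias(trial))  # prints "B. Moderate risk of bias"
```

The ordering of the checks matters: a study with any unmet criterion falls into category C even if other criteria are only partly met.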

The relationships suggested above will most likely be appropriate if only a few
assessment criteria are used and if all the criteria address only substantive, important
threats to the validity of study results. In general and when possible, authors should
obtain further information from the authors of a report when it is unclear whether a
criterion was met (From Higgins JPT, Green S, editors. Cochrane Handbook for
Systematic Reviews of Interventions 4.2.5 [updated May 2005].
http://www.cochrane.org/resources/handbook/hbook.htm).

Evidence hierarchies, checklists, and scales are often used to rate the quality of a study.
Yet, studies report multiple outcomes. Often, a study is of high quality only for the
primary outcome. Statistical power considerations, incomplete information on some
outcomes, and a host of other difficulties can make a study well-designed for one
outcome but not another.

When one has multiple studies at hand, a whole new set of considerations arises. Here, one must consider not only the quality of the studies, but also the quantity of data (in terms of number of studies and number of patients), its consistency, and the magnitude of the observed effect.

There are many systems for grading a body of evidence, and they can yield different results. Ferreira et al. (J. Clin. Epidemiology, 2002, 55: 1126-1129) compared four different systems and found that they disagreed almost half the time. In the case of low back pain, grades ranged from “moderate evidence” in favor of an intervention to “no evidence.” Systems disagree because there is no consensus on the amount and quality of evidence required for a particular grade.

Conclusions:
1. There is no universally accepted way to evaluate study quality.
2. Decisions about which method to use often involve choosing an approach commensurate with the methodological expertise of those who are evaluating the evidence.
3. The amount of time needed to complete a checklist or scale is also a factor in deciding which approach to use.
4. We probably don’t need another scale or checklist. Use something “off the shelf.”
5. Be transparent and seek reproducibility.

EVIDENCE BASED PRACTICE
Michael J. Goldberg, M.D.
Children’s Hospital & Regional Medical Center

Slide 1
Are we helping?
How do we know?
And, who else should know?

Slide 2
How do we know we are providing the best care?
The best care is:
Safe
Effective
Timely

Slide 3
Outcomes
The results of patient care from the perspective of the patient, of the doctor and of the system.

Slide 4
Types of Outcomes
Technical or physiological
Functional health status
Patient satisfaction
Resource utilization

Slide 5
Technical or Physiologic Outcomes
Joint motion
Blood pressure
Laboratory tests
Scoliosis degrees

Slide 6
Functional Health Status
Tasks
Roles
Jobs
Quality of life

Slide 7
Patient Satisfaction
Processes of care
Consequences of care

Slide 8
Resource Utilization
Utilization of dollars
Utilization of services

Slide 9
Measurement Tools
Technical: x-rays, goniometers
Functional: patient/proxy questionnaires
Satisfaction: questionnaires
Resource utilization: administrative data

Slide 10
A word of caution!
When measuring functional performance, the measurement tool can drive the result.

Slide 11
Assessing the results of hip replacement: a comparison of five different rating systems.
Callaghan JJ, Dysart SH, Savory CF. JBJS, 72B:1008-1009, 1990

Slide 12
[Bar chart: percentage of hips rated Excellent, Good, Fair, or Poor (0-100%) under each of the five rating systems: HSS, Mayo, Iowa, Harris, MD]

Slide 13
Questionnaires constructed by or administered by those who have a vested interest in the success of the treatment may be suspect.

Slide 14
Validated Questionnaires
Valid (specificity): measures what you want it to measure
Reproducible (reliable): when the patient is stable, it measures the same result on different occasions
Responsive (sensitive): when the patient changes clinically, the questionnaire detects it

Slide 15
Are outcomes the best way to measure quality of care?
Do not address wide geographic variations in treatment.
Determined by co-morbid conditions that cannot be influenced by doctors.
Not always linked directly to what the doctor did.
Have a long timeline and are expensive to measure.

Slide 16
Performance Measures
Immunization rates
Breast cancer screening
Eye exams in diabetics
Adherence to evidence-based performance measures derived from evidence-based guidelines.

Slide 17
Evidence Based Practice and Orthopaedic Based Practice
Can they co-exist?

Slide 18
Evidence Based Practice
The integration of the best research evidence with clinical expertise and patient values.

Slide 19
Orthopaedic Reality
Few randomized trials
Overall poor quality of literature
Bias in technology reporting
Limited skills reading research methodology
Technical skills drive the concept of clinical expertise
Incentives to use newest technology
Surgical skills do differ

Slide 20
In America, the federal government and industry, the two largest purchasers of health care services, want physician performance measures now!
Structure measures: telephone access
Process measures: “sign-your-site”
Measures related to exactly what the doctor is doing while treating a patient’s disease

Slide 21
Physician Performance Measurement: A clash of competing interests
1. The imperative to develop physician performance measures may overlook the fact that the evidence may not be there.
2. Collected data used for quality improvement may also be used for accountability and public reporting.

Slide 22
Physician Performance Measurement: A clash of competing interests
1. The imperative to develop physician performance measures may overlook the fact that the evidence may not be there.
2. Collected data used for quality improvement may also be used for accountability and public reporting.

Slide 23
[Diagram: a cycle linking Evidence Analysis, Evidence Based Guidelines, Performance Measures, Outcomes, and Education and Policy]

Slide 24
Physician Performance Measures Must:
Measure processes that are under the control of the doctor
Control for co-morbidities and severity of illness
Be aware of the micro-environment in which physicians work
Use different measures for specialists and for primary care doctors
Have proper patient inclusion and exclusion criteria
Use proper statistical analysis
Use data appropriately

Slide 25
Physician Performance Measurement: A clash of competing interests
1. The imperative to develop physician performance measures may overlook the fact that the evidence may not be there.
2. Collected data used for quality improvement may also be used for accountability and public reporting.

Slide 26
[Diagram: Data feeds both Quality Improvement and Accountability]

Slide 27
Accountability is better called Judgment
Judgment for:
• Patients to choose a doctor
• Boards to maintain certification
• Payors to pay for results

Slide 28
In America, the House of Medicine is in a tizzy
(Def: a highly excited and distracted state of mind)
1. Our pocketbooks and purses
2. Our autonomy
3. The quality of medicine
4. Should we collaborate or resist?

Slide 29
Some nagging thoughts:
1. We do not always practice what we know.
2. There are recognizable gaps in quality.
3. Voluntary data reporting systems have, for the most part, failed.

Slide 30
And a well-known reality:
Physicians will change practice if it is linked to getting paid or becoming credentialed.
Thus complex P4P schemes are on the horizon.

Slide 31
For example:
1. CMS will link correction of the flawed reimbursement formula to P4P.
2. They will use strategies that address both payment and credentialing.

Slide 32
Payment for Office-Based MRI
1. Follow imaging guidelines.
2. Use certified equipment.
3. Credential both doctors and office technicians.

Slide 33
In conclusion:
1. Value evidence, even if you do not like the result.
2. Support data collection for both quality improvement and public reporting.
3. Participate and lead, rather than resist.