
EDUCATION

Joachim P Sturmberg
MBBS, MFM, PhD, FRACGP, is Associate Professor of General Practice, Monash University, Victoria and University of Newcastle, New South Wales. jp.sturmberg@gmail.com

Elizabeth A Farmer
BSc, MBBS, PhD, FRACGP, is Dean, Graduate School of Medicine, University of Wollongong, New South Wales.

Assessing general practice knowledge base
The applied knowledge test

A multiple choice based knowledge test has been part of the examination for Fellowship of The Royal Australian College of General Practitioners (RACGP) since its inception. In line with current best practice, the format changed to a clinically based applied knowledge test in 2000. The test consists of two question formats – single best answer questions and extended matching questions. This article describes the features and characteristics of the RACGP applied knowledge test; the design and management of the question bank; the setting of the pass mark; and the performance of the test in terms of validity, reliability and candidate acceptability.

The Royal Australian College of General Practitioners (RACGP) Fellowship examination assesses competence for unsupervised clinical practice anywhere in Australia through three segments, each with a unique focus. The applied knowledge test (AKT) is a written examination that tests candidates’ applied clinical knowledge. Other segments assess clinical problem solving skills1 and ability to perform in a clinical situation. Approximately 400–500 candidates sit each administration of the RACGP examination, which is held twice yearly throughout Australia.

How applied knowledge questions are designed


The familiar multiple choice question (MCQ) format of knowledge
testing has been part of the examination since its inception in 1967.
Reviews of the examination and candidate feedback consistently
indicated that the MCQ was not seen as a good test of general practice
knowledge. In 1999, as part of a re-design of the examination, the
written testing components were modified to reflect actual clinical
practice more effectively – the ‘applied knowledge test’. Since then, all
questions have been based on a clinical scenario. This requires candidates to
identify the most correct diagnosis or management strategy from a list
of options. Candidates demonstrate not only their knowledge, but also
their ability to apply that knowledge to the clinical situation described.
Scenarios cover general practice encounters in line with BEACH data.2

Question formats
The AKT consists of two item types: 70 single best answer (SBA)
questions and 80 extended matching questions (EMQ). Single best
answer questions require the candidate to select the one correct
answer from five response options (Table 1); EMQ items require
candidates to select the most likely or best answer from a list of up
to 26 choices. Frequently, several EMQ items are presented together
with the same menu list and the same medical theme (eg. diagnosing

Reprinted from Australian Family Physician Vol. 37, No. 8, August 2008 659

a respiratory condition). As in real practice, some scenarios would allow a number of possible responses (eg. a number of differential diagnoses, or management strategies). The single correct response is that which describes the most typical or likely condition, or the most appropriate management strategy.
To assist candidates to familiarise themselves with these two question formats, the RACGP offers online learning resources (www.gplearning.com.au) and pre-exam training workshops run by state faculties.

Table 1. Examples of question formats used in the AKT

Extended matching questions
A. Coeliac disease
B. Depression
C. Diabetes mellitus
D. Duodenal ulcer
E. Eating disorder
F. HIV/AIDS
G. Hyperthyroidism
H. Inflammatory bowel disease
I. Lymphoma
J. Malaria
K. Oesophageal stricture
L. Substance abuse
M. Tuberculosis
For the following patient with weight loss, select the MOST likely diagnosis from the options provided.
41 year old Kit Fung is a part time teacher’s aide who has been underweight since her last child was born 3 years ago. She now complains of having lost another 3 kg in the past 6 months. She has had two previous miscarriages and has two children. She has had episodic bloating diarrhoea for a year but it has been worse lately. There are no other changes in her bowel habits, and she doesn’t drink any alcohol.
Answer: A. Coeliac disease

Single best answer questions
8 year old Miranda Kelly presents with upper sternal pain. You note that she has a pyrexia of 40°C and swelling of the interphalangeal joints. She had streptococcal pharyngitis 2 weeks before this episode started. The MOST likely diagnosis is:
A. toxic synovitis
B. rheumatoid arthritis
C. rheumatic fever
D. septic arthritis
E. osteoarthritis
Answer: C. rheumatic fever

Constructing questions
Most scenarios are written by examiners who are general practitioners and RACGP Fellows. Writers come from all ages, stages of career and practice locations, and are recruited from examination panels in each Australian state. The writers, who work in groups, are asked to create scenarios and related questions that reflect general practice challenges in a range of settings including metropolitan, rural and remote, and that relate to patients from various ethnic backgrounds and age groups. Some scenarios are accompanied by images such as electrocardiographs, radiographs or clinical photographs. Questions are pilot tested among the writing groups. Items are reviewed by the national coordinator for the AKT segment before being entered into the item bank of the examination management system.

Selecting, reviewing and finalising the AKT
For each administration, the examination management system selects items from the item bank according to a number of criteria. The paper thus generated is checked for duplication of topics and conditions, and if necessary, suitable replacement questions are inserted manually. Experienced examiners then review the draft paper for content relevance and clarity of wording. The final paper is then prepared for printing and a master scoring sheet forwarded to the data manager for scoring and analysis of candidates’ responses.

Maintaining the item bank
The item bank serves several interdependent functions. As well as storing items, it holds information for each item about the presenting complaint, domains of general practice represented according to the RACGP curriculum,3 and age and gender characteristics of each ‘patient’. Finally, it also records the usage history of each item and information about its performance, such as item total correlation and item analyses.
These data form an important part of assessing the validity of the AKT component. The item total correlation measures the consistency with which all items of a paper assess the knowledge base of general practice.

Blueprinting
Blueprinting ensures the content representativeness of each exam segment as well as the examination as a whole, and is drawn from the BEACH survey.2 Questions for the AKT paper are selected according to this blueprint and are further stratified by age groups to ensure a similar representation of paediatric, adult and geriatric cases in each paper.

Marking
Candidate answer sheets are computer scanned and automatically scored against the answer key provided to the statistician. Candidates’ aggregate and segment scores are provided for their feedback.

Validity and reliability of the AKT segment
A number of strategies are used to support the validity of the AKT segment. Validity is the primary concern when evaluating any test. According to the standards for educational and psychological


testing, validity ‘... refers to the appropriateness, meaningfulness and usefulness of the specific inferences made from test scores. Test validation is the process of accumulating evidence to support such inferences’.4 Validity is a complex construct and four types are frequently considered in assessment:
• face validity estimates how well a test appears to measure a certain criterion; it does not guarantee that the test actually measures that criterion
• content validity estimates how far a test samples from or represents all of the relevant content
• concurrent validity estimates the degree to which a test correlates with a criterion measure given at the same time
• predictive validity estimates the degree to which a score on the test predicts scores on another criterion measure available in the future.
Face validity of the AKT is supported by the use of writers and reviewers who are GPs from a wide range of Australian general practice backgrounds. Content validity is achieved by selecting items according to a blueprint, which represents the profile of complaints presenting to Australian general practice.2 Systematic review of relevant evidence ensures the validity of the answer key. However, formal testing of concurrent or predictive validity has not been done.
The AKT is characterised by a high level of internal consistency. The internal consistency of a test is a measure of the coherency of the test. Cronbach’s α is commonly used to index internal consistency and ranges from 0 to 1; values above 0.8 are commonly accepted as a gold standard for high stakes examinations.5 The reliabilities obtained over the past 15 administrations of the examination from 1999.2 to 2006.2 have consistently met and surpassed that criterion (Table 2).

Table 2. Reliabilities of the AKT examination segment (1999.2–2006.2)

Time of administration    Cronbach’s α
1999.2                    0.82
2000.1                    0.82
2000.2                    0.83
2001.1                    0.85
2001.2                    0.87
2002.1                    0.90
2002.2                    0.88
2003.1                    0.88
2003.2                    0.86
2004.1                    0.86
2004.2                    0.88
2005.1                    0.82
2005.2                    0.82
2006.1                    0.84
2006.2                    0.80
.1 = first examination, .2 = second examination

In addition, AKT scores typically correlate with those of the two other examination segments in the range of 0.55–0.65. This figure indicates that the AKT has some variance in common with the other segments. Candidate acceptance of a test format is also an important aspect of its performance as well as providing support for its validity.

Standard setting
Standard setting for each of the three RACGP examination segments was introduced in 1999. A modified Angoff method6 has been applied throughout that period to the AKT. As part of the ongoing review of all aspects of the RACGP examination, AKT standard setting is currently the focus of attention in an attempt to improve its consistency. In general terms, each AKT paper is standard set by a group of 20 experienced GP judges. They review each AKT item, determining a required level of performance and taking into account the characteristics of each item. Judgments are combined over items and judges to create a pass mark.

Conclusion
The AKT was introduced as part of the RACGP’s commitment to maintaining relevance and quality of knowledge testing in the new RACGP examination. The available evidence supports the validity of the AKT segment. Analysis of its performance over the past 7 years has demonstrated that its reliability remains at a level that satisfies international expectations for a high stakes assessment.
Maintaining content validity, blueprinting and standard setting require ongoing attention to ensure ongoing performance of the AKT segment at international standards. Newer issues involve further development of the standard setting processes and the testing of their validity and reliability. Testing of concurrent or predictive validity would enhance knowledge about the value of the AKT in high stakes general practice testing.

Conflict of interest: none declared.

References
1. Farmer E, Hinchy J. Assessing general practice clinical decision making skills. Aust Fam Physician 2005;34:1059–61.
2. Britt H, Miller G, Knox S, et al. General practice activity in Australia 2004–05. Canberra: Australian Institute of Health and Welfare (General Practice Series No 18); December 2005.
3. The RACGP. Training Program Curriculum. South Melbourne: The RACGP, 1999.
4. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for educational and psychological testing. Washington: American Psychological Association, 1999.
5. Cronbach L. Coefficient alpha and the internal structure of tests. Psychometrika 1951;16:297–334.
6. Angoff W. Scales, norms, and equivalent scores. Educational measurement. Washington, DC: American Council on Education, 1971;508–600.

correspondence afp@racgp.org.au
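
As a rough illustration of the reliability statistics discussed under ‘Validity and reliability of the AKT segment’ and ‘Maintaining the item bank’, the following Python sketch computes Cronbach’s α and an item total correlation from a candidates-by-items matrix of 0/1 scores. It is not the RACGP’s analysis code. The function names, the tiny data set and the use of the ‘corrected’ form of the correlation (each item against the total of the remaining items) are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a candidates x items matrix of item scores."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across candidates.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the candidates' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary between candidates")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def item_rest_correlation(scores, item):
    """Pearson correlation between one item and the total of all other items."""
    x = [row[item] for row in scores]
    y = [sum(row) - row[item] for row in scores]
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a score does not vary")
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical data: 5 candidates (rows) answering 4 items (columns); 1 = correct.
demo_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(demo_scores)
item_correlations = [item_rest_correlation(demo_scores, i) for i in range(4)]
print(f"Cronbach's alpha: {alpha:.2f}")
for number, r in enumerate(item_correlations, start=1):
    print(f"item {number}: item-rest correlation {r:.2f}")
```

With real examination data the matrix would hold one row per candidate and one column per AKT item (150 items in the format described in this article), and an α above 0.8 would meet the criterion cited above.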
