Writing Good Exam Questions

A Self-study Workbook
Written by Dr Kate Exley
FOR TRAINING PURPOSES ONLY
Produced by the Staff and Educational Development Unit, March 2010
(minor revisions made August 2012)

Contents

List of Figures and List of Tables
1. Introduction and Purposes of the workbook
   Intended Learning Outcomes
2. LSHTM Exam requirements
3. The underlying principles of good question design
4. Aligning exam questions and specimen answers with intended learning outcomes
5. What kinds of knowledge and skills can be tested in examinations?
6. Reducing the impact of factors such as stress, interpretation, time
7. Different styles and formats for exam questions
8. Evaluating Draft Questions
9. Marking Approaches: Using assessment criteria and marking schemes
10. Ways of producing accurate and clear marking guidance for questions
11. Ways of producing specimen answers suitable for distribution to students
12. Exam question development, validation and approval processes
13. Security issues and appeals
14. Providing support and guidance for students
15. Concluding Remarks
Further Reading suggestions
Appendices
   Appendix 1
   Appendix 2
   Appendix 3


List of Figures

Figure 1   Diagram to illustrate the principles of Constructive Alignment in module design
Figure 2   Bloom's Taxonomy of Cognition Revisited by Anderson & Krathwohl (2001)
Figure 3   A Normal Distribution or bell-shaped curve

List of Tables

Table 1   A table of suggested verbs mapped against the Anderson and Krathwohl
          adapted levels of Bloom's Taxonomy of Cognition
Table 2   Ways in which intellectual skills can be tested through different question stems
Table 3   Some Common Essay Style Questions used in Exams
Table 4   LSHTM Marking Gradepoints descriptions (Overarching criteria)
Table 5   Examples of grade conversions used at the School


1. Introduction and Purposes of the workbook

1.1 The intended reader

This workbook is intended to support colleagues at The London School of Hygiene
and Tropical Medicine as they seek to write appropriate Masters level examination
questions and their accompanying assessment criteria and marking guidance. The
booklet aims to provide clear guidance on what is expected and, through the use of
examples and exercises, to enable colleagues to test out their own exam question
writing.
It is primarily intended for those who are new to writing examination questions,
although more experienced colleagues may find it useful as a reference or updating
source. The workbook can be used in a number of ways. For those unable to attend
the 'Writing Better Exam Questions' staff development workshop, it can act as a
distance learning resource that can be worked through systematically, or it can be
quickly consulted to check and review current practice.
For those using the workbook in the distance learning mode, the anticipated learning
outcomes are:

- To be familiar with the structure and format of LSHTM examinations
- To be able to apply the principles of constructive alignment
- To critique the different kinds of knowledge and skills that can be assessed
  through the examination format
- To consider how to ensure that questions are fair and equally accessible to all
  students (for example, the layout and design of the question on the page)
- To prepare appropriate marking guidance
- To be familiar with question approval and validation processes in the School

1.2 The underlying principle

John Biggs's theory of Constructive Alignment is used to provide the underpinning
framework for question design here. The different kinds of knowledge and skills that
can be tested appropriately through a written examination method are discussed, as
is the need for questions to give equal and fair opportunity to all students. As many
students at The School are from overseas and may not have English as a first
language, there are a number of factors to consider when writing clear and
unambiguous questions that do not unintentionally favour particular student groups.
A range of different question formats is reviewed and critiqued, and a number of
ways of quality assuring draft questions are suggested and explained. However,
writing the exam question is only half the story; producing the associated marking
guidance is also considered here. Marking guidance can take a number of different
forms, ranging from specimen or model answers through to descriptive criteria and
detailed marking schemes matched to necessary answer content. These too will be
discussed with reference to examples from The School.

Aims of assessment at LSHTM


For all LSHTM courses, the overall aim of assessment is to
facilitate the learning of important elements in the course
and to test that the student has reached the minimum standard
acceptable for the award.
LSHTM Assessment Code of Practice
(January 2012)


2. LSHTM Exam requirements

London-based MSc Course June exams:


There are two three-hour written examination papers taken in June. Together these
two papers contribute 30% to the final assessment (15% each).
Paper 1 examines the content of Term 1 teaching. It usually comprises questions
from each of the core/linear modules taken in Term 1. Thus the same questions for a
particular module will appear on several MSc Course exam Paper 1s. Design and
marking of these questions is co-ordinated by the teaching module organiser
together with other teaching module staff.
Paper 2 tests candidates' ability to integrate the knowledge and skills acquired during
the whole of the MSc course. Paper 2 was originally developed in the mid-1990s
after full implementation of the present teaching module structure. As a whole, Paper
2 should be examining the key knowledge/skills which a candidate graduating with
an MSc in 'X' should have. In devising Paper 2, MSc Exam Boards should reflect on
the intended learning outcomes for the MSc, some of which are likely to require
assessment in this exam (others might have been assessed in compulsory study
modules, the project, etc.). MSc intended learning outcomes can be found in the MSc
Course Handbook, prospectus, etc. Questions should require integration of
knowledge/skills acquired in different parts of the MSc course; they might use
material from compulsory modules but not optional ones that only some of the class
might take.

Distance learning PGDip/MSc June exams:


Most Distance Learning (DL) modules have a 2-hour exam covering the content of
that module, and this contributes 70-100% of the module's mark. MSc EPI and CT
also have a 3-hour integrating paper (E400), akin to Paper 2 above, which
candidates sit in their final year of the course. Exam questions are usually
co-ordinated by the Module organiser or other designated members of staff.
To provide guidance and formative support on the examination process, DL exams
are compiled each year into two 'examiners' reports' for each MSc course - one for
core modules and one for advanced modules. The reports give the complete exam
papers together with a guide on how the questions should have been answered. DL
students are sent reports from the 3 previous years.
All module exams taken will count towards the degree, save only where a student
has been assessed on more modules than are required, in which instance the
Exam Board will determine whether an award may be given, and which modules are
counted towards it.


The School's Assessment Code of Practice describes six assessment objectives that
should be kept in mind when writing examination questions and designing
assessments.
These are:
Objectives of assessment at LSHTM

- Identify whether each student has attained a minimum level of achievement
  necessary to pass the course and identify those who fail to achieve that level.
  Note - Intended learning outcomes should set out the minimum standard of
  learning required for the award, and assessment should be designed to provide
  students with the opportunity to demonstrate that they have achieved or
  exceeded that standard (refer to Section 4 below).

- Focus learning on the important aspects of each course.
  Note - In attempting to increase the difficulty of assessment tasks set, it is
  important to aim to assess more deeply rather than more widely, i.e. to avoid
  focussing on peripheral and less consequential details (refer to Sections 3
  and 9 below).

- Provide feedback on performance so that learning may improve.
  Note - The goal is that students are able to learn through the process of being
  assessed and view assessment as part of their learning process. This requires
  that any feedback provided is designed to 'feed forward' (refer to Section 12
  below).

- Provide a means of encouragement.
  Note - It is important to remember that nothing is more motivating than
  success, and students need to be able to see the progress they are making and
  build their confidence as achievers (refer to Section 11 below).

- Interfere as little as possible with important, but ungraded, aspects of a
  student's educational experience.
  Note - We define learning outcomes, but individuals will learn other things
  whilst studying, and unintended outcomes can be equally valuable (refer to
  Section 5 below).

- Identify those students achieving the highest standards so that they can be
  considered for a Distinction.
  Note - Exam questions, assessment criteria and marking guidance should be
  designed to allow differentiation and enable good students to get high marks
  (refer to Section 10 below).


3. The underlying principles of good question design

The goal - Test items should be really difficult for people who don't understand the
subject material, but they should be straightforward for those who do. If an item is
difficult because of complicated wording (e.g., double negatives) or vocabulary, you
will end up testing language skills rather than ability in the discipline.

The principles that underlie good question design are:

i. Clarity
ii. Reliability
iii. Validity
iv. Authenticity
v. Fairness

Thinking about each in turn:

i) Clarity

"Nothing in the content or structure of [a test] item should prevent an informed
student from responding correctly."
Gronlund (1998)

The clarity of an exam question may be compromised by unclear test instructions,
confusing and ambiguous terminology, overly verbose and complicated vocabulary
and/or sentence structure, and unnecessary and distracting detail (Gay & Airasian,
2000).
The layout of a question is also very important in conveying clarity, particularly in
longer, multi-sectioned or data-handling style questions.


Note - some dyslexic students have a tendency to mis-read, or miss completely, a
short second line of text, an additional comment or the second part of a question,
e.g.
What will be the outcome of adding further sodium chloride at this point?
        Explain your answer.
In an interview, a dyslexic student spoke of this second line being 'hidden away',
and he had developed ways of re-reading questions to try to avoid this happening to
him. However, the layout of a question may add to this problem, e.g. the indent here
on the second line may make it more hidden.

EXERCISE
Testing for clarity - contrast the following versions of the same exam question
(essay format answer required):
Version A:
Public health policy in the United Kingdom underwent a number of significant
changes during the Twentieth Century that can be directly attributed to the needs
and exigencies brought about by international conflict. Some of the changes and
developments that resulted to health systems and service delivery are still with us
today and it is important that we understand the background of circumstances that
influenced the decisions that were made. Provide a short analysis charting what you
consider to be the main transitions in public health policy brought about by the
unique needs and challenges, both direct and indirect, of an environment of
international conflict, within the UK health systems specifically, using the Second
World War as an example.
Version B:
Compare the advances in UK public health policy pre- and post-Second World War.
Think about points such as:
- unclear test instructions,
- confusing and ambiguous terminology,
- being overly verbose,
- using complicated vocabulary,
- difficult or poor sentence structure,
- unnecessary and distracting detail.


ii) Reliability

Does the question allow markers to grade it consistently and reproducibly, and does
it allow markers to discriminate between different levels of performance? This
frequently depends on the quality of the marking guidance and clarity of the
assessment criteria. It may also be improved through providing markers with training
and opportunities to learn from more experienced assessors.
The likelihood of eliciting an accurate measure of a student's ability will be increased
when students are provided with a variety of ways to demonstrate their knowledge
and skills. For example, some students might generally do better on exams whilst
other students do better in their coursework. Including both in a course will
accommodate those differences between students. However, as the DL courses are
provided through the University of London's external programme, which restricts the
mode of most assessment to examinations, this may not be an option for all Module
leaders. Even so, within a written examination we can include a variety of
question formats that can help to triangulate and cater to a student's abilities and
provide a more reliable measure of their attainments.

iii) Validity

A valid examination question measures achievement of the intended learning
outcomes of the module/unit (not just what is easy to measure!). The form of
the examination question may also be of importance in ensuring validity. For
example, short answer questions are a good way of assessing a greater breadth of
the material covered in a course but tend to focus on testing attainment of
knowledge and application of knowledge, whilst longer essay style questions allow
a more in-depth exploration of subject material and require a candidate to build and
structure an argument or explain a complex concept with wide reference to
examples and readings. If these aspects are important they should be
clearly described in the learning outcomes and be transparent in the assessment
criteria for the assessment to be valid.

iv) Authenticity

Authenticity is the need to match the style and approach of question setting to the
reality of practice. This is particularly important when considering the assessment of
Masters level qualifications frequently taken by mature students who are
accustomed to working within a professional context. A general example might be,
rather than setting an essay style question, to ask students to present their
understanding in the style of a professional, industrial or clinical report.
This may be very important when considering the testing of 'procedural knowledge'
or 'functioning knowledge' (please see 5.1). When the exam seeks to test a
candidate's knowledge of how something works, the order or sequencing of events,
the interplay between contributing factors, etc., it can be very important to ensure
this is built into the question formatting and context setting to allow authenticity.


Example
A learning outcome for a module is
'... will be able to design survey questionnaires to gather quantitative and qualitative
data in the field.'

An examination question to test this procedural kind of knowledge (knowing how to
do something, rather than memorisation of facts) could, for example, provide the
students with a sample survey questionnaire and ask them to give feedback on it,
e.g. point out weaknesses in its design or suggest improvements, and explain how it
could be administered in the field.

v) Fairness

You need to give students a fair chance to demonstrate what they know and can do
and to be able to succeed in examinations. Fairness can be facilitated by being very
clear about expectations in student performance, providing examples of past
examination papers, giving opportunities for students to practise and gain exam
technique (through mocks, for example), and being transparent about the processes
and criteria that will be used to mark and grade their work.
Students should know what is expected of them in order to obtain a particular grade,
and their marks should be a reflection of their abilities and not a reflection of
extraneous and irrelevant factors such as gender, disability, etc. Providing a level
playing field is the aim, and this is particularly important at The School when
considering the different groups of students who come to study or embark upon DL
courses, e.g. non-native English speakers, students who have previously
experienced very different educational cultures, mature professionals, etc.


4. Aligning exam questions (and specimen answers) with intended learning outcomes*

Constructive alignment is the term coined by John Biggs to describe a coherent
approach to ensure that the learning outcomes, teaching and learning methods and
the assessment for a unit of study are all directing student learning in the same
direction.
Figure 1.
Diagram to illustrate the principles of Constructive Alignment in module
design.

The diagram links four questions, each with its associated element of module design:
- What do you want your students to learn?
  (Aims and Learning Outcomes)
- How will you help your students to learn it?
  (Teaching and Learning Methods; Learner Support and Guidance)
- How will you know how well they have learnt it?
  (Assessment Methods and Criteria)
- How do you know any of it is working?
  (Module Evaluation)

* 'Learning Outcomes' is the preferred terminology given in the QAA Codes of
Practice; however, learning goals may also be described as 'learning objectives' in
some documentation.

An excellent place to start when writing an exam question is to go back to the
Learning Outcomes for the course or module. These should describe what it is that
you want your students to know about or be able to do at the end of the course.

Example
At the end of the module students should be able to select an appropriate method
and use it to test the significance of collected data.

The learning outcome clarifies what opportunities need to be built into a test question
and helps to ensure that the test is valid. For the learning outcome given above,
students should be expected to select a method, have the scope to apply the
method to some data and, finally, be able to comment on the significance or
otherwise of the data. To clarify further, it would be beneficial to demarcate these
three different tasks within the question itself:

Example
- selection of appropriate method,
- using the method and
- interpreting the significance of the findings

perhaps as separate question sections a, b and c, with each having a
clear allocation of the total marks for the question.

Considering scale and scope

Examination questions should also aim to indicate to the students how much is
required of them to achieve a good mark - the scale and scope of their expected
answers. One common way of doing this is to give a time limit (10 questions in 20
minutes), or you could limit the amount of space in the answer booklet or on-line
proforma provided. Alternatively, you can set a maximum word limit for responses.
In addition to these structured ways of indicating the length of answers expected,
question setters can include 'Boundaries' in their questions, such as

- Within the limits
  e.g. Between 2001 and 2005

- To what extent
  e.g. Using your knowledge of both prokaryotes and eukaryotes

- Quantities and amounts
  e.g. Provide 5 reasons why

- With reference to
  e.g. With reference to the published research from ...

EXERCISE
For the questions given below, underline the verb and key elements of the question
that give an indication of the extent (limits and boundaries) of the question.
Do you feel these are appropriate for Masters level study?

1. Describe the three main methods of economic evaluation (40%). What are the
   main strengths and weaknesses of each method? (40%). Support your answer with
   examples of disease evaluation (20%)

2. A recent retrospective analysis of health records in the Gambia has
   suggested that the incidence of malaria has fallen dramatically in that country over
   the last 10 years. The elimination of the disease is beginning to be discussed. The
   National Malaria Control Programme has begun a surveillance system to detect
   future changes.
   What advice would you give the National Malaria Control Programme on how to
   organize a surveillance system for malaria. Give practical tips for ensuring its quality.

3. Write short notes on THREE of the following. In each case explain the
   importance of the infectious agent and the mode of transmission in its spread and
   control.
   a) rotavirus diarrhoea
   b) measles
   c) guinea worm
   d) dengue
   e) tuberculosis

Please see Appendix 1 for some feedback comments on this exercise. You may
also wish to refer directly to the learning outcomes of your modules and the Masters
level descriptors in the Qualification Framework document.


5. What kinds of knowledge and skills can be tested in examinations?

It is possible to test a wide variety of different kinds of knowledge, skills and attitudes
through the careful writing of examination questions.
5.1   kinds of knowledge that can be tested   e.g. Knowledge domains
5.2   kinds of intellectual skills            e.g. Analysing, Evaluating
5.3   kinds of transferable skills            e.g. Writing skills, Time use
5.4   kinds of attitude                       e.g. Ethics, equality

Again taking each of these elements in turn, let us first consider the different kinds of
knowledge and 'ways of knowing' that you may wish to test in your students.

"Exam questions should test a range of knowledge and skills at
Masters level. They should test and reward critical
appreciation and the ability to apply what has been learnt
rather than the passive reproduction of memorised facts."
Assessment Code of Practice (2012)

5.1. The kinds of knowledge that can be tested - knowledge domains

Factual Knowledge
Terminology, facts, figures

Conceptual Knowledge
Classification, Principles, Theories, Structures, Frameworks

Procedural Knowledge
Algorithms, Techniques and Methods and Knowing when and how to use
them.

Metacognitive Knowledge
Strategy, Overview, Self Knowledge, Knowing how you know.


EXERCISE
Please consider the following four examination questions and decide what
kind of knowledge you feel they would test.

1. What are the key steps and processes in bringing a new anti-cancer drug to
   market and introducing it for clinical use?

2. Write short notes on the following:
   a. Bacterial pathogenicity
   b. Neisseria gonorrhoeae
   c. Group A streptococci

3. Using the tabulated data provided, calculate the incidence risk of prostate
   cancer per 1000 men, per 5 years, at each of the given levels of alcohol
   consumption.

4. Why do malaria parasites persist in the human population? Explain the choice
   of drugs which could be used to prevent persistence of Plasmodium
   falciparum and Plasmodium vivax.

5.2. The kinds of intellectual skills that can be tested

At Masters level, learning outcomes for modules usually require students to
demonstrate higher level cognitive and intellectual skills, i.e. it is not enough for
students to demonstrate that they can remember facts and figures, names and
dates; they need to show they are able to interpret the meaning of data and
evaluate their significance. Several cognitive psychologists have been interested in
categorising the different ways that we can learn and think about things, the most
famous of these being a group led by Benjamin Bloom in the mid 1950s.
Bloom et al. (1956) identified three different domains of learning: Cognitive
(knowledge), Affective (attitudinal) and Psychomotor (manual skills). They went on
to produce complex hierarchies of skills for the knowledge and attitudinal domains
that have been re-visited and revised by many researchers since. When writing
examination questions it can be extremely helpful to consider the Cognitive domain
hierarchies in particular. Indeed, thinking carefully about the level of cognition that is
to be tested will help to select the most appropriate verb to be used in the exam
question.

e.g. Do we want to test a candidate's ability to list important features, to analyse the
given findings, or to critique the argument they give?
Anderson et al.'s (2001) re-working of Bloom's taxonomy makes this easier, as they
chose to present the hierarchy of sub-categories as active verbs, and it is their
version in particular that has been widely used in course design and question design
in more recent years. It is, however, important to remember that

"Although Bloom's lends itself to wide application, each discipline must define
the original classifications within the context of their field."
Crowe et al. (2008)

Figure 2.
Bloom's Taxonomy of Cognition Revisited by Anderson & Krathwohl (2001)

(highest level)   Create
                  Evaluate
                  Analyse
                  Apply
                  Understand
(lowest level)    Remember

Note - Some colleagues in the School may already be familiar with the original Bloom
taxonomy, which uses the terms Knowledge, Comprehension, Application, Analysis,
Synthesis and Evaluation.


Table 1
A table of suggested verbs mapped against the Anderson and Krathwohl
adapted levels of Bloom's Taxonomy of Cognition

Cognitive Level    Verb Examples

1. Remember        define, repeat, record, list, recall, name, relate, underline.

2. Understand      translate, restate, discuss, describe, recognise, explain,
                   express, identify, locate, report, review, tell.

3. Apply           interpret, apply, employ, use, demonstrate, dramatise,
                   practice, illustrate, operate, schedule, sketch.

4. Analyse         distinguish, analyse, differentiate, appraise, calculate,
                   experiment, test, compare, contrast, criticise, diagram,
                   inspect, debate, question, relate, solve, examine, categorise.

5. Evaluate        judge, appraise, evaluate, rate, compare, revise, assess,
                   estimate.

6. Create          compose, plan, propose, design, formulate, arrange, assemble,
                   collect, construct, create, set-up, organise, manage, prepare.
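
As a rough illustration of how the verb lists in Table 1 might be used when checking a draft question, the short Python sketch below looks up the cognitive level(s) under which a chosen verb is listed. The verb lists are copied from Table 1; the names `BLOOM_VERBS` and `levels_for_verb` are hypothetical, introduced only for this example, and do not refer to any School tool. Some verbs (for example 'compare', 'appraise' and 'relate') appear under more than one level in the table, so the lookup returns a list rather than a single level.

```python
# Verb lists copied from Table 1 (Anderson and Krathwohl levels, lowest first).
# BLOOM_VERBS and levels_for_verb are hypothetical names used for illustration.
BLOOM_VERBS = {
    "Remember": ["define", "repeat", "record", "list", "recall", "name",
                 "relate", "underline"],
    "Understand": ["translate", "restate", "discuss", "describe", "recognise",
                   "explain", "express", "identify", "locate", "report",
                   "review", "tell"],
    "Apply": ["interpret", "apply", "employ", "use", "demonstrate",
              "dramatise", "practice", "illustrate", "operate", "schedule",
              "sketch"],
    "Analyse": ["distinguish", "analyse", "differentiate", "appraise",
                "calculate", "experiment", "test", "compare", "contrast",
                "criticise", "diagram", "inspect", "debate", "question",
                "relate", "solve", "examine", "categorise"],
    "Evaluate": ["judge", "appraise", "evaluate", "rate", "compare", "revise",
                 "assess", "estimate"],
    "Create": ["compose", "plan", "propose", "design", "formulate", "arrange",
               "assemble", "collect", "construct", "create", "set-up",
               "organise", "manage", "prepare"],
}


def levels_for_verb(verb):
    """Return every cognitive level whose Table 1 verb list contains the verb."""
    wanted = verb.strip().lower()
    return [level for level, verbs in BLOOM_VERBS.items() if wanted in verbs]


if __name__ == "__main__":
    for verb in ["list", "compare", "design"]:
        print(verb, "->", levels_for_verb(verb))
```

A verb that maps to a lower level than the learning outcome requires (for instance 'list' for an outcome about evaluating evidence) may be a prompt to reconsider the wording of the question. A verb that maps to several levels is a reminder that the rest of the question stem has to make the intended level clear.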

It is easy to see how the levels of Bloom's hierarchy are employed in different
question stems; for example, see Table 2.


Table 2.
Ways in which intellectual skills can be tested through different question
stems.

Intellectual Skill
    Stem

Comparing
    Describe the similarities and differences between...
    Compare the following two methods for...

Relating & Effecting
    What are the major causes of...
    What would be the most likely effects of...

Justifying
    Which of the following alternatives do you favor and why?
    Explain why you agree or disagree with the following statement.

Summarising
    State the main points included in...
    Briefly summarize the contents of...

Generalising
    Formulate several valid generalizations for the following data.
    State a set of principles that can explain the following events.

Inferring
    In light of this information, what is most likely to happen when...
    How would person X be likely to react to the following issue?

Classifying
    Group the following items according to...
    What do the following items have in common?

Creating
    List as many ways as you can think of for/to...
    Describe what would happen if...

Applying
    Using the principles of X, describe how you would solve...
    Describe a situation that illustrates the principle of...

Analysing
    Describe the reasoning errors in the following paragraph.
    List and describe the main characteristics of...

Synthesising
    Describe a plan for providing that...

Evaluating
    Describe the strengths and weaknesses of...

(Adapted from Figure 7.11 of McMillan (2001) and Piontek, M.E. (2008))
Note - you may like to compare these question stems with Bloom's taxonomy,
given earlier, and to cross-refer to the learning outcomes specified for your
own Modules.

EXERCISE
Take a few moments to look down this list of question stems and select two
that you feel could be used to test students on your module/course.
Why have you selected these two?

5.3. The kinds of transferable skills that can be tested

Short answer and essay style questions do give an assessor the opportunity to
judge a range of generic or transferable skills in the way students answer the
questions or respond to the tasks set. The most obvious of these are the
ability to write clearly and appropriately, the ability to structure and organise
answers so that the most important points are prioritised and well made, and the
ability to cite and use source material effectively.
If these skills are to be included and given value in the assessment, this should be
clearly stated in the assessment criteria used to make judgements and this fact
should be made clear to students. At The School this is an important issue, as many
of the Masters students are non-native English speakers. The proportion of the
marks for a test question allocated to skills such as written English should be
related to the Aims and Learning Outcomes for the course and context. In some
cases accuracy and style may be considered important, e.g. to highlight professional
skills and competencies, and be included in the assessment criteria, whilst in others
such characteristics are not what is being taught and assessed.

5.4. The kinds of attitude that can be tested

Attitudinal learning outcomes, such as equality, fairness, ethical considerations, etc.,
may be important learning outcomes in School Masters programmes and as such
are appropriate factors to be tested through the examinations. It is a complex area of
assessment, as one can argue that just because a candidate knows what they should
be saying in response to an equality issue, this does not necessarily reflect what
they really feel or how they would react. Examination answers may therefore only
be considered as a partial reflection of a candidate's attitude. In some cases it may
be more straightforward to assume a student is adhering to the necessary
programme attitudes but to penalise (through the grading structure) cases where
such attitudes are transgressed, for example when answers indicate that important
values are either not fully understood or are not being applied by a candidate, e.g. an
answer is unacceptably gendered or racist.

6. Reducing the impact of extraneous factors such as stress, interpretation, time

Ability to work under pressure or to demonstrate stress tolerance, etc., are unlikely to
be valid learning outcomes for a Masters Programme at The School, and therefore all
attempts should be made to reduce the impact that stress and nerves may have on a
student's performance in an examination.
It is possible to set and run examinations in ways that limit the importance of
stress-induced factors (such as memory lapses) on success. Written examinations can be
organised as 'open book' exams* or question topics can be pre-seen by candidates.
Such strategies reduce the need to 'question spot' and the impact of luck in revising
the right or wrong sub-selection of topics tested. They allow students to think more
deeply about, and possibly research, their views before attempting the questions (as
with course work assessments) but have the added advantage of avoiding some
of the concerns about plagiarism, in that candidates produce their individual answers
under exam conditions. Those familiar with running these types of examination
comment that the quality of student answers is frequently judged to be of a much
higher standard (again, as is the case with course work answers).

* Open book examinations can allow students to take their own notes or choice of
texts or previously specified items into the examination.


If examinations are to be run traditionally as unseen, time-constrained tests carried out by individuals in silence, then there are a number of things that the question writer can consider to minimise the impact of such stress factors. For example:

Check that the question does not assume a lot of background knowledge
which may be culturally specific or introduce unnecessary bias;

Provide any important (untested) background detail within the body of the
question;

Give mark or timing guides within the framing of the question that indicate the
relative importance or attached weightings for each sub-section;

Set multiple-part problem questions so that the parts are independent from each other. This means that if a student gets the first part wrong they don't automatically lose marks on subsequent sections, and it makes grading much quicker and more straightforward.
E.g. in the second part of a question, write something like "In the next part of the calculation, assume that the answer to Part (a) was 25, regardless of what you actually got in Part (a). Note that 25 is NOT necessarily the correct answer to Part (a)."

EXERCISE
Can you think of any additional aspects in the exam questions you will be writing that
should be considered to reduce the impact of stress factors?
Please list these here.

An extended example (and exercise) is provided in Appendix 2.


7. Different styles and formats for exam questions

There are a number of ways in which examination questions can be written and
structured that in turn require very different responses from students. Examination
papers may consist of a variety of these formats. For example a paper may consist
of an initial section of 10 compulsory, short answer questions followed by a second
section in which the student is asked to attempt three from six longer questions
which may be essay or case study or problem solving styled questions.
Here are some examples of different ways questions are written at the School with a
commentary highlighting important features (such as the need to avoid ambiguity,
bias, inequality and yet be able to discriminate between different levels of attainment
and achievement).

7.1 Objective Tests
e.g. True-False, Matching Pairs and Multiple Choice Questions

There are few examples of such question types being used extensively in summative assessments at the School. They are included here for completeness' sake and in acknowledgement that some teachers may well be using these question formats as part of their class or on-line teaching, as self-assessment or formative assessment opportunities for their students.

Objective tests require a user to choose or provide a response to a question whose correct answer is pre-determined. Such a question might require a student to:
Select a solution from a set of choices (MCQ, true-false, matching, multiple
response);
Identify an object or position (graphical hotspot);
Supply brief text responses (text input, word or phrase matching);
Enter numeric text responses (number input); or
Provide a mathematical formula (string evaluation or algebraic comparison).
Pass-It, Good Practice Guide


True-False
Used to test a breadth in knowledge of information but the problem of
guessing is a major worry.

Matching Pairs
Used to assess knowledge of complex and inter-connecting relationships.

Multiple Choice Questions - Different Formats
There are many different types of MCQs. Some are especially well suited for certain types of content. Some are particularly good for testing higher-order learning. Some are inherently "easier" or "more difficult" than others.
o One-Choice Completions - Best Answer
The most commonly used MCQ format is simply a short-answer
question with a number of alternatives to choose from.
o Multiple-Choice Completions
This MCQ format allows for more than one correct answer. Such
questions are more difficult since the student is not just looking for one
correct response among four incorrect responses. However, the intent
of this format is not to test four separate points but rather to set up an
interpretive exercise.
o Quantitative and Functional Relationships
An MCQ format that deals with quantitative and/or functional
relationships. They are generally best for knowledge testing but can
also be used to test higher-order learning outcomes.

There is extensive guidance available on-line (particularly from Universities in the USA) on the construction of multiple-choice tests, and some of these sources are listed in the references provided at the end of this workbook. Although multiple-choice tests have traditionally been used to test lower order cognitive skills, their use in assessing higher order, Masters-level skills, such as problem solving and analysis, is increasingly being explored.
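The worry about guessing noted for true-false items above can be put into numbers. The sketch below is illustrative only: the 10-item test, the pass mark of 5 and the five-option MCQ are assumed values, not figures taken from any School paper. It uses the binomial distribution to estimate how often a candidate who guesses blindly would reach the pass mark.

```python
from math import comb


def prob_at_least(n_items: int, pass_mark: int, p_correct: float) -> float:
    """Probability of getting at least pass_mark of n_items right by blind guessing."""
    return sum(
        comb(n_items, k) * p_correct**k * (1 - p_correct) ** (n_items - k)
        for k in range(pass_mark, n_items + 1)
    )


# Assumed, hypothetical test: 10 items, 5 correct answers needed to pass.
true_false = prob_at_least(10, 5, 0.5)   # two options per item
five_option = prob_at_least(10, 5, 0.2)  # one correct answer among five options

print(f"True-false, guessing only:      {true_false:.3f}")
print(f"Five-option MCQ, guessing only: {five_option:.3f}")
```

Under these assumptions a pure guesser passes the true-false test more often than not, while the same pass mark on five-option items is reached only a few times in a hundred, which is one reason true-false items are usually reserved for formative use.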

7.2 Short answer questions can take many forms

Constructed-response or open-ended questions that require students to create an answer. These may be very short and of a "fill in the blank" nature, or longer: a few sentences or a couple of paragraphs maximum. They can be used to test core knowledge from a module and check the student has the required breadth in understanding.

Calculations and data manipulation questions


Example: The investigators want to perform a sample size calculation with 80% power and 5% 2-sided significance. They estimate that HIV-free survival at 7 months will be 60% in the control arm.

(i) Calculate the sample size required to detect a 10% increase in HIV-free survival at 7 months in the intervention versus the control arm. (Hint: remember to identify your equation, define all your variables, show all your calculations and conclude appropriately) (10 marks)

(ii) Assume that 5% of mother-infant pairs are lost to follow up prior to the infant reaching 7 months and adjust your sample size calculation accordingly. (4 marks)

EXERCISE
How could you improve parts (i) and (ii) of the example question above?

Please see the concerns that were raised by the Module team over the page


Improving the example question

Here are the views of the Module leader, who raised two questions relating to the clarity of the draft question.
Part (i) - It isn't clear whether the question is asking students to calculate an absolute or a relative increase. This makes a big difference to the calculation (see below). This is an example of how the omission of one word can have a significant effect on how the student answers!
It is therefore crucial to use technical terminology precisely and avoid expert shorthand that could be misleading to a new learner.
i.e. if we are asking about a relative increase:
$$n = F(\alpha, \beta) \times \frac{p_1 (100 - p_1) + p_2 (100 - p_2)}{(p_1 - p_2)^2}$$
$\alpha$ = type 1 error = 0.05
$\beta$ = type 2 error = 1 - power = 1 - 0.80 = 0.2
$F(\alpha, \beta)$ = 7.85
$p_1$ = anticipated percentage of infants in the control group HIV uninfected and alive by 7 months = 60%
$p_2$ = anticipated percentage of infants in the intervention group HIV uninfected and alive by 7 months = 66% (i.e. 60% × 1.10)
$n$ = sample size for each group
$$n = 7.85 \times \frac{(60 \times 40) + (66 \times 34)}{36} = 7.85 \times \frac{4644}{36} \approx 1013$$
Not accounting for loss to follow up, a sample size of 1013 women per study arm (2026 total) will give us 80% power and 5% significance to detect a 10% relative increase in HIV-free survival in the intervention arm from 60% in the control arm.
And if we are asking about an absolute increase, then as before:
$p_1$ = anticipated percentage of infants in the control group HIV uninfected and alive by 7 months = 60%
BUT this time:
$p_2$ = anticipated percentage of infants in the intervention group HIV uninfected and alive by 7 months = 70% (i.e. 60% plus 10%)
$n$ = sample size for each group
$$n = 7.85 \times \frac{(60 \times 40) + (70 \times 30)}{100} = 7.85 \times \frac{4500}{100} = 353.25$$
Not accounting for loss to follow up, a sample size of 353 women per study arm (706 total) will give us 80% power and 5% significance to detect a 10% absolute increase in HIV-free survival in the intervention arm from 60% in the control arm.
Part (ii) - Will the formula be included in the question or the provided formulae
sheet? This is a straightforward calculation for which there is a formula. Do
you expect the students to memorise the formula or will they expect it to be
provided?
Being clear about what actually should be tested is the important factor here.
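The two readings of Part (i) can be checked with a few lines of code when preparing a specimen answer. The sketch below reproduces the Module leader's formula with F = 7.85. The function names are hypothetical, and the loss-to-follow-up step (dividing by the proportion retained) is an assumed, commonly used adjustment: the workbook does not state which method the Module team expected for Part (ii).

```python
def sample_size_per_arm(p1: float, p2: float, f: float = 7.85) -> float:
    """Unrounded sample size per arm for comparing two percentages (0-100 scale)."""
    return f * (p1 * (100 - p1) + p2 * (100 - p2)) / (p1 - p2) ** 2


def inflate_for_loss(n: float, loss_fraction: float) -> float:
    """Assumed adjustment: divide by the proportion of participants retained."""
    return n / (1 - loss_fraction)


control = 60.0
n_relative = sample_size_per_arm(control, control * 1.10)  # 10% relative increase: 66%
n_absolute = sample_size_per_arm(control, control + 10.0)  # 10% absolute increase: 70%

print(f"Relative increase: {n_relative:.2f} per arm before rounding")
print(f"Absolute increase: {n_absolute:.2f} per arm before rounding")
print(f"With 5% loss to follow up: {inflate_for_loss(n_relative, 0.05):.1f} "
      f"or {inflate_for_loss(n_absolute, 0.05):.1f} per arm")
```

Running both readings side by side while drafting shows how far apart the "correct" answers are, and so how much the missing word matters to the marking scheme.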


7.3 Longer form Questions - Essay questions

A longer format allows students to respond to open-ended questions at length. It is used to test higher skills, writing and structuring skills, further reading and a deeper level of understanding. Assessors are frequently interested in a student's ability to organise and integrate a range of ideas and information and build an argument or make a case (the intellectual skills of synthesis and evaluation, going back to Bloom's taxonomy).
Two types of essay questions can be readily identified: restricted-response and extended-response. Restricted-response essays focus on understanding of basic knowledge through relatively brief and confined written responses.
e.g. Outline the morphology, genome organisation and replication of the human immunodeficiency virus (HIV).
Extended-response essays allow students to construct a variety of interpretations and explanations and draw upon a wider and more flexibly defined set of information and sources.
e.g. "The burden of disease caused by intestinal parasites in a community reflects the levels of personal and environmental hygiene."
To what extent do you agree with this statement and what are its implications? Make reference to specific infections to support your conclusions.

Table 3.
Some Common Essay Style Questions used in Exams

Question Stem
Give a Quotation - Discuss
Make an Assertion - Discuss
Compare and Contrast
Write-on
Outline
Describe
Explain (with examples)
Evaluate
Analyse the advantages...
Design a...


EXERCISE
Look back over recent examination papers set for your course or teaching
module and add two more commonly used Question Stems to this list.
1.

2.

7.4 Longer form Questions - Problem Solving / Data handling

Here the students are provided with some data (this could be in written, tabulated,
graphical form etc) and then asked a series of questions about it. The provided
information may be some research findings or monitoring data. The questions
usually begin with a couple of straightforward interpretative questions (e.g. Using the
table of infection rates provided, which of the described drug therapies reduces the
risk of infection the most?). They then move on to more complex questions of
application and analysis that require the students to carry out standard manipulations
or calculations of the data provided. The final questions are likely to be more
evaluative and open-ended, requiring the students to predict likely impacts or
suggest improvements etc.

An Example
On a hot summer day, children in three schools had a school outing to a playground
where some of the children played in the recreational fountain. Two days later nearly
half the children had symptoms of vomiting, diarrhoea, abdominal pain and
headache. A retrospective cohort study was carried out to try to identify the source
of the outbreak with the following results.
Risk factor | Exposed: Ill | Exposed: Not ill | Not exposed: Ill | Not exposed: Not ill
Ate commercial ice-cream | 72 | 76 | 17 | 22
Drank water from taps near fountain | 3 | 4 | 78 | 89
Drank water from taps near sanitary facility | 18 | 32 | 68 | 64
Played in fountain | 87 | 25 | 4 | 24
Drank from fountain | 80 | 15 | 19 | 75

(a) Define what is meant by the risk and relative risk of becoming ill associated
with each factor (10%).
(b) Calculate BOTH the risk and relative risk associated with each factor (30%).
(c) Suggest possible interpretations of the results, and the implications for
control recommendations (10%).
The investigators wanted to identify the infectious agent involved. One possibility they considered was norovirus, which is known to cause acute gastroenteritis. Although the reverse transcription-PCR (RT-PCR) method is considered to be the gold standard for diagnosis of this viral infection, it requires skilful personnel and a well-equipped laboratory. A simpler diagnostic kit has been developed. The following table shows how the simpler diagnostic kit compares to the gold standard.

Diagnostic kit | Gold standard (RT-PCR): Norovirus present | Gold standard (RT-PCR): Norovirus absent
Norovirus present | 37 | 3
Norovirus absent | 13 | 47

(d) Would you advise the investigators to use the simpler diagnostic test in their
epidemiological study? Would your recommendations change if the simpler
diagnostic test was to be used in clinical practice? Justify your answer. (50%)
[Note on norovirus: this highly infectious RNA virus causes a self-limited, mild to
moderate disease that often occurs in outbreaks with clinical symptoms of nausea,
vomiting, diarrhoea, abdominal pain, headache, low grade fever or combination of
these symptoms. No treatment is indicated apart from rehydration in severe cases. ]
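When drafting the specimen answer for a question like this, it helps to work the numbers through once so that the marking guidance states the expected values. The sketch below does this for parts (b) and (d). It assumes both tables are read row by row, giving (exposed ill, exposed not ill, not exposed ill, not exposed not ill) for each risk factor, and kit result by gold-standard result for the diagnostic table. That reading is an interpretation of the printed layout, and all function and variable names are hypothetical.

```python
# Counts per risk factor: (exposed ill, exposed not ill, unexposed ill, unexposed not ill).
# Assumption: the outbreak table is read row by row.
outbreak = {
    "Ate commercial ice-cream": (72, 76, 17, 22),
    "Drank water from taps near fountain": (3, 4, 78, 89),
    "Drank water from taps near sanitary facility": (18, 32, 68, 64),
    "Played in fountain": (87, 25, 4, 24),
    "Drank from fountain": (80, 15, 19, 75),
}


def risk(ill: int, not_ill: int) -> float:
    """Proportion of a group that became ill."""
    return ill / (ill + not_ill)


def relative_risk(exp_ill: int, exp_ok: int, unexp_ill: int, unexp_ok: int) -> float:
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(exp_ill, exp_ok) / risk(unexp_ill, unexp_ok)


for factor, counts in outbreak.items():
    print(f"{factor}: risk if exposed {risk(counts[0], counts[1]):.2f}, "
          f"risk if not exposed {risk(counts[2], counts[3]):.2f}, "
          f"relative risk {relative_risk(*counts):.2f}")

# Diagnostic kit against RT-PCR, assuming rows are kit results and columns gold standard.
true_pos, false_pos, false_neg, true_neg = 37, 3, 13, 47
sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
print(f"Sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```

Having these figures in the marking pack makes it easier to award follow-through marks when a candidate's interpretation in part (c) is sound but rests on an arithmetic slip in part (b).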

EXERCISE
In section 6 we discussed a number of ways in which a question writer could minimise the impact of extraneous factors, such as stress, interpretation and time-management, through the way they set a question. Please look over the question above and identify at least three ways the question author has sought to do this.
1.
2.
3.


7.5 Longer form Questions - Case study or Scenario based question

In case study styled questions a context or situation is described in detail (e.g. this may be a patient history or government strategy position). Such questions are often seen as being very authentic and ask students to apply their knowledge to a particular, and novel, set of circumstances. They frequently take considerable work and effort to write well and usually involve a team of people who craft an idea into a realistic and challenging situation.
Note - Some examples of this type of question are presented in section 11.

Giving Choice
A common structure in examination papers is to have part of the paper as a core, to be attempted by everybody, and other sections which provide a limited amount of choice, e.g. "choose 2 from the following list of essay questions to complete".
Whilst the structure of exam papers is set by the Board of Examiners and not by individual question setters, it is nevertheless interesting to consider the impact of providing question choice within an exam.
Many people view the giving of choice as a way to increase fairness and reduce the effect of luck in question spotting. It allows students to address the questions for which they feel most prepared and in which they have been most interested, so that the examiner sees the best the student can produce. However, providing choice inherently reduces the validity and reliability of the test instrument, because each student is in fact taking a different test and has been encouraged to sample from their learning in different ways. It is nearly impossible to create parallel exam questions that test achievement of the learning outcomes to the same extent, and it is equally difficult to grade two different essays absolutely comparably; both factors make consistency very difficult (Piontek, 2008).

EXERCISE
Do you personally think that the giving of a choice in an examination, (e.g.
choose 3 from the following 6 questions) is fair?


8. Evaluating Draft Questions

It is very difficult to write a question and then immediately see the ambiguities or
errors that it contains. Separating the creating from the evaluating roles in time can
help. Write a question and then come back to it the following day and re-read with
fresh eyes. When you have a draft question, next write a model/specimen answer
and/or some marking guidance. As you do this come to a decision about the
appropriate break down of marks and try to estimate how long it will take to tackle
the question, part by part. In coming up with the marking scheme for your question
you might find it helpful to have the learning outcomes for the module or course in
sight to refer to so that you can check that you are valuing the right things and giving
credit to Masters level criteria.
Below is a checklist of questions to use once you have a draft question (doing some
of this in a group with questions on overheads can work well):


Checklist for reviewing draft exam questions

1. What is the question intended to measure? (eg factual recall, data processing/analysis skills, problem-solving skills, policy analysis skills, critical analysis skills)

2. What else does it actually measure? (eg does it rely too much on factual recall?)

3. Does it measure what we said we would measure? (Is it aligned with the teaching on the course, the content covered and emphasised and the intended learning outcomes?)

4. How well does the question relate to intended learning outcomes (of the teaching module or MSc)?

5. Is the language simple, clear, unambiguous and straightforward?

6. What are the key words describing the task? Are they clear? (eg "list", "define", "suggest reasons behind the effect" are better than "interpret", "discuss", "evaluate")

7. Is the language used easy to understand, including by candidates for whom English is not their first language (eg does it use colloquial phrases)?

8. Check punctuation and grammar as this can markedly change the meaning of sentences (eg "panda eats, leaves and shoots").

9. Does the question give an advantage or disadvantage to those candidates with particular professional backgrounds (eg medics)?

10. How reliably can the answers be marked?

11. If the question is in sections, is the division of percent of marks between sections appropriate? Are there consequences for later sections if a candidate makes an error in an early section? If yes, how will the marking cope with this possibility?

12. Can the question be completed in the time available (including reading, thinking and reviewing time), including those for whom English is not their first language?

13. Does the question lead to answers which will distinguish between weak and strong candidates, eg are there elements for candidates to demonstrate distinction-level skills/knowledge?


Question Validation
The Masters programme that you contribute to is likely to have its own process for validating questions and compiling the examination paper. It is important that you ascertain this from the module leader and adhere to it.
In general terms, however, once you have written the question, model/specimen answer and marking scheme, ask someone else to answer the question (do not give them the model/specimen answer), timing each part. This allows you to check that your estimates of the time it takes to complete each part were about right. Modify the question, timings and marking scheme based on any misunderstandings made clear by their answer.
It can be helpful to agree a question swap with a colleague and undertake an informal peer review of the questions you have both written. This frequently happens across a course team.
At this stage you will be ready to submit your question to the module leader, who will also scrutinise your question and may get back to you with further suggested improvements. (Please see the extended case study in the Appendix for further detail about the way The School conducts examination question approval processes.)

EXERCISE - Evaluating a question

Please read the following draft question and suggest improvements. When you have had a go, turn the page and you will see the changes that the examiners' team finally made to the question.
Question X - Draft
Describe the structure of the cell plasma membrane and its principal components.
How and where are plasma membranes usually made in the eukaryotic cell.
How are molecules transferred across the membrane into and out of the cell :

Water
Ethanol
Sodium and Potassium ions
Sugars.

What other functions in the cell may lipids serve?

Over the page you will find the edited version of Question X that was eventually
accepted and used in the examination.


Notes - Exercise: Evaluating some questions

When you submit a question to the Teaching Unit leader it is likely that they will arrange for it to be scrutinised by members of the teaching team, who will make suggestions for improvement. Question X, reviewed on the previous page, ended up looking like this:
Question X - accepted
Describe the structure and synthesis of the cell plasma membrane of eukaryotic cells and its principal components. Explain how molecules are transferred across the membrane giving 2 examples.

The questions are usually considered together with the associated marking guidance notes. For this question, these were the guidance notes that were accepted:

The Marking Guidance For Question X

Lipid bilayer membrane consisting of amphipathic lipid molecules with the hydrophobic part inside and the hydrophilic part outside.
The major components are lipids of various kinds: these may consist of phospholipids (eg phosphatidyl choline, serine, ethanolamine etc.), triacylglycerols (containing glycerol esterified with saturated or unsaturated fatty acids), glycolipids (eg diacylglycerols with a sugar chain on the third glyceryl OH), sterols or steroids (eg cholesterol) amongst others. May also contain others. Also proteins which may be transmembrane with hydrophobic trans-membrane section or anchored by lipid. Also protein transporters which span membrane and are each responsible for the transport of a limited range of molecules or ions. Most require energy.
Membranes are synthesised in the cytosol of the Endoplasmic Reticulum (ER) where the acylation of fatty acids takes place. Lipids inserted into the inner layer of the membrane are flipped by a scramblase and by flippases which equilibrate lipids between both sides. Lipids are transported between membranes by phospholipid exchange proteins.

a) Diffusion (neutral)
b) Diffusion (lipid-soluble)
c) Ion transporter
d) Specific transporter protein

Fuel storage (ie triglycerides); signalling; pigment


9. Marking Approaches: Using assessment criteria and marking schemes

Assessment criteria test the intended learning outcomes for a course or teaching unit. They describe the knowledge and skills (and possibly attitude) that a student is expected to demonstrate in their examination answers, and they are then used in marking the work. The learning outcomes describe what students should be able to do; assessment criteria describe how well they should be able to do it: they set standards. Remember that learning outcomes define the minimum standard required to achieve the award, and so in addition to these the assessment criteria should provide an objective basis for interpreting and differentiating the performance of students at the level of the outcome (a satisfactory pass) and at a series of pre-defined steps above this (usually up to a level considered an excellent or outstanding pass).
For each examination question there should be a model/specimen answer, or a set of specific marking guidance, that is used to mark the associated student answers. These will usually vary with each and every question and are tailored and specific.
The assessment criteria are usually more generic and used as a framework to fairly judge the merits of each student's work across a whole course or teaching unit.
Assessment criteria describe the extent to which students have achieved the
specified learning outcomes. They are usually provided at two levels,
o the overarching criteria that describe the different bandings of overall
achievement at the Programme level e.g. First, Two-one, Two-two etc
at undergraduate level and Pass /Merit/Distinction categories at
Masters level.
o A detailed and specific level of criteria that describe and measure
achievement in particular modules of study or for individual
assessment tasks.

Two Different Approaches to Marking


Assigning grades fairly and robustly is a demanding occupation for all teachers and we employ a range of approaches to help us to do this reliably and consistently. Two very different methods are often used simultaneously and symbiotically: norm referencing and criterion referencing.
Norm referencing is all about comparison. When we want to re-mark an earlier exam answer having subsequently marked a stronger or weaker essay, we are norm referencing our assessment. When we say, "Is this as good as the other Distinction in the batch of answers?" we are norm referencing.
An ultimate form of norm-referenced assessment is when we attempt to fit our marking profile for a cohort of students to the bell-shaped curve, or give our assessment results a normal distribution. This pattern of achievement anticipates
that a few students will fail and a similarly small number will get distinctions, whilst the majority will gain marks that cluster and peak in the middle of the mark range.
You will also sometimes hear experienced assessors referring to a particular piece of student work as providing a "benchmark". This is where the answer provided, for various reasons, encapsulates the criteria for a mark or grade: for example, determining the threshold for a distinction. This can be extremely helpful, and is a way in which norm referencing and criterion referencing naturally come together.

Figure 3.
A Normal Distribution or bell-shaped curve.

However, not all cohorts will fit this pattern. For example, "Computing for Beginners" courses could form a two-peak pattern, with one cluster of students achieving very high marks (representing the students who could have taught the course!) and a second cluster with marks at the bottom of the range (ie those who had never done any computing before!).
Absolute norm referencing also has the characteristic of effectively setting quotas: only so many students can get As and only so many can get Bs etc. The application of a bell-shaped curve to small groups or cohorts of students becomes clearly unfair where we can see that variations between groups, say from year to year, are likely to give rise to very different patterns of achievement.
Criterion-referenced grading, on the other hand, specifies a standard through the description of clear criteria, and anybody who achieves the level or standard described gains the marks. Everybody in the cohort could therefore potentially get an A, and each student's work is individually judged in comparison to the criteria regardless of what other students may or may not do.
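The contrast between the two approaches can be made concrete with a small sketch. Everything in it is hypothetical: the cohort marks, the A/B/C cut-offs and the quota shares are invented for illustration and do not represent School policy. It grades the same unusually strong cohort twice, once against fixed criteria and once by rank with quotas.

```python
def criterion_referenced(marks, cutoffs):
    """Give each mark the first grade whose minimum it meets (cutoffs listed highest first)."""
    grades = []
    for mark in marks:
        for grade, minimum in cutoffs:
            if mark >= minimum:
                grades.append(grade)
                break
    return grades


def norm_referenced(marks, quotas):
    """Rank the cohort and hand out grades by quota; quotas are (grade, share), best first."""
    order = sorted(range(len(marks)), key=lambda i: marks[i], reverse=True)
    grades = [None] * len(marks)
    start = 0
    for position, (grade, share) in enumerate(quotas):
        # The last grade takes everyone left over, so rounding never drops a student.
        is_last = position == len(quotas) - 1
        end = len(marks) if is_last else start + round(share * len(marks))
        for i in order[start:end]:
            grades[i] = grade
        start = end
    return grades


# Hypothetical strong cohort: every student scores above 80%.
strong_cohort = [82, 85, 88, 90, 91, 93, 95, 97, 84, 86]
by_criteria = criterion_referenced(strong_cohort, [("A", 80), ("B", 60), ("C", 0)])
by_quota = norm_referenced(strong_cohort, [("A", 0.2), ("B", 0.6), ("C", 0.2)])

print("Criterion-referenced:", by_criteria)
print("Norm-referenced:     ", by_quota)
```

In this invented cohort the student with 82% receives an A against the criteria but a C under the quotas, purely because of who else happened to be in the group.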


EXERCISE

Please consider the strengths and limitations of both forms of grading work.

Norm-referenced assessment
Strengths

Weaknesses / limitations

Criterion-referenced assessment
Strengths

Weaknesses / limitations


In The School's Assessment Code of Practice we can see some guidance and clarity on this issue.
Using the full mark range - Advice from the School
(Code of Practice 2012)
Markers are encouraged to use the full range of available marks, to reflect the full range of student achievement. In particular, markers should not feel reluctant to award 5.0 grades provided work meets the appropriate standards. The following specific points should be noted:

Excellent work does not have to be outstanding or exceptional by comparison with other students.

Since the School uses criterion-referenced marking rather than banded marking, 5.0 grades should not be capped to a limited proportion of students per class.

There is no standard cut-off for what constitutes excellent work. In many cases where quantitatively-scored assessments are used, a 5.0 grade may be awarded for work scoring above a particular threshold (for example 80%) of the possible marks, i.e. by no means perfect but of a sufficiently high standard.

Good assessment design should ensure that tasks have clear criteria to allow excellent students to achieve 5.0 grades.

LSHTM Grade descriptors for assessed work
(Code of Practice 2012)
The School uses a standard assessment system, marking against six gradepoints: integers from 0 to 5 (these are equivalent to the old grades of A, B+, B, C, D and E). Grades 2 and above are pass grades (grade 5 can be seen as equivalent to distinction standard), whilst grades below 2 are fail grades.


Table 4.
LSHTM Marking Gradepoints descriptions (Overarching criteria)

Grade point | Descriptor | Typical work should include evidence of
5 | Excellent | Excellent engagement with the topic, excellent depth of understanding & insight, excellent argument & analysis. Generally, this work will be distinction standard. NB that excellent work does not have to be outstanding or exceptional by comparison with other students; these grades should not be capped to a limited number of students per class. Nor should such work be expected to be 100% perfect: some minor inaccuracies or omissions may be permissible.
4 | Very good | Very good engagement with the topic, very good depth of understanding & insight, very good argument & analysis. This work may be borderline distinction standard. Note that very good work may have some inaccuracies or omissions but not enough to question the understanding of the subject matter.
3 | Good | Good (but not necessarily comprehensive) engagement with the topic, clear understanding & insight, reasonable argument & analysis, but may have some inaccuracies or omissions.
2 | Satisfactory | Adequate evidence of engagement with the topic but some gaps in understanding or insight, routine argument & analysis, and may have some inaccuracies or omissions.
1 | Unsatisfactory / poor (fail) | Inadequate engagement with the topic, gaps in understanding, poor argument & analysis.
0 | Very poor (fail) | Poor engagement with the topic, limited understanding, very poor argument & analysis.
 | Not submitted (null) | Null mark may be given where work has not been submitted, or is in serious breach of assessment criteria/regulations.


Summative assessment combines these marks into non-integer gradepoint averages (GPAs) in the range 0 to 5, by averaging against relevant weightings. The School does not set any fixed percentage-to-gradepoint conversion scheme. Rather, the conversion should be done using a scheme agreed in advance by the relevant Board of Examiners that best fits the particular assignment or question. The approved conversion should appear in the marking pack for each assessment/question for which it is to be used. Table 5 below gives examples of three different percentage-to-gradepoint conversion charts.
Table 5. Examples of grade conversions used at the School.

Grade point | Example 1 (typical scheme): mark (%) | Example 2 (higher numeric pass threshold): mark (%) | Example 3 (lower numeric pass threshold): mark (%)
5 | 80-100 | 95-100 | 75-100
4 | 70-79 | 85-94 | 60-74
3 | 60-69 | 75-84 | 45-59
2 | 50-59 | 60-74 | 30-44
1 | 40-49 | 50-59 | 20-29
0 | <40 | <50 | <20
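A short sketch may help show how the same percentage maps to different gradepoints under the three example schemes, and how gradepoints are then combined into a weighted average. The scheme names, function names and the example weightings below are hypothetical; only the band boundaries come from the three examples in the table.

```python
# Lower bound of each band and the gradepoint it earns, highest band first.
SCHEMES = {
    "typical": [(80, 5), (70, 4), (60, 3), (50, 2), (40, 1)],
    "higher_pass_threshold": [(95, 5), (85, 4), (75, 3), (60, 2), (50, 1)],
    "lower_pass_threshold": [(75, 5), (60, 4), (45, 3), (30, 2), (20, 1)],
}


def to_gradepoint(percent: float, scheme: str) -> int:
    """Convert a percentage mark to a gradepoint (0-5) under the named scheme."""
    for minimum, gradepoint in SCHEMES[scheme]:
        if percent >= minimum:
            return gradepoint
    return 0  # below the lowest band


def weighted_gpa(gradepoints, weights) -> float:
    """Average gradepoints against their weightings to give a non-integer GPA."""
    return sum(g * w for g, w in zip(gradepoints, weights)) / sum(weights)


for name in SCHEMES:
    print(f"62% under {name}: gradepoint {to_gradepoint(62, name)}")

# Hypothetical paper with three questions weighted 50%, 30% and 20%.
print(f"GPA: {weighted_gpa([4, 3, 5], [0.5, 0.3, 0.2]):.2f}")
```

A mark of 62% is a 3, a 2 or a 4 depending on the scheme, which is why the approved conversion needs to travel with the marking pack for each question.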

Students should be made aware of the criteria on which all assessment tasks will be
marked, to improve their understanding of the standards expected of them.
The criteria used to place students in each grade category must be written down by
staff setting assessments, and adhered to by all those involved in the marking.


10. Ways of producing accurate and clear marking guidance for questions

Experienced question setters recognise the importance of writing exam questions
and their associated marking guidance together, and of seeing them as a single unit.
When you are considering the dimensions of a question, keep a separate sheet of
paper or screen open on which you can make notes about the answers the
question should elicit. Also make a note of any likely errors or misunderstandings the
students may make.
The value of good marking guidelines
Well-developed marking guidelines:

Help identify problems with assignments.
Confirm different types of possible responses to the assignment question(s) and what knowledge and/or skills are being tested.
Establish the necessary content for achieving different level marks.
Encourage consistency between marking team members.
Provide ideas and wording for constructive feedback to students regarding what would have constituted a good or better response.

Marking guidelines should be based directly on the Assessment criteria. For some
modules, such as those that are quantitative in nature, there is probably a need for
model/specimen answers, in addition to or instead of marking guidelines.
The Assessment criteria will serve as the basis for the development of the marking
guidelines. For each criterion I suggest that you initially think about the major steps
in the continuum of student achievement i.e. what do you expect from a Pass
answer at a 50% grade level and what would you expect of a Distinction answer?
Firstly, for each criterion, consider carefully what you expect students to have written
to achieve a passing mark for this criterion. Draft a detailed description of the content
and quality that markers should evaluate, in addition to what has been included in
the assignment instructions. Ask yourself: What would comprise the minimum of
what I would expect the student to have written for this section, or about this subject,
to achieve a passing mark? This description or set of required
concepts/ideas/issues/definitions will serve as the basis for a grade of 2.
Once the basic expectations for a 2 grade have been drafted in association with the
original criteria, it is then necessary to describe what additional level of content
and/or quality would achieve higher marks (3, 4, 5). Please draft descriptions of what
components might achieve the different possible higher marks (3, 4, 5).
(A note on Distance Learning assessments)
N.B. Remember that many students do not have access to other facilities, such as
libraries, so the student must be able to respond to the question by reference to the
study materials ONLY and still achieve a high mark.
It should be possible to obtain a 5 by original and creative use of nothing more than
the materials provided. All instructions should be devised to allow scope for
imaginative input and cross referencing from students who have access to nothing
more than the course materials. For a 5 grade in particular, it is original thought, not
extra facts, that would contribute.
You may well find that, depending on the nature of your course, module or subject
area, one type of criterion tends to take precedence in differentiating the
marks. For example, in a strongly practice-based, professional course, the quality
and authenticity of reflective practice may be a priority criterion. In courses
concerned with exploring the impact of public policy decisions and practices the lead
criteria may be those emphasising the application of key principles and the analysis
of outcomes. If there are lead criteria, then a transparent approach would be to
emphasise these in advance to students both within the teaching and the
assessment design. There should also be links made between the criteria and the
intended learning outcomes that help to show students where the emphasis lies.
Finally, based on the basic criteria for a passing mark (2), draft a list of fundamental
omissions or errors that would result in a 1 or even a 0 (fail).

A question to challenge yourself with: does a grade of 'Outstanding' actually equate to 'Impossible to achieve'?

This is particularly important if you are likely to be assessing essay style questions
rather than numeric or quantitative questions. It is possible to score 100% in a
calculation answer and virtually impossible to score more than 80% in a discursive
essay style answer.
You have to give your students the opportunity to excel, so you need to
consider how your more able students can demonstrate their additional qualities,
creativity or more in-depth knowledge or understanding to you. This is often
difficult to achieve: the question design must incorporate an opportunity to
differentiate between your able and excellent students.


In Summary: Fundamental points in drafting marking guidance

1. Give precise descriptions of what is required for a minimum pass grade
(2). This should include details of the key elements or standards the student
needs to achieve a passing grade.
2. Provide descriptions of what could be added to the minimum required
response that would result in the student achieving a higher grade
(3=good, 4=very good, 5=excellent).
3. Include a description of omissions or errors that would define a
1=unsatisfactory (fail) or 0=fail grade.
4. Write a description of the elements expected for each question component or
part of a question set in an assignment. This not only helps markers to recognise
what the question-setter expects by way of an answer and how to grade it,
but also enables the external examiners to understand why different grades
were awarded.
5. Write guidance that allows for some degree of markers' discretion.
Leave room to offer marks for, e.g., originality, reading beyond the subject
area and creative thinking, good use of examples, clear description of
reasoning. This might include points for the integration of concepts or
methods from other study module materials.
6. For questions where a numerical mark is to be given, please give a
definition of how many marks can be awarded for each question or part
of question (if not already designated in assessment criteria). This point
system should allow for some discretion for original thought, where relevant.
If the numerical answers to a question are sequential (i.e. they use numbers
calculated in earlier parts of the question), the marking guidelines should state
that marks should be given for process rather than purely the correct answer.
Adapted from Guidance for Drafting Assignment
Instructions and Marking Guidance, Public Health and
Health Service Management Distance Learning Courses
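
Point 6 (marks for process in sequential numerical answers) can be made concrete with a small sketch. It assumes a hypothetical split of 3 marks for a correct answer and 2 marks where the student's method is right but an earlier error has been carried forward; the function names, the mark split and the tolerance are all illustrative assumptions, not School rules.

```python
import math


def relative_risk(risk_exposed, risk_unexposed):
    """The method being tested in the dependent part of the question."""
    return risk_exposed / risk_unexposed


def mark_dependent_part(student_inputs, student_answer, correct_answer, method,
                        full_marks=3, process_marks=2, rel_tol=0.01):
    """Mark a part of a question that uses numbers from earlier parts.

    student_inputs: the student's own (possibly wrong) earlier results.
    method: the calculation the student was expected to apply.
    """
    # Fully correct final answer: award full marks.
    if math.isclose(student_answer, correct_answer, rel_tol=rel_tol):
        return full_marks
    # Wrong answer, but consistent with the student's own earlier figures:
    # the process was right, so award the process marks.
    carried_forward = method(*student_inputs)
    if math.isclose(student_answer, carried_forward, rel_tol=rel_tol):
        return process_marks
    return 0
```

As a usage example with made-up figures: if the correct risks are 0.50 and 0.25 (relative risk 2.0), a student who earlier miscalculated the first risk as 0.60 and then reported 0.60/0.25 = 2.4 would receive the 2 process marks rather than 0.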


Here are a couple of examples showing how the marking guidance gives clear links
to the grading structure and differentiates between the possible grades.

Example 1.
Question
Discuss what is meant by the term epidemic. Describe the main features of an
epidemic curve. Identify the main types of epidemic, giving examples.

Marking Guidance
(Based on Teaching Session 3 and the Webber book chapter 2)

Grade 3 answers should contain the following points:

Epidemic: an excess of cases in the community over that normally expected.
Characteristics of an epidemic: latent period, incubation period, period of communicability.
Common source (point source/extended source) and propagated source epidemics.

A Grade 2 answer would have some omissions.
A Grade 1 answer would have serious errors of interpretation.
Grade 4/5 answers should provide a comprehensive discussion of the topic including relevant
examples.

Example 2.
Question
What has been the impact of HIV on the epidemiology and control of TB?


Marking Guidance
(Based on Section 2 Teaching Session 3 and the TB/HIV clinical manual)

A Grade 3 answer should provide basic information on the epidemiology and control
of TB, including:

Infectious agent: Mycobacterium tuberculosis.
Transmission person to person: a person with pulmonary TB coughs and produces infectious droplet nuclei.
Transmission generally occurs indoors, as direct sunlight kills tubercle bacilli.
Progression to disease is higher in children and people with immunodeficiency.
Main strategy of TB control is to detect and cure cases of pulmonary TB.
DOTS strategy.

And an indication of how this changes in populations affected by HIV:

HIV is driving the TB epidemic in many countries, especially in sub-Saharan Africa and increasingly in Asia and South America.
People infected with HIV are at increased risk of developing TB.
Extrapulmonary and smear-negative pulmonary TB cases, which are more difficult to diagnose, account for an increased proportion of total cases.
More adverse drug reactions.
Risk of TB recurrence is higher.
Diagnosis more difficult, especially in children.
Control: TB and HIV/AIDS share mutual concerns. Prevention of HIV should be a priority for TB control; TB care and prevention should be a priority of HIV/AIDS programmes.
HIV exposes any weaknesses in TB control programmes.
Rise in TB suspects puts strain on diagnostic services.
Stigma associated with HIV/AIDS can affect uptake of TB services.


A Grade 2 answer may include some of these points, or alternatively all of these
points but with insufficient discussion.
Grade 1 and below would include some of these points but with significant errors of
interpretation.
Grades 4/5 answers will be an intelligent structured discussion of how HIV impacts
the epidemiology and control of TB including other relevant points in addition to the
ones listed above.

It is interesting to note that in both these examples the assessor has chosen to
provide a description for a Grade 3 answer first, describing a point near the middle
of the grade-scale (the peak of the normal distribution) before going on to relate
higher (4/5) and lower (2/1) scoring grades to this mid-point.

EXERCISE
Consider an examination question that you have written or are currently in the
process of drafting. Produce some marking guidance for the question that provides
clear descriptions that differentiate between the Grades (0 to 5).
Think about which point on the grading scale you find it easiest to begin with.

Quantitative Question Formats and Marking Guidance


In section 7.4 I gave an example of a problem-solving question that had a
number of sub-sections and itemised tasks within it. A set of Marking Guidance
for that question (about school children playing in a fountain) is provided here as an
example of guidance for a quantitative question.

Example 3.
Question
On a hot summer day, children in three schools had a school outing to a playground
where some of the children played in the recreational fountain. Two days later nearly
half the children had symptoms of vomiting, diarrhoea, abdominal pain and
headache. A retrospective cohort study was carried out to try to identify the source
of the outbreak with the following results.
                                               Exposed to          Not exposed to
                                               risk factor         risk factor
Risk factor                                    Ill     Not ill     Ill     Not ill
Ate commercial ice-cream                       72      76          17      22
Drank water from taps near fountain            3       4           78      89
Drank water from taps near sanitary facility   18      32          68      64
Played in fountain                             87      25          4       24
Drank from fountain                            80      15          19      75

(a) Define what is meant by the risk and relative risk of becoming ill associated
with each factor (10 marks).
(b) Calculate BOTH the risk and relative risk associated with each factor (30
marks).
(c) Suggest possible interpretations of the results, and the implications for control
recommendations (10 marks).
Marking Guidance
Risk = number of exposed children who were ill / total number exposed
Relative risk = risk in exposed / risk in unexposed
Give 5 marks each for these definitions: total 10 marks.
Give 3 marks for each correct risk and 3 marks for each correct relative risk:
total 30 marks.
Main risk factor is playing in the recreational fountain. This suggests that
the source of the outbreak is water in the fountain, possibly indicating
faecal-oral transmission. Water in the fountain should be tested regularly
for relevant bacteria and viruses (e.g. E. coli, salmonella, norovirus) and
should be monitored to ensure that adequate levels of chlorine are
present in the water. Alternatively, children could be prevented from
playing in the fountain (however, on a hot sunny day it may be difficult to
keep them out of the water!). Up to 10 marks for that or similar relevant
comment.
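
A worked calculation can usefully sit alongside guidance of this kind so that all markers check answers against the same figures. The sketch below applies the two definitions above to the cohort data. The counts are taken from the results table in the question, reading each row as exposed (ill, not ill) followed by not exposed (ill, not ill); that reading, and the function names, are assumptions made for this illustration.

```python
# Counts per risk factor:
# (exposed ill, exposed not ill, not exposed ill, not exposed not ill).
# Assumed reading of the results table in the question.
COUNTS = {
    "Ate commercial ice-cream": (72, 76, 17, 22),
    "Drank water from taps near fountain": (3, 4, 78, 89),
    "Drank water from taps near sanitary facility": (18, 32, 68, 64),
    "Played in fountain": (87, 25, 4, 24),
    "Drank from fountain": (80, 15, 19, 75),
}


def risk(ill, not_ill):
    """Risk = number ill / total number in the group."""
    return ill / (ill + not_ill)


def relative_risk(exp_ill, exp_not_ill, unexp_ill, unexp_not_ill):
    """Relative risk = risk in exposed / risk in unexposed."""
    return risk(exp_ill, exp_not_ill) / risk(unexp_ill, unexp_not_ill)


def summarise(counts):
    """Return {factor: (risk in exposed, relative risk)} for every factor."""
    return {
        factor: (risk(a, b), relative_risk(a, b, c, d))
        for factor, (a, b, c, d) in counts.items()
    }


if __name__ == "__main__":
    for factor, (r, rr) in summarise(COUNTS).items():
        print(f"{factor:<46} risk = {r:.2f}   RR = {rr:.2f}")
```

On this reading of the table, playing in the fountain has the largest relative risk (about 5.4), followed by drinking from the fountain (about 4.2), which is consistent with the interpretation given in the guidance.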

The investigators wanted to identify the infectious agent involved. One possibility
they considered was norovirus which is known to cause acute gastroenteritis.
Although the reverse transcription-PCR (RT-PCR) method is considered to be the gold
standard for diagnosis of this viral infection, it requires skilled personnel and a
well-equipped laboratory. A simpler diagnostic kit has been developed. The following
table shows how the simpler diagnostic kit compares to the gold standard.

                         Gold standard
Diagnostic test          Norovirus present    Norovirus absent
Norovirus present        37                   3
Norovirus absent         13                   47

(d) Would you advise the investigators to use the simpler diagnostic test in their
epidemiological survey? Would your recommendations change if the simpler
diagnostic test were to be used in clinical practice? Justify your answer. (50
marks)
Marking Guidance
Sensitivity = 37/50 = 74%
Specificity = 47/50 = 94%


Give 5 marks each for calculation of sensitivity and specificity (10 marks).
Discussion of whether or not to use the test in (i) epidemiological survey
or (ii) clinical setting: up to 40 marks for answers that identify the key
requirements of a diagnostic test in the two situations and use
information from the calculation of sensitivity and specificity correctly.
Some of the following points may be included in the answer:
Possible implications of missing true cases (1 in 4 true cases will be
missed) and of a positive diagnosis in true negatives (6/100 people without the
disease will be diagnosed by the test).
(i) Epidemiological survey: simpler diagnostic test will be adequate to
identify the outbreak. Do not need to identify all cases to recognise that
this is an outbreak. As large numbers are to be tested, consider
cost/resources/time savings of using the simpler test.
(ii) Clinical practice: what are the implications of missing 1 in 4 true
cases? As treatment is non-specific (rehydration therapy), a false-negative
diagnosis with respect to norovirus would be unlikely to affect the outcome
of the illness in the individual. However, consider whether other
investigations may be undertaken in people with symptoms who have
tested negative for the disease. Also, as norovirus is known to be very
infectious, consider impact on behaviour of having a diagnosis of the
disease. Also identification of contacts. Less issue for time/resources in
the clinical setting, so it may be better to go for the gold standard test.
Other factors: cost, resources, time (up to 10 marks).
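
The sensitivity and specificity figures in this guidance can be checked with a short calculation. This is a minimal sketch using the four cell counts from the comparison table in part (d), with the gold standard taken as the truth; the function and variable names are chosen for the illustration.

```python
def sensitivity_specificity(true_pos, false_neg, false_pos, true_neg):
    """Return (sensitivity, specificity) of a test against a gold standard.

    sensitivity = true positives / all gold-standard positives
    specificity = true negatives / all gold-standard negatives
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Cell counts from the table in part (d):
# gold standard positive: 37 test positive, 13 test negative
# gold standard negative:  3 test positive, 47 test negative
sens, spec = sensitivity_specificity(true_pos=37, false_neg=13,
                                     false_pos=3, true_neg=47)

print(f"Sensitivity = {sens:.0%}, specificity = {spec:.0%}")
print(f"True cases missed = {1 - sens:.0%}, "
      f"non-cases testing positive = {1 - spec:.0%}")
```

The second line of output corresponds to the "1 in 4 true cases missed" and "6/100 people without the disease" points that the discussion marks are looking for.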

11. Ways of producing model / specimen answers suitable for distribution to students
Providing students with past papers and specimen answers before an examination is
one way of providing transparency and clarity in what is expected and valued in an
answer. Providing students with a specimen answer after their papers have been
marked also helps them to review their own learning and acts as a form of feedback.
It also helps to tailor any individual feedback to the particular needs of a student (e.g.
in tutorials) rather than having to cover everything generically.
Some teachers worry that providing a model/specimen answer can have a
reductionist impact on learning and in some way act to limit the scope and
individuality of students in responding to questions. This may be a valid concern if
the answers expected require students to come to their own conclusions and argue a
particular perspective, rather than processing information which in many ways can be
considered factually right or wrong or where there is an acknowledged best
interpretation. An assessor needs to consider what can and should be conveyed in a
model/specimen answer, and this will strongly influence the style of specimen answers
that should be provided. For example, where the answer conveys correct
interpretations or knowledge of undisputed facts, specimen answers can take the
form of brief, summary note answers that identify key pieces of knowledge or
explanations of why a particular answer is correct (or more correct than others).
However, if answers are expected to use evidence or explain with reference to the
literature, the specimen answer provided should seek to model good practice in
these academic skills whilst also emphasising that there may be other ways of
achieving positive results. In very open-ended response questions it may be best to
provide brief outlines for two or three different possible interpretations and
arguments. This can be particularly useful in a feedback mode of
presentation in which students come, review and then discuss the different
approaches taken, thus encouraging students to find their own voice.

You may like to refer to the extended case study provided in Appendix 3 that takes
you through the steps of exam question and marking guidance development together
with extracts from the module team discussions.

12. Exam question development, validation and approval processes


The development and approval of questions is the responsibility of the course team.
The process usually starts towards the end of the Autumn term, because refining
questions and marking guidance collaboratively takes a good deal of time to do
well.
You have an opportunity now, if you wish, to review an extended case study showing
the approach adopted by one course team and showing the development process for
one question.
Please refer to the extended case study provided in Appendix 3 to see the process
by which questions are produced by module teams.
This extended case study, based on a real example, aims to show the stages
of development that the question went through and reflections on the process
made by the course team (shown in comment boxes).
The case study includes the following sections:
3.1 Question Background
3.2 A Work in Progress (presented in four steps)
i) An Early Draft with Feedback (Autumn Term)
ii) The Question Amended after Feedback from the Exam Chair (July)
iii) Some Fine Tuning (Final Version)
iv) A Completed Work? (Some reflections on the use of the question)
3.3 A reflective exercise


Following the initial grading process, Module Organisers
should look at the distribution of grades for the particular
Module. If this deviates significantly from past performance
or appears to differ significantly from other grade
distributions at Course, Faculty or School level, this should
be considered in more depth to confirm that the marks given
are indeed in line with School criteria. In some cases, Module
Organisers may wish to recommend re-marking, procedures for
which are detailed in the Guidance Notes for Boards of
Examiners.
LSHTM Assessment Code of Practice
(2012)

13. Security issues


It is important to be aware of information security matters when handling exam
questions. Please follow the procedures set for your course carefully.
E.g.:

password protect files,
place in electronic but secure tutor area,
hand-deliver rather than send in unsealed envelopes, or at least seal and mark as confidential to the named recipient,
be careful not to leave overhead transparencies on the projector, etc.

It is also important to remember that any grade divulged before the final meeting of
the Board of Examiners is a provisional grade, subject to external review and may be
amended at the discretion of the examiners.

Appeals
When thinking about the way we write examination questions and conduct
summative assessments it is worthwhile remembering that candidates may appeal
against a result where there is concern that the examination has not been conducted
in accordance with School policies and procedures. However, the University of
London does not allow appeals on purely academic grounds, such as challenging
the interpretation of a concept or principle.


14. Providing support and guidance for students
(formative assessment / practice opportunities)

Strategies to support students are usually based upon two guiding principles:
a) Transparency: students knowing how and when their learning achievements
are going to be judged and evaluated from the outset of their studies;
b) Providing opportunities to practise and rehearse the ways in which they will be
required to demonstrate their learning achievements.

A common approach used in the School is to provide examples of past papers and
examiners' reports so that students can see the process of assessment clearly.
It is also desirable to provide opportunities for students to experience assessment
forms and formats before they 'count'. Building mock examinations into the module
or course and giving students feedback on their approach and success is one way
that this can be done. Having formative assessment that mirrors the summative
assessment can also be helpful. This is especially true for students at the School
who may have had very diverse experiences of education and assessment
processes prior to their Masters courses either in London or by DL.
The School has produced some guidance on the delivery of feedback to students
after formal course work assessments. This particularly highlights the need for
clarity, transparency and speed of feedback turnaround time (see below).
However, I would also like to emphasise the need to provide constructive feedback
on the formative, practice or mock assessments that are part of the teaching units
at the School. Feedback here needs to be focussed on helping the students to do
better next time or, to coin a phrase, to 'feed forward'.
It may be helpful to keep in mind that ultimately learning is a transformative
process, personal to the individual, that isn't confined by or restricted to set points
of assessment. Marks provide useful measures and milestones, particularly within
formal course structures, but we also want our students to understand that learning
is life-long, and to develop the skills needed to become sophisticated life-long
learners.

EXERCISE
How can you provide support for your students as they prepare for and
participate in examination assessments?


Feedback of progress to students
(Assessment Policy 2009)
The system of feedback should be made clear to students before
they undertake their first piece of assessment.

For courses taught face-to-face in London, this may be
set out in course handbooks and should be explained or
restated by the Course Director at an appropriate
point.

For courses delivered by Distance Learning, details of
what students can expect will be made clear in course
and module handbooks or other materials as appropriate.

For Course work components - Students should be given feedback
on their progress within a defined time period, measured in
weeks, during which double-marking takes place, feedback is
written, and grades and feedback are passed to Course
Administrators in the Teaching Support Office or DL Office to
send to students. Feedback will consist of full comments on
the piece of work plus a grade, the former being used to give
informative guidance to the student on progress made.

For courses taught face-to-face in London, the standard
turnaround time for marks to be agreed and feedback
given to the student is within either three weeks of
the deadline for handing in the work in term time, or
the end of the first week of the next term, whichever
is later.

For courses delivered by Distance Learning, turnaround
times for marks to be agreed and feedback given to the
student may be more variable; but clear guidance on the
timeframe within which this should happen will be given
to all marking and administrative staff.


Feedback on Examinations
School policy is that for coursework and project reports, students should receive
individual feedback to aid their learning. For the June exams, students receive their
grades. For DL courses, Examiners' Reports for Students are prepared, setting out
expectations with reference to marking schemes.
Here is an example of an Examiners' Report for Students that shows variety in
feedback depending on question type. Please note where the Examiner has
identified what would be required for a sound pass and what would be expected of
an answer awarded the higher grades.

ID2 Examiners' report for students

Question 1
Overall, all the sections of question 1 were well answered by the candidates who
attempted this question. The standard of the answers was high and showed depth of
understanding. All major points were covered from questions 1a-1e. The points that
were expected to be included to gain a good mark are detailed below.
a) Gram stain
Expected: Differential staining technique that distinguishes bacteria based on cell
wall structure. Description of stain (crystal violet and iodine) and staining technique
(acetone decolourises Gram-negative cells). Gram-positive cells stain purple,
Gram-negative cells stain pink. Gram stain also allows shape of bacteria to be noted,
i.e. round versus rod-shaped.
b) Lipopolysaccharide
Expected: Cell wall of Gram-negative bacteria. Structure: Lipid A (toxin activity),
conserved core polysaccharide (KDO), O-specific side chain polysaccharides
responsible for many serotypes, for example in salmonella. Endotoxin. Toxic shock
syndrome.
c) Bacillus anthracis
Expected: Gram-positive. Spores. Aerobe. Toxins: edema factor/lethal factor,
capsule. Anthrax: Lung or skin infection. Zoonosis. Biological warfare. Ciprofloxacin,
penicillin.
d) Treponema pallidum
Expected: Spirochete. Motile, Gram-negative. Cannot be cultured.
Immunofluorescence test. Syphilis, STD. Three stages: primary, secondary, tertiary.
HIV associated.
e) Corynebacterium diphtheriae
Expected: Gram-positive. Non-motile. Blood tellurite agar (black colonies). Childhood
infection of upper respiratory tract. Aerosol. Toxin (tox gene; regulated by low
iron concentrations). Vaccination against toxin; penicillin kills bacteria but does not
inactivate toxin.

Question 2
For a safe pass, the student should have discussed that N. meningitidis is a
Gram-negative diplococcus, non-motile, and lives in a
certain percentage of upper respiratory tracts
within the population. They cause meningitis and
other diseases / symptoms by crossing the blood brain barrier through the same
path, which neutrophils use. As virulence factors, they have pili and fimbriae to
attach, endotoxins causing inflammation to help entry, capsules to interfere with
complement attack as well as phagocytosis, killing and degradation by
macrophages/neutrophils, and IgAase to neutralize IgA. They can be typed by
several capsule serotypes, which are not all covered by the available vaccine.
Diagnosis needs to be very fast, since the most affected ones are children and
teenagers, who can succumb to the disease rather fast. Antibiotic therapy needs to
be started quickly. It would have been excellent to name a few relevant antibiotics.
For diagnosis, growth test using CF and blood on chocolate / blood agar, and test for
sugar usage, and the latex agglutination test should be mentioned, as well as other
possible tests including PCR. The more details the better the score.
Comment (M1): This is helpful for the student to know what is a safe pass

Question 3
For a safe pass the student should have named 2 zoonotic infections such as
brucellosis, salmonellosis (Salmonella typhimurium), listeriosis (M. bovis),
leptospirosis, psittacosis, tularaemia (Francisella), anthrax (Bacillus anthracis),
Coxiella (Q fever), Lyme disease and so on, so lots of choices.
Better grades could have been achieved by describing their reservoirs, life cycles
and diseases in detail as well as how they can be
controlled. Some of them have a more
complicated life cycle and are transmitted by vectors (Borrelia, Coxiella, Y. pestis);
some come from specific hosts (M. bovis from ruminants, Leptospira from rats); B.
anthracis makes spores and is therefore difficult to eliminate by simple disinfection,
and the cadavers need to be incinerated. If these issues were detailed, students
would have scored high marks.
Comment (M2): This is good


Question 4.
This question is based on the paper in your reader (Bahl et al). The questions help
you understand and interpret the data that are given in the tables, and help you
follow the discussion of the data by the authors.
Comment (M3): This is helpful for students

a) First of all, always read the titles/headers of tables carefully, because these
tell you what exactly is presented in the table: what is measured and how, what the
numbers mean, etc. The two tables give you different information: table 1 counts
episodes, and therefore gives you incidence, whereas table 2 gives prevalence, that
is days-with-disease during the observation period. Looking at risk for diarrhea, we
see in table 1 that children with low plasma zinc are at increased risk because there
is a higher incidence of diarrhea with a significantly higher RR (Relative Risk,
significant when the confidence interval does not contain 1): 1.47 (1.03, 2.09). There
is also a significantly higher risk for severe diarrhea (1.70), but the RR for prolonged
diarrhea is not significantly different (RR of 2.54, but confidence interval contains 1).
This is further supported by the prevalence data in Table 2, where we see that only
the diarrhea with fever (= more severe diarrhea) is significantly more frequent in the
children with low plasma zinc. There is no significant difference in the prevalence of
the other morbidities between the children with low and with normal plasma zinc (see
the P-values in the table).
b) First, read carefully. On what data are these statements based? In table 1 you
can see that the number of episodes of ALRI was not different between the groups
(Confidence Interval contains 1). However, the total number of days with ALRI, as
presented in Table 2, was significantly higher in the children with low plasma zinc.
Therefore, one has to conclude that there must have been more days per episode in
the children with low plasma zinc.
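
The rule of thumb used in this report (a relative risk is significant when its confidence interval does not contain 1) can be expressed as a short check. This is a minimal sketch; the function name is an assumption, and the second interval in the usage note is a made-up illustration, not a figure from the paper.

```python
def excludes_null(ci_lower, ci_upper, null_value=1.0):
    """Return True when a confidence interval does not contain the null value.

    For a relative risk the null value is 1 (no difference between groups).
    """
    if ci_lower > ci_upper:
        raise ValueError("lower limit must not exceed upper limit")
    return not (ci_lower <= null_value <= ci_upper)
```

For the diarrhea incidence quoted above, RR 1.47 with interval (1.03, 2.09), the check returns True, so the result is treated as significant. A hypothetical interval such as (0.80, 1.60) contains 1, so the check returns False.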


15. Concluding Remarks


It is sincerely hoped that this workbook has provided the necessary guidance and
information you need to be able to produce demanding but fair examination
questions and their associated marking guidance and assessment criteria, for your
modules.

To conclude, here are a few summary points.

The process

Start early: the process of innovating, drafting, gaining feedback, re-drafting etc. takes time, and it is important to begin thinking about examination questions in good time.
Collect all relevant documents together before you start, e.g. the learning outcomes for your module, standard grading schedules and criteria, any past examination papers you have from recent years, any feedback from internal and external examiners etc.
Produce examination questions and marking guidance in parallel: thinking about what you expect from your students will help you to clarify your questions.
Think about who can help you fine-tune your draft questions. Colleagues can see things with fresh eyes and help you avoid ambiguity; they can also give you feedback on how long the question will take the students to answer.

The examination questions

• Questions that are divided into discrete sub-sections and are accompanied by
  their associated marking schedules have many benefits for both students and
  markers, providing clarity in presentation and grading reliability.
• Include data or information in the question to reduce the emphasis on memory
  and increase the emphasis on application and critical thinking.
• Check that your draft question does not favour or disadvantage students from
  particular backgrounds or cultures.
• Keep sentences short, keep the layout clear and well spaced out, and use
  precise and unambiguous language.
• Check that the question standard and assessment criteria are at Masters
  level.
• Check that the question enables students to excel and allows markers to
  discriminate between able and excellent performances.


The marking guidance

• Marking guidance can take many forms: it could be a model/specimen
  answer, a list of elements that should or could be included in an answer,
  a worked calculation etc.
• However, marking guidance should go beyond lists of content or topics and
  should also address format, analytical level, originality etc.
• Guidance should include a marking scheme that discriminates between
  possible grades and is clear to all co-markers.

Other forms of assessment


This workbook has focussed on the challenge of writing exam questions. However,
many of the principles and good practices highlighted are equally applicable to the
design of in-course assessments such as assignments, reports and projects.
Where there is more than one assessment task for a module or course, it is important
to ensure that certain learning outcomes are not over-assessed whilst others are
neglected.


Further Reading Suggestions


Bloom, B. S., Krathwohl, D. R. and Masia, B. B. (1956) Taxonomy of Educational
Objectives: The Classification of Educational Goals. New York, NY: D. McKay.
Brown, G., Bull, J. and Pendlebury, M. (1997) Assessing Student Learning in Higher
Education. Routledge.
Crowe, A., Dirks, C. and Wenderoth, M. P. (2008) Biology in Bloom: Implementing
Bloom's Taxonomy to Enhance Student Learning in Biology. CBE Life Sci Educ 7(4):
368-381. American Society for Cell Biology. Available at
http://www.lifescied.org/cgi/content/full/7/4/368
Coutinho, S. A. (2007) The relationship between goals, metacognition, and
academic success. Educate 7, 39-47.
Entwistle, A. and Entwistle, N. (1992) Experiences of understanding in revising for
degree examinations. Learn. Instruct. 2, 1-22.
Haines, C. (2004) Assessing Students' Written Work: Marking essays and reports.
Key guides for effective teaching in higher education. RoutledgeFalmer.
McMillan, J. H. (2001) Classroom assessment: Principles and practice for effective
instruction. Boston: Allyn and Bacon.
Pass-it Good Practice Guide.
http://www.pass-it.org.uk/resources/031112-goodpracticeguide-hw.pdf
Piontek, M. E. (2008) Best Practices for Designing and Grading Exams.
Centre for Research on Learning and Teaching, Occasional Papers no. 24,
University of Michigan. http://www.crlt.umich.edu/publinks/occasional.php


Useful web-sites
Center for Instructional Development and Research
Resources: Writing Exam Questions
A collected set of web-links and guidance sites on writing exam questions. Most from
institutions in the USA. Lots of information on writing MCQ Questions and comparing
them with other forms of written assessments.
http://depts.washington.edu/cidrweb/resources/exams.html

Key School documents used as reference material in this Workbook
(and where to find them)
LSHTM Assessment Code of Practice (January 2012)
http://www.lshtm.ac.uk/edu/taughtcourses/staffresources/index.html
Guidelines for writing exam questions
http://www.lshtm.ac.uk/edu/taughtcourses/exams_assmt_staff/index.html
Assessment Irregularities Procedures
http://www.lshtm.ac.uk/edu/taughtcourses/handbooks_regs_pols/index.html
MSc Marking Scheme
http://intra.lshtm.ac.uk/registry/regulations/taught_regulations/index.html


Appendices
Appendix 1.
Feedback comments are inserted in bold below each question.

EXERCISE
For the questions given below, underline the verb and the key elements of the question that give an indication of
the extent (limits and boundaries) of the question.
Do you feel these are appropriate for Masters level study?

1.
Describe the three main methods of economic evaluation (40%). What are the
main strengths and weaknesses of each method? (40%). Support your answer with
examples of disease evaluation (20%)
Describing is a relatively low level cognitive skill, but then the student is
asked to evaluate the three methods by giving strengths and weaknesses;
this is the Masters level task in this question.
Factors that give limits are the requirement to describe three methods and to
support the answer with examples.

2.
A recent retrospective analysis of health records in the Gambia has
suggested that the incidence of malaria has fallen dramatically in that country over
the last 10 years. The elimination of the disease is beginning to be discussed. The
National Malaria Control Programme has begun a surveillance system to detect
future changes.
What advice would you give the National Malaria Control Programme on how to
organize a surveillance system for malaria? Give practical tips for ensuring its quality.
Giving advice requires the students to select from and apply their knowledge
in order to synthesise an appropriate surveillance system; this is Masters
level. Students are also asked to consider what makes such a system a
quality one; this could be considered a further degree of difficulty. The limits in
this question are given by the scenario of the question, which makes it specific
to a country and a disease context.


3.
Write short notes on THREE of the following. In each case explain the
importance of the infectious agent and the mode of transmission in its spread and
control.
a) rotavirus diarrhoea
b) measles
c) guinea worm
d) dengue
e) tuberculosis

This question does not clearly articulate Masters level requirements, as
'Write short notes' does not indicate a level and 'Explain the importance'
may or may not require some level of evaluation and critique but could equally
be a measure of memory, depending on what had been taught in the module.


Appendix 2.
A detailed example: re-writing and formatting a question to ease interpretation
(related to chapter 6)
This example has kindly been provided by the teaching team responsible for one of
the DL programmes delivered by the School: Fundamentals of Clinical Trials. It
shows clearly the way a set of guiding principles is used to mould a clearer
question context, from a great idea to a very demanding but fair question set-up.
The team wanted to write a question that tested their students' abilities to think about
and apply key concepts rather than re-work the study materials provided. Past
experience had underlined the importance of providing relevant and realistic
question contexts, and considerable effort is made to vary the scenarios used in
question setting.
What is presented here is the first draft of the question, some team discussion
notes and then the final question as it was used to assess the DL students.

An Early Draft Of Question Y Context
(including feedback notes from members of the module team presented in boxes)
One of the concerns in the treatment of babies born to HIV+ mothers in developing
countries is the transmission of HIV from mother-to-child during breast-feeding.
Infant formula, if used safely and consistently can prevent HIV from passing from
mother to child but can result in increased infant mortality.
In Botswana, free formula is provided and recommended for babies born to
HIV-infected mothers who are able to safely and consistently formula feed their infants.
Despite the availability of milk powder, a number of HIV-infected mothers continue to
choose breastfeeding. This may be due in part to difficulties in consistently preparing
safe formula feeds and also the stigma associated with being seen to formula feed.
For these HIV-infected mothers exclusive breastfeeding is recommended with early
weaning.
To investigate potential strategies to prevent mother-to-child transmission of HIV
the following randomized control trial (RCT) is planned:

Comment (D1): I think we can make each comment a bit snappier using the
headings of a protocol document? e.g. recruitment criteria, exclusion criteria

Recruitment Criteria: Pregnant HIV infected mothers, who are currently not
receiving antiretrovirals (ART) and who plan to breast feed
Randomisation: Women to be randomized into 2 groups.


Both groups of women receive antiretrovirals as per current standard of care during
pregnancy to reduce the risk of HIV passing from the mother to the child, and their
infant takes 1 month of prophylactic antiretrovirals following delivery.
Comment (D2): Don't know what to call this? This is ethically led as well as
baseline driven???
Interventions for comparison:
o Group One: mothers discontinue ARVs after delivery (unless ARV needed for
their own health)
o Group Two: mothers continue ARVs for 6 months after delivery
The primary endpoint: the proportion of babies alive and uninfected with HIV by
7 months of age.
Secondary endpoints: include cumulative HIV free survival at 7 and 18 months
and safety of maternal ARV prophylaxis for HIV exposed infants.
Comment (D3): I brought this one up from down below!!!

Teaching Team comments: We were worried about this being a more difficult context
to grasp (i.e. the intervention is given to the mother but the impact of the
intervention is on the infant, the primary outcome that we are monitoring is HIV
survival in the infant) for those whose first language is not English and those who
do not have specialist knowledge of HIV.
However, we felt that this context was in keeping with the level of understanding we
would expect from students enrolled on the module. It is also reflective of a study
based in a developing country. Thus we felt that with re-presentation of the context,
using a table form, it would be easier to understand.
Comment (D4): Let's talk over the phone because I still think we have some
way to go with setting up this context. We need to make it as clear as possible
for the students and this is a complicated trial!
b. Suggested Improvements to Question Y Context
HIV can be transmitted from mother to child during pregnancy, during delivery, and
after delivery, through exposure to HIV via breast milk. Prevention of mother-to-child
transmission (PMTCT) of HIV therefore focuses on interventions that reduce
the risk at each of these times. A research team wishes to determine the efficacy
and safety of adding maternal antiretroviral prophylaxis during breastfeeding to
the current local standard of care, for PMTCT of HIV. The current standard of care
includes antiretroviral prophylaxis for the pregnant mother and one-month of
prophylaxis for the infant after birth. Having identified their primary outcome as infant
HIV-free survival at 7 months, they plan the following randomised control trial (RCT):


Table 1: Proposed post randomisation* treatment of the intervention and control arms

| | During pregnancy and labour/delivery: Maternal Standard of Care | After Delivery/During Breastfeeding: Maternal Intervention | After Delivery/During Breastfeeding: Infant Standard of Care |
| Intervention Arm | Maternal antiretroviral prophylaxis from 28 weeks gestation | 6 months maternal antiretroviral prophylaxis | 1 month infant prophylaxis |
| Control Arm | Maternal antiretroviral prophylaxis from 28 weeks gestation | Usual care, i.e. without 6 months maternal antiretroviral prophylaxis | 1 month infant prophylaxis |

*Randomisation: Pregnant women are randomised 1:1 to the intervention or
control arm


Appendix 3
Extended Case Study showing the Development of a real Exam Question
- CT101 Fundamentals of Clinical Trials

This extended case study, based on a real example, aims to show the stages
of development that the question went through and reflections on the process
made by the course team (shown in comment boxes).
The case study includes the following sections:

3.1 Question Background
3.2 A Work in Progress (presented in four steps)
i) An Early Draft with Feedback (Autumn Term)
ii) The Question Amended after Feedback from the Exam Chair (July)
iii) Some Fine Tuning (Final Version)
iv) A Completed Work? (Some reflections on the use of the question)
3.3 A reflective exercise

3.1 Question Background

Question Motivation: We wanted to move away from the overused cardiology drug
trial examples of previous exam papers. We have a diverse tutor team that included
clinical trialists working at the Institute of Mental Health and we were inspired by a
BMJ article by Goodyer et al reporting on a Mental Health trial on major depression
in adolescents.
Question Context: A Mental Health trial (major depression) taking place in
Adolescents. The intervention of interest was Cognitive Behavioural Therapy (CBT)
as an add-on to the standard drug treatment, selective serotonin reuptake inhibitors
(SSRIs). The outcome measurements were determined by a mental health
questionnaire.
Overall Objective: To see if students could apply key fundamental principles of
clinical trials to:
• a unique patient base (i.e. adolescents)
• a non-drug intervention (i.e. CBT)
• an outcome that is measured by questionnaire (rather than by clinical
  measurements).


Assessment Needs: The exam was composed of two questions. Prior to question
setting, we identified and allocated the key concepts (as covered in the distance
learning study material for this module) to be tested for each question. For this
question the chosen key principles to test were:
• Trial designs;
• Recruitment;
• Blinding;
• Randomisation;
• Bias.
The second question was to be much more numerical/statistical in nature, thus this
first question excluded calculation type questions. Question two also included
questions specifically designed to be grade differentiators. The first question was
seen as testing students' understanding and application of central and
straightforward concepts.
Question Types: We aimed to include a range of question types.
Who was involved in question development?: Three key tutors, course
director(s), the external examiner and exam chair. So there was lots of input from a
number of experts, many drafts, and a long communication trail before agreeing a
final version. What follows tracks this process.

3.2 A Work In Progress (Four steps i, ii, iii and iv)

Step i

An Early Draft with Feedback (Autumn Term)

The question is given in plain text, marking guidance is indented and in italics, and
the module team's discussion notes are in the comment boxes alongside the text.
The Question
Selective serotonin reuptake inhibitors (SSRIs) are prescribed for the treatment of
major depression in adolescents (age 11-16), although there are concerns regarding
their usefulness and a raised risk of suicide. The National Institute for Health and
Clinical Excellence (NICE) recommends the use of SSRIs in combination with
Cognitive Behavioural Therapy (CBT) in the UK. This recommendation is based
on data collected from the United States.
Comment (A1): What sort of depression: major? Should we define with a
depression score?
Comment (A2): Should we define adolescents? 11-16? Not certain what is
the standard.


Investigators plan to conduct a pragmatic (effectiveness) randomised controlled trial
of SSRIs versus SSRIs with cognitive behavioural therapy (1 therapy session per
week) in the UK in adolescents with major depression.
Comment (A3): I know generally it is good to split questions out, but I think
here we could give more information in the scenario then ask the student to
explain. My thinking is that depending on the objectives an efficacy trial might
be of interest. So it really depends what your end game is.

a) Why do you think the investigators plan to conduct a pragmatic trial as
opposed to an explanatory (efficacy) trial?
(10 marks?)
Comment (A4): Will need to return to marks to check them later.
The efficacy of an intervention is the benefit it achieves under ideal
conditions. The effectiveness of an intervention is the benefit it
achieves through routine clinical practice. In this case research is
aimed at informing standard practice in the UK. Thus it is important to
compare how each intervention performs at the GP/community level
in order to inform a policy decision.
Comment (A5): Need to include here why this is good. Then what is the
weakness of this approach.
Comment (A6): There is lots of good discussion on blackboard. Check bb to
pad out model answer.

c) What design and conduct features would you apply in order for the trial to be
explanatory or pragmatic?
(10 marks)
Comment (A7): I'm not certain I would know how to answer this question.
Design and conduct in one question overwhelms me, sorry, I'm just a babe
in arms really! I'm guessing your direction is to think about an intention to
treat analysis and how we define a protocol deviation. If we continue down
the pragmatic route do you think we could streamline this question?

Think about the eligibility criteria and how restrictive this should be.
Think about who will be delivering the therapy intervention, what
training and experience these people would have.


How strict do you want to be in ensuring that the therapy is being
delivered according to the manual?
How much effort should be in place to minimise deviations from the
protocol such as treatment withdrawals, compliance to medication
and therapy.

d) Please discuss three potential barriers to recruitment (including consent) and
retainment of adolescents on this trial.
(15 marks)
Comment (A8): Hi have reworded here but not necessarily for the better. Is
three ok? Should we be overt about consent? Will need to make marking
more friendly to three points eg 15 marks.

Parents may not want to enter their children into a trial involving an
SSRI because of the risk of suicide, especially as they have major
depression. Therefore recruitment may be slow. As NICE
recommends SSRIs in conjunction with CBT based on US data there
may not be equipoise for this trial. Therefore clinicians may be
unwilling to randomise highly depressed children and their parents
may also be unwilling to be involved.
Both treatments would be available outside of the trial and therefore
there is not as much incentive to take part in a clinical trial. The
population may include those younger than 16 and therefore specific
consent and assent procedures would need to be put in place. There
may be a larger drop out in the CBT arm due to the extra burden of
having to attend numerous therapy sessions. Alternatively the extra
attention may be beneficial and increase retainment.
e) At the design stage a third treatment arm was suggested for inclusion
consisting of placebo only. What would be the advantages and disadvantages of
including this treatment arm?
(4 marks)
Comment (A9): Like the idea of this question because you have to think
about it. But not certain whether it is a step too far for the students. I think
maybe we could drop and ask about randomisation? I think we have logged
about 10 marks.

The disadvantages would be that it would be ethically unacceptable
to include a placebo treatment for this group of participants.
Including a third arm would increase the numbers needed to be
recruited.
The inclusion of a placebo arm would allow a direct evaluation of
treatment against no treatment.

The primary outcome of the trial was the Health and Nation Outcome scale which is
a 12 item scale covering a wide range of health and social domains such as
psychiatric symptoms, physical health, functioning, relationships and housing. Each
question is marked from 0 (no problem) to 4 (severe problem). This was completed
by an interviewer at 12 weeks post randomisation.
Comment (A10): Lovely. Should we be adding anything about a composite score
and how we use that to conclude? Or is this too much info? (I.e. what is
considered as an improvement? I think this information can come later)

Two hundred adolescents were to be recruited into the trial from six centres. Simple
Randomisation was used to allocate treatment in the ratio 1:1. Each centre had one
interviewer collecting data and several therapists giving CBT.

f) Discuss whether blinding of the participant, interviewer and therapist would be
possible in this trial.
(10 marks)
The participant could not be blinded as they would know whether
they were receiving treatment or not. The interviewer could try to
remain blind to allocation, but it is likely that the patient would mention
what they had been receiving. The therapist would not be blind to
treatment allocation.
g) Identify and discuss possible sources of bias that could occur in each of the
design, conduct and analysis stages of this trial.
(15 marks)
Bias is defined as systematic distortion of the estimated intervention
effect away from the truth caused by inadequacies in the design,
conduct or analysis of a trial.

The outcome measure is subjective, different interviewers could be
rating people very differently.
It is not possible to blind participants so what they may report may
be dependent on their treatment group.
More experienced therapists could always treat the more severely
depressed participants
We could end up with baseline imbalance for important variables
such as centre, sex, severity of depression as these have not been
taken account of in the randomisation scheme and due to relatively
small numbers we could expect some imbalance when using simple
randomisation.
Need to ensure that no selection bias has taken place.
New Question on Randomisation: What are the advantages and disadvantages of
using simple randomisation in this trial?
Easy versus imbalance of sample size no.
Might recommend randomised permuted blocks?
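
The concern about imbalance under simple randomisation can be quantified. The sketch below is an illustration added for this workbook, not part of the original marking guidance: it assumes 200 participants allocated independently with probability 0.5 to each arm, and the function name `prob_imbalance` and the threshold of 20 are hypothetical choices made for the example.

```python
from math import comb

def prob_imbalance(n, min_difference):
    """Exact probability that the two arm sizes differ by at least
    min_difference when n participants are allocated 1:1 by simple
    randomisation (each allocation is an independent fair coin toss)."""
    favourable = 0
    for k in range(n + 1):              # k = number allocated to the first arm
        if abs(k - (n - k)) >= min_difference:
            favourable += comb(n, k)    # number of sequences giving this split
    return favourable / 2 ** n

# Chance that the arms differ by 20 or more (e.g. 110 versus 90) with n = 200
p_200 = prob_imbalance(200, 20)
print(f"P(arms differ by >= 20 | n = 200) = {p_200:.3f}")
```

The same function can be applied to a single centre (a much smaller n) to show why imbalance within centres is a more serious risk than imbalance overall.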

The primary analysis of this trial showed that at 12 weeks post randomisation the
mean (standard deviation) of the primary outcome was 18 (CI 7.5) in the SSRI group
and 17.1 (CI 8.3) in the SSRI plus CBT group. The difference between the two groups
was not statistically significant under an intention-to-treat analysis.
Comment (A11): Am I right in remembering this as the confidence interval?
h) What can you conclude from this result?
j) What other information would you consider when interpreting the results of this
trial, think particularly about what may be reported in the publication?
(20 marks)
Is the sample size large enough?
How many people were included in the analysis?
Was this the best and most appropriate design?
Has there been substantial bias introduced?
What were the results of the secondary outcomes, in particular
safety?
Are the conclusions similar under a per protocol analysis?
Was the randomisation successful, i.e. are the treatment groups
balanced?
Is the trial population generalisable to inform policy decisions?

Step ii

The Question Amended after Feedback from the Exam Chair (July)
(Ready for Review By The External Examiner)

The Question Re-worked


Selective serotonin reuptake inhibitors (SSRIs) are prescribed for the treatment of
major depression in adolescents (age 11-16), although there are concerns regarding
their usefulness and a possible raised risk of suicide. The National Institute for
Health and Clinical Excellence (NICE) recommends that the National Health Service
(NHS) in the UK use SSRIs (oral drug) in combination with Cognitive Behavioural
Therapy (CBT). This recommendation is based on data collected from the United
States, which could limit the generalisability to depressed adolescents in the NHS.
Comment (L1): JR: Condense to what's needed to answer the question.
Comment (R1): EL: Yes, but we also need to be careful not to disadvantage
those who know nothing in this area.
Comment (L2): Can anyone see how to edit this further?
Investigators in the UK conducted a pragmatic (effectiveness) randomised controlled
trial of SSRIs alone versus SSRIs with cognitive behavioural therapy (1 therapy
session per week) in adolescents with major depression. Participants were treated
for 12 weeks.


a) Explain what is meant by a pragmatic randomized control trial. Why do you
think these investigators conducted a pragmatic trial rather than an
explanatory trial? (Hint: You will need to consider the advantages and
disadvantages of each approach.)
(12 marks)
Pragmatic trials
A pragmatic trial determines the effectiveness of an intervention. This
is the benefit it achieves through routine clinical practice
Advantages: They are generalisable to routine clinical practice
Disadvantages: Larger sample sizes are often required
Explanatory trials
An explanatory trial determines the efficacy of an intervention. This is
the benefit it achieves under ideal conditions.
Advantages: Variation can be reduced due to the strict procedures and
so inferences can be made from smaller sample sizes
You can determine whether the intervention actually works
Disadvantages: They are conducted under strict procedures and so not
very generalisable to routine clinical practice.
In this case research is aimed at informing standard practice in the UK.
Thus it is important to compare how each intervention performs at the
GP/community level in order to inform a policy decision. Compliance
may be an issue in this age group.

b) Given this trial is conducted in adolescents, discuss the potential challenges to
recruitment and retainment of study participants.
(6 marks)
Parents may not want to enter their children into a trial involving an
SSRI because of the risk of suicide, especially as they have major
depression. Therefore recruitment may be slow.
As NICE recommends SSRIs in conjunction with CBT based on US
data there may not be equipoise for this trial. Therefore clinicians
may be unwilling to randomise highly depressed children and their
parents may also be unwilling to be involved.
The two previous issues (for some too high risk, for others
effectiveness already proven) could be described as a problem of
equipoise that could affect recruitment.
Both treatments would be available outside of the trial and
therefore there is not as much incentive to take part in a clinical trial.
The population may include those younger than 16 and therefore
consent procedures for non-adults are more complicated and
challenging.
There may be a larger drop out in the CBT arm due to the extra
burden of having to attend numerous therapy sessions. Alternatively
the extra attention may be beneficial and increase retainment.

They should get some points for mentioning that, because this is a
pragmatic trial, the drop out will reflect the normal situation; as what
they are evaluating is a policy of recommending CBT, it will not bias
the research question.
Adolescents may drop out when they leave school.
Two hundred adolescents were to be randomly assigned into the trial from six
centres. Each centre had one interviewer collecting data and several therapists
giving CBT.
The primary outcome of the trial was the total score of the Health and Nation
Outcome scale which is a 12 item scale covering a wide range of health and social
domains such as psychiatric symptoms, physical health, functioning, relationships
and housing. Each question is marked from 0 (no problem) to 4 (severe problem).
This was completed by an interviewer at 12 weeks post randomisation.
The primary analysis compared the average total score of the Health and Nation
Outcome scale at 12 weeks post randomisation between treatment groups (SSRI
alone versus SSRI+CBT).

c) Explain the term 'double-blind' in the context of a randomized control trial.
Discuss whether double-blinding would be possible in this trial.
(8 marks)
The term 'double-blind' means that neither the participant nor the
person treating the patient (i.e. doctor and/or therapist), nor the
person responsible for evaluating the outcome (the researcher) knows
whether the patient has been allocated to treatment or not. In this
case we also have the interviewer to think about, who may or
may not be the evaluator.

The participant could not be blinded as they would know
whether they were receiving treatment or not.
The interviewer could try to remain blind to allocation, but it is likely
that the patient would mention what they had been receiving.
The therapist would not be blind to treatment allocation.
It might be possible to blind the analyst/data processing team.

d) Identify and discuss three possible sources of bias that could occur in this trial.
(6 marks)
5 marks for each well explained challenge up to a maximum of 15
The outcome measure is subjective, different interviewers could
be rating people very differently.
It is not possible to blind participants so what they may report
may be dependent on their treatment group.
More experienced therapists could always treat the more
severely depressed participants

We could end up with baseline imbalance for important variables
such as centre, sex, severity of depression as these have not been
taken account of in the randomisation scheme and due to relatively
small numbers we could expect some imbalance when using simple
randomisation.
Need to ensure that no selection bias has taken place.
Inclusion/exclusion criteria
Timing of collection of outcomes
Experience of interviewers/therapists
Number of CBT sessions needed.
Concomitant medications
Therapies previously received
The process of implementing randomisation should be clear

e) In this trial, the researchers used simple randomization to allocate patients to
either SSRI alone or SSRI in conjunction with Cognitive Behavioral Therapy.
Explain what is meant by simple randomization. Discuss its limitations and
describe an alternative method that might be appropriate for this trial.
(12 marks)
Simple randomisation is the equivalent to tossing an unbiased coin.
For example, heads may mean allocation to one arm and tails to the
other arm. Using randomisation lists prepared by this method
ensures each patient has an equal chance of either treatment plan.
Randomisation lists are most commonly generated by computer.
However, given that this trial is of limited size, there is the possibility
that substantial imbalance might occur between the two arms.
In order to ensure similar numbers in the treatment groups
throughout the trial, some form of restricted randomisation is usually
employed. The most common form of this is with the use of random
permuted blocks (block sizes of 4, 6, 8, 10 etc.). The idea behind
random permuted blocks is that the number of patients allocated to
each arm of the trial is the same at certain points in the recruitment
process. For example if a block size of 10 is used, then for each
'block' of 10 patients, five will be allocated to each treatment with the
sequence of treatment allocations within each block ordered
randomly with no connection between assignments.
The problem of predictability increases as the block size decreases.
For example, a block size of two would mean every other treatment
could be predicted. It is also advisable not to inform the investigators
that blocking was used and in particular what size the block is.
However, it is relatively straightforward for an investigator to work out
the block size after a number of patients have been randomised.
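
The two allocation methods described in this specimen answer can be sketched in a few lines of code. This is a minimal illustration only, not the procedure used in the trial: the arm labels, the function names, the seed and the block size of 10 are assumptions made for the example.

```python
import random

ARMS = ["SSRI", "SSRI+CBT"]  # hypothetical arm labels for this sketch

def simple_randomisation(n, rng):
    """Allocate each participant independently, like tossing an unbiased coin."""
    return [rng.choice(ARMS) for _ in range(n)]

def permuted_block_randomisation(n, block_size, rng):
    """Allocate in randomly ordered blocks so that the arms are exactly
    balanced after every complete block."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    allocations = []
    while len(allocations) < n:
        block = ARMS * (block_size // 2)   # equal numbers of each arm
        rng.shuffle(block)                 # random order within the block
        allocations.extend(block)
    return allocations[:n]

rng = random.Random(2010)                  # arbitrary seed, for reproducibility
simple = simple_randomisation(200, rng)
blocked = permuted_block_randomisation(200, 10, rng)

print("simple: ", simple.count("SSRI"), "vs", simple.count("SSRI+CBT"))
print("blocked:", blocked.count("SSRI"), "vs", blocked.count("SSRI+CBT"))
```

With a fixed block size the final allocations in each block become predictable, which is the weakness noted above; varying the block size at random is one common way to reduce this.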


The primary analysis of this trial showed that at 12 weeks post randomisation the
mean (standard deviation) of the primary outcome was 18 (SD=7.5) in the SSRI
group and 17.1 (SD=8.3) in the SSRI plus CBT group. The difference between the
two groups was not statistically significant under an
intention-to-treat analysis.
Comment (L3): JR This is too broad
and vague a question. Change to
something more specific.
Comment (R2): EL Tom/Luke any
ideas to make this more specific?
Comment (L4): JR But why wasn't
baseline score used and ANCOVA
adjusted for baseline done?
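
As a rough check on the reported result, the primary outcome figures quoted above can be turned into an approximate two-group comparison. The sketch below is illustrative only. The means and standard deviations come from the question, but the group sizes (100 per arm, i.e. an even split of the 200 adolescents the trial planned to randomise, with nobody lost to follow-up) are an assumption, and the normal approximation used here is not claimed to be the method the trial itself used.

```python
import math

def two_group_comparison(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Compare two group means using a normal approximation.

    Returns (difference, standard error, z statistic, two-sided p-value).
    """
    diff = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = diff / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, se, z, p

# Means and SDs are taken from the question.
# n = 100 per arm is an ASSUMPTION made only for this illustration.
diff, se, z, p = two_group_comparison(18.0, 7.5, 100, 17.1, 8.3, 100)
print(f"difference = {diff:.1f}, SE = {se:.2f}, z = {z:.2f}, p = {p:.2f}")
```

Under these assumptions, a difference of 0.9 points is less than one standard error from zero, which is consistent with the non-significant result reported. It does not address the reviewer's point about adjusting for baseline score.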

f) What other information would you consider important to report in the publication
of this trial to be able to interpret its results?
(6 marks)
Is the sample size large enough?
How many people were included in the analysis?
Was this the best and most appropriate design?
Has there been substantial bias introduced?
What were the results of the secondary outcomes, in particular
safety?
Are the conclusions similar under a per protocol analysis?
Was the randomisation successful, i.e. are the treatment groups
balanced?
Is the trial population generalisable to inform policy decisions?

Step iii

Some Fine Tuning

Final Version of the Question


Selective serotonin reuptake inhibitors (SSRIs) are prescribed for the treatment of
major depression in adolescents (age 11-16), although there are concerns regarding
their usefulness and a possible raised risk of suicide. In the UK, SSRIs are
recommended in combination with standard Cognitive Behavioural Therapy (CBT).
This recommendation is based on data from the United States, which could limit
applicability for practice in the UK. Investigators in the UK therefore conducted a
pragmatic randomised controlled trial of SSRIs alone versus SSRIs with a 12 week
course of CBT in adolescents with major depression.
a) Explain what is meant by a pragmatic randomised controlled trial. Why do
you think these investigators conducted a pragmatic trial rather than an
explanatory trial? (Hint: You will need to consider the advantages and
disadvantages of each approach.)
(12 marks)
Pragmatic trials
A pragmatic trial determines the effectiveness of an intervention.
This is the benefit it achieves in routine clinical practice.
Advantages: They are generalisable to routine clinical practice
Disadvantages: Larger sample sizes are often required
Explanatory trials
An explanatory trial determines the efficacy of an intervention. This
is the benefit it achieves under ideal conditions.
Advantages: Variation can be reduced due to the strict
procedures and so inferences can be made from smaller sample
sizes
You can determine whether the intervention could actually work
Disadvantages: They are conducted under strict procedures and
so not very generalisable to routine clinical practice.
There is already efficacy information but in this case research is
aimed at informing practice in the UK. Thus it is more important to
conduct a pragmatic trial set within UK practice.

b) Given this trial is conducted in adolescents, discuss two potential challenges to
recruitment of study participants.
(6 marks)
Parents may not want to enter their children into a trial involving an
SSRI because of the risk of suicide, especially as they have major
depression. Therefore recruitment may be slow.
As the UK recommends SSRIs in conjunction with CBT based on US
data there may not be equipoise for this trial. Therefore clinicians
may be unwilling to randomise highly depressed children and their
parents may also be unwilling to be involved.
The population may include those younger than 16 and therefore
consent procedures for non-adults are more complicated and
challenging
Adolescents may drop out when they leave school

Two hundred adolescents were to be randomly assigned into the trial from six
centres. The primary outcome of the trial was the total score of the 12-item Health
of the Nation Outcome Scales covering psychiatric symptoms, physical health,
relationships and housing. This was completed by an interviewer at 12 weeks post
randomisation.
c) Explain the term double-blind in the context of a randomised controlled trial.
Discuss whether double-blinding would be possible in this trial.
(9 marks)
The term double-blind means that neither the participant nor the
person treating the patient (i.e. doctor and/or therapist), nor the
person responsible for evaluating the outcome (the researcher) knows
whether the patient has been allocated to treatment or not. In this
case we also have the interviewer to think about, who may or
may not be the evaluator.

The participant could not be blinded as they would know
whether they were receiving treatment or not
The interviewer could try to remain blind to allocation, but it is likely
that the patient would mention what they had been receiving
The therapist would not be blind to treatment allocation
It might be possible to blind the analyst/data processing team.


d) Identify and discuss three possible sources of bias that could occur in this trial.
(9 marks)
The outcome measure is subjective; different interviewers could be
rating people very differently.
It is not possible to blind participants so what they report may
be dependent on their treatment group.
Timing of collection of outcomes might differ between groups
because of their different experiences in the trial and possibly
different retention rates
Concomitant medications
As allocation can't be blinded post-randomisation, it is possible that people recruiting
might begin to see a pattern and guess what the next patient would be
allocated to
Disappointment bias if some people wanted CBT and didn't get it

e) In this trial, the researchers used simple randomisation to allocate patients to
either SSRI alone or SSRI with CBT. Explain what is meant by simple
randomisation. Discuss its limitations and describe one alternative method that
might be appropriate for this trial.
(14 marks)
Simple randomisation is equivalent to tossing an unbiased coin.
For example, heads may mean allocation to one arm and tails to the
other arm. Using randomisation lists prepared by this method
ensures each patient has an equal chance of either treatment plan.
Randomisation lists are most commonly generated by computer.
Main limitations are possible imbalances between the two arms,
either in terms of very different numbers between the two groups or
in terms of imbalances in key prognostic factors between the groups.
Both limitations are likely given this trial's planned size.
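
The risk of unequal group sizes under simple randomisation can be illustrated with a short simulation. This is a minimal sketch, not part of the model answer. The trial size of 200 comes from the question, while the number of simulated trials, the random seed and the threshold of 20 are arbitrary choices made for illustration.

```python
import random

ARMS = ["SSRI", "SSRI+CBT"]  # the two arms described in the question

def simple_randomisation(n_patients, rng):
    """Allocate each patient by an independent, fair 'coin toss'."""
    return [rng.choice(ARMS) for _ in range(n_patients)]

def size_imbalance(allocations):
    """Absolute difference between the numbers allocated to each arm."""
    n_first = allocations.count(ARMS[0])
    return abs(n_first - (len(allocations) - n_first))

rng = random.Random(2010)   # arbitrary seed so the run is repeatable
n_patients = 200            # planned trial size from the question
n_sims = 2000               # arbitrary number of simulated trials

diffs = [size_imbalance(simple_randomisation(n_patients, rng))
         for _ in range(n_sims)]
share_large = sum(d >= 20 for d in diffs) / n_sims
print(f"Share of simulated trials with arms differing by 20 or more: {share_large:.2f}")
```

With 200 patients the difference in arm sizes has a standard deviation of about 14, so a gap of 20 or more between the arms is not a rare event. The same reasoning applies, more strongly, within each of the six centres, where numbers are smaller.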
In order to ensure similar numbers in the treatment groups
throughout the trial, some form of restricted randomisation is usually
employed. The most common form of this is with the use of random
permuted blocks (block sizes of 4, 6, 8, 10 etc). The idea behind
random permuted blocks is that the number of patients allocated to
each arm of the trial is the same at certain points in the recruitment
process. For example, if a block size of 10 is used, then for each
'block' of 10 patients, five will be allocated to each treatment with the
sequence of treatment allocations within each block ordered
randomly with no connection between assignments.
The problem of predictability increases as the block size decreases.
For example, a block size of two would mean every other treatment
could be predicted. It is also advisable not to inform the investigators
that blocking was used and in particular what size the block is.
However, it is relatively straightforward for an investigator to work out
the block size after a number of patients have been randomised.
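
The block mechanism and the predictability problem described above can be sketched in a few lines. This is an illustrative sketch only. The function names, the seed and the helper that counts "forced" allocations are assumptions introduced here, and the helper is written for two arms only.

```python
import random

ARMS = ("SSRI", "SSRI+CBT")  # the two arms described in the question

def permuted_block_list(n_patients, block_size, rng, arms=ARMS):
    """Build an allocation list from randomly ordered, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    per_arm = block_size // len(arms)
    allocations = []
    while len(allocations) < n_patients:
        block = list(arms) * per_arm   # equal numbers of each arm
        rng.shuffle(block)             # random order within the block
        allocations.extend(block)
    return allocations[:n_patients]

def forced_fraction(allocations, block_size, arms=ARMS):
    """Share of allocations that someone who knows the block size could
    predict with certainty, because one arm of the current block is
    already full. Written for two arms only."""
    per_arm = block_size // len(arms)
    forced = 0
    for start in range(0, len(allocations), block_size):
        block = allocations[start:start + block_size]
        for i in range(len(block)):
            earlier = block[:i]
            if max(earlier.count(arm) for arm in arms) == per_arm:
                forced += 1
    return forced / len(allocations)

rng = random.Random(2012)  # arbitrary seed so the run is repeatable
blocks_of_10 = permuted_block_list(200, 10, rng)
blocks_of_2 = permuted_block_list(200, 2, rng)

print(blocks_of_10[:10])  # first block: five of each arm in random order
print(forced_fraction(blocks_of_10, 10), forced_fraction(blocks_of_2, 2))
```

With a block size of two, exactly half of all allocations are fully determined by the one before, which is the "every other treatment" problem noted in the answer. Larger blocks reduce this share but never remove it, since the last allocation in every block is always determined.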
To deal with imbalances in key prognostic factors between the
groups, investigators may want to stratify by these factors. Centre is often such a
factor, or severity of depression, or age group, or sex. If there are too many factors,
they may prefer to use minimisation, which is a dynamic process dealing
with both types of imbalance together, and appropriate for UK
settings where computers are easily available.
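
Minimisation can be sketched as follows. This is a simplified, illustrative version that always picks the arm with the lowest total of matching patients and breaks ties at random. Systems used in practice usually add a random element to every allocation so that the next assignment cannot be predicted. The factor names, levels and simulated patients below are hypothetical and are not taken from the trial.

```python
import random

ARMS = ("SSRI", "SSRI+CBT")  # the two arms described in the question

def minimise(new_patient, enrolled, rng, arms=ARMS):
    """Choose the arm that minimises imbalance for a new patient.

    new_patient: dict mapping factor name to level, e.g. {"sex": "F"}
    enrolled: list of (patient_dict, arm) pairs already allocated
    """
    scores = {}
    for arm in arms:
        # Count, over every factor, the patients already in this arm
        # who share the new patient's level of that factor.
        scores[arm] = sum(
            1
            for patient, allocated in enrolled
            if allocated == arm
            for factor, level in new_patient.items()
            if patient[factor] == level
        )
    lowest = min(scores.values())
    candidates = [arm for arm in arms if scores[arm] == lowest]
    return rng.choice(candidates)  # ties are broken at random

def margin_imbalance(enrolled, factor, level, arms=ARMS):
    """Difference in arm sizes among patients with one level of a factor."""
    counts = [sum(1 for p, a in enrolled if a == arm and p[factor] == level)
              for arm in arms]
    return max(counts) - min(counts)

# Hypothetical patients, generated only to exercise the function.
rng = random.Random(1)  # arbitrary seed so the run is repeatable
enrolled = []
for _ in range(200):
    patient = {
        "centre": rng.randrange(6),
        "sex": rng.choice(["F", "M"]),
        "severity": rng.choice(["moderate", "severe"]),
    }
    enrolled.append((patient, minimise(patient, enrolled, rng)))

worst_centre = max(margin_imbalance(enrolled, "centre", c) for c in range(6))
print("Largest imbalance within any centre:", worst_centre)
```

The design point is that minimisation balances each factor separately (centre, sex, severity) rather than every combination of factors, which is why it copes with more factors than stratified blocks can.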

Step iv

A Completed Work? (Some reflection on the use of the Question)

No! We still needed lots more work on the model answer to make it much
more specific for the exam marking phase. We also didn't like marking it out
of 50; it is much easier to allocate marks out of 100 (but the 50 was a constraint
placed on us by the previous exam board).

Was it successful? Feedback from students:
o They liked the context; it was interesting
o The questions overall were clear and fair, covering key concepts, so
they had a good opportunity to perform well
o Students need to think and apply concepts rather than recapitulating
study materials.

There is always room for more tinkering!

3.3 A Reflective Exercise

EXERCISE
Please consider and make short notes on the following:
The Process
When do you begin developing examination questions in your course team?
What are the strengths and weaknesses for you in adopting a similar question
development approach to the one described in the case study above?
Having read this case study what elements would you like to transfer to your
own approach to question writing?
The Question
How would you rate the above in terms of clarity, authenticity and fairness?
How strong would you expect the inter-marker reliability to be based on the
marking guidelines provided?
