
J. Child Psychol. Psychiat. Vol. 42, No. 4, pp. 551–556, 2001
Cambridge University Press
© 2001 Association for Child Psychology and Psychiatry
Printed in Great Britain. All rights reserved
0021–9630/01 $15.00+0.00

Individual Differences in Cognitive Planning on the Tower of Hanoi Task: Neuropsychological Maturity or Measurement Error?
D. V. M. Bishop
University of Oxford, U.K.

G. Aamodt-Leeper, C. Creswell, R. McGurk, and D. H. Skuse


Institute of Child Health, London, U.K.

The Tower of Hanoi (ToH) task was given to 238 children aged from 7 to 15 years, and 20
adults. Individual variation within an age band was substantial. ToH score did not correlate
significantly with Verbal IQ, nor with ability to inhibit a prepotent response. We
readministered the ToH to 45 children after 30 to 40 days. The test-retest correlation of .5 is
low in relation to accepted psychometric standards, though at least as high as reliability of
the related Tower of London (ToL) in adults. The reasons for low reliability remain unclear;
task novelty did not seem to be involved, as children did not improve on retest. We conclude
that it is not safe to use this test to index integrity or maturation of underlying neurological
systems in children. We compared our results with three published studies using the ToL
with children, and found similar levels of performance on problems involving the same
number of moves. Another study using automated ToL obtained much poorer scores,
suggesting that computerised presentation may impair children’s performance.

Keywords: Assessment, executive function, psychometrics.

Abbreviations: ToH: Tower of Hanoi; ToL: Tower of London.

Requests for reprints to: Dorothy Bishop, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD, U.K. (E-mail: dorothy.bishop@psy.ox.ac.uk).

The ability to plan ahead to solve a problem is an executive function that is thought to depend on integrity of the frontal lobes. The Tower of Hanoi (ToH, see Fig. 1) has been used to assess this skill, and gives a quantitative index of planning ability because one can easily specify the number of steps that are involved in the solution to a problem. Using a simplified version of this task which he termed the Tower of London (ToL: see Fig. 2), Shallice (1982) demonstrated that certain patients with frontal lesions showed pronounced impairments in planning that could not be accounted for in terms of any more basic perceptual or memory problems. Further studies, some of which have used a computerised version of the ToL or ToH, have both replicated findings of deficits in patients with frontal lesions (Owen, Downes, Sahakian, Polkey, & Robbins, 1990; Goel & Grafman, 1995), and also demonstrated frontal activation on PET imaging when normal volunteers perform this task (Baker et al., 1996; Morris, Ahmed, Syed, & Toone, 1993).

Impaired functioning on Tower tasks has been found in various clinical groups, including people with autism or Asperger syndrome (Ozonoff, Pennington, & Rogers, 1991) and children with intellectual disabilities (Borys, Spitz, & Dorans, 1982). Three papers have reported developmental trends for normal children on different versions of the ToL (Anderson, Anderson, & Lajoie, 1996; Krikorian, Bartok, & Gay, 1994; Luciana & Nelson, 1998). Intriguingly, whereas most cognitive tasks show improvement both with age and with higher IQ, it has been suggested that the ToH is related only to age, and uncorrelated with IQ (Welsh, Pennington, & Grossier, 1991).

However, just as interest is mounting in assessing planning abilities, questions have been raised about the psychometric characteristics of Tower tasks. First, there is the issue of validity. As noted above, these tasks have traditionally been viewed as indexing the ability to plan ahead, with difficulty depending on the memory load, which increases with the number of moves that are required to solve the problem (Shallice, 1982). However, Goel and Grafman (1995) drew attention to another facet of these tasks, the extent to which they require the person to inhibit a prepotent tendency to move a disc or ball immediately to its final peg position. They argued that the ToH task is especially difficult for people with frontal lobe lesions because it requires them to see and resolve a conflict between a long-term goal and an immediate subgoal, whereas the ToL task is much easier in this regard. Burgess (1997), in contrast, suggested that what makes these planning tasks so difficult for people with frontal lesions may be their novelty. On this view, once the person has had experience with the task and worked out an optimal strategy, performance is likely to improve.

Humes, Welsh, Retzlaff, and Cookson (1997) noted that little was known about the reliability of Tower tasks. They found only a weak intercorrelation (r = .37) between the ToH and ToL tasks, which they attributed to the low reliability of the ToL, as assessed by an index of internal consistency (Cronbach α was .25 for ToL vs. .90

for ToH). Burgess (1997) noted the relatively weak intercorrelations between different measures of executive function including the ToL, and Lowe and Rabbitt (1998) showed that test-retest reliability for elderly volunteers completing the computerised ToL ranged from .26 to .60, depending on the performance index adopted, with reliability overall falling below conventionally accepted levels. Gnys and Willis (1991) found higher levels of test-retest reliability (r = .72) for 5-year-old children on a noncomputerised ToH task, but the test-retest interval was only 25 minutes.

The reasons for poor test-retest reliability over longer intervals are unclear. Lowe and Rabbitt (1998) noted that if, as suggested by Burgess (1997), task novelty is a critical factor, then poor reliability is to be expected if a subset of individuals show dramatic improvement as they develop a strategy. If this is so, a general improvement in task performance should be seen on retest. This is exactly what was seen in the study by Gnys and Willis (1991) on 5-year-old children, where the mean score on retest was more than 1 SD higher than on first test. However, as noted above, that study used a very short test-retest interval (and indeed obtained relatively high reliability). Other research that found significant improvement on retesting was a study by Aman, Roberts, and Pennington (1998), who administered the test to children aged from 10 to 14 years on two occasions 1 week apart.

In this study, our main aim was to assess developmental trends and test-retest reliability for a ToH task. We also explored sex differences in performance, and correlations between ToH, verbal ability (which might be regarded as important for formulating a plan of action and keeping this in mind), and another executive function measure testing ability to inhibit a prepotent response. We used a modification of the original ToH task, as described by Borys et al. (1982), as data from Krikorian et al. (1994) and Anderson et al. (1996) suggested that the simpler ToL task gives ceiling effects in older children. We also extended the difficulty level by incorporating problems that involved four discs, and required eight or nine moves for their solution. To simplify administration and scoring, we adopted a self-terminating procedure analogous to that used in the well-known Digit Span test (Wechsler, 1992). The child was presented with two problems at each level of difficulty, and was deemed to have solved a problem if it was completed successfully in the minimum number of moves on two out of three trials. If two consecutive problems were failed, then testing terminated, and the subsequent problems were scored as failed. This procedure allowed each child’s performance to be represented by a single number that corresponded to the highest level of difficulty problem that was solved.

Figure 1. The Tower of Hanoi puzzle, showing apparatus consisting of three vertical posts, and three doughnut-like discs, of different colour and size, that fit on the pegs. The testee is shown the model array and must duplicate the arrangement on a second apparatus in the minimum number of moves, while obeying the following rules: (1) only one disc may be moved at a time; (2) a larger disc must not be placed on top of a smaller one; (3) discs may not be placed on the table. The illustration shows a 5-move problem.

Figure 2. The Tower of London puzzle, devised by Shallice (1982) to involve similar reasoning as in the Tower of Hanoi but in an easier format. Rather than discs of different size, perforated balls of different colours are used, and the pegs are of different lengths. The illustration shows a 4-move problem.

Table 1
Constitution of Sample in Terms of Age, Sex, and Verbal Ability

                       Short form Verbal IQ
Age range      N       Mean      SD
7–8 yr
  Boys         31      99.5      16.76
  Girls        35      103.5     18.46
9–10 yr
  Boys         39      95.1      15.22
  Girls        44      95.1      15.35
11–12 yr
  Boys         26      99.2      8.63
  Girls        30      102.8     14.98
13–15 yr
  Boys         18      96.2      12.89
  Girls        15      90.1      16.70
Adults
  Male         5       113.4     6.66
  Female       15      105.8     9.47

Method
Participants
Children were recruited from three primary schools and one secondary school in London. These schools were selected because their pupils came from a wide range of social backgrounds. The test battery was administered to all children whose parents gave consent for participation, with the exception of children who did not have English as a first language. Thirteen children with estimated short form Verbal IQ below 70 were excluded, giving a total sample of 238 children. Sample characteristics are shown in Table 1. The smaller sample size for
the 13–15-year-old age group reflects the fact that it is difficult to schedule research testing to fit in with a school curriculum once children enter secondary school. A group of 20 adults aged from 18 to 26 years, recruited from hospital staff, was also tested to provide an indication of adult levels of performance on this test.

Procedure
Children were tested individually in a quiet room at their school in one or two sessions lasting 30 minutes. Forty-five children aged from 7 to 10 years were retested on the Tower of Hanoi task after an interval of 30 to 40 days.

Core Test Battery
Children were given the Vocabulary and Similarities subtests from the Wechsler Scale of Intelligence for Children-3rd UK edition, WISC-III (Wechsler, 1992) to provide an estimate of short form Verbal IQ.

The Same-Opposite World subtest from the Children’s Test of Everyday Attention (TEA-Ch: Manly, Robertson, Anderson, & Nimmo-Smith, 1998) was administered to 147 of the children and all of the adults. This subtest, which has similarities to the Day-Night test of Gerstadt, Hong, and Diamond (1994), is designed to assess the ability to inhibit a prepotent response, which is seen as a critical component of executive function. In this test, the testee sees a trail of squares, each containing the written digit 1 or 2. The test starts with a “Same World” trial, in which the tester points to each square in turn, and the task is to name the digit in the square as quickly as possible. The tester’s finger moves on to the next square as soon as the correct name has been supplied, but remains on the same square if an error is made. A practice run is given before the test proper. The time taken to complete the trail is recorded. On the next trial, the tester explains that this is the “Opposite World”, where the task is to say “one” when the digit is 2, and “two” when the digit is 1. A practice run is first administered, and then the test proper. The test continues with one more “Opposite World” trial, and finishes with another “Same World” trial. Time to complete each trial is converted to a rate measure (squares per second), averaged for the two “Same World” trials and the two “Opposite World” trials, and the difference taken. Scores for “Same World”, “Opposite World”, and the difference score were converted to age-adjusted z-scores. Because the test was still in the process of standardisation at the time we conducted our study, we used our own sample as the basis for deriving norms.

Tower of Hanoi Test
Apparatus. Two sets of a wooden apparatus for the ToH test were constructed according to the specifications given by Borys et al. (1982), except that a fourth size of disc was added so that more complex problems could be administered. Thus the apparatus consisted of a board containing 3 upright rods, 6.5 cm apart, and discs of diameter 2 cm, 3.5 cm, 4.5 cm, and 7 cm, coloured yellow, green, blue, and red respectively. The thickness of the discs was 1.7 cm, and each contained a central hole 1 cm in diameter, so that the discs could be fitted over the rods.

Test procedure. The child was given problems of increasing complexity, starting with 3-move problems, and increasing up to 9-move problems, until two consecutive problems were failed. The arrangements for each problem are shown in the Appendix.

At the start of the test, the tester sat facing the child, with one ToH apparatus in front of her containing a two-disc tower on the rightmost peg. The other identical apparatus was in front of the child, with one disc on each of the three pegs.

The tester said: “In this game, I will show you an arrangement of these discs on pegs, and what you have to do is to make an arrangement that looks just the same. That is not as easy as it sounds, because there are certain rules you have to follow. First of all, you are not allowed to put any of the discs on the table. When you let go of a disc, it must be on one of the pegs. Second, you can only move one disc at a time. And third, you can never put a bigger disc on top of a smaller disc, like this (demonstrating WRONG move). That is NOT allowed. You can only make moves like this” (demonstrating RIGHT move).

The tester then demonstrated right and wrong moves, asking the child “would I be allowed to do this?”, including putting discs of right or wrong size on pegs, putting a disc on the table and moving two discs at once, until it was established that the child understood the rules.

She then said: “Now we are ready to begin.” She presented the first three-move (two disc) problem, by arranging her own apparatus in the start position, and the child’s apparatus in the end position (see Appendix), and asked child to move the discs on his or her apparatus to give the same arrangement of discs.

As can be seen in the Appendix, there were two problems, A and B, at each number of moves. To be credited as passing a given problem, the child had to solve it twice in the minimum number of moves. The child was given up to three trials per problem. After the first successful trial on a problem, the tester said “Good! Show me again”, to indicate that the child should repeat the successful performance, rather than adopting a different solution. The test stopped when this criterion of 2/3 attempts correct was not reached on two successive problems. When moving on to problems involving three discs, the tester said: “These are getting harder now. It is important to think out how you are going to do it before you start”. For problems where the end state involved placing all discs on the rightmost peg (i.e., the first problem at each move length), the examiner was careful to indicate on her apparatus the rod that the child’s discs had to go on, to ensure that there was no doubt as to which way round the tower should be constructed. In each case, the child’s tower was to be a direct match of the adult’s tower, with both child and adult towers having the discs on the rod that was rightmost from the child’s point of view.

The tester counted the number of moves made by the child, and noted it on the record form under the relevant trial number (see Appendix). If the child moved a disc to another peg and then had a change of mind and moved it back without letting go of it, this was counted as two moves. If the child violated one of the rules, the tester gave a reminder of the rule and restarted the trial, counting this as a failed trial. Rule violations (disc placed on table; two discs moved at once; larger disc put on smaller) were recorded.

The child’s final score was the highest level of task successfully completed (in terms of number of moves), with an additional half point being added if both tasks at this level of moves were passed. Thus a child who passed both 3-move problems, one 4-move problem, and both 5-move problems, and failed both 6-move problems, would achieve a score of 5.5 (highest level passed = 5, plus .5 for passing both problems at this level). A child who passed both 3-move problems, passed only the second 4-move problem, and then failed both 5-move problems would be given a score of 4.0.

Results

Mean scores on Tower of Hanoi are shown in Table 2. A regression analysis was carried out to investigate age effects on performance. Analysis of age trends from 7 years to adulthood is complicated by the fact that we would not expect a linear relationship. For most cognitive functions, growth is more rapid in the early childhood, and then slows down, with stability being achieved by young adulthood. There is no reason to expect growth in executive functions in adulthood, and within the adult group, the correlation between age and ToH was nonsignificant (r = .137, df = 18, n.s.). Therefore all adults

Table 2
Mean (SD) Scores on ToH Task by Age and Sex (Possible Range 3 to 9.5)

               Male            Female          Total
Age range      Mean   SD       Mean   SD       Mean   SD
7–8 yr         5.6    1.44     6.0    1.33     5.8    1.39
9–10 yr        6.9    1.75     5.9    1.60     6.3    1.74
11–12 yr       7.9    1.61     6.7    1.74     7.2    1.77
13–15 yr       6.8    1.90     6.7    1.36     6.8    1.65
Adult          9.0    0.61     8.1    1.65     8.3    1.50

were arbitrarily assigned the age of 18 years. The correlation between log age and ToH score (r = .370, df = 255) was only marginally higher than the correlation with raw age (r = .366, df = 255), but the former was preferred for use in a regression equation as providing a more plausible developmental model.

The regression equation relating age to ToH score was: y = 2.72x − 6.589, where y is the predicted score, and x is the natural log of age in months. By subtracting the predicted from obtained ToH score, and dividing by the RMS residual (= 1.646), we can convert ToH scores to age-adjusted z-scores. We considered whether to derive a separate regression equation for males and females, but although there was a statistically significant effect of gender on ToH z-score, males: mean = 0.15, SD = 1.033; females: mean = −0.13, SD = 0.953; F(1, 255) = 4.95, p = .027, the effect size was so small (η² = .019) that it did not seem justified using separate norms for boys and girls.

Rule-breaking errors occurred on only a small minority of trials: over the whole test, the average number of such errors was 0.358 in the 7–8-year-olds, falling to 0.177 for the 13–15-year-olds. Only nine children in the whole study made more than one rule-breaking error.

ToH z-score did not correlate significantly with either short form Verbal IQ (r = .089, df = 255) or the z-scores derived from the Same-Opposite World subtest from TEA-Ch (Same rate: r = .139; Opposite rate, r = .136; difference score r = −.014, df = 163).

Test-retest reliability for the 45 children tested on two occasions was r = .528 for raw scores, and r = .508 for age-adjusted z-scores. Although these values are statistically significant (p < .001), they are low in relation to conventional psychometric criteria. As noted above, one reason for low reliability would be if task novelty were a factor, with some children developing a strategy for performing the task as they became more experienced. Although it was the impression of the testers that some children suddenly got the point of the test, and then showed a dramatic improvement, overall, scores for the two test sessions did not differ significantly: mean for the first session was 5.92, SD = 1.56, and for the second session, 6.13, SD = 1.83; F(1, 44) < 1. There was a suggestion that older children might be more susceptible to learning effects: for children aged 7–8 years, mean score was 5.72 (SD = 1.38) in session 1 and 5.78 (SD = 1.94) in session 2, an improvement of only 0.06 points, whereas for those aged 9 to 10 years, the mean score was 6.14 (SD = 1.72) for session 1 and 6.50 for session 2 (SD = 1.67), a gain of 0.36 points. However, this difference was not statistically significant, with the interaction between age and session having a corresponding F-ratio of less than 1. The test-retest reliabilities did not differ significantly for the older and younger children; under 9 years: r = .457, df = 21, 9 years and over: r = .594, df = 20; test for difference in correlations, z = 0.47 (Guilford & Fruchter, 1973).

Comparison with Developmental Studies Using ToL Task

The question arises as to how typical our findings are. Studies by Krikorian et al. (1994) and Anderson et al. (1996) presented normative data on the ToL task (noncomputerised version) for children aged from 7 to 13 years. Direct comparison of results is impossible because different numbers of problems and different scoring methods were used. Also, both the ToL studies used only 3-, 4- and 5-move problems. However, we can consider sensitivity to age in each study in terms of effect size, d, which is a unit-free measurement (Cohen, 1977). For each adjacent year band from 7 to 13 years, d was computed as the difference between mean scores divided by the SD of the younger group. The mean value of d, which gives an index of the average increase in score for each year of age, was .10 for the current study, which is similar to the value of .16 from the Krikorian et al. study. The mean for the Anderson et al. study was twice as large at .31. The probable reason for this difference is that Anderson et al. used a scoring method that took into account time to complete each problem, whereas the current study and that of Krikorian et al. considered only accuracy. Neither ToL study found the drop in score that we observed for the 13–15-year-olds relative to younger children; it is likely that this simply reflects sampling error on a measure where age effects are small. Overall, it is noteworthy that the effect sizes for age are relatively small, compared with those for the other measures in our study: vocabulary (.47), similarities (.35), TEA-Ch same world (.55), TEA-Ch opposite world (.48). Thus on the ToH, variation within an age group is substantial in relation to score increases from one year band to the next. Coupled with our finding of low reliability, this suggests that age effects may be masked by other random factors that affect children’s performance from day to day.

Luciana and Nelson (1998) used a computerised version of the ToL, which included 3-, 4- and 5-move problems, with children aged from 4 to 8 years. The computer screen is divided into a top and bottom half, with the goal position being shown on the top half of the screen, and the initial position on the bottom half. The child moves the coloured “balls” on the bottom half of the display using a touchscreen, so as to make the bottom display look the same as the top display. The displays are presented in such a way that they can be perceived as stacks of coloured balls, held in stockings and suspended from a beam. Luciana and Nelson reported their data separately for each level of problem difficulty, and direct

comparison with the current study is difficult. However, there is a suggestion in their data that the computerised version of ToL may be substantially harder for children than versions using a more conventional apparatus. Luciana and Nelson reported that around 20% of 8-year-olds, and somewhat fewer 7-year-olds, achieved minimum-move solutions on 4- and 5-move problems. In the current study, 70% of 7- to 8-year-olds succeeded on both problems at the 4-move level, and 47% succeeded on both problems at the 5-move level. (The percentages of children of this age succeeding on at least one of the two problems at a given level were even higher: 97% for 4-move problems and 74% for 5-move problems.) The apparatus-based ToL studies do not report data separately for different levels of task difficulty, but a consideration of mean scores suggests their findings were more comparable to those of the current study than to those of Luciana and Nelson. In the study by Krikorian et al. (1994) the mean percentage score for 7- and 8-year-olds was around 80% on a task that consisted predominantly of 4- and 5-move problems. Anderson et al. (1996) used only 4- and 5-move problems, and reported scores of over 75% correct for 7- to 8-year-olds.

Discussion

Overall, these results with normally developing children do not offer much encouragement for those wishing to use the ToH as a clinical index of executive function. Although the test-retest reliability may be adequate for demonstrating group differences in experimental studies, as when a clinical group is compared to a control group, it is too low for confident assessment of individual cases. We cannot attribute poor test-retest correlations to restriction of range, as there is large variation between children even within a single year band. We considered whether low reliability might arise from the task losing its novelty value on retest, leading to improved performance, but there was no evidence for this, because average scores on retest were closely similar to those obtained on initial test. In this regard, our study is discrepant with work by Gnys and Willis (1991) and Aman et al. (1998): it is possible that practice effects occur but then dissipate with time; our test-retest interval was much longer than those used in these studies. Low reliability is not specific to children: indeed, the values of test-retest correlation found here are similar to those reported by Lowe and Rabbitt (1998) for elderly volunteers using the automated ToL test.

Would we have obtained stronger age effects by taking speed as well as accuracy of performance into account? There were three principal reasons why we did not do this. First, unless one has some baseline measure of speed of performance when planning is not implicated (as in the computerised ToL), then one risks confounding executive function with motor skill. Second, if speed is incorporated in the scoring, one then has to decide whether to encourage participants to work quickly, and run the risk of inducing an unreflective approach to the task, or whether to leave participants to work at their own pace, thus penalising those who proceed cautiously. Third, it is unclear how to combine speed and accuracy in a scoring system: for instance, would a person who solved two 4-move problems quickly but then failed both 5-move problems be given more credit than someone who failed one 4-move problem and then passed a 5-move problem slowly? It may be that a better scoring system can be devised, but it is unlikely that our system was simply too insensitive to pick up age effects: the problem was not that children did not vary, but rather that the variance within each age group was substantial in relation to the age effect size.

We found no support for the notion that ability to inhibit a prepotent response is a major determinant of ToH performance in normally developing children: the correlations with the Same-Opposite World task, which is designed to assess this aspect of executive function, were close to zero. In future work, it would be interesting to consider alternative explanations for individual differences in ToH performance; for instance, Pennington, Bennetto, McAleer, and Roberts (1996) have emphasised the importance of working memory for executive tasks. This is particularly the case for the ToH, where the need to hold a sequential plan in memory increases with task difficulty. It is possible, also, that more complex inhibitory processes are implicated than are tapped by the Same-Opposite World task. In the Opposite World task, the participant must simply maintain a response set that involves doing the opposite of what is customary. In the ToH, it is necessary to shift continuously between subgoals in order to arrive at a final goal, and this can mean inhibiting a subgoal (e.g., get the red disc on the rightmost peg) that was previously active.

The low reliability of the ToH task is disappointing for those hoping to use this task for individual assessment. Luciana and Nelson (1998) argued that because one cannot readily apply brain imaging studies to normal children, one can adopt the empirical strategy of selecting behavioural measures with reliable neural correlates (in terms of lesion studies, or imaging of healthy adults), and then “attribute children’s successful performance on these measures to the functional maturation of the brain regions with which they have been experimentally correlated” (p. 273). Unfortunately, our study, taken in conjunction with the evidence reviewed by Burgess (1997) and Lowe and Rabbitt (1998), suggests that factors other than regional brain development may exert so powerful an influence on performance of executive function tasks that they swamp any variation due to individual differences in underlying neurology.

References

Aman, C. J., Roberts, R. J., Jr, & Pennington, B. F. (1998). A neuropsychological examination of the underlying deficit in attention deficit hyperactivity disorder: Frontal lobe versus right parietal lobe theories. Developmental Psychology, 34, 956–969.
Anderson, P., Anderson, V., & Lajoie, G. (1996). The Tower of London Test: Validation and standardization for pediatric populations. The Clinical Neuropsychologist, 10, 54–65.
Baker, S. C., Rogers, R. D., Owen, A. M., Frith, C. D., Dolan, R. J., Frackowiak, R. S. J., & Robbins, T. W. (1996). Neural systems engaged by planning: A PET study of the Tower of London task. Neuropsychologia, 34, 515–526.
Borys, S. V., Spitz, H. H., & Dorans, B. A. (1982). Tower of Hanoi performance of retarded young adults and nonretarded children as a function of solution length and goal state. Journal of Experimental Child Psychology, 33, 87–110.
Burgess, P. W. (1997). Theory and methodology in executive function research. In P. Rabbitt (Ed.), Methodology of frontal and executive function (pp. 81–116), Hove, U.K.: Psychology Press.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.

Gerstadt, C. L., Hong, Y. J., & Diamond, A. (1994). The relationship between cognition and action: Performance of children 3½–7 years old on a Stroop-like day-night test. Cognition, 53, 129–153.
Gnys, J. A., & Willis, W. G. (1991). Validation of executive function tasks with young children. Developmental Neuropsychology, 7, 487–501.
Goel, V., & Grafman, J. (1995). Are the frontal lobes implicated in “planning” functions? Interpreting data from the Tower of Hanoi. Neuropsychologia, 33, 623–642.
Guilford, J. P., & Fruchter, B. (1973). Fundamental statistics in psychology and education. (5th ed.) New York: McGraw-Hill.
Humes, G., Welsh, M., Retzlaff, P., & Cookson, N. (1997). Towers of Hanoi and London: Reliability and validity of two executive function tasks. Assessment, 4, 249–257.
Krikorian, R., Bartok, J., & Gay, N. (1994). Tower of London Procedure: A standard method and developmental data. Journal of Clinical and Experimental Neuropsychology, 16, 840–850.
Lowe, C., & Rabbitt, P. (1998). Test/retest reliability of the CANTAB and ISPOCD neuropsychological batteries: Theoretical and practical issues. Neuropsychologia, 36, 915–923.
Luciana, M., & Nelson, C. A. (1998). The functional emergence of prefrontally guided working memory systems in four- to eight-year-old children. Neuropsychologia, 36, 273–293.
Manly, T., Robertson, I. H., Anderson, V., & Nimmo-Smith, I. (1998). The Test of Everyday Attention for Children (TEA-Ch). Bury St Edmunds, U.K.: Thames Valley Test Company.
Morris, R. G., Ahmed, S., Syed, G. M., & Toone, B. K. (1993). Neural correlates of planning ability: Frontal lobe activation during the Tower of London test. Neuropsychologia, 31, 1367–1378.
Owen, A. M., Downes, J. J., Sahakian, B. J., Polkey, C. E., & Robbins, T. W. (1990). Planning and spatial working memory deficits following frontal lobe lesions in man. Neuropsychologia, 28, 1021–1034.
Ozonoff, S., Pennington, B. F., & Rogers, S. (1991). Executive function deficits in high-functioning autistic children: Relationship to theory of mind. Journal of Child Psychology and Psychiatry, 32, 1081–1105.
Pennington, B. F., Bennetto, L., McAleer, O., & Roberts, R. J. (1996). Executive functions and working memory: Theoretical and measurement issues. In G. R. Lyon & N. A. Krasnegor (Eds.), Attention, memory, and executive functions (pp. 327–365), Baltimore, MD: Paul Brookes.
Shallice, T. (1982). Specific impairments of planning. Philosophical Transactions of the Royal Society, series B, 298, 199–209.
Wechsler, D. (1992). Wechsler Intelligence Scale for Children-Third UK edition. London: Psychological Corporation.
Welsh, M. C., Pennington, B. F., & Grossier, D. B. (1991). A normative-developmental study of executive function: A window on prefrontal function in children. Developmental Neuropsychology, 7, 131–149.

Manuscript accepted 9 January 2001

Appendix
Record form for Tower of Hanoi task, showing the configuration for each problem. The three dashes denote the three rods, with the letters indicating the discs. The number of moves taken by the child on 1st, 2nd, and 3rd trials of each problem is recorded in the box against that problem. At each level, problem A is given first, and then problem B. For each problem, the child is administered up to three trials, until two trials have been completed in the minimum number of moves. The test stops when this criterion of 2/3 attempts correct is not reached on two successive problems, and the child is awarded a score corresponding to the highest level (number of moves) passed, with .5 added if both problems at that level were passed. Two children who failed both 3-move problems were given a score of 2.0. Thus the total score could range from 2.0 to 9.5.

Problem A Problem B
configuration* trial configuration* trial
start end 1 2 3 start end 1 2 3
S ––S M ––M
3 move M–– ––M L–– ––L
––S ––S S ––S
4 move LSM ––M M –S
LM– ––L L–– –ML
––S ––S S ––S
5 move ––M ––M M ––M
LMS ––L L–– SML
––S ––S S ––S
6 move M ––M M ––M
L– S ––L L–– S–L
7 move S ––S S ––S
M ––M M S
L–– ––L L–– ML–
8 move ––S ––S S ––S
S ––M M –GS
M ––L L –GM
LG– ––G G–– –GL
9 move ––S ––S S ––S
––M M ––M
M ––L L –GM
LGS ––G G–– SGL
*S = small; M = medium; L = large; G = giant
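
The scoring rule described for this record form, together with the age adjustment reported in the Results, can be illustrated with a short sketch. This is not the authors' scoring software. The function names `toh_score` and `age_adjusted_z`, and the dictionary layout used to record passes and failures, are assumptions made for illustration. The regression constants (slope 2.72, intercept −6.589, RMS residual 1.646, with age as the natural log of months) are taken as read from the text.

```python
import math

LEVELS = range(3, 10)  # problems requiring 3 to 9 moves


def toh_score(results):
    """Score one record form (hypothetical data layout).

    results maps (level, "A" or "B") to True if the problem was passed,
    i.e. solved in the minimum number of moves on 2 of up to 3 trials.
    Problems missing from the mapping are treated as failed.
    """
    consecutive_fails = 0
    passed = {}
    for level in LEVELS:
        for problem in ("A", "B"):  # A is given first, then B
            ok = results.get((level, problem), False)
            passed[(level, problem)] = ok
            consecutive_fails = 0 if ok else consecutive_fails + 1
            if consecutive_fails == 2:
                break
        if consecutive_fails == 2:
            break  # test stops after two successive failed problems
    levels_passed = [lvl for (lvl, _), ok in passed.items() if ok]
    if not levels_passed:
        return 2.0  # score given when both 3-move problems were failed
    top = max(levels_passed)
    both = passed.get((top, "A"), False) and passed.get((top, "B"), False)
    return top + (0.5 if both else 0.0)


def age_adjusted_z(score, age_months):
    """Age-adjusted z-score using the regression constants from the Results."""
    predicted = 2.72 * math.log(age_months) - 6.589
    return (score - predicted) / 1.646


# The two worked examples given in the Test procedure section.
example_1 = {(3, "A"): True, (3, "B"): True, (4, "A"): True, (4, "B"): False,
             (5, "A"): True, (5, "B"): True, (6, "A"): False, (6, "B"): False}
example_2 = {(3, "A"): True, (3, "B"): True, (4, "A"): False, (4, "B"): True,
             (5, "A"): False, (5, "B"): False}
print(toh_score(example_1))  # 5.5
print(toh_score(example_2))  # 4.0
```

Following the text, adults would be entered with an age of 18 years (216 months) when computing the z-score.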
