International Journal of
Mathematical Education in Science
and Technology
Publication details, including instructions for authors and
subscription information:
http://www.tandfonline.com/loi/tmes20
To cite this article: Jane M. Watson , Ben A. Kelly , Rosemary A. Callingham & J. Michael
Shaughnessy (2003) The measurement of school students' understanding of statistical
variation, International Journal of Mathematical Education in Science and Technology, 34:1,
1-29, DOI: 10.1080/0020739021000018791
Downloaded by [Colorado College] at 18:38 27 October 2014
Int. J. Math. Educ. Sci. Technol., 2003,
vol. 34, no. 1, 1–29
1. Introduction
The case for the need to measure school students' understanding of statistical
variation would appear easy to make: statistics requires variation for its existence.
If understanding of statistics is to be measured then account must be taken of
understanding of variation. This does not mean understanding of standard deviation
but of something more fundamental: the underlying change from expectation
that occurs when measurements are made or events occur. This implies that
variation occurs in a context and thus it must be measured in context. For school
students the context is the chance and data part of the mathematics curriculum.
Hence it is within this context that measurement of variation itself must take place.
It was the purpose of this project to devise a survey that would use the context of
the chance and data curriculum, which only implicitly acknowledges the variation
on which it stands, to gain an appreciation of that elusive foundation of the subject.
Until very recently there has been little research into school students'
understanding of variation. Shaughnessy [1] suggested that this might reflect the
emphasis of the school curriculum, which has traditionally been on measures of
the middles of data sets rather than on measures of spread. This in turn may reflect
a view of teachers and curriculum planners that variation is messy and leads to a
complex codification in the standard deviation. This view is not reflected, however,
by statisticians such as Moore [2], who included variation in four of his five core
elements of statistical thinking, and Cobb and Moore, who stated explicitly that
the need for statistics arises from the omnipresence of variability [3, p. 801].
Further, the work of Wild and Pfannkuch [4] in developing a framework for
statistical thinking in empirical enquiry places variation at the heart of all
investigation, reinforcing the need to address students' understanding and devise
ways to assist the development of appropriate appreciation before they leave
school.
Shaughnessy et al. [5] began the antecedent work to that presented in this study
in response to the analysis of data and chance items from the 1996 National
Assessment of Educational Progress (NAEP) in the USA [6]. In particular, an item
in the NAEP asked students to make a best prediction of how many red gum balls
2. Survey development
The search for items appropriate to a paper-and-pencil survey of about 45
minutes duration began with existing material used in previous research with
chance and data (see [10] for many sources), with the work of Shaughnessy et al.
[5] as described above, and with activities carried out in classrooms [11]. The
framework for the chance and data curriculum conceived by Holmes [12] was used
to ensure that aspects of variation across the curriculum were covered. The
following areas were hence specifically catered for with the items in the
Appendix: sampling variation, displaying variation, chance variation,
describing/measuring variation, and sources of variation (explanations, inferences).
An initial survey was trialled in a small private school with 58 students in
grades 4 to 10. A few items were reworded to aid interpretation and understanding,
and decisions were made about a core set of items to be asked of grade 3 students,
with additions made at each of grades 5, 7, and 9. Initial analyses based on Rasch
methods [9, 13] and the Structure of Observed Learning Outcomes (SOLO)
developmental model of Biggs and Collis [14] were promising and, with amendments,
the surveys, as presented in the Appendix, were used for the main study
(the spacing of the items in the Appendix does not reflect the spacing used on the
actual surveys). The rationale for the choice of items and the addition of parts and
items for higher grades will be discussed in the following paragraphs.
To set the context for questions dealing with variation it was necessary to
include some items focusing more directly on basic chance measurement and table
or graph reading. Items 1 and 14 used by Watson et al. [15] and Item 3(a) used by
Torok [11] were of this nature for chance. Parts (a) to (d) of Item 15, parts (a) and
(b) of Item 6, and parts (a) to (d) of Item 4, served a similar purpose for reading
tables, pictographs, and stacked dot plots, respectively (based on items reported in
[11], [16], [17], and [18]). Also, Item 11 asked students to note anything unusual in
a bar graph presented from a newspaper article [19].
To explore students' understanding of variation in chance settings, Item 2
(following on from Item 1 for all grades), asked students to imagine throwing a die
60 times with the possible results being recorded in a table. The purpose was to
assess the degree of variation they would attribute to outcomes associated with
many trials but having an underlying uniform expectation. The motivation for the
item arose from the work of Reading and Shaughnessy [8] and Torok and Watson
[7], although these researchers did not work with dice. The aims of Item 3, parts
(b) to (e) were similar but set in the context of a spinner task [11]. Students in
grades 3 and 5 were asked to imagine the outcomes of 10 spins performed on six
different occasions, whereas students in grades 7 and 9 were asked to imagine the
outcomes for 50 spins performed on six different occasions. Finally, the words
'random' and 'variation' were included in the words whose meanings were
requested in Item 12 [20, 21].
Consideration of variation in other data handling settings, including graphs,
was covered in the following: Item 4(e) and Item 5, answered by students in grades
7 and 9 for stacked dot plots related to the spinner task [11]; parts (c) to (f) of Item
6 for the pictographs, which was answered by students in all grades [17]; Item 7 for
a comparison of stacked dot plots, answered by students in grades 5, 7, and 9 [22];
Item 11 for bar graphs and Item 13 for finding an average in a set with an outlier,
both answered by students in grades 7 and 9 only [23, 24].
To consider sampling, Items 8, 9, and 10 were adapted from the work of Jacobs
[25] based on a school class doing a survey before selling raffle tickets. All grades
were asked Item 8, involving how the students would conduct the survey
themselves. For Item 9, suggesting how other students had conducted the survey,
grade 3 students were only presented with three alternatives: (a) Shannon, (b) Jake,
and (c) Adam. Grade 5 students had these three alternatives, as well as (d) Raffi. In
grades 7 and 9, students had these four alternatives, plus (e) Claire. All students
were asked to justify why each survey method was 'Good', 'Bad', or they were 'Not
sure', and to choose which of the survey alternatives was the best. Item 10, asking
for a final prediction on school support for the survey, was only given to grade 9
students. Item 15(e) asked about fair selection of a sample and Item 16 explored
bias in sampling [26, 27], whereas the definition of 'sample' was solicited in Item 12.
3. Method
3.1. Sample
Data were collected from students in ten government schools in the Australian
state of Tasmania. A total of 746 students in grades 3 (n = 177), 5 (n = 183), 7
(n = 189), and 9 (n = 197) were administered the questionnaire described above
and shown in the Appendix. Each class was allowed approximately 45 minutes to
complete the questionnaire, and students were encouraged to ask questions to help
clarify the reading of items. Teachers and researchers, however, did not assist
students in such a way as to answer the question or influence responses. Students
who finished early were given the opportunity to go back and answer the items left
blank. This assisted in eliminating missing data.
3.2. Coding
The SOLO taxonomy of Biggs and Collis [14], and previous analyses of some
items by Watson et al. [15] and Watson and Moritz [26–28] provided starting
points for the analysis of the items in the present study. A coding scheme was
devised based on this background and the desire to provide structure to students'
displayed understanding of variation in the contexts provided by the items.
Overall, 44 sub-parts were coded for the 16 items. The coding of Items 1 and
14 was based on previous work by Watson et al. [15], whereas the coding of Item
16 was based on a scheme devised by Watson and Moritz [26]. All other items were
coded based on their degree of mathematical correctness and/or the appropriate-
part, as well as catering for the global categories combined from these smaller
gradations.
The criterion for determining the appropriateness of the variation displayed in
responses to Item 3(e) was based on a simulation of 1000 outcomes for the response
using an EXCEL spreadsheet. The standard deviation for each simulation was
calculated and then plotted. Appropriate variation was determined by the standard
deviations falling within the middle 90% of a normal curve (0.6–2.3 for 10 spins;
1.3–5.0 for 50 spins). A similar simulation was carried out for Item 2 with the 60
tosses of the die, with the standard deviations within the middle 90% being the
indicator of appropriate variability (1.2–4.7). For each item, the standard deviation
of the student's response was calculated and compared with one of these criteria.
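The criterion described above for Item 2 can be sketched in Python. This is an illustrative reconstruction only, not the authors' EXCEL spreadsheet: the function names, the random seed, and the use of the sample (n − 1) standard deviation are assumptions, and the coding scheme also took students' reasoning into account, which the sketch ignores.

```python
import random
import statistics


def simulate_sd_band(n_trials, n_outcomes, n_sims=1000, seed=0):
    """Simulate n_sims sets of n_trials equally likely outcomes and return
    (low, high) bounds for the middle 90% of a normal curve fitted to the
    standard deviations of the outcome frequencies."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    sds = []
    for _ in range(n_sims):
        counts = [0] * n_outcomes
        for _ in range(n_trials):
            counts[rng.randrange(n_outcomes)] += 1
        # Sample SD of the six frequencies; the paper does not say whether
        # the sample or population formula was used (assumption).
        sds.append(statistics.stdev(counts))
    centre = statistics.mean(sds)
    spread = statistics.stdev(sds)
    z = 1.645  # normal quantile leaving 5% in each tail
    return centre - z * spread, centre + z * spread


def classify_variation(frequencies, low, high):
    """Compare the SD of one response with the simulated band."""
    sd = statistics.stdev(frequencies)
    if sd < low:
        return "too narrow"
    if sd > high:
        return "too wide"
    return "appropriate"


if __name__ == "__main__":
    print(simulate_sd_band(n_trials=60, n_outcomes=6))
    # Band reported in the paper for 60 die tosses: 1.2 to 4.7
    print(classify_variation([9, 11, 8, 12, 10, 10], 1.2, 4.7))
```

With the band reported in the paper (1.2–4.7), the response 9, 11, 8, 12, 10, 10 falls inside the band, whereas 11, 9, 11, 9, 11, 9 falls below it, which agrees with the descriptions of codes 4 and 3 given in the Results.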
For items similar in content, such as Items 9(a) to 9(e), a parallel classification
was used relative to the quality of the suggestions made in the stem of the item.
The four categories for these items reflected no appreciation for the task, an
inappropriate decision on the adequacy of the survey, a non-central criticism or
approval of the techniques, and an appropriate statistical response.
3.3. Analysis
Use of Item Response Models, and more specifically Rasch models, with data
coded using SOLO and other hierarchical systems has been described by a number
of authors (see, for example [29, 30]). These models use the interaction between
persons and items to estimate person abilities and item difficulties on the same
scale. The unit of measurement is the logit, the natural logarithm of the odds of
success of a person on an item. For further discussion of the theoretical aspects of
the Partial Credit Model (PCM) used in this study, see Masters [9].
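As a brief illustration of the logit scale, the following Python sketch computes the success probability under a dichotomous Rasch model and the category probabilities under a partial credit model. It is a minimal sketch in the standard textbook form, not the estimation procedure of the Quest program; the function names and the ability and step difficulty values are hypothetical.

```python
import math


def rasch_probability(theta, delta):
    """Probability of success for ability theta on an item of difficulty
    delta, both expressed in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def pcm_probabilities(theta, step_difficulties):
    """Category probabilities (codes 0..m) under a partial credit model
    with m step difficulties. Code 0 corresponds to an empty sum of 0."""
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical person (1.5 logits) and item (0.5 logits)
    p = rasch_probability(1.5, 0.5)
    print(p, logit(p))  # the log-odds recover theta - delta
    # Hypothetical item coded 0 to 3 with three step difficulties
    print(pcm_probabilities(0.3, [-1.0, 0.2, 1.5]))
```

When ability equals difficulty the probability of success is 0.5, which is the sense in which persons and items share one scale; with a single step the partial credit form reduces to the dichotomous case.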
The PCM is underpinned by three assumptions: first, the variable is
unidimensional; second, the variable is hierarchical or has direction; and third, the
items are independent of each other. The focused nature of the content in this
study suggested that the requirement of unidimensionality could be met. The
hierarchical coding scheme allowed the requirement of direction to be realized,
and finally, although some items, such as the set about the raffle scenario (Items 8,
9, and 10), had a common context, each sub-part was stand-alone and was not
dependent on a correct answer to a previous part. The PCM [9] was hence fitted to
the data using the Quest computer program [31]. This analysis allowed all the
items to be calibrated and placed on a single scale in one operation.
fewer categories, as described previously. This improved the initial fit of the data
to the model. A variable map of person ability and item difficulty for the 746
students for 44 items was produced (see figure 1 in the Results).
Table 1 summarizes the items and parts of items that measure aspects of
pattern (in basic chance, and graphs and tables) or variation (in chance, data/
graphs, and sampling), so that each response in the survey can be allocated to one
recognizable component contributing to the overall scale. Item 12(c) refers to the
word 'variation', and has been included within both components for chance and
data/graphs. Question 11 was analysed in two ways to tap into different sources of
chance and data measurement (finding an unusual feature and showing an
appreciation of variation). For ease of identification, labels VC, VD, and VS
have been used to identify items associated with chance variation, data variation,
and sampling variation in the Appendix and elsewhere as appropriate.
4. Results
The results will be discussed from two perspectives: a descriptive analysis of
selected items that elucidates aspects of understanding and the outcomes of the
Rasch analysis including levels of understandings arising from interpretation of the
variable map.
times and the rest come up all over the place); if the sum was less than 40; or if
there appeared to be a misinterpretation of the question (e.g. 1, 60, 0, 0, 0, 0 [no
reasoning]). Responses with a coding of 1 (30.2%) summed to 60 but provided
idiosyncratic reasoning for the variation displayed (e.g. 10, 20, 10, 10, 0, 10:
'Because you usually get lower numbers than higher numbers'). Coding 2 (31.8%)
was given to a variety of numerical responses reflecting strict probability or
unusual variation but with reasoning that reflected some understanding of the
context (e.g. 5, 15, 10, 15, 10, 5: 'Because when you throw a die, you get some
numbers a lot and never at other times'). At code 3 (8.4%), responses showed
variation that was either too narrow or too wide, but expressed appropriate
reasoning about variation in the context (e.g. 11, 9, 11, 9, 11, 9: 'Because they
are close to 10 which would be the average number rolled'). At coding 4 (9.2%), the
reasoning and the displayed variation were appropriate (e.g. 9, 11, 8, 12, 10, 10:
'They are all around the same amount').
first plot included only occurring data points, whereas the second plot used all
possible data points in the range (see Appendix). All three parts of this item were
coded in four categories. In all cases a code of 0 was given for responses with no
discernible logical reasoning (e.g. 'The town must not be well known' or 'Neither
[graph is better] because neither of the graphs tell a story'). In coding 1, for parts
(a) and (b), responses included data reading comments (e.g. 'Column 3 has four
crosses'), sometimes combined with an inappropriate comment (e.g. '0 people
lived there'), and if a summary statement was made, it was combined with an
inappropriate statement (e.g. 'They have lived there a long time. They have 22
people in their family.'). At coding 2, responses included one summary statement,
and one data reading statement, or a single summary statement with no other
comment (e.g. 'The years vary as to how long they have lived in the town.
Someone's lived there 37 years.'). At coding 3, responses provided two distinct
appropriate summary statements (e.g. 'Most people have lived in town 3 years.
Second most people have lived in town for 12 years.'). Table 2 shows the
percentages of students in the four coding categories for each part of Item 7. As
can be seen, the percentages of responses are relatively stable over and within the
three sub-items.
Questions 8, 9, and 10. Items 8, 9, and 10 concerning students carrying out a
survey before selling raffle tickets to have a trip to Movieworld were the longest
and most complex questions for most grades. Additional parts were included for
higher grades. The coding of categories for the various parts is given in table 3
with examples of student responses. The consistency of types of response across
questions is shown in table 3; however, some questions had higher difficulty levels
(see figure 1 [VS8–VS10]).
Table 3. Coding of categories for Items 8 and 9, with examples of student responses.

Code 3: Appropriate statistical response
  Q8 How many/how to survey [all grades]. Representative and random: '10 from each grade, 5 boys and 5 girls picked at random'. Random only: 'Put all 600 student names in a hat and draw out 65'.
  Q9(a) Shannon [all grades]. Random methods: 'Good, because it's a good random way to survey people'.
  Q9(b) Jake [all grades]. Detecting bias and small sample size: 'Bad, not enough and selectively picked'.
  Q9(c) Adam [all grades]. Detecting bias: 'Bad, not enough different age groups'.
  Q9(d) Raffi [grades 5-9]. Lack of range and/or variation: 'Bad, they would probably say the same thing'.
  Q9(e) Claire [grades 7-9]. Appropriate criticism: 'Bad, some kids might go twice'.

Code 2: Non-central ideas and student uncertainty
  Q8 How many/how to survey [all grades]. Representative methods only: 'You would survey 60 children, 10 from each grade so you could see an average for each grade'.
  Q9(a) Shannon [all grades]. Adequate sample size: 'Good, there's a lot of people'.
  Q9(b) Jake [all grades]. Detecting bias only: 'Bad, it's not broad enough'. Student uncertainty: 'Not sure, because not many different people would go there'.
  Q9(c) Adam [all grades]. Large sample size: 'Bad, too many people'. Student uncertainty: 'Not sure, because that's only one class but he surveyed the most people'.
  Q9(d) Raffi [grades 5-9]. Adequate sample size: 'Good, you get a lot of answers'. Student uncertainty: 'Not sure, it depends how many of his friends have different opinions'.
  Q9(e) Claire [grades 7-9]. Adequate sample size: 'Good, you just have enough'. Student uncertainty: 'Not sure, because people who thought it was a bad idea wouldn't bother'.

Code 1: Inappropriate analysis
  Q8 How many/how to survey [all grades]. Non-representative: '50 students that I meet'. Entire population: 'You would survey them all'.
  Q9(a) Shannon [all grades]. Method too random: 'Bad, he could pick the wrong people'.
  Q9(b) Jake [all grades]. Creating bias: 'Good, to give them a hint to buy one'.
  Q9(c) Adam [all grades]. Fairness: 'Good, because it is fair'.
  Q9(d) Raffi [grades 5-9]. Friendship: 'Good, because they are his friends'.
  Q9(e) Claire [grades 7-9]. Free choice: 'Good, it is their own choice'.

Question 15. Item 15(e), the last part, which followed on from four parts about
reading tables, asked about choosing students in a fair way for the closing parade at
a sports day. Categories were determined based on the quality of responses.
Responses that did not address the question of how to choose the students (e.g.
'They play the girls' games first. Then play the boys' games.') were classified as
inappropriate and were coded 0 (21%). Students who gave responses using either a
behavioural or personality characteristic for selection (e.g. 'Watch them march and
see who's the best'), or responded with methods containing at least one descriptor
for selection (e.g. '2 girls and 2 boys') were given a code of 1 (32.6%). Students who
used representative methods to make a selection (e.g. 'One child from each sport,
but only 2 boys from their sports and 2 girls from their sports') were given a code
of 2 (6%). Students who used one random method of selection but not combined
with stratification were given a code of 3 (30.7%) (e.g. 'Pick out of a hat'), and
students that combined both representativeness and random methods in the one
response or gave two distinct purely randomized methods were given a code of 4
(9.7%) (e.g. 'Put all the boys' names in a hat and pull out two and do the same for
the girls'). Not many responses were given a code of 2, indicating a preference for
chance over stratification methods.
Question 12. Item 12 asked for definitions of the three terms 'sample',
'random', and 'variation'. In each case three coding categories reflected increased
structure and understanding in the response. Coding 0 responses were inappropriate
or tautological. Coding 1 reflected single ideas or examples of the term. Coding
2 reflected straightforward but clear explanations of the term, whereas coding 3
responses were considered to integrate ideas with examples. Examples of responses
for each coding category of each of the terms are given in table 4, with the
percentage in each coding category given in parentheses.
-------------------------------------------------------------------------------------------------
4.0 |
|
| VS9b.3
|
| VC12b.3 LEVEL 4 Critical
| Q11a.2 aspects of variation
3.0 | VD6e.2 Employing complex
| VD6f.5 justification or critical
| reasoning
| VS15e.4
| VD6d.3 VD7b.3
| Q4d VD13.3
2.0 | VD7a.3
|
| VC3bA.3 VS9a.3 VS9d.3 VS9e.3 VD11b.3 VD12c.3
| VC2.4 VS8.3 VS12a.3 VD11b.2 VS16a
X | Q1.4 VC3bB.3
X | VS9e.2 VS16b.2 VS10.3
XX | VC2.3 VC3eB.2 VD7c.3 LEVEL 3 Applications of
1.0 XX | VD6f.4 VS9c.3 Q14.3 VD4e variation
XXXXX | VC3cA.3 VC3cB.3 VS8.2 Q4c VS16b.1 Consolidating and using ideas in
XXXXXX | VD6f.3 VS9a.2 VS12a.2 VD13.2 VS10.2 context, inconsistent in picking
XXXXXXXXXXX | VD6d.2 VS9f.2 Q4b VD5.2 VC12b.2 VD12c.2 most salient features
XXXXXXXXXXXXX | VD11b.1
XXXXXXXXXXXXXXXX |VC2.2 VC3eA.2 VS9b.2 VS9c.2 VS15e.2 VS15e.3 VS9d.2 VD7b.2 VD7c.2 Q11a.1 VD13.1
.0 XXXXXXXXXXXXXXX | VC3bA.2 VC3cA.2 VS9a.1 VD7a.2 VD12c.1
XXXXXXXXXXXXXXX | Q1.3 VS9e.1 VC12b.1 LEVEL 2 Partial recognition of
with their original item numbers (e.g. Q1) in figure 1, whereas the second three
components have been relabelled as VC, VD, and VS, for Variation in Chance,
Variation in Data/Graphs, and Variation in Sampling, to aid identification of the
components. The distribution of each component along the variable was satisfactory,
indicating the proficiency of the scale to indicate students' achievement on
specific sub-components as well as the underlying overall understanding of variation
in chance and data. Further, the item difficulty distribution matched the
distribution of student ability, indicating that the scale allowed all students to
demonstrate what they knew. Four levels of increasing understanding were
identified within the variable. The thresholds for each level were determined by
considering the increasing understanding of variation displayed and the sophistication
and structure of responses in the categories of response. These levels are
identified in figure 1.
At Level 1, Prerequisites for Variation, students are likely to use stories or
personal experience to justify responses. They recognise variation only in the
simple context of the travel graph not looking the same every day (Item 6(c)) or in
-------------------------------------------------------------------------------------------------------------
Item Fit
(N = 746 L = 48 Probability Level= .50)
-------------------------------------------------------------------------------------------------------------
INFIT
MNSQ .63 .67 .71 .77 .83 .91 1.00 1.10 1.20 1.30 1.40 1.50 1.60
----------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-
Q1 . | *
VC2 . | * .
Q3a . * | .
VC3bA .* | .
VC3bB . | * .
VC3cA . * | .
VC3cB . | * .
VC3dA . * | .
VC3dB . | * .
VC3eA . | * .
VC3eB . | * .
Q6a . * | .
Q6b . * | .
VD6c . * | .
VD6d . | * .
VD6e . * | .
VD6f . | * .
VS8 . * .
VS9a . * | .
VS9b . * | .
VS9c . * | .
VS9f . * .
VS12a . * | .
Q14 . | * .
Q15a . * | .
Q15b . * | .
Q15c . | * .
Q15d . *| .
VS15e . | * .
VS9d . * | .
VD7a . * .
VD7b . * .
VD7c . | * .
Q4a . | * .
Q4b . | * .
Q4c . * | .
Q4d . * | .
VD4e . |* .
VD5 . * | .
VS9e . * | .
VD11a . * | .
VD11b . * | .
VC12b . * | .
VC/VD12c . * | .
VD13 . * | .
VS16a . * | .
VS16b . * | .
VS10 . | * .
====================================================================================================================================
aspects, but do not achieve a high level of sophistication. Students are likely to be
successful on some questions requiring critical analysis (e.g. choosing the appropriate
stacked dot plot in Item 7), indicating some transition to Level 4 thinking.
This is not consistent, however, across contexts. Students are also likely to
complete successfully items based on the school curriculum in chance and graph
work.
Level 4, Critical Aspects of Variation, is where consolidation of concepts
occurs. In terms of variation students are likely to summarize graphical information
in statistically appropriate ways and acknowledge varying values in a data set
in calculating the mean. Appropriate variation is demonstrated in suggesting
outcomes for 60 die tosses and justifications for spinner results. For sampling
items students are likely to find the critical aspects of bias such as non-representativeness
as well as make appropriate suggestions on their own. For the terms
'sample', 'variation', and 'random', students are likely to display sophisticated
Levels 1 3(a) 14
4 4
3 3
2 2/3 2 1/2
1 1 1
0 0 0 0
Basic Tables/Graphs Items
Levels 4(a) 4(b) 4(c) 4(d) 6(a) 6(b) 11(a) 15(a) 15(b) 15(c) 15(d)
4 1 2
3 1 1 1
2
1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0
Variation in Chance Items
Levels 2 3(bA) 3(bB) 3(cA) 3(cB) 3(dA) 3(dB) 3(eA) 3(eB) 12(b) 12(c)
4 4 3 3 3 3
3 2/3 3 3 2 2 2 2
2 1 2 1/2 2 1 1 1 1
1 1 1/2 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0
Variation in Data/Graphs Items
Levels 4(e) 5 6(c) 6(d) 6(e) 6(f) 7(a) 7(b) 7(c) 11(b) 12(c) 13
4 3 2 5 3 3 2/3 3 3
3 1 2 2 3/4 2 2/3 1 2 1/2
2 1 2 1/2 1 1 1
1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0
Variation in Sampling Items
Levels 8 9(a) 9(b) 9(c) 9(d) 9(e) 9(f) 10 12(a) 15(e) 16(a) 16(b)
4 3 3 3 3 3 3 4 1
3 2 2 2 2/3 2 2 2 2/3 2 2/3 1/2
2 1 1 1 1 1 1 1 1 1
1 1
0 0 0 0 0 0 0 0 0 0 0 0 0
could provide a useful scoring rubric for teachers with the addition of suitable
descriptors.
5. Discussion
The discussion will consider four aspects of the current study: (i) the objective
to develop a scale to measure students' understanding of variation in the context of
the chance and data curriculum; (ii) responses to some items in comparison with
outcomes reported by other researchers; (iii) the place of the scale in the larger
contexts of statistical literacy and literacy, and (iv) directions for future research.
a better representation of the data. Within the current study this is reflected in the
number of students who, in response to Item 7(c), chose Graph 1 (27.6%) as their
preferred graph over the more statistically appropriate Graph 2. Students who
preferred Graph 1, and believed it to tell the story better, often cited such reasons
as it was easy to understand and read, and that it was better as it only showed
information relevant to the question. Students who preferred Graph 2 (21.9%), on
the other hand, cited such reasons as it was more spaced out and thus easier to
understand. Although there is no one correct way of scaling, including non-occurring
values helps to display information in such a way as to raise interesting
questions about the factors that might have affected town growth [22], and this is a
desirable outcome for educators.
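The difference between the two scalings can be made concrete with a small text-based stacked dot plot, sketched here in Python. The data set and the function name are hypothetical and are not the data of Item 7; the rows are printed horizontally for simplicity.

```python
from collections import Counter


def dot_plot_rows(values, include_non_occurring):
    """Return one text row per axis value, with one X per observation.
    With include_non_occurring=True every integer between the minimum and
    maximum gets a row, so gaps in the data remain visible."""
    counts = Counter(values)
    if include_non_occurring:
        axis = range(min(values), max(values) + 1)
    else:
        axis = sorted(counts)
    return [f"{value:>3} | " + "X" * counts[value] for value in axis]


# Hypothetical years of residence for ten families (illustration only)
years_in_town = [1, 3, 3, 3, 3, 5, 12, 12, 12, 37]

if __name__ == "__main__":
    print("Only occurring values:")
    print("\n".join(dot_plot_rows(years_in_town, include_non_occurring=False)))
    print("All values in the range:")
    print("\n".join(dot_plot_rows(years_in_town, include_non_occurring=True)))
```

In the first form the value 37 sits directly beside 12, whereas the second form shows the long empty stretch between them, which is the kind of feature that invites questions about the context.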
Konold and Higgins [22] also recognized that relating data back to the real
situation is an important feature in statistical understanding. Quite often students
in this study talked about the numbers only, and neglected what these numbers
represent. In relation to Items 7(a) and 7(b), it was common to see a response such
as 'Four Xs on 3'. Such a response, although correct, is not optimal since it does
not reflect an understanding of what the data represent in the context.
Another item, 4(e), asked students to describe the shape of the stacked dot plot
provided. This question was designed to tap into students' abilities to describe
informally a data set in terms of variation. Cobb [37], as a result of a teaching
experiment with grade 7 students, stated that talking informally about the shape of
a data set as hills and clusters is quite often good enough for the task at hand, and
can even give students some experience with working on ideas that can help
construct some meaningful interpretations of measures of spread. Within the
present study students who provided a reasonable description of the shape of
the data did so in three ways. Students described the stacked dot plot in terms of
geometric shapes (e.g. triangle, circular, two peaks), physical objects (e.g. pyramid,
Melbourne city, mountains, stairs, like on a stereo player), or by acknowledging
the existence of variation (e.g. up and down, uneven, lumpy, spread out, not in a
pattern). All three description types provide useful, non-threatening ways of
interacting with data and generating discussions about variability and spread.
Bias in sampling. Items 8 to 10 were a set of sampling questions based largely
on the work by Jacobs [25]. Like Jacobs' study, the present findings show that for
Item 8 (asking students how many people they would survey and how they would
choose them) many struggled with the concept of sample as a representation of a
whole, and responded with such answers as 'ask everyone' or 'all 600'. Metz [38]
found a similar result, with 12 students out of 37 opposing sampling as a means of
inference, stating a similar claim.
Roughly 7% of students in the present study initially suggested appropriate
randomization and stratified randomization techniques in Item 8. Jacobs [25],
however, found that one-third of all fifth graders used such techniques. Schwartz
et al. [39] similarly found that 40% of sixth graders, although sceptical of truly
random samples, proposed methods that were stratified and avoided obvious bias.
In the present study, only 15% of grade 5 and 19% of grade 7 students provided
random and/or stratified techniques. This may reflect a lack of opportunity to learn
about these concepts.
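The two kinds of technique referred to here, simple randomization and stratified randomization, can be contrasted in a short Python sketch. The roster is hypothetical: the school size of 600 comes from the item, but the six grades of 100 students with equal numbers of boys and girls, and all function names, are assumptions made for illustration.

```python
import random


def build_roster(grades=6, per_grade=100):
    """Hypothetical school roster of (name, grade, sex) records."""
    roster = []
    for grade in range(1, grades + 1):
        for i in range(per_grade):
            sex = "girl" if i % 2 == 0 else "boy"
            roster.append((f"g{grade}-s{i}", grade, sex))
    return roster


def simple_random_sample(roster, size, rng):
    """Names out of a hat: every student is equally likely to be drawn."""
    return rng.sample(roster, size)


def stratified_sample(roster, per_stratum, rng):
    """Draw per_stratum students at random within each grade-by-sex group."""
    strata = {}
    for student in roster:
        strata.setdefault((student[1], student[2]), []).append(student)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample


if __name__ == "__main__":
    rng = random.Random(2003)  # arbitrary seed so the run is repeatable
    roster = build_roster()
    hat = simple_random_sample(roster, 60, rng)
    strat = stratified_sample(roster, 5, rng)
    for grade in range(1, 7):
        print(grade,
              sum(1 for s in hat if s[1] == grade),
              sum(1 for s in strat if s[1] == grade))
```

The stratified draw always yields 10 students per grade (5 girls and 5 boys), while the counts per grade in the simple random draw vary from run to run, which is the variation that some students in the studies cited interpreted as a source of bias.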
When asked to evaluate different sampling techniques, only 22% of students
positively evaluated the preferred technique (Shannon's survey). One of the main
criticisms was that drawing 60 names out of a hat from a population of 600 could
create a biased result. Students in the Schwartz et al. [39] study focused on similar
issues with such reasons as 'She might pull out [out of a hat] all first grade names'
(p. 255). Other reasons for negative evaluations of Item 9(a) in the present study
were based on the ideas of inaccuracy (e.g. 'Because it is way off', and 'Because it
limits the range'). Other students were more explicit and, like those in the Schwartz
et al. [39] and Jacobs [25] studies, went as far as to say the method was too random
(e.g. 'Because it could be anyone', 'You don't know whose name will come out . . .').
These findings also reflect results of Metz [38], who stated one reason (among
others) that students did not like to generalize from a random sample was because
of the inherent variability within the population. Students, therefore, when able,
prefer to purposely select individuals to represent the characteristics in the
population through stratification [22, 26, 39].
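The concern that a hat draw might yield, say, only first-grade names can be explored with a short simulation. The following is a minimal sketch, not part of the study: it assumes the school described in Item 8 (600 students, 100 in each of grades 1 to 6), and the function names, the seed, and the choice of 10 students per grade for the stratified comparison are illustrative assumptions.

```python
import random
from collections import Counter

GRADES = range(1, 7)          # grades 1 to 6, as in Item 8
STUDENTS_PER_GRADE = 100      # 600 students in total
SAMPLE_SIZE = 60              # size of the hat draw in Item 9(a)


def build_school():
    """Return a list of (grade, student_index) pairs for the whole school."""
    return [(grade, i) for grade in GRADES for i in range(STUDENTS_PER_GRADE)]


def simple_random_sample(school, rng):
    """Draw 60 names 'out of a hat' and count how many come from each grade."""
    chosen = rng.sample(school, SAMPLE_SIZE)
    return Counter(grade for grade, _ in chosen)


def stratified_sample(school, rng):
    """Draw the same number of names from every grade (hypothetical design)."""
    per_grade = SAMPLE_SIZE // len(GRADES)
    counts = Counter()
    for grade in GRADES:
        members = [student for student in school if student[0] == grade]
        counts.update(g for g, _ in rng.sample(members, per_grade))
    return counts


if __name__ == "__main__":
    rng = random.Random(2003)  # fixed seed so repeated runs match
    school = build_school()
    for _ in range(5):
        counts = simple_random_sample(school, rng)
        print("hat draw  :", [counts[g] for g in GRADES])
    strat = stratified_sample(school, rng)
    print("stratified:", [strat[g] for g in GRADES])
```

Repeated hat draws give grade counts that vary from draw to draw, whereas the stratified design fixes them by construction; comparing the printed rows is one way to make the students' intuition about variability concrete.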
Quite often students cited non-statistical reasons in their appraisals of the
methods presented; one such reason was the fairness rationale [25]. In response to
the randomized method in Item 9(a), some students evaluated the method as unfair
due to the fact that some children may have been selected who did not want to
participate, whereas others who were not selected probably did want to participate.
This rationale relies on emotive and personal beliefs of what constitutes a fair
survey. This reasoning also applied to the positive evaluations of Claire's self-selected
survey technique: 40% of reasons for appraisal were freedom of choice,
fairness, methodological implications (easy to conduct), and inappropriate assumptions
of range and natural variation. Similarly, Jacobs [25] found that
students in her study assumed that, although self-selected, Claire's method
would in fact produce a good mixture of respondents because of the absence of
other sample restrictions such as age and sex. Although the appreciation of the
need for range and variation was present in both studies, often there was difficulty
in balancing this with the idea that self-selection methods are likely to produce
biased outcomes in samples.
The remaining three survey methods (Jake, Adam, and Raffi) were all methods
that lacked representativeness. Schwartz et al. [39] found that some students, in
response to a similar question, considered surveying their friends and others
they thought likely to behave in the desired fashion to be acceptable when generalizing
about the wider population. In the present survey, students who evaluated Jake,
Adam, or Raffi positively often cited similar reasons (see table 3), suggesting a
limited understanding of the notion of sample.
Item 10 presented grade 9 students with a list of conflicting results from the
different sampling techniques. In response, the majority of students chose a survey
method and results that were biased and unrepresentative of the population, but
nevertheless congruent with the method they thought was best when asked in the
previous item (Item 9(f)). There were a few students, however, who changed their
choice from Shannon's survey as being the best method, to choosing a result from
one of the other four biased surveys, perhaps because of the tendency to favour
methods with more decisive results (e.g. Claire) than methods that yielded quite
indecisive results (e.g. Shannon). These students were apparently influenced by
the additional information presented. Relatively few students chose the correct
survey method, and even fewer were influenced in this direction from other
methods. Many students (16%) refused to choose an outcome based on the
methods and results supplied and instead circled the response 'Average them',
despite having evaluated some of the surveys as bad, and correctly evaluating
Shannon's as good in the previous section. Jacobs [25] suggests that although
many students can successfully evaluate different survey methods as either positive
or negative, many are unable to confidently draw conclusions from multiple
surveys effectively, and try to aggregate the information despite already identifying
biases within the sampling techniques. It is therefore important that teachers guide
instruction so that students are given opportunities to reason and draw conclusions
from the outcomes of multiple survey techniques.
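To see what the 'Average them' response amounts to, the percentages listed for Item 10 in the questionnaire can be averaged directly. The sketch below is illustrative only: the dictionary and function names are assumptions, and the calculation is an unweighted mean, which is one plausible reading of what students intended.

```python
from statistics import mean

# Percentages reported for each survey in Item 10 of the questionnaire.
ITEM_10_RESULTS = {
    "Shannon": 35,  # 60 names pulled out of a hat (the preferred method)
    "Jake": 90,     # 10 kids at the computer games club
    "Adam": 50,     # all the children in Grade 1
    "Raffi": 75,    # 60 of his friends
    "Claire": 95,   # self-selected respondents at a booth
}


def average_them(results):
    """Unweighted mean of the reported percentages (an assumed reading)."""
    return mean(list(results.values()))


if __name__ == "__main__":
    pooled = average_them(ITEM_10_RESULTS)      # 345 / 5 = 69
    random_only = ITEM_10_RESULTS["Shannon"]
    print(f"Average of all five surveys: {pooled}%")
    print(f"Random sample only:          {random_only}%")
    print(f"Gap introduced by averaging: {pooled - random_only} points")
```

Averaging pulls the estimate from 35% up to 69%, because four of the five inputs come from the surveys the text identifies as biased.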
All of the information provided in the questionnaire in this study is in some form
of text. Level 1 performance reflects the code-breaking aspects of getting into the
field of chance and data. Level 2 reflects the making of meaning in contexts, which
in this case are social as well as cultural; in this study the participation was at a
others
Level 1 Prerequisites for variation: Working out the environment, table/simple graph reading, intuitive reasoning for chance | Code-breaker: Coding practice | Tier 1: Language, definitions/processes
Developing teacher-friendly coding rubrics for the items and providing scores
indicative of levels of student thinking would allow the scale to be used by
teachers, without recourse to sophisticated statistical analysis. Examples of descriptive
rubrics are given in the first part of section 4 and in tables 3 and 4. They
currently exist for all codings in table 5 but would need to be prepared in booklets
for teachers to use.
The use of the scale in measuring change after instruction and learning
experiences aimed at increasing appreciation of variation is the next step in an
extended research programme of which this study is part. It is hoped that other
researchers may find the questionnaire useful in similar pre-post studies of
learning interventions in this area.
Acknowledgments
This project was funded by an Australian Research Council grant
(No. 00000716).
VC2. Imagine you threw the dice 60 times. Fill in the table below to show how
many times each number might come up.
1
2
3
4
5
6
TOTAL 60
Q3(a) If you were to spin it once, what is the chance that it will land on the
shaded part?
VC3(b) Out of 50 [10] spins, how many times do you think the spinner will
land on the shaded part? Why do you think this?
VC3(c) If you were to spin it 50 [10] times again, would you expect to get
the same number out of 50 [10] to land on the shaded part next
time? Why do you think this?
VC3(d) How many times out of 50 [10] spins, landing on the shaded part
would surprise you?
VC3(e) Suppose that you were to do 6 sets of 50 [10] spins. Write a list that
would describe what might happen for the number of times the
spinner would land on the shaded part?
4. A class did 50 spins of the above spinner many times and the results for the
number of times it landed on the shaded part are recorded below.
VD5. Imagine that three other classes produced graphs for the spinner. In some
cases, the results were just made up without actually doing the experiment.
(a) Do you think class A's results are made up or really from the experiment?
[ ] Made up
[ ] Real from experiment
Explain why you think this.
(b) Do you think class B's results are made up or really from the experiment?
[ ] Made up
[ ] Real from experiment
Explain why you think this.
(c) Do you think class C's results are made up or really from the experiment?
[ ] Made up
[ ] Real from experiment
Explain why you think this.
VD6(c) Would the graph look the same every day? Why or why not?
VD6(d) A new student came to school by car. Is the new student a boy or a
girl? How do you know?
VD6(e) What does the row with the Train tell about how the children get to
school?
VD6(f) Tom is not at school today. How do you think he will get to school
tomorrow? Why?
7. A class of students recorded the number of years their families had lived in their
town. Here are two graphs that students drew to tell the story.
Graph 1
[Dot plot: stacks of Xs above the axis labels 0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 17, 25, 37, placed side by side. Axis: YEARS IN TOWN]
Graph 2
[Dot plot: stacks of Xs above an evenly scaled axis with ticks labelled 0, 5, 10, 15, 20, 25, 30, 35. Axis: YEARS IN TOWN]
VS8. MOVIEWORLD
A class wanted to raise money for their school trip to Movieworld on the Gold
Coast. They could raise money by selling raffle tickets for a Nintendo Game
system. But before they decided to have a raffle they wanted to estimate how
many students in their whole school would buy a ticket. So they decided to
do a survey to find out first. The school has 600 students in grades 1–6 with 100
students in each grade. How many students would you survey and how
would you choose them? Why?
Why ____________________________________________________________
Why ____________________________________________________________
(c) Adam asked all of the 100 children in Grade 1. What do you think of
Adam's survey?
[ ] GOOD [ ] BAD [ ] NOT SURE
Why ___________________________________________________________
Why ____________________________________________________________
(e) Claire set up a booth outside of the tuck shop. Anyone who wanted to
stop and fill out a survey could. She stopped collecting surveys when she
got 60 kids to complete them. What do you think of Claire's survey?
[ ] GOOD [ ] BAD [ ] NOT SURE
Why ___________________________________________________________
(f) Who do you think has the best survey method? Why?
Shannon put all the names in a hat and pulled out 60.    35% said they would buy tickets.
Jake asked 10 kids at the computer games club.           90% said they would buy tickets.
Adam asked all the children in Grade 1.                  50% said they would buy tickets.
Raffi surveyed 60 of his friends.                        75% said they would buy tickets.
Claire set up a booth outside the tuckshop.              95% said they would buy tickets.
What percentage of students in the whole school will buy a raffle ticket?
(Circle one)
VD11.
VD13. A small object was weighed on the same scales separately by nine students
in a science class. The weights (in grams) recorded by each student are
shown below.
6.3 6.0 6.0 15.3 6.1 6.3 6.2 6.15 6.3
The average value could be calculated in several ways.
(a) How would you find the average? The average weight is ______ grams.
[Show your working in the box provided]
Q14. Box A and Box B are filled with red and blue marbles as follows:
BOX A BOX B
6 RED 60 RED
4 BLUE 40 BLUE
Each box is shaken. You want to get a blue marble, but you are only allowed to pick
out one marble without looking.
Which box should you choose?
(a) Box A (with 6 red and 4 blue)
(b) Box B (with 60 red and 40 blue)
(c) It doesn't matter
Please explain your answer.
15. A primary school had a sports day where every child could choose a sport to
play. Here is what they chose.
BOYS 0 20 20 10 50
GIRLS 40 10 15 10 75
Some 96 percent of callers to youth radio station Triple J have said marijuana use
should be decriminalised in Australia. The phone-in listener poll, which closed
yesterday, showed 9924 out of the 10,000-plus callers favoured decriminalisation,
the station said.
Only 389 believed possession of the drug should remain a criminal offence. Many
callers stressed they did not smoke marijuana but still believed in decriminalising
its use, a Triple J statement said.
References
[1] Shaughnessy, J. M., 1997, Missed opportunities in research on the teaching and
learning of data and chance. In Proceedings of the 20th annual conference of the
Mathematics Education Research Group of Australasia: People in mathematics
education, edited by F. Biddulph and K. Carr (Waikato, NZ: MERGA), pp. 6–12.
[2] Moore, D. S., 1990, Uncertainty. In On the Shoulders of Giants: New Approaches to
Numeracy, edited by L. A. Steen (Washington, DC: National Academy Press),
pp. 95–137.
[3] Cobb, G. W., and Moore, D. S., 1997, Am. Math. Monthly, 104, 801–823.
[4] Wild, C. J., and Pfannkuch, M., 1999, Int. Statist. Rev., 67, 223–265.
[5] Shaughnessy, J. M., Watson, J., Moritz, J., and Reading, C., 1999, School
mathematics students' acknowledgment of statistical variation. Paper presented at
the Presession Research Symposium: There's more to life than centers, 77th
Annual National Council of Teachers of Mathematics Conference, San Francisco,
CA.
[6] Zawojewski, J. S., and Shaughnessy, J. M., 2000, Data and chance. In Results from the
Seventh Mathematics Assessment of the National Assessment of Educational Progress,
edited by E. A. Silver and P. A. Kenney (Reston, VA: NCTM), pp. 235–268.
[7] Torok, R., and Watson, J., 2000, Math. Educ. Res. J., 12, 147–169.
[8] Reading, C., and Shaughnessy, M., 2000, Student perceptions of variation in a
sampling situation. In Proceedings of the 24th Conference of the International Group
for the Psychology of Mathematics Education, edited by T. Nakahara and M. Koyama
(Hiroshima: Hiroshima University), pp. 89–96.
[9] Masters, G. N., 1982, Psychometrika, 47, 149–174.
[10] Watson, J. M., 1994, Instruments to assess statistical concepts in the school
curriculum. In Proceedings of the Fourth International Conference on Teaching
Statistics: Volume 1, edited by National Organizing Committee (Rabat, Morocco:
National Institute of Statistics and Applied Economics), pp. 73–80.
[11] Torok, R., 2000, Australian Math. Teacher, 56(2), 25–31.
[12] Holmes, P., 1980, Teaching Statistics 11–16. (Berkshire: Schools Council and
Foulsham Education).
[13] Rasch, G., 1960, Probabilistic Models for Some Intelligence and Attainment Tests.
(Copenhagen: Denmarks Paedagogiske Institute; Reprinted by University of
Chicago Press, 1980).
[14] Biggs, J. B., and Collis, K. F., 1982, Evaluating the Quality of Learning: The SOLO
Taxonomy (New York: Academic Press).
[15] Watson, J. M., Collis, K. F., and Moritz, J. B., 1997, Math. Educ. Res. J., 9(1),
60–82.
[16] Watson, J. M., 1998, Numeracy benchmarks for years 3 and 5: What about chance and
data? In Teaching mathematics in new times: Volume 2, edited by C. Kanes, M. Goos,
and E. Warren (Brisbane: Mathematics Education Research Group of Australasia),
pp. 669–676.
[17] Watson, J. M., and Moritz, J. B., 1999, Australian J. Early Childhood, 24(2), 22–27.
[18] Watson, J., and Pereira-Mendoza, L., 1996, Australian J. Lang. Literacy, 19,
244–258.
[19] Haley, M., 2000, Boaties safety failure. The Mercury, 30 March, p.7.
[20] Watson, J. M., Collis, K. F., and Moritz, J. B., 1993, Assessment of statistical
understanding in Australian schools. Paper presented at the Statistics 93 conference,
Wollongong, NSW.
[21] Moritz, J. B., Watson, J. M., and Pereira-Mendoza, L., 1996, The language of
statistical understanding: an investigation in two countries. Paper presented at the
Joint ERA/AARE Conference (Singapore). Available [on-line]:
http://www.swin.edu.au/aare/96pap/morij96.280
[22] Konold, C., and Higgins, T. L., in press, Working with data: Highlights related to
research. In Developing Mathematical Ideas: Collecting, Representing, and Analyzing
[40] Watson, J. M., 1997, Assessing statistical literacy using the media. In The Assessment
Challenge in Statistics Education, edited by I. Gal and J. B. Garfield (Amsterdam:
IOS Press and The International Statistical Institute), pp. 107–121.
[41] Luke, A., and Freebody, P., 1997, Shaping the social practices of reading. In
Constructing Critical Literacies: Teaching and Learning Textual Practice, edited by
S. Muspratt, A. Luke and P. Freebody (St. Leonards, Australia: Allen and Unwin),
pp. 185–225.