Abstract
The broad objective of the present work was to successfully engage low-achieving, underserved middle-school students in activities introducing them to
practices of science. These extended beyond the typical univariable control-of-variables strategy to include attribution and prediction involved in coordination
of multiple variables affecting an outcome, as well as practices of argument and
counterargument in advancing and challenging claims. Social science content
was used as a way to help students see the purpose and value of scientific
practices. The broad objective was largely met, as evidenced by near and far
assessments of transfer and maintenance among two 6th grade classes (N=49),
both outperforming a control group (N=23). Although students engaged
successfully in argument and counterargument as part of the intervention
activities, less successful was far transfer to meta-level reasoning about
argumentation and nature of science. Finally, and importantly in its practical
implications, one of the groups of intervention students showed less gain over
ten 45-min classroom sessions than did the other group, who engaged as pairs in
the same sequence of activities over an average of six 24-min individualized
sessions.
recognize and appreciate (Applebee, 1996; Forman & Ford, 2014; Lehrer &
Schauble, 2015; Sandoval, 2015).
These objectives of science education are a tall order to implement for
any students, let alone academically underperforming, disadvantaged ones.
Rudolph (2014) notes that such objectives in fact go back as far as Dewey and
that the problem lies not with these educational goals but with schools' seeming
inability to devise workable school experiences to achieve them. What appear
to be the most promising methods have yet to be widely implemented in
classrooms.
Adding to the challenge is the restricted way in which scientific practice
has been defined, both in the K-12 classroom and in educational research. In
both cases science practice typically has been taken to mean use of the
scientific method, which in turn has been regarded as the design and analysis of
a controlled experiment. Moreover, the experiment is a univariable one, its
essence being the control (by holding constant) of variables (COV) in order to
identify the effect of a single variable on an outcome. This univariable model of
a rudimentary science experiment is not only the staple of classroom
introductions to the scientific method. Mastering COV similarly has until recently
been a primary focus of research on the development of scientific thinking (for
reviews see Kuhn, 2011; Lehrer & Schauble, 2006, 2015; Zimmerman, 2007).
In the real world, in contrast, outcomes are most often the consequence
not of a single cause but of multiple factors acting in concert, a fact that
practicing scientists are well aware of and take into account in both their
theoretical models and empirical investigations. The univariable logic and
execution of COV represents at most one narrow slice of authentic scientific
inquiry, and the most recent writing on developing children's competency in
science emphasizes involving students not in acquisition of a tool kit of discrete
skills such as COV but rather in the practice of science as an authentic,
integrated whole (Lehrer & Schauble, 2015).
This is the approach we have sought to implement in the work presented
here. Moreover, we seek to establish that the objective of introducing beginning
Participants in the present work come from the low SES, low-achieving
middle-school population in which several researchers have reported it difficult
to develop rudimentary scientific thinking skills, compared to success in doing
so among more privileged groups (Kuhn & Dean, 2008; Siler, Klahr et al., 2010;
Lorch, Lorch, Calderhead et al., 2010). Some of these low-performing students,
for example, when seeking to design an experiment fail to manipulate the focal
variable. This failure can be attributed at least in part to the absence of more basic
understanding of the purpose of scientific investigation as a) seeking to answer
questions whose answers are not already known, and b) engaging in causal
analysis, rather than seeking only to optimize outcomes (Schauble, Klopfer, &
Raghavan, 1991).
Furthermore, it is widely observed that students in such populations
typically show little interest in or disposition to study science. This population,
then, seemed to us an especially important one to reach and achieve success
with, while remaining mindful of the practicality of the methods examined for
large-scale classroom use. We therefore included here a comparison of two parallel
methods, identical except that one is administered to pairs of students by a
researcher while the other is administered to a whole class by the classroom
teacher (with some researcher assistance).
Method
Participants
The participants were from a public school in a low-income neighborhood
in the Bronx in New York City. There were a total of 72 students (38 females)
from three sixth-grade and one seventh-grade science classes, all taught by the
same teacher. Participating in pairs in a pair intervention condition were 25
students (12 females) from one sixth-grade classroom. Because of the odd
number of students in this group, one student participated in the intervention
required seven sessions. These sessions took place over an average of 32 days,
with a range from 14 to 59.
At the first session, the activity was introduced as follows, illustrated by
an accompanying PowerPoint graphic:
A new Astro-World Foundation, funded by some wealthy businessmen,
wants to provide money for a space station. Groups of young people would
live there for several months. Many young people have applied. The
Foundation president needs to choose the best ones. So she asked some
applicants to spend a week in a space simulator (picture is shown and
function explained). She had background information about each
applicant, and each one got a rating on how well they survived in the
harsh conditions of the simulator. Some did fine, others okay, and some
became sick and had to leave.
Based on these records, she can decide which things are important to ask
new applicants about and which ones aren't. Some of the factors, she
noticed, made a big difference to how well an applicant did, some made a
small difference, and some made no difference. She found out, for
example, that body weight made no difference: Heavy people did as well
in the simulator as light ones. But other things about people seemed to
make a big difference in how well they did. So now, when she chooses final
groups of astronauts to go on the real trips, she'll have a better idea what
things to find out about applicants, so she can be pretty sure how an
applicant will do and she'll be able to choose the ones who will do best.
But, in order to be sure, she's asked for our help in analyzing their results.
Which things are worth asking applicants about and which don't make any
difference, like body weight? There are a lot of things that we can ask
about but the foundation can't ask about everything. It would take too
long. If we know what to ask applicants, we can choose the best team of
astronauts.
Here are four things that the foundation thought might make a difference
to how well people do in the simulator:
- Fitness: does how well the person can run or do other exercises matter?
- Family size: does the size of the family the person grew up in matter?
- Education: does how much education a person has matter?
- Parents' health: does the health of the person's parents matter? All the
applicants seem healthy, but maybe their parents' health might say
something about how healthy they will turn out to be.
Will you help figure out which things are worth asking the applicants about
and which ones don't matter? Then you can predict how well they'll do and
choose the best ones for the team. Later, you can compare your results
with those of your classmates and see who chose the best-performing
astronaut team.
Following this introduction, pairs were asked to record on a form which of the
four factors they thought would and would not matter. In the classroom
condition, a tally across the class was shown, and in both groups it was noted
that opinions differ.
Control of variables phase. This phase was introduced by the adult
(teacher or facilitator, depending on condition) saying, "These are only opinions
and what someone thinks. Now, let's look at the data to find out what actually
does matter and whether your hypotheses were right." A general reminder of
the larger purpose of the activity was then provided and was repeated
periodically throughout the activity (a minimum of once per session):
"Remember, the goal is to figure out what matters to how well people do in the
simulator. Why do we want to know that? Because once we know what matters,
we can predict how well people will do. That way, we can pick the best team."
Students were then shown a set of 24 index cards, each containing an
applicant's standing on the four factors and a space to record the applicant's
performance rating in the simulator. Students were told that if they studied the
records carefully they could determine which factors make a difference to
performance and which don't. Students shared the set of cards with their
partner and were reminded that they needed to agree before making decisions
or drawing final conclusions.
The adult suggested studying one factor at a time. Students chose the
order of investigation in the pair condition; in the classroom condition it was
fixed (fitness, education, family size). Students were invited to choose from the
set the card(s) they would like to look at and to explain verbally (pair condition)
or in writing on a form for this purpose (classroom condition) what they would
find out from this choice of card(s). They then requested verbally (pair
condition) or in writing on a Data Request form (classroom condition) the
performance outcomes for their chosen applicants.
After recording these outcomes, pairs then had the option either to reach
a conclusion or to postpone concluding and seek further evidence by repeating
the preceding process. Once a pair was certain they had reached a conclusion
about a factor's status, they could enter it on a Draft Memo form. In the pair
condition, if the conclusion was a valid one based on a controlled comparison,
the pair proceeded to choose another factor to examine. If no controlled
comparison existed allowing a valid inference, the adult embarked on a
sequence of probe questions (see Appendix) whose purpose was to support
recognition of the limitation of the students' investigative approach in not
yielding a definitive conclusion (e.g., "Couldn't it also be the difference in
education that's leading to the different outcomes?"). No superior approaches
were suggested, and scaffolding did not go beyond highlighting failure to
achieve the goal (a definitive conclusion). In the case of valid conclusions,
challenging probes were introduced (see Appendix), e.g., "Suppose someone
disagrees with you and doesn't think that this factor makes a difference; what
could you tell them to convince them?"
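The core of this scaffolding, then, was a check of whether a pair's chosen records constituted a controlled comparison. A minimal sketch of that check follows (illustrative Python; the record fields and levels are invented, not the study's materials):

```python
# Validity check behind the facilitator's probes: two records support a
# definitive conclusion about a focal factor only if they differ on that
# factor and match on every other factor (a controlled comparison).
# Factor names and levels below are illustrative assumptions.

FACTORS = ["fitness", "family_size", "education", "parents_health"]

def is_controlled_comparison(rec_a, rec_b, focal):
    """True if the records differ on `focal` and agree on all other factors."""
    if rec_a[focal] == rec_b[focal]:
        return False  # focal factor not varied: uninformative about it
    return all(rec_a[f] == rec_b[f] for f in FACTORS if f != focal)

a = {"fitness": "excellent", "family_size": "large",
     "education": "college", "parents_health": "good"}
b = dict(a, fitness="average")        # differs only on fitness
c = dict(b, education="no college")   # fitness AND education differ

print(is_controlled_comparison(a, b, "fitness"))  # True: valid inference
print(is_controlled_comparison(a, c, "fitness"))  # False: invites the probe
# "Couldn't it also be the difference in education that's leading to the
# different outcomes?"
```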
It was then illustrated that charts can be generated that separate cases
into different categories; for example, in the display shown (Figure 2), only those
cases in which the applicants' fitness was average rather than excellent are
shown. Students were then asked why these applicants, all of the same
fitness level, showed a range of performance outcomes. With a little
prompting, students were able to generate the response that other factors
besides fitness were contributing to the outcomes.
Students were then shown a third display (Figure 3) in which all levels of
the fitness variable are included. They were asked to draw conclusions about
whether the factor makes a difference to the applicants' performance. Given the
ability to see more data at once, students were asked to see if they reached the
same conclusions as they did earlier when comparing individual cases
presented on cards.
Students were then provided InspireData charts for each of five factors,
four introduced previously and one new one (home climate, a non-effective
factor), each of the same form as Figure 3, showing outcomes for all levels of
the factor. Students were reminded these charts would give them an
opportunity to verify their earlier conclusions. In their pairs, they did this and
then wrote memos to the foundation director confirming their earlier
conclusions based on a larger sample or revising their conclusions if they
thought necessary.
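The inference the full-data charts support can be expressed in a brief sketch (illustrative Python with invented ratings, offered only as an analogy to the visual comparison students made): a factor "makes a difference" when the outcome distributions differ across its levels.

```python
# Group performance ratings by a factor's level and compare the groups;
# the records below are invented for illustration.
from collections import defaultdict

def outcomes_by_level(records, factor):
    """Map each level of `factor` to the mean performance rating at that level."""
    by_level = defaultdict(list)
    for r in records:
        by_level[r[factor]].append(r["rating"])
    return {level: sum(v) / len(v) for level, v in by_level.items()}

records = [
    {"fitness": "excellent", "weight": "heavy", "rating": 5},
    {"fitness": "excellent", "weight": "light", "rating": 4},
    {"fitness": "average",   "weight": "heavy", "rating": 2},
    {"fitness": "average",   "weight": "light", "rating": 3},
]
print(outcomes_by_level(records, "fitness"))  # clear gap: fitness matters
print(outcomes_by_level(records, "weight"))   # identical means: weight does not
```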
Figure 2. InspireData Chart Showing only Cases with Average Level of Fitness
Prediction phase. Students were told that now that they had reached final
conclusions, they could try using them to evaluate a new set of applicants. They
would then be able to select a set of five to be chosen for the astronaut
program and compare their choices to those of their classmates. Students were
told that they could select up to four factors about the new applicants that they
could receive information on. As students were selecting the factors, the adult
reminded them to review the InspireData chart and consider whether
knowledge of status on this factor would be informative as to outcome. In the
classroom condition, a similar process took place at the whole-class level.
Information about 10 new applicants on four factors (including one non-effective
one, whether or not it was asked for) was then provided, with data for each applicant
appearing on a separate card and cards presented one at a time. Students
completed the first prediction with guidance and then worked independently. In
addition to making each prediction, they were asked for each one, "Which of the
four factors you have data on mattered to your prediction?" Students were
encouraged to review the InspireData charts to double-check their decisions or
to resolve disagreements within the pair.
A final discussion occurred when pairs made their selections of the five
top-rated candidates and, in the classroom condition, shared these with the
whole class. This discussion included recalling the beliefs they had initially
held about the factors and noting that, before the analysis they had conducted,
they would not have chosen the same applicants.
Progression through the sequence. Students in the pair condition
progressed at their own pace, with a judgment made by the facilitator as to
when both members of the pair had a solid grasp of COV and appropriately
justified and defended their claims and were thus ready to move to the
multivariable phase; a similar judgment was made regarding progression from
the multivariable to prediction phases. Although this close monitoring was not
possible in the classroom condition, a paper-and-pencil assessment was
administered after the 8th class session to ascertain individual progress to this
point. The task required students to select a case to compare to one presented
to them, in order to determine whether one variable (fitness) made a difference.
Post-intervention assessment
All post-intervention assessments except one were conducted individually
and all were delayed in order to assess maintenance of achievements. Among
students in the pair condition these assessments occurred an average of 26
days following completion of the intervention (range 14 to 42 days). Among
students in the classroom condition they occurred an average of 32 days
following completion of the intervention (range 18 to 46 days). Assessments for
students in the control condition occurred during this same time period.
Maintenance and near transfer of skills in experimental design and control
of variables (COV). As the first component of the delayed post-intervention
assessment, a task related to the intervention activity was individually
administered to students in both intervention conditions. It was not
administered to students in the control condition as they were not familiar with
the intervention content. Its purpose was to assess the extent to which
3. Reconciling claims. The final task continued the topic theme but
was presented in writing in the classroom on an occasion about one and a
half months after the other posttests, reducing possible influence of
students' particular responses to the previous two tasks having the same
theme. Two potentially conflicting causal claims are now explicitly
presented and the question asks the participant how to interpret this
discrepancy:
You were hired by the Health Department to find out why people
living in the city of Logan, Georgia are getting cancer more often
than people who live outside the city. You tested and found out that
air pollution was worse inside the city than outside. You wrote a
report of your findings to the Health Department director, telling her
that air pollution was a likely cause of the increase in cancer.
She also got a report from another person she hired. This report said
that a likely cause of the cancer increase was not enough stores in
the city for people to buy healthy fruit and vegetables that lower risk
of cancer.
The director isn't sure what to conclude and she has written you
asking for advice. What would you write back? Give her the best
advice you can.
This question, then, unlike earlier ones, solicits reasoning not about the
claims themselves but rather meta-level reasoning about their status in
relation to one another and how the discrepancy between them is to be
understood, a form of reasoning that is epistemological in nature and
central to scientific practice.
Results
Designing experiments and making inferences
Table 1 [title not recovered]

                      Pair condition    Classroom condition
[row label lost]      1 (4%)            3 (13%)
[row label lost]      2 (8%)            3 (12%)
[row label lost]      2 (8%)            6 (25%)
Table 2 [title not recovered]

                          Pair condition    Classroom condition    Control condition
[row label lost]          3 (12%)           13 (54%)               14 (61%)
[row label lost]          7 (28%)           4 (17%)                7 (30%)
Controlled comparison     15 (60%)          7 (29%)                2 (9%)

Note. N = 25 for the pair condition, 24 for the classroom condition, and 23 for the
control condition.
Thus, comparing Tables 1 and 2, to an approximately equal extent across
the two intervention conditions, fewer participants maintained controlled
comparison consistently when content was new and represented only in a
traditional paper-and-pencil format. Again assigning 4 points for a controlled
comparison and 3 points for varying a focal variable, the mean score for the pair
group was 3.71 (SD = 0.51), comparable to the near transfer mean reported
above of 3.68. For the classroom group the mean was 3.10 (SD = 0.78), slightly
below the near transfer mean of 3.23. For the control group, the mean was 2.99
(SD = 0.63). The difference was significant for comparisons between the pair
condition and the classroom condition, t = 3.23, df = 39.59, p = .003, and
between the pair condition and the control group, t = 4.32, df = 42.43,
p < .001, but not between the classroom condition and the control group,
t = 0.54, df = 43.85, p = .591.
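For the interested reader, this comparison can be reproduced from the summary statistics above. The sketch below assumes Welch's unequal-variance t-test, an inference from the fractional degrees of freedom reported; raw data are not available, so the summary-statistics form of the test is used.

```python
# Reproduce the pair vs. classroom comparison from reported summary statistics,
# assuming Welch's unequal-variance t-test (inferred from the fractional df).
from scipy import stats

pair      = (3.71, 0.51, 25)   # mean, SD, n (far-transfer COV scores)
classroom = (3.10, 0.78, 24)

t, p = stats.ttest_ind_from_stats(*pair, *classroom, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")  # approx. the reported t = 3.23, p = .003
```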
Multivariable analysis and prediction
Students overall did well on this task, indicating they understood the task
and were capable of performing it, yet there were significant differences in
performance across groups. In Table 3 appear mean prediction error scores over
the six items by group and the number of students in each group who showed
modal performance of zero error (a correct prediction). Of the students whose
modal error was not zero, all but one showed a modal level of 1; the remaining
student showed a modal level of 2. The maximum error score for each item
was 2.
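A minimal sketch of this scoring follows (illustrative Python; the predictions are invented, and treating the 0-2 error as the absolute distance between predicted and actual performance levels is our assumption):

```python
# Per-item prediction error (0-2), with each student's mean and modal error.
from statistics import mean, mode

def score_predictions(predicted, actual, max_error=2):
    """Absolute prediction errors capped at max_error; return (mean, mode)."""
    errors = [min(abs(p - a), max_error) for p, a in zip(predicted, actual)]
    return mean(errors), mode(errors)

# Hypothetical student: correct on four of six items.
print(score_predictions([3, 2, 5, 1, 4, 2], [3, 2, 4, 1, 4, 4]))  # (0.5, 0)
```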
Table 3. Mean Prediction Error Scores and Modal Frequencies by Group
                     Pair condition    Classroom condition    Control condition
Mean error (SD)      0.16 (.23)        0.42 (.39)             0.67 (.28)
[modal frequency rows not recovered]
intermediate but closer in performance to the pair group than to the control
group.
Table 4. Mean Number of Times (of 6) Each Factor Was Reported as Having
Influenced Prediction

                      Pair condition    Classroom condition    Control condition
Effective factors
  Employment          5.61*             5.00*                  3.91*
  Family size         5.57*             5.25*                  3.70*
  Education           5.87*             5.04*                  4.04*
  Climate             5.30*             4.00*                  2.25
Ineffective factor
  Country Size        0.04              1.21                   1.74

Note. *Mean for the contributing factor significantly different from the mean for
the non-contributing factor, country size. N = 23 in the pair condition, 24 in the
classroom condition, and 23 in the control condition.
Students were categorized based on the consistency of their attributions as
follows (a minimal coding sketch appears after the list):
1-Chose only one but inconsistent factor across 6 countries
2-Chose only one consistent causal factor across 6 countries
3-Chose multiple but inconsistent factors across 6 countries
4-Chose multiple consistent (but not all four effective) factors across 6
countries
5-Chose four effective factors completely consistently across 6 countries
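The sketch below is our reconstruction of this coding in illustrative Python; the category logic follows the definitions above, and the factor names follow Table 4.

```python
# Code a student's attributions (one set of factors per country, six countries)
# into the five consistency categories listed above.
EFFECTIVE = {"employment", "family_size", "education", "climate"}

def consistency_category(attributions):
    """`attributions`: list of six sets of factors the student cited."""
    consistent = all(a == attributions[0] for a in attributions)
    if all(len(a) == 1 for a in attributions):
        return 2 if consistent else 1     # single factor: consistent or not
    if not consistent:
        return 3                          # multiple factors, inconsistent
    return 5 if attributions[0] == EFFECTIVE else 4

print(consistency_category([{"education"}] * 6))                 # 2
print(consistency_category([{"employment", "education"}] * 6))   # 4
print(consistency_category([set(EFFECTIVE) for _ in range(6)]))  # 5
```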
The mean difference between pair and classroom groups was significant, t
= 3.32, df = 42.38, p = .002. The classroom group significantly outperformed the
control group, t = 2.86, df = 37.87, p = .007, as did the pair group, t = 7.73, df =
42.12, p < .001.
[Table title not recovered]

                    Pair condition    Classroom condition    Control condition
[row label lost]    4 (17%)           5 (21%)                6 (26%)
[row label lost]    6 (26%)           10 (42%)               9 (39%)
[row label lost]    13 (57%)          9 (38%)                8 (35%)
fruits and vegetables and use less fire because it releases smoke & causes air
pollution.
None of these responses raised the question of whether the validity of the
causal claims (the second of which made no reference to supporting evidence)
should be evaluated, rather than the claims simply being put into action. Across the
entire sample, there occurred only six exceptions, five from students in the
classroom intervention condition and one in the control condition. One of these
noted that the claims were not mutually exclusive: "I would advise that both
could be possible." The others sought to compare likelihoods of their
correctness, two drawing on their own knowledge: "Maybe air pollution is
worse than not enough stores. They're both important but pollution in the air is
important because what we are breathing is air," or "It can't be lack of fruit
because some people don't eat fruit and don't get cancer." The remaining three
suggested methods for doing so, e.g., "Ask people that have cancer if they eat
healthy. If they say yes they probably got cancer from air pollution."
Discussion
In light of a history of difficulty in effecting advances in higher-order
thinking skills in the disadvantaged, low-performing population studied here, the
goal of the present study was essentially met. Both intervention groups showed
notable although far from perfect mastery of COV relative to a control group,
yet with diminished maintenance in tests of far transfer following delay, among
the pair group as well as the classroom group, for whom the intervention
appeared less effective. With 80% of students in the pair condition and 50% in
the classroom condition consistently showing controlling across multiple tasks,
these results compare favorably to previous efforts with similar populations that
address only the COV strategy (Lorch et al., 2014; Siler et al., 2010).
Vacillation in use of new strategies, including COV, during a period when
they are first being acquired has been widely reported in microgenetic studies
(Kuhn, 1995; Siegler & Crowley, 1991). As commonly found, students in the
present study displayed more consistency in transfer tasks with familiar than
with unfamiliar material. An identical number of students in both intervention
groups (5 students in each, about 20%) did not maintain their previously
consistent use of their experimental design skills when they transitioned from
familiar (near test) to new (far test) content, becoming less consistent in its
application. The majority (60%) of the pair group maintained consistent control,
while slightly less than a third (29%) of the classroom group did so. The
difference in performance of the two groups (about 30 percentage points) thus
remained equivalent at the two assessments. These findings are consistent with
the view that continued and varied experience across a succession of domains
is necessary if consolidation of higher-level cognitive strategies (as assessed in
tests of maintenance and far transfer) is to be achieved.
With respect to the less studied skills entailed in coordination of effects of
multiple variables, as assessed in the Life Expectancy assessment, both
intervention groups displayed considerable mastery, especially in relation to the
far from optimal performance shown by adults on this task (Kuhn, Ramsey, &
Arvidsson, 2015) as well as the present control group.
The broad objectives of the present work were thus met, in establishing
that it is feasible using the methods we did to engage low-achieving,
underserved middle-school students in activities that introduce them to
practices of science, following which they show measurable benefits, with a
majority continuing to evidence understanding of and facility in these practices
several weeks later. Other recent work (Ramsey, 2014) provides evidence of
the benefits of similar activities extending to a more traditional physical
science topic (weather patterns).
Results are more mixed with regard to argumentation and to developing
meta-level thinking about argument and related understandings of nature of
science. The intervention activities involved not only a repeated requirement to
justify one's claims with appropriate evidence, but also repeated engagement
with the probe, "Suppose someone disagreed with you," which we expected to
afford exercise in defending and supporting alternative claims using evidence-based
arguments. This expectation was largely met within the intervention.
With practice, students for the most part became successful in meeting these
demands and most often did so confidently by means of direct reference to
evidence, e.g., "I'd show them the results on the chart."
Where students demonstrated less success was in extending these
competencies to meta-level reflection on argumentation in new contexts
unrelated to the intervention. We expected the intervention to support students'
understanding of evidence as the strongest arbiter of divergent claims and to
thus lead them to reason about instances of such divergence accordingly. These
expectations were largely not realized when the assessment was extended to
new contexts not connected to the intervention. The post-intervention
argumentation assessments did suggest some achievement with respect to
recognizing multiple contributors to an outcome, consistent with the findings of
Kuhn, Ramsey and Arvidsson (2015) in a multi-year intervention. In the
counterargument assessment, however, only a slight difference across groups
appeared, with control group students more often mistaking the existence of an
alternative contributor (option A) as evidence against the role of the contributor
being examined. No group differences appeared, however, in students'
recognizing absence of the outcome in the presence of the antecedent (option
B) as the strongest evidence to use in counterargument (the option adults are
most likely to choose), and they were just as likely to choose the absence of the
antecedent in the presence of the outcome (option C) as strongest (despite its
consistency with an alternative factor having produced the outcome).
Finally, students showed least progress in the group-administered
assessment asking them to account for divergent claims in this context
unconnected to the intervention. Only a small proportion of students undertook
to do so. The large majority did not treat divergent causal claims as a cause for
attention, examination, or attempted reconciliation. Instead, without
acknowledging the divergence, they focused on one or the other or both
imputed factors as causes worthy of action without further investigation.
directed activities that involve science practices, over a wide range of content,
they are in the best position to extract some general attributes of these
practices and to appreciate their value. This experience can only accrue
gradually. As well as the skill components involved in coordinating evidence and
claims in the service of argument, scientific practice encompasses the values
and norms that come from participation with others in a community that
upholds shared standards of knowing (Manz, 2014; Sandoval, 2015).
We turn finally to the comparison of our two intervention groups, whose
performance showed notable differences. A major difference in implementation
of the intervention was not only the whole-class vs. pair setting but the time
frame: the classroom group participated for ten 45-min class sessions, whereas
the pair group participated on average for the equivalent of only three such
class sessions (an average of six 24-min sessions), less than a third as much
time invested. It was even the case, moreover, that the classroom group
showed a gain in COV from their interim performance on a whole-class
assessment to performance assessed individually several weeks after the
intervention, suggesting the importance of this difference in assessment
conditions alone.
Overall it was the individually instructed group who fairly consistently
showed superior performance, despite the lesser time invested. This outcome,
we believe, is one that has important practical implications, in speaking to the
value of individualized instruction, in particular for the population we studied.
The classroom group was able to make progress, but less efficiently, in large
part, we believe, due to a) students' individual claims not being directly
challenged (instead only less directly so during whole-class discussion); b) the
attitude we observed in the classroom that only intermittent attention was
necessary to keep pace with what was taking place; and c) the classroom time
that routinely needed to be diverted to gaining and regaining students'
attention. In the pair condition, in contrast, the pair was in constant interaction
with each other and the adult and required to think about and justify whatever
they said at just the time they said it. In current work, we are therefore
exploring ways to automate the protocol shown in the Appendix in a way
that could make it practical for large-scale use. This is of course a sizeable step
from personal conversation between peers and a more knowledgeable
facilitator, but we believe it may be one worth pursuing, especially in seeking to
reverse the long-standing lack of success that continues to be widely observed
among the population studied here.
Footnotes
1. In a sample of 16 educated adults, 8 (50%) chose B, while 3 (19%) chose
both B and an additional option.
2. The remaining two students in each intervention group were consistent but
attributed influence to only 3 of the 4 factors.
References
Applebee, A. N. (1996). Curriculum as conversation: Transforming traditions of
teaching and learning. University of Chicago Press.
Berland, L. K., & Hammer, D. (2012). Students' framings and their participation
in scientific argumentation. In xxxxxxx (Ed.), Perspectives on scientific
argumentation (pp. 73-93). Springer Netherlands.
Carey, S., & Smith, C. (1993). On understanding the nature of scientific
knowledge. Educational Psychologist, 28(3), 235-251.
Crowell, A., & Kuhn, D. (2014). Developing dialogic argumentation skills: A
3-year intervention study. Journal of Cognition and Development, 15(2),
363-381.
Ford, M. J. (2012). A dialogic account of sense-making in scientific
argumentation and reasoning. Cognition and Instruction, 30(3), 207-245.
Forman, E. A., & Ford, M. J. (2014). Authority and accountability in light of
disciplinary practices in science. International Journal of Educational
Research, 64, 199-210.
Jewett, E., & Kuhn, D. (2015). Problem-based learning as a tool in developing
higher-order intellectual skills in low-achieving students. Manuscript
under review.
Kelly, G. (2008). Inquiry, activity and epistemic practice. In R. Duschl, & R.
Grandy (Eds.), Teaching scientific inquiry: Recommendations for research
and implementation (pp. 99-117). Rotterdam, The Netherlands: Sense
Publishers.
Kuhn, D., & Pease, M. (2009). The dual components of developing strategy use.
In H.S. Waters & W. Schneider (Eds.), Metacognition, strategy use, and
instruction. New York: Guilford Press.
Kuhn, D., Ramsey, S., & Arvidsson, T. S. (2015). Developing multivariable
thinkers. Cognitive Development, 35, 92-110.
Kuhn, D., Zillmer, N., Crowell, A., & Zavala, J. (2013). Developing norms of
argumentation: metacognitive, epistemological, and social dimensions of
developing argumentive competence. Cognition and Instruction, 31(4),
456-496.
Lazonder, A., & Kamp, E. (2012). Bit by bit or all at once? Splitting up the inquiry
task to promote children's scientific reasoning. Learning and Instruction,
22, 458-464.
Lehrer, R., & Schauble, L. (2006). Scientific thinking and scientific literacy:
Supporting development in learning contexts. In K. A. Renninger & I. Sigel
(Vol. Eds.) & W. Damon (Series Ed.), Handbook of Child Psychology. Vol. 4.
Hoboken, NJ: Wiley. (6th ed.)
Lehrer, R., & Schauble, L. (2015). The development of scientific thinking. In L.
Liben (Vol. Ed.) & R. Lerner (Series Ed.), Handbook of Child Psychology
and Developmental Science. Vol. 2. Hoboken, NJ: Wiley. (7TH ed.)
Lorch Jr, R. F., Lorch, E. P., Calderhead, W. J., Dunlap, E. E., Hodell, E. C., & Freer,
B. D. (2010). Learning the control of variables strategy in higher and lower
achieving classrooms: Contributions of explicit instruction and
experimentation. Journal of Educational Psychology, 102(1), 90-101.
Lorch Jr, R. F., Lorch, E. P., Freer, B. D., Dunlap, E. E., Hodell, E. C., & Calderhead,
W. J. (2014). Using valid and invalid experimental designs to teach the
control of variables strategy in higher and lower achieving classrooms.
Journal of Educational Psychology, 106(1), 18-35.
Manz, E. (2014). Representing student argumentation as functionally emergent
from scientific activity. Review of Educational Research, xxxxxxxxxxxxxx
Osborne, J. (2014). Teaching scientific practices: Meeting the challenge of
change. Journal of Science Teacher Education, 25(2), 177-196.
Ramsey, S. H. (2014). How do we develop multivariable thinkers? An
evaluation of a middle school scientific reasoning curriculum. Unpublished
Ph.D. dissertation, Teachers College, Columbia University.
Rudolph, J. L. (2014). Dewey's "science as method" a century later: Reviving
science education for civic ends. American Educational Research Journal,
51, 1056-1083.
Sandoval, W. (2005). Understanding students' practical epistemologies and their
influence. Science Education, 89, 634-656.
Sandoval, W. (2014). Science education's need for a theory of epistemological
development. Science Education, 98, 383-387.
Sandoval, W. (2015). Epistemic goals. In R. Gunstone (Ed.), Encyclopedia of
Science Education (pp. 393-398). Dordrecht: Springer Netherlands.
Schauble, L., Klopfer, L. E., & Raghavan, K. (1991). Students' transition from an
engineering model to a science model of experimentation. Journal of
Research in Science Teaching, 28(9), 859-882.
Siegler, R., & Crowley, K. (1991). The microgenetic method: A direct means for
studying cognitive development. American Psychologist, 46(6), 606-620.
Siler, S., Klahr, D., Magaro, C., Willows, K., & Mowery, D. (2010, January).
Predictors of transfer of experimental design skills in elementary and
middle school children. In Intelligent Tutoring Systems (pp. 198-208).
Springer Berlin Heidelberg.
Zimmerman, C. (2007). The development of scientific thinking skills in
elementary and middle school. Developmental Review, 27, 172-223.
Appendix
A. (If pair chooses only one record) What are you going to find out from this
record? (Pair responds.)
(Facilitator provides the outcome for the chosen record. Pair records the
outcome on the record.)
What did you find out? (Pair responds.)
Can you conclude whether [factor being investigated] makes a difference to the
performance in the simulator? (Pair responds.)
1. (If pair says yes, they can tell whether factor makes a difference)
What will happen to the outcome for this applicant if [factor being investigated]
goes up? (Pair responds based on belief.)
Do you know for sure? Why don't you test this out to be sure? What cards would
you need to test this out?
2. (If pair says no) What cards would you need in order to be able to find out
whether this factor makes a difference? (Pair responds.)
(If pair does not suggest finding another record to compare) What will happen
to the outcome for this applicant if [factor being investigated] goes up? (Pair
responds based on belief.)
Do you know for sure? Why don't you test this out to be sure? What cards would
you need to test this out?
B. (If pair chooses two records) What are you going to find out by comparing
these two records? (Expected response: whether X makes a difference to the
outcome.)
(If pair answers anything else, guide pair to only think about finding out
whether [factor being investigated] makes a difference to the outcome.)
(Provide pair the outcome for the chosen record. Pair records the outcome on
the record.)
3. (If records were controlled) What will the outcome be when [factor being
investigated] changes from one level to another? (Pair responds. If necessary
facilitator reviews what it means that a factor makes a difference.)
Make an argument to the foundation of how you're sure [factor being
investigated] is/isn't important. (Pair responds.)
But couldn't I say that applicant A has [certain level] for [another factor] and
that is why applicant A has a better grade than applicant B? (Expected
response: no, because applicant B also has the same level for [the other factor]
as applicant A. So it cannot be [the other factor] that is making the difference in
performance. If student does not provide the expected answer, guide the
partner to challenge the students response. If both students fail to note that
[the other factor] was the same for both records, facilitator asks this question
again when investigating the next factor and points specifically to the records.)
Suppose someone disagrees with you and doesn't think that [factor being
investigated] does/does not make a difference; what would you say to them to
convince them? (Pair responds. Eventually, expected response: show them the
cards.)
Let's write down what you found out here as a memo to the foundation. (Gives
pair a memo form)
(Once pair has successfully controlled for one or two factors) What if we change
your comparison so that [another factor] also differs. Can you still use this
comparison to show that [factor being investigated] makes a difference to the
outcome? (Pair responds.)
Why is this comparison not convincing? (Pair responds.)
Now choose another factor to investigate.
4. (Fallback questions if, after a few tries, the pair still does not choose
controlled records) What do you think would happen to applicant A's
performance, if, for [factor being investigated], she has a different level? (Pair
makes a guess of performance.)
Should we find out what the records show? Which record do we need? (Pair
responds.)
a. (If pair still does not choose a controlled record) What would applicant
A's record look like if she has a different level for [factor being investigated]?
(Pair responds.)
Let's look for that card so that we know what happens to applicant A's
performance if she has a different level for [factor being investigated].
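(A compact paraphrase of the branching logic above, in illustrative Python rather than the protocol's own wording:)

```python
# Which family of probes the facilitator reaches for, given the pair's records.
def facilitator_probe(focal, other_differing):
    """`other_differing`: factors besides `focal` that differ across the records."""
    if other_differing:
        # Uncontrolled comparison: highlight the confound, suggest nothing more.
        return (f"Couldn't it also be the difference in {other_differing[0]} "
                "that's leading to the different outcomes?")
    # Controlled comparison: press for justification against a skeptic.
    return (f"Suppose someone disagrees that {focal} makes a difference; "
            "what would you say to them to convince them?")

print(facilitator_probe("fitness", ["education"]))
print(facilitator_probe("fitness", []))
```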
no college, some college and college education. Let's see if you come to the
same conclusion as you did earlier now that you are looking at more results.
Which factor do you want to look at first using the charts? (Pair responds.)
(Shows chart for [factor being investigated].)
So, remember, other things may be contributing as well. But can we say
OVERALL that [factor being investigated] makes a difference?
A. (If pair refers to beliefs or Phase I comparisons) Remember that the
organization wants to be really sure before we make any decisions. So we don't
want to rely on opinions/just two cases.
What does the chart say about whether the factor matters or not to the
performance?
B. (If pair is unsure how to tell if [factor being investigated] makes a
difference) Let's look at the chart together. What do you notice about the
performance levels when the applicants have [a certain level, e.g. average] and
what about when they have [a different level, e.g., excellent] for [factor being
investigated]? (Pair responds.)
Do the applicants have different or similar kinds of performance? What does
that tell you about whether [factor being investigated] makes a difference to
the performance?
C. (If pair correctly compares the distributions) Make an argument to the
foundation of how you are sure [factor being investigated] is/isn't important.
(Correct response: by comparing the 2 distributions)
Suppose someone disagrees with you, what would you say to them to convince
them? (Pair responds. Expected response: show them the chart.)
How would the graph look different if [factor being investigated] makes no
difference to how well people do?
Let's write down what you found out here as a memo to the foundation. (Give
pair a memo form.)
Now choose another factor to investigate.
(Once all factors have been examined) Now, let's make a summary of what you
have figured out. (Facilitator assists as needed as pair completes summary
memo.)
Phase 3. Prediction
You've worked hard to figure out what factors mattered to the applicants'
performance. Now, you can use your knowledge to predict how well they will do.
I have some new applicants here. Can you predict how well they will do? Then
you can choose a best team of five.
What information about the applicant would you need to predict his/her
performance? I can only tell you up to four things about the applicant. Which
factors do you want to know about? Look at the list of factors here; what
information about them would you like to have? You can make use of the
summary sheet or refer to any of the charts anytime. (Expected response: all
causal factors. If not all causal factors are requested, urge pairs to review each
chart and if necessary, repeat phase 2 protocols.)
Here is the information we have on this applicant. (Facilitator provides data for
all requested causal factors and, unless the pair chose a non-causal factor, also
includes Home Climate, a non-causal factor, for the applicant.)
Now you can predict. Predict how each one will do, and explain why you made
that prediction. Be sure to discuss with your partner before making a final
decision.
(Applicant description sheets are presented, one at a time, with charts handy so
that pair can refer to them when needed.)
(For the first three predictions, facilitator asks, "Which factors mattered to your
prediction?")
A. (If pair selects a non-causal factor as influencing the prediction)
What did you find out about [the non-effective factor]? Did it make a difference
to the outcome?
1. (If pair answers yes) How did you know? (Pair responds based on belief)
Do you know for sure? Let's look back at the chart.
2. (If pair answers no) When you predict how well someone will do, will it help
you to know whether they have [a certain level] or [another level] on this
factor?
a. (If pair answers yes) Do you know for sure? Why don't you check out the
chart?
(Show chart of the non-causal factor)
What do you think will happen to the applicant's performance if, for [the non-effective factor], they go from having [a certain level] to [another level]? (Pair
responds.)
i. (If pair answers no change) Do you still need to know about this factor to
make your prediction?
ii. (Otherwise) Let's find out. (Points to chart) Does it matter to the
performance whether [the non-effective factor] is [a certain level] or [another
level]? (Pair responds.)
Do you still want to know about this factor to make your prediction?
B. (If pair does not select one or more effective factors as influencing
the prediction) What made you predict this performance level? (Pair response:
because [a causal factor] is at a certain level)
So you are saying that because [a causal factor] is at [a certain level], that's why
you think this will be the performance level? What about [the causal factor
not selected]? Did you find out whether [the causal factor not selected] makes a
difference? (Pair responds.)
1. (If pair says yes) Does the applicant's performance go up when [the causal
factor not selected] changes from one level to another? (Pair responds.)
a. (If pair says yes) Then, when someone has a high level of [the causal factor
selected] and [the causal factor not selected], what happens to the applicant's
performance? (Expected response: the performance goes up even more.)
(If needed) Will the high level on [the causal factor not selected] make it go up
even more than if it was just [the causal factor selected] affecting it?
b. (If pair says no) Let's look at the chart for [the causal factor not selected].
(Once all predictions have been made)
Now, choose from all these applicants. Which five should be chosen for the final
team? Discuss until everyone agrees.
Now, let's look at the ones you have chosen.
(Review each of the chosen applicants and compare their predicted
performances. For the ones with different performances, ask pairs to review
their predictions side by side.)
You gave these applicants a lower grade; can you explain why?
(Repeat until all of the predictions of the chosen five applicants have the
appropriate