
Can we Measure Computational Thinking with Tools?

Present and Future of Dr. Scratch

Jesús Moreno-León
Programamos & Universidad Rey Juan Carlos
Seville, Spain
jesus.moreno@programamos.es

Gregorio Robles
Universidad Rey Juan Carlos
Madrid, Spain
grex@gsyc.urjc.es

Marcos Román-González
Universidad Nacional de Educación a Distancia
Madrid, Spain
mroman@edu.uned.es

Abstract

Dr. Scratch is a web tool that analyzes Scratch projects to assess the development of computational thinking skills. This paper presents the current state of the validation process of the tool. The process involves several investigations to test the validity of Dr. Scratch from different perspectives, such as the extent to which learners improve their coding skills while using the tool in real-life scenarios; the relationship of the score provided by the tool with other, similar measurements; the capacity of the tool to discriminate between different types of Scratch projects; as well as the vision and feelings of educators who are using the tool in their lessons. The paper also highlights the actions that are still pending to complete the formal validation of Dr. Scratch.

Copyright © by the paper's authors. Copying permitted for private and academic purposes. Proceedings of the Seminar Series on Advanced Techniques and Tools for Software Evolution SATToSE 2017 (sattose.org), 07-09 June 2017, Madrid, Spain.

1 Introduction

Dr. Scratch [MLRRG15] is a free/libre/open source tool that analyzes Scratch [RMMH+09] projects to assess the level of development of computational thinking (CT) skills by inspecting the source code of the programs. Dr. Scratch is inspired by previous assessment tools for visual programming languages, such as Scrape [WHT11], Fairy Assessment [WDCK12] or REACT [KBNR14], and is based on Hairball [BHL+13], a static code analyzer for Scratch projects that detects potential issues in the code [FCB+13].

Since the Hairball architecture is based on plug-ins, we developed new plug-ins to detect several bad programming habits, or bad smells, that educators frequently encounter in their work with middle and high school students [MR14]. In addition, we created a web-based service that facilitates the analysis of the projects and provides feedback with ideas to improve the code. This web tool is what we call Dr. Scratch.

In this paper we present the current state of the validation process of Dr. Scratch, highlighting the actions that have already been performed or are under development, and pointing to the activities that are still pending to complete its formal validation.

2 Dr. Scratch CT analysis

The main feature of Dr. Scratch is the analysis of CT skills. Aiming to come up with a CT score system, we reviewed prior work proposing theories and methods for assessing the development of programming and CT skills of learners [WHC12, SF13, BR12] and collaborated with educators with years of experience using Scratch in their lessons. Table 1 summarizes the CT assessment, which is based on the degree of development of seven dimensions of the CT competence: abstraction and problem decomposition, logical thinking, synchronization, parallelism, algorithmic notions of control flow, user interactivity, and data representation. These dimensions are statically evaluated by inspecting the source code of the analyzed project, and each is given a score from 0 to 3, resulting in a total mastery score that ranges from 0 to 21 when all seven dimensions are aggregated. With this information the tool generates a feedback report that is displayed to learners, as shown in Figure 1.

Table 1: Level of Development for Each CT Dimension Assessed by Dr. Scratch [MLRRG15]

CT dimension | Basic | Intermediate | Proficient
Logical thinking | if | if else | logic operations
Data representation | modifiers of object properties | variables | lists
User interactivity | green flag | keyboard, mouse, ask and wait | webcam, input sound
Control flow | sequence of blocks | repeat, forever | repeat until
Abstraction and problem decomposition | more than one script | use of custom blocks | use of 'clones' (instances of sprites)
Parallelism | two scripts on green flag | two scripts on key pressed or sprite clicked | two scripts on receive message, video/audio input, backdrop change
Synchronization | wait | message broadcast, stop all, stop program | wait until, when backdrop changes to, broadcast and wait

Figure 1: Dr. Scratch analysis result for Star Wars StickMan DressUp, a Scratch project available at https://scratch.mit.edu/projects/101463736/.

3 Dr. Scratch Validation Process

Different actions have been performed, or are under way, to validate Dr. Scratch from different perspectives. Firstly, we want young students to be able to analyze their projects and independently learn from the tips that the tool provides, so we organized a series of workshops in schools to check the extent to which learners improve their coding skills while using the tool in real-life scenarios (ecological validity, section 3.1). In addition, the CT score provided by the tool is, to some degree, similar to other measurements, such as teacher grades or software engineering complexity metrics, and the score has therefore been compared with them to study relationships and correlations (convergent validity, section 3.2). On the other hand, Scratch creations are categorized under different types of projects, and it is interesting to find out whether this typology is replicated when projects are analyzed with Dr. Scratch (discriminant validity, section 3.3). Lastly, the tool will not be used unless educators feel that it measures what it promises, so we have surveyed several hundred teachers in this regard (face validity, section 3.4).

A key validation action that is still pending is the study of a large number of analyses to identify potential clusters of CT dimensions assessed by the tool (factorial validity, section 3.5), which could lead to the grouping of some dimensions, thus simplifying the feedback provided to users.

3.1 Ecological Validity

One of the main objectives of Dr. Scratch is to encourage students to improve their programming skills. Aiming to check its effectiveness regarding this goal, we organized a series of workshops with over 100 students aged 10 to 14 in 8 different schools [MLRRG15]. During the workshops, participants analyzed one of their projects with Dr. Scratch, read the feedback report provided by the tool, and tried to improve their projects using the tips included in the report. After 1 hour of working with Dr. Scratch, students increased their CT score and improved their programming skills, which indicates that the tool is useful for learners of these ages.

Nevertheless, even though Dr. Scratch offers a CT-dependent feedback report, the workshops showed that the tool was especially useful for older students (12-14 years old) with an intermediate initial CT score. Future interventions will help us modify the feedback report so that younger students, as well as learners with basic or advanced CT levels, can also make the most out of the tool.

3.2 Convergent Validity

A big step in the validation process is the comparison of the evaluations provided by Dr. Scratch with other measurements of similar constructs. We identified three assessments that could be used in this regard: the ones provided by human experts, software engineering complexity metrics, and the CT-test [RGPGJF16], a 28-item multiple-choice test aimed at students aged 12-13 to assess their CT skills.

On the one hand, in order to compare the automatic scores provided by Dr. Scratch with the evaluations by human experts, we organized a programming contest for primary and secondary students [MLRGHR17]. This allowed us to gather and study over 450 evaluations of Scratch projects given by 16 experts in computer science education. The results show strong correlations between automatic and manual evaluations, which could be considered a validation of the metrics used by the tool. According to the assessment research literature [CH12], the tool is ideally convergent with expert evaluators, since the correlation detected between measurements is greater than r = .70.

On the other hand, we compared the Dr. Scratch scores with two classic software engineering metrics that are globally recognized as a valid measurement of the complexity of a software system: McCabe's Cyclomatic Complexity and Halstead's metrics [MLRRG16a]. By comparing the results after analyzing 95 Scratch projects, we found positive, significant, moderate to strong correlations between measurements, which could also be considered a validation of the complexity assessment process of the tool.

Finally, in an intervention involving an 8-week programming course with Scratch in three Spanish middle schools, with a total sample of n=71 students, we compared the Dr. Scratch scores with the results of the CT-test. The findings show a positive, moderate, and statistically significant correlation between tools, both in predictive and concurrent terms [RGMLR17]. As we expected, the convergence is not total since, although both tools assess the same psychological construct, they do so from different perspectives: summative-aptitudinal (CT-test) and formative-iterative (Dr. Scratch).

3.3 Discriminant Validity

Projects shared in the Scratch repository are categorized under one or more project types, the most common being games, animations, music, art and stories, although there are other categories such as tutorials or simulations. Aiming to check whether Dr. Scratch is able to detect differences in the CT dimensions developed when programming different types of Scratch projects, we randomly downloaded 500 projects from the main categories in the repository. Although this work has not been published (at the moment of writing this paper it is still under revision), the results of a K-means cluster analysis confirm that different types of projects can be used to develop distinct CT dimensions and, consequently, that the existing typology of Scratch projects is empirically replicated when the projects are subjected to the Dr. Scratch mastery score. For instance, while if and if else instructions are commonly present in games, this is not the case for stories, which usually have a linear structure without branches. Therefore, games tend to score high in the Dr. Scratch logical thinking dimension, while the opposite is true for stories.

3.4 Face Validity

Another main goal of the tool is to support teachers in their evaluation tasks. It is thus important that educators feel that Dr. Scratch does what it claims to do (i.e., assessing CT). Pursuing this goal, we prepared a survey for teachers who participate in a 40-hour Scratch coding training course. At the moment of writing this paper, more than 320 educators have submitted the survey and another 120 are taking the course. The partial results indicate that 84% of participating teachers feel that the CT analysis performed by the tool is correct. Having educators who teach at different grades, and who have reached distinct levels of development of CT skills during the course, will allow the study of potential differences in their opinions about the usefulness of the tool based on these variables.

3.5 Factorial Validity

A key pending validation action is to study the relationships between the different CT dimensions assessed by Dr. Scratch, aiming to identify clusters of dimensions that share sufficient variation. This factorial analysis, if performed on a big enough number of projects, could lead to the grouping of some of the dimensions, which would therefore simplify the feedback report that the tool displays to learners.

For this action we will use the dataset made available from [AH16], which consists of 250,166 projects scraped from the Scratch repository.
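To make the pending analysis more concrete, the following sketch groups the seven CT dimensions according to how strongly their scores correlate across a set of analyzed projects. This is only an illustration, not the actual Dr. Scratch pipeline: the score matrix, the 0.7 threshold, and the simple correlation-based grouping rule are all hypothetical stand-ins for a proper factorial analysis.

```python
# Illustrative sketch (hypothetical data and grouping rule): cluster the
# seven Dr. Scratch CT dimensions by the Pearson correlation of their
# scores across projects, using single-linkage union-find.
from math import sqrt

DIMENSIONS = [
    "abstraction", "logical_thinking", "synchronization", "parallelism",
    "control_flow", "user_interactivity", "data_representation",
]

def pearson(xs, ys):
    """Pearson correlation of two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def group_dimensions(projects, threshold=0.7):
    """Put dimensions whose pairwise correlation exceeds `threshold`
    into the same cluster (union-find, i.e. single linkage)."""
    cols = list(zip(*projects))  # one score list per dimension
    parent = list(range(len(cols)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if pearson(cols[i], cols[j]) > threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, name in enumerate(DIMENSIONS):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())

# Hypothetical 0-3 scores of five projects; the first two dimensions
# happen to rise and fall together, so they end up grouped.
projects = [
    [0, 0, 1, 2, 3, 0, 1],
    [1, 1, 0, 2, 1, 3, 0],
    [2, 2, 3, 0, 0, 1, 2],
    [3, 3, 1, 1, 2, 2, 3],
    [1, 1, 2, 3, 1, 0, 1],
]
groups = group_dimensions(projects)
```

On real data, the `projects` matrix would hold per-dimension scores exported by Dr. Scratch for each analyzed project, and a factor-analytic method would replace this simple correlation grouping.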

4 Conclusions

This paper presents the current state of the validation process of Dr. Scratch, a tool that enables analyzing Scratch projects to assess the development of CT skills. Most of the investigations required to validate the tool are already done and published (ecological and convergent validity), or are in progress and will soon be published (discriminant and face validity). Consequently, even though a key action is still pending (factorial validity), the verification process of Dr. Scratch is close to being finalized.

It must be noted, nonetheless, that the analysis of just one project cannot provide a full picture of a learner's CT skills, as there are perfectly valid simple projects that do not require any modifications to include more complex structures (those that give a higher CT score). Consequently, we plan to add a new feature enabling the creation of user accounts, so that the analysis of a learner's portfolio of projects provides a richer, more accurate picture.

Furthermore, this new feature will also make it possible to easily track the progression of learners and the evolution of projects, both in terms of software complexity and of the presence of bad smells. The following papers illustrate the kind of investigations in which the tool could be used in this regard: i) a study on the relationship between socialization and coding skills [MLRRG16b] that made use of the whole Scratch repository from 2007 to 2012 [HMH17], wherein we used an adaptation of Dr. Scratch to assess the sophistication of more than 1.5 million projects authored by almost 70,000 users; and ii) an investigation on the presence of copy-and-paste in Scratch projects, in which we correlated the assessment of the CT skills of learners, measured by Dr. Scratch, with the existence of software clones in over 230,000 projects [RMLAH17].

Acknowledgements

This work has been funded in part by the Region of Madrid under project "eMadrid - Investigación y Desarrollo de tecnologías para el e-learning en la Comunidad de Madrid" (S2013/ICE-2715).

References

[AH16] Efthimia Aivaloglou and Felienne Hermans. How kids code and how we know: An exploratory study on the Scratch repository. In Proceedings of the 2016 ACM Conference on International Computing Education Research, pages 53-61. ACM, 2016.

[BHL+13] Bryce Boe, Charlotte Hill, Michelle Len, Greg Dreschler, Phillip Conrad, and Diana Franklin. Hairball: Lint-inspired static analysis of Scratch projects. In Proceedings of the 44th ACM Technical Symposium on Computer Science Education, SIGCSE '13, pages 215-220, New York, NY, USA, 2013. ACM.

[BR12] Karen Brennan and Mitchel Resnick. New frameworks for studying and assessing the development of Computational Thinking. In Proceedings of the 2012 Annual Meeting of the American Educational Research Association, Vancouver, Canada, pages 1-25, 2012.

[CH12] Kevin D. Carlson and Andrew O. Herdman. Understanding the impact of convergent validity on research results. Organizational Research Methods, 15(1):17-32, 2012.

[FCB+13] Diana Franklin, Phillip Conrad, Bryce Boe, Katy Nilsen, Charlotte Hill, Michelle Len, Greg Dreschler, Gerardo Aldana, Paulo Almeida-Tanaka, Brynn Kiefer, Chelsea Laird, Felicia Lopez, Christine Pham, Jessica Suarez, and Robert Waite. Assessment of computer science learning in a Scratch-based outreach program. In Proceedings of the 44th ACM Technical Symposium on Computer Science Education, SIGCSE '13, pages 371-376, New York, NY, USA, 2013. ACM.

[HMH17] Benjamin Mako Hill and Andrés Monroy-Hernández. A longitudinal dataset of five years of public activity in the Scratch online community. Scientific Data, 4, 2017.

[KBNR14] Kyu Han Koh, Ashok Basawapatna, Hilarie Nickerson, and Alexander Repenning. Real time assessment of computational thinking. In Visual Languages and Human-Centric Computing (VL/HCC), 2014 IEEE Symposium on, pages 49-52. IEEE, 2014.

[MLRGHR17] Jesús Moreno-León, Marcos Román-González, Casper Harteveld, and Gregorio Robles. On the automatic assessment of computational thinking skills: A comparison with human experts. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '17, pages 2788-2795, New York, NY, USA, 2017. ACM.

[MLRRG15] Jesús Moreno-León, Gregorio Robles, and Marcos Román-González. Dr. Scratch: Automatic analysis of Scratch projects to assess and foster Computational Thinking. RED. Revista de Educación a Distancia, 15(46), 2015.

[MLRRG16a] Jesús Moreno-León, Gregorio Robles, and Marcos Román-González. Comparing computational thinking development assessment scores with software complexity metrics. In 2016 IEEE Global Engineering Education Conference (EDUCON), pages 1040-1045, April 2016.

[MLRRG16b] Jesús Moreno-León, Gregorio Robles, and Marcos Román-González. Examining the relationship between socialization and improved software development skills in the Scratch code learning environment. Journal of Universal Computer Science, 22(12):1533-1557, 2016.

[MR14] Jesús Moreno and Gregorio Robles. Automatic detection of bad programming habits in Scratch: A preliminary study. In 2014 IEEE Frontiers in Education Conference (FIE) Proceedings, pages 1-4, October 2014.

[RGMLR17] Marcos Román-González, Jesús Moreno-León, and Gregorio Robles. Complementary tools for computational thinking assessment. In CTE 2017: International Conference on Computational Thinking Education 2017, in press, July 2017.

[RGPGJF16] Marcos Román-González, Juan-Carlos Pérez-González, and Carmen Jiménez-Fernández. Which cognitive abilities underlie computational thinking? Criterion validity of the Computational Thinking Test. Computers in Human Behavior, 2016.

[RMLAH17] Gregorio Robles, Jesús Moreno-León, Efthimia Aivaloglou, and Felienne Hermans. Software clones in Scratch projects: On the presence of copy-and-paste in computational thinking learning. In 2017 IEEE 11th International Workshop on Software Clones (IWSC), pages 1-7, February 2017.

[RMMH+09] Mitchel Resnick, John Maloney, Andrés Monroy-Hernández, Natalie Rusk, Evelyn Eastmond, Karen Brennan, Amon Millner, Eric Rosenbaum, Jay Silver, Brian Silverman, and Yasmin Kafai. Scratch: Programming for all. Communications of the ACM, 52(11):60-67, November 2009.

[SF13] Linda Seiter and Brendan Foreman. Modeling the learning progressions of computational thinking of primary grade students. In Proceedings of the Ninth Annual International ACM Conference on International Computing Education Research, ICER '13, pages 59-66, New York, NY, USA, 2013. ACM.

[WDCK12] Linda Werner, Jill Denner, Shannon Campe, and Damon Chizuru Kawamoto. The Fairy performance assessment: Measuring computational thinking in middle school. In Proceedings of the 43rd ACM Technical Symposium on Computer Science Education, SIGCSE '12, pages 215-220, New York, NY, USA, 2012. ACM.

[WHC12] Amanda Wilson, Thomas Hainey, and Thomas Connolly. Evaluation of computer games developed by primary school children to gauge understanding of programming concepts. In European Conference on Games Based Learning, page 549. Academic Conferences International Limited, 2012.

[WHT11] Ursula Wolz, Christopher Hallberg, and Brett Taylor. Scrape: A tool for visualizing the code of Scratch programs. Poster presented at the 42nd ACM Technical Symposium on Computer Science Education, Dallas, TX, 2011.
