
Developing Scientific Software through the Open Community Engagement Process


Laura Christopherson, Ray Idaszak, and Stan Ahalt
Renaissance Computing Institute, University of North Carolina at Chapel Hill

Today's research relies on trustworthy software. Scientists use software to collect data, run
simulations, perform statistical analysis and testing, and visualize results. Research results
influence the development of life-sustaining techniques, products, and policies, such as clinical
treatments, drugs, and solutions to environmental problems.
Many scientists develop their own software (Hannay et al., 2009). The quality of academic
software, however, tends to be lower than that of commercially developed software. Errors in the code
have led to paper retractions and irreproducible findings (Merali, 2010). If the software contains
errors, study results may be unreliable, and the life-sustaining techniques, products, and policies
developed based on study findings may also be in error. This could cost lives.
There are barriers to using proven software engineering methods, such as test-driven
development and continuous integration, in the development of academic code (cf. Basili et al.,
2008; Carver et al., 2007; Pitt-Francis et al., 2008). For example, scientists are not incentivized
financially or with increased prestige to employ software engineering best practices or obtain
training in software engineering because funding is limited and tenure and promotion decisions are
based on a successful publication record, not on whether one can program well.
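
To make the first of these practices concrete, the sketch below shows what test-driven development might look like for scientist-written code. The routine, its name, and its units are hypothetical, invented for illustration rather than drawn from any project cited here. Under test-driven development the tests are written first and fail until the routine satisfies them; a continuous integration service would then rerun such tests automatically on every change.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical routine under test: convert a rainfall depth (mm) over a
 * catchment area (m^2) into a volume in liters (1 mm over 1 m^2 = 1 L).
 * Under TDD this body is written only after the tests below exist. */
static double rainfall_volume_liters(double depth_mm, double area_m2)
{
    return depth_mm * area_m2;
}

int main(void)
{
    /* Tests written first; the implementation is grown until they pass. */
    assert(fabs(rainfall_volume_liters(1.0, 1.0) - 1.0) < 1e-9);
    assert(fabs(rainfall_volume_liters(10.0, 2.5) - 25.0) < 1e-9);
    assert(fabs(rainfall_volume_liters(0.0, 100.0) - 0.0) < 1e-9);
    return 0;
}
```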
Overcoming these barriers is important if the scientific community is to make headway in
answering grand challenge research questions. Irreproducible results and paper retractions due to
errors resulting from poor quality software will not advance science.
With funding from the National Science Foundation's Scientific Software Innovation Institutes
(S2I2) program, our team, the Water Science Software Institute (WSSI), is working to lower some
of these barriers by supporting scientific software development and exploring the kinds of
activities that may improve the development process (Ahalt et al., 2013). To this end, WSSI has
developed a model for software development called the Open Community Engagement Process
(OCEP). OCEP brings software engineers and scientists together during research to traverse a four-
step, iterative process that incorporates Agile development principles, such as an iterative and
incremental development cycle, and an open source approach, such as engaging communities larger
than a single laboratory in the development of the code.

Figure 1: The Open Community Engagement Process. [The figure depicts an iterative cycle,
Design → Develop → Refine → Publish, with community amplification at each traversal.]

Step 1 (Design): Through a discussion of research questions, hypotheses, and barriers to
answering those questions (e.g., difficulty with using relational databases for spatial data),
initial software requirements are developed.
Step 2 (Develop): Working code that will help scientists answer research questions and test
hypotheses is produced.
Step 3 (Refine): As the new code is used, new requirements are uncovered and previous
requirements are refined. Return to Step 1.
Step 4 (Publish): To broaden the open source community, the software, reproducible research
results, and lessons learned from developing the software are published and disseminated.


The WSSI team may iterate the OCEP steps with existing scientific partners as their needs
change and grow, or it may iterate OCEP steps with new scientific communities with new questions
and hypotheses. As WSSI expands its efforts to a wider group of water scientists and as those
scientists share success stories and publish reproducible research findings, software, and data, the
impact and quality of the software produced through OCEP are amplified.
In the spirit of Step 4 (Publish), this paper reports on the application of Steps 1 through 3 to a
representative, scientist-developed, computational modeling framework originally developed in the
early 1990s. Steps 1 through 3 occurred over the course of six months.
Step 1 activities included
a code walkthrough to orient all participants to the design, operation, and capabilities of
the existing software,
a 2-day in-person specifications meeting where research questions, hypotheses, and
barriers were discussed and used to generate initial requirements and objectives, and
use case development.
Step 2 consisted of a 5-day hackathon with co-located participants; Step 3 was a subsequent 3-day
hackathon conducted in two locations simultaneously.
In this paper, these three OCEP activities are evaluated to determine the factors that influence
hackathon outcomes. We judged the first hackathon (Step 2) more successful than the second
(Step 3): relative to expectations, it produced more functional code, and participants reported
greater satisfaction with the process. Our evaluation identified three main factors that
contribute to hackathon success:
Planning and expectations
Communication
Location

Planning and expectations
Expectations for the first hackathon were set during Step 1 activities. Participants oriented
themselves to the code during the code walkthrough, which helped all involved understand the
complexity of the system. This understanding informed the selection of the portion of the code that
would be refactored during the first hackathon. The chosen unit of code was relatively small and
provided an independent input to the underlying theoretical model, so revisions posed little
risk to the overall system. Refactoring this unit was perceived to be a low-risk effort that
would build trust between the scientists and the software engineers.
The selection of this low-risk unit of code, along with the knowledge that this would be the
first attempt at a hackathon for the WSSI team, unified and inspired participants to view the first
hackathon as an experimental, learning endeavor. The team expected the first hackathon to be
challenging and anticipated that unforeseen obstacles would arise that the group would have to
resolve together. Experimentation was encouraged. Mistakes were expected and were viewed as
learning opportunities. The spirit of adventure that underpinned the first hackathon contributed
to participants' reported satisfaction.
Most of the Step 1 activities, which provided opportunities for clarifying expectations and
helped focus attention on coding during the first hackathon, were not repeated before the second.
For example, no code walkthrough was conducted, and new participants reported feeling pressed
to ramp up quickly so that time could be spent coding rather than getting oriented.


Objectives for the second hackathon were determined beforehand, but only by a subset of the
participants. Those who were excluded from these discussions had no opportunity to help assess
the feasibility of the objectives. Feasible and agreed-upon objectives were particularly
important given the reduced duration of the second hackathon. The adopted set of objectives was
longer and more complex than the first hackathon's, and it ultimately proved difficult to
complete over the three days.
The objectives included the development of new functionality, but, in hindsight, the existing
code could have benefited from refactoring before new code was built on top of it. Because the
objectives also included revisions to code that directly impacted the underlying theoretical
model, the perceived overall risk was higher, which increased pressure on the team.
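
The sketch below illustrates what refactoring before extension can mean in practice. The names and the expression are invented for this paper rather than taken from the codebase discussed here: a computation that had been duplicated inline at each call site is extracted into one named, testable function, so that new functionality has a single, well-defined seam to build on.

```c
#include <stdio.h>

/* Before refactoring (hypothetical): the expression
 *   (porosity - water_content) * soil_depth_m
 * was repeated inline wherever it was needed, so a new feature would
 * have copied it yet again. Extracting it gives one place to test,
 * document, and extend. */
static double saturation_deficit_m(double porosity, double water_content,
                                   double soil_depth_m)
{
    return (porosity - water_content) * soil_depth_m;
}

int main(void)
{
    /* New code now calls the shared routine instead of re-deriving it. */
    printf("deficit = %.3f m\n", saturation_deficit_m(0.45, 0.30, 2.0));
    return 0;
}
```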

Communication
Step 1 activities, conducted over two months, laid the foundation for open and inclusive
communication among all participants of the first hackathon. The code walkthrough, the 2-day
specifications meeting, and the collaborative work conducted in between these activities and the
first hackathon helped the group develop synergy, establish a common vocabulary for describing
their work, and build consensus with regard to scope and objectives. As a result of these activities,
all participants came to the first hackathon with a shared understanding of the goals, requirements,
and work tasks to be undertaken. Work during the hackathon could focus intensely on improving
the code.
Overall, the formal opportunities for communication among all participants that preceded the
first hackathon, such as the Step 1 activities described above, were omitted before the second.
Although many participants in the second hackathon had also taken part in the first, the
objectives and targeted code were sufficiently different to warrant repeating some of the Step 1
activities that aided communication and established trust. Mutual understanding of the system, consensus on
objectives and requirements and their feasibility, and a common vocabulary were only partially
established before the second hackathon. So instead of devoting hackathon time to coding, time was
often used for discussion and decision-making. For example, some participants spent an hour and a
half on a videoconference call to make decisions that affected work on one of the objectives.
Additional time was spent documenting these decisions. Coding time was lost, but essential
collective understanding was gained.
Many of these pre-coding discussions were lengthy because a common vocabulary needed to
be established. For example, the software engineers believed it was important to introduce
opaque types so that anyone wishing to revise the software would be required to use the functional
interface instead of accessing the data objects directly. This practice, also called
encapsulation, is beneficial because a developer must manipulate a data object through its
methods rather than touching its internal representation, so changes made to one part of the
system propagate to other parts in a more transparent and sustainable manner. Before a shared
vocabulary was established, some of the scientists reasonably assumed a more general connotation
of "opaque": that the code would become more difficult to understand or read. Conversely, the
scientists needed to explain their specialized terminology for describing water science
phenomena, such as infiltration and exfiltration. Clarification of these domain-specific terms
was required before coding progress could be made.
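
For readers unfamiliar with the idiom, the following minimal C sketch shows one common way to realize an opaque type; the names are hypothetical and do not come from the framework discussed here. The struct is declared in the header but defined only in a single source file, so callers outside that file can work with the object only through its functional interface.

```c
/* patch.h -- public interface; the struct layout is hidden (opaque). */
typedef struct Patch Patch;              /* incomplete type to callers */
Patch *patch_create(double area_m2);
double patch_area_m2(const Patch *p);
void   patch_destroy(Patch *p);

/* patch.c -- the only file that can see or modify the struct's fields. */
#include <stdlib.h>

struct Patch {
    double area_m2;    /* callers cannot reach this field directly */
};

Patch *patch_create(double area_m2)
{
    Patch *p = malloc(sizeof *p);
    if (p != NULL)
        p->area_m2 = area_m2;
    return p;
}

double patch_area_m2(const Patch *p) { return p->area_m2; }

void patch_destroy(Patch *p) { free(p); }
```

Because struct Patch is an incomplete type everywhere outside patch.c, its internal layout can change without breaking callers, which is the transparency and sustainability benefit described above.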
The consensus and trust that developed gradually over the two months preceding the first
hackathon had to be compressed into the three days of the second, at the expense of coding time.
This time crunch put undue pressure on the team, and some participants reported that it made
achieving synergy more difficult than expected. It ultimately resulted in less working code
produced at the second hackathon and contributed to the lower reported satisfaction.


Location
All participants were co-located for the first hackathon. The second hackathon required
remote communication and collaboration because participants were split between two states,
though within the same time zone.
At the end of each day of each hackathon there was a check-in for all participants to report
progress and make plans for the following day. These check-ins were conducted with
videoconferencing software during the second hackathon. Most participants reported
dissatisfaction with remote collaboration and cited the technology as contributing to that
dissatisfaction. For example, participants reported difficulty hearing and seeing others.
Adjustments were made, including adding another camera in one location. Other aspects of remote
collaboration proved difficult: white-boarding, looking over someone's shoulder during coding, or
poring over printed notes together was not possible. Instead, documents had to be shared over
cloud storage services, which did not permit the kind of face-to-face interaction that participants
would have preferred.
Additionally, the videoconference check-ins consumed more time in the second hackathon
than in the first because of the reduced immediacy and interactivity of long-distance
communication. Participants could not simply stop by a small working group during the day to
listen in or chime in with their own ideas, as they could in the first hackathon. The second
hackathon's check-ins therefore included not only work updates but also attempts to catch
everyone up on thinking and context. Many participants felt that these longer check-ins took time
away from coding.

Recommendations
This paper has discussed factors that have contributed to the successes and challenges we
have experienced in implementing OCEP. Based on our observations, we make the following
recommendations:
Start small and gradually build toward more complex objectives. This is consistent with
Agile development.
Refactor before adding new functionality.
Approach development as a learning experience. Welcome experimentation, and treat
mistakes as a natural part of the learning process.
Repeat Step 1 activities before every hackathon to develop consensus in advance, so that
hackathon time can focus on coding. In higher-risk situations, allow additional time for Step
1 activities; we recommend a minimum of two months.
Ensure any newcomers receive some form of orientation prior to the hackathon, such as a
code walkthrough or system documentation.
Co-locate rather than collaborate remotely whenever feasible.
We propose this topic to the Workshop on Sustainable Software for Science: Practice and
Experience because we want to gather feedback and discuss similar experiences with other
workshop participants. We believe a rich discussion of this topic will be beneficial both to WSSI in
our continued exploration and implementation of OCEP and to other groups working toward
improving scientific software and the development processes supporting it.



Acknowledgements

This work was funded by an award from the National Science Foundation (1216817).


References
Ahalt, S., Band, L., Minsker, B., Palmer, M., Tiemann, M., Idaszak, R., Lenhardt, C., & Whitton, M. (2013). Water
Science Software Institute: An open source engagement process. 2013 International Workshop on
Software Engineering for Computational Science and Engineering (SE-CSE '13); San Francisco, California;
May 18, 2013.
Basili, V. R., Cruzes, D., Carver, J. C., Hochstein, L. M., Hollingsworth, J. K., Zelkowitz, M. V., & Shull, F. (2008).
Understanding the high-performance-computing community: A software engineer's perspective. IEEE
Software, 25(4), 29–36.
Carver, J. C., Kendall, R. P., Squires, S. E., & Post, D. E. (2007). Software development environments for
scientific and engineering software: A series of case studies. 29th International Conference on Software
Engineering (pp. 550–559). Minneapolis, MN.
Hannay, J. E., MacLeod, C., Singer, J., Langtangen, H. P., Pfahl, D., & Wilson, G. (2009). How do scientists develop
and use scientific software? 2009 ICSE Workshop on Software Engineering for Computational Science and
Engineering (pp. 1–8). Washington, DC: IEEE.
Merali, Z. (2010). ...Error... why scientific programming does not compute. Nature News, 467(7317),
775–777.
Pitt-Francis, J., Bernabeu, M. O., Cooper, J., Garny, A., Momtahan, L., Osborne, J., Pathmanathan, P., et al. (2008).
Chaste: Using agile programming techniques to develop computational biology software. Philosophical
Transactions. Series A, Mathematical, Physical, and Engineering Sciences, 366(1878), 3111–3136.

