
Embrace your issues: Compassing the software engineering landscape using bug reports


Markus Borg
Dept. of Computer Science
Lund University
Lund, Sweden
markus.borg@cs.lth.se
ABSTRACT
Software developers in large projects work in complex information landscapes, and staying on top of all relevant software artifacts is challenging. As software systems often evolve for years, a high number of issue reports is typically managed during the lifetime of a system. Efficient management of incoming issues requires successful navigation of the information landscape. In our work, we address two important work tasks involved in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA). IA is the early task of allocating an issue report to a development team. CIA deals with identifying how source code changes affect the software system, a fundamental activity in safety-critical development. Our solution approach is to support navigation, both among development teams and software artifacts, based on information available in historical issue reports. We present how we apply techniques from machine learning and information retrieval to develop recommendation systems. Finally, we report intermediate results from two controlled experiments and an industrial case study.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging – Debugging aids, tracing; D.2.9 [Software Engineering]: Management – Programming teams
Keywords
issue management, recommendation systems, machine learning, information retrieval
1. INTRODUCTION
The information landscape in large software engineering projects is complex and ever-changing. Development teams collaborate using information repositories such as requirements databases, test management systems, source code repositories, and general document management systems, in total storing thousands of artifacts [9, 25, 17]. Often the
interoperability between the repositories is poor, i.e., the repositories turn into isolated information silos with little transparency. An intrinsic strength of software is that it is easy to modify, but source code changes impact other software artifacts as well. Thus, both source code and other stored information are continuously created and updated in large projects [14], and staying on top of the information landscape can constitute a significant challenge for both developers and managers [26].
Information overload, i.e., a state where individuals do not have the time or capacity to process all available information [16], threatens productivity in large software engineering projects. Quick and concise access to information is crucial in knowledge-intensive work such as software development and evolution. If the project environment does not provide sufficient support for navigation and retrieval, substantial effort is wasted on locating relevant information [23].
In some projects, issue repositories constitute examples of information silos that contain challenging amounts of software engineering data. The continuous inflow of issue reports, especially for public issue repositories [2], makes activities such as duplicate management, prioritization, and work allocation tedious and error-prone [12, 7, 22]. While some of the activities are supported in state-of-practice issue repositories (e.g., Bugzilla and HP Quality Center offer automated duplicate detection), most processing of incoming issue reports is still manual.
Previous research argues that the issue repository is a
key collaborative hub in large software engineering projects.
Cubranic et al. developed Hipikat, a recommendation sys-
tem to help newcomers in open source communities navi-
gate the existing information, and used the issue repository
to build a project memory [15]. Anvik and Murphy pre-
sented automated decision support for several activities in-
volved in issue triaging, all based on information stored in
issue repositories [4].
We also view the issue repository as a key collaborative
hub, but increase the granularity further by considering each
individual issue report as an important juncture in the in-
formation landscape. We have previously shown that issue
reports can connect software artifacts that are stored in sep-
arate databases [10], i.e., issue reports are a way to break
information silos. In software engineering contexts where the
change management process is rigid, every corrective change
originating from fixing an issue report must be documented.
As such, the trace links from issue reports to artifacts in
various repositories turn into trails in the information land-
scape, created by past engineers as part of their everyday
software development.
In line with previous work, we apply Machine Learning (ML) to detect patterns in the inflow of issue reports. As ML in general performs better the more data that are available [5], we aim to harness the daunting inflow of issue reports to assist navigation in the software engineering landscape. Early results indeed show that our solutions improve as the number of issue reports grows, e.g., in issue assignment [21] and prioritization [24]. The rest of this paper presents how we support two work tasks in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA).
2. PROBLEM AND SOLUTION APPROACH
Our work particularly addresses IA and CIA, as our indus-
trial partners stressed the importance of these two activities.
2.1 Automated IA using Ensemble Learning
The initial decision in issue triaging is typically to de-
cide which development team should investigate a certain
issue report, an activity referred to as IA. Several studies
report that manual bug assignment is labor-intensive and
error-prone [6, 19, 8], resulting in bug tossing (i.e., reas-
signing issue reports to other developers) and delayed issue
corrections.
We support IA using supervised ML. In line with previous work, we train a classifier on the textual content in issue repositories [3, 19, 1]. However, while previous work has applied individual classifiers, we combine several classifiers in an ensemble learner using Stacked Generalization (SG). SG is a versatile ensemble learner able to combine classifiers of different types; for example, it was used heavily in the winning contribution to the well-known Netflix Prize [27].
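To make the approach concrete, the following minimal sketch shows stacked generalization for team assignment, assuming scikit-learn; the base classifiers, features, and example data are illustrative and not the exact configuration used in our experiments.

from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: issue report texts and resolving teams.
texts = [
    "Crash when parsing malformed configuration file",
    "Login screen freezes after session timeout",
    "Memory leak in protocol stack under high load",
    "Button label truncated in the settings dialog",
]
teams = ["platform", "ui", "platform", "ui"]

# Diverse level-0 classifiers combined by a level-1 meta-learner (SG).
ensemble = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        estimators=[("nb", MultinomialNB()),
                    ("tree", DecisionTreeClassifier(max_depth=10))],
        final_estimator=LogisticRegression(),
        cv=2,  # cross-validated level-0 predictions train the meta-learner
    ),
)
ensemble.fit(texts, teams)
print(ensemble.predict(["Stack overflow in the configuration parser"]))

Because the level-1 learner is trained on cross-validated predictions of the level-0 classifiers, SG can exploit a diverse set of classifiers without overfitting to any single one of them.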
Previous research on IA has primarily targeted Open Source Software (OSS) projects. OSS development is indeed interesting to study, but it is often treated as the default software engineering context. In fact, it differs from proprietary development in several aspects, such as development processes, team organization, and developer incentives. As presented in Section 3, we instead evaluate our approach in proprietary issue management contexts.
2.2 A Recommendation System for CIA
When the issue triager has concluded that a corrective fix is needed for an issue report, a CIA is required. Especially in safety-critical development contexts, CIA is a formal analysis that must be completed before any changes to the source code are allowed [18]. Several researchers have studied CIA on the source code level [13]. However, in many proprietary development contexts, it is critical to also determine impact on other types of artifacts, e.g., requirements, design specifications, and test case descriptions.
Our approach to support CIA is to develop a Recommendation System, i.e., a software application that provides information items estimated to be valuable for a software engineering task in a given context [26]. Our solution, ImpRec, relies on mining a knowledge base, a network of artifacts and trace links, from previous CIA reports in an issue repository (similar to the project memory in Hipikat [15]). Then we use Apache Lucene (lucene.apache.org), a state-of-the-art OSS search engine library, to index the natural language content of all issue reports. Also, we calculate the relative importance of the artifacts in the knowledge base using network centrality measures.
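As an illustration, the sketch below mines a small knowledge base from hypothetical trace links and scores artifacts by centrality; networkx and PageRank are assumptions made here for illustration, as the exact graph representation and centrality measure are implementation details of ImpRec.

import networkx as nx

# Hypothetical trace links mined from historical CIA reports:
# (issue report, impacted artifact) pairs.
trace_links = [
    ("issue-101", "req-12"), ("issue-101", "test-7"),
    ("issue-205", "req-12"), ("issue-205", "design-3"),
    ("issue-310", "test-7"),
]

kb = nx.DiGraph()
kb.add_edges_from(trace_links)

# Frequently impacted artifacts accumulate in-links and thus
# receive high centrality, i.e., high relative importance.
centrality = nx.pagerank(kb)
for artifact, score in sorted(centrality.items(), key=lambda x: -x[1]):
    print(f"{artifact}: {score:.3f}")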
ImpRec first identifies a set of potentially impacted artifacts, and then orders them using a ranking function (cf. Figure 1). When a CIA for a new issue report is needed, the tool uses traditional Information Retrieval (IR) through Lucene to locate textually similar issue reports in the knowledge base. Starting from the similar issue reports, the starting points, ImpRec conducts breadth-first searches to identify additional candidates. Finally, all impact candidates are ranked based on textual similarities, distances in the knowledge base, and network centralities.
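The end-to-end flow can be sketched as follows; we substitute scikit-learn's TF-IDF and cosine similarity for Lucene, and the ranking weights and distance decay are illustrative assumptions rather than ImpRec's actual ranking function.

import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_impact(new_issue, past_issues, kb, centrality,
                     n_starts=15, max_hops=5, w_sim=0.5, w_cent=0.5):
    """past_issues: dict mapping issue id to its text; kb: trace-link
    graph; centrality: dict mapping artifact to importance score."""
    # 1) IR step: find textually similar historical issue reports.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(list(past_issues.values()) + [new_issue])
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    starts = sorted(zip(past_issues, sims), key=lambda x: -x[1])[:n_starts]

    # 2) BFS step: follow trace links from the starting points.
    scores = {}
    for issue_id, sim in starts:
        if issue_id not in kb:
            continue
        reached = nx.single_source_shortest_path_length(kb, issue_id,
                                                        cutoff=max_hops)
        for artifact, dist in reached.items():
            if artifact == issue_id:
                continue
            # 3) Ranking: combine similarity, centrality, and distance.
            score = (w_sim * sim
                     + w_cent * centrality.get(artifact, 0.0)) / (1 + dist)
            scores[artifact] = max(scores.get(artifact, 0.0), score)
    return sorted(scores.items(), key=lambda x: -x[1])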
Figure 1: Overview of impact recommendations in ImpRec. 1) Starting points in the knowledge base identified using IR. 2) Breadth-first searches follow trace links to locate previous impact. 3) The impact candidates are ranked using network measures and textual similarity.
3. EVALUATION METHOD
Our idea is to evaluate automated support for issue management in two proprietary organizations. The first company, Comp Telecom, develops telecom infrastructure and employs thousands of software engineers. Allocating resources is a major challenge, and bug tossing is an explicit problem. The second company, Comp Auto, develops safety-critical solutions for the process automation industry. The development process follows safety standards mandating rigorous process requirements on traceability. CIA is an important work task for the developers, but it requires much effort.
We evaluate automated IA in a controlled experiment using five proprietary datasets. Four datasets originate from different projects at Comp Telecom, and one dataset originates from Comp Auto. In total, more than 50,000 issue reports are used in the experiment. We train classifiers on different subsets of historical issue reports, and test our approach on more recent issue reports.
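The following sketch illustrates the idea of training on older issue reports and testing on strictly more recent ones; the split sizes and the tuple layout of the issue data are assumptions for illustration only.

def chronological_splits(issues, train_sizes=(1000, 2000, 4000),
                         test_size=500):
    """issues: list of (submission_date, text, team) tuples.
    Yields (train, test) pairs for a learning-curve evaluation."""
    issues = sorted(issues, key=lambda issue: issue[0])  # oldest first
    for n in train_sizes:
        train = issues[:n]
        test = issues[n:n + test_size]  # strictly more recent reports
        yield train, test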
Initial evaluations of ImpRec were also conducted in a
controlled in-vitro setting. We extracted 27,000 issue reports
from Comp Auto, constituting 12 years of development of
an industrial automation system. We mined a knowledge
base from the previous CIA reports, in total about 5,000
reports, and compared the ImpRec output to the true im-
pact previously reported. As ImpRec has several parameters
that need to be configured, and as automated solutions in
software engineering typically are highly dependent on the
dataset [11], we aim to publish guidelines on how to sys-
tematically tune tools such as ImpRec as an outcome of the
in-vitro evaluation.
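Comparing ImpRec's output to the previously reported impact amounts to computing recall at a cutoff; a minimal sketch, assuming a ranked recommendation list and a known ground-truth impact set per issue report:

def recall_at_k(recommended, true_impact, k=10):
    """recommended: ranked list of (artifact, score) pairs; true_impact:
    artifacts recorded as impacted in the historical CIA report."""
    top_k = {artifact for artifact, _ in recommended[:k]}
    if not true_impact:
        return 0.0
    return len(top_k & set(true_impact)) / len(true_impact)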
ImpRec is also evaluated in an in-vivo setting. We are cur-
rently conducting an industrial case study at Comp Auto in
Sweden (Case Sweden). We have selected one development
team that performs frequent CIAs as part of their work.
Four team members with different development roles in-
stalled ImpRec on their local machines in March 2014. The
team members are encouraged to use ImpRec when they
perform CIAs, and to report their experiences from using
the tool. As the developers use ImpRec, all user actions are
collected. Before the tool was deployed, we conducted seven
interviews to better understand the current CIA process.
Moreover, in August 2014 we will deploy ImpRec in a team
at the Comp Auto development site in Bangalore, India to
enable a replication of the case study in another part of the
organization (Case India). While the two cases under study
will be different, the involved developers work on the same
automation system and use the same issue repository. Thus,
the required reconfiguration of ImpRec will be minimal.
4. INITIAL RESULTS
Our controlled experiment on automated IA shows that combining classifiers under SG consistently outperforms individual classifiers with regard to prediction accuracy [20]. Furthermore, our results confirm that it is worthwhile to strive for a diverse set of individual classifiers in the ensemble, consistent with recommendations in the general ML research field. Our approach reaches prediction accuracies of 50% to 90% for the five different datasets, in line with the prediction accuracy of the current manual processes. The learning curves for all five datasets display similar behaviour: IA for all datasets benefits from more training data, but reaches different maximum prediction accuracy. However, as all learning curves start to flatten out at about the same size of the training set, we present an empirically based rule of thumb: when training a classifier for automated IA, aim for at least 2,000 issue reports in the training set.
We first evaluated ImpRec in a controlled setting, and explored how to configure the involved parameters. Our initial results suggest that using multiple starting points (>15) in the knowledge base, and considering impacted artifacts several trace links away (>5), increases the prediction accuracy. However, considering a high number of impact candidates requires a good ranking function. In our ranking function, the most important features are the network centrality and the textual similarity between the newly submitted issue report and the historical issue reports used as starting points. Using feasible feature weighting, ImpRec predicts more than 30% of the previously reported impact among the first 5 recommendations, and more than 40% among the first 10 recommendations. Some recommendations beyond the first 10 tend to be correct, but we typically do not value candidates far down the list, as we suspect that developers are unlikely to browse long lists of potentially impacted artifacts.
A preliminary analysis of the interviews conducted in Case Sweden indicates that developers see the CIA work task as a rather negative necessity, sometimes conducted just to comply with the safety process. The interviewees confirm that it is a manual, time-consuming activity, and highlight two fundamental challenges: 1) recognizing the value of formal CIA, and 2) feeling confident in the answers provided in a CIA. Furthermore, one developer claims that the most difficult artifacts to assess for potential impact are the individual requirements rather than the source code. The interviews thus suggest that our work addresses important challenges. First, using ImpRec could motivate developers to conduct high-quality CIAs, as the information will be reused to kick-start CIAs in the future [10]. Second, ImpRec can be used to verify a manual CIA as a way to boost confidence. Finally, ImpRec supports recommending potentially impacted requirements, as well as other non-code software artifacts.
Since ImpRec was deployed in Case Sweden, the collected log files reveal that the four developers have used ImpRec to search for impacted artifacts 23 times. The case study has so far been running for 11 weeks, i.e., 44 man-weeks, showing that the developers on average conducted about one CIA every second man-week (assuming they followed our instructions). While this number is lower than what we hoped for, it is in line with the expectations expressed by the developers during the interviews. During this first part of the case study, ImpRec has recommended information considered valuable to the developers in 37.5% of the use cases. As the data collected from the tool are rich, containing both ranking output and time stamps, we plan to perform a deeper analysis when user data has been collected also from Case India.
5. CONCLUSION
Software developers in large projects must navigate com-
plex information spaces as part of their everyday work. One
activity that forces developers to locate relevant information
is issue management, i.e., resolving incoming issue reports.
While the volume of issue reports in a project can be daunt-
ing, we suggest harnessing the intrinsic navigational value
of historical issue reports.
IA is an early manual step in issue management. We train
an ensemble learner on previous issues, and use the identified
patterns to automate team assignment. Our results are in
line with the current manual process, i.e., our automated so-
lution selects the correct team for about 50-80% of the issue
reports. Thus, the accuracy of automated team assignment
is not higher, but it is much faster as the system can sug-
gest team assignments for newly submitted issue reports in
an instant.
CIA requires a high degree of information seeking, both
in the source code and the project documentation. Ana-
lyzing impact prior to changing source code is mandated
by development processes in safety-critical domains, but it
is a tedious and error-prone activity. We present a recom-
mendation system that supports CIA for changes caused by
correcting reported issues. The system, ImpRec, builds a
knowledge base by extracting previously reported impact
originating from the corrective fixes of old issue reports. An
industrial case study is ongoing, but initial results from an
in-vitro evaluation indicate that more than 40% of the true impact is identified among the first 10 recommendations.
Future work includes completing the industrial case study
on CIA at an industrial partner in Sweden and conducting
post-mortem interviews. Before that however, we plan to
initiate a replication of the case study in India, to enable
additional feedback from another set of developers. Finally,
we hope to explore how issue reports can be used to pro-
vide decision support for other navigational challenges in
the software engineering context, such as fault localization
and analysis of feedback from end-users.
Acknowledgments
This work was funded by the Industrial Excellence Center EASE – Embedded Applications Software Engineering (http://ease.cs.lth.se).
6. REFERENCES
[1] M. Alenezi, K. Magel, and S. Banitaan. Efficient bug triaging using text mining. Journal of Software, 8(9), 2013.
[2] J. Anvik, L. Hiew, and G. Murphy. Coping with an open bug repository. In Proc. of the 2005 OOPSLA Workshop on Eclipse Technology eXchange, pages 35–39, 2005.
[3] J. Anvik, L. Hiew, and G. C. Murphy. Who should fix this bug? In Proc. of the 28th International Conference on Software Engineering, pages 361–370, 2006.
[4] J. Anvik and G. Murphy. Reducing the effort of bug report triage: Recommenders for development-oriented decisions. ACM Trans. Softw. Eng. Methodol., 20(3):1–35, 2011.
[5] M. Banko and E. Brill. Scaling to very very large corpora for natural language disambiguation. In Proc. of the 39th Annual Meeting of the Association for Computational Linguistics, pages 26–33, 2001.
[6] O. Baysal, M. Godfrey, and R. Cohen. A bug you like: A framework for automated assignment of bugs. In Proc. of the 17th International Conference on Program Comprehension, pages 297–298, 2009.
[7] N. Bettenburg, R. Premraj, T. Zimmermann, and K. Sunghun. Duplicate bug reports considered harmful... really? In Proc. of the International Conference on Software Maintenance, pages 337–345, 2008.
[8] P. Bhattacharya, I. Neamtiu, and C. R. Shelton. Automated, highly-accurate, bug assignment using machine learning and tossing graphs. Journal of Systems and Software, 85(10):2275–2292, 2012.
[9] E. Bjarnason, P. Runeson, M. Borg, M. Unterkalmsteiner, E. Engstrom, B. Regnell, G. Sabaliauskaite, A. Loconsole, T. Gorschek, and R. Feldt. Challenges and practices in aligning requirements with verification and validation: A case study of six companies. Empirical Software Engineering, (To appear).
[10] M. Borg, O. Gotel, and K. Wnuk. Enabling traceability reuse for impact analyses: A feasibility study in a safety context. In Proc. of the 7th International Workshop on Traceability in Emerging Forms of Software Engineering, pages 72–78, 2013.
[11] M. Borg and P. Runeson. IR in software traceability: From a bird's eye view. In Proc. of the International Symposium on Empirical Software Engineering and Measurement, pages 243–246, 2013.
[12] M. Borg and P. Runeson. Changes, evolution and bugs – Recommendation systems for issue management. In M. Robillard, W. Maalej, R. Walker, and T. Zimmermann, editors, Recommendation Systems in Software Engineering, pages 477–509. Springer, 2014.
[13] G. Canfora and L. Cerulo. Impact analysis by mining software and change request repositories. In Proc. of the 11th International Symposium on Software Metrics, pages 9–29, 2005.
[14] J. Cleland-Huang, C. K. Chang, and M. Christensen. Event-based traceability for managing evolutionary change. Transactions on Software Engineering, 29(9):796–810, 2003.
[15] D. Cubranic, G. Murphy, J. Singer, and K. Booth. Hipikat: A project memory for software development. Transactions on Software Engineering, 31(6):446–465, 2005.
[16] M. Eppler and J. Mengis. The concept of information overload: A review of literature from organization science, accounting, marketing, MIS, and related disciplines. The Information Society, 20(5):325–344, 2004.
[17] R. Feldt. Do system test cases grow old? In Proc. of the 7th International Conference on Software Testing, (To appear), 2014.
[18] International Electrotechnical Commission. IEC 61508 ed 1.0, Electrical/electronic/programmable electronic safety-related systems, 2010.
[19] G. Jeong, S. Kim, and T. Zimmermann. Improving bug triage with bug tossing graphs. In Proc. of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 111–120, 2009.
[20] L. Jonsson, M. Borg, D. Broman, K. Sandahl, S. Eldh, and P. Runeson. Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts. Empirical Software Engineering, (Submitted), 2014.
[21] L. Jonsson, D. Broman, K. Sandahl, and S. Eldh. Towards automated anomaly report assignment in large complex systems using stacked generalization. In Proc. of the 5th International Conference on Software Testing, Verification and Validation, pages 437–446, 2012.
[22] S. Just, R. Premraj, and T. Zimmermann. Towards the next generation of bug tracking systems. In Proc. of the 2008 Symposium on Visual Languages and Human-Centric Computing, pages 82–85, 2008.
[23] P. Karr-Wisniewski and Y. Lu. When more is too much: Operationalizing technology overload and exploring its impact on knowledge worker productivity. Computers in Human Behavior, 26(5):1061–1072, 2010.
[24] L. Olofsson and P. Gullin. Development of a Decision Support System for Defect Reports. Master's thesis, Dept. of Computer Science, Lund University, 2014.
[25] B. Regnell, R. Berntsson-Svensson, and K. Wnuk. Can we beat the complexity of very large-scale requirements engineering? In Proc. of REFSQ, pages 123–128, 2008.
[26] M. Robillard, R. Walker, and T. Zimmermann. Recommendation systems for software engineering. IEEE Software, 27(4):80–86, 2010.
[27] J. Sill, G. Takacs, L. Mackey, and D. Lin. Feature-weighted linear stacking. CoRR, 2009.
