
N.

Reed Parsons:

September 30 2018

Improving Bug Triage with Bug Tossing Graphs


Jeong, Kim, Zimmermann
2009

Problem(s) Addressed
After a bug has been triaged and assigned to a developer, the authors found that this developer is not necessarily
the developer who actually fixes the bug, often because the bug was assigned to the wrong developer. In these cases, which
were found to occur 37%-44% of the time, the developer reassigns the bug to another developer, a process called tossing.
Starting with the first assignee, a bug is tossed from developer to developer until it is finally accepted and resolved
by the fixer. The number of tosses can grow large, especially for large projects with a long implementation history,
imposing a significant delay (up to a year in some cases) on bug resolution.

Motivation
Previous work on improving bug triage extracts developer structure by training machine learning algorithms on
co-change logs, email threads, metadata, and comment threads mined from project bug repositories in order to predict an
appropriate fixer. Although these techniques show promise, they are not yet satisfactory because they assume similar bugs
will be fixed by the same developer in the future, an incorrect assumption as revealed by the authors' bug tossing analysis.
Research has not yet examined the direct working relationships between developers, such as bug reassignment, nor have these
relationships been modeled as a graph structure.

Proposed Solution
Bugs ought to be correctly resolved in a timely manner in order for a product to be successful. The bug tossing routine
has been found to be a bottleneck in the bug resolution process. Removing unnecessary tossing steps
and/or shortening the tossing intervals would improve the bug triage process because the bug would reach the fixer faster.
Although not guaranteed, we can generally assume that the bug tossing process is Markovian in nature, insofar as the
current decision to reassign the bug does not depend on who sent the bug. Based on this premise the authors
devise a tossing graph, which models the bug tossing history and allows for path reduction based on transaction probabilities
and minimum support values derived from previously generated tossing graphs.
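
To make the idea concrete, the following is a minimal sketch of how a tossing graph could be built from tossing histories under the Markov assumption described above. It is not the authors' implementation. The function name, the developer names, and the choice to normalize each edge by the sender's total outgoing tosses are all assumptions made for illustration; the paper's exact definitions of transaction probability and support may differ.

```python
from collections import Counter, defaultdict


def build_tossing_graph(paths, min_support=1, min_probability=0.0):
    """Return {sender: {receiver: probability}} from observed tossing paths.

    Each path is a list of developer names in tossing order, ending with
    the fixer. Hypothetical helper, not taken from the paper.
    """
    counts = defaultdict(Counter)
    for path in paths:
        # First-order Markov assumption: only consecutive pairs are counted,
        # so a toss depends on the current holder and not on earlier senders.
        for sender, receiver in zip(path, path[1:]):
            counts[sender][receiver] += 1

    graph = {}
    for sender, receivers in counts.items():
        total = sum(receivers.values())
        # Assumed definitions: support is the raw toss count on an edge,
        # probability is that count divided by the sender's outgoing tosses.
        kept = {
            receiver: n / total
            for receiver, n in receivers.items()
            if n >= min_support and n / total >= min_probability
        }
        if kept:
            graph[sender] = kept
    return graph


# Hypothetical tossing histories (the last name in each path is the fixer).
paths = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["alice", "dave", "carol"],
    ["bob", "carol"],
]
graph = build_tossing_graph(paths, min_support=2, min_probability=0.5)
print(graph)
```

With these thresholds the single alice-to-dave and dave-to-carol tosses are pruned, leaving only the edges that were observed at least twice. Raising either threshold keeps fewer, better-supported edges.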

Evaluation & Results


A previously published 10-fold cross validation process for both Naive Bayes and Bayesian Networks was followed to reproduce
a bug triage benchmark. Tossing graphs and machine learning predictions (baseline) were generated separately from the
training set; the accuracy of the top 2, 3, 4, and 5 ML predictions was calculated. An additional set of tossing graphs
was generated using the previous baseline ML predictions, with the accuracy calculated in the same manner.
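
As an illustration of the accuracy measure, the sketch below assumes that a prediction counts as correct when the actual fixer appears among the top k ranked candidates. That definition, the function name, and the sample data are assumptions for illustration and are not taken from the paper.

```python
def top_k_accuracy(ranked_predictions, actual_fixers, k):
    """Fraction of bugs whose actual fixer is among the top-k predictions.

    ranked_predictions: one list of developer names per bug, best first.
    actual_fixers: the developer who actually fixed each bug.
    """
    if not actual_fixers:
        raise ValueError("at least one bug is required")
    hits = sum(
        fixer in ranked[:k]
        for ranked, fixer in zip(ranked_predictions, actual_fixers)
    )
    return hits / len(actual_fixers)


# Hypothetical ranked predictions for four bugs.
ranked = [
    ["alice", "bob", "carol"],
    ["bob", "carol", "alice"],
    ["carol", "alice", "bob"],
    ["dave", "alice", "bob"],
]
fixers = ["bob", "bob", "bob", "carol"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, fixers, k))
```

Accuracy can only stay the same or rise as k grows, which is why results are reported separately for each number of predictions.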
Tossing length reduction and search failure rates both depended on whether the path of individual tossing relationships or
the relationship between intermediate developers and the fixer was used (the actual and goal oriented path models, respectively).
Similar to recall and precision, there was a trade-off between reduction rates and search failure. For example, including more
developers by lowering the minimum support and transaction probability thresholds produced lower search failure rates, at
the expense of poor path reduction rates incurred from the larger number of node traversals. High values for minimum
support and transaction probability are preferred, at the expense of increased search failure rates, in order to retain a more
accurate graph and not produce “long and wrong tossing steps”.
The practicality of their approach was evaluated by requesting feedback from relevant developers who were sent tossing
graphs generated for their particular projects. The authors did not receive feedback from all of the developers involved
and were not able to provide conclusive evidence; however, they report that the feedback they did receive was
positive. Regardless, the authors conclude that tossing graphs are useful in bug assignment, as a manager can
quickly visualize developer-bug interactions and determine the most appropriate developers. Furthermore, even if a bug is
incorrectly assigned by the manager, the developer can consult the graph and quickly find a well-suited developer to reassign it to.

Reflections on Learning
Pre-processing can be a challenge and can skew ML results. The authors provided examples such as one developer using multiple
email addresses, nicknames or abbreviated names, bug report formats that differ across projects, incomplete or incorrect
reports, and spelling mistakes, all of which can cause considerable issues and prevent a one-size-fits-all pre-processing technique.
A standardized format would help alleviate some of these issues; however, implementing it and ensuring strict adherence is
difficult, especially across a broad range of projects. Adoption of such a standard is even more challenging and unlikely for
larger, more established projects where retroactive changes would be costly, error-prone, and time-consuming. If the open
source community could agree upon a protocol, the research and development of an automated bug report standardization
tool would be an interesting prospect.
I liked the introduction of graph theory into the realm of RSSE research and see value in combining more recent graph
optimization techniques with ML algorithms. Practically speaking, I see the authors' idea being applied to the problem of
the over-assignment of bug reports. In a previous article the authors provided an example of a project with one developer
who resolved the majority of the bugs and thus was considered to have the most expertise, reinforcing the continuous
recommendation of this developer as the expert fixer and creating an inherent feedback loop. In this scenario the manager can
decide to assign the bug to any of a number of developers and can better visualize the developer relationships, improving
the bug triage process.
