
2018 42nd IEEE International Conference on Computer Software & Applications

Predicting the breakability of blocking bug pairs

Hui Ding Wanwangying Ma Lin Chen Yuming Zhou Baowen Xu


State Key Laboratory for Novel Software Technology
Nanjing University, China
dinghui85@gmail.com, wwyma@smail.nju.edu.cn, {lchen, zhouyuming, bwxu}@nju.edu.cn

Abstract: Software systems are becoming increasingly complex with the wide use of social software development platforms such as GitHub. Complicated inter-dependencies within ecosystems impose new challenges in resolving blocking bug pairs, in which the upstream bugs prevent the downstream bugs from being fixed. Generally, the downstream bugs in blocking bug pairs cannot be fixed until the upstream bugs are fixed, which keeps the downstream developers waiting for a long time. However, previous research found that some blocking pairs could be "broken" through a workaround, i.e., a temporary solution proposed by the downstream developers before the upstream bugs get fixed.

In this paper, we propose an approach to describe and predict the breakability of blocking bug pairs. Our goal is to help ecosystem developers predict whether a given blocking bug pair can be broken. We evaluate our approach on two real-world ecosystems, Mozilla Firefox and Netbeans, and have the following two main findings. First, the participants within the blocking bug pair and the vitality of the downstream bug have much impact on the breakability. Second, we can build breakability prediction models with an accuracy of approximately 80%. Moreover, for predicting unbreakable pairs, which affect the downstream projects seriously, our approach achieves a precision over 92%. It can be used to effectively remind the upstream developers to fix these bugs as quickly as possible.

Keywords: blocking bugs; prediction; breakability

I. INTRODUCTION

Social coding sites, such as GitHub, help people around the world to cooperate on software development and maintenance. On social coding sites, projects and components with complex dependencies between them form ecosystems. While software reuse brings convenience, the impact of bugs in upstream projects may reach the downstream projects through their dependencies. The fix of some downstream bugs depends on the fix of the bugs in their upstream projects. The two inter-dependent bugs constitute a blocking bug pair.

Blocking bugs require more time to fix than non-blocking bugs. Garcia and Shihab [1] found that developers need about two to three times longer to fix blocking bugs than to fix non-blocking ones. Due to the huge impact of blocking bugs, many researchers work on blocking bug prediction with bug reports [1–4].

However, even if we have identified the blocking pairs, it is still difficult to fix them in some situations. First, the upstream bug may be of low severity for the upstream project, so the upstream developers may not pay much attention to it. The downstream projects, which may be heavily affected, then have to wait for an inactive upstream. Second, the root cause of a downstream bug may be only a special case of the upstream bug. The patch of the upstream bug is expected to solve all the situations, and therefore takes more time to release. Third, the upstream bug may not be a runtime failure but an enhancement requirement, for example the redesign of ill-formed APIs. In the social coding process, developers often spend much time discussing enhancement requirements to reach an agreement. Fourth, the upstream bug may itself be blocked, resulting in a complex blocking chain.

As mentioned above, fixing a downstream bug in a blocking pair may be a long process. Ma et al. [5] found that some downstream bugs were fixed before the upstream bugs they depended on, which means that some blocking pairs can be broken. In their investigation, they found some patterns in how the downstream developers broke the blocking pairs. However, it is usually difficult to know whether a blocking pair is breakable or not. Thus, an approach for predicting the breakability of blocking bug pairs is necessary, yet little work has paid attention to this issue. Breakability prediction is important in two major respects: on the one hand, downstream developers can choose to fix the breakable bug without waiting for upstream developers, which would save much time; on the other hand, upstream developers should care more about the unbreakable bugs to avoid long-lasting impact on the downstream side.

In this paper, we investigate the characteristics of blocking pairs and attempt to build prediction models for breakability. We conduct this study on two ecosystems: Mozilla Firefox and Netbeans. We first collect and analyze bug reports of the two subjects. Then we leverage machine learning algorithms to build prediction models for breakability.

This paper makes the following contributions. First, we investigate the effect of bug features on breakability prediction. Second, we propose an approach to predict the breakability of blocking pairs in two large-scale ecosystems. Third, by applying various parameters, our model identifies unbreakable blocking pairs for the upstream projects in order to remind the developers to fix these bugs preferentially.

0730-3157/18/$31.00 ©2018 IEEE


DOI 10.1109/COMPSAC.2018.00035
The rest of this paper is organized as follows. Section II describes related work. Section III introduces the preliminaries and motivation. Section IV describes our research methodology, including the data source we used and the research questions. Section V presents the experimental results in detail. We discuss the practice of breaking blocking pairs in Section VI and examine the threats to validity in Section VII. Finally, Section VIII concludes this paper.

II. RELATED WORK

A. Blocking bug prediction

Garcia and Shihab [1] proposed the first work on blocking bug prediction. Based on 402962 bug reports from six open source projects, they found that it took about two to three times longer to fix blocking bugs than to fix non-blocking bugs. They then extracted 14 different factors to build decision trees to predict whether a bug would be a blocking bug. Their models achieved 9-29% precision, 47-76% recall, and 15-42% F-measure. They also found that the most important factors to determine blocking bugs were the comments, the comment size, the number of developers in the CC list and the reporter’s experience.

The work in [2] extended Garcia and Shihab’s work by proposing a novel method called ELBlocker. Considering the class imbalance phenomenon in the bug reports, ELBlocker leveraged ensemble learning techniques which combined multiple classifiers built on different subsets of the training set. Their approach improved the F1-score of Garcia and Shihab’s method, SMOTE, OSS, and Bagging by 14.69%, 23.36%, 30.98%, and 171.65%, respectively.

Our work concentrates on a specific feature of the blocking bug pairs, i.e., the breakability, and uses a learning-based approach to predict whether a blocking bug pair is breakable or not. If the pair can be broken, then the downstream developers do not need to wait for an upstream fix. If the pair cannot be broken, the upstream developers are supposed to give the upstream bug high priority. Therefore, the results of the breakability prediction will help the developers make decisions when scheduling bug fixes, in order to decrease the impact on the related projects, especially the downstream ones.

B. Cross-project bugs

Blocking bugs are bugs that prevent other bugs from being fixed. Another special type of bug similar to blocking bugs is the cross-project bug. If a bug in a project affects the projects depending on it, then the bug is called a cross-project bug. With the development of software ecosystems, the number of cross-project bugs has been increasing, which attracts the attention of more and more researchers.

Some researchers have focused on the fixing process of cross-project bugs. Canfora et al. [6] investigated Cross-System-Bug-Fixings (CSBFs) in the FreeBSD and OpenBSD kernels. They associated the occurrences of CSBFs with social characteristics of contributors. By using social network analysis, they found that cross-project bug fixings mainly involved contributors with the highest degree, betweenness, and brokerage level, as well as contributors that changed the source code more than others did.

For the influence of inter-dependency, Bavota et al. [7] found that upstream projects strongly impact downstream projects when there are general dependencies between them. Their study showed that a large amount of downstream code had to be modified when the upstream project changed if the downstream project depended on the upstream framework or general services.

Ma et al. [5] studied how developers fixed cross-project correlated bugs in the GitHub scientific Python ecosystem. They leveraged manual inspection and an online survey to investigate how the developers track the root cause of cross-project bugs and how they deal with cross-project bugs to eliminate the bad impacts.

Ding et al. [8] extended Ma et al.’s work by focusing on the workaround, which is commonly used by the downstream developers when facing cross-project bugs. A workaround is a temporary solution injected in the downstream code to bypass an upstream error. Combining manual inspection and statistical comparisons, they found that the workarounds and the corresponding upstream fixes were significantly different in code size and structure. They also identified four types of common patterns of downstream workarounds.

Our work focuses on blocking bug pairs in which the fix of the upstream bug blocks the fix of the downstream one. Different from the aforementioned studies, which examined the characteristics of the contributors or the practices of the developers during the fixing process, our study attempts to predict whether the blocked downstream bug can be fixed before the upstream one, that is, whether the blocking pair is breakable.

C. Prediction on other features or types of bugs

In recent years, a large number of researchers have proposed prediction models to reduce the impact of bugs by extracting data from bug reports. Some works focused on certain features of the bug fixing process, including the fixing time [3,9–11], bug severity or priority [4,12,13] and the characteristics of developers [14,15]. Other works concentrated on specific types of bugs, such as duplicate bugs [16,17], re-opened bugs [18,19], and performance bugs [20].

Our work focuses on an emerging type of bug, the blocking bug, and predicts the breakability of blocking bug pairs.

III. PRELIMINARIES & MOTIVATION

In this section, we first introduce the concept of breakability. Then we provide the motivation to predict the breakability.

A. Breakability

The concept of blocking bug has been well investigated by previous works [1,2,6]. If the fix of one bug B1 relies on the fix of another bug B2, the two bugs are called a blocking bug pair.
The bug B2 refers to the upstream bug, which usually belongs to a service provider, and the bug B1 refers to the downstream bug, which belongs to a user of the upstream service. Generally, B1 cannot be fixed until B2 has been fixed. If B1 can be fixed before B2, we say the blocking pair consisting of B1 and B2 is breakable. Ma et al. investigated this phenomenon in [5], and found some patterns in how downstream developers broke the blocking pairs.

The definitions of “upstream” and “downstream” are restricted to a certain blocking bug pair, not the projects the bugs belong to. Due to the inter-dependencies and interactions within ecosystems, one project may be both the service provider and API user for another project. There may be no explicit “upstream” and “downstream” assignment for two projects. For a blocking bug pair, the “upstream” and “downstream” are clear and definite.

Breakable blocking pairs are not rare in ecosystems. TABLE I shows that nearly 34% of blocking pairs in Mozilla Firefox and 26% in Netbeans are breakable.

TABLE I. DISTRIBUTION OF BREAKABLE BLOCKING BUG PAIRS

Ecosystem        Bug reports  Blocking pairs  Breakable pairs  % breakable among blocking pairs
Mozilla Firefox  24456        11140           3733             33.51%
Netbeans         108128       7760            2001             25.79%

B. Motivation

Our study focuses on how to identify the breakable blocking pairs. Given a series of blocking pairs, each containing an upstream blocking bug and a downstream blocked bug, our approach attempts to identify each pair as breakable or unbreakable.

A number of research works have been proposed to predict blocking bugs [1,2]. However, our study on breakability prediction is significantly different from blocking bug prediction for the following reasons. First, the breakable attribute relates to a pair of related bugs, while the blocking attribute relates to a single bug. One bug blocked by two upstream bugs is expected to be marked once as blocked in blocking prediction, while in breakability prediction the blocked bug belongs to two blocking pairs with two upstream bugs, and our approach is expected to give the breakable attribute for each pair respectively. Second, as the prediction objective is different, the instances used are different. For a blocking pair, both upstream and downstream information are used for prediction. Third, breakability prediction works in the stage after the blocking relation is identified, while blocking bug prediction works before the blocking relation is known.

The identification of breakable blocking pairs helps the downstream side to be fixed quickly. The requirements for bug fixes are different between the upstream bug and the downstream bug in a blocking pair. The upstream side, as a service provider, would like to fix the bug thoroughly in consideration of all the cases, which may take a lot of time. The downstream bug may be only a special case caused by the upstream bug, and a quick fix for this case is preferred. For example, in Mozilla Firefox, the upstream bug #462222 blocked the downstream bug #528440. The upstream function 'nsIWindowMediator' returned wrong values at the time point when a window was closed and not destroyed yet. This caused "some intermittent failures". The downstream developers solved this bug by adding a special conditional check around the upstream API in 2009, while the upstream bug was fixed four years later, in 2013. Another example is from Netbeans, where the downstream bug #214254 was blocked by the upstream bug #208919. On the upstream side, collisions happened when two modules defined the same name as a file and as a folder, respectively. A comment on the upstream bug #208919 said:

“OK, so the problem is that folder is defined first and then another module defines file of the same name. What should happen in this situation?”

The downstream bug #214254 arose for the above reason in the case where the module entry is a file rather than a folder. This is a special case of #208919, and the downstream developer fixed the bug by “reverting the annotation based registration for Folder”.

An upstream bug may block a number of downstream bugs, forming many blocking pairs with the same upstream bug. If most of the blocking pairs are unbreakable, the upstream bug would cause huge impact on the ecosystem. The upstream bug #1265429 in Mozilla Firefox blocked 12 downstream bugs and all of the downstream bugs are unbreakable. The comment of one downstream bug #1265729 is as follows:

“I have wanted to do this for a long time... it helps us avoid a host of ugly hacks and makes our codebase make more sense.”

The prediction of unbreakable blocking pairs helps the upstream developers to find the bugs that block more downstream bugs. These influential upstream bugs should be fixed with high priority.

TABLE II. BUG FEATURES AND DESCRIPTIONS

Feature name            Abbr.       Description
Product                 product     Product the bug is located in.
Component               component   Second-level category below product.
Severity                severity    Influence of the bug.
Priority                priority    The order to fix, as ranked by developers.
Assignee                asto        The person who manages and maintains the bug.
Bug reporter            reporter    The person who reported the bug.
Number of the CC list   ccnum       The number of people on the CC list.
Number of comments      cmtnum      The number of comments on the bug.
Bug description size    wnum        The number of words in the bug description.
Hardware Platform       rep_platf   The hardware environment of the bug.
Operating System        op_sys      The operating system the bug was observed on.

IV. RESEARCH METHODOLOGY

In this section, we first introduce the data source and make a general analysis. We then present the research questions and the methods used to investigate them.

A. Data Source

The investigated ecosystems are Mozilla Firefox (https://bugzilla.mozilla.org/) and Netbeans (https://netbeans.org/bugzilla/). Mozilla Firefox is an Internet browser and Netbeans is a Java IDE. We collected the data from their Bugzilla systems. We chose the bug reports that were already fixed, so that the status recorded in the reports is stable. As shown in TABLE I, we got 24456 bug reports of Mozilla Firefox and 108128 bug reports of Netbeans. Then we analyzed each bug report and extracted the blocking attribute. Through the bug correlation information, we identified the blocking pairs. We got 11140 blocking pairs from Mozilla Firefox and 7760 from Netbeans. Finally, we marked the blocking pairs as breakable or not.

The breakable information is not explicitly declared in bug reports. We decided the breakable attribute by checking and comparing the upstream and downstream bug activities. Bugzilla provides detailed bug activities, which are described as lines of change records. Each line is a modification of the bug report, containing five columns, namely Who, When, What, Removed and Added. "Who" is the person who made the change; "When" is the time at which the change happened; "What" is the field that was changed; "Removed" and "Added" give the old and new values of this field.

To identify breakable pairs automatically, we leveraged the time series of "fixed" status from the upstream and downstream bug reports. If at some point the downstream bug was marked as "fixed" while the upstream one was not fixed yet, then this blocking pair is breakable at that point. Fig. 1 shows a bug activity segment of bug#462222 and bug#528440 in Mozilla Firefox. The downstream bug#528440 was fixed on 2009-11-13, and the upstream bug#462222 was not fixed until 2013-09-08. Obviously, this blocking pair is breakable.

Fig. 1. Bug activity segment of bug#462222 and bug#528440 in Firefox

Note that one blocking pair can be broken more than once, since a bug report might be reopened. In this case, we only consider the last broken point. After processing the bug activities of blocking pairs, we got 3733 breakable pairs out of 11140 in Mozilla Firefox (33.5%) and 2001 breakable pairs out of 7760 in Netbeans (25.8%).

B. Research Questions

The goal of this study is to predict the breakability of blocking pairs. Based on a large data set collected from two ecosystems, we attempt to answer the following questions:

RQ1: Which features are the best indicators for the breakability?

There are various features in blocking pairs. Which features can we leverage to predict the breakability? Are these features of the same importance in prediction? We attempt to find the answer.

RQ2: Can we build effective models to predict whether a blocking pair can be broken?

The breakability is crucial for both the downstream and upstream developers. Knowing that a blocking pair is breakable, the downstream developers are able to fix the downstream bug without waiting for the fix of the upstream bug. Although there are a considerable number of breakable pairs for which downstream developers can accelerate the fixing process, the unbreakable pairs are still the majority. When the blocking pair is unbreakable, the downstream side can do nothing but wait. Under these circumstances, the upstream side should pay more attention to the blocking pair.
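The labeling rule of Section IV-A (a pair is breakable if the downstream bug is marked fixed at a point where the upstream bug is not) can be illustrated with a minimal sketch. It assumes each bug's activity log has already been reduced to a date-sorted list of `(date, fixed)` status events; the function names, this event format, and the tie rule (an upstream fix on the same day counts as already applied) are assumptions for illustration, not the authors' implementation. The reopen dates in the second example are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def is_fixed_at(events, when):
    """State of a bug at `when`; events is a date-sorted list of (date, fixed)."""
    dates = [d for d, _ in events]
    i = bisect_right(dates, when)          # events on `when` itself are included
    return events[i - 1][1] if i else False  # before any event: not fixed


def broken_points(upstream, downstream):
    """Dates at which the downstream bug became fixed while upstream was not."""
    return [when for when, fixed in downstream
            if fixed and not is_fixed_at(upstream, when)]


def is_breakable(upstream, downstream):
    return bool(broken_points(upstream, downstream))


def last_broken_point(upstream, downstream):
    """Only the last broken point is kept when a bug was reopened."""
    points = broken_points(upstream, downstream)
    return points[-1] if points else None


# Dates reported in the text for upstream bug#462222 and downstream bug#528440.
up_462222 = [(date(2013, 9, 8), True)]
down_528440 = [(date(2009, 11, 13), True)]

# Hypothetical reopen history, only to show the "last broken point" rule.
down_reopened = [(date(2009, 11, 13), True),
                 (date(2010, 1, 5), False),
                 (date(2010, 3, 1), True)]

print(is_breakable(up_462222, down_528440))         # True
print(last_broken_point(up_462222, down_reopened))  # 2010-03-01
```

A pair whose downstream bug is only ever fixed after the upstream fix yields an empty list of broken points and is labeled unbreakable.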

C. Research Methods

1) Bug Features

To predict breakable pairs, we extracted various features from both upstream and downstream bug reports. The upstream and downstream bug reports have the same structure, with one-to-one corresponding data fields. A given feature has the same meaning for each bug individually, but the upstream and downstream values play different roles for the pair. We chose 11 features from the upstream and the downstream bug separately, for a total of 22 features per pair. Note that for Mozilla Firefox the product feature is always "Firefox", thus there are 20 valid features for it. We list each feature and describe it briefly in TABLE II.

After the features were extracted for upstream and downstream bugs, we computed the information gain of each feature with respect to the breakable attribute. The information gain is computed as follows:

InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute)

H(Class) is the entropy of the class, which in this study stands for breakability. H(Class | Attribute) is the entropy of the class conditioned on the attribute. The reduction in entropy from the class without any additional information to the class with a given attribute is the information gain of the attribute on the class. Through the information gain, we can observe the correlation between the feature and the breakability. A greater information gain means the feature is more related to the breakability.

Information gain favors attributes with a large number of distinct values, such as the attribute “bug reporter”, which contains many distinct participants’ names. To investigate the attributes without this bias, we resorted to the information gain ratio, which is calculated as follows:

GainR(Class, Attribute) = InfoGain(Class, Attribute) / H(Attribute)

H(Attribute) denotes the entropy of the attribute. The information gain ratio is the information gain divided by the entropy of the attribute. When there are a large number of distinct values in the attribute, the information gain ratio is decreased by H(Attribute). With the information gain ratio, we avoid over-valuing such attributes.

2) Prediction models for breakable pairs

We used Zero-R as our baseline model. We chose four well-known machine learning classifiers for comparison, namely Naive Bayes, Bayes Networks, K-Nearest Neighbors and Random Forests. Zero-R is a very simple classifier whose prediction output is always the majority class in the training set. The Naive Bayes Classifier and the Bayes Networks Classifier are both based on conditional probability. The difference is that Naive Bayes works under the assumption that features are probabilistically independent, while Bayes Networks does not require this assumption. The K-Nearest Neighbor Classifier calculates the distance from the unclassified instance to all the instances in the training set, and then considers the k nearest neighbors of the unclassified instance to decide the class. In this study, we selected k=5 as the parameter. The Random Forests Classifier is an ensemble learning approach. This approach predicts with the majority vote of a series of decision trees.

To evaluate the effectiveness of our prediction models, we leveraged a confusion matrix. The confusion matrix records the correct and incorrect results of a classifier. Each row corresponds to the class predicted by the classifier and each column to the actual class; positive (P) and negative (N) refer to the predicted class, while true (T) and false (F) indicate whether the prediction matches the actual class. A true positive (TP) is a blocking pair that is breakable and is classified as breakable; a false positive (FP) is a blocking pair that is unbreakable but is classified as breakable; a false negative (FN) is a blocking pair that is breakable but is classified as unbreakable; a true negative (TN) is a blocking pair that is unbreakable and is classified as unbreakable. The above four outcomes are summarized in TABLE III.

TABLE III. THE CONFUSION MATRIX OF BREAKABLE PREDICTION

                                Actual class
                                breakable    unbreakable
Predicted class  breakable      TP           FP
                 unbreakable    FN           TN

With the values in the confusion matrix, we are able to calculate the evaluation metrics for our prediction models. The metrics we used are as follows:

1. Precision: The ratio of correctly classified breakable pairs over all the pairs classified as breakable, calculated as Pre = TP / (TP + FP).

2. Recall: The ratio of correctly classified breakable pairs over all the actually breakable pairs, calculated as Rec = TP / (TP + FN).

3. F-measure: Combining precision and recall, the F-measure is calculated as F-measure = 2*Pre*Rec / (Pre + Rec).

4. Roc Area: The ROC curve is created by plotting the true positive rate (the recall) against the false positive rate, FP / (FP + TN), at various threshold levels. The area under the ROC curve is between 0 and 1, and is calculated by integration.

5. Accuracy: The ratio of the number of correctly classified pairs over the total number of pairs, calculated as Acc = (TP + TN) / (TP + FP + TN + FN).

An ideal prediction model is expected to achieve 100% precision, which means every pair classified as breakable is actually breakable, and 100% recall, which means every actually breakable pair is classified as breakable. In this case, the F-measure, Roc Area, and Accuracy all have the value 1. In practice, it is difficult to build an ideal model and we attempt to find an applicable one.

For prediction models, there is a trade-off between precision and recall. Since the unbreakable pairs are the majority in the data set, there is room to increase the prediction precision while keeping the reduction in recall within an acceptable range.

To increase the prediction precision on the unbreakable pairs, we proposed a classifier based on Bayes Networks. This
classifier is able to classify with an assignable threshold from 0 to 1. The threshold is used to process the probability distribution generated by the Bayes Network classifier. When the threshold is 0.5, our classifier is the same as the Bayes Network classifier; when the threshold decreases from 0.5 to 0, the classifier becomes stricter and eliminates more boundary instances to increase the precision; when the threshold increases from 0.5 to 1, the classifier becomes looser and accepts more boundary instances to increase the recall. For the upstream developers, we applied a lower threshold to improve the precision on unbreakable pairs.

Since the recall will be reduced with the increase of precision, we cannot simply set the threshold to a very low value. We investigated the changing trends of the precision and the recall with various thresholds.

V. RESEARCH RESULTS

A. RQ1. Which features are the best indicators for the breakability?

We performed information gain and information gain ratio analysis on the collected data set. The results are shown in TABLE IV and TABLE V. We use the suffixes "1" and "2" behind the feature names to distinguish between the features of the upstream bugs ("1") and the downstream bugs ("2"). For instance, "reporter1" denotes the reporter of the upstream bug and "component2" denotes the component where the downstream bug happened. The second column of each subject presents the information gain or information gain ratio value of the feature in the first column. The features are ranked in descending order. The features at the top of the list are more highly correlated with the breakability than the ones at the bottom.

TABLE IV. INFORMATION GAIN OF BUG FEATURES

Mozilla Firefox           Netbeans
Feature      Info Gain    Feature      Info Gain
reporter1    0.2913       reporter2    0.1863
asto2        0.19177      reporter1    0.16236
reporter2    0.18799      asto2        0.10634
asto1        0.18024      component2   0.10213
cmtnum2      0.13666      asto1        0.08739
component1   0.08368      component1   0.0873
component2   0.08113      cmtnum2      0.04586
ccnum2       0.06298      product2     0.04293
wnum1        0.03297      product1     0.03811
wnum2        0.02533      wnum2        0.02713
op_sys1      0.02052      ccnum2       0.01698
severity1    0.01989      cmtnum1      0.01413
cmtnum1      0.01735      ccnum1       0.01078
ccnum1       0.0124       wnum1        0.01058
op_sys2      0.0062       severity1    0.00771
priority1    0.00469      op_sys2      0.00353
rep_platf1   0.00449      op_sys1      0.0032
priority2    0.0029       rep_platf2   0.00303
rep_platf2   0.00194      priority1    0.00237
severity2    0.00107      priority2    0.0019
                          rep_platf1   0.00155
                          severity2    0.00132

TABLE V. INFORMATION GAIN RATIO OF BUG FEATURES

Mozilla Firefox           Netbeans
Feature      Gain Ratio   Feature      Gain Ratio
cmtnum2      0.04512      reporter2    0.02403
reporter1    0.03784      reporter1    0.02156
asto2        0.0294       cmtnum2      0.02031
severity1    0.02837      wnum2        0.01864
ccnum2       0.02765      asto2        0.01675
asto1        0.02612      component2   0.01653
reporter2    0.02492      asto1        0.01346
component1   0.01762      component1   0.01328
component2   0.01716      ccnum2       0.01231
wnum1        0.01677      product2     0.00992
cmtnum1      0.0137       severity1    0.00958
wnum2        0.01276      product1     0.00916
op_sys1      0.01125      cmtnum1      0.00817
ccnum1       0.0089       ccnum1       0.00702
op_sys2      0.00366      wnum1        0.00694
priority1    0.00316      rep_platf2   0.00298
rep_platf1   0.00281      op_sys2      0.00243
priority2    0.00192      op_sys1      0.0022
severity2    0.00134      severity2    0.00168
rep_platf2   0.00125      priority1    0.00148
                          rep_platf1   0.00146
                          priority2    0.00114

For the two subjects in our study, the rankings of the features are different due to the unique characteristics of each ecosystem. However, from the ranking lists of the two software ecosystems, we can draw some common conclusions.

First, the participant-related attributes have much impact on breakability. The reporters and assignees rank among the top 30% in both lists, which shows that the features related to the developers play an important role in predicting the breakability. This is consistent with the results of Ma et al.’s study [5]. The attributes holding participants’ names contain many distinct string values, which inflates their information gain values. The four participant-related attributes are all ranked in the top 5 in the information gain analysis for each ecosystem. In the gain ratio analysis, the rankings of these attributes go down because of the many distinct attribute values, yet they remain in the top 7. Even after accounting for the large number of distinct values, participant-related attributes are still useful for breakability prediction.

Second, the features indicating the vitality of the downstream bugs are more influential than those of the upstream ones. The number of comments (cmtnum) and the number of CC (ccnum) are two main indicators of the bug vitality. In both subjects, the two indicators of the downstream vitality rank higher than those of the upstream vitality. The information gain ratio analysis shows that the number of downstream comments is an important factor of breakability.

Third, contrary to our intuition, the severity and priority of the downstream bug are of low importance for the breakability of blocking pairs in both subjects. These metrics may contain unreliable data. Further research is needed to investigate the reasons. Besides, operating system and hardware platform have little influence in Netbeans, probably because of the excellent multi-platform capabilities of Java.
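The rankings in TABLE IV and TABLE V rest on the two formulas InfoGain = H(Class) - H(Class | Attribute) and GainR = InfoGain / H(Attribute). The sketch below is a minimal, assumed implementation for nominal attributes using base-2 entropy; the toy data are invented for illustration and are not taken from the studied ecosystems. It shows why an attribute with many distinct values (such as a reporter name) scores a high information gain but a lower gain ratio.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (base 2) of a list of nominal values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def conditional_entropy(classes, attribute):
    """H(Class | Attribute): entropy of the class within each attribute value."""
    n = len(classes)
    groups = {}
    for cls, attr in zip(classes, attribute):
        groups.setdefault(attr, []).append(cls)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def info_gain(classes, attribute):
    return entropy(classes) - conditional_entropy(classes, attribute)


def gain_ratio(classes, attribute):
    h_attr = entropy(attribute)
    return info_gain(classes, attribute) / h_attr if h_attr > 0 else 0.0


# Invented toy data: four blocking pairs.
classes = ["breakable", "breakable", "unbreakable", "unbreakable"]
reporter = ["alice", "bob", "carol", "dave"]   # every value distinct
severity = ["low", "low", "high", "high"]      # two values

print(info_gain(classes, reporter), gain_ratio(classes, reporter))  # 1.0 0.5
print(info_gain(classes, severity), gain_ratio(classes, severity))  # 1.0 1.0
```

Both toy attributes separate the classes perfectly, so their information gain is equal, but the all-distinct attribute is penalized by its larger H(Attribute).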

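The thresholded classifier described in Section IV-C can be read in more than one way; the sketch below shows one possible interpretation, not the authors' implementation. It assumes the underlying model outputs a probability that a pair is breakable and labels a pair "unbreakable" only when that probability is below the threshold, so that a threshold of 0.5 reproduces the ordinary decision and lower thresholds keep only the more confident unbreakable predictions. The scores are invented for illustration.

```python
def classify(p_breakable, threshold=0.5):
    """Assumed rule: 'unbreakable' only if P(breakable) is below the threshold."""
    return "unbreakable" if p_breakable < threshold else "breakable"


def precision_recall_unbreakable(scored_pairs, threshold):
    """Precision and recall for the 'unbreakable' class at a given threshold.

    scored_pairs is a list of (p_breakable, actual_label) tuples.
    """
    predicted = [actual for p, actual in scored_pairs
                 if classify(p, threshold) == "unbreakable"]
    tp = sum(1 for actual in predicted if actual == "unbreakable")
    total = sum(1 for _, actual in scored_pairs if actual == "unbreakable")
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / total if total else 0.0
    return precision, recall


# Invented scores: (P(breakable) from the base model, actual label).
scored_pairs = [
    (0.05, "unbreakable"),
    (0.20, "unbreakable"),
    (0.35, "breakable"),
    (0.45, "unbreakable"),
    (0.60, "breakable"),
    (0.90, "breakable"),
]

for t in (0.5, 0.3):
    print(t, precision_recall_unbreakable(scored_pairs, t))
```

On these toy scores, lowering the threshold from 0.5 to 0.3 raises the precision on unbreakable pairs from 0.75 to 1.0 while the recall drops from 1.0 to 2/3, which is the trade-off the text describes.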
TABLE VI. PERFORMANCE OF BREAKABILITY PREDICTION MODELS

Software         Model            Precision  Recall  F-measure  Roc Area  Accuracy
Mozilla Firefox  Zero-R           0.000      0.000   0.000      0.500     66.49 %
                 NaiveBayes       0.636      0.628   0.632      0.810     75.48 %
                 BayesNet         0.629      0.729   0.676      0.831     76.54 %
                 kNNs             0.688      0.618   0.651      0.740     77.81 %
                 Random Forests   0.785      0.591   0.675      0.885     80.87 %
Netbeans         Zero-R           0.000      0.000   0.000      0.500     74.21 %
                 NaiveBayes       0.499      0.521   0.510      0.753     74.18 %
                 BayesNet         0.488      0.583   0.531      0.764     73.45 %
                 kNNs             0.588      0.535   0.560      0.709     78.32 %
                 Random Forests   0.717      0.367   0.486      0.882     79.95 %
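The Zero-R rows of TABLE VI follow directly from the metric definitions and the pair counts in TABLE I. The sketch below is a small illustration with an assumed helper name; it treats "breakable" as the positive class and evaluates Zero-R on the whole data set rather than per cross-validation fold, which gives the same accuracy when the folds preserve the class distribution.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy


# Zero-R predicts every pair as the majority class, "unbreakable":
# all breakable pairs become false negatives, all others true negatives.
zero_r_firefox = metrics(tp=0, fp=0, fn=3733, tn=11140 - 3733)
zero_r_netbeans = metrics(tp=0, fp=0, fn=2001, tn=7760 - 2001)

print(round(zero_r_firefox[3] * 100, 2))   # 66.49
print(round(zero_r_netbeans[3] * 100, 2))  # 74.21
```

Precision, recall and F-measure are all zero because no pair is ever classified as breakable, while the accuracy equals the share of unbreakable pairs.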

B. RQ2. Can we build effective models to predict if a blocking pair can be broken?

To validate our models and to reduce the bias caused by training set selection, we performed 10-fold cross-validation 10 times and used the average performance. For each validation, our data set is divided into 10 parts, and each part conforms to the distribution of the original data set. For each of the ten folds, one part is selected as the testing set and the other parts are used as the training set. The model built on the training set is validated on the testing set, and the evaluation metrics are recorded. The average performance of each model is shown in TABLE VI.

The Zero-R algorithm is the baseline model in our study. It simply predicts all the instances as the majority class. The proportion of breakable pairs is less than 50% in each subject, and therefore Zero-R marks all the instances as unbreakable. Thus, the precision, recall, and F-measure are all zeros.

The Naive Bayes model and the Bayes Network model are based on similar mechanisms. The performance of the two models is similar in precision, Roc Area, and accuracy. The recall of the Bayes Network model is higher than that of all the other models in this study, including Naive Bayes. The precision of the two Bayesian models is lower than that of kNNs and Random Forests. In general, the two Bayesian models perform well for the recall but poorly for the precision when predicting breakable pairs.

For our data set, the kNNs model is balanced between the precision and the recall. As for the precision, the kNNs model outperforms the two Bayesian models but underperforms Random Forests. Considering the recall, the kNNs model performs better than Random Forests but worse than Bayes Networks. Whether the concern is the precision or the recall, the kNNs model is not a bad choice.

Among the four classifiers we chose, the Random Forests model performs best when considering the precision, Roc area, and accuracy, but worst when considering the recall. Regarding the precision and accuracy, the Random Forests model is no doubt the best choice. For downstream developers, the performance on precision is more important than recall for time saving.

Generally, there is a trade-off between precision and recall. For different purposes, the precision and the recall are of different importance. If the downstream developers would like to avoid spending time on the unbreakable bugs before the upstream bugs are fixed, then a model of high precision is preferred. In our study, the Random Forests model is the most satisfactory one in this case. The model of high recall is preferred when the downstream developers would like to break the

Fig. 2. The performance of unbreakable prediction

TABLE VII. PERFORMANCE OF UNBREAKABLE PREDICTION

                Mozilla Firefox      Netbeans
Threshold       Precision  Recall    Precision  Recall
0.50 (default)  0.852      0.784     0.845      0.787
0.40            0.860      0.760     0.852      0.759
0.30            0.869      0.738     0.858      0.730
0.20            0.878      0.707     0.867      0.692
0.10            0.890      0.658     0.880      0.626
0.05            0.903      0.612     0.891      0.555
0.01            0.927      0.506     0.924      0.410
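One way to read the thresholds in TABLE VII is sketched below. It assumes that the classifier outputs an estimated probability that a pair is breakable and that a pair is flagged as unbreakable only when this probability falls below the threshold; this interpretation, the function names, and the toy probabilities are assumptions and are not stated in the text.

```python
def predict_unbreakable(p_breakable, threshold=0.5):
    """Flag a pair as unbreakable when its breakable probability is below threshold."""
    return p_breakable < threshold


def precision_recall(pairs, threshold):
    """Precision and recall of the unbreakable class at the given threshold."""
    tp = fp = fn = 0
    for p_breakable, is_unbreakable in pairs:
        flagged = predict_unbreakable(p_breakable, threshold)
        if flagged and is_unbreakable:
            tp += 1
        elif flagged:
            fp += 1
        elif is_unbreakable:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical (estimated breakable probability, truly unbreakable?) pairs.
PAIRS = [
    (0.005, True), (0.02, True), (0.04, True), (0.08, True),
    (0.15, False), (0.25, True), (0.35, False), (0.45, True),
    (0.60, False), (0.80, False), (0.90, True),
]

for threshold in (0.5, 0.3, 0.1, 0.01):
    precision, recall = precision_recall(PAIRS, threshold)
    print(f"threshold={threshold:<5} precision={precision:.3f} recall={recall:.3f}")
```

On this toy data, lowering the threshold flags fewer pairs, so precision does not decrease while recall falls, which mirrors the direction of the trade-off reported in the table.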

blocking pairs as much as possible. The Bayes Network model is the best fit for this request in our study.

For the unbreakable prediction, we selected seven threshold levels for our classifier to compare the variation in performance. The results are shown in TABLE VII. As the threshold decreases, the precision improves gradually. With the threshold value of 0.01, the precision is over 92% in both subjects. This demonstrates that our model provides a very accurate suggestion of which blocking pairs are unbreakable. With our model, the upstream developers are able to efficiently find the bugs with huge impact on the downstream.

The two subjects differ in recall because of their different proportions of unbreakable pairs. Fig. 2 shows the trends of the precision and the recall on decreasing thresholds. The increase of precision reduces the recall at a larger rate in Netbeans than in Mozilla Firefox. For different ecosystems, developers are supposed to use different thresholds to balance the precision and recall to better leverage our model.

Fig. 3. The downstream workaround for bug#528440 in Mozilla Firefox

VI. DISCUSSION

In this section, we discuss some findings about how developers break the blocking bug pairs.

This study focused on the prediction of breakability, and gave the answer to the question "whether a blocking pair is breakable". Furthermore, we are also interested in how to break the blocking pairs. To investigate these questions, three authors manually inspected the bug reports and the code of fixes for broken blocking pairs.

The number of broken blocking pairs is too large to check manually. There are 5734 broken pairs in our data set, which makes it difficult for the authors to check them all. For a qualitative analysis, we chose 5% of all the broken pairs by random sampling. For some broken pairs, the code of fixes could not be obtained due to the attachment invalidation or no reference to the source code in bug reports. These broken pairs were invalid for our manual check and were discarded. For the valid pairs, the three authors first reviewed the bug reports and code individually according to the same criteria. During this procedure, they summarized and recorded the patterns of breakability. Then they came together to discuss the patterns and draw conclusions. By manually analyzing the bug reports, we identified three kinds of practices of the developers.

First, if there is a workaround for the downstream bug, the pair is obviously breakable. A workaround is a temporary patch by the downstream and was investigated in [5]. That study showed that the downstream developers usually offer a workaround to break the blocking pair. The downstream workaround of bug#528440 in Mozilla Firefox is shown in Fig. 3. The function “sss_getMostRecentBrowserWindow” should return the reference of the window which was browsed most recently. The upstream service “nsIWindowMediator” returned the reference of a window before its destruction without checking the status of the window. It crashed when the target window was closed but not destroyed yet. The downstream developers fixed this bug by adding special conditional checks around the return values from the upstream side.

Second, the downstream developers may use the semi-finished patch from the upstream side. The upstream developers may submit multiple versions of patches before the bug is finally fixed. Some early patches are not able to work perfectly, but are enough for the downstream projects. Usually, the downstream developers should be familiar with the upstream code in order to use the upstream semi-finished patches properly. During our inspection, we found that the downstream participants often contributed to the upstream projects as well in this case. For example, the downstream bug#110302 was blocked by two upstream bugs, namely bug#112464 and bug#114016, forming two blocking pairs of the same downstream bug but different upstream bugs. The three bugs are all about the IDE performance on processing large files. The downstream bug is that the IDE was unresponsive when loading large html files. In the upstream, bug#112464 is about the editor redrawing function and bug#114016 is about processing long lines in navigator. Bug#114016 was fixed ahead of the downstream bug. With the

patch of bug#114016, the impact on the downstream side was reduced and the comment from the downstream side is as follows:

“After Vita's fix of issue #114016 the impression of the file editing is much better. Now I can scroll like I want without a high CPU load.”

In this case, the blocking pair breaks because the downstream benefits from the fix of another upstream bug. Although the blocking pair consisting of bug#114016 and bug#110302 was not broken, it helped the other blocking pair consisting of bug#112464 and bug#110302 to break.

Third, the downstream developers may rewrite the buggy upstream function and integrate the redesigned function into the downstream code. Hence, the dependency is changed. In our observation, this usually happened when the buggy upstream function is loosely-coupled with the upstream project. For example in Netbeans, the upstream bug#95534 blocks two downstream bugs, namely bug#95531 and bug#95241. The two downstream bugs were both caused by incomplete metadata model support from the upstream side. The downstream developers released a series of functions that “covered most use cases” and fixed the two bugs. The comment of bug#95531 said:

“Fixed temporarly.
Covered most use cases when ejb-jar.xml is missu(i)ng.”

And the comment of bug#95241 is:

“Fixed temporarly.
JAX WS service is generated when webservices.xml is missing.”

Thus, the two blocking pairs were broken by integrating a part of the upstream functions into the downstream projects themselves. This integrating process reduced the inter-dependency between the upstream models and the downstream models, and finally broke the blocking pairs.

As mentioned above, the practices for breaking blocking pairs were qualitatively analyzed by manual inspection. Due to the large size of our data set, we were not able to inspect every instance. In future work, we plan to make an in-depth investigation into this issue by leveraging natural language processing and interviews/surveys with the developers.

VII. THREATS TO VALIDITY

Threats to internal validity. We identified breakable pairs by checking the bug activities and comparing the fixing time of the upstream and downstream bugs. This approach is correct when the fixes were accurately recorded in bug activities. Besides, in some pairs, the downstream bug was fixed only a short time before the upstream one. In this case, we used a threshold for the shortest time by which the downstream fix must precede the upstream fix in a breakable pair. In this study, the threshold is one day, which means the downstream bug was fixed at least 24 hours earlier than the upstream one in a breakable pair. We manually checked a part of the instances by random sampling and did not find any pairs misjudged by our approach.

Threats to external validity. This study investigated 18900 blocking pairs from 132584 bug reports in two ecosystems. The two subjects differ in size, complexity, developers, application field, and so on. All the bug data were generated during the development process. However, our data set may not be representative of all kinds of ecosystems. Further study is needed to confirm our findings about the breakability of blocking pairs for other ecosystems.

VIII. CONCLUSION AND FUTURE WORK

In this study, we studied the characteristics of breakable blocking pairs, and proposed models to predict the breakability of blocking pairs in two large-scale software ecosystems, Mozilla Firefox and Netbeans. By performing information gain analysis on the bug features, we found that the developer-related features are most correlated with the breakability. In our breakable prediction models, the Random Forests algorithm achieved a precision of 78.5% and 71.7% for Mozilla Firefox and Netbeans respectively. Meanwhile, since it is important for the upstream developers to be reminded of the unbreakable blocking pairs efficiently, we proposed a classifier with variable thresholds to identify unbreakable pairs and achieved a precision over 92%.

This study is a preliminary investigation on the breakability of blocking pairs. There is still room for improvement in the performance of our prediction models and we hope that this work could inspire other researchers to find more efficient models to predict the breakable pairs. In future work, we plan to extend our work to more software ecosystems. We also plan to perform a more in-depth analysis of developers' practices for breaking blocking pairs and further develop breakable plan generation tools to automatically suggest how to break a blocking pair.

REFERENCES

[1] H. Valdivia Garcia, E. Shihab, Characterizing and predicting blocking bugs in open source projects, Proc. 11th Work. Conf. Min. Softw. Repos. - MSR 2014. (2014) 72–81. doi:10.1145/2597073.2597099.
[2] X. Xia, D. Lo, E. Shihab, X. Wang, X. Yang, ELBlocker: Predicting blocking bugs with ensemble imbalance learning, Inf. Softw. Technol. 61 (2015) 93–106. doi:10.1016/j.infsof.2014.12.006.
[3] L.D. Panjer, Predicting Eclipse Bug Lifetimes, in: Fourth Int. Work. Min. Softw. Repos. (MSR’07: ICSE Work. 2007), IEEE, 2007: pp. 29–29. doi:10.1109/MSR.2007.25.
[4] A. Lamkanfi, S. Demeyer, Q.D. Soetens, T. Verdonck, Comparing Mining Algorithms for Predicting the Severity of a Reported Bug, in: 2011 15th Eur. Conf. Softw. Maint. Reengineering, IEEE, 2011: pp. 249–258. doi:10.1109/CSMR.2011.31.
[5] W. Ma, L. Chen, X. Zhang, Y. Zhou, B. Xu, How do developers fix cross-project correlated bugs? A case study on the GitHub scientific python ecosystem, Proc. - 2017 IEEE/ACM 39th Int. Conf. Softw. Eng. ICSE 2017. (2017) 381–392. doi:10.1109/ICSE.2017.42.
[6] G. Canfora, L. Cerulo, M. Cimitile, M. Di Penta, Social interactions around cross-system bug fixings, in: Proceeding 8th Work. Conf. Min. Softw. Repos. - MSR ’11, ACM Press, New York, New York, USA, 2011: p. 143. doi:10.1145/1985441.1985463.
[7] G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, S. Panichella, How the Apache community upgrades dependencies: an evolutionary study, Empir. Softw. Eng. 20 (2015) 1275–1317. doi:10.1007/s10664-014-9325-9.
[8] H. Ding, W. Ma, L. Chen, Y. Zhou, B. Xu, An empirical study on downstream workarounds for cross-project bugs, 24th Asia-Pacific Softw. Eng. Conf. (2017).
[9] L. Marks, Y. Zou, A.E. Hassan, Studying the fix-time for bugs in large open source projects, in: Proc. 7th Int. Conf. Predict. Model. Softw. Eng. - Promise ’11, ACM Press, New York, New York, USA, 2011: pp. 1–8. doi:10.1145/2020390.2020401.
[10] C. Weiss, R. Premraj, T. Zimmermann, A. Zeller, How Long Will It Take to Fix This Bug?, in: Fourth Int. Work. Min. Softw. Repos. (MSR’07: ICSE Work. 2007), IEEE, 2007: pp. 1–1. doi:10.1109/MSR.2007.13.
[11] E. Giger, M. Pinzger, H. Gall, Predicting the fix time of bugs, in: Proc. 2nd Int. Work. Recomm. Syst. Softw. Eng. - RSSE ’10, ACM Press, New York, New York, USA, 2010: pp. 52–56. doi:10.1145/1808920.1808933.
[12] M. Sharma, P. Bedi, K.K. Chaturvedi, V.B. Singh, Predicting the priority of a reported bug using machine learning techniques and cross project validation, in: 2012 12th Int. Conf. Intell. Syst. Des. Appl., IEEE, 2012: pp. 539–545. doi:10.1109/ISDA.2012.6416595.
[13] A. Lamkanfi, S. Demeyer, E. Giger, B. Goethals, Predicting the severity of a reported bug, in: 2010 7th IEEE Work. Conf. Min. Softw. Repos. (MSR 2010), IEEE, 2010: pp. 1–10. doi:10.1109/MSR.2010.5463284.
[14] D. Di Nucci, F. Palomba, G. De Rosa, G. Bavota, R. Oliveto, A. De Lucia, A Developer Centered Bug Prediction Model, IEEE Trans. Softw. Eng. (2017) 1–1. doi:10.1109/TSE.2017.2659747.
[15] M. Tufano, G. Bavota, D. Poshyvanyk, M. Di Penta, R. Oliveto, A. De Lucia, An empirical study on developer-related factors characterizing fix-inducing commits, J. Softw. Evol. Process. 29 (2017) e1797. doi:10.1002/smr.1797.
[16] X. Wang, L. Zhang, T. Xie, J. Anvik, J. Sun, An approach to detecting duplicate bug reports using natural language and execution information, in: Proc. 13th Int. Conf. Softw. Eng. - ICSE ’08, ACM Press, New York, New York, USA, 2008: p. 461. doi:10.1145/1368088.1368151.
[17] N. Jalbert, W. Weimer, Automated duplicate detection for bug tracking systems, in: 2008 IEEE Int. Conf. Dependable Syst. Networks With FTCS DCC, IEEE, 2008: pp. 52–61. doi:10.1109/DSN.2008.4630070.
[18] E. Shihab, A. Ihara, Y. Kamei, W.M. Ibrahim, M. Ohira, B. Adams, A.E. Hassan, K. Matsumoto, Studying re-opened bugs in open source software, Empir. Softw. Eng. 18 (2013) 1005–1042. doi:10.1007/s10664-012-9228-6.
[19] T. Zimmermann, N. Nagappan, P.J. Guo, B. Murphy, Characterizing and predicting which bugs get reopened, in: 2012 34th Int. Conf. Softw. Eng., IEEE, 2012: pp. 1074–1083. doi:10.1109/ICSE.2012.6227112.
[20] S. Zaman, B. Adams, A.E. Hassan, A qualitative study on performance bugs, in: 2012 9th IEEE Work. Conf. Min. Softw. Repos., IEEE, 2012: pp. 199–208. doi:10.1109/MSR.2012.6224281.
