
Samiksha: Mining Issue Tracking System for Contribution and Performance Assessment


Ayushi Rastogi
IIIT-D
New Delhi, India
ayushir@iiitd.ac.in

Arpit Gupta
DTU
New Delhi, India
arpitg1991@gmail.com

Ashish Sureka
IIIT-D
New Delhi, India
ashish@iiitd.ac.in

ABSTRACT

Individual contribution and performance assessment is a standard practice conducted in organizations to measure the value added by various contributors. Accurate measurement of individual contributions based on pre-defined objectives, roles and Key Performance Indicators (KPIs) is a challenging task. In this paper, we propose a contribution and performance assessment framework (called Samiksha) in the context of Software Maintenance. The focus of the study presented in this paper is Software Maintenance activities (such as bug fixing and feature enhancement) performed by bug reporters, bug triagers, bug fixers, software developers, quality assurance and project managers, facilitated by an Issue Tracking System.
We present the results of a survey that we conducted to understand practitioners' perspective and experience (specifically on the topic of contribution assessment for software maintenance professionals). We propose several performance metrics covering different aspects (such as the number of bugs fixed weighted by priority and the quality of bugs reported) and various roles (such as bug reporter and bug fixer). We conduct a series of experiments on Google Chromium project data (extracted from the issue tracker for the Google Chromium project) and present results demonstrating the effectiveness of our proposed framework.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics - complexity measures, performance measures; D.2.9 [Software Engineering]: Management - Productivity, Programming teams

General Terms

Algorithms, Experimentation, Measurement

Keywords

Mining Software Repositories, Software Maintenance, Developer Contribution Assessment, Issue Tracking System

1. RESEARCH MOTIVATION AND AIM

Contribution and performance assessment of employees is an important and routine activity performed within organizations to measure value addition based on pre-defined Key Performance Indicators (KPIs) [9] [11]. Measuring contribution and impact based on the job role and responsibilities of an employee is required for career advancement decisions, identification of strengths and areas of improvement of the employee, rewards and recognition, organizational efficiency and output improvement, and making informed business decisions [9] [11].
Software Maintenance, and particularly the defect-fixing process facilitated by an Issue Tracking System (the focus of the work presented in this paper), is a collaborative activity involving several roles such as bug reporter, bug triager, bug fixer, project or QA (quality assurance) manager and developers. Contribution and performance assessment of software maintenance professionals, as well as of defect-fixing performance, is an area that has recently attracted several researchers' attention [1] [5] [8] [10] [12] [13] [4]. The work presented in this paper is motivated by the need to develop a novel framework and metrics to accurately and objectively measure the contribution of software maintenance professionals by mining data archived in Issue Tracking Systems.
The research aim of the work presented in this paper is the following:

1. To investigate models and metrics to accurately, reliably and objectively measure the contribution and performance of software maintenance professionals (bug reporters, issue triagers, bug fixers, software developers and contributors in issue resolution, quality assurance and project managers) involved in the defect-fixing process.

2. To investigate role-based contribution metrics and Key Performance Indicators (KPIs) which can be easily computed by mining data (data-driven and evidence-based) in an Issue Tracking System.


3. To validate the proposed metrics with software maintenance professionals in Industry (experienced practitioners) and apply the proposed metrics on a real-world dataset (empirical analysis and experiments), demonstrating the effectiveness of the approach using a case-study.



2. RELATED WORK AND RESEARCH CONTRIBUTIONS

In this section, we discuss work closely related to the study presented in this paper and present our novel research contributions in the context of existing work. We conducted a literature survey in the area of mining software repositories for contribution and performance assessment (specifically in the context of Software Maintenance) and present our review of the relevant research.
The research most closely related to the study presented in this paper is the work by Nagwani et al. However, there are several differences between their work and this paper. Nagwani et al. propose a team-member ranking technique for software defect archives [12]. Their technique consists of generating a ranked list of developers based on individual contributions such as the quantity and quality of bugs fixed, comment activity in discussion forums and bug reporting productivity [12].
Gousios et al. present an approach for measuring developers' involvement and activity by mining software repositories such as the source code repository, document archives, mailing lists, discussion forums, the defect tracking system, Wiki and IRC [5]. They present a list of pre-defined developer actions (easily measurable, with weights assigned based on importance), such as reporting a bug, starting a new wiki page and committing code to the source-code repository, which are used as variables in a contribution factor function [5].
Ahsan et al. present an application of mining developer effort data from a bug-fix activity repository to build effort estimation models for the purpose of planning (such as scheduling a project release date) and cost estimation [1]. Their approach consists of measuring the time elapsed between the assigned and resolved status of a bug report and multiplying this time with weights (incorporating defect severity-level information) to compute a developer contribution measure [1].
Kidane et al. propose two productivity indices (for online communities of developers and users of open source projects such as Eclipse) in their study on correlating temporal communication patterns with performance and creativity: a creativity index and a performance index [10]. They define the creativity index as the ratio of the number of enhancements (changes in the quantity of software) integrated in a pre-defined time-period to the number of bugs resolved in the same time-period [10]. The performance index is defined as the ratio of the number of bugs resolved in a pre-defined time-period to the number of bugs reported in the same time-period.
Kanij et al. mention that there are no well-established and widely adopted performance metrics for software testers and motivate the need for a multi-faceted evaluation method for software testers [8]. They conduct a survey of Industry professionals and list five factors which are important for measuring the performance of software testers: number of bugs found, severity of bugs, quality of bug reports, ability of bug advocacy, and rigorousness of test planning and execution [8].
Rigby et al. present an approach to assess the personality of developers by mining mailing list archives [13]. They conduct experiments (using a psychometrically-based word count text analysis tool called Linguistic Inquiry and Word Count) on the Apache HTTPD server's mailing list to assess personality traits of Apache developers such as diligence, attitude, agreeableness, openness, neuroticism and extroversion [13].
Fernandes et al. mention that performance evaluation of teams is challenging in software development projects [4]. They propose an analytical model (a Stochastic Automata Networks model) and conduct a practical case-study in the context of a distributed software development setting [4].
In the context of existing literature, the work presented in this paper makes the following novel contributions:
1. A framework (called Samiksha, a Hindi word which means review and critique) for contribution and performance assessment of Software Maintenance Professionals. We propose 11 metrics (Key Performance Indicators in the context of the activities and duties of the appraisee under evaluation) for four different roles. The novel contribution of this work is to assess a developer in terms of the various roles played by that developer. The metrics (data-driven and evidence-based) are computed by mining data in an Issue Tracking System. While there are some studies on the topic of contribution assessment for Software Maintenance Professionals, we believe (based on our literature survey) that the area is relatively unexplored, and in this paper we present a fresh perspective on the problem.
2. A survey conducted with experienced Software Maintenance Professionals on the topic of contribution and performance assessment for bug reporter, bug owner, bug triager and contributor. We believe that there is a dearth of academic studies surveying the current practices and metrics used in the Industry for contribution assessment of Software Maintenance Professionals. To the best of our knowledge, the work presented in this paper is the first study to present the framework, a survey of practitioners and an application of the proposed framework on a real-world, publicly available dataset from a popular open-source project.
3. An empirical analysis of the proposed approach on the Google Chromium Issue Tracker dataset. We conduct a series of experiments on a real-world dataset, present the results of our experiments in the form of visualizations (from the perspective of an appraiser) and present our insights.

3. RESEARCH METHODOLOGY AND FRAMEWORK

The research work presented in this paper is motivated by the need to investigate solutions to common problems encountered by Software Maintenance Professionals. We believe that inputs from practitioners are needed to inform our research. Hence we conducted a survey and discussion with two experienced industry professionals (at the project manager level, with more than a decade of experience in software development and maintenance) and present the results of the survey as well as our insights (refer to Section 4). We present 11 performance and contribution metrics for four roles: bug reporter, bug triager, bug owner and collaborator. We define each metric and describe its rationale, interpretation and formula (Section 5). The metrics are discussed at an abstract level and are not specific to a project or Issue Tracker.



Table 1: Defining the metadata of the survey

Attribute | Definition | Scale
PIK | Perceived Importance of Key Performance Indicators | 5: Highly influences (important); 1: Does not influence (least important)
CopKC | Current organizational practices to capture KPIs for Contribution & Performance Assessment | 1: Captured accurately (formula based, i.e., objective measure); 2: Captured accurately (based on subjective information, self-reporting and perception); 3: Unable to capture accurately; 4: Not relevant / Not considered

Table 2: Survey results for the role of Bug Reporter (each question rated by Company A and Company B on PIK and CopKC)

Number of bugs reported
Number of Duplicate bugs reported
Number of non-Duplicate bugs reported
Number of Invalid bugs reported
Number of high Priority bugs reported
Number of high Severity bugs reported
Number of bugs reported that are later reopened
Number of hours worked
Quality of reported bugs
Followed bug reporting guidelines
Correctly assigned bug area (like WebKit, BrowserUI etc.)
Correctly assigned bug type (like Regression, Performance etc.)
Reported bugs across multiple components (Diversity of Experience)
Reported bugs belonging to a specific component (Specialization)
Participation level delivered (responded to queries and clarifications) after reporting bug

Table 3: Survey results for the role of Bug Owner (each question rated by Company A and Company B on PIK and CopKC)

Number of bugs assigned or owned
Number of bugs successfully resolved (from the set of bugs owned)
Number of high Priority bugs owned
Number of high Severity bugs owned
Number of resolved bugs that get reopened
Number of hours worked
Participated in (facilitated) discussion through comments on Issue Tracker
Owned bugs across multiple components (Diversity of Experience)
Owned bugs belonging to a specific component (Specialization)
Average time taken to resolve bug
Response time to a directly addressed comment

Table 4: Survey results for the role of Bug Collaborator (each question rated by Company A and Company B on PIK and CopKC)

Participation level delivered (responded to queries and clarifications) through online threaded discussions
Response time to a directly addressed comment on Issue Tracker
Collaborated in bugs across multiple components (Diversity of Experience)
Collaborated in bugs belonging to a specific component (Specialization)
Average time taken to resolve bug
Number of times collaborated on high Priority bugs
Number of times collaborated on high Severity bugs
Number of hours worked

Table 5: Survey results for the role of Bug Triager (each question rated by Company A and Company B on PIK and CopKC)

Number of bugs Triaged
Number of hours worked
Identified Duplicate bug reports
Identified Invalid bug reports
Identified exact bug area (like WebKit, BrowserUI etc.)
Assigned best developer considering skills and workload
Correctly assigned owner (fixer)
Correctly assigned bug type (like Regression, Performance etc.)
Correctly assigned Priority/Severity
Participated in (facilitated) discussion through comments on Issue Tracker
Response time to a directly addressed comment on Issue Tracker

We conduct a case-study on the Google Chromium project (a widely used and popular open-source browser). The Issue Tracker for the Google Chromium project is publicly available and the data can be downloaded using the Issue Tracker API. We implement the proposed metrics and apply the approach on the Google Chromium dataset. We present the results of our experiments in the form of visualizations and graphs and interpret the results from the perspective of an appraiser (Section 7).

4. SURVEY OF SOFTWARE MAINTENANCE PROFESSIONALS

We conducted a survey to understand the relative importance of various Key Performance Indicators (KPIs) for the four roles: bug reporter, bug owner, bug triager and collaborator. We prepared a questionnaire consisting of four parts (refer to Tables 2, 3, 4, 5). Each part corresponds to one of the four roles. The questionnaire for each part consists of a question and two response fields. The question mentions a specific activity for a particular role. For example, for the role of bug owner the activity is: number of high priority bugs fixed, and for the role of bug triager the activity is: number of times the triager is able to correctly assign the bug report to the expert (the bug fixer who is the best person to resolve the given bug report).
One of the response fields, Perceived Importance of Key Performance Indicators (PIK), denotes the importance of the activity (for the given role) on a scale of 1 to 5 (5 being highest and 1 being lowest) for contribution assessment. The other response field, Current organizational practices to capture KPIs for Contribution and Performance Assessment (CopKC), denotes the extent to which these performance indicators are captured in practice for performance evaluation. We broadly divide the notion of the ability to capture KPIs for Contribution and Performance Assessment into two heads: non-relevant and relevant. A KPI is either considered non-relevant (for evaluation of developers) and hence not measured, or it is considered important. These relevant KPIs are either not captured (the organization is unable to measure them) or the organization is able to capture them. The KPIs that are captured can further be classified, based on the approach used to measure them, as objective or subjective. Objective measures are captured using statistics (based on formulas), while subjective measures are evaluations based on a perceived notion of contribution. This hierarchical classification completely defines the notion of measurement in an organization (refer to Table 1, which specifies the metadata for the survey).
We requested two experienced software maintenance professionals to provide their inputs. Both professionals have more than 10 years of experience in the software industry and are currently at the project manager level. One of them works in a large, global IT (Information Technology) services company (more than 100 thousand employees) and the other works in a small (about 100 employees) offshore product development services company. Tables 2, 3, 4 and 5 show the results of the survey. Some of the cells in the tables have the value "" (blank) or "NULL". A blank value means that, at the time of the survey, the respondent was not asked this question, and "NULL" implies that the survey respondent did not answer this question (left the field blank).
One of the interpretations from the survey results is the gap between the perceived importance of certain performance indicators and the extent to which such indicators are objectively and accurately measured in practice. We believe that the specified performance indicators (which are considered important in the context of contribution assessment) are not measured rigorously due to lack of tool support. The results of the survey (refer to Tables 2, 3, 4 and 5) are evidence supporting the need to develop a contribution and performance assessment framework for Software Maintenance Professionals.
It is interesting to note that the two experienced survey respondents (from different-sized organizations) gave higher or the same value to some KPIs, while for others the results varied considerably. Based on the survey results, we infer that the perceived value of an attribute varies across organizations. Therefore, the proposed metrics should account for the perceived value of the attribute for the organization while measuring the contribution of a developer in a given role.

5. CONTRIBUTION EVALUATION METRICS

In this Section, we describe 11 performance and contribution assessment metrics for four different roles. These 11 proposed metrics are not an exhaustive list but represent a few important metrics valued by organizations. The following introductory concepts are common to several of the metric definitions; the Google Chromium project describes its bug lifecycle and reporting guidelines at http://www.chromium.org/for-testers/bug-reporting-guidelines.
A closed bug can have status values such as: Fixed, Verified, Duplicate, WontFix, ExternalDependency, FixUnreleased, Invalid and Others (user-defined Status values as permitted by the Google Chromium Bug Reporting Guidelines). An issue can be assigned a priority value from 0 to 3 (0 is most urgent; 3 is least urgent). Google Chromium issue reports have a field called Area (see http://www.chromium.org/for-testers/bug-reporting-guidelines/chromium-bug-labels). Area represents the product area to which an issue belongs. An issue can belong to multiple areas (for the Google Chromium project) such as BuildTools, ChromeFrame, Compat-System, Compat-Web, Internals, Feature and WebKit. The product area BuildTools maps to Gyp, gclient, gcl, buildbots, trybots and test framework issues, and the product area WebKit maps to HTML, CSS, Javascript and HTML5 features.

5.1 Bug Owner Metrics

5.1.1 Priority Weighted Fixed Issues (PWFI)

We propose a contribution metric (for the bug owner) which incorporates the number of bugs fixed as well as the priority of each fixed bug. For each unique bug owner in the dataset, we count only Fixed or Verified issues and not Duplicate, WontFix, ExternalDependency, FixUnreleased and Invalid issues (because an Invalid bug or a WontFix bug is not a contribution by the bug owner to the project).
\[ PWFI(d) = \sum_{i=0}^{|P|} W_{P_i} \, \frac{N_{P_i}^{d}}{N_{P_i}} \tag{1} \]

where W_{P_i} is the weight (a tuning parameter which is an indicator of importance, higher for urgent bugs and lowest for the least urgent bugs), or multiplication factor, incorporating issue priority information. Due to the weight W_{P_i} (the weights sum to one), a bug owner's contribution is not just a function of the absolute number of bugs fixed but also depends on the type of bugs fixed. N_{P_i} is the total number of Fixed or Verified bugs in the dataset with priority P_i, and N_{P_i}^{d} is the number of priority P_i bugs Fixed or Verified by developer d in the dataset. The value of PWFI(d) lies between 0 and 1 (the higher the value, the greater the contribution).
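To make the computation concrete, the following is a minimal sketch of PWFI over an in-memory list of issues; the dictionary keys ('owner', 'status', 'priority') and the particular weight values are illustrative assumptions, not the field names of any specific issue tracker export.

```python
from collections import Counter

FIXED_STATUSES = {"Fixed", "Verified"}

def pwfi(issues, owner, priority_weights):
    """Priority Weighted Fixed Issues (Equation 1) for one bug owner.

    issues: iterable of dicts with keys 'owner', 'status', 'priority' (assumed schema).
    priority_weights: weight W_Pi per priority level; the weights should sum to one.
    """
    fixed = [i for i in issues if i["status"] in FIXED_STATUSES]
    total_per_priority = Counter(i["priority"] for i in fixed)
    owner_per_priority = Counter(i["priority"] for i in fixed if i["owner"] == owner)
    return sum(
        priority_weights[p] * owner_per_priority[p] / total_per_priority[p]
        for p in total_per_priority
    )

# Example: four priority levels, more weight on the most urgent (priority 0) bugs.
weights = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
sample = [
    {"owner": "dev1", "status": "Fixed",    "priority": 0},
    {"owner": "dev2", "status": "Verified", "priority": 0},
    {"owner": "dev1", "status": "Fixed",    "priority": 2},
    {"owner": "dev1", "status": "WontFix",  "priority": 1},  # ignored: not a contribution
]
print(round(pwfi(sample, "dev1", weights), 2))  # 0.4*(1/2) + 0.2*(1/1) = 0.4
```

Computing this value for every owner and sorting the results would yield the kind of ranking discussed later in Section 7; the weights remain a tuning choice of the appraiser.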

5.1.2 Specialization and Breadth Index (SBI)

A developer can specialize and be an expert in a specific product area (component) or can have knowledge across several product areas. Let n be the total number of unique product areas and let d denote a developer. We represent B_n(d) as an index for breadth of expertise and 1 - B_n(d) as an index for specialization. p_i is the probability (derived from the historical dataset) of developer d working on component i. A lower value of B_n(d) indicates specialization and a higher value indicates breadth of expertise of a developer.

\[ B_n(d) = -\sum_{i=1}^{n} p_i \log_n(p_i) \tag{2} \]

The value of B_n(d) can vary from 0 (minimal) to 1 (maximal). If the probability of all Areas is the same (a uniform distribution) then B_n(d) is maximal, because the p_i value is the same for all n areas. At the other extreme, if there is only one Area associated with a particular developer d then B_n(d) is minimal (a value of 0). The interpretation is that when B_n(d) is low for a specific developer, a small set of Product Areas is associated with that developer.
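A small sketch of the breadth index follows. It assumes the developer's activity is available as a list of Area labels (one per handled issue) and takes n, the total number of product areas, as an optional argument; both the input format and the default choice of n are assumptions made for illustration.

```python
import math
from collections import Counter

def sbi_breadth(areas_worked_on, n=None):
    """Breadth index B_n(d) (Equation 2): normalized entropy of a developer's
    activity over product areas. 1 - B_n(d) is the specialization index.

    areas_worked_on: list of Area labels, one per issue handled by the developer.
    n: total number of unique product areas; defaults to the number of distinct
       areas observed for this developer (an assumption for the sketch).
    """
    counts = Counter(areas_worked_on)
    n = n or len(counts)
    if n <= 1 or len(counts) == 1:
        return 0.0  # activity confined to a single area means full specialization
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total, n) for c in counts.values())

print(sbi_breadth(["WebKit"] * 9 + ["Internals"]))         # ~0.47: mostly one area
print(sbi_breadth(["WebKit", "Internals", "BuildTools"]))  # 1.0: uniform over areas
```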

5.1.3 Deviation from Median Time to Repair (DMTTR)

Bugs reported in an Issue Tracking System can be categorized into four classes (as part of Software Maintenance activities): adaptive, perfective, corrective and preventive (see http://en.wikipedia.org/wiki/Software_maintenance). Depending on their class, these bugs have different urgency with which they must be resolved. For instance, a loophole reported in the security of a software product must be fixed prior to improving its Graphical User Interface (GUI).
According to the Google Chromium bug label guidelines, the priority of a bug is proportional to the urgency of the task: the more urgent a task is, the higher its priority. Urgency is also a measure of the time required to fix a bug. Therefore, high priority bugs should take less time to repair than low priority bugs (more urgent tasks take less time to fix). To support this assertion, we conducted an experiment on the Google Chromium Issue Tracking System dataset to calculate the median time required to repair bugs with the same priority. The results of the experiment support the assertion (as shown in Table 6).
One of the key responsibilities of the bug owner role is to fix bugs in time, where the time required may be influenced by various external factors. Bettenburg et al. pointed out that a poorly written bug report increases the effort and hence the time required to fix it [2] [3]. Irrespective of these factors, ensuring timely completion of the reported bugs is an indicator of positive contribution for the role of bug owner and vice-versa.
This metric is defined for Closed bugs with status Fixed or Verified. It calculates the deviation of the time required to fix a bug from the median time required to repair bugs with the same priority.

\[ DMTTR(o,p) = \frac{1}{T_{total}} \sum_{i=0}^{|P|} \sum_{b_i} w_i \cdot \begin{cases} |T_{b_i} - MTTR_{P_i}| & \text{if } (T_{b_i} - MTTR_{P_i}) < 0 \\ 0 & \text{otherwise} \end{cases} \tag{3} \]

\[ DMTTR(o,n) = \frac{1}{T_{total}} \sum_{i=0}^{|P|} \sum_{b_i} w_i \cdot \begin{cases} |T_{b_i} - MTTR_{P_i}| & \text{if } (T_{b_i} - MTTR_{P_i}) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{4} \]

The time required to fix a bug may be less than, greater than or equal to the median time required to fix same-priority bugs. Less or equal time required to fix a bug is a measure of positive contribution and vice-versa. We propose two variants of DMTTR(o), i.e., DMTTR(o,p) and DMTTR(o,n). For an owner o, DMTTR(o,p) measures positive contribution p (relatively less time to repair) and DMTTR(o,n) measures negative contribution n (relatively more time to repair). w_i is a normalized tuning parameter which weighs the deviation proportionally to priority, i.e., less time required to fix a high priority bug adds more value (to the contribution of a bug owner) than less time required to fix a low priority bug. Here b_i is a bug report with priority P_i. T_{b_i} is the total time required to fix bug b_i and is measured as the difference between the timestamp at which the bug is reported as Fixed and the first reported comment with status Started or Assigned in the Mailing List. MTTR_{P_i} is the median of the time required to fix bugs with priority P_i (the median is robust to outliers; see http://en.wikipedia.org/wiki/Outlier). T_{total} is a normalizing parameter and is calculated as the summation of the time taken by each participating bug.
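The sketch below computes the two DMTTR variants for one owner. It assumes fix times (in days) have already been extracted per bug into records with hypothetical keys 'owner', 'priority' and 'fix_days', and that the per-priority weights w_i are supplied.

```python
from statistics import median

def dmttr(bugs, owner, weights):
    """DMTTR(o, p) and DMTTR(o, n) (Equations 3 and 4): weighted deviation of an
    owner's fix times from the per-priority median, split into positive (faster
    than the median) and negative (slower than the median) contributions.

    bugs: Fixed/Verified bug records as dicts with keys 'owner', 'priority',
          'fix_days' (time from Assigned/Started to Fixed) -- an assumed schema.
    weights: normalized tuning parameter w_i per priority level.
    """
    priorities = {b["priority"] for b in bugs}
    mttr = {p: median(b["fix_days"] for b in bugs if b["priority"] == p)
            for p in priorities}
    t_total = sum(b["fix_days"] for b in bugs)
    positive = negative = 0.0
    for b in bugs:
        if b["owner"] != owner:
            continue
        deviation = b["fix_days"] - mttr[b["priority"]]
        if deviation < 0:
            positive += weights[b["priority"]] * abs(deviation)
        elif deviation > 0:
            negative += weights[b["priority"]] * abs(deviation)
    return positive / t_total, negative / t_total
```

An owner with a large positive and small negative component resolves bugs faster than the project median for their priority mix.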

5.2 Bug Reporter Metrics

In an Issue Tracking System, bugs are reported by multiple users (project members and non-project members). Each of these users may report bugs with a different amount and quality of information. This fact was highlighted by Schugerl et al., who observed that the quality of reported bugs varies significantly [14]; a good quality bug report optimizes the time required to fix it and vice-versa [2]. Chromium, in its bug reporting guidelines, specifies the factors that ensure the quality of reported bugs. However, evaluating bug reports based on these guidelines is a subjective process [14], and so is assessing the contribution of the bug reporter (a result of the survey mentioned in Section 4).

5.2.1 Status of Reported bug Index (SRI)

A Closed bug may have any one of the status values defined in Equation 6. The fraction of Closed bugs reported by a developer is a measure of contribution for the role of bug reporter. However, the status with which these bugs were Closed is a measure of the quality of that contribution. For instance, a bug Closed with status Fixed adds more value to the contribution of the bug reporter than a bug Closed as Invalid.

\[ SRI(r) = \sum_{s=1}^{|s|} w_s \, \frac{N_r}{N} \, \frac{C_s^r}{C_s} \tag{5} \]

Let N be the total number of bugs reported and N_r be the total number of bugs reported by reporter r. w_s is a normalized weight (tuning parameter) defined for each C_s, where C_s is the count of bugs reported with status s and C_s^r is the count of bugs (reported by reporter r) with status s. The status s is defined as follows:

\[ s \in \{\, Fixed,\ Verified,\ FixUnreleased,\ Invalid,\ WontFix,\ Duplicate,\ ExternalDependency,\ Other \,\} \tag{6} \]

The relative performance is a fraction calculated with respect to the baseline S_{Mode}, where S_{Mode} is the most frequently delivered performance.
5.2.2 Degree of Customer Eccentricity (DCE)

The Google Chromium Issue Tracking System defines the priority of a bug report in terms of the urgency of the task. A task is urgent if it affects business (see http://sqa.fyicenter.com/art/Bug_Triage_Meeting.html). Among other factors, business is affected (as measured by priority) if a bug in the software influences a large fraction of the user base. Hence, reporting a large fraction of high priority bugs is an indication of the degree of customer eccentricity.

\[ DCE(r) = \sum_{i=0}^{|P|} W_{P_i} \, \frac{N_{P_i}^{r}}{N_{P_i}} \tag{7} \]

where W_{P_i} is a normalized tuning parameter with value proportional to the priority, N_{P_i} is the total number of priority P_i bugs reported, and N_{P_i}^{r} is the total number of priority P_i bugs reported by bug reporter r. This metric is defined for Closed bugs with status Fixed or Verified.

5.3 Bug Triager Metrics

The Triage Best Practices written for the Google Chromium project (http://www.chromium.org/for-testers/bug-reporting-guidelines/triage-best-practices) state that identifying the correct project owner is one of the roles of a bug triager. Guo et al. found that determining ownership is one of the five primary reasons for bug report reassignment [6]. Jeong et al., in their study on Mozilla and Eclipse, reported that 37%-44% of the reported bugs are reassigned, with an increase in time-to-correction [7].
A bug triager is supposed to conduct a quality check on bug reports (to make sure a report is not spam and contains the required information, such as steps to reproduce) and to assign the bug report to a developer who is an available expert at resolving similar bugs. A bug triager should have good knowledge about the expertise, workload and availability of bug developers (fixers). An incorrectly assigned bug (an error by the bug triager) causes multiple reassignments, which in turn increase the repair time (productivity loss and delays). We believe that computing the number of correct and incorrect reassignments is thus an important key performance indicator for the role of a bug triager.
Google Chromium, in its Issue Tracking System, does not give any account of the triager who initially assigned the project owner, but subsequent reassignments (through labels defined in comments in the Mailing List) are available.

5.3.1 Reassignment Effort Index (REI)

Reassignments are not always harmful and can be the result of five primary reasons: identifying the root cause, identifying the correct project owner, a poor quality bug report, difficulty in identifying the proper fix, and workload balancing [6]. In an attempt to identify the correct project owner, triagers make a large number of reassignments. This large number of reassignments to correctly identify the project owner is a measure of the effort incurred by a bug triager.

\[ REI(t) = \frac{N_t}{N} \, \frac{CR_b^t}{CR_b} \tag{8} \]

Let N be the total number of bug reports which were at least once reassigned and N_t be the total number of bug reports which were at least once reassigned by triager t. This metric is defined for Closed bugs with status Fixed or Verified. For a given bug report b, CR_b is the count of the total number of project owner reassignments and CR_b^t is the count of project owner reassignments by triager t.

5.3.2 Accuracy of Reassignment Index (ARI)

Correctly identifying the project owner is a challenging task. It involves a large number of reassignments by multiple triagers. Each of these triagers makes a contribution through reassignments. However, the accuracy index (for the role of bug triager) is a measure of the number of times a triager correctly identifies the project owner for a bug with a large number of reassignments.

\[ ARI(t) = \frac{1}{n} \sum_{b} \frac{CR_b}{CR_{max}} \tag{9} \]

where n is the total number of bug reports which were accurately reassigned (last reassigned) by triager t. For each such bug report b, CR_b measures the count of the total number of reassignments before the bug was accurately (finally) reassigned by triager t. CR_{max} measures the maximum number of reassignments before the final one. The value of ARI(t) lies between 0 and 1 (the higher the value, the greater the contribution).
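A compact sketch of ARI under the definition above follows; the input (one reassignment count per bug that the triager finally reassigned) is assumed to have been extracted from the reassignment labels beforehand.

```python
def ari(reassignment_counts, cr_max):
    """Accuracy of Reassignment Index (Equation 9) for one triager.

    reassignment_counts: for each bug finally (accurately) reassigned by the
        triager, the number of reassignments the bug went through before that
        final reassignment (pre-extracted; assumed input format).
    cr_max: CR_max, the maximum number of reassignments observed before a
        final reassignment across the dataset.
    """
    n = len(reassignment_counts)
    if n == 0 or cr_max == 0:
        return 0.0
    return sum(cr / cr_max for cr in reassignment_counts) / n

print(ari([1, 2, 4], cr_max=4))  # (0.25 + 0.5 + 1.0) / 3 ~= 0.58
```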

5.4 Bug Collaborator Metrics

A group of people who collectively (as a team) contribute to fixing a bug are termed bug collaborators. According to the User Guide for the Project Hosting Issue Tracker (http://code.google.com/p/support/wiki/IssueTracker), collaborators provide additional information (in the form of comments in the Mailing List) to fix bugs. These comments contribute to the timely resolution of the reported bug.

5.4.1 Cumulative Comment Score (CCS)

Introduction to Issue Tracker, a user guide for the Project Hosting Issue Tracker, states that developers add multiple comments to the Mailing List. The number of comments entered in the Mailing List is a contribution of the developer and is used to measure performance.

\[ CCS(c) = \frac{1}{n} \sum_{b} K_b \, \frac{N_b^c}{N_b} \tag{10} \]

Here, N_b is the total number of comments in bug report b and N_b^c is the total number of comments entered by collaborator c in bug report b. n is the count of the total number of bug reports. K_b is a tuning parameter that accounts for the variable number of comments across bug reports. K_b is defined as the ratio of N_b to N_{Mode}, where N_{Mode} is the number of comments most frequently entered by collaborators (other than collaborator c) on bug b. This metric is defined for Closed bugs with status Fixed or Verified.
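The sketch below computes CCS from a flat comment log; the (bug_id, commenter) input format is an assumption, and N_Mode is taken as the most frequent per-person comment count among the other collaborators on the same bug (falling back to N_b when the collaborator is the only commenter).

```python
from collections import Counter
from statistics import multimode

def ccs(comment_log, collaborator):
    """Cumulative Comment Score (Equation 10) for one collaborator.

    comment_log: (bug_id, commenter) pairs for Closed/Fixed bugs (assumed format).
    """
    per_bug = {}
    for bug_id, commenter in comment_log:
        per_bug.setdefault(bug_id, Counter())[commenter] += 1
    n = len(per_bug)
    score = 0.0
    for counts in per_bug.values():
        n_b = sum(counts.values())                      # total comments on bug b
        n_bc = counts.get(collaborator, 0)              # comments by collaborator c
        others = [c for person, c in counts.items() if person != collaborator]
        n_mode = min(multimode(others)) if others else n_b
        k_b = n_b / n_mode                              # K_b = N_b / N_Mode
        score += k_b * (n_bc / n_b)
    return score / n if n else 0.0
```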

5.4.2 Potential Contributor Index (PCI)

A developer can specialize and be an expert in one or multiple areas (as captured by SBI(d)). Multiple developers (potential contributors) are added to the CC-list based on prior knowledge (historical data or past experience) of their domain of expertise.

\[ PCI(c) = \sum_{a} V_a \, \frac{N_a^{cc}}{N_a} \tag{11} \]

Given a bug report b, the PCI metric identifies the list of contributors who are valued to fix it by assigning them values in the range from 0 to 1. It is defined for Closed bugs with status Fixed or Verified and with a CC-list defined. Here N_a is the total number of bugs reported in area a, N_a^{cc} is the number of times prospective collaborator c was mentioned in the CC-list of bugs reported in area a, and V_a is the importance of the area of the reported bug (as calculated in PWFI for area a).
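A sketch of PCI over per-bug records with an Area label and a CC-list follows; the record keys and the choice of V_a (a PWFI-style area weight) are assumptions carried over from the definitions above.

```python
from collections import Counter

def pci(bugs, collaborator, area_importance):
    """Potential Contributor Index (Equation 11) for one prospective collaborator.

    bugs: Closed/Fixed bug records as dicts with keys 'area' and 'cc_list'
          (assumed schema).
    area_importance: V_a per area, e.g. derived from priority weights as in PWFI.
    """
    per_area_total = Counter(b["area"] for b in bugs)
    per_area_cc = Counter(b["area"] for b in bugs if collaborator in b["cc_list"])
    return sum(
        area_importance.get(a, 0.0) * per_area_cc[a] / per_area_total[a]
        for a in per_area_total
    )
```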

5.4.3 Potential Contributor Index-1 (PCI-1)

PCI-1 extends the idea presented in PCI. In the bug fixing lifecycle, contributors are incrementally appended to the CC-list to meet additional support or expertise requirements.

\[ PCI\text{-}1(c) = \sum_{a} V_a \, \frac{N_a^{appCC}}{N_a} \tag{12} \]

With the other parameters the same as explained for PCI, N_a^{appCC} is the count of the number of times collaborator c was incrementally appended to the CC-list of bugs with Area a.

5.4.4 Contribution Index (CI)

When a bug is reported, developers are added to the CC-list as potential or prospective contributors. However, only a few of the people mentioned in the CC-list contribute to fixing the bug through comments in the Mailing List.

\[ CI(c,a) = V_a \, \frac{N_a^{col}}{N_a^{cc}} \tag{13} \]

CI(c,a) measures expected versus actual contribution for the role of bug collaborator. Here N_a^{cc} is the number of times collaborator c was added to the CC-list of bugs reported in area a, and N_a^{col} is the subset of N_a^{cc} in which the developers mentioned in the CC-list actually contributed to fixing the bug. V_a is the importance of the area of the reported bug (as calculated in PWFI for area a). It values contribution proportionally to the importance of the area and normalizes it.

6. EXPERIMENTAL DATASET

We conduct a series of experiments on the Google Chromium project dataset. Google Chromium is a popular and widely used open-source browser. The Issue Tracker for the Google Chromium project is publicly available and the data can be downloaded using the Google Issue Tracker API. We perform our empirical analysis on a real-world and publicly available dataset so that our experiments can be replicated and our results can be benchmarked and compared by other researchers. Table 7 displays the details of the experimental dataset.
We perform our empirical analysis on a one-year dataset (as the context is contribution and performance assessment, and we select one year as a performance appraisal cycle), which consists of 24,743 bugs. In this one-year reporting period, 6,765 developers reported bugs. These bugs were triaged by 500 developers. The triaged bug reports were owned by 391 bug owners. These bugs were finally fixed by 5,043 developers who collaborated to fix them. Finally, out of 24,743 reported bugs, 8,628 bug reports were Closed with status Fixed or Verified.


Table 6: Priority-Time correlation statistics (time duration in days)

Priority | Count (of bugs) | Maximum | 3/4 Quartile | Median | 1/4 Quartile | Minimum
P0 | 258    | 495.37  | 10.69  | 2.71  | 0.44 | 0.0002
P1 | 2,839  | 1120.37 | 44.10  | 11.04 | 2.03 | 0.0000
P2 | 17,923 | 1273.15 | 129.63 | 21.99 | 1.93 | 0.0000
P3 | 2,265  | 1180.56 | 206.02 | 55.90 | 5.03 | 0.0000

Table 7: Experimental dataset

Duration | 1 year
Start date of the first bug report in the dataset | 01/01/2009
End date of the last bug report in the dataset | 31/12/2009
Bug ID of the first bug report | 5,954
Bug ID of the last bug report | 31,403
Total number of Reported bugs | 25,383
Total number of Forbidden bugs | 640
Total number of Available bugs | (25,383 - 640) = 24,743

Table 8: Statistical results of proposed metrics (yearly evaluation)

Metric | Role | Count (unique developers) | Minimum | Maximum | Median | Mode | Percentile 25% | Percentile 75% | Range
PWFI | Bug Owner | 342 | 0.00004 | 0.06119 | 0.00045 | 0.00004 | 0.00014 | 0.00273 | 0.06116
SBI | Bug Owner | 405 | 0.00092 | 0.78418 | 0.01042 | 0.00099 | 0.00219 | 0.06615 | 0.78326
DMTTR | Bug Owner | 75 | 0.0 | 1.44123 | 0.09194 | 0.0 | 0.00772 | 0.46293 | 1.44123
SRI | Bug Reporter | 5830 | 0.0 | 0.03225 | 0.00002 | 0.00002 | 0.00002 | 0.00007 | 0.03225
DCE | Bug Reporter | 2050 | 0.00003 | 0.05267 | 0.00066 | 0.00003 | 0.00003 | 0.00024 | 0.05264
REI | Bug Triager | 61 | 0.00083 | 0.81832 | 0.00496 | 0.00165 | 0.00165 | 0.02643 | 0.8175
ARI | Bug Triager | 15 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.8125 | 0.5
CCS | Bug Collaborator | 1549 | 0.25 | 2.75 | 0.25 | 0.25 | 0.25 | 0.5 | 2.5
PCI | Bug Collaborator | 522 | 0.0 | 2.88489 | 0.01407 | 0.0 | 0.0 | 0.04925 | 2.88489
PCI-1 | Bug Collaborator | 307 | 0.0 | 0.91472 | 0.0 | 0.0 | 0.0 | 0.01407 | 0.91472
CI | Bug Collaborator | 9 | 0.00086 | 0.01861 | 0.00819 | 0.00534 | | 0.01121 | 0.01775

7. EXPERIMENTAL ANALYSIS

In this section, we analyze and interpret the results for the proposed metrics (defined for bug reporter, bug triager, bug owner and bug collaborator) calculated on the Google Chromium Issue Tracking System dataset. These metrics are defined for Closed bugs with status Fixed or Verified.
Figure 1a shows a scatter plot of the Priority Weighted Fixed Issues (PWFI) metric for the role of bug owner. Here the X-axis is the bug owners in the dataset (arranged in order of non-decreasing performance) and the Y-axis is the score scale. We notice that 3 out of 342 bug owners displayed exceptional performance on the score scale (as shown in Figure 1a). A software maintenance project manager or appraiser may use these results to identify different levels of performance and measure individual contribution in a data-driven, evidence-based and objective manner. For example, the 25th percentile of bug owners have a PWFI value above 0.00014 and the 75th percentile of bug owners have a PWFI value above 0.00273 (for more details refer to Table 8).
Figure 1b is a scatter plot of the Specialization and Breadth Index (SBI) metric defined for the bug reporter. A similar interpretation as for Figure 1a can be made for this metric, i.e., varying levels of performance, multiple levels and tiers of contribution, and very few exceptional performances in contrast to other members of the project team. This metric aids in understanding specialization and breadth of knowledge within the team, in addition to individual performance. This knowledge can be used by organizations in preparing strategies for training manpower and assigning responsibilities based on specialization and breadth of knowledge. For example, 1.7% of bug reporters demonstrate significant breadth of knowledge relative to other bug reporters (an SBI score greater than or equal to 0.5).
Figures 1c and 1d are bar charts for PCI (Potential Contributor Index) and its extension PCI-1 (Potential Contributor Index-1), respectively. These metrics are defined for the role of bug collaborator. The X-axis in Figure 1c is the contribution by different collaborators for a given component and the Y-axis is the list of components for which the developer was added to the CC-list. In this graph, we note a non-uniform contributor-component distribution, i.e., a large number of people collaborate on some components as compared to others. It also shows the most potential contributors for bugs belonging to a specific component. This index can be used by an organization to identify the interests and knowledge domains of contributors. Figure 1d shows the inverted view of PCI. In this figure, the X-axis is the list of components and the Y-axis is the contribution by developers. This graph compares the contribution of developers on a component. For instance, a...@chromium.org (shown in blue) contributed significantly in all the components.
Figure 1e shows a thresholded score of the Status of Reported bug Index (SRI) metric. The X-axis is the bug reporters arranged in order of non-decreasing performance. The Y-axis is the score on a logarithmic scale. This graph shows a broad-level picture of the metric. We notice two visibly distinct classes separated by the threshold, where the threshold (the mode of performance) is close to the low scoring bug reporters.
Figure 1f is a plot of a thresholded score for the Cumulative Comment Score (CCS) metric defined for the role of bug collaborator. Here the X-axis of the graph is the bug collaborators arranged in order of non-decreasing performance and the Y-axis is performance with a threshold at 1, where 1 is the value given to the performance delivered by a large fraction of bug collaborators. This line differentiates high performance developers from low performance developers.

In this graph, we notice that out of 1549 bug collaborators (for more details refer to Table 8), only 53 collaborators crossed the threshold, thereby clearly distinguishing exceptional performers from the rest.
Figures 1g and 1h show bar charts for Deviation from Median Time To Repair (DMTTR) and Contribution Index (CI). DMTTR is defined for the role of bug owner to measure positive and negative contribution. The X-axis of the graph is the performance score and the Y-axis is the bug owners. The bar chart compares positive and negative contribution, thus presenting the overall performance for the role. We notice in the graph that podivi...@chromium.org always exceeded the median time required to fix bugs. In the chart, we observe that pinkerton@chromium.org delivered more positive contribution than negative contribution. Organizations can use these statistics to predict the expected time to get bugs resolved given that a bug is assigned to a specific bug owner.
Figure 1h shows a bar chart measuring the Contribution Index (CI) of bug collaborator a...@chromium.org. Here the X-axis is the list of components for which a...@chromium.org was mentioned in the CC-list and/or contributed, and the Y-axis is the total number of bug reports. In this graph, we notice that a...@chromium.org was mentioned 47 times in the CC-list of bugs reported in the WebKit component; however, the developer collaborated only once (refer to Table 8 for a more detailed description).
Figures 1i and 1j are histogram plots of the Reassignment Effort Index (REI) metric and the Degree of Customer Eccentricity (DCE) metric, defined for bug triager and bug reporter respectively. The X-axis (on both graphs) is the performance score of developers. The Y-axis of the REI plot is the count (frequency) of bug triagers and the Y-axis of the DCE plot is the count (frequency) of bug reporters. The histogram plots classify developers into classes (as defined by the bin size). The graphs indicate the level of performance delivered and identify the fraction of developers in each class. The two graphs give information at different levels of abstraction, where the level of abstraction is defined in terms of the number and size of bins used to interpret the data. The graph in Figure 1i gives a broad view, stating that 49.17% of developers delivered performance greater than 0.009. A detailed view of the performance for the role of bug reporter is shown in Figure 1j. In this graph we notice that only 0.537% of the developers deliver performance greater than 0.008.
The 10 graphs in Figure 1 show different chart types defined for different roles. These chart types present information at different levels of abstraction with different interpretations (both from a contribution assessment perspective and an organizational perspective). A more detailed statistical description of the results for all 11 metrics is shown in Table 8.
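To illustrate how an appraiser might turn a metric's scores into the performance tiers discussed above, the helper below splits scores at the 25th and 75th percentiles; the cutoff choice is an assumption for illustration, not part of the framework.

```python
import numpy as np

def performance_tiers(scores, cutoffs=(25, 75)):
    """Group metric scores (e.g., PWFI per bug owner) into bottom/middle/top
    tiers using percentile thresholds."""
    low, high = np.percentile(scores, cutoffs)
    return {
        "bottom": [s for s in scores if s < low],
        "middle": [s for s in scores if low <= s <= high],
        "top":    [s for s in scores if s > high],
    }

tiers = performance_tiers([0.1, 0.2, 0.25, 0.3, 0.9])
print({name: len(vals) for name, vals in tiers.items()})  # {'bottom': 1, 'middle': 3, 'top': 1}
```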

8. THREATS TO VALIDITY AND LIMITATIONS

The survey has two respondents (Software Maintenance Professionals working in Industry) and can be influenced by bias (based on individual experiences and beliefs). In future work, we plan to survey more professionals, which will reduce bias and help generalize our conclusions. The experiments are conducted on a dataset belonging to one project (the Google Chromium open-source browser) collected over a duration of one year (by design, as we set the appraisal cycle to one year). More experiments can be conducted on datasets belonging to diverse projects to investigate the generalizability of the conclusions. Due to limited space in the paper, we defined 11 metrics for the four roles of a software developer; however, more such metrics can be defined covering various aspects and performance indicators. The study presented in this paper serves the objective of introducing the broad framework (which can be extended by adding more metrics), viewing it in relation to an industry survey and validating it on a real-world dataset.

9. ACKNOWLEDGEMENT

The work presented in this paper is supported by a TCS Research Fellowship for PhD students awarded to the first author. The author would like to acknowledge Dr. Pamela Bhattacharya and the survey respondents for providing their valuable inputs.

10. CONCLUSIONS

We propose a contribution and performance evaluation framework (consisting of several metrics for different types of roles) for Software Maintenance Professionals, motivated by the need to objectively, accurately and reliably measure individual value-addition to a project. We conduct a survey with experienced Software Maintenance Professionals working in the Industry to throw light on the Key Performance Indicators and the extent to which such indicators are measured in practice. We implement the proposed metrics and apply them to a real-world dataset by conducting a case-study on the Google Chromium project. Experimental results demonstrate that the proposed metrics can be used to differentiate between contributors (based on their roles) and help organizations continuously monitor and gain visibility into individual contributions. We conclude that mining Issue Tracker data (through APIs or direct database access) to compute pre-defined performance metrics (incorporating role-based activities), together with visualization, can provide an evidence-based and data-driven solution to the problem of accurately measuring individual contributions in a software maintenance project (specifically on bug resolution).

11. REFERENCES

[1] Syed Nadeem Ahsan, Muhammad Tanvir Afzal, Safdar Zaman, Christian Guetl, and Franz Wotawa. Mining effort data from the OSS repository of developers' bug fix activity. Journal of IT in Asia, 3:67-80, 2010.
[2] Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann. Quality of bug reports in Eclipse. In Proceedings of the 2007 OOPSLA Workshop on Eclipse Technology eXchange, eclipse '07. ACM Digital Library, 2007.
[3] Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann. What makes a good bug report? In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT '08/FSE-16. ACM Digital Library, 2008.
[4] Paulo Fernandes, Afonso Sales, Alan R. Santos, and Thais Webber. Performance evaluation of software development teams: a practical case study. Electron. Notes Theor. Comput. Sci., 275:73-92, September 2011.
[5] Georgios Gousios, Eirini Kalliamvakou, and Diomidis Spinellis. Measuring developer contribution from software repository data. In Proceedings of the 2008 International Working Conference on Mining Software Repositories, MSR '08, pages 129-132, New York, NY, USA, 2008. ACM.
[6] Philip J. Guo, Thomas Zimmermann, Nachiappan Nagappan, and Brendan Murphy. "Not my bug!" and other reasons for software bug report reassignments. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, CSCW '11. ACM Digital Library, 2011.
[7] Gaeul Jeong, Sunghun Kim, and Thomas Zimmermann. Improving bug triage with bug tossing graphs. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE '09. ACM Digital Library, 2009.
[8] T. Kanij, R. Merkel, and J. Grundy. Performance assessment metrics for software testers. In Cooperative and Human Aspects of Software Engineering (CHASE), 2012 5th International Workshop on, pages 63-65, June 2012.
[9] Robin Kessler. Competency-Based Performance Reviews: How to Perform Employee Evaluations the Fortune 500 Way. Career Press, 2008.
[10] Yared H. Kidane and Peter A. Gloor. Correlating temporal communication patterns of the Eclipse open source community with performance and creativity. Comput. Math. Organ. Theory, 13(1):17-27, March 2007.
[11] Herwig Kressler. Motivate and Reward: Performance Appraisal and Incentive Systems for Business Success. Palgrave Macmillan, 2003.
[12] Naresh Nagwani and Shrish Verma. Rank-me: A Java tool for ranking team members in software bug repositories. Journal of Software Engineering and Applications, 5(4):255-261, 2012.
[13] Peter C. Rigby and Ahmed E. Hassan. What can OSS mailing lists tell us? A preliminary psychometric text analysis of the Apache developer mailing list. In Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR '07. IEEE Computer Society, 2007.
[14] Philipp Schugerl, Juergen Rilling, and Philippe Charland. Mining bug repositories - a quality assessment. In 2008 International Conference on Computational Intelligence for Modelling Control and Automation. IEEE Computer Society, 2008.



