
Empirical Research in Software Architecture

How far have we come?

Matthias Galster
Department of Computer Science and Software Engineering
University of Canterbury, Christchurch, New Zealand
mgalster@ieee.org

Danny Weyns
Department of Computer Science, Katholieke Universiteit Leuven, Belgium
Linnaeus University, Växjö, Sweden
danny.weyns@cs.kuleuven.be

Abstract — Context: Empirical research helps gain well-founded insights about phenomena. Furthermore, empirical research creates evidence for the validity of research results. Objective: We aim at assessing the state-of-practice of empirical research in software architecture. Method: We conducted a comprehensive survey based on the systematic mapping method. We included all full technical research papers published at major software architecture conferences between 1999 and 2015. Results: 17% of papers report empirical work. The number of empirical studies in software architecture started to increase in 2005. Looking at the number of papers, empirical studies are about equally frequently used to a) evaluate newly proposed approaches and b) explore and describe phenomena to better understand software architecture practice. Case studies and experiments are the most frequently used empirical methods. Almost half of empirical studies involve human participants. The majority of these studies involve professionals rather than students. Conclusions: Our findings are meant to stimulate researchers in the community to think about their expectations and standards of empirical research. Our results indicate that software architecture has become a more mature domain with regards to applying empirical research. However, we also found issues in research practices that could be improved (e.g., when describing study objectives and acknowledging limitations).

Keywords — software architecture, empirical research, state-of-practice

I. INTRODUCTION

A. Problem and Motivation
The objective of empirical software engineering research is to collect and use evidence to advance methods, processes, techniques, and tools [1]. Evidence obtained from empirical works helps us build trust and confidence in research findings rather than relying on promotional anecdotes or rhetoric. Since software architecture is primarily an applied discipline, evidence includes insights about the usefulness and applicability of research findings in practice.

In their 2010 article on applying empirical research to software architecture, Falessi et al. promote the application of the empirical paradigm to software architecture [1]. However, there has been little effort to systematically gather, rigorously analyze and widely disseminate empirical evidence in software architecture [2]. The proposal of a solution (which typically demonstrates the applicability of the solution by a small example or demonstrator) is the dominant research type in architecture-related areas such as component-based software engineering [3]. In general, empirical models and methods do not appear to be very popular in the architecture community [4].

A study from 2015 found that around 94% of papers at recent editions of premier software engineering conferences included an empirical method (e.g., case study, controlled experiment) [5]. However, no comprehensive reports exist on the state-of-practice of empirical research in software architecture. Such reports could offer insights into the maturity of the field of software architecture research and the amount and type of evidence available to increase confidence in research findings. Furthermore, such reports could pinpoint areas in research practices that are still weak and potentially require more encouragement and appreciation in the community, and that may require more focused training of architecture researchers.

B. Paper Goal and Research Questions
We aim at finding out whether the increasing trend of applying empirical research in software engineering also holds for software architecture research, or if evidence in software architecture research primarily relies on anecdotes and rhetoric. Therefore, the goal of this paper is to analyze (full technical) research papers to better understand the state-of-practice of empirical research from the point of view of researchers in the context of software architecture. To operationalize this goal, we define several research questions.

RQ1: How is empirical research applied in software architecture? Empirical research can be used to evaluate a newly proposed approach (e.g., a controlled experiment to compare a new architecture recovery approach A to an alternative approach B), or it can be used to increase our understanding about a phenomenon (e.g., an exploratory or descriptive survey to understand what makes software architecture decision making difficult). Therefore, we aim at finding out whether empirical work in software architecture is published as stand-alone studies (e.g., an exploratory survey) or if empirical studies tend to be published together with solution proposals (e.g., the proposal of a new approach together with an empirical evaluation). Furthermore, we investigate what empirical methods (case study, experiment, etc.) are most prominent. This research question is exploratory in nature and helps identify typical use cases of empirical software architecture research.
RQ2: What is the role of human participants in empirical software architecture research? Software architecture activities are often design and creative activities and therefore require human input. Architecting also involves many trade-off decisions, which depend on the experience and expertise of those involved. Therefore, we analyze to what degree humans are involved in empirical studies. With regard to human participants as subjects in software engineering research, there has been a debate about whether students are representative of professionals [6]. Also, due to the limited availability of participants, studies sometimes rely on colleagues or students of the researchers conducting the study, which may result in biased findings [7]. Therefore, we investigate to what degree empirical architecture research involves students and professionals.

RQ3: To what degree does empirical software architecture research acknowledge validity threats? To increase confidence in empirical results, we need to be aware of validity threats. Without a discussion of validity threats, it is difficult to judge the strengths and limitations of a study with regard to the validity and applicability of its outcome [8]. Therefore, we study whether current empirical research reports validity threats and, if so, what types of threats are discussed. For example, do studies focus on internal validity (to control every aspect of a study), external validity (to show a real-world effect, but without fully understanding which factors actually caused a certain result), or something else?

C. Contributions
Our research is meant to stimulate software architecture researchers to think about their expectations and standards of empirical research (i.e., our aim is not to criticize papers). We provide the following contributions:
- Overview of the state-of-practice over time of applying empirical research in software architecture. This provides insights into the growing maturity of empirical research and awareness of the need for evidence. It also offers researchers new perspectives and opportunities to study and advance empirical work.
- Analysis of how empirical research is used and why. Software architecture researchers can use this information to select methods and to benchmark their own work. This can assist researchers in building new theories and evaluating their work. Educators can identify issues when training new researchers in applying empirical research.
- Analysis of how trustworthy empirical software architecture research is, in the sense of to what degree validity threats are acknowledged. Practitioners can use these insights to form their view on software architecture research (and the community).

We do not argue that more empiricism is always better or that empiricism determines (or is even the key to) impact in software architecture. As argued by Shaw, the nature of validation can differ depending on the nature of a research question [9]. Many seminal papers (e.g., Ward Cunningham's paper on technical debt) have no empirical results but significant impact. We discuss this further in Section V.A when we relate research themes to our findings.

D. Paper Structure
In Section II we discuss background and related work on empirical research in software engineering and software architecture. We then outline our research method in Section III before we present the results in Section IV. Our findings are further discussed in Section V. We conclude in Section VI.

II. BACKGROUND AND RELATED WORK

In 2006, Zannier et al. investigated empirical software engineering research at the International Conference on Software Engineering (ICSE) [10]. The study picked 5% of all papers at ICSE. It found that 70% of papers in the sample included some form of evaluation. Experience reports, rather than rigorous empirical studies following a well-established method, were the most frequent type of work presented. Furthermore, they found an absence of replications and few papers that state a clear hypothesis. A study in 2015 on views of internal and external validity in software engineering research [5] reviewed 405 papers from premier software engineering conferences (ICSE, ESEC/FSE and EMSE). That study reported an increasing trend of empirical research.

Some studies on the use of particular empirical methods exist. For example, Sjoberg et al. studied experiments in software engineering [11] and found in 2005 that only around 2% of papers in leading software engineering conferences and journals report controlled experiments. In 2013, systematic literature reviews were empirically explored by Zhang and Ali Babar [12]. Their study found that since 2004 systematic literature reviews have become popular in software engineering, but it is challenging to balance effort and rigor when conducting such reviews. An analysis of recent systematic mapping studies was performed by Petersen et al. in 2015 [13]. With regard to empirical practices, that study found that most mapping studies report validity threats. In 2015, de Magalhaes et al. investigated replications in software engineering and found that we still lack standardized concepts, guidelines and methodologies to fully support replications [14].

Several analyses have been presented to investigate empirical research in particular areas of software engineering. For example, Daneva et al. briefly reviewed empirical work on software requirements engineering [15]. The study reported a significant growth in empirical requirements engineering research, with a jump in 2004 and 2005. Dyba and Dingsoyr investigated the status of empirical research on agile software development [16]. The study concluded that there is a need for more and better empirical studies on agile software development. In 2013, Weyns and Ahmad reported a systematic literature review on the validation of architecture-based approaches for self-adaptive systems [17]. They showed that only 2.5% of studies in this area applied empirical methods. In software architecture it has been claimed that it is hard to apply empirical methods (e.g., due to difficulties measuring architecture goodness, dependencies between design decisions, social factors, and the types of subjects required in studies with human participants) [18] but, to the best of our knowledge, no study exists on the state-of-practice of empirical research in this field. A special issue of the Empirical Software Engineering journal was published in 2011 [2] and a workshop on Empirical Assessment in Software Architecture (EASA)
was held at WICSA in 2008¹ and 2009². However, the contributions of these initiatives focused on presenting empirical research, rather than on reflecting on research and providing an overview of the state-of-practice. Therefore, our work complements these efforts and adds insights about how and why empirical research is conducted in architecture.

III. RESEARCH METHOD

We performed a comprehensive survey following procedures defined in the mapping study method [19]. A systematic mapping study is a type of literature review that aims to collect and classify research related to a specific topic, in our case empirical software architecture research. Mapping studies aim at structuring an area and showing how the work in that area is distributed within the structure [13]. This requires a rigorous process to identify relevant papers, and to apply systematic data collection and analysis to ensure completeness and representativeness.

We deviated from mapping study procedures in that we had a clearly defined data extraction form derived from our research questions, rather than key-wording and mapping out existing papers. We deviated from the systematic literature review method in that we did not apply quality assessment of included studies, since our aim was to obtain insights into all research that has been published in software architecture. Also, our scope was limited, so some issues of mapping studies or systematic literature reviews did not apply (e.g., defining search criteria and a search string for an automated search, selecting electronic information sources). As suggested by Petersen et al., our research process consisted of planning, conducting and reporting [13], as discussed below.

A. Planning
1) Study identification: Since we were interested in empirical research in software architecture, we analyzed papers published at venues that focus on software architecture. Therefore, our sample consists of all technical full papers from the International ACM SIGSOFT Symposium on Component-Based Software Engineering (CBSE), the European Conference on Software Architecture (ECSA), the International ACM SIGSOFT Conference on the Quality of Software Architectures (QoSA) and the Working IEEE/IFIP Conference on Software Architecture (WICSA). These four conferences are the main venues for publishing software architecture research. We included all editions of these venues. We excluded the SATURN series since it is a non-paper practitioner conference (we acknowledge, however, that talks at SATURN may present empirical works, in particular case studies). While this selection is limited, it still gives a representative impression of the state-of-practice. Also, while journals may follow best practices in research, conferences use the most common practices of research in a community [5].

We manually identified 667 full technical research papers (Table I). Papers were categorized as full technical research papers based on paper categories in conference proceedings and programs, and page length (e.g., eight pages at ECSA were considered short papers).

TABLE I. NUMBER OF ANALYZED PAPERS PER VENUE

Venue | #
CBSE³: 2004 (first edition) - 2015 | 192
ECSA⁴: 2007 (first edition) - 2015 | 138
QoSA: 2005 (first edition) - 2015 | 135
WICSA: 1999, 2001/02, 2004/05, 2007/08, 2011, 2014/15 | 202

We manually analyzed each paper to decide whether or not a paper presents empirical work. We analyzed the whole paper, rather than only the title, abstract or keywords. Also, we did not only rely on methods claimed in papers. For example, many papers claim "case studies" even though they only present small examples or illustrations of a newly proposed approach. On the other hand, some papers do not use the terminology of well-established empirical methods, but still present findings based on systematically collected and analyzed data. Therefore, to decide whether a paper presents empirical work, we used a broader definition of empirical research as suggested by Sjoberg et al. [20]: empirical research includes any research based on systematically collected evidence; it studies natural, social, or cognitive phenomena by using evidence based on observation or experience, and it involves obtaining and interpreting evidence by, e.g., experimentation, systematic observation, interviews or surveys, or by the careful examination of documents or artifacts [20]. As a consequence, we not only checked whether there was a well-established empirical method, but also whether there was a clear study objective, whether data was collected and analyzed systematically, and if findings were linked back to the objective. Furthermore, if the authors claimed a method, we looked for clues that the claim about the method was true (e.g., references to literature related to the claimed method). An example of a paper that presented a "case study" but was excluded, since it did not clearly discuss data extraction and analysis, is "fUML-Driven Design and Performance Analysis of Software Agents for Wireless Sensor Network" (ECSA 2014). Finally, studies must provide insights beyond the studied case or example and offer insights about the claim or statement in a paper. An example of a paper that presented a study but was excluded, because it did not offer insights beyond a studied case (without discussing general insights or insights about an approach), is "Assessing a Multi-Site Development Organization for Architectural Compliance".

2) Data extraction: Table II lists the data items extracted for each paper. We used topic-specific classifications where topics were defined upfront, together with topic-independent classifications that can be used in other mapping studies (e.g., [13]). The first few data items were collected not to answer a particular research question, but to provide

Footnotes:
¹ http://wwwp.dnsalias.org/wiki/Wicsa7:Workshop:Empirical_Assessment_in_Software_Architecture [accessed December 2015]
² http://www.iso-architecture.org/wicsa2009/easa2009.pdf [accessed December 2015]
³ In 2004, the 6th International Symposium on Component-Based Software Engineering was held. Before 2004, CBSE was organized as an ICSE workshop. Therefore, we excluded these editions of CBSE.
⁴ In 2009/12, ECSA and WICSA were co-located; papers for 2009/12 were recorded for ECSA only.
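The inclusion criteria described in Section III.A.1 (a clear study objective, systematic data collection and analysis, findings linked back to the objective, and insights beyond the studied case) can be sketched as a simple checklist. This is an illustrative reconstruction only, not an artifact of the study; the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PaperAssessment:
    # Hypothetical fields mirroring the inclusion criteria in Section III.A.1
    has_clear_objective: bool
    systematic_data_collection: bool
    systematic_data_analysis: bool
    findings_linked_to_objective: bool
    insights_beyond_studied_case: bool

def is_empirical(p: PaperAssessment) -> bool:
    """A paper counts as empirical only if all criteria hold."""
    return all([
        p.has_clear_objective,
        p.systematic_data_collection,
        p.systematic_data_analysis,
        p.findings_linked_to_objective,
        p.insights_beyond_studied_case,
    ])

# A paper that only illustrates an approach with a small example fails:
example_only = PaperAssessment(True, False, False, False, False)
assert not is_empirical(example_only)
```

The point of the checklist is that a claimed method name alone is neither necessary nor sufficient: all criteria must hold regardless of the terminology used in the paper.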
more detailed demographic information about papers. We explain some of the data items in more detail below. To extract data, we manually reviewed the whole papers.

TABLE II. EXTRACTED DATA

Item | Description | RQ
Venue | Publication venue (conference) | n/a
Year | Publication year | n/a
Paper title | The title of the paper | n/a
Academic authors | Number of academic authors | n/a
Industry authors | Number of industry authors⁵ | n/a
Citation count | Citation count as of 12/2015 | n/a
Objective formulation | See below | n/a
Study focus | See below | n/a
Reason | See below | RQ1
Method | See below | RQ1
Replication | Does the paper present a replication? | RQ1
Repetition | Does the paper present a repetition? | RQ1
Subjects | Human and/or non-human | RQ2
Human subjects | n/a or students and/or professionals | RQ2
Discussion of validity | See below | RQ3
Types of validity | See below | RQ3

- Objective formulation: To get a better idea about how clearly research objectives are defined, we extracted information about whether studies present (high-level) goals, detailed research questions, or concrete hypotheses.
- Study focus: The selection of a research method depends on the studied phenomena and the nature of the research question. For example, evidence for the correctness of a model checker may require formal proofs rather than empirical studies. Therefore, we identified themes that empirical software architecture research has been targeting. This was recorded as free text.
- Reason: We extracted the main reason for why the empirical work was conducted. We differentiate focus and evaluation. Focus means that empirical work is the core of the paper, with the empirical results as the main contribution. In this case, the goal of the paper motivates the empirical work (e.g., an exploratory case study or a systematic literature review). Evaluation means that the main contribution of the paper is a new approach and empiricism is used to evaluate this new proposal (e.g., an experiment to evaluate the efficiency of a new viewpoint).
- Method: We recorded the empirical research method applied. We used basic types of empirical methods (case study, experiment, survey, systematic literature review, interview study, multi-method study⁶). We added the category "systematic empirical enquiry" for empirical work that does not follow the structure or design of well-established methods, but still formulates objectives, collects and analyzes data, and presents insights in a systematic manner (see also our notion of empirical research in Section III.A.1). We did not rely on self-classification of methods or keywords, but reviewed papers manually.
- Discussion of validity: We recorded whether validity threats are discussed explicitly (i.e., in a dedicated section or paragraph), implicitly (i.e., hidden in a paragraph of the methodology, discussion or conclusion), or not at all.
- Types of validity: If validity was discussed, we recorded whether external and internal validity (or other kinds of validity) are differentiated. Also, we include the option "not differentiated" for papers in which validity threats and limitations are discussed, but no differentiation of types of validity is made.

3) Data analysis: Most data were analyzed using descriptive statistics. Data about the study focus underwent content analysis and open coding to identify themes of architecture research in papers. Content analysis categorizes data and analyzes frequencies of themes within categories. It allows transferring qualitative information into quantitative information. Tabulation helped us analyze data from content analysis. As suggested in Petersen et al. [13], we visualized our findings in bar plots and pie diagrams.

B. Conducting and Reporting the Study
The study was conducted as outlined in the planning phase. Parts of the process were iterative and revised. Both authors were involved in selecting studies and extracting data. Researchers were co-located and discussed throughout the process to ensure a shared understanding of inclusion criteria and extracted data items, as well as to resolve ambiguity and conflicts. Spreadsheets were used to record data (http://www.cs.kuleuven.be/publicaties/rapporten/cw/CW691.abs.html). The reporting follows guidelines proposed by Petersen et al. [13].

IV. RESULTS

A. Demographics
We found 115 papers that report empirical work (17% of all 667 full technical research papers). In Fig. 1 we show the contribution of empirical papers compared to all full technical research papers per year at CBSE, ECSA, QoSA and WICSA. Due to the varying total number of papers at each conference every year, we only show relative rather than absolute numbers. For example, in 2015, 43% of all full technical research papers reported empirical work.

Footnotes:
⁵ Authors from non-academic institutions were counted as industry authors.
⁶ Here, surveys do not include literature reviews. Interview studies differ from surveys in that they use interviews as the data collection instrument and often use a relatively small sample. Multi-method studies combine two or more research methods in one paper.
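As an illustration of the tabulation step described in Section III.A.3, the extraction form of Table II can be modelled as one record per paper, with category frequencies counted using descriptive statistics. The record fields and sample values below are hypothetical, not data from the study.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Extraction:
    # A subset of the Table II items, for illustration
    venue: str
    year: int
    method: str   # e.g., "case study", "experiment", ...
    reason: str   # "focus" or "evaluation"
    subjects: str # "human", "non-human", or "both"

# Hypothetical sample of extracted records (not the study's real data)
records = [
    Extraction("WICSA", 2014, "experiment", "evaluation", "human"),
    Extraction("ECSA", 2013, "case study", "focus", "human"),
    Extraction("QoSA", 2012, "case study", "focus", "non-human"),
]

# Tabulate frequencies of methods, as done for the method distribution
method_counts = Counter(r.method for r in records)
assert method_counts["case study"] == 2

# Relative rather than absolute numbers, as used for the per-year plots
share_focus = sum(r.reason == "focus" for r in records) / len(records)
assert round(share_focus, 2) == 0.67
```

In the actual study this bookkeeping was done in spreadsheets; the sketch only shows why recording one structured row per paper makes the later frequency analysis mechanical.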
Figure 1. Relative contribution of empirical works in relation to all papers.

Fig. 1 shows an increasing trend to publish empirical work starting in 2005. This could be due to the fact that some of the key publications about empirical software engineering were published in the early 2000s. This trend is similar to empirical research in requirements engineering, where empirical works started to increase in 2004 [15]. To compare the distribution of empirical works in software architecture to empirical works in software engineering in general, we contrast our findings with Siegmund et al. [5]. Siegmund et al. only investigated papers after 2011. Looking only at architecture papers published in 2011 or later, we notice that 28% of papers present empirical work, compared to 94% of general software engineering research. A possible explanation for this gap could be that a broader notion of empirical work was applied in [5], while we meticulously reviewed papers to identify empirical works. This could have resulted in a more selective inclusion of papers (see our discussion on the misuse of terms and our notion of empirical work in Section III.A.1).

In Fig. 2 we visualize the distribution of empirical works for each conference. Again, due to the varying number of total papers published at each conference every year, we only show relative numbers. For example, in 2015, 47% of all full technical research papers at WICSA reported empirical work. Fig. 2 shows an increasing trend of publishing empirical works at all architecture-related conferences. To interpret Fig. 2 correctly, it should be noted that in 2009 and 2012 WICSA and ECSA were co-located. Also, WICSA started in 1999, CBSE in 2004, QoSA in 2005, and ECSA in 2007.

With regard to the number of authors, the average number of authors of empirical papers is 3.15 (min: 1; max: 7) and the average number of authors of non-empirical papers is 3.04 (min: 1; max: 12). With regard to authors from industry and academia, we noticed that empirical papers have an average of 2.7 academic authors (min: 0; max: 7) and non-empirical papers have an average of 2.5 academic authors (min: 0; max: 11). On the other hand, empirical papers have an average of 0.5 industry authors (min: 0; max: 6), similar to non-empirical papers with 0.5 industry authors on average (min: 0; max: 10). Also, 70% of empirical papers do not have authors from industry and 74% of non-empirical papers do not have authors from industry. Therefore, we did not find any significant difference in the types of authors of empirical and non-empirical papers. Furthermore, most empirical papers have two (31%), three (23%) or four (28%) authors, but only very few empirical papers have a single author (6% of empirical papers). This distribution of authors of empirical papers is similar to the distribution of the authors of non-empirical papers. It is also similar to what has been found for the number of authors of the top-100 papers in software engineering [21].

With regard to citation counts, we noticed that the average annual citation count (i.e., the total number of citations for a paper divided by the number of years since its publication) for empirical papers is 2.80 (min: 0; max: 10.75). The average annual citation count for non-empirical papers is 3.09 (min: 0; max: 50.3). In comparison, the average annual citation count of the top-100 research papers in software engineering ranges from 21.8 to 154.2 [21].

Key insights
- Similar to software engineering research in general, there is also an increasing trend to publish empirical work in software architecture.
- Empirical papers do not necessarily have more or fewer authors from industry than non-empirical papers.
- Citations of empirical papers are not necessarily higher than citations of non-empirical papers.

Figure 2. Relative distribution of empirical works over time per conference.
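The average annual citation count used above (the total number of citations for a paper divided by the number of years since its publication) can be computed as follows; the paper data in the example is made up for illustration.

```python
def annual_citation_count(total_citations: int, pub_year: int,
                          now_year: int = 2015) -> float:
    """Total citations divided by the number of years since publication.

    Papers published in `now_year` itself are counted over one year to
    avoid division by zero (an assumption of this sketch, not a detail
    stated in the paper).
    """
    years = max(1, now_year - pub_year)
    return total_citations / years

# Hypothetical example: a 2010 paper with 14 citations by the end of 2015
assert annual_citation_count(14, 2010) == 2.8
```

Normalizing by years since publication is what makes older and newer papers comparable in the 2.80 vs. 3.09 figures reported above.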


B. RQ1: How is Empirical Research Applied?
In the majority of empirical papers (54%), empirical work is the focus (i.e., these papers present an empirical study, e.g., an exploratory case study, without proposing a new approach). In 46% of empirical papers, empirical work is used to evaluate a newly proposed approach (presenting a new approach together with an empirical assessment). Fig. 3 shows that the number of papers that focus on empirical work for exploratory and descriptive studies developed similarly to papers that present empirical work to evaluate new approaches.

Figure 3. Reasons to apply empirical research over time.

We found that the most frequently applied research methods are experiments, followed by case studies (see Fig. 4).

Figure 4. Distribution of empirical research methods.

Most papers that have an empirical study as their focus conduct case studies (Fig. 5). On the other hand, most evaluation studies use experiments.

Figure 5. Research methods versus reasons for conducting empirical work.

We also investigated whether empirical works report replications or repetitions of previous studies. Replications or repetitions would increase the confidence in and strength of findings regarding a phenomenon (a replication runs the same study on a different sample or different subjects but with identical conditions; a repetition repeats the same study with the same sample and involves measuring the same cases multiple times). We did not find any repetition studies. We found one paper where the authors present a replication of their study ("The Supportive Effect of Traceability Links in Architecture-Level Software Understanding: Two Controlled Experiments", WICSA 2014).

Key insights
- Empirical research is adopted equally to explore/study phenomena and to evaluate (new) approaches.
- Case studies and experiments are the most frequently used empirical research methods.
- Most exploratory studies use case studies. Evaluation studies tend to apply experiments.
- Replications and repetitions are almost non-existent in empirical software architecture research.

C. RQ2: Role of Human Participants in Studies
Forty-seven percent of empirical studies use humans as subjects or study participants, and 8% involve both human and non-human subjects (e.g., software systems, design documents, source code). Overall, we noticed a slightly increasing trend of involving human subjects in studies. This could be due to the types of research topics investigated in software architecture (see also our discussion in Section IV). With regard to the types of human subjects, we found that 68% of the papers that involve humans use professionals and 25% of papers use students. Seven percent of papers with humans involve both professionals and students. Fig. 6 shows the distribution of types of participants and subjects across research methods (in %; e.g., 100% of interview studies involve humans). Most experiments involve non-human subjects.

Figure 6. Use of human and non-human participants.


In Fig. 7 we illustrate the types of human participants or subjects. We show the percentages across all studies of a method, not only the ones that involve humans (e.g., 55% of all case studies involve professionals). Therefore, the percentages in Fig. 7 do not add up to 100% for a research method, since the remaining studies included non-human participants.

Figure 7. Types of human participants in empirical architecture studies.

Key insights

Almost half of the empirical software architecture studies involve humans as subjects or participants. Experiments mostly use non-human subjects. Most case studies involve human participants.

Professionals are the most frequently used type of human participant (except for experiments, where students are the dominating type of human participant).

D. RQ3: Validity Threats

Most empirical studies (60%) explicitly discuss study limitations or validity threats, whereas a small number of studies (8%) only implicitly acknowledge limitations. On the other hand, there is still a significant number of empirical papers that do not mention any limitations or validity threats at all (32%). This number is slightly higher than the number reported by Feldt and Magazinius (20.9%), who in 2010 surveyed a sample of 43 papers published at ESEM in 2009 for their reporting of validity threats [8]. Overall, we noticed an increasing trend towards explicitly discussing validity threats. Fig. 8 shows the growing gap between papers that explicitly discuss validity threats and papers that do not discuss them or do so only implicitly (we show percentages for years from 2005, since this is when the number of empirical works started to increase).

Figure 8. Relative distribution of validity threats in papers 2005-2015.

We also investigated what types of validity were discussed if there was an explicit discussion of validity threats. Table III shows the relative number of papers that address certain validity threats, given the total number of papers that present a certain research method (e.g., 39% of all papers that include a case study discuss internal validity). From Table III we can conclude that interview studies and multi-method studies acknowledge internal and external validity most, whereas systematic literature reviews and surveys often discuss validity threats without clearly differentiating them. Reliability (i.e., whether other researchers who conduct the same study would achieve the same results) is discussed least.

Key insights

Empirical software architecture research increasingly acknowledges and discusses validity threats. Internal and external validity are the most frequently discussed types of validity.

Many papers discuss validity but do not differentiate types of validity based on well-established categories.
TABLE III. VALIDITY TYPES PER EMPIRICAL METHOD

Validity type       Case study  Experiment  Interview study  Multi-method  SLR   Survey  Systematic empirical enquiry
Internal            0.39        0.37        0.67             0.67          0.29  0.40    0.06
External            0.42        0.42        0.67             0.67          0.29  0.40    0.11
Construct           0.36        0.11        0.33             0.00          0.29  0.30    0.11
Reliability         0.21        0.00        0.17             0.00          0.14  0.00    0.00
Conclusion          0.12        0.11        0.17             0.33          0.14  0.00    0.00
Not differentiated  0.27        0.24        0.17             0.33          0.71  0.60    0.39


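As an aside for readers who want to reproduce such numbers from their own extracted data, the relative values in Table III are simply the count of papers of a given method that discuss a validity type, divided by the total count of papers using that method. The following minimal Python sketch illustrates this computation; the data, names, and counts are our own hypothetical illustration, not the authors' tooling or dataset.

```python
# Illustrative sketch (hypothetical data): computing Table-III-style
# relative frequencies of validity-threat discussion per research method.
from collections import Counter

# Each extracted paper is tagged with its research method and the set of
# validity types it explicitly discusses (hypothetical records).
papers = [
    {"method": "case study", "validity": {"internal", "external"}},
    {"method": "case study", "validity": set()},
    {"method": "experiment", "validity": {"internal"}},
    {"method": "case study", "validity": {"external"}},
]

# Total number of papers per research method (the denominator).
method_totals = Counter(p["method"] for p in papers)

def relative_frequency(method, validity_type):
    """Share of papers of a given method that discuss a validity type."""
    hits = sum(1 for p in papers
               if p["method"] == method and validity_type in p["validity"])
    return hits / method_totals[method]

print(round(relative_frequency("case study", "external"), 2))  # → 0.67
```

With real extraction data, looping this function over all methods and validity types reproduces a table of the same shape as Table III.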
V. FURTHER INSIGHTS AND DISCUSSIONS

A. Research Themes

We acknowledge that conducting empirical studies is one way to evaluate work and to increase our confidence in whether something (e.g., a new approach) works or not. Other forms of obtaining evidence could include formal proofs and not require collecting and analyzing data (e.g., for showing that a model checker works correctly). Hence, the type of research performed in a discipline depends on the types of research questions and types of problems tackled. Therefore, based on content analysis and open coding of free text obtained from our data extraction (see Section III), we identified a set of themes that seem to be prominent in empirical software architecture research: component selection and composition, patterns, architecture reasoning and decisions, self-adaptation, architecture description, technical debt, quality (in general), and other. In Fig. 9 we show how these themes are addressed by different research methods (we show the absolute number of papers per theme). Experiments tend to study quality issues (such as performance measurements, security, reliability), whereas surveys have been used for other themes (such as architectural decision making, reasoning and knowledge management). As mentioned in Section III.B, we noticed a slight increase in human involvement in empirical software architecture studies. This could be because themes like architecture description and architecture reasoning and decisions require human input in practice, and empirical studies therefore need to reflect this in order to provide results useful in the real world. Note that in this research we do not investigate the development of research themes in software architecture in further detail.

B. Implications for Researchers

There is an undeniable trend towards applying systematic empirical methods in the field of software architecture. Even though we did not evaluate the quality of empirical works in detail (e.g., quality of study design, strength of evidence), our findings allow some insights into the maturity of empirical software architecture research. For example, many papers claim to present case studies but only provide example applications of an approach. Furthermore, standard terminology of empirical methods is often mixed up (what is an experiment, what is a sample or unit of analysis, etc.). Also, discussions of validity types across research methods are quite uneven. Furthermore, the way that studies formulate their objective often does not comply with requirements for specifying objectives in empirical research (e.g., experiments should have hypotheses). Based on the data we extracted, we observe that 47% of studies describe concrete research questions and around 41% of empirical studies describe the objective of the study as higher-level goals. On the other hand, only a small portion of empirical studies (the remaining 12%) describe detailed hypotheses. We acknowledge that hypotheses are most common for experiments, but even so, only 32% of all experiments state a clear hypothesis. Therefore, while the increasing number of empirical studies raises the maturity of the field, it also puts an increasing demand on planning, conducting and evaluating research and its results [22].

Finally, we still lack replications and repetitions to build solid knowledge and evidence. This could be because we still miss standardized concepts and terminologies, frameworks, guidelines and methodologies to fully support replications [14]. Another reason could be that software engineering researchers often do not appreciate the value of replications and repetitions [5]. Therefore, we should acknowledge that replications and repetitions are key enablers for further maturing software architecture as a research and engineering discipline. Similar to other events (e.g., empirical tracks at REFSQ and XP), we may encourage dedicated tracks for such studies at software architecture venues.

Figure 9. Software architecture research themes addressed by research methods.
C. Implications for Practitioners

The fact that empirical works are getting more attention should increase the confidence of practitioners in research and in evaluations of new approaches. However, given that studies often claim to be case studies but present examples, practitioners should be careful when interpreting results based on claims made by authors of research papers. Practitioners may still need to carefully check the relevance and rigor of presented works.

Furthermore, the active involvement of professionals as study participants but also as authors of empirical papers underpins the practical relevance of research questions and methods used in the domain. We found that many empirical works not only include human participants, but obtain evidence from professionals. This shows that there are opportunities for practitioners to get actively involved in empirical research in collaborations with academics (in particular when providing data and access to projects). However, these joint efforts are subject to the typical success factors of industry-academia collaborations (e.g., champions in industry organizations, buy-in support, commitment to contribute to industry needs) [23].

D. Threats to Validity

Internal validity: Internal validity is about the extent to which a causal conclusion based on a study is warranted. In our study, it was not always straightforward to identify full technical research papers. This is because paper categories are often not separated in the conference proceedings, the conference program or even in the call for papers. Also, some venues (e.g., CBSE) may not differentiate strictly between technical full research papers, experience papers, industry papers, short papers, position papers, etc. However, irrelevant papers were excluded during the review and screening process based on the length of a paper (e.g., a 4-page paper is most likely a short paper; for WICSA 2002 and 2004 we excluded working session papers). To increase validity, we applied manual filtering and did not rely on keyword-based paper selection and data extraction.

External validity: External validity is about generalizing our findings to all published software architecture research. We excluded short papers even though there are short papers that report empirical work (e.g., a short paper on "Software Reference Architectures" at ECSA 2013 which presents an exploratory survey with practitioners). We also did not include journals. Journals may have higher standards in their requirements for empirical work. However, as argued in Section III, while journals may follow best practices in research, conferences use the most common practices of research in a community [5]. Also, we were interested in empirical research in software architecture, and the architecture community currently does not have a dedicated journal. This would make it difficult to separate architecture-related research from research not related to software architecture without performing another filtering step (which could then cause additional validity threats). Difficulties with separating architecture-related work also apply to general software engineering conferences, such as ICSE, FSE or ASE (which also publish original research on software architecture). Therefore, conference papers collected from all editions of four primary architecture-related venues are a representative sample to derive conclusions about the state-of-practice in the architecture research community.

Reliability: Reliability is about ensuring that our results would be the same if our study were conducted again. Researchers may have been biased in their decision of whether a paper is empirical or not, in particular if no established method and study design was included in a paper. Therefore, data extraction and analysis was done by two researchers following guidelines and discussions while being co-located. Also, the papers of several venues were analyzed by both researchers to increase confidence. However, the classification of papers may have been subjective in some cases and biased by the background and experience of the researchers. This is also related to conclusion validity, which is concerned with the ability to replicate the same findings.

Construct validity: Construct validity is about obtaining the right measure and whether we defined the right scope in relation to what is considered an empirical paper. Poor reporting of studies in papers may have affected our selection and data extraction. Furthermore, we have identified data items to support the identification of empirical studies (see our notion of empirical research in Section III.A). This notion may be biased. To reduce this threat, we utilized ideas from [5] and [20].

Note that in contrast to other literature surveys, we are not concerned with issues such as publication bias. Publication bias may prevent negative or new and controversial results from being published (theoretical validity). In our study we were only interested in the state of published empirical work and not in the actual findings of that work.

VI. CONCLUSIONS

Empirical research has received growing attention in software architecture. Besides the implications discussed in the previous section, we feel that, based on our findings, educators can devise educational programs to better train architecture researchers and to include empirical methods in the curriculum of (post-graduate) study programs.

We did not perform an in-depth analysis of empirical work depending on the publication venue but aimed at providing a general overview of empirical research in software architecture. However, our data would certainly allow such an analysis, which could be subject to future work.

We acknowledge that not all empirical studies are created or applied equally, and many may not even provide any useful information or value [24]. However, this paper does not aim to evaluate the value and quality of empirical research in software architecture. We did not explicitly evaluate empirical works for their study design quality, credibility, usefulness, value and power of evidence. Therefore, other future work includes analyzing additional data, such as the number of human participants in empirical works with human subjects, to get insights into the strength of evidence, the use of data repositories, and the relevance and rigor of applied methods. To complete the overview of empirical research in software architecture, we need to obtain insights about the perception of the architecture community of empirical research, similar to Siegmund et al. for general software engineering [5].
REFERENCES

[1] D. Falessi, M. A. Babar, G. Cantone, and P. Kruchten, "Applying Empirical Software Engineering to Software Architecture: Challenges and Lessons Learned," Empirical Software Engineering, vol. 15, pp. 250-276, June 2010.

[2] M. A. Babar, P. Lago, and A. v. Deursen, "Empirical Research in Software Architecture: Opportunities, Challenges and Approaches," Empirical Software Engineering, vol. 16, pp. 539-543, 2011.

[3] T. Vale, I. Crnkovic, E. de Almeida, P. A. da Mota Silveira Neto, Y. C. Cavalcanti, and S. R. de Lemos Meira, "Twenty-eight Years of Component-based Software Engineering," Journal of Systems and Software, vol. 111, pp. 128-148, 2016.

[4] J. Maras, L. Lednicki, and I. Crnkovic, "15 Years of CBSE Symposium: Impact on the Research Community," in 15th ACM SIGSOFT Symposium on Component Based Software Engineering (CBSE), Bertinoro, Italy: ACM, 2012, pp. 61-70.

[5] J. Siegmund, N. Siegmund, and S. Apel, "Views on Internal and External Validity in Empirical Software Engineering," in 37th International Conference on Software Engineering, Florence, Italy: IEEE Computer Society, 2015, pp. 9-19.

[6] I. Salman, A. T. Misirli, and N. Juristo, "Are Students Representatives of Professionals in Software Engineering Experiments?," in 37th International Conference on Software Engineering, Florence, Italy: IEEE Computer Society, 2015, pp. 666-676.

[7] M. Galster and D. Tofan, "Exploring Web Advertising to Attract Industry Professionals for Software Engineering Surveys," in 2nd International Workshop on Conducting Empirical Studies in Industry, Hyderabad, India: ACM, 2014, pp. 5-8.

[8] R. Feldt and A. Magazinius, "Validity Threats in Empirical Software Engineering Research - An Initial Survey," in 22nd International Conference on Software Engineering and Knowledge Engineering, Redwood City, CA: KSI, 2010, pp. 374-379.

[9] M. Shaw, "Writing Good Software Engineering Research Papers," in 25th International Conference on Software Engineering, IEEE Computer Society, 2003, pp. 726-736.

[10] C. Zannier, G. Melnik, and F. Maurer, "On the Success of Empirical Studies in the International Conference on Software Engineering," in 28th International Conference on Software Engineering, Shanghai, China: ACM, 2006, pp. 341-350.

[11] D. Sjoberg, J. E. Hannay, O. Hansen, V. Kampenes, A. Karahasanovic, N.-K. Liborg, and A. C. Rekdal, "A Survey of Controlled Experiments in Software Engineering," IEEE Transactions on Software Engineering, vol. 31, pp. 733-753, 2005.

[12] H. Zhang and M. Ali Babar, "Systematic Reviews in Software Engineering: An Empirical Investigation," Information and Software Technology, vol. 55, pp. 1341-1354, 2012.

[13] K. Petersen, S. Vakkalanka, and L. Kuzniarz, "Guidelines for Conducting Systematic Mapping Studies in Software Engineering: An Update," Information and Software Technology, vol. 64, pp. 1-18, 2015.

[14] C. de Magalhaes, F. da Silva, R. Santos, and M. Suassuna, "Investigations about Replication of Empirical Studies in Software Engineering: A Systematic Mapping Study," Information and Software Technology, vol. 64, pp. 76-101, 2015.

[15] M. Daneva, D. Damian, A. Marchetto, and O. Pastor, "Empirical Research Methodologies and Studies in Requirements Engineering: How Far did we Come?," Journal of Systems and Software, vol. 95, pp. 1-9, 2014.

[16] T. Dyba and T. Dingsoyr, "Empirical Studies of Agile Software Development: A Systematic Review," Information and Software Technology, vol. 50, pp. 833-859, 2008.

[17] D. Weyns and T. Ahmad, "Claims and Evidence for Architecture-based Self-Adaptation - A Systematic Literature Review," in 7th European Conference on Software Architecture (ECSA), Montpellier, France: Springer, 2013, pp. 249-265.

[18] D. Falessi, P. Kruchten, and G. Cantone, "Issues in Applying Empirical Software Engineering to Software Architectures," in 1st European Conference on Software Architecture, Springer, 2007, pp. 257-262.

[19] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson, "Systematic Mapping Studies in Software Engineering," in Evaluation and Assessment in Software Engineering (EASE 08), Bari, Italy: BCS, 2008, pp. 1-10.

[20] D. Sjoberg, T. Dyba, and M. Jorgensen, "The Future of Empirical Methods in Software Engineering Research," in Future of Software Engineering (FOSE), Minneapolis, MN: IEEE Computer Society, 2007, pp. 358-378.

[21] V. Garousi and J. Fernandes, "Highly-cited Papers in Software Engineering: The Top-100," Information and Software Technology, vol. 71, pp. 108-128, 2016.

[22] C. Wohlin and A. Aurum, "Towards a Decision-making Structure for Selecting a Research Design in Empirical Software Engineering," Empirical Software Engineering, vol. 20, pp. 1427-1455, 2015.

[23] C. Wohlin, A. Aurum, L. Angelis, L. Philips, Y. Dietrich, T. Gorschek, H. Grahn, K. Henningsson, S. Kagstrom, G. Low, P. Rovegard, P. Tomaszewski, C. Van Toorn, and J. Winter, "The Success Factors Powering Industry-Academia Collaboration," IEEE Software, vol. 29, pp. 67-73, 2012.

[24] E. J. Weyuker, "Empirical Software Engineering Research - The Good, The Bad, The Ugly," in International Symposium on Empirical Software Engineering and Measurement, Banff, AB: IEEE Computer Society, 2011, pp. 1-9.
