
Competence Center Corporate Data Quality 36th Workshop

READER

St. Gallen, 10th & 11th October 2013

IWI-HSG Institute of Information Management, University of St. Gallen, Müller-Friedberg-Strasse 8, CH-9000 St. Gallen

Contents
Otto, B., Aier, S.: Business Models in the Data Economy: A Case Study from the Business Partner Data Domain. In: 11th International Conference on Wirtschaftsinformatik, 27th February - 1st March 2013, Leipzig, Germany.

Wlodarczyk, T.W., Rong, C., Thorsen, K.A.H.: Industrial Cloud: Toward Inter-enterprise Integration. In: Jaatun, M.G., Zhao, G., Rong, C. (eds.): CloudCom 2009, pp. 460-471. Springer, Berlin, Heidelberg (2009).

Loshin, D.: Developing a Business Case and a Data Quality Road Map. In: Loshin, D.: A Practitioner's Guide to Data Quality Improvement, pp. 67-90. Morgan Kaufmann, Burlington (2011).

Business Models in the Data Economy: A Case Study from the Business Partner Data Domain
Boris Otto and Stephan Aier
University of St. Gallen, Institute of Information Management, St. Gallen, Switzerland {boris.otto,stephan.aier}@unisg.ch

Abstract. Data management seems to be experiencing a renaissance today. One particular trend in the so-called data economy has been the emergence of business models based on the provision of high-quality data. In this context, the paper examines business models of business partner data providers and explores how and why these business models differ. Based on a study of six cases, the paper identifies three different business model patterns. A resource-based view is taken to explore the details of these patterns. Furthermore, the paper develops a set of propositions that help understand why the different business models evolved and how they may develop in the future. Finally, the paper discusses the ongoing market transformation process indicating a shift from traditional value chains toward value networks, a change which, if it is sustainable, would seriously threaten the business models of well-established data providers, such as Dun & Bradstreet.

Keywords: Business model, Case study, Data quality, Data resource management, Resource-based view

1 Introduction

Recent societal, economic, and technological developments, such as the management and exploitation of large data volumes (big data), the increasing business relevance of consumer data due to the rise of social networks, and the growing attention topics like data quality have received lately, seem to have triggered a renaissance of data management in enterprises. The analyst company Gartner has coined the notion of the data economy [1] in an attempt to introduce a single term subsuming these trends. The term implies viewing data as an intangible good. Research has been examining the transfer of management concepts for physical goods to the domain of intangible goods (such as data) since the 1980s [2], [3]. In parallel, business models have emerged taking up the idea of selling data of high quality.

Sourcing high-quality business partner data is of high relevance particularly for purchasing as well as for sales and marketing departments of large enterprises [4]. For example, reliable and valid business partner data (such as company names, company identifiers, or subsidiary company information) is a necessary prerequisite for doing cross-divisional spend analysis or for pooling purchasing volumes on a company-wide level. The demand for high-quality business partner data has fuelled the emergence of corresponding business models. A prominent example is Dun & Bradstreet (D&B).

While business partner data services have received attention in the practitioners' community for quite some time, research has not taken up the issue to a significant extent so far (a notable exception is the work of Madnick et al. [4]). To date, no comprehensive analysis of business models in the field of business partner data services has been published. The paper at hand addresses this gap in the literature and aims at exploring business models in the business partner data domain. In particular, our research investigates how and why business models of business partner data providers differ.

2 Theoretical Background

2.1 Data as an Economic Good

A clear, unambiguous and widely accepted understanding of the two terms data and information does not exist [5], [6]. One research strand sees information as knowledge exchanged during human communication, whereas another takes an information processing lens, according to which pieces of data are the building blocks of information [7]. The aim of the paper is not to take part in that discussion, but to follow one specific definition, which is to view information as data processed [2]. The value of data is determined by its quality [8]. Data quality is defined as a context-dependent, multidimensional concept [9]. Context dependency means that quality requirements may vary depending on the specific situation data is used in. Multidimensionality refers to the fact that there is no single criterion by which data quality can be fully ascertained. Examples of data quality dimensions are accuracy, availability, consistency, completeness, or timeliness.
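To make this concrete, the following sketch (illustrative only: the dimensions are the ones named above, but all weights and scores are invented) computes a quality score as a weighted mean whose weights encode the usage context:

```python
# Illustrative sketch: data quality as a context-dependent, multidimensional
# concept. Dimension weights encode the usage context; all numbers invented.

DIMENSIONS = ["accuracy", "availability", "consistency", "completeness", "timeliness"]

def quality_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over quality dimensions; weights express the context."""
    total = sum(weights.get(d, 0.0) for d in DIMENSIONS)
    if total == 0:
        return 0.0
    return sum(scores.get(d, 0.0) * weights.get(d, 0.0) for d in DIMENSIONS) / total

record = {"accuracy": 0.9, "availability": 1.0, "consistency": 0.7,
          "completeness": 0.6, "timeliness": 0.4}

# The same record scores differently in different usage contexts:
print(quality_score(record, {"accuracy": 3, "completeness": 3, "timeliness": 1}))
print(quality_score(record, {"timeliness": 3, "availability": 2, "accuracy": 1}))
```

The point of the sketch is only that fitness for use cannot be read off a single number computed the same way for every data consumer.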

2.2 Business Partner Data

Business partner data typically comprises organization data (e.g. company names, addresses, and identifiers, but also industry classification codes), contact data (e.g. telephone numbers and e-mail addresses of companies), and banking information. Madnick et al. [4] have identified three challenges when it comes to managing business partner data in an organization. The first challenge, identical entity instance identification, refers to the problem of identifying certain business partners, as in many cases an unambiguous, unique name or identification number is missing, and one and the same business partner is referred to by several synonyms across the organization. The second challenge, entity aggregation, relates to the problem of knowing about and identifying the parts and subsidiaries a certain business partner consists of. And the third challenge, transparency over inter-entity relationships, becomes relevant if, for example, the overall revenue generated with a certain customer needs to be determined, including direct sales but also third-party sales and reselling.
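As a minimal illustration of the first of these challenges, the sketch below normalizes company names so that trivial synonyms collapse to one key (the names are invented; real providers also match on identifiers, addresses and corporate hierarchies, not on names alone):

```python
# Sketch of identical entity instance identification: strip punctuation,
# case and legal-form tokens so trivial synonyms map to one key.
# Company names are invented; this is not any provider's actual method.
import re

LEGAL_FORMS = {"inc", "incorporated", "corp", "corporation", "ltd",
               "limited", "ag", "gmbh", "co"}

def normalize(name: str) -> str:
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)

records = ["Acme Corp.", "ACME Corporation", "Acme, Inc.", "Acme Ltd. (Berlin)"]
groups: dict[str, list[str]] = {}
for r in records:
    groups.setdefault(normalize(r), []).append(r)

for key, members in groups.items():
    print(key, "->", members)
# "acme" collects the first three records; "acme berlin" stays separate,
# showing why name normalization alone cannot solve the identification problem.
```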


2.3 Business Model Theory

A business model describes how an organization creates value [10], [11]. Business model research typically draws upon three paradigmatic perspectives on business strategy, namely the industrial organization perspective [12], the resource-based view [13], [14], and the strategy process perspective [15], [16]. The industrial organization perspective focuses on external forces that affect the work of managers: substitute products, customers, suppliers, and competitors have an effect on strategic decisions, such as the differentiation of products [17]. The resource-based view states that company-specific sets of resources determine whether a company is able to achieve above-average performance [13], [14]. According to the resource-based view, the characteristics of key resources of companies are value, rareness, inimitability, and non-substitutability (VRIN criteria) [14]. The strategy process perspective, finally, focuses on the managerial function [16].

In the mid-1990s, business models started to receive increasing attention in the scientific community as the first electronic business models emerged [18]. Research at that time was mostly descriptive and analytical in nature. In general, when defining the term business model, many authors referred to a set of concepts representing the underlying meta-model. Each concept can be instantiated differently in a specific business model. Typically, these meta-model concepts were then combined with business model frameworks. More recently, the scientific community has started to provide guidance and support for designing business models. Osterwalder and Pigneur, for example, have proposed a handbook for business model generation [19].

Hedman and Kalling [20] have proposed a business model framework which is built on the three paradigmatic perspectives outlined above. It consists of seven concepts: (1) customers, (2) competitors, (3) offering, (4) activities and organization, (5) resources, and (6) factor and production inputs, plus a longitudinal process component covering the dynamics of the business model over time, which is referred to as (7) scope of management.

3 Research Design

3.1 Overview

The paper aims at investigating business models in the business partner data domain. For this purpose, case study research was chosen as the underlying research method, as this form of research allows examining contemporary phenomena in their real-world context at an early stage of research [21-23]. The course of the research follows the five guiding points proposed by Yin [21], namely (i) research question, (ii) research propositions, (iii) unit of analysis, (iv) logic which links the data to the propositions, and (v) criteria for interpreting the findings.

As outlined in Section 1, the paper investigates the (i) research question of how and why business models in the business partner data domain differ. The case study explores a phenomenon which is still relatively unaddressed and for which only limited theoretical knowledge exists. Yin [21] concedes that in such exploratory cases sound theoretical (ii) research propositions are hardly available. However, he recommends designing a conceptual framework that guides the investigation. Section 3.2 describes the conceptual framework used in this paper.

A clear definition of the (iii) unit of analysis is important for determining the validity and generalizability of case study results, as it sets the boundaries of the scope of the analysis. In this paper, the unit of analysis is the domain of business models of business partner data providers. The conceptual framework also works as the (iv) logic which links the data to the propositions: it forms a lens through which the individual cases can be studied and compared. Finally, (v) criteria for interpreting the findings are derived from the theoretical foundations of business model research, particularly by taking a resource-based view. The interpretation of findings results in propositions on design patterns for business models in the business partner data domain.

3.2 Conceptual Framework

The paper's main goal is not to advance business model theory in general, but to use existing business model research as a lens to study observable business models in a particular domain, namely business partner data services. In order to systematically describe and analyze the cases, the paper uses the business model framework proposed by Hedman and Kalling [20] (see Section 2.3) as its conceptual framework. This model was chosen for two reasons. First, it is the result of a comprehensive analysis of the literature on business models. Second, it combines the three paradigmatic perspectives on business strategy. Hence, Hedman and Kalling's business model framework is well suited to explore the research questions addressed in this paper.

3.3 Case Selection

The case selection process consisted of two steps. The first step used a focus group to determine the most relevant business partner data providers from a practitioners' perspective. In general, focus groups are an adequate research method for examining the level of consensus within a certain community [24]. The focus group met on February 3, 2011, in Ittingen, Switzerland. Participants were 28 enterprise data managers from large multinational organizations. They were presented an overview of business models of business partner data providers and were then asked (among other things) to identify the four most relevant players from a list of 24 well-known data providers. Criteria in the selection process referred to the conceptual framework and included, for example, the offering (availability of consulting services), resources (expertise in the domain), and the scope of management (global or regional). The participants chose Avox, BvD, D&B, and InfoGroup OneSource as the four most important providers, so these four were selected for the case study. In a second step, the list of four was extended by two more players who had entered the market only shortly before, namely Factual and Infochimps. These two providers were chosen following the principle of theoretical replication [22], i.e. predicting contradictory results compared to the four pre-selected cases.


3.4 Data Collection and Analysis

Data was collected from multiple sources. Collection began with publicly available information, such as annual reports and information provided on websites. Furthermore, the companies were contacted via e-mail and telephone and asked for more detailed information on their service offerings. Main contact persons included the head of Business Intelligence & Key Account Management at D&B in Switzerland, a regional sales manager at BvD, and the Chief Operating Officer at Avox. Data analysis used the conceptual framework presented in Section 3.2 as a theoretical lens to link the data to the different concepts of the business model framework. In the case of Avox, for example, the interview protocols, documents from the public domain (e.g. press releases and website information) as well as internal presentations on the Avox business model were analyzed according to Hedman and Kalling's framework. Section 4 presents the results of the case analysis.

4 Business Models of Business Partner Data Providers

4.1 Business Models of the Case Study Companies

Avox is a provider of business partner data (i.e. names, addresses, chamber of commerce numbers, ownership structures, etc.) of legal entities companies do business with. Avox specializes in business partner data relevant for the financial services industry. The data is stored in a central database which is fed by three main sources: (i) third-party data vendors (such as the Financial Times), (ii) companies providing information about themselves (such as annual reports, chamber of commerce information, or website information), and (iii) customers providing updates. Thus, Avox customers do not only receive business partner data, they also contribute to the Avox database, typically on a weekly basis. Avox offers business partner data via three different services: (i) basic subsets of business data records are offered for free via wiki-data; (ii) access to the Avox database for more comprehensive data is granted at a regular fee; and (iii) customer-specific services are offered at individually agreed prices.

BvD is a provider of business partner data and related software solutions. BvD's service portfolio is threefold. First, there is a database solution which basically offers access to the central database. Second, the company provides so-called catalysts for the specific needs of procurement or compliance departments, for example. Third, custom-made consulting services are offered for business partner data integration with customers' enterprise systems, such as SAP or salesforce.com. BvD's core activities comprise processing and combining data from more than one hundred different sources, linking this data, and extending it with ownership and contact information from its own research activities. The pricing model is based on both subscription and usage fees and also includes individual arrangements for customer-specific services.

D&B operates a database of approximately 177 million business entity records from more than 240 countries. D&B maintains the nine-digit D-U-N-S number each organization in the database is assigned. The D-U-N-S number is used by purchasing, sales, and marketing departments of customers for identifying, organizing, and consolidating information about business partners and for linking data about suppliers, customers, and trading partners. The D&B pricing model includes subscription and usage fees, licensing components, and customer-specific fees for services.

Factual provides open data to developers of web and mobile applications. The service was initially offered for free. After the initialization phase, the service is now charged per data set, for example; optionally, a flat rate can be booked, and large customers pay individually agreed fees. A special aspect of Factual's business model is that these fees depend on different aspects, such as the number of edits and contributions from a customer's community to the Factual database (i.e. the company grants discounts which increase with the number of edits and contributions), customer-specific requirements for API service levels (such as response times and uptimes for technical support), the volume of page views or active users, the types of data sets accessed, and unencumbered data swaps (such as crosswalking IDs). Besides business partner data, Factual offers a variety of other, continuously growing datasets.

Infochimps provides business partner data that is created both by Infochimps itself and by the user community. A small number of data sets are available for free; for all other data sets a fee has to be paid. Infochimps charges a commission fee for brokering data sets provided by users. Infochimps offers four different pricing models depending on the use of APIs per hour and per month. Infochimps does not limit its offering to the business partner data domain, but offers a variety of other data records as well, such as NFL football statistics. One business partner data set is titled "International Business Directoy" [sic]. It contains addresses of 561,161 businesses and can be purchased at a price of USD 200. In case customers cannot find the data required, Infochimps offers retrieval on a case-by-case basis.

InfoGroup OneSource offers business partner data on 17 million companies and 23 million business executives on a global level. A key business process is enriching data from a variety of different external sources. The OneSource LiveContent platform combines data from over 50 data suppliers and thousands of other data sources. The data is delivered over the web, through integration into Customer Relationship Management (CRM) systems, and via information portals. Moreover, OneSource delivers data on a data-as-a-service basis to salesforce.com users. OneSource charges subscription fees starting at EUR 10,000 p.a.

Table 1 uses the conceptual framework introduced above to compare the business models of the six business partner data providers included in the case study.


Table 1. Business Models of the Case Study Companies


| | Avox | BvD | D&B | Factual | Infochimps | InfoGroup OneSource |
|---|---|---|---|---|---|---|
| Customers | n/a | 6,000 clients, 50,000 users | 100,000 from various industries | n/a | n/a | Several thousands |
| Competitors | Interactive Data, SIX Telekurs | D&B, among others | BvD, among others | Similar offering as Infochimps | Similar offering as Factual | D&B, among others |
| Offering | One million entities, three service types, web services | 85 million companies, data and software support, web services, sales force | 177 million business entities, data and related services, web services, sales force | Open data platform, API use for free or at a charge | 15,000 data sets, open data platform, four different pricing models, web service | 18 million companies, 20 million executives, data and software, web service |
| Activities and organization | Data retrieval, analysis, cleansing and provision | Monitoring of mergers and acquisitions, data analysis and provision | Data collection and optimization, provision of quality data services | Data mining, data retrieval, data acquisition from external parties | Data collection, infrastructure development, hosting, and distribution | Selection of content providers, data collection, data blending, data updates |
| Resources | 38 analysts to verify and cleanse data, central database | 500 employees in 32 offices, central database (ORBIS) | More than 5,000 employees, central database | 21 employees, central open data platform | Less than 50 employees, central data platform | 104 employees |
| Factor and production inputs | Third-party vendors, official data sources, customers | More than 100 different data sources | Official sources, partnering, contact to companies | Open data community | Open data community | 50 world-class suppliers, 2,500 data sources |
| Scope of management | International coverage, co-creation, partnering | Global coverage, alliances, data, software, consulting | Global coverage | Start-up company | Start-up company | Global coverage |

4.2 Resource Perspective

Resources play a key role in the development and maintenance of business models. Drawing upon the VRIN criteria, six key resources can be identified as relevant for the specific business models of business partner data providers (see Table 2).


Table 2. Key Resources for Business Models of Business Partner Data Providers
| Resource | Valuable | Rare | Inimitable | Non-substitutable |
|---|---|---|---|---|
| Labor | Yes | No | No | No |
| Expertise and Knowledge | Yes | Yes | No | Yes |
| Database | Yes | Yes | No | Yes |
| Information Technology and Procedures | Yes | No | No | No |
| Network Access and Relationships | Yes | Yes | Yes | Yes |
| Capital | Yes | Yes | No | No |

Labor is used primarily to collect and analyze data. D&B, for example, employs thousands of people to retrieve business partner data from chambers of commerce and other public data sources. As no special skills are needed to perform this task, labor is considered an imitable resource. Expertise and Knowledge refers to how business partner data is actually used, how business processes for creating and maintaining business partner data are designed, and how typical data quality problems are dealt with in customer organizations. Similar to labor, this expertise and knowledge is imitable, as domain expertise is available both in the practitioners' and the research community [4]. A Database is a resource which is valuable, rare and non-substitutable. The data itself, however, is imitable, in particular because business partner data mainly refers to company names and addresses, subsidiary company information, and the legal form, i.e. data which is available in the public domain. Information Technology and Procedures (e.g. an electronic platform through which business partner data is accessible for customers and which offers data aggregation and cleansing procedures) is valuable but does not meet any other VRIN criteria. Network Access and Relationships is of particular importance, as all cases depend on access to external data sources, such as chambers of commerce (D&B) or customers (Avox). This resource is the only one that meets all four VRIN criteria. Finally, Capital is a resource which is valuable and rare, but neither inimitable nor non-substitutable.
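Read as data, Table 2 is simply a predicate over resources; the sketch below encodes the table directly and filters for resources meeting all four VRIN criteria:

```python
# Direct encoding of Table 2:
# resource -> (valuable, rare, inimitable, non-substitutable)
VRIN = {
    "Labor":                                 (True, False, False, False),
    "Expertise and Knowledge":               (True, True,  False, True),
    "Database":                              (True, True,  False, True),
    "Information Technology and Procedures": (True, False, False, False),
    "Network Access and Relationships":      (True, True,  True,  True),
    "Capital":                               (True, True,  False, False),
}

strategic = [name for name, flags in VRIN.items() if all(flags)]
print(strategic)  # ['Network Access and Relationships']
```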

5 Case Analysis

5.1 Business Model Patterns

The analysis of the business models presented in the case study reveals a number of similarities between the cases investigated. The biggest similarity refers to the data providers' core activities, which mainly consist of retrieving and collecting data, consolidating it, and then providing it to their customers. Moreover, the companies use similar pricing model elements, ranging from subscription and usage fees to customer-specific service fees. However, there are also significant differences. One main difference relates to the way the companies examined stand in relation to other actors in the network they are embedded in. As a result of the analysis, three business model patterns can be identified (see Figure 1).

Pattern I depicts the traditional buyer-supplier relationship between data consumers and data providers. A typical instantiation of this pattern can be found at D&B, for example. The flow of data is unidirectional, and so is the flow of money. Pattern II, in contrast, uses community sourcing principles and shows bidirectional flows of data [25], [26]. In this pattern, data consumers provide data back to a common platform, and so they become prosumers [27]. The more they contribute, the more discounts they get on their fee as data consumers. This mechanism can be found at Avox and Infochimps, for example. Pattern III relies mainly on crowd sourcing mechanisms [28]. Here, the data provider collaborates with parties supplying data who are not necessarily data consumers at the same time.
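The Pattern II fee mechanism can be sketched as follows; the cases only state that discounts grow with contributions, so the base fee, discount rate, and cap below are purely hypothetical:

```python
# Hypothetical Pattern II pricing: data consumers who contribute data back
# ("prosumers") earn a discount that grows with their contributions.
# All parameters are invented for illustration.

def pattern2_fee(base_fee: float, contributions: int,
                 discount_per_contribution: float = 0.001,
                 max_discount: float = 0.5) -> float:
    discount = min(contributions * discount_per_contribution, max_discount)
    return base_fee * (1.0 - discount)

print(pattern2_fee(10_000.0, 0))      # pure consumer: 10000.0
print(pattern2_fee(10_000.0, 300))    # active prosumer: 7000.0
print(pattern2_fee(10_000.0, 2_000))  # discount capped: 5000.0
```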

Fig. 1. Business Model Patterns

While all business models of the data providers under investigation rely on the provision of data by third parties to a certain extent, the business models that can be related to Pattern III are completely based on the principles of crowd sourcing. Both community sourcing and crowd sourcing have their roots in innovation management and its goal to include users and customers in the research and development process, and so the terms are often used synonymously. The paper, however, makes a distinction between the two terms by looking at the actual sources. Whereas Pattern II uses data from a clearly defined community, namely customers, Pattern III does not pose any restrictions at all as long as providers of data comply with existing laws and terms and conditions. Moreover, the community sourcing approach is closely related to ensuring and improving the quality of the data in terms of data accuracy and consistency. Crowd sourcing concepts typically are related to data quality only in terms of data availability.


5.2 Resource Allocation Patterns

To further explore the different business model patterns, a resource-based view is taken regarding the companies presented in the case study. The analysis focuses on the differences occurring in the allocation of the six resources introduced in Section 4.2. Figure 2 shows the results of this analysis.

Fig. 2. Resource Allocation in the Case Study Companies

Traditional data providers, such as BvD, D&B, and InfoGroup OneSource, are characterized by extensive allocation of resources in terms of Labor, Database, and Capital, but only medium allocation with regard to Network Access and Relationships (although D&B, for example, employs about 5,000 people, by far more than any other competitor). In contrast, the business models of Factual and Infochimps rely on Network Access and Relationships to a major extent, although neither employs a lot of staff nor has sound Expertise and Knowledge in the business partner domain. As a consequence, both data providers use crowd sourcing mechanisms to enhance their databases. Avox takes an intermediate position when it comes to the allocation of resources. Avox's strongest resource is Expertise and Knowledge regarding a specific domain, namely business partner data for the financial industry.

6 Interpretation of Case Study Findings

6.1 Business Model Framework

Taking a resource-based view helps explain why the six business partner data providers under examination use different business models. For example, being a de-facto monopolist, D&B was able to develop adequate resources to acquire and manage business partner data over decades. These resources, i.e. mainly Labor and Database, have allowed D&B to broadly diversify its offering in terms of scope, quality, and price of services. D&B's ability to differentiate works as an entry barrier for new competitors. Since D&B is able to achieve high allocation of almost all of its key resources, new entrants into the business partner data market are forced to find ways of extending their own resource base.

Two approaches to extending one's resource base can be identified. Pattern II (community sourcing), as used by Avox, for example, represents a rather conservative approach, with customers contributing to the service provider's resources. This approach is appropriate if data providers are able to leverage existing customer relationships in related areas of business (the financial industry with a European focus in the case of Avox). A more radical extension of the resource base can be observed in business models following Pattern III (crowd sourcing), as used by Factual, for example. As a start-up company, Factual did not have any access to data via internal databases or existing customers, but had to build up its resources from scratch. The downside for providers of business partner data services following Pattern II and Pattern III is that, although having successfully entered what was until then a de-facto monopoly market, they are limited in their offerings (data on certain industries only, or data from customers only, for example) and in the quality of the data they provide (community sourced or crowd sourced data is difficult to manage).

Fig. 3. Business Model Framework for Business Partner Data Providers

Exploring the situation of D&B, Avox, and Factual as typical examples of Patterns I, II, and III, respectively, the paper proposes a business model framework (Figure 3) for business partner data providers. The framework comprises five discrete dimensions: pricing (premium pricing vs. budget pricing), quality (managed data vs. unmanaged data), sourcing (self-sourcing vs. crowd sourcing), market share (high vs. low), and offering (broad vs. niche). As the first three dimensions (pricing/quality/sourcing) correlate, they can be combined into one single dimension. The same is true for the two other dimensions (market share and offering), although in a more differentiated sense: while a niche provider, although strong in its niche, has a low overall market share, a low market share does not necessarily point to a niche provider but may also be the result of an early stage of market penetration.

Figure 3 illustrates the current positions of D&B, Avox, and Factual in the framework, which consists of four quadrants: niche provider, new market entrant, well-established crowd-sourcer, and well-established traditional provider. The labeling of the quadrants takes into account the dynamics of the market and potential development paths the market participants may follow. As far as Factual is concerned, the position in the lower left quadrant (new market entrant), indicating a low market share and low-quality, low-cost data, is highly unlikely to be sustainable. Therefore, the necessary development for Factual is to increase its market share in order to create new opportunities for more differentiated pricing models and active data management. Avox, as a niche provider, and D&B, as a well-established traditional provider, have no immediate need to change their respective business models. This, however, only holds true in a stable environment (i.e. if there are proper niches to occupy and if there is limited competition in the premium segment, respectively). Relying on a single niche may be dangerous for Avox, as specialized knowledge may become generally available or may lose its value in the future. Therefore, it may be an option for Avox to leverage its expertise in exploiting one niche segment and increase its market share by addressing further niches or extending its offering to existing customers (by means of mergers and acquisitions, for example).

Moreover, taking a resource-based view shows that there are not many key resources that are valuable, rare, inimitable and non-substitutable at the same time. In fact, Network Access and Relationships is the only key resource that meets all four criteria. In this regard, the well-established provider (D&B) has a rather weak position as far as the size of its network is concerned. At the same time, Factual, as a new entrant to the market, currently has the largest network and may be able to further improve its position regarding its other key resources. If this happened, Factual's business model would become a game changer, since Factual would be able to make similar offerings as D&B (managed data, for example) at much lower prices, thanks to its completely different cost structure. This would even affect the basic layout of the business model framework presented above, as the correlation of the framework dimensions would then become unstable. Furthermore, it is questionable whether D&B would be able to imitate this network resource, since that would require significantly different competencies and a different scope of management.
Apart from that, the business partner data domain includes both companies representing the value chain paradigm (D&B, for example) and companies representing the value network paradigm (Factual, for example) [29]. Value networks leverage positive network effects [30], i.e. each new member of the network increases the value of the network for all members. A value network may increase value and reduce costs at the same time, and thus create winner-takes-all situations through a bandwagon effect [29].
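One standard way to make the network-effect claim concrete (an illustration in the spirit of [30], not a formula from the paper) is to count potential pairwise connections:

```latex
% With n members and pairwise interactions, the number of potential links is
V(n) \propto \binom{n}{2} = \frac{n(n-1)}{2}
% so value grows roughly quadratically in n while the cost of serving one
% additional member grows only linearly -- the asymmetry behind
% winner-takes-all ("bandwagon") dynamics.
```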

6.2 Research Propositions

From the findings of the case study and the conclusions drawn with the help of the business model framework, a set of propositions can be identified (see Table 3). These propositions help understand the current business models of business partner data providers and outline their potential future development. Furthermore, the propositions lay the groundwork for future research.
Table 3. Propositions on Business Models for Business Partner Data Providers
| Proposition | Description | Supported by the case of |
|---|---|---|
| P1 | New market entrants follow a growth strategy. | Factual, Infochimps |
| P2a | New market entrants choose either a niche strategy focusing on high-quality data (community sourcing) or a general strategy focusing on lower-quality data (crowd sourcing). | Avox, Factual, Infochimps |
| P2b | Whether a niche strategy or a general strategy is chosen depends on having access to a niche community. | Avox |
| P3 | Only a strong market position allows business partner data providers to differentiate their product portfolios and their pricing models. | BvD, D&B |
| P4a | A strong market position may be achieved both by focusing on budget-priced community data and by focusing on managed high-quality data. | Factual, Infochimps, D&B |
| P4b | A strong market position may not be achieved by focusing on niche data. | Avox |
| P5 | Community sourcing and even crowd sourcing will be a relevant approach in times of increasing cost competition. | Avox, Factual, Infochimps |
| P6 | If a new market entrant successfully creates significant network effects by turning a value chain industry into a value network industry, this transformation will be irreversible and mandatory to follow for its competitors. | Avox, D&B, Factual, Infochimps |

7 Conclusion

The paper addresses two research questions with regard to business models of business partner data providers. First, it explores how these business models differ. The case study results imply that business models follow one of three different business model patterns: traditional buyer-supplier relationship, community sourcing, or crowd sourcing. These patterns differ mainly with regard to the instantiation of three business model concepts, namely activities and organization, resources, and factor and production inputs. Second, the paper examines why business models of business partner data providers differ. Adopting a resource-based view, the paper develops a business model framework in which business partner data providers can be positioned. Moreover, the paper identifies a set of propositions that help understand why these different business models evolved and how they may develop in the future.

The paper contributes to the scientific body of knowledge as it is among the first endeavors to address business models in the business partner data domain, a topic of high relevance but still scarcely examined in the field of information systems research. Case description and analysis are grounded in theory and lead to a set of propositions. The paper may also benefit the practitioners' community. The analysis of the business models, together with the business model patterns that have been identified, may help business partner data providers reflect on their strategy and develop it further. Business partner data consumers may benefit from the findings by gaining a better understanding of the supply side of the market.

Limitations of the paper derive mainly from the nature of case study research as a method of qualitative research. The paper is a first explorative step to deepen the understanding of business models in the business partner data domain. To achieve more theoretical robustness, by elaborating on the causal relationships underlying the propositions and by testing these propositions, further qualitative but also quantitative research is required. For example, the business model patterns may be triangulated with business models of other data providers.

Acknowledgement
The research presented in this paper was partially funded by the European Commission under the 7th Framework Programme in the context of the NisB (The Network is the Business) project (Project ID 256955).

References
1. Newman, D.: How to Plan, Participate and Prosper in the Data Economy. Gartner, Stamford, CT (2011)
2. Wang, R.Y.: A Product Perspective on Total Data Quality Management. Communications of the ACM 41, 58-65 (1998)
3. Goodhue, D.L., Quillard, J.A., Rockart, J.F.: Managing The Data Resource: A Contingency Perspective. MIS Quarterly 12, 373-392 (1988)
4. Madnick, S., Wang, R., Zhang, W.: A Framework for Corporate Householding. In: 7th International Conference on Information Quality, Cambridge, MA, 36-46 (2002)
5. Badenoch, D., Reid, C., Burton, P., Gibb, F., Oppenheim, C.: The value of information. In: Feeney, M., Grieves, M. (eds.): The Value and Impact of Information, pp. 9-75. Bowker-Saur, London (1994)
6. Boisot, M., Canals, A.: Data, information and knowledge: have we got it right? Journal of Evolutionary Economics 14, 43-67 (2004)
7. Oppenheim, C., Stenson, J., Wilson, R.M.S.: Studies on Information as an Asset I: Definitions. Journal of Information Science 29, 159-166 (2003)
8. Even, A., Shankaranarayanan, G.: Utility-driven assessment of data quality. ACM SIGMIS Database 38, 75-93 (2007)
9. Wang, R.Y., Strong, D.M.: Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems 12, 5-34 (1996)
10. Timmers, P.: Business Models for Electronic Markets. Electronic Markets 8, 3-8 (1999)
11. Alt, R., Zimmermann, H.-D.: Business Models. Electronic Markets 10, 3-9 (2001)
12. Bain, J.S.: Industrial Organization. Wiley, New York, NY (1968)
13. Wernerfelt, B.: A Resource-Based View of the Firm. Strategic Management Journal 5, 171-180 (1984)
14. Barney, J.: Firm Resources and Sustained Competitive Advantage. Journal of Management 17, 99-120 (1991)
15. Mintzberg, H.: Patterns in Strategy Formation. Management Science 24, 934-948 (1978)
16. Ginsberg, A.: Minding the Competition: From Mapping to Mastery. Strategic Management Journal 15, 153-174 (1994)
17. Porter, M.E.: Competitive Strategy: Techniques for Analyzing Industries and Competitors. The Free Press, New York (1980)
18. Zott, C., Amit, R., Massa, L.: The Business Model: Theoretical Roots, Recent Developments, and Future Research. IESE Business School, University of Navarra, Barcelona, Spain (2010)
19. Osterwalder, A., Pigneur, Y.: Business Model Generation. Wiley, Hoboken, NJ (2010)
20. Hedman, J., Kalling, T.: The business model concept: theoretical underpinnings and empirical illustrations. European Journal of Information Systems 12, 49-59 (2003)
21. Yin, R.K.: Case Study Research: Design and Methods. Sage Publications, Thousand Oaks, CA (2002)
22. Benbasat, I., Goldstein, D.K., Mead, M.: The Case Research Strategy in Studies of Information Systems. MIS Quarterly 11, 369-386 (1987)
23. Eisenhardt, K.M.: Building Theories from Case Study Research. Academy of Management Review 14, 532-550 (1989)
24. Morgan, D.L., Krueger, R.A.: When to Use Focus Groups and Why? In: Morgan, D.L. (ed.): Successful Focus Groups, pp. 3-19. Sage, Newbury Park, CA (1993)
25. Linder, J.C., Jarvenpaa, S., Davenport, T.H.: Toward an Innovation Sourcing Strategy. MIT Sloan Management Review 44, 43-51 (2003)
26. von Hippel, E.: Innovation by User Communities: Learning from Open-Source Software. MIT Sloan Management Review 42, 82-86 (2001)
27. Kotler, P.: The Prosumer Movement: A New Challenge for Marketers. In: Lutz, R.J. (ed.): Advances in Consumer Research, Vol. 13, pp. 510-513. Association for Consumer Research, Provo, UT (1986)
28. Doan, A., Ramakrishnan, R., Halevy, A.Y.: Crowdsourcing Systems on the World-Wide Web. Communications of the ACM 54, 86-96 (2011)
29. Fjeldstad, Ø.D., Haanæs, K.: Strategy Tradeoffs in the Knowledge and Network Economy. Business Strategy Review 12, 1-10 (2001)
30. Katz, M.L., Shapiro, C.: Network Externalities, Competition, and Compatibility. American Economic Review 75, 424-440 (1985)


7. Oppenheim, C., Stenson, J., Wilson, R.M.S.: Studies on Information as an Asset I: Definitions. Journal of Information Science 29, 159-166 (2003) 8. Even, A., Shankaranarayanan, G.: Utility-driven assessment of data quality. ACM SIGMIS Database 38, 75-93 (2007) 9. Wang, R.Y., Strong, D.M.: Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems 12, 5-34 (1996) 10. Timmers, P.: Business models for Electronic Markets. Electronic Markets 8, 3-8 (1999) 11. Alt, R., Zimmermann, H.-D.: Business Models. Electronic Markets 10, 3-9 (2001) 12. Bain, J.S.: Industrial organization. Wiley, New York, NY (1968) 13. Wernerfelt, B.: A Resource Based View of the Firm. Strategic Management Journal 5, 171-180 (1984) 14. Barney, J.: Firm Resources and Sustained Competitive Advantage. Journal of Management 17 ,99-120 (1991) 15. Mintzberg, H.: Patterns in strategy formation. Management Science 24, 934948 (1978) 16. Ginsberg, A.: Minding the Competition: From Mapping to Mastery. Strategic Management Journal 15, 153174 (1994) 17. Porter, M.E.: Competitive Strategy: Techniques for Analysing industry and Competitors. The Free Press, New York (1980) 18. Zott, C., Amit, R., Massa, L.: The Business Model: Theoretical Roots, Recent Developments, and Future Research. IESE Business School - University of Navarra, Barcelona, Spain (2010) 19. Osterwalder, A., Pigneur, Y.: Business Model Generation. Wiley, Hoboken, NJ (2010) 20. Hedman, J., Kalling, T.: The business model concept: theoretical underpinnings and empirical illustrations. European Journal of Information Systems 12, 49-59 (2003) 21. Yin, R.K.: Case study research: design and methods. Sage Publications, Thousand Oaks, CA (2002) 22. Benbasat, I., Goldstein, D.K., Mead, M.: The Case Research Strategy in Studies of Information Systems. MIS Quarterly 11, 369-386 (1987) 23. Eisenhardt, K.M.: Building Theories from Case Study Research. Academy of Management Review 14 (1989) 532-550 24. Morgan, D.L., Krueger, R.A.: When to use Focus Groups and why? In: Morgan, D.L. (ed.): Successful Focus Groups. Sage, Newbury Park, CA, 3-19 (1993) 25. Linder, J.C., Jarvenpaa, S., Davenport, T.H.: Toward an Innovation Sourcing Strategy. MIT Sloan Management Review 44, 43-51 (2003) 26. von Hippel, E.: Innovation by User Communities: Learning from Open-Source Software. MIT Sloan Management Review 42, 82-86 (2001) 27. Kotler, P.: The Prosumer Movement: A New Challenge for Marketers. In: Lutz, R.J. (ed.): Advances in Consumer Research, Vol. 13. Association for Consumer Research, Provo, UT, 510-513 (1986) 28. Doan, A., Ramakrishnan, R., Halevy, A.Y.: Crowdsourcing Systems on the World-Wide Web. Communications of ACM 54, 86-96 (2011) 29. Fjeldstad, .D., Haans, K.: Strategy Tradeoffs in the Knowledge and Network Economy. Business Strategy Review 12, 1-10 (2001) 30. Katz, M.L., Shapiro, C.: Network Externalities, Competition, and Compatibility. American Economic Review 75, 424 (1985)


Industrial Cloud: Toward Inter-enterprise Integration


Tomasz Wiktor Wlodarczyk, Chunming Rong, and Kari Anne Haaland Thorsen
Department of Electrical Engineering and Computer Science, University of Stavanger, N-4036 Stavanger, Norway {tomasz.w.wlodarczyk,chunming.rong,kari.a.thorsen}@uis.no

Abstract. The industrial cloud is introduced as a new inter-enterprise integration concept in cloud computing. The characteristics of an industrial cloud are given by its definition and architecture and compared with other general cloud concepts. The concept is then demonstrated by a practical use case, based on Integrated Operations (IO) on the Norwegian Continental Shelf (NCS), showing how an industrial digital information integration platform gives a competitive advantage to the companies involved. Further research and development challenges are also discussed.

Keywords: cloud computing, integrated operations.

1 Introduction
The increasing amount of industrial digital information requires an integrated industrial information platform to exchange, process and analyze incoming data, and to consult related information (e.g. historical data or data from other connected components) in order to obtain an accurate overview of the current operation status for a consequent decision. Collected information often crosses the disciplines in which it originated. The challenge is to handle it in an integrated, cost-effective, secure and reliable way. An enterprise may use its existing organizational structure for information classification. However, as collaborations often exist across enterprises, information flow that crosses enterprise boundaries must be facilitated. Earlier attempts have been made within single enterprises, but industry-wide collaboration poses more challenges. Existing general solutions such as the information grid [1] are not adequate to deal with the complexity of these challenges.

Recently, there have been many discussions on what cloud is and is not [2-14]. Potential adopters have also been discussed [4, 7]. However, most solutions mainly focus on small and medium-size companies that adopt what is called a public cloud. Adoption of the public cloud by large companies has been discussed, but there were significant obstacles, mainly related to security. Some of them are answered by what is called a private cloud. In this paper, the industrial cloud is introduced as a new inter-enterprise integration concept in cloud computing to solve the stated problem. Both the definition and the architecture of an industrial cloud are given and compared with general cloud characteristics. By extending existing cloud computing concepts, we propose a solution that may provide convenient, integrated and cost-effective adaptation.
M.G. Jaatun, G. Zhao, and C. Rong (Eds.): CloudCom 2009, LNCS 5931, pp. 460-471, 2009. Springer-Verlag Berlin Heidelberg 2009


These advantages are recognized in a large-scale industrial collaboration project, Integrated Operations (IO) [16], in which a central element is to establish an inter-enterprise digital information integration platform for members of OLF on the Norwegian Continental Shelf (NCS) [15].

The paper consists of five sections. After a short introduction in Section 1, a brief survey of recent efforts on cloud computing is given in Section 2, and a categorization of clouds is proposed to reflect actual business models and to facilitate more precise definition. In Section 3 the concept of the industrial cloud is precisely defined, a generic architecture is proposed and explained, and a practical use case, based on Integrated Operations on the NCS, is provided to show how this industrial digital information integration platform gives a competitive advantage to the companies involved. Existing technologies that are essential parts of the industrial cloud are named and described, and further research and development challenges are discussed at the end of that section. Section 4 provides a compact comparison of the three types of clouds: public, enterprise and industrial. The paper concludes with a summary of the main points.

2 Categories of Clouds
The general goals of cloud computing are to obtain better resource utilization and availability. The concept of cloud computing is sometimes presented as a grouping of various other concepts, especially SaaS, IaaS and HaaS [17], but it has also been defined differently from paper to paper in [2-14], indicating different models of cloud. The differences in the organization and architecture of a cloud are often influenced by the different business models the cloud computing concept is applied to. A division between public and private clouds (and hybrids of the two) can be seen in several publications [8]. In this paper, public and enterprise clouds are identified by the business models they are applied to, viewed from a global perspective.

2.1 Public Cloud

Public cloud is the most common model of cloud, with popular examples such as Amazon Web Services [18] and Google App Engine [19]. One definition of public cloud, given by McKinsey [3], states that clouds are hardware-based services offering compute, network and storage capacity where:

1. Hardware management is highly abstracted from the buyer
2. Buyers incur infrastructure costs as variable OPEX
3. Infrastructure capacity is highly elastic (up or down)

The public cloud is used mainly by small and medium-size companies, very often start-ups, because it offers effortless hardware management and flexibility without any significant entrance costs. Access to the public cloud is realized through the internet. Hardware is owned and managed by an external company, so hardware issues are of no interest to the companies using it. A high degree of hardware utilization is achieved by means of virtualization (other approaches also exist [20]). The platform is generic, usually providing one of the application frameworks or access to standard computing resources.


There is no particular focus on collaboration between applications and no facilitation of reusing data between them. The public cloud features an OpEx (Operational Expenditure) type of billing based on actual usage or on a per-month fee; CapEx (Capital Expenditure) is small and usually none. Security and privacy might be an issue, as data is stored by an external entity. On the other hand, cloud providers might have better focus and bigger resources to address those issues than a small company [21]. Companies have no control over the cloud provider. Therefore, it is important that there are clear policies on data handling and possibly external audit [22]. The public cloud might also raise geopolitical issues because of physical data placement. That is currently solved by separate data centers in different parts of the world [18]; however, it is a questionable solution in the longer term. There is a vendor lock-in threat, resulting in problems with data transfer between cloud vendors. However, that is a bigger issue for users of cloud-based applications than for companies providing services over the cloud.

2.2 Enterprise Cloud

The enterprise cloud focuses not only on better utilization of computing resources, but also on integrating services crucial to a company's operations and thereby on their optimization. A good example here is Cisco's vision [17]. Access to the enterprise cloud is realized mainly through an intranet, but the internet might also be used. Hardware is owned and managed by the enterprise itself. Therefore, hardware issues are still present, although to a lesser extent. Hardware utilization can be improved by means of virtualization; however, this might cover only some parts of the company's datacenter. The platform is designed for a specific purpose and is capable of supporting the company's key operations. There is a strong focus on collaboration between applications and on facilitating the reuse and integration of data between them. The enterprise cloud can be economically beneficial to the company; however, it requires up-front investment and does not offer OpEx-type billing. Control, security and privacy are not an issue (beyond what is required currently), as data is stored by the company itself. What is more, thanks to centralization, the security level might significantly increase [23]. There might be some geopolitical issues in the case of centralization of international operations. There is no significant vendor lock-in threat; dependence on software vendors providing cloud functionalities is more or less the same as with currently used software.

Adoption of the public cloud by large companies or enterprises has also been discussed [3], and there are already some examples of such adoptions [24]. At the same time, many companies do not even consider such a step. In their case, the benefits of the public cloud are too small to counterbalance security, privacy and control risks.

2.3 Beyond Enterprise Cloud

The enterprise cloud seems to be a good solution for integration inside a large company. Nowadays, however, enterprises face additional challenges which result from collaboration with other enterprises in the industry. Such collaboration is necessary to stay competitive, but it requires the introduction of new technological solutions.


Some of the integration and provisioning challenges have already been discussed under the concept of the Information Grid. Notably, Semantic Web solutions were proposed to unify all the data in the company and to view them in a smooth continuum from the Internet to the Intranet [1]. Some authors also proposed integrating resource provisioning [25]. However, the Information Grid model, which focuses on one enterprise only, does not offer a convenient, seamless and integrated approach to practically solving inter-enterprise challenges. It is not only information that is involved: work processes as well as definition, operation and service models also need to be reconciled and collaborated on in a seamless way. Hence, the information grid is only a beginning. Finally, the Information Grid model does not lead to new opportunities in the industry in the way cloud computing does, e.g. by lowering entrance costs for start-ups, which leads to increased competition and innovation.

Therefore, in the next section the industrial cloud is introduced as a new inter-enterprise integration concept in cloud computing. A precise definition is given and then explained by a practical use case.

3 Industrial Cloud
3.1 Definition and Architecture

The industrial cloud is a platform for industrial digital information integration and collaboration. It connects unified data standards and common ontologies with an open and shared architecture in order to facilitate data exchange and service composition between several companies. It should be controlled by an organization in the form of, e.g., a special interest group (SIG) consisting of industry representatives, to ensure the development, evolution and adoption of standards. The SIG should cooperate with an international standardization body. Fig. 1 presents the industrial cloud. It binds together enterprises in the industry and also service companies. Enterprises are the core of the industry; service companies usually provide services to those enterprises and very often participate in more than one industry.

In traditional business-to-business (B2B) systems, metadata and semantics are agreed upon in advance and are encapsulated in the systems. However, the trend is moving towards a more open environment where communicating partners are not known a priori. This demands solutions where the semantics are explicit and standardized [26]. Information management, information integration and application integration require that the underlying data and processes can be described and managed semantically. Collaboration and communication within an industrial cloud depend on a shared understanding of concepts. Therefore, the basic elements of the industrial cloud are unified data standards, common ontologies, an open and shared architecture, and a secure and reliable infrastructure. Unified data standards allow easy data exchange between companies. Common ontologies ensure a shared point of view on the meaning of data. Metadata need to be shared among applications, and it should be possible to semantically describe applications within the cloud. An open and shared architecture is a way to efficiently interconnect participants in the industrial cloud.


[Figure: several enterprise clouds and service company clouds interconnected through the industrial cloud]
Fig. 1. Industrial Cloud

An ontology is a structure capturing semantic knowledge about a certain domain by describing relevant concepts and the relations between these concepts [27, 28]. With a shared ontology it is possible to communicate information across domains and systems, independent of local names and structuring. This enables an automatic and seamless flow of data, where information can be accessed from its original location in the same way as if it were stored locally. Noy et al. [29] point out several reasons to construct and deploy ontologies, e.g. ease of information exchange, making it easier for a third party to extract and aggregate information from diverse systems, and making it easier to change assumptions about the world and analyze domain knowledge. Ontology creation should be a mainly industry-focused process. There is a current and stable trend of moving the construction of metadata from the enterprise to the industry level. A cross-industry approach might be useful; however, it is probably not feasible on a larger scale. In our current work we see that these ontologies have to be hierarchically organized depending on their level of detail. The more general ones will be common in the industry; more detailed ones might stay specific to a particular company or consortium, while still referring to the more general ontologies.

Data standards, together with ontologies acting on an open and shared architecture, allow for easy service composition from multiple providers. A secure and reliable infrastructure builds trust for the platform and between all participants. It should be easy to add new applications to the cloud, and applications should be easy to find based on the services they provide. By providing applications as semantically described web services [30], based on the commonly agreed ontology, it would be easy to search for a particular service among them. Domain knowledge is extracted from the applications, not hard-coded within the systems. It is then easier to provide new services and to automatically interpret the operations provided by these services. An industry can form an industrial cloud in order to enable on-the-fly and automatic outsourcing and subcontracting, lower operation costs, increase the innovation level and create new opportunities for the industry. The cloud approach can be used as a way to ensure an abstraction layer over all underlying technological solutions and integration patterns.
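As a minimal sketch of how a shared ontology decouples local names from common concepts (using rdflib and an invented example namespace; this is not ISO 15926 or any vocabulary from the cases):

```python
# Two systems use different local class names ("Vendor", "Supplier") but map
# them to one shared concept, so a consumer can query the shared vocabulary
# regardless of the source. Namespace and terms are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

SHARED = Namespace("http://example.org/shared#")

g = Graph()
g.add((SHARED.BusinessPartner, RDF.type, RDFS.Class))
for local_name in ("Vendor", "Supplier"):            # local vocabularies
    g.add((SHARED[local_name], RDFS.subClassOf, SHARED.BusinessPartner))

g.add((SHARED.acme, RDF.type, SHARED.Vendor))        # instance from system A
g.add((SHARED.acme, RDFS.label, Literal("Acme Corp.")))

# Query against the shared concept, independent of local naming:
q = """
SELECT ?s ?label WHERE {
  ?cls rdfs:subClassOf <http://example.org/shared#BusinessPartner> .
  ?s a ?cls ; rdfs:label ?label .
}"""
for row in g.query(q):
    print(row.s, row.label)
```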


[Figure: enterprises, agents, and service companies connected through the industrial cloud for information exchange, decision support, and service composition, resting on data formats, ontologies, architecture and infrastructure]
Fig. 2. Integration, collaboration and composition in industrial cloud

The industrial cloud is the missing element that binds and structures existing technologies on the way to practical implementation. Fig. 2 summarizes the main goals of the industrial cloud, that is: information exchange, decision support and service composition.

So far, the industrial cloud has been defined in terms of its general purpose and the technologies used. It is also important to compare it with the already existing types of clouds. In Fig. 3 all three types of cloud are presented in the form of a stack of the functionalities they provide. Looking at current providers of public cloud like Google App Engine [19] or AWS [18], one can see that they offer two basic functions: provisioning (mainly of processing time and storage space), and metering and billing systems for the resources they provide. The public cloud is realized through hardware virtualization (or similar technologies); the cloud provider supplies an API that is later utilized by cloud adopters. The enterprise cloud builds on the foundation of the public cloud. Further, it adds the possibility of administering workflows in the cloud, managing workload, and monitoring that goes further than the simple metering in the public cloud. In this way the enterprise cloud is less general, but at the same time provides better support for large business users. The industrial cloud is created on the basis of the public and enterprise clouds. It features easier hardware provisioning by virtualization, and it offers workflow administration, workload management and monitoring. However, it further facilitates integration tasks like policies, reliability management, security and trust, and outsourcing and subcontracting. It adds support for semantic interpretation of data, mapping, data fusion, and service discovery and composition. Fig. 3 visualizes why the inter-enterprise integration concept introduced in this paper forms part of cloud computing: it builds on already existing cloud models and introduces extensions to them based on the actual needs of industries. With time, some of the new functions in the industrial cloud may migrate into the lower-level clouds.
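The stacking in Fig. 3 can be restated as strictly growing functionality sets (a direct, abbreviated encoding of the figure, not an implementation):

```python
# The three cloud types as cumulative functionality sets, following Fig. 3.
PUBLIC = {"provisioning", "metering", "billing"}
ENTERPRISE = PUBLIC | {"workflow administration", "workload management",
                       "monitoring"}
INDUSTRIAL = ENTERPRISE | {"policy", "reliability management",
                           "security & trust", "outsourcing & subcontracting",
                           "semantic interpretation", "data fusion",
                           "mapping & integration",
                           "service discovery & composition"}

assert PUBLIC < ENTERPRISE < INDUSTRIAL  # each layer strictly extends the one below
print(sorted(INDUSTRIAL - ENTERPRISE))   # what the industrial cloud adds
```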


Fig. 3. Industrial Cloud stacked on Enterprise and Public Cloud (Public Cloud: provisioning, billing and metering; Enterprise Cloud: administration of workflows, workload management, monitoring; Industrial Cloud: policy, reliability management, mapping and integration, data fusion, semantic interpretation, service discovery and composition, security and trust, outsourcing and subcontracting; accessed by enterprises, service companies, authorities, and the SIG through Industrial Cloud Management Services)

3.2 Example from the Integrated Operations in Oil and Gas

The oil and gas industry on the Norwegian Continental Shelf (NCS) has for some years been working on the concept of Integrated Operations (IO). IO aims at supporting the industry in reaching better, faster, and more reliable decisions, and is expected to have a great impact on the information flow between different sub-domains. IO is planned to be implemented in two steps: Generation 1 and Generation 2 (G1 and G2). G1 focuses on the integration of offshore and onshore, real-time simulation, and the optimization of key work processes. G2 integrates the operation centers of operators (enterprises) and vendors (service providers), and focuses on heavy automation of processes and optimization of processes across domains. There are several ongoing research projects related to IO. The biggest, Integrated Operations in the High North (IOHN) [16], embraces several sub-projects focusing on different aspects of IO G2. The suggested technologies rely on an underlying architecture to build upon; the industrial cloud may provide such an architecture. The oil and gas industry is an information and knowledge industry. Data live for decades and need to be shared across different businesses, domains, and applications. By combining data from several sources it is possible to gain more information than if the sources were kept separate; this relies on the ability to semantically recognize the content of data. At present, data are isolated in information silos, and communicating and sharing information often costs man-hours and expenses for mapping data from one structure to another. For example, within the sub-domain of drilling and completion alone there are more than five different communication standards to relate to, e.g., WITSML or OPC-UA. Much of the knowledge and logic is hard-coded within the different applications, and it is difficult to share and transfer data to new or other systems without information loss.


In recent years, ISO 15926 has been developed as an upper-level data integration standard and ontology that could enable data sharing among several companies, and it proved successful in initial tests. With the use of a shared ontology, metadata are extracted from the applications and presented in a way that can be shared more easily among partners. Data are often stored in several places, and over time these copies tend to become inconsistent. Barriers between isolated information domains need to be broken down; there is a need for solutions where data can be accessed directly from the source. The industrial cloud focuses on cross-company application collaboration and will ease the communication and access of data across company and application boundaries. The oil and gas industry has already developed SOIL, an industrial communication network that provides high reliability and independence from other solutions such as the internet. However, SOIL does not offer any collaboration or integration facilities beyond a secure network connection. The industry consists of many companies, both small and large. Service companies providing services to several operators spend much time on integration with the operators' systems. With an underlying cloud architecture, service providers can offer new services to the cloud as a whole, without the need for tailored integration with each operator. The industrial cloud could serve as a platform for the actual delivery of Integrated Operations on the NCS: it is capable of providing easy, abstracted access to all the aforementioned technological solutions, integrating them into one efficient and simple product.

3.3 Challenges and Further Work

The industrial cloud can be the solution to the problem of inter-enterprise digital information integration and collaboration. However, there are a few challenges that should be the subject of research and industrial effort when practically implementing the industrial cloud concept. Integration and collaboration require inter-enterprise standardization: different definitions or names for the same concept, different data formats, and different work procedures have to be reconciled. This is usually easier said than done; for example, ISO 15926 is still far from completion after more than ten years of effort with the participation of major actors in the domain. The biggest challenge is security: how can each company's data be secured without impeding collaboration? Multi-level authentication could be a solution, but more development in this field is needed, as proper security solutions will be a key element of the industrial cloud. Other challenges include dealing with many versions of the truth for reasoning purposes, which is a result of the shared environment and the integration of data in many formats; these topics are the subject of current research in the Semantic Web field [31]. Enabling old data to be used in the new environment is also a challenge, and an important one, as companies want to use all the data they already have; there have already been interesting attempts to do so [32]. Communication and possibly synchronization between the industrial cloud and enterprise clouds is not yet solved; similar, though not identical, problems are already being investigated in the form of synchronization between private and public clouds [8]. Finally, as outsourcing can be automatic, there is a need for automated contracting solutions, which have been a topic of recent research [33].


4 Cloud Categories Comparison


The industrial cloud should also be compared with the other types of cloud in terms of how it is implemented, who is using it, and what the problematic issues are; this is summarized in Table 1. In contrast with the public cloud, the industrial cloud is used by large companies together with smaller companies, and in contrast with the enterprise cloud it focuses on collaboration between several companies. Access to the industrial cloud is realized through an extranet or the internet.
Table 1. Cloud categories comparison

Who and why. Public: small and medium companies, to lower hardware maintenance costs. Enterprise: large companies, to integrate internal services. Industrial: large and other companies in one industry, to integrate inter-enterprise collaboration.

Network. Public: internet. Enterprise: intranet (and internet). Industrial: extranet (and internet).

Hardware. Public: external owner; aggressive virtualization. Enterprise: owned by the enterprise; some virtualization. Industrial: many owners; some individual virtualization; cross-company virtualization not probable.

Platform. Public: programming and resources access. Enterprise: supporting integration of operations. Industrial: focused on integration and collaboration.

Applications. Public: various; no collaboration. Enterprise: company specific; collaboration. Industrial: enterprise specific; collaboration and composition.

Economics. Public: OpEx. Enterprise: CapEx. Industrial: CapEx, some OpEx possible.

Security and privacy. Public: might be higher in some aspects, but privacy is a significant problem. Enterprise: security will increase as a result of central enforcement of policies. Industrial: crucial issue; need for top-level security while preserving collaboration.

Control. Public: problem; need for open policies and external audits. Enterprise: not an issue; everything controlled by one company. Industrial: not an issue; controlled by the SIG (collaborating with international standards authorities).

Geopolitics. Public: problem; geographically dependent data centers are only a temporary solution. Enterprise: some issues, but the company should be ready to deal with them. Industrial: some issues, but the industry should be ready to deal with them.

Vendor lock-in. Public: problem; open standards should help. Enterprise: not a problem; everything owned by the company. Industrial: not a significant problem; issue controlled by the SIG.


Hardware is owned and managed independently by many companies; however, some part of the hardware in each company will follow a shared standard of open architecture. Based on that, some companies can provide access to their data centers to other companies, which will improve hardware utilization and also facilitate agent mobility. The platform is designed for the specific purpose and is capable of supporting the industry's key operations, with a strong focus on collaboration between applications and on facilitating the reuse and integration of data between them. Security and privacy are crucial issues, as data must be shared and protected at the same time; because of security and reliability needs, an extranet implementation might very often be advisable. Some geopolitical issues might appear; however, industries are probably already aware of them. The threat of vendor lock-in is not a significant issue as long as the industrial cloud is wisely managed by the SIG; in fact, it might be much smaller than it is currently. The SIG should be organized at the industry level: a cross-industry approach would most probably create many SIGs and thereby jeopardize the standardization process. This should be possible to avoid at the industry level, even though it is definitely a challenge.

5 Summary
In this paper, the industrial cloud has been introduced as a new inter-enterprise integration concept in cloud computing. Both a definition and an architecture of the industrial cloud were given in comparison with general cloud characteristics. The concept was then demonstrated by a practical use case, based on IO on the NCS, showing how an industrial digital information integration platform gives a competitive advantage to the companies involved. The oil and gas industry on the NCS recognizes the great potential value in a full implementation and deployment of the industrial cloud, where integration and collaboration are the key.

References
1. Alonso, O., Banerjee, S., Drake, M.: The Information Grid: A Practical Approach to the Semantic Web, http://www.oracle.com/technology/tech/semantic_technologies/pdf/informationgrid_oracle.pdf
2. Mitra, S.: Deconstructing The Cloud (2008), http://www.forbes.com/2008/09/18/mitra-cloud-computing-tech-enter-cx_sm_0919mitra.html
3. Forrest, W.: McKinsey & Co. Report: Clearing the Air on Cloud Computing (2009), http://uptimeinstitute.org/images/stories/McKinsey_Report_Cloud_Computing/clearing_the_air_on_cloud_computing.pdf
4. Buyya, R., Chee Shin, Y., Venugopal, S.: Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities. In: 10th IEEE International Conference on High Performance Computing and Communications, HPCC 2008 (2008)
5. Douglis, F.: Staring at Clouds. IEEE Internet Computing 13(3), 4–6 (2009)
6. Grossman, R.L.: The Case for Cloud Computing. IT Professional 11(2) (2009)
7. Hutchinson, C., Ward, J., Castilon, K.: Navigating the Next-Generation Application Architecture. IT Professional 11(2), 18–22 (2009)
8. IBM: IBM Perspective on Cloud Computing (2008), http://ftp.software.ibm.com/software/tivoli/brochures/IBM_Perspective_on_Cloud_Computing.pdf
9. Lijun, M., Chan, W.K., Tse, T.H.: A Tale of Clouds: Paradigm Comparisons and Some Thoughts on Research Issues. In: Asia-Pacific Services Computing Conference, APSCC 2008. IEEE, Los Alamitos (2008)
10. Lizhe, W., et al.: Scientific Cloud Computing: Early Definition and Experience. In: 10th IEEE International Conference on HPCC 2008 (2008)
11. Youseff, L., Butrico, M., Da Silva, D.: Toward a Unified Ontology of Cloud Computing. In: Grid Computing Environments Workshop, GCE 2008 (2008)
12. Rayport, J.F., Heyward, A.: Envisioning the Cloud: The Next Computing Paradigm (2009), http://www.marketspaceadvisory.com/cloud/Envisioning_the_Cloud_PresentationDeck.pdf
13. Weinhardt, C., et al.: Business Models in the Service World. IT Professional 11(2), 28–33 (2009)
14. Open Cloud Manifesto (2009), http://www.opencloudmanifesto.org/
15. Map of the Norwegian continental shelf (2004), http://www.npd.no/English/Produkter+og+tjenester/Publikasjoner/map2003.htm
16. Integrated Operations in the High North, http://www.posccaesar.org/wiki/IOHN
17. Gore, R.: The experience of Web 2.0 Communications and collaboration tools in a global enterprise - The road to 3.0 (2009), http://www.posccaesar.org/svn/pub/SemanticDays/2009/Session_1_Rich_Gore.pdf
18. Amazon Web Services, http://aws.amazon.com
19. Google App Engine, http://code.google.com/appengine/
20. Perilli, A.: Google fires back at VMware about virtualization for cloud computing (2009), http://www.virtualization.info/2009/04/google-fires-back-at-vmware-about.html
21. Have You Adopted Small Business Cloud Computing? (2009), http://www.smallbusinessnewz.com/topnews/2009/02/04/have-you-adopted-small-business-cloud-computing
22. Gartner: Seven cloud-computing security risks (2008), http://www.infoworld.com/d/security-central/gartner-seven-cloud-computing-security-risks-853
23. Should an organization centralize its information security division? (2006), http://searchsecurity.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid14_gci1228539,00.html
24. Google Apps makes its way into big business (2009), http://www.computerweekly.com/Articles/2008/06/24/231178/google-apps-makes-its-way-into-big-business.htm
25. Taylor, S., Surridge, M., Marvin, D.: Grid Resources for Industrial Applications. In: IEEE International Conference on Web Services (2004)
26. Aassve, Ø., et al.: The SIM Report - A Comparative Study of Semantic Technologies (2007)
27. Antoniou, G., van Harmelen, F.: A Semantic Web Primer, 2nd edn. MIT Press, Cambridge (2008)
28. Grobelnik, M., Mladenić, D.: Knowledge Discovery for Ontology Construction. In: Davies, J., Studer, R., Warren, P. (eds.) Semantic Web Technologies, pp. 9–27 (2006)
29. Noy, N.F., McGuinness, D.L.: Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory Technical Report (2001)
30. Roman, D., et al.: Semantic Web Services - Approaches and Perspectives. In: Davies, J., Studer, R., Warren, P. (eds.) Semantic Web Technologies: Trends and Research in Ontology-based Systems, pp. 191–236. John Wiley & Sons, Chichester (2006)
31. W3C Semantic Web Activity (2009), http://www.w3.org/2001/sw/
32. Calvanese, D., De Giacomo, G.: Ontology based data integration (2009), http://www.posccaesar.org/svn/pub/SemanticDays/2009/Tutorials_Ontology_based_data_integration.pdf
33. Baumann, C.: Contracting and Copyright Issues for Composite Semantic Services. In: Sheth, A.P., et al. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 895–900. Springer, Heidelberg (2008)

5 DEVELOPING A BUSINESS CASE AND A DATA QUALITY ROAD MAP
CHAPTER OUTLINE
5.1 Return on the Data Quality Investment
5.2 Developing the Business Case
5.3 Finding the Business Impacts
5.4 Researching Costs
5.5 Correlating Impacts and Causes
5.6 The Impact Matrix
5.7 Problems, Issues, Causes
5.8 Mapping Impacts to Data Flaws
5.9 Estimating the Value Gap
5.10 Prioritizing Actions
5.11 The Data Quality Road Map
5.12 Practical Steps for Developing the Road Map
5.13 Accountability, Responsibility, and Management
5.14 The Life Cycle of the Data Quality Program
5.15 Summary

One of the most frequently asked questions about developing a data quality program is: how do we develop a convincing business case for investing in information quality improvement? In this chapter we look at how our characterization of the risks associated with ignoring data quality problems can be presented to senior management as an opportunity for developing competitive advantage, and what considerations for staffing and planning can be compiled into a tactical road map for deploying a data quality strategy. One of the major issues is that the senior managers who already recognize the value of improved data quality don't need justification to initiate a data quality program. However, organizational best practices require that some form of business case be assembled and presented to a governing body to justify the investment in any kind of activity. A data quality improvement program is a serious commitment on behalf of an organization, and its importance deserves to be effectively communicated to all of the business managers who may participate, either as sponsors or as beneficiaries. In chapter 1, we identified key impact dimensions and corresponding impact categories associated with poor data quality. The process of building a business case to justify both the technology and the organizational infrastructure necessary to ensure a successful program requires additional research and documentation, namely:
- Quantification of identified financial impacts,
- Assessment of the actual financial impacts,
- Determination of the source of the actual root causes in the information processing that are correlated to those impacts,
- Diagnosis of the root cause of the process failure,
- Determination of potential remediation approaches,
- The costs to remediate those process failures, and
- A way to prioritize and plan the solutions to those problems.
All of this information can be accumulated into a pair of templates: one for impact analysis and the other for estimating the opportunity for value improvement or creation. In particular, the impact template is used to document the problems, issues, business impacts, and quantifiers. Together, this information enables the practitioner to estimate a quantified yearly incurred impact attributable to poor data quality.

5.1 Return on the Data Quality Investment


What is the purpose of developing a return on investment (ROI) model? In many situations, the ROI formulation is used before starting a project purely for the purpose of project approval and initiation, and is then forgotten. In other environments, the ROI calculation is made after the fact as a way of demonstrating that some activity had some kind of positive business impact. In either situation, the ROI model is a marketing device. But while one might consider this approach appropriate for projecting a return on investment, it is also reasonable to consider whether the expected returns are directly (and predictably) attributable to operations that are within the organization's control.


As an example, let's say the national tax collection agency (in the United States, that is the Internal Revenue Service) has built a business case for investing a large amount of money in reengineering its software systems, using an expected increase in tax collections as the business justification. The ROI model suggests that building a more modern application system will result in greater collections. The improved system may account for more precision in calculating and collecting taxes, but in reality the amount of taxes collected depends on more than just the computer application. A downturn in the economy might result in more people out of work, legislation may mandate a freeze on the minimum wage or lower the tax rates, or natural disasters may result in migratory populations that are difficult to track down and contact. In essence, justifying the creation of a new application system based on increased collections ignores the fact that the expected performance results depend on a number of variables beyond the organization's control.

5.2 Developing the Business Case


Therefore, the intention is not just to provide information that can be used to justify a data quality program; it is to provide a foundation for continuing to use the knowledge acquired during this phase to manage performance improvement over the data quality life cycle. If the impacts are truly related to poor data quality, then improving data quality will alleviate the pain in a measurably correlated manner. In turn, the ROI model becomes a management tool to gauge the effectiveness of the program. If improving data quality really will lead to improvements in achieving the business objectives (as the business case claims), then the same measures used to determine the value gap can be used to monitor performance improvement. The process for developing a business case is basically a quest to identify a value gap associated with data quality: the area of greatest opportunity for creating new value with the optimal investment. Following the process summarized in Figure 5.1 will help the analyst team identify the opportunities with the highest value and, therefore, the highest priority.

5.3 Finding the Business Impacts


It is highly probable that there will be not only an awareness of existing data quality issues but also some awareness of the magnitude of the impacts these issues incur. The value of the impact taxonomy developed from the material in chapter 1 is twofold. First, by clearly specifying the many different impacts, it is possible to trace some of the issues back through the processing stages and determine whether some number of them can be attributed to a single process failure. Second, it shows how the results of different data quality events can be grouped together, which simplifies the research necessary to determine financial impact.

Figure 5.1 Developing a business case for data quality management (steps: identify business impacts; research costs; correlate impacts and causes; build the impact matrix; research root causes; map impacts to data flaws; estimate value gap; prioritize tasks).

5.3.1 Roles and Responsibilities


Although there may be some awareness of existing issues, as a practical matter the process of identifying and categorizing impacts is best performed as a collaborative effort among the line-of-business managers and their supporting information technology staff. The early process of identification, by necessity, relates poor data quality to business issues, which requires knowledge of both the business processes and how applications support those processes. Therefore, a small team consisting of one business representative and one IT representative from each line of business should assemble to expose the issues that will drive the business case. This meeting should be scheduled for an extended block of time (half a day) and convene at a location away from distractions such as telephone and email. One attendee should be included as a scribe to document the discussion.

5.3.2 Clarification of Business Objectives


Because data quality management is often triggered by acute events, the sentiment may be reactive (what do we do right now to improve the quality?), perhaps with some level of anxiety.


To alleviate this, it is necessary to level-set the meeting and ensure that every participant is aware that the goal is to come up with clearly quantifiable issues attributable to unexpected data. To achieve this, it is useful for each group's business participant to prepare a short (10-minute) overview of that group's business objectives: what services the group provides, what investment is made (staffing and otherwise) in providing those services, and how success is quantified. Next, each group's IT participant should provide a short overview of how information is used to support the group's services and achieve the business objectives.

5.3.3 Identification and Classification


The next step in developing the business case is to clearly identify the issues attributable to poor data quality and to determine whether they indeed are pain points for the organization. Again, we can employ the impact categories described in chapter 1 in this process, mostly from the top down, by asking these questions:
- Where are the organization's costs higher than they should be?
- Are there any situations in which the organization's revenues are below expectations? (Note: for nonprofit or governmental organizations, you may substitute your quantifiable objectives for the word "revenues.")
- Are there areas where confidence is lowered?
- What are the greatest areas of risk?
The answers to these questions introduce areas for further concentration, in which the questions can be refined to focus on our specific topic by appending the phrase "because of poor data quality" at the end (e.g., "Where are the organization's costs higher than they should be because of poor data quality?"). The analyst can again employ the taxonomy at a lower level, asking questions specifically about the lower levels of the hierarchy. For example, if the organization's costs are higher, is it due to error detection, correction, scrap, rework, or any other area of increased overhead costs?

5.3.4 Identifying Data Flaws


At the same time, it will be necessary to understand how the impact is related to poor data quality. Most often a direct relation can be assessed: each issue has some underlying cause that can be identified at the point of manifestation. For example, extra costs associated with shipping ordered items occur when the original shipping address is incorrect and the item is returned and needs to be shipped a second time. Data flaws are the result of failed processes, so understanding the kinds of data flaws that are causing the impacts will facilitate root cause analysis. At the end of this stage, there should be a list of data flaws and business impacts that require further investigation for determination of financial impact, assessment of measurement criteria, and setting of performance improvement goals.

5.4 Researching Costs


The next step in the process is to get a high-level view of the actual financial impacts associated with each issue. This step combines subject matter expertise with some old-fashioned detective work. Because the intention of developing a business case is to understand gross-level impacts, it is reasonable to attempt a high-level impact assessment that does not require significant depth of analysis. To this end, there is some flexibility in exactness of detail; in fact, much of the relevant information can be collected in a relatively short time. In this situation, anecdotes are good starting places, since they are indicative of high-impact, acute issues with high management visibility. Since the current issues have probably been festering for some time, there will be evidence of individuals addressing the manifestation of the problem in the past. Historical data associated with work/process flows during critical data events are a good source of cost/impact data. To research additional impact, it is necessary to delve deeper into the core of the story. To understand the scope, it is valuable to ask these kinds of questions:
- What is it about the data that caused the problem?
- How big is the problem?
- Has this happened before? How many times?
- When this happened in the past, what was the remediation process?
- What was done to prevent it from happening again?
Environments with event and issue tracking systems have a head start, as the details will have been captured as part of the resolution workflow. Alternatively, organizations with formal change control management frameworks can review recommended and implemented changes triggered as a result of issue remediation.


An initial survey of impact can be derived from this source: detection, correction, scrap and rework, and system development risks are examples of impact categories that can be researched through it. At the same time, consult issue tracking system event logs and management reports on staff allocation for problem resolution, and review external impacts (e.g., stock price, customer satisfaction, management spin) to identify key quantifiers for business impact.

5.5 Correlating Impacts and Causes


The next step in developing the business case involves tracking the data flaws backward through the information processing flow to determine at which point in the process each flaw was introduced. Since many data quality issues are very likely the result of process failures, eliminating the source of the introduction of bad data upstream will provide much greater value than just correcting bad data downstream. Consider the example in Figure 5.2. At the data input processing stage, a customer name and contact information are incorrectly entered. At the next stage, in which an existing customer record is to be located, the misspelling prevents the location of the record, and a new record is inadvertently created. Impacts are manifested at Customer Service, Accounts Receivable, and Fulfillment.

Figure 5.2 An example of how one data flaw causes multiple impacts (customer contact name and contact info misspelled at the data entry point; the name does not match the customer database, so a new record is inserted with invalid information; impacts surface in Customer Service, Accounts Receivable, and Fulfillment).

In this supply chain example, it is interesting to note that each of the client application users would assume that their issues were separate ones, yet they all stem from the same root cause. The value in assessing where the flaw was introduced into the process is that when we can show that one core problem has multiple impacts, the value of remediating the source of the problem becomes much greater.

5.6 The Impact Matrix


The answers to the questions, combined with the research, will provide insight into quantifiable costs, which will populate an impact matrix template. A simple example, shown in Figure 5.3, is intended to capture information about the different kinds of impacts and how they relate to specific problems. In this example, there are five columns in the impact matrix:
1. Problem: the description of the original source problem.
2. Issue: a list of issues that are attributable to the problem. There may be multiple issues associated with a specific problem.
3. Business Impact: the different business impacts that are associated with a specific issue.
4. Quantifier: a measurement of the severity of the business impact.
5. Yearly Incurred Impact: a scaled representation of the actual costs related to the business impact over a specified time frame, such as the yearly impact shown in Figure 5.3.

Figure 5.3 An example of an impact template (columns: Problem, Issue, Business Impact, Quantifier, Yearly Incurred Impact).

We will walk through an example of how the template in Figure 5.3 can be populated to reflect how invalid data entry at one point in the supply chain management process results in impacts incurred at each of three different client application areas. For each business area, the corresponding impact quantifiers are identified, and then their associated costs are projected and expressed as yearly incurred impacts. In the impact matrix, the intention is to document the critical data quality problems so that an analyst can review the specific issues that occur within the enterprise and then enumerate all the business impacts incurred by each of those issues. Once the impacts are specified, we simplify the process of assessing the actual costs, which we also incorporate in the matrix. The resulting matrix reveals the summed costs that can be attributed to poor data quality.
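As an illustration of the template's mechanics, the following sketch (Python; the two rows echo the chapter's running example and the dollar figures quoted in section 5.9, while the representation itself is an assumption, not code from the book) shows how impact matrix rows could be captured and rolled up per source problem:

```python
# A minimal sketch of the five-column impact matrix as a data structure.
# Row contents echo the chapter's running example; the structure is assumed.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class ImpactRow:
    problem: str          # description of the original source problem
    issue: str            # issue attributable to the problem
    business_impact: str  # business impact associated with the issue
    quantifier: str       # measurement of the severity of the impact
    yearly_impact: float  # accumulated cost over a year

matrix = [
    ImpactRow("Customer contact info misspelled at data entry",
              "Known customers cannot be identified, causing duplication",
              "Increased inbound call center calls", "Staff time", 30_000.0),
    ImpactRow("Customer contact info misspelled at data entry",
              "Known customers cannot be identified, causing duplication",
              "Increased shipping costs", "Reshipment cost and frequency", 78_000.0),
]

# Roll up the yearly incurred impacts attributable to each source problem.
totals: dict[str, float] = defaultdict(float)
for row in matrix:
    totals[row.problem] += row.yearly_impact
for problem, total in totals.items():
    print(f"{problem}: ${total:,.0f} per year")
```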

5.7 Problems, Issues, Causes


The first column of the impact matrix to be filled in describes the problems and the associated data quality issues. Figuring out the presumptive error that leads to business impacts grounds the later steps of determining alternatives for remediation. In our example, shown in Figure 5.4, it had already been determined that the source problem is the incorrect introduction of customer identifying information at the data entry point. The issue, though, describes why it is a problem. Note that there may be multiple data issues associated with each business problem.

Figure 5.4 Identifying problems and their issues (Problem: customer contact name and contact info misspelled at data entry point; Issue: inability to clearly identify known customers leads to duplication).

5.8 Mapping Impacts to Data Flaws


The next step is to evaluate the business impacts that occur across the line-of-business applications. These effectively describe the actual pain experienced as a result of the data flaw and provide greater detail as to why the source problem causes organizational pain. In our example, as seen in Figure 5.5, there are specific business impacts within each vertical line of business. These business impacts are added to the impact matrix, as shown in Figure 5.6; they are the same ones identified using the process in section 5.3. Although they are categorized in the impact matrix in relation to the source problem, it is valuable to maintain other classifications as well. For example, the different areas of shading in the figure reflect the application or line of business. We could also track how each impact falls into the business impact categories of chapter 1.

5.9 Estimating the Value Gap


The next step is to enumerate the quantifiers associated with the business impacts and calculate a cost impact that can be projected over a year's time.

Figure 5.5 Determining the actual business impacts and how they relate to the source problem (Customer Service: 1. increased number of inbound calls, 2. increase in relevant call statistics, 3. decreased customer satisfaction; Accounts Receivable: 4. lost payments, 5. increased audit demand, 6. impacted cash flow; Fulfillment: 7. increased shipping costs, 8. increased returns processing, 9. decreased customer satisfaction).

Realize that not all business impacts are necessarily quantified in terms of money. In our example, shown in Figure 5.7, some of the quantifiers are associated with monetary amounts (e.g., staff time, overdue receivables, increased shipping costs), whereas others are quantified against other organizational objectives (e.g., customer satisfaction, call center productivity). If a quantifier does not specifically relate to a monetary value, we document it as long as the impact is measurable. In the version of the impact matrix in Figure 5.7 we have identified hard quantifiers and, based on those quantifiers, some sample incurred impacts rolled up over a year's time. For example, the increase in inbound calls resulted in the need for additional staff time allocated to fielding those calls, and that additional time summed to $30,000 for the year. Auditing the accounts receivable might show that $250,000 worth of products have been ordered and shipped but not paid for, an impact on revenues. Products shipped to the wrong location, returned, and reshipped had an average cost of $30 and occurred 50 times per week, which equals $78,000 per year.
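The annualization behind such figures is simple arithmetic and can be sanity-checked in a couple of lines (a sketch; the $30 unit cost and 50 occurrences per week are the numbers quoted above):

```python
# Annualize a recurring impact: cost per occurrence x occurrences per week
# x 52 weeks. The inputs are the reshipment numbers quoted in the text.
def yearly_impact(cost_per_event: float, events_per_week: float) -> float:
    return cost_per_event * events_per_week * 52

assert yearly_impact(30, 50) == 78_000  # $30 per reshipment, 50 per week
```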

Figure 5.6 Adding business impacts for each of the issues (the nine impacts listed in Figure 5.5, recorded against the issue of duplicate customer records).

One of the big challenges is determining the quantifiers and the actual costs, because often those costs are buried within ongoing operations or are not differentiable from the operational budget. One rule of thumb to keep in mind is to be conservative. Documenting hard quantifiers is necessary, since they will be used for the current state assessment and for identifying long-term target improvement goals. The objective is to come up with estimates that are believable and supportable but, most of all, can be used for establishing achievable performance improvement goals. If the numbers are conservatively developed, the chances are greater that changes to the environment will result in measurable improvement.

Figure 5.7 Quantifiers and estimated costs (sample quantifiers: staff time; average call duration, throughput, and hold time; call drop rate and re-calls; overdue receivables; cash flow volatility; increased shipping costs; attrition and order reduction in time or size; with yearly incurred impacts of $30,000, $250,000, $20,000, $23,000, and $78,000).

We are not done yet; realize that a business case doesn't just account for the benefits of an improvement program, it must also factor in the costs associated with the improvements. Therefore, we need to look at the specific problems that are the root causes and what it would cost to fix them. In this step, we evaluate the specific issues and develop a set of high-level improvement plans, including analyst and developer staff time along with the costs of acquiring data quality tools. We can use a separate template, the remediation matrix (shown in Figure 5.8), that illustrates how potential solutions address the core problem(s) and what the costs are for each proposed solution. Figure 5.8 shows an example remediation matrix documenting the cost of each solution, which also allows us to allocate the improvement to the documented problem (and its associated impacts). Again, at this stage in the process it may not be necessary to identify exact costs; a ballpark estimate will do.

Figure 5.8 An example remediation matrix (Problem: customer contact name and contact info misspelled at data entry point; Issue: inability to clearly identify known customers leads to duplication; Solution: parsing and standardization, record linkage tools for cleansing; Implementation costs: $150,000 for license, 15% annual maintenance; Staffing: 0.75 FTE for 1 year, 0.15 FTE for annual maintenance).

5.10 Prioritizing Actions


Because multiple problems across the enterprise may require the same solution, this opens up the possibility of economies of scale. It also allows us to amortize both the staff and technology investment across multiple problem areas, thereby further diluting the actual investment attributable to each area of business impact. Essentially, we can boil the prioritization process down to simple arithmetic:
- Each data issue accounts for some conservatively quantifiable gap in value over a specified time period.
- The root cause of each data issue can be remediated with a particular initial investment plus a continuing investment over the same specified time period.
- For each data issue, calculate the opportunity value as the value gap minus the remediation cost.
One can then sort the issues by opportunity value, which will highlight those issues whose remediation will provide the greatest value to the organization. Of course, this simple model is a starting point, and other aspects can be integrated into the calculations, such as:
- Time to value,
- Initial investment in tools and technology,
- Available skills, and
- Learning curve.
Any organization must cast the value within its own competencies and feasibility of execution. Although these templates provide a starting point, there is value in refining the business case development process to ensure that a valid return on investment can be achieved while delivering value within a reasonable time frame.
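This prioritization arithmetic is easy to operationalize. The sketch below (Python; the issue names and amounts are illustrative assumptions, though the first value gap is the sum of the yearly impacts quoted in section 5.9) sorts issues by opportunity value:

```python
# Sort data quality issues by opportunity value = value gap - remediation cost.
# The first value gap is $30,000 + $78,000 + $23,000 + $250,000 + $20,000
# from section 5.9; the remediation costs and other rows are illustrative.
issues = [
    {"issue": "Duplicate customer records", "value_gap": 401_000,
     "remediation": 172_500},   # e.g., tool license plus first-year maintenance
    {"issue": "Invalid shipping addresses", "value_gap": 78_000,
     "remediation": 40_000},
    {"issue": "Inconsistent product codes", "value_gap": 55_000,
     "remediation": 60_000},
]

for item in issues:
    item["opportunity"] = item["value_gap"] - item["remediation"]

for item in sorted(issues, key=lambda i: i["opportunity"], reverse=True):
    print(f'{item["issue"]}: ${item["opportunity"]:,}')
```

Note that a negative opportunity value, as in the third row, flags an issue whose remediation would currently cost more than the value it recovers; such issues drop to the bottom of the list.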

5.11 The Data Quality Road Map


We now have two inputs for mapping out a plan for implementing a data quality management program: pragmatically, they are the value gap analysis described in this chapter and the data quality maturity model described in chapter 3. The road map combines the two by considering the level of maturity necessary to address the prioritized issues in the appropriate order of execution. Though one may aspire to achieve the highest level of maturity across all of the data quality framework components, the complexity introduced by the different kinds of challenges, combined with the oftentimes advisory role played by the data quality manager, limits the mandate that can be imposed on the enterprise. Instead, it is desirable to propose a data quality vision that supports the business objectives of the organization yet remains pragmatically achievable within the collaborative environment of the enterprise community. A practical approach is to target a level of maturity at which the necessary benefits of data quality management are achieved for the enterprise while streamlining the acceptance path for the individuals who will ultimately be contributing to the data quality effort. Given that targeted level of maturity, the next step is to lay out a road map for attaining it, broken out into phases that have achievable milestones and deliverables. These milestones and deliverables can be defined based on the descriptions of component maturity in chapter 4. A typical implementation road map will contain five phases:
1. Establish fundamentals
2. Formalize data quality activities
3. Deploy operational aspects
4. Establish the level of maturity
5. Assess and fine-tune
At the end of the final phase, there is an opportunity to review whether the stated objectives have been met and whether it is reasonable to target a higher level of maturity. For example, consider this road map for attaining level 3 in the maturity model, which requires establishing the components detailed within levels 2 and 3 in chapter 3. The data quality strategy is deployed in five phases, with the objective of each phase being to implement the best practices specified in the detailed data quality maturity model.


5.11.1 Establish Fundamentals


Phase 1 establishes the fundamental organizational concepts necessary for framing the transition toward a high-quality environment, with the following milestones:
- A framework for collaboration and sharing of knowledge among application managers, business clients, and IT practitioners is put in place.
- Technology and operational best practices are identified, collected, and distributed via the collaboration framework.
- The relevant dimensions of data quality associated with data values are identified and are recognized as relevant by the business sponsors.
- Privacy, security, authorization, and limitation-of-use policies are articulated in ways that can be implemented.
- Tools for assessing objective data quality are available.
- Data standards are adopted.
- There is a process for characterizing areas of impact of poor data quality.
- Data quality rules are defined to identify data failures in process.

5.11.2 Formalize the Data Quality Activities


During phase 2, steps are taken to define data quality activities more formally and to take the initial steps in collaborative data quality management:
- Key individuals from across the enterprise form a data quality team to devise and recommend a data governance program and policies.
- Expectations associated with the dimensions of data quality for data values can be articulated.
- Simple errors are identified and reported.
- Root cause analysis is enabled using data quality rules and data validation.
- Data parsing, standardization, and cleansing tools are available.
- Data quality technology is used for entity location, record matching, and record linkage.
- A data quality impact analysis framework is in place.

5.11.3 Operationalizing Data Quality Management


Many of the ongoing operational aspects of a data quality program are put into place during phase 3:
- A data governance board consisting of business and IT representatives from across the enterprise is in place.
- Expectations associated with the dimensions of data quality related to data values, formats, and semantics can be articulated.
- Standards are defined for data inspection for the determination of accuracy.
- Standardized procedures for using data quality tools for data quality assessment and improvement are in place.
- Data standards metadata are managed within participant enterprises.
- Data quality service components identify flaws early in the process.
- Data quality service components feed into performance management reporting.

5.11.4 Incremental Maturation


Phase 4 establishes most of the characteristics of level 3 maturity:
- Guiding principles, a charter, and data governance are in place.
- A standardized view of data stewardship across different applications and divisions, and a stewardship program, are in place.
- The capability for validation of data is established using defined data quality rules.
- Performance management is activated.
- Data quality management is deployed at both the participant and enterprise levels.
- Data validation is performed automatically, and only flaws are manually inspected.
- Business rule-based techniques are employed for validation.
- Guidelines for standardized exchange formats (e.g., XML) are defined.
- Structure and format standards are adhered to in all data exchanges.
- Auditing is established based on conformance to rules associated with data quality dimensions.
- Consistent reporting of data quality management is set up for the necessary participants.
- An issue tracking system is in place to capture issues and their resolutions.

5.11.5 Assess, Tune, Optimize


The activities of phase 5 complete the transition to maturity level 3:
- Data contingency procedures are in place.
- Technology components for implementing data validation, certification, assurance, and reporting are in place.
- Technology components are standardized across the enterprise at both the service and implementation layers.
- Enterprise-wide data standards metadata management is in place.
- Exchange schemas are endorsed through the data standards oversight process.

5.12 Practical Steps for Developing the Road Map


As a practical matter, these steps can be taken to lay out a road map for building a data quality program:
- Assess the current level of data quality maturity within the organization in comparison with the maturity model described in chapter 3.
- Determine the data quality issues with material impact.
- Articulate alternatives for remediation and elimination of root causes.
- Prioritize the opportunities for improvement.
- Assess business needs for processes.
- Assess business needs for skills.
- Assess business needs for technology.
- Map the needs to the associated level of data quality maturity.
- Develop a plan for acquiring the skills and tools needed to reach the targeted level of maturity.
- Plan the milestones and deliverables that address the needs for data quality improvement.

5.13 Accountability, Responsibility, and Management


Another important aspect of the data quality road map involves resource management, and the challenge in coordinating the participants and stakeholders in a data quality management program is knowing where to begin. Often it is assumed that assembling a collection of stakeholders and participants in a room is the best way to start an initiative. Before sending out invitations, however, consider this: without well-defined ground rules, these meetings run the risk of turning into turf battles over whose data, definitions, business rules, or information services are the correct ones.


Given the diversity of stakeholders and participants (and their differing requirements and expectations), how can we balance each individual's needs with the organization's drivers for data quality? There are a number of techniques that can help organize the business needs in a way that can in turn manage the initial and ongoing coordination of the participants. These include establishing processes and procedures for collaboration before kickoff, developing ground rules for participation, and clarifying who is responsible, accountable, consulted, and informed regarding the completion of tasks.

5.13.1 Processes and Procedures for Collaboration


Assembling individuals from different business areas and applications will expose a variety of opinions about the names, structures, definitions, sources, and reasonable uses of data concepts used across the organization. In fact, it is likely that there is already lengthy corporate experience with defining common terms (e.g., what is a "customer"?). To reduce replication of effort, take the time to establish rules for interaction in the context of a collaborative engagement in which the participants methodically articulate the needs and expectations of their representative constituencies. The process should detail the approach for documenting expectations and provide resolution strategies whenever there are overlaps or conflicts with respect to defining organizational business needs.

5.13.2 Articulating Accountability: The RACI Matrix


In chapter 2 we discussed the characteristics of the participants and stakeholders associated with a data quality management program. To ensure that each participant's needs are addressed and that their associated tasks are performed appropriately, there must be some delineation of the specific roles, responsibilities, and accountabilities assigned to each person. One useful model is the RACI (Responsible, Accountable, Consulted, and Informed) model. A RACI model is a two-dimensional matrix with tasks listed along the rows and roles listed along the columns. Each cell in the matrix is populated according to these participation types:
- R if the listed role is responsible for deliverables related to completing the task;
- A if the listed role is accountable for delivering the task's deliverables or achieving the milestones;
- C if the listed role is consulted for opinions on completing the task; or
- I if the listed role is informed and kept up to date on the progress of the task.
Figures 5.9 and 5.10 provide a sample RACI matrix associated with some of the data quality processes described in chapter 2. Again, this template and its assigned responsibilities are a starting point, meant to be reviewed and refined in relation to the roles and relationships within your own organization.
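One advantage of encoding the RACI matrix rather than keeping it only as a figure is that its conventions can be checked mechanically. The sketch below (roles and tasks drawn from Figures 5.9 and 5.10; the specific R/A/C/I assignments are illustrative assumptions, not the book's) validates the common rule that every task has exactly one accountable role:

```python
# The RACI matrix as a dict: task -> {role: participation type}, where
# R = responsible, A = accountable, C = consulted, I = informed.
# Roles and tasks come from Figures 5.9 and 5.10; the assignments below
# are illustrative assumptions only.
raci = {
    "Business impact analysis": {
        "Data quality manager": "A", "Data quality analyst": "R",
        "Business client": "C", "Senior manager": "I"},
    "Root cause analysis": {
        "Data quality analyst": "A", "System developer": "R",
        "Data steward": "C", "Application owner": "I"},
    "Data correction": {
        "Data steward": "A", "Operations staff": "R",
        "Data quality manager": "C", "Business client": "I"},
}

# Validate a common RACI convention: exactly one accountable role per task.
for task, roles in raci.items():
    accountable = [r for r, p in roles.items() if p == "A"]
    assert len(accountable) == 1, f"{task!r} must have exactly one 'A' role"
```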

5.14 The Life Cycle of the Data Quality Program


At the beginning of a data quality initiative, there may seem to be a never-ending list of issues to be addressed, and as a team works its way through this list, two interesting, counterintuitive phenomena become clear. The first is that tracking down and fixing one reported issue often results in the correction of other problems reported on the list. The other is that even as some problems are eliminated, new issues will emerge from the existing test suites. Sitting back and thinking about this provides some insight into the process, and ultimately suggests an interesting idea about planning for any quality management program. There are good explanations for both of these results, and examining the life cycle of the quality management process should help in developing a winning argument for the support of these programs. Consider the first by-product, in which fixing one problem results in other problems mysteriously disappearing. Apparently, even though more than one issue was reported, they all share the same root cause. Because the people reporting the issues understood only the application's functionality (and did not have deep knowledge of how the underlying application was designed or how it worked), each issue was perceived to be separate whenever the results or side effects differed. Yet when issues share the same root cause, the process of analyzing, isolating, and eliminating the root cause of one failure also eliminates the root cause of the others. The next time you evaluate the errors, the other issues sharing the same root cause will no longer fail. The second by-product is a little less intuitive, because one would think that finding and fixing problems should result in fewer issues, when in fact it is likely to result in more issues.

Figure 5.9 Sample data quality RACI matrix, part 1 (roles: senior manager, business client, application owner, data governance manager, data quality manager, data steward, data quality analyst, metadata analyst, system developer, operations staff; tasks: business impact analysis; data quality requirements analysis; data quality assessment, bottom-up; data quality assessment, top-down; engage business data consumers; define, review, and prioritize DQ measures; define data quality metrics; set acceptability thresholds; data standards management; active metadata management; define data validity rules; data quality inspection and monitoring; DQ SLA).

Figure 5.10 Sample data quality RACI matrix, part 2 (same roles; tasks: enhanced SDLC for DQ; data quality issue reporting; data quality issue tracking; root cause analysis; data correction; process remediation; data standardization and cleansing; identity resolution; data enhancement).

What actually happens is that fixing one reported problem enables a test to run past the point of its original failure, allowing it to fail at some other point in the process. This (and every other newly uncovered) failure will need to be reported to the issue list, which initially leads to an even longer list of issues. Rest assured, though, that eventually the rate of discovery of new issues will stabilize and then decrease, while at the same time the elimination of root causes will continue to shorten the list of issues. If you prioritize the issues based on their relative impact, then as more problems are eliminated, the severity of the remaining issues will be significantly lower as well. At some point, the effort needed to research the remaining issues will exceed the value achieved in fixing them, and at that time you can effectively transition into proactive mode, decreasing your staffing needs as accountability and responsibility are handed off to the application owners. In other words, this practical application of the Pareto principle demonstrates how reaching the point of diminishing returns allows for better resource planning while reaping the most effective benefits. There are some lessons to be learned with respect to data quality issue analysis:
1. Subjecting a process to increased scrutiny is bound to reveal significantly more flaws than originally expected.
2. Initial resource requirements will be needed to address the most critical issues.
3. Eliminating the root cause of one problem will probably fix more than one problem, improving quality overall.
4. There is a point at which the resource requirement diminishes because the majority of the critical issues have been resolved.
These points suggest a valuable insight: there is a life cycle for a data quality management program. Initially there will be a need for more individuals focusing a large part of their time on researching and reacting to problems, but over time there will be a greater need for fewer people concentrating some of their time on proactively preventing issues from appearing in the first place. In addition, as new data quality governance practices are pushed out across the organization, the time investment is diffused across the organization as well, further reducing the need for long-term dedicated resources. Knowing that the resource requirements are likely to be reduced over time may provide additional business justification to convince senior managers to support establishing a data quality program.


5.15 Summary
The life cycle of the data quality management program dovetails well with the maturity model described in chapter 3. The lower levels of the maturity model reflect the need to react to data quality issues, while the higher levels of maturity reflect the greater insight, gained as the organization builds expertise, into preventing the process failures that lead to data issues. As a practical matter, exploring areas of value for developing a successful business case will help in mapping out a reasonable and achievable road map. Consider an initial exercise that involves working with some senior managers to seek out the "house on fire" issues, namely by following these steps as reviewed in this chapter:
1. Identify five business objectives impacted by the quality of data.
2. For each of those business objectives:
   a. Determine the cost/impact areas for each flaw.
   b. Identify key quantifiers for those impacts.
   c. At a high level, assess the actual costs associated with the problem.
3. For each data quality problem:
   a. Review the solution options for that problem.
   b. Determine the costs to implement them.
4. Seek economies of scale to exploit the same solution multiple times.
At the conclusion of this exercise, you should have a solid basis of information to begin assembling a business case that not only justifies the investment in the staff and data quality technology used in developing an information quality program, but also provides baseline measurements and business-directed metrics that can be used to plan and measure ongoing program performance.
