
Unlocking the Data Assets

Introduction

Service providers face a difficult dilemma. Although they have decades of experience, have major networks in place, serve customers all over the world and continue to derive income from the voice and data traffic flowing through their pipes, their traditional revenues are eroding and competition is increasing. Voice is becoming a commodity, as evidenced by the move to flat-rate, all-you-can-eat pricing plans. According to Pyramid Research, in 2007 global service providers still derived 81% of their average revenue per user (ARPU) from subscription-based voice services.[1] These subscription-based services typically include both prepaid and postpaid accounts based on minutes of use. However, a forecast from IDC projects that the U.S. mobile subscriber market is entering its mature phase.[2] While growth in data services will remain a key revenue driver that offsets continued erosion of voice ARPU, even data revenue growth is set to slow sharply beginning in the 2009-2012 time frame.

High-speed, IP-based networks are tailor-made for Internet-based applications, and customers expect their service providers to deliver more innovative, personalized and context-aware services, from multimedia offerings to mobile social networking. The young Millennials are an influential customer segment driving the push toward new applications both on and off the Internet, given their seemingly insatiable appetite for data-rich applications and services. In mobile social networking alone, a recent In-Stat report[3] predicts steady growth in the number of U.S. Millennials subscribing to mobile social networking services, with nearly 30 million signing up by 2012.
In addition, Millennials' use of wireless data applications will surge compared with that of average users.[4] In this article, we describe the new business models and strategies service providers should consider to reap the potential of these dynamic, in-demand end-user applications.

Unlocking Intangible Assets: A Service Provider Goldmine

Service providers have substantial physical, or tangible, assets in the form of extensive wireline and wireless networks, but even these are not enough to meet the new customer-driven demand. The situation will only get worse as competition ramps up from major Internet, application and content providers such as Google, Microsoft, Apple and others. In addition to making the most of their networks, service providers need to transform their business models and develop strategies that leverage intangible assets as well.

What Are Intangible Assets?

Intangible assets include intellectual property such as patents, copyrights, trademarks and trade secrets, as well as assets like end-user profiles, presence, location, calling/buying habits and preferences, goodwill and brand image. Google is a prime example of a company that leverages intangible assets: it acts as an intellectual landlord that grants the right to use its broad spectrum of search services. By capturing an astronomical number of eyeballs, Google captures a major portion of web advertising dollars. Apple is another master at leveraging intangible assets. The company uses its virtual music portal, iTunes, to boost sales of its tangible offerings: its line of network computers and consumer devices such as iPods and iPhones.*

Telecom service providers also have a rich set of intellectual property and end-user data to monetize, including billing data, contextual information and analytics, credit history and social networking interests. By leveraging the end-user intangible assets listed below, telecom service providers can raise top-line revenue by adapting to today's changing market dynamics and create a value proposition that differs from those offered by competitors.

Intangible Assets Owned by Service Providers
- Established billing relations and trust
- Credit histories
- ID management (authentication)
- Wide range of vertical applications and services
- Location and presence capabilities
- In-depth knowledge of calling patterns
- Insight into customer buying and browsing preferences and habits
- Buddy/family lists
- Social network and other Web 2.0 capabilities

Intangible assets allow service providers to move into nontraditional areas such as entertainment and Web 2.0 services and to harvest new revenue streams such as advertising, fulfillment, e-commerce and m-commerce.
The result is an increase in revenue through the creation of new customer-tailored services. As an additional benefit, a move in this direction secures the revenue and value of tangible assets (fixed and mobile networks) by attracting new subscribers who may be interested in free, advertising-supported services.

New and Strategic Business Models

Unlocking the wealth of customer data has significant short- and long-term strategic value. Business models that leverage end-user data, such as user profiles that can be targeted for advertising or e/m-commerce initiatives that generate commissions or service fees, generally outperform models based solely on optimizing physical networks.[5] By first evaluating their present mode of operation (PMO), service providers can then examine business models that better match a future mode of operation (FMO) that takes full advantage of both their tangible network assets and their intangible end-user data assets. The goal is to support top-line revenue growth and reduce operational expenses.

The new models allow service providers to accelerate their transformation from traditional telephony-domain players into multi-domain (Internet or entertainment) players, so that they can deliver innovative services rapidly to meet end-user demands. At the same time, these models support service providers' drive to tap into new, high-growth markets to increase mobile subscriber growth and reduce churn. With this transformation, providers no longer operate only in the telephony domain; they blend the best aspects of the IT, broadcast (content aggregation and brokering) and web domains into their operations and, as a result, become multi-domain players.

To move into this new cross-domain territory, service providers may have to make counterintuitive decisions. For instance, some of the business models may not be self-sustaining in terms of generating additional revenue. However, these business models could and should be used for competitive differentiation.

* iTunes, iPod and iPhone are trademarks of Apple Inc., registered in the US and other countries.
Alcatel-Lucent has identified eight key business models that service providers should consider as they look to boost revenues and lower costs:
- Wholesaling: A network operator sells an asset, such as excess network capacity, to a retail service provider such as a virtual network operator.
- Outsourcing: The management of resources and day-to-day business functions, such as billing, data storage and even human resources and supply chain management, is transferred to an external supplier.
- Asset sharing: Two or more service providers own and operate networks, sharing overlapping tangible assets.
- Content aggregation and brokering: Involves the coordination of commercial agreements and technologies that support the availability and distribution of user-generated content and premium content such as articles and multimedia video and music files.
- Targeted advertising: Interactively connecting people with brands and organizations through multiple media (for example, TV, mobile and computers) is an important revenue stream for service providers.
- User-generated content and communities: Includes the development and deployment of platforms, tools and applications that allow users to generate and distribute multimedia content.
- Fulfillment: Service providers leverage their established customer billing relationships to complete transactions and deliver services on behalf of third parties.
- e/m-Commerce: Includes extending service provider network resources to support the buying and selling of products and services over fixed and mobile networks.

These business models are examples of innovative ways to reduce operational costs and increase top-line revenues by unlocking the potential of the service provider's intangible assets and by finding new uses for its tangible wireless and wireline network assets. Two cases in point illustrate this principle.

Case study 1: Content aggregation and brokering
Orange's free ad-funded mobile video service[7]

France Telecom and its wireless division, Orange, recently introduced an advertising-funded video magazine, Zap!, that is available exclusively to Orange mobile subscribers. The service offers free celebrity, lifestyle, news and sports video clips to 2G and 3G handset owners; in exchange, subscribers must watch a short advertisement before the video plays. The content is refreshed to sustain customer interest.
According to company executives, this strategy is not only generating new ad revenue but also drawing and locking in users interested in the content and dynamic services available on the service provider's mobile network.

Case study 2: User-generated content and communities

A number of mobile carriers, including AT&T, Sprint, T-Mobile and Verizon, have partnered with social networking sites such as MySpace, Facebook, Loopt and Jumbuck to offer mobile social networking services. Mobile social networking represents the convergence of community development and communications applications that have become important to the young Millennial generation, such as voice, text messaging, IM, games, maps and multimedia sharing. These synergistic partnerships are still in the development phase but show great potential, especially given the predicted rate of adoption among technology-savvy Millennials.

In this scenario, mobile service providers create new revenue streams by offering on-deck client applications for a subscription fee. A second potential source of income is data charges for using the service. However, many Millennials are sensitive to extra data charges, so pursuing this line of business may hamper adoption of the core service. A more lucrative source of revenue might be joint ventures with social networking sites in which web ad revenues and net new subscription revenues are shared.[8]

Creating Killer Environments Rather than Killer Apps

To unlock the full value of their intangible assets, service providers need to create a killer environment rather than search for a killer application. Finding killer apps may be as easy as following the Millennials' lead, but before that lead can be followed, service providers must first develop a killer environment that attracts end users and in which their activity can be carefully tracked. This environment will most likely be a byproduct of networks transformed onto an IP infrastructure, featuring a service delivery environment (SDE) that takes full advantage of the power and flexibility of a service-oriented architecture (SOA).

Consider a mobile subscriber who signs up for a service called Gifts on Time, which alerts the end user to birthdays, anniversaries and other special occasions for friends and family members.
In offering this service, carriers can provide not only an advance alert but also a link to the web site last used to purchase a gift or greeting card. By clicking this link, the end user not only fulfills the obligation to send the gift on time but also completes the entire transaction without interrupting the session on the phone. By leveraging the buddy/family list intangible asset, the service provider helps the end user and raises top-line revenue from extended session minutes, from commercial activity with a vendor partner, or both.

A killer environment also calls for service providers to make their networks more attractive to third parties and wholesale customers. This means they must significantly reduce the proprietary elements of the network that make it difficult to connect and transact business. A transformed network based on IP technology will provide an environment that enables flexible, consistent and simple access to third-party services, resulting in an enhanced quality of experience for the end user.

In addition, service providers will be able to monetize their customer assets through non-user-paid revenue such as on-portal advertising, or through the sale of differentiated assets to the marketplace. These include user location, demographics, postal code and presence, as well as access to systems such as user shopping carts and operating system real estate (including prime displays, locations on the main screen and mailboxes).

Overall, this new environment gives the service provider the service agility, personalization and blending needed to monetize intangible assets. This, in turn, will help reverse faltering revenue streams from traditional sources. Because service providers' resources are limited and the marketplace is changing rapidly, the transformation to this new environment requires a skilled technology partner with global experience, in-depth resources and top-quality research and development facilities. With such a partner, service providers will be uniquely positioned to unlock and leverage the value of their intangible assets through IP transformation.
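The Gifts on Time service described earlier in this article can be illustrated with a small sketch of its alerting logic: given the occasions stored for a subscriber's buddy/family list, it produces alerts for occasions due within a lead time and attaches the link last used to buy a gift for that contact. This is a hypothetical illustration only; the names `Occasion`, `gift_alerts` and the seven-day lead time are assumptions, not part of any carrier's actual service.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Occasion:
    """A recurring occasion for someone on the subscriber's buddy/family list."""
    contact: str
    label: str   # e.g. "birthday" or "anniversary"
    month: int
    day: int


def _safe_date(year: int, month: int, day: int) -> date:
    # An impossible date (in practice Feb 29 in a non-leap year) moves back one day.
    try:
        return date(year, month, day)
    except ValueError:
        return date(year, month, day - 1)


def next_occurrence(occ: Occasion, today: date) -> date:
    """Next date on or after today on which the occasion falls."""
    candidate = _safe_date(today.year, occ.month, occ.day)
    if candidate < today:
        candidate = _safe_date(today.year + 1, occ.month, occ.day)
    return candidate


def gift_alerts(occasions: List[Occasion], last_purchase_url: Dict[str, str],
                today: date, lead_days: int = 7) -> List[str]:
    """Alert texts for occasions due within lead_days, each with a link to the
    site last used to buy a gift for that contact, when one is known."""
    alerts = []
    for occ in occasions:
        when = next_occurrence(occ, today)
        days_left = (when - today).days
        if days_left <= lead_days:
            text = (f"{occ.contact}'s {occ.label} is in {days_left} day(s) "
                    f"({when.isoformat()})")
            url = last_purchase_url.get(occ.contact)
            if url:
                text += f"; reorder from {url}"
            alerts.append(text)
    return alerts
```

In a real deployment the occasions and purchase history would come from the subscriber's profile and a vendor partner; here they are plain in-memory values.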

DESIGNING BUSINESS INFORMATION:


Business and technical meta data are what is commonly meant by "meta data," that is, data about data. Operational meta data, by contrast, is data about processes. SAP BW maintains all three types of meta data. The SAP BW meta data objects are used to model and maintain business and technical meta data, while operational meta data is generated by data warehouse processes and is available through the scheduling and monitoring components.

The modeling functionality is the most important part of the Administrator Workbench (AWB), as it provides the main entry point for defining the core meta data objects used to support reporting and analysis. This includes everything from defining extraction processes and implementing transformations to defining flat or multidimensional objects for information storage.

The Business Content component allows you to browse through the predefined information models available and activate them. Once activated, you can use these information models without further modification or extend them using the modeling component of the AWB.

The Meta Data Repository provides online hypertext documentation of both the activated meta data objects (the ones actually used in the BW system) and the meta data objects of the Business Content. You can export this hypertext documentation to a set of HTML files and publish it on a Web server, where it can also serve as online, automatically updated project documentation.

An offline meta data modeling tool, tentatively called Repository Studio, is currently under development at SAP. The Repository Studio is designed to support offline modeling of SAP BW meta data: SAP BW meta data is imported into the offline repository, modified there using the Repository Studio's modeling functionality, and exported back into an SAP BW system. The Repository Studio is a completely Web-based, multi-user application that you can use in team environments without being connected to an SAP BW system. Because it integrates a standalone Web server, it also supports working offline (e.g., on laptops while traveling).

SCHEDULING:
Data warehousing requires batch processing for loading and transforming data, creating and maintaining aggregates, creating and maintaining database indexes, exporting information to other systems, and creating batch reports. These processes need to be planned so that they deliver results on time, avoid resource conflicts caused by running too many jobs at once, and respect the logical dependencies between jobs.
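The two planning constraints just named, logical dependencies and a cap on concurrent jobs, can be combined in a simple wave plan. The sketch below is generic Python built on the standard-library `graphlib.TopologicalSorter`; it is not SAP BW's scheduler, and the job names and the `max_parallel` limit are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_batch_waves(dependencies: Dict[str, Set[str]],
                     max_parallel: int) -> List[List[str]]:
    """Group batch jobs into waves that run one after another.

    dependencies maps each job to the jobs that must finish before it starts.
    A job is placed in a wave only after all its prerequisites sit in earlier
    waves, and no wave holds more than max_parallel jobs."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    waves: List[List[str]] = []
    backlog: List[str] = []  # jobs that are ready but did not fit in a wave yet
    while sorter.is_active():
        backlog.extend(sorted(sorter.get_ready()))
        wave, backlog = backlog[:max_parallel], backlog[max_parallel:]
        waves.append(wave)
        sorter.done(*wave)  # mark the wave finished so its dependents become ready
    return waves
```

For example, with loads depending on extractions, aggregates depending on both loads and an export depending on the aggregates, `plan_batch_waves(jobs, 2)` yields four waves. A real scheduler would start the next job as soon as any slot frees up rather than waiting for a whole wave; waves simply keep the sketch easy to read.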

Monitoring:
Monitoring batch processes and, when necessary, troubleshooting them is as important as starting them. The Data Load Monitor supports troubleshooting by providing access to detailed protocols of all activities related to loading, transforming and storing data in SAP BW, allowing you to access single data records and to simulate and debug user-defined transformations. Other processes monitored include ODS object activation, master data attribute activation, hierarchy activation, aggregate rollup, realignment and readjustment jobs, InfoCube compression jobs, database index maintenance, database statistics maintenance and data exports.
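The idea of simulating a user-defined transformation on single records can be illustrated outside SAP BW. The following sketch is a hypothetical dry run: it applies a transformation record by record, writes nothing to a target, and collects each failing record with its error so it can be inspected. `simulate_transformation` and `to_thousands` are illustrative names, not SAP BW functions.

```python
from typing import Callable, List, Tuple


def simulate_transformation(
    records: List[dict], transform: Callable[[dict], dict]
) -> Tuple[List[dict], List[Tuple[int, dict, str]]]:
    """Dry-run a user-defined transformation record by record.

    Nothing is written to a data target; transformed rows and per-record
    errors are returned so individual failing records can be inspected."""
    results: List[dict] = []
    errors: List[Tuple[int, dict, str]] = []
    for index, record in enumerate(records):
        try:
            # Pass a copy so the staged source record is never modified.
            results.append(transform(dict(record)))
        except Exception as exc:  # deliberately broad: report any per-record failure
            errors.append((index, record, f"{type(exc).__name__}: {exc}"))
    return results, errors


def to_thousands(record: dict) -> dict:
    """Hypothetical user-defined transformation: convert amount to thousands."""
    record["amount_k"] = float(record["amount"]) / 1000
    return record
```

Running `simulate_transformation` over a batch containing a non-numeric amount returns the good rows plus an entry pointing at the bad record's position and its `ValueError`, which is the kind of record-level detail a load monitor exposes.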

Reporting Agent:
The reporting agent allows queries to be executed in batch mode. Batch-mode query execution can be used to:
- Print reports.
- Automatically identify exception conditions and notify the users responsible for taking appropriate action.
- Precompute query results for use in Web templates.
- Precompute value sets for use with value set variables (see the Queries section, later in the chapter, for a definition of variables).
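The second use above, identifying exception conditions and notifying the responsible users, can be sketched as a threshold check over precomputed query rows. This is a hypothetical illustration, not the reporting agent's configuration model: `ExceptionRule`, `find_exceptions`, the `margin_pct` key figure and the `sales_controller` recipient are assumed names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExceptionRule:
    """Hypothetical exception definition: a key figure must stay within
    [low, high]; violations are reported to the responsible user."""
    key_figure: str
    low: float
    high: float
    responsible: str


def find_exceptions(rows: List[dict], rules: List[ExceptionRule],
                    label_field: str) -> Dict[str, List[str]]:
    """Scan precomputed query rows and group exception messages by recipient."""
    notifications: Dict[str, List[str]] = {}
    for rule in rules:
        for row in rows:
            value = row[rule.key_figure]
            if not rule.low <= value <= rule.high:
                message = (f"{row[label_field]}: {rule.key_figure}={value} "
                           f"outside [{rule.low}, {rule.high}]")
                notifications.setdefault(rule.responsible, []).append(message)
    return notifications
```

Grouping by recipient means each responsible user receives one consolidated notification per batch run instead of one message per violating row.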

DataSource Manager:
The DataSource Manager also allows uploaded data to be captured and stored temporarily in the persistent staging area (PSA). Data stored in the PSA is used for several purposes:

Data quality. Complex check and correction routines can be implemented to make sure data in the PSA is consistent before it is integrated with other data sources or uploaded to its final data target.

Repeated delta updates. Many extraction programs do not allow you to repeat uploads of deltas, which are the sets of records in the data source that have been inserted or updated since the last upload. Repeated delta uploads are required in cases where the same delta data has to be updated into multiple data targets at different points in time.

Short-term backup data source. A short-term backup data source is required in cases where update processes fail for some technical reason (such as insufficient disk space or network unavailability) or where subtle errors in the transformations performed on the data warehouse side are discovered only later. Once stored in the PSA, data may be read from the PSA and updated into the final data target at any point in time and as often as required.

Supporting development. Based on data in the PSA, SAP BW allows you to simulate transfer rules and update rules and to debug the implemented transformations.
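The "repeated delta updates" and "short-term backup" purposes both rest on the same idea: keep each uploaded request so it can be replayed later into any target. The in-memory sketch below illustrates that idea only; `PersistentStagingArea`, `store` and `post` are hypothetical names, plain Python lists stand in for data targets, and nothing here mirrors the actual PSA tables or APIs.

```python
from collections import defaultdict
from typing import Dict, List, Set


class PersistentStagingArea:
    """Minimal in-memory stand-in for a staging area: each upload is kept
    under its own request ID so it can be re-read and posted to any number
    of data targets, at any time, as often as needed."""

    def __init__(self) -> None:
        self._requests: Dict[str, List[dict]] = {}             # request ID -> records
        self._posted: Dict[str, Set[str]] = defaultdict(set)   # target -> request IDs posted

    def store(self, request_id: str, records: List[dict]) -> None:
        if request_id in self._requests:
            raise ValueError(f"request {request_id} already stored")
        self._requests[request_id] = [dict(r) for r in records]

    def post(self, request_id: str, target_name: str, target: List[dict],
             force: bool = False) -> int:
        """Append a stored request to target and return the number of records posted.
        Re-posting to the same target is skipped unless force=True, e.g. after the
        target was cleaned up following a failed or faulty load."""
        if request_id in self._posted[target_name] and not force:
            return 0
        records = self._requests[request_id]
        target.extend(dict(r) for r in records)
        self._posted[target_name].add(request_id)
        return len(records)
```

The same delta request can thus be posted to one target today and another next week, or re-posted after a faulty transformation is fixed, without asking the source system to deliver the delta again.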

POPULATING THE BUSINESS INFORMATION WAREHOUSE:


There are two types of data stores:

Physical data stores (data targets)


Basic InfoCubes
Basic InfoCubes store data physically. Data is initially loaded into the F fact table; during SAP compression, it is moved to the E fact table. SAP compression is discussed in 4.6, "SAP compression for loads," on page 41. We discuss Basic InfoCubes in this redbook.

Transactional InfoCubes
Transactional InfoCubes also store data physically. Unlike Basic InfoCubes, which are read-only, they are both readable and writable. Transactional InfoCubes are used only in Strategic Enterprise Management (SEM); they are beyond the scope of this redbook and are not discussed.
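The move from the F fact table to the E fact table can be pictured with a small in-memory model. The sketch assumes, as SAP compression is generally described, that compression drops the request ID and sums rows that share the same characteristic values; real compression works on dimension keys and has further options, so treat `compress` and its field names as a hypothetical illustration.

```python
from collections import defaultdict
from typing import Dict, List


def compress(f_table: List[dict], e_table: List[dict],
             key_fields: List[str], key_figures: List[str]) -> None:
    """Move every row from the F fact table into the E fact table.

    The request ID (and any other field not listed) is dropped, and rows that
    share the same characteristic key values are summed into one E-table row."""
    totals: Dict[tuple, Dict[str, float]] = defaultdict(
        lambda: dict.fromkeys(key_figures, 0))
    for row in e_table + f_table:
        key = tuple(row[field] for field in key_fields)
        for figure in key_figures:
            totals[key][figure] += row[figure]
    e_table[:] = [dict(zip(key_fields, key), **figures)
                  for key, figures in totals.items()]
    f_table.clear()  # the F table is empty once its requests are compressed
```

Because rows from different requests collapse into one, the E table is usually smaller than the F table it replaces, which is why compression helps query performance; the trade-off is that individual requests can no longer be deleted from compressed data.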

Virtual data stores


RemoteCube
A RemoteCube is an InfoCube whose transaction data is managed externally rather than in the Business Warehouse. Only the structure of the RemoteCube is defined in BW; for reporting, the data is read from another system using a BAPI.

SAP RemoteCube
In an SAP RemoteCube, the data is stored in another SAP system.
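The key property of a virtual data store, a locally defined structure with data fetched from elsewhere at query time, can be sketched as follows. `VirtualCube` is a hypothetical class and the `fetch` callable merely stands in for the remote read (a BAPI call in SAP BW); no SAP interface is modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class VirtualCube:
    """Only the structure (field list) is defined locally; rows are fetched
    from the external source each time a query runs and are never stored."""
    fields: List[str]
    # Stand-in for the remote read (a BAPI call in SAP BW); receives the filters.
    fetch: Callable[[Dict[str, object]], Iterable[dict]]

    def query(self, **filters: object) -> List[dict]:
        unknown = set(filters) - set(self.fields)
        if unknown:
            raise KeyError(f"unknown fields: {sorted(unknown)}")
        # Project each remote row onto the locally defined structure.
        return [{name: row[name] for name in self.fields}
                for row in self.fetch(filters)]
```

Since nothing is cached, a change in the source system is visible on the very next query; the price is that every query pays the cost of the remote read.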

USER ACCESS TO INFORMATION:


Objective
Authentication and access control measures should ensure appropriate access to information and information processing facilities, including mainframes, servers, desktop and laptop clients, mobile devices, applications, operating systems and network services, and should prevent inappropriate access to such resources.

Access control policy
An access control policy should be established, documented and periodically reviewed, based on business needs and external requirements. The access control policy and associated controls could take account of:
- security issues for particular data systems and information processing facilities, given business needs, anticipated threats and vulnerabilities;
- security issues for particular types of data, given business needs, anticipated threats and vulnerabilities;
- relevant legislative, regulatory and certification requirements;
- relevant contractual obligations or service level agreements;
- other organizational policies for information access, use and disclosure; and
- consistency among such policies across systems and networks.

Access control policy content
Access control policies generally should include:
- clearly stated rules and rights based on user profiles;
- consistent management of access rights across a distributed/networked environment;
- an appropriate mix of administrative, technical and physical access controls;
- administrative segregation of access control roles (e.g., access request, access authorization, access administration);
- requirements for formal authorization of access requests ("provisioning"); and
- requirements for authorization and timely removal of access rights ("deprovisioning").

User access management policy
Policies should focus on ensuring authorized user access to information and information systems, and on preventing unauthorized access to them. This could include:
- formal procedures to control the allocation of access rights;
- procedures covering all stages in the life cycle of user access, from provisioning to deprovisioning; and
- special attention to control of privileged ("super-user") access rights.

User registration
Formal user registration and de-registration procedures should be implemented for granting and revoking access to all information systems and services.
In addition to assigning a unique user-ID to each user, this could include:
- documentation of approval from the information system owner for each user's access;
- confirmation by a reviewing party (supervisor or other personnel) that each user's access is consistent with business purposes and with other security controls (e.g., segregation of duties);
- giving each user a written statement of their access rights and responsibilities;
- requiring users to sign statements indicating they understand the conditions of access (see also Terms and conditions of employment and Confidentiality agreements);
- ensuring access is not granted until all authorization procedures are completed;
- maintaining a current record of all users authorized to use a particular system or service;
- immediately changing or eliminating access rights for users who have changed roles or left the organization; and
- checking for and removing redundant or apparently unused user-IDs.

Privilege management
Allocation and use of access privileges should be restricted and controlled. This could include:
- development of privilege profiles for each system, based on the intersection of user profiles and system resources;
- granting of privileges based on these standard profiles when possible;
- a formal authorization process for all privileges, with additional review requirements for exceptions to standard profiles; and
- maintaining a current record of privileges granted.

User password management
Allocation of passwords should be controlled through a formal management process. This could include:
- requiring users to sign a statement indicating they will keep their individual passwords confidential and, if applicable, keep any group passwords confidential solely within the group;
- secure methods for creating and distributing temporary, initial-use passwords;
- forcing users to change any temporary, initial-use password;
- forcing users to change passwords periodically and to use strong passwords at each change;
- procedures to verify a user's identity before providing a replacement password ("password reset");
- prohibiting "loaning" of passwords;
- prohibiting storage of passwords on computer systems in unprotected form; and
- prohibiting use of default vendor passwords, where applicable.

User access token management
Allocation of access tokens, such as key cards, should be controlled through a formal management process. This could include:
- requiring users to sign a statement indicating they will keep their access tokens secure;
- secure methods for creating and distributing tokens;
- use of two-factor tokens (token plus PIN) where appropriate and technically feasible;
- procedures to verify a user's identity before providing a replacement token; and
- prohibiting "loaning" of tokens.

Review of user access rights
Each user's access rights should be periodically reviewed using a formal process. This could include:
- review at regular intervals and after any status change (promotion, demotion, transfer, termination); and
- more frequent review of privileged ("super-user") access rights.

Policy on use of network services
Users should be provided with access only to the network services that they have been specifically authorized to use.
This could include:
- authorization procedures for determining who is allowed to access which networks and network services, consistent with other access rights; and
- policies on deployment of technical controls to limit network connections.

User authentication for remote connections
Where appropriate and technically feasible, authentication methods should be used to control remote access to the network.

Equipment/location identification in networks
Where appropriate and technically feasible, access to the network should be limited to identified devices or locations.

Remote diagnostic and configuration port protection
Physical and logical access to diagnostic and configuration ports should be appropriately controlled. This could include:
- physical and technical security for diagnostic and configuration ports; and
- disabling or removing ports, services and similar facilities that are not required for business functionality.

Segregation in networks
Where appropriate and technically feasible, groups of information users and services should be segregated on networks. This could include:
- separation into logical domains, each protected by a defined security perimeter; and
- secure gateways between logical domains.

Network connection control
Users' ability to connect to the network should be appropriately restricted, consistent with access control policies and application requirements. This could include:
- filtering by connection type (e.g., messaging, email, file transfer, interactive access, application access); and
- additional authentication and access control measures as appropriate.

Network routing control
Routing controls should be implemented to ensure that computer connections and information flows do not breach the access control policies for applications on the network. This could include:
- positive source and destination address checking; and
- routing limitations based on the access control policy.

Control of use of systems
Controls should be implemented to restrict operating system access to authorized users by requiring authentication in accordance with the defined access control policy. This could include:
- mechanisms for authentication by knowledge-, token- and/or biometric-factor methods, as appropriate;
- recording successful and failed system authentication attempts;
- recording the use of special system privileges; and
- issuing alarms when access security controls are breached.

Secure log-on procedures
Access to systems should be controlled by secure log-on procedures.
This could include:
- display of a general notice warning about authorized and unauthorized use;
- no display of system or application identifiers until successful log-on;
- no display of help messages before successful log-on that could aid an unauthorized user;
- validation or rejection of log-on only on completion of all input data (e.g., both user-ID and password);
- no display of passwords as entered (e.g., masking them with symbols);
- no transmission of passwords in clear text;
- limits on the number of unsuccessful log-on attempts, in total or for a given time period;
- limits on the maximum and minimum time for a log-on attempt;
- logging of successful and unsuccessful log-on attempts; and
- on successful log-on, display of the date/time of the last successful log-on and of any unsuccessful attempts.

User identification and authentication
All system users should have a unique identifier ("user-ID") for their personal use only. A suitable authentication technique (knowledge-, token- and/or biometric-based) should be chosen to authenticate the user. This could include ensuring that:
- shared user-IDs are employed only in exceptional circumstances, where there is a clear justification;
- generic user-IDs (e.g., "guest") are employed only where no individual-user-level audit is required and limited access privileges otherwise justify the practice;
- the strength of the identification and authentication methods (e.g., use of multiple authentication factors) is suitable to the sensitivity of the information being accessed; and
- regular user activities are not performed from privileged accounts.

Password management system
Systems for managing passwords should ensure the quality of this authentication method. This could include:
- log-on methods that enforce use of individual user-IDs and associated passwords;
- set/change password methods that enforce choice of strong passwords;
- forced change of temporary passwords on first log-on;
- enforced password changes thereafter at reasonable intervals;
- storing passwords separately from application data; and
- storing and transmitting passwords in encrypted form only.
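Two of the password-system controls above, enforcing strong passwords and never storing passwords in readable form, can be sketched with the Python standard library. The list speaks of encrypted storage; this sketch instead uses a salted, deliberately slow one-way hash (PBKDF2-HMAC-SHA256 via `hashlib.pbkdf2_hmac`), a widely used alternative for stored passwords. The strength rule, the 12-character minimum and the iteration count are assumptions to be replaced by your own policy values.

```python
import hashlib
import hmac
import os
import re

ITERATIONS = 600_000  # assumed work factor; tune to your hardware and policy


def is_strong(password: str, min_length: int = 12) -> bool:
    """Hypothetical strength rule: minimum length plus at least three of the
    lowercase, uppercase, digit and symbol character classes."""
    classes = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[^A-Za-z0-9]"]
    hits = sum(bool(re.search(pattern, password)) for pattern in classes)
    return len(password) >= min_length and hits >= 3


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password itself."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

Because a hash cannot be reversed, a forgotten password is replaced through the identity-verified reset procedure described under User password management rather than recovered.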

Access token management system
Systems for managing access tokens should ensure the quality of this authentication method.

Biometric access management system
Systems for managing access via biometrics should ensure the quality of this authentication method.

Use of system utilities that override controls
Use of system utilities that are capable of overriding other controls should be restricted and appropriately monitored whenever they are used (e.g., by special event logging processes).

Session time-out
Interactive sessions should shut down and lock out the user after a defined period of inactivity. Resuming the interactive session should require re-authentication. This could include:
- time-out periods that reflect the risks associated with the type of user, the setting of use and the sensitivity of the applications and data being accessed; and
- waiver or relaxation of the time-out requirement when it is incompatible with a business process, provided other steps are taken to reduce vulnerabilities (e.g., increased physical security, reduced access privileges, removal of sensitive data, removal of network connection capabilities).

Limitation of connection time and location
Restrictions on connection times should be used to provide additional security for high-risk applications or remote communications capabilities. This could include:
- requiring re-authentication at timed intervals;
- restricting overall connection duration or connection time period (e.g., normal office hours); and
- restricting connection locations (e.g., to IP address ranges).

Information access restriction
Access to information and application system functions should be restricted in accordance with a defined access control policy that is consistent with the overall organizational access policy. This could include any of the controls listed herein.

Sensitive system isolation
Sensitive systems should have a dedicated (isolated) computing environment. This could include:
- explicit identification and documentation of sensitivity by each system/application controller (owner);
- construction of appropriately isolated environments where technically and operationally feasible; and
- explicit identification and acceptance of risks when shared facilities and/or resources must be used.
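The session time-out and connection time/location controls described above can be sketched together. The address ranges, weekday office hours and 15-minute idle limit below are hypothetical policy values, and `connection_allowed` and `Session` are illustrative names; the code uses only Python's standard `ipaddress` and `datetime` modules.

```python
import ipaddress
from datetime import datetime, time, timedelta

# Hypothetical policy values for a high-risk application.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.20.0.0/16"),
                    ipaddress.ip_network("192.168.5.0/24")]
OFFICE_START, OFFICE_END = time(8, 0), time(18, 0)
IDLE_TIMEOUT = timedelta(minutes=15)


def connection_allowed(source_ip: str, at: datetime) -> bool:
    """Allow connections only from listed address ranges, on weekdays, during office hours."""
    addr = ipaddress.ip_address(source_ip)
    in_range = any(addr in net for net in ALLOWED_NETWORKS)
    in_hours = at.weekday() < 5 and OFFICE_START <= at.time() < OFFICE_END
    return in_range and in_hours


class Session:
    """Locks after IDLE_TIMEOUT of inactivity; a locked session needs re-authentication."""

    def __init__(self, user: str, now: datetime) -> None:
        self.user = user
        self.last_activity = now
        self.locked = False

    def touch(self, now: datetime) -> bool:
        """Record activity; returns False (and locks) if the session has timed out."""
        if self.locked or now - self.last_activity > IDLE_TIMEOUT:
            self.locked = True
            return False
        self.last_activity = now
        return True

    def reauthenticate(self, now: datetime, credentials_ok: bool) -> bool:
        """Unlock the session only when the user's credentials check out again."""
        if credentials_ok:
            self.locked = False
            self.last_activity = now
        return credentials_ok
```

Keeping the policy values as module-level constants makes it easy to apply different time-out periods to different user types or applications, as the session time-out guidance suggests.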

IMPLEMENTING THE WAREHOUSE:


OBSTACLES:

1. The Project Is Over Budget
Depending on how far actual expenditures exceeded the budget, the project may be considered a failure. The cause may have been an overly optimistic budget or the inexperience of those calculating the estimate. The inadequate budget might also reflect a reluctance to tell management the bitter truth about the costs of a data warehouse. Unanticipated and expensive consulting help may have been needed. Performance or capacity problems, more users, more queries or more complex queries may have required more hardware or extra effort to resolve. The project scope may have been extended without a change in the budget. Extenuating circumstances, such as delays caused by hardware problems, software problems, user unavailability, changes in the business or other factors, may have resulted in additional expenses.

2. Slipped Schedule
Most of the factors listed in the preceding section could also have contributed to the schedule not being met, but the major reason for a slipped schedule is the inexperience or optimism of those creating the project plan. In many cases, managers wanting to put a stake in the ground set the schedule by choosing an arbitrary delivery date in the hope of giving project managers something to shoot for. The schedule then becomes a deadline without any real reason for a fixed delivery date. In those cases the schedule is usually established without input from those who know how long the data warehouse tasks actually take, and without the benefit of a project plan. Without a project plan that details the tasks, dependencies and resources, it is impossible to develop a realistic completion date.

3. Functions and Capabilities Not Implemented
The project agreement specified certain functions and capabilities. These would have included what data to deliver, the quality of the data, the training given to the users, the number of users, the method of delivery (e.g., web based), service level agreements (performance and availability), pre-defined queries, etc. If important functions and capabilities were not realized or were postponed to subsequent implementation phases, these would be indications of failure.

4. Unhappy Users
If the users are unhappy, the project should be considered a failure. Unhappiness is often the result of unrealistic expectations: users were expecting far more than they got. They may have been promised too much, or there may have been a breakdown in communication between IT and the users. IT may not have known enough to correct the users' false expectations, or may have been afraid to tell them the truth. We often observe situations where the user says "jump" and IT is told to say "how high?" Users may also have believed the vendors' promises of grand capabilities and grossly optimistic schedules. Furthermore, users may be unhappy about the cleanliness of their data, response time, availability, usability of the system, anticipated function and capability, or the quality and availability of support and training.

5. Unacceptable Performance
Unacceptable performance has often been the reason that data warehouse projects are cancelled. Data warehouse performance should be explored for both query response time and extract/transform/load time. Any characterization of good query response time is relative to what is realistic and whether it is acceptable to the user. If the user was expecting sub-second response time for queries that join two multi-million-row tables, that expectation would lead the user to call the performance unacceptable. In this example, good performance should have been measured in minutes, not fractions of a second. The user needs to understand what to expect. Even though a data warehouse query may require executing millions of instructions and accessing millions of rows of data, there are limits to what the user should be expected to tolerate. We have seen queries whose response time is measured in days; with few exceptions, this is clearly unacceptable. As data warehouses get larger, the extract/transform/load (ETL) process will take longer, sometimes as long as days, which reduces the availability of the data warehouse to the users. Database design, architecture, hardware configuration, database tuning and the ETL code (whether an ETL product or hand-written code) all significantly affect ETL performance. As the ETL process time increases, all of these factors have to be evaluated and adjusted. In some cases the service level agreement for availability will also have to be adjusted. Without such adjustments, the ETL processes may not complete on time, and the project would be considered a failure.

6. Poor Availability
Availability covers both scheduled availability (the days per week and the number of hours per day) and the percentage of time the system is accessible during scheduled hours. Availability failure is usually the result of the data warehouse being treated as a second-class system. Operational systems usually demand availability service level agreements, and the performance evaluations and bonus plans of IT staff who work in operations and systems often depend on reaching high availability percentages. If the same standards are not applied to the data warehouse, problems will go unnoticed and responses to problems will be casual, untimely and ineffective.

7. Inability to Expand
If a robust architecture and design is not part of the data warehouse implementation, any significant increase in the number of users, the number of queries or the complexity of queries may exceed the capabilities of the system. If the data warehouse is successful, there will also be a demand for more data, for more detailed data and, perhaps, for more historical data to perform extended trend analysis (e.g., five years of monthly data).

8. Poor Quality Data/Reports
If the data is not clean and accurate, the queries and reports will be wrong, in which case users will either make the wrong decisions or, if they recognize that the data is wrong, will mistrust the reports and not act on them. Users may spend significant time validating the report figures, which in turn hurts their productivity. This loss of productivity puts the value of the data warehouse in question.

9. Too Complicated for Users
Some tools are too difficult for the target audience. Just because IT is comfortable with a tool and its interfaces, it does not follow that all the users will be as enthusiastic. If the tool is too complicated, the users will find ways to avoid it, including asking other people in their department or asking IT to run a report for them. This nullifies one of the primary benefits of a data warehouse: empowering users to develop their own queries and reports.

10. Project Not Cost Justified
Every organization should cost-justify its data warehouse projects. Justification includes an evaluation of both the costs and the benefits. When the benefits were actually measured after implementation, they may have turned out to be much lower than expected, or they may have arrived much later than anticipated. The actual costs may have been much higher than the estimated costs. In fact, the costs may have exceeded both the tangible and intangible benefits.

11. Management Does Not Recognize the Benefits
In many cases, organizations do not measure the benefits of the data warehouse or do not properly report those benefits to management. Project managers, and IT as a whole, are often shy about boasting of their accomplishments. Sometimes they may not know how to report on their progress or on the impact the data warehouse is having on the organization. The project managers may believe that everyone in the organization will automatically know how well IT performed and will recognize the data warehouse for the success that it is. They are wrong. In most cases, if management is not properly briefed on the data warehouse, it will not recognize the benefits and will be reluctant to continue funding something it does not appreciate.

Conclusion
There are many ways for a data warehouse project to fail. The project can be over budget, the schedule may slip, critical functions may not be implemented, the users could be unhappy and the performance may be unacceptable. The system may not be available when the users expect it, the system may not be able to expand in function or users, the data and the reports based on it may be of poor quality, the interface may be too complicated for the users, the project may not be cost justified and management might not recognize the benefits of the data warehouse. By knowing the types of failures others have experienced, you are in a position to avoid them. You must know which risks to anticipate if you are going to head them off before they sink your project. The most important activity of a project manager is picking the right people and avoiding those who can and will hurt the project.
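Obstacle 6 above distinguishes scheduled availability (days per week and hours per day) from the percentage of scheduled time the system is actually accessible. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and it assumes outage hours have already been restricted to the scheduled window, so a planned overnight ETL run outside scheduled hours would not count against availability.

```python
def scheduled_hours_per_week(days_per_week: int, hours_per_day: float) -> float:
    """Scheduled availability: the weekly window promised to users."""
    if not 0 <= days_per_week <= 7 or not 0 <= hours_per_day <= 24:
        raise ValueError("days_per_week must be 0-7 and hours_per_day 0-24")
    return days_per_week * hours_per_day


def availability_pct(scheduled_hours: float, outage_hours: float) -> float:
    """Share of the scheduled window in which the system was accessible.
    Assumes outage_hours counts only outages inside the scheduled window."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    accessible = max(scheduled_hours - outage_hours, 0.0)
    return 100.0 * accessible / scheduled_hours
```

For example, a schedule of 5 days at 12 hours per day gives a 60-hour window; 3 hours of unplanned outage inside that window leaves 57 accessible hours, or 95% availability. Reporting this figure against an agreed target is one way to hold the warehouse to the same standard as operational systems.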

PLANNING AN IMPLEMENTATION:
The main difference lies not so much in the overall architecture, implementation or delivery process of the data warehouse, but in which systems or processes the data warehouse supports. Classic data warehouses mostly contain data that is directly related to the core business itself, for example, items sold during a certain time frame in the retail sector, and can therefore directly support key decisions for the core business of companies. Data warehouses in the IT area usually do not support the core business directly, but indirectly, by improving the quality of the systems or processes on which the core business relies. Taking the retail sector as an example again, such a warehouse could contain historical information about the transaction time a customer needs to order a specific item from a Web-based shopping system. In ITIL terms, the data warehouse mainly supports the following IT disciplines:
Service-level management
Capacity management
Availability management
Service continuity management
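As an illustration of the service-level management case, historical transaction times (such as the web-shop order times mentioned above) can be summarized against a service-level threshold. This is a sketch only: the function name `sla_report`, the threshold and the choice of the 95th percentile are assumptions, not part of any ITIL definition.

```python
from statistics import quantiles


def sla_report(times_s: list[float], threshold_s: float) -> dict:
    """Summarize historical transaction times against a hypothetical SLA threshold."""
    if len(times_s) < 2:
        raise ValueError("need at least two observations")
    within = sum(1 for t in times_s if t <= threshold_s)
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the estimate within the observed range.
    p95 = quantiles(times_s, n=20, method="inclusive")[-1]
    return {
        "count": len(times_s),
        "pct_within_threshold": 100.0 * within / len(times_s),
        "p95_s": p95,
    }
```

Run month by month over warehouse data, such a summary shows whether a service level is drifting before users complain, which is the kind of indirect support for the core business described above.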

2.2.1 Understanding purpose and scope of data warehouse project
This statement sounds obvious, but it is really the key point of basically every project. All the project stakeholders must have a clear understanding of why a data warehouse is being implemented. Reasons can be very simple and abstract, such as:
Improve quality of IT support: The data warehouse can help the support organization within your IT organization do a better job by adapting support processes based on warehouse data.
Enable mid-term to long-term IT capacity planning: The warehouse holds real historical data about the utilization of your systems. This can help in planning your IT environment for the future and in reducing costs by eliminating unused capacity.
Support IT managers: The data warehouse can provide decision makers with valuable data to support their strategic IT decisions. For example, simply seeing the components of your business processes and their performance can help in rethinking and redesigning certain parts of your IT infrastructure.
Deliver facts for decisions: This point is very important, because many decisions are made on instinct. The real numbers from a data warehouse put such decisions on a factual footing and help in reasoning about IT budgets.
These are just a few examples that should help you outline the foundation of the data warehouse project. Do not forget that if there are no real benefits, there is no reason to implement one.

2.2.2 Understanding the impact of the data warehouse
This second statement has to be seen in the context of the purpose of the data warehouse. Understanding the purpose alone does not mean that the IT organization is ready to take feedback from the data warehouse and incorporate it into future decisions, plans, architectures or organizational changes. If the IT organization has no processes based on an iterative model, the data warehouse will be much less useful to the company. Figure 2-1 outlines this iterative model, which was originally designed for software.

Figure 2-1 Iterative business process life cycle

Within this iterative business process life cycle, there is always a stage of review or reevaluation. This is the point where the data warehouse can play a key role in supporting decisions about future changes to the business process.

2.2.3 Understanding the duration of a data warehouse project
Another important point is that the effects and benefits of a data warehouse are not immediate. It is a mid-term to long-term solution, and all stakeholders must understand this. Any effect seen within one year of building a business case can be considered a major success. Achieving it requires not only significant resources to implement the warehouse itself, but also broad support within the IT organization. After all stakeholders understand the key concepts and benefits of the data warehouse project and agree to a real implementation, the actual data warehouse delivery process has to be outlined.

Do we need a prototype?
When it comes to a new development, the question of a prototype is raised. We assume that a prototype is not necessary for a data warehouse project, because a large part of the project is already covered by the Tivoli Data Warehouse product. It is useful, however, to start with a very reduced set of sources and a small business case for an initial pilot project.

2.2.4 The business case
One of your most experienced IT analysts should be responsible for outlining a business case analysis. The business case outlines one of your business processes and examines the various factors involved in providing a data warehouse for this business process, for example:
Identify unmet business needs for the case: For example, a business process may already be monitored, but no capacity planning is possible due to the lack of historical information.
Identify specific benefits for this case: That is, which specific business needs will be satisfied by the data warehouse. It is important to be specific.
Identify all the costs required for implementing the data warehouse: It is important to have an estimate of the costs up front.
Outline risks: Normally, the risks involved in a data warehouse are very small. There is little, if any, direct impact on the systems that implement the business process. The only serious risk is that the project itself fails.
Identify what is needed for the data warehouse to have an impact on the business process: This could be a monthly meeting with several IT groups or users to discuss the reports delivered by the reporting solution.
In addition, the feedback collected during the discussions with the stakeholders can be used in outlining the business case.

JUSTIFYING THE WAREHOUSE:


Outline the requirements
The various requirements for implementing the data warehouse have to be outlined. These requirements cover not only logical components, but also physical components, the resources needed, and all the participating IT groups or users. Examples are:
Analyze the requirements of key users (for example, by interviews)
Outline all users involved
Design the logical structure of the data warehouse
Outline all data sources and check for accessibility
Estimate initial and mid-term sizing of the database
Design the delivery process
Outline the training requirements
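The sizing item in the list above ("Estimate initial and mid-term sizing of the database") is, at its simplest, arithmetic. The sketch below assumes that daily row volume, average row width, retention period and an overhead factor for indexes, aggregates and free space are known; all names, the 90-day and three-year horizons and the default factor of 1.5 are hypothetical planning assumptions, not product figures.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def estimate_size_gib(rows_per_day: int, bytes_per_row: int, retention_days: int,
                      overhead_factor: float = 1.5) -> float:
    """Raw fact-table volume times an assumed overhead factor
    (indexes, aggregates, free space); returns GiB."""
    if min(rows_per_day, bytes_per_row, retention_days) < 0 or overhead_factor < 1:
        raise ValueError("inputs must be non-negative and overhead_factor >= 1")
    raw_bytes = rows_per_day * bytes_per_row * retention_days
    return raw_bytes * overhead_factor / GIB


def sizing_plan(rows_per_day: int, bytes_per_row: int, initial_days: int = 90,
                mid_term_days: int = 3 * 365, overhead_factor: float = 1.5) -> dict:
    """Initial size (history loaded at go-live) and mid-term size (once the
    retention window has filled); both horizons are hypothetical defaults."""
    return {
        "initial_gib": estimate_size_gib(rows_per_day, bytes_per_row, initial_days, overhead_factor),
        "mid_term_gib": estimate_size_gib(rows_per_day, bytes_per_row, mid_term_days, overhead_factor),
    }
```

With 1,000,000 rows per day at 200 bytes each, this gives about 25 GiB for 90 days of initial history and about 306 GiB after three years. Such figures are only a starting point for the hardware recommendations in the technical blueprint.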

Outline and present one technical blueprint


The final document should be a complete technical architecture document that outlines the whole data warehouse solution, including:
Technical architecture
Network and system infrastructure
Physical design of databases and servers
Hardware recommendations
Logical design
ETL processes
Security concepts
Backup and recovery processes
This document should not be seen as a perfect, final concept of the data warehouse. We strongly recommend that the iterative Rational Unified Process outlined for our first business case also be used for the whole data warehouse project and life cycle.

DATA WAREHOUSE PROCESS MANAGEMENT:


Data Warehouses (DW) integrate data from multiple heterogeneous information sources and transform them into a multidimensional representation for decision support applications. Apart from a complex architecture involving data sources, the data staging area, operational data stores, the global data warehouse, the client data marts, etc., a data warehouse is also characterized by a complex lifecycle. In a permanent design phase, the designer has to produce and maintain a conceptual model and a usually voluminous logical schema, accompanied by a detailed physical design for efficiency reasons. The designer must also deal with data warehouse administrative processes, which are complex in structure, large in number and hard to code; deadlines must be met for the population of the data warehouse, and contingency actions must be taken in the case of errors. Finally, the evolution phase involves a combination of design and administration tasks: as time passes, the business rules of an organization change, new data are requested by the end users, new sources of information become available, and the data warehouse architecture must evolve to efficiently support the decision-making process within the organization that owns the data warehouse. All the data warehouse components, processes and data should be tracked and administered via a metadata repository. In earlier work, we presented a metadata modeling approach that enables capturing the static parts of the architecture of a data warehouse. The linkage of the architecture model to quality parameters (in the form of a quality model) and its implementation in the metadata repository ConceptBase have also been formally described, as has a methodology for exploiting the information found in the metadata repository and for the quality-oriented evolution of a data warehouse based on the architecture and quality model.
In this paper, we complement these results with metamodels and support tools for the dynamic part of the data warehouse environment: the operational data warehouse processes. The combination of all the data warehouse viewpoints is depicted in Fig. 1.

In the three phases of the data warehouse lifecycle, the interested stakeholders need information on various aspects of the examined processes: what they are supposed to do, how they are implemented, why they are necessary and how they affect other processes in the data warehouse [68, 29]. Like the data warehouse architecture and quality metamodels, the process metamodel clusters its entities in logical, physical and conceptual perspectives, each assigned the task of answering one of the aforementioned stakeholder questions. In the rest of this section we briefly present the requirements faced in each phase, our solutions and their expected benefits. The design and implementation of operational data warehouse processes is a labor-intensive and lengthy procedure, covering thirty to eighty percent of the effort and expenses of the overall data warehouse construction [55, 15]. For a metamodel to support the design and implementation tasks efficiently, it must capture at least two essential aspects of data warehouse processes: complexity of structure and relationship with the involved data. In our proposal, the logical perspective is capable of modeling the structure of complex activities and of capturing all the entities of the widely accepted Workflow Management Coalition Standard [64]. The relationship of data warehouse activities with their underlying data stores is expressed in terms of SQL definitions. This simple idea overturns the classical belief that data warehouses are simply collections of materialized views. In previous data warehouse research, directly assigning a naïve view definition to a data warehouse table has been the most common practice. Although this abstraction is elegant and sufficient for examining alternative strategies for view maintenance, it is incapable of capturing real-world processes within a data warehouse environment.
In our approach, we can deduce the definition of a data warehouse table as the outcome of the combination of the processes that populate it. This new kind of definition complements existing approaches, since our approach provides the operational semantics for the content of a data warehouse table, whereas the existing ones give an abstraction of its intentional semantics. The conceptual process perspective traces the reasons behind the structure of the data warehouse. We extend the demand-oriented concept of dependencies, as in the Actor-Dependency model [68], with the supply-oriented notion of suitability, which fits well with the redundancy often found in data warehouses. As another extension to the Actor-Dependency model, we have generalized the notion of role in order to uniformly trace any person, program or data store participating in the system. By implementing the metamodel in an object logic, we can exploit the query facilities of the repository to support consistency checking of the design. The deductive capabilities of ConceptBase [28] make it unnecessary to assign all the interdependencies of activity roles in the conceptual perspective manually; it is sufficient to impose rules that deduce these interdependencies from the structure of data stores and activities. While the design and implementation of the warehouse are performed in a rather controlled environment, the administration of the warehouse has to deal with problems that evolve in an ad-hoc fashion. For example, contingency treatment is necessary during the loading of the warehouse for the efficient administration of failures. In such events, knowing the structure of a process is not enough; the specific traces of executed processes must also be tracked down. In an erroneous situation, not only the causes of the failure but also the progress of the loading process at the time of the failure must be detected, in order to resume its operation efficiently. Still, failures during warehouse loading are only the tip of the iceberg as far as problems in a data warehouse environment are concerned. This brings up the discussion of data warehouse quality and the ability of a metadata repository to trace it in an expressive and usable fashion.
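The contingency requirement above (knowing how far a load had progressed when it failed, so that it can be resumed) can be illustrated with a small checkpointing loader. This is a minimal sketch and not the authors' metamodel or ConceptBase: the `LoadLog` class is a hypothetical stand-in for the trace of an executed process kept in a metadata repository, and `run_load` and the batch layout are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoadLog:
    """Hypothetical stand-in for the repository trace of an executed load."""
    committed: set[int] = field(default_factory=set)


def run_load(batches: list[list[dict]], load_batch: Callable[[list[dict]], None],
             log: LoadLog) -> int:
    """Load batches in order, skipping batches the log already records as committed.

    Returns how many batches were loaded in this run. If load_batch raises, the
    exception propagates and the log shows how far the load got, so calling
    run_load again with the same log resumes at the failed batch.
    """
    loaded = 0
    for i, batch in enumerate(batches):
        if i in log.committed:
            continue              # loaded before the failure; do not reload
        load_batch(batch)         # may raise; the batch then stays uncommitted
        log.committed.add(i)      # record progress only after a successful load
        loaded += 1
    return loaded
```

In a real warehouse, loading a batch and recording it as committed would need to happen atomically (for example, in the same database transaction); otherwise a crash between the two steps could cause a batch to be loaded twice on resume.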

LOOKING TO THE FUTURE:


Sensor technology.
The use of sensor technology has already become part of our daily lives: tracking vehicles through toll plazas, merchandise in retail outlets, supplies in manufacturing plants, energy consumption and even individuals. Sensors, which will soon cost less than jelly beans, will be deployed by the millions. These "smart dust" sensors will be the size of a grain of sand and will contain sensors, computational capability, bi-directional communications and a power supply. Massively distributed sensor networks will generate huge quantities of data, with opportunities for advanced analytics that are unfathomable by today's standards.

Pervasive business intelligence (BI).


The value of content in a data warehouse is amplified when access is provided throughout an organization. Decisioning services provided to front-line knowledge workers help transform the strategic vision of an organization to operational reality. The next step will be to deliver such capabilities to suppliers, distributors, customers and government agencies. More aggressive service levels will be required for performance, availability and data freshness. Business rules engines, business activity monitoring and advanced visualization will be necessary for effective deployment.

In-database processing.
In yesterday's world, sophisticated analytics were often performed in separate data marts using specialized file systems. Advances in Teradata relational database management systems (RDBMSs) and third-party application technologies allow in-database processing to deliver significantly better performance for high-end analytics. For example, the old style of multidimensional online analytical processing (MOLAP) is rapidly being replaced by relational OLAP (ROLAP) for much greater scalability and lower total cost of ownership. MicroStrategy was an early partner with Teradata in this area. More recent Teradata partners in the ROLAP and hybrid OLAP space include Microsoft, Hyperion (Oracle) and Cognos, an IBM company. Another example of this trend is the evolution of traditional extract, transform and load (ETL) tools to the extract, load and transform (ELT) approach, wherein transformations take place inside the scalable relational database rather than on an external server. This approach significantly reduces data movement for complex transformations and makes very effective use of the inherent scalability of the Teradata platform. Data mining, traditionally performed inside proprietary file systems, has also moved into the data warehouse. Implementations by KXEN and SAS demonstrate the power of in-database processing even for the most sophisticated analytics.
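The ELT pattern described above (load raw data first, then transform it with SQL inside the database) can be sketched with Python's built-in `sqlite3` module standing in for a scalable RDBMS such as Teradata. The `elt` function, table names, columns and cleansing rules are hypothetical; the point is only where the transformation runs.

```python
import sqlite3


def elt(raw_rows):
    """Extract/Load raw rows into staging, then Transform inside the database."""
    conn = sqlite3.connect(":memory:")
    # Load: source rows land unchanged in a staging table (no external transform server).
    conn.execute("CREATE TABLE stg_sales (store TEXT, sale_date TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", raw_rows)
    # Transform: cleansing and aggregation run as SQL inside the database engine.
    conn.execute("""
        CREATE TABLE fact_daily_sales AS
        SELECT UPPER(TRIM(store)) AS store,
               sale_date,
               SUM(amount_cents) / 100.0 AS revenue
        FROM stg_sales
        WHERE amount_cents IS NOT NULL
        GROUP BY UPPER(TRIM(store)), sale_date
    """)
    conn.commit()
    return conn


if __name__ == "__main__":
    rows = [(" north ", "2024-01-01", 1250), ("NORTH", "2024-01-01", 750),
            ("south", "2024-01-01", None), ("south", "2024-01-02", 400)]
    conn = elt(rows)
    for row in conn.execute("SELECT * FROM fact_daily_sales ORDER BY store, sale_date"):
        print(row)
```

In a traditional ETL flow, the trimming, case normalization, null filtering and aggregation would run on a separate server before loading; here the raw rows are loaded once and the database engine does the work, which is the data-movement saving the text describes.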

Non-traditional data types.


In tomorrow's world, structured content will constitute less than 20% of the volume in a data warehouse. New data types such as biometrics, images/video, sound/voice, geospatial, text and XML documents will dominate the storage and analytic resources in advanced data warehouses. This will require new tools for analysis and extensions to traditional relational structures for storing and processing complex data types. These innovations, and many others, will provide huge opportunities for extracting more value from scalable data warehouses. Moreover, the ongoing evolution in processor and storage technologies ensures that hardware resources will not prevent the Teradata RDBMS from handling these new analytic opportunities.