
IBM Software Thought Leadership White Paper

June 2010

Data governance for geographical information systems


Maximize the value of GIS data with IBM InfoSphere Foundation Tools and IBM InfoSphere Information Server


Contents
Introduction
Geographic information systems: Analytics in action
InfoSphere Information Server: A unified foundation for information architectures
The InfoSphere Information Server architecture
InfoSphere Information Server and GIS touch points
Integrating InfoSphere Information Server and GIS with other software components
GIS solutions in the real world
Building trustworthy data with IBM software

Introduction
During the past two decades, government agencies across the world have made a significant investment in information. Agencies of all types are using enterprise resource planning (ERP) software, supply chain management packages and customer relationship management (CRM) solutions, molding these information tools in a nearly endless variety of combinations to better serve their constituents. Agencies are also taking advantage of innovations for gathering and processing information. These technologies are a mix of the familiar and the new: Service-oriented architectures (SOAs), Web services, XML, grid computing and radio frequency identification (RFID).

One technology, the geographic information system (GIS), holds particular promise for agencies. A GIS links nearly any type of data with spatial and location information, enabling agencies to see and analyze the real-world dimensions that their information inhabits. For government agencies, this is a powerful promise; by some estimates 80 percent of all agency data has some type of spatial or location component. For example, agencies can use GIS tools to easily plot census, traffic or commerce data onto maps, uncovering geographic trends that might otherwise be missed.

But while this new world of information brims with possibilities, agencies face serious challenges managing the tremendous amounts of data being generated. The sheer volume of data, combined with changing laws and regulations, can make it difficult to integrate multiple systems and turn the data into consistent, timely and accurate information for decision making.

IBM InfoSphere Information Server can help agencies derive more value from complex, heterogeneous information. It helps business users and IT personnel collaborate to understand the meaning, structure and content of information across a wide variety of sources. With IBM InfoSphere Information Server, users can access and use information in new ways to drive innovation, increase operational efficiency and reduce risk.

InfoSphere Information Server can also help agencies maximize the value of a GIS by creating a gateway between the system and data sources that may have been previously inaccessible. Using InfoSphere Information Server in conjunction with a GIS enables an agency to enrich and leverage spatial information in new ways and provide new perspectives on old issues.

This white paper provides an overview of InfoSphere Information Server with an emphasis on how InfoSphere Information Server and InfoSphere Foundation Tools can be used in conjunction with GIS. It begins with a brief discussion of GIS, including the primary GIS vendors, followed by an overview of InfoSphere Information Server and InfoSphere Foundation Tools components and capabilities. It will also show how GIS data sources can be analyzed, transformed and enhanced with InfoSphere Information Server and InfoSphere Foundation Tools, providing references to complementary software from IBM and third parties. The paper also provides examples of how the capabilities can be used to help agencies and organizations achieve their missions.

Geographic information systems: Analytics in action

Making electronic maps is probably the most well-known use for GIS, but there are plenty of other potential uses. Strictly defined, GIS is a computer system capable of capturing, storing, analyzing and displaying geographically or spatially referenced information; in other words, data that is referenced by a spatial or physical location of some kind. Another way of describing GIS is to refer to it as software that links geographical data (in the form of coordinates) to descriptive (also known as tabular) data, making it possible to analyze the relationship between geographic data and its descriptive/tabular elements.

Data in a GIS can exist in one of two basic formats: Vector and raster. Vector data consists of points, lines and polygons defined by geographic coordinates. Raster data, an image that covers a range of points where each pixel corresponds to a geographic location and has an associated value, is usually represented as a georeferenced picture or photo. A GIS can integrate those two format types as data layers or themes and can also link the vectors or images to descriptive or tabular data, producing a complete geospatial representation.

GIS also helps users anticipate future outcomes by depicting regression analysis for forecasting future events and processes. This analytical capability is what separates a true GIS from digital mapping. The ability of GIS to manage, correlate, predict, model and share geographic information makes GIS an essential analytical tool.

But due to the specialized nature of the technology, and the additional training required to master the systems, GIS is frequently segregated by agencies into well-defined departments or shops. GIS departments are often viewed by other agency segments as an organizational black box, with staff that takes in mapping requirements and data; performs a series of complicated, esoteric machinations; and then returns the desired map. But with so much agency data having a spatial or location component, agencies are realizing that GIS analysis can provide unique and invaluable insight, and are looking for ways to share that resource more broadly across their organizations.

The challenge is that most agency data is not stored in the GIS, but in ERP systems, CRM solutions, relational databases and other repositories scattered across the organization, sometimes in legacy applications, systems and formats. Regardless of the format or source, if data has a spatial or location component, it can be mapped, and InfoSphere Information Server can help organizations create gateways between traditional data sources and GIS resources. Using InfoSphere Information Server in conjunction with a GIS can help an agency enrich and leverage spatial information in new ways.
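The relationship between vector data, raster data and tabular data described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the data model of any GIS product: the class names, the feature identifier used to join geometry to attributes, and the sample coordinates and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VectorFeature:
    """A point, line or polygon defined by coordinates (hypothetical model)."""
    feature_id: int
    geometry_type: str   # "point", "line" or "polygon"
    coordinates: list    # list of (x, y) pairs


@dataclass
class RasterLayer:
    """A grid of cells; each cell covers a location and holds one value."""
    origin_x: float      # x coordinate of the left edge of the grid
    origin_y: float      # y coordinate of the top edge of the grid
    cell_size: float     # width and height of one cell
    values: list         # rows of cell values, row 0 at the top

    def value_at(self, x, y):
        """Return the value of the cell containing (x, y), or None if outside."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[row])):
            return None
        return self.values[row][col]


# Descriptive (tabular) data, keyed by the same identifier as the geometry.
attributes = {1: {"name": "Station A", "type": "fire station"}}

station = VectorFeature(1, "point", [(2.5, 7.5)])
elevation = RasterLayer(origin_x=0.0, origin_y=10.0, cell_size=5.0,
                        values=[[100, 110], [120, 130]])


def describe(feature, table, raster):
    """Combine a feature's tabular record with the raster value at its first vertex."""
    x, y = feature.coordinates[0]
    record = dict(table.get(feature.feature_id, {}))
    record["raster_value"] = raster.value_at(x, y)
    return record


print(describe(station, attributes, elevation))
```

The join on `feature_id` is the part that matters for data governance: the tabular side of a GIS layer is ordinary relational data and can be profiled and cleansed like any other table.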

GIS software providers
Many companies provide GIS software and applications; the largest is Environmental Systems Research Institute (commonly known as ESRI; www.esri.com), a privately held software company headquartered in Redlands, California. According to ESRI, their software is used by more than 300,000 organizations worldwide, including most U.S. Federal agencies and national mapping agencies, and more than 24,000 U.S. state and local government agencies. The ranks of GIS software providers also include companies such as Intergraph, MapInfo, Bentley and Autodesk. Regardless of the provider, virtually any database associated with a GIS can be enhanced by InfoSphere Information Server.

InfoSphere Information Server: A unified foundation for information architectures

Today, critical agency and business initiatives cannot succeed without effectively integrated information. Initiatives such as single view of the constituent/customer, business intelligence (BI) and supply chain management require consistent, complete and trustworthy information. If the information cannot be trusted or doesn't meet their needs, end users will either stop using a system for information or may create local copies of the data on a spreadsheet. Those additional versions and extracts of the data result in the central organization losing control of valuable information.

IBM InfoSphere Information Server is designed to provide a comprehensive, unified foundation for enterprise information architectures. It can scale to meet growing information volume requirements, enabling companies to quickly deliver high-quality business results. InfoSphere Information Server supports a variety of initiatives:

- Business intelligence: InfoSphere Information Server makes it easy to develop a unified view of the business for better decisions. It helps users understand existing data sources; cleanse, correct and standardize information; and load analytical views that can be reused throughout the agency. IBM provides a BI system through IBM Cognos software.
- Master data management (MDM): InfoSphere Information Server helps simplify the development of authoritative master data (creating a "golden record") by showing where and how information is stored across source systems. It also consolidates disparate data into a single, reliable record; cleanses and standardizes information; removes duplicates; and links records across systems. This master record can be loaded into operational data stores, data warehouses or master data applications. The record can also be assembled, completely or partially, on demand.
- Infrastructure rationalization: InfoSphere Information Server helps reduce operating costs by showing relationships between systems and by defining migration rules to consolidate instances or move data from obsolete systems. Data cleansing and matching ensure high-quality data enters the new system.
- Agency transformation: InfoSphere Information Server can speed development and enhance agency agility by providing reusable information services that can be plugged into applications, business processes and portals. Those standards-based information services are maintained centrally by information specialists, but are widely accessible throughout the enterprise and can also be accessed by other authorized agencies/entities.
- Risk and compliance: InfoSphere Information Server helps improve visibility and data governance by enabling complete, authoritative views of information with proof of lineage and quality. Those views can be made widely available and reusable as shared services, while the rules inherent in them are maintained centrally.
- Data warehousing: InfoSphere Information Server can help create data warehousing and data mart applications. These are applications where data is offloaded from a transactional system and reorganized, usually for analytical or reporting purposes.
- Server, application and database consolidation: InfoSphere Information Server can help consolidate structured data sources contained in applications to support the reduction of the number of physical servers, applications and databases.
- Migrations: InfoSphere Information Server is frequently used during database or application migrations. While updating applications or during lengthy procedures such as ERP implementation, InfoSphere Information Server can be used to migrate data between legacy systems and ERP modules, or to provide temporary connectivity between legacy systems and implemented ERP sections during migration.

InfoSphere Information Server capabilities

InfoSphere Information Server combines a variety of IBM information integration technologies. Together, they enable organizations to investigate and understand their data; cleanse and certify it; and then transform and deliver it as a fully trusted resource to systems across the agency or enterprise (see Figure 1).

[Figure 1 diagram: data sources flow through three stages to targets: Understand (discover, model and govern information structure and content), Cleanse (standardize, merge and correct information) and Integrate (combine, restructure, synchronize and move information for delivery), supported by parallel processing and a metadata repository.]
Figure 1: InfoSphere Information Server enables businesses to perform three key functions: Gain an understanding of data, cleanse data and integrate data.

Understand your information with IBM InfoSphere Foundation Tools

For an organization to effectively integrate data, it must first establish a clear picture of what data it has, where the data resides and its overall condition. InfoSphere Foundation Tools, which are part of InfoSphere Information Server, can help organizations automate data profiling and data-quality auditing to:

- Understand data sources and relationships
- Eliminate the risk of using or proliferating bad data
- Improve productivity
- Leverage existing IT investments

InfoSphere Foundation Tools help agencies collaborate across user roles. Data analysts can use the analysis and reporting functionalities to generate integration specifications and business rules that can be monitored over time. Meanwhile, subject matter experts can use Web-based tools to define, annotate and report on fields of agency data. The common metadata foundation makes it easier for different types of users to create and manage metadata by using tools that are optimized for their roles. InfoSphere Foundation Tools enable organizations to capture and organize business metadata, provide modeling capabilities, assist in the translation of business rules into transformation processes and analyze data lineage by leveraging metadata.

Cleanse your information

The InfoSphere QualityStage component of InfoSphere Information Server supports information quality and consistency by standardizing, validating, matching and merging data. With InfoSphere QualityStage, organizations can certify and enrich common data elements, use trusted data such as postal records for name and address information and match records across or within data sources. InfoSphere Information Server allows a single record to survive from the best information across sources for each unique entity, helping you to create a single, comprehensive and accurate view of information.

Integrate your data

InfoSphere Information Server helps organizations transform and enrich information to ensure that it is in the proper context for new uses. It includes hundreds of prebuilt transformation functions for combining, restructuring and aggregating information. For example, InfoSphere Information Server provides in-line validation and transformation of complex data types, and high-speed joins and sorts of heterogeneous data. It also provides high-volume, complex data transformation and movement functionality that can be used for stand-alone extract, transform and load (ETL) scenarios, or as a real-time data processing engine for applications or processes.

InfoSphere Information Server enables organizations to virtualize, synchronize and move information to the people, processes or applications that need it. Information can be delivered by using federation, time-based or event-based processing, moved in large bulk volumes from location to location or accessed in place when it cannot be consolidated. InfoSphere Information Server also provides direct, native access to a wide variety of information sources, both mainframe and distributed. It enables access to databases, files, services and packaged applications, as well as to content repositories and collaboration systems. Companion products from IBM support high-speed replication, data synchronization and distribution across databases, change data capture capabilities and event-based publishing of information.

Components of InfoSphere Foundation Tools
- InfoSphere Information Analyzer
- InfoSphere Business Glossary
- InfoSphere Business Glossary Anywhere
- InfoSphere Data Architect
- InfoSphere FastTrack
- InfoSphere Discovery
- InfoSphere Metadata Workbench

InfoSphere Information Server data integration tools
- InfoSphere DataStage
- InfoSphere Change Data Capture
- InfoSphere Federation Server
- InfoSphere Classic Federation Server
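The idea of letting a single record "survive" from the best information across sources, described under "Cleanse your information", can be illustrated with a short Python sketch. It is not how InfoSphere QualityStage is implemented or configured; it assumes the records have already been matched to one entity, and the source names, field names and the simple "first non-empty value from the most trusted source" rule are hypothetical.

```python
def survive(records, priority):
    """Merge matched records into one: for each field, keep the first
    non-empty value, visiting sources in the given order of trust."""
    ordered = sorted(records, key=lambda rec: priority.index(rec["source"]))
    merged = {}
    for rec in ordered:
        for field, value in rec.items():
            if field == "source":
                continue  # bookkeeping field, not part of the surviving record
            if value not in (None, "") and field not in merged:
                merged[field] = value
    return merged


# Two records that a matching step has already linked to the same person.
records = [
    {"source": "crm", "name": "J. Smith", "address": "12 Main Street", "phone": ""},
    {"source": "erp", "name": "John Smith", "address": "", "phone": "555-0100"},
]

# Assumption for this sketch: the ERP system is trusted over the CRM system.
golden = survive(records, priority=["erp", "crm"])
print(golden)
```

Real survivorship rules are usually richer (most recent value, most complete value, per-field source rankings), but the structure is the same: match first, then choose field by field.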

The InfoSphere Information Server architecture

InfoSphere Information Server provides a unified architecture that supports all types of information integration through common services, unified parallel processing and unified metadata (see Figure 2). To ensure its availability across an organization, it employs an SOA; the SOA also connects the individual components of InfoSphere Information Server.

Unified parallel processing engine

At the heart of InfoSphere Information Server is a unified parallel processing engine that handles everything from analysis of large databases for InfoSphere Information Analyzer to data cleansing for InfoSphere QualityStage and complex transformations for InfoSphere DataStage. The parallel processing engine delivers outstanding performance, enabling organizations to handle more data more quickly. Benefits of the engine include:

- Parallelism and data pipelining to complete increasing volumes of work in decreasing time windows
- Scalability support to add hardware (for example, processors or nodes in a grid) with no changes to the data integration design
- Optimized database, file and queue processing to handle large files that cannot fit in memory all at once or large numbers of small files

Common connectivity

InfoSphere Information Server connects to information sources whether they are structured, unstructured, applications or on the mainframe. Metadata-driven connectivity is shared across the InfoSphere Information Server components, and connection objects are reusable across functions. Connectors provide design-time importing of metadata, data browsing and sampling, runtime dynamic metadata access, error handling and high functionality and high-performance runtime data access. Prebuilt interfaces for packaged applications called packs provide adapters to SAP, Siebel, Oracle and other applications, enabling integration with enterprise applications and associated reporting and analytical systems. In some cases, you can extract specialized metadata associated with those sources.

[Figure 2 diagram: a unified user interface (analysis, development and Web admin interfaces) sits on common services (metadata services, unified service deployment, security services, logging and reporting services), unified parallel processing (understand, cleanse, transform, deliver) and unified metadata (design, operational), with common connectivity to structured, unstructured, application and mainframe sources.]
Figure 2: InfoSphere Information Server connects to a wide range of data sources and includes a unified parallel processing engine, a metadata repository and a host of shared services.

Unified metadata

InfoSphere Information Server is built on a unified metadata infrastructure that enables shared understanding between business and technical domains. This infrastructure helps reduce development time and provides a persistent record that can improve confidence in information. All functions of InfoSphere Information Server share the same metadata model, making it easier for different roles and functions to collaborate. A common metadata repository provides persistent storage for all InfoSphere Information Server suite components, all of which use the repository to navigate, query and update metadata. The repository contains four types of metadata: Technical, business, dynamic and operational.

- Technical metadata is information about the format of the data, such as the tables that are present, the attributes of those tables, how many characters wide a particular attribute may be and when the data was last updated.
- Business metadata can include a wide range of information about data usage, such as the owner or steward of a piece of data, the intended use of the data and definitions for acronyms or domain values.
- Dynamic metadata includes design-time information.
- Operational metadata includes performance monitoring, audit and log data and data profiling sample data.

Because the repository is shared by all suite components, profiling information that is created by InfoSphere Information Analyzer is instantly available to users of other InfoSphere Information Server products, such as InfoSphere DataStage and InfoSphere QualityStage. The repository is a Java 2 Platform, Enterprise Edition (J2EE) application that uses a standard relational database such as IBM DB2, Oracle or Microsoft SQL Server for persistence (DB2 is provided with InfoSphere Information Server). Those databases provide backup, administration, scalability, parallel access, transactions and concurrent access.

Common services

InfoSphere Information Server is built entirely on a set of shared services that perform core tasks. Design, execution and metadata functions are all available as shared services:

- Design: Design services help developers create function-specific services that can be shared. For example, InfoSphere Information Analyzer calls a column analyzer service that was created for enterprise data analysis but can be integrated with other parts of InfoSphere Information Server because it exhibits common SOA characteristics.
- Execution: Execution services include logging, scheduling, monitoring, reporting, security and Web framework, enabling organizations to manage and control all components from a single interface.
- Metadata: Using metadata services, metadata is shared live across tools so that changes made in one InfoSphere Information Server component are instantly visible across all of the suite components. Metadata services are tightly integrated with the common repository and are packaged in InfoSphere Metadata Server.

The common services layer manages how services are deployed from any of the product functions, allowing cleansing and transformation rules or federated queries to be published as shared services within an SOA, using a consistent and easy-to-use mechanism. This can help organizations pursuing SOA-centric architectures by exposing data integration, federation or cleansing processes directly to an SOA rather than requiring a separate integration layer.

Unified user interface

InfoSphere Information Server provides a common graphical interface and tool framework that makes it easy for organizations to access the full power of the solution. Shared interfaces such as the InfoSphere Information Server console and the Web console provide a common look and feel, visual controls and user experience across products, making it possible to reduce training time and helping to simplify overall administration. Common functions, such as catalog browsing, metadata import, query and data browsing, all expose underlying common services in a uniform way. InfoSphere Information Server provides rich client interfaces for highly detailed development work and thin clients that run in Web browsers for administration. Application programming interfaces (APIs) support a variety of interface styles, including standard request-reply, service-oriented, event-driven and scheduled task invocation. This provides a flexible range of capabilities to meet different users' specific needs, while also ensuring a standard look and feel throughout product interfaces.

InfoSphere Information Server and GIS touch points

There are a number of touch points where InfoSphere Information Server capabilities can be used almost immediately out of the box with GIS data. They fall into three categories: Understanding data, cleansing data and integrating/delivering data. The capabilities can be used either in unison or separately, but all of them utilize the same integrated InfoSphere Information Server architecture (see Figure 3).

[Figure 3 diagram: data sources (RDBMS, ERP, application, mainframe, flat files, Web services, ODBC, FTP, API) feed three stages leading to targets: Understand (InfoSphere Information Analyzer, InfoSphere Business Glossary/Business Glossary Anywhere, InfoSphere Discovery, InfoSphere Data Architect), Cleanse (InfoSphere QualityStage) and Integrate/delivery (InfoSphere DataStage, InfoSphere FastTrack, InfoSphere Federation Server, InfoSphere Change Data Capture). InfoSphere Information Services Director, parallel processing, the metadata repository (technical, business and operational metadata) and InfoSphere Metadata Workbench span all stages.]
Figure 3: InfoSphere Information Server comprises many products spread over several touch points.

GIS data quality

Understanding the actual quality, content and structure of data is an important first step to make critical business decisions. Overall data quality depends on many factors, such as correct data types, consistent formatting, retrievability and usability. If the structure and content of your data are poor, then queries of that data will be incomplete, organizations will be unable to make informed decisions and business users will learn to be wary of any results coming from that system, following the old adage of "garbage in, garbage out."

InfoSphere Information Analyzer can evaluate the format, content and structure of both GIS tabular and traditional data for consistency and quality, or perform cross-table analysis between GIS data and nontraditional GIS data. InfoSphere Information Analyzer helps organizations to improve the accuracy of their data by identifying inconsistencies, redundancies and anomalies in data at the column, table and cross-table level. It also can make inferences about the best choices for data structure, helping you learn more about the optimal structure of your data. The inferences that InfoSphere Information Analyzer makes often indicate areas where the quality of an organization's data can be improved. When an analysis completes, users can review the inferences and choose to either accept or reject them.

After a quality assessment of organizational data is complete, users can take the results to InfoSphere QualityStage, InfoSphere DataStage or other InfoSphere Information Server components. For example, the results from a data profiling project can help your organization determine whether you need to complete data cleansing tasks using InfoSphere QualityStage.

Agencies can also use InfoSphere Information Analyzer to regularly test the quality of their data, enabling them to see how well their data cleansing efforts are working over a period of time. To conduct an assessment, an organization can apply business rules to a sample set of data, starting with a baseline report and then following up with tests either on demand or at scheduled intervals. By evaluating how much of their data passes or fails the test, organizations can establish ongoing measures of data quality, which are critical components of a successful data governance program, including providing data governance specifically to tabular GIS data.
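The baseline-then-retest approach to measuring data quality can be sketched as follows. This is a minimal illustration in plain Python, not InfoSphere Information Analyzer's rule syntax or API; the business rule (coordinates must be present and within valid latitude and longitude ranges), the field names and the sample parcel rows are all hypothetical.

```python
def pass_rate(rows, rule):
    """Return the fraction of rows that satisfy a business rule (0.0 if no rows)."""
    if not rows:
        return 0.0
    passed = sum(1 for row in rows if rule(row))
    return passed / len(rows)


def valid_coordinates(row):
    """Hypothetical rule: latitude and longitude are numeric and in range."""
    lat, lon = row.get("lat"), row.get("lon")
    return (
        isinstance(lat, (int, float))
        and isinstance(lon, (int, float))
        and -90 <= lat <= 90
        and -180 <= lon <= 180
    )


# Baseline sample of a GIS attribute table.
baseline = [
    {"parcel_id": "P1", "lat": 34.05, "lon": -117.18},
    {"parcel_id": "P2", "lat": None, "lon": -117.20},   # missing latitude
    {"parcel_id": "P3", "lat": 134.0, "lon": -117.25},  # latitude out of range
    {"parcel_id": "P4", "lat": 34.07, "lon": -117.30},
]

# The same sample after one cleansing pass filled in the missing latitude.
followup = [dict(row) for row in baseline]
followup[1]["lat"] = 34.06

baseline_rate = pass_rate(baseline, valid_coordinates)
followup_rate = pass_rate(followup, valid_coordinates)
print(f"baseline: {baseline_rate:.0%}, follow-up: {followup_rate:.0%}")
```

Tracking the same rule against successive samples gives a trend rather than a one-time score, which is what makes it usable as an ongoing governance measure.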

Metadata sharing and metadata lineage

InfoSphere Information Server helps organizations get a handle on their metadata, develop links between data and metadata and expose the metadata to business users and analysts. The single metadata repository used by InfoSphere Information Server is designed to store the technical, business and operational metadata, serving as the metadata repository for all data integration purposes. There are different interfaces within the InfoSphere Information Server family to create, edit, analyze and publish metadata to the user community in different ways depending on need. Plus, because InfoSphere Information Server is based on a single metadata repository for data integration, metadata for GIS tabular data can be included with non-GIS data. Metadata created, developed and maintained within the metadata repository fits into three categories:

- Technical metadata: Database tables, field names, field width, data types and field size
- Business metadata: Descriptions of the contents of the data, intended audience, stewardship or owner, meanings for domain values, or anything that might be captured from a subject matter expert to provide more context or meaning to the data
- Operational metadata: Any process or job flow to show what has transpired with the data, including job flows that show transformations/calculations or summarizations of data, and data cleansing/deduplication processes
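To make the three categories concrete, the sketch below models a single catalog entry for a GIS attribute table. It is only an illustration of how the categories relate to one another, not the schema of the InfoSphere metadata repository; every class, field, table and job name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    table: str
    fields: dict                      # field name -> (data type, width)


@dataclass
class BusinessMetadata:
    description: str
    steward: str
    domain_values: dict = field(default_factory=dict)   # code -> meaning


@dataclass
class OperationalMetadata:
    job_runs: list = field(default_factory=list)        # ordered job history


@dataclass
class CatalogEntry:
    technical: TechnicalMetadata
    business: BusinessMetadata
    operational: OperationalMetadata

    def lineage(self):
        """List the jobs that have touched this data, oldest first."""
        return [run["job"] for run in self.operational.job_runs]


parcels = CatalogEntry(
    technical=TechnicalMetadata(
        table="PARCELS",
        fields={"PARCEL_ID": ("CHAR", 12), "ZONING": ("CHAR", 2)},
    ),
    business=BusinessMetadata(
        description="Tax parcels with zoning codes",
        steward="County assessor's office",
        domain_values={"R1": "Single-family residential", "C2": "General commercial"},
    ),
    operational=OperationalMetadata(
        job_runs=[
            {"job": "load_parcels", "rows": 1200},
            {"job": "dedupe_parcels", "rows": 1187},
        ],
    ),
)

print(parcels.lineage())
print(parcels.business.domain_values["R1"])
```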

As with other data systems, metadata is critically important within a GIS, but is used in a slightly different context. GIS metadata is used and categorized differently between many organizations and can include a wide range of information (in many cases, inconsistently), such as the spatial envelope or boundaries of the data, information about the source and capture method, time period when the data was captured, geographical reference/projection, stewardship and normal display characteristics. Access to GIS metadata allows users to better determine if a given data set will work for the intended map or spatial analysis, or if another data set should be found or created. Access to GIS metadata also helps agencies find more opportunities for sharing existing data sets, rather than devoting time and resources to create a new set for each purpose.

Within a more general scope (both inside and outside of the GIS venue), metadata allows users to determine the availability and usefulness of data sets. Users gain a shared and accepted definition of what a given term means, and can then link that accepted definition to a database, database attribute or another IT asset. Without a common definition and accepted master record, it is much more likely that users will either use different data sources or create their own, causing version control problems and creating distrust of some organizational data sources.

The problem of distrust is common in the business world. For example, in a meeting with a number of HR professionals from a government agency, the group was asked what their head count was. But the agency had five different definitions of what their head count actually was, depending on whether the tally included funded full-time equivalent positions (in government terms, a full-time equivalent position may be filled by a single, full-time employee or by two or more part-time employees); part-time, temporary and intern workers; personnel on temporary duty; full-time employees only; or employees currently receiving benefits. Each definition created a different data set, all were in use and the head-count number varied depending on who was asked.
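A small worked example shows how quickly the numbers diverge. The staff list and the four counting rules below are invented for illustration (the source does not give the agency's data or its exact definitions); the point is only that each rule is defensible and each returns a different "head count" from the same records.

```python
# Hypothetical staff records for a very small office.
staff = [
    {"name": "A", "status": "full-time", "fte": 1.0, "benefits": True},
    {"name": "B", "status": "part-time", "fte": 0.5, "benefits": True},
    {"name": "C", "status": "part-time", "fte": 0.5, "benefits": False},
    {"name": "D", "status": "temporary", "fte": 1.0, "benefits": False},
    {"name": "E", "status": "intern", "fte": 0.5, "benefits": False},
]

# Four plausible definitions of "head count", each a rule over the same data.
definitions = {
    "all workers": lambda people: len(people),
    "full-time employees only": lambda people: sum(
        1 for p in people if p["status"] == "full-time"
    ),
    "full-time equivalents": lambda people: sum(p["fte"] for p in people),
    "employees receiving benefits": lambda people: sum(
        1 for p in people if p["benefits"]
    ),
}

counts = {name: rule(staff) for name, rule in definitions.items()}
for name, value in counts.items():
    print(f"{name}: {value}")
```

With these five records the answers are 5, 1, 3.5 and 2, so the reported figure depends entirely on which definition the person answering has in mind.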

It is almost impossible to determine where data comes from and whether or not the data can be trusted under such conditions. In many cases, if users don't trust the data, they will copy the data to a spreadsheet and massage it for their own purposes, which results in inaccurate data being used to run the organization.

Many of the InfoSphere Information Server components create and share metadata in a very transparent fashion, without requiring further effort from developers. InfoSphere QualityStage and InfoSphere DataStage developers can build jobs for data quality, cleansing and data integration, creating operational metadata in the process. For example, developers using InfoSphere QualityStage and InfoSphere DataStage can also read or develop notes and annotations to their processes for collaboration (documenting what an analyst may have created, or simply creating notes on how a process works) and then share those notes with other users and interfaces. Analysts using InfoSphere Information Analyzer to look at the contents of data sources not only develop technical metadata in the process of connecting to data sources, but can also create notes about what results to share with job developers.

InfoSphere Business Glossary and InfoSphere Business Glossary Anywhere

To create business metadata, link business metadata to IT assets and disseminate metadata to users, organizations use InfoSphere Business Glossary and a companion product called InfoSphere Business Glossary Anywhere. In the example of the HR department with multiple ways to count their total staff, it would enable the user community to collaborate on the official denition for the term head count, where that information is stored, descriptive usage information and the data steward or owner for the term.


InfoSphere Business Glossary provides a Web-based tool for creating and managing standard definitions of business concepts. Through InfoSphere Business Glossary, users work collaboratively to share and build common understanding to create a classification system that is tailored to an organization's specific needs and structure. InfoSphere Business Glossary helps simplify the task of managing, browsing and customizing the broad variety of metadata that is stored in the repository of InfoSphere Metadata Server: the critical details about tables, columns, models, schemas, operations and other components of the data integration process. Within InfoSphere Business Glossary, metadata is organized into categories, each of which contains terms. Users can use terms to classify other objects in the metadata repository based on the needs of the organization, and can also designate users or groups as stewards for any metadata object. For users, InfoSphere Business Glossary becomes an electronic data dictionary, providing an easy-to-comprehend way to navigate the metadata that keeps the entire organization speaking the same language. InfoSphere Business Glossary helps business users:

• Develop a common vocabulary between business and technology: A common vocabulary allows multiple users to share a common definition of the meaning of data. Users can assign categories and terms that are meaningful in an organizational context, and create a hierarchy of categories for ease of browsing.
• Take part in data governance and stewardship activities: Data assurance programs assign responsibility to business users (data stewards) for the management of data through its life cycle.
• Find business information that is derived from metadata: Metadata helps users to understand the meaning of the data, its currency, its lineage and who the data owner is.
• Access metadata without complicated tooling and querying: Metadata objects can be arranged in a hierarchical fashion to simplify browsing of the data objects.
• Provide collaborative enrichment of business metadata: Maintaining business metadata is an ongoing process. Data inputs evolve and business users collaborate and add notes, annotations, categories and synonyms to enrich the metadata.

InfoSphere Business Glossary provides a tool for recording those definitions and for relating business concepts together into taxonomies. This places the business requirements into the same metadata foundation used by the profiling and analysis processes.

InfoSphere Metadata Workbench

InfoSphere Metadata Workbench is an interface that analysts and developers can use to discover and analyze relationships between information assets in the metadata repository. It enables users to understand, analyze, audit and manage the flow of data throughout their organization. InfoSphere Metadata Workbench provides IT professionals with a design-time tool for managing and understanding the assets generated and used by InfoSphere Information Server. It also permits registration of outside processes (such as COBOL programs) to be documented within the metadata flow. By providing data lineage reports and analysis, InfoSphere Metadata Workbench supports IT professionals who are responsible for compliance and governance initiatives (such as Sarbanes-Oxley Act compliance). It also provides forward and backward impact analysis that displays the impact of proposed changes to information management environments.


InfoSphere Metadata Workbench helps analysts and developers:

• Explore information assets that reside in the metadata repository of InfoSphere Information Server
• Perform simple and advanced asset search and querying
• See information assets in the context of the entire organization with integrated cross-suite viewing capabilities
• Create graphical views of asset relationships/flows
• Analyze dependencies and relationships of key InfoSphere Information Server assets and BI reports
• Trace lineage from jobs and databases to Cognos or other BI reports
• Understand columns, tables and other assets
• Perform lineage analysis to understand where data comes from or goes to by using shared table information, job design information or operational metadata from job runs
• Perform impact analysis to understand dependencies and the effects of changes to a column or job across InfoSphere Information Server
• Analyze operational metadata from job runs and report on rows written and read, and on the success or failure of events
• Manage InfoSphere Information Server metadata to obtain in-depth analysis reports
• Create and edit descriptions of information assets
• Assign glossary terms to information assets
• Reconcile duplicate assets
• Map databases to database aliases
• Access runtime information to enrich reporting
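Lineage and impact analysis, as listed above, can both be thought of as walks over a graph of data flows: lineage follows the flows backward from an asset, and impact analysis follows them forward. The Python sketch below illustrates that idea only. The asset names and the `FLOWS` list are hypothetical, and the breadth-first traversal is a generic technique, not a description of how InfoSphere Metadata Workbench is implemented.

```python
from collections import deque

# Hypothetical data-flow edges: each pair means "source feeds target".
FLOWS = [
    ("mainframe.work_orders", "job.load_work_orders"),
    ("job.load_work_orders", "warehouse.work_orders"),
    ("gis.road_segments", "job.join_spatial"),
    ("warehouse.work_orders", "job.join_spatial"),
    ("job.join_spatial", "mart.contractor_performance"),
    ("mart.contractor_performance", "report.contractor_dashboard"),
]


def _adjacency(flows, reverse=False):
    """Build a lookup from each asset to its neighbors in the chosen direction."""
    graph = {}
    for source, target in flows:
        key, value = (target, source) if reverse else (source, target)
        graph.setdefault(key, []).append(value)
    return graph


def _reachable(graph, start):
    """Breadth-first walk returning every asset reachable from `start`."""
    seen, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.append(neighbor)
                queue.append(neighbor)
    return seen


def lineage(flows, asset):
    """Backward lineage: everything that feeds `asset`, directly or indirectly."""
    return _reachable(_adjacency(flows, reverse=True), asset)


def impact(flows, asset):
    """Forward impact: everything downstream of `asset`."""
    return _reachable(_adjacency(flows), asset)


upstream = lineage(FLOWS, "report.contractor_dashboard")
downstream = impact(FLOWS, "gis.road_segments")
```

In this sample, the lineage of the dashboard report reaches back to both the mainframe and GIS sources, while a change to the GIS road segments would affect the spatial join job, the data mart and the report.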

As an example of how InfoSphere Metadata Workbench can help enhance operational effectiveness, consider a military readiness dashboard application with geographical elements that highlight the location of military assets and their current readiness level. Data is displayed using a variety of maps (with color codes to show asset location), gauges and tables. While the dashboard provides a valuable high-level view, a commander might want to know more about a particular readiness metric. In this case, InfoSphere Metadata Workbench can be used to quickly show that officer the data feeds and processes used to generate that particular metric, enabling the officer to evaluate the validity of the metric without deep technical knowledge. Figure 4 showcases the various software components collectively known as InfoSphere Foundation Tools.

[Figure 4 diagram. Components shown:
InfoSphere Business Glossary/Business Glossary Anywhere: robust data dictionary, common definitions, established stewardship, links from terms to data sources/objects
InfoSphere Information Analyzer: understand what is actually contained in database fields (nulls, duplicates, format errors, etc.)
InfoSphere Data Architect: database, GIS and GIS metadata data source modeling
InfoSphere Metadata Workbench: metadata lineage and impact analysis
InfoSphere Information Server metadata repository: technical, business and operational metadata; nonspatial and spatial (specifically tabular)]

Figure 4: GIS metadata management is enabled through IBM InfoSphere Foundation Tools, which provide a direct interface for discovering, gathering and exploiting metadata.


Modeling GIS data and metadata

InfoSphere Data Architect (IDA) helps organizations discover the structure of heterogeneous data sources by examining and analyzing the underlying metadata, and it assists in modeling planned data sources/migrations. IDA uses established Java Database Connectivity (JDBC) connections, enabling users to explore existing data structures using native queries and easily browse the hierarchy of data elements. With IDA, users develop data models that can be incorporated into a data integration project at a source and a target level. IDA can create logical, physical and domain models for a variety of relational database sources. Elements from logical and physical data models can be visually represented in diagrams using Information Engineering (IE) notation; alternatively, physical data model diagrams can use the Unified Modeling Language (UML) notation. InfoSphere Data Architect enables data professionals to create physical data models from scratch, from logical models using transformation or from the database using reverse engineering. IDA also enables modelers and architects to define and implement standards that help increase data quality and enterprise consistency for naming, meaning, values, relationships, privileges, privacy and traceability. Standards can be defined once and associated with diverse models and databases, helping to improve efficiency and consistency. IDA also includes extensible, rules-driven analysis that verifies compliance to naming, syntax, normalization and best-practices standards for both models and databases. Finally, IDA can import and export logical and physical data models from other modeling tools, making it possible to take advantage of existing data models wherever they appear, reducing the amount of time needed to translate models into data objects.

Discover business objects hidden within data

Data from multiple heterogeneous sources is often related in ways that are not immediately obvious. It may also contain sensitive information that is not clearly identified. This can create difficulties for agencies working to integrate GIS into their broader organization, as data analysts may be unfamiliar with the types of data traditionally managed by a separate GIS department. Uncovering these hidden relationships and categorizations is critical to the success of any data integration or governance project. However, identifying and documenting an organization's data, as well as identifying relationships, business objects and transformational logic between data sources, is not always a straightforward process. InfoSphere Discovery automates this process through heuristics and sophisticated algorithms, helping organizations accelerate data integration and governance projects, while achieving greater accuracy with less risk.
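One simple heuristic for surfacing relationships that are not documented anywhere is to compare the values held in columns from different sources and report pairs with a high overlap. The Python sketch below shows that heuristic on two small, hypothetical tables. It is an assumption-laden illustration of the general idea, not the algorithm used by InfoSphere Discovery; the table names, column names and the 0.9 threshold are all invented for the example.

```python
def overlap(values_a, values_b):
    """Share of distinct values in column A that also appear in column B."""
    a, b = set(values_a), set(values_b)
    return len(a & b) / len(a) if a else 0.0


def candidate_relationships(tables, threshold=0.9):
    """Return (column, column, score) triples for cross-table columns that overlap strongly."""
    columns = [
        (table, column, values)
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    found = []
    for table_a, col_a, values_a in columns:
        for table_b, col_b, values_b in columns:
            if table_a == table_b:
                continue  # only compare columns from different tables
            score = overlap(values_a, values_b)
            if score >= threshold:
                found.append((f"{table_a}.{col_a}", f"{table_b}.{col_b}", round(score, 2)))
    return found


# Hypothetical extracts from a GIS attribute table and a separate tax system.
TABLES = {
    "gis_parcels": {
        "parcel_no": ["P1", "P2", "P3", "P4"],
        "zone": ["R", "R", "C", "I"],
    },
    "tax_records": {
        "pid": ["P1", "P2", "P3", "P4", "P5"],
        "amount": [100, 250, 90, 400, 75],
    },
}

relationships = candidate_relationships(TABLES)
```

Here the differently named columns `parcel_no` and `pid` are reported as a likely link because every parcel number also appears in the tax records, even though nothing in the column names suggests the connection.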
Cleansing GIS tabular data

As organizations grow, they retain old data systems and augment them with new and improved systems as goals and needs evolve. Over time, data becomes increasingly difficult to find, manage and use, decreasing the likelihood that users can quickly make accurate decisions based on up-to-date, trusted data. The cost of poor data quality is illustrated in the following scenarios:

• A military readiness application contains incorrect and outdated data, so military officers cannot correctly grasp their readiness to deploy and operate in a hostile environment, potentially risking lives, equipment and mission success.
• A data error in a bank causes hundreds of creditworthy customers to receive mortgage default notices. The error costs the bank time, effort and customer goodwill.
• A marketing organization sends duplicate direct mail pieces; redundancy in each mailing costs hundreds of thousands of dollars a year.
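As a rough illustration of how duplicates like those in the direct-mail scenario can be detected, the Python sketch below standardizes names and addresses, compares records with a simple string-similarity score and keeps one surviving record per match. This is a deliberately simplified stand-in that uses the standard library's `difflib`; it is not probabilistic matching and does not represent any product's matching logic. The abbreviation table, the 0.85 threshold and the "newest record wins" survivorship rule are assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# A few illustrative abbreviations; a real standardization rule set would be far larger.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def standardize(text):
    """Lowercase, strip punctuation and expand a few common abbreviations."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def similarity(rec_a, rec_b):
    """Average string similarity of standardized name and address (0.0 to 1.0)."""
    scores = [
        SequenceMatcher(None, standardize(rec_a[f]), standardize(rec_b[f])).ratio()
        for f in ("name", "address")
    ]
    return sum(scores) / len(scores)


def deduplicate(records, threshold=0.85):
    """Keep one survivor per group of matching records."""
    survivors = []
    for record in records:
        for i, kept in enumerate(survivors):
            if similarity(record, kept) >= threshold:
                # Survivorship rule (an assumption): the most recently updated record wins.
                if record["updated"] > kept["updated"]:
                    survivors[i] = record
                break
        else:
            survivors.append(record)
    return survivors


# Hypothetical mailing list with one duplicated customer.
MAILING_LIST = [
    {"name": "Jane Smith", "address": "12 Main St.", "updated": "2009-03-01"},
    {"name": "JANE SMITH", "address": "12 Main Street", "updated": "2010-01-15"},
    {"name": "Raj Patel", "address": "40 Oak Ave", "updated": "2008-07-20"},
]

clean_list = deduplicate(MAILING_LIST)
```

After standardization the two "Jane Smith" records are identical, so only the more recently updated one survives and a single mail piece would be sent.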


Data quality issues spring from many sources, but can often be traced back to one of three common themes:

• A lack of common standards or instructions for storing data
• Inconsistent data entry
• Poor or decentralized control over key organizational data

InfoSphere QualityStage is a data reengineering environment that is designed to help organizations cleanse and enrich data. The solution includes a set of testing stages, design tools for specifying matches between data, and additional features that combine to create a development environment for building data-cleansing tasks. Using the stages and design components, developers can quickly and easily process large stores of data, selectively transforming the data as needed. InfoSphere QualityStage provides a set of integrated modules for common data reengineering and cleansing tasks:

• Investigating
• Conditioning (standardizing)
• Designing and running matches
• Determining which data records survive (survivorship)

With probabilistic matching capabilities and dynamic weighting algorithms, InfoSphere QualityStage helps agencies create high-quality, accurate data and consistently identify core business information (such as customer, location and product) throughout the organization. InfoSphere QualityStage standardizes and matches any type of information, including information from disparate data sources and all types of constituent/customer, product and tabular GIS data, either in batch or at the transaction level in real time.

Integrate GIS data with nontraditional GIS data

InfoSphere Information Server can be used to extend the data sources traditionally available to GIS by integrating GIS data with data formats that normally aren't readable by a GIS with or without additional transformations. Nontraditional GIS data could be in a mainframe, in a flat file, in a Web service, in an ERP system or in an application, or in any combination of these locations. For example, a state department of transportation system might contain contractor transaction work-order data in a mainframe database, and may have department financial information contained in a relational database or application. The GIS may have tabular information including a work-order number, while the work-order mainframe data contains a work-order contractor number and task order and the relational data has a contractor number and task order. InfoSphere Information Server could join the relational data to the mainframe data based on contractor and task order data, and join the combined information to the GIS tabular information to allow further insight into contractor performance on a spatial/geographic basis.

In the event that spatial transformations are required, InfoSphere Information Server components can work in conjunction with FME Server, a specialized spatial ETL product from Safe Software. FME Server integrates directly with InfoSphere DataStage and by proxy with the other components of InfoSphere Information Server to perform specialized spatial transformations. With a wide range of data integration and analysis capabilities, InfoSphere Information Server opens the door to an equally wide range of GIS projects, from simply analyzing GIS tabular data to integrating and analyzing GIS data from more conventional data sources.
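The two-step join described for the state department of transportation example can be sketched in a few lines of Python. All records, field names and values below are hypothetical, and the small `join` helper is a generic in-memory inner join written for this illustration; it does not represent InfoSphere DataStage job design or any product API.

```python
# Hypothetical extracts; all field names and values are illustrative.
MAINFRAME_WORK_ORDERS = [
    {"work_order": "WO-100", "contractor": "C-7", "task_order": "T-1"},
    {"work_order": "WO-101", "contractor": "C-9", "task_order": "T-4"},
]
FINANCE_ROWS = [
    {"contractor": "C-7", "task_order": "T-1", "amount_paid": 125000},
    {"contractor": "C-9", "task_order": "T-4", "amount_paid": 48000},
]
GIS_TABLE = [
    {"work_order": "WO-100", "county": "North", "route": "SR-12"},
    {"work_order": "WO-101", "county": "South", "route": "SR-40"},
]


def join(left, right, keys):
    """Inner join two lists of dicts on the given key fields."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


# Step 1: join the relational finance data to the mainframe work orders
# on contractor number and task order.
work_order_finance = join(MAINFRAME_WORK_ORDERS, FINANCE_ROWS, ["contractor", "task_order"])

# Step 2: join the combined rows to the GIS tabular data by work-order number.
spatial_view = join(GIS_TABLE, work_order_finance, ["work_order"])
```

Each row of `spatial_view` now carries both the geographic attributes (county and route) and the contractor payment data, which is what allows contractor performance to be examined on a spatial basis.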


Extending GIS capabilities with InfoSphere Information Server and other IBM software

InfoSphere Information Server works with other IBM software components, such as IBM Cognos and IBM SPSS, to help agencies extract even more value from their GIS systems. Organizations can integrate and augment the investments that they have already made in people, processes and technology by integrating data sources in new ways and providing more organizational insight into data. Here are a few examples of how extending a GIS with InfoSphere Information Server can help organizations derive additional value from their GIS investments:

• Data modeling and metadata modeling: GIS data (or data to be integrated with GIS data) can be modeled using InfoSphere Data Architect as a modeling tool. InfoSphere Data Architect is also a convenient gateway for planning and modeling potential linkages of GIS data to traditional database sources. This helps users better understand exactly what relationships exist between GIS and non-GIS data sources, and how those relationships can be leveraged for both existing and planned/future sources.
• Integrated metadata management: GIS metadata can be stored and managed within the construct of the InfoSphere Information Server metadata repository. This allows organizations to have a better shared understanding of their GIS data. In addition, applied use of the metadata repository in conjunction with other InfoSphere Information Server products, such as InfoSphere Metadata Workbench, InfoSphere DataStage and InfoSphere QualityStage, offers a window into data lineage, quality and governance. This gives users a high level of trust in the data by showing where data comes from, when it was updated and what kinds of transformations took place to modify, transform or augment it.
• GIS metadata query/flag/mine/retrieve: GIS and non-GIS metadata can be stored using InfoSphere Metadata Server and can currently be queried with InfoSphere Business Glossary. Theoretically, organizations can also make map-layer retrieval much more interactive for users dealing with large numbers of spatial layers. A query interface (which could be a search engine or an ad hoc query interface) could be combined with a writeback mechanism to the metadata to speed the map production process. In this scenario, users could type in search terms for their mapping requirements. Those search terms could be applied against spatial metadata entries, score the metadata based on what users have found useful for mapping, retrieve the appropriate spatial metadata and provide a checkbox next to each entry. Users would then check the layers they wanted to use to generate a map and click a submit button. The selected layers would be flagged, and the flag would be written back to the metadata to indicate that a user had selected this data layer, helping to improve future queries of the metadata. The selection would also be passed to the GIS interface (which could be a GIS or a simple spatial viewer) for creation of a baseline map.
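The query, score, flag and writeback scenario described above is theoretical, so the following Python sketch is equally speculative. It searches hypothetical spatial metadata entries by keyword, gives a small boost to layers that earlier users selected, and writes each new selection back to the metadata. The layer entries, the `times_selected` field and the 0.1 boost weight are assumptions invented for this illustration and do not correspond to any existing repository schema or product feature.

```python
# Hypothetical spatial metadata entries; "times_selected" holds the writeback flag count.
LAYERS = [
    {"id": "roads_2009", "keywords": {"roads", "transportation", "highway"}, "times_selected": 4},
    {"id": "census_tracts", "keywords": {"census", "population"}, "times_selected": 1},
    {"id": "traffic_counts", "keywords": {"traffic", "roads", "transportation"}, "times_selected": 0},
]


def search(layers, query):
    """Rank layers by keyword matches plus a small boost for past selections."""
    terms = set(query.lower().split())
    results = []
    for layer in layers:
        matches = len(terms & layer["keywords"])
        if matches:
            score = matches + 0.1 * layer["times_selected"]
            results.append((score, layer["id"]))
    return [layer_id for score, layer_id in sorted(results, reverse=True)]


def flag_selected(layers, selected_ids):
    """Write the user's selection back to the metadata and return the layers to map."""
    chosen = []
    for layer in layers:
        if layer["id"] in selected_ids:
            layer["times_selected"] += 1  # the writeback that informs future queries
            chosen.append(layer["id"])
    return chosen


hits = search(LAYERS, "roads traffic")
layers_to_map = flag_selected(LAYERS, {"traffic_counts"})
```

The list in `layers_to_map` stands in for the selection that would be handed to the GIS or spatial viewer to build the baseline map.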

Integrating InfoSphere Information Server and GIS with other components


While InfoSphere Information Server is designed to provide the data integration architecture for an organization, and a GIS would provide spatial analysis capabilities, there are additional products that can be used to optimize organizational IT investments. These capabilities exist in bolt-on, off-the-shelf products such as analytics, data mining and BI applications (see Figure 5).


[Figure 5 diagram. Components shown:
Traditional data sources: native API, ODBC, Web service, FTP
Spatial data sources: ESRI Geodatabase, Oracle Spatial, DB2 spatial, tabular
InfoSphere Information Server: data profiling, data cleansing, data transformation/enrichment, parallel processing, integrated metadata, metadata repository
Spatial ETL: Safe Software FME
SPSS: data mining/pattern detection
IBM Cognos: scheduled reports, ad hoc reporting, scorecards/metrics, graphs/charts, trending
SpotOn Vantage: Cognos + ESRI ArcGIS Server + SpotOn Geographical Business Intelligence (GBI)]

Figure 5: InfoSphere Information Server connects to data mining applications, analytics applications and BI applications to extend the use and reach of trusted enterprise information.

Spatial ETL: FME Server

FME Server (www.safe.com) can be used to extend InfoSphere Information Server via InfoSphere DataStage, forming it into a scalable spatial ETL platform. This enables spatial data managers to quickly meet diverse data access requirements, permitting specialized data-conversion processes specific to the GIS arena. FME Server offers flexible spatial data services that help users convert, load and distribute large volumes of data so end users can access it where, when and how they need to. FME Server brings the power of Safe Software's proven spatial data translation, transformation and integration technology from FME Desktop to enterprise server environments, enabling organizations to take advantage of flexible spatial data distribution and scalable data loading and conversion features.

With FME Server, organizations can address diverse spatial data requirements for:

• Web-based spatial data access: Downloading and streaming
• Scalable data consolidation: Loading and migration
• Online quality assurance: Spatial data uploading and validation
• Server-based spatial data conversion: Translation and transformation

Advanced statistics, analytics and data mining: SPSS

SPSS, an IBM company, helps organizations find and implement new sources of competitive advantage through predictive analytics. When analytics are inserted into key business processes, better decisions are made and the best actions are taken on a consistent, repeatable basis.


SPSS provides predictive analytics and data mining technologies that can be used to add predictive intelligence to any data integration and/or BI solution. These capabilities can further extend GIS by providing superior analytical capabilities compared to a "plain vanilla" GIS. Advanced statistics, analytics and data mining can be included in a hybrid InfoSphere Information Server and GIS solution in a "grey box" fashion, using an InfoSphere Information Server component to deliver data to a model/algorithm developed in SPSS. SPSS then runs the data through the model/algorithm, and enriches the data. The enriched data can be fed back into InfoSphere Information Server for further processing and movement to a data warehouse, data mart or other location where it can be presented through a BI interface.
BI: IBM Cognos

BI is a common interface or front end for any data integration project, allowing users to view and query data for a variety of purposes. IBM Cognos 8 BI delivers the complete range of BI capabilities: reporting, analysis, dashboarding and scorecards on a single SOA. Users can create, share and use reports that draw on data from virtually any combination of data sources via InfoSphere Information Server.

• Reporting gives users access to a list of self-serve report types. It is adaptable to any data source and operates from a single metadata layer to provide benefits such as multilingual reporting, ad hoc query and scheduling and bursting.
• Analysis enables the guided exploration of information that pertains to all dimensions of your business, regardless of where the data is stored. You can analyze and report against online analytical processing (OLAP) and dimensionally aware relational sources.
• Dashboards communicate complex information quickly. They translate information from various corporate systems and data using gauges, charts and other graphical elements to show the relative health of your organization.
• Scorecarding features align your business units and tactics with strategy, providing the ability to communicate goals consistently and monitor performance against targets.

Linking BI to GIS: SpotOn

SpotOn Vantage (www.spotonsystems.com) extends IBM Cognos BI by seamlessly integrating with geospatial analytic capabilities from ESRI ArcGIS Server. Organizational data can easily be presented in a geographic manner alongside tabular and chart formats without the need for custom development. Users can navigate and interact with ESRI maps and Cognos report objects without leaving their current view. Information flows between map and report while the user retains a single unified view. Information is presented synchronized and in context, with high-impact and easy-to-understand visualizations. SpotOn Vantage can:

• Embed live, interactive, high-impact maps within Cognos reports
• Develop additional map layers in ESRI with business data from Cognos reports
• Provide multidirectional interaction that allows freedom of analysis: dashboard interaction, map-to-report, report-to-map, map-to-map, map-based prompting, drill down, drill through and so on

Finally, it is important to note that there are additional methods for displaying geospatial data via a simplistic spatial viewing interface such as Google Earth. However, those viewers cannot provide the analysis capabilities of a GIS; they are designed as a display mechanism and lack many spatial analytical capabilities.

GIS solutions in the real world


The following examples illustrate solutions that employ a combination of GIS, InfoSphere Information Server and IBM software components.
Biosurveillance/food supply surveillance

The earlier that food supply contamination incidents can be detected, located and quarantined, the more lives can be saved. A biosurveillance or food supply surveillance solution must be capable of integrating data from multiple heterogeneous sources, including GIS systems, hospitals, healthcare centers, doctors' offices and pharmacies. Most healthcare facilities use an ICD-10 (International Statistical Classification of Diseases and Related Health Problems) code for patient diagnosis or symptoms. By monitoring diagnostic codes from healthcare facilities, a system can look for different combinations of codes that may indicate conditions of interest, such as food poisoning or unusual medical symptoms that might indicate a terrorist attack. Searching for a particular pattern may be done via statistical analytics or data mining algorithms provided by IBM SPSS, and the data scored to test the likelihood that a particular record may be of interest. If the data scores high enough, an alert can be generated and sent to the appropriate authority.

The system could also be used to track natural disease occurrences, such as influenza outbreaks. If a historical baseline of data is present, it allows monitoring to determine what is normal for a given region, time of year or weather condition and what may indicate an outbreak of interest to health officials. This type of information could be displayed in a Cognos dashboard (to show where diseases are below normal thresholds and where there may be outbreaks occurring), as well as spatially analyzed within a GIS for simple mapping purposes or to determine the origin or potential spread of a disease.
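The baseline comparison described above can be illustrated with a very simple statistical score. The Python sketch below measures how many standard deviations the current count of a diagnosis code sits above its historical mean for a region, and raises an alert when that score passes a threshold. This z-score approach is a basic stand-in chosen for the example; it is not the analytics or data mining that IBM SPSS would provide. The regions, counts, the placeholder code value and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def outbreak_score(history, current):
    """Standard deviations by which the current count exceeds the historical mean."""
    spread = stdev(history)
    if spread == 0:
        return 0.0  # no variation in the baseline: this sketch does not score it
    return (current - mean(history)) / spread


def alerts(baselines, current_counts, threshold=3.0):
    """Return the (region, code) keys whose score meets the alert threshold."""
    flagged = []
    for key, current in current_counts.items():
        if outbreak_score(baselines[key], current) >= threshold:
            flagged.append(key)
    return flagged


# Hypothetical weekly counts for one diagnosis code per region
# (the code value "A09" is used here only as a placeholder).
BASELINES = {
    ("Region 1", "A09"): [4, 6, 5, 7, 5, 6],
    ("Region 2", "A09"): [10, 12, 11, 9, 10, 11],
}
THIS_WEEK = {("Region 1", "A09"): 18, ("Region 2", "A09"): 12}

flagged_regions = alerts(BASELINES, THIS_WEEK)
```

Region 1 is flagged because 18 cases is far outside its usual range of 4 to 7, while Region 2's count of 12 is within normal variation. The flagged keys are what would be passed on to a dashboard or mapped in a GIS.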
Departments of Transportation

Departments of Transportation (DOTs) usually have massive amounts of disparate data: GIS information for roadway maps and engineering/construction projects; asset management and tracking systems for construction and maintenance equipment; video information on pavement conditions in linear referencing systems (LRS); financial, budgeting and contracting information; work orders that may be in 20-year-old legacy systems and so on. In most cases, this information is kept in separately designed and siloed systems and sources. As a result, it may contain many redundancies and overlaps, making it difficult for officials to query their data or use data in the most efficient way possible. In turn, it becomes difficult to answer questions for political officials or taxpaying constituents in an effective and timely manner. InfoSphere Information Server and other IBM software components, such as Cognos and SPSS, can help DOTs create an overarching information architecture to provide faster access and more accurate information, increasing efficiency and providing greater taxpayer value.

Military readiness

Military readiness activities typically involve combining and leveraging complex and disparate data sources, many of which are completely isolated from each other. These data sources may contain GIS information on where different military assets are located, as well as more conventional data on supplies, logistics, asset management, manpower and critical skills information. For example, supply and logistics systems provide information on systems and parts availability and location. If integrated with supplier data, a point of view may be developed where inventory replenishment and pipeline can also be determined. Combining that with shipping information (associating an RFID or other tracking system) can further determine parts inventory availability. Asset management is another example: It requires tracking the location of completed or deployed systems. For the Navy, it might be a ship; for the Air Force, a radar system or delivery platform; for the Army, troop carriers or main battle tanks. Battlefield commanders are typically interested in the location of their assets, the collective availability of those assets and the readiness of assets for deployment, relocation or use. Manpower information is concerned not only with where billets may be assigned, but also with who is filling that billet, what skills that person has (if they are fluent in Arabic, for example) and where that person is currently located.

With InfoSphere Information Server, all of those information sources can be combined to deliver commanders a better view of their operational environment. As an example, as part of an operational deployment, a high-level commander may want to move a squadron of AH-64 Apache helicopters from the continental U.S. to a desert location, which is a much harsher environment. The burn rate on many mechanical parts, such as turbine blades and other engine parts, goes up in such environments. Data on how much the burn rate increases can be modeled and compared to existing parts inventories as well as anticipated replacement schedules. This information can then be used to determine if and when additional orders need to be placed to ship additional parts, or if additional parts need to be built. From a manpower perspective, personnel/billet location can be combined with skills and compared against mission plans to determine the availability of personnel with the critical skills needed to complete a particular mission. In turn, that information can be combined with logistical and spatial/positional information and presented to a regional commander through a dashboard, scorecard and/or reports. This gives the commander a much more cohesive and complete picture of asset location and readiness, as well as presenting the information necessary for predicting when critical capabilities may become unavailable due to logistical shortfalls.
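The burn-rate comparison in the helicopter example comes down to simple arithmetic once the inputs are integrated. The Python sketch below estimates how long an existing parts inventory lasts at an increased consumption rate, how many additional parts a deployment would need and the latest day an order could be placed given a supplier lead time. Every number and parameter name here is hypothetical, and the linear consumption model is an assumption made to keep the example short.

```python
import math


def days_of_supply(on_hand, daily_use, burn_multiplier):
    """Whole days the current stock lasts at the increased consumption rate."""
    return math.floor(on_hand / (daily_use * burn_multiplier))


def reorder_plan(on_hand, daily_use, burn_multiplier, lead_time_days, deployment_days):
    """Return (days of supply, latest day to place an order, additional parts needed)."""
    supply = days_of_supply(on_hand, daily_use, burn_multiplier)
    needed = math.ceil(daily_use * burn_multiplier * deployment_days)
    shortfall = max(0, needed - on_hand)
    # An order is only required when the deployment outlasts the stock on hand.
    order_by_day = max(0, supply - lead_time_days) if shortfall else None
    return supply, order_by_day, shortfall


# Hypothetical inputs: 120 parts on hand, 2 used per day at the home station,
# a 1.5x burn rate in the desert, a 21-day supplier lead time, a 90-day deployment.
plan = reorder_plan(
    on_hand=120,
    daily_use=2,
    burn_multiplier=1.5,
    lead_time_days=21,
    deployment_days=90,
)
```

With these inputs the stock lasts 40 days, 150 more parts are needed and the order must be placed by day 19 to arrive before the inventory runs out.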

For more information


To learn more about IBM InfoSphere Information Server, InfoSphere Foundation Tools and GIS, please visit:

ibm.com/software/data/infosphere
ibm.com/software/data/integration/info_server
ibm.com/software/data/infosphere/foundation-tools/index.html

To learn more about SPSS, an IBM company, please visit:


ibm.com/software/data/info/spss

To learn more about IBM Cognos solutions, please visit:


ibm.com/software/data/cognos

Building trustworthy data with IBM software


Currently available off-the-shelf IBM software components, such as InfoSphere Information Server, InfoSphere Foundation Tools, Cognos and SPSS, can be used not only to increase the range of data sources available to a GIS, but also to greatly increase the quality and trustworthiness of that data. IBM solutions help you extend and leverage your existing investments in systems and data. Using these existing and tested components in new ways can improve your agency's or organization's ability to meet new and evolving goals.

© Copyright IBM Corporation 2010

IBM Software Group
Route 100
Somers, NY 10589

Produced in the United States of America
June 2010
All Rights Reserved

IBM, the IBM logo, ibm.com and InfoSphere are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

About the author


Dave McDermott
Information Technology Specialist
Federal/InfoSphere Information Server
IBM Software Group

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both. Microsoft is a trademark of Microsoft Corporation in the United States, other countries or both. Other company, product or service names may be trademarks or service marks of others.

IMW14319-USEN-00
