
Informatica PowerCenter Data Profiling

The Informatica PowerCenter Data Profiling option provides thorough, accurate information about the content, quality, and structure of the data in virtually any operational system to enable effective data integration. With the Data Profiling option, development teams use Informatica's unified data integration architecture and codeless development environment to discover and understand source data and to create and track historical profiling metrics. By using the Data Profiling option, organizations can lower costs, speed development, and increase data quality across their data integration initiatives.

Benefits of Informatica PowerCenter Profiling


Improve data quality over time with historical dashboards and reports.
Reduce development iterations by leveraging reusable development assets.
Ensure data is fit for its intended purpose by easily profiling source data and eliminating unfounded assumptions.
Speed time-to-benefit through a single, fully integrated GUI and data integration environment that requires little training.

Assumptions Lead to Poor Data Quality


Understanding the content, quality, and structure of source data is critical to the success of any data integration effort. Manual steps to distill information about source data require exhaustive searches through external documentation, data dictionary definitions, copybooks, and tens of thousands of rows of actual data, only to result in out-of-date and incorrect information. Rather than spending considerable time to understand the state of the data, many project teams make assumptions about quality and then simply dump data into the new ERP system, data warehouse, or other target system, often resulting in problem data. Reliance on assumptions or manually derived information about the data in operational systems leads to poor data quality, a target system unfit for its intended use, extended time to value, and even failed data integration projects.

Data integration development teams need an automated method for exposing valuable information about their underlying source systems to eliminate assumptions. The solution should automatically generate reusable development assets to fuel data integration projects. In addition, a data profiling solution that enables ongoing profiling, captures historical trends and statistical information, and provides visibility into this information through reports and dashboards can help organizations dramatically improve data quality over time.

"Understanding the data elements in their operational systems is particularly critical for enterprises starting e-business initiatives, customer relationship management (CRM) strategies, data warehouse implementations, or enterprise resource planning (ERP) migrations, yet it is often ignored. Unless enterprises take the time to understand the data available in their current and planned applications, they will end up with far too many databases to manage and will have problems with their data quality."
Gartner Research

Ensure Data Quality With Comprehensive, Automatic Data Profiling


The PowerCenter Data Profiling option offers the capabilities necessary to implement fast, accurate, lower-cost data integration. The Data Profiling option automatically profiles any data accessible to PowerCenter, including those sources supported by PowerExchange, and replaces upfront assumptions with quality statistics computed from the data itself. The Data Profiling option then autogenerates reusable development assets such as mappings and objects, which can be used for the initial data integration project and reused for subsequent projects. By allowing organizations to profile data on an ongoing basis and to analyze changes to source data over time, the Data Profiling option supports ongoing efforts to improve data quality and builds end-user confidence in the data.
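As a rough illustration of what "quality statistics" can mean at the column level, the following Python sketch counts rows, nulls, and distinct values for each column of a small in-memory data set. It is a generic example, not the PowerCenter implementation or its API; the function names, the sample customer rows, and the choice of statistics are all assumptions made for illustration.

```python
from collections import Counter


def profile_column(values):
    """Return simple profile statistics for one column of raw values."""
    total = len(values)
    # Treat None and blank strings as missing (an assumption of this sketch).
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(non_null)
    return {
        "row_count": total,
        "null_count": total - len(non_null),
        "distinct_count": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


def profile_rows(rows):
    """Profile every column found in a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical customer data used only to exercise the functions.
customers = [
    {"id": 1, "state": "CA", "phone": "650-555-0100"},
    {"id": 2, "state": "CA", "phone": None},
    {"id": 3, "state": "NY", "phone": ""},
]
profile = profile_rows(customers)
```

Here `profile["phone"]["null_count"]` exposes how many customer rows lack a phone number, the kind of fact that would otherwise be assumed rather than measured.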

Figure 1: Right-click creation of an automatic profile of customer data

Improve Data Quality Over Time With Historical Dashboards and Reports
Because data is not static, enterprises must understand key metrics and trends in their data over time. The Data Profiling option offers an Interactive Profiling mode that displays results immediately as well as a Batch mode that provides ongoing data profiling and quality metrics. The Data Profiling option stores the metadata statistics for each run in a time-aware data warehouse. Users can view results inline or through a business intelligence tool, such as Informatica PowerAnalyzer, which provides reports and dashboards to illustrate changes in data content, quality, structure, and values over time. This capability is critical for any organization that wants to measure the effectiveness of its data quality improvement efforts. It also helps keep data integration logic accurate as source data evolves.
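The idea of keeping per-run statistics in a time-aware store can be sketched in a few lines of Python using the standard `sqlite3` module. This is only a conceptual illustration: the table layout, the `record_run` and `metric_trend` helpers, the `null_pct` metric, and the sample dates and values are hypothetical and do not describe the actual profiling warehouse schema.

```python
import sqlite3

# In-memory database standing in for a profiling warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile_runs ("
    " run_date TEXT, source TEXT, column_name TEXT, metric TEXT, value REAL)"
)


def record_run(run_date, source, column_name, metrics):
    """Store one row per metric for a single profiling run."""
    conn.executemany(
        "INSERT INTO profile_runs VALUES (?, ?, ?, ?, ?)",
        [(run_date, source, column_name, name, value) for name, value in metrics.items()],
    )


def metric_trend(source, column_name, metric):
    """Return (run_date, value) pairs for one metric, oldest run first."""
    cur = conn.execute(
        "SELECT run_date, value FROM profile_runs"
        " WHERE source = ? AND column_name = ? AND metric = ?"
        " ORDER BY run_date",
        (source, column_name, metric),
    )
    return cur.fetchall()


# Illustrative batch runs; ISO dates sort correctly as text.
record_run("2004-10-01", "customers", "phone", {"null_pct": 12.5})
record_run("2004-11-01", "customers", "phone", {"null_pct": 9.0})
record_run("2004-12-01", "customers", "phone", {"null_pct": 4.5})
trend = metric_trend("customers", "phone", "null_pct")
```

Because every statistic carries its run date, a report or dashboard can plot `trend` directly to show whether a quality measure is improving or degrading between runs.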

Leverage Reusable Development Assets and Reduce Iterations


Codifying business rules for profiling and integrating data from operational systems can be a tedious and time-consuming process. The Informatica Data Profiling option eliminates many of the iterative, manual processes necessary to profile and integrate data by automatically generating mapping logic that extracts information about the source data. Further, the autogenerated mapping logic can be reused in full or in part for subsequent data integration processes. By automatically generating reusable development assets, the Informatica Data Profiling option replaces the manual processes involved with profiling data, eliminates error-prone steps in the integration process, and creates consistency across integration routines.

Ensure Data is Fit For Its Intended Purpose


The PowerCenter Data Profiling option eliminates assumptions by uncovering accurate metadata statistics about the actual data in operational systems. Users can choose between auto and custom profiling to generate the rules that drive profiling. The wizard-driven auto approach enables team members to gain valuable insights quickly and with minimal effort. The custom profile feature gives users full control over the profile creation process, allowing them to gather detailed metrics about their source data and capture exception rows for analysis prior to building an integration workflow. By performing this profiling, organizations can uncover data quality issues before integration development begins and so improve the quality of data delivered to business-critical systems.
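To make the notion of custom rules and exception rows concrete, here is a minimal Python sketch that checks each row against a set of per-column rules and sets aside the rows that fail. It is not how the custom profile feature is configured in the product; the `RULES` dictionary, the `apply_rules` helper, and the sample rows are hypothetical and serve only to show the pattern.

```python
import re

# Hypothetical business rules: column name -> predicate on the column value.
RULES = {
    "state": lambda v: isinstance(v, str) and re.fullmatch(r"[A-Z]{2}", v) is not None,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}


def apply_rules(rows, rules):
    """Split rows into those passing every rule and those failing at least one."""
    passed, exceptions = [], []
    for row in rows:
        failed = [col for col, check in rules.items() if not check(row.get(col))]
        if failed:
            # Keep the offending row with the columns that broke a rule.
            exceptions.append({"row": row, "failed_columns": failed})
        else:
            passed.append(row)
    return passed, exceptions


# Illustrative source rows, two of which violate a rule.
rows = [
    {"id": 1, "state": "CA", "age": 34},
    {"id": 2, "state": "Calif.", "age": 51},
    {"id": 3, "state": "NY", "age": -1},
]
passed, exceptions = apply_rules(rows, RULES)
```

Reviewing `exceptions` before any integration logic is written shows which rules the source data actually violates and how often.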

Figure 2: Sample Profiling Data Quality Report of a Customer Data Source

Optimize Performance
Understand Even the Most Complex or Arcane Systems
An enterprise engaged in a data integration project must be able to connect to and profile any source system, whether it is a relational database, flat file, ERP, CRM, real-time, or mainframe legacy system. While most profiling tools on the market require time-consuming conversion of most sources into a relational or flat-file format, the Data Profiling option leverages Informatica's universal connectivity to provide analytical profiling capabilities for even the most complex or arcane system. This allows users to profile source data regardless of the source system, project requirements, or system architecture, without converting complex data into relational or flat files.

The warehouse in which results are stored also has an open architecture that can be accessed from any industry reporting tool or custom-built data delivery system. Users accustomed to a specific report format or interface receive results in a familiar manner, which enhances acceptance rates, reduces the potential for an incorrect reading, and increases the likelihood of success.

High-Performance, GUI-Driven Environment Speeds Time-To-Benefit


Most standalone profiling tools suffer from sluggish performance because of architecture and connectivity limitations. These tools add cycle time to the data integration process and limit the effectiveness of the profiling process. Informatica's Data Profiling option is built on PowerCenter's adaptive, metadata-driven engine, allowing it to benefit from performance features such as optimized partitioning, GUI-driven Data Smart Parallelism, in-memory caching, and unlimited linear scalability to power the profiling process. Scalable, high-volume throughput allows data integration teams to rapidly profile the entire set of source data, providing more accurate information about the data and reducing overall project risk.

WORLDWIDE HEADQUARTERS
2100 Seaport Boulevard, Redwood City, CA 94063, USA
Phone 650.385.5000, Fax 650.385.5500, Toll-free in the US 1.800.970.1179
www.informatica.com

© 2004 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, Turning integration into insight, PowerCenter, PowerExchange, and PowerAnalyzer are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be tradenames or trademarks of their respective owners. 6555 (10.19.04)
