Informatica 9 Getting Started Guide Version 9.0 Copyright (c) 1998-2010 Informatica Corporation. All rights reserved.
This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending. Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable. The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing. Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange and Informatica On Demand are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright Sun Microsystems. All rights reserved. Copyright RSA Security Inc. All Rights Reserved. Copyright Ordinal Technology Corp. All rights reserved. Copyright Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright 2007 Isomorphic Software. All rights reserved. Copyright Meta Integration Technology, Inc. All rights reserved. Copyright Intalio. All rights reserved. Copyright Oracle. All rights reserved. Copyright Adobe Systems Incorporated. All rights reserved. Copyright DataArt, Inc. All rights reserved. Copyright ComponentSource. All rights reserved. Copyright Microsoft Corporation. All rights reserved. Copyright Rogue Wave Software, Inc. All rights reserved. Copyright Teradata Corporation. All rights reserved. Copyright Yahoo! Inc. All rights reserved. Copyright Glyph & Cog, LLC. All rights reserved. This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (c) 1993-2006, all rights reserved. This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org. This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <daniel@haxx.se>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. The product includes software copyright 2001-2005 (c) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html. The product includes software copyright 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://svn.dojotoolkit.org/dojo/trunk/LICENSE. This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html. This product includes software copyright 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html. This product includes OSSP UUID software which is Copyright 2002 Ralf S. Engelschall, Copyright 2002 The OSSP Project, Copyright 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php. This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.
This product includes software copyright 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt. This product includes software copyright 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php. This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, and http://www.sente.ch/software/OpenSourceLicense.htm. This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php) and the BSD License (http://www.opensource.org/licenses/bsd-license.php). This product includes software copyright 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/. This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,254,590; 7,281,001; 7,421,458; and 7,584,422, international Patents and other Patents Pending. DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.
NOTICES This Informatica product (the Software) includes certain drivers (the DataDirect Drivers) from DataDirect Technologies, an operating company of Progress Software Corporation (DataDirect) which are subject to the following terms and conditions: 1. THE DATADIRECT DRIVERS ARE PROVIDED AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. 2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS. Part Number: IN-GSG-90000-0002
Table of Contents

Preface
    Informatica Resources
        Informatica Customer Portal
        Informatica Documentation
        Informatica Web Site
        Informatica How-To Library
        Informatica Knowledge Base
        Informatica Multimedia Knowledge Base
        Informatica Global Customer Support

Part I: Getting Started with Informatica Administrator
    Chapter 2: Lesson 1. Accessing Informatica Administrator
        Accessing Informatica Administrator Overview
        Task 1. Record Domain and User Account Information
            Informatica Domain Information
            Informatica Administrator User Account
        Task 2. Log In to Informatica Administrator
        Accessing Informatica Administrator Summary

Part II: Getting Started with Informatica Analyst
    Chapter 6: Lesson 1. Setting Up Informatica Analyst
        Setting Up Informatica Analyst Overview
        Task 1. Log In to Informatica Analyst
        Task 2. Create a Project
        Task 3. Create a Folder
        Setting Up Informatica Analyst Summary

Part III: Getting Started with Informatica Developer (Data Quality)
    Chapter 14: Lesson 1. Setting Up Informatica Developer
        Setting Up Informatica Developer Overview
        Task 1. Start Informatica Developer
        Task 2. Add a Domain
        Task 3. Add a Model Repository
        Task 4. Create a Project
        Task 5. Create a Folder
        Task 6. Select a Default Data Integration Service
        Setting Up Informatica Developer Summary
    Chapter 18: Lesson 5. Standardizing Data
        Step 2. Add Data Objects to the Mapping
        Step 3. Add a Standardizer Transformation to the Mapping
        Step 4. Configure the Standardizer Transformation
        Task 3. Run the Mapping
        Task 4. View the Mapping Output
        Standardizing Data Summary

Part IV: Getting Started with Informatica Developer (Data Services)
    Chapter 20: Lesson 1. Setting Up Informatica Developer
        Setting Up Informatica Developer Overview
        Task 1. Start Informatica Developer
        Task 2. Add a Domain
        Task 3. Add a Model Repository
        Task 4. Create a Project
        Task 5. Create a Folder
        Task 6. Select a Default Data Integration Service
        Setting Up Informatica Developer Summary
Preface
The Informatica Getting Started Guide is written for data quality and data services developers and analysts. It provides a tutorial to help first-time users learn how to use Informatica Developer and Informatica Analyst. This guide assumes that you have an understanding of data quality concepts, flat file and relational database concepts, and the database engines in your environment.
Informatica Resources
Informatica Customer Portal
As an Informatica customer, you can access the Informatica Customer Portal site at http://my.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Documentation Center, and access to the Informatica user community.
Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at infa_documentation@informatica.com. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments. The Documentation team updates documentation as needed. To get the latest documentation for your product, navigate to the Informatica Documentation Center from http://my.informatica.com.
Informatica Global Customer Support

Use the following standard-rate telephone numbers to contact Informatica Global Customer Support:

North America / South America:
Brazil: +55 11 3523 7761
Mexico: +52 55 1168 9763
United States: +1 650 385 5800

Europe / Middle East / Africa:
Belgium: +32 15 281 702
France: +33 1 41 38 92 26
Germany: +49 1805 702 702
Netherlands: +31 306 022 797
Spain and Portugal: +34 93 480 3760
United Kingdom: +44 1628 511 445
CHAPTER 1
Application services. A group of services that represent server-based functionality. An Informatica domain can contain a subset of application services. You configure the application services that are required by the application clients that you use.

Repositories. A group of relational databases that store metadata about objects and processes required to handle user requests.

Service Manager. A service built in to the domain that manages all domain operations. The Service Manager runs the application services and performs domain functions including authentication, authorization, and logging.

You can log in to Informatica Administrator after you install Informatica 9.0. You use Informatica Administrator to manage the domain and configure the required application services before you can access the remaining application clients.
The following table lists the application clients, not including Informatica Administrator, and the application services and the repositories that the client requires:
Data Analyzer
- Application services: Reporting Service
- Repositories: Data Analyzer repository

Informatica Analyst
- Application services: Analyst Service, Data Integration Service, Model Repository Service
- Repositories: Model repository

Informatica Developer
- Application services: Analyst Service, Data Integration Service, Model Repository Service
- Repositories: Model repository

Metadata Manager
- Application services: Metadata Manager Service, PowerCenter Integration Service, PowerCenter Repository Service
- Repositories: Metadata Manager repository, PowerCenter repository

PowerCenter Client
- Application services: PowerCenter Integration Service, PowerCenter Repository Service, Web Services Hub
- Repositories: PowerCenter repository
The following application services are not accessed by an Informatica application client:
PowerExchange Listener Service. Manages the PowerExchange Listener for bulk data movement and change data capture. The PowerCenter Integration Service connects to the PowerExchange Listener through the Listener Service.

PowerExchange Logger Service. Manages the PowerExchange Logger for Linux, UNIX, and Windows to capture change data and write it to the PowerExchange Logger log files. Change data can originate from DB2 recovery logs, Oracle redo logs, a Microsoft SQL Server distribution database, or data sources on an i5/OS or z/OS system.

SAP BW Service. Listens for RFC requests from SAP BI and requests that the PowerCenter Integration Service run workflows to extract data from or load data to SAP BI.
Feature Availability
Informatica 9.0 products use a common set of applications. The product features you can use depend on your product license. The following table describes the licensing options and the application features available with each option:
Data Explorer Advanced Edition
- Informatica Developer features: Profiling; Scorecarding
- Informatica Analyst features: Profiling; Scorecarding; Create and run profiling rules; Reference table management

Data Quality
- Informatica Developer features: Create and run mappings with all transformations; Create and run rules; Profiling; Scorecarding; Export objects to PowerCenter
- Informatica Analyst features: Profiling; Scorecarding; Reference table management; Create profiling rules; Run rules in profiles; Bad and duplicate record management

Data Services
- Informatica Developer features: Create logical data object models; Create and run mappings with Data Services transformations; Create SQL data services; Export objects to PowerCenter

Data Services with the Profiling option
- Informatica Developer features: Create logical data object models; Create and run mappings with Data Services transformations; Create SQL data services; Export objects to PowerCenter; Create and run rules with Data Services transformations; Profiling
Note: Informatica Data Explorer Advanced Edition functionality is a subset of Informatica Data Quality functionality.
Domain administrative tasks. Manage domain objects, including application services, nodes, grids, folders, database connections, applications, and licenses.
Security administrative tasks. Manage users, groups, roles, and privileges.
Profile data. Profiling reveals the content and structure of data and identifies its strengths and weaknesses. After you run a profile, you can selectively drill down to see the underlying rows from the profile results. You can also add columns to scorecards and add column values to reference tables.
Create rules in profiles. Create and apply rules within profiles. A rule is reusable business logic that defines
conditions applied to data when you run a profile. Use rules to further validate the data in a profile and to measure data quality progress.
Score data. Create scorecards to score the valid values for any column or the output of rules. Scorecards
display the value frequency for columns in a profile as scores. Use scorecards to measure and visually represent data quality progress. You can also view trend charts to view the history of scores over time.
Manage reference data. Create and update reference tables that analysts and developers use in data quality standardization and validation rules. Create, edit, and import data quality dictionary files as reference tables. Create reference tables to establish relationships between source data and valid and standard values. Developers use reference tables in standardization and lookup transformations in Informatica Developer.
Manage bad records and duplicate records. Fix bad records and consolidate duplicate records.
The Developer tool includes an editor, in which you can edit objects. In this example, the editor shows the Customer_Objects logical data object model. Depending on the object in the editor, the Developer tool displays views, such as the default view. The Developer tool also includes the following views that appear independently of the objects in the editor:
- Object Explorer. Shows projects, folders, and the objects they contain.
- Outline. Shows dependent objects in an object.
- Properties. Shows object properties.
- Data Viewer. Shows the results of a mapping, data preview, or an SQL query.
- Validation Log. Shows object validation errors.
- Cheat Sheets. Shows cheat sheets.
You can hide any view and move any view to another location in the Developer tool. You can also display other views, such as the Search view. Click Window > Show View to select the views you want to display.
Web Resources. Click the Web Resources button to access the Informatica How-To Library, which contains articles about the Developer tool, Informatica Data Quality, and Informatica Data Services.
Workbench. Click the Workbench button to start working in the Developer tool.
Cheat Sheets
The Developer tool includes cheat sheets as part of the online help. A cheat sheet is a step-by-step guide that helps you complete one or more tasks in the Developer tool. When you complete a cheat sheet, you have completed the tasks and can see the results. For example, after you complete a cheat sheet to import and preview a relational physical data object, you have imported a relational database table and previewed the data in the Developer tool. To access cheat sheets, click Help > Cheat Sheets.
Data Quality
Use the data quality capabilities in the Developer tool to analyze the content and structure of your data and enhance the data in ways that meet your business needs. Use the Developer tool to design and run processes that achieve the following objectives:
Profile data. Profiling reveals the content and structure of your data. Profiling is a key step in any data project,
as it can identify strengths and weaknesses in your data and help you define your project plan.
Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a
profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.
Parse records. Parse data records to improve record structure and derive additional information from your data.
You can split a single field of freeform data into fields that contain different information types. You can also add information to your records. For example, you can flag customer records as personal or business customers.
Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of your
postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.
Find duplicate records. Duplicate record analysis compares a set of records against each other to find similar or matching values in selected data columns. You set the level of similarity that indicates a good match between field values. You can also set the relative weight given to each column in match calculations, as the illustrative calculation after this list shows. For example, you can prioritize surname information over forename information.
Create reference data tables. Reference data tables are key elements in data standardization. Informatica
provides a comprehensive set of reference data tables. You can create custom reference tables from columns in your source data.
Create and run data quality rules. Informatica provides pre-built rules that you can run or edit to suit your project objectives.

Collaborate with Informatica users. Reference data and rule assets are available to users in the Developer tool and the Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.

Export mappings to PowerCenter. You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.
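For illustration only, a weighted match score across two columns might combine per-column similarity scores as follows. The formula, weights, and scores here are hypothetical and simplified; they are not the exact calculation that the duplicate analysis algorithms use:

weighted_score = (w_surname * surname_score + w_forename * forename_score) / (w_surname + w_forename)

For example, with w_surname = 0.7, w_forename = 0.3, surname_score = 0.9, and forename_score = 0.5, the weighted score is (0.63 + 0.15) / 1.0 = 0.78, so a strong surname match outweighs a weak forename match.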
Data Services
Data services are a collection of reusable operations that you can run against sources to access, transform, and deliver data. Use the data services capabilities in the Developer tool to achieve the following objectives:
Define logical views of data. A logical view of data describes the structure and use of data in an enterprise. You
can create a logical data object model that shows what types of data your enterprise uses and how that data is structured.
Map logical models to data sources or targets. Create a mapping that links objects in a logical model to data
sources or targets. You can link data from multiple, disparate sources to have a single view of the data. You can also load data that conforms to a model to multiple, disparate targets.
Create virtual views of data. You can deploy a logical model to a virtual federated database. End users can run SQL queries against the virtual data without affecting the actual source data. An illustrative query appears after this list.

Export mappings to PowerCenter. You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.
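For example, after you deploy an SQL data service, an end user could run a standard SQL query against a virtual table. The virtual table and column names in this sketch are illustrative, not objects that this guide defines:

SELECT CustomerID, FullName, State
FROM Customer
WHERE CustomerTier = 'Gold'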
HypoStores Corporation must perform the following tasks to integrate data from the Los Angeles operation with data at the Boston headquarters:
- Set up a single view of customer data from both locations.
- Create a virtual database to enable access to the customer data from both offices.
- Examine the Boston and Los Angeles data for data quality issues and resolve any issues that are identified.
- Check the Boston and Los Angeles customer data for duplicate records.
- Validate the accuracy of the postal address information in the data for CRM purposes.
Tutorials
This guide contains four tutorials, each composed of lessons and tasks. The administrator must complete the configuration lessons in the Administrator tutorial to set up the environment for the other tutorials.
Lessons
Each lesson introduces concepts that help you understand the tasks to complete in the lesson. The lesson provides business requirements from the overall story. The objectives for the lesson outline the tasks that you complete to meet the business requirements. Each lesson provides an estimated completion time. When you complete the tasks in the lesson, you can review the lesson summary. If the environment within the tool is not configured, the first lesson in each tutorial helps you configure it.
Tasks
The tasks provide step-by-step instructions. Complete all tasks in the order listed to complete the lesson.
Tutorial Prerequisites
Before you can begin the tutorial lessons, the Informatica domain must be running with at least one node set up. The installer includes tutorial files that you will use to complete the lessons. You can find the files in both the client and server installations, including the following location in the Informatica installation directory:

<Informatica Installation Directory>\services\Tutorials

The tutorial lessons use the following files:
- All_Customers.csv: Data Quality, Data Services tutorials
- Boston_Customers.csv: Data Quality, Data Services tutorials
The Validating Address Data lesson in the Data Quality tutorial reads address reference data for United States addresses. For information on the address reference data available in your domain, contact an Informatica Administrator user.
The following lessons apply to the tutorials listed with each description:

- Lesson 4. Creating a Custom Profile (Data Quality, Data Explorer): Create a custom profile to configure columns, and sampling and drilldown options.
- Lesson 5. Creating Expression Rules (Data Quality, Data Explorer): Create expression rules to modify and profile column values.
- Lesson 6. Creating and Running Scorecards (Data Quality, Data Explorer): Create and run a scorecard to measure data quality progress over time.
- Lesson 7. Creating Reference Tables from Profile Results (Data Quality, Data Explorer): Create a reference table that you can use to standardize source data.
- Lesson 8. Creating Reference Tables (Data Quality, Data Services): Create a reference table to establish relationships between source data and valid and standard values.

Note: This tutorial does not include lessons on bad record and duplicate record management.
CHAPTER 2
Lesson 1. Accessing Informatica Administrator
Objectives
In this lesson, you complete the following tasks:
- Record the domain and the administrator user account information. The domain information provides the address components of the Administrator tool URL, and the user account provides access to the Administrator tool.
- Log in to the Administrator tool. Lessons in this tutorial require that you can log in to the Administrator tool.
Prerequisites
Before you start this lesson, verify the following prerequisites:
- The Informatica domain is running.
- The administrator or user who installed Informatica 9.0 has provided you with the domain connectivity information.
Timing
Set aside 5 to 10 minutes to complete this lesson.
3. Enter the user name and password.
4. Select Native or the name of an LDAP security domain. The Security Domain field appears when the Informatica domain contains an LDAP security domain.
5. Click Login.
CHAPTER 3
Model Repository Service. Manages the Model repository. The Developer tool, the Analyst tool, the Data Integration Service, and the Administrator tool store metadata in the Model repository.

Data Integration Service. The Data Integration Service is an application service that performs data integration tasks for the Analyst tool and the Developer tool.

Analyst Service. The Analyst Service is an application service that runs the Analyst tool and manages the connections between service components and the users who access the Analyst tool.

A database connection is a domain object that contains database connectivity information. The Data Integration Service connects to the database to process integration objects for the Developer tool and the Analyst tool. Integration objects include mappings, profiles, scorecards, and SQL data services.
Story
An administrator at HypoStores needs to create application services. Developers, architects, and analysts need application services to use the Developer and Analyst tools.
Objectives
In this lesson, you complete the following tasks:
- Create a Model Repository Service to store metadata.
- Create database connections to the profiling warehouse and the staging databases.
- Create a Data Integration Service to perform data integration tasks.
- Create an Analyst Service to run the Analyst tool.
Prerequisites
Before you start this lesson, verify the following prerequisites:
- The database administrator has provided you with the Model repository database connectivity information. You must have the database connection information to create the Model Repository Service. All tutorials in this guide require the Model repository database.
- The database administrator has provided you with the profiling warehouse database connectivity information. Use this information to create the database connection to the profiling warehouse. The tutorial for the Analyst tool requires a profiling warehouse.
- The database administrator has provided you with the staging database connectivity information. Use this information to create a database connection to the staging database. The Analyst Service uses a staging database. The tutorial for the Analyst tool requires a staging database.
Timing
Set aside 15 to 20 minutes to complete this lesson.
4. Click Next. The New Model Repository Service - Step 2 of 2 dialog box appears.
5. Enter the following required information:

Database Type: Type of database.
Username: Database user name.
Password: Database user password.
Connection String: The JDBC connect string used to connect to the Model repository database.
- IBM DB2: jdbc:informatica:db2://<host name>:<port>;DatabaseName=<database name>
- Oracle: jdbc:informatica:oracle://<host_name>:<port>;SID=<database name>
- Microsoft SQL Server: jdbc:informatica:sqlserver://<host name>:<port>;DatabaseName=<database name>
Creation options: Determines whether to create content in the repository. If the repository contains content, do not create content. If the repository does not contain content, choose to create content.
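For example, a Model repository on a hypothetical Oracle host might use the following connect string; the host name, port, and SID are illustrative:

jdbc:informatica:oracle://dbhost.example.com:1521;SID=MODELREPO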
6. Click Test Connection to verify connectivity.
7. Click Finish. You have created a Model Repository Service.
8. On the Domain tab Actions menu, click Enable to make the Model Repository Service available. It may take a few minutes to enable the service.
Password: Password for the database user name.
Metadata access connection string: JDBC connection URL used to access metadata from the database.
- IBM DB2: jdbc:informatica:db2://<host name>:<port>;DatabaseName=<database name>
- Oracle: jdbc:informatica:oracle://<host_name>:<port>;SID=<database name>
- Microsoft SQL Server: jdbc:informatica:sqlserver://<host name>:<port>;DatabaseName=<database name>
- Not applicable for ODBC.
Data access connection string: Connection string used to access data from the database.
- IBM DB2: <database name>
- Microsoft SQL Server: <server name>@<database name>
- ODBC: <data source name>
- Oracle: <database name>.world from the TNSNAMES entry.
Code page: Code page used to read from a source database or write to a target database or target file.
4. Click Test Connection to verify metadata access connectivity.
5. Click OK.
6. Repeat steps 2 through 5 for each required database.
7. Click Close.
You created a database connection for the profiling warehouse and the staging database.
3. Enter the required properties, including the user name and password that the service uses to access the Model Repository Service.
4. Click Next. The New Data Integration Service - Step 2 of 2 dialog box appears.
5. To complete the Analyst tool lessons, click Select to choose a connection for a profiling warehouse database. The Select Database Connections dialog box appears.
   a. Select the connection to the profiling warehouse database.
   b. Choose to use existing content or create content.
   c. Click OK.
6. Click Finish.
7. On the Domain tab Actions menu, click Enable to make the Data Integration Service available.
3. Enter the required properties, including the user name and password that the service uses to access the Model Repository Service.
4. Click Next. The Create New Analyst Service Step 2 of 3 dialog box appears.
6. Click Select to select a staging database. The Select Database Connections dialog box appears.
9. Click OK.
10. Click Next.
11. Click Finish. You have created an Analyst Service.
12. On the Domain tab Actions menu, click Enable to make the Analyst Service available.
CHAPTER 4
Users also require permissions to run a query against an SQL data service.
Story
An administrator at HypoStores gets a user account request from a developer and an analyst. They both need access to the Developer and Analyst tools.
Objectives
In this lesson, you complete the following tasks:
- Create users to log in to the Developer and Analyst tools. Repeat the task for each user account that you need to create.
- Grant user privileges to access the Analyst tool and to create projects in the Analyst and Developer tools.
- Grant permissions on an SQL data service. Users require permissions to run a query against an SQL data service.
Prerequisites
Before you start this lesson, verify the following prerequisites:
- You have completed lessons 1 and 2 in this tutorial.
- Before you can grant permissions, an application with an SQL data service must be deployed to a Data Integration Service.
Timing
Set aside 5 to 10 minutes to complete this lesson.
4. Click OK.
5. Complete steps 1 through 4 for each user that you want to create.
You created user accounts that can be used to log in to application clients, such as the Administrator tool, the Developer tool, or the Analyst tool.
4. Click Edit. The Edit Roles and Privileges dialog box appears.
5. Select the privileges to grant to the user, including the MRS_CREATE_PROJECT privilege for the Model Repository Service.
7. Click OK.
Provide the user names to each user who will complete lessons in the Developer tool or the Analyst tool tutorials.
10. In the Permission section, select all the permission options.
11. Click OK.
CHAPTER 5
Story
An administrator at HypoStores wants to view the status of jobs and SQL data services running on a Data Integration Service.
Objectives

In this lesson, you complete the following tasks:

- View running and previously run jobs for a Data Integration Service to check for failures.
- View connections to an SQL data service to check for active and timed-out connections.
- View requests for an SQL data service to view running requests.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Developers and analysts are running jobs and SQL data services on a Data Integration Service in the domain.
Timing
Set aside 5 to 10 minutes to complete this lesson.
CHAPTER 6
Lesson 1. Setting Up Informatica Analyst
Objectives
In this lesson, you complete the following tasks:
- Log in to the Analyst tool.
- Create a project to store the objects that you create in the Analyst tool.
- Create a folder in the project that can store related objects.
Prerequisites
Before you start this lesson, verify the following prerequisites:
- An administrator has configured a Model Repository Service and an Analyst Service in the Administrator tool.
- You have the host name and port number for the Analyst tool.
- You have a user name and password to access the Analyst Service. You can get this information from an administrator.
Timing
Set aside 5 to 10 minutes to complete this lesson.
3. On the login page, enter the user name and password.
4. Select Native or the name of a specific security domain. The Security Domain field appears when the Informatica domain contains an LDAP security domain. If you do not know the security domain that your user account belongs to, contact the Informatica domain administrator.
5. Click Login.
6. Click Close to exit the welcome screen and access the Analyst tool.
CHAPTER 7
Lesson 2. Creating Data Objects
Story
HypoStores keeps the Los Angeles customer data in flat files. HypoStores needs to profile and analyze the data and perform data quality tasks.
Objectives
In this lesson, you complete the following tasks:
1. Upload the flat file to the flat file cache location and create a data object.
2. Preview the data for the flat file data object.
Prerequisites
Before you start this lesson, verify the following prerequisites:
- You have completed lesson 1 in this tutorial.
- You have the LA_Customers.csv flat file.
Timing
Set aside 5 to 10 minutes to complete this task.
CHAPTER 8
Lesson 3. Creating a Quick Profile
Story
HypoStores wants to incorporate data from the newly acquired Los Angeles office into its data warehouse. Before the data can be incorporated into the data warehouse, it needs to be cleansed. You are the analyst who is responsible for assessing the quality of the data and passing the information on to the developer who is responsible for cleansing the data. You want to view the profile results quickly and get a basic idea of the data quality.
Objectives
In this lesson, you complete the following tasks:
1. Create and run a quick profile for the Customers_LA flat file data object.
2. View the profile results.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lessons 1 and 2 in this tutorial.
Timing
Set aside 5 to 10 minutes to complete this lesson.
1. Click the header for the Null Values column to sort the values. Notice that the Address2, Address3, City2, CreateDate, and MiscDate columns have 100% null values. In Lesson 4, you create a custom profile to exclude these columns.
2. Click the Full Name column. The values for the column appear in the Values view. Notice that the first and last names do not appear in separate columns. In Lesson 5, you create a rule to separate the first and last names into separate columns.
3. Click the CustomerTier column. Notice that the values for CustomerTier are inconsistent. In Lesson 6, you create a scorecard to score the CustomerTier values. In Lesson 7, you create a reference table that a developer can use to standardize the CustomerTier values.
4. Click the State column and then click the Patterns view. Notice that 483 values have the pattern XX, which indicates valid two-letter state values such as CA or NY. Seventeen values are not valid because they do not match the valid pattern. In Lesson 6, you create a scorecard to score the State values.
CHAPTER 9
Lesson 4. Creating a Custom Profile
Story
HypoStores needs to incorporate data from the newly acquired Los Angeles office into its data warehouse. HypoStores wants to assess the quality of the customer tier data in the LA customer data file. You are the analyst who is responsible for assessing the quality of the data and passing the information on to the developer who is responsible for cleansing the data.
Objectives
In this lesson, you complete the following tasks:
1. Create a custom profile for the flat file data object and exclude the columns with null values.
2. Run the profile to analyze the content and structure of the CustomerTier column.
3. Drill down into the rows for the profile results.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lessons 1, 2, and 3 in this tutorial.
Timing
Set aside 5 to 10 minutes to complete this lesson.
10. In the Sampling Options panel, select the All Rows option.
11. In the Drilldown Options panel, verify that Enable Row Drilldown is selected, and select on staged data for the Drilldown option.
12. Click Save. The Analyst tool creates the profile and displays the profile in another tab.
CHAPTER 10
Lesson 5. Creating Expression Rules
Story
HypoStores wants to incorporate data from the newly acquired Los Angeles office into its data warehouse. HypoStores wants to analyze the customer names and separate customer names into first name and last name. HypoStores wants to use expression rules to parse a column that contains first and last names into separate virtual columns and then profile the columns. HypoStores also wants to make the rules available to other analysts who need to analyze the output of these rules.
Objectives
In this lesson, you complete the following tasks:
1. Create expression rules to separate the FullName column into first name and last name columns. You create a rule that separates the first name from the full name, and another rule that separates the last name from the full name. You create these rules for the Profile_LA_Customers_Custom profile.
2. Run the profile and view the output of the rules in the profile.
3. Edit the rules to make them usable for other Analyst tool users.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed Lessons 1, 2, 3, and 4.
Timing
Set aside 10 to 15 minutes to complete this lesson.
7. Click Validate.
8. Click Save. The Analyst tool creates the rule and displays it in the Column Profiling view.
9. Repeat steps 2 through 8 to create a rule named LastName, and enter the following expression to separate the last name from the FullName column:
SUBSTR(FullName,INSTR(FullName,' ',-1,1),LENGTH(FullName))
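The FirstName rule that you created in the earlier steps uses a similar expression. One plausible form, assuming the same SUBSTR and INSTR functions shown above (the exact expression from the elided steps may differ), is:

SUBSTR(FullName,0,INSTR(FullName,' ',-1,1))

Here INSTR(FullName,' ',-1,1) returns the position of the last space in FullName, so the expression keeps the characters up to and including that space; wrap it in RTRIM if the trailing space matters.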
5. Repeat steps 1 through 4. The FirstName and LastName rules can now be used by any Analyst tool user to split a column with first and last names into separate columns.
CHAPTER 11
Lesson 6. Creating and Running Scorecards
Story
HypoStores wants to incorporate data from the newly acquired Los Angeles office into its data warehouse. Before it merges the data, HypoStores wants to make sure that the data in different customer tiers and states is analyzed for data quality. You are the analyst who is responsible for monitoring the progress of the data quality analysis. You want to create a scorecard from the customer tier and state profile columns, configure thresholds for data quality, and view the score trend charts to determine how the scores improve over time.
Objectives
In this lesson, you complete the following tasks:
1. Create a scorecard from the results of the Profile_LA_Customers_Custom profile to view the scores for the CustomerTier and State columns.
2. Run the scorecard to generate the scores for the CustomerTier and State columns.
3. View the scorecard to see the scores for each column.
4. Edit the scorecard to specify different valid values for the scores.
5. Configure score thresholds and run the scorecard.
6. View score trend charts to determine how scores improve over time.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lessons 1 through 5 in this tutorial.
Timing
Set aside 15 minutes to complete the tasks in this lesson.
10. For each score in the Scores panel, accept the default settings for the score thresholds in the Score Settings panel.
11. Click Finish.
CHAPTER 12
Lesson 7. Creating Reference Tables from Profile Results
Story
HypoStores wants to profile the data to uncover anomalies and standardize the data with valid values. You are the analyst who is responsible for standardizing the valid values in the data. You want to create a reference table based on valid values from profile columns.
Objectives
In this lesson, you complete the following tasks:
1. Create a reference table from the CustomerTier column in the Profile_LA_Customers_Custom profile by selecting valid values for columns.
2. Edit the reference table to configure different valid values for columns.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lessons 1 through 6 in this tutorial.
Timing
Set aside 15 minutes to complete the tasks in this lesson.
10. In the Column Attributes panel, configure the following column properties for the CustomerTier column:
Name: CustomerTier
Datatype: String
Precision: 10
Scale: 0
Description: Reference customer tier values
11. Optionally, choose to create a description column for rows in the reference table. Enter the name and precision for the column.
12. Preview the CustomerTier column values in the Preview panel.
13. Click Next. The Reftab_CustomerTier_HypoStores reference table name appears. You can enter an optional description.
14. In the Save in panel, select the tutorial project where you want to create the reference table. The Reference Tables panel lists the reference tables in the location that you select.
CHAPTER 13
Lesson 8. Creating Reference Tables
Story
HypoStores wants to standardize data with valid values. You are the analyst who is responsible for standardizing the valid values in the data. You want to create a reference table to define standard customer tier codes that reference the LA customer data. You can then share the reference table with a developer.
Objectives
In this lesson, you complete the following task:
Create a reference table using the reference table editor to define standard customer tier codes that reference the LA customer data.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lessons 1 and 2 in this tutorial.
Timing
Set aside 10 minutes to complete the task in this lesson.
CHAPTER 14
Lesson 1. Setting Up Informatica Developer
Objectives
In this lesson, you complete the following tasks:
- Start the Developer tool and go to the Developer tool workbench.
- Add a domain in the Developer tool.
- Add a Model repository so that you can create a project.
- Create a project to store the objects that you create in the Developer tool.
- Create a folder in the project that can store related objects.
- Select a default Data Integration Service to perform data integration tasks.
Prerequisites
Before you start this lesson, verify the following prerequisites:
- You have installed the Developer tool.
- You have a domain name, host name, and port number to connect to a domain. You can get this information from an administrator.
Timing
Set aside 5 to 10 minutes to complete the tasks in this lesson.
CHAPTER 15
Lesson 2. Importing Physical Data Objects
Story
HypoStores Corporation stores customer data from the Los Angeles office and Boston office in flat files. You want to work with this customer data in the Developer tool. To do this, you need to import each flat file as a physical data object.
Objectives
In this lesson, you import flat files as physical data objects. You also set the source file directory so that the Data Integration Service can read the source data from the correct directory.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lesson 1 in this tutorial.
Timing
Set aside 10 to 15 minutes to complete the tasks in this lesson.
10. Click Next.
11. Verify that the delimiter is set to comma.
12. Select Import Field Names from First Line.
13. Click Finish. The Boston_Customers physical data object appears under Physical Data Objects in the tutorial project.
14. Click the Read view.
15. Click the Runtime tab on the Properties view.
16. Set the Source File Directory to the following directory on the Data Integration Service machine: \Informatica\9.0\server\Tutorial
17. Click File > Save.
4. Select Create from an Existing Flat File.
5. Click Browse and navigate to LA_Customers.csv in the following directory: <Informatica Installation Directory>\clients\DeveloperClient\Tutorial
6. Click Open. The wizard names the data object LA_Customers.
7. Click Next.
8. Verify that the code page is MS Windows Latin 1 (ANSI), superset of Latin 1.
9. Verify that the format is delimited.
10. Click Next.
11. Verify that the delimiter is set to comma.
12. Select Import Field Names from First Line.
13. Click Finish. The LA_Customers physical data object appears under Physical Data Objects in the tutorial project.
14. Click the Read view.
15. Click the Runtime tab on the Properties view.
16. Set the Source File Directory to the following directory on the Data Integration Service machine: \Informatica\9.0\server\Tutorial
17. Click File > Save.
10. Click Next.
11. Verify that the delimiter is set to comma.
12. Select Import Field Names from First Line.
13. Click Finish. The All_Customers physical data object appears under Physical Data Objects in the tutorial project.
14. Click the Read view.
15. Click the Runtime tab on the Properties view.
16. Set the Source File Directory to the following directory on the Data Integration Service machine: \Informatica\9.0\server\Tutorial
17. Click File > Save.
CHAPTER 16
A join analysis profile shows the degree of overlap between columns in two data sources as a percentage value. Use join analysis profiles to identify possible problems with column join conditions. You can run a profile at any stage in a project to measure data quality and to verify that changes to the data meet your project objectives. You can run a profile on a transformation in a mapping to indicate the effect that the transformation will have on data.
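Conceptually, a join analysis answers the kind of question that the following illustrative SQL asks of two sources: how many rows share a key value. The CustomerID column name is hypothetical here, and the profile itself runs on the flat file data objects rather than on database tables:

SELECT COUNT(*)
FROM Boston_Customers b
JOIN LA_Customers l ON b.CustomerID = l.CustomerID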
Story
HypoStores wants to verify that customer data is free from errors, inconsistencies, and duplicate information. Before HypoStores designs the processes to deliver the data quality objectives, it needs to measure the quality of its source data files and confirm that the data is ready to process.
Objectives
In this lesson, you complete the following tasks:
- Perform a join analysis on the Boston_Customers data source and the LA_Customers data source.
- View the results of the join analysis to determine whether or not you can successfully merge data from the two offices.
- Run a profile on the All_Customers data source.
- View the column profiling results to observe the values and patterns contained in the data.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lessons 1 and 2 in this tutorial.
Timing
Set aside 20 minutes to complete this lesson.
CHAPTER 17
Lesson 4. Parsing Data
Story
HypoStores wants the format of customer data files from the Los Angeles office to match the format of the data files from the Boston office. The customer data from the Los Angeles office stores the customer name in a FullName column, while the customer data from the Boston office stores the customer name in separate FirstName and LastName columns. HypoStores needs to parse the Los Angeles FullName column data into first names and last names so that the format of the Los Angeles data will match the format of the Boston data.
Objectives
In this lesson, you complete the following tasks:
- Create and configure an LA_Customers_tgt data object to contain parsed data.
- Create a mapping to parse the FullName column into separate FirstName and LastName columns.
- Add the LA_Customers data object to the mapping to connect to the source data.
- Add the LA_Customers_tgt data object to the mapping to create a target data object.
- Add a Parser transformation to the mapping and configure it to use a token set to parse full names into first names and last names.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lessons 1 and 2 in this tutorial.
Timing
Set aside 20 minutes to complete the tasks in this lesson.
9. In the Preview Options section, select Import Field Names from First Line and click Next.
10. Click Finish. The LA_Customers_tgt data object appears in the editor.

10. In the Value column, double-click the Output file directory entry.
11. Right-click and select Paste to paste the directory location that you copied from the Read view.
12. In the Value column, double-click the Header options entry and choose Output Field Names.
13. In the Value column, double-click the Output file name entry and type LA_Customers_tgt.csv.
14. Click File > Save to save the data object.
10. Click Select to select a delimiter. The Select Delimiters window opens.
11. Select the Space delimiter and click OK.
12. In the Parser transformation, click the Undefined_output port and drag it to the FirstName port in the LA_Customers_tgt data object. A connection appears between the ports.
13. In the Parser transformation, click the OverflowField port and drag it to the LastName port in the LA_Customers_tgt data object. A connection appears between the ports.
14. Click File > Save to save the mapping.
CHAPTER 18
Lesson 5. Standardizing Data
The Standardizer transformation searches for specific values in your data and removes or replaces them. You can choose one of the following search operation types:
- Text. Search for custom strings that you enter. Remove these strings or replace them with custom text.
- Reference table. Search for strings contained in a reference table that you select. Remove these strings, or replace them with reference table entries or custom text.

For example, you can configure the Standardizer transformation to standardize address data containing the custom strings Street and St. using the replacement string ST. The Standardizer transformation replaces the search terms with the term ST. and writes the result to a new data column.
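With that configuration, input and output values would look like the following; the address values are illustrative:

123 Main Street  ->  123 Main ST.
456 Oak St.      ->  456 Oak ST.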
Story
HypoStores needs to standardize its customer address data so that all addresses use terms consistently. The address data in the All_Customers data object contains inconsistently formatted entries for common terms such as Street, Boulevard, Avenue, Drive, and Park.
Objectives
In this lesson, you complete the following tasks:
Create and configure an All_Customers_Stdz_tgt data object to contain standardized data.
Create a mapping to standardize the address terms Street, Boulevard, Avenue, Drive, and Park to a consistent format.
Add the All_Customers data object to the mapping to connect to the source data.
Add the All_Customers_Stdz_tgt data object to the mapping to create a target data object.
Add a Standardizer transformation to the mapping and configure it to standardize the address terms.
Run the mapping to generate standardized address data.
Run the Data Viewer to view the mapping output.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lessons 1 and 2 in this tutorial.
Timing
Set aside 15 minutes to complete this lesson.
10. Click Finish. The All_Customers_Stdz_tgt data object appears in the editor.
10. In the Value column, double-click the Output file directory entry.
11. Right-click and select Paste to paste the directory location you copied from the Read view.
12. In the Value column, double-click the Header options entry and choose Output Field Names.
13. In the Value column, double-click the Output file name entry and type All_Customers_Stdz_tgt.csv.
14. Click File > Save to save the data object.
9. Click OK.
10. Select Replace with Custom Text. Type the entry from the Replacement Text column in the table above that corresponds to the Text to Replace string you added in step 8.
11. Click OK.
12. Repeat steps 5 through 11 until you have added standardization operations for all rows in the table in step 8.
13. Click Select to select a delimiter.
14. Select the Space delimiter and the Comma delimiter and click OK.
15. Click File > Save to save the mapping.
CHAPTER 19
Story
HypoStores needs correct and complete address data to ensure that its direct mail campaigns and other consumer mail items reach its customers. Correct and complete address data also reduces the cost of mailing operations for
the organization. In addition, HypoStores needs its customer data to include addresses in a printable format that is flexible enough to include addresses of different lengths. To meet these business requirements, the HypoStores ICC team creates an address validation mapping in the Developer tool.
Objectives
In this lesson, you complete the following tasks:
Create a target data object that will contain the validated address fields and match codes.
Create a mapping with a source data object, a target data object, and an Address Validator transformation.
Configure the Address Validator transformation to validate the address data of your customers.
Run the mapping to validate the address data, and review the match code outputs to verify the validity of the address data.
Prerequisites
Before you start this lesson, verify the following prerequisites:
You have completed lessons 1 and 2 in this tutorial.
United States address reference data is installed in the domain and registered with the Administrator tool. Contact your Informatica administrator to verify that United States address data is installed on your system. The reference data installs through the Data Quality Content Installer.
Timing
Set aside 25 minutes to complete this lesson.
7. In the Preview Options section, select Import Field Names from First Line and click Next.
8. Click Finish. The All_Customers_av_tgt data object appears in the editor.
10. In the Value column, double-click the Output file directory entry.
11. Right-click this entry and select Paste to add the path you copied from the Read view.
12. In the Value column, double-click the Header options entry and choose Output Field Names.
13. In the Value column, double-click the Output file name entry and type All_Customers_av_tgt.csv.
14. Select File > Save to save the data object.
4. Expand the Hybrid input port group and select the following ports:
Port Name              Description
DeliveryAddressLine1   Street address data, such as street name and building number.
LocalityComplete1      City or town name.
Postcode1              Postcode or ZIP code.
Province1              Province or state name.
CountryName            Country name or abbreviation.
5. On the toolbar above the port names list, click Add selection to current model. This toolbar is visible when you select Templates. The selected ports appear in the transformation in the mapping editor.
6. Connect the source ports to the Address Validator transformation ports as follows:
Source Port   Address Validator Transformation Port
Address1      DeliveryAddressLine1
City          LocalityComplete1
ZIP           Postcode1
State         Province1
Country       CountryName
5. Expand the LastLine Elements output port group and select the following ports:
Port Name               Description
LocalityComplete1       City or town name.
Postcode1               Postcode or ZIP code.
ProvinceAbbreviation1   Province or state identifier.
Note: Hold the Ctrl key to select multiple ports in a single operation.
6. Expand the Country output port group and select the following port:
Port Name      Description
CountryName1   Country name.
7. Expand the Status Info output port group and select the following ports:
Port Name          Description
MailabilityScore   Score that represents the chance of successful postal delivery.
MatchCode          Code that represents the degree of similarity between the input address and the reference data.
8. On the toolbar above the port names list, click Add selection to current model. This toolbar is visible when you select Templates.
9. Connect the Address Validator transformation ports to the All_Customers_av_tgt ports as follows:
Address Validator Transformation Port   Target Port
StreetComplete1                         Address1
LocalityComplete1                       City
Postcode1                               ZIP
ProvinceAbbreviation1                   State
CountryName1                            Country
MailabilityScore                        MailabilityScore
MatchCode                               MatchCode
Connect the unused ports on the data source to the ports with the same names on the data target.
MatchCode   Description
V2          Verified as deliverable by the Address Validator. Input data is correct, but there was an imperfect match with the reference data. Some reference data files may be missing.
V1          Verified as deliverable by the Address Validator. Input data is correct, but poor standardization has reduced address deliverability.
C4          Corrected by the Address Validator. All elements have been processed and corrected where necessary.
C3          Corrected by the Address Validator. All elements have been processed, but some elements could not be checked.
C2          Partially corrected by the Address Validator because some reference data files may be missing.
C1          Corrected by the Address Validator, but poor standardization has reduced address deliverability.
I4          Input data could not be corrected, but the address is very likely to be deliverable as it matches a unique reference address.
I3          Input data could not be corrected, but the address is very likely to be deliverable as it matches multiple reference addresses.
I2          Input data could not be corrected, and deliverability is not likely.
I1          Input data could not be corrected, and deliverability is very unlikely.
N1...N6     No validation was performed. This may be due to an absence of current or licensed reference data. The address may or may not be deliverable.
Note: Although some MatchCode values can confirm the deliverability of an address, others provide guideline information only. In these cases, you may need to reconfigure the address template in the Address Validator transformation or check that the address reference data is up to date.
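As a downstream illustration, a consumer of the mapping output might branch on the first character of the MatchCode, since the codes in the table above group into verified (V), corrected (C), uncorrectable (I), and not validated (N) families. This sketch is an assumption about how you might consume the output; it is not part of the lesson.

# Classify MatchCode values by their family, per the table above.
def classify_match_code(match_code):
    status = match_code[:1]
    if status == "V":
        return "verified as deliverable"
    if status == "C":
        return "corrected"
    if status == "I":
        return "could not be corrected; review deliverability"
    if status == "N":
        return "not validated; check reference data"
    return "unknown code"

for code in ("V2", "C4", "I1", "N1"):
    print(code, "->", classify_match_code(code))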
CHAPTER 20
Objectives
In this lesson, you complete the following tasks:
Start the Developer tool and go to the Developer tool workbench.
Add a domain in the Developer tool.
Add a Model repository so that you can create a project.
Create a project to store the objects that you create in the Developer tool.
Create a folder in the project that can store related objects.
Select a default Data Integration Service to perform data integration tasks.
Prerequisites
Before you start this lesson, verify the following prerequisites:
You have installed the Developer tool.
You have a domain name, host name, and port number to connect to a domain. You can get this information from a domain administrator.
Timing
Set aside 5 to 10 minutes to complete the tasks in this lesson.
CHAPTER 21
Story
HypoStores Corporation stores customer data from the Los Angeles office and Boston office in flat files. HypoStores wants to work with this customer data in the Developer tool.
Objectives
In this lesson, you import flat files as physical data objects. You also set the source file directory so that the Data Integration Service can read the source data from the correct directory.
Prerequisites
Before you start this lesson, verify the following prerequisite:
You have completed lesson 1 in this tutorial.
Timing
Set aside 10 to 15 minutes to complete the tasks in this lesson.
10. Verify that the format is delimited.
11. Click Next.
12. Verify that the delimiter is set to comma.
13. Select Import column names from first line.
14. Click Finish. The Boston_Customers physical data object appears under Physical Data Objects in the tutorial project.
15. Click the Read view.
16. Click the Runtime tab on the Properties view.
17. Set Source file directory to the following directory on the Data Integration Service machine: \Informatica\9.0\server\Tutorial
18. Click the Data Viewer view.
19. Click Run. The Data Integration Service reads the data from the Boston_Customers file and shows the results in the Output window.
20. Click File > Save.
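For reference, the wizard settings in the steps above amount to a very simple read in plain code: a comma-delimited file whose first line supplies the column names. The .csv extension and exact path in this sketch are assumptions; adjust them for your installation.

# Read a comma-delimited file and take column names from the first line,
# mirroring the import wizard settings used above. Path is assumed.
import csv

with open(r"\Informatica\9.0\server\Tutorial\Boston_Customers.csv",
          newline="") as f:
    reader = csv.DictReader(f, delimiter=",")  # header row -> column names
    for row in reader:
        print(row)  # each row is a dict keyed by the imported column names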
10. Click Next.
11. Verify that the delimiter is set to comma.
12. Select Import column names from first line.
13. Click Finish. The LA_Customers physical data object opens in the editor.
14. Click the Read view.
15. Click the Runtime tab on the Properties view.
16. Set the source file directory to the following directory on the Data Integration Service machine: \Informatica\9.0\server\Tutorial
17. Click the Data Viewer view.
18. Click Run. The Data Integration Service reads the data from the LA_Customers file and shows the results in the Output window.
19. Click File > Save.
CHAPTER 22
Story
HypoStores Corporation wants a single view of customer data from the Los Angeles and Boston offices. The enterprise data model requires that customer names use the same format regardless of the data source. The customer data from the Los Angeles office uses a different format for customer names than the data from the Boston office. The data from the Los Angeles office uses the correct format, so you need to reformat the customer data from the Boston office to conform to the data model.
Objectives
In this lesson, you complete the following tasks:
1. Import a logical data object model that contains the Customer and Orders logical data objects.
2. Create a logical data object mapping with the Customer logical data object as the mapping output. The mapping transforms the Boston customer data and defines a single view of the data from the Los Angeles and Boston offices.
3. Run the mapping to view the combined customer data.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Complete lessons 1 and 2 in this tutorial.
Timing
Set aside 20 minutes to complete the tasks in this lesson.
10. Select the Expression transformation in the editor.
11. Select the FullName port.
12. Click the Down arrow at the top of the transformation twice to move the FullName port below the Customer Tier port.
13. Click File > Save.
14. Click the Data Viewer view.
15. Click Run. The Data Integration Service processes the data from the Boston_Customers source and the Expression transformation. The Developer tool shows the results in the Output window. The results show that the Data Integration Service has concatenated the FirstName and LastName columns from the source.
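The concatenation the Expression transformation performs can be pictured in a few lines of plain code. The sample rows and column spellings below are assumptions, and the exact Developer tool expression is not reproduced here; the transformation language offers functions such as CONCAT for this purpose.

# Sketch of the row-level result: FullName built from FirstName and LastName.
rows = [
    {"FirstName": "Jane", "LastName": "Doe", "CustomerTier": "1"},
    {"FirstName": "John", "LastName": "Smith", "CustomerTier": "2"},
]
for row in rows:
    row["FullName"] = f'{row["FirstName"]} {row["LastName"]}'
    print(row["FullName"], "-", row["CustomerTier"])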
Right-click an empty area in the editor and click Run Data Viewer.
The Data Viewer view appears. After the Data Integration Service runs the mapping, the Developer tool shows the data in the Output section of the Data Viewer view. The output shows that you merged the FirstName and LastName columns of the Boston_Customers source. It also shows that you combined the data from the LA_Customers source and Boston_Customers source.
CHAPTER 23
Story
HypoStores Corporation wants to create a report about customers for the Los Angeles and Boston offices. However, the Los Angeles customer data is not in the central data warehouse. A developer in the IT department has combined the data for the Los Angeles and Boston customer offices in a logical data object model. The developer can make this data available to query in a virtual database. A business analyst can create a report based on the virtual data.
Objectives
In this lesson, you complete the following tasks:
1. Create an SQL data service to define a virtual database that contains customer data.
2. Preview the virtual data.
3. Create an application that contains the SQL data service.
4. Deploy the application to a Data Integration Service.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Complete lessons 1, 2, and 3 in this tutorial.
Timing
Set aside 15 to 20 minutes to complete the tasks in this lesson.
4. Enter HypoStores_Customers for the SQL data service name and click Next.
5. To create a virtual table, click the New button. The Developer tool adds a virtual table to the list of virtual tables.
6. Enter Customers for the virtual table name.
7. Click the Open button in the Source column. The Select a Source dialog box appears.
8. In the tutorial folder, expand the Customer_Order logical data object model, and select the Customer logical data object.
9. Click OK. The Developer tool adds Customer as the virtual table source. It also specifies Data Object as the source type and the tutorial project as the location.
10. Enter Customer_Schema in the Virtual Schemas column and press Enter.
11. Click Finish. The Developer tool creates the HypoStores_Customers SQL data service. The SQL data service contains the Customers table and the Customers mapping.
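Once the application is deployed and running, a client can query the virtual table with standard SQL, qualified by the virtual schema you entered above. The following sketch is an assumption about client-side access through ODBC; the DSN name is hypothetical, and the Informatica client driver must be configured separately.

# Hedged sketch: query the virtual Customers table over ODBC.
import pyodbc  # third-party ODBC wrapper; install separately

conn = pyodbc.connect("DSN=HypoStores_Customers")  # hypothetical DSN
cursor = conn.cursor()
cursor.execute("SELECT * FROM Customer_Schema.Customers")
for row in cursor.fetchmany(10):
    print(row)
conn.close()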
CHAPTER 24
Story
You have developed a mapping that provides a single view of Los Angeles and Boston customer data. You want to export this mapping to PowerCenter so that you can apply version control and load the target data to the central data warehouse.
Objectives
In this lesson, you export a Developer tool mapping to a PowerCenter repository.
Prerequisites
Before you start this lesson, verify the following prerequisites:
Complete lessons 1, 2, and 3.
You can connect to a PowerCenter 9.0 repository. To get the repository login information, contact a domain administrator.
Timing
Set aside 5 to 10 minutes to complete this task.
10. Select the repository folder that you want to export the mapping to. If the repository contains a tutorial folder, select it.
11. Click Next. The Developer tool prompts you to select the objects to export.
12. Select Customer_Orders and click Finish. The Developer tool exports the objects to the location you selected.
APPENDIX A
Administrator FAQ
Review the FAQ to answer questions you may have about Informatica Administrator.

What is the difference between the Informatica Administrator and the PowerCenter Administration Console?
The PowerCenter Administration Console is renamed to Informatica Administrator (the Administrator tool). The Administrator tool has a new interface. Some of the properties and configuration tasks from the PowerCenter Administration Console have been moved to different locations in the Administrator tool. The Administrator tool is also expanded to include new services and objects.

Can I use one user account to access the Administrator tool and the Developer tool?
Yes. You can give a user permission to access both tools. You do not need to create separate user accounts for each client application.

What is the difference between the PowerCenter Repository Service and the Model Repository Service?
The PowerCenter application services and PowerCenter application clients use the PowerCenter Repository Service. The PowerCenter repository has folder-based security. The Data Integration Service, Analyst Service, Developer tool, and Analyst tool use the Model Repository Service. The Model Repository Service has project-based security. You can migrate some Model repository objects to the PowerCenter repository.

Where can I use database connections that I create in the Informatica Administrator?
You can create, view, and edit database connections in the Administrator tool and the Developer tool. You can create and view database connections in the Analyst tool. You can also configure database connection pooling in the Administrator tool.

What is the difference between the PowerCenter Integration Service and the Data Integration Service?
The PowerCenter Integration Service is an application service that runs sessions and workflows. The Data Integration Service is an application service that performs data integration tasks for the Analyst tool, the Developer tool, and external clients. The Analyst tool and the Developer tool send data integration task requests to the Data Integration Service to preview or run data profiles, SQL data services, and mappings.
Commands from the command line or an external client send data integration task requests to the Data Integration Service to run SQL data services.

Why can't I connect to an SQL data service that is deployed to a Data Integration Service?
To connect to an SQL data service, the application that contains the SQL data service must be running. To start the application, select the application in the application view of the Data Integration Service, and then click the Start option on the Domain tab Actions menu.
mapping only in that it cannot use shortcuts and does not use a source qualifier.
Logical data object mapping. A mapping in a logical data object model. A logical data object mapping can contain a logical data object as the mapping input and a data object as the mapping output. Or, it can contain one or more physical data objects as the mapping input and a logical data object as the mapping output.
Virtual table mapping. A mapping in an SQL data service. It contains a data object as the mapping input and a virtual table as the mapping output.
Virtual stored procedure mapping. A mapping in an SQL data service. It contains an Input Parameter transformation or physical data object as the mapping input and an Output Parameter transformation or physical data object as the mapping output.

What is the difference between a mapplet in PowerCenter and a mapplet in the Developer tool?
A mapplet in PowerCenter and in the Developer tool is a reusable object that contains a set of transformations. You can reuse the transformation logic in multiple mappings. A PowerCenter mapplet can contain source definitions or Input transformations as the mapplet input. It must contain Output transformations as the mapplet output.
A Developer tool mapplet can contain data objects or Input transformations as the mapplet input. It can contain data objects or Output transformations as the mapplet output. A mapplet in the Developer tool also includes the following features:
You can validate a mapplet as a rule. You use a rule in a profile.
A mapplet can contain other mapplets.
What is the difference between a mapplet and a rule?
You can validate a mapplet as a rule. A rule is business logic that defines conditions applied to source data when you run a profile. You can validate a mapplet as a rule when the mapplet meets the following requirements:
It contains an Input and Output transformation.
The mapplet does not contain active transformations.
It does not specify cardinality between input groups.
Can I use one user account to access the Administrator tool, the Developer tool, and the Analyst tool?
Yes. You can give a user permission to access all three tools. You do not need to create separate user accounts for each client application.

What happened to the Reference Table Manager? Where is my reference data stored?
The functionality from the Reference Table Manager is included in the Analyst tool. You can use the Analyst tool to create and share reference data. The reference data is stored in the staging database that you configure when you create an Analyst Service.