Informatica PowerCenter
(Version 8.5)
PowerCenter Workflow Administration Guide
Version 8.5
October 2007

Copyright (c) 1998-2007 Informatica Corporation. All rights reserved. This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software is protected by U.S. and/or international Patents and other Patents Pending. Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

Informatica, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica Complex Data Exchange and Informatica On Demand Data Replicator are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright 2007 Adobe Systems Incorporated. All rights reserved. Copyright Sun Microsystems. All rights reserved. Copyright RSA Security Inc. All Rights Reserved. Copyright Ordinal Technology Corp. All rights reserved. Copyright Platon Data Technology GmbH. All rights reserved. Copyright Melissa Data Corporation. All rights reserved. Copyright Aandacht c.v. All rights reserved. Copyright 1996-2007 ComponentSource. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright 2007 Isomorphic Software. All rights reserved. Copyright Meta Integration Technology, Inc. All rights reserved. Copyright MySQL AB. All rights reserved. Copyright Microsoft. All rights reserved. Copyright Oracle. All rights reserved. Copyright AKS-Labs. All rights reserved. Copyright Quovadx, Inc. All rights reserved. Copyright SAP. All rights reserved. Copyright 2003, 2007 Instantiations, Inc. All rights reserved.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/), software copyright 2004-2005 Open Symphony (all rights reserved) and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
This product includes software which was developed by Mozilla (http://www.mozilla.org/); software copyright The JBoss Group, LLC, all rights reserved; software copyright Red Hat Middleware, LLC, all rights reserved; software copyright 1999-2006 by Bruno Lowagie and Paulo Soares; and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, as-is, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (c) 1993-2006, all rights reserved.

This product includes software copyright (c) 2003-2007, Terence Parr. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.antlr.org/license.html. The materials are provided free of charge by Informatica, as-is, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org.

This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <daniel@haxx.se>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

The product includes software copyright 2001-2005 (C) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html. The product includes software copyright (c) 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://svn.dojotoolkit.org/dojo/trunk/LICENSE. This product includes ICU software which is copyright (c) 1995-2003 International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://www-306.ibm.com/software/globalization/icu/license.jsp. This product includes software copyright (C) 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html.
This product includes OSSP UUID software which is Copyright (c) 2002 Ralf S. Engelschall, Copyright (c) 2002 The OSSP Project, Copyright (c) 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php. This product includes software developed by Boost (http://www.boost.org/). Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt. This product includes software copyright 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt. This product includes software copyright (c) 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php. The product includes the zlib library copyright (c) 1995-2005 Jean-loup Gailly and Mark Adler.

This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html. This product includes software licensed under the terms at http://www.bosrup.com/web/overlib/?License. This product includes software licensed under the terms at http://www.stlport.org/doc/license.html. This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php). This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/.

This Software is protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,014,670; 6,032,158; 5,794,246; 6,339,775; 6,850,947; 6,895,471 and other U.S. Patents Pending.

DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.

Part Number: PC-WAG-85000-0002
Table of Contents
List of Figures
List of Tables
Preface
  About This Book
    Document Conventions
  Other Informatica Resources
    Visiting Informatica Customer Portal
    Visiting the Informatica Web Site
    Visiting the Informatica Knowledge Base
    Obtaining Customer Support

Viewing Object Properties
Entering Descriptions for Repository Objects
Renaming Repository Objects
Checking In and Out Versioned Repository Objects
  Checking In Objects
  Viewing and Comparing Versioned Repository Objects
Searching for Versioned Objects
Copying Repository Objects
  Copying Sessions
  Copying Workflow Segments
Comparing Repository Objects
  Steps for Comparing Objects
Working with Metadata Extensions
  Creating a Metadata Extension
  Editing a Metadata Extension
  Deleting a Metadata Extension
Keyboard Shortcuts
Configuring a Relational Database Connection
Copying a Relational Database Connection
Replacing a Relational Database Connection
FTP Connections
  Using an SFTP Connection
  Rules and Guidelines for Mainframes
External Loader Connections
HTTP Connections
  Configuring Certificate Files for SSL Authentication
PowerChannel Relational Database Connections
PowerExchange for JMS Connections
  Connection Properties for JNDI Application Connections
  Connection Properties for JMS Application Connections
  Creating JNDI and JMS Application Connections
PowerExchange for MSMQ Connections
PowerExchange for PeopleSoft Connections
PowerExchange for Salesforce Connections
PowerExchange for SAP NetWeaver BW Option Connections
  Configuring an SAP BW OHS Application Connection
  Configuring an SAP BW Application Connection
PowerExchange for SAP NetWeaver mySAP Option Connections
  Configuring an SAP R/3 Application Connection for ABAP Integration
  Configuring an FTP Connection for ABAP Integration
  Configuring Application Connections for ALE Integration
  Configuring an Application Connection for BAPI/RFC Integration
PowerExchange for Siebel Connections
PowerExchange for TIBCO Connections
  Connection Properties for TIB/Rendezvous Application Connections
  Connection Properties for TIB/Adapter SDK Connections
  Configuring TIBCO Application Connections
PowerExchange for Web Services Connections
PowerExchange for webMethods Connections
PowerExchange for WebSphere MQ Connections
  Test Queue Connections
Creating a Workflow
  Creating a Workflow Manually
  Creating a Workflow Automatically
  Adding Tasks to Workflows
  Deleting a Workflow
Using the Workflow Wizard
  Step 1. Assign a Name and Integration Service to the Workflow
  Step 2. Create a Session
  Step 3. Schedule a Workflow
Assigning an Integration Service
  Assigning a Service from the Workflow Properties
  Assigning a Service from the Menu
Working with Links
  Specifying Link Conditions
  Viewing Links in a Workflow or Worklet
  Deleting Links in a Workflow or Worklet
Using the Expression Editor
  Adding Comments
  Validating Expressions
  Expression Editor Display
Scheduling a Workflow
  Unscheduling a Workflow
  Creating a Reusable Scheduler
  Configuring Scheduler Settings
  Editing Scheduler Settings
  Disabling Workflows
Validating a Workflow
  Expression Validation
  Task Validation
  Workflow Properties Validation
  Running Validation
Manually Starting a Workflow
  Running a Workflow
  Running a Workflow with Advanced Options
  Running a Part of a Workflow
  Running a Task in the Workflow
Suspending the Workflow
  Configuring Suspension Email
Stopping or Aborting the Workflow
  How the Integration Service Handles Stop and Abort
  Stopping or Aborting a Task
Viewing a Workflow Report
Creating a Decision Task
Working with Event Tasks
  Example of User-Defined Events
  Working with Event-Raise Tasks
  Working with Event-Wait Tasks
Working with the Timer Task
Using Pre- and Post-Session SQL Commands
  Guidelines for Entering Pre- and Post-Session SQL Commands
  Error Handling
Using Pre- and Post-Session Shell Commands
  Using Parameters and Variables
  Configuring Non-Reusable Shell Commands
  Configuring Reusable Shell Commands
  Pre-Session Shell Command Errors
Validating a Session
  Validating Multiple Sessions
Stopping and Aborting a Session
  Threshold Errors
  Fatal Error
  ABORT Function
  User Command
  Integration Service Handling for Session Failure
Working with Session Parameters
  Changing the Session Log Name
  Changing the Target File and Directory
  Changing Source Parameters in a File
  Changing Connection Parameters
  Getting Run-Time Information
  Rules and Guidelines
Mapping Parameters and Variables in Sessions
Assigning Parameter and Variable Values in a Session
  Passing Parameter and Variable Values between Sessions
  Configuring Parameter and Variable Assignments
Handling High Precision Data
  Bigint
  Decimal
Configuring Sources in a Session
  Configuring Readers
  Configuring Connections
  Configuring Properties
Working with Relational Sources
  Selecting the Source Database Connection
  Defining the Treat Source Rows As Property
  Overriding the SQL Query
  Configuring the Table Owner Name
  Overriding the Source Table Name
Working with File Sources
  Configuring Source Properties
  Configuring Commands for File Sources
  Configuring Fixed-Width File Properties
  Configuring Delimited File Properties
  Configuring Line Sequential Buffer Length
Integration Service Handling for File Sources
  Character Set
  Multibyte Character Error Handling
  Null Character Handling
  Row Length Handling for Fixed-Width Flat Files
  Numeric Data Handling
Using a File List
  Creating the File List
  Configuring a Session to Use a File List
Using FastExport
  Creating a FastExport Connection
  Changing the Reader
  Changing the Source Connection
  Overriding the Control File
  Rules and Guidelines
Configuring Targets in a Session
  Configuring Writers
  Configuring Connections
  Configuring Properties
Working with Relational Targets
  Target Database Connection
  Target Properties
  Truncating Target Tables
  Deadlock Retry
  Dropping and Recreating Indexes
  Constraint-Based Loading
  Bulk Loading
  Table Name Prefix
  Target Table Name
  Reserved Words
Working with Target Connection Groups
Working with Active Sources
Working with File Targets
  Configuring Target Properties
  Configuring Commands for File Targets
  Configuring Test Load Options
  Configuring Fixed-Width Properties
  Configuring Delimited Properties
Integration Service Handling for File Targets
  Writing to Fixed-Width Flat Files with Relational Target Definitions
  Writing to Fixed-Width Files with Flat File Target Definitions
  Generating Flat File Targets By Transaction
  Writing Multibyte Data to Fixed-Width Flat Files
  Null Characters in Fixed-Width Files
  Character Set
  Writing Metadata to Flat File Targets
Working with Heterogeneous Targets
Reject Files
  Locating Reject Files
  Reading Reject Files
Determining the Commit Source
Switching from Source-Based to Target-Based Commit
User-Defined Commits
  Rolling Back Transactions
Understanding Transaction Control
  Transformation Scope
  Understanding Transaction Control Units
  Rules and Guidelines
  Creating Target Files by Transaction
Setting Commit Properties
Configuring a Partition Point
Steps for Adding Partition Points to a Pipeline
Views
Troubleshooting Orphaned Sequences and Views
Using the $$PushdownConfig Mapping Parameter
Configuring Sessions for Pushdown Optimization
  Pushdown Options
  Partitioning
  Target Load Rules
  Viewing Pushdown Groups
Rules and Guidelines
Using Parameters and Variables
  Accessing the Run Instance Name
Steps to Configure Concurrent Workflows
Starting and Stopping Concurrent Workflows
  Starting Workflow Instances from Workflow Designer
  Starting One Concurrent Workflow
  Starting Concurrent Workflows from the Command Line
  Stopping or Aborting Concurrent Workflows
Monitoring Concurrent Workflows
Viewing Session and Workflow Logs
  Log Files for Unique Workflow Instances
  Log Files for Workflow Instances of the Same Name
Rules and Guidelines for Concurrent Workflows
Scheduling and Unscheduling Workflows
Viewing Session Logs and Workflow Logs
Viewing History Names
Workflow and Task Status
Using the Gantt Chart View
  Listing Tasks and Workflows
  Navigating the Time Window in Gantt Chart View
  Zooming the Gantt Chart View
  Performing a Search
  Opening All Folders
Using the Task View
  Filtering in Task View
  Opening All Folders
Tips
Session Log File Sample
Setting Tracing Levels
Viewing Log Events

Using a Parameter File with Workflows or Sessions
Using a Parameter File with pmcmd
Parameter File Example
Guidelines for Creating Parameter Files
Troubleshooting
Tips
Configuring a Session to Write to a File
Configuring File Properties
Selecting an External Loader Connection
Troubleshooting

Configuring a Numeric Cache Size
Steps to Configure the Cache Size
Cache Partitioning
  Configuring the Cache Size for Cache Partitioning
Aggregator Caches
  Incremental Aggregation
  Configuring the Cache Sizes for an Aggregator Transformation
  Troubleshooting
Joiner Caches
  1:n Partitioning
  n:n Partitioning
  Configuring the Cache Sizes for a Joiner Transformation
  Troubleshooting
Lookup Caches
  Sharing Caches
  Configuring the Cache Sizes for a Lookup Transformation
Rank Caches
  Configuring the Cache Sizes for a Rank Transformation
  Troubleshooting
Sorter Caches
  Configuring the Cache Size for a Sorter Transformation
XML Target Caches
  Configuring the Cache Size for an XML Target
Optimizing the Cache Size
Connections Node
Sources Node
Targets Node
Mapping Tab (Partitions View)
  Partition Properties Node
  KeyRange Node
  HashKeys Node
  Partition Points Node
  Non-Partition Points Node
Components Tab
  Reusable Pre- or Post-Session Commands
  Non-Reusable Pre- or Post-Session Commands
  Reusable Email
  Non-Reusable Email
  Email Properties
Metadata Extensions Tab

Index
List of Figures
Figure 1-1. Sample Workflow
Figure 1-2. Workflow Manager Windows
Figure 1-3. Workflow Manager General Options
Figure 1-4. Workflow Manager Format Options
Figure 1-5. Miscellaneous Options
Figure 1-6. Two Versions of the Same Object
Figure 1-7. Query Browser
Figure 1-8. Diff Tool Window
Figure 2-1. Connection Browser
Figure 3-1. Sample Workflow
Figure 3-2. Sample Workflow With Two Branches
Figure 3-3. Valid Workflow
Figure 3-4. Invalid Workflow with a Loop
Figure 3-5. Setting a Link Condition
Figure 3-6. Displaying a Link Condition in the Workflow
Figure 3-7. Expression Editor
Figure 3-8. Schedule Tab
Figure 3-9. Customized Repeat Dialog Box
Figure 3-10. Example Workflow - Validation
Figure 3-11. Running Part of a Workflow - Example
Figure 4-1. Creating an Expression Using Variables
Figure 4-2. Expression Using a Predefined Workflow Variable
Figure 4-3. Condition Variable Example
Figure 4-4. Status Variable Example
Figure 4-5. PrevTaskStatus Variable Example
Figure 4-6. Sample Workflow Using Workflow Variable
Figure 5-1. General Tab - Edit Tasks Dialog Box
Figure 5-2. Revert Button in Session Properties
Figure 5-3. Fail Task if Any Command Fails Option
Figure 5-4. Sample Workflow Using a Decision Task
Figure 5-5. Sample Workflow Without a Decision Task
Figure 5-6. Expanded Sample Workflow Using a Decision Task
Figure 5-7. Example of User-Defined Events
Figure 5-8. Sample Workflow Using the Timer Task
Figure 6-1. Workflow with Multiple Worklets
Figure 6-2. Workflow with Nested Worklets
Figure 7-1. Session Properties
Figure 7-2. Session Target Object Settings
Figure 7-3. Connection Options
Figure 7-4. Config Object Tab
Figure 7-5. Stop or Continue the Session on Pre- or Post-Session SQL Errors
Figure 7-6. Make Reusable Option for Pre-Session Shell Commands
Figure 7-7. Stop or Continue the Session on Pre-Session Shell Command Error
Figure 8-1. Sources Node of the Session Properties
Figure 8-2. Readers Settings in the Sources Node of the Mapping Tab
Figure 8-3. Connections Settings in the Sources Node
Figure 8-4. Properties Settings in the Sources Node of the Mapping Tab
Figure 8-5. SQL Query Override Property in the Session Properties
Figure 8-6. Source Table Owner Name Property
Figure 8-7. Source Table Name Property
Figure 8-8. Properties Settings in the Sources Node for a Flat File Source
Figure 8-9. Flat Files Dialog Box - Fixed-Width
Figure 8-10. Fixed Width File Properties Dialog Box
Figure 8-11. Flat Files Dialog Box - Delimited
Figure 8-12. Delimited File Properties Dialog Box
Figure 8-13. Line Sequential Buffer Length Property for File Sources
Figure 9-1. Defining Target Properties in the Session Properties
Figure 9-2. Writer Settings on the Mapping Tab of the Session Properties
Figure 9-3. Connection Settings on the Mapping Tab of the Session Properties
Figure 9-4. Properties Settings on the Mapping Tab of the Session Properties
Figure 9-5. Properties Settings on the Mapping Tab for a Relational Target
Figure 9-6. Test Load Options - Relational Targets
Figure 9-7. Mapping Using Constraint-Based Loading
Figure 9-8. Table Name Prefix Property
Figure 9-9. Target Table Name Property
Figure 9-10. Properties Settings in the Mapping Tab for a Flat File Target
Figure 9-11. Flat Files Dialog Box - Fixed-Width
Figure 9-12. Fixed Width Properties Dialog Box
Figure 9-13. Flat Files Dialog Box - Delimited
Figure 9-14. Delimited File Properties Dialog Box
Figure 9-15. Properties Settings on the Mapping Tab
Figure 10-1. Message Queue Processing
Figure 10-2. Web Service Message Processing
Figure 10-3. Changed Data Processing
Figure 10-4. Message Processing with Recovery File
Figure 10-5. Message Processing with Recovery Table
Figure 11-1. Mapping with a Single Commit Source
Figure 11-2. Mapping with Multiple Commit Sources
Figure 11-3. Mapping with Targets Connected to a Commit Source
Figure 11-4. Mapping a Custom Transformation with a Commit Source
Figure 11-5. Roll Back on Failed Commit Example
Figure 11-6. Transaction Control Units
Figure 13-1. Email Task
Figure 13-2. Post-Session Email Properties
Figure 13-3. Suspension Email
Figure 13-4. Using Post-Session Commands to Generate Reports
Figure 13-5. Using Email Variables to Attach Reports
Figure 13-6. Sending Email Without Microsoft Outlook
Figure 14-1. Default Partition Points and Stages in a Sample Mapping
Figure 14-2. Thread Creation for a Mapping with Three Partitions
Figure 14-3. Dynamic Partitioning Options
Figure 14-4. Session Properties Partitions View on the Mapping Tab
Figure 14-5. Edit Partition Point Dialog Box
Figure 15-1. Sample Mapping Showing Valid Partition Points
Figure 15-2. Overriding the SQL Query and Entering a Filter Condition
Figure 15-3. Properties Settings for Relational Targets in the Session Properties
Figure 15-4. Connections Settings for File Targets in the Session Properties
Figure 15-5. Properties Settings for File Targets in the Session Properties
Figure 15-6. Sorted File Data with 1:n Partitions
Figure 15-7. Sorted File Data Passed Through a Single Partition
Figure 15-8. Sorted Relational Data with 1:n Partitioning
Figure 15-9. Sorted Relational Data Passed Through a Single Partition
Figure 15-10. Using Sorter Transformations with Hash Auto-Keys to Maintain Sort Order
Figure 15-11. Session Properties - Configuring Sorter Transformations
Figure 16-1. Sample Mapping
Figure 16-2. Hash Auto-Keys Partitioning
Figure 16-3. Edit Partition Key Dialog Box
Figure 16-4. Mapping Where Key Range Partitioning Can Increase Performance
Figure 16-5. Edit Partition Key Dialog Box
Figure 16-6. Adding Key Ranges
Figure 16-7. Pass-Through Partitioning
Figure 16-8. Round-Robin Partitioning
Figure 17-1. Sample Mapping Used in a Pushdown Optimization Session
Figure 17-2. Sample Mapping with Two Partitions
Figure 17-3. Sample Mapping with Two Pushdown Groups
Figure 17-4. Pushdown Optimization Viewer
Figure 19-1. Multiple Workflow Instance Names and Parameter Files
Figure 19-2. Configuring Concurrent Runs with the Same Instance Names
Figure 19-3. Workflow with Unique Instance Names in Task View
Figure 20-1. Workflow Monitor
Figure 20-2. Workflow Monitor Statistics Window
Figure 20-3. Workflow Monitor General Tab
Figure 20-4. Workflow Monitor Gantt Chart Tab
Figure 20-5. Workflow Monitor Task View Tab
Figure 20-6. Workflow Monitor Advanced Tab
Figure 20-7. History Names Dialog Box
Figure 20-8. Gantt Chart View
Figure 20-9. Navigating the Time Window in the Gantt Chart View
Figure 20-10. Zooming the Gantt Chart View
Figure 20-11. Task View
Figure 21-1. Workflow Monitor Repository Details Area
Figure 21-2. Integration Service Details
Figure 21-3. Integration Service Monitor
Figure 21-4. Workflow Monitor Folder Details Area
Figure 21-5. Workflow Monitor Workflow Details Area
Figure 21-6. Workflow Monitor Task Progress Details Area
Figure 21-7. Workflow Monitor Session Statistics Area
Figure 21-8. Workflow Monitor Worklet Details Area
Figure 21-9. Workflow Monitor Task Details Area for Command Tasks
Figure 21-10. Workflow Monitor Failure Information Area
Figure 21-11. Workflow Monitor Task Details Area
Figure 21-12. Workflow Monitor Source/Target Statistics Area
Figure 21-13. Workflow Monitor Partition Details Area
Figure 21-14. Workflow Monitor Performance Area
Figure 22-1. Running a Workflow on a Grid
Figure 22-2. Workflow Distributed to the Nodes in a Grid
Figure 22-3. Session Threads Distributed to DTM Processes Running on Nodes in a Grid
Figure 22-4. Partition Groups Distributed Based on Partitioning Configuration
Figure 22-5. Partition Groups Distributed Based on Resource Availability
Figure 23-1. Workflow Properties General Tab
Figure 24-1. Sample Workflow Log in the Log Events Window
Figure 24-2. Sample Workflow Log Events Window
Figure 24-3. Sample Session Log Events Window
Figure 28-1. Control File Editor Dialog Box for Teradata
Figure 28-2. Writers Settings on the Mapping Tab
Figure 28-3. Properties Settings on the Mapping Tab
Figure 28-4. External Loader Connection Settings on the Mapping Tab
Figure 29-1. FTP Connection Settings on the Mapping Tab
Figure 29-2. Properties Settings for the Source File
Figure 29-3. Properties Settings for the Target
Figure 30-1. Incremental Aggregation Session Properties
Figure 31-1. Cache Calculator
Figure A-1. General Tab
Figure A-2. Properties Tab - General Options Settings
Figure A-3. Properties Tab - Performance Settings
Figure A-4. Config Object Tab - Advanced Settings
Figure A-5. Config Object Tab - Log Option Settings
Figure A-6. Config Object Tab - Error Handling Settings
Figure A-7. Config Object Tab - Partitioning Options
xxxii
List of Figures
Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure Figure
A-8. Config Object Tab - Session on Grid . . . . . . . . . . . . . . . . . . . . . . A-9. Mapping Tab - Connections Settings . . . . . . . . . . . . . . . . . . . . . . A-10. Mapping Tab - Sources Node - Readers Settings . . . . . . . . . . . . . A-11. Mapping Tab - Sources Node - Connections Settings . . . . . . . . . A-12. Mapping Tab - Sources Node - Properties Settings . . . . . . . . . . . A-13. Flat Files Dialog Box for Sources . . . . . . . . . . . . . . . . . . . . . . . . A-14. Fixed Width Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-15. Delimited Properties for File Sources . . . . . . . . . . . . . . . . . . . . . A-16. Mapping Tab - Targets Node - Writers Settings . . . . . . . . . . . . . A-17. Mapping Tab - Targets Node - Connections Settings . . . . . . . . . A-18. Mapping Tab - Targets Node - Properties Settings (Relational) . . A-19. Mapping Tab - Targets Node - File Properties Settings . . . . . . . . A-20. Flat Files Dialog Box for Targets . . . . . . . . . . . . . . . . . . . . . . . . A-21. Fixed-Width Properties for File Targets . . . . . . . . . . . . . . . . . . . A-22. Delimited Properties for File Targets . . . . . . . . . . . . . . . . . . . . . A-23. Mapping Tab - Transformations Node . . . . . . . . . . . . . . . . . . . . A-24. Mapping Tab - Partitions Properties Node . . . . . . . . . . . . . . . . . A-25. Mapping Tab - KeyRange Node . . . . . . . . . . . . . . . . . . . . . . . . A-26. Mapping Tab - Partition Points Node . . . . . . . . . . . . . . . . . . . . A-27. Edit Partition Point Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . A-28. Edit Partition Key Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . A-29. Components Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-30. Task Browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-31. Edit Pre-Session Command Dialog Box . . . . . . . . . . . . . . . . . . . A-32. Email Object Browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-33. On-Success or On-Failure Email - General Tab . . . . . . . . . . . . . A-34. On-Success or On-Failure Email - Properties Tab . . . . . . . . . . . . A-35. Metadata Extensions Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1. Workflow Properties - General Tab . . . . . . . . . . . . . . . . . . . . . . . B-2. Workflow Properties - Properties Tab . . . . . . . . . . . . . . . . . . . . . B-3. Workflow Properties - Scheduler Tab . . . . . . . . . . . . . . . . . . . . . B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box B-5. Workflow Properties - Customized Repeat Dialog Box . . . . . . . . . B-6. Workflow Properties - Variables Tab . . . . . . . . . . . . . . . . . . . . . . B-7. Workflow Properties - Events Tab . . . . . . . . . . . . . . . . . . . . . . . . B-8. Workflow Properties - Metadata Extensions Tab . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
820 821 823 824 825 827 828 829 832 833 834 837 839 840 840 842 843 844 845 846 847 848 850 851 853 854 855 856 860 862 864 865 867 869 870 871
List of Figures
xxxiii
xxxiv
List of Figures
List of Tables
Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table 1-1. Workflow Manager General Options . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2. Workflow Manager Format Options . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3. Workflow Manager Miscellaneous Options . . . . . . . . . . . . . . . . . . . . . 1-4. Workflow Manager Page Setup Options . . . . . . . . . . . . . . . . . . . . . . . 1-5. Metadata Extension Attributes in the Workflow Manager . . . . . . . . . . . 1-6. Workflow Manager Keyboard Shortcuts . . . . . . . . . . . . . . . . . . . . . . . 1-7. Keyboard Shortcuts for Navigating the Workspace . . . . . . . . . . . . . . . . 2-1. Native Connect String Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2. Relational Database Connection Information . . . . . . . . . . . . . . . . . . . 2-3. FTP Connection Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4. PowerChannel Relational Database Connection Information . . . . . . . . 2-5. JNDI Application Connection Properties . . . . . . . . . . . . . . . . . . . . . . 2-6. JMS Application Connection Properties . . . . . . . . . . . . . . . . . . . . . . . 2-7. MSMQ Connection Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8. PeopleSoft Application Connection Properties . . . . . . . . . . . . . . . . . . . 2-9. Salesforce Application Connection Attribute . . . . . . . . . . . . . . . . . . . . 2-10. SAP BW OHS Application Connection Properties . . . . . . . . . . . . . . . 2-11. SAP BW Application Connection Properties . . . . . . . . . . . . . . . . . . . 2-12. Types of Connections for SAP Sessions . . . . . . . . . . . . . . . . . . . . . . . 2-13. SAP R/3 Application Connection Properties . . . . . . . . . . . . . . . . . . . 2-14. SAP_ALE_IDoc_Reader Application Connection Properties . . . . . . . . 2-15. SAP_ALE_IDoc_Writer Application Connection Properties . . . . . . . . 2-16. SAP RFC/BAPI Application Connection Properties . . . . . . . . . . . . . . 2-17. Siebel Application Connection Properties . . . . . . . . . . . . . . . . . . . . . 2-18. TIB/Rendezvous Application Connection Properties . . . . . . . . . . . . . 2-19. TIB/Adapter SDK Application Connection Properties . . . . . . . . . . . . 2-20. Web Service Application Connection Properties . . . . . . . . . . . . . . . . . 2-21. webMethods Application Connection Properties . . . . . . . . . . . . . . . . 2-22. Message Queue Connection Properties . . . . . . . . . . . . . . . . . . . . . . . 3-1. Schedule Tab Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2. Repeat Dialog Box Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1. Task-Specific Workflow Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2. Datatype Default Values for User-Defined Workflow Variables . . . . . . 5-1. Workflow Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2. Timer Task Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1. Apply All Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2. Integration Service Behavior for Failed Sessions . . . . . . . . . . . . . . . . . . 7-3. User-Defined Session Parameters . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . 7-4. Built-in Session Parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5. Datatype Conversion for Pre-Session Variable Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 7 .. 9 . 12 . 14 . 30 . 33 . 33 . 51 . 55 . 61 . 71 . 74 . 75 . 76 . 77 . 79 . 80 . 81 . 83 . 85 . 86 . 87 . 89 . 90 . 92 . 94 . 97 . 99 101 127 129 146 154 158 190 210 234 236 237 245
List of Tables
xxxv
Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table
7-6. Datatype Conversion for Post-Session Variable Assignment . . . . . . . . . . . . . . . . . 8-1. Treat Source Rows As Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2. Flat File Source Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3. Fixed-Width File Properties for File Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4. Delimited File Properties for File Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5. Support for ASCII and Unicode Data Movement Modes . . . . . . . . . . . . . . . . . . . 8-6. Null Character Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-7. FastExport Connection Attributes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-8. Fast Export Session Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-9. FastExport Control File Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1. Support for ASCII and Unicode Data Movement Modes . . . . . . . . . . . . . . . . . . . 9-2. Relational Target Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3. Test Load Options - Relational Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4. Effect of Target Options when You Treat Source Rows as Updates . . . . . . . . . . . . 9-5. Effect of Target Options when You Treat Source Rows as Data Driven . . . . . . . . . 9-6. Integration Service Commands on Supported Databases . . . . . . . . . . . . . . . . . . . . 9-7. Flat File Target Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8. Test Load Options - Flat File Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-9. Writing to a Fixed-Width Target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10. Delimited File Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11. Datatype Modifications for File Target Columns . . . . . . . . . . . . . . . . . . . . . . . . 9-12. Field Length Measurements for Fixed-Width Flat File Targets . . . . . . . . . . . . . . 9-13. Characters to Include when Calculating Field Length for Fixed-Width Targets . . 9-14. Row Indicators in Reject File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15. Column Indicators in Reject File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-1. Restart and Recover Commands for Real-time Sessions . . . . . . . . . . . . . . . . . . . 11-1. Transformation Scope Property Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2. Session Commit Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1. PM_RECOVERY Table Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2. PM_TGT_RUN_ID Table Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-3. PM_REC_STATE Table Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-4. Configurable Options for Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-5. Recoverable Workflow Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-6. Recoverable Task Statuses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 12-7. Recovery Strategy by Task Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-8. Incremental and Full Recovery Session Recovery Situations . . . . . . . . . . . . . . . . 12-9. Repeatable Data in Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-1. Email Variables for Post-Session Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2. Format Tags for Email Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-1. Cache Partitioning for Each Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2. Variable Value Calculations with Partitioned Sessions . . . . . . . . . . . . . . . . . . . . 14-3. Options on Session Properties Partitions View on the Mapping Tab . . . . . . . . . . 14-4. Edit Partition Point Dialog Box Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.245 .258 .264 .268 .270 .273 .275 .281 .282 .282 .286 .294 .296 .297 .298 .298 .315 .319 .320 .322 .324 .325 .325 .332 .332 .352 .370 .374 .379 .379 .379 .382 .383 .386 .387 .389 .394 .415 .416 .437 .438 .442 .443
xxxvi
List of Tables
Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table
15-1. Transformation Partition Points . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2. File Properties Settings for File Sources . . . . . . . . . . . . . . . . . . . . . 15-3. Configuring Source File Name for Single-Threaded Reading . . . . . 15-4. Configuring Source File Name for Multi-Threaded Reading . . . . . . 15-5. Configuring Commands for Multi-Threaded Reading . . . . . . . . . . 15-6. Keep Relative Input Row Order . . . . . . . . . . . . . . . . . . . . . . . . . . 15-7. Keep Absolute Input Row Order . . . . . . . . . . . . . . . . . . . . . . . . . . 15-8. Partitioning Relational Target Attributes . . . . . . . . . . . . . . . . . . . . 15-9. File Targets Connection Options . . . . . . . . . . . . . . . . . . . . . . . . . 15-10. Target File Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-11. Restrictions on the Number of Partitions for Transformations . . . 16-1. Valid Partition Types for Partition Points . . . . . . . . . . . . . . . . . . . 17-1. Operators Available in Databases . . . . . . . . . . . . . . . . . . . . . . . . . 17-2. PowerCenter Variables Available in Databases . . . . . . . . . . . . . . . . 17-3. PowerCenter Functions Available in Databases . . . . . . . . . . . . . . . 17-4. Pushdown Optimization with Target Load Options . . . . . . . . . . . . 19-1. Session Parameters to Override for Concurrent Workflows . . . . . . . 20-1. Workflow Monitor General Tab . . . . . . . . . . . . . . . . . . . . . . . . . . 20-2. Workflow Monitor Gantt Chart Tab . . . . . . . . . . . . . . . . . . . . . . . 20-3. Workflow Monitor Advanced Tab . . . . . . . . . . . . . . . . . . . . . . . . . 20-4. Workflow and Task Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-1. Workflow Monitor Repository Details Attributes . . . . . . . . . . . . . . 21-2. Workflow Monitor Integration Service Details Attributes . . . . . . . . 21-3. Workflow Monitor Integration Service Monitor Attributes . . . . . . . 21-4. Workflow Monitor Folder Details Attributes . . . . . . . . . . . . . . . . . 21-5. Workflow Monitor Workflow Details Attributes . . . . . . . . . . . . . . 21-6. Workflow Monitor Session Statistics Attributes . . . . . . . . . . . . . . . 21-7. Workflow Monitor Worklet Details Attributes . . . . . . . . . . . . . . . . 21-8. Workflow Monitor Command Task Details Attributes . . . . . . . . . . 21-9. Workflow Monitor Failure Information Attributes . . . . . . . . . . . . . 21-10. Workflow Monitor Task Details Attributes . . . . . . . . . . . . . . . . . 21-11. Workflow Monitor Source and Target Statistics Attributes . . . . . . 21-12. Workflow Monitor Partition Details Attributes . . . . . . . . . . . . . . 21-13. Workflow Monitor Performance Attributes . . . . . . . . . . . . . . . . . 21-14. Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-1. Resource Types and Associated Repository Objects . . . . . . . . . . . . 24-1. Message Severity Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-2. Keyboard Shortcuts for the Log Events Window . . . . . . . . . . . . . . 24-3. Log File Default Locations and Associated Service Process Variables 24-4. Session Log Tracing Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-1. Functions in the Session Log Interface . . . . . . . . . . . . . . . . . . . . . . 26-1. PMERR_DATA Table Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-2. PMERR_MSG Table Schema . . . . . . . . . . . . . . . . . . . . 
. . . . . . . .
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
447 455 456 457 457 457 458 460 462 464 479 484 510 511 511 530 560 579 580 582 590 605 606 608 609 610 612 613 615 617 617 619 620 622 623 642 647 652 653 663 669 676 678
List of Tables
xxxvii
Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table
26-3. PMERR_SESS Table Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-4. PMERR_TRANS Table Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-5. Error Log File Column Headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-6. Error Log Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-1. Source Input Fields that Accept Parameters and Variables . . . . . . . . . . . . . . . . 27-2. Target Input Fields that Accept Parameters and Variables . . . . . . . . . . . . . . . . . 27-3. Transformation Input Fields that Accept Parameters and Variables . . . . . . . . . . 27-4. Task Input Fields that Accept Parameters and Variables . . . . . . . . . . . . . . . . . . 27-5. Session Input Fields that Accept Parameters and Variables . . . . . . . . . . . . . . . . 27-6. Workflow Input Fields that Accept Parameters and Variables . . . . . . . . . . . . . . 27-7. Connection Object Input Fields that Accept Parameters and Variables . . . . . . . 27-8. Data Profiling Input Fields that Accept Parameters and Variables . . . . . . . . . . . 27-9. Parameter File Headings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-1. IBM DB2 EE External Loader Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-2. IBM DB2 EE External Loader Return Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 28-3. IBM DB2 EEE External Loader Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-4. Oracle External Loader Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-5. Sybase IQ External Loader Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-6. Teradata MultiLoad External Loader Attributes . . . . . . . . . . . . . . . . . . . . . . . . 28-7. Teradata MultiLoad External Loader Attributes Defined at the Session Level . . . 28-8. Teradata TPump External Loader Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . 28-9. Teradata TPump External Loader Attributes Defined at the Session Level . . . . . 28-10. Teradata FastLoad External Loader Attributes . . . . . . . . . . . . . . . . . . . . . . . . 28-11. Teradata FastLoad External Loader Attributes Defined at the Session Level . . . 28-12. Teradata Warehouse Builder Operators and Protocol . . . . . . . . . . . . . . . . . . . 28-13. Teradata Warehouse Builder External Loader Attributes . . . . . . . . . . . . . . . . . 28-14. Teradata Warehouse Builder External Loader Attributes Defined for Sessions . 28-15. Properties Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-1. Integration Service Behavior for FTP Sources . . . . . . . . . . . . . . . . . . . . . . . . . 29-2. Properties Settings for a Source Instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-3. Properties Settings for Target Instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-4. Integration Service Behavior with Partitioned FTP File Targets . . . . . . . . . . . . 31-1. Types of Cache Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31-2. Naming Conventions for Cache Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31-3. Components of Cache File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31-4. Cache Partitioning for Each Transformation . . . . . . . . . . . . . . . . 
. . . . . . . . . . 31-5. Caches for Joiner Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1. General Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2. Properties Tab - General Options Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3. Properties Tab - Performance Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-4. Config Object Tab - Advanced Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5. Config Object Tab - Log Options Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6. Config Object Tab - Error Handling Settings . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.680 .681 .683 .686 .692 .692 .693 .694 .694 .696 .697 .697 .698 .717 .718 .719 .722 .724 .730 .732 .732 .734 .735 .737 .737 .738 .740 .743 .751 .757 .759 .759 .772 .773 .773 .781 .786 .802 .805 .809 .812 .815 .816
Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table
A-7. Config Objects Tab - Partitioning Options . . . . . . . . . . . . . . . . . . . . . . A-8. Config Object Tab - Session on Grid . . . . . . . . . . . . . . . . . . . . . . . . . . A-9. Mapping Tab - Connections Settings . . . . . . . . . . . . . . . . . . . . . . . . . . A-10. Mapping Tab - Sources Node - Connections Settings . . . . . . . . . . . . . . A-11. Mapping Tab - Sources Node - Properties Settings (Relational Sources) A-12. Mapping Tab - Sources Node - Properties Settings (File Sources) . . . . . A-13. Fixed-Width Properties for File Sources . . . . . . . . . . . . . . . . . . . . . . . A-14. Delimited Properties for File Sources . . . . . . . . . . . . . . . . . . . . . . . . . A-15. Mapping Tab - Targets Node - Writers Settings . . . . . . . . . . . . . . . . . A-16. Mapping Tab - Targets Node - Connections Settings . . . . . . . . . . . . . . A-17. Mapping Tab - Targets Node - Properties Settings (Relational) . . . . . . A-18. Mapping Tab - Targets Node - File Properties Settings . . . . . . . . . . . . A-19. Fixed-Width Properties for File Targets . . . . . . . . . . . . . . . . . . . . . . . A-20. Delimited Properties for File Targets . . . . . . . . . . . . . . . . . . . . . . . . . A-21. Mapping Tab - Partition Points Node . . . . . . . . . . . . . . . . . . . . . . . . . A-22. Edit Partition Point Dialog Box Options. . . . . . . . . . . . . . . . . . . . . . . A-23. Components Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-24. Components Tab Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-25. Pre- or Post-Session Commands - General Tab . . . . . . . . . . . . . . . . . . A-26. Pre- or Post-Session Commands - Properties Tab . . . . . . . . . . . . . . . . . A-27. Pre- or Post-Session Commands - Commands Tab . . . . . . . . . . . . . . . . A-28. On-Success or On-Failure Emails - General Tab . . . . . . . . . . . . . . . . . A-29. On-Success or On-Failure Emails - Properties Tab. . . . . . . . . . . . . . . . A-30. Metadata Extensions Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1. Workflow Properties - General Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2. Workflow Properties - Properties Tab . . . . . . . . . . . . . . . . . . . . . . . . . . B-3. Workflow Properties - Scheduler Tab . . . . . . . . . . . . . . . . . . . . . . . . . . B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box . . . . . B-5. Workflow Properties - Repeat Dialog Box Options . . . . . . . . . . . . . . . . B-6. Workflow Properties - Variables Tab . . . . . . . . . . . . . . . . . . . . . . . . . . B-7. Workflow Properties - Events Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-8. Workflow Properties - Metadata Extensions Tab . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
819 820 822 824 825 826 828 830 832 833 835 837 840 841 845 846 849 849 851 852 852 854 855 856 860 862 865 866 867 869 870 871
List of Tables
xxxix
xl
List of Tables
Preface
Welcome to PowerCenter, the Informatica software product that delivers an open, scalable data integration solution addressing the complete life cycle for all data integration projects, including data warehouses, data migration, data synchronization, and information hubs. PowerCenter combines the latest technology enhancements for reliably managing data repositories and delivering information resources in a timely, usable, and efficient manner.

The PowerCenter repository coordinates and drives a variety of core functions, including extracting, transforming, loading, and managing data. The Integration Service can extract large volumes of data from multiple platforms, handle complex transformations on the data, and support high-speed loads. PowerCenter can simplify and accelerate the process of building a comprehensive data warehouse from disparate data sources.
Document Conventions
This guide uses the following formatting conventions:
Italicized text marks especially emphasized words; boldfaced text marks emphasized subjects.
Italicized monospaced text is the variable name for a value you enter as part of an operating system command, or generic text that should be replaced with user-supplied values.
A note provides additional facts; a tip provides suggested uses.
A warning notes situations where you can overwrite or corrupt data, unless you follow the specified procedure.
Monospaced text is a code example; bold monospaced text is an operating system command you enter from a prompt to run a task.
As an Informatica customer, you can access the following resources:
Informatica Customer Portal
Informatica web site
Informatica Knowledge Base
Informatica Global Customer Support

You can contact Informatica Global Customer Support by email: support@informatica.com for technical inquiries, or support_admin@informatica.com for general customer service requests.

WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.
Use the following telephone numbers to contact Informatica Global Customer Support:
North America / South America
Informatica Corporation Headquarters, 100 Cardinal Way, Redwood City, California 94063, United States

Europe / Middle East / Africa
Informatica Software Ltd., 6 Waltham Park, Waltham Road, White Waltham, Maidenhead, Berkshire SL6 3TN, United Kingdom
Standard Rate: Belgium +32 15 281 702, France +33 1 41 38 92 26, Germany +49 1805 702 702, Netherlands +31 306 022 797, United Kingdom +44 1628 511 445

Asia / Australia
Informatica Business Solutions Pvt. Ltd., Diamond District Tower B, 3rd Floor, 150 Airport Road, Bangalore 560 008, India
Toll Free: Australia 1 800 151 830, Singapore 001 800 4632 4357
Standard Rate: India +91 80 4112 5738
Chapter 1
This chapter includes the following topics:
Overview, 2
Customizing Workflow Manager Options, 6
Navigating the Workspace, 15
Working with Repository Objects, 19
Checking In and Out Versioned Repository Objects, 20
Searching for Versioned Objects, 23
Copying Repository Objects, 24
Comparing Repository Objects, 26
Working with Metadata Extensions, 29
Keyboard Shortcuts, 33
Overview
In the Workflow Manager, you define a set of instructions called a workflow to execute mappings you build in the Designer. Generally, a workflow contains a session and any other tasks you want to perform when you run the session. Tasks can include a session, email notification, or scheduling information. You connect each task with links in the workflow.

You can also create a worklet in the Workflow Manager. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. You can run a batch of worklets inside a workflow.

After you create a workflow, you run the workflow in the Workflow Manager and monitor it in the Workflow Monitor. For more information about the Workflow Monitor, see Monitoring Workflows on page 569.
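You can also start and monitor workflows from the command line with the pmcmd program. The following commands are a minimal sketch; the domain, service, user, folder, and workflow names are placeholders:

pmcmd startworkflow -sv Int_Svc_Dev -d Domain_Dev -u Administrator -p mypassword -f SalesFolder -wait wf_LoadOrders
pmcmd getworkflowdetails -sv Int_Svc_Dev -d Domain_Dev -u Administrator -p mypassword -f SalesFolder wf_LoadOrders

The first command starts the workflow and waits for it to complete; the second returns run status details for the workflow.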
The Workflow Manager provides the following tools to develop workflows:
Task Developer. Use the Task Developer to create tasks you want to run in the workflow.
Workflow Designer. Use the Workflow Designer to create a workflow by connecting tasks with links. You can also create tasks in the Workflow Designer as you develop the workflow.
Worklet Designer. Use the Worklet Designer to create a worklet.
Figure 1-1 shows what a workflow might look like if you want to run a session, perform a shell command after the session completes, and then stop the workflow:
Figure 1-1. Sample Workflow
Workflow Tasks
You can create the following types of tasks in the Workflow Manager:
Assignment. Assigns a value to a workflow variable. For more information, see Working with the Assignment Task on page 166.
Command. Specifies a shell command to run during the workflow. For more information, see Working with the Command Task on page 169.
Control. Stops or aborts the workflow. For more information about the Control task, see Stopping or Aborting the Workflow on page 140.
Decision. Specifies a condition to evaluate. For more information, see Working with the Decision Task on page 176.
Email. Sends email during the workflow. For more information about the Email task, see Sending Email on page 401.
Event-Raise. Notifies the Event-Wait task that an event has occurred. For more information, see Working with Event Tasks on page 180.
Event-Wait. Waits for an event to occur before executing the next task. For more information, see Working with Event Tasks on page 180.
Session. Runs a mapping you create in the Designer. For more information about the Session task, see Working with Sessions on page 205.
Timer. Waits for a timed event to trigger. For more information, see Scheduling a Workflow on page 123.
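For example, a Command task typically runs an operating system command such as the following, which archives a target file with a date suffix; the file and directory names here are hypothetical:

cp /data/target/orders.out /data/archive/orders_`date +%Y%m%d`.out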
The Workflow Manager displays the following windows to help you create and organize workflows:
Navigator. You can connect to and work in multiple repositories and folders. In the Navigator, the Workflow Manager displays a red icon over invalid objects.
Workspace. You can create, edit, and view tasks, workflows, and worklets.
Output. Contains tabs to display different types of output messages. The Output window contains the following tabs:
Save. Displays messages when you save a workflow, worklet, or task. The Save tab displays a validation summary when you save a workflow or a worklet.
Fetch Log. Displays messages when the Workflow Manager fetches objects from the repository.
Validate. Displays messages when you validate a workflow, worklet, or task.
Copy. Displays messages when you copy repository objects.
Server. Displays messages from the Integration Service.
Notifications. Displays messages from the Repository Service.
Overview. An optional window that lets you easily view large workflows in the workspace. Outlines the visible area in the workspace and highlights selected objects in color. Click View > Overview Window to display this window.
You can view a list of open windows and switch from one window to another in the Workflow Manager. To view the list of open windows, click Window > Windows. The Workflow Manager also displays a status bar that shows the status of the operation you perform. Figure 1-2 shows the Workflow Manager windows:
Figure 1-2. Workflow Manager Windows
If you remove an Integration Service with associated workflows, assign another Integration Service to the workflows.
To delete an Integration Service:
1. In the Navigator, right-click the Integration Service you want to delete.
2. Click Delete.
Customizing Workflow Manager Options
General. You can configure workspace options, display options, and other general options on the General tab. For more information about the General tab, see Configuring General Options on page 6.
Format. You can configure font, color, and other format options on the Format tab. For more information about the Format tab, see Configuring Format Options on page 9.
Miscellaneous. You can configure Copy Wizard and Versioning options on the Miscellaneous tab. For more information about the Miscellaneous tab, see Configuring Miscellaneous Options on page 11.
Advanced. You can configure enhanced security for connection objects in the Advanced tab. For more information about the Advanced tab, see Enabling Enhanced Security on page 13.
You can also configure the workspace layout for printing. For more information, see Configuring Page Setup Options on page 14.
Table 1-1 describes general options you can configure in the Workflow Manager:
Table 1-1. Workflow Manager General Options
Reload Tasks/Workflows When Opening a Folder. Reloads the last view of a tool when you open it. For example, if you have a workflow open when you disconnect from a repository, select this option so that the same workflow appears the next time you open the folder and Workflow Designer. Default is enabled.
Ask Whether to Reload the Tasks/Workflows. Appears when you select Reload Tasks/Workflows When Opening a Folder. Select this option if you want the Workflow Manager to prompt you to reload tasks, workflows, and worklets each time you open a folder. Default is disabled.
Delay Overview Window Pans. By default, when you drag the focus of the Overview window, the focus of the workbook moves concurrently. When you select this option, the focus of the workspace does not change until you release the mouse button. Default is disabled.
Arrange Workflows/Worklets Vertically By Default. Arranges tasks in workflows vertically by default. Default is disabled.
Allow Invoking In-Place Editing Using the Mouse. By default, you can press F2 to edit objects directly in the workspace instead of opening the Edit Task dialog box. Select this option so you can also click the object name in the workspace to edit the object. Default is disabled.

The General tab also includes the following display options: Display Tool Names on Views, Always Show the Full Name of Tasks, Show the Expression on a Link, Show Background in Partition Editor and Pushdown Optimization, Launch Workflow Monitor when Workflow Is Started, and Receive Notifications from Repository Service.

Receive Notifications from Repository Service. You can receive notification messages in the Workflow Manager and view them in the Output window. Notification messages include information about objects that another user creates, modifies, or deletes. You receive notifications about sessions, tasks, workflows, and worklets. The Repository Service notifies you of the changes so you know objects you are working with may be out of date. For the Workflow Manager to receive a notification, the folder containing the object must be open in the Navigator, and the object must be open in the workspace. You also receive user-created notifications posted by the user who manages the Repository Service. Default is enabled.
Table 1-2 describes the format options for the Workflow Manager:
Table 1-2. Workflow Manager Format Options
Current Theme. Currently selected color theme for the Workflow Manager tools. This field is display-only.
Select Theme. Apply a color theme to the Workflow Manager tools. For more information about color themes, see Using Color Themes on page 10.
Tools. Workflow Manager tool that you want to configure. When you select a tool, the configurable workspace elements appear in the list below the Tools menu.
Color. Color of the selected workspace element.
Orthogonal Links. Link lines run horizontally and vertically but not diagonally in the workspace.
Solid Lines for Links. Links appear as solid lines. By default, the Workflow Manager displays orthogonal links as dotted lines.
Categories. Component of the Workflow Manager that you want to customize.
Reset All. Resets all format options to the default values.
The Workflow Manager has the following color themes:
Informatica Classic. This is the standard color scheme for workspace elements. The workspace background is gray, the workspace text is white, and the link colors are blue, red, blue-gray, dark green, and black.
High Contrast Black. Bright link colors stand out against the black background. The workspace background is black, the workspace text is white, and the link colors are purple, red, light blue, bright green, and white.
Colored Backgrounds. Each Designer tool has a different pastel-colored workspace background. The workspace text is black, and the link colors are the same as in the Informatica Classic color theme.
Note: You can also apply a color theme to the Designer tools. For more information, see Using the Designer in the Designer Guide.

After you select a color theme for the Workflow Manager tools, you can modify the color of individual workspace elements.
To select a color theme for a Workflow Manager tool:
1. In the Workflow Manager, click Tools > Options.
2. Click the Format tab.
3. In the Color Themes section of the Format tab, click Select Theme.
4. Select a theme from the Theme menu.
5. Click the tabs in the Preview section to see how the workspace elements appear in each of the Workflow Manager tools.
6. Click OK to apply the color theme.
Note: After you select the workspace colors using a color theme, you can change the color of individual workspace elements. Changes that you make to individual elements do not appear in the Preview section of the Theme Selector dialog box. For information about customizing individual workspace elements, see Configuring Format Options on page 9.
The Miscellaneous tab includes options such as Get Default Object When Resolved to Choose, Show Check Out Image in Navigator, Allow Delete Without Checkout, and Reset All.
Navigating the Workspace

Perform the following operations to navigate the workspace:
Customize windows.
Customize toolbars.
Search for tasks, links, events, and variables.
Arrange objects in the workspace.
Zoom and pan the workspace.
Display a window. From the menu, select View. Then select the window you want to open.
Close a window. Click the small x in the upper right corner of the window.
Dock or undock a window. Double-click the title bar or drag the title bar toward or away from the workspace.
Using Toolbars
The Workflow Manager can display the following toolbars to help you select tools and perform operations quickly:
Standard. Contains buttons to connect to and disconnect from repositories and folders, toggle windows, zoom in and out, pan the workspace, and find objects.
Connections. Contains buttons to create and edit connections, and assign Integration Services.
Repository. Contains buttons to connect to and disconnect from repositories and folders, export and import objects, save changes, and print the workspace.
View. Contains buttons to customize toolbars, toggle the status bar and windows, toggle full-screen view, create a new workbook, and view the properties of objects.
Layout. Contains buttons to arrange and restore objects in the workspace, find objects, zoom in and out, and pan the workspace.
Tasks. Contains buttons to create tasks.
Workflow. Contains buttons to edit workflow properties.
Run. Contains buttons to schedule the workflow, start the workflow, or start a task.
Versioning. Contains buttons to check in objects, undo checkouts, compare versions, list checked-out objects, and list repository queries.
Tools. Contains buttons to connect to the other PowerCenter Client applications. When you use a Tools button to open another PowerCenter Client application, PowerCenter uses the same repository connection to connect to the repository and opens the same folders.
For more information about how to perform these toolbar operations, see Using the Designer in the Designer Guide.
Find in Workspace. Searches multiple items at once and returns a list of all task names, link conditions, event names, or variable names that contain the search string.
Find Next. Searches through items one at a time and highlights the first task, link, event, variable, or text string that contains the search string. If you repeat the search, the Workflow Manager highlights the next item that contains the search string.
To search for items in the workspace:
1. In any Workflow Manager tool, click the Find in Workspace toolbar button or click Edit > Find in Workspace. The Find in Workspace dialog box appears.
2. Choose whether to search for tasks, links, variables, or events.
3. Enter a search string, or select a string from the list. The Workflow Manager saves the last 10 search strings in the list.
4. Specify whether or not to match whole words and whether or not to perform a case-sensitive search.
5. Click Find Now. The Workflow Manager lists task names, link conditions, event names, or variable names that match the search string at the bottom of the dialog box.
6. Click Close.
To find a single object or text string:
1. To search for a task, link, event, or variable, open the appropriate Workflow Manager tool and click a task, link, or event. To search for text in the Output window, click the appropriate tab in the Output window.
2. Enter a search string in the Find field on the standard toolbar. The search is not case sensitive.
3. Click Edit > Find Next, click the Find Next button on the toolbar, or press Enter or F3 to search for the string. The Workflow Manager highlights the first task name, link condition, event name, or variable name that contains the search string, or the first string in the Output window that matches the search string.
4. To search for the next item, press Enter or F3 again. The Workflow Manager alerts you when you have searched through all items in the workspace or Output window before it highlights the same objects a second time.
Zoom Center In/Out by 10%. Increases or decreases the magnification by 10% increments while maintaining the center of the view.
Zoom Point In/Out by 10%. Uses a point you select as the center point and increases or decreases the magnification by 10% increments.
Zoom Rectangle. Increases the current magnification of a rectangular area you select. The degree of magnification depends on the size of the area you select, the workspace size, and the current magnification.
Zoom Normal. Sets the zoom level to 100%.
Scale to Fit. Scales all workspace objects to fit the workspace.
Zoom Percent. Sets the zoom level to the percent you choose while maintaining the center of the view.
To maximize the size of the workspace window, click View > Full Screen. To go back to normal view, click the Close Full Screen button or press Esc.
To pan the workspace, click Layout > Pan or click the Pan button on the toolbar. Drag the focus of the workspace window and release the mouse button when it is in the appropriate position. Double-click the workspace to stop panning.
Working with Repository Objects

Use the Workflow Manager to perform the following operations on repository objects:
View properties for each object.
Enter descriptions for each object.
Rename an object.
To edit any repository object, you must first add a repository in the Navigator so you can access the repository object. To add a repository in the Navigator, click Repository > Add. Use the Add Repositories dialog box to add the repository. For more information about adding repositories, see Using the Repository Manager in the PowerCenter Repository Guide.
Checking In Objects
You commit changes to the repository by checking in objects. When you check in an object, the repository creates a new version of the object and assigns it a version number. The repository increments the version number by one each time it creates a new version. To check in an object from the Workflow Manager workspace, select the object or objects and click Versioning > Check in.
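Check-ins can also be scripted with the pmrep program. The following is a sketch with hypothetical repository, folder, and workflow names; verify the option list against the pmrep reference for your version:

pmrep connect -r Dev_Repo -d Domain_Dev -n Administrator -x mypassword
pmrep checkin -o workflow -n wf_LoadOrders -f SalesFolder -c "Tested and ready for QA"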
For more information about checking out and checking in objects, see the Repository Guide. If you want to check out or check in scheduler objects in the Workflow Manager, you can run an object query to search for them. You can also check out a scheduler object in the Scheduler Browser window when you edit the object. However, you must run an object query to check in the object. If you want to check out or check in session configuration objects in the Workflow Manager, you can run an object query to search for them. You can also check out objects from the Session Config Browser window when you edit them. In addition, you can check out and check in session configuration and scheduler objects from the Repository Manager. For more information about running queries to find versioned objects, see Searching for Versioned Objects on page 23.
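If you automate these tasks, you can also run a saved object query with pmrep. The query name below is hypothetical and must already exist in the repository:

pmrep executequery -q q_CheckedOutSchedulers -u query_results.txt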
Use the following rules and guidelines when you view older versions of objects in the workspace:
You cannot simultaneously view multiple versions of composite objects, such as workflows and worklets.
Older versions of a composite object might not include the child objects that were used when the composite object was checked in. If you open a composite object that includes a child object version that is purged from the repository, the preceding version of the child object appears in the workspace as part of the composite object. For example, you might want to view version 5 of a workflow that originally included version 3 of a session, but version 3 of the session is purged from the repository. When you view version 5 of the workflow, version 2 of the session appears as part of the workflow.
You cannot view older versions of sessions if they reference deleted or invalid mappings, or if they do not have a session configuration.
To open an older version of an object in the workspace:
1. In the workspace or Navigator, select the object and click Versioning > View History.
2. Select the version you want to view in the workspace and click Tools > Open in Workspace.
Note: An older version of an object is read-only, and the version number appears as a prefix before the object name. You can simultaneously view multiple versions of a non-composite object in the workspace.
To compare two versions of an object:
1. In the workspace or Navigator, select an object and click Versioning > View History.
2. Select the versions you want to compare and then click Compare > Selected Versions, or select a version and click Compare > Previous Version to compare a version of the object with the previous version. The Diff Tool appears. For more information about comparing objects in the Diff Tool, see the Repository Guide.
Note: You can also access the View History window from the Query Results window when you run a query. For more information about creating and working with object queries, see Working with Object Queries in the Repository Guide.
Searching for Versioned Objects

Use object queries to perform the following tasks:
Track repository objects during development. You can add Label, User, Last saved, or Comments parameters to queries to track objects during development. For more information about creating object queries, see Working with Object Queries in the Repository Guide.
Associate a query with a deployment group. When you create a dynamic deployment group, you can associate an object query with it. For more information about working with deployment groups, see Copying Folders and Deployment Groups in the Repository Guide.
To create an object query, click Tools > Queries to open the Query Browser. Figure 1-7 shows the Query Browser:
Figure 1-7. Query Browser
From the Query Browser, you can create, edit, and delete queries. You can also configure permissions for each query from the Query Browser. You can run any queries for which you have read permissions from the Query Browser.
The Import Wizard provides the same options to resolve conflicts as the Copy Wizard. For more information, see Exporting and Importing Objects in the Repository Guide.
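Exports and imports can also be scripted with pmrep as an alternative to the wizards. The following is a sketch with hypothetical object, folder, and file names, and the control file that drives conflict resolution is an XML file you supply; verify the option letters against the pmrep reference:

pmrep objectexport -n s_LoadOrders -o session -f SalesFolder -u session_export.xml
pmrep objectimport -i session_export.xml -c import_control.xml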
Copying Sessions
When you copy a Session task, the Copy Wizard looks for the database connection and associated mapping in the destination folder. If the mapping or connection does not exist in the destination folder, you can select a new mapping or connection. If the destination folder does not contain any mapping, you must first copy a mapping to the destination folder in the Designer before you can copy the session. When you copy a session that has mapping variable values saved in the repository, the Workflow Manager either copies or retains the saved variable values.
To copy a segment of a workflow or worklet:
1. Open the workflow or worklet.
2. To select a segment, highlight each task you want to copy. You can select multiple reusable or non-reusable objects. You can also select segments by dragging the pointer in a rectangle around objects in the workspace.
3. Click Edit > Copy or press Ctrl+C to copy the segment to the clipboard.
4. Open the workflow or worklet into which you want to paste the segment. You can also copy the object into the Workflow or Worklet Designer workspace.
5. Click Edit > Paste or press Ctrl+V. The Copy Wizard opens, and notifies you if it finds copy conflicts.
Note: You can copy individual non-reusable tasks by selecting the individual task and copying and pasting it the same way.
You can also compare instances of the same type. For example, if the workflows you compare contain worklet instances with the same name, you can compare the instances to see if they differ. Use the Workflow Manager to compare the following instances and attributes:
Instances of sessions and tasks in a workflow or worklet comparison. For example, when you compare workflows, you can compare task instances that have the same name.
Instances of mappings and transformations in a session comparison. For example, when you compare sessions, you can compare mapping instances.
The attributes of instances of the same type within a mapping comparison. For example, when you compare flat file sources, you can compare attributes, such as file type (delimited or fixed), delimiters, escape characters, and optional quotes.
You can compare schedulers and session configuration objects in the Repository Manager. You cannot compare objects of different types. For example, you cannot compare an Email task with a Session task. When you compare objects, the Workflow Manager displays the results in the Diff Tool window. The Diff Tool output contains different nodes for different types of objects. When you import Workflow Manager objects, you can compare object conflicts. For more information, see Exporting and Importing Objects in the Repository Guide.
To compare two objects:
1. Open the folders that contain the objects you want to compare.
2. Open the appropriate Workflow Manager tool.
3. Click Tasks > Compare, Worklets > Compare, or Workflow > Compare. A dialog box similar to the following one appears.
4. In the dialog box, select the objects you want to compare.
5. Click Compare.
Tip: You can also compare objects from the Navigator or the workspace. In the Navigator, select the objects, right-click, and select Compare Objects. In the workspace, select the objects, right-click, and select Compare Objects.
In the Diff Tool window, differences between objects are highlighted and the nodes are flagged. Differences between object properties are marked, and the window displays the properties of the node you select.
6. To view more differences between object properties, click the Compare Further icon or right-click the differences.
7. If you want to save the comparison as a text or HTML file, click File > Save to File.
Working with Metadata Extensions

You can create both reusable and non-reusable metadata extensions. You associate reusable metadata extensions with all repository objects of a certain type, such as all sessions or all worklets. You associate non-reusable metadata extensions with a single repository object, such as one workflow. For more information about metadata extensions, see Metadata Extensions in the PowerCenter Repository Guide.
To add a metadata extension:
1. Open the appropriate Workflow Manager tool.
2. Drag the appropriate object into the workspace.
3. Double-click the title bar of the object to edit it.
4. Click the Metadata Extensions tab. This tab lists the existing user-defined and vendor-defined metadata extensions. User-defined metadata extensions appear in the User Defined Metadata Domain. If they exist, vendor-defined metadata extensions appear in their own domains.
5. Click the Add button. A new row appears in the User Defined Metadata Extension Domain.
6. Enter the metadata extension information, such as the extension name, datatype, precision, value, and description, and whether the extension is reusable, required, or optional.
7. Click OK.
Keyboard Shortcuts
When editing a repository object or maneuvering around the Workflow Manager, use the following keyboard shortcuts to help you complete different operations quickly.

Table 1-6 lists the Workflow Manager keyboard shortcuts for editing a repository object:
Table 1-6. Workflow Manager Keyboard Shortcuts
Cancel editing in a cell: Esc
Select and clear a check box: Space Bar
Copy text from a cell onto the clipboard: Ctrl+C
Cut text from a cell onto the clipboard: Ctrl+X
Edit the text of a cell: F2
Find all combination and list boxes: Type the first letter on the list.
Find tables or fields in the workspace: Ctrl+F
Move around cells in a dialog box: Ctrl+directional arrows
Paste copied or cut text from the clipboard into a cell: Ctrl+V
Select the text of a cell: F2
Table 1-7 lists the Workflow Manager keyboard shortcuts for navigating in the workspace:
Table 1-7. Keyboard Shortcuts for Navigating the Workspace
Create links: Ctrl+F2. Press Ctrl+F2 to select the first task you want to link. Press Tab to select the rest of the tasks you want to link. Press Ctrl+F2 again to link all the tasks you selected.
Edit task name in the workspace: F2
Expand selected node and all its children: Shift + * (use the asterisk on the numeric keypad)
Move across selected tasks in the workspace: Tab
Select multiple tasks: Ctrl+mouse click
Table 24-2 on page 652 lists keyboard shortcuts for the Log Events window of the Workflow Monitor.
Chapter 2
This chapter includes the following topics:
Overview, 36
Working with Connection Objects, 37
Configuring Session Connections, 40
Overriding Connection Attributes, 44
Configuring Connection Object Permissions, 48
Relational Database Connections, 50
FTP Connections, 60
External Loader Connections, 64
HTTP Connections, 66
PowerChannel Relational Database Connections, 71
PowerExchange Connections
Overview
Before using the Workflow Manager to create workflows and sessions, you must configure connections in the Workflow Manager. You can configure the following connection information in the Workflow Manager:
Create relational database connections. Create connections to each source, target, Lookup transformation, and Stored Procedure transformation database. You must create connections to a database before you can create a session that accesses the database. For more information, see Relational Database Connections on page 50.
Create FTP connections. Create a File Transfer Protocol (FTP) connection object in the Workflow Manager and configure the connection properties. For more information, see FTP Connections on page 60.
Create external loader connections. Create an external loader connection to load information directly from a file or pipe rather than running the SQL commands to insert the same data into the database. For more information, see External Loader Connections on page 64.
Create queue connections. Create database connections for message queues. For more information, see PowerExchange for WebSphere MQ Connections on page 101 and PowerExchange for MSMQ Connections on page 76.
Create source and target application connections. Create connections to source and target applications. When you create or modify a session that reads from or writes to an application, you can select configured source and target application connections. When you create a connection, the connection properties you need depend on the application. For more information, see PowerExchange Interfaces for PowerCenter or one of the following PowerExchange sections in this chapter:
PowerExchange for JMS Connections, 74
PowerExchange for MSMQ Connections, 76
PowerExchange for PeopleSoft Connections, 77
PowerExchange for Salesforce Connections, 79
PowerExchange for SAP NetWeaver BW Option Connections, 80
PowerExchange for SAP NetWeaver mySAP Option Connections, 83
PowerExchange for Siebel Connections, 90
PowerExchange for TIBCO Connections, 92
PowerExchange for Web Services Connections, 96
PowerExchange for webMethods Connections, 99
PowerExchange for WebSphere MQ Connections, 101
In the Connection Browser dialog box, you can create a new connection object, or select an existing connection object to edit it, delete it, configure its permissions, or, for relational database connections, copy it.

To create a connection object:
1. In the Workflow Manager, click Connections and select the type of connection you want to create. Select one of the following connection types:
- Relational. For information about creating relational database connections, see Relational Database Connections on page 50 or PowerExchange Interfaces for PowerCenter.
- FTP. For information about creating FTP connections, see FTP Connections on page 60.
- Loader. For information about creating loader connections, see External Loader Connections on page 64.
- Queue. For information about creating queue connections, see PowerExchange for WebSphere MQ Connections on page 101 and PowerExchange for MSMQ Connections on page 76.
- Application. For information about creating application connections, see the appropriate PowerExchange application section in this chapter or PowerExchange Interfaces for PowerCenter.
The Connection Browser dialog box appears, listing all the source and target connections available for the selected connection type. See Figure 2-1 on page 37.
2. Click New. If you selected FTP as the connection type, the Connection Object dialog box appears; go to step 5. If you selected the Relational, Queue, Application, or Loader connection type, the Select Subtype dialog box appears.
3. In the Select Subtype dialog box, select the type of database connection you want to create.
4. Click OK.
5. Enter the properties for the type of connection object you want to create. The Connection Object Definition dialog box displays different properties depending on the type of connection object you create. For more information about connection object properties, see the section for each specific connection type in this chapter.
6. Click OK. The new database connection appears in the Connection Browser list.
7. To add more database connections, repeat steps 3 to 7.
8. Click OK to save all changes.
To edit a connection object:
1. Open the Connection Browser dialog box for the connection object. For example, click Connections > Relational to open the Connection Browser dialog box for a relational database connection.
2. Click Edit. The Connection Object Definition dialog box appears.
3. Enter the values for the properties you want to modify. The connection properties vary depending on the type of connection you select. For more information about connection properties, see the section for each specific connection type in this chapter.
4. Click OK.
To delete a connection object:
1. Open the Connection Browser dialog box for the connection object. For example, click Connections > Relational to open the Connection Browser dialog box for a relational database connection.
2. Select the connection object you want to delete in the Connection Browser dialog box.
Tip: Hold the Shift key to select more than one connection to delete.
3. Click Delete.
Relational
Select this connection type to access relational database sources, targets, and transformations such as Lookup and Stored Procedure transformations. You cannot change the relational connection type.
Application
Select this connection type to access PowerExchange and Teradata FastExport sources and targets. You can also access transformations such as HTTP, Salesforce Lookup, and BAPI/ RFC transformations.
Queue
Select this connection type to access an MSMQ or WebSphere MQ source, or if you want to write messages to a WebSphere MQ message queue. If you select this option, select a configured MQ connection in the value column. For static WebSphere MQ targets, set the connection type to FTP or Queue. For dynamic MQSeries targets, set the connection type to Queue. For more information, see the PowerExchange for WebSphere MQ User Guide.
External Loader
Select this connection type to use the External Loader to load output files to Teradata, Oracle, DB2, or Sybase IQ databases. If you select this option, select a configured loader connection in the Value column. To use this option, you must use a mapping with a relational target definition and choose File as the writer type on the Writers tab for the relational target instance. As the Integration Service completes the session, it uses an external loader to load target files to the Oracle, Sybase IQ, IBM DB2, or Teradata database. You cannot choose external loader for flat file or XML target definitions in the mapping. For more information about using the external loader feature, see External Loading on page 711.
FTP
Select this connection type to use FTP to access the source or target directory for flat file and XML sources or targets. If you want to extract or load data from or to a flat file or XML source or target using FTP, you must specify an FTP connection when you configure source or target options. If you select this option, select a configured FTP connection in the Value column. FTP connections must be defined in the Workflow Manager prior to configuring sessions. For more information about using FTP, see Using FTP on page 749.
None
Select None when you want to read or write from a local flat file or XML file, or if you use an associated source for a WebSphere MQ session.
When you use $Source and the pipeline contains one source, the Integration Service uses the database connection you specify for the source. When you use $Source and the pipeline contains multiple sources joined by a Joiner transformation, the Integration Service determines the database connections based on the location of the Lookup or Stored Procedure transformation in the pipeline:
- When the transformation is after the Joiner transformation, the Integration Service uses the database connection for the detail table.
- When the transformation is before the Joiner transformation, the Integration Service uses the database connection for the source connected to the transformation.
When you use $Target and the pipeline contains one target, the Integration Service uses the database connection you specify for the target.
- When you use $Target and the pipeline contains multiple relational targets, the session fails.
- When you use $Source or $Target in an unconnected Lookup or Stored Procedure transformation, the session fails.
You can enter values for the $Source and $Target connection variables on the Properties tab of the session properties, or on the Mapping tab, Connections node.
To enter the database connection for the $Source and $Target connection variables:
1. In the session properties, select the Properties tab or the Mapping tab, Connections node.
2. Click the Open button in the $Source Connection Value or $Target Connection Value field.
3. Select one of the following options:
- Use Connection Object. Select a relational or application connection object. You can also create a connection object and select it.
- Use Connection Variable. Use a connection variable or session parameter. You can enter either the $Source or $Target connection variable, or the $DBConnectionName or $AppConnectionName session parameter. If you enter a session parameter, define the parameter in the parameter file. If you do not define a value for the session parameter, the Integration Service determines which database connection to use when it runs the session.
4. Click OK.
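For example, if you enter the session parameter $DBConnectionSource as the $Source Connection Value, you can define the connection object it resolves to in the parameter file. The following is a minimal sketch; the folder, workflow, session, and connection names are illustrative:

[MyFolder.WF:wf_MyWorkflow.ST:s_MySession]
$DBConnectionSource=Oracle_Src_Conn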
Overriding Connection Attributes
You can override connection attributes when you configure the session in the following cases:
- You use an FTP, queue, external loader, or application connection for a non-relational source or target instance.
- You use an FTP, queue, or external loader connection for a relational target instance.
- You use an application connection for a relational source instance.
For example, for an FTP target connection, you can override the remote file name, whether the target file is staged, and the transfer mode. For a Teradata FastExport source connection, you can override attributes such as TDPID, tenacity, and block size. When you configure a session, you can override connection attributes for a connection object, or you can use a session parameter and override the attributes in the parameter file. You configure connections in the Connections settings on the Mapping tab, where you select the connection object or session parameter that defines the connection and override attributes as needed.
You can override connection attributes in the session or in the parameter file:
- Session. Select the connection object and override attributes in the session. For more information, see Overriding Connection Attributes in the Session on page 45.
- Parameter file. Use a session parameter to define the connection and override connection attributes in the parameter file. For more information, see Overriding Connection Attributes in the Parameter File on page 46.
To override connection attributes in the session:
1. On the Mapping tab, select the source or target instance in the Connections node.
2. Select the connection type.
3. Click the Open button in the value field to select a connection object.
4. Choose the connection object.
5. Click Override. The Connection Object Definition dialog box appears.
6. Update the attribute values you want to override.
7. Click OK.
When you define a connection in the parameter file, copy the template for the appropriate connection type and paste it into the parameter file. Then supply the parameter values.
For example, to override connection attributes for an FTP connection in the parameter file, perform the following steps:
1. Configure the session or workflow to run with a parameter file. For more information, see Configuring the Parameter File Name and Location on page 702.
2. In the session properties Mapping tab, select the source or target instance in the Connections node.
3. Click the Open button in the value field and configure the connection to use a session parameter. For example, use $FTPConnectionMyFTPConn for an FTP connection.
4. Open the ConnectionParam.prm template file in a text editor and scroll down to the section for the connection type whose attributes you want to override. For example, for an FTP connection, locate the Connection Type: FTP section:
Connection Type: FTP
---------------------
...
Template
====================
$FTPConnection<VariableName>=
$Param_FTPConnection<VariableName>_Remote_Filename=
$Param_FTPConnection<VariableName>_Is_Staged=
$Param_FTPConnection<VariableName>_Is_Transfer_Mode_ASCII=
5. Copy the template text for the connection attributes you want to override. For example, to override the Remote File Name and Is Staged attributes, copy the following lines:
$FTPConnection<VariableName>= $Param_FTPConnection<VariableName>_Remote_Filename= $Param_FTPConnection<VariableName>_Is_Staged=
6. Paste the text into the parameter file. Replace <VariableName> with the connection name, and supply the parameter values. For example:
[MyFolder.WF:wf_MyWorkflow.ST:s_MySession] $FTPConnectionMyFTPConn=FTP_Conn1 $Param_FTPConnectionMyFTPConn_Remote_Filename=ftp_src.txt $Param_FTPConnectionMyFTPConn_Is_Staged=YES
Note: The Integration Service interprets spaces or quotation marks before or after the equals sign as part of the parameter name or value.
If you do not define a value for an attribute, the Integration Service uses the value defined for the connection object. For more information about parameter files, see Parameter Files on page 687.
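For example, the following two entries look similar, but only the first assigns the value ftp_src.txt. In the second, the Integration Service treats the spaces around the equals sign as part of the parameter name and value:

$Param_FTPConnectionMyFTPConn_Remote_Filename=ftp_src.txt
$Param_FTPConnectionMyFTPConn_Remote_Filename = ftp_src.txt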
Configuring Connection Object Permissions
- Read. View the connection object in the Workflow Manager and Repository Manager. When you have read permission, you can perform tasks in which you view, copy, or edit repository objects associated with the connection object.
- Write. Edit the connection object.
- Execute. Run sessions that use the connection object.
To view a connection object, you must have read permission on the object. To delete a connection object, you must be the owner of the connection object or have the Administrator role for the Repository Service. To edit a connection object, you must have read and write permissions on the object. For more information about the Administrator role and permissions needed to manage connection objects, see the PowerCenter Administrator Guide.
To assign or edit permissions on a connection object, select an object from the Connection Object Browser, and click Permissions.
You can perform the following tasks to manage permissions on a connection object:
- Change connection object permissions for users and groups.
- Add users and groups, and assign permissions to users and groups on the connection object.
- List all users to see all users that have permissions on the connection object.
- List all groups to see all groups that have permissions on the connection object.
- List all to see all users, groups, and others that have permissions on the connection object.
- Remove each user or group that has permissions on the connection object.
- Remove all users and groups that have permissions on the connection object.
- Change the owner of the connection object.
For more information about adding users and groups, changing the owner, or assigning permissions to users and groups, see the PowerCenter Repository Guide. If you change the permissions assigned to a user that is currently connected to a repository in a PowerCenter Client tool, the changed permissions take effect the next time the user reconnects to the repository.
Relational Database Connections

A relational database connection includes the following information:
- Database name. Name for the connection.
- Database type. Type of the source or target database.
- Database user name. Name of a user who has the appropriate database permissions to read from and write to the database. You can enter a session parameter as the database user name and define it in a parameter file. For more information about database user names and passwords, see Database User Names and Passwords on page 50. To use an SQL override with pushdown optimization, the user must also have permission to create views on the source or target database.
- Password. Database password in 7-bit ASCII. You can enter a session parameter as the database password and define it in a parameter file. For more information about database user names and passwords, see Database User Names and Passwords on page 50.
- Connect string. Connect string used to communicate with the database. For more information about connect strings, see Database Connect Strings on page 51.
- Database code page. Code page associated with the database. For more information about database code pages, see Database Connection Code Pages on page 52.
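For example, to keep database credentials out of the repository, you might enter the session parameters $ParamSrcUser and $ParamSrcPassword as the database user name and password, and define both in the parameter file. The parameter names and values below are illustrative, and the password value must be the encrypted string produced by the pmpasswd CRYPT_DATA option described in the Command Line Reference:

[MyFolder.WF:wf_LoadOrders.ST:s_LoadOrders]
$ParamSrcUser=ora_reader
$ParamSrcPassword=<encrypted string from pmpasswd>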
For relational databases, you might need to run special SQL commands in the database environment. For example, you might need to set the quoted identifier parameter. You can set up environment SQL to run SQL commands at each connection to the database or at each database transaction. For more information about configuring environment SQL commands, see Configuring Environment SQL on page 52.
For more information about pmpasswd, see the Command Line Reference.
Some database drivers, such as ISG Navigator, do not allow user names and passwords. Since the Workflow Manager requires a database user name and password, PowerCenter provides two reserved words to register databases that do not allow user names and passwords:
- PmNullUser
- PmNullPasswd
Use the PmNullUser user name if you use one of the following authentication methods:
- Oracle OS Authentication. Oracle OS Authentication lets you log in to an Oracle database if you have a login name and password for the operating system. You do not need to know a database user name and password. PowerCenter uses Oracle OS Authentication when the connection user name is PmNullUser and the connection is for an Oracle database.
- IBM DB2 client authentication. IBM DB2 client authentication lets you log in to an IBM DB2 database without specifying a database user name or password if the IBM DB2 server is configured for external authentication or if the IBM DB2 server is on the same machine hosting the Integration Service process. PowerCenter uses IBM DB2 client authentication when the connection user name is PmNullUser and the connection is for an IBM DB2 database.
The Workflow Manager filters the list of code pages for connections to ensure that the code page for the connection is a subset of the code page for the repository. If you configure the Integration Service for data code page validation, the Integration Service enforces code page compatibility at session runtime. The Integration Service ensures that the target database code page is a superset of the source database code page. When you change the code page in a database connection, you must choose one that is compatible with the previous code page. If the code pages are incompatible, the Workflow Manager invalidates all sessions using that database connection. If you configure the PowerCenter Client and Integration Service for relaxed code page validation, you can select any supported code page for source and target database connections. If you are familiar with the data and are confident that it will convert safely from one code page to another, you can run sessions with incompatible source and target data code pages. It is your responsibility to ensure your data will convert properly. For more information about code page compatibility, see the PowerCenter Administrator Guide.
For example, if you configure connection environment SQL in a target connection, and you configure three partitions for the pipeline, the Integration Service runs the SQL three times, once for each connection to the target database. Use SQL commands that do not depend on a transaction being open during the entire read or write process. For example, if you want to set up the connection environment so that double quotation marks are object identifiers, use the following SQL statement, which sets the quoted identifier parameter for the duration of the connection:
SET QUOTED_IDENTIFIER ON
In contrast, a command that must be run before each transaction belongs in transaction environment SQL. Such a command is not appropriate for connection environment SQL because setting the parameter once for each connection is not sufficient.
You can enter any SQL command that is valid in the database associated with the connection object. The Integration Service does not allow nested comments, even though the database might. When you enter SQL in the SQL Editor, you type the SQL statements. Use a semicolon (;) to separate multiple statements. The Integration Service ignores semicolons within /*...*/. If you need to use a semicolon outside of comments, you can escape it with a backslash (\).

You can use parameters and variables in the environment SQL. Use any parameter or variable type that you can define in the parameter file. You can enter a parameter or variable within the SQL statement, or you can use a parameter or variable as the environment SQL. For example, you can use a session parameter, $ParamMyEnvSQL, as the connection or transaction environment SQL, and set $ParamMyEnvSQL to the SQL statement in a parameter file. For more information, see Where to Use Parameters and Variables on page 691.

You can configure the table owner name using sqlid in the connection environment SQL for a DB2 connection. However, the table owner name in the target instance overrides the SET sqlid statement in environment SQL. To use the table owner name specified in the SET sqlid statement, do not enter a name in the target name prefix.
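For example, instead of typing the statement in the SQL Editor, you can enter the session parameter $ParamMyEnvSQL as the connection environment SQL and set the statement in the parameter file. A minimal sketch, with illustrative folder, workflow, and session names:

[MyFolder.WF:wf_MyWorkflow.ST:s_MySession]
$ParamMyEnvSQL=SET QUOTED_IDENTIFIER ON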
To create a relational database connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Relational. The Relational Connection Browser dialog box appears, listing all the source and target database connections.
3. Click New. The Select Subtype dialog box appears.
4. In the Select Subtype dialog box, select the type of database connection you want to create.
5. Click OK.
6. Enter the connection information.
Table 2-2 describes the properties that you configure for a relational database connection:
Table 2-2. Relational Database Connection Information
- Name (Required). Connection name used by the Workflow Manager. Connection name cannot contain spaces or other special characters, except for the underscore.
- Type (Required). Type of database.
- User Name (Required). Database user name with the appropriate read and write database permissions to access the database. If you use Oracle OS Authentication, IBM DB2 client authentication, or databases such as ISG Navigator that do not allow user names, enter PmNullUser. For Teradata connections, this overrides the default database user name in the ODBC entry. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Parameter Files on page 687.
- Password (Required). Database password in 7-bit ASCII. You can enter a session parameter as the database password and define it in a parameter file. For more information, see Database User Names and Passwords on page 50.
- Connect String (Conditional). Connect string used to communicate with the database. For more information, see Database Connect Strings on page 51.
- Code Page (Required). Code page associated with the database connection. For more information, see Database Connection Code Pages on page 52.
- Connection Environment SQL (All relational databases). SQL commands that the Integration Service runs at each connection to the database. For more information, see Configuring Environment SQL on page 52.
- Transaction Environment SQL (All relational databases). SQL commands that the Integration Service runs at each database transaction. For more information, see Configuring Environment SQL on page 52.
- Enable Parallel Mode (Oracle).
- Database Name (Sybase ASE, Microsoft SQL Server, and Teradata).
- Data Source Name (Teradata).
- Server Name (Sybase ASE and Microsoft SQL Server).
- Packet Size (Sybase ASE and Microsoft SQL Server). Use to optimize the native drivers for Sybase ASE and Microsoft SQL Server.
- Domain Name (Microsoft SQL Server). The name of the domain. Used for Microsoft SQL Server on Windows.
7. Click OK. The new database connection appears in the Connection Browser list.
8. To add more database connections, repeat steps 3 to 7.
9. Click OK to save all changes.
To copy a database connection:
1. Click Connections > Relational. The Relational Connection Browser dialog box appears.
2. Select the connection you want to copy.
3. Click Copy As. The Select Subtype dialog box appears.
4. Select a relational database type for the copied connection. If you copy one database connection object as a different type of database connection, you must reconfigure the connection properties for the copied connection.
5. Click OK. The Workflow Manager retains connection properties that apply to the database type. If a required connection property does not exist, the Workflow Manager displays a warning message. This happens when you copy a connection object as a different database type or copy a connection object that is already invalid.
6. Click OK to close the warning dialog box. The copy of the connection appears in the Relational Connection Browser.
7. If the copied connection is invalid, click the Edit button to enter required connection properties.
8. Click Close to close the Relational Connection Browser dialog box.
When you replace a relational database connection, the Workflow Manager replaces the connection in the following locations for all sessions that use the connection:
- Source connection
- Target connection
- Connection Information property in Lookup and Stored Procedure transformations
- $Source Connection Value session property
- $Target Connection Value session property
For more information about using $Source and $Target connection variables, see Stored Procedure Transformation in the Transformation Guide. When the repository contains both relational and application connections with the same name, the Workflow Manager replaces the relational connections only if you specified the connection type as relational in all locations. For example, you have a relational and an application source, each called ITEMS. In one session, you specified the name ITEMS for a source connection instead of Relational:ITEMS. When you replace the relational connection ITEMS with another relational connection, the Workflow Manager does not replace any relational connection in the repository because it cannot determine the connection type for the source connection entered as ITEMS. The Integration Service uses the updated connection information the next time the workflow runs. You must first close all folders before replacing a relational database connection.
To replace a relational database connection:
1. Close all folders in the repository.
2. Click Connections > Replace. The Replace Connections dialog box appears.
3. Click the Add button to replace a connection.
4. In the From list, select a relational database connection you want to replace.
5. In the To list, select the replacement relational database connection.
6. Click Replace. All sessions in the repository that use the From connection now use the connection you select in the To list.
FTP Connections
Before you can configure a session to use FTP or SFTP, you must create and configure the FTP connection properties in the Workflow Manager. The Integration Service uses the FTP connection properties to create an SFTP or FTP connection.
To create an FTP connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > FTP. The FTP Connection Browser appears.
3. Click New.
4. Enter the properties for the FTP connection. Table 2-3 describes the properties that you configure for an FTP connection:
Table 2-3. FTP Connection Properties
- Name (Required). Connection name used by the Workflow Manager.
- User Name (Optional). User name necessary to access the host machine. Must be in 7-bit ASCII only. Required to connect to an SFTP server with password-based authentication. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Parameter Files on page 687.
- Use Parameter in Password (Required). Indicates the password for the user name is a session parameter, $ParamName. If you enable this option, define the password in the workflow or session parameter file, and encrypt it using the pmpasswd CRYPT_DATA option. For more information about pmpasswd, see the Command Line Reference. For more information about parameter files, see Parameter Files on page 687. Default is disabled.
- Password (Optional). Password for the user name. Must be in 7-bit ASCII only. Required to connect to an SFTP server with password-based authentication.
- Host Name (Required). Host name or IP address of the FTP host. Optionally, specify a port number using one of the following syntaxes:
  host name:port_number
  -or-
  IP address:port_number
  When you specify a port number, enable that port number for FTP on the host machine. If you enable SFTP, specify a host name or port number for an SFTP server. If you do not specify a port number, the Integration Service uses 22 by default.
- Default Remote Directory (Required). Default directory on the FTP host used by the Integration Service. Do not enclose the directory in quotation marks. You can enter a parameter or variable for the directory. Use any parameter or variable type that you can define in the parameter file. For more information, see Where to Use Parameters and Variables on page 691. Depending on the FTP server you use, you may have limited options to enter FTP directories. See the FTP server documentation for details. In the session, when you enter a file name without a directory, the Integration Service appends the file name to this directory. This path must contain the appropriate trailing delimiter. For example, if you enter c:\staging\ and specify data.out in the session, the Integration Service reads the path and file name as c:\staging\data.out. For SAP, you can leave this value blank. SAP sessions use the Source File Directory session property for the FTP remote directory. If you enter a value, the Source File Directory session property overrides it.
- Retry Period (Optional). Number of seconds the Integration Service attempts to reconnect to the FTP host if the connection fails. If the Integration Service cannot reconnect to the FTP host in the retry period, the session fails. Default value is 0 and indicates an infinite retry period.
- Use SFTP. Enables SFTP.
- Public Key File Name. Public key file path and file name. Required if the SFTP server uses public key authentication. Enabled for SFTP.
- Private Key File Name. Private key file path and file name. Required if the SFTP server uses public key authentication. Enabled for SFTP.
- Private Key File Password. Private key file password used to decrypt the private key file. Required if the SFTP server uses public key authentication and the private key is encrypted. Enabled for SFTP.
5. Click OK.
When you enable SFTP, the Integration Service creates an SFTP connection. You can configure public key or password authentication. The Integration Service connects to the SFTP server with the authentication properties you configure. If the authentication does not succeed, the session fails. For more information about SFTP authentication properties, see Table 2-3 on page 61.
If you enter a mainframe file name in the default directory for a source or target, enter the closing quote. For example, if the file is located in the following default remote directory:
staging.
To access the file data from the default mainframe directory, enter the following in the Remote file name field:
data
When the Integration Service begins or ends the session, it connects to the mainframe host and looks for the following directory and file name:
staging.data
If you want to use a file in a different directory, enter the directory and file name in the Remote file name field. For example, you might enter the following file name and directory:
overridedir.filename
To create an external loader connection:
1. Click Connections > Loader in the Workflow Manager. The Loader Connection Browser dialog box appears.
2. Click New.
3. Select an external loader type, and then click OK. The Loader Connection Editor dialog box appears.
4. Enter the connection properties. The Name, User Name, Password, and Connect String properties are required.
5. Enter the loader connection attributes. For information about attributes for a specific external loader type, see External Loading on page 711.
6. Click OK.
HTTP Connections
You configure connection information for an HTTP transformation in an HTTP application connection. The Integration Service can use HTTP application connections to connect to HTTP servers. HTTP application connections enable you to control connection attributes, including the base URL and other parameters. If you want to connect to an HTTP proxy server, configure the HTTP proxy server settings in the Integration Service. For more information about configuring the Integration Service to use an HTTP proxy server, see the PowerCenter Administrator Guide. Configure an HTTP application connection in the following circumstances:
- The HTTP server requires authentication.
- You want to configure the connection timeout.
- You want to override the base URL in the HTTP transformation.
To create an HTTP application connection:
1. In the Workflow Manager, connect to a PowerCenter repository.
2. Select Connections > Applications. The Application Connection Browser dialog box appears.
3. From Select Type, select an HTTP connection type.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter a name for the HTTP connection.
6. Enter values for the connection attributes.
The HTTP application connection includes the following attributes:
- User Name (Required). Authenticated user name for the HTTP server. If the HTTP server does not require authentication, enter PmNullUser. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Parameter Files on page 687.
- Use Parameter in Password (Required). Indicates the password for the authenticated user is a session parameter, $ParamName. If you enable this option, define the password in the workflow or session parameter file, and encrypt it using the pmpasswd CRYPT_DATA option. For more information about pmpasswd, see the Command Line Reference. For more information about parameter files, see Parameter Files on page 687. Default is disabled.
- Password (Required). Password for the authenticated user. If the HTTP server does not require authentication, enter PmNullPasswd.
- Base URL (Optional). URL of the HTTP server. This value overrides the base URL defined in the HTTP transformation.
- Timeout (Optional). Number of seconds the Integration Service waits for a connection to the HTTP server before it closes the connection.
- Domain (Optional). Authentication domain for the HTTP server. This is required for NTLM authentication. For more information about authentication with an HTTP transformation, see HTTP Transformation in the PowerCenter Transformation Guide.
- Trust Certificates File (Optional). File containing the bundle of trusted certificates that the client uses when authenticating the SSL certificate of a server. You specify the trust certificates file to have the Integration Service authenticate the HTTP server. By default, the name of the trust certificates file is ca-bundle.crt. For information about adding certificates to the trust certificates file, see Configuring Certificate Files for SSL Authentication on page 68.
- Certificate File (Optional). Client certificate that an HTTP server uses when authenticating a client. You specify the client certificate file if the HTTP server needs to authenticate the Integration Service.
- Certificate File Password (Optional). Password for the client certificate. You specify the certificate file password if the HTTP server needs to authenticate the Integration Service.
- Certificate File Type (Required). File type of the client certificate. You specify the certificate file type if the HTTP server needs to authenticate the Integration Service. The file type can be PEM or DER. For information about converting certificate file types to PEM or DER, see Configuring Certificate Files for SSL Authentication on page 68. Default is PEM.
- Key File Name (Optional). Private key file for the client certificate. You specify the private key file if the HTTP server needs to authenticate the Integration Service.
- Key Password (Optional). Password for the private key of the client certificate. You specify the key password if the web service provider needs to authenticate the Integration Service.
- Key File Type (Required). File type of the private key of the client certificate. You specify the key file type if the HTTP server needs to authenticate the Integration Service. The HTTP transformation uses the PEM file type for SSL authentication.
7. Click OK.
The trust certificates file (ca-bundle.crt) contains certificate files from major, trusted certificate authorities. If the certificate bundle does not contain a certificate from a certificate authority that the HTTP transformation uses, you can convert the certificate of the HTTP server to PEM format and append it to the ca-bundle.crt file. The private key for a client certificate must be in PEM format.
Certificate files use one of the following formats:
- DER. Files with the .cer or .der extension.
- PEM. Files with the .pem extension.
- PKCS12. Files with the .pfx or .P12 extension.
When you append HTTP server certificates to the ca-bundle.crt file, the HTTP server certificate files must use the PEM format. Use the OpenSSL utility to convert certificates from one format to another. You can get OpenSSL at http://www.openssl.org. For example, to convert the DER file named server.der to PEM format, use the following command:
openssl x509 -in server.der -inform DER -out server.pem -outform PEM
If you want to convert the PKCS12 file named server.pfx to PEM format, use the following command:
openssl pkcs12 -in server.pfx -out server.pem
To convert a private key named key.der from DER to PEM format, use the following command:
openssl rsa -in key.der -inform DER -outform PEM -out keyout.pem
For more information, refer to the OpenSSL documentation. After you convert certificate files to the PEM format, you can append them to the trust certificates file. Also, you can use PEM format private key files with HTTP transformation.
1. Use Internet Explorer to export the HTTP server certificate in DER format:
- Access the HTTP server using HTTPS.
- Double-click the padlock icon in the status bar of Internet Explorer.
- In the Certificate dialog box, click the Details tab.
- Select the Authority Information Access field.
- Click Copy to File.
- Use the Certificate Export Wizard to copy the certificate in DER format.
2. Convert the certificate from DER to PEM format.
3. Append the PEM certificate file to the certificate bundle, ca-bundle.crt. The ca-bundle.crt file is located in the following directory: <PowerCenter Installation Directory>/server/bin
For more information about adding certificates to the ca-bundle.crt file, see the curl documentation at http://curl.haxx.se/docs/sslcerts.html.
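For example, after you convert the server certificate to PEM format, you can append it to the bundle from a command line. This is a sketch for a UNIX shell; adjust the path to your installation:

cat server.pem >> "<PowerCenter Installation Directory>/server/bin/ca-bundle.crt"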
PowerChannel Relational Database Connections
To create a PowerChannel relational database connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Relational. The Relational Connection Browser dialog box appears, listing all the source and target database connections.
3. Click New. The Select Subtype dialog box appears.
4. In the Select Subtype dialog box, select the type of PowerChannel database connection you want to create.
5. Click OK. The Connection Object Definition dialog box appears.
Table 2-4 describes the properties that you configure for a PowerChannel relational database connection:
Table 2-4. PowerChannel Relational Database Connection Information
- Name (Required). Connection name used by the Workflow Manager. Connection name cannot contain spaces or other special characters, except for the underscore.
- Type (Required). Type of database.
- User Name (Required).
- Password (Required).
- Connect String (Conditional).
- Code Page (Required).
- Database Name (Microsoft SQL Server).
- Environment SQL (All relational databases).
- Rollback Segment (Oracle).
- Server Name (Microsoft SQL Server).
- Packet Size (Microsoft SQL Server).
- Domain Name (Microsoft SQL Server).
- Use Trusted Connection (Microsoft SQL Server).
- Encryption Level (Optional).
- Compression Level (Optional).
- Certificate Account (Optional).
6. Click OK. The new PowerChannel database connection appears in the Connection Browser list.
7. To add more database connections, repeat steps 3 to 7.
8. Click OK to save all changes.
PowerExchange for JMS Connections
Table 2-6 describes the properties that you configure for a JMS application connection:
Table 2-6. JMS Application Connection Properties
- JMS Destination Type (Required). Select QUEUE or TOPIC for the JMS destination type. Select QUEUE if you want to read source messages from a JMS provider queue or write target messages to a JMS provider queue. Select TOPIC if you want to read source messages based on the message topic or write target messages with a particular message topic.
- JMS Connection Factory Name (Required). Enter the name of the connection factory. The name of the connection factory must be the same as the connection factory name you configured in JNDI. The Integration Service uses the connection factory to create a connection with the JMS provider.
- JMS Destination (Required). Enter the name of the destination. The destination name must match the name you configured in JNDI.
- JMS UserName (Required). Enter a user name.
- JMS Password (Required). Enter a password.
For more information about JMS and JNDI, see the JMS documentation.
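The JNDI configuration itself is specific to the JMS provider. As an illustrative sketch only, a jndi.properties file for a JMS provider typically names the initial context factory and provider URL, and the connection factory and destination names you enter above must match the objects administered there. The class name and URL below are placeholders:

java.naming.factory.initial=com.example.jms.InitialContextFactory
java.naming.provider.url=tcp://jmshost:7222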
To create a JNDI or JMS application connection:
1. In the Workflow Manager, connect to a PowerCenter repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. From Select Type, select JNDI Connection or JMS Connection.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter a name for the application connection.
6. Enter the connection information.
7. Click OK. The new application connection appears in the Application Connection Browser.
PowerExchange for MSMQ Connections
To create an MSMQ queue connection:
1. In the Workflow Manager, connect to a PowerCenter repository.
2. Click Connections > Queue. The Queue Connection Browser dialog box displays.
3. From Select Type, select MSMQ.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter a name for the queue connection.
6. Select a code page. The code page must match the code page of the source. For more information, see Database Connection Code Pages on page 52.
7. Enter the connection information. Table 2-7 describes the properties that you configure for an MSMQ connection:

Table 2-7. MSMQ Connection Properties
- Queue Name (Required). Name of the MSMQ queue.
- Machine Name (Required). Name of the MSMQ machine. If MSMQ is running on the same machine as the Integration Service, you can enter a period (.).
- Queue Type (Required). Select public if the MSMQ queue is a public queue. Select private if the MSMQ queue is a private queue.
8. Click OK. The new queue connection appears in the Message Queue connection list.
PowerExchange for PeopleSoft Connections
To create a PeopleSoft application connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Application. The Application Connection Browser appears.
3. Select the PeopleSoft application type and click New. The Connection Object Definition dialog box appears.
4. Enter the connection information. Table 2-8 describes the properties that you configure for a PeopleSoft application connection:
Table 2-8. PeopleSoft Application Connection Properties
- Name (Required). Name you want to use for this connection.
- User Name (Required). Database user name with SELECT permission on physical database tables in the PeopleSoft source system. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Parameter Files on page 687.
- Use Parameter in Password (Required). Indicates the password for the database user name is a session parameter, $ParamName. If you enable this option, define the password in the workflow or session parameter file, and encrypt it using the pmpasswd CRYPT_DATA option. For more information about pmpasswd, see the Command Line Reference. For more information about parameter files, see Parameter Files on page 687. Default is disabled.
- Password (Required). Password for the database user name. Must be in US-ASCII.
- Connect String (Conditional). Connect string for the underlying database of the PeopleSoft system. For more information, see Table 2-1 on page 51. This option appears for DB2, Oracle, and Informix.
The connection also includes the Language Code, Database Name, Server Name, Domain Name, Packet Size, and Use Trusted Connection properties, which are optional.
5. Click OK. The new application connection appears in the Application Object Browser.
6. To add more application connections, repeat steps 3 to 5.
7. Click OK to save all changes.
PowerExchange for Salesforce Connections
To create a Salesforce application connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. From Select Type, select Salesforce Connection.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter a name for the application connection.
6. Enter the Salesforce user name for the application connection. The Integration Service uses this user name to log in to Salesforce.com.
7. Enter the password for the Salesforce user name.
8. Enter a value for the connection attribute. Table 2-9 describes the connection attribute for a Salesforce application connection:
Table 2-9. Salesforce Application Connection Attribute
- Service URL (Required). URL of the Salesforce service you want to access. Default is https://www.salesforce.com/services/Soap/u/7.0. In a test or development environment, you might want to access the Salesforce Sandbox testing environment. For more information about the Salesforce Sandbox, see the Salesforce documentation.
9. Click OK. The new application connection appears in the Application Object Browser.
PowerExchange for SAP NetWeaver BW Option Connections
You can create the following types of connection objects for SAP BW:
- SAP BW OHS application connection to connect to SAP BW sources.
- SAP BW application connection to connect to SAP BW targets.
To create an SAP BW OHS application connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. From the Select Type list, select BW_OHS_READER to create a connection object for SAP BW OHS sources.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter the connection information. Table 2-10 describes the properties that you configure for an SAP BW OHS application connection:
Table 2-10. SAP BW OHS Application Connection Properties
- Name (Required). Connection name used by the Workflow Manager.
- User Name (Required). SAP BW user name. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Overview on page 688.
- Use Parameter in Password (Required). Indicates the SAP BW password is a session parameter, $ParamName. If you enable this option, define the password in the workflow or session parameter file, and encrypt it using the pmpasswd CRYPT_DATA option. For more information about pmpasswd, see the Command Line Reference. For more information about parameter files, see Overview on page 688. Default is disabled.
- Password (Required). SAP BW password.
*For more information about code pages, see PowerExchange for SAP NetWeaver User Guide.
6. Click OK. The new application connection appears in the Application Connection Browser list.
7. Click Close.
To create an SAP BW application connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. Select SAP BW to create a connection object for SAP BW targets.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter the connection information. Table 2-11 describes the properties that you configure for an SAP BW application connection:
Table 2-11. SAP BW Application Connection Properties
- Name (Required). Connection name used by the Workflow Manager.
- User Name (Required). SAP BW user name. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Overview on page 688.
- Password (Required). SAP BW password.
*For more information about code pages, see PowerExchange for SAP NetWeaver User Guide.
6. Click OK. The new application connection appears in the Application Connection Browser list.
7. Click Close.
PowerExchange for SAP NetWeaver mySAP Option Connections
You can configure the following types of connections to run mySAP sessions:
- SAP R/3 application connection. Configure application connections to access the SAP system when you run a stream or file mode session. For more information about configuring SAP R/3 connections, see Configuring an SAP R/3 Application Connection for ABAP Integration on page 84.
- FTP connection. Configure FTP connections to access the staging file through FTP. For more information about configuring FTP connections, see Configuring an FTP Connection for ABAP Integration on page 86.
- SAP_ALE_IDoc_Reader and SAP_ALE_IDoc_Writer connections. Configure SAP_ALE_IDoc_Reader connections to receive IDocs and business content integration documents using ALE. Configure SAP_ALE_IDoc_Writer connections to send IDocs using ALE. For more information about configuring SAP_ALE_IDoc_Writer connections, see Configuring Application Connections for ALE Integration on page 86.
- SAP RFC/BAPI interface connection. Configure SAP RFC/BAPI Interface connections if you want to process data in SAP using BAPI/RFC transformations. For more information about configuring SAP RFC/BAPI connections, see Configuring an Application Connection for BAPI/RFC Integration on page 88.
Table 2-12 describes the type of connection you need depending on the method of integration with mySAP applications:
Table 2-12. Types of Connections for SAP Sessions
- SAP R/3 application connection: ABAP integration with stream and file mode sessions.
- FTP connection: ABAP integration with file mode sessions.
- SAP_ALE_IDoc_Reader connection: IDoc ALE and business content integration.
- SAP_ALE_IDoc_Writer connection: IDoc ALE and business content integration.
- SAP RFC/BAPI interface connection: BAPI/RFC integration.
- CPI-C. Use a CPI-C connection when you extract data through stream mode. The connection information for CPI-C is stored in the sideinfo file.
- RFC. Use an RFC connection when you extract data through file mode. The connection information for RFC is stored in saprfc.ini.
You must also have authorizations on the SAP system to read SAP tables and to run file mode and stream mode sessions.
Configuring One Application Connection for Stream and File Mode Sessions
You can create separate application connections for file and stream mode, or you can create one connection for both file and stream mode. Create separate entries if the SAP administrator creates separate authorization profiles. To create one connection for both modes, the following conditions must be true:
- The saprfc.ini and sideinfo files must have the same entries for connection string and client.
- The SAP administrator must have created a single profile with authorizations for both file and stream mode sessions. For more information about creating a profile, see the PowerExchange for SAP NetWeaver User Guide.
To create an SAP R/3 application connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. From the Select Type list, select SAP R3 as the application connection type.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter the following information for the application connection, depending on whether you are configuring an application connection for a stream or file mode session. Table 2-13 describes the properties that you configure for an SAP R/3 connection:
Table 2-13. SAP R/3 Application Connection Properties
- Name. Both modes: Connection name used by the Workflow Manager.
- User Name. CPI-C (stream mode): SAP user name with authorization on S_CPIC and S_TABU_DIS objects. RFC (file mode): SAP user name with authorization on S_DATASET, S_TABU_DIS, S_PROGRAM, and B_BTCH_JOB objects. In either mode, to define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Parameter Files on page 687.
- Use Parameter in Password. Both modes: Indicates the password for the SAP user name is a session parameter, $ParamName. If you enable this option, define the password in the workflow or session parameter file, and encrypt it using the pmpasswd CRYPT_DATA option. For more information about pmpasswd, see the Command Line Reference. For more information about parameter files, see Parameter Files on page 687. Default is disabled.
- Password. Both modes: Password for the SAP user name.
- Connect String. CPI-C (stream mode): DEST entry in the sideinfo file. RFC (file mode): Type A DEST entry in saprfc.ini.
- Code Page. Both modes: Code page compatible with the SAP server. The code page must correspond to the Language Code.
- Client Code. Both modes: SAP client number.
- Language Code. Both modes: Language code that corresponds to the SAP language.
* For more information about code pages, see the PowerCenter Administrator Guide. For more information about selecting a language code, see the PowerExchange for SAP NetWeaver User Guide.
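For reference, a Type A DEST entry in saprfc.ini generally has the following shape. The destination name, host, and system number below are placeholders; see the PowerExchange for SAP NetWeaver User Guide for the exact entries your system requires:

DEST=SAPR3_CONN
TYPE=A
ASHOST=sapapp01
SYSNR=00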
6. Click OK. The new application connection appears in the Application Connection Browser list.
7. To add more application connections, repeat steps 3 to 6.
8. Click Close.
To create an SAP_ALE_IDoc_Reader application connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. From the Select Type list, select SAP_ALE_IDoc_Reader as the application connection type for outbound IDoc or business content integration sessions.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter the connection information. Table 2-14 describes the properties that you configure for an SAP_ALE_IDoc_Reader application connection:
Table 2-14. SAP_ALE_IDoc_Reader Application Connection Properties
- Name (Required). Connection name used by the Workflow Manager.
6. Click OK. The new application connection appears in the Application Connection Browser list.
To create an SAP_ALE_IDoc_Writer application connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. From the Select Type list, select SAP_ALE_IDoc_Writer as the application connection type for inbound IDoc sessions.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter the connection information. Table 2-15 describes the properties that you configure for an SAP_ALE_IDoc_Writer application connection:
Table 2-15. SAP_ALE_IDoc_Writer Application Connection Properties
- Name (Required). Connection name used by the Workflow Manager.
- User Name (Required). SAP user name with authorization on S_DATASET, S_TABU_DIS, S_PROGRAM, and B_BTCH_JOB objects. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Parameter Files on page 687.
- Password (Required). Password for the SAP user name.
* For more information about code pages, see the PowerCenter Administrator Guide. For more information about selecting a language code, see the PowerExchange for SAP NetWeaver User Guide.
6. Click OK. The new application connection appears in the Application Connection Browser list.
7. Click Close.
To create an SAP RFC/BAPI interface application connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. From the Select Type list, select SAP RFC/BAPI Interface.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter the connection information.
Table 2-16 describes the properties that you configure for an SAP RFC/BAPI application connection:
Table 2-16. SAP RFC/BAPI Application Connection Properties
- Name (Required). Connection name used by the Workflow Manager.
- User Name (Required). SAP user name with authorization on S_DATASET, S_TABU_DIS, S_PROGRAM, and B_BTCH_JOB objects. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Parameter Files on page 687.
- Use Parameter in Password (Required). Indicates the password for the SAP user name is a session parameter, $ParamName. If you enable this option, define the password in the workflow or session parameter file, and encrypt it using the pmpasswd CRYPT_DATA option. For more information about pmpasswd, see the Command Line Reference. For more information about parameter files, see Parameter Files on page 687. Default is disabled.
- Password (Required). Password for the SAP user name. Note: If you want to run a session on 32-bit Linux and connect to SAP 4.60, enter the password in upper case. The SAP system must also use upper case passwords.
- Connect String. Type A DEST entry in saprfc.ini.
- Code Page. Code page compatible with the SAP server. Must also correspond to the Language Code.
- Language Code. Language code that corresponds to the SAP language.
- Client Code. SAP client number.
* For more information about code pages, see the PowerCenter Administrator Guide. For more information about selecting a language code, see the PowerExchange for SAP NetWeaver User Guide.
6. Click OK. The new application connection appears in the Application Connection Browser list.
7. Click Close.
PowerExchange for Siebel Connections
To create a Siebel application connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Application. The Application Connection Browser appears.
3. Select a Siebel application type and click New. The Connection Object Definition dialog box appears.
4. Enter the connection information. Table 2-17 describes the properties that you configure for a Siebel application connection:
Table 2-17. Siebel Application Connection Properties
- Name (Required). Name you want to use for this connection.
- User Name (Required). Database user name with SELECT permission on physical database tables in the Siebel source system. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Parameter Files on page 687.
- Use Parameter in Password (Required). Indicates the password for the database user name is a session parameter, $ParamName. If you enable this option, define the password in the workflow or session parameter file, and encrypt it using the pmpasswd CRYPT_DATA option. For more information about pmpasswd, see the Command Line Reference. For more information about parameter files, see Parameter Files on page 687. Default is disabled.
- Password (Required). Password for the database user name. Must be in US-ASCII.
- Connect String (Conditional). Connect string for the underlying database of the Siebel system. For more information, see Table 2-1 on page 51. This option appears for DB2, Oracle, and Informix.
90
Database Name Server Name Domain Name Packet Size Use Trusted Connection
Optional Optional
5. Click OK. The new application connection appears in the Application Connection Browser.
6. To add more application connections, repeat steps 3 to 5.
7. Click OK to save all changes.
You can configure the following types of TIBCO application connections:
- TIB/Rendezvous. Configure to read or write messages in TIB/Rendezvous format.
- TIB/Adapter SDK. Configure to read or write messages in AE format.
Table 2-18. TIB/Rendezvous Application Connection Properties
Subject (Required).
Service (Optional).
Network (Optional).
Daemon (Optional). For more information about the TIBCO daemon, see the TIBCO documentation.
Certified (Optional). Select if you want the Integration Service to read or write certified messages. For more information about reading and writing certified messages in a session, see the PowerExchange for TIBCO User Guide.
CM Name (Optional). Unique CM name for the CM transport when you choose certified messaging. For more information about CM transports, see the TIBCO documentation.
Relay Agent (Optional). Enter a relay agent when you choose certified messaging and the machine running the Integration Service is not constantly connected to a network. The Relay Agent name must be fewer than 127 characters. For more information about relay agents, see the TIBCO documentation.
Ledger File (Optional). Enter a unique ledger file name when you want the Integration Service to read or write certified messages. The ledger file records the status of each certified message. Configure a file-based ledger when you want the TIBCO daemon to send unconfirmed certified messages to TIBCO targets. You also configure a file-based ledger with Request Old when you want the Integration Service to receive unconfirmed certified messages from TIBCO sources. For more information about file-based ledgers, see the TIBCO documentation.
Sync Ledger File (Optional). Select if you want PowerCenter to wait until it writes the status of each certified message to the ledger file before continuing message delivery or receipt.
Request Old (Optional). Select if you want the Integration Service to receive certified messages that it did not confirm with the source during a previous session run. When you select Request Old, you should also specify a file-based ledger for the Ledger File attribute. For more information about Request Old, see the TIBCO documentation.
User Certificate (Optional). Register the user certificate with a private key when you want to connect to a secure TIB/Rendezvous daemon during the session. The text of the user certificate must be in PEM encoding or PKCS #12 binary format.
Username (Optional). Enter a user name for the secure TIB/Rendezvous daemon.
Password (Optional). Enter a password for the secure TIB/Rendezvous daemon.
Note: The adapter instances you specify in TIB/Adapter SDK connections should only contain one session.
Table 2-19 describes the connection properties you configure for a TIB/Adapter SDK application connection:
Table 2-19. TIB/Adapter SDK Application Connection Properties
Name (Required). Name you want to use for this connection.
Code Page (Required). Code page the Integration Service uses to extract data from TIBCO. When using relaxed code page validation, select compatible code pages for the source and target data to prevent data inconsistencies.
Subject (Required). Default subject for source and target messages. During a workflow, the Integration Service reads messages with this subject from TIBCO sources. It also writes messages with this subject to TIBCO targets. You can overwrite the default subject for TIBCO targets when you link the SendSubject port in a TIBCO target definition in a mapping. For more information about the SendSubject field in TIBCO target definitions, see the PowerExchange for TIBCO User Guide. For more information about subject names, see the TIBCO documentation.
Application Name. Name of an adapter instance.
Repository URL. URL for the TIB/Repository instance you want to connect to. You can enter the server process variable $PMSourceFileDir for the Repository URL.
Configuration URL. URL for the adapter instance.
Session Name. Name of the TIBCO session associated with the adapter instance.
Validate Messages. Select Validate Messages when you want the Integration Service to read and write messages in AE format.
To create a TIBCO application connection:
1. In the Workflow Manager, connect to a PowerCenter repository.
2. Click Connections > Application. The Application Object Browser dialog box appears.
3. From Select Type, select TIB/Rendezvous or TIB/Adapter SDK.
4. Click New.
5. Enter a name for the application connection and verify the code page. For more information about code pages, see Database Connection Code Pages on page 52.
6. Enter the connection information.
7. Click OK. The new TIBCO application connection appears in the Application Object Browser.
Use the following guidelines to determine whether you need a Web Services application connection:
- Configure a Web Services application connection with an endpoint URL if the web service you connect to requires authentication or if you want to use an endpoint URL that differs from the one contained in the WSDL file.
- Configure a Web Services application connection without an endpoint URL if the web service you connect to requires authentication but you want to use the endpoint URL contained in the WSDL file.
- You do not need to configure a Web Services application connection if the web service you connect to does not require authentication and you want to use the endpoint URL contained in the WSDL file.
If you need to configure SSL authentication, enter values for the SSL authentication-related properties in the Web Services application connection. For more information about SSL authentication, see the PowerExchange for Web Services User Guide. For more information about the SSL authentication properties, see step 6 of the following procedure. The Repository Service saves all application connections that you define in the Workflow Manager in the PowerCenter repository.
To create a Web Services application connection:
1. In the Workflow Manager, connect to a PowerCenter repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. From Select Type, select Web Services Consumer.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter a name for the Web Services application connection.
6. Enter the connection information. Table 2-20 describes the properties that you configure for a Web Services application connection:
Table 2-20. Web Service Application Connection Properties
User Name (Required). User name that the web service requires. If the web service does not require a user name, enter PmNullUser. To define the user name in the parameter file, enter session parameter $ParamName as the user name, and define the value in the session or workflow parameter file. The Integration Service interprets user names that start with $Param as session parameters. For more information about parameter files, see Parameter Files on page 687.
Use Parameter in Password (Required). Indicates the web service password is a session parameter, $ParamName. If you enable this option, define the password in the workflow or session parameter file, and encrypt it using the pmpasswd CRYPT_DATA option. For more information about pmpasswd, see the Command Line Reference. For more information about parameter files, see Parameter Files on page 687. Default is disabled.
Password (Required). Password that the web service requires. If the web service does not require a password, enter PmNullPasswd.
Code Page (Required). Connection code page. The Repository Service uses the character set encoded in the repository code page when writing data to the repository.
End Point URL (Optional). Endpoint URL for the web service that you want to access. The WSDL file specifies this URL in the location element. You can use session parameter $ParamName, a mapping parameter, or a mapping variable as the endpoint URL. For example, you can use a session parameter, $ParamMyURL, as the endpoint URL, and set $ParamMyURL to the URL in the parameter file. For more information about defining parameters and variables in parameter files, see Where to Use Parameters and Variables on page 691.
Domain (Optional). Domain for authentication.
Timeout (Required). Number of seconds the Integration Service waits for a connection to the web service provider before it closes the connection and fails the session.
Trust Certificates File (Optional). File containing the bundle of trusted certificates that the client uses when authenticating the SSL certificate of a server. You specify the trust certificates file to have the Integration Service authenticate the web service provider. By default, the name of the trust certificates file is ca-bundle.crt. For information about adding certificates to the trust certificates file, see the PowerExchange for Web Services User Guide.
Certificate File (Optional).
Certificate File Password (Optional).
Certificate File Type (Optional).
Private Key File (Optional).
Key Password (Optional).
Key File Type (Optional).
For more information about the SSL authentication-related properties, see the PowerExchange for Web Services User Guide.
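Table 2-20 mentions defining $ParamMyURL in a parameter file. As a minimal sketch, assuming hypothetical folder, workflow, session, and endpoint URL names, such an entry might look like the following:

[MyFolder.WF:wf_WebService.ST:s_WebService]
$ParamMyURL=http://host:8080/MyService

For more information about the section headings and parameter syntax in parameter files, see Parameter Files on page 687.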
7. Click OK. The application connection appears in the Application Connection Browser.
To create a webMethods application connection:
1. In the Workflow Manager, connect to a PowerCenter repository.
2. Click Connections > Application. The Application Connection Browser dialog box appears.
3. From Select Type, select webMethods Broker.
4. Click New. The Connection Object Definition dialog box appears.
5. Enter the connection information. Table 2-21 describes the properties that you configure for a webMethods application connection:
Table 2-21. webMethods Application Connection Properties
Name (Required). Name you want to use for this connection.
Broker Host (Required). Enter the host name of the Broker you want the Integration Service to connect to. If the port number for the Broker is not the default port number, also enter the port number. Default port number is 6849. Enter the host name and port number in the following format:
<host name:port>
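For example, assuming a Broker on a hypothetical host named wmhost that listens on the default port, you would enter:
wmhost:6849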
Broker Name (Optional). Enter the name of the Broker. If you do not enter a Broker name, the Integration Service uses the default Broker.
Client ID (Optional). Enter a client ID for the Integration Service to use when it connects to the Broker during the session. If you do not enter a client ID, the Broker generates a random client ID. If you select Preserve Client State, enter a client ID.
Client Group (Required). Enter the name of the group to which the client belongs.
Application Name (Required). Enter the name of the application that will run the Broker Client. Default is Informatica PowerCenter Connect for webMethods.
6. Click OK. The application connection appears in the Application Object Browser.
To create a Message Queue queue connection:
1. In the Workflow Manager, connect to a repository.
2. Click Connections > Queue. The Queue Connection Browser appears.
3. Click New.
4. Select Message Queue as the subtype and click OK. The Connection Object Definition dialog box appears.
5. Enter the connection information. Table 2-22 describes the properties that you configure for a Message Queue queue connection:
Table 2-22. Message Queue Connection Properties
Name (Required). Name you want to use for this connection.
Code Page (Required). Code page that is the same as or a subset of the code page of the queue manager coded character set identifier (CCSID).
Queue Manager (Required). Name of the queue manager for the message queue.
Queue Name (Required). Name of the message queue.
Connection Retry Period (Required). Number of seconds the Integration Service attempts to reconnect to the WebSphere MQ queue if the connection fails. If the Integration Service cannot reconnect to the WebSphere MQ queue in the retry period, the session fails. Default is 0 and indicates an infinite retry period. For more information about resiliency for PowerExchange for WebSphere MQ, see the PowerExchange for WebSphere MQ User Guide.
6. Click OK. The queue connection appears in the Message Queue connection list.
7. To edit or delete a queue connection, select the queue connection from the list and click the appropriate button.
To test a queue connection on Windows:
1. From the command prompt of the WebSphere MQ server machine, go to the <WebSphere MQ>\bin directory.
2. Use one of the following commands to test the connection for the queue:
- amqsputc. Use if you installed the WebSphere MQ client on the Integration Service machine.
- amqsput. Use if you installed the WebSphere MQ server on the Integration Service machine.
The amqsputc and amqsput commands put a new message on the queue. If you test the connection to a queue in a production environment, terminate the command to avoid writing a message to a production queue. For example, to test the connection to the queue production, which is administered by the queue manager QM_s153664.informatica.com, enter one of the following commands:
amqsputc production QM_s153664.informatica.com
amqsput production QM_s153664.informatica.com
If the connection is valid, the command returns a connection acknowledgment. If the connection is not valid, it returns a WebSphere MQ error message.
3. If the connection is successful, press Ctrl+C at the prompt to terminate the connection and the command.
To test a queue connection on UNIX:
1. On the WebSphere MQ server system, go to the <WebSphere MQ>/samp/bin directory.
2. Use one of the following commands to test the connection for the queue:
- amqsputc. Use if you installed the WebSphere MQ client on the Integration Service machine.
- amqsput. Use if you installed the WebSphere MQ server on the Integration Service machine.
The amqsputc and amqsput commands put a new message on the queue. If you test the connection to a queue in a production environment, make sure you terminate the command to avoid writing a message to a production queue. For example, to test the connection to the queue production, which is administered by the queue manager QM_s153664.informatica.com, enter one of the following commands:
amqsputc production QM_s153664.informatica.com
amqsput production QM_s153664.informatica.com
If the connection is valid, the command returns a connection acknowledgment. If the connection is not valid, it returns a WebSphere MQ error message.
3. If the connection is successful, press Ctrl+C at the prompt to terminate the connection and the command.
Chapter 3: Working with Workflows

This chapter includes the following topics:
- Overview, 106
- Creating a Workflow, 109
- Using the Workflow Wizard, 111
- Assigning an Integration Service, 115
- Working with Links, 117
- Using the Expression Editor, 121
- Scheduling a Workflow, 123
- Validating a Workflow, 132
- Manually Starting a Workflow, 135
- Suspending the Workflow, 138
- Stopping or Aborting the Workflow, 140
- Viewing a Workflow Report, 142
Overview
A workflow is a set of instructions that tells the Integration Service how to run tasks such as sessions, email notifications, and shell commands. After you create tasks in the Task Developer and Workflow Designer, you connect the tasks with links to create a workflow. In the Workflow Designer, you can specify conditional links and use workflow variables to create branches in the workflow. The Workflow Manager also provides Event-Wait and Event-Raise tasks to control the sequence of task execution in the workflow. You can also create worklets and nest them inside the workflow. Every workflow contains a Start task, which represents the beginning of the workflow. Figure 3-1 shows a sample workflow:
Figure 3-1. Sample Workflow
The sample workflow contains a Start task, a Session task, an Assignment task, and a Command task, connected by links.
When you create a workflow, select an Integration Service to run the workflow. You can start the workflow using the Workflow Manager, Workflow Monitor, or pmcmd. Use the Workflow Monitor to see the progress of a workflow during its run. The Workflow Monitor can also show the history of a workflow. For more information about the Workflow Monitor, see Monitoring Workflows on page 569.
Use the following guidelines when you develop a workflow:
1. Create a workflow. Create a workflow in the Workflow Designer. For more information about creating a new workflow, see Creating a Workflow Manually on page 109.
2. Add tasks to the workflow. You might have already created tasks in the Task Developer. Or, you can add tasks to the workflow as you develop the workflow in the Workflow Designer. For more information about workflow tasks, see Working with Tasks on page 157.
3. Connect tasks with links. After you add tasks to the workflow, connect them with links to specify the order of execution in the workflow. For more information about links, see Working with Links on page 117.
4. Specify conditions for each link. You can specify conditions on the links to create branches and dependencies. For more information, see Working with Links on page 117.
5. Validate workflow. Validate the workflow in the Workflow Designer to identify errors. For more information about validation rules, see Validating a Workflow on page 132.
6. Save workflow. When you save the workflow, the Workflow Manager validates the workflow and updates the repository.
7. Run workflow. In the workflow properties, select an Integration Service to run the workflow. Run the workflow from the Workflow Manager, Workflow Monitor, or pmcmd. You can monitor the workflow in the Workflow Monitor. For more information about starting a workflow, see Manually Starting a Workflow on page 135.
For a complete list of workflow properties, see Workflow Properties Reference on page 859.
Creating a Workflow
A workflow must contain a Start task. The Start task represents the beginning of a workflow. When you create a workflow, the Workflow Designer creates a Start task and adds it to the workflow. You cannot delete the Start task. After you create a workflow, you can add tasks to the workflow. The Workflow Manager includes tasks such as the Session, Command, and Email tasks. Finally, you connect workflow tasks with links to specify the order of execution in the workflow. You can add conditions to links. When you edit a workflow, the Repository Service updates the workflow information when you save the workflow. If a workflow is running when you make edits, the Integration Service uses the updated information the next time you run the workflow. You can create a workflow manually or automatically, or you can use the Workflow Wizard.
To create a workflow manually:
1. Open the Workflow Designer.
2. Click Workflows > Create.
3. Enter a name for the new workflow.
4. Click OK. The Workflow Designer creates a Start task in the workflow.
To create a workflow automatically:
1. Open the Workflow Designer. Close any open workflow.
2. Click the Session button on the Tasks toolbar.
3. Click in the Workflow Designer workspace. The Mappings dialog box appears.
4. Select a mapping to associate with the session and click OK. The Create Workflow dialog box appears. The Workflow Designer names the workflow wf_SessionName by default. You can rename the workflow or change other workflow properties. For more information about workflow properties, see Workflow Properties Reference on page 859.
5. Click OK. The Workflow Designer creates a workflow for the session.
Deleting a Workflow
You may decide to delete a workflow that you no longer use. When you delete a workflow, you delete all non-reusable tasks and reusable task instances associated with the workflow. Reusable tasks used in the workflow remain in the folder when you delete the workflow. If you delete a workflow that is running, the Integration Service aborts the workflow. If you delete a workflow that is scheduled to run, the Integration Service removes the workflow from the schedule. You can delete a workflow in the Navigator window, or you can delete the workflow currently displayed in the Workflow Designer workspace:
- To delete a workflow from the Navigator window, open the folder, select the workflow, and press the Delete key.
- To delete a workflow currently displayed in the Workflow Designer workspace, click Workflows > Delete.
Using the Workflow Wizard

To build a workflow using the Workflow Wizard:
1. In the Workflow Manager, open the folder containing the mapping you want to use in the workflow.
2. Open the Workflow Designer.
3. Click Workflows > Wizard.
4. Enter a name for the workflow. The convention for naming workflows is wf_WorkflowName. For a complete list of naming conventions for repository objects, see Naming Conventions in Getting Started.
5. Enter a description for the workflow.
6. Select the Integration Service to run the workflow and click Next.
1. In the second step of the Workflow Wizard, select a valid mapping and click the right arrow button. The Workflow Wizard creates a Session task in the right pane using the selected mapping and names it s_MappingName by default.
2. You can select additional mappings to create more Session tasks in the workflow. When you add multiple mappings to the list, the Workflow Wizard creates sequential sessions in the order you add them.
3. Use the arrow buttons to change the session order.
4. Specify whether the session should be reusable. When you create a reusable session, you can use the session in other workflows. For more information about reusable sessions, see Working with Tasks on page 157.
5. Specify how you want the Integration Service to run the workflow. You can specify that the Integration Service runs sessions only if previous sessions complete, or you can specify that the Integration Service always runs each session. When you select this option, it applies to all sessions you create using the Workflow Wizard.
To schedule a workflow:
1. In the third step of the Workflow Wizard, configure the scheduling and run options. For more information about scheduling a workflow, see Scheduling a Workflow on page 123.
2. Click Next. The Workflow Wizard displays the settings for the workflow.
3. Verify the workflow settings and click Finish. To edit settings, click Back. The completed workflow opens in the Workflow Designer workspace. From the workspace, you can add tasks, create concurrent sessions, add conditions to links, or modify properties.
Assigning an Integration Service

To assign an Integration Service to a workflow:
1. In the Workflow Designer, open the workflow.
2. Click Workflows > Edit. The Edit Workflow dialog box appears.
3. Select the Integration Service that you want to run the workflow.
4. Click OK.
To assign an Integration Service to multiple workflows:
1. Close all folders in the repository.
2. Click Service > Assign Integration Service. -or- Right-click the Integration Service name in the Navigator and choose Assign to Workflows. The Assign Integration Service dialog box appears.
3. From the Choose Integration Service list, select the service you want to assign.
4. From the Show Folder list, select the folder you want to view. Or, click All to view workflows in all folders in the repository.
5. Click the Selected check box for each workflow you want the Integration Service to run.
6. Click Assign.
Working with Links

The Workflow Manager does not allow you to create a workflow that contains a loop. Figure 3-4 shows an invalid workflow with a loop:
Figure 3-4. Invalid Workflow with a Loop
Use the following procedure to link tasks in the Workflow Designer or the Worklet Designer.
To link two tasks:
1. In the workspace, click the first task you want to connect.
2. Drag it to the second task.
If you want to link multiple tasks concurrently, you may not want to connect each link manually.
To link tasks concurrently:
1. In the workspace, click the first task you want to connect.
2. Ctrl-click all other tasks you want to connect. Note: Do not use Ctrl+A or Edit > Select All to choose tasks.
3. Click Tasks > Link Concurrent. A link appears between the first task you selected and each task you added. The first task you selected links to each task concurrently.
If you have a number of tasks that you want to link sequentially, you may not wish to connect each link manually.
To link tasks sequentially:
1. In the workspace, click the first task you want to connect.
2. Ctrl-click the next task you want to connect. Continue to add tasks in the order you want them to run.
3. Click Tasks > Link Sequential. Links appear in sequential order between the first task and each subsequent task you added.
To accomplish this, you can set the link condition between the two sessions so that the s_STORES_AZ runs only if the number of failed target rows for S_STORES_CA is zero. Figure 3-5 shows how to set the link condition using the target failed rows variable for S_STORES_CA:
Figure 3-5. Setting a Link Condition
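Based on the description above, the link condition uses the predefined TgtFailedRows variable for the first session. A sketch of the condition the figure sets:

$S_STORES_CA.TgtFailedRows = 0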
After you specify the link condition in the Expression Editor, the Workflow Manager validates the link condition and displays it next to the link in the workflow. Figure 3-6 shows the link condition displayed in the workspace:
Figure 3-6. Displaying a Link Condition in the Workflow
To specify a link condition:
1. In the Workflow Designer workspace, double-click the link you want to specify. -or- Right-click the link and choose Edit. The Expression Editor appears.
2. In the Expression Editor, enter the link condition. The Expression Editor provides predefined workflow variables, user-defined workflow variables, variable functions, and boolean and arithmetic operators.
3. Validate the expression using the Validate button. The Workflow Manager displays validation results in the Output window.
Tip: Drag the end point of a link to move it from one task to another without losing the link condition.
To change the color for links, click Tools > Options > Format, and choose the Link Selection option.
To view link paths:
1. In the Worklet Designer or Workflow Designer, right-click a task and choose Highlight Path.
2. Select Forward Path, Backward Path, or Both. The Workflow Manager highlights all links in the branch you select.
To delete links:
1. In the Worklet Designer or Workflow Designer, select all links you want to delete. Tip: Use the mouse to drag the selection, or you can Ctrl-click the tasks and links.
2. Click Edit > Delete Links. The Workflow Manager removes all selected links.
Using the Expression Editor

The Expression Editor displays built-in variables, user-defined workflow variables, and predefined workflow variables such as $Session.status. For more information about workflow variables, see Working with Workflow Variables on page 143. The Expression Editor also displays the following functions:
- Transformation language functions. SQL-like functions designed to handle common expressions.
- User-defined functions. Functions you create in PowerCenter based on transformation language functions.
- Custom functions. Functions you create with the Custom Function API.
For more information about the transformation language and custom functions, see the Transformation Language Reference. For more information about user-defined functions, see the Designer Guide.
Adding Comments
You can add comments using the -- or // comment indicators in the Expression Editor. Use comments to add descriptive information about the expression, or to specify a valid URL to access business documentation about the expression.
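For example, a commented link condition might look like the following. The session name s_orders is a placeholder:

$s_orders.Status = SUCCEEDED -- run the next task only when the session succeeds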
Validating Expressions
Use the Validate button to validate an expression. If you do not validate an expression, the Workflow Manager validates it when you close the Expression Editor. You cannot run a workflow with invalid expressions. Expressions in link conditions and Decision task conditions must evaluate to a numeric value. Workflow variables used in expressions must exist in the workflow.
Scheduling a Workflow
You can schedule a workflow to run continuously, repeat at a given time or interval, or you can manually start a workflow. Each workflow has an associated scheduler. A scheduler is a repository object that contains a set of schedule settings. You can create a non-reusable scheduler for the workflow. Or, you can create a reusable scheduler to use the same set of schedule settings for workflows in the folder. You can change the schedule settings by editing the scheduler. By default, the workflow runs on demand. If you change schedule settings, the Integration Service reschedules the workflow according to the new settings. The Integration Service runs a scheduled workflow as configured. The Workflow Manager marks a workflow invalid if you delete the scheduler associated with the workflow. If you configure multiple instances of a workflow, and you schedule the workflow run time, the Integration Service runs all instances at the scheduled time. You cannot schedule workflow instances to run at different times. If you choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all workflows. This includes workflows that are scheduled to run continuously but whose start time has passed and workflows that are scheduled to run continuously but were unscheduled. You must manually reschedule workflows whose start time has passed if they are not scheduled to run continuously. If you delete a folder, the Integration Service removes workflows from the schedule when it receives notification from the Repository Service. If you copy a folder into a repository, the Integration Service reschedules all workflows in the folder when it receives the notification. The Integration Service does not run the workflow in the following situations:
- The prior workflow run fails. When a workflow fails, the Integration Service removes the workflow from the schedule, and you must manually reschedule it. You can reschedule the workflow in the Workflow Manager or using pmcmd. In the Workflow Manager Navigator window, right-click the workflow and select Schedule Workflow. For more information about the pmcmd scheduleworkflow command, see the Command Line Reference.
- You remove the workflow from the schedule. You can remove the workflow from the schedule in the Workflow Manager or using pmcmd. In the Workflow Manager Navigator window, right-click the workflow and select Unschedule Workflow. For more information about removing workflows from the schedule, see Unscheduling a Workflow on page 125. For more information about the pmcmd unscheduleworkflow command, see the Command Line Reference.
- The Integration Service is running in safe mode. In safe mode, the Integration Service does not run scheduled workflows, including workflows scheduled to run continuously or run on service initialization. When you enable the Integration Service in normal mode, the Integration Service runs the scheduled workflows. For more information about safe mode, see Creating and Configuring the Integration Service in the PowerCenter Administrator Guide.
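As a sketch of the pmcmd commands mentioned above, assuming placeholder service, domain, user, folder, and workflow names, rescheduling and unscheduling a workflow from the command line might look like the following:

pmcmd scheduleworkflow -sv MyIntService -d MyDomain -u Administrator -p password -f MyFolder wf_MyWorkflow
pmcmd unscheduleworkflow -sv MyIntService -d MyDomain -u Administrator -p password -f MyFolder wf_MyWorkflow

See the Command Line Reference for the full option list.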
Note: The Integration Service schedules the workflow in the time zone of the Integration Service machine. For example, suppose the PowerCenter Client is in the current time zone and the Integration Service is in a time zone two hours later. If you schedule the workflow to start at 9 a.m., it starts at 9 a.m. in the time zone of the Integration Service machine, which is 7 a.m. in the current time zone.
To schedule a workflow:
1. In the Workflow Designer, open the workflow.
2. Click Workflows > Edit.
3. In the Scheduler tab, choose Non-reusable if you want to create a non-reusable set of schedule settings for the workflow. Select Reusable if you want to select an existing reusable scheduler for the workflow.
Note: If you do not have a reusable scheduler in the folder, you must create one before you choose Reusable. The Workflow Manager displays a warning message if you do not have an existing reusable scheduler.
4. Click the right side of the Scheduler field to edit scheduling settings for the scheduler. For a complete list of scheduler options, see Configuring Scheduler Settings on page 126.
5. If you select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
6. Click OK.
To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.
Unscheduling a Workflow
To remove a workflow from its schedule, right-click the workflow in the Navigator and choose Unschedule Workflow. To permanently remove a workflow from a schedule, configure the workflow schedule to run on demand.
Note: When the Integration Service restarts, it reschedules all unscheduled workflows that are scheduled to run continuously. For more information about schedule options, see Configuring Scheduler Settings on page 126.
To create a reusable scheduler:
1. In the Workflow Designer, click Workflows > Schedulers.
2. Click Add to add a new scheduler.
3. In the General tab, enter a name for the scheduler.
4. Configure the scheduler settings in the Scheduler tab. For a complete list of scheduler settings, see Table 3-1 on page 127.
Non-reusable schedulers. When you configure or edit a non-reusable scheduler, check in the workflow to allow the schedule to take effect. You can update the schedule manually with the workflow checked out. Right-click the workflow in the Navigator, and select Schedule Workflow. Note that the changes are applied to the latest checked-in version of the workflow.
Reusable schedulers. When you edit settings for a reusable scheduler, the repository creates a new version of the scheduler and increments the version number by one. To update a workflow with the latest schedule, check in the scheduler after you edit it. When you configure a reusable scheduler for a new workflow, you must check in both the workflow and the scheduler to enable the schedule to take effect. Thereafter, when you check in the scheduler after revising it, the workflow schedule is updated even if it is checked out. You need to update the workflow schedule manually if you do not check in the scheduler. To update a workflow schedule manually, right-click the workflow in the Navigator and select Schedule Workflow. The new schedule is implemented for the latest version of the workflow that is checked in. Workflows that are checked out are not updated with the new schedule.
Disabling Workflows
You may want to disable the workflow while you edit it. This prevents the Integration Service from running the workflow on its schedule. Select the Disable Workflows option on the General tab of the workflow properties. The Integration Service does not run disabled workflows until you clear the Disable Workflows option. Once you clear the Disable Workflows option, the Integration Service reschedules the workflow.
Validating a Workflow
Before you can run a workflow, you must validate it. When you validate the workflow, you validate all task instances in the workflow, including nested worklets. The Workflow Manager validates the following properties:
- Expressions. Expressions in the workflow must be valid.
- Tasks. Non-reusable task and reusable task instances in the workflow must follow validation rules.
- Scheduler. If the workflow uses a reusable scheduler, the Workflow Manager verifies that the scheduler exists.
The Workflow Manager also verifies that you linked each task properly.
Note: The Workflow Manager validates Session tasks separately. If a session is invalid, the workflow may still be valid. For more information about session validation, see Validating a Session on page 231.
Expression Validation
The Workflow Manager validates all expressions in the workflow. You can enter expressions in the Assignment task, Decision task, and link conditions. The Workflow Manager writes any error message to the Output window. Expressions in link conditions and Decision task conditions must evaluate to a numerical value. Workflow variables used in expressions must exist in the workflow. The Workflow Manager marks the workflow invalid if a link condition is invalid.
Task Validation
The Workflow Manager validates each task in the workflow as you create it. When you save or validate the workflow, the Workflow Manager validates all tasks in the workflow except Session tasks. It marks the workflow invalid if it detects any invalid task in the workflow. The Workflow Manager verifies that attributes in the tasks follow validation rules. For example, the user-defined event you specify in an Event task must exist in the workflow. The Workflow Manager also verifies that you linked each task properly. For example, you must link the Start task to at least one task in the workflow. For more information about task validation rules, see Validating Tasks on page 165. When you delete a reusable task, the Workflow Manager removes the instance of the deleted task from workflows. The Workflow Manager also marks the workflow invalid when you delete a reusable task used in a workflow. The Workflow Manager verifies that there are no duplicate task names in a folder, and that there are no duplicate task instances in the workflow.
Running Validation
When you validate a workflow, you validate worklet instances, worklet objects, and all other nested worklets in the workflow. You validate task instances and worklets, regardless of whether you have edited them. The Workflow Manager validates the worklet object using the same validation rules for workflows. The Workflow Manager validates the worklet instance by verifying attributes in the Parameter tab of the worklet instance. For more information about validating worklets, see Validating Worklets on page 203. If the workflow contains nested worklets, you can select a worklet to validate the worklet and all other worklets nested under it. To validate a worklet and its nested worklets, right-click the worklet and choose Validate.
Example
For example, you have a workflow that contains a non-reusable worklet called Worklet_1. Worklet_1 contains a nested worklet called Worklet_a. The workflow also contains a reusable worklet instance called Worklet_2. Worklet_2 contains a nested worklet called Worklet_b. In the example workflow in Figure 3-10, the Workflow Manager validates links, conditions, and tasks in the workflow. The Workflow Manager validates all tasks in the workflow, including tasks in Worklet_1, Worklet_2, Worklet_a, and Worklet_b. You can validate a part of the workflow. Right-click Worklet_1 and choose Validate. The Workflow Manager validates all tasks in Worklet_1 and Worklet_a. Figure 3-10 shows the example workflow:
Figure 3-10. Example Workflow - Validation
Worklet_1: Non-reusable worklet. Contains a nested worklet called Worklet_a. Worklet_2: Reusable worklet. Contains a nested worklet called Worklet_b.
Validating Multiple Workflows

You can validate multiple workflows at a time. To validate multiple workflows, select the workflows from a query results view or a view dependencies list. When you validate multiple workflows, the validation does not include sessions, nested worklets, or reusable worklet objects in the workflows.
Note: You can also select and validate multiple workflows from the Navigator in the Repository Manager. You can save and optionally check in workflows that change from invalid to valid status. For more information about validating multiple objects, see the Repository Guide.
To validate multiple workflows:
1. Select workflows from either a query list or a view dependencies list.
2. Check out the objects you want to validate.
3. Right-click one of the selected workflows and choose Validate. The Validate Objects dialog box appears.
Running a Workflow
When you click Workflows > Start Workflow, the Integration Service runs the entire workflow. To run a workflow from pmcmd, use the startworkflow command. For more information about using pmcmd, see the Command Line Reference.
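For example, a startworkflow command might look like the following sketch. The service, domain, user, folder, and workflow names are placeholders:

pmcmd startworkflow -sv MyIntService -d MyDomain -u Administrator -p password -f MyFolder wf_MyWorkflow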
To run a workflow from the Workflow Manager:
1. Open the folder containing the workflow.
2. From the Navigator, select the workflow that you want to start.
3. Right-click the workflow in the Navigator and choose Start Workflow. The Integration Service runs the entire workflow.
After the Workflow Manager sends a request to the Integration Service, the Output window displays the Integration Service response. If an error appears, check the workflow log or session log for error messages. You can also manually start a workflow by right-clicking in the Workflow Designer workspace and choosing Start Workflow.
To run a workflow using advanced options:
1. Open the folder containing the workflow.
2. From the Navigator, select the workflow that you want to start.
3. Right-click the workflow in the Navigator and click Start Workflow Advanced.
4. Click OK.
For example, you have a workflow with multiple tasks and multiple branches, as shown in Figure 3-11. If you want to run the tasks commandtask2, e_email2, and command3, start the workflow from commandtask2. All subsequent tasks in the branch run.
Figure 3-11. Running Part of a Workflow - Example
When you start the workflow from commandtask2, the Integration Service runs this portion of the workflow.
To run a part of a workflow from pmcmd, use the startfrom flag of the startworkflow command. For more information about using pmcmd, see the Command Line Reference.
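For example, to start the workflow in Figure 3-11 from commandtask2, a command might look like the following sketch. The connection options and workflow name are placeholders:

pmcmd startworkflow -sv MyIntService -d MyDomain -u Administrator -p password -f MyFolder -startfrom commandtask2 wf_MyWorkflow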
To run a part of a workflow:
1. Connect to the folder containing the workflow.
2. In the Navigator, drill down the Workflow node to show the tasks in the workflow. -or- In the Workflow Designer workspace, select the task from which you want the Integration Service to begin running the workflow.
3. Right-click the task for which you want the Integration Service to begin running the workflow.
4. Click Start Workflow From Task.
Suspending the Workflow

When a task fails in the workflow, the Integration Service stops running tasks in the path. The Integration Service does not evaluate the output link of the failed task. If no other task is running in the workflow, the Workflow Monitor displays the status of the workflow as Suspended.

If you have the high availability option, the Integration Service suspends the workflow depending on how automatic task recovery is set. If you configure the workflow to suspend on error and do not enable automatic task recovery, the workflow suspends when a task fails. If you enable automatic task recovery, the Integration Service first attempts to restart the task up to the specified recovery limit, and then suspends the workflow if it cannot restart the failed task. For more information about automatic task recovery, see Configuring Task Recovery on page 386.

If one or more tasks are still running in the workflow when a task fails, the Integration Service stops running the failed task and continues running tasks in other paths. The Workflow Monitor displays the status of the workflow as Suspending.

When the status of the workflow is Suspended or Suspending, you can fix the error, such as a target database error, and recover the workflow in the Workflow Monitor. When you recover a workflow, the Integration Service restarts the failed tasks and continues evaluating the rest of the tasks in the workflow. The Integration Service does not run any task that already completed successfully.
Note: Editing a suspended workflow or tasks inside a suspended workflow can cause repository inconsistencies.
To suspend a workflow:
1. In the Workflow Designer, open the workflow.
2. Click Workflows > Edit.
3. In the workflow properties, select Suspend on Error.
4. Click OK.
Stopping or Aborting the Workflow

You can stop or abort a workflow in the following ways:
- Use a Control task in the workflow. For more information, see Working with the Control Task on page 174.
- Issue a stop or abort command in the Workflow Monitor. For more information, see Monitoring Workflows on page 569.
- Issue a stop or abort command in pmcmd. For more information, see the Command Line Reference.
You can also stop or abort a task within a workflow. For more information about stopping the Session task, see Stopping and Aborting a Session on page 233.
When you stop a Command task that contains multiple commands, the Integration Service finishes executing the current command and does not run the rest of the commands. The Integration Service cannot stop tasks such as the Email task. For example, if the Integration Service has already started sending an email when you issue the stop command, the Integration Service finishes sending the email before it stops running the workflow. The Integration Service aborts the workflow if the Repository Service process shuts down.
When you stop a task, the Integration Service continues processing concurrent tasks in the workflow. If the Integration Service cannot stop the task, you can abort the task. When you abort a task, the Integration Service kills the process on the task. The Integration Service continues processing concurrent tasks in the workflow when you abort a task. You can also stop or abort a worklet. The Integration Service stops and aborts a worklet similar to stopping and aborting a task. The Integration Service stops the worklet while executing concurrent tasks in the workflow. You can also stop or abort tasks within a worklet.
Viewing a Workflow Report

The Workflow Composite Report includes information about the following components in a workflow:
- Tasks. Tasks contained in the workflow.
- Link conditions. Links between objects in the workflow.
- Events. User-defined and built-in events in the workflow.
- Variables. User-defined and built-in variables in the workflow.
To view a workflow report:
1. In the Workflow Manager, open a workflow.
2. Right-click in the workspace and choose View Workflow Report.
The Workflow Manager launches Data Analyzer in the default browser for the client machine and runs the Workflow Composite Report. For more information about using Data Analyzer, see the Data Analyzer User Guide.
Chapter 4: Working with Workflow Variables

This chapter includes the following topics:
- Overview, 144
- Predefined Workflow Variables, 146
- User-Defined Workflow Variables, 152
Overview
You can create and use variables in a workflow to reference values and record information. For example, use a variable in a Decision task to determine whether the previous task ran properly. If it did, you can run the next task. If not, you can stop the workflow. Use the following types of workflow variables:
- Predefined workflow variables. The Workflow Manager provides predefined workflow variables for tasks within a workflow. For more information, see Predefined Workflow Variables on page 146.
- User-defined workflow variables. You create user-defined workflow variables when you create a workflow. For more information, see User-Defined Workflow Variables on page 152.

Use workflow variables when you configure the following types of tasks:
- Assignment tasks. Use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1. For more information about using workflow variables in Assignment tasks, see Working with the Assignment Task on page 166.
- Decision tasks. Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully. For more information about using workflow variables in Decision tasks, see Working with the Decision Task on page 176.
- Links. Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false. For more information about using workflow variables in links, see Working with Links on page 117.
- Timer tasks. Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task. For more information about using workflow variables in Timer tasks, see Working with the Timer Task on page 189.
When you build an expression, you can select predefined variables on the Predefined tab. You can select user-defined variables on the User-Defined tab. The Functions tab contains functions that you use with workflow variables. Use the point-and-click method to enter an expression using a variable. For more information about using the Expression Editor, see Using the Expression Editor on page 121. Use the following keywords to write expressions for user-defined and predefined workflow variables: AND, OR, NOT, TRUE, FALSE, NULL, and SYSDATE.
Predefined Workflow Variables

The Workflow Manager provides the following types of predefined variables:
- Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
- Built-in variables. Use built-in variables in a workflow to return run-time or system information such as folder name, Integration Service name, system date, or workflow start time. For more information about built-in variables, see the Transformation Language Reference. The Workflow Manager lists built-in variables under the Built-in node in the Expression Editor.
Tip: When you set the error severity level for log files to Tracing in the PowerCenter Server setup, the workflow log displays the values of workflow variables. Use this logging level for troubleshooting only.
Table 4-1 lists the task-specific workflow variables available in the Workflow Manager:
Table 4-1. Task-Specific Workflow Variables
Each entry lists the variable, the task types it applies to, and its datatype.
- Condition (Decision; Integer). Evaluation result of decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null. Sample syntax: $Dec_TaskStatus.Condition = <TRUE | FALSE | NULL | any integer>
- EndTime (All tasks; Date/time). Date and time the associated task ended. Sample syntax: $s_item_summary.EndTime > TO_DATE('11/10/2004 08:13:25')
- ErrorCode (All tasks; Integer). Last error code for the associated task. If there is no error, the Integration Service sets ErrorCode to 0 when the task completes. Sample syntax: $s_item_summary.ErrorCode = 24013 Note: You might use this variable when a task consistently fails with this final error message.
- ErrorMsg (All tasks; Nstring*). Last error message for the associated task. If there is no error, the Integration Service sets ErrorMsg to an empty string when the task completes. Sample syntax: $s_item_summary.ErrorMsg = 'PETL_24013 Session run completed with failure' Note: You might use this variable when a task consistently fails with this final error message.
- FirstErrorMsg (Session; Nstring*). First error message for the associated session.
- PrevTaskStatus (All tasks; Integer). Status of the previous task in the workflow that the Integration Service ran.
- SrcFailedRows (Session; Integer). Total number of rows the Integration Service failed to read from the source.
- SrcSuccessRows (Session; Integer). Total number of rows the Integration Service successfully read from the source.
- StartTime (All tasks; Date/time). Date and time the associated task started.
- TgtFailedRows (Session; Integer). Total number of rows the Integration Service failed to write to the target.
- TgtSuccessRows (Session; Integer). Total number of rows the Integration Service successfully wrote to the target.
- TotalTransErrors (Session; Integer). Total number of transformation errors in the session.
All predefined workflow variables except Status have a default value of null. The Integration Service uses the default value of null when it encounters a predefined variable from a task that has not yet run in the workflow. Therefore, expressions and link conditions that depend upon tasks not yet run are valid. The default value of Status is NOTSTARTED.
The Expression Editor displays the predefined workflow variables on the Predefined tab. The Workflow Manager groups task-specific variables by task and lists built-in variables under the Built-in node. To use a variable in an expression, double-click the variable. The Expression Editor displays task-specific variables in the Expression field in the following format:
$<TaskName>.<predefinedVariable>
Figure 4-2 shows the Expression Editor with an expression using a task-specific workflow variable and keyword:
Figure 4-2. Expression Using a Predefined Workflow Variable
The figure shows a Decision task, FileExists, that branches to an Email task and a Command task. One link uses the following condition:
$FileExists.Condition = TRUE
The other link uses the following condition:
$FileExists.Condition = FALSE
In each case, the Integration Service returns a value based on the decision task, FileExists.
When you run the workflow, the Integration Service evaluates the link condition and returns the value based on the decision condition expression of the FileExists task. The Integration Service triggers either the email task or the command task depending on the FileExists task outcome.
Link condition:
$Session2.Status = SUCCEEDED
The Integration Service returns a value based on the previous task in the workflow, Session2.
When you run the workflow, the Integration Service evaluates the link condition and returns the value based on the status of Session2.
Link condition:
$Session2.PrevTaskStatus = SUCCEEDED
The Integration Service returns a value based on the previous task run, Session1.
When you run the workflow, the Integration Service skips Session2 because the session is disabled. When the Integration Service evaluates the link condition, it returns the value based on the status of Session1.
Tip: If you do not disable Session2, the Integration Service returns the value based on the status of Session2. You do not need to change the link condition when you enable and disable Session2.
User-Defined Workflow Variables

Use a user-defined variable to determine when to run the session that updates the orders database at headquarters. To configure user-defined workflow variables, set up the workflow as follows:
1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.
2. Add a Start task and both sessions to the workflow.
3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check to see if the number of workflow runs is evenly divisible by 10. Use the modulus (MOD) function to do this.
4. Create an Assignment task to increment the $$WorkflowCount variable by one.
5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false.
When you configure workflow variables using conditions, the session that updates the local database runs every time the workflow runs. The session that updates the database at headquarters runs every 10th time the workflow runs.
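Based on the steps above, a sketch of the two expressions this setup uses:

Decision condition: MOD($$WorkflowCount, 10) = 0
Assignment expression: $$WorkflowCount = $$WorkflowCount + 1

The decision condition evaluates to true on every 10th run, which triggers the session that updates the headquarters database; on all other runs, the Assignment task increments the counter.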
Start and Current Values

The start value is the value of the variable at the start of the workflow. The start value could be a value defined in the parameter file for the variable, a value saved in the repository from the previous run of the workflow, a user-defined initial value for the variable, or the default value based on the variable datatype. The Integration Service looks for the start value of a variable in the following order:
1. Value in parameter file
2. Value saved in the repository (if the variable is persistent)
3. User-specified default value
4. Datatype default value
For a list of datatype default values, see Table 4-2 on page 154. For example, you create a workflow variable in a workflow and enter a default value, but you do not define a value for the variable in a parameter file. The first time the Integration Service runs the workflow, it evaluates the start value of the variable to the user-defined default value. If you declare the variable as persistent, the Integration Service saves the value of the variable to the repository at the end of the workflow run. The next time the workflow runs, the Integration Service evaluates the start value of the variable as the value saved in the repository. If the variable is non-persistent, the Integration Service does not save the value of the variable. The next time the workflow runs, the Integration Service evaluates the start value of the variable as the user-specified default value. If you want to override the value saved in the repository before running a workflow, you need to define a value for the variable in a parameter file. When you define a workflow variable in the parameter file, the Integration Service uses this value instead of the value saved in the repository or the configured initial value for the variable. The current value is the value of the variable as the workflow progresses. When a workflow starts, the current value of a variable is the same as the start value. The value of the variable can change as the workflow progresses if you create an Assignment task that updates the value of the variable. If the variable is persistent, the Integration Service saves the current value of the variable to the repository at the end of a successful workflow run. If the workflow fails to complete, the Integration Service does not update the value of the variable in the repository. The Integration Service states the value saved to the repository for each workflow variable in the workflow log.
Variables of type Nstring can have a maximum length of 600 characters. Variables of type Date/Time use the date time format in the session configuration object. Variables of type Date/Time can have the following formats:
- MM/DD/RR
- MM/DD/YYYY
- MM/DD/RR HH24:MI
- MM/DD/YYYY HH24:MI
- MM/DD/RR HH24:MI:SS
- MM/DD/YYYY HH24:MI:SS
- MM/DD/RR HH24:MI:SS.NS
- MM/DD/YYYY HH24:MI:SS.NS
Note: You can use the following separators: backslash, colon, period, and spaces.
To create a workflow variable:
1. In the Workflow Designer, create a new workflow or edit an existing one.
2. Select the Variables tab.
3. Click Add and enter a name for the variable. The correct format for a user-defined workflow variable is $$VariableName. Do not use a single $ for a user-defined workflow variable. The single $ is reserved for predefined workflow variables. Workflow variable names are not case sensitive.
4. In the Datatype field, select the datatype for the new variable. You can select from the following datatypes: Date/Time, Double, Integer, and Nstring.
5. Enable the Persistent option if you want the value of the variable retained from one execution of the workflow to the next. For more information, see Start and Current Values on page 153.
6. Enter the default value for the variable in the Default field. If the default is a null value, enable the Is Null option.
7. To validate the default value of the new workflow variable, click the Validate button.
8. Click Apply to save the new workflow variable.
9. Click OK to close the workflow properties.
Chapter 5: Working with Tasks
This chapter includes the following topics:
Overview, 158
Creating a Task, 159
Configuring Tasks, 161
Validating Tasks, 165
Working with the Assignment Task, 166
Working with the Command Task, 169
Working with the Control Task, 174
Working with the Decision Task, 176
Working with Event Tasks, 180
Working with the Timer Task, 189
Overview
The Workflow Manager contains many types of tasks to help you build workflows and worklets. You can create reusable tasks in the Task Developer. Or, create and add tasks in the Workflow or Worklet Designer as you develop the workflow. Table 5-1 summarizes the workflow tasks available in the Workflow Manager:
Table 5-1. Workflow Tasks
Assignment
Tool: Workflow Designer, Worklet Designer. Reusable: No.
Assigns a value to a workflow variable. For more information, see Working with the Assignment Task on page 166.

Command
Tool: Task Developer, Workflow Designer, Worklet Designer. Reusable: Yes.
Specifies shell commands to run during the workflow. You can choose to run the Command task if the previous task in the workflow completes. For more information, see Working with the Command Task on page 169.

Control
Tool: Workflow Designer, Worklet Designer. Reusable: No.
Stops or aborts the workflow. For more information, see Working with the Control Task on page 174.

Decision
Tool: Workflow Designer, Worklet Designer. Reusable: No.
Specifies a condition to evaluate in the workflow. Use the Decision task to create branches in a workflow. For more information, see Working with the Decision Task on page 176.

Email
Tool: Task Developer, Workflow Designer, Worklet Designer. Reusable: Yes.
Sends email during the workflow. For more information, see Sending Email on page 401.

Event-Raise
Tool: Workflow Designer, Worklet Designer. Reusable: No.
Represents the location of a user-defined event. The Event-Raise task triggers the user-defined event when the Integration Service runs the Event-Raise task. For more information, see Working with Event Tasks on page 180.

Event-Wait
Tool: Workflow Designer, Worklet Designer. Reusable: No.
Waits for a user-defined or a predefined event to occur. Once the event occurs, the Integration Service completes the rest of the workflow. For more information, see Working with Event Tasks on page 180.

Session
Tool: Task Developer, Workflow Designer, Worklet Designer. Reusable: Yes.
Set of instructions to run a mapping. For more information, see Working with Sessions on page 205.

Timer
Tool: Workflow Designer, Worklet Designer. Reusable: No.
Waits for a specified period of time to run the next task. For more information, see Working with the Timer Task on page 189.
The Workflow Manager validates task attributes and links. If a task is invalid, the workflow becomes invalid. Workflows containing invalid sessions may still be valid. For more information about validating tasks, see Validating Tasks on page 165.
Creating a Task
You can create tasks in the Task Developer, or you can create them in the Workflow Designer or the Worklet Designer as you develop the workflow or worklet. Tasks you create in the Task Developer are reusable. Tasks you create in the Workflow Designer and Worklet Designer are non-reusable by default. For more information about reusable tasks, see Reusable Workflow Tasks on page 161.
To create a task in the Task Developer:
1. In the Task Developer, click Tasks > Create.
2. Select the task type you want to create: Command, Session, or Email.
3. Enter a name for the task.
4. For Session tasks, select the mapping you want to associate with the session.
5. Click Create. The Task Developer creates the workflow task.
6. Click Done.
To create a task in the Workflow Designer or Worklet Designer:
1. In the Workflow Designer or Worklet Designer, open a workflow or worklet.
2. Click Tasks > Create.
3. Select the type of task you want to create.
4. Enter a name for the task.
5. Click Create. The Workflow Designer or Worklet Designer creates the task and adds it to the workspace.
6. Click Done.
You can also use the Tasks toolbar to create and add tasks to the workflow. Click the button on the Tasks toolbar for the task you want to create. Click again in the Workflow Designer or Worklet Designer workspace to create and add the task. The Workflow Designer or Worklet Designer creates the task with a default task name when you use the Tasks toolbar.
Configuring Tasks
After you create the task, you can configure general task options on the General tab. For each task instance in the workflow, you can configure how the Integration Service runs the task and the other objects associated with the selected task. You can also disable the task so you can run the rest of the workflow without the selected task. Figure 5-1 displays the General tab in the Edit Tasks dialog box:
Figure 5-1. General Tab - Edit Tasks Dialog Box
When you use a task in the workflow, you can edit the task in the Workflow Designer and configure the following task options in the General tab:
- Fail parent if this task fails. Choose to fail the workflow or worklet containing the task if the task fails.
- Fail parent if this task does not run. Choose to fail the workflow or worklet containing the task if the task does not run.
- Disable this task. Choose to disable the task so you can run the rest of the workflow without the task.
- Treat input link as AND or OR. Choose to have the Integration Service run the task when all or one of the input link conditions evaluates to True.
Reusable Workflow Tasks

You can create any task as non-reusable or reusable. Tasks you create in the Task Developer are reusable. Tasks you create in the Workflow Designer are non-reusable by default. However, you can edit the general properties of a task to promote it to a reusable task. The Workflow Manager stores each reusable task separately from the workflows that use the task. You can view a list of reusable tasks in the Tasks node in the Navigator window. You can see a list of all reusable Session tasks in the Sessions node in the Navigator window.
To promote a non-reusable task to a reusable task:
1. In the Workflow Designer, double-click the task you want to make reusable.
2. In the General tab of the Edit Task dialog box, select the Make Reusable option.
3. When prompted whether you are sure you want to promote the task, click Yes.
4. Click OK to return to the workflow.
5. Click Repository > Save.
The newly promoted task appears in the list of reusable tasks in the Tasks node in the Navigator window.
Figure 5-2 displays the Revert button in the Mapping tab of a Session task:
Figure 5-2. Revert Button in Session Properties
Disabling Tasks
In the Workflow Designer, you can disable a workflow task so that the Integration Service runs the workflow without the disabled task. The status of a disabled task is DISABLED. Disable a task in the workflow by selecting the Disable This Task option in the Edit Tasks dialog box.
Failing Parent Workflow or Worklet

To fail the parent workflow or worklet if the task fails, double-click the task and select the Fail Parent If This Task Fails option in the General tab. When you select this option and a task fails, it does not prevent the other tasks in the workflow or worklet from running. Instead, the Integration Service marks the status of the workflow or worklet as failed. If you have a session nested within multiple worklets, you must select the Fail Parent If This Task Fails option for each worklet instance to see the failure at the workflow level.

To fail the parent workflow or worklet if the task does not run, double-click the task and select the Fail Parent If This Task Does Not Run option in the General tab. When you choose this option, the Integration Service fails the parent workflow if a task does not run.
Note: The Integration Service does not fail the parent workflow if you disable a task.
Validating Tasks
You can validate reusable tasks in the Task Developer. Or, you can validate task instances in the Workflow Designer. When you validate a task, the Workflow Manager validates task attributes and links. For example, the user-defined event you specify in an Event task must exist in the workflow. The Workflow Manager uses the following rules to validate tasks:
- Assignment. The Workflow Manager validates the expression you enter for the Assignment task. For example, the Workflow Manager verifies that you assigned a matching datatype value to the workflow variable in the assignment expression.
- Command. The Workflow Manager does not validate the shell command you enter for the Command task.
- Event-Wait. If you choose to wait for a predefined event, the Workflow Manager verifies that you specified a file to watch. If you choose to use the Event-Wait task to wait for a user-defined event, the Workflow Manager verifies that you specified an event.
- Event-Raise. The Workflow Manager verifies that you specified a user-defined event for the Event-Raise task.
- Timer. The Workflow Manager verifies that the variable you specified for the Absolute Time setting has the Date/Time datatype.
- Start. The Workflow Manager verifies that you linked the Start task to at least one task in the workflow.
When a task instance is invalid, the workflow that uses the task instance becomes invalid. When a reusable task is invalid, it does not affect the validity of the task instances used in the workflow. However, if a Session task instance is invalid, the workflow may still be valid. The Workflow Manager validates sessions differently. For more information, see Validating a Session on page 231.

To validate a task, select the task in the workspace and click Tasks > Validate. Or, right-click the task in the workspace and choose Validate.
Working with the Assignment Task

To create an Assignment task:
1. In the Workflow Designer, click the Assignment icon on the Tasks toolbar.
-or-
Click Tasks > Create. Select Assignment Task for the task type.
2. Enter a name for the Assignment task. Click Create. Then click Done. The Workflow Designer creates and adds the Assignment task to the workflow.
3. Double-click the Assignment task to open the Edit Task dialog box.
4. On the Expressions tab, click Add to add an assignment.
5. Select the variable for which you want to assign a value.
6. Click OK.
7. Click the Edit button in the Expression field to open the Expression Editor. The Expression Editor shows predefined workflow variables, user-defined workflow variables, variable functions, and boolean and arithmetic operators.
8. Enter the value or expression you want to assign. For example, if you want to assign the value 500 to the user-defined variable $$custno1, enter the number 500 in the Expression Editor.
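You can also assign the result of an expression rather than a literal value. For example, assuming a user-defined workflow variable named $$RunCount, the following expression increments it by one each time the Assignment task runs:

$$RunCount + 1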
9. Click Validate. Validate the expression before you close the Expression Editor.
10. Repeat steps 5 to 7 to add more variable assignments. Use the up and down arrows in the Expressions tab to change the order of the variable assignments.
11. Click OK.
Working with the Command Task

You can use the Command task to run shell commands during the workflow. Use the Command task in the following ways:
- Standalone Command task. Use a Command task anywhere in the workflow or worklet to run shell commands.
- Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task. For more information about specifying pre-session and post-session shell commands, see Using Pre- and Post-Session Shell Commands on page 225.
Use any valid UNIX command or shell script for UNIX servers, or any valid DOS command or batch file for Windows servers. For example, you might use a shell command to copy a file from one directory to another. For a Windows server, you would use the following shell command to copy the SALES_ADJ file from the source directory, L, to the target, H:
copy L:\sales\sales_adj H:\marketing\
For a UNIX server, you would use the following command to perform a similar operation:
cp sales/sales_adj marketing/
Each shell command runs in the same environment as the Integration Service. Environment settings in one shell command script do not carry over to other scripts. To run all shell commands in the same environment, call a single shell script that invokes other scripts.
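For example, a minimal wrapper script, with hypothetical script names and paths, might look like the following. Because the Integration Service runs the wrapper as a single command, every script it calls inherits the same environment settings:

#!/bin/sh
# Hypothetical wrapper: run all steps in one shell so they share environment settings.
STAGE_DIR=/data/stage
export STAGE_DIR
/scripts/extract_files.sh    # each script below inherits STAGE_DIR
/scripts/compress_files.sh
/scripts/archive_files.sh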
Using Parameters and Variables

You can use parameters and variables in standalone Command tasks and pre- and post-session shell commands:
- Standalone Command tasks. You can use service, service process, workflow, and worklet variables in standalone Command tasks. You cannot use session parameters, mapping parameters, or mapping variables in standalone Command tasks. The Integration Service does not expand these types of parameters and variables in standalone Command tasks.
- Pre- and post-session shell commands. You can use any parameter or variable type that you can define in the parameter file.
For more information about using parameters and variables in commands, see Where to Use Parameters and Variables on page 691.
Assigning Resources
You can assign resources to Command task instances in the Worklet or Workflow Designer. You might want to assign resources to a Command task if you assign the workflow to an Integration Service associated with a grid. When you assign a resource to a Command task and the Integration Service is configured to check resources, the Load Balancer dispatches the task to a node that has the resource available. A task fails if the Load Balancer cannot find a node where the required resource is available. For information about assigning resources to Command and Session tasks, see Assigning Resources to Tasks on page 642. For information about configuring the Integration Service to check resources, see Creating and Configuring the Integration Service in the PowerCenter Administrator Guide.
To create a Command task:
1. In the Workflow Designer or the Task Developer, click the Command Task icon on the Tasks toolbar.
-or-
Click Tasks > Create. Select Command Task for the task type.
2. Enter a name for the Command task. Click Create. Then click Done.
3. Double-click the Command task in the workspace to open the Edit Tasks dialog box.
4. In the Commands tab, click the Add button to add a command.
5. In the Name field, enter a name for the new command.
6. In the Command field, click the Edit button to open the Command Editor.
7. Enter the command you want to run. Enter one command in the Command Editor. You can use service, service process, workflow, and worklet variables in the command.
8. Click OK to close the Command Editor.
9. Repeat steps 4 to 8 to add more commands to the task.
10. Optionally, click the General tab in the Edit Tasks dialog box to assign resources to the Command task. For more information about assigning resources to Command and Session tasks, see Assigning Resources to Tasks on page 642.
11. Click OK.
If you specify non-reusable shell commands for a session, you can promote the non-reusable shell commands to a reusable Command task. For more information, see Creating a Reusable Command Task from Pre- or Post-Session Commands on page 228.
Figure 5-3 shows the Fail Task if Any Command Fails option:
Figure 5-3. Fail Task if Any Command Fails Option
Set the task to fail if any command in the task fails. Set the command recovery strategy to restart the task or fail the task and continue the workflow.
You can choose a recovery strategy for the task. The recovery strategy determines how the Integration Service recovers the task when you configure workflow recovery and the task fails. You can configure the task to restart or you can configure the task to fail and continue running the workflow. For more information about recovering tasks, see Configuring Task Recovery on page 386.
Working with the Control Task

Use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition.

To create a Control task:
1. In the Workflow Designer, click the Control Task icon on the Tasks toolbar.
-or-
Click Tasks > Create. Select Control Task for the task type.
2. Enter a name for the Control task. Click Create. Then click Done. The Workflow Manager creates and adds the Control task to the workflow.
3. Double-click the Control task in the workspace to open it.
4. Configure the control option on the Properties tab. You can choose one of the following control options:
- Fail Parent
- Stop Parent
- Abort Parent
- Fail Top-Level Workflow
- Stop Top-Level Workflow
- Abort Top-Level Workflow
Working with the Decision Task

Example
For example, you have a Command task that depends on the status of the three sessions in the workflow. You want the Integration Service to run the Command task when any of the three sessions fails. To accomplish this, use a Decision task with the following decision condition:
$Q1_session.status = FAILED OR $Q2_session.status = FAILED OR $Q3_session.status = FAILED
You can then use the predefined condition variable in the input link condition of the Command task. Configure the input link with the following link condition:
$Decision.condition = True
You can configure the same logic in the workflow without the Decision task. Without the Decision task, you need to use three link conditions and treat the input links to the Command task as OR links. Figure 5-5 shows the sample workflow without the Decision task:
Figure 5-5. Sample Workflow Without a Decision Task
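In this arrangement, each of the three links into the Command task carries its own condition, and you set the Command task to treat its input links as OR links:

$Q1_session.status = FAILED
$Q2_session.status = FAILED
$Q3_session.status = FAILED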
You can further expand the sample workflow in Figure 5-4. In Figure 5-4, the Integration Service runs the Command task if any of the three Session tasks fails. Suppose now you want the Integration Service to also run an Email task if all three Session tasks succeed.
To do this, add an Email task and use the decision condition variable in the link condition. Figure 5-6 shows the expanded sample workflow using a Decision task:
Figure 5-6. Expanded Sample Workflow Using a Decision Task
$Decision.condition = True
$Decision.condition = False
To create a Decision task:
1. In the Workflow Designer, click the Decision Task icon on the Tasks toolbar.
-or-
Click Tasks > Create. Select Decision Task for the task type.
2. Enter a name for the Decision task. Click Create. Then click Done. The Workflow Designer creates and adds the Decision task to the workspace.
3. Double-click the Decision task to open it.
4. Click the Open button in the Value field to open the Expression Editor.
5. In the Expression Editor, enter the condition you want the Integration Service to evaluate. Validate the expression before you close the Expression Editor.
6. Click OK.
Working with Event Tasks

You can define events in the workflow with the Event-Raise and Event-Wait tasks:
- Event-Raise task. The Event-Raise task represents a user-defined event. When the Integration Service runs the Event-Raise task, the Event-Raise task triggers the event. Use the Event-Raise task with the Event-Wait task to define events.
- Event-Wait task. The Event-Wait task waits for an event to occur. Once the event triggers, the Integration Service continues executing the rest of the workflow.
To coordinate the execution of the workflow, you may specify the following types of events for the Event-Wait and Event-Raise tasks:
- Predefined event. A predefined event is a file-watch event. For predefined events, use an Event-Wait task to instruct the Integration Service to wait for the specified indicator file to appear before continuing with the rest of the workflow. When the Integration Service locates the indicator file, it starts the next task in the workflow.
- User-defined event. A user-defined event is a sequence of tasks in the workflow. Use an Event-Raise task to specify the location of the user-defined event in the workflow. A user-defined event is the sequence of tasks in the branch from the Start task leading to the Event-Raise task. When all the tasks in the branch from the Start task to the Event-Raise task complete, the Event-Raise task triggers the event. The Event-Wait task waits for the Event-Raise task to trigger the event before continuing with the rest of the tasks in its branch.
Complete the following steps to configure the workflow shown in Figure 5-7:
1. Link Q1_session and Q2_session concurrently.
2. Add Q3_session after Q1_session.
3. Declare an event called Q1Q3_Complete in the Events tab of the workflow properties.
4. In the workspace, add an Event-Raise task after Q3_session.
5. Specify the Q1Q3_Complete event in the Event-Raise task properties. This allows the Event-Raise task to trigger the event when Q1_session and Q3_session complete.
6. Add an Event-Wait task after Q2_session.
7. Specify the Q1Q3_Complete event for the Event-Wait task.
8. Add Q4_session after the Event-Wait task. When the Integration Service processes the Event-Wait task, it waits until the Event-Raise task triggers Q1Q3_Complete before it runs Q4_session.

The Integration Service runs the workflow shown in Figure 5-7 in the following order:
1. The Integration Service runs Q1_session and Q2_session concurrently.
2. When Q1_session completes, the Integration Service runs Q3_session.
3. The Integration Service finishes executing Q2_session.
4. The Event-Wait task waits for the Event-Raise task to trigger the event.
5. The Integration Service completes Q3_session.
6. The Event-Raise task triggers the event, Q1Q3_Complete.
7. The Integration Service runs Q4_session because the event Q1Q3_Complete has been triggered.
8. The Integration Service runs the Email task.
To declare a user-defined event:
1. In the Workflow Designer, click Workflow > Edit to open the workflow properties.
2. Select the Events tab in the Edit Workflow dialog box.
3. Click Add to add an event name. Event names are not case sensitive.
4. Click OK.
To use the Event-Raise task:
1. In the Workflow Designer workspace, create an Event-Raise task and place it in the workflow to represent the user-defined event you want to trigger. A user-defined event is the sequence of tasks in the branch from the Start task to the Event-Raise task.
2. Double-click the Event-Raise task to open it.
3. Click the Open button in the Value field on the Properties tab to open the Events Browser for user-defined events.
4. Choose an event in the Events Browser.
5. Click OK twice to return to the workspace.
When you use the Event-Wait task to wait for a predefined event, you specify an indicator file name and location, and the Integration Service waits for the indicator file to appear. Once the indicator file appears, the Integration Service continues running tasks after the Event-Wait task. You can assign resources to Event-Wait tasks that wait for predefined events. You may want to assign a resource to a predefined Event-Wait task if you are running on a grid and the indicator file appears on a specific node or in a specific directory. When you assign a resource to a predefined Event-Wait task and the Integration Service is configured to check resources, the Load Balancer distributes the task to a node where the required resource is available. For more information about assigning resources to tasks, see Assigning Resources to Tasks on page 642. For more information about configuring the Integration Service to check resources, see Creating and Configuring the Integration Service in the PowerCenter Administrator Guide.
Note: If you use the Event-Raise task to trigger the event when you wait for a predefined event, you may not be able to successfully recover the workflow.

You can also use the Event-Wait task to wait for a user-defined event. To use the Event-Wait task for a user-defined event, specify the name of the user-defined event in the Event-Wait task properties. The Integration Service waits for the Event-Raise task to trigger the user-defined event. Once the user-defined event is triggered, the Integration Service continues running tasks after the Event-Wait task.
To wait for a user-defined event:
1. In the workflow, create an Event-Wait task and double-click the Event-Wait task to open it.
2. In the Events tab of the task, select User-Defined.
3. Click the Event button to open the Events Browser dialog box.
4. Select the user-defined event you want the Integration Service to wait for.
5. Click OK twice.
On Windows, the Integration Service looks for the file in the system directory. For example, on Windows 2000, the system directory is c:\winnt\system32. On UNIX, the Integration Service looks for the indicator file in the current working directory for the Integration Service process. On UNIX this directory is /server/bin.
You can enter the actual name of the file or use process variables to specify the location of the file. You can also use user-defined workflow and worklet variables to specify the file name and location. For example, create a workflow variable, $$MyFileWatchFile, for the indicator file name and location, and set $$MyFileWatchFile to the file name and location in the parameter file. For more information about creating workflow and worklet variables, see User-Defined Workflow Variables on page 152. For more information about defining parameters and variables in parameter files, see Where to Use Parameters and Variables on page 691. The Integration Service writes the time the file appears in the workflow log.
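For example, a parameter file entry such as the following (the folder, workflow, and path names here are hypothetical) sets the indicator file name and location:

[Sales.WF:wf_NightlyLoad]
$$MyFileWatchFile=/data/indicators/extract_done.ind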
Note: Do not use a source or target file name as the indicator file name because you may accidentally delete a source or target file. Or, the Integration Service may try to delete the file before the session finishes writing to the target.
To wait for a predefined event:
1. Create an Event-Wait task and double-click the Event-Wait task to open it.
2. In the Events tab of the task, select Predefined.
3. Enter the path of the indicator file.
4. If you want the Integration Service to delete the indicator file after it detects the file, select the Delete Filewatch File option in the Properties tab.
5. Click OK.
Working with the Timer Task

You can use the Timer task to specify when the Integration Service runs the next task in the workflow. The Timer task has the following types of settings:
- Absolute time. You specify the time that the Integration Service starts running the next task in the workflow. You may specify the date and time, or you can choose a user-defined workflow variable to specify the time.
- Relative time. You instruct the Integration Service to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.
For example, a workflow contains two sessions. You want the Integration Service to wait 10 minutes after the first session completes before it runs the second session. Use a Timer task after the first session. In the Relative Time setting of the Timer task, specify 10 minutes from the start time of the Timer task. Figure 5-8 shows the sample workflow using the Timer task:
Figure 5-8. Sample Workflow Using the Timer Task
Use a Timer task anywhere in the workflow after the Start task.
To create a Timer task:
1. In the Workflow Designer, click the Timer task icon on the Tasks toolbar.
-or-
Click Tasks > Create. Select Timer Task for the task type.
2. Double-click the Timer task to open it.
3. On the General tab, enter a name for the Timer task.
4. Click the Timer tab to specify when the Integration Service starts the next task in the workflow.
5. Specify attributes for Absolute Time or Relative Time as described in Table 5-2:
Table 5-2. Timer Task Attributes
- Absolute Time: Specify the exact time to start. The Integration Service starts the next task in the workflow at the date and time you specify.
- Absolute Time: Use this workflow date-time variable to calculate the wait. Specify a user-defined date-time workflow variable. The Integration Service starts the next task in the workflow at the time you choose. The Workflow Manager verifies that the variable you specify has the Date/Time datatype. The Timer task fails if the date-time workflow variable evaluates to NULL.
- Relative time: Start after. Specify the period of time the Integration Service waits to start executing the next task in the workflow.
- Relative time: from the start time of this task. Select this option to wait a specified period of time after the start time of the Timer task to run the next task.
- Relative time: from the start time of the parent workflow/worklet. Select this option to wait a specified period of time after the start time of the parent workflow or worklet to run the next task.
- Relative time: from the start time of the top-level workflow. Choose this option to wait a specified period of time after the start time of the top-level workflow to run the next task.
Chapter 6: Working with Worklets
This chapter includes the following topics:
Overview, 192
Developing a Worklet, 193
Using Worklet Variables, 197
Assigning Variable Values in a Worklet, 199
Validating Worklets, 203
Overview
A worklet is an object that represents a set of tasks. Create a worklet when you want to reuse a set of workflow logic in multiple workflows. You can create a worklet in the Worklet Designer. To run a worklet, include the worklet in a workflow. The workflow that contains the worklet is called the parent workflow. When the Integration Service runs a worklet, it expands the worklet to run tasks and evaluate links within the worklet. It writes information about worklet execution in the workflow log.
Suspending Worklets
When you choose Suspend on Error for the parent workflow, the Integration Service also suspends the worklet if a task in the worklet fails. When a task in the worklet fails, the Integration Service stops executing the failed task and other tasks in its path. If no other task is running in the worklet, the worklet status is Suspended. If one or more tasks are still running in the worklet, the worklet status is Suspending. The Integration Service suspends the parent workflow when the status of the worklet is Suspended or Suspending. For more information about suspending workflows, see Suspending the Workflow on page 138.
Developing a Worklet
To develop a worklet, you must first create a worklet. After you create a worklet, configure worklet properties and add tasks to the worklet. You can create reusable worklets in the Worklet Designer. You can also create non-reusable worklets in the Workflow Designer as you develop the workflow.
To create a reusable worklet:
1. In the Worklet Designer, click Worklet > Create. The Create Worklet dialog box appears.
2. Enter a name for the worklet.
3. If you are adding the worklet to a workflow that is enabled for concurrent execution, enable the worklet for concurrent execution. For more information about enabling workflows for concurrent execution, see Starting and Stopping Concurrent Workflows on page 563.
4. Click OK. The Worklet Designer creates a Start task in the worklet.
To create a non-reusable worklet:
1. In the Workflow Designer, open a workflow.
2. Click Tasks > Create.
3. For the Task type, select Worklet.
4. Enter a name for the task.
5. Click Create. The Workflow Designer creates the worklet and adds it to the workspace.
6. Click Done.
Configuring Worklet Properties

When you use a worklet in a workflow, you can configure the following worklet properties:
- Worklet variables. Use worklet variables to reference values and record information. You use worklet variables the same way you use workflow variables. You can assign a workflow variable to a worklet variable to override its initial value. For more information about worklet variables, see Using Worklet Variables on page 197.
- Events. To use the Event-Wait and Event-Raise tasks in the worklet, you must first declare an event in the worklet properties.
- Metadata extension. Extend the metadata stored in the repository by associating information with repository objects. For more information, see Working with Metadata Extensions on page 29.
To add tasks to a non-reusable worklet:
1. Create a non-reusable worklet in the Workflow Designer workspace.
2. Right-click the worklet and choose Open Worklet. The Worklet Designer opens so you can add tasks in the worklet.
3. Add tasks in the worklet by using the Tasks toolbar or click Tasks > Create in the Worklet Designer.
4. Connect tasks with links.
Nesting Worklets
You can nest a worklet within another worklet. When you run a workflow containing nested worklets, the Integration Service runs the nested worklet from within the parent worklet. You can group several worklets together by function or simplify the design of a complex workflow when you nest worklets. You might choose to nest worklets to load data to fact and dimension tables. Create a nested worklet to load fact and dimension data into a staging area. Then, create a nested worklet to load the fact and dimension data from the staging area to the data warehouse.
You might choose to nest worklets to simplify the design of a complex workflow. Nest worklets that can be grouped together within one worklet. In the workflow in Figure 6-1, two worklets relate to regional sales and two worklets relate to quarterly sales. Figure 6-1 shows a workflow that uses multiple worklets:
Figure 6-1. Workflow with Multiple Worklets
The workflow in Figure 6-2 shows the same workflow with the worklets grouped and nested in parent worklets. Figure 6-2 shows a workflow that uses nested worklets:
Figure 6-2. Workflow with Nested Worklets
Using Worklet Variables

To override the initial value of a worklet variable:
1. Double-click the worklet instance in the Workflow Designer workspace.
2. On the Variables tab, click the Add button in the pre-worklet variable assignment.
3. Click the Open button in the User-Defined Worklet Variables field to select a worklet variable.
4. Click Apply. The worklet variable in this worklet instance now has the selected workflow variable as its initial value.
The following rules apply to worklet variables:
- You cannot use variables from the parent workflow in the worklet.
- You cannot use user-defined worklet variables in the parent workflow.
- You can use predefined worklet variables in the parent workflow, just as you use predefined variables for other tasks in the workflow.
Assigning Variable Values in a Worklet

You can update the values of variables before or after a worklet runs:
- Pre-worklet variable assignment. You can update user-defined worklet variables before a worklet runs. You can assign these variables the values of parent workflow or worklet variables or the values of mapping variables from other tasks in the workflow or parent worklet. You can update worklet variables only with values from the parent of the worklet. Therefore, if a worklet is in another worklet within a workflow, you can assign values from the parent worklet variables, but not the workflow variables.
- Post-worklet variable assignment. You can update parent workflow or worklet variables after the worklet completes. You can assign these variables the values of user-defined worklet variables.
You assign variables on the Variables tab when you edit a worklet.
To pass the worklet variable value from wklt_CreateCustList to wklt_UpdateCustOrders, complete the following steps:
1. Configure worklet wklt_CreateCustList to use a worklet variable, for example, $$URLString1. For more information about creating worklet variables, see Using Worklet Variables on page 197.
2. Configure worklet wklt_UpdateCustOrders to use a worklet variable, for example, $$URLString2.
3. Configure the workflow to use a workflow variable, for example, $$PassURLString. For more information about creating workflow variables, see User-Defined Workflow Variables on page 152.
4. Configure worklet wklt_CreateCustList to assign the value of worklet variable $$URLString1 to workflow variable $$PassURLString after the worklet completes.
5. Configure worklet wklt_UpdateCustOrders to assign the value of workflow variable $$PassURLString to worklet variable $$URLString2 before the worklet starts.
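The resulting assignment statements might look like the following sketch, using the variable names from this example. The variable that receives the value is always on the left side of the statement:

Post-worklet assignment in wklt_CreateCustList:   $$PassURLString = $$URLString1
Pre-worklet assignment in wklt_UpdateCustOrders:  $$URLString2 = $$PassURLString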
On the Variables tab, you can perform the following types of variable assignments:
- Pre-worklet variable assignment. Update user-defined worklet variables with the values of parent workflow or worklet variables or the values of mapping variables from other tasks in the workflow or parent worklet that run before this worklet.
- Post-worklet variable assignment. Update parent workflow and worklet variables with the values of user-defined worklet variables.
To assign variables in a worklet:
1. Edit the worklet for which you want to assign variables.
2. Click the Variables tab. The Variables tab contains the following variable assignment areas:
- Pre-worklet variable assignment. Assign values to user-defined worklet variables before a worklet runs.
- Post-worklet variable assignment. Assign values to parent workflow and worklet variables after a worklet completes.
3. Click the edit button in the variable assignment field.
4. In the pre- or post-worklet variable assignment area, click the add button to add a variable assignment statement.
5. Click the open button in the User-Defined Worklet Variables and Parent Workflow/Worklet Variables fields to select the variables whose values you wish to read or assign. For pre-worklet variable assignment, you may enter parameter and variable names into these fields. The Workflow Manager does not validate parameter and variable names. The Workflow Manager assigns values from the right side of the assignment statement to variables on the left side of the statement. So, if the variable assignment statement is $$SiteURL_WFVar=$$SiteURL_WkltVar, the Workflow Manager assigns the value of $$SiteURL_WkltVar to $$SiteURL_WFVar.
6. Repeat steps 4 and 5 to add more variable assignment statements. To delete a variable assignment statement, click one of the fields in the assignment statement, and click the cut button.
7. Click OK.
Validating Worklets
The Workflow Manager validates worklets when you save the worklet in the Worklet Designer. In addition, when you use worklets in a workflow, the Integration Service validates the workflow according to the following validation rules at run time:
- You cannot run two instances of the same worklet concurrently in the same workflow.
- You cannot run two instances of the same worklet concurrently across two different workflows.
- Each worklet instance in the workflow can run once.
When a worklet instance is invalid, the workflow using the worklet instance remains valid. For more information about workflow validation rules, see Validating a Workflow on page 132. The Workflow Manager displays a red invalid icon if the worklet object is invalid. The Workflow Manager validates the worklet object using the same validation rules for workflows. The Workflow Manager displays a blue invalid icon if the worklet instance in the workflow is invalid. The worklet instance may be invalid when any of the following conditions occurs:
- The parent workflow or worklet variable you assign to the user-defined worklet variable does not have a matching datatype.
- The user-defined worklet variable you used in the worklet properties does not exist.
- You do not specify the parent workflow or worklet variable you want to assign.
For non-reusable worklets, you may see both red and blue invalid icons displayed over the worklet icon in the Navigator.
Chapter 7: Working with Sessions
This chapter includes the following topics:
Overview, 206
Creating a Session Task, 207
Editing a Session, 208
Understanding Buffer Memory, 213
Configuring Automatic Memory Settings, 214
Creating a Session Configuration Object, 218
Configuring Performance Details, 221
Using Pre- and Post-Session SQL Commands, 223
Using Pre- and Post-Session Shell Commands, 225
Validating a Session, 231
Stopping and Aborting a Session, 233
Working with Session Parameters, 236
Mapping Parameters and Variables in Sessions, 242
Assigning Parameter and Variable Values in a Session, 243
Handling High Precision Data, 248
Overview
A session is a set of instructions that tells the Integration Service how and when to move data from sources to targets. A session is a type of task, similar to other tasks available in the Workflow Manager. In the Workflow Manager, you configure a session by creating a Session task. To run a session, you must first create a workflow to contain the Session task.

When you create a Session task, you enter general information such as the session name, session schedule, and the Integration Service to run the session. You can also select options to run pre-session shell commands, send On-Success or On-Failure email, and use FTP to transfer source and target files. You can configure the session to override parameters established in the mapping, such as source and target location, source and target type, error tracing levels, and transformation attributes. You can also configure the session to collect performance details for the session and store them in the PowerCenter repository. You might view performance details for a session to tune the session.

You can run as many sessions in a workflow as you need. You can run the Session tasks sequentially or concurrently, depending on the requirement. The Integration Service creates several files and in-memory caches depending on the transformations and options used in the session. For more information about session output files and caches, see Integration Service Architecture in the PowerCenter Administrator Guide.
Creating a Session Task

Before you create a session, configure the Workflow Manager to communicate with databases and the Integration Service. You must assign appropriate permissions for any database, FTP, or external loader connections you configure. For more information about configuring the Workflow Manager, see Working with Connection Objects on page 37.
To create a Session task:
1. In the Workflow Designer, click the Session Task icon on the Tasks toolbar.
-or-
Click Tasks > Create. Select Session Task for the task type.
2. Enter a name for the Session task.
3. Click Create.
4. Select the mapping you want to use in the Session task and click OK.
5. Click Done. The Session task appears in the workspace.
Editing a Session
After you create a session, you can edit it. For example, you might need to adjust the buffer and cache sizes, modify the update strategy, or clear a variable value saved in the repository. Double-click the Session task to open the session properties. The session has the following tabs, and each of those tabs has multiple settings:
- General tab. Enter session name, mapping name, and description for the Session task, assign resources, and configure additional task options.
- Properties tab. Enter session log information, test load settings, and performance configuration.
- Config Object tab. Enter advanced settings, log options, and error handling configuration.
- Mapping tab. Enter source and target information, override transformation properties, and configure the session for partitioning.
- Components tab. Configure pre- or post-session shell commands and emails.
- Metadata Extension tab. Configure metadata extension options.
For a detailed description of the session properties tabs and associated options, see Session Properties Reference on page 801. Figure 7-1 shows the session properties:
Figure 7-1. Session Properties
You can edit session properties at any time. The repository updates the session properties immediately. If the session is running when you edit the session, the repository updates the session when the session completes. If the mapping changes, the Workflow Manager might issue a warning that the session is invalid. The Workflow Manager then lets you continue editing the session properties. After you edit the session properties, the Integration Service validates the session and reschedules the session. For more information about session validation, see Validating a Session on page 231.
For a target instance, you can change writers, connections, and properties settings.
Table 7-1 shows the options you can use to apply attributes to objects in a session. You can apply different options depending on whether the setting is a reader or writer, connection, or an object property.
Table 7-1. Apply All Options
Reader/Writer - Apply Type to All Instances: Applies a reader or writer type to all instances of the same object type in the session. For example, you can apply a relational reader type to all the other readers in the session.
Reader/Writer - Apply Type to All Partitions: Applies a reader or writer type to all the partitions in a pipeline. For example, if you have four partitions, you can change the writer type in one partition for a target instance. Use this option to apply the change to the other three partitions.
Connections - Apply Connection Type: Applies the same type of connection to all instances. Connection types are relational, FTP, queue, application, or external loader.
Connections - Apply Connection Value: Apply a connection value to all instances or partitions. The connection value defines a specific connection that you can view in the connection browser. You can apply a connection value that is valid for the existing connection type.
Connections - Apply Connection Attributes: Apply only the connection attribute values to all instances or partitions. Each type of connection has different attributes. You can apply connection attributes separately from connection values. To view sample connection attributes, see Figure 7-3 on page 211.
Connections - Apply Connection Data: Apply the connection value and its connection attributes to all the other instances that have the same connection type. This option combines the connection option and the connection attribute option.
Connections - Apply All Connection Information: Applies the connection value and its attributes to all the other instances even if they do not have the same connection type. This option is similar to Apply Connection Data, but it lets you change the connection type.
Properties - Apply Attribute to All Instances: Applies an attribute value to all instances of the same object type in the session. For example, if you have a relational target, you can choose to truncate a table before you load data. You can apply the attribute value to all the relational targets in the session.
Properties - Apply Attribute to All Partitions: Applies an attribute value to all partitions in a pipeline. For example, you can change the name of the reject file in one partition for a target instance, then apply the file name change to the other partitions.
When you apply connection information, you select the connection and the value to apply to it. The Apply All Connection Information option lets you apply a new connection type, connection value, and connection attributes. Figure 7-3 illustrates the connection options by showing where they display on a connection browser:
Figure 7-3. Connection Options
The connection type can be relational, FTP, queue, application, or external loader.
To apply attributes to all instances or partitions:
1. Open a session in the workspace.
2. Click the Mapping tab.
3. Choose a source, target, or transformation instance from the Navigator. Settings for properties, connections, and readers or writers might display, depending on the object you choose.
4. Right-click the reader, writer, connection, or property that you want to apply to other instances or partitions.
5. Select an option from the list and choose to apply it to all instances or all partitions.
6. Click OK to apply the attribute or property.
Understanding Buffer Memory

When you run a session, the Integration Service creates buffer blocks in memory to move data from sources to targets, and it requires a minimum of two blocks for each source and target in the mapping. For example, a session that contains a single partition using a mapping that contains 50 sources and 50 targets requires a minimum of 200 buffer blocks.
[(50 + 50) * 2] = 200
You configure buffer memory settings by adjusting the following session properties:
- DTM Buffer Size. The DTM buffer size specifies the amount of buffer memory the Integration Service uses when the DTM processes a session. Configure the DTM buffer size on the Properties tab in the session properties.
- Default Buffer Block Size. The buffer block size specifies the amount of buffer memory used to move a block of data from the source to the target. Configure the buffer block size on the Config Object tab in the session properties.
The Integration Service specifies a minimum memory allocation for the buffer memory and buffer blocks. By default, the Integration Service allocates 12,000,000 bytes of memory to the buffer memory and 64,000 bytes per block. If the DTM cannot allocate the configured amount of buffer memory for the session, the session cannot initialize. Usually, you do not need more than 1 GB for the buffer memory. You can configure a numeric value for the buffer size, or you can configure the session to determine the buffer memory size at run time. For more information about configuring automatic memory settings, see Configuring Automatic Memory Settings on page 214. For information about configuring buffer memory or buffer block size, see the PowerCenter Performance Tuning Guide.
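As a rough check, the number of blocks available to a session is approximately the DTM buffer size divided by the buffer block size. Under the default settings:

12,000,000 bytes / 64,000 bytes per block = approximately 187 blocks

So the 50-source, 50-target example above, which needs 200 blocks, would not fit within the default buffer memory; you would raise the DTM buffer size or lower the buffer block size.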
Configuring Automatic Memory Settings

You can configure the Integration Service to determine buffer memory and session cache sizes at run time. You can configure the DTM buffer size and the default buffer block size in the session properties.
To configure automatic memory settings for the DTM buffer size:
1. Open the session, and click the Config Object tab.
2. Enter a value for the Default Buffer Block Size. You can specify auto or a numeric value. If you enter 2000, the Integration Service interprets the number as 2000 bytes. Append KB, MB, or GB to the value to specify other units. For example, specify 512MB.
3. Click the Properties tab.
4. Enter a value for the DTM buffer size. You can specify auto or a numeric value. If you enter 2000, the Integration Service interprets the number as 2000 bytes. Append KB, MB, or GB to the value to specify other units. For example, specify 512MB.
Note: If you specify auto for the DTM buffer size or the default buffer block size, enable automatic memory settings by configuring a non-zero value for the Maximum Memory Allowed for Auto Memory Attributes and the Maximum Percentage of Total Memory Allowed for Auto Memory Attributes. If you do not enable automatic memory settings after you specify auto for the DTM buffer size or the default buffer block size, the Integration Service uses default values.
You can configure automatic memory settings for the following session caches:
- Lookup transformation index and data caches
- Aggregator transformation index and data caches
- Rank transformation index and data caches
- Joiner transformation index and data caches
- Sorter transformation cache
- XML target cache
You can configure auto for the index and data cache size in the transformation properties or on the mappings tab of the session properties.
When you configure automatic memory settings, the Integration Service specifies a minimum memory allocation for the index and data caches. By default, the Integration Service allocates 1 MB to the index cache and 2 MB to the data cache for each transformation instance. If you configure a maximum memory limit that is less than the minimum value for an index or data cache, the Integration Service overrides the value based on the transformation metadata. When you run a session on a grid and you configure Maximum Memory Allowed For Auto Memory Attributes, the Integration Service divides the allocated memory among all the nodes in the grid. When you configure Maximum Percentage of Total Memory Allowed For Auto Memory Attributes, the Integration Service allocates the specified percentage of memory on each node in the grid.
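For example, assuming the memory divides evenly across the nodes, a 4-node grid with Maximum Memory Allowed for Auto Memory Attributes set to 800 MB gives each node:

800 MB / 4 nodes = 200 MB per node for session caches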
To configure automatic memory settings for session caches:
1. Open the transformation in the Transformation Developer or the Mappings tab of the session properties.
2. In the transformation properties, select or enter auto for the index and data cache size settings.
3. Open the session in the Task Developer or Workflow Designer, and click the Config Object tab.
4. Enter a value for the Maximum Memory Allowed for Auto Memory Attributes. If you enter 2000, the Integration Service interprets the number as 2000 bytes. Append KB, MB, or GB to the value to specify other units. For example, specify 512MB. This value specifies the maximum amount of memory to use for session caches. If you set the value to zero, the Integration Service uses default values for memory attributes that you set to auto.
5. Enter a value for the Maximum Percentage of Total Memory Allowed for Auto Memory Attributes. This value specifies the maximum percentage of total memory the session caches may use. If the value is set to zero, the Integration Service uses default values for memory attributes that you set to auto.
Creating a Session Configuration Object

To create a session configuration object:
1. In the Workflow Manager, click Tasks > Session Configuration. The Session Configuration Browser appears.
2. Click New to create a new session configuration object.
3. Enter a name for the session configuration object.
4. In the Properties tab, configure advanced settings, log options, and error handling options.
5. Click OK.
For descriptions of the session configuration object settings, see Config Object Tab on page 811.
Configuring Performance Details

Complete the following tasks to view performance details:
- Configure the session to collect performance data.
- Configure the session to write performance data to the repository.
- Configure the Integration Service to persist run-time statistics to the repository at the verbose level.
For more information about configuring the Integration Service to persist run-time statistics to the repository, see Creating and Configuring the Integration Service in the PowerCenter Administrator Guide. The Workflow Monitor displays performance details for each session that is configured to collect or write performance details. For more information about viewing performance details in the Workflow Monitor, see Viewing Performance Details on page 621.
To configure performance details:
1. In the Workflow Manager, open the session properties and select the Properties tab.
2. Select Collect performance data to view performance details while the session runs.
3. Select Write Performance Data to Repository to store and view performance details for previous session runs. You must also configure the Integration Service to store the run-time information at the verbose level.
4. Click OK.
5. Click Repository > Save.
For more information about the Collect performance data and Write performance data to repository options, see Performance Settings on page 807.
Using Pre- and Post-Session SQL Commands

Use the following guidelines when you enter pre- and post-session SQL commands (see the example after this list):
- Use any command that is valid for the database type. However, the Integration Service does not allow nested comments, even though the database might.
- Use a semicolon (;) to separate multiple statements. The Integration Service issues a commit after each statement.
- The Integration Service ignores semicolons within /* ... */ comments. If you need to use a semicolon outside of comments, you can escape it with a backslash (\).
- The Workflow Manager does not validate the SQL.
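For example, the following pre-session SQL, with hypothetical table names, contains two statements. The Integration Service commits after each statement, ignores the semicolon inside the comment, and treats the escaped semicolon as a literal character:

TRUNCATE TABLE stg_orders;
/* reset the audit row; this semicolon is inside a comment */
UPDATE stg_audit SET status = 'READY\; LOADING' WHERE table_name = 'STG_ORDERS'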
Error Handling
You can configure error handling on the Config Object tab. You can choose to stop or continue the session if the Integration Service encounters an error issuing the pre- or postsession SQL command.
Figure 7-5 shows how to configure error handling for pre- or post-session SQL commands:
Figure 7-5. Stop or Continue the Session on Pre- or Post-Session SQL Errors
Using Pre- and Post-Session Shell Commands

The Integration Service can perform shell commands at the following points in a session:
- Pre-session command. The Integration Service performs pre-session shell commands at the beginning of a session. You can configure a session to stop or continue if a pre-session shell command fails.
- Post-session success command. The Integration Service performs post-session success commands only if the session completed successfully.
- Post-session failure command. The Integration Service performs post-session failure commands only if the session failed to complete.

Use any valid UNIX command or shell script for UNIX nodes, or any valid DOS command or batch file for Windows nodes. Configure the session to run the pre- or post-session shell commands.
The Workflow Manager provides a task called the Command task that lets you configure shell commands anywhere in the workflow. You can choose a reusable Command task for the preor post-session shell command. Or, you can create non-reusable shell commands for the preor post-session shell commands. For more information about the Command task, see Working with the Command Task on page 169. If you create a non-reusable pre- or post-session shell command, you can make it into a reusable Command task. The Workflow Manager lets you choose from the following options when you configure shell commands:
- Create non-reusable shell commands. Create a non-reusable set of shell commands for the session. Other sessions in the folder cannot use this set of shell commands.
- Use an existing reusable Command task. Select an existing Command task to run as the pre- or post-session shell command.
Configure pre- and post-session shell commands in the Components tab of the session properties.
For example, if you use a service process variable instead of hard-coding a specific directory, you can run the same workflow on different Integration Services without changing session properties. You can also use a session parameter, $ParamMyCommand, as the pre- or post-session shell command, and set $ParamMyCommand to the command in a parameter file. For more information about using parameters and variables in commands, see Where to Use Parameters and Variables on page 691.
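For example, the following parameter file entry (the folder, workflow, session, and path names here are hypothetical) sets $ParamMyCommand for a session:

[Sales.WF:wf_SalesLoad.ST:s_m_LoadOrders]
$ParamMyCommand=cp /data/incoming/sales.dat /data/archive/sales.dat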
To create non-reusable pre- or post-session shell commands:
1. In the Components tab of the session properties, select Non-reusable for the pre- or post-session shell command.
2. Click the Edit button in the Value field to open the Edit Pre- or Post-Session Command dialog box.
3. Enter a name for the command in the General tab.
4. If you want the Integration Service to perform the next command only if the previous command completed successfully, select Fail Task if Any Command Fails in the Properties tab.
5. In the Commands tab, click the Add button to add shell commands. Enter one command for each line.
6. Click OK.
To create a Command Task from non-reusable pre- or post-session shell commands, click the Edit button to open the Edit dialog box for the shell commands. In the General tab, select the Make Reusable check box. After you select the Make Reusable check box and click OK, a new Command task appears in the Tasks folder in the Navigator window. Use this Command task in other workflows, just as you do with any other reusable workflow tasks.
To use a reusable Command task as the pre- or post-session shell command:
1. In the Components tab of the session properties, click Reusable for the pre- or post-session shell command.
2. Click the Edit button in the Value field to open the Task Browser dialog box.
3. Select the Command task you want to run as the pre- or post-session shell command.
4. Click the Override button in the Task Browser dialog box if you want to change the order of the commands, or if you want to specify whether to run the next command when the previous command fails. Changes you make to the Command task from the session properties only apply to the session. In the session properties, you cannot edit the commands in the Command task.
5. Click OK to select the Command task for the pre- or post-session shell command. The name of the Command task you select appears in the Value field for the shell command.
Validating a Session
The Workflow Manager validates a Session task when you save it. You can also manually validate Session tasks and session instances. Validate reusable Session tasks in the Task Developer. Validate non-reusable sessions and reusable session instances in the Workflow Designer. The Workflow Manager marks a reusable session or session instance invalid if you perform one of the following tasks:
- Edit the mapping in a way that might invalidate the session. You can edit the mapping used by a session at any time. When you edit and save a mapping, the repository might invalidate sessions that already use the mapping. The Integration Service does not run invalid sessions. You must reconnect to the folder to see the effect of mapping changes on Session tasks. For more information about validating mappings, see Mappings in the Designer Guide. When you edit a session based on an invalid mapping, the Workflow Manager displays a warning message:
The mapping [mapping_name] associated with the session [session_name] is invalid.
Delete a database, FTP, or external loader connection used by the session. Leave session attributes blank. For example, the session is invalid if you do not specify the source file name. Change the code page of a session database connection to an incompatible code page.
If you delete objects associated with a Session task, such as the session configuration object or an Email or Command task, the Workflow Manager marks a reusable session invalid. However, the Workflow Manager does not mark a non-reusable session invalid if you delete an object associated with the session. If you delete a shortcut to a source or target from the mapping, the Workflow Manager does not mark the session invalid.

The Workflow Manager does not validate SQL overrides or filter conditions entered in the session properties when you validate a session. You must validate SQL overrides and filter conditions in the SQL Editor.

If a reusable Session task is invalid, the Workflow Manager displays an invalid icon over the Session task in the Navigator and in the Task Developer workspace. This does not affect the validity of the session instance and the workflows using the session instance. If a reusable or non-reusable session instance is invalid, the Workflow Manager marks it invalid in the Navigator and in the Workflow Designer workspace. Workflows using the session instance remain valid.

To validate a session, select the session in the workspace and click Tasks > Validate. Or, right-click the session instance in the workspace and choose Validate.
You can also validate multiple sessions without fetching them into the workspace. Select and validate the sessions from either a query results view or a view dependencies list.
To validate multiple sessions:
1. Select sessions from either a query list or a view dependencies list.
2. Right-click one of the selected sessions and choose Validate. The Validate Objects dialog box appears.
3. Choose whether to save objects and check in objects that you validate.
Threshold Errors
You can choose to stop a session on a designated number of non-fatal errors. A non-fatal error is an error that does not force the session to stop on its first occurrence. Establish the error threshold in the session properties with the Stop on Errors option. When you enable this option, the Integration Service counts non-fatal errors that occur in the reader, writer, and transformation threads. The Integration Service maintains an independent error count when reading sources, transforming data, and writing to targets. The Integration Service counts the following non-fatal errors when you set the Stop on Errors option in the session properties:
- Reader errors. Errors encountered by the Integration Service while reading the source database or source files. Reader threshold errors can include alignment errors while running a session in Unicode mode.
- Writer errors. Errors encountered by the Integration Service while writing to the target database or target files. Writer threshold errors can include key constraint violations, loading nulls into a not null field, and database trigger responses.
- Transformation errors. Errors encountered by the Integration Service while transforming data. Transformation threshold errors can include conversion errors and any condition set up as an ERROR, such as null input.
When you create multiple partitions in a pipeline, the Integration Service maintains a separate error threshold for each partition. When the Integration Service reaches the error threshold for any partition, it stops the session. The writer may continue writing data from one or more partitions, but it does not affect the ability to perform a successful recovery.
Note: If alignment errors occur in a non line-sequential VSAM file, the Integration Service sets the error threshold to 1 and stops the session.
Fatal Error
A fatal error occurs when the Integration Service cannot access the source, target, or repository. This can include loss of connection or target database errors, such as lack of database space to load data. If the session uses a Normalizer or Sequence Generator transformation and the Integration Service loses connection to the repository, the Integration Service cannot update the sequence values in the repository, and a fatal error occurs. If the session does not use a Normalizer or Sequence Generator transformation, and the Integration Service loses connection to the repository, the Integration Service does not stop the session. The session completes, but the Integration Service cannot log session statistics into the repository.
ABORT Function
Use the ABORT function in the mapping logic to abort a session when the Integration Service encounters a designated transformation error. For more information about ABORT, see Functions in the Transformation Language Reference.
User Command
You can stop or abort the session from the Workflow Manager. You can also stop the session using pmcmd.
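For example, you might stop a running workflow from the command line with pmcmd, as in the following sketch (the service, domain, user, folder, and workflow names are hypothetical):

pmcmd stopworkflow -sv IntSvc_Dev -d Domain_Dev -u Administrator -p mypassword -f MyFolder wf_DailyLoad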
- Error threshold met due to transformation errors
- ABORT( )
- Invalid evaluation of transaction control expression
$ParamName. Defines any other session property. For example, you can use this parameter to define a table owner name, table name prefix, FTP file or directory name, lookup cache file name prefix, or email address. You can use this parameter to define source, lookup, target, and reject file names, but not the session log file name or database connections.
* Define the parameter name using the appropriate prefix, for example, $InputFileMySource.
* Define the parameter name using the appropriate prefix and suffix. For example, for a source instance named Customers, the parameter for source table name is $PMCustomers@TableName. If the Source Qualifier is named SQ_Customers, the parameter for source number of affected rows is $PMSQ_Customers@numAffectedRows.
If you do not define a value for any of these parameters, the Integration Service uses the value defined in the connection object. The connection attributes you can override are listed in the following template file:
<PowerCenter Installation Directory>/server/bin/ConnectionParam.prm
For more information about overriding connection attributes in the parameter file and using the template file, see Overriding Connection Attributes in the Parameter File on page 46.
You can also use email variables to get the session name, Integration Service name, number of rows loaded, and number of rows rejected. For more information about post-session email, see Working with Post-Session Email on page 414.
Use the following rules and guidelines when you create file parameters:
- When you define the parameter file as a resource for a node, verify the Integration Service runs the session on a node that can access the parameter file. Define the resource for the node, configure the Integration Service to check resources, and edit the session to require the resource. For information about configuring the Integration Service to check resources, see the PowerCenter Administrator Guide.
- When you create a file parameter, use alphanumeric and underscore characters. For example, to name a source file parameter, use $InputFileName, such as $InputFile_Data.
- All session file parameters of a particular type must have distinct names. For example, if you create two source file parameters, you might name them $SourceFileAccts and $SourceFilePrices.
- When you define the parameter in the file, you can reference any directory local to the Integration Service.
- Use a parameter to define the location of a file. Clear the entry in the session properties that defines the file location, and enter the full path of the file in the parameter file.
- You can change the parameter value in the parameter file between session runs, or you can create multiple parameter files. If you use multiple parameter files, use the pmcmd startworkflow command with the -paramfile or -localparamfile options to specify which parameter file to use.
Use the following rules and guidelines when you create database connection parameters:
- You can change connections for relational sources, targets, lookups, and stored procedures.
- When you define the parameter, you can reference any database connection in the repository.
- Use the same $DBConnection parameter for more than one connection in a session.
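For example, the following pmcmd command starts a workflow with a specific parameter file, as a sketch (the service, domain, user, folder, workflow names, and path are hypothetical):

pmcmd startworkflow -sv IntSvc_Dev -d Domain_Dev -u Administrator -p mypassword -f MyFolder -paramfile /data/params/daily.txt wf_DailyLoad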
To view or delete persistent variable values:
1. In the Navigator window of the Workflow Manager, right-click the Session task and select View Persistent Values.
2. Click Delete Values to delete existing variable values.
3. To save changes, click OK.
The types of parameters and variables you can update depend on whether you assign them before or after a session runs. You can update the following types of parameters and variables before or after a session runs:
- Pre-session variable assignment. You can update mapping parameters, mapping variables, and session parameters before a session runs. You can assign these parameters and variables the values of workflow or worklet variables in the parent workflow or worklet. Therefore, if a session is in a worklet within a workflow, you can assign values from the worklet variables, but not the workflow variables. You cannot update mapplet variables in the pre-session variable assignment.
- Post-session on success variable assignment. You can update workflow or worklet variables in the parent workflow or worklet after the session completes successfully. You can assign these variables the values of mapping parameters and variables.
- Post-session on failure variable assignment. You can update workflow or worklet variables in the parent workflow or worklet when the session fails. You can assign these variables the values of mapping parameters and variables.
You assign parameters and variables on the Components tab of the session properties.
To pass the mapping variable value from s_NewCustomers to s_MergeCustomers, complete the following steps:
1. Configure the mapping associated with session s_NewCustomers to use a mapping variable, for example, $$Count1.
2. Configure the mapping associated with session s_MergeCustomers to use a mapping variable, for example, $$Count2.
3. Configure the workflow to use a workflow variable, for example, $$PassCountValue. For more information about creating workflow variables, see User-Defined Workflow Variables on page 152.
4. Configure session s_NewCustomers to assign the value of mapping variable $$Count1 to workflow variable $$PassCountValue after the session completes successfully.
5. Configure session s_MergeCustomers to assign the value of workflow variable $$PassCountValue to mapping variable $$Count2 before the session starts.
You can perform the following types of variable assignment:
- Pre-session variable assignment. Assign values to mapping parameters, mapping variables, and session parameters. You update these parameters and variables with the values of parent workflow or worklet variables.
- Post-session variable assignment. Assign values to workflow and worklet variables. You update these variables with the values of mapping parameters or variables.
Mapping parameters and variables have different datatypes than workflow and worklet variables. When you assign parameter and variable values in a session, the Integration Service converts the datatypes as shown in Table 7-5 and Table 7-6. Table 7-5 describes datatype conversion when you assign workflow or worklet variable values to mapping parameters or mapping variables:
Table 7-5. Datatype Conversion for Pre-Session Variable Assignment
Workflow/Worklet Variable Datatype    Mapping Parameter/Variable Datatype
Date/Time                             Date/Time
Double                                Double
Integer                               Integer
NString                               NString
Table 7-6 describes datatype conversion when you assign mapping parameter or variable values to workflow or worklet variables:
Table 7-6. Datatype Conversion for Post-Session Variable Assignment
Mapping Parameter/Variable Datatype    Workflow/Worklet Variable Datatype
Date/Time                              Date/Time
Decimal                                NString*
Double                                 Double
Integer                                Integer
NString                                NString
NText                                  NString
Real                                   Double
* There is no workflow or worklet variable datatype for the Decimal datatype. The Integration Service converts Decimal values in mapping parameters and variables to NString values in workflow and worklet variables.
To assign parameters and variables in a session:
1. Edit the session for which you want to assign parameters and variables.
2. Click the Components tab.
3. Select the variable assignment type:
   - Pre-session variable assignment. Assign values to mapping parameters, mapping variables, and session parameters before a session runs.
   - Post-session on success variable assignment. Assign values to parent workflow or worklet variables after a session completes successfully.
   - Post-session on failure variable assignment. Assign values to parent workflow or worklet variables after a session fails.
4. Click the Open button in the variable assignment field. The Pre- or Post-Session Variable Assignment dialog box appears.
5. In the Pre- or Post-Session Variable Assignment dialog box, click the add button to add a variable assignment row.
6. Click the open button in the Mapping Variables/Parameters and Parent Workflow/Worklet Variables fields to select the parameters and variables whose values you want to read or assign. To assign values to a session parameter, enter the parameter name in the Mapping Variables/Parameters field. The Workflow Manager does not validate parameter names.
   The Workflow Manager assigns values from the right side of the assignment statement to parameters and variables on the left side of the statement. So, if the variable assignment statement is $$Counter_MapVar=$$Counter_WFVar, the Workflow Manager assigns the value of $$Counter_WFVar to $$Counter_MapVar.
7. Repeat steps 5 and 6 to add more variable assignment statements. To delete a variable assignment statement, click one of the fields in the assignment statement, and click the delete button.
8. Click OK.
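For the s_NewCustomers and s_MergeCustomers sessions described earlier, the assignment statements might look like the following sketch (the left side of each statement receives the value):

Post-session on success (s_NewCustomers): $$PassCountValue=$$Count1
Pre-session (s_MergeCustomers):           $$Count2=$$PassCountValue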
Bigint
When the Integration Service evaluates expressions for some functions and arithmetic operators, it converts bigint values to double, evaluates the expression, and converts the result back to bigint. The transformation Bigint datatype supports precision of 19 digits, and the Double datatype supports precision of 15 digits. When the Integration Service converts the Double result to Bigint, it may round the number to a precision of 15. To ensure precision up to 19 digits, enable high precision when you configure the session. When you enable high precision, the Integration Service converts bigint values to decimal, evaluates the expressions, and converts the result to decimal for some functions and operators. Because the transformation Decimal datatype supports precision up to 28 digits, there is no loss of precision. If a session does not contain expression functions or operators that process bigint values, then enabling high precision does not affect how the Integration Service handles bigint values. Bigint values are processed by non-scientific functions and operators such as CUME and AVG, and by all functions in the scientific category. For more information about functions and operators that can process bigint values, see the PowerCenter Designer Guide.
Decimal
The Integration Service processes decimal values as doubles or decimals. When you enable high precision, you choose to enable the Decimal datatype or let the Integration Service process the data as a Double (precision of 15). To enable high precision data handling:
- Use the Decimal datatype with a precision of 16 to 28 in the mapping.
- Select Enable High Precision in the session properties.
For example, you have a mapping with Decimal (20,0) that passes the number 40012030304957666903. If you enable high precision, the Integration Service passes the number as is. If you do not enable high precision, the Integration Service passes 4.00120303049577 x 10^19.
If you want to process a Decimal value with a precision greater than 28 digits, the Integration Service treats it as a Double value. For example, if you want to process the number 2345678904598383902092.1927658, which has a precision of 29 digits, the Integration Service treats this number as a Double value of 2.34567890459838 x 10^21.
Chapter 8: Working with Sources
This chapter includes the following topics:
Overview, 252
Configuring Sources in a Session, 254
Working with Relational Sources, 258
Working with File Sources, 263
Integration Service Handling for File Sources, 273
Using a File List, 277
Using FastExport, 280
Overview
In the Workflow Manager, you can create sessions with the following sources:
- Relational. You can extract data from any relational database that the Integration Service can connect to. When extracting data from relational sources and Application sources, you must configure the database connection to the data source prior to configuring the session.
- File. You can create a session to extract data from a flat file, COBOL, or XML source. Use an operating system command to generate source data for a flat file or COBOL source or generate a file list. If you use a flat file or XML source, the Integration Service can extract data from any local directory or FTP connection for the source file. If the file source requires an FTP connection, you need to configure the FTP connection to the host machine before you create the session.
- Heterogeneous. You can extract data from multiple sources in the same session. You can extract from multiple relational sources, such as Oracle and Microsoft SQL Server. Or, you can extract from multiple source types, such as relational and flat file. When you configure a session with heterogeneous sources, configure each source instance separately.
Globalization Features
You can choose a code page that you want the Integration Service to use for relational sources and flat files. You specify code pages for relational sources when you configure database connections in the Workflow Manager. You can set the code page for file sources in the session properties. For more information about code pages, see the PowerCenter Administrator Guide.
Source Connections
Before you can extract data from a source, you must configure the connection properties the Integration Service uses to connect to the source file or database. You can configure source database and FTP connections in the Workflow Manager.
Partitioning Sources
You can create multiple partitions for relational, Application, and file sources. For relational or Application sources, the Integration Service creates a separate connection to the source database for each partition you set in the session properties. For file sources, you can configure the session to read the source with one thread or multiple threads. For more information about partitioning data, see Understanding Pipeline Partitioning on page 427.
Configuring Sources in a Session
The Sources node lists the sources used in the session and displays their settings. To view and configure settings for a source, select the source from the list. You can configure the following settings for a source:
- Readers
- Connections
- Properties
Configuring Readers
You can click the Readers settings on the Sources node to view the reader the Integration Service uses with each source instance. The Workflow Manager specifies the necessary reader for each source instance in the Readers settings on the Sources node.
Figure 8-2 shows the Readers settings in the Sources node of the Mapping tab:
Figure 8-2. Readers Settings in the Sources Node of the Mapping Tab
Configuring Connections
Click the Connections settings on the Sources node to define source connection information.
Figure 8-3 shows the Connections settings in the Sources node of the Mapping tab:
Figure 8-3. Connections Settings in the Sources Node
For relational sources, choose a configured database connection in the Value column for each relational source instance. By default, the Workflow Manager displays the source type for relational sources. For more information about configuring database connections, see Selecting the Source Database Connection on page 258. For flat file and XML sources, choose one of the following source connection types in the Type column for each source instance:
- FTP. To read data from a flat file or XML source using FTP, you must specify an FTP connection when you configure source options. You must define the FTP connection in the Workflow Manager prior to configuring the session. For more information about using FTP, see Using FTP on page 749.
- None. Choose None to read from a local flat file or XML file.
Configuring Properties
Click the Properties settings in the Sources node to define source property information. The Workflow Manager displays properties, such as source file name and location for flat file, COBOL, and XML source file types. You do not need to define any properties on the Properties settings for relational sources.
Figure 8-4 shows the Properties settings in the Sources node of the Mapping tab:
Figure 8-4. Properties Settings in the Sources Node of the Mapping Tab
For more information about configuring sessions with relational sources, see Working with Relational Sources on page 258. For more information about configuring sessions with flat file sources, see Working with File Sources on page 263. For more information about configuring sessions with XML sources, see the PowerCenter XML Guide.
Working with Relational Sources
When you configure a session to read data from a relational source, you can configure the following properties:
- Source database connection. Select the database connection for each relational source. For more information, see Selecting the Source Database Connection on page 258.
- Treat source rows as. Define how the Integration Service treats each source row as it reads it from the source table. For more information, see Defining the Treat Source Rows As Property on page 258.
- Override SQL query. You can override the default SQL query to extract source data. For more information, see Overriding the SQL Query on page 259.
- Table owner name. Define the table owner name for each relational source. For more information, see Configuring the Table Owner Name on page 260.
- Source table name. You can override the source table name for each relational source. For more information, see Overriding the Source Table Name on page 261.
After you determine how to treat all rows in the session, you also need to set update strategy options for individual targets. For more information about setting the target update strategy options, see Target Properties on page 293. For more information about setting the update strategy for a session, see Update Strategy Transformation in the Transformation Guide.
The Workflow Manager does not validate the SQL override. The following types of errors can cause data errors and session failure:
- Fields with incompatible datatypes or unknown fields
- Typing mistakes or other errors
Figure 8-5 shows the Properties settings in the Sources node where you can override the SQL query:
Figure 8-5. SQL Query Override Property in the Session Properties
To override the default SQL query:
1. In the Workflow Manager, open the session properties.
2. Click the Mapping tab and open the Transformations view.
3. Click the Sources node and open the Properties settings.
4. Click the Open button in the SQL Query field to open the SQL Editor.
5. Enter the SQL override.
6. Click OK to return to the session properties.
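For example, an SQL override that limits the extract to recently updated rows might look like the following sketch (the table and column names are hypothetical):

SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.NAME, CUSTOMERS.UPDATED_DATE
FROM CUSTOMERS
WHERE CUSTOMERS.UPDATED_DATE > '2007-01-01'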
You can use a parameter or variable as the table owner name. Use any parameter or variable type that you can define in the parameter file. For example, you can use a session parameter, $ParamMyTableOwner, as the table owner name, and set $ParamMyTableOwner to the table owner name in the parameter file. For more information, see Where to Use Parameters and Variables on page 691. Figure 8-6 shows the Properties settings where you define the table owner name for relational sources:
Figure 8-6. Source Table Owner Name Property
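For example, a parameter file entry that sets the $ParamMyTableOwner parameter described above might look like the following sketch (the folder, workflow, session, and owner names are hypothetical):

[MyFolder.WF:wf_DailyLoad.ST:s_LoadOrders]
$ParamMyTableOwner=SALES_OWNER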
If you override the source table name in the session properties and you also override the source table name using an SQL query, the Integration Service uses the source table name defined in the SQL query.
Figure 8-7 shows the Properties settings where you define the source table name for relational sources:
Figure 8-7. Source Table Name Property
Working with File Sources
When you configure a session to read data from a file source, you can configure the following properties:
- Source properties. You can define source properties on the Properties settings in the Sources node, such as source file options. For more information, see Configuring Source Properties on page 263 and Configuring Commands for File Sources on page 265.
- Flat file properties. You can edit fixed-width and delimited source file properties. For more information, see Configuring Fixed-Width File Properties on page 266 and Configuring Delimited File Properties on page 268.
- Line sequential buffer length. You can change the buffer length for flat files on the Advanced settings on the Config Object tab. For more information, see Configuring Line Sequential Buffer Length on page 271.
- Treat source rows as. You can define how the Integration Service treats each source row as it reads it from the source. For more information, see Defining the Treat Source Rows As Property on page 258.
Figure 8-8 shows the flat file source properties you define in the Properties settings of the Sources node on the Mapping tab:
Figure 8-8. Properties Settings in the Sources Node for a Flat File Source
Table 8-2 describes the properties you define on the Properties settings for flat file source definitions:
Table 8-2. Flat File Source Properties
- Input Type (Required). Type of source input. You can choose the following types of source input:
  - File. For flat file, COBOL, or XML sources.
  - Command. For source data or a file list generated by a command. You cannot use a command to generate XML source data.
- Source File Directory (Optional). Directory name of flat file source. By default, the Integration Service looks in the service process variable directory, $PMSourceFileDir, for file sources. If you specify both the directory and file name in the Source Filename field, clear this field. The Integration Service concatenates this field with the Source Filename field when it runs the session. You can also use the $InputFileName session parameter to specify the file location. For more information about session parameters, see Working with Session Parameters on page 236.
The Source Filename, Command Type, and Command properties are optional.
When you generate source data with a command, the flat file reader reads the standard output of the command as the flat file source data. Generating source data with a command eliminates the need to stage a flat file source. Use a command or script to send source data directly to the Integration Service instead of using a pre-session command to generate a flat file source. For example, to uncompress a data file and use the uncompressed data as the source data input rows, use the following command:
uncompress -c $PMSourceFileDir/myCompressedFile.Z
The command uncompresses the file and sends the standard output of the command to the flat file reader. The flat file reader reads the standard output of the command as the flat file source data.
The command generates a file list from the source file directory listing. When the session runs, the flat file reader reads each file as it reads the file names from the command. To use the output of a command as a file list, select Command as the Input Type, Command generating file list as the Command Type, and enter a command for the Command property.
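For example, on UNIX you might generate the file list with a command like the following sketch (the directory and file pattern are hypothetical; ls -1 writes one file name per line):

ls -1 $PMSourceFileDir/sales_*.dat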
To edit the fixed-width properties, select Fixed Width and click Advanced. The Fixed Width Properties dialog box appears. By default, the Workflow Manager displays file properties as configured in the mapping. Edit these settings to override those configured in the source definition. Figure 8-10 shows the Fixed Width Properties dialog box:
Figure 8-10. Fixed Width File Properties Dialog Box
Table 8-3 describes options you can define in the Fixed Width Properties dialog box for file sources:
Table 8-3. Fixed-Width File Properties for File Sources
- Null Character - Text/Binary (Required). Indicates the character representing a null value in the file. This can be any valid character in the file code page, or any binary value from 0 to 255. For more information about specifying null characters, see Null Character Handling on page 274.
- Repeat Null Character (Optional). If selected, the Integration Service reads repeat null characters in a single field as a single null value. If you do not select this option, the Integration Service reads a single null character at the beginning of a field as a null field. Important: For multibyte code pages, specify a single-byte null character if you use repeating non-binary null characters. This ensures that repeating null characters fit into the column. For more information about specifying null characters, see Null Character Handling on page 274.
- Code Page (Required). Code page of the fixed-width file. Select a code page or a variable:
  - Code page. Select the code page.
  - Use Variable. Enter a user-defined workflow or worklet variable or the session parameter $ParamName, and define the code page in the parameter file. For more information about supported code pages, see the PowerCenter Administrator Guide. For more information about parameter files, see Parameter Files on page 687.
  Default is the PowerCenter Client code page.
- Number of Initial Rows to Skip (Optional). Integration Service skips the specified number of rows before reading the file. Use this to skip header rows. One row may contain multiple records. If you select the Line Sequential File Format option, the Integration Service ignores this option.
- Number of Bytes to Skip Between Records (Optional). Integration Service skips the specified number of bytes between records. For example, you have an ASCII file on Windows with one record on each line, and a carriage return and line feed appear at the end of each line. If you want the Integration Service to skip these two single-byte characters, enter 2. If you have an ASCII file on UNIX with one record for each line, ending in a carriage return, skip the single character by entering 1.
- Strip Trailing Blanks (Optional). If selected, the Integration Service strips trailing blank spaces from records before passing them to the Source Qualifier transformation.
- Line Sequential File Format (Optional). Select this option if the file uses a carriage return at the end of each record, shortening the final column.
Click Set File Properties to open the Flat Files dialog box. Figure 8-11 shows the Flat Files dialog box:
Figure 8-11. Flat Files Dialog Box - Delimited
To edit the delimited properties, select Delimited and click Advanced. The Delimited File Properties dialog box appears. By default, the Workflow Manager displays file properties as configured in the mapping. Edit these settings to override those configured in the source definition. Figure 8-12 shows the Delimited File Properties dialog box:
Figure 8-12. Delimited File Properties Dialog Box
Table 8-4 describes options you can define in the Delimited File Properties dialog box for file sources:
Table 8-4. Delimited File Properties for File Sources
- Column Delimiters (Required). One or more characters used to separate columns of data. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). To enter a single-byte unprintable character, click the Browse button to the right of this field. In the Delimiters dialog box, select an unprintable character from the Insert Delimiter list and click Add. You cannot select unprintable multibyte characters as delimiters. Maximum number of delimiters is 80.
- Treat Consecutive Delimiters as One (Optional). By default, the Integration Service treats multiple delimiters separately. If selected, the Integration Service reads any number of consecutive delimiter characters as one. For example, a source file uses a comma as the delimiter character and contains the following record: 56, , , Jane Doe. By default, the Integration Service reads that record as four columns separated by three delimiters: 56, NULL, NULL, Jane Doe. If you select this option, the Integration Service reads the record as two columns separated by one delimiter: 56, Jane Doe.
- Optional Quotes (Required). Select No Quotes, Single Quote, or Double Quotes. If you select a quote character, the Integration Service ignores delimiter characters within the quote characters. Therefore, the Integration Service uses quote characters to escape the delimiter. For example, a source file uses a comma as a delimiter and contains the following row: 342-3849, 'Smith, Jenna', 'Rockville, MD', 6. If you select the optional single quote character, the Integration Service ignores the commas within the quotes and reads the row as four fields. If you do not select the optional single quote, the Integration Service reads six separate fields. When the Integration Service reads two optional quote characters within a quoted string, it treats them as one quote character. For example, the Integration Service reads the following quoted string as I'm going tomorrow: 2353, 'I''m going tomorrow', MD. Additionally, if you select an optional quote character, the Integration Service reads a string as a quoted string if the quote character is the first character of the field. Note: You can improve session performance if the source file does not contain quotes or escape characters.
- Code Page (Required). Code page of the delimited file. Select a code page or a variable:
  - Code page. Select the code page.
  - Use Variable. Enter a user-defined workflow or worklet variable or the session parameter $ParamName, and define the code page in the parameter file. For more information about supported code pages, see the PowerCenter Administrator Guide. For more information about parameter files, see Parameter Files on page 687.
  Default is the PowerCenter Client code page.
- Escape Character (Optional).
Figure 8-13 shows the Advanced settings on the Config Object tab in the session properties where you define the line buffer length:
Figure 8-13. Line Sequential Buffer Length Property for File Sources
Integration Service Handling for File Sources
The Integration Service performs the following types of handling for file sources:
- Character set
- Multibyte character error handling
- Null character handling
- Row length handling for fixed-width flat files
- Numeric data handling
- Tab handling
Character Set
You can configure the Integration Service to run sessions in either ASCII or Unicode data movement mode. Table 8-5 describes source file formats supported by each data movement path in PowerCenter:
Table 8-5. Support for ASCII and Unicode Data Movement Modes
Character Set                       Unicode mode    ASCII mode
7-bit ASCII                         Supported       Supported
US-EBCDIC (COBOL sources only)      Supported       Supported
8-bit ASCII                         Supported       Supported
8-bit EBCDIC (COBOL sources only)   Supported       Supported
ASCII-based MBCS                    Supported       Integration Service generates a warning message.
EBCDIC-based SBCS                   Supported       Not supported. The Integration Service terminates the session.
EBCDIC-based MBCS                   Supported       Not supported. The Integration Service terminates the session.
If you configure a session to run in ASCII data movement mode, delimiters, escape characters, and null characters must be valid in the ISO Western European Latin 1 code page. Any 8-bit characters you specified in previous versions of PowerCenter are still valid. In Unicode data movement mode, delimiters, escape characters, and null characters must be valid in the specified code page of the flat file. For more information about configuring and working with data movement modes, see the PowerCenter Administrator Guide.
- Non-line sequential file. The Integration Service skips rows containing misaligned data and resumes reading the next row. The skipped row appears in the session log with a corresponding error message. If an alignment error occurs at the end of a row, the Integration Service skips both the current row and the next row, and writes them to the session log.
- Line sequential file. The Integration Service skips rows containing misaligned data and resumes reading the next row. The skipped row appears in the session log with a corresponding error message.
- Reader error threshold. You can configure a session to stop after a specified number of non-fatal errors. A row containing an alignment error increases the error count by 1. The session stops if the number of rows containing errors reaches the threshold set in the session properties. Errors and corresponding error messages appear in the session log file.
Fixed-width COBOL sources are always byte-oriented and can be line sequential. The Integration Service handles COBOL files according to the following guidelines:
- Line sequential files. The Integration Service skips rows containing misaligned data and writes the skipped rows to the session log. The session stops if the number of error rows reaches the error threshold.
- Non-line sequential files. The session stops at the first row containing misaligned data.
Table 8-6 describes how the Integration Service uses the Null Character and Repeat Null Character properties to determine if a column is null:
Table 8-6. Null Character Handling
- Null Character: Binary; Repeat Null Character: Disabled. A column is null if the first byte in the column is the binary null character. The Integration Service reads the rest of the column as text data to determine the column alignment and track the shift state for shift sensitive code pages. If data in the column is misaligned, the Integration Service skips the row and writes the skipped row and a corresponding error message to the session log.
- Null Character: Non-binary; Repeat Null Character: Disabled. A column is null if the first character in the column is the null character. The Integration Service reads the rest of the column to determine the column alignment and track the shift state for shift sensitive code pages. If data in the column is misaligned, the Integration Service skips the row and writes the skipped row and a corresponding error message to the session log.
- Null Character: Binary; Repeat Null Character: Enabled. A column is null if it contains the specified binary null character. The next column inherits the initial shift state of the code page.
- Null Character: Non-binary; Repeat Null Character: Enabled. A column is null if the repeating null character fits into the column with no bytes leftover. For example, a five-byte column is not null if you specify a two-byte repeating null character. In shift-sensitive code pages, shift bytes do not affect the null value of a column. A column is still null if it contains a shift byte at the beginning or end of the column. Specify a single-byte null character if you use repeating non-binary null characters. This ensures that repeating null characters fit into a column.
Row Length Handling for Fixed-Width Flat Files
The Integration Service can read a row with fewer bytes than expected in the following cases:
- The file is fixed-width line-sequential with a carriage return or line feed that appears sooner than expected.
- The file is fixed-width non-line sequential, and the last line in the file is shorter than expected.
In these cases, the Integration Service reads the data but does not append any blanks to fill the remaining bytes. The Integration Service reads subsequent fields as NULL. Fields containing repeating null characters that do not fill the entire field length are not considered NULL.
Using a File List
If you configure the session for incremental aggregation, the Integration Service performs incremental aggregation across all listed source files. Create the file list as a text file, with one file name, or one path and file name, for each line.
The Integration Service skips blank lines and ignores leading blank spaces. Any characters indicating a new line, such as \n in ASCII files, must be valid in the code page of the Integration Service.
The following example shows a valid file list created for an Integration Service on Windows. Each of the drives listed are mapped on the Integration Service machine. The western_trans.dat file is located in the same directory as the file list.
western_trans.dat
d:\data\eastern_trans.dat
e:\data\midwest_trans.dat
f:\data\canada_trans.dat
After you create the file list, place it in a directory local to the Integration Service.
To use a file list in a session:
1. In the Workflow Manager, open the session properties.
2. Click the Mapping tab and open the Transformations view.
3. Click the Properties settings in the Sources node.
4. In the Source Filetype field, choose Indirect.
5. In the Source Filename field, replace the file name with the name of the file list. If necessary, also enter the path in the Source File Directory field. If you enter a file name in the Source Filename field, and you have specified a path in the Source File Directory field, the Integration Service looks for the named file in the listed directory. If you enter a file name in the Source Filename field, and you do not specify a path in the Source File Directory field, the Integration Service looks for the named file in the directory where the Integration Service is installed on UNIX or in the system directory on Windows.
6. Click OK.
Using FastExport
FastExport is a utility that uses multiple Teradata sessions to quickly export large amounts of data from a Teradata database. You can create a PowerCenter session that uses FastExport to read Teradata sources.

To use FastExport, create a mapping with a Teradata source database. The mapping can include multiple source definitions from the same Teradata source database joined in a single Source Qualifier transformation. In the session, use the FastExport reader instead of the Relational reader. Use a FastExport connection to the Teradata tables you want to export in a session.

FastExport uses a control file that defines what to export. When a session starts, the Integration Service creates the control file from the FastExport connection attributes. If you create an SQL override for the Teradata tables, the Integration Service uses the SQL to generate the control file. You can override the control file for a session by defining a control file in the session properties.

The Integration Service writes FastExport messages in the session log and information about FastExport performance in the FastExport log. PowerCenter saves the FastExport log in the folder defined by the Temporary File Name session attribute. The default extension for the FastExport log is .log.
To use FastExport in a session, complete the following steps:
1. Create a FastExport connection in the Workflow Manager and configure the connection attributes.
2. Open the session and change the reader from Relational to Teradata FastExport.
3. Change the connection type and select a FastExport connection for the session.
4. Optionally, create a FastExport control file in a text editor and save it in the repository.
To create a FastExport connection:
1. Click Connections > Application in the Workflow Manager. The Connection Browser dialog box appears.
2. Click New.
3. Select a Teradata FastExport connection and click OK.
4. Enter a name for the FastExport connection.
5. Enter the database user name and password.
6. Enter the FastExport attributes and click OK.
Table 8-7 shows the attributes that you configure for a Teradata FastExport connection:
Table 8-7. FastExport Connection Attributes
- TDPID (default: n/a). Teradata database ID.
- Tenacity (default: 4). Number of hours that FastExport tries to log on to the Teradata database. When FastExport tries to log on but the maximum number of Teradata sessions is already running, FastExport waits for the amount of time defined by the SLEEP option. After the SLEEP time, FastExport tries to log on to the Teradata Database again. FastExport repeats this process until it has either logged on for the required number of sessions or exceeded the TENACITY hours time period.
- Max Sessions. Maximum number of FastExport sessions per FastExport job. Max Sessions must be between 1 and the total number of access module processes (AMPs) on your system.
- Sleep. Number of minutes FastExport pauses before retrying a login. FastExport attempts a login until the login succeeds or the Tenacity hours elapse.
- Block Size. Maximum block size to use for the exported data.
- Data Encryption. Enables data encryption for FastExport. You can use data encryption with the version 8 Teradata client.
- Logtable Name. Restart log table name. The FastExport utility uses the information in the restart log table to restart jobs that halt because of a Teradata database or client system failure. Each FastExport job should use a separate logtable. If you specify a table that does not exist, the FastExport utility creates the table and uses it as the restart log. PowerCenter does not support restarting FastExport, but if you stage the output, you can restart FastExport manually.
- Database Name (default: n/a). The name of the Teradata database you want to connect to. The Integration Service generates the SQL statement using the database name as a prefix to the table name.
For more information about the connection attributes, see the Teradata documentation.
Table 8-8 describes the session attributes you can change for FastExport:
Table 8-8. FastExport Session Attributes
- Is Staged (default: Disabled). If selected, FastExport writes data to a stage file.
- Fractional Seconds Precision (default: 0). The precision for fractional seconds following the decimal point in a timestamp. You can enter 0 to 6. For example, a timestamp with a precision of 6 is 'hh:mi:ss.ssssss'. The fractional seconds precision must match the setting in the Teradata database.
- Temporary File (default: $PMTempDir\). PowerCenter uses the temporary file name to generate the names for the log file, control file, and the staged output file. Enter a complete path for the file.
- Control File Override (default: Blank). The control file text. Use this attribute to override the control file the Integration Service creates for a session. For more information, see Overriding the Control File on page 282.
Control file statements must meet the following criteria:
- Begin on a new line.
- Start with a period ( . ) character.
- End with a semicolon ( ; ) character.
For more information about using the control file, see the Teradata documentation. Table 8-9 describes the control file statements you can use with PowerCenter:
Table 8-9. FastExport Control File Statements
Control File Statement               Description
.LOGTABLE utillog;                   The restart logtable name.
LOGON tdpz/user,pswd;                The database login string, including the database, user name, and password.
BEGIN EXPORT                         The first export command.
.SESSIONS 20;                        The number of Teradata sessions.
.EXPORT OUTFILE ddname2;             The destination file for the exported data.
SELECT EmpNo, Hours FROM charges     The SQL statements to select data.
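Assembled in order, the statements in Table 8-9 form a minimal control file like the following sketch (a complete control file typically also ends the export and logs off; see the Teradata documentation for the full syntax):

.LOGTABLE utillog;
LOGON tdpz/user,pswd;
BEGIN EXPORT
.SESSIONS 20;
.EXPORT OUTFILE ddname2;
SELECT EmpNo, Hours FROM charges;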
To override the control file:
1. Create a control file in a text editor.
2. Copy the control file text to the clipboard.
3. Paste the control file text into the Control File Override field.
The Workflow Manager does not validate the control file syntax. Teradata verifies the control file syntax when you run a session. If the control file is invalid, the session fails.
Tip: You can change the control file to read-only to use the control file for each session. The Integration Service does not overwrite the read-only file.
Use the following rules and guidelines when you use FastExport:
- When you use an SQL override for Teradata, PowerCenter uses it to create the FastExport control file. If you do not use an SQL override, PowerCenter generates a control file based on the connected ports in the source qualifier.
- FastExport supports a maximum export file size of 2 GB on a UNIX MP-RAS operating system. Other operating systems have no file size limitation.
- You cannot concatenate exported data files.
- The session fails if you use a pre-session SQL command and FastExport.
Chapter 9: Working with Targets
This chapter includes the following topics:
Overview, 286
Configuring Targets in a Session, 288
Working with Relational Targets, 292
Working with Target Connection Groups, 310
Working with Active Sources, 312
Working with File Targets, 314
Integration Service Handling for File Targets, 323
Working with Heterogeneous Targets, 329
Reject Files, 330
Overview
In the Workflow Manager, you can create sessions with the following targets:
- Relational. You can load data to any relational database that the Integration Service can connect to. When loading data to relational targets, you must configure the database connection to the target before you configure the session.
- File. You can load data to a flat file or XML target or write data to an operating system command. For flat file or XML targets, the Integration Service can load data to any local directory or FTP connection for the target file. If the file target requires an FTP connection, you need to configure the FTP connection to the host machine before you create the session.
- Heterogeneous. You can output data to multiple targets in the same session. You can output to multiple relational targets, such as Oracle and Microsoft SQL Server. Or, you can output to multiple target types, such as relational and flat file. For more information, see Working with Heterogeneous Targets on page 329.
Globalization Features
You can configure the Integration Service to run sessions in either ASCII or Unicode data movement mode. Table 9-1 describes target character sets supported by each data movement mode in PowerCenter:
Table 9-1. Support for ASCII and Unicode Data Movement Modes
Character Set        Unicode Mode                ASCII Mode
7-bit ASCII          Supported                   Supported
ASCII-based MBCS     Supported                   Integration Service generates a warning message, but does not terminate the session.
UTF-8                Supported (Targets Only)    Integration Service generates a warning message, but does not terminate the session.
EBCDIC-based SBCS    Supported                   Not supported. The Integration Service terminates the session.
EBCDIC-based MBCS    Supported                   Not supported. The Integration Service terminates the session.
You can work with targets that use multibyte character sets with PowerCenter. You can choose a code page that you want the Integration Service to use for relational objects and flat files. You specify code pages for relational objects when you configure database connections in the Workflow Manager. The code page for a database connection used as a target must be a superset of the source code page.
When you change the database connection code page to one that is not two-way compatible with the old code page, the Workflow Manager generates a warning and invalidates all sessions that use that database connection.

Code pages you select for a file represent the code page of the data contained in these files. If you are working with flat files, you can also specify delimiters and null characters supported by the code page you have specified for the file.

Target code pages must be a superset of the source code page. However, if you configure the Integration Service and Client for code page relaxation, you can select any code page supported by PowerCenter for the target database connection. When using code page relaxation, select compatible code pages for the source and target data to prevent data inconsistencies. For more information about code page compatibility, see the PowerCenter Administrator Guide.

If the target contains multibyte character data, configure the Integration Service to run in Unicode mode. When the Integration Service runs a session in Unicode mode, it uses the database code page to translate data. If the target contains only single-byte characters, configure the Integration Service to run in ASCII mode. When the Integration Service runs a session in ASCII mode, it does not validate code pages.
Target Connections
Before you can load data to a target, you must configure the connection properties the Integration Service uses to connect to the target file or database. You can configure target database and FTP connections in the Workflow Manager. For more information about creating database connections, see Relational Database Connections on page 50. For more information about creating FTP connections, see FTP Connections on page 60.
Partitioning Targets
When you create multiple partitions in a session with a relational target, the Integration Service creates multiple connections to the target database to write target data concurrently. When you create multiple partitions in a session with a file target, the Integration Service creates one target file for each partition. You can configure the session properties to merge these target files. For more information about configuring a session for pipeline partitioning, see Understanding Pipeline Partitioning on page 427.
Configuring Targets in a Session
The Targets node contains the following settings where you define properties:
- Writers settings
- Connections settings
- Properties settings
Configuring Writers
Click the Writers settings in the Transformations view to define the writer to use with each target instance.
Figure 9-2 shows where you define the writer to use with each target instance:
Figure 9-2. Writer Settings on the Mapping Tab of the Session Properties
When the mapping target is a flat file, an XML file, an SAP BW target, or a WebSphere MQ target, the Workflow Manager specifies the necessary writer in the session properties. However, when the target in the mapping is relational, you can change the writer type to File Writer if you plan to use an external loader.
Note: You can change the writer type for non-reusable sessions in the Workflow Designer and
for reusable sessions in the Task Developer. You cannot change the writer type for instances of reusable sessions in the Workflow Designer. When you override a relational target to use the file writer, the Workflow Manager changes the properties for that target instance on the Properties settings. It also changes the connection options you can define in the Connections settings. When you override a relational target to use the file writer, the Integration Service can trim subsecond data in the target. It trims subseconds based on the least value of the datetime format string in the session configuration object or the precision of the relational target. After you override a relational target to use a file writer, define the file properties for the target. Click Set File Properties and choose the target to define. For more information, see Configuring Fixed-Width Properties on page 319 and Configuring Delimited Properties on page 320.
Configuring Connections
View the Connections settings on the Mapping tab to define target connection information. Figure 9-3 shows the Connection settings on the Mappings tab of the session properties:
Figure 9-3. Connection Settings on the Mapping Tab of the Session Properties
For relational targets, the Workflow Manager displays Relational as the target type by default. In the Value column, choose a configured database connection for each relational target instance. For more information about configuring database connections, see Target Database Connection on page 293. For flat file and XML targets, choose one of the following target connection types in the Type column for each target instance:
- FTP. If you want to load data to a flat file or XML target using FTP, you must specify an FTP connection when you configure target options. FTP connections must be defined in the Workflow Manager prior to configuring sessions. For more information about using FTP, see Using FTP on page 749.
- Loader. Use the external loader option to improve the load speed to Oracle, DB2, Sybase IQ, or Teradata target databases. To use this option, you must use a mapping with a relational target definition and choose File as the writer type on the Writers settings for the relational target instance. The Integration Service uses an external loader to load target files to the Oracle, DB2, Sybase IQ, or Teradata database. You cannot choose external loader if the target is defined in the mapping as a flat file, XML, MQ, or SAP BW target. For more information about using the external loader feature, see External Loading on page 711.
- Queue. Choose Queue when you want to output to a WebSphere MQ or MSMQ message queue.
- None. Choose None when you want to write to a local flat file or XML file.
Configuring Properties
View the Properties settings on the Mapping tab to define target property information. The Workflow Manager displays different properties for the different target types: relational, flat file, and XML. Figure 9-4 shows the Properties settings on the Mapping tab:
Figure 9-4. Properties Settings on the Mapping Tab of the Session Properties
For more information about relational target properties, see Working with Relational Targets on page 292. For more information about flat file target properties, see Working with File Targets on page 314. For more information about XML target properties, see Working with Heterogeneous Targets on page 329. For more information about configuring sessions with multiple target types, see Working with Heterogeneous Targets on page 329.
Working with Relational Targets
When you configure a session to load data to a relational target, you can configure the following properties:
- Target database connection. Define database connection information. For more information, see Target Database Connection on page 293.
- Target properties. You can define target properties such as target load type, target update options, and reject options. For more information, see Target Properties on page 293.
- Truncate target tables. The Integration Service can truncate target tables before loading data. For more information, see Truncating Target Tables on page 298.
- Deadlock retry. You can configure the session to retry deadlocks when writing to targets or a recovery table. For more information, see Deadlock Retry on page 300.
- Drop and recreate indexes. Use pre- and post-session SQL to drop and recreate an index on a relational target table to optimize query speed. For more information, see Dropping and Recreating Indexes on page 301.
- Constraint-based loading. The Integration Service can load data to targets based on primary key-foreign key constraints and active sources in the session mapping. For more information, see Constraint-Based Loading on page 301.
- Bulk loading. You can specify bulk mode when loading to DB2, Microsoft SQL Server, Oracle, and Sybase databases. For more information, see Bulk Loading on page 304.
You can define the following properties in the session and override the properties you define in the mapping:
- Table name prefix. You can specify the target owner name or prefix in the session properties to override the table name prefix in the mapping. For more information, see Table Name Prefix on page 306.
- Pre-session SQL. You can create SQL commands and execute them in the target database before loading data to the target. For example, you might want to drop the index for the target table before loading data into it. For more information, see Using Pre- and Post-Session SQL Commands on page 223.
- Post-session SQL. You can create SQL commands and execute them in the target database after loading data to the target. For example, you might want to recreate the index for the target table after loading data into it. For more information, see Using Pre- and Post-Session SQL Commands on page 223.
- Target table name. You can override the target table name for each relational target. For more information, see Target Table Name on page 307.
If any target table or column name contains a database reserved word, you can create and maintain a reserved words file containing database reserved words. When the Integration Service executes SQL against the database, it places quotes around the reserved words. For more information, see Reserved Words on page 308.
When the Integration Service runs a session with at least one relational target, it performs database transactions per target connection group. For example, it commits all data to targets in a target connection group at the same time. For more information, see Working with Target Connection Groups on page 310.
Target Properties
You can configure session properties for relational targets in the Transformations view on the Mapping tab, and in the General Options settings on the Properties tab. Define the properties for each target instance in the session. When you click the Transformations view on the Mapping tab, you can view and configure the settings of a specific target. Select the target under the Targets node.
Figure 9-5 shows the relational target properties you define in the Properties settings on the Mapping tab:
Figure 9-5. Properties Settings on the Mapping Tab for a Relational Target
Table 9-2 describes the properties available in the Properties settings on the Mapping tab of the session properties:
Table 9-2. Relational Target Properties
- Target Load Type (required). You can choose Normal or Bulk. If you select Normal, the Integration Service loads targets normally. You can choose Bulk when you load to DB2, Sybase, Oracle, or Microsoft SQL Server. If you specify Bulk for other database types, the Integration Service reverts to a normal load. Choose Normal mode if the mapping contains an Update Strategy transformation. For more information, see Bulk Loading on page 304.
- Insert (optional). Integration Service inserts all rows flagged for insert. Default is enabled.
- Update (as Update) (optional). Integration Service updates all rows flagged for update. Default is enabled.
- Update (as Insert) (optional). Integration Service inserts all rows flagged for update. Default is disabled.
- Update (else Insert) (optional). Integration Service updates rows flagged for update if they exist in the target, and inserts remaining rows. Default is disabled.
- Delete (optional). Integration Service deletes all rows flagged for delete.
- Truncate Table (optional). Integration Service truncates the target table before loading. Default is disabled. For more information, see Truncating Target Tables on page 298.
- Reject File Directory (optional). Directory for the reject file. Default is the service process variable directory, $PMBadFileDir.
- Reject Filename (required). Name of the reject file. Default is the target instance name with a .bad extension.
*For more information, see Using Session-Level Target Properties with Source Properties on page 297. For more information about target update strategies, see Update Strategy Transformation in the PowerCenter Transformation Guide.
Figure 9-6 shows the test load options in the General Options settings on the Properties tab:
Figure 9-6. Test Load Options - Relational Targets
Table 9-3 describes the test load options on the General Options settings on the Properties tab:
Table 9-3. Test Load Options - Relational Targets
- Enable Test Load (optional). You can configure the Integration Service to perform a test load. With a test load, the Integration Service reads and transforms data without writing to targets. The Integration Service generates all session files, and performs all pre- and post-session functions, as if running the full session. The Integration Service writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources. You can perform a test load for relational targets when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.
- Number of Rows to Test (optional). Enter the number of source rows you want the Integration Service to test load. The Integration Service reads the number you configure for the test load.
- Inserts. If you treat source rows as inserts, select Insert for the target option. When you enable the Insert target row option, the Integration Service ignores the other target row options and treats all rows as inserts. If you disable the Insert target row option, the Integration Service rejects all rows.
- Deletes. If you treat source rows as deletes, select Delete for the target option. When you enable the Delete target option, the Integration Service ignores the other target-level row options and treats all rows as deletes. If you disable the Delete target option, the Integration Service rejects all rows.
- Updates. If you treat source rows as updates, the behavior of the Integration Service depends on the target options you select. Table 9-4 describes how the Integration Service loads the target when you configure the session to treat source rows as updates:
Table 9-4. Effect of Target Options when You Treat Source Rows as Updates
- Insert. If enabled, the Integration Service uses the target update option (Update as Update, Update as Insert, or Update else Insert) to update rows. If disabled, the Integration Service rejects all rows when you select Update as Insert or Update else Insert as the target-level update option.
- Update as Update.* Integration Service updates all rows as updates.
- Update as Insert.* Integration Service updates all rows as inserts. You must also select the Insert target option.
- Update else Insert.* Integration Service updates existing rows and inserts other rows as if marked for insert. You must also select the Insert target option.
- Delete. Integration Service ignores this setting and uses the selected target update option.
*The Integration Service rejects all rows if you do not select one of the target update options.
- Data Driven. If you treat source rows as data driven, you use an Update Strategy transformation to specify how the Integration Service handles rows. However, the behavior of the Integration Service also depends on the target options you select. Table 9-5 describes how the Integration Service loads the target when you configure the session to treat source rows as data driven:
Table 9-5. Effect of Target Options when You Treat Source Rows as Data Driven
- Insert. If enabled, the Integration Service inserts all rows flagged for insert. Enabled by default. If disabled, the Integration Service rejects the following rows:
  - Rows flagged for insert
  - Rows flagged for update if you enable Update as Insert or Update else Insert
- Update as Update.* Integration Service updates all rows flagged for update. Enabled by default.
- Update as Insert.* Integration Service inserts all rows flagged for update. Disabled by default.
- Update else Insert.* Integration Service updates rows flagged for update and inserts remaining rows as if marked for insert.
- Delete. If enabled, the Integration Service deletes all rows flagged for delete. If disabled, the Integration Service rejects all rows flagged for delete.
*The Integration Service rejects rows flagged for update if you do not select one of the target update options.
Truncating Target Tables

The Integration Service can truncate target tables before running a session. For most databases, the Integration Service issues a truncate table command. If you use a DB2 database on AS/400, the Integration Service issues a clrpfm command. If you use the Microsoft SQL Server ODBC driver, the Integration Service issues a delete statement.
If the Integration Service issues a truncate target table command and the target table instance specifies a table name prefix, the Integration Service verifies the database user privileges for the target table by issuing a truncate command. If the database user is not specified as the target owner name or does not have the database privilege to truncate the target table, the Integration Service issues a delete command instead. When the Integration Service issues a delete command, it writes the following message to the session log:
WRT_8208 Error truncating target table <target table name>. The Integration Service is trying the DELETE FROM query.
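As a rough sketch of the two statements, using a hypothetical owner and table name (SALES.T_CUSTOMERS is illustrative only):

TRUNCATE TABLE SALES.T_CUSTOMERS
-- If the database user lacks the truncate privilege,
-- the Integration Service issues a logged delete instead:
DELETE FROM SALES.T_CUSTOMERS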
If the Integration Service issues a delete command and the database has logging enabled, the database saves all deleted records to the log for rollback. If you do not want to save deleted records for rollback, you can disable logging to improve the speed of the delete. For all databases, if the Integration Service fails to truncate or delete any selected table because the user lacks the necessary privileges, the session fails. If you enable truncate target tables with the following sessions, the Integration Service does not truncate target tables:
- Incremental aggregation. When you enable both truncate target tables and incremental aggregation in the session properties, the Workflow Manager issues a warning that you cannot enable truncate target tables and incremental aggregation in the same session.
- Test load. When you enable both truncate target tables and test load, the Integration Service disables the truncate table function, runs a test load session, and writes the following message to the session log:
WRT_8105 Truncate target tables option turned off for test load session.
- Real-time. The Integration Service does not truncate target tables when you restart a JMS or WebSphere MQ real-time session that has recovery data.
To truncate a target table:

1. In the Workflow Manager, open the session properties.
2. Click the Mapping tab, and then click the Transformations view.
3. Select the target instance under the Targets node.
4. In the Properties settings, select Truncate Target Table Option for each target table you want the Integration Service to truncate before it runs the session.
5. Click OK.
Deadlock Retry
Select the Session Retry on Deadlock option in the session properties if you want the Integration Service to retry writes to a target database or recovery table on a deadlock. A deadlock occurs when more than one process attempts to obtain a lock on the same database row at the same time. The Integration Service may encounter a deadlock under the following conditions:

- A session writes to a partitioned target.
- Two sessions write simultaneously to the same target.
- Multiple sessions simultaneously write to the recovery table, PM_RECOVERY.
Encountering deadlocks can slow session performance. To improve session performance, you can increase the number of target connection groups the Integration Service uses to write to the targets in a session. To use a different target connection group for each target in a session, use a different database connection name for each target instance. You can specify the same connection information for each connection name. For more information, see Working with Target Connection Groups on page 310.
You can retry sessions on deadlock for targets configured for normal load. If you select this option and configure a target for bulk mode, the Integration Service does not retry target writes on a deadlock for that target. You can also configure the Integration Service to set the number of deadlock retries and the deadlock sleep time period. For more information about configuring the Integration Service, see the PowerCenter Administrator Guide. To retry a session on deadlock, click the Properties tab in the session properties and then scroll down to the Performance settings.
Dropping and Recreating Indexes

After you insert significant amounts of data into a target, you can drop and recreate indexes on that table to optimize query speed. You can drop and recreate indexes in one of the following ways:

- Using pre- and post-session SQL. The preferred method for dropping and recreating indexes is to define an SQL statement in the Pre SQL property that drops indexes before loading data to the target. Use the Post SQL property to recreate the indexes after loading data to the target. Define the Pre SQL and Post SQL properties for relational targets in the Transformations view on the Mapping tab in the session properties (see the sketch after this list). For more information, see Using Pre- and Post-Session SQL Commands on page 223.
- Using the Designer. The same dialog box you use to generate and execute DDL code for table creation can drop and recreate indexes. However, this process is not automatic. Every time you run a session that modifies the target table, you need to launch the Designer and use this feature.
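For example, a minimal sketch of Pre SQL and Post SQL statements, using hypothetical index, table, and column names and Oracle-style syntax:

-- Pre SQL: drop the index before loading data to the target
DROP INDEX idx_orders_cust;
-- Post SQL: recreate the index after loading data to the target
CREATE INDEX idx_orders_cust ON t_orders (customer_id);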
Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:
- Active source. Related target tables must have the same active source.
- Key relationships. Target tables must have key relationships.
- Target connection groups. Targets must be in one target connection group.
- Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.
Active Source
When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table. For more information about active sources, see Working with Active Sources on page 312.
Key Relationships
When target tables have no key relationships, the Integration Service does not perform constraint-based loading. Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The Integration Service cannot enforce constraint-based loading for these tables. It reverts to a normal load.
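The circular case can be pictured with two hypothetical tables (a DDL sketch, with illustrative names) whose foreign keys point at each other:

CREATE TABLE T_A (a_id INTEGER PRIMARY KEY, b_id INTEGER);
CREATE TABLE T_B (b_id INTEGER PRIMARY KEY, a_id INTEGER);
-- Each table references the other, so neither table can safely be loaded first:
ALTER TABLE T_A ADD CONSTRAINT fk_a_b FOREIGN KEY (b_id) REFERENCES T_B (b_id);
ALTER TABLE T_B ADD CONSTRAINT fk_b_a FOREIGN KEY (a_id) REFERENCES T_A (a_id);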
Target Connection Groups

The Integration Service enforces constraint-based loading for targets in the same target connection group. To verify that the targets are in the same target connection group, complete the following tasks:

- Verify all targets are in the same target load order group and receive data from the same active source.
- Use the default partition properties and do not add partitions or partition points.
- Define the same target type for all targets in the session properties.
- Define the same database connection name for all targets in the session properties.
- Choose normal mode for the target load type for all targets in the session properties.
For more information, see Working with Target Connection Groups on page 310.
If you want to update rows in the target, you cannot use constraint-based loading. Instead, use one of the following strategies:

- Load the primary key table in one mapping and the dependent tables in another mapping. Use constraint-based loading to load the primary table.
- Perform inserts in one mapping and updates in another mapping.
For more information about update strategies, see Update Strategy Transformation in the Transformation Guide. Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the Integration Service reads the sources in each target load order group in the mapping. A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint-based loading establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a single source qualifier.
Example
The session for the mapping in Figure 9-7 is configured to perform constraint-based loading. In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key. Since these four tables receive records from a single active source, SQ_A, the Integration Service loads rows to the target in the following order:
1. The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3.
2. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies on each other, they are not loaded in any particular order.
3. The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3.
Figure 9-7. Mapping Using Constraint-Based Loading
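To make the key relationships concrete, here is a DDL sketch of the four tables; the column names are hypothetical, and only the key structure comes from the example:

CREATE TABLE T_1 (pk1 INTEGER PRIMARY KEY);
CREATE TABLE T_2 (pk2 INTEGER PRIMARY KEY, fk1 INTEGER REFERENCES T_1 (pk1));
CREATE TABLE T_3 (pk3 INTEGER PRIMARY KEY, fk1 INTEGER REFERENCES T_1 (pk1));
CREATE TABLE T_4 (pk4 INTEGER PRIMARY KEY, fk3 INTEGER REFERENCES T_3 (pk3));
-- Load order: T_1 first, then T_2 and T_3 in either order, then T_4.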
After loading the first set of targets, the Integration Service begins reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets. If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order:
1. T_5
2. T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets.
To enable constraint-based loading:

1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
3. Click OK.
Bulk Loading
You can enable bulk loading when you load to DB2, Sybase, Oracle, or Microsoft SQL Server. If you enable bulk loading for other database types, the Integration Service reverts to a normal load. Bulk loading improves the performance of a session that inserts a large amount of data to the target database. Configure bulk loading on the Mapping tab. When bulk loading, the Integration Service invokes the database bulk utility and bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. Therefore, you must weigh the importance of improved session performance against the ability to recover an incomplete session. For more information about increasing session performance when bulk loading, see the PowerCenter Performance Tuning Guide.
Note: When loading to DB2, Microsoft SQL Server, and Oracle targets, you must specify a normal load for data driven sessions. When you specify bulk mode and data driven, the Integration Service reverts to normal load.
Committing Data
When bulk loading to Sybase and DB2 targets, the Integration Service ignores the commit interval you define in the session properties and commits data when the writer block is full. When bulk loading to Microsoft SQL Server and Oracle targets, the Integration Service commits data at each commit interval. Also, Microsoft SQL Server and Oracle start a new bulk load transaction after each commit.
Tip: When bulk loading to Microsoft SQL Server or Oracle targets, define a large commit interval to reduce the number of bulk load transactions and increase performance.
Oracle Guidelines
Use the following guidelines when bulk loading to Oracle:
- Do not define CHECK constraints in the database.
- Do not define primary and foreign keys in the database. However, you can define primary and foreign keys for the target definitions in the Designer.
- To bulk load into indexed tables, choose non-parallel mode and disable the Enable Parallel Mode option. For more information, see Relational Database Connections on page 50. Note that when you disable parallel mode, you cannot load multiple target instances, partitions, or sessions into the same table.
- To bulk load in parallel mode, you must drop indexes and constraints in the target tables before running a bulk load session. After the session completes, you can rebuild them. If you use bulk loading with the session on a regular basis, use pre- and post-session SQL to drop and rebuild indexes and key constraints (see the sketch after this list).
- When you use the LONG datatype, verify it is the last column in the table.
- Specify the Table Name Prefix for the target when you use Oracle client 9i. If you do not specify the table name prefix, the Integration Service uses the database login as the prefix.
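For parallel bulk loads, a sketch of Pre SQL and Post SQL statements that drop and rebuild a primary key constraint; the table, constraint, and column names are hypothetical, and the syntax is Oracle-style:

-- Pre SQL (before the bulk load):
ALTER TABLE t_orders DROP CONSTRAINT pk_t_orders;
-- Post SQL (after the bulk load):
ALTER TABLE t_orders ADD CONSTRAINT pk_t_orders PRIMARY KEY (order_id);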
DB2 Guidelines
Use the following guidelines when bulk loading to DB2:
- You must drop indexes and constraints in the target tables before running a bulk load session. After the session completes, you can rebuild them. If you use bulk loading with the session on a regular basis, use pre- and post-session SQL to drop and rebuild indexes and key constraints.
- You cannot use source-based or user-defined commit when you run bulk load sessions on DB2.
- If you create multiple partitions for a DB2 bulk load session, you must use database partitioning for the target partition type. If you choose any other partition type, the Integration Service reverts to normal load and writes the following message to the session log:
ODL_26097 Only database partitioning is supported for DB2 bulk load. Changing target load type variable to Normal.
- When you bulk load to DB2, the DB2 database writes non-fatal errors and warnings to a message log file in the session log directory. The message log file name is <session_log_name>.<target_instance_name>.<partition_index>.log. You can check both the message log file and the session log when you troubleshoot a DB2 bulk load session.
- If you want to bulk load flat files to IBM DB2 on MVS, use PowerExchange. For more information, see the PowerExchange DB2 Adapter Guide.
Table Name Prefix

The table name prefix is the owner of the target table. You can override the table name prefix in the mapping by entering the target owner name or prefix in the session properties. If you specify the table owner name in the SET sqlid statement in the connection environment SQL, the Integration Service uses the table owner name in the target instance. To use the table owner name specified in the SET sqlid statement, do not enter a name in the target name prefix.
Figure 9-8 shows the Properties settings where you define the table name prefix for relational targets:
Figure 9-8. Table Name Prefix Property
Target Table Name

You can override the target table name for each relational target in the session properties. Figure 9-9 shows the Properties settings where you define the target table name for relational targets:
Figure 9-9. Target Table Name Property
Reserved Words
If any table name or column name contains a database reserved word, such as MONTH or YEAR, the session fails with database errors when the Integration Service executes SQL against the database. You can create and maintain a reserved words file, reswords.txt, in the server/bin directory. When the Integration Service initializes a session, it searches for reswords.txt. If the file exists, the Integration Service places quotes around matching reserved words when it executes SQL against the database. Use the following rules and guidelines when working with reserved words:
- The Integration Service searches the reserved words file when it generates SQL to connect to source, target, and lookup databases.
- If you override the SQL for a source, target, or lookup, you must enclose any reserved word in quotes.
- You may need to enable some databases, such as Microsoft SQL Server and Sybase, to use SQL-92 standards regarding quoted identifiers. Use connection environment SQL to issue the command. For example, use the following command with Microsoft SQL Server:
SET QUOTED_IDENTIFIER ON
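As an illustration, with hypothetical table and column names, if a target table were named ORDER with a column named MONTH, the generated SQL would resemble:

INSERT INTO "ORDER" ("MONTH", AMOUNT) VALUES (?, ?)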
Working with Target Connection Groups

A target connection group is a group of targets that the Integration Service uses to determine commits and loading. When the Integration Service runs a session with at least one relational target, it performs the following database transactions per target connection group:

- Deadlock retry. If the Integration Service encounters a deadlock when it writes to a target, the deadlock affects targets in the same target connection group. The Integration Service still writes to targets in other target connection groups. For more information, see Deadlock Retry on page 300.
- Constraint-based loading. The Integration Service enforces constraint-based loading for targets in a target connection group. If you want to specify constraint-based loading, you must verify the primary table and foreign table are in the same target connection group. For more information, see Constraint-Based Loading on page 301.

Targets in the same target connection group meet the following criteria:

- Belong to the same partition.
- Belong to the same target load order group and transaction control unit.
- Have the same target type in the session.
- Have the same database connection name for relational targets, and Application connection name for SAP BW targets. For more information, see the PowerExchange for SAP NetWeaver User Guide.
- Have the same target load type, either normal or bulk mode.
For example, suppose you create a session based on a mapping that reads data from one source and writes to two Oracle target tables. In the Workflow Manager, you do not create multiple partitions in the session. You use the same Oracle database connection for both target tables in the session properties. You specify normal mode for the target load type for both target tables in the session properties. The targets in the session belong to the same target connection group.

Suppose you create a session based on the same mapping. In the Workflow Manager, you do not create multiple partitions. However, you use one Oracle database connection name for one target, and you use a different Oracle database connection name for the other target. You specify normal mode for the target load type for both target tables. The targets in the session belong to different target connection groups.
Note: When you define the target database connections for multiple targets in a session using session parameters, the targets may or may not belong to the same target connection group. The targets belong to the same target connection group if all session parameters resolve to the same target connection name. For example, you create a session with two targets and specify the session parameter $DBConnection1 for one target, and $DBConnection2 for the other target. In the parameter file, you define $DBConnection1 as Sales1 and you define $DBConnection2 as Sales1 and run the workflow. Both targets in the session belong to the same target connection group.
Working with Active Sources

An active source is an active transformation the Integration Service uses to generate rows. An active source can be any of the following transformations:

- Aggregator
- Application Source Qualifier
- Custom, configured as an active transformation
- Joiner
- MQ Source Qualifier
- Normalizer (VSAM or pipeline)
- Rank
- Sorter
- Source Qualifier
- XML Source Qualifier
- Mapplet, if it contains any of the above transformations
Note: Although the Filter, Router, Transaction Control, and Update Strategy transformations are active transformations, the Integration Service does not use them as active sources in a pipeline.

Active sources affect how the Integration Service processes a session when you use any of the following transformations or session properties:
- XML targets. The Integration Service can load data from different active sources to an XML target when each input group receives data from one active source. For more information about XML targets, see Working with XML Targets in the PowerCenter XML Guide.
- Transaction generators. Transaction generators, such as Transaction Control transformations, become ineffective for downstream transformations or targets if you put a transaction control point after it. Transaction control points are transaction generators and active sources that generate commits. For more information about effective and ineffective transaction generators, see Transaction Control Transformation in the Transformation Guide. For a list of transaction control points, see Transformation Scope on page 369.
- Mapplets. An Input transformation must receive data from a single active source. For more information about connecting mapplets to active sources in mappings, see Mapplets in the Designer Guide.
- Source-based commit. Some active sources generate commits. When you run a source-based commit session, the Integration Service generates a commit from these active sources at every commit interval. For more information about source-based commit sessions, see Source-Based Commits on page 360.
- Constraint-based loading. To use constraint-based loading, you must connect all related targets to the same active source. The Integration Service orders the target load on a row-by-row basis based on rows generated by an active source. For more information about constraint-based loading, see Constraint-Based Loading on page 301.
- Row error logging. If an error occurs downstream from an active source that is not a source qualifier, the Integration Service cannot identify the source row information for the logged error row. For more information about logging errors, see Overview on page 674.
Working with File Targets

You can output data to a flat file target in one of the following ways:

- Use a flat file target definition. Create a mapping with a flat file target definition. Create a session using the flat file target definition. When the Integration Service runs the session, it creates the target flat file or generates the target data based on the flat file target definition.
- Use a relational target definition. Use a relational definition to write to a flat file when you want to use an external loader to load the target. Create a mapping with a relational target definition. Create a session using the relational target definition. Configure the session to output to a flat file by specifying the File Writer in the Writers settings on the Mapping tab. For more information about using the external loader feature, see External Loading on page 711.

You can configure the following properties for flat file targets:

- Target properties. You can define target properties such as partitioning options, merge options, output file options, reject options, and command options. For more information, see Configuring Target Properties on page 314.
- Flat file properties. You can choose to create delimited or fixed-width files, and define their properties. For more information, see Configuring Fixed-Width Properties on page 319 and Configuring Delimited Properties on page 320.
Figure 9-10 shows the flat file target properties you define in the Properties settings on the Mapping tab:
Figure 9-10. Properties Settings in the Mapping Tab for a Flat File Target
Table 9-7 describes the properties you define in the Properties settings for flat file target definitions:
Table 9-7. Flat File Target Properties
- Merge Type (optional). Type of merge the Integration Service performs on the data for partitioned targets. For more information about partitioning targets and creating merge files, see Partitioning File Targets on page 461.
- Merge File Directory (optional). Name of the merge file directory. By default, the Integration Service writes the merge file in the service process variable directory, $PMTargetFileDir. If you enter a full directory and file name in the Merge File Name field, clear this field.
- Merge File Name (optional). Name of the merge file. Default is target_name.out. This property is required if you select a merge type.
- Header Options (optional). Create a header row in the file target. Choose No Header, Output Field Names, or Use header command output. Default is No Header.
- Header Command (optional). Command used to generate the header row in the file target.
- Footer Command (optional). Command used to generate the footer row in the file target.
- Output Type (required). Type of target for the session. Select File to write the target data to a flat file. Select Command to output data to a command.
- Merge Command (optional). Command used to process the merged target data for partitioned targets.
To send the target data to a command, select Command for the output type and enter a command for the Command property. For example, to generate a compressed file from the target data, use the following command:
compress -c - > $PMTargetFileDir/myCompressedFile.Z
The Integration Service sends the output data to the command, and the command generates a compressed file that contains the target data.
Note: You can also use service process variables, such as $PMTargetFileDir, in the command.
You cannot perform a test load on sessions using XML sources. You can perform a test load for relational targets when you configure a session for normal mode. If you configure the session for bulk mode, the session fails. Enable a test load on the session Properties tab.
Table 9-8 describes the test load options in the General Options settings on the Properties tab:
Table 9-8. Test Load Options - Flat File Targets
- Enable Test Load (optional). You can configure the Integration Service to perform a test load. With a test load, the Integration Service reads and transforms data without writing to targets. The Integration Service generates all session files and performs all pre- and post-session functions, as if running the full session. The Integration Service writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources. You can perform a test load for relational targets when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.
- Number of Rows to Test (optional). Enter the number of source rows you want the Integration Service to test load. The Integration Service reads the number you configure for the test load.
Configuring Fixed-Width Properties

To edit the fixed-width properties, select Fixed Width and click Advanced.
Table 9-9 describes the options you define in the Fixed Width Properties dialog box:
Table 9-9. Writing to a Fixed-Width Target
- Null Character (required). Enter the character you want the Integration Service to use to represent null values. You can enter any valid character in the file code page. For more information about using null characters for target files, see Null Characters in Fixed-Width Files on page 327.
- Repeat Null Character (optional). Select this option to indicate a null value by repeating the null character to fill the field. If you do not select this option, the Integration Service enters a single null character at the beginning of the field to represent a null value. For more information about specifying null characters for target files, see Null Characters in Fixed-Width Files on page 327.
- Code Page (required). Code page of the fixed-width file. Select a code page or a variable:
  - Code page. Select the code page.
  - Use Variable. Enter a user-defined workflow or worklet variable or the session parameter $ParamName, and define the code page in the parameter file. For more information about supported code pages, see the PowerCenter Administrator Guide.
  Default is the PowerCenter Client code page.
Configuring Delimited Properties

To edit the delimited properties, select Delimited and click Advanced. Figure 9-14 shows the Delimited File Properties dialog box:
Figure 9-14. Delimited File Properties Dialog Box
Table 9-10 describes the options you can define in the Delimited File Properties dialog box:
Table 9-10. Delimited File Properties
- Delimiters (required). Character used to separate columns of data. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). To enter a single-byte unprintable character, click the Browse button to the right of this field. In the Delimiters dialog box, select an unprintable character from the Insert Delimiter list and click Add. You cannot select unprintable multibyte characters as delimiters.
- Optional Quotes (required). Select None, Single, or Double. If you select a quote character, the Integration Service does not treat delimiter characters within the quote characters as a delimiter. For example, suppose an output file uses a comma as a delimiter and the Integration Service receives the following row: 342-3849, 'Smith, Jenna', 'Rockville, MD', 6. If you select the optional single quote character, the Integration Service ignores the commas within the quotes and writes the row as four fields. If you do not select the optional single quote, the Integration Service writes six separate fields.
- Code Page (required). Code page of the delimited file. Select a code page or a variable:
  - Code page. Select the code page.
  - Use Variable. Enter a user-defined workflow or worklet variable or the session parameter $ParamName, and define the code page in the parameter file. For more information about supported code pages, see the PowerCenter Administrator Guide.
  Default is the PowerCenter Client code page.
When you write to flat file targets, consider the following issues:

- Write to fixed-width flat files from relational target definitions. The Integration Service adds spaces to target columns based on transformation datatype.
- Write to fixed-width flat files from flat file target definitions. You must configure the precision and field width for flat file target definitions to accommodate the total length of the target field.
- Generate flat file targets by transaction. You can configure the file target to generate a separate output file for each transaction.
- Write multibyte data to fixed-width files. You must configure the precision of string columns to accommodate character data. When writing shift-sensitive data to a fixed-width flat file target, the Integration Service adds shift characters and spaces to meet file requirements.
- Null characters in fixed-width files. The Integration Service writes repeating or non-repeating null characters to fixed-width target file columns differently depending on whether the characters are single- or multibyte.
- Character set. You can write ASCII or Unicode data to a flat file target.
- Write metadata to flat file targets. You can configure the Integration Service to write the column header information when you write to flat file targets.
Table 9-11 describes the number of bytes the Integration Service adds to the target column and optional characters it uses for each datatype:
Table 9-11. Datatype Modifications for File Target Columns
- Decimal. Bytes added: 2. Optional characters: negative sign (-) for the mantissa; decimal point (.).
- Double. Bytes added: 7. Optional characters: negative sign for the mantissa; decimal point; negative sign, e, and three digits for the exponent, for example, -4.2e-123.
- Float. Bytes added: 7. Optional characters: negative sign for the mantissa; decimal point; negative sign, e, and three digits for the exponent.
- Integer. Bytes added: 1. Optional characters: negative sign for the mantissa.
- Money. Bytes added: 2. Optional characters: negative sign for the mantissa; decimal point.
- Numeric. Bytes added: 2. Optional characters: negative sign for the mantissa; decimal point.
- Real. Bytes added: 7. Optional characters: negative sign for the mantissa; decimal point; negative sign, e, and three digits for the exponent.
If the target data does not fit in the target column, the Integration Service performs one of the following actions:

- Truncates the row for string columns
- Writes the row to the reject file for numeric and datetime columns

Note: When the Integration Service writes a row to the reject file, it writes a message in the session log.

When a session writes to a fixed-width flat file based on a fixed-width flat file target definition in the mapping, the Integration Service defines the total length of a field by the precision or field width defined in the target. Fixed-width files are byte-oriented, which means the total length of a field is measured in bytes.
Table 9-12 describes how the Integration Service measures the total field length for fields in a fixed-width flat file target definition:
Table 9-12. Field Length Measurements for Fixed-Width Flat File Targets
- Number. Field width determines the total field length.
- String. Precision determines the total field length.
- Datetime. Field width determines the total field length.
Table 9-13 lists the characters you must accommodate when you configure the precision or field width for flat file target definitions to accommodate the total length of the target field:
Table 9-13. Characters to Include when Calculating Field Length for Fixed-Width Targets
- Number. Decimal separator; thousands separators; negative sign (-) for the mantissa.
- String. Multibyte data; shift-in and shift-out characters. For more information, see Writing Multibyte Data to Fixed-Width Flat Files on page 326.
- Datetime. Date and time separators, such as slashes (/), dashes (-), and colons (:). For example, the format MM/DD/YYYY HH24:MI:SS.US has a total length of 26 bytes.
When you edit the flat file target definition in the mapping, define the precision or field width large enough to accommodate both the target data and the characters in Table 9-13. For example, suppose you have a mapping with a fixed-width flat file target definition. The target definition contains a number column with a precision of 10 and a scale of 2. You use a comma as the decimal separator and a period as the thousands separator. You know some rows of data might have a negative value. Based on this information, you know the longest possible number has the following format:
-NN.NNN.NNN,NN
Open the flat file target definition in the mapping and define the field width for this number column as a minimum of 14 bytes. For more information about formatting numeric and datetime values, see Working with Flat Files in the Designer Guide.
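The 14-byte minimum follows directly from the characters in Table 9-13:

10 digits (precision) + 1 negative sign + 1 decimal separator + 2 thousands separators = 14 bytes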
Generating Flat File Targets By Transaction

You can generate a separate output file each time the Integration Service starts a new transaction. To generate a separate output file for each transaction, add a FileName port to the flat file target definition. When you connect the FileName port in the mapping, the Integration Service creates a separate target file at each commit point. The Integration Service uses the FileName port value from the first row in each transaction to name the output file. For more information about generating a separate output file by transaction, see Creating Target Files by Transaction on page 373.
Writing Multibyte Data to Fixed-Width Flat Files

If you plan to load multibyte data into a flat file target, configure the precision to accommodate the multibyte data:

- Non shift-sensitive multibyte data. The file contains all multibyte data. Configure the precision in the target definition to allow for the additional bytes. For example, you know that the target data contains four double-byte characters, so you define the target definition with a precision of 8 bytes. If you configure the target definition with a precision of 4, the Integration Service truncates the data before writing to the target.
- Shift-sensitive multibyte data. The file contains single-byte and multibyte data. When writing to a shift-sensitive flat file target, the Integration Service adds shift characters and spaces to meet file requirements. You must configure the precision in the target definition to allow for the additional bytes and the shift characters. For more information, see Writing Shift-Sensitive Multibyte Data on page 326.
Note: Delimited files are character-oriented, and you do not need to allow for additional precision for multibyte data.

Writing Shift-Sensitive Multibyte Data

When writing shift-sensitive data to a fixed-width target file, the Integration Service handles the data as follows:

- If a column begins or ends with a double-byte character, the Integration Service adds shift characters so the column begins and ends with a single-byte shift character.
- If the data is shorter than the column width, the Integration Service pads the rest of the column with spaces.
- If the data is longer than the column width, the Integration Service truncates the data so the column ends with a single-byte shift character.
To illustrate how the Integration Service handles a fixed-width file containing shift-sensitive data, say you want to output the following data to the target:
SourceCol1    SourceCol2
AAAA          aaaa
The first target column contains eight bytes and the second target column contains four bytes. The Integration Service must add shift characters to handle shift-sensitive data. Since the first target column can handle eight bytes, the Integration Service truncates the data before it can add the shift characters.
TargetCol1    TargetCol2
-oAAA-i       aaaa

The -o and -i symbols represent the shift-out and shift-in characters. For the first target column, the Integration Service writes three of the double-byte characters to the target. It cannot write any additional double-byte characters to the output column because the column must end in a single-byte character. If you add two more bytes to the first target column definition, then the Integration Service can add shift characters and write all the data without truncation. For the second target column, the Integration Service writes all four single-byte characters to the target. It does not add shift characters to the column because the column begins and ends with single-byte characters.
Character Set
You can configure the Integration Service to run sessions with flat file targets in either ASCII or Unicode data movement mode. If you configure a session with a flat file target to run in Unicode data movement mode, the target file code page must be a superset of the source code page. Delimiters, escape, and null characters must be valid in the specified code page of the flat file. If you configure a session to run in ASCII data movement mode, delimiters, escape, and null characters must be valid in the ISO Western European Latin1 code page. Any 8-bit character you specified in previous versions of PowerCenter is still valid. For more information about configuring and working with data movement modes and code pages, see the PowerCenter Administrator Guide.
Writing Metadata to Flat File Targets

You can configure the Integration Service to write column header information when you write to flat file targets. For example, suppose a flat file target definition contains the columns ITEM_ID, ITEM_NAME, and PRICE, and the column width for ITEM_ID is six. When you enable the Output Metadata For Flat File Target option, the Integration Service writes the following text to a flat file:
#ITEM_ITEM_NAME           PRICE
100001Screwdriver         9.50
100002Hammer              12.90
100003Small nails         3.00
For information about configuring the Integration Service to output flat file metadata, see the PowerCenter Administrator Guide.
Working with Heterogeneous Targets

You can output data to multiple targets in the same session. When the target types or the database types of those targets differ from each other, you have a session with heterogeneous targets. A session has heterogeneous targets when it includes one of the following combinations:

- Multiple target types. You can create a session that writes to both relational and flat file targets.
- Multiple target connection types. You can create a session that writes to a target on an Oracle database and to a target on a DB2 database. Or, you can create a session that writes to multiple targets of the same type, but you specify different target connections for each target in the session.
All database connections you define in the Workflow Manager are unique to the Integration Service, even if you define the same connection information. For example, you define two database connections, Sales1 and Sales2. You define the same user name, password, connect string, code page, and attributes for both Sales1 and Sales2. Even though both Sales1 and Sales2 define the same connection information, the Integration Service treats them as different database connections. When you create a session with two relational targets and specify Sales1 for one target and Sales2 for the other target, you create a session with heterogeneous targets. You can create a session with heterogeneous targets in one of the following ways:
- Create a session based on a mapping with targets of different types or different database types. In the session properties, keep the default target types and database types.
- Create a session based on a mapping with the same target types. However, in the session properties, specify different target connections for the different target instances, or override the target type to a different type. For example, you can override one of the following target types:
  - Relational target to flat file.
  - Relational target to any other relational database type. Verify the datatypes used in the target definition are compatible with both databases.
  - SAP BW target to a flat file target type.
Note: When the Integration Service runs a session with at least one relational target, it performs database transactions per target connection group. For example, it orders the target load for targets in a target connection group when you enable constraint-based loading. For more information, see Working with Target Connection Groups on page 310.
Reject Files
During a session, the Integration Service creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the Integration Service writes the rejected row into the reject file. The reject file and session log contain information that helps you determine the cause of the reject. Each time you run a session, the Integration Service appends rejected data to the reject file. Depending on the source of the problem, you can correct the mapping and target database to prevent rejects in subsequent sessions.
Note: If you enable row error logging in the session properties, the Integration Service does not
create a reject file. It writes the reject rows to the row error tables or file.
By default, the Integration Service names the reject file after the target instance name, with a .bad extension. The Workflow Manager replaces slash characters in the target instance name with underscore characters. To find a reject file name and path, view the target properties settings on the Mapping tab of session properties.
Each row in the reject file contains a row indicator and column indicators:

- Row indicator. The first column in each row of the reject file is the row indicator. The row indicator defines whether the row was marked for insert, update, delete, or reject. If the session is a user-defined commit session, the row indicator might indicate whether the transaction was rolled back due to a non-fatal error, or if the committed transaction was in a failed target connection group. For more information about user-defined commit sessions and rejected rows, see User-Defined Commits on page 365.
- Column indicator. Column indicators appear after every column of data. The column indicator defines whether the column contains valid, overflow, null, or truncated data.
The following sample reject file shows the row and column indicators:
0,D,1921,D,Nelson,D,William,D,415-541-5145,D 0,D,1922,D,Page,D,Ian,D,415-541-5145,D 0,D,1923,D,Osborne,D,Lyle,D,415-541-5145,D 0,D,1928,D,De Souza,D,Leo,D,415-541-5145,D
0,D,2001123456789,O,S. MacDonald,D,Ira,D,415-541-514566,T
Row Indicators
The first column in the reject file is the row indicator. The row indicator is a flag that defines the update strategy for the data row. Table 9-14 describes the row indicators in a reject file:
Table 9-14. Row Indicators in Reject File
Row Indicator   Meaning              Rejected By
0               Insert               Writer or target
1               Update               Writer or target
2               Delete               Writer or target
3               Reject               Writer
4               Rolled-back insert   Writer
5               Rolled-back update   Writer
6               Rolled-back delete   Writer
7               Committed insert     Writer
8               Committed update     Writer
9               Committed delete     Writer
If a row indicator is 0, 1, or 2, the writer or the target database rejected the row. If a row indicator is 3, an update strategy expression marked the row for reject.
Column Indicators
A column indicator appears after every column of data. A column indicator defines whether the data is valid, overflow, null, or truncated.
Note: A column indicator D also appears after each row indicator.
Column Indicator   Meaning
D                  Valid data.
O                  Overflow. Numeric data exceeded the specified precision or scale for the column.
N                  Null. The column contains a null value.
T                  Truncated. String data exceeded a specified precision for the column, so the Integration Service truncated it.
Null columns appear in the reject file with commas marking their column. An example of a null column surrounded by good data appears as follows:
5,D,,N,5,D
Either the writer or target database can reject a row. Consult the session log to determine the cause for rejection.
Chapter 10
Real-time Processing
This chapter includes the following topics:

- Overview, 336
- Understanding Real-time Data, 337
- Configuring Real-time Sessions, 340
- Terminating Conditions, 341
- Flush Latency, 343
- Commit Type, 344
- Message Recovery, 345
- Recovery File, 346
- Recovery Table, 348
- Stopping Real-time Sessions, 350
- Restarting and Recovering Real-time Sessions, 351
- Rules and Guidelines, 353
- Informatica Real-time Products, 354
Overview
This chapter contains general information about real-time processing. Real-time processing behavior depends on the real-time source. Exceptions are noted in this chapter or are described in the corresponding product documentation. For more information about real-time processing for your Informatica real-time product, see the Informatica product documentation. For more information about real-time processing for PowerExchange, see PowerExchange Interfaces for PowerCenter.

You can use PowerCenter to process data in real time. Real-time processing is on-demand processing of data from real-time sources. A real-time session reads, processes, and writes data to targets continuously. By default, a session reads and writes bulk data at scheduled intervals unless you configure the session for real-time processing. To process data in real time, the data must originate from a real-time source. Real-time sources include JMS, WebSphere MQ, TIBCO, webMethods, MSMQ, SAP, and web services. You might want to use real-time processing for processes that require immediate access to dynamic data, such as financial data.

To understand real-time processing with PowerCenter, you need to be familiar with the following concepts:
- Real-time data. Real-time data includes messages and message queues, web services messages, and changed source data. Real-time data originates from a real-time source.
- Real-time sessions. A real-time session is a session that processes real-time source data. A session is real-time if the Integration Service generates a real-time flush based on the flush latency configuration and all transformations propagate the flush to the targets. Latency is the period of time from when source data changes on a source to when a session writes the data to a target.
- Real-time properties. Real-time properties determine when the Integration Service processes the data and commits the data to the target:
  - Terminating conditions. Terminating conditions determine when the Integration Service stops reading data from the source and ends the session if you do not want the session to run continuously.
  - Flush latency. Flush latency determines how often the Integration Service flushes real-time data from the source.
  - Commit type. The commit type determines when the Integration Service commits real-time data to the target.
- Message recovery. If the real-time session fails, you can recover messages. When you enable message recovery for a real-time session, the Integration Service stores source messages or message IDs in a recovery file or table. If the session fails, you can run the session in recovery mode to recover messages the Integration Service could not process.
Real-time data includes the following types of data:

- Messages and message queues. Process messages and message queues from WebSphere MQ, JMS, MSMQ, SAP, TIBCO, and webMethods sources. You can read from messages and message queues. You can write to messages, messaging applications, and message queues.
- Web service messages. Receive a message from a web service client through the Web Services Hub and transform the data. You can write the data to a target or send a message back to a web service client.
- Changed source data. Extract changed data that represents changes to a relational database or file source from the change stream and write to a target.
The messaging application and the Integration Service complete the following tasks to process messages from a message queue:

1. The messaging application adds a message to a queue.
2. The Integration Service reads the message from the queue and extracts the data.
3. The Integration Service processes the data.
4. The Integration Service writes a reply message to a queue.
The web service client, the Web Services Hub, and the Integration Service complete the following tasks to process web service messages:

1. The web service client sends a SOAP request to the Web Services Hub.
2. The Web Services Hub processes the SOAP request and passes the request to the Integration Service.
3. The Integration Service runs the service request. It sends a response to the Web Services Hub or writes the data to a target.
4. If the Integration Service sends a response to the Web Services Hub, the Web Services Hub generates a SOAP message reply and passes the reply to the web service client.
Figure 10-3 shows an example of how PowerExchange Client for PowerCenter, PowerExchange, and the Integration Service process changed data:
Figure 10-3. Changed Data Processing
PowerExchange Client for PowerCenter, PowerExchange, and the Integration Service complete the following tasks to process changed data:

1. PowerExchange Client for PowerCenter connects to PowerExchange.
2. PowerExchange extracts changed data from the change stream.
3. PowerExchange passes changed data to the Integration Service through PowerExchange Client for PowerCenter.
4. The Integration Service transforms and writes data to a target.
For more information about changed source data, see PowerExchange Interfaces for PowerCenter.
When you configure a real-time session, you configure the following properties:

- Terminating conditions. Define the terminating conditions to determine when the Integration Service stops reading from a source and ends the session. For more information, see Terminating Conditions on page 341.
- Flush latency. Define a session with flush latency to read and write real-time data. Flush latency determines how often the session commits data to the targets. For more information, see Flush Latency on page 343.
- Commit type. Define a source- or target-based commit type for real-time sessions. With a source-based commit, the Integration Service commits messages based on the commit interval and the flush latency interval. With a target-based commit, the Integration Service commits messages based on the flush latency interval. For more information, see Commit Type on page 344.
- Message recovery. Enable recovery for a real-time session to recover messages from a failed session. For more information, see Message Recovery on page 345.
Terminating Conditions
A terminating condition determines when the Integration Service stops reading messages from a real-time source and ends the session. When the Integration Service reaches a terminating condition, it stops reading from the real-time source. It processes the messages it read and commits data to the target. Then, it ends the session. You can configure the following terminating conditions:
- Idle time
- Message count
- Reader time limit
If you configure multiple terminating conditions, the Integration Service stops reading from the source when it meets the first condition. By default, the Integration Service reads messages continuously and uses the flush latency to determine when it flushes data from the source. After the flush, the Integration Service resets the counters for the terminating conditions.
Idle Time
Idle time is the amount of time in seconds the Integration Service waits to receive messages before it stops reading from the source. -1 indicates an infinite period of time. For example, if the idle time for a JMS session is 30 seconds, the Integration Service waits 30 seconds after reading from JMS. If no new messages arrive in JMS within 30 seconds, the Integration Service stops reading from JMS. It processes the messages and ends the session.
Message Count
Message count is the number of messages the Integration Service reads from a real-time source before it stops reading from the source. -1 indicates an infinite number of messages. For example, if the message count in a JMS session is 100, the Integration Service stops reading from the source after it reads 100 messages. It processes the messages and ends the session.
Note: The name of the message count terminating condition depends on the Informatica product. For example, the message count for PowerExchange for SAP NetWeaver mySAP Option is called Packet Count. The message count for PowerExchange Client for PowerCenter is called UOW Count.
Reader Time Limit

Use the reader time limit to read messages from a real-time source for a set period of time. 0 indicates an infinite period of time. For example, if you use a 10 second time limit, the Integration Service stops reading from the messaging application after 10 seconds. It processes the messages and ends the session.
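The three conditions interact as a simple first-condition-wins check inside the reader loop. The following sketch is only a conceptual model of that logic, not Integration Service code; the parameter names mirror the session properties described above, with -1 and 0 meaning no limit as noted.

// Conceptual model of terminating conditions; not actual Integration Service code.
public final class TerminatingConditions {
    static boolean shouldStopReading(long idleSeconds, long idleTimeSeconds,
                                     long messagesRead, long messageCountLimit,
                                     long elapsedSeconds, long readerTimeLimitSeconds) {
        if (idleTimeSeconds != -1 && idleSeconds >= idleTimeSeconds) {
            return true;  // no message arrived within the idle time
        }
        if (messageCountLimit != -1 && messagesRead >= messageCountLimit) {
            return true;  // message count reached
        }
        if (readerTimeLimitSeconds != 0 && elapsedSeconds >= readerTimeLimitSeconds) {
            return true;  // reader time limit reached
        }
        return false;     // keep reading; flush latency governs flushes
    }
}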
Flush Latency
Use flush latency to run a session in real time. Flush latency determines how often the Integration Service flushes data from the source. For example, if you set the flush latency to 10 seconds, the Integration Service flushes data from the source every 10 seconds. For changed source data, the flush latency interval is determined by the flush latency and the unit of work (UOW) count attributes. For more information, see PowerExchange Interfaces for PowerCenter.

The Integration Service uses the following process when it reads data from a real-time source and the session is configured with flush latency:
1. The Integration Service reads data from the source. The flush latency interval begins when the Integration Service reads the first message from the source.
2. At the end of the flush latency interval, the Integration Service stops reading data from the source.
3. The Integration Service processes messages and writes them to the target.
4. The Integration Service reads from the source again until it reaches the next flush latency interval.
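Expressed as pseudocode, the flush cycle described above might look like the following sketch. It is a conceptual model only; readMessage, processAndWrite, and terminatingConditionMet stand in for internal behavior and are not a real API.

import java.util.ArrayList;
import java.util.List;

// Conceptual model of the flush latency cycle; not actual Integration Service code.
public class FlushLatencyModel {
    void run(long flushLatencyMillis) {
        while (!terminatingConditionMet()) {
            List<Object> batch = new ArrayList<>();
            long intervalStart = 0;

            // 1. Read from the source; the interval begins at the first message.
            while (true) {
                Object message = readMessage();
                if (message != null) {
                    if (batch.isEmpty()) intervalStart = System.currentTimeMillis();
                    batch.add(message);
                }
                // 2. Stop reading at the end of the flush latency interval.
                if (!batch.isEmpty()
                        && System.currentTimeMillis() - intervalStart >= flushLatencyMillis) {
                    break;
                }
            }
            // 3. Process the messages and write them to the target.
            processAndWrite(batch);
            // 4. Loop back and read until the next flush latency interval.
        }
    }

    private boolean terminatingConditionMet() { return true; }  // stub: stop immediately
    private Object readMessage() { return null; }               // stub
    private void processAndWrite(List<Object> batch) { }        // stub
}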
Configure flush latency in seconds. The default value is zero, which indicates that the flush latency is disabled and the session does not run in real time.

Configure the flush latency interval depending on how dynamic the data is and how quickly users need to access the data. If data becomes outdated quickly, such as financial trading information, configure a lower flush latency interval so the target tables are updated as close as possible to when the changes occurred. For example, users need updated financial data every few minutes, but they need updated customer address changes only once a day. Configure a lower flush latency interval for financial data and a higher flush latency interval for address changes.

Use the following rules and guidelines when you configure flush latency:
- The Integration Service does not buffer messages longer than the flush latency interval.
- The lower you set the flush latency interval, the more frequently the Integration Service commits messages to the target.
- If you use a low flush latency interval, the session can consume more system resources.
- If you configure a commit interval, a combination of the flush latency and the commit interval determines when the data is committed to the target. For more information, see Commit Type on page 344.
Commit Type
The Integration Service commits data to the target based on the flush latency and the commit type. You can configure a session to use the following commit types:
- Source-based commit. When you configure a source-based commit, the Integration Service commits data to the target using a combination of the commit interval and the flush latency interval. The first condition the Integration Service meets triggers the end of the flush latency period. After the flush, the counters are reset. For example, you set the flush latency to five seconds and the source-based commit interval to 1,000 messages. The Integration Service commits messages to the target either after reading 1,000 messages from the source or after five seconds, whichever comes first.
- Target-based commit. When you configure a target-based commit, the Integration Service ignores the commit interval and commits data to the target based on the flush latency interval.
When writing to targets in a real-time session, the Integration Service processes commits serially and commits data to the target in real time. It does not store data in the DTM buffer memory. For more information about commit types and intervals, see Understanding Commit Points on page 357.
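The difference between the two commit types reduces to which conditions can trigger a commit. A conceptual sketch, with hypothetical parameter names:

// Conceptual model of real-time commit triggering; not actual Integration Service code.
public class CommitTypeModel {
    static boolean shouldCommit(boolean sourceBasedCommit, long messagesRead,
                                long commitInterval, long millisSinceLastFlush,
                                long flushLatencyMillis) {
        boolean latencyExpired = millisSinceLastFlush >= flushLatencyMillis;
        if (sourceBasedCommit) {
            // Source-based: commit interval or flush latency, whichever is met first.
            return latencyExpired || messagesRead >= commitInterval;
        }
        // Target-based: the commit interval is ignored; only flush latency applies.
        return latencyExpired;
    }
}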
Message Recovery
When you enable message recovery for a real-time session, the Integration Service can recover unprocessed messages from a failed session. The Integration Service stores source messages or message IDs in a recovery file or table. If the session fails, run the session in recovery mode to recover the messages the Integration Service did not process. Depending on the real-time source and the target type, the messages or message IDs are stored in the following storage types:
- Recovery file. Messages or message IDs are stored in a designated local recovery file. A session with a real-time source and a non-relational target uses the recovery file. For more information, see Recovery File on page 346.
- Recovery table. Message IDs are stored in a recovery table in the target database. A session with a JMS or WebSphere MQ source and a relational target uses the recovery table. For more information, see Recovery Table on page 348.
A session can use both storage types. For example, a session with a JMS and TIBCO source uses a recovery file and recovery table.

When you recover a real-time session, the Integration Service restores the state of operation from the point of interruption. It reads and processes the messages in the recovery file or table. Then, it ends the session. During recovery, the terminating conditions do not affect the messages the Integration Service reads from the recovery file or table. For example, if you specified message count and idle time for the session, the conditions apply to the messages the Integration Service reads from the source, not the recovery file or table.

Sessions with MSMQ sources, web service messages, or changed source data use a different recovery strategy. For more information, see the PowerExchange for MSMQ User Guide, PowerCenter Web Services Provider Guide, or PowerExchange Interfaces for PowerCenter.
The Integration Service stores messages in the location indicated by the recovery cache directory. The default recovery cache directory is $PMCacheDir.
Recovery File
The Integration Service stores messages or message IDs in a recovery file for real-time sessions that are enabled for recovery and include the following source and target types:
- JMS source with non-relational targets
- WebSphere MQ source with non-relational targets
- SAP R/3 source and all targets
- webMethods source and all targets
- TIBCO source and all targets
The Integration Service temporarily stores messages or message IDs in a local recovery file that you configure in the session properties. During recovery, the Integration Service processes the messages in this recovery file to ensure that data is not lost.
Message Processing
Figure 10-4 shows how the Integration Service processes messages using the recovery file:
Figure 10-4. Message Processing with Recovery File
The Integration Service completes the following tasks to process messages using recovery files:
1. The Integration Service reads a message from the source.
2. For sessions with JMS and WebSphere MQ sources, the Integration Service writes the message ID to the recovery file. For all other sessions, the Integration Service writes the message to the recovery file.
3. For sessions with SAP R/3, webMethods, or TIBCO sources, the Integration Service sends an acknowledgement to the source to confirm it read the message. The source deletes the message.
4. The Integration Service repeats steps 1 - 3 until the flush latency is met.
5. The Integration Service processes the messages and writes them to the target.
6. The target commits the messages.
7. For sessions with JMS and WebSphere MQ sources, the Integration Service sends a batch acknowledgement to the source to confirm it read the messages. The source deletes the messages.
8. The Integration Service clears the recovery file.
Message Recovery
When you recover a real-time session, the Integration Service reads and processes the cached messages. After the Integration Service reads all cached messages, it ends the session. For sessions with JMS and WebSphere MQ sources, the Integration Service uses the message ID in the recovery file to retrieve the message from the source.

The Integration Service clears the recovery file after the flush latency period expires and at the end of a successful session. If the session fails after the Integration Service commits messages to the target but before it removes the messages from the recovery file, targets can receive duplicate rows during recovery.
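A highly simplified model of the JMS and WebSphere MQ variant of this algorithm, storing message IDs during the normal run and replaying them during recovery, might look like the following. The file format, fetchFromSourceById, and processAndWriteToTarget are assumptions for illustration, not the actual recovery file layout.

import java.io.IOException;
import java.nio.file.*;
import java.util.List;

// Conceptual model of recovery-file processing; not actual Integration Service code.
public class RecoveryFileModel {
    private final Path recoveryFile;

    public RecoveryFileModel(Path recoveryFile) {
        this.recoveryFile = recoveryFile;
    }

    // Normal run: record each message ID before processing. JMS and
    // WebSphere MQ sessions store IDs; other sources store the whole message.
    void recordMessageId(String messageId) throws IOException {
        Files.write(recoveryFile, (messageId + "\n").getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Recovery run: re-fetch each cached ID from the source, process it,
    // then clear the file once the session completes successfully.
    void recover() throws IOException {
        List<String> ids = Files.readAllLines(recoveryFile);
        for (String id : ids) {
            Object message = fetchFromSourceById(id);  // assumed source API
            processAndWriteToTarget(message);          // assumed pipeline API
        }
        Files.delete(recoveryFile);
    }

    private Object fetchFromSourceById(String id) { return id; }  // stub
    private void processAndWriteToTarget(Object message) { }      // stub
}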
Recovery Table
The Integration Service stores message IDs in a recovery table for real-time sessions that are enabled for recovery and include the following source and target types:
- JMS source with relational targets
- WebSphere MQ source with relational targets
The Integration Service temporarily stores message IDs and commit numbers in a recovery table on each target database. The commit number indicates the number of commits that the Integration Service issued to the target. During recovery, the Integration Service uses the commit number to determine if it wrote the same number of messages to all targets. The message IDs and commit numbers are verified against the recovery table to ensure that no data is lost or duplicated.
Note: The source must use unique message IDs and provide access to the messages through the message ID.
PM_REC_STATE Table
When the Integration Service runs a real-time session that uses the recovery table and has recovery enabled, it creates a recovery table, PM_REC_STATE, on the target database to store message IDs and commit numbers. When the Integration Service recovers the session, it uses information in the recovery tables to determine if it needs to write the message to the target table. For more information, see Target Recovery Tables on page 378.
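During recovery, the check against PM_REC_STATE is essentially a lookup by message ID plus a comparison of commit numbers across targets. A JDBC sketch of that idea follows; the column names MSG_ID and COMMIT_NO are hypothetical, because the actual table layout is created and managed by the Integration Service.

import java.sql.*;

// Conceptual model of the recovery-table lookup. The PM_REC_STATE column
// names MSG_ID and COMMIT_NO below are hypothetical.
public class RecoveryTableCheck {
    static boolean messageReachedTarget(Connection target, String messageId)
            throws SQLException {
        String sql = "SELECT COMMIT_NO FROM PM_REC_STATE WHERE MSG_ID = ?";
        try (PreparedStatement stmt = target.prepareStatement(sql)) {
            stmt.setString(1, messageId);
            try (ResultSet rs = stmt.executeQuery()) {
                // If the message ID exists, the message was committed to this
                // target; comparing COMMIT_NO across targets tells the
                // Integration Service whether all targets are in step.
                return rs.next();
            }
        }
    }
}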
Message Processing
Figure 10-5 shows how the Integration Service processes messages using the recovery table:
Figure 10-5. Message Processing with Recovery Table
The Integration Service completes the following tasks to process messages using recovery tables:
1. The Integration Service reads one message at a time until the flush latency is met.
2. The Integration Service stops reading from the source. It processes the messages and writes them to the target.
3. The Integration Service writes the message IDs, commit numbers, and the transformation states to the recovery table on the target database and writes the messages to the target simultaneously.
4. When the target commits the messages, the Integration Service sends an acknowledgement to the real-time source to confirm that all messages were processed and written to the target.
5. The Integration Service continues to read messages from the source.
Message Recovery
When you recover a real-time session, the Integration Service uses the message ID and the commit number in the recovery table to determine whether it committed messages to all targets. The Integration Service committed messages to all targets if the message ID exists in the recovery table and all targets have the same commit number. During recovery, the Integration Service sends an acknowledgement to the source that it processed the message.

The Integration Service did not commit messages to all targets if the message ID does not exist in the recovery table, or if the targets have different commit numbers. During recovery, the Integration Service reads the message from the source and gets the transformation state from the previous message. It processes and writes the message to the targets that did not have the message. When the Integration Service reads all messages from the recovery table, it ends the session.

If the session fails before the Integration Service commits messages to all targets and you cold start the session, targets can receive duplicate rows.
Stopping Real-time Sessions

When you stop a real-time session, the Integration Service processes messages based on the real-time source:
- JMS and WebSphere MQ. The Integration Service processes messages it read up until you issued the stop. It writes messages to the targets.
- MSMQ, SAP, TIBCO, webMethods, and web service messages. The Integration Service does not process messages if you stop a session before the Integration Service writes all messages to the target.
When you stop a real-time session with a JMS or a WebSphere MQ source, the Integration Service performs the following tasks:
1. The Integration Service stops reading messages from the source. If you stop a real-time recovery session, the Integration Service stops reading from the source after it recovers all messages.
2. The Integration Service processes messages in the pipeline and writes to the targets.
3. The Integration Service sends an acknowledgement to the source.
4. The Integration Service clears the recovery table or recovery file to avoid data duplication when you restart the session.
When you restart the session, the Integration Service starts reading from the source. It restores the session and transformation state of operation to resume the session from the point of interruption.
Restarting and Recovering Real-time Sessions

The Integration Service can restart and recover a failed real-time session in the following ways:
- Automatic recovery. The Integration Service restarts the session if you configured the workflow to automatically recover terminated tasks. The Integration Service recovers any unprocessed data and restarts the session regardless of the real-time source.
- Manual recovery. Use a Workflow Monitor or Workflow Manager menu command or pmcmd command to recover the session. For some real-time sources, you must recover the session before you restart it or the Integration Service will not process messages from the failed session. For more information, see Restart and Recover Commands on page 352.
Restart and Recover Commands

You can restart or recover the session and workflow with the following commands:
- Recover Task, Recover Workflow, Recover Workflow from Task. Recovers the task or workflow from the point of interruption.
- Cold Start Task, Cold Start Workflow, Cold Start Workflow from Task. Discards the recovery information and restarts the task or workflow. For more information, see Restarting a Task or Workflow Without Recovery on page 587.
For more information about how to recover a workflow or session, see Steps to Recover Workflows and Tasks on page 396.
Use the following rules and guidelines when you run real-time sessions:
- The session fails if a mapping contains a Transaction Control transformation.
- The session fails if a mapping contains any transformation with Generate Transactions enabled.
- The session fails if a mapping contains any transformation with the transformation scope set to All Input.
- The session fails if a mapping contains any transformation that has row transformation scope and receives input from multiple transaction control points.
- When you configure a session for recovery, use source-based commits, and add partitions, you must specify pass-through partitioning at each partition point.
- The session fails if the load scope for the target is set to All Input.
- The Integration Service ignores flush latency when you run a session in debug mode.
- If the mapping contains a relational target, configure the load type for the target to normal.
- If the mapping contains an XML target definition, select Append to Document for the On Commit option in the target definition.
- The Integration Service is not resilient to connection failures to messaging systems.
Message recovery is not supported in the following situations:
- The source definition is the master source for a Joiner transformation.
- You configure multiple source definitions to run concurrently for the same target load order group.
- The mapping contains an XML target definition.
- Partition points are not all pass-through.
- You edit the recovery file or the mapping before you restart the session and you run a session with a recovery strategy of Restart or Resume.
You can use the following products to read and write real-time data:
- PowerExchange for JMS. Use PowerExchange for JMS to read from JMS sources and write to JMS targets. You can read from JMS messages, JMS provider message queues, or a JMS provider based on message topic. You can write to JMS provider message queues or to a JMS provider based on message topic. JMS providers are message-oriented middleware systems that can send and receive JMS messages. During a session, the Integration Service connects to the Java Naming and Directory Interface (JNDI) to determine connection information. When the Integration Service determines the connection information, it connects to the JMS provider to read or write JMS messages.
- PowerExchange for WebSphere MQ. Use PowerExchange for WebSphere MQ to read from WebSphere MQ message queues and write to WebSphere MQ message queues or database targets. PowerExchange for WebSphere MQ interacts with the WebSphere MQ queue manager, message queues, and WebSphere MQ messages during data extraction and loading.
- PowerExchange for TIBCO. Use PowerExchange for TIBCO to read messages from TIBCO and write messages to TIBCO in TIB/Rendezvous or AE format. The Integration Service receives TIBCO messages from a TIBCO daemon, and it writes messages through a TIBCO daemon. The TIBCO daemon transmits the target messages across a local or wide area network. Target listeners subscribe to TIBCO target messages based on the message subject.
- PowerExchange for webMethods. Use PowerExchange for webMethods to read documents from webMethods sources and write documents to webMethods targets. The Integration Service connects to a webMethods broker that sends, receives, and queues webMethods documents. The Integration Service reads and writes webMethods documents based on a defined document type or the client ID. The Integration Service also reads and writes webMethods request/reply documents.
- PowerExchange for MSMQ. Use PowerExchange for MSMQ to read from MSMQ sources and write to MSMQ targets. The Integration Service connects to the Microsoft Messaging Queue to read data from messages or write data to messages. The queue can be public or private and transactional or non-transactional.
- PowerExchange for SAP NetWeaver mySAP Option. Use PowerExchange for SAP NetWeaver mySAP Option to read from SAP using outbound IDocs or write to SAP using inbound IDocs using Application Link Enabling (ALE). The Integration Service can read from outbound IDocs and write to a relational target. The Integration Service can read data from a relational source and write the data to an inbound IDoc. The Integration Service can capture changes to the master data or transactional data in the SAP application database in real time.
- PowerCenter Web Services Provider. Use the PowerCenter Web Services Provider to expose transformation logic as a service through the Web Services Hub and write client applications to run real-time web services. You can create a service mapping to receive a message from a web service client, transform it, and write it to any target PowerCenter supports. You can also create a service mapping with both a web service source and target definition to receive a message request from a web service client, transform the data, and send the response back to the web service client. The Web Services Hub receives requests from web service clients and passes them to the gateway. The Integration Service or the Repository Service processes the requests and sends a response to the web service client through the Web Services Hub.
- PowerExchange. Use PowerExchange to extract and load relational and non-relational data, extract changed data, and extract changed data in real time. To extract data, the Integration Service reads changed data from PowerExchange on the machine hosting the source. You can extract and load data from multiple sources and targets, such as DB2/390, DB2/400, and Oracle. You can also use a data map from a PowerExchange Listener as a non-relational source. For more information, see PowerExchange Interfaces for PowerCenter.
Chapter 11

Understanding Commit Points

This chapter includes the following topics:
- Overview, 358
- Target-Based Commits, 359
- Source-Based Commits, 360
- User-Defined Commits, 365
- Understanding Transaction Control, 369
- Setting Commit Properties, 374
Overview
A commit interval is the interval at which the Integration Service commits data to targets during a session. The commit point can be a factor of the commit interval, the commit interval type, and the size of the buffer blocks. The commit interval is the number of rows you want to use as a basis for the commit point. The commit interval type is the type of rows that you want to use as a basis for the commit point. You can choose between the following commit types:
- Target-based commit. The Integration Service commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size, the commit interval, and the Integration Service configuration for writer timeout.
- Source-based commit. The Integration Service commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
- User-defined commit. The Integration Service commits data based on transactions defined in the mapping properties. You can also configure some commit and rollback options in the session properties.
Source-based and user-defined commit sessions have partitioning restrictions. If you configure a session with multiple partitions to use source-based or user-defined commit, you can choose pass-through partitioning at certain partition points in a pipeline. For more information, see Setting Partition Types on page 484.
Target-Based Commits
During a target-based commit session, the Integration Service commits rows based on the number of target rows and the key constraints on the target table. The commit point depends on the following factors:
- Commit interval. The number of rows you want to use as a basis for commits. Configure the target commit interval in the session properties.
- Writer wait timeout. The amount of time the writer waits before it issues a commit. Configure the writer wait timeout in the Integration Service setup.
- Buffer blocks. Blocks of memory that hold rows of data during a session. You can configure the buffer block size in the session properties, but you cannot configure the number of rows the block holds.
When you run a target-based commit session, the Integration Service may issue a commit before, on, or after the configured commit interval. The Integration Service uses the following process to issue commits:
- When the Integration Service reaches a commit interval, it continues to fill the writer buffer block. When the writer buffer block fills, the Integration Service issues a commit.
- If the writer buffer fills before the commit interval, the Integration Service writes to the target, but waits to issue a commit. It issues a commit when one of the following conditions is true:
  - The writer is idle for the amount of time specified by the Integration Service writer wait timeout option.
  - The Integration Service reaches the commit interval and fills another writer buffer.
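A conceptual sketch of this decision logic, assuming hypothetical parameter names, is shown below; it models the behavior described above and is not Integration Service code.

// Conceptual model of target-based commit behavior.
public class TargetCommitModel {
    static boolean shouldIssueCommit(boolean commitIntervalReached, boolean writerBufferFull,
                                     long writerIdleMillis, long writerWaitTimeoutMillis) {
        // Case 1: the commit interval was reached and the writer buffer block
        // has now filled, so the commit is issued.
        if (commitIntervalReached && writerBufferFull) {
            return true;
        }
        // Case 2: the buffer filled before the commit interval. Rows are written
        // to the target, but the commit waits until the writer has been idle for
        // the writer wait timeout. (The alternative trigger, reaching the commit
        // interval and filling another buffer, is folded into case 1 on a later call.)
        return writerIdleMillis >= writerWaitTimeoutMillis;
    }
}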
For more information about configuring the writer wait timeout, see Creating and Configuring the Integration Service in the PowerCenter Administrator Guide.
Note: When you choose target-based commit for a session containing an XML target, the Workflow Manager disables the On Commit session property on the Transformations view of the Mapping tab.
Source-Based Commits
During a source-based commit session, the Integration Service commits data to the target based on the number of rows from some active sources in a target load order group. These rows are referred to as source rows. When the Integration Service runs a source-based commit session, it identifies the commit source for each pipeline in the mapping. The Integration Service generates a commit row from these active sources at every commit interval. The Integration Service writes the name of the transformation used for source-based commit intervals into the session log:
Source-based commit interval based on... TRANSFORMATION_NAME
The Integration Service might commit fewer rows to the target than the number of rows produced by the active source. For example, you have a source-based commit session that passes 10,000 rows through an active source, and 3,000 rows are dropped due to transformation logic. The Integration Service issues a commit to the target when the 7,000 remaining rows reach the target.

The number of rows held in the writer buffers does not affect the commit point for a source-based commit session. For example, you have a source-based commit session that passes 10,000 rows through an active source. When those 10,000 rows reach the targets, the Integration Service issues a commit. If the session completes successfully, the Integration Service issues commits after 10,000, 20,000, 30,000, and 40,000 source rows.

If the targets are in the same transaction control unit, the Integration Service commits data to the targets at the same time. If the session fails or aborts, the Integration Service rolls back all uncommitted data in a transaction control unit to the same source row.

If the targets are in different transaction control units, the Integration Service performs the commit when each target receives the commit row. If the session fails or aborts, the Integration Service rolls back each target to the last commit point. It might not roll back to the same source row for targets in separate transaction control units. For more information about transaction control units, see Understanding Transaction Control Units on page 371.
Note: Source-based commit may slow session performance if the session uses a one-to-one mapping. A one-to-one mapping is a mapping that moves data from a Source Qualifier, XML Source Qualifier, or Application Source Qualifier transformation directly to a target. For more information about performance, see the PowerCenter Performance Tuning Guide.
The following active sources generate commits:
- XML Source Qualifier, when you connect ports from only one output group
- Normalizer (VSAM)
- Aggregator with the All Input transformation scope
- Joiner with the All Input transformation scope
- Rank with the All Input transformation scope
- Sorter with the All Input transformation scope
- Custom with one output group and with the All Input transformation scope
- A multiple input group transformation with one output group connected to multiple upstream transaction control points
- Mapplet, if it contains one of the above transformations
For more information about transformation scope and transaction control, see Understanding Transaction Control on page 369. For more information about active sources, see Working with Active Sources on page 312.

A mapping can have one or more target load order groups, and a target load order group can have one or more active sources that generate commits. The Integration Service uses the commits generated by the active source that is closest to the target definition. This is known as the commit source. For example, you have the mapping in Figure 11-1:
Figure 11-1. Mapping with a Single Commit Source
The mapping contains a Source Qualifier transformation and an Aggregator transformation with the All Input transformation scope. The Aggregator transformation is closer to the targets than the Source Qualifier transformation and is therefore used as the commit source for the source-based commit session.
The mapping contains a target load order group with one source pipeline that branches from the Source Qualifier transformation to two targets. One pipeline branch contains an Aggregator transformation with the All Input transformation scope, and the other contains an Expression transformation. The Integration Service identifies the Source Qualifier transformation as the commit source for t_monthly_sales and the Aggregator as the commit source for T_COMPANY_ALL. It performs a source-based commit for both targets, but uses a different commit source for each.
The Integration Service uses target-based commit when a target meets one of the following conditions:
- The target receives data from an XML Source Qualifier transformation, and you connect multiple output groups from the XML Source Qualifier transformation to downstream transformations. An XML Source Qualifier transformation does not generate commits when you connect multiple output groups downstream.
- The target receives data from an active source with multiple output groups other than an XML Source Qualifier transformation. For example, the target receives data from a Custom transformation that you do not configure to generate transactions. Multiple output group active sources neither generate nor propagate commits.
When an XML Source Qualifier transformation has multiple output groups connected downstream in a mapping, the Integration Service can use different commit types for targets in the session depending on the transformations used in the mapping:
- You put a commit source between the XML Source Qualifier transformation and the target. The Integration Service uses source-based commit for the target because it receives commits from the commit source. The active source is the commit source for the target.
- You do not put a commit source between the XML Source Qualifier transformation and the target. The Integration Service uses target-based commit for the target because it receives no commits.
This mapping contains an XML Source Qualifier transformation with multiple output groups connected downstream. Because you connect multiple output groups downstream, the XML Source Qualifier transformation does not generate commits. You connect the XML Source Qualifier transformation to two relational targets, T_STORE and T_PRODUCT. Therefore, these targets do not receive any commit generated by an active source. The Integration Service uses target-based commit when loading to these targets. However, the mapping includes an active source that generates commits, AGG_Sales, between the XML Source Qualifier transformation and T_YTD_SALES. The Integration Service uses source-based commit when loading to T_YTD_SALES.
Similarly, when a mapping contains a multiple output group Custom transformation, the commit type depends on whether you put a commit source between the Custom transformation and the target:
- You put a commit source between the Custom transformation and the target. The Integration Service uses source-based commit for the target because it receives commits from the active source. The active source is the commit source for the target.
- You do not put a commit source between the Custom transformation and the target. The Integration Service uses target-based commit for the target because it receives no commits.
The mapping contains a multiple output group Custom transformation, CT_XML_Parser, which drops the commits generated by the Source Qualifier transformation. Therefore, targets T_store_name and T_store_addr do not receive any commits generated by an active source. The Integration Service uses target-based commit when loading to these targets. However, the mapping includes an active source that generates commits, AGG_store_orders, between the Custom transformation and T_store_orders. The Integration Service uses source-based commit when loading to T_store_orders.
Note: You can configure a Custom transformation to generate transactions when the Custom transformation procedure outputs transactions. When you do this, configure the session for user-defined commit. For more information about user-defined commit sessions, see User-Defined Commits on page 365.
User-Defined Commits
During a user-defined commit session, the Integration Service commits and rolls back transactions based on a row or set of rows that pass through a Transaction Control transformation. The Integration Service evaluates the transaction control expression for each row that enters the transformation. The return value of the transaction control expression defines the commit or rollback point. You can also create a user-defined commit session when the mapping contains a Custom transformation configured to generate transactions. When you do this, the procedure associated with the Custom transformation defines the transaction boundaries.

When the Integration Service evaluates a commit row, it commits all rows in the transaction to the target or targets. When it evaluates a rollback row, it rolls back all rows in the transaction from the target or targets. The Integration Service writes a message to the session log at each commit and rollback point. The session details are cumulative. The following message is a sample commit message from the session log:
WRITER_1_1_1> WRT_8317 USER-DEFINED COMMIT POINT Wed Oct 15 08:15:29 2003
=================================================== WRT_8036 Target: TCustOrders (Instance Name: [TCustOrders]) WRT_8038 Inserted rows - Requested: 1003 Rejected: 0 Affected: 1023 Applied: 1003
When the Integration Service writes all rows in a transaction to all targets, it issues commits sequentially for each target.

The Integration Service rolls back data based on the return value of the transaction control expression or error handling configuration. If the transaction control expression returns a rollback value, the Integration Service rolls back the transaction. If an error occurs, you can choose to roll back or commit at the next commit point. If the transaction control expression evaluates to a value other than commit, rollback, or continue, the Integration Service fails the session. For more information about valid values, see Transaction Control Transformation in the Transformation Guide.

When the session completes, the Integration Service may write data to the target that was not bound by commit rows. You can choose to commit at end of file or to roll back that open transaction.
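Conceptually, the Integration Service evaluates the transaction control expression for each row and acts on the result. The sketch below models that dispatch; the enum merely stands in for the expression's commit, rollback, and continue return values and is not the PowerCenter expression language itself.

import java.util.ArrayList;
import java.util.List;

// Conceptual model of user-defined commit evaluation; not actual Integration
// Service code. Commit-before versus commit-after variations are not modeled.
public class TransactionControlModel {
    enum TcResult { COMMIT, ROLLBACK, CONTINUE }

    private final List<Object> openTransaction = new ArrayList<>();

    void onRow(Object row, TcResult result) {
        switch (result) {
            case COMMIT:
                openTransaction.add(row);
                commitToTargets(openTransaction);     // commit all rows in the transaction
                openTransaction.clear();
                break;
            case ROLLBACK:
                rollBackFromTargets(openTransaction); // roll back all rows in the transaction
                openTransaction.clear();
                break;
            case CONTINUE:
                openTransaction.add(row);             // row stays in the open transaction
                break;
            default:
                // Any other value fails the session.
                throw new IllegalStateException("invalid transaction control value");
        }
    }

    private void commitToTargets(List<Object> rows) { }      // stub
    private void rollBackFromTargets(List<Object> rows) { }  // stub
}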
Note: If you use bulk loading with a user-defined commit session, the target may not recognize the transaction boundaries. If the target connection group does not support transactions, the Integration Service writes the following message to the session log:

WRT_8324 Warning: Target Connection Group's connection doesn't support transactions. Targets may not be loaded according to specified transaction boundaries rules.
The Integration Service rolls back transactions in the following circumstances:
- Rollback evaluation. The transaction control expression returns a rollback value.
- Open transaction. You choose to roll back at the end of file.
- Roll back on error. You choose to roll back commit transactions if the Integration Service encounters a non-fatal error.
- Roll back on failed commit. If any target connection group in a transaction control unit fails to commit, the Integration Service rolls back all uncommitted data to the last successful commit point.
For more information about transaction control units, see Understanding Transaction Control Units on page 371.
Rollback Evaluation
If the transaction control expression returns a rollback value, the Integration Service rolls back the transaction and writes a message to the session log indicating that the transaction was rolled back. It also indicates how many rows were rolled back. The following message is a sample message that the Integration Service writes to the session log when the transaction control expression returns a rollback value:
WRITER_1_1_1> WRT_8326 User-defined rollback processed WRITER_1_1_1> WRT_8331 Rollback statistics WRT_8162 =================================================== WRT_8330 Rolled back [333] inserted, [0] deleted, [0] updated rows for the target [TCustOrders]
The following message is a sample message indicating that Commit on End of File is enabled in the session properties:
WRITER_1_1_1> WRT_8143 Commit at end of Load Order Group Wed Nov 05 08:15:29 2003
Note: The Integration Service does not roll back a transaction if it encounters an error before it processes any rows through the Transaction Control transformation.
The Integration Service writes the row to the reject file from TCG_T1 and TCG1_T2. These are the commit rows associated with the third commit point.
Figure 11-5 illustrates Integration Service behavior when it rolls back on a failed commit:
Figure 11-5. Roll Back on Failed Commit Example
- Third commit is successful (3). Rows appear in the reject file (8).
- Third commit fails (4). The Integration Service rolls back to the second commit (6). Rows appear in the reject file (7).
- The Integration Service does not issue the third commit (5). It rolls back to the second commit (6). Rows appear in the reject file (7).
The following table describes row indicators in the reject file for committed transactions in a failed transaction control unit:
- Row indicator 7: Committed insert
- Row indicator 8: Committed update
- Row indicator 9: Committed delete
Understanding Transaction Control

A transaction control point is a transformation that defines or redefines the transaction boundary in one of the following ways:
- Generates transaction boundaries. The transformations that define transaction boundaries differ, depending on the session commit type:
  - Target-based and user-defined commit. Transaction generators generate transaction boundaries. A transaction generator is a transformation that generates both commit and rollback rows. The Transaction Control and Custom transformations are transaction generators.
  - Source-based commit. Some active sources generate commits. They do not generate rollback rows. Also, transaction generators generate commit and rollback rows. For a list of active sources that generate commits, see Determining the Commit Source on page 360.
- Drops incoming transaction boundaries. When a transformation drops incoming transaction boundaries and does not generate commits, the Integration Service outputs all rows into an open transaction. All active sources that generate commits and transaction generators drop incoming transaction boundaries.
For a list of transaction control points, see Table 11-1 on page 370.
Transformation Scope
You can configure how the Integration Service applies the transformation logic to incoming data with the Transformation Scope transformation property. When the Integration Service processes a transformation, it either drops transaction boundaries or preserves transaction boundaries, depending on the transformation scope and the mapping configuration. You can choose one of the following values for the transformation scope:
- Row. Applies the transformation logic to one row of data at a time. Choose Row when a row of data does not depend on any other row. When you choose Row for a transformation connected to multiple upstream transaction control points, the Integration Service drops transaction boundaries and outputs all rows from the transformation as an open transaction. When you choose Row for a transformation connected to a single upstream transaction control point, the Integration Service preserves transaction boundaries.
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions. When you choose Transaction, the Integration Service preserves incoming transaction boundaries. It resets any cache, such as an aggregator or lookup cache, when it receives a new transaction. When you choose Transaction for a multiple input group transformation, you must connect all input groups to the same upstream transaction control point.
- All Input. Applies the transformation logic on all incoming data. When you choose All Input, the Integration Service drops incoming transaction boundaries and outputs all rows from the transformation as an open transaction. Choose All Input when a row of data depends on all rows in the source.
Table 11-1 lists the transformation scope values available for each transformation:

Table 11-1. Transformation Scope Property Values

- Aggregator. Row: n/a. Transaction: Optional. All Input: Default; transaction control point.
- Application Source Qualifier. Row: n/a; transaction control point. Transaction: n/a. All Input: n/a.
- Complex Data. Row: Default; read only.
- Custom*. Row: Optional; transaction control point when configured to generate commits or when connected to multiple upstream transaction control points. Transaction: Optional; transaction control point when configured to generate commits. All Input: Default; always a transaction control point. Generates commits when it has one output group or when configured to generate commits; otherwise, it generates an open transaction.
- Java*. Row: Default for passive transformations; optional for active transformations. Transaction: Optional. All Input: Default for active transformations.
- XML Generator. Transaction: Optional; Transaction when the flush on commit is set to create a new document.

*For more information about how the Transformation Scope property affects Custom or Java transformations, see Custom Transformation or Java Transformation in the PowerCenter Transformation Guide.
Understanding Transaction Control Units

When the Integration Service reaches the commit point for all targets in a transaction control unit, it issues commits sequentially for each target. Figure 11-6 illustrates transaction control units with a Transaction Control transformation:
Figure 11-6. Transaction Control Units
Note that T5_ora1 uses the same connection name as T1_ora1 and T2_ora1. Because T5_ora1 is connected to a separate Transaction Control transformation, it is in a separate transaction control unit and target connection group. If you connect T5_ora1 to tc_TransactionControlUnit1, it will be in the same transaction control unit as all targets, and in the same target connection group as T1_ora1 and T2_ora1.
Use the following rules and guidelines when you work with transaction control:
- Transformations with Transaction transformation scope must receive data from a single transaction control point. The Integration Service uses the transaction boundaries defined by the first upstream transaction control point for transformations with Transaction transformation scope.
- Transaction generators can be effective or ineffective for a target. The Integration Service uses the transaction generated by an effective transaction generator when it loads data to a target. For more information about effective and ineffective transaction generators, see Transaction Control Transformation in the Transformation Guide.
- The Workflow Manager prevents you from using incremental aggregation in a session with an Aggregator transformation with Transaction transformation scope.
- Transformations with All Input transformation scope cause a transaction generator to become ineffective for a target in a user-defined commit session. For more information about using transaction generators in mappings, see Transaction Control Transformation in the Transformation Guide.
- The Integration Service resets any cache at the beginning of each transaction for Aggregator, Joiner, Rank, and Sorter transformations with Transaction transformation scope.
- You can choose the Transaction transformation scope for Joiner transformations when you use sorted input.
- When you add a partition point at a transformation with Transaction transformation scope, the Workflow Manager uses the pass-through partition type by default. You cannot change the partition type.
Tip: When you bulk load to Microsoft SQL Server or Oracle targets, define a large commit interval. Microsoft SQL Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the number of bulk load transactions and increases performance.
Chapter 12
Recovering Workflows
This chapter includes the following topics:
- Overview, 376
- State of Operation, 377
- Recovery Options, 382
- Configuring Workflow Recovery, 383
- Configuring Task Recovery, 386
- Resuming Sessions, 389
- Working with Repeatable Data, 391
- Steps to Recover Workflows and Tasks, 396
- Rules and Guidelines for Session Recovery, 398
Overview
Workflow recovery allows you to continue processing the workflow and workflow tasks from the point of interruption. You can recover a workflow if the Integration Service can access the workflow state of operation. The workflow state of operation includes the status of tasks in the workflow and workflow variable values. The Integration Service stores the state in memory or on disk, based on how you configure the workflow:
- Enable recovery. When you enable a workflow for recovery, the Integration Service saves the workflow state of operation in a shared location. You can recover the workflow if it terminates, stops, or aborts. The workflow does not have to be running.
- Suspend. When you configure a workflow to suspend on error, the Integration Service stores the workflow state of operation in memory. You can recover the suspended workflow if a task fails. You can fix the task error and recover the workflow.
The Integration Service recovers tasks in the workflow based on the recovery strategy of the task. By default, the recovery strategy for Session and Command tasks is to fail the task and continue running the workflow. You can configure the recovery strategy for Session and Command tasks. The strategy for all other tasks is to restart the task.

When you have high availability, PowerCenter recovers a workflow automatically if a service process that is running the workflow fails over to a different node. You can configure a running workflow to recover a task automatically when the task terminates. PowerCenter also recovers a session and workflow after a database connection interruption.

When the Integration Service runs in safe mode, it stores the state of operation for workflows configured for recovery. If the workflow fails and the Integration Service fails over to a backup node, the Integration Service does not automatically recover the workflow. You can manually recover the workflow if you have the appropriate privileges on the Integration Service.
State of Operation
When you recover a workflow or session, the Integration Service restores the workflow or session state of operation to determine where to begin recovery processing. The Integration Service stores the workflow state of operation in memory or on disk based on the way you configure the workflow. The Integration Service stores the session state of operation based on the way you configure the session.
The workflow state of operation includes the following information:
- Active service requests
- Completed and running task status
- Workflow variable values
When you run concurrent workflows, the Integration Service appends the instance name or the workflow run ID to the workflow recovery storage file in $PMStorageDir.

When you enable a workflow for recovery, the Integration Service does not store the session state of operation by default. You can configure the session recovery strategy to save the session state of operation.
The Integration Service saves the following session state information:
- Source. If the output from a source is not deterministic and repeatable, the Integration Service saves the result from the SQL query to a shared storage file in $PMStorageDir. For more information about deterministic and repeatable data, see Working with Repeatable Data on page 391.
- Transformation. The Integration Service creates checkpoints in $PMStorageDir to determine where to start processing the pipeline when it runs a recovery session. When you run a session with an incremental Aggregator transformation, the Integration Service creates a backup of the Aggregator cache files in $PMCacheDir at the beginning of a session run. The Integration Service promotes the backup cache to the initial cache at the beginning of a session recovery run.
- Relational target recovery data. The Integration Service writes recovery information to recovery tables in the target database to determine the last row committed to the target when the session was interrupted.
The Integration Service creates the following recovery tables in the target database:
- PM_RECOVERY. Contains target load information for the session run. The Integration Service removes the information from this table after each successful session and initializes the information at the beginning of subsequent sessions.
- PM_TGT_RUN_ID. Contains information the Integration Service uses to identify each target on the database. The information remains in the table between session runs. If you manually create this table, you must create a row and enter a value other than zero for LAST_TGT_RUN_ID to ensure that the session recovers successfully.
- PM_REC_STATE. Contains information the Integration Service uses to determine if it needs to write messages to the target table during recovery for a real-time session. For more information, see PM_REC_STATE Table on page 348.
If you edit or drop the recovery tables before you recover a session, the Integration Service cannot recover the session. If you disable recovery, the Integration Service does not remove the recovery tables from the target database. You must manually remove the recovery tables.
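If you create PM_TGT_RUN_ID manually, the required seed row can be inserted with ordinary SQL, for example through JDBC as sketched below. Only the table name and the LAST_TGT_RUN_ID column come from this guide; the connection details are hypothetical, and the real table may contain additional columns, so treat this as an illustration rather than exact DDL.

import java.sql.*;

// Illustration only: seed PM_TGT_RUN_ID with a nonzero LAST_TGT_RUN_ID so
// that recovery succeeds when the table is created manually.
public class SeedRecoveryTable {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@//target-host:1521/ORCL"; // hypothetical target
        try (Connection conn = DriverManager.getConnection(url, "pcuser", "pcpass");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(
                "INSERT INTO PM_TGT_RUN_ID (LAST_TGT_RUN_ID) VALUES (1)");
        }
    }
}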
Note: When concurrent recovery sessions write to the same target database, the Integration Service may encounter a deadlock on PM_RECOVERY. You can configure the Session Retry on Deadlock option so that the session retries the write to PM_RECOVERY on deadlock. For more information, see Deadlock Retry on page 300.
Run one of the following scripts to create the recovery tables in the target database:
- create_schema_db2.sql: IBM DB2
- create_schema_inf.sql: Informix
- create_schema_ora.sql: Oracle
- create_schema_sql.sql: Microsoft SQL Server
- create_schema_syb.sql: Sybase
- create_schema_ter.sql: Teradata
Recovery Options
To perform recovery, you must configure the mapping, workflow tasks, and the workflow for recovery. Table 12-4 describes the options that you can configure for recovery:
Table 12-4. Configurable Options for Recovery
- Suspend Workflow on Error (Workflow). Suspends the workflow when a task in the workflow fails. You can fix the failed tasks and recover a suspended workflow. For more information, see Recovering Suspended Workflows on page 384.
- Suspension Email (Workflow). Sends an email when the workflow suspends. For more information, see Recovering Suspended Workflows on page 384.
- Enable HA Recovery (Workflow). Saves the workflow state of operation in a shared location. You do not need high availability to enable workflow recovery. For more information, see Configuring Workflow Recovery on page 383.
- Automatically Recover Terminated Tasks (Workflow). Recovers terminated Session and Command tasks while the workflow is running. You must have the high availability option. For more information, see Automatically Recovering Terminated Tasks on page 388.
- Maximum Automatic Recovery Attempts (Workflow). The number of times the Integration Service attempts to recover a Session or Command task. For more information, see Automatically Recovering Terminated Tasks on page 388.
- Recovery Strategy (Session, Command). The recovery strategy for a Session or Command task. Determines how the Integration Service recovers a Session or Command task during workflow recovery and how it recovers a session during session recovery. For more information, see Configuring Task Recovery on page 386.
- Fail Task If Any Command Fails (Command). Enables the Command task to fail if any of the commands in the task fail. If you do not set this option, the task continues to run when any of the commands fail. You can use this option with Suspend Workflow on Error to suspend the workflow if any command in the task fails. For more information, see Configuring Task Recovery on page 386.
- Output is Deterministic (Transformation). Indicates that the transformation always generates the same set of data from the same input data. When you enable this option with the Output is Repeatable option for a relational source qualifier, the Integration Service does not save the SQL results to shared storage. When you enable it for a transformation, you can configure recovery to resume from the last checkpoint. For more information, see Output is Deterministic on page 392.
- Output is Repeatable (Transformation). Indicates whether the transformation generates rows in the same order between session runs. The Integration Service can resume a session from the last checkpoint when the output is repeatable and deterministic. When you enable this option with the Output is Deterministic option for a relational source qualifier, the Integration Service does not save the SQL results to shared storage. For more information, see Output is Repeatable on page 392.
Configuring Workflow Recovery

You can recover a workflow in one of the following statuses:
- Stopped
- Suspended
- Terminated
Note: A failed workflow is a workflow that completes with failure. You cannot recover a failed workflow.
For more information about workflow status, see Workflow and Task Status on page 590.
The following figure shows where to enable a workflow for recovery from a stopped, terminated, or aborted state:
Configuring Task Recovery

You can recover a task in one of the following statuses:
- Stopped
- Failed
- Terminated
When the Integration Service recovers a workflow, it recovers tasks based on the recovery strategy of each task:
- Restart task. When the Integration Service recovers a workflow, it restarts each recoverable task that is configured with a restart strategy. You can configure Session and Command tasks with a restart recovery strategy. All other tasks have a restart recovery strategy by default.
- Fail task and continue workflow. When the Integration Service recovers a workflow, it does not recover the task. The task status becomes failed, and the Integration Service continues running the workflow. Configure a fail recovery strategy if you want to complete the workflow, but you do not want to recover the task. You can configure Session and Command tasks with the fail task and continue workflow recovery strategy.
- Resume from the last checkpoint. The Integration Service recovers a stopped, aborted, or terminated session from the last checkpoint. You can configure a Session task with a resume strategy. For more information about the resume strategy, see Resuming Sessions on page 389.
Table 12-7 describes the recovery strategy for each task type:
Table 12-7. Recovery Strategy by Task Type
- Assignment: Restart task.
- Command: Restart task, or fail task and continue workflow. Default is fail task and continue workflow. For more information, see Configuring Task Recovery on page 386.
- Control: Restart task.
- Decision: Restart task.
- Email: Restart task. The Integration Service might send duplicate email.
- Event-Raise: Restart task.
- Event-Wait: Restart task.
- Session: Resume from the last checkpoint, restart task, or fail task and continue workflow. Default is fail task and continue workflow. For more information, see Session Task Strategies on page 388.
- Timer: Restart task. If you use a relative time from the start time of a task or workflow, set the timer with the original value less the passed time.
- Worklet: n/a. The Integration Service does not recover a worklet. You can recover the session in the worklet by expanding the worklet in the Workflow Monitor and choosing Recover Task.
Command Task Strategies

When you configure a Command task, you can choose the following recovery strategies:
- Fail task and continue workflow. If you want to suspend the workflow on Command task error, you must configure the task with a fail strategy. If the Command task has more than one command, and you configure a fail strategy, you need to configure the task to fail if any command fails.
- Restart task. When the Integration Service recovers a workflow, it restarts a Command task that is configured with a restart strategy.

Configure the recovery strategy on the Properties page of the Command task. For more information about Command tasks, see Working with the Command Task on page 169.
Resume from the last checkpoint. The Integration Service saves the session state of operation and maintains target recovery tables. If the session aborts, stops, or terminates, the Integration Service uses the saved recovery information to resume the session from the point of interruption. For more information about the resume strategy, see Resuming Sessions on page 389.

Restart task. The Integration Service runs the session again when it recovers the workflow. When you recover with restart task, you might need to remove the partially loaded data in the target or design a mapping to skip the duplicate rows.

Fail task and continue workflow. When the Integration Service recovers a workflow, it does not recover the session. The session status becomes failed, and the Integration Service continues running the workflow.
Configure the recovery strategy on the Properties page of the Session task.
Resuming Sessions
When you configure session recovery to resume from the last checkpoint, the Integration Service creates checkpoints in $PMStorageDir to determine where to start processing session recovery. When the Integration Service resumes a session, it restores the session state of operation, including the state of each source, target, and transformation. The Integration Service determines how much of the source data it needs to process. When the Integration Service resumes a session, the recovery session must produce the same data as the original session. The session is not valid if you configure recovery to resume from the last checkpoint, but the session cannot produce repeatable data. For more information about repeatable data, see Working with Repeatable Data on page 391. The Integration Service can recover flat file sources including FTP sources. It can truncate or append to flat file and FTP targets. When you recover a session from the last checkpoint, the Integration Service restores the session state of operation to determine the type of recovery it can perform:
Incremental. The Integration Service starts processing data at the point of interruption. It does not read or transform rows that it processed before the interruption. By default, the Integration Service attempts to perform incremental recovery.

Full. The Integration Service reads all source rows again and performs all transformation logic if it cannot perform incremental recovery. The Integration Service begins writing to the target at the last commit point. If any session component requires full recovery, the Integration Service performs full recovery on the session.
Table 12-8 describes when the Integration Service performs incremental or full recovery, depending on the session configuration:
Table 12-8. Incremental and Full Recovery Session Recovery Situations
Commit type: Incremental recovery requires that the session uses a source-based commit and the mapping does not contain any transformation that generates commits. Full recovery occurs when the session uses a target-based commit or user-defined commit.
Transformation scope: Incremental recovery requires that transformations propagate transactions and the transformation scope is Transaction or Row. Full recovery occurs when at least one transformation is configured with the All transformation scope.
File source: Incremental recovery requires that the file source supports incremental reads.
FTP source: Incremental recovery requires that the FTP server supports the seek operation to allow incremental reads. Full recovery occurs when the FTP server does not support the seek operation.
Relational source: A relational source supports incremental reads when the output is deterministic and repeatable. If the output is not deterministic and repeatable, the Integration Service supports incremental relational source reads by staging SQL results to a storage file.
Source. The output data from the source is repeatable between the original run and the recovery run. For more information, see Source Repeatability on page 391.

Transformation. The output data from each transformation to the target is repeatable. For more information, see Transformation Repeatability on page 392.
Source Repeatability
You can resume a session from the last checkpoint when each source generates the same set of data and the order of the output is repeatable between runs. Source data is repeatable based on the type of source in the session.
Relational source. A relational source might produce data that is not the same or in the same order between workflow runs. When you configure recovery to resume from the last checkpoint, the Integration Service stores the SQL result in a cache file to guarantee the output order for recovery. If you know the SQL result will be the same between workflow runs, you can configure the source qualifier to indicate that the data is repeatable and deterministic. When the relational source output is deterministic and the output is always repeatable, the Integration Service does not store the SQL result in a cache file. When the relational output is not repeatable, the Integration Service can skip creating the cache file if a transformation in the mapping always produces ordered data.
SDK source. If an SDK source produces repeatable data, you can enable Output is Deterministic and Output is Repeatable in the SDK Source Qualifier transformation.

Flat file source. A flat file does not change between session and recovery runs. If you change a source file before you recover a session, the recovery session might produce unexpected results.
Transformation Repeatability
You can configure a session to resume from the last checkpoint when transformations in the session produce the same data between the session and recovery run. All transformations have properties that determine if the transformation can produce repeatable data. A transformation can produce the same data between a session and recovery run if the output is deterministic and the output is repeatable.
Output is Deterministic
A transformation generates deterministic output when it always creates the same output data from the same input data. If you set the session recovery strategy to resume from the last checkpoint, the Workflow Manager validates that output is deterministic for each transformation.
Output is Repeatable
A transformation generates repeatable data when it generates rows in the same order between session runs. Transformations produce repeatable data based on the transformation type, the transformation configuration, or the mapping configuration.
Always. The order of the output data is consistent between session runs even if the order of the input data is inconsistent between session runs.

Based on input order. The transformation produces repeatable data between session runs when the order of the input data from all input groups is consistent between session runs. If the input data from any input group is not ordered, then the output is not ordered. When a transformation generates repeatable data based on input order, during session validation, the Workflow Manager validates the mapping to determine if the transformation can produce repeatable data. For example, an Expression transformation produces repeatable data only if it receives repeatable data.

Never. The order of the output data is inconsistent between session runs. You cannot configure recovery to resume from the last checkpoint if a transformation does not produce repeatable data.
The mapping contains two Source Qualifier transformations that produce repeatable data. The mapping contains a Union and Custom transformation that never produce repeatable data. The Lookup transformation produces repeatable data when it receives repeatable data.
Therefore, the target does not receive repeatable data and you cannot configure the session to resume recovery. You can modify the mapping to enable resume recovery. Add a Sorter transformation configured for distinct output rows immediately after the transformations that never output repeatable data. Add the Sorter transformation after the Custom transformation. The following figure shows the mapping with a Sorter transformation connected to the Custom transformation:
The Lookup transformation produces repeatable data because it receives repeatable data from the Sorter transformation. Table 12-9 describes when transformations produce repeatable data:
Table 12-9. Repeatable Data in Transformations
Aggregator: Always.
Application Source Qualifier*: Based on input order.
Complex Data: Based on input order.
Custom*: Based on input order. Configure the property according to the transformation procedure behavior.
Expression: Based on input order.
External Procedure*: Based on input order. Configure the property according to the transformation procedure behavior.
Filter: Based on input order.
Joiner: Based on input order.
Java*: Based on input order. Configure the property according to the transformation procedure behavior.
Lookup, dynamic: Always. The lookup source must be the same as a target in the session.
Lookup, static: Based on input order.
Rank
Router
Sequence Generator
Sorter, configured for distinct output rows
Sorter, not configured for distinct output rows
Source Qualifier, flat file
Source Qualifier, relational*
SQL Transformation
Stored Procedure*
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
* You can configure the Output is Repeatable and Output is Deterministic properties for some transformations. Or you can add a transformation that produces repeatable data immediately after the transformation.
Recover a workflow. Continue processing the workflow from the point of interruption.

Recover a session. Recover a session but not the rest of the workflow.

Recover a workflow from a session. Recover a session and continue processing a workflow.
If the Integration Service uses operating system profiles, recover the session or workflow using the same operating system profile that the Integration Service used to run the session or workflow. If you want to restart a workflow or task without recovery, you can cold start the workflow or task. For more information, see Restarting a Task or Workflow Without Recovery on page 587.
Note: Recovery behavior for real-time sessions varies depending on the real-time source. For more information, see Restarting and Recovering Real-time Sessions on page 351.
Recovering a Workflow
When you recover a workflow, the Integration Service restores the workflow state of operation and continues processing from the point of failure. The Integration Service uses the task recovery strategy to recover the task that failed. You configure a workflow for recovery by configuring the workflow to suspend when a task fails, or by enabling recovery in the Workflow Properties. You can recover a workflow using the Workflow Manager, the Workflow Monitor, or pmcmd.
To recover a workflow using the Workflow Manager:
1. Select the workflow in the Navigator or open the workflow in the Workflow Designer workspace.
2. Right-click the workflow and choose Recover Workflow.
The Integration Service recovers the interrupted tasks and runs the rest of the workflow.
To recover a workflow using the Workflow Monitor:
1. Select the workflow in the Workflow Monitor.
2. Right-click the workflow and choose Recover.
The Integration Service recovers the failed tasks and runs the rest of the workflow.
You can also use the pmcmd recoverworkflow command to recover a workflow. For information about recovering a workflow with pmcmd, see the Command Line Reference.
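For example, a recoverworkflow command might look like the following. This is a sketch only: the service, domain, folder, and workflow names (MyIntService, MyDomain, SalesFolder, wf_SalesLoad) are illustrative placeholders, and you should confirm the connection options for your environment in the Command Line Reference:

pmcmd recoverworkflow -sv MyIntService -d MyDomain -u Administrator -p mypassword -f SalesFolder wf_SalesLoad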
Recovering a Session
You can recover a failed, terminated, aborted, or stopped session without recovering the workflow. If the workflow completed, you can recover the session without running the rest of the workflow. You must configure a recovery strategy of restart or resume from the last checkpoint to recover a session. The Integration Service recovers the session according to the task recovery strategy. You do not need to suspend the workflow or enable workflow recovery to recover a session.
To recover a session from the Workflow Monitor:
1. Double-click the workflow in the Workflow Monitor to expand it and display the task.
2. Right-click the session and choose Recover Task.
The Integration Service recovers the failed session according to the recovery strategy. For more information about task recovery strategies, see Task Recovery Strategies on page 386. You can also use the pmcmd starttask with a -recover option to recover a session. For information about recovering a task with pmcmd, see the Command Line Reference.
To recover a workflow from a session in the Workflow Monitor:
1. Double-click the workflow in the Workflow Monitor to expand it and display the session.
2. Right-click the session and choose Recover Workflow from Task.
The Integration Service recovers the failed session according to the recovery strategy. You can use the pmcmd startworkflow with a -recover option to recover a workflow from a session. For information about recovering a workflow with pmcmd, see the Command Line Reference.
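The following sketches show how these commands might be used. The service, domain, folder, workflow, and session names are illustrative placeholders, and the exact spelling of the recovery option can vary by release, so verify the syntax in the Command Line Reference:

pmcmd starttask -sv MyIntService -d MyDomain -u Administrator -p mypassword -f SalesFolder -w wf_SalesLoad -recover s_SalesLoad
pmcmd startworkflow -sv MyIntService -d MyDomain -u Administrator -p mypassword -f SalesFolder -recover wf_SalesLoad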
Note: To recover a session within a worklet, expand the worklet and then choose to recover the task.
The Integration Service creates a new session log when it runs a recovery session. A session reports performance statistics for the last successful run. You can recover a session containing a transformation that uses the random number generator (RAND) function if you provide a seed parameter.
You must use pass-through partitioning for each transformation. You cannot configure recovery to resume from the last checkpoint for a session that runs on a grid.

When you configure a session for full pushdown optimization, the Integration Service runs the session on the database. As a result, it cannot perform incremental recovery if the session fails. When you perform recovery for sessions that contain SQL overrides, the Integration Service must drop and recreate views.

When you modify a workflow or session between the interrupted run and the recovery run, you might get unexpected results. The Integration Service does not prevent recovery for a modified workflow. The recovery workflow or session log displays a message when the workflow or the task has been modified since the last run.

The pre-session command and pre-SQL commands run only once when you resume a session from the last checkpoint. If a pre- or post-session command or SQL command fails, the Integration Service runs the command again during recovery. Design the commands so you can rerun them.

You cannot configure a session to resume if it writes to a relational target in bulk mode.
You change the number of partitions. If you change the number of partitions after a session fails, the recovery session fails.

The interrupted task has a fail recovery strategy. If you configure a Command or Session recovery to fail and continue the workflow recovery, the task is not recoverable.

Recovery storage file is missing. The Integration Service fails the recovery session or workflow if the recovery storage file is missing from $PMStorageDir or if the definition of $PMStorageDir changes between the original and recovery run.
Recovery table is empty or missing from the target database. The Integration Service fails a recovery session under the following circumstances:
You deleted the table after the Integration Service created it.

The session enabled for recovery failed immediately after the Integration Service removed the recovery information from the table.
You might get inconsistent data if you perform recovery under the following circumstances:
The sources or targets change after the initial session. If you drop or create indexes or edit data in the source or target tables before recovering a session, the Integration Service may return missing or repeated rows.

The source or target code pages change after the initial session failure. If you change the source or target code page, the Integration Service might return incorrect data. You can perform recovery if the code pages are two-way compatible with the original code pages.
Chapter 13
Sending Email
Overview, 402
Configuring Email on UNIX, 403
Configuring Email on Windows, 404
Working with Email Tasks, 410
Working with Post-Session Email, 414
Working with Suspension Email, 421
Using Service Variables to Address Email, 423
Tips, 424
Overview
You can send email to designated recipients when the Integration Service runs a workflow. For example, if you want to track how long a session takes to complete, you can configure the session to send an email containing the time and date the session starts and completes. Or, if you want the Integration Service to notify you when a workflow suspends, you can configure the workflow to send email when it suspends. To send email when the Integration Service runs a workflow, perform the following steps:
Configure the Integration Service to send email. Before creating Email tasks, configure the Integration Service to send email. For more information, see Configuring Email on UNIX on page 403 and Configuring Email on Windows on page 404. If you use a grid or high availability in a Windows environment, you must use the same Microsoft Outlook profile on each node to ensure the Email task can succeed.
Create Email tasks. Before you can configure a session or workflow to send email, you need to create an Email task. For more information, see Working with Email Tasks on page 410.

Configure sessions to send post-session email. You can configure the session to send an email when the session completes or fails. You create an Email task and use it for post-session email. For more information, see Working with Post-Session Email on page 414. When you configure the subject and body of post-session email, use email variables to include information about the session run, such as session name, status, and the total number of rows loaded. You can also use email variables to attach the session log or other files to email messages. For more information, see Email Variables and Format Tags on page 415.
Configure workflows to send suspension email. You can configure the workflow to send an email when the workflow suspends. You create an Email task and use it for suspension email. For more information, see Working with Suspension Email on page 421.
The Integration Service sends the email based on the locale set for the Integration Service process running the session. You can use parameters and variables in the email user name, subject, and text. For Email tasks and suspension email, you can use service, service process, workflow, and worklet variables. For post-session email, you can use any parameter or variable type that you can define in the parameter file. For example, you can use the $PMSuccessEmailUser or $PMFailureEmailUser service variable to specify the email recipient for post-session email. For more information about using service variables in email addresses, see Using Service Variables to Address Email on page 423. For more information about defining parameters and variables in parameter files, see Parameter Files on page 687.
1. Log in to the UNIX system as the PowerCenter user who starts the Informatica Services.
2. Type the following lines at the prompt and press Enter:
rmail <your fully qualified email address>,<second fully qualified email address>
From <your_user_name>
3. To indicate the end of the message, type ^D.
You should receive a blank email from the email account of the user you specify in the From line. If not, locate the directory where rmail resides and add that directory to the path.
1. Log in to the UNIX system as the PowerCenter user who starts the Informatica Services.
2. Type the following line at the prompt and press Enter:
rmail <your fully qualified email address>,<second fully qualified email address>
3. To indicate the end of the message, type . on a separate line and press Enter. Or, type ^D.
You should receive a blank email from the email account of the PowerCenter user. If not, locate the directory where rmail resides and add that directory to the path.
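For example, assuming rmail is in the path and using illustrative placeholder addresses, the second verification looks like the following, with the period on its own line ending the message:

rmail jsmith@company.com,admin@company.com
.

A blank email from the PowerCenter user account should then arrive at both addresses.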
After you verify that rmail is installed correctly, you can send email. For more information about configuring email, see Working with Email Tasks on page 410.
Install the Microsoft Outlook mail client on each node configured to run the Integration Service. Run Microsoft Outlook on a Microsoft Exchange Server.
Complete the following steps to configure the Integration Service on Windows to send email:
1. Configure a Microsoft Outlook profile.
2. Configure Logon network security.
3. Create distribution lists in the Personal Address Book in Microsoft Outlook.
4. Verify the Integration Service is configured to send email using the Microsoft Outlook profile you created in step 1.
The Integration Service on Windows sends email in MIME format. You can include characters in the subject and body that are not in 7-bit ASCII. For more information about the MIME format or the MIME decoding process, see the email documentation.
Note: If you have high availability or if you use a grid, use the same profile for each node.
1. Open the Control Panel on the machine running the Integration Service process.
2. Double-click the Mail (or Mail and Fax) icon.
3. On the Services tab of the user Properties dialog box, click Show Profiles. The Mail dialog box displays the list of profiles configured for the computer.
4. If you have already set up a Microsoft Outlook profile, skip to Step 2. Configure Logon Network Security on page 407. If you do not already have a Microsoft Outlook profile set up, continue to step 5.
5. Click Add in the mail properties window. The Microsoft Outlook Setup Wizard appears.
6. Select Use The Following Information Services and then select Microsoft Exchange Server. Click Next.
7. Enter the name of the Microsoft Exchange Server.
8. Enter the mailbox name. Click Next.
9. Indicate whether you travel with a computer. Click Next.
10. Enter the path to a personal address book. Click Next.
11. Indicate whether you want to run Outlook when you start Windows. Click Next. The Setup Wizard indicates that you have successfully configured an Outlook profile.
12. Click Finish.
1. Open the Control Panel on the machine running the Integration Service process.
2. Double-click the Mail (or Mail and Fax) icon. The User Properties sheet appears.
3. On the Services tab, select Microsoft Exchange Server and click Properties.
4. Click the Advanced tab. Set the Logon network security option to NT Password Authentication.
5. Click OK.
1. From the Administration Console, click the Properties tab for the Integration Service.
2. In the Configuration Properties tab, select Edit.
3. In the MSExchangeProfile field, verify that the name of the Microsoft Exchange profile matches the Microsoft Outlook profile you created.
Session properties. You can configure the session to send email when the session completes or fails. For more information, see Working with Post-Session Email on page 414.

Workflow properties. You can configure the workflow to send email when the workflow is interrupted. For more information, see Working with Suspension Email on page 421.

Workflows or worklets. You can include an Email task anywhere in the workflow or worklet to send email based on a condition you define. For more information, see Using Email Tasks in a Workflow or Worklet on page 410.
Figure 13-1 shows the Edit Tasks dialog box for an Email task:
Figure 13-1. Email Task
Enter the email address using 7-bit ASCII characters only.

You can use service, service process, workflow, and worklet variables in the email address. For more information, see Using Service Variables to Address Email on page 423.

You can send email to any valid email address. On Windows, the mail recipient does not have to have an entry in the Global Address book of the Microsoft Outlook profile.

If the Integration Service runs on Windows, you can send email to multiple recipients by creating a distribution list in the Personal Address book. All recipients must also be in the Global Address book. You cannot enter multiple addresses separated by commas or semicolons.

If the Integration Service runs on UNIX, you can enter multiple email addresses separated by a comma. Do not include spaces between email addresses.
To create an Email task:
1. In the Task Developer, click Tasks > Create. The Create Task dialog box appears.
2. Select an Email task and enter a name for the task. Click Create. The Workflow Manager creates an Email task in the workspace.
3. Click Done to close the Create Task dialog box.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click Rename to enter a name for the task.
6. Enter a description for the task in the Description field.
7. Click the Properties tab.
8. Enter the fully qualified email address of the mail recipient in the Email User Name field. For more information about entering the email address, see Email Address Tips and Guidelines on page 411.
9. Enter the subject of the email in the Email Subject field. You can use a service, service process, workflow, or worklet variable in the email subject. Or, you can leave this field blank.
10. Click the Open button in the Email Text field to open the Email Editor.
11. Enter the text of the email message in the Email Editor. You can use service, service process, workflow, and worklet variables in the email text. Or, you can leave the Email Text field blank.
Note: You can incorporate format tags and email variables in a post-session email. However, you cannot add them to an Email task outside the context of a session. For more information, see Email Variables and Format Tags on page 415.
12. Click OK twice to save your changes.
You can specify a reusable Email task that you create in the Task Developer for either success email or failure email. Or, you can create a non-reusable Email task for each session property. When you create a non-reusable Email task for a session, you cannot use the Email task in a workflow or worklet. You cannot specify a non-reusable Email task you create in the Workflow or Worklet Designer for post-session email. You can use parameters and variables in the email user name, subject, and text. Use any parameter or variable type that you can define in the parameter file. For example, you can use the service variable $PMSuccessEmailUser or $PMFailureEmailUser for the email recipient. Ensure that you specify the values of the service variables for the Integration Service that runs the session. For more information, see Using Service Variables to Address Email on page 423. You can also enter a parameter or variable within the email subject or text, and define it in the parameter file. For more information about defining parameters and variables in parameter files, see Where to Use Parameters and Variables on page 691.
Because large attachments can cause problems with the email system, avoid attaching excessively large files, such as session logs generated using verbose tracing. The Integration Service generates an error message in the email if an error occurs attaching the file. Table 13-1 describes the email variables that you can use in a post-session email:
Table 13-1. Email Variables for Post-Session Email
%a<filename>: Attach the named file. The file must be local to the Integration Service. The following file names are valid: %a<c:\data\sales.txt> or %a</users/john/data/sales.txt>. The email does not display the full path for the file. Only the attachment file name appears in the email. Note: The file name cannot include the greater than character (>) or a line break.
%b: Session start time.
%c: Session completion time.
%d: Name of the repository containing the session.
%e: Session status.
%g: Attach the session log to the message. The Integration Service attaches a session log if you configure the session to create a log file. If you do not configure the session to create a log file or you run a session on a grid, the Integration Service creates a temporary file in the PowerCenter Services installation directory and attaches the file. If the Integration Service does not use operating system profiles, verify that the user that starts Informatica Services has permissions on the PowerCenter Services installation directory to create a temporary log file. If the Integration Service uses operating system profiles, verify that the operating system user of the operating system profile has permissions on the PowerCenter Services installation directory to create a temporary log file.
%i: Session elapsed time (session completion time - session start time).
%l: Total rows loaded.
%m: Name of the mapping used in the session.
%n: Name of the folder containing the session.
%r: Total rows rejected.
%s: Session name.
%t: Source and target table details for the session.
%u: Repository user name.
%v: Integration Service name.
%w: Workflow name.
%y: Session run mode (normal or recovery).
%z: Workflow run instance name.
Note: The Integration Service ignores %a, %g, and %t when you include them in the email subject. Include these variables in the email message only.
Table 13-2 lists the format tags you can use in an Email task:
Table 13-2. Format Tags for Email Tasks
tab: \t
new line: \n
To use a reusable Email task:
2. Select Reusable in the Type column for the success email or failure email field.
3. Click the Open button in the Value column to select the reusable Email task.
4. Select the Email task in the Object Browser dialog box and click OK.
5. Optionally, edit the Email task for this session property by clicking the Edit button in the Value column.
If you edit the Email task for either success email or failure email, the edits only apply to this session.
6. Click OK to close the session properties.

To use a non-reusable Email task:
2. Select Non-Reusable in the Type column for the success email or failure email field.
3. Edit the Email task and click OK. For more information about editing Email tasks, see Working with Email Tasks on page 410.
4. Click OK to close the session properties.
Sample Email
The following example shows user-entered text from a sample post-session email configuration using variables:
Session complete.
Session name: %s
Integration Service name: %v
%l
%r
%e
%b
%c
%i
%g

The resulting email might display the following information:

Completed
Start Time: Tue Nov 22 12:26:31 2005
Completion Time: Tue Nov 22 12:26:41 2005
Elapsed time: 0:00:10 (h:m:s)
You can use service, service process, workflow, and worklet variables in the email user name, subject, and text. For example, you can use the service variable $PMSuccessEmailUser or $PMFailureEmailUser for the email recipient. Ensure that you specify the values of the service variables for the Integration Service that runs the session. For more information, see Using Service Variables to Address Email on page 423. You can also enter a parameter or variable within the email subject or text, and define it in the parameter file. For more information about defining parameters and variables in parameter files, see Where to Use Parameters and Variables on page 691.
To configure suspension email:
1. In the Workflow Designer, open the workflow.
2. Click Workflows > Edit to open the workflow properties.
3. On the General tab, select Suspend on Error.
4. Click the Browse Emails button to select a reusable Email task.
Note: The Workflow Manager returns an error if you do not have any reusable Email tasks in the folder. Create a reusable Email task in the folder before you configure suspension email.
5. Choose a reusable Email task and click OK.
6. Click OK to close the workflow properties.
$PMSuccessEmailUser. Defines the email address of the user to receive email when a session completes successfully. Use this variable with post-session email. You can also use it to address email in standalone Email tasks or suspension email.

$PMFailureEmailUser. Defines the email address of the user to receive email when a session completes with failure or when the Integration Service suspends a workflow. Use this variable with post-session or suspension email. You can also use it to address email in standalone Email tasks.
When you use one of these service variables, the Integration Service sends email to the address configured for the service variable. $PMSuccessEmailUser and $PMFailureEmailUser are optional process variables. Verify that you define a variable before using it to address email. You might use this functionality when you have an administrator who troubleshoots all failed sessions. Instead of entering the administrator email address for each session, use the email variable $PMFailureEmailUser as the recipient for post-session email. If the administrator changes, you can correct all sessions by editing the $PMFailureEmailUser service variable, instead of editing the email address in each session. You might also use this functionality when you have different administrators for different Integration Services. If you deploy a folder from one repository to another or otherwise change the Integration Service that runs the session, the new service sends email to users associated with the new service when you use process variables instead of hard-coded email addresses.
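For example, rather than a hard-coded address, the Email User Name field for failure email can contain just the service variable, assuming an administrator address is defined for it on the Integration Service that runs the session:

$PMFailureEmailUser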
Tips
The following suggestions can extend the capabilities of Email tasks.

When the Integration Service runs on Windows, configure a Microsoft Outlook profile for each node. If you run the Integration Service on multiple nodes in a Windows environment, create a Microsoft Outlook profile for each node. To use the profile on multiple nodes for multiple users, create a generic Microsoft Outlook profile, such as PowerCenter, and use this profile on each node in the domain. Use the same profile on each node to ensure that the Microsoft Exchange Profile you configured for the Integration Service matches the profile on each node.

Use service variables to address email. Use service variables to address email in Email tasks, post-session email, and suspension email. When the service variables $PMSuccessEmailUser and $PMFailureEmailUser are configured for the Integration Service, use them to address email. You can change the email recipient for all sessions the service runs by editing the service variables. It is easier to deploy sessions into production if you define service variables for both development and production servers.

Generate and send post-session reports. Use a post-session success command to generate a report file and attach that file to a success email. For example, you create a batch file called Q3rpt.bat that generates a sales report, and you are running Microsoft Outlook on Windows. Figure 13-4 shows how you can configure the post-session success command to generate a report:
Figure 13-4. Using Post-Session Commands to Generate Reports
Figure 13-5 shows how you can configure success email to attach a report file:
Figure 13-5. Using Email Variables to Attach Reports
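For example, if the post-session success command runs a batch file that writes a report to c:\reports\Q3sales.txt (an illustrative path), the success email text can attach the report with the %a variable:

Quarterly sales report attached.
%a<c:\reports\Q3sales.txt>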
Use other mail programs. If you do not have Microsoft Outlook, use a post-session success command to invoke a command line email program, such as Windmill. In this case, you do not have to enter the email user name or subject, since the recipients, email subject, and body text will be contained in the batch file, sendmail.bat. Figure 13-6 shows how you can configure the post-session success command to invoke a command line email program:
Figure 13-6. Sending Email Without Microsoft Outlook
Chapter 14

Understanding Pipeline Partitioning
Overview, 428
Partitioning Attributes, 429
Dynamic Partitioning, 433
Cache Partitioning, 437
Mapping Variables in Partitioned Pipelines, 438
Partitioning Rules, 439
Configuring Partitioning, 441
Overview
You create a session for each mapping you want the Integration Service to run. Each mapping contains one or more pipelines. A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier. When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel. A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage. If you have the Partitioning option, you can configure multiple partitions for a single pipeline stage. You can configure partitioning information that controls the number of reader, transformation, and writer threads that the master thread creates for the pipeline. You can configure how the Integration Service reads data from the source, distributes rows of data to each transformation, and writes data to the target. You can configure the number of source and target connections to use. Complete the following tasks to configure partitions for a session:
Set partition attributes including partition points, the number of partitions, and the partition types. For more information about partitioning attributes, see Partitioning Attributes on page 429.

You can enable the Integration Service to set partitioning at run time. When you enable dynamic partitioning, the Integration Service scales the number of session partitions based on factors such as the source database partitions or the number of nodes in a grid. For more information about dynamic partitioning, see Dynamic Partitioning on page 433.

After you configure a session for partitioning, you can configure memory requirements and cache directories for each transformation. For more information about cache partitioning, see Cache Partitioning on page 437.

The Integration Service evaluates mapping variables for each partition in a target load order group. You can use variable functions in the mapping to set the variable values. For more information about mapping variables in partitioned pipelines, see Mapping Variables in Partitioned Pipelines on page 438.

When you create multiple partitions in a pipeline, the Workflow Manager verifies that the Integration Service can maintain data consistency in the session using the partitions. When you edit object properties in the session, you can impact partitioning and cause a session to fail. For information about how the Workflow Manager validates partitioning, see Partitioning Rules on page 439.

You add or edit partition points in the session properties. When you change partition points you can define the partition type and add or delete partitions. For more information about configuring partition information, see Configuring Partitioning on page 441.
Partitioning Attributes
You can set the following attributes to partition a pipeline:
Partition points. Partition points mark thread boundaries and divide the pipeline into stages. The Integration Service redistributes rows of data at partition points.

Number of partitions. A partition is a pipeline stage that executes in a single thread. If you purchase the Partitioning option, you can set the number of partitions at any partition point. When you add partitions, you increase the number of processing threads, which can improve session performance.

Partition types. The Integration Service creates a default partition type at each partition point. If you have the Partitioning option, you can change the partition type. The partition type controls how the Integration Service distributes data among partitions at partition points.
Partition Points
By default, the Integration Service sets partition points at various transformations in the pipeline. Partition points mark thread boundaries and divide the pipeline into stages. A stage is a section of a pipeline between any two partition points. When you set a partition point at a transformation, the new pipeline stage includes that transformation. Figure 14-1 shows the default partition points and pipeline stages for a mapping with one pipeline:
Figure 14-1. Default Partition Points and Stages in a Sample Mapping
When you add a partition point, you increase the number of pipeline stages by one. Similarly, when you delete a partition point, you reduce the number of stages by one. Partition points mark the points in the pipeline where the Integration Service can redistribute data across partitions. For example, if you place a partition point at a Filter transformation and define multiple partitions, the Integration Service can redistribute rows of data among the partitions before the Filter transformation processes the data. The partition type you set at this partition point controls the way in which the Integration Service passes rows of data to each partition. For more information about partition points, see Working with Partition Points on page 445.
Number of Partitions
The number of threads that process each pipeline stage depends on the number of partitions. A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in that stage. You can define up to 64 partitions at any partition point in a pipeline. When you increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline. The number of partitions remains consistent throughout the pipeline. If you define three partitions at any partition point, the Workflow Manager creates three partitions at all other partition points in the pipeline. In certain circumstances, the number of partitions in the pipeline must be set to one. Increasing the number of partitions or partition points increases the number of threads. Therefore, increasing the number of partitions or partition points also increases the load on the node. If the node contains enough CPU bandwidth, processing rows of data in a session concurrently can increase session performance. However, if you create a large number of partitions or partition points in a session that processes large amounts of data, you can overload the system. The number of partitions you create equals the number of connections to the source or target. If the pipeline contains a relational source or target, the number of partitions at the source qualifier or target instance equals the number of connections to the database. If the pipeline contains file sources, you can configure the session to read the source with one thread or with multiple threads. Figure 14-2 shows the threads in a mapping with three partitions:
Figure 14-2. Thread Creation for a Mapping with Three Partitions
The figure shows three reader threads in the first stage, six transformation threads across the second and third stages, and three writer threads in the fourth stage: one thread for each partition in each pipeline stage.
When you define three partitions across the mapping, the master thread creates three threads at each pipeline stage, for a total of 12 threads.
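The arithmetic: each of the four pipeline stages runs one thread for each of the three partitions, so the master thread creates 3 partitions x 4 stages = 12 threads (3 reader, 3 + 3 transformation for the two transformation stages, and 3 writer).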
The Integration Service runs the partition threads concurrently. When you run a session with multiple partitions, the threads run as follows:
1. The reader threads run concurrently to extract data from the source.
2. The transformation threads run concurrently in each transformation stage to process data. The Integration Service redistributes data among the partitions at each partition point.
3. The writer threads run concurrently to write data to the target.
Partition point does not exist at multiple input group transformation. When a partition point does not exist at a multiple input group transformation, the Integration Service processes one thread at a time for the multiple input group transformation and all downstream transformations in the stage.

Partition point exists at multiple input group transformation. When a partition point exists at a multiple input group transformation, the Integration Service creates a new pipeline stage and processes the stage with one thread for each partition. The Integration Service creates one transformation thread for each partition regardless of the number of output groups the transformation contains.
Partition Types
When you configure the partitioning information for a pipeline, you must define a partition type at each partition point in the pipeline. The partition type determines how the Integration Service redistributes data across partition points. The Integration Service creates a default partition type at each partition point. If you have the Partitioning option, you can change the partition type. The partition type controls how the Integration Service distributes data among partitions at partition points. You can create different partition types at different points in the pipeline. You can define the following partition types in the Workflow Manager:
Database partitioning. The Integration Service queries the IBM DB2 or Oracle database system for table partition information. It reads partitioned data from the corresponding nodes in the database. You can use database partitioning with Oracle or IBM DB2 source instances on a multi-node tablespace. You can use database partitioning with DB2 targets.
Hash auto-keys. The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key. The Integration Service uses all grouped or sorted ports as a compound partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.

Hash user keys. The Integration Service uses a hash function to group rows of data among partitions. You define the number of ports to generate the partition key.

Key range. With key range partitioning, the Integration Service distributes rows of data based on a port or set of ports that you define as the partition key. For each port, you define a range of values. The Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.

Pass-through. In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions. All rows in a single partition stay in the partition after crossing a pass-through partition point. Choose pass-through partitioning when you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.

Round-robin. The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
For more information about partition types, see Working with Partition Types on page 481.
Dynamic Partitioning
If the volume of data grows or you add more CPUs, you might need to adjust partitioning so the session run time does not increase. When you use dynamic partitioning, you can configure the partition information so the Integration Service determines the number of partitions to create at run time. The Integration Service scales the number of session partitions at run time based on factors such as source database partitions or the number of nodes in a grid. If any transformation in a stage does not support partitioning, or if the partition configuration does not support dynamic partitioning, the Integration Service does not scale partitions in the pipeline. The data passes through one partition. Complete the following tasks to scale session partitions with dynamic partitioning:
Set the partitioning. The Integration Service increases the number of partitions based on the partitioning method you choose. For more information about dynamic partitioning methods, see Configuring Dynamic Partitioning on page 434.

Set session attributes for dynamic partitions. You can set session attributes that identify source and target file names and directories. The session uses the session attributes to create the partition-level attributes for each partition it creates at run time. For more information about setting session attributes for dynamic partitions, see Configuring Partition-Level Attributes on page 436.

Configure partition types. You can edit partition points and partition types using the Partitions view on the Mapping tab of session properties. For information about using dynamic partitioning with different partition types, see Using Dynamic Partitioning with Partition Types on page 435. For information about configuring partition types, see Configuring Partitioning on page 441.
Note: Do not configure dynamic partitioning for a session that contains manual partitions. If you set dynamic partitioning to a value other than disabled and you manually partition the session, the session is invalid.
Disabled. Do not use dynamic partitioning. You define the number of partitions on the Mapping tab.

Based on number of partitions. Sets the partitions to a number that you define in the Number of Partitions attribute. Use the $DynamicPartitionCount session parameter, or enter a number greater than 1.

Based on number of nodes in grid. Sets the partitions to the number of nodes in the grid running the session. If you configure this option for sessions that do not run on a grid, the session runs in one partition and logs a message in the session log.

Based on source partitioning. Determines the number of partitions using database partition information. The number of partitions is the maximum of the number of partitions at the source. For more information about database partitioning, see Database Partitioning Partition Type on page 486.
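For example, when you set the number of partitions with the $DynamicPartitionCount session parameter, a parameter file entry might look like the following sketch. The folder, workflow, and session names are illustrative placeholders; confirm the exact section heading syntax in the parameter file documentation:

[SalesFolder.WF:wf_SalesLoad.ST:s_SalesLoad]
$DynamicPartitionCount=4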
Dynamic partitioning uses the same connection for each partition.

You cannot use dynamic partitioning with XML sources and targets.

You cannot use dynamic partitioning with the Debugger.

When you set dynamic partitioning to a value other than disabled, and you manually partition the session on the Mapping tab, you invalidate the session.

The session fails if you use a parameter other than $DynamicPartitionCount to set the number of partitions.

The following dynamic partitioning configurations cause a session to run with one partition:
You override the default cache directory for an Aggregator, Joiner, Lookup, or Rank transformation. The Integration Service partitions a transformation cache directory when the default is $PMCacheDir.

You override the Sorter transformation default work directory. The Integration Service partitions the Sorter transformation work directory when the default is $PMTempDir.

You use an open-ended range of numbers or date keys with a key range partition type.

You use datatypes other than numbers or dates as keys in key range partitioning.

You use key range relational target partitioning.

You create a user-defined SQL statement or a user-defined source filter.

You set dynamic partitioning to the number of nodes in the grid, and the session does not run on a grid.

You use pass-through relational source partitioning.

You use dynamic partitioning with an Application Source Qualifier.

You use SDK or PowerConnect sources and targets with dynamic partitioning.
Pass-through partitioning. If you change the number of partitions at a partition point, the number of partitions in each pipeline stage changes. If you use pass-through partitioning with a relational source, the session runs in one partition in the stage.

Key range partitioning. You must define a closed range of numbers or date keys to use dynamic partitioning. The keys must be numeric or date datatypes. Dynamic partitioning does not scale partitions with key range partitioning on relational targets.

Database partitioning. When you use database partitioning, the Integration Service creates session partitions based on the source database partitions. Use database partitioning with Oracle and IBM DB2 sources.
Hash auto-keys, hash user keys, or round-robin. Use hash user keys, hash auto-keys, and round-robin partition types to distribute rows with dynamic partitioning. Use hash user keys and hash auto-keys partitioning when you want the Integration Service to distribute rows to the partitions by group. Use round-robin partitioning when you want the Integration Service to distribute rows evenly to partitions.
Cache Partitioning
When you create a session with multiple partitions, the Integration Service may use cache partitioning for the Aggregator, Joiner, Lookup, Rank, and Sorter transformations. When the Integration Service partitions a cache, it creates a separate cache for each partition and allocates the configured cache size to each partition. The Integration Service stores different data in each cache, where each cache contains only the rows needed by that partition. As a result, the Integration Service requires a portion of total cache memory for each partition. After you configure the session for partitioning, you can configure memory requirements and cache directories for each transformation in the Transformations view on the Mapping tab of the session properties. To configure the memory requirements, calculate the total requirements for a transformation, and divide by the number of partitions; a worked example follows Table 14-1. To improve performance, you can configure separate directories for each partition. Table 14-1 describes the situations when the Integration Service uses cache partitioning for each applicable transformation:
Table 14-1. Cache Partitioning for Each Transformation
Aggregator Transformation: You create multiple partitions in a session with an Aggregator transformation. You do not have to set a partition point at the Aggregator transformation.
Joiner Transformation: You create a partition point at the Joiner transformation. For more information about partitioning with Joiner transformations, see Partitioning Joiner Transformations on page 469.
Lookup Transformation: You create a hash auto-keys partition point at the Lookup transformation. For more information about partitioning with Lookup transformations, see Partitioning Lookup Transformations on page 476.
Rank Transformation: You create multiple partitions in a session with a Rank transformation. You do not have to set a partition point at the Rank transformation.
Sorter Transformation: You create multiple partitions in a session with a Sorter transformation. You do not have to set a partition point at the Sorter transformation.
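For example, following the guideline above with illustrative numbers: if an Aggregator transformation requires 24 MB of cache in total and the session runs with three partitions, configure the cache size as 24 MB / 3 = 8 MB, because the Integration Service allocates the configured cache size to each of the three partition caches.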
For more information about the variable functions, see the PowerCenter Transformation Language Reference. Table 14-2 describes how the Integration Service calculates variable values across partitions:
Table 14-2. Variable Value Calculations with Partitioned Sessions
SetCountVariable: Integration Service calculates the final count values from all partitions.
SetMaxVariable: Integration Service compares the final variable value for each partition and saves the highest value.
SetMinVariable: Integration Service compares the final variable value for each partition and saves the lowest value.
Note: Use variable functions only once for each mapping variable in a pipeline. The Integration Service processes variable functions as it encounters them in the mapping. The order in which the Integration Service encounters variable functions in the mapping may not be the same for every session run. This may cause inconsistent results when you use the same variable function multiple times in a mapping.
Partitioning Rules
You can create multiple partitions in a pipeline if the Integration Service can maintain data consistency when it processes the partitioned data. When you create a session, the Workflow Manager validates each pipeline for partitioning. For information about partitioning transformations, see Working with Partition Points on page 445.
You delete a transformation that was a partition point.

You add a transformation that is a default partition point.

You move a transformation that is a partition point to a different pipeline.

You change a transformation that is a partition point in any of the following ways: the existing partition type is invalid, the transformation can no longer support multiple partitions, or the transformation is no longer a valid partition point.

You disable partitioning or you change the partitioning between a single node and a grid in a transformation after you create a pipeline with multiple partitions.

You switch the master and detail source for the Joiner transformation after you create a pipeline with multiple partitions.
Configuring Partitioning
When you create or edit a session, you can change the partitioning for each pipeline in a mapping. If the mapping contains multiple pipelines, you can specify multiple partitions in some pipelines and single partitions in others. You update partitioning information using the Partitions view on the Mapping tab of session properties. Add, delete, or edit partition points on the Partitions view of session properties. If you add a key range partition point, you can define the keys in each range. Figure 14-4 shows the configuration options on the Partitions view on the Mapping tab:
Figure 14-4. Session Properties Partitions View on the Mapping Tab
Table 14-3 lists the configuration options for the Partitions view on the Mapping tab:
Table 14-3. Options on Session Properties Partitions View on the Mapping Tab
- Add Partition Point: Click to add a new partition point. When you add a partition point, the transformation name appears under the Partition Points node. For more information, see Overview on page 446.
- Delete Partition Point: Click to delete the selected partition point. You cannot delete certain partition points. For more information, see Overview on page 446.
- Edit Partition Point: Click to edit the selected partition point. This opens the Edit Partition Point dialog box. For more information about the options in this dialog box, see Table 14-4 on page 443.
- Key Range: Displays the key and key ranges for the partition point, depending on the partition type. For key range partitioning, specify the key ranges. For hash user keys partitioning, this field displays the partition key. The Workflow Manager does not display this area for other partition types.
- Edit Keys: Click to add or remove the partition key for key range or hash user keys partitioning. You cannot create a partition key for hash auto-keys, round-robin, or pass-through partitioning.
Use the Edit Partition Point dialog box to complete the following tasks:
- Specify the partition type at the partition point.
- Add and delete partitions.
- Enter a description for each partition.
Figure 14-5 shows the configuration options where you edit partition points:
Figure 14-5. Edit Partition Point Dialog Box
You can enter a description for each partition you create. To enter a description, select the partition in the Edit Partition Point dialog box, and then enter the description in the Description field.
To add a partition point:
1. On the Partitions view of the Mapping tab, select a transformation that is not already a partition point, and click the Add a Partition Point button.
Tip: You can select a transformation from the Non-Partition Points node.
2. Select the partition type for the partition point or accept the default value. For more information about specifying a valid partition type, see Setting Partition Types on page 484.
3. Click OK.
The transformation appears in the Partition Points node in the Partitions view on the Mapping tab of the session properties.
Chapter 15
Working with Partition Points
This chapter includes the following topics:
- Overview, 446
- Adding and Deleting Partition Points, 447
- Partitioning Relational Sources, 450
- Partitioning File Sources, 453
- Partitioning Relational Targets, 459
- Partitioning File Targets, 461
- Partitioning Custom Transformations, 466
- Partitioning Joiner Transformations, 469
- Partitioning Lookup Transformations, 476
- Partitioning Sorter Transformations, 477
- Restrictions for Transformations, 479
Overview
Partition points mark the boundaries between threads in a pipeline. The Integration Service redistributes rows of data at partition points. You can add partition points to increase the number of transformation threads and increase session performance. For information about adding and deleting partition points, see Adding and Deleting Partition Points on page 447.

When you configure a session to read a source database, the Integration Service creates a separate connection and SQL query to the source database for each partition. You can customize or override the SQL query. For more information about partitioning relational sources, see Partitioning Relational Sources on page 450.

When you configure a session to load data to a relational target, the Integration Service creates a separate connection to the target database for each partition at the target instance. You configure the reject file names and directories for the target. The Integration Service creates one reject file for each target partition. For more information about partitioning relational targets, see Partitioning Relational Targets on page 459.

You can configure a session to read a source file with one thread or with multiple threads. You must choose the same connection type for all partitions that read the file. For more information about partitioning source files, see Partitioning File Sources on page 453.

When you configure a session to write to a file target, you can write the target output to a separate file for each partition or to a merge file that contains the target output for all partitions. You can configure connection settings and file properties for each target partition. For more information about configuring target files, see Partitioning File Targets on page 461.

When you create a partition point at transformations, the Workflow Manager sets the default partition type. You can change the partition type depending on the transformation type.
Adding and Deleting Partition Points

The Workflow Manager creates default partition points at the following transformations:
- Rank transformations and unsorted Aggregator transformations. You can delete these partition points if the pipeline contains only one partition or if the Integration Service passes all rows in a group to a single partition before they enter the transformation.
- Source Qualifier transformations. You cannot delete this partition point.
- Target instances. Controls how the writer passes data to the targets. You cannot delete this partition point.
- Multiple input group transformations. The Workflow Manager creates a partition point at a multiple input group transformation when it is configured to process each partition with one thread, or when a downstream multiple input group Custom transformation is configured to process each partition with one thread. For example, the Workflow Manager creates a partition point at a Joiner transformation that is connected to a downstream Custom transformation configured to use one thread per partition. This ensures that the Integration Service uses one thread to process each partition at a Custom transformation that requires one thread per partition. You cannot delete this partition point.
You can add and delete partition points at transformations in the pipeline according to the following rules:
- You cannot create a partition point at a source instance.
- You cannot create a partition point at a Sequence Generator transformation or an unconnected transformation.
- You can add a partition point at any other transformation provided that no partition point receives input from more than one pipeline stage.
- You cannot delete a partition point at a Source Qualifier transformation, a Normalizer transformation for COBOL sources, or a target instance.
- You cannot delete a partition point at a multiple input group Custom transformation that is configured to use one thread per partition.
- You cannot delete a partition point at a multiple input group transformation that is upstream from a multiple input group Custom transformation that is configured to use one thread per partition.

The following partition types have restrictions with dynamic partitioning:
- Pass-through. When you use dynamic partitioning, if you change the number of partitions at a partition point, the number of partitions in each pipeline stage changes.
- Key range. To use key range with dynamic partitioning, you must define a closed range of numbers or date keys. If you use an open-ended range, the session runs with one partition.
In this mapping, the Workflow Manager creates partition points at the source qualifier and target instance by default. You can place an additional partition point at Expression transformation EXP_3. If you place a partition point at EXP_3 and define one partition, the master thread creates the following threads:
- One reader thread (first stage).
- One transformation thread for the second stage, from the source qualifier to EXP_3.
- One transformation thread for the third stage, from EXP_3 to the target.
- One writer thread (fourth stage).
In this case, each partition point receives data from only one pipeline stage, so EXP_3 is a valid partition point. The following transformations are not valid partition points:
- Source: This is a source instance.
- SG_1: This is a Sequence Generator transformation.
- EXP_1 and EXP_2: If you could place a partition point at EXP_1 or EXP_2, you would create an additional pipeline stage that processes data from the source qualifier to EXP_1 or EXP_2. In this case, EXP_3 would receive data from two pipeline stages, which is not allowed.
For more information about processing threads, see Integration Service Architecture in the PowerCenter Administrator Guide.
Partitioning Relational Sources

If the Integration Service cannot use the configured partitioning for a relational source, it reverts to pass-through partitioning and prints a message in the session log. Figure 15-2 shows where you can override the SQL query for each source partition:
Figure 15-2. Overriding the SQL Query and Entering a Filter Condition
For more information about partitioning application sources, such as TIBCO, see the PowerExchange documentation.
The SQL query you enter for a source partition overrides any customized SQL query that you set in the Designer when you configure the Source Qualifier transformation. For more information, see Source Qualifier Transformation in the Transformation Guide. The SQL query also overrides any key range and filter condition that you enter for a source partition. So, if you also enter a key range and source filter, the Integration Service uses the SQL query override to extract source data. If you create a key that contains null values, you can extract the nulls by creating another partition and entering an SQL query or filter to extract null values.

To enter an SQL query for each partition, click the Browse button in the SQL Query field. Enter the query in the SQL Editor dialog box, and then click OK. If you entered an SQL query in the Designer when you configured the Source Qualifier transformation, that query appears in the SQL Query field for each partition. To override this query, click the Browse button in the SQL Query field, revise the query in the SQL Editor dialog box, and then click OK.
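As an illustration, a minimal sketch of a per-partition override that extracts rows with null keys; the table and column names follow the CUSTOMERS example used later in this section:

SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.LAST_NAME FROM CUSTOMERS WHERE CUSTOMERS.CUSTOMER_ID IS NULL

Entered in the SQL Query field for one partition, a query like this extracts only the rows with a null partition key, while the queries for the remaining partitions extract the non-null key ranges.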
If you know that the IDs for customers outside the USA fall within the range for a particular partition, you can enter a filter in that partition to exclude them. Therefore, you enter the following filter condition for the second partition:
CUSTOMERS.COUNTRY = 'USA'
When the session runs, the following queries for the two partitions appear in the session log:
READER_1_1_1> RR_4010 SQ instance [SQ_CUSTOMERS] SQL Query [SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.LAST_NAME FROM CUSTOMERS WHERE CUSTOMERS.CUSTOMER_ID < 135000] [...]
READER_1_1_2> RR_4010 SQ instance [SQ_CUSTOMERS] SQL Query [SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.LAST_NAME FROM CUSTOMERS WHERE CUSTOMERS.COUNTRY = 'USA' AND 135000 <= CUSTOMERS.CUSTOMER_ID]
To enter a filter condition, click the Browse button in the Source Filter field. Enter the filter condition in the SQL Editor dialog box, and then click OK. If you entered a filter condition in the Designer when you configured the Source Qualifier transformation, that query appears in the Source Filter field for each partition. To override this filter, click the Browse button in the Source Filter field, change the filter condition in the SQL Editor dialog box, and then click OK.
Partitioning File Sources

When you create a session to read file sources, you can configure the following types of source input:
- Flat file. You can configure a session to read flat file, XML, or COBOL source files.
- Command. You can configure a session to use an operating system command to generate source data rows or generate a file list. For more information about using a command to generate source data, see Working with File Sources on page 263.
When connecting to file sources, you must choose the same connection type for all partitions. You may choose different connection objects as long as each object is of the same type. To specify single- or multi-threaded reading for flat file sources, configure the source file name property for partitions 2-n. To configure for single-threaded reading, pass empty data through partitions 2-n. To configure for multi-threaded reading, leave the source file name blank for partitions 2-n. For more information about configuring file properties with multiple partitions, see Configuring for File Partitioning on page 454.
Use the following rules and guidelines when you configure a file source session with multiple partitions:
- Use pass-through partitioning at the source qualifier.
- Use single- or multi-threaded reading with flat file or COBOL sources.
- Use single-threaded reading with XML sources.
- You cannot use multi-threaded reading if the source files are non-disk files, such as FTP files or WebSphere MQ sources.
- If you use a shift-sensitive code page, use multi-threaded reading only if the following conditions are true:
  - The file is fixed-width.
  - The file is not line sequential.
  - You did not enable user-defined shift state in the source definition.
To read data from the three flat files concurrently, you must specify three partitions at the source qualifier. Accept the default partition type, pass-through.

If you configure a session for multi-threaded reading, and the Integration Service cannot create multiple threads to a file source, it writes a message to the session log and reads the source with one thread. When the Integration Service uses multiple threads to read a source file, it may not read the rows in the file sequentially. If sort order is important, configure the session to read the file with a single thread. For example, sort order may be important if the mapping contains a sorted Joiner transformation and the file source is the sort origin.
You can also use a combination of direct and indirect files to balance the load. Session performance for multi-threaded reading is optimal with large source files. The load may be unbalanced if the amount of input data is small. You cannot use a command for a file source if the command generates source data and the session is configured to run on a grid or is configured with the resume from the last checkpoint recovery strategy.
When the Integration Service uses one thread to read a file source, it behaves as follows:
- Reading direct files. You can configure the Integration Service to read from one or more direct files. If you configure the session with more than one direct file, the Integration Service creates a concurrent connection to each file. It does not create multiple connections to a file.
- Reading indirect files. When the Integration Service reads an indirect file, it reads the file list and then reads the files in the list sequentially. If the session has more than one file list, the Integration Service reads the file lists concurrently, and it reads the files in the list sequentially.
When the Integration Service uses multiple threads to read a file source, it behaves as follows:
- Reading direct files. When the Integration Service reads a direct file, it creates multiple reader threads to read the file concurrently. You can configure the Integration Service to read from one or more direct files. For example, if a session reads from two files and you create five partitions, the Integration Service may distribute one file between two partitions and one file between three partitions.
- Reading indirect files. When the Integration Service reads an indirect file, it creates multiple threads to read the file list concurrently. It also creates multiple threads to read the files in the list concurrently. The Integration Service may use more than one thread to read a single file.
When you click the source instance name for a file source, the Workflow Manager displays connection and file properties in the session properties. You can configure the source file names and directories for each source partition. The Workflow Manager generates a file name and location for each partition. Table 15-2 describes the file properties settings for file sources in a mapping:
Table 15-2. File Properties Settings for File Sources
- Input Type (Required): Type of source input. You can choose the following types of source input:
  - File. For flat file, COBOL, or XML sources.
  - Command. For source data or a file list generated by a command. You cannot use a command to generate XML source data.
- Concurrent read partitioning (Optional): Order in which multiple partitions read input rows from a source file. You can choose the following options:
  - Optimize throughput. The Integration Service does not preserve input row order.
  - Keep relative input row order. The Integration Service preserves the input row order for the rows read by each partition.
  - Keep absolute input row order. The Integration Service preserves the input row order for all rows read by all partitions.
  For more information, see Configuring Concurrent Read Partitioning on page 457.
- Source File Directory (Optional): Directory name of flat file source. By default, the Integration Service looks in the service process variable directory, $PMSourceFileDir, for file sources. If you specify both the directory and file name in the Source Filename field, clear this field. The Integration Service concatenates this field with the Source Filename field when it runs the session. You can also use the $InputFileName session parameter to specify the file location.
- Source Filename (Optional): File name, or file name and path of flat file source. Optionally, use the $InputFileName session parameter for the file name. The Integration Service concatenates this field with the Source File Directory field when it runs the session. For example, if you have C:\data\ in the Source File Directory field, then enter filename.dat in the Source Filename field. When the Integration Service begins the session, it looks for C:\data\filename.dat. By default, the Workflow Manager enters the file name configured in the source definition.
- Source Filetype (Optional): You can choose the following source file types:
  - Direct. For source files that contain the source data.
  - Indirect. For source files that contain a list of files. When you select Indirect, the Integration Service finds the file list and reads each listed file when it runs the session.
- Command (Optional): Command used to generate source data.

Table 15-3 describes the session configuration and the Integration Service behavior when it uses a single thread to read source files:
Table 15-3. Configuring Source File Name for Single-Threaded Reading
- Partition #1: ProductsA.txt, Partition #2: empty.txt, Partition #3: empty.txt. The Integration Service creates one thread to read ProductsA.txt. It reads rows in the file sequentially. After it reads the file, it passes the data to three partitions in the transformation pipeline.
- Partition #1: ProductsA.txt, Partition #2: empty.txt, Partition #3: ProductsB.txt. The Integration Service creates two threads. It creates one thread to read ProductsA.txt, and it creates one thread to read ProductsB.txt. It reads the files concurrently, and it reads rows in the files sequentially.
If you use FTP to access source files, you can choose a different connection for each direct file. For more information about using FTP to access source files, see Using FTP on page 749.
To configure multi-threaded reading with commands, enter a command for each partition so that the Integration Service creates a reader thread for the data generated by each command. Otherwise, the Integration Service uses partitions 2-n to read a portion of the data generated by the command for the first partition. Table 15-4 describes the session configuration and the Integration Service behavior when it uses multiple threads to read source files:
Table 15-4. Configuring Source File Name for Multi-Threaded Reading
- Partition #1: ProductsA.txt, Partition #2: <blank>, Partition #3: <blank>. The Integration Service creates three threads to concurrently read ProductsA.txt.
- Partition #1: ProductsA.txt, Partition #2: <blank>, Partition #3: ProductsB.txt. The Integration Service creates three threads to read ProductsA.txt and ProductsB.txt concurrently. Two threads read ProductsA.txt and one thread reads ProductsB.txt.
Table 15-5 describes the session configuration and the Integration Service behavior when it uses multiple threads to read source data piped from a command:
Table 15-5. Configuring Commands for Multi-Threaded Reading
- Partition #1: CommandA, Partition #2: <blank>, Partition #3: <blank>. The Integration Service creates three threads to concurrently read data piped from the command.
- Partition #1: CommandA, Partition #2: <blank>, Partition #3: CommandB. The Integration Service creates three threads to read data piped from CommandA and CommandB. Two threads read the data piped from CommandA and one thread reads the data piped from CommandB.
Configuring Concurrent Read Partitioning

You can configure the following concurrent read partitioning options:
- Optimize throughput. The Integration Service does not preserve row order when multiple partitions read from a single file source. Use this option if the order in which multiple partitions read from a file source is not important.
- Keep relative input row order. The Integration Service preserves the sort order of the input rows read by each partition. Table 15-6 shows an example sort order of a file source with 10 rows read by two partitions:
Table 15-6. Keep Relative Input Row Order
- Partition #1 reads rows: 1, 3, 5, 8, 9
- Partition #2 reads rows: 2, 4, 6, 7, 10
- Keep absolute input row order. The Integration Service preserves the sort order of all input rows read by all partitions. Use this option if you want to preserve the sort order of the input rows each time the session runs. In a pass-through mapping with passive transformations, the rows are written to the target in the same order as they are read from the source. Table 15-7 shows an example sort order of a file source with 10 rows read by two partitions:
Table 15-7. Keep Absolute Input Row Order
- Partition #1 reads rows: 1, 2, 3, 4, 5
- Partition #2 reads rows: 6, 7, 8, 9, 10
Note: By default, the Integration Service uses the Keep absolute input row order option in sessions configured with the resume from the last checkpoint recovery strategy.
Partitioning Relational Targets
Table 15-8 describes the partitioning attributes for relational targets in a pipeline:
Table 15-8. Partitioning Relational Target Attributes
- Reject File Directory: Location for the target reject files. Default is $PMBadFileDir.
- Reject File Name: Name of the reject file. Default is <target name><partition number>.bad. You can also use the session parameter, $BadFileName, as defined in the parameter file.
Database Compatibility
When you configure a session with multiple partitions at the target instance, the Integration Service creates one connection to the target for each partition. If you configure multiple target partitions in a session that loads to a database or ODBC target that does not support multiple concurrent connections to tables, the session fails. When you create multiple target partitions in a session that loads data to an Informix database, you must create the target table with row-level locking. If you insert data from a session with multiple partitions into an Informix target configured for page-level locking, the session fails and returns the following message:
WRT_8206 Error: The target table has been created with page level locking. The session can only run with multi partitions when the target table is created with row level locking.
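For illustration, a minimal sketch of Informix DDL that creates a target table with row-level locking; the table and column names are hypothetical:

CREATE TABLE ORDERS (
    ORDER_ID INTEGER NOT NULL,
    ITEM_ID  INTEGER,
    QUANTITY INTEGER
) LOCK MODE ROW;

An existing table can be switched to row-level locking with a statement such as ALTER TABLE ORDERS LOCK MODE (ROW).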
Sybase IQ does not allow multiple concurrent connections to tables. If you create multiple target partitions in a session that loads to Sybase IQ, the Integration Service loads all of the data in one partition.
Partitioning File Targets

You can configure the following connection types for partitioned file targets:
- None. Write the partitioned target files to the local machine.
- FTP. Transfer the partitioned target files to another machine. You can transfer the files to any machine to which the Integration Service can connect. For more information about using FTP to load to target files, see Using FTP on page 749.
- Loader. Use an external loader that can load from multiple output files. This option appears if the pipeline loads data to a relational target and you choose a file writer in the Writers settings on the Mapping tab. If you choose a loader that cannot load from multiple output files, the Integration Service fails the session. For more information about configuring external loaders for partitioning, see Partitioning Sessions with External Loaders on page 714.
- Message Queue. Transfer the partitioned target files to a WebSphere MQ message queue. For more information about loading to message queues, see the PowerExchange for WebSphere MQ User Guide.
Note: You can merge target files if you choose a local or FTP connection type for all target partitions. You cannot merge output files from sessions with multiple partitions if you use an external loader or a WebSphere MQ message queue as the target connection type.
Table 15-9 describes the connection options for file targets in a mapping:
Table 15-9. File Targets Connection Options
- Connection Type: Choose an FTP, external loader, or message queue connection. Select None for a local connection. The connection type is the same for all partitions.
- Value: For an FTP, external loader, or message queue connection, click the Open button in this field to select the connection object. You can specify a different connection object for each partition.
Table 15-10 describes the file properties for file targets in a mapping:
Table 15-10. Target File Properties
- Merge Type: Type of merge the Integration Service performs on the data for partitioned targets. When merging target files, the Integration Service writes the output for all partitions to the merge file or a command when the session runs. You cannot merge files if the session uses an external loader or a message queue. For more information about merging target files, see Configuring Merge Options on page 465.
- Merge File Directory: Location of the merge file. Default is $PMTargetFileDir.
- Merge File Name: Name of the merge file. Default is <target name>.out.
- Append if Exists: Appends the output data to the target files and reject files for each partition. Appends output data to the merge file if you merge the target files. You cannot use this option for target files that are non-disk files, such as FTP target files. If you do not select this option, the Integration Service truncates each target file before writing the output data to the target file. If the file does not exist, the Integration Service creates it.
- Output Type: Type of target for the session. Select File to write the target data to a file target. Select Command to send target data to a command. You cannot select Command for FTP or queue target connections. For more information about processing target data with a command, see Configuring Commands for Partitioned File Targets on page 465.
- Header Options: Create a header row in the file target.
- Header Command: Command used to generate the header row in the file target.
- Footer Command: Command used to generate a footer row in the file target.
- Merge Command: Command used to process merged target data. For more information about using a command to process merged target data, see Configuring Commands for Partitioned File Targets on page 465.
- Output File Directory: Location of the target file. Default is $PMTargetFileDir.
- Output File Name: Name of target file. Default is <target name><partition number>.out. You can also use the session parameter, $OutputFileName, as defined in the parameter file.
- Reject File Directory: Location for the target reject files. Default is $PMBadFileDir.
- Reject File Name: Name of reject file. Default is <target name><partition number>.bad. You can also use the session parameter, $BadFileName, as defined in the parameter file.
- Command: Command used to process the target output data for a single partition. For more information, see Configuring Commands for Partitioned File Targets on page 465.
Note: For more information about configuring properties for file targets, see Working with File Targets on page 314.
Configuring Commands for Partitioned File Targets

Use a command to process target data for a single partition or to process merge data for all target partitions:
- Target data for a single partition. You can enter a command for each target partition. The Integration Service sends the target data to the command when the session runs. To send the target data for a single partition to a command, select Command for the Output Type. Enter a command for the Command property for the partition in the session properties.
- Merge data for all target partitions. You can enter a command to process the merge data for all partitions. The Integration Service concurrently sends the target data for all partitions to the command when the session runs. The command may not maintain the order of the target data. To send merge data for all partitions to a command, select Command as the Output Type and enter a command for the Merge Command Line property in the session properties.
For more information about using commands with flat file targets, see Working with File Targets on page 314.
Configuring Merge Options

You can merge target files in the following ways:
- Sequential Merge. The Integration Service creates an output file for each partition and then merges them into a single merge file at the end of the session. The Integration Service sequentially adds the output data for each partition to the merge file. The Integration Service creates the individual target file using the Output File Name and Output File Directory values for the partition.
- File list. The Integration Service creates a target file for each partition and creates a file list that contains the paths of the individual files. The Integration Service creates the individual target file using the Output File Name and Output File Directory values for the partition. If you write the target files to the merge directory or a directory under the merge directory, the file list contains relative paths. Otherwise, the list file contains absolute paths. Use this file as a source file if you use the target files as source files in another mapping.
- Concurrent Merge. The Integration Service concurrently writes the data for all target partitions to the merge file. It does not create intermediate files for each partition. Since the Integration Service writes to the merge file concurrently for all partitions, the sort order of the data in the merge file may not be sequential.
For more information about merging targets files in sessions that use an FTP connection, see Configuring FTP in a Session on page 753.
Partitioning Custom Transformations

When a mapping contains a Custom transformation, you can edit the following partitioning information:
- Add multiple partitions. You can create multiple partitions when the Custom transformation allows multiple partitions. For more information, see Working with Multiple Partitions on page 466.
- Create partition points. You can create a partition point at a Custom transformation even when the transformation does not allow multiple partitions. For more information, see Creating Partition Points on page 466.
The Java, SQL, and HTTP transformations were built using the Custom transformation and have the same partitioning features. Not all transformations created using the Custom transformation have the same partitioning features as the Custom transformation. When you configure a Custom transformation to process each partition with one thread, the Workflow Manager adds partition points depending on the mapping configuration. For more information, see Working with Threads on page 467. For more information about Custom transformations, see Custom Transformation in the Transformation Guide.
You can set the Is Partitionable property of a Custom transformation to one of the following options:
- No. The transformation cannot be partitioned. The transformation and other transformations in the same pipeline are limited to one partition. You might choose No if the transformation processes all the input data together, such as data cleansing.
- Locally. The transformation can be partitioned, but the Integration Service must run all partitions in the pipeline on the same node. Choose Locally when different partitions of the transformation must share objects in memory.
- Across Grid. The transformation can be partitioned, and the Integration Service can distribute each partition to different nodes.
Note: When you add multiple partitions to a mapping that includes a multiple input or output group Custom transformation, you define the same number of partitions for all groups.
You can define the partition type for each input group in the transformation. You cannot define the partition type for output groups. Valid partition types are pass-through, round-robin, key range, and hash user keys.
CT_quartile contains one input group and is downstream from a multiple input group transformation. CT_quartile requires one thread for each partition, but the upstream Custom transformation does not. The Workflow Manager creates a partition point at the closest upstream multiple input group transformation, CT_Sort.
CT_Order_class and CT_Order_Prep have multiple input groups, but only CT_Order_Prep requires one thread for each partition. The Workflow Manager creates a partition point at CT_Order_Prep.
Partitioning Joiner Transformations

When you join sorted data, you can partition the master and detail pipelines in the following ways:
- 1:n. Use one partition for the master source and multiple partitions for the detail source. The Integration Service maintains the sort order because it does not redistribute master data among partitions.
- n:n. Use an equal number of partitions for the master and detail sources. When you use n:n partitions, the Integration Service processes multiple partitions concurrently. You may need to configure the partitions to maintain the sort order depending on the type of partition you use at the Joiner transformation.
Note: When you use 1:n partitions, do not add a partition point at the Joiner transformation. If you add a partition point at the Joiner transformation, the Workflow Manager adds an equal number of partitions to both master and detail pipelines.

Use different partitioning guidelines, depending on where you sort the data:
- Using sorted flat files. Use one of the following partitioning configurations:
  - Use 1:n partitions when you have one flat file in the master pipeline and multiple flat files in the detail pipeline. Configure the session to use one reader thread for each file.
  - Use n:n partitions when you have one large flat file in the master and detail pipelines. Configure partitions to pass all sorted data in the first partition, and pass empty file data in the other partitions.
- Using sorted relational data. Use one of the following partitioning configurations:
  - Use 1:n partitions for the master and detail pipeline.
  - Use n:n partitions. If you use a hash auto-keys partition, configure partitions to pass all sorted data in the first partition.
- Using the Sorter transformation. Use n:n partitions. If you use a hash auto-keys partition at the Joiner transformation, configure each Sorter transformation to use hash auto-keys partition points as well.
Add only pass-through partition points between the sort origin and the Joiner transformation.
Configure the mapping with one source and one Source Qualifier transformation in each pipeline.
Specify the path and file name for each flat file in the Properties settings of the Transformations view on the Mapping tab of the session properties. Each file must use the same file properties as configured in the source definition. The range of sorted data in the flat files can overlap. You do not need to use a unique range of data for each file.
Figure 15-6 shows sorted file data joined using 1:n partitioning:
Figure 15-6. Sorted File Data with 1:n Partitions
The Joiner transformation may output unsorted data depending on the join type. If you use a full outer or detail outer join, the Integration Service processes unmatched master rows last, which can result in unsorted data.
Figure 15-7 shows sorted file data passed through a single partition to maintain sort order:
Figure 15-7. Sorted File Data Passed Through a Single Partition
The example in Figure 15-7 shows sorted data passed in a single partition to maintain the sort order. The first partition contains sorted file data while all other partitions pass empty file data. At the Joiner transformation, the Integration Service distributes the data among all partitions while maintaining the order of the sorted data.
Figure 15-8 shows sorted relational data joined using 1:n partitions, with a key-range or pass-through partition point at the Source Qualifier transformation in the detail pipeline.
The Joiner transformation may output unsorted data depending on the join type. If you use a full outer or detail outer join, the Integration Service processes unmatched master rows last, which can result in unsorted data.
Figure 15-9 shows sorted relational data passed through a single partition to maintain the sort order:
Figure 15-9. Sorted Relational Data Passed Through a Single Partition
The example in Figure 15-9 shows sorted relational data passed in a single partition to maintain the sort order. The first partition contains sorted relational data while all other partitions pass empty data. After the Integration Service joins the sorted data, it redistributes data among multiple partitions.
Figure 15-10 shows Sorter transformations used with hash auto-keys to maintain sort order:
Figure 15-10. Using Sorter Transformations with Hash Auto-Keys to Maintain Sort Order
Note: For best performance, use sorted flat files or sorted relational data. You may want to calculate the processing overhead for adding Sorter transformations to the mapping.
Partitioning Lookup Transformations

The Integration Service uses cache partitioning for a Lookup transformation when the following conditions are true:
- You use the hash auto-keys partition type for the Lookup transformation.
- The lookup condition contains only equality operators.
- The database is configured for case-sensitive comparison.

For example, if the lookup condition contains a string port and the database is not configured for case-sensitive comparison, the Integration Service does not perform cache partitioning and writes the following message to the session log:
CMN_1799 Cache partitioning requires case sensitive string comparisons. Lookup will not use partitioned cache as the database is configured for case insensitive string comparisons.
The Integration Service uses cache partitioning when you create a hash auto-keys partition point at the Lookup transformation. When the Integration Service creates cache partitions, it begins creating caches for the Lookup transformation when the first row of any partition reaches the Lookup transformation. If you configure the Lookup transformation for concurrent caches, the Integration Service builds all caches for the partitions concurrently. For more information about creating concurrent or sequential caches, see Lookup Caches in the Transformation Guide. For more information about cache partitioning, see Cache Partitioning on page 437.
Lookup transformations can share a partitioned cache if the transformations meet the following conditions:
- The cache structures are identical. The lookup/output ports for the first shared transformation must match the lookup/output ports for the subsequent transformations.
- The transformations have the same lookup conditions, and the lookup condition columns are in the same order.
You cannot share a partitioned cache with a non-partitioned cache. When you share Lookup caches across target load order groups, you must configure the target load order groups with the same number of partitions. If the Integration Service detects a mismatch between Lookup transformations sharing an unnamed cache, it rebuilds the cache files. If the Integration Service detects a mismatch between Lookup transformations sharing a named cache, it fails the session.
Partitioning Sorter Transformations
Figure 15-11 shows where you specify the work directories in the session properties:
Figure 15-11. Session Properties - Configuring Sorter Transformations
Restrictions for Transformations
Sequence numbers generated by Normalizer and Sequence Generator transformations might not be sequential for a partitioned source, but they are unique.
Chapter 16
This chapter includes the following topics:
- Overview, 482
- Setting Partition Types, 484
- Database Partitioning Partition Type, 486
- Hash Auto-Keys Partition Type, 490
- Hash User Keys Partition Type, 491
- Key Range Partition Type, 493
- Pass-Through Partition Type, 497
- Round-Robin Partition Type, 499
Overview
The Integration Service creates a default partition type at each partition point. If you have the Partitioning option, you can change the partition type. The partition type controls how the Integration Service distributes data among partitions at partition points. When you configure the partitioning information for a pipeline, you must define a partition type at each partition point in the pipeline. The partition type determines how the Integration Service redistributes data across partition points. You can define the following partition types in the Workflow Manager:
- Database partitioning. The Integration Service queries the IBM DB2 or Oracle system for table partition information. It reads partitioned data from the corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2 source instances on a multinode tablespace. Use database partitioning with DB2 targets. For more information, see Database Partitioning Partition Type on page 486.
- Hash partitioning. Use hash partitioning when you want the Integration Service to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number. You can use the following types of hash partitioning:
  - Hash auto-keys. The Integration Service uses all grouped or sorted ports as a compound partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations. For more information, see Hash Auto-Keys Partition Type on page 490.
  - Hash user keys. The Integration Service uses a hash function to group rows of data among partitions. You define the number of ports to generate the partition key. For more information, see Hash User Keys Partition Type on page 491.
- Key range. You specify one or more ports to form a compound partition key. The Integration Service passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range. For more information, see Key Range Partition Type on page 493.
- Pass-through. The Integration Service passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions. For more information, see Pass-Through Partition Type on page 497.
- Round-robin. The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows. For more information, see Round-Robin Partition Type on page 499.
The mapping in Figure 16-1 reads data about items and calculates average wholesale costs and prices. The mapping must read item information from three flat files of various sizes, and then filter out discontinued items. It sorts the active items by description, calculates the average prices and wholesale costs, and writes the results to a relational database in which the target tables are partitioned by key range.

You can delete the default partition point at the Aggregator transformation because hash auto-keys partitioning at the Sorter transformation sends all rows that contain items with the same description to the same partition. Therefore, the Aggregator transformation receives data for all items with the same description in one partition and can calculate the average costs and prices for this item correctly.

When you use this mapping in a session, you can increase session performance by defining different partition types at the following partition points in the pipeline:
- Source qualifier. To read data from the three flat files concurrently, you must specify three partitions at the source qualifier. Accept the default partition type, pass-through.
- Filter transformation. Since the source files vary in size, each partition processes a different amount of data. Set a partition point at the Filter transformation, and choose round-robin partitioning to balance the load going into the Filter transformation.
- Sorter transformation. To eliminate overlapping groups in the Sorter and Aggregator transformations, use hash auto-keys partitioning at the Sorter transformation. This causes the Integration Service to group all items with the same description into the same partition before the Sorter and Aggregator transformations process the rows. You can delete the default partition point at the Aggregator transformation.
- Target. Since the target tables are partitioned by key range, specify key range partitioning at the target to optimize writing data to the target.
For more information about specifying partition types, see Setting Partition Types on page 484.
* The default partition type is pass-through when the transformation scope is Transaction and hash auto-keys when the transformation scope is All Input.
Database Partitioning Partition Type

When you use an IBM DB2 database, the Integration Service creates SQL statements similar to the following for partition 1:
SELECT <column list> FROM <table name> WHERE (nodenumber(<column 1>)=0 OR nodenumber(<column 1>) = 3)
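For an Oracle source on a partitioned table, the Integration Service reads from named table partitions instead. A hedged sketch of the kind of statement it might issue, assuming a source table ITEMS with a partition named ITEMS_P1:

SELECT <column list> FROM ITEMS PARTITION (ITEMS_P1)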
The Integration Service generates the following SQL statements for IBM DB2:
Session Partition 1:
SELECT <column list> FROM t1,t2 WHERE (nodenumber(t1.column1)=0) AND <join clause>

Session Partition 2:
SELECT <column list> FROM t1,t2 WHERE (nodenumber(t1.column1)=1) AND <join clause>

Session Partition 3: No SQL query.
If you specify database partitioning for a database other than Oracle or IBM DB2, the Integration Service reads the data in a single partition and writes a message to the session log.

If the number of session partitions is more than the number of partitions for the table in the database, the excess partitions receive no data. The session log describes which partitions do not receive data. If the number of session partitions is less than the number of partitions for the table in the database, the Integration Service distributes the data equally to the session partitions. Some session partitions receive data from more than one database partition.

When you use database partitioning with dynamic partitioning, the Integration Service determines the number of session partitions when the session begins. For more information about dynamic partitioning, see Dynamic Partitioning on page 433.

Session performance with partitioning depends on the data distribution in the database partitions. The Integration Service generates SQL queries to the database partitions. The SQL queries perform union or join commands, which can result in large query statements that have a performance impact.
You cannot use database partitioning when you configure the session to use source-based or user-defined commits, constraint-based loading, or workflow recovery. When you configure a source qualifier for database partitioning, the Integration Service reverts to pass-through partitioning under the following circumstances:
- The database table is stored on one database partition.
- You run the session in debug mode.
- You specify database partitioning for a session with one partition.
- You use pushdown optimization. Pushdown optimization works with the other partition types.
When you create an SQL override to read database tables and you set database partitioning, the Integration Service reverts to pass-through partitioning and writes a message to the session log. If you create a user-defined join, the Integration Service adds the join to the SQL statements it generates for each partition. If you create a source filter, the Integration Service adds it to the WHERE clause in the SQL query for each partition.
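To make this concrete, a hedged sketch of a per-partition query with a source filter folded into the WHERE clause; the table, columns, and filter value are hypothetical, and the nodenumber predicate follows the DB2 examples above:

SELECT ITEM_ID, ITEM_NAME, PRICE FROM ITEMS WHERE (nodenumber(ITEM_ID)=0) AND PRICE > 100.00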
Use the following rules and guidelines when you use database partitioning with relational targets:
- You cannot use database partitioning when you configure the session to use source-based or user-defined commit, constraint-based loading, or session recovery.
- The target table must contain a partition key, and you must link all not-null partition key columns in the target instance to a transformation in the mapping.
- You must use high precision mode when the IBM DB2 table partitioning key uses a Bigint field. The Integration Service fails the session when the IBM DB2 table partitioning key uses a Bigint field and you use low precision mode.
- If you create multiple partitions for a DB2 bulk load session, use database partitioning for the target partition type. If you choose any other partition type, the Integration Service reverts to normal load and writes the following message to the session log:
ODL_26097 Only database partitioning is supported for DB2 bulk load. Changing target load type variable to Normal.
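For reference, a minimal sketch of DB2 DDL that creates a target table with a partitioning key (DPF-style syntax; the table, column, and tablespace names are hypothetical):

CREATE TABLE ITEMS (
    ITEM_ID   INTEGER NOT NULL,
    ITEM_NAME VARCHAR(72),
    PRICE     DECIMAL(10,2)
) IN ITEMS_TS
PARTITIONING KEY (ITEM_ID) USING HASHING;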
If you configure a session for database partitioning, the Integration Service reverts to pass-through partitioning under the following circumstances:
- The DB2 target table is stored on one node.
- You run the session in debug mode using the Debugger.
- You configure the Integration Service to treat the database partitioning partition type as pass-through partitioning and you use database partitioning for a non-DB2 relational target.
Hash Auto-Keys Partition Type

In this mapping, the Sorter transformation sorts items by item description. If items with the same description exist in more than one source file, each partition will contain items with the same description. Without hash auto-keys partitioning, the Aggregator transformation might calculate average costs and prices for each item incorrectly.

To prevent errors in the cost and prices calculations, set a partition point at the Sorter transformation and set the partition type to hash auto-keys. When you do this, the Integration Service redistributes the data so that all items with the same description reach the Sorter and Aggregator transformations in a single partition. For information about partition points where you can specify hash partitioning, see Table 16-1 on page 484.
Hash User Keys Partition Type

When you specify hash auto-keys partitioning in the preceding mapping, the Sorter transformation receives rows of data grouped by the sort key, such as ITEM_DESC. If the item description is long, and you know that each item has a unique ID number, you can specify hash user keys partitioning at the Sorter transformation and select ITEM_ID as the hash key. This might improve the performance of the session since the hash function usually processes numerical data more quickly than string data.

If you select hash user keys partitioning at any partition point, you must specify a hash key. The Integration Service uses the hash key to distribute rows to the appropriate partition according to group.

For example, if you specify key range partitioning at a Source Qualifier transformation, the Integration Service uses the key and ranges to create the WHERE clause when it selects data from the source. Therefore, you can have the Integration Service pass all rows that contain customer IDs less than 135000 to one partition and all rows that contain customer IDs greater than or equal to 135000 to another partition. For more information, see Key Range Partition Type on page 493.

If you specify hash user keys partitioning at a transformation, the Integration Service uses the key to group data based on the ports you select as the key. For example, if you specify ITEM_DESC as the hash key, the Integration Service distributes data so that all rows that contain items with the same description go to the same partition.

To specify the hash key, select the partition point on the Partitions view of the Mapping tab, and click Edit Keys. This displays the Edit Partition Key dialog box. The Available Ports list displays the connected input and input/output ports in the transformation. To specify the hash key, select one or more ports from this list, and then click Add.
Figure 16-3 shows one port selected as the hash key for a Filter transformation:
Figure 16-3. Edit Partition Key Dialog Box
To rearrange the order of the ports that define the key, select a port in the Selected Ports list and click the up or down arrow.
Key Range Partition Type

Key range partitioning at the target optimizes writing to the target tables. For example, the target tables are partitioned by ITEM_ID as follows:
- Partition 1: 0001-2999
- Partition 2: 3000-5999
- Partition 3: 6000-9999

To configure the session:
- Set the partition type at the target instance to key range.
- Create three partitions.
- Choose ITEM_ID as the partition key. The Integration Service uses this key to pass data to the appropriate partition.
When you set the key range, the Integration Service sends all items with IDs less than 3000 to the first partition. It sends all items with IDs between 3000 and 5999 to the second partition. Items with IDs greater than or equal to 6000 go to the third partition. For more information about key ranges, see Adding Key Ranges on page 495.
To rearrange the order of the ports that define the partition key, select a port in the Selected Ports list and click the up or down arrow. In key range partitioning, the order of the ports does not affect how the Integration Service redistributes rows among partitions, but it can affect session performance. For example, you might configure the following compound partition key:
Selected Ports:
1. ITEMS.DESCRIPTION
2. ITEMS.DISCONTINUED_FLAG
Since boolean comparisons are usually faster than string comparisons, the session may run faster if you arrange the ports in the following order:
Selected Ports:
1. ITEMS.DISCONTINUED_FLAG
2. ITEMS.DESCRIPTION
You can leave the start or end range blank for a partition. When you leave the start range blank, the Integration Service uses the minimum data value as the start range. When you leave the end range blank, the Integration Service uses the maximum data value as the end range. For example, you can add the following ranges for a key based on CUSTOMER_ID in a pipeline that contains two partitions:
CUSTOMER_ID key ranges:
- Partition #1: no start range, end range 135000
- Partition #2: start range 135000, no end range
When the Integration Service reads the Customers table, it sends all rows that contain customer IDs less than 135000 to the first partition and all rows that contain customer IDs equal to or greater than 135000 to the second partition. The Integration Service eliminates rows that contain null values or values that fall outside the key ranges.

When you configure a pipeline to load data to a relational target, if a row contains null values in any column that defines the partition key, or if a row contains a value that falls outside all of the key ranges, the Integration Service sends that row to the first partition.

When you configure a pipeline to read data from a relational source, the Integration Service reads rows that fall within the key ranges. It does not read rows with null values in any partition key column. If you want to read rows with null values in the partition key, use pass-through partitioning and create an SQL override.
Use the following rules and guidelines when you create key ranges:
- The partition key must contain at least one port.
- If you choose key range partitioning at any partition point, you must specify a range for each port in the partition key.
- Use the standard PowerCenter date format to enter dates in key ranges. For more information about the default date format, see Dates in the Transformation Language Reference.
- The Workflow Manager does not validate overlapping string ranges, overlapping numeric ranges, gaps, or missing ranges.
- When you define key range partitioning at a Source Qualifier transformation, the Integration Service defaults to pass-through partitioning if you change the SQL statement in the Source Qualifier transformation.
- If a row contains a null value in any column that defines the partition key, or if a row contains values that fall outside all of the key ranges, the Integration Service sends that row to the first partition.
Pass-Through Partition Type

By default, this mapping contains partition points at the source qualifier and target instance. Since this mapping contains an XML target, you can configure only one partition at any partition point. In this case, the master thread creates one reader thread to read data from the source, one transformation thread to process the data, and one writer thread to write data to the target. Each pipeline stage processes the rows as follows:
- Time 1: Source Qualifier (first stage) processes Row Set 1.
- Time 2: Source Qualifier processes Row Set 2, while the transformations (second stage) process Row Set 1.
- Time 3: Source Qualifier processes Row Set 3, the transformations process Row Set 2, and the target instance (third stage) processes Row Set 1.
- Time n: Source Qualifier processes Row Set n, the transformations process Row Set n-1, and the target instance processes Row Set n-2.
Because the pipeline contains three stages, the Integration Service can process three sets of rows concurrently. If the Expression transformations are very complicated, processing the second (transformation) stage can take a long time and cause low data throughput. To improve performance, set a partition point at Expression transformation EXP_2 and set the partition
type to pass-through. This creates an additional pipeline stage. The master thread creates an additional transformation thread for the new second stage.
The Integration Service can now process four sets of rows concurrently as follows:
Time             Source Qualifier    FIL_1 & EXP_1       EXP_2 & LKP_1       Target Instance
(top to bottom)  (First Stage)       (Second Stage)      (Third Stage)       (Fourth Stage)
                 Row Set 1           -                   -                   -
                 Row Set 2           Row Set 1           -                   -
                 Row Set 3           Row Set 2           Row Set 1           -
                 Row Set 4           Row Set 3           Row Set 2           Row Set 1
                 ...                 ...                 ...                 ...
                 Row Set n           Row Set n-1         Row Set n-2         Row Set n-3
By adding an additional partition point at Expression transformation EXP_2, you replace one long-running transformation stage with two shorter-running transformation stages. Data throughput depends on the longest-running stage, so in this case data throughput increases. For more information about processing threads, see Integration Service Architecture in the PowerCenter Administrator Guide.
The session based on this mapping reads item information from three flat files of different sizes:
Source file 1: 80,000 rows
Source file 2: 5,000 rows
Source file 3: 15,000 rows
When the Integration Service reads the source data, the first partition begins processing 80% of the data, the second partition processes 5% of the data, and the third partition processes 15% of the data. To distribute the workload more evenly, set a partition point at the Filter transformation and set the partition type to round-robin. The Integration Service distributes the data so that each partition processes approximately one-third of the data.
Chapter 17
Overview, 502 Running Pushdown Optimization Sessions, 503 Working with Databases, 505 Working with Dates, 508 Working with Expressions, 510 Error Handling, Logging, and Recovery, 516 Working with Slowly Changing Dimensions, 518 Working with Sequences and Views, 519 Using the $$PushdownConfig Mapping Parameter, 525 Configuring Sessions for Pushdown Optimization, 527
Overview
You can push transformation logic to the source or target database using pushdown optimization. When you use pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations. The amount of transformation logic you can push to the database depends on the database, the transformation logic, and the mapping and session configuration. When you run a session configured for pushdown optimization, the Integration Service analyzes the mapping and writes one or more SQL statements based on the mapping transformation logic. The Integration Service analyzes the transformation logic, mapping, and session configuration to determine the transformation logic it can push to the database. It executes the generated SQL statements against the source or target tables, and it processes any transformation logic that it cannot push to the database. Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization. Figure 17-1 shows a mapping containing transformation logic that can be pushed to the source database:
Figure 17-1. Sample Mapping Used in a Pushdown Optimization Session
The mapping contains a Filter transformation that filters out all items except those with an ID greater than 1005. The Integration Service can push the transformation logic to the database. It generates the following SQL statement to process the transformation logic:
INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE)
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, CAST(ITEMS.PRICE AS INTEGER)
FROM ITEMS
WHERE (ITEMS.ITEM_ID > 1005)
The Integration Service generates an INSERT SELECT statement to get the ID, NAME, and DESCRIPTION columns from the source table. It filters the data using a WHERE clause. The Integration Service does not extract data from the database at this time.
You can configure the following types of pushdown optimization:
Using source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.
Using target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.
Using full pushdown optimization. The Integration Service pushes as much transformation logic as possible to both the source and target databases. If you configure a session for full pushdown optimization, and the Integration Service cannot push all the transformation logic to the database, it performs source-side or target-side pushdown optimization instead.
When you run a session with large quantities of data and full pushdown optimization, the database server must run a long transaction. Consider the following database performance issues when you generate a long transaction:
A long transaction uses more database resources.
A long transaction locks the database for longer periods of time. This reduces database concurrency and increases the likelihood of deadlock.
A long transaction increases the likelihood of an unexpected event.
The Rank transformation cannot be pushed to the database. If you configure the session for full pushdown optimization, the Integration Service pushes the Source Qualifier transformation and the Aggregator transformation to the source, processes the Rank transformation, and pushes the Expression transformation and target to the target database. The Integration Service does not fail the session if it can push only part of the transformation logic to the database.
You can configure pushdown optimization for the following databases:
Oracle
IBM DB2
Teradata
Microsoft SQL Server
Sybase ASE
Databases that use ODBC drivers
When you use native drivers, the Integration Service generates SQL statements using native database SQL. When you use ODBC drivers, the Integration Service generates SQL statements using ANSI SQL. The Integration Service can push more functions to the database when it generates SQL statements using the native database language instead of ANSI SQL. When the Integration Service reads data, it can convert data to a format that differs from the database format. Or, the database can use different default settings for handling null values, case sensitivity, and sort order than the Integration Service. When these formats and settings differ, the transformation logic processed on the database can produce different data than the same transformation logic processed by the Integration Service. The database can produce different output than the Integration Service when the following settings and conversions are different:
Nulls treated as the highest or lowest value. The returned results can differ depending on how null values are treated. The Integration Service and a database can treat null values differently. For example, you want to push a Sorter transformation to an Oracle database. In the session, you configure nulls as the lowest value in the sort order. Oracle treats null values as the highest value in the sort order.
Sort order. The returned results can differ depending on the sort order applied while processing the transformation logic. The Integration Service and a database can use different sort orders. For example, you want to push the transformations in a session to a Microsoft SQL Server database, which is configured to use a sort order that is not case sensitive. You configure the session properties to use the binary sort order, which is case sensitive. The results differ based on whether the Integration Service or the Microsoft SQL Server database processes the transformation logic.
Case sensitivity. The returned results can differ depending on whether case sensitivity is enabled. For example, the Integration Service uses case-sensitive queries and the database does not. A Filter transformation uses the following filter condition: IIF(col_varchar2 = 'CA', TRUE, FALSE). You need the database to return rows that match 'CA'. However, if you push this transformation logic to a Microsoft SQL Server database that is not case sensitive, it returns rows that match the values 'Ca', 'ca', 'cA', and 'CA'.
Numeric values converted to character values. The format of the character value can vary depending on whether the database or the Integration Service converts the numeric value. The Integration Service and a database can convert the same numeric value to a character value in different formats. The database can convert numeric values to an unacceptable character format. For example, a table contains the number 1234567890. When the Integration Service converts the number to a character value, it inserts the characters 1234567890. However, a database might convert the number to 1.2E9. The two sets of characters represent the same value. However, if you require the characters in the format 1234567890, you can disable pushdown optimization.
Precision. Transformation datatypes use a default numeric precision that can vary from the native datatypes. For example, a transformation Decimal datatype has a precision of 1-28, while a Teradata Decimal datatype has a precision of 1-18. The results can vary if the database uses a different precision than the Integration Service.
IBM DB2
You can encounter the following problems using ODBC drivers with an IBM DB2 database:
A session containing a Sorter transformation fails if the sort is distinct and not case sensitive, and one of the sort keys is a string datatype.
A session containing a Lookup transformation fails for source-side or full pushdown optimization.
A session that requires type casting fails if the casting is from float or double to string, or if it requires any other type of casting that IBM DB2 databases disallow.
Microsoft SQL Server
You can encounter the following problems using ODBC drivers with a Microsoft SQL Server database:
A session containing a Sorter transformation fails if the sort is distinct and not case sensitive.
A pushdown optimization session fails when it loads datetime data to a Microsoft SQL Server target using an ODBC driver. The Integration Service converts datetime data to the ODBC Timestamp datatype, which is not a supported Microsoft SQL Server datatype.
Sybase ASE
You can encounter the following problems using ODBC drivers with a Sybase ASE database:
The session fails when it performs datatype conversions and you use Sybase ASE 12.5 or below.
The session fails if you use a Joiner transformation configured for a full outer join.
Teradata
You can encounter the following problems using ODBC drivers with a Teradata database:
The session fails if it converts a numeric datatype with precision greater than 18.
The session fails when you use full pushdown optimization for a session containing a Sorter transformation.
A sort on a distinct key can give inconsistent results if the sort is not case sensitive and one port is a character port.
An Integration Service and a database can produce different results for a session that contains an Aggregator transformation if the group by port is of a string datatype and is not case sensitive.
A session containing a Lookup transformation fails if it is configured for target-side pushdown optimization.
A session that contains a date-to-string datatype conversion fails.
Date values converted to character values. The Integration Service converts the transformation Date/Time datatype to the native datatype that supports subsecond precision in the database. The session fails if you configure the datetime format in the session to a format that the database does not support. For example, when the Integration Service performs the ROUND function on a date, it stores the date value in a character column, using the format MM/DD/YYYY HH:MI:SS.US. When the database performs this function, it stores the date in the default date format for the database. If the database is Oracle, it stores the date as the default DD-MON-YY. If you require the date to be in the format MM/DD/YYYY HH:MI:SS.US, you can disable pushdown optimization.
Date formats for TO_CHAR and TO_DATE functions. The Integration Service uses the date format in the TO_CHAR or TO_DATE function when the Integration Service pushes the function to the database. The database converts each date string to a datetime value supported by the database. For example, the Integration Service pushes the following expression to the database:
TO_DATE( DATE_PROMISED, 'MM/DD/YY' )
The database interprets the date string in the DATE_PROMISED port based on the specified date format string MM/DD/YY. The database converts each date string, such as 01/22/98, to the supported date value, such as Jan 22 1998 00:00:00. If the Integration Service pushes a date format to an IBM DB2, Microsoft SQL Server, or Sybase database that the database does not support, the Integration Service stops pushdown optimization and processes the transformation itself. The Integration Service converts all dates before pushing transformations to an Oracle or Teradata database. If the database does not support the date format after the date conversion, the session fails.
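For example, when pushed to an Oracle source, the expression might appear in the generated query much like the following sketch. The ORDERS table and column names are hypothetical:

-- The database, not the Integration Service, evaluates TO_DATE here.
SELECT ORDERS.ORDER_ID, TO_DATE(ORDERS.DATE_PROMISED, 'MM/DD/YY')
FROM ORDERS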
HH24 date format. You cannot use the HH24 format in the date format string for Teradata. When the Integration Service generates SQL for a Teradata database, it uses the HH format string instead.
Blank spaces in date format strings. You cannot use blank spaces in the date format string in Teradata. When the Integration Service generates SQL for a Teradata database, it substitutes the space with B.
SYSDATE built-in variable. When you use the SYSDATE built-in variable, the Integration Service returns the current date and time for the node running the service process. However, when you push the transformation logic to the database, the SYSDATE variable returns the current date and time for the machine hosting the database. If the time zone of the machine hosting the database is not the same as the time zone of the machine running the Integration Service process, the results can vary.
The Integration Service displays a message in the Pushdown Optimization Viewer when it cannot push an expression to the database. Use the message to determine the reason why it could not push the expression to the database. The tables in this section summarize the availability of PowerCenter operators, variables, and functions in databases.
Operators
Table 17-1 summarizes the availability of PowerCenter operators in databases. Each column marked with an X indicates that the operator can be pushed to the database using source-side, target-side, or full pushdown optimization. Each column marked with an S indicates that the operator can be pushed to the database using source-side pushdown optimization:
Table 17-1. Operators Available in Databases
The operators covered by the table are: +, -, *, /, %, ||, =, >, <, >=, <=, <>, !=, ^=, NOT, AND, and OR. The arithmetic, comparison, and logical operators can be pushed to IBM DB2, Microsoft SQL Server, Oracle, Sybase ASE, and Teradata with any pushdown type. The || string concatenation operator can be pushed to Oracle with any pushdown type, and to IBM DB2, Microsoft SQL Server, Sybase ASE, and Teradata with source-side pushdown optimization only. ODBC supports a smaller subset of these operators.
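For example, a Filter transformation with the condition PRICE * QTY > 100 AND STATUS != 'CLOSED' could be pushed to the source as a WHERE clause much like the following sketch. The table and column names are hypothetical:

SELECT ITEMS.ITEM_ID, ITEMS.PRICE, ITEMS.QTY, ITEMS.STATUS
FROM ITEMS
WHERE (ITEMS.PRICE * ITEMS.QTY > 100) AND (ITEMS.STATUS <> 'CLOSED')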
Variables
Table 17-2 summarizes the availability of PowerCenter variables in databases. Columns marked with an X indicate variables that can be pushed to the database using source-side, target-side, or full pushdown optimization:
Table 17-2. PowerCenter Variables Available in Databases
Variable             IBM DB2   Microsoft SQL Server   Oracle   Sybase ASE   Teradata   ODBC
SESSSTARTTIME           X               X                X          X           X        X
SYSDATE                 X               X                X          X           X
WORKFLOWSTARTTIME
Functions
Table 17-3 summarizes the availability of PowerCenter functions in databases. Each column marked with an X indicates the function can be pushed to the database using source-side, target-side, or full pushdown optimization. Each column marked with an S indicates the function can be pushed to the database using source-side pushdown optimization:
Table 17-3. PowerCenter Functions Available in Databases
Table 17-3 covers the following functions, among others: COMPRESS(), CONCAT(), COS(), COSH(), COUNT(), CRC32(), CUME(), DATE_COMPARE(), DATE_DIFF(), DECODE(), DECODE_BASE64(), DECOMPRESS(), ENCODE_BASE64(), ERROR(), EXP(), FIRST(), FLOOR(), FV(), GET_DATE_PART(), GREATEST(), IIF(), IN(), INDEXOF(), INITCAP(), INSTR(), IS_DATE(), IS_NUMBER(), IS_SPACES(), ISNULL(), LAST(), LAST_DAY(), LEAST(), LENGTH(), LOG(), LOWER(), LPAD(), LTRIM(), MAKE_DATE_TIME(), MAX(), MD5(), MEDIAN(), METAPHONE(), MIN(), MOD(), MOVINGAVG(), MOVINGSUM(), NPER(), PERCENTILE(), PMT(), POWER(), PV(), RAND(), RATE(), REG_EXTRACT(), REG_MATCH(), REG_REPLACE(), REPLACECHR(), REPLACESTR(), REVERSE(), RPAD(), RTRIM(), SOUNDEX(), STDDEV(), SYSDATE, SYSTIMESTAMP(), TAN(), TANH(), TO_BIGINT, TO_CHAR(DATE), TO_CHAR(NUMBER), TO_DATE(), TO_DECIMAL(), TO_FLOAT(), TO_INTEGER(), TRUNC(DATE), TRUNC(NUMBER), UPPER(), and VARIANCE(). Each function can be pushed to a subset of IBM DB2, Microsoft SQL Server, Oracle, Sybase ASE, Teradata, and ODBC, and some functions can be pushed with source-side pushdown optimization only.

The following exceptions apply when you push functions to a database:
ADD_TO_DATE(). You cannot push transformation logic to a Teradata database for a transformation that uses an expression containing ADD_TO_DATE to change days, hours, minutes, or seconds.
LAST_DAY(). Oracle returns the date up to the second. If the input date contains subseconds, Oracle trims the date to the second.
LTRIM(), RTRIM(), and SOUNDEX(). If you use the function in transformation logic, the database treats the argument (' ') as NULL, but the Integration Service treats the argument (' ') as spaces.
STDDEV() and VARIANCE(). An IBM DB2 database and the Integration Service produce different results for STDDEV and VARIANCE. IBM DB2 uses a different algorithm than other databases to calculate these functions.
SYSDATE and SYSTIMESTAMP(). When you push SYSDATE or SYSTIMESTAMP to the database, the database server returns the timestamp in the time zone of the database server, not the Integration Service.
Error Handling
When the Integration Service pushes transformation logic to the database, it cannot track errors that occur in the database. As a result, it handles errors differently than when it processes the transformations in the session. When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.
Logging
When the Integration Service pushes transformation logic to the database, it cannot trace all the events that occur inside the database server. The statistics the Integration Service can trace depend on the type of pushdown optimization. When you push transformation logic to the database, the Integration Service generates a session log with the following differences:
The session log does not contain details for transformations processed by the database.
The session log does not contain the thread busy percentage when the session is configured for full pushdown optimization.
The session log contains the number of loaded rows when the session is configured for source-side, target-side, and full pushdown optimization.
The session log does not contain the number of rows read from the source when the Integration Service uses full pushdown optimization and pushes all transformation logic to the database.
The session log contains the number of rows read from each source when the Integration Service uses source-side pushdown optimization.
Recovery
If you configure a session for full pushdown optimization and the session fails, the Integration Service cannot perform incremental recovery because the database processes the transformations. Instead, the database rolls back the transactions. If the database server fails, it rolls back transactions when it restarts. If the Integration Service fails, the database server rolls back the transaction. If the failure occurs while the Integration Service is creating temporary sequence objects or views in the database, which is before any rows have been processed, the Integration Service runs the generated SQL on the database again. If the failure occurs before the database processes all rows, the Integration Service performs the following tasks:
1. If applicable, the Integration Service drops and recreates temporary view or sequence objects in the database to ensure duplicate values are not produced. For more information, see Working with Sequences and Views on page 519.
2. The Integration Service runs the generated SQL on the database again.

If the failure occurs while the Integration Service is dropping the temporary view or sequence objects from the database, which is after all rows are processed, the Integration Service tries to drop the temporary objects again.
You can push transformations included in the slowly changing dimensions mapping to an Oracle or IBM DB2 database. The source data must not have duplicate rows. The database can become deadlocked if it makes multiple updates to the same row. You must create the slowly changing dimensions mapping using the 8.5 version of the Slowly Changing Dimensions Wizard. You cannot push the slowly changing dimensions logic to the database if it was created by the Slowly Changing Dimensions Wizard from a previous version.
The Integration Service creates temporary view objects in the database for the following purposes:
To push a Source Qualifier transformation with an SQL override to the database.
To push a filtered lookup, an unconnected relational lookup, or a Lookup transformation with an SQL override to the database.
After the database transaction completes, the Integration Service drops sequence and view objects created for pushdown optimization.
Note: The Integration Service does not parse or validate the SQL overrides. If you configure a session to push the Source Qualifier or Lookup transformation with an SQL override to the database, test the SQL override against the database before you run the session.
Sequences
To push Sequence Generator transformation logic to a database, you must configure the session for pushdown optimization with sequences. If you configure a session to push Sequence Generator transformation logic to a database, the Integration Service completes the following tasks:
1. Creates a sequence object in the database. The Integration Service generates the sequence object in the database based on the Sequence Generator transformation logic. The Integration Service creates a unique name for each sequence object. To create a unique sequence object name, it adds the prefix PM_S to a value generated by a hash function.
2. Generates the SQL query and executes it against the database. The Integration Service generates and executes the SQL query to push the Sequence Generator transformation logic to the database.
3. Drops the sequence object from the database. When the transaction completes, the Integration Service drops the sequence object that it created in the database.
For more information about configuring the session for pushdown optimization, see Configuring Sessions for Pushdown Optimization on page 527.
When the Integration Service pushes transformation logic to the database, it executes the following SQL statement to create the sequence object in the source database:
CREATE SEQUENCE PM_S6UHW42OGXTY7NICHYIOSRMC5XQ START WITH 1 INCREMENT BY 1 MINVALUE 0 MAXVALUE 9223372036854775807 NOCYCLE CACHE 9223372036854775807
After the Integration Service creates the sequence object, the Integration Service executes the SQL query to process the transformation logic contained in the mapping:
INSERT INTO STORE_SALES(PRIMARYKEY, QUARTER, SALES, STORE_ID)
SELECT
  CAST(PM_S6UHW42OGXTY7NICHYIOSRMC5XQ.NEXTVAL AS FLOAT),
  CAST(CAST(SALES_BYSTOREQUARTER_SRC.QUARTER AS FLOAT) AS VARCHAR2(10)),
  CAST(CAST(SALES_BYSTOREQUARTER_SRC.SALES AS NUMBER(10, 2)) AS NUMBER(25, 2)),
  CAST(SALES_BYSTOREQUARTER_SRC.STORE_ID AS NUMBER(0, 0))
FROM SALES_BYSTOREQUARTER_SRC
After the session completes, the Integration Service drops the sequence object from the database. If the session fails, the Integration Service drops and recreates the sequence object before performing recovery tasks.
Views
You must configure the session for pushdown optimization with views to enable the Integration Service to create the view objects in the database. The Integration Service creates a view object under the following conditions:
You configure pushdown optimization for a Source Qualifier or Lookup transformation configured with an SQL override.
You configure pushdown optimization for a Lookup transformation configured with a filter.
You configure pushdown optimization for an unconnected Lookup transformation.
If you configure the session for pushdown optimization with views, the Integration Service completes the following tasks:
1. Creates a view in the database. The Integration Service creates a view in the database based on the lookup filter, unconnected lookup, or SQL override in the Source Qualifier or Lookup transformation. To create a unique view name, the Integration Service adds the prefix PM_V to a value generated by a hash function.
2. Executes an SQL query against the view. After the Integration Service creates a view object, it executes an SQL query against the view created in the database to push the transformation logic to the source.
3. Drops the view from the database. When the transaction completes, the Integration Service drops the view it created.
For more information about configuring the session for pushdown optimization, see Configuring Sessions for Pushdown Optimization on page 527.
You want the search to return customers whose names match variations of the name Johnson, including names such as Johnsen, Jonssen, and Jonson. To perform the name matching, you enter the following SQL override for the Source Qualifier transformation:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME, CUSTOMERS.ADDRESS1, CUSTOMERS.ADDRESS2, CUSTOMERS.CITY, CUSTOMERS.STATE, CUSTOMERS.POSTAL_CODE, CUSTOMERS.PHONE, CUSTOMERS.EMAIL FROM CUSTOMERS WHERE CUSTOMERS.LAST_NAME LIKE 'John%' OR CUSTOMERS.LAST_NAME LIKE 'Jon%'
When the Integration Service pushes transformation logic for this session to the database, it executes the following SQL statement to create a view in the source database:
CREATE VIEW PM_V4RZRW5GWCKUEWH35RKDMDPRNXI (CUSTOMER_ID, COMPANY, FIRST_NAME, LAST_NAME, ADDRESS1, ADDRESS2, CITY, STATE, POSTAL_CODE, PHONE, EMAIL) AS SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME, CUSTOMERS.ADDRESS1, CUSTOMERS.ADDRESS2, CUSTOMERS.CITY, CUSTOMERS.STATE, CUSTOMERS.POSTAL_CODE, CUSTOMERS.PHONE, CUSTOMERS.EMAIL FROM CUSTOMERS WHERE CUSTOMERS.LAST_NAME LIKE 'John%' OR CUSTOMERS.LAST_NAME LIKE 'Jon%'
After the Integration Service creates the view, it executes an SQL query to perform the transformation logic in the mapping:
SELECT PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.CUSTOMER_ID, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.COMPANY, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.FIRST_NAME, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.LAST_NAME, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.ADDRESS1, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.ADDRESS2, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.CITY, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.STATE, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.POSTAL_CODE, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.PHONE, PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.EMAIL
FROM PM_V4RZRW5GWCKUEWH35RKDMDPRNXI
WHERE (PM_V4RZRW5GWCKUEWH35RKDMDPRNXI.POSTAL_CODE = 94117)
After the session completes, the Integration Service drops the view from the database. If the session fails, the Integration Service drops and recreates the view before performing recovery tasks.
Complete the following tasks to remove an orphaned sequence or view object from a database:
1. Identify the orphaned objects in the database. You can identify orphaned objects based on the session logs or a query on the database. Analyze the session log to determine orphaned objects from a session run. Run the database query to determine all orphaned objects in the database at a given time.
2. Remove the orphaned objects from the database. You can execute SQL statements to drop the orphaned objects you identified.
When the Integration Service creates a sequence or view object in the database, it adds the prefix PM_S to the names of sequence objects and PM_V to the names of view objects. You can search for these objects based on the prefix to identify them. The following queries show the syntax to search for sequence objects created by the Integration Service: IBM DB2:
SELECT SEQNAME FROM SYSCAT.SEQUENCES WHERE SEQSCHEMA = CURRENT SCHEMA AND SEQNAME LIKE 'PM\_S%' ESCAPE '\'
Oracle:
SELECT SEQUENCE_NAME FROM USER_SEQUENCES WHERE SEQUENCE_NAME LIKE 'PM\_S%' ESCAPE '\'
The following queries show the syntax to search for view objects created by the Integration Service: IBM DB2:
SELECT VIEWNAME FROM SYSCAT.VIEWS WHERE VIEWSCHEMA = CURRENT SCHEMA AND VIEWNAME LIKE 'PM\_V%' ESCAPE '\'
Oracle:
SELECT VIEW_NAME FROM USER_VIEWS WHERE VIEW_NAME LIKE 'PM\_V%' ESCAPE '\'
Teradata:
SELECT TableName FROM DBC.Tables WHERE CreatorName = USER AND TableKind = 'V' AND TableName LIKE 'PM\_V%' ESCAPE '\'
The following query shows the syntax to drop view objects created by the Integration Service on any database:
DROP VIEW <view name>
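Orphaned sequence objects can be dropped the same way. The following statement is a sketch for databases on which the Integration Service creates sequence objects, such as Oracle and IBM DB2:

DROP SEQUENCE <sequence name>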
3. When you configure the session, select $$PushdownConfig for the Pushdown Optimization attribute.
4. Define the parameter in the parameter file.
5. Enter one of the following values for $$PushdownConfig in the parameter file:
Value               Description
None                The Integration Service processes all transformation logic for the session.
Source [Seq View]   The Integration Service pushes as much of the transformation logic to the source database as possible.
Target [Seq View]   The Integration Service pushes as much of the transformation logic to the target database as possible.
Full [Seq View]     The Integration Service pushes as much of the transformation logic to the source and target databases as possible. The Integration Service processes any transformation logic that it cannot push to a database.
Optionally, specify Seq or View after Source, Target, or Full to enable the Integration Service to create sequence or view objects, respectively, in the database. For example, enter Source View to use source-side pushdown optimization and enable the creation of view objects in the source database. For more information about sequence and view objects, see Working with Sequences and Views on page 519.
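For example, a parameter file entry for a session might look like the following sketch. The folder, workflow, and session names are hypothetical:

[SalesFolder.WF:wf_load_orders.ST:s_m_load_orders]
$$PushdownConfig=Source View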
Pushdown Options
You can configure the following pushdown optimization options in the session properties:
Pushdown Optimization. Select the type of pushdown optimization. If you use the $$PushdownConfig mapping parameter, ensure that you configured the mapping parameter and defined a value for it in the parameter file. For more information about configuring the mapping parameter and parameter file, see Using the $$PushdownConfig Mapping Parameter on page 525.
Allow Temporary View for Pushdown. Select to allow the Integration Service to create temporary view objects in the database when it pushes the session to the database. The Integration Service must create a view in the database when the session contains an SQL override in the Source Qualifier transformation or Lookup transformation, a filtered lookup, or an unconnected lookup. For more information about views, see Views on page 520.
Allow Temporary Sequence for Pushdown. Select to allow the Integration Service to create temporary sequence objects in the database. The Integration Service must create a sequence object in the database if the session contains a Sequence Generator transformation. For more information about sequences, see Sequences on page 519.
For more information about configuring the pushdown optimization options, see the Workflow Administration Guide. You can use the Pushdown Optimization Viewer to determine if you need to edit the mapping, transformation, or session configuration to push more transformation logic to the database. The Pushdown Optimization Viewer informs you whether it can push transformation logic to the database using source-side, target-side, or full pushdown optimization. If you can push transformation logic to the database, the Pushdown Optimization Viewer lists all transformations that can be pushed to the database. You can also select a pushdown option or pushdown group in the Pushdown Optimization Viewer to view the corresponding SQL statement that is generated for the specified selections.
Note: When you select a pushdown option or pushdown group, you do not change the pushdown configuration. To change the configuration, you must update the pushdown option in the session properties.
Partitioning
You can push a session with multiple partitions to a database if the partition types are pass-through partitioning or key range partitioning.
The end key range for each partition must equal the start range for the next partition to merge all rows into the first partition. The end key range cannot overlap with the next partition. For example, if the end range for the first partition is 3386, then the start range for the second partition must be 3386.
You must configure the partition point at the Source Qualifier transformation to use hash auto-keys partitioning and all subsequent partition points to use either hash auto-keys or pass-through partitioning.
The first key range is 1313 - 3340, and the second key range is 3340 - 9354. The SQL statement merges all the data into the first partition:
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC FROM ITEMS1 ITEMS WHERE (ITEMS.ITEM_ID >= 1313) AND (ITEMS.ITEM_ID < 9354) ORDER BY ITEMS.ITEM_ID
The SQL statement selects item IDs greater than or equal to 1313 and less than 9354, which covers both key ranges, and merges the data from both partitions into the first partition. The SQL statement for the second partition passes empty data:
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC FROM ITEMS1 ITEMS WHERE (1 = 0) ORDER BY ITEMS.ITEM_ID
If the session uses pass-through partitioning at the partition point at the Source Qualifier transformation and all subsequent partition points, the Integration Service can push the transformation logic to the database using source-side, target-side, or full pushdown optimization. If the session uses key range partitioning at the Source Qualifier transformation and contains hash auto-keys or pass-through partitions in downstream partition points, the Integration Service can push the transformation logic to the database using source-side or full pushdown optimization.
If pushdown optimization merges data from multiple partitions of a transformation into the first partition and the Integration Service processes the transformation logic for a downstream transformation, the Integration Service does not redistribute the rows among the partitions in the downstream transformation. It continues to pass the rows to the first partition and pass empty data in the other partitions.
Use the following rules and guidelines when you configure the Integration Service to push the target transformation logic to a database and configure the target load rules:
If you do not achieve performance gains when you use full pushdown optimization and the source rows are treated as delete or update, use source-side pushdown optimization.
You cannot use full pushdown optimization and treat source rows as delete or update if the session contains a Union transformation and the Integration Service pushes transformation logic to a Sybase database.
Pipeline 1 and Pipeline 2 originate from different sources and contain transformations that are valid for pushdown optimization. The Integration Service creates a pushdown group for each target, and generates an SQL statement for each pushdown group. Because the two pipelines are joined, the transformations up to and including the Joiner transformation are part of both pipelines and are included in both pushdown groups.
To view pushdown groups, open the Pushdown Optimization Viewer. The Pushdown Optimization Viewer previews the pushdown groups and SQL statements that the Integration Service generates at run time.
To view pushdown groups:
1. In the Workflow Manager, open a session configured for pushdown optimization.
2. On the Mapping tab, select Pushdown Optimization in the left pane or View Pushdown Optimization in the right pane.
The Pushdown Optimization Viewer displays the pushdown groups and the transformations that comprise each group. It displays the SQL statement for each partition if you configure multiple partitions in the pipeline. You can view messages and SQL statements generated for each pushdown group and pushdown option. Pushdown options include None, To Source, To Target, Full, and $$PushdownConfig.
Figure 17-4 shows a mapping containing one pipeline with two partitions that can be pushed to the source database:
Figure 17-4. Pushdown Optimization Viewer
In the viewer, you can select a value for the connection variable and view the SQL statements generated for each pushdown group.
3. Select a pushdown option in the Pushdown Optimization Viewer to preview the SQL statements. The pushdown option in the viewer does not affect the optimization that occurs at run time. To change pushdown optimization for a session, edit the session properties.
4. If you configure the session to use a connection variable, click Preview Result for Connection to select a connection value to preview. If the session uses a connection variable, you must choose a connection value each time you open the Pushdown Optimization Viewer. The Workflow Manager does not save the value you select, and the Integration Service does not use this value at run time. If an SQL override contains the $$$SessStartTime variable, the Pushdown Optimization Viewer does not expand this variable when you preview pushdown optimization.
Chapter 18
Overview, 536 Aggregator Transformation, 538 Expression Transformation, 539 Filter Transformation, 540 Joiner Transformation, 541 Lookup Transformation, 542 Router Transformation, 544 Sequence Generator Transformation, 545 Sorter Transformation, 546 Source Qualifier Transformation, 547 Target, 548 Union Transformation, 549 Update Strategy Transformation, 550
Overview
When you configure pushdown optimization, the Integration Service tries to push each transformation to the database. The following criteria affect whether the Integration Service can push the transformation to the database:
Type of transformation
Location of the transformation in the mapping
Mapping and session configuration for the transformation
The expressions contained in the transformation
The criteria might also affect the type of pushdown optimization that the Integration Service can perform and the type of database to which the transformation can be pushed. The Integration Service can push logic of the following transformations to the database:
Aggregator
Expression
Filter
Joiner
Lookup
Router
Sequence Generator
Sorter
Source Qualifier
Target
Union
Update Strategy
The Integration Service processes the transformation logic if any of the following conditions are true:
The transformation logic updates a mapping variable and saves it to the repository database.
The transformation contains a variable port.
The session is configured to override the default values of input or output ports.
The database does not have an equivalent operator, variable, or function that is used in an expression in the transformation.
The mapping contains too many branches. When you branch a pipeline, the SQL statement required to represent the mapping logic becomes more complex. The Integration Service cannot generate SQL for a mapping that contains more than 64 two-way branches, 43 three-way branches, or 32 four-way branches. If the mapping branches exceed these limitations, the Integration Service processes the downstream transformations.
The Integration Service processes all transformations in the mapping if any of the following conditions are true:
The session is a data profiling or debug session.
The session is configured to log row errors.
Aggregator Transformation
The following table shows the pushdown types for each database to which you can push the Aggregator transformation:
Database                Pushdown Type
IBM DB2                 Source-side, Full
Microsoft SQL Server    Source-side, Full
Oracle                  Source-side, Full
Sybase ASE              Source-side, Full
Teradata                Source-side, Full
ODBC                    Source-side, Full
The Integration Service processes the Aggregator transformation if any of the following conditions are true:
The session and mapping are configured for incremental aggregation.
The transformation contains a nested aggregate function.
The transformation contains a conditional clause in an aggregate expression.
The transformation uses a FIRST(), LAST(), MEDIAN(), or PERCENTILE() function in any port expression.
An output port is not an aggregate or a part of the group by port.
The transformation is downstream from another Aggregator transformation. The Integration Service pushes the first Aggregator transformation to the database and processes all downstream Aggregator transformations.
Expression Transformation
The following table shows the pushdown types for each database to which you can push the Expression transformation:
Database                Pushdown Type
IBM DB2                 Source-side, Target-side, Full
Microsoft SQL Server    Source-side, Target-side, Full
Oracle                  Source-side, Target-side, Full
Sybase ASE              Source-side, Target-side, Full
Teradata                Source-side, Target-side, Full
ODBC                    Source-side, Target-side, Full
The Integration Service processes the Expression transformation if the transformation calls an unconnected Stored Procedure.
Filter Transformation
The following table shows the pushdown types for each database to which you can push the Filter transformation:
Database                Pushdown Type
IBM DB2                 Source-side, Full
Microsoft SQL Server    Source-side, Full
Oracle                  Source-side, Full
Sybase ASE              Source-side, Full
Teradata                Source-side, Full
ODBC                    Source-side, Full
The Integration Service processes the Filter transformation if the filter expression cannot be pushed to the database. For example, if the filter expression contains an operator that cannot be pushed to the database, the Integration Service does not push the filter expression to the database.
Joiner Transformation
The following table shows the pushdown types for each database to which you can push the Joiner transformation:
Database                Pushdown Type
IBM DB2                 Source-side, Full
Microsoft SQL Server    Source-side, Full
Oracle                  Source-side, Full
Sybase ASE              Source-side, Full
Teradata                Source-side, Full
ODBC                    Source-side, Full
The Integration Service processes the Joiner transformation if any of the following conditions are true:
The Joiner transformation is downstream from an Aggregator transformation.
The Integration Service cannot push the master and detail pipelines of the Joiner transformation to the database.
The Joiner transformation is configured with an outer join, and the master or detail source is a multi-table join. The Integration Service cannot generate SQL to represent an outer join combined with a multi-table join.
The Joiner transformation is configured with a full outer join and configured for pushdown optimization to Sybase.
The session is configured to mark all source rows as updates and configured for pushdown optimization to Teradata.
Lookup Transformation
When you configure a Lookup transformation for pushdown optimization, the database performs a lookup on the database lookup table. The following table shows the pushdown types for each database to which you can push the Lookup transformation:
Database                Pushdown Type
IBM DB2                 Source-side, Target-side, Full
Microsoft SQL Server    Source-side, Full
Oracle                  Source-side, Target-side, Full
Sybase ASE              Source-side, Full
Teradata                Source-side, Full
ODBC                    Source-side, Full
Use the following rules and guidelines when you configure the Integration Service to push Lookup transformation logic to a database:
The database does not use PowerCenter caches when processing transformation logic.
The Integration Service processes all transformations after a pipeline branch when multiple Lookup transformations are present in different branches of the pipeline and the branches merge downstream.
A session configured for target-side pushdown optimization fails if the session requires datatype conversion.
The Integration Service processes the Lookup transformation if any of the following conditions are true:
The Lookup transformation is a pipeline lookup.
The transformation uses a dynamic cache.
The transformation is configured to return the first, last, or any matching value. To use pushdown optimization, you must configure the Lookup transformation to report an error on multiple matches.
The transformation is downstream from an Aggregator transformation.
The session is configured to mark all source rows as updates and configured for pushdown optimization to Teradata.
The session is configured for source-side pushdown optimization and the lookup table and source table are on different databases.
The session is configured for target-side pushdown optimization and the lookup table and target table are on different databases.
The session is configured for full pushdown optimization and the source and target tables are not in the same database as the lookup table.
If the Lookup transformation contains an SQL override or a filter, configure the session for pushdown optimization with a view. The database might perform slower than the Integration Service if the session contains multiple unconnected lookups. The generated SQL might be complex because the Integration Service creates an outer join each time it invokes an unconnected lookup. Test the session with and without pushdown optimization to determine which session has better performance.
The Integration Service processes an unconnected Lookup transformation if you configure target-side pushdown optimization for the unconnected lookup.
Use the following rules and guidelines when the Lookup transformation contains an SQL override:
Configure the session for pushdown optimization with a view.
You cannot append an ORDER BY clause to the SQL statement in the lookup override. The session fails if you append an ORDER BY clause.
The order of the columns in the lookup override must match the order of the ports in the Lookup transformation. If you reverse the order of the ports in the lookup override, the query results transpose the values.
The session fails if the SELECT statement in the SQL override refers to a database sequence.
The Integration Service processes a Lookup transformation with an SQL override if the transformation contains Informatica outer join syntax in the SQL override. Use ANSI outer join syntax in the SQL override to push the transformation to a database.
Router Transformation
The following table shows the pushdown types for each database to which you can push the Router transformation:
Database                Pushdown Type
IBM DB2                 Source-side, Full
Microsoft SQL Server    Source-side, Full
Oracle                  Source-side, Full
Sybase ASE              Source-side, Full
Teradata                Source-side, Full
ODBC                    Source-side, Full
You can use source-side pushdown when all output groups merge into one transformation that can be pushed to the source database. The Integration Service processes the Router transformation if the router expression cannot be pushed to the database. For example, if the expression contains an operator that cannot be pushed to the database, the Integration Service does not push the expression to the database.
The Integration Service processes the Sequence Generator transformation if any of the following conditions are true:
The transformation is reusable.
The transformation is connected to multiple targets.
The transformation connects the CURRVAL port.
The Integration Service cannot push all of the logic for the Sequence Generator transformation to the database. For example, a Sequence Generator transformation creates sequence values that are supplied to two branches of a pipeline. When you configure pushdown optimization, the database can create sequence values for only one pipeline branch. When the Integration Service cannot push all of the Sequence Generator logic to the database, the following message appears:
Pushdown optimization stops at the transformation <transformation name> because the upstream Sequence Generator <Sequence Generator transformation name> cannot be pushed entirely to the database.
The Integration Service cannot push all of the Sequence Generator logic to the database in the following cases:
The pipeline branches before the Sequence Generator transformation and then joins back together after the Sequence Generator transformation.
The pipeline branches after the Sequence Generator transformation and does not join back together.
A sequence value passes through an Aggregator, a Filter, a Joiner, a Sorter, or a Union transformation.
The Sequence Generator transformation provides sequence values to a transformation downstream from a Source Qualifier transformation that is configured to select distinct rows.
The Integration Service processes a transformation downstream from the Sequence Generator transformation if it uses the NEXTVAL port of the Sequence Generator transformation in CASE expressions and is configured for pushdown optimization to IBM DB2.
Sorter Transformation
The following table shows the pushdown types for each database to which you can push the Sorter transformation:
Database                Pushdown Type
IBM DB2                 Source-side, Full
Microsoft SQL Server    Source-side, Full
Oracle                  Source-side, Full
Sybase ASE              Source-side, Full
Teradata                Source-side, Full
ODBC                    Source-side, Full
Use the following rules and guidelines when you configure the Integration Service to push Sorter transformation logic to a database:
The Integration Service pushes the Sorter transformation to the database and processes downstream transformations when the Sorter transformation is configured for a distinct sort.
The Integration Service processes the Sorter transformation when the Sorter transformation is downstream from a Union transformation and the port used as a sort key in the Sorter transformation is not projected from the Union transformation to the Sorter transformation.
The Integration Service processes the Source Qualifier transformation logic when any of the following conditions are true:
The transformation contains Informatica outer join syntax in the SQL override or a user-defined join. Use ANSI outer join syntax in the SQL override to enable the Integration Service to push the Source Qualifier transformation to the database.
The source is configured for database partitioning.
The source is an Oracle source that uses an XMLType datatype.
Use the following rules and guidelines when you configure pushdown optimization for a session with an SQL override in the Source Qualifier transformation:
The SELECT statement in a custom SQL query must list the port names in the order in which they appear in the transformation. If the ports are not in the correct order, the session can fail or output unexpected results.
Configure the session for pushdown optimization with a view.
The session fails if the SELECT statement in the SQL override refers to a database sequence.
The session fails if the SQL override contains an ORDER BY clause and you push the Source Qualifier transformation logic to an IBM DB2, a Microsoft SQL Server, a Sybase ASE, or a Teradata database.
If a Source Qualifier transformation is configured to select distinct values and contains an SQL override, the Integration Service ignores the distinct configuration.
If the session contains multiple partitions, specify the SQL override for all partitions.
Test the SQL override query on the source database before you push it to the database because PowerCenter does not validate the override SQL syntax. The session fails if the SQL syntax is not compatible with the source database.
Target
The following table shows the pushdown types for each database to which you can push the target logic:
Database                Pushdown Type
IBM DB2                 Target-side, Full
Microsoft SQL Server    Target-side, Full
Oracle                  Target-side, Full
Sybase ASE              Target-side, Full
Teradata                Target-side, Full
ODBC                    Target-side, Full
The Integration Service processes the target logic when you configure the session for full pushdown optimization and any of the following conditions are true:
The target includes a target update override.
The session is configured for constraint-based loading, and the target load order group contains more than one target.
The session uses an external loader.
If you configure full pushdown optimization and the target uses a different connection than the source, the Integration Service cannot push all the transformation logic to one database. Instead, it pushes as much transformation logic as possible to the source database and pushes any remaining transformation logic to the target database if possible. The Integration Service processes the target logic when you configure the session for target-side pushdown optimization and any of the following conditions are true:
The target includes a target update override.
The target is configured for database partitioning.
The session is configured for bulk loading.
The session uses an external loader. Use source-side pushdown optimization with an external loader to enable the Integration Service to push the transformation logic to the source database.
Union Transformation
The following table shows the pushdown types for each database to which you can push the Union transformation:
Database                Pushdown Type
IBM DB2                 Source-side, Full
Microsoft SQL Server    Source-side, Full
Oracle                  Source-side, Full
Sybase ASE              Source-side, Full
Teradata                Source-side, Full
ODBC                    Source-side, Full
The Integration Service processes the Union transformation logic when any of the following conditions are true:
The Integration Service cannot push all input groups to the source database.
The input groups do not originate from the same source.
Some ports in the Union transformation are not connected to the upstream transformation that outputs distinct rows.
Use the following rules and guidelines when you configure the Integration Service to push Update Strategy transformation logic to a database:
The generated SQL for an Update Strategy transformation with an update operation can be complex. Run the session with and without pushdown optimization to determine which configuration is faster.
If there are multiple operations to the same row, the Integration Service and the database can process the operations differently. To ensure that new rows are not deleted or updated when pushed to a database, source rows are processed in the following order: delete transactions, update transactions, and then insert transactions.
If the Update Strategy transformation contains more than one insert, update, or delete operation, the Integration Service generates and runs the insert, update, and delete SQL statements serially. The Integration Service runs the three statements even if they are not required, which might decrease performance.
The Integration Service ignores rejected rows when using full pushdown optimization. It does not write reject rows to a reject file.
The Integration Service processes the Update Strategy transformation if the update strategy expression cannot be pushed to the database. For example, if the expression contains an operator that cannot be pushed to the database, the Integration Service does not push the expression to the database.
The Integration Service processes the Update Strategy transformation if any of the following conditions are true:
The transformation is upstream from an Aggregator, a Joiner, or a distinct Sorter transformation and the session is configured to treat the source rows as data driven.
The transformation uses operations other than the insert operation and the Integration Service cannot push all transformation logic to the database.
The update strategy expression returns a value that is not numeric and not Boolean.
Chapter 19
Overview, 554 Configuring Unique Workflow Instances, 555 Configuring Concurrent Workflows of the Same Name, 557 Using Parameters and Variables, 560 Steps to Configure Concurrent Workflows, 561 Starting and Stopping Concurrent Workflows, 563 Monitoring Concurrent Workflows, 565 Viewing Session and Workflow Logs, 566 Rules and Guidelines for Concurrent Workflows, 568
Overview
A concurrent workflow is a workflow that can run as multiple instances concurrently. A workflow instance is a representation of a workflow. When you configure a concurrent workflow, you enable the Integration Service to run one instance of the workflow multiple times concurrently, or you define unique instances of the workflow that run concurrently. Configure a concurrent workflow with one of the following workflow options:
- Allow concurrent workflows with the same instance name. Configure one workflow instance to run multiple times concurrently. Each instance has the same sources, targets, and variables. The Integration Service identifies each instance by the run ID. The run ID is a number that identifies a workflow instance that has run. For more information about running the same instance concurrently, see Configuring Concurrent Workflows of the Same Name on page 557.
- Configure unique workflow instances to run concurrently. Define each workflow instance name and configure a workflow parameter file for the instance. You can define different sources, targets, and variables in the parameter file. For more information about configuring the workflow instances, see Configuring Unique Workflow Instances on page 555.
When you run concurrent workflows, the Workflow Monitor displays each workflow by workflow name and instance name. If the workflow has no unique instance names, the Workflow Monitor displays the same workflow name for each concurrent workflow run. For more information about viewing concurrent workflows in the Workflow Monitor, see Monitoring Concurrent Workflows on page 565. The Integration Service appends either an instance name or a run ID and timestamp to the workflow and session log names to create unique log files for concurrent workflows. For more information about viewing logs for concurrent workflows, see Viewing Session and Workflow Logs on page 566.
Configuring Unique Workflow Instances
When you start the workflow, you can choose which instances to run. For more information about starting concurrent workflows, see Starting and Stopping Concurrent Workflows on page 563. When you configure a concurrent workflow to run with unique instances, you can run the instances concurrently. To run one instance multiple times concurrently, configure the workflow to run with the same instance name.
When you recover a concurrent workflow, identify the instance that you want to recover. In the Workflow Monitor, right-click the instance to recover. When you recover with pmcmd, enter the instance name parameter. For more information about recovering workflows from pmcmd, see the Command Line Reference.
Note: You cannot recover a session from the last checkpoint if the workflow updates a
- The Integration Service can persist variables between concurrent workflow runs for concurrent workflows with unique instance names.
- You can stop or abort a workflow by instance name from pmcmd.
- The Workflow Monitor displays the instance name for each workflow run. The instance name appears in the workflow log and the Run Properties panel of the Workflow Monitor.
- When you configure a concurrent workflow to run with a unique instance name, the log file names contain the instance name. You can archive the log files or save them by instance name and timestamp.
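For example, commands of the following general form recover and stop a workflow instance by name. The service, domain, user, folder, and workflow names are placeholders, and -rin is the run instance name option of pmcmd; verify the exact syntax for your version in the Command Line Reference:

    pmcmd recoverworkflow -sv IntSvc -d Domain -u user -p password -f SalesFolder -rin WF_CONCURRENT_SALES1 wf_sales
    pmcmd stopworkflow -sv IntSvc -d Domain -u user -p password -f SalesFolder -rin WF_CONCURRENT_SALES1 wf_sales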
Configuring Concurrent Workflows of the Same Name
When the Web Services Hub starts a web service workflow instance, the instance has the same name as the other workflow instance.
Note: When you enable a workflow as a web service, the Workflow Designer enables the
For more information about starting concurrent workflows, see Starting and Stopping Concurrent Workflows on page 563.
- The Integration Service overwrites variables between concurrent workflow runs when the variables are the same for each run.
- You can stop or abort a workflow by run ID from pmcmd.
- You can stop or abort workflow tasks by run ID from pmcmd.
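For example, a command of the following general form stops a run by its run ID. The -wfrunid option name is taken from later pmcmd command references and is an assumption here; confirm it against the Command Line Reference for your version:

    pmcmd stopworkflow -sv IntSvc -d Domain -u user -p password -f SalesFolder -wfrunid 108 wf_sales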
The Workflow Monitor does not display the run ID for each instance. The run ID appears in the workflow log, session log, and the Run Properties panel of the Workflow Monitor. When you configure a concurrent workflow to run with the same instance name, the log file names always contain timestamps.
Using Parameters and Variables
The Integration Service persists workflow variables by workflow run instance name. For more information about the parameter file format, see Parameter File Structure on page 698.
Steps to Configure Concurrent Workflows
To configure a concurrent workflow:
1. In the Workflow Manager, open the workflow.
2. On the workflow General tab, enable concurrent execution. The workflow is enabled to run concurrently with the same instance name.
3. Allow concurrent runs with the same instance name or require unique instance names. Choose one of the following options:
   - Allow concurrent run with the same instance name. The Integration Service can run concurrent workflows with the same name.
   - Allow concurrent run only with unique instance name. The Integration Service can run concurrent workflows if the instance names are unique.
4. Optionally, click the Add button to add workflow instance names. The workflow instance name is not case sensitive. The Workflow Designer validates the characters in the instance name. You cannot use the following special characters in the instance name:
   $ . + - = ~ ` ! % ^ & * () [] {} ' \ " ; : / ? , < > \\ | \t \r \n
5. Optionally, enter the path to a workflow parameter file for the instance. To use different sources, targets, or variables for each workflow instance, configure a parameter file for each instance.
6. Click OK.
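For example, to run two instances against different sources, you might maintain one parameter file per instance. The folder, workflow, session, variable, and connection names below are hypothetical; the section headers follow the parameter file structure described on page 698:

    wf_sales_east.param:
    [SalesFolder.WF:wf_sales.ST:s_load_sales]
    $$Region=EAST
    $DBConnection_Source=Sales_East

    wf_sales_west.param:
    [SalesFolder.WF:wf_sales.ST:s_load_sales]
    $$Region=WEST
    $DBConnection_Source=Sales_West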
Starting and Stopping Concurrent Workflows
To start workflow run instances:
1. Open the folder containing the workflow.
2. From the Navigator, select the workflow that you want to start.
3. Right-click the workflow and select Start Workflow Advanced.
4. Choose the workflow run instances to start. By default, all instances are selected. You can clear all the workflow instances and choose the instances to start.
5. Click OK to start the workflow instances.
The Workflow Monitor displays each concurrent workflow name and instance name.
If you start the workflow without selecting instances, the Integration Service runs the workflow with the attributes from the workflow Properties and Variables tabs. The Integration Service does not run any of the configured workflow instances.
To start one concurrent workflow instance:
1. Open the folder containing the workflow.
2. From the Navigator, select the workflow that you want to start.
3. Right-click the workflow in the Navigator and choose Start Workflow. The Integration Service runs one instance of the workflow with the attributes from the workflow Properties and Variables tabs.
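You can also start a specific instance from the command line. A sketch, assuming the -rin run instance name option of pmcmd startworkflow and placeholder service, folder, and workflow names; confirm the option syntax in the Command Line Reference:

    pmcmd startworkflow -sv IntSvc -d Domain -u user -p password -f SalesFolder -rin WF_CONCURRENT_SALES1 wf_sales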
Monitoring Concurrent Workflows

When the workflow has unique instance names, the Workflow Monitor displays the instance name for each workflow run.
When you view concurrent workflows in Gantt Chart View, the Workflow Monitor displays one timeline for each workflow name or workflow instance name. You can scroll the Time Window to view information about specific workflow runs.
Viewing Session and Workflow Logs
The Integration Service creates unique log files for concurrent workflows as follows:
- Unique instance names. The Integration Service appends the instance name to the log file name.
- Instances of the same name. The Integration Service appends a run ID and timestamp to the log file name.
The Integration Service writes the run ID and the workflow type to the workflow log. The workflow type describes whether the workflow is a concurrent workflow. For example:
Workflow SALES_REV started with run id [108], run instance name [WF_CONCURRENT_SALES1], run type [Concurrent Run with Unique Instance Name].
Each session log also includes an entry that describes the workflow run ID and instance name:
Workflow: [SALES_REV] Run Instance Name: [WF_CONCURRENT_SALES1] Run Id: [108]
For example, if the workflow log file name is wf_store_sales.log and the instance name is store1_workflow, the Integration Service creates the following log file names for the binary workflow log file and the text workflow log file:
wf_store_sales.log.store1_workflow.bin wf_store_sales.log.store1_workflow
To avoid overwriting the log files, you can archive the log files or save the log files by timestamp.
For example, if the workflow log file name is wf_store_sales.log, the run ID is 845, and the workflow runs on July 12, 2007 at 11:20:45, the Integration Service creates the following log file names for the binary workflow log file and the text workflow log file:
wf_store_sales.log.845.20070712112045.bin wf_store_sales.log.845.20070712112045
When you configure the workflow to run concurrently with the same instance name, and you also define instance names, the Integration Service appends the instance name, the run ID, and the timestamp to the log file name. For example:
<workflow_name>.<instance_name>.<run ID>.20070712112045.bin <session_name>.<instance_name>.<run ID>.20070712112045.bin
The Integration Service writes the instance name and run ID to the workflow log. For example:
Workflow wf_Stores started with run ID[86034], run instance name[Store1_workflow]
Rules and Guidelines for Concurrent Workflows
Use the following rules and guidelines when you configure concurrent workflows:
- You cannot reference workflow run instances in parameter files. To use separate parameters for each instance, you must configure different parameter files.
- If you use the same cache file name for more than one concurrent workflow instance, each workflow instance will be valid. However, sessions will fail if conflicts occur when writing to the cache.
- You can use pmcmd to restart concurrent workflows by run ID or instance name.
- If you configure multiple instances of a workflow and you schedule the workflow, the Integration Service runs all instances at the scheduled time. You cannot run instances on separate schedules.
- Configure a worklet to run concurrently on the worklet General tab.
- You must enable a worklet to run concurrently if the parent workflow is enabled to run concurrently. Otherwise the workflow is invalid.
- You can enable a worklet to run concurrently and place it in two non-concurrent workflows. The Integration Service can run the two worklets concurrently.
- Two workflows enabled to run concurrently can run the same worklet. One workflow can run two instances of the same worklet if the worklet has no persisted variables.
- A session in a worklet can run concurrently with a session in another worklet of the same instance name when the session does not contain persisted variables.
- Aggregator transformation. You cannot use incremental aggregation in a concurrent workflow. The session fails.
- Lookup transformation. Use a non-persistent cache. If you use a static persistent cache, the Integration Service builds the cache once, and sessions might fail or wait to share the cache. If you persist a dynamic cache, the session fails.
- Sequence Generator transformation. To avoid generating the same set of sequence numbers for concurrent workflows, configure the number of cached values in the Sequence Generator transformation.
Chapter 20
Monitoring Workflows
Overview, 570 Using the Workflow Monitor, 572 Customizing Workflow Monitor Options, 578 Using Workflow Monitor Toolbars, 584 Working with Tasks and Workflows, 585 Workflow and Task Status, 590 Using the Gantt Chart View, 592 Using the Task View, 598 Tips, 602
Overview
You can monitor workflows and tasks in the Workflow Monitor. A workflow is a set of instructions that tells an Integration Service how to run tasks. Integration Services run on nodes or grids. The nodes, grids, and services are all part of a domain. With the Workflow Monitor, you can view details about a workflow or task in Gantt Chart view or Task view. You can also view details about the Integration Service, nodes, and grids. The Workflow Monitor displays workflows that have run at least once. You can run, stop, abort, and resume workflows from the Workflow Monitor. The Workflow Monitor continuously receives information from the Integration Service and Repository Service. It also fetches information from the repository to display historic information. The Workflow Monitor consists of the following windows:
- Navigator window. Displays monitored repositories, Integration Services, and repository objects.
- Output window. Displays messages from the Integration Service and the Repository Service.
- Properties window. Displays details about services, workflows, worklets, and tasks.
- Time window. Displays progress of workflow runs.
- Gantt Chart view. Displays details about workflow runs in chronological (Gantt Chart) format.
- Task view. Displays details about workflow runs in a report format, organized by workflow run.
The Workflow Monitor displays time relative to the time configured on the Integration Service machine. For example, a folder contains two workflows. One workflow runs on an Integration Service in the local time zone, and the other runs on an Integration Service in a time zone two hours later. If you start both workflows at 9 a.m. local time, the Workflow Monitor displays the start time as 9 a.m. for one workflow and as 11 a.m. for the other workflow.
Toggle between Gantt Chart view and Task view by clicking the tabs on the bottom of the Workflow Monitor. You can view and hide the Output and Properties windows in the Workflow Monitor. To view or hide the Output window, click View > Output. To view or hide the Properties window, click View > Properties View. You can also dock the Output and Properties windows at the bottom of the Workflow Monitor workspace. To dock the Output or Properties window, right-click a window and select Allow Docking. If the window is floating, drag the window to the bottom of the workspace. If you do not allow docking, the windows float in the Workflow Monitor workspace.
Using the Workflow Monitor
You can open the Workflow Monitor in the following ways:
- Select Start > Programs > Informatica PowerCenter [version] > Client > Workflow Monitor from the Windows Start menu.
- Configure the Workflow Manager to open the Workflow Monitor when you run a workflow from the Workflow Manager.
- Click Tools > Workflow Monitor from the Designer, Workflow Manager, or Repository Manager. Or, click the Workflow Monitor icon on the Tools toolbar. When you use a Tools button to open the Workflow Monitor, PowerCenter uses the same repository connection to connect to the repository and opens the same folders.
You can open multiple instances of the Workflow Monitor on one machine using the Windows Start menu.
To open the Workflow Monitor when you start a workflow:
1. In the Workflow Manager, click Tools > Options.
2. In the General tab, select Launch Workflow Monitor When Workflow Is Started.

To open the Workflow Monitor from the Workflow Manager:
1. In the Workflow Manager, connect to a repository.
2. In the Navigator, right-click an Integration Service or a repository and select Run Monitor.
The Workflow Monitor appears.
Connecting to a Repository
When you open the Workflow Monitor, you must connect to a repository. Connect to repositories by clicking Repository > Connect. Enter the repository name and connection information. After you connect to a repository, the Workflow Monitor displays a list of Integration Services available for the repository. The Workflow Monitor can monitor multiple repositories, Integration Services, and workflows at the same time.
Note: If you are not connected to a repository, you can remove the repository from the Navigator. Select the repository in the Navigator and click Edit > Delete. The Workflow Monitor displays a message verifying that you want to remove the repository from the Navigator list. Click Yes to remove the repository. You can connect to the repository again at any time.
When you open an Integration Service, the Workflow Monitor gets workflow run information stored in the repository. It does not get dynamic workflow run information from currently running workflows.
Filtering Tasks
You can view all or some workflow tasks. You can filter tasks you do not want to view. For example, if you want to view only Session tasks, you can hide all other tasks. You can view all tasks at any time. You can also filter deleted tasks. To filter deleted tasks, click Filters > Deleted Tasks.
To filter tasks:
1. Click Filters > Tasks. The Filter Tasks dialog box appears.
2. Clear the tasks you want to hide, and select the tasks you want to view.
3. Click OK.
Note: When you filter a task, the Gantt Chart view displays a red link between tasks to indicate a filtered task. You can double-click the link to view the tasks you hid.
You can hide unconnected Integration Services. When you hide a connected Integration Service, the Workflow Monitor asks if you want to disconnect from the Integration Service and then filter it. You must disconnect from an Integration Service before hiding it.
To filter Integration Services:
1. In the Navigator, right-click a repository to which you are connected and select Filter Integration Services. Or, connect to a repository and click Filters > Integration Services. The Filter Integration Services dialog box appears.
2. Select the Integration Services you want to view and clear the Integration Services you want to filter. Click OK. If you are connected to an Integration Service that you clear, the Workflow Monitor prompts you to disconnect from the Integration Service before filtering.
3. Click Yes to disconnect from the Integration Service and filter it. The Workflow Monitor hides the Integration Service from the Navigator. Click No to remain connected to the Integration Service. If you click No, you cannot filter the Integration Service.

Tip: To filter an Integration Service in the Navigator, right-click it and select Filter Integration Service.
You can open and close folders in the Gantt Chart and Task views. When you open a folder, it opens in both views. To open a folder, right-click it in the Navigator and select Open. Or, you can double-click the folder.
Viewing Statistics
You can view statistics about the objects you monitor in the Workflow Monitor. Click View > Statistics. The Statistics window displays the following information:
- Number of opened repositories. Number of repositories you are connected to in the Workflow Monitor.
- Number of connected Integration Services. Number of Integration Services you connected to since you opened the Workflow Monitor.
- Number of fetched tasks. Number of tasks the Workflow Monitor fetched from the repository during the period specified in the Time window.
You can also view statistics about nodes and sessions. For more information about node statistics, see Viewing Integration Service Properties on page 606. For more information about viewing session statistics, see Viewing Session Statistics on page 612.
Viewing Properties
You can view properties for the following items:
- Tasks. You can view properties, such as task name, start time, and status. For more information about viewing task progress, see Viewing Task Progress Details on page 611. For information about viewing command task details, see Viewing Command Task Run Properties on page 615. For information about viewing session task details, see Viewing Session Task Run Properties on page 616.
- Sessions. You can view properties about the Session task and session run, such as mapping name and number of rows successfully loaded. You can also view load statistics about the session run. For more information about session details, see Viewing Session Task Run Properties on page 616. You can also view performance details about the session run. For more information, see Viewing Performance Details on page 621.
- Workflows. You can view properties such as start time, status, and run type. For more information about viewing workflow details, see Viewing Workflow Run Properties on page 610.
- Links. When you double-click a link between tasks in Gantt Chart view, you can view tasks that you filtered out.
- Integration Services. You can view properties such as Integration Service version and startup time. You can also view the sessions and workflows running on the Integration Service. For more information about viewing Integration Service details, see Viewing Integration Service Properties on page 606.
- Grid. You can view properties such as the name, Integration Service type, and code page of a node in the Integration Service grid. You can view these details in the Integration Service Monitor. For more information about the Integration Service Monitor, see Viewing Integration Service Properties on page 606.
- Folders. You can view properties such as the number of workflow runs displayed in the Time window. For more information about viewing folder details, see Viewing Repository Folder Details on page 609.
To view properties for all objects, right-click the object and select Properties. You can right-click items in the Navigator or the Time window in either Gantt Chart view or Task view. To view link properties, double-click the link in the Time window of Gantt Chart view. When you view link properties, you can double-click a task in the Link Properties dialog box to view the properties for the filtered task.
Customizing Workflow Monitor Options
You can customize the following options in the Workflow Monitor:
- General. Customize general options such as the maximum number of workflow runs to display and whether to receive messages from the Workflow Manager. See Configuring General Options on page 578.
- Gantt Chart view. Configure Gantt Chart view options such as workspace color, status colors, and time format. See Configuring Gantt Chart View Options on page 580.
- Task view. Configure which columns to display in Task view. See Configuring Task View Options on page 581.
- Advanced. Configure advanced options such as the number of workflow runs the Workflow Monitor holds in memory for each Integration Service. See Configuring Advanced Options on page 582.
Table 20-1 describes the options you can configure on the General tab:
Table 20-1. Workflow Monitor General Tab
- Maximum Days. Number of tasks the Workflow Monitor displays up to a maximum number of days. Default is 5.
- Maximum Workflow Runs per Folder. Maximum number of workflow runs the Workflow Monitor displays for each folder. Default is 200.
- Receive Messages from Workflow Manager. Select to receive messages from the Workflow Manager. The Workflow Manager sends messages when you start or schedule a workflow in the Workflow Manager. The Workflow Monitor displays these messages in the Output window.
- Receive Notifications from Repository Service. Select to receive notification messages in the Workflow Monitor and view them in the Output window. You must be connected to the repository to receive notifications. Notification messages include information about objects that another user creates, modifies, or deletes. You receive notifications about folders and Integration Services. The Repository Service notifies you of the changes so you know objects you are working with may be out of date. You also receive notices posted by the user who manages the Repository Service.
Table 20-2 describes the options you can configure on the Gantt Chart tab:
Table 20-2. Workflow Monitor Gantt Chart Tab
- Status Color. Select a status and configure the color for the status. The Workflow Monitor displays tasks with the selected status in the colors you select. You can select two colors to display a gradient.
- Recovery Color. Configure the color for recovery sessions. The Workflow Monitor uses the status color for the body of the status bar, and it uses the recovery color as a gradient in the status bar.
- Workspace Color. Select a color for each workspace component.
- Time Format. Select a display format for the time window.
Table 20-3 describes the options you can configure on the Advanced tab:
Table 20-3. Workflow Monitor Advanced Tab
- Expand Running Workflows Automatically. Expands running workflows in the Navigator.
- Refresh Workflow Tasks When the Connection to the Integration Service Is Re-established. Refreshes workflow tasks when you reconnect to the Integration Service.
- Expand Workflow Runs When Opening the Latest Runs. Expands workflows when you open the latest run.
- Hide Folders/Workflows That Do Not Contain Any Runs When Filtering By Running/Schedule Runs. Hides folders or workflows under the Workflow Run column in the Time window when you filter running or scheduled tasks.
Using Workflow Monitor Toolbars
For more information about how to perform these toolbar operations, see Using the Designer in the Designer Guide. By default, the Workflow Monitor displays the following toolbars:
- Standard. Contains buttons to connect to and disconnect from repositories, see print previews, and search the workspace.
- Integration Service. Contains buttons to connect to and disconnect from Integration Services, ping Integration Services, and perform workflow operations.
- View. Contains buttons to refresh the view, get workflow history and properties, and show workflow or session logs.
- Filters. Contains buttons to display the most recent runs, and to filter tasks, servers, and folders.
After a toolbar appears, it displays until you exit the Workflow Monitor or hide the toolbar. You can drag each toolbar to resize or reposition it.
Working with Tasks and Workflows
In the Workflow Monitor, you can perform the following operations:
- Run a task or workflow.
- Resume a suspended workflow.
- Restart a task or workflow without recovery.
- Stop or abort a task or workflow.
- Schedule and unschedule a workflow.
- View session logs and workflow logs.
- View history names.
To open the latest workflow runs:
1. In the Navigator or Workflow Run List, select the workflow with the runs you want to see. Right-click the workflow and select Open Latest 20 Runs. The menu option is disabled when the latest 20 workflow runs are already open. Up to 20 of the latest runs appear.
2. Select the number of runs you want to display. The runs appear in the Workflow Run List.
To restart a workflow:
1. In the Navigator, select the workflow you want to run.
2. Right-click the workflow in the Navigator and select Restart. Or, click Task > Restart.
The Integration Service runs the workflow you specify.

To restart a task or worklet:
1. In the Navigator, select the task or worklet you want to run.
2. Right-click the task or worklet in the Navigator and select Restart Task.
The Integration Service runs the task or worklet you specify. It does not run the rest of the workflow.

To restart a workflow from a task:
1. In the Navigator, select the task from which you want to run the workflow.
2. Right-click the task and select Restart Workflow from Task. Or, click Task > Restart.
The Integration Service runs the workflow starting with the task you specify.
You can recover a failed workflow in the Workflow Monitor. When you recover a workflow, the Integration Service recovers the failed session and continues running the rest of the tasks in the workflow path. For more information about recovering a workflow, see Recovery Options on page 382. Recovery behavior for real-time sessions depends on the real-time source. For more information, see Restarting and Recovering Real-time Sessions on page 351.
To recover a workflow or worklet:
1. In the Navigator, select the workflow or worklet you want to recover.
2. Click Tasks > Recover. Or, right-click the workflow or worklet in the Navigator and select Recover.
The Workflow Monitor displays Integration Service messages about the recover command in the Output window.
To restart a task or workflow without recovery:
1. In the Navigator, select the task or workflow you want to restart.
2. Click Tasks > Cold Start Task or Workflows > Cold Start Workflow. Or, right-click the task or workflow in the Navigator and select Cold Start Task or Cold Start Workflow.
Behavior for real-time sessions depends on the real-time source. For more information, see Stopping Real-time Sessions on page 350.
To stop or abort workflows, tasks, or worklets in the Workflow Monitor:
1. In the Navigator, select the task, workflow, or worklet you want to stop or abort.
2. Click Tasks > Stop or Tasks > Abort. Or, right-click the task, workflow, or worklet in the Navigator and select Stop or Abort.
The Workflow Monitor displays the status of the stop or abort command in the Output window.
To schedule a workflow in the Workflow Monitor, right-click the workflow and select Schedule. The Workflow Monitor displays the workflow status as Scheduled, and displays a message in the Output window.

To unschedule a workflow, right-click the workflow and select Unschedule. The Workflow Monitor displays the workflow status as Unscheduled and displays a message in the Output window.
For more information about scheduling workflows, see Scheduling a Workflow on page 123.
If you want to view past session or workflow logs, you can configure the session or workflow to save log files. When you configure the workflow to save log files, the workflow creates a text file and the binary file that displays in the Log Events window. You can save log files by timestamp or by workflow or session runs. You can configure how many workflow or session runs to save. To view past session or workflow log files, configure the session or workflow to save logs by timestamp. For more information about workflow and session logs, see Session and Workflow Logs on page 645.
To view a session or workflow log file:
1. Right-click a session or workflow in the Navigator or Time window.
2. Select Get Session Log or Get Workflow Log.
The most recent session or workflow log file opens in the Log Events window.

Tip: When the Workflow Monitor retrieves the session or workflow log, you can press Esc to cancel the retrieval.
Workflow and Task Status
To see a list of tasks by status, view the workflow in Task view and sort by status. Or, click Edit > List Tasks in Gantt Chart view. For more information, see Listing Tasks and Workflows on page 592.
Using the Gantt Chart View
The Gantt Chart view displays the following information:
- Task name. Name of the task in the workflow.
- Duration. The length of time the Integration Service spends running the most recent task or workflow.
- Status. The status of the most recent task or workflow. For more information about status, see Workflow and Task Status on page 590.
- Connection between objects. The Workflow Monitor shows links between objects in the Time window.
To list tasks and workflows:
1. Open the Gantt Chart view and click Edit > List Tasks. The List Tasks dialog box appears.
2. In the List What field, select the type of task status you want to list. For example, select Failed to view a list of failed tasks and workflows.
3. Click List to view the list.

Tip: Double-click the task name in the List Tasks dialog box to highlight the task in Gantt Chart view.
Use any of the following methods to navigate the Time window:
- Use the scroll bars.
- Right-click the task or workflow and click Go To Next Run or Go To Previous Run.
- Click View > Organize to select the date you want to display.
When you click View > Organize, the Go To field appears above the Time window. Click the Go To field to view a calendar and select the date you want to display. When you select a date, the Workflow Monitor displays that date beginning at 12:00 a.m.
To zoom the Time window in Gantt Chart view, click View > Zoom, and then select the time increment. You can also select the time increment in the Zoom button on the toolbar.
Performing a Search
Use the search tool in the Gantt Chart view to search for tasks, workflows, and worklets in all repositories you connect to. The Workflow Monitor searches for the word you specify in task names, workflow names, and worklet names. You can highlight the task in Gantt Chart view by double-clicking the task after searching.
To perform a search:
1. Open the Gantt Chart view and click Edit > Find. The Find Object dialog box appears.
2. In the Find What field, enter the keyword you want to find.
3. Click Find Now. The Workflow Monitor displays a list of tasks, workflows, and worklets that match the keyword.

Tip: Double-click the task name in the Find Object dialog box to highlight the task in Gantt Chart view.
Using the Task View
The Task view displays the following information:
- Workflow run list. The list of workflow runs. The workflow run list contains folder, workflow, worklet, and task names. The Workflow Monitor displays workflow runs chronologically with the most recent run at the top. It displays folders and Integration Services alphabetically.
- Status message. Message from the Integration Service regarding the status of the task or workflow.
- Run type. The method you used to start the workflow. You might manually start the workflow or schedule the workflow to start.
- Node. Node of the Integration Service that ran the task.
- Start time. The time that the Integration Service starts running the task or workflow.
- Completion time. The time that the Integration Service finishes executing the task or workflow.
- Status. The status of the task or workflow.
- Filter tasks. Use the Filter menu to select the tasks you want to display or hide. For more information about filtering tasks in Task view, see Filtering in Task View on page 599.
- Hide and view columns. Hide or view an entire column in Task view. For more information about hiding and viewing columns in Task view, see Configuring Task View Options on page 581.
- Hide and view the Navigator. You can hide the Navigator in Task view. Click View > Navigator to hide or view the Navigator.
To view the tasks in Task view, select the Integration Service you want to monitor in the Navigator.
You can filter tasks in Task view in the following ways:
- By task type. You can filter out tasks you do not want to view. For example, if you want to view only Session tasks, you can filter out all other tasks. For more information about filtering task types and Integration Services, see Filtering Tasks and Integration Services on page 574.
- By nodes in the Navigator. You can filter the workflow runs in the Time window by selecting different nodes in the Navigator. For example, when you select a repository name in the Navigator, the Time window displays all workflow runs that ran on the Integration Services registered to that repository. When you select a folder name in the Navigator, the Time window displays all workflow runs in that folder.
- By the most recent runs. To display the most recent runs, click Filters > Most Recent Runs and select the number of runs you want to display.
- By Time window columns. You can click Filters > Auto Filter and filter by properties you specify in the Time window columns.
To use the auto filter:
1. Click Filters > Auto Filter. The Filter button appears in some columns of the Time window in Task view.
2. Click the Filter button in a column in the Time window.
3. Select the properties you want to filter.

Tip: If you want to view all tasks, select All.

When you click the Filter button in either the Start Time or Completion Time column, you can select a custom time to filter.
4. Select Custom for either Start Time or Completion Time. The Filter Start Time or Custom Completion Time dialog box appears.
5. Choose to show tasks before, after, or between the time you specify.
6. Select the date and time. Click OK.
Tips
- Reduce the size of the Time window. When you reduce the size of the Time window, the Workflow Monitor refreshes the screen faster, reducing flicker.
- Use the Repository Manager to truncate the list of workflow logs. If the Workflow Monitor takes a long time to refresh from the repository or to open folders, truncate the list of workflow logs. When you configure a session or workflow to archive session logs or workflow logs, the Integration Service saves those logs in local directories. The repository also creates an entry for each saved workflow log and session log. If you move or delete a session log or workflow log from the workflow log directory or session log directory, truncate the lists of workflow and session logs to remove the entries from the repository. The repository always retains the most recent workflow log entry for each workflow.
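You can also truncate log lists from the command line with the pmrep program. A sketch with placeholder repository, folder, and workflow names; the truncatelog options shown here are assumptions to verify against the Command Line Reference for your version:

    pmrep connect -r MyRepository -d MyDomain -n user -x password
    pmrep truncatelog -t all -f SalesFolder -w wf_sales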
Chapter 21
Overview, 604 Viewing Repository Service Details, 605 Viewing Integration Service Properties, 606 Viewing Repository Folder Details, 609 Viewing Workflow Run Properties, 610 Viewing Worklet Run Properties, 613 Viewing Command Task Run Properties, 615 Viewing Session Task Run Properties, 616 Viewing Performance Details, 621
Overview
The Workflow Monitor displays information that you can use to troubleshoot and analyze workflows. You can view details about services, workflows, worklets, and tasks in the Properties window of the Workflow Monitor. You can view the following details in the Workflow Monitor:
- Repository Service details. View information about repositories, such as the number of connected Integration Services. For more information, see Viewing Repository Service Details on page 605.
- Integration Service properties. View information about the Integration Service, such as the Integration Service version. You can also view system resources that running workflows consume, such as the system swap usage at the time of the running workflow. For more information, see Viewing Integration Service Properties on page 606.
- Repository folder details. View information about a repository folder, such as the folder owner. For more information, see Viewing Repository Folder Details on page 609.
- Workflow run properties. View information about a workflow, such as the start and end time. For more information, see Viewing Workflow Run Properties on page 610.
- Worklet run properties. View information about a worklet, such as the execution nodes on which the worklet is run. For more information, see Viewing Worklet Run Properties on page 613.
- Command task run properties. View information about Command tasks in a running workflow, such as the start and end time. For more information, see Viewing Command Task Run Properties on page 615.
- Session task run properties. View information about Session tasks in a running workflow, such as details on session failures. For more information, see Viewing Session Task Run Properties on page 616.
- Performance details. View counters that help you understand the session and mapping efficiency, such as information on the data cache size for an Aggregator transformation. For more information, see Viewing Performance Details on page 621.
Viewing Repository Service Details
Table 21-1 describes the attributes that appear in the Repository Details area:
Table 21-1. Workflow Monitor Repository Details Attributes
- Repository Name. Name of the repository.
- Is Opened. Yes, if you are connected to the repository. Otherwise, value is No.
- User Name. Name of the user connected to the repository. Attribute appears if you are connected to the repository.
- Number of Connected Integration Services. Number of Integration Services you are connected to in the Workflow Monitor. Attribute appears if you are connected to the repository.
- Is Versioning Enabled. Indicates whether repository versioning is enabled.
Viewing Integration Service Properties
- Integration Service Details. Displays information about the Integration Service. For more information, see Viewing Integration Service Details on page 606.
- Integration Service Monitor. Displays system resource usage information about nodes associated with the Integration Service. For more information, see Viewing Integration Service Monitor on page 607.
Table 21-2 describes the attributes that appear in the Integration Service Details area:
Table 21-2. Workflow Monitor Integration Service Details Attributes
- Integration Service Name. Name of the Integration Service.
- Integration Service Version. PowerCenter version and build. Appears if you are connected to the Integration Service in the Workflow Monitor.
- Integration Service Mode. Data movement mode of the Integration Service. Appears if you are connected to the Integration Service in the Workflow Monitor.
- Integration Service Operating Mode. The operating mode of the Integration Service. Appears if you are connected to the Integration Service in the Workflow Monitor.
- Grid Assigned. Name of the grid to which the Integration Service is assigned. Appears if the Integration Service is assigned to a grid.
Table 21-3 describes the attributes that appear in the Integration Service Monitor area:
Table 21-3. Workflow Monitor Integration Service Monitor Attributes
- Node Name. Name of the node on which the Integration Service is running.
- Folder. Folder that contains the running workflow.
- Workflow. Name of the running workflow.
- Task/Partition. Name of the session and partition that is running. Or, name of the Command task that is running.
- Status. Status of the task.
- Process ID. Process ID of the task.
- CPU %. For a node, the percent of CPU usage of processes running on the node. For a task, the percent of CPU usage by the task process.
- Memory Usage. For a node, the memory usage of processes running on the node. For a task, the memory usage of the task process.
- Swap Usage. Amount of swap space usage of processes running on the node.
Viewing Repository Folder Details
Table 21-4 describes the attributes that appear in the Folder Details area:
Table 21-4. Workflow Monitor Folder Details Attributes
- Folder Name. Name of the repository folder.
- Is Opened. Indicates if the folder is open.
- Number of Workflow Runs Within Time Window. Number of workflows that have run in the time window during which the Workflow Monitor displays workflow statistics. For more information about configuring a time window for workflows, see Configuring General Options on page 578.
- Number of Fetched Workflow Runs. Number of workflow runs displayed during the time window.
- Workflows Fetched Between. Time period during which the Integration Service fetched the workflows. Appears as DD/MM/YYYY HH:MM:SS and DD/MM/YYYY HH:MM:SS.
- Deleted. Indicates if the folder is deleted.
- Owner. Repository folder owner.
Viewing Workflow Run Properties
- Workflow details. View information about the workflow. For more information, see Viewing Workflow Details on page 610.
- Task progress details. View information about the tasks in the workflow. For more information, see Viewing Task Progress Details on page 611.
- Session statistics. View information about the session. For more information, see Viewing Session Statistics on page 612.
Table 21-5 describes the attributes that appear in the Workflow Details area:
Table 21-5. Workflow Monitor Workflow Details Attributes
- Task Name. Name of the workflow.
- Concurrent Type. Indicates whether the workflow runs concurrently with the same instance name or with unique instance names.
- OS Profile. Name of the operating system profile assigned to the workflow. Value is empty if an operating system profile is not assigned to the workflow. For more information about using operating system profiles, see the PowerCenter Administrator Guide.
Table 21-6 describes the attributes that appear in the Session Statistics area:
Table 21-6. Workflow Monitor Session Statistics Attributes
- Session. Name of the session.
- Source Success Rows. Number of rows the Integration Service successfully read from the source.
- Source Failed Rows. Number of rows the Integration Service failed to read from the source.
- Target Success Rows. Number of rows the Integration Service successfully wrote to the target.
- Target Failed Rows. Number of rows the Integration Service failed to write to the target.
- Total Transformation Errors. Number of transformation errors in the session.
- Start Time. Start time of the session.
- End Time. End time of the session.
Viewing Worklet Run Properties
- Worklet details. View information about the worklet. For more information, see Viewing Workflow Details on page 610.
- Session statistics. View information about the session. For more information, see Viewing Session Statistics on page 612.
Table 21-7 describes the attributes that appear in the Worklet Details area:
Table 21-7. Workflow Monitor Worklet Details Attributes
- Instance Name. Name of the worklet instance in the workflow.
- Task Type. Task type is Worklet.
- Integration Service Name. Name of the Integration Service assigned to the workflow associated with the worklet.
- Start Time. Start time of the worklet.
- End Time. End time of the worklet.
- Recovery Time(s). Time of the recovery worklet run.
Viewing Command Task Run Properties
Table 21-8 describes the attributes that appear in the Task Details area:
Table 21-8. Workflow Monitor Command Task Details Attributes
- Instance Name. Command task name.
- Task Type. Task type is Command.
- Integration Service Name. Name of the Integration Service assigned to the workflow associated with the Command task.
- Node(s). Nodes on which the commands in the Command task run.
- Start Time. Start time of the Command task.
- End Time. End time of the Command task.
- Recovery Time(s). Time of the recovery run.
- Status. Status of the Command task. For more information about Command task status, see Workflow and Task Status on page 590.
- Status Message. Message about the Command task status.
- Deleted. Indicates if the Command task is deleted.
- Version Number. Version number of the Command task.
Viewing Session Task Run Properties
- Failure information. View information about session failures. For more information, see Viewing Failure Information on page 616.
- Task details. View information about the session. For more information, see Viewing Session Task Details on page 617.
- Source and target statistics. View information about the number of rows the Integration Service read from the source and wrote to the target. For more information, see Viewing Source and Target Statistics on page 618.
- Partition details. View information about partitions in a session. For more information, see Viewing Partition Details on page 619.
- Performance. View information about session performance. For more information, see Viewing Performance Details on page 621.
To view session details, right-click a session in the Workflow Monitor and choose Get Run Properties. When you load data to a target with multiple groups, such as an XML target, the Integration Service provides session details for each group.
Table 21-9 describes the attributes that appear in the Failure Information area:
Table 21-9. Workflow Monitor Failure Information Attributes
- First Error Code. Error code for the first error.
- First Error. First error message.
Table 21-10 describes the attributes that appear in the Task Details area for Session tasks:
Table 21-10. Workflow Monitor Task Details Attributes
- Instance Name. Name of the session.
- Task Type. Task type is Session.
- Integration Service Name. Name of the Integration Service assigned to the workflow associated with the session.
- Node(s). Node on which the session is running.
- Start Time. Start time of the session.
Table 21-11 describes the attributes that appear in the Source/Target Statistics area:
Table 21-11. Workflow Monitor Source and Target Statistics Attributes
- Transformation Name. Name of the source qualifier instance or the target instance in the mapping. If you create multiple partitions in the source or target, the Instance Name displays the partition number. If the source or target contains multiple groups, the Instance Name displays the group name.
- Node. Node running the transformation.
- Applied Rows. For sources, the number of rows the Integration Service successfully read from the source. For targets, the number of rows the Integration Service successfully applied to the target.
- Affected Rows. For sources, the number of rows the Integration Service successfully read from the source. For targets, the number of rows affected by the specified operation. For example, you have a table with one column called SALES_ID and five rows containing the values 1, 2, 3, 2, and 2. You mark rows for update where SALES_ID is 2. The writer affects three rows, even though there was one update request. Or, if you mark rows for update where SALES_ID is 4, the writer affects 0 rows.
- Rejected Rows. Number of rows the Integration Service dropped when reading from the source, or the number of rows the Integration Service rejected when writing to the target.
- Throughput (Rows/Sec). Rate at which the Integration Service read rows from the source or wrote data into the target per second.
- Throughput (Bytes/Sec). The estimated rate at which the Integration Service read data from the source and wrote data to the target in bytes per second. Throughput (Bytes/Sec) is based on the Throughput (Rows/Sec) and the row size. The row size is based on the number of columns the Integration Service read from the source and wrote to the target, the data movement mode, column metadata, and whether you enabled high precision for the session. The calculation is not based on the actual data size in each row.
- Bytes. The total bytes the Integration Service read from the source or wrote to the target. It is obtained by multiplying the rows and the row size.
- Last Error Code. Error message code of the most recent error message written to the session log. If you view details after the session completes, this field displays the last error code.
- Last Error Message. Most recent error message written to the session log. If you view details after the session completes, this field displays the last error message.
- Start Time. Time the Integration Service started to read from the source or write to the target. The Workflow Monitor displays time relative to the Integration Service.
- End Time. Time the Integration Service finished reading from the source or writing to the target. The Workflow Monitor displays time relative to the Integration Service.
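For example, under these definitions, a session that writes 1,000 rows per second with a computed row size of 200 bytes reports Throughput (Bytes/Sec) of about 200,000, and a run that moves 50,000 rows in total reports about 10,000,000 Bytes (50,000 rows x 200 bytes). The figures are illustrative estimates, since the row size is computed from column metadata rather than from the actual data in each row.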
For example, if the Integration Service moves more rows through one target partition than another, or if the throughput is not evenly distributed, you might want to adjust the data range for the partitions. Figure 21-13 shows the Partition Details area:
Figure 21-13. Workflow Monitor Partition Details Area
Table 21-12 describes the attributes that appear in the Partition Details area:
Table 21-12. Workflow Monitor Partition Details Attributes
- Partition Name. Name of the partition.
- Node. Node running the partition.
- Transformations. Transformations in the partition pipeline.
- Process ID. Process ID of the partition.
- CPU %. Percent of the CPU the partition is consuming during the current session run.
- CPU Seconds. Amount of process time in seconds the CPU is taking to process the data in the partition during the current session run.
- Memory Usage. Amount of memory the partition consumes during the current session run.
- Input Rows. Number of input rows for the partition.
- Output Rows. Number of output rows for the partition.
Viewing Performance Details
You can view performance details in the following places:
- Workflow Monitor. When you configure the session to collect performance details, you can view them in the Workflow Monitor. When you configure the session to store performance details, you can view the details for previous sessions. For more information about configuring session properties for performance data, see Configuring Performance Details on page 221.
- Performance details file. The Integration Service creates a performance details file for the session when it completes. Use a text editor to view the performance details file.
By evaluating the final performance details, you can determine where session performance slows down. The Workflow Monitor also provides session-specific details that can help tune the following memory settings:
- Buffer block size
- Index and data cache size for Aggregator, Rank, Lookup, and Joiner transformations
To view performance details in the Workflow Monitor:
1. Right-click a session in the Workflow Monitor and choose Get Run Properties.
2. Click the Performance area in the Properties window.

Figure 21-14 shows the Performance area:
Figure 21-14. Workflow Monitor Performance Area
Table 21-13 describes the attributes that appear in the Performance area:
Table 21-13. Workflow Monitor Performance Attributes
- Performance Counter. Name of the performance counter.
- Counter Value. Value of the performance counter.
When you create multiple partitions, the Performance Area displays a column for each partition. The columns display the counter values for each partition.
3. Click OK.

To view the performance details file:
1. Locate the performance details file. The Integration Service names the file session_name.perf, and stores it in the same directory as the session log. If there is no session-specific directory for the session log, the Integration Service saves the file in the default log files directory.
2. Open the file in a text editor.
- <Joiner transformation> [M]. Displays performance details about the master pipeline of the Joiner transformation.
- <Joiner transformation> [D]. Displays performance details about the detail pipeline of the Joiner transformation.
When you create multiple partitions, the Integration Service generates one set of counters for each partition. The following performance counters illustrate two partitions for an Expression transformation:
Transformation Name   Counter Name              Counter Value
EXPTRANS [1]          Expression_input rows     8
EXPTRANS [1]          Expression_output rows    8
EXPTRANS [2]          Expression_input rows     16
EXPTRANS [2]          Expression_output rows    16
Note: When you increase the number of partitions, the number of aggregate or rank input rows may be different from the number of output rows from the previous transformation.

Table 21-14 describes the counters that may appear in the Session Performance Details area or in the performance details file:
Table 21-14. Performance Counters
Aggregator and Rank Transformations:
- Aggregator/Rank_inputrows. Number of rows passed into the transformation.
- Aggregator/Rank_outputrows. Number of rows sent out of the transformation.
- Aggregator/Rank_errorrows. Number of rows in which the Integration Service encountered an error.
- Aggregator/Rank_readfromcache. Number of times the Integration Service read from the index or data cache.
- Aggregator/Rank_writetocache. Number of times the Integration Service wrote to the index or data cache.
- Aggregator/Rank_readfromdisk. Number of times the Integration Service read from the index or data file on the local disk, instead of using cached data.
- Aggregator/Rank_writetodisk. Number of times the Integration Service wrote to the index or data file on the local disk, instead of using cached data.
- Aggregator/Rank_newgroupkey. Number of new groups the Integration Service created.
- Aggregator/Rank_oldgroupkey. Number of times the Integration Service used existing groups.

Lookup Transformation:
- Lookup_inputrows. Number of rows passed into the transformation.
- Lookup_outputrows. Number of rows sent out of the transformation.
- Lookup_errorrows. Number of rows in which the Integration Service encountered an error.
- Lookup_rowsinlookupcache. Number of rows stored in the lookup cache.

Joiner Transformation:
- Joiner_writetodisk*
- Joiner_readBlockFromDisk**

*The Integration Service generates this counter when you use sorted input for the Joiner transformation.
**The Integration Service generates this counter when you do not use sorted input for the Joiner transformation.
If you have multiple source qualifiers and targets, evaluate them as a whole. For source qualifiers and targets, a high value is considered 80-100 percent. Low is considered 0-20 percent.
Chapter 22

Running Workflows and Sessions on a Grid
Overview, 628 Running Workflows on a Grid, 629 Running Sessions on a Grid, 630 Working with Partition Groups, 632 Grid Connectivity and Recovery, 635 Configuring a Workflow or Session to Run on a Grid, 636
Overview
When a PowerCenter domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability. You create the grid and configure the Integration Service in the Administration Console. To run a workflow on a grid, you configure the workflow to run on the Integration Service associated with the grid. To run a session on a grid, configure the session to run on the grid. Figure 22-1 shows the relationship between the workflow and nodes when you run a workflow on a grid:
Figure 22-1. Running a Workflow on a Grid
The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session to run:
- Running workflows on a grid. The Integration Service distributes workflows across the nodes in a grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a grid. For more information about running a workflow on a grid, see Running Workflows on a Grid on page 629.
- Running sessions on a grid. The Integration Service distributes session threads across nodes in a grid. For information about running a session on a grid, see Running Sessions on a Grid on page 630.
Note: To run workflows on a grid, you must have the Server grid option. To run sessions on a grid, you must have the Session on Grid option. For more information about creating the grid and configuring the Integration Service to run on a grid, see Managing the Grid in the PowerCenter Administrator Guide.
Running Sessions on a Grid
The Load Balancer distributes tasks and session threads based on the following factors:
- Node availability. The Load Balancer verifies which nodes are currently running, enabled, and available for task dispatch.
- Resource availability. If the Integration Service is configured to check resources, it identifies nodes that have resources required by mapping objects in the session.
- Partitioning configuration. The Load Balancer dispatches groups of session threads to separate nodes based on the partitioning configuration.
You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run. For example, a workflow contains a session with one partition. To balance the load, you configure the session to run on a grid and configure the Integration Service to check resources. The Load Balancer distributes the reader, writer, and transformation threads to DTM processes running on the nodes in the grid. The reader threads require a resource, so the Load Balancer distributes them to a DTM process on the node where resources are available. Figure 22-3 shows session threads distributed to DTM processes running on nodes in a grid:
Figure 22-3. Session Threads Distributed to DTM Processes Running on Nodes in a Grid
For more information about assigning resources to tasks or mapping objects, see Assigning Resources to Tasks on page 642. For more information about configuring the Integration Service to check resources, see Creating and Configuring the Integration Service in the PowerCenter Administrator Guide.
631
To distribute partition groups to nodes where the required resources are available, you must configure the Integration Service to check resources. For information about configuring the Integration Service to check resources, see Creating and Configuring the Integration Service in the PowerCenter Administrator Guide.
Figure 22-5 shows an example of partition groups distributed based on partitioning configuration and resource availability:
Figure 22-5. Partition Groups Distributed Based on Resource Availability
The Integration Service limits the number of partition groups to the number of nodes in a grid. When a transformation is partitionable locally, the DTM process forms one partition group for the transformation threads, and runs that group in one DTM process. The following transformations are partitioned locally:
Custom transformation configured to partition locally
External Procedure transformation
Cached Lookup transformation
Unsorted Joiner transformation
SDK Reader or Writer transformation configured to partition locally
For information about configuring a shared directory, see Creating and Configuring the Integration Service in the PowerCenter Administrator Guide. For more information about determining cache requirements, see Session Caches on page 769.
Grid Connectivity and Recovery
Failover and recovery behavior when you run on a grid depends on the following factors:
High availability option. When you have high availability, workflows fail over to another node if the node or service shuts down. If you do not have high availability, you can manually restart a workflow on another node to recover it.
Recovery strategy. You can configure a workflow to suspend on error. You configure a recovery strategy for tasks within the workflow. When a workflow suspends, the recovery behavior depends on the recovery strategy you configure for each task in the workflow.
Shutdown mode. When you disable an Integration Service or service process, you can specify that the service completes, aborts, or stops processes running on the service. Behavior differs when you disable the Integration Service or you disable a service process. Behavior also differs when you disable a master service process or a worker service process. The Integration Service or service process may also shut down unexpectedly. In this case, the failover and recovery behavior depend on which service process shuts down and the configured recovery strategy.
Running mode. If the workflow runs on a grid, the Integration Service can recover workflows and tasks on another node. If a session runs on a grid, you cannot configure a resume recovery strategy.
Operating mode. If the Integration Service runs in safe mode, recovery is disabled for sessions and workflows.
Note: You cannot configure an Integration Service to fail over in safe mode if it runs on a grid.
For information about recovery behavior, see Recovering Workflows on page 375. For information about Integration Service failover and Integration Service safe mode, see the PowerCenter Administrator Guide.
Configuring a Workflow or Session to Run on a Grid
To run a workflow or session on a grid, complete the following tasks:
Configure the workflow properties. On the General tab of the workflow properties, assign an Integration Service to run the workflow. Verify that the Integration Service is configured to run on a grid.
Configure the session properties. To run a session on a grid, enable the session to run on a grid in the Config Object tab of the session properties.
Configure resource requirements. You configure resource requirements on the General tab of the Session, Command, and predefined Event-Wait tasks. For information about configuring resource requirements, see Assigning Resources to Tasks on page 642.
For information about creating a grid, see the PowerCenter Administrator Guide.
If you override a service process variable, ensure that the Integration Service can access input files, caches, logs, storage and temporary directories, and source and target file directories.
To ensure that a Session, Command, or predefined Event-Wait task runs on a particular node, configure the Integration Service to check resources and specify a resource requirement for the task. For more information about configuring the Integration Service to check resources, see the PowerCenter Administrator Guide.
To ensure that session threads for a mapping object run on a particular node, configure the Integration Service to check resources and specify a resource requirement for the object.
When you run a session that creates cache files, configure the root and cache directory to use a shared location to ensure consistency between cache files.
Ensure the Integration Service builds the cache in a shared location when you add a partition point at a Joiner transformation and the transformation is configured for 1:n partitioning. The cache for the Detail pipeline must be shared.
Ensure the Integration Service builds the cache in a shared location when you add a partition point at a Lookup transformation, and the partition type is not hash auto-keys.
When you run a session that uses dynamic partitioning, and you want to distribute session threads across all nodes in the grid, configure dynamic partitioning for the session to use the Based on number of nodes in the grid method. For more information about configuring dynamic partitioning, see Dynamic Partitioning on page 433.
You cannot run a debug session on a grid.
You cannot configure a resume recovery strategy for a session that you run on a grid.
Configure the session to run on a grid when you work with sessions that take a long time to run.
Configure the workflow to run on a grid when you have multiple concurrent sessions.
You can run a persistent profile session on a grid, but you cannot run a temporary profile session on a grid.
When you use a Sequence Generator transformation, increase the number of cached values to reduce the communication required between the master and worker DTM processes and the repository.
To ensure that the Log Viewer can accurately order log events when you run a workflow or session on a grid, use time synchronization software to ensure that the nodes of a grid use a synchronized date/time.
If the workflow uses an Email task in a Windows environment, configure the same Microsoft Outlook profile on each node to ensure the Email task can run.
Chapter 23
Working with the Load Balancer
Overview, 640 Assigning Service Levels to Workflows, 641 Assigning Resources to Tasks, 642
Overview
The Load Balancer dispatches tasks to Integration Service processes running on nodes. When you run a workflow, the Load Balancer dispatches the Session, Command, and predefined Event-Wait tasks within the workflow. If the Integration Service is configured to check resources, the Load Balancer matches task requirements with resource availability to identify the best node to run a task. It may dispatch tasks to a single node or across nodes. For more information about how the Load Balancer dispatches tasks or about configuring the Integration Service to check resources, see the PowerCenter Administrator Guide. To identify the nodes that can run a task, the Load Balancer matches the resources required by the task with the resources available on each node. It dispatches tasks in the order it receives them. When the Load Balancer has more Session and Command tasks to dispatch than the Integration Service can run at the time, the Load Balancer places the tasks in the dispatch queue. When nodes become available, the Load Balancer dispatches the waiting tasks from the queue in the order determined by the workflow service level. You configure resources for each node using the Administration Console. For more information about configuring node resources, see the PowerCenter Administrator Guide. You assign resources and service levels using the Workflow Manager. You can perform the following tasks:
Assign service levels. You assign service levels to workflows. Service levels establish priority among workflow tasks that are waiting to be dispatched. For more information about assigning service levels, see Assigning Service Levels to Workflows on page 641.
Assign resources. You assign resources to tasks. Session, Command, and predefined Event-Wait tasks require PowerCenter resources to succeed. If the Integration Service is configured to check resources, the Load Balancer dispatches these tasks to nodes where the resources are available. For more information about assigning resources, see Assigning Resources to Tasks on page 642.
If you try to assign a resource type that does not apply to a repository object, the Workflow Manager displays the following error message:
The selected resource cannot be applied to this type of object. Please select a different resource.
The Workflow Manager assigns connection resources. When you use a relational, FTP, or external loader connection, the Workflow Manager assigns the connection resource to sources, targets, and transformations in a session instance. You cannot manually assign a connection resource in the Workflow Manager.
To assign resources to a task instance:
1. Open the task properties in the Worklet or Workflow Designer. If the task is an Event-Wait task, you can assign resources only if the task waits for a predefined event.
2. In the Edit Resources dialog box, click the Add button to add a resource.
3. In the Select Resource dialog box, choose an object you want to assign a resource to. The Resources list shows the resources available to the nodes where the Integration Service runs.
4. Select the resource to assign and click Select.
5. In the Edit Resources dialog box, click OK.
Chapter 24
Session and Workflow Logs
Overview, 646 Log Events, 647 Log Events Window, 650 Working with Log Files, 653 Workflow Logs, 659 Session Logs, 661 Viewing Log Events, 664
Overview
The Service Manager provides accumulated log events from each service in the domain and for sessions and workflows. To perform the logging function, the Service Manager runs a Log Manager and a Log Agent. The Log Manager runs on the master gateway node. The Integration Service generates log events for workflows and sessions. The Log Agent runs on the nodes to collect and process log events for sessions and workflows. Log events for workflows include information about tasks performed by the Integration Service, workflow processing, and workflow errors. Log events for sessions include information about the tasks performed by the Integration Service, session errors, and load summary and transformation statistics for the session. You can view log events for workflows with the Log Events window in the Workflow Monitor. The Log Events window displays information about log events including severity level, message code, run time, workflow name, and session name. For session logs, you can set the tracing level to log more information. All log events display severity regardless of tracing level. The Log Manager processes session and workflow logs as follows:
1. During a session or workflow, the Integration Service writes binary log files on the node. It sends information about the sessions and workflows to the Log Manager.
2. The Log Manager stores information about workflow and session logs in the domain configuration database. The domain configuration database stores information such as the path to the log file location, the node that contains the log, and the Integration Service that created the log.
3. When you view a session or workflow in the Log Events window, the Log Manager retrieves the information from the domain configuration database to determine the location of the session or workflow logs.
4. The Log Manager dispatches a Log Agent to retrieve the log events on each node to display in the Log Events window.
For more information about the Log Manager, see the PowerCenter Administrator Guide. If you want access to log events for more than the last workflow run, you can configure sessions and workflows to archive logs by time stamp. You can also configure a workflow to produce text log files. You can archive text log files by run or by time stamp. When you configure the workflow or session to produce text log files, the Integration Service creates the binary log and the text log file.
Log Events
You can view log events in the Workflow Monitor Log Events window and you can view them as text files. The Log Events window displays log events in a tabular format. For information about viewing log events in the Log Events window, see Log Events Window on page 650. For information about writing log events to a workflow or session log file, see Working with Log Files on page 653.
Log Codes
Use log events to determine the cause of workflow or session problems. To resolve problems, locate the relevant log codes and text prefixes in the workflow and session log. Then, refer to the PowerCenter Troubleshooting Guide for more information about the error. The Integration Service precedes each workflow and session log event with a thread identification, a code, and a number. The code defines a group of messages for a process. The number defines a message. The message can provide general information or it can be an error message. Some log events are embedded within other log events. For example, a code CMN_1039 might contain informational messages from Microsoft SQL Server.
Message Severity
The Log Events window categorizes workflow and session log events into severity levels. It prioritizes error severity based on the embedded message type. The error severity level appears with log events in the Log Events window in the Workflow Monitor. It also appears with messages in the workflow and session log files. Table 24-1 describes message severity levels:
Table 24-1. Message Severity Levels
FATAL. Fatal error occurred. Fatal error messages have the highest severity level.
ERROR. Indicates the service failed to perform an operation or respond to a request from a client application. Error messages have the second highest severity level.
WARNING. Indicates the service is performing an operation that may cause an error. This can cause repository inconsistencies. Warning messages have the third highest severity level.
INFO. Indicates the service is performing an operation that does not indicate errors or problems. Information messages have the third lowest severity level.
TRACE. Indicates service operations at a more specific level than Information. Tracing messages generally record message sizes. Trace messages have the second lowest severity level.
DEBUG. Indicates service operations at the thread level. Debug messages generally record the success or failure of service operations. Debug messages have the lowest severity level.
Writing Logs
The Integration Service writes the workflow and session logs as binary files on the node where the service process runs. It adds a .bin extension to the log file name you configure in the session and workflow properties. When you run a session on a grid, the Integration Service creates one session log for each DTM process. The log file on the primary node has the configured log file name. The log file on a worker node has a .w<Partition Group Id> extension:
<session or workflow name>.w<Partition Group ID>.bin
For example, if you run the session s_m_PhoneList on a grid with three nodes, the session log files use the names, s_m_PhoneList.bin, s_m_PhoneList.w1.bin, and s_m_PhoneList.w2.bin. When you rerun a session or workflow, the Integration Service overwrites the binary log file unless you choose to save workflow logs by time stamp. When you save workflow logs by time stamp, the Integration Service adds a time stamp to the log file name and archives them. For more information about archiving logs, see Archiving Log Files by Time Stamp on page 654. If you want to view log files for more than one run, configure the workflow or session to create log files. For more information about log files, see Working with Log Files on page 653. A workflow or session continues to run even if there are errors while writing to the log file after the workflow or session initializes. If the log file is incomplete, the Log Events window cannot display all the log events. The Integration Service starts a new log file for each workflow and session run. When you recover a workflow or session, the Integration Service appends a recovery.time stamp extension to the file name for the recovery run. For real-time sessions, the Integration Service overwrites the log file when you cold start a session or when you restart a JMS or WebSphere MQ session that does not have recovery data. The Integration Service appends the log file when you restart a JMS or WebSphere MQ session that has recovery data. If you want to convert the binary file to a text file, use the infacmd convertLog or the infacmd GetLog command. For more information, see the PowerCenter Command Line Reference.
For more information about configuring the Integration Service to use a shared library for session logs, see the PowerCenter Administrator Guide. For more information about implementing the Session Log Interface, see Working with the Session Log Interface on page 665.
Log Events Window
The Log Events window displays the following information for each log event:
Severity. Lists the type of message, such as informational or error.
Time stamp. Date and time the log event reached the Log Agent.
Node. Node on which the Integration Service process is running.
Thread. Thread ID for the workflow or session.
Process ID. Windows or UNIX process identification numbers. Displays in the Output window only.
Message Code. Message code and number.
Message. Message associated with the log event.
Figure 24-1 shows a sample workflow log in the Log Events window:
Figure 24-1. Sample Workflow Log in the Log Events Window
By default, the Log Events window displays log events according to the date and time the Integration Service writes the log event on the node. The Log Events window displays logs consisting of multiple log files by node name. When you run a session on a grid, log events for the partition groups are ordered by node name and grouped by log file.
You can perform the following tasks in the Log Events window:
Save log events to file. Click Save As to save log events as a binary, text, or XML file.
Copy log event text to a file. Click Copy to copy one or more log events and paste them into a text file.
Sort log events. Click a column heading to sort log events.
Search for log events. Click Find to search for text in log events.
Refresh log events. Click Refresh to view updated log events during a workflow or session run.
Note: When you view a log larger than 2 GB, the Log Events window displays a warning that the file might be too large for system memory. If you choose to continue, the Log Events window might shut down unexpectedly.
To search for log events:
1. Open the Workflow Monitor.
2. Connect to a repository in the Navigator.
3. Select an Integration Service.
4. Right-click a workflow and select Get Workflow Log. The Log Events window displays.
5. In the Log Events window, click Find. The query area appears.
6. Enter the text you want to find.
7. Optionally, click Match Case if you want the query to be case sensitive.
8. Select one of the following options:
Message. Select to search text in the Message field.
All Fields. Select to search text in all fields.
9. Click Find Next to search for the next instance of the text in the Log Events. Or, click Find Previous to search for the previous instance of the text in the Log Events.
Write Backward Compatible Log File. Select this option to create a text file for workflow or session logs. If you do not select the option, the Integration Service creates the binary log only.
Log File Directory. The directory where you want the log file created. By default, the Integration Service writes the workflow log file in the directory specified in the service process variable, $PMWorkflowLogDir. It writes the session log file in the directory specified in the service process variable, $PMSessionLogDir. If you enter a directory name that the Integration Service cannot access, the workflow or session fails.
Table 24-3 shows the default location for each type of log file and the associated service process variables:
Table 24-3. Log File Default Locations and Associated Service Process Variables
Workflow logs: default directory $PMWorkflowLogDir (service process variable); default value for the service process variable is $PMRootDir/WorkflowLogs.
Session logs: default directory $PMSessionLogDir (service process variable); default value for the service process variable is $PMRootDir/SessLogs.
Name. The name of the log file. You must configure a name for the log file or the workflow or session is invalid. You can use a service, service process, or user-defined workflow or worklet variable for the log file name.
Note: The Integration Service stores the workflow and session log names in the domain configuration database. If you want to use Unicode characters in the workflow or session log file names, the domain configuration database must be a Unicode database.
You can archive logs in the following ways:
By run. Archive text log files by run. Configure a number of text logs to save.
By time stamp. Archive binary logs and text files by time stamp. The Integration Service saves an unlimited number of logs and labels them by time stamp. When you configure the workflow or session to archive by time stamp, the Integration Service always archives binary logs.
When you archive logs by run, the Integration Service uses the following naming convention:
<session or workflow name>.n
where n=0 for the first historical log. The variable increments by one for each workflow or session run. If you run a session on a grid, the worker service processes use the following naming convention for a session:
<session name>.n.w<DTM ID>
When you archive logs by time stamp, the Integration Service adds a time stamp of the form yyyymmddhhmi to the log file name, where:
yyyy = year
mm = month, ranging from 01-12
dd = day, ranging from 01-31
hh = hour, ranging from 00-23
mi = minute, ranging from 00-59
To prevent filling the log directory, periodically purge or back up log files when using the time stamp option. If you run a session on a grid, the worker service processes use the following naming conventions for sessions:
<session name>.yyyymmddhhmi.w<DTM ID>
<session name>.yyyymmddhhmi.w<DTM ID>.bin
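For example, if you configure the session s_m_PhoneList to archive logs by time stamp and the session runs on a grid at 4:32 p.m. on October 15, 2007, the worker service processes might create binary logs named s_m_PhoneList.200710151632.w1.bin and s_m_PhoneList.200710151632.w2.bin. The session name, date, and time in this example are illustrative only.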
When you archive text log files, you can view the logs by navigating to the workflow or session log folder and viewing the files in a text reader. When you archive binary log files, you can view the logs by navigating to the workflow or session log folder and importing the files in the Log Events window. You can archive binary files when you configure the workflow or session to archive logs by time stamp. You do not have to create text log files to archive binary files. You might need to archive binary files to send to Informatica Technical Support for review.
Workflow Logs
Workflow logs contain information about the workflow runs. You can view workflow log events in the Log Events window of the Workflow Monitor. You can also create an XML, text, or binary log file for workflow log events. A workflow log contains the following information:
Workflow name
Workflow status
Status of tasks and worklets in the workflow
Start and end times for tasks and worklets
Results of link conditions
Errors encountered during the workflow and general information
Some session messages and errors
Session Logs
Session logs contain information about the tasks that the Integration Service performs during a session, plus load summary and transformation statistics. By default, the Integration Service creates one session log for each session it runs. If a workflow contains multiple sessions, the Integration Service creates a separate session log for each session in the workflow. When you run a session on a grid, the Integration Service creates one session log for each DTM process. In general, a session log contains the following information:
Allocation of heap memory
Execution of pre-session commands
Creation of SQL commands for reader and writer threads
Start and end times for target loading
Errors encountered during the session and general information
Execution of post-session commands
Load summary of reader, writer, and DTM statistics
Integration Service version and build number
For example, a session log contains entries like the following:
DIRECTOR> TM_6794 Connected to repository [HR_80] in domain [StonesDomain] user [ellen]
DIRECTOR> TM_6014 Initializing session [s_PromoItems] at [Mon Apr 03 16:19:48 2006]
DIRECTOR> TM_6683 Repository Name: [HR_80]
DIRECTOR> TM_6684 Server Name: [Copper]
DIRECTOR> TM_6686 Folder: [Snaps]
DIRECTOR> TM_6685 Workflow: [wf_PromoItems]
DIRECTOR> TM_6101 Mapping name: m_PromoItems [version 1]
DIRECTOR> SDK_1805 Recovery cache will be deleted when running in normal mode.
DIRECTOR> SDK_1802 Session recovery cache initialization is complete.
The session log file includes the Integration Service version and build number.
DIRECTOR> TM_6703 Session [s_PromoItems] is run by 32-bit Integration Service [sapphire], version [8.1.0], build [0329].
You can also enter tracing levels for individual transformations in the mapping. When you enter a tracing level in the session properties, you override tracing levels configured for transformations in the mapping.
Viewing Log Events
You can view log events in the following ways:
Most recent session or workflow log. View the session or workflow log in the Log Events window for the last run workflow.
Archived binary log files. View archived binary log files in the Log Events window.
Archived text log files. View archived text log files in any text editor.
For more information about archiving log files, see Archiving Log Files on page 654.
To view the Log Events window for a session or workflow:
1. In the Workflow Monitor, right-click the workflow or session.
2. Select Get Session Log or Get Workflow Log.
To view an archived binary log file:
1. If you do not know the session or workflow log file name and location, check the Log File Name and Log File Directory attributes on the Session or Workflow Properties tab. If you are running the Integration Service on UNIX and the binary log file is not accessible on the Windows machine where the PowerCenter client is running, you can transfer the binary log file to the Windows machine using FTP.
2. In the Workflow Monitor, click Tools > Import Log.
3. Navigate to the session or workflow log file directory.
4. Select the binary log file you want to view.
5. Click Open.
To view an archived text log file:
1. If you do not know the session or workflow log file name and location, check the Log File Name and Log File Directory attributes on the Session or Workflow Properties tab.
2. Navigate to the session or workflow log file directory. The session and workflow log file directory contains the text log files and the binary log files. If you archive log files, check the file date to find the latest log file for the session.
3. Open the text log file in any text editor.
Chapter 25
Working with the Session Log Interface
Overview, 666 Implementing the Session Log Interface, 667 Functions in the Session Log Interface, 669
Overview
By default, the Integration Service writes session events to binary log files on the node where the service process runs. In addition, the Integration Service can pass the session event information to an external library. In the external shared library, you can provide the procedure for how the Integration Service writes the log events. PowerCenter provides access to the session event information through the Session Log Interface. When you create the shared library, you implement the functions provided in the Session Log Interface. When the Integration Service writes the session events, it calls the functions specified in the Session Log Interface. The functions in the shared library you create must match the function signatures defined in the Session Log Interface. For more information about the functions in the Session Log Interface, see Functions in the Session Log Interface on page 669.
Implementing the Session Log Interface
To ensure that the Integration Service can call the shared library correctly, follow the guidelines for writing the shared library.
You must implement all the functions in the Session Log Interface. All calls from the Integration Service to the functions in the Session Log Interface are serialized except for abnormal termination. The Integration Service makes the calls to the functions as it logs events to the session log. Therefore, when you implement the functions in the Session Log Interface, you do not need to use mutex objects to ensure that only one thread executes a section of code at a time.
When you implement the Session Log Interface in UNIX, do not perform any signal handling within the functions. This ensures that the functions do not interfere with the way that PowerCenter handles signals. Do not register or unregister any signal handlers. Since the Integration Service is a multi-threaded process, you must compile the shared library as a multi-threaded library so that it can be loaded correctly.
Functions in the Session Log Interface
The Session Log Interface includes the following functions:
INFA_InitSessionLog
INFA_OutputSessionLogMsg
INFA_OutputSessionLogFatalMsg
INFA_EndSessionLog
INFA_AbnormalSessionTermination
The functions described in this section use the time structures declared in the standard C header file time.h. The functions also assume the following declarations:
typedef int INFA_INT32;
typedef unsigned int INFA_UINT32;
typedef unsigned short INFA_UNICHAR;
typedef char INFA_MBCSCHAR;
typedef int INFA_MBCS_CODEPAGE_ID;
INFA_InitSessionLog
void INFA_InitSessionLog(void ** dllContext, const INFA_UNICHAR * sServerName, const INFA_UNICHAR * sFolderName, const INFA_UNICHAR * sWorkflowName, const INFA_UNICHAR * sessionHierName[]);
The Integration Service calls the INFA_InitSessionLog function before it writes any session log event. The parameters passed to this function provide information about the session for which the Integration Service will write the event logs.
INFA_OutputSessionLogMsg
void INFA_OutputSessionLogMsg( void * dllContext, time_t curTime, INFA_UINT32 severity, const INFA_UNICHAR * msgCategoryName, INFA_UINT32 msgCode, const INFA_UNICHAR * msg, const INFA_UNICHAR * threadDescription);
The Integration Service calls this function each time it logs an event. The parameters passed to the function include the different elements of the log event message. You can use the parameters to customize the format for the log output or to filter out messages. INFA_OutputSessionLogMsg has the following parameters:
dllContext (unspecified). User-defined information specific to the shared library. You can use this parameter to store information related to network connection or to allocate memory needed during the course of handling the session log output. The shared library must allocate and deallocate any memory associated with this parameter.
curTime (time_t). The time that the Integration Service logs the event.
severity (INFA_UINT32). The code that indicates the type of the log event message. The event logs use the following severity codes:
32: Debug Messages
8: Informational Messages
2: Error Messages
msgCategoryName (const INFA_UNICHAR *). Code prefix that indicates the category of the log event message. In the following example message, the string BLKR is the value passed in the msgCategoryName parameter.
READER_1_1_1> BLKR_16003 Initialization completed successfully.
msgCode (INFA_UINT32). Number that identifies the log event message. In the following example message, the string 16003 is the value passed in the msgCode parameter.
READER_1_1_1> BLKR_16003 Initialization completed successfully.
msg (const INFA_UNICHAR *). The text of the log event message. In the following example message, the string Initialization completed successfully is the value passed in the msg parameter.
READER_1_1_1> BLKR_16003 Initialization completed successfully.
threadDescription (const INFA_UNICHAR *). Code that indicates which thread is generating the event log. In the following example message, the string READER_1_1_1 is the value passed in the threadDescription parameter.
READER_1_1_1> BLKR_16003 Initialization completed successfully.
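The following is a minimal sketch of an implementation of this function. It is illustrative only: it assumes that INFA_InitSessionLog stored an open FILE pointer in *dllContext (see the sketch that follows the INFA_AbnormalSessionTermination description below), it narrows the INFA_UNICHAR strings to ASCII rather than performing a full UTF-16 conversion, and it maps the documented severity codes 32, 8, and 2 to labels.

#include <stdio.h>
#include <time.h>

typedef unsigned int   INFA_UINT32;
typedef unsigned short INFA_UNICHAR;

/* Narrow a null-terminated INFA_UNICHAR string to ASCII. Illustrative only;
   a production library would perform a real UTF-16 conversion. */
static void narrowStr(const INFA_UNICHAR *src, char *dst, size_t dstSize)
{
    size_t i = 0;
    while (src && src[i] && i + 1 < dstSize) {
        dst[i] = (src[i] < 128) ? (char)src[i] : '?';
        i++;
    }
    dst[i] = '\0';
}

void INFA_OutputSessionLogMsg(
    void *dllContext, time_t curTime, INFA_UINT32 severity,
    const INFA_UNICHAR *msgCategoryName, INFA_UINT32 msgCode,
    const INFA_UNICHAR *msg, const INFA_UNICHAR *threadDescription)
{
    FILE *log = (FILE *)dllContext;   /* stored by INFA_InitSessionLog */
    char category[64], text[4096], thread[256], timeBuf[32];
    const char *level;

    if (log == NULL)
        return;

    /* Map the documented severity codes to readable labels. */
    switch (severity) {
    case 32: level = "DEBUG"; break;
    case 8:  level = "INFO";  break;
    case 2:  level = "ERROR"; break;
    default: level = "OTHER"; break;
    }

    narrowStr(msgCategoryName, category, sizeof category);
    narrowStr(msg, text, sizeof text);
    narrowStr(threadDescription, thread, sizeof thread);
    strftime(timeBuf, sizeof timeBuf, "%Y-%m-%d %H:%M:%S", localtime(&curTime));

    /* Emit a line shaped like the documented examples:
       READER_1_1_1> BLKR_16003 Initialization completed successfully. */
    fprintf(log, "%s [%s] %s> %s_%u %s\n",
            timeBuf, level, thread, category, (unsigned)msgCode, text);
    fflush(log);
}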
INFA_OutputSessionLogFatalMsg
void INFA_OutputSessionLogFatalMsg(void * dllContext, const char * msg);
The Integration Service calls this function to log the last event before an abnormal termination. The parameter msg is MBCS characters in the Integration Service code page. When you implement this function in UNIX, make sure that you call only asynchronous signal safe functions from within this function. INFA_OutputSessionLogFatalMsg has the following parameters:
dllContext (unspecified). User-defined information specific to the shared library. You can use this parameter to store information related to network connection or to allocate memory needed during the course of handling the session log output. The shared library must allocate and deallocate any memory associated with this parameter.
msg (const char *). Text of the error message. Typically, these messages are assertion error messages or operating system error messages.
INFA_EndSessionLog
void INFA_EndSessionLog(void * dllContext);
The Integration Service calls this function after the last message is sent to the session log and the session terminates normally. You can use this function to perform clean up operations and release memory and resources.
INFA_AbnormalSessionTermination
void INFA_AbnormalSessionTermination(void * dllContext);
The Integration Service calls this function after the last message is sent to the session log and the session terminates abnormally. The Integration Service calls this function after it calls the INFA_OutputSessionLogFatalMsg function. If the Integration Service calls this function, then it does not call INFA_EndSessionLog. For example, the Integration Service calls this function when the DTM aborts or times out. In UNIX, the Integration Service calls this function when a signal exception occurs. Include only minimal functionality when you implement this function. In UNIX, make sure that you call only asynchronous signal safe functions from within this function. INFA_AbnormalSessionTermination has the following parameter:
dllContext (unspecified). User-defined information specific to the shared library. You can use this parameter to store information related to network connection or to allocate memory needed during the course of handling the session log output. The shared library must allocate and deallocate any memory associated with this parameter.
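The following minimal sketch completes the example by implementing the remaining functions. It is illustrative only: the log file name is a placeholder, the FILE pointer stored in *dllContext matches what the INFA_OutputSessionLogMsg sketch above expects, and the fatal message and abnormal termination handlers call only asynchronous signal safe functions on UNIX, as the guidelines above require.

#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* write() and close(); POSIX/UNIX only */

typedef unsigned short INFA_UNICHAR;

void INFA_InitSessionLog(void **dllContext,
                         const INFA_UNICHAR *sServerName,
                         const INFA_UNICHAR *sFolderName,
                         const INFA_UNICHAR *sWorkflowName,
                         const INFA_UNICHAR *sessionHierName[])
{
    /* "custom_session.log" is a placeholder. A real library might derive
       the file name from the session hierarchy parameters. */
    *dllContext = fopen("custom_session.log", "a");
    (void)sServerName; (void)sFolderName;
    (void)sWorkflowName; (void)sessionHierName;
}

void INFA_OutputSessionLogFatalMsg(void *dllContext, const char *msg)
{
    /* Called once before an abnormal termination. Use only asynchronous
       signal safe functions such as write(), not fprintf(). */
    FILE *log = (FILE *)dllContext;
    if (log != NULL && msg != NULL) {
        (void)write(fileno(log), "FATAL: ", 7);
        (void)write(fileno(log), msg, strlen(msg));
        (void)write(fileno(log), "\n", 1);
    }
}

void INFA_EndSessionLog(void *dllContext)
{
    /* Normal termination: release the resources acquired in Init. */
    FILE *log = (FILE *)dllContext;
    if (log != NULL)
        fclose(log);
}

void INFA_AbnormalSessionTermination(void *dllContext)
{
    /* Abnormal termination: INFA_EndSessionLog is not called after this
       function, so close the descriptor here with minimal work. */
    FILE *log = (FILE *)dllContext;
    if (log != NULL)
        (void)close(fileno(log));
}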
Chapter 26
Row Error Logging
Overview, 674 Understanding the Error Log Tables, 676 Understanding the Error Log File, 682 Configuring Error Log Options, 685
Overview
When you configure a session, you can choose to log row errors in a central location. When a row error occurs, the Integration Service logs error information that lets you determine the cause and source of the error. The Integration Service logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information. You can log row errors into relational tables or flat files. When you enable error logging, the Integration Service creates the error tables or an error log file the first time it runs the session. Error logs are cumulative. If the error logs exist, the Integration Service appends error data to the existing error logs. You can choose to log source row data. Source row data includes row data, source row ID, and source row type from the source qualifier where an error occurs. The Integration Service cannot identify the row in the source qualifier that contains an error if the error occurs after a non pass-through partition point with more than one partition or one of the following active sources:
Aggregator
Custom, configured as an active transformation
Joiner
Normalizer (pipeline)
Rank
Sorter
By default, the Integration Service logs transformation errors in the session log and reject rows in the reject file. When you enable error logging, the Integration Service does not generate a reject file or write dropped rows to the session log. Without a reject file, the Integration Service does not log Transaction Control transformation rollback or commit errors. If you want to write rows to the session log in addition to the row error log, you can enable verbose data tracing.
Note: When you log row errors, session performance may decrease because the Integration Service processes one row at a time instead of a block of rows at once.
The encoding of the error log file depends on the platform where the Integration Service process runs:
UNIX. The Integration Service writes data to the error log file using the Integration Service process code page. However, you can configure the Integration Service to write to the error log file using UTF-8 by enabling the LogsInUTF8 Integration Service property.
Windows. The Integration Service writes all characters in the error log file using the UTF-8 encoding format.
The code page for the relational database where the error tables exist must be a subset of the target code page. If the error log table code page is not a subset of the target code page, the Integration Service might write inconsistent data in the error log tables. For more information about code pages, see the PowerCenter Administrator Guide.
Understanding the Error Log Tables
When you choose relational database error logging, the Integration Service creates the following error tables:
PMERR_DATA. Stores data and metadata about a transformation row error and its corresponding source row.
PMERR_MSG. Stores metadata about an error and the error message.
PMERR_SESS. Stores metadata about the session.
PMERR_TRANS. Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs.
PMERR_DATA
When the Integration Service encounters a row error, it inserts an entry into the PMERR_DATA table. This table stores data and metadata about a transformation row error and its corresponding source row. Table 26-1 describes the structure of the PMERR_DATA table:
Table 26-1. PMERR_DATA Table Schema
REPOSITORY_GID (Varchar). A unique identifier for the repository.
WORKFLOW_RUN_ID (Integer). A unique identifier for the workflow.
PMERR_MSG
When the Integration Service encounters a row error, it inserts an entry into the PMERR_MSG table. This table stores metadata about the error and the error message. Table 26-2 describes the structure of the PMERR_MSG table:
Table 26-2. PMERR_MSG Table Schema
REPOSITORY_GID (Varchar). A unique identifier for the repository.
WORKFLOW_RUN_ID (Integer). A unique identifier for the workflow.
WORKLET_RUN_ID (Integer). A unique identifier for the worklet. If a session is not part of a worklet, this value is 0.
SESS_INST_ID (Integer). A unique identifier for the session.
MAPPLET_INST_NAME (Varchar). Mapplet to which the transformation belongs. If the transformation is not part of a mapplet, this value is n/a.
TRANS_NAME (Varchar). Name of the transformation where an error occurred.
The PMERR_MSG table also includes the ERROR_TYPE (Integer) and LINE_NO (Integer) columns.
PMERR_SESS
When you choose relational database error logging, the Integration Service inserts entries into the PMERR_SESS table. This table stores metadata about the session where an error occurred.
The PMERR_SESS table includes the MAPPING_NAME (Varchar) and LINE_NO (Integer) columns.
PMERR_TRANS
When the Integration Service encounters a transformation error, it inserts an entry into the PMERR_TRANS table. This table stores metadata, such as the name and datatype of the source and transformation ports.
The PMERR_TRANS table includes the TRANS_ATTR (Varchar), SOURCE_ATTR (Varchar), and LINE_NO (Integer) columns.
Understanding the Error Log File
The error log file contains the following sections:
Session header. Contains session run information. Information in the session header is similar to the information stored in the PMERR_SESS table.
Column header. Contains data column names.
Column data. Contains actual row data and error message information.
The following sample error log file contains a session header, column header, and column data:
********************************************************************** Repository GID: fe4817ab-7d87-465f-9110-354222424df0 Repository: CustomerInfo Folder: Row_Error_Logging Workflow: wf_basic_REL_errors_AGG_case Session: s_m_basic_REL_errors_AGG_case Mapping: m_basic_REL_errors_AGG_case Workflow Run ID: 1310 Worklet Run ID: 0 Session Instance ID: 19 Session Start Time: 08/03/2004 16:57:01 Session Start Time (UTC): 1067126221 **********************************************************************
Transformation||Transformation Mapplet Name||Transformation Group||Partition Index||Transformation Row ID||Error Sequence||Error Timestamp||Error UTC Time||Error Code||Error Message||Error Type||Transformation Data||Source Mapplet Name||Source Name||Source Row ID||Source Row Type||Source Data
agg_REL_basic||N/A||Input||1||1||1||08/03/2004 16:57:03||1067126223||11019||Port [CUST_ID_NULL]: Default value is: ERROR(<<Expression Error>> [ERROR]: [AGG] CUST_ID - NULL detected on input.\n... nl:ERROR(s:'[AGG] CUST_ID - NULL detected on input.')).||3||D:1221|N:|N:|N:|D:Kauai Dive Shoppe|D:4-976 Sugarloaf Hwy|D:Kapaa Kauai|D:HI|D:94766|D:[AGG] DEFAULT SID VALUE.|D:01/01/2001 00:00:00||mplt_add_NULLs_to_QACUST3||SQ_QACUST3||1||0||D:1221|D:Kauai Dive Shoppe|D:4-976 Sugarloaf Hwy|D:Kapaa Kauai|D:HI|D:94766 agg_REL_basic||N/A||Input||1||4||1||08/03/2004 16:57:03||1067126223||11019||Port [CITY_IN]: Default value is: ERROR(<<Expression Error>> [ERROR]: [AGG] Null detected for City_IN.\n... nl:ERROR(s:'[AGG] Null detected for City_IN.')).||3||D:1354|N:|N:|D:1354|T:Cayman Divers World|D:PO Box 541|N:|D:Gr|N:|D:[AGG] DEFAULT SID VALUE.|D:01/01/2001 00:00:00||mplt_add_NULLs_to_QACUST3||SQ_QACUST3||4||0||D:1354|D:Cayman Divers World Unlim|D:PO Box 541|N:|D:Gr|N: agg_REL_basic||N/A||Input||1||5||1||08/03/2004 16:57:03||1067126223||11131||Transformation [agg_REL_basic] had an error evaluating variable column [Var_Divide_by_Price]. Error message is [<<Expression Error>> [/]: divisor is zero\n... f:(f:2 / f:(f:1 f:TO_FLOAT(i:1)))].||3||D:1356|N:|N:|D:1356|T:Tom Sawyer Diving C|T:632-1 Third Frydenh|D:Christiansted|D:St|D:00820|D:[AGG] DEFAULT SID VALUE.|D:01/01/2001 00:00:00||mplt_add_NULLs_to_QACUST3||SQ_QACUST3||5||0||D:1356|D:Tom Sawyer Diving Centre|D:632-1 Third Frydenho|D:Christiansted|D:St|D:00820
Configuring Error Log Options
You configure error logging for each session on the Config Object tab. For more information about creating a session configuration object, see Creating a Session Configuration Object on page 218.
To configure error logging options:
1. Double-click the Session task to open the session properties.
2. Select the Config Object tab.
3. Specify the error log type. To enable error logging for the session, choose either Relational Database or Flat File. If you choose None, the Integration Service does not create an error log.
Table 26-6 describes the error logging settings of the Config Object tab:
Table 26-6. Error Log Options
Error Log Type (Required). Specifies the type of error log to create. You can specify relational database, flat file, or none. By default, the Integration Service does not create an error log.
Error Log DB Connection. Specifies the database connection for a relational log. This option is required when you enable relational database logging.
Error Log Table Name Prefix. Specifies the table name prefix for relational logs. The Integration Service appends 11 characters to the prefix name. Oracle and Sybase have a 30 character limit for table names. If a table name exceeds 30 characters, the session fails. You can use a parameter or variable for the error log table name prefix. Use any parameter or variable type that you can define in the parameter file. For more information, see Where to Use Parameters and Variables on page 691.
Error Log File Directory. Specifies the directory where errors are logged. By default, the error log file directory is $PMBadFilesDir\. This option is required when you enable flat file logging.
Error Log File Name. Specifies the error log file name. The character limit for the error log file name is 255. By default, the error log file name is PMError.log. This option is required when you enable flat file logging.
Log Row Data. Specifies whether or not to log transformation row data. When you enable error logging, the Integration Service logs transformation row data by default. If you disable this property, n/a or -1 appears in transformation row data fields.
Log Source Row Data. If you choose not to log source row data, or if source row data is unavailable, the Integration Service writes an indicator such as n/a or -1, depending on the column datatype. If you do not need to capture source row data, consider disabling this option to increase Integration Service performance.
Data Column Delimiter. Delimiter for string type source row data and transformation group row data. By default, the Integration Service uses a pipe ( | ) delimiter. Verify that you do not use the same delimiter for the row data as the error logging columns. If you use the same delimiter, you may find it difficult to read the error log file.
4. Click OK.
Chapter 27
Parameter Files
Overview, 688 Parameter and Variable Types, 689 Where to Use Parameters and Variables, 691 Parameter File Structure, 698 Configuring the Parameter File Name and Location, 702 Parameter File Example, 707 Guidelines for Creating Parameter Files, 708 Troubleshooting, 709 Tips, 710
Overview
A parameter file is a list of parameters and variables and their associated values. These values define properties for a service, service process, workflow, worklet, or session. The Integration Service applies these values when you run a workflow or session that uses the parameter file. Parameter files provide you with the flexibility to change parameter and variable values each time you run a session or workflow. You can include information for multiple services, service processes, workflows, worklets, and sessions in a single parameter file. You can also create multiple parameter files and use a different file each time you run a session or workflow. The Integration Service reads the parameter file at the start of the workflow or session to determine the start values for the parameters and variables defined in the file. You can create a parameter file using a text editor such as WordPad or Notepad. Consider the following information when you use parameter files:
Types of parameters and variables. You can define different types of parameters and variables in a parameter file. These include service variables, service process variables, workflow and worklet variables, session parameters, and mapping parameters and variables. For more information, see Parameter and Variable Types on page 689.
Properties you can set in parameter files. You can use parameters and variables to define many properties in the Designer and Workflow Manager. For example, you can enter a session parameter as the update override for a relational target instance, and set this parameter to the UPDATE statement in the parameter file. The Integration Service expands the parameter when the session runs. For information about the properties you can set in a parameter file, see Where to Use Parameters and Variables on page 691.
Parameter file structure. You assign a value for a parameter or variable in the parameter file by entering the parameter or variable name and value on a single line in the form name=value. Groups of parameters and variables must be preceded by a heading that identifies the service, service process, workflow, worklet, or session to which the parameters or variables apply, as shown in the sample after this list. For more information about the structure of a parameter file, see Parameter File Structure on page 698.
Parameter file location. You specify the parameter file to use for a workflow or session. You can enter the parameter file name and directory in the workflow or session properties or in the pmcmd command line. For more information, see Configuring the Parameter File Name and Location on page 702.
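For example, the following fragment shows the general shape of a parameter file. The service, folder, workflow, and session names and the values are illustrative only:

[Global]
$PMSuccessEmailUser=admin@example.com

[Service:IntSvc_Dev]
$PMSessionLogCount=5

[MyFolder.WF:wf_SalesTotals]
$$StartDate=01/01/2007

[MyFolder.WF:wf_SalesTotals.ST:s_LoadOrders]
$DBConnectionSource=Sales_DB
$PMSessionLogFile=s_LoadOrders.log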
Parameter and Variable Types
You can define the following types of parameters and variables in a parameter file:
Service variables. Define general properties for the Integration Service such as email addresses, log file counts, and error thresholds. $PMSuccessEmailUser, $PMSessionLogCount, and $PMSessionErrorThreshold are examples of service variables. The service variable values you define in the parameter file override the values that are set in the Administration Console. For more information about service variables, see the PowerCenter Administrator Guide.
Service process variables. Define the directories for Integration Service files for each Integration Service process. $PMRootDir, $PMSessionLogDir, and $PMBadFileDir are examples of service process variables. The service process variable values you define in the parameter file override the values that are set in the Administration Console. If the Integration Service uses operating system profiles, the operating system user specified in the operating system profile must have access to the directories you define for the service process variables. For more information about service process variables, see the PowerCenter Administrator Guide.
Workflow variables. Evaluate task conditions and record information in a workflow. For example, you can use a workflow variable in a Decision task to determine whether the previous task ran properly. In a workflow, $TaskName.PrevTaskStatus is a predefined workflow variable and $$VariableName is a user-defined workflow variable. For more information about workflow variables, see Working with Workflow Variables on page 143.
Worklet variables. Evaluate task conditions and record information in a worklet. You can use predefined worklet variables in a parent workflow, but you cannot use workflow variables from the parent workflow in a worklet. In a worklet, $TaskName.PrevTaskStatus is a predefined worklet variable and $$VariableName is a user-defined worklet variable. For more information about worklet variables, see Working with Worklets on page 191.
Session parameters. Define values that can change from session to session, such as database connections or file names. $PMSessionLogFile and $ParamName are user-defined session parameters. For more information about session parameters, see Working with Sessions on page 205.
Mapping parameters. Define values that remain constant throughout a session, such as state sales tax rates. When declared in a mapping or mapplet, $$ParameterName is a user-defined mapping parameter. For more information about mapping parameters, see the Designer Guide.
Mapping variables. Define values that can change during a session. The Integration Service saves the value of a mapping variable to the repository at the end of each successful session run and uses that value the next time you run the session. When declared in a mapping or mapplet, $$VariableName is a mapping variable. For more information about mapping variables, see the Designer Guide.
You cannot define the following types of variables in a parameter file:
$Source and $Target connection variables. Define the database location for a relational source, relational target, lookup table, or stored procedure. For more information about connection variables, see Configuring Session Connections on page 40.
Email variables. Define session information in an email message such as the number of rows loaded, the session completion time, and read and write statistics. For more information about email variables, see Sending Email on page 401.
Local variables. Temporarily store data in variable ports in Aggregator, Expression, and Rank transformations. For more information about local variables, see the Transformation Guide.
Built-in variables. Variables that return run-time or system information, such as Integration Service name or system date. For more information about built-in variables, see the Transformation Language Reference.
Transaction control variables. Define conditions to commit or rollback transactions during the processing of database rows. For more information about transaction control variables, see the Transformation Language Reference.
ABAP program variables. Represent SAP structures, fields in SAP structures, or values in the ABAP program. For more information about ABAP program variables, see the PowerExchange for SAP NetWeaver User and Administrator Guide.
Where to Use Parameters and Variables
You can use a parameter file to override service and service process properties defined in the Administration Console. For example, you can override the session log directory, $PMSessionLogDir. To do this, configure the workflow or session to use a parameter file and set $PMSessionLogDir to the new file path in the parameter file. You can specify parameters and variables for the following PowerCenter objects:
Sources. You can use parameters and variables in input fields related to sources. For more information, see Table 27-1 on page 692.
Targets. You can use parameters and variables in input fields related to targets. For more information, see Table 27-2 on page 692.
Transformations. You can use parameters and variables in input fields related to transformations. For more information, see Table 27-3 on page 693.
Tasks. You can use parameters and variables in input fields related to tasks in the Workflow Manager. For more information, see Table 27-4 on page 694.
Sessions. You can use parameters and variables in input fields related to Session tasks. For more information, see Table 27-5 on page 694.
Workflows. You can use parameters and variables in input fields related to workflows. For more information, see Table 27-6 on page 696.
Connections. You can use parameters and variables in input fields related to connection objects. For more information, see Table 27-7 on page 697.
Data profiling objects. You can use parameters and variables in input fields related to data profiling. For more information, see Table 27-8 on page 697.
Table 27-1 lists the input fields related to sources where you can specify parameters and variables:
Table 27-1. Source Input Fields that Accept Parameters and Variables
Relational - Source Table Name*: Workflow variables, worklet variables, session parameters, mapping parameters, and mapping variables
PeopleSoft - SetID, Effective date, Tree name, Set control value, Extract date: All
The TIB/Adapter SDK repository URL and Endpoint URL* source fields also accept parameters and variables.
* You can specify parameters and variables in this field when you override it in the session properties (Mapping tab) in the Workflow Manager. Note: For information about input fields related to Source Qualifier transformations, see Table 27-3 on page 693.
Table 27-2 lists the input fields related to targets where you can specify parameters and variables:
Table 27-2. Target Input Fields that Accept Parameters and Variables
Relational - Update override*, Pre- and post-session SQL commands*: All
Relational - Target Table Name*: Workflow variables, worklet variables, session parameters, mapping parameters, and mapping variables
Other target input fields accept service and service process variables, or mapping parameters and variables.
* You can specify parameters and variables in this field when you override it in the session properties (Mapping tab) in the Workflow Manager.
Table 27-3 lists the input fields related to transformations where you can specify parameters and variables:
Table 27-3. Transformation Input Fields that Accept Parameters and Variables
Transformations that allow you to use the Expression Editor - Transformation expressions: Mapping parameters and variables
Aggregator, Joiner, Lookup, Rank, XML Generator - Cache directory*: Service and service process variables
Custom, External Procedure, HTTP, XML Parser - Runtime location*: Service and service process variables
External Procedure - Initialization properties: All
Lookup - SQL override*, Cache file name prefix*: All
Lookup - Connection information*: Session parameters $DBConnectionName and $AppConnectionName, connection variables $Source and $Target
Source Qualifier - SQL query*, User-defined join*, Source filter condition*, Pre- and post-session SQL commands*: All
Stored Procedure - Connection information*: Session parameter $DBConnectionName, connection variables $Source and $Target
The Default work directory*, Script file name*, Call text (unconnected Stored Procedure)*, and Endpoint URL* fields also accept parameters and variables, including mapping parameters and variables.
* You can specify parameters and variables in this field when you override it in the session properties (Mapping tab) in the Workflow Manager.
Table 27-4 lists the input fields related to Workflow Manager tasks where you can specify parameters and variables:
Table 27-4. Task Input Fields that Accept Parameters and Variables
Assignment task - Assignment (user-defined variables and expression): Workflow and worklet variables
Command task - Command (name and command): Service, service process, workflow, and worklet variables
Command task - Pre- and post-session shell commands: All
Decision task - Decision name (condition to be evaluated): Workflow and worklet variables
Email task - Email user name, subject, and text: Service, service process, workflow, and worklet variables
Event-Wait task - File watch name (predefined events): Service, service process, workflow, and worklet variables
Link - Link condition: Service, service process, workflow, and worklet variables
Session - See Table 27-5.
Timer task - Absolute time (workflow date-time variable to calculate the wait): Workflow and worklet variables
Table 27-5 lists the input fields related to sessions where you can specify parameters and variables:
Table 27-5. Session Input Fields that Accept Parameters and Variables
Tab | Field | Valid Parameter and Variable Types
Properties tab | Session log file name | Built-in session parameter $PMSessionLogFile
Properties tab | Session log file directory | Service and service process variables
Properties tab | Parameter file name | Workflow and worklet variables
Properties tab | $Source and $Target connection values | Session parameters $DBConnectionName and $AppConnectionName, connection variables $Source and $Target
Properties tab | Pushdown optimization session property | Mapping parameter $$PushdownConfig
Config Object tab | Session log count | Service variable $PMSessionLogCount
Config Object tab | Session error threshold | Service variable $PMSessionErrorThreshold
Config Object tab | Table name prefix for relational error logs | All
Config Object tab | Error log file name and directory | Service variables, service process variables, workflow variables, worklet variables, session parameters
Config Object tab | Number of partitions for dynamic partitioning | Built-in session parameter $DynamicPartitionCount
Mapping tab | Transformation properties that override properties you configure in a mapping | Varies according to property. For more information, see Table 27-2 on page 692 and Table 27-3 on page 693.
Mapping tab | Relational connection values | Session parameter $DBConnectionName, connection variables $Source and $Target
Mapping tab | Queue connection values* | Session parameter $QueueConnectionName
Mapping tab | FTP connection values* | Session parameter $FTPConnectionName
Mapping tab | Application connection values* | Session parameter $AppConnectionName
Mapping tab | External loader connection values* | Session parameter $LoaderConnectionName
Mapping tab | FTP remote file name | All
Mapping tab | Lookup source file name and directory | Service variables, service process variables, workflow variables, worklet variables, session parameters
Mapping tab | Pre- and post-session SQL commands (source and target) | All
Mapping tab | Code page for file sources and targets | Workflow variables, worklet variables, session parameter $ParamName
Mapping tab | Source input file name and directory | Service variables, service process variables, workflow variables, worklet variables, session parameters
Mapping tab | Table owner name for relational sources | All
Mapping tab | Target merge file name and directory | Service variables, service process variables, workflow variables, worklet variables, session parameters
Mapping tab | Target reject file name and directory | Service variables, service process variables, workflow variables, worklet variables, session parameters
Mapping tab | Target table name prefix | All
Mapping tab | Teradata FastExport temporary file | Service and service process variables
Mapping tab | Control file content override for Teradata external loaders | All
Mapping tab | Recovery cache directory for WebSphere MQ, JMS, SAP ALE IDoc, TIBCO, webMethods, and Web Service Provider sources | Service and service process variables
Mapping tab | MQ Source Qualifier filter condition | All
Mapping tab | SAP stage file name and directory | Service and service process variables
Mapping tab | SAP source file directory | Service and service process variables
Components tab | Post-session email (user name, subject, and text) | All
Components tab | Post-session email attachment file name | All
* You can override connection attributes for this connection type in the parameter file. For more information, see Overriding Connection Attributes in the Parameter File on page 46.
Table 27-6 lists the input fields related to workflows where you can specify parameters and variables:
Table 27-6. Workflow Input Fields that Accept Parameters and Variables
Tab | Field | Valid Parameter and Variable Types
Properties tab | Workflow log file name and directory | Service, service process, workflow, and worklet variables
Properties tab | Workflow log count | Service variable $PMWorkflowLogCount
General tab | Suspension email (user name, subject, and text) | Service, service process, workflow, and worklet variables
Table 27-7 lists the input fields related to connection objects in the Workflow Manager where you can specify parameters and variables:
Table 27-7. Connection Object Input Fields that Accept Parameters and Variables
Connection Type | Field | Valid Parameter and Variable Types
Relational | Database user name, password* | Session parameter $ParamName
Relational: Source, Target, Lookup, Stored Procedure | Connection and transaction environment SQL | All
FTP | User name, password* for host machine | Session parameter $ParamName
FTP | Default remote directory | All
Application | Application user name, password* | Session parameter $ParamName
Application: Web Services Consumer | Endpoint URL | Session parameter $ParamName, mapping parameters and variables
Loader | Database user name, password* | Session parameter $ParamName
* Encrypt the password in the parameter file using the pmpasswd command line program with the CRYPT_DATA encryption type. For more information about pmpasswd, see the Command Line Reference.
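For example, you might encrypt a password and assign it to a hypothetical session parameter $ParamTgtPassword, which you then enter in the password field of the connection object. The password and encrypted output shown here are placeholders; see the Command Line Reference for the exact pmpasswd options:
pmpasswd MySecretPwd -e CRYPT_DATA
In the parameter file, assign the encrypted value to the session parameter:
[Production.s_MonthlyCalculations]
$ParamTgtPassword=fQ3a7kP92Lw=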
Table 27-8 lists the input fields related to data profiling where you can specify parameters and variables:
Table 27-8. Data Profiling Input Fields that Accept Parameters and Variables
Object | Field | Valid Parameter and Variable Types
Data Profile domain | Data profiling domain value | Service and service process variables
The Integration Service interprets all characters between the beginning of the line and the first equals sign as the parameter name and all characters between the first equals sign and the end of the line as the parameter value. Therefore, if you enter a space between the parameter name and the equals sign, the Integration Service interprets the space as part of the parameter name. If a line contains multiple equals signs, the Integration Service interprets all equals signs after the first one as part of the parameter value.
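For example, the following hypothetical entries illustrate these rules:
$$SourceSystem =CRM
$ParamQueryOverride=SELECT * FROM T_ITEMS WHERE PRICE=100
In the first entry, the Integration Service interprets the parameter name as $$SourceSystem followed by a space, so it does not match a parameter named $$SourceSystem. In the second entry, the value is SELECT * FROM T_ITEMS WHERE PRICE=100 because the second equals sign is part of the parameter value.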
Create each heading only once in the parameter file. If you specify the same heading more than once in a parameter file, the Integration Service uses the information in the section below the first heading and ignores the information in the sections below subsequent identical headings. For example, a parameter file contains the following identical headings:
[HET_TGTS.WF:wf_TCOMMIT1] $$platform=windows ... [HET_TGTS.WF:wf_TCOMMIT1] $$platform=unix $DBConnection_ora=Ora2
In workflow wf_TCOMMIT1, the value for mapping parameter $$platform is windows, not unix, and session parameter $DBConnection_ora is not defined. If you define the same parameter or variable in multiple sections in the parameter file, the parameter or variable with the smallest scope takes precedence over parameters or variables with larger scope. For example, a parameter file contains the following sections:
[HET_TGTS.WF:wf_TGTS_ASC_ORDR] $DBConnection_ora=Ora2 [HET_TGTS.WF:wf_TGTS_ASC_ORDR.ST:s_TGTS_ASC_ORDR] $DBConnection_ora=Ora3
In session s_TGTS_ASC_ORDR, the value for session parameter $DBConnection_ora is Ora3. In all other sessions in the workflow, it is Ora2.
Comments
You can include comments in parameter files. The Integration Service ignores lines that are not valid headings and do not contain an equals sign character (=). The following lines are examples of parameter file comments:
---------------------------------------
Created 10/11/06 by JSmith.
*** Update the parameters below this line when you run this workflow on Integration Service Int_01. ***
; This is a valid comment because this line contains no equals sign.
Null Values
You can assign null values to parameters and variables in the parameter file. When you assign null values to parameters and variables, the Integration Service obtains the value from the following places, depending on the parameter or variable type:
- Service and service process variables. The Integration Service uses the value set in the Administration Console.
- Workflow and worklet variables. The Integration Service uses the value saved in the repository (if the variable is persistent), the user-specified default value, or the datatype default value. For more information about workflow variable start values, see User-Defined Workflow Variables on page 152.
- Session parameters. Session parameters do not have default values. If the Integration Service cannot find a value for a session parameter, it may fail the session, take an empty string as the default value, or fail to expand the parameter at run time. For example, the Integration Service fails a session where the session parameter $DBConnectionName is not defined.
- Mapping parameters and variables. The Integration Service uses the value saved in the repository (mapping variables only), the configured initial value, or the datatype default value. For more information about the initial and default values of mapping parameters and variables, see Mapping Parameters and Variables in the Designer Guide.
To assign a null value, set the parameter or variable value to <null> or leave the value blank. For example, the following lines assign null values to service process variables $PMBadFileDir and $PMCacheDir:
$PMBadFileDir=<null> $PMCacheDir=
---------------------------------------
[Service:IntSvs_01]
$PMSuccessEmailUser=pcadmin@mail.com
$PMFailureEmailUser=pcadmin@mail.com
[HET_TGTS.WF:wf_TCOMMIT_INST_ALIAS]
$$platform=unix
[HET_TGTS.WF:wf_TGTS_ASC_ORDR.ST:s_TGTS_ASC_ORDR]
$$platform=unix
$DBConnection_ora=Ora2
$ParamAscOrderOverride=UPDATE T_SALES SET CUST_NAME = :TU.CUST_NAME, DATE_SHIPPED = :TU.DATE_SHIPPED, TOTAL_SALES = :TU.TOTAL_SALES WHERE CUST_ID = :TU.CUST_ID
[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1]
$$DT_WL_lvl_1=02/01/2005 01:05:11
$$Double_WL_lvl_1=2.2
[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1.WT:NWL_PARAM_Lvl_2]
$$DT_WL_lvl_2=03/01/2005 01:01:01
$$Int_WL_lvl_2=3
$$String_WL_lvl_2=ccccc
To enter a parameter file in the workflow properties:
1. Open a workflow in the Workflow Manager.
2. Click Workflows > Edit. The Edit Workflow dialog box appears.
3. Click the Properties tab.
4. Enter the parameter file location and name in the Parameter Filename field. You can enter either a direct path or a service process variable. Use the appropriate delimiter for the Integration Service operating system. If you configured the PowerCenter environment for high availability, include the service process variable in the path.
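For example, on UNIX you might enter a path such as the following, where the directory and file names are hypothetical:
$PMRootDir/ParmFiles/wf_Orders.param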
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager. The Edit Tasks dialog box appears.
2. Click the Properties tab, and open the General Options settings.
3. Enter the parameter file location and name in the Parameter Filename field. You can enter either a direct path or a service process variable. Use the appropriate delimiter for the Integration Service operating system. If you configured the PowerCenter environment for high availability, include the service process variable in the path. You can also enter a user-defined workflow or worklet variable. Enter a workflow or worklet variable to define the session parameter file name in the workflow parameter file. For more information, see Using Variables to Specify Session Parameter Files on page 704.
4. Click OK.
Using Variables to Specify Session Parameter Files
For example, a workflow contains three sessions and runs with two concurrent run instances. For the first and second workflow instances, you want the sessions to use the following session parameter files:
Session | Session Parameter File Name (First workflow run instance) | Session Parameter File Name (Second workflow run instance)
s_1 | s_1Inst1.txt | s_1Inst2.txt
s_2 | s_2Inst1.txt | s_2Inst2.txt
s_3 | s_3Inst1.txt | s_3Inst2.txt
Create workflow variables to store the session parameter file names. For example, you create user-defined workflow variables $$s_1ParamFileName, $$s_2ParamFileName, and $$s_3ParamFileName. In the session properties for each session, set the parameter filename to a workflow variable:
Session | Session Parameter File Name in Session Properties
s_1 | $$s_1ParamFileName
s_2 | $$s_2ParamFileName
s_3 | $$s_3ParamFileName
In the workflow parameter file for each workflow instance, set each workflow variable to the correct session parameter file name, and set $PMMergeSessParamFile=TRUE. If you use a variable as the session parameter file name, and you define the same parameter or variable in both the session and workflow parameter files, the Integration Service sets parameter and variable values according to the following rules:
- When a parameter or variable is defined in the same section of the workflow and session parameter files, the Integration Service uses the value in the workflow parameter file.
- When a parameter or variable is defined in both the session section of the session parameter file and the workflow section of the workflow parameter file, the Integration Service uses the value in the session parameter file.
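For example, the workflow parameter file for the first run instance might contain the following entries. The folder name ORDERS and workflow name wf_Sales are hypothetical; the variable names and session parameter file names come from the tables above:
[ORDERS.WF:wf_Sales]
$PMMergeSessParamFile=TRUE
$$s_1ParamFileName=s_1Inst1.txt
$$s_2ParamFileName=s_2Inst1.txt
$$s_3ParamFileName=s_3Inst1.txt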
For more information about running concurrent workflows, see Running Concurrent Workflows on page 553.
The following command starts workflowA using the parameter file, myfile.txt:
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt' workflowA
The following command starts taskA using the parameter file, myfile.txt:
pmcmd starttask -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt' taskA
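If the Integration Service runs on Windows, use Windows path syntax. For example, the following command starts the same workflow with a hypothetical Windows path to the parameter file:
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile 'C:\ParmFiles\myfile.txt' workflowA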
For more information about the pmcmd startworkflow and starttask commands, see the PowerCenter Command Line Reference.
The parameter file for the session includes the folder and session name and each parameter and variable:
[Production.s_MonthlyCalculations]
$PMFailureEmailUser=pcadmin@mail.com
$$State=MA
$$Time=10/1/2005 05:04:11
$InputFile1=sales.txt
$DBConnection_target=sales
$PMSessionLogFile=D:/session logs/firstrun.txt
The next time you run the session, you might edit the parameter file to change the state to MD and delete the $$Time variable. This allows the Integration Service to use the value for the variable that the previous session stored in the repository.
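For example, the edited parameter file might contain the following entries:
[Production.s_MonthlyCalculations]
$PMFailureEmailUser=pcadmin@mail.com
$$State=MD
$InputFile1=sales.txt
$DBConnection_target=sales
$PMSessionLogFile=D:/session logs/firstrun.txt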
- List all session parameters. Session parameters do not have default values. If the Integration Service cannot find a value for a session parameter, it may fail the session, take an empty string as the default value, or fail to expand the parameter at run time. Session parameter names are not case sensitive.
- List all necessary mapping parameters and variables. Mapping parameter and variable values become start values for parameters and variables in a mapping. Mapping parameter and variable names are not case sensitive.
- Enter folder names for non-unique session names. When a session name exists more than once in a repository, enter the folder name to indicate the location of the session.
- Precede parameters and variables in mapplets with the mapplet name. Use the following format (see the example after this list):
mapplet_name.parameter_name=value
mapplet2_name.variable_name=value
- Use multiple parameter files. You assign parameter files to workflows, worklets, and sessions individually. You can specify the same parameter file for all of these tasks or create multiple parameter files.
- When defining parameter values, do not use unnecessary line breaks or spaces. The Integration Service interprets additional spaces as part of a parameter name or value.
- Use correct date formats for datetime values. Use the following date formats for datetime values:
MM/DD/RR
MM/DD/RR HH24:MI:SS
MM/DD/RR HH24:MI:SS.NS
MM/DD/YYYY
MM/DD/YYYY HH24:MI:SS
MM/DD/YYYY HH24:MI:SS.NS
MM/DD/YYYY HH24:MI:SS.MS
MM/DD/YYYY HH24:MI:SS.US
- Do not enclose parameter or variable values in quotes. The Integration Service interprets everything after the first equals sign as part of the value.
- Use a parameter or variable value of the proper length for the error log table name prefix. If you use a parameter or variable for the error log table name prefix, do not specify a prefix that exceeds 19 characters when naming Oracle, Sybase, or Teradata error log tables. The error table names can have up to 11 characters, and Oracle, Sybase, and Teradata databases have a maximum length of 30 characters for table names. For more information, see Understanding the Error Log Tables on page 676. The parameter or variable name can exceed 19 characters.
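For example, the following hypothetical entries set a mapping parameter in mapplet mpltSort and a mapping variable in mapplet mpltAgg; the folder, session, mapplet, parameter, and variable names are placeholders:
[Sales.s_Orders]
mpltSort.$$MaxItems=100
mpltAgg.$$LastRunDate=10/01/2005 00:00:00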
Troubleshooting
I have a section in a parameter file for a session, but the Integration Service does not seem to read it.

Make sure to enter folder and session names as they appear in the Workflow Manager. Also, use the appropriate prefix for all user-defined session parameters. For more information about the parameter names, see Working with Session Parameters on page 236.

I am trying to use a source file parameter to specify a source file and location, but the Integration Service cannot find the source file.

Make sure to clear the source file directory in the session properties. The Integration Service concatenates the source file directory with the source file name to locate the source file. Also, make sure to enter a directory local to the Integration Service and to use the appropriate delimiter for the operating system.

I am trying to run a workflow with a parameter file and one of the sessions keeps failing.

The session might contain a parameter that is not listed in the parameter file. The Integration Service uses the parameter file to start all sessions in the workflow. Check the session properties, and then verify that all session parameters are defined correctly in the parameter file.

I ran a workflow or session that uses a parameter file, and it failed. What parameter and variable values does the Integration Service use during the recovery run?

For service variables, service process variables, session parameters, and mapping parameters, the Integration Service uses the values specified in the parameter file, if they exist. If values are not specified in the parameter file, the Integration Service uses the value stored in the recovery storage file. For workflow, worklet, and mapping variables, the Integration Service always uses the value stored in the recovery storage file. For more information about recovering workflows and sessions, see Recovering Workflows on page 375.
Tips
- Use a single parameter file to group parameter information for related sessions. When sessions are likely to use the same database connection or directory, you might want to include them in the same parameter file. When connections or directories change, you can update information for all sessions by editing one parameter file.
- Use pmcmd and multiple parameter files for sessions with regular cycles. Sometimes you reuse session parameters in a cycle. For example, you might run a session against a sales database every day, but run the same session against sales and marketing databases once a week. You can create separate parameter files for each session run. Instead of changing the parameter file in the session properties each time you run the weekly session, use pmcmd to specify the parameter file to use when you start the session.
- Use reject file and session log parameters in conjunction with target file or target database connection parameters. When you use a target file or target database connection parameter with a session, you can keep track of reject files by using a reject file parameter. You can also use the session log parameter to write the session log to the target machine.
- Use a resource to verify that the session runs on a node that has access to the parameter file. In the Administration Console, you can define a file resource for each node that has access to the parameter file and configure the Integration Service to check resources. Then, edit the session that uses the parameter file and assign the resource. When you run the workflow, the Integration Service runs the session with the required resource on a node that has the resource available. For more information about configuring the Integration Service to check resources, see the PowerCenter Administrator Guide.
- You can override initial values of workflow variables for a session by defining them in a session section. If a workflow contains an Assignment task that changes the value of a workflow variable, the next session in the workflow uses the latest value of the variable as the initial value for the session. To override the initial value for the session, define a new value for the variable in the session section of the parameter file.
- You can define parameters and variables using other parameters and variables. For example, in the parameter file, you can define session parameter $PMSessionLogFile using a service process variable as follows:
$PMSessionLogFile=$PMSessionLogDir/TestRun.txt
Chapter 28
External Loading
Overview, 712
External Loader Behavior, 713
Loading to IBM DB2, 715
Loading to Oracle, 721
Loading to Sybase IQ, 723
Loading to Teradata, 726
Configuring External Loading in a Session, 741
Troubleshooting, 747
Overview
You can configure a session to use IBM DB2, Oracle, Sybase IQ, and Teradata external loaders to load session target files into their respective databases. External loaders can increase session performance by loading information directly from a file or pipe rather than running the SQL commands to insert the same data into the database. You can use multiple external loaders within one session. For example, if a mapping contains two targets, you can create a session that uses an Oracle external loader connection and a Sybase IQ external loader connection. For information about creating external loader connections, see External Loader Connections on page 64.
Before you run a session that uses an external loader, complete the following tasks:
- Disable constraints. You disable constraints built into the tables receiving the data before performing the load. For information about disabling constraints, see the database documentation.
- Turn off or disable database logging. To preserve high performance, you can increase commit intervals and turn off database logging. However, to perform database recovery on failed sessions, you must have database logging turned on.
- Configure code pages. IBM DB2, Oracle, Sybase IQ, and Teradata database servers must use the same code page as the target flat file code page. The Integration Service creates the control files and target flat files using the target flat file code page. If you use a code page other than 7-bit ASCII for the target flat file, run the Integration Service in Unicode data movement mode.
- Configure the external loader connection as a resource. If the Integration Service is configured to run on a grid, configure the external loader connection as a resource on the node where the external loader is available. For more information, see the PowerCenter Administrator Guide.
External Loader Behavior
If the session is configured to trim subseconds, the Integration Service processes datetime data with a precision of 19. If the session is not configured to trim subseconds, the Integration Service processes datetime data based on the precision specified in the target flat file. Precision ranges from 19 to 29. Subseconds are trimmed according to the precision specified. If the precision specified in the target file is greater than that specified for the database, the Integration Service limits the precision to the maximum precision specified for the database.
The Integration Service waits for all external loading to complete before it performs postsession commands, runs external procedures, and sends post-session email. The Integration Service writes external loader initialization and completion messages in the session log. For more information about the external loader performance, check the external loader log. The loader saves the log in the same directory as the target flat files. The default extension for external loader logs is .ldrlog. The behavior of the external loader depends on how you choose to load the data. You can load data to a named pipe or to a flat file.
The pipe name is the same as the configured target file name.
Note: If a session aborts or fails, the data in the named pipe may contain inconsistencies.
Partitioning Sessions with External Loaders
Use any partition type for the target when you use the following loaders:
- Oracle, with parallel load enabled
- Teradata TPump
- Teradata Warehouse Builder
If you use a loader that cannot load from multiple files, use round-robin partitioning to route the data to a single target file. If you choose an external loader connection for each partition, the Integration Service uses the loader connection for the first partition. The Integration Service creates a single output file, and the external loader loads the output from the target file to the database. If you choose any other partition type for the target, the Integration Service fails the session. Use the round-robin partition type for the target when you use the following loaders:
- IBM DB2 EE
- IBM DB2 EEE Autoloader
- Oracle, with parallel load disabled
- Sybase IQ
- Teradata MultiLoad
- Teradata FastLoad
Loading to IBM DB2
When you load to IBM DB2, use one of the following external loaders:
- IBM DB2 EE external loader. Performs insert and replace operations on targets. The external loader can also restart or terminate load operations. The IBM DB2 EE external loader invokes the db2load executable located in the Integration Service installation directory. The IBM DB2 EE external loader can load data to an IBM DB2 server on a machine that is remote to the Integration Service.
- IBM DB2 EEE Autoloader. Performs insert and replace operations on targets. The external loader can also restart or terminate load operations. The IBM DB2 EEE external loader invokes the IBM DB2 Autoloader program to load data. The Autoloader program uses the db2atld executable. The IBM DB2 EEE external loader can partition data and load the partitioned data simultaneously to the corresponding database partitions. The IBM DB2 EEE loader requires that the IBM DB2 server be on the same machine hosting the Integration Service.
Note: If the IBM DB2 EEE server is on a machine that is remote to the Integration Service,
connect to the IBM DB2 EEE database using a relational database connection. Use database partitioning for the IBM DB2 target. When you use database partitioning, the Integration Service queries the IBM DB2 system for table partition information and loads partitioned data to the corresponding nodes in the target database. For more information about database partitioning, see Database Partitioning Partition Type on page 486.
- The IBM DB2 external loaders load from a delimited flat file. Verify that the target table columns are wide enough to store all of the data.
- For a connection that uses IBM DB2 client authentication, enter the PmNullUser user name and PmNullPasswd when you create the external loader connection. PowerCenter uses IBM DB2 client authentication when the connection user name is PmNullUser and the connection is to an IBM DB2 database.
- For a session with multiple partitions, use the round-robin partition type to route data to a single target file. For more information about partitioning sessions with external loaders, see Partitioning Sessions with External Loaders on page 714.
- If you configure multiple targets in the same pipeline to use IBM DB2 external loaders, each loader must load to a different tablespace on the target database. For more information about selecting external loaders, see Configuring External Loading in a Session on page 741.
- You must have the correct authority levels and privileges to load data to the database tables. For more information, see Configuring Authorities, Privileges, and Permissions on page 716.
Setting Operation Modes
The IBM DB2 EE and EEE external loaders can run in one of the following operation modes:
- Insert. Adds loaded data to the table without changing existing table data.
- Replace. Deletes all existing data from the table, and inserts the loaded data. The table and index definitions do not change.
- Restart. Restarts a previously interrupted load operation.
- Terminate. Terminates a previously interrupted load operation and rolls back the operation to the starting point, even if consistency points were passed. The tablespaces return to normal state, and the external loader makes all table objects consistent.
Configuring Authorities, Privileges, and Permissions
To load data to the database tables, you must have one of the following authorities:
- SYSADM authority
- DBADM authority
- LOAD authority on the database and one of the following privileges:
  - INSERT privilege on the table when the load utility is invoked in insert, terminate, or restart mode.
  - INSERT and DELETE privilege on the table when the load utility is invoked in replace, terminate, or restart mode.
In addition, you must have proper read access and read/write permissions:
- The database instance owner must have read access to the external loader input files.
- If you run IBM DB2 as a service on Windows, you must configure the service start account with a user account that has read/write permissions to use LAN resources, including drives, directories, and files.
- If you load to IBM DB2 EEE, the database instance owner must have write access to the load dump file and the load temporary file.
IBM DB2 EE external loader connections use the following attribute values:
Attribute | Default Value
External Loader Executable | db2load
DB2 Server Location | Remote
Is Staged | Disabled
Recoverable | Enabled
Any other return code indicates that the load operation failed. The Integration Service writes the following error message to the session log:
WRT_8047 Error: External loader process <external loader name> exited with error <return code>.
Table 28-2 describes the return codes for the IBM DB2 EE external loader:
Table 28-2. IBM DB2 EE External Loader Return Codes
Code | Description
0 | External loader operation completed successfully.
1 | External loader cannot locate the control file.
2 | External loader could not open the external loader log file.
3 | External loader could not access the control file because the control file is locked by another process.
4 | IBM DB2 database returned an error.
The IBM DB2 EEE external loader supports the following loading modes:
- Split and load. Partitions the data and loads it simultaneously using the corresponding database partitions.
- Split only. Partitions the data and writes the output to files in the specified split file directory.
- Load only. Does not partition the data. It loads data in existing split files using the corresponding database partitions.
- Analyze. Generates an optimal partitioning map with even distribution across all database partitions. If you run the external loader in split and load mode after you run it in analyze mode, the external loader uses the optimal partitioning map to partition the data.
For more information about IBM DB2 loading modes, see the IBM DB2 database documentation. The IBM DB2 EEE external loader creates multiple logs based on the number of database partitions it loads to. For each partition, the external loader appends a number corresponding to the partition number to the external loader log file name. The IBM DB2 EEE external loader log file format is file_name.ldrlog.partition_number. The Integration Service does not archive or overwrite IBM DB2 EEE external loader logs. If an external loader log of the same name exists when the external loader runs, the external loader appends new external loader log messages to the end of the existing external loader log file. You must manually archive or delete the external loader log files. For more information about log files generated by DB2 Autoload, see the IBM DB2 documentation. For information about IBM DB2 EEE external loader return codes, see the IBM DB2 documentation.
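For example, given the log file format described above, and a hypothetical target file named sales_tgt.out, the external loader writes the log for database partition 4 to the following file:
sales_tgt.out.ldrlog.4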
Table 28-3 describes attributes for IBM DB2 EEE external loader connections:
Table 28-3. IBM DB2 EEE External Loader Attributes
Attribute | Default Value | Description
Opmode | Insert | IBM DB2 external loader operation mode. Select one of the following operation modes: Insert, Replace, Restart, or Terminate. For more information about IBM DB2 operation modes, see Setting Operation Modes on page 716.
External Loader Executable | db2atld | Name of the IBM DB2 EEE external loader executable file.
Split File Location | n/a | Location of the split files. The external loader creates split files if you configure SPLIT_ONLY loading mode.
Output Nodes | n/a | Database partitions on which the load operation is to be performed.
Split Nodes | n/a | Database partitions that determine how to split the data. If you do not specify this attribute, the external loader determines an optimal splitting method.
Mode | Split and load | Loading mode the external loader uses to load the data. Select one of the following loading modes: Split and load, Split only, Load only, or Analyze.
Max Num Splitters | 25 | Maximum number of splitter processes.
Force | No | Forces the external loader operation to continue even if it determines at startup time that some target partitions or tablespaces are offline.
Status Interval | 100 | Number of megabytes of data the external loader loads before writing a progress message to the external loader log. Specify a value between 1 and 4,000 MB.
Ports | n/a | Range of TCP ports the external loader uses to create sockets for internal communications with the IBM DB2 server.
Check Level | n/a | Checks for record truncation during input or output.
Map File Input | n/a | Name of the file that specifies the partitioning map. To use a customized partitioning map, specify this attribute. Generate a customized partitioning map when you run the external loader in Analyze loading mode.
Map File Output | n/a | Name of the partitioning map when you run the external loader in Analyze loading mode. You must specify this attribute if you want to run the external loader in Analyze loading mode.
Trace | n/a | Number of rows the external loader traces when you need to review a dump of the data conversion process and output of hashing values.
Date Format | mm/dd/yyyy | Date format.
Loading to Oracle
When you load to Oracle targets, use the Oracle SQL*Loader to perform insert, update, and delete operations on targets. The Oracle external loader creates a reject file for data rejected by the database. The reject file has an extension of .ldrreject. The loader saves the reject file in the target files directory.
- If you select an Oracle external loader, the default external loader executable name is sqlload. This is accurate for most UNIX platforms. If you use Windows, check the Oracle documentation to find the name of the external loader executable.
- For a connection that uses Oracle OS Authentication, enter the PmNullUser user name and PmNullPasswd when you create the external loader connection. PowerCenter uses Oracle OS Authentication when the connection user name is PmNullUser and the connection is to an Oracle database.
- The target flat file for an Oracle external loader can be fixed-width or delimited.
- For optimal performance when writing to a partitioned target, select Direct Path. For more information, see the Oracle documentation.
- If the target has bigint data and you override the connection to use an Oracle connection, the Integration Service converts the data to the Oracle Number datatype.
- If you configure a session to write subsecond data up to nanoseconds to an Oracle 10 target, the Integration Service writes subsecond data up to microseconds. To ensure nanosecond precision, edit the control file to process subseconds.
- For optimal performance, use the following guidelines to determine settings for partitioned and non-partitioned targets:
Target | Load Method | Parallel Load | Load Mode
Partitioned | Direct Path | enable | Append
Partitioned | Conventional Path | enable | n/a
Non-partitioned | n/a | disable* | n/a
* If you disable parallel load, you must choose round-robin partitioning to route data to a single target file.
When you configure column precision for the target flat file, multiply the precision by K, where K is the maximum number of bytes a character contains in the selected target code page. This ensures that the Integration Service does not truncate data before loading the target file.
Loading to Sybase IQ
When you load to Sybase IQ, use the Sybase IQ external loader to perform insert operations. The Integration Service can load multibyte data to Sybase IQ targets. The Integration Service can write to a flat file when the Sybase IQ server is on the same machine as the Integration Service or on a different machine. The Integration Service can write to a named pipe only if it is local to the Sybase IQ database server.
- Ensure that target tables do not violate primary key constraints.
- Configure a Sybase IQ user with read/write access before you use a Sybase IQ external loader.
- Target flat files for a Sybase IQ external loader can be fixed-width or delimited.
- The Sybase IQ external loader cannot perform update or delete operations on targets.
- For a session with multiple partitions, use the round-robin partition type to route data to a single target file. For more information about partitioning sessions with external loaders, see Partitioning Sessions with External Loaders on page 714.
- If the Integration Service and Sybase IQ server are on different machines, map or mount a drive from the machine hosting the Integration Service to the machine hosting the Sybase IQ server.
The session might fail if you use quotes in the connect string.
Table 28-5 describes the attributes for Sybase IQ external loader connections:
Table 28-5. Sybase IQ External Loader Attributes
Attribute | Default Value | Description
Block Factor | 10000 | Number of records per block in the target Sybase table. The external loader applies the Block Factor attribute to load operations for fixed-width flat file targets only.
Block Size | 50000 | Size of blocks used in Sybase database operations. The external loader applies the Block Size attribute to load operations for delimited flat file targets only.
Checkpoint | Enabled | If enabled, the Sybase IQ database issues a checkpoint after successfully loading the table. If disabled, the database issues no checkpoints.
Notify Interval | 1000 | Number of rows the Sybase IQ external loader loads before it writes a status message to the external loader log.
Server Datafile Directory | n/a | Location of the flat file target. You must specify this attribute relative to the database server installation directory. If the directory is in a Windows system, use a backslash (\) in the directory path: D:\mydirectory\inputfile.out. If the directory is in a UNIX system, use a forward slash (/) in the directory path: /mydirectory/inputfile.out. Enter the target file directory path using the syntax for the machine hosting the database server installation. For example, if the Integration Service is on a Windows machine and the Sybase IQ server is on a UNIX machine, use UNIX syntax.
Is Staged | Enabled | -
Loading to Teradata
When you load to Teradata targets, use one of the following external loaders:
- MultiLoad. Performs insert, update, delete, and upsert operations for large volume incremental loads. Use this loader when you run a session with a single partition. MultiLoad acquires table-level locks, making it appropriate for offline loading. For more information about configuring the MultiLoad external loader connection object, see Configuring Teradata MultiLoad External Loader Attributes on page 730.
- TPump. Performs insert, update, delete, and upsert operations for relatively low volume updates. Use this loader when you run a session with multiple partitions. TPump acquires row-hash locks on the table, allowing other users to access the table as TPump loads to it. For more information about configuring the TPump external loader connection object, see Configuring Teradata TPump External Loader Attributes on page 732.
- FastLoad. Performs insert operations for high volume initial loads, or for high volume truncate and reload operations. Use this loader when you run a session with a single partition. Use this loader on empty tables with no secondary indexes. For more information about configuring the FastLoad external loader connection object, see Configuring Teradata FastLoad External Loader Attributes on page 735.
- Warehouse Builder. Performs insert, update, upsert, and delete operations on targets. Use this loader when you run a session with multiple partitions. You can achieve the functionality of the other loaders based on the operator you use. For more information about configuring the Warehouse Builder external loader connection object, see Configuring Teradata Warehouse Builder Attributes on page 737.
If you use a Teradata external loader to perform update or upsert operations, use the Target Update Override option in the Mapping Designer to override the UPDATE statement in the external loader control file. For upsert, the INSERT statement in the external loader control file remains unchanged. For more information about using the Target Update Override option, see Mappings in the PowerCenter Designer Guide.
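For example, a target update override for a hypothetical T_SALES target might look like the following statement, where the :TU prefix references target ports:
UPDATE T_SALES SET TOTAL_SALES = :TU.TOTAL_SALES WHERE CUST_ID = :TU.CUST_ID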
- The Integration Service can use Teradata external loaders to load fixed-width and delimited flat files to a Teradata database. Because all Teradata loaders delimit individual records using the line-feed (\n) character, you cannot use the line-feed character as a delimiter for Teradata loaders.
- If a session contains one partition, the target output file name, including the file extension, must not exceed 27 characters. If the session contains multiple partitions, the target output file name, including the file extension, must not exceed 25 characters.
- You cannot use spaces as null characters.
- Use the Teradata external loaders to load multibyte data. You cannot use the Teradata external loaders to load binary data.
- When you load to Teradata using named pipes, set the checkpoint value to 0 to prevent external loaders from performing checkpoint operations.
- You can specify error, log, or work table names, depending on the loader you use. You can also specify error, log, or work database names.
- You can override the control file in the session properties.
- When the Integration Service processes bigint data from a Teradata v6.2 source, it inserts bigint values as 0 in the Teradata target. Use the ODBC Driver for Teradata 03.06.00.02 to process bigint values in a Teradata v6.2 source and target.
- When you use Teradata, you can enter PmNullPasswd as the database password to prevent the password from appearing in the control file. Instead, the Integration Service writes an empty string for the password in the control file.
For more information about Teradata external loaders, see the Teradata documentation.
Overriding the Control File
When you override the control file, the Workflow Manager does not validate the control file syntax. Teradata verifies the control file syntax when you run a session. If the control file is invalid, the session fails. You can view the edited control file by opening the Control File Editor.
Figure 28-1 shows the Control File Editor where you override the Teradata control file:
Figure 28-1. Control File Editor Dialog Box for Teradata
To override the control file:
1. In the Workflow Manager, open the session properties.
2. Click the Mapping tab and open the Transformations view.
3. Click the Targets node.
4. In the Connections settings, in the Value field, click Change. The Connection Object Definition dialog box appears.
5. Open the Control File Content Override attribute. The Control File Editor dialog box appears.
6. Click Generate. The Workflow Manager generates the control file based on the session and loader properties.
7. Edit the generated control file and click OK to save the changes.
When you create the user variable, use the following syntax:
<User_Variable_Name>=<Substitution_Value>
If you include spaces in the user variable name or the substitution value, the session may fail. When you add the user variable to the control file, use the following syntax:
:CF.<User_Variable_Name>
Example
After the Integration Service loads data to the target, you want to write the system date to an output file. In the connection object, you configure the following user variable:
OutputFileName=output_file.txt
When you run the session, the Integration Service replaces :CF.OutputFileName with output_file.txt in the control file.
- You can perform insert, update, delete, and upsert operations on targets. You can also use data driven mode to perform insert, update, or delete operations based on an Update Strategy or Custom transformation.
- For a session with multiple partitions, use the round-robin partition type to route data to a single target file. For more information about partitioning sessions with external loaders, see Partitioning Sessions with External Loaders on page 714.
- If you invoke a greater number of sessions than the maximum number of concurrent sessions the database allows, the session may hang. You can set the minimum value for Tenacity and Sleep to ensure that sessions fail rather than hang.
To configure attributes for the Teradata MultiLoad external loader, click Connections > Loader, select the Type, and click Edit. Table 28-6 shows the attributes that you configure for the Teradata MultiLoad external loader:
Table 28-6. Teradata MultiLoad External Loader Attributes
Attribute | Default Value | Description
TDPID | n/a | Teradata database ID.
Database Name | n/a | Optional database name. If you do not specify a database name, the Integration Service uses the target table database name defined in the mapping.
Date Format | n/a | Date format. The date format in the connection object must match the date format you define in the target definition. The Integration Service supports the following date formats: DD/MM/YYYY, MM/DD/YYYY, YYYY/DD/MM, YYYY/MM/DD.
Error Limit | 0 | Total number of rejected records that MultiLoad can write to the MultiLoad error tables. Uniqueness violations do not count as rejected records. An error limit of 0 means that there is no limit on the number of rejected records.
Checkpoint | 10,000 | Interval between checkpoints. You can set the interval to the following values: 60 or more (MultiLoad performs a checkpoint operation after it processes each multiple of that number of records); 1-59 (MultiLoad performs a checkpoint operation at the specified interval, in minutes); 0 (MultiLoad does not perform any checkpoint operation during the import task).
Tenacity | 10,000 | Amount of time, in hours, MultiLoad tries to log in to the required sessions. If a login fails, MultiLoad delays for the number of minutes specified in the Sleep attribute, and then retries the login. MultiLoad keeps trying until the login succeeds or the number of hours specified in the Tenacity attribute elapses.
Drop Error Tables | Enabled | -
External Loader Executable | mload | -
Sleep | - | -
Is Staged | Disabled | -
Error Database | n/a | -
Work Table Database | n/a | -
Log Table Database | n/a | -
User Variables | n/a | -
Table 28-7 shows the attributes that you configure when you override the Teradata MultiLoad external loader connection object in the session properties:
Table 28-7. Teradata MultiLoad External Loader Attributes Defined at the Session Level
Attribute | Default Value | Description
Error Table 1 | n/a | Table name for the first error table. Use this attribute to override the default error table name. If you do not specify an error table name, the Integration Service uses ET_<target_table_name>.
Error Table 2 | n/a | Table name for the second error table. Use this attribute to override the default error table name. If you do not specify an error table name, the Integration Service uses UV_<target_table_name>.
Work Table | n/a | Work table name. Overrides the default work table name. If you do not specify a work table name, the Integration Service uses WT_<target_table_name>.
Log Table | n/a | Log table name. Overrides the default log table name. If you do not specify a log table name, the Integration Service uses ML_<target_table_name>.
Control File Content Override | n/a | Control file text. Use this attribute to override the control file the Integration Service uses when it loads to Teradata. For more information, see Overriding the Control File on page 727.
For more information about these attributes, see the Teradata documentation.
Table 28-8 shows the attributes that you configure for the Teradata TPump external loader:
Table 28-8. Teradata TPump External Loader Attributes
Attribute | Default Value
Error Limit | -
Load Mode | Upsert
Drop Error Tables | Enabled
External Loader Executable | tpump
Sleep | -
Packing Factor | 20
Statement Rate | -
Robust | Disabled
No Monitor | Enabled
Is Staged | Disabled
Error Database | n/a
Log Table Database | n/a
User Variables | n/a
Table 28-9 shows the attributes that you configure when you override the Teradata TPump external loader connection object in the session properties:
Table 28-9. Teradata TPump External Loader Attributes Defined at the Session Level
Attribute | Default Value | Description
Error Table | n/a | Error table name. Use this attribute to override the default error table name. If you do not specify an error table name, the Integration Service uses ET_<target_table_name><partition_number>.
Log Table | n/a | Log table name. Use this attribute to override the default log table name. If you do not specify a log table name, the Integration Service uses TL_<target_table_name><partition_number>.
Control File Content Override | n/a | Control file text. Use this attribute to override the control file the Integration Service uses when it loads to Teradata. For more information, see Overriding the Control File on page 727.
For more information about these attributes, see the Teradata documentation.
- Each FastLoad job loads data to one Teradata database table. If you want to load data to multiple tables using FastLoad, you must create multiple FastLoad jobs.
- For a session with multiple partitions, use the round-robin partition type to route data to a single target file. For more information about partitioning sessions with external loaders, see Partitioning Sessions with External Loaders on page 714.
- The target table must be empty with no defined secondary indexes.
- FastLoad does not load duplicate rows from the output file to the target table in the Teradata database if the target table has a primary key.
- If you load date values to the target table, you must configure the date format for the column in the target table in the format YYYY-MM-DD.
- You cannot use FastLoad to load binary data.
- You can use comma (,), tab (\t), and pipe ( | ) as delimiters.
To configure attributes for the Teradata FastLoad external loader, click Connections > Loader, select the Type, and click Edit. Table 28-10 shows the attributes that you configure for the Teradata FastLoad external loader:
Table 28-10. Teradata FastLoad External Loader Attributes
Attribute | Default Value | Description
TDPID | n/a | Teradata database ID.
Database Name | n/a | Database name.
Error Limit | 1,000,000 | Maximum number of rows that FastLoad rejects before it stops loading data to the database table.
Tenacity | - | -
Drop Error Tables | Enabled | -
External Loader Executable | fastload | -
Sleep | 6 | -
Truncate Target Table | Disabled | -
Is Staged | Disabled | -
Error Database | n/a | -
Table 28-11 shows the attributes that you configure when you override the Teradata FastLoad external loader connection object in the session properties:
Table 28-11. Teradata FastLoad External Loader Attributes Defined at the Session Level
Attribute | Default Value | Description
Error Table 1 | n/a | Table name for the first error table. Overrides the default error table name. If you do not specify an error table name, the Integration Service uses ET_<target_table_name>.
Error Table 2 | n/a | Table name for the second error table. Overrides the default error table name. If you do not specify an error table name, the Integration Service uses UV_<target_table_name>.
Control File Content Override | n/a | Control file text. Use this attribute to override the control file the Integration Service uses when it loads to Teradata. For more information, see Overriding the Control File on page 727.
For more information about these attributes, see the Teradata documentation.
Each Teradata Warehouse Builder operator has associated attributes. Not all attributes available for FastLoad, MultiLoad, and TPump external loaders are available for Teradata Warehouse Builder.
Table 28-13 shows the attributes that you configure for Teradata Warehouse Builder:
Table 28-13. Teradata Warehouse Builder External Loader Attributes
Attribute | Default Value | Description
TDPID | n/a | Teradata database ID.
Database Name | n/a | Database name.
Error Database Name | n/a | Name of the error database.
Operator | Update | Warehouse Builder operator used to load the data. Select Load, Update, or Stream.
Max instances | 4 | Maximum number of parallel instances for the defined operator.
Error Limit | 0 | Maximum number of rows that Warehouse Builder rejects before it stops loading data to the database table.
Checkpoint | 0 | Number of rows transmitted to the Teradata database between checkpoints. If processing stops while a Warehouse Builder job is running, you can restart the job at the most recent checkpoint. If you enter 0, Warehouse Builder does not perform checkpoint operations.
Tenacity | - | Number of hours Warehouse Builder tries to log in to the Warehouse Builder sessions when the maximum number of load jobs are already running on the Teradata database. When Warehouse Builder tries to log in for a new session, and the Teradata database indicates that the maximum number of load sessions is already running, Warehouse Builder logs off all new sessions that were logged in, delays for the number of minutes specified in the Sleep attribute, and then retries the login. Warehouse Builder keeps trying until it logs in for the required number of sessions or exceeds the number of hours specified in the Tenacity attribute. To disable Tenacity, set the value to 0.
Load Mode | Upsert | Mode to generate SQL commands. Select Insert, Update, Upsert, Delete, or Data Driven. When you use the Update or Stream operators, you can choose Data Driven load mode. When you select data driven loading, the Integration Service follows instructions in Update Strategy or Custom transformations to determine how to flag rows for insert, delete, or update. The Integration Service writes a column in the target file or named pipe to indicate the update strategy. The control file uses these values to determine how to load data to the database. The Integration Service uses the following values to indicate the update strategy: 0 (Insert), 1 (Update), 2 (Delete).
Drop Error Tables | Enabled | Drops the Warehouse Builder error tables before beginning the next session. Warehouse Builder will not run if error tables containing data exist from a prior job. Clear the option to keep error tables.
Truncate Target Table | Disabled | Specifies whether to truncate target tables. Enable this option to truncate the target database table before beginning the Warehouse Builder job.
External Loader Executable | tbuild | Name and optional file path of the Teradata external loader executable file. If the external loader directory is not in the system path, enter the file path and file name.
Sleep | - | -
Robust | Disabled | -
Is Staged | Disabled | -
Error Database | n/a | -
Work Table Database | n/a | -
Log Table Database | n/a | -
Table 28-14 shows the attributes that you configure when you override the Teradata Warehouse Builder external loader connection object in the session properties:
Table 28-14. Teradata Warehouse Builder External Loader Attributes Defined for Sessions
Attribute | Default Value | Description
Error Table 1 | n/a | Table name for the first error table. Use this attribute to override the default error table name. If you do not specify an error table name, the Integration Service uses ET_<target_table_name>.
Error Table 2 | n/a | Table name for the second error table. Use this attribute to override the default error table name. If you do not specify an error table name, the Integration Service uses UV_<target_table_name>.
Work Table | n/a | Work table name. This attribute overrides the default work table name. If you do not specify a work table name, the Integration Service uses WT_<target_table_name>.
Log Table | n/a | Log table name. This attribute overrides the default log table name. If you do not specify a log table name, the Integration Service uses RL_<target_table_name>.
Control File Content Override | n/a | Control file text. This attribute overrides the control file the Integration Service uses when it loads to Teradata. For more information, see Overriding the Control File on page 727.
For more information about these attributes, see the Teradata documentation.
Configuring External Loading in a Session
To change the writer type for the target, select the target instance and change the writer type from Relational Writer to File Writer.
Note: Do not select Merge Partitioned Files or enter a merge file name. You cannot merge partitioned output files when you use an external loader.
To select an external loader connection:
1. On the Mapping tab, select the target instance in the Navigator.
2. Select the Loader connection type.
3. Click the Open button in the Value field.
4. Select a connection option:
- Use object. Select a loader connection object. Click the Override button to override connection attributes. For more information about overriding connection attributes in the Workflow Manager, see Overriding Connection Attributes in the Session on page 45. The attributes you can override vary according to loader type.
- Use connection variable. Use the $LoaderConnectionName session parameter, and define the parameter in the parameter file. Override connection attributes in the parameter file. For information about overriding connection attributes in the parameter file, see Overriding Connection Attributes in the Parameter File on page 46. For more information about parameter files, see Parameter Files on page 687.
5. Click OK.
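For example, if you use the connection variable, you might add an entry such as the following to the parameter file. The folder, session, and connection object names are hypothetical:
[Production.s_LoadSales]
$LoaderConnectionName=TeradataML_Conn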
Troubleshooting
I am trying to set up a session to load data to an external loader, but I cannot select an external loader connection in the session properties.

Verify that the mapping contains a relational target. When you create the session, select a file writer in the Writers settings of the Mapping tab in the session properties. Then open the Connections settings and select an external loader connection.

I am trying to run a session that uses TPump, but the session fails. The session log displays an error saying that the Teradata output file name is too long.

The Integration Service uses the Teradata output file name to generate names for the TPump error and log files and the log table name. To generate these names, the Integration Service adds a prefix of several characters to the output file name. It adds three characters for sessions with one partition and five characters for sessions with multiple partitions. Teradata allows log table names of up to 30 characters. Because the Integration Service adds a prefix, if you are running a session with a single partition, specify a target output file name with a maximum of 27 characters, including the file extension. If you are running a session with multiple partitions, specify a target output file name with a maximum of 25 characters, including the file extension.

I tried to load data to Teradata using TPump, but the session failed. I corrected the error, but the session still fails.

Occasionally, Teradata does not drop the log table when you rerun the session. Check the Teradata database, and manually drop the log table if it exists. Then rerun the session.
Chapter 29
Using FTP
Overview, 750
Integration Service Behavior, 751
Configuring FTP in a Session, 753
Overview
You can configure a session to use File Transfer Protocol (FTP) to read from flat file or XML sources or write to flat file or XML targets. The Integration Service can use FTP to access any machine it can connect to, including mainframes. With both source and target files, use FTP to transfer the files directly or stage them in a local directory. Access source files directly or use a file list to access indirect source files in a session. To use FTP file sources and targets in a session, complete the following tasks:
1. Create an FTP connection object in the Workflow Manager and configure the connection attributes. For more information about creating FTP connections, see FTP Connections on page 60.
2. Configure the session to use the FTP connection object in the session properties. For more information, see Configuring FTP in a Session on page 753.
Configure an FTP connection to use SSH File Transfer Protocol (SFTP) if you are connecting to an SFTP server. SFTP enables file transfer over a secure data stream. The Integration Service creates an SSH2 transport layer that enables a secure connection and access to the files on an SFTP server.
Consider the following guidelines when you use FTP:
- Specify the source or target output directory in the session properties. If you do not specify a directory, the Integration Service stages the file in the directory where the Integration Service runs on UNIX or in the Windows system directory.
- You cannot run sessions concurrently if the sessions use the same FTP source file or target file located on a mainframe.
- If you abort a workflow containing a session that stages an FTP source or target from a mainframe, you may need to wait for the connection to time out before you can run the workflow again.
- To run a session using an FTP connection for an SFTP server that requires public key authentication, the public key and private key files must be accessible on nodes where the session will run.
Integration Service Behavior
You can use FTP to access source and target files in the following ways:
- Source files. Stage source files on the machine hosting the Integration Service or access the source files directly from the FTP host. Use a single source file or a file list that contains indirect source files for a single source instance.
- Target files. Stage target files on the machine hosting the Integration Service or write to the target files on the FTP host.
Select staging options for the session when you select the FTP connection object in the session properties. You can also stage files by creating a pre- or post-session shell command to copy the files to or from the FTP host. You generally get better performance when you access source files directly with FTP. However, you may want to stage FTP files to keep a local archive.
When you stage target files, the Integration Service creates a target file locally and transfers it to the FTP host after the session completes. If you do not stage the target file, the Integration Service writes directly to the target file on the FTP host. If the target file exists, the Integration Service truncates the file. If you have the Partitioning option, use FTP for multiple target partition instances. You can write to multiple target files or a merge file on the Integration Service or the FTP host. For more information about using FTP with partitioned file targets, see Partitioning File Targets on page 461.
Configuring FTP in a Session
To use FTP in a session, complete the following tasks:
- Select an FTP connection.
- Configure source file properties.
- Configure target file properties.
To stage the source or target file on the Integration Service machine, edit the FTP connection in the session properties to configure the directory and file name for the staged file.
On the Mapping tab, select the source or target instance in the Transformation view. Select the FTP connection type and click the Open button in the Value field. Then choose one of the following options:
- Use object. Select an FTP connection object. Click the Override button to override connection attributes. For more information about overriding connection attributes in the Workflow Manager, see Overriding Connection Attributes in the Session on page 45.
- Use connection variable. Use the $FTPConnectionName session parameter, and define the parameter in the parameter file. Override connection attributes in the parameter file. For information about overriding connection attributes in the parameter file, see Overriding Connection Attributes in the Parameter File on page 46. For more information about parameter files, see Parameter Files on page 687.
Click OK.
Figure 29-2 shows where you configure the source file properties on the Mapping tab:
Figure 29-2. Properties Settings for the Source File
Table 29-2 describes the source file attributes on the Mapping tab:
Table 29-2. Properties Settings for a Source Instance
- Source File Type. Indicates whether the source file contains the source data or a list of files with the same file properties. Choose Direct if the source file contains the source data. Choose Indirect if the source file contains a list of files.
- Source File Directory. Name and path of the local source file directory used to stage the source data. By default, the Integration Service uses the service process variable directory, $PMSourceFileDir, for file sources. The Integration Service concatenates this field with the Source File Name field when it runs the session. If you do not stage the source file, the Integration Service uses the file name and directory from the FTP connection object. The Integration Service ignores this field if you enter a fully qualified file name in the Source File Name field.
- Source File Name. Name of the local source file used to stage the source data. You can enter the file name or the file name and path. If you enter a fully qualified file name, the Integration Service ignores the Source File Directory field. If you do not stage the source file, the Integration Service uses the remote file name and default directory from the FTP connection object.
Table 29-3 describes the target file attributes on the Mapping tab:
Table 29-3. Properties Settings for Target Instance
- Output File Directory. Name and path of the local target file directory used to stage the target data. By default, the Integration Service uses the service process variable directory, $PMTargetFileDir. The Integration Service concatenates this field with the Output File Name field when it runs the session. If you do not stage the target file, the Integration Service uses the file name and directory from the FTP connection object. The Integration Service ignores this field if you enter a fully qualified file name in the Output File Name field.
- Output File Name. Name of the local target file used to stage the target data. You can enter the file name, or the file name and path. If you enter a fully qualified file name, the Integration Service ignores the Output File Directory field. If you do not stage the target file, the Integration Service uses the remote file name and default directory from the FTP connection object.
You must use an FTP connection for each target partition. You can choose to stage the files when selecting the connection object for the target partition. You must stage the files to use sequential merge. If the FTP connections for the target partitions have any settings other than a remote file name, the Integration Service does not create a merge file.
Table 29-4 describes the actions of the Integration Service with partitioned FTP file targets:
Table 29-4. Integration Service Behavior with Partitioned FTP File Targets
- No Merge. If you stage the files, the Integration Service creates one target file for each partition. At the end of the session, the Integration Service transfers the target files to the remote location. If you do not stage the files, the Integration Service generates a target file for each partition at the remote location.
- Sequential Merge. Enable the Is Staged option in the connection object. The Integration Service creates one output file for each partition. At the end of the session, the Integration Service merges the individual output files into a single merge file, deletes the individual output files, and transfers the merge file to the remote location.
Table 29-4. Integration Service Behavior with Partitioned FTP File Targets (continued)
- File List. If you stage the files, the Integration Service creates the following files: an output file for each partition, a file list that contains the names and paths of the local files, and a file list that contains the names and paths of the remote files. At the end of the session, the Integration Service transfers the files to the remote location. If the individual target files are in the Merge File Directory, the file list contains relative paths. Otherwise, the file list contains absolute paths. If you do not stage the files, the Integration Service writes the data for each partition at the remote location and creates a remote file list that contains a list of the individual target files. Use the file list as a source file in another mapping.
- Concurrent Merge. If you stage the files, the Integration Service concurrently writes the data for all target partitions to a local merge file. At the end of the session, the Integration Service transfers the merge file to the remote location. The Integration Service does not write to any intermediate output files. If you do not stage the files, the Integration Service concurrently writes the target data for all partitions to a merge file at the remote location.
Chapter 30
Using Incremental Aggregation
Overview, 762
Integration Service Processing for Incremental Aggregation, 763
Reinitializing the Aggregate Files, 764
Moving or Deleting the Aggregate Files, 765
Partitioning Guidelines with Incremental Aggregation, 766
Preparing for Incremental Aggregation, 767
Overview
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes incrementally and you can capture changes, you can configure the session to process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session. For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation. When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The Integration Service then processes the new data and updates the target accordingly. Consider using incremental aggregation in the following circumstances:
- You can capture new source data. Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process new data.
- Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with complete source data.
Note: Do not use incremental aggregation if the mapping contains percentile or median
functions. The Integration Service uses system memory to process these functions in addition to the cache memory you configure in the session properties. As a result, the Integration Service does not store incremental aggregation values for percentile and median functions in disk caches.
Integration Service Processing for Incremental Aggregation
The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the following tasks:
- Save a new version of the mapping.
- Configure the session to reinitialize the aggregate cache.
- Move the aggregate files without correcting the configured path or directory for the files in the session properties.
- Change the configured path or directory for the aggregate files without moving the files to the new location.
- Delete cache files.
- Decrease the number of partitions.
When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.
Reinitializing the Aggregate Files
When you move from Windows to UNIX, you must reinitialize the cache. Therefore, you cannot change from a Latin1 code page to an MSLatin1 code page, even though these code pages are compatible.
Moving or Deleting the Aggregate Files
To avoid session failure, delete existing incremental aggregation files when you perform any of the following tasks:
- Change the Integration Service data movement mode from ASCII to Unicode or from Unicode to ASCII.
- Change the Integration Service code page to an incompatible code page.
- Change the session sort order when the Integration Service runs in Unicode mode.
- Change the Enable High Precision session option.
For more information about cache file storage and naming conventions, see Cache Files on page 772.
Partitioning Guidelines with Incremental Aggregation
Use the following guidelines when you change the number of partitions or the cache directory:
Change the cache directory for a partition. If you change the directory for a partition and you want the Integration Service to reuse the cache files, you must move the cache files for the partition associated with the changed directory.
If you change the directory for the first partition, and you do not move the cache files, the Integration Service rebuilds the cache files for all partitions. If you change the directory for partitions 2-n, and you do not move the cache files, the Integration Service rebuilds the cache files that it cannot locate.
Decrease the number of partitions. If you delete a partition and you want the Integration Service to reuse the cache files, you must move the cache files for the deleted partition to the directory configured for the first partition. If you do not move the files to the directory of the first partition, the Integration Service rebuilds the cache files that it cannot locate.
Note: If you increase the number of partitions, the Integration Service realigns the index
and data cache files the next time you run a session. It does not need to rebuild the files.
Move cache files. If you move cache files for a partition and you want the Integration Service to reuse the files, you must also change the partition directory. If you do not change the directory, the Integration Service rebuilds the files the next time you run a session. Delete cache files. If you delete cache files, the Integration Service rebuilds them the next time you run a session.
If you change the number of partitions and the cache directory, you may need to move cache files for both. For example, if you change the cache directory for the first partition and you decrease the number of partitions, you need to move the cache files for the deleted partition and the cache files for the partition associated with the changed directory.
Preparing for Incremental Aggregation
Before you enable incremental aggregation, complete the following tasks:
- Implement mapping logic or filter to remove pre-existing data.
- Configure the session for incremental aggregation and verify that the file directory has enough disk space for the aggregate files.
Verify the location where you want to store the aggregate files. The index and data files grow in proportion to the source data. Be sure the cache directory has enough disk space to store historical data for the session. When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then, enter the appropriate directory for the process variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the index and data files. However, by using the process variable for all sessions using incremental aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir. Changing the cache directory without moving the files causes the Integration Service to reinitialize the aggregate cache and gather new aggregate data. In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an Integration Service rebuilds incremental aggregation files, it loses aggregate history.
Verify the incremental aggregation settings in the session properties. You can configure the session for incremental aggregation in the Performance settings on the Properties tab. You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating the Integration Service overwrites the existing cache and a reminder to clear this option after running the session.
Figure 30-1 shows the Performance settings on the Properties tab where you configure incremental aggregation options:
Figure 30-1. Incremental Aggregation Session Properties
Note: You cannot use incremental aggregation when the mapping includes an Aggregator
transformation with Transaction transformation scope. The Workflow Manager marks the session invalid.
Chapter 31
Session Caches
Overview, 770
Cache Memory, 771
Cache Files, 772
Configuring the Cache Size, 775
Cache Partitioning, 781
Aggregator Caches, 783
Joiner Caches, 786
Lookup Caches, 790
Rank Caches, 793
Sorter Caches, 796
XML Target Caches, 798
Optimizing the Cache Size, 799
Overview
The Integration Service allocates cache memory for XML targets and Aggregator, Joiner, Lookup, Rank, and Sorter transformations in a mapping. The Integration Service creates index and data caches for the XML targets and Aggregator, Joiner, Lookup, and Rank transformations. The Integration Service stores key values in the index cache and output values in the data cache. The Integration Service creates one cache for the Sorter transformation to store sort keys and the data to be sorted. You configure memory parameters for the caches in the session properties. When you first configure the cache size, you can calculate the amount of memory required to process the transformation or you can configure the Integration Service to automatically configure the memory requirements at run time. After you run a session, you can tune the cache sizes for the transformations in the session. You can analyze the transformation statistics to determine the cache sizes required for optimal session performance, and then update the configured cache sizes. If the Integration Service requires more memory than what you configure, it stores overflow values in cache files. When the session completes, the Integration Service releases cache memory, and in most circumstances, it deletes the cache files. If the session contains multiple partitions, the Integration Service creates one memory cache for each partition. In particular situations, the Integration Service uses cache partitioning, creating a separate cache for each partition. The following table describes the type of information that the Integration Service stores in each cache:
- Aggregator. Index cache stores group values as configured in the group by ports. Data cache stores calculations based on the group by ports.
- Joiner. Index cache stores all master rows in the join condition that have unique keys. Data cache stores master source rows.
- Lookup. Index cache stores lookup condition information. Data cache stores lookup data that is not stored in the index cache.
- Rank. Index cache stores group values as configured in the group by ports. Data cache stores ranking information based on the group by ports.
- Sorter. Sorter cache stores sort keys and data.
- XML Target. Index cache stores primary and foreign key information in separate caches. Data cache stores XML row data while it generates the XML target.
Cache Memory
The Integration Service creates each memory cache based on the configured cache size. When you create a session, you can configure the cache sizes for each transformation instance in the session properties. The Integration Service might increase the configured cache size for one of the following reasons:
- The configured cache size is less than the minimum cache size required to process the operation. The Integration Service requires a minimum amount of memory to initialize each session. If the configured cache size is less than the minimum required cache size, the Integration Service increases the configured cache size to meet the minimum requirement. If the Integration Service cannot allocate the minimum required memory, the session fails.
- The configured cache size is not a multiple of the cache page size. The Integration Service stores cached data in cache pages. The cached pages must fit evenly into the cache. Thus, if you configure 1 MB (1,048,576 bytes) for the cache size and the cache page size is 10,000 bytes, the Integration Service increases the configured cache size to 1,050,000 bytes to make it a multiple of the 10,000-byte page size.
When the Integration Service increases the configured cache size, it continues to run the session and writes a message similar to the following message in the session log:
MAPPING> TE_7212 Increasing [Index Cache] size for transformation <transformation name> from <configured index cache size> to <new index cache size>.
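The page alignment is simple to reproduce. The following Python sketch is illustrative only; the 1,048,576-byte cache size and 10,000-byte page size are the example values above.

import math

def align_to_page_size(configured_bytes, page_size_bytes):
    # Round the configured cache size up to the next whole multiple of the
    # cache page size, as the Integration Service does.
    return math.ceil(configured_bytes / page_size_bytes) * page_size_bytes

print(align_to_page_size(1_048_576, 10_000))  # 1050000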
Review the session log to verify that enough memory is allocated for the minimum requirements. For optimal performance, set the cache size to the total memory required to process the transformation. If there is not enough cache memory to process the transformation, the Integration Service processes some of the transformation in memory and pages information to disk to process the rest. Use the following information to understand how the Integration Service handles memory caches differently on 32-bit and 64-bit machines:
An Integration Service process running on a 32-bit machine cannot run a session if the total size of all the configured session caches is more than 2 GB. If you run the session on a grid, the total cache size of all session threads running on a single node must not exceed 2 GB. If a grid has 32-bit and 64-bit Integration Service processes and a session exceeds 2 GB of memory, you must configure the session to run on an Integration Service on a 64-bit machine. For more information about grids, see Running Workflows on a Grid on page 629.
Cache Files
When you run a session, the Integration Service creates at least one cache file for each transformation. If the Integration Service cannot process a transformation in memory, it writes the overflow values to the cache files. Table 31-1 describes the types of cache files that the Integration Service creates for different mapping objects:
Table 31-1. Types of Cache Files
- Aggregator, Joiner, Lookup, and Rank transformations. The Integration Service creates one header file and one data file for each index cache and data cache.
- Sorter transformation. The Integration Service creates one sorter cache file.
- XML target. The Integration Service creates one data cache file, one primary key index cache file, and one foreign key index cache file for each XML target group.
The Integration Service creates cache files based on the Integration Service code page. When you run a session, the Integration Service writes a message in the session log indicating the cache file name and the transformation name. When a session completes, the Integration Service releases cache memory and usually deletes the cache files. You may find index and data cache files in the cache directory under the following circumstances:
- The session performs incremental aggregation.
- You configure the Lookup transformation to use a persistent cache.
- The session does not complete successfully. The next time you run the session, the Integration Service deletes the existing cache files and creates new ones.
Note: Since writing to cache files can slow session performance, configure the cache sizes to process the transformations in memory.
Table 31-2 describes the naming convention for each type of cache file:
Table 31-2. Naming Conventions for Cache Files
- Data and sorter cache files: [<name prefix> | <prefix><session ID>_<transformation ID>]_[partition index]_[OS][BIT].<suffix>[overflow index]
- Index cache files: <prefix><session ID>_<transformation ID>_<group ID>_<key type>.<suffix>[overflow index]
In these conventions, OS identifies the operating system of the machine that created the file and BIT identifies the bit platform, as the following example shows. The overflow index is appended when a cache spans more than one file.
For example, the name of the data file for the index cache is PMLKUP748_2_5S32.idx1. PMLKUP identifies the transformation type as Lookup, 748 is the session ID, 2 is the transformation ID, 5 is the partition index, S (Solaris) is the operating system, and 32 is the bit platform.
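As an illustration, the example name can be picked apart with a few lines of Python. The field layout below is inferred from the example above; it is not an official parser.

import re

# Matches names such as PMLKUP748_2_5S32.idx1: prefix, session ID,
# transformation ID, partition index, OS letter, and bit platform.
PATTERN = re.compile(
    r"(?P<prefix>[A-Z]+?)(?P<session_id>\d+)_(?P<transformation_id>\d+)"
    r"_(?P<partition>\d+)(?P<os>[A-Z])(?P<bits>\d{2})\.(?P<suffix>\w+)")

fields = PATTERN.match("PMLKUP748_2_5S32.idx1").groupdict()
print(fields)  # {'prefix': 'PMLKUP', 'session_id': '748', 'transformation_id': '2', ...}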
For more information about configuring resources for a node, see Managing the Grid in the PowerCenter Administrator Guide. If all Integration Service processes in a grid have slow access to the cache files, set up a separate, local cache file directory for each Integration Service process. An Integration Service process may have faster access to the cache files if it runs on the same machine that contains the cache directory.
Configuring the Cache Size
You can configure the cache size in the following ways:
- Cache calculator. Use the calculator to estimate the total amount of memory required to process the transformation. For more information, see Calculating the Cache Size on page 775.
- Auto cache memory. Use auto memory to specify a maximum limit on the cache size that is allocated for processing the transformation. Use this method if the machine on which the Integration Service process runs has limited cache memory. For more information, see Using Auto Memory Size on page 776.
- Numeric value. Configure a specific value for the cache size. Configure a specific value when you want to tune the cache size. For more information, see Configuring a Numeric Cache Size on page 778.
You configure the memory requirements differently when the Integration Service uses cache partitioning. If the Integration Service uses cache partitioning, it allocates the configured cache size for each partition. To configure the memory requirements for a transformation with cache partitioning, calculate the total requirements for the transformation and divide by the number of partitions. For more information about configuring the cache size for cache partitioning, see Configuring the Cache Size for Cache Partitioning on page 781. The cache size requirements for a transformation may change when the inputs to the transformation change. Monitor the cache sizes in the session logs on a regular basis to help you tune the cache size. For more information about optimizing the cache size, see Optimizing the Cache Size on page 799.
The cache calculator displays the calculated cache sizes. By default, Auto displays. You can also enter new values.
You can select one of the following modes in the cache calculator:
- Auto. Choose auto mode if you want the Integration Service to determine the cache size at run time based on the maximum memory configured on the Config Object tab. For more information about auto memory cache, see Using Auto Memory Size on page 776.
- Calculate. Select to calculate the total requirements for a transformation based on inputs. The cache calculator requires different inputs for each transformation. You must select the applicable cache type to apply the calculated cache size. For example, to apply the calculated cache size for the data cache and not the index cache, select only the Data Cache Size option.
The cache calculator estimates the cache size required for optimal session performance based on your input. After you configure the cache size and run the session, you can review the transformation statistics in the session log to tune the configured cache size. For more information, see Optimizing the Cache Size on page 799.
Note: You cannot use the cache calculator to estimate the cache size for an XML target.
For example, suppose the machine only has 800 MB of cache memory available, and you configure a maximum memory allocation of 800 MB and 5% of total memory.
When you configure a numeric value and a percentage for the auto cache memory, the Integration Service compares the values and uses the lesser of the two for the maximum memory limit. The Integration Service allocates up to 800 MB as long as 800 MB is less than 5% of the total memory. To configure auto cache memory, you can use the cache calculator or you can enter Auto directly into the session properties. By default, transformations use auto cache memory. If a session has multiple transformations that require caching, you can configure some transformations with auto memory cache and other transformations with numeric cache sizes. The Integration Service allocates the maximum memory specified for auto caching in addition to the configured numeric cache sizes. For example, a session has three transformations. You assign auto caching to two transformations and specify a maximum memory cache size of 800 MB. You specify 500 MB as the cache size for the third transformation. The Integration Service allocates a total of 1,300 MB of memory. If the Integration Service uses cache partitioning, the Integration Service distributes the maximum cache size specified for the auto cache memory across all transformations in the session and divides the cache memory for each transformation among all of its partitions. For more information about configuring automatic memory settings, see Configuring Automatic Memory Settings on page 214.
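The comparison and the memory totals in this example can be sketched as follows. This is illustrative only; the 800 MB limit, the 5% setting, and the 500 MB numeric cache come from the example above, and the 32,768 MB machine size is a hypothetical value.

def max_auto_cache_mb(numeric_limit_mb, percent_limit, total_memory_mb):
    # The Integration Service uses the lesser of the numeric value and the
    # percentage of total memory as the maximum for auto cache memory.
    return min(numeric_limit_mb, total_memory_mb * percent_limit / 100)

auto_limit = max_auto_cache_mb(800, 5, 32_768)  # 800, since 5% of 32,768 MB is larger
total = auto_limit + 500                        # plus the 500 MB numeric cache size
print(total)                                    # 1300.0 MB, matching the example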
Configuring a Numeric Cache Size
1. In the Workflow Manager, open the session.
2. Click the Mapping tab.
3. Select the mapping object in the left pane.
The right pane of the Mapping tab shows the object properties where you can configure the cache size.
4. Use one of the following methods to set the cache size:
- Enter a value for the cache size, click OK, and then skip to step 8. If you enter a value, all values are in bytes by default. However, you can enter a value and specify one of the following units: KB, MB, or GB. If you enter the units, do not enter a space between the value and unit. For example, enter 350000KB, 200MB, or 1GB.
- Enter Auto for the cache size, click OK, and then skip to step 8.
- Open the cache calculator to calculate the cache size.
5. Select a mode. Select the Auto mode to limit the amount of cache allocated to the transformation, and skip to step 8. Or select the Calculate mode to calculate the total memory requirement for the transformation.
6. Provide the input based on the transformation type, and click Calculate.
Note: If the input value is too large and you cannot enter the value in the cache calculator, use auto memory cache. The cache calculator calculates the cache sizes in kilobytes.
7. If the transformation has a data cache and index cache, select Data Cache Size, Index Cache Size, or both.
8. Click OK to apply the calculated values to the cache sizes you selected in step 7.
Cache Partitioning
When you create a session with multiple partitions, the Integration Service may use cache partitioning for the Aggregator, Joiner, Lookup, Rank, and Sorter transformations. When the Integration Service partitions a cache, it creates a separate cache for each partition and allocates the configured cache size to each partition. The Integration Service stores different data in each cache, where each cache contains only the rows needed by that partition. As a result, the Integration Service requires a portion of total cache memory for each partition. When the Integration Service uses cache partitioning, it accesses the cache in parallel for each partition. If it does not use cache partitioning, it accesses the cache serially for each partition. Table 31-4 describes the situations when the Integration Service uses cache partitioning for each applicable transformation:
Table 31-4. Cache Partitioning for Each Transformation
- Aggregator Transformation. You create multiple partitions in a session with an Aggregator transformation. You do not have to set a partition point at the Aggregator transformation. For more information about Aggregator transformation caches, see Aggregator Caches on page 783.
- Joiner Transformation. You create a partition point at the Joiner transformation. For more information about caches for the Joiner transformation, see Joiner Caches on page 786.
- Lookup Transformation. You create a hash auto-keys partition point at the Lookup transformation. For more information about Lookup transformation caches, see Lookup Caches on page 790.
- Rank Transformation. You create multiple partitions in a session with a Rank transformation. You do not have to set a partition point at the Rank transformation. For more information about Rank transformation caches, see Rank Caches on page 793.
- Sorter Transformation. You create multiple partitions in a session with a Sorter transformation. You do not have to set a partition point at the Sorter transformation. For more information about Sorter transformation caches, see Sorter Caches on page 796.
When you use dynamic partitioning, configure the cache size based on the dynamic partitioning method. If you use dynamic partitioning based on the nodes in a grid, the Integration Service creates one partition for each node. If you use dynamic partitioning based on the source partitioning, use the number of partitions in the source database. For more information about dynamic partitioning, see Dynamic Partitioning on page 433.
Aggregator Caches
The Integration Service uses cache memory to process Aggregator transformations with unsorted input. When you run the session, the Integration Service stores data in memory until it completes the aggregate calculations. The Integration Service creates the following caches for the Aggregator transformation:
- Index cache. Stores group values as configured in the group by ports.
- Data cache. Stores calculations based on the group by ports.
By default, the Integration Service creates one memory cache and one disk cache for both the data and index in the transformation. When you create multiple partitions in a session with an Aggregator transformation, the Integration Service uses cache partitioning. It creates one disk cache for all partitions and a separate memory cache for each partition.
Incremental Aggregation
The first time you run an incremental aggregation session, the Integration Service processes the source. At the end of the session, the Integration Service stores the aggregated data in two cache files, the index and data cache files. The Integration Service saves the cache files in the cache file directory. The next time you run the session, the Integration Service aggregates the new rows with the cached aggregated values in the cache files. When you run a session with an incremental Aggregator transformation, the Integration Service creates a backup of the Aggregator cache files in $PMCacheDir at the beginning of a session run. The Integration Service promotes the backup cache to the initial cache at the beginning of a session recovery run. The Integration Service cannot restore the backup cache file if the session aborts. When you create multiple partitions in a session that uses incremental aggregation, the Integration Service creates one set of cache files for each partition. For information about caching with incremental aggregation, see Partitioning Guidelines with Incremental Aggregation on page 766.
The following dialog box shows the inputs that are required to calculate the cache size for an Aggregator transformation:
The following table describes the input you provide to calculate the Aggregator cache sizes:
- Number of Groups (Required). Number of groups. The Aggregator transformation aggregates data by group. Calculate the number of groups using the group by ports. For example, if you group by Store ID and Item ID, you have 5 stores and 25 items, and each store contains all 25 items, then calculate the number of groups as:
5 * 25 = 125 groups
- Data Movement Mode (Required). The data movement mode of the Integration Service. The cache requirement varies based on the data movement mode. Each ASCII character uses one byte. Each Unicode character uses two bytes.
Enter the input and then click Calculate to calculate the data and index cache sizes. The calculated values appear in the Data Cache Size and Index Cache Size fields. For more information about configuring cache sizes, see Configuring the Cache Size on page 775.
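The two inputs above translate directly into a small calculation. The following Python sketch is illustrative only; the byte-per-character figures and the 5-store, 25-item example are the ones stated in the table.

from math import prod

def number_of_groups(distinct_values_per_group_by_port):
    # Groups are the combinations of distinct values across the group by
    # ports: 5 stores x 25 items = 125 groups when every store has every item.
    return prod(distinct_values_per_group_by_port)

BYTES_PER_CHAR = {"ASCII": 1, "Unicode": 2}

print(number_of_groups([5, 25]))   # 125
print(BYTES_PER_CHAR["Unicode"])   # 2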
Troubleshooting
Use the information in this section to help troubleshoot caching for an Aggregator transformation. The following warning appears when I use the cache calculator to calculate the cache size for an Aggregator transformation:
CMN_2019 Warning: The estimated data cache size assumes the number of aggregate functions equals the number of connected output-only ports. If there are more aggregate functions, increase the cache size to cache all data in memory.
You can use one or more aggregate functions in an Aggregator transformation. The cache calculator estimates the cache size when the output is based on one aggregate function. If you use multiple aggregate functions to determine a value for one output port, then you must increase the cache size.
Review the transformation statistics in the session log and tune the cache size for the Aggregator transformation in the session. For more information about optimizing the cache size, see Optimizing the Cache Size on page 799.

The following memory allocation error appears in the session log when I run a session with an aggregator cache size greater than 4 GB and the Integration Service process runs on a Hewlett Packard 64-bit machine with a PA RISC processor:
FATAL 8/17/2006 5:12:21 PM node01_havoc *********** FATAL ERROR : Failed to allocate memory (out of virtual memory). *********** FATAL 8/17/2006 5:12:21 PM node01_havoc *********** FATAL ERROR : Aborting the DTM process due to memory allocation failure. ***********
By default, a 64-bit HP-UX machine with a PA RISC processor allocates up to 4 GB of memory for each process. If a session requires more than 4 GB of memory, increase the maximum data memory limit for the machine using the maxdsiz_64bit operating system variable. For more information about maxdsiz_64bit, see the following URL:
http://docs.hp.com/en/B3921-90010/maxdsiz.5.html
Joiner Caches
The Integration Service uses cache memory to process Joiner transformations. When you run a session, the Integration Service reads rows from the master and detail sources concurrently and builds index and data caches based on the master rows. The Integration Service performs the join based on the detail source data and the cached master data. The Integration Service stores a different number of rows in the caches based on the type of Joiner transformation. Table 31-5 describes the information that Integration Service stores in the caches for different types of Joiner transformations:
Table 31-5. Caches for Joiner Transformation
- Unsorted Input. Index cache stores all master rows in the join condition with unique index keys. Data cache stores all master rows.
- Sorted Input with Different Sources. Index cache stores 100 master rows in the join condition with unique index keys. Data cache stores master rows that correspond to the rows stored in the index cache. If the master data contains multiple rows with the same key, the Integration Service stores more than 100 rows in the data cache.
- Sorted Input with the Same Source. Index cache stores all master or detail rows in the join condition with unique keys. It stores detail rows if the Integration Service processes the detail pipeline faster than the master pipeline. Otherwise, it stores master rows. The number of rows it stores depends on the processing rates of the master and detail pipelines. If one pipeline processes its rows faster than the other, the Integration Service caches all rows that have already been processed and keeps them cached until the other pipeline finishes processing its rows. Data cache stores data for the rows stored in the index cache. If the index cache stores keys for the master pipeline, the data cache stores the data for the master pipeline. If the index cache stores keys for the detail pipeline, the data cache stores data for the detail pipeline.
If the data is sorted, the Integration Service creates one disk cache for all partitions and a separate memory cache for each partition. It releases each row from the cache after it joins the data in the row. If the data is not sorted and there is not a partition at the Joiner transformation, the Integration Service creates one disk cache and a separate memory cache for each partition. If the data is not sorted and there is a partition at the Joiner transformation, the Integration Service creates a separate disk cache and memory cache for each partition. When the data is not sorted, the Integration Service keeps all master data in the cache until it joins all data.
When you create multiple partitions in a session, you can use 1:n partitioning or n:n partitioning. The Integration Service processes the Joiner transformation differently when you use 1:n partitioning and when you use n:n partitioning.
1:n Partitioning
You can use 1:n partitioning with Joiner transformations with sorted input. When you use 1:n partitioning, you create one partition for the master pipeline and more than one partition in the detail pipeline. When the Integration Service processes the join, it compares the rows in a detail partition against the rows in the master source. When processing master and detail data for outer joins, the Integration Service outputs unmatched master rows after it processes all detail partitions.
n:n Partitioning
You can use n:n partitioning with Joiner transformations with sorted or unsorted input. When you use n:n partitioning for a Joiner transformation, you create n partitions in the master and detail pipelines. When the Integration Service processes the join, it compares the rows in a detail partition against the rows in the corresponding master partition, ignoring rows in other master partitions. When processing master and detail data for outer joins, the Integration Service outputs unmatched master rows after it processes the partition for each detail cache.
Tip: If the master source has a large number of rows, use n:n partitioning for better session
performance. To use n:n partitioning, you must create multiple partitions in the session and create a partition point at the Joiner transformation. You create the partition point at the Joiner transformation to create multiple partitions for both the master and detail source of the Joiner transformation. If you create a partition point at the Joiner transformation, the Integration Service uses cache partitioning. It creates one memory cache for each partition. The memory cache for each partition contains only the rows needed by that partition. As a result, the Integration Service requires a portion of total cache memory for each partition.
To configure the memory requirements for a Joiner transformation with n:n partitioning, calculate the total requirements for the transformation, and then divide it by the number of partitions. You can use the cache calculator to determine the cache size required to process the transformation. For example, you use the cache calculator to determine that the Joiner transformation requires 2,000,000 bytes of memory for the index cache and 4,000,000 bytes of memory for the data cache. You create four partitions for the pipeline. If you use 1:n partitioning, configure 2,000,000 bytes for the index cache and 4,000,000 bytes for the data cache. If you use n:n partitioning, configure 500,000 bytes for the index cache and 1,000,000 bytes for the data cache. The following dialog box shows the inputs that are required to calculate the cache size for a Joiner transformation:
The following table describes the input you provide to calculate the Joiner cache sizes:
- Number of Master Rows (Conditional). Number of rows in the master source. Applies to a Joiner transformation with unsorted input. The number of master rows does not affect the cache size for a sorted Joiner transformation. Note: If rows in the master source share unique keys, the cache calculator overestimates the index cache size.
- Data Movement Mode (Required). The data movement mode of the Integration Service. The cache requirement varies based on the data movement mode. ASCII characters use one byte. Unicode characters use two bytes.
Enter the input and then click Calculate to calculate the data and index cache sizes. The calculated values appear in the Data Cache Size and Index Cache Size fields. For more information about configuring cache sizes, see Configuring the Cache Size on page 775.
Troubleshooting
Use the information in this section to help troubleshoot caching for a Joiner transformation.
The following warning appears when I use the cache calculator to calculate the cache size for a Joiner transformation with sorted input:
CMN_2020 Warning: If the master and detail pipelines of a sorted Joiner transformation are from the same source, the Integration Service cannot determine how fast it will process the rows in each pipeline. As a result, the cache size estimate may be inaccurate.
The master and detail pipelines process rows concurrently. If you join data from the same source, the pipelines may process the rows at different rates. If one pipeline processes its rows faster than the other, the Integration Service caches all rows that have already been processed and keeps them cached until the other pipeline finishes processing its rows. The amount of rows cached depends on the difference in processing rates between the two pipelines. The cache size must be large enough to store all cached rows to achieve optimal session performance. If the cache size is not large enough, increase it.
Note: This message applies if you join data from the same source even though it also appears
when you join data from different sources. The following warning appears when I use the cache calculator to calculate the cache size for a Joiner transformation with sorted input:
CMN_2021 Warning: Increase the data cache size if the sorted Joiner transformation processes master rows that share the same key. To determine the new cache size, divide the existing cache size by 2.5 and multiply the result by the average number of master rows per key.
When you calculate the cache size for the Joiner transformation with sorted input, the cache calculator bases the estimated cache requirements on an average of 2.5 master rows for each unique key. If the average number of master rows for each unique key is greater than 2.5, increase the cache size accordingly. For example, if the average number of master rows for each unique key is 5 (double the size of 2.5), then double the cache size calculated by the cache calculator.
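The adjustment in the warning is mechanical and can be sketched as follows. This is an illustration only; the 2.5-row assumption and the doubling example come from the explanation above.

def adjusted_sorted_joiner_cache(calculated_cache_bytes, avg_master_rows_per_key):
    # The cache calculator assumes an average of 2.5 master rows per unique
    # key; rescale the estimate when the actual average is higher.
    if avg_master_rows_per_key <= 2.5:
        return calculated_cache_bytes
    return calculated_cache_bytes / 2.5 * avg_master_rows_per_key

print(adjusted_sorted_joiner_cache(4_000_000, 5))  # 8000000.0, double the estimate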
Lookup Caches
If you enable caching in a Lookup transformation, the Integration Service builds a cache in memory to store lookup data. When the Integration Service builds a lookup cache in memory, it processes the first row of data in the transformation and queries the cache for each row that enters the transformation. If you do not enable caching, the Integration Service queries the lookup source for each input row. The result of the Lookup query and processing is the same, whether or not you cache the lookup source. However, using a lookup cache can increase session performance. You can optimize performance by caching the lookup source when the source is large. If the lookup does not change between sessions, you can configure the transformation to use a persistent lookup cache. When you run the session, the Integration Service rebuilds the persistent cache if any cache file is missing or invalid. The Integration Service creates the following caches for the Lookup transformation:
- Data cache. For a connected Lookup transformation, stores data for the connected output ports, not including ports used in the lookup condition. For an unconnected Lookup transformation, stores data from the return port.
- Index cache. Stores data for the columns used in the lookup condition.
The Integration Service creates disk and memory caches based on the lookup caching and partitioning information. The following table describes the caches that the Integration Service creates based on the cache and partitioning information:
- Static cache with no hash auto-keys partition point: one disk cache for all partitions and one memory cache for each partition.
- Dynamic cache with no hash auto-keys partition point: one disk cache for all partitions and one memory cache for all partitions.
- Static or dynamic cache with a hash auto-keys partition point: one disk cache and one memory cache for each partition.
When you create multiple partitions in a session with a Lookup transformation and create a hash auto-keys partition point at the Lookup transformation, the Integration Service uses cache partitioning. When the Integration Service uses cache partitioning, it creates caches for the Lookup transformation when the first row of any partition reaches the Lookup transformation. If you configure the Lookup transformation for concurrent caches, the Integration Service builds all caches for the partitions concurrently.
Sharing Caches
The Integration Service handles shared lookup caches differently depending on whether the cache is static or dynamic:
Static cache. If two Lookup transformations share a static cache, the Integration Service does not allocate additional memory for shared transformations in the same pipeline stage. For shared transformations in different pipeline stages, the Integration Service does allocate additional memory. Static Lookup transformations that use the same data or a subset of data to create a disk cache can share the disk cache. However, the lookup keys may be different, so the transformations must have separate memory caches.
Dynamic cache. When Lookup transformations share a dynamic cache, the Integration Service updates the memory cache and disk cache. To keep the caches synchronized, the Integration Service must share the disk cache and the corresponding memory cache between the transformations.
The following table describes the input you provide to calculate the Lookup cache sizes:
- Number of Rows with Unique Lookup Keys (Required). Number of rows in the lookup source with unique lookup keys.
- Data Movement Mode (Required). The data movement mode of the Integration Service. The cache requirement varies based on the data movement mode. ASCII characters use one byte. Unicode characters use two bytes.
Enter the input and then click Calculate to calculate the data and index cache sizes. The calculated values appear in the Data Cache Size and Index Cache Size fields. For more information about configuring cache sizes, see Configuring the Cache Size on page 775.
Rank Caches
The Integration Service uses cache memory to process Rank transformations. It stores data in rank memory until it completes the rankings. When the Integration Service runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Integration Service replaces the stored row with the input row. For example, you configure a Rank transformation to find the top three sales. The Integration Service reads the following input data:
SALES
10,000
12,210
5,000
2,455
6,324
The Integration Service caches the first three rows (10,000, 12,210, and 5,000). When the Integration Service reads the next row (2,455), it compares it to the cache values. Since the row is lower in rank than the cached rows, it discards the row with 2,455. The next row (6,324), however, is higher in rank than one of the cached rows. Therefore, the Integration Service replaces the cached row with the higher-ranked input row. If the Rank transformation is configured to rank across multiple groups, the Integration Service ranks incrementally for each group it finds. The Integration Service creates the following caches for the Rank transformation:
- Data cache. Stores ranking information based on the group by ports.
- Index cache. Stores group values as configured in the group by ports.
By default, the Integration Service creates one memory cache and disk cache for all partitions. If you create multiple partitions for the session, the Integration Service uses cache partitioning. It creates one disk cache for the Rank transformation and one memory cache for each partition, and routes data from one partition to another based on group key values of the transformation.
The following table describes the input you provide to calculate the Rank cache sizes:
- Number of Groups (Required). Number of groups. The Rank transformation ranks data by group. Determine the number of groups using the group by ports. For example, if you group by Store ID and Item ID, have 5 stores and 25 items, and each store has all 25 items, then calculate the number of groups as:
5 * 25 = 125 groups
- Number of Ranks (Read-only). Number of items in the ranking. For example, if you want to rank the top 10 sales, you have 10 ranks. The cache calculator populates this value based on the value set in the Rank transformation.
- Data Movement Mode (Required). The data movement mode of the Integration Service. The cache requirement varies based on the data movement mode. ASCII characters use one byte. Unicode characters use two bytes.
Enter the input and then click Calculate to calculate the data and index cache sizes. The calculated values appear in the Data Cache Size and Index Cache Size fields. For more information about configuring cache sizes, see Configuring the Cache Size on page 775.
Troubleshooting
Use the information in this section to help troubleshoot caching for a Rank transformation.
The following memory allocation error appears in the session log when I run a session with a rank cache size greater than 4 GB and the Integration Service process runs on a Hewlett Packard 64-bit machine with a PA RISC processor:
FATAL 8/17/2006 5:12:21 PM node01_havoc *********** FATAL ERROR : Failed to allocate memory (out of virtual memory). *********** FATAL 8/17/2006 5:12:21 PM node01_havoc *********** FATAL ERROR : Aborting the DTM process due to memory allocation failure. ***********
By default, a 64-bit HP-UX machine with a PA RISC processor allocates up to 4 GB of memory for each process. If a session requires more than 4 GB of memory, increase the maximum data memory limit for the machine using the maxdsiz_64bit operating system variable. For more information about maxdsiz_64bit, see the following URL:
http://docs.hp.com/en/B3921-90010/maxdsiz.5.html
Sorter Caches
The Integration Service uses cache memory to process Sorter transformations. The Integration Service passes all incoming data into the Sorter transformation before it performs the sort operation. The Integration Service creates a sorter cache to store sort keys and data while the Integration Service sorts the data. By default, the Integration Service creates one memory cache and disk cache for all partitions. If you create multiple partitions in the session, the Integration Service uses cache partitioning. It creates one disk cache for the Sorter transformation and one memory cache for each partition. The Integration Service creates a separate cache for each partition and sorts each partition separately. If you do not configure the cache size to sort all of the data in memory, a warning appears in the session log, stating that the Integration Service made multiple passes on the source data. The Integration Service makes multiple passes on the data when it has to page information to disk to complete the sort. The message specifies the number of bytes required for a single pass, which is when the Integration Service reads the data once and performs the sort in memory without paging to disk. To increase session performance, configure the cache size so that the Integration Service makes one pass on the data.
The following table describes the input you provide to calculate the Sorter cache size:
- Number of Rows (Required). Number of rows.
- Data Movement Mode (Required). The data movement mode of the Integration Service. The cache requirement varies based on the data movement mode. ASCII characters use one byte. Unicode characters use two bytes.
Enter the input and then click Calculate to calculate the sorter cache size. The calculated value appears in the Sorter Cache Size field. For more information about configuring cache sizes, see Configuring the Cache Size on page 775.
XML Target Caches
Data cache. Stores XML row data while it generates an XML target document. Stores one data cache for all groups. Index caches. Stores primary keys or foreign keys. Creates a primary key index cache and a foreign key index cache for each group.
The following table shows how to calculate the size of the data cache and trees for a group:
Data Cache            = (Number of rows in a group) x (Row size of the group)
Primary Key Tree Size = (Number of rows in a group) x (Primary key index cache size)
Foreign Key Tree Size = (Number of rows in a group) x (Foreign key index cache size)
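As a worked illustration of these formulas, a small Python sketch (the group's row count, row size, and per-group index cache sizes are hypothetical inputs you would measure from your own data):

```python
def xml_group_cache_sizes(rows_in_group, row_size,
                          pk_index_cache_size, fk_index_cache_size):
    """Apply the per-group formulas from the table above."""
    data_cache = rows_in_group * row_size
    pk_tree = rows_in_group * pk_index_cache_size
    fk_tree = rows_in_group * fk_index_cache_size
    return data_cache, pk_tree, fk_tree

# 10,000 rows of 512 bytes with hypothetical 16-byte index cache entries
print(xml_group_cache_sizes(10_000, 512, 16, 16))
```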
For more information about auto memory, see Using Auto Memory Size on page 776.
Note: You cannot use the cache calculator to configure the cache size for an XML target.
A session log might show, for example, that the index cache requires 286,720 bytes and the data cache requires 1,774,368 bytes to process the transformation in memory without paging to disk. The cache size may vary depending on changes to the session or source data, so review the session logs after subsequent session runs to monitor changes to the cache size. You must set the tracing level to Verbose Initialization in the session properties for the Integration Service to write the transformation statistics to the session log. For more information about configuring tracing levels, see Setting Tracing Levels on page 662.
Note: The session log does not contain transformation statistics for a Sorter transformation, a Joiner transformation with sorted input, an Aggregator transformation with sorted input, or an XML target.
Appendix A: Session Properties Reference
This appendix describes the following session properties tabs:

- General Tab, 802
- Properties Tab, 804
- Config Object Tab, 811
- Mapping Tab (Transformations View), 821
- Mapping Tab (Partitions View), 843
- Components Tab, 848
- Metadata Extensions Tab, 856
General Tab
By default, the General tab appears when you edit a session task. Figure A-1 shows the General tab:
Figure A-1. General Tab
On the General tab, you can rename the session task and enter a description for the session task. Table A-1 describes settings on the General tab:
Table A-1. General Tab
- Rename (Optional). You can enter a new name for the session task with the Rename button.
- Description (Optional). You can enter a description for the session task in the Description field.
- Mapping name (Required). Name of the mapping associated with the session task.
- Resources (Optional). You can associate an object with an available resource.
- Fail Parent if This Task Fails (Optional). Fails the parent worklet or workflow if this task fails.
- Disable Task (Optional). Disables the task.
- Treat the Input Links as AND or OR (Required). Runs the task when all or one of the input link conditions evaluate to True.
Properties Tab
On the Properties tab, you can configure the following settings:
- General Options. General Options settings allow you to configure the session log file name, session log file directory, parameter file name, and other general session settings. For more information, see General Options Settings on page 804.
- Performance. The Performance settings allow you to increase memory size, collect performance details, and set configuration parameters. For more information, see Performance Settings on page 807.
Table A-2 describes the General Options settings on the Properties tab:
Table A-2. Properties Tab - General Options Settings
- Write Backward Compatible Session Log File (Optional). Select to write the session log to a file.
- Session Log File Name (Optional). By default, the Integration Service uses the session name for the log file name: s_mapping name.log. For a debug session, it uses DebugSession_mapping name.log. Enter a file name, a file name and directory, or use the $PMSessionLogFile session parameter. The Integration Service appends information in this field to that entered in the Session Log File Directory field. For example, if you have C:\session_logs\ in the Session Log File Directory field and enter logname.txt in this field, the Integration Service writes logname.txt to the C:\session_logs\ directory. You can also use the $PMSessionLogFile session parameter to represent the name, or the name and location, of the session log. For more information about session parameters, see Parameter Files on page 687.
- Session Log File Directory (Required). Designates a location for the session log file. By default, the Integration Service writes the log file in the service process variable directory, $PMSessionLogDir. If you enter a full directory and file name in the Session Log File Name field, clear this field.
- Parameter Filename (Optional). Designates the name and directory for the parameter file. Use the parameter file to define session parameters and override values of mapping parameters and variables. For more information about session parameters, see Parameter Files on page 687. For more information about mapping parameters and variables, see the PowerCenter Designer Guide. You can enter a workflow or worklet variable as the session parameter file name if you configure a workflow to run concurrently and you want to use different parameter files for the sessions in each workflow run instance. For more information, see Using Variables to Specify Session Parameter Files on page 681.
- Enable Test Load (Optional). You can configure the Integration Service to perform a test load. With a test load, the Integration Service reads and transforms data without writing to targets. It generates all session files and performs all pre- and post-session functions, as if running the full session. The Integration Service writes data to relational targets but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets. You cannot perform a test load on sessions that use XML sources. Note: You can perform a test load when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.
- Number of Rows to Test (Optional). Enter the number of source rows you want the Integration Service to test load.
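The directory-plus-name concatenation in these fields is mechanical. A minimal sketch of the resolution rule, using Windows path semantics to match the example above (this mirrors the documented behavior, not Informatica internals):

```python
import ntpath  # Windows path rules, matching the C:\ examples in this guide

def resolve_session_log(log_dir, log_file):
    # A full path in the file name field stands alone (the guide says to
    # clear the directory field); otherwise the two fields are joined.
    if ntpath.isabs(log_file):
        return log_file
    return ntpath.join(log_dir, log_file)

print(resolve_session_log("C:\\session_logs", "logname.txt"))
# C:\session_logs\logname.txt
```

The same pattern applies to the source file, target file, and reject file directory and name fields described later in this appendix.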
Other General Options settings on the Properties tab include Commit Type (Required), Commit Interval (Required), and Java Classpath (Optional).
Tip: When you bulk load to Microsoft SQL Server or Oracle targets, define a large commit interval. Microsoft SQL Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the number of bulk load transactions and increases performance.
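A quick arithmetic sketch shows why the interval matters (the row count and intervals are hypothetical):

```python
import math

def bulk_load_transactions(total_rows, commit_interval):
    # One bulk load transaction is started after each commit, so the
    # transaction count shrinks as the commit interval grows.
    return math.ceil(total_rows / commit_interval)

print(bulk_load_transactions(10_000_000, 10_000))     # 1000 transactions
print(bulk_load_transactions(10_000_000, 1_000_000))  # 10 transactions
```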
Performance Settings
You can configure performance settings on the Properties tab. In Performance settings, you can increase memory size, collect performance details, and set configuration parameters.
Config Object Tab

On the Config Object tab, you can configure the following settings:

- Advanced. Advanced settings allow you to configure constraint-based loading, lookup caches, and buffer sizes. For more information, see Advanced Settings on page 811.
- Log Options. Log options allow you to configure how you want to save the session log. By default, the Log Manager saves only the current session log. For more information, see Log Options Settings on page 814.
- Error Handling. Error Handling settings allow you to determine whether the session fails or continues when it encounters pre-session command errors, stored procedure errors, or a specified number of session errors. For more information, see Error Handling Settings on page 816.
- Partitioning Options. Partitioning options allow the Integration Service to determine the number of partitions to create at run time. For more information, see Partitioning Options on page 818.
- Session on Grid. When Session on Grid is enabled, the Integration Service distributes session threads to the nodes in a grid to increase performance and scalability. For more information, see Session on Grid on page 819.
Advanced Settings
Advanced settings allow you to configure constraint-based loading, lookup caches, and buffer sizes.
Figure A-4 shows the Advanced settings on the Config Object tab:
Figure A-4. Config Object Tab - Advanced Settings
Table A-4 describes the Advanced settings of the Config Object tab:
Table A-4. Config Object Tab - Advanced Settings
- Constraint Based Load Ordering (Optional). The Integration Service loads targets based on primary key-foreign key constraints where possible.
- Cache Lookup() Function (Optional). If selected, the Integration Service caches PowerMart 3.5 LOOKUP functions in the mapping, overriding mapping-level LOOKUP configurations. If not selected, the Integration Service performs lookups on a row-by-row basis, unless otherwise specified in the mapping.
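Constraint-based load ordering amounts to loading parent tables before the child tables that reference them. A minimal sketch of that ordering idea, with a hypothetical map of each target to the parents its foreign keys reference (an illustration of the concept, not the Integration Service's implementation):

```python
from graphlib import TopologicalSorter

# Hypothetical PK-FK relationships: target table -> parent tables it references.
fk_parents = {
    "CUSTOMERS": set(),
    "PRODUCTS": set(),
    "ORDERS": {"CUSTOMERS"},
    "ORDER_ITEMS": {"ORDERS", "PRODUCTS"},
}

# static_order() yields parents before children, which is the order
# constraint-based loading must respect when loading the targets.
print(list(TopologicalSorter(fk_parents).static_order()))
# e.g. ['CUSTOMERS', 'PRODUCTS', 'ORDERS', 'ORDER_ITEMS']
```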
Other Advanced settings on this tab include Custom Properties (Optional).
Log Options Settings

Figure A-5 shows the Log Options settings on the Config Object tab:
Figure A-5. Config Object Tab - Log Option Settings
Table A-5 shows the Log Options settings of the Config Object tab:
Table A-5. Config Object Tab - Log Options Settings
- Save Session Log By (Required). Configure this option when you choose to save session log files. If you select Save Session Log by Timestamp, the Log Manager saves all session logs, appending a timestamp to each log. If you select Save Session Log by Runs, the Log Manager saves a designated number of session logs; configure the number in the Save Session Log for These Runs option. You can also use the $PMSessionLogCount service variable to save the configured number of session logs for the Integration Service. For more information about these options, see Session Logs on page 661.
- Save Session Log for These Runs (Required). Number of historical session logs you want the Log Manager to save. The Log Manager saves the number of historical logs you specify, plus the most recent session log. When you configure 5 runs, the Log Manager saves the most recent session log plus historical logs 0-4, for a total of 6 logs. You can configure up to 2,147,483,647 historical logs. If you configure 0 logs, the Log Manager saves only the most recent session log.
Error Handling Settings

Table A-6 describes the Error Handling settings of the Config Object tab:
Table A-6. Config Object Tab - Error Handling Settings
- Stop On Errors (Optional). Indicates how many non-fatal errors the Integration Service can encounter before it stops the session. Non-fatal errors include reader, writer, and DTM errors. Enter the number of non-fatal errors you want to allow before stopping the session. The Integration Service maintains an independent error count for each source, target, and transformation. If you specify 0, non-fatal errors do not cause the session to stop. Optionally, use the $PMSessionErrorThreshold service variable to stop on the configured number of errors for the Integration Service.
- Override Tracing (Optional). Overrides tracing levels set at the transformation level. Selecting this option enables a menu from which you choose a tracing level: None, Terse, Normal, Verbose Initialization, or Verbose Data. For more information about tracing levels, see Session Logs on page 661.
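The independent per-object error counts can be pictured with a small sketch (the object names and threshold are hypothetical):

```python
from collections import Counter

STOP_ON_ERRORS = 10       # 0 means non-fatal errors never stop the session
error_counts = Counter()  # independent count per source, target, transformation

def record_non_fatal_error(object_name):
    error_counts[object_name] += 1
    if STOP_ON_ERRORS and error_counts[object_name] >= STOP_ON_ERRORS:
        raise RuntimeError(
            f"stopping session: {object_name} reached {STOP_ON_ERRORS} errors")

record_non_fatal_error("SQ_ORDERS")  # counts for the source qualifier only
record_non_fatal_error("T_ORDERS")   # a separate, independent count
```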
The Error Handling settings also include the row error logging options: Error Log Type, Error Log DB Connection, Error Log Table Name Prefix, Error Log File Directory, Error Log File Name, and Log Row Data.
Partitioning Options
When you configure dynamic partitioning, the Integration Service determines the number of partitions to create at run time. Configure dynamic partitioning on the Config Object tab of session properties. Figure A-7 shows the partitioning options:
Figure A-7. Config Object Tab - Partitioning Options
Table A-7 describes the Partitioning Options settings on the Config Object tab:
Table A-7. Config Object Tab - Partitioning Options
- Dynamic Partitioning (Required). Configure dynamic partitioning using one of the following methods. Default is disabled.
  - Disabled. Do not use dynamic partitioning. Define the number of partitions on the Mapping tab.
  - Based on number of partitions. Sets the partitions to a number that you define in the Number of Partitions attribute. Use the $DynamicPartitionCount session parameter, or enter a number greater than 1.
  - Based on number of nodes in grid. Sets the partitions to the number of nodes in the grid running the session. If you configure this option for sessions that do not run on a grid, the session runs in one partition and logs a message in the session log.
  - Based on source partitioning. Determines the number of partitions using database partition information. The number of partitions is the maximum of the number of partitions at the source.
- Number of Partitions (Required). Determines the number of partitions that the Integration Service creates when you configure dynamic partitioning based on the number of partitions. Enter a value greater than 1 or use the $DynamicPartitionCount session parameter.
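A sketch of how the four methods resolve to a partition count at run time (all arguments stand in for values the Integration Service discovers when the session starts; this illustrates the documented rules, not PowerCenter code):

```python
def resolve_partition_count(method, configured=None, grid_nodes=None,
                            source_partitions=None):
    if method == "disabled":
        return None  # use the partitions defined on the Mapping tab
    if method == "number_of_partitions":
        return configured  # Number of Partitions or $DynamicPartitionCount
    if method == "nodes_in_grid":
        # Off-grid sessions fall back to one partition (and log a message).
        return grid_nodes if grid_nodes else 1
    if method == "source_partitioning":
        return max(source_partitions)  # maximum partitions at the source
    raise ValueError(f"unknown method: {method}")

print(resolve_partition_count("nodes_in_grid", grid_nodes=4))              # 4
print(resolve_partition_count("source_partitioning",
                              source_partitions=[2, 8]))                   # 8
```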
Session on Grid
When Session on Grid is enabled, the Integration Service distributes workflows and session threads to the nodes in a grid to increase performance and scalability.
Figure A-8 shows the Session on Grid option on the Config Object tab:
Figure A-8. Config Object Tab - Session on Grid
Table A-8 describes the Session on Grid setting on the Config Object tab:
Table A-8. Config Object Tab - Session on Grid
- Is Enabled (Optional). Specifies whether the session runs on a grid.
Mapping Tab (Transformations View)

Connections Node
The Connections node displays the source, target, lookup, stored procedure, FTP, external loader, and queue connections. You can choose connection types and connection values. You can also edit connection object values. Figure A-9 shows the Connections settings on the Mapping tab:
Figure A-9. Mapping Tab - Connections Settings
- Partitions (n/a). Displays the partitions if the session is partitioned.
- Value (Required). Enter a connection value based on the connection type.
Sources Node
The Sources node lists the sources used in the session and displays their settings. If you want to view and configure the settings of a specific source, select the source from the list. You can configure the following settings:
- Readers. The Readers settings display the reader the Integration Service uses with each source instance. For more information, see Readers Settings on page 822.
- Connections. The Connections settings let you configure connections for the sources. For more information, see Connections Settings on page 823.
- Properties. The Properties settings let you configure the source properties. For more information, see Properties Settings on page 824.
Readers Settings
You can view the reader the Integration Service uses with each source instance. The Workflow Manager specifies the necessary reader for each source instance. For relational sources, the reader is the Relational Reader; for file sources, it is the File Reader.
Figure A-10 shows the Readers settings on the Mapping tab (Sources node):
Figure A-10. Mapping Tab - Sources Node - Readers Settings
Connections Settings
You can configure the connections the Integration Service uses with each source instance.
Figure A-11 shows the Connections settings on the Mapping tab (Sources node):
Figure A-11. Mapping Tab - Sources Node - Connections Settings
Table A-10 describes the Connections settings on the Mapping tab (Sources node):
Table A-10. Mapping Tab - Sources Node - Connections Settings
- Type (Required). Enter the connection type for relational and non-relational sources. Specifies Relational for relational sources. For more information about configuring connections in a session, see Configuring Session Connections on page 40.
- Value (Required). Enter a source connection based on the value you choose in the Type column.
Properties Settings
Click the Properties settings to define source property information. The Workflow Manager displays properties for both relational and file sources.
Figure A-12 shows the Properties settings on the Mapping tab (Sources node):
Figure A-12. Mapping Tab - Sources Node - Properties Settings
Table A-11 describes Properties settings on the Mapping tab for relational sources:
Table A-11. Mapping Tab - Sources Node - Properties Settings (Relational Sources)
- Owner Name (Optional). Specifies the table owner name.
- Source Table Name (Optional). Specifies the source table name. Enter a table name, or enter a parameter or variable to define the source table name in a parameter file. You can use mapping parameters, mapping variables, session parameters, workflow variables, or worklet variables. For more information about using parameter files, see Parameter Files on page 687. If you override the source table name here and also override it using an SQL query, the Integration Service uses the source table name defined in the SQL query.
- User Defined Join (Optional). Specifies the condition used to join data from multiple sources represented in the same Source Qualifier transformation. For more information about user-defined joins, see Source Qualifier Transformation in the PowerCenter Transformation Guide.
- Tracing Level (n/a). Specifies the amount of detail included in the session log when you run a session containing this transformation. You can view the value of this attribute when you click Show all properties. For more information about tracing levels, see Setting Tracing Levels on page 662.
- Select Distinct (Optional). Selects unique rows.
- Pre SQL (Optional). Pre-session SQL commands to run against the source database before the Integration Service reads the source. For more information about pre-session SQL, see Using Pre- and Post-Session SQL Commands on page 223.
- Post SQL (Optional). Post-session SQL commands to run against the source database after the Integration Service writes to the target. For more information about post-session SQL, see Using Pre- and Post-Session SQL Commands on page 223.
- Sql Query (Optional). Defines a custom query that replaces the default query the Integration Service uses to read data from sources represented in this Source Qualifier. A custom query overrides entries for a custom join or a source filter. For more information, see Overriding the SQL Query on page 259.
- Source Filter (Optional). Specifies the filter condition the Integration Service applies when querying records. For more information, see Source Qualifier Transformation in the PowerCenter Transformation Guide.
Table A-12 describes the Properties settings on the Mapping tab for file sources:
Table A-12. Mapping Tab - Sources Node - Properties Settings (File Sources)
- Source File Directory (Optional). Enter the directory name in this field. By default, the Integration Service looks in the service process variable directory, $PMSourceFileDir, for file sources. If you specify both the directory and file name in the Source Filename field, clear this field. The Integration Service concatenates this field with the Source Filename field when it runs the session. You can also use the $InputFileName session parameter to specify the file directory. For more information about session parameters, see Parameter Files on page 687.
- Source Filename (Required). Enter the file name, or file name and path. Optionally, use the $InputFileName session parameter for the file name. The Integration Service concatenates this field with the Source File Directory field when it runs the session. For example, if you have C:\data\ in the Source File Directory field and enter filename.dat in the Source Filename field, the Integration Service looks for C:\data\filename.dat when it begins the session. By default, the Workflow Manager enters the file name configured in the source definition. For more information about session parameters, see Parameter Files on page 687.
- Source Filetype (Required). Indicates whether the source file contains the source data or a list of files with the same file properties; you can configure multiple file sources using a file list. Select Direct if the source file contains the source data. Select Indirect if the source file contains a list of files. When you select Indirect, the Integration Service finds the file list, then reads each listed file when it executes the session. For more information about file lists, see Using a File List on page 277.
- Set File Properties. You can configure the file properties. For more information, see Setting File Properties for Sources on page 827.
- Datetime Format*. Displays the datetime format for datetime fields.
- Thousands Separator*. Displays the thousand separator for numeric fields.
- Decimal Separator*. Displays the decimal separator for numeric fields.

*You can view the value of this attribute when you click Show all properties. This attribute is read-only. For more information, see the PowerCenter Designer Guide.
Select the file type (fixed-width or delimited) you want to configure and click Advanced.
Note: Edit these settings only if you need to override those configured in the source definition.
Figure A-14 shows the Fixed Width Properties dialog box for flat file sources:
Figure A-14. Fixed Width Properties
Table A-13 describes the options you define in the Fixed Width Properties dialog box for sources:
Table A-13. Fixed-Width Properties for File Sources
- Null Character: Text/Binary (Required). Indicates the character representing a null value in the file. This can be any valid character in the file code page, or any binary value from 0 to 255. For more information about specifying null characters, see Null Character Handling on page 274.
- Repeat Null Character (Optional). If selected, the Integration Service reads repeated null characters in a single field as a single null value. If you do not select this option, the Integration Service reads a single null character at the beginning of a field as a null field. Important: For multibyte code pages, specify a single-byte null character if you use repeating non-binary null characters. This ensures that repeating null characters fit into the column. For more information about specifying null characters, see Null Character Handling on page 274.
- Code Page (Required). Select the code page of the fixed-width file. The default setting is the client code page.
- Number of Initial Rows to Skip (Optional). The Integration Service skips the specified number of rows before reading the file. Use this to skip header rows. One row may contain multiple records. If you select the Line Sequential File Format option, the Integration Service ignores this option. You can enter any integer from zero to 2147483647.
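The repeat-null rule is easy to misread, so here is a sketch of the two interpretations, assuming a hypothetical null character of `*` (an illustration of the documented behavior, not Informatica code):

```python
NULL_CHAR = "*"  # hypothetical null character configured for the file

def is_null_field(field, repeat_null_character):
    if repeat_null_character:
        # The entire field must be the null character repeated.
        return field != "" and field == NULL_CHAR * len(field)
    # Otherwise a single null character at the start marks a null field.
    return field.startswith(NULL_CHAR)

print(is_null_field("****", repeat_null_character=True))   # True
print(is_null_field("*abc", repeat_null_character=True))   # False
print(is_null_field("*abc", repeat_null_character=False))  # True
```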
Figure A-15 shows the Delimited File Properties dialog box for flat file sources:
Figure A-15. Delimited Properties for File Sources
Table A-14 describes the options you can define in the Delimited File Properties dialog box for flat file sources:
Table A-14. Delimited Properties for File Sources
- Delimiters (Required). Character used to separate columns of data in the source file. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). To enter a single-byte unprintable character, click the Browse button to the right of this field. In the Delimiters dialog box, select an unprintable character from the Insert Delimiter list and click Add. You cannot select unprintable multibyte characters as delimiters. The delimiter must be in the same code page as the flat file code page.
- Optional Quotes (Required). Select None, Single, or Double. If you select a quote character, the Integration Service ignores delimiter characters within the quote characters. Therefore, the Integration Service uses quote characters to escape the delimiter. For example, a source file uses a comma as a delimiter and contains the following row: 342-3849, 'Smith, Jenna', 'Rockville, MD', 6. If you select the optional single quote character, the Integration Service ignores the commas within the quotes and reads the row as four fields. If you do not select the optional single quote, the Integration Service reads six separate fields. When the Integration Service reads two optional quote characters within a quoted string, it treats them as one quote character. For example, the Integration Service reads the following quoted string as I'm going tomorrow: 2353, 'I''m going tomorrow.', MD. Additionally, if you select an optional quote character, the Integration Service reads a string as a quoted string only if the quote character is the first character of the field. Note: You can improve session performance if the source file does not contain quotes or escape characters.
- Code Page (Required). Select the code page of the delimited file. The default setting is the client code page.
- Row Delimiter (Optional). Specify a line break character. Select from the list or enter a character. Preface an octal code with a backslash (\). To use a single character, enter the character. The Integration Service uses only the first character when the entry is not preceded by a backslash. The character must be a single-byte character, and no other character in the code page can contain that byte. Default is linefeed, \012 LF (\n).
- Escape Character (Optional). Character immediately preceding a delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string. When you specify an escape character, the Integration Service reads the delimiter character as a regular character (called escaping the delimiter or quote character). Note: You can improve session performance for mappings containing Sequence Generator transformations if the source file does not contain quotes or escape characters.
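The four-versus-six-fields behavior is the same as any CSV parser with an optional quote character. A quick demonstration with Python's csv module standing in for the Integration Service's reader:

```python
import csv
import io

row = "342-3849, 'Smith, Jenna', 'Rockville, MD', 6\n"

# With the single-quote character selected, commas inside quotes do not split.
with_quotes = next(csv.reader(io.StringIO(row), quotechar="'",
                              skipinitialspace=True))
print(len(with_quotes))     # 4 fields

# With no quote character, every comma is a delimiter.
without_quotes = next(csv.reader(io.StringIO(row), quoting=csv.QUOTE_NONE))
print(len(without_quotes))  # 6 fields
```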
Targets Node
The Targets node lists the targets used in the session and displays their settings. If you want to view and configure the settings of a specific target, select the target from the list. You can configure the following settings:

- Writers. The Writers settings display the writer the Integration Service uses with each target instance. For more information, see Writers Settings on page 831.
- Connections. The Connections settings let you configure connections for the targets. For more information, see Connections Settings on page 832.
- Properties. The Properties settings let you configure the target properties. For more information, see Properties Settings on page 834.
Writers Settings
You can view and configure the writer the Integration Service uses with each target instance. The Workflow Manager specifies the necessary writer for each target instance. For relational targets, the writer is the Relational Writer; for file targets, it is the File Writer.
Figure A-16 shows the Writers settings on the Mapping tab (Targets node):
Figure A-16. Mapping Tab - Targets Node - Writers Settings
Table A-15 describes the Writers settings on the Mapping tab (Targets node):
Table A-15. Mapping Tab - Targets Node - Writers Settings
- Writers (Required). For relational targets, choose Relational Writer or File Writer. When the target in the mapping is a flat file, an XML file, an SAP BW target, or an MQ target, the Workflow Manager specifies the necessary writer in the session properties. When you choose File Writer for a relational target, use an external loader to load data to this target. For more information, see External Loading on page 711. When you override a relational target to use the file writer, the Workflow Manager changes the properties for that target instance on the Properties settings. It also changes the connection options you can define on the Connections settings. After you override a relational target to use a file writer, define the file properties for the target: click Set File Properties and choose the target to define. For more information, see Configuring Fixed-Width Properties on page 319 and Configuring Delimited Properties on page 320.
Connections Settings
You can enter connection types and specific target database connections on the Targets node of the Mappings tab.
Figure A-17 shows the Connections settings on the Mapping tab (Targets node):
Figure A-17. Mapping Tab - Targets Node - Connections Settings
Table A-16 describes the Connections settings on the Mapping tab (Targets node):
Table A-16. Mapping Tab - Targets Node - Connections Settings
- Type (Required). Enter the connection type for relational and non-relational targets. You can choose a connection object based on the connection type: Application, Queue, External Loader, FTP, or None. For more information about configuring connections in a session, see Configuring Session Connections on page 40.
- Partitions (n/a). Displays the partitions if the session is partitioned.
- Value (Required). Enter a target connection based on the value you choose in the Type column. For more information about configuring connections in a session, see Configuring Session Connections on page 40.
Properties Settings
Click the Properties settings to define target property information. The Workflow Manager displays different properties for the different target types: relational, flat file, and XML.
Table A-17 describes the Properties settings on the Mapping tab for relational targets:
Table A-17. Mapping Tab - Targets Node - Properties Settings (Relational)
- Target Load Type (Required). You can choose Normal or Bulk. If you select Normal, the Integration Service loads targets normally. You can choose Bulk when you load to IBM DB2, Sybase, Oracle, or Microsoft SQL Server; the Integration Service then invokes the bulk API with default settings, bypassing database logging. If you select Bulk for other database types, the Integration Service reverts to a normal load. Loading in bulk mode can improve session performance but limits the ability to recover because no database logging occurs. For more information about bulk loading, see Bulk Loading on page 304.
- Insert (Optional). If selected, the Integration Service inserts all rows flagged for insert. By default, this option is selected. For more information about the update strategy options, see Update Strategy Transformation in the PowerCenter Transformation Guide.
- Update (as Update) (Optional). If selected, the Integration Service updates all rows flagged for update. By default, this option is selected.
- Update (as Insert) (Optional). If selected, the Integration Service inserts all rows flagged for update. By default, this option is not selected.
- Update (else Insert) (Optional). If selected, the Integration Service updates rows flagged for update if they exist in the target, then inserts any remaining rows marked for insert.
- Delete (Optional). If selected, the Integration Service deletes all rows flagged for delete.
- Truncate Table (Optional). If selected, the Integration Service truncates the target before loading. For more information, see Truncating Target Tables on page 298.
- Reject File Directory (Optional). Enter the directory name in this field. By default, the Integration Service writes all reject files to the service process variable directory, $PMBadFileDir. If you specify both the directory and file name in the Reject Filename field, clear this field. The Integration Service concatenates this field with the Reject Filename field when it runs the session. You can also use the $BadFileName session parameter to specify the file directory. For more information about session parameters, see Parameter Files on page 687.
Additional relational target properties include Rejected Truncated/Overflowed Rows*, Update Override*, Table Name Prefix, and Pre SQL.

*You can view the value of this attribute when you click Show all properties. This attribute is read-only. For more information, see the PowerCenter Designer Guide.
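The update strategy options above interact, and Update (else Insert) in particular is easiest to see as a two-step decision. A small sketch with a hypothetical in-memory target keyed by primary key (an illustration of the documented row handling, not the writer's implementation):

```python
target = {}  # hypothetical target table: primary key -> row

def apply_row(flag, key, row, update_else_insert=True):
    if flag == "insert":
        target[key] = row
    elif flag == "update":
        if key in target:
            target[key] = row        # the row exists, so update it
        elif update_else_insert:
            target[key] = row        # the row is missing, so insert it
    elif flag == "delete":
        target.pop(key, None)

apply_row("update", 42, {"name": "Smith"})  # 42 absent: inserted
apply_row("update", 42, {"name": "Jenna"})  # 42 present: updated
```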
Table A-18 describes the Properties settings on the Mapping tab for file targets:
Table A-18. Mapping Tab - Targets Node - File Properties Settings
- Merge Partitioned Files (Optional). When selected, the Integration Service merges the partitioned target files into one file when the session completes, and then deletes the individual output files. If the Integration Service fails to create the merged file, it does not delete the individual output files. You cannot merge files if the session uses FTP, an external loader, or a message queue.
- Merge File Directory (Optional). Enter the directory name in this field. By default, the Integration Service writes the merged file in the service process variable directory, $PMTargetFileDir. If you enter a full directory and file name in the Merge File Name field, clear this field.
- Merge File Name (Optional). Name of the merge file. Default is target_name.out. This property is required if you select Merge Partitioned Files.
The file target properties also include Output Filename (Required) and Reject Filename (Required).
*You can view the value of this attribute when you click Show all properties. This attribute is read-only. For more information, see the PowerCenter Designer Guide.
Select the file type (fixed-width or delimited) you want to configure and click Advanced.
Figure A-21 shows the Fixed-Width Properties dialog box for flat file targets:
Figure A-21. Fixed-Width Properties for File Targets
Table A-19 describes the options you define in the Fixed Width Properties dialog box:
Table A-19. Fixed-Width Properties for File Targets
- Null Character (Required). Enter the character you want the Integration Service to use to represent null values. You can enter any valid character in the file code page. For more information about specifying null characters for target files, see Null Characters in Fixed-Width Files on page 327.
- Repeat Null Character (Optional). Select this option to indicate a null value by repeating the null character to fill the field. If you do not select this option, the Integration Service enters a single null character at the beginning of the field to represent a null value. For more information about specifying null characters for target files, see Null Characters in Fixed-Width Files on page 327.
- Code Page (Required). Select the code page of the fixed-width file. The default setting is the client code page.
Table A-20 describes the options you can define in the Delimited File Properties dialog box for flat file targets:
Table A-20. Delimited Properties for File Targets
- Delimiters (Required). Character used to separate columns of data. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). To enter a single-byte unprintable character, click the Browse button to the right of this field. In the Delimiters dialog box, select an unprintable character from the Insert Delimiter list and click Add. You cannot select unprintable multibyte characters as delimiters.
- Optional Quotes (Required). Select No Quotes, Single Quote, or Double Quotes. If you select a quote character, the Integration Service does not treat delimiter characters within the quote characters as a delimiter. For example, suppose an output file uses a comma as a delimiter and the Integration Service receives the following row: 342-3849, 'Smith, Jenna', 'Rockville, MD', 6. If you select the optional single quote character, the Integration Service ignores the commas within the quotes and writes the row as four fields. If you do not select the optional single quote, the Integration Service writes six separate fields.
- Code Page (Required). Select the code page of the delimited file. The default setting is the client code page.
Transformations Node
On the Transformations node, you can override properties that you configure in transformation and target instances in a mapping. The attributes you can configure depend on the type of transformation you select.
Mapping Tab (Partitions View)

In the Partitions view of the Mapping tab, you can configure the following nodes:

- Partition Properties. For more information, see Partition Properties Node on page 843.
- KeyRange. For more information, see KeyRange Node on page 844.
- HashKeys. For more information, see HashKeys Node on page 844.
- Partition Points. For more information, see Partition Points Node on page 844.
- Non-Partition Points. For more information, see Non-Partition Points Node on page 847.
KeyRange Node
In the KeyRange node, you can configure the partition range for key-range partitioning. Select Edit Keys to edit the partition key. For more information, see Edit Partition Key on page 846. Figure A-25 shows the KeyRange node on the Mapping tab:
Figure A-25. Mapping Tab - KeyRange Node
HashKeys Node
In the HashKeys node, you can configure hash key partitioning. Select Edit Keys to edit the partition key. For more information, see Edit Partition Key on page 846.
Partition Points Node

In the Partition Points node, you can perform the following tasks:

- Add and delete partition points.
- Specify the partition type at each partition point.
- Add and delete partitions.
- Enter a description for each partition.
- Add keys and key ranges for certain partition types.
For more information about partitioning a pipeline, see Understanding Pipeline Partitioning on page 427. Figure A-26 shows the Partition Points node on the Mapping tab:
Figure A-26. Mapping Tab - Partition Points Node
Table A-22 describes the options in the Edit Partition Point dialog box:
Table A-22. Edit Partition Point Dialog Box Options
- Add button. Click to add a partition. You can add up to 64 partitions. For more information about adding partitions, see Steps for Adding Partition Points to a Pipeline on page 444.
- Delete button. Click to delete the selected partition. For more information about deleting partitions, see Steps for Adding Partition Points to a Pipeline on page 444.
- Name. Partition number.
- Description. Enter a description for the current partition.
- Select Partition Type. Select a partition type from the list. For more information, see Setting Partition Types on page 484.
Edit Partition Key

You can specify one or more ports as the partition key. To rearrange the order of the ports that make up the key, select a port in the Selected Ports list and click the up or down arrow. For more information about adding a key for key range partitioning, see Key Range Partition Type on page 493. For more information about adding a key for hash partitioning, see Database Partitioning Partition Type on page 486.
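As a sketch of what a key range partition point then does with that key, the following assigns each row to the partition whose range contains the key value (the key port and boundaries are hypothetical; this illustrates the concept only):

```python
# Hypothetical ranges on a single key port such as CUSTOMER_ID:
# partition 0 takes ids below 1000, partition 1 takes 1000-4999,
# and partition 2 takes everything from 5000 up.
RANGE_STARTS = [None, 1000, 5000]  # start value of each partition's range

def partition_for(key_value):
    chosen = 0
    for i, start in enumerate(RANGE_STARTS):
        if start is None or key_value >= start:
            chosen = i
    return chosen

print(partition_for(250))    # 0
print(partition_for(1200))   # 1
print(partition_for(99999))  # 2
```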
Components Tab
On the Components tab, you can configure pre-session shell commands, post-session commands, email messages to send if the session succeeds or fails, and variable assignments. Figure A-29 shows the Components tab:
Figure A-29. Components Tab
For each component on the tab, you select a Type (Required) and configure a Value (Optional).
Click the Override button to override the Fail Task if Any Command Fails option in the Command task. For more information about the Fail Task if Any Command Fails option, see Table A-26 on page 852.
Click the Open button in the Value field in the Components tab to edit pre- or post-session shell commands. The Edit Pre-Session Command or Edit Post-Session Command dialog box appears. Figure A-31 shows the Edit Pre-Session Command dialog box:
Figure A-31. Edit Pre-Session Command Dialog Box
Table A-25 describes the General tab options for editing pre- or post-session shell commands:

Table A-25. Pre- or Post-Session Commands - General Tab

- Name. Enter a name for the pre- or post-session shell command.
- Make Reusable. Select Make Reusable to create a reusable Command task from the pre- or post-session shell commands. Clear the Make Reusable option if you do not want the Workflow Manager to create a reusable Command task from the shell commands. For more information about creating Command tasks from pre- or post-session shell commands, see Creating a Reusable Command Task from Pre- or Post-Session Commands on page 228.
- Description (Optional). Enter a description for the pre- or post-session shell command.
Table A-26 describes the Properties tab for editing pre- or post-session commands:
Table A-26. Pre- or Post-Session Commands - Properties Tab

- Name. Name of the pre-session shell command.
- Fail Task if Any Command Fails. If you select this option, the Integration Service stops running the rest of the commands in the task when one command fails. If you do not select this option, the Integration Service runs all the commands in the Command task and treats the task as completed, even if a command fails.
Table A-27 describes the Commands tab for editing pre- or post-session commands:
Table A-27. Pre- or Post-Session Commands - Commands Tab

- Name. Name of the pre- or post-session shell command.
- Command. Shell command you want the Integration Service to perform. Enter one command on each line. You can use parameters and variables in shell commands; use any parameter or variable type that you can define in the parameter file. For more information, see Where to Use Parameters and Variables on page 691. If the command contains spaces, enclose the command in quotes. For example, to call c:\program files\myprog.exe, you must enter "c:\program files\myprog.exe", including the quotes.
Reusable Email
Select Reusable in the Type field for the On-Success or On-Failure email if you want to select an existing Email task as the On-Success or On-Failure email. The Email Object Browser appears when you click the right side of the Values field.
Select an Email task to use as On-Success or On-Failure email. Click the Override button to override properties of the email. For more information about email properties, see Table A-29 on page 855.
Non-Reusable Email
Select Non-Reusable in the Type field to create a non-reusable email for the session. Non-reusable emails do not appear as Email tasks in the Task folder. Click the right side of the Values field to edit the properties for the non-reusable On-Success or On-Failure emails. For more information about email properties, see Table A-29 on page 855.
Email Properties
You configure email properties for On-Success or On-Failure Emails when you override an existing Email task or when you create a non-reusable email for the session.
Figure A-33 shows the dialog box for editing the On-Success or On-Failure email properties:
Figure A-33. On-Success or On-Failure Email - General Tab
Table A-28 describes general settings for editing On-Success or On-Failure emails:
Table A-28. On-Success or On-Failure Emails - General Tab
- Name (Required). Enter a name for the email you want to configure.
- Description (Required). Enter a description for the email you want to configure.
Table A-29 describes the email properties for On-Success or On-Failure emails:
Table A-29. On-Success or On-Failure Emails - Properties Tab
- Email User Name (Required). Required to send On-Success or On-Failure session email. Enter the email address of the person you want the Integration Service to email after the session completes. The email address must be entered in 7-bit ASCII. For success email, you can enter $PMSuccessEmailUser to send email to the user configured for the service variable. For failure email, you can enter $PMFailureEmailUser to send email to the user configured for the service variable.
- Email Subject (Optional). Enter the text you want to appear in the subject header.
- Email Text (Optional). Enter the text of the email. You can use several variables when creating this text to convey meaningful information, such as the session name and session status. For more information, see Sending Email on page 401.
Metadata Extensions Tab

You can create and promote metadata extensions with the Metadata Extensions tab. For more information about creating metadata extensions, see Metadata Extensions in the PowerCenter Repository Guide. Table A-30 describes the configuration options for the Metadata Extensions tab:
Table A-30. Metadata Extensions Tab
- Extension Name (Required). Name of the metadata extension. Metadata extension names must be unique in a domain.
- Datatype (Required). Datatype: numeric (integer), string, boolean, or XML.
- Precision. Maximum length for string or XML metadata extensions.
- Reusable. Select to make the metadata extension apply to all objects of this type (reusable). Clear to make the metadata extension apply to this object only (non-reusable).
- Description (Optional). Description of the metadata extension.
Appendix B: Workflow Properties Reference

This appendix describes the following workflow properties tabs:

- General Tab, 860
- Properties Tab, 862
- Scheduler Tab, 864
- Variables Tab, 869
- Events Tab, 870
- Metadata Extensions Tab, 871
General Tab
You can change the workflow name and enter a comment for the workflow on the General tab. By default, the General tab appears when you open the workflow properties. Figure B-1 shows the General tab of the workflow properties:
Figure B-1. Workflow Properties - General Tab
Additional General tab settings include Configure Concurrent Execution, which enables the Integration Service to run multiple instances of the workflow at the same time, and the Disabled, Suspend on Error, Web Services, and Service Level options (all optional).
Properties Tab
Configure parameter file name and workflow log options on the Properties tab. Figure B-2 shows the Properties tab:
Figure B-2. Workflow Properties - Properties Tab
- Workflow Log File Name (Required). Enter a file name, or a file name and directory. The Integration Service appends information in this field to that entered in the Workflow Log File Directory field. For example, if you have C:\workflow_logs\ in the Workflow Log File Directory field and enter logname.txt in the Workflow Log File Name field, the Integration Service writes logname.txt to the C:\workflow_logs\ directory.
The Properties tab also includes the high availability options Enable HA Recovery, Automatically Recover Terminated Tasks, and Maximum Automatic Recovery Attempts (all optional).
Scheduler Tab
The Scheduler tab lets you schedule a workflow to run continuously, to run at a given interval, or to be started manually. For more information about scheduling workflows, see Scheduling a Workflow on page 123. Figure B-3 shows the Scheduler tab:
Figure B-3. Workflow Properties - Scheduler Tab
- Non-Reusable. Create a non-reusable scheduler for the workflow.
- Reusable. Choose a reusable scheduler for the workflow.
Table B-4 describes the settings on the Edit Scheduler dialog box:
Table B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box
- Run Options: Run On Integration Service Initialization/Run On Demand/Run Continuously (Optional). Indicates the workflow schedule type. If you select Run On Integration Service Initialization, the Integration Service runs the workflow as soon as the Integration Service is initialized. If you select Run On Demand, the Integration Service runs the workflow only when you start the workflow. If you select Run Continuously, the Integration Service starts the next run of the workflow as soon as it finishes the first run.
- Schedule Options: Run Once/Run Every/Customized Repeat (Conditional). Required if you select Run On Integration Service Initialization in Run Options. Also required if you do not choose any setting in Run Options. If you select Run Once, the Integration Service runs the workflow once, as scheduled in the scheduler. If you select Run Every, the Integration Service runs the workflow at regular intervals, as configured. If you select Customized Repeat, the Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.
- Edit (Conditional). Required if you select Customized Repeat in Schedule Options. Opens the Repeat dialog box, allowing you to schedule specific dates and times for the workflow to run. The selected scheduler appears at the bottom of the page. For more information about the Repeat dialog box, see Customizing Repeat Option on page 128.
- Start Date (Conditional). Required if you select Run On Integration Service Initialization in Run Options. Also required if you do not choose any setting in Run Options. Indicates the date on which the Integration Service begins scheduling the workflow.
- Start Time (Conditional). Required if you select Run On Integration Service Initialization in Run Options. Also required if you do not choose any setting in Run Options. Indicates the time at which the Integration Service begins scheduling the workflow.
- End Options: End On/End After/Forever (Conditional). Required if the workflow schedule is Run Every or Customized Repeat. If you select End On, the Integration Service stops scheduling the workflow on the selected date. If you select End After, the Integration Service stops scheduling the workflow after the set number of workflow runs. If you select Forever, the Integration Service schedules the workflow as long as the workflow does not fail.
The Customized Repeat settings include Weekly (Optional) and Daily (Required) options.
Variables Tab
Before using workflow variables, you must declare them on the Variables tab. Figure B-6 shows the settings on the Variables tab:
Figure B-6. Workflow Properties - Variables Tab
Events Tab
Before using the Event-Raise task, declare a user-defined event on the Events tab. Figure B-7 shows the Events tab:
Figure B-7. Workflow Properties - Events Tab
Metadata Extensions Tab

You can create and promote metadata extensions with the Metadata Extensions tab. For more information about creating metadata extensions, see Metadata Extensions in the PowerCenter Repository Guide. Table B-8 describes the configuration options for the Metadata Extensions tab:
Table B-8. Workflow Properties - Metadata Extensions Tab
- Extension Name (Required). Name of the metadata extension. Metadata extension names must be unique in a domain.
- Datatype (Required). Datatype: numeric (integer), string, boolean, or XML.
- Value (Optional). For a numeric metadata extension, the value must be an integer. For a boolean metadata extension, choose true or false. For a string or XML metadata extension, click the Edit button on the right side of the Value field to enter a value of more than one line. The Workflow Manager does not validate XML syntax.
- Reusable (Optional). Select to make the metadata extension apply to all objects of this type (reusable). Clear to make the metadata extension apply to this object only (non-reusable).
- UnOverride (Optional). This column appears only if the value of one of the metadata extensions was changed. To restore the default value, click Revert.
- Description (Optional). Description of the metadata extension.
Index
A
ABORT function See also PowerCenter Transformation Language Reference session failure 233 aborted status 590 aborting Control tasks 174 Integration Service handling 140 sessions 141 status 590 tasks 140 tasks in Workflow Monitor 587 workflows 140 Absolute Time specifying 190 Timer task 189 active sources constraint-based loading 301 definition 312 generating commits 360 row error logging 313 source-based commit 360 transaction generators 312 XML targets 312 adding tasks to workflows 110
advanced settings session properties 811 aggregate caches reinitializing 764, 809 aggregate files deleting 765 moving 765 Aggregator cache description 783 overview 783 Aggregator transformation See also PowerCenter Transformation Guide adding to concurrent workflows 568 cache partitioning 781, 783 caches 783 configure caches 783 inputs for cache calculator 784 pushdown optimization 538 sorted ports 783 using partition points 447 AND links input type 163 $AppConnection using 237 Append if Exists flat file target property 316, 464 application connections CPI-C 84 JMS 74
873
JNDI 74 parameter types 695 password, parameter types 697 PeopleSoft 77 RFC/BAPI 88 Salesforce 79 SAP ALE IDoc Reader 86 SAP ALE IDoc Writer 87 SAP BW 80 SAP R/3 83, 84 session parameter 237 Siebel 90 TIB/Adapter SDK 94 TIB/Rendezvous 94 TIBCO 92 user name, parameter types 697 Web Services 96 webMethods 99 arrange workflows vertically 7 workspace objects 17 assigning Integration Services 116 Assignment tasks creating 166 definition 166 description 158 using Expression Editor 121 variables in 144, 694 attributes partition-level 436 automatic memory settings configuring 214 automatic task recovery configuring 388
buffer memory allocating 213 buffer blocks 213 configuring 213 bulk loading commit interval 305 data driven session 304 DB2 guidelines 305 Oracle guidelines 305 session properties 304, 835 test load 296 using user-defined commit 365
C
cache partitioning 437 cache calculator Aggregator transformation inputs 784 description 775 Joiner transformation inputs 788 Lookup transformation inputs 791 Rank transformation inputs 794 Sorter transformation inputs 797 using 780 cache directories variable types 693 cache directory sharing 774 cache files locating 765 naming convention 772 cache partitioning Aggregator transformation 781, 783 described 437 incremental aggregation 783 Joiner transformation 781, 787 Lookup transformation 476, 781, 790 performance 437 Rank transformation 781, 793 Sorter transformation 781, 796 caches Aggregator transformation 783 auto memory 776 cache calculator 775, 780 configuring 775, 778 configuring concurrent caches 814 configuring for Aggregator transformation 783 configuring for Joiner transformation 787, 796 configuring for Lookup transformation 791 configuring for Rank transformation 794
B
Backward Compatible Session Log configuring 657 Backward Compatible Workflow Log configuring 656 $BadFile naming convention 236 using 236 Based on Number of Partitions setting 434 block size FastExport attribute 281 buffer block size configuring 213, 813
874 Index
configuring for XML target 798 configuring maximum memory limits 215 configuring maximum numeric memory limit 813 data caches on a grid 633 for non-reusable sessions 775 for reusable sessions 775 for sorted-input Aggregator transformations 783 for transformations 770 index caches on a grid 633 Joiner transformation 786 lookup functions 812 Lookup transformation 790 memory 771 methods to configure 775 numeric value 778 optimizing 799 overriding 775 overview 770 persistent lookup 790 Rank transformation 793 resetting with real-time sessions 370 session cache files 770 Sorter transformation 796 specifying maximum memory by percentage 813 XML target 798 certified messages configuring TIB/Rendezvous application connections 93 changed source data real-time data 337 checking in versioned objects 20 checking out versioned objects 20 checkpoint session recovery 389 session state of operation 378, 389 COBOL sources error handling 274 numeric data handling 276 code page compatibility See also PowerCenter Administrator Guide multiple file sources 277 targets 287 code pages database connections 52, 286 delimited source 270 delimited target 322, 841 external loader files 712 fixed-width source 268 fixed-width target 320, 840
relaxed validation 52 cold start real-time sessions 351 tasks and workflows in Workflow Monitor 587 color themes selecting 10 colors setting 9 workspace 9 command file targets 317 generating file list 266 generating source data 265 partitioned sources 453 partitioned targets 465 processing target data 317 Command property configuring flat file sources 265 configuring flat file targets 317 configuring partitioned targets 464 partitioning file sources 456 Command tasks assigning resources 642 creating 170 definition 169 description 158 executing commands 172 Fail Task if Any Command Fails 172 monitoring details in the Workflow Monitor 615 multiple UNIX commands 172 promoting to reusable 172 using parameters and variables 225 using variables in 169 variable types 694 Command Type configuring flat file sources 265 partitioning file sources 456 comments adding in Expression Editor 122 commit interval bulk loading 305 configuring 374 description 358 source- and target-based 358 commit source source-based commit 360 commit type configuring 344, 806 real-time sessions 344 committing data target connection groups 360
transaction control 365 comparing objects See also PowerCenter Designer Guide See also PowerCenter Repository Guide sessions 26 tasks 26 workflows 26 worklets 26 Components tab properties 848 concurrent connections in partitioned pipelines 460 concurrent merge file targets 465 concurrent read partitioning session properties 455 concurrent workflows adding instance names 562 configuring to run with same name 557 configuring unique instances 555 creating workflow instances with pmcmd 564 rules and guidelines 568 running web service workflows 557 scheduling 123, 568 Start Workflow Advanced option 563 Start Workflow option 563 starting and stopping 563 starting from command line 564 steps to configure 561 transformation restrictions 568 using different session parameter files 704 using parameters 560 viewing in Workflow Monitor 565 viewing logs 566 concurrent worklets description 568 Config Object tab properties 811 configurations See session config objects Configure Concurrent Execution configuring workflow instances 562 configuring error handling options 685 connect string examples 51 syntax 51 connection environment SQL configuring 52 parameter and variable types 697
connection objects See also PowerCenter Repository Guide assigning permissions 48 configuring in sessions 40 definition 37 deleting 39 overriding connection attributes 44 owner 48 Connection Retry Period database connection 57 WebSphere MQ 101 connection settings applying to all session instances 210 targets 833 connection variables defining for Lookup transformations 42 defining for Stored Procedure transformations 42 specifying $Source and $Target 41 ConnectionParam.prm file using 46 connections changing Teradata FastExport connections 281 configuring for sessions 40 copy as 57 copying a relational database connection 57 creating Teradata FastExport connections 280 external loader 64 FTP 60 multiple targets 329 overriding connection attributes 44 overriding for Lookup transformations 42 overriding for Stored Procedure transformations 42 parameter file template 46 relational database 54, 71 replacing a relational database connection 58 SFTP 60 sources 255 targets 290 connectivity connect string examples 51 constraint-based loading active sources 301 configuring 301 enabling 304 key relationships 302 session property 812 target connection groups 302 Update Strategy transformations 302 control file override description 282 loading Teradata 727
setting Teradata FastExport statements 282 steps to override Teradata FastExport 283 Control tasks definition 174 description 158 options 175 stopping or aborting the workflow 140 copying repository objects 24 counters overview 622 CPI-C application connections See also connections configuring 84 creating Assignment tasks 166 Command tasks 170 data files directory 767 Decision tasks 178 Email tasks 411 error log tables 676 external loader connections 64 file list for partitioned sources 454 FTP sessions 753 index directory 767 metadata extensions 29 reserved words file 309 reusable scheduler 125 session configuration objects 218 sessions 207 tasks 159 workflow variables 155 workflows 109 CUME function partitioning restrictions 479 custom properties overriding 814 Custom transformation partitioning guidelines 479 pipeline partitioning 466 threads 467 customization of toolbars 15 of windows 15 workspace colors 9 customized repeat daily 130 editing 128 monthly 130 options 128 repeat every 129
weekly 129
D
data capturing incremental source changes 762, 767 Data Analyzer running from Workflow Manager 142 data cache for incremental aggregation 765 naming convention 773 data driven bulk loading 304 data encryption FastExport attribute 281 data files creating directory 767 finding 765 data flow See pipelines data movement mode affecting incremental aggregation 765 Data Profiling domains domain value, variable types 697 database connections configuring 54, 71 configuring for PowerChannel 71 connection retry period 57 copying a relational database connection 57 domain name 56, 72 packet size 56, 72 parameter 239 parameter types 695 password, parameter types 697 replacing a relational database connection 58 resilience 54 session parameter 236 use trusted connection 57, 72 user name parameter types 697 using IBM DB2 client authentication 51 using Oracle OS Authentication 51 database partitioning description 431, 482 Integration Service handling for sources 487 multiple sources 487 one source 486 performance 486, 488 rules and guidelines for Integration Service 487 rules and guidelines for sources 488 rules and guidelines for targets 489 targets 488
database sequences dropping during recovery 522 dropping orphaned sequences 522 pushdown optimization 522 troubleshooting 522 database views creating with pushdown optimization 520 dropping during recovery 522 dropping orphaned views 522 pushdown optimization 522 troubleshooting 522 databases configuring a connection 54 connection requirements 55, 65, 71 connections 50 environment SQL 52 selecting code pages 52 datatypes See also PowerCenter Designer Guide Decimal 324 Double 324 Float 324 Integer 324 Money 324 numeric 324 padding bytes for fixed-width targets 323 Real 324 date time format 4 dates configuring 4 formats 4 DB2 See also IBM DB2 bulk loading guidelines 305 commit interval 305 $DBConnection naming convention 236 using 236 deadlock retries See also PowerCenter Administrator Guide configuring 300 PM_RECOVERY table 300 session 809 target connection groups 310 decimal arithmetic See high precision Decision tasks creating 178 decision condition variable 176 definition 176
description 158 example 176 using Expression Editor 121 variable types 694 variables in 144 Default Remote Directory for FTP connections 62 deleting connection objects 39 workflows 110 delimited flat files code page, sources 270, 830 code page, targets 322 consecutive delimiters 831 escape character, sources 271, 830 numeric data handling 276 quote character, sources 270, 830 quote character, targets 322 row settings 271 session properties, sources 268 session properties, targets 320 sources 830 delimited sources number of rows to skip 831 delimited targets session properties 841 delimiter session properties, sources 268 session properties, targets 320 directories for historical aggregate data 767 workspace file 8 directory shared caches 774 disabled status 590 disabling tasks 163 workflows 131 displaying Expression Editor 122 Integration Services in Workflow Monitor 574 domain name database connections 56, 72 dropping indexes 301 DTM (Data Transformation Manager) buffer size 214 DTM Buffer Pool Size session property 809
$DynamicPartitionCount description 236 dynamic partitioning based on number of nodes in grid 434 based on number of partitions 434 description 433 disabled 434 number of partitions, parameter types 695 performance 433 rules and guidelines 435 using source partitions 434 using with partition types 435
E
edit null characters session properties 840 Edit Partition Point dialog box options 443 editing delimited file properties for sources 829 delimited file properties for targets 840 metadata extensions 31 null characters 840 scheduling 130 sessions 208 workflows 110 effective dates (PeopleSoft) parameter and variable types 692 email See also Email tasks attaching files 415, 424 configuring a user on Windows 404, 424 configuring the Integration Service on UNIX 403 configuring the Integration Service on Windows 404 distribution lists 408 format tags 415 logon network security on Windows 407 MIME format 404 multiple recipients 408 on failure 414 on success 414 overview 402 post-session 414 post-session, parameter and variable types 696 rmail 403 service variables 423 session properties 852 specifying a Microsoft Outlook profile 409 suspending workflows 421 suspension, variable types 696
text message 410 tips 424 user name 410 using other mail programs 425 using service variables 423 variables 415 workflows 410 worklets 410 Email tasks See also email creating 411 description 158 overview 410 suspension email 139 variable types 694 enabling enhanced security 13 past events in Event-Wait task 188 end of file transaction control 366 end options end after 128 end on 128 forever 128 endpoint URL in Web Service application connections 96 parameter and variable types 692, 693, 697 enhanced security enabling 13 enabling for connection objects 13 environment SQL See also connection environment SQL See also transaction environment SQL configuring 52 guidelines for entering 53 parameter and variable types 697 error handling COBOL sources 274 configuring 223 error log files 682 fixed-width file 274 options 685 overview 234 PMError_MSG table schema 678 PMError_ROWDATA table schema 676 PMError_Session table schema 679 pre- and post-session SQL 223 pushdown optimization 516 settings 816 transaction control 366
error log files directory, parameter and variable types 694 name, parameter and variable types 694 overview 682 table name prefix length restriction 708 error log tables creating 676 overview 676 error logs options 686 overview 674 session errors 234 error messages external loader 713 error threshold pipeline partitioning 233 stop on errors 233 variable types 694 errors fatal 233 pre-session shell command 230 stopping on 816 threshold 233 validating in Expression Editor 122 Event-Raise tasks configuring 182 declaring user-defined event 182 definition 180 description 158 in worklets 195 events in worklets 195 predefined events 180 user-defined events 180 Event-Wait tasks definition 180 description 158 file watch name, variable types 694 for predefined events 186 for user-defined events 184 waiting for past events 188 working with 183 ExportSessionLogLibName passing log events to an external library 648 Expression Editor adding comments 122 displaying 122 syntax colors 122 using 121 validating 132 validating expressions using 122
Expression transformation See also PowerCenter Transformation Guide pushdown optimization 539 expressions parameter and variable types 693 pushdown optimization 510 validating 122 external loader behavior 713 code page 712 configuring as a resource 712 connections 64 DB2 715 error messages 713 Integration Service support 712 loading multibyte data 721, 723 on Windows systems 714 Oracle 721 overview 712 session properties 40 setting up Workflow Manager 741 Sybase IQ 723 Teradata 726 using with partitioned pipeline 461 external loader connections parameter types 695 password, parameter types 697 session parameter 236 user name, parameter types 697 External Procedure transformation See also PowerCenter Transformation Guide initialization properties, variable types 693 partitioning guidelines 479 Extract Date (PeopleSoft) parameter and variable types 692
F
Fail Task if Any Command Fails in Command Tasks 172 session command 852 fail task recovery strategy description 386, 388 failed status 590 failing workflows failing parent workflows 163, 175 using Control task 175 fatal errors session failure 233
file list See also PowerCenter Designer Guide creating 277 creating for partitioned sources 454 generating with command 266 merging target files 465 using for source file 277 file mode SAP R/3 application connections 84 file sources code page, parameter and variable types 695 directories, parameter and variable types 695 input file commands, parameter and variable types 695 Integration Service handling 273, 276 names, parameter and variable types 695 numeric data handling 276 partitioning 453 session properties 263 file targets code page, parameter and variable types 695 partitioning 461 session properties 314 filter conditions adding 496 in partitioned pipelines 451 parameter and variable types 693 filter conditions (MQSeries) parameter and variable types 696 Filter transformation See also PowerCenter Transformation Guide pushdown optimization 540 filtering deleted tasks in Workflow Monitor 574 Integration Services in Workflow Monitor 574 tasks in Gantt Chart view 574 tasks in Task View 599 Find in Workspace tool overview 16 Find Next tool overview 16 fixed-width files code page 828 code page, sources 268 code page, targets 320 error handling 274 multibyte character handling 274 null characters, sources 268, 828 null characters, targets 320 numeric data handling 276 padded bytes in fixed-width targets 323 source session properties 266
target session properties 319 writing to 323, 324 fixed-width sources session properties 828 fixed-width targets session properties 840 flat file definitions escape character, sources 271 Integration Service handling, targets 323 quote character, sources 270 quote character, targets 322 session properties, sources 263 session properties, targets 314 flat files code page, sources 268 code page, targets 320 configuring recovery 392 creating footer 316 creating headers 316 delimiter, sources 270 delimiter, targets 322 Footer Command property 316, 464 generating source data 265 generating with command 265 Header Command property 316, 464 Header Options property 316, 464 multibyte data 326 null characters, sources 268 null characters, targets 320 numeric data handling 276 output file session parameter 236 precision, targets 324, 326 preserving input row order 457 processing with command 317 shift-sensitive target 326 source file session parameter 236 writing targets by transaction 325 flush latency configuring 343 description 343 Flush Session Recovery Data (property) Integration Service 347 fonts format options 10 setting 9 footer creating in file targets 316, 464 parameter and variable types 695 Footer Command flat file targets 316, 464
format date time 4 format options color themes 10 colors 9 date and time 4 fonts 10 orthogonal links 9 resetting 10 schedule 4 solid lines for links 9 Timer task 4 fractional seconds precision Teradata FastExport attribute 282 FTP accessing source files 753 accessing target files 753 connecting to file targets 461 connection names 61 connection properties 61 connections for ABAP integration 86 creating a session 753 defining connections 60 defining default remote directory 62 defining host names 62 mainframe guidelines 63 overview 750 partitioning targets 759 remote directory, parameter and variable types 697 remote file name, parameter and variable types 695 resilience 62 retry period 62 session properties 822, 833 SFTP 750 Use SFTP 62 FTP connections parameter types 695 password, parameter types 697 session parameter 237 user name parameter types 697 $FTPConnection using 237 full pushdown optimization description 503 full recovery description 389 functions available in databases 511 pushdown optimization 511 Session Log Interface 669
G
Gantt Chart configuring 580 filtering 574 listing tasks and workflows 592 navigating 593 opening and closing folders 575 organizing 593 overview 570 searching 595 time increments 594 time window, configuring 580 using 592 general options arranging workflow vertically 7 configuring 6 in-place editing 7 launching Workflow Monitor 8 open editor 8 panning windows 7 reload task or workflow 7 repository notifications 8 session properties 802 show background in partition editor and DBMS based optimization 8 show expression on a link 8 show full name of task 8 General tab in session properties in Workflow Manager 802 generating commits with source-based commit 360 globalization See also PowerCenter Administrator Guide database connections 286 overview 286 targets 286 grid cache requirements 633 configuring resources 636 configuring session properties 636 configuring workflow properties 636 distributing sessions 630, 635 distributing workflows 629, 635 Integration Service behavior 635 Integration Service property settings 636 overview 628 pipeline partitioning 632 recovering sessions 635 recovering workflows 635 requirements 636
H
hash auto-key partitioning description 432 overview 490 hash partitioning adding hash keys 491 description 482 hash user keys description 432 hash user keys partitioning overview 491 performance 491 header creating in file targets 316, 464 parameter and variable types 695 Header Command flat file targets 316, 464 Header Options flat file targets 316, 464 heterogeneous sources defined 252 heterogeneous targets overview 329 high availability configuring for WebSphere MQ 101 high precision enabling 809 handling 248 history names in Workflow Monitor 589 host names for FTP connections 62 HTTP transformation pipeline partitioning 466 threads 467
I
IBM DB2 connect string example 51 connection with client authentication 51 database partitioning 482, 486, 488 IBM DB2 EE attributes 717 connecting with client authentication 65, 715 external loader connections 64
external loading 715 IBM DB2 EEE attributes 718 connecting with client authentication 65, 715 external loader connections 64 external loading 715 icons Workflow Monitor 572 worklet validation 203 idle time configuring 341 incremental aggregation cache partitioning 783 changing session sort order 765 configuring 809 configuring the session 767 deleting files 765 Integration Service data movement mode 765 moving files 765 overview 762 partitioning data 766 preparing to enable 767 processing 763 reinitializing cache 764 incremental changes capturing 767 incremental recovery description 389 index caches for incremental aggregation 765 naming convention 773 indexes creating directory 767 dropping for target tables 301 finding 765 recreating for target tables 301 indicator files predefined events 183 INFA_AbnormalSessionTermination Session Log Interface 672 INFA_EndSessionLog Session Log Interface 671 INFA_InitSessionLog Session Log Interface 669 INFA_OutputSessionLogFatalMsg Session Log Interface 671 INFA_OutputSessionLogMsg Session Log Interface 670 Informix connect string syntax 51 row-level locking 460
in-place editing enabling 7 input link type selecting for task 163 Input Type file source partitioning property 455 flat file source property 264 $InputFile naming convention 236 using 236 instances workflow instances (description) 554 Integration Service assigning a grid 636 assigning workflows 116 behavior on a grid 635 calling functions in the Session Log Interface 667 commit interval overview 358 connecting in Workflow Monitor 573 external loader support 712 filtering in Workflow Monitor 574 grid overview 628 handling file targets 323 monitoring details in the Workflow Monitor 606 online and offline mode 573 pinging in Workflow Monitor 573 removing from the Navigator 4 running sessions on a grid 630 selecting 115 tracing levels 662 truncating target tables 298 using FTP 60 using SFTP 60 version in session log 662 Integration Service code page See also Integration Service affecting incremental aggregation 765 Integration Service handling file targets 323 fixed-width targets 324, 326 multibyte data to file targets 326 shift-sensitive data, targets 326 is staged FastExport session attribute 282
J
Java Classpath session property 807 Java transformation pipeline partitioning 466 session level classpath 807 threads 467 JMS application connections configuring 74 properties JNDI application connections See also connections configuring 74 Joiner cache description 786 Joiner transformation See also PowerCenter Transformation Guide cache partitioning 781, 787 caches 786 configure caches 787, 796 inputs for cache calculator 788 joining sorted flat files 470 joining sorted relational data 472 partitioning 787 partitioning guidelines 479 pushdown optimization 541
K
Keep absolute input row order session properties 458 Keep relative input row order session properties 457 key range partitioning adding 493 adding partition key 494 description 432, 482 Partitions View 442 performance 494 pushdown optimization 530 keyboard shortcuts Workflow Manager 33 keys constraint-based loading 302
L
latency description 336 launching Workflow Monitor 8, 572 Ledger File configuring for TIB/Rendezvous application connections 93 line sequential buffer length configuring 813 sources 271 links AND 163 condition 117 example link condition 118 linking tasks concurrently 118 linking tasks sequentially 118 loops 117 OR 163 orthogonal 9 show expression on a link 8 solid lines 9 specifying condition 118 using Expression Editor 121 variable types 694 variables in 144 working with 117 List Tasks in Workflow Monitor 592 Load Balancer assigning priorities to tasks 641 assigning resources to tasks 642 workflow settings 640 $LoaderConnection using 236 log codes See PowerCenter Troubleshooting Guide log files See also session log files See also workflow log files real-time sessions 648 session log 805 using a shared library 648 log options settings 814 logging pushdown optimization 516 logtable name FastExport attribute 281 lookup cache description 790 lookup caches file name prefix, parameter and variable types 693 session property 812 lookup databases database connection session parameter 236
lookup files lookup file session parameter 236 lookup source files using parameters 236 Lookup SQL Override option parameter and variable types 693 Lookup transformation See also Transformation Guide See also PowerCenter Transformation Guide adding to concurrent workflows 568 cache partitioning 476, 781, 790 caches 790 configure caches 791 connection information, parameter and variable types 693 inputs for cache calculator 791 pushdown optimization 542 resilience 54 source file, parameter and variable types 695 $LookupFile naming convention 236 using 236 lookups persistent cache 790 loops invalid workflows 117
M
mainframe FTP guidelines 63 mapping parameters See also PowerCenter Designer Guide in parameter files 689 in session properties 242 overriding 242 passing values between sessions 243 $$PushdownConfig 525 mapping variables See also PowerCenter Designer Guide available in databases 511 in parameter files 689 in partitioned pipelines 438 passing values between sessions 243 pushdown optimization 511 mappings configuring pushdown optimization 503 session failure from partitioning 439 max sessions FastExport attribute 281
Maximum Days Workflow Monitor 579 maximum memory limit configuring for caches 215 configuring for session caches 813 percentage of memory for session caches 813 session on a grid 216 Maximum Workflow Runs Workflow Monitor 579 memory caches 771 memory requirements DTM buffer size 214 session cache size 215 Merge Command description 464 flat file targets 316 parameter and variable types 695 Merge File Directory description 464 flat file target property 315 parameter and variable types 695 Merge File Name description 464 flat file target property 315 parameter and variable types 695 Merge Type description 464 flat file target property 315 merging target files commands 465 concurrent merge 465 file list 465 FTP 461 FTP file targets 759 local connection 461, 464 sequential merge 465 session properties 463, 837 message count configuring 341 message processing real-time sessions 346, 348 recovery file 346 recovery table 348 message queue processing real-time data 337 using with partitioned pipeline 461 Message Queue queue connections configuring for WebSphere MQ 101 message recovery description 345
enabling 345 real-time sessions 345, 347, 349 recovery file 345, 347 recovery table 349 recovery tables 345 session recovery data flush 347 messages processing real-time data 337 messages and message queues real-time data 337 metadata extensions creating 29 deleting 32 editing 31 overview 29 session properties 856 Microsoft Access pipeline partitioning 460 Microsoft Outlook configuring an email user 404, 424 configuring the Integration Service 404 Microsoft SQL Server commit interval 305 connect string syntax 51 MIME format email 404 monitoring See also Workflow Monitor command tasks 615 failed sessions 616 folder details 609 Integration Service details 606 performance details 621 Repository Service details 605 session details 616 targets 618 tasks details 611 worklet details 613 MOVINGAVG function See also PowerCenter Transformation Language Reference partitioning restrictions 479 MOVINGSUM function See also PowerCenter Transformation Language Reference partitioning restrictions 479 MSMQ queue connections See also connections configuring 76 multibyte data character handling 274 Oracle external loader 721 Sybase IQ external loader 723
writing to files 326 multiple group transformations partitioning 431 multiple input group transformations creating partition points 447 multiple sessions validating 232
N
naming conventions session parameters 236 native connect string See connect string navigating workspace 15 non-persistent variables definition 153 non-reusable sessions caches 775 non-reusable tasks inherited changes 162 promoting to reusable 162 normal loading session properties 835 Normal tracing levels definition 663 Normalizer transformation using partition points 447 notifications See repository notifications null characters editing 840 file targets 320 Integration Service handling 274 session properties, targets 319 targets 840 number of nodes in grid setting with dynamic partitioning 434 number of partitions overview 430 performance 430 session parameter 236 setting for dynamic partitioning 434 numeric values reading from sources 276
O
objects viewing older versions 21 older versions of objects viewing 21 open transaction definition 369 operating system profile override 135 operators available in databases 510 pushdown optimization 510 Optimize throughput session properties 457 optimizing data flow 625 options (Workflow Manager) format 6, 9, 10 general 6 miscellaneous 6 solid lines for links 9 OR links input type 163 Oracle bulk loading guidelines 305 commit intervals 305 connect string syntax 51 connection with OS Authentication 51 database partitioning 482, 486 Oracle external loader attributes 722 connecting with OS Authentication 65, 721 data precision 721 delimited flat file target 721 external loader connections 64 external loader support 712, 721 fixed-width flat file target 721 multibyte data 721 partitioned target files 722 reject file 721 Output File Directory property FTP targets 759 parameter and variable types 695 partitioning target files 464 Output File Name property flat file targets 317 FTP targets 759 parameter and variable types 695 partitioning target files 464
output files session properties 838 targets 316 Output is Deterministic transformation property 392 Output is Repeatable transformation property 392 Output Type property flat file targets 316 partitioning file targets 464 $OutputFile naming convention 236 using 236 overriding Teradata loader control file 727 tracing levels 816 owner connection object 48 owner name truncating target tables 299
P
packet size database connections 56, 72 page setup configuring 14 parameter files comments, adding 700 configuring concurrent workflow instances 560 datetime formats 708 defining properties in 691 description 688 example of use 707 guidelines for creating 698, 708 headings 698 input fields that accept parameters and variables 692 location, configuring 702 name, configuring 702 null values, entering 700 overriding connection attributes 46 overview 688 parameter and variable types in 689 precedence of 705 sample parameter file 700 scope of parameters and variables in 698 sections 698 session parameter file name, variable types 694, 704 specifying which to use 688 structure of 698 template file 46
tips for creating 710 troubleshooting 709 using variables to specify 704 using with pmcmd 705 using with sessions 702 using with workflows 702 parameters database connection 239 defining in parameter files 691 input fields that accept parameters 692 overview of types 689 session 236 partition count session parameter 236 partition groups description 632 stages 632 partition keys adding 491, 494 adding key ranges 495 rows with null values 496 rules and guidelines 496 partition names setting 443 partition points adding and deleting 446 Custom transformation 466 editing 442 HTTP transformation 466 Java transformation 466 Joiner transformation 469 overview 429 partition types changing 443 default 484 description 482 key range 493 overview 431 pass-through 497 performance 483 round-robin 499 using with partition points 484 partitioning incremental aggregation 766 Joiner transformation 787 performance 499 using FTP with multiple targets 752 partitioning restrictions Informix 460 number of partitions 439 numerical functions 479
relational targets 460 Sybase IQ 460 transformations 479 unconnected transformations 448 XML targets 479 partition-level attributes description 436 partitions adding 443 deleting 443 description 430 entering description 443 merging for pushdown optimization 530 merging target data 465 properties 441, 443 scaling 433 session properties 463 pass-through partition type description 432 overview 482 performance 497 processing 497 pushdown optimization 530 PeopleSoft application connections See also connections configuring 77 performance cache settings 775 commit interval 360 data, collecting 809 data, writing to repository 809 performance detail files understanding counters 622 viewing 621 performance settings session properties 809 permissions connection object 48 connection objects 48 database 48 editing sessions 208 persistent variables definition 153 in worklets 197 pinging Integration Service in Workflow Monitor 573 pipeline partitioning adding hash keys 491 adding key ranges 495 cache 437 concurrent connections 460
configuring a session 441 configuring for sorted data 469 configuring pushdown optimization 527 configuring to optimize join performance 469 Custom transformation 466 database compatibility 460 description 428, 446, 482 dynamic partitioning 433 editing partition points 442 error threshold 233 example of use 483 external loaders 461, 714 file lists 454 file sources 453 file targets 461 filter conditions 451 FTP file targets 759 guidelines 453 hash auto-keys partitioning 490 hash user keys partitioning 491 HTTP transformation 466 Java transformation 466 Joiner transformation 469 key range 493 loading to Informix 460 mapping variables 438 merging target files 461, 464, 837 message queues 461 multiple group transformations 431 numerical functions restrictions 479 object validation 439 on a grid 632 partition keys 491, 494 partitioning indirect files 454 pass-through partitioning type 497 performance 491, 494, 499 pipeline stage 428 recovery 233 reject file 330 relational targets 459 round-robin partitioning 499 rules 439 session properties 843 sorted flat files 470 sorted relational data 472 Sorter transformation 474, 477 SQL queries 450 threads and partitions 430 Transaction Control transformation 484 valid partition types 484
pipeline stage description 428 pipelines See also source pipelines active sources 312 data flow monitoring 625 description 428, 446, 482 $PMStorageDir session state of operations 377 workflow state of operations 377 $PMWorkflowLogCount archiving log files 656 $PMWorkflowLogDir archiving workflow logs 656 PM_REC_STATE table creating manually 381 description 378 real-time sessions 348 PM_RECOVERY table creating manually 381 deadlock retries 300 deadlock retry 378 description 378 format 379 PM_TGT_RUN_ID creating manually 381 description 378 format 379 PMError_MSG table schema 678 PMError_ROWDATA table schema 676 PMError_Session table schema 679 $PMFailureEmailUser definition 423 tips 424 PmNullPasswd reserved word 51 PmNullUser IBM DB2 client authentication 51 Oracle OS Authentication 51 reserved word 51 $PMSessionLogFile using 236 $PMSuccessEmailUser definition 423 tips 424 $PMWorkflowLogDir definition 653
$PMWorkflowRunInstanceName retrieving workflow instances 560 post-session command session properties 849 shell command properties 852 post-session email See also email overview 414 parameter and variable types 696 session options 854 session properties 849 post-session shell command configuring non-reusable 226 configuring reusable 229 parameter and variable types 694 using 225 post-session SQL commands entering 223 post-session variable assignment datatype conversion 245 performing after failure 243 performing on success 243 post-worklet variable assignment performing 199 PowerCenter Repository Reports viewing in Workflow Manager 142 PowerCenter resources See resources PowerChannel configuring a database connection 71 PowerChannel database connections configuring 71 PowerExchange Client for PowerCenter real-time data 337 pre- and post-session SQL commands, parameter and variable types 692, 693, 695 entering 223 guidelines 223 precision flat files 326 writing to file targets 324 predefined events waiting for 186 predefined variables in Decision tasks 176 preparing to run status 590 pre-session shell command configuring non-reusable 226 configuring reusable 229
errors 230 session properties 849 using 225 pre-session SQL commands entering 223 pre-session variable assignment datatype conversion 245 performing 243 pre-worklet variable assignment performing 199 printing page setup 14 priorities assigning to tasks 641 Private Key File Name SFTP 62 Private Key File Password SFTP 62 privileges See also PowerCenter Administrator Guide Properties tab in session properties in Workflow Manager 804 Public Key File Name SFTP 62 $$PushdownConfig description 525 pushdown group viewing 531 pushdown groups description 531 Pushdown Optimization Viewer, using 531 pushdown optimization adding transformations to mappings 531 Aggregator transformation 538 configuring partitioning 527 configuring sessions 527 creating database views 520 database sequences 522 database views 522 error handling 516 Expression transformation 539 expressions 510 Filter transformation 540 full pushdown optimization 503, 504 functions 511 Joiner transformation 541 key range partitioning, using 530 loading to targets 530 logging 516 Lookup transformation mapping variables 511
mappings 503 merging partitions 530 native database drivers 505 ODBC drivers 505 operators 510 overview 502 parameter types 694 pass-through partition type 530 performance issues 504 $$PushdownConfig parameter, using 525 recovery 516 Router transformation 544 rules and guidelines 531 Sequence Generator transformation 545 sessions 503 Sorter transformation 546 source database partitioning 488 Source Qualifier transformation 547 source-side optimization 503 SQL generated 503 SQL versus ANSI SQL 505 targets 548 target-side optimization 503 transformations 536 Union transformation 549 Update Strategy transformation 550 Pushdown Optimization Viewer viewing pushdown groups 531
Q
queue connections MSMQ 76 parameter types 695 session parameter 237 testing WebSphere MQ 102 WebSphere MQ 101 $QueueConnection using 237 quoted identifiers reserved words 308
R
rank cache description 793 Rank transformation See also PowerCenter Transformation Guide cache partitioning 781, 793 caches 793
configure caches 794 inputs for cache calculator 794 using partition points 447 reader selecting for Teradata FastExport 281 reader time limit configuring 341 real-time data changed source data 337 messages and message queues 337 overview 337 supported products 354 web service messages 337 real-time Flush Latency configuring 343 real-time processing description 336 real-time products overview 354 real-time sessions aborting 350 cold start 351 configuring 340 configuring commit type 344 configuring flush latency 343 configuring idle time 341 configuring message count 341 configuring reader time limit 341 configuring terminating conditions 341 description 336 log files 648 message processing 346, 348 message recovery 347, 349 overview 336 PM_REC_STATE table 348 recovering 351 restarting 351 resuming 351 rules and guidelines 353 session logs 648 stopping 350 supported products 354 transformation scope 370 truncating target tables 299 recoverable tasks description 386 recovering sessions containing Incremental Aggregator 378 sessions from checkpoint 389 with repeatable data in sessions 391
recovering workflows recovering instances by run ID 558 recovering workflows by instance name 555 recovery completing unrecoverable sessions 398 dropping database sequences 522 dropping database views 522 flat files 392 full recovery 389 incremental 389 overview 376 pipeline partitioning 233 PM_RECOVERY table format 379 PM_TGT_RUN_ID table format 379 pushdown optimization 516 real-time sessions 345 recovering a task 396 recovering a workflow from a task 397 recovering by instance name 555 recovering workflows by run ID 558 resume from last checkpoint 387, 388 rules and guidelines 398 SDK sources 392 session state of operations 377 sessions on a grid 635 strategies 386 target recovery tables 378 validating the session for 391 workflow state of operations 377 workflows on a grid 635 recovery cache folder variable types for JMS 696 variable types for TIBCO 696 variable types for webMethods 696 variable types for WebSphere MQ 696 recovery file message processing 346 message recovery 345, 347 recovery strategy fail task and continue workflow 386, 388 restart task 386, 388 resume from last checkpoint 387, 388 recovery tables description 378 manually creating from scripts 381 message processing 348 message recovery 345, 349 recreating indexes 301 reinitializing aggregate cache 764
reject file changing names 330 column indicators 332 locating 330 Oracle external loader 721 parameter and variable types 696 pipeline partitioning 330 reading 331 row indicators 332 session parameter 236 session properties 295, 317, 835, 838 transaction control 366 viewing 330 reject file directory parameter and variable types 696 target file properties 464 Reject File Name description 464 flat file target property 317 relational connections See database connections relational database connections See database connections relational databases copying a relational database connection 57 replacing a relational database connection 58 relational sources session properties 258 relational targets partitioning 459 partitioning restrictions 460 session properties 292, 835 Relative time specifying 190 Timer task 189 reload task or workflow configuring 7 removing Integration Service 4 renaming repository objects 19 repeatable data recovering workflows 391 with sources 391 with transformations 392 repositories adding 19 connecting in Workflow Monitor 573 entering descriptions 19 repository folder monitoring details in the Workflow Monitor 609
repository notifications receiving 8 repository objects comparing 26 configuring 19 rename 19 Repository Service monitoring details in the Workflow Monitor 605 notification in Workflow Monitor 579 notifications 8 Request Old configuring for TIB/Rendezvous application connections 93 reserved words generating SQL with 308 resword.txt 308 reserved words file creating 309 resilience configuring for WebSphere MQ 101 database connection 54 FTP 62 resources assigning external loader 712 assigning to tasks 642 restart task recovery strategy description 386, 388 restarting tasks in Workflow Monitor 586 restarting tasks and workflows without recovery in Workflow Monitor 587 resume from last checkpoint recovery strategy 387, 388 resume recovery strategy using recovery target tables 378 using repeatable data 391 retry period database connection 57 FTP 62 reusable sessions caches 775 reusable tasks inherited changes 162 reverting changes 162 reverting changes tasks 162 RFC/BAPI application connections See also connections configuring 88 rmail See also email
configuring 403 rolling back data transaction control 365 round-robin partitioning description 432, 482, 499 Router transformation See also PowerCenter Transformation Guide pushdown optimization 544 row error logging active sources 313 row indicators reject file 332 rows to skip delimited files 831 run options run continuously 127 run on demand 127 service initialization 127 running status 590 workflows 135 runtime location variable types 693 runtime partitioning setting in session properties 434
S
Salesforce application connections See also connections accessing Sandbox 79 configuring 79 SAP ALE IDoc Reader application connections See also connections configuring 86 SAP ALE IDoc Writer application connections See also connections configuring 87 SAP BW application connections See also connections configuring 80 SAP R/3 ABAP integration 86 ALE integration 86 SAP R/3 application connections See also connections configuring 83, 84 stream and file mode sessions 84 stream mode sessions 84 scheduled status 590
scheduling concurrent workflows 123 configuring 126 creating reusable scheduler 125 disabling workflows 131 editing 130 end options 128 error message 124 run every 128 run once 128 run options 127 schedule options 128 start date 128 start time 128 workflows 123 scheduling workflows concurrent workflows 568 script files parameter and variable types 693 SDK sources recovering 392 searching versioned objects in the Workflow Manager 23 Workflow Manager 16 Workflow Monitor 595 Sequence Generator transformation See also PowerCenter Transformation Guide adding to concurrent workflows 568 partitioning guidelines 448, 479 pushdown optimization 545 sequential merge file targets 465 service levels assigning to tasks 641 service process variables in Command tasks 225 in parameter files 689 service variables email 423 in parameter files 689 session state of operations 377 Session cache size configuring memory requirements 215 session command settings session properties 849 session config objects editing 218 selecting 218 session errors handling 234
session events passing to an external library 648 $PMSessionLogCount archiving session logs 658 session log count variable types 694 $PMSessionLogDir archiving session logs 657 session log files archiving 654 time stamp 654 Session Log Interface description 666 functions 669 guidelines 667 implementing 667 INFA_AbnormalSessionTermination 672 INFA_EndSessionLog 671 INFA_InitSessionLog 669 INFA_OutputSessionLogFatalMsg 671 INFA_OutputSessionLogMsg 670 Integration Service calls 667 session logs changing locations 657 changing name 657 directory, variable types 694 enabling and disabling 657 external loader error messages 713 file name, parameter types 694 generating using UTF-8 653 Integration Service version and build 662 location 653, 805 naming 653 passing to external library 666 real-time sessions 648 sample 662 saving 815 session parameter 236 tracing levels 662, 663 viewing in Workflow Monitor 588 workflow recovery 398 session on grid description 630 session parameter file name variable types 694, 704 session parameters application connection parameter 237 built-in 236 database connection parameter 236 external loader connection parameter 236 file name, variable types 694, 704
FTP connection parameter 237 in parameter files 689 naming conventions 236 number of partitions 236 overview 236 passing values between sessions 243 queue connection parameter 237 reject file parameter 236 session log parameter 236 setting as a resource 240 source file parameter 236 target file parameter 236 user-defined 236 session properties Components tab 848 Config Object tab 811 constraint-based loading 304 delimited files, sources 268 delimited files, targets 320 edit delimiter 829, 840 edit null character 840 email 414, 852 external loader 40 FastExport sources 282 fixed-width files, sources 266 fixed-width files, targets 319 FTP files 822, 833 general settings 802 General tab 802 Metadata Extensions tab 856 null character, targets 319 on failure email 414 on success email 414 output files, flat file 838 partition attributes 441, 443 Partitions View 843 performance settings 809 post-session email 414 post-session shell command 852 Properties tab 804 reject file, flat file 317, 838 reject file, relational 295, 835 relational sources 258 relational targets 292 session command settings 849 session retry on deadlock 300 sort order 765 source connections 255 sources 254 table name prefix 306 target connection settings 822, 833
target connections 290 target load options 304, 835 target-based commit 374 targets 288 Transformation node 841 transformations 841 session recovery data flush message recovery 347 session retry on deadlock See also PowerCenter Administrator Guide overview 300 session statistics viewing in the Workflow Monitor 612 sessions See also session properties aborting 141, 233 apply attributes to all instances 209 assigning resources 642 assigning variables pre- and post-session 243 assigning variables pre- and post-session, procedure 246 configuring for multiple source files 278 configuring for pushdown optimization 527 configuring to optimize join performance 469 creating 207 creating a session configuration object 218 datatype conversion when passing variables 245 definition 206 description 158 distributing over grids 630, 635 editing 208 email 402 external loading 712, 741 failure 233, 439 full pushdown optimization 503 metadata extensions in 29 monitoring counters 622 monitoring details 616 multiple source files 277 overriding connection attributes 44 overriding source table name 261, 825 overriding target table name 307, 836 overview 206 parameters 236 passing information between 243 passing information between, example 244 properties reference 801 pushdown optimization 503 recovering on a grid 635 source-side pushdown optimization 503 stopping 141, 233
target-side pushdown optimization 503 task progress details 611 test load 296, 319 truncating target tables 298 using FTP 753 using SFTP 753 validating 231 viewing details in the Workflow Monitor 617 viewing failure information in the Workflow Monitor 616 viewing performance details in the Workflow Monitor 621 viewing statistics in the Workflow Monitor 612 Set Control Value (PeopleSoft) parameter and variable types 692 Set File Properties description 265, 317 SetID (PeopleSoft) parameter and variable types 692 SFTP See also FTP authentication methods 62 configuring connection 62 creating a session 753 defining Private Key File Name 62 defining Private Key File Password 62 defining Public Key File Name 62 description 750 key file location 753 running a session on a grid 753 shared library implementing the Session Log Interface 667 passing log events to an external library 648 shell commands executing in Command tasks 172 make reusable 228 parameter and variable types 694 post-session 225 post-session properties 852 pre-session 225 using Command tasks 169 using parameters and variables 169, 225 shortcuts keyboard 33 Siebel application connections See also connections configuring 90 sleep FastExport attribute 281 sort order See also session properties
affecting incremental aggregation 765 preserving for input rows 457 sorted flat files partitioning for optimized join performance 470 sorted ports caching requirements 783 sorted relational data partitioning for optimized join performance 472 sorter cache description 796 naming convention 773 Sorter transformation See also PowerCenter Transformation Guide cache partitioning 781, 796 caches 796 inputs for cache calculator 797 partitioning 477 partitioning for optimized join performance 474 pushdown optimization 546 work directory, variable types 693 $Source how Integration Service determines value 41 multiple sources 41 session properties 806 source commands generating file list 265 generating source data 265 $Source connection value parameter and variable types 694 source data capturing changes for aggregation 762 source databases database connection session parameter 236 Source File Directory description 757 Source File Name description 265, 455, 757 Source File Type description 265, 455, 757 source files accessing through FTP 750, 753 configuring for multiple files 277, 278 delimited properties 830 fixed-width properties 828 session parameter 236 session properties 264, 455, 826 using parameters 236 wildcard characters 266 source location session properties 264, 455, 826
source pipelines description 428, 446, 482 Source Qualifier transformation See also PowerCenter Transformation Guide pushdown optimization 547 pushdown optimization, SQL override 520 using partition points 447 source tables overriding table name 261, 825 parameter and variable types 692 source-based commit active sources 360 configuring 344 description 360 real-time sessions 344 sources code page 270 code page, flat file 268 commands 265, 453 connections 255 delimiters 270 dynamic files names 266 escape character 830 generating file list 266 generating with command 265 line sequential buffer length 271 monitoring details in the Workflow Monitor 618 multiple sources in a session 277 null characters 268, 274, 828 overriding source table name 261, 825 overriding SQL query, session 259 partitioning 453 preserving input row sort order 457 quote character 830 reading concurrently 455 resilience 54 session properties 254, 454 specifying code page 828, 830 wildcard characters 266 source-side pushdown optimization description 503 SQL configuring environment SQL 52 generated for pushdown optimization 503 guidelines for entering environment SQL 53 queries in partitioned pipelines 450 SQL override pushdown optimization 520 SQL query parameter and variable types 693
staging files (SAP) file name and directory, variable types 696 start date and time scheduling 128 Start tasks definition 106 Start Workflow Advanced starting concurrent workflows 563 starting selecting a service 115 start from task 136 starting a part of a workflow 136 starting tasks 137 starting workflows using Workflow Manager 135 Workflow Monitor 572 workflows 135 state of operations checkpoints 378, 389 session recovery 377 workflow recovery 377 statistics for Workflow Monitor 576 viewing 576 status aborted 590 aborting 590 disabled 590 failed 590 in Workflow Monitor 590 preparing to run 590 running 590 scheduled 590 stopped 590 stopping 590 succeeded 590 suspended 138, 590 suspending 138, 590 tasks 590 terminated 590 terminating 590 unknown status 590 unscheduled 591 waiting 591 workflows 590 stop on error threshold 233 errors 816 pre- and post-session SQL errors 223 stopped status 590
stopping in Workflow Monitor 587 Integration Service handling 140 sessions 141 status 590 tasks 140 using Control tasks 174 workflows 140 Stored Procedure transformation call text, parameter and variable types 693 connection information, parameter and variable types 693 stream mode SAP R/3 application connections 84 subject (TIBCO) default subject 94 succeeded status 590 Suspend on Error description 861 suspended status 138, 590 suspending behavior 138 email 139, 421 status 138, 590 workflows 138 worklets 192 suspension email variable types 696 Sybase ASE commit interval 305 connect string example 51 Sybase IQ partitioning restrictions 460 Sybase IQ external loader attributes 724 connections 64 data precision 723 delimited flat file targets 723 fixed-width flat file targets 723 multibyte data 723 optional quotes 723 overview 723 support 712
T
table name prefix relational error logs, length restriction 708 relational error logs, parameter and variable types 694
target owner 306 target, parameter and variable types 696 table names overriding source table name 261, 825 overriding target table name 307, 836 table owner name parameter and variable types 695 session properties 260 targets 306 $Target how Integration Service determines value 41 multiple targets 41 session properties 806 target commands processing target data 317 targets 465 using with partitions 465 target connection groups committing data 360 constraint-based loading 302 description 310 Transaction Control transformation 371 target connection settings session properties 822, 833 $Target connection value parameter and variable types 694 target databases database connection session parameter 236 target files appending 464 delimited 841 fixed-width 840 session parameter 236 target load order constraint-based loading 303 target owner table name prefix 306 target properties bulk mode 293 test load 293 update strategy 293 using with source properties 297 target recovery tables description 378 manually creating 381 target tables overriding table name 307, 836 parameter and variable types 692 truncating 298 truncating, real-time sessions 299
target update parameter and variable types 692 target-based commit configuring 344 real-time sessions 344 WriterWaitTimeOut 359 target-based commit interval description 359 targets See also target connection groups accessing through FTP 750, 753 code page 322, 840, 841 code page compatibility 287 code page, flat file 320 commands 317 connection settings 833 connections 290 database connections 286 deleting partition points 447 delimiters 322 file writer 288 globalization features 286 heterogeneous 329 load, session properties 304, 835 merging output files 461, 464 monitoring details in the Workflow Monitor 618 multiple connections 329 multiple types 329 null characters 320 output files 316 overriding target table name 307, 836 partitioning 459, 461 processing with command 317 pushdown optimization 548 relational settings 835 relational writer 288 resilience 54 session properties 288, 292 specifying null character 840 truncating tables 298 truncating tables, real-time sessions 299 using pushdown optimization 530 writers 288 target-side pushdown optimization description 503 Task Developer creating tasks 159 displaying and hiding tool name 8 Task view configuring 581 displaying 598
filtering 599 hiding 581 opening and closing folders 575 overview 570 using 598 tasks aborted 590 aborting 140, 590 adding in workflows 110 arranging 17 assigning resources 642 Assignment tasks 166 automatic recovery 388 cold start 587 Command tasks 169 configuring 161 Control task 174 copying 24 creating 159 creating in Task Developer 159 creating in Workflow Designer 159 Decision tasks 176 disabled 590 disabling 163 email 410 Event-Raise tasks 180 Event-Wait tasks 180 failed 590 failing parent workflow 163 in worklets 195 inherited changes 162 instances 162 list of 158 Load Balancer settings 640 monitoring details 611 non-reusable 110 overview 158 preparing to run 590 promoting to reusable 162 recovery strategies 386 restarting in Workflow Monitor 586 restarting without recovery in Workflow Monitor 587 reusable 110 reverting changes 162 running 590 show full name 8 starting 137 status 590 stopped 590 stopping 140, 590 stopping and aborting in Workflow Monitor 587
succeeded 590 Timer tasks 189 using Tasks toolbar 110 validating 132 Tasks toolbar creating tasks 160 TDPID description 281 temporary file Teradata FastExport attribute 282 tenacity FastExport attribute 281 Teradata connect string example 51 Teradata external loader code page 726 connections 64 control file content override, parameter and variable types 696 date format 726 FastLoad attributes 735 MultiLoad attributes 730 overriding the control file 727 support 712 Teradata Warehouse Builder attributes 737 TPump attributes 732 Teradata FastExport changing the source connection 281 connection attributes 281 creating a connection 280 description 280 overriding the control file 282 rules and guidelines 283 selecting the reader 281 session attributes description 282 steps for using 280 TDPID attribute 281 temporary file, variable types 696 Teradata Warehouse Builder attributes 738 operators 737 terminated status 590 terminating status 590 terminating conditions configuring 341 Terse tracing levels See also PowerCenter Designer Guide defined 663
test load bulk loading 296 enabling 805 file targets 319 number of rows to test 805 relational targets 296 threads Custom transformation 467 HTTP transformation 467 Java transformation 467 partitions 430 TIB/Adapter SDK application connections See also connections configuring 94 properties 93 TIB/Rendezvous application connections See also connections configuring 94 properties 92 TIB/Repository (TIBCO) repository URL, variable types 692 TIBCO application connections configuring 92 time configuring 4 formats 4 time increments Workflow Monitor 594 time stamps session log files 654 session logs 658 workflow log files 654 workflow logs 656 Workflow Monitor 570 time window configuring 580 Timer tasks absolute time 189, 190 definition 189 description 158 example 189 relative time 189, 190 variables in 144, 694 tool names displaying and hiding 8 toolbars adding tasks 110 creating tasks 160 using 15 Workflow Manager 15 Workflow Monitor 584
tracing levels See also PowerCenter Designer Guide Normal 663 overriding 816 session 662, 663 Terse 663 Verbose Data 663 Verbose Initialization 663 transaction defined 369 transaction boundary dropping 369 transaction control 369 transaction control bulk loading 365 end of file 366 Integration Service handling 365 open transaction 369 overview 369 points 369 real-time sessions 369 reject file 366 rules and guidelines 372 transformation error 366 transformation scope 369 user-defined commit 365 Transaction Control transformation partitioning guidelines 484 target connection groups 371 transaction control unit See also target connection group description 371 transaction environment SQL configuring 52, 53 parameter and variable types 697 transaction generator active sources 312 effective and ineffective 312 transaction control points 369 transformation expressions parameter and variable types 693 transformation scope defined 369 real-time processing 370 transformations 370 transformations caches 770 configuring pushdown optimization 536 partitioning restrictions 479 producing repeatable data 392 recovering sessions with Incremental Aggregator 378
  session properties 841
Transformations node
  properties 841
Transformations view
  session properties 821
Treat Error as Interruption
  See Suspend on Error
  effect on worklets 192
Treat Source Rows As
  bulk loading 304
  using with target properties 297
Treat Source Rows As property
  overview 258
trees (PeopleSoft)
  names, parameter and variable types 692
truncating
  Table Name Prefix 299
  target tables 298
U
unconnected transformations
  partitioning restrictions 448
Union transformation
  See also PowerCenter Transformation Guide
  pushdown optimization 549
UNIX systems
  email 403
  external loader behavior 713
unknown status
  status 590
unscheduled status 591
unscheduling
  workflows 125
update strategy
  target properties 293
Update Strategy transformation
  See also PowerCenter Transformation Guide
  constraint-based loading 302
  pushdown optimization 550
  using with target and source properties 297
updating
  incrementally 767
URL
  adding through business documentation links 122
user-defined commit
  See also transaction control
  bulk loading 365
user-defined events
  declaring 182
  example 180
  waiting for 184
user-defined joins
  parameter and variable types 693
V
validating
  expressions 122, 132
  multiple sessions 232
  session for recovery 391
  tasks 132
  workflows 132, 133
  worklets 203
variable values
  calculating across partitions 438
variables
  defining in parameter files 691
  email 415
  in Command tasks 169
  input fields that accept variables 692
  overview of types 689
  workflow 144
Verbose Data tracing level
  See also PowerCenter Designer Guide
  configuring session log 663
Verbose Initialization tracing level
  See also PowerCenter Designer Guide
  configuring session log 663
versioned objects
  See also PowerCenter Repository Guide
  Allow Delete without Checkout option 12
  checking in 20
  checking out 20
  comparing versions 21
  searching for in the Workflow Manager 23
  viewing 21
  viewing multiple versions 21
  viewing older versions of objects 21
viewing
  reject file 330
views
  See also database views
W
waiting status 591
web links
  adding to expressions 122
web service messages
  real-time data 337
Web Services application connections
  See also connections
  configuring 96
  endpoint URL 96
Web Services Hub
  running concurrent workflows 557
webMethods application connections
  See also connections
  configuring 99
WebSphere MQ queue connections
  See also connections
  configuring 101
  testing 102
wildcard characters
  configuring source files 266
windows
  customizing 15
  displaying and closing 15
  docking and undocking 15
  fonts 10
  Navigator 3
  Output 3
  overview 3
  panning 7
  reloading 7
  Workflow Manager 3
  Workflow Monitor 570
  workspace 3
Windows Start Menu
  accessing Workflow Monitor 572
Windows systems
  email 404
  external loader behavior 714
  logon network security 407
workflow
  state of operations 377
Workflow Composite Report
  viewing 142
Workflow Designer
  creating tasks 159
  displaying and hiding tool name 8
workflow instance
  adding workflow instances 562
  creating dynamically 564
  description 554
  starting and stopping 563
  starting from command line 564
  using $PMWorkflowRunInstanceName variable 560
  viewing in Workflow Monitor 565
workflow log files
  archiving 654
  configuring 655
  time stamp 654
  viewing concurrent workflows 566
workflow logs
  changing locations 656
  changing name 656
  enabling and disabling 656
  file name and directory, variable types 696
  locating 653
  naming 653
  sample 659, 661
  viewing in Workflow Monitor 588
  workflow log count, variable types 696
Workflow Manager
  adding repositories 19
  arrange 17
  checking out and in versioned objects 20
  configuring for multiple source files 278
  connections overview 36
  copying 24
  CPI-C connection 84
  customizing options 6
  database connections 54, 71
  date and time formats 4
  display options 6
  entering object descriptions 19
  external loader connections 64
  FTP connections 60
  general options 6
  JMS connections 74
  JNDI connections 74
  messages to Workflow Monitor 579
  MSMQ queue connections 76
  overview 2
  PeopleSoft connections 77
  printing the workspace 14
  relational database connections 50
  RFC/BAPI connections 88
  running sessions on a grid 628
  running workflows on a grid 628
  Salesforce connections 79
  SAP ALE IDoc Reader connections 86
  SAP ALE IDoc Writer connections 87
  SAP BW connections 80
  SAP R/3 connections 83, 84
  searching 16
  searching for versioned objects 23
  SFTP connections 60
  Siebel connections 90
  TIB/Adapter SDK connections 94
  TIB/Rendezvous connections 94
  TIBCO connections 92
  toolbars 15
  tools 2
  validating sessions 231
  versioned objects 20
  viewing reports 142
  Web Services connections 96
  webMethods connections 99
  WebSphere MQ connections 101
  windows 3, 15
  zooming the workspace 17
Workflow Monitor
  See also monitoring
  closing folders 575
  cold start tasks or workflows 587
  configuring 578
  connecting to Integration Service 573
  connecting to repositories 573
  customizing columns 581
  deleted Integration Services 573
  deleted tasks 574
  disconnecting from an Integration Service 573
  displaying services 574
  filtering deleted tasks 574
  filtering services 574
  filtering tasks in Task View 574, 599
  Gantt Chart view 570
  hiding columns 581
  hiding services 574
  icon 572
  launching 572
  launching automatically 8
  listing tasks and workflows 592
  Maximum Days 579
  Maximum Workflow Runs 579
  monitor modes 573
  navigating the Time window 593
  notification from Repository Service 579
  opening folders 575
  overview 570
  performing tasks 585
  pinging the Integration Service 573
  receive messages from Workflow Manager 579
  resilience to Integration Service 573
  restarting tasks or workflows without recovery 587
  restarting tasks, workflows, and worklets 586
  searching 595
  Start Menu 572
  starting 572
  statistics 576
  stopping or aborting tasks and workflows 587
  switching views 571
  Task view 570
  time 570
  time increments 594
  toolbars 584
  viewing command task details 615
  viewing concurrent workflows 565
  viewing folder details 609
  viewing history names 589
  viewing Integration Service details 606
  viewing repository details 605
  viewing session details 616
  viewing session failure information 616
  viewing session logs 588
  viewing session statistics 612
  viewing source details 618
  viewing target details 618
  viewing task progress details 611
  viewing workflow details 610
  viewing workflow logs 588
  viewing worklet details 613
  workflow and task status 590
workflow properties
  service levels 641
  suspension email 421
workflow run ID
  description 557
  viewing in the workflow log 567
workflow tasks
  reusable and non-reusable 162
workflow variables
  built-in variables 146
  creating 155
  datatypes 146, 155
  default values 148, 153, 154
  in parameter files 689
  keywords 145
  non-persistent variables 153
  passing values to and from sessions 243
  passing values to and from worklets 199
  persistent variables 153
  predefined 146
  start and current values 153
  user-defined 152
  using 144
  using in expressions 148
workflows
  aborted 590
  aborting 140, 590
  adding tasks 110
  assigning Integration Service 116
  branches 106
  cold start workflows 587
  configuring concurrent with same name 557
  configuring instance names 562
  configuring unique instances 555
  copying 24
  creating 109
  definition 106
  deleting 110
  developing 107, 109
  disabled 590
  disabling 131
  dispatching tasks 640
  distributing over grids 629, 635
  editing 110
  email 410
  events 106
  fail parent workflow 163
  failed 590
  guidelines 107
  links 106
  metadata extensions in 29
  monitor 107
  override Integration Service 135
  override operating system profile 135
  overview 106
  parameter file 153
  preparing to run 590
  properties reference 859
  recovering on a grid 635
  restarting in Workflow Monitor 586
  restarting without recovery in Workflow Monitor 587
  run type 611
  running 135, 590
  scheduled 590
  scheduling 123
  scheduling concurrent instances 123
  scheduling concurrent workflows 568
  selecting a service 107
  service levels 641
  starting 135
  starting concurrent workflows with pmcmd 564
  starting with advanced options 135
  status 138, 590
  stopped 590
  stopping 140, 590
  stopping and aborting in Workflow Monitor 587
  succeeded 590
  suspended 590
  suspending 138, 590
  suspension email 421
  terminated 590
  terminating 590
  unknown status 590
  unscheduled 591
  unscheduling 125
  using tasks 158
  validating 132
  variables 144
  viewing details in the Workflow Monitor 610
  viewing reports 142
  waiting 591
Worklet Designer
  displaying and hiding tool name 8
worklet variables
  in parameter files 689
  passing values between worklets 199
  passing values to and from sessions 243
worklets
  adding tasks 195
  adding to concurrent workflows 568
  assigning variables pre- and post-worklet 199
  assigning variables pre- and post-worklet, procedure 201
  configuring properties 194
  create non-reusable worklets 194
  create reusable worklets 193
  declaring events 195
  developing 193
  email 410
  fail parent worklet 163
  metadata extensions in 29
  monitoring details in the Workflow Monitor 613
  overriding variable value 197
  overview 192
  parameters tab 197
  passing information between 199
  passing information between, example 200
  persistent variable example 197
  persistent variables 197
  restarting in Workflow Monitor 586
  suspended 590
  suspending 192, 590
  validating 203
  variables 197
  waiting 591
workspace
  colors 9
  colors, setting 9
  file directory 8
  fonts, setting 9
  navigating 15
  printing 14
  zooming 17
writers
  session properties 831
WriterWaitTimeOut
  target-based commit 359
writing
  multibyte data to files 326
  to fixed-width files 323, 324
X
XML sources
  numeric data handling 276
XML target
  caches 798
  configure caches 798
XML target cache
  description 798
  variable types 692
XML targets
  active sources 312
  partitioning restrictions 479
  target-based commit 359
Z
zooming
  Workflow Manager 17