
Business Objects Data Integrator

Training: BODI Version XI3



Audience

- Application developers
- Consultants
- Database administrators working on data extraction, data warehousing, or data integration


Assumptions

- You understand your source data systems, RDBMS, business intelligence, and e-commerce messaging concepts.
- You are familiar with SQL (Structured Query Language).
- You are familiar with Microsoft Windows or UNIX platforms, so you can use Data Integrator effectively.


Business Objects Data Integration Platform

The Data Integration Platform consists of:

- Data Integrator: data movement and management server
- Rapid Marts: suite of packaged data caches for speedy delivery and integration of data


Business Objects Data Integration Platform


Rapid Mart SAP R/3 Modules

- Accounts Payable -> FI-Finance
- Accounts Receivable -> FI-Finance
- Cost Center -> CO-Controlling
- Human Resources -> HR-Human Resources
- Inventory -> MM-Materials Management
- Plant Maintenance -> PM-Plant Maintenance
- Production Planning -> PP-Production Planning
- Project Systems -> PS-Project Systems
- Purchasing -> MM-Materials Management
- Sales -> SD-Sales and Distribution

Data Integrator
DI is a data movement and integration platform.


Data Integrator Architecture


Data Integrator operating system platforms


DI Designer runs on the following Windows platforms:

- Windows NT
- Windows 2000 Professional
- Windows 2000 Server
- Windows 2000 Advanced Server
- Windows 2000 Datacenter Server
- Windows XP

All other DI components run on the above Windows platforms and on the following UNIX platforms:

- Solaris 2.7 and 2.8 (SunOS releases 5.7 and 5.8)
- HP-UX 11.00 (PA-RISC 2.0) and 11.1
- IBM AIX 4.3.3.75 with maintenance level 4330-10, and AIX 5.1


Data Integrator Components

Standard components are:

- DI Job Server
- DI Engine
- DI Designer
- DI Repository
- DI Access Server
- DI Administrator
- DI Metadata Reports tool
- DI Web Server
- DI Service
- DI SNMP Agent

Data Integrator Component Relationships


Data Integrator Components

DI Job Server

- Starts the data movement engine that integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions.
- Can move data in either batch or real-time mode, and uses distributed query optimization, multithreading, in-memory caching, in-memory data transformations, and parallel pipelining to deliver high data throughput and scalability.


Data Integrator Components

DI Engine

- When DI jobs are executed, the Job Server starts DI engine processes to perform data extraction, transformation, and movement.
- DI engine processes use parallel pipelining and in-memory data transformations to deliver high data throughput and scalability.


Data Integrator Components

DI Designer

- Allows you to define data management applications consisting of data mappings, transformations, and control logic.
- A development tool with a graphical user interface: developers create objects, then drag, drop, and configure them by selecting icons in flow diagrams, table layouts, and nested workspace pages.

Data Integrator Components

DI Repository

- A set of tables that hold user-created and predefined system objects, source and target metadata, and transformation rules.
- Set up on an open client/server platform to facilitate the sharing of metadata with other enterprise tools. Each repository is stored on an existing RDBMS.
- Associated with one or more DI Job Servers.


Data Integrator Components

There are two types of repositories:

- A local repository is used by an application designer to store definitions of DI objects (such as projects, jobs, work flows, and data flows) and source/target metadata.
- A central repository is an optional component that can be used to support multi-user development. The central repository provides a shared object library, allowing developers to check objects in and out of their local repositories.


Data Integrator Components

DI Access Server

- The Access Server is a real-time, request-reply message broker that collects message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame.
- The Access Server queues messages and sends them to the next available real-time service across any number of computing resources. This approach provides automatic scalability, because the Access Server can start additional real-time services on additional computing resources if traffic for a given real-time service is high.
- Multiple Access Servers can also be configured.


Data Integrator Components

DI Administrator

Browser-based administration of DI resources, including:

- Scheduling, monitoring, and executing batch jobs
- Configuring, starting, and stopping real-time services
- Configuring Job Server, Access Server, and repository usage
- Configuring and managing adapters
- Managing users
- Publishing batch jobs and real-time services via Web services


Data Integrator Components

DI Metadata Reports tool

Provides browser-based reports on DI metadata, which is stored in the repository. Reports are provided for:

- Repository summary
- Job analysis
- Execution statistics
- Impact analysis


Data Integrator Components

DI Web Server

- Supports browser access to the Administrator and the Metadata Reports tool.
- The Windows service name for this server is DI Web Server service; the UNIX equivalent is a daemon. The servlet engine used by the DI Web Server is the Tomcat server.


Data Integrator Components

DI Service

- The DI Service is installed when DI Job Servers and Access Servers are installed.
- It starts Job Servers and Access Servers when you reboot your system.
- The Windows service name is Data Integrator Service; the UNIX equivalent is a daemon named AL_JobService.


Data Integrator Components

DI SNMP Agent

- DI error events can be communicated using SNMP-supported applications for better error monitoring.
- The DI SNMP agent monitors and records information about the Job Servers and the jobs running on the computer where the agent is installed.


Data Integrator Management Tools

- License Server: allows you to centrally control license validation for DI components and licensed extensions.
- Repository Manager: allows you to create, upgrade, and check the versions of local and central repositories.
- Server Manager: allows you to add, delete, or edit the properties of Job Servers and Access Servers. It is automatically installed on each computer on which you install a Job Server or Access Server.


Data Integrator Objects

All entities you create, modify, or work with in DI Designer are called objects. The local object library shows objects such as source and target metadata, system functions, projects, and jobs.

DI has two types of objects:

- Reusable objects
  - Have a single definition.
  - All calls to the object refer to that definition.
  - Changes to the object definition are propagated to all calls to it.
- Single-use objects
  - Defined only within the context of a single job or data flow (e.g., scripts).


Data Integrator Object Relationships


Projects

- A reusable object that allows you to group jobs.
- The highest level of organization offered by DI.
- Used to group jobs whose schedules depend on one another or that you want to monitor together.
- Only one project can be open at a time.
- Projects cannot be shared among multiple users.


Jobs

A job is the only object that is executed. The following objects can be included in a job definition:

- Data flows
- Transforms
- Work flows
- Scripts
- Conditionals
- While loops
- Try/catch blocks


Datastores

- Represent connections between DI and databases or applications, directly or through adapters.
- Allow DI to access metadata from a database or application, and hence to read from or write to it.
- DI datastores can connect to:
  - Databases and mainframe file systems
  - Applications that have pre-packaged or user-written DI adapters
  - SAP R/3, SAP BW, PeopleSoft, J.D. Edwards OneWorld, and J.D. Edwards World

File Formats

DI can use data stored in files as data sources or data targets. File format objects can describe files in:

- Delimited format: characters such as commas or tabs separate each field
- Fixed-width format: the column width is specified by the user
- SAP R/3 format


Data Flows

- Data flows extract, transform, and load data. Reading sources, transforming data, and loading targets all occur inside a data flow.
- A data flow can be added to a job or a work flow.
- From inside a work flow, a data flow can send and receive information to and from other objects through input and output parameters.


Data Flows (diagram)

Input Parameters -> Source(s) -> Data Transformation Operations -> Target(s) -> Output Parameters


Work Flows

A work flow defines the decision-making process for executing data flows. The purpose of a work flow is to prepare for executing data flows and to set the state of the system after the data flows are complete. The following objects can be elements in work flows:

- Work flows
- Data flows
- Scripts
- Conditionals
- While loops
- Try/catch blocks


Work Flows (diagram)

Control Operations -> Data Flow -> Control Operations


Conditionals

Conditionals are single-use objects used to implement if/then/else logic in a work flow. To define a conditional, you specify a condition and two logical branches:

- If: a Boolean expression that evaluates to TRUE or FALSE. You can use functions, variables, and standard operators to construct the expression.
- Then: work flow elements to execute if the If expression evaluates to TRUE.
- Else (optional): work flow elements to execute if the If expression evaluates to FALSE.

Conditionals (diagram)

Work Flow -> Conditional: If Process Successful
- True -> Then: Run Work Flow
- False -> Else: Send E-mail
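The If condition is an ordinary DI expression. A minimal sketch, assuming hypothetical global variables and the DI script comparison and logical operators:

    # TRUE routes execution to the Then branch, FALSE to the Else branch
    $G_LoadStatus = 'SUCCESS' AND $G_RowCount > 0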


While Loops

The while loop is a single-use object that you can use in a work flow. It repeats a sequence of steps as long as a condition is true.


While Loops (diagram)

While Number != 0
- True -> Step 1 -> Step 2, then re-test the condition
- False -> exit the loop


Try / Catch Blocks

A try/catch block is a combination of one try object and one or more catch objects that allow you to specify alternative work flows if errors occur while DI is executing a job. Try/catch blocks:

- Catch classes of exceptions thrown by DI, the DBMS, or the operating system
- Apply solutions that you provide
- Continue execution

Try and catch objects are single-use objects. A catch commonly runs a recovery script, as in the sketch below.
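A minimal sketch of a recovery script inside a catch object. It assumes the error_number(), error_message(), job_name(), and raise_exception() functions are available in this DI version; the global variable is hypothetical:

    # Record the failure, then abort the job with an explicit error
    print('Job [job_name()] failed with error [error_number()]: [error_message()]');
    $G_JobStatus = 'FAILED';
    raise_exception('Load aborted - see the error log for details.');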



Try / Catch Blocks

Categories of available exceptions are:

- Database access errors
- Email errors
- Engine abort errors
- Execution errors
- File access errors
- Microsoft connection errors
- Parser errors
- Predefined transform errors
- Repository access errors
- Resolver errors
- System exception errors
- User transform errors


Scripts

Scripts are single-use objects used to call functions and assign values to variables in a work flow. A script can contain the following statements (see the sketch below):

- Function calls
- If statements
- While statements
- Assignment statements
- Operators
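A minimal DI script sketch covering each statement type; the datastore name, table, and global variables are hypothetical, while sysdate(), sql(), and print() are built-in functions:

    # Assignment statements and function calls
    $G_RunDate = sysdate();
    $G_RowCount = sql('DS_Target', 'select count(*) from SALES_FACT');

    # If statement
    if ($G_RowCount > 0)
    begin
       print('SALES_FACT already holds [$G_RowCount] rows.');
    end
    else
    begin
       print('SALES_FACT is empty - initial load required.');
    end

    # While statement with an operator-based condition
    $G_Retry = 0;
    while ($G_Retry < 3)
    begin
       $G_Retry = $G_Retry + 1;
    end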

Types of Lookup Functions

A lookup function retrieves a value from a table or file based on the values in a different source table or file. DI provides three, sketched below:

1) lookup
2) lookup_ext
3) lookup_seq
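A hedged sketch of a lookup_ext() call as it might appear in a Query column mapping, following the bracketed argument form the Designer generates; the datastore, owner, table, and column names are hypothetical:

    # Return CUST_NAME from the lookup table where CUST_ID matches the input;
    # 'PRE_LOAD_CACHE' caches the table first, 'MAX' picks one row on duplicates,
    # and 'UNKNOWN' is the default when no row matches
    lookup_ext([DS_Source.DBO.CUSTOMER, 'PRE_LOAD_CACHE', 'MAX'],
               [CUST_NAME],
               ['UNKNOWN'],
               [CUST_ID, '=', Query.CUST_ID])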


Variables

Variables are symbolic placeholders for values.

Local Variables

- Local variables are local to the work flow in which they are defined: a local variable defined in a work flow is available for use in any of the single-use objects in that work flow.
- The value of a local variable can be passed as a parameter into another work flow or data flow called from the work flow.


Variables

Global Variables

- Global variables are global within a job. Once a name is used for a global variable in a job, that name becomes reserved for the job.
- Global variables are exclusive within the context of the job in which they are created.
- Setting parameters is not necessary when you use global variables.


Parameters

- Parameters are expressions passed to a work flow or data flow when it is called.
- Parameters can be defined to pass values into and out of work flows, data flows, and custom functions.


Transforms

The following transforms are available from the Transforms tab of the object library:

- Case
- Date_Generation
- Effective_Date
- Hierarchy_Flattening
- History_Preserving
- Key_Generation
- Map_Operation
- Merge
- Pivot (Columns to Rows)
- Query
- Reverse Pivot (Rows to Columns)
- Row_Generation
- SQL
- Table_Comparison


Query Transform
Retrieves a data set that satisfies conditions that you specify. A query transform is similar to a SQL SELECT statement.
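To make the SELECT analogy concrete, here is an illustrative SQL statement; the Query transform's column mappings, Where tab, and Order By tab correspond to the clauses below (table and column names are hypothetical):

    -- Mappings: cust_id passed through, cust_name upper-cased
    SELECT   cust_id,
             upper(cust_name) AS cust_name
    FROM     customer           -- input schema
    WHERE    country = 'US'     -- Where tab
    ORDER BY cust_id;           -- Order By tab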


Query Transform


Query Transform (editor)

The Query transform editor presents:

- Input Schema
- Output Schema
- Options


Case Transform

- Specifies multiple paths in a single transform (different rows are processed in different ways).
- Simplifies branch logic in data flows by consolidating case or decision-making logic in one transform.
- Paths are defined in an expression table.


Case Transform



SQL Transform

Performs the indicated SQL query operation. Use this transform to perform standard SQL operations that cannot be performed using other built-in transforms.
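An illustrative example of the SQL text you might enter in the transform; the tables and columns are hypothetical, and the transform's output schema must match the columns the query returns:

    -- Feed the data flow one row per order with its customer name
    SELECT o.order_id,
           o.order_date,
           c.cust_name
    FROM   orders o
           INNER JOIN customer c ON c.cust_id = o.cust_id
    WHERE  o.order_date >= '2010-01-01';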


SQL Transform



Merge Transform
Combines incoming data sets, producing a single output data set with the same schema as the input data sets.
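In SQL terms, the Merge transform behaves like a UNION ALL: the inputs share one schema and duplicate rows are not removed. An illustrative equivalent with hypothetical tables:

    SELECT cust_id, cust_name FROM customer_east
    UNION ALL
    SELECT cust_id, cust_name FROM customer_west;

If duplicates must be eliminated, a Query transform with the Distinct rows option can follow the Merge.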


Merge Transform



Row_Gen Transform
Produces a data set with a single column. The column values start from zero and increment by one to a specified number of rows.


Row_Gen Transform


Key_Generation Transform
Generates new keys for new rows in a data set. The Key_Generation transform looks up the maximum existing key value from a table and uses it as the starting value to generate new keys.
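DI also exposes this behavior as a function usable in a Query column mapping. A hedged sketch, assuming the key_generation(table, key_column, increment) signature; the datastore, table, and column are hypothetical:

    # Start from max(CUST_KEY) in the target dimension and add 1 per new row
    key_generation('DS_Target.DBO.CUSTOMER_DIM', 'CUST_KEY', 1)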


Key_Generation Transform



Date_Generation Transform
Produces a series of dates incremented as you specify. Use this transform to produce the key values for a time dimension target. From this generated sequence you can populate other fields in the time dimension (such as day_of_week) using functions in a query.
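For example, Query mappings downstream of the transform can derive time-dimension attributes from the generated date column (assumed here to be named DI_GENERATED_DATE) using built-in date functions:

    month(Date_Generation.DI_GENERATED_DATE)        # month number, 1-12
    quarter(Date_Generation.DI_GENERATED_DATE)      # quarter number, 1-4
    day_in_week(Date_Generation.DI_GENERATED_DATE)  # day number within the week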


Date_Generation Transform



Table_Comparison Transform
Compares two data sets and produces the difference between them as a data set with rows flagged as INSERT or UPDATE. The Table_Comparison transform allows you to detect and forward changes that have occurred since the last time a target was updated.
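A rough SQL analogy, illustrative only and over hypothetical tables: rows missing from the comparison table would be flagged INSERT, and rows present but changed would be flagged UPDATE:

    SELECT s.*,
           CASE WHEN t.cust_id IS NULL THEN 'INSERT' ELSE 'UPDATE' END AS op
    FROM   staging_customer s
           LEFT JOIN customer_dim t ON t.cust_id = s.cust_id
    WHERE  t.cust_id IS NULL           -- new row
       OR  t.cust_name <> s.cust_name; -- changed row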


Table_Comparison Transform


Map_Operation Transform

Allows conversions between data manipulation operations. The Map_Operation transform allows you to change operation codes on data sets to produce the desired output. For example, if a row in the input data set was updated by some previous operation in the data flow, you can use this transform to map the UPDATE operation to an INSERT; the result is that UPDATE rows become INSERT rows, preserving the existing row in the target.

Map_Operation Transform


Table_Comparison & Map_Operation Transforms


History_Preserving Transform

The History_Preserving transform allows you to produce a new row in your target rather than updating an existing row. You indicate the columns in which the transform identifies changes to be preserved. If the value of one of those columns changes, the transform creates a new row for each row flagged as UPDATE in the input data set.


Pivot Transform (Columns to Rows)

- Creates a new row for each value in a column that you identify as a pivot column.
- The Pivot transform allows you to change how the relationship between rows is displayed.
- For each value in each pivot column, DI produces a row in the output data set.
- You can create pivot sets to specify more than one pivot column.


Pivot Transform (Columns to Rows)

Input (one row per region):

    Region  Sales-2001  Sales-2002  Sales-2003
    North   200         300         400
    East    300         600         700
    West    350         800         770
    South   800         200         3750

Output (North rows shown):

    Region  Year  Sales
    North   2001  200
    North   2002  300
    North   2003  400
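The same columns-to-rows reshaping, expressed as illustrative SQL over a hypothetical regional_sales table:

    SELECT region, 2001 AS year, sales_2001 AS sales FROM regional_sales
    UNION ALL
    SELECT region, 2002, sales_2002 FROM regional_sales
    UNION ALL
    SELECT region, 2003, sales_2003 FROM regional_sales;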



Reverse Pivot Transform (Rows to Columns)

- Creates one row of data from several existing rows.
- The Reverse Pivot transform allows you to combine data from several rows into one row by creating new columns.
- For each unique value in a pivot axis column and each selected pivot column, DI produces a column in the output data set.


Reverse Pivot Transform (Rows to Columns)

Input:

    Region  Year  Sales
    North   2001  200
    North   2002  300
    North   2003  400

Output:

    Region  2001  2002  2003
    North   200   300   400


Functions

Functions operate on single values, such as values in specific columns in a data set. You can use functions in the following operations:

- Queries
- Scripts
- Conditionals

You can use:

- Built-in functions (DI functions)
- Custom functions (user-defined functions; see the sketch below)
- Database and application functions (functions specific to a DBMS)
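Custom functions are written in the DI script language from the object library. A minimal sketch, assuming custom functions return their result with a Return statement; the function name and parameters are hypothetical:

    # Body of a custom function CF_FullName($P_FirstName, $P_LastName):
    # returns 'First Last' as a single varchar value
    Return rtrim($P_FirstName) || ' ' || rtrim($P_LastName);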


Procedures
DI supports the use of stored procedures for Oracle, Microsoft SQL Server, Sybase, and DB2 databases. You can call stored procedures from the jobs you create and run in DI.
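One way to invoke a procedure from a script is the sql() function, which sends a statement over the named datastore connection; the datastore and procedure names are hypothetical:

    # Run a SQL Server stored procedure against the target database
    sql('DS_Target', 'exec dbo.usp_refresh_aggregates');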


Debugging

- Execute a job in Data Scan mode
- View and analyze the output data in the Data Scan window
- Compare and analyze different data samples


Debugging Data Scan Mode


Debugging: Analyzing the Output

The Data Scan window presents:

- Object list
- Scan date and time
- Schema area
- Data area


Migration and Repositories


The development process you use to create your ETL application involves three distinct phases: design, test, and production. Each phase may require a different computer in a different environment, and different security settings for each. To control the environment differences, each phase may require a different repository.


Migration and Repositories (diagram)

Design Repository -> export -> Test Repository -> export -> Production Repository



Migration and Repositories

When moving objects from one phase to another, export jobs from your source repository to either a file or a database, then import them into your target repository.


Exporting Objects to a Database

You can export objects from the current repository to another repository. However, the other repository must be the same version as the current one. The export process allows you to change environment-specific information defined in datastores and file formats to match the new environment.

Exporting/Importing Objects to/from a File

You can also export objects to a file. If you choose a file as the export destination, DI does not provide options to change environment-specific information. Importing objects or an entire repository from a file overwrites existing objects with the same names in the destination repository. You must restart DI after the import process completes.

Parallel Execution

The maximum number of parallel DI engine processes is set in the Job Server options (Tools > Options > Job Server > Environment). This allows transforms to run in parallel.


Parallel Work Flows / Data Flows

