Atul Singh
atulsingh@in.ibm.com
What is DataStage?
DataStage is an ETL tool used to design jobs for Extraction, Transformation, and Loading.
Ideal tool for data integration projects such as data warehouses and data marts.
9/26/2013 4:25 AM
Data Warehouse
A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data.
[Figure: data warehouse diagram. Labels: Optimized Loader, ERP Systems, Purchased Data]
to the
competition
The Motivation
DATA
INFORMATION
UNDERSTANDING
DECISION
INTRODUCTION TO DATASTAGE
What is DataStage?
DataStage is a client-server application.
The server can be installed on either Windows or Unix operating systems.
The client can be installed on Windows.
Communication between the client tools and DataStage server
Design jobs for Extraction, Transformation, and Loading (ETL)
Ideal tool for data integration projects such as data warehouses, data marts, and system migration
Extract
Load
Transform
DataStage Architecture
SERVER (NT/UNIX; Intel, Alpha, Unix, Solaris): ENGINE
CLIENT (WIN 95/NT): MANAGER, DESIGNER, DIRECTOR, ADMIN
Graphical workflow style tools for point-and-click specifications of sources, targets and transformation requirements
DataStage Terminology
Project: A project is a collection of related jobs.
Job: A job is an executable program built from different stages in the GUI.
Stages: They represent the processing steps required.
Links: They represent the flow of data between different stages.
Shared Containers: They define reusable logic.
Sequences: They allow a sequence of related jobs to be run.
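The relationship between these terms can be pictured as a small data model. The sketch below is only an illustration in Python, not how DataStage stores its objects; the class names and the example job, stage, and project names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str  # a processing step, e.g. a source, a transformer, or a target


@dataclass
class Link:
    source: str  # name of the stage the data flows from
    target: str  # name of the stage the data flows to


@dataclass
class Job:
    name: str
    stages: list[Stage] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)


@dataclass
class Project:
    name: str
    jobs: list[Job] = field(default_factory=list)


# Hypothetical job: read a file, map its columns, write to a table.
job = Job(
    name="LoadCustomers",
    stages=[Stage("CustomerFile"), Stage("MapColumns"), Stage("CustomerTable")],
    links=[Link("CustomerFile", "MapColumns"), Link("MapColumns", "CustomerTable")],
)
project = Project(name="Training", jobs=[job])
```

A project holds jobs, a job holds stages, and links connect pairs of stages by name.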
DataStage Administrator
Client Logon
DataStage Manager
DataStage Designer
DataStage Director
Developing in DataStage
Define global and project properties in Administrator
Import metadata into Manager
Build job in Designer
Compile in Designer
Validate, run, and monitor in Director
DataStage Projects
Project Properties
Projects can be created and deleted in Administrator
Project properties and defaults are set in Administrator
Setting Project Properties
To set project properties, log on to Administrator, select your project, and then click Properties.
Environment Variables
Permissions Tab
Tunables Tab
Parallel Tab
What Is Metadata?
Data
Source
Meta Data
Transform
Target
Meta Data
DataStage Manager
Manager Contents
Metadata describing sources and targets: table definitions
DataStage objects: jobs, routines, table definitions, etc.
Export Procedure
In Manager, click Export > DataStage Components
Select DataStage objects for export
Specify type of export: DSX, XML
Specify file path on client machine
Import Procedure
In Manager, click Import > DataStage Components
Select DataStage objects for import
Metadata Import
Import format and column definitions from sequential files
Import relational table column definitions
Imported as Table Definitions
Table definitions can be loaded into job stages
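As a rough illustration of what importing metadata from a sequential file involves, the Python sketch below reads a header row and one sample row and derives a simple table definition. It is not the DataStage import wizard; the function names, the type labels, and the sample data are assumptions made for the example.

```python
import csv
import io


def infer_type(value: str) -> str:
    """Guess a simple column type from one sample value (hypothetical type labels)."""
    try:
        int(value)
        return "Integer"
    except ValueError:
        pass
    try:
        float(value)
        return "Float"
    except ValueError:
        return "VarChar"


def import_table_definition(text: str, delimiter: str = ",") -> list[dict]:
    """Build column definitions from the header row and the first data row."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(rows)
    sample = next(rows)
    return [
        {"column": name, "type": infer_type(value)}
        for name, value in zip(header, sample)
    ]


# Hypothetical sequential file content.
sample_file = "cust_id,name,balance\n101,Asha,2500.75\n"
table_definition = import_table_definition(sample_file)
for column in table_definition:
    print(column)
```

The resulting list plays the role of a table definition: a reusable description of the columns that a stage could load instead of having them typed in by hand.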
What Is a Job?
Executable DataStage program
Created in DataStage Designer, but can use components from Manager
Built using a graphical user interface
Compiles into Orchestrate shell language (OSH)
Designer Toolbar
Show/hide metadata markers
Job properties
Compile
Tools Palette
Transformer Stage
Used to define constraints, derivations, and column mappings
A column mapping maps an input column to an output column
In this module we will just define column mappings (no derivations)
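A column mapping without derivations amounts to copying values from named input columns to named output columns. The Python sketch below illustrates that idea only; it is not Transformer stage syntax, and the function, column names, and rows are hypothetical.

```python
def apply_column_mapping(row: dict, mapping: dict) -> dict:
    """Copy each mapped input column to its output column, unchanged."""
    return {out_col: row[in_col] for in_col, out_col in mapping.items()}


# Hypothetical mapping: input column name -> output column name.
mapping = {"cust_id": "CUSTOMER_ID", "name": "CUSTOMER_NAME"}

input_rows = [
    {"cust_id": 101, "name": "Asha", "balance": 2500.75},
    {"cust_id": 102, "name": "Ravi", "balance": 90.0},
]
output_rows = [apply_column_mapping(row, mapping) for row in input_rows]
```

Input columns that are not mapped (here `balance`) simply do not appear on the output. A derivation would replace the plain copy with an expression.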
Result
Shows in Manager
Annotation stage
Is a stage on the tool palette
Shows on the job GUI (work area)
Compiling a Job
Running Jobs
DataStage Director
Can schedule, validate, and run jobs
Can be invoked from DataStage Manager or Designer
Tools > Run Director
Job Presentation
Naming conventions
Stages named after the data they access
Container
Partitioner Collector
More Stages
Stage Variables
Show/Hide button
2 sorted input links, 1 output link
"left outer" on primary input, "right outer" on secondary input
Pre-sorting makes joins "lightweight": few rows need to be in RAM
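Why sorted inputs keep a join lightweight can be seen in a small merge-join sketch. The Python below is an illustration of the general technique, not the stage's actual implementation; it assumes both inputs are sorted on the key and that keys are unique on the secondary input, and all names and rows are hypothetical.

```python
def left_outer_merge_join(left, right, key):
    """Left outer join of two inputs that are already sorted by `key`.

    Assumes keys are unique in `right`. Only the current right-hand row is
    held at any time, which is why sorted inputs keep memory use small.
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for row in left:
        # Advance the right input until its key catches up with the left key.
        while current is not None and current[key] < row[key]:
            current = next(right_iter, None)
        if current is not None and current[key] == row[key]:
            yield {**row, **current}  # match: combine both rows
        else:
            yield dict(row)  # no match: keep the left row on its own


# Hypothetical inputs, both sorted by cust_id.
orders = [
    {"cust_id": 1, "amount": 50},
    {"cust_id": 2, "amount": 75},
    {"cust_id": 4, "amount": 20},
]
customers = [
    {"cust_id": 1, "name": "Asha"},
    {"cust_id": 3, "name": "Ravi"},
    {"cust_id": 4, "name": "Meera"},
]
joined = list(left_outer_merge_join(orders, customers, "cust_id"))
```

Each input is read once from front to back. Without the sort, one side would have to be held in memory or rescanned for every row of the other.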
No basic coding
Job Sequencer
Built like a regular job
Type "Job Sequence"
Has stages and links
Job Activity stage represents a DataStage job
Links represent passing control
Stages
Example
Job Activity stage contains conditional triggers
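The idea of links passing control under conditional triggers can be sketched in a few lines of Python. This is a conceptual model only, not the Job Sequencer itself; the job names, the "OK"/"Failed" status values, and the `run_sequence` function are all hypothetical.

```python
def run_sequence(jobs, triggers, start):
    """Run jobs starting at `start`, following the link whose condition matches.

    jobs:     maps a job name to a callable returning "OK" or "Failed"
    triggers: maps a job name to {status: name of the next job}
    """
    executed = []
    current = start
    while current is not None:
        status = jobs[current]()
        executed.append((current, status))
        # Pass control along the link matching this status, if there is one.
        current = triggers.get(current, {}).get(status)
    return executed


# Hypothetical jobs; each callable stands in for one Job Activity.
jobs = {
    "ExtractCustomers": lambda: "OK",
    "LoadWarehouse": lambda: "Failed",
    "NotifyFailure": lambda: "OK",
}
triggers = {
    "ExtractCustomers": {"OK": "LoadWarehouse", "Failed": "NotifyFailure"},
    "LoadWarehouse": {"Failed": "NotifyFailure"},
}
history = run_sequence(jobs, triggers, "ExtractCustomers")
```

Here the extract succeeds, the load fails, and the failure trigger hands control to the notification job, after which the sequence ends because no further link applies.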
QUESTIONS?