1. DATASTAGE QUESTIONS.............................................................................2
2. DATASTAGE FAQ from GEEK INTERVIEW QUESTIONS......................14
3. DATASTAGE FAQ.........................................................................................26
4. TOP 10 FEATURES IN DATASTAGE HAWK.............................................30
5. DATASTAGE NOTES....................................................................................32
6. DATASTAGE TUTORIAL.............................................................................43
About DataStage.................................................................................................................44
Client Components.............................................................................................................44
DataStage Designer. ..........................................................................................................44
DataStage Director.............................................................................................................44
DataStage Manager............................................................................................................44
DataStage Administrator....................................................................................................45
DataStage Manager Roles..................................................................................................45
Server Components............................................................................................................45
DataStage Features.............................................................................................................45
Types of jobs......................................................................................................................46
DataStage NLS...................................................................................................................46
JOB.....................................................................................................................................47
Built-In Stages – Server Jobs.............................................................................................47
Aggregator. ........................................................................................................................47
Hashed File. .......................................................................................................................47
UniVerse. ...........................................................................................................................47
UniData..............................................................................................................................47
ODBC.................................................................................................................................47
Sequential File. ..................................................................................................................48
Folder Stage. ......................................................................................................................48
Transformer........................................................................................................................48
Container............................................................................................................................49
IPC Stage............................................................................................................................49
Link Collector Stage...........................................................................................................49
Link Partitioner Stage.........................................................................................................50
Server Job Properties..........................................................................................................50
Containers...........................................................................................................................50
Local containers. ...............................................................................................................50
Shared containers. .............................................................................................................50
Job Sequences.....................................................................................................................51
7. LEARN FEATURES OF DATASTAGE.........................................................52
8. INFORMATICA vs DATASTAGE:................................................................94
☻Page 1 of 243☻
9. BEFORE YOU DESIGN YOUR APPLICATION........................................104
10. DATASTAGE 7.5x1 GUI FEATURES.........................................................113
11. DATASTAGE & DWH INTERVIEW QUESTIONS...................................117
12. DATASTAGE ROUTINES............................................................................131
13. SET_JOB_PARAMETERS_ROUTINE........................................................198
DATASTAGE QUESTIONS
1. What is the flow of loading data into fact & dimensional tables?
A) Fact table - a table with a collection of foreign keys corresponding to the primary keys
in the dimension tables. It consists of fields with numeric (measure) values.
Dimension table - a table with a unique primary key.
Load - data should first be loaded into the dimension tables. Based on the primary key
values in the dimension tables, the data is then loaded into the fact table.
2. What is the default cache size? How do you change the cache size if needed?
A) The default cache size is 256 MB. We can increase it by going into DataStage
Administrator, selecting the Tunables tab, and specifying the cache size there.
Dynamic files do not perform as well as a well-designed static file, but they do perform
better than a badly designed one. When creating a dynamic file you can specify the
following parameters (although all of these have default values):
11. How to run a Shell Script within the scope of a DataStage job?
A) By using the "ExecSH" command in the Before/After job properties.
12. How to handle date conversions in DataStage? Convert an mm/dd/yyyy format to
yyyy-dd-mm?
A) We use a) the "Iconv" function - internal conversion (external string to internal storage format), and
b) the "Oconv" function - external conversion (internal format back to an output string).
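In DataStage BASIC the two functions are typically chained, parsing the external string into the internal value and then formatting it back out. As a language-neutral sketch of the same two-step idea (this is an illustration in Python, not DataStage code; the ISO yyyy-mm-dd output order is assumed for the target format):

```python
from datetime import datetime

def convert_date(text):
    """Sketch of what Iconv/Oconv do together: parse an external
    mm/dd/yyyy string into an internal value, then format it back out."""
    internal = datetime.strptime(text, "%m/%d/%Y")  # ~ the Iconv step
    return internal.strftime("%Y-%m-%d")            # ~ the Oconv step

print(convert_date("12/31/2005"))  # -> 2005-12-31
```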
Plug-In: a) Good Performance.
b) Database specific. (Only one database)
c) Cannot handle Stored Procedures.
Ascential DataStage
Ascential DataStage EE (3)
Ascential DataStage EE MVS
Ascential DataStage TX
Ascential QualityStage
Ascential MetaStage
Ascential RTI (2)
Ascential ProfileStage
Ascential AuditStage
Ascential Commerce Manager
Industry Solutions
Connectivity
Files
RDBMS
Real-time
PACKs
EDI
Other
Server Components:
Data Stage Engine
Meta Data Repository
Package Installer
Data Stage Designer:
In the Designer we can create, compile and run jobs. In the Transformer we can declare
stage variables, call routines, transforms, macros and functions, and write constraints.
It is a Java engine running in the background.
Q 33 What is sequencer?
It sets the sequence of execution of server jobs.
Q 34 What are Active and Passive stages?
Active Stage: Active stages model the flow of data and provide mechanisms for combining
data streams, aggregating data and converting data from one data type to another. E.g.
Transformer, Aggregator, Sort, Row Merger, etc.
Passive Stage: A passive stage handles access to databases for the extraction or writing of
data. E.g. IPC stage, file-type stages, UniVerse, UniData, DRS stage, etc.
Q 35 What is ODS?
Operational Data Store is a staging area where data can be rolled back.
1) Examples
3) MyName = DSJobName
Q 37 What is keyMgtGetNextValue?
It is a built-in transform that generates sequential numbers. Its input type is literal string
and its output type is string.
Q 40 What is container?
A container is a group of stages and links. Containers enable you to simplify and
modularize your server job designs by replacing complex areas of the diagram with a
single container stage. You can also use shared containers as a way of incorporating server
job functionality into parallel jobs.
DataStage provides two types of container:
• Local containers. These are created within a job and are only accessible by that
job. A local container is edited in a tabbed page of the job’s Diagram window.
• Shared containers. These are created separately and are stored in the Repository
in the same way that jobs are. There are two types of shared container
DSAttachJob – Specify the job you want to control
DSGetLinkMetaData – Get the metadata details for the specified link
DSGetIPCStageProps – Get the buffer size and timeout value for an IPC or Web Service stage
DSGetJobMetaBag – Get information about the meta bag properties associated with the named job
DSGetStageLinks – Get the names of the links attached to the specified stage
DSGetLogSummary – Get a number of log events on the specified subject from the job log
DSGetNewestLogId – Get the newest log event, of a specified type, from the job log
DSDetachJob – Return a job handle previously obtained from DSAttachJob
DSLogFatal – Log a fatal error message in a job's log file and abort the job
DSLogToController – Put an info message in the job log of the job controlling the current job
DSWaitForFile – Suspend a job until a named file either exists or does not exist
Q 42 What are Routines?
Routines are stored in the Routines branch of the Data Stage Repository, where you can
create, view or edit them. The following programming components are classified as routines:
Transform functions, Before/After subroutines, Custom UniVerse functions, ActiveX
(OLE) functions, and Web Service routines.
Q 46 What is job sequencer?
Q 47 What are different activities in job sequencer?
Q 48 What are triggers in data Stages? (conditional, unconditional, otherwise)
Q 49 Have you generated job reports?
Q 50 What is a plug-in?
Q 51 Have you created any custom transforms? Explain. (Oconv)
DATASTAGE FAQ from GEEK INTERVIEW QUESTIONS
Question: Dimension Modeling types along with their significance
Answer:
Data Modelling is broadly classified into 2 types:
A) E-R Diagrams (Entity-Relationship).
B) Dimensional Modelling.
Question: Dimensional modelling is again subdivided into 2 types.
Answer:
A) Star Schema - Simple & Much Faster. Denormalized form.
B) Snowflake Schema - Complex with more Granularity. More normalized form.
Question: Importance of Surrogate Key in Data warehousing?
Answer:
A Surrogate Key is a primary key for a dimension table. Its main importance is that it is
independent of the underlying database, i.e. the surrogate key is not affected by changes
going on in the database.
Question: Differentiate Database data and Data warehouse data?
Answer:
Data in a Database is
A) Detailed or Transactional
B) Both Readable and Writable.
C) Current.
Question: What is the flow of loading data into fact & dimensional tables?
Answer:
Fact table - Table with Collection of Foreign Keys corresponding to the Primary Keys in
Dimensional table. Consists of fields with numeric values.
Dimension table - Table with Unique Primary Key.
Load - Data should first be loaded into the dimension tables. Based on the primary key
values in the dimension tables, the data is then loaded into the fact table.
Question: Orchestrate Vs Datastage Parallel Extender?
Answer:
Orchestrate itself is an ETL tool with extensive parallel processing capabilities, running
on UNIX platforms. DataStage used Orchestrate with DataStage XE (beta version of 6.0)
to incorporate the parallel processing capabilities. Ascential then purchased Orchestrate,
integrated it with DataStage XE, and released a new version, DataStage 6.0, i.e. Parallel
Extender.
Question: Differentiate Primary Key and Partition Key?
Answer:
A Primary Key is a combination of unique and not-null constraints. It can be a collection
of key columns, called a composite primary key. A Partition Key is just a part of the
Primary Key. There are several partition methods such as Hash, DB2, Random, etc. When
using Hash partitioning we specify the Partition Key.
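How hash partitioning on a partition key distributes rows can be sketched as follows (a minimal illustration only; DataStage's internal hash function differs, and the character-sum hash below is just a stand-in chosen for determinism):

```python
def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition by hashing its partition-key value,
    so rows with the same key value always land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # deterministic stand-in hash of the key value
        h = sum(ord(c) for c in str(row[key]))
        partitions[h % num_partitions].append(row)
    return partitions

rows = [{"cust_id": "A1"}, {"cust_id": "B2"}, {"cust_id": "A1"}]
parts = hash_partition(rows, "cust_id", 2)
```

Both "A1" rows are guaranteed to end up in the same partition, which is what makes key-based operations such as aggregation correct after partitioning.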
Question: What are Stage Variables, Derivations and Constraints?
Answer:
Stage Variable - an intermediate processing variable that retains its value during a read
and doesn't pass the value into a target column.
Constraint - a condition that is either true or false and that specifies the flow of data along a link.
Derivation - an expression that specifies the value to be passed on to the target column.
Question: What is the default cache size? How do you change the cache size if
needed?
Answer:
The default cache size is 256 MB. We can increase it by going into DataStage Administrator,
selecting the Tunables tab, and specifying the cache size there.
Question: What are Static Hash files and Dynamic Hash files?
Answer:
As the names themselves suggest. In general we use Type-30 dynamic hashed files. The
data file has a default size limit of 2 GB, and the overflow file is used if the data exceeds
the 2 GB size.
Question: How do you run a DataStage job from the command line?
Answer:
Using the "dsjob" command as follows:
dsjob -run -jobstatus projectname jobname
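That dsjob invocation can be wrapped from a script, which is handy when jobs are launched from schedulers. A small sketch (the helper names are hypothetical, and actually running it assumes `dsjob` is on the PATH of a DataStage server):

```python
import subprocess

def build_dsjob_cmd(project, job):
    """Build the dsjob invocation that runs a job and waits for its status."""
    return ["dsjob", "-run", "-jobstatus", project, job]

def run_job(project, job):
    # With -jobstatus, dsjob's exit code reflects the job's finishing status.
    return subprocess.call(build_dsjob_cmd(project, job))

print(build_dsjob_cmd("myproj", "LoadFact"))
# -> ['dsjob', '-run', '-jobstatus', 'myproj', 'LoadFact']
```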
Question: What are the command line functions that import and export the DS jobs?
Answer:
dsimport.exe - imports the DataStage components.
dsexport.exe - exports the DataStage components.
Question: How to run a Shell Script within the scope of a Data stage job?
Answer:
By using the "ExecSH" command in the Before/After job properties.
Question: What are OConv () and Iconv () functions and where are they used?
Answer:
IConv() - Converts a string to an internal storage format
OConv() - Converts an expression to an output format.
Question: What are Modulus and Splitting in a dynamic hashed file?
Answer:
In a dynamic hashed file, the size of the file changes as data is written and deleted. The
modulus is the number of groups in the file: when the file grows, groups are split and the
modulus increases; when the file shrinks, groups are merged and the modulus decreases.
Question: Did you Parameterize the job or hard-coded the values in the jobs?
Answer:
We always parameterized the job. Either the values come from Job Properties or from a
'Parameter Manager' - a third-party tool. There is no way you would hard-code
parameters in your jobs. The most often parameterized variables in a job are: DB DSN name,
username, password, and the dates against which the data is to be looked up.
Question: Have you ever been involved in updating DS versions, like DS 5.X? If so, tell
us some of the steps you took in doing so?
Answer:
Yes.
The following are some of the steps:
• Definitely take a backup of the whole project(s) by exporting each project as a .dsx file.
• See that you use the same parent folder for the new version as well, so that your old
jobs that use hard-coded file paths continue to work.
• After installing the new version, import the old project(s); you then have to compile them
all again. You can use the 'Compile All' tool for this.
• Make sure that all your DB DSNs are created with the same names as the old ones. This
step is for moving DS from one machine to another.
• In case you are just upgrading your DB from Oracle 8i to Oracle 9i, there is a tool on the
DS CD that can do this for you.
• Do not stop the 6.0 server before the upgrade; the version 7.0 install process collects
project information during the upgrade. There is NO rework (recompilation of existing
jobs/routines) needed after the upgrade.
Question: What are other Performance tunings you have done in your last project to
increase the performance of slowly running jobs?
Answer:
• Staged the data coming from ODBC/OCI/DB2UDB stages or any database on the
server using Hash/Sequential files for optimum performance, and also for data recovery
in case a job aborts.
• Tuned the OCI stage's 'Array Size' and 'Rows per Transaction' numerical values for
faster inserts, updates and selects.
• Tuned the 'Project Tunables' in Administrator for better performance.
• Used sorted data for the Aggregator.
• Sorted the data as much as possible in the DB and reduced the use of DS-Sort, for
better job performance.
• Removed unused data from the source as early as possible in the job.
• Worked with the DB admin to create appropriate indexes on tables for better
performance of DS queries.
• Converted some of the complex joins/business logic in DS to stored procedures for
faster execution of the jobs.
• If an input file has an excessive number of rows and can be split up, then use standard
logic to run jobs in parallel.
• Before writing a routine or a transform, make sure the required functionality is not
already in one of the standard routines supplied in the sdk or ds utilities categories.
• Constraints are generally CPU intensive and take a significant amount of time to
process. This may be the case if the constraint calls routines or external macros, but if it
is inline code then the overhead will be minimal.
• Try to have the constraints in the 'Selection' criteria of the jobs themselves. This will
eliminate unnecessary records before joins are made.
• Tuning should occur on a job-by-job basis.
• Use the power of the DBMS.
• Try not to use a Sort stage when you can use an ORDER BY clause in the database.
• Using a constraint to filter a record set is much slower than performing a SELECT …
WHERE….
• Make every attempt to use the bulk loader for your particular database. Bulk loaders
are generally faster than using ODBC or OLE.
Question: Tell me one situation from your last project where you faced a problem, and
how did you solve it?
Answer:
1. The jobs in which data is read directly from OCI stages were running extremely slowly. I
had to stage the data before sending it to the transformer to make the jobs run faster.
2. A job aborted in the middle of loading some 500,000 rows. We had the option of either
cleaning/deleting the loaded data and then running the fixed job, or running the job again
from the row at which it had aborted. To make sure the load was correct, we opted for the former.
Most of the time the data was sent to us in the form of flat files; the data is dumped and
sent to us. In some cases where we needed to connect to DB2 for lookups, we used
ODBC drivers to connect to DB2 (or) DB2-UDB depending on the situation and
availability. Certainly DB2-UDB is better in terms of performance, as native drivers are
always better than ODBC drivers. 'iSeries Access ODBC Driver 9.00.02.02' -
ODBC drivers to connect to AS400/DB2.
Question: What are Routines and where/how are they written and have you written
any routines before?
Answer:
Routines are stored in the Routines branch of the DataStage Repository, where you can
create, view or edit them.
The following are different types of Routines:
1. Transform Functions
2. Before-After Job subroutines
3. Job Control Routines
Question: What will you do in a situation where somebody wants to send you a file and
use that file as an input or reference, and then run the job?
Answer:
• Under Windows: Use the 'WaitForFileActivity' under the Sequencers and then run the
job. Maybe you can schedule the sequencer around the time the file is expected to
arrive.
• Under UNIX: Poll for the file. Once the file has arrived, start the job or sequencer
depending on the file.
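The UNIX polling loop can be sketched like this (a hypothetical helper; in practice it could equally be a shell script driven by cron that then invokes the job):

```python
import os
import time

def wait_for_file(path, timeout_secs=3600, poll_secs=10):
    """Poll for a file; return True once it exists, False on timeout."""
    deadline = time.time() + timeout_secs
    while time.time() < deadline:
        if os.path.exists(path):
            return True       # file has arrived; caller can now start the job
        time.sleep(poll_secs)
    return False              # gave up waiting
```

A wrapper would call `wait_for_file("/data/incoming/feed.dat")` and only launch the job or sequencer when it returns True.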
Question: What is the utility you use to schedule the jobs on a UNIX server other
than using Ascential Director?
Answer:
Use the crontab utility along with the dsexecute() function, with the proper parameters passed.
Answer:
Yes. One of the most important requirements.
Question: How would you call an external Java function that is not supported by
DataStage?
Answer:
Starting from DS 6.0 we have the ability to call external Java functions using a Java
package from Ascential. In this case we can even use the command line to invoke the Java
function, write the return values from the Java program (if any) to a file, and use that file
as a source in a DataStage job.
Question: How will you determine the sequence of jobs to load into data warehouse?
Answer:
First we execute the jobs that load the data into Dimension tables, then Fact tables, then
load the Aggregator tables (if any).
Question: The above might raise another question: why do we have to load the
dimension tables first, then the fact tables?
Answer:
As we load the dimensional tables the keys (primary) are generated and these keys
(primary) are Foreign keys in Fact tables.
Question: Does the selection of 'Clear the table and Insert rows' in the ODBC stage
send a TRUNCATE statement to the DB, or does it do some kind of DELETE logic?
Answer:
There is no TRUNCATE on ODBC stages. 'Clear the table' issues a DELETE FROM
statement. On an OCI stage such as Oracle, you do have both Clear and Truncate
options. They are radically different in permissions (TRUNCATE requires you to have
ALTER TABLE permission, whereas DELETE doesn't).
Question: How do you rename all of the jobs to support your new File-naming
conventions?
Answer:
Create an Excel spreadsheet with the new and old names. Export the whole project as a .dsx.
Write a Perl program which does a simple rename of the strings by looking up the Excel
file. Then import the new .dsx file, preferably into a new project for testing. Recompile all
jobs. Be cautious: the names of the jobs will also have changed in your job-control or
Sequencer jobs, so you have to make the necessary changes to those Sequencers.
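The rename pass over the exported .dsx file amounts to a mapping-driven search-and-replace. A sketch (in Python rather than Perl; the inline mapping stands in for the Excel sheet, and the job names are made up for illustration):

```python
def rename_jobs(dsx_text, name_map):
    """Replace every old job name in the exported .dsx text with its
    new name.  Longest names first, so overlapping names don't clash."""
    for old in sorted(name_map, key=len, reverse=True):
        dsx_text = dsx_text.replace(old, name_map[old])
    return dsx_text

mapping = {"LoadCust": "JB_DIM_CUSTOMER", "LoadFact": "JB_FCT_SALES"}
print(rename_jobs('JOB "LoadCust"; JOB "LoadFact"', mapping))
# -> JOB "JB_DIM_CUSTOMER"; JOB "JB_FCT_SALES"
```

A real pass would read the mapping from the spreadsheet and stream the .dsx file; the caution about job-control jobs applies because their references are plain text in the same export.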
Question: How good are you with your PL/SQL?
Answer:
On the scale of 1-10 say 8.5-9
Question: What are the main differences between Ascential DataStage and
Informatica PowerCenter?
Answer:
Chuck Kelley’s Answer: You are right; they have pretty much similar functionality.
However, what are the requirements for your ETL tool? Do you have large sequential files
(1 million rows, for example) that need to be compared every day versus yesterday? If so,
then ask how each vendor would do that. Think about what process they are going to do.
Are they requiring you to load yesterday’s file into a table and do lookups? If so, RUN!!
Are they doing a match/merge routine that knows how to process this in sequential files?
Then maybe they are the right one. It all depends on what you need the ETL to do. If you
are small enough in your data sets, then either would probably be OK.
Les Barbusinski’s Answer: Without getting into specifics, here are some differences you
may want to explore with each vendor:
• Does the tool use a relational or a proprietary database to store its Meta data and
scripts? If proprietary, why?
• What add-ons are available for extracting data from industry-standard ERP,
Accounting, and CRM packages?
• Can the tool’s Meta data be integrated with third-party data modeling and/or
business intelligence tools? If so, how and with which ones?
• How well does each tool handle complex transformations, and how much external
scripting is required?
• What kinds of languages are supported for ETL script extensions?
Almost any ETL tool will look like any other on the surface. The trick is to find out which
one will work best in your environment. The best way I’ve found to make this
determination is to ascertain how successful each vendor’s clients have been using their
product. Especially clients who closely resemble your shop in terms of size, industry, in-
house skill sets, platforms, source systems, data volumes and transformation complexity.
Ask both vendors for a list of their customers with characteristics similar to your own that
have used their ETL product for at least a year. Then interview each client (preferably
several people at each site) with an eye toward identifying unexpected problems, benefits,
or quirkiness with the tool that have been encountered by that customer. Ultimately, ask
each customer – if they had it all to do over again – whether or not they’d choose the same
tool and why? You might be surprised at some of the answers.
Joyce Bischoff’s Answer: You should do a careful research job when selecting products.
You should first document your requirements, identify all possible products and evaluate
each product against the detailed requirements. There are numerous ETL products on the
market and it seems that you are looking at only two of them. If you are unfamiliar with
the many products available, you may refer to www.tdan.com, the Data Administration
Newsletter, for product lists.
If you ask the vendors, they will certainly be able to tell you which of their product’s
features are stronger than the other product. Ask both vendors and compare the answers,
which may or may not be totally accurate. After you are very familiar with the products,
call their references and be sure to talk with technical people who are actually using the
product. You will not want the vendor to have a representative present when you speak
with someone at the reference site. It is also not a good idea to depend upon a high-level
manager at the reference site for a reliable opinion of the product. Managers may paint a
very rosy picture of any selected product so that they do not look like they selected an
inferior product.
Question: What is a batch program in DataStage?
Answer: A batch program is generated at run time and maintained by DataStage itself,
but you can easily change it on the basis of your requirements (extraction, transformation,
loading). Batch programs are generated depending on the nature of your job, either a
simple job or a sequencer job; you can see this program under the job control option.
Question: Suppose 4 jobs are controlled by a sequencer (job 1, job 2, job 3, job 4).
Job 1 has 10,000 rows, but after the run only 5,000 rows have been loaded into the
target table, the rest are not loaded, and the job aborts. How can you sort out the
problem?
Answer:
Suppose the job sequencer synchronizes or controls the 4 jobs, but job 1 has a problem.
In this situation you should go to the Director and check what type of problem is showing:
a data-type problem, a warning message, a job failure or a job abort. If the job failed, it
usually means a data-type problem or a missing column action. Then go to the Run
window -> click -> Tracing -> Performance, or in your target stage -> General -> Action,
where two options are available:
(i) On Fail - Commit, Continue
(ii) On Skip - Commit, Continue.
First check how much data has already been loaded, then select the On Skip option with
Continue, and for the remaining unloaded data select On Fail, Continue. Run the job
again and you should get a success message.
Question: What is a Sequencer?
Answer: Sequencers are job control programs that execute other jobs with preset job
parameters.
Answer: In almost all cases we have to delete the data inserted by this from DB manually
and fix the job and then run the job again.
Question: What is the difference between the Filter stage and the Switch stage?
Ans: There are two main differences, and probably some minor ones as well. The two
main differences are as follows.
1) The Filter stage can send one input row to more than one output link. The Switch
stage cannot - the C switch construct has an implicit break in every case.
2) The Switch stage is limited to 128 output links; the Filter stage can have a
theoretically unlimited number of output links. (Note: this is not a challenge!)
Question: How can I achieve constraint-based loading using DataStage 7.5? My target
tables have interdependencies, i.e. primary key / foreign key constraints. I want my
primary-key tables to be loaded first and then my foreign-key tables, and the primary-key
tables should be committed before the foreign-key tables are loaded. How can I go about it?
2) To improve the performance of the job, you can disable all the constraints on the tables
and load them. Once loading is done, check the integrity of the data; raise exceptions for
the data that does not meet it and cleanse that data.
This is only a suggestion; normally, when loading with constraints enabled, performance
goes down drastically.
3) If you use Star schema modeling, when you create physical DB from the model, you
can delete all constraints and the referential integrity would be maintained in the ETL
process by referring all your dimension keys while loading fact tables. Once all
dimensional keys are assigned to a fact then dimension and fact can be loaded together. At
the same time RI is being maintained at ETL process level.
Question: How do you merge two files into one?
Ans: Either use the Copy command as a Before-job subroutine if the metadata of the 2 files
is the same, or create a job to concatenate the 2 files into one if the metadata is different.
Question: How do you eliminate duplicate rows?
Ans: DataStage provides us with a Remove Duplicates stage in Enterprise Edition. Using
that stage we can eliminate duplicates based on a key column.
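What such key-based deduplication does can be sketched as keeping the first row seen for each key value (a simplified illustration; the actual stage also expects its input partitioned and sorted on the key and lets you choose first/last):

```python
def remove_duplicates(rows, key):
    """Keep the first row for each distinct key value, drop the rest."""
    seen = set()
    out = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
print(remove_duplicates(rows, "id"))  # the second id=1 row is dropped
```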
Question: How do you pass a file name to a job at run time?
Ans: During job development we can create a parameter 'FILE_NAME', and the value can
be passed while running the job.
A master record and an update record are merged only if both of them have the same
values for the merge key column(s) that we specify. Merge key columns are one or more
columns that exist in both the master and update records.
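That matching rule can be sketched as: attach an update record to a master record only when all merge-key columns agree (a simplified illustration of the matching idea, not the stage's full option set for rejects and multiple update links):

```python
def merge_records(masters, updates, keys):
    """Merge each master with the update whose merge-key columns all
    match; update columns extend/overwrite the master's columns."""
    index = {tuple(u[k] for k in keys): u for u in updates}
    merged = []
    for m in masters:
        u = index.get(tuple(m[k] for k in keys))
        merged.append({**m, **u} if u else dict(m))
    return merged

masters = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
updates = [{"id": 1, "city": "Pune"}]
print(merge_records(masters, updates, ["id"]))
```

Here only the id=1 master picks up the update's `city` column; the id=2 master passes through unchanged.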
Business advantages:
Technological advantages:
DATASTAGE FAQ
Basically the architecture of DS is a client/server architecture.
DataStage Manager is used to import & export projects and to view & edit the contents
of the repository.
DataStage Administrator is used for creating projects, deleting projects & setting
environment variables.
DataStage Director is used to run jobs, validate jobs, and schedule jobs.
Server components
DS Server: runs executable server jobs, under the control of the DS Director, that extract,
transform, and load data into a DWH.
DS Package Installer: a user interface used to install packaged DS jobs and plug-ins.
Repository or project: a central store that contains all the information required to build a
DWH or data mart.
3. I have some jobs for which I want to automatically delete the log details every month.
What steps do you take for that?
4. I want to run multiple jobs within a single job. How can you handle that?
1. VSS - Visual SourceSafe
2. CVSS - Concurrent Visual SourceSafe
VSS is designed by Microsoft, but the disadvantage is that only one user can access it at a
time; other users must wait until the first user completes the operation.
With CVSS, many users can access it concurrently. Compared to VSS, the cost of CVSS
is high.
6. What is the difference between clear log file and clear status file?
Clear log - we can clear the log details by using the DS Director. Under the Job menu the
clear log option is available. By using this option we can clear the log details of a
particular job.
Clear status file - lets the user remove the status of the records associated with all stages
of the selected jobs (in DS Director).
7. I developed a job with 50 stages; at run time one stage is missing. How can you
identify which stage is missing?
By using the Usage Analysis tool, which is available in DS Manager, we can find out
which items are used in the job.
8. My job takes 30 minutes to run; I want to run the job in less than 30 minutes.
What are the steps we have to take?
By using the performance tuning aspects which are available in DS, we can reduce the
time (tuning aspects).
Also use the Link Partitioner and Link Collector stages in between passive stages.
The Pivot stage is used for transposition purposes. Pivot is an active stage that maps sets
of columns in an input table to a single column in an output table.
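As a sketch of the mapping the stage performs (in Python rather than DataStage, with made-up column names), several input columns become one output column, one output row per pivoted column:

```python
def pivot(rows, fixed_cols, pivot_cols, out_col):
    """Map a set of input columns onto a single output column,
    emitting one output row per pivoted input column."""
    for row in rows:
        for col in pivot_cols:
            out = {c: row[c] for c in fixed_cols}
            out[out_col] = row[col]
            yield out

rows = [{"id": 1, "q1": 10, "q2": 20}]
result = list(pivot(rows, ["id"], ["q1", "q2"], "sales"))
print(result)  # [{'id': 1, 'sales': 10}, {'id': 1, 'sales': 20}]
```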
10. If a job is locked by some user, how can you unlock the particular job in DS?
We can unlock the job by using the Clean Up Resources option, which is available in DS
Director. Otherwise, we can find the PID (process ID) and kill the process on the UNIX
server.
11. What is a container? How many types of containers are available? Is it possible
to use a container as a lookup?
A container is a group of stages and links. Containers enable you to simplify and
modularize your server job designs by replacing complex areas of the diagram with a
single container stage.
DataStage provides two types of container:
• Local containers. These are created within a job and are only accessible by that job.
• Shared containers. These are created separately and are stored in the Repository in the
same way that jobs are. Shared containers can be used by any job in the project.
To deconstruct a shared container, first you have to convert the shared container to a local
container, and then deconstruct the container.
13. I am getting an input value like X = Iconv("31 DEC 1967", "D"). What is the
value of X?
The value of X is zero.
The Iconv function converts a string to an internal storage format. It takes 31 DEC 1967
as day zero and counts days from that date (31-DEC-1967).
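Since 31 DEC 1967 is day zero of the internal date format, a rough Python stand-in for the "D" conversion can show why X is 0 (the helper name here is invented, not the real Iconv):

```python
from datetime import date

# DataStage's internal date format counts days from 31 DEC 1967, which is day 0.
EPOCH = date(1967, 12, 31)

def iconv_d(day, month_name, year):
    """Rough stand-in for Iconv(date_string, "D"): days since the internal epoch."""
    months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
    d = date(year, months.index(month_name.upper()) + 1, day)
    return (d - EPOCH).days

print(iconv_d(31, "DEC", 1967))  # 0
print(iconv_d(1, "JAN", 1968))   # 1
```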
14. What is the Unit testing, integration testing and system testing?
Unit testing: as for DS, a unit test will check for data type mismatches, the size of
particular data types, and column mismatches.
Integration testing: according to the dependencies, we integrate all jobs into one
sequence; that is called a control sequence.
System testing: system testing is nothing but the performance tuning aspects in DS.
15. What are the command line functions that import and export the DS jobs?
16. How many hashing algorithms are available for static hash file and dynamic hash
file?
17. What happens when you have a job that links two passive stages together?
Obviously there is some process going on. Under the covers, DS inserts a cut-down
Transformer stage between the passive stages, which just passes data straight from one
stage to the other.
Nested Condition. Allows you to further branch the execution of a sequence depending on
a condition.
19. I have three jobs A, B, and C, which are dependent on each other. I want to run
jobs A and C daily, and job B only on Sunday. How can you do it?
First, schedule jobs A and C Monday to Saturday in one sequence.
Then take the three jobs, according to their dependencies, in one more sequence and
schedule that sequence only on Sunday.
The IILive2005 conference marked the first public presentations of the functionality in the
WebSphere Information Integration Hawk release. Though it's still a few months away, I
am sharing my top ten things I am looking forward to in DataStage Hawk:
1) The metadata server. To borrow a simile from that judge on American Idol: "Using
MetaStage is kind of like bathing in the ocean on a cold morning. You know it's good for
you, but that doesn't stop it from freezing the crown jewels." MetaStage is good for ETL
projects, but none of the projects I've been on has actually used it. Too much effort
required to install the software, set up the metabrokers, migrate the metadata, learn
how the product works and write reports. Hawk brings the common repository and
improved metadata reporting, and we can get the positive effects of bathing in sea water
without the shrinkage that comes with it.
2) QualityStage overhaul. Data Quality reporting can be another forgotten aspect of data
integration projects. Like MetaStage, the QualityStage server and client had an additional
install, training and implementation overhead, so many DataStage projects did not use it. I
am looking forward to more integration projects using standardisation, matching and
survivorship to improve quality once these features are more accessible and easier to use.
3) Frictionless Connectivity and Connection Objects. I've called DB2 every rude name
under the sun. Not because it's a bad database but because setting up remote access takes
me anywhere from five minutes to five weeks depending on how obscure the error
message and how hard it is to find the obscure setup step that was missed during
installation. Anything that makes connecting to databases easier gets a big tick from me.
4) Parallel job range lookup. I am looking forward to this one because it will stop people
asking for it on forums. It looks good, it's been merged into the existing lookup form and
seems easy to use. Will be interested to see the performance.
5) Slowly Changing Dimension Stage. This is one of those things that Informatica were
able to trumpet at product comparisons, that they have more out of the box DW support.
There are a few enhancements to make updates to dimension tables easier, there is the
improved surrogate key generator, there is the slowly changing dimension stage and
updates passed to in memory lookups. That's it for me with DBMS generated keys, I'm
only doing the keys in the ETL job from now on! DataStage server jobs have the hash file
lookup where you can read and write to it at the same time, parallel jobs will have the
updateable lookup.
6) Collaboration: better developer collaboration. Everyone hates opening a job and being
told it is locked. "Bloody whatshisname has gone to lunch, locked the job and now his
password protected screen saver is up! Unplug his PC!" Under Hawk you can open a
read-only copy of a locked job, plus you get told who has locked the job so you know
whom to curse.
8) Improved SQL Builder. I know a lot of people cross the street when they see the SQL
Builder coming. Getting the SQL builder to build complex SQL is a bit like teaching a
monkey how to play chess. What I do like about the current SQL builder is that it
synchronises your SQL select list with your ETL column list to avoid column mismatches.
I am hoping the next version is more flexible and can build complex SQL.
9) Improved job startup times. Small parallel jobs will run faster. I call it the death of a
thousand cuts: your very large parallel job takes too long to run because a thousand
smaller jobs are starting and stopping at the same time and cutting into CPU and memory.
Hawk makes these cuts less painful.
10) Common logging. Log views that work across jobs, log searches, log date constraints,
wildcard message filters, saved queries. It's all good. You no longer need to send out a
search party to find an error message.
That's my top ten. I am also hoping the software comes in a box shaped like a hawk and
makes a hawk scream when you open it. A bit like those annoying greeting cards. Is there
any functionality you think Hawk is missing that you really want to see?
DATASTAGE NOTES
DataStage Tips:
1. The Aggregator stage does not support more than one source; if you try to do this, you
will get the error "The destination stage cannot support any more stream input links".
2. You can give any number of input links to a Transformer stage, but you cannot use a
Sequential File stage as a reference link. You can give only one Sequential File stage
as the primary link and any number of other links as reference links. If you try to use a
Sequential File stage as a reference link, you will get the error "The destination stage
cannot support any more stream input links", because a reference link represents a
lookup table; a Sequential File stage cannot be used as a lookup table, but a Hashed
File stage can.
DATABASE Stages:
ODBC Stage:
You can use an ODBC stage to extract, write, or aggregate data. Each ODBC stage can
have any number of inputs or outputs. Input links specify the data you are writing. Output
links specify the data you are extracting and any aggregations required. You can specify
the data on an input link using an SQL statement constructed by DataStage, a generated
query, a stored procedure, or a user-defined SQL query.
• GetSQLInfo: used to get the quote character and schema delimiters of your data
source. Optionally specify the quote character used by the data source. By default, this
is set to " (double quote). You can also click the Get SQLInfo button to connect to
the data source and retrieve the quote character it uses. An entry of 000 (three zeroes)
specifies that no quote character should be used.
Optionally specify the schema delimiter used by the data source. By default this is set
to . (period), but you can specify a different schema delimiter, or multiple schema
delimiters. So, for example, if identifiers have the form
Node:Schema.Owner;TableName, you would enter :.; into this field. You can also click
the Get SQLInfo button to connect to the data source and retrieve the schema
delimiter it uses.
• NLS tab: You can define a character set map for an ODBC stage using the NLS tab of
the ODBC Stage
The ODBC stage can handle the following SQL Server data types:
• GUID
• Timestamp
• SmallDateTime
• Update action. Specifies how the data is written. Choose the option you want from
the drop-down list box:
Clear the table, then insert rows. Deletes the contents of the table and adds the
new rows.
Insert rows without clearing. Inserts the new rows in the table.
Insert new or update existing rows. New rows are added or, if the insert fails, the
existing rows are updated.
Replace existing rows completely. Deletes the existing rows, then adds the new
rows to the table.
Update existing rows only. Updates the existing data rows. If a row with the
supplied key does not exist in the table then the table is not updated but a warning
is logged.
Update existing or insert new rows. The existing data rows are updated or, if this
fails, new rows are added.
Call stored procedure. Writes the data using a stored procedure. When you select
this option, the Procedure name field appears.
User-defined SQL. Writes the data using a user-defined SQL statement. When
you select this option, the View SQL tab is replaced by the Enter SQL tab.
• Create table in target database. Select this check box if you want to automatically
create a table in the target database at run time. A table is created based on the defined
column set for this stage. If you select this option, an additional tab, Edit DDL,
appears. This shows the SQL CREATE statement to be used for table generation.
• Transaction Handling. This page allows you to specify the transaction handling
features of the stage as it writes to the ODBC data source. You can choose whether to
use transaction grouping or not, specify an isolation level, the number of rows written
before each commit, and the number of rows written in each operation.
Isolation Levels: Read Uncommitted, Read Committed, Repeatable Read,
Serializable, Versioning, and Auto-Commit.
Rows per transaction field. This is the number of rows written before the data is
committed to the data table. The default value is 0, that is, all the rows are written
before being committed to the data table.
Parameter array size field. This is the number of rows written at a time. The
default is 1, that is, each row is written in a separate operation.
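The rows-per-transaction behaviour can be sketched in Python (a hypothetical function, not DataStage code): a commit happens after every N rows, or once at the end when the value is 0:

```python
def write_rows(rows, rows_per_transaction, commit):
    """Commit after every `rows_per_transaction` rows; 0 means a single
    commit after all rows have been written (the default behaviour)."""
    pending, commits = 0, 0
    for _ in rows:
        pending += 1
        if rows_per_transaction and pending == rows_per_transaction:
            commit()
            commits += 1
            pending = 0
    if pending:                       # final partial batch
        commit()
        commits += 1
    return commits

n = write_rows(range(10), 3, lambda: None)
print(n)  # 4: commits after rows 3, 6, 9 and one for the last row
```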
==
PROCESSING Stages:
TRANSFORMER Stage:
Transformer stages do not extract data or write data to a target database. They are used to
handle extracted data, perform any conversions required, and pass data to another
Transformer stage or a stage that writes data to a target data table.
Transformer stages can have any number of inputs and outputs. The link from the main
data input source is designated the primary input link. There can only be one primary
input link, but there can be any number of reference inputs.
Input Links
The main data source is joined to the Transformer stage via the primary link, but the stage
can also have any number of reference input links.
A reference link represents a table lookup. These are used to provide information that
might affect the way the data is changed, but do not supply the actual data to be changed.
Reference input columns can be designated as key fields. You can specify key expressions
that are used to evaluate the key fields. The most common use for the key expression is to
specify an equi-join, which is a link between a primary link column and a reference link
column. For example, if your primary input data contains names and addresses, and a
reference input contains names and phone numbers, the reference link name column is
marked as a key field and the key expression refers to the primary link’s name column.
During processing, the name in the primary input is looked up in the reference input. If the
names match, the reference data is consolidated with the primary data. If the names do not
match, i.e., there is no record in the reference input whose key matches the expression
given, all the columns specified for the reference input are set to the null value.
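The equi-join lookup described above can be sketched in Python (illustrative data and names, not DataStage syntax):

```python
# Hypothetical data: the primary input has names and addresses, the reference
# input has names and phone numbers, keyed on the name column.
primary = [{"name": "Smith", "address": "1 High St"},
           {"name": "Jones", "address": "2 Low Rd"}]
reference = {"Smith": {"phone": "555-0101"}}

def lookup_join(primary_rows, ref):
    for row in primary_rows:
        match = ref.get(row["name"])        # key expression: primary name column
        # On a match the reference data is consolidated with the primary data;
        # with no matching record, the reference columns are set to null (None).
        row["phone"] = match["phone"] if match else None
        yield row

out = list(lookup_join(primary, reference))
print(out[1])  # {'name': 'Jones', 'address': '2 Low Rd', 'phone': None}
```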
Where a reference link originates from a UniVerse or ODBC stage, you can look up
multiple rows from the reference table. The rows are specified by a foreign key, as
opposed to a primary key used for a single-row lookup.
Output Links
You can have any number of output links from your Transformer stage.
You may want to pass some data straight through the Transformer stage unaltered, but it’s
likely that you’ll want to transform data from some input columns before outputting it
from the Transformer stage.
The source of an output link column is defined in that column’s Derivation cell within the
Transformer Editor. You can use the Expression Editor to enter expressions or transforms
in this cell. You can also simply drag an input column to an output column’s Derivation
cell, to pass the data straight through the Transformer stage.
In addition to specifying derivation details for individual output columns, you can also
specify constraints that operate on entire output links. A constraint is a BASIC expression
that specifies criteria that data must meet before it can be passed to the output link. You
can also specify a reject link, which is an output link that carries all the data not output on
other links, that is, columns that have not met the criteria.
Each output link is processed in turn. If the constraint expression evaluates to TRUE for
an input row, the data row is output on that link. Conversely, if a constraint expression
evaluates to FALSE for an input row, the data row is not output on that link.
Constraint expressions on different links are independent. If you have more than one
output link, an input row may result in a data row being output from some, none, or all of
the output links.
For example, if you consider the data that comes from a paint shop, it could include
information about any number of different colors. If you want to separate the colors into
different files, you would set up different constraints. You could output the information
about green and blue paint on LinkA, red and yellow paint on LinkB, and black paint on
LinkC.
When an input row contains information about yellow paint, the LinkA constraint
expression evaluates to FALSE and the row is not output on LinkA. However, the input
data does satisfy the constraint criterion for LinkB and the rows are output on LinkB.
If the input data contains information about white paint, this does not satisfy any
constraint and the data row is not output on Links A, B or C, but will be output on the
reject link. The reject link is used to route data to a table or file that is a “catch-all” for
rows that are not output on any other link. The table or file containing these rejects is
represented by another stage in the job design.
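The paint-shop routing above can be sketched in Python (the link names and colors come from the example; the function itself is hypothetical):

```python
def route(rows, constraints):
    """Evaluate every output link's constraint for each row; rows that satisfy
    no constraint at all are carried by the reject link."""
    outputs = {name: [] for name in constraints}
    outputs["reject"] = []
    for row in rows:
        matched = False
        for name, test in constraints.items():
            if test(row):            # constraint TRUE: row is output on this link
                outputs[name].append(row)
                matched = True
        if not matched:              # no link took the row: send it to reject
            outputs["reject"].append(row)
    return outputs

links = {
    "LinkA": lambda r: r["color"] in ("green", "blue"),
    "LinkB": lambda r: r["color"] in ("red", "yellow"),
    "LinkC": lambda r: r["color"] == "black",
}
rows = [{"color": c} for c in ("yellow", "white", "blue")]
out = route(rows, links)
print([r["color"] for r in out["reject"]])  # ['white']
```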
Because the Transformer stage is an active stage type, you can specify routines to be
executed before or after the stage has processed the data. For example, you might use a
before-stage routine to prepare the data before processing starts. You might use an after-
stage routine to send an electronic message when the stage has finished.
The first link to a Transformer stage is always designated as the primary input link.
However, you can choose an alternative link to be the primary link if necessary. To do
this:
1. Select the current primary input link in the Diagram window.
2. Choose Convert to Reference from the Diagram window shortcut menu.
3. Select the reference link that you want to be the new primary input link.
4. Choose Convert to Stream from the Diagram window shortcut menu.
==
AGGREGATOR Stage:
Aggregator stages classify data rows from a single input link into groups and compute
totals or other aggregate functions for each group. The summed totals for each group are
output from the stage via an output link.
If you want to aggregate the input data in a number of different ways, you can have
several output links, each specifying a different set of properties to define how the input
data is grouped and summarized.
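As an illustration of what the stage computes, here is a grouped total in Python (hypothetical column names, not the Aggregator's implementation):

```python
from collections import defaultdict

def aggregate(rows, group_key, sum_col):
    """Classify rows into groups on group_key and total sum_col per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[sum_col]
    return [{group_key: k, sum_col: v} for k, v in totals.items()]

rows = [{"region": "N", "sales": 10}, {"region": "S", "sales": 5},
        {"region": "N", "sales": 7}]
result = aggregate(rows, "region", "sales")
print(result)  # [{'region': 'N', 'sales': 17}, {'region': 'S', 'sales': 5}]
```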
==
FOLDER Stage:
Folder stages are used to read or write data as files in a directory located on the DataStage
server.
The folder stages can read multiple files from a single directory and can deliver the files to
the job as rows on an output link. The folder stage can also write rows of data as files to a
directory. The rows arrive at the stage on an input link.
Note: The behavior of the Folder stage when reading folders that contain other folders is
undefined.
In an NLS environment, the user running the job must have write permission on the folder
so that the NLS map information can be set up correctly.
The Columns tab defines the data arriving on the link to be written in files to the
directory. The first column on the Columns tab must be defined as a key, and gives the
name of the file. The remaining columns are written to the named file, each column
separated by a newline. Data to be written to a directory would normally be delivered in a
single column.
The Columns tab defines a maximum of two columns. The first column must be marked
as the Key and receives the file name. The second column, if present, receives the contents
of the file.
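A Python sketch of the write behaviour (hypothetical column names): the key column supplies the file name and the second column the file contents:

```python
import os
import tempfile

def folder_write(directory, rows):
    """Write each row as a file: the key column gives the file name,
    the second column gives the file contents."""
    for row in rows:
        path = os.path.join(directory, row["filename"])  # key column
        with open(path, "w") as f:
            f.write(row["contents"])                     # contents column

d = tempfile.mkdtemp()
folder_write(d, [{"filename": "a.txt", "contents": "hello"}])
with open(os.path.join(d, "a.txt")) as f:
    print(f.read())  # hello
```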
==
IPC Stage:
The output link connecting IPC stage to the stage reading data can be opened as soon as
the input link connected to the stage writing data has been opened.
You can use Inter-process stages to join passive stages together. For example you could
use them to speed up data transfer between two data sources:
In this example the job will run as two processes, one handling the communication from
sequential file stage to IPC stage, and one handling communication from IPC stage to
ODBC stage. As soon as the Sequential File stage has opened its output link, the IPC stage
can start passing data to the ODBC stage. If the job is running on a multi-processor
system, the two processes can run simultaneously, so the transfer will be much faster.
The Properties tab allows you to specify two properties for the IPC stage:
• Buffer Size. Defaults to 128 Kb. The IPC stage uses two blocks of memory; one block
can be written to while the other is read from. This property defines the size of each block,
so that by default 256 Kb is allocated in total.
• Timeout. Defaults to 10 seconds. This gives a time limit for how long the stage will wait
for a process to connect to it before timing out. This normally will not need changing, but
may be important where you are prototyping multi-processor jobs on single processor
platforms and there are likely to be delays.
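A rough Python analogue of the buffering behaviour, using a bounded queue in place of the two memory blocks (this illustrates the producer/consumer idea, not the IPC stage's actual implementation):

```python
import queue
import threading

# Two linked stages run as separate processes; the IPC stage's buffer lets the
# reading stage start as soon as the writing stage has opened its link, instead
# of waiting for it to finish. A bounded queue stands in for the two blocks.
buf = queue.Queue(maxsize=2)

def writer():
    for row in range(5):
        buf.put(row)       # blocks while the buffer is full
    buf.put(None)          # end-of-data marker

t = threading.Thread(target=writer)
t.start()
received = []
while (row := buf.get()) is not None:
    received.append(row)
t.join()
print(received)  # [0, 1, 2, 3, 4]
```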
==
The Link Partitioner stage is an active stage which takes one input and allows you to
distribute partitioned rows to up to 64 output links. The stage expects the output links to
use the same meta data as the input link.
Partitioning your data enables you to take advantage of a multi-processor system and have
the data processed in parallel. It can be used in conjunction with the Link Collector stage
to partition data, process it in parallel, then collect it together again before writing it to a
single target. To really understand the benefits you need to know a bit about how
DataStage jobs are run as processes, see “DataStage Jobs and Processes”.
In order for this job to compile and run as intended on a multi-processor system you must
have inter-process buffering turned on, either at project level using the DataStage
Administrator, or at job level from the Job Properties dialog box.
The General tab on the Stage page contains optional fields that allow you to define
routines to use which are executed before or after the stage has processed the data.
• Before-stage subroutine and Input Value. Contain the name (and value) of a
subroutine that is executed before the stage starts to process any data. For example,
you can specify a routine that prepares the data before processing starts.
• After-stage subroutine and Input Value. Contain the name (and value) of a
subroutine that is executed after the stage has processed the data. For example, you can
specify a routine that sends an electronic message when the stage has finished.
Choose a routine from the drop-down list box. This list box contains all the routines
defined as a Before/After Subroutine under the Routines branch in the Repository. Enter
an appropriate value for the routine’s input argument in the Input Value field.
If you choose a routine that is defined in the Repository, but which was edited but not
compiled, a warning message reminds you to compile the routine when you close the Link
Partitioner Stage dialog box.
A return code of 0 from the routine indicates success, any other code indicates failure and
causes a fatal error when the job is run.
If you installed or imported a job, the Before-stage subroutine or After-stage subroutine
field may reference a routine that does not exist on your system. In this case, a warning
message appears when you close the Link Partitioner Stage dialog box. You must install
or import the “missing” routine or choose an alternative one to use.
The Properties tab allows you to specify two properties for the Link Partitioner stage:
• Partitioning Algorithm. Use this property to specify the method the stage uses to
partition data. Choose from:
Round-Robin. This is the default method. Using the round-robin method the stage
will write each incoming row to one of its output links in turn.
Random. Using this method the stage will use a random number generator to
distribute incoming rows evenly across all output links.
Hash. Using this method the stage applies a hash function to one or more input
column values to determine which output link the row is passed to.
Modulus. Using this method the stage applies a modulus function to an integer
input column value to determine which output link the row is passed to.
• Partitioning Key. This property is only significant where you have chosen a
partitioning algorithm of Hash or Modulus. For the Hash algorithm, specify one or
more column names separated by commas. These keys are concatenated and a hash
function applied to determine the destination output link. For the Modulus algorithm,
specify a single column name which identifies an integer numeric column. The value
of this column determines the destination output link.
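The four algorithms can be sketched in Python (a hypothetical function; DataStage's actual hash function is not specified here, so Python's built-in `hash` stands in):

```python
import random

def partition(rows, n_links, algorithm, key=None):
    """Distribute rows over n_links output links using the chosen method."""
    links = [[] for _ in range(n_links)]
    for i, row in enumerate(rows):
        if algorithm == "round-robin":       # each incoming row to the next link in turn
            dest = i % n_links
        elif algorithm == "random":          # evenly, via a random number generator
            dest = random.randrange(n_links)
        elif algorithm == "hash":            # hash of the concatenated key column values
            dest = hash("".join(str(row[k]) for k in key)) % n_links
        elif algorithm == "modulus":         # modulus of a single integer key column
            dest = row[key] % n_links
        links[dest].append(row)
    return links

rows = [{"id": n} for n in range(6)]
print([len(l) for l in partition(rows, 3, "round-robin")])  # [2, 2, 2]
```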
The Link Partitioner stage can have one input link. This is where the data to be partitioned
arrives.
The Inputs page has two tabs: General and Columns.
• General. The General tab allows you to specify an optional description of the stage.
• Columns. The Columns tab contains the column definitions for the data on the input
link. This is normally populated by the meta data of the stage connecting on the input
side. You can also Load a column definition from the Repository, or type one in
yourself (and Save it to the Repository if required). Note that the meta data on the
input link must be identical to the meta data on the output links.
The Link Partitioner stage can have up to 64 output links. Partitioned data flows along
these links. The Output Name drop-down list on the Outputs pages allows you to select
which of the 64 links you are looking at.
The Outputs page has two tabs: General and Columns.
• General. The General tab allows you to specify an optional description of the stage.
• Columns. The Columns tab contains the column definitions for the data on the input
link. You can Load a column definition from the Repository, or type one in yourself
(and Save it to the Repository if required). Note that the meta data on the output link
must be identical to the meta data on the input link. So the meta data is identical for all
the output links.
==
The Link Collector stage is an active stage which takes up to 64 inputs and allows you to
collect data from these links and route it along a single output link. The stage expects the
output link to use the same meta data as the input links.
The Link Collector stage can be used in conjunction with a Link Partitioner stage to
enable you to take advantage of a multi-processor system and have data processed in
parallel. The Link Partitioner stage partitions the data, it is processed in parallel, and the
Link Collector stage then collects it together again before writing it to a single target. To really
understand the benefits you need to know a bit about how DataStage jobs are run as
processes, see “DataStage Jobs and Processes”.
In order for this job to compile and run as intended on a multi-processor system you must
have inter-process buffering turned on, either at project level using the DataStage
Administrator, or at job level from the Job Properties dialog box.
The Properties tab allows you to specify two properties for the Link Collector stage:
• Collection Algorithm. Use this property to specify the method the stage uses to
collect data. Choose from:
Round-Robin. This is the default method. Using the round-robin method the stage
will read a row from each input link in turn.
Sort/Merge. Using the sort/merge method the stage reads multiple sorted inputs
and writes one sorted output.
• Sort Key. This property is only significant where you have chosen a collecting
algorithm of Sort/Merge. It defines how each of the partitioned data sets is known to
be sorted and how the merged output will be sorted. The key has the following format:
In an NLS environment, the collate convention of the locale may affect the sort order. The
default collate convention is set in the DataStage Administrator, but can be set for
individual jobs in the Job Properties dialog box.
For example:
FIRSTNAME d, SURNAME D
Specifies that rows are sorted according to FIRSTNAME column and SURNAME column
in descending order.
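The sort/merge method can be sketched in Python, assuming each input link is already sorted on the key (illustrative names, not DataStage code):

```python
import heapq
from operator import itemgetter

def sort_merge(inputs, key_col):
    """Merge already-sorted input links into one sorted output stream."""
    return list(heapq.merge(*inputs, key=itemgetter(key_col)))

link1 = [{"name": "Adams"}, {"name": "Young"}]
link2 = [{"name": "Brown"}, {"name": "Smith"}]
merged = sort_merge([link1, link2], "name")
print([r["name"] for r in merged])  # ['Adams', 'Brown', 'Smith', 'Young']
```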
The Link Collector stage can have up to 64 input links. This is where the data to be
collected arrives. The Input Name drop-down list on the Inputs page allows you to select
which of the 64 links you are looking at.
DATASTAGE TUTORIAL
4. DataStage Director
5. DataStage Manager
6. DataStage Administrator
7. DataStage Manager Roles
8. Server Components
9. DataStage Features
10. Types of Jobs
11. DataStage NLS
12. JOB
13. Aggregator
14. Hashed File
15. UniVerse
16. UniData
17. ODBC
18. Sequential File
19. Folder Stage
20. Transformer
21. Container
22. IPC Stage
23. Link Collector Stage
24. Link Partitioner Stage
25. Server Job Properties
26. Containers
27. Local containers
28. Shared containers
29. Job Sequences
About DataStage
DataStage Director.
DataStage Manager.
DataStage Administrator
Server Components
DataStage Features
Loads the data warehouse
Types of jobs
There are two other entities that are similar to jobs in the way
they appear in the DataStage Designer, and are handled by it.
These are:
Shared containers.
Job Sequences.
• Sort data according to local rules
JOB
Hashed File.
Hashed File stages represent a hashed file, i.e., a file that uses
a hashing algorithm for distributing records in one or more
groups on disk. You can use a Hashed File stage to extract or
write data, or to act as an intermediate file in a job. The
primary role of a Hashed File stage is as a reference table
based on a single key field.
UniVerse.
UniData.
ODBC.
stage is also used as an intermediate stage for aggregating
data.
Sequential File.
Folder Stage.
Transformer.
Transformer stages in server jobs can have any number of
inputs and outputs. The link from the main data input source
is designated the primary input link. There can only be one
primary input link, but there can be any number of reference
inputs.
Container.
IPC Stage.
The output link connecting IPC stage to the stage reading data
can be opened as soon as the input link connected to the stage
writing data has been opened.
Link Partitioner Stage.
Containers
Local containers.
Shared containers.
Job Sequences
LEARN FEATURES OF DATASTAGE
DATASTAGE:
DataStage has the following features to aid the
design and processing required to build a data
warehouse:
Uses graphical design tools. With simple point-
and-click techniques you can draw a
scheme to represent your processing
requirements.
Extracts data from any number or type of
database.
Handles all the metadata definitions required
to define your data warehouse. You can
view and modify the table definitions at
any point during the design of your
application.
Aggregates data. You can modify SQL SELECT
statements used to extract data.
Transforms data. DataStage has a set of
predefined transforms and functions you
can use to convert your data. You can
easily extend the functionality by defining
your own transforms to use.
Loads the data warehouse.
COMPONENTS OF DATASTAGE:
DataStage consists of a number of client and
server components. DataStage has four client
components:
1. DataStage Designer. A design interface
used to create DataStage applications
(known as jobs).
2. DataStage Director. A user interface
used to validate, schedule, run, and
monitor DataStage server jobs and parallel
jobs.
3. DataStage Manager. A user interface
used to view and edit the contents of the
Repository.
4. DataStage Administrator. A user
interface used to perform administration
tasks such as setting up DataStage users,
creating and moving projects, and setting
up purging criteria.
SERVER COMPONENTS:
There are three server components:
1. Repository. A central store that contains
all the information required to build a
data mart or data warehouse.
2. DataStage Server. Runs executable jobs
that extract, transform, and load data into
a data warehouse.
3. DataStage Package Installer. A user
interface used to install packaged
DataStage jobs and plug-ins.
DATASTAGE PROJECTS:
You always enter DataStage through a DataStage
project. When you start a DataStage client you
are prompted to attach to a project. Each project
contains:
• DataStage jobs.
• Built-in components. These are predefined
components used in a job.
• User-defined components. These are
customized components created using the
DataStage Manager. Each user-defined
component performs a specific task in a
job.
DATASTAGE JOBS:
There are three basic types of DataStage job:
1. Server jobs. These are compiled and run
on the DataStage server. A server job will
connect to databases on other machines as
necessary, extract data, process it, then
write the data to the target
data warehouse.
2. Parallel jobs. These are compiled and run
on the DataStage server in a similar way to
server jobs, but support parallel processing
on SMP, MPP, and cluster systems.
3. Mainframe jobs. These are available only
if you have Enterprise MVS Edition
installed. A mainframe job is compiled and
run on the mainframe. Data extracted by
such jobs is then loaded into the data
warehouse.
SPECIAL ENTITIES:
• Shared containers. These are reusable
job elements. They typically comprise a
number of stages and links. Copies of
shared containers can be used in any
number of server jobs or parallel jobs and
edited as required.
• Job Sequences. A job sequence allows
you to specify a sequence of DataStage
jobs to be executed, and actions to take
depending on results.
TYPES OF STAGES:
• Built-in stages. Supplied with DataStage
and used for extracting, aggregating,
transforming, or writing data. All types of
job have these stages.
• Plug-in stages. Additional stages that can
be installed in DataStage to perform
specialized tasks that the built-in stages do
not support. Server jobs and parallel jobs
can make use of these.
• Job Sequence Stages. Special built-in
stages which allow you to define
sequences of activities to run. Only Job
Sequences have these.
DATASTAGE NLS:
DataStage has built-in National Language
Support (NLS). With NLS installed, DataStage can
do the following:
• Process data in a wide range of languages
• Accept data in any character set into most
DataStage fields
• Use local formats for dates, times, and
money (server jobs)
• Sort data according to local rules
TO CONNECT TO A PROJECT:
1. Enter the name of your host in the Host
system field. This is the name of the
system where the DataStage Server
components are installed.
2. Enter your user name in the User name
field. This is your user name on the server
system.
3. Enter your password in the Password
field.
4. Choose the project to connect to from the
Project drop-down list box.
5. Click OK. The DataStage Designer window
appears with the New dialog box open,
ready for you to create a new job:
CREATING A JOB:
Jobs are created using the DataStage Designer.
For this example, you need to create a server
job, so double-click the New Server Job icon.
IMPORTING TABLE DEFINITIONS:
1. In the Repository window of the DataStage
Designer, select the Table Definitions
branch, and choose Import
Table Definitions… from the shortcut
menu. The Import Metadata (ODBC
Tables) dialog box appears:
DEVELOPING A JOB:
Jobs are designed and developed using the
Designer. The job design is developed in the
Diagram window (the one with grid lines). Each
data source, the data warehouse, and each
processing step is represented by a stage in the
job design. The stages are linked together to
show the flow of data.
Adding Stages:
Stages are added using the tool palette. This
palette contains icons that represent the
components you can add to a job. The palette
has different groups to organize the tools
available.
To add a stage:
1. Click the stage button on the tool palette that
represents the stage type you want to add.
2. Click in the Diagram window where you want
the stage to be positioned. The stage appears in
the Diagram window as a square. You can also
drag items from the palette to the Diagram
window.
We recommend that you position your stages as
follows:
Data sources on the left
Data warehouse on the right
Transformer stage in the center
When you add stages, they are automatically
assigned default names. These names are based
on the type of stage and the number of the item
in the Diagram window. You can use the default
names in the example.
Once all the stages are in place, you can link
them together to show the flow of data.
Linking Stages
You need to add two links:
• One between the Universe and Transformer
stages
• One between the Transformer and Sequential
File stages
To add a link:
1. Right-click the first stage, hold the mouse
button down and drag the link to the transformer
stage. Release the mouse button.
2. Right-click the Transformer stage and drag the
link to the Sequential File stage.
The following screen shows how the Diagram
window looks when you have added the stages
and links:
Editing the Stages
The General tab specifies where the file is found
and the connection type.
• Outputs. Contains information describing the
data flowing from the stage. You edit this page to
describe the data you want to extract from the
file. In this example, the output from this stage
goes to the Transformer stage. To edit the
Universe stage:
1. Check that you are displaying the General tab
on the Stage page.
Choose localuv from the Data source name
drop-down list. Localuv is where EXAMPLE1 is
copied to during installation.
The remaining parameters on the General and
Details tabs are used to enter logon details and
describe where to find the file. Because
EXAMPLE1 is installed in localuv, you do not
have to complete these fields, which are
disabled.
2. Click the Outputs tab. The Outputs page
appears:
3. Choose dstage.EXAMPLE1 from the
Available tables drop-down list.
4. Click Add to add dstage.EXAMPLE1 to the
Table names field.
5. Click the Columns tab. The Columns tab
appears at the front of the dialog box. You must
specify the columns contained in the file you
want to use. Because the column definitions are
stored in a table definition in the Repository, you
can load them directly.
6. Click Load…. The Table Definitions window
appears with the UniVerse localuv branch
highlighted.
7. Select dstage.EXAMPLE1. The Select
Columns dialog box appears, allowing you to
select which column definitions you want to load.
8. In this case you want to load all available
column definitions, so just click OK. The column
definitions specified in the table definition are
copied to the stage. The Columns tab contains
definitions for the four columns in EXAMPLE1:
11. Choose File → Save to save your job design
so far.
Transformer stage to edit it. The Transformer
Editor appears:
drop-down list. Next you will specify the
transform to apply to the input DATE column to
produce the output DATE column. You do this in
the upper right pane of the Transformer Editor.
7. Double-click the Derivation field for the
DSLink4 DATE column. The Expression Editor box
appears. At the moment, the box contains the
text DSLink3.DATE, which indicates that the
output is directly derived from the input DATE
column. Select the text DSLink3 and delete it by
pressing the Delete key.
10. Select the MONTH.TAG transform. It
appears in the Expression Editor box with the
argument field [%Arg1%] highlighted.
11. Right-click to open the Suggest Operand
menu again. This time, select Input Column. A
list of available input columns appears:
This dialog box has two pages:
• Stage. Displayed by default. This page
contains the name of the stage you are editing
and two tabs. The General tab specifies the line
termination type, and the NLS tab specifies a
character set map to use with the stage (this
appears if you have NLS installed).
• Inputs. Describes the data flowing into the
stage. This page only appears when you have an
input to a Sequential File stage. You do not need
to edit the column definitions on this page,
because they were all specified in the
Transformer stage.
• Columns tab. Contains the column definitions
for the data you want to extract. This tab
contains the column definitions specified in the
Transformer stage’s output link.
2. Enter the pathname of the text file you want to
create in the File name field, for example,
seqfile.txt. By default the file is placed in the
server project directory (for example,
c:\Ascential\DataStage\Projects\datastage) and is
named after the input link, but you can enter, or
browse for, a different directory.
3. Click OK to close the Sequential File Stage
dialog box.
4. Choose File → Save to save the job design.
The job design is now complete and ready to be
compiled.
Compiling a Job
Running a Job
can start the Director from the Designer by
choosing Tools → Run Director.
When the Director is started, the DataStage
Director window appears with the status of all
the jobs in your project:
Developing a Job
Server, connecting to other data sources as
necessary.
• Mainframe jobs. These are available only if
you have installed Enterprise MVS Edition.
Mainframe jobs are uploaded to a mainframe,
where they are compiled and run.
• Parallel jobs. These are available only if you
have installed the Enterprise Edition. These run
on DataStage servers that are SMP, MPP, or
cluster systems. There are two other entities
that are similar to jobs in the way they appear
in the DataStage Designer, and are handled by it.
These are:
• Shared containers. These are reusable job
elements. They typically comprise a number of
stages and links. Copies of shared containers can
be used in any number of server jobs and parallel
jobs and edited as required.
• Job Sequences. A job sequence allows you to
specify a sequence of DataStage server or
parallel jobs to be executed, and actions to take
depending on results.
STAGES:
the extraction or writing of data. Active stages
model the flow of data and provide mechanisms
for combining data streams, aggregating data,
and converting data from one data type to
another.
As well as using the built-in stage types, you can
also use plug-in stages for specific operations
that the built-in stages do not support. The
Palette organizes stage types into different
groups, according to function:
• Database
• File
• PlugIn
• Processing
• Real Time
Mainframe Job Stages
Parallel jobs Processing Stages
SERVER JOBS:
When you design a job you see it in terms of
stages and links. When it is compiled, the
DataStage engine sees it in terms of processes
that are subsequently run on the server. How
does the DataStage engine define a process? It is
here that the distinction between active and
passive stages becomes important. Active
stages, such as the Transformer and Aggregator,
perform processing tasks, while passive stages,
such as Sequential file stage and ODBC stage,
are reading or writing data sources and provide
services to the active stages. At its simplest,
active stages become processes. But the
situation becomes more complicated where you
connect active stages together and passive
stages together.
Single Processor and Multi-Processor
Systems
Release 6 of DataStage makes it possible for you
to stipulate at design time that jobs should be
compiled in this way. There are two ways of
doing this:
• Explicitly – by inserting IPC stages between
connected active stages.
• Implicitly – by turning on inter-process row
buffering either project-wide (using the
DataStage Administrator) or for individual jobs
(in the Job Properties dialog box).
The IPC facility can also be used to produce
multiple processes where passive stages are
directly connected. This means that an operation
reading from one data source and writing to
another could be divided into a reading process
and a writing process able to take advantage of
multiprocessor systems.
Partitioning and Collecting
Aggregator Stages
this link and the column definitions of the data
are defined on the Inputs page in the
Aggregator Stage dialog box.
Data element: The type of data in the
column.
Description: A text description of the
column.
Transformer Stages
Link Area
bar between them to resize the panes relative to
one another. There is also a horizontal scroll bar,
allowing you to scroll the view left or right. The
left pane shows input links, the right pane shows
output links. The input link shown at the top of
the left pane is always the primary link. Any
subsequent links are reference links. For all types
of link, key fields are shown in bold. Reference
link key fields that have no expression defined
are shown in red (or the color defined in Tools
→ Options), as are output columns that have no
derivation defined.
Within the Transformer Editor, a single link may
be selected at any one time. When selected, the
link’s title bar is highlighted, and arrowheads
indicate any selected columns.
Metadata Area
Input Links
the key fields. The most common use for the
key expression is to specify an equi-join, which is
a link between a primary link column and a
reference link column. For example, if your
primary input data contains names and
addresses, and a reference input contains names
and phone numbers, the reference link name
column is marked as a key field and the key
expression refers to the primary link’s name
column. During processing, the name in the
primary input is looked up in the reference input.
If the names match, the reference data is
consolidated with the primary data. If the names
do not match, i.e., there is no record in the
reference input whose key matches the
expression given, all the columns specified for
the reference input are set to the null value.
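The lookup behaviour described above can be sketched in Python. This is an illustrative sketch, not DataStage code: the function and column names are invented here, and DataStage itself performs this join inside the Transformer stage.

```python
# Sketch of a Transformer reference lookup (equi-join on a key column).
# On a key miss, every reference column is set to None (the null value),
# mirroring the behaviour described above. All names are illustrative.

def lookup_join(primary_rows, reference_rows, key):
    # Index the reference input by its key column.
    ref_index = {row[key]: row for row in reference_rows}
    ref_cols = [c for c in (reference_rows[0] if reference_rows else {}) if c != key]
    for row in primary_rows:
        match = ref_index.get(row[key])
        merged = dict(row)
        for col in ref_cols:
            # Null-fill the reference columns when no record matches the key.
            merged[col] = match[col] if match is not None else None
        yield merged

names = [{"name": "Bob", "address": "Tampa"}, {"name": "Eve", "address": "Miami"}]
phones = [{"name": "Bob", "phone": "555-0101"}]
rows = list(lookup_join(names, phones, "name"))
# Bob's row picks up his phone number; Eve has no reference match, so phone is None.
```

Note the design choice this illustrates: a failed lookup does not drop the primary row, it only nulls the reference columns.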
Output Links
the data not output on other links, that is,
columns that have not met the criteria. Each
output link is processed in turn. If the constraint
expression evaluates to TRUE for an input row,
the data row is output on that link. Conversely, if
a constraint expression evaluates to FALSE for an
input row, the data row is not output on that link.
Constraint expressions on different links are
independent. If you have more than one output
link, an input row may result in a data row being
output from some, none, or all of the output
links. For example, if you consider the data that
comes from a paint shop, it could include
information about any number of different colors.
If you want to separate the colors into different
files, you would set up different constraints. You
could output the information about green and
blue paint on Link A, red and yellow paint on Link
B, and black paint on Link C. When an input
row contains information about yellow paint, the
Link A constraint expression evaluates to FALSE
and the row is not output on Link A. However,
the input data does satisfy the constraint
criterion for Link B and the rows are output on
Link B. If the input data contains information
about white paint, this does not satisfy any
constraint and the data row is not output on
Links A, B or C, but will be output on the reject
link. The reject link is used to route data to a
table or file that is a “catch-all” for rows that are
not output on any other link. The table or file
containing these rejects is represented by
another stage in the job design.
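The paint-shop routing described above can be sketched in Python. This is an illustrative model only (link names and predicates are invented); in DataStage the constraints are expressions defined on the Transformer's output links.

```python
# Sketch of output-link constraints with a reject link. Each constraint is
# evaluated independently for every input row; rows that satisfy no
# constraint are routed to the reject link.

def route_rows(rows, constraints):
    links = {name: [] for name in constraints}
    links["reject"] = []
    for row in rows:
        matched = False
        for name, predicate in constraints.items():
            if predicate(row):           # constraints are independent:
                links[name].append(row)  # a row may go to several links
                matched = True
        if not matched:
            links["reject"].append(row)
    return links

paint = [{"color": "green"}, {"color": "yellow"}, {"color": "black"}, {"color": "white"}]
links = route_rows(paint, {
    "LinkA": lambda r: r["color"] in ("green", "blue"),
    "LinkB": lambda r: r["color"] in ("red", "yellow"),
    "LinkC": lambda r: r["color"] == "black",
})
# "white" satisfies no constraint, so it lands on the reject link.
```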
Inter-Process Stages
In this example the job will run as two processes,
one handling the communication from sequential
file stage to IPC stage, and one handling
communication from IPC stage to ODBC stage. As
soon as the Sequential File stage has opened its
output link, the IPC stage can start passing data
to the ODBC stage. If the job is running on a
multi-processor system, the two processes can
run simultaneously so the transfer will be much
faster. You can also use the IPC stage to
explicitly specify that connected active stages
should run as separate processes. This is
advantageous for performance on multi-
processor systems. You can also specify this
behavior implicitly by turning inter-process row
buffering on, either for the whole project via
DataStage Administrator, or individually for a job
in its Job Properties dialog box.
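The overlap the IPC stage provides can be sketched in Python. In the sketch, threads stand in for the two operating-system processes, and a bounded queue stands in for the IPC stage's buffer; the names are illustrative, not DataStage API.

```python
# Sketch of the IPC idea: a "reader" fills a bounded buffer while a "writer"
# drains it, so the transfer is overlapped rather than "read everything,
# then write everything". Threads stand in for the two processes.
import queue
import threading

buffer = queue.Queue(maxsize=2)  # stands in for the IPC stage's buffer
rows_out = []

def reader():                    # stands in for the Sequential File stage
    for row in ["row1", "row2", "row3"]:
        buffer.put(row)          # blocks when the buffer is full
    buffer.put(None)             # end-of-data marker

def writer():                    # stands in for the ODBC stage
    while (row := buffer.get()) is not None:
        rows_out.append(row)

t1, t2 = threading.Thread(target=reader), threading.Thread(target=writer)
t1.start(); t2.start()
t1.join(); t2.join()
```

On a multi-processor machine the two sides of the buffer can genuinely run at the same time, which is where the speed-up described above comes from.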
Using the IPC Stage
processor jobs on single processor platforms and
there are likely to be delays.
links. The stage expects the output links to use
the same metadata as the input link. Partitioning
your data enables you to take advantage of a
multi-processor system and have the data
processed in parallel. It can be used in
conjunction with the Link Collector stage to
partition data, process it in parallel, and then
collect it together again before writing it to a
single target.
values to determine which output link the row is
passed to.
– Modulus. Using this method the stage applies
a modulus function to an integer input column
value to determine which output link the row is
passed to.
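Two of the Link Partitioner's algorithms (Round Robin, and the Modulus method described above) can be sketched in Python, together with the collect step a Link Collector would perform. The function names are illustrative.

```python
# Sketch of two Link Partitioner algorithms plus a Link Collector step.

def round_robin(rows, n_links):
    links = [[] for _ in range(n_links)]
    for i, row in enumerate(rows):
        links[i % n_links].append(row)         # deal rows out in turn
    return links

def modulus(rows, n_links, key):
    links = [[] for _ in range(n_links)]
    for row in rows:
        links[row[key] % n_links].append(row)  # integer key value mod link count
    return links

rows = [{"id": i} for i in range(6)]
parts = modulus(rows, 3, "id")
# ...each partition could now be processed in parallel...
collected = [row for link in parts for row in link]  # the Link Collector step
```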
Link Collector Stages
In order for this job to compile and run as
intended on a multi-processor system you must
have inter-process buffering turned on, either at
project level using the DataStage Administrator,
or at job level from the Job Properties dialog
box.
INFORMATICA vs DATASTAGE:
Deployment Facility
- Ability to handle initial deployment, major releases, minor releases and patches with equal ease:
  Informatica: Yes. (My experience has been that INFA is definitely easier to implement initially and upgrade.)
  DataStage: No. (Ascential has done a good job in recent releases.)
Transformations
- Number of available transformation functions:
  Informatica: 58.
  DataStage: 28. (DS has many more canned transformation functions than 28.)
- Support for looping the source row (For While Loop):
  Informatica: Supports comparing the immediate previous record.
  DataStage: Does not support.
- Slowly Changing Dimensions:
  Informatica: Full history, recent values, current & previous values.
  DataStage: Supports only through custom scripts; does not have a wizard to do this. (DS has a component called ProfileStage that handles this type of comparison. You'll want to use it judiciously in your production processing because it does take extra resources to use it, but I have found it to be very useful.)
- Time Dimension generation:
  Informatica: Does not support.
  DataStage: Does not support.
- Rejected Records:
  Informatica: Can be captured.
  DataStage: Cannot be captured in a separate file. (DS absolutely has the ability to capture rejected records in a separate file. That's a pretty basic capability and I don't know of any ETL tool that can't do it...)
- Debugging Facility:
  Informatica: Not supported.
  DataStage: Supports basic debugging facilities for testing.
Application Integration Functionality
- Support for real-time Data Exchange:
  Informatica: Not available.
  DataStage: Not available. (The 7.x version of DS has a component to handle real-time data exchange. I think it is called RTE.)
- Support for CORBA/XML:
  Informatica: Does not support.
  DataStage: Does not support.
Metadata
- Ability to view & navigate metadata on the web:
  Informatica: Does not support.
  DataStage: Job sessions can be monitored using Informatica Classes. (This is completely not true. DS has a very strong metadata component (MetaStage) that works not only with DS, but also has plug-ins to work with modeling tools (like ERwin) and BI tools (like Cognos). This is one of their strong suits (again, IMHO).)
- Ability to customize views of metadata for different users (DBA vs. Business user):
  Informatica: Supports.
  DataStage: Not available. (Also not true - MetaStage allows publishing of metadata in HTML format for different types of users. It is completely customizable.)
- Metadata repository can be stored in an RDBMS:
  Informatica: Yes.
  DataStage: No, but the proprietary metadata can be moved to an RDBMS using the DOC Tool.
1) System Requirement
1.1 Platform Support
1.1.1 Informatica: Win NT/ Unix
1.1.2 DataStage: Win NT/ Unix/More platforms.
2) Deployment facility
2.1. Ability to handle initial deployment, major releases, minor
releases and patches with equal ease
2.1.1. Informatica: Yes
2.1.2. DataStage: No
My experience has been that INFA is definitely easier to
implement initially and upgrade. Ascential has done a good job in
recent releases to improve, but IMHO INFA still does this better.
3) Transformations
3.1. No of available transformation functions
3.1.1. Informatica: 58
3.1.2. DataStage: 28
DS has many more canned transformation functions than 28. I'm
not sure what leads you to this number, but I'd recheck it if I were
you.
3.2. Support for looping the source row (For While Loop)
3.2.1. Informatica: Supports comparing the immediate previous
record
3.2.2. DataStage: Does not support
5) Metadata
5.1. Ability to view & navigate metadata on the web
5.1.1. Informatica: Does not support
5.1.2. DataStage: Job sessions can be monitored using Informatica
Classes
these. SAP is a reseller of DataStage for SAP BW,
PeopleSoft bundles DataStage in its EPM products.
DataStage has some very good debugging facilities
including the ability to step through a job link by link or row
by row and watch data values as a job executes. There is
also server-side tracing.
DataStage 7.x releases have intelligent assistants (wizards)
for creating the template jobs for each type of slowly
changing dimension table loads. The DataStage Best
Practices course also provides training in DW loading with
SCD and surrogate key techniques.
Ascential and Informatica both have robust metadata
management products. Ascential MetaStage comes
bundled free with DataStage Enterprise and manages
metadata via a hub and spoke architecture. It can import
metadata from a wide range of databases and modelling
tools and has a high degree of interaction with DataStage
for operational metadata. Informatica SuperGlue was
released last year and is rated more highly by Gartner in
the metadata field. It integrates closely with PowerCenter
products. They both support multiple views (business and
technical) of metadata plus the functions you would expect
such as impact analysis, semantics and data lineage.
DataStage can send emails. The sequence job has an
email stage that is easy to configure. DataStage 7.5 also
has new mobile device support so you can administer your
DataStage jobs via a palm pilot. There are also 3rd party
web based tools that let you run and review jobs over a
browser. I found it easy to send sms admin messages from
a DataStage Unix server.
DataStage has a command line interface. The dsjob
command can be used by any scheduling tool or from the
command line to run jobs and check the results and logs of
jobs.
Both products integrate well with Trillium for data quality;
DataStage also integrates with QualityStage for data quality.
This is the preferred method of address cleansing and
fuzzy matching.
Milind - I've got to ask - where are you getting your information
from??? I have done ETL tool comparisons for several clients over
the past 7 or so years. They are both good tools with different
strengths so it really depends on what your organizations needs /
priorities are as to which one is "better". I have spent much more
time in the past couple of years on DS than INFA so I don't feel I
can speak to the changes INFA has made lately, but I know you
have incorrect info about DS.
challenges with meeting availability requirements. It is one of the
most impressive changes Ascential has made lately (IMHO).
CODE
ID CustKey Name DOB City State
1001 BS001 Bob Smith 6/8/1961 Tampa FL
1002 LJ004 Lisa Jones 10/15/1954 Miami FL
CODE
ID CustKey Name DOB City State
1001 BS001 Bob Smith 6/8/1961 Dayton OH
1002 LJ004 Lisa Jones 10/15/1954 Miami FL
CODE
ID   CustKey Name       DOB        City   St Curr Effective Date
1001 BS001   Bob Smith  6/8/1961   Tampa  FL Y    5/1/2004
1002 LJ004   Lisa Jones 10/15/1954 Miami  FL Y    5/2/2004
CODE
ID   CustKey Name       DOB        City   St Curr Effective Date
1001 BS001   Bob Smith  6/8/1961   Tampa  FL N    5/1/2004
1002 LJ004   Lisa Jones 10/15/1954 Miami  FL Y    5/2/2004
1003 BS001   Bob Smith  6/8/1961   Dayton OH Y    5/27/2004
As you can see, there are two dimension records for Bob Smith.
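The Type 2 update shown in the tables above can be sketched in Python. Column names follow the tables; the surrogate-key assignment and date handling are simplified, and the function name is invented for illustration.

```python
# Sketch of a Type 2 SCD update: the current row for the changed customer
# is expired (Curr = 'N') and a new row with a new surrogate ID is inserted.

def scd_type2_update(dim, cust_key, changes, eff_date):
    new_row = None
    for row in dim:
        if row["CustKey"] == cust_key and row["Curr"] == "Y":
            row["Curr"] = "N"                      # expire the current record
            new_row = {**row, **changes,
                       "ID": max(r["ID"] for r in dim) + 1,  # next surrogate key
                       "Curr": "Y", "Effective Date": eff_date}
    if new_row:
        dim.append(new_row)
    return dim

dim = [
    {"ID": 1001, "CustKey": "BS001", "Name": "Bob Smith", "City": "Tampa",
     "St": "FL", "Curr": "Y", "Effective Date": "5/1/2004"},
    {"ID": 1002, "CustKey": "LJ004", "Name": "Lisa Jones", "City": "Miami",
     "St": "FL", "Curr": "Y", "Effective Date": "5/2/2004"},
]
scd_type2_update(dim, "BS001", {"City": "Dayton", "St": "OH"}, "5/27/2004")
# dim now holds two rows for Bob Smith: 1001 (Curr = N) and 1003 (Curr = Y)
```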
Conforming Dimensions
Conforming Dimension
Customer Dimension
CODE
ID CustKey Name DOB City State
1001 BS001 Bob Smith 6/8/1961 Tampa FL
1002 LJ004 Lisa Jones 10/15/1954 Miami FL
Billing Dimension
CODE
ID   Bill2Ky Name       Account Type Credit Limit CustKey
1001 9211    Bob Smith  Credit       $10,000      BS001
1002 23421   Lisa Jones Cash         $100         LJ004
CODE
ID   CustKey Name       DOB        City   St Curr Effective Date
1001 BS001   Bob Smith  6/8/1961   Dayton OH Y    5/1/2004
1002 LJ004   Lisa Jones 10/15/1957 Miami  FL Y    5/2/2004
CODE
ID   CustKey Name       DOB        City   St Curr Effective Date
1001 BS001   Bob Smith  6/8/1961   Tampa  FL N    5/1/2004
1002 LJ004   Lisa Jones 10/15/1957 Miami  FL Y    5/2/2004
1003 BS001   Bob Smith  6/8/1961   Dayton OH Y    5/27/2004
As you can see, the current ID for Bob Smith in the Type 1
SCD is 1001, while it is 1003 in the Type 2 SCD. This is not
conforming.
CODE
ID   CustKey Name       DOB        City   St
1001 BS001   Bob Smith  6/8/1961   Dayton OH
1002 LJ004   Lisa Jones 10/15/1957 Miami  FL
CODE
ID   SubKey CustKey Name       DOB        City   St Curr Eff Date
1001 001    BS001   Bob Smith  6/8/1961   Tampa  FL N    5/1/2004
1002 001    LJ004   Lisa Jones 10/15/1957 Miami  FL Y    5/2/2004
1001 002    BS001   Bob Smith  6/8/1961   Dayton OH Y    5/27/2004
You must assess your data. Data Stage jobs can be quite
complex and so it is advisable to consider the following
before starting a job:
variable = @NULL
variable = @NULL.STR
Errors that occur as the files are loaded into Oracle are
recorded in the sqlldr log file.
Rejected rows are written to the bad file. The main reason for
rejected rows is an integrity constraint in the target table; for
example, null values in NOT NULL columns, nonunique values in
UNIQUE columns, and so on. The bad file is in the same format
as the input data file
A = '12345'
A[3] = 1212
MyString = "1#2#3#4#5"
String = Fieldstore(MyString, "#", 2, 2, "A#B")
* above results in: "1#A#B#4#5"
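The Fieldstore call above replaces fields of a delimited string. An equivalent can be sketched in Python; this mimics only the replace form used above (the function name is invented here), using 1-based field numbering as in BASIC.

```python
# Python sketch of the Fieldstore call above: replace 2 fields of a
# "#"-delimited string, starting at field 2, with the fields of "A#B".

def fieldstore_replace(s, delim, start, count, new):
    fields = s.split(delim)
    fields[start - 1:start - 1 + count] = new.split(delim)  # fields are 1-based
    return delim.join(fields)

result = fieldstore_replace("1#2#3#4#5", "#", 2, 2, "A#B")
# result == "1#A#B#4#5"
```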
Eq or = Equality X=Y
IF Operator:
Syntax
Syntax
Example
Call DSLogInfo("Transforming: ":Arg1,
"MyTransform")
Date( ) :
Ereplace Function:
Formats data for output.:
Syntax
MyString = "AABBCCBBDDBB"
NewString = Ereplace(MyString, "BB", "")
* The result is "AACCDD"
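Ereplace with an empty replacement string behaves like Python's str.replace, so the example above can be checked directly:

```python
# Python equivalent of the Ereplace call above: every occurrence of "BB"
# is replaced with the empty string.
my_string = "AABBCCBBDDBB"
new_string = my_string.replace("BB", "")
# new_string == "AACCDD"
```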
Date Conversions
The following examples show the effect of various D (Date)
conversion codes.
X = Oconv(10740, "D/MDY[Z,Z,2]")
* X = "5/27/97"
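UniVerse (and hence DataStage server jobs) stores dates internally as a day count with 31 December 1967 as day 0, which is why Oconv turns 10740 into "5/27/97". The arithmetic can be checked in Python:

```python
# Internal date 10740 = 10,740 days after 31 Dec 1967 (day 0 in UniVerse),
# which corresponds to the Oconv result "5/27/97" above.
from datetime import date, timedelta

def internal_to_date(days):
    return date(1967, 12, 31) + timedelta(days=days)

d = internal_to_date(10740)
# d == date(1997, 5, 27)
```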
Example
* Do some processing...
...
Return
• Compiler Directives
• Declaration
• Program Control
• Data Conversion
• Data Formatting
• Locales
Function MyTransform(Arg1)
Begin Case
Article-II:
• Transformer “Cancel” operation:
If the Cancel button or <ESC> key is pressed from the main
Transformer dialog and changes have been made, then a
confirmation message box is displayed, to check that the user
wants to quit without saving the changes. If no changes have
been made, no confirmation message is displayed.
• Multi-Client Manager:
The previously unsupported “Client Switcher” tool has been
enhanced and integrated into the DataStage Client. This tool
allows the users to install and switch between multiple different
versions of the client. Switching between them also changes
the desktop shortcuts and the Start Menu group to point to
another installed DataStage client.
Basic DWH:
DataStage:
27. How do you import your source and targets? What are the
types of sources and targets?
28. What do Active Stages and Passive Stages mean in
DataStage?
29. What is the difference between Informatica and DataStage?
Which do you think is better?
30. What are the stages you used in your project?
31. Whom do you report?
32. What is Orchestrate? What is the difference between
Orchestrate and DataStage?
33. What is parallel extender? Have you worked on this?
34. What do you mean by parallel processing?
35. What is the difference between Merge Stage and Join Stage?
36. What is the difference between Copy Stage and Transformer
Stage?
37. What is the difference between ODBC Stage and OCI Stage?
38. What is the difference between Lookup Stage and Join Stage?
39. What is the difference between Change Capture Stage and
Difference Stage?
40. What is the difference between Hashed file and Sequential
File?
41. What are different Joins used in Join Stage?
42. How do you decide when to go for the join stage and the
lookup stage?
43. What is partition key? Which key is used in round robin
partition?
44. How do you handle SCD in datastage?
45. What are Change Capture Stage and Change Apply Stages?
46. How many streams can you give to the Transformer?
47. What is primary link and reference link?
48. What is a routine? What are before and after subroutines?
Are these run before/after a job or a stage?
49. Have you written any subroutines in your project?
50. What is a Config File? Does each job have its own config
file, or is only one needed?
51. What is Node?
52. What is the IPC Stage? How does it increase performance?
53. What is Sequential buffer?
54. What are Link Partitioner and Link Collector?
55. What performance tuning have you done in your
project?
56. Have you done scheduling? How? Can you schedule a job at
the end of every month? How?
57. What is a job sequence? Have you run any jobs?
Data stage:
DWH FAQ:
Conformed dimension:
Junk dimension:
• It is a convenient grouping of random flags and aggregates
to get them out of a fact table and into a useful
dimensional framework.
Degenerate dimension:
• Usually occur in line item oriented fact table designs.
Degenerate dimensions are normal, expected and useful.
• The degenerate dimension key should be the actual
production order number and should sit in the fact table
without a join to anything.
Time dimension:
• It contains a number of useful attributes for describing
calendars and navigating.
• An exclusive time dimension is required because SQL
date semantics and functions cannot generate several
important features and attributes required for analytical
purposes.
• Attributes like weekdays, weekends, holidays, and fiscal
periods cannot be generated by SQL statements.
Factless fact table:
• Fact tables which do not have any facts are called factless
fact tables.
• They may consist of keys only; these two kinds of fact tables
do not have any facts at all.
• The first type of factless fact table records an ‘event’.
• Many event tracking tables in dimensional data
warehouses turn out to be factless.
Ex: A student tracking system that details each ‘student
attendance’ event each day.
• The second type of factless fact table is coverage. The
coverage tables are frequently needed when a primary fact
table in a dimensional DWH is sparse.
Ex: The sales fact table that records the sales of products
in stores on particular days under each promotion
condition
DATASTAGE ROUTINES
BL:
DataIn = "":Trim(Arg1)
CheckFileRecords:
Function CheckFileRecords(Arg1,Arg2)
Loop
CloseSeq FileVar
Ans=vCountVal
Return (vCountVal)
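The CheckFileRecords fragment above opens a sequential file, loops over its records, and returns the count. An equivalent sketch in Python (the BASIC routine's arguments are not shown in full above, so this sketch simply takes an open file-like object):

```python
# Sketch of what CheckFileRecords appears to do: a ReadSeq-style loop that
# counts records until end of file and returns the count.
import io

def check_file_records(f):
    count = 0
    for _line in f:   # each iteration reads one record
        count += 1
    return count

n = check_file_records(io.StringIO("rec1\nrec2\nrec3\n"))
# n == 3
```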
CheckFileSizes:
FNAME = "GLEISND_OC_02_20040607_12455700.csv"
Ans = Output
CheckIdocsSent:
If the job has a fatal error with "No link file", the routine
will copy the IDOC link file(s) into the interface error
folder.
In case the fatal error above is not found the routine
aborts the job.
If System(91) Then
OsType = 'NT'
OsDelim = '\'
NonOsDelim = '/'
Move = 'move '
End Else
OsType = 'UNIX'
OsDelim = '/'
NonOsDelim = '\'
Move = 'mv -f '
End
vJobHandle = DSAttachJob(JobName, DSJ.ERRFATAL)
vLastRunStart = DSGetJobInfo(vJobHandle, DSJ.JOBSTARTTIMESTAMP)
vLastRunEnd = DSGetJobInfo(vJobHandle, DSJ.JOBLASTTIMESTAMP)
vErr = DSDetachJob(vJobHandle)
Call DSLogInfo("Job " : JobName : " Detached", vRoutineName)
End Else
Call DSLogInfo("Could not open file - " : vIdocLogFilePath, vRoutineName)
Call DSLogInfo("Creating new file - " : vIdocLogFilePath, vRoutineName)
CREATE vIdocLogFile ELSE Call DSLogFatal("Could not create file - " : vIdocLogFilePath, vRoutineName)
WEOFSEQ vIdocLogFile
WRITESEQ Fmt("Module Run", "12' 'L") : Fmt("Status", "10' 'L") : " " : "Message" To vIdocLogFile Else ABORT
Call DSLogInfo("Log file created : " : vIdocLogFilePath, vRoutineName)
GOTO FileCreated
End
ClearMappingTable:
SUBROUTINE ClearMappingTable
(Clear_Mapping_Table, Errorcode)
ComaDotRmv:
DataIn = "":(Arg1)
CopyFiles:
Function CopyFiles(SourceDir,SourceFileMask,TargetDir,TargetFileMask,Flags)
RoutineName = "CopyFiles"
If System(91) Then
OsType = 'NT'
OsDelim = '\'
NonOsDelim = '/'
Copy = 'copy '
Flag = Flags
End Else
OsType = 'UNIX'
OsDelim = '/'
NonOsDelim = '\'
Copy = 'cp -f '
End
SourceWorkFiles =
Trims(Convert(',',@FM,SourceFileMask))
SourceFileList =
Splice(Reuse(SourceDir),OsDelim,SourceWorkFiles)
TargetWorkFiles =
Trims(Convert(',',@FM,TargetFileMask))
TargetFileList =
Splice(Reuse(TargetDir),OsDelim,TargetWorkFiles)
Ans = OsStatus
CopyOfCompareRows:
Function CopyOfCompareRows(Column_Name,Column_Value)
vJobName=DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
vStageName=DSGetStageInfo(DSJ.ME, DSJ.ME,
DSJ.STAGENAME)
vCommonName=CheckSum(vJobName) :
CheckSum(vStageName) : CheckSum(Column_Name)
vLastValue=LastValue
vNewValue=Column_Value
LastValue=vNewValue
CopyOfZSTPKeyLookup:
Check if key passed exists in file passed
Arg1: Hash file to look in
Arg2: Key to look for
Arg3: Number of file to use "1" or "2"
Ans = 0
Ans = RetVal
Create12CharTS:
Function Create12CharTS(JobName)
vJobStartTime = DSGetJobInfo(vJobHandle,
DSJ.JOBSTARTTIMESTAMP)
Ans=vDate
CreateEmptyFile:
Function CreateEmptyFile(Arg1,Arg2)
WeofSeq FileVar
CloseSeq FileVar
Ans="1"
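The fragment above shows only the closing statements of CreateEmptyFile. A plausible complete form, assuming Arg1 is the full file path (Arg2 appears unused), would be:

```basic
* Sketch: create (or truncate) a sequential file, return "1" on success.
FUNCTION CreateEmptyFile(Arg1, Arg2)
   OpenSeq Arg1 To FileVar Then
      * File already exists - truncate it
      WeofSeq FileVar
   End Else
      * File does not exist - create it, then mark end-of-file
      Create FileVar Else
         Ans = "0"
         RETURN(Ans)
      End
      WeofSeq FileVar
   End
   CloseSeq FileVar
   Ans = "1"
RETURN(Ans)
```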
Datetrans:
Function Datetrans(DateVal)
* Date may be in the form of DD.MM.YY i.e. 01.10.03
* Convert to YYYYMMDD SAP format
DeleteFiles:
Function DeleteFiles(SourceDir,FileMask,Flags)
RoutineName = "DeleteFiles"
If SourceDir = '' Then SourceDir = '.'
If System(91) Then
OsType = 'NT'
OsDelim = '\'
WorkFiles = Trims(Convert(',',@FM,FileMask))
FileList =
Splice(Reuse(SourceDir),OsDelim,WorkFiles)
Call DSLogInfo('Deleting
':FileList,RoutineName)
Call DSExecute(OsType,OsCmd,OsOutput,OsStatus)
If OsStatus Then
Residx= Index(OsOutput,"non-existent",1)
if Index(OsOutput,"non-existent",1) = 0
then
Call DSLogInfo('The Delete command
(':Residx:OsCmd:') returned status
':OsStatus:':':@FM:OsOutput,RoutineName)
End
Else
Call DSLogInfo('No Files matched Wild
Card - Delete was not required...',RoutineName)
OsStatus = 0
End
End Else
Call DSLogInfo('Files
deleted...',RoutineName)
End
Ans = OsStatus
DisconnectNetworkDrive:
Function DisconnectNetworkDrive(Drive_Letter)
RoutineName = "DisconnectNetworkDrive"
OsType = 'NT'
OsDelim = '\'
NonOsDelim = '/'
Copy = 'copy '
Call DSExecute(OsType,OsCmd,OsOutput,OsStatus)
If OsStatus Then
Call DSLogWarn('The Copy command
(':OsCmd:') returned status
':OsStatus:':':@FM:OsOutput, RoutineName)
End Else
Call DSLogInfo('Drive: ' : Drive_Letter :
'Disconnected ',RoutineName)
End
Ans = OsStatus
DosCmd:
Function DosCmd(Cmd)
RoutineName = "DosCmd"
If System(91) Then
OsType = 'NT'
OsDelim = '\'
NonOsDelim = '/'
End Else
OsType = 'UNIX'
OsDelim = '/'
NonOsDelim = '\'
End
OsCmd = Cmd
DSMoveFiles:
Move files from one directory to another:
If System(91) Then
OsType = 'NT'
OsDelim = '\'
NonOsDelim = '/'
Move = 'move '
End Else
OsType = 'UNIX'
OsDelim = '/'
NonOsDelim = '\'
Move = 'mv -f '
End
WorkFiles = Trims(Convert(',',@FM,FileMask))
FileList =
Splice(Reuse(SourceDir),OsDelim,WorkFiles)
Ans = OsStatus
Routine Name:ErrorMgmtDummy:
RoutineName = 'Map'
DEFFUN
LogToHashFile(ModRunNum,Ticket_Group,Ticket_Seque
nce,Set_Key,Table,FieldName,Key,Error,Text,Severi
tyInd) Calling 'DSU.LogToHashFile'
Ret_Code=LogToHashFile(Mod_Run_Num,Ticket_Group,T
icket_Sequence,Set_Key,Table,FieldName,Chk_Value,
Ans,Msg,SeverityInd)
RETURN(Ans)
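As the fragment shows, a server routine calls another user-written routine by declaring it with DEFFUN against the 'DSU.' catalog prefix. A minimal sketch of the idiom (the FileFound routine is assumed to exist, as it does elsewhere in these notes; the path is illustrative):

```basic
* Declare the external routine once, then call it like a local function.
DEFFUN FileFound(A) Calling 'DSU.FileFound'
If FileFound('/tmp/control.dat') Then
   Call DSLogInfo('Control file present', 'MyRoutine')
End
```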
FileExists:
Ans = FileFound
FileSize:
Returns the size of a file
Function FileSize(FileName)
RoutineName = "FileSize"
FileSize = -99
Ans = FileSize
FindExtension:
Function FindExtension(Arg1)
File_Name=Arg1
Ans = File_Extension
FindFileSuffix:
Function FindFileSuffix(Arg1)
File_Name=Arg1
TimestampEndPos = Index(File_Name,MyTimestamp,1)
+ Len(MyTimestamp)
MySuffix = File_Name[TimestampEndPos + 1,
Len(File_Name)]
Ans = MySuffix
FindTimeStamp:
Function FindTimeStamp(Arg1)
File_Name=Arg1
Ans = Timestamp
formatCharge:
Function FormatCharge(Arg1)
vCharge=Trim(Arg1, 0, "L")
vCharge=vCharge/100
vCharge=FMT(vCharge,"R2")
Ans=vCharge
formatGCharge:
Ans=1
vLength=Len(Arg1)
If Arg1[1,1]='-' Then vMinus=1 Else vMinus=0
If Arg1='0.00' Then
Ans=Arg1
End
Else
If vMinus=1 Then
vString=Arg1[2,vLength-1]
vString='-':Trim(vString, '0','L')
End
else
vString=Trim(Arg1, '0','L')
end
Ans=vString
End
FTPFile:
* FUNCTION
FTPFile(Script_Path,File_Path,File_Name,IP_Addres
s, User_ID,Password,Target_Path)
*
*
RoutineName = 'FTPFile'
Call DSExecute("UNIX",OsCmd,OsOutput,OsStatus)
If OsStatus Then
Call DSLogInfo('The FTP command (':OsCmd:')
returned status
':OsStatus:':':@FM:OsOutput,RoutineName)
End Else
Call DSLogInfo('Files FTPd...':
'(':OsCmd:')','FTPFile')
End
Ans = OsStatus
RETURN(Ans)
FTPmget:
* FUNCTION
FTPFile(Script_Path,Source_Path,File_Wild_Card,IP
_Address, User_ID,Password,Target_Path)
*
*
RoutineName = 'FTPmget'
Call DSExecute("UNIX",OsCmd,OsOutput,OsStatus)
If OsStatus Then
Call DSLogInfo('The FTP command (':OsCmd:')
returned status
':OsStatus:':':@FM:OsOutput,RoutineName)
End Else
Call DSLogInfo('Files FTPd...':
'(':OsCmd:')',RoutineName)
End
Ans = OsStatus
RETURN(Ans)
GBIConcatItem:
Concatenate all input arguments to output using the TAB
character:
Routine="GBIConcatItem"
t = Char(009)
Ans = Pattern
GCMFConvert:
Receive GCMF string and change known strings to
required values:
DataIn = "":Trim(Arg1)
GCMFFormating:
*
* FUNCTION GCMFFormating(Switch, All_Row)
*
* Replaces some special characters when creating
the GCMF file
*
* Input Parameters : Arg1: Switch = Step to
change.
* Arg2: All_Row = Row
containing the GCMF Record.
*
DataIn=Trim(All_Row)
Ans = DataInFmt
End
End
Else
If Switch=2 Then
DataInFmt = Ereplace (DataIn ,">", "&gt;")
DataInFmt = Ereplace (DataInFmt ,"<", "&lt;")
Ans = DataInFmt
End
Else
* Final Replace, After the Merge of all
GCMF segments
DataInFmt = Ereplace (DataIn ,"|",
"|")
Ans = DataInFmt
End
End
GeneralCounter:
NextId = Identifier
IF UNASSIGNED(OldParam) Then
OldParam = NextId
TotCount = 0
END
Ans = TotCount
GetNextCustomerNumber:
If NOT(Initialized) Then
* Not initialised. Attempt to open the file.
Initialized = 1
Open "IOC01_SUPER_GRP_CTL_HF" TO SeqFile
Else
Call DSLogFatal("Cannot open customer
number allocation control file",RoutineName)
Ans = -1
End
End
* Read the named record from the file.
Readu NextVal From SeqFile, Arg1 Else
Call DSLogFatal("Cannot find super group
in customer number allocation control
file",RoutineName)
Ans = -1
End
Ans = NextVal
GetNextErrorTableID:
Sequence number generator in a concurrent environment.
If NOT(Initialized) Then
* Not initialised. Attempt to open the file.
Initialized = 1
Open "ErrorTableSequences" TO SeqFile Else
* Open failed. Create the sequence
file.
EXECUTE "CREATE.FILE
ErrorTableSequences 2 1 1"
Open "ErrorTableSequences" TO SeqFile
Else Ans = -1
End
Ans = NextVal
NextVal = NextVal + 1
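What makes this sequence generator safe in a concurrent environment is the Readu statement, which takes an update lock on the sequence record so parallel jobs serialise on it. A condensed sketch of the whole pattern, with an illustrative function name:

```basic
* Sketch: return the next value of a named sequence, concurrency-safe.
FUNCTION GetNextSeq(SeqName)
   Open "ErrorTableSequences" TO SeqFile Else
      Ans = -1
      RETURN(Ans)
   End
   * Readu locks the record; a second job blocks here until we Write
   Readu NextVal From SeqFile, SeqName Else NextVal = 0
   NextVal = NextVal + 1
   * Write persists the new value and releases the lock
   Write NextVal On SeqFile, SeqName
   Ans = NextVal
RETURN(Ans)
```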
GetNextModSeqNo:
Gets the next Mod Run Code from an initialised sequence.
This routine gets the next Mod Run Number in a sequence
that was initialised.
GetParameterArray:
* GetParameterArray(Arg1)
* Description: Get parameters
* Written by:
* Notes:
* Bag of Tricks Version 2.3.0 Release Date 2001-10-01
* Arg1 = Path and Name of Parameter File
*
* Result = ( <1> = Parameter names, <2> = Parameter values)
*
*------------------------------------------------------------
DEFFUN FileFound(A) Calling 'DSU.FileFound'
cBlank = ''
cName = 1
cValue = 2
vParamFile = Arg1
aParam = cBlank
vParamCnt = 0
vCurRoutineName = 'Routine:
GetParameterArray'
End Else
Call DSLogWarn('Error from
':vParamFile:'; Status =
':STATUS(),vCurRoutineName)
vFailed = @TRUE
End
End Else
vFailed = @TRUE
End
Ans = ""
GoTo ExitLastDayMonth
End
InYear = Substrings(Arg1,1,4)
InMonth = Substrings(Arg1,5,2)
End Case
Ans=OutDt:"-":InMonth:"-":InYear
ExitLastDayMonth:
LogToErrorFile:
* FUNCTION
LogToErrorFile(Table,Field_Name,Check_Value,Error
_Number,Error_Text_1, Error_Text_2,
Error_Text_3,Additional_Message)
*
*
* Writes error messages to a Hash File
*
* Input Parameters : Arg1: Table
= The name of Control table being checked
* Arg2: Field_Name
= The name of the Field that is in error
* Arg3: Check_Value
= The value used to look up in the Hash file to
try and get a lookup match
* Arg4: Error_Number
= The error number returned
* Arg5: Error_Text_1
= First error message argument. Used to build the
default error message
* Arg6: Error_Text_2
= Second error message argument. Used to build
the default error message
* Arg7: Error_Text_3
= Third error message argument. Used to build the
default error message
* Arg8: Additional_Message
= Any text that could be stored against an error
*
RoutineName = "LogToErrorFile"
Ans = "ERROR"
If System(91) Then
OsType = 'NT'
OsDelim = '\'
NonOsDelim = '/'
Move = 'move '
End Else
OsType = 'UNIX'
OsDelim = '/'
NonOsDelim = '\'
Move = 'mv -f '
End
Ans = "ERROR"
Return(Ans)
LogToHashFile:
* FUNCTION
LogToHashFile(ModRunNum,TGrp,TSeg,SetKey,Table,Fi
eldNa,KeyValue,Error,Msg,SeverityInd)
*
*
* Writes error messages to a Hash File
*
* Input Parameters : Arg1: ModRunNum =
The unique number allocated to a run of an Module
* Arg2: Ticket_Group =
The Ticket Group Number of the Current Row
* Arg3: Ticket_Sequence =
The Ticket Sequence Number of the Current Row
* Arg4: Set_Key = A
Key to identify a set of rows e.g. an Invoice
Number to a set of invoice lines
* Arg5: Table =
The name of Control table being checked
* Arg6: FieldNa =
The name of the Field that is in error
* Arg7: KeyValue =
The value used to look up in the Hash file to
try and get a lookup match
RoutineName = "LogToHashFile"
TAns = 0
If System(91) Then
OsType = 'NT'
OsDelim = '\'
NonOsDelim = '/'
Move = 'move '
End Else
OsType = 'UNIX'
OsDelim = '/'
NonOsDelim = '\'
Move = 'mv -f '
End
Ans = TAns
RETURN(Ans)
The routine arguments are the field name, the field, the
group key, whether this is the first mandatory check for
* Call DSLogInfo("Routine
started":Arg1,RoutineName)
If NOT(Initialized) Then
Initialized = 1
* Call DSLogInfo("Initialisation
Started",RoutineName)
Open "MANDATORY_FIELD_HF" TO SeqFile Else
Call DSLogFatal("Cannot open Mandatory
field control file",RoutineName)
Ans = -1
End
* Call DSLogInfo("Initialisation
Complete",RoutineName)
End
If Arg4 = "Y"
Then
Mandlist = ""
ProcessIn = "":Trim(Arg5)
If IsNull(ProcessIn) or ProcessIn = ""
Then ProcessV = " "
Map (Routine Name):
* FUNCTION
Map(Value,FieldName,Format,Default,Msg,ErrorLogIn
d)
*
* Executes a lookup against a hashed file using a
key
*
* Input Parameters : Arg1: Value =
The Value to Be Mapped
* Arg2: FieldName =
The Name of the field that is either the Target
of the Derivation or the sourceField that value
is contained in
* Arg3: Format =
The name of the Hash file containing the mapping
data
* Arg4: Default =
The Default value to return if value is not found
* Arg5: Msg =
Any text you want stored against an error
* Arg6: SeverityInd =
An indicator of the severity level
* Arg7: ErrorLogInd =
An Indicator to indicate if errors should be
logged
* Arg8: HashfileLocation =
An indicator to indicate if errors should be
logged (Note this is not yet implemented)
*
* Return Values: If the Value is not found, return
value is -1, or the Default value if that is supplied
* If Format Table not found, return value is -2
*
*
*
RoutineName = 'Map'
DEFFUN
LogToHashFile(ModRunNum,Ticket_Group,Ticket_Seque
nce,Set_Key,Table,FieldName,Key,Error,Text,Severi
tyInd) Calling 'DSU.LogToHashFile'
*
If Len(Chk_Hash_File_Name) = 3 And
HashFileLocation = "G" Then Format_Extn =
Chk_Hash_File_Name Else Format_Extn = Mod_Run_Num
[1,5]
If System(91) Then
OsType = 'NT'
OsDelim = '\'
NonOsDelim = '/'
Move = 'move '
End Else
OsType = 'UNIX'
OsDelim = '/'
NonOsDelim = '\'
Move = 'mv -f '
End
ColumnPosition = 0
PositionReturn = 0
Table = Format
End Else
Default_Ans = Chk_Value
End
Case @TRUE
If UpCase(Field(Default,"|",1)) <> "BL"
Then Default_Ans = Default Else Default_Ans = -1
End Case
LogPass = "N"
If (Default = "PASS" and Default_Ans <> Ans)
then LogPass = "Y"
If LogPass = "Y"
Then
*Message = "PASS Trans Default_Ans
==>" : Default_Ans : " Ans ==> " : Ans
*Call DSLogInfo(Message, RoutineName )
Ret_Code=LogToHashFile(Mod_Run_Num,Ticket_Group,T
icket_Sequence,Set_Key,Table,FieldName,Chk_Value,
Ans,Msg,SeverityInd)
End
RETURN(Ans)
ErrCode = DSDetachJob(hJob)
Pattern:
Routine="Pattern"
Var_Len = len(Value)
Pattern = Value
Begin Case
End Case
PrepareJob:
RangeCheck:
RoutineName = 'RangeChk'
DEFFUN
LogToHashFile(ModRunNum,Ticket_Group,Ticket_Seque
nce,Set_Key,Table,FieldName,Key,Error,Text,Severi
tyInd) Calling 'DSU.LogToHashFile'
Ret_Code=LogToHashFile(Mod_Run_Num,Ticket_Group,T
icket_Sequence,Set_Key,Table,FieldName,Value,Ans,
OutputMsg,SeverityInd)
End
RETURN(Ans)
ReadParameter:
*
* Function : ReadParameter - Read parameter value from
configuration file
* Arg : ParameterName
(default=JOB_PARAMETER)
* DefaultValue (default='')
* Config file (default=@PATH/config.ini)
* Return : Parameter value from config file
Function ReadParameter(ParameterName,DefaultValue,ConfigFile)
ParameterValue = DefaultValue
* Open the configuration file; return the default if it cannot be opened
OpenSeq ConfigFile To fCfg Else
Ans = ParameterValue
Return(Ans)
End
Loop
While ReadSeq Line From fCfg
If Trim(Field(Line,'=',1)) = ParameterName
Then
ParameterValue = Trim(Field(Line,'=',2))
Exit
End
Repeat
CloseSeq fCfg
Ans = ParameterValue
RETURN(Ans)
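Assuming the configuration file holds one `NAME=value` pair per line (as the `Field(Line,'=',1)` parsing implies), the routine can be exercised from job control like this — the file name and parameter are illustrative:

```basic
* Declare the routine, then read a value.
* config.ini might contain:  TGT_DIR=/data/out
DEFFUN ReadParameter(ParameterName, DefaultValue, ConfigFile) Calling 'DSU.ReadParameter'
TargetDir = ReadParameter('TGT_DIR', '/tmp', './config.ini')
```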
ReturnNumber:
String=Arg1
Slen=Len(String)
Rnum=""
For Scheck = 1 To Slen
Schar=Substrings(String,Scheck,1)
If NUM(Schar) Then
Rnum=Rnum:Schar
End
Next Scheck
Ans=Rnum
ReturnNumbers:
length=0
length=LEN(Arg1);
length1=1;
Outer=length;
postNum=''
counter=1;
For Outer = length to 1 Step -1
Arg2=Arg1[Outer,1]
postNum=RIGHT(Arg1,length2)
END
else
postNum=RIGHT(Arg1,counter)
END
END
counter=counter+1
Next Outer
Ans=postNum
ReverseDate:
Function ReverseDate(DateVal)
* Date may be in the form of DDMMYYYY i.e.
01102003 or DMMYYYY i.e. 1102003
If Len(DateVal) = 7 then
NDateVal = "0" : DateVal
End Else
NDateVal = DateVal
End
RunJob:
Status<1>=Jobname=FinishStatus
Status<2>=Jobname
Function RunJob(Arg1,Arg2,Arg3,Arg4)
JobHandle = ''
Info = ''
ParamCount = Dcount(Params,'|')
If RowLimit = '' Then RowLimit = 0
If WarnLimit = '' Then WarnLimit = 0
JobStartTime = DSRTimestamp()
JobHandle = DSAttachJob(RunJobName,
DSJ.ERRFATAL)
Message = DSRMessage('DSTAGE_TRX_I_0014',
'Attaching job for processing - %1 - Status of
Attachment = %2', RunJobName:@FM:JobHandle )
Call DSLogInfo(Message, RoutineName)
Message = DSRMessage('DSTAGE_TRX_I_0016',
'Getting job statistics', '' )
Call DSLogInfo(Message, RoutineName)
StageList =
DSGetJobInfo(JobHandle,DSJ.STAGELIST)
Message = DSRMessage('DSTAGE_TRX_I_0017',
'List of Stages=%1', StageList )
Call DSLogInfo(Message, RoutineName)
Info<1> = RunJobName
Info<2> = JobStartTime ;* StartTime
(Timestamp format)
Info<3> = JobEndTime ;* Now/End (Timestamp
format)
LinkCount = Dcount(LinkNames,',')
For StageLink = 1 To LinkCount
* Get Rowcount For this linkname
RowCount =
DSGetLinkInfo(JobHandle,Field(StageList,',',Stage
),Field(LinkNames,',',StageLink),DSJ.LINKROWCOUNT)
Message =
DSRMessage( 'DSTAGE_TRX_I_0019', 'RowCount for
%1.%2=%3',
Field(StageList,',',Stage):@FM:Field(LinkNames,',
',StageLink):@FM:RowCount)
Call DSLogInfo(Message, RoutineName)
Info<4,-1> =
Field(StageList,',',Stage):'.':Field(LinkNames,',
',StageLink)
Info<5,-1> = RowCount
Next StageLink
Next Stage
Ans = RunJobName:'=':Status:@FM:Info
RunJobAndDetach:
Function RunJobAndDetach(Arg1,Arg2,Arg3,Arg4)
JobHandle = ''
Info = ''
ParamCount = Dcount(Params,'|')
If RowLimit = '' Then RowLimit = 0
If WarnLimit = '' Then WarnLimit = 0
Message = DSRMessage('DSTAGE_TRX_I_0014',
'Attaching job for processing - %1 - Status of
Attachment = %2', RunJobName:@FM:JobHandle )
Call DSLogInfo(Message, RoutineName)
LimitErr = DSSetJobLimit(JobHandle,
DSJ.LIMITROWS, RowLimit)
LimitErr = DSSetJobLimit(JobHandle,
DSJ.LIMITWARN, WarnLimit)
ErrCode = DSRunJob(JobHandle,
DSJ.RUNNORMAL)
ErrCode = DSDetachJob(JobHandle)
Ans = 0
RunShellCommandReturnStatus:
Function RunShellCommandReturnStatus(Command)
Call DSLogInfo('Running
command:':Command,'RunShellCommandReturnStatus')
Call DSExecute('UNIX',Command,Ans,Ret)
Return(Ret)
SegKey:
Function SegKey(Segment_Num,Segment_Parm,Key,ErrorLogInd)
* FUNCTION SegKey(Value,ErrorLogInd)
*
* Executes a lookup against a hashed file using a
key
*
* Input Parameters : Arg1: Segment_Num
* Arg2: Segment_Parm
* Arg3: Key = An ordered pipe-separated set of
Segment Primary Key fields
* Arg4: ErrorLogInd = An indicator to indicate
if errors should be logged (Note this is not yet
implemented)
*
* Return Values: If the Value is not found, return
value is -1, or the Default value if that is supplied
* If Format Table not found, return value is -2
*
RoutineName = 'SegKey'
BlankFields = ""
CRLF = Char(13) : Char(10)
Write_Ind = Field(Segment_Parm,"|",Segment_Num)
NumKeys = Dcount(Key,"|")
Blank_Key_Cnt = 0
ReturnKey = ""
For i = 1 to NumKeys
Key_Part = Field(Key,"|",i)
if Key_Part = "" Then
Blank_Key_Cnt = Blank_Key_Cnt + 1
BlankFields<Blank_Key_Cnt> = i
end
Next i
Ans = "Invalid_Key"
End Else
Ans = ReturnKey
End
End
Else
Ans = "Invalid_Key"
End
JobParam%%1 = STAGECOM.STATUS<7,1>
JobParam%%2 = STAGECOM.STATUS<7,2> etc
Subroutines:
SetDSParamsFromFile(InputArg,ErrorCode)
JobName = Field(STAGECOM.NAME,'.',1,2)
ParamList =
STAGECOM.JOB.CONFIG<CONTAINER.PARAM.NAMES>
If ParamList = '' Then
Call DSLogWarn('Parameters may not be
externally derived if the job has no parameters
defined.',SetParams)
Return
End
ArgList = Trims(Convert(',',@FM,InputArg))
ParamDir = ArgList<1>
If ParamDir = '' Then
ParamDir = '.'
End
ParamFile = ArgList<2>
If ParamFile = '' Then
ParamFile = JobName
End
If System(91) Then
Delim = '\'
End Else
Delim = '/'
End
ParamPath = ParamDir:Delim:ParamFile
StatusFileName =
FileInfo(DSRTCOM.RTSTATUS.FVAR,1)
Readvu LockItem From DSRTCOM.RTSTATUS.FVAR,
JobName, 1 On Error
Call DSLogFatal('File read error for
':JobName:' on ':StatusFileName:'. Status =
':Status(),SetParams)
StatusId = JobName:'.':STAGECOM.WAVE.NUM
Readv ParamValues From
DSRTCOM.RTSTATUS.FVAR, StatusId, JOB.PARAM.VALUES
On Error
Release DSRTCOM.RTSTATUS.FVAR, JobName
On Error Null
ErrorCode = 1
Call DSLogFatal('File read error for
':StatusId:' on ':StatusFileName:'. Status =
':Status(),SetParams)
Return
End Else
Release DSRTCOM.RTSTATUS.FVAR, JobName
On Error Null
ErrorCode = 2
Call DSLogFatal('Failed to read
':StatusId:' record from
':StatusFileName,SetParams)
Return
End
Loop
ReadSeq ParamData From ParamFileVar On
Error
Release DSRTCOM.RTSTATUS.FVAR,
JobName On Error Null
ErrorCode = 4
Call DSLogFatal('File read error on
':ParamPath:'. Status = ':Status(),SetParams)
Return
End Else
Exit
End
Convert '=' To @FM In ParamData
ParamName = Trim(ParamData<1>)
Del ParamData<1>
ParamValue =
Convert(@FM,'=',TrimB(ParamData))
Locate(ParamName,ParamList,1;ParamPos)
Then
If
Index(UpCase(ParamName),'PASSWORD',1) = 0
Then Call DSLogInfo('Parameter
"':ParamName:'" set to
"':ParamValue:'"',SetParams)
Writev ParamValues On
DSRTCOM.RTSTATUS.FVAR, StatusId, JOB.PARAM.VALUES
On Error
Release DSRTCOM.RTSTATUS.FVAR, JobName
On Error Null
ErrorCode = 5
Call DSLogFatal('File write error for
':StatusId:' on ':StatusFileName:'. Status =
':Status(),SetParams)
Return
End Else
Release DSRTCOM.RTSTATUS.FVAR, JobName
On Error Null
ErrorCode = 6
Call DSLogFatal('Unable to write
':StatusId:' record on ':StatusFileName:'. Status
= ':Status(),SetParams)
Return
End
Release DSRTCOM.RTSTATUS.FVAR, JobName On
Error Null
STAGECOM.JOB.STATUS<JOB.PARAM.VALUES> =
ParamValues
setParamsForFileSplit:
Using values from a control file this routine will run a job
multiple times loading the specified number of rows for
each job run.
Function setParamsForFileSplit(ControlFilename,JobName)
***********************************************************************
* Nick Bond
*
* This routine retrieves values from a control file and passes them as
* parameters to a job which is run once for each record in the
* control file.
***********************************************************************
vNewFile = 'SingleInvoice':vRecord
vJobHandle = DSAttachJob(vJobName,
DSJ.ERRFATAL)
ErrCode = DSSetParam(vJobHandle,
'StartID', vStart)
ErrCode = DSSetParam(vJobHandle,
'StopID', vStop)
ErrCode = DSSetParam(vJobHandle,
'newfile', vNewFile )
vRecord = vRecord+1
End
Else
** If record is empty leave loop
GoTo Label1
End
Repeat
******** End of Loop
Label1:
Call DSLogInfo('All records have been
processed', Routine)
SetUserStatus:
Function SetUserStatus(Arg1)
Call DSSetUserStatus(Arg1)
Ans=Arg1
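SetUserStatus is normally paired with DSGetJobInfo in the controlling job: the child job stores a value via DSSetUserStatus, and the sequence reads it back after the child finishes. A sketch of the controller side ('ChildJob' is an illustrative job name):

```basic
* Run a job, then read back whatever it stored via DSSetUserStatus.
hJob = DSAttachJob('ChildJob', DSJ.ERRFATAL)
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)
UserVal = DSGetJobInfo(hJob, DSJ.USERSTATUS)
ErrCode = DSDetachJob(hJob)
```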
SMARTNumberConversion:
Converts numbers in format 1234,567 to format 1234.57
Function SMARTNumberConversion(Arg1)
INP = Arg1
WRK = ICONV(INP,"MD33") ; * convert to internal, 3 decimal places
Ans = OCONV(WRK,"MD23") ; * convert to external, 2 decimal places
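ICONV/OCONV do the scaling and rounding here: "MD33" converts the external value to an internal integer with three implied decimal places, and "MD23" formats it back with two decimals, rounding the third away. For example (assuming the decimal comma has already been converted to a point):

```basic
* "1234.567" -> internal 1234567 (MD33) -> external "1234.57" (MD23)
WRK = ICONV("1234.567", "MD33")
Ans = OCONV(WRK, "MD23")
```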
TicketErrorCommon:
* FUNCTION
TicketErrorCommon(Mod_Run_ID,Ticket_Group,Ticket_
Sequence,Ticket_Set_Key,Job_Stage_Name,Mod_Root_P
ath)
*
* Places the current Row Ticket in Common
*
ModRunID = Mod_Run_ID
TicketFileID = Ticket_File_ID
TicketSequence = Ticket_Sequence
SetKey = Ticket_Set_Key
JobStageName = Job_Stage_Name
ModRootPath = Mod_Root_Path
RETURN(Ans)
TVARate:
Function TVARate(Mtt_Base,Mtt_TVA)
BaseFormated = "":(Mtt_Base)
TvaFormated = "":(Mtt_TVA)
TVATest:
Function TVATest(Mtt_TVA,Dlco)
Country = TRIM(Dlco):";"
TestCountry =
Count("AT;BE;CY;CZ;DE;DK;EE;ES;FI;GB;GR;HU;IE;IT;
LT;LU;LV;MT;NL;PL;PT;SE;SI;SK;", Country)
Begin Case
Case Mtt_TVA <> 0
Reply = "B3"
Case Mtt_TVA = 0 And Dlco = "FR" And TestCountry
= 0
Reply = "A1"
Case Mtt_TVA = 0 And Dlco <> "FR" And TestCountry
= 1
Reply = "E6"
Case Mtt_TVA = 0 And Dlco <> "FR" And TestCountry
= 0
Reply = "E7"
Case @True
Reply = "Error"
End Case
Ans = Reply
UnTarFile:
Function UnTarFile(Arg1)
DIR =
"/interface/dashboard/dashbd_dev_dk_int/Source/"
FNAME = "GLEISND_OC_02_20040607_12455700.csv"
*--------------------------------
*---syntax= tar -xvvf myfile.tar
*---------------------------------
Ans = Output
UtilityMessageToControllerLog:
Function UtilityMessageToControllerLog(Arg1)
Equate RoutineName To
"UtilityMessageToControllerLog"
InputMsg = Arg1
If Isnull(InputMsg) Then
InputMsg = " "
End
Call DSLogToController(InputMsg)
Ans = 1
UTLPropagateParms:
Ans = 0
ParentJobName =
DSGetJobInfo(DSJ.ME,DSJ.JOBNAME)
ChildParams =
Convert(',',@FM,DSGetJobInfo(Handle,DSJ.PARAMLIST
))
ParamCount = Dcount(ChildParams,@FM)
If ParamCount Then
ParentParams =
Convert(',',@FM,DSGetJobInfo(DSJ.ME,DSJ.PARAMLIST
))
Loop
ThisParam = ChildParams<1>
Del ChildParams<1>
*** Find job parameter in parent job
and set parameter in child job to value of parent.
Locate(ThisParam,ParentParams;ParamPos) Then
ThisValue =
DSGetParamInfo(DSJ.ME,ThisParam,DSJ.PARAMVALUE)
ParamStatus =
DSSetParam(Handle,ThisParam,ThisValue)
Call DSLogInfo ("Setting:
":ThisParam:" To: ":ThisValue,
"UTLPropagateParms")
End
Else
*** If the parameter is not found
in parent job:
*** - write a warning to log file.
*** - return code changed to 3.
Call DSLogWarn ("Parameter :
":ThisParam:" does not exist in ":ParentJobName,
"UTLPropagateParms")
Ans = 3
End
While ChildParams # '' Do Repeat
End
Return(Ans)
UTLRunReceptionJob:
Function UTLRunReceptionJob(Country_Parm,
Fileset_Name_Type_Parm,Module_Run_Parm,
Abort_Msg_Parm)
Ans = -3
*****************************************************************************
*** ###################
*****************************************************************************
*** Define job to launch - Sequence or Job (START)
L$DefineSeq$START:
summary$<1,-1> = Time$$:Convert(@VM, " ",
DSMakeMsg("DSTAGE_JSG_M_0057\%1 (JOB %2)
started", "ReceptionJob":@FM:vRecJobNameBase))
** If Sequential Job exists - start Sequential
Job.
vJobSuffix = "_Seq"
vRecJobName = vRecJobNameBase : vJobSuffix
GoTo L$AttachJob$START
L$DefineJob$START:
** If no Sequential Job - start Elementary Job
vJobSuffix = "_Job"
vRecJobName = vRecJobNameBase : vJobSuffix
GoTo L$AttachJob$START
L$ErrNoJob$START:
L$AttachJob$START:
Call DSLogInfo(DSMakeMsg("Checking presence of
" : vRecJobName : " for " : Module_Run_Parm, ""),
"")
jbRecepJob = vRecJobName
hRecepJob = DSAttachJob(jbRecepJob,
DSJ.ERRNONE)
If (Not(hRecepJob)) Then
AttachErrorMsg$ = DSGetLastErrorMsg()
If AttachErrorMsg$ = "(DSOpenJob) Cannot
find job " : vRecJobName Then
If vJobSuffix = "_Seq" Then GoTo
L$DefineJob$START
Else
GoTo L$ErrNoJob$START
End
End
Msg = DSMakeMsg("DSTAGE_JSG_M_0001\Error
calling DSAttachJob(%1)<L>%2",
jbRecepJob:@FM:AttachErrorMsg$)
MsgId = "@ReceptionJob"; GoTo L$ERROR
End
If hRecepJob = 2 Then
GoTo L$RecepJobPrepare$START
End
*** Define job to launch - Sequence or Job (END)
*****************************************************************************
*** ###################
*****************************************************************************
*** Setup, Run and Wait for Reception Job (START)
L$RecepJobPrepare$START:
*** Activity "ReceptionJob": Setup, Run and Wait
for job
hRecepJob = DSPrepareJob(hRecepJob)
If (Not(hRecepJob)) Then
Msg = DSMakeMsg("DSTAGE_JSG_M_0012\Error
calling DSPrepareJob(%1)<L>%2",
jbRecepJob:@FM:DSGetLastErrorMsg())
MsgId = "@ReceptionJob"; GoTo L$ERROR
End
GoTo L$PropagateParms$START
L$PropagateParms$START:
*** Activity "PropagateParms": Propagating
parameters from parent job to child job using
separate routine.
summary$<1,-1> = Time$$:Convert(@VM, " ",
DSMakeMsg("DSTAGE_JSG_M_0058\%1 (ROUTINE %2)
started",
"PropagateParms":@FM:"DSU.UTLPropagateParms"))
RtnOk = DSCheckRoutine("DSU.UTLPropagateParms")
If (Not(RtnOk)) Then
Msg = DSMakeMsg("DSTAGE_JSG_M_0005\BASIC
routine is not cataloged: %1",
"DSU.UTLPropagateParms")
MsgId = "@PropagateParms"; GoTo L$ERROR
End
Call 'DSU.UTLPropagateParms'(rPropagateParms,
hRecepJob)
summary$<1,-1> = Time$$:Convert(@VM, " ",
DSMakeMsg("DSTAGE_JSG_M_0064\%1 finished, reply=
%2", "PropagateParms":@FM:rPropagateParms))
IdAbortRact%%Result1%%1 = rPropagateParms
IdAbortRact%%Name%%3 = "DSU.UTLPropagateParms"
*** Checking result of routine. If <> 0 then
abort processing.
If (rPropagateParms <> 0)
Then GoTo L$ABORT
GoTo L$RecepJobRun$START
L$RecepJobRun$START:
ErrCode = DSRunJob(hRecepJob, DSJ.RUNNORMAL)
If (ErrCode <> DSJE.NOERROR) Then
Msg = DSMakeMsg("DSTAGE_JSG_M_0003\Error
calling DSRunJob(%1), code=%2[E]",
jbRecepJob:@FM:ErrCode)
MsgId = "@ReceptionJob"; GoTo L$ERROR
End
ErrCode = DSWaitForJob(hRecepJob)
GoTo L$RecepJob$FINISHED
*** Setup, Run and Wait for Reception Job (END)
*****************************************************************************
*** ###################
*****************************************************************************
*** Verification of result of Reception Job (START)
L$RecepJob$FINISHED:
jobRecepJobStatus = DSGetJobInfo(hRecepJob,
DSJ.JOBSTATUS)
jobRecepJobUserstatus =
DSGetJobInfo(hRecepJob, DSJ.USERSTATUS)
summary$<1,-1> = Time$$:Convert(@VM, " ",
DSMakeMsg("DSTAGE_JSG_M_0063\%1 finished, status=
%2[E]", "ReceptionJob":@FM:jobRecepJobStatus))
IdRecepJob%%Result2%%5 = jobRecepJobUserstatus
IdRecepJob%%Result1%%6 = jobRecepJobStatus
IdRecepJob%%Name%%7 = vRecJobName
Dummy = DSDetachJob(hRecepJob)
bRecepJobelse = @True
If (jobRecepJobStatus = DSJS.RUNOK)
Then GoTo L$SeqSuccess$START; bRecepJobelse =
@False
If bRecepJobelse Then GoTo L$SeqFail$START
*** Verification of result of Reception Job (END)
*****************************************************************************
*** ###################
*****************************************************************************
*** Definition of actions to take on failure or success (START)
L$SeqFail$START:
*** Sequencer "Fail": wait until inputs ready
Call DSLogInfo(DSMakeMsg("Routine SEQUENCER -
Control End Sequence Reports a FAIL on Reception
Job", ""), "@Fail")
GoTo L$ABORT
L$SeqSuccess$START:
*** Sequencer "Success": wait until inputs ready
Call DSLogInfo(DSMakeMsg("Routine SEQUENCER -
Control End Sequence Reports a SUCCESS on
Reception Job", ""), "@Success")
GoTo L$FINISH
*** Definition of actions to take on failure or success (END)
*****************************************************************************
L$ERROR:
Call
DSLogWarn(DSMakeMsg("DSTAGE_JSG_M_0009\Controller
problem: %1", Msg), MsgId)
summary$<1,-1> = Time$$:Convert(@VM, " ",
DSMakeMsg("DSTAGE_JSG_M_0052\Exception raised:
%1", MsgId:", ":Msg))
bAbandoning = @True
GoTo L$FINISH
L$ABORT:
summary$<1,-1> = Time$$:Convert(@VM, " ",
DSMakeMsg("DSTAGE_JSG_M_0056\Sequence failed",
""))
Call DSLogInfo(summary$, "@UTLRunReceptionJob")
Call DSLogWarn("Unrecoverable errors in
routine UTLRunReceptionJob, see entries above",
"@UTLRunReceptionJob")
Ans = -3
GoTo L$EXIT
**************************************************
L$FINISH:
If bAbandoning Then GoTo L$ABORT
summary$<1,-1> = Time$$:Convert(@VM, " ",
DSMakeMsg("DSTAGE_JSG_M_0054\Sequence finished
OK", ""))
Call DSLogInfo(summary$, "@UTLRunReceptionJob")
Ans = 0
ValidateField:
vData_Type = Downcase(Data_Type)
BEGIN CASE
******** Check the arguments
* Value being checked is null
CASE isNull(Field_Value)
Call DSTransformError("The value being checked is
Null - Field_Name = " : Field_Name, vRoutineName)
* Argument for the data type is not valid
CASE vData_Type <> "char" AND vData_Type <>
"alpha" AND vData_Type <> "numeric" AND
vData_Type <> "date"
Call DSTransformError("The value " : Data_Type :
" is not a valid data type for routine: ",
vRoutineName)
* Length is not a number
CASE Not(Num(Length))
Call DSTransformError("The length supplied is not
a number : Field Checked " : Field_Name,
vRoutineName)
CASE vData_Type = "date" And (Date_Format = "" OR
isNull(Date_Format))
END CASE
*********
End
Ans = Ans
VatCheckSG:
Function VatCheckSG(Arg1)
String=Arg1
Slen=Len(String)
Scheck=0
CharCheck=0
Schar=Substrings(String,Scheck,1)
CharCheck=CharCheck+1
end
Next
Ans=CharCheck
WriteParmFile:
Function WriteParmFile(Arg1,Arg2,Arg3,Arg4)
Loop
ReadSeq Dummy From FileVar Else Exit ;* at
end-of-file
Repeat
WeofSeq FileVar
CloseSeq FileVar
Ans=MyLine
WriteSeg:
* FUNCTION WriteSeg(Segment_Num,Segment_Parm)
*
* Determines whether a segment should be written
*
* Input Parameters : Arg1: Segment_Num
* Arg2: Segment_Parm
*
* Return Values: If the Segment should be written,
return value is "Y"
* If not, return value is "N"
*
RoutineName = 'WriteSeg'
Write_Ind = Field(Segment_Parm,"|",Segment_Num)
SET_JOB_PARAMETERS_ROUTINE
Arguments: InputArg, ErrorCode
Routine name: SetDSParamsFromFile
$INCLUDE DSINCLUDE DSD_STAGE.H
$INCLUDE DSINCLUDE JOBCONTROL.H
$INCLUDE DSINCLUDE DSD.H
$INCLUDE DSINCLUDE DSD_RTSTATUS.H
ErrorCode = 0 ; * set
this to non-zero to stop the stage/job
JobName = Field(STAGECOM.NAME,'.',1,2)
ParamList =
STAGECOM.JOB.CONFIG<CONTAINER.PARAM.NAMES>
If ParamList = '' Then
Call DSLogWarn('Parameters may not be
externally derived if the job has no parameters
defined.',SetParams)
Return
End
ArgList = Trims(Convert(',',@FM,InputArg))
ParamDir = ArgList<1>
If ParamDir = '' Then
ParamDir = '.'
End
ParamFile = ArgList<2>
If ParamFile = '' Then
ParamFile = JobName
End
If System(91) Then
Delim = '\'
End Else
Delim = '/'
End
ParamPath = ParamDir:Delim:ParamFile
End Else
StatusFileName =
FileInfo(DSRTCOM.RTSTATUS.FVAR,1)
Readvu LockItem From DSRTCOM.RTSTATUS.FVAR,
JobName, 1 On Error
Call DSLogFatal('File read error for
':JobName:' on ':StatusFileName:'. Status =
':Status(),SetParams)
ErrorCode = 1
Return
End Else
Call DSLogFatal('Failed to read
':JobName:' record from
':StatusFileName,SetParams)
ErrorCode = 2
Return
End
      StatusId = JobName:'.':STAGECOM.WAVE.NUM
      Readv ParamValues From DSRTCOM.RTSTATUS.FVAR, StatusId, JOB.PARAM.VALUES On Error
         Release DSRTCOM.RTSTATUS.FVAR, JobName On Error Null
         ErrorCode = 1
         Call DSLogFatal('File read error for ':StatusId:' on ':StatusFileName:'. Status = ':Status(), SetParams)
         Return
      End Else
         Release DSRTCOM.RTSTATUS.FVAR, JobName On Error Null
         ErrorCode = 2
         Call DSLogFatal('Failed to read ':StatusId:' record from ':StatusFileName, SetParams)
         Return
      End
      Loop
         ReadSeq ParamData From ParamFileVar On Error Exit Else Exit
         Locate(ParamName, ParamList, 1; ParamPos) Then
            If Index(UpCase(ParamName), 'PASSWORD', 1) = 0
            Then Call DSLogInfo('Parameter "':ParamName:'" set to "':ParamValue:'"', SetParams)
            Else Call DSLogInfo('Parameter "':ParamName:'" set but not displayed on log', SetParams)
            End
         End Else
            Call DSLogWarn('Parameter ':ParamName:' does not exist in Job ':JobName, SetParams)
            Continue
         End
         ParamValues<1, ParamPos> = ParamValue
      Repeat
      Writev ParamValues On DSRTCOM.RTSTATUS.FVAR, StatusId, JOB.PARAM.VALUES On Error
         Release DSRTCOM.RTSTATUS.FVAR, JobName On Error Null
         ErrorCode = 5
         Call DSLogFatal('File write error for ':StatusId:' on ':StatusFileName:'. Status = ':Status(), SetParams)
         Return
      End Else
         Release DSRTCOM.RTSTATUS.FVAR, JobName On Error Null
         ErrorCode = 6
         Call DSLogFatal('Unable to write ':StatusId:' record on ':StatusFileName:'. Status = ':Status(), SetParams)
         Return
      End
      JobParam%%1 = STAGECOM.STATUS<7,1>
      JobParam%%2 = STAGECOM.STATUS<7,2>   ;* etc.
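The overall flow of SetDSParamsFromFile — read a parameter file, match each name against the job's parameter list, mask password values in the log, and warn on unknown names — can be sketched in Python. The `name=value` line format and every identifier below are assumptions, since the listing elides how ParamData is split into ParamName and ParamValue:

```python
def set_params_from_file(lines, param_names, param_values, log):
    """Apply name=value lines to a job's known parameters.

    - unknown names produce a warning and are skipped
    - values for names containing PASSWORD are not echoed to the log
    (illustrative analogue; the name=value format is an assumption)
    """
    for line in lines:
        name, sep, value = line.rstrip("\n").partition("=")
        if not sep:
            continue                       # ignore malformed lines
        if name not in param_names:        # Locate ... Else branch
            log.append(f"WARN: parameter {name} does not exist in job")
            continue
        pos = param_names.index(name)
        param_values[pos] = value          # ParamValues<1, ParamPos> = ParamValue
        if "PASSWORD" in name.upper():
            log.append(f'INFO: parameter "{name}" set but not displayed on log')
        else:
            log.append(f'INFO: parameter "{name}" set to "{value}"')
    return param_values
```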
seq$V0S10$count = 0
seq$V0S43$count = 0
seq$V0S44$count = 0
handle$list = ""
id$list = ""
abort$list = ""
b$Abandoning = @False
b$AllStarted = @False
summary$restarting = @False
*** Sequence start point
   summary$ = DSMakeMsg("DSTAGE_JSG_M_0048\Summary of sequence run", "")
   If summary$restarting Then
      summary$<1,-1> = Time$$ : Convert(@VM, " ", DSMakeMsg("DSTAGE_JSG_M_0049\Sequence restarted after failure", ""))
   End Else
      summary$<1,-1> = Time$$ : Convert(@VM, " ", DSMakeMsg("DSTAGE_JSG_M_0051\Sequence started", ""))
   End
GoSub L$V0S2$START
b$AllStarted = @True
GoTo L$WAITFORJOB
**************************************************
L$V0S0$START:
*** Activity "FR_PARIS_End_to_End_Processing_SAct": Initialize job
   summary$<1,-1> = Time$$ : Convert(@VM, " ", DSMakeMsg("DSTAGE_JSG_M_0057\%1 (JOB %2) started", "FR_PARIS_End_to_End_Processing_SAct" : @FM : "FR_PARIS_End_to_End_Processing_Seq"))
   Call DSLogInfo(DSMakeMsg("SEQUENCE - START End_to_End_Processing_Seq", ""), "@FR_PARIS_End_to_End_Processing_SAct")