DATA INTEGRATOR
1 How does the phrase "single point of integration" apply to Data Integrator?
Ans: DI combines batch data movement and management with caching to provide a single data integration
platform for information management from any information source and for any information use.
7 What are the various analyses available with Data Integrator and BO Enterprise?
Ans: Datastore analysis: Use reports to see whether BI reports (Business Views, Crystal Reports, Universes, etc.) use data from
tables contained in datastores.
Dependency analysis: Search for specific objects in the repository and see whether they impact, or are impacted
by, other DI objects or BO universes and reports.
Universe analysis: View universe class and object lineage.
Business View analysis: View data sources for Business Views in the CMS.
Report analysis: View data sources for reports in the CMS.
Chapter 3
4 What are file formats? Or: what is the difference between a file format and a file format template?
Ans: A file format is a set of properties describing the structure of a flat file or a metadata structure.
A file format describes a specific file, whereas a file format template is a generic description that can be used for
many data files.
7 What are the work areas of the file format editor?
Ans: Properties-Values: used to edit the values of file format properties.
Column Attributes: used to edit and define the columns or fields in the file.
Data Preview: used to view how the settings affect sample data.
Chapter 4
6 Which objects can be elements in a work flow? Explain each of them.
Ans: Work flows (WF): a WF can call other WFs or itself, and nesting can go to any depth.
Data flows (DF).
Conditionals: single-use objects used to implement if/then/else logic in a WF.
While loops: single-use objects used in a WF to repeat a sequence of steps while a condition is true.
Try/catch blocks: single-use objects. A combination of one try object and one or more catch objects that lets you specify
an alternative WF if an error is encountered.
Scripts: single-use objects used to call functions and assign values to variables in a WF.
7 What are the necessary conditions for creating jobs, work flows, and data flows?
Ans: Use a consistent naming convention across all three.
Define the job flow step by step: create the job first, add a WF to the job, then add a DF.
Name WFs in order of sequence.
For a job invoked by a third party, the corresponding Job Server must be running.
The DI Designer does not need to be running.
Chapter 5
2 Which transform operations are not pushed down?
Ans: Expressions that include DI functions that have no database
equivalents.
Load operations that contain triggers.
In addition, not all operations can be combined into a single request.
4 Explain annotations.
Ans: An annotation describes a flow, part of a flow, or a diagram in the workspace. Annotations can be added to jobs, DFs, WFs, catches, conditionals, and while loops.
9 View Data displays data in rows and columns. What determines the number of rows displayed?
Ans: Sample size: the number of rows sampled in memory. The default is
1000 rows for imported sources, targets, and transforms.
Filtering
Sorting: If the original data set is smaller, or if filters are used, the number of returned rows can be less than the
default.
Filter: A debug filter functions like a Query transform with a WHERE clause, but complex expressions are not
supported.
Breakpoint: only filtered rows can be seen.
A breakpoint is the location where a debug job execution pauses and returns control.
It applies to the after image for Update, Normal, and Insert row types and to the before image for the Delete row type.
A breakpoint used without a condition pauses job execution at the first row passed to the breakpoint.
12 DI and SQL
Ans: DI shows only the SQL generated for table sources. It does not show SQL generated for sources that are not tables, e.g. the lookup
function, the Key_Generation transform, the key_generation function, the Table_Comparison transform, and target tables.
Chapter 6
1 Explain transforms.
Ans: A transform is a step in a DF that acts on a data set.
It manipulates input data sets and produces one or more output data sets.
You can edit the input data, output data, and options in a transform.
If the input set contains a nested schema, the nested data is passed through without change.
Ans: Loading a data set that contains nested schemas into a relational target requires the nested rows to be
unnested.
Unnesting a schema produces a cross-product of the top-level schema and the nested schema.
Columns from different nesting levels can be loaded into different schemas.
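The cross-product behaviour of unnesting can be sketched in plain Python. This is an illustration only, not Data Integrator code: the `orders` record layout and the `unnest` helper are hypothetical names chosen for the example.

```python
# Hypothetical nested data set: each order row carries a nested "items" schema.
orders = [
    {"order_id": 1, "customer": "A",
     "items": [{"sku": "X", "qty": 2}, {"sku": "Y", "qty": 1}]},
    {"order_id": 2, "customer": "B",
     "items": [{"sku": "Z", "qty": 5}]},
]

def unnest(rows, nested_key):
    """Pair each top-level row with each of its nested rows."""
    flat = []
    for row in rows:
        # Top-level columns are repeated once per nested row.
        parent = {k: v for k, v in row.items() if k != nested_key}
        for child in row[nested_key]:
            flat.append({**parent, **child})
    return flat

flat_rows = unnest(orders, "items")
for r in flat_rows:
    print(r)
```

Each order appears once per nested item, which is the shape a relational target table needs.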
Chapter 7
1 Explain functions.
Ans: Functions take input values and produce a return value.
A SQL outer join may return multiple matches for a single record in the outer table.
Lookup functions, by contrast, always return exactly the same number of records as there are in the source. They also offer:
* Sophisticated caching options.
* A default value when no match is found.
Chapter 8
1 Note on variables
Ans: Variables are symbolic placeholders for values.
Data types supported: integer, decimal, date, text string.
Used in expressions for decision-making or data manipulation.
Names must start with $.
Scripts are used to assign values to variables in a WF.
8 Explain basic syntax rule, syntax for column & table references in expressions.
Designer
Repository
Job Server
Engines
Access Server
Adapters
Real-time Services
Address Server
Management Console
Answer:
Create a single environment for developing, testing, and deploying the entire data integration
platform.
Manage a single metadata repository to capture the relationships between different extraction and
access methods and provide integrated lineage and impact analysis.
A job is the smallest unit of work that you can schedule independently for execution.
A work flow defines the decision-making process for executing data flows.
Data flows extract, transform, and load data. Everything having to do with data, including reading
sources, transforming data, and loading targets, occurs inside a data flow.
5. Arrange these objects in order by their hierarchy: Dataflow, Job, Project, and Workflow.
Answer:
Project, Job, Workflow, Dataflow.
6. What are reusable objects in DataServices?
Answer:
Job, Workflow, Dataflow.
7. What is a transform?
Answer:
A transform enables you to control how datasets change in a dataflow.
8. What is a Script?
Answer:
A script is a single-use object that is used to call functions and assign values in a workflow.
9. What is a real time Job?
Answer:
Real-time jobs "extract" data from the body of the real time message received and from any secondary
sources used in the job.
10. What is an Embedded Dataflow?
Answer:
An Embedded Dataflow is a dataflow that is called from inside another dataflow.
11. What is the difference between a data store and a database?
Answer:
A datastore is a connection to a database.
12. How many types of datastores are present in Data services?
Answer:
Three.
Database Datastores: provide a simple way to import metadata directly from an RDBMS.
Application Datastores: let users easily import metadata from most Enterprise Resource Planning
(ERP) systems.
Adapter Datastores: can provide access to an application's data and metadata or just metadata.
A local repository
A central repository
A profiler repository
Answer:
A Repository is a set of tables that hold system objects, source and target metadata, and transformation
rules. A Datastore is an actual connection to a database that holds data.
19. What is the difference between a Parameter and a Variable?
Answer:
A Parameter is an expression that passes a piece of information to a work flow, data flow or custom function
when it is called in a job. A Variable is a symbolic placeholder for values.
20. When would you use a global variable instead of a local variable?
Answer:
When the variable will need to be used multiple times within a job.
When you want to reduce the development time required for passing values between job
components.
When you need to create a dependency between job level global variable name and job
components.
Whether or not the server can handle the processing requirements of flows running at the same
time (in parallel)
24. What does a lookup function do? How do the different variations of the lookup function differ?
Answer:
All lookup functions return one row for each row in the source. They differ in how they choose which of
several matching rows to return.
25. List the three types of input formats accepted by the Address Cleanse transform.
Answer:
Discrete, multiline, and hybrid.
List the Data Integrator transforms.
Answer:
Data_Transfer
Date_Generation
Effective_Date
Hierarchy_Flattening
History_Preserving
Key_Generation
Map_CDC_Operation
Table_Comparison
XML_Pipeline
List the Data Quality transforms.
Answer:
Global_Address_Cleanse
Data_Cleanse
Match
Associate
Country_id
USA_Regulatory_Address_Cleanse
What is Data Cleanse?
Answer:
The Data Cleanse transform identifies and isolates specific parts of mixed data, and standardizes your data
based on information stored in the parsing dictionary, business rules defined in the rule file, and expressions
defined in the pattern file.
32. What is the difference between Dictionary and Directory?
Answer:
Directories provide information on addresses from postal authorities. Dictionary files are used to identify,
parse, and standardize data such as names, titles, and firm data.
33. Give some examples of how data can be enhanced through the data cleanse transform, and
describe the benefit of those enhancements.
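Answer:
Enhancement        Benefit
Gender Codes       Determine gender distributions and target marketing campaigns
Match Standards    Provide fields for improving matching results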
34. A project requires the parsing of names into given and family, validating address information,
and finding duplicates across several systems. Name the transforms needed and the task they will
perform.
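Answer:
Data Cleanse: Parse names into given and family.
Address Cleanse: Validate address information.
Match: Find duplicates.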
35. Describe when to use the USA Regulatory and Global Address Cleanse transforms.
Answer:
Use the USA Regulatory transform if USPS certification and/or additional options such as DPV and Geocode
are required. Global Address Cleanse should be utilized when processing multi-country data.
36. Give two examples of how the Data Cleanse transform can enhance (append) data.
Answer:
The Data Cleanse transform can generate name match standards and greetings. It can also assign gender
codes and prenames such as Mr. and Mrs.
37. What are name match standards and how are they used?
Answer:
Name match standards illustrate the multiple ways a name can be represented. They are used in the match
process to greatly increase match results.
38. What are the different strategies you can use to avoid duplicate rows of data when re-loading a
job?
Answer:
Using the auto-correct load option in the target table.
Including the Table Comparison transform in the data flow.
Designing the data flow to completely replace the target table during each execution.
Including a preload SQL statement to execute before the table loads.
What is the difference between the Row-by-row select, Cached comparison table, and Sorted input options
in the Table_Comparison transform?
Answer:
Row-by-row select: Looks up the target table using SQL every time it receives an input row. This
option is best if the target table is large.
Cached comparison table: Loads the comparison table into memory. This option is best when
the table fits into memory and you are comparing the entire target table.
Sorted input: Reads the comparison table in the order of the primary key column(s) using
sequential read. This option improves performance because Data Integrator reads the comparison
table only once. Add a query between the source and the Table_Comparison transform. Then, from
the query's input schema, drag the primary key columns into the Order By box of the query.
Company Name
How do you design a Type 2 (slowly changing dimension) load in BODS?
1. Use the History_Preserving transform.
2. Use the Table_Comparison, Map_Operation, and Case transforms.
What happens if a column is removed in the SQL transform?
1. The job does not show an error during validation, but it fails at run time or when the schema is updated
in the SQL transform.
Company Name
How many types of datastores are there in BODS?
Three types:
Databases and mainframe file systems.
Applications that have pre-packaged or user-written adapters.
J.D. Edwards One World and J.D. Edwards World, Oracle Applications, PeopleSoft, SAP ERP and
SAP NetWeaver BW, and Siebel Applications. See the appropriate supplement guide.
What is the difference between parameters and variables?
What are the available lookup functions? Briefly describe each.
Lookup
Lookup_ext
Lookup_seq
Company Name
Briefly describe Oracle exceptions: built-in and user-defined.
Write an exception handler that updates the data when the insert fails.
Handle the exception raised when a unique constraint is violated (DUP_VAL_ON_INDEX, a duplicate value).
Can we perform DML operations in an Oracle function?
Yes.
Can we call an Oracle function that performs DML from a SELECT statement?
No, it raises an error.
parallel?
Consider the following:
Whether or not the flows are independent of each other
Whether or not the server can handle the processing requirements of flows running at the same time
(in parallel)
What does a lookup function do? How do the different variations of the lookup function differ?
All lookup functions return one row for each row in the source. They differ in how they choose which
of several matching rows to return.
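The "one row out for each source row" behaviour can be sketched in plain Python. This is not the Data Services lookup API: `lookup_one`, its `pick` argument, and the sample tables are hypothetical, and `pick` merely stands in for whatever policy a given lookup variant uses to choose among several matches.

```python
def lookup_one(source_rows, lookup_rows, key, return_col, default=None, pick=max):
    """Return exactly one output row per source row.

    `pick` chooses among several matching values; `default` is used
    when nothing matches.
    """
    out = []
    for row in source_rows:
        matches = [lk[return_col] for lk in lookup_rows if lk[key] == row[key]]
        value = pick(matches) if matches else default
        out.append({**row, return_col: value})
    return out

source = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 3}]
lookup_table = [
    {"cust_id": 1, "region": "EAST"},
    {"cust_id": 1, "region": "WEST"},   # second match for cust_id 1
    {"cust_id": 2, "region": "NORTH"},
]

result = lookup_one(source, lookup_table, "cust_id", "region", default="UNKNOWN")
for r in result:
    print(r)
```

Unlike an outer join, the two matches for `cust_id` 1 do not produce two output rows, and the unmatched `cust_id` 3 receives the default value.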
List the three types of input formats accepted by the Address Cleanse transform.
Discrete, multiline, and hybrid.
Name the transform that you would use to combine incoming data sets to produce a single output
data set with the same schema as the input data sets.
The Merge transform.
What are Adapters?
Adapters are additional Java-based programs that can be installed on the job server to provide
connectivity to other systems such as Salesforce.com or the Java Messaging Queue. There is also a
Software Development Kit (SDK) to allow customers to create adapters for custom applications.
List the data integrator transforms
Data_Transfer
Date_Generation
Effective_Date
Hierarchy_Flattening
History_Preserving
Key_Generation
Map_CDC_Operation
Pivot
Reverse Pivot
Table_Comparison
XML_Pipeline
List the Data Quality Transforms
Global_Address_Cleanse
Data_Cleanse
Match
Associate
Country_id
USA_Regulatory_Address_Cleanse
What are Cleansing Packages?
These are packages that enhance the ability of Data Cleanse to accurately process various forms of
global data by including language-specific reference data and parsing rules.
What is Data Cleanse?
The Data Cleanse transform identifies and isolates specific parts of mixed data, and standardizes
your data based on information stored in the parsing dictionary, business rules defined in the rule file,
and expressions defined in the pattern file.
What is the difference between Dictionary and Directory?
Directories provide information on addresses from postal authorities. Dictionary files are used to
identify, parse, and standardize data such as names, titles, and firm data.
Give some examples of how data can be enhanced through the data cleanse transform, and describe
the benefit of those enhancements.
Enhancement        Benefit
Gender Codes       Determine gender distributions and target marketing campaigns
Match Standards    Provide fields for improving matching results
A project requires the parsing of names into given and family, validating address information, and
finding duplicates across several systems. Name the transforms needed and the task they will
perform.
Data Cleanse: Parse names into given and family.
Address Cleanse: Validate address information.
Match: Find duplicates.
Describe when to use the USA Regulatory and Global Address Cleanse transforms.
Use the USA Regulatory transform if USPS certification and/or additional options such as DPV and
Geocode are required. Global Address Cleanse should be utilized when processing multi-country
data.
Give two examples of how the Data Cleanse transform can enhance (append) data.
The Data Cleanse transform can generate name match standards and greetings. It can also assign
gender codes and prenames such as Mr. and Mrs.
What are name match standards and how are they used?
Name match standards illustrate the multiple ways a name can be represented. They are used in the
match process to greatly increase match results.
What are the different strategies you can use to avoid duplicate rows of data when re-loading a job?
Using the auto-correct load option in the target table.
Including the Table Comparison transform in the data flow.
Designing the data flow to completely replace the target table during each execution.
Including a preload SQL statement to execute before the table loads.
What is the use of Auto Correct Load?
It prevents duplicate data from entering the target table. It works like a Type 1 load: rows that do not
match an existing target row are inserted, and rows that match are updated.
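The insert-else-update behaviour can be sketched with Python's standard `sqlite3` module. This shows the general upsert idea only, not how Data Integrator implements auto correct load; the `customer` table and the `auto_correct_load` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Old Name')")

def auto_correct_load(conn, rows):
    """Update rows whose key already exists in the target; insert the rest."""
    for row_id, name in rows:
        cur = conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?", (name, row_id)
        )
        if cur.rowcount == 0:  # no matching key, so this is a new row
            conn.execute(
                "INSERT INTO customer (id, name) VALUES (?, ?)", (row_id, name)
            )
    conn.commit()

incoming = [(1, "New Name"), (2, "Second")]
auto_correct_load(conn, incoming)
auto_correct_load(conn, incoming)  # re-running the load creates no duplicates

loaded = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
print(loaded)
```

Running the same load twice leaves the target with one row per key, which is why this option suits recovery and re-load scenarios.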
What is the use of Array fetch size?
Array fetch size indicates the number of rows retrieved in a single request to a source database. The
default value is 1000. Higher values reduce the number of requests, which lowers network traffic and may
improve performance. The maximum value is 5000.
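The relationship between fetch size and number of requests can be sketched with `sqlite3` and `fetchmany`. This is only an analogy: SQLite runs in-process, so there is no network traffic here, and the `src` table and `read_in_batches` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (n INTEGER)")
conn.executemany("INSERT INTO src VALUES (?)", [(i,) for i in range(2500)])

def read_in_batches(conn, fetch_size):
    """Read the whole table, counting how many fetch requests were needed."""
    cur = conn.execute("SELECT n FROM src")
    requests, total = 0, 0
    while True:
        batch = cur.fetchmany(fetch_size)
        if not batch:
            break
        requests += 1
        total += len(batch)
    return requests, total

small = read_in_batches(conn, 100)
large = read_in_batches(conn, 1000)
print(small, large)
```

The same 2500 rows arrive in both cases, but the larger fetch size needs far fewer round trips.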
What is the difference between the Row-by-row select, Cached comparison table, and Sorted input options
in the Table_Comparison transform?
Row-by-row select: Looks up the target table using SQL every time it receives an input row. This option
is best if the target table is large.
Cached comparison table: Loads the comparison table into memory. This option is best when the
table fits into memory and you are comparing the entire target table.
Sorted input: Reads the comparison table in the order of the primary key column(s) using
sequential read. This option improves performance because Data Integrator reads the comparison
table only once. Add a query between the source and the Table_Comparison transform. Then, from
the query's input schema, drag the primary key columns into the Order By box of the query.
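The cached comparison idea can be sketched in Python: the target is loaded into memory once, then each input row is compared by primary key and flagged. The function name, the row layout, and the choice to emit nothing for unchanged rows are assumptions made for this illustration, not the transform's actual implementation.

```python
def table_comparison_cached(input_rows, target_rows, key, compare_cols):
    """Flag each input row as INSERT or UPDATE by comparing it with the target."""
    # Cached comparison table: one pass over the target, keyed by primary key.
    cache = {row[key]: row for row in target_rows}
    flagged = []
    for row in input_rows:
        existing = cache.get(row[key])
        if existing is None:
            flagged.append(("INSERT", row))
        elif any(row[c] != existing[c] for c in compare_cols):
            flagged.append(("UPDATE", row))
        # Unchanged rows produce no output in this sketch.
    return flagged

target = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
incoming = [{"id": 1, "name": "A"}, {"id": 2, "name": "B2"}, {"id": 3, "name": "C"}]

flags = table_comparison_cached(incoming, target, "id", ["name"])
for op, row in flags:
    print(op, row)
```

A row-by-row variant would replace the in-memory `cache` with one SQL query against the target per input row, trading memory for database round trips.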
What is the use of Number of loaders in the target table?
Loading with one loader is known as single loader loading. Loading when the
number of loaders is greater than one is known as parallel loading. The default number of loaders is
1. The maximum number of loaders is 5.
What is the use of Rows per commit?
Specifies the transaction size in number of rows. If set to 1000, Data Integrator sends a commit to the
underlying database every 1000 rows.
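A commit interval can be sketched with `sqlite3` as follows. The `tgt` table and the `load_with_commit_interval` helper are hypothetical, and the sketch only illustrates the transaction-size idea rather than the loader itself.

```python
import sqlite3

def load_with_commit_interval(conn, rows, rows_per_commit):
    """Insert rows, committing after every `rows_per_commit` rows."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO tgt VALUES (?)", row)
        pending += 1
        if pending == rows_per_commit:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # commit the final partial batch
        conn.commit()
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tgt (n INTEGER)")
conn.commit()

commit_count = load_with_commit_interval(conn, [(i,) for i in range(2500)], 1000)
row_count = conn.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
print(commit_count, row_count)
```

With 2500 rows and a setting of 1000, the load issues two full commits and one final commit for the remaining 500 rows.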
What is the difference between lookup(), lookup_ext(), and lookup_seq()?
lookup(): Returns a single value based on a single condition.
lookup_ext(): Returns multiple values based on single or multiple conditions.
lookup_seq(): Returns multiple values based on a sequence number.
What is the use of the History_Preserving transform?
The History_Preserving transform allows you to produce a new row in your target rather than
updating an existing row. You can indicate in which columns the transform identifies changes to be
preserved. If the value of certain columns changes, this transform creates a new row for each row
flagged as UPDATE in the input data set.
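The "new row instead of update" idea can be sketched in Python. This is a simplified Type 2 illustration under stated assumptions: the `valid_from` and `valid_to` columns, the `history_preserving` helper, and the sample dimension are all hypothetical and not taken from the product.

```python
from datetime import date

def history_preserving(flagged_rows, target, key, load_date):
    """Apply (op_code, row) pairs to `target`, keeping old versions of updated rows."""
    for op, row in flagged_rows:
        if op == "UPDATE":
            # Close the current version instead of overwriting it.
            for existing in target:
                if existing[key] == row[key] and existing["valid_to"] is None:
                    existing["valid_to"] = load_date
        if op in ("INSERT", "UPDATE"):
            # Every INSERT or UPDATE becomes a new row in the target.
            target.append({**row, "valid_from": load_date, "valid_to": None})
    return target

dimension = [
    {"cust_id": 1, "city": "Paris", "valid_from": date(2020, 1, 1), "valid_to": None}
]
changes = [
    ("UPDATE", {"cust_id": 1, "city": "Lyon"}),
    ("INSERT", {"cust_id": 2, "city": "Rome"}),
]

history = history_preserving(changes, dimension, "cust_id", date(2024, 6, 1))
for h in history:
    print(h)
```

The Paris row survives with an end date, so earlier facts can still be joined to the value that was current at the time.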
What is the use of the Map_Operation transform?
The Map_Operation transform allows you to change operation codes on data sets to produce the
desired output. Operation codes: INSERT, UPDATE, DELETE, NORMAL, or DISCARD.
What is Hierarchy Flattening?
The Hierarchy_Flattening transform constructs a complete hierarchy from parent/child relationships, and then
produces a description of the hierarchy in vertically or horizontally flattened format.
Inputs: Parent Column, Child Column,
Parent Attributes, Child Attributes.
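Vertical flattening can be sketched in Python as turning parent/child pairs into ancestor/descendant pairs with a depth. This is a simplified illustration that assumes an acyclic hierarchy; the `flatten_vertical` helper and the sample data are hypothetical, and the real transform's output columns may differ.

```python
def flatten_vertical(edges):
    """edges: list of (parent, child). Returns sorted (ancestor, descendant, depth) rows."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    rows = []

    def walk(ancestor, node, depth):
        # Record every node reachable from `ancestor`, with its distance.
        for child in children.get(node, []):
            rows.append((ancestor, child, depth + 1))
            walk(ancestor, child, depth + 1)

    nodes = {n for edge in edges for n in edge}
    for node in nodes:
        walk(node, node, 0)
    return sorted(rows)

edges = [("CEO", "VP"), ("VP", "Manager"), ("VP", "Analyst")]
pairs = flatten_vertical(edges)
for p in pairs:
    print(p)
```

Each output row answers "is X somewhere below Y, and how far?" without any recursive query at report time.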
What is the use of the Case transform?
Use the Case transform to simplify branch logic in data flows by consolidating case or decision-making
logic into one transform. The transform allows you to split a data set into smaller sets based
on logical branches.
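Splitting one data set into several by logical branches can be sketched in Python. The `case_split` helper, the branch labels, and the rule that the first matching branch wins are assumptions for this illustration, not a description of the transform's options.

```python
def case_split(rows, branches, default_label="DEFAULT"):
    """Route each row to the output of the first branch whose condition is true."""
    outputs = {label: [] for label, _ in branches}
    outputs[default_label] = []
    for row in rows:
        for label, predicate in branches:
            if predicate(row):
                outputs[label].append(row)
                break  # first matching branch wins in this sketch
        else:
            # No branch matched, so the row goes to the default output.
            outputs[default_label].append(row)
    return outputs

rows = [
    {"region": "US", "amt": 10},
    {"region": "EU", "amt": 5},
    {"region": "APAC", "amt": 7},
]
branches = [
    ("US", lambda r: r["region"] == "US"),
    ("EU", lambda r: r["region"] == "EU"),
]

split = case_split(rows, branches)
for label, subset in split.items():
    print(label, subset)
```

Keeping all branch conditions in one place avoids chaining several filtering queries off the same source.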
What must you define in order to audit a data flow?
You must define audit points and audit rules when you want to audit a data flow.
List some factors for performance tuning in Data Services.
The following are ways you can adjust Data Integrator performance:
Source-based performance options
Using array fetch size
Caching data
Join ordering
Minimizing extracted data
Target-based performance options
Loading method and rows per commit
Staging tables to speed up auto-correct loads
Job design performance options
Improving throughput
Maximizing the number of pushed-down operations
Minimizing data type conversion
Minimizing locale conversion
Improving Informix repository performance