
ETL - BODI Interview Questions

Important Information on BusinessObjects Data Integrator

DATA INTEGRATOR

1 How does the statement "single point of integration" apply to Data Integrator?
Ans: DI combines batch data movement and management with caching to provide a single data integration
platform for information management from any information source and for any information use.

2 State and explain the key function of data integrator.


Ans: Loading data: Loading ERP or enterprise application data into an operational datastore and updating it in batch or real-time mode.
Routing requests: Routes information requests to a DW or ERP system using complex rules.
Applying transactions: DI can apply data changes in a variety of data formats and any custom format.

3 State and explain various data integrator components.


Ans: Designer: A development interface that allows creating, testing and manually executing jobs that load a
DW. Functions performed are data mapping, transformation and control logic.
Repository: A set of tables that holds user-created and predefined system objects, metadata, transformation rules, etc.
Types: Local repository: Used to store definitions of DI objects and source/target metadata.
Central repository: Used to support multi-user development. Provides a shared object library.
Service: Starts the Job Server when the system is restarted.
Other components: Job Server: Retrieves jobs from the associated repository and starts the DI engine to process them.
Access Server: A real-time, request-reply message broker that routes messages between web applications and real-time services.
Web Server: Hosts the web applications, which include:
Administrator: Includes scheduling, starting & stopping real-time services, managing adapters, etc.
Metadata Reporting: Provides browser-based reports on enterprise metadata stored in the DI repository.

4 Explain the process of running a job from designer.


Ans: The Designer tells the Job Server to run the job. The Job Server then gets the job from the associated repository and
starts an engine to process the job.

5 Explain job server and engine.


Ans: When jobs are executed, the Job Server starts the DI engine, the data movement engine that integrates data from
heterogeneous sources, performs complex data transformations and manages extractions and transactions in the
ETL process.

6 State the various function of administrator.


Ans: Scheduling, monitoring & executing batch jobs. Managing users.
Configuring, starting and stopping real-time services.
Configuring and managing adapters.

7 What are the various analyses present with Data Integrator and BO Enterprise?
Ans: Datastore analysis: Use reports to see which BI reports (Business Views, Crystal Reports, universes, etc.) use data from tables
contained in a datastore.
Dependency analysis: Search for specific objects in the repository and understand whether they impact or are impacted
by other DI objects or BO universes or reports.
Universe analysis: View universe, class and object lineage.
Business View analysis: View data sources for Business Views in the CMS.
Report analysis: View data sources for reports in the CMS.

8 State and explain the various management tools in data integrator.


Ans: Repository Manager: Allows creating, upgrading and checking the versions of local and central repositories.
Server Manager: Allows adding, deleting or editing the properties of Job Servers.

9 Name some common DI Objects


Ans: Projects, Jobs, Work flows, Data flows, Scripts, Transforms

10 Distinguish between single-use objects and reusable objects


Ans: Single-use objects: > Appear only as components of other objects.
> Cannot be copied.
> Operate only in the context in which they are created.
Reusable objects: > Have a single definition.
> All calls to the object refer to that definition.
> If the definition of the object is changed in one place, the change is reflected everywhere the object is called.

11 How can the behavior of various objects be changed?


Ans: Through:
Options: Control the operation of the object.
Properties: Describe the object.
Classes: Objects are of two classes: single-use and reusable.

12 State relationships between work flow and data flow.


Ans: A work flow is incorporation of several data flows into a coherent flow of work for an entire job.
A data flow is the process by which source data is transformed into target data

13 State the common characteristics of a project


Ans: They are listed in the local obj library.
Only one project can be open at a time.
They cannot be shared among multiple users.

14 State and explain various phases of DI development process.


Ans: Design: Define objects & build diagrams that instruct DI in data movement requirements.
Test: Here DI is used to test the execution of the application. You can test for errors & trace the flow of execution.
Production: Set up a schedule in DI to run the application as a job. You can return to the design phase at any time.

Chapter 3

1 What are datastores?


Ans: Represent connections between DI and databases or applications, directly or through adapters. It allows DI
to access metadata from a DB or application & read from or write to that DB or application.

2 What does metadata consist of?


Ans: Database tables:
Table name
Column names
Column data types
Primary key columns
Table attributes
RDBMS functions
Application-specific data structures

3 State and explain types of datastores.


Ans: Database datastores: Provide a simple way to import metadata directly from a broad variety of RDBMSs.
Application datastores: Let users easily import metadata from most ERP systems.
Adapter datastores: Provide access to an application's data and metadata, or just metadata.

4 What are file formats? Or: What is the difference between a file format and a file format template?
Ans: A set of properties describing the structure of a flat file or a metadata structure.
A format describes a specific file, whereas a file format template is a generic description that can be used for
many data files.

5 In what formats can a file format describe a file?


Ans: Delimited format: delimiter character separates each field. Max length is 1000 characters.
Fixed width format: the column width is specified by the user. Max length is 1000 characters.
SAP R/3 format: used with predefined transport_Format or with a custom SAP R/3 format.

6 State and explain various mode of file format editor.


Ans: New Mode: used to create new file format template.
Edit Mode: used to edit an existing file format template.
Source Mode: used to edit the file format of a particular source file
Target Mode: used to edit the file format of a particular target file.

7 What are the various work areas for a file format editor?
Ans: Properties-Values: used to edit values for file format properties.
Column Attributes: used to edit and define columns or fields in file
Data Preview: used to view how the setting affects sample data.

Chapter 4

1 What are the common characteristics of a project?


Ans: Listed in the object library.
Only one project can be open at a time.
Cannot be shared among multiple users.

2 What is a job diagram?


Ans: Made up of one or more objects connected together.

3 What are the things that can be included in a job definition?


Ans: Data flows
Sources
Targets
Transforms
Work flows
Scripts
Conditionals
While Loops
Try/Catch blocks

4 How can a work flow be used?


Ans: Defines the decision-making process for executing data flows.
Used to:
Determine the path of execution (elements in a WF can determine the path).
Indicate an alternative path if something goes wrong.
Prepare for executing DFs, and set the state of the system after the DFs are complete.

5 Why is a job known as a special work flow?


Ans: Because users can execute them.

6 What are objects that can be elements in a work flow and explain each of them?
Ans: WF: A WF can call other WFs or itself & can nest to any depth.
DF
Conditionals: Single-use objects used to implement if/then/else logic in a WF.
While loops: Single-use objects used in a WF to repeat a sequence of steps as long as a condition is true.
Try/catch blocks: Single-use objects. A combination of one try and one or more catch objects that allows you to specify an
alternative WF if an error is encountered.
Scripts: Single-use objects. Used to call functions & assign values to variables in a WF.

7 What are the necessary conditions for creation of jobs, workflow, dataflow?
Ans: Use a consistent naming convention among the three.
Define the job flow step by step, i.e. create the job first, add a WF to the job, then add the DF.
Name WFs in order of sequence.

8 What is data flow?


Ans: Data flows extract, transform & load data. Lines connecting objects represent the flow of data through data transformation
steps. After defining a DF, you can add it to a job or WF.

9 What are the steps involved in data flow?


Ans: Steps in a data flow are closed operations. Steps include:
Source and target objects
Transforms
10 What can a work flow do even though it does not operate on data sets & cannot provide more data to data
flow?
Ans: Call DF to perform data movement operations.
Define conditions appropriate to run DF.
Pass parameters to & from DF

11 What is data set?


Ans: Each step in a DF produces an intermediate result which flows to next step in DF. This intermediate result
consists of a set of rows from the previous operation & schema in which rows are arranged. This result is called
data set.

12 Explain transforms. What are the various types of transforms?


Ans: Transforms operate on data sets.
They manipulate input data sets & produce one or more output data sets, which can in turn be used as sources for
other objects.
Query transform
Case transform
Merge transform
Row_Generation transform etc
13 List various operations that query transform can perform.
Ans: Choose data to extract from sources.
Join data from multiple sources.
Map columns from I/P to O/P schema.
Perform transformations and function on data.
Add new columns, nested schemas & function results to the O/P schema.
Assign primary keys to O/P columns

14 Explain query editor & what are its components?


Ans: I/P & O/P schema area:
Can contain: Columns.
Nested schemas
Functions
Parameters area

15 List the 3 different ways in which a job is executed.


Ans: Immediate jobs: DI initiates both batch & real-time jobs and runs them immediately from within the Designer. For these,
both the Designer & the designated Job Server must be running.
Scheduled jobs: Batch jobs that are scheduled to run.

Jobs invoked by a 3rd party: For these, the corresponding Job Server must be running.
The DI Designer does not need to be running.

16 Explain template table.


Ans: Useful in early application development (designing & testing).
You do not have to initially create a new table in the DBMS or import metadata. Instead, DI automatically creates the
table in the database with the schema defined by the DF when the job is executed.

17 Explain current schema.


Ans: The currently selected output schema is known as the current schema.
It determines:
The output elements that can be modified (added, mapped & deleted).
The scope of the Select through Order By tabs in the parameters area.
The current schema is highlighted while all others are gray.

Chapter 5

1 What are the operations that a DI pushes to DB?


Ans: Aggregations: Used with a group by clause. Produce a data set smaller than or the same size as the original data set.
Distinct rows: Only unique rows are output.
Filtering: Produces a data set smaller than or equal in size to the original data set.
Joins: Produce a data set smaller than or equal in size to the original tables.
Ordering: Does not affect data set size. Performs sorting.
Projections: Produce a smaller data set because only the columns referenced by a DF are returned.
Functions: Functions with underlying database equivalents are appropriately translated.

2 What are the transform operations that are not pushed down?
Ans: Expressions that include DI functions that do not have DB
correspondents.
Load operations that contain triggers.
Not all operations can be combined into single requests.

3 Explain object descriptions.


Ans: A description is associated with a particular object. When the object is imported or exported, its description is also
imported or exported.

4 Explain annotations.
Ans: An annotation describes a flow or part of a flow; it can be added to the workspace diagram of a job, DF, WF, catch, conditional, or while loop.

5 State and explain various traffic light in monitor tab.


Ans: Green light: job is running.
Red light: job has stopped.
Red cross: job encountered an error

6 What awareness is needed to ensure a job produces the expected results?


Ans: Data was not converted to incompatible types or truncated.
Data was not duplicated in the target.
Data was not lost between updates of the target.
Generated keys have been properly incremented.
Updated values were handled properly.

7 what if the job fails to execute?


Ans: Check the Job Server icon in the status bar.
Verify that the job service is running.
Check that the port number in the Designer matches the number specified in the Server Manager.
Use the Server Manager resync button to reset the port number in the local repository.

8 With View Data what is the use on sources & target?


Ans: To check the status of data at any point after a data source is imported & before or after a DF is processed, and when a
job is designed & tested, to ensure that the design returns the expected results.
Sources: Used before a job is executed.
Using data details can help to: Create higher quality job designs.
Scan & analyze imported table & file data from the object library.
See data for those same objects within existing jobs.
Refer back to source data after a job is executed.
Targets: Allows checking target data before job execution.

9 View Data displays data in rows & columns. What determines the number of rows displayed?
Ans: Sample size: The number of rows sampled in memory. The default size is 1000 rows for imported sources, targets & transforms.
Filtering
Sorting
If the original data set is smaller, or if filters are used, the number of returned rows can be less than the
default.

10 Tips for using View Data


Ans: Use one or more View Data windows to view & compare sample data from different steps.
Use View Data while building a job to ensure that design returns expected results.

11 Explain filters and breakpoints.


Ans: They can be set on lines in a DF diagram before starting the debugger.
This allows you to examine & modify data row by row.
They can be set between a source & a transform, or between two transforms.
If a filter & a breakpoint are set on the same line, DI applies the filter first, so the breakpoint only sees the filtered rows.

Filter: A debug filter functions like a Query transform with a WHERE clause, but complex expressions are not
supported.
Breakpoint: The location where a debug job execution pauses & returns control.
It applies to the after-image for Update, Normal & Insert row types & the before-image for a Delete row type.
A breakpoint used without a condition pauses job execution for the first row passed to the breakpoint.

12 DI and SQL
Ans: DI only shows the SQL generated for table sources, not the SQL generated by objects that are not table sources, e.g. the lookup
function, Key_Generation transform, key_generation function, Table_Comparison transform, and target tables.

Chapter 6

1 Explain transforms.
Ans: A transform is a step in a DF that acts on a data set.
It manipulates input data sets & produces one or more output data sets.
You can edit the input data, options & output data of a transform.

2 State & explain various operations codes provided by DI.


Ans: Operation codes describe how rows are input to & output from objects in a DF. They can be used with transforms to indicate how each row in a
data set is applied to a target table.
Normal: Creates a new row in the target.
Rows extracted from a source table or file are flagged as Normal.
If a row is flagged as Normal when loading into a target table or file, it is inserted as a new row in the target.
Insert: Creates a new row in the target.
Rows are flagged Insert by the Table_Comparison transform to indicate that a change has occurred in a data set as
compared with an earlier image of the same data set.
Delete: Ignored by the target; such rows are not loaded. Only the Map_Operation transform can flag rows as Delete.

Update: Overwrites an existing row in the target table.


Rows are flagged Update by the Table_Comparison transform to indicate that a change has occurred in a data set as compared with an
earlier image of the same data set.
The Map_Operation transform can also flag rows as Update. The History_Preserving & Key_Generation transforms can accept data sets with rows
flagged as Update.

3 Note on case transform.


Ans: Provides case logic based on row values.
Operates within a DF.
Data input: Only one DF source.
Only one of multiple branches is executed per row.
Input & output schemas are identical.
Options: Label: Name of the connection, indicating where data will go if the corresponding Case condition
is true.
Expression: The expression for the corresponding label.
Default: Only available if the "Produce default output when all expressions are false" option is enabled.
"Row can be TRUE for one case only": If enabled, the row is passed to the first case whose expression returns true; otherwise it is
passed to all cases whose expressions return true.
Data output: The connection between the Case transform & the object used for a particular case must be labeled.
Each output label must be used at least once.

4 Note on merge transform.


Ans: Combines incoming data sets, producing a single O/P data set with the same schema as I/P data sets.
Data Input: MT performs a union of sources.
All sources must have: Same no of columns.
Same column names.
Same data types of columns.
If I/P data set contains hierarchical data, the names & data types must match at every level of hierarchy.
Data Output: O/P data has same schema as source data.
The transform does not strip out duplicate rows.

If the input data set contains nested schemas, the nested data is passed through without change.

5 Note on validation transform.


Ans: Allows you to define a reusable business rule to validate each record & column.
Data I/P: one source in a data flow.
Options: Enable Validation
Do not validate when Null
Condition
Action on fail
Data O/P: Outputs 2 different data sets based on the validation condition specified.

6 What do Validation rule consists of?


Ans: A condition & an action on failure.
Use the condition to describe what is required for valid data.
Use the Action on Failure area to describe what will happen to invalid or failed data.

7 What can be done for failed columns?


Ans: Where to send a row of data when a value in a column fails to meet the conditions specified in rule:
Send to Fail
Send to Pass
Send to both
Can also specify what value to insert as a substitute for a failed value with For Pass, Substitute with option.
Only if: The column value failed validation rule.
Send to Pass or Send to both options are selected.

8 Note on Date_Generation transform

Ans: Ideal for creating time dimension tables. Produces a series of dates incremented as specified.
Options: Start date
End date
Increment
Join rank
Cache
Data output: Does not generate hierarchical data.
Generated dates range from 1900.01.01 through 9999.12.31.
A data set with a single column named DI_GENERATED_DATE containing the date sequence.

9 Hierarchical data representation.


Ans: various ways: Multiple rows in a single data set.
Nested data

10 Explain Nested data.


Ans: Using the nested data method can be more concise; for example, there is no repeated information.
It can scale to present deeper levels of hierarchical complexity.

11 Explain XML document.


Ans: XML documents are hierarchical.
Their valid structure is stored in separate format documents.
The format of an XML file or message (.xml) is specified by a DTD or XML Schema, which describes the data schema of the XML
message or file.

12 Explain importing of metadata from a DTD file.


Ans: If metadata is imported from an XML file, DI automatically retrieves the DTD for that XML file.
When importing a DTD format, DI reads the defined elements & attributes but ignores text & comments from the file
definition.

13 Explain importing of metadata from an XML schema.


Ans: XML Schemas make a distinction between elements & attributes; DI imports & converts them into nested tables & attributes.
When importing an XML Schema, DI reads the defined elements and attributes & imports:
Document structure.
Table & column names.
Data type of each column.
Nested table & column attributes.

14 Explain uses of nested data & Query transform.


Ans: Nested data included in a transform, with the exception of the Query transform, passes through the transform without being
operated on.
The Query transform is used with nested data to unnest data, perform transformations, and load data into a target
relational table (otherwise only columns at the first level of the input data are transformed).
The Query transform assumes that the FROM clause in the SELECT contains the data sets that are connected as inputs to the query
object.
With nested data, the Query transform provides an interface to perform a SELECT at each level of the relationship that is
defined in the output schema, but you must explicitly define the FROM clause.

15 Explain from clause construction.


Ans: When including:
1. A schema in the FROM clause indicates that all columns, including columns with nested schemas, are available to be
included in the output.
2. More than one schema in the FROM clause indicates that the output will be the cross product of the two schemas,
constrained by the WHERE clause for the current schema.

16 what can a FROM clause contain?


Ans: Any top-level schema from the input.
Any schema that is a column of a schema in the FROM clause of the parent schema.

17 Explain unnesting data.

Ans: Loading a data set that contains nested schemas into a relational target requires nested rows to be
unnested.
Unnesting a schema produces a cross-product of top-level schema & nested schema
Can load different column from different nesting levels into different schemas.

18 Note on XML_Pipeline on transform.


Ans: used to process large XML files.
DI does not need to read entire XML I/P into memory & build an internal data structure before performing
transformation.
Data I/P: XML file or message.

19 Rules for using XML_Pipeline transform.


Ans: Cannot drag & drop the root-level schema.
Can drag & drop the same child object repeatedly to the output schema (rename the mapped instance first).
Cannot map a parent schema for a column while dragging & dropping it.
Cannot map items from two sibling repeating sub-schemas, because the transform does not support the Cartesian product of
two repeatable schemas.

Chapter 7

1 Explain functions.
Ans: Functions take input values & produce a return value.

2 Differentiate between functions & transforms


Ans: Functions operate on single values.
Transforms operate on data sets, creating, updating and deleting rows of data.

3 Types of operations for a function.


Ans: Aggregate: Generates a single value from a set of values.
Can be called from the Query transform, not from custom functions or scripts.
Iterative: Maintains state information from one invocation to another.
The life of the information is the execution life of the query in which it is included.
Can be called from the Query transform, not from scripts or custom functions.
Stateless: State information is not maintained from one invocation to the next. Can be used anywhere.

4 Different categories in which functions are grouped into.


Ans: Aggregate, Conversion, Database, Date, Environment, Math, Miscellaneous, String, System, Validation.

5 List types of functions.


Ans: Database & application functions: Specific to a DBMS or application.
The metadata for a function includes inputs, outputs & their data types.
Custom functions: User-defined functions. You can create your own functions by writing script functions.
6 Place where functions are used.


Ans: Used to add: column based on other value
Generate key field.
Used in: Transforms
Scripts
Conditionals
Custom functions
7 Various date functions.
Ans: to_char (date1, format)
to_date (input_string, format)
julian (date1)
month (date1)
quarter(date1)
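A small script sketch showing these date functions in use; the $G_ variable names and the 'YYYY.MM.DD' format string are illustrative assumptions, not part of the original answer:

    $G_RUN_DATE = sysdate();                              # current system date
    $G_RUN_LABEL = to_char($G_RUN_DATE, 'YYYY.MM.DD');    # date -> string
    $G_START_DATE = to_date('2008.01.01', 'YYYY.MM.DD');  # string -> date
    $G_QTR = quarter($G_RUN_DATE);                        # 1-4; month() and julian() are used the same way
    print('Run [$G_RUN_LABEL] falls in quarter [$G_QTR]');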
8 Various lookup functions.
Ans: lookup, lookup_seq, lookup_ext

9 What do all lookup functions provide?


Ans: * A specialized type of join:
a SQL outer join may return multiple matches for a single record in the outer table, whereas
lookup functions always return exactly the same number of records as are in the source.
* Sophisticated caching options.
* A default value when no match is found.

10 Describe the lookup functions.


Ans: lookup_ext(): Allows specification of an Order by column & a return policy to return the record with the
highest/lowest value in a given field.
lookup_seq(): Searches matching records and returns a field from the record whose sequence column is closest to,
but not greater than, a specified sequence value.
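A rough sketch of a lookup_ext() call in the bracketed form built by the function wizard. The datastore, table and column names (DS_TGT, DBO.STATUS_LKP, STATUS_DESC, STATUS_CODE) and the cache/return-policy choices are placeholders; verify the exact argument layout against the function wizard in your DI version:

    $G_DESC = lookup_ext(
        [DS_TGT.DBO.STATUS_LKP, 'PRE_LOAD_CACHE', 'MAX'],  # lookup table, cache spec, return policy
        [STATUS_DESC],                                      # column(s) to return
        ['NONE'],                                           # default value(s) when no match is found
        [STATUS_CODE, '=', $G_CODE]);                       # lookup condition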

11 Various database functions


Ans: db_type(ds_name), db_version(), db_database_name, db_owner, decode (used to return an expression
based on the first condition in the specified list of conditions & expressions that evaluates to true).
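An illustrative decode() sketch (the variables are hypothetical); conditions are evaluated in order and the result paired with the first true condition is returned, otherwise the final default:

    $G_SIZE = decode(($G_QTY < 10),   'small',
                     ($G_QTY < 100),  'medium',
                     'large');                  # default when no condition evaluates to true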

Chapter 8

1 Note on variables
Ans: Symbolic placeholders for values.
Data types supported: integer, decimal, date, text string.
Used in expressions for decision-making or data manipulation.
Names must start with $.
A script is used to assign values to variables in a WF.

2 About call tab.


Ans: The Calls tab allows you to view the names of parameters. Values in the Calls tab must also use:
The same data type as the variable if they are placed inside an input or input/output parameter type, & a compatible data type if
they are placed inside an output parameter type.
DI rules & syntax.

3 Where else a variable be used?

Ans: flat file source & target.


XML file source & target.
XML message target.
Document file source & target.
Document message target.

4 Local variable and parameter.


Ans: In DI, local variables are restricted to the object in which they are created.
Use parameters to pass local variables to child objects.

5 Variables values & Smart Editor.


Ans: The return value must be passed outside the function using: RETURN(expression).
Existing variables & parameters displayed in the Smart Editor are filtered by the context from which the Smart Editor
is opened.

6 Expression & variable substitution.


Ans: Square brackets [ ] substitute the value of the expression.
Curly brackets { } quote the value of the expression in single quotation marks.
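A short sketch showing both substitution styles inside string constants; $G_TABLE is a hypothetical global variable and DS_ILX reuses the datastore name from the script example later in this document:

    $G_TABLE = 'CUSTOMER';
    print('Truncating table [$G_TABLE]');                               # [] substitutes the value: Truncating table CUSTOMER
    sql('DS_ILX', 'delete from audit_log where tab_name = {$G_TABLE}'); # {} substitutes the value wrapped in single quotes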

7 Difference between global & local variables.


Ans: Local variables:
Are restricted to the object in which they are created.
Must use parameters to be passed to child objects.
Global variables:
Are restricted to the job in which they are created.
Do not require parameters to be passed to work flows & DFs.

8 Explain basic syntax rule, syntax for column & table references in expressions.

Ans: Basic syntax rules:


* Statements end with a semicolon (;).
* Variables begin with the dollar sign ($).
* String values are enclosed in single quotes (').
* Comments begin with a pound sign (#).
Syntax for column & table references in expressions:
* Expressions can be used inside DF objects and often contain
column names.
* If more than one column with the same name exists in the input schemas, indicate which column is meant in an expression by
qualifying the column name with the table name.
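A small fragment illustrating the syntax rules above (all names are made up for illustration):

    # comments begin with a pound sign
    $G_LOAD_DATE = sysdate();        # variables begin with $; statements end with ;
    $G_REGION = 'EMEA';              # string constants are enclosed in single quotes
    # in a Query mapping, qualify an ambiguous column with its table name,
    # e.g. EMP.DEPT_ID rather than DEPT_ID when two input schemas share the column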

9 What are the uses of scripts?


Ans: A script is executed before data flows for initialization steps & is used in conjunction with conditionals to
determine execution paths.
Used after a WF or DF to record execution information.
Used to calculate values that will be passed on to other parts of the WF.
Used to assign values to variables & execute functions.

10 Various strings and variables in DI scripting language.


Ans: Quotation marks: used for constants: string constants & numeric constants.
Escape characters: the backslash (\) escapes single quotes & other special characters used by the DI scripting language.
Trailing blanks: not stripped from strings.
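A one-line sketch of the escape rule: a backslash keeps an embedded single quote from ending the string constant.

    $G_MSG = 'Load hasn\'t finished yet';   # \' escapes the embedded single quote
    print($G_MSG);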

11 Explain custom functions.


Ans: When built-in functions do not meet the needs of the application, a custom function is created.
Thus custom functions are: Written by the user in the DI scripting language.
Reusable objects.
Managed through the function wizard.
Custom functions return values through:
Function invocation.
Output parameters.

Guidelines for creating custom functions:


Functions can call other functions.
Functions cannot call themselves.
Functions cannot participate in a cycle of recursive calls.
Functions return a value.
Functions can have parameters for input, output or both,
but data flows cannot pass parameters of type output or
input/output.
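A minimal sketch of a custom function body in the DI scripting language, following the guidelines above; the function name CF_ApplyRate, its input parameters $P_AMOUNT and $P_RATE, and the local variable $L_RATE (declared in the Smart Editor) are hypothetical. The result is returned through the function invocation with RETURN():

    # body of custom function CF_ApplyRate($P_AMOUNT, $P_RATE)
    $L_RATE = nvl($P_RATE, 1);          # local variable; default a NULL rate to 1
    RETURN($P_AMOUNT * $L_RATE);        # value returned to the caller

A call such as CF_ApplyRate(ORDER.AMOUNT, ORDER.FX_RATE) could then be used in a Query mapping or a script.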

1. What is the use of BusinessObjects Data Services?


Answer:
BusinessObjects Data Services provides a graphical interface that allows you to easily create jobs that
extract data from heterogeneous sources, transform that data to meet the business requirements of your
organization, and load the data into a single location.
2. Define Data Services components.
Answer:
Data Services includes the following standard components:

Designer

Repository

Job Server

Engines

Access Server

Adapters

Real-time Services

Address Server

Cleansing Packages, Dictionaries, and Directories

Management Console

3. What are the steps included in Data integration process?

Answer:

Stage data in an operational datastore, data warehouse, or data mart.

Update staged data in batch or real-time modes.

Create a single environment for developing, testing, and deploying the entire data integration
platform.

Manage a single metadata repository to capture the relationships between different extraction and
access methods and provide integrated lineage and impact analysis.

4. Define the terms Job, Workflow, and Dataflow


Answer:

A job is the smallest unit of work that you can schedule independently for execution.

A work flow defines the decision-making process for executing data flows.

Data flows extract, transform, and load data. Everything having to do with data, including reading
sources, transforming data, and loading targets, occurs inside a data flow.

5. Arrange these objects in order by their hierarchy: Dataflow, Job, Project, and Workflow.
Answer
Project, Job, Workflow, Dataflow.
6. What are reusable objects in DataServices?
Answer:
Job, Workflow, Dataflow.
7. What is a transform?
Answer:
A transform enables you to control how datasets change in a dataflow.
8. What is a Script?
Answer:
A script is a single-use object that is used to call functions and assign values in a workflow.
9. What is a real time Job?
Answer:
Real-time jobs "extract" data from the body of the real time message received and from any secondary
sources used in the job.
10. What is an Embedded Dataflow?
Answer:
An Embedded Dataflow is a dataflow that is called from inside another dataflow.
11. What is the difference between a data store and a database?

Answer:
A datastore is a connection to a database.
12. How many types of datastores are present in Data services?
Answer:
Three.

Database Datastores: provide a simple way to import metadata directly from an RDBMS.

Application Datastores: let users easily import metadata from most Enterprise Resource Planning
(ERP) systems.

Adapter Datastores: can provide access to an application's data and metadata or just metadata.

13. What is the use of Compact repository?


Answer:
Remove redundant and obsolete objects from the repository tables.
14. What are Memory Datastores?
Answer:
Data Services also allows you to create a database datastore using Memory as the Database type. Memory
Datastores are designed to enhance processing performance of data flows executing in real-time jobs.
15. What are file formats?
Answer:
A file format is a set of properties describing the structure of a flat file (ASCII). File formats describe the
metadata structure. File format objects can describe files in:

Delimited format - Characters such as commas or tabs separate each field.

Fixed width format - The column width is specified by the user.

SAP ERP and R/3 format.

16. Which is NOT a datastore type?


Answer:
File Format
17. What is repository? List the types of repositories.
Answer:
The DataServices repository is a set of tables that holds user-created and predefined system objects,
source and target metadata, and transformation rules. There are 3 types of repositories.

A local repository

A central repository

A profiler repository

18. What is the difference between a Repository and a Datastore?

Answer:
A Repository is a set of tables that hold system objects, source and target metadata, and transformation
rules. A Datastore is an actual connection to a database that holds data.
19. What is the difference between a Parameter and a Variable?
Answer:
A Parameter is an expression that passes a piece of information to a work flow, data flow or custom function
when it is called in a job. A Variable is a symbolic placeholder for values.
20. When would you use a global variable instead of a local variable?
Answer:

When the variable will need to be used multiple times within a job.

When you want to reduce the development time required for passing values between job
components.

When you need to create a dependency between job level global variable name and job
components.

21. What is Substitution Parameter?


Answer:
The Value that is constant in one environment, but may change when a job is migrated to another
environment.
22. List some reasons why a job might fail to execute?
Answer:
Incorrect syntax, Job Server not running, port numbers for Designer and Job Server not matching.
23. List factors you consider when determining whether to run work flows or data flows serially or in
parallel?
Answer:
Consider the following:

Whether or not the flows are independent of each other

Whether or not the server can handle the processing requirements of flows running at the same
time (in parallel)

24. What does a lookup function do? How do the different variations of the lookup function differ?
Answer:
All lookup functions return one row for each row in the source. They differ in how they choose which of
several matching rows to return.
25. List the three types of input formats accepted by the Address Cleanse transform.
Answer:

Discrete, multiline, and hybrid.


26. Name the transform that you would use to combine incoming data sets to produce a single
output data set with the same schema as the input data sets.
Answer:
The Merge transform.
27. What are Adapters?
Answer:
Adapters are additional Java-based programs that can be installed on the job server to provide connectivity
to other systems such as Salesforce.com or the Java Messaging Queue. There is also a
Software Development Kit (SDK) to allow customers to create adapters for custom applications.
28. List the data integrator transforms
Answer:

Data_Transfer

Date_Generation

Effective_Date

Hierarchy_Flattening

History_Preserving

Key_Generation

Map_CDC_Operation

Pivot, Reverse Pivot

Table_Comparison

XML_Pipeline

29. List the Data Quality Transforms


Answer:

Global_Address_Cleanse

Data_Cleanse

Match

Associate

Country_id

USA_Regulatory_Address_Cleanse

30. What are Cleansing Packages?


Answer:
These are packages that enhance the ability of Data Cleanse to accurately process various forms of global
data by including language-specific reference data and parsing rules.
31. What is Data Cleanse?
Answer:

The Data Cleanse transform identifies and isolates specific parts of mixed data, and standardizes your data
based on information stored in the parsing dictionary, business rules defined in the rule file, and expressions
defined in the pattern file.
32. What is the difference between Dictionary and Directory?
Answer:
Directories provide information on addresses from postal authorities. Dictionary files are used to identify,
parse, and standardize data such as names, titles, and firm data.
33. Give some examples of how data can be enhanced through the data cleanse transform, and
describe the benefit of those enhancements.
Answer:

Enhancement: Gender Codes - Benefit: Determine gender distributions and target marketing campaigns.

Enhancement: Match Standards - Benefit: Provide fields for improving matching results.

34. A project requires the parsing of names into given and family, validating address information,
and finding duplicates across several systems. Name the transforms needed and the task they will
perform.
Answer:

Data Cleanse: Parse names into given and family.

Address Cleanse: Validate address information.

Match: Find duplicates.

35. Describe when to use the USA Regulatory and Global Address Cleanse transforms.
Answer:
Use the USA Regulatory transform if USPS certification and/or additional options such as DPV and Geocode
are required. Global Address Cleanse should be utilized when processing multi-country data.
36. Give two examples of how the Data Cleanse transform can enhance (append) data.
Answer:
The Data Cleanse transform can generate name match standards and greetings. It can also assign gender
codes and prenames such as Mr. and Mrs.
37. What are name match standards and how are they used?
Answer:
Name match standards illustrate the multiple ways a name can be represented. They are used in the match
process to greatly increase match results.
38. What are the different strategies you can use to avoid duplicate rows of data when re-loading a
job.

Answer:

Using the auto-correct load option in the target table.

Including the Table Comparison transform in the data flow.

Designing the data flow to completely replace the target table during each execution.

Including a preload SQL statement to execute before the table loads.

39. What is the use of Auto Correct Load?


Answer:
It does not allow duplicate data to enter the target table. It works like a Type 1 SCD: rows are inserted or updated
based on non-matching and matching data respectively.
40. What is the use of Array fetch size?
Answer:
Array fetch size indicates the number of rows retrieved in a single request to a source database. The default
value is 1000. Higher numbers reduce requests, lowering network traffic, and possibly improve performance.
The maximum value is 5000
41. What are the differences between Row-by-row select, Cached comparison table, and Sorted
input in the Table Comparison Transform?
Answer:

Row-by-row select - Looks up the target table using SQL every time it receives an input row. This
option is best if the target table is large.

Cached comparison table - Loads the comparison table into memory. This option is best when
the table fits into memory and you are comparing the entire target table.

Sorted input - Reads the comparison table in the order of the primary key column(s) using a
sequential read. This option improves performance because Data Integrator reads the comparison
table only once. Add a query between the source and the Table_Comparison transform. Then, from
the query's input schema, drag the primary key columns into the Order By box of the query.

42. What is the use of using Number of loaders in Target Table?


Answer:
Loading with one loader is known as single-loader loading. Loading when the number
of loaders is greater than one is known as parallel loading. The default number of loaders is 1. The
maximum number of loaders is 5.
43. What is the use of Rows per commit?
Answer:
Specifies the transaction size in number of rows. If set to 1000, Data Integrator sends a commit to the
underlying database every 1000 rows.
44. What is the difference between lookup (), lookup_ext () and lookup_seq ()?
Answer:

lookup() : Briefly, It returns single value based on single condition

lookup_ext(): It returns multiple values based on single/multiple condition(s)

lookup_seq(): It returns multiple values based on sequence number

45. What is the use of History preserving transform?


Answer:
The History_Preserving transform allows you to produce a new row in your target rather than updating an
existing row. You can indicate in which columns the transform identifies changes to be preserved. If the
value of certain columns change, this transform creates a new row for each row flagged as UPDATE in the
input data set.
46. What is the use of the Map_Operation Transform?
Answer:
The Map_Operation transform allows you to change operation codes on data sets to produce the desired
output. Operation codes: INSERT, UPDATE, DELETE, NORMAL, or DISCARD.
47. What is Hierarchy Flattening?
Answer:
Constructs a complete hierarchy from parent/child relationships, and then produces a description of the
hierarchy in vertically or horizontally flattened format.

Parent Column, Child Column

Parent Attributes, Child Attributes.

48. What is the use of Case Transform?


Answer:
Use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic
into one transform. The transform allows you to split a data set into smaller sets based on logical branches.
49. What must you define in order to audit a data flow?
Answer:
You must define audit points and audit rules when you want to audit a data flow.
50. List some factors for PERFORMANCE TUNING in data services?
Answer:
The following sections describe ways you can adjust Data Integrator performance

Source-based performance options

Using array fetch size

Caching data

Join ordering

Minimizing extracted data

Target-based performance options

Loading method and rows per commit

Staging tables to speed up auto-correct loads

Job design performance options

Improving throughput

Maximizing the number of pushed-down operations

Minimizing data type conversion

Minimizing locale conversion

Improving Informix repository performance

What are SAP R/3 sources?


What is degree of parallelism?
Types of caches in a dataflow?
What is an ABAP dataflow?
Complex situations in BODS?
Describe the Table_Comparison transform?
How to capture changed data (net change data) if there are no identifier flags in the source (no dates, no flags,
etc.)?
What are real-time jobs?
What are embedded dataflows?
Types of transforms in BODS?
What are IDoc files?
Scenarios:
1. How to update data through a script?
sql('DS_ILX', 'update tab_x set a = \'XXX\' where a is null');
2. Create files based on department? If department numbers are fixed.
Use case transform for each department.
3. Create files based on department? If department numbers are not fixed.

Company Name
Design type 2 in BODS?
1. History preserve transform
2. Table comparison, map operation, case transform
What will happen if a column is removed in the SQL transform?
1. The job will not show an error while validating, but it will fail at run time or when the schema is updated in the SQL
transform.

How to call a batch script in BODS?


1. EXEC function
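A hedged sketch of the exec() call; the script path is a placeholder and the third argument (a flag controlling wait/return behaviour) should be checked against the exec() documentation for your version:

    $G_RC = exec('C:\\scripts\\archive_files.bat', '', 8);   # run the batch script and capture its result
    print('exec returned: [$G_RC]');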
How are Oracle table values loaded into variables?
1. SQL function
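A small sketch of pulling a single Oracle value into a variable with the sql() function; the datastore name DS_ILX matches the earlier script example, and the table/column names are placeholders:

    $G_MAX_ID = sql('DS_ILX', 'select max(cust_id) from customers');   # first column of the first returned row
    print('Current max id: [$G_MAX_ID]');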
How to use pushdown SQL in BODS?
1. To limit the data read from the source or written to the target.
2. To reduce data movement between the source/target database and BODS.
What are date types available?
Date
Datetime
Timestamp
What are the available memory types in data flows?
1. Pageable
2. In Memory
The software provides the following types of caches that your data flow can use for all of the
operations it contains:
In-memory
Use in-memory cache when your data flow processes a small amount of data that fits in memory.
Pageable cache
Use a pageable cache when your data flow processes a very large amount of data that does not fit in
memory.
How to import data from table in datastore?
1. Import table by giving owner name

Company Name
How many types of datastores in BODS?
3 types, connecting to:
Databases and mainframe file systems.
Applications that have pre-packaged or user-written adapters.
Applications such as J.D. Edwards One World and J.D. Edwards World, Oracle Applications, PeopleSoft, SAP ERP and
SAP NetWeaver BW, and Siebel Applications. See the appropriate supplement guide.
Difference between parameters and variables?
What are the available lookups and brief about those?
Lookup
Lookup_ext
Lookup_seq

Difference between row by row and cached in table comparison?


Huge (more than 50% source) delta data/net change data, then cached
Less than 20% source data then row by row
Available flags in Map Operation transform?
Insert, update, delete, discard
What is the use of row limit fetch?
Commit check points
What is hierarchy transform?
What is dataflow audit?
Use of number of loaders in target?
What are SCDs?
Slowly Changing Dimensions - a concept for dimension tables only:
Type 1
Type 2
Type 3
What are additive and non-additive measures?
This concept applies to facts only.
What is a conformed dimension?
Time dimension - fiscal and calendar views can be maintained within the same dimension.
The same dimension is shared by the sales fact and inventory facts.

Company Name
Brief about Oracle exceptions - built-in and user-defined?
Write an exception to update data when the data is not inserted?
Exception when a unique constraint error exists - duplicate.
Can we do DML operations in an Oracle function?
Yes.
Can we call an Oracle function with a DML operation in a SELECT statement?
It throws an error.

What is the use of user hints?


If a table has multiple indexes, we can suggest which index needs to be used.
Parallel option
How does a parallel hint work?
Based on the number of usages.
Difference between a procedure and a package?
A procedure is in a locked state while it is being used.
A package is not locked.
Company Name

What is incremental load?

How dimensional loading and fact loading will be implemented in BODS?

What are the transforms used in BODS?

What is the use of MAP_OPERATION transform?

What are the components in BODS?


Designer
Admin console
Server Manager
Repository Manager
License Manager
Metadata Manager
Locale Selector

What are the types of variables?


Global Variables
Local Variables

What is the use of parameters?

Types of error logs in BODS?


Trace
Monitor
Error log

How to improve performance of a job?

How to identify long running dataflows in a job?

How to improve performance in BODS?


Lookup Caches, Join Ranks
