
SQL Server 2005 Integration Services

Essentials

1. Two Categories of Control Flow tasks: a) Control Flow Items and b) Maintenance Plan tasks.

2. Various Control Flow Items:


• Bulk Insert task: The Bulk Insert task lets you perform a fast load of data from flat
files into a target table. It’s similar to using BCP to bulk copy a file into a table,
and is appropriate for loading clean data. If you need to transform data that’s
stored in a flat file, you should instead use a data flow task that uses the flat file as
a source.
• Execute SQL task: The Execute SQL task, as you might expect, will execute a
SQL statement against a specified database. The database need not be SQL
Server, but the statement must be written in the appropriate dialect of SQL. Use
this task to perform database operations like creating views, tables, or even
databases. A common use of this task is to query data or metadata, and to put the
query results into a variable (a sample query of this kind appears after this list).
• File Transfer Protocol and File System tasks: Use the File Transfer Protocol task
to transfer files and sets of files. Use the File System task to perform file system
operations like copying, moving, or deleting files and folders.
• Execute Package, Execute DTS2000 Package, and Execute Process tasks: The
Execute Package task executes an Integration Services package. With it you can
(and should) break a complex workflow into smaller packages, and define a
parent or master package to execute them. Create a separate package to populate
each table in your data warehouse database. The use of parent and child
packages enables the modularization and reuse of complex logic. If you have
DTS2000 packages running already in your production system, you can run them
with the Execute DTS2000 Package task. The Execute Process task will run any
operating system process. For example, you may have a custom process that will
generate the source system extract, or you may invoke a non-SQL Server
relational database’s bulk loader.
• Send Mail task: The Send Mail task sends an email message. You will almost
certainly use this task in an event handler, for example to send a message to an
operator about a processing failure.
• Script and ActiveX Script tasks: These tasks are available to perform an endless
array of operations that are beyond the scope of the standard tasks. The ActiveX
Script task is provided for backwards compatibility to DTS2000; use the Script
task for new work. The Script task uses Visual Basic .NET from the Visual Studio
for Applications environment. Or, you can use any .NET language to create a
custom task that will become available in the list of control flow tasks. Defining a
custom task is a programming job, rather than simply scripting, but has the
significant benefit of re-use.
• Data Mining and Analysis Services Processing tasks: The Data Mining task runs
an Analysis Services data mining query and saves the results to a table. The
Analysis Services Processing task will launch processing on Analysis Services
dimensions and databases. Use the Analysis Services DDL task to create new
Analysis Services partitions, or perform any data definition language operation.
There are Data Mining and Analysis Services Data Flow transforms, as well as
these control flow tasks. Use the Analysis Services control flow tasks to fully or
incrementally update your databases and models. The use of the corresponding
Data Flow transforms is discussed in the next section.
• XML and Web Services tasks: The XML task retrieves XML documents and
applies XML operations to them. Use the XML task to validate an XML
document against its XSD schema, or to compare or merge two XML documents.
Use the Web Services task to make calls to a web service.
• Message Queue, WMI Data Reader, and WMI Event Watcher tasks: These tasks
are useful for building an automated ETL system. The Message Queue task uses
Microsoft Message Queue (MSMQ) to manage tasks in a distributed system. You
can use the WMI tasks to coordinate with Windows Management Instrumentation,
and automate the execution of a package or set of tasks when a specific system
event has occurred.
• ForEach Loop, For Loop, and Sequence containers: Use containers like the
ForEach Loop and For Loop to execute a set of tasks multiple times. For example,
you can loop over all the tables in a database, performing a standard set of
operations like updating index statistics. The Sequence container groups together
several tasks. Use it to define a transaction boundary around a set of tasks so they
all fail or succeed together. Or, use it simply to reduce the clutter on the design
surface by hiding the detailed steps within the sequence. We used a sequence
container in the package illustrated in Figure 5.2, for the Sales Reason and Sales
Reason Bridge tables. You can also group control flow objects, and collapse or
expand those groups. There’s no task for grouping. Simply select several objects,
right-click, and choose Group. In Figure 5.2 we grouped the Execute Package
tasks for several small dimensions, and collapsed that into the Small dimensions
group.
• Data Flow task: The Data Flow task is where most ETL work is performed. The
Data Flow task is discussed in the next section.
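
As an example of the Execute SQL task pattern mentioned above, the following is a minimal sketch of a query that returns a single value you might map to a package variable. The table and variable names are illustrative, not from the source.

-- Illustrative query for an Execute SQL task with ResultSet set to Single row;
-- map the returned column to a package variable such as User::ExtractRowCount.
SELECT COUNT(*) AS ExtractRowCount
FROM dbo.FactSalesStage;   -- hypothetical staging table

In the task editor you would set the ResultSet property to Single row and map the ExtractRowCount column to the variable.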

3. Data Flow Task definition: The Data Flow task is a pipeline in which data is
picked up, processed, and written to a destination. The key characteristic of the
pipeline is defined by the task’s name: The data flows through the pipeline in
memory. An implication of the data flow pipeline architecture is that, by avoiding
intermediate I/O, it provides excellent performance, subject to the memory
characteristics of the physical system.

4. Various Data Flow Components:

Data Sources

There are several kinds of predefined source adapters, called Data Flow Sources in the
toolbox pane:
• Use the OLE DB Source adapter to extract data from any source that has an OLE
DB provider, including SQL Server, Oracle, and DB2. You can source from a
table or view, or from a SQL query or stored procedure. You may source from a
query or stored procedure if your source table is huge and you want only a
relatively small number of rows (a sample source query of this kind appears after
this list). Figure 5.3 depicts a data flow that uses an OLE DB data source.
• Use the DataReader Source adapter to extract data from a .NET Provider like
ADONET:SQL or ADONET:ORACLE. You’ll probably use an OLE DB provider
instead, even for sourcing data from SQL Server because it has a nicer user
interface. If performance is vital, use the DataReader source adapter, which has
slightly better performance than the OLE DB adapter, but only the most basic user
interface.
• The Flat File Source adapter pulls data from a flat file. You could use an OLE
DB or ODBC flat file driver, but we prefer the Flat File Source because it has a
better UI and handles bad data more flexibly.
• The Raw File Source adapter pulls data from a raw data file, which is a format
unique to Integration Services. Integration Services writes data to and reads data
from the raw format very efficiently. Raw files are a great format to use for data
staging, for example to store data extracts during the day (or month) until the
main transformation logic kicks off. Also use raw files to hold a fast “backup” of
the dimension tables or fact rows that will be updated during an ETL operation.
• The XML Source adapter pulls data from an XML source. The XML source can
be either in a file or in a package variable (which, presumably, you set in an
earlier task in the package). You can optionally apply an XSD schema definition
against the XML data. You should hesitate to take any XML data that doesn’t
have a schema! Because XML data is so unstructured, the source adapter lets you
break out the data into multiple destinations. These might flow into different
tables, or be handled in different ways before being recombined into a single
table.
• You can define a custom source adapter to consume data from a unique source,
like a file whose rows have different formats under different conditions. For a
single-time use, you could use the Script transform to write the source adapter. To
easily re-use the custom adapter, develop it in any VS.NET language and install
the custom adapter into the list of data flow transformations, as described in
Books Online.
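
The source query mentioned in the OLE DB Source bullet above might look like the following minimal sketch, used with the SQL command data access mode; the table, columns, and date filter are assumptions for illustration only.

-- Illustrative extraction query for an OLE DB Source.
-- Select only the columns and rows you need rather than the whole huge table.
SELECT OrderID, CustomerID, OrderDate, SalesAmount
FROM dbo.Orders
WHERE OrderDate >= '20050101';   -- hypothetical incremental cutoff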

Data Destinations

There are several kinds of predefined destination adapters, called Data Flow Destinations.
These are available in the same toolbox illustrated in Figure 5.3, simply by scrolling
down in the toolbox list.

• The OLE DB destination adapter will load data into any OLE DB target database
or table, like SQL Server, Oracle, DB2, or even Excel. If you are loading a large
volume of data into a non-SQL Server OLE DB destination, it's usually more
efficient to write the data as a flat file and use the Execute Process control flow
task to invoke the target database’s bulk loader.
• You can use the SQL Server destination adapter to load data into the SQL Server
relational database, instead of the OLE DB adapter. There are several differences
between the two adapters:
o The SQL Server adapter can write data only to a SQL Server relational
database running on the same machine as Integration Services.
o The SQL Server adapter requires that data be formatted perfectly; it skips
some data checking and conversion steps that the OLE DB adapter
performs for you.
o The SQL Server adapter can perform faster than the OLE DB adapter
because it skips those data checking and conversion steps.
o Unlike the OLE DB adapter, the SQL Server adapter doesn’t let you set a
batch size or commit size for bulk loading. In some scenarios, setting these
parameters optimally enables the OLE DB adapter to out-perform the SQL
Server adapter.

• Use OLE DB during the early stages of development because it’s more forgiving
of data type mismatches. Late in the development cycle, test the performance of
the two adapters to determine which you want to use in production.

• The Flat File and Raw File destination adapters write the data stream to the file
system. Use raw files if the data will be consumed only by Integration Services.
Flat files are a standard format appropriate for many uses.
• Use the Data Reader destination adapter to generate a Reporting Services report
directly from an Integration Services package. This feature enables an interesting
use of Integration Services for real-time DW/BI systems, as we discuss in Chapter
17.
• The Recordset destination adapter populates an ADODB recordset object held in
memory.
• The SQL Mobile destination adapter is appropriate for pushing data down to SQL
Server Mobile Edition databases.
• The Dimension and Partition Processing destination adapters push the pipeline
data into an Analysis Services dimension or fact partition, without the usual
intermediate step of writing the data to a relational database. In this scenario, the
Analysis Services database is populated directly from the ETL data stream. This
feature is most interesting for real-time DW/BI systems.
• You can define a custom destination adapter to write data to a custom target, just
as you could develop a custom source adapter. We hope someone will develop a
flexible, high-performance Oracle destination.

Data Transformations

Between sources and destinations are the data transformation steps. These include:

• Sort and Aggregate transforms perform high-performance sorting and
aggregation. The Aggregate transform can generate multiple aggregates in a single
step. The Sort transform can optionally remove duplicate rows. Both the Sort and
Aggregate transforms are asynchronous, which means they consume all the data
in the flow before they generate an output flow. When you think about what
they’re doing, this makes perfect sense. The performance of the Sort and
Aggregate transforms is excellent, and should meet most applications’ sorting and
aggregation requirements. In exceptional circumstances you may need a third-
party sort or aggregation utility that integrates with Integration Services.

• Conditional Split and Multicast transforms create multiple output streams from
one data stream. Use Conditional Split to define a condition that divides the data
stream into discrete pieces, for example based on a column’s value being greater
or less than 100. You can treat the split streams differently and recombine them
later, or you can use the Conditional Split to filter the data. The Multicast
transform efficiently replicates a stream, perhaps for parallel processing. The
Conditional Split transform sends each input row to one and only one output
stream; Multicast sends each input row to each output stream.

• The Union All, Merge, Merge Join, and Lookup transforms all join multiple data
streams. Use Union All to combine multiple streams with similar structures,
perhaps from an upstream Conditional Split, or for Customer records from
multiple source systems. Merge is very similar to Union All, but it interleaves
sorted rows. If you don’t care about the order of the rows in the output stream,
Union All is both more forgiving and more efficient. Merge Join is a lot like a
database join: it merges rows based on columns in common, and can perform a
left, full, or inner join. The Lookup transform is very useful for decoding a code
from the source data into a friendly label using information stored in a database
table. You’ll make heavy use of the Lookup transform in performing the surrogate
key assignment on your incoming fact data. The operations performed by these
four transforms could all be performed using SQL queries. But if the data is in the
pipeline it’s easier and almost always more efficient to use these transforms than
to write to a table and potentially index that table before performing a SQL query.

• The Character Map, Copy/Map, Data Conversion, and Derived Column
transforms all do basic transformations. Character Map works only on string data,
and will perform transformations like changing case, and changing character
width for international strings. Copy/Map simply creates a second copy of an
existing column in the stream. Data Conversion changes the data type of one or
more columns, for example translating a date string to a true date. Finally, the
Derived Column transform is where the most interesting transformation logic
occurs. Use Integration Services variables and expressions, discussed in the text
that follows, to develop complex transformations on one or more columns, like
parsing or concatenating strings, calculating a mathematical expression, or
performing a date/time computation. The Derived Column transform dialog box
calls the expression editor to help you construct your expressions. We use the
Derived Column transform in Figure 5.3.
• The Slowly Changing Dimension transform does a lot more for you than most of
the other transforms. It launches a wizard that generates a bunch of objects to
manage dimension changes. You’ll use the wizard to specify which columns take
a Type 1 change, a Type 2 change, or which can never be changed. The generated
objects automatically compare the incoming dimension rows with the existing
dimension, propagate a new row for the Type 2 dimension if necessary, update
Type 1 attributes, and manage the common slowly changing dimension metadata
attributes. You could do all this by hand using more atomic Integration Services
objects, but the Slowly Changing Dimension transform is a lot easier.

• Use the OLE DB Command transform to update or delete rows in a target table,
based on the data in the data flow. The OLE DB Command transformation
executes a SQL statement for each row in a data flow; the SQL statement is
usually parameterized. The OLE DB Command is the only way to execute an
UPDATE statement from within the Data Flow. If your ETL process requires a
large number of updates, which is most common for snapshot fact tables, you should
consider writing the target rows to a staging table and performing a bulk UPDATE
statement from the Execute SQL task in the Control Flow instead (a sketch of this
staging-table approach appears after this list).

• The Row Count and Audit transforms are most useful for developing process
metadata about the data flow. Use the Row Count transform to efficiently count
the rows that will be loaded to the target, or that flow into the error handler. Audit
captures information about the system environment at the time the data flow task
is run, including the computer name, user id, and execution start time.

• You can use the Percentage Sampling and Row Sampling transforms during
package testing, to operate against a subset of the source data. You’ll also use
sampling to choose a random sample for deep analysis and data mining, and for
building your sandbox source system as described later in this chapter.

• The Pivot and Unpivot transforms will reduce or increase, respectively, the
number of rows in the data stream. Use the Unpivot transform to create 12 rows
of data from a single source row that has 12 columns holding a year’s worth of
monthly values.

• Use the Data Mining Model Training and Data Mining Query transforms to
incorporate Analysis Services data mining technology into your ETL application.
For example, you may have a Customer Score attribute in the Customer
dimension, whose value comes from a data mining model. These transforms will
compute that model and return the results into the data flow pipeline.
• Fuzzy Grouping and Fuzzy Lookup transforms employ fuzzy logic algorithms that
were developed by Microsoft Research. These are useful transforms for data
cleansing and de-duplication. Fuzzy Lookup, like the Lookup transform,
compares and matches two sets of data, but with a measure of confidence based
on semantic similarity. Fuzzy Grouping acts much the same way, but is used to
de-duplicate data on fuzzy criteria. Unlike third-party tools that perform
name/address matching based on a detailed history of addresses, these fuzzy
algorithms are based on semantics. You have to experiment with the settings, but
the transforms are valuable for a wide range of data cleansing problems.
• Term Extraction and Term Lookup transforms employ text-mining algorithms that
were also developed by Microsoft Research. Term Extraction is a text-mining
component which returns key terms from a document in a text column. Term
Lookup matches a document to key terms in a reference table.
• The File Extractor and File Injector transforms are used primarily to strip out
(extract) text and image data from a data flow and put it into a file or files, or to
add such data from files into the data flow.
• The Script component provides a simple mechanism for creating a custom
transformation in the Visual Studio for Applications environment. We include a
simple example of using the Script component in Chapter
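
For the OLE DB Command bullet above, here is a minimal sketch of the staging-table alternative: land the changed rows in a staging table with the data flow, then run one set-based UPDATE from an Execute SQL task. Table and column names are illustrative assumptions.

-- Illustrative set-based update run by an Execute SQL task in the control flow,
-- replacing a row-by-row OLE DB Command in the data flow.
UPDATE f
SET    QuantityOnHand = s.QuantityOnHand
FROM   dbo.FactInventorySnapshot AS f
JOIN   dbo.FactInventoryStage    AS s
       ON f.InventoryKey = s.InventoryKey;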

5. Synchronous and asynchronous Outputs:

Transformation components with synchronous outputs receive rows from upstream
components, and read or modify the values in the columns of these rows as they pass the
rows to downstream components. They may also define additional output columns that
are derived from the columns provided by upstream components, but they do not add
rows to the data flow. This type of component is suited for tasks where the data is
modified inline as it is provided to the component and where the component does not
have to see all the rows before processing them.

A transformation component has asynchronous outputs when it cannot output rows until
the component has received all its input rows, or when the transformation does not produce
exactly one output row for each row received as input. The Aggregate transformation, for
example, cannot calculate a sum across rows until it has read all the rows.

6. Data Viewers:

Data viewers display data between two components in a data flow. Data viewers can
display data when the data is extracted from a data source and first enters a data flow,
before and after a transformation updates the data, and before the data is loaded into its
destination. To view the data, you attach data viewers to the path that connects two data
flow components. The ability to view data between data flow components makes it easier
to identify unexpected data values, view the way a transformation changes column
values, and discover the reason that a transformation fails. A data viewer can display data
in a grid, histogram, scatter plot, or column chart.
• Using a grid, you select the columns to display. The values for the selected
columns display in a tabular format.

• Using a histogram, you select the column to model in the histogram. A histogram
shows the distribution of numeric values and works only with numeric data.

• Using a scatter plot, you select the columns to appear on the x-axis and y-axis.
The scatter plot graph works only with numeric data.

• Using a chart format, you select the column to model with the column chart. The
column chart displays the occurrence count of discrete values in the selected
column.

7. Variables

Variables store values that a SQL Server 2005 Integration Services (SSIS) package and its
containers, tasks, and event handlers can use at run time. The scripts in the Script task and
the Script component can also use variables. The precedence constraints that sequence
tasks and containers into a workflow can use variables when their constraint definitions
include expressions.

You can use variables in Integration Services packages for the following purposes:

• Updating properties of package elements at run time. For example, dynamically
set the number of concurrent executables that a Foreach Loop container allows
every time that the package runs.

• Including an in-memory lookup table. For example, a package can run an Execute
SQL task that loads a variable with data values.

• Loading variables with data values and then using them to specify a search
condition in a WHERE clause. For example, a package can update the script in a
Script task or a Transact-SQL statement in an Execute SQL task.

• Loading a variable with an integer and then using the value to control looping
within a package control flow. For example, the evaluation expression in a For
Loop container can use a variable to control iteration.

• Populating parameter values for Transact-SQL statements at run time. For
example, a package can run an Execute SQL task and then dynamically set the
parameters in a Transact-SQL statement (see the parameterized statement sketch
after this list).

• Building expressions that include variable values. For example, the Derived
Column transformation can populate a column with the result obtained by
multiplying a variable value by a column value.
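
As referenced in the parameter-values bullet above, an Execute SQL task statement that uses an OLE DB connection marks parameters with question marks and maps them to variables on the Parameter Mapping page. A minimal sketch follows; the table and variable names are illustrative.

-- Illustrative parameterized statement for an Execute SQL task (OLE DB connection);
-- the ? placeholder is mapped to a package variable such as User::LastLoadDate.
DELETE FROM dbo.FactSalesStage
WHERE LoadDate < ?;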

Integration Services supports two types of variables: user-defined variables and system
variables. Package developers define user-defined variables, and system variables are
defined by Integration Services. You can create as many user-defined variables as a
package requires, but you cannot create additional system variables.

8. Connection Managers:

A connection manager is a logical representation of a connection. At design time, you set
the properties of a connection manager to describe the physical connection that
Integration Services creates when the package runs. For example, a connection manager
includes the ConnectionString property that you set at design time; at run time, a
physical connection is created using the value in the connection string property.

A package can use multiple instances of a connection manager type, and you can set the
properties on each instance. At run time, each instance of a connection manager type
creates a connection that has different attributes. The following table lists the connection
manager types.

Type: Description (Connection manager)

ADO: Connects to ActiveX Data Objects (ADO) objects. (ADO Connection Manager)
ADO.NET: Connects to a data source by using a .NET provider. (ADO.NET Connection Manager)
EXCEL: Connects to an Excel workbook file. (Excel Connection Manager)
FILE: Connects to a file or a folder. (File Connection Manager)
FLATFILE: Connects to data in a single flat file. (Flat File Connection Manager)
FTP: Connects to an FTP server. (FTP Connection Manager)
HTTP: Connects to a web server. (HTTP Connection Manager)
MSMQ: Connects to a message queue. (MSMQ Connection Manager)
MSOLAP90: Connects to an instance of SQL Server 2005 Analysis Services (SSAS) or an Analysis Services project. (Analysis Services Connection Manager)
MULTIFILE: Connects to multiple files and folders. (Multiple Files Connection Manager)
MULTIFLATFILE: Connects to multiple data files and folders. (Multiple Flat Files Connection Manager)
OLEDB: Connects to a data source by using an OLE DB provider. (OLE DB Connection Manager)
ODBC: Connects to a data source by using ODBC. (ODBC Connection Manager)
SMOServer: Connects to a SQL Management Objects (SMO) server. (SMO Connection Manager)
SMTP: Connects to an SMTP mail server. (SMTP Connection Manager)
SQLMOBILE: Connects to a SQL Server Mobile database. (SQL Server Mobile Connection Manager)
WMI: Connects to a server and specifies the scope of Windows Management Instrumentation (WMI) management on the server. (WMI Connection Manager)
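
For example, the ConnectionString property of an OLE DB connection manager pointing at SQL Server might look roughly like the following; the server and database names are placeholders, and the exact provider name can vary:

Data Source=MyServer;Initial Catalog=MyDataWarehouse;Provider=SQLNCLI.1;Integrated Security=SSPI;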

9. Log Providers
Log providers are an important part of the IS logging infrastructure. Every IS object
has access to the logging infrastructure. Log providers are, yet again, pluggable
components that are responsible for taking log entries sent by components in the
package and writing them to the particular destination and medium they support. For
example, a stock XML log provider writes log entries to an XML file. Tasks and other
components are not aware of the destination to which they send their log entries. They
just send their logs to the IS runtime and the runtime routes the log to the configured
log providers. The log providers then take the log entries and convert them to the
correct format and save the log entries based on how they are configured. It's also
possible to log to more than one location at a time using two different log providers
simultaneously on the same package.
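
For instance, the stock SQL Server log provider writes its entries to a table (named sysdtslog90 in SQL Server 2005) in the database that its connection manager points to, so you can inspect a run with an ordinary query. A minimal sketch, assuming the default table name:

-- Illustrative query against the SQL Server log provider's table.
SELECT event, source, starttime, message
FROM dbo.sysdtslog90
WHERE event IN ('OnError', 'OnWarning')
ORDER BY starttime DESC;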

10. Precedence constraints

Precedence constraints link executables, containers, and tasks in packages into a control
flow, and specify conditions that determine whether executables run. An executable can
be a For Loop, Foreach Loop, or Sequence container; a task; or an event handler. Event
handlers also use precedence constraints to link their executables into a control flow.

A precedence constraint links two executables: the precedence executable and the
constrained executable. The precedence executable runs before the constrained
executable, and the execution result of the precedence executable may determine whether
the constrained executable runs. The following diagram shows two executables linked by
a precedence constraint.

You can configure precedence constraints in the following ways:

• Specify an evaluation operation. The precedence constraint uses a constraint
value, an expression, both, or either to determine whether the constrained
executable runs.
• If the precedence constraint uses an execution result, you can specify the
execution result to be success, failure, or completion.

• If the precedence constraint uses an evaluation result, you can provide an
expression that evaluates to a Boolean (an example expression appears after this list).

• Specify whether the precedence constraint is evaluated singly or together with
other constraints that apply to the constrained executable.
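
For example, a constraint configured with the Expression and Constraint evaluation operation might require both a Success execution result and an expression such as the following; the variable name is illustrative:

@[User::ExtractRowCount] > 0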

11. Containers

Containers are objects in SQL Server 2005 Integration Services (SSIS) that provide
structure to packages and services to tasks. They support repeating control flows in
packages, and they group tasks and containers into meaningful units of work. Containers
can include other containers in addition to tasks.

Packages use containers for the following purposes:

• Repeat tasks for each element in a collection, such as files in a folder, schemas, or
SQL Management Objects (SMO) objects. For example, a package can run
Transact-SQL statements that reside in multiple files.

• Repeat tasks until a specified expression evaluates to false. For example, a
package can send a different e-mail message seven times, one time for every day
of the week (a For Loop sketch appears after the container table below).

• Group tasks and containers that must succeed or fail as a unit. For example, a
package can group tasks that delete and add rows in a database table, and then
commit or roll back all the tasks when one fails.

Integration Services provides four types of containers for building packages. The
following table lists the container types.

Container: Description
Foreach Loop Container: Runs a control flow repeatedly by using an enumerator.
For Loop Container: Runs a control flow repeatedly by testing a condition.
Sequence Container: Groups tasks and containers into control flows that are subsets of the package control flow.
Task Host Container: Provides services to a single task.
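
As referenced in the repeat-until-false bullet above, the For Loop container's three expression properties drive the iteration. A minimal sketch for the seven-iteration example; the variable name is illustrative:

InitExpression:   @DayNumber = 1
EvalExpression:   @DayNumber <= 7
AssignExpression: @DayNumber = @DayNumber + 1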

12. Difference between Control Flow and data flow

Following are the main differences between data flow and control flow:
• Execution model
• Connecting lines
• Functional gamut

Execution Model: Data flow and workflow components execute differently. The
execution time of a given task is a fraction of the total execution time of the
package it runs within. Adding together the execution durations of the individual
tasks can closely approximate the total package
execution duration. Of course, this is complicated when there is more than one
logically parallel executing thread. Then, the execution duration of the package is the
execution duration of the longest logical thread duration. In addition, tasks on the
same logical thread don't run at the same time. Estimating the total execution
duration of a Data Flow Task is a little more complicated, but conceptually it is driven
by the amount of data running through the Data Flow Task and the types of
transforms in the Data Flow Task. The significant difference is that it's possible for all
adapters and transforms on one logical thread to be active at the same time.

Connecting lines: Tasks have precedence constraints. Transforms have paths. Those
connecting lines look a lot alike, but they're very different. Precedence constraints as
described previously do little but determine if the next task can execute. Paths
actually represent outputs of transforms and the metadata associated with those
outputs. Precedence constraints control task execution; paths show where the data
flows.

Functional gamut: The functional gamut of tasks is virtually unlimited. Tasks can be
written to do just about anything any piece of software can do. Witness the stock tasks
that ship with IS that vary from the Script Task, which, in itself, represents the key to
unlock the full power of .Net software development, to the Data Flow Task. The
Windows Management Instrumentation (WMI) and Web Services Tasks are other
good examples of the range of different tasks you can write that use various emerging
technologies. Transforms, on the other hand, are very specific to processing data and
are strictly bound to the architecture of the Data Flow Task. This isn't a
shortcoming of transforms, but merely points out that tasks and transforms are quite
different. The beauty of these differences is that both are supported within one
framework and both are equally extensible through pluggable architectures.

13. TaskHost Container

The Taskhost is a very dedicated sort of container. Taskhosts live and die with the
tasks they contain. Every instance of a task has a Taskhost surrounding it, servicing
it, protecting it, and controlling it. When the task is deleted, the Taskhost goes with
it. Yet, when viewed within the designer, the task is all you see, whereas the
Taskhost is mostly invisible.

14. Packages

A package is an organized collection of connections, control flow elements, data flow
elements, event handlers, variables, and configurations that you assemble using
either the graphical design tools that SQL Server 2005 Integration Services (SSIS)
provides, or build programmatically. You then save the completed package to SQL
Server 2005, the SSIS Package Store, or the file system. The package is the unit of
work that is retrieved, executed, and saved.

15. How do you do error handling in SSIS?


When a data flow component applies a transformation to column data, extracts data
from sources, or loads data into destinations, errors can occur. Errors frequently occur
because of unexpected data values.

For example, a data conversion fails because a column contains a string instead of a
number, an insertion into a database column fails because the data is a date and the
column has a numeric data type, or an expression fails to evaluate because a column
value is zero, resulting in a mathematical operation that is not valid.

Errors typically fall into one of the following categories:

-Data conversion errors, which occur if a conversion results in the loss of significant
digits, the loss of insignificant digits, or the truncation of strings. Data conversion
errors also occur if the requested conversion is not supported.
-Expression evaluation errors, which occur if expressions that are evaluated at run
time perform invalid operations or become syntactically incorrect because of missing
or incorrect data values.
-Lookup errors, which occur if a lookup operation fails to locate a match in the
lookup table.

Many data flow components support error outputs, which let you control how the
component handles row-level errors in both incoming and outgoing data. You specify
how the component behaves when truncation or an error occurs by setting options on
individual columns in the input or output.

For example, you can specify that the component should fail if customer name data is
truncated, but ignore errors on another column that contains less important data.

16. How do you do logging in SSIS?

SSIS includes logging features that write log entries when run-time events occur and
can also write custom messages.

Integration Services supports a diverse set of log providers, and gives you the ability
to create custom log providers. The Integration Services log providers can write log
entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML
files.

Logs are associated with packages and are configured at the package level. Each task
or container in a package can log information to any package log. The tasks and
containers in a package can be enabled for logging even if the package itself is not.
To customize the logging of an event or custom message, Integration Services
provides a schema of commonly logged information to include in log entries. The
Integration Services log schema defines the information that you can log. You can
select elements from the log schema for each log entry.

To enable logging in a package:
1. In Business Intelligence Development Studio, open the Integration Services project
that contains the package you want.
2. On the SSIS menu, click Logging.
3. Select a log provider in the Provider type list, and then click Add.

17. How do you deploy SSIS packages?


SQL Server 2005 Integration Services (SSIS) makes it simple to deploy packages to
any computer.
There are two steps in the package deployment process:
-The first step is to build the Integration Services project to create a package
deployment utility.
-The second step is to copy the deployment folder that was created when you built the
Integration Services project to the target computer, and then run the Package
Installation Wizard to install the packages.

18. What are checkpoints?


SQL Server 2005 Integration Services (SSIS) can restart packages that failed from the
point of failure, instead of rerunning the whole package. If a package is configured to
use checkpoints, information about package execution is written to a checkpoint file.
When the failed package is rerun, the checkpoint file is used to restart the package
from the point of failure. If the package runs successfully, the checkpoint file is
deleted, and then re-created the next time the package is run.

Using checkpoints in a package can provide the following benefits.

• Avoid repeating the downloading and uploading of large files. For example, a
package that downloads multiple large files by using an FTP task for each
download can be restarted after the downloading of a single file fails and then
download only that file.

• Avoid repeating the loading of large amounts of data. For example, a package
that performs bulk inserts into dimension tables in a data warehouse using a
different Bulk Insert task for each dimension can be restarted if the insertion
fails for one dimension table, and only that dimension will be reloaded.

• Avoid repeating the aggregation of values. For example, a package that
computes many aggregates, such as averages and sums, using a separate Data
Flow task to perform each aggregation, can be restarted after computing an
aggregation fails and only that aggregation will be recomputed.

A package can be restarted only at the control flow level. You cannot restart a
package in the middle of a data flow. To avoid rerunning the whole data flow, the
package might be designed to include multiple data flows, each one using a
different Data Flow task. This way the package can be restarted, rerunning only
one Data Flow task.
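
Checkpoint behavior is controlled by package-level properties; a minimal sketch of a typical combination (the file path is a placeholder):

SaveCheckpoints:    True
CheckpointFileName: C:\SSIS\Checkpoints\LoadDW.chk
CheckpointUsage:    IfExists

In addition, the tasks or containers that should act as restart points typically have FailPackageOnFailure set to True.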

19. What are breakpoints?

The Integration Services run-time engine allows packages, tasks, and other types
of containers to be suspended during execution by using breakpoints. The use of
breakpoints lets you review and correct errors that prevent your application or
tasks from running correctly. The breakpoint architecture enables the client to
evaluate the run-time value of objects in the package at defined points of
execution, and to decide if the task processing should suspend. Custom task
developers can utilize this architecture to create custom breakpoints by using the
IDTSBreakpointSite interface, and its parent class, the IDTSSuspend interface.
The IDTSBreakpointSite interface defines the interaction between the run-time
engine and the task for creating and managing custom breakpoints. The
IDTSSuspend interface provides methods and properties that are called by the
run-time engine to notify the task to suspend its execution.

20. What are Package Configurations?

SQL Server 2005 Integration Services (SSIS) provides package configurations for
updating the values of properties at run time. Using configurations you can create
packages, set properties on package objects during package development, and
then add the configuration to the package to update the properties at run time. For
example, you can update the values of variables or the connection strings of
connection managers.

Using package configurations can provide the following benefits:

• Configurations make it easier to move packages from a development
environment to a production environment. For example, a configuration
can update the connection string in the connection manager that the
package uses.

• Configurations are useful when you deploy packages to many different
servers. For example, a variable in the configuration for each deployed
package can contain a different disk space threshold, under which the
package does not run.

• Configurations make packages more flexible. For example, a
configuration can update the value of a variable that is used in a property
expression.

Package Configuration Types:

The following table describes the package configuration types.

Type: Description
XML configuration file: An XML file contains the configurations. The XML file can include multiple configurations.
Environment variable: An environment variable contains the configuration.
Registry entry: A registry entry contains the configuration.
Parent package variable: A variable in the package contains the configuration. This configuration type is typically used to update properties in called packages.
SQL Server table: A table in a SQL Server database contains the configuration. The table can include multiple configurations.
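
For the SQL Server table type, the Package Configuration Wizard generates a configuration table for you; the following is a minimal sketch of the shape that table usually takes (the default name is [SSIS Configurations], but yours may differ):

-- Illustrative shape of a SQL Server configuration table.
CREATE TABLE dbo.[SSIS Configurations]
(
    ConfigurationFilter  NVARCHAR(255) NOT NULL,  -- groups related settings
    ConfiguredValue      NVARCHAR(255) NULL,      -- the value applied at run time
    PackagePath          NVARCHAR(255) NOT NULL,  -- property path, e.g. \Package.Variables[User::X].Value
    ConfiguredValueType  NVARCHAR(20)  NOT NULL   -- data type of the configured value
);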

21. Direct and Indirect configurations

Integration Services provides direct and indirect configurations. If you specify
configurations directly, Integration Services creates a direct link between the
configuration item and the package object property. Direct configurations are a
better choice when the location of the source does not change. For example, if you
are sure that all deployments in the package use the same file path, you can
specify an XML configuration file.

Indirect configurations use environment variables. Instead of specifying the
configuration setting directly, the configuration points to an environment variable,
which in turn contains the configuration value. Using indirect configurations is a
better choice when the location of the configuration can change for each
deployment of a package.

22. What is a deployment utility?

Deployment is really about getting the destination machine to "look" like the
machine where the package was built. If the package references a custom task,
user account, configuration file, or share on the development machine, the custom
task, user account, configuration file, and share must be available on the
destination machine. The designer provides a way to bundle the project packages
and miscellaneous files into a folder that you can copy to another machine. The
Deployment Utility is a small executable that knows how to install the bundle.
The deployment utility is created on the computer on which the Integration
Services project is stored.

You create a package deployment utility for an Integration Services project by
first configuring the build process to create a deployment utility, and then building
the project. When you build the project, all packages and package configurations
in the project are automatically included. To deploy additional files such as a
Readme file with the project, place the files in the Miscellaneous folder of the
Integration Services project. When the project is built, these files are also
automatically included.
23. Importing and exporting packages.

Packages can be saved either in the sysdtspackages90 table in the SQL Server
msdb database or in the file system.

You can import and export packages between the following storage types:

• File system folders anywhere in the file system.

• Folders in the SSIS Package Store. The two default folders are named File
System and MSDB.

• The SQL Server msdb database.

Integration Services gives you the ability to import and export packages, and by
doing this change the storage format and location of packages. Using the import
and export features, you can add packages to the file system, package store, or
msdb database, and copy packages from one storage format to another. For
example, packages saved in msdb can be copied to the file system and vice versa.

24. Deployment Wizard or Package Installation Wizard usage

• Copies packages: It copies packages to SQL Server or the file system.
• Copies XML configurations: It copies the XML configuration files that the packages reference.
• Copies dependent files: It copies the files in the Miscellaneous Files node of the package project.
• Modifies configurations: It allows you to modify the configurations.

The Deployment Wizard does not do the following:

• Move custom components: It doesn't move custom components, such as custom
tasks or Data Flow transforms, upon which packages rely, to the destination machine.
• Recognize deltas: It doesn't recognize when significant changes happen in the
package or dependent files, or if they are unchanged. Nor does it modify what
it deploys based on such information. It simply copies all the files every time
you run it.
• Detect external dependencies: It doesn't recognize when a package depends on
a file that is not included in the Miscellaneous Files node.

25. What are the protection levels in an SSIS package?

The following table describes the protection levels that Integration Services
provides.

Protection level: Description

Do not save sensitive (DontSaveSensitive): Suppresses sensitive information in the package when it is saved. This protection level does not encrypt, but instead it prevents properties that are marked sensitive from being saved with the package and therefore makes the sensitive data unavailable to other users. If a different user opens the package, the sensitive information is replaced with blanks and the user must provide the sensitive information.

Encrypt all with password (EncryptAllWithPassword): Encrypts the whole package by using a password. The package is encrypted by using a password that the user supplies when the package is created or exported. To open the package in SSIS Designer or run the package by using the dtexec command prompt utility, the user must provide the package password. Without the password the user cannot access or run the package.

Encrypt all with user key (EncryptAllWithUserKey): Encrypts the whole package by using a key based on the user profile. Only the same user using the same profile can load the package. The package is encrypted by using a key that is based on the user who created or exported the package. Only the user who created or exported the package can open the package in SSIS Designer or run the package by using the dtexec command prompt utility.

Encrypt sensitive with password (EncryptSensitiveWithPassword): Encrypts only the sensitive information in the package by using a password. DPAPI is used for this encryption. Sensitive data is saved as a part of the package, but that data is encrypted by using a password that the current user supplies when the package is created or exported. To open the package in SSIS Designer, the user must provide the package password. If the password is not provided, the package opens without the sensitive data and the current user must provide new values for sensitive data. If the user tries to execute the package without providing the password, package execution fails. For more information about passwords and command line execution, see dtexec Utility.

Encrypt sensitive with user key (EncryptSensitiveWithUserKey): Encrypts only the sensitive information in the package by using keys based on the current user. Only the same user using the same profile can load the package. If a different user opens the package, the sensitive information is replaced with blanks and the current user must provide new values for the sensitive data. If the user attempts to execute the package, package execution fails. DPAPI is used for this encryption.

Rely on server storage for encryption (ServerStorage): Protects the whole package using SQL Server database roles. This option is supported only when a package is saved to the SQL Server msdb database. It is not supported when a package is saved to the file system from Business Intelligence Development Studio.
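
For example, to run a package saved with one of the password-based protection levels, the dtexec utility accepts the password through its /Decrypt option; a minimal sketch, where the file path and password are placeholders:

dtexec /FILE "C:\Packages\LoadDW.dtsx" /DECRYPT myPackagePassword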

26. What are the different types of Integration Services roles?

The following table describes the read and write actions of the fixed database-level
roles in Integration Services.

db_dtsadmin
Read actions: Enumerate own packages. Enumerate all packages. View own packages. View all packages. Execute own packages. Execute all packages. Export own packages. Export all packages. Execute all packages in SQL Server Agent.
Write actions: Import packages. Delete own packages. Delete all packages. Change own package roles. Change all package roles.

db_dtsltduser
Read actions: Enumerate own packages. Enumerate all packages. View own packages. Execute own packages. Export own packages.
Write actions: Import packages. Delete own packages. Change own package roles.

db_dtsoperator
Read actions: Enumerate all packages. View all packages. Execute all packages. Export all packages. Execute all packages in SQL Server Agent.
Write actions: None.

Windows administrators
Read actions: View all running packages.
Write actions: Stop all currently running packages.

Non-Windows administrators
Read actions: View any packages that they started.
Write actions: Stop any packages that they started.
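
Membership in these roles is granted in the msdb database like any other fixed database role; a minimal sketch, assuming a hypothetical user name:

-- Illustrative: add an msdb user to the limited Integration Services role.
USE msdb;
EXEC sp_addrolemember N'db_dtsltduser', N'DOMAIN\EtlDeveloper';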

27. The sysdtspackages90 table in msdb

The sysdtspackages90 table in msdb contains the packages that are saved to SQL
Server. The sysdtspackages90 table includes columns that contain information about
the roles that are assigned to packages.

• The readerrole column specifies the role that has read access to the package.

• The writerrole column specifies the role that has write access to the package.

• The ownersid column contains the unique security identifier of the user who
created the package. This column defines the owner of the package.
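
Because the table lives in msdb, you can list the packages stored in SQL Server with a simple query; a minimal sketch using columns that are safe to rely on:

-- Illustrative query listing packages saved to the msdb database.
SELECT name, description, createdate
FROM msdb.dbo.sysdtspackages90
ORDER BY createdate DESC;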

28. What is the Integration Services service?

Microsoft SQL Server 2005 Integration Services (SSIS) includes the Integration
Services service, a Windows service for managing Integration Services packages.
The Integration Services service is available only in SQL Server Management
Studio.

Running the Integration Services service provides the following management
capabilities:

• Starting remote and locally stored packages

• Stopping remote and locally running packages

• Monitoring remote and locally running packages

• Importing and exporting packages

• Managing package storage

• Customizing storage folders

• Stopping running packages when the service is stopped


• Viewing the Windows Event log

• Connecting to multiple Integration Services servers

The Integration Services service is not required if you only want to design and
execute Integration Services packages. However, the service is required to list and
monitor packages using SQL Server Management Studio. You manage the Integration
Services service using Services, a Microsoft Management Console (MMC) snap-in.

29. What is dtutil?

The dtutil command prompt utility is used to manage SQL Server 2005 Integration
Services (SSIS) packages. The utility can copy, move, delete, or verify the existence of a
package. These actions can be performed on any SSIS package that is stored in one of
three locations: a Microsoft SQL Server database, the SSIS Package Store, and the file
system. The package and its storage type are identified by the /SQL, /FILE, and /DTS
options.

Syntax
dtutil /option [value] [/option [value]]...

Parameters
Option: Description

/?
Displays the command prompt options.

/C[opy] {SQL | FILE | DTS};sourcePathandPackageName {SQL | FILE | DTS};destinationPathandPackageName
Copies an SSIS package:
• The sourcePathandPackageName argument specifies where the SSIS package is copied from.
• The destinationPathandPackageName argument specifies where the SSIS package is copied to. If the destination is SQL, the DestUser, DestPassword, and DestServer arguments must also be specified in the command.
When the Copy action encounters an existing package at the destination, dtutil prompts the user to confirm package deletion. The Y reply overwrites the package and the N reply ends the program. When the command includes the Quiet argument, no prompt appears and any existing package is overwritten.

/Dec[rypt] password
(Optional). Sets the decryption password used when loading a package with password encryption.

/Del[ete]
Deletes the package specified by the SQL, DTS or FILE option. If dtutil cannot delete the package, the program ends.

/DestP[assword] password
Specifies the password used with the SQL option to connect to a destination SQL Server instance using SQL Server Authentication. An error is generated if DESTPASSWORD is specified in a command line that does not include the DESTUSER option. Note: When possible, use Windows Authentication.

/DestS[erver] server_instance
Specifies the server name used with any action that causes a destination to be saved to SQL Server. It is used to identify a non-local or non-default server when saving an SSIS package. It is an error to specify DESTSERVER in a command line that does not have an action associated with SQL Server. Actions such as the SIGN SQL, COPY SQL, or MOVE SQL options would be appropriate commands to combine with this option. A SQL Server instance name can be specified by adding a backslash and the instance name to the server name.

/DestU[ser] username
Specifies the user name used with the SIGN SQL, COPY SQL, and MOVE SQL options to connect to a SQL Server instance that uses SQL Server Authentication. It is an error to specify DESTUSER in a command line that does not include the SIGN SQL, COPY SQL, or MOVE SQL option.
/DT[S] filespec
This option specifies that the SSIS package to be operated on is located in the SSIS Package Store. The filespec argument is a relative path, commencing at the root of the SSIS Package Store.
If the DT[S] option is specified on the same command line as any of the following options, a DTEXEC_DTEXECERROR is returned:
• FILE
• SQL
• SOURCEUSER
• SOURCEPASSWORD
• SOURCESQLSERVER

/En[crypt] {SQL | FILE};Path;ProtectionLevel[;password]
(Optional). Encrypts the loaded package with the specified protection level and password, and saves it to the location specified in Path. The ProtectionLevel determines whether a password is required.
• SQL - Path is the destination package name.
• FILE - Path is the fully-qualified path and file name for the package.
• DTS - This option is not supported currently.
ProtectionLevel options:
Level 0: Strips sensitive information.
Level 1: Sensitive information is encrypted using local user credentials.
Level 2: Sensitive information is encrypted using the required password.
Level 3: Package is encrypted using the required password.
Level 4: Package is encrypted using local user credentials.
Level 5: Package uses SQL Server storage encryption.
/E[xists]
(Optional). Used to determine whether a package exists. dtutil tries to locate the package specified by either the SQL, DTS or FILE options. If dtutil cannot locate the package specified, a DTEXEC_DTEXECERROR is returned.

/FC[reate] {SQL | DTS};ParentFolderPath;NewFolderName
(Optional). Creates a new folder that has the name specified by NewFolderName. The location of the new folder is indicated by the ParentFolderPath.

/FD[elete] {SQL | DTS};ParentFolderPath;FolderName
(Optional). Deletes the folder specified by the name in FolderName from SQL Server or SSIS. The location of the folder to delete is indicated by the ParentFolderPath.

/FDi[rectory] {SQL | DTS};FolderPath[;S]
(Optional). Lists the contents, both folders and packages, in a folder on SSIS or SQL Server. The optional FolderPath parameter specifies the folder that you want to view the content of. The optional S parameter specifies that you want to view a listing of the contents of the subfolders for the folder specified in FolderPath.

/FE[xists] {SQL | DTS};FolderPath
(Optional). Verifies whether the specified folder exists on SSIS or SQL Server. The FolderPath parameter is the path and name of the folder to verify the existence of.

/Fi[le] filespec
This option specifies that the SSIS package to be operated on is located in the file system. The filespec value can be provided as either a Universal Naming Convention (UNC) path or a local path.
If the File option is specified on the same command line as any of the following options, a DTEXEC_DTEXECERROR is returned:
• DTS
• SQL
• SOURCEUSER
• SOURCEPASSWORD
• SOURCESQLSERVER

/FR[ename] {SQL | DTS};ParentFolderPath;OldFolderName;NewFolderName
(Optional). Renames a folder on SSIS or SQL Server. The ParentFolderPath is the location of the folder to rename. The OldFolderName is the current name of the folder, and NewFolderName is the new name to give the folder.
/H[elp] option
Displays extensive Help text that shows the dtutil options and describes their use. The option argument is optional. If the argument is included, the Help text includes detailed information about the specified option. The following example displays help for all options:

dtutil /H

The following two examples show how to use the /H option to display extended help for a specific option, the /Q[uiet] option in this example:

dtutil /Help Quiet
dtutil /H Q

/I[DRegenerate]
Creates a new GUID for the package and updates the package ID property. When a package is copied, the package ID remains the same; therefore, the log files contain the same GUID for both packages. This action creates a new GUID for the newly-copied package to distinguish it from the original.

/M[ove] {SQL | FILE | DTS};path
Moves an SSIS package. This action uses two arguments, which are separated by a semicolon:
• The destination argument can specify SQL, FILE, or DTS. A SQL destination can include the DESTUSER, DESTPASSWORD, and DESTSERVER options.
• The path argument specifies the package location: SQL uses the package path and package name, FILE uses a UNC or local path, and DTS uses a location that is relative to the root of the SSIS Package Store. When the destination is FILE or DTS, the path argument does not include the file name. Instead, it uses the package name at the specified location as the file name.
When the MOVE action encounters an existing package at the destination, dtutil prompts you to confirm that you want to overwrite the package. The Y reply overwrites the package and the N reply ends the program. When the command includes the QUIET option, no prompt appears and any existing package is overwritten.

/Q[uiet]
Stops the confirmation prompts that can appear when a command including the COPY, MOVE, or SIGN option is executed. These prompts appear if a package with the same name as the specified package already exists at the destination computer or if the specified package is already signed.

/R[emark] text
Adds a comment to the command line. The comment argument is optional. If the comment text includes spaces, the text must be enclosed in quotation marks. You can include multiple REM options in a command line.
Signs an SSIS package. This action uses
three arguments, which are separated by
semicolons:

• The destination argument can specify


SQL, FILE, or DTS. A SQL
destination can include the
DESTUSER, DESTPASSWORD and
DESTSERVER options.

• The path argument specifies the


location of the package to take action
/Si[gn] {SQL | File | DTS}; path; hash
on.

• The hash argument specifies a


certificate identifier expressed as a
hexadecimal string of varying length.

When the SIGN action encounters a signed


package at the destination, dtutil prompts for
confirmation to re-sign the package. When
the command line includes the Quiet option,
no prompt appears and the package is re-
signed.
/SourceP[assword] password Specifies the password used with the SQL
and SOURCEUSER options to enable the
retrieval of an SSIS package that is stored in
a database on a SQL Server instance that
uses SQL Server Authentication. It is an
error to specify SOURCEPASSWORD in a
command line that does not include the
SOURCEUSER option.

Note:
When possible, use Windows
Authentication..
Specifies the server name used with the SQL
option to enable the retrieval of an SSIS
package that is stored in SQL Server. It is an
error to specify SOURCESQLSERVER in a
command line that does not include the
/SourceS[erver] server_instance SIGN SQL, COPY SQL, or MOVE SQL
option.

A SQL Server instance name can be


specified by adding a backslash and the
instance name to the server name.
Specifies the user name used with the
SOURCESQLSERVER option to enable the
retrieval of an SSIS package stored in SQL
Server using SQL Server Authentication. It is
an error to specify SOURCEUSER in a
command line that does not include the
/SourceU[ser] username SIGN SQL, COPY SQL, or MOVE SQL
option.

Note:
When possible, use Windows
Authentication..
/SQ[L] package_path Specifies the location of an SSIS package.
This option indicates that the package is
stored in the msdb database. The
package_path argument specifies the path
and name of the SSIS package. Folder names
are terminated with back slashes.

If the SQL option is specified on the same


command line as any of the following
options, a DTEXEC_DTEXECERROR is
returned:
• DTS

• FILE

The SQL option may be accompanied


by zero or one instance of the
following options:

• SOURCEUSER

• SOURCEPASSWORD

• SOURCESQLSERVER

If SOURCEUSERNAME is not included,


Windows Authentication is used to access
the package. SOURCEPASSWORD is
allowed only if SOURCEUSER is present. If
SOURCEPASSWORD is not included, a
blank password is used.

Important:
Do not use a blank password. Use a strong
password.
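
A few illustrative command lines that combine the options described above. The package, folder, and path names are placeholders, and the use of "\" as the root folder path with /FC is an assumption:

dtutil /SQL LoadCustomers /EXISTS
dtutil /FC SQL;\;MyETLFolder
dtutil /FDirectory SQL;MyETLFolder
dtutil /FILE C:\Packages\LoadCustomers.dtsx /MOVE SQL;LoadCustomers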

29. What is a SQL Server Agent job?

A job is a specified series of operations performed sequentially by SQL Server Agent. A job can perform a wide range of activities, including running Transact-SQL scripts, command-line applications, Microsoft ActiveX scripts, Integration Services packages, Analysis Services commands and queries, or Replication tasks. Jobs can run repetitive or scheduled tasks, and they can automatically notify users of job status by generating alerts, thereby greatly simplifying SQL Server administration.

To create a job, a user must be a member of one of the SQL Server Agent fixed database roles or of the sysadmin fixed server role. A job can be edited only by its owner or by members of the sysadmin role. (The SQL Server Agent fixed database roles are described in the following questions.) Jobs can be written to run on the local instance of SQL Server or on multiple instances across an enterprise. To run jobs on multiple servers, you must set up at least one master server and one or more target servers.

30. What are the SQL Server Agent fixed database roles?
SQL Server 2005 introduces the following msdb database fixed database roles, which
give administrators finer control over access to SQL Server Agent. The roles listed
from least to most privileged access are:

• SQLAgentUserRole

• SQLAgentReaderRole

• SQLAgentOperatorRole

When users who are not members of one of these roles are connected to SQL Server
in SQL Server Management Studio, the SQL Server Agent node in Object Explorer
is not visible. A user must be a member of one of these fixed database roles or a
member of the sysadmin fixed server role to use SQL Server Agent.

31. What are the permissions of the SQL Server Agent fixed database roles?

SQLAgentUserRole Permissions

SQLAgentUserRole is the least privileged of the SQL Server Agent fixed database roles.
It has permissions on only operators, local jobs, and job schedules. Members of
SQLAgentUserRole have permissions on only local jobs and job schedules that they
own. They cannot use multiserver jobs (master and target server jobs), and they cannot
change job ownership to gain access to jobs that they do not already own.
SQLAgentUserRole members can view a list of available proxies only in the Job Step
Properties dialog box of SQL Server Management Studio. Only the Jobs node in SQL
Server Management Studio Object Explorer is visible to members of
SQLAgentUserRole.

SQLAgentReaderRole Permissions

SQLAgentReaderRole includes all the SQLAgentUserRole permissions as well as permissions to view the list of available multiserver jobs, their properties, and their
history. Members of this role can also view the list of all available jobs and job schedules
and their properties, not just those jobs and job schedules that they own.
SQLAgentReaderRole members cannot change job ownership to gain access to jobs that
they do not already own. Only the Jobs node in SQL Server Management Studio Object
Explorer is visible to members of the SQLAgentReaderRole.

SQLAgentOperatorRole Permissions

SQLAgentOperatorRole is the most privileged of the SQL Server Agent fixed database
roles. It includes all the permissions of SQLAgentUserRole and
SQLAgentReaderRole. Members of this role can also view properties for operators and
proxies, and enumerate available proxies and alerts on the server.
SQLAgentOperatorRole members have additional permissions on local jobs and
schedules. They can execute, stop, or start all local jobs, and they can delete the job
history for any local job on the server. They can also enable or disable all local jobs and
schedules on the server. To enable or disable local jobs or schedules, members of this role
must use the stored procedures sp_update_job and sp_update_schedule. Only the
parameters that specify the job or schedule name or identifier and the @enabled
parameter can be specified by members of SQLAgentOperatorRole. If they specify any
other parameters, execution of these stored procedures fails. SQLAgentOperatorRole
members cannot change job ownership to gain access to jobs that they do not already
own.

The Jobs, Alerts, Operators, and Proxies nodes in SQL Server Management Studio
Object Explorer are visible to members of SQLAgentOperatorRole. Only the Error
Logs node is not visible to members of this role.

32. Programmatically using SSIS

1. Integration Services Architecture: Integration Services architecture separates data movement and transformation from package control flow and management. There are
two distinct "engines" that define this architecture and that can be extended and
automated when programming Integration Services. The run-time engine implements the
control flow and package management infrastructure that lets developers control the flow
of execution and set options for logging, breakpoints, event handlers, and variables. The
data flow engine is a specialized, high performance engine that is exclusively dedicated
to extracting, transforming, and loading data. When programming Integration Services,
you will be programming against these two engines.
2. Integration Services Run-time Engine: The Integration Services run-time engine
controls the management and execution of packages, by implementing the infrastructure
that enables execution order, logging, variables, and event handling. Programming the
Integration Services run-time engine lets developers automate the creation, maintenance,
and execution of packages; create custom extensions; and create custom stand-alone
applications that can be distributed independently from the Microsoft® SQL Server™
installation.
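
For example, a minimal C# sketch of a stand-alone application that uses the run-time engine to load and execute an existing package; the package path is a placeholder and error handling is omitted:

using System;
using Microsoft.SqlServer.Dts.Runtime;   // Microsoft.SqlServer.ManagedDTS.dll

class RunPackage
{
    static void Main()
    {
        // Load a package saved in the file system (hypothetical path).
        Application app = new Application();
        Package pkg = app.LoadPackage(@"C:\Packages\LoadCustomers.dtsx", null);

        // Execute the package and report the overall result.
        DTSExecResult result = pkg.Execute();
        Console.WriteLine("Package result: " + result);
    }
}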

3. Integration Services Data Flow Engine : The data flow engine supports the data flow
task, which is a specialized, high performance task dedicated to moving and transforming
data from disparate sources. Unlike other tasks, the data flow task contains additional
objects called pipeline components. These components are the core "moving parts" of the
task. They define the movement and transformation of data. Programming the data flow
engine lets developers automate the creation and configuration of the components in a
data flow task, and create custom components.
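
A minimal C# sketch of automating the data flow engine: it adds a data flow task to a new package and obtains the MainPipe interface through which pipeline components are created. The "STOCK:PipelineTask" creation name and the OLE DB source class ID are assumptions based on common usage and may differ in your installation:

using Microsoft.SqlServer.Dts.Runtime;            // run-time engine
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;   // data flow engine PIA

class BuildDataFlow
{
    static void Main()
    {
        Package pkg = new Package();

        // Add the data flow task and unwrap the TaskHost to reach the pipeline.
        TaskHost host = (TaskHost)pkg.Executables.Add("STOCK:PipelineTask");
        MainPipe pipeline = (MainPipe)host.InnerObject;

        // Add a pipeline component; the class ID below is assumed to be the
        // OLE DB source adapter registered on the machine.
        IDTSComponentMetaData90 source = pipeline.ComponentMetaDataCollection.New();
        source.ComponentClassID = "DTSAdapter.OleDbSource.1";
    }
}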

4. Commonly Used Assemblies: The assemblies that are frequently used when programming Integration Services using the .NET Framework are:

• Microsoft.SqlServer.ManagedDTS.dll: Contains the managed run-time engine.
• Microsoft.SqlServer.PipelineWrapper.dll: The primary interop assembly (PIA) for the data flow engine.
• Microsoft.SqlServer.RuntimeWrapper.dll: The PIA for the native run-time engine.
• Microsoft.SqlServer.PipelineHost.dll: Contains the PipelineComponent base class, which is used to create custom data flow components.

5. Four classes that are important to understand when developing custom tasks:

a. Task class - Defines an abstract class that implements the methods and properties
common to all tasks in Integration Services.

Namespace: Microsoft.SqlServer.Dts.Runtime
Assembly: Microsoft.SqlServer.ManagedDTS (in
microsoft.sqlserver.manageddts.dll)

b. TaskHost class- Provides a container that encapsulates a single task. In the SSIS
Designer, the TaskHost is not configured separately; instead, it is configured
when you set the properties of the task it encapsulates.

Namespace: Microsoft.SqlServer.Dts.Runtime
Assembly: Microsoft.SqlServer.ManagedDTS (in
microsoft.sqlserver.manageddts.dll)

c. DtsTaskAttribute attribute: Supplies design time information about a Task object.

Namespace: Microsoft.SqlServer.Dts.Runtime
Assembly: Microsoft.SqlServer.ManagedDTS (in

microsoft.sqlserver.manageddts.dll)

d. IDtsTaskUI interface: Defines the collection of methods and properties called by the SSIS Designer to initialize and display the user interface associated with a task.

Namespace: Microsoft.SqlServer.Dts.Runtime.Design
Assembly: Microsoft.SqlServer.Dts.Design (in microsoft.sqlserver.dts.design.dll)
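
Putting the first three of these together, a minimal C# sketch of a custom task; the task name and message are placeholders, and strong naming, installation to the GAC, and a custom UI via IDtsTaskUI are omitted:

using Microsoft.SqlServer.Dts.Runtime;   // Microsoft.SqlServer.ManagedDTS.dll

[DtsTask(DisplayName = "Hello Task", Description = "Illustrative custom task")]
public class HelloTask : Task
{
    public override DTSExecResult Validate(Connections connections,
        VariableDispenser variableDispenser, IDTSComponentEvents componentEvents,
        IDTSLogging log)
    {
        // Check the task configuration here; report problems through componentEvents.
        return DTSExecResult.Success;
    }

    public override DTSExecResult Execute(Connections connections,
        VariableDispenser variableDispenser, IDTSComponentEvents componentEvents,
        IDTSLogging log, object transaction)
    {
        // The real work of the task goes here; at run time it is hosted by a TaskHost.
        bool fireAgain = true;
        componentEvents.FireInformation(0, "Hello Task", "Hello from a custom task.",
            string.Empty, 0, ref fireAgain);
        return DTSExecResult.Success;
    }
}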
6. Objects where tasks can be added:

• Package

• Sequence

• ForLoop

• ForEachLoop

• DtsEventHandler
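
A short C# sketch showing tasks being added to two of these containers, a Package and a Sequence; the "STOCK:" creation names for the Execute SQL task and the Sequence container are assumptions based on common usage:

using Microsoft.SqlServer.Dts.Runtime;

class AddTasks
{
    static void Main()
    {
        Package pkg = new Package();

        // Add a task directly to the package's Executables collection.
        Executable directTask = pkg.Executables.Add("STOCK:SQLTask");

        // Add a Sequence container, then nest a task inside it.
        Sequence seq = (Sequence)pkg.Executables.Add("STOCK:Sequence");
        Executable nestedTask = seq.Executables.Add("STOCK:SQLTask");
    }
}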
