You are on page 1of 38

Hands-On Lab

Lab Manual
Creating a custom transformation
component in SQL Server Integration
Services

Page i
Information in this document is subject to change without notice. The example companies, organizations, products,
people, and events depicted herein are fictitious. No association with any real company, organization, product, person or
event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user.
Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a
retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or
otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarked, copyrights, or other intellectual property rights covering
subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the
furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual
property.

© 2005 Microsoft Corporation. All rights reserved.

Microsoft, MS-DOS, MS, Windows, Windows NT, MSDN, Active Directory, BizTalk, SQL Server, SharePoint, Outlook,
PowerPoint, FrontPage, Visual Basic, Visual C++, Visual J++, Visual InterDev, Visual SourceSafe, Visual C#, Visual J#,
and Visual Studio are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other
countries.

Other product and company names herein may be the trademarks of their respective owners.

Page ii
Contents

CONTENTS.............................................................................................................................................................III

LAB 1: CREATING A NO-OP COMPONENT..........................................................................................................1

LAB 2: CREATING A HASH TRANSFORMATION...............................................................................................18

CONCLUSION........................................................................................................................................................35

Page iii
Lab 1: Creating a no-op component
Instructor notes: This lab will lead attendees through creating the bare bones data flow component and
how they can debug it. The code will be written in C#. However, any language that can produce a .Net
assembly can be used: VB.Net, Delphi, etc.

Lab Objective
Estimated time to complete this lab: 40 minutes
The objective of this lab is to write the simplest transformation possible: a no-op transform that allows
you to see the data as it flows through the transform.

The exercises in this lab are:

• Writing the no-op data flow transformation component


• Building and registering the component
• Using the component in a data flow task
• Debugging the execution of the component

Page 1
Exercise 1 – Writing the no-op data flow transformation component
Instructor notes: Some students might have a question about why the namespaces of integration
services has DTS in the name. This is because the name change happened too late into the
development cycle for us to go back and update the object model.

In this exercise, you will create a new C# based class library which will be the implementation of the
data flow component.

Task 1 – Create a new C# Class Library project


Create a new Visual C# Class Library Project inside Visual Studio 2005 called ‘SeeBuffer’. Put the
directory in the location shown so that it can be cleaned up automatically after you’re done with this lab.

Task 2 – Specify the default component settings


1. Add reference to the ‘Microsoft.SqlServer.PipelineHost’ assembly which provides the base
implementation of a pipeline component.

2. Add reference to the ‘Microsoft.SqlServer.DTSPipelineWrap’ assembly which provides an


interop wrapper around the pipeline object model.

Page 2
3. Declare the use of the ‘Microsoft.SqlServer.Dts.Pipeline’ namespace, change the namespace of
your class to CustomComponents, and change the name of the class to SeeBufferComponent:
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Dts.Pipeline;

namespace CustomComponents
{
public class SeeBufferComponent
{
}
}

Derive the SeeBufferComponent class from the default base component implementation,
PipelineComponent:

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Dts.Pipeline;

namespace CustomComponents
{
public class SeeBufferComponent : PipelineComponent
{
}
}
Add the DtsPipelineComponent attribute to your component class. This will allow you to also set the
name of the component that will show up in the toolbox, as well as describe whether this is a Source,
Transformation or Destination component.

Page 3
namespace CustomComponents
{
[DtsPipelineComponent(DisplayName = "SeeBuffer",
ComponentType = ComponentType.Transform)]
public class SeeBufferComponent : PipelineComponent
{
}
}
Note: DtsPipelineComponent has other properties to specify description, icon,
editor, etc.

Override the base ProcessInput implementation. ProcessInput is the method where a buffer is made
available to the transform as it’s produced from upstream components.
public class SeeBufferComponent : PipelineComponent
{
public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
int numberOfRows = buffer.RowCount;
bool eof = buffer.EndOfRowset;
}
}
Note: The ‘numberOfRows’ and ‘eof’ variables are created for debugging only.

That’s it! You’ve now created the simplest transform. Lets use it in the designer.

Page 4
Exercise 2 – Building and registering the component

Task 1 – Signing and building the component


1. Edit the properties of the project, and switch to the ‘Signing’ tab:

2. Check the ‘Sign the assembly’ option and click ‘<New…>’ to create a new key file:

3. Select ‘SeeBuffer’ as the name of the key file. For this exercise, do not protect the key file with a
password:

Page 5
4. Build the project.

After building, you’ll have a ‘SeeBuffer.dll’ file in the bin\Debug directory of your project. If you used
the suggested folder for creating the project, the project is in the ‘C:\SSIS Hands-On-
Lab\Scratch\SeeBuffer\SeeBuffer\’ folder.

Task 2 – Registering the component.

1. Copy SeeBuffer.DLL that was created by the build process to %ProgramFiles%\Microsoft SQL
Server\90\DTS\PipelineComponents. This will enable the SSIS component enumeration to discover
the component you’ve just created.

2. Use GACUtil.EXE to install the assembly into the GAC using the following command:

gacutil /i "<location of the project>\bin\Debug\SeeBuffer.dll"

This command will enable the assembly to be used by the designer and during execution.

Note: Subsequent labs will invoke a batch file that registers the component after a
successful build.

Congratulations! You’ve now registered your first component. Let us use this component in a package
next.

Page 6
Exercise 3 – Using the component in a package
In this exercise, we’ll register the component in the designer toolbox and use it in a simple package.

Task 1 – Registering the component for use in the designer

1. Start the Business Intelligence Development Studio.


2. Create a new ‘Integration Services Project’ project, again under the ‘Scratch’ directory:

3. Right click on the toolbox and select ‘Choose Items…’

Page 7
At this point, the designer will enumerate all tasks, connections, and data flow components
registered for use on this machine.

4. In the ‘SSIS Data Flow Items’ tab, check the ‘SeeBuffer’ list item. This will add the ‘SeeBuffer’
transform to your toolbox.

Task 2 – Using the component in a data flow

1. Add a ‘Data Flow Task’ to your package from the toolbox by either double clicking the task on
the toolbox or dragging and dropping it on the Control Flow diagramming surface:

2. Double click the Data Flow Task to switch to its editor.

3. Add an ‘OLE DB Source’ adapter to the data flow task.

Page 8
4. Edit the OLE DB Source adapter, click ‘New’ to create a new connection for the package.

5. In the dialog, you’ll see a list of two or more most recently used connections that have been
defined for you already. Use ‘dbil405.AdventureWork’ as the source connection.

6. In the ‘Name of the table or the view’ drop down, select ‘[Person].[Address]’ as the source table.
For this exercise, it doesn’t matter what source table is selected.

7. Click ‘OK’ to commit your edits to the OLE DB Source component.

Page 9
8. Next, find the ‘SeeBuffer’ transform on the toolbox. Since it was declared to be a transformation
and not a source or a destination, it would show up under the ‘Data Flow Transformations’
category of the toolbox.

9. Drag and drop the ‘SeeBuffer’ transform on the data flow surface.

10. Click and drag the green arrow out of the OLE DB Source component and drop it on the
SeeBuffer component to configure a path of data flow between them.

11. Since the transform doesn’t do any operations, it is not necessary to edit it. You can, however,
edit the transform and notice that a default editor is provided for you automatically:

Page 10
Note: You, of course, have the ability to define your custom transformation that
hides most of the object model from your users.

12. Whenever you’re done with the SeeBuffer editor, click Cancel to get back to the data flow task
editor.

13. To make sure the transform run, add a Union All after the See Buffer so that your data flow
looks like this:

Page 11
Task 3 – Executing the test package

1. Right click on the path between the ‘SeeBuffer’ and the ‘UnionAll’ and select ‘Data Viewers’.

2. Click ‘Add’ to create a new Data Viewer.

3. Select the default ‘Grid’ viewer, and click ‘OK’.

4. Click OK to get out of the Path editor and back to the data flow task again. Your data flow will
now look like this:

5. To execute the package, hit ‘F5’. When Business Intelligence Development Studio switches to
debug mode, you will get an extra tool window which contains the data viewer. Rearrange that
as appropriate. Your environment will look like this:

Here, you can see the data coming from the Person.Address table passing through the SeeBuffer
component without any operations.

Page 12
6. Once done with the debugging, you can stop the debugger to stop execution.

7. Close the Business Intelligence Development Studio.

Page 13
Exercise 4 – Debugging the component code
To understand how to debug the component, you need to first understand how the designer executes
the package.
The Business Intelligence Development Studio (BIDS) is basically Visual Studio, or devenv.exe. When
the SSIS debug engine starts, devenv starts a process called DtsDebugHost.EXE in which to execute
the package. This isolates the designer (devenv) from side effects of execution – memory corruptions,
incorrect cleanup, side effects of debugging values, etc. In order to keep the debugger operational
during debugging, devenev.exe actually starts another dummy copy of DtsDebugHost.EXE and
connects to that which receiving events from the executing DtsDebugHost. This dummy process could
have been anything – including notepad – but we selected dtsdebughost.exe since we understand and
can easily control that process.
This means that if you want to debug the design time functionality of the component (none of which
you’ve actually created in your package as yet), you’ll have to attach your debugger to the devenv.exe
process that’s hosting the Integration Services project.
However, in order to debug execution time behavior, you’ll have to attach to DtsDebugHost.EXE if the
package is started from devenv. As outlined above, there will be two DtsDebugHost.EXE available,
we’ll have to pick the right one because the other one is boring. The easiest way to identify the right
DtsDebugHost.eXE would be to check what types of debug engines have been loaded in the process.
You’re looking for the one that has Managed and native debug engines, process with PID 3216 below.

Another way to execute the process is by using dtexec.exe. This command line utility executes the
packages outside the designer. Since debugging execution time using dtexec is more straightforward,
that’s what we’ll use for this lab.

Page 14
Task 1 – Debugging the component through DTExec

1. Open the existing project, called ‘SeeBuffer’ from ‘C:\SSIS Hands-On-Lab\Scratch\Lab1


Completed\SeeBuffer\SeeBuffer.csproj’

2. Open the ‘SeeBuffer.cs’ file in the project and review the code – it should look almost identical
to what you created in the previous exercises.

3. Edit the Properties of this project, and switch to the ‘Build Events’ tab to review its settings:

The ‘Build Events’ tab specifies that a batch file will be run to register the components so that
we don’t have to worry about that. However, having done this in the previous exercises, you’re
now aware of what the batch file contains and why it is necessary. Review the batch file if you
wish, it is in the same directory as the project.

4. Switch to the ‘Debug’ tab and review and settings there which instruct Visual Studio to start
dtexec.exe with a pre defined package which includes the project to start debugging:

Page 15
5. Lets get debugging. Put a breakpoint on the first line of the ProcessInput implementation:

6. Start execution.

7. Dtexec.exe will popup, write a lot of validation messages, and then cause the debugger to stop
at the breakpoint.

8. Inspect the various buffer properties. It seems this buffer has 16 rows.

9. Hit F5 again to complete processing this buffer. The debugger will then stop again, this time with
a buffer.RowCount of 0 and buffer.EndOfRowset of True. This is the indication to your
transformation that there is no more data coming from upstream components.

Page 16
Note: if you’re writing a transform such as a Sort or and Aggregate, you need to
know when you have encountered all of the data… EndOfRowset is exactly the
property to use for that.

10. Close all instances of Business Intelligence Development Studio and Visual Studio.

Congratulations, you have built your first transform, registered it in the designer, used it in a data flow,
reviewed the default editor your transform gets automatically, watched the debugging experience in the
designer, and finally have debugged through the component at execution time!

Page 17
Lab 2: Creating a Hash Transformation

Lab Objective
Estimated time to complete this lab: 50 minutes
The objective of this lab is to write a more functional transform that references data available from an
upstream component and produces new data on the output. The transform will also perform some
validation so that you can get an understanding of what’s involved there.

In order to demonstrate this functionality, the transform will implement the ability to create a hash value
of a Unicode string.

The exercises of this lab are:

• Initializing the component


• Providing design time validation
• Caching column indices in PreExecute
• Producing a hash result on the output column
• Testing the component

Page 18
Exercise 1 – Initializing the component’s inputs and outputs
The component object model provides a method, ProvideComponentProperties, for allowing
component authors to initialize the component when it’s first added to the data flow. This function is
useful for components to set up the initial number of inputs/outputs. For example, the Merge Join
transform uses this to create two inputs and one output. This method can also be called to reset a
component back to its initial state, which is useful if users edit the persisted XML (which itself is not
supported) and get it wrong.

Task 1 – Initializing the component assembly environment

These steps are similar to what we did in the first exercise of the first lab. Instead of repeating the steps
of Exercise #1, a starting point is available to you.

1. Open the CreateHashTransform.csproj project in “C:\SSIS Hands-On-


Lab\Scratch\CreateHashTransform”.

2. First review the list of references. You’ll see a new reference there, to
Microsoft.SqlServer.ManagedDTS, the managed runtime object model; and to
Microsoft.SqlServer.DTSRuntimeWrap, the interop for the native runtime object model.

3. In addition, you’ll also notice the following declarations of using new namespaces in
CreateHashTransform.cs:

using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Security.Cryptography;

Both of these will be used by the component code.

4. The use of DtsPipelineComponent attribute and deriving from PipelineComponent was outlined
in Exercise 1 and is already done for you.

5. In addition, a successful build automatically registers the component for you.

Task 2 – Initialize the component

Finally, we can work on the component code itself. In this task, we will implement the code called when
this component is first added to the data flow, or when the component is asked to reset itself.

1. Override the ‘ProvideComponentProperties’ method. Comments are added for clarification and
do not, of course, need to be typed in verbatim.

public class CreateHashTransform : PipelineComponent


{
/// <summary>
/// Called to initialize/reset the component
/// </summary>

Page 19
public override void ProvideComponentProperties()
{
}
}

2. Call the base implementation:

/// <summary>
/// Called to initialize/reset the component
/// </summary>
public override void ProvideComponentProperties()
{
base.ProvideComponentProperties();
}

3. Now, since this method can also be called to reset the component, delete any existing metadata
(inputs, outputs, input columns, etc.) from it by calling another method in the base class:

/// <summary>
/// Called to initialize/reset the component
/// </summary>
public override void ProvideComponentProperties()
{
base.ProvideComponentProperties();

// remove any existing metadata


base.RemoveAllInputsOutputsAndCustomProperties();

4. Next, create the input and output. This transform will have only one input and one output. Note
that this is where you can name the input and output appropriately.

/// <summary>
/// Called to initialize/reset the component
/// </summary>
public override void ProvideComponentProperties()
{
base.ProvideComponentProperties();

// remove any existing metadata


base.RemoveAllInputsOutputsAndCustomProperties();

// create the input and the output


IDTSInput90 input = this.ComponentMetaData.InputCollection.New();
input.Name = "Input";

IDTSOutput90 output =
this.ComponentMetaData.OutputCollection.New();
output.Name = "Output";

5. Next, we’ll tie the output to the input. In other words the buffer that comes in to the input will be
the same one that goes to the output, we’ll just add our data to a separate column in the buffer:

Page 20
// create the input and the output
IDTSInput90 input = this.ComponentMetaData.InputCollection.New();
input.Name = "Input";

IDTSOutput90 output =
this.ComponentMetaData.OutputCollection.New();
output.Name = "Output";

// the output is synchronous with the input


output.SynchronousInputID = input.ID;

6. Finally, lets create the output column that will contain the result of the transform, the hash value:
// the output is synchronous with the input
output.SynchronousInputID = input.ID;

// create our hash value output column


IDTSOutputColumn90 outputColumn =
output.OutputColumnCollection.New();
outputColumn.Name = "Hash value";

outputColumn.SetDataTypeProperties(Microsoft.SqlServer.Dts.Runtime.Wrapper.DataType
.DT_WSTR, 24, 0, 0, 0);

The highlighted lines of code create a new output column with the name “Hash Value”, and a data
type of DT_WSTR (Unicode string) of size 24 characters.

That’s it. We’ve now created the metadata of the component.

Page 21
Exercise 2 – Providing design time validation
As you can imagine, components can provide fairly unique functionality. In order to help guide the user
along during the design time and also to highlight unintentional errors during both design time and
during execution, the pipeline object model provides the ability for components to validate their state
and report this status to the designer or any other pipeline object model client.

In this exercise, we will add rudimentary validation to the component. Note that the Validate method
that is discussed below is called during both design time and during execution.

Task 1 – Validate the component


1. Override the ‘Validate’ method that’s provided for you by the base class.

/// <summary>
/// only one column can be supported at this time
/// </summary>
/// <returns></returns>
public override DTSValidationStatus Validate()
{
return base.Validate();
}

You’ll notice that the DTSValidationStatus is an enumeration that has the following potential values:

DTSValidationStatus Description
VS_ISVALID The component is correctly configured and ready for
execution.
VS_ISBROKEN The component is incorrectly configured; typically,
this indicates that a property is set incorrectly.
VS_NEEDSNEWMETADATA The component's metadata is outdated or corrupt,
and a call to component’s ReinitializeMetaData()
will repair the component.
VS_ISCORRUPT The component is irreparably damaged and must be
completely reset. The designer calls the
component's ProvideComponentProperties().

2. If the base implementation of Validate says the component is valid, we’ll perform our own
validation. The simplest validation for us is to make sure we have only one input, one output,
and the single output has only one column. Otherwise, we report an error using the ‘FireError’
method. The only way we could get into this state is if the user modified the XML. We’ll
therefore use the VS_ISCORRUPT validation status flag notifying the caller of this.

DTSValidationStatus status = base.Validate();

if (status == DTSValidationStatus.VS_ISVALID)
{
// input and output are created by us, they should be exactly
what we wanted unless someone

Page 22
// played around with the XML.
if ((ComponentMetaData.InputCollection.Count != 1)
|| (ComponentMetaData.OutputCollection.Count != 1)
||
(ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count != 1))
{
bool cancel = false;
this.ComponentMetaData.FireError(0, ComponentMetaData.Name,
"Invalid metadata", string.Empty, 0, out cancel);
status = DTSValidationStatus.VS_ISCORRUPT;
}
}

return status;
}

3. In this version of the transform, only a single input column can be selected by the user.

// input and output are created by us, they should be exactly what we wanted
// unless someone played around with the XML.
if ((ComponentMetaData.InputCollection.Count != 1)
|| (ComponentMetaData.OutputCollection.Count != 1)
||
(ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count != 1))
{
bool cancel = false;
ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid
metadata", string.Empty, 0, out cancel);
status = DTSValidationStatus.VS_ISCORRUPT;
}
else
{
// only one column can be selected
if (ComponentMetaData.InputCollection[0].InputColumnCollection.Count !=
1)
{
bool cancel = false;
ComponentMetaData.FireError(0, ComponentMetaData.Name, "Only one
input column is currently supported", string.Empty, 0, out cancel);
return DTSValidationStatus.VS_ISBROKEN;
}
}

return status;

4. Finally, the data type of the input column must say it is a Unicode string since that’s all we
currently support:

// only one column can be selected


if (ComponentMetaData.InputCollection[0].InputColumnCollection.Count != 1)
{
bool cancel = false;
ComponentMetaData.FireError(0, ComponentMetaData.Name, "Only one input
column is currently supported", string.Empty, 0, out cancel);
return DTSValidationStatus.VS_ISBROKEN;
}

Page 23
else
{
// the data type of the input column must be DT_WSTR because that's all
we support right now
if
(ComponentMetaData.InputCollection[0].InputColumnCollection[0].DataType !=
DataType.DT_WSTR)
{
bool cancel = false;
ComponentMetaData.FireError(0, ComponentMetaData.Name, "Input
column data type must be DT_WSTR", string.Empty, 0, out cancel);
return DTSValidationStatus.VS_ISBROKEN;
}
}

That’s it for some simple validation. Lets test this design time behavior out in the designer.

Task 2 – Testing design time validation


1. Build the component which will register it as well.

2. Start the Business Intelligence Development Studio and add the ‘CreateHash’ transform to the
toolbox. The steps to follow here are similar to what we went through to get ‘SeeBuffer’
registered. Of course the transform to add to the toolbox is now called ‘CreateHash’ instead of
‘SeeBuffer’.

3. Create a new package that contains an OLE DB source to any table that contains a unicode
string column. While it is unlikely users will want to create a hash value on a string that denotes
the address of a person, that’s what we’ll use for this experiment. Such data exists in the
[Person].[Address] table of AdventureWorks:

4. Now add a new ‘Create Hash transform’. Notice the red X icon signifying a validation error. Click
on the transform to get the validation error in a tooltip, or use the Errors List window. In this
case, we’re getting a validation error saying only one column can be selected in the input:

Page 24
5. Edit the Create Hash Transform, switch to the Input Columns tab, and select a column which is
not a Unicode string. In this example, we’ve selected ‘rowguid’:

6. Click OK and see the new validation error complaining about the data type of the selected
column:

Page 25
7. Finally, edit the transform again, unselect ‘rowguid’, and select the ‘AddressLine1’ column:

8. Click ‘OK’ and notice that the ‘Create Hash Transform’ now no longer has the red X icon
identifying the error. It is now valid!

Page 26
9. Close the Business Intelligence Development Studio whenever you’re done with this exercise.

Page 27
Exercise 3 – Caching column indices in PreExecute
Now we’re ready to work on the execution time behavior of this component. As we saw in Lab1,
ProcessInput is called once per buffer. PreExecute is a method called before ProcessInput to allow a
component to cache whatever information it needs to cache and get ready for a call to ProcessInput (or
other per-buffer calls depending on the type of component).

If you’ve written code to access rowset fields by names in the past, you’ve probably optimized the loop
over all the rows by first finding what column index contains a column for a specific name. Since this is
done outside the loop, and the loop indexes into the fields collection by an integer rather than the
column name, execution time improves.

We’ll follow a similar pattern here and get the column index of both the input column and the output
column so that the per-buffer call is faster.

Task 1 – Populate the index of input and output columns in the buffer

1. Create a couple of variables at the class level that will hold this information:

public class CreateHashTransform : PipelineComponent


{
/// <summary>
/// buffer indices for processing
/// </summary>
private int inputColumnBufferIndex = -1;
private int outputColumnBufferIndex = -1;

2. Now, override the PreExecute method:

/// <summary>
/// called before executing, remembers the buffer column index
/// </summary>
public override void PreExecute()
{
base.PreExecute();
}

3. Lets get the index of the input column:

/// <summary>
/// called before executing, remembers the buffer column index
/// </summary>
public override void PreExecute()
{
base.PreExecute();

// get the location of the input column in the buffer


IDTSInput90 input = ComponentMetaData.InputCollection[0];
inputColumnBufferIndex =
BufferManager.FindColumnByLineageID(input.Buffer,
input.InputColumnCollection[0].LineageID);
}

Page 28
Several concepts need to be highlighted here:
a. The BufferManager manages the layout of buffer in the pipeline engine and is the class
which can tell us where a column exists in a buffer.

b. As you know, several components can reference the same column in a data flow. For
example, we’re using the AddressLine1 column to create a hash. A subsequent destination
might use the AddressLine1 to map to a destination column; and another transform in the
middle might be responsible for parsing the contents of the AddressLine1 column. However,
all of them are using the same column! The pipeline keeps track of this fact by assigning a
unique LineageID to the AddressLine1 column when it’s introduced by the OLE DB Source
into the data flow. From that point onwards all input columns that reference this column refer
to it by its LineageID.

c. Finally, during runtime, the Buffer is available as a property of the Input.

4. Now we do the same for the output column. Note that since the output of this transform is
synchronous with the input (no new buffer is created), the output column will be in the input’s
buffer.

/// <summary>
/// called before executing, remembers the buffer column index
/// </summary>
public override void PreExecute()
{
base.PreExecute();

// get the location of the input column in the buffer


IDTSInput90 input = ComponentMetaData.InputCollection[0];
inputColumnBufferIndex =
BufferManager.FindColumnByLineageID(input.Buffer,
input.InputColumnCollection[0].LineageID);

// get the location of the output column in the buffer


IDTSOutput90 output = ComponentMetaData.OutputCollection[0];
outputColumnBufferIndex =
BufferManager.FindColumnByLineageID(input.Buffer,
output.OutputColumnCollection[0].LineageID);
}

Task 2 – Caching the MD5 crypto service provider


We’ll be using this class for performing the hash function. So, add a variable at the class and initialize it
in PreExecute:

1. Add the variable to the class:

public class CreateHashTransform : PipelineComponent


{
/// <summary>
/// our hash code provider
/// </summary>
private MD5 md5 = null;

Page 29
2. Add the code to initialize this variable in PreExecute:

public override void PreExecute()


{
[...]

// initialize our hash provider


md5 = new MD5CryptoServiceProvider();
}

3. Finally, lets clear it out as part of a new PostExecute:


public override void PostExecute()
{
base.PostExecute();

md5 = null;
}

Page 30
Exercise 4 – Producing a hash result on the output column
Finally, we can now create the hash value itself.

Task 1 – Create the ProcessInput method


1. Override the ProcessInput method from the base:
/// <summary>
/// called to do the actual processing, computes the has value and saves it
/// in the output column
/// </summary>
/// <param name="inputID"></param>
/// <param name="buffer"></param>
public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
}

2. Only do work if this buffer has some data in it:

public override void ProcessInput(int inputID, PipelineBuffer buffer)


{
if (buffer.EndOfRowset == false)
{
}
}

3. Some basic error traps to report any errors encountered during execution:

public override void ProcessInput(int inputID, PipelineBuffer buffer)


{
if (buffer.EndOfRowset == false)
{
try
{
}
catch (System.Exception ex)
{
bool cancel = false;
ComponentMetaData.FireError(0, ComponentMetaData.Name,
ex.Message, string.Empty, 0, out cancel);
throw new Exception("Could not process input buffer.");
}
}
}

4. Now, loop over all the rows in the buffer provided:


try
{
while (buffer.NextRow())
{
}
}
5. If the input column is null, the hash value will also be null.

Page 31
while (buffer.NextRow())
{
// hash value for NULLs is NULL.
if (buffer.IsNull(inputColumnBufferIndex))
{
buffer.SetNull(outputColumnBufferIndex);
}
}

As you can see, the buffer provides methods for checking for NULL values and for setting NULL
values for output columns.

6. If the input value is not null, we calculate a hash using the MD5 crypto service provider
// hash value for NULLs is NULL.
if (buffer.IsNull(inputColumnBufferIndex))
{
buffer.SetNull(outputColumnBufferIndex);
}
else
{
// get the raw contents of the buffer
string inputValue = buffer.GetString(inputColumnBufferIndex);

byte[] columnContents =
UnicodeEncoding.Unicode.GetBytes(inputValue);

// now compute the hash


byte[] hash = md5.ComputeHash(columnContents);

// convert to strings to save in the output


string hashString = Convert.ToBase64String(hash,
Base64FormattingOptions.None); // include newlines

// set the output column value


buffer.SetString(outputColumnBufferIndex, hashString);
}

That’s it!

Page 32
Exercise 5 - Testing the component
Your project is ready to execute with a sample package that looks like this:

1. Put a breakpoint at the beginning of PreExecute, and another at the beginning of ProcessInput.

2. Press F5 to debug the Create Hash Transform. You’ll see dtexec.exe pop up, and then the
PreExecute breakpoint will be hit:

Page 33
3. Hit F5 again and the breakpoint in ProcessInput will be hit. Step through the function to look at the
code execution.

You can also open the package up in the designer, add a visualizer, and see the results of the
execution. Here’s the configuration of the visualizer to show only the ‘AddressLine1’ and ‘Hash value’
columns:

And, finally, here’s the result of the visual debugging of the data flow itself:

Page 34
Conclusion
At this point, you are well versed in:
• Building and registering transformation components for use in the designer.
• Writing transforms with synchronous outputs
• Providing users validation messages during both design time and execution time
• Referencing data available on input columns
• Writing the results of transformation to the pipeline buffers
• Debugging both the design time and execution time behavior of the component

To build on this momentum, we suggest the following potential next steps:


• Go through the samples that are installed with SQL Server 2005, which includes not just
transformations with synchronous outputs but also Sources and transformations with
asynchronous outputs. In addition, source code is available in both C# and VB.
Visit http://www.sqlis.com for excellent materials presented by our MVPs and the community on how to
build component and the component UIs, in addition to several more samples.
Interact with the product team and the community on the SSIS MSDN forum at
http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=80
Visit the SSIS MSDN development center for more information on webcasts, training, bloggers, books,
etc. at http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx

Finally, we’d like to thank you for your enthusiasm of our product and your support! Do let us know how
this material could be improved for future consumptions by you peers or what other information would
be useful to you.

Sincerely,

SQL Server Integration Services Product Team

Page 35