Integrate
Data acquisition from source systems and integration
Data transformation and synthesis
Analyze
Data enrichment, with business logic, hierarchical views
Data discovery via data mining
Report
Data presentation and distribution
Data access for the masses
Development Studio
Administration in SQL Server Management Studio
Extensibility through .NET code
What is it?
Microsoft's ETL solution, bundled with SQL Server
E: Extract
T: Transform
L: Load
Source -> Read -> SSIS -> Write -> Destination
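The Extract, Transform, Load flow above can be sketched in a few lines of Python. This is only an illustration of the pattern, not of SSIS itself: the sample CSV data, the `customers` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or a source system.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,3
3,carol,
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalise names, convert types, default a missing amount to 0."""
    return [
        (int(row["id"]), row["name"].strip().title(), float(row["amount"] or 0))
        for row in rows
    ]

def load(rows, conn):
    """Load: write the cleaned rows to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(text, conn):
    """Run the three stages in order and return what landed in the destination."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT id, name, amount FROM customers ORDER BY id").fetchall()

if __name__ == "__main__":
    print(run_etl(SOURCE_CSV, sqlite3.connect(":memory:")))
```

Keeping the three stages as separate functions mirrors the Source, SSIS, Destination split: each stage can be swapped or tested on its own.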
Why?
Data exists in many places and in many types. Data is more useful when consolidated.
How do we get data from multiple systems and locations to a central place?
How do we convert data?
How do we consolidate it?
How do we structure it so it matches our business domain?
How do we make sure every insert, delete and update does not rot the data?
How do we fix dirty data?
Data Latency
Tasks, Loop enumerators, Event handlers, Log Providers
Data Flow Sources, Destinations, Transformations, Connection Managers
SSIS - Architecture
Visual Studio / SQL BI Studio
SQL Management Studio
SSIS - Packages
Package (XML): Control Flow, Container, Task, Variables, Event Handlers, Connections, Configurations
Data Flow Task (within the Control Flow): Source, Path, Transform, Dest
SSIS - Architecture
SQL Management Studio: Package List, Monitor
DTExec
Package
IS Service
Package Store msdb
Cfg
New caching options
ADO.NET Source and destination components
Data profiling task and viewer
Wizard interface for defining source and destination
Scripts (for the Script Transform) are now done in Visual Studio and thus in .NET languages
New package format
Three new data formats for working with times
SLOW (5 times slower than a raw file!)
Memory: SSIS is an in-memory process
SELECT *: exceptionally bad in SSIS
Use many small packages
Comments!!!
Understand the components
Many do the same things in different ways, with different trade-offs
Lookup vs. Merge Join, or Execute SQL vs. Execute T-SQL
Understand which components run asynchronously and which run synchronously
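The Lookup vs. Merge Join trade-off can be illustrated outside SSIS with plain Python. This is a rough sketch, not the components' actual implementation: `lookup_join` stands in for a lookup that caches the whole reference set in memory, and `merge_join` stands in for a join that streams two inputs already sorted on the key. The function names, the sample rows, and the assumption that reference keys are unique are all made up for the example.

```python
def lookup_join(rows, reference, key):
    """Lookup style: cache the entire reference set in memory, then probe it per row.
    Input order does not matter, but memory use grows with the reference set."""
    cache = {ref[key]: ref for ref in reference}
    for row in rows:
        match = cache.get(row[key])
        if match is not None:
            yield {**match, **row}

def merge_join(left, right, key):
    """Merge style: both inputs must already be sorted by key (right keys assumed unique).
    Only the current row of each input is held in memory."""
    right_iter = iter(right)
    current = next(right_iter, None)
    for row in left:
        # Advance the right side until it catches up with the left key.
        while current is not None and current[key] < row[key]:
            current = next(right_iter, None)
        if current is None:
            break  # right side exhausted, no further matches possible
        if current[key] == row[key]:
            yield {**current, **row}

# Hypothetical sample data, both lists sorted by "cust".
ORDERS = [
    {"cust": 1, "amt": 5},
    {"cust": 2, "amt": 7},
    {"cust": 2, "amt": 1},
    {"cust": 4, "amt": 9},
]
CUSTOMERS = [
    {"cust": 1, "name": "A"},
    {"cust": 2, "name": "B"},
    {"cust": 3, "name": "C"},
]

if __name__ == "__main__":
    print(list(lookup_join(ORDERS, CUSTOMERS, "cust")))
    print(list(merge_join(ORDERS, CUSTOMERS, "cust")))
```

Both functions return the same joined rows here; the difference is what each one demands in exchange: memory for the cache in one case, sorted inputs in the other.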
Non-blocking (synchronous)
Logically works row by row
Row Count, Derived Column
Buffer is reused
Partially Blocking (asynchronous)
Works with groups of rows
Merge, Merge Join, Union All
Data copied to new buffers
Blocking (asynchronous)
Needs all input rows before producing any output rows
Aggregate, Sort, Pivot
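The difference between these categories can be made concrete with Python generators. This is a simplified analogy rather than SSIS code: generators stand in for the data flow, buffer reuse is not modelled, and the names `derived_column`, `merge_sorted`, `sort_rows`, and `rows_read_before_first_output` are assumptions invented for the sketch. The helper counts how many input rows a transform has to read before it can emit its first output row.

```python
import heapq

def derived_column(rows):
    """Row by row: each input row produces an output row immediately."""
    for row in rows:
        yield {**row, "total": row["qty"] * row["price"]}

def merge_sorted(left, right, key):
    """Partially blocking: needs only the next row from each sorted input to decide what to emit."""
    yield from heapq.merge(left, right, key=key)

def sort_rows(rows, key):
    """Blocking: must consume every input row before the first output row exists."""
    yield from sorted(rows, key=key)

def rows_read_before_first_output(transform, rows):
    """Count how many source rows were pulled by the time the first output row appears."""
    read = []

    def source():
        for row in rows:
            read.append(row)
            yield row

    next(transform(source()))
    return len(read)

# Hypothetical sample rows.
ROWS = [
    {"id": 3, "qty": 1, "price": 2.0},
    {"id": 1, "qty": 2, "price": 5.0},
    {"id": 2, "qty": 4, "price": 1.5},
]

def by_id(row):
    return row["id"]

streaming = rows_read_before_first_output(derived_column, ROWS)
blocking = rows_read_before_first_output(lambda rows: sort_rows(rows, by_id), ROWS)
merged = [r["id"] for r in merge_sorted([{"id": 1}, {"id": 4}], [{"id": 2}, {"id": 3}], by_id)]

if __name__ == "__main__":
    print(streaming, blocking, merged)
```

With these three sample rows, the row-by-row transform emits after reading a single row, while the sort has to read all three first, which is why blocking components dominate memory use in a data flow.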
Data flow diagram: Source, Transform, Multicast, Sort, Sort, Destination
File System
SQL Server
Package Store
Deployment Utility