
Module 4

Creating an ETL Solution with SSIS


Module Overview

Introduction to ETL with SSIS


Exploring Source Data
Implementing Data Flow
Lesson 1: Introduction to ETL with SSIS

Options for ETL


What Is SSIS?
SSIS Projects and Packages
The SSIS Design Environment
Upgrading from Previous Versions
Options for ETL

SQL Server Integration Services


The Import and Export Data Wizard
Transact-SQL
The bcp utility
Replication
What Is SSIS?

A platform for ETL operations
Installed as a feature of SQL Server
Control flow engine:
Runtime resources and operational support for data flow
Data flow engine:
Pipeline architecture for buffer-oriented rowset processing
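
The idea of buffer-oriented rowset processing can be illustrated outside SSIS. The Python sketch below is not SSIS code and says nothing about how the engine is implemented; it only shows rows moving through a source, a transformation, and a destination in fixed-size batches rather than one row at a time. The names read_in_buffers, transform, load, and BUFFER_ROWS are hypothetical.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Buffer = List[Row]

BUFFER_ROWS = 3  # hypothetical buffer size, chosen only for illustration


def read_in_buffers(rows: Iterable[Row], size: int = BUFFER_ROWS) -> Iterator[Buffer]:
    """Source: hand rows downstream in batches of at most `size` rows."""
    it = iter(rows)
    while True:
        buf = list(islice(it, size))
        if not buf:
            return
        yield buf


def transform(buffers: Iterable[Buffer], fn: Callable[[Row], Row]) -> Iterator[Buffer]:
    """Transformation: apply fn to every row of a buffer, then pass the buffer on."""
    for buf in buffers:
        yield [fn(row) for row in buf]


def load(buffers: Iterable[Buffer]) -> List[Row]:
    """Destination: collect every buffer into a single list."""
    dest: List[Row] = []
    for buf in buffers:
        dest.extend(buf)
    return dest


source = [{"id": i, "name": f"item{i}"} for i in range(1, 8)]
buffers = list(read_in_buffers(source))
result = load(
    transform(read_in_buffers(source), lambda r: {**r, "name": str(r["name"]).upper()})
)
```

Because each stage is a generator, a buffer is passed on as soon as it is filled, so the whole rowset never has to be held in memory at once.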
SSIS Projects and Packages

Package Deployment Model
SSIS packages are deployed and managed individually

Project Deployment Model
Multiple packages are deployed in a single project

[Diagram: a project with a project-level parameter, a project-level connection manager, and packages (each with a package-level parameter and a package connection manager) is deployed to the SSIS Catalog; in the package deployment model, each package is deployed on its own.]
The SSIS Design Environment
[Screenshot of the design environment, with these elements labeled:]
Control Flow design surface
Data Flow tab
Package-level Parameters tab
Event Handlers tab
Package Explorer
SSIS Toolbox pane
Solution Explorer
Connection Managers pane
Variables pane
Properties pane
Upgrading from Previous Versions

SQL Server 2000 DTS packages:
No direct upgrade path
Recreate, or migrate to SQL Server 2005/2008 and then upgrade to 2014
SQL Server 2005, 2008, or 2012 SSIS packages:
Run by using DTSEXEC
Migrate to the SQL Server 2014 format
Scripts:
Migrated VSA scripts are automatically updated to VSTA
Microsoft ActiveX scripts are no longer supported and must be replaced
Lesson 2: Exploring Source Data

Why Explore Source Data?


Examining Source Data
Demonstration: Exploring Source Data
Profiling Source Data
Demonstration: Using the Data Profiling Task
Why Explore Source Data?

Understand business data:


What business entities are represented
How to interpret values and codes
Relationships between business entities

Examine data for:


Column data types and lengths
Data volume and sparseness
Data quality issues
Examining Source Data

Extract a sample of data:


For example, use the Import and Export Data Wizard

Examine the data:


For example, in Microsoft Excel
Demonstration: Exploring Source Data

In this demonstration, you will see how to:


Extract Data with the Import and Export Data Wizard
Explore Data in Microsoft Excel
Profiling Source Data

Use the Data Profiling task in SSIS to report data statistics:
Candidate key
Column length distribution
Column null ratio
Column pattern
Column statistics
Column value distribution
Functional dependency
Value inclusion
View the profile in the Data Profile Viewer
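
To make a few of these profile types concrete, here is a minimal Python sketch of what a column null ratio, a column length distribution, a column value distribution, and a candidate key check measure. It is not the Data Profiling task and does not reproduce its output format; the sample rows, column names, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical sample extracted from a source table.
rows = [
    {"CustomerID": 1, "Email": "a@example.com", "Country": "US"},
    {"CustomerID": 2, "Email": None, "Country": "US"},
    {"CustomerID": 3, "Email": "c@example.com", "Country": "GB"},
    {"CustomerID": 4, "Email": "dd@example.com", "Country": None},
]


def null_ratio(rows, column):
    """Fraction of rows in which the column is null."""
    return sum(1 for r in rows if r[column] is None) / len(rows)


def length_distribution(rows, column):
    """How many non-null values have each string length."""
    return Counter(len(str(r[column])) for r in rows if r[column] is not None)


def value_distribution(rows, column):
    """How often each distinct non-null value occurs."""
    return Counter(r[column] for r in rows if r[column] is not None)


def is_candidate_key(rows, columns):
    """True if the column combination is non-null and unique in this sample."""
    keys = [tuple(r[c] for c in columns) for r in rows]
    return all(None not in k for k in keys) and len(set(keys)) == len(keys)


email_nulls = null_ratio(rows, "Email")
email_lengths = length_distribution(rows, "Email")
country_values = value_distribution(rows, "Country")
id_is_key = is_candidate_key(rows, ["CustomerID"])
country_is_key = is_candidate_key(rows, ["Country"])
```

A check like this only describes the sample it was given: a column that is unique in a small extract may still contain duplicates in the full table.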
Demonstration: Using the Data Profiling Task

In this demonstration, you will see how to:


Use the Data Profiling Task
View a Data Profiling Report
Lesson 3: Implementing Data Flow

Connection Managers
The Data Flow Task
Data Sources
Data Destinations
Data Transformations
Optimizing Data Flow Performance
Demonstration: Implementing a Data Flow
Connection Managers

A connection to a data source or destination:


Provider (for example, ADO.NET, OLE DB, or flat file)
Connection string
Credentials

Project or package level:


Project-level connection managers:
Can be shared across packages
Are listed in Solution Explorer and the Connection Managers pane for packages in which they are used
Package-level connection managers:
Can be shared across objects in the package
Are listed only in the Connection Managers pane for packages in which they are used
The Data Flow Task

The core control flow task in most SSIS packages
It encapsulates a data flow pipeline
You define the pipeline for the task on the Data Flow tab
Data Sources

The source of data for a data flow:


Connection manager
Table, view, or query (where supported)
Columns that are included
Many Sources Supported:
Database (ADO.NET, OLE DB, CDC Source)
File (Excel, Flat File, XML, Raw File)
Custom
Data Destinations

Endpoint for a data flow:


Connection manager
Table or view (where supported)
Column mapping

Multiple destination types:


Database (ADO.NET, OLE DB, SQL Server, SQL Server Compact)
File (Excel, Flat File, Raw File)
SQL Server Analysis Services (Data mining model training, dimension processing, partition processing)
Rowset (DataReader, Recordset)
Custom
Data Transformations

Row Transformations
Character Map, Copy Column, Data Conversion, Derived Column, Export Column, Import Column, OLE DB Command
Rowset Transformations
Aggregate, Sort, Percentage Sampling, Row Sampling, Pivot, Unpivot
Split and Join Transformations
Conditional Split, Multicast, Union All, Merge, Merge Join, Lookup, Cache, CDC Splitter
Auditing Transformations
Audit, Row Count
BI Transformations
Slowly Changing Dimension, Fuzzy Grouping, Fuzzy Lookup, Term Extraction, Term Lookup, Data Mining Query, DQS Cleansing
Custom Transformations
Script, Custom Component
Optimizing Data Flow Performance

Optimize queries:
Select only the rows and columns that you need

Avoid unnecessary sorting:


Use presorted data where possible
Set the IsSorted property where applicable

Configure Data Flow task properties:


Buffer size
Temporary storage location
Parallelism
Optimized mode
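
The first two points can be illustrated with a source query that returns only the needed columns and rows, already in sorted order. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely for illustration; it is not SSIS, and the table layout, column names, and cutoff date are hypothetical.

```python
import sqlite3

# In-memory database standing in for a source system (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderDetail ("
    "OrderID INTEGER, ProductID INTEGER, UnitPrice REAL, "
    "Quantity INTEGER, Notes TEXT, ModifiedDate TEXT)"
)
conn.executemany(
    "INSERT INTO SalesOrderDetail VALUES (?, ?, ?, ?, ?, ?)",
    [
        (3, 10, 5.0, 2, "x", "2014-01-03"),
        (1, 11, 2.5, 4, "y", "2014-01-01"),
        (2, 10, 5.0, 1, "z", "2013-12-31"),
    ],
)

# Name only the columns the data flow needs (no SELECT *), filter rows at
# the source, and let the database do the sorting.
query = """
    SELECT OrderID, ProductID, UnitPrice, Quantity
    FROM SalesOrderDetail
    WHERE ModifiedDate >= ?
    ORDER BY OrderID
"""
rows = conn.execute(query, ("2014-01-01",)).fetchall()
conn.close()
```

When a source query already returns sorted rows like this, the IsSorted property is how the data flow is told about it, so a separate sort step is not needed downstream.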
Demonstration: Implementing a Data Flow

In this demonstration, you will see how to:


Configure a Data Source
Use a Derived Column Transformation
Use a Lookup Transformation
Configure a Destination
Lab: Implementing Data Flow in an SSIS Package

Exercise 1: Exploring Source Data


Exercise 2: Transferring Data by Using a Data Flow Task
Exercise 3: Using Transformations in a Data Flow

Logon Information
Virtual machine: 20463C-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa$$w0rd

Estimated Time: 60 minutes


Lab Scenario

In this lab, you will focus on the extraction of customer and sales order data from the InternetSales database used by the company's e-commerce site, which you must load into the Staging database. The InternetSales database contains customer data (in a table named Customers) and sales order data (in tables named SalesOrderHeader and SalesOrderDetail). You will extract sales order data at the line item level of granularity. The total sales amount for each sales order line item is then calculated by multiplying the unit price of the product purchased by the quantity ordered. Additionally, the sales order data includes only the ID of the product purchased, so your data flow must look up the details of each product in a separate Products database.
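
The two transformations described in the scenario can be sketched in plain Python: a derived column (line total = unit price × quantity) followed by a lookup of product details by product ID. This is only an illustration of the logic, not the lab solution or SSIS code. The column names (ProductID, UnitPrice, Quantity, LineTotal, ProductName), the sample values, and the function name are hypothetical.

```python
from decimal import Decimal

# Hypothetical reference data standing in for the Products database.
products = {
    101: {"ProductName": "Road Bike"},
    102: {"ProductName": "Helmet"},
}

# Hypothetical sales order line items; only the product ID is present.
order_lines = [
    {"OrderID": 1, "ProductID": 101, "UnitPrice": Decimal("699.99"), "Quantity": 2},
    {"OrderID": 1, "ProductID": 102, "UnitPrice": Decimal("34.50"), "Quantity": 1},
    {"OrderID": 2, "ProductID": 999, "UnitPrice": Decimal("5.00"), "Quantity": 3},
]


def derive_and_lookup(lines, products):
    """Add LineTotal to each row, then split rows by whether the product is found."""
    matched, unmatched = [], []
    for line in lines:
        out = dict(line)
        # Derived column: unit price multiplied by quantity ordered.
        out["LineTotal"] = line["UnitPrice"] * line["Quantity"]
        # Lookup: fetch product details by ID from the reference data.
        product = products.get(line["ProductID"])
        if product is None:
            unmatched.append(out)
        else:
            out.update(product)
            matched.append(out)
    return matched, unmatched


matched, unmatched = derive_and_lookup(order_lines, products)
```

Keeping unmatched rows in a separate list, rather than dropping them, makes it possible to inspect product IDs that have no entry in the reference data.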
Module Review and Takeaways

Review Question(s)
