
SAS Professionals Convention

14-16 July 2009

Data Integration Best Practices


(Healthy Habits for SAS Data Integration Studio Users)

www.definitivequality.com

Copyright 2009 Definitive Quality Solutions Limited. Registered in England No.: 05141146.

Abstract: Version 9 of the SAS System offers tools to help developers and business users manage and organise the wealth of data and processes that face SAS professionals today. SAS Data Integration Studio has many features that support healthy habits for data integration, but they can only 'be of use' if they are 'being used'. DI Studio supports customisation of the custom tree, error monitoring, job status handling, data validation, conformed data models, self-documentation, and role assignment. Identifying the benefits of these functions is often enough to motivate users to adopt controlled and organised methods of working. This paper describes examples of best practice for developing data integration suites, to ensure that quality, efficiency and resilience are built into the heart of your enterprise's information estate.



Subjects:
  Data Integration Structure
  Data Integration Organisation
  Capture Control (CCT Tables)
  Error Monitoring
  Data Validation
  Data Protection (Scrambler)
  Conformed Modelling
  SQL Optimisation
  Self Documentation
  Role Assignment
  Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3



Data Integration Structure
Challenge: How can you best deliver Business Intelligence from a variety of source systems across a diverse consumer base?
Solution: Employ a Data Integration flow structure:
  Source Systems -> Detailed Data Model -> Subject Specific Data Marts -> Subject Specific Business Intelligence



Data Integration Organisation
Challenge: How can you keep track of the thousands of jobs typically created in a data integration suite?
Solution: Utilise the custom tree in SAS Data Integration Studio.

Create folders for each integration layer.
Subdivide them by:
  Jobs
  Libraries
  Tables
Number the folders to preserve order.
Stick to the methodology (e.g. don't transform in the capture layer).



Capture Control
Challenge: How can I perform incremental extracts from several source systems?
Solution: Define Capture Control Tables (CCT) for each source table. Each CCT holds:
  Status (Started, Failed, or Success): ensures smooth running of the DI suite.
  From/To Datetimes: used to extract against the last-updated column in the database. Also useful for tracking processing times as data volumes increase day by day.
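A capture control table along these lines could be created as shown here. This is a minimal sketch only: the table name customer_cct, the column names, the seed row and the use of the WORK library are assumptions for illustration, not taken from the slides.

```sas
/* Hypothetical CCT for a source table named CUSTOMER.            */
/* Table, column and library names are illustrative assumptions.  */
proc sql;
   create table work.customer_cct
      (status    char(8) label='Started, Failed, or Success',
       from_dttm num format=datetime20. label='Start of extract window',
       to_dttm   num format=datetime20. label='End of extract window');

   /* Seed one row so that the first run has a window to work from. */
   insert into work.customer_cct
      values ('Success', '01JAN2009:00:00:00'dt, '01JAN2009:00:00:00'dt);
quit;
```

In a real suite the table would probably live in a permanent library so that the status and datetimes survive between runs.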



Send the job status to a dataset with the same name as the job.



Only extract records that have been updated since the last run.
[Diagram: Source Systems; Capture Job with Pre- and Post-processing; CoreInfo Tables; Conformed Model]
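The incremental extract itself might look something like the following. This is a sketch under stated assumptions: the source table customer_src, its last_updated column, and the two macro variables standing in for the From/To datetimes held in the CCT are all hypothetical.

```sas
/* Hypothetical source table with a last-updated column. */
data work.customer_src;
   format last_updated datetime20.;
   customer_id = 1; last_updated = '01JUL2009:08:00:00'dt; output;
   customer_id = 2; last_updated = '14JUL2009:09:30:00'dt; output;
run;

/* Hypothetical extract window; a real job would read these from the CCT. */
%let from_dttm = '13JUL2009:00:00:00'dt;
%let to_dttm   = '14JUL2009:23:59:59'dt;

/* Capture only the rows updated inside the window. */
proc sql;
   create table work.customer_capture as
   select *
   from work.customer_src
   where last_updated >  &from_dttm
     and last_updated <= &to_dttm;
quit;
```

Using an open lower bound and a closed upper bound is one way to avoid picking up the same row in two consecutive windows.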



Capture Control Pre-Processing
Is this the first time the job has run successfully today?
  No: Warn that duplicate facts will occur.
  Yes: Did the previous run fail, or not finish?
    Yes: Update dates in the CCT table for this source (&source_table._CCT).
    No: Warn that this is a replacement run.



Capture Control Post-Processing
Did the job run successfully?
  No: Update the CCT table with Status = Failed.
  Yes: Update dates in the CCT table for this source (&source_table._CCT).
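The post-processing decision could be sketched as a small macro. The macro name cct_post, its rc parameter, the CCT column names and the choice to roll the From datetime forward on success are all assumptions; the slides do not say exactly which dates are updated.

```sas
/* Hypothetical CCT for a source table called CUSTOMER (illustrative names). */
data work.customer_cct;
   length status $8;
   format from_dttm to_dttm datetime20.;
   status    = 'Started';
   from_dttm = '13JUL2009:00:00:00'dt;
   to_dttm   = '14JUL2009:00:00:00'dt;
run;

%macro cct_post(source_table=, rc=0);
   proc sql;
   %if &rc = 0 %then %do;
      /* Success: assumed behaviour is that the next run starts */
      /* where this run's window ended.                         */
      update work.&source_table._cct
         set status = 'Success',
             from_dttm = to_dttm;
   %end;
   %else %do;
      /* Failure: leave the dates alone so the window can be re-run. */
      update work.&source_table._cct
         set status = 'Failed';
   %end;
   quit;
%mend cct_post;

%cct_post(source_table=customer, rc=0)
```

Passing the return code in as a parameter keeps the sketch independent of however the job exposes its own status.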



Error Monitoring
Challenge: How can I keep my production support department informed of job failures and successes?
Solution: Email job statistics to a designated mailbox.
  Create a User Transform called Email_Stats.
  Add the Email_Stats transform to each job.



Add the Email_Stats transform to the job:
  Drag the target table to one input.
  Drag the Email_Stats table to the other input. (The Email_Stats table contains the email addresses of the recipients.)
Don't hard-code email addresses:
  What happens when people leave?
  Dev and prod need different recipients.

Email_Stats transform properties:
  An email is only sent if the job has failed.
  The last job in the flow always sends an email to Admin & Support: set Last Job to Yes.
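The code behind the Email_Stats transform is not shown in the slides. A minimal sketch of the same idea, using the SAS FILENAME EMAIL access method, might look like this. The macro name, its parameters, the recipients table layout and the example addresses are all hypothetical, and the SAS session must already have its email system options (such as EMAILSYS and EMAILHOST) configured for the message to be sent.

```sas
/* Hypothetical recipients table; in practice a permanent table per environment. */
data work.email_stats;
   length address $64;
   address = 'support@example.com'; output;
   address = 'admin@example.com';   output;
run;

%macro email_stats(job_name=, rc=0, last_job=No);
   /* Only email on failure, unless this is the last job in the flow. */
   %if &rc ne 0 or %upcase(&last_job) = YES %then %do;
      /* Build a quoted, space-separated recipient list from the table. */
      proc sql noprint;
         select quote(strip(address)) into :recipients separated by ' '
         from work.email_stats;
      quit;

      filename outmail email to=(&recipients)
               subject="Job &job_name finished with return code &rc";
      data _null_;
         file outmail;
         put "Job: &job_name";
         put "Return code: &rc";
      run;
      filename outmail clear;
   %end;
%mend email_stats;

%email_stats(job_name=Load_Customer_Dim, rc=4)
```

Reading recipients from a table rather than from the code means that a change of staff, or a move from dev to prod, needs only a data change.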



Data Validation
Challenge: How can I ensure only clean data gets loaded into the warehouse?
Solution: Use the Data Validation transformation.
  Use the standard Invalid, Missing, and Duplicate tabs.
  Employ custom validation and apply a severity rating:
    1 = Exclusion
    2 = Correction
    3 = Improvement
  Store exceptions in a permanent dataset for further analysis.



Example: check for truncation of key columns.
1) Create each condition.
2) Determine validation.
3) Define corrective action if required.
4) Exceptions are written to the temporary dataset ETLS_EXCEPTIONS.
5) Run the %Append_Data_Quality macro in post-processing.
6) Use BI tools to investigate data quality issues (e.g. a particular source system requires cleansing).



%Append_Data_Quality macro logic:
Does ETLS_EXCEPTIONS exist?
  No: Halt the macro, as there are no errors to process.
  Yes: Append exceptions to the permanent table DQ_Error_Event.
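The source of %Append_Data_Quality is not included in the slides. The logic described could be sketched roughly as follows; the parameter names and the WORK library defaults are assumptions, and a real version would probably also map the ETLS_EXCEPTIONS columns onto the DQ_Error_Event layout.

```sas
/* A sketch only: not the authors' macro. */
%macro append_data_quality(exceptions=work.etls_exceptions,
                           target=work.dq_error_event);
   /* Halt early when the validation step produced no exceptions table. */
   %if not %sysfunc(exist(&exceptions)) %then %do;
      %put NOTE: &exceptions does not exist, so there are no exceptions to process.;
      %return;
   %end;

   /* FORCE lets the append proceed when the column lists differ. */
   proc append base=&target data=&exceptions force;
   run;
%mend append_data_quality;

%append_data_quality()
```

PROC APPEND creates the base table if it does not yet exist, so the first run needs no special handling.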



Table properties for DQ_ERROR_EVENT:

Column name            Type  Length  Description
Row_Extraction_Date    Num   8       Date-timestamp when the row was exported or extracted from the source system.
Exception_Event_Date   Num   8       Date-timestamp when the exception was identified by the data warehouse processes.
Job_Name               Char  64      The name of the ETL job which identified the exception.
Table_Name             Char  41      The library and table name which contains the row and column containing the exception.
Row_Number             Num   8       The row number containing the exception.
Column_Name            Char  32      The column name containing the datum of the exception.
Screen_Description     Char  256     The screen (data quality test) description.
Exception_Description  Char  256     Standardised description of the exception.
Exception_Action       Char  256     Automated data conform action (if any).
Exception_Severity     Num   8       The severity level of the DQ Error Event (1=Exclusion, 2=Correction, 3=Improvement).
Unconformed_ValueN     Num   8       Original value (numeric) before conforming.
Conformed_ValueN       Num   8       Conformed (numeric) value.
Unconformed_ValueC     Char  256     Original value (character) before conforming.
Conformed_ValueC       Char  256     Conformed (character) value.
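For reference, the table properties listed for DQ_ERROR_EVENT translate into PROC SQL along these lines. The WORK library and the datetime20. format on the two date-timestamp columns are assumptions; the column names, types and lengths follow the listing.

```sas
/* Empty exception table matching the listed column names, types and lengths. */
proc sql;
   create table work.dq_error_event
      (Row_Extraction_Date   num format=datetime20.,
       Exception_Event_Date  num format=datetime20.,
       Job_Name              char(64),
       Table_Name            char(41),
       Row_Number            num,
       Column_Name           char(32),
       Screen_Description    char(256),
       Exception_Description char(256),
       Exception_Action      char(256),
       Exception_Severity    num,
       Unconformed_ValueN    num,
       Conformed_ValueN      num,
       Unconformed_ValueC    char(256),
       Conformed_ValueC      char(256));
quit;
```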



Data Scrambling
Challenge: How can I ensure I'm not holding sensitive production data on development/test systems?
Solution: Use data scrambling routines in non-production environments.
Development source systems are often created using production data, so warehouses can propagate the risk of breaching the Data Protection Act.



Custom transform: the %data_scrambler macro allows each column to be either scrambled or passed through unchanged.
Edit parameters:
  Select Pass for key fields: don't scramble them!
  Scramble method:
    Ranuni function
    MD5 function
    Translate function
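The %data_scrambler macro itself is not shown. To illustrate what the three scramble methods can do, here is a small data step that applies each of them to a different column while passing the key through. The table, the column names and the particular substitution alphabet are hypothetical.

```sas
/* Hypothetical customer data; customer_id is a key and is passed through. */
data work.customer;
   length customer_id 8 surname $20 postcode $8 balance 8;
   customer_id = 1001; surname = 'Smith'; postcode = 'AB1 2CD'; balance = 2500; output;
   customer_id = 1002; surname = 'Jones'; postcode = 'EF3 4GH'; balance = 120;  output;
run;

data work.customer_scrambled;
   set work.customer;
   /* MD5: replace the value with part of its hex digest */
   /* (the same input always gives the same output).     */
   surname = put(md5(strip(surname)), $hex20.);
   /* Translate: substitute each character using a fixed mapping. */
   postcode = translate(postcode,
                        'QWERTYUIOPASDFGHJKLZXCVBNM9876543210',
                        'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');
   /* Ranuni: perturb a numeric value by a random factor between 0.5 and 1.5. */
   balance = round(balance * (0.5 + ranuni(0)), 0.01);
run;
```

Because MD5 and Translate are deterministic, the same source value scrambles to the same result in every table, which keeps joins on scrambled non-key columns consistent. A fixed substitution alphabet is easily reversed, so it is better suited to obscuring data than to protecting it.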



What about Production?
%let liveEnvironment = PROD;
%let thisEnvironment = %sysfunc(substr(%sysfunc(upcase(%sysfunc(getoption(METASERVER)))),1,4));
Don't perform the scramble routine if thisEnvironment = liveEnvironment.
When running in Dev, the METASERVER option should have a different value.
Alternatively, set up a table holding the environment value.
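The environment check could be wrapped in a macro such as the one sketched here. The macro name is hypothetical, and the sketch assumes, as the slide does, that the first four characters of the METASERVER value identify the live environment. The length guard and the %superq comparison are additions to cope with an unset option or a server name containing characters such as hyphens.

```sas
%macro scramble_if_not_live(liveEnvironment=PROD);
   %local server thisEnvironment;
   %let server = %upcase(%sysfunc(getoption(METASERVER)));

   /* Guard against an unset or very short METASERVER value. */
   %if %length(&server) >= 4 %then %let thisEnvironment = %substr(&server, 1, 4);
   %else %let thisEnvironment = &server;

   /* superq masks special characters so that the comparison is textual. */
   %if %superq(thisEnvironment) = %superq(liveEnvironment) %then %do;
      %put NOTE: Running against the live environment, so data is not scrambled.;
   %end;
   %else %do;
      %put NOTE: Non-production environment, so the scramble routine would run here.;
      /* The call to the data_scrambler macro would go here; */
      /* its parameters are not shown in the slides.         */
   %end;
%mend scramble_if_not_live;

%scramble_if_not_live()
```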



Conformed Model
Challenge: How can I track trends in my data when the source systems don't hold history?
Solution: Use a conformed data model in a warehouse, using slowly changing dimensions where appropriate.
  Re-usable Dimensions
  Fact Tables

In the Integrate layer:
  Use the SCD Type II Loader transform to make use of effective date processing.
  Use the Surrogate Key Generator transform to determine keys for dimension tables.
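To show the idea behind surrogate keys (not the code that the Surrogate Key Generator transform produces), the following sketch assigns the next available key values to new dimension rows. The table and column names are hypothetical.

```sas
/* Hypothetical existing dimension and incoming new business keys. */
data work.agent_dim;
   length agent_code $4;
   agent_sk = 1; agent_code = 'A100'; output;
   agent_sk = 2; agent_code = 'A200'; output;
run;

data work.new_agents;
   length agent_code $4;
   agent_code = 'A300'; output;
   agent_code = 'A400'; output;
run;

/* Find the highest key already issued (0 if the dimension is empty). */
proc sql noprint;
   select coalesce(max(agent_sk), 0) into :max_sk
   from work.agent_dim;
quit;

/* Number the new rows upwards from the current maximum. */
data work.new_agents_keyed;
   set work.new_agents;
   agent_sk = &max_sk + _n_;
run;

proc append base=work.agent_dim data=work.new_agents_keyed;
run;
```

The surrogate key carries no business meaning, which is what allows a slowly changing dimension to hold several versions of the same business key.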



SQL Optimisation
Challenge: How can I ensure the best possible SQL performance is achieved through my SQL Join transform?
Solution: Use the undocumented _METHOD option on the SQL procedure to see how the query is processed (SAS Note 33604).
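A minimal illustration of the option follows. The two small tables are hypothetical; the point is only that adding _METHOD to the PROC SQL statement makes SAS write the chosen execution plan to the log.

```sas
/* Hypothetical dimension tables to join. */
data work.agent_dim;
   agent_id = 1; broker_id = 10; output;
   agent_id = 2; broker_id = 20; output;
run;

data work.broker_dim;
   length broker_name $10;
   broker_id = 10; broker_name = 'Alpha'; output;
   broker_id = 20; broker_name = 'Beta';  output;
run;

/* _METHOD writes the execution plan for the query to the SAS log. */
proc sql _method;
   create table work.agent_broker as
   select a.agent_id, b.broker_name
   from work.agent_dim as a
        inner join work.broker_dim as b
        on a.broker_id = b.broker_id;
quit;
```

The plan in the log uses short codes, for example sqxjm for a sort-merge join, sqxjhsh for a hash join, sqxjndx for an index join and sqxjsl for a step-loop join. Which one appears depends on the data and on the optimiser, so a join that behaves well on small development tables is worth re-checking at production volumes.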



Self Documentation
Challenge: How can I ensure the executed warehouse code is documented to an acceptable standard?
Solution: DI Studio self-documents the code, based on descriptions in the job and transform properties.
  Use meaningful job names.
  Write descriptions of why, not just what.
  Use Notes and Document Attachments.
Descriptions and Notes are propagated through to the executable code, benefiting production support teams.



Role Assignment
Challenge: How can I record who is responsible for which job or entity?
Solution: Use Role Assignment in DI Studio. Allocate names and roles where required.



Rename Standard Transforms
Challenge: How can I keep track of processing in a job which has a lot of transformations?
Solution: Don't use the default transform names; rename each transform to something meaningful.
E.g. rename "SQL Join" to "Merge Agent_Dim with Broker_Dim".


Contributors
  Mick Collington
  Jethro Day
  Steve Morton
  Nick Treadgold
Data Integration Developer Group (SAS Professionals)
  Julien Heijster
  John Robertson
  http://www.sasprofessionals.net/group/dataintegrationdeveloper/forum/topics/data-integration-best
SAS.COM
