Purpose

The purpose of this document is to describe the different types of tables used in a
star schema and how the data is stored in the Data Mart.

Star schemas

A star schema consists of fact tables and dimension tables. Fact tables contain the
quantitative or factual data about the Bank. Dimension tables are usually smaller and
hold descriptive data that reflects the dimensions, or attributes, of the fact table.

Type 1

The Type 1 methodology overwrites old data with new data, and therefore does not
track historical data at all. This is most appropriate when correcting certain types of
data errors, such as the spelling of a name. (Assuming you won't ever need to know
how it used to be misspelled in the past.)

Here is an example of a database table that keeps bank information:

Origin_ID  Account_No  Customer_Name   Customer_State
123        00112233    Acme Supply Co  CA

In this example, Account_No is the natural key and Origin_ID is a surrogate key.
Technically, the surrogate key is not necessary, since the table will be unique by the
natural key (Account_No). However, the joins will perform better on an integer than
on a character string.

If the customer moves to Illinois, a Type 1 update overwrites the state in place:

Origin_ID  Account_No  Customer_Name   Customer_State
123        00112233    Acme Supply Co  IL
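The overwrite can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the Data Mart's actual load logic; the table name Dim_Customer is an assumption, and the columns follow the example table.

```python
import sqlite3

# Hypothetical table name Dim_Customer; columns follow the example table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Dim_Customer (
        Origin_ID      INTEGER PRIMARY KEY,  -- surrogate key
        Account_No     TEXT UNIQUE,          -- natural key
        Customer_Name  TEXT,
        Customer_State TEXT
    )
""")
conn.execute(
    "INSERT INTO Dim_Customer VALUES (123, '00112233', 'Acme Supply Co', 'CA')"
)

# Type 1: overwrite in place, so the old value 'CA' is no longer stored anywhere.
conn.execute(
    "UPDATE Dim_Customer SET Customer_State = ? WHERE Account_No = ?",
    ("IL", "00112233"),
)

type1_rows = conn.execute("SELECT * FROM Dim_Customer").fetchall()
print(type1_rows)
```

The row count and the surrogate key stay the same; only the attribute value changes.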

Type 2

The Type 2 method tracks historical data by creating multiple records for a given
natural key in the dimensional tables with separate surrogate keys and/or different
version numbers. With Type 2, we have unlimited history preservation as a new
record is inserted each time a change is made.

In the same example, if the customer moves to Illinois, the table could look like the
one below. A Start Date and an End Date are defined in the table, and using them we
can find which changes happened to a particular customer and when:

Origin_ID  Account_No  Customer_Name   Customer_State  Start Date   End Date
123        00112233    Acme Supply Co  CA              01-Jan-2000  21-Dec-2004
124        00112233    Acme Supply Co  IL              22-Dec-2004
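A Type 2 change amounts to two statements: close the current row, then insert a new one with a new surrogate key. The sketch below assumes a hypothetical table Dim_Customer, a hypothetical helper apply_type2_change, ISO-formatted date strings, and a NULL End_Date for the current row (the table above leaves it empty).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Dim_Customer (
        Origin_ID      INTEGER PRIMARY KEY,  -- surrogate key, one per version
        Account_No     TEXT,                 -- natural key, repeats across versions
        Customer_Name  TEXT,
        Customer_State TEXT,
        Start_Date     TEXT,
        End_Date       TEXT                  -- NULL marks the current version
    )
""")
conn.execute(
    "INSERT INTO Dim_Customer VALUES "
    "(123, '00112233', 'Acme Supply Co', 'CA', '2000-01-01', NULL)"
)

def apply_type2_change(conn, account_no, new_state, start_date, prev_end_date):
    """Close the current version, then insert a new one (hypothetical helper)."""
    (name,) = conn.execute(
        "SELECT Customer_Name FROM Dim_Customer "
        "WHERE Account_No = ? AND End_Date IS NULL",
        (account_no,),
    ).fetchone()
    conn.execute(
        "UPDATE Dim_Customer SET End_Date = ? "
        "WHERE Account_No = ? AND End_Date IS NULL",
        (prev_end_date, account_no),
    )
    # Origin_ID is omitted so SQLite assigns the next integer as the new key.
    conn.execute(
        "INSERT INTO Dim_Customer "
        "(Account_No, Customer_Name, Customer_State, Start_Date, End_Date) "
        "VALUES (?, ?, ?, ?, NULL)",
        (account_no, name, new_state, start_date),
    )

apply_type2_change(conn, "00112233", "IL", "2004-12-22", "2004-12-21")
type2_rows = conn.execute(
    "SELECT * FROM Dim_Customer ORDER BY Origin_ID"
).fetchall()
```

After the call the table holds both versions of account 00112233, and only the Illinois row has an empty End_Date.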

Type 3

The Type 3 method tracks changes using separate columns. Whereas Type 2 had
unlimited history preservation, Type 3 has limited history preservation, as it's limited
to the number of columns we designate for storing historical data. Where the original
table structure in Type 1 and Type 2 was very similar, Type 3 adds additional
columns to the table:

Origin_ID  Account_No  Customer_Name   Original_Customer_State  Effective_Date  Current_Customer_State
123        00112233    Acme Supply Co  CA                       22-Dec-2004     IL
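In a Type 3 design the change is a single update that leaves the original-value column alone. A minimal sketch, assuming a hypothetical table name Dim_Customer and ISO-formatted dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Dim_Customer (
        Origin_ID               INTEGER PRIMARY KEY,
        Account_No              TEXT UNIQUE,
        Customer_Name           TEXT,
        Original_Customer_State TEXT,  -- written once, never overwritten
        Effective_Date          TEXT,  -- date of the latest change
        Current_Customer_State  TEXT
    )
""")
conn.execute(
    "INSERT INTO Dim_Customer VALUES "
    "(123, '00112233', 'Acme Supply Co', 'CA', NULL, 'CA')"
)

# Type 3: only the "current" column and the effective date change.
conn.execute(
    "UPDATE Dim_Customer "
    "SET Current_Customer_State = ?, Effective_Date = ? "
    "WHERE Account_No = ?",
    ("IL", "2004-12-22", "00112233"),
)

type3_row = conn.execute(
    "SELECT Original_Customer_State, Effective_Date, Current_Customer_State "
    "FROM Dim_Customer"
).fetchone()
```

A second move would overwrite IL and the effective date, which is the limited history the method trades for a single row per customer.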

Test Objective

Testing will be conducted to check:

1. The data integrity, correctness and completeness of a particular data source.

2. That exceptions are handled appropriately.

3. That the target system is able to process and store the data sent from the source
system.

4. That there is no impact on the existing functionality of the business.

Test Strategy

The 2x2 matrix below determines how testing is done and which scenarios pass or
fail.

• Any change in the source is reflected in the target: valid test / Pass

• A change in the source system is not reflected in the target: valid test / Fail

• No change in the source, but a change in the target: valid test / Fail

• No change in the source and no change in the target: valid test / Pass

                   Target Change      Target No Change
Source No Change   Valid test / Fail  Valid test / Pass
Source Change      Valid test / Pass  Valid test / Fail
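The matrix reduces to one rule: a test passes when the source and the target agree on whether a change happened. A minimal sketch, with the hypothetical function name classify:

```python
def classify(source_changed: bool, target_changed: bool) -> str:
    """Return the expected verdict for one cell of the 2x2 matrix."""
    return "Pass" if source_changed == target_changed else "Fail"

# Enumerate all four cells, keyed by (source_changed, target_changed).
matrix = {
    (s, t): classify(s, t)
    for s in (True, False)
    for t in (True, False)
}
```

How "changed" is detected for a given record (row comparison, checksum, audit column) is left open here.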

Test Approach

The flow chart below describes the test process.

Type 1

Create a view on top of the dimension that joins to itself to retrieve the most recent
Type-1 attributes.

(This can give incorrect results in some scenarios, as the example below shows, so
it is advisable to update all history rows for Type-1 changes.)

Example:

Consider the changes below:

Account_No  Account_Name  Account_Type  Description
1234        John          P             Personal
1234        John          R             Small Business Account

We get a Type-1 change to the account name, changing it to ‘John S’. Suppose we
then had this data in the dimension:

Account_No  Account_Name  Account_Type  Description
1234        John          P             Personal
1234        John S        R             Small Business Account

And this data in the Name fact table:

Account_No  Account_Name
1234        John
1234        John S

How many “John S” are in the system?

This could be the SQL:

SELECT COUNT(T.Account_No) FROM Fact_Table T INNER JOIN Dim_Name N ON
T.Account_No = N.Account_No WHERE N.Account_Name = 'John S'

The answer would be 1. This is wrong, because account 1234 has 2 rows in total.

This means aggregations are invalid. If we also change the historical row, aggregations
remain correct. The Name dimension will now look like this:

Account_No  Account_Name  Account_Type  Description
1234        John S        P             Personal
1234        John S        R             Small Business Account
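The miscount can be reproduced in a small sketch. It assumes something the tables above do not show: each fact row points at one dimension version through a hypothetical surrogate key, Dim_Key, and the join uses that key. Under that assumption the count is 1 while only the current row carries the new name, and 2 once the historical row is updated as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dim_Name (
        Dim_Key      INTEGER PRIMARY KEY,  -- hypothetical surrogate key
        Account_No   TEXT,
        Account_Name TEXT,
        Account_Type TEXT,
        Description  TEXT
    );
    CREATE TABLE Fact_Table (Dim_Key INTEGER);  -- one fact row per dimension version
    INSERT INTO Dim_Name VALUES (1, '1234', 'John',   'P', 'Personal');
    INSERT INTO Dim_Name VALUES (2, '1234', 'John S', 'R', 'Small Business Account');
    INSERT INTO Fact_Table VALUES (1), (2);
""")

COUNT_SQL = """
    SELECT COUNT(*)
    FROM Fact_Table T
    INNER JOIN Dim_Name N ON T.Dim_Key = N.Dim_Key
    WHERE N.Account_Name = 'John S'
"""

# Only the current row has the new name: the older fact row is missed.
count_before = conn.execute(COUNT_SQL).fetchone()[0]

# Apply the Type-1 change to every version of the account.
conn.execute(
    "UPDATE Dim_Name SET Account_Name = 'John S' WHERE Account_No = '1234'"
)
count_after = conn.execute(COUNT_SQL).fetchone()[0]
```

Here count_before is 1 and count_after is 2, which is why the Type-1 attribute is propagated to the historical rows.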

Type 2

Build a checksum over the Type-2 attributes to detect Type-2 changes.

Account_No  Account_Name  Account_Type  Description             RowStartDate          RowEndDate
1234        John          P             Personal                01-Jan-1900 00:00:00  24-Oct-2008 00:00:00
1234        John          R             Small Business Account  24-Oct-2008 00:00:00  31-Dec-9999 00:00:00

What would happen if we get a new row like this?

Account_No  Account_Name  Account_Type  Description
1234        John S        P             Personal Account

Here the Account Name (Type-1) has changed as has the Account Type (Type-2).

In this situation the new row will flow down the Type-1 change route, updating the
Account_Name and Account_Type for all historic rows and current row. The table will
now look like this:

Account_No  Account_Name  Account_Type  Description             RowStartDate          RowEndDate
1234        John S        P             Personal Account        01-Jan-1900 00:00:00  24-Oct-2008 00:00:00
1234        John S        R             Small Business Account  24-Oct-2008 00:00:00  31-Dec-9999 00:00:00

This is all fine and expected. However, have you spotted the issue yet? What happens
if Account No 1234 changes its Account Type, e.g.:

Account_No  Account_Name  Account_Type  Description
1234        John S        S             Savings

In this situation the checksums for both Type-1 and Type-2 will mark a change: Account
Type has changed from ‘P’ to ‘S’, causing the Type-2 checksum to change, and
Description has also changed from ‘Personal’ to ‘Savings’, causing the Type-1
checksum to change.
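One way to build the two checksums is to hash the Type-1 and Type-2 attributes separately and compare the incoming row with the current row. The sketch below is an assumption about how this could be done, not the documented implementation: the column split (Account_Type as Type-2; Account_Name and the column holding ‘Personal’/‘Savings’, shown as Description in the tables, as Type-1), the helper name checksum, and the use of SHA-256 are all illustrative choices.

```python
import hashlib

TYPE1_COLUMNS = ("Account_Name", "Description")  # assumed Type-1 attributes
TYPE2_COLUMNS = ("Account_Type",)                # assumed Type-2 attributes

def checksum(row: dict, columns: tuple) -> str:
    """Hash the chosen columns of a row; '|' separates the values."""
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Current dimension row and incoming source row for account 1234.
current = {"Account_No": "1234", "Account_Name": "John S",
           "Account_Type": "R", "Description": "Small Business Account"}
incoming = {"Account_No": "1234", "Account_Name": "John S",
            "Account_Type": "S", "Description": "Savings"}

type1_changed = checksum(current, TYPE1_COLUMNS) != checksum(incoming, TYPE1_COLUMNS)
type2_changed = checksum(current, TYPE2_COLUMNS) != checksum(incoming, TYPE2_COLUMNS)
```

Both flags come out True for this row, which is the situation the text describes: one incoming record triggers the Type-1 and the Type-2 route at the same time.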

So, we update all the historic rows with the new Type-1 data and insert the new
Type-2 row (after handling the Type-2 change). Using this logic the Dimension will
now look like this.

Account_No  Account_Name  Account_Type  Description  RowStartDate          RowEndDate
1234        John S        P             Savings      01-Jan-1900 00:00:00  24-Oct-2008 00:00:00
1234        John S        R             Savings      24-Oct-2008 00:00:00  27-Oct-2008 00:00:00
1234        John S        S             Savings      27-Oct-2008 00:00:00  31-Dec-9999 00:00:00

The current row is correct, but the previous rows have had their Description
updated, which is wrong. We want the Dimension to look like this:

Account_No  Account_Name  Account_Type  Description             RowStartDate          RowEndDate
1234        John S        P             Personal                01-Jan-1900 00:00:00  24-Oct-2008 00:00:00
1234        John S        R             Small Business Account  24-Oct-2008 00:00:00  27-Oct-2008 00:00:00
1234        John S        S             Savings                 27-Oct-2008 00:00:00  31-Dec-9999 00:00:00

If a Type-1 update is tied to a Type-2 attribute that changed in the same record, don’t
update all the historic data.
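The rule can be sketched as follows. This is one possible reading, under stated assumptions: Account_Name is treated as an independent Type-1 attribute that is always propagated to history, Description is treated as tied to Account_Type and therefore written only to the new version, and the helper name apply_change and the ISO date strings are hypothetical. The starting history is the two-row state from the desired result.

```python
HIGH_DATE = "9999-12-31"  # marks the current version

def apply_change(rows, incoming, change_date):
    """Apply one incoming record to the version history of a single account."""
    current = next(r for r in rows if r["RowEndDate"] == HIGH_DATE)

    # Independent Type-1 attribute: overwrite on every version.
    for r in rows:
        r["Account_Name"] = incoming["Account_Name"]

    if incoming["Account_Type"] != current["Account_Type"]:
        # Type-2 route: close the current row and add a new version.
        # Description travels with the new version; history is left untouched.
        current["RowEndDate"] = change_date
        rows.append({**incoming,
                     "RowStartDate": change_date,
                     "RowEndDate": HIGH_DATE})
    return rows

history = [
    {"Account_No": "1234", "Account_Name": "John S", "Account_Type": "P",
     "Description": "Personal",
     "RowStartDate": "1900-01-01", "RowEndDate": "2008-10-24"},
    {"Account_No": "1234", "Account_Name": "John S", "Account_Type": "R",
     "Description": "Small Business Account",
     "RowStartDate": "2008-10-24", "RowEndDate": HIGH_DATE},
]
incoming = {"Account_No": "1234", "Account_Name": "John S",
            "Account_Type": "S", "Description": "Savings"}

history = apply_change(history, incoming, "2008-10-27")
descriptions = [r["Description"] for r in history]
```

The resulting descriptions are Personal, Small Business Account and Savings, matching the desired dimension rather than the one where every row reads Savings.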

Sample Test Approach/Steps

SCD Type 1 Testing

1. Record count check: the record count should match between source and target.
2. Compare the source data with the dimension/target data; it should match.
3. The primary key should be unique in the target.
4. Target data should be updated using the primary key.
5. The surrogate key (if any) should be unique in the dimension table.
6. If any record is changed in the source, its primary key and surrogate key should
   not change in the target/dimension.
7. Whenever a record is changed/updated in the source, the same record should be
   updated in the dimension with its existing surrogate key.
8. Whenever a new record is created in the source, the same record should be
   inserted in the dimension with a new surrogate key.
9. The load type for the dimension should be upsert (update, else insert).
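Several of these checks translate directly into queries. The sketch below covers checks 1, 2, 3 and 5 against an in-memory SQLite database; the table names Source and Dim_Customer, the sample rows, and the helper scalar are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (Account_No TEXT, Customer_Name TEXT, Customer_State TEXT);
    CREATE TABLE Dim_Customer (
        Origin_ID INTEGER, Account_No TEXT, Customer_Name TEXT, Customer_State TEXT
    );
    INSERT INTO Source VALUES ('00112233', 'Acme Supply Co', 'IL');
    INSERT INTO Dim_Customer VALUES (123, '00112233', 'Acme Supply Co', 'IL');
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

type1_checks = {
    # 1. Record counts match between source and target.
    "record_count_matches":
        scalar("SELECT COUNT(*) FROM Source")
        == scalar("SELECT COUNT(*) FROM Dim_Customer"),
    # 2. No source row is missing from, or different in, the dimension.
    "data_matches": scalar("""
        SELECT COUNT(*) FROM (
            SELECT Account_No, Customer_Name, Customer_State FROM Source
            EXCEPT
            SELECT Account_No, Customer_Name, Customer_State FROM Dim_Customer
        )""") == 0,
    # 3. The primary (natural) key appears once in the target.
    "primary_key_unique": scalar("""
        SELECT COUNT(*) FROM (
            SELECT Account_No FROM Dim_Customer
            GROUP BY Account_No HAVING COUNT(*) > 1
        )""") == 0,
    # 5. The surrogate key appears once in the target.
    "surrogate_key_unique": scalar("""
        SELECT COUNT(*) FROM (
            SELECT Origin_ID FROM Dim_Customer
            GROUP BY Origin_ID HAVING COUNT(*) > 1
        )""") == 0,
}
```

Each entry is True when the check passes, so a failing load shows up as a named False value.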

SCD Type 2 Testing

A. If implemented using a version number

1. The surrogate key should be unique.
2. The combination of primary key and version number should be unique.
3. Whenever a record is changed/updated in the source, it should be inserted as a
   new record in the dimension and its version number should increase by one.
4. Whenever a new record is created in the source, it should be inserted as a new
   record in the dimension and its version number should be 0.
5. Compare the latest-version data with the source; it should match.
6. No old record should be updated in the dimension.
7. The load type should be insert (changed/new records should be inserted).
8. A new version should be created only for changed records.
9. Every change in a source record should create a new version of the record in
   the dimension.
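Checks 1, 2, 4 and 5 of the version-number variant could look like this. The table names, the Version column, the sample rows and the helper scalar are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (Account_No TEXT, Customer_State TEXT);
    CREATE TABLE Dim_Customer (
        Origin_ID INTEGER, Account_No TEXT, Version INTEGER, Customer_State TEXT
    );
    INSERT INTO Source VALUES ('00112233', 'IL');
    INSERT INTO Dim_Customer VALUES (123, '00112233', 0, 'CA');
    INSERT INTO Dim_Customer VALUES (124, '00112233', 1, 'IL');
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

version_checks = {
    # 1. Surrogate key is unique.
    "surrogate_key_unique": scalar("""
        SELECT COUNT(*) FROM (
            SELECT Origin_ID FROM Dim_Customer
            GROUP BY Origin_ID HAVING COUNT(*) > 1
        )""") == 0,
    # 2. Primary key plus version number is unique.
    "key_version_unique": scalar("""
        SELECT COUNT(*) FROM (
            SELECT Account_No, Version FROM Dim_Customer
            GROUP BY Account_No, Version HAVING COUNT(*) > 1
        )""") == 0,
    # 3 and 4. Versions start at 0 and increase by one without gaps.
    "versions_start_at_zero_no_gaps": scalar("""
        SELECT COUNT(*) FROM (
            SELECT Account_No FROM Dim_Customer
            GROUP BY Account_No
            HAVING MIN(Version) <> 0 OR MAX(Version) <> COUNT(*) - 1
        )""") == 0,
    # 5. The latest version matches the source.
    "latest_version_matches_source": scalar("""
        SELECT COUNT(*) FROM Source S
        JOIN Dim_Customer D ON D.Account_No = S.Account_No
        WHERE D.Version = (SELECT MAX(Version) FROM Dim_Customer
                           WHERE Account_No = S.Account_No)
          AND D.Customer_State <> S.Customer_State
        """) == 0,
}
```

Checks 6 to 9 compare the dimension before and after a load, so they need a snapshot of the previous state and are not covered by this sketch.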

B. If implemented using start and end effective dates

1. The surrogate key should be unique.
2. Whenever a record is changed/updated in the source, it should be inserted as a
   new record in the dimension and its end effective date should be null.
3. Whenever a new record is created in the source, it should be inserted as a new
   record in the dimension and its end effective date should be null.
4. Compare the latest data (where the end effective date is null) with the source; it
   should match.
5. No old record should be updated in the dimension.
6. The load type should be insert (changed/new records should be inserted).
7. The primary key should be unique where the end effective date is null.
8. Whenever a record is changed/updated in the source, the previous record in the
   dimension should be updated with an end effective date.
9. The record count of source and target will match where the end effective date is
   null.
10. Except for the latest record, all previous records should have an end effective
    date.
11. All records should have a proper start date (the date when the record was
    created/updated in the source).
12. Generally, the start date of a record is the date when the record was
    changed/updated/created in the source.
13. Generally, the end date of the previous version of a record is the date when the
    record was changed/updated in the source.
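For the effective-date variant, checks 1, 4, 7, 9, 10 and 11 can be sketched as queries. The table names, the ISO-formatted sample dates and the helper scalar are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (Account_No TEXT, Customer_State TEXT);
    CREATE TABLE Dim_Customer (
        Origin_ID INTEGER, Account_No TEXT, Customer_State TEXT,
        Start_Date TEXT, End_Date TEXT
    );
    INSERT INTO Source VALUES ('00112233', 'IL');
    INSERT INTO Dim_Customer VALUES (123, '00112233', 'CA', '2000-01-01', '2004-12-21');
    INSERT INTO Dim_Customer VALUES (124, '00112233', 'IL', '2004-12-22', NULL);
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

date_checks = {
    # 1. Surrogate key is unique.
    "surrogate_key_unique": scalar("""
        SELECT COUNT(*) FROM (
            SELECT Origin_ID FROM Dim_Customer
            GROUP BY Origin_ID HAVING COUNT(*) > 1
        )""") == 0,
    # 7 and 10. Exactly one row per key has a null end date.
    "one_current_row_per_key": scalar("""
        SELECT COUNT(*) FROM (
            SELECT Account_No FROM Dim_Customer
            GROUP BY Account_No HAVING SUM(End_Date IS NULL) <> 1
        )""") == 0,
    # 9. Current-row count equals the source count.
    "current_count_matches_source":
        scalar("SELECT COUNT(*) FROM Dim_Customer WHERE End_Date IS NULL")
        == scalar("SELECT COUNT(*) FROM Source"),
    # 4. Current rows carry the same data as the source.
    "current_data_matches_source": scalar("""
        SELECT COUNT(*) FROM (
            SELECT Account_No, Customer_State FROM Source
            EXCEPT
            SELECT Account_No, Customer_State FROM Dim_Customer
            WHERE End_Date IS NULL
        )""") == 0,
    # 11. Every row has a start date.
    "all_rows_have_start_date":
        scalar("SELECT COUNT(*) FROM Dim_Customer WHERE Start_Date IS NULL") == 0,
}
```

Whether a closed row's end date equals the next row's start date or the day before it depends on the load convention, so that comparison is left out here.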
