The purpose of this document is to describe the different types of tables used in a
star schema and how data is stored in the data mart.
Star schemas
A star schema consists of fact tables and dimension tables. Fact tables contain the
quantitative or factual data about the Bank. Dimension tables are usually smaller and
hold descriptive data that reflects the dimensions, or attributes, of the fact table.
Type 1
The Type 1 methodology overwrites old data with new data, and therefore does not
track historical data at all. This is most appropriate when correcting certain types of
data errors, such as the spelling of a name. (Assuming you won't ever need to know
how it used to be misspelled in the past.)
In this example, Account_No is the natural key and Origin_ID is a surrogate key.
Technically, the surrogate key is not necessary, since the table will be unique by the
natural key (Account_No). However, the joins will perform better on an integer than
on a character string.
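The Type 1 behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name Type1Dimension, the account_name attribute, and the sample values are hypothetical and are not taken from the actual data mart; only Account_No and Origin_ID come from the example above.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class AccountRow:
    origin_id: int      # surrogate key (integer generated by the warehouse)
    account_no: str     # natural key from the source system
    account_name: str   # hypothetical Type 1 attribute, overwritten in place


class Type1Dimension:
    """Hypothetical in-memory stand-in for a Type 1 dimension table."""

    def __init__(self):
        self._rows = {}            # natural key -> AccountRow
        self._next_id = count(1)   # surrogate key generator

    def upsert(self, account_no, account_name):
        row = self._rows.get(account_no)
        if row is None:
            # New natural key: insert with a new surrogate key.
            row = AccountRow(next(self._next_id), account_no, account_name)
            self._rows[account_no] = row
        else:
            # Existing natural key: overwrite the attribute, keep the surrogate key.
            row.account_name = account_name
        return row

    def rows(self):
        return list(self._rows.values())


dim = Type1Dimension()
dim.upsert("1234", "Jhon")   # misspelled name arrives first
dim.upsert("1234", "John")   # the correction overwrites it; no history is kept
```

After both calls the dimension still holds a single row for account 1234, with the corrected name and the original surrogate key.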
Type 2
The Type 2 method tracks historical data by creating multiple records for a given
natural key in the dimensional tables with separate surrogate keys and/or different
version numbers. With Type 2, we have unlimited history preservation as a new
record is inserted each time a change is made.
In the same example, if the Customer moves to Illinois, the table could look like this,
with incremented version numbers to indicate the sequence of changes:
Start and End date columns are defined in the table; using them, we can find out
which changes happened to a particular customer and when.
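A minimal Python sketch of the Type 2 mechanics follows, assuming a version number plus start and end dates on every row. The function name apply_type2_change, the column names, the customer number, the original state "CA", the dates, and the 31-Dec-9999 sentinel are all assumptions made for illustration.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed sentinel end date for the current row


def apply_type2_change(rows, customer_no, state, change_date):
    """Append a new version for customer_no and close the previous one."""
    history = [r for r in rows if r["customer_no"] == customer_no]
    current = max(history, key=lambda r: r["version"], default=None)
    if current is not None:
        current["end"] = change_date  # close the old version
    rows.append({
        "surrogate_key": max((r["surrogate_key"] for r in rows), default=0) + 1,
        "customer_no": customer_no,
        "state": state,
        "version": 0 if current is None else current["version"] + 1,
        "start": change_date,
        "end": HIGH_DATE,
    })
    return rows


dim = []
apply_type2_change(dim, "C001", "CA", date(2008, 1, 1))
apply_type2_change(dim, "C001", "IL", date(2008, 10, 24))  # customer moves to Illinois
```

Each change adds a row with a new surrogate key and the next version number, while the earlier row is kept and only its end date is set.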
Type 3
The Type 3 method tracks changes using separate columns. Whereas Type 2 has
unlimited history preservation, Type 3 has limited history preservation, as it is limited
to the number of columns we designate for storing historical data. While the table
structure for Type 1 and Type 2 is very similar, Type 3 adds additional columns to
the table.
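The limited history of Type 3 can be illustrated with a short Python sketch that keeps exactly one "previous" column. The column names current_state and previous_state and the sample values are assumptions for illustration only.

```python
def apply_type3_change(row, new_state):
    """Keep one level of history: move current to previous, then overwrite."""
    if new_state != row["current_state"]:
        row["previous_state"] = row["current_state"]
        row["current_state"] = new_state
    return row


customer = {"customer_no": "C001", "current_state": "CA", "previous_state": None}
apply_type3_change(customer, "IL")
apply_type3_change(customer, "NY")  # "CA" is now lost: only one history column exists
```

With a single history column, the second change pushes the original value out of the table, which is the limitation described above.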
Test Objective
3. The target system is able to process and store the data that was sent from the
source system.
Test Strategy
The 2x2 matrix below shows how testing can be done and which scenarios pass or
fail.
• Any change in the source is reflected in the target: valid test / Pass
• A change in the source system is not reflected in the target: valid test / Fail
• No change in the source but a change in the target: valid test / Fail
Source       Target Change      Target No change
No change    Valid test/Fail    Valid test/Pass
Change       Valid test/Pass    Valid test/Fail
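The matrix reduces to a single rule: a scenario passes when the target changes exactly when the source changes. A small Python sketch of that rule follows; the function name classify is hypothetical.

```python
def classify(source_changed: bool, target_changed: bool) -> str:
    """Return the expected result for one cell of the 2x2 test matrix."""
    return "Pass" if source_changed == target_changed else "Fail"


# Enumerate all four cells of the matrix.
matrix = {
    (source, target): classify(source, target)
    for source in (True, False)
    for target in (True, False)
}
```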
Test Approach
Type 1
Create a view on top of dimensions that joins on itself to retrieve the most recent
Type-1 attributes.
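One possible shape for such a view is sketched below using Python's built-in sqlite3 module. The table name dim_account, the view name v_dim_account, the is_current flag, and the sample rows are assumptions for illustration; the real dimension may identify the current row differently, for example by its end date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_account (
    surrogate_key INTEGER PRIMARY KEY,
    account_no    TEXT,
    account_name  TEXT,     -- Type-1 attribute
    account_type  TEXT,     -- Type-2 attribute
    is_current    INTEGER   -- assumed flag: 1 marks the latest row
);
INSERT INTO dim_account VALUES
    (1, '1234', 'John',   'P', 0),
    (2, '1234', 'John S', 'S', 1);

-- Self-join: every historic row (h) picks up the Type-1 value of the current row (c).
CREATE VIEW v_dim_account AS
SELECT h.surrogate_key,
       h.account_no,
       c.account_name,
       h.account_type
FROM dim_account AS h
JOIN dim_account AS c
  ON c.account_no = h.account_no
 AND c.is_current = 1;
""")

rows = conn.execute(
    "SELECT surrogate_key, account_name, account_type "
    "FROM v_dim_account ORDER BY surrogate_key"
).fetchall()
```

The base table keeps each row as loaded, while the view reports the most recent Type-1 value against every historic row, so no historic row has to be rewritten.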
Suppose we receive a Type-1 change to the account name, changing it to ‘John S’.
If we had this data in the dimension:
Account_No  Account_Name
1234        John
1234        John S
This means aggregations are invalid, because the same account appears under two
names. If we update the historical row as well, aggregation remains correct. The
dimension will now look like:
Here the Account Name (Type-1) has changed as has the Account Type (Type-2).
In this situation the new row will flow down the Type-1 change route, updating the
Account_Name and Account_Type for all historic rows and current row. The table will
now look like this:
This is all fine and expected. However, have you spotted the issue yet? What happens
if Account No 1234 changes its Account Type, e.g.:
In this situation the checksums for both Type-1 and Type-2 will mark a change: the
Account Type has changed from ‘P’ to ‘S’, causing the Type-2 checksum to change,
and the Description has also changed from ‘Personal’ to ‘Savings’, causing the Type-1
checksum to change.
So, we update all the historic rows with the new Type-1 data and insert the new
Type-2 row (after handling the Type-2 change). Using this logic the Dimension will
now look like this.
Account_No  Account_Name  Account_Type  Description  RowStartDate          RowEndDate
1234        John S        P             Savings      01-Jan-1900 00:00:00  24-Oct-2008 00:00:00
1234        John S        R             Savings      24-Oct-2008 00:00:00  27-Oct-2008 00:00:00
1234        John S        S             Savings      27-Oct-2008 00:00:00  31-Dec-9999 00:00:00
The current row is correct, but the previous rows have had their Description
updated, which is wrong. We want the dimension to look like this:
If a Type-1 update changes an attribute that is related to a Type-2 attribute, do not
update all the historic rows.
SCD Type 1 Testing
1 Record count check: the record count should match between source and target.
2 Compare source data with the dimension/target data; it should match.
3 The primary key should be unique in the target.
4 Target data should be updated using the primary key.
5 The surrogate key (if any) should be unique in the dimension table.
6 If any record is changed in the source, the primary key and surrogate key should
  not change in the target/dimension.
7 Whenever a record is changed/updated in the source, the same record should be
  updated in the dimension with its existing surrogate key.
8 Whenever a new record is created in the source, the same record should be
  inserted in the dimension with a new surrogate key.
9 The load type for the dimension should be upsert (update, else insert).
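Several of the Type 1 checks above (record count, data comparison, key uniqueness) can be automated. The following Python sketch runs them against rows already fetched into dictionaries; the function name check_type1_load, the column names, and the sample data are assumptions, and a real test would read from the source and target databases.

```python
def check_type1_load(source_rows, dim_rows):
    """Return descriptions of failed checks; an empty list means all passed."""
    failures = []

    # Check 1: record counts match between source and target.
    if len(source_rows) != len(dim_rows):
        failures.append("record count mismatch")

    # Check 3: primary (natural) key is unique in the target.
    natural_keys = [r["account_no"] for r in dim_rows]
    if len(natural_keys) != len(set(natural_keys)):
        failures.append("duplicate primary key in target")

    # Check 5: surrogate key is unique in the dimension.
    surrogate_keys = [r["surrogate_key"] for r in dim_rows]
    if len(surrogate_keys) != len(set(surrogate_keys)):
        failures.append("duplicate surrogate key in target")

    # Check 2: target data matches source data for every natural key.
    source_by_key = {r["account_no"]: r["account_name"] for r in source_rows}
    for r in dim_rows:
        if source_by_key.get(r["account_no"]) != r["account_name"]:
            failures.append(f"data mismatch for {r['account_no']}")

    return failures


source = [{"account_no": "1234", "account_name": "John S"}]
good_dim = [{"surrogate_key": 1, "account_no": "1234", "account_name": "John S"}]
bad_dim = good_dim + [{"surrogate_key": 2, "account_no": "1234", "account_name": "John"}]

good_result = check_type1_load(source, good_dim)
bad_result = check_type1_load(source, bad_dim)
```

Checks 6 to 8 need two snapshots of the dimension, taken before and after a load, and are not covered by this sketch.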
SCD Type 2 Testing
A If implemented using version number
1 The surrogate key should be unique.
2 The primary key and version number combination should be unique.
3 Whenever a record is changed/updated in the source, it should be inserted as a
  new record in the dimension and the version number should be increased by one.
4 Whenever a new record is created in the source, it should be inserted as a new
  record in the dimension and the version number should be 0.
5 Compare the latest version data with the source; it should match.
6 No old record should be updated in the dimension.
7 The load type should be insert (changed/new records should be inserted).
8 A new version should be created only for changed records.
9 Every change in a source record should create a new version of the record in the
  dimension.
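The version-number rules above (unique surrogate key, unique key and version combination, versions starting at 0 and increasing by one) can be verified with a short Python sketch. The function name check_type2_versions, the column names, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict


def check_type2_versions(dim_rows):
    """Return descriptions of failed checks; an empty list means all passed."""
    failures = []

    # Surrogate key must be unique across the whole dimension.
    surrogate_keys = [r["surrogate_key"] for r in dim_rows]
    if len(surrogate_keys) != len(set(surrogate_keys)):
        failures.append("duplicate surrogate key")

    # Collect the version numbers recorded for each primary (natural) key.
    versions = defaultdict(list)
    for r in dim_rows:
        versions[r["account_no"]].append(r["version"])

    # For each key the versions must be exactly 0, 1, 2, ... with no gaps or
    # repeats, which also makes the key and version combination unique.
    for key, seen in versions.items():
        if sorted(seen) != list(range(len(seen))):
            failures.append(f"bad version sequence for {key}: {sorted(seen)}")

    return failures


good_dim = [
    {"surrogate_key": 1, "account_no": "1234", "version": 0},
    {"surrogate_key": 2, "account_no": "1234", "version": 1},
    {"surrogate_key": 3, "account_no": "5678", "version": 0},
]
bad_dim = [
    {"surrogate_key": 1, "account_no": "1234", "version": 0},
    {"surrogate_key": 1, "account_no": "1234", "version": 2},
]

good_result = check_type2_versions(good_dim)
bad_result = check_type2_versions(bad_dim)
```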