
Methods of Incremental Loading in Data Warehouse

Written by DWBI Concepts Team

Last Updated: 18 June 2014

Incremental loading, a.k.a. delta loading, is a widely used method to load data into data warehouses
from their respective source systems. This technique is employed to perform faster loads in less time
while using fewer system resources. In this tutorial we will understand the basic methods of
incremental loading.

What is Incremental Loading and why is it required
In almost all data warehousing scenarios, we extract data from one or more source systems and keep
storing it in the data warehouse for future analysis. The source systems are generally OLTP systems
which store everyday transactional data. Now when it comes to loading this transactional data into the
data warehouse, we have 2 ways to accomplish this: Full Load or Incremental Load.
To understand these two loads better, consider a simple scenario. Let's say my source system is an
RDBMS, that is, a database, and I have 2 tables, Customer and Sales.
In the customer table I have details of all my customers in this format:
CustomerID  CustomerName  Type        EntryDate
1           John          Individual  22 Mar 2012
2           Ryan          Individual  22 Mar 2012
3           Bakers'       Corporate   23 Mar 2012
In the sales table, I have the details of products sold to customers. This is how the sales table looks:

ID  CustomerID  ProductDescription  Qty  Revenue  SalesDate
1   1           White sheet (A4)    100  4.00     22 Mar 2012
2   1           James Clip (Box)    1    2.50     22 Mar 2012
3   2           Whiteboard Marker   1    2.00     22 Mar 2012
4   3           Letter Envelope     200  75.00    23 Mar 2012
5   1           Paper Clip          12   4.00     23 Mar 2012
As you can see, the above tables store data for 2 consecutive days, 22 Mar and 23 Mar. On 22 Mar, I had
only 2 customers (John and Ryan) who made 3 transactions in the sales table. The next day, I got one
more customer (Bakers') and recorded 2 transactions, one from Bakers' and one from my old customer
John.
Also imagine we have a data warehouse which is loaded every night with the data from this system.

FULL LOAD METHOD FOR LOADING DATA WAREHOUSE
If we opt for the full load method, we will read the 2 source tables (Customer and Sales) in full
every day. So,
On 22 Mar 2012: We will read 2 records from Customer and 3 records from Sales and load all of them
into the target.
On 23 Mar 2012: We will read 3 records from Customer (including the 2 older records) and 5 records
from Sales (including 3 old records) and will load or update them in the target data warehouse.
As you can clearly guess, this method of loading unnecessarily reads old records that we need not read,
as we have already processed them before. Hence we need to implement a smarter way of loading.

INCREMENTAL LOAD METHOD FOR LOADING DATA WAREHOUSE

In case of incremental loading, we will only read those records that have not already been read and
loaded into our target system (data warehouse). That is, on 22 March we will read 2 records from
Customer and 3 records from Sales; however, on 23 March we will read only 1 record from Customer and
2 records from Sales.
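The difference in rows read on the second day can be sketched in a few lines of Python. This is only an illustration of the counts above, not an ETL implementation: the rows are reduced to (id, date) pairs taken from the sample tables, and the helper names full_load and incremental_load are hypothetical.

```python
from datetime import date

# Sample source rows as (id, date) pairs, taken from the two tables above.
customers = [(1, date(2012, 3, 22)), (2, date(2012, 3, 22)), (3, date(2012, 3, 23))]
sales = [
    (1, date(2012, 3, 22)), (2, date(2012, 3, 22)), (3, date(2012, 3, 22)),
    (4, date(2012, 3, 23)), (5, date(2012, 3, 23)),
]

def full_load(rows):
    """Hypothetical helper: a full load reads every row, every night."""
    return list(rows)

def incremental_load(rows, loaded_until):
    """Hypothetical helper: read only rows dated after the last loaded date."""
    return [row for row in rows if row[1] > loaded_until]

# State on the night of 23 Mar, after 22 Mar has already been loaded.
loaded_until = date(2012, 3, 22)
full_counts = (len(full_load(customers)), len(full_load(sales)))
incr_counts = (
    len(incremental_load(customers, loaded_until)),
    len(incremental_load(sales, loaded_until)),
)
print(full_counts, incr_counts)  # (3, 5) (1, 2)
```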
But how do we ensure that we "only" read those records that are not "already" read? How do we know
which records are already read and which are not?
This is a tricky question but the answer is, fortunately, easy!

We can make use of the "entry date" field in the customer table and the "sales date" field in the sales
table to keep track of this. After each load we will "store" the date until which the loading has been
performed in a data warehouse table, and the next day we will extract only those records that have a
date greater than our stored date. Let's create a new table to store this date. We will call this table
"Batch".
Batch
Batch_ID  Loaded_Until  Status
1         22 Mar 2012   Success
2         23 Mar 2012   Success
Once we have done this, all we have to do to perform incremental or delta loading is to write our data
extraction SQL queries in this format:
Customer Table Extraction SQL
SELECT t.*
FROM Customer t
WHERE t.entry_date > (select nvl(
                               max(b.loaded_until),
                               to_date('01011900', 'MMDDYYYY')
                             )
                      from batch b
                      where b.status = 'Success');
Sales Table Extraction SQL
SELECT t.*
FROM Sales t
WHERE t.sales_date > (select nvl(
                               max(b.loaded_until),
                               to_date('01011900', 'MMDDYYYY')
                             )
                      from batch b
                      where b.status = 'Success');
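To try the extraction pattern without an Oracle database, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation under stated assumptions, not the queries above verbatim: SQLite has no NVL() or TO_DATE(), so COALESCE() is used instead, and dates are stored as ISO text ('YYYY-MM-DD'), which compares correctly as strings. The column names other than entry_date, loaded_until and status, and the function name extract_customer_ids, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, customer_name TEXT,
                           type TEXT, entry_date TEXT);
    CREATE TABLE batch (batch_id INTEGER, loaded_until TEXT, status TEXT);
    INSERT INTO customer VALUES
        (1, 'John',     'Individual', '2012-03-22'),
        (2, 'Ryan',     'Individual', '2012-03-22'),
        (3, 'Bakers''', 'Corporate',  '2012-03-23');
""")

# Same shape as the customer extraction query, with COALESCE in place of NVL
# and an ISO date string in place of TO_DATE('01011900', 'MMDDYYYY').
EXTRACT_CUSTOMERS = """
    SELECT t.customer_id
    FROM customer t
    WHERE t.entry_date > (SELECT COALESCE(MAX(b.loaded_until), '1900-01-01')
                          FROM batch b
                          WHERE b.status = 'Success')
    ORDER BY t.customer_id
"""

def extract_customer_ids(conn):
    """Return the ids of customers newer than the last successful batch."""
    return [row[0] for row in conn.execute(EXTRACT_CUSTOMERS)]

# With an empty batch table the fallback date applies, so every row qualifies.
first_run = extract_customer_ids(conn)

# After recording 22 Mar as loaded, only the 23 Mar customer qualifies.
conn.execute("INSERT INTO batch VALUES (1, '2012-03-22', 'Success')")
second_run = extract_customer_ids(conn)

print(first_run, second_run)  # [1, 2, 3] [3]
```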
Okay, now at this point you may wonder and ask:
How does the above query work?

Let's see...
On the first day (22 Mar):

There won't be any record in our batch table since we have not loaded any batch yet. So "SELECT
max(b.loaded_until)" will return NULL. That is why we have put an NVL() function to replace the NULL
with a very old historical date, 01 Jan 1900 in this case.
So on the first day, we are asking the select query to extract all the data having an entry date (or
sales date) greater than 01 Jan 1900. This will essentially extract everything from the table. Once the
22 Mar loading is complete, we will make one entry in the batch table (entry 1) to mark the successful
extraction of records.
Second day (23 Mar):
The next day, the query "SELECT max(b.loaded_until)" will return 22 Mar 2012. So in effect, the above
queries will reduce to this:
Customer Table Extraction SQL
SELECT t.*
FROM Customer t
WHERE t.entry_date > to_date('22-Mar-2012', 'DD-Mon-YYYY');
Sales Table Extraction SQL
SELECT t.*
FROM Sales t
WHERE t.sales_date > to_date('22-Mar-2012', 'DD-Mon-YYYY');
As you can understand, this will ensure that only 23 Mar records are extracted from the table, thereby
performing a successful incremental load. After this load completes successfully, we will make one
more entry in the batch table (entry number 2).
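The two-day walkthrough can be simulated end to end. The following is a sketch only, again using sqlite3 with ISO date strings and COALESCE() as stand-ins for the Oracle syntax. The target table dw_sales and the function nightly_load are hypothetical names introduced for illustration, and the sales table is trimmed to three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales    (id INTEGER, customer_id INTEGER, sales_date TEXT);
    CREATE TABLE dw_sales (id INTEGER, customer_id INTEGER, sales_date TEXT);
    CREATE TABLE batch (batch_id INTEGER PRIMARY KEY AUTOINCREMENT,
                        loaded_until TEXT, status TEXT);
""")

def nightly_load(conn, run_date):
    """Extract rows newer than the last successful batch, load them into the
    hypothetical target table, then record the batch. Returns rows loaded."""
    rows = conn.execute("""
        SELECT t.id, t.customer_id, t.sales_date
        FROM sales t
        WHERE t.sales_date > (SELECT COALESCE(MAX(b.loaded_until), '1900-01-01')
                              FROM batch b
                              WHERE b.status = 'Success')
    """).fetchall()
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", rows)
    conn.execute(
        "INSERT INTO batch (loaded_until, status) VALUES (?, 'Success')",
        (run_date,),
    )
    return len(rows)

# Day 1: three sales arrive in the source and are loaded that night.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1, 1, "2012-03-22"), (2, 1, "2012-03-22"), (3, 2, "2012-03-22")])
loaded_day1 = nightly_load(conn, "2012-03-22")

# Day 2: two more sales arrive; only those two are read and loaded.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(4, 3, "2012-03-23"), (5, 1, "2012-03-23")])
loaded_day2 = nightly_load(conn, "2012-03-23")

total_in_dw = conn.execute("SELECT COUNT(*) FROM dw_sales").fetchone()[0]
print(loaded_day1, loaded_day2, total_in_dw)  # 3 2 5
```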

Why is MAX() used in the above query?
When we loaded the 23 Mar data, there was only one entry in the batch table (that of the 22nd). But
when we go to load the 24th data or any data after that, there will be multiple entries in the batch
table. We must take the max of these entries.
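As a small illustration of this point, the sketch below (sqlite3 again, with ISO date strings as an assumption) shows a batch table holding two successful entries: the subquery without MAX() yields two dates, while MAX() collapses them to the single latest date that the comparison needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batch (batch_id INTEGER, loaded_until TEXT, status TEXT);
    INSERT INTO batch VALUES
        (1, '2012-03-22', 'Success'),
        (2, '2012-03-23', 'Success');
""")

# Without MAX(): one date per successful batch, so more than one candidate.
all_dates = [row[0] for row in conn.execute(
    "SELECT loaded_until FROM batch WHERE status = 'Success' ORDER BY batch_id")]

# With MAX(): exactly one value, the most recent successfully loaded date.
watermark = conn.execute(
    "SELECT MAX(loaded_until) FROM batch WHERE status = 'Success'").fetchone()[0]

print(all_dates, watermark)  # ['2012-03-22', '2012-03-23'] 2012-03-23
```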

Why is the status field created in the batch table?
This is because it might so happen that the 23rd load has failed. So when we start loading again on the
24th, we must take into consideration both the 23rd data and the 24th data.
Batch_ID  Loaded_Until  Status
1         22 Mar 2012   Success
2         23 Mar 2012   Fail
3         24 Mar 2012   Success
In the above case, the 23rd batch load was a failure. That is why the next day we selected all the data
after 22 Mar (including 23rd and 24th Mar).
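The effect of the status filter can be checked with a short sketch. It models the batch table as it stands just before the 24 Mar load (batches 1 and 2 only) and compares the extraction with and without the status condition. As before, sqlite3, ISO date strings and COALESCE() are stand-ins for the Oracle syntax; the 24 Mar sale (id 6) is a hypothetical row, and extract_ids is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER, sales_date TEXT);
    CREATE TABLE batch (batch_id INTEGER, loaded_until TEXT, status TEXT);
    INSERT INTO sales VALUES
        (1, '2012-03-22'), (2, '2012-03-22'), (3, '2012-03-22'),
        (4, '2012-03-23'), (5, '2012-03-23'),
        (6, '2012-03-24');   -- hypothetical 24 Mar sale, not in the article
    INSERT INTO batch VALUES
        (1, '2012-03-22', 'Success'),
        (2, '2012-03-23', 'Fail');
""")

def extract_ids(conn, status_filter):
    """Run the extraction with a fixed, trusted WHERE fragment on batch."""
    query = f"""
        SELECT t.id
        FROM sales t
        WHERE t.sales_date > (SELECT COALESCE(MAX(b.loaded_until), '1900-01-01')
                              FROM batch b {status_filter})
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(query)]

# Filtering on status ignores the failed batch, so 23 Mar rows are re-read.
with_status = extract_ids(conn, "WHERE b.status = 'Success'")

# Ignoring status treats 23 Mar as loaded and silently skips its rows.
without_status = extract_ids(conn, "")

print(with_status, without_status)  # [4, 5, 6] [6]
```

Without the status condition, the two 23 Mar sales would never reach the warehouse, which is the situation the status field is meant to prevent.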
Now that we have discussed the general concepts of incremental loading, next please read Incremental
Loading for Dimension Table (/etl/etl/54incrementalloadingfordimensiontable.html) and Incremental
Loading for Fact Tables (/etl/etl/55incrementalloadingforfacttables.html), where we will discuss
specific approaches.

Copyright
(https://creativecommons.org/licenses/byncsa/4.0/)
Except where otherwise noted, contents of DWBI.ORG by Intellip LLP (http://intellip.com) are licensed
under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.