
ETL Estimate Guidelines

Version:
Updated By:
Date Updated:

Constants
Hours/Day: 8

References:
ETL Complexity Calculator: http://etlcode.com/index.php/utility/etl_complexity_calculator
ETL Estimation Tool: http://etlcode.com/index.php/utility/etl_estimate

(In the source workbook, certain cells can be user-entered.)

Application Complexity Matrix

ETL Application   Base Time      #        Assessment   Design        Coding        Unit Test    Integration   QA            Implementation  Totals
Complexity        Days   Hours   Modules  %    Hours   %    Hours    %    Hours    %    Hours   %    Hours    %    Hours    %    Hours      %     Hours    Days
Very Simple       1      8       5        5%   2.00    30%  12.00    20%  8.00     10%  4.00    10%  4.00     20%  8.00     5%   2.00       100%  40.00    5.00
Simple            2      16      6        5%   4.80    30%  28.80    20%  19.20    10%  9.60    10%  9.60     20%  19.20    5%   4.80       100%  96.00    12.00
Medium            5      40      8        5%   16.00   30%  96.00    20%  64.00    10%  32.00   10%  32.00    20%  64.00    5%   16.00      100%  320.00   40.00
Complex           8      64      2        5%   6.40    30%  38.40    20%  25.60    10%  12.80   10%  12.80    20%  25.60    5%   6.40       100%  128.00   16.00
Very Complex      15     120     1        5%   6.00    30%  36.00    20%  24.00    10%  12.00   10%  12.00    20%  24.00    5%   6.00       100%  120.00   15.00
Totals                           22            35.20        211.20        140.80        70.40        70.40         140.80        35.20            704.00   88.00

(Coding and Unit Test make up the Build group; Integration and QA make up the Test group. Row hours = base days x Hours/Day x number of modules.)
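
The matrix is pure arithmetic: total hours for a row are base days x Hours/Day x module count, and each phase receives a fixed percentage of that total. A minimal Python sketch of the calculation (the names and structure are illustrative, not taken from the workbook):

    HOURS_PER_DAY = 8  # from the Constants section

    # Base effort in days for one module at each complexity level.
    BASE_DAYS = {"Very Simple": 1, "Simple": 2, "Medium": 5,
                 "Complex": 8, "Very Complex": 15}

    # Phase percentages from the matrix; they must sum to 100%.
    PHASES = {"Assessment": 0.05, "Design": 0.30, "Coding": 0.20,
              "Unit Test": 0.10, "Integration": 0.10, "QA": 0.20,
              "Implementation": 0.05}

    def matrix_row(complexity, modules):
        """Total hours for `modules` modules at a complexity level, split by phase."""
        total_hours = BASE_DAYS[complexity] * HOURS_PER_DAY * modules
        row = {phase: round(pct * total_hours, 2) for phase, pct in PHASES.items()}
        row["Total Hours"] = total_hours
        row["Total Days"] = total_hours / HOURS_PER_DAY
        return row

    # Reproduces the "Medium" row: 5 days x 8 h x 8 modules = 320 h = 40 days.
    print(matrix_row("Medium", 8))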

Complexity Level Table

Complexity 1 - Very Simple
Guidelines for Classification:
- Single source.
- No table joins.
- No expression transformation.
- One-to-one mapping.
- Single path in the mapping pipeline.
Examples:
- Staging mappings.

Complexity 2 - Simple
Guidelines for Classification:
- Single or multiple sources, but not more than 2.
- Simple logic applied to or implemented in a mapping.
- Single path in the mapping pipeline, with up to 2 lookups.
Examples:
- Type 1 mappings.

Complexity 3 - Medium
Guidelines for Classification:
- Single or multiple sources, but not more than 3.
- Moderately complex logic applied to or implemented in a mapping.
- Single or multiple paths in the mapping pipeline, but not more than 3 paths, and with up to 5 lookups.
Examples:
- Type 1 mappings with error handling.
- Type 2 mappings with or without error handling.
- Type 2 mappings with both Type-1 and Type-2 change triggers using the Type 2 Plug-In Maplet.

Complexity 4 - Complex
Guidelines for Classification:
- Single or multiple sources, but not more than 4.
- Multiple paths in the mapping pipeline, but not more than 5 paths, and with up to 10 lookups.
- Complex business/transformation rules.
- Use of mapplets, but not more than 3 implemented in a mapping.
- More significant logic applied to or implemented in a mapping.
Examples:
- Type 2 mappings with or without error handling.
- Type 2 mappings with both Type-1 and Type-2 change triggers using the Type 2 Plug-In Maplet.

Complexity 5 - Very Complex
Guidelines for Classification:
- Single or multiple sources, more than 4.
- Very complex logic applied to or implemented in a mapping.
- Very complex business/transformation rules.
- Complex ETL process.
- Significant data anomalies.
- Use of mapplets, more than 3 implemented in a mapping.
- Impact to all mapping batches for the application.
- Multiple paths in the mapping pipeline, with more than 5 paths and more than 10 lookups.
Examples:
- This is a very rare scenario. It is strongly recommended that, when designing a very complex ETL process, the process be broken down into simpler processes. However, the time needed to develop the resulting process(es) should remain within the estimate.
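
The countable thresholds in the table lend themselves to a quick pre-screen; the qualitative criteria (logic complexity, data anomalies, business rules) still need human judgement. A rough sketch under that caveat, with an illustrative function name and inputs:

    def complexity_level(sources, paths, lookups, mapplets):
        """Threshold-based pre-screen per the table above; qualitative
        criteria can still push a mapping up a level."""
        if sources > 4 or paths > 5 or lookups > 10 or mapplets > 3:
            return 5  # Very Complex
        if sources <= 1 and paths <= 1 and lookups == 0 and mapplets == 0:
            return 1  # Very Simple
        if sources <= 2 and paths <= 1 and lookups <= 2 and mapplets == 0:
            return 2  # Simple
        if sources <= 3 and paths <= 3 and lookups <= 5 and mapplets == 0:
            return 3  # Medium
        return 4  # Complex

    print(complexity_level(sources=3, paths=2, lookups=4, mapplets=0))  # 3 (Medium)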
Estimates (effort)
Estimates include design, development, and testing.

Source to Integration:
10-20 days per table

Data Mart:
Average: 35 days (summaries, light derivations)
Complex: 70 days (complex derivations, multiple events)

Catalogue:
Average: 10 days

Cubes/Reports:
Average: 15 days

Estimates (phases)
Assessment 5%
Design 30%
Development 30%
Test 30%
Implementation 5%
                                                     Level
#        Task                                        Easy    Medium    Complex
1        Requirements Gathering
2        Data Mapping
3        Design - Macro Level
4        Design - Technical Level
5        Development and Testing
6        Data Extraction                             0
6.1      Processes to Move Data to the Staging Area  0
6.1.1    Dimension Tables
6.1.2    Fact Tables
6.2      Data Cleansing
6.3      Development System                          0
6.3.1    Load All Data for the Process               0
6.3.1.1  Verification
6.3.1.2  Modifications to the ETL
6.4      Production System                           0
6.4.2    Load All Data for the Process               0
6.4.2.1  Verification
7        Testing in the QA/Production Environment
8        Integration and Load Testing
9        Verification and Support

Totals per Source

Total Hours    0    0    0
Total Days
Total Weeks

Totals
Number of Data Sources    1    1    1
Total Effort (Hours)      0    0    0
Total Days                0    0    0
Total Weeks               0    0    0
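
The worksheet's hour/day/week totals follow from the Hours/Day constant at the top of the document. A minimal conversion sketch, assuming a 5-day working week (the week length is not stated in the original):

    HOURS_PER_DAY = 8   # from the Constants section
    DAYS_PER_WEEK = 5   # assumption: standard working week, not stated in the source

    def effort_totals(hours):
        """Convert total effort in hours into days and weeks."""
        days = hours / HOURS_PER_DAY
        weeks = days / DAYS_PER_WEEK
        return days, weeks

    print(effort_totals(120))  # 120 h -> (15.0 days, 3.0 weeks)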
easy - 1 hour (simple import)
medium - 5 hours (plus transforms, multiple tables)
difficult - 15 hours (plus script tasks, data merge, lookups, loops, etc.)



Relational Table Prepare Job

Prepares a data source for a load into a relational database table.

1 day base.
0.25 for rejection of rows with missing parents.
0.5 for output of augmentation requests for missing parents.
0.5 for any type of hierarchy validation.
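
The adders simply stack on the base, so a prepare-job estimate is a sum. A minimal sketch of that pattern (the option names are illustrative labels, not from the source):

    # Adders, in days, from the list above.
    PREPARE_ADDERS = {
        "reject_missing_parents": 0.25,
        "augmentation_requests": 0.5,
        "hierarchy_validation": 0.5,
    }

    def prepare_job_days(*options):
        """1 day base plus the selected adders."""
        return 1.0 + sum(PREPARE_ADDERS[opt] for opt in options)

    # A prepare job that rejects orphan rows and validates a hierarchy:
    print(prepare_job_days("reject_missing_parents", "hierarchy_validation"))  # 1.75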

Dimension Prepare Job

Takes a source database SQL statement or a source flat file and creates staging files with transformations applied.

1 day base.
0.5 for Type I unit testing.
0.5 for Type II unit testing.

End to End Load

A job that extracts data, prepares it and loads it all in one is a combination of the
estimates from above. Just merge the times for the prepare and load jobs into one.
The fact job has a high base estimate and attracts a lot of general overheads such
as lookups and joins.

Data Volume Weighting

Very high volumes of data can take longer to develop: more time is spent making the
job as efficient as possible, unit testing of large volumes takes more time, and
optimisation testing adds further time.

Low volume = 1
Medium volume = 1.25
High volume = 1.5
Very high volume = 2
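
Applying the weighting is a straight multiplication of the unweighted estimate. A minimal sketch (the dictionary keys are illustrative):

    VOLUME_WEIGHT = {"low": 1.0, "medium": 1.25, "high": 1.5, "very high": 2.0}

    def volume_weighted_days(base_days, volume):
        """Scale an unweighted estimate by the data-volume factor."""
        return base_days * VOLUME_WEIGHT[volume]

    print(volume_weighted_days(3.0, "very high"))  # a 3-day job becomes 6.0 days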

Examples

I extract a list of invoices from a system and load it to some relational data store
tables. An invoice is made up of invoice header records and invoice item records (an
invoice can have many items on it).

Extract invoice header to flat file: 1 day

Extract invoice item to flat file: 1 day

Prepare invoice header file: 1 (base) + 1 (four joins) + 0.5 (change data capture)
+ 0.5 (validate fields) = 3 days.

Load invoice header data: 1 day.

Prepare invoice item file: 1 (base) + 0.5 (two joins) + 0.25 (reject missing header)
+ 0.5 (change data capture) = 2.25 days.

Load invoice item data: 1 (base) + 0.5 (before-sql disable constraints, bulk load,
after-sql enable constraints) = 1.5 days.
An expert (4+ years) would complete all jobs in 9.75 days. A novice would take 29.25
days with the x3 weighting.
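
The example reduces to a sum of six job estimates followed by the skill weighting. A sketch that reproduces the figures (the job labels are just for readability):

    JOBS = {
        "extract invoice header": 1.0,
        "extract invoice item":   1.0,
        "prepare invoice header": 1.0 + 1.0 + 0.5 + 0.5,   # base + four joins + CDC + field validation
        "load invoice header":    1.0,
        "prepare invoice item":   1.0 + 0.5 + 0.25 + 0.5,  # base + two joins + reject orphans + CDC
        "load invoice item":      1.0 + 0.5,               # base + before/after-sql bulk load
    }

    expert_days = sum(JOBS.values())   # skill weighting 1 (4+ years)
    novice_days = expert_days * 3      # skill weighting 3 (novice)
    print(expert_days, novice_days)    # 9.75 29.25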

Table Append, Load, Insert, Update or Delete

Loads data into a relational table via insert, update or delete.

0.5 day base.
0.25 for a bulk load.
0.25 for user-defined SQL.
0.25 for before-sql or after-sql requirements.
0.5 for rollback of the table after a failed load.
1 for restart from the last row after a failed load.

Fact Prepare Job

Loads fact data. The validation against dimensions is a general lookup overhead. Fact jobs tend to be the most complex jobs, having a lot of source tables and validating against multiple dimensions. They also tend to have the most complex functions.

3 days base.
0.25 per source table (adds to SQL select, lookup and change capture complexity).
0.25 for calculations (this is a custom setting).
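
Unlike the flat adders elsewhere, the fact job scales with the number of source tables. A minimal sketch, treating the calculations adder as a simple flag (the source calls it a custom setting):

    def fact_prepare_days(source_tables, has_calculations=False):
        """3-day base, 0.25 days per source table, optional calculations adder."""
        days = 3.0 + 0.25 * source_tables
        if has_calculations:
            days += 0.25
        return days

    print(fact_prepare_days(source_tables=6, has_calculations=True))  # 4.75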

Skill Weighting

The skill weighting alters the estimated time based on the experience and confidence of the developer. For lack of an alternative, skill is defined as the number of years of experience with the tool. An experienced consultant has the base weighting of 1 (no effect on estimates), with less experienced staff attracting more time.

4+ years = 1
3-4 years = 1.25
1-2 years = 1.5
6 months to 1 year = 1.75
Up to 6 months = 2
Novice = 3

The total number of days estimated for each job is multiplied by the skill weighting to get a final estimate.
