
National Enterprise Wide Statistical System (NEWSS) DATA MIGRATION

Azizah Bt Hashim, Nur Hurriyatul Huda Bt Abdullah Sani

Abstract
This paper examines the benefits of data migration in the National Enterprise Wide Statistical System (NEWSS) project to the Department of Statistics Malaysia (DOSM). The migrated data are drawn from NEWSS Phases I and II, including data from the Economic Census 2005 and 2010. The ETL model is a comprehensive method for studying NEWSS data migration because it allows us to investigate the effect on data across sectors in the Department. The general finding of this paper is that the contribution of data migration activities to the operation of NEWSS increased significantly in 2014 compared with 2011. This is in line with the Department's mission and objectives: to produce national statistics with integrity and reliability through the use of the best technology, and to improve and strengthen statistical services and the delivery system.
Keywords: Database Migration, Data Migration, ETL, Objectives and Mission

1. Introduction

With the rapid growth of business requirements in DOSM and new enterprise-wide application integration, organizations reach a stage where they have to change from working with separate databases on multiple platforms to a single, integrated one. Migration also happens when an organization realizes that its existing systems have performance and scalability limitations that cannot cater to ever-expanding business needs.
Data migration is the process of transferring data between storage types, formats, or computer systems. It is required when organizations or individuals change computer systems, upgrade to new systems, or when systems merge. Data migration is usually performed programmatically to achieve an automated migration.
Figure 1: Data migration flow in DOSM

There is a difference between data migration and database migration, although database migration also encompasses data migration. Database migration essentially means the movement of data and the conversion of various other structures and objects associated with the database, including the schema and the applications associated with the current system, to a different technology/platform. Database migration is one of the most common but major tasks in any application migration. Examples of activities comprised in database migration are:

- Business logic: stored procedures, triggers, packages, functions
- Schema: tables, views, synonyms, sequences, indexes
- Physical data: security, users, roles, privileges
- Database dependencies of applications associated with the database

Data migration is simply the movement of data from one database (or file system)/platform to another. This may include extraction of the data, cleansing of the data, and loading it into the target database. For example, when a new application is developed, data is required for the newly developed application to operate. In this case only the data is moved from the source database to the database used by the new application.
In simple terms, database migration occurs when there is a shift from one type of database system to an entirely new type, or to a database system with entirely new features and functionality. Hence data migration is a subset of database migration when database migration activities are carried out, although data migration may also be undertaken independently.
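As a sketch of this extract-cleanse-load flow, the snippet below moves rows between two SQLite databases. The `establishments` table, its columns, and the cleansing rule are hypothetical illustrations, not DOSM's actual schema or rules.

```python
import sqlite3

def migrate_table(src, tgt):
    """Extract rows from src, cleanse them, and load them into tgt."""
    tgt.execute("CREATE TABLE IF NOT EXISTS establishments "
                "(id INTEGER PRIMARY KEY, name TEXT, state TEXT)")

    # Extract from the source database.
    rows = src.execute("SELECT id, name, state FROM establishments").fetchall()

    # Cleanse: drop rows with a missing name and normalize whitespace.
    clean = [(i, n.strip(), s) for (i, n, s) in rows if n and n.strip()]

    # Load the cleansed rows into the target database.
    tgt.executemany("INSERT INTO establishments VALUES (?, ?, ?)", clean)
    tgt.commit()
    return len(clean)
```

In a real migration the cleansing step would apply the agreed validation rules rather than a single null check, but the three-stage shape stays the same.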

An interesting question is why it is necessary to move to another database while the existing systems run well on the current one. The reasons why data migration activities were carried out in the NEWSS project in DOSM are:
1. To avoid business failure
2. To improve corporate performance and deliver competitive advantage
3. To achieve efficient and effective business processes (a centralized database)
4. To obtain a measurable and accurate view of data
5. To realize better value in the newer system in terms of standardization of operational fieldwork and data entry

2. Literature review

A number of studies have been conducted on best practices for data migration, for example Best Practices for Data Migration: Methodologies for Assessing, Planning, Moving and Validating Data Migration by IBM Global Technology Services (October 2009) and Data Migration Best Practices by NetApp Global Services (January 2006). Meanwhile, a study on database migration approach and planning was done by Keshav Tripathy, Pragjnyajeet Mohanty and Biraja Prasad Nath (2002).

In the study by Martin Wagner (March 17, 2011), Patterns for Data Migration Projects, he concludes that the quality constraints on the data in the old system may be lower than the constraints in the target system. Inconsistent or missing data entries that the legacy system somehow copes with (or ignores) might cause severe problems in the target system. In addition, the data migration itself might corrupt the data in a way that is not visible to the software developers but only to business users.
NetApp Global Services (January 2006), in Data Migration Best Practices, states that for IT managers, data migration has become one of the most routine, and challenging, facts of life. With the increase in the percentage of mission-critical data and the proportionate increase in data availability demands, downtime, with its huge impact on a company's financial bottom line, becomes unacceptable. In addition, business, technical and operational requirements impose challenging restrictions on the migration process itself. Resource demands (staff, CPU cycles, and bandwidth) and risks (application downtime, performance impact to production environments, technical incompatibilities, and data corruption or loss) make migration one of IT's biggest challenges. Since the majority of storage systems purchased by customers are used to store existing rather than new data, getting these new systems production-ready requires that data be copied or moved from the old system being replaced to the new system being deployed. Whether the migration is performed by internal IT or an external services provider, the migration methodology is the same.
On the other hand, IBM Global Technology Services (October 2009) mentions that when systems must be taken down for migration, business operations can be seriously affected. A key way to minimize the business impact of data migration is to use best practices that incorporate planning, technology implementation and validation. Any change in the storage infrastructure, whether it is a technology refresh, consolidation, relocation or storage optimization, requires an organization to migrate data.
There is a variety of software products that can be used to migrate data, including volume-management products, host- or array-based replication products and relocation utilities, as well as custom-developed scripts. Each of these has strengths and weaknesses in terms of performance, operating system support, storage-vendor platform support, and whether or not application downtime is required to migrate the data. Some of these products enable online migration of data, so applications do not need to be taken offline during the migration process. A subset of these provides nondisruptive migration, which means that applications not only remain online, but also that application processing continues without interruption or significant performance delays. Therefore, IT organizations should carefully explore software options. Specific requirements can help determine the best software technology to use for each migration.
In addition, Keshav Tripathy, Pragjnyajeet Mohanty and Biraja Prasad Nath, in Database Migration Approach & Planning, state that database migration consists of three major components:
Schema migration - This consists of mapping and migrating the source schema to the target schema. For this, the schema needs to be extracted from the source system and its equivalent replicated in the target system.
Data migration - This is the part where the data is extracted from the source database. It is then checked for consistency and accuracy, and cleansed if necessary. Finally, it is loaded into the target system.
Application migration - This necessarily consists of changing the database-dependent areas (function calls, data-access methods, etc.) of the application so that the input/output behavior of the converted application with the target database is exactly identical to that of the original application with the source database.
However, this paper focuses on the methodology of data migration only, covering the data for NEWSS Phases I and II. The problems with migration in DOSM overall include the difficulty of obtaining the final source data before the completion of the migration project. Hence, the methodology and approach applied here help in the preparation of quality data for the Department.

3. Methodology

3.1 Definition of Data Migration

According to en.wikipedia.org, data migration is the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration; performing it programmatically frees human resources from tedious tasks. It is also required when organizations or individuals change computer systems, upgrade to new systems, or when systems merge. In the DOSM environment, migration happens by transferring databases from old silo systems to an integrated one.
For the purpose of this study, we considered the impact of data migration in the National Enterprise Wide Statistical System (NEWSS) on the Department of Statistics Malaysia (DOSM).
3.2 Source of Data
The migrated data are drawn from NEWSS Phases I and II, including data from the Economic Census 2005 and 2010.

3.3 Analysis
The processes that occur during migration are analysis, mapping, planning, designing, testing, loading and verifying.
The analysis happens in the source system; after that, the data is extracted and transformed into the staging area. The staging area is a workspace where we clean the data, apply rules, and validate the data before loading it into the target.
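The staging-area step just described (clean, apply rules, validate before loading) can be sketched as a small rule table checked against each staged record. The field names and the rules themselves are illustrative assumptions, not NEWSS's actual validation rules.

```python
# Hypothetical staging rules: each field maps to a predicate it must satisfy.
RULES = {
    "state_code": lambda v: v in {"01", "02", "03"},          # assumed valid codes
    "year": lambda v: v.isdigit() and 2000 <= int(v) <= 2014,  # assumed valid range
}

def validate_staged(record):
    """Return the list of fields in a staged record that fail their rule."""
    return [field for field, ok in RULES.items()
            if field in record and not ok(record[field])]
```

Records that return an empty list pass validation and can be loaded into the target; records with failures stay in the staging area for correction.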

Figure 1.1: Data Migration Methodology


3.4 Data Migration Life Cycle
The Data Migration Life Cycle comprises six phases: analyze, map, high-level design, detailed design, construct, and test & deploy.

Figure 1.2: Data Migration Life Cycle

3.5 Data Migration Work Flow

Figure 1.3: Data Migration Work Flow


In the data migration process, there are a few steps that need to be done, as below:
a) Prerequisite - All the requirements for migration need to be defined properly, and the field list needs to be prepared based on the NEWSS database format. All questionnaire forms, sample data and explanations of the requirements are needed for the subsequent processes.
b) Mapping - This is the phase in which database fields from the source file are identified and mapped to the fields in the NEWSS databases. The mapping is important in order to identify which field in the source database relates to which field in the NEWSS databases; to identify the related field changes from year to year in the questionnaire and in the source data; and to find information about changes in each field, such as a new field created for the current database, a field dropped, or fields split or combined. All the information gathered in this phase is very important for the next phase to be executed.
c) Designing - In this phase, the migration script is developed based on the mapping gathered in step (b) and by referring to the questionnaire and field list from step (a). After the script has been designed, it is first tested with the sample data gathered in step (a). This testing needs to be performed to ensure that the developed script meets the requirements and is error-free.

d) Data Source - Verification of the data sources and format is done in this phase.
e) Cleansing and Loading Data - Verification of the source data against the migration script is done to check whether the source data is clean. If the verification shows errors, detailed checking and corrections need to be made by SMD. Otherwise, if the verification is successful, the data goes to final verification and is prepared for the next step.
f) Production Load - The data with final verification prepared in step (e) is loaded into the production database. After the data has been loaded, some testing of the loaded data using the NEWSS system needs to be performed by SMD. After successful testing, approval of the migrated data is given by SMD.
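The mapping step above, where source fields change from census year to census year, can be illustrated with a minimal sketch: a per-year field map applied to each source record. All field names here are hypothetical, since the real NEWSS field lists are not reproduced in this paper.

```python
# Hypothetical per-year mappings from source-file fields to NEWSS fields;
# the real field names differ between the 2005 and 2010 censuses.
FIELD_MAP = {
    2005: {"est_name": "establishment_name", "st": "state_code"},
    2010: {"name_of_est": "establishment_name", "state": "state_code"},
}

def map_record(record, year):
    """Rename a source record's fields to the NEWSS schema for that year.

    Fields with no mapping entry (dropped fields) are discarded.
    """
    mapping = FIELD_MAP[year]
    return {mapping[k]: v for k, v in record.items() if k in mapping}
```

Keeping the mapping as data rather than code makes the year-to-year changes described in step (b) easy to version and review, which is one of the challenges noted later in Section 3.8.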
3.6 Master Data Management
Master Data Management comprises four main activities, as below:
1. Identify data sources - In NEWSS, data comes from other statistical systems/OLTP and from various sources such as MS Excel, CSV, MS Access, flat files, MySQL, MS SQL Server, etc.
2. Create data profiling - the process of examining the DOSM data, such as Economic, Population, Trade, External Trade and Labour Force data, across all available existing data sources.
The importance of profiling is:
a) to understand your data completely and fully;
b) to improve data quality (by clarifying its structure, content and relationships);
c) to improve users' understanding of anomalies in the data (the basis for an early go/no-go decision);
d) to discover, register, and assess enterprise metadata (profiling validates metadata when it is available and discovers metadata when it is not);
e) and lastly, to improve data accuracy in corporate databases (which helps to assure that data cleansing and transformations have been done correctly according to requirements).
3. Check data quality - this process aims to discover what data has been missed and where things have gone wrong, enabling confident decisions and reliable data for further data analysis and analytics. For example, it checks that all relevant data such as gender, address, postcode, district, state, and date format is present for a given respondent. Common data problems like misspellings, typing errors, and random abbreviations are cleaned up.
4. Extract, transform and load - Extract refers to extracting data from multiple sources and formats (MS Excel, CSV, MS Access, flat files, Oracle, MySQL, MS SQL Server, etc.) into a single standardized format. Meanwhile, transform involves data mapping, verification, code generation and data conversion. Load transfers the data from the historical data into the production database or end target.
However, in the current NEWSS migration process, the data profiling and data quality activities have not been done because of certain constraints. To substitute for these unavailable components, data checking happens within the extract, transform and load process, although this makes the migration slightly longer to complete. The practice during these ETL activities is that any data problem is passed back to SMD for checking, and SMD checks it and takes the corresponding action. Once completed, SMD sends the data back to BPM. The new data then needs to be validated again, and if there is still a data problem it is returned to SMD for correction. This back-and-forth usually prolongs the ETL activities.
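The profiling step that is currently skipped could start as simply as counting missing and distinct values per column, which supports the go/no-go decision described in Section 3.6. This is a generic sketch, not DOSM's planned profiling phase; the sample fields are illustrative.

```python
def profile(rows):
    """For each column, count missing values and distinct non-missing values.

    rows is a list of dicts that all share the same keys.
    """
    report = {}
    if not rows:
        return report
    for col in rows[0]:
        values = [r[col] for r in rows]
        missing = sum(1 for v in values if v in (None, ""))
        distinct = len({v for v in values if v not in (None, "")})
        report[col] = {"missing": missing, "distinct": distinct}
    return report
```

Even a report this coarse would surface columns that are mostly empty or suspiciously uniform before the ETL scripts are written, reducing the SMD correction loop described above.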

Figure 1.4: Extract, Transform and Load (ETL) diagram

3.7 Migration Tools
In the market for migration tools, there are many software tools used for ETL purposes. Each available tool has its own strengths and weaknesses. Below are the tools that were explored during the migration process conducted in DOSM.
a) Talend - really powerful, stable and customizable; it is quite easily embeddable and produces Java code. The drawback is the learning curve.
b) Pentaho - its ETL tool (named Kettle) is just one component of the Pentaho Business Intelligence open platform. It is Java-based. The major drawback is that Kettle is much harder to extend than Talend.
c) CloverETL - a younger tool; it is light, easily embeddable and easy to learn, but much less powerful than Talend and even than Kettle.

3.8 Challenges
The challenges that occur during the migration process in DOSM are:
a) Analyze - difficulty in the data collection process because data comes from different sources, and misunderstanding of user requirements.
b) Mapping - uncontrollable mapping versioning due to frequent changes to the survey form.
c) Design Migration Script - usually a time-constraint problem, because a large amount of data is requested to be migrated in a short time frame, sometimes with ad hoc requests from SMD.
d) Cleansing Data - hard to get the real final data because of several revision releases. Sometimes data cleansing needs to be confirmed several times with SMD, especially if there is a data problem during the ETL process.
e) Testing - checking the data involves many reference tables, i.e. the Establishment Frame, MSIC Code, Household Frame, locality, etc.
f) Data Loaded - some data-patching activities affect the new version of the final data.


4. Findings and Discussion

4.1 Way Forward
Based on the migration activities done for the NEWSS system in DOSM, some findings and enhancements have been suggested, as below:

CURRENT                                     FUTURE
No profiling activity                       Establish a profiling phase
No centralized final data                   StatsDW
Patching                                    Less data patching
Repeated migration process produces         Final data is fixed and cannot
varying versions of the final data          be altered
Hard to get clean data, which               Fix the data cleansing process
affects the DM process                      at the beginning stage

Table 1: Findings during the migration process

4.2 Contribution of Migration Activities to DOSM
Based on the migration process for NEWSS, the results are as shown in Table 1 above: establishing a profiling phase, centralizing the final data in StatsDW, reducing data patching, fixing the final data so it cannot be altered, and moving data cleansing to the beginning stage.


5. Concluding Remarks

The contribution of data migration to the operation of NEWSS has been proven by the increasing demand from SMD for migrated data. However, final clean data is crucial to the contribution of the migration process. By improving the quality of data in each division, creating data with integrity and reliability for the public through the National Enterprise Wide Statistical System (NEWSS) will add value to DOSM's workforce productivity, in line with the Department's vision of becoming a leading statistical organization internationally by 2020.
Hence, this study was conducted with the aim of investigating whether the data migration activities in the National Enterprise Wide Statistical System (NEWSS) project are in line with the Department's intentions. The migration methodology applied in the ETL process in DOSM contributes to the preparation of quality data and expedites operational fieldwork. The migration process reduces the time needed to update the frame for operational fieldwork from a manual operation taking one to two weeks to a one-day operation.

References

IBM Global Technology Services (October 2009), Best Practices for Data Migration: Methodologies for Assessing, Planning, Moving and Validating Data Migration.
Keshav Tripathy, Pragjnyajeet Mohanty and Biraja Prasad Nath (September 12-13, 2002), Database Migration: Approach & Planning, GDU Surface Transport, Bhubaneswar, Satyam Technology Center.
NetApp Global Services (January 2006), Data Migration Best Practices.
Martin Wagner and Tim Wellhausen (March 17, 2011), Patterns for Data Migration Projects.
Derek Wilson (23 April 2014), Practical Data Migration Strategies.
