11/16/2017 De-duplication of Master Data during large SAP Implementation Projects | SAP Blogs

De-duplication of Master Data during large SAP Implementation Projects
March 7, 2014

Kathiravan Subramaniam

SAP Data Services

https://blogs.sap.com/2014/03/07/de-duplication-of-master-data-during-large-sap-implementation-projects/ 1/13

Abstract
Introduction
Overview of de-duplication process
Advantages of de-duplication process
Execution of de-duplication process
    Initial cleansing of source data
    Source master data comparison
        Rules used in de-duplication logic
        Master data de-duplication table
    Nomination of leading master data
    New master data creation in the target system
    Mapping of non-leading master data to leading master data
    Impact on the transactional data migration
Project team structure for the de-duplication projects
Pitfalls and the mitigation plan in de-duplication projects
Conclusion

Abstract

During a large SAP implementation project that consolidates one or more SAP and non-SAP systems, there is a high likelihood that the same material master and vendor master records reappear in the target system with different names and details. Incorrect master data leads to various issues, such as incorrect reporting. This whitepaper addresses the issue of duplicate records and provides a solution for eliminating them. It describes the advantages of de-duplication and explains the process steps for executing it, including the fields that should be used to identify duplicate records. It also suggests the best team structure for managing de-duplication projects and provides a guide to the common pitfalls and their mitigation plans.

Introduction

Master data cleansing is a major activity in SAP implementation and rollout projects. It is performed in the source systems (SAP and non-SAP systems) and at the intermediary stage before the final data upload into the target system, and it is a key activity when consolidating one or more ERP systems. The primary activity in master data cleansing is the de-duplication of master data. De-duplication of master data refers to the process of finding identical master data within or across the source system(s) and eliminating the duplicates before migrating to the target system. Master data here typically refers to material master data and vendor master data.

Several tools are used in de-duplication projects. Their capabilities are not discussed in this whitepaper.

The aim of this whitepaper is to provide key information about the de-duplication process that can be followed in SAP rollout and implementation projects.

Overview of de-duplication process

The following figure depicts the de-duplication process:

Figure 1: Overview of de-duplication process

The de-duplication process comprises the following steps:

1. Initial cleansing of source data
2. Source data comparison based on de-duplication logic specific to the master data, and preparation of de-duplication reports
3. Nomination of leading master data
4. New master data creation in the target system
5. Mapping of non-leading master data to leading master data

Advantages of de-duplication process

Executing the de-duplication process during rollout/implementation projects has the following advantages:

1. It avoids the incorrect and inconsistent reporting that duplicate material and vendor master records cause
2. It avoids incorrect consumption and availability information in the material master, which would otherwise lead to inaccurate material planning
3. Performing de-duplication at the start of rollout/implementation projects reduces the time and cost of manually identifying duplicates later
4. It improves the overall reliability of material and vendor reports and analysis
5. Removing duplicate vendor master records helps to maintain effective and consistent communication with vendors
6. Consolidated, consistent, harmonized and cleansed master data is a prerequisite for innovation and growth

Execution of de-duplication process

The de-duplication process is executed in the following steps:

Initial cleansing of source data

The scope of master data de-duplication comprises all master data in the source systems except records that meet one or more of the following criteria:

a. Master data that is deleted or blocked at the highest level of the organisational structure. However, master data that is blocked at a lower organisational level might still be active and relevant at another organisational level. Such data should therefore still be considered in the de-duplication exercise.

b. Vendor master data that is not created at company code level, but only at purchasing organisation level for other purposes

c. Master data that cannot be migrated to the target system because mapping values are not available for important fields, such as the unit of measure in the material master
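The three exclusion criteria can be expressed as a simple scope filter. The sketch below is a minimal illustration only. It assumes that each extracted record carries the hypothetical flags `blocked_at_top_level` and `has_company_code_data` plus a unit of measure. `UOM_MAPPING` stands in for the project's own unit-of-measure value mapping, and its entries are made up.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical source-to-target unit-of-measure mapping (illustrative values only)
UOM_MAPPING = {"ST": "PC", "PC": "PC", "KG": "KG", "M": "M"}


@dataclass
class SourceRecord:
    number: str
    kind: str                            # "material" or "vendor"
    blocked_at_top_level: bool = False   # deleted/blocked at the highest org level
    has_company_code_data: bool = True   # vendors: created at company code level?
    uom: Optional[str] = None            # materials: base unit of measure


def in_dedup_scope(rec: SourceRecord) -> bool:
    """Return True if the record should take part in de-duplication."""
    # a. Deleted or blocked at the highest organisational level.
    #    Blocks at lower levels are deliberately not checked here.
    if rec.blocked_at_top_level:
        return False
    # b. Vendors created only at purchasing organisation level.
    if rec.kind == "vendor" and not rec.has_company_code_data:
        return False
    # c. Materials whose unit of measure has no target mapping value.
    if rec.kind == "material" and rec.uom not in UOM_MAPPING:
        return False
    return True


records = [
    SourceRecord("M-1", "material", uom="ST"),
    SourceRecord("M-2", "material", uom="XYZ"),
    SourceRecord("M-3", "material", blocked_at_top_level=True, uom="KG"),
    SourceRecord("V-1", "vendor", has_company_code_data=False),
    SourceRecord("V-2", "vendor"),
]
in_scope = [r.number for r in records if in_dedup_scope(r)]
print(in_scope)  # ['M-1', 'V-2']
```

Keeping the filter separate from the matching logic makes it easy to report the excluded records, with their reasons, to the data owners.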

The initial cleansing of source master data is important because it can dramatically reduce the number of groups of duplicate master records. Initial cleansing includes both enrichment of key master data and corrections to the master data. Predominantly, initial cleansing is performed on the data within the source system only.

A few examples of initial cleansing of source master data are as follows:

a. Correct material masters that contain dummy text

b. Correct material masters of material type HERS that duplicate the material master of type HIBE to which the HERS material is assigned

c. Update key details used in the de-duplication logic that are missing from the material and vendor master data

d. Check and correct redundant partner functions created in the vendor master

The following fields in the material master and vendor master are the focus of initial cleansing or enrichment within the source system:

Material Master:

a. Material description

b. Unit of measure

c. Manufacturer details

d. UNSPSC code

e. Vendor part number

Vendor Master:

a. Vendor name

b. Address

c. VAT registration number

d. DUNS

e. Bank account number
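A completeness check over these key fields helps identify which records need enrichment before comparison. It also covers cleansing example (a), dummy text. The following is a minimal sketch. The field names and the set of "dummy" placeholder values are hypothetical and would need to match the project's own extracts.

```python
# Hypothetical field names for the key fields listed above
MATERIAL_KEY_FIELDS = ["description", "uom", "manufacturer", "unspsc", "vendor_part_number"]
VENDOR_KEY_FIELDS = ["name", "address", "vat_number", "duns", "bank_account"]

# Illustrative placeholder values treated as "dummy text"; extend per project
DUMMY_TEXTS = {"DUMMY", "TEST", "XXX", "N/A", "TBD"}


def cleansing_issues(record: dict, key_fields: list) -> list:
    """List missing or dummy-valued key fields for one master record."""
    issues = []
    for field_name in key_fields:
        value = str(record.get(field_name) or "").strip()
        if not value:
            issues.append(f"missing {field_name}")
        elif value.upper() in DUMMY_TEXTS:
            issues.append(f"dummy text in {field_name}")
    return issues


material = {"description": "Dummy", "uom": "PC", "manufacturer": "Mnfr 1",
            "unspsc": "", "vendor_part_number": "9N-4524"}
print(cleansing_issues(material, MATERIAL_KEY_FIELDS))
# ['dummy text in description', 'missing unspsc']
```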

Source master data comparison

This step is the core of the de-duplication process. After the data is cleansed and enriched, the records are compared against each other. The result of this comparison is a grouping of similar master data. The master data may come from a single ERP system or from multiple ERP systems. Tools are available for comparing the master data and creating the groups of similar master data.

The de-duplication tool applies the de-duplication logic to identify similar master data and builds the master data de-duplication table.
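Tools differ in how they form groups. One common approach is to treat each pairwise match as an edge and take the connected components, for example with a union-find structure. The sketch below assumes a caller-supplied `is_match` predicate, into which rules such as those in the next section could be plugged. It compares every pair, which is only practical for small extracts. Real tools typically restrict comparisons to records that share a "blocking" key.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def group_similar(records: Sequence[dict],
                  is_match: Callable[[dict, dict], bool]) -> List[List[int]]:
    """Group record indexes into connected components of pairwise matches."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the group's root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise comparison: fine for a sketch, too slow for full extracts.
    for i, j in combinations(range(len(records)), 2):
        if is_match(records[i], records[j]):
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    # Only groups with two or more members are candidate duplicates.
    return [members for members in groups.values() if len(members) > 1]


rows = [{"mnfr": "Mnfr 1"}, {"mnfr": "Mnfr 2"}, {"mnfr": "Mnfr 1"},
        {"mnfr": "Mnfr 3"}, {"mnfr": "Mnfr 2"}]
print(group_similar(rows, lambda a, b: a["mnfr"] == b["mnfr"]))
# [[0, 2], [1, 4]]
```

Grouping is transitive: if A matches B and B matches C, all three land in one group even when A and C do not match directly. This is one reason the resulting groups still need manual review.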

Rules used in de-duplication logic

The criteria used to determine the groups of similar master data depend on many factors, such as the availability of data, the level of initial cleansing done and the scope of enrichment performed within the source data.

Some of the rules used in the material master de-duplication logic to identify groups of similar material masters are as follows:

a. Same manufacturer and same base unit of measure

b. Same UNSPSC code

c. Same manufacturer and same vendor part number

d. Similar description

Predominantly, these details are concatenated and text comparisons are performed to arrive at the groups of similar master data.
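To illustrate rules a to d, the sketch below reports which rules fire for a pair of material records. The field names and the UNSPSC value are hypothetical. `difflib.SequenceMatcher` with a 0.8 threshold is only a stand-in for whatever similarity measure a de-duplication tool actually uses. Rule c compares manufacturer and part number as one concatenated key.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Case-insensitive text similarity test (threshold is an assumed value)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio() >= threshold


def material_rules_fired(a: dict, b: dict) -> list:
    """Return the material de-duplication rules that match for records a and b."""
    fired = []
    same_mnfr = bool(a["manufacturer"]) and a["manufacturer"] == b["manufacturer"]
    if same_mnfr and a["uom"] == b["uom"]:
        fired.append("a: manufacturer + base unit of measure")
    if a["unspsc"] and a["unspsc"] == b["unspsc"]:
        fired.append("b: UNSPSC code")
    # Rule c compares a concatenated manufacturer|part-number key.
    key_a = f'{a["manufacturer"]}|{a["vendor_part_number"]}'
    key_b = f'{b["manufacturer"]}|{b["vendor_part_number"]}'
    if same_mnfr and a["vendor_part_number"] and key_a == key_b:
        fired.append("c: manufacturer + vendor part number")
    if similar(a["description"], b["description"]):
        fired.append("d: similar description")
    return fired


mat_1 = {"manufacturer": "Mnfr 1", "uom": "PC", "unspsc": "12345678",
         "vendor_part_number": "9N-4524", "description": "Bolt Hydraulic"}
mat_2 = dict(mat_1, description="Bolt Hydraulc")  # same data, typo in text
print(material_rules_fired(mat_1, mat_2))  # all four rules fire
```

Rule c here compares part numbers exactly, so "9N-4524" and "9N4524" would not match. The part numbers first need the kind of normalization discussed later for Table 2.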

Likewise, some of the rules used in the vendor master de-duplication logic are as follows:

a. Same DUNS number

b. Same bank details

c. Same tax code

d. Same address details, such as name, street number, PO box, postal code, etc.
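The vendor rules can be sketched in the same way. Here, identifiers are normalized before comparison, and the address fields are concatenated into a single key. This is a minimal illustration with hypothetical field names (`duns`, `bank_key`, `bank_account`, `tax_number`, and the address fields). A real project would map these to its own extract layout.

```python
import re

ADDRESS_FIELDS = ("name", "street", "po_box", "postal_code")  # hypothetical names


def norm(value) -> str:
    """Upper-case and keep only ASCII letters and digits ("ACME Corp." -> "ACMECORP")."""
    return re.sub(r"[^0-9A-Z]", "", str(value or "").upper())


def vendor_rules_fired(a: dict, b: dict) -> list:
    """Return the vendor de-duplication rules that match for records a and b."""
    fired = []
    if norm(a.get("duns")) and norm(a.get("duns")) == norm(b.get("duns")):
        fired.append("a: DUNS number")
    bank_a = (norm(a.get("bank_key")), norm(a.get("bank_account")))
    bank_b = (norm(b.get("bank_key")), norm(b.get("bank_account")))
    if all(bank_a) and bank_a == bank_b:
        fired.append("b: bank details")
    if norm(a.get("tax_number")) and norm(a.get("tax_number")) == norm(b.get("tax_number")):
        fired.append("c: tax code")
    # Rule d: concatenate the normalized address fields into one comparison key.
    addr_a = "|".join(norm(a.get(f)) for f in ADDRESS_FIELDS)
    addr_b = "|".join(norm(b.get(f)) for f in ADDRESS_FIELDS)
    if norm(a.get("name")) and addr_a == addr_b:
        fired.append("d: address details")
    return fired


v1 = {"name": "ACME Corp.", "street": "1 Main St", "postal_code": "12345",
      "duns": "12-345-6789", "tax_number": "DE123456789",
      "bank_key": "10020030", "bank_account": "123456"}
v2 = {"name": "ACME CORP", "street": "1 Main St.", "postal_code": "12345",
      "duns": "123456789", "tax_number": "DE 123 456 789",
      "bank_key": "10020030", "bank_account": "123456"}
print(vendor_rules_fired(v1, v2))  # all four rules fire
```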

Master data de-duplication table

The master data de-duplication table is the result of the initial data cleansing activity and the application of the de-duplication logic to the pre-cleansed source data. Based on the de-duplication logic, the de-duplication tool can identify groups of master records within a single ERP system or across multiple ERP systems.

A simple example of a master data de-duplication table is as follows:

Table 1 Master Data De-Duplication Table

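Group number | Source System    | Material   | Material Description | Manufacturer Name
-------------+------------------+------------+----------------------+------------------
1            | SAP System 1     | Material A | Bolt Hydraulic       | Mnfr 1
1            | SAP System 1     | Material B | Bolt                 | Mnfr 1
1            | Non-SAP system 2 | Material C | Bolt, long Oil hydr  | Mnfr 1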

In the above example, the de-duplication logic has processed the source system data and grouped these three materials, which are similar in nature.
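Once groups exist, producing a de-duplication table for reviewers is straightforward. The following is a minimal sketch. It assumes that groups arrive as lists of record indexes, as a grouping step might return them, and that the table is handed over as CSV. Both are assumptions about the project's tooling rather than something prescribed here. The sample records reproduce Table 1.

```python
import csv
import io

# Table 1 rows as hypothetical extracted records
TABLE_1_RECORDS = [
    {"system": "SAP System 1", "material": "Material A",
     "description": "Bolt Hydraulic", "manufacturer": "Mnfr 1"},
    {"system": "SAP System 1", "material": "Material B",
     "description": "Bolt", "manufacturer": "Mnfr 1"},
    {"system": "Non-SAP system 2", "material": "Material C",
     "description": "Bolt, long Oil hydr", "manufacturer": "Mnfr 1"},
]


def dedup_table_csv(records: list, groups: list) -> str:
    """Write one CSV row per grouped record, numbering groups from 1."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Group number", "Source System", "Material",
                     "Material Description", "Manufacturer Name"])
    for group_no, members in enumerate(groups, start=1):
        for i in members:
            r = records[i]
            writer.writerow([group_no, r["system"], r["material"],
                             r["description"], r["manufacturer"]])
    return out.getvalue()


print(dedup_table_csv(TABLE_1_RECORDS, [[0, 1, 2]]))
```

The `csv` module quotes values that contain commas, such as "Bolt, long Oil hydr", so the file stays readable in spreadsheet tools where reviewers can add their nominations.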

Nomination of leading master data

Once similar master data has been grouped, the material that should be migrated into the target system needs to be selected.

Table 2 Master Data De-Duplication Table Appended with Nomination Columns

Group number | Source System    | Material   | Material Description | Manufacturer Name | Vendor part number
-------------+------------------+------------+----------------------+-------------------+-------------------
1            | SAP System 1     | Material A | Bolt Hydraulic       | Mnfr 1            | 9N-4524
1            | SAP System 1     | Material B | Bolt                 | Mnfr 1            | 9N4524
1            | Non-SAP system 2 | Material C | Bolt, long Oil hydr  | Mnfr 1            | 9N-45-24

In the above example, suppose Material A is identified as the leading material that should be migrated into the target system. The other two materials, Material B and Material C, are then identified as non-leading materials, that is, duplicates. The non-leading (duplicate) materials will not be migrated to the target system. A leading material is also referred to as a parent material, and a non-leading material as a child material.

In the above example, note the Vendor part number column. The same part number from the same manufacturer was created in two different systems in three different ways. Text search logic, such as normalization of the text (removing special characters to determine the actual text), should therefore be implemented to identify the duplicates.
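The normalization described above can be sketched in a few lines. The following is one possible normalization, assuming that only ASCII letters and digits are significant in a part number. It reduces all three Table 2 variants to the same key, so the three materials fall into one (manufacturer, part number) group.

```python
import re
from collections import defaultdict


def normalize_part_number(value: str) -> str:
    """Upper-case and strip every character that is not an ASCII letter or digit."""
    return re.sub(r"[^0-9A-Z]", "", value.upper())


# Vendor part numbers from Table 2
table_2 = [("Material A", "Mnfr 1", "9N-4524"),
           ("Material B", "Mnfr 1", "9N4524"),
           ("Material C", "Mnfr 1", "9N-45-24")]

# Group materials by (manufacturer, normalized part number)
by_key = defaultdict(list)
for material, manufacturer, part_number in table_2:
    by_key[(manufacturer, normalize_part_number(part_number))].append(material)

print(dict(by_key))
# {('Mnfr 1', '9N4524'): ['Material A', 'Material B', 'Material C']}
```

Aggressive normalization can also merge part numbers that genuinely differ only in punctuation. The resulting groups are therefore candidates for the manual nomination step, not automatic merges.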

The selection of leading and non-leading materials is a manual activity that should be guided by a few principles, as follows:

If a group contains master data from two different systems, a conflict arises over which system's master data should be given first preference as the leading material. The usual rule of thumb is to give first preference to the master data of the oldest system, provided that it holds up-to-date information. Another approach is to select the master data record that has the most transactional data. Selecting the leading material becomes more complex when the different systems are owned by different internal organisations. The de-duplication process should therefore normally be carried out centrally, with a central coordinator, to mitigate conflicts arising from the selection of the leading material in each group.
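Although the nomination itself is manual, a proposal can be pre-computed to speed up review. The sketch below encodes the two approaches mentioned above as hypothetical strategies. `SYSTEM_AGE_RANK` and `transaction_count` are assumed inputs, and the transaction counts in the example are made up. The check that the oldest system actually holds up-to-date information is left to the reviewer.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical ranking of source systems: lower value = older system
SYSTEM_AGE_RANK = {"SAP System 1": 1, "Non-SAP system 2": 2}


@dataclass
class Candidate:
    material: str
    system: str
    transaction_count: int  # e.g. number of documents referencing the material


def propose_leading(group: List[Candidate],
                    strategy: str = "oldest_system") -> Tuple[Candidate, List[Candidate]]:
    """Propose a leading record and its non-leading records for one group."""
    def age(c: Candidate) -> float:
        return SYSTEM_AGE_RANK.get(c.system, float("inf"))  # unknown systems last

    if strategy == "oldest_system":
        # Oldest system first; more transactional data breaks ties.
        key = lambda c: (age(c), -c.transaction_count)
    elif strategy == "most_transactions":
        # Most transactional data first; older system breaks ties.
        key = lambda c: (-c.transaction_count, age(c))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    leading = min(group, key=key)
    return leading, [c for c in group if c is not leading]


group_1 = [Candidate("Material A", "SAP System 1", 120),
           Candidate("Material B", "SAP System 1", 15),
           Candidate("Material C", "Non-SAP system 2", 300)]
print(propose_leading(group_1)[0].material)                       # Material A
print(propose_leading(group_1, "most_transactions")[0].material)  # Material C
```

The two strategies can disagree, as they do here. This is exactly the kind of conflict the central coordinator resolves.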

New master data creation in the target system

By this process step, all manual nominations of leading and non-leading master data should be complete. This allows the leading materials, which will be migrated to the target system, to be separated out. Each leading material migrated to the target system receives a new material number according to the target system's material numbering convention.

Mapping of non-leading master data to leading master data

The final step of the de-duplication process takes place after the new material number has been received in the target system. The leading and non-leading master data numbers (old source system material numbers) are then mapped to the leading master data number (new material number in the target system). For the example above, the mapping appears as follows:

Table 3 Master Data Mapping Table

Old source system material number | New material number in the target system
----------------------------------+-----------------------------------------
Material A                        | New Material A
Material B                        | New Material A
Material C                        | New Material A

The Old source system material number column contains both the leading (parent) and the non-leading (child) master data numbers.

The New material number in the target system column contains the material number created in the target system.
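The numbering and mapping steps can be sketched together. The target numbering scheme used here (prefix `T` plus an eight-digit sequence) is purely hypothetical; a real project would use the target system's own number ranges. Each group is given as its leading material plus its non-leading materials. Every old number, leading or not, ends up pointing at the group's new number, as in Table 3.

```python
from itertools import count
from typing import Dict, List


def build_mapping(groups: Dict[str, List[str]], start: int = 1,
                  prefix: str = "T") -> Dict[str, str]:
    """Map every old material number (leading and non-leading) to a new number."""
    numbers = count(start)
    mapping: Dict[str, str] = {}
    for leading, non_leading in groups.items():
        new_number = f"{prefix}{next(numbers):08d}"  # hypothetical numbering scheme
        for old_number in [leading, *non_leading]:
            if old_number in mapping:
                raise ValueError(f"{old_number} is assigned to more than one group")
            mapping[old_number] = new_number
    return mapping


mapping = build_mapping({"Material A": ["Material B", "Material C"],
                         "Material D": []})  # ungrouped material: leading, no children
for old, new in mapping.items():
    print(old, "->", new)
```

Rejecting an old number that appears in two groups catches inconsistent nominations before they reach the load files.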

The non-leading (child) material and vendor inherit the leading (parent) material and vendor master data. Certain data, such as the bank details of a child vendor, is consolidated into the parent vendor, and the child vendor inherits the parent vendor's general data.

The company codes, purchasing organisations and plants of all non-leading (child) vendors are extended to the parent vendor in the target system.
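The following is a minimal sketch of this consolidation, assuming that vendors are held as simple in-memory objects with hypothetical attributes. It keeps the parent's general data, adds the children's bank details, and extends the parent to every company code, purchasing organisation and plant of its children. It does not show how conflicting bank details are reviewed.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Vendor:
    number: str
    general: dict                                                     # name, address, ...
    bank_accounts: Set[Tuple[str, str]] = field(default_factory=set)  # (bank key, account)
    company_codes: Set[str] = field(default_factory=set)
    purchasing_orgs: Set[str] = field(default_factory=set)
    plants: Set[str] = field(default_factory=set)


def consolidate(parent: Vendor, children: List[Vendor]) -> Vendor:
    """Build the target vendor: parent general data plus children's bank and org data."""
    merged = Vendor(parent.number, dict(parent.general),
                    set(parent.bank_accounts), set(parent.company_codes),
                    set(parent.purchasing_orgs), set(parent.plants))
    for child in children:
        # Child general data is discarded: the child inherits the parent's.
        merged.bank_accounts |= child.bank_accounts
        merged.company_codes |= child.company_codes
        merged.purchasing_orgs |= child.purchasing_orgs
        merged.plants |= child.plants
    return merged


parent = Vendor("V-100", {"name": "ACME"}, {("10020030", "123456")}, {"1000"}, {"1000"})
child = Vendor("V-200", {"name": "ACME Corp"}, {("50010060", "987654")},
               {"2000"}, {"2000"}, {"2100"})
merged = consolidate(parent, [child])
print(sorted(merged.company_codes))  # ['1000', '2000']
```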

Impact on the transactional data migration

The non-leading master data will not be migrated to the target system. Transactional data that references non-leading master data is created using the mapped equivalent leading master data.
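During the transactional load, each reference to an old material number is replaced with its mapped target number. The following is a minimal sketch, assuming that transactions are simple records with a `material` field and that the mapping looks like Table 3 with hypothetical target numbers. Records without a mapping are returned separately instead of being dropped, so they can be investigated before the load.

```python
from typing import Dict, List, Tuple


def remap_transactions(transactions: List[dict], mapping: Dict[str, str],
                       key: str = "material") -> Tuple[List[dict], List[dict]]:
    """Replace old material numbers with mapped target numbers.

    Returns (remapped, unmapped); unmapped records need investigation
    before load rather than being silently dropped.
    """
    remapped, unmapped = [], []
    for tx in transactions:
        old = tx[key]
        if old in mapping:
            remapped.append({**tx, key: mapping[old]})  # copy; source row untouched
        else:
            unmapped.append(tx)
    return remapped, unmapped


mapping = {"Material A": "T00000001", "Material B": "T00000001",
           "Material C": "T00000001"}
open_pos = [{"doc": "PO-1", "material": "Material B", "qty": 5},
            {"doc": "PO-2", "material": "Material X", "qty": 1}]
done, todo = remap_transactions(open_pos, mapping)
print(done)  # [{'doc': 'PO-1', 'material': 'T00000001', 'qty': 5}]
print(todo)  # [{'doc': 'PO-2', 'material': 'Material X', 'qty': 1}]
```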

Project team structure for the de-duplication projects

A de-duplication project involves a lot of coordination between the different owners of the source systems. In projects staggered across geographies, there is usually a separate team responsible for the master data of each company code. In all such scenarios, there should be a de-dup coordinator in each location who liaises with the other de-dup coordinators and with the central de-dup coordinator. The master data organisation should provide high-level governance. It is beneficial to have a single central de-duplication coordinator who works across all geographies.

Table 4 Project Team Structure

Role: Central de-duplication coordinator
Major responsibilities:
1. Provide technical guidance for doing the leading/non-leading master data nominations
2. Coordinate leading/non-leading master data …
3. Issue and scope management
4. Leadership activities
5. Arrange recurring meetings to track progress
6. Solve conflicts

Role: De-duplication coordinator in every company code/system/geography
Major responsibilities:
1. Perform the leading/non-leading master data …
2. Participate in recurring meetings
3. Ensure data quality in the source system

Role: Master data organisation
Major responsibilities:
1. Governance on master data
2. Provide clarification on the master data de…
3. Review data quality of source system and structural changes

Pitfalls and the mitigation plan in de-duplication projects

The common pitfalls and the mitigation plans in de-duplication projects are as
follows:

Table 5 Pitfalls and Mitigation Plans

Pitfall: During the initial review stages, the resources needed to review the items and perform the nomination of leading/non-leading master data are likely to be underestimated.
Mitigation plan: As a general guideline, the resource e… based on 100 master data a week per … on the author's experience in de-duplicatio… with complete analysis including inves… history

Pitfall: If there are multiple system owners, the time taken to reach consensus on nominating the leading master data was huge.
Mitigation plan: The role of central coordinator and the … more so that conflicts could be settled …

Pitfall: Incorrect nominations lead to complexities when the group contains master data across systems.
Mitigation plan: Resources involved in de-duplication p… detailed knowledge on master data an… process

Pitfall: Incomplete nominations lead to complexities when the group contains master data across systems.
Mitigation plan: Tracking mechanism to determine which nomination is pending with which team

Pitfall: High risk (or) high value items
Mitigation plan: High risk and (or) high value items sh… with caution
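The tracking mechanism in the fourth mitigation can be as simple as comparing, for each group, the teams whose records are in the group with the teams that have already submitted their nomination. The following is a minimal sketch; the team names and group data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def pending_by_team(groups: List[dict]) -> Dict[str, List[int]]:
    """Map each team to the groups still waiting for its nomination."""
    pending = defaultdict(list)
    for g in groups:
        for team in sorted(g["teams"] - g["nominated"]):
            pending[team].append(g["group"])
    return dict(pending)


status = [
    {"group": 1, "teams": {"Germany", "USA"}, "nominated": {"Germany"}},
    {"group": 2, "teams": {"USA"}, "nominated": set()},
    {"group": 3, "teams": {"Germany"}, "nominated": {"Germany"}},
]
print(pending_by_team(status))  # {'USA': [1, 2]}
```

A report like this gives the recurring coordination meetings a concrete agenda: which teams still owe which nominations.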

Conclusion

This whitepaper discusses the de-duplication process during the initial stages of rollout/implementation projects. However, once the parent master data has been identified and the new master data created after eliminating the child duplicates, it is imperative to have a defined approach to avoid further duplicates in the target system. This could be a single source for master data creation and changes, along with effective rules and processes that prevent duplicates at the source.
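One way to apply such rules at the point of creation is to check each new material request against the normalized keys of existing master data before the record is created. The class below is a hypothetical in-memory sketch of that check, reusing the manufacturer and vendor part number rule. In practice, the check would sit in the central master data creation process or tool.

```python
import re
from typing import Dict, Optional, Tuple


def _norm(value: str) -> str:
    """Upper-case and keep only ASCII letters and digits."""
    return re.sub(r"[^0-9A-Z]", "", (value or "").upper())


class DuplicateGuard:
    """Index of existing materials by normalized manufacturer and part number."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], str] = {}

    def register(self, material: str, manufacturer: str, part_number: str) -> None:
        self._index[(_norm(manufacturer), _norm(part_number))] = material

    def existing(self, manufacturer: str, part_number: str) -> Optional[str]:
        """Return the existing material number if the request looks like a duplicate."""
        return self._index.get((_norm(manufacturer), _norm(part_number)))


guard = DuplicateGuard()
guard.register("T00000001", "Mnfr 1", "9N-4524")
print(guard.existing("MNFR 1", "9N 45 24"))  # T00000001 -> reject or review request
print(guard.existing("Mnfr 2", "9N-4524"))   # None -> no duplicate found
```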

