HHSC Enterprise Information Technology

Proof of Concept Assessment Report for

Master Data Management

Date: February 2, 2010


Prepared by: Enterprise Data Warehouse project

Table of Contents
1. Problem Definition
   Data Collection
   Data Cleansing
   Data Matching
   Data Standardization and Ongoing Maintenance
2. Master Data Management (MDM)
   Definition
   How MDM aligns with HHS Initiatives
3. Proof of Concept (PoC)
   Scope of IBM Master Data Management (MDM) Proof of Concept (PoC)
   IBM MDM Product Suite
   Proof of Concept (PoC) Approach
4. Results and Observations
   Observations of Source Data Quality
   Data Matching Process
   Use Case Results
5. Recommendations and Conclusion
   Recommendations
   Conclusion
Appendices
   Appendix A Glossary

1. Problem Definition
The Health and Human Services (HHS) system uses various mission-critical applications to support its
day-to-day operations for providing client services and to provide decision-making capabilities for
its executive, management, and operational activities. As a result of program growth across HHS
agencies and the need to adapt to various laws, legislation, policies, and procedures over the years,
the IT systems that support these operations have become complex, difficult to maintain, and
difficult to change from a decentralized, program-centric design of providing client services to a
client-centric design. It is therefore difficult to transform to more current federal and state
philosophies that favor a client-centric service delivery view, and to support interoperability
initiatives such as the Medicaid Information Technology Architecture (MITA). For example, a lack of
contextually consistent identification mechanisms, definitions, and standards associated with
tracking key business entities, such as clients, providers, and services, makes it a significant
challenge for executive management and operational staff to get a holistic view of an entity across
programs. The ability to match and link these entities across different programs and systems with a
high degree of trust is a foundational issue that could directly or indirectly impact the successful
implementation of upcoming initiatives on the HHS roadmap, such as the Enterprise Data Warehouse,
MITA, and Health Information Technology / Health Information Exchange (HIT/HIE).
Establishing a robust and reusable solution for programs and applications to establish a trustworthy
enterprise view of the client is critical for moving forward with future initiatives at HHS. Several
inherent system designs, operational practices, and technical issues currently prevent HHS from
creating an enterprise view of a client or a provider.
Data Collection
Each HHS agency uses independent operational systems to support its various programs. Although
key components of a given data set are often similar across the enterprise (e.g., client data: name,
date of birth, social security number, address, etc.), the data collected by each agency resides in
silo-centric systems in different formats with varying operational business rules. Linking data from
different sources across the enterprise is difficult due to program-specific system designs,
inconsistent data formats, and lack of data sharing agreements.
Data Cleansing
To assist executive management in making informed decisions and to satisfy HHS analytical and
reporting needs, several partial or unsuccessful attempts have been made to consolidate data into one
central location. These attempts drew on various context-specific data sources across a subset of
HHS agencies / programs, rather than working from the perspective of establishing a complete
client-centric view of services availed. Often, these data collection and cleansing processes are
extremely resource-intensive and, in some cases, do not accurately consolidate the large amount of
data in a contextually meaningful way.

Data Matching
Client data collected in the various mission-critical applications is operation-specific, and the
application designs ignore the fact that the same client data is being captured in one or more other
HHS systems. Data collected in each system has separate business rules, formats, and attributes,
which present challenges to matching client data from various agencies and prevent establishing a
single view of a client. This operational practice of capturing client data for different contexts,
without contextual validation across systems during the operational process, makes matching and
reconciliation of cross-system data a significant challenge during downstream operations and
strategic analysis activities.
Data Standardization and Ongoing Maintenance
There are limited enterprise-level standards and guidelines to define HHS master data entities (client,
provider, claim, etc.) or relationships between similar entities (client, patient, person), which has
contributed to redundant data across disparate operational and analytical systems and potential
duplication of services provided by HHS agencies to clients. Diagram 1 below shows an example.

Client Data Source      Client Data Attribute
Data Source (DS) 1      INDV_ID
Data Source (DS) 2      XXX_MEMBER_NO
Data Source (DS) 3      PCN_NBR
Data Source (DS) 4      RECORD_KEY
Data Source (DS) 5      PERSON_ID

Client data is stored with different names, formats, and values; there is no single view of a client.
Problem: Client data cannot be joined across various systems without extensive data analysis
and transformation rules.
Diagram 1: Ways of storing Client Data in various HHS systems
There is no single, accurate, and comprehensive reusable framework to link client data across the
various systems as a proactive and foundational basis for decision making and managing operations
from a cross-functional client view perspective. Operational practices that attempt to build such
context-specific views are therefore reactive in nature and are often resource intensive, involve
significant manual intervention, and take a significant amount of time to design and perform the
necessary cross-system analysis. In addition, these reactive solutions are often situation-driven,
context-specific, and offer limited opportunity for expansion into reusable robust enterprise-focused
long term solutions that efficiently leverage and capture cross-program subject matter knowledge.
A potential solution to these data issues is to proactively recognize this pattern of problems across
the enterprise and establish a unified view of commonly used entities (client, provider, claim, etc.),
making entity-centric structures available at appropriate levels of detail for reuse by various
agency-specific analytical and operational activities. Program-level entity-matching operations could
then shift their resources away from siloed, resource-intensive entity-matching activities and
instead use a service from a centralized repository that maintains the necessary business
intelligence and data for a dynamic entity-centric view of key business entities.
This foundational strategy of centralized data management, while potentially resource-intensive in
terms of organizational and automation support, could be the basis for cost-effective investments.
The return on these investments would be demonstrated for existing program functions and systems, as
well as for future client-centric data management initiatives on the HHS roadmap, such as the
Enterprise Data Warehouse (EDW), Medicaid Information Technology Architecture (MITA), and Health
Information Technology / Health Information Exchange (HIT/HIE).

2. Master Data Management (MDM)


Recent industry trends with client data entity management and tracking of other key data entities across
systems advocate the use of Master Data Management (MDM) as a solution. In addition, the
implementation of MDM as an enterprise level solution to maximize the benefits of MDM is a current
industry trend. This document presents the assessment results of a proof of concept exercise performed
to assess the capabilities of Master Data Management (MDM) in the context of providing centralized
data entity management and entity linking across systems.
Definition

Master Data Management (MDM) comprises a set of processes and tools that consistently define and
manage the data entities of an organization. MDM has the objective of providing processes for
collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such
data throughout an organization to ensure consistency and control in the ongoing maintenance and
application use of this information.
MDM is about two critical components -- the data itself and the functionality to ensure the data is
contextually accurate and timely.
While data is the foundation of a Master Data Management solution, it cannot be effective without a
secondary component -- functionality to govern the data. Data on its own has no ability to maintain
data readiness or more simply, the accuracy of the data in the context of a specific purpose. Master
data must be actively managed by appropriately selected data stewards within an organization.

How MDM aligns with HHS Initiatives

An MDM solution can be used:
- Collaboratively, to create and define master data,
- Operationally, for real-time data access, and
- Analytically, for data analysis.

An Enterprise MDM strategy, when properly implemented, can be beneficial across the HHS
enterprise for cross agency data sharing and synchronization, Health Information Exchange (HIE)
efforts, Medicaid Information Technology Architecture (MITA), and the creation of a true Enterprise
Data Warehouse (EDW).
A true Enterprise MDM implementation can manage changes, event triggers and notifications across
all applications enterprise-wide. In addition, MDM must do more than simply house the data; it must
manage its use in processes across the enterprise using different implementation strategies.
MDM in the context of Cross-Agency Data Sharing and Synchronization Initiatives
A Collaborative MDM manages the process of creating, defining, and synchronizing
master data across systems. Once the master data is defined, it can then be synchronized
with operational and analytical systems and applications. Collaborative MDM provides a
platform to aggregate, enrich, and publish definitional data and requires workflow and
advanced security capabilities.
The MDM solution provides execution on all critical data changes and event notifications
from simple to complex. This includes everything from resolving a duplicate record to
determining which systems get specific updates.
For example, address changes made in one source system can be sent to MDM as part of
real-time updates or a daily batch feed to update the master record. MDM can then
identify that the same client exists in other systems within HHSC and can send a critical
data change notification to these systems as well.
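The address-change example above can be sketched in a few lines of Python. This is a minimal
illustration of the idea only, not the IBM MDM API: the MasterRecord class, the
apply_address_change function, the key values, and the source-system names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    """Hypothetical master record: one client, many source-system keys."""
    master_id: str
    address: str
    # Maps a source-system name to that system's key for this client.
    source_keys: dict = field(default_factory=dict)


def apply_address_change(record, origin, new_address):
    """Update the master address and list the notifications to send.

    Returns (system, source_key, new_address) for every linked system
    other than the one that reported the change.
    """
    record.address = new_address
    return [
        (system, key, new_address)
        for system, key in sorted(record.source_keys.items())
        if system != origin
    ]


# Illustrative data: the same client is known to three source systems.
record = MasterRecord(
    master_id="MCI-0001",
    address="100 Old St",
    source_keys={"DS1": "A17", "DS2": "M-204", "DS4": "R9"},
)
notifications = apply_address_change(record, origin="DS2", new_address="200 New Ave")
for system, key, address in notifications:
    print(f"notify {system}: client {key} moved to {address}")
```

In this sketch the system that reported the change (DS2) is not notified; whether the originating
system should also receive a confirmation is a design choice the report does not address.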
MDM in the context of upcoming Health Information Exchange (HIE) and Medicaid
Information Technology Architecture (MITA) Initiatives
In an Operational MDM, use and maintenance of master data occurs within operational
processes and applications. The master data is made available to other systems through
services. Operational MDM can leverage and become a significant part of a
services-oriented architecture to support a variety of application needs. In the case of
the HHS environment, HIE and MITA related initiatives will establish systemic processes
that could benefit from Operational MDM. The MDM implementation requires performance
to handle high transaction levels and should have open integration with operational
applications. Operational MDM uses pre-defined, out-of-the-box business services.
An Operational MDM solution is modeled on a service-oriented architecture (SOA),
should be flexible and scalable, and have some predefined set of out-of-the-box functions
to support the management and integrity of data. Operational MDM systems have the
flexibility to extend functionality to support new or additional business processes.
MDM in the context of Business Intelligence (BI) through Enterprise Data Warehouse
(EDW) Initiatives
An Analytical MDM provides accurate, consistent, and up-to-date master data to an
Enterprise Data Warehouse (EDW). It feeds business intelligence insight data back into
collaborative and operational MDM.
For example, a change in address (city/county/region) reported by a client through MDM can
indicate that the client is now eligible for a program that was previously not available
to them. MDM can then be used to send a notification to the EDW that can trigger a
Business Intelligence (BI) event to alert the case worker to contact the client regarding
the additional eligibility available.

3. Proof of Concept (PoC)


Scope of IBM Master Data Management (MDM) Proof of Concept (PoC)

To evaluate the viability and capabilities of master data management (MDM) using client data across
different program data sets, HHSC entered into an agreement with IBM to perform a proof of
concept (PoC) exercise. The IBM MDM solution was chosen for the evaluation for two key reasons:
1. IBM had been previously identified as one of three visionary industry leaders in the customer
data integration solution space during a Gartner Research Study in May 2009.
2. IBM agreed to commit resources and make available the necessary software and hardware
infrastructure to perform the proof of concept exercise in accordance with HHSC policies
and procedures.
The IBM InfoSphere product suite was used to assess if the Master Data Management (MDM)
technical solution could help HHSC in defining a single view of a client called Master Client Index
(MCI). The PoC was designed to demonstrate the viability of IBM MDM software products to build
a unified, standardized, and integrated repository of clients served and used by the various benefits
programs offered by HHS. The PoC was intended to:

- Prove the benefits of utilizing a Master Data Management solution within HHS using
  business use cases.
- Validate the role of MDM in enabling strategic and operational analytic applications.
- Validate an MDM solution across structured and unstructured data stores.
- Identify any supporting operational roles, standards, processes, and other key dependencies
  that would have to be established for implementing MDM.

The scope of the PoC was intended to demonstrate the functional and technical capabilities of an
MDM solution by accomplishing the following:

- Determine the attributes that should be used to match client records across various source
  systems (SSN, name, address, etc.).
- Identify individual clients processed by multiple source systems.
- Resolve the same client's records across multiple systems into a single record based on
  matching attributes.
- Assign a single, integrated Master Client Index key for each individual.
- Create a Master record which associates all source system keys for an identified client.
- Use the Master record for integrated reporting on client information across source systems.
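
The outcome of these steps can be pictured as a cross-reference from one Master Client Index key
to every source-system key for the same person. The Python sketch below is illustrative only:
the build_master_index function, the sample keys, and the "MCI-" key format are hypothetical,
and it assumes a separate matching step has already grouped the records that belong together.

```python
from itertools import count


def build_master_index(matched_groups):
    """Assign one MCI key per group of matched source records.

    matched_groups: iterable of lists of (source_system, source_key)
    pairs that a matching step has already judged to be one client.
    Returns (master, reverse): master maps MCI key -> {system: [keys]},
    reverse maps (system, key) -> MCI key.
    """
    sequence = count(1)
    master, reverse = {}, {}
    for group in matched_groups:
        mci_key = f"MCI-{next(sequence):06d}"  # invented key format
        master[mci_key] = {}
        for system, key in group:
            master[mci_key].setdefault(system, []).append(key)
            reverse[(system, key)] = mci_key
    return master, reverse


groups = [
    [("DS1", "1001"), ("DS2", "M-77")],  # same client seen in two systems
    [("DS3", "P-5")],                    # client seen in one system only
]
master, reverse = build_master_index(groups)
print(master["MCI-000001"])      # all source keys for the first client
print(reverse[("DS2", "M-77")])  # look up the MCI key from a source key
```

The reverse lookup is what would let a report join data from different source systems on the
shared MCI key instead of on each system's own identifier.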

The PoC assesses if various HHS agencies and systems could take full advantage of a Master Client
Index capability maintained at an enterprise level. An MCI could potentially enable various HHS
applications to link data with other systems to derive and answer analytical and operational questions
accurately and assist business operations and executives to make informed decisions. Assessment of
the performance capabilities of the various hardware and software tools used in the PoC was not in
scope for this PoC. In addition, the focus of the PoC was to evaluate the general capabilities and
maturity of available MDM tools rather than perform a technology evaluation of the IBM MDM
solution relative to other technologies.

IBM MDM Product Suite

The IBM MDM tool set and its high-level functionality are summarized in this section.

[Figure: IBM Master Data Management and IBM Information Server overview]
IBM Master Data Management (Industry Models & Assets), configured from a Multi-Form Master Data
Management System:
- Collaborate: define, create and synchronize Master Information
- Operationalize: deliver Master Information as a Service for business operations
- Analyze: drive real time business insight
Data domains: Product, Partner, Customer, Supplier, Location

IBM Information Server (Unified Deployment):
- Understand: discover, model, and govern information structure and content (Information Analyzer)
- Cleanse: standardize, merge, and correct information (QualityStage)
- Transform: combine and restructure information for new uses (DataStage)
- Deliver: synchronize, virtualize and move information for in-line delivery (DataStage)
Foundation: Unified Metadata Management; Parallel Processing; Rich Connectivity to Applications,
Data, and Content

Product Suite: IBM InfoSphere Information Server

Description: IBM InfoSphere Information Server enables businesses to perform five key integration
functions.

Function and Toolset:
1) Understand the data. IBM InfoSphere Information Analyzer can help companies
automatically discover, model, define and govern information content and structure, as
well as understand and analyze the meaning, relationships and lineage of information.
2) Cleanse the data. IBM InfoSphere QualityStage supports information quality and
consistency by standardizing, validating, matching and merging data.
3) Transform data into information. IBM InfoSphere DataStage helps transform and
enrich information to help ensure that it is in the proper context for new uses. It also
provides high-volume, complex data transformation and movement functionality that
can be used for stand-alone ETL scenarios or as a real-time data processing engine
for applications or processes.
4) Deliver the right information at the right time. IBM InfoSphere DataStage
provides the ability to virtualize, synchronize or move information to the people,
processes or applications that need it. It also supports critical Service Oriented
Architectures (SOAs) by allowing transformation rules to be deployed and reused as
services across multiple enterprise applications.
5) Perform unified metadata management. IBM InfoSphere Information Server is
built on a unified metadata infrastructure that enables shared understanding between
the different user roles involved in a data integration project, including business,
operational and technical domains.

Product Suite: IBM Master Data Management

Description: IBM Multiform Master Data Management (MDM) addresses the challenges of effective
and complete management of master data with a proven framework designed to help organizations
across the enterprise. The fundamental principle of MDM is that master data is decoupled from
operational, transactional and analytical systems into a centralized independent repository or
hub. This centralized information is then provided to Service Oriented Architecture (SOA)
business services so data is managed independently of any single line of business, system or
application. This strategy enables enterprises to identify common functionality for all systems
and applications and then support efficient, consistent use of business information and
processes.

Function and Toolset:
- Master Data Repository. InfoSphere Master Data Management Server maintains
  master data for multiple domains including customer, account and product as well as
  other data types such as location and privacy preferences.
- MDM Business Services. Through business services, InfoSphere Master Data
  Management Server facilitates integration with all applications and business processes
  that consume master data.
- The MDM Integrity layer of InfoSphere Master Data Management Server provides
  data quality management capabilities around party matching, data validation, data
  standardization and external reference identifiers.
- The MDM Intelligence layer of the InfoSphere Master Data Management Server
  contains business rule and event detection functionality that is fully integrated with
  the MDM Business Services.
- MDM Data Governance Services allow transaction and data attribute-based
  authorization.
- SOA Service Interfaces allow multiple systems and applications to integrate with the
  MDM Business Services.
- The MDM Data Stewardship user interface provides an intuitive graphical interface for
  managing various collaborative data processes such as managing groups, duplicate
  suspect processing and hierarchies.
- The MDM Event Management client provides the ability to trigger events and
  schedule processing at a party level.
- MDM Batch Job Manager. This client application is designed to manage batch
  processing by providing capabilities such as pacing, logging and multithreading.

Per IBM, their MDM platform moves beyond previous attempts at centralizing control of data by allowing users to fully manage
data with multiple domains and multiple styles of data usage.

[Figure: InfoSphere Master Data Management Server. A wide audience of users of master data
(Operational Applications; Middleware & Business Processes; Data warehouses & Analytics; Data
Stewards & MDM Users; UI Applications) connects to the server. Component labels recovered from
the figure include Intelligence, Business Logic, Data Quality Management, Data Governance, and
MDM Domains (i.e., Party, account, ...).]


Proof of Concept (PoC) Approach

To perform the PoC, HHSC EIT collected client data from various agencies/systems. Due to
the limited hardware capacity associated with performing a PoC, only a small subset of client
data was used to assess the functionality of IBM's MDM product suite. Specifically, the
subset of client records whose last name began with "I" was chosen for this exercise.
To prove Master Client Index (MCI) integration between data sources, analytical use cases
were defined to merge MCI data with claims data from one system and lab data from
another. These use cases were designed to show how an MCI could be used for data
warehousing and analytical application integration.
The diagram below shows the architecture, number of data sources and use cases
involved in the IBM MDM PoC.

Diagram 2: MDM PoC Architecture Diagram

The diagram and tables below present the various data sets that were used for the PoC
including the subset of data loaded from each source into the consolidated data environment
on the MDM Server.

Diagram 3: MDM PoC Data Flow Diagram

The following table describes the data source details of the data files used to load MDM.
Source

Data Details

DS 1

DS 2

DS 3

DS 4

Data from the source system's master client index table (all clients).
Only loaded clients whose last names started with "I" into the PoC.
Client addresses not in this source system were obtained via a data extract
from the appropriate source system.
Monthly client extract (all clients obtained).
Only loaded data for clients whose last name started with "I".
Monthly extract file as of August 2009.
Only loaded data for clients whose last name started with "I".
Clients and claims data.
2 years (2008 and 2009) of client and lab data from this system.
Statistics quoted are only for clients whose last name starts with "I".

4. Results and Observations


Observations in this section apply specifically to the sample data extracts used in the PoC.
Until further validation has been performed with subject matter experts, there is no clear
indication that the types of issues identified are valid issues or that they currently exist in
the source systems. In addition, conclusions about the extent of certain types of data patterns
or problems should not be extended to the entire data set or system.
In some cases, HHSC was already aware of the observations (e.g., inclusion of historical client
records in the data sets provided). Because the client data from MDM that was joined with the
claims data to produce analytical reports for this PoC was a small subset of the data obtained
from source systems, some of the observations may be skewed or misrepresented due to the subset
selected. The results of this PoC should therefore be used to reach conclusions about the
capabilities of an MDM solution rather than to generalize about the quality of the data itself.
The following results and observations are intended to provide insight into:
(1) the redundant data issues identified from the source sample data prior to the creation
of the Master Client Index, and
(2) the types of data issues encountered with each system that had to be addressed by the
Master Data Management solution.

Observations of Source Data Quality

The table below summarizes the steps facilitated by the MDM software and the associated
counts as data was loaded from the different sources into the MDM environment and the
different software capabilities were used:
Record Count   Data Load Description
69,057         Raw client data starting with "I" for all 4 data sources (clients loaded into the
               consolidated table)
5,620          Duplicates within a data source dropped (sum for all 4 data sources)
3,563          Records dropped due to data issues within each data source (total of 4 data sources)
7,298          Records duplicated across data sources (updated existing MDM record with additional
               data source identifier)
52,576         Client records resulting from standardization & matching (clients loaded into master
               table)
749            Rows dropped due to invalid last name
51,827         Total clients loaded into MDM
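
The counts in the table reconcile arithmetically: subtracting the three categories of dropped or
merged records from the raw count gives the number of records after standardization and matching,
and removing the rows with an invalid last name gives the final total. The short Python check
below uses only the figures reported in the table; the variable names are chosen for illustration.

```python
raw_records = 69_057          # raw client records from all 4 data sources
dup_within_source = 5_620     # duplicates within a data source, dropped
dropped_data_issues = 3_563   # records dropped due to data issues
dup_across_sources = 7_298    # merged into an existing MDM record
invalid_last_name = 749       # rows dropped due to invalid last name

after_matching = (
    raw_records - dup_within_source - dropped_data_issues - dup_across_sources
)
total_loaded = after_matching - invalid_last_name

print(after_matching)  # 52576, the count loaded into the master table
print(total_loaded)    # 51827, the total clients loaded into MDM
```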

Issues Addressed by Data Standardization

The MDM solution's data transformation step included data standardization. This data
standardization process was required to address the following types of issues encountered
with source system data prior to the creation of the MCI:

DS 1
- 15% of the records had a blank value in the Social Security Number field.
- 30% of the records contained filler information in the address fields (e.g.,
  Hurricane Ike, Homeless, Same as above).
- 75% of the addresses were blank in the data extracts utilized.
- Non-standard data entries within the address-related fields, i.e., address values
  spread across multiple columns.
- 11% of clients had multiple records within the same data source.

DS 2
- 1% of the records had non-standard address structures.
- 13% of the records had a blank SSN.
- 2% of clients had multiple records within the same data source.
- Contains clients over 18 years old (pregnant women).

DS 3
- Suspected invalid age data: client ages greater than 107 years.
- 28% of the records had non-standard address values.
- Records containing case numbers with zero values.
- 2% of clients had multiple records within the same data source.

DS 4
- Multiple date formats in date-related fields: mm/dd, mm/dd/yyyy, mm/dd/yy.
- 98% of the records had a blank value in the Social Security Number field.
- 91% of the addresses in the data extract used were blank.
- 21% of clients had multiple records within the same data source.

Across Source Systems
- Inconsistent formats for birthdates.
- Inconsistent formats for addresses (missing or incomplete address data
  components).
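
To make the kinds of standardization listed above concrete, the Python sketch below normalizes
the three date layouts observed in DS 4 and flags filler address values such as those seen in
DS 1. It is a hypothetical illustration, not the rule set used in the PoC: the function names,
the filler list, the pivot year for two-digit years, and the choice to leave year-less dates
unresolved are all assumptions.

```python
from datetime import date

# Filler values taken from the examples in the text; a real list would be longer.
FILLER_ADDRESSES = {"hurricane ike", "homeless", "same as above"}


def standardize_date(value, pivot_year=2010):
    """Return an ISO date string, or None if the value cannot be resolved.

    Handles mm/dd/yyyy and mm/dd/yy. A bare mm/dd has no year, so it is
    returned as None rather than guessed. Two-digit years later than the
    pivot year's last two digits are assumed to fall in the prior century.
    """
    parts = value.strip().split("/")
    if len(parts) != 3:
        return None
    try:
        month, day, year = (int(p) for p in parts)
        if len(parts[2]) == 2:
            century = pivot_year - pivot_year % 100
            year += century if year <= pivot_year % 100 else century - 100
        return date(year, month, day).isoformat()
    except ValueError:
        return None  # non-numeric parts or an impossible calendar date


def is_filler_address(value):
    """True for blank addresses or known filler text."""
    cleaned = value.strip().lower()
    return cleaned == "" or cleaned in FILLER_ADDRESSES


for raw in ["12/31/1988", "03/07/45", "04/12"]:
    print(raw, "->", standardize_date(raw))
print(is_filler_address(" Homeless "))
```

Returning None for unresolved values, instead of guessing, keeps such records visible for review
by a data steward.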

The data below provides additional details on the number and percentage of data
standardization and duplication issues encountered in the subset of data used for the PoC.

Invalid Address Structures
Source   Count     Percent
DS 1     16,628    29.52
DS 2     38        1.24
DS 3     1,000     28.4
DS 4     5,683     92.46

Invalid Social Security Numbers
Source   Count     Percent
DS 1     9,186     16.3
DS 2     410       13.42
DS 3     67        1.9
DS 4     6,007     97.73

Invalid Zip Codes
Source   Count     Percent
DS 1     41,832    74.52
DS 2     0         0
DS 3     1,702     48.35
DS 4     77        1.25

Duplicate Records
Source   Count     Percent
DS 1     11,696    16.93
DS 2     284       0.41
DS 3     133       0.19
DS 4     4,367     6.32

Duplicates Within Source
Source   Count     Percent
DS 1     6,916     12.28
DS 2     65        2.13
DS 3     60        1.7
DS 4     1,542     25.09

Once the final MCI had been created, further analysis of the data using the software led to
the following observations:

- Clients were identified as matching across 2 or 3 data sources; however, no single
  client was found in all 4 data sources used.
- 71% of clients from the DS 2 data set existed in the DS 1 data set (this doesn't imply
  that these clients were receiving benefits from both systems simultaneously;
  additional cross-referencing would be necessary).

Data Matching Process

Matching of client data sets involved the following steps:

The client data set was grouped based upon predefined criteria.
The grouped data was then matched against attributes to produce a statistical score on
the likelihood that the records matched.
Any data whose score was not sufficient to instill confidence that the records matched
was retained for use in the next data matching iteration.

The 3 groupings used in this PoC are listed below:

Grouping    Attributes Used
1st         First Name, Last Name, Street Name
2nd         First Name, Last Name, DOB
3rd         SSN

Data evaluated for each grouping used the following attributes in the data matching process:

SSN
Last Name
First Name
Middle Name
DOB
Gender
Address
City
State
Zip
County
Region
Phone
Source System Key
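
The grouping and scoring steps described above can be sketched in Python as follows. This is a
minimal illustration, not the PoC tool's algorithm: the weights, the threshold, and the record
field names are hypothetical, and a simple sum of agreement weights stands in for the
statistical score produced by the tool.

```python
from collections import defaultdict
from itertools import combinations

# Grouping keys per pass, following the three groupings listed above.
PASSES = [
    ("first_name", "last_name", "street_name"),
    ("first_name", "last_name", "dob"),
    ("ssn",),
]
# Hypothetical agreement weights and threshold; not the PoC tool's values.
WEIGHTS = {"ssn": 5.0, "last_name": 2.0, "first_name": 2.0, "dob": 3.0, "zip": 1.0}
THRESHOLD = 7.0


def score(a, b):
    """Sum the weights of attributes that are non-blank and equal in both records."""
    return sum(w for f, w in WEIGHTS.items() if a.get(f) and a.get(f) == b.get(f))


def match(records):
    """Run each pass over the records still unmatched; return pairs and leftovers."""
    unmatched = dict(enumerate(records))
    pairs = []
    for keys in PASSES:
        groups = defaultdict(list)
        for idx, rec in unmatched.items():
            if all(rec.get(k) for k in keys):  # skip records with blank grouping keys
                groups[tuple(rec[k] for k in keys)].append(idx)
        for members in groups.values():
            for i, j in combinations(members, 2):
                if i not in unmatched or j not in unmatched:
                    continue  # one side was already linked earlier in this pass
                if score(unmatched[i], unmatched[j]) >= THRESHOLD:
                    pairs.append((i, j))
                    del unmatched[j]  # record i stays as the survivor for later passes
    return pairs, sorted(unmatched)


# Hypothetical sample records, not PoC data.
records = [
    {"first_name": "ANA", "last_name": "IBARRA", "street_name": "MAIN",
     "dob": "1980-01-02", "ssn": "123456789", "zip": "78701"},
    {"first_name": "ANA", "last_name": "IBARRA", "street_name": "MAIN",
     "dob": "1980-01-02", "ssn": "", "zip": "78701"},
    {"first_name": "ANNA", "last_name": "IBARRA", "street_name": "",
     "dob": "1980-01-02", "ssn": "123456789", "zip": ""},
    {"first_name": "IAN", "last_name": "INGRAM", "street_name": "OAK",
     "dob": "1975-05-05", "ssn": "", "zip": "78702"},
]
print(match(records))  # ([(0, 1), (0, 2)], [0, 3])
```

In this sample the second record is linked in the 1st grouping, while the third record (a
different first-name spelling and no street name) is only linked in the 3rd grouping on SSN,
which shows why records left unmatched in one pass are carried into the next.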

As issues were identified in the data matching process, the MDM tool set allowed additional
data matching rules to be defined. The overall observation was that any MDM solution
implemented would require a flexible tool set that can be customized to address data
matching needs.


Use Case Results

The use cases drew on a sample of HHS data (clients whose last names start with I) from all 4
data sources. Both operational and analytical use cases were exercised.

Operational MDM Use Cases (Enterprise Master Client Index (MCI)):

Identify and report on contradictory and/or overlapping attribute values per identified
individual and general data profiling information discovered in the analysis. Result:
Successfully demonstrated.
Demonstrate the ability to identify a suspected duplicate individual during an
operational add of a new individual to the Master Data Management Repository. Result:
Successfully demonstrated.
Demonstrate the potential capability to enable HHS applications to search, access,
and update individual client information with service calls to the Master Data
Management repository. Result: Successfully demonstrated.
Single View of Client and Claims for auditing purposes. Result: Successfully demonstrated.

Analytical MDM Use Case:

Show aggregated costs across both coverage programs at different levels of aggregation
(aggregate by Plans, Services, Population demographics, etc.), including:

Costs for the 100 Most Costly Medicaid Clients. Result: Successfully demonstrated.
Five claim data files were manually loaded to a claim fact table (2 acute care and 3 CMS
claims). This report was generated by joining the claim fact table with the customer
table, which was a dimension sourced from MDM.
Determine Diabetic Clients Overdue for a Medical Screening. Result: Not performed.
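
The join behind the most-costly-clients report can be sketched as follows. The table layouts,
column names, and amounts are hypothetical; the sketch only illustrates the general approach of
aggregating a claim fact table by master client ID and joining it to a customer dimension
published by MDM.

```python
from collections import defaultdict

# Hypothetical customer dimension keyed by master client ID (as published by MDM).
customer_dim = {
    101: {"name": "Client A", "county": "Travis"},
    102: {"name": "Client B", "county": "Bexar"},
    103: {"name": "Client C", "county": "Travis"},
}

# Hypothetical claim fact rows: (master client ID, program, paid amount).
claim_fact = [
    (101, "Acute Care", 1200.00),
    (102, "Acute Care", 300.00),
    (101, "CMS", 450.50),
    (103, "CMS", 2000.00),
    (102, "CMS", 99.50),
    (999, "CMS", 50.00),  # no master record; dropped by the inner join
]


def top_costly_clients(facts, dimension, n):
    """Sum paid amounts per client, join to the dimension, return the n costliest."""
    totals = defaultdict(float)
    for client_id, _program, amount in facts:
        if client_id in dimension:  # inner join on master client ID
            totals[client_id] += amount
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(cid, dimension[cid]["name"], round(total, 2)) for cid, total in ranked[:n]]


print(top_costly_clients(claim_fact, customer_dim, 2))
# [(103, 'Client C', 2000.0), (101, 'Client A', 1650.5)]
```

Because every claim row carries the master client ID rather than a source-specific key, costs
from both programs roll up to one client regardless of which source system submitted the claim.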

Collaborative MDM Use Case:

No collaborative use cases were identified for this PoC at the onset of planning
this exercise. Future PoC and assessment activities will need to validate these
capabilities.


5. Recommendations and Conclusion


Although the MDM Proof of Concept (PoC) was initially undertaken to show the viability of
accurately matching or linking records across different data sources to establish a unified and
contextually accurate view of an entity (client, patient, provider, etc.), it quickly became
evident during the PoC that a number of other HHS initiatives could benefit from an
Enterprise Master Data Management (EMDM) solution.

It was determined that a Master Data Management solution combined with an Enterprise Data
Warehouse or a data bank might be used to develop an enterprise level information
repository that could be considered for use on Health Information Exchange (HIE) initiatives. A
single enterprise level MDM system that handles the cleansing, standardization, and linking of
client records for consistent data exchange with other nodes in the HIE network could prevent
data mismatches during the exchange. This could effectively eliminate the need for individual
agencies and/or departments to develop multiple data matching solutions (and algorithms for
matching) and interfaces with various trading partners (Providers, Physicians, the RHIE, and/or
the National Health Information Network (NHIN)), and could avoid risks related to lack of data
integrity and data corruption in the exchange processes.

It is important to note that several limited, silo-centric MDM solutions or processes have
been identified as currently in use within HHS. While some areas identified the need to
enhance or upgrade these solutions and were interested in contributing requirements to an
enterprise level solution, other areas believed that their solutions or processes sufficed from
their individual operational point of view. MDM solutions currently in use within HHS included:

Informatica SSA
Sun GlassFish
An older version of Informatica SSA combined with custom code
SPSS and Python based solution
Custom code for matching clients between systems at an HHS agency

The purpose of this PoC was not to facilitate the selection of a recommended technology or tool;
that is recommended as a future next step. Rather, the assessment was to verify the availability
of a comprehensive solution that provides a complete spectrum of capabilities meeting current
needs while also being scalable and using more current matching algorithms and techniques for
future initiatives. It is important that a solution with this wide range of capabilities be
assessed for the following reasons:

1. Current solutions and tools implemented were often chosen for a narrower,
operation-specific set of requirements (e.g. batch processing only with no data stewardship),
and the choice was often driven by limited financial resources.


2. The different platforms available in the market, including those in use within HHS, use
different matching mechanisms and logic that do not produce the same results across platforms.
For example, the set of matching clients produced by one technology is not the same as that
produced by another (although most may overlap). This does not represent a fully effective and
consistent solution: client matching mechanisms for an enterprise level view of a client
will produce different results and mismatches as data is pulled together from different
sources, thus repeating the problem that master data management was supposed to solve.

For this reason, it is important that a solution allow flexibility between automated and manual
decision making through data stewardship, as well as flexibility to centralize or decentralize
the data governance decision making processes. This allows the owners of record at various
levels of the organization to participate in the data management and provisioning processes.
While the capabilities may exist in vendor tools and technology offerings (through additional
modules), the current implementations of MDM tools in the enterprise do not reflect such
sophistication and are therefore limited.


Recommendations

Conducting this PoC resulted in the following recommendations for implementing an Enterprise Master
Data Management (EMDM) solution:
1. Identified requirements for a robust, comprehensive, and enterprise level MDM solution.
The MDM solution selected needs to include a robust and comprehensive toolset representing
the current and future needs of the enterprise. The current environment of multiple, limited
solutions and implementations presents a barrier to enterprise level master data management and
in turn, an enterprise level view of a client. That comprehensive toolset should have the ability
to:

Customize data standardization rules that could be applied similarly to both batch processes
and real-time processes
Analyze data sets and identify data quality concerns and inconsistencies
Match data using an easily customized set of rules and weight factors
Delegate data stewardship with a user-friendly interface for the review and processing of
suspected duplicates identified by the data matching process that require human
intervention for final determination
Capture end-to-end metadata (or data about data) to show data lineage (where data comes
from) and impact analysis (how adding or changing data will affect existing data)
Interact with standard, authenticated data sources, like USPS (US Postal Service) to verify
addresses and SSA (Social Security Administration) to check death records
Provide capabilities to efficiently create standardized data sets that will be used
downstream to exchange data with external entities, e.g. to adapt to various electronic data
exchange standards, including X12 and HL7
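
The first capability in the list above, standardization rules that behave the same way in batch
and real-time processing, can be sketched by defining the rules once and calling them from both
paths. The rule table, the field name, and the function names below are hypothetical and far
simpler than a production rule set.

```python
import re

# Hypothetical rule table: (pattern, replacement), applied in order to upper-cased text.
ADDRESS_RULES = [
    (re.compile(r"[.,]"), ""),
    (re.compile(r"\bSTREET\b"), "ST"),
    (re.compile(r"\bAVENUE\b"), "AVE"),
    (re.compile(r"\bNORTH\b"), "N"),
    (re.compile(r"\s+"), " "),
]


def standardize_address(raw):
    """Apply every rule to one address; usable for a single real-time request."""
    text = (raw or "").upper().strip()
    for pattern, replacement in ADDRESS_RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


def standardize_batch(records, field="address"):
    """Apply the same rules to a batch, returning new records (inputs unchanged)."""
    return [{**rec, field: standardize_address(rec.get(field))} for rec in records]


print(standardize_address("123 north Main Street."))  # 123 N MAIN ST
print(standardize_batch([{"id": 1, "address": "45  Oak avenue"}]))
# [{'id': 1, 'address': '45 OAK AVE'}]
```

Keeping a single rule table means a change made by the data governance team takes effect in
both the batch load and the real-time service without the two drifting apart.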

Implementing an MDM solution with a robust toolset that provides the functionality described
above decreases the amount of manual record-matching needed and, when configured effectively,
reduces mismatched records. Whether a mismatch incorrectly merges records that in fact belong
to distinct clients or fails to merge records that are duplicates, the impact can be costly.

Not merging duplicate records can result in clients receiving benefits to which they are not
entitled and that could lead to another client receiving fewer benefits due to insufficient
funding.
Incorrectly merging client records could result in the inappropriate disclosure of
sensitive/confidential data.
When an incorrectly merged client record is split, the process is complex because multiple
transactions for two people may be recorded as one person and it may not be clear which
transaction was entered for which person.


2. Data Governance
Organizational infrastructure and processes need to be committed to data governance. An
MDM solution cannot meet every master data management need. However, it
can minimize the resources currently being used to perform MDM-related or MDM-like
activities. Prior to designing and implementing an MDM solution, a data governance team needs
to be in place to:

Identify and prioritize the data elements / attributes to be captured and maintained within
the MDM repository.
Identify data matching attributes and qualitative scoring that will be used to determine
unique client criteria.
Clearly define system of record precedence when matching and merging records (which
system has the best source of data for each data element).
Define data standardization rules to be enforced via automation, such as changing all
instances of "Street" to "ST" to standardize an address so that the USPS can validate the
address.
Identify and establish the process and data steward team that will have the authority to
handle suspect processing and criteria on where data should be corrected (i.e., when client
discrepancies exist, decide whether to automate the data correction process with source
systems or manage the corrected data in MDM with a notification to source systems that
the client exists).
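
System of record precedence, as described in the list above, can be expressed as an ordered
list of sources per data element. The following sketch is illustrative only: the precedence
orders, attribute names, and sample values are hypothetical, and defining the real ones is
exactly the kind of decision a data governance team would need to make.

```python
# Hypothetical precedence: for each attribute, sources ordered most to least trusted.
PRECEDENCE = {
    "address": ["DS 1", "DS 3", "DS 2", "DS 4"],
    "ssn": ["DS 2", "DS 1", "DS 3", "DS 4"],
}
DEFAULT_ORDER = ["DS 1", "DS 2", "DS 3", "DS 4"]


def build_master_record(matched):
    """Merge matched source records into one record using per-attribute precedence.

    `matched` maps a source name to that source's record for the same client.
    For each attribute, the first non-blank value in precedence order wins.
    """
    attributes = {attr for rec in matched.values() for attr in rec}
    master = {}
    for attr in sorted(attributes):
        for source in PRECEDENCE.get(attr, DEFAULT_ORDER):
            value = matched.get(source, {}).get(attr)
            if value:
                master[attr] = value
                break
    return master


# Hypothetical matched records for one client, not PoC data.
matched = {
    "DS 1": {"ssn": "", "address": "123 N MAIN ST", "dob": "1980-01-02"},
    "DS 2": {"ssn": "123456789", "address": "123 MAIN", "dob": "1980-01-02"},
    "DS 4": {"ssn": "999999999", "address": "", "dob": ""},
}
print(build_master_record(matched))
# {'address': '123 N MAIN ST', 'dob': '1980-01-02', 'ssn': '123456789'}
```

Here the address comes from DS 1 and the SSN from DS 2, because each attribute follows its own
precedence order and blank values are skipped.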

A strong data governance structure is needed to ensure the accuracy of the data. MDM solutions
provide the technical means by which data can be managed to facilitate client matching across
the enterprise; they do not address the data ownership and decision making structure required
to accurately process and consolidate enterprise data. A team composed of data experts from each
agency is needed to work together to develop enterprise data-related rules (such as using USPS
standards for entering addresses) and to take ownership of the data to ensure those rules are
enforced; to address issues and concerns; and to govern the process as new elements for
inclusion in the MDM repository are identified.
3. Perform cost/benefit analysis (CBA) to determine the true implementation cost.
This PoC did not include a cost/benefit analysis (CBA) component. As this PoC dealt only with
a small subset of HHS data, further analysis is required to determine the true cost of
implementing an enterprise MDM solution (i.e., license costs, staffing costs, and hardware
infrastructure).


4. Perform additional proof of concept or perform pilot projects for MDM.


An additional PoC could be conducted with another recognized MDM solution provider to assess
the viability of the technology for collaborative MDM and other use cases not verified during
the first proof of concept project. This recommendation suggests performing a PoC on a larger
set of data with more data sources to gain a better understanding of the data issues that might
be encountered with a full implementation. In addition, in order to properly assess the
viability and maturity of the MDM concept, the PoC should be conducted with a different tool or
technology. To maximize the value of the PoC by working with industry leading vendors
recommended by independent research, the PoC should be performed using technologies such as
Initiate Systems or Oracle, as recommended by Gartner Research. However, a constraining factor
may be that other vendors in the MDM market may not have the financial resources to perform
such a PoC, so there is a risk that proving a solution on a full set of data may not be
feasible. An additional PoC will also require organizational support and staff resources to
facilitate oversight, which may also be a constraining factor. In order to obtain meaningful
results to support a purchasing decision, it may be necessary to allocate funds to participate
in an actual pilot rather than a PoC.
5. Educate and present results of the MDM PoC to various user communities
A significant number of users and technical operations continue to maintain that MDM solutions
should be implemented at a local level with cheaper solutions. Often, this approach is
recommended due to a lack of understanding or awareness of the overall enterprise need, or a
lack of understanding of the downstream processes in areas that may consume their data.
In addition, issues of control, limited budget, and speed of execution drive the decision to
choose a local, siloed implementation. This recommendation advocates presenting the MDM PoC
results and educating the user community to achieve a broader vision of how MDM can positively
impact agency-wide operations and could be cost-effectively implemented across the enterprise
through cost sharing.


Conclusion

From a technical standpoint, MDM was assessed to be a viable solution for the problem of
matching and linking clients/patients across different programs and systems with a high degree
of trust. An MDM solution implemented on an enterprise level could potentially play an integral
part in the success of other HHS initiatives, including the EDW Initiative and HIE/HIT. MDM
could support and facilitate enterprise level data governance operations.

The MDM concept needs to be assessed in more detail from the standpoint of collaborative
analytics. A successful Enterprise MDM implementation will require substantial planning and
investment not only in the software/hardware environment, but also in establishing a supporting
governance structure. In order to establish the viability of implementation, further research
needs to be done on the financial viability of an enterprise level solution. In addition, the
performance capabilities of MDM solutions need to be researched through case studies of MDM
implemented in other enterprises.

An additional PoC and/or pilot should be undertaken before a final MDM solution is selected so
that the tool set capabilities can be compared and the magnitude of effort required when
working with a larger set of data can be assessed.


HHSC
Enterprise IT: IBM Master Data Management

PoC Executive Report

Appendices


Appendix A Glossary

Term / Acronym    Definition
BI                Business Intelligence
CBA               Cost Benefit Analysis
DOB               Date of Birth
EDW               Enterprise Data Warehouse
EIT               Enterprise Information Technology
EMDM              Enterprise Master Data Management
FTE               Full-Time Employee
HHS               Health and Human Services
HHSC              Health and Human Services Commission
HIE               Health Information Exchange
HIT               Health Information Technology
MCI               Master Client Index
MDM               Master Data Management
MITA              Medicaid Information Technology Architecture
MOU               Memorandum of Understanding
NHIN              National Health Information Network
PoC               Proof of Concept
RHIE              Regional Health Information Exchange
SME               Subject Matter Expert
SOA               Service-Oriented Architecture
SSN               Social Security Number
