1. Problem Definition
The Health and Human Services (HHS) system relies on mission-critical applications to support its
day-to-day client services and to inform executive, management, and operational decision making.
As a result of program growth across HHS agencies and the need to adapt to changing laws, policies,
and procedures over the years, the IT systems that support these operations have become complex,
difficult to maintain, and difficult to change from a decentralized, program-centric design to a
client-centric design. This makes it difficult to adopt current federal and state philosophies of
client-centric service delivery and to support interoperability initiatives such as the Medicaid
Information Technology Architecture (MITA). For example, a lack of contextually consistent identification mechanisms,
definitions, and standards associated with tracking key business entities, such as clients, providers, and
services introduces a significant challenge for executive management and operational staff to get a
holistic view of an entity across programs. The ability to match and link these entities across different
programs and systems with a high degree of trust is a foundational level issue that could directly or
indirectly impact the successful implementation of upcoming initiatives on the HHS roadmap, such as
Enterprise Data Warehouse, MITA and Health Information Technology / Health Information Exchange
(HIT/HIE).
Establishing a robust and reusable solution for programs and applications to establish a trustworthy
enterprise view of the client is critical for moving forward with future initiatives at HHS. Several
inherent system designs, operational practices, and technical issues currently prevent HHS from
creating an enterprise view of a client or a provider.
Data Collection
Each HHS agency uses independent operational systems to support its various programs. Although
key components of a given data set are often similar across the enterprise (e.g., client data: name,
date of birth, social security number, address), the data collected by each agency resides in
silo-centric systems in different formats with varying operational business rules. Linking data from
different sources across the enterprise is difficult due to program-specific system designs,
inconsistent data formats, and lack of data sharing agreements.
Data Cleansing
To assist executive management in making informed decisions and to satisfy HHS analytical and
reporting needs, several attempts, partial or unsuccessful, have been made to consolidate data into
one central location. These attempts drew on context-specific data sources across a subset of HHS
agencies and programs rather than aiming to establish a complete client-centric view of services
received. Often, these data collection and cleansing processes are extremely resource-intensive
and, in some cases, do not accurately consolidate the large amount of data in a contextually
meaningful way.
Data Matching
Client data collected in the various mission-critical applications is operation-specific, and the
application designs ignore the fact that the same client data is captured in one or more other HHS
systems. Each system has separate business rules, formats, and attributes, which makes matching
client data across agencies difficult and prevents a single view of a client. Because client data is
captured for different contexts without validation across systems during the operational process,
matching and reconciling cross-system data becomes a significant challenge during downstream
operations and strategic analysis activities.
Data Standardization and Ongoing Maintenance
There are limited enterprise-level standards and guidelines to define HHS master data entities (client,
provider, claim, etc.) or relationships between similar entities (client, patient, person). This has
contributed to redundant data across disparate operational and analytical systems and to potential
duplication of services provided by HHS agencies to clients. See the example in Diagram 1 below.
Client Data Source      Client Data Attribute
Data Source (DS) 1      INDV_ID
Data Source (DS) 2      XXX_MEMBER_NO
Data Source (DS) 3      PCN_NBR
Data Source (DS) 4      RECORD_KEY
Data Source (DS) 5      PERSON_ID
Client data is stored with different names, formats, and values, so there is no single view of a client.
Problem: Client data cannot be joined across various systems without extensive data analysis
and transformation rules.
Diagram 1: Ways of storing Client Data in various HHS systems
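The naming differences shown in Diagram 1 can be illustrated with a small sketch. The five column names come from the diagram, and the pairing of each source with its column follows the order in which the diagram lists them. The record values and the names `CLIENT_KEY_COLUMN` and `source_key` are hypothetical. The sketch only normalizes where each system keeps its client key; it does not, by itself, determine which records belong to the same person.

```python
# Key column used by each data source, in the order listed in Diagram 1.
CLIENT_KEY_COLUMN = {
    "DS 1": "INDV_ID",
    "DS 2": "XXX_MEMBER_NO",
    "DS 3": "PCN_NBR",
    "DS 4": "RECORD_KEY",
    "DS 5": "PERSON_ID",
}


def source_key(source: str, record: dict) -> tuple[str, str]:
    """Return a (source, key) pair that is unique across all systems."""
    column = CLIENT_KEY_COLUMN[source]
    return (source, str(record[column]).strip())


# Hypothetical records; the values are invented for illustration.
print(source_key("DS 1", {"INDV_ID": 1001, "LAST_NM": "IBARRA"}))
print(source_key("DS 5", {"PERSON_ID": " P-77 "}))
```

Qualifying each key with its source avoids collisions when two systems happen to issue the same identifier value to different people.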
There is no single, accurate, and comprehensive reusable framework to link client data across the
various systems as a proactive and foundational basis for decision making and managing operations
from a cross-functional client view perspective. Operational practices that attempt to build such
context-specific views are therefore reactive in nature and are often resource intensive, involve
significant manual intervention, and take a significant amount of time to design and perform the
necessary cross-system analysis. In addition, these reactive solutions are often situation-driven,
context-specific, and offer limited opportunity for expansion into reusable robust enterprise-focused
long term solutions that efficiently leverage and capture cross-program subject matter knowledge.
A potential solution to these data issues is to proactively recognize this pattern of problems across
the enterprise and establish a unified view of commonly used entities (client, provider, claim, etc.),
making entity-centric structures available at appropriate levels of detail for reuse by various
agency-specific analytical and operational activities. Program-level operations could then redirect
resources away from siloed, resource-intensive entity matching by using a service from a
centralized repository that maintains the necessary business intelligence and data for a
dynamic entity-centric view of key business entities.
This foundational strategy of centralized data management, while potentially resource-intensive
in terms of organizational and automation support, could be the basis for cost-effective
investments. The return on these investments could be demonstrated for existing program
functions and systems, as well as for future client-centric data management initiatives on the HHS
roadmap, such as the Enterprise Data Warehouse (EDW), Medicaid Information Technology
Architecture (MITA), and Health Information Technology / Health Information Exchange (HIT/HIE).
Master Data Management (MDM) comprises a set of processes and tools that consistently define and
manage the data entities of an organization. MDM has the objective of providing processes for
collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such
data throughout an organization to ensure consistency and control in the ongoing maintenance and
application use of this information.
MDM is about two critical components: the data itself and the functionality to ensure the data is
contextually accurate and timely.
While data is the foundation of a Master Data Management solution, it cannot be effective without a
second component: functionality to govern the data. Data on its own has no ability to maintain
data readiness or, more simply, the accuracy of the data in the context of a specific purpose. Master
data must be actively managed by appropriately selected data stewards within an organization.
An Enterprise MDM strategy, when properly implemented, can be beneficial across the HHS
enterprise for cross agency data sharing and synchronization, Health Information Exchange (HIE)
efforts, Medicaid Information Technology Architecture (MITA), and the creation of a true Enterprise
Data Warehouse (EDW).
A true Enterprise MDM implementation can manage changes, event triggers and notifications across
all applications enterprise-wide. In addition, MDM must do more than simply house the data; it must
manage its use in processes across the enterprise using different implementation strategies.
MDM in the context of Cross-Agency Data Sharing and Synchronization Initiatives
A Collaborative MDM manages the process of creating, defining, and synchronizing
master data across systems. Once the master data is defined, it can then be synchronized
with operational and analytical systems and applications. Collaborative MDM provides a
platform to aggregate, enrich, and publish definitional data and requires workflow and
advanced security capabilities.
The MDM solution provides execution on all critical data changes and event notifications
from simple to complex. This includes everything from resolving a duplicate record to
determining which systems get specific updates.
For example, address changes made in one source system can be sent to MDM as part of
real-time updates or a daily batch feed to update the master record. MDM can then
identify that the same client exists in other systems within HHSC and can send a critical
data change notification to these systems as well.
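The address-change example above can be sketched as follows. This is a minimal illustration, not the behavior of any particular MDM product: the `MasterClient` class, the `apply_address_change` function, and all keys and addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MasterClient:
    mci_key: str
    address: str
    # Source systems known to hold this client, mapped to their local keys.
    source_keys: dict[str, str] = field(default_factory=dict)


def apply_address_change(master, origin, new_address):
    """Update the master record and list the systems that should be notified."""
    master.address = new_address
    return [
        {"system": system, "local_key": key, "address": new_address}
        for system, key in master.source_keys.items()
        if system != origin  # the originating system already has the change
    ]


client = MasterClient("MCI-1", "1 OLD RD", {"DS 1": "1001", "DS 2": "M-9", "DS 4": "R-5"})
notices = apply_address_change(client, "DS 1", "22 NEW ST")
for notice in notices:
    print(notice)
```

Whether a notification is applied automatically in the receiving system or queued for review is a governance decision that the sketch leaves open.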
MDM in the context of upcoming Health Information Exchange (HIE) and Medicaid
Information Technology Architecture (MITA) Initiatives
In an Operational MDM, use and maintenance of master data occurs within operational
processes and applications. The master data is leveraged by other systems using these
services. Operational MDM can leverage and become a significant part of a service-oriented architecture to support a variety of application needs. In the case of the HHS
environment, HIE and MITA related initiatives will establish systemic processes that
could benefit from Operational MDM. The MDM implementation requires performance
to handle high transaction levels and should have open integration with operational
applications. Operational MDM uses pre-defined, out-of-the-box business services.
An Operational MDM solution is modeled on a service-oriented architecture (SOA),
should be flexible and scalable, and have some predefined set of out-of-the-box functions
to support the management and integrity of data. Operational MDM systems have the
flexibility to extend functionality to support new or additional business processes.
MDM in the context of Business Intelligence (BI) through Enterprise Data Warehouse
(EDW) Initiatives
An Analytical MDM provides accurate, consistent, and up-to-date master data to an
Enterprise Data Warehouse (EDW). It feeds business intelligence insight data back into
collaborative and operational MDM.
For example, a change in address (city/county/region) made by a client through MDM can
indicate that he/she is now eligible for a program that was previously unavailable to the
client. MDM can then be used to send a notification to the EDW that can trigger a
Business Intelligence (BI) event to alert the case worker to contact the client regarding
the additional eligibility available.
To evaluate the viability and capabilities of master data management (MDM) using client data across
different program data sets, HHSC entered into an agreement with IBM to perform a proof of
concept (PoC) exercise. The IBM MDM solution was chosen for the evaluation for two key reasons:
1. IBM had been previously identified as one of three visionary industry leaders in the customer
data integration solution space during a Gartner Research Study in May 2009.
2. IBM agreed to commit resources and make available the necessary software and hardware
infrastructure to perform the proof of concept exercise in accordance with HHSC policies
and procedures.
The IBM InfoSphere product suite was used to assess if the Master Data Management (MDM)
technical solution could help HHSC in defining a single view of a client called Master Client Index
(MCI). The PoC was designed to demonstrate the viability of IBM MDM software products to build
a unified, standardized, and integrated repository of clients served and used by the various benefits
programs offered by HHS. The PoC was intended to:
- Prove the benefits of utilizing a Master Data Management solution within HHS using
  business use cases.
- Validate the role of MDM in enabling strategic and operational analytic applications.
The scope of the PoC was intended to demonstrate the functional and technical capabilities of an
MDM solution by accomplishing the following:
- Determine the attributes that should be used to match client records across various source
  systems (SSN, name, address, etc.)
- Identify individual clients processed by multiple source systems
- Resolve the same client's records across multiple systems into a single record based on
  matching attributes
- Assign a single, integrated Master Client Index key for each individual
- Create a Master record which associates all source system keys for an identified client
- Use the Master record for integrated reporting on client information across source systems.
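The scope items above (match, resolve, assign a key, associate source keys) can be sketched in a few lines. This is a simplified illustration that assumes a single deterministic match rule on normalized name and date of birth; the PoC itself used the matching capabilities of the MDM software. The function names, field names, key format, and sample records are hypothetical.

```python
from collections import defaultdict


def match_key(rec):
    """Deterministic match key: normalized last name, first name, and DOB."""
    return (rec["last"].strip().upper(), rec["first"].strip().upper(), rec["dob"])


def build_mci(records):
    """Group records that share a match key and assign one MCI key per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append((rec["source"], rec["source_key"]))
    # Each master record associates every source system key for one client.
    return {
        f"MCI-{n:06d}": sorted(keys)
        for n, (_, keys) in enumerate(sorted(groups.items()), start=1)
    }


records = [
    {"source": "DS 1", "source_key": "1001", "first": "Ana", "last": "Ibarra", "dob": "1980-02-01"},
    {"source": "DS 2", "source_key": "M-9", "first": "ANA ", "last": "ibarra", "dob": "1980-02-01"},
    {"source": "DS 3", "source_key": "P-4", "first": "Ivan", "last": "Ivers", "dob": "1975-07-30"},
]
mci = build_mci(records)
for key, source_keys in mci.items():
    print(key, source_keys)
```

The resulting mapping from MCI key to source keys is what allows integrated reporting to join data held in different systems.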
The PoC assessed whether various HHS agencies and systems could take full advantage of a Master Client
Index capability maintained at an enterprise level. An MCI could potentially enable various HHS
applications to link data with other systems to derive and answer analytical and operational questions
accurately and assist business operations and executives to make informed decisions. Assessment of
the performance capabilities of the various hardware and software tools used in the PoC was not in
scope for this PoC. In addition, the focus of the PoC was to evaluate the general capabilities and
maturity of available MDM tools rather than perform a technology evaluation of the IBM MDM
solution relative to other technologies.
The IBM MDM tool set and its high-level functionality are summarized in this section.
[Figure: IBM Master Data Management. Labels: Industry Models & Assets; A Multi-Form Master Data Management System (Collaborate, Operationalize, Analyze); Information Server (Understand, Cleanse, Transform, Deliver); Standardize, merge, and correct information; Synchronize, virtualize and move information for in-line delivery; Information Analyzer; QualityStage; DataStage]
Product Suite: IBM InfoSphere Information Server
Description: IBM InfoSphere Information Server enables businesses to perform five key integration functions:
Description: services so data is managed independently of any single line of business, system or application. This strategy enables enterprises to identify common functionality for all systems and applications and then support efficient, consistent use of business information and processes.
Per IBM, their MDM platform moves beyond previous attempts at centralizing control of data by allowing users to fully manage
data with multiple domains and multiple styles of data usage.
[Figure: IBM MDM platform overview. Labels: UI Applications; Middleware & Business Processes; Data Warehouses & Analytics; Data Stewards & MDM Users; MDM Business Intelligence; Business Logic; Data Quality Management; Data Governance; Knowledge; MDM Domains (Party, account, ...)]
To perform the PoC, HHSC EIT collected client data from various agencies/systems. Due to
the limited hardware capacity associated with performing a PoC, only a small subset of client
data was used to assess the functionality of IBM's MDM product suite. Specifically, the
subset of client records whose last names began with "I" was chosen for this exercise.
To prove Master Client Index (MCI) integration between data sources, analytical use cases
were defined to merge MCI data with claims data from one system and lab data from
another. These use cases were designed to prove how an MCI could be used for data
warehousing and analytical application integration.
The diagram on the next page shows the architecture, number of data sources and use cases
involved in the IBM MDM PoC.
The diagram and tables below present the various data sets that were used for the PoC
including the subset of data loaded from each source into the consolidated data environment
on the MDM Server.
The following table describes the data source details of the data files used to load MDM.
Source   Data Details
DS 1     Data from the source system's master client index table (all clients). Only clients whose last names started with "I" were loaded into the PoC. Client addresses not in this source system were obtained via a data extract from the appropriate source system.
DS 2     Monthly client extract (all clients obtained). Only data for clients whose last names started with "I" was loaded.
DS 3     Monthly extract file as of August 2009. Only data for clients whose last names started with "I" was loaded.
DS 4     Clients and claims data. Two years (2008 and 2009) of client and lab data from this system. Statistics quoted are only for clients whose last names start with "I".
The table below summarizes the steps facilitated by the MDM software and the associated
counts as data was loaded from the different sources into the MDM environment and the
different software capabilities were used:
Data Load Description                                                                                   Record Count
Raw client data starting with "I" for all 4 data sources (clients loaded into the consolidated table)   69,057
Duplicates within a data source dropped (sum for all 4 data sources)                                    5,620
Records dropped due to data issues within each data source (total of 4 data sources)                    3,563
Records duplicated across data sources (updated existing MDM record with additional data source identifier)   7,298
Client records resulting from standardization & matching (clients loaded into master table)             52,576
Rows dropped due to invalid last name                                                                   749
Total clients loaded into MDM                                                                           51,827
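The counts in this summary reconcile if each drop is treated as a subtraction from the raw count. That reading is an assumption; the source reports the figures but does not state the arithmetic. The variable names are illustrative.

```python
# Record counts reported for the PoC load (clients whose last names start with "I").
raw_records = 69_057
within_source_duplicates = 5_620
dropped_for_data_issues = 3_563
duplicated_across_sources = 7_298
invalid_last_name = 749

# Records remaining after standardization and matching.
after_matching = (raw_records - within_source_duplicates
                  - dropped_for_data_issues - duplicated_across_sources)
# Records remaining after rows with an invalid last name are dropped.
total_loaded = after_matching - invalid_last_name

print(after_matching)  # 52576, the master-table count in the summary
print(total_loaded)    # 51827, the total clients loaded into MDM
```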
The MDM solution's data transformation step included data standardization. This data
standardization process was required to address the following types of issues encountered
with source system data prior to the creation of the MCI:
DS 1
15% of the records had a blank value in the Social Security Number field.
30% of the records contained filler information in the address fields (e.g.
Hurricane Ike, Homeless, Same as above)
75% of the addresses were blank in the data extracts utilized
Non-standard data entries within the address related fields i.e. address values
spread across multiple columns
11% of clients had multiple records within the same data source
DS 2
DS 3
Suspected invalid age data, client ages greater than 107 years
28% of the records had non-standard address values
Records containing case numbers with zero values
2% of clients had multiple records within the same data source
DS 4
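The kinds of issues listed above (blank Social Security Numbers, filler or blank addresses, implausible ages) lend themselves to simple per-record profiling checks. The sketch below is an illustration only: the filler values and the 107-year threshold come from the list above, while the field names, the flag names, and the `profile` function are hypothetical.

```python
from datetime import date

# Filler values quoted in the issue list; a real rule set would be longer.
FILLER_ADDRESSES = {"HURRICANE IKE", "HOMELESS", "SAME AS ABOVE"}
MAX_PLAUSIBLE_AGE = 107


def profile(rec, today):
    """Return the list of data quality flags raised by one client record."""
    flags = []
    if not (rec.get("ssn") or "").strip():
        flags.append("blank_ssn")
    address = (rec.get("address") or "").strip().upper()
    if not address:
        flags.append("blank_address")
    elif address in FILLER_ADDRESSES:
        flags.append("filler_address")
    dob = rec.get("dob")
    if dob is not None:
        # Subtract one year if the birthday has not yet occurred this year.
        age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
        if age > MAX_PLAUSIBLE_AGE:
            flags.append("suspect_age")
    return flags


as_of = date(2009, 8, 31)
print(profile({"ssn": "", "address": "Homeless", "dob": date(1890, 1, 1)}, as_of))
print(profile({"ssn": "123-45-6789", "address": "12 MAIN ST", "dob": date(1980, 5, 5)}, as_of))
```

Counting flags per data source gives the kind of percentages reported for each source.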
The data below provides additional details on the number and percentage of data
standardization and duplication issues encountered in the subset of data used for the PoC.
Duplicate Records
Source   Count    Percent
DS 1     11,696   16.93
DS 2     284      0.41
DS 3     133      0.19
DS 4     4,367    6.32
Once the final MCI had been created, further analysis of the data using the software allowed
one to observe the following:
Attributes Used
First Name, Last Name, Street Name
First Name, Last Name, DOB
SSN
Data evaluated for each grouping used the following attributes in the data matching process:
SSN
Last Name
First Name
Middle Name
DOB
Gender
Address
City
State
Zip
County
Region
Phone
Source System Key
As issues were identified in the data matching process, the MDM tool set allowed additional
data matching rules to be defined. The overall observation was that any MDM solution
implemented would require a flexible tool set that could be customized to address data
matching needs.
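A customizable rule set of the kind described above can be sketched as a list of attribute groupings. The three groupings are the ones listed under "Attributes Used"; treating them as alternative match rules, and the names `MATCH_RULES`, `norm`, and `matching_rule`, are assumptions made for illustration rather than the configuration used in the PoC.

```python
# Each rule is a group of attributes that must all agree for a match.
MATCH_RULES = [
    ("first_name", "last_name", "street_name"),
    ("first_name", "last_name", "dob"),
    ("ssn",),
]


def norm(value):
    """Upper-case and collapse whitespace; missing values become empty strings."""
    return " ".join(str(value or "").upper().split())


def matching_rule(a, b, rules=MATCH_RULES):
    """Return the first rule on which both records agree, or None."""
    for rule in rules:
        values_a = [norm(a.get(attr)) for attr in rule]
        values_b = [norm(b.get(attr)) for attr in rule]
        # Blank values never count as agreement.
        if all(values_a) and values_a == values_b:
            return rule
    return None


a = {"first_name": "Ana", "last_name": "Ibarra", "dob": "1980-02-01", "ssn": ""}
b = {"first_name": "ANA", "last_name": "ibarra ", "dob": "1980-02-01", "ssn": ""}
c = {"first_name": "Ivan", "last_name": "Ivers", "dob": "1975-07-30", "ssn": ""}
print(matching_rule(a, b))
print(matching_rule(a, c))
```

Adding or reordering entries in `MATCH_RULES` changes the matching behavior without touching the comparison logic, which is the flexibility the observation above calls for.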
Use cases identified used a sampling (clients whose last names start with "I") of HHS data
from all 4 data sources. Both operational and analytical use cases were utilized.
Operational MDM Use Cases (Enterprise Master Client Index (MCI)):
- Identify and report on contradictory and/or overlapping attribute values per identified
  individual and general data profiling information discovered in the analysis:
  Successfully demonstrated.
- Demonstrate the ability to identify a suspected duplicate individual during an
  operational add of a new individual to the Master Data Management repository:
  Successfully demonstrated.
- Demonstrate the potential capability to enable HHS applications to search, access,
  and update individual client information with service calls to the Master Data
  Management repository: Successfully demonstrated.
Analytical MDM Use Cases:
- Single View of Client and Claims for auditing purposes: Successfully demonstrated.
- Costs for the 100 Most Costly Medicaid Clients: Successfully demonstrated.
  Five claim data files were manually loaded to a claim fact table (2
  acute care and 3 CMS claims). This report was generated by joining
  the claim fact table with the customer table, which was a dimension out
  of MDM.
- Determine Diabetic Clients Overdue for a Medical Screening: Not performed.
No collaborative use cases had been identified for this PoC at the onset of planning
this exercise. Future PoC and assessment activities will need to validate these
capabilities.
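The "100 Most Costly Medicaid Clients" report described above joins a claim fact table to the client dimension produced by MDM. A minimal sketch of that join is shown below, assuming claims carry a source system and a local client key; the field names, the `top_costly_clients` function, and the sample amounts are hypothetical.

```python
from collections import defaultdict


def top_costly_clients(claims, source_to_mci, n=100):
    """Total paid amounts per MCI key and return the n highest totals."""
    totals = defaultdict(float)
    for claim in claims:
        mci_key = source_to_mci.get((claim["source"], claim["client_key"]))
        if mci_key is not None:  # skip claims with no matched client
            totals[mci_key] += claim["paid_amount"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Client dimension: (source, local key) -> MCI key. Values are invented.
source_to_mci = {
    ("DS 1", "1001"): "MCI-1",
    ("DS 4", "R-5"): "MCI-1",
    ("DS 2", "M-9"): "MCI-2",
}
claims = [
    {"source": "DS 1", "client_key": "1001", "paid_amount": 100.0},
    {"source": "DS 4", "client_key": "R-5", "paid_amount": 250.0},
    {"source": "DS 2", "client_key": "M-9", "paid_amount": 300.0},
    {"source": "DS 3", "client_key": "X", "paid_amount": 999.0},
]
print(top_costly_clients(claims, source_to_mci, n=1))
```

Without the MCI key, the two claims for the first client would be totaled separately and the ranking would differ.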
2. The different platforms available in the market, including those in use within HHS, use
different matching mechanisms and logic that are not consistent across platforms. For
example, the set of matched clients produced by one technology is not the same as that
produced by another (although most may overlap). This is not a fully effective and
consistent solution: client matching for an enterprise-level view of a client will yield
different results and mismatches as data is pulled together from different sources,
repeating the problem that master data management was supposed to solve.
For this reason, a solution that allows for flexibility in automation versus manual decision
making through data stewardship with flexibility to centralize or decentralize the data
governance decision making processes is important. This allows the owners of record at various
levels of the organization to participate in the data management and provisioning processes.
While the capabilities may exist in vendor tools and technology offerings (through additional
modules), the current implementations of MDM tools in the enterprise do not reflect such
sophistication and are therefore limited.
Recommendations
Conducting this PoC resulted in the following recommendations for implementing an Enterprise Master
Data Management (EMDM) solution:
1. Identified requirements for a robust, comprehensive, and enterprise level MDM solution.
The MDM solution selected needs to include a robust and comprehensive toolset representing
the current and future needs of the enterprise. The current environment of multiple, limited
solutions and implementations presents a barrier to enterprise level master data management and
in turn, an enterprise level view of a client. That comprehensive toolset should have the ability
to:
- Customize data standardization rules that could be applied similarly to both batch processes
  and real-time processes
- Analyze data sets and identify data quality concerns and inconsistencies
- Match data using an easily customized set of rules and weight factors
- Delegate data stewardship with a user-friendly interface for review and processing of
  suspected duplicates identified by the data matching process that require human
  intervention for final determination
- Capture end-to-end metadata (or data about data) to show data lineage (where data comes
  from) and impact analysis (how adding or changing data will affect existing data)
- Interact with standard, authenticated data sources, like USPS (US Postal Service) to verify
  addresses and SSA (Social Security Administration) to check death records
- Provide capabilities to efficiently create standardized data sets that will be used
  downstream to exchange data with external entities, e.g., to adapt to various electronic data
  exchange standards, including X-12, HL7, etc.
Implementing an MDM solution with a robust toolset like the functionality described above
decreases the amount of manual record-matching needed and, when configured effectively,
reduces mismatched records. Whether a mismatch incorrectly merges records that in fact
belong to distinct clients or fails to merge records that were duplicates, the impact
can be costly.
- Not merging duplicate records can result in clients receiving benefits to which they are not
  entitled and that could lead to another client receiving fewer benefits due to insufficient
  funding.
- Incorrectly merging client records could result in the inappropriate disclosure of
  sensitive/confidential data.
- When an incorrectly merged client record is split, the process is complex because multiple
  transactions for two people may be recorded as one person and it may not be clear which
  transaction was entered for which person.
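The combination of weighted matching rules and steward review described in this recommendation can be sketched as a score with two thresholds. All weights, thresholds, and names below are hypothetical values chosen for illustration; in practice they would be set by the data governance team and tuned against real data.

```python
# Hypothetical weight for agreement on each attribute.
WEIGHTS = {"ssn": 5.0, "dob": 3.0, "last_name": 2.0, "first_name": 1.0}
AUTO_MATCH = 8.0  # at or above this score, merge automatically
REVIEW = 5.0      # at or above this score, route to a data steward


def score(a, b, weights=WEIGHTS):
    """Sum the weights of attributes that are present and equal in both records."""
    total = 0.0
    for attr, weight in weights.items():
        value_a, value_b = a.get(attr), b.get(attr)
        if value_a and value_b and value_a == value_b:
            total += weight
    return total


def decide(a, b):
    """Classify a record pair as match, steward_review, or no_match."""
    s = score(a, b)
    if s >= AUTO_MATCH:
        return "match"
    if s >= REVIEW:
        return "steward_review"
    return "no_match"


base = {"ssn": "111", "dob": "1980-02-01", "last_name": "IBARRA", "first_name": "ANA"}
same = dict(base)
partial = {"ssn": "", "dob": "1980-02-01", "last_name": "IBARRA", "first_name": "ANNA"}
other = {"ssn": "222", "dob": "1975-07-30", "last_name": "IVERS", "first_name": "ANA"}
print(decide(base, same), decide(base, partial), decide(base, other))
```

Raising the automatic-match threshold reduces the risk of incorrectly merged records at the cost of more manual review, which is the trade-off the items above describe.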
2. Data Governance
Organizational infrastructure and processes need to be committed for data governance. An
MDM solution cannot replace all components of master data management needs. However, it
can minimize the resources currently being used to perform MDM-related or MDM-like
activities. Prior to designing and implementing an MDM solution, a data governance team needs
to be in place to:
- Identify and prioritize the data elements / attributes to be captured and maintained within
  the MDM repository.
- Identify data matching attributes and qualitative scoring that will be used to determine
  unique client criteria.
- Clearly define system of record precedence when matching and merging records (which
  system has the best source of data for each data element).
- Define data standardization rules to be enforced via automation, such as changing all
  instances of "Street" to "ST" to standardize an address so that the USPS can validate the
  address.
- Identify and establish the process and data steward team that will have the authority to
  handle suspect processing and criteria on where data should be corrected (i.e., when client
  discrepancies exist, decide whether to automate the data correction process with source
  systems or manage the corrected data in MDM with a notification to source systems that
  the client exists).
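Two of the governance decisions above, system of record precedence and automated standardization rules, translate directly into configuration. The sketch below assumes a per-attribute precedence list and a single "Street" to "ST" rule; the precedence order, the function names, and the sample values are hypothetical.

```python
import re

# Hypothetical precedence: which source is trusted first for each attribute.
PRECEDENCE = {
    "address": ["DS 2", "DS 1", "DS 3"],
    "dob": ["DS 1", "DS 2", "DS 3"],
}


def standardize_address(address):
    """Upper-case, collapse whitespace, and abbreviate STREET to ST."""
    cleaned = " ".join(address.upper().split())
    return re.sub(r"\bSTREET\b", "ST", cleaned)


def survive(attribute, values_by_source):
    """Pick the value from the highest-precedence source that has one."""
    for source in PRECEDENCE[attribute]:
        value = values_by_source.get(source)
        if value:
            return value
    return None


print(standardize_address("12 Main  Street"))
print(survive("address", {"DS 1": "1 OLD RD", "DS 2": "", "DS 3": "9 ELM ST"}))
```

Keeping the rules in data rather than in code lets the governance team change precedence or add abbreviations without reworking the merge logic.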
A strong data governance structure is needed to ensure the accuracy of the data. MDM solutions
provide the technical means by which data can be managed to facilitate client matching across
the enterprise; they do not address the data ownership and decision-making structure required to
accurately process and consolidate enterprise data. A team composed of data experts from each
agency is needed to work together to develop enterprise data-related rules (such as using USPS
standards for entering addresses) and to take ownership of the data to ensure those rules are
enforced; to address issues and concerns; and to govern the process as new elements for
inclusion in the MDM repository are identified.
3. Perform cost/benefit analysis (CBA) to determine the true implementation cost.
This PoC did not include a cost/benefit analysis (CBA) component. As this PoC dealt only with
a small subset of HHS data, further analysis is required to determine the true cost for
implementing an enterprise MDM solution (i.e., license costs, staffing costs, and hardware
infrastructure).
Conclusion
From a technical standpoint, MDM was assessed to be a viable solution for the problem of matching and
linking clients/patients across different programs and systems with a high degree of trust. An MDM
solution implemented on an enterprise level could potentially play an integral part in the success of other
HHS initiatives, including the EDW Initiative and HIE/HIT. MDM could support and facilitate
enterprise level data governance operations.
The MDM concept needs to be assessed in more detail from the standpoint of collaborative analytics. A
successful Enterprise MDM implementation will require substantial planning and investment not only in
the software/hardware environment, but also in establishing a supporting governance structure. In order
to establish the viability of implementation, further research needs to be done on the financial viability
of an enterprise level solution. In addition, the performance capabilities of MDM solutions need to be
researched through case studies of MDM implemented in other enterprises.
An additional PoC and/or pilot should be undertaken before a final MDM solution is selected so that the
tool set capabilities can be compared and the magnitude of effort required when working with a larger
set of data can be assessed.
HHSC
Enterprise IT: IBM Master Data Management
Appendices
Appendix A Glossary
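Term / Acronym   Definition
BI               Business Intelligence
CBA              Cost/Benefit Analysis
DOB              Date of Birth
EDW              Enterprise Data Warehouse
EIT              Enterprise IT
EMDM             Enterprise Master Data Management
FTE              Full-Time Employee
HHS              Health and Human Services
HHSC
HIE              Health Information Exchange
HIT              Health Information Technology
MCI              Master Client Index
MDM              Master Data Management
MITA             Medicaid Information Technology Architecture
MOU              Memorandum of Understanding
NHIN
PoC              Proof of Concept
RHIE
SME
SOA              Service-Oriented Architecture
SSN              Social Security Number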