Reality
Father calls me William, Sister calls me Will, Mother calls me Willie, But the fellers call me Bill!
Eugene Field (Poet)
Copyright 2008 Informatica Corporation. All rights reserved. Unauthorised distribution or copying is prohibited.
A Thought
Are these the same people?
12 May 49
British
Sid Ali Ahmd Al Gamdi, Saud Ali Abdullah AlGamdi
Transliteration Realities
Transliteration does not make the problem of identity search & matching go away; it just adds to the complexity.
The ideal solution captures the identity data in both local source and transliterated form.
Together, and with algorithms that address the individual characteristics of each form, the opportunity for success is multiplied even further.
Transliterated into French by Algerian speaker
Transliterated from French into English by English speaker
Arabic Identity
In high-risk systems,
Identify persona non grata
Check visa applicants against known terrorists and undesirables
Identify threats and prevent entry at border posts
Manage case lifecycle for immigration benefits
THE CHALLENGE
Identity data in national watch lists is incomplete, misspelled, and drawn from various languages, countries and cultures
Romanization and transliteration introduce more error and variation
Cost of a missed match
IDENTITY ADVANTAGE
Embeddability
Transaction latency and throughput
Ability to deal with incomplete or partial data
Ability to deal with entity data from anywhere in the world
RESULTS/BENEFITS
Improved performance and better accuracy compared to in-house solutions
Lower false positives help resource/staff management
Lower TCO (COTS, ongoing maintenance)
Reduced reliance on internal resources
Hybrid Approach
Linguistic
No single algorithm is capable of compensating for all the classes of error and variation present in identity data.
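As a rough illustration of why a hybrid helps, the sketch below combines two unrelated comparison techniques and keeps the stronger signal. The slides do not name the algorithms used, so both techniques here are stand-ins: `skeleton` is a crude, hypothetical phonetic-style key (drop vowels after the first letter, collapse repeats) and `edit_similarity` uses Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def skeleton(name):
    """Crude phonetic-style key (an assumption, not the product's method):
    keep the first letter, drop later vowels and 'y', collapse repeated letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    for c in letters[1:]:
        if c in "aeiouy":
            continue
        if c != out[-1]:
            out.append(c)
    return "".join(out)


def edit_similarity(a, b):
    """Character-level similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hybrid_score(a, b):
    """Take the stronger of the two signals, so that each technique
    can compensate for a class of variation the other misses."""
    phonetic = 1.0 if skeleton(a) == skeleton(b) else 0.0
    return max(phonetic, edit_similarity(a, b))


print(hybrid_score("Smith", "Smythe"))      # the skeleton keys agree
print(hybrid_score("AlGamdi", "Al Gamdi"))  # spacing differences are ignored by the key
```

Neither technique alone covers spelling, spacing and phonetic variation; combining them is the point of the example, not the specific functions chosen.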
Our Solution
Smart indexing
Overcome spelling & phonetic errors
Overcome missing/out-of-order words and other errors & variation
Overcome transliteration & multi-country data
Flexible Search Strategies
Balance performance and comprehensiveness of search
Matching algorithms
Mimic a human expert's ability to determine a match based on numerous attributes
At search time :
2. Search the data for the required candidates
3. Verify the Match using additional data elements
A J S
Name (Compressed)
Andrew Jackson Smith
Jackson Smith Andrew
Smith Andrew Jackson
Address
9 Headley Road Woodley Reading
29 Headley Road Reading
12 High Street Bracknell
Name (Compressed)
Andrew Jackson Smith
Jackson Smith Andrew
Smith Andrew Jackson
Andrew Smythe
Smythe Andrew
Address
9 Headley Road Woodley Reading
29 Headley Road Reading
12 High Street Bracknell
Name (Compressed)       Other Data (Compressed)
Andrew Jackson Smith    ABC123+
Jackson Smith Andrew    ABC123+
Smith Andrew Jackson    ABC123+
Andrew Smythe           ADE938+
Smythe Andrew           ADE938+
Andrew Smith            ARF073+
Smith Andrew            ARF073+
Address
9 Headley Road Woodley Reading
29 Headley Road Reading
12 High Street Bracknell
Name (Compressed)       Other Data (Compressed)
Andrew Jackson Smith    ABC123+
Andrew Smith            ARF073+
Andrew Smythe           ADE938+
Jackson Smith Andrew    ABC123+
Smith Andrew Jackson    ABC123+
Smythe Andrew           ADE938+
Smith Andrew            ARF073+
Address Index
Address
9 Headley Road Woodley Reading
29 Headley Road Reading
12 High Street Bracknell
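The index entries above look like word rotations of each name, stored in sorted order with a pointer back to the record. A minimal sketch of that idea follows. The `rotations` and `build_name_index` helpers are hypothetical names, the record identifiers reuse the labels from the slide without the trailing "+", and the "compressed" key encoding is not shown in the slides, so plain text keys are used instead.

```python
def rotations(name):
    """Return every word rotation of a name, so that a search starting
    from any word of the name can still land on the record."""
    words = name.split()
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


def build_name_index(records):
    """Build a sorted list of (key, record_id) pairs from {record_id: name}."""
    entries = [
        (key, record_id)
        for record_id, name in records.items()
        for key in rotations(name)
    ]
    return sorted(entries)


# Record identifiers taken from the slide labels; pairing them with names
# follows the order in which the slide lists them.
records = {
    "ABC123": "Andrew Jackson Smith",
    "ADE938": "Andrew Smythe",
    "ARF073": "Andrew Smith",
}

for key, record_id in build_name_index(records):
    print(f"{key:<22}{record_id}")
```

Storing one entry per rotation costs index space but means out-of-order words ("Smith Andrew") need no special handling at search time.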
At search time :
2. Search the data for the required candidates
From Key : KDM$$
To Key : KDMZZ
Database Index
Key
KDM/>
KDM$<
KDM$E
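A search of this kind amounts to a range scan over a sorted index: every key between the From Key and the To Key is a candidate. The sketch below shows the idea with Python's `bisect` module. The `range_scan` helper is a hypothetical name, the two keys outside the range are invented for illustration, and plain code-point ordering is an assumption, since the slides do not say how the database collates keys.

```python
from bisect import bisect_left, bisect_right


def range_scan(sorted_keys, from_key, to_key):
    """Return all keys k with from_key <= k <= to_key from a sorted list."""
    lo = bisect_left(sorted_keys, from_key)
    hi = bisect_right(sorted_keys, to_key)
    return sorted_keys[lo:hi]


# The three KDM keys come from the slide; KDL99 and KEA00 are made-up
# neighbours that should fall outside the search range.
index_keys = sorted(["KDM/>", "KDM$<", "KDM$E", "KDL99", "KEA00"])

candidates = range_scan(index_keys, "KDM$$", "KDMZZ")
print(candidates)
```

Because both bounds are located by binary search, the cost of finding the candidate set grows with the logarithm of the index size plus the number of candidates returned.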
At search time :
2. Search the data for the required candidates
3. Verify the Match using additional data elements
Step 3 : Scoring
The scoring step is carried out using the fields chosen by the user :
e.g.
Name : Andy J Smith
Address : 9 Hedley Rd Reading
DOB : 12 May 1975
(Choosing weights suitable for finding the same Resident)

Score  Name                  Address                         DOB
97     Andrew Jackson Smith  9 Headley Road Woodley Reading  12/05/75
90     Andrew Smythe         29 Headley Road Reading         05/07/49
54     Andrew Smith          12 High Street Bracknell        05/12/75
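Scoring on user-chosen fields can be pictured as a weighted sum of per-field similarities. The sketch below is an illustration under stated assumptions, not the product's scoring method: the weights are hypothetical, `difflib` stands in for the real matching algorithms, dates are compared for exact equality only, and the slide's DOB values are read as day/month/year and rewritten in ISO form. It will not reproduce the 97 / 90 / 54 scores shown above.

```python
from difflib import SequenceMatcher

# Hypothetical field weights (they sum to 1.0); a real deployment would tune these.
WEIGHTS = {"name": 0.5, "address": 0.3, "dob": 0.2}


def similarity(a, b):
    """Case-insensitive character similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(query, record, weights=WEIGHTS):
    """Weighted sum of field similarities, scaled to 0..100."""
    total = 0.0
    for field, weight in weights.items():
        if field == "dob":
            # Simplification: a date either matches exactly or not at all.
            field_sim = 1.0 if query[field] == record[field] else 0.0
        else:
            field_sim = similarity(query[field], record[field])
        total += weight * field_sim
    return round(100 * total)


query = {"name": "Andy J Smith", "address": "9 Hedley Rd Reading", "dob": "1975-05-12"}

# Candidate records from the slide; DOBs assumed to be day/month/year.
candidates = [
    {"name": "Andrew Jackson Smith", "address": "9 Headley Road Woodley Reading", "dob": "1975-05-12"},
    {"name": "Andrew Smythe", "address": "29 Headley Road Reading", "dob": "1949-07-05"},
    {"name": "Andrew Smith", "address": "12 High Street Bracknell", "dob": "1975-12-05"},
]

ranked = sorted(candidates, key=lambda r: score(query, r), reverse=True)
for record in ranked:
    print(score(query, record), record["name"])
```

Changing the weights changes which kind of evidence dominates: a higher address weight suits a "same household" search, while a higher name and DOB weight suits a "same person" search.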
User Results
The required data is found quickly
Results are scored and ranked
Search : Andy J Smith   9 Hedley Rd Reading   12 May 1975
97
90   Andrew Smythe   29 Headley Road Reading   05/07/49
SSA-NAME3
SDK for on-line and batch name search and matching applications
Core technology for other IIR products
But also :
Car registrations
Music Titles
...
Tuning
We have 20+ years' experience of tuning to overcome an enormous variety of data and business issues
But YOU know YOUR data better than anyone
SO
Here is our knowledge and some tools to help you tune the knowledge to fit your data
Tuning Tools
Included is the ability for :
Users to add their knowledge
Administrators to add, change and delete rules
Business rules to be customised
Field weightings to be modified
Thresholds to be set
Jonathon Smith
Smith Jon
Mr J Smith
Johnny Smythe
Dr J A Smith
Checking Lists
Database
Summary
Our Identity Resolution Software is used :
To overcome the unavoidable error and variation in identity data
24/7 batch and online
Cost of missing a match is high
Varied character sets and Transliteration
Large volumes of data
11 TB of data
2 million real time searches per day
1 million batch transactions per hour
A Final Thought
Are these the same people? Could your existing software tell you?
12 May 49
British
Questions ?