
Big Data and Predictive Analytics for AML and Financial Crime Detection
Sanjay Kumar
GM Industry Solutions – Telecom & FS
Agenda

•  Introduction
•  What is Financial Crime, AML and what we are seeing in the AML Space
•  Brief Discussion of Customer Activity in AML
•  Illustrative Use Cases
•  Where Current Implementations Fall Short
•  Reference Architecture for AML and Predictive Analytics
•  Q&A

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved


FSI Industry Market Segments
•  There are 4 primary market segments/sectors comprising the global FSI industry: Capital Markets, Retail Banking, Payments and Market Exchanges.
•  Each geography, country and state may have its own regulation and compliance requirements for products, distribution and rating. Banking is the most regulated industry!
•  It is key to understand the market segment of the banking company, as the business processes, data/information needs and challenges are very different across the 4. Additionally, challenges vary by Premium/Revenue tier.
•  There are many global FS companies which may define standards globally and deploy locally.

FSI Industry
–  Capital Markets: Investment Banks, Hedge Funds, Wealth Mgmt
–  Retail Lines: Consumer lines, Corporate
–  Payments: Acquirer & Issuer Banks, Schemes
–  Market Exchanges


Impact of Big Data in 5 major areas

•  Predictive Analytics and ML/DL: Analytics enabling both defensive and offensive use cases
•  Digital Banking: Enabling the digital bank, providing a seamless customer experience
•  Capital Markets: Enhancing capabilities across investment banking, trading etc.
•  Wealth Management: Improving wealth management capabilities, thereby providing enhanced customer service
•  Cybersecurity: Helping defend institutions against cyber threats


Why Big Data for Financial Crimes and Controls

•  Firms, large and small, need to navigate a set of increasingly complex compliance rules and regulations as regulatory bodies clamp down on loopholes in the financial regulatory framework. With tighter regulation comes the need to seek out more advanced and cost-effective compliance solutions.
•  The Financial Action Task Force estimates that over one trillion dollars is laundered annually.
•  Regulators increasingly require greater oversight from institutions, including closer monitoring for anti-money laundering (AML) and know your customer (KYC) compliance.
•  The methods and tactics used to launder money are constantly evolving, from loan-back schemes and front companies to trusts and black market currency exchanges. There is no “typical” money laundering case.


What Is AML, Financial Crime and What We Are Seeing in AML


What is AML and Financial Crimes

•  Financial crime is commonly considered as covering the following offences:
–  Fraud
–  Electronic crime (credit card, stolen information, etc.)
–  Money laundering
–  Terrorist financing
–  Bribery and corruption (KYC)
–  Market abuse and insider dealing (Trade Surveillance)
–  Information security (Cyber Security)

•  Anti-money laundering (AML) is a term mainly used in the financial and legal industries to describe the legal controls that require financial institutions and other regulated entities to prevent or report money laundering activities.


Financial Crime Is On the Rise!

A poll of 500 executives and owners of small and medium businesses showed:
•  58% of businesses were victims of fraud
•  80% of banks failed to catch fraud before funds were transferred out
•  In 87% of fraud attacks, the bank was unable to fully recover assets
•  40% of businesses said they have moved their banking activities elsewhere
•  Only 20% of banks were able to identify fraud before money was transferred

“The ROI of investing in fraud prevention is clear.”

Source: Ponemon Institute/Guardian Analytics study, March 2010

Key AML Use Cases



Case 1: Understand Customer Profile (KYC)
•  Case Description: Mr Alex is a compliance officer at ABC bank. While scrutinizing a number of customer profiles and their account activity, he noted suspicious activity in one customer's account. The customer profile and account activity contain the following information.
•  Customer Profile:
–  Individual customer account, Risk Type Classification: Sensitive Client, Senior Public Figure. Customer carries out large transactions
–  A number of transactions in the range of $10,000 to $5,000,000 carried out by the same customer within a short space of time
–  A number of customers sending payments to the same individual
•  Uniqueness of Use Case: Multi-channel linked accounts involving multiple geographies
•  Data elements involved
–  Customer data
–  Transaction data over a 5-year period
•  Challenges with current technology
–  Retrieving data for multiple linked accounts and past history beyond 6 months
–  Real-time visualization
•  Supporting data required to simulate the use case
–  Cross-currency, cross-geography locations
–  Multiple-channel transactions
–  Multiple cross-currency transactions from USD, SGD, GBP and EUR
–  Nearly x accounts
–  Across geography in 50 countries
–  Between 500 and 600 CR/DB transactions every month
•  Results / Objective of Use Case: To demonstrate multi-channel transactions with a historic data set
•  Visualization to show results of use case: To be identified

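The two transaction patterns in the Case 1 customer profile (a burst of large transactions by one customer, and many customers paying the same individual) could be expressed as simple rules. The sketch below is illustrative only: the record fields (`sender`, `beneficiary`, `amount`, `time`), the 7-day window and the count thresholds are assumptions, not values from the case.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical thresholds; only the amount range comes from the case description.
MIN_AMOUNT = 10_000
MAX_AMOUNT = 5_000_000
WINDOW = timedelta(days=7)
MIN_TXNS_IN_WINDOW = 3
MIN_DISTINCT_SENDERS = 3


def burst_customers(txns):
    """Senders with at least MIN_TXNS_IN_WINDOW in-range transactions inside WINDOW."""
    by_sender = defaultdict(list)
    for t in txns:
        if MIN_AMOUNT <= t["amount"] <= MAX_AMOUNT:
            by_sender[t["sender"]].append(t["time"])
    flagged = set()
    for sender, times in by_sender.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window from the left until it spans at most WINDOW.
            while times[end] - times[start] > WINDOW:
                start += 1
            if end - start + 1 >= MIN_TXNS_IN_WINDOW:
                flagged.add(sender)
                break
    return flagged


def funnel_beneficiaries(txns):
    """Beneficiaries paid by at least MIN_DISTINCT_SENDERS different customers."""
    senders = defaultdict(set)
    for t in txns:
        senders[t["beneficiary"]].add(t["sender"])
    return {b for b, s in senders.items() if len(s) >= MIN_DISTINCT_SENDERS}


def sample_txns():
    """Made-up transactions used only to exercise the two rules."""
    day = datetime(2016, 1, 1)
    return [
        {"sender": "C1", "beneficiary": "B9", "amount": 20_000, "time": day},
        {"sender": "C1", "beneficiary": "B9", "amount": 45_000, "time": day + timedelta(days=1)},
        {"sender": "C1", "beneficiary": "B7", "amount": 15_000, "time": day + timedelta(days=2)},
        {"sender": "C2", "beneficiary": "B9", "amount": 500, "time": day},
        {"sender": "C3", "beneficiary": "B9", "amount": 12_000, "time": day + timedelta(days=40)},
    ]
```

On the sample data, `burst_customers` flags customer C1 and `funnel_beneficiaries` flags beneficiary B9. In practice, such rules would run over the multi-year, multi-channel history that the case says current technology struggles to retrieve.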
Case 2: Multi-Product Linked Accounts (KYC)
•  Case Description: A business customer profile with linked accounts and transactions across products and investments. Many transactions are funneled into the account, and investments span geographical locations in high-risk countries.
•  Customer Profile:
–  Business customer account, Risk Type Classification: High Risk Client. Customer carries out large transactions
–  Complex and large cash transactions of $50,000 and above
–  Multiple exchanges of cash in one currency for foreign currency
–  High-cash businesses such as restaurants, pubs, casinos, taxi firms, beauty salons and amusement arcades
–  A number of customers sending payments to the same individual
•  Uniqueness of Use Case: Multi-product linked accounts
•  Data elements involved
–  Customer Master Profile
–  Product Master
–  Transactions over an x-year data set
•  Challenges with current technology
–  Multiple linked accounts with multiple products
–  Real-time link visualization and tracking
•  Supporting data required to simulate the use case
–  Cross-currency, cross-geography locations
–  Multiple product transactions and wire transactions
–  Multiple cross-currency transactions from USD, SGD, GBP and EUR
–  Nearly x linked accounts
–  Across geography in 50 countries
–  About 2,000 CR/DB transactions every month
•  Results / Objective of Use Case: To demonstrate product transaction links with a historic data set
•  Visualization to show results of use case: To be identified
Case 3: $200 Million Credit Card Fraud
•  Case Description: On Feb. 5, federal authorities arrested 13 individuals allegedly connected to one of the biggest payment card schemes ever uncovered by the Department of Justice. The defendants' alleged criminal enterprise, built on synthetic (fake) identities and fraudulent credit histories, crossed numerous state and international borders, investigators say.
•  Customer Profile:
–  169 bank accounts
–  25,000 fraudulent credit cards
–  7,000 false identities
–  Wire transactions across geographies
•  Uniqueness of Use Case: Tracking of multiple customer profiles
•  Data elements involved
–  Customer Master Profile
•  Challenges with current technology
–  Multi-customer profile tracking and verification
–  Accurate profile verification by cross-checking public records against utility bills and bank accounts around the world
–  Create a single entity view (SEV) of similar entities
–  Detect aliases, whether they are created intentionally or through human error
–  Identify irregularities in user input
–  Reduce false positives through data enrichment
•  Supporting data required to simulate the use case
–  Cross-geography location profiles
–  x linked accounts across different banks and products
•  Results / Objective of Use Case: To demonstrate de-duplication of customer profiles and verification of identity
•  Visualization to show results of use case: To be identified
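A single entity view of the kind Case 3 calls for can be approximated by normalizing each profile into a comparison key and grouping profiles that share the key. The sketch below is a deliberately simple stand-in for a real entity-resolution engine: the field names (`id`, `name`, `dob`) and the choice of key (order-insensitive name tokens plus date of birth) are assumptions made for illustration.

```python
import re
from collections import defaultdict


def entity_key(record):
    """Build a comparison key that ignores case, punctuation and token order."""
    tokens = re.findall(r"[a-z0-9]+", record["name"].lower())
    return (" ".join(sorted(tokens)), record["dob"])


def single_entity_view(records):
    """Group record ids that share an entity key; keep only groups with duplicates."""
    groups = defaultdict(list)
    for r in records:
        groups[entity_key(r)].append(r["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

With this key, "John A. Smith" and "smith, john a" with the same date of birth collapse into one entity, while a "John Smith" with a different date of birth stays separate. Aliases created intentionally would need fuzzier matching and enrichment from external records, as the challenges list notes.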
Case 4: Social Network Analysis
•  Case Description: Analysis of social network sites to establish links with fraudulent customers.
•  Customer Profile:
–  Customer profiles with over 5 million records
–  Across geography in 50 countries
–  Search, match and link with telephone, mobile number, email and social network IDs
–  Identify irregularities in user input
–  Protect individual privacy concerns through anonymous resolution, displaying either the full matching records
–  Reduce false positives through data enrichment
•  Uniqueness of Use Case: Social network analysis of customer profiles
•  Data elements involved
–  Customer Master Profile
•  Challenges with current technology
–  Ability to link to social network sites and perform text analysis
•  Supporting data required to simulate the use case
–  Customer profiles gleaned from social network sites like Facebook, LinkedIn, Myspace and other social networks/communities
•  Results / Objective of Use Case: To demonstrate social network identity links with customer profiles, in order to establish fraudulent customer profiles and reduce false identities
•  Visualization to show results of use case: To be identified
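The "search, match and link" step in Case 4 can be pictured as clustering profiles that share any identifier. The sketch below uses a union-find structure for this. The field names (`phone`, `email`, `social_id`) and the exact-match-after-lowercasing rule are assumptions; a production system would likely use a graph database and fuzzier matching.

```python
from collections import defaultdict


def link_profiles(profiles):
    """Cluster profile ids that share a phone, email or social network id."""
    parent = {p["id"]: p["id"] for p in profiles}

    def find(x):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}
    for p in profiles:
        for field in ("phone", "email", "social_id"):
            value = p.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                union(first_seen[key], p["id"])
            else:
                first_seen[key] = p["id"]

    clusters = defaultdict(set)
    for p in profiles:
        clusters[find(p["id"])].add(p["id"])
    # Only clusters with more than one profile represent a discovered link.
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)
```

Profiles linked only indirectly (A shares an email with B, and a phone number with C) end up in the same cluster, which is the kind of hidden relationship the use case aims to surface.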


Case 5: Watch List Filtering and Text Mining
•  Case Description: The primary requirement of watch list filtering is to routinely scan current and prospective clients against a database (watch list) consisting of names, aliases (AKA) and address entries.
•  Customer Profile:
–  Compare and scrutinize 1,000,000 names on the global PEP list
–  Nearly 120 sanctions lists that collectively hold more than 20,000 profiles
–  Watch list screening aims to create an effective screening process that minimizes false positives and false negatives
–  Search, match and link with names and provide comparison with actual and original records
•  Uniqueness of Use Case: Text mining of unstructured data
•  Data elements involved
–  Customer Master Profile
•  Challenges with current technology
–  Unstructured data results in false positives
–  Number of matching rules and ease of incorporating match matrix changes
–  Customer data integrity
–  Foreign names, multipart names, hyphenated names, and names which “sound” similar but are spelled differently (e.g. Muhammed vs. Mohamad)
•  Supporting data required to simulate the use case
–  OFAC's SDN list, Bank of England list, Denied Persons List
•  Results / Objective of Use Case: To demonstrate reliable and scalable watch list filtering
•  Visualization to show results of use case: To be identified
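The name-variation challenge in Case 5 (Muhammed vs. Mohamad) is usually handled with approximate string matching. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a simple similarity measure. It is not a phonetic algorithm, and the 0.6 threshold is an assumed value that would need tuning to balance false positives against false negatives.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase the name and collapse anything that is not a letter into single spaces."""
    return " ".join(re.findall(r"[a-z]+", name.lower()))


def screen(name, watch_list, threshold=0.6):
    """Return watch list entries whose similarity to `name` meets the threshold."""
    candidate = normalize(name)
    hits = []
    for entry in watch_list:
        score = SequenceMatcher(None, candidate, normalize(entry)).ratio()
        if score >= threshold:
            hits.append(entry)
    return hits
```

Calling `screen("Muhammed", ["Mohamad", "John Smith"])` returns only the "Mohamad" entry. Lowering the threshold catches more spelling variants at the cost of more false positives, which is the trade-off the case describes.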


Data Analysis Trends in AML

•  Need for highly interactive and visually appealing UIs for investigation
•  Need for advanced analytics for deeper insight into trends in customer behavior
•  Higher degree of depth of analysis in the AML program
•  Guard against aging technology and manual approaches
•  Automated risk classification approaches
•  Need to reduce the volume of false positives
•  Need for structured and unstructured data analysis


What We Are Seeing in AML

•  Higher degree of technology sophistication among criminals
•  AML programs need to move from running detection processes on similar data sets to operating across diverse data. Fraud patterns demand a 360-degree view of risk as well as an ability to work across more complex and larger data sets
•  Most illicit activities span geographies, products and accounts
•  Lack of efficiency in investigation tools and processes
•  Expert systems or rules engine based approaches are becoming ineffective
•  A predictive approach to detecting fraud is emerging as a key trend
•  Move to increased automation
•  The amount of data needed to feed the predictive approaches is growing exponentially


Where current solutions fall short



What We Have Seen at Banks
•  Fragmented book-of-record transaction systems
–  Lending systems along geographic and business lines
–  Trading systems along desk and geographic lines
•  Fragmented enterprise systems
–  Multiple general ledgers
–  Multiple enterprise risk systems
–  Multiple compliance systems by business line
   •  AML for Retail, AML for Commercial Lending, AML for Capital Markets…
   •  Lack of real-time data processing, transaction monitoring and historical analytics
•  Typically proprietary vendor and in-house built solutions that have been acquired over the years, building up significant technological debt
•  Unable to keep pace with the progress of technology
•  Move to combine fraud (AML, Credit Card Fraud & InfoSec) into one platform
•  Issues with flexibility, cost and scalability

High Level Solution - Architecture
Predictive Analytics



Some essential data elements for AML: Structured and Unstructured

•  Inflow and outflow
•  Links between entities and accounts
•  Account activity: speed, volume, anonymity, etc.
•  Reactivation of dormant accounts
•  Signer relationship
•  Deposit mix
•  Transactions in areas of concern
•  Use of multiple accounts and account types
•  Social media behavior
•  Etc.
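As one example of turning these data elements into something computable, reactivation of a dormant account can be detected from gaps between consecutive transaction dates. The 180-day dormancy period below is an assumed value, since the definition of "dormant" varies by institution.

```python
from datetime import date

DORMANCY_DAYS = 180  # hypothetical definition of a dormant account


def reactivation_dates(txn_dates, dormancy_days=DORMANCY_DAYS):
    """Return the dates on which activity resumed after a gap longer than dormancy_days."""
    ordered = sorted(txn_dates)
    return [
        current
        for previous, current in zip(ordered, ordered[1:])
        if (current - previous).days > dormancy_days
    ]
```

An account active in January and February 2015 and then silent until January 2016 would yield the January 2016 date, which could then feed a rule or a model feature.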
Big Data for Financial Crimes and Controls: Solution
•  The unique nature of money laundering requires a new generation of solutions based on
–  A vast variety of historical data
–  Business rules
–  Fuzzy logic
–  Data mining
–  Supervised and unsupervised learning and other machine learning technologies to increase detection and reduce false positives
•  To implement a next generation solution for BSA/AML, firms must look towards updated machine learning tools that allow finer-grained resolution at the scale needed to detect money laundering
•  Phased Approach
–  Rule Based Model (Crawl Phase)
–  Feature Based Model (Walk Phase)
–  Data Driven Model (Run Phase)
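To make the unsupervised learning item concrete, the sketch below scores transaction amounts with a robust z-score based on the median absolute deviation (MAD). This is a basic statistical outlier test chosen for brevity, not the machine learning tooling the slide refers to, and the 3.5 cut-off is a commonly quoted default rather than a value from the source.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or most of them) are identical; nothing stands out.
        return [0.0 for _ in values]
    # 0.6745 makes the score comparable to a standard z-score for normal data.
    return [0.6745 * (v - med) / mad for v in values]


def anomalies(values, threshold=3.5):
    """Return the values whose robust z-score exceeds the threshold in magnitude."""
    return [v for v, z in zip(values, robust_z_scores(values)) if abs(z) > threshold]
```

No labelled alerts are needed here, which is the appeal of unsupervised methods: the data itself defines what is unusual. The price is that "unusual" is not the same as "suspicious", so results still need investigation.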


AML Solution: Rule Based Solution (Crawl Phase)

Key Highlights and Challenges
•  Manual analysis by an investigator
•  Subjective and inconsistent
•  Time consuming
•  High false positive rate
•  Constant updates to rules
•  Not able to catch new modes of fraud

[Diagram: LexisNexis, Accounts Database, Transaction Data, Card Data and Payment Data feed the Rule Based AML Solution. Alerts from the rule based system go to a dashboard to match data, and each alert is classed as Suspicious or NOT.]


AML Solution: Feature Based Solution (Walk Phase)
Rule Base, Supervised and Unsupervised Learning for AML

[Diagram: Historical Alerts, LexisNexis, Accounts Database, Transaction Data, Card Data and Payment Data feed the Machine Learning Algorithms. Alerts from the ML based system go to a dashboard to match data, and each alert is classed as Suspicious or NOT.]

Key highlights
•  Features are metadata extracted from the data, e.g. the average balance over the last 7 days
•  Features help algorithms capture information from the data
•  Feature engineering is a form of language translation between the raw data and the algorithm
•  Uses supervised and/or unsupervised machine learning
•  Quick classification
•  Low false positive rate, tweaked based on risk appetite
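The feature named on this slide, average balance over the last 7 days, can be computed as follows. This is a minimal sketch that assumes end-of-day balances are available as a dictionary keyed by date. The function name and its handling of missing days (they are skipped) are illustrative choices.

```python
from datetime import date, timedelta


def avg_balance_last_n_days(daily_balances, as_of, n_days=7):
    """Average the end-of-day balances for the n_days ending on as_of (inclusive)."""
    window = []
    for offset in range(n_days):
        day = as_of - timedelta(days=offset)
        if day in daily_balances:
            window.append(daily_balances[day])
    if not window:
        return None  # no balance history in the window
    return sum(window) / len(window)
```

A model would typically consume many such features per account (averages, counts, ratios over several windows), which is the "translation" between raw data and algorithm that the slide describes.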
Types of Machine Learning and Potential Usage


Next Gen AML Solution: Data Driven Solution (Deep Learning)

Key highlights
•  The algorithm understands malicious behavior through data
•  The algorithm is smart enough to work without features (metadata)
•  Does not need alerts for training
•  Helps in identifying any kind of anomalous behavior
•  Deeper insights about the customer

[Diagram: LexisNexis, Accounts Database, Transaction Data, Card Data and Payment Data feed the Deep Learning Algorithms (Data Driven Solution), which class activity as Suspicious or NOT.]


High Level System Architecture: MAX ROI & Future Proof Solution
Not Just for AML/Fraud

•  Source Data (examples): Accounts, Transactions, Data.gov, Social, LexisNexis
•  External/3rd party data sources
•  Real-Time Event Streaming Engine
–  Enrich events with customer/risk info and scoring models
–  Update profiles and scoring models
–  Update data lake
•  Dynamic Customer Profile / Risk Appetite Model
•  Risk Scoring Engine (examples)
–  Credit score (if allowed by regulatory agencies)
–  Rating attributes (demographics, geographic, social, property attributes)
–  Likelihood of fraud/risk (frequency/severity)
•  Central Data Lake
•  Visualization / Analytical Views: Native API, ODBC/JDBC, REST API
•  Real-time Intelligent Action
–  Risk similarity/risk profiling
–  Related entity analysis (graph database)
–  Fraud/social network analysis
–  Multi-line “profitable” class code
–  Geospatial data
–  Updated risk appetite

Key Deliverables to Build a Big Data Solution

•  Automating due diligence around KYC data
–  Simple information collected during customer onboarding
–  More complex information for certain entities
–  Applying sophisticated analysis to such entities
–  Automating research across news feeds (LexisNexis, DB, TR, DJ, Google etc.)
•  Efficient case management
•  Capture all data sets in one place
•  Applying advanced analytics (two sub use cases)
–  Exploratory data science
–  Advanced transaction intelligence
–  Machine learning / deep learning
Business Analytics Must Evolve To Deal With Data Tipping Point

•  PROVIDE INSIGHT INTO THE PAST via data aggregation, data mining, business reporting, OLAP, visualization, dashboards, etc.
•  UNDERSTAND THE FUTURE via statistical models, forecasting techniques, machine learning, etc.
•  ADVISE ON POSSIBLE OUTCOMES via rules, optimization and simulation algorithms


The Data Tipping Point

Drivers of a Connected Data Architecture


Why Will This Work Now?
•  A free, open source, linearly scalable platform has only become available within the last few years
•  Due to the amount of regulation over the last 15 years, all bank enterprise compliance, risk and finance systems now function essentially the same way
•  A bank partnering with an open source partner is very different from partnering with a vendor who develops proprietary software
•  Proprietary software vendors will adopt the new standards since it is in their self-interest to do so
•  Regulators can now streamline their regulatory practices by adopting a Big Data based approach
•  Having a standards-based open source platform means that regulators can use the same platform as the banks


Digital Banking Solution Architecture

Retail Banking Enterprise Data & Compute Lake
•  Sources: Social, RDBMS, Mainframe, Document Mgmt Systems, Data Silos, Core Banking, Industry Ref., Web Logs
•  Banking Business Logic Layer: Retail Banking Apps, Marketing Apps, Customer Journey SVC, NBA
•  Governance & Integration, Enterprise Security, Business Workflow
•  Applications & Workloads: Batch, Search, In-Memory, Real-Time, SQL, Predictive
•  Data Operating System: Multi-purpose platform enablement
•  Storage: Distributed File System (Staging, Database, Structured, Unstructured, Archival, Document, Other…)
•  Consumers: BI & Reporting, Business Analytics, Pivotal HAWQ, Data Science, Data Processing, SAS
•  Cloud Computing Stack (Public or Private): Public Cloud, Private Cloud, Hybrid Cloud supporting a full stack of VMs and Docker


Q & A
