
Data Visualization Techniques

for Fraud Analysis


A white paper by Centrifuge Systems, Inc.

WWW.CENTRIFUGESYSTEMS.COM
TO LEARN MORE ABOUT CENTRIFUGE SYSTEMS, VISIT CENTRIFUGESYSTEMS.COM OR CALL 571-830-1300
COPYRIGHT 2010 CENTRIFUGE SYSTEMS, INC. ALL RIGHTS RESERVED - PROPRIETARY THE FREEDOM TO EXPLORE

About Centrifuge

Centrifuge Systems is a leading provider of data visualization software that
helps organizations discover insights, patterns and relationships hidden
in their data. The unique Centrifuge approach allows users to ask open-ended
questions of their data by interacting with visual representations of
the data directly.

Traditional solutions require users to define what they want to see in
advance and present the results in static dashboards. With Centrifuge,
users determine what is of interest ‘on the fly’, then manipulate the displays
directly in a highly interactive fashion. The experience is refreshingly easy
to use and the resulting insights can be extraordinary.

Centrifuge is used in some of the most demanding applications in the
world, including counter-terrorism and homeland defense, to help analysts
identify hidden meaning in their data and communicate those results to
other team members.

Notices

Centrifuge Systems, Inc. makes no warranty of any kind with regard
to this material, including, but not limited to, the implied warranties of
merchantability and fitness for a particular purpose. Centrifuge Systems
shall not be liable for errors contained herein or for incidental, consequential,
or other indirect damages in connection with the furnishing, performance,
or use of this material.


Executive Summary
Introduction
The Fraud Management Process
Investigative Analytics
Techniques for Fraud Analysis
Phase 1: Data Preparation & Data Connectivity
Phase 2: Initial Data Analysis
Phase 3: Advanced Analysis & Identity Visualization
Phase 4: Annotation, Collaboration & Presentation
Conclusion


Executive Summary
Today more than ever, fraud investigators are faced with unprecedented challenges as
they attempt to accurately identify fraud and money laundering activity. Investigators
are asked to operate in shrinking windows of time, while the volume and velocity of data
pouring in grows exponentially. Over the past few years, most of the innovation in analytics
has been in the area of automated information analysis. These techniques remove the
analyst from the equation and attempt to reveal all relevant insights automatically. We
have found that in most investigative processes, the single most important component is
human judgment. So the question is “Where is the analyst-centric innovation?”

One approach that has proven highly effective in this environment is called Investigative
Analytics. Investigative Analytics is a human-focused approach to analyzing large
amounts of data. It is based on three modern innovations in analysis: interactive
data visualization, unified data views and collaborative analysis. Through Investigative
Analytics, an investigator can take control of the process while applying her training,
experience and judgment to discover hidden relationships and insights across data. With
this approach, the analyst’s brain serves as the ultimate pattern recognition machine
and the technology opens up the potential for unconstrained analytical power. When
an investigator detects something relevant, inferences are drawn almost immediately.
Suspicious relationships are investigated and confirmed. The result is accurate identification,
an essential by-product of the investigation which positively impacts detection, reporting
and issue resolution.

Existing investigative analysis products on the market fall short in four key areas:
+ Too hard to use
+ Too static (lack interactivity)
+ Too disconnected
+ Too isolated (lack collaboration)

Next generation products must address these shortcomings and allow investigators to
rapidly assimilate important facts, detect hidden relationships, socialize results with others
and act on knowledge uncovered during this process. The need for this technology has
never been greater than it is today. This paper explores this subject in depth while also
providing a recipe for performing investigative analytics. At a time when the reputation
of financial institutions is at stake and regulatory compliance standards are dramatically
increasing, effective next generation approaches could not be more relevant.


This paper is divided into three sections. In section 1, we define the Fraud Management
Process. Section 2 summarizes the three key components of the Fraud Identification phase
of the process. Section 3 is dedicated to the techniques used to identify fraud.

Introduction

If you have ever visited the FBI’s web site (www.fbi.gov) and clicked on “What We
Investigate,” you will notice at least ten different types of fraud from telemarketing to
mortgage to insurance and others. You will see “cyber crimes”, “network intrusion”, “identity
theft” and other criminal activities listed. Diving deeper, you will notice that each type of
fraud has different schemes (market manipulation fraud, foreign currency fraud, internet
pharmacy fraud and hundreds of others). Each scheme is quite elaborate; some have been
around for over 100 years and others have become prevalent in the last 100 days.

Fraud is common. The schemes change rapidly, often to throw investigators off the scent
while more elaborate schemes are put in place. As internet usage has exploded, consumers
have become comfortable with e-commerce transactions and people have flocked to social
networking sites, creating a fertile breeding ground for fraud, identity theft, money
laundering and cyber crime. Fraudsters like to remain anonymous and what better way to do
that than through the World Wide Web? Let’s examine some interesting facts:

The FBI reports losses totaling $40 billion for securities and commodities fraud in
2006. [1]

The number of mortgage fraud “Suspicious Activity Reports” (SARs) filed with the FBI
rose from 5,600 in 2002 to over 37,000 in 2006. [2]

According to the Centers for Medicare & Medicaid Services, national healthcare
expenditures topped $1.3 trillion in 2000. Although the exact amount of healthcare fraud
is difficult to determine, estimates range from three to ten percent, thus translating
into staggering amounts of money lost to fraud. [3]

Large international banks have recently been fined $65 million for late filing of SARs,
$80 million for not meeting regulatory requirements to prevent money laundering and
$32 million for the same reason. In some cases, regulatory agencies have cited a lack
of “financial intelligence” as part of the reason for the fines.

Recently, 41 million credit card and debit card numbers were stolen through cyber
breaches at retailers as hackers sat in vans outside major retail establishments and
hacked into servers which were supposedly secure.

This is a massive problem that only seems to be getting worse.


What is the Challenge?

Fraud and money laundering pose real problems for investigators:

Not Enough Time

Investigators are asked to do more with less in an attempt to accurately identify fraud
before it is too late. But too often the crime has been committed, the perpetrators can’t be
found and the money is gone. Government regulations also create a need for investigators
to identify and report problems quickly.

Existing Technology is Limited

Not only are current tools difficult to use, they often limit the breadth of the investigation
by constraining the analysis to a pre-determined set of data and operations. To effectively
leverage an investigator’s expertise, next generation solutions need to allow investigators
to operate at the speed of the human brain and pursue lines of inquiry on the fly.

Not Enough Collaboration

Investigative analysis is a lonely function in most organizations. Even in some of the most
well-known financial institutions, business lines and investigative groups assigned to those
business lines are separate. With credit card transactions separate from ATM transactions
and both separate from mortgage loans, it is very difficult to connect fraudulent activity
across these systems.

Can’t See the Whole Picture

It is very difficult to identify fraud without comprehensive access to all relevant data.
Typically, the data is spread out across transaction monitoring systems, account activity,
customer profiles and historical silos. If investigators don’t have a 360-degree view of
what is going on, fraud can go completely undetected.

The Fraud Management Process

Let’s look at the essential steps in the fraud management process to better understand
where the process breaks down.

Fraud management is typically divided into four stages:

1) Detection
2) Identification
3) Regulatory Reporting
4) Issue Resolution


Accurate identification is the most critical step in the fraud management process.
It can positively impact detection, reporting and resolution.

Figure: Fraud Management Process

In a perfect world, the process would unfold as follows: The detection process
includes all relevant transaction monitoring systems so that alerts from each
line of business may be analyzed together. Automated rules are applied to
detect suspicious activity. When conditions match these pre-existing rules,
alerts fire off and notify fraud investigators that something suspicious is
taking place. The investigators are then charged with investigating these
cases that have been flagged. This is the key step. The investigator leverages
all available data, and her own domain knowledge and expertise, to determine
if this case does in fact represent fraudulent activity. If so, a report is filed.
The criminal activity is then pursued in conjunction with federal and local
authorities and resolved as quickly as possible. Ideally, accurate identification
by the investigator is fully documented and meets regulatory requirements.
Unfortunately, this perfect world doesn’t exist.

One could argue that the most critical step in this process is Step 2, identification.
Better stated, the most critical step is accurate identification by the investigator.
By improving this step, all of the other steps can be positively impacted. Let’s
analyze this in more detail. If the investigator can accurately identify fraud
from thousands of alerts, she can provide a feedback loop into the alerting
process to improve detection over time. As the investigator learns more, the
rules get better and the job becomes more focused by virtue of the fact that
accurate detection is in place. Similarly, accurate identification leads to accurate
reporting which leads to more effective utilization of resources in the last step,
issue resolution. All of this translates to less risk for the business on many
levels. There is lower risk of non-compliance, lower risk of fines, less risk of
negative publicity and more positive awareness that the business is managing
risk in a manner consistent with consumer and organizational expectations.


Investigative Analysis using Data Visualization

So, the identification phase is arguably the most important phase of the fraud management
process. This phase encompasses real investigative analysis and has the potential to
positively impact the other phases. It is also the weakest component of most existing
analytical solutions. Let’s summarize three emerging technologies that can significantly
improve the investigative analysis effort.

1. Interactive Data Visualization
2. Unified Data Views
3. Collaborative Analysis

1. Interactive Data Visualization

Data visualization is getting a lot of attention today. This is the use of visual metaphors to
enhance our ability to detect patterns in data. Interactive Visualization takes this further
and allows us to interact with the visualizations directly to ask follow-up questions and
pursue a line of inquiry. This has proven to be very effective at allowing investigators
to navigate through, explore and understand massive amounts of data. When investigators
see something relevant, they draw inferences almost instantly, which lets them
work at the speed of the human brain. This is very different from the
static charts that most tools provide today. When used effectively, the resulting insights
can be remarkable.

2. Unified Data Views

Accurate identification depends on having access to all relevant data pertaining to
the investigation. Since important facts exist in disparate systems, the ability to access
these data sources without extensive integration and programming efforts is critical.

Internal data used in the investigation represents one important class of information.
Increasingly, third party data, news wires, blog posts, network traffic, historical information
and many other sources are equally important. Providing the investigator with the ability
to easily reach out to these sources from within the investigative framework is extremely
powerful. The absence of this capability often yields an incomplete investigation.

A common complaint is that the investigator needs to go out to multiple tools to get a
comprehensive view of the case. This can be tedious and highly disruptive to a particular
line of reasoning. The ability to create unified views of the disparate data is a powerful
paradigm for visual analysis. Unified views allow us to “shift our lens.” For example,
we could move from a quantitative to a relational to a temporal view of the same data
in a few steps. This allows investigators to validate findings and eliminate false positives
very quickly.


3. Collaborative Analysis

Business professionals have leveraged the power of collaboration technology to increase
productivity and foster the exchange of ideas for quite some time. This needs to be
applied to fraud and AML investigations. Since investigators are assigned cases, and many
of these cases are interrelated, it stands to reason that if investigators can collaborate,
notify each other of important findings and publish results for review, they can solve
cases faster while also improving the accuracy of the identification process. The ability to
document the results of the investigation for audit purposes is also very important, especially
in the area of compliance and regulation. Knowing exactly what steps the investigator took
in the analysis process to arrive at a conclusion is useful for audit purposes, training, and
notifying other investigators who may have similar types of cases to solve.

Automatically notifying others in the organization that results are available for review can
dramatically speed up investigations, leading to shorter windows for criminal activity to
occur. Saving the results of the analysis to document key findings in the investigation
is very important. These analytic assets need to be protected, archived, retrieved when
needed and used to meet compliance requirements.

Investigative Analytics

These three improvements comprise the pillars of Investigative Analytics. IA is a fraud
analyst-centric approach to analyzing and understanding data in support of accurate
identification. It is based on highly interactive visualizations that allow users to rapidly
comprehend and act on large amounts of data. This remarkable approach empowers
investigators to apply their domain knowledge and experience while exploring all
relevant data in a particular case.

Investigative Analytics holds great promise for quickly and effectively detecting potential
fraud schemes. This approach allows the investigator to ask questions of the data (who,
what, why, where and when) and explore relationships between individuals, banks,
accounts, phone records, e-mail records or other relevant data regardless of where it
resides.

This approach is very different from other analytical techniques that are currently applied.
Today, investigators are largely dependent on first generation business intelligence
products which produce static dashboards that may describe the problem but don’t allow
the investigator to interact with the data in an unconstrained way. By way of example, cyber
investigators focused on detecting network intrusion may have access to dashboards which
reveal leading indicators of suspicious activity such as spikes in e-mail activity to specific
IP addresses with attachments over a certain file size. These indicators suggest a potential
malicious attack where the attacker is trying to establish a presence on a network server
followed by the installation of some form of malware which could scrape credit card
numbers. The problem is, the investigator needs much more than leading indicators of
the historical attacks if she is to identify and thwart the new attacks. She also needs to
leverage the collective domain knowledge of the team through rich collaboration.

Statistical analysis (and predictive analytics) is another class of analytics which uses
statistical techniques ranging from simple correlations to complex neural networks in
an attempt to predict or forecast a specific outcome or behavior. For example, given the
right amount of input data, an analyst could build a model to predict that mortgage fraud
through inflated home appraisals is about to take place and the loss amount will exceed
a specific dollar value.

While these techniques can work successfully, they suffer from a number of inherent
weaknesses and should be used in conjunction with Investigative Analytics. They require
a deep understanding of statistical modeling and data transformations. Additionally,
since models require historical data to accurately predict the future, the accuracy of the
models depends on having sufficient data.

The results of investigative analysis should be easy to understand, clear and concise and
easily transferable to others involved in the case.

Techniques for Fraud Analysis

Four phases of fraud analysis are discussed below. They represent important phases
when trying to identify fraud. Results from these phases are often integrated with case
management technology, rules-based systems to refine alerts and predictive analytics
technology. Techniques presented below have been organized into these phases:

1) Data Preparation & Data Connectivity
2) Initial Data Analysis
3) Advanced Analysis & Identity Visualization
4) Annotation, Collaboration & Presentation


Phase 1: Data Preparation & Connectivity

Data preparation and data connection are essential first steps in fraud analysis. When
done properly, they provide a foundation for your analysis later. This phase provides
a basic understanding of the data and allows the analyst to unify disparate sources of
data. Fundamentally, these two processes streamline the analysis stages that follow. The
primary components of this phase include:

+ Connect to data sources and integrate essential data for analysis
+ Inventory data sources and determine what you have to work with
+ Identify gaps and anomalies in the data
+ Pre-process the data to select segments required in the analysis
+ Transform the data by creating new data fields and modifying field types
+ Define “Dataviews” for later use in data profiling and advanced data visualization

More and more data is becoming available for analysis every day. The need to easily
connect to these sources and unify them is essential if the fraud analyst is going to
successfully “connect the dots” between pieces of data in different sources. This case
study uses 4 sources of data:

1) Fraud Alerts across different business lines in a bank
2) Financial data on banking transactions and account officers
3) National identity management databases
4) Independent “watch-lists”

Joining Data

With so many data sources available for analysis, the process of integrating the data
allows analysts to thoroughly and accurately investigate cases. Joining different data
sources involves indicating where the data resides, then linking the disparate sources
on a common key (a field, such as a customer ID, that is present in each of the sources
being joined).

The example in Figure 1 shows the first two sources of data (Weekly Fraud Alerts and
Financial & Customer Demographic data). These two data sources are in different formats
(Excel and Microsoft Access) yet they can be joined on a common key (Customer ID).
Notice that each of the two sources of data contains different data fields. The Fraud
Alerts (listed as Accounts Query) has alert ID, alert name, at risk value and more. The
Financial and Demographic data has contact information, branch and account officer
data. The fraud analyst has chosen to include all of the data in both sources (indicated
by check marks next to the field names) but could have decided to exclude data fields
irrelevant in the investigation. Excluding data could make it easier for the analyst to
navigate through the analysis phases and also speed up performance if any of the tables
are extremely wide.
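
The join described above can be sketched in a few lines of Python. This is a minimal illustration only, not a description of how any particular tool performs the join: the sample records, the field names (customer_id, branch, account_officer, phone) and the join_on_key helper are all assumptions made for the example.

```python
# Hypothetical sample data; field names and values are illustrative only.
alerts = [
    {"customer_id": "C001", "alert_id": "A-10", "alert_name": "Forged Signature", "at_risk_value": 1200.0},
    {"customer_id": "C002", "alert_id": "A-11", "alert_name": "High Appraisal", "at_risk_value": 50000.0},
    {"customer_id": "C999", "alert_id": "A-12", "alert_name": "KYC Profile", "at_risk_value": 0.0},
]
customers = [
    {"customer_id": "C001", "branch": "Miami", "account_officer": "Officer A", "phone": "555-0100"},
    {"customer_id": "C002", "branch": "Los Angeles", "account_officer": None, "phone": "555-0101"},
]


def join_on_key(left, right, key, right_fields):
    """Left-join two lists of dicts on `key`, keeping only `right_fields` from `right`."""
    index = {row[key]: row for row in right}  # assumes `key` is unique in `right`
    joined = []
    for row in left:
        match = index.get(row[key])
        # Unmatched alerts are kept, with the added fields left empty (None).
        extra = {f: (match[f] if match else None) for f in right_fields}
        joined.append({**row, **extra})
    return joined


# Include branch and account officer, but exclude the phone field as irrelevant.
joined = join_on_key(alerts, customers, "customer_id", ["branch", "account_officer"])
for row in joined:
    print(row)
```

Keeping unmatched alerts (rather than dropping them) is a deliberate choice in this sketch: an alert with no matching customer record is itself something an analyst may want to see.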


Figure 1: Joining Disparate Sources of Data

Most organizations will have more than two sources of data. By integrating
multiple sources of data, the Fraud Analyst increases her chances of identifying unusual
behavior across the sources. In Figure 2, many sources are connected. In the center of the
figure, the analyst has joined 16 different sources with data on property, SSNs, vehicles,
aliases and much more.

Figure 2: Unifying Many Data Sources


Both examples show dozens of data fields that can be useful in the analysis. Each data
field has a “type” allowing the technology to understand the form the data takes. For
example, is the data represented in integers? Are certain fields in a date format? If so,
what format of date is used? Some analysis tools will automatically classify data fields
by type but it’s important that analysts review data types to ensure the data is being
interpreted correctly.
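
To illustrate why automatic type classification deserves a review, here is a small sketch of a naive type-inference routine. The column names, sample values and the two candidate date formats are assumptions for the example; real tools use their own rules.

```python
from datetime import datetime

# Assumed candidate date formats; a real data set may use others.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]


def infer_type(values):
    """Return a coarse type label for a column of raw string values."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "empty"

    def all_parse(parser):
        # True only if every non-empty value is accepted by `parser`.
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    for fmt in DATE_FORMATS:
        if all_parse(lambda v, fmt=fmt: datetime.strptime(v, fmt)):
            return f"date ({fmt})"
    return "text"


# Hypothetical columns as they might arrive from a file, all as strings.
columns = {
    "alert_id": ["101", "102"],
    "at_risk_value": ["1200.50", "0"],
    "alert_date": ["03/15/2010", "04/01/2010"],
    "zip": ["02134", "90210"],
    "alert_name": ["Forged Signature", "High Appraisal"],
}
report = {name: infer_type(vals) for name, vals in columns.items()}
for name, label in report.items():
    print(f"{name}: {label}")
```

In this sketch the ZIP code column is labeled as an integer, which would silently drop the leading zero of "02134" if the label were accepted. This is the kind of misinterpretation a manual review of data types is meant to catch.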

Connecting to data sources should be as simple as indicating the location of files and
allowing the analytical tool to read the metadata (the information in the file that describes
the data). In Figure 2, the data files have been joined by drawing a line between the two
different sources based on the common key (customer ID). In some cases, it may be
useful to refer back to the original sources of data to ensure that the customer IDs are
identical for a select number of records. In some instances, common keys can be created
by combining portions of existing fields. For example, you could take the first 4 letters
of last name, ZIP code, the first 3 letters of street name and other portions of data fields
and combine them into a unique identifier. Without common keys across the data, joining
disparate data is not possible.
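
The composite-key idea can be sketched as follows. The composite_key helper, the "|" separator and the sample records are assumptions for illustration; the choice of the first 4 letters of last name, the ZIP code and the first 3 letters of street name follows the example in the text.

```python
def composite_key(last_name, zip_code, street_name):
    """First 4 letters of last name + 5-digit ZIP + first 3 letters of street name."""
    def letters(text, n):
        # Keep letters only and upper-case them so "O'Brien" and "obrien" agree.
        return "".join(ch for ch in text.upper() if ch.isalpha())[:n]

    return "|".join([letters(last_name, 4), zip_code.strip()[:5], letters(street_name, 3)])


# Hypothetical records from two sources that lack a shared customer ID.
source_a = [("O'Brien", "33101", "Main Street"), ("Smith", "90012", "Oak Avenue")]
source_b = [("obrien", " 33101-1234", "main st."), ("Smithson", "90012", "Oakwood Drive")]

keys_a = [composite_key(*rec) for rec in source_a]
keys_b = [composite_key(*rec) for rec in source_b]
print(keys_a)
print(keys_b)
```

Both pairs of records produce matching keys here. The first match looks legitimate (the same person recorded with different formatting), but the second links "Smith, Oak Avenue" to "Smithson, Oakwood Drive", who are probably different people. A key built from fragments is not guaranteed to be unique, so a sample of the matches should be checked against the original sources.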


At the bottom of Figure 3, the analyst previews individual data joined from two of the
sources. This technique allows the analyst to validate the data prior to loading large
volumes for analysis. It also ensures that the data has been joined correctly. Notice that
data fields such as Branch Name, Customer Risk Category, Account Officer and Title have
been connected to the original set of fraud alerts. These additional fields allow for new
types of analysis to be conducted.

Figure 3: Validating Data Connections
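
Alongside a visual preview, a quick programmatic check of the join keys can flag problems before large volumes are loaded. The sketch below is one possible check, not a feature of any specific product; the join_report helper and the sample customer IDs are hypothetical.

```python
from collections import Counter


def join_report(left_keys, right_keys):
    """Summarize how well two key columns line up before a full load."""
    left, right = set(left_keys), set(right_keys)
    return {
        "matched": len(left & right),
        "left_only": sorted(left - right),    # alerts with no customer record
        "right_only": sorted(right - left),   # customers with no alerts
        "duplicate_right_keys": sorted(k for k, n in Counter(right_keys).items() if n > 1),
    }


# Hypothetical customer IDs drawn from a small preview of each source.
alert_ids = ["C001", "C002", "C999"]
customer_ids = ["C001", "C002", "C002", "C003"]

report = join_report(alert_ids, customer_ids)
print(report)
```

Duplicate keys on the customer side matter because they can multiply alert rows after the join and inflate counts and dollar totals in later charts.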

Selecting Segments of Data for Analysis

There are many techniques used to select data for analysis. One technique is “filtering
the data”. It often takes place during the analysis phase. Another technique involves pre-
selecting data based on data field, individual records or both.

We will revisit filtering in Phase 3, Advanced Analysis. Examples of pre-selecting data
would be selecting only the alerts within the last 30 days or all of the alerts for a set
of branches, account officers, or a combination of other criteria. In trying to determine
if recent alerts represent fraud, you may decide to only analyze alerts within the last
30 days. This technique can be helpful since it focuses the investigation, reduces data
volumes, increases performance and shortens the time it takes to identify fraud. In this
particular example, configuring input parameters in the lower left of the screen could be
used for this very purpose.
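
As a rough sketch of pre-selection, the function below keeps only alerts from the last 30 days, optionally restricted to a set of branches. The preselect helper, its parameters and the sample alerts are assumptions for illustration; they stand in for the input parameters an analysis tool would expose.

```python
from datetime import date, timedelta


def preselect(alerts, as_of, days=30, branches=None):
    """Keep alerts dated within `days` of `as_of`, optionally limited to `branches`."""
    cutoff = as_of - timedelta(days=days)
    return [
        a for a in alerts
        if cutoff <= a["alert_date"] <= as_of
        and (branches is None or a["branch"] in branches)
    ]


# Hypothetical alerts; dates and branch names are illustrative only.
alerts = [
    {"alert_id": "A-1", "alert_date": date(2010, 6, 15), "branch": "Miami"},
    {"alert_id": "A-2", "alert_date": date(2010, 5, 1), "branch": "Miami"},
    {"alert_id": "A-3", "alert_date": date(2010, 6, 20), "branch": "Boston"},
]

as_of = date(2010, 6, 30)
recent = preselect(alerts, as_of)
recent_miami = preselect(alerts, as_of, branches={"Miami"})
print([a["alert_id"] for a in recent])
print([a["alert_id"] for a in recent_miami])
```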


Inventory the Data

Analyzing the imported data in a table format and then running frequency distributions
on each field to show the number of values for every data element is an excellent way to
inventory the data prior to analysis. It may also reveal important insights or anomalies
about the data pointing the analyst in a specific direction.

A very simple chart in Figure 4 shows a count of fraud alerts by alert type, at risk dollars,
branch name and risk category. Analysts can use these charts to better understand the
data. In this case, Forged Signature Alerts for the Checking Business line are high given
the timeframe for this set of alerts. These alerts are concentrated in the Florida and
California branches. Analyzing data using this type of chart (or others) leads the analyst
down a path of discovery that could be useful. For example, “At Risk Dollars” is zero in
many cases even though alert counts are high. This may need to be explored. This type
of analysis can also reveal “hot spots” in the data, null values and unusual behavior
that may need to be investigated. Finally, the analyst may discover missing data that is
required to prove the case.

Figure 4: Data Inventory Using Matrix Charts
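
A frequency distribution per field, as described above, can be approximated with a few lines of Python. The rows, field names and the "<null>" label below are assumptions made for the example.

```python
from collections import Counter


def frequency_distributions(rows):
    """Count how often each value occurs in every field; missing values count as '<null>'."""
    fields = rows[0].keys()
    return {
        f: Counter("<null>" if r.get(f) in (None, "") else r[f] for r in rows)
        for f in fields
    }


# Hypothetical alert rows; values are illustrative only.
rows = [
    {"alert_name": "Forged Signature", "branch": "Florida", "at_risk_value": 0.0},
    {"alert_name": "Forged Signature", "branch": "California", "at_risk_value": 0.0},
    {"alert_name": "High Appraisal", "branch": None, "at_risk_value": 250000.0},
]

dist = frequency_distributions(rows)
for field, counts in dist.items():
    print(field, counts.most_common())
```

Even on this tiny sample the inventory surfaces the two patterns mentioned in the text: a missing value (the null branch) and alerts whose at-risk amount is zero.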


Creating New Fields for Analysis

Creating new fields allows the analyst to derive new and important information using
pre-existing data. This technique expands the analysis and may also reveal important insights
in the data that may otherwise have gone undetected. Figure 5 shows that a new calculated value
is being created by adding “At Risk Value” to “Existing Loss Amount”. Thinking ahead,
the fraud analyst knows that alerts where the combined value is high could be a leading
indicator of fraudulent behavior. Let’s take this example even further. The analyst may
decide to look at the average liability per alert. To accomplish this, she could derive a
field holding the count of alerts per customer and then divide that count into the
field just created. The technique of creating new variables using existing data and math
functions can be powerful if done correctly. It can include robust formulas, weighting of
specific data fields and other ways of transforming the data.

Figure 5: Creating New Fields to Expand the Analysis
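
The two derived fields discussed above can be sketched as follows. The field names (total_exposure, existing_loss_amount) and the sample values are hypothetical, and the average is computed per customer as total exposure divided by alert count, which is one reasonable reading of the calculation described in the text.

```python
from collections import defaultdict

# Hypothetical alerts; amounts are illustrative only.
alerts = [
    {"customer_id": "C001", "at_risk_value": 1000.0, "existing_loss_amount": 500.0},
    {"customer_id": "C001", "at_risk_value": 3000.0, "existing_loss_amount": 0.0},
    {"customer_id": "C002", "at_risk_value": 200.0, "existing_loss_amount": 100.0},
]

# Derived field 1: combined exposure per alert.
for a in alerts:
    a["total_exposure"] = a["at_risk_value"] + a["existing_loss_amount"]

# Derived field 2: average liability per alert, by customer.
totals = defaultdict(float)
counts = defaultdict(int)
for a in alerts:
    totals[a["customer_id"]] += a["total_exposure"]
    counts[a["customer_id"]] += 1

avg_liability = {cust: totals[cust] / counts[cust] for cust in totals}
print(avg_liability)
```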


Phase 2: Initial Data Analysis

In phase 2, the analyst is focused on data profiling in support of understanding the data
and developing a series of questions requiring investigation. During this phase, the
fraud analyst can identify correlations between data fields as well as look for anomalies
in the data, null values, suspicious behavior and basic patterns of behavior. Based on
this process, the analyst formulates a hypothesis for the investigation. Results from this
phase include:

+ A set of charts, tables and other forms of visualizations
+ A set of questions leading the analyst down a path of investigation
+ Identification of data that appears to be suspicious requiring more advanced analysis
+ A hypothesis for the investigation

A small sample of data visualizations is presented in this paper. Additional visualizations
will be provided in the ACFE Conference Session.

Data Profiles

Figure 6: Bubble Chart of Fraud Alerts by Type and Name


An initial bubble chart of fraud alerts (Figure 6) by TYPE and NAME shows that KYC profile
triggers represent the highest number of alerts. Checking, Loan and Credit Card alerts
have lower concentrations of alerts. Do these alert types represent the most risk to the
bank? How much risk do they represent? Figure 6 shows the number of alerts. By
changing the measure from the number of alerts to the sum of money at risk, the
picture tells a different story.

Figure 7: Bubble Chart Measures the Sum of Money At Risk
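
The effect of switching the measure from a count of alerts to the sum of money at risk can be reproduced on a small, made-up data set. The summarize and top helpers, the field names and all values below are assumptions for illustration and do not reflect the figures in this case study.

```python
from collections import defaultdict


def summarize(alerts, measure):
    """Aggregate alerts by (business line, alert name) using 'count' or 'sum_at_risk'."""
    totals = defaultdict(float)
    for a in alerts:
        key = (a["business_line"], a["alert_name"])
        totals[key] += 1 if measure == "count" else a["at_risk_value"]
    return dict(totals)


def top(summary):
    """Return the group with the largest value."""
    return max(summary, key=summary.get)


# Hypothetical alerts: many low-value KYC triggers, one high-value loan alert.
alerts = [
    {"business_line": "KYC", "alert_name": "Profile Trigger", "at_risk_value": 0.0},
    {"business_line": "KYC", "alert_name": "Profile Trigger", "at_risk_value": 0.0},
    {"business_line": "KYC", "alert_name": "Profile Trigger", "at_risk_value": 0.0},
    {"business_line": "CHECKING", "alert_name": "Forged Signature", "at_risk_value": 1500.0},
    {"business_line": "CHECKING", "alert_name": "Forged Signature", "at_risk_value": 1500.0},
    {"business_line": "LOAN", "alert_name": "High Appraisal", "at_risk_value": 250000.0},
]

by_count = summarize(alerts, "count")
by_sum = summarize(alerts, "sum_at_risk")
print("Most alerts:", top(by_count))
print("Most money at risk:", top(by_sum))
```

The same rows rank differently under the two measures, which is why a bubble chart sized by alert count and one sized by dollars at risk can point the analyst toward different groups.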

Figure 7 reveals that high appraisal loan alerts represent the most money at risk to the
bank. This result leads the analyst down another line of questioning. Is this a new issue
or has it been seen before? Is the money at risk associated with one or more branches?
Where are these branches located?

Figure 8 shows yet a different measure: sum of money lost in the past. This chart confirms
that this problem has been persistent. Let’s quantify the problem.

Figure 8: Bubble Chart of Historical Money Lost by Alert Name and Type

Figure 9: Heat Map Quantifying Money at Risk by Business Line and Name

A heat map of the money at risk to the bank by alert name and type clearly shows the
magnitude of the problem: $2.28 million is at risk in the LOAN business line for High
Appraisal Alerts. As the prior series of visualizations shows, as analysts navigate
across the data, they can represent the alerts in different forms, each telling a unique story
and leading the analyst down a path of inquiry. Are the alerts evenly distributed across
account officers? Figure 10 identifies two important things. First, the vast majority of the High
Appraisal Alerts show “Null” for the account officer. Second, a few of the account officers
have more alerts than others. Charles Head is one. How do the alerts vary by branch?
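
The account officer profile can be approximated with a simple count that treats missing officers separately. The sketch below assumes a hypothetical `account_officer` field in which `None` stands for “Null”; the rows and the name "Officer B" are invented for illustration and do not reproduce the figures in this paper.

```python
from collections import Counter

# Hypothetical High Appraisal alerts; None stands for a "Null" account officer.
high_appraisal_alerts = [
    {"alert_id": 1, "account_officer": None},
    {"alert_id": 2, "account_officer": None},
    {"alert_id": 3, "account_officer": None},
    {"alert_id": 4, "account_officer": "Charles Head"},
    {"alert_id": 5, "account_officer": "Charles Head"},
    {"alert_id": 6, "account_officer": "Officer B"},
]


def profile_officers(records):
    """Return (share of alerts with no officer, alert counts per named officer)."""
    officers = [rec["account_officer"] for rec in records]
    missing = sum(1 for officer in officers if officer is None)
    counts = Counter(officer for officer in officers if officer is not None)
    null_share = missing / len(records) if records else 0.0
    return null_share, counts


null_share, per_officer = profile_officers(high_appraisal_alerts)
top_officer, top_count = per_officer.most_common(1)[0]
```

Reporting the null share next to the per-officer counts keeps the data quality question and the concentration question visible at the same time.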

Figure 10: Alerts by Account Officer and Alert Type

Figure 11: Fraud Alerts by Branch Region

Clearly, the branches with the greatest number of alerts are in Florida, Los Angeles and
Washington DC.

Question Development

This series of charts and graphs illustrates some of the more important aspects of
Phase 2. Clearly, it could be expanded to include other visualizations such as timelines,
geospatial views and relationship graphs. Some of these visualizations will be shown in
Phase 3. Using these profiles, a series of questions has emerged requiring additional
investigation. Some have been addressed in the charts above. Others need to be resolved
in the Advanced Analysis phase. Sample questions include:

+ Do the customers with historical alerts show a consistent pattern of behavior over time?
+ Are the alerts clustered around certain days of the week or times of day?
+ Are the account officers in any way related to the customers’ behavior?
+ Are account officers issuing mortgages in close geographic proximity to
their branches?
+ Are any of the customers with high risk alerts tied to any watch lists?
+ Are there any customers that have suspicious data linked to their identities?
+ Are any of the customers linked to the same property or linked in other ways
(e.g., phone records, other property owned, employers, other associations)?
+ Why are so many of the high appraisal alerts not tied to an account officer?
+ Do other financial transactions and accounts show suspicious behavior?

Based on the initial profiles, the fraud analyst formulates a hypothesis for the
investigation. Specific customers are linked to high appraisal alerts. These customers
are also linked in some way to the Florida, California and Washington, D.C. branches.
The number of alerts associated with certain account officers appears to be high.
Collusion between the banking customers and loan officers could be taking place
with illegal kickbacks paid to loan officers.

Phase 3: Advanced Analysis and Identity Visualization

Charts, Tables and Heat Maps tell part of the story. They are typically used to show
summary and aggregate level views of data. Analysts use them to profile data fields, show
how the data is organized, investigate if two or more fields of data could be correlated
and isolate anomalies in the data. Oftentimes, these forms of visualization communicate
the magnitude of the problem. Shifting from one form of visualization to another allows
the analyst to reveal new insights.

But charts, heat maps and tabular data don’t show relationships between the people,
transactions, and locations. They don’t show networks of activity or connections between
individual pieces of data.

In addition to identifying meaningful relationships hidden in the data, the fraud analyst
is typically also concerned about the timing, strength and direction of the relationship.
Is there someone representing the leader or “head” of the relationship? Are there people
who exist “near” the potential fraudster or “in between” two individuals clearly involved
in fraud? Do the identities of these people indicate anything suspicious? Are there
people linked through employers? How strong are the relationships between people,
accounts or loan officers? These types of questions are better suited to a form of data
visualization commonly called “link analysis” but also known as “relationship graphs” or
“link-node diagrams.”

Revealing hidden meaning in data requires analysts to maintain their train of thought.
Jumping from one data source to another breaks that train of thought. Moving from one
analytical tool to another further complicates this problem. Checking identities outside of
the analytical environment used to identify the fraud creates delays and inaccuracies. As
a result, this phase also includes “Identity Visualization.”

The advanced analysis summarized in this phase allows the analyst to do the following:

+ Build relationship graphs to identify hidden insight
+ Analyze relationship graphs using advanced functions
+ Integrate watch list analysis
+ Validate identities using commercially available identity data

What are relationship graphs?

Relationship graphs are a way of showing visual representations of data through links
between data objects. They are comprised of nodes and links. The “nodes” of the graph
are usually real world items, such as people, places, telephones, vehicles, and so on.
The “links” are lines connecting these nodes to show that a relationship exists between
the nodes.
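
As a rough illustration of the node and link vocabulary, the following sketch models a relationship graph with plain Python data classes. The class names `Node` and `Graph`, their methods, and the sample values are assumptions made for this example; they are not part of any Centrifuge API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str   # e.g. "person", "place", "telephone"
    label: str


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    links: set = field(default_factory=set)  # each link is a frozenset of two nodes

    def add_link(self, a: Node, b: Node) -> None:
        """Record that a relationship exists between two nodes."""
        self.nodes.update((a, b))
        self.links.add(frozenset((a, b)))

    def neighbors(self, node: Node) -> set:
        """Return every node directly linked to `node`."""
        return {
            other
            for link in self.links if node in link
            for other in link if other != node
        }


# Hypothetical sample data.
person = Node("person", "Customer A")
phone = Node("telephone", "555-0100")
place = Node("place", "Branch X")

graph = Graph()
graph.add_link(person, phone)
graph.add_link(person, place)
```

Here `graph.neighbors(person)` returns the phone and the place, the two items linked to that customer. Links in this sketch are undirected; strength and direction would need extra attributes on each link.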

The characteristics of the links are important since they can show the strength and
direction of the related nodes. These diagrams can get complicated with large volumes
of data and many different types of nodes. For example, a relationship graph showing
linkages between people and properties is less complex than one showing people linked
to properties, airline flights and employers. As a result, oftentimes analysts use other
forms of visualizations, “filters” and search capabilities to identify a set of data they want
to draw in the graph. In other words, using charts to initially identify fraud alerts for high
risk customers and then selecting these records for use in the relationship graph is a
common practice in data visualization.

Let’s look at an example outside of the financial services industry to demonstrate how
these graphs can be used in other applications:

Figure 12: Network Security Login Traffic

Figure 12 is a relationship graph for network login activity to a social networking site.
It shows nodes for Source IP Address, Source Organization and Destination IP address.
Focusing on the central part of the graph (circled in red), there are 4 source organizations
linked to many source IP addresses. These source IPs are ALL linked to one Destination
IP address in blue (center of the circle). This many-to-one relationship could indicate
excessive account access which may mean a data breach has occurred. At the very least
it shows an unusual pattern of behavior. Relationship graphs, unlike charts, show you
details about how data is linked. These relationships can often reveal unusual behavior.
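
The many-to-one pattern in Figure 12 can also be checked numerically by counting distinct source IP addresses per destination. The sketch below is one way to do that; the login records, the function names and the threshold of four sources are hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical login records: (source organization, source IP, destination IP).
logins = [
    ("Org A", "10.0.0.1", "192.0.2.10"),
    ("Org A", "10.0.0.2", "192.0.2.10"),
    ("Org B", "10.0.1.1", "192.0.2.10"),
    ("Org C", "10.0.2.1", "192.0.2.10"),
    ("Org D", "10.0.3.1", "192.0.2.10"),
    ("Org D", "10.0.3.1", "192.0.2.20"),
]


def fan_in(records):
    """Map each destination IP to the set of distinct source IPs that reached it."""
    sources = defaultdict(set)
    for _org, src_ip, dst_ip in records:
        sources[dst_ip].add(src_ip)
    return sources


def many_to_one(records, min_sources):
    """Destinations reached by at least `min_sources` distinct source IPs."""
    return {
        dst: len(srcs)
        for dst, srcs in fan_in(records).items()
        if len(srcs) >= min_sources
    }


suspicious = many_to_one(logins, min_sources=4)
```

A high count is only a prompt for investigation; a popular legitimate service would show the same shape.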

In Figures 13 and 14, the relationship graphs are configured to show links between
banking customers and their fraud alerts. Figure 14 zooms in on a specific section
of the relationship graph. Certain people are linked to 3 or more alert types. These
visualizations show important connections that lead to deeper investigative analysis. As
a fraud analyst, it is important to better understand the timing of each alert, the money
at risk, the identities of the individuals and the locations of the customers in question.
Why are Carver, Carnahan and Camp linked to so many fraud alerts?
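
Finding the people linked to three or more alert types amounts to counting distinct alert types per customer. A minimal sketch follows, using invented customer labels and pairings rather than the data behind Figures 13 and 14; the function name and the `min_types` parameter are assumptions.

```python
from collections import defaultdict

# Hypothetical (customer, alert type) links; the pairings are illustrative only.
customer_alerts = [
    ("Customer A", "Checking"), ("Customer A", "Loan"), ("Customer A", "Credit Card"),
    ("Customer B", "Loan"), ("Customer B", "KYC"), ("Customer B", "Checking"),
    ("Customer B", "Loan"),   # repeated alert type, counted once
    ("Customer C", "Loan"),
]


def customers_with_many_alert_types(links, min_types=3):
    """Return customers linked to at least `min_types` distinct alert types."""
    types_by_customer = defaultdict(set)
    for customer, alert_type in links:
        types_by_customer[customer].add(alert_type)
    return sorted(
        customer
        for customer, types in types_by_customer.items()
        if len(types) >= min_types
    )


flagged = customers_with_many_alert_types(customer_alerts)
```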

Figure 13: Bank Customers Linked to Alert Types

Figure 14: Bank Customers Linked to Alert Types (Zoom)

Advanced fraud analysis using data visualization technology includes a wide range of
techniques that are useful in proving the hypothesis in question. As the analyst interacts
with all of the visualizations, a limitless number of pictures, questions and techniques can
be applied to explore the data. Covering all of these techniques is beyond the scope of
this paper. Let’s concentrate on a set of best practices.
They are:

1) Configuring relationship graphs
2) Advanced functions in relationship graphs
3) Interactive workspaces to incorporate timeline and geo-spatial analysis
4) Analyzing third party data to understand identities

Configuring Relationship Graphs

Now that we know the value of a relationship graph, how does an analyst configure one?
Earlier in the analysis, we developed profiles that showed the amount of money at risk
varied by branch and that Florida, California and Washington DC were the locations
that had a high number of alerts. We also saw that specific account officers had more
alerts than others. We formulated a hypothesis. A set of customers could be linked to
account officers providing irregular approval of loans. As a result a high concentration
of home appraisal alerts had been triggered. Let’s put this theory to the test.

In Figure 15, the analyst has configured a relationship graph with four nodes. Links
have been drawn in between the nodes. She wants to see customers linked to alerts as
well as account officers. She also wants to see account officers linked to branches. The
relationship graph could be customized to show much more data about the alerts, years of
employment for the account officers and property locations for the customers.
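
One way to think about this configuration step is as a list of field pairs that should be joined by a link. The sketch below assumes that model; `LINK_CONFIG`, `build_links`, the field names and the sample rows are hypothetical and do not describe how Centrifuge stores its graph definitions.

```python
# Hypothetical link configuration mirroring the four node types described above:
# each pair names two record fields whose values should be joined by a link.
LINK_CONFIG = [
    ("customer", "alert"),
    ("customer", "account_officer"),
    ("account_officer", "branch"),
]

# Hypothetical source rows; names and values are illustrative only.
rows = [
    {"customer": "Customer A", "alert": "High Appraisal",
     "account_officer": "Officer 1", "branch": "Branch X"},
    {"customer": "Customer B", "alert": "High Appraisal",
     "account_officer": None, "branch": None},
]


def build_links(records, config):
    """Return a set of ((type, value), (type, value)) links, skipping null fields."""
    links = set()
    for rec in records:
        for left, right in config:
            if rec.get(left) is not None and rec.get(right) is not None:
                links.add(((left, rec[left]), (right, rec[right])))
    return links


links = build_links(rows, LINK_CONFIG)
```

Skipping null fields is a design choice in this sketch: a customer with no account officer simply has no officer link, rather than a link to a shared “Null” node.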

Figure 15: Configuring a Relationship Graph

Now, let’s see what this relationship graph looks like using a small set of alerts and
these related nodes.

Figure 16: Relationship Graph with Customers, Alerts, Officers & Branches

Even with a small set of data, the graph can become complex quickly. It is difficult for
the analyst to focus the investigation and discern meaning within this graph. Fortunately,
there are many techniques that prove useful in navigating and searching this graph.

Advanced Relationship Graph Features

Figure 17: Link Intelligence Metrics

Important metrics can be used to quickly identify the most important nodes and links. By
applying link intelligence metrics to the graph, the fraud analyst can isolate some of the
more important suspects. Figure 17 has been filtered to show only the high appraisal alerts.
Most importantly, the size of the customers and account officers has been scaled based
on the number of links they have. Notice account officer Charles Head is linked to many
customers and other account officers with high appraisal alerts. The thickness of the links
is scaled based on the amount of money at risk to the bank. Using a combination of filters
and scaling for both links and nodes, the analyst can begin to focus the investigation.
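
The scaling described here can be sketched as two small calculations: node size from the number of links (the node's degree) and link thickness from the money at risk. The example assumes a simple linear scale onto display units of 1 to 10; the function names, the scale and the sample links are hypothetical.

```python
from collections import Counter

# Hypothetical links: (node, node, money at risk on that link).
links = [
    ("Officer 1", "Customer A", 250_000),
    ("Officer 1", "Customer B", 105_000),
    ("Officer 1", "Officer 2", 0),
    ("Officer 2", "Customer C", 20_000),
]


def node_degrees(edges):
    """Count how many links touch each node."""
    degree = Counter()
    for a, b, _amount in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def scale(value, max_value, smallest=1.0, largest=10.0):
    """Linearly map 0..max_value onto smallest..largest display units."""
    if max_value == 0:
        return smallest
    return smallest + (largest - smallest) * value / max_value


degrees = node_degrees(links)
max_degree = max(degrees.values())
node_size = {node: scale(deg, max_degree) for node, deg in degrees.items()}

max_amount = max(amount for _a, _b, amount in links)
link_width = {(a, b): scale(amount, max_amount) for a, b, amount in links}
```

In this sample the officer with the most links gets the largest node, and the link carrying the most money at risk gets the thickest line.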

Filters are a useful way to narrow the investigation by limiting the data analyzed.
Figure 18 shows a three-part filter using “At Risk Value,” “Branch” and “Alert Name”.
Notice that the filter for at risk value uses a sliding scale set by the analyst.
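
A three-part filter of this kind can be expressed as three conditions applied together. The sketch below assumes the parts are combined with a logical AND and models the sliding scale as a minimum and maximum value; the record fields, the function name and the sample values (including the "Texas" branch and "Overdraft" alert) are hypothetical.

```python
# Hypothetical alert records; field names follow the three filter parts named above.
alerts = [
    {"alert_name": "High Appraisal", "branch": "Florida", "at_risk_value": 250_000},
    {"alert_name": "High Appraisal", "branch": "Florida", "at_risk_value": 5_000},
    {"alert_name": "High Appraisal", "branch": "Texas", "at_risk_value": 300_000},
    {"alert_name": "Overdraft", "branch": "Florida", "at_risk_value": 400_000},
]


def apply_filter(records, min_at_risk, max_at_risk, branches, alert_names):
    """Keep records that satisfy all three conditions (assumed to be ANDed)."""
    return [
        rec for rec in records
        if min_at_risk <= rec["at_risk_value"] <= max_at_risk
        and rec["branch"] in branches
        and rec["alert_name"] in alert_names
    ]


selected = apply_filter(
    alerts,
    min_at_risk=100_000,
    max_at_risk=1_000_000,
    branches={"Florida"},
    alert_names={"High Appraisal"},
)
```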

Once the graph is redrawn, the analyst can apply a technique called “bundling” to group
nodes together on the graph. The benefits of bundling are identified in the annotation
on this graph.
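
The paper does not spell out how bundling works internally, so the following is only one plausible reading: nodes that belong to the same group are replaced by a single bundle node, and links that become duplicates are merged. The grouping, the function name and the sample links are all assumptions.

```python
# Hypothetical links between customers and account officers.
links = [
    ("Customer A", "Officer 1"),
    ("Customer A", "Officer 2"),
    ("Customer B", "Officer 1"),
    ("Officer 1", "Officer 2"),
]

# Hypothetical grouping: which bundle each node belongs to (here, by branch).
bundle_of = {
    "Officer 1": "Branch X officers",
    "Officer 2": "Branch X officers",
}


def bundle(edges, groups):
    """Replace grouped nodes with their bundle; drop duplicate and internal links."""
    bundled = set()
    for a, b in edges:
        a2, b2 = groups.get(a, a), groups.get(b, b)
        if a2 != b2:          # a link inside one bundle disappears
            bundled.add((a2, b2))
    return bundled


bundled_links = bundle(links, bundle_of)
```

Four links collapse to two in this sample, which is the kind of simplification that makes a dense graph easier to read.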

Figure 18: Applying Filters in Relationship Graphs

Figure 19: Using Bundling in Relationship Graph

Interactive Workspace with Timeline and Geo-Spatial Analysis

By integrating two or more visualizations into the same workspace, the fraud analyst can
now investigate across other dimensions. Figure 20 incorporates a timeline designed to
analyze alerts triggered just after accounts have been opened. These “short interval”
alerts are then “broadcast” to the relationship graph. Think of broadcasting as a way
to communicate filtered results to other visualizations. In this case, the timeline is
broadcasting to the relationship graph, which has been set to “listen”. This technique
is useful in identifying individuals tied to suspicious transactions based on geographic
location, timing or some other characteristic of the alerts. For example, alerts with high
risk could be selected from a chart and broadcast to the relationship graph.
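
Broadcasting resembles a publish and subscribe arrangement: one view publishes its current selection and any listening view redraws with it. The sketch below assumes that model; the classes `Timeline` and `GraphView`, the 30-day definition of a “short interval” and the sample dates are hypothetical.

```python
from datetime import date

# Hypothetical alerts with the account-open date and the alert date per customer.
alerts = [
    {"customer": "Customer A", "opened": date(2010, 1, 4), "alerted": date(2010, 1, 9)},
    {"customer": "Customer B", "opened": date(2009, 3, 1), "alerted": date(2010, 2, 1)},
    {"customer": "Customer C", "opened": date(2010, 2, 10), "alerted": date(2010, 2, 12)},
]


class GraphView:
    """Stand-in for a relationship graph that has been set to 'listen'."""

    def __init__(self):
        self.visible_customers = set()

    def receive(self, customers):
        self.visible_customers = set(customers)


class Timeline:
    """Stand-in for a timeline that broadcasts its current selection."""

    def __init__(self, records):
        self.records = records
        self.listeners = []

    def select_short_interval(self, max_days):
        """Select alerts raised within `max_days` of account opening, then broadcast."""
        chosen = {
            rec["customer"]
            for rec in self.records
            if 0 <= (rec["alerted"] - rec["opened"]).days <= max_days
        }
        for listener in self.listeners:
            listener.receive(chosen)
        return chosen


graph = GraphView()
timeline = Timeline(alerts)
timeline.listeners.append(graph)
timeline.select_short_interval(max_days=30)
```

After the call, the listening graph holds only the customers whose alerts fired shortly after account opening.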

Figure 20: Broadcasting Selections - Time Lines and Relationship Graph

Figure 21: Geospatial Visualization with Relationship Graph

The relationship graph in Figure 21 is for all customers with alerts where Charles Head
is the account officer (a filter has been applied to the graph). Charles Head is assigned
to two different branches of the bank (Florida and California). Notice that Mr. Head is
the loan officer for Bokovoy who has a Washington, D.C. address. Bokovoy also has
high loss amounts and a very high “at risk” amount. Additional geospatial analysis also
revealed that Jim Camp has similar attributes. Head is linked to Camp, who lives in DC
and has high loss and at risk amounts. Unusual geographic patterns of behavior, when
used in conjunction with other important data and relationship graphs, can help build the
case for deeper fraud investigations.
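
A crude version of this geographic check compares each customer's address with the locations of the account officer's branches. The sketch below works at the level of state or district codes and is an assumption for illustration; a real geospatial analysis would use distances and map views. All names, codes and rows are hypothetical.

```python
# Hypothetical branch assignments, keyed by state or district code.
officer_branches = {
    "Officer 1": {"FL", "CA"},
    "Officer 2": {"DC"},
}

# Hypothetical loans with the customer's address state and the handling officer.
loans = [
    {"customer": "Customer A", "customer_state": "DC", "officer": "Officer 1"},
    {"customer": "Customer B", "customer_state": "FL", "officer": "Officer 1"},
    {"customer": "Customer C", "customer_state": "DC", "officer": "Officer 2"},
]


def out_of_area(loan_records, branches_by_officer):
    """Customers whose address is outside every branch location of their officer."""
    return [
        loan["customer"]
        for loan in loan_records
        if loan["customer_state"] not in branches_by_officer.get(loan["officer"], set())
    ]


flagged = out_of_area(loans, officer_branches)
```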

Identity Visualization Using Third-Party Data

With a wealth of identity data and other third party sources including public records data,
compilers have amassed 300 million identity records from hundreds of sources. This data
can be accessed in real time to validate SSNs, check fraud scores and retrieve personal
property data. When this is done within the analytical framework, the fraud analyst does
not lose her train of thought. As a result, she can solve cases faster. This technique can
be extraordinarily powerful when the identity data is used in conjunction with customer
data, fraud alerts and account information. Figure 22 shows an integrated relationship
graph with many sources of data.

Figure 22: Relationship Graph with Identity Data

What does this graph reveal? To simplify the presentation of this graph, some of the
important facts are located in tool tips for the nodes and not shown unless the analyst
hovers over the node. Visually, the fraud analyst can see that two suspects share a
business located in Washington, D.C. yet they are both working with a loan officer
(Charles Head) who is assigned to the Los Angeles Branch. Bokovoy and Camp have at
least 4 fraud alerts in common. Camp owns a plane. Other account officers are linked to
Camp and Bokovoy. Are they involved in a fraud ring? To simplify the presentation, the
analyst decided to show annotations that indicate large sums of money at risk to the bank
for these two customers ($250,000 and $105,000 respectively). Both “at risk” amounts
are tied to high appraisal alerts for home loans far from the Los Angeles branch. When
important identity management data is connected to banking transactions, important
linkages are revealed in support of the investigation. Showing disparate data in one
relationship graph allows the analysts to easily connect the dots.

Figure 23: Checking Watch List Data

Matching to watch lists can help build the case. By matching names, addresses, phone
numbers or unique identifiers, the analyst can easily access these new sources. Figure
23 shows a startling result: four of the people shown in Figure 22 are on watch lists.
Camp, Head and Bokovoy are being watched for various reasons including a Cyber Data
Breach (Camp), TSA Flight Risk (Head) and Financial Crimes (Bokovoy). Also interesting
is the fact that Paul Willow is on a Terrorist Watch list.
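
Matching on names or phone numbers usually requires normalizing the values first, since the same person can be recorded with different punctuation or capitalization. The sketch below uses exact matching after a simple normalization step; this is an assumption for illustration, and production matching typically needs fuzzier rules. The watch list entry, customer rows and function names are hypothetical.

```python
import re


def normalize(value):
    """Lowercase the value and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


# Hypothetical watch list and customer records.
watch_list = [
    {"name": "Jane Q. Sample", "phone": "(555) 010-0100", "list": "Example List"},
]
customers = [
    {"name": "jane q sample", "phone": "555-010-0199"},
    {"name": "John Doe", "phone": "555 010 0100"},
    {"name": "Alex Roe", "phone": "555-010-0142"},
]


def watch_list_hits(people, entries, fields=("name", "phone")):
    """Return (customer name, list name, matched fields) for every match."""
    hits = []
    for person in people:
        for entry in entries:
            matched = [
                f for f in fields
                if normalize(person[f]) == normalize(entry[f])
            ]
            if matched:
                hits.append((person["name"], entry["list"], matched))
    return hits


hits = watch_list_hits(customers, watch_list)
```

Recording which field produced the match lets the analyst judge how strong each hit is; a shared phone number and a shared name are different kinds of evidence.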

Data integration is a common theme throughout this case study. Since the risks in this case
involve more than money lost to the bank, including potential terrorist activity, the time
to solve the case is a critical success factor. Connecting to data sources and analyzing
the new sources from within a single analytical framework needs to be mastered by the
fraud analyst to meet growing challenges tied to the proliferation of data sources.

Phase 4: Annotation, Collaboration and Presentation Techniques

As fraud analysts work through the investigation, annotating data visualizations helps
highlight significant findings. Annotations are useful in litigation support, training new
analysts and collaborating with other members of the investigative team. Best practices
dictate that these annotated results are saved for future use in a repository. For one, they
can document the steps the fraud analyst has taken to arrive at specific conclusions.

Results can be organized into individual worksheets, each with its own annotations.
In Figure 24, a series of steps in the fraud analysis has been added as an annotation.
These guidelines may be useful for new investigators. Notice that the guidelines refer to
worksheets that are part of the complete investigation.

On the relationship graph itself, certain nodes have been selected and appear within the
orange box.

Figure 24: Annotating Worksheets with Step-By-Step Guideline

Annotations can be used to call out specific findings, emphasize proof points in support of
litigation, communicate findings to team members and summarize results for executive
leadership.

As Figure 24 demonstrates, they can also be used as a training guide for other team
members.

Many of the same techniques should be incorporated into presentation of findings. The
presentation should emphasize how the analyst arrives at the conclusion. It needs to
be clear, concise and complete. Additional examples of presentation techniques will be
provided at the ACFE training session along with examples of collaboration.

Conclusion

While fraud schemes continue to morph and become more elaborate, the tools that
investigators can bring to bear on the problem have not evolved. The tools today fall short
in four key areas: they are too hard to use, too static, too disconnected and too isolated.
Next generation approaches must improve in these areas and free the investigator to apply
experience and knowledge in an unconstrained manner.

By improving the identification phase in fraud management, all other phases benefit.
Investigative Analytics provides a powerful new paradigm for improving this analysis
effort and comprises three emerging innovations: 1) Interactive Visualization, 2) Unified
Data Views and 3) Collaborative Analysis. The approach must also drastically improve the
user experience which has been far too complicated. Investigative Analytics allows for
unconstrained analysis across disparate data sets. It allows the investigator to visualize
and detect hidden relationships while also collaborating and working with others. It is easily
adoptable. It is consistent with the way investigators have been trained and think. Most
importantly, it allows them to apply their knowledge and experience to the problem.

By deploying investigative analysis tools that embrace these characteristics, investigators
are armed with technology built for the modern fraud landscape. These tools are weapons
in the fight against fraud.

In this investigation, the analyst detected suspicious behavior in terms of the number of
fraud alerts assigned to an account officer, Charles Head. The alerts were concentrated
in a few branches. Upon closer investigation, she noticed that certain banking customers
were tied to these alerts across business lines. Bokovoy and Camp were linked to the
same address and both were working with the same account officer (Head) who
happened to be across the country. The amount of money at risk to the bank was high.
High appraisal alerts were not the first alerts set off by Camp and Bokovoy. Identity
visualization using 3rd party data indicated additional problems. Watch lists were checked
and all three suspects (plus one new one) showed up on these lists. Results were published
to other members of the team.

This approach has been put to the test in some of the most demanding applications worldwide
and has proven to be highly effective. If the investigator is able to gain access to critical
data in support of his investigation, if the investigator can identify hidden relationships
within massive data sets, if the investigator can notify others of results, the identification
process can be improved while also enhancing detection, reporting and issue resolution.

Because of these benefits and the enormous information challenges organizations face
today, Investigative Analytics is taking on new meaning worldwide as fraud analysts,
intelligence analysts, cyber security analysts and law enforcement leverage technology
to efficiently and effectively identify fraud.

References

1. Federal Crimes Report to the Public, Fiscal Year 2006, Federal Bureau
of Investigation

2. Federal Crimes Report to the Public, Fiscal Year 2006, Federal Bureau
of Investigation

3. Internal Revenue Service, Department of the Treasury

THE FREEDOM TO EXPLORE
7926 Jones Branch Drive Suite 210 McLean, VA 22102 | Tel: (571) 830-1300 | www.centrifugesystems.com | info@centrifugesystems.com
© 2010 Centrifuge Systems, Inc. All rights reserved. Centrifuge is a trademark of Centrifuge Systems, Inc. All other product or company names may be trademarks
and/or registered trademarks of their respective owners. Information in this document may be subject to change without notice.
While every effort is made to ensure the information given is accurate, Centrifuge Systems does not accept liability for any errors or mistakes which may arise.
