
Worst Practices in Predictive Analytics

Speed Implementation, Boost User Adoption, and Maximize ROI by Avoiding Common Pitfalls
A White Paper
Table of Contents

Executive Summary
Identifying Common Worst Practices
  Failing to Focus on a Specific Business Initiative
  Ignoring Critical Steps
  Spending Too Much Time on Model Evaluation
  Investing Heavily in Analytic Tools With Little or No Return
  Failing to Operationalize
Avoiding Worst Practices
  Driving ROI
  Focusing on Bottom-Line Initiatives
  Preparing Data
  Evaluating the Model Without Over-Evaluating
  Deploying the Results
Keys to Successful Predictive Analytics Deployment
  Understanding the Business Need
  Understanding the Data
  Preparing the Data
  Modeling
  Evaluation
  Deployment
WebFOCUS RStat: Cutting-Edge Predictive Modeling
Conclusion

Executive Summary
Reactive decision-making, while successful in the past, has proven ineffective in recent times.
Organizations can no longer wait to make critical choices after an opportunity arises or a
problem is uncovered. They must take a more proactive approach to running their businesses
by anticipating important changes, events, and trends, and taking action before they occur.
That's where predictive analytics comes in. Unlike traditional reporting and analysis techniques,
which provide a rear-view perspective of what has happened in the past, predictive analytics
enables the discovery of patterns and trends in historical data to determine what will likely occur
in the future. This eliminates the need for decision-makers to rely solely on intuition, giving them
valuable, forward-looking insight that improves the effectiveness of plans, strategies, and decisions.
In his blog, Forrester analyst James Kobielus claims that predictive analytics "is not just about
forecasting what's coming down the pike. It's also about keeping the bad alternative futures from
happening. If you can see the nasty things that might happen far enough in advance, you have
a better chance of neutralizing or squelching them entirely."¹ According to research from IDC,
the benefits are even more straightforward: organizations using predictive analytics solutions
generate an average return on investment of 145 percent.²
Unfortunately, many companies don't implement it correctly and fail to achieve these desired
results. In this white paper, we will investigate worst practices in predictive analytics. We'll discuss
why these actions can derail predictive analytics initiatives, and what steps can be taken to avoid
making such mistakes. We'll also highlight the key steps required for building and deploying
effective predictive applications, and showcase WebFOCUS RStat, today's most powerful and
full-featured solution for predictive analytics.

¹ Kobielus, James. "Interdictive Analytics: Catching Baddies at the Pass and in The Nick of Time," Forrester, July 2010.
² "The Financial Impact of Business Analytics: Key Findings," IDC, January 2003.
Identifying Common Worst Practices
As beneficial as predictive analytics can be to an organization, implementation and deployment
projects often fall apart or fail to get underway due to common poor practices, procedures, and
decisions, such as:
■ Failing to focus on a specific business initiative that predictive analytics can enhance
■ Ignoring crucial steps, such as data preparation and access, or deployment of results
■ Spending too much time evaluating models
■ Investing in tools that yield little or no returns
■ Failing to operationalize findings
Failing to Focus on a Specific Business Initiative
The first step in any successful predictive analytics endeavor is to determine what business
questions will be answered by the results. This enables organizations to more readily define
project objectives and requirements in a way that satisfies the need driving the initiative.
Predictive analytics is most effective when it is used to identify expected cases. For example,
customers are scored for risk of churn, to predict who is most likely to defect to a competitor.
Or they are scored to determine who is most likely to respond to a certain type of campaign or
promotion. The expected behavior is known, but determining who is most likely to engage in a
particular behavior requires predictive analytics to identify specific patterns.
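As an illustration, a fitted churn model reduces to a scoring routine that ranks customers by their likelihood of defecting. The sketch below is hypothetical: the weights and customer attributes are invented for illustration, and a real model would learn its coefficients from historical data.

```python
# Illustrative churn-scoring sketch: each customer receives a risk
# score from a (hypothetical) logistic model, and customers are then
# ranked so retention efforts target the likeliest defectors first.
import math

# Coefficients a real model would learn from historical data;
# these values are invented for illustration only.
WEIGHTS = {"months_inactive": 0.8, "support_calls": 0.5, "tenure_years": -0.3}
INTERCEPT = -1.0

def churn_score(customer):
    """Return the estimated probability (0-1) that a customer defects."""
    z = INTERCEPT + sum(WEIGHTS[k] * customer[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

customers = [
    {"id": "A17", "months_inactive": 4, "support_calls": 6, "tenure_years": 1},
    {"id": "B02", "months_inactive": 0, "support_calls": 1, "tenure_years": 8},
]
ranked = sorted(customers, key=churn_score, reverse=True)
for c in ranked:
    print(c["id"], round(churn_score(c), 2))
```

The point of the exercise is not the arithmetic but the ranked output: it tells the business exactly who to contact first.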
Though this benefit is substantial, most organizations are also trying to discover something
critical that they don't already know. Many fail in this endeavor because they begin building their
predictive applications with somewhat loose goals in mind.
They try various models, or alter the underlying business questions over and over again. This
drains project resources and forces developers into a never-ending cycle of definition, evaluation,
and fine-tuning. It can also prevent the organization from reaching its ultimate objective: the
deployment of a predictive application for end users.
The best approach, when decisions need to be made with little or no pre-existing knowledge, is
to apply insight from patterns existing in the data to these new cases.
Ignoring Critical Steps
When deploying predictive analytics, many companies overlook important steps in the process.
One of the most frequently ignored is data preparation and access. In reality, this should be
the activity to which the most effort is devoted. In fact, data preparation typically accounts for
approximately 60 to 80 percent of the cost of a predictive modeling initiative.
Raw information must be gathered from various sources across the enterprise, and compiled
in a final data set that is fed to the predictive model. This requires more than just pulling data
from back-end systems and moving it into a centralized location, such as a data mart or data
warehouse. Many companies fail to properly select, cleanse, and enhance data to make it truly
analytics-ready. Others are totally unaware of how complete or accurate their information is;
they think it is clean, but in reality, it is not.

Because the information will be dispersed throughout an organization, the proper tables, records,
and attributes must be selected. Invalid or erroneous records must be located and corrected, and
any missing data must be filled in. Without the proper knowledge and tools, such as a
comprehensive business intelligence (BI) platform that can profile, transform, or fill in information,
data preparation will serve as nothing more than a stumbling block that creates significant delays.
What happens when the information used in predictive modeling lacks integrity? The principle of
"garbage in, garbage out" certainly applies here. If the information used is poor, the accuracy of the
results will be as well.
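A quick profiling pass can show how clean the data actually is rather than leaving it assumed. The sketch below, with invented records and validity rules, counts missing and invalid values per attribute before any model sees the data:

```python
# Minimal data-profiling sketch: count missing and invalid values per
# attribute so data quality is verified, not assumed. The records and
# validation rules are hypothetical.
records = [
    {"age": 34, "income": 52000, "region": "NE"},
    {"age": None, "income": 48000, "region": "NE"},
    {"age": 29, "income": -1, "region": "??"},  # invalid income and region
]
VALID_REGIONS = {"NE", "SE", "MW", "W"}

def profile(rows):
    """Return {field: count of missing-or-invalid values}."""
    issues = {"age": 0, "income": 0, "region": 0}
    for r in rows:
        if r["age"] is None:
            issues["age"] += 1
        if r["income"] is None or r["income"] < 0:
            issues["income"] += 1
        if r["region"] not in VALID_REGIONS:
            issues["region"] += 1
    return issues

print(profile(records))  # surfaces problems a quick glance would miss
```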
Many companies also fail to share the results of their efforts on a wide scale. In this case, the
insight provided by predictive analytics cannot deliver tangible business value to the very people
who can use it, including executives and managers, frontline workers, and external stakeholders
such as partners and suppliers. Further, results must not only be distributed to the right people,
they must be delivered in a way that is easy for end users to understand, interpret, and act
upon. Keeping predictive analytics results in the back office is a sure way to guarantee
disappointing results.
Spending Too Much Time on Model Evaluation
Predictive models must be evaluated to determine how accurately they predict patterns. First,
they must be measured from a data perspective to ensure that all needed information is available
and properly structured before the models are applied. Then they must be assessed from a
business perspective to ensure they will meet end-user expectations and requirements.
Accuracy comes at a cost, and companies must decide in advance how precise they need their
models to be. Is 70 percent good enough? Or do results need to be at least 90 percent correct?
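Deciding "good enough" in advance can be as simple as fixing an accuracy threshold with the business owners and testing the model against holdout data. A minimal sketch, with made-up predictions and labels:

```python
# Sketch of deciding "good enough" up front: measure holdout accuracy
# and compare it to a threshold agreed on before modeling began.
# The predictions and actual outcomes below are invented.
def accuracy(predicted, actual):
    """Fraction of holdout cases the model classified correctly."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

THRESHOLD = 0.70  # agreed with business owners before model building

predicted = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
actual    = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

acc = accuracy(predicted, actual)
ready = acc >= THRESHOLD
print(f"accuracy={acc:.0%}, deploy={ready}")
```

Once the threshold is met, refinement stops and deployment begins; further tuning happens in later iterations, not before rollout.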
Companies often tend to over-evaluate. They add new variables to the models to increase their
accuracy, which often requires rebuilding. They test and retest the models, spending tremendous
amounts of time making continuous refinements because they are not quite perfect. This delays
deployment, and prevents the organization from recognizing the substantial advantages that
predictive analytics can offer.
There is a tradeoff to be made among time to market, usefulness, and accuracy. Companies
must either sacrifice some precision in order to accelerate deployment, or halt implementation
and rollout, delaying the realization of benefits, in order to achieve higher levels of accuracy.
The truth is, if a model is better than the current approach to forward-looking decision-making
(and it likely is), then it should be considered ready for deployment. No model will ever be
perfect, because shifting business strategies and evolving end-user needs require continuous
modifications. The idea that models cannot be deployed until they are just right is just wrong,
and companies risk never deploying them at all.

How will a company know when its model is ready? If high-quality information is used, if the
model's accuracy is satisfactory from a business point of view, and if it is properly designed to
answer specific business questions, then the interpretability of the results should be the key
criterion for determining whether it is ready for prime time.
It is important to remember that accuracy is key. Even if all other criteria are met, the model cannot
be deployed if it does not meet accuracy standards. In addition, an estimate of the model's ROI
can be determined, and when that ROI is at the proper level, the model can be deployed.
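That ROI estimate can start as a back-of-the-envelope calculation comparing model-targeted outreach with an untargeted baseline. All figures in this sketch are hypothetical:

```python
# Back-of-the-envelope model ROI sketch (all figures invented):
# compare the campaign return expected with model-targeted outreach
# against the cost of building and running the model.
customers_contacted = 10_000
response_rate_with_model = 0.06   # model targets likely responders
response_rate_baseline = 0.02     # untargeted mailing
value_per_response = 120          # average revenue per conversion
model_cost = 25_000               # build + deployment, one campaign's share

extra_responses = customers_contacted * (
    response_rate_with_model - response_rate_baseline)
incremental_revenue = extra_responses * value_per_response
roi = (incremental_revenue - model_cost) / model_cost
print(f"incremental revenue: ${incremental_revenue:,.0f}, ROI: {roi:.0%}")
```

When a calculation like this clears the agreed hurdle rate, the model is ready to go into production.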
Investing Heavily in Analytic Tools With Little or No Return
There are several common mistakes made when it comes to investing in predictive analytics tools.
Companies often buy expensive, complex analytic software that is far too sophisticated for their
needs. These solutions not only come with very high price tags, but they are also typically hard
to deploy and difficult for anyone other than statisticians and experienced analysts to use. As a
result, they likely contain features and functions that will never be used. All of these factors will
significantly diminish return on investment (ROI).
Buyers should also determine if they are buying a package for research, or for deployment
purposes. Those solutions that will support a targeted research project will require only a single
user license for the analyst responsible. Deployment, on the other hand, implies scale and will
require an enterprise-level software solution. Many companies don't make this distinction, and
end up either under- or over-buying.
Other organizations try to build their own software, relying on internal programmers to create a
predictive analytics application. Or they purchase a syntax-based solution that requires extensive
amounts of manual coding. These solutions drain IT resources, and may not include all the
necessary capabilities. Users may also find it difficult to deploy results, rendering the
solution totally ineffective.
Finally, when it comes to the computing environment, organizations typically need two systems:
one for predictive analytics, and a reporting system to deliver results. This creates additional
and unnecessary hardware, support, and maintenance costs. A simpler and more cost-effective
approach is to combine these into a single server environment.
Failing to Operationalize
For predictive analytics to succeed, it must be embedded into applications that are leveraged
whenever users need to make decisions. If an application is not built and deployed, the effort
devoted to creating a model will do nothing to enhance forward-looking decision-making. The
results will remain in a document that few people will refer to in support of their daily activities.
However, when a model is incorporated into a dashboard or reporting environment, the results
will be readily accessible to end users, whenever they need them. This will help to create an
analytics-driven culture across the entire business.


Avoiding Worst Practices
The worst practices we have highlighted don't have to derail a predictive analytics initiative. In
fact, they can all be easily avoided by:
Driving ROI
When planning a predictive application, companies must consider total cost of ownership and
anticipated return, to ensure that maximum value is achieved.
Focusing on Bottom-Line Initiatives
Create models that will provide forward-looking intelligence to help solve specific problems (e.g.,
minimizing customer churn by uncovering the factors that contribute to it) or to achieve
certain goals (e.g., increasing up-sell and cross-sell revenue by understanding which new products
customers are most likely to buy).
Preparing Data
Guarantee the most accurate possible results by ensuring that disparate data is easily and properly
accessed and cleansed before the models are created and applied.
Evaluating the Model Without Over-Evaluating
The model must be tested to ensure that it provides better decision-making capabilities over
current analysis methods. But over-evaluation can delay deployment and hinder ROI. It simply
needs to be assessed until it is determined that it will provide value. At that point, it can be
implemented. The statistical properties of the finished model are secondary to the value it brings
to the business.
Deploying the Results
The insight provided by predictive analysis efforts must be shared with key stakeholders across
and beyond the organization. For example, a bank that has predicted which customers are most
likely to churn should disseminate that information to all those who interact with those clients,
including call center staff and branch personnel. That way, everyone can contribute to correcting
the problem and ensure that countermeasures are being implemented.


Keys to Successful Predictive Analytics Deployment
Now that we've discussed the wrong approach to predictive analytics, let's look at some of the
critical steps that must be taken to ensure its success.
Understanding the Business Need
As mentioned earlier, it is crucial for companies to identify the drivers behind the predictive
analytics project in the early planning stages. Once an organization defines what new information
it is trying to uncover, what new facts it wants to learn, or what business initiatives need to be
enhanced, it can build models and deploy results accordingly.
Understanding the Data
A thorough collection and exploration of the data should be performed. This enables those who
are building the application to get familiar with the information at hand, so they can identify
quality issues, glean initial insight, and detect relevant subsets that can be used to form
hypotheses about hidden information. This also ensures that the available data will
address the business objective.
Preparing the Data
To get data ready, IT organizations must select tables, records, and attributes from various sources
across the business. Data must be transformed, merged, aggregated, derived, sampled, and
weighted. It is then cleansed and enhanced to optimize results. These steps may need to be
performed multiple times in order to make data truly ready for the modeling tool.
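The steps above can be sketched in miniature: merge two hypothetical sources on a customer key, aggregate transactions, and derive a feature the modeling tool can consume. The sources and field names are invented for illustration:

```python
# Sketch of the prepare step: merge two hypothetical sources on a
# customer key, aggregate transactions, and derive a modeling feature.
customers = {"C1": {"tenure_years": 5}, "C2": {"tenure_years": 1}}
transactions = [
    {"cust": "C1", "amount": 200}, {"cust": "C1", "amount": 100},
    {"cust": "C2", "amount": 50},
]

# Aggregate: total spend per customer.
spend = {}
for t in transactions:
    spend[t["cust"]] = spend.get(t["cust"], 0) + t["amount"]

# Merge and derive: one analytics-ready row per customer, including a
# derived spend-per-year feature the modeling tool can consume.
model_rows = []
for cid, attrs in customers.items():
    total = spend.get(cid, 0)
    model_rows.append({
        "cust": cid,
        "tenure_years": attrs["tenure_years"],
        "total_spend": total,
        "spend_per_year": total / attrs["tenure_years"],
    })
print(model_rows)
```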
Modeling
Once information has been prepared, various modeling techniques should be selected and
applied, and their parameters calibrated to optimal values. Choice of the modeling technique is
determined by the underlying data characteristics, or by the desired form of the model for scoring.
In other words, some techniques may explain the underlying patterns in data better than others,
and therefore, the outcomes of various modeling methods must be compared. A decision tree,
for example, may be chosen when it is important to have an easily interpreted set of rules as
the scoring model. Several techniques can be applied to the same scenario to produce results
from multiple perspectives.
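The interpretability of a decision tree comes from the fact that its output reads as plain if/then business rules. The rules below are hand-written stand-ins for what a tree-induction algorithm would actually learn from the data:

```python
# Interpretability sketch: a decision tree's scoring model can be read
# as plain business rules. These rules are invented stand-ins for what
# a tree-induction algorithm would learn.
def churn_rule(customer):
    """Score a customer using tree-style if/then rules."""
    if customer["months_inactive"] > 3:
        if customer["support_calls"] > 4:
            return "high risk"
        return "medium risk"
    if customer["tenure_years"] < 2:
        return "medium risk"
    return "low risk"

print(churn_rule({"months_inactive": 5, "support_calls": 6, "tenure_years": 1}))
```

Because every score traces back to a readable rule, business owners can validate the model without statistical training.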
Evaluation
Thorough assessments should be conducted from two unique perspectives: a technical/data
approach often performed by statisticians, and a business approach, which gathers feedback from
the business issue owners and end users. This often leads to changes in the model; but while the
technical/data evaluation is important, it should not be so stringent that it significantly delays
implementation and use of the model. The model's business value should be the primary test.
Deployment
Deployment, the final step, can mean one of two things: the generation of a single report for
analysis, or the implementation of a repeatable data mining or scoring application. The goal here
is to create a reusable application that can be used to generate predictions for large volumes of
current data. The results are then distributed to front-line workers in a format they are comfortable
with (reports, dashboards, maps, or graphics) to enable proactive decision-making.
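A repeatable scoring run can be as simple as applying the deployed model's scoring routine to a batch of current records and emitting rows a report or dashboard can display. The scoring function here is a trivial stand-in, not any product's API:

```python
# Sketch of a repeatable scoring run: apply a saved model (here, a
# trivial stand-in function) to a batch of current records and emit
# rows a report or dashboard could display. Names are illustrative.
def score(record):
    # stand-in for a deployed model's scoring routine
    return min(1.0, 0.1 * record["months_inactive"]
               + 0.05 * record["support_calls"])

batch = [
    {"id": "A17", "months_inactive": 4, "support_calls": 6},
    {"id": "B02", "months_inactive": 0, "support_calls": 1},
]
report = [{"id": r["id"], "churn_score": round(score(r), 2)} for r in batch]
for row in sorted(report, key=lambda r: r["churn_score"], reverse=True):
    print(f"{row['id']}: {row['churn_score']:.2f}")
```

Because the routine is reusable, the same run can be scheduled against each new batch of current data.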

WebFOCUS RStat: Cutting-Edge Predictive Modeling
WebFOCUS RStat from Information Builders is the market's first fully integrated BI and data mining
environment, seamlessly bridging the gap between backward- and forward-facing views of
business operations. With RStat, companies can easily and cost-effectively deploy predictive
models as intuitive scoring applications, so business users at all levels can make decisions based
on accurate, validated future predictions instead of relying on instinct alone.
WebFOCUS RStat provides a single platform for BI, data modeling, and scoring. This eliminates
the need to purchase and maintain multiple tools, and frees statisticians and other analysts
from spending countless hours extracting and querying data. At the same time, it reduces costs,
simplifies maintenance, and optimizes IT resources.


With RStat, scoring routines can be incorporated into any WebFOCUS report or application.
RStat's greatest benefit is its significantly increased accuracy. With the R engine, a powerful
and flexible open source statistical programming language, as its underlying analysis tool,
WebFOCUS RStat can deliver results that are consistent, complete, and correct every time.
WebFOCUS RStat provides:
■ A single tool with access to more than 300 data sources, for both BI developers and data miners
■ Comprehensive data exploration, descriptive statistics, and interactive graphs
■ In-depth data visualization and transformation
■ Hypothesis testing, clustering, and correlation analysis
■ The ability to build and export models for estimation and classification
■ Comprehensive model evaluation
■ Rapid application creation through easy incorporation of scoring routines into WebFOCUS reports

Conclusion
Avoiding common worst practices, and adopting best ones, is the key to successfully
implementing and using predictive analytics. By knowing what pitfalls to avoid, and what
important steps need to be taken, companies can accelerate implementation, maximize user
adoption, and realize substantial ROI.
Choosing the right supporting solution also plays a vital role in the success of a predictive
application. Only WebFOCUS RStat offers unmatched data access capabilities, as well as all the
tools needed to build a predictive model, manipulate the results, and deploy them to business
users in a way that is easy to understand, interpret, and use.

Worldwide Offices
Corporate Headquarters
Two Penn Plaza
New York, NY 10121-2898
(212) 736-4433
(800) 969-4636
United States
Atlanta, GA* (770) 395-9913
Baltimore, MD (703) 247-5565
Boston, MA* (781) 224-7660
Channels (770) 677-9923
Chicago, IL* (630) 971-6700
Cincinnati, OH* (513) 891-2338
Dallas, TX* (972) 398-4100
Denver, CO* (303) 770-4440
Detroit, MI* (248) 641-8820
Federal Systems, DC* (703) 276-9006
Florham Park, NJ (973) 593-0022
Gulf Area (972) 490-1300
Hartford, CT (781) 272-8600
Houston, TX* (713) 952-4800
Kansas City, MO (816) 471-3320
Los Angeles, CA* (310) 615-0735
Milwaukee, WI (414) 827-4685
Minneapolis, MN* (651) 602-9100
New York, NY* (212) 736-4433
Orlando, FL (407) 804-8000
Philadelphia, PA* (610) 940-0790
Phoenix, AZ (480) 346-1095
Pittsburgh, PA (412) 494-9699
Sacramento, CA (916) 973-9511
San Jose, CA* (408) 453-7600
Seattle, WA (206) 624-9055
St. Louis, MO* (636) 519-1411, ext. 321
Washington DC* (703) 276-9006
International
Australia*
Melbourne 61-3-9631-7900
Sydney 61-2-8223-0600
Austria Raiffeisen Informatik Consulting GmbH
Wien 43-1-211-36-3344
Bangladesh
Dhaka 415-505-1329
Belgium*
Brussels 32(0)2-743-02-40
Brazil InfoBuild Brazil Ltda.
São Paulo 55-11-3285-1050
Canada
Calgary (403) 437-3479
Montreal* (514) 421-1555
Ottawa (613) 233-7647
Toronto* (416) 364-2760
Vancouver (604) 688-2499
China
Beijing 010-51289680, ext. 8010
Croatia InfoBuild CEE
Strmec Samoborski 385-1-23-62-400
Czech Republic InfoBuild CEE
Praha 420-221-986-460
Estonia InfoBuild Baltics
Tallinn 372-5265815
Finland InfoBuild Oy
Espoo 358-207-580-840
France*
Sèvres +33 (0)1-45-07-66-00
Germany
Eschborn* 49-6196-775-76-0
Greece Applied Science Ltd.
Athens 30-210-699-8225
Guatemala IDS de Centroamerica
Guatemala City (502) 2412-4212
Hungary InfoBuild CEE
Budapest 36-1-430-3500
India* InfoBuild India
Chennai 91-44-42177082
Israel Malam Team SRL Products
Petah-Tikva 972-3-7662040
Italy
Milan 39-02-92-349-724
Japan KK Ashisuto
Tokyo 81-3-5276-5863
Kuwait InfoBuild Middle East
Safat 965-2-232-2926
Latvia InfoBuild Baltics
Riga 371-67039637
Lebanon InfoBuild Middle East
Beirut 961-4-533162
Lithuania InfoBuild Baltics
Vilnius 370-5-268-3327
Mexico
Mexico City 52-55-5062-0660
Netherlands*
Amstelveen 31 (0)20-4563333
Nigeria InfoBuild Nigeria
Garki-Abuja 234-803-318-4750
Norway InfoBuild Norge AS
Oslo 47-4820-4030
Poland InfoBuild CEE
Warszawa 48-22-657-0014
Portugal
Lisboa 351-217-217-400
Qatar InfoBuild Middle East
Doha 974-4-466-6244
Russian Federation InfoBuild CIS
Moscow 7-495-797-20-46
n Armenia n Azerbaijan n Belarus n Kazakhstan
n Kyrgyzstan n Moldova n Tajikistan
n Turkmenistan n Ukraine n Uzbekistan
Saudi Arabia InfoBuild Middle East
Riyadh 966-1-479-7623
Singapore Automatic Identification Technology Ltd.
Singapore 65-6286-2922
Slovakia InfoBuild CEE
Bratislava 421-232-332-513
n Bulgaria n Romania n Serbia n Slovenia
South Africa Fujitsu (Pty) Ltd.
Cape Town 27-21-937-6100
Johannesburg 27-11-233-5432
South Korea Uvansys
Seoul 82-2-832-0705
Spain
Barcelona 34-93-452-63-85
Bilbao 34-94-452-50-15
Madrid* 34-91-710-22-75
Sweden InfoBuild AB
Solna 46-8-578-772-01
Switzerland
Dietlikon 41-44-839-49-49
Taiwan Galaxy Software Services, Inc.
Taipei (866) 2-2586-7890
Thailand Datapro Computer Systems Co. Ltd.
Bangkok 66(2) 301 2800
Turkey InfoBuild Turkey
Ankara 90-312-266-3300
Istanbul 90-212-351-2730
United Arab Emirates InfoBuild Middle East
Abu Dhabi 971-2-627-5911
n Bahrain n Egypt n Jordan n Oman
Dubai 971-4-391-4394
United Kingdom*
Uxbridge Middlesex 0845-658-8484
Venezuela InfoServices Consulting
Caracas 58212-763-1653
* Training facilities are located at these offices.
Corporate Headquarters Two Penn Plaza, New York, NY 10121-2898 (212) 736-4433 Fax (212) 967-6406 DN7506898.0811
Connect With Us informationbuilders.com askinfo@informationbuilders.com
Copyright © 2011 by Information Builders. All rights reserved. All products and product names
mentioned in this publication are trademarks or registered trademarks of their respective companies.
Printed in the U.S.A. on recycled paper.
