
Table of Contents

Introduction
Automated Web Data Extraction is Vital for World-Class Businesses
The Rise of Swivel Chair Automation
8 Drawbacks of Swivel Chair Automation
Traditional Options, Incomplete Results
Closing the Web Data Extraction Gap: 3 Traditional Options
The Problem with Tradition
The Missing Piece: Web Data Extraction
What is Web Data Extraction?
Versatility Meets Utility: How Web Data Extraction is Used
Real-World Results: Web Data Extraction Success Stories
Is Web Data Extraction Right for You?
The View from Leading Analysts
How Automated Web Data Extraction Transforms Your Business
Four Must-Haves for Your RPA Solution
Additional Resources

2 | Scrap the Web Scraping: The Guide to Automating Web Data Extraction
Introduction
The internet is big. Really big.

The surface web (those pages indexed by search engines and accessible to the public) contains at least 4.5 billion pages.¹

Then there’s the deep web, or invisible web—pages that aren’t indexed by search engines,
such as content accessed via forms or password credentials, and pages that aren’t linked or
registered with search engines. No one knows how big the invisible web really is, but some
experts estimate that it’s a whopping 500 times the size of the surface web.

When your business depends on collecting and interpreting web data, the glut of available
information is both the challenge and the opportunity. You can’t afford to manually sift through
even a tiny fraction of those billions of web pages and portals, logging in and out or hoping
your homegrown code or web scraper won’t break on a dynamic web site.

But what if you could find a way to efficiently and automatically collect accurate, comprehensive
web research that’s delivered in near-real time to power your most critical business decisions?

Suddenly the internet feels a little more manageable.


¹ http://www.worldwidewebsize.com/

Automated Web Data Extraction is Vital for World-Class Businesses
Global businesses are in a continual race to evolve. This is especially true for companies that are in the business
of information—whether you gather and analyze information to advise or sell to others, use information in-house
to maintain a competitive edge, or leverage information in an entirely new and disruptive business model.

Yet many companies rely on manual research, incomplete information gathered from web scraping tools, third-
party data and home-grown coding solutions to power this critical piece of their business—the equivalent of
pedaling a bike down the online superhighway while others are whizzing past in race cars.

“The goal is to turn data into information, and information into insight.”
– Carly Fiorina, former CEO of Hewlett-Packard

The Rise of Swivel Chair Automation
Complete automation of web data collection has been out of reach
for many organizations due to limited technology options. Manual
data collection and cleanup is a tremendous burden that requires
human workers to act as the conduit between several systems,
moving between websites, portals and applications to key, re-key,
copy and paste information.

This manual data collection process is often referred to as “swivel
chair automation,” calling to mind a cubicle farm full of workers
swiveling left to right in their chairs, looking from one monitor to
another as they copy and paste information. It’s not exactly the
picture of purposeful, efficient, scalable operations.

8 Drawbacks of Swivel Chair Automation

1. Reduced Productivity
No matter how well-skilled, employees can only work so fast; we also need a lot more food and rest than computer software. Despite complaints that may be overheard near water coolers, none of us is really able to work 24 hours a day, 7 days a week.

2. Diminished Accuracy
Homegrown tools and web scrapers often deliver incomplete and inaccurate data, requiring employees to manually sort and cleanse the data. Even experienced workers are prone to make errors, especially when completing large volumes of work.

3. Increased Expense
Much of what you’re paying for is essentially a copy and paste task. Highly repetitive work siphons valuable time away from knowledge workers who could be spending their time analyzing information instead of gathering it.

4. Insufficient Standardization
Humans each have their own work style and preferences. Multiplied across dozens or hundreds of workers, reconciling these workflow variations is costly. Manual processes are highly inconsistent when it comes to information-gathering and research.


5. Limited Scalability
When repetitive tasks depend on human workers to complete them, scaling and reacting rapidly is extremely difficult, if not impossible.

6. Incomplete Process Visibility and Analytics
Manual processes are inconsistent and much harder to track than automated processes. Because you’re not starting with 100% accurate data, insights gained from processes that involve manual task completion will be inherently flawed.

7. Diluted Customer Experience
An inefficient research process causes a ripple effect on customer service, whether you’re a retail company reacting too slowly to competitor price changes, a consulting organization charged with delivering the freshest insights to clients or a disruptive business leveraging web data to launch a new product or service to the market.

8. Weakened Compliance and Security
Some data-gathering and interpretation is subject to strict rules. People are famously good at making mistakes, adopting shortcuts and bending the rules under pressure. Manual processes set the stage for regulatory non-compliance.

TRADITIONAL OPTIONS, INCOMPLETE RESULTS
Closing the Web Data Extraction Gap: 3 Traditional Options
Let’s look at the three options traditionally used by organizations that rely on web data to power their businesses:

1. Manual
It’s likely your IT department has a longer to-do list than it can handle, and data collection projects often fall into the “important but not urgent” category. Meanwhile, the job still needs to be done, and the easiest way to get it done is to throw more people at it.

2. Web Scraping Software or Custom Development
At first glance, using an inexpensive web scraping tool or leveraging homegrown tools to gather the data you need seems like a no-brainer. Startup costs are minimal, and you can circumvent IT’s long to-do list and deploy and manage it within the business unit.

3. Outsourced/Purchased
Although it’s possible to lower costs with purchased data, you’re separating important business processes (data gathering and analysis) and relying on information that might be incomplete, outdated or contain discrepancies.

The Problem with Tradition
Tradition, a.k.a. “the way we’ve always done it,” is a wonderful way to celebrate special holidays and customs,
but this approach often falls short of delivering an ideal outcome in the rapidly-changing business world.

1. Manual
Assigning people to swivel between dozens or even hundreds of web data sources is neither cost-effective nor scalable. With more information being generated than ever before, spread across an increasing number of applications and locations, manual web data-gathering should be a last resort for your information-driven organization.

2. Web Scraping Software or Custom Development
Web scrapers and custom code can create more problems than they solve when they can’t access dynamic content or filter out unwanted information, or when they break as websites change. The data they manage to deliver often needs to be cleansed and corrected before use, causing expensive delays when you require dynamic, timely intelligence.

3. Outsourced/Purchased
Quality. Quantity. Speed. Pick two. This saying applies when outsourcing manual data-gathering, and you’re still not solving for human errors and productivity limitations.
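
To make the "break when websites change" problem concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the two page snippets, the class names `price` and `product-price`, and the helper `scrape_price` are hypothetical and do not describe any particular tool or website.

```python
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Hypothetical scraper: grabs the text of the first <span class="price">."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # The scraper is hard-wired to one exact tag and class name.
        if tag == "span" and ("class", "price") in attrs and self.price is None:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.price = data.strip()
            self._in_price = False

    def handle_endtag(self, tag):
        # Stop capturing once the element closes, even if it was empty.
        if tag == "span":
            self._in_price = False


def scrape_price(html):
    """Return the scraped price string, or None if the markup does not match."""
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    return parser.price


# Assumed page markup before and after a site redesign (illustrative only).
OLD_PAGE = '<div><span class="price">$19.99</span></div>'
NEW_PAGE = '<div><span class="product-price">$19.99</span></div>'

print(scrape_price(OLD_PAGE))  # $19.99
print(scrape_price(NEW_PAGE))  # None
```

Note that the second call does not raise an error. The scraper simply returns nothing after the class name changes, which is one reason scraped data so often has to be checked and corrected by hand before use.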

Web Data Extraction: A Flexible and Cost-Effective Alternative


To become agile and efficient, your organization needs a combination approach—a flexible solution that embraces both the
innovation of new technologies to collect, filter and deliver large amounts of rapidly-changing dynamic information and the power
of your people to analyze data and deliver market-leading insights for a competitive edge.

THE MISSING PIECE: WEB DATA EXTRACTION
What is Web Data Extraction?
Web data extraction is one use case of Kofax Kapow™ Robotic Process Automation (RPA), which uses
intelligent software robots to automate the collection of vast amounts of data like market intelligence,
financial data, news information, public records and court documents, competitive pricing and many other
diverse sources of data on the public web and online web portals.

Not only can you deploy robots in a matter of days or weeks, not months, with no coding required, but
those robots will immediately deliver accurate, timely, high-value content from the web with near-real-time
monitoring of large volumes of information and precise web data extraction.

In short, web data extraction solves problems that were previously unsolvable.

Versatility Meets Utility: How Web Data Extraction is Used
Web Data Extraction replaces traditional methods of data collection and is ideal for:

Research and consultancy firms that extract and analyze large amounts of information from the web
Examples: Investment research; government research

Screening and risk management services
Examples: Tenant screening; employment screening; security clearance checks; criminal background checks

Competitive price monitoring and business intelligence
Examples: Internet retailer price monitoring and unprecedented visibility into online channels and customers

Disruptive online service models
Examples: Personal financial management services that aggregate bank, credit and loan account information; bank loan or insurance providers that deliver approval in minutes

REAL WORLD RESULTS: WEB DATA EXTRACTION SUCCESS STORIES
Global Research and Consultancy Company Frees Analysts from Time-Consuming Manual Data Collection

The Problem
The company’s unique oil and gas offering, the North America Well Analysis Tool, needed 30-40 data points per oil well on tens of thousands of wells, with more than a thousand new wells being drilled every week. This volume and detail of information was impossible to collect manually.

The Solution
The company deployed Kofax Kapow to automate the collection and integration of data, freeing analysts to focus on data analysis and research work.

Outcome
• Reduction of time spent on data collection: 95%
• Data set forming the core of the Well Analysis Tool: Over 16 billion data points
• Structured and standardized approach to data collection results in better tracking and an audit trail for copyright and usage conditions

Bonus outcome:
Exponentially less time spent on data collection frees analysts to provide higher-quality services to clients and make smarter strategic decisions with profitable results.

Global Financial Services Company Delivers Faster, Sharper Investment Insights to Clients

The Problem
Providing trusted economic forecasts, investment strategy and insights to global corporate, institutional and private wealth clients means having a comprehensive take on any given topic. Gathering web-based research by hand and dealing with translations from local languages is time-consuming and error-prone.

The Solution
The company deployed Kofax Kapow to automate and streamline the extraction, transformation and delivery of web-based content in both structured and unstructured formats.

Outcome
• Report turnaround time reduced from several weeks to one week
• Easily adapts to local languages during the data and content transformation process to produce an accurate picture of global markets
• Larger, richer data sets delivered efficiently allow the company to scale without losing quality, flexibility and control over its data recommendations

Bonus outcome:
Timely, holistic insights empower clients to make the most of new opportunities and run more competitive and profitable businesses.

Farner Consulting AG Creates the First Fully Automated Solution for Monitoring Political Issues

The Problem
Monitoring political issues and events at all levels of government in Switzerland required gathering information from dozens of websites. The time-consuming manual process created long cycle times and higher fees for reports, which affected Farner’s competitiveness.

The Solution
Farner Consulting deployed Kofax Kapow to automate and streamline the extraction, transformation and delivery of web-based content to a central database. A user-friendly front-end interface allows consultants to easily search for and filter information.

Outcome
• Extraction, transformation and delivery of data accelerated by 90 percent
• Created IssueManager, the first fully automated software as a service (SaaS) offering for monitoring political issues in Switzerland
• Clients and consultants are freed from repetitive information management work to perform value-added work

Bonus outcome:
With the potential to transform Farner’s political consulting business model, IssueManager is taking the business in a new direction with cutting-edge, diversified offerings.

Spotcap Revolutionizes Lending with Flexible Financing for Small and Medium-Sized Businesses

The Problem
Small and medium businesses looking for lines of credit aren’t attractive to traditional lenders, often because conducting customer due diligence and processing conventional loan applications is expensive for lenders and slow for SMBs.

The Solution
Spotcap knew if they could make the loan process incredibly fast and efficient, they could revolutionize the SMB lending landscape. Spotcap uses Kapow robots and supporting application program interfaces (APIs) to automatically extract thousands of data points from a wealth of sources, including customers’ accounting software, company registers, tax authority records, credit databases, e-commerce websites and more. Kapow then transforms and integrates the data so it can be used by Spotcap’s credit assessment algorithm.

Outcome
• A fully automated, end-to-end lending process resulting in a loan offer within minutes
• Organizational growth of more than 300 percent from 2015-2016
• Ranks among the top 30 fintech companies in Europe and top three in Germany
• Achieved a full ROI with Kapow in under a year

Bonus outcome:
By making it easier for smaller companies to access the financing they need, Spotcap empowers small and medium businesses, the “backbone of the economy,” to take their companies to new heights and strengthen the economy as a whole.

Is Web Data Extraction Right for You?
Organizations that benefit most from the automation of web data collection and integration:

Have a complex or time-consuming extraction process for web-based data that won’t scale with their current solution

Use large amounts of data to gain insight, create a competitive edge, ensure compliance, or deliver services

Want to move from slow, manual monitoring to proactive, near-real-time intelligence from a larger, scalable, more accurate data set

The View from Leading Analysts
Forrester
“Kofax acquired Kapow in 2013 for its data integration smarts but soon found the real jewels: a robotic
engine that drives web APIs for use cases that must gather and process data from internal and external sites.”

— The Forrester Wave™: Robotic Process Automation, Q1 2017

Celent
“Kofax Kapow™ is particularly strong in deploying web bots, serves multiple industries such as banking and
finance, logistics manufacturing, healthcare, and retail and travel, and has developed solutions for compliance
monitoring and reporting for banks’ KYC-AML operations.”

— Innovation in Compliance Technology: Emerging Themes and Vendor Solutions, July 2017

Aragon Research
“Kapow robots can be implemented without complex coding or lengthy development cycles, which
dramatically expedites project deployments and speeds ROI. Instead of requiring a costly virtual desktop
infrastructure (VDI) like other RPA vendors, Kapow minimizes its VDI footprint by providing a centralized server
model that allows web and mainframe robots to execute without ever connecting to a virtual desktop.”

— Hot Vendors in Robotic Process Automation, Sept. 2017

How Automated Web Data Extraction Transforms Your Business
Employees/Operations

Before:
• Manual validation
• Incomplete and inaccurate data
• Rules applied unevenly
• Majority of employees’ time spent collecting, not analyzing, information
• Questionable audit trail
• Not easily scalable

After:
• Automatic validation
• Complete and accurate data
• Rules applied systematically
• Employees free to analyze information and unlock valuable insights
• Complete audit trail
• Easily and quickly scalable

Customers

Before:
• Burdened with outdated or incomplete information

After:
• Empowered with timely, holistic information to make good business decisions

Business

Before:
• Struggle to stay competitive
• Increased operational costs

After:
• Significant competitive edge
• Decreased cost of ownership and increased ROI
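
One way to picture "rules applied systematically" together with a "complete audit trail" is a small validation step that runs the same checks on every extracted record and logs each result. The following is a minimal sketch in Python under stated assumptions: the record fields `price` and `source_url`, the rule names, and the `validate` function are hypothetical and are not taken from Kofax Kapow or any other product.

```python
from datetime import datetime, timezone

# Hypothetical validation rules; each one is applied to every record.
RULES = {
    "price_positive": lambda r: isinstance(r.get("price"), (int, float)) and r["price"] > 0,
    "source_recorded": lambda r: bool(r.get("source_url")),
}


def validate(records):
    """Apply every rule to every record and keep an audit entry for each one."""
    accepted = []
    audit_trail = []
    for index, record in enumerate(records):
        failed = [name for name, rule in RULES.items() if not rule(record)]
        audit_trail.append({
            "record": index,
            "checked_at": datetime.now(timezone.utc).isoformat(),
            "failed_rules": failed,
        })
        if not failed:
            accepted.append(record)
    return accepted, audit_trail


# Illustrative input: one clean record and one that fails both rules.
records = [
    {"price": 19.99, "source_url": "https://example.com/a"},
    {"price": -1, "source_url": ""},
]
accepted, audit_trail = validate(records)
for entry in audit_trail:
    print(entry["record"], entry["failed_rules"])
```

Because the rules live in one place and every record passes through them, no record is checked more leniently than another, and the audit trail records what was checked and when.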

Four Must-Haves for Your RPA Solution
If you’ve decided to investigate robotic process automation for automating web data extraction,
consider a solution that:

Can extract and deliver data from multiple sources, including websites, web portals and web
apps, as well as internal and external applications

Transforms unstructured web data without compromising quality, accuracy or completeness

Automates all aspects of the data extraction and integration process, including real-time
changes on complex dynamic websites

Can securely scale without the need for complex and costly virtual desktops and browsers

“Service providers that leverage automation in their services portfolio have shown
that they can increase value to their existing customers and differentiate themselves
to new customers in a crowded marketplace. When providers expedite manually
intensive processes, they are able to broaden their offerings and grow their client base.”

—Institute for Robotic Process Automation & Artificial Intelligence 2

² Institute for Robotic Process Automation & Artificial Intelligence

Additional Resources

Learn more about how smart software robots can automate and scale the acquisition, transformation and delivery of
web data in your organization:

Video: Gain a Competitive Advantage with Data Integration and Robotic Process Automation

eBook: Choose Your Future: A Guide to Rethinking Web Scraping

eBook: The ABCs of Automating CDD, KYC and AML

White Paper: Ten Must-Haves for Web Data Extraction and Transformation

White Paper: Integrating Data Sources is an Expensive Challenge for the Financial Services Sector

Infographic: Banking on Precise Data for Investment Research

For more information, ask for a demo of
Kapow Robotic Process Automation from Kofax.

POWER YOUR PROCESSES. EMPOWER YOUR CUSTOMERS.

Power web data extraction with the Kofax Kapow Robotic Process Automation Platform. For more information, contact us at info@kofax.com or give us a call at +1 949.783.1333.
kofax.com/rpa

© 2018 Kofax. Kofax and the Kofax logo are trademarks of Kofax, registered in the United States and/or other countries. All other trademarks are the property of their respective owners.
