
White Paper

Big Data to Big Value

How Qlik can help you gain value from your Big Data

September 2017

qlik.com

Table of Contents
Executive Summary
Introduction
The growing need for Big Data analytics
How Big Data flows from source to analysis
Utilizing Big Data: Focus on relevance and context
Different methods for different data volumes and complexities
Comparison of different Big Data Access methods
Qlik and Big Data connectivity
Qlik goes the last mile with Big Data

Executive Summary

• Big Data’s promised benefits are not realized until there is a way for business users to
easily analyze data.
• The key to unlocking value lies in presenting only what is relevant and contextual to the
problem at hand.
• Different data volumes and complexities are best met using different methods or a
combination of methods.
• Qlik offers multiple methods and best practices to give customers a significant advantage
in time-to-insight when it comes to analyzing Big Data.

Introduction

There continues to be an incredible amount of interest in the topic of Big Data. It has moved beyond trend status to become simply part of the current IT lexicon. For some organizations, its use has already become an operational reality, providing unprecedented ability to store and analyze large volumes of disparate data that are critical to the organization's competitive success. It has enabled people to identify new opportunities and solve problems they could not solve before.

For other organizations, Big Data is still something that needs to be better understood in terms of its
relevance to a company’s current and future business needs. This paper reviews how data flows from
source to analysis and then discusses how the Qlik data analytics platform can help companies gain the
most leverage from a Big Data implementation by easing access and making Big Data both relevant and
in-context for the organization’s business users.



The growing need for Big Data analytics

Historically, the uses of Big Data focused on Data Scientists running very complex algorithms on
massively parallel computing clusters to solve major challenges in academia, government, and the private
sector. While the need for Data Scientists to solve such complex problems still exists, there is a much
broader need for end users to be able to harness the power of Big Data analytics for a variety of business
issues.

And unlike the algorithmic model, which seeks to find the needle in the haystack by mining through all the available data, business users are more likely to ask ad hoc questions that focus on the slices of the data that relate to them. They want to gain new insights to better address actionable business issues such as:

• How have my product sales performed since we ran the last promotion?
• How effectively is my sales team cross-selling our products?
• Which of my products are NOT selling well? Does this vary by region or sales team?
• Is there a lack of redundancy anywhere within my plant’s supply chain? What happens if a natural
disaster cuts off our primary suppliers?
• Does the service call history for my region indicate any pattern of customer satisfaction or
dissatisfaction?

These types of questions were posed by business users long before the advent of Big Data, but they could not be answered with a high degree of certainty or granularity because key data sets didn't exist or were impractical to access. Business users were unable to combine their intuition with better data to arrive at better decisions.

Now, however, the technology exists to expand the availability of Big Data sources to business users.
Qlik provides both the rapid, flexible analytics on the front end as well as the ability to integrate data from
multiple sources (e.g., Hadoop repositories, data warehouses, departmental databases, and
spreadsheets) in one single, interactive analytics layer.



How Big Data flows from source to analysis

To make an analogy from metal mining, raw ore must be extracted from the earth, transported to plants
which use mechanical and chemical processes to refine the metal, and only then can it be fashioned into
jewelry or other products.

Likewise, data follows a journey from its raw form to delivering business insight:

• Gather. The origin of business-oriented Big Data is typically machine or IoT data (e.g., data
streams, server logs, and RFID logs), transaction data (e.g., website activity, point of sale data
from physical stores), and cloud data (e.g., stock ticker prices, social media feeds). This data is
often unstructured (strings of text or images) or semi-structured (log data with a timestamp, IP
address, and other details). In the common definition of Big Data, this sort of data has high volume
(terabytes to petabytes), high velocity (many terabytes of new data per day), and high variety
(hundreds of different types of servers and applications each creating information in their own
format).
• Initial processing. If cost of storage is the primary concern, the data is often copied to a Hadoop
cluster. The Hadoop Distributed File System (HDFS) is an example of a distributed, scalable, and
portable file system designed to run on commodity hardware. Hadoop jobs such as MapReduce
enable highly parallel data manipulation and aggregation, but this is typically only sufficient as a
first-level interpretation of the raw data. Accelerator tools such as Apache Drill, Spark and Cloudera
Impala provide open source means for external systems, such as Qlik, to better query the data
stored in Hadoop.
• Refinement. Quite often, organizations will also employ an enterprise data warehouse (EDW), which serves as the central repository for structured data that requires analysis. EDWs are designed not just for storage volume but also for robust ETL (extract, transform, load) capabilities; hence they play a complementary role alongside Hadoop clusters. EDWs can extract data directly from the data source, from a SAN (storage area network) or NAS (network attached storage) system, or from Hadoop clusters. Because data in EDWs is structured rather than raw, it is easier to query and represents a higher level of meaning than raw data.
• Analyze. The typical business user needs the flexibility to integrate data from multiple sources while being insulated from the details of where the data comes from or how it is organized. Data modeling must be fast and easily span different data sources. Such an environment not only reduces the burden on IT to keep up with business demands, but also empowers business users to incorporate additional data into their analysis in a timely manner as needed.
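To make the "initial processing" step above concrete, the sketch below applies the map/reduce pattern to a few semi-structured log lines. The log format and field names are invented for illustration; a real Hadoop job would run this same logic in parallel across the cluster rather than in a single Python process.

```python
from collections import Counter

# Hypothetical semi-structured server-log lines: timestamp, IP, status code.
RAW_LOGS = [
    "2017-09-01T10:00:01 192.168.0.5 200",
    "2017-09-01T10:00:02 192.168.0.9 404",
    "2017-09-01T10:00:03 192.168.0.5 200",
]

def map_phase(line):
    """Parse one raw line into a (key, 1) pair -- here, hits per IP."""
    _, ip, _ = line.split()
    return (ip, 1)

def reduce_phase(pairs):
    """Aggregate mapped pairs into per-key totals."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# First-level interpretation of the raw data: hit counts per client IP.
hits_per_ip = reduce_phase(map_phase(line) for line in RAW_LOGS)
```

This kind of first-level aggregate is what accelerator tools such as Hive or Impala then expose to external query engines.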



Utilizing Big Data: Focus on relevance and context

Business users are constantly challenged to efficiently access, filter, and analyze data, and to gain insight from it, without resorting to data analytics solutions that require specialized skills. They need better, easier ways to navigate through the massive amounts of data to find what's relevant to them, and to get answers to their specific business questions so they can make better and quicker decisions.

Qlik is seeing a few common misconceptions about how Big Data fits into the overall analysis needs of
the business user. It is important to understand that:

• The most important data may not be in the Big Data repository. Often, the data from the Big Data repository acts as supporting evidence for a discovery initially made in operational data or even in a spreadsheet. For example, a spreadsheet or small database containing customer satisfaction survey results may be the basis for an analytic inquiry, and the data from a Big Data repository allows the user to correlate a customer's service or support history with their satisfaction scores.
• The data needed for analysis may be scattered in multiple repositories. The process of configuring an enterprise data warehouse may involve not only copying data from an operational data source but also metadata modeling and transformations. Because this can be time consuming or cost prohibitive, some operational sources may remain separate; they don't warrant the cost and effort of loading into the data warehouse.

Two important aspects to consider when working with Big Data are determining the relevance and context of the information.

Relevance: the right information to the right person at the right time

Qlik's approach has always been to understand what business users require from their analysis, rather than to force-feed a solution that might not be appropriate. Access to appropriate data at the right time is more valuable to users than access to all the data, all the time. For example, local bank branch managers may want to understand the sales, customer intelligence, and market dynamics in their branch catchment area, not the entire nationwide branch network. With a simple consideration like this, the conversation moves from one of large data volumes to one of relevance and value.

Sidebar: King.com uses Qlik for Big Data analytics

"Implementing Qlik has cost less than 20% of the alternative solutions. The payback period was just a few months."
– Mats-Olov Eriksson, main architect of the analytics system

Background:
• Worldwide leader in casual social games
• Offers 150 games in 14 languages
• 40 million monthly players
• 2 billion rows of new log data per day

Use case:
• Analyze ROI of marketing campaigns
• Track uptake of new game offers

Technology:
• Logs stored in a 14-node Hadoop cluster
• Batch processing creates KPIs and aggregates in Hive
• Qlik connects via ODBC (open database connectivity) to Hive



Context: what does the Big Data mean in context of other sources of insight?

Qlik’s patented, innovative Associative Engine is designed specifically for interactive, free-form
exploration and analysis so data is naturally surrounded with context. Qlik’s associative experience
means that every piece of data is dynamically associated with every other piece of data, across all data
sources. Qlik also offers powerful on-the-fly calculation and aggregation that instantly updates all
analytics and highlights all associations based on user interactions. For example, a Sales by Region chart
may be surrounded by related visualizations such as a Sales by Product chart or interactive list boxes
that contain contextual information such as date, location, customer, sales history, etc. Any time the user
selects within one chart or list box, every other list box and chart is instantly updated based on the user’s
selections. This unique capability of Qlik makes it incredibly easy for a business user to focus on (for
example) a particular product in a particular geography sold to a particular customer and see only the
data that is relevant to them.

The usefulness of these associations is even more apparent where there might be hundreds or
thousands of products, customers, geographies, etc. Extremely large datasets can be sliced with a few
clicks rather than scrolling through thousands of items. With Qlik, context and relevance go hand in hand
and quickly take what seemed to be a Big Data problem down to something that is quite manageable
without any programming or advanced visualization skills.
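The associative behavior described above can be illustrated with a toy model, using invented sales rows. This is only a sketch of the idea that a selection in one field instantly re-filters every other view; it is not Qlik's actual engine, which indexes associations across sources far more efficiently.

```python
# Toy model of associative selection: rows are dicts, and a selection in
# any field filters every "chart" of the data at once. (Illustrative only.)
SALES = [
    {"region": "North", "product": "Widget", "amount": 120},
    {"region": "North", "product": "Gadget", "amount": 80},
    {"region": "South", "product": "Widget", "amount": 200},
]

def select(rows, **criteria):
    """Return only the rows matching every selected field value."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]

def view(rows, field):
    """A 'chart': total amount grouped by one field, over the selection."""
    totals = {}
    for r in rows:
        totals[r[field]] = totals.get(r[field], 0) + r["amount"]
    return totals

current = select(SALES, region="North")   # user clicks "North" in one chart
by_product = view(current, "product")     # every other view updates to match
```

Every additional selection simply narrows `current`, so all dependent views stay consistent without any per-chart query logic.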

Different methods for different data volumes and complexities

Because Big Data is a relative term and the use cases and infrastructure in every organization are
different, Qlik offers multiple techniques to handle Big Data scenarios:

• In-memory
• Segmentation
• Chaining
• On Demand App Generation
• Other methods

[Figure: Multiple techniques to handle Big Data]

In some cases, one method may be sufficient. Other scenarios may dictate the use of multiple methods working together.

Every situation is different. We put the power in the hands of our customers to decide how they will best manage the inherent tradeoffs between flexibility, user performance, and the typical Big Data characteristics of data volume, variety, and velocity.

This section reviews the different Qlik methods that can be utilized in Big Data scenarios.



In-memory

Because the Qlik Associative Engine is optimized for in-memory speed, compressing data down to 10% of its original size, many Qlik customers find that the inherent capabilities of the product satisfy their Big Data requirements while preserving high performance.

In addition, the amount of memory on standard computer hardware continues to grow in size and
decrease in price. This has enabled Qlik to handle ever-larger volumes of data in memory. For example,
a single 512GB server can handle uncompressed data sets near 4TB in size. Qlik’s compression scheme
means that the more redundancy in the data values, the greater the compression.
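Qlik's exact compression scheme is proprietary, but dictionary (symbol-table) encoding is one standard technique with exactly this property: each distinct value is stored once, and every row keeps only a small integer index, so more redundancy means better compression. A minimal sketch:

```python
def dictionary_encode(values):
    """Encode a column as (symbol_table, indexes). Repeated values are
    stored once; each row keeps only a small integer index."""
    symbols = []
    index_of = {}
    indexes = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(symbols)
            symbols.append(v)
        indexes.append(index_of[v])
    return symbols, indexes

def dictionary_decode(symbols, indexes):
    """Reverse the encoding: look each index back up in the symbol table."""
    return [symbols[i] for i in indexes]

column = ["USA", "USA", "Sweden", "USA", "Sweden"]
symbols, indexes = dictionary_encode(column)   # 2 symbols cover 5 rows
assert dictionary_decode(symbols, indexes) == column
```

A column of millions of rows with only a few hundred distinct values shrinks to a tiny symbol table plus a compact index array.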

[Figure: In-memory Data Flow]

Unlike technologies that simply "support" multi-processor hardware, Qlik is optimized to take full advantage of all its power. It efficiently distributes the number-crunching calculations across all available processor cores, thereby maximizing performance and the hardware investment. In a clustered environment, Qlik apps can be hosted on different servers. For example, an app containing a smaller amount of aggregated data could be run on a server with less memory, while an app with large amounts of detailed data could be configured to run on a larger server, all of this being invisible to the user.

In addition, Qlik can be deployed such that one server runs in the background extracting and transforming large amounts of data while another server runs the user-facing app, free from the added burden of handling back-end tasks. An additional benefit to IT with this multi-tiered architecture is that the transactional data source only has to be accessed once. That data can then be reused in multiple Qlik apps without a fresh extract.

Administrators can also configure Qlik to load only data that is new or
has changed since the last load, thus greatly reducing the bandwidth
required from any data source.
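As an illustrative sketch of such an incremental load, the following keeps a "high-water mark" timestamp and fetches only rows modified after it. The table structure and `fetch_rows` function are hypothetical stand-ins for a real source query; this is the pattern, not Qlik's load-script syntax.

```python
import datetime

def incremental_load(fetch_rows, last_load_time):
    """Fetch only rows newer than the previous load's high-water mark,
    and return the new mark to persist for the next load."""
    new_rows = fetch_rows(last_load_time)
    high_water = max((r["modified"] for r in new_rows), default=last_load_time)
    return new_rows, high_water

# Simulated source table with modification timestamps.
SOURCE = [
    {"id": 1, "modified": datetime.datetime(2017, 9, 1)},
    {"id": 2, "modified": datetime.datetime(2017, 9, 2)},
    {"id": 3, "modified": datetime.datetime(2017, 9, 3)},
]

def fetch_rows(since):
    """Stand-in for a source query with a WHERE modified > :since filter."""
    return [r for r in SOURCE if r["modified"] > since]

delta, mark = incremental_load(fetch_rows, datetime.datetime(2017, 9, 1))
```

Only the delta crosses the network on each reload; the previously loaded data is reused from the stored extract.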



Segmentation

Segmentation is the process of dividing one Qlik application into multiple applications to optimize performance, security, scalability, simplicity, and maintenance. Data can be segmented by region or department, or a user may want to segment a small dashboard or summary app from another app that contains the detailed data. For example, a retail company may have a very large set of data and want to expose analytics (and more importantly insights) in the application to retail analysts across departments, as well as to executives and a few power analysts who do the bulk of the detailed analytics. Segmentation allows the large set of data that would otherwise reside together in one application to be broken up into chunks that serve those different groups. Each group can then utilize its app without incurring the full cost of RAM and CPU required for the full version of the application. Note that segmentation requires very little maintenance or overhead to manage the segmented versions.

Chaining

[Figure: Segmentation & Chaining Data Flow]

Chaining refers to linking (or jumping) from one Qlik application to another while maintaining some sense of "state", that is, the selections the user had made prior to linking. While these are separate Qlik apps, even potentially running on different servers, they can share selection states. For example, a CRM application may include several different customer subject areas, each corresponding to a department within the company. Qlik can be configured to have a dashboard and a comprehensive app covering the overall customer base. These apps are then linked, or chained, to subject-area apps that are specific to each department. Thus, chaining is another method that allows the customer to manage apps that would contain too much data for their hardware to handle as one giant app.

It is important to note that the techniques of segmentation and chaining can also be utilized together, by segmenting multifaceted data views apart into subject-specific views and then chaining these separate views to each other.
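A toy sketch of these two techniques working together, with hypothetical data: one large data set is segmented into per-region apps, and a selection made in one app is carried along when chaining to another. Real Qlik apps share state through document links rather than a Python dict; this only illustrates the flow.

```python
# Segment one large dataset into per-region "apps", then "chain" between
# them while carrying the user's selections along. (Illustrative only.)
DATA = [
    {"region": "North", "product": "Widget", "sales": 10},
    {"region": "North", "product": "Gadget", "sales": 5},
    {"region": "South", "product": "Widget", "sales": 7},
]

def segment_by(rows, field):
    """Split one app's data into smaller per-value apps."""
    apps = {}
    for r in rows:
        apps.setdefault(r[field], []).append(r)
    return apps

def chain(apps, target, selections):
    """Jump to another segmented app, re-applying the shared selections."""
    return [r for r in apps[target]
            if all(r.get(k) == v for k, v in selections.items())]

apps = segment_by(DATA, "region")             # two smaller apps, not one giant one
state = {"product": "Widget"}                 # selection made in the "North" app
rows_in_south = chain(apps, "South", state)   # same state applied in "South"
```

Each segmented app holds only its slice of the data, so no single server has to host the whole set in memory.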



On-demand App Generation

On Demand App Generation (ODAG) is a method that empowers the user to automatically create a
purpose-built analysis app every time they select a slice of a very large data source.

[Figure: On Demand App Generation Data Flow]

The vast majority of users don't want to analyze the entire Big Data source, and many times they don't initially know which "slice" of data they want to analyze in more detail. Thus, what's desired is a method to quickly scan the entire data source for potentially interesting sections that warrant a more detailed analysis. In some cases, this need could be met by using the concepts of chaining and segmentation: a summary app would be chained to other apps that each contain a segment of the data source in more detail. But what if there are too many potential segments to pre-define as apps? What if the user doesn't know what parts of the database they want to analyze? Freeform data discovery means that the user can explore in any direction, and that could mean a new app is needed every time an unexplored area is encountered.

On Demand App Generation can thus be very valuable in scenarios where the user may not know exactly what part of the database they want to analyze in detail. It typically consists of two different apps: initially, the user is given a selection app where they pick from a "shopping list" of particular subsets of data, such as a time period, customer segment, or geography. This selection can then be used to trigger the immediate generation of a purpose-built analysis app that contains only the detailed data related to the selection. The user is then free to explore the selected detailed data in any direction using the in-memory capabilities of Qlik. Since these apps are governed by the standard Qlik Sense security rules, one can control who can access the detailed data versus summary information.

Users now have the freedom to “fail fast” – easily investigating different slices of the data source without
the need to develop a new app each time they want to analyze a different set of data. This also allows the
administrator to give users broad access to a data source of immense size since only the requested slice
of detailed data is actually being managed in-memory at any one time.
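The ODAG flow described above can be sketched as follows. The field names and "app" structure are hypothetical; real ODAG works through Qlik app templates and load-script bindings rather than Python, so this only shows the shape of the pattern.

```python
# Sketch of the ODAG pattern: a selection app's "shopping list" drives the
# generation of a small, purpose-built analysis app holding only the
# requested slice of a very large source. (Illustrative data and fields.)
BIG_SOURCE = [
    {"year": 2016, "segment": "Retail", "amount": 100},
    {"year": 2017, "segment": "Retail", "amount": 150},
    {"year": 2017, "segment": "Wholesale", "amount": 300},
]

def generate_app(source, **shopping_list):
    """Build an 'analysis app' containing only the selected slice."""
    slice_ = [r for r in source
              if all(r[k] == v for k, v in shopping_list.items())]
    return {"selections": shopping_list, "data": slice_}

# The user's picks in the selection app trigger app generation on demand.
app = generate_app(BIG_SOURCE, year=2017, segment="Retail")
```

Because only the slice lands in memory, the user can "fail fast" and generate a different app for the next slice without a developer in the loop.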

Other methods

There are other techniques one could utilize to access Big Data. A variety of partner technologies and tools are available that can be integrated with the Qlik platform. One could also develop a custom analytic app using JavaScript and the same APIs that the On-Demand App Generation apps utilize in the background. Just like the standard ODAG extension that comes with Qlik Sense, user selections would spawn the generation of a filtered data set for analysis via multiple APIs in Qlik Sense or the QMS API/EDX in QlikView. Developing such custom apps will likely require greater technical skills, but it removes any limitations imposed by standard Qlik functionality. For example, one could develop a single UI experience that contains both the selection and analysis apps.



Comparison of different Big Data Access methods

Just as there is not one method to manage Big Data, there is not one best method to access and analyze Big Data sources. Customers should consider their specific user requirements and data sources to decide which method or combination of methods makes the most sense for them.

In-Memory
• Description: Highly compresses data into memory. Methods for data load can extend this even further.
• Applicable situations: The compressed data source fits into server memory; only aggregated or summary data is needed; or only record-level detail over a limited time period is needed.
• Data volumes: Hundreds of millions to billions of rows.

Segmentation & Chaining
• Description: Users move between multiple related segmented apps (e.g., by region).
• Applicable situations: A data source that is too unwieldy to be managed in server memory and can be split into pre-defined segments.
• Data volumes: Hundreds of millions to billions of rows per segmented app.

On Demand App Generation
• Description: User selections spawn the generation of a filtered data set and a purpose-built app for analysis.
• Applicable situations: A data source that is too unwieldy to be managed in server memory and cannot be split into pre-defined segments.
• Data volumes: Billions of rows.

Other methods
• Description: User selections spawn the generation of a filtered data set for analysis via multiple APIs in Qlik Sense or the QMS API/EDX in QlikView; partner technology; or another custom solution.
• Applicable situations: A custom UI is needed, or the technology used to access Big Data sources requires custom development.
• Data volumes: Billions of rows.

Qlik and Big Data connectivity

Qlik is designed as an open platform and comes with a number of built-in and third-party connectivity
options for Big Data repositories.

• ODBC Connectivity. Qlik's out-of-the-box ODBC connectivity includes drivers for Apache Hive, Cloudera Impala, and other software. Additional Big Data tools can be accessed using the vendor's ODBC connector. For example, Micro Focus provides an ODBC driver for Vertica, their Big Data analytics platform.
• Data-source specific connectivity. Qlik has partnered with multiple vendors to be certified on the vendor-provided ODBC driver. For example, MapR has certified Qlik for Apache Drill, and Qlik received SAP certification for the HANA ODBC driver.
• Partner-developed connectivity. A number of Qlik partners have developed connectors that are
designed to work with specific data sources or applications where Qlik does not already offer
connectivity. This growing list of partner-developed connectors can be found at market.qlik.com.
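As a sketch of what ODBC connectivity to a Hive cluster can look like from the client side, the code below only assembles a connection string and a query. The driver name, host, and table are invented for illustration; with a real driver installed, one would pass the string to an ODBC library such as pyodbc and execute the query through a cursor.

```python
# Hypothetical ODBC connection-string builder for HiveServer2. The keyword
# names (Driver, Host, Schema, ...) vary by driver vendor; consult the
# driver's documentation for the exact keys it expects.
def hive_conn_str(host, port=10000, database="default"):
    return (f"Driver=Hive;Host={host};Port={port};"
            f"Schema={database};HiveServerType=2")

conn_str = hive_conn_str("hadoop-edge.example.com")
query = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# With a driver and pyodbc available (not run here):
# import pyodbc
# with pyodbc.connect(conn_str) as conn:
#     rows = conn.cursor().execute(query).fetchall()
```

The same pattern applies to Impala, Drill, or Vertica: only the driver name and a few keywords in the connection string change.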



Qlik goes the last mile with Big Data

One of the big challenges in telecom is the “last mile” — bringing the telephone, cable, or Internet service
to its end point in the home. It is expensive for the service provider to fan out the network from the trunk
or backbone – to roll out trucks, dig trenches, and install lines. As a result, in some cases telecom
providers pass high installation costs down to the customer — or neglect to go the last mile at all.

There is a “last mile” problem in Big Data, too. Today, most technology providers working on the problems
of Big Data are focused on processing the data — they are focused on the backbone, to use the telecom
analogy (or the plant, in the ore mining analogy). But the last mile is where Qlik is focused. Qlik’s mission
is to simplify decisions for everyone, everywhere, by empowering them to see the whole story that lives
within their data.

Qlik already does Big Data, and it does it well. Many customers have successfully used Qlik to increase the value of their investment in Big Data technology by ensuring that it isn't restricted to only a few data scientists. Instead, Qlik empowers every user to access and collaborate on Big Data in combination with traditional data sources, and then to use the powerful Qlik associative experience to gain new insight.



150 N. Radnor Chester Road
Suite E120
Radnor, PA 19087
Phone: +1 (888) 828-9768
Fax: +1 (610) 975-5987

qlik.com
© 2017 QlikTech International AB. All rights reserved. Qlik®, Qlik Sense®, QlikView®, QlikTech®, Qlik Cloud®, Qlik DataMarket®, Qlik Analytics
Platform®, Qlik NPrinting™, Qlik Connectors™, Qlik GeoAnalytics®, and the QlikTech logos are trademarks of QlikTech International AB which
have been registered in multiple countries. Other marks and logos mentioned herein are trademarks or registered trademarks of their respective
owners.
