Speaker Bio
Information Management practice consultant for PwC Canada (PricewaterhouseCoopers LLP), specializing in Agile Analytics and Data Quality
20 years' experience in IS/IT, in both full-time and external consulting capacities, with a focus on Data Management over the past 7 years
Have worked for very large Fortune 500 organizations as well as small start-ups and mid-sized companies across numerous industries
Education:
Honours Bachelor of Mathematics in Computer Science from the University of Waterloo
Certified Data Management Professional (Mastery Level)
Advised a Wall Street equity analyst on a technology company prior to its IPO in 2010
Neil Hepburn
Presentation Roadmap
A brief history of QlikTech
How Traditional BI works
How QlikView works
What is Agile BI/Agile Analytics?
Demo of QlikView
How QlikView works internally
Criticism of QlikView
Novel Features of QlikView
QlikView Competition
QlikView blindspots
What are the implications for OLAP and the data warehouse?
Founded in Lund, Sweden, in 1993 by Björn Berg and Staffan Gestrelius, originally as a consultancy
Traditional OLAP/cube technologies primarily provide the ability to drill up and down through dimension hierarchies, allowing the end user to see pre-aggregated measures
Dimensions and measures must be known a priori
A small team is usually required to complete a BI project
A data warehouse or data mart is usually required as a prerequisite before OLAP cubes can be built
This can often lie on the critical path of other data warehouse projects. Since data warehouse usage cannot be anticipated, insisting on a single version of the truth can bog down development
ETL is very slow to test, which in turn slows down development
If a detailed drill-down report is needed (e.g. to see all point-of-sale records), a drill-through query is issued against the operational data store (ODS) to retrieve the detail records
This introduces another point of failure
Associations between dimensions are not computed; only the resulting measures (e.g. counts) are
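To make the contrast concrete, here is a minimal Python sketch of the cube approach, assuming a tiny hypothetical point-of-sale table and a single region > city hierarchy. The measure is summed at every level when the cube is built, so drilling up or down is just a lookup. The product column and the individual rows are not kept, so a question such as "which cities sold Gadgets?" would need a drill-through back to the source. `FACTS`, `build_cube` and `drill_down` are illustrative names, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical point-of-sale facts: (region, city, product, units sold).
FACTS = [
    ("East", "Toronto", "Widget", 5),
    ("East", "Toronto", "Gadget", 2),
    ("East", "Ottawa", "Widget", 3),
    ("West", "Vancouver", "Gadget", 7),
]


def build_cube(facts):
    """Pre-aggregate the units measure at each level of a region > city hierarchy.

    The hierarchy and measure are chosen up front; product and the detail rows
    are discarded, so they cannot be queried from the cube afterwards.
    """
    cube = defaultdict(int)
    for region, city, _product, units in facts:
        cube[(region,)] += units        # top level of the hierarchy
        cube[(region, city)] += units   # one level down
    return dict(cube)


def drill_down(cube, path):
    """Return the pre-aggregated cells one level below `path` (() is the top)."""
    depth = len(path)
    return {key: value for key, value in cube.items()
            if len(key) == depth + 1 and key[:depth] == path}


cube = build_cube(FACTS)
print(drill_down(cube, ()))          # totals per region
print(drill_down(cube, ("East",)))   # totals per city within East
```

Every answer the cube can give was decided when it was built; anything outside the chosen hierarchy means changing the cube or going back to the detail data.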
The secret sauce: an experienced QlikView developer can build and test a dashboard solution (including user acceptance testing) faster than with any other BI tool I have evaluated
This makes Agile BI possible
Users and developers can remain focused on insights and outcomes
The resulting dashboards are effectively by-products of the analysis process
A more flexible data model allows normalized data to be imported with fewer transformations
ETL development is in-memory, so ETL jobs can be tested orders of magnitude faster than with traditional ETL tools
All data is automatically profiled on import
QlikView uses the word "associative" to distinguish itself from other BI vendors
"Associative" is a tricky concept to explain, but most people get it when they see it
Associative puts emphasis on understanding how sets of data relate to one another
All those tricky SQL queries involving NOT EXISTS or LEFT/RIGHT OUTER JOIN are but a mouse click away
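The following Python sketch illustrates the associative idea under simple assumptions: two hypothetical tables linked by the shared field name `CustomerID`, and a selection made on `Product`. For a selection it splits customers into associated ("possible") and excluded values. The excluded list is what would otherwise need a `NOT EXISTS` query; in QlikView it is simply the set of greyed-out values. `customer_states` is a hypothetical helper showing the concept, not QlikView's implementation.

```python
# Hypothetical tables; they associate through the shared field name "CustomerID".
CUSTOMERS = [
    {"CustomerID": 1, "Customer": "Acme"},
    {"CustomerID": 2, "Customer": "Globex"},
    {"CustomerID": 3, "Customer": "Initech"},
]
ORDERS = [
    {"OrderID": 10, "CustomerID": 1, "Product": "Widget"},
    {"OrderID": 11, "CustomerID": 1, "Product": "Gadget"},
    {"OrderID": 12, "CustomerID": 2, "Product": "Widget"},
]


def customer_states(selected_products):
    """Split customers into (possible, excluded) for a selection of products.

    possible: customers with at least one order for a selected product.
    excluded: all other customers, i.e. what this SQL would return:
        SELECT c.Customer FROM Customers c
        WHERE NOT EXISTS (SELECT 1 FROM Orders o
                          WHERE o.CustomerID = c.CustomerID
                            AND o.Product IN (<selected products>))
    """
    linked_ids = {o["CustomerID"] for o in ORDERS if o["Product"] in selected_products}
    possible = sorted(c["Customer"] for c in CUSTOMERS if c["CustomerID"] in linked_ids)
    excluded = sorted(c["Customer"] for c in CUSTOMERS if c["CustomerID"] not in linked_ids)
    return possible, excluded


print(customer_states({"Gadget"}))   # (['Acme'], ['Globex', 'Initech'])
```

Changing the selection recomputes both lists at once, which is why the "who did NOT buy" question costs the user no more effort than the "who did buy" question.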
Traditional BI workflow
QlikView workflow
QlikTech does not reveal the specifics of QlikView's inner workings. However, the following give us some clues:
From Curt Monash's DBMS2 blog: "The main ingredient of the performance secret sauce in QlikView is that selections are compiled straight into machine code. (QlikTech gave me the impression that this post is the first time that will be publicly revealed.)"
We can also look at their main patent, which lists Håkan Wolgé as the inventor. This is the first part of their first and most important claim. Note the final multi-dimensional cube
At the centre of QlikView is a large Multi-Dimensional Cube Table, with one column for each table and each row containing pointers back to the original table's row index
QlikView also uses a Global Symbol Table, Value Tables, and Data Tables
The "machine code" most likely refers to bitmap indexes. QlikView relies heavily on bitmap indexes to perform its JOINs
QlikView may have the best known solution to Kimball's Big JOIN problem (JOINing a billion dimension rows with a trillion fact rows), since each row is effectively represented by a single bit
Consider that 64 rows can be JOINed with a single 64-bit AND, in less than a clock cycle
Intel and AMD now support Advanced Vector Extensions (AVX), which will allow 256 rows to be JOINed in less than a clock cycle
It is unclear whether this architecture lends itself to map/reduce
The embedded example shows in detail how the indexes work
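Since QlikTech does not publish its internals, the Python sketch below only illustrates the general technique suspected here: dictionary-encoded columns (a symbol table of distinct values plus a per-row index into it) and bitmap indexes combined with bitwise AND. A selection on a field becomes a set of codes, a bitmap marks the matching rows, and selections on different fields are intersected with one AND. Python's arbitrary-precision integers stand in for arrays of 64-bit words; `encode`, `bitmap`, `rows_of` and `and_ops_needed` are hypothetical helpers, and none of this is confirmed to be QlikView's design.

```python
# Hypothetical columns of a five-row fact table.
REGION = ["East", "West", "East", "East", "West"]
PRODUCT = ["Widget", "Widget", "Gadget", "Widget", "Gadget"]


def encode(values):
    """Dictionary-encode a column: sorted distinct values (a symbol table)
    plus, for each row, the index of its value in that table."""
    symbols = sorted(set(values))
    position = {value: i for i, value in enumerate(symbols)}
    return symbols, [position[value] for value in values]


def bitmap(codes, wanted_codes):
    """Bitmap index: bit r is set when row r's code is one of wanted_codes."""
    bits = 0
    for row, code in enumerate(codes):
        if code in wanted_codes:
            bits |= 1 << row
    return bits


def rows_of(bits, n_rows):
    """Decode a bitmap back into row numbers."""
    return [row for row in range(n_rows) if (bits >> row) & 1]


def and_ops_needed(n_rows, register_bits=64):
    """Register-wide AND instructions needed to combine two bitmaps of n_rows bits."""
    return -(-n_rows // register_bits)  # ceiling division


region_symbols, region_codes = encode(REGION)
product_symbols, product_codes = encode(PRODUCT)

east = bitmap(region_codes, {region_symbols.index("East")})
widget = bitmap(product_codes, {product_symbols.index("Widget")})
selected = east & widget    # one bitwise AND intersects both selections
print(rows_of(selected, len(REGION)))   # [0, 3]
```

With one bit per row, combining two selections over 1,000,000 rows takes `and_ops_needed(1_000_000)` = 15,625 64-bit ANDs, or 3,907 with 256-bit AVX registers (`register_bits=256`), which is the arithmetic behind the 64-row and 256-row figures.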
Criticism of QlikView
QlikView is the biggest threat to established BI vendors. Not surprisingly, there is more criticism directed at QlikView than at any other product.
Some criticism is valid, but most of it either misunderstands the product or distorts the truth
Criticism #1: You can't fit very much data in memory
This used to be true. When 32-bit OSes were the norm, the upper limit was 20 GB of uncompressed data
Now I can buy an HP Integrity Superdome 2 with 4 TB RAM and load 40 TB of uncompressed data, about a year's worth of call-detail-record data for a major Canadian telecom
Criticism #2: QlikView forces you to rename foreign and/or primary key columns to be the same
This is true; QlikView relies on Natural Joins. This is what you want, as it leads to a more intuitive [or "Natural"] user experience. The effort to rename columns is negligible
Criticism #3: QlikView stores data in proprietary files
QlikView now supports an open QVX format with a published spec and SDK
Third-party tools (e.g. Expressor, an ETL tool) integrate with the QVX format
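"Natural Joins" here means that tables associate on fields with identical names. A minimal Python sketch of that rule, with hypothetical table and field names: no link exists until `CustID` is renamed to `CustomerID`, which in a QlikView load script would typically be done with an `AS` alias on the field (e.g. `CustID AS CustomerID`). `associations` and `rename_fields` are illustrative helpers, not QlikView functions.

```python
def associations(tables):
    """Map each field name shared by two or more tables to the tables that share it.

    Sketch of name-based (natural) association between tables.
    """
    owners = {}
    for table, fields in tables.items():
        for field in fields:
            owners.setdefault(field, []).append(table)
    return {field: names for field, names in owners.items() if len(names) > 1}


def rename_fields(fields, mapping):
    """Apply a rename mapping (old name -> new name) to a list of field names."""
    return [mapping.get(field, field) for field in fields]


# Hypothetical schema: the foreign key is spelled differently in each table.
tables = {
    "Customers": ["CustomerID", "Customer"],
    "Orders": ["OrderID", "CustID", "Product"],
}
print(associations(tables))   # {} : no shared field name, so no link

tables["Orders"] = rename_fields(tables["Orders"], {"CustID": "CustomerID"})
print(associations(tables))   # {'CustomerID': ['Customers', 'Orders']}
```

The same rule cuts both ways: two unrelated fields that happen to share a name (say, two different `Description` columns) would also be linked, so a rename is occasionally needed to keep tables apart.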
QlikView Competition
Microsoft PowerPivot started as Project Gemini, which was announced 21 months in advance, the furthest out of any MS project
MS has done its best to mimic QlikView's associative experience
Microsoft will now be rolling out Power View as part of SQL Server 2012 Reporting Services (SSRS)
Other vendors have greatly simplified the cube/OLAP approach, and can be considered
somewhat Agile, although they lack the Associative experience. Primarily:
Tableau
TIBCO Spotfire
Many vendors have jumped on the in-memory bandwagon, but ultimately they have just moved their existing cubes into memory, effectively speeding up user interaction while offering nothing new in terms of user experience or development timelines
Some big data analytical DB vendors (e.g. SAP HANA) are feigning competition with QlikView, but none of them gets to the last mile of user experience
QlikView blindspots
Data quality, de-duplication, and fuzzy matching should be treated as operational issues, e.g.
fuzzy matching tables should be maintained operationally
Dashboard schemas will be built in tools like QlikView, as needed.
Star and snowflake schemas are still useful, but should be built on the fly, as needed
The data warehouse should be more or less a time-invariant mirror of the ODS, and should more or less maintain itself
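One possible reading of a "time-invariant mirror of the ODS" is an append-only history: each capture of the ODS adds a new version only for rows that changed, nothing is overwritten, and any past state can be queried as of a date. The Python sketch below works under that assumption, and also assumes captures arrive in date order and ignores deleted rows; `HistoryMirror`, `capture` and `as_of` are hypothetical names.

```python
from bisect import bisect_right
from datetime import date


class HistoryMirror:
    """Append-only copy of ODS rows keyed by primary key.

    Each capture adds a new version only for rows whose content changed;
    nothing is updated in place. Assumes captures arrive in date order and
    does not model deleted rows.
    """

    def __init__(self):
        self._versions = {}  # key -> list of (captured_on, row), oldest first

    def capture(self, captured_on, rows):
        """Record a snapshot of the ODS given as {key: row}."""
        for key, row in rows.items():
            versions = self._versions.setdefault(key, [])
            if not versions or versions[-1][1] != row:
                versions.append((captured_on, row))

    def as_of(self, key, when):
        """Return the row as it stood on `when`, or None if not yet captured."""
        versions = self._versions.get(key, [])
        i = bisect_right([captured_on for captured_on, _ in versions], when)
        return versions[i - 1][1] if i else None


mirror = HistoryMirror()
mirror.capture(date(2012, 1, 1), {1: {"city": "Toronto"}})
mirror.capture(date(2012, 6, 1), {1: {"city": "Ottawa"}})
print(mirror.as_of(1, date(2012, 3, 1)))   # {'city': 'Toronto'}
```

Because capture is purely mechanical (compare, then append), such a store needs no business-specific modelling, which fits the idea of a warehouse that more or less maintains itself while star schemas are derived from it on demand.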