THE LEADING PUBLICATION FOR BUSINESS INTELLIGENCE AND DATA WAREHOUSING PROFESSIONALS
BI-based Organizations
Hugh J. Watson
Barry Devlin
Linda L. Briggs
Dashboard Platforms
Alexander Chiang
BI Training Solutions:
As Close as Your Conference Room
It's just that easy. Your location, our instructors, your team.
Contact Yvonne Baho at 978.582.7105 or
ybaho@tdwi.org for more information.
www.tdwi.org/onsite
VOLUME 15 NUMBER 2
56 BI Statshots
tdwi.org
EDITORIAL BOARD
Editorial Director
James E. Powell, TDWI
Managing Editor
Jennifer Agee, TDWI
President
Rich Zbylut
Melissa Parrish
Graphic Designer
Rod Gosser
President &
Chief Executive Officer
Neal Vitale
Richard Vitale
Michael J. Valenti
Abraham M. Langer
Christopher M. Coates
Senior Editor
Hugh J. Watson, TDWI Fellow, University of Georgia
Director, TDWI Research
Wayne W. Eckerson, TDWI
Senior Manager, TDWI Research
Philip Russom, TDWI
Associate Editors
David Flood, TDWI Fellow, Novo Nordisk
Mark Frolick, Xavier University
Paul Gray, Claremont Graduate University
Claudia Imhoff, TDWI Fellow, Intelligent Solutions, Inc.
Graeme Shanks, University of Melbourne
James Thomann, TDWI Fellow, DecisionPath Consulting
Barbara Haley Wixom, TDWI Fellow, University of Virginia
Vice President,
Erik A. Lindgren
Information Technology
& Application Development
Vice President,
Attendee Marketing
Carmel McDonagh
Vice President,
Event Operations
David F. Myers
Jeffrey S. Klein
List Rentals: 1105 Media, Inc., offers numerous e-mail, postal, and telemarketing
as other high-tech markets. For more information, please contact our list manager,
Merit Direct, at 914.368.1000 or www.meritdirect.com.
Reprints: For single article reprints (in minimum quantities of 250-500),
e-prints, plaques and posters contact: PARS International, Phone: 212.221.9595,
E-mail: 1105reprints@parsintl.com, www.magreprints.com/QuickQuote.asp
In good economies and bad, the secret to success is to meet your customers' or clients'
needs. Your enterprise has to respond to changing conditions and emerging trends, and
it has to do so quickly. Your organization must be, in a word, agile.
Agile has been used to describe an application development methodology designed
to help IT get more done in less time. We're expanding the meaning of agile to include
the techniques and best practices that will help an organization as a whole be more
responsive to the marketplace, especially as it relates to its business intelligence efforts.
In our cover story, William Sunna and Pankaj Agrawal note that rapid results in active
data warehousing become vital if organizations are to manage and make optimal use of
their data. Their compressed flat-file architecture helps an enterprise develop less costly
solutions and do so faster, which is at the very heart of agile BI.
Sule Balkan and Michael Goul explain how in-database analytics advance predictive
modeling processes. Such technology can significantly reduce cycle times for rebuilding
and redeploying updated models. It will benefit analysts who are under pressure to
develop new models in less time and help enterprises fine-tune their business rules and
react in record time; that is, boost agility.
Barry Devlin notes that businesses need more from IT than just BI. Transaction
processing and social networking must be considered. Devlin points out how agility is
a major driver of operational environment evolution, and how the need for agility in
the face of change is driving the need for a new architecture. Alexander Chiang looks
at dashboard platforms (the technologies, business challenges, and solutions) and how
rapid deployment of agile dashboard development reduces costs and puts dashboards
into the hands of users quickly.
Also in this issue, senior editor Hugh J. Watson looks at enterprises that have immersed
BI in the business environment, where work processes and BI intermingle and are
highly interdependent. Mukund Deshpande and Avik Sarkar explain how sentiment
data (opinions, emotions, and evaluations) can be mined and assessed as part of your
overall business intelligence. In our Experts' Perspective column, Jonathan G. Geiger,
Arkady Maydanchik, and Philip Russom suggest best practices for correcting data
quality issues.
We're always interested in your comments about our publication and specific articles
you've enjoyed. Please send your comments to jpowell@1105media.com. I promise to
be agile in my reply.
BI-BASED ORGANIZATIONS
BI-based Organizations
Hugh J. Watson
A growing number of companies are becoming BI-based.
For these firms, business intelligence is not just nice
to have; rather, it is a necessity for competing in the
marketplace. These firms literally cannot survive without
BI (Wixom and Watson, 2010).
Hugh J. Watson is Professor of MIS and
C. Herman and Mary Virginia Terry Chair
of Business Administration in the Terry
College of Business at the University
of Georgia. hwatson@terry.uga.edu
In BI-based organizations, BI is immersed in the business environment. Work processes and BI intermingle,
are highly interdependent, and influence one another.
Business intelligence changes the way people work as
individuals, in groups, and in the enterprise. People
perform their work following business processes that
have BI embedded in them. Business intelligence extends
beyond organizational boundaries and is used to connect
and inform suppliers and customers.
Web analytics
An Analytical Culture
The bank we mentioned had 12 marketing specialists
prior to implementing its customer intimacy strategy, and
had 12 different people afterwards. All of the original
dozen employees had moved to other positions or left the
bank. The bank's CEO said their idea of marketing was
handing out balloons and suckers at the teller line and
running focus groups. The new marketing jobs were very
analytical, and the previous people couldn't or didn't
want to do that kind of work.
At Harrah's Entertainment, decisions used to be made
based on "Harrahisms": pieces of conventional wisdom
that were believed to be true (Watson and Volonino,
2002). As Harrah's moved to fact-based decision making,
these Harrahisms were replaced by analyses and tests of
what worked best. Using this strategy, Harrah's evolved
from a blue-collar casino into the industry leader.
In the short run, a company either has an analytical
culture or it doesn't. Change needs to originate at the
top, and it may require replacing people who don't have
analytical skills.
A Comprehensive Data Infrastructure
A company's BI efforts cannot be any better than the
available data. That is why so much time and effort is
devoted to building data marts and warehouses, enhancing data quality, and putting data governance in place.
Once these exist, however, it is relatively easy to realize
the benefits of BI.
Continental Airlines has a comprehensive data
warehouse that includes marketing, revenue, operations, flight and crew data, and more (Watson et al.,
2006). Because the data is in place, and the BI team
and business users are familiar with the data and have
the ability to build applications, new applications can
be developed in days rather than months, allowing
Continental to be very agile.
Talented BI Professionals
BI groups need a mix of technical and business skills.
Although good technical talent is a must, an enterprise
must have people who can work effectively with users.
Conclusion
My list of drivers and requirements for a BI-based organization is not all-inclusive, but if you get these things right,
you are well on your way to creating a successful BI-based
organization.
References
Cooper, B.L., H.J. Watson, B.H. Wixom, and D.L. Goodhue [2000]. "Data Warehousing Supports Corporate Strategy at First American Corporation," MIS Quarterly, December, pp. 547-567.
El Sawy, O.A. [2003]. "The IS Core - The 3 Faces of IS Identity: Connection, Immersion, and Fusion," Communications of the Association for Information Systems, Vol. 12, pp. 588-598.
Watson, H.J., B.H. Wixom, J.A. Hoffer, R. Anderson-Lehman, and A.M. Reynolds [2006]. "Real-time Business Intelligence: Best Practices at Continental Airlines," Information Systems Management, Winter, pp. 7-18.
Watson, H.J., and L. Volonino [2002]. "Customer Relationship Management at Harrah's Entertainment," Decision-Making Support Systems: Achievements and Challenges for the Decade, Forgionne, G.A., J.N.D. Gupta, and M. Mora (eds.), Idea Group Publishing.
Wixom, B.H., and H.J. Watson [2010]. "The BI-Based Organization," International Journal of Business Intelligence Research, January-March, pp. 13-25.
BEYOND BI
Beyond Business Intelligence
Barry Devlin
Abstract
[Figure labels: reports; end user workstation; data marts; metadata; data interface; data dictionary and business process definitions; local data; operational systems]
[Figure labels: Personal Action Domain (deferred, immediate, active, thoughtful, inventive); Business Function Assembly (workflow, activity; creative, conditioning, analytical, decisional); Business Information Resource (uncertified, unstructured, structured, in-flight)]
[Figure: axes are reliance/usage, knowledge density, and timeliness/consistency; labels include global, enterprise, local, personal, vague, multiplex, compound, derived, atomic, in-flight, live, stable, reconciled, historical]
Conclusions
References
PREDICTIVE MODELING
Advances in Predictive Modeling: How In-Database Analytics Will Evolve to Change the Game
Sule Balkan and Michael Goul
Abstract
Introduction
[Figure: SEMMA phases]
SAMPLE: input data, sampling, data partition
EXPLORE: ranks, plots, variable selection
MODIFY: transform variables, filter outliers, missing-value imputation
MODEL: regression, tree, neural network
ASSESS: assessment, score, report
Process
Benefits
Eliminate multiple versions of truth and large data set movements to and from analytical tool suites
Figure 2. DEEPER phases guide the deployment, adoption, evaluation, and recalibration of
predictive models.
Conclusion
Intelligence-to-plan: Planning is streamlined; push and pull strategies are feasible; schema design can support planning
Plan-to-implementation: Scores maintained in-database; embedded SQL in HTML can facilitate view deployment; triggers and alerts can be used to guard for exceptions
Implementation-to-use: Stress testing and global rollout follow database/warehouse methodologies and rely on common human and physical resources
Use-to-results: Dashboards can be readily adapted; database/warehouse tables can be used as response aggregators
Results-to-evaluation: Re-examine all created models efficiently in light of response information; embed if-then logic to re-target nonresponders
Evaluation-to-decision: Consider applying different models; allow targeted respondents to rest; use database to provide decision support for deciding to re-target or re-enter the intelligence cycle
Table 2. Generic value streams and areas for innovation with in-database analytics
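Two of the ideas in Table 2, keeping scores in the database and embedding if-then logic to re-target nonresponders, can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a warehouse. The table layout, the linear model coefficients, and the 0.3 threshold are all hypothetical and chosen only for illustration; they are not taken from the article.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id  INTEGER PRIMARY KEY,
        recency_days INTEGER,
        purchases    INTEGER,
        responded    INTEGER,  -- 1 if the customer answered the last campaign
        score        REAL      -- model score maintained in-database
    );
    INSERT INTO customers (customer_id, recency_days, purchases, responded) VALUES
        (1, 10, 8, 1),
        (2, 45, 3, 0),
        (3, 200, 1, 0),
        (4, 5, 12, 0);
""")

# Hypothetical coefficients of a simple linear scoring model.
INTERCEPT, W_RECENCY, W_PURCHASES = 0.2, -0.002, 0.05

# Scoring runs inside the database, so no rows are exported to an analytics tool.
conn.execute(
    "UPDATE customers SET score = ? + ? * recency_days + ? * purchases",
    (INTERCEPT, W_RECENCY, W_PURCHASES),
)

# If-then logic embedded in a view: rest responders, re-target promising nonresponders.
conn.execute("""
    CREATE VIEW next_action AS
    SELECT customer_id,
           score,
           CASE
               WHEN responded = 1 THEN 'rest'
               WHEN score >= 0.3  THEN 'retarget'
               ELSE 'drop'
           END AS action
    FROM customers
""")

actions = dict(
    conn.execute("SELECT customer_id, action FROM next_action ORDER BY customer_id")
)
print(actions)
```

Because the score and the decision rule both live in the database, swapping in a recalibrated model only means rerunning the UPDATE with new coefficients; the view picks up the change without further work.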
BI CASE STUDY
BI Case Study
SaaS Helps HR Firm Better Analyze Sales Pipeline
By Linda L. Briggs
When Tom Svec joined Taleo as marketing operations manager, he
immediately ran up against what he calls "The Beast," a massive, 100-MB-plus sales and marketing report in Microsoft Excel.
Ugly as it was, the monster Excel report, created weekly from Salesforce.com
data, served a critical function in helping with basic sales trend
analysis. Each Monday, data imported from Salesforce.com offered
snapshots of the previous week's patterns to provide guidance on upcoming sales opportunities.
The information was critical to Taleo's sales managers. The publicly traded
company, with 900 employees and just under $200 million in reported
revenue in 2009, provides software-as-a-service (SaaS) solutions for talent
management. Its products are designed to help HR departments attract,
hire, and retain talent; they range from recruiting and performance
management functions to compensation and succession planning tools.
Given Taleo's current needs and projected continuing rapid growth, Svec
says he realized that along with the need for more sales visibility, especially for senior managers, the risks of manipulating such critical data in
Excel had increased to an unacceptable level. He also needed a tool that
could manipulate data and provide information faster than Excel could.
"I needed to look for a scalable solution, a reliable solution, and a low-risk
solution," Svec notes. He thus began a search for a BI tool to help manage
the sales opportunity data, particularly entry and pipeline metrics, for the
demand-generation group as well as for Taleo's sales organization overall.
The tool would need to work with Salesforce.com initially, but eventually
might be used with other data as well. For example, Taleo uses a front-end
marketing automation and demand-generation platform called Eloqua to
execute and measure marketing activity. In time, Svec says, the company
may want to import and manipulate Eloqua data directly in its BI solution.
As the company's only marketing operations expert (with lots of overlap
with the sales operations team as well), Svec needed a complete lifecycle
view of both sales and marketing data. "The demand-generation team and
I are very, very focused on everything from the top of the funnel all the
way through to close of business," he explains. That includes involvement in
Given limited
technology resources,
Svec wanted a quick,
easy implementation
that he could
accomplish without IT
involvement.
"Today our [focus] is Salesforce," Svec says, "but looking down the
road 6, 12, however many months, we wanted something built to
accommodate other data sources."
During a relatively quick six- to eight-week implementation, Svec
worked closely with PivotLink in a collaborative process, "pushing
them a bit," he says, to integrate more deeply with Salesforce. He
was pleased overall with how the integration proceeded, in particular
with the vendor relationship: "I think [PivotLink] was discovering
new things along the way, particu-
Future Plans
To that end, Svec first wants to
boost user interest in and adoption
of PivotLink throughout the company. Taleo's finance department,
for example, is enviously eyeing
the PivotLink-produced reports
coming from Svec's group and is
thus a candidate for adoption.
Second, he envisions incorporating
additional data sources (and this
is where PivotLink's ability to
handle disparate data sources will
be important), thus giving him
Although the return on investment
from making better decisions is
always an elusive measure, Svec says
that PivotLink's pricing model has
proven economical for a company
the size of Taleo. Certainly, issues
such as better risk management
from avoiding the manipulation
AGILE BI
Enabling Agile BI with a Compressed Flat Files Architecture
William Sunna and Pankaj Agrawal
Abstract
Introduction
Large enterprises often find themselves unable to use
their core data effectively to perform BI. This is mainly
due to a lack of agility in their information systems and
the delays required to update their data warehouses with
new information. As business climates change rapidly,
new dimensions, key performance indicators, and derived
facts need to be added quickly to the data warehouse so the
business can stay competitive. In addition, access to historical, low-granularity transaction data is vital for tactical and
strategic decision making.
Traditional data warehouse solutions that use relational
databases and implement complicated models may not be
sufficient to satisfy the agility needs of such BI environments. Introducing new data into a warehouse often
involves relatively long development and testing cycles.
Furthermore, the traditional data warehouse architectures
do not adequately cope with many years of transactional
data while meeting the performance expectations of end
users. Enterprises often settle for summarized data in the
warehouse, but this severely compromises their ability
to perform advanced analytics that require access to vast
amounts of low-level transactional data.
With all of these inconveniences, the need for an agile
solution that can handle these challenges has become
acute. This article presents an innovative architecture that
Case Study
We will use a simple case study to demonstrate the CFF
architecture. Suppose researchers and pricing analysts in
a major retail chain want to study the sales trends and
profitability of the products sold at their stores located in all
50 states. To support their analyses, they need 10 years of
detailed sales transaction data available online.
Let's assume the chain sells more than 30 categories of
products such as automotive and hardware. Each category
contains a wide range of products. For example, the
automotive category contains engine oil, windshield washer
fluid, and wiper blades; each of these products has a unique
product code. Once a day, all the stores send a flat file
containing point-of-sale (POS) transactions to headquarters. In addition to product and geographical information,
the transactions also contain other information such as the
manufacturer code, sales channel, cost of the product, and
sale price.
Assume that most users' analysis is based on the geographic
location, product category, and the accounting month in
which the products are sold. Let's refer to such attributes
as major key attributes. For example, a business analyst
may request a profitability report for a selected number of
products in a given category in Illinois in the first quarter
of 2009.
[Figure components: operational data sources; ETL; compressed flat files (data files); high-performance query engine; query; business analysts]
Figure 1. Overview of CFF architecture
The query engine then reads the data in the relevant files
and applies additional data filters such as the product code.
The next step will be aggregating the measures requested
(sales amount and cost) by product code and presenting
the results to the analyst. The resulting data sets can be
produced in any format, such as comma-separated or
SAS-formatted files. Note that the user interface presented
here is to be used as a data extraction interface, as opposed
to a standard reporting or presentation interface. Standard
BI tools such as MicroStrategy and Business Objects are
also supported by this architecture.
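The extraction flow in this section (pick files by the major key attributes, then filter on product code and aggregate the measures) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file-naming scheme, the column names, and the in-memory `store` dictionary that stands in for the file system are hypothetical, and the article does not describe how the actual query engine is implemented.

```python
import csv
import gzip
import io
from collections import defaultdict

# One file per (state, category, accounting month), as in the case study.
STATES, CATEGORIES, MONTHS = 50, 30, 10 * 12
TOTAL_FILES = STATES * CATEGORIES * MONTHS  # 50 * 30 * 120

def file_name(state, category, month):
    """Hypothetical naming scheme that encodes the major key attributes."""
    return f"{state}_{category}_{month}.csv.gz"

def select_files(state, category, months):
    """Early file selection: only files matching the major keys are touched."""
    return [file_name(state, category, m) for m in months]

def run_query(store, files, product_codes):
    """Read the selected compressed files, filter by product code, and
    aggregate sales amount and cost by product code."""
    totals = defaultdict(lambda: {"sales": 0.0, "cost": 0.0})
    for name in files:
        with gzip.open(io.BytesIO(store[name]), mode="rt", newline="") as fh:
            for row in csv.DictReader(fh):
                if row["product_code"] in product_codes:
                    totals[row["product_code"]]["sales"] += float(row["sales_amount"])
                    totals[row["product_code"]]["cost"] += float(row["cost"])
    return dict(totals)

def make_file(rows):
    """Build one gzip-compressed CSV file in memory (demo data only)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["product_code", "sales_amount", "cost"])
    writer.writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))

# Made-up transactions for Illinois, automotive category, first quarter of 2009.
store = {
    file_name("IL", "AUTO", "2009-01"): make_file(
        [("OIL", "20.00", "12.00"), ("WIPER", "15.00", "9.00")]
    ),
    file_name("IL", "AUTO", "2009-02"): make_file([("OIL", "10.00", "6.00")]),
    file_name("IL", "AUTO", "2009-03"): make_file([("FLUID", "5.00", "2.00")]),
}

files = select_files("IL", "AUTO", ["2009-01", "2009-02", "2009-03"])
report = run_query(store, files, {"OIL", "FLUID"})
print(len(files), "of", TOTAL_FILES, "files read")
print(report)
```

The point of the sketch is the order of operations: the major key attributes narrow the work to three files before any data is decompressed, and the finer filters run only on those files.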
Once the user submits the request, the query details are
passed to the high-performance query engine that is
responsible for extracting data directly from the compressed
flat files. The query engine will first build a list of the
compressed flat files needed for the extraction based on
the major key attributes selected. In our example, only
one category code has been requested for one state during
a three-month period. Therefore, only three compressed
flat files out of the 180,000 total files are needed to satisfy
the request. This early selection of files represents a huge
[Figure components: metadata management module; user login; user interface; security grid; schema files; query control process; CFF; high-performance query engine; requests; configuration repository; results]
Metadata-driven Approach
The CFF architecture is highly metadata-driven to allow for
maximum agility in both the initial build of the application
and any required maintenance in the future. Due to the
simplicity of the data model manifested in the CFF, the
data layouts (schema files) of the CFF are leveraged to
generate the contents of the user interface via the metadata
management module, as shown in Figure 3. Therefore, the
addition of new fields or modifications to existing fields are
reflected in the user interface unit without requiring any
programming effort.
The metadata management module also takes into
consideration the classification of attributes in the data
as specified in the schema files; it distinguishes major key
attributes from other dimensional attributes and measures.
Furthermore, it provides user privileges information to the
interface by consulting the security grid module, which
contains privileges and security rules for data access. The
user interface builds custom data extraction menus for
different users depending on what they are allowed to
query or extract.
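As a rough sketch of the metadata-driven idea, the following Python snippet builds an extraction menu from a schema description and a security grid. The schema format, the role names, and the `build_menu` function are hypothetical and invented for this illustration; the article does not specify the real file layouts or module interfaces.

```python
# Hypothetical schema file contents: each field is classified as a major key
# attribute, another dimensional attribute, or a measure.
SCHEMA = [
    {"name": "state", "role": "major_key"},
    {"name": "category", "role": "major_key"},
    {"name": "accounting_month", "role": "major_key"},
    {"name": "product_code", "role": "dimension"},
    {"name": "manufacturer_code", "role": "dimension"},
    {"name": "sales_channel", "role": "dimension"},
    {"name": "cost", "role": "measure"},
    {"name": "sale_price", "role": "measure"},
]

# Hypothetical security grid: the fields each user role may query or extract.
SECURITY_GRID = {
    "pricing_analyst": {
        "state", "category", "accounting_month",
        "product_code", "cost", "sale_price",
    },
    "researcher": {
        "state", "category", "accounting_month",
        "product_code", "sales_channel", "sale_price",
    },
}

def build_menu(schema, security_grid, user_role):
    """Derive the extraction menu for one user role from metadata alone."""
    allowed = security_grid.get(user_role, set())
    menu = {"major_key": [], "dimension": [], "measure": []}
    for field in schema:
        if field["name"] in allowed:
            menu[field["role"]].append(field["name"])
    return menu

researcher_menu = build_menu(SCHEMA, SECURITY_GRID, "researcher")
print(researcher_menu)

# Adding a field is a metadata change only; build_menu itself is untouched.
extended_schema = SCHEMA + [{"name": "promotion_code", "role": "dimension"}]
extended_grid = {**SECURITY_GRID,
                 "researcher": SECURITY_GRID["researcher"] | {"promotion_code"}}
extended_menu = build_menu(extended_schema, extended_grid, "researcher")
print(extended_menu["dimension"])
```

The second call shows the agility claim in miniature: a new attribute appears in the menu after editing the schema and the grid, with no change to the menu-building code.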
[Figure labels: business table manager; staged raw data; extract, acquire, stage, conform, synchronize, integrate, load EDW, present; conformed operational data source; enterprise data warehouse; presentation; reprocessing; automated balance and audit control; operational source systems; admin services; notification services; reporting and monitoring; data architecture; ETL processes; ETL components; ETL metadata]
[Table columns: Phase; Change Step; Traditional Architecture; CFF. Development phase rows: new attribute, delete attribute, update attributes. Maintenance phase rows: insert attributes, delete attributes, update attributes.]
Summary
According to Forrester Research principal analyst
Boris Evelson, the slightest change in a traditional data
warehouse solution can trigger massive amounts of work
involving changing multiple ETL routines, operational data
store attributes, facts, dimensions, major key performance
indicators, filters, reports, cubes, and dashboards. Such
changes cost time and money. This frustrates IT managers
and business users alike. The need for agile data management has, therefore, become acute. Such solutions should
not be driven by what tools are available but by smart
strategies and architectures.
In response to business needs for agility and lower cost,
we have presented a new but proven data management
architecture, the compressed flat files architecture. We have
demonstrated the simplicity of this architecture and how
it can be used to satisfy business needs in an agile environment. We have shown how this architecture is independent
of any technologies or tools. We also demonstrated how
it allows business users to analyze vast amounts of data at
the most granular level without any loss of detail, a feature
that would be prohibitively expensive to build using a
traditional solution.
BI EXPERTS PERSPECTIVE
BI Experts' Perspective: Pervasive BI
Jonathan G. Geiger, Arkady Maydanchik, and Philip Russom
Kelsey Graham has recently taken over as business intelligence (BI) director
at Omega, a manufacturer of office products. She inherits a BI staff that has
been in place for four years and boasts many accomplishments, including an
enterprise data warehouse, performance dashboards, forecasting models, and
pricing models. There are eight BI professionals on staff; they perform roles and
tasks that vary: planning the BI architecture, developing and maintaining the
warehouse, and developing enterprisewide applications.
Her predecessor didn't make much progress in working with some of the business units to correct the data quality problems originating in the source systems,
and there is limited metadata that informs users about the quality of the data
they are accessing. Kelsey knows that as BI becomes more pervasive, these
data quality issues will demand more attention. She needs to think through what
actions to take.
1. How should Kelsey start a dialogue with senior management about correcting
the data quality problems in the source systems? Her sense is that she needs
senior management's help to get the business units to allocate the necessary
resources to address the problems.
2. What metadata about data quality does Kelsey need to provide to users?
Should she use categorical indicators such as "excellent," "good," "fair," or "poor,"
or specific numerical indicators such as "90 percent accurate"?
3.
Jonathan G. Geiger
Kelsey is dealing with
a BI program that
is perceived to be sufficiently
successful to be widely adopted
but that has some gaps under the
covers. In addition, her team seems
to be oblivious to the data quality
issues. Fortunately, she recognizes
that she needs to address the
deficiencies before providing wider
access to the data. She needs to
address the team's attitude, get a
realistic assessment of the situation,
gain senior management support,
and provide information on the
actual data quality.
Team Attitude
Kelsey's team is proud of its
accomplishments, and probably
with good reason. They have, after
all, implemented a data warehouse
that provides data to its intended
audience, and this information
is used to provide benefits to
the organization. If Kelsey is to
address her data quality concerns,
she must first discuss these
concerns with her team.
Kelsey should speak with her team,
individually and collectively, to discuss the strengths and risks of the
existing environment. If there is
any merit to her suspicions, at least
some of the team members will
mention concerns about data quality. Being careful to give the team
credit for its accomplishments, as
the manager, Kelsey is in a position
to determine which deficiencies
need to be addressed first. If she
feels that the most significant issue
to be addressed prior to widespread
A systemic data
quality assessment
project can be
executed with limited
resources and in a
short time period.
Arkady Maydanchik
Data quality in data warehousing
and BI is a common problem
because the data comes to data
warehouses from numerous source
systems and through numerous
interfaces. Existing source data
problems migrate to the data
Philip Russom
I envy Kelsey. Then again, I don't.
Kelsey's position is strong because
the BI team has an impressive
track record of producing a wide
range of successful BI solutions.
More strength comes from senior
management's direct support of an
expansion of BI solutions to more
employees and partnering companies. Kelsey and team have useful
and exciting work ahead, backed
up by an executive mandate. I envy
them shamelessly.
That's the good news. Here comes
the bad.
Kelsey is contemplating crossing
the line by sticking her nose into
another team's business so she
can tell them their hard work isn't
good enough. Not only is this a
tall hurdle, but Kelsey will face
fearsome opposition on the other
side. Her chances of success are
SENTIMENT ANALYSIS
BI and Sentiment Analysis
Mukund Deshpande and Avik Sarkar
Overview
Over the past two decades, there has been explosive growth
in the volume of information and articles published on the
Internet. With this enormous increase in online content
came the challenge of quickly finding specific information.
Google, AltaVista, MSN, Yahoo, and other search sites
stepped in and developed novel technologies to efficiently
search and harness the massive amount of Internet
information. Some search engines indexed keywords; others
used information hierarchies, arranging Web pages in a
structured way for easy browsing and for quickly locating
requested information. Text classification, also known as
text categorization, and text-clustering-based techniques
advanced, allowing Web pages to be automatically
organized into relevant hierarchies.
Web sites frequently discuss consumer products or
services, from movies and restaurants to hotels and
politics. These shared opinions, termed the "voice of the
customer," have become highly valuable to businesses and
organizations large and small. In fact, a recent study by
Deloitte found that 82 percent of purchase decisions have
been directly influenced by reviews. The rapid spread of
information over the Internet and the heightened impact
of the media have broken down physical and geographical
boundaries and caused organizations to become increasingly cautious about their reputations.
Businesses and market research firms have carried out
traditional sentiment analysis (also referred to as opinion
analysis or reputation analysis) for some time, but it
requires significant resources (travel to a given location;
staffing the survey process; offering survey respondents
incentives; and collecting, aggregating, and analyzing
results). Such analysis is cumbersome, time-consuming,
and costly.
Text Mining
Research and business communities are using text
mining to harness large amounts of unstructured textual
information and transform it into structured information.
Text mining refers to a collection of techniques and
algorithms from multiple domains, such as data mining,
artificial intelligence, natural language processing (NLP),
machine learning, statistics, linguistics, and computational
linguistics. The objective of text mining is to put the
already accumulated data to better use and enhance an
organization's profitability. With a variety of customer
trends and behavior and increasing competition in each
market segment, the better the quality of the intelligence,
the better the chances of increasing profitability.
Sentiment Analysis
Sentiment analysis broadly refers to the identification and
assessment of opinions, emotions, and evaluations, which,
for the purposes of computation, might be defined as
written expressions of subjective mental states.
For example, consider this unstructured English sentence
in the context of a digital camera review:
"Canon PowerShot A540 had good aperture
combined with excellent resolution."
Consider how sentiment analysis breaks down the information. First, the entities of interest are extracted from the
sentence:
[Figure: sentiment analysis stages: fetch/crawl and cleanse; text classification; entity extraction; sentiment extraction; sentiment summary; reports/charts]
Sentiment Assignment
Suppose a sentence mentions digital camera features such
as resolution, usage, and megapixels; the sentence also
mentions a sentiment word, say, "good." Should we relate
all or only some of the features to the sentiment word?
The issue becomes even more challenging when multiple
sentiment words or model names are mentioned in the
same sentence. Simple heuristics, such as assigning the model name or
feature to the nearest occurring sentiment word, achieve limited
but acceptable accuracy. Deep NLP techniques may be used
to identify the model names or features (nouns) that are
related to the sentiment word (adjective or adverb) in the
context of that sentence.
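The nearest-sentiment-word heuristic mentioned above can be sketched as follows. The feature list, the sentiment lexicon, and the tokenizer are small hypothetical stand-ins chosen for the example; a production system would use much larger lexicons, and this sketch does not attempt the deeper NLP analysis the paragraph describes.

```python
import re

# Hypothetical, deliberately tiny lexicons for the digital camera domain.
FEATURES = {"resolution", "aperture", "battery"}
SENTIMENT_WORDS = {"good", "excellent", "poor", "bad"}

def assign_sentiment(sentence):
    """Attach each feature to the sentiment word nearest to it, measured
    in token positions. Ties go to the earlier sentiment word."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    sentiment_positions = [
        (i, tok) for i, tok in enumerate(tokens) if tok in SENTIMENT_WORDS
    ]
    assignments = {}
    for i, tok in enumerate(tokens):
        if tok in FEATURES and sentiment_positions:
            _, nearest = min(sentiment_positions, key=lambda p: abs(p[0] - i))
            assignments[tok] = nearest
    return assignments

review = "Canon PowerShot A540 had good aperture combined with excellent resolution."
result = assign_sentiment(review)
print(result)
```

Distance in tokens is a crude proxy for grammatical relatedness, which is why the accuracy of this approach is limited: in a sentence such as "not good", or one where the sentiment word sits closer to an unrelated feature, the heuristic assigns the wrong pairing.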
Reviews often include comparative comments about
multiple digital camera models within single sentences. For
example:
Examples
Sentiment Analysis of Digital Camera Reviews
There are many Web sites that contain reviews related to
digital cameras. Suppose a consumer is looking to buy a
particular digital camera and would like to get a complete
understanding of the camera's different features, strengths,
and weaknesses. She would then compare this information
to other contemporary digital camera models of the same
or competing brands. This would involve manual research
across all related Web sites, which might require days
or even months of research. Rather than doing this, the
consumer is more likely to gather incomplete information
by visiting just a few sites.
Automated sentiment analysis and BI-based reporting can
come to the rescue by providing a complete overview of
the many discussions about digital camera models and
their features.
First, a list of available digital camera models is collected
from the various companies' catalogs to create a comprehensive taxonomy of digital camera models. An initial list
of digital camera features or dimensions is also collected
from these catalogs. All online discussion pages are
collected from the digital camera review Web sites.
One important consideration during taxonomy creation is
the grouping of synonymous entities. For example, "Canon
PowerShot A540" may also be referred to as "PowerShot
540" or "Canon A540." All of these should be grouped as a
single entity. Similarly, the dimension "camera resolution" may
be referred to as "resolution," "megapixel," or simply "MP";
all should be aligned to the single entity "resolution."
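One simple way to group synonymous entities is a lookup table that maps every known alias to a single canonical name. The sketch below assumes a hand-built alias table taken from the examples in the text; the names ALIASES, LOOKUP, and canonical_entity are hypothetical.

```python
# Hypothetical alias table built from the examples in the text.
ALIASES = {
    "Canon PowerShot A540": ["Canon PowerShot A540", "PowerShot 540", "Canon A540"],
    "resolution": ["camera resolution", "resolution", "megapixel", "MP"],
}

# Invert the table: lowercase alias -> canonical entity name.
LOOKUP = {
    alias.lower(): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}


def canonical_entity(mention):
    """Return the canonical entity for a mention, or None if it is unknown."""
    return LOOKUP.get(mention.strip().lower())


print(canonical_entity("PowerShot 540"))
print(canonical_entity("MP"))
```

Matching is case-insensitive here, which is an assumption; a production taxonomy might also need to handle plurals and spelling variants.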
The presence of the camera model name on a given page
indicates that it should be considered for further analysis.
The next challenge is to extract the entities of interest from
the text, that is, the digital camera model names and
features. A taxonomy-based method is used to extract those
that are known. Machine-learning-based approaches can
extract the others. Here, documents tagged with existing
model names and features are provided as training data to the
machine learning algorithm, which uses the data to
learn the extraction rules. These rules are then used to
extract entities from other incoming articles.
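The taxonomy-based half of this step can be illustrated with a short Python sketch that scans review text for known names. It assumes a small hypothetical taxonomy and uses plain regular-expression matching; it does not attempt the machine-learning approach, which would require tagged training documents.

```python
import re

# Hypothetical taxonomy for illustration: lowercase alias -> canonical entity.
TAXONOMY = {
    "canon powershot a540": "Canon PowerShot A540",
    "powershot 540": "Canon PowerShot A540",
    "canon a540": "Canon PowerShot A540",
    "resolution": "resolution",
    "megapixel": "resolution",
    "mp": "resolution",
    "aperture": "aperture",
}

# Try longer aliases first so multiword names win over shorter overlapping ones.
_ALTERNATIVES = "|".join(
    re.escape(alias) for alias in sorted(TAXONOMY, key=len, reverse=True)
)
_PATTERN = re.compile(r"\b(?:" + _ALTERNATIVES + r")\b", re.IGNORECASE)


def extract_known_entities(text):
    """Return canonical taxonomy entities in order of first mention."""
    found = []
    for match in _PATTERN.finditer(text):
        canonical = TAXONOMY[match.group(0).lower()]
        if canonical not in found:
            found.append(canonical)
    return found


entities = extract_known_entities(
    "The Canon A540 has good aperture and 6 MP resolution."
)
print(entities)
```

Entities that are missing from the taxonomy, such as a newly released model, are silently skipped by this approach, which is why a learned extractor is needed for the rest.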
[Figure: map of U.S. states. Legend labels: Negative, Positive, Obama, McCain]
Final Thoughts
In closing, we would like to spotlight two observations that
highlight the growing need for sentiment analysis:
With the explosion of Web 2.0 platforms such as
blogs, discussion forums, peer-to-peer networks,
and various other types of social media, all of
which continue to proliferate across the Internet
at lightning speed, consumers have at their
disposal a soapbox of unprecedented reach and
power by which to share their brand experiences
and opinions, positive or negative, regarding
any product or service. As major companies are
increasingly coming to realize, these consumer
voices can wield enormous influence in shaping the opinions of other consumers and,
ultimately, their brand loyalties, their purchase
decisions, and their own brand advocacy. Companies can respond to the consumer insights they
generate through social media monitoring and
analysis by modifying their marketing messages,
brand positioning, product development, and
other activities accordingly.
Jeff Zabin and Alex Jefferies [2008]. Social Media
Monitoring and Analysis: Generating Consumer Insights from
Online Conversation, Aberdeen Group Benchmark Report.
Marketers have always needed to monitor media
for information related to their brands, whether
it's for public relations activities, fraud violations,
or competitive intelligence. But fragmenting
media and changing consumer behavior have
crippled traditional monitoring methods.
Technorati estimates that 75,000 new blogs are
created daily, along with 1.2 million new posts
AUTHOR INSTRUCTIONS
Editorial Acceptance
Submissions
tdwi.org/journalsubmissions
Materials should be submitted to:
Jennifer Agee, Managing Editor
E-mail: journal@tdwi.org
DASHBOARD PLATFORMS
Dashboard Platforms
Alexander Chiang
Introduction
This article discusses the importance of a platform-based dashboard solution for business professionals
responsible for developing a digital dashboard. The first
two sections focus on business users and information
workers such as business analysts. The latter sections
speak to technologists, including software developers.
Alexander Chiang is director of consulting
services for Dundas Data Visualization, Inc.
alexanderc@dundas.com
We will take a brief look at the technologies in the context of the BI stack to help readers put the significance
of dashboard platforms into perspective. Next, we
will present the business challenges of dashboards,
followed by an explanation of how these challenges can
be addressed with a dashboard solution that is based on
a platform.
A collaborative workflow
System Integration
In general, organizations have an existing IT infrastructure in place, including corporate Web portals.
Ideally, the chosen dashboard solution should be easy
to integrate within this infrastructure; traditionally,
most dashboard solutions and their respective tools
were standalone desktop applications. It is difficult to
Data source neutrality is important for dashboard vendors. That is, these solutions must connect to multiple
data sources to feed into the dashboards. Although
most dashboard products provide connectivity to
popular databases and analytics packages, the challenge
arises when an organization has to use a homegrown
analytics engine or a more specialized database. For
those businesses investing in complete BI solutions
provided by bigger vendors, this is a non-issue, as they
can leverage their consolidation technologies. For the
mid-market, choosing an end-to-end solution may
not be practical or within the budget. This makes it
important for the dashboard solution to provide a way
to connect to various types of data sources.
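One common way to achieve this kind of data source neutrality is a small connector interface that every source implements, so dashboard logic never depends on where the rows come from. The Python sketch below is a hypothetical illustration and not any vendor's actual API; the class names DataSource, CsvSource, and SqliteSource, the fetch method, and the sales column are all assumptions.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical connector interface; not any vendor's actual API."""

    @abstractmethod
    def fetch(self):
        """Return rows as a list of dicts keyed by column name."""


class CsvSource(DataSource):
    """Reads rows from CSV text, e.g. an export from a homegrown engine."""

    def __init__(self, text):
        self.text = text

    def fetch(self):
        return list(csv.DictReader(io.StringIO(self.text)))


class SqliteSource(DataSource):
    """Reads rows from a SQL query against an open SQLite connection."""

    def __init__(self, connection, query):
        self.connection = connection
        self.query = query

    def fetch(self):
        cursor = self.connection.execute(self.query)
        columns = [description[0] for description in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


def total_sales(sources):
    """A dashboard metric that does not care where its rows come from."""
    return sum(float(row["sales"]) for source in sources for row in source.fetch())


# Usage: combine an in-memory database with CSV text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, sales REAL)")
conn.execute("INSERT INTO orders VALUES ('east', 100.0), ('west', 50.5)")
sources = [
    CsvSource("region,sales\nnorth,25\n"),
    SqliteSource(conn, "SELECT region, sales FROM orders"),
]
print(total_sales(sources))
```

Supporting a more specialized database would then mean writing one new DataSource subclass rather than changing the dashboard itself.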
Specialized Data Visualizations
Choosing a platform that allows third-party visualizations to be integrated into the dashboard design
provides comfort to a company that is unsure of what
types of DVs it will need in the future.
Final Note
A dashboard solution should facilitate accelerated
dashboard production, infuse a sense of collaboration
among personnel involved in development, and provide
an open API to allow for a customized solution. Companies choosing a flexible and customizable dashboard
solution should be looking for these features.
The benefits are apparent and should be realized
immediately. Rapid dashboard development and deployment decrease development costs and get dashboards
into the hands of decision makers more quickly.
Interfaces and workflow designed for specific resources
reduce the learning curve and increase the likelihood of
corporate adoption so the software doesn't just sit on a
shelf. Finally, an open API will allow an organization
to customize a solution specific to its requirements,
lowering the risk of choosing an inappropriate solution
for its immediate and long-term needs. Viewing these
areas as checkboxes during a product evaluation will
help an organization select the right solution.
BI STATSHOTS
BI StatShots
Strategic Value. To test perceptions
of UDM's strategic status, this
report's survey asked respondents to
rate UDM's possible strategic value.
Philip Russom
In your organization, what are the top potential barriers to coordinating multiple data management
practices? (Select six or fewer.)
Corporate culture based on silos: 61%
[Bar chart; remaining bar values in order, labels unavailable: 60%, 44%, 42%, 32%, 31%, 28%, 28%, 28%, 24%, 23%, 20%, 20%, 19%, 14%, 4%]
Figure 1. Based on 857 responses from 179 respondents (4.8 average responses per respondent).
Source: Unified Data Management, TDWI Best Practices Report, Q2 2010.
www.tdwi.org/cbip
Set yourself
apart from
the crowd.
Get certified.
WHAT SETS YOU APART FROM THE CROWD?
Distinguishing yourself in your career can be a difficult task.
Through TDWI's CBIP (Certified Business Intelligence Professional)
program, we help you define, establish, and set yourself apart
professionally with a meaningful BI certification credential.