Text and Web Mining

Outline
 Business Analytics
 Text mining
 Web Mining
 Graphics & Decision Making
Business Analytics
 Business Intelligence (BI)
The use of analytical methods, either manually or
automatically, to derive relationships from data
 Business Analytics (BA)
The application of models directly to business
data. BA involves using MSS tools, especially
models, in assisting decision makers; essentially a
form of decision support
Business Analytics Overview
Business Analytics Foundation
 Executive Support/Information Systems

Rockart & Treacy, 1992


Executive Information Systems
 Executive Data Cube:
 Business units – organizes
the variables per
department, division,
product category, region,
etc.
 Business variables – items
of interest you want to track
or report on
 Time – compresses or
summarizes variables by
day, week, month, year, etc.
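The three cube dimensions above can be sketched in a few lines of Python. This is only an illustration: the `facts` data, the key layout `(business unit, business variable, date)`, and the helper `roll_up_time` are hypothetical and are not taken from any particular EIS product.

```python
from collections import defaultdict

# Hypothetical facts: (business unit, business variable, ISO date) -> value
facts = {
    ("North", "sales", "2024-01-15"): 120.0,
    ("North", "sales", "2024-01-20"): 80.0,
    ("North", "sales", "2024-02-03"): 50.0,
    ("South", "sales", "2024-01-09"): 70.0,
    ("South", "costs", "2024-01-09"): 30.0,
}

def roll_up_time(cube, level):
    """Summarize the time dimension to 'month' or 'year'."""
    width = {"year": 4, "month": 7}[level]  # prefix length of YYYY-MM-DD
    out = defaultdict(float)
    for (unit, variable, date), value in cube.items():
        out[(unit, variable, date[:width])] += value
    return dict(out)

monthly = roll_up_time(facts, "month")
yearly = roll_up_time(facts, "year")
```

Here `monthly` holds one cell per unit, variable, and month, while `yearly` compresses the same facts to one cell per unit, variable, and year.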
Executive Information Systems
 Executive data analysis:
 Personal analysis
 Database querying,
numerical manipulation,
graphics, etc.
 Status access
 Online access to business
Executive Information Systems
 Required support
organization:
 Systems & database
development
 Data gathering
 Training & ongoing
assistance
 Data-analytical expertise
 etc.
Executive Information Systems
 Current EIS capabilities:
 Drill down (access to detailed information)
 Roll up (access to aggregate information)
 Dashboards
 Status access (latest data)
 Exception reporting
 Trend reporting (time-series, charts)
 Environment scanning (external data)
 Navigation and Communication
Multidimensional Data Structures
 OLAP – Online Analytical Processing
 Initially the basis of Data Warehouse data models
 Multidimensional OLAP (MOLAP)
 OLAP implemented via a specialized
multidimensional database (or data store) that
summarizes transactions into multidimensional views
ahead of time
Multidimensional Data Structures
 Multidimensional data structures: the relational model
goes 3-D

 How many dimensions can you find?


Multidimensional Data Structures
 Adding yet another dimension:

 A multi-dimensional data cube!


Multidimensional Data Structures
 Consolidation: Aggregation of data; simple roll-ups or
complex expressions involving interrelated data
 Drill-Down: Display of the detail data that make up the
consolidated data
 Slicing and Dicing: Ability to display the database from different
viewpoints using rotation, consolidation, and drill-down
 Slice: 2-dimensional page from a data cube
 Dice: 3-D cube extracted from the database
 Rotate (Pivot): Changing the dimensional orientation of
a data cube
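The operations listed above can be tried out on a toy cube. The sketch below assumes a hypothetical three-dimensional cube stored as a Python dictionary. The dimension names and the functions `slice_cube`, `dice`, `consolidate`, and `pivot` are illustrative only, and a real MOLAP store would precompute these views rather than scan cells on demand.

```python
from collections import defaultdict

DIMS = ("product", "region", "quarter")

# Hypothetical cube: one cell per (product, region, quarter) -> units sold
cube = {
    ("laptop", "east", "Q1"): 10,
    ("laptop", "east", "Q2"): 12,
    ("laptop", "west", "Q1"): 7,
    ("phone", "east", "Q1"): 20,
    ("phone", "west", "Q2"): 15,
}

def slice_cube(cells, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}

def dice(cells, **allowed):
    """Dice: keep the sub-cube whose coordinates fall in the allowed sets."""
    return {
        k: v for k, v in cells.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def consolidate(cells, keep):
    """Consolidation (roll-up): sum over every dimension not in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cells.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)

def pivot(cells, order):
    """Rotate: reorder the dimensions of every cell key."""
    idx = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cells.items()}

q1_page = slice_cube(cube, "quarter", "Q1")
east_sub = dice(cube, region={"east"}, quarter={"Q1", "Q2"})
by_product = consolidate(cube, keep=("product",))
rotated = pivot(cube, ("quarter", "region", "product"))
```

Drill-down is the reverse of consolidation: starting from a total in `by_product`, a call such as `dice(cube, product={"laptop"})` returns the detail cells behind it.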
Text Mining
 A specialized application of data mining
 The semi-automated process of extracting patterns
from large amounts of unstructured data sources
 Text files, Word documents, PDF files, XML files, etc.
Text Mining
 Sample applications of text mining
 Automatic detection of e-mail spam or phishing
through analysis of the document content
 Automatic processing of messages or e-mail to route a
message to the most appropriate party to process that
message
 Analysis of warranty claims, help desk calls/reports,
and so on to identify the most common problems and
relevant responses
Text Mining
 Analysis of related scientific publications in
journals to create an automated summary view
of a particular discipline
 Creation of a “relationship view” of a document
collection
 Qualitative analysis of documents to detect fraud
Text Mining Categories
 Information extraction based on key phrases in
documents
 Topic tracking among documents
 Summarization of documents
 Categorization of documents into non-predefined
categories
 Concept Linking among documents
 Question Answering through pattern matching
Text Mining
 How to mine text:
Text Mining and Natural
Language Processing (NLP)
 NLP is a subfield of artificial intelligence and computational
linguistics
 It aims to convert human language into
formal representations that can be manipulated by software
 Written language
 Analysis of key words & generation of response
 Spoken language
 Additional problem of speech recognition
 Speech recognition & voice synthesis
Text Mining and Natural
Language Processing (NLP)
 Some problems that make NLP difficult:
 Text segmentation & word boundary detection
 Word sense disambiguation
 Syntactic ambiguity
 Imperfect or irregular input
 Speech acts and plans
Text Mining and Natural
Language Processing (NLP)
 Why people are still better:
Text Mining and Natural
Language Processing (NLP)
 NLP Techniques
 Keyword search
 Search for specific words or word patterns known to the
program
 Language processing
 Examining the syntax or semantics of a full statement
 Neural computing (relatively new)
 Use neural networks for pattern recognition
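Of the three techniques, keyword search is the simplest to illustrate. The sketch below assumes a hypothetical set of word patterns (`PATTERNS`) and a helper `keyword_hits`. It uses regular expressions from Python's standard `re` module and does not attempt any syntactic or semantic analysis.

```python
import re

# Hypothetical word patterns "known to the program"
PATTERNS = {
    "refund": re.compile(r"\brefund(s|ed|ing)?\b", re.IGNORECASE),
    "invoice": re.compile(r"\binvoice(s|d)?\b", re.IGNORECASE),
}

def keyword_hits(text):
    """Return {keyword: number of matches} for every pattern found in text."""
    hits = {}
    for name, pattern in PATTERNS.items():
        count = sum(1 for _ in pattern.finditer(text))
        if count:
            hits[name] = count
    return hits

message = "I was refunded twice, but the invoice still shows no refund."
result = keyword_hits(message)
```

The word-boundary anchors (`\b`) keep the patterns from matching inside longer, unrelated words, which is one reason pattern-based search is usually preferred over plain substring tests.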
Text Mining
 Biomedical text
mining
 DNA sequence
discovery and encoding
 Sequence alignment
and comparison
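As a rough illustration of sequence alignment and comparison, the sketch below computes a global alignment score with the Needleman-Wunsch dynamic program. The slides do not name an algorithm, so this choice, the function name `alignment_score`, and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for the example.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

identical = alignment_score("GATTACA", "GATTACA")
one_gap = alignment_score("ACGT", "AGT")
```

A higher score means the two sequences can be lined up with more matching positions and fewer gaps or substitutions.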
Web Mining
 Web Mining:
 The discovery and analysis of interesting and useful
information from the Web, about the Web, and usually
through Web-based tools
Web Mining
 A few challenges:
 The Web is too big
 The Web is too complex
 The Web is too dynamic
 The Web is not domain-specific
 The Web has everything
Web Mining
 Web usage mining
 The extraction of useful information from the data being
generated through webpage visits, transactions, etc.
 Web content mining
 The extraction of useful information from web pages
 Web structure mining
 The development of useful information from the links
included in the Web documents
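Web structure mining starts from the links between documents. The sketch below assumes a hypothetical three-page site held in a dictionary. It extracts `href` targets with `html.parser.HTMLParser` from the Python standard library and counts how many links point at each page, a very crude indicator of importance.

```python
from collections import Counter
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def out_links(html):
    """Return the list of link targets found in one HTML document."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# Hypothetical three-page site
pages = {
    "/home": '<a href="/products">Products</a> <a href="/help">Help</a>',
    "/products": '<a href="/help">Help</a> <a href="/home">Home</a>',
    "/help": "<p>No links here</p>",
}

# Number of incoming links per page
in_degree = Counter(
    target for html in pages.values() for target in out_links(html)
)
```

Link-analysis methods used in practice weight links by the importance of the linking page. This sketch only counts them.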
Web Mining
 A simple process:

 1. Raw Web data is technically focused and difficult to
interpret
 2. Prepare Web data by transforming it into business events
 3. Analyze business events and deliver actionable predictive
insight such as “97% of searches for help result in a
terminated visit”
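The three steps can be sketched on a handful of log lines. Everything in this example is hypothetical: the log format (`session_id timestamp path`), the event names, and the rule that a help search "terminates" a visit when it is the last event of its session. The log is assumed to be in chronological order.

```python
# Step 1: raw, technically focused records (hypothetical format:
# "session_id timestamp path")
raw_log = [
    "s1 10:00:01 /home",
    "s1 10:00:30 /search?q=help",
    "s2 10:01:00 /search?q=help",
    "s2 10:01:20 /products",
    "s3 10:02:00 /search?q=help",
]

def to_event(path):
    """Step 2: map a raw request path to a business event name."""
    if path.startswith("/search") and "q=help" in path:
        return "help_search"
    return "page_view"

def sessions(log):
    """Group events by session, preserving their order in the log."""
    grouped = {}
    for line in log:
        session_id, _timestamp, path = line.split()
        grouped.setdefault(session_id, []).append(to_event(path))
    return grouped

def help_search_exit_rate(log):
    """Step 3: share of help searches that were the last event of a visit."""
    total = ended = 0
    for events in sessions(log).values():
        for position, event in enumerate(events):
            if event == "help_search":
                total += 1
                ended += position == len(events) - 1
    return ended / total if total else 0.0

rate = help_search_exit_rate(raw_log)
```

In this toy log, two of the three help searches end the visit, so `rate` is two thirds. The same calculation over real traffic would produce a figure comparable to the one quoted above.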
Web Mining
 Uses for Web mining:
 Determine the lifetime value of clients
 Design cross-marketing strategies across products
 Evaluate promotional campaigns
 Target electronic ads and coupons at user groups
 Predict user behavior
 Present dynamic information to users
Graphics & Decision Making

“Graphics reveal data; they encourage
the eye to compare different
pieces of data”

Edward R. Tufte
The Visual Display of Quantitative Information
Graphics & Decision Making

Napoleon’s Russian Campaign of 1812, map by Charles Joseph Minard, 1869


Graphics & Decision Making
 Purpose of graphics:
 Display complex structures and relationships
 Facilitate comparative analysis
 Assist in manufacturing production and project scheduling
 Find information
 Recognize patterns or organize data by geographic region
 Sequence events
 Visualize large amounts of data

Source: Carey & Kacmar, 2004


Graphics & Decision Making
 Traditional business graphics:
 Bar charts
 Pie charts
 Scatter plots
 Control charts
 Time series charts
 Maps
 PERT charts
Geographic Information Systems
 GIS are computerized systems for storage, retrieval,
manipulation, analysis, and display of geographically
referenced data.
 Since GIS can include
physical, biological,
cultural, demographic,
and economic information,
they are valuable tools
in the natural, social,
medical, and
engineering sciences,
as well as in business
and planning.
Geographic Information Systems

 Sample GIS applications:


 Political campaign support
 Consumer marketing and
sales support
 Sales and territory analysis
 Site selection
 Fleet management
 Route planning
 Disaster planning
 Regulatory compliance
Geographic Information Systems
2001
Geographic Information Systems
2005
Geographic Information Systems
Google Earth as a GIS platform
Geographic Information Systems

 GIS features have been
integrated into
location-aware
applications on most
mobile platforms