Tomer Steinberg
SAP Israel Public
Agenda
Law enforcement
Intelligence
Social Media Analytics
Precision Marketing
Predictive Maintenance
Investment trade
Credit Scoring
Patents
Mary’s interests: leggings, jackets
JANE
Free shipping on %category% and more thru 11/28
Inxight spun off from PARC, a Xerox Company – Finite-State technology for modeling natural language
Inxight acquired by Business Objects – Integration of text analysis technology into BI applications
Business Objects acquired by SAP – Text analysis technology continues to focus on BI applications
First integration into SAP HANA – Foundation for full-text search, BI and sentiment analysis applications
Text analysis in SAP HANA – Foundation for virtually any type of unstructured textual data processing in the platform
14 engineers
7 computational linguists
Application Development
Processing Engine
Life-cycle Management
Process Orchestration
Unified Administration
OLTP | OLAP | Search | Text Analysis | Predictive | Events | Spatial | Rules | Planning | Calculators
Security
Database Services
Integration Services
Data Virtualization | Replication | ETL/ELT | Mobile Synch | Streaming
Capabilities
Native full-text and fuzzy search
In-database text analysis
Graphical modeling of search models
Info Access – HTML5 UI toolkit and API for JavaScript
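Fuzzy search ranks near matches instead of requiring an exact string, so misspelled queries still find their targets. In HANA this is expressed in SQL (the CONTAINS predicate with a FUZZY option); the sketch below does not call HANA at all. It is a minimal stand-in that swaps in Python's difflib ratio as the similarity score, which is not HANA's fuzzy scoring, and the company names and thresholds are illustrative assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_score(query: str, candidate: str) -> float:
    """Case-insensitive similarity in [0, 1] (difflib's ratio, not HANA's score)."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def fuzzy_search(query: str, rows: list[str], threshold: float = 0.8) -> list[tuple[float, str]]:
    """Return (score, row) pairs at or above the threshold, best match first."""
    hits = [(fuzzy_score(query, row), row) for row in rows]
    return sorted((hit for hit in hits if hit[0] >= threshold), reverse=True)


# Illustrative data: a misspelled query against a few company names.
companies = ["SAP", "SAP SE", "Sapient", "Oracle", "Microsoft"]
for score, name in fuzzy_search("SAPP", companies, threshold=0.5):
    print(f"{score:.2f}  {name}")
```

Raising the threshold trades recall for precision: with the default 0.8, only "SAP" would be returned for the query "SAPP".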
Benefits
Less data duplication and movement – leverage one infrastructure for analytical and search workloads
Extract salient information from unstructured textual data
Easy-to-use modeling tools – HANA Studio
Build search applications quickly – Info Access
Full-text search: In addition to string matching, HANA features full-text search, which works on content stored in tables or exposed via views. Just like searching on the Internet, full-text search finds terms irrespective of the exact sequence of characters and words.
Text analysis: Capabilities range from basic tokenization and stemming to more complex semantic analysis in the form of entity and fact extraction. Text analysis applies within individual documents and is the foundation for both full-text search and text mining.
Text mining: Text mining makes semantic determinations about the overall content of documents relative to other documents. Capabilities include key term identification and document categorization. Text mining is complementary to text analysis.
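To make the full-text search and text analysis descriptions concrete, here is a minimal in-memory sketch: tokenization and a toy stemmer feed an inverted index, and a query matches every document that contains all query stems, regardless of word order or inflection. It rests on simplifying assumptions (English only, a hypothetical three-rule suffix stripper, made-up sample documents) and is not how HANA builds its full-text index.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    """Toy suffix stripper for illustration only (not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: stem -> ids of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[stem(token)].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Ids of documents containing every query stem, in any order."""
    stems = [stem(t) for t in tokenize(query)]
    if not stems:
        return set()
    result = set(index.get(stems[0], set()))
    for s in stems[1:]:
        result &= index.get(s, set())
    return result


# Made-up sample documents.
docs = {
    1: "HANA indexes text stored in tables",
    2: "Search tables exposed via views",
    3: "Matching strings in columns",
}
index = build_index(docs)
print(search(index, "table search"))  # word order and plural form do not matter
print(search(index, "matched string"))
```

Intersecting posting sets gives AND semantics; an OR search would use a union instead, and a real engine would also rank the hits.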
It’s surprising that the Tristan and Isolde story, a medieval Celtic tale that has long figured in literature, film and in Wagner’s opera of the same name, has been so infrequently used by ballet. Like “Romeo and Juliet,” it has instant attraction and union between lovers from opposing camps, with society and history against them, and tragic death at its end. You can imagine what John Cranko or Kenneth MacMillan, who brought the big, all-guns-blazing story ballets like “Manon” and “Eugene Onegin” to the world in the 1960s and 1970s (ballet box offices are still thanking them), might have done with it.
Category: Classical_Music
Key terms: Semperoper, Wagner, ballet, John Cranko, Royal Ballet School …
By MARK SCOTT
February 16, 2015
As consumers change the way they use their smartphones, surf the web and watch television, Vodafone is finding itself in need of a face-lift. After years of focusing heavily on its cellphone business, Vodafone, based in Britain and the world’s second-largest mobile operator behind China Mobile based on subscribers, is concentrating on high-speed broadband.
Once, Europeans were happy to pay for separate cellphone, cable and pay-TV services. Now, they prefer them bundled into a single package that streams content to any device – a smartphone, tablet or Internet-connected television. Regional rivals like Orange of France and Deutsche Telekom of Germany have moved quickly to offer …
Category: Telecommunications
Key terms: Vodafone, broadband, cellphone business, Orange, Deutsche Telekom, …
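Key term identification, one of the text mining capabilities, can be approximated with classic TF-IDF: a term scores high when it is frequent in one document but rare across the corpus. The deck does not say how HANA scores key terms, so this is a swapped-in technique, and the three token strings below are toy documents made up for the example.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "on", "its", "is", "to", "by", "for", "with"}


def terms(text: str) -> list[str]:
    """Lowercased word tokens with a few common stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def key_terms(docs: list[str], k: int = 3) -> list[list[str]]:
    """Top-k TF-IDF terms per document; ties are broken alphabetically."""
    tokenized = [terms(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    result = []
    for toks in tokenized:
        if not toks:
            result.append([])
            continue
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


# Toy documents loosely echoing the two examples.
docs = [
    "ballet ballet Wagner opera Cranko ballet",
    "Vodafone broadband broadband broadband cellphone Orange",
    "opera broadband news",
]
print(key_terms(docs))
```

Note how "opera" and "broadband" lose weight because they occur in two of the three documents; this corpus-relative view is what distinguishes text mining from per-document text analysis.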
Full-text index: ‘attaches’ to the table column
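In HANA, the full-text index is declared with SQL DDL on a single column. The helper below only assembles such a statement as a string and does not connect to a database. It is a sketch based on our reading of the HANA SQL reference (CREATE FULLTEXT INDEX … ON table (column) CONFIGURATION '…' TEXT ANALYSIS ON), so the clause order and the configuration name EXTRACTION_CORE_VOICEOFCUSTOMER should be checked against the documentation for your HANA version. The table, column and index names are hypothetical.

```python
from typing import Optional


def quote_identifier(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def fulltext_index_ddl(index: str, table: str, column: str,
                       configuration: Optional[str] = None,
                       text_analysis: bool = False) -> str:
    """Build a CREATE FULLTEXT INDEX statement attached to one table column."""
    parts = [f"CREATE FULLTEXT INDEX {quote_identifier(index)} "
             f"ON {quote_identifier(table)} ({quote_identifier(column)})"]
    if configuration:
        parts.append(f"CONFIGURATION '{configuration}'")
    if text_analysis:
        parts.append("TEXT ANALYSIS ON")
    return " ".join(parts)


# Hypothetical names: assumes a REVIEWS table with a TEXT column.
print(fulltext_index_ddl("REVIEWS_FTI", "REVIEWS", "TEXT",
                         configuration="EXTRACTION_CORE_VOICEOFCUSTOMER",
                         text_analysis=True))
```

With text analysis switched on, HANA writes the extraction results to a separate table, typically named $TA_<index name>, which can then be queried with ordinary SQL.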
Part-of-Speech
Tags word categories
Examples: quick: Adj; houses: Nn-Pl
Noun groups
Identifies concepts
Examples: text data; global piracy
Entity extraction
Classifies pre-defined entity types
Examples: Winston Churchill: PERSON; U.K.: COUNTRY
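Entity extraction classifies spans of text into pre-defined types. A minimal dictionary (gazetteer) lookup is enough to reproduce the inline markup style used in the following example. The dictionary entries are hypothetical, and HANA's extraction relies on linguistic rules and far larger dictionaries rather than this simple lookup.

```python
import re

# Hypothetical hand-made dictionary: surface form -> entity type.
GAZETTEER = {
    "John Lennon": "PERSON",
    "Winston Churchill": "PERSON",
    "Beatles": "ORGANIZATION@ENTERTAINMENT",
    "U.K.": "COUNTRY",
}


def tag_entities(text: str, gazetteer: dict[str, str] = GAZETTEER) -> str:
    """Wrap every dictionary match in <TYPE>...</TYPE> markup."""
    # Longest names first so multi-word entries win over shorter overlaps.
    names = sorted(gazetteer, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # Lookarounds stop matches inside longer words (e.g. "Beatlesque").
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")

    def wrap(match: re.Match) -> str:
        label = gazetteer[match.group(0)]
        return f"<{label}>{match.group(0)}</{label}>"

    return pattern.sub(wrap, text)


print(tag_entities("John Lennon was one of the Beatles."))
```

A dictionary only finds names it already knows; recognizing unseen names ("a new band called …") is where rule-based or statistical extraction earns its keep.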
Entities:
John Lennon was one of the Beatles.
<PERSON>John Lennon</PERSON> was one of the
<ORGANIZATION@ENTERTAINMENT>Beatles</ORGANIZATION@ENTERTAINMENT>.
Facts:
I love your product.
I <STRONGPOSITIVESENTIMENT>love</STRONGPOSITIVESENTIMENT> <TOPIC>your
product</TOPIC>.
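A fact such as the sentiment example links a sentiment expression to its topic. The sketch below approximates this with a tiny sentiment lexicon plus one hypothetical pattern (sentiment word followed by a determiner and a noun) and emits the same markup. Only the STRONGPOSITIVESENTIMENT and TOPIC labels come from the example; the other labels and the pattern are assumptions, and real fact extraction uses full grammatical analysis.

```python
import re

# Hypothetical lexicon; only STRONGPOSITIVESENTIMENT appears in the example.
SENTIMENT_LEXICON = {
    "love": "STRONGPOSITIVESENTIMENT",
    "like": "WEAKPOSITIVESENTIMENT",
    "dislike": "WEAKNEGATIVESENTIMENT",
    "hate": "STRONGNEGATIVESENTIMENT",
}

# Assumed rule: sentiment word, then determiner + noun, taken as the topic.
FACT = re.compile(
    r"\b(" + "|".join(SENTIMENT_LEXICON) + r")\s+((?:your|the|this|my)\s+\w+)",
    re.IGNORECASE,
)


def tag_sentiment(sentence: str) -> str:
    """Mark the sentiment word and its topic with inline tags."""
    def wrap(match: re.Match) -> str:
        word, topic = match.group(1), match.group(2)
        label = SENTIMENT_LEXICON[word.lower()]
        return f"<{label}>{word}</{label}> <TOPIC>{topic}</TOPIC>"

    return FACT.sub(wrap, sentence)


print(tag_sentiment("I love your product."))
```

The fixed pattern misses negation ("I don't love your product") and longer topics; handling those is exactly why the slide frames this as grammatical rather than keyword analysis.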
Who: People, job titles, and national identification numbers
What: Companies, organizations, financial indexes, and products
When: Dates, days, holidays, months, years, times, and time periods
Where: Addresses, cities, states, countries, facilities, internet addresses, and phone numbers
How much: Currencies and units of measure
Generic concepts: text data, global piracy, and so on
Languages:
Arabic, English, Dutch, Farsi, French, German,
Italian, Japanese, Korean, Portuguese, Russian,
Simplified Chinese, Spanish, Traditional Chinese
Voice of customer
Sentiments: strong positive, weak positive, neutral, weak negative, strong negative, and problems
Requests: general and contact info
Emoticons: strong positive, weak positive, weak negative, strong negative
Profanity: ambiguous and unambiguous
Languages:
English, Dutch*, French, German, Italian,
Portuguese, Russian, Simplified Chinese, Spanish,
Traditional Chinese
*Emoticons and profanity only
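The emoticon classes in the voice-of-customer list can be illustrated with a lookup table that maps each emoticon to one of the four classes and then counts classes across a batch of posts. The emoticon-to-class mapping and the sample posts are assumptions made for this sketch, not HANA's actual dictionary.

```python
from collections import Counter

# Hypothetical mapping onto the four emoticon classes in the list.
EMOTICON_CLASSES = {
    ":D": "strong positive", ":-D": "strong positive",
    ":)": "weak positive", ":-)": "weak positive",
    ":(": "weak negative", ":-(": "weak negative",
    ":'(": "strong negative",
}


def find_emoticons(text: str) -> list[tuple[str, str]]:
    """Return (emoticon, class) pairs for whitespace-separated emoticon tokens."""
    return [(tok, EMOTICON_CLASSES[tok]) for tok in text.split() if tok in EMOTICON_CLASSES]


def emoticon_summary(posts: list[str]) -> Counter:
    """Count emoticon classes across a batch of customer posts."""
    return Counter(cls for post in posts for _, cls in find_emoticons(post))


# Made-up customer posts.
posts = ["Great camera :D", "Battery is weak :(", "Support never called back :'("]
print(emoticon_summary(posts))
```

Splitting on whitespace keeps the sketch short but misses emoticons glued to words ("great:)"); a production tokenizer would treat them as separate tokens.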
Language: English
Grammatical Parsing:
Load Documents → Create Text Index → Analyze Results
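The load / index / analyze workflow can be mimicked in plain Python to show how the pieces fit together: documents are loaded, an indexing step turns each document into result rows of (document id, token, type), and analysis is an aggregation over those rows. The lexicon and documents are hypothetical, and the row layout is only loosely modeled on HANA's text-analysis results table.

```python
from collections import Counter

# Hypothetical lexicon standing in for a text-analysis configuration.
LEXICON = {
    "love": "strong positive sentiment",
    "hate": "strong negative sentiment",
    ":)": "weak positive emoticon",
    ":(": "weak negative emoticon",
}


def load_documents() -> dict[int, str]:
    """Step 1: load documents (made-up rows standing in for a table)."""
    return {1: "I love your product :)", 2: "Battery died, I hate the battery :("}


def create_text_index(docs: dict[int, str]) -> list[tuple[int, str, str]]:
    """Step 2: analyze each document into (doc_id, token, type) result rows."""
    rows = []
    for doc_id, text in docs.items():
        for token in text.replace(",", " ").split():
            label = LEXICON.get(token.lower())
            if label:
                rows.append((doc_id, token, label))
    return rows


def analyze_results(rows: list[tuple[int, str, str]]) -> Counter:
    """Step 3: aggregate the result rows, here by type."""
    return Counter(label for _, _, label in rows)


rows = create_text_index(load_documents())
print(analyze_results(rows))
```

Keeping the extraction output as plain rows is the design point: once results live in a table, step 3 is ordinary querying and aggregation.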
Used for:
Identify similar documents
Identify key terms of a document
Identify related terms
Categorize new documents based on a training corpus
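Two of these uses, finding similar documents and categorizing new ones from a training corpus, can be sketched with term-count vectors and cosine similarity. Categorization is then done as 1-nearest-neighbour: a new document takes the category of its most similar training document. This is a swapped-in textbook technique rather than HANA's algorithm. The category names come from the earlier examples, but the training texts are made up.

```python
import math
import re
from collections import Counter


def vector(text: str) -> Counter:
    """Bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(query: str, corpus: dict[str, str]) -> str:
    """Id of the corpus document most similar to the query."""
    q = vector(query)
    return max(corpus, key=lambda doc_id: cosine(q, vector(corpus[doc_id])))


def categorize(text: str, training: dict[str, tuple[str, str]]) -> str:
    """1-nearest-neighbour: take the category of the most similar training document."""
    texts = {doc_id: body for doc_id, (body, _) in training.items()}
    return training[most_similar(text, texts)][1]


# Made-up training corpus: id -> (text, category).
training = {
    "t1": ("ballet opera Wagner premiere dancers", "Classical_Music"),
    "t2": ("broadband cellphone operator subscribers network", "Telecommunications"),
}
print(categorize("new ballet premiere with Wagner score", training))
print(categorize("mobile operator adds broadband subscribers", training))
```

1-nearest-neighbour is the simplest choice; with a larger training corpus one would typically vote over several neighbours or train a dedicated classifier.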
Scenarios
Highlight the key terms when viewing a patent document
Identify similar incidents for faster problem solving
Categorize new scientific papers along a hierarchy of topics
Business Benefits
Understand your Customer/Process
Tomer Steinberg
Tomer.Steinberg@sap.com