Configuring and Administrating TREX using the TREX Admin Tool
Bettina Knauss
NetWeaver RIG EMEA
SAP AG
Walldorf 07.03.2007

TREX Introduction
TREX Administration Tool
Landscape Configuration
RFC Connection
Administrating, Monitoring
Traces

TREX Architecture

SAP AG 2006, Title of Presentation / Speaker Name / 3

TREX Anatomy
TREX provides several client options:
Java client for communication via HTTP/XML in SAP EP
ABAP client for communication via RFC or ICM in SAP landscape
C++ and Python clients for internal calls and development

Inside TREX there are four main services:
Name server: manages TREX landscape, allocates TREX services
Index server: indexing and retrieval, with three engines:
  Text-mining engine for classification and similarity search
  Text search engine for searching and indexing unstructured text
  Attribute/BIA engine for searching and indexing structured data
Queue server: manages asynchronous indexing
Preprocessor: document retrieval, filtering, linguistic processing


Name Server
TREX Name Server
Monitors the landscape (for high availability)
Maintains a list of all services and their status
Is called whenever one service seeks another
Distributes load

Example
When a service sends the name server the request
GetServer (IndexServer, SearchMode, MyIndex)
the name server answers with the address
<host>:<port>
of the index server to which to send the request
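
The exchange above can be sketched in Python as a small registry lookup. This is a toy model and not the TREX API: the `NameServer` class, its `register` and `get_server` methods, the host name, and the port number are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NameServer:
    """Toy registry: maps (service type, index name) to known addresses."""
    services: dict = field(default_factory=dict)

    def register(self, service, index, host, port):
        # Remember that this host:port offers `service` for `index`.
        self.services.setdefault((service, index), []).append(f"{host}:{port}")

    def get_server(self, service, mode, index):
        # `mode` mirrors the shape of the request; this toy ignores it.
        addresses = self.services.get((service, index))
        if not addresses:
            raise LookupError(f"no {service} registered for index {index!r}")
        return addresses[0]


ns = NameServer()
ns.register("IndexServer", "MyIndex", "trexhost1", 30003)  # hypothetical host and port
address = ns.get_server("IndexServer", "SearchMode", "MyIndex")
print(address)  # trexhost1:30003
```

The caller only ever sees a `<host>:<port>` string, which is why services can move between hosts without the clients being reconfigured.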


Name Server: Initialization Files


The most important .ini files are:
topology.ini
Read by all name servers
Contains all index-relevant information
To edit the file, use the TREX standalone admin tool

sapprofile.ini
Read by all TREX services and clients
Specifies:
Port number of local name server
Host and port numbers of all master name servers
Amount of shared memory used by topology.ini data
System ID
Path information to where each service saves its data


Queue Server
TREX Queue Server
Collects indexing requests
Sends them to the index server

Enables asynchronous indexing
  Scheduled
  Event triggered
Includes scheduler for replication
  Replication runs on index server
Stores snapshots for replication


Preprocessor 1
TREX Preprocessor
Delivers documents that the engines can use directly
Supports almost any data type
Gets documents via HTTP from source
Converts documents to HTML
Keeps the document structure
Extracts attributes
  Metadata from DOC, PDF, ...
  Names from a lexicon
  Application-specific attributes
Performs linguistic processing
  Tokenization
  Stemming
  Tagging
  (using third party products)

[Figure: source files (.doc, .pdf, .ppt, .zip, .*) converted to an HTML document]

Preprocessor 2
TREX Preprocessor
Reduces workload on the other engines
Works independently of the indexes
Is stateless

[Diagram labels: Java Client, ABAP Client, Index Server, Name Server Client, Preprocessor, HTTP Client, HTML Filter, Lexicon, Highlighting, Python Extensions, Extensions]

Text Search Engine


Indexing
Many documents at once: up to tens of millions
Many formats *: PDF, doc, ppt, zip, ...
Synchronous or asynchronous: with or without queueing
Automatic language identification *: 31 languages so far
Attribute extraction *: DC and other metadata
Linguistic processing *: tokenizing, tagging, stemming, ...

Search
Exact search: SAP
Phrase search: SAP AG
Boolean search: SAP AND ORACLE
Masked or wildcard search: Web*
Fuzzy or error-tolerant search: Kagerman, Kagermann
Linguistic search: House, Houses
Attribute search: Author = Stevens
Ranking: TF*IDF and P-norm

* Via Preprocessor

Text Mining Engine


Text Mining Search
See also: get more documents like this
Refine your query: more or less general similar terms
Guided navigation: see result set sizes in advance
Find similar documents: based on document features
Find similar terms: based on document statistics

Classification
Taxonomy generation: based on QBC and/or EBC
Document classification: assign documents to categories
Document feature extraction: find characteristic terms
Document clustering: discover sets of related documents
Term clustering: discover sets of related terms

Attribute Engine
Attribute Indexing
Attribute engine has its own index, separate from other indexes
Attributes are used for text mining: classification, similar document search, taxonomy building, feature extraction

Attribute Search
Search over document metadata

Dublin Core Metadata Model
A resource has the attributes Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights

How Search Works: An Example


BooksOnline, an online bookstore, offers a range of books with the
special feature that a customer can search the full text of the books
online before purchase
Auditor Jane wants to buy a book about invoice verification and
decides to evaluate the suggestions offered by the BooksOnline
search service
The following slides describe how the SAP NetWeaver search
service used by BooksOnline answers her search request

Search Example 1
Jane enters invoice verification in the BooksOnline search field in
the Web browser on her office desktop PC
The business application forwards her search request, together
with information about the kind of search and which index to use,
as an HTTP/XML packet via the Java client to the Web server

[Diagram: Java client, Web server, and TREX with name server, preprocessor, queue server, and index server (text mining engine, text search engine, and attribute engine, each with its own index)]
Callout: Do a phrase search for invoice verification in the BooksOnline index

Search Example 2
The Web server converts the HTTP message into the format used
inside TREX and sends a request to the name server for the name
and address of a service to handle the request
The name server checks its list of available servers and tells the
Web server the address of an index server that has received the
fewest calls so far and can handle the request
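
The selection rule described here (the available index server with the fewest calls so far) can be sketched as follows. The function name, the server addresses, and the call counts are hypothetical; how TREX tracks calls internally is not stated on the slide.

```python
def pick_least_called(call_counts, available):
    """Return the available server with the fewest calls and count the new call."""
    candidates = [server for server in call_counts if server in available]
    if not candidates:
        raise LookupError("no index server available")
    chosen = min(candidates, key=lambda server: call_counts[server])
    call_counts[chosen] += 1
    return chosen


# Hypothetical landscape: address -> number of calls received so far.
calls = {"indexserver1:30003": 4, "indexserver2:30003": 7, "indexserver3:30003": 2}
up = {"indexserver1:30003", "indexserver2:30003"}  # indexserver3 is down

first = pick_least_called(calls, up)
print(first)  # indexserver1:30003
```

Servers that are down are filtered out before the minimum is taken, so the least-loaded server wins only if it can actually handle the request.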
[Diagram: Java client, Web server, and TREX services]
Callout: Where can I send this request?
Callout: Send it to Index Server 1

Search Example 3
The Web server passes the search request to the index server as
a TCP/IP packet
The index server sees that the request is for a phrase search and
therefore forwards the phrase to the preprocessor for language
identification, tokenization, tagging, and stemming

[Diagram: Java client, Web server, and TREX services]
Callout: Do a phrase search for invoice verification in the BooksOnline index
Callout: A phrase search: this means work for the preprocessor!
Note: The language of the search may be specified in advance

Search Example 4
The preprocessor performs linguistic processing. It parses
the phrase into two words invoice and verification, tags
them as nouns, reduces the words to their stem forms (in
this case the words themselves) and sends the result back
to the index server
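
A minimal sketch of this step, assuming a tiny hand-made lexicon in place of the third-party linguistic products that the preprocessor uses. The `LEXICON` table and the `preprocess` function are hypothetical and cover only the words in this example.

```python
import re

# Toy lexicon: inflected form -> (stem, part of speech). Hypothetical data.
LEXICON = {
    "invoice": ("invoice", "NOUN"),
    "invoices": ("invoice", "NOUN"),
    "verification": ("verification", "NOUN"),
    "houses": ("house", "NOUN"),
}


def preprocess(phrase):
    """Tokenize a phrase, then tag and stem each token via the lexicon."""
    tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    result = []
    for token in tokens:
        # Unknown words keep their surface form and get no real tag.
        stem, tag = LEXICON.get(token, (token, "UNKNOWN"))
        result.append({"token": token, "stem": stem, "tag": tag})
    return result


processed = preprocess("invoice verification")
stems = [item["stem"] for item in processed]
print(stems)  # ['invoice', 'verification']
```

For this phrase the stems equal the words themselves, as the slide notes; a plural such as Houses would be reduced to house.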

[Diagram: Java client, Web server, and TREX services]
Callout: Please preprocess the phrase invoice verification
Callout: Done: two English nouns in stem form

Search Example 5
The index server sends the preprocessed request to the search
engine for optimization and result retrieval
The query optimizer in the search engine analyzes the query,
builds the query tree, which in this case has three nodes, one for
each word and one for AND, and optimizes it based on index
statistics, to evaluate the term that appears less frequently first
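
The optimization described here can be sketched as ordering the AND-ed terms by the length of their index listings and intersecting from the rarest term. The index contents and function names are hypothetical, and the sketch treats the phrase as a plain AND of its two terms; checking that the words are adjacent is left out.

```python
def order_terms_by_frequency(terms, index):
    """Order AND-ed terms so the one with the shortest index listing comes first."""
    return sorted(terms, key=lambda term: len(index.get(term, ())))


def intersect(terms, index):
    """Evaluate term1 AND term2 AND ... starting from the rarest term."""
    ordered = order_terms_by_frequency(terms, index)
    if not ordered:
        return [], []
    result = set(index.get(ordered[0], ()))
    for term in ordered[1:]:
        result &= set(index.get(term, ()))
        if not result:
            break  # no document can match any more
    return ordered, sorted(result)


# Hypothetical index: term -> IDs of the books that contain it.
books_index = {
    "invoice": [1, 2, 3, 5, 8, 9],
    "verification": [2, 5, 7],
}
plan, hits = intersect(["invoice", "verification"], books_index)
print(plan)  # ['verification', 'invoice']
print(hits)  # [2, 5]
```

Starting with the rarer term keeps the intermediate result small, so the longer listing is only checked against a handful of candidates.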
[Diagram: Java client, Web server, and TREX services]
Callout: This is a simple query, just a 2-word phrase
Note: The index listing for invoice is longer than the index listing for verification, so select verification first

Search Example 6
The search engine finds the row for the term verification in the
BooksOnline index and selects the set of books containing the
term, then it checks this set of books against the row for the term
invoice and selects just the books that contain both terms
Next, it reads the addresses of the terms in each book, calculates
rank values, sorts the results, and takes the top ten (or more)
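
The ranking step can be illustrated with a common textbook form of TF*IDF. The exact formula that TREX uses is not given in the slides, so the weighting `tf * log(N / df)`, the function name, and the sample counts are assumptions.

```python
import math


def tf_idf_rank(doc_term_counts, candidates, query_terms, top_n=10):
    """Rank candidate documents by the sum of tf * log(N / df) over the query terms."""
    n_docs = len(doc_term_counts)
    scores = {doc_id: 0.0 for doc_id in candidates}
    for term in query_terms:
        # Document frequency is counted over the whole collection.
        df = sum(1 for counts in doc_term_counts.values() if counts.get(term, 0) > 0)
        if df == 0:
            continue
        idf = math.log(n_docs / df)
        for doc_id in candidates:
            scores[doc_id] += doc_term_counts[doc_id].get(term, 0) * idf
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical term counts per book.
counts = {
    "book_a": {"invoice": 12, "verification": 9},
    "book_b": {"invoice": 3, "verification": 1},
    "book_c": {"invoice": 5},
    "book_d": {"audit": 4},
}
# Only the books that contain both terms are ranked.
ranking = tf_idf_rank(counts, ["book_a", "book_b"], ["invoice", "verification"])
print([doc_id for doc_id, _ in ranking])  # ['book_a', 'book_b']
```

The `top_n` parameter mirrors the slide's "top ten (or more)": sorting happens once, and the caller decides how many results to keep.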
[Diagram: Java client, Web server, and TREX services]
Callout: 1. Find set of books with verification 2. Find subset with invoice 3. Find addresses of both terms
Callout: Calculate ranks and sort
Note: The rank of a document for a term is defined by TF*IDF ranking

Search Example 7
The search engine reads all the requested attributes for the
selected books, including titles and authors and keys to the
documents
The engine uses the keys to load the document contents and
scans the texts for the first occurrences of the search phrase (or
linguistic variants of the phrase) to create a brief summary text
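
A rough sketch of how such a summary text could be assembled: split the text into sentences, keep the first few that contain the phrase, and mark the matches. The function name, the `<b>` markers, and the sample text are hypothetical, and linguistic variants of the phrase are not handled here.

```python
import re


def first_snippets(text, phrase, max_snippets=2):
    """Return the first sentences that contain the phrase, with matches marked."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    snippets = []
    for sentence in sentences:
        if pattern.search(sentence):
            # Wrap each match in <b>...</b> to stand in for highlighting.
            snippets.append(pattern.sub(lambda m: f"<b>{m.group(0)}</b>", sentence))
            if len(snippets) == max_snippets:
                break
    return " ... ".join(snippets)


# Hypothetical document text.
sample = (
    "Auditors check every document. Invoice verification is the next step. "
    "Later chapters cover payment. The invoice verification in the ledger is repeated."
)
summary = first_snippets(sample, "invoice verification")
print(summary)
```

Stopping after `max_snippets` matches keeps the scan short even for long books, since only the first occurrences are needed.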
[Diagram: Java client, Web server, and TREX services]
Note: The preprocessor extracted attributes during indexing
Callout: Scans through the texts to find the first few sentences containing the phrase invoice verification

Search Example 8
The search engine passes the result set back via the index server
for merging with results from any other engines (here none)
The index server passes the result set back via the Web server
and the Java client to the graphical user interface
Jane sees a ranked list of books about invoice verification less
than a second after she launched the search
[Diagram: Java client, Web server, and TREX services]
Callout: 73 books found in 0.14 seconds

Search: Results
A sample document from the result set
Exact format depends on application settings

Internal Auditing
by First Author, Second Author
Economic Publishers, New York
Invoice verification is the next step ... The invoice verification in the ...
375 pages, First edition, ISBN 0-3XX-XXXXX-X
Browse full text

Annotations: Document attributes; Link to document; Sample phrases with search terms highlighted

Results ranked by frequency of search terms
How many results are returned depends on application settings

How Indexing Works: An Example


BooksOnline worked hard to give Jane such a rewarding search
experience
Before Jane could see a ranked list of books about invoice
verification and browse the books, BooksOnline had to index the
full texts of all the books
The following slides describe how the SAP NetWeaver search
service used by BooksOnline indexes the full texts of the books on
show on its website

Indexing Example 1
The BooksOnline indexing administrator opens the SAP queue
and index administration tool and sends a request to TREX to
create an index called BooksOnline
The ABAP Client forwards the index request as a Remote
Function Call via the SAP Gateway to the RFC server

[Diagram: ABAP client, Gateway, and TREX with RFC server, name server, preprocessor, queue server, and index server (text mining engine, text search engine, and attribute engine, each with its own index)]
Callout: Create an index called BooksOnline
Note: Indexing can be done just as well via the Java Client

Indexing Example 2
The name server tells the RFC server the address of an index
server that can create the index
In a one-box implementation of TREX, this step is straightforward
unless the index server is down for some reason
The name server uses a round robin procedure to select an index
server
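
Round robin selection simply hands out the index servers in a fixed rotation. A minimal sketch, assuming a static list of servers; the class name and the addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinSelector:
    """Hand out index servers one after another, wrapping around at the end."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one index server is required")
        self._servers = cycle(list(servers))

    def next_server(self):
        return next(self._servers)


selector = RoundRobinSelector(["host1:30003", "host2:30003"])  # hypothetical addresses
picks = [selector.next_server() for _ in range(3)]
print(picks)  # ['host1:30003', 'host2:30003', 'host1:30003']
```

In a one-box implementation the list holds a single entry, so every request goes to the same index server.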
[Diagram: ABAP client, Gateway, and TREX services]
Callout: I want to create a new index!
Callout: So go to <host>:<port>

Indexing Example 3
The RFC server sends the request to the index server
The index server creates a new index called BooksOnline
The new index is still empty but any documents to be indexed can
now be assigned to it

[Diagram: ABAP client, Gateway, and TREX services]
Callout: I want to create a new index called BooksOnline
Callout: New index created successfully!

Indexing Example 4
The administrator sends a request to index the new books in a
specified folder and write the results in the BooksOnline index
The digital files for the books are in a variety of formats, but TREX
can handle all standard formats, such as Microsoft Word (.doc),
Adobe Portable Document Format (.pdf), and plain text (.txt)
The name server directs the request to an available queue server
[Diagram: ABAP client, Gateway, and TREX services]
Callout: Please index all the books in folder <path_to_folder>
Callout: Please put this indexing request in your queue and have the documents indexed as soon as TREX finds the time to do it
Note: Queueing is an option: Indexing can also be done immediately

Indexing Example 5

[Figure: source documents in .htm, .pdf, .xls, .doc, .ppt, and .txt formats]

The queue server receives the list of URLs for the documents
from the specified folder and persists them in a queue for the
index for as long as required until a preprocessor is available
Indexing a large collection of documents can be a long job, so the
administrator can hold or flush the queue manually at any time
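
The hold and flush behavior can be pictured with a toy queue. This sketch keeps the URLs in memory only, whereas the queue server persists them; the class, its method names, and the URLs are hypothetical.

```python
from collections import deque


class IndexQueue:
    """Toy per-index queue that an administrator can hold and flush."""

    def __init__(self, index_name):
        self.index_name = index_name
        self.pending = deque()
        self.on_hold = False

    def add(self, url):
        self.pending.append(url)

    def hold(self):
        self.on_hold = True

    def resume(self):
        self.on_hold = False

    def flush(self, send):
        """Manually send every pending URL via `send`, even while on hold."""
        sent = 0
        while self.pending:
            send(self.pending.popleft())
            sent += 1
        return sent

    def schedule_tick(self, send):
        """Called by a scheduler; sends nothing while the queue is on hold."""
        return 0 if self.on_hold else self.flush(send)


queue = IndexQueue("BooksOnline")
queue.add("http://books.example/book1.pdf")  # hypothetical URLs
queue.add("http://books.example/book2.doc")
queue.hold()

delivered = []
skipped = queue.schedule_tick(delivered.append)  # on hold: nothing is sent
sent = queue.flush(delivered.append)             # manual flush sends both
print(skipped, sent)  # 0 2
```

Separating the scheduled path from the manual flush lets the administrator pause a long indexing job and still push documents through on demand.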

[Diagram: ABAP client, Gateway, and TREX services]
Callout: Queue server receives document URLs and adds them to the BooksOnline queue for indexing
Note: BooksOnline has all its books available in digital form (either as author files or scanned and OCR'd) ready for indexing and browsing

Indexing Example 6
The queue server sends the documents to a free preprocessor
[Figure: source documents in .htm, .pdf, .xls, .doc, .ppt, and .txt formats]

The preprocessor fetches documents via URLs, filters them from
their original format to HTML, identifies their language, tokenizes
them into sequences of terms, tags the terms with their part of
speech (for example, as nouns), and stems the terms as appropriate
The preprocessed documents are then sent to the index server

[Diagram: ABAP client, Gateway, and TREX services]
Callout: A lot of work for the preprocessor

Indexing Example 7
The index server forwards the documents to the search engine
[Figure: source documents in .htm, .pdf, .xls, .doc, .ppt, and .txt formats]

For each document, the search engine writes a list of all its terms
and for each term it writes a list of positions in the document
where the term appears
The engine merges the term list for each document to the existing
term-document matrix that forms the BooksOnline index
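
The two steps above (a position list per term, then a merge into the existing index) can be sketched with nested dictionaries. The layout `term -> {document: positions}` is one common way to hold such a matrix and is an assumption here, as are the function names and the sample tokens.

```python
from collections import defaultdict


def term_positions(tokens):
    """Map each term of one document to the positions where it appears."""
    positions = defaultdict(list)
    for position, term in enumerate(tokens):
        positions[term].append(position)
    return dict(positions)


def merge_document(index, doc_id, tokens):
    """Merge one document's term list into the index: term -> {doc_id: positions}."""
    for term, positions in term_positions(tokens).items():
        index.setdefault(term, {})[doc_id] = positions
    return index


books_index = {}
merge_document(books_index, "book_1", ["invoice", "verification", "is", "the", "next", "step"])
merge_document(books_index, "book_2", ["the", "invoice", "and", "the", "audit"])
print(books_index["invoice"])  # {'book_1': [0], 'book_2': [1]}
print(books_index["the"])      # {'book_1': [3], 'book_2': [0, 3]}
```

Each row of the index lists the documents that contain a term, which is what the search steps read when they select books by term and look up term addresses.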

[Diagram: ABAP client, Gateway, and TREX services]
Callout: Indexing data merged into existing matrix

Indexing Example 8
The BooksOnline indexing administrator can use the TREX queue
and index administration tool to display the status of the indexing
process at any time during the process

[Diagram: ABAP client, Gateway, and TREX]
The tool lets you follow the progress of queued documents from left to right

TREX Administration Tools


The TREX administration tool is the place to:
Set up and configure a distributed landscape
Monitor and administer services, indexes, queues, replication, ...
Show trace files, configuration files, version info, ...

There are three flavors:


Standalone
Richest feature set
Requires full access to TREX host

ABAP
Restricted feature set
Easy access on customer systems

Java
Highly restricted feature set
Browser access via Portal


TREX Administration Tool


Start Tool
DEMO

Landscape Example


Distributed Scenario: Simple Example


One master, multiple slaves

IS   Index server
M    Master
MI   Master index
NS   Name server
PP   Preprocessor
Q    Queue
QS   Queue server
RFC  RFC server
SN   Snapshots
S    Slave
SI   Slave index
WS   Web server

http://trex.wdf.sap.corp:1080/

[Diagram: master host mytrexmaster and slave hosts mytrexslave1 ... 2; components shown: WS, RFC, M NS, M QS, M IS, S NS, S IS, PP, Q, MI, SN, SI]

Documentation: Distributed Search and Classification (TREX) 7.0 SP2 Systems

Distributed Scenarios: Shared Backup Server

One backup, multiple masters, multiple slaves, one filer

[Diagram: master hosts mytrexmaster1 and mytrexmaster2, backup host mytrexbackup, slave hosts mytrexslave1/2 and mytrexslave3/4, and a file server; components shown: WS, RFC, M NS, M QS, M IS, B QS, B IS, S NS, S IS, PP, Q, MI, SN, SI, T]

http://trex.wdf.sap.corp:1080/

Documentation: Distributed Search and Classification (TREX) 7.0 SP2 Systems

Distributed Scenarios: Dedicated Backup Servers

Multiple backups, multiple masters, multiple slaves, one filer

[Diagram: master hosts mytrexmaster1 and mytrexmaster2, backup hosts mytrexbackup1 and mytrexbackup2, slave hosts mytrexslave1/2 and mytrexslave3/4, and a file server; components shown: WS, RFC, M NS, M QS, M IS, S NS, B QS, B IS, S IS, PP, Q, MI, SN, SI, T]

http://trex.wdf.sap.corp:1080/

Documentation: Distributed Search and Classification (TREX) 7.0 SP2 Systems

Landscape Configuration
DEMO

Creating RFC Connection


RFC Connection
DEMO

Reorg I

Reorg II

Reorg III

Reorg IV

Alert area

Alert server Configuration

Checks that are executed and required actions I

Checks that are executed and required actions II

Checks that are executed and required actions III

Checks that are executed and required actions IV

TREX Traces
DEMO


Copyright 2006 SAP AG. All Rights Reserved


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be
changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
Microsoft, Windows, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.
IBM, DB2, DB2 Universal Database, OS/2, Parallel Sysplex, MVS/ESA, AIX, S/390, AS/400, OS/390, OS/400, iSeries, pSeries, xSeries, zSeries, System i, System i5, System p,
System p5, System x, System z, System z9, z/OS, AFP, Intelligent Miner, WebSphere, Netfinity, Tivoli, Informix, i5/OS, POWER, POWER5, POWER5+, OpenPower and PowerPC are
trademarks or registered trademarks of IBM Corporation.
Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or other countries.
Oracle is a registered trademark of Oracle Corporation.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc.
HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C , World Wide Web Consortium, Massachusetts Institute of Technology.
Java is a registered trademark of Sun Microsystems, Inc.
JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape.
MaxDB is a trademark of MySQL AB, Sweden.
SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned are the trademarks of their respective companies.
Data contained in this document serves informational purposes only. National product specifications may vary.

The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior
written permission of SAP AG.
This document is a preliminary version and not subject to your license agreement or any other agreement with SAP. This document contains only intended strategies, developments,
and functionalities of the SAP product and is not intended to be binding upon SAP to any particular course of business, product strategy, and/or development. Please note that this
document is subject to change and may be changed by SAP at any time without notice.
SAP assumes no responsibility for errors or omissions in this document. SAP does not warrant the accuracy or completeness of the information, text, graphics, links, or other items
contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability,
fitness for a particular purpose, or non-infringement.
SAP shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This
limitation shall not apply in cases of intent or gross negligence.
The statutory liability for personal injury and defective products is not affected. SAP has no control over the information that you may access through the use of hot links contained in
these materials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.
