Administrating TREX using the TREX Admin Tool
Bettina Knauss
NetWeaver RIG EMEA
SAP AG
Walldorf 07.03.2007
TREX Introduction
TREX Administration Tool
Landscape Configuration
RFC Connection
Administrating, Monitoring
Traces
TREX Architecture
TREX Anatomy
TREX provides several client options:
Java client for communication via HTTP/XML in SAP EP
ABAP client for communication via RFC or ICM in SAP landscape
C++ and Python clients for internal calls and development
Name Server
TREX Name Server
Monitors the landscape (for high availability)
Maintains a list of all services and their status
Is called whenever one service seeks another
Distributes load
Example
When a service sends the name server the request
GetServer (IndexServer, SearchMode, MyIndex)
the name server answers with the address
<host>:<port>
of the index server to which to send the request
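The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the registry layout, the host names, and the port number are hypothetical and are not the real TREX name server interface.

```python
from dataclasses import dataclass


@dataclass
class ServiceEntry:
    kind: str            # service type, e.g. "IndexServer"
    host: str            # hypothetical host name
    port: int            # hypothetical port number
    active: bool         # status as tracked by the name server
    indexes: frozenset   # names of the indexes this service can handle


class NameServerSketch:
    """Hypothetical in-memory service list; not the real TREX name server."""

    def __init__(self, entries):
        self.entries = list(entries)

    def get_server(self, kind, mode, index_name):
        # 'mode' mirrors the SearchMode argument from the slide; this sketch ignores it.
        for entry in self.entries:
            if entry.kind == kind and entry.active and index_name in entry.indexes:
                return f"{entry.host}:{entry.port}"
        return None  # no matching active service


registry = NameServerSketch([
    ServiceEntry("IndexServer", "hosta", 30003, False, frozenset({"MyIndex"})),
    ServiceEntry("IndexServer", "hostb", 30003, True, frozenset({"MyIndex"})),
])
address = registry.get_server("IndexServer", "SearchMode", "MyIndex")
print(address)  # hostb:30003, because hosta is marked inactive
```

The sketch returns the first active match; load distribution is left out here.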
sapprofile.ini
Read by all TREX services and clients
Specifies:
Port number of local name server
Host and port numbers of all master name servers
Amount of shared memory used by topology.ini data
System ID
Path information to where each service saves its data
Queue Server
TREX Queue Server
Collects indexing requests
Sends them to the index server
Preprocessor 1
TREX Preprocessor
Delivers documents that the engines can use directly
Supports almost any data type
Gets documents via HTTP from source
Converts documents to HTML
Keeps the document structure
Extracts attributes
Metadata from DOC, PDF, ...
Application-specific attributes
Tokenization
Stemming
Tagging
(using third party products)
[Diagram: source documents (.doc, .ppt, .zip, and other formats) converted to an HTML document]
Preprocessor 2
TREX Preprocessor
Reduces workload on the other engines
Works independently of the indexes
Is stateless
[Diagram: the preprocessor and its surroundings. Labels: Java client, Python extensions, HTTP client, ABAP client, index server, name server, client, preprocessor, HTML filter, lexicon, highlighting, extensions]
Search
Exact search: SAP
Phrase search: SAP AG
Boolean search: SAP AND ORACLE
Linguistic search: Houses, House
Attribute search: Author = Stevens
Further query examples: Web*, Kagerman
Up to tens of millions
Many formats *
Synchronous or asynchronous
Attribute extraction *: DC and other metadata
Linguistic processing *: Tokenizing, tagging, stemming
Ranking: TF*IDF and P-norm
* Via Preprocessor
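To make the TF*IDF ranking mentioned above concrete, here is a minimal sketch of one common TF*IDF variant (relative term frequency times the logarithm of the inverse document frequency). The slides do not give the formula TREX actually uses, and P-norm is not covered, so the function name, the weighting, and the sample documents are all assumptions.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, documents):
    """documents: dict of doc_id -> list of tokens. Returns doc_id -> score."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in documents.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score += tf * idf
        scores[doc_id] = score
    return scores


# Hypothetical, already preprocessed sample documents
docs = {
    "book1": ["invoice", "verification", "invoice"],
    "book2": ["invoice", "audit"],
    "book3": ["tax", "audit"],
}
scores = tf_idf_scores(["invoice", "verification"], docs)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Rare terms contribute more to the score than frequent ones, which is why a document containing "verification" outranks one that only contains "invoice" in this sample.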
Classification
See also
Taxonomy generation
Guided navigation
See result set sizes in advance
Document classification
Assign documents to categories
Document clustering
Discover sets of related documents
Term clustering
Discover sets of related terms
Attribute Engine
Attribute Indexing
Attribute engine has its own index
Attribute Search
Search over document metadata
Dublin Core Metadata Model
Resource has-attributes:
Title
Creator
Format
Identifier
Subject
Description
Source
Language
Publisher
Contributor
Relation
Coverage
Date
Type
Rights
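An attribute search over such metadata can be pictured as a simple filter on key-value pairs. The sketch below is an assumption-laden illustration: the function name and the sample records are hypothetical, and the real attribute engine uses its own index rather than a linear scan.

```python
# Hypothetical documents described with a few Dublin Core elements
documents = [
    {"Identifier": "doc-1", "Title": "Internal Auditing", "Creator": "Stevens", "Language": "en"},
    {"Identifier": "doc-2", "Title": "Tax Basics", "Creator": "Miller", "Language": "en"},
    {"Identifier": "doc-3", "Title": "Rechnungsprüfung", "Creator": "Stevens", "Language": "de"},
]


def attribute_search(docs, **criteria):
    """Return the documents whose metadata equals every given attribute value."""
    return [d for d in docs if all(d.get(name) == value for name, value in criteria.items())]


hits = attribute_search(documents, Creator="Stevens")
english_hits = attribute_search(documents, Creator="Stevens", Language="en")
print([d["Identifier"] for d in hits])
```

The first call corresponds to a query such as Author = Stevens, expressed with the Dublin Core element Creator.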
Search Example 1
Jane enters "invoice verification" in the BooksOnline search field in
the Web browser on her office desktop PC
The business application forwards her search request, together
with information about the kind of search and which index to use,
as an HTTP/XML packet via the Java client to the Web server
[Diagram: TREX architecture with Java client, Web server, name server, preprocessor, queue server, and index server containing the text mining, text search, and attribute engines and their indexes]
Search Example 2
The Web server converts the HTTP message into the format used
inside TREX and sends a request to the name server for the name
and address of a service to handle the request
The name server checks its list of available servers and tells the
Web server the address of an index server that has received the
fewest calls so far and can handle the request
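The selection rule described here (pick an available index server with the fewest calls so far) could look roughly like the sketch below. The function name, the server names, and the call counters are hypothetical; the slides do not show how the name server stores this information.

```python
def pick_least_called(call_counts, available):
    """Return the available server with the fewest calls and count the new call."""
    candidates = [server for server in call_counts if server in available]
    if not candidates:
        raise LookupError("no index server available")
    chosen = min(candidates, key=lambda server: call_counts[server])
    call_counts[chosen] += 1
    return chosen


# Hypothetical state: indexserver3 has the fewest calls but is not available
call_counts = {"indexserver1": 4, "indexserver2": 7, "indexserver3": 2}
available = {"indexserver1", "indexserver2"}

first = pick_least_called(call_counts, available)
print(first, call_counts[first])
```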
[Diagram: TREX architecture. The Web server asks the name server: "Where can I send this request?" The name server answers: "Send it to Index Server 1"]
Search Example 3
The Web server passes the search request to the index server as
a TCP/IP packet
The index server sees that the request is for a phrase search and
therefore forwards the phrase to the preprocessor for language
identification, tokenization, tagging, and stemming
[Diagram: TREX architecture. Callouts: "Do a phrase search for invoice verification in the BooksOnline index" and "A phrase search: this means work for the preprocessor!"]
Search Example 4
The preprocessor performs linguistic processing. It parses
the phrase into two words invoice and verification, tags
them as nouns, reduces the words to their stem forms (in
this case the words themselves) and sends the result back
to the index server
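A toy version of this preprocessing step is sketched below. It is not how TREX does it: the slides say third party products handle the linguistics, tagging is left out entirely, and the single plural-stripping rule is a deliberately naive assumption chosen for illustration.

```python
import re


def tokenize(phrase):
    """Split a phrase into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", phrase.lower())


def naive_stem(token):
    """Toy stemmer: strip a trailing plural 's' (but not 'ss')."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(phrase):
    """Tokenize the phrase and reduce each token to its (naive) stem."""
    return [naive_stem(token) for token in tokenize(phrase)]


print(preprocess("invoice verification"))  # both words are already in stem form
print(preprocess("Houses"))                # plural reduced to the singular
```

As in the slide, the two words of the example phrase come back unchanged because they are already their own stems.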
[Diagram: TREX architecture. Callouts: "Please preprocess the phrase invoice verification" and "Done: two English nouns in stem form"]
Search Example 5
The index server sends the preprocessed request to the search
engine for optimization and result retrieval
The query optimizer in the search engine analyzes the query,
builds the query tree, which in this case has three nodes, one for
each word and one for AND, and optimizes it based on index
statistics, to evaluate the term that appears less frequently first
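The three-node query tree and its reordering can be sketched as follows. The node classes, the optimize function, and the frequency figures are hypothetical; the sketch only handles an AND node whose children are plain term nodes.

```python
from dataclasses import dataclass, field


@dataclass
class TermNode:
    term: str


@dataclass
class AndNode:
    children: list = field(default_factory=list)


def optimize(node, doc_freq):
    """Reorder the children of an AND node so the rarest term is evaluated first."""
    if isinstance(node, AndNode):
        node.children.sort(key=lambda child: doc_freq.get(child.term, 0))
    return node


# Three nodes: one per word and one for AND
tree = AndNode([TermNode("invoice"), TermNode("verification")])
doc_freq = {"invoice": 1200, "verification": 85}  # hypothetical index statistics

optimized = optimize(tree, doc_freq)
order = [child.term for child in optimized.children]
print(order)
```

Evaluating the rarer term first keeps the intermediate result set small before the second term is checked.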
[Diagram: TREX architecture. Callout: "This is a simple query, just a 2-word phrase"]
Search Example 6
The search engine finds the row for the term verification in the
BooksOnline index and selects the set of books containing the
term, then it checks this set of books against the row for the term
invoice and selects just the books that contain both terms
Next, it reads the addresses of the terms in each book, calculates
rank values, sorts the results, and takes the top ten (or more)
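The retrieval steps above (select by the rarer term, check against the other term, rank, sort, cut off) are sketched below. The index layout and the ranking by total occurrence count are assumptions made for illustration; they stand in for the real rank calculation, which the slides do not specify.

```python
def search_and(index, rare_term, other_term, top_k=10):
    """index: dict of term -> dict of doc_id -> list of positions."""
    # Start from the documents that contain the rarer term
    candidates = set(index.get(rare_term, {}))
    # Keep only those that also contain the other term
    matches = candidates & set(index.get(other_term, {}))

    def rank(doc_id):
        # Stand-in rank value: total number of occurrences of both terms
        return len(index[rare_term][doc_id]) + len(index[other_term][doc_id])

    # Highest rank first; ties broken by document id for a stable order
    ranked = sorted(matches, key=lambda doc_id: (-rank(doc_id), doc_id))
    return ranked[:top_k]


# Hypothetical index rows for the two terms
index = {
    "verification": {"book1": [1, 7], "book3": [4]},
    "invoice": {"book1": [0, 6, 12], "book2": [3], "book3": [9]},
}
results = search_and(index, "verification", "invoice")
print(results)
```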
[Diagram: TREX architecture. Callout at the text search engine: "Calculate ranks and sort"]
Search Example 7
The search engine reads all the requested attributes for the
selected books, including titles and authors and keys to the
documents
The engine uses the keys to load the document contents and
scans the texts for the first occurrences of the search phrase (or
linguistic variants of the phrase) to create a brief summary text
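Creating the brief summary text can be pictured as cutting a window around the first occurrence of the phrase. The sketch below is an assumption: the function name and window size are hypothetical, and it only finds the exact phrase, not linguistic variants.

```python
def make_summary(text, phrase, context=30):
    """Return a short excerpt around the first occurrence of phrase, or None."""
    pos = text.lower().find(phrase.lower())
    if pos == -1:
        return None
    start = max(0, pos - context)
    end = min(len(text), pos + len(phrase) + context)
    # Mark the excerpt as truncated where text was cut off
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


# Hypothetical document content
text = "After goods receipt, invoice verification is the next step in procurement."
summary = make_summary(text, "invoice verification", context=10)
print(summary)
```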
[Diagram: TREX architecture. Callout: "The preprocessor extracted attributes during indexing"]
Search Example 8
The search engine passes the result set back via the index server
for merging with results from any other engines (here none)
The index server passes the result set back via the Web server
and the Java client to the graphical user interface
Jane sees a ranked list of books about invoice verification less
than a second after she launched the search
[Diagram: TREX architecture. Callout at the Web server: "73 books found in 0.14 seconds"]
Search: Results
A sample document from the result set
Exact format depends on application settings
Internal Auditing
by First Author, Second Author
Economic Publishers, New York
Invoice verification is the next step ... The invoice verification in the ...
375 pages First edition ISBN 0-3XX-XXXXX-X
Browse full text
[Callouts: document attributes; link to document; sample phrases with search terms highlighted]
Indexing Example 1
The BooksOnline indexing administrator opens the SAP queue
and index administration tool and sends a request to TREX to
create an index called BooksOnline
The ABAP Client forwards the index request as a Remote
Function Call via the SAP Gateway to the RFC server
[Diagram: ABAP client connected via the gateway to the RFC server in TREX, with name server, preprocessor, queue server, and index server containing the text mining, text search, and attribute engines and their indexes. Callout: "Create an index called BooksOnline"]
Indexing Example 2
The name server tells the RFC server the address of an index
server that can create the index
In a one-box implementation of TREX, this step is straightforward
unless the index server is down for some reason
The name server uses a round robin procedure to select an index
server
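A round robin selection that skips servers that are down might be sketched like this. The class name, the addresses, and the is_up check are hypothetical; the slides only state that a round robin procedure is used.

```python
from itertools import cycle


class RoundRobinSelector:
    """Hand out servers in turn, skipping any that are reported as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._cycle = cycle(self._servers)

    def next_server(self, is_up=lambda server: True):
        # Try each server at most once per call
        for _ in range(len(self._servers)):
            candidate = next(self._cycle)
            if is_up(candidate):
                return candidate
        raise LookupError("no index server is up")


# Hypothetical index server addresses
selector = RoundRobinSelector(["hosta:30003", "hostb:30003"])
picks = [selector.next_server() for _ in range(3)]
print(picks)
```

In a one-box implementation the list holds a single index server, so every call returns the same address unless that server is down.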
[Diagram: ABAP client, gateway, RFC server, and TREX services. The RFC server tells the name server: "I want to create a new index!" The name server answers: "So go to <host>:<port>"]
Indexing Example 3
The RFC server sends the request to the index server
The index server creates a new index called BooksOnline
The new index is still empty but any documents to be indexed can
now be assigned to it
[Diagram: ABAP client, gateway, RFC server, and TREX services. Callouts: "I want to create a new index called BooksOnline" and "New index created successfully!"]
Indexing Example 4
The administrator sends a request to index the new books in a
specified folder and write the results in the BooksOnline index
The digital files for the books are in a variety of formats, but TREX
can handle all standard formats, such as Microsoft Word (.doc),
Adobe Portable Document Format (.pdf), and plain text (.txt)
The name server directs the request to an available queue server
[Diagram: ABAP client, gateway, RFC server, and TREX services. Note: "Queueing is an option: Indexing can also be done immediately"]
Indexing Example 5
[Document icons: .htm, .pdf, .xls, .doc, .ppt, .txt]
The queue server receives the list of URLs for the documents
from the specified folder and persists them in a queue for the
index for as long as required until a preprocessor is available
Indexing a large collection of documents can be a long job, so the
administrator can hold or flush the queue manually at any time
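The queueing behavior can be sketched with a small in-memory class. Everything here is an assumption: the class and method names are hypothetical, "hold" is taken to mean pausing normal dispatch, "flush" is taken to mean sending everything queued right away, and the real queue server persists its queue rather than keeping it in memory.

```python
from collections import deque


class IndexQueueSketch:
    """Hypothetical per-index queue of document URLs (in memory only)."""

    def __init__(self, index_name):
        self.index_name = index_name
        self._pending = deque()
        self.on_hold = False

    def enqueue(self, urls):
        self._pending.extend(urls)

    def hold(self):
        self.on_hold = True

    def release(self):
        self.on_hold = False

    def dispatch_next(self, send):
        """Send one queued URL unless the queue is on hold; return True if sent."""
        if self.on_hold or not self._pending:
            return False
        send(self._pending.popleft())
        return True

    def flush(self, send):
        """Send everything that is queued right now; return the number sent."""
        sent = 0
        while self._pending:
            send(self._pending.popleft())
            sent += 1
        return sent


# Hypothetical document URLs
urls = ["http://example.com/books/a.pdf", "http://example.com/books/b.doc"]
received = []
queue = IndexQueueSketch("BooksOnline")
queue.enqueue(urls)
queue.hold()
sent_while_held = queue.dispatch_next(received.append)
flushed = queue.flush(received.append)
print(sent_while_held, flushed)
```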
[Diagram: ABAP client, gateway, RFC server, and TREX services (name server, preprocessor, queue server, and the text mining, text search, and attribute engines with their indexes)]
Indexing Example 6
The queue server sends the documents to a free preprocessor
[Diagram: ABAP client, gateway, RFC server, and TREX services, with document icons (.htm, .pdf, .xls, .doc, .ppt, .txt)]
Indexing Example 7
The index server forwards the documents to the search engine
For each document, the search engine writes a list of all its terms
and for each term it writes a list of positions in the document
where the term appears
The engine merges the term list for each document to the existing
term-document matrix that forms the BooksOnline index
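The two steps above (a positional term list per document, then a merge into the existing index) can be sketched as follows. The dictionary layout and function names are hypothetical stand-ins for the term-document matrix; the real storage format is not described in the slides.

```python
from collections import defaultdict


def term_positions(tokens):
    """Map each term of one document to the list of positions where it appears."""
    positions = defaultdict(list)
    for pos, term in enumerate(tokens):
        positions[term].append(pos)
    return dict(positions)


def merge_into_index(index, doc_id, tokens):
    """Merge one document's term list into index (term -> doc_id -> positions)."""
    for term, positions in term_positions(tokens).items():
        index.setdefault(term, {})[doc_id] = positions
    return index


# Hypothetical, already preprocessed documents
index = {}
merge_into_index(index, "book1", ["invoice", "verification", "invoice"])
merge_into_index(index, "book2", ["invoice", "audit"])
print(index["invoice"])
```

Each term maps to the documents containing it and to the positions within each document, which is the information a phrase search needs later on.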
[Diagram: ABAP client, gateway, RFC server, and TREX services (name server, preprocessor, queue server, and index server containing the text mining, text search, and attribute engines and their indexes)]
Indexing Example 8
The BooksOnline indexing administrator can use the TREX queue
and index administration tool to display the status of the indexing
process at any time during the process
[Diagram: ABAP client connected to TREX via the gateway]
TREX Administration Tool
ABAP
Restricted feature set
Easy access on customer systems
Java
Highly restricted feature set
Browser access via Portal
Landscape Configuration
Landscape Example
IS: Index server
M: Master
MI: Master index
NS: Name server
PP: Preprocessor
Q: Queue
QS: Queue server
RFC: RFC server
SN: Snapshots
S: Slave
SI: Slave index
WS: Web server
[Diagram: landscape with one master host (mytrexmaster: WS, RFC, M NS, M QS, M IS) and slave hosts (mytrexslave1 ... 2: WS, RFC, S NS, S IS, PP); the diagram also shows queues (Q), a master index (MI), snapshots (SN), and slave indexes (SI)]
[Diagram: landscape with two master hosts (mytrexmaster1 and mytrexmaster2: WS, RFC, M NS, M QS, M IS, PP), a backup host (mytrexbackup: WS, RFC, M NS, B QS, B IS, PP), slave hosts (mytrexslave1/2 and mytrexslave3/4: WS, RFC, S NS, S IS, PP), and a file server; the diagram also shows queues (Q), master indexes (MI), snapshots (SN), and slave indexes (SI)]
[Diagram: the same landscape with a second backup host (mytrexbackup2: WS, RFC, S NS, B QS, B IS, PP)]
http://trex.wdf.sap.corp:1080/
Documentation
Landscape Configuration
DEMO
RFC Connection
DEMO
Administrating, Monitoring
Reorg I
Reorg II
Reorg III
Reorg IV
Alert area
TREX Traces
DEMO
The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior
written permission of SAP AG.
This document is a preliminary version and not subject to your license agreement or any other agreement with SAP. This document contains only intended strategies, developments,
and functionalities of the SAP product and is not intended to be binding upon SAP to any particular course of business, product strategy, and/or development. Please note that this
document is subject to change and may be changed by SAP at any time without notice.
SAP assumes no responsibility for errors or omissions in this document. SAP does not warrant the accuracy or completeness of the information, text, graphics, links, or other items
contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability,
fitness for a particular purpose, or non-infringement.
SAP shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This
limitation shall not apply in cases of intent or gross negligence.
The statutory liability for personal injury and defective products is not affected. SAP has no control over the information that you may access through the use of hot links contained in
these materials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.