
Introducing Xapian

Justin Finkelstein | @ilithium PHP London, November 2011

Background and Alternatives


ReportBuyer.com

235,000 reports
1.3 GB of text
Hierarchical categories
MySQL FullText

Search alternatives:
  Sphinx
  Lucene, etc.


Benefits
Easy to install and portable
Fast searching
Accurate
Powerful


Drawbacks
Not a database
Single writer, many readers
Limited to 4.2 billion documents
OS file size limit


Installation
Binaries for Windows
Vendor packages & PPA
Source code
Bindings: PHP, C#, Java, Lua, Perl, Python, etc.



Indexing
Databases
Documents
  Document IDs must be unique
Terms & Stemmers
Term Generator
Values


Querying the Database


Simple Queries

Phrases: php development
Logical operators: OR, AND, NOT, MAYBE
Ranges: alpha..omega
NEAR: shop NEAR pub
Wildcards: report*
Synonyms

Query Parser makes it easy


data management AND NOT real estate AND NEAR data


Relevance and Sorting


BM25 Probabilistic Relevancy

Sort by rank/relevance
Sort by values
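
To make the "sort by rank/relevance" idea concrete, here is a minimal pure-Python sketch of the textbook BM25 formula. It is not Xapian's code: the function name `bm25_scores`, the parameter defaults (`k1=1.2`, `b=0.75`) and the three toy documents are all assumptions made for illustration, and Xapian's own BM25 weighting scheme may differ in its parameters and defaults.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query with textbook BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Longer-than-average documents are penalised through this factor.
        length_norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # The "1 +" inside the log keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Invented toy corpus, already split into lower-case terms.
docs = [
    "php development market report".split(),
    "real estate market report".split(),
    "data management and php data tools".split(),
]
scores = bm25_scores(["php", "data"], docs)
# Sorting by rank means ordering documents by descending score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Sorting by a value instead would ignore these scores and order the matches by a stored field such as price or publication date.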


Getting Started
Know your data set
What are users looking for?
How will they refine their search?


Report Buyer Product Data


item_guid
title
subtitle
summary
table of contents
price
category
publication date
availability
product url

Searching on Report Buyer


Search by:
  Product code
  Category
  Title
  Price

Search text of:
  Title
  Subtitle
  Summary
  Table of Contents

Refine by:
  Price
  Availability


Mapping to Xapian
Full text with weighting:
  name
  subtitle
  summary
  table of contents

Values:
  price
  availability
  publication date

Facets:
  Category
  Availability

Text with prefixes:
  title
  product code
  category
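
The mapping on this slide can be sketched as a small pure-Python planning step that runs before any Xapian call. Everything specific here is an assumption for illustration: the slot numbers, the weights, the record keys (for example `product_code`) and the sample record are invented, and the talk does not say which prefixes it used. "S" for title and "Q" for a unique ID follow a common Xapian convention, while "XCAT" is a made-up user-defined prefix.

```python
# Value slots are just non-negative integers; these numbers are arbitrary.
VALUE_SLOTS = {"price": 0, "availability": 1, "publication_date": 2, "category": 3}

# Prefixes for terms that are searched by field (assumed, see above).
PREFIXES = {"title": "S", "product_code": "Q", "category": "XCAT"}

# Hypothetical weights for the free-text fields; higher means more important.
FULL_TEXT_WEIGHTS = {"title": 5, "subtitle": 3, "summary": 2, "table_of_contents": 1}

# Facets are counted over stored values, so each facet field needs a slot.
FACET_FIELDS = ("category", "availability")


def plan_document(record):
    """Group one product record the way the slide's four lists suggest."""
    full_text = [
        (record[field], weight)
        for field, weight in FULL_TEXT_WEIGHTS.items()
        if record.get(field)
    ]
    prefixed_terms = [
        prefix + word.lower()
        for field, prefix in PREFIXES.items()
        for word in str(record.get(field, "")).split()
    ]
    values = {
        slot: str(record[field])
        for field, slot in VALUE_SLOTS.items()
        if field in record
    }
    facet_slots = [VALUE_SLOTS[field] for field in FACET_FIELDS]
    return {
        "full_text": full_text,
        "prefixed_terms": prefixed_terms,
        "values": values,
        "facet_slots": facet_slots,
    }


# Invented sample record, not real Report Buyer data.
record = {
    "product_code": "RB1234",
    "title": "PHP Development",
    "subtitle": "Market Outlook",
    "summary": "An invented example record.",
    "price": 495,
    "category": "Software",
    "availability": "in stock",
    "publication_date": "2011-11-01",
}
plan = plan_document(record)
```

In a real indexer each group would then be handed to the matching Xapian facility: weighted text and prefixed text to the term generator, values to the document's value slots. Numeric values such as price would normally need a sortable encoding rather than a plain string before range queries or sorting behave as expected.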


Demo Walk-throughs
Indexing the data
Query parser
Sorting
MatchSpies


The End
http://readthedocs.org/docs/getting-started-with-xapian/
www.redwiredesign.com
blog.ilithium.com

