
Presenting Lily

Bay Area HBase UG - NYC - 10/11/2010

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org


Devoxx: Nov. 15-19, Antwerp, Belgium
NoSQL/Cloud track



Outerthought

» software product company

» scalable content applications

» open source product portfolio

» Java, REST, internet




Technology

» Lily: NoSQL-based content repository (HBase + SOLR)
» Kauri: REST-centric webapp dev framework
» Daisy: techdoc / QDoc / publishing CMS



Needs for Scalable Content
» wire-speed capturing ➡ NoSQL & write-optimized storage
» batch-oriented post-processing ➡ map/reduce
» semantic lifting: extracting knowledge out of noise ➡ Natural Language Processing
» data and inferred data become one ➡ smart content repositories



The Lily Project

[Diagram: layered stack, top to bottom]
» customers: cloud-scale content applications
» partners: REST-centric content app UI framework; alternative indexes; batch processing and augmentation (enrichment); ins and outs; process coordination
» us: content repository: store + search



Lily essentials

» www.lilyproject.org

» Apache license for maximal flexibility

» (lots of) documentation at docs.outerthought.org



Lily content repository

» scalable store (HBase) and search (SOLR)
» flexible content model
» index maintenance
» high-level API
» base foundation

[Diagram labels: content application, repository]



HBase
» a data model in which some column families keep all versions and others do not, which maps very well onto our CMS document model
» ordered tables with the ability to do range scans, which allows us to build scalable indexes on top of them
» HDFS, a convenient place to store large blobs
» Apache license and community, a familiar environment for us
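
The first two points can be illustrated with a toy in-memory model. This is only a sketch in Python, not the HBase API: `ToyTable`, `max_versions`, and the family names `versioned` and `meta` are hypothetical, chosen to show per-family version retention and a range scan over sorted row keys.

```python
import bisect


class ToyTable:
    """Toy model of an ordered table with per-family version retention."""

    def __init__(self, max_versions):
        # max_versions: family name -> number of versions kept (None = keep all)
        self.max_versions = max_versions
        self.row_keys = []  # kept sorted, like row keys in an ordered table
        self.cells = {}     # row key -> {(family, qualifier): [(ts, value), ...]}

    def put(self, row, family, qualifier, value, ts):
        if row not in self.cells:
            bisect.insort(self.row_keys, row)
            self.cells[row] = {}
        versions = self.cells[row].setdefault((family, qualifier), [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        limit = self.max_versions[family]
        if limit is not None:
            del versions[limit:]  # drop versions beyond the retention limit

    def get(self, row, family, qualifier):
        """Return the retained (timestamp, value) versions, newest first."""
        return list(self.cells.get(row, {}).get((family, qualifier), []))

    def scan(self, start, stop):
        """Return the row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return self.row_keys[lo:hi]


table = ToyTable({"versioned": None, "meta": 1})
table.put("doc-1", "versioned", "title", "Draft", ts=1)
table.put("doc-1", "versioned", "title", "Final", ts=2)
table.put("doc-1", "meta", "owner", "alice", ts=1)
table.put("doc-1", "meta", "owner", "bob", ts=2)
print(table.get("doc-1", "versioned", "title"))  # both versions retained
print(table.get("doc-1", "meta", "owner"))       # only the newest retained
```

The `versioned` family plays the role of document content whose history matters; the `meta` family keeps only the current value.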



1. Store, 2. Search...? Ouch.

» CMS = two types of search
    » structured, ‘logic’ search
        » numbers, strings
        » based on logic (SQL, anyone?)
    » information retrieval (or: full-text search)
        » text
        » based on statistics



Search ponderings

» All of that, at scale



Structured Search
» HBase Indexing Library
    » idea from Google App Engine datastore indexes
    » http://code.google.com/appengine/articles/index_building.html

content table:

rowkey  col   col
A       val3  foo6
B       val2  foo7

index table (rows kept in rowkey order):

rowkey  col
val2-B
val3-A
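
The index-table idea above can be sketched with plain Python data structures. This is an illustration under assumptions, not the HBase Indexing Library API: the function names `build_index` and `find_eq`, the column names `col1` and `col2`, and the `-` separator in index keys are hypothetical (the separator is borrowed from the `val2-B` notation on the slide). Each index row key is the indexed value followed by the content row key, so a lookup by value becomes a prefix range scan over the sorted index keys.

```python
import bisect

# Content table from the slide: row key -> {column: value}.
content = {
    "A": {"col1": "val3", "col2": "foo6"},
    "B": {"col1": "val2", "col2": "foo7"},
}


def build_index(table, column):
    """Return sorted index keys of the form '<value>-<content row key>'."""
    return sorted(
        f"{row[column]}-{key}" for key, row in table.items() if column in row
    )


def find_eq(index, value):
    """Range-scan the sorted index for content row keys holding this value."""
    prefix = value + "-"
    start = bisect.bisect_left(index, prefix)
    hits = []
    for index_key in index[start:]:
        if not index_key.startswith(prefix):
            break  # left the key range for this value
        hits.append(index_key[len(prefix):])
    return hits


index = build_index(content, "col1")
print(index)                   # ['val2-B', 'val3-A']
print(find_eq(index, "val3"))  # ['A']
```

A string separator is enough for this toy data; a real index would need a key encoding that sorts correctly for numbers and for values that contain the separator.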



Full-text / IR search

» Lucene?
    » no sharding (for scale)
    » no replication (for availability)
    » batched index updates (not real-time)



Beyond Lucene
» Katta
    » scalable architecture, however only search, no indexing
» Elastic Search
    » very young (sorry)
» hbasene et al.
    » stores inverted index in HBase, does not scale all features
» SOLR
    » widely used, schema, facets, query syntax, cloud branch



➙ Need for reliable queuing



Connecting things
» we needed a reliable bridge between our main storage (HBase) and our index/search server(s) (SOLR)
» indexing, reindexing, mass reindexing (M/R)
» we needed a reliable method of updating HBase secondary indexes
» all of that eventually has to run distributed
» distribution means coping with failure



Solution

» ... a QUEUE! (Meh)
» ACMEMessageQueue? Bzzzzzt. We wanted fault-safe HBase persistence for the queues. Also for ease of administration.
» ➙ WAL & Queue implemented on top of HBase tables



WAL & Queue = RowLog Library
» WAL
    » guaranteed execution of synchronous actions
    » call doesn’t return before secondary action finishes
    » e.g. update secondary indexes
    » if all goes well, size = #concurrent ops
» Queue
    » triggering of async actions
    » e.g. (re)index (updated) record with SOLR back-end
    » size depends on speed of back-end process
» useful outside of Lily context as well!
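
The split between the two roles can be sketched in a few lines of Python. This is a toy in-memory model, not the RowLog API: the class `ToyRowLog`, its methods `put` and `drain`, and the subscriber lists are hypothetical names. Synchronous (WAL) actions run before the call returns; asynchronous (queue) work is stored as messages and processed later by a separate step.

```python
from collections import deque


class ToyRowLog:
    """Toy model: WAL-style synchronous actions plus a queue of async messages."""

    def __init__(self):
        self.sync_subscribers = []   # e.g. secondary-index updaters
        self.async_subscribers = []  # e.g. a full-text indexer
        self.wal = {}                # message id -> payload with pending sync work
        self.queue = deque()         # payloads waiting for async processing
        self.next_id = 0

    def put(self, payload):
        """Log the message, run the sync actions, then enqueue the async work."""
        msg_id = self.next_id
        self.next_id += 1
        self.wal[msg_id] = payload   # a real system would persist this first
        for action in self.sync_subscribers:
            action(payload)          # on an exception the WAL entry stays for a retry
        del self.wal[msg_id]         # all sync actions finished
        self.queue.append(payload)
        return msg_id

    def drain(self):
        """Process queued messages in order; return how many were handled."""
        handled = 0
        while self.queue:
            payload = self.queue[0]
            for action in self.async_subscribers:
                action(payload)
            self.queue.popleft()     # removed only after every action succeeded
            handled += 1
        return handled


secondary_index = []
search_backend = []
log = ToyRowLog()
log.sync_subscribers.append(secondary_index.append)
log.async_subscribers.append(search_backend.append)

log.put({"id": "A", "title": "hello"})
print(len(secondary_index), len(search_backend))  # index updated, search not yet
log.drain()
print(len(secondary_index), len(search_backend))  # search back-end caught up
```

In this model the WAL holds only in-flight entries, while the queue grows whenever `drain` runs slower than `put`, which mirrors the size remarks on the slide.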



The Sum
» Lily model (records & fields)
» mapped onto HBase (= storage)
» indexed and searchable through SOLR
» using a WAL/Queue mechanism implemented in HBase
» runtime based on Kauri
» with client/server comms via Avro (and a REST interface with JSON)



Architecture

[Architecture diagrams, two slides; images not preserved]


Lily roadmap
» development started Sept. 2009
» development trunk opened Jul. 2010
» end of Oct. 2010: milestone/beta release
    » fully distributable
    » spec-complete
» Onwards:
    » ‘business-level’ 1.0 release (packaging, testing, performance)
    » user/auth management & access control
    » UI framework (Kauri)
    » ins and outs, semantic lifting



Thanks for your hospitality and attention!

» stevenn@outerthought.org
» @stevenn
