
Welcome to Lucene!

Table of contents
1 What Is Lucene?
2 News
2.1 25 June 2010 - Solr 1.4.1 Released
2.2 18 June 2010 - Lucene Java 3.0.2 and 2.9.3 available
2.3 11 May 2010 - Nutch and Tika Graduate
2.4 7 May 2010 - Apache Lucene Eurocon 2010 Coming to Prague May 18-21
2.5 17 March 2010 - Apache Mahout 0.3 released
2.6 26 February 2010 - Lucene Java 3.0.1 and 2.9.2 available
2.7 25 November 2009 - Lucene Java 3.0.0 available
2.8 17 November 2009 - Apache Mahout 0.2 released
2.9 10 November 2009 - Solr 1.4 Released
2.10 6 November 2009 - Lucene Java 2.9.1 available
2.11 25 September 2009 - Lucene Java 2.9.0 available
2.12 14 August 2009 - Lucene at US ApacheCon
2.13 25 June 2009 - Apache Open Relevance Kickoff
2.14 07 April 2009 - Apache Mahout 0.1 released
2.15 9 March 2009 - Lucene Java 2.4.1 available
2.16 09 February 2009 - Lucene at ApacheCon Europe 2009 in Amsterdam
2.17 19 January 2009 - PyLucene joins the Lucene TLP
2.18 8 October 2008 - Lucene Java 2.4.0 available
2.19 15 September 2008 - Solr 1.3.0 Available

Copyright © 2009 The Apache Software Foundation. All rights reserved.


1. What Is Lucene?
The Apache Lucene project develops open-source search software, including:
• Lucene Java, our flagship sub-project, provides Java-based indexing and search
technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization
capabilities.
• Solr is our high performance enterprise search server, with XML/HTTP and
JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, database
integration, web admin and search interfaces.
• Lucene.Net is a source code, class-per-class, API-per-API and algorithmic port of the
Lucene Java search engine to the C# and .NET platform utilizing Microsoft .NET
Framework. Lucene.Net is currently under incubation.
• PyLucene is a Python port of the Lucene Java project.
• Open Relevance Project is a subproject with the aim of collecting and distributing free
materials for relevance testing and performance.
• Lucy is a loose C port of Lucene Java, with Perl and Ruby bindings, currently in
incubation.
• Droids is an intelligent robot crawling framework currently in incubation.

2. News

2.1. 25 June 2010 - Solr 1.4.1 Released


Solr 1.4.1 has been released and is now available for public download! Solr 1.4.1 is a bug fix
release for Solr 1.4 that includes many Solr bug fixes as well as Lucene bug fixes from
Lucene 2.9.3.
See the release notes for more details.

2.2. 18 June 2010 - Lucene Java 3.0.2 and 2.9.3 available


Both releases fix bugs in the previous versions:
• 2.9.3 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4.
• 3.0.2 has the same bug fix level but is for the Lucene Java 3.x series, based on Java 5.
New users of Lucene are advised to use version 3.0.2 for new developments, because it has a
clean, type-safe API.
Important improvements in these releases include:
• Fixed memory leaks in IndexWriter when large documents are indexed. It also now uses
shared memory pools for term vectors and stored fields. IndexWriter now
releases Fieldables and Readers on close.
• NativeFSLockFactory fixes and improvements. The write lock is now released if an
exception occurs in IndexWriter constructors.
• Improved concurrency of IndexReader, especially in the context of near real-time
readers.
• Near real-time readers, opened while addIndexes* is running, no longer miss some
segments.
• Performance improvements in ParallelMultiSearcher (3.0.2 only).
• IndexSearcher no longer throws NegativeArraySizeException if you pass
Integer.MAX_VALUE as nDocs to search methods.
Both releases are fully compatible with the corresponding previous versions. We strongly
recommend upgrading to 2.9.3 if you are using 2.9.x; and to 3.0.2 if you are using 3.0.x.
See 3.0.2 CHANGES and 2.9.3 CHANGES for details. Binary and source distributions
are available here. Maven artifacts are available here.
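The Integer.MAX_VALUE item in the list above can be pictured with a small sketch. It assumes, purely for illustration, that a result queue sizes its backing array as the requested hit count plus one extra slot; the class and method names are hypothetical and this is not Lucene source code, nor necessarily how the actual fix was made.

```java
/** Illustrative only: hypothetical names, not Lucene source code. */
final class HitQueueSizing {

    /** Naive sizing: one extra slot on top of the requested hit count. */
    static int naiveHeapLength(int nDocs) {
        // Wraps around to Integer.MIN_VALUE when nDocs == Integer.MAX_VALUE,
        // so "new Object[naiveHeapLength(nDocs)]" would throw NegativeArraySizeException.
        return nDocs + 1;
    }

    /** Guarded sizing: never ask for more slots than there are documents. */
    static int guardedHeapLength(int nDocs, int maxDoc) {
        int cap = Math.min(maxDoc, Integer.MAX_VALUE - 1); // leave room for the extra slot
        return Math.min(nDocs, cap) + 1;
    }

    public static void main(String[] args) {
        System.out.println(naiveHeapLength(Integer.MAX_VALUE));         // prints -2147483648
        System.out.println(guardedHeapLength(Integer.MAX_VALUE, 1000)); // prints 1001
    }
}
```

Clamping to the number of documents is one possible guard; it also avoids allocating a very large array when only a few documents can match.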

2.3. 11 May 2010 - Nutch and Tika Graduate


Lucene's Nutch and Tika subprojects have graduated to become top-level Apache projects.
Nutch can now be found at http://nutch.apache.org/ and Tika can be found at
http://tika.apache.org.

2.4. 7 May 2010 - Apache Lucene Eurocon 2010 Coming to Prague May 18-21
On May 18th to the 21st Prague will play host to the first ever dedicated Lucene and Solr
User Conference in Europe: Apache Lucene Eurocon 2010. This is a not-for-profit
conference presented by Lucid Imagination, with net proceeds being donated to The Apache
Software Foundation. Registration is now open. Schedule highlights include:
• Two days of in-depth training classes:
• Solr Application Development Workshop - Erik Hatcher
• Lucene Bootcamp - Grant Ingersoll
• Four general sessions:
• The Search Revolution: How Lucene & Solr Are Changing The World - Eric Gries
• From Publisher To Platform: How The Guardian Used Content, Search, and Open
Source To Build a Powerful New Business Model - Stephen Dunn
• Software Disruption: How Using Open Source, Search, Big Data and Cloud
technology are Disrupting IT - Zack Urlocker
• Solr 1.5 and Beyond - Yonik Seeley
• 24 technical sessions, spanning two days, divided into two tracks
• A Thursday night MeetUp
• An event at the Czech Beer Festival

2.5. 17 March 2010 - Apache Mahout 0.3 released


The Apache Lucene project is pleased to announce the release of Apache Mahout 0.3.
Highlights include:
• New: math and collections modules based on the high performance Colt library
• Faster Frequent Pattern Growth (FPGrowth) using FP-bonsai pruning
• Parallel Dirichlet process clustering (model-based clustering algorithm)
• Parallel co-occurrence based recommender
• Parallel text document to vector conversion using LLR based ngram generation
• Parallel Lanczos SVD (Singular Value Decomposition) solver
• Shell scripts for easier running of algorithms, utilities and examples
• ... and much much more: code cleanup, many bug fixes and performance improvements
Details on what's included can be found in the release notes.
Downloads are available from the Apache Mirrors.

2.6. 26 February 2010 - Lucene Java 3.0.1 and 2.9.2 available


Both releases fix bugs in the previous versions:
• 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4.
• 3.0.1 has the same bug fix level but is for the Lucene Java 3.x series, based on Java 5.
New users of Lucene are advised to use version 3.0.1 for new developments, because it has a
clean, type-safe API.
Important improvements in these releases include:
• An increased maximum number of unique terms in each index segment.
• Fixed experimental CustomScoreQuery to respect per-segment search. This
introduced an API change!
• Important fixes to IndexWriter: a commit() thread-safety issue, lost document deletes
in near real-time indexing.
• Bugfixes for Contrib's Analyzers package.
• Restoration of some public methods that were lost during deprecation removal (3.0.1
only).
• The new Attribute-based TokenStream API now works correctly with different class
loaders.
Both releases are fully compatible with the corresponding previous versions. We strongly
recommend upgrading to 2.9.2 if you are using 2.9.1 or 2.9.0; and to 3.0.1 if you are using
3.0.0.
See 3.0.1 CHANGES and 2.9.2 CHANGES for details. Binary and source distributions
are available here. Maven artifacts are available here.

2.7. 25 November 2009 - Lucene Java 3.0.0 available


The new version is mostly a cleanup release without any new features. All deprecations
targeted to be removed in version 3.0 were removed. If you are upgrading from version 2.9.1
of Lucene, you have to fix all deprecation warnings in your code base to be able to recompile
against this version.
This is the first Lucene release with Java 5 as a minimum requirement. The API was cleaned
up to make use of Java 5's generics, varargs, enums, and autoboxing. New users of Lucene
are advised to use this version for new developments, because it has a clean, type-safe
API. Upgrading users can now remove unnecessary casts and add generics to their code, too.
If you have not upgraded your installation to Java 5, please read the file
JRE_VERSION_MIGRATION.txt (please note that this is not related to Lucene 3.0, it
will also happen with any previous release when you upgrade your Java environment).
Lucene 3.0 has some changes regarding compressed fields: 2.9 already deprecated
compressed fields; support for them has now been removed. Lucene 3.0 is still able to read
indexes with compressed fields, but as soon as merges occur or the index is optimized, all
compressed fields are decompressed and converted to Field.Store.YES. Because of
this, indexes with compressed fields can suddenly get larger.
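To get a feel for the size difference described above, the following sketch compresses a repetitive value with plain java.util.zip and restores it again. It is only an illustration of why stored data grows once it is kept uncompressed, and of how an application could compress values itself before storing them as binary data; the class and method names are hypothetical and no Lucene classes are involved.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Illustrative only: plain java.util.zip, hypothetical names, no Lucene classes. */
final class StoredValueCompressionSketch {

    /** Deflate-compresses a value before the application stores it as binary data. */
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original bytes after the application reads the stored value back. */
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Small helper for building repetitive sample text. */
    static String repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder(unit.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }
}
```

For repetitive text the compressed form is much smaller than the raw bytes, which is the gap an index would grow by once such values are stored uncompressed.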
While we generally try and maintain full backwards compatibility between major versions,
Lucene 3.0 has some minor breaks, mostly related to deprecation removal, pointed out in the
'Changes in backwards compatibility policy' section of CHANGES.txt. Notable are:
• IndexReader.open(Directory) now opens in read-only mode by default (this
method was deprecated because of that in 2.9). The same applies to IndexSearcher.
• Continuing a change started in 2.9, core TokenStreams are now final to enforce the decorator
pattern.
• If you interrupt an IndexWriter merge thread, IndexWriter now throws an
unchecked ThreadInterruptedException that extends RuntimeException
and clears the interrupt status.
See CHANGES for details.

Binary and source distributions are available here. Maven artifacts are available here.
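The interrupt behaviour listed among the notable changes above can be sketched with standard Java only. The exception class below is a hypothetical stand-in for an unchecked wrapper such as ThreadInterruptedException, and Thread.sleep stands in for waiting on a merge; the point is simply that the interrupt flag is already clear by the time the unchecked exception reaches the caller.

```java
/** Illustrative only: hypothetical names, standard Java threading, no Lucene classes. */
final class InterruptSketch {

    /** Hypothetical unchecked wrapper around InterruptedException. */
    static final class UncheckedInterrupt extends RuntimeException {
        UncheckedInterrupt(InterruptedException cause) {
            super(cause);
        }
    }

    /** Hypothetical blocking step; Thread.sleep stands in for waiting on a merge. */
    static void waitForWork() {
        try {
            Thread.sleep(10_000);
        } catch (InterruptedException ie) {
            // Thread.sleep has already cleared the interrupt flag at this point.
            throw new UncheckedInterrupt(ie);
        }
    }

    /** Returns true if the flag is clear when the unchecked exception reaches the caller. */
    static boolean flagClearedAfterInterrupt() {
        Thread.currentThread().interrupt(); // simulate an interrupt delivered to this thread
        try {
            waitForWork();
            return false; // not reached: sleep throws immediately when the flag is set
        } catch (UncheckedInterrupt e) {
            return !Thread.currentThread().isInterrupted();
        }
    }
}
```

A caller that wants code further up the stack to still see the interrupt can call Thread.currentThread().interrupt() again inside its catch block.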

2.8. 17 November 2009 - Apache Mahout 0.2 released


The Apache Lucene project is pleased to announce the release of Apache Mahout 0.2.
Highlights include:
• Significant performance increase (and API changes) in collaborative filtering engine
• K-nearest-neighbor and SVD recommenders
• Much code cleanup, bug fixing
• Random forests, frequent pattern mining using parallel FP growth
• Latent Dirichlet Allocation
• Updates for Hadoop 0.20.x
Details on what's included can be found in the release notes.
Downloads are available from the Apache Mirrors.

2.9. 10 November 2009 - Solr 1.4 Released


Solr 1.4 has been released and is now available for public download! New Solr 1.4 features
include:
• Major performance enhancements in indexing, searching, and faceting
• Revamped all-Java index replication that's simple to configure and can replicate config
files
• Greatly improved database integration via the DataImportHandler
• Rich document processing (Word, PDF, HTML) via Apache Tika
• Dynamic search results clustering via Carrot2
• Multi-select faceting (support for multiple items in a single category to be selected)
• Many powerful query enhancements, including ranges over arbitrary functions, nested
queries of different syntaxes
• Many other plugins including Terms for auto-suggest, Statistics, TermVectors,
Deduplication
See the release notes for more details.

2.10. 6 November 2009 - Lucene Java 2.9.1 available


This release fixes bugs from 2.9.0, including one serious bug whereby BooleanQuery could
silently fail to retrieve certain matching documents.

There are also some minor API changes, including a Version parameter added to
QueryParser and contrib Analyzers, so that version dependent defaults are consistent across
classes, as well as the un-deprecation of certain methods (we were too zealous in a few cases!).
Otherwise the changes are all bug fixes and documentation improvements.
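The idea of a version parameter that pins defaults can be sketched as follows. Everything here is hypothetical (the enum, its constants and the parser class are invented for illustration and are not the Lucene API); the sketch only shows how a component can choose its default behaviour from the compatibility level the caller asks for.

```java
/** Illustrative only: hypothetical names, not the Lucene API. */
final class VersionedDefaults {

    /** Hypothetical compatibility levels, declared oldest first. */
    enum CompatVersion {
        V2_4, V2_9, V3_0;

        /** True if this level is the same as, or newer than, the other one. */
        boolean onOrAfter(CompatVersion other) {
            return compareTo(other) >= 0;
        }
    }

    /** Hypothetical component whose default depends on the requested version. */
    static final class SketchParser {
        final boolean newBehavior;

        SketchParser(CompatVersion matchVersion) {
            // Callers asking for an older level keep the old default.
            this.newBehavior = matchVersion.onOrAfter(CompatVersion.V2_9);
        }
    }
}
```

Passing the same version value to every component is what keeps their defaults consistent with each other, and an application can keep its old behaviour after upgrading by continuing to pass the older level.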
This release is fully compatible with 2.9.0. We strongly recommend upgrading to 2.9.1 if you
are using 2.9.0. Furthermore, because some additional APIs were deprecated in 2.9.1, to
ensure a clean ("JAR drop in") upgrade to 3.0 you'll need to ensure your code compiles
against 2.9.1 without deprecation warnings.
See CHANGES for details.
Binary and source distributions are available here.
Maven artifacts are available here.

2.11. 25 September 2009 - Lucene Java 2.9.0 available


This release has many improvements since release 2.4.1, including:
• Per segment searching and caching (can lead to much faster reopen among other things)
• Near real-time search capabilities added to IndexWriter
• New Query types
• Smarter, more scalable multi-term queries (wildcard, range, etc)
• A freshly optimized Collector/Scorer API
• Improved Unicode support and the addition of Collation contrib
• A new Attribute based TokenStream API
• A new QueryParser framework in contrib with a core QueryParser replacement
implementation included.
• Scoring is now optional when sorting by Field, or using a custom Collector, gaining
sizable performance when scores are not required.
• New analyzers (PersianAnalyzer, ArabicAnalyzer, SmartChineseAnalyzer)
• New fast-vector-highlighter for large documents
• Lucene now includes high-performance handling of numeric fields. Such fields are
indexed with a trie structure, enabling simple to use and much faster numeric range
searching without having to externally pre-process numeric values into textual values.
See CHANGES for details.
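The trie idea behind the numeric field support listed above can be pictured with a simplified sketch: each value is indexed as several terms of decreasing precision, so a range query can match a whole block of values with one lower-precision term. The code below is an assumption-laden illustration (non-negative int values, a hypothetical "shift:prefix" term format) and is not Lucene's actual term encoding.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a simplified picture of trie-style numeric terms, not Lucene's encoding. */
final class PrefixTermsSketch {

    /** Emits one term per precision level, written as "shift:prefix" with prefix = value >>> shift. */
    static List<String> prefixTerms(int value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be at least 1");
        }
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < Integer.SIZE; shift += precisionStep) {
            terms.add(shift + ":" + Integer.toHexString(value >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [0:12345678, 8:123456, 16:1234, 24:12]
        System.out.println(prefixTerms(0x12345678, 8));
    }
}
```

In this picture every value from 0x12345600 to 0x123456ff shares the term "8:123456", so a range covering that block needs one term instead of 256 full-precision ones. A smaller precision step gives more terms per value and fewer terms per query.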
While we generally try and maintain full backwards compatibility between major versions,
Lucene 2.9 has a variety of breaks that are spelled out in the 'Changes in backwards
compatibility policy' section of CHANGES. We recommend that you recompile your
application with Lucene 2.9 rather than attempting to drop it in. This will alert you to any
issues you may have to fix if you are affected by one of the backward compatibility breaks.
Binary and source distributions are available here.
Maven artifacts are available here.

2.12. 14 August 2009 - Lucene at US ApacheCon


ApacheCon US is once again in the Bay Area and Lucene is coming along for the ride! The
Lucene community has planned two full days of talks, plus a meetup and the usual bevy of
training. With a well-balanced mix of first time and veteran ApacheCon speakers, the Lucene
track at ApacheCon US promises to have something for everyone. Be sure not to miss:
Training:
• Lucene Boot Camp - A two day training session, Nov. 2nd & 3rd
• Solr Day - A one day training session, Nov. 2nd
Thursday, Nov. 5th
• Introduction to the Lucene Ecosystem - Grant Ingersoll @ 9:00
• Lucene Basics and New Features - Michael Busch @ 10:00
• Apache Solr: Out of the Box - Chris Hostetter @ 14:00
• Introduction to Nutch - Andrzej Bialecki @ 15:00
• Lucene and Solr Performance Tuning - Mark Miller @ 16:30
Friday, Nov. 6th
• Implementing an Information Retrieval Framework for an Organizational Repository -
Sithu D Sudarsan @ 9:00
• Apache Mahout - Going from raw data to Information - Isabel Drost @ 10:00
• MIME Magic with Apache Tika - Jukka Zitting @ 11:30
• Building Intelligent Search Applications with the Lucene Ecosystem - Ted Dunning @
14:00
• Realtime Search - Jason Rutherglen @ 15:00

2.13. 25 June 2009 - Apache Open Relevance Kickoff


The Apache Lucene PMC has officially voted to add the Open Relevance Project (ORP) as a
Lucene subproject. ORP's main goal is to build out collections, judgments and queries in an
open environment to make it easier for Lucene developers and users to do relevance testing,
much like one would get if using TREC or other evaluation conferences.

See http://lucene.apache.org/openrelevance for more info.

2.14. 07 April 2009 - Apache Mahout 0.1 released


The Apache Lucene project is pleased to announce the release of Apache Mahout 0.1.
Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable
machine learning algorithm implementations under the Apache license. The first public
release includes implementations for clustering, classification, collaborative filtering and
evolutionary programming.
Highlights include:
• Taste Collaborative Filtering
• Several distributed clustering implementations: k-Means, Fuzzy k-Means, Dirichlet,
Mean-Shift and Canopy
• Distributed Naive Bayes and Complementary Naive Bayes classification implementations
• Distributed fitness function implementation for the Watchmaker evolutionary
programming library
• Most implementations are built on top of Apache Hadoop (http://hadoop.apache.org) for
scalability
More info is available on the Mahout website.

2.15. 9 March 2009 - Lucene Java 2.4.1 available


This release contains fixes for bugs found in 2.4.0, including one data loss bug
(LUCENE-1452) where in certain situations binary fields would be truncated to 0 bytes.
See CHANGES for details.
2.4.1 does not contain any new features, API or file format changes, which makes it fully
compatible with 2.4.0.
Binary and source distributions are available here.
Maven artifacts are available here.

2.16. 09 February 2009 - Lucene at ApacheCon Europe 2009 in Amsterdam

Lucene will be extremely well represented at ApacheCon EU 2009 in Amsterdam,
Netherlands this March 23-27, 2009:
• Lucene Boot Camp - A two day training session, March 23 & 24th
• Solr Boot Camp - A one day training session, March 24th
• Introducing Apache Mahout - Grant Ingersoll. March 25th @ 10:30
• Lucene/Solr Case Studies - Erik Hatcher. March 25th @ 11:30
• Advanced Indexing Techniques with Apache Lucene - Michael Busch. March 25th @
14:00
• Apache Solr - A Case Study - Uri Boness. March 26th @ 17:30
• Best of breed - httpd, forrest, solr and droids - Thorsten Scherler. March 27th @ 17:30
• Apache Droids - an intelligent standalone robot framework - Thorsten Scherler. March
26th @ 15:00

2.17. 19 January 2009 - PyLucene joins the Lucene TLP


PyLucene, the Python-based port of Lucene, is now an official Lucene subproject.

2.18. 8 October 2008 - Lucene Java 2.4.0 available


Lucene 2.4.0 is available for public download. This version contains many enhancements and
bug fixes. See CHANGES for details.
Binary and source distributions are available here.
Maven artifacts are available here.

2.19. 15 September 2008 - Solr 1.3.0 Available


Solr 1.3.0 is available for public download. This version contains many enhancements and
bug fixes, including distributed search capabilities, Lucene 2.3.x performance improvements
and many others.

See the release notes for more details. Download is available from an Apache Mirror.
