How to build a quick Lucene Search
March 25th, 2010 by Micha Kops
Hello, today I want to post a small tutorial on a simple index and search operation using the Lucene indexer, with Maven for the project setup.
Setup
- Create an empty Maven sample project using the Eclipse Maven Plugin or use the following console command:
mvn archetype:create -DgroupId=in.student.demo.search -DartifactId=lucene-sample
- Here is my pom.xml with the Lucene dependencies defined:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>in.student.demo.search</groupId>
    <artifactId>lucene-sample</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>My Lucene Search Sample</name>
    <description>Lucene Search Sample</description>
    <dependencies>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>lucene</groupId>
            <artifactId>lucene</artifactId>
            <version>1.4.3</version>
        </dependency>
    </dependencies>
</project>
Index Example
I put everything into a single class, Main, in the package in.student.demo.search. Hey, it’s just a demo:
package in.student.demo.search;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;

public class Main {
    public static void main(String[] args) throws CorruptIndexException,
            LockObtainFailedException, IOException, ParseException {
        // the list must be typed, otherwise the for-each loop below does not compile
        List<String> l = new ArrayList<String>();
        l.add("you all");
        l.add("visit");
        l.add("some blog");
        l.add("sometimes");

        // create some index
        // we could also create an index in our ram ...
        // Directory index = new RAMDirectory();
        Directory index = FSDirectory.getDirectory("/tmp/ourtestindex/");
        StandardAnalyzer analyzer = new StandardAnalyzer();
        // "true" creates a new index and overwrites an existing one
        IndexWriter w = new IndexWriter(index, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        // index some data
        for (String i : l) {
            System.out.println("indexing " + i);
            Document doc = new Document();
            doc.add(new Field("title", i, Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("name", i, Field.Store.YES, Field.Index.ANALYZED));
            w.addDocument(doc);
        }

        // loop and index some random data
        for (int i = 1; i < 40000; i++) {
            Document doc = new Document();
            doc.add(new Field("title", "xyz" + i, Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("name", "" + i, Field.Store.YES, Field.Index.ANALYZED));
            w.addDocument(doc);
        }
        w.close();
        System.out.println("index generated");

        // parse query over multiple fields
        Query q = new MultiFieldQueryParser(new String[]{"title", "name"},
                analyzer).parse("s*");

        // searching ...
        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index);
        TopDocCollector collector = new TopDocCollector(hitsPerPage);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // output results
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("name") + ": " + d.get("title"));
        }
    }
}
Running the Example
Running the code produces the following output:
indexing you all
indexing visit
indexing some blog
indexing sometimes
index generated
Found 2 hits.
1. sometimes: sometimes
2. some blog: some blog
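To make it a bit clearer why the query s* finds exactly these two entries, here is a rough sketch of the idea behind an inverted index with a prefix lookup. This is not Lucene code: the class PrefixSearchSketch and its methods add and searchPrefix are made up for illustration, and the lower-case-and-split step is only an approximation of what an analyzer does. The sketch also returns matches in insertion order, whereas Lucene sorts them by score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration, not Lucene code: a tiny in-memory inverted index.
class PrefixSearchSketch {

    // term -> ids of the documents containing it, kept sorted by term
    private final TreeMap<String, TreeSet<Integer>> postings =
            new TreeMap<String, TreeSet<Integer>>();
    private final List<String> documents = new ArrayList<String>();

    // Rough stand-in for an analyzer: lower-case, then split on non-alphanumerics.
    void add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, k -> new TreeSet<Integer>()).add(docId);
        }
    }

    // Returns every document that contains at least one term starting with the prefix.
    List<String> searchPrefix(String prefix) {
        // All terms in the range [prefix, prefix + '\uffff') start with the prefix.
        SortedMap<String, TreeSet<Integer>> range =
                postings.subMap(prefix, prefix + Character.MAX_VALUE);
        Set<Integer> ids = new TreeSet<Integer>();
        for (TreeSet<Integer> docs : range.values()) {
            ids.addAll(docs);
        }
        List<String> result = new ArrayList<String>();
        for (int id : ids) {
            result.add(documents.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixSearchSketch index = new PrefixSearchSketch();
        for (String s : new String[] {"you all", "visit", "some blog", "sometimes"}) {
            index.add(s);
        }
        for (int i = 1; i <= 5; i++) {
            index.add("xyz" + i);
        }
        System.out.println(index.searchPrefix("s")); // prints [some blog, sometimes]
    }
}
```

Only the terms "some" and "sometimes" start with "s", so only their two documents come back. "visit" contains an "s" but does not begin with one.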
Troubleshooting
More about the following situation can be found in the Lucene FAQ. If we change the query string passed to parse() from s* to xyz*, the wildcard matches all 39,999 generated titles and we get a nifty exception at runtime:
Exception in thread "main" org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
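As far as I understand it, Lucene in this version rewrites a wildcard query into a BooleanQuery with one clause per matching term, so the number of matching terms is what counts. The following sketch is not Lucene code; ClauseLimitSketch and its methods are hypothetical names, and the term set is only my assumption of what the title field of the demo index contains. It simply counts the terms behind a prefix and compares the count with the limit of 1024 from the exception message.

```java
import java.util.TreeSet;

// Hypothetical illustration, not Lucene code.
class ClauseLimitSketch {

    // The limit named in the exception message.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Number of indexed terms that start with the given prefix.
    static int countTermsWithPrefix(TreeSet<String> terms, String prefix) {
        // Every term in the range [prefix, prefix + '\uffff') starts with the prefix.
        return terms.subSet(prefix, prefix + Character.MAX_VALUE).size();
    }

    static boolean exceedsLimit(TreeSet<String> terms, String prefix) {
        return countTermsWithPrefix(terms, prefix) > MAX_CLAUSE_COUNT;
    }

    // Assumed terms of the demo's "title" field: the analyzed words plus xyz1..xyz39999.
    static TreeSet<String> demoTitleTerms() {
        TreeSet<String> terms = new TreeSet<String>();
        for (String word : new String[] {"you", "all", "visit", "some", "blog", "sometimes"}) {
            terms.add(word);
        }
        for (int i = 1; i < 40000; i++) {
            terms.add("xyz" + i);
        }
        return terms;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = demoTitleTerms();
        for (String prefix : new String[] {"s", "xyz"}) {
            System.out.println(prefix + "* expands to "
                    + countTermsWithPrefix(terms, prefix)
                    + " terms, over the limit: " + exceedsLimit(terms, prefix));
        }
    }
}
```

In a real application the usual ways out are a more selective prefix or raising the limit with the static method BooleanQuery.setMaxClauseCount(int), at the cost of more memory and slower queries.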
Other Lucene Articles
If you’re interested in some other Lucene articles of mine, please feel free to have a look at the following list:
- Lucene by Example: Specifying Analyzers on a per-field-basis and writing a custom Analyzer/Tokenizer
- Lucene Snippets: Index Stats
- Lucene Snippets: Faceting Search
- Creating elegant, typesafe Queries for JPA, mongoDB/Morphia and Lucene using Querydsl
- Content Detection, Metadata and Content Extraction with Apache Tika
- JPA Persistence and Lucene Indexing combined in Hibernate Search
- Hibernate Search Faceting: Discrete and Range Faceting by Example
Article Updates
- 2015-03-02: Structure and table of contents added, links to Lucene tutorials added
Tags: demo, document, index, indexer, lucene, maven, multi-field-search, search, snippets, solr, tutorial