How to build a quick Lucene Search

March 25th, 2010 by

Hello, today I want to post a short tutorial on a small index-and-search operation using Lucene, with Maven for the project setup.

 

Setup

  1. Create an empty Maven sample project using the Eclipse Maven Plugin or use the following console command:
    mvn archetype:create -DgroupId=in.student.demo.search -DartifactId=lucene-sample
  2. Here is my pom.xml; it defines the Lucene dependencies:
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>in.student.demo.search</groupId>
      <artifactId>lucene-sample</artifactId>
      <version>0.0.1-SNAPSHOT</version>
      <name>My Lucene Search Sample</name>
      <description>Lucene Search Sample</description>
      <dependencies>
        <dependency>
          <groupId>org.apache.lucene</groupId>
          <artifactId>lucene-core</artifactId>
          <version>2.4.1</version>
        </dependency>
        <!-- The legacy lucene:lucene 1.4.3 artifact is left out: it contains the same
             org.apache.lucene packages as lucene-core and would put two Lucene
             versions on the classpath. -->
      </dependencies>
    </project>

Index Example

I put everything into a single class, Main, in the package in.student.demo.search (hey, it’s just a demo):

package in.student.demo.search;
 
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
 
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
 
public class Main {
 
    public static void main(String[] args) throws CorruptIndexException,
            LockObtainFailedException, IOException, ParseException {
        // typed list, otherwise the for-each over String below does not compile
        List<String> l = new ArrayList<String>();
        l.add("you all");
        l.add("visit");
        l.add("some blog");
        l.add("sometimes");
 
        // create some index
        // we could also create an index in our ram ...
        // Directory index = new RAMDirectory();
        Directory index = FSDirectory.getDirectory("/tmp/ourtestindex/");
        StandardAnalyzer analyzer = new StandardAnalyzer();
        IndexWriter w = new IndexWriter(index, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);
 
        // index some data
        for (String i : l) {
            System.out.println("indexing " + i);
            Document doc = new Document();
            doc.add(new Field("title", i, Field.Store.YES,
                            Field.Index.ANALYZED));
            doc.add(new Field("name", i, Field.Store.YES,
                            Field.Index.ANALYZED));
            w.addDocument(doc);
        }
 
        // loop and index some random data
        for (int i = 1; i < 40000; i++) {
            Document doc = new Document();
            doc.add(new Field("title", "xyz" + i, Field.Store.YES,
                    Field.Index.ANALYZED));
            doc.add(new Field("name", "" + i, Field.Store.YES,
                    Field.Index.ANALYZED));
            w.addDocument(doc);
        }
        w.close();
        System.out.println("index generated");
        // parse query over multiple fields
        Query q = new MultiFieldQueryParser(new String[]{"title", "name"},
                analyzer).parse("s*");
 
        // searching ...
        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index);
        TopDocCollector collector = new TopDocCollector(hitsPerPage);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
 
        // output results
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("name") + ": "
                    + d.get("title"));
        }
        // release the index files held by the searcher
        searcher.close();
 
    }
 
}

Running the Example

Running the code produces the following output:

indexing you all
indexing visit
indexing some blog
indexing sometimes
index generated
Found 2 hits.
1. sometimes: sometimes
2. some blog: some blog
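
To see why the query s* yields exactly these two documents, here is a small plain-Java sketch of the idea. It is not Lucene's implementation: it assumes a simplified analyzer that only lowercases and splits on whitespace (StandardAnalyzer does more), and the class and method names are made up for illustration. It builds a term-to-document map and collects every document that contains a term starting with the prefix. Unlike Lucene it does not score, so the hits come back in insertion order rather than by relevance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: a toy inverted index, not Lucene's data structures.
class PrefixSearchSketch {

    // term -> ids of the documents that contain it, kept sorted by term
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<String, Set<Integer>>();
    private final List<String> docs = new ArrayList<String>();

    // Assumed, simplified analysis: lowercase and split on whitespace.
    public void add(String text) {
        int docId = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(docId);
        }
    }

    // Returns the text of every document that has a term starting with the prefix,
    // in insertion order (no scoring).
    public List<String> searchPrefix(String prefix) {
        Set<Integer> matches = new TreeSet<Integer>();
        // tailMap starts at the first term >= prefix; stop at the first non-match.
        for (Map.Entry<String, Set<Integer>> e : postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            matches.addAll(e.getValue());
        }
        List<String> result = new ArrayList<String>();
        for (int id : matches) {
            result.add(docs.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixSearchSketch index = new PrefixSearchSketch();
        for (String s : new String[] {"you all", "visit", "some blog", "sometimes"}) {
            index.add(s);
        }
        // same filler data as in the article's loop
        for (int i = 1; i < 40000; i++) {
            index.add("xyz" + i);
        }
        System.out.println(index.searchPrefix("s")); // prints [some blog, sometimes]
    }
}
```

Only the terms "some" and "sometimes" start with s; none of the xyz filler terms do, which is why the 39,999 extra documents do not show up in the result.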

Troubleshooting

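If we change the query string in the parse(...) call from "s*" to a prefix query on xyz (for example "xyz*"), we get a nifty exception at runtime. Lucene expands the prefix into one Boolean clause per matching term, and the roughly 40,000 xyz terms in our index exceed the default limit of 1024 clauses. The limit can be raised with BooleanQuery.setMaxClauseCount(...); more about this situation can be found in the Lucene FAQ: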

Exception in thread "main" org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
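
The following plain-Java sketch mimics why a prefix on xyz trips the limit while s* does not. It is an illustration under stated assumptions, not Lucene's rewrite code: the class, method, and constant names are hypothetical, and only the default of 1024 is taken from the exception message above. It counts how many indexed terms a prefix would expand to and fails as soon as the count exceeds the limit.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative sketch only: mimics the clause limit, not Lucene's actual rewrite code.
class ClauseLimitSketch {

    // Default taken from the exception message in the article.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Builds the title terms that the article's Main class would index
    // (assuming whitespace tokenization of the four sample strings).
    public static SortedSet<String> demoTerms() {
        SortedSet<String> terms = new TreeSet<String>();
        for (String t : new String[] {"you", "all", "visit", "some", "blog", "sometimes"}) {
            terms.add(t);
        }
        for (int i = 1; i < 40000; i++) {
            terms.add("xyz" + i);
        }
        return terms;
    }

    // Counts the terms a prefix expands to; throws once the limit is exceeded.
    public static int expandPrefix(SortedSet<String> terms, String prefix, int maxClauses) {
        int clauses = 0;
        // tailSet starts at the first term >= prefix; stop at the first non-match.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (++clauses > maxClauses) {
                throw new IllegalStateException("maxClauseCount is set to " + maxClauses);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = demoTerms();
        // prints: s* -> 2 clauses
        System.out.println("s* -> " + expandPrefix(terms, "s", MAX_CLAUSE_COUNT) + " clauses");
        try {
            expandPrefix(terms, "xyz", MAX_CLAUSE_COUNT);
        } catch (IllegalStateException e) {
            // prints: xyz* -> maxClauseCount is set to 1024
            System.out.println("xyz* -> " + e.getMessage());
        }
    }
}
```

With a limit above 39,999 the same call succeeds, which shows the trade-off: a higher limit lets broad prefixes through, but every matching term becomes one more clause to evaluate.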


Article Updates

  • 2015-03-02: Structure and table of contents added, links to Lucene tutorials added


