How to build a quick Lucene search

March 25th, 2010

Hello, today I want to post a short tutorial on a simple index-and-search operation, using Lucene as the indexer and Maven for the project setup.

  1. Create an empty Maven sample project using the Eclipse Maven Plugin or use the following console command:
    mvn archetype:create -DgroupId=in.student.demo.search -DartifactId=lucene-sample
  2. Here is my pom.xml, which declares the Lucene dependencies:
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>in.student.demo.search</groupId>
      <artifactId>lucene-sample</artifactId>
      <version>0.0.1-SNAPSHOT</version>
      <name>My Lucene Search Sample</name>
      <description>Lucene Search Sample</description>
      <dependencies>
        <dependency>
          <groupId>org.apache.lucene</groupId>
          <artifactId>lucene-core</artifactId>
          <version>2.4.1</version>
        </dependency>
        <dependency>
          <groupId>lucene</groupId>
          <artifactId>lucene</artifactId>
          <version>1.4.3</version>
        </dependency>
      </dependencies>
    </project>
  3. I put everything into a single class, Main.java, in the package in.student.demo.search (hey, it's just a demo):
    package in.student.demo.search;
     
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
     
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.MultiFieldQueryParser;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocCollector;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.LockObtainFailedException;
     
    public class Main {
     
        public static void main(String[] args) throws CorruptIndexException,
                LockObtainFailedException, IOException, ParseException {
            // typed list, so the for-each loop over String below compiles
            List<String> l = new ArrayList<String>();
            l.add("you all");
            l.add("visit");
            l.add("some blog");
            l.add("sometimes");
     
            // create some index
            // we could also create an index in our ram ...
            // Directory index = new RAMDirectory();
            Directory index = FSDirectory.getDirectory("/tmp/ourtestindex/");
            StandardAnalyzer analyzer = new StandardAnalyzer();
            IndexWriter w = new IndexWriter(index, analyzer, true,
                    IndexWriter.MaxFieldLength.UNLIMITED);
     
            // index some data
            for (String i : l) {
                System.out.println("indexing " + i);
                Document doc = new Document();
                doc.add(new Field("title", i, Field.Store.YES,
                                Field.Index.ANALYZED));
                doc.add(new Field("name", i, Field.Store.YES,
                                Field.Index.ANALYZED));
                w.addDocument(doc);
            }
     
            // loop and index some random data
            for (int i = 1; i < 40000; i++) {
                Document doc = new Document();
                doc.add(new Field("title", "xyz" + i, Field.Store.YES,
                        Field.Index.ANALYZED));
                doc.add(new Field("name", "" + i, Field.Store.YES,
                        Field.Index.ANALYZED));
                w.addDocument(doc);
            }
            w.close();
            System.out.println("index generated");
            // parse query over multiple fields
            Query q = new MultiFieldQueryParser(new String[]{"title", "name"},
                    analyzer).parse("s*");
     
            // searching ...
            int hitsPerPage = 10;
            IndexSearcher searcher = new IndexSearcher(index);
            TopDocCollector collector = new TopDocCollector(hitsPerPage);
            searcher.search(q, collector);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;
     
            // output results
            System.out.println("Found " + hits.length + " hits.");
            for (int i = 0; i < hits.length; ++i) {
                int docId = hits[i].doc;
                Document d = searcher.doc(docId);
                System.out.println((i + 1) + ". " + d.get("name") + ": "
                        + d.get("title"));
            }
            // release the index files held by the searcher
            searcher.close();
     
        }
     
    }
  4. Running the code produces the following output:
    indexing you all
    indexing visit
    indexing some blog
    indexing sometimes
    index generated
    Found 2 hits.
    1. sometimes: sometimes
    2. some blog: some blog
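  5. If we change the query string passed to parse(...) from "s*" to a prefix query such as "xyz*", which matches every one of the generated xyz titles, we get a nifty exception at runtime. More about this situation can be found in the Lucene FAQ: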
    Exception in thread "main" org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
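
To see why a prefix query over the generated xyz terms trips this limit, here is a rough sketch in plain Java. It is a toy model, not Lucene code: it assumes, as Lucene 2.4 does for prefix queries, that the query is expanded into one boolean clause per matching indexed term. The class ClauseLimitSketch and its methods are hypothetical names made up for this illustration, and the hand-split terms only approximate what the analyzer would produce.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: mimics how a prefix query grows into one clause per matching term.
// None of these names come from Lucene; they are assumptions for illustration.
class ClauseLimitSketch {

    // Same value the exception message reports as the default limit.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Approximates the distinct "title" terms indexed by the tutorial:
    // the words of the four sample strings plus xyz1 .. xyz39999.
    static List<String> titleTerms() {
        List<String> terms = new ArrayList<String>();
        for (String t : new String[] {"you", "all", "visit", "some", "blog", "sometimes"}) {
            terms.add(t);
        }
        for (int i = 1; i < 40000; i++) {
            terms.add("xyz" + i);
        }
        return terms;
    }

    // Counts the clauses a prefix would expand to: one per term starting with it.
    static int countClauses(List<String> terms, String prefix) {
        int clauses = 0;
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                clauses++;
            }
        }
        return clauses;
    }

    // True if the expansion would go past the clause limit.
    static boolean exceedsLimit(List<String> terms, String prefix) {
        return countClauses(terms, prefix) > MAX_CLAUSE_COUNT;
    }

    public static void main(String[] args) {
        List<String> terms = titleTerms();
        for (String prefix : new String[] {"s", "xyz"}) {
            System.out.println(prefix + "* -> " + countClauses(terms, prefix) + " clauses"
                    + (exceedsLimit(terms, prefix) ? " (over the limit)" : " (ok)"));
        }
    }
}
```

In Lucene itself the ceiling can be raised with the static BooleanQuery.setMaxClauseCount(int) before searching, at the cost of more memory per query. Whether that is the right fix depends on the data; the FAQ discusses alternatives.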

Helpful Information:

Next time I may post some examples on Solr, which includes some very nice features!

Also nice is the Tika project, which offers content and text extraction from several formats such as office documents, MP3 ID3 tags, etc.
