Lucene Snippets: Faceting Search

August 28th, 2012

The latest snippet from my Lucene examples demonstrates how to implement a faceted search using the Lucene 4.0 API, and how easy it is to define multiple category paths to aggregate search results for different facets.

In the following example we index some books, the classic example, and then create multiple category paths for author, publication date and category.


 

Lucene Dependencies

We need just two dependencies here: lucene-core, of course, and the lucene-facet library. I've added the declarations needed for Maven and SBT below. If you're using Gradle or Buildr, you shouldn't have a problem transferring the information ;)

Maven

Simply add the following dependencies to your Maven-ized project's pom.xml:

    <properties>
        <lucene.version>4.0-SNAPSHOT</lucene.version>
    </properties>
    <dependencies>
        <dependency>
            <artifactId>lucene-core</artifactId>
            <groupId>org.apache.lucene</groupId>
            <version>${lucene.version}</version>
        </dependency>
        <dependency>
            <artifactId>lucene-facet</artifactId>
            <groupId>org.apache.lucene</groupId>
            <version>${lucene.version}</version>
        </dependency>
    </dependencies>
    <repositories>
        <repository>
            <id>lucene-repository</id>
            <name>Lucene Maven</name>
            <url>https://repository.apache.org/snapshots/</url>
            <snapshots>
                <enabled>true</enabled>
                <updatePolicy>always</updatePolicy>
            </snapshots>
        </repository>
    </repositories>

Simple Build Tool / SBT

To use Lucene here, simply add the following lines to your build.sbt:

libraryDependencies += "org.apache.lucene" % "lucene-core" % "4.0.0-BETA"
 
libraryDependencies += "org.apache.lucene" % "lucene-facet" % "4.0.0-BETA"
 
resolvers += "apache-snapshots-repo" at "https://repository.apache.org/snapshots/"

Facet Book Search

First we create an entity to be added to our index, and the classic example is a book. A book has a title, an author, a publication date and a category, so that's our book entity:

public static class Book {
	private final String title;
	private final String author;
	private final String published;
	private final String category;
 
	public Book(final String title, final String author,
			final String published, final String category) {
		this.title = title;
		this.author = author;
		this.published = published;
		this.category = category;
	}
 
	public String getTitle() {
		return title;
	}
 
	public String getAuthor() {
		return author;
	}
 
	public String getPublished() {
		return published;
	}
 
	public String getCategory() {
		return category;
	}
}

And here's our faceting example. As you can see, we're using two different directories: one for the normal index as we're used to, and another for the taxonomy index.

For each book we add multiple category paths to the index to enable facet search on author, category and publication date.

This allows us to create a search request with multiple facet requests and afterwards iterate over the results of the faceted search.
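To get a feeling for what the separate taxonomy index is for, here is a minimal plain-Java sketch of the idea. It does not use the Lucene API, and the class TaxonomySketch with its methods is hypothetical. It only assumes that a taxonomy maps every category path, parent paths included, to an integer ordinal; the real taxonomy index differs in its details.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only, not part of Lucene.
public class TaxonomySketch {
    // Maps a full path such as "author/Mark Twain" to its ordinal.
    // Simplification: components must not contain '/'.
    private final Map<String, Integer> ordinals = new LinkedHashMap<>();
    // parents.get(ordinal) is the ordinal of the parent path, -1 for top-level paths.
    private final List<Integer> parents = new ArrayList<>();

    // Registers the path and all of its prefixes, returns the ordinal of the full path.
    public int addCategory(String... components) {
        int parent = -1;
        StringBuilder path = new StringBuilder();
        for (String component : components) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(component);
            String key = path.toString();
            Integer ordinal = ordinals.get(key);
            if (ordinal == null) {
                // First time we see this path: assign the next free ordinal.
                ordinal = ordinals.size();
                ordinals.put(key, ordinal);
                parents.add(parent);
            }
            parent = ordinal;
        }
        return parent;
    }

    public int parentOf(int ordinal) {
        return parents.get(ordinal);
    }

    public int size() {
        return ordinals.size();
    }

    public static void main(String[] args) {
        TaxonomySketch taxo = new TaxonomySketch();
        int twain = taxo.addCategory("author", "Mark Twain");
        int kafka = taxo.addCategory("author", "Franz Kafka");
        int again = taxo.addCategory("author", "Mark Twain");
        // Adding the same path twice yields the same ordinal.
        System.out.println(twain + " " + kafka + " " + again + " " + taxo.size());
    }
}
```

In this sketch both authors share the parent path "author", which is why a facet request on "author" can later list them as its children.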

Books indexed in the example:

Title            Author       Publication Date  Category
Tom Sawyer       Mark Twain   1840              Novel
Collected Tales  Mark Twain   1850              Novel
The Trial        Franz Kafka  1901              Novel
Some book        Some author  1901              Novel

package com.hascode.tutorial;
 
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
 
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.facet.index.CategoryDocumentBuilder;
import org.apache.lucene.facet.search.FacetsCollector;
import org.apache.lucene.facet.search.params.CountFacetRequest;
import org.apache.lucene.facet.search.params.FacetSearchParams;
import org.apache.lucene.facet.search.results.FacetResult;
import org.apache.lucene.facet.search.results.FacetResultNode;
import org.apache.lucene.facet.taxonomy.CategoryPath;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.TaxonomyWriter;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
 
public class FacetingExample {
	private static final String INDEX = "target/facet/index";
	private static final String INDEX_TAXO = "target/facet/taxo";
 
	public static void main(final String[] args) throws IOException {
		Directory dir = FSDirectory.open(new File(INDEX));
		Directory taxoDir = FSDirectory.open(new File(INDEX_TAXO));
		Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
		IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40,
				analyzer);
		iwc.setOpenMode(OpenMode.CREATE);
		IndexWriter writer = new IndexWriter(dir, iwc);
		TaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir,
				OpenMode.CREATE_OR_APPEND);
 
		List<Book> books = Arrays.asList(
				new Book("Tom Sawyer", "Mark Twain", "1840", "Novel"),
				new Book("Collected Tales", "Mark Twain", "1850", "Novel"),
				new Book("The Trial", "Franz Kafka", "1901", "Novel"),
				new Book("Some book", "Some author", "1901", "Novel"));
 
		createDocuments(writer, taxoWriter, books);
		taxoWriter.commit();
		writer.commit();
		writer.close();
		taxoWriter.close();
 
		IndexReader indexReader = DirectoryReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(indexReader);
		TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
		Query q = new TermQuery(new Term("category", "Novel"));
		TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true);
		FacetSearchParams facetSearchParams = new FacetSearchParams();
		facetSearchParams.addFacetRequest(new CountFacetRequest(
				new CategoryPath("author"), 10));
		facetSearchParams.addFacetRequest(new CountFacetRequest(
				new CategoryPath("category"), 10));
		facetSearchParams.addFacetRequest(new CountFacetRequest(
				new CategoryPath("published"), 10));
		FacetsCollector facetsCollector = new FacetsCollector(
				facetSearchParams, indexReader, taxoReader);
		searcher.search(q, MultiCollector.wrap(tdc, facetsCollector));
		List<FacetResult> res = facetsCollector.getFacetResults();
		System.out
				.println("Search for books with the category:Novel returned : "
						+ res.size()
						+ " results\n---------------------------------");
		for (final FacetResult r : res) {
			System.out.println("\nMatching "
					+ r.getFacetResultNode().getLabel()
					+ ":\n------------------------------------");
			for (FacetResultNode n : r.getFacetResultNode().getSubResults()) {
				System.out.println(String.format("\t%s: %.0f", n.getLabel()
						.lastComponent(), n.getValue()));
			}
		}
	}
 
	private static void createDocuments(final IndexWriter writer,
			final TaxonomyWriter taxoWriter, final List<Book> books)
			throws IOException {
		for (final Book b : books) {
			Document doc = new Document();
			doc.add(new StringField("title", b.getTitle(), Store.YES));
			doc.add(new StringField("category", b.getCategory(), Store.YES));
			List<CategoryPath> categories = new ArrayList<CategoryPath>();
			categories.add(new CategoryPath("author", b.getAuthor()));
			categories.add(new CategoryPath("category", b.getCategory()));
			categories.add(new CategoryPath("published", b.getPublished()));
			CategoryDocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(
					taxoWriter);
			categoryDocBuilder.setCategoryPaths(categories);
			categoryDocBuilder.build(doc);
			writer.addDocument(doc);
		}
	}
}

The example above should produce the following output:

sbt> run-main com.hascode.tutorial.FacetingExample
[info] Running com.hascode.tutorial.FacetingExample
Search for books with the category:Novel returned : 3 results
---------------------------------
 
Matching author:
------------------------------------
 Mark Twain: 2
 Some author: 1
 Franz Kafka: 1
 
Matching category:
------------------------------------
 Novel: 4
 
Matching published:
------------------------------------
 1901: 2
 1850: 1
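
To make clear what the facet counts mean, here is a small plain-Java sketch that computes the same kind of aggregation by hand for the sample books. It is not Lucene code: the class FacetCountSketch and its method countFacet are hypothetical, and the sketch simply loops over all books, whereas Lucene gets its counts from the index and the taxonomy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only, not part of Lucene.
public class FacetCountSketch {
    // Each row holds: title, author, publication date, category.
    static final List<String[]> BOOKS = Arrays.asList(
            new String[] {"Tom Sawyer", "Mark Twain", "1840", "Novel"},
            new String[] {"Collected Tales", "Mark Twain", "1850", "Novel"},
            new String[] {"The Trial", "Franz Kafka", "1901", "Novel"},
            new String[] {"Some book", "Some author", "1901", "Novel"});

    // Counts the values found in the given column for all books of one category.
    static Map<String, Integer> countFacet(String category, int column) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] book : BOOKS) {
            // Stand-in for the term query on the "category" field.
            if (!book[3].equals(category)) {
                continue;
            }
            Integer old = counts.get(book[column]);
            counts.put(book[column], old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("author: " + countFacet("Novel", 1));
        System.out.println("published: " + countFacet("Novel", 2));
    }
}
```

The TreeMap sorts the entries by value name, while the facet results above are ordered by count. For the author facet the sketch arrives at the same numbers: two books for Mark Twain and one each for the other two authors.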

Tutorial Sources

Please feel free to view and download the complete sources from this tutorial from my Bitbucket repository. If you've got Mercurial installed, just check it out with:

hg clone https://bitbucket.org/hascode/lucene-4-tutorial


One Response to “Lucene Snippets: Faceting Search”

  1. Sakuraba Says:

    Could you elaborate on how to create a filter query based on a facet entry? In your example it would be like clicking on “Mark Twain” and expecting only results within that facet.

  2. Gagan Gupta Says:

    I am using LUCENE 4.4. CountFacetRequest as FacetRequest is working perfectly fine. Thanks.

    But I tried SumScoreFacetRequest as the FacetRequest and it's still showing the count.
    Please help me with how I can change the count to some different parameter of my own choice.
