Lucene Snippets: Faceting Search
August 28th, 2012 by Micha Kops
The latest snippet from my Lucene examples demonstrates how to implement a faceted search using the Lucene 4.0 API and how easy it is to define multiple category paths so that search results can be aggregated by different facets.
In the following example we index some books, the classical example, and create category paths for author, publication date and category.
Lucene Dependencies
We need just two dependencies here: lucene-core, of course, and in addition the lucene-facet library. I’ve added the declarations needed for Maven and SBT here; if you’re using Gradle or Buildr, you shouldn’t have a problem transferring the information ;)
Maven
Simply add the following dependencies to your Maven-ized project’s pom.xml
    <properties>
        <lucene.version>4.0-SNAPSHOT</lucene.version>
    </properties>

    <dependencies>
        <dependency>
            <artifactId>lucene-core</artifactId>
            <groupId>org.apache.lucene</groupId>
            <version>${lucene.version}</version>
        </dependency>
        <dependency>
            <artifactId>lucene-facet</artifactId>
            <groupId>org.apache.lucene</groupId>
            <version>${lucene.version}</version>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>lucene-repository</id>
            <name>Lucene Maven</name>
            <url>https://repository.apache.org/snapshots/</url>
            <snapshots>
                <enabled>true</enabled>
                <updatePolicy>always</updatePolicy>
            </snapshots>
        </repository>
    </repositories>
Simple Build Tool / SBT
To use Lucene here, simply add the following lines to your build.sbt
    libraryDependencies += "org.apache.lucene" % "lucene-core" % "4.0.0-BETA"

    libraryDependencies += "org.apache.lucene" % "lucene-facet" % "4.0.0-BETA"

    resolvers += "apache-snapshots-repo" at "https://repository.apache.org/snapshots/"
Facet Book Search
First we create an entity to be added to our index. The classical example is a book: it has a title, an author, a publication date and a category, so that’s our book entity.
    // Simple immutable value object. It is declared static, so it is meant to be
    // nested inside another class (here: FacetingExample).
    public static class Book {
        private final String title;
        private final String author;
        private final String published;
        private final String category;

        public Book(final String title, final String author,
                final String published, final String category) {
            this.title = title;
            this.author = author;
            this.published = published;
            this.category = category;
        }

        public String getTitle() {
            return title;
        }

        public String getAuthor() {
            return author;
        }

        public String getPublished() {
            return published;
        }

        public String getCategory() {
            return category;
        }
    }
And that’s our faceting example. As you can see, we’re using two different directories – one for the normal index as we’re used to – and another for the taxonomy index.
We’re adding multiple category paths to the index to enable facet search for categories like author, category or publication date.
This allows us to finally create a request with multiple faceting parameters and afterwards to iterate over the results of the faceting search …
Title | Author | Publication Date | Category |
---|---|---|---|
Tom Sawyer | Mark Twain | 1840 | Novel |
Collected Tales | Mark Twain | 1850 | Novel |
The Trial | Franz Kafka | 1901 | Novel |
Some book | Some author | 1901 | Novel |
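To get a feeling for what the second directory holds: as far as I understand it, the taxonomy index assigns every category path (and each of its parents) an integer ordinal, with the root at ordinal 0. The following plain-Java sketch is only a conceptual model of that bookkeeping. TaxonomySketch and its methods are made-up names for illustration and not part of the Lucene API, and joining the path components with "/" is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Conceptual model only: this is NOT Lucene's DirectoryTaxonomyWriter. */
public class TaxonomySketch {

    // Maps a full path such as "author/Mark Twain" to its ordinal.
    // Assumption: components never contain '/', otherwise keys could collide.
    private final Map<String, Integer> ordinals = new LinkedHashMap<>();

    public TaxonomySketch() {
        ordinals.put("", 0); // the root category gets ordinal 0
    }

    /** Adds the path and any missing parents; returns the ordinal of the full path. */
    public int addCategory(String... components) {
        StringBuilder path = new StringBuilder();
        int ordinal = 0;
        for (String component : components) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(component);
            String key = path.toString();
            Integer existing = ordinals.get(key);
            if (existing == null) {
                // New category: hand out the next free ordinal.
                existing = ordinals.size();
                ordinals.put(key, existing);
            }
            ordinal = existing;
        }
        return ordinal;
    }

    public int size() {
        return ordinals.size();
    }

    public static void main(String[] args) {
        TaxonomySketch taxo = new TaxonomySketch();
        System.out.println(taxo.addCategory("author", "Mark Twain"));  // prints 2 ("author" took 1)
        System.out.println(taxo.addCategory("author", "Franz Kafka")); // prints 3
        System.out.println(taxo.addCategory("author", "Mark Twain"));  // prints 2 again, path is known
        System.out.println(taxo.size());                               // prints 4 (root + 3 categories)
    }
}
```

In the Lucene example this bookkeeping is left to the TaxonomyWriter, which is why it is handed to the CategoryDocumentBuilder when the documents are created.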
    package com.hascode.tutorial;

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.facet.index.CategoryDocumentBuilder;
    import org.apache.lucene.facet.search.FacetsCollector;
    import org.apache.lucene.facet.search.params.CountFacetRequest;
    import org.apache.lucene.facet.search.params.FacetSearchParams;
    import org.apache.lucene.facet.search.results.FacetResult;
    import org.apache.lucene.facet.search.results.FacetResultNode;
    import org.apache.lucene.facet.taxonomy.CategoryPath;
    import org.apache.lucene.facet.taxonomy.TaxonomyReader;
    import org.apache.lucene.facet.taxonomy.TaxonomyWriter;
    import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
    import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiCollector;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class FacetingExample {
        private static final String INDEX = "target/facet/index";
        private static final String INDEX_TAXO = "target/facet/taxo";

        public static void main(final String[] args) throws IOException {
            // One directory for the normal index, one for the taxonomy index
            Directory dir = FSDirectory.open(new File(INDEX));
            Directory taxoDir = FSDirectory.open(new File(INDEX_TAXO));
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40,
                    analyzer);
            iwc.setOpenMode(OpenMode.CREATE);
            IndexWriter writer = new IndexWriter(dir, iwc);
            TaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir,
                    OpenMode.CREATE_OR_APPEND);

            List<Book> books = Arrays.asList(
                    new Book("Tom Sawyer", "Mark Twain", "1840", "Novel"),
                    new Book("Collected Tales", "Mark Twain", "1850", "Novel"),
                    new Book("The Trial", "Franz Kafka", "1901", "Novel"),
                    new Book("Some book", "Some author", "1901", "Novel"));
            createDocuments(writer, taxoWriter, books);

            taxoWriter.commit();
            writer.commit();
            writer.close();
            taxoWriter.close();

            IndexReader indexReader = DirectoryReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(indexReader);
            TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);

            Query q = new TermQuery(new Term("category", "Novel"));
            TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true);

            // One count request per facet, each limited to 10 values
            FacetSearchParams facetSearchParams = new FacetSearchParams();
            facetSearchParams.addFacetRequest(new CountFacetRequest(
                    new CategoryPath("author"), 10));
            facetSearchParams.addFacetRequest(new CountFacetRequest(
                    new CategoryPath("category"), 10));
            facetSearchParams.addFacetRequest(new CountFacetRequest(
                    new CategoryPath("published"), 10));
            FacetsCollector facetsCollector = new FacetsCollector(
                    facetSearchParams, indexReader, taxoReader);

            // Collect the top documents and the facets in a single search
            searcher.search(q, MultiCollector.wrap(tdc, facetsCollector));

            // res holds one FacetResult per facet request
            List<FacetResult> res = facetsCollector.getFacetResults();
            System.out
                    .println("Search for books with the category:Novel returned : "
                            + res.size()
                            + " results\n---------------------------------");
            for (final FacetResult r : res) {
                System.out.println("\nMatching "
                        + r.getFacetResultNode().getLabel()
                        + ":\n------------------------------------");
                for (FacetResultNode n : r.getFacetResultNode().getSubResults()) {
                    System.out.println(String.format("\t%s: %.0f",
                            n.getLabel().lastComponent(), n.getValue()));
                }
            }
        }

        private static void createDocuments(final IndexWriter writer,
                final TaxonomyWriter taxoWriter, final List<Book> books)
                throws IOException {
            for (final Book b : books) {
                Document doc = new Document();
                doc.add(new StringField("title", b.getTitle(), Store.YES));
                doc.add(new StringField("category", b.getCategory(), Store.YES));

                // One category path per facet we want to count later
                List<CategoryPath> categories = new ArrayList<CategoryPath>();
                categories.add(new CategoryPath("author", b.getAuthor()));
                categories.add(new CategoryPath("category", b.getCategory()));
                categories.add(new CategoryPath("published", b.getPublished()));

                CategoryDocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(
                        taxoWriter);
                categoryDocBuilder.setCategoryPaths(categories);
                categoryDocBuilder.build(doc);
                writer.addDocument(doc);
            }
        }

        // The static nested Book class shown earlier goes here.
    }
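The example above should produce the following output. Note that the "3 results" in the first line is the number of facet results (one per facet request), not the number of matching books: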
    sbt> run-main com.hascode.tutorial.FacetingExample
    [info] Running com.hascode.tutorial.FacetingExample
    Search for books with the category:Novel returned : 3 results
    ---------------------------------

    Matching author:
    ------------------------------------
        Mark Twain: 2
        Some author: 1
        Franz Kafka: 1

    Matching category:
    ------------------------------------
        Novel: 4

    Matching published:
    ------------------------------------
        1901: 2
        1850: 1
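To cross-check such numbers by hand, the counting itself can be pictured with a few lines of plain Java. This is just a sketch of the idea and not how Lucene computes facets internally. FacetCountSketch, BOOKS and countFacets are hypothetical names, and the values come out in insertion order rather than sorted by count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Conceptual model of count faceting; no Lucene involved. */
public class FacetCountSketch {

    // Same sample data as in the example: title, author, published, category.
    public static final String[][] BOOKS = {
        {"Tom Sawyer", "Mark Twain", "1840", "Novel"},
        {"Collected Tales", "Mark Twain", "1850", "Novel"},
        {"The Trial", "Franz Kafka", "1901", "Novel"},
        {"Some book", "Some author", "1901", "Novel"}
    };

    /** Counts, per facet, how many books of the given category carry each value. */
    public static Map<String, Map<String, Integer>> countFacets(
            String[][] books, String category) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (String dim : new String[] {"author", "category", "published"}) {
            counts.put(dim, new LinkedHashMap<>());
        }
        for (String[] b : books) {
            if (!b[3].equals(category)) {
                continue; // plays the role of the TermQuery on "category"
            }
            counts.get("author").merge(b[1], 1, Integer::sum);
            counts.get("category").merge(b[3], 1, Integer::sum);
            counts.get("published").merge(b[2], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countFacets(BOOKS, "Novel").forEach((dim, values) -> {
            System.out.println("Matching " + dim + ":");
            values.forEach((label, n) -> System.out.println("\t" + label + ": " + n));
        });
    }
}
```

Every book contributes one value per facet, so the counts within each facet add up to the number of matching books.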
Tutorial Sources
Please feel free to view and download the complete sources for this tutorial from my Bitbucket repository – or, if you’ve got Mercurial installed, just check them out with:
hg clone https://bitbucket.org/hascode/lucene-4-tutorial
September 10th, 2012 at 8:12 am
Could you elaborate on how to create a filter query based on a facet entry? In your example it would be like clicking on “Mark Twain” and expecting only results within that facet.
January 16th, 2014 at 5:22 am
I am using Lucene 4.4. CountFacetRequest as the FacetRequest is working perfectly fine. Thanks.
But I tried SumScoreFacetRequest as the FacetRequest and it’s still showing the count.
Please help me: how can I change the count to some different parameter of my own choice?