Articles Tagged ‘lucene’

Elasticsearch Integration Testing with Java

Tuesday, August 23rd, 2016

When building search engines or indexing large amounts of data into a schema-less, distributed data store, Elasticsearch has always been a favourite tool of mine.

In addition to its core features, it also offers tools and documentation for us developers when we need to write integration tests for our Elasticsearch-powered Java applications.

In the following tutorial I’d like to demonstrate how to implement a small sample application that uses Elasticsearch under the hood, and how to write integration tests for this application with these tools.

(more…)

Lucene by Example: Specifying Analyzers on a per-field-basis and writing a custom Analyzer/Tokenizer

Sunday, July 6th, 2014

Lucene is my favourite search engine library, and the more often I use it in my projects, the more features I discover that were previously unknown to me.

In the following tutorial I’d like to share two of those features: on the one hand the possibility to specify different analyzers on a per-field basis, and on the other hand the API to create a simple character-based tokenizer and analyzer in a few steps.

Finally, we’re going to create a small index and search application to test both features in a real scenario.

(more…)

Creating elegant, typesafe Queries for JPA, mongoDB/Morphia and Lucene using Querydsl

Thursday, February 13th, 2014

Querydsl is a framework that allows us to create elegant, type-safe queries for a variety of data sources such as Java Persistence API (JPA) entities, Java Data Objects (JDO), mongoDB with Morphia, SQL, Hibernate Search and Lucene.

In the following tutorial we’re implementing example queries for different environments: the Java Persistence API (compared with a JPQL and a Criteria API query), mongoDB with Morphia and, last but not least, Lucene.

(more…)

Content Detection, Metadata and Content Extraction with Apache Tika

Sunday, December 2nd, 2012

If you want to extract metadata or content from a file (be it an office document, a spreadsheet or even an MP3 or an image), or you’d like to detect the content type of a given file, then Apache Tika might be a helpful tool for you.

Apache Tika supports a variety of document formats and has a nice, extensible parser and detection API with a lot of built-in parsers available.

(more…)

Lucene Snippets: Index Stats

Saturday, September 8th, 2012

In Lucene 4.x there is an API to fetch index statistics for specific document fields.

The following example shows how to create an index with some random documents and fetch some statistics for a field afterwards.

(more…)

Lucene Snippets: Faceting Search

Tuesday, August 28th, 2012

The latest snippet from my Lucene examples demonstrates how to implement a faceted search using the Lucene 4.0 API and how easy it is to define multiple category paths to aggregate search results for different possible facets.

In the following example we’re indexing some books as a classic example and creating multiple category paths for author, publication date and category.

(more…)

Hibernate Search Faceting: Discrete and Range Faceting by Example

Monday, March 26th, 2012

In today’s tutorial we’re exploring the world of faceted searches like the ones we’re used to seeing when we search for an item on Amazon.com or other websites. We’re using Hibernate Search here, which offers an API to perform discrete as well as range faceted searches on our persisted data.

(more…)

JPA Persistence and Lucene Indexing combined in Hibernate Search

Sunday, February 5th, 2012

Often we’re writing an application that has to handle entities that, on the one hand, need to be persisted in a relational database using standards like the Java Persistence API (JPA) and frameworks like Hibernate ORM or EclipseLink.

On the other hand, those entities and their fields are often stored in a high-speed indexer like Lucene. This situation gives rise to a number of common problems: synchronizing both data sources, handling special data mapped in an entity, such as an office document, and so on.

Hibernate Search makes all of this a lot easier for us, as we’re hopefully going to see in the following short tutorial.

(more…)

Neo4j Graph Database Tutorial: How to build a Route Planner and other Examples

Friday, January 20th, 2012

Often in a developer’s life there is a scenario where using a relational database tends to get complicated or sometimes even slow, especially when there are fragments with multiple relationships or multiple connections present. This often leads to complex database queries or desperate software engineers trying to handle those problems with their ORM framework.

A possible solution might be to switch from a relational database to a graph database, and neo4j is our tool of choice here. In the following tutorial we’re going to implement several examples to demonstrate the strengths of a graph database, from a route planner to a social graph.

(more…)

Extending the Confluence Search Index

Sunday, May 23rd, 2010

When developing plugins for the Confluence wiki, a developer sometimes needs to save additional metadata to a page object using Bandana or the ContentPropertyManager. Wouldn’t it be nice if this metadata were available in the built-in Lucene index?

That is where the Confluence Extractor Module comes into play.

(more…)

How to build a quick Lucene Search

Thursday, March 25th, 2010

Hello – today I wanted to post a short tutorial on a small index and search operation using the Lucene indexer and Maven for the project setup.

(more…)
