<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Search on Micha Kops&#39; Tech Notes</title>
    <link>https://www.hascode.com/tags/search/</link>
    <description>Recent content in Search on Micha Kops&#39; Tech Notes</description>
    <generator>Hugo</generator>
    <language>en</language>
    <copyright>Copyright © 2010 - 2025 Micha Kops. #e9d956c0c0154a221ad83c925346a8fa0e72f866</copyright>
    <lastBuildDate>Tue, 23 Aug 2016 00:00:00 +0200</lastBuildDate>
    <atom:link href="https://www.hascode.com/tags/search/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Elasticsearch Integration Testing with Java</title>
      <link>https://www.hascode.com/elasticsearch-integration-testing-with-java/</link>
      <pubDate>Tue, 23 Aug 2016 00:00:00 +0200</pubDate>
      <guid>https://www.hascode.com/elasticsearch-integration-testing-with-java/</guid>
      <description>&lt;div id=&#34;preamble&#34;&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;When building up search engines, indexing tons of data into a schema-less, distributed data store, Elasticsearch has always been a favourite tool of mine.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;In addition to its core features, it also offers tools and documentation for us developers when we need to write integration tests for our Elasticsearch powered Java applications.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;In the following tutorial I’d like to demonstrate how to implement a small sample application using Elasticsearch under the hood and how to write integration tests for this application with these tools afterwards.&lt;/p&gt;
&lt;/div&gt;</description>
    </item>
    <item>
      <title>Lucene by Example: Specifying Analyzers on a per-field-basis and writing a custom Analyzer/Tokenizer</title>
      <link>https://www.hascode.com/lucene-by-example-specifying-analyzers-on-a-per-field-basis-and-writing-a-custom-analyzer/tokenizer/</link>
      <pubDate>Sun, 06 Jul 2014 00:00:00 +0200</pubDate>
      <guid>https://www.hascode.com/lucene-by-example-specifying-analyzers-on-a-per-field-basis-and-writing-a-custom-analyzer/tokenizer/</guid>
      <description>&lt;div id=&#34;preamble&#34;&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;Lucene is my favourite search engine library and the more often I use it in my projects the more features or functionality I find that were unknown to me.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;In the following tutorial I’d like to share two of those features: on the one hand, the possibility to specify different analyzers on a per-field basis, and on the other hand, the API to create a simple character-based tokenizer and analyzer within a few steps.&lt;/p&gt;
&lt;/div&gt;</description>
    </item>
    <item>
      <title>Content Detection, Metadata and Content Extraction with Apache Tika</title>
      <link>https://www.hascode.com/content-detection-metadata-and-content-extraction-with-apache-tika/</link>
      <pubDate>Sun, 02 Dec 2012 00:00:00 +0100</pubDate>
      <guid>https://www.hascode.com/content-detection-metadata-and-content-extraction-with-apache-tika/</guid>
      <description>&lt;div id=&#34;preamble&#34;&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;If you want to extract metadata or content from a file, be it an office document, a spreadsheet, an mp3 or an image, or you’d like to detect the content type of a given file, then Apache Tika might be a helpful tool for you.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;Apache Tika supports a variety of document formats and has a nice, extendable parser and detection API with a lot of built-in parsers available.&lt;/p&gt;
&lt;/div&gt;</description>
    </item>
    <item>
      <title>Lucene Snippets: Index Stats</title>
      <link>https://www.hascode.com/lucene-snippets-index-stats/</link>
      <pubDate>Sat, 08 Sep 2012 00:00:00 +0200</pubDate>
      <guid>https://www.hascode.com/lucene-snippets-index-stats/</guid>
      <description>&lt;div id=&#34;preamble&#34;&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;In Lucene 4.x there is an API to fetch index statistics for specific document fields.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;The following example shows how to create an index with some random documents and fetch some statistics for a field afterwards.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;sect1&#34;&gt;
&lt;h2 id=&#34;_lucene_dependencies&#34;&gt;Lucene Dependencies&lt;/h2&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;Just one dependency is needed here: &lt;em&gt;lucene-core&lt;/em&gt;. I’ve added the declarations needed for Maven and SBT here. If you’re using Gradle or Buildr, you shouldn’t have a problem creating your build file either.&lt;/p&gt;
&lt;/div&gt;</description>
    </item>
    <item>
      <title>Lucene Snippets: Faceting Search</title>
      <link>https://www.hascode.com/lucene-snippets-faceting-search/</link>
      <pubDate>Tue, 28 Aug 2012 00:00:00 +0200</pubDate>
      <guid>https://www.hascode.com/lucene-snippets-faceting-search/</guid>
      <description>&lt;div id=&#34;preamble&#34;&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;The latest snippet from my Lucene examples demonstrates how to achieve a facet search using the Lucene 4.0 API and how easy it is to define multiple category paths to aggregate search results for different possible facets.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;In the following example we index some books as a classic example and create multiple category paths for author, publication date and category afterwards.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;sect1&#34;&gt;
&lt;h2 id=&#34;_lucene_dependencies&#34;&gt;Lucene Dependencies&lt;/h2&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;We simply need two dependencies here: &lt;em&gt;lucene-core&lt;/em&gt;, of course, and in addition the &lt;em&gt;lucene-facet&lt;/em&gt; library. I’ve added the declarations needed for Maven and SBT here. If you’re using Gradle or Buildr, you shouldn’t have a problem transferring the information needed ;)&lt;/p&gt;
&lt;/div&gt;</description>
    </item>
    <item>
      <title>Hibernate Search Faceting: Discrete and Range Faceting by Example</title>
      <link>https://www.hascode.com/hibernate-search-faceting-discrete-and-range-faceting-by-example/</link>
      <pubDate>Mon, 26 Mar 2012 00:00:00 +0200</pubDate>
      <guid>https://www.hascode.com/hibernate-search-faceting-discrete-and-range-faceting-by-example/</guid>
      <description>&lt;div id=&#34;preamble&#34;&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;In today’s tutorial we’re exploring the world of faceted searches like the ones we’re used to seeing when we search for an item on Amazon.com or other websites. We’re using Hibernate Search here, which offers an API to perform discrete as well as range faceted searches on our persisted data.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;sect1&#34;&gt;
&lt;h2 id=&#34;_maven_dependencies_needed&#34;&gt;Maven Dependencies Needed&lt;/h2&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;For simplicity’s sake I am going to use an HSQL database for persistence. In addition, the dependencies for &lt;em&gt;hibernate-entitymanager&lt;/em&gt; and &lt;em&gt;hibernate-search&lt;/em&gt; (of course) should be added to your &lt;em&gt;pom.xml&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;</description>
    </item>
    <item>
      <title>Extending the Confluence Search Index</title>
      <link>https://www.hascode.com/extending-the-confluence-search-index/</link>
      <pubDate>Sun, 23 May 2010 00:00:00 +0200</pubDate>
      <guid>https://www.hascode.com/extending-the-confluence-search-index/</guid>
      <description>&lt;div id=&#34;preamble&#34;&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;When developing plugins for the Confluence Wiki, a developer sometimes needs to save additional metadata to a page object using Bandana or the ContentPropertyManager. Wouldn’t it be nice if this metadata was available in the built-in Lucene index?&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;That is where the Confluence Extractor Module comes into play.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;sect1&#34;&gt;
&lt;h2 id=&#34;_overview&#34;&gt;Overview&lt;/h2&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;An extractor allows the developer to add new fields to the Lucene search index. Creating a new extractor is quite simple: just implement the interface &lt;em&gt;com.atlassian.bonnie.search.Extractor&lt;/em&gt;, or &lt;em&gt;bucket.search.lucene.extractor.BaseAttachmentContentExtractor&lt;/em&gt; if you want to build a new file extractor.&lt;/p&gt;
&lt;/div&gt;</description>
    </item>
    <item>
      <title>How to build a quick Lucene Search</title>
      <link>https://www.hascode.com/how-to-build-a-quick-lucene-search/</link>
      <pubDate>Thu, 25 Mar 2010 00:00:00 +0100</pubDate>
      <guid>https://www.hascode.com/how-to-build-a-quick-lucene-search/</guid>
      <description>&lt;div id=&#34;preamble&#34;&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;Hello, today I want to post a small tutorial on a simple index and search operation using the &lt;a href=&#34;http://lucene.apache.org/&#34;&gt;Lucene&lt;/a&gt; indexer and Maven for the project setup.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;sect1&#34;&gt;
&lt;h2 id=&#34;_setup&#34;&gt;Setup&lt;/h2&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;olist arabic&#34;&gt;
&lt;ol class=&#34;arabic&#34;&gt;
&lt;li&gt;
&lt;p&gt;Create an empty Maven sample project using the &lt;a href=&#34;http://m2eclipse.sonatype.org/&#34;&gt;Eclipse Maven Plugin&lt;/a&gt; or use the following console command:&lt;/p&gt;
&lt;div class=&#34;listingblock&#34;&gt;
&lt;div class=&#34;content&#34;&gt;
&lt;pre class=&#34;highlight&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;mvn archetype:generate -DarchetypeArtifactId=maven-archetype-quickstart -DgroupId=com.hascode.demo.search -DartifactId=lucene-sample -DinteractiveMode=false&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Here is my &lt;em&gt;pom.xml&lt;/em&gt; with the dependencies for Lucene defined:&lt;/p&gt;
&lt;div class=&#34;listingblock&#34;&gt;
&lt;div class=&#34;content&#34;&gt;
&lt;pre class=&#34;highlight&#34;&gt;&lt;code class=&#34;language-xml&#34; data-lang=&#34;xml&#34;&gt;&amp;lt;project xmlns=&amp;#34;http://maven.apache.org/POM/4.0.0&amp;#34; xmlns:xsi=&amp;#34;http://www.w3.org/2001/XMLSchema-instance&amp;#34; xsi:schemaLocation=&amp;#34;http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd&amp;#34;&amp;gt;
  &amp;lt;modelVersion&amp;gt;4.0.0&amp;lt;/modelVersion&amp;gt;
  &amp;lt;groupId&amp;gt;com.hascode.demo.search&amp;lt;/groupId&amp;gt;
  &amp;lt;artifactId&amp;gt;lucene-sample&amp;lt;/artifactId&amp;gt;
  &amp;lt;version&amp;gt;0.0.1-SNAPSHOT&amp;lt;/version&amp;gt;
  &amp;lt;name&amp;gt;My Lucene Search Sample&amp;lt;/name&amp;gt;
  &amp;lt;description&amp;gt;Lucene Search Sample&amp;lt;/description&amp;gt;
  &amp;lt;dependencies&amp;gt;
    &amp;lt;dependency&amp;gt;
      &amp;lt;groupId&amp;gt;org.apache.lucene&amp;lt;/groupId&amp;gt;
      &amp;lt;artifactId&amp;gt;lucene-core&amp;lt;/artifactId&amp;gt;
      &amp;lt;version&amp;gt;2.4.1&amp;lt;/version&amp;gt;
    &amp;lt;/dependency&amp;gt;
    &amp;lt;dependency&amp;gt;
      &amp;lt;groupId&amp;gt;lucene&amp;lt;/groupId&amp;gt;
      &amp;lt;artifactId&amp;gt;lucene&amp;lt;/artifactId&amp;gt;
      &amp;lt;version&amp;gt;1.4.3&amp;lt;/version&amp;gt;
    &amp;lt;/dependency&amp;gt;
  &amp;lt;/dependencies&amp;gt;
&amp;lt;/project&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;</description>
    </item>
    <item>
      <title>Docker Snippets</title>
      <link>https://www.hascode.com/docker-snippets/</link>
      <pubDate>Mon, 01 Mar 2010 00:00:00 +0100</pubDate>
      <guid>https://www.hascode.com/docker-snippets/</guid>
      <description>&lt;div class=&#34;sect1&#34;&gt;
&lt;h2 id=&#34;_restrict_network&#34;&gt;Restrict Network&lt;/h2&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;admonitionblock tip&#34;&gt;
&lt;table&gt;
&lt;tbody&gt;&lt;tr&gt;
&lt;td class=&#34;icon&#34;&gt;
&lt;i class=&#34;fa icon-tip&#34; title=&#34;Tip&#34;&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;td class=&#34;content&#34;&gt;
Can be useful when using a third-party image that we do not trust
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;div class=&#34;sect2&#34;&gt;
&lt;h3 id=&#34;_run_with_no_network&#34;&gt;Run with no network&lt;/h3&gt;
&lt;div class=&#34;listingblock&#34;&gt;
&lt;div class=&#34;content&#34;&gt;
&lt;pre class=&#34;highlight&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;docker run --network none &amp;lt;image&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;sect2&#34;&gt;
&lt;h3 id=&#34;_run_with_private_isolated_network&#34;&gt;Run with private isolated network&lt;/h3&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;Containers attached to this network can at least talk to one another.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;listingblock&#34;&gt;
&lt;div class=&#34;content&#34;&gt;
&lt;pre class=&#34;highlight&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;docker network create --internal my_isolated_network
docker run --network my_isolated_network &amp;lt;image&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;sect2&#34;&gt;
&lt;h3 id=&#34;_block_using_firewall&#34;&gt;Block using firewall&lt;/h3&gt;
&lt;div class=&#34;paragraph&#34;&gt;
&lt;p&gt;e.g. using &lt;code&gt;iptables&lt;/code&gt; or &lt;code&gt;ipfw&lt;/code&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;listingblock&#34;&gt;
&lt;div class=&#34;content&#34;&gt;
&lt;pre class=&#34;highlight&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Get container&amp;#39;s IP
docker inspect -f &amp;#39;{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}&amp;#39; &amp;lt;container_name&amp;gt;

# Block all outbound connections from that IP
sudo iptables -I DOCKER-USER -s &amp;lt;container_ip&amp;gt; -j DROP&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;</description>
    </item>
    <item>
      <title>Firefox Snippets</title>
      <link>https://www.hascode.com/firefox-snippets/</link>
      <pubDate>Mon, 01 Mar 2010 00:00:00 +0100</pubDate>
      <guid>https://www.hascode.com/firefox-snippets/</guid>
      <description>&lt;div class=&#34;sect1&#34;&gt;
&lt;h2 id=&#34;_configure_address_bar_to_return_search_results&#34;&gt;Configure address bar to return search results&lt;/h2&gt;
&lt;div class=&#34;sectionbody&#34;&gt;
&lt;div class=&#34;ulist&#34;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Enter &lt;code&gt;about:config&lt;/code&gt; in the address bar&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Search for the key &lt;code&gt;keyword.url&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Modify the value for your search engine of choice, e.g. for Google search: &lt;a href=&#34;http://www.google.com/search?q=&#34; class=&#34;bare&#34;&gt;http://www.google.com/search?q=&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
    </item>
  </channel>
</rss>
