Articles Tagged ‘parser’

XMLBeam: Snippets and Examples

Tuesday, July 22nd, 2014

XMLBeam is an interesting library that projects parts of an XML DOM tree into Java using simple interfaces, annotations and XPath expressions.

In the following article, I’d like to share three of my experiments with this library: reading XML, writing XML and parsing a live RSS feed.

Creating Grammar Parsers in Java and Scala with Parboiled

Sunday, January 26th, 2014

Parboiled is a modern, lightweight and easy-to-use library for parsing expression grammars (PEG) in Java or Scala. In my humble opinion, it is perfect for use cases where you need something between regular expressions and a complex parser generator like ANTLR.

In the following tutorial, we’re going to create a simple grammar for specifying a task list and write a parser implementation in Java, along with unit tests for each grammar rule.

Additionally, we’ll use the Scala variant of Parboiled to build an abstract syntax tree (AST) parser and analyze a given task list with it.

Content Detection, Metadata and Content Extraction with Apache Tika

Sunday, December 2nd, 2012

If you want to extract metadata or content from a file, be it an office document, a spreadsheet, an MP3 or an image, or you’d like to detect the content type of a given file, then Apache Tika might be a helpful tool for you.

Apache Tika supports a variety of document formats and has a nice, extensible parser and detection API with a lot of built-in parsers available.

Screenscraping made easy using jsoup and Maven

Tuesday, August 30th, 2011

Sometimes in a developer’s life there is no clean API available to gather information from a web application: no SOAP, no XML-RPC and no REST, just a website hiding the information we’re looking for somewhere in its DOM hierarchy. The only solution then is screenscraping.

Screenscraping always leaves me with a bad feeling, but luckily there is a tool that makes this job at least a bit easier for a developer: jsoup to the rescue!
