Articles Tagged ‘content extraction’

Content Detection, Metadata and Content Extraction with Apache Tika

Sunday, December 2nd, 2012

If you need to extract metadata or content from a file, whether it is an office document, a spreadsheet, an MP3 or an image, or you want to detect the content type of a given file, Apache Tika might be a helpful tool for you.

Apache Tika supports a variety of document formats and offers a clean, extensible parser and detection API with many built-in parsers.
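To make the idea of content type detection concrete, here is a minimal, standard-library-only sketch of one ingredient such a detector can use: comparing the first bytes of a file against well-known signatures ("magic bytes"). This is not Tika's API or implementation; the class name `MagicSniffer`, its `detect` method and the small signature table are hypothetical and chosen only for illustration. A real detector covers far more formats and combines several hints.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Apache Tika.
final class MagicSniffer {

    // A few well-known leading byte signatures mapped to media types.
    private static final Map<byte[], String> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf");
        SIGNATURES.put(new byte[] {(byte) 0x89, 'P', 'N', 'G'}, "image/png");
        SIGNATURES.put(new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}, "image/jpeg");
        // MP3 files that begin with an ID3v2 tag.
        SIGNATURES.put("ID3".getBytes(StandardCharsets.US_ASCII), "audio/mpeg");
        // ZIP containers; many modern office formats are ZIP archives, so a
        // real detector would have to look inside to tell them apart.
        SIGNATURES.put(new byte[] {'P', 'K', 3, 4}, "application/zip");
    }

    // Returns the media type of the first matching signature,
    // or a generic fallback if nothing matches.
    static String detect(byte[] head) {
        for (Map.Entry<byte[], String> entry : SIGNATURES.entrySet()) {
            if (startsWith(head, entry.getKey())) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }

    // True if data begins with exactly the bytes in prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample)); // prints "application/pdf"
    }
}
```

In practice you would read the first few bytes of the file (for example with `java.nio.file.Files`) and pass them to `detect`. The ZIP case shows why a library is worth having: a signature check alone cannot distinguish a plain archive from an office document stored in a ZIP container.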

