Using Apache Avro with Java and Maven
March 8th, 2014 by Micha Kops
Apache Avro is a serialization framework similar to Google’s Protocol Buffers or Apache Thrift, offering features like rich data structures, a compact binary format, simple integration with dynamic languages and more.
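To give a feel for what "compact" means: according to the Avro specification, int and long values are written as zig-zag encoded variable-length integers, so values of small magnitude, positive or negative, take a single byte. The following is a JDK-only sketch that re-implements this encoding by hand for illustration. ZigZagSketch and its methods are made-up names, not part of Avro's API.

```java
import java.io.ByteArrayOutputStream;

public class ZigZagSketch {

    // Zig-zag maps signed to unsigned so small negative numbers stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
    // (Avro ints use the same scheme on 32 bits; results are identical for int-range values.)
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Writes the zig-zag value as a base-128 varint: 7 bits per byte,
    // high bit set on every byte except the last one.
    static byte[] encodeLong(long n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long v = zigZag(n);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex, e.g. "80 01".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long[] samples = {0, -1, 1, -64, 64, 456};
        for (long n : samples) {
            System.out.println(n + " -> " + hex(encodeLong(n)));
        }
    }
}
```

Running it prints one line per sample, for example "456 -> 90 07", so the id 456 used later in the tutorial needs only two bytes, while -64 still fits into a single byte (7f).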
In the following short five-minute tutorial, we’re going to specify a schema for books in JSON, use the Avro Maven plugin to generate the stub classes from it and finally serialize some books into a single file.
Why another Framework
What are the advantages of Avro over Google Protocol Buffers (see my article about Protocol Buffers) or Apache Thrift?
IMHO, one thing I like is the use of JSON for the schema definitions. Another good thing is that the schema is written to the serialized file, so there may be fewer problems when using different versions of a schema.
If you’re interested in a more detailed (but biased) comparison, feel free to have a look at this nice presentation from Igor Anishchenko.
Maven Dependencies
We’re adding two things to our pom.xml: the Apache Avro library as a dependency and the Avro Maven plugin, which generates Java classes from our schema files.
We’re configuring the plugin to look for schema files in src/main/avro and to write the generated Java classes into src/main/java.
<dependencies>
    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.7.6</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.7.6</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>schema</goal>
                    </goals>
                    <configuration>
                        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Defining the Book Schema
An Avro schema is written in JSON and may use primitive types (null, boolean, int, long, float, double, bytes, string) as well as complex types (record, enum, array, map, union, fixed).
Detailed information can be found in the Avro documentation.
This is our book schema in src/main/avro/book.avsc:
{
  "namespace": "com.hascode.entity",
  "type": "record",
  "name": "Book",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "id", "type": ["int", "null"]},
    {"name": "category", "type": ["string", "null"]}
  ]
}
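The union types ["int", "null"] and ["string", "null"] are what make id and category optional: a value is either an int (or string) or null, while name is required. As a sketch, assuming the schema is available as a string (in a real project you could pass the .avsc File to Schema.Parser.parse instead), Avro's Schema API can be used to inspect this. SchemaInspection and optionalFields are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;

public class SchemaInspection {

    // The book.avsc schema from above, inlined as a string for this sketch.
    static final String BOOK_SCHEMA = "{\"namespace\": \"com.hascode.entity\","
            + " \"type\": \"record\", \"name\": \"Book\", \"fields\": ["
            + " {\"name\": \"name\", \"type\": \"string\"},"
            + " {\"name\": \"id\", \"type\": [\"int\", \"null\"]},"
            + " {\"name\": \"category\", \"type\": [\"string\", \"null\"]}]}";

    // Hypothetical helper: a field counts as optional here if its type
    // is a union that contains the "null" type.
    static List<String> optionalFields(Schema record) {
        List<String> optional = new ArrayList<String>();
        for (Schema.Field field : record.getFields()) {
            Schema type = field.schema();
            if (type.getType() == Schema.Type.UNION) {
                for (Schema branch : type.getTypes()) {
                    if (branch.getType() == Schema.Type.NULL) {
                        optional.add(field.name());
                        break;
                    }
                }
            }
        }
        return optional;
    }

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(BOOK_SCHEMA);
        System.out.println(schema.getFullName());   // com.hascode.entity.Book
        System.out.println(optionalFields(schema)); // [id, category]
    }
}
```

The full name combines namespace and record name, which is also the package and class name the Maven plugin generates.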
Schema Compiling / Generating Classes
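Run the following command from the project directory (the one containing the pom.xml) to generate the Book class from the schema file. With the namespace com.hascode.entity and the plugin configuration above, the class is written to src/main/java/com/hascode/entity/Book.java: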
mvn generate-sources
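Code generation is optional, though. As a sketch, Avro's generic API can work with records built directly from a parsed schema, without running the plugin at all. The class GenericRoundTrip and its helpers are hypothetical names, and the book schema is inlined as a string. This variant uses the plain binary encoding, which writes no header, so the reader has to be given the schema separately.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class GenericRoundTrip {

    // The book schema from src/main/avro/book.avsc, inlined for this sketch.
    static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"namespace\": \"com.hascode.entity\", \"type\": \"record\", \"name\": \"Book\","
            + " \"fields\": [{\"name\": \"name\", \"type\": \"string\"},"
            + " {\"name\": \"id\", \"type\": [\"int\", \"null\"]},"
            + " {\"name\": \"category\", \"type\": [\"string\", \"null\"]}]}");

    // Builds a record without any generated class.
    static GenericRecord book(String name, Integer id, String category) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("name", name);
        record.put("id", id);
        record.put("category", category);
        return record;
    }

    // Plain binary encoding: field values only, no header and no embedded schema.
    static byte[] encode(GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush(); // the encoder is buffered
        return out.toByteArray();
    }

    // Decoding only works because the caller supplies the writer's schema.
    static GenericRecord decode(byte[] bytes) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return new GenericDatumReader<GenericRecord>(SCHEMA).read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(book("Some book", 456, "Horror"));
        System.out.println(bytes.length + " bytes");
        System.out.println(decode(bytes));
    }
}
```

For the sample book this yields 21 bytes: each string length and each union branch index (0 for the non-null branch) is a single varint byte, 456 takes two bytes, and the rest is UTF-8 text. Leaving the category null shrinks it to 14 bytes, because only the branch index of "null" is written.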
Serializing / Deserializing from a File
The following snippet serializes three books to a file, then reads the file back and prints each deserialized book to standard output.
package com.hascode.tutorial;
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import com.hascode.entity.Book;
public class FileSerializationExample {
    public static void main(final String[] args) throws IOException {
        // three ways to create a Book: builder, all-args constructor, setters
        Book book1 = Book.newBuilder().setId(123).setName("Programming is fun")
                .setCategory("Fiction").build();
        Book book2 = new Book("Some book", 456, "Horror");
        Book book3 = new Book();
        book3.setName("And another book");
        book3.setId(789); // category stays null, allowed by the ["string", "null"] union
        File store = File.createTempFile("book", ".avro");
        // serializing
        System.out.println("serializing books to temp file: " + store.getPath());
        DatumWriter<Book> bookDatumWriter = new SpecificDatumWriter<Book>(Book.class);
        DataFileWriter<Book> bookFileWriter = new DataFileWriter<Book>(bookDatumWriter);
        bookFileWriter.create(book1.getSchema(), store); // writes the file header including the schema
        bookFileWriter.append(book1);
        bookFileWriter.append(book2);
        bookFileWriter.append(book3);
        bookFileWriter.close();
        // deserializing
        DatumReader<Book> bookDatumReader = new SpecificDatumReader<Book>(Book.class);
        DataFileReader<Book> bookFileReader = new DataFileReader<Book>(store, bookDatumReader);
        while (bookFileReader.hasNext()) {
            Book b1 = bookFileReader.next();
            System.out.println("deserialized from file: " + b1);
        }
        bookFileReader.close(); // release the file handle
    }
}
Running the example code should produce output like the following (the name of the temp file will differ):
serializing books to temp file: /tmp/book5516033028097754203.avro
deserialized from file: {"name": "Programming is fun", "id": 123, "category": "Fiction"}
deserialized from file: {"name": "Some book", "id": 456, "category": "Horror"}
deserialized from file: {"name": "And another book", "id": 789, "category": null}
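As mentioned in the introduction, the container file written by DataFileWriter stores the writer's schema in its header, so a reader needs neither the generated Book class nor the .avsc file. The following sketch shows this with Avro's generic API and an in-memory stream instead of the temp file; ContainerSchemaSketch and its helpers are hypothetical names. The reader is created without a schema, and DataFileStream picks it up from the header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ContainerSchemaSketch {

    // The book schema from src/main/avro/book.avsc, inlined for this sketch.
    static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"namespace\": \"com.hascode.entity\", \"type\": \"record\", \"name\": \"Book\","
            + " \"fields\": [{\"name\": \"name\", \"type\": \"string\"},"
            + " {\"name\": \"id\", \"type\": [\"int\", \"null\"]},"
            + " {\"name\": \"category\", \"type\": [\"string\", \"null\"]}]}");

    static GenericRecord book(String name, Integer id, String category) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("name", name);
        record.put("id", id);
        record.put("category", category);
        return record;
    }

    // Writes an in-memory container file; create() stores SCHEMA in the header.
    static byte[] writeContainer(List<GenericRecord> books) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(SCHEMA));
        writer.create(SCHEMA, out);
        for (GenericRecord book : books) {
            writer.append(book);
        }
        writer.close();
        return out.toByteArray();
    }

    // The datum reader gets no schema; DataFileStream reads it from the header.
    static DataFileStream<GenericRecord> open(byte[] container) throws IOException {
        return new DataFileStream<GenericRecord>(
                new ByteArrayInputStream(container), new GenericDatumReader<GenericRecord>());
    }

    static Schema embeddedSchema(byte[] container) throws IOException {
        DataFileStream<GenericRecord> stream = open(container);
        try {
            return stream.getSchema();
        } finally {
            stream.close();
        }
    }

    static List<String> bookNames(byte[] container) throws IOException {
        DataFileStream<GenericRecord> stream = open(container);
        List<String> names = new ArrayList<String>();
        try {
            for (GenericRecord book : stream) {
                names.add(book.get("name").toString());
            }
        } finally {
            stream.close();
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] container = writeContainer(Arrays.asList(
                book("Programming is fun", 123, "Fiction"),
                book("And another book", 789, null)));
        System.out.println(embeddedSchema(container).toString(true));
        System.out.println(bookNames(container));
    }
}
```

The container starts with the magic bytes 'O', 'b', 'j', 1 defined in the Avro specification. The same reading approach should also work on the temp file from the example above by wrapping it in a FileInputStream.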
Tutorial Sources
Please feel free to download the tutorial sources from my Bitbucket repository, fork it there or clone it using Git:
git clone https://bitbucket.org/hascode/avro-tutorial.git
Resources
- Apache Avro Website
- Apache Avro Documentation: Schema Format
- Igor Anishchenko: Thrift vs Protocol Buffers vs Avro – Biased Comparison
Tags: Apache, avro, google protocol buffers, maven, serialization, thrift
March 10th, 2014 at 1:42 pm
Or with plain old CORBA, which comes with the JDK:
module entity
{
    struct Book
    {
        string name;
        long id;
        string category;
    };
    typedef sequence<Book> BookSequence;
};
package t;
import java.io.FileOutputStream;
import org.omg.CORBA.Any;
import org.omg.CORBA.ORB;
import org.omg.IOP.Codec;
import org.omg.IOP.CodecFactory;
import org.omg.IOP.CodecFactoryHelper;
import org.omg.IOP.ENCODING_CDR_ENCAPS;
import org.omg.IOP.Encoding;
import entity.Book;
import entity.BookSequenceHelper;
public class T
{
    public static void main(String[] args) throws Exception
    {
        ORB orb = ORB.init(args, null);
        org.omg.CORBA.Object obj = orb.resolve_initial_references("CodecFactory");
        CodecFactory codecFactory = CodecFactoryHelper.narrow(obj);
        Codec codec = codecFactory.create_codec(new Encoding(
            ENCODING_CDR_ENCAPS.value, (byte) 1, (byte) 2));
        Book[] books = {
            new Book("Programming is fun", 1, "Fiction"),
            new Book("Some book", 2, "Horror")};
        Any any = orb.create_any();
        BookSequenceHelper.insert(any, books);
        byte[] b = codec.encode(any);
        FileOutputStream fos = new FileOutputStream("books.iiop");
        fos.write(b);
        fos.close();
        Any decoded = codec.decode(b);
        Book[] books2 = BookSequenceHelper.extract(decoded);
        for (Book book : books2) {
            System.out.println(book.name + ", " + book.id + ", " + book.category);
        }
    }
}
March 10th, 2014 at 9:58 pm
Interesting, haven’t seen CORBA in a while! Thanks for your input! :)
February 22nd, 2018 at 1:02 pm
Nice article, it helped me to understand the Avro Maven plugin. I am working on my college project, thank you very much!
Could you please tell me where the Maven command is executed, i.e. in which folder? Correct me if I am wrong: do I need to run it where the Avro schema file is located?
Can you explain how to do this using Eclipse? I am having issues with that approach.
February 22nd, 2018 at 7:53 pm
Hi,
this is my project structure:
I’m running Maven from the project directory, i.e. the directory containing the pom.xml. In the Eclipse IDE you might need the m2eclipse integration.