Using Apache Avro with Java and Maven

March 8th, 2014 by

Apache Avro is a serialization framework similar to Google’s Protocol Buffers or Apache Thrift. It offers rich data structures, a compact binary format, simple integration with dynamic languages and more.

In the following short five-minute tutorial, we’re going to specify a schema for books in JSON, use the Avro Maven plugin to generate the stub classes and finally serialize the data into a single file.

Avro Schema Declaration

Why Another Framework?

What are the advantages of Avro over Google Protocol Buffers (see my article about Protocol Buffers) or Apache Thrift?

In my opinion, one thing to like is that schemas are defined in JSON. Another is that the schema is written into the serialized file, so there may be fewer problems when readers and writers use different versions of a schema.

If you’re interested in a more detailed (but biased) comparison, feel free to have a look at this nice presentation from Igor Anishchenko.
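To make the point about the schema travelling with the data more concrete, here is a minimal sketch that reads the header of an Avro container file with plain JDK classes and prints the embedded schema. It assumes the header layout described in the Avro specification: the magic bytes O, b, j, 1, followed by a metadata map that contains the key avro.schema. The class AvroHeaderPeek and its methods are hypothetical names made up for this illustration; in real code you would ask Avro’s DataFileReader for the schema instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only, not part of the Avro library.
final class AvroHeaderPeek {

	// Reads one long in Avro's variable-length zig-zag encoding.
	static long readLong(ByteBuffer buf) {
		long raw = 0;
		int shift = 0;
		byte b;
		do {
			b = buf.get();
			raw |= (long) (b & 0x7F) << shift; // 7 payload bits per byte
			shift += 7;
		} while ((b & 0x80) != 0); // high bit set: more bytes follow
		return (raw >>> 1) ^ -(raw & 1); // undo the zig-zag mapping
	}

	// Reads a length-prefixed byte sequence and interprets it as UTF-8 text.
	static String readUtf8(ByteBuffer buf) {
		byte[] data = new byte[(int) readLong(buf)];
		buf.get(data);
		return new String(data, StandardCharsets.UTF_8);
	}

	// Checks the magic bytes and returns the file metadata map.
	static Map<String, String> readMetadata(ByteBuffer buf) {
		if (buf.get() != 'O' || buf.get() != 'b' || buf.get() != 'j' || buf.get() != 1) {
			throw new IllegalArgumentException("not an Avro container file");
		}
		Map<String, String> meta = new LinkedHashMap<>();
		long count;
		while ((count = readLong(buf)) != 0) { // a block count of 0 ends the map
			if (count < 0) {
				count = -count;
				readLong(buf); // negative count: a block size in bytes follows, skipped here
			}
			for (long i = 0; i < count; i++) {
				String key = readUtf8(buf);
				meta.put(key, readUtf8(buf));
			}
		}
		return meta;
	}

	public static void main(String[] args) throws IOException {
		// Reads the whole file into memory, which is fine for a small sample file.
		ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(Paths.get(args[0])));
		System.out.println(readMetadata(buf).get("avro.schema"));
	}
}
```

Pass the path of an .avro file, such as the one written later in this tutorial, as the first argument to see the JSON schema that was stored alongside the records.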

Maven Dependencies

We’re adding two entries to our pom.xml: the Apache Avro library as a dependency, and the Avro Maven plugin that allows us to generate Java classes from our schema files.

We’re configuring the plugin to look in src/main/avro for schema files and to write the generated Java classes to src/main/java.

<dependencies>
	<dependency>
		<groupId>org.apache.avro</groupId>
		<artifactId>avro</artifactId>
		<version>1.7.6</version>
	</dependency>
</dependencies>
 
<build>
	<plugins>
		<plugin>
			<groupId>org.apache.avro</groupId>
			<artifactId>avro-maven-plugin</artifactId>
			<version>1.7.6</version>
			<executions>
				<execution>
					<phase>generate-sources</phase>
					<goals>
						<goal>schema</goal>
					</goals>
					<configuration>
						<sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
						<outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
					</configuration>
				</execution>
			</executions>
		</plugin>
	</plugins>
</build>

Defining the Book Schema

An Avro schema is written in JSON and may use different primitive or complex types.

Detailed information can be found in the Avro documentation.

This is our book schema in src/main/avro/book.avsc:

{
	"namespace": "com.hascode.entity",
	"type": "record",
	"name": "Book",
	"fields": [
		{"name": "name", "type": "string"},
		{"name": "id",  "type": ["int", "null"]},
		{"name": "category", "type": ["string", "null"]}
	 ]
}
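To get a feeling for what a record of this schema looks like in Avro’s binary format, the following sketch encodes a Book by hand with plain JDK classes. It follows the binary encoding rules of the Avro specification as I read them: fields are written in schema order without names, ints and longs use a variable-length zig-zag encoding, a string is its byte length followed by UTF-8 bytes, and a union value is prefixed with the zero-based index of the chosen branch. The class BookBinarySketch is a hypothetical name for this illustration only; real code would leave this work to the Avro library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not generated by Avro.
final class BookBinarySketch {

	// Avro int/long: zig-zag mapping, then 7 bits per byte, lowest group first.
	static void writeLong(ByteArrayOutputStream out, long value) {
		long zigzag = (value << 1) ^ (value >> 63);
		while ((zigzag & ~0x7FL) != 0) {
			out.write((int) ((zigzag & 0x7F) | 0x80)); // high bit marks a continuation
			zigzag >>>= 7;
		}
		out.write((int) zigzag);
	}

	// Avro string: byte length as a long, followed by the UTF-8 bytes.
	static void writeString(ByteArrayOutputStream out, String value) {
		byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
		writeLong(out, utf8.length);
		out.write(utf8, 0, utf8.length);
	}

	// Encodes the three Book fields in schema order: name, id, category.
	static byte[] encodeBook(String name, Integer id, String category) {
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		writeString(out, name);
		if (id != null) {
			writeLong(out, 0); // union branch 0: "int"
			writeLong(out, id);
		} else {
			writeLong(out, 1); // union branch 1: "null", which has no payload
		}
		if (category != null) {
			writeLong(out, 0); // union branch 0: "string"
			writeString(out, category);
		} else {
			writeLong(out, 1); // union branch 1: "null"
		}
		return out.toByteArray();
	}

	public static void main(String[] args) {
		byte[] withCategory = encodeBook("Programming is fun", 123, "Fiction");
		byte[] withoutCategory = encodeBook("And another book", 789, null);
		System.out.println(withCategory.length + " bytes with a category");
		System.out.println(withoutCategory.length + " bytes without a category");
	}
}
```

Because no field names are stored per record, a reader needs the schema to interpret these bytes. A data file additionally wraps such records in a header and block framing, so the file is larger than the sum of its records.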

Schema Compiling / Generating Classes

Run the following command from the project directory (where the pom.xml is located) to generate the Book class from the schema file:

mvn generate-sources

Serializing / Deserializing from a File

The following snippet serializes three books to a file, then deserializes them and prints each one to the output.

package com.hascode.tutorial;
 
import java.io.File;
import java.io.IOException;
 
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
 
import com.hascode.entity.Book;
 
public class FileSerializationExample {
	public static void main(final String[] args) throws IOException {
		Book book1 = Book.newBuilder().setId(123).setName("Programming is fun")
				.setCategory("Fiction").build();
		Book book2 = new Book("Some book", 456, "Horror");
		Book book3 = new Book();
		book3.setName("And another book");
		book3.setId(789);
		File store = File.createTempFile("book", ".avro");
 
		// serializing
		System.out
				.println("serializing books to temp file: " + store.getPath());
		DatumWriter<Book> bookDatumWriter = new SpecificDatumWriter<Book>(
				Book.class);
		DataFileWriter<Book> bookFileWriter = new DataFileWriter<Book>(
				bookDatumWriter);
		bookFileWriter.create(book1.getSchema(), store);
		bookFileWriter.append(book1);
		bookFileWriter.append(book2);
		bookFileWriter.append(book3);
		bookFileWriter.close();
 
		// deserializing
		DatumReader<Book> bookDatumReader = new SpecificDatumReader<Book>(
				Book.class);
		DataFileReader<Book> bookFileReader = new DataFileReader<Book>(store,
				bookDatumReader);
		while (bookFileReader.hasNext()) {
			Book b1 = bookFileReader.next();
			System.out.println("deserialized from file: " + b1);
		}
		// release the file handle once all records have been read
		bookFileReader.close();
	}
 
}

Running the example code should produce the following output:

serializing books to temp file: /tmp/book5516033028097754203.avro
deserialized from file: {"name": "Programming is fun", "id": 123, "category": "Fiction"}
deserialized from file: {"name": "Some book", "id": 456, "category": "Horror"}
deserialized from file: {"name": "And another book", "id": 789, "category": null}

Tutorial Sources

Please feel free to download the tutorial sources from my Bitbucket repository, fork it there or clone it using Git:

git clone https://bitbucket.org/hascode/avro-tutorial.git

4 Responses to “Using Apache Avro with Java and Maven”

  1. Juergen Weber Says:

    Or with the JDK’s onboard Plain Old CORBA:

    module entity
    {
        struct Book
        {
            string name;
            long id;
            string category;
        };

        typedef sequence<Book> BookSequence;
    };

    package t;

    import java.io.FileOutputStream;
    import java.util.Properties;

    import org.omg.CORBA.Any;
    import org.omg.CORBA.ORB;
    import org.omg.IOP.Codec;
    import org.omg.IOP.CodecFactory;
    import org.omg.IOP.CodecFactoryHelper;
    import org.omg.IOP.ENCODING_CDR_ENCAPS;
    import org.omg.IOP.Encoding;

    import entity.Book;
    import entity.BookSequenceHelper;

    public class T
    {
        public static void main(String[] args) throws Exception
        {
            ORB orb = ORB.init(args, null);

            org.omg.CORBA.Object obj = orb.resolve_initial_references("CodecFactory");

            CodecFactory codecFactory = CodecFactoryHelper.narrow(obj);

            Codec codec = codecFactory.create_codec(new Encoding(
                    ENCODING_CDR_ENCAPS.value, (byte) 1, (byte) 2));

            Book[] books = {
                    new Book("Programming is fun", 1, "Fiction"),
                    new Book("Some book", 2, "Horror")};

            Any any = orb.create_any();

            BookSequenceHelper.insert(any, books);

            byte[] b = codec.encode(any);

            FileOutputStream fos = new FileOutputStream("books.iiop");
            fos.write(b);
            fos.close();

            Any decoded = codec.decode(b);

            Book[] books2 = BookSequenceHelper.extract(decoded);

            System.out.println(books2);
        }
    }

  2. micha kops Says:

    Interesting, haven’t seen CORBA in a while! Thanks for your input! :)

  3. laki Says:

    Nice article, it helped me to understand the Avro Maven plugin. I am working on my college project, thank you very much.

    Could you please tell me where the Maven command is executed, I mean in what folder? Correct me if I am wrong: do I need to run it where the Avro schema file is located?

    Can you explain how to do this using Eclipse? I am having issues with that approach.

  4. Micha Kops Says:

    Hi,

    this is my project structure:

    .
    ├── pom.xml
    ├── README.md
    └── src
        ├── main
        │   ├── avro
        │   │   └── book.avsc
        │   ├── java
        │   │   └── com
        │   │       └── hascode
        │   │           ├── entity
        │   │           │   └── Book.java
        │   │           └── tutorial
        │   │               └── FileSerializationExample.java
        │   └── resources
        └── test
            ├── java
            └── resources
    

    I’m running Maven from the project directory. In Eclipse IDE you might need the m2eclipse integration.
