Using Google Protocol Buffers (protobuf) in Java

Hello, Habr readers. As part of the course “Java Developer. Professional”, we have prepared a translation of a useful article for you.

We also invite you to attend an open webinar on the topic “gRPC for microservices, or not by REST alone”.


The third edition of the book “Effective Java” (published in Russian as “Java: Programming Effectively”) was recently released, and I was curious what was new in this classic Java book, since the previous edition covered Java only up to Java 6. Naturally, there are completely new topics related to Java 7, Java 8 and Java 9, such as Chapter 7, “Lambdas and Streams”; Item 9, “Prefer try-with-resources to try-finally” (item 2.9 in the Russian edition); and Item 55, “Return optionals judiciously” (item 8.7, “Return Optional with caution”, in the Russian edition). But I was a little surprised to find a new item that is not tied to new Java features but to changes in the world of software development. It was Item 85, “Prefer alternatives to Java Serialization” (item 12.1 in the Russian edition), that prompted me to write this article on using Google Protocol Buffers in Java.

In Item 85, “Prefer alternatives to Java Serialization” (12.1 in the Russian edition), Josh Bloch puts the following two statements about Java serialization in bold:

“The best way to avoid serialization problems is to never deserialize anything.”

“There is no reason to use Java serialization on any new system you write.”

After outlining the problems with deserialization in Java and making these bold statements, Bloch recommends using what he calls “cross-platform structured-data representations” (to avoid the confusion the term “serialization” causes when discussing Java). Bloch names JSON (JavaScript Object Notation) and Protocol Buffers (protobuf) as the leading options. I found the mention of Protocol Buffers interesting because I have been reading about them and experimenting with them lately. There is a lot of material online on using JSON (including from Java), whereas awareness of Protocol Buffers among Java developers is much lower. Therefore, I think an article on using Protocol Buffers in Java will be helpful.

On its project page, Google describes Protocol Buffers as a “language-neutral, platform-neutral, extensible mechanism for serializing structured data” and adds: “think XML, but smaller, faster, and simpler.” Although support for many programming languages is one of the advantages of Protocol Buffers, this article focuses exclusively on using Protocol Buffers in Java.

There are several useful online resources on Protocol Buffers, including the project home page, the protobuf project page on GitHub, the proto3 Language Guide (a proto2 Language Guide is also available), the tutorial “Protocol Buffer Basics: Java”, the Java Generated Code Guide, the Java API (Javadoc) documentation, the Protocol Buffers releases page, and the Maven repository page. The examples in this article are based on Protocol Buffers 3.5.1.

Using Protocol Buffers in Java is described in the tutorial “Protocol Buffer Basics: Java”. It covers many more features and Java-specific considerations than I will cover here. The first step is to define the Protocol Buffers format, which is language-independent. It is described in a text file with the .proto extension. As an example, let’s describe the protocol format in the album.proto file shown in the following code listing.

album.proto

syntax = "proto3";

option java_outer_classname = "AlbumProtos";
option java_package = "dustin.examples.protobuf";

message Album {
    string title = 1;
    repeated string artist = 2;
    int32 release_year = 3;
    repeated string song_title = 4;
}

Despite the simplicity of this protocol format definition, it carries quite a lot of information. The first line explicitly states that proto3 is used instead of proto2, which is the default when no syntax is specified. The two lines starting with option control Java code generation (the name of the generated outer class and its package) and matter only when generating Java code.

The message keyword defines the Album structure that we want to represent. It has four fields: three are strings (string) and one is an integer (int32). Two of them may appear more than once in a message because they are marked with the repeated keyword. Each field is also assigned a unique number (1 to 4); in the binary format, fields are identified by these numbers rather than by their names. Note that the message format is defined independently of Java, except for the two options that control how Java classes are generated from this specification.
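To make the role of the field numbers concrete, here is a small sketch (the class AlbumWireTags and its helper names are mine, purely illustrative) that computes the tag written before each Album field in the binary format. Per the protobuf encoding documentation, a tag is (field_number << 3) | wire_type, where wire type 0 (VARINT) is used for int32 and wire type 2 (LEN, length-delimited) for strings. Because tags are themselves varints, field numbers 1 to 15 produce single-byte tags.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: the tag that precedes each Album field in the protobuf binary
 * format, computed as (fieldNumber << 3) | wireType. For field numbers
 * 1 to 15 the tag is smaller than 128 and fits in one varint byte.
 */
public class AlbumWireTags {
    static final int WIRE_VARINT = 0;            // used for int32, int64, bool, enum
    static final int WIRE_LENGTH_DELIMITED = 2;  // used for string, bytes, nested messages

    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    /** Tags for the four fields of album.proto, in declaration order. */
    static Map<String, Integer> albumTags() {
        Map<String, Integer> tags = new LinkedHashMap<>();
        tags.put("title", tag(1, WIRE_LENGTH_DELIMITED));       // string title = 1;
        tags.put("artist", tag(2, WIRE_LENGTH_DELIMITED));      // repeated string artist = 2;
        tags.put("release_year", tag(3, WIRE_VARINT));          // int32 release_year = 3;
        tags.put("song_title", tag(4, WIRE_LENGTH_DELIMITED));  // repeated string song_title = 4;
        return tags;
    }

    public static void main(String[] args) {
        albumTags().forEach((name, t) -> System.out.printf("%-12s 0x%02X%n", name, t));
    }
}
```

Running it should list 0x0A, 0x12, 0x18 and 0x22 for the four fields. Renaming a field in album.proto leaves these bytes unchanged, which is why field numbers, not names, must stay stable as a format evolves.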

The album.proto file above now needs to be “compiled” into a Java source file (AlbumProtos.java in the dustin.examples.protobuf package) that can be used to write and read the Protocol Buffers binary format. The Java source file is generated by the protoc compiler build for your operating system. I am running this example on Windows 10, so I downloaded and unpacked protoc-3.5.1-win32.zip. I ran protoc against album.proto with the command protoc --proto_path=src --java_out=dist\generated album.proto

Before running this command, I placed album.proto in the src directory referenced by --proto_path and created an empty dist\generated directory, the one specified by --java_out, to hold the generated Java source.

The generated AlbumProtos.java class contains more than 1000 lines, so I will not reproduce it here; it is available on GitHub. Among several interesting points about the generated code, I would point out the absence of import statements (fully qualified class names are used instead). More details on the Java source generated by protoc are available in the Java Generated Code guide. It is important to note that this generated AlbumProtos class has nothing to do with my Java application yet; it was generated solely from the album.proto text file shown earlier.

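Now the AlbumProtos Java source needs to be added to the project sources in your IDE; alternatively, it can be compiled to .class files or a .jar and used as a library. Either way, the generated code depends on the protobuf-java runtime library (com.google.protobuf:protobuf-java, here version 3.5.1), which must also be on the classpath.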

Before moving on, we need a simple Java class to demonstrate Protocol Buffers with. For this I will use the Album class shown below (code on GitHub).

Album.java

package dustin.examples.protobuf;

import java.util.ArrayList;
import java.util.List;

/**
 * Music album.
 */
public class Album {
    private final String title;

    private final List<String> artists;

    private final int releaseYear;

    private final List<String> songsTitles;

    private Album(final String newTitle, final List<String> newArtists,
        final int newYear, final List<String> newSongsTitles) {
        title = newTitle;
        artists = newArtists;
        releaseYear = newYear;
        songsTitles = newSongsTitles;
    }

    public String getTitle() {
        return title;
    }

    public List<String> getArtists() {
        return artists;
    }

    public int getReleaseYear() {
        return releaseYear;
    }

    public List<String> getSongsTitles() {
        return songsTitles;
    }

    @Override
    public String toString() {
        return "'" + title + "' (" + releaseYear + ") by " + artists + " features songs " + songsTitles;
    }

    /**
     * Builder class for instantiating an instance of
     * enclosing Album class.
     */
    public static class Builder {
        private String title;
        private ArrayList<String> artists = new ArrayList<>();
        private int releaseYear;
        private ArrayList<String> songsTitles = new ArrayList<>();

        public Builder(final String newTitle, final int newReleaseYear) {
            title = newTitle;
            releaseYear = newReleaseYear;
        }

        public Builder songTitle(final String newSongTitle) {
            songsTitles.add(newSongTitle);
            return this;
        }

        public Builder songsTitles(final List<String> newSongsTitles) {
            songsTitles.addAll(newSongsTitles);
            return this;
        }

        public Builder artist(final String newArtist) {
            artists.add(newArtist);
            return this;
        }

        public Builder artists(final List<String> newArtists) {
            artists.addAll(newArtists);
            return this;
        }

        public Album build() {
            return new Album(title, artists, releaseYear, songsTitles);
        }
    }
}

We now have the Album data class and the Protocol Buffers class representing an album (AlbumProtos.java), so we are ready to write a Java application that “serializes” Album information without using Java serialization. The application code is in the AlbumDemo class, whose full code is available on GitHub.

An Album instance is created with the following code:

/**
 * Generates instance of Album to be used in demonstration.
 *
 * @return Instance of Album to be used in demonstration.
 */
public Album generateAlbum()
{
   return new Album.Builder("Songs from the Big Chair", 1985)
      .artist("Tears For Fears")
      .songTitle("Shout")
      .songTitle("The Working Hour")
      .songTitle("Everybody Wants to Rule the World")
      .songTitle("Mothers Talk")
      .songTitle("I Believe")
      .songTitle("Broken")
      .songTitle("Head Over Heels")
      .songTitle("Listen")
      .build();
}

The AlbumProtos class generated by Protocol Buffers includes a nested class, AlbumProtos.Album, which is used for binary serialization of an Album. The following listing shows how this is done.

final Album album = instance.generateAlbum();
final AlbumProtos.Album albumMessage
    = AlbumProtos.Album.newBuilder()
        .setTitle(album.getTitle())
        .addAllArtist(album.getArtists())
        .setReleaseYear(album.getReleaseYear())
        .addAllSongTitle(album.getSongsTitles())
        .build();

As the previous example shows, the Builder pattern is used to populate the immutable instance of the class generated by Protocol Buffers. With a reference to that instance, the object can now easily be converted to the Protocol Buffers binary form using the toByteArray() method, as shown in the following listing:

final byte[] binaryAlbum = albumMessage.toByteArray();
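To see what toByteArray() writes, here is a hand-written encoder for the Album message. It is an illustration of the proto3 wire format, not the generated code or a protobuf API; the class name AlbumEncoderSketch and its methods are hypothetical. It assumes the generated code writes fields in field-number order and omits singular fields holding default values ("" or 0), which is how I understand the proto3 encoding rules. If you rely on byte-for-byte equality, compare its output with albumMessage.toByteArray().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Sketch: a hand-written encoder for the Album message in album.proto,
 * illustrating the proto3 binary format. Not the generated code.
 */
public class AlbumEncoderSketch {

    static byte[] encode(String title, List<String> artists,
                         int releaseYear, List<String> songTitles) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Fields are written in field-number order; singular fields holding
        // their proto3 default value ("" or 0) are omitted.
        if (!title.isEmpty()) {
            writeString(out, 1, title);              // string title = 1;
        }
        for (String artist : artists) {
            writeString(out, 2, artist);             // repeated: one entry per element
        }
        if (releaseYear != 0) {
            writeVarint(out, (3 << 3) | 0);          // tag: field 3, wire type 0 (VARINT)
            writeVarint(out, releaseYear);           // negative int32 sign-extends to 10 bytes
        }
        for (String songTitle : songTitles) {
            writeString(out, 4, songTitle);          // repeated string song_title = 4;
        }
        return out.toByteArray();
    }

    /** Writes the tag, length prefix and UTF-8 payload of a length-delimited field. */
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | 2);    // wire type 2 = LEN
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Base-128 varint: 7 bits per byte, least significant group first. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write((int) value);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("A", Collections.emptyList(), 1985, Arrays.asList("B"));
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // 0A 01 41 18 C1 0F 22 01 42
    }
}
```

The year 1985 takes two bytes (C1 0F) rather than four, and no field names appear anywhere in the output. This is a large part of why the binary form is compact.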

Reading the byte[] array back into an Album instance can be done like this:

/**
 * Generates an instance of Album based on the provided
 * bytes array.
 *
 * @param binaryAlbum Bytes array that should represent an
 *    AlbumProtos.Album based on Google Protocol Buffers
 *    binary format.
 * @return Instance of Album based on the provided binary form
 *    of an Album; may be {@code null} if an error is encountered
 *    while trying to process the provided binary data.
 */
public Album instantiateAlbumFromBinary(final byte[] binaryAlbum) {
    Album album = null;
    try {
        final AlbumProtos.Album copiedAlbumProtos = AlbumProtos.Album.parseFrom(binaryAlbum);
        final List<String> copiedArtists = copiedAlbumProtos.getArtistList();
        final List<String> copiedSongsTitles = copiedAlbumProtos.getSongTitleList();
        album = new Album.Builder(
                copiedAlbumProtos.getTitle(), copiedAlbumProtos.getReleaseYear())
            .artists(copiedArtists)
            .songsTitles(copiedSongsTitles)
            .build();
    } catch (InvalidProtocolBufferException ipbe) {
        System.out.println("ERROR: Unable to instantiate AlbumProtos.Album instance from provided binary data - " +
            ipbe);
    }
    return album;
}

As you may have noticed, calling the static method parseFrom(byte[]) can throw the checked exception InvalidProtocolBufferException. Obtaining a “deserialized” instance of the generated class actually takes only one line; the rest of the code builds an instance of the original Album class from the parsed data.
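The one-line parseFrom(byte[]) call does a fair amount of work. Here is a hand-written decoder that mirrors it for the Album message under simplifying assumptions. It is not the generated parser: the names AlbumDecoderSketch and DecodedAlbum are hypothetical, IllegalArgumentException stands in for InvalidProtocolBufferException, and unlike the real proto3 parser it does not validate UTF-8. It shows the two behaviors that matter most. Malformed or truncated input is rejected, and fields with unknown numbers are skipped instead of causing a failure, which lets older readers accept messages from newer writers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: a hand-written decoder for the Album wire format, mirroring what
 * AlbumProtos.Album.parseFrom(byte[]) does for this message. Not the generated code.
 */
public class AlbumDecoderSketch {

    /** Hypothetical holder for decoded fields, starting at proto3 defaults. */
    static final class DecodedAlbum {
        String title = "";
        final List<String> artists = new ArrayList<>();
        int releaseYear;
        final List<String> songTitles = new ArrayList<>();
    }

    private final byte[] buf;
    private int pos;

    private AlbumDecoderSketch(byte[] buf) { this.buf = buf; }

    static DecodedAlbum decode(byte[] bytes) {
        AlbumDecoderSketch in = new AlbumDecoderSketch(bytes);
        DecodedAlbum album = new DecodedAlbum();
        while (in.pos < in.buf.length) {
            int tag = (int) in.readVarint();
            if ((tag >>> 3) == 0) throw new IllegalArgumentException("Invalid field number 0");
            switch (tag) {
                case 0x0A: album.title = in.readString(); break;              // field 1, LEN
                case 0x12: album.artists.add(in.readString()); break;         // field 2, LEN
                case 0x18: album.releaseYear = (int) in.readVarint(); break;  // field 3, VARINT
                case 0x22: album.songTitles.add(in.readString()); break;      // field 4, LEN
                default:   in.skipField(tag & 0x7);                           // unknown field
            }
        }
        return album;
    }

    /** Reads a base-128 varint of at most 10 bytes. */
    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= buf.length) throw new IllegalArgumentException("Truncated varint");
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IllegalArgumentException("Malformed varint");
    }

    /** Reads a length prefix and checks that the payload fits in the buffer. */
    private int readLength() {
        long length = readVarint();
        if (length < 0 || length > buf.length - pos) {
            throw new IllegalArgumentException("Truncated length-delimited field");
        }
        return (int) length;
    }

    /** Decodes UTF-8 bytes; no UTF-8 validation is done here. */
    private String readString() {
        int length = readLength();
        String value = new String(buf, pos, length, StandardCharsets.UTF_8);
        pos += length;
        return value;
    }

    /** Skips an unknown field according to its wire type. */
    private void skipField(int wireType) {
        switch (wireType) {
            case 0: readVarint(); break;            // VARINT
            case 1: advance(8); break;              // I64 (fixed64, double)
            case 2: advance(readLength()); break;   // LEN
            case 5: advance(4); break;              // I32 (fixed32, float)
            default: throw new IllegalArgumentException("Unsupported wire type " + wireType);
        }
    }

    private void advance(int n) {
        if (n > buf.length - pos) throw new IllegalArgumentException("Truncated field");
        pos += n;
    }

    public static void main(String[] args) {
        // title = "A", release_year = 1985, song_title = ["B"]
        byte[] bytes = {0x0A, 0x01, 0x41, 0x18, (byte) 0xC1, 0x0F, 0x22, 0x01, 0x42};
        DecodedAlbum album = decode(bytes);
        System.out.println(album.title + " (" + album.releaseYear + ") " + album.songTitles);
    }
}
```

Running main prints A (1985) [B]. As in instantiateAlbumFromBinary, turning the decoded fields into a domain Album object remains a separate step.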

The demo class includes two lines that print the contents of the original Album instance and of the instance reconstructed from the binary form. Both lines call System.identityHashCode() on their instance to show that the two are different objects even though their contents are the same. When this code is run with the Album example above, the output is:

BEFORE Album (1323165413): 'Songs from the Big Chair' (1985) by [Tears For Fears] features songs [Shout, The Working Hour, Everybody Wants to Rule the World, Mothers Talk, I Believe, Broken, Head Over Heels, Listen]

AFTER Album (1880587981): 'Songs from the Big Chair' (1985) by [Tears For Fears] features songs [Shout, The Working Hour, Everybody Wants to Rule the World, Mothers Talk, I Believe, Broken, Head Over Heels, Listen]

Here we see that the corresponding fields of both instances are the same and that the two instances really are different objects. Protocol Buffers does require a little more work than the “almost automatic” Java serialization mechanism, where a class only needs to implement the Serializable interface, but important advantages justify that cost. In the third edition of Effective Java, Joshua Bloch discusses the security vulnerabilities associated with standard Java deserialization and states that “There is no reason to use Java serialization on any new system you write.”


Learn more about the course “Java Developer. Professional”.

Watch an open webinar on the topic “gRPC for microservices, or not by REST alone”.
