Minecraft protocol VarInt and VarLong. How to make a number out of ones and zeros using Go as an example?

In this article, I want to explain in simple terms how bytes are converted into the VarInt and VarLong data types. We will take a closer look at the implementation, walk through examples, and write unit tests. Along the way we will brush up on bitwise operations and the binary and hexadecimal number systems.

Background

I have deployed my own Minecraft server more than once using different cores: vanilla, Spigot, Sponge Forge, Paper, Velocity (this one is a proxy, but still). They are all written in Java. At one point, I wondered whether it was possible to write my own implementation of the Minecraft server from scratch. The answer to my question lies in this link. When I clicked on it, I saw the heading “Before You Get Started”, which lists three short points:

  • “Make sure you don’t want to fork, or join an existing fork” is an interesting suggestion, but I’d like to look at forks as an implementation example.

  • “Think about why you want to do this?” – I want to go deeper into the logic of Minecraft’s client-server interaction.

  • “Choose a language with good networking, such as Java, C# or Python” – OK, I’ll try to write it in Go; it has excellent built-in tools for such tasks.

As you have already guessed, I chose Go, mainly because it is currently my main working language. I like Go because it is simple and fast, and I expect it to use much less memory than Java, which could make it a good option for writing your own server. I want to try!

Implementation

The Minecraft client and server communicate over a TCP connection, through which both sides exchange data packets. The packet structure is also described in the documentation (link).

To extract information from packets, you need to learn how to decode them. Let’s set aside everything else and concentrate on two data types: VarInt and VarLong. The documentation pays a lot of attention to them and even gives examples of how to decode and encode them properly.

These are very similar to Protocol Buffer Varints.

The documentation says that the 7 least significant bits of each byte encode the value, and the most significant (eighth) bit indicates whether another byte of the number follows the current one. There are also limits: a VarInt is never longer than 5 bytes, and a VarLong is never longer than 10 bytes.

Let’s start implementing! First, we write the code that defines our two types:

// Package packet is named this way because
// these data types belong to packets.
// We are currently in the file data_types.go.
package packet

const (
    // MaxVarIntLen is the maximum number of bytes in a VarInt.
    MaxVarIntLen = 5
    // MaxVarLongLen is the maximum number of bytes in a VarLong.
    MaxVarLongLen = 10
)

type (
    // VarInt is a regular int32 (-2147483648 to 2147483647).
    // It has to be int32 rather than int: on 64-bit platforms int is
    // 64 bits wide, so negative values would not decode correctly.
    VarInt int32
    // VarLong is a regular int64 (-9223372036854775808 to 9223372036854775807).
    VarLong int64
)

We will need to read bytes from some buffer, so we will implement the io.ReaderFrom interface. Let’s declare the methods for our types (stubs for now):

func (v *VarInt) ReadFrom(r io.Reader) (n int64, err error) {
    return 0, nil
}

func (v *VarLong) ReadFrom(r io.Reader) (n int64, err error) {
    return 0, nil
}

Now let’s turn to the implementation. On the Internet, I found the library https://github.com/tnze/go-mc. According to its developers, it already implements the Minecraft protocol, packet parsing, and more. My code will be very similar to the implementation at https://github.com/Tnze/go-mc/blob/master/net/packet/types.go#L265, but I want to understand why this algorithm is correct and how, in general, a number is obtained from a bit sequence such as [101010001].

The io.ReaderFrom implementations for VarInt and VarLong are practically identical; they differ mainly in the constant used for the maximum length of the byte sequence.

First, let’s define two constants, an error returned when the number of bytes exceeds the maximum allowed for the data type, and a small function that reads a single byte from the buffer:

// These snippets assume that the file imports "io" and an errors package
// that provides Wrap (presumably github.com/pkg/errors; the standard
// library errors package has no Wrap function).

const (
    // segmentBits masks the low 7 bits: 01111111 = int(127).
    segmentBits byte = 0x7F
    // continueBit masks the 8th bit: 10000000 = int(128).
    continueBit byte = 0x80
)

// ErrTooBig is returned when a value takes more bytes than its type allows.
var ErrTooBig = errors.New("too big")

// readByte reads exactly ONE byte from the buffer and returns it.
func readByte(r io.Reader) (byte, error) {
    // Fast path: the reader can hand out single bytes itself.
    if br, ok := r.(io.ByteReader); ok {
        return br.ReadByte()
    }

    var b [1]byte
    if _, err := io.ReadFull(r, b[:]); err != nil {
        return 0, errors.Wrap(err, "failed to perform read full")
    }

    return b[0], nil
}
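
To make the roles of the two masks concrete, here is a minimal sketch that splits a single byte into its 7-bit payload and its continuation flag. The helper name splitByte is my own and does not come from go-mc or the protocol documentation.

```go
package main

import "fmt"

const (
	segmentBits byte = 0x7F // low 7 bits: payload
	continueBit byte = 0x80 // high bit: "more bytes follow"
)

// splitByte is a hypothetical helper: it returns the 7-bit payload of b
// and reports whether the continuation bit is set.
func splitByte(b byte) (payload byte, more bool) {
	return b & segmentBits, b&continueBit != 0
}

func main() {
	// The three bytes that encode 25565.
	for _, b := range []byte{0xdd, 0xc7, 0x01} {
		payload, more := splitByte(b)
		fmt.Printf("%08b -> payload %07b (%d), more=%t\n", b, payload, payload, more)
	}
}
```

Only the last byte reports more=false, which is exactly the condition that ends the decoding loop.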

Implementing the io.ReaderFrom interface for VarInt:

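// ReadFrom decodes a VarInt from r and returns the number of bytes read.
func (v *VarInt) ReadFrom(r io.Reader) (n int64, err error) {
    var val uint32

    // sec starts with the continuation bit set so that the loop runs at least once.
    for sec := continueBit; sec&continueBit != 0; n++ {
        // A VarInt has at most MaxVarIntLen bytes; stop before reading one more.
        if n >= MaxVarIntLen {
            return 0, ErrTooBig
        }

        sec, err = readByte(r)
        if err != nil {
            return 0, errors.Wrap(err, "failed to read a byte")
        }

        // Place the 7 payload bits at their position in the result.
        val |= uint32(sec&segmentBits) << uint32(7*n)
    }

    *v = VarInt(val)

    return n, nil
}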

Let’s take a closer look at this code and see how numbers are obtained from a sequence of bytes consisting of zeros and ones.
Consider the number 25565 as an example.

The number 25565 can be represented in our buffer in the following ways:

  • In hexadecimal = [0xdd, 0xc7, 0x01]

  • In decimal = [221, 199, 1]

  • In binary = [11011101, 11000111, 00000001]

In the buffer we have [..., 11011101, 11000111, 00000001, ...].
The sequence can be of any length; for this example, we will discard the leading part and assume that we have already read up to our first byte,
and our pointer is on it – 11011101.
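
If you want to double-check the three representations above, a few lines of Go are enough. This is just an illustrative sketch; the helper render is a name I made up for it.

```go
package main

import (
	"fmt"
	"strings"
)

// render is a hypothetical helper that formats every byte of buf with the
// given fmt verb and joins the results into a bracketed list.
func render(buf []byte, verb string) string {
	parts := make([]string, len(buf))
	for i, b := range buf {
		parts[i] = fmt.Sprintf(verb, b)
	}
	return "[" + strings.Join(parts, ", ") + "]"
}

func main() {
	buf := []byte{0xdd, 0xc7, 0x01} // the encoded form of 25565

	fmt.Println("hex:", render(buf, "0x%02x"))
	fmt.Println("dec:", render(buf, "%d"))
	fmt.Println("bin:", render(buf, "%08b"))
}
```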

The variable in which we will accumulate the result: var val uint32
Let’s consider each iteration in detail; there will be only 3 of them here.

Iteration #1

We read a byte with sec, err = readByte(r);
as a result, sec holds the value 11011101.

val |= uint32(sec&segmentBits) << uint32(7*n) is equivalent to
val = 0 | uint32(11011101 & 01111111) << uint32(7*0)

11011101
&
01111111
--------
01011101

As a result, we get the number 01011101 = uint32(93).
Next, we perform a logical shift to the left by 7*0, i.e. by 0 bits.

01011101 << 0 = 01011101

We calculate the value of val, everything is simple here:

val |= 01011101

00000000
|
01011101
--------
01011101

By the end of the first iteration, the value val = uint32(93).

Iteration #2

We read a byte with sec, err = readByte(r);
as a result, sec holds the value 11000111.

val |= uint32(sec&segmentBits) << uint32(7*n) is equivalent to
val = 93 | uint32(11000111 & 01111111) << uint32(7*1)

11000111
&
01111111
--------
01000111

As a result, we get the number 01000111 = uint32(71).
Next, we perform a logical shift to the left by 7*1, i.e. by 7 bits.

01000111 << 7 = 010001110000000

Calculate the value of val:

val |= 010001110000000

000000001011101
|
010001110000000
---------------
010001111011101

By the end of the second iteration, the value val = uint32(9181).

Iteration #3

We read a byte with sec, err = readByte(r);
as a result, sec holds the value 00000001.

val |= uint32(sec&segmentBits) << uint32(7*n) is equivalent to
val = 9181 | uint32(00000001 & 01111111) << uint32(7*2)

00000001
&
01111111
--------
00000001

As a result, we get the number 00000001 = uint32(1).
Next, we perform a logical shift to the left by 7*2, i.e. by 14 bits.

00000001 << 14 = 0000000100000000000000

Calculate the value of val:

val |= 0000000100000000000000

0000000010001111011101
|
0000000100000000000000
----------------------
0000000110001111011101

By the end of the third iteration, val = uint32(25565).
This iteration is the last one because the expression sec&continueBit equals 0, which ends the loop:

00000001 - sec
&
10000000 - continueBit
--------
00000000
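
The three iterations above can also be reproduced in code. The sketch below is not the ReadFrom method itself: decodeTrace is a hypothetical helper that runs the same mask-shift-OR step over an in-memory slice, records val after each iteration, and for brevity omits the maximum-length check.

```go
package main

import "fmt"

// decodeTrace is a hypothetical helper that mirrors the body of the ReadFrom
// loop and returns the final value plus val after every iteration.
func decodeTrace(buf []byte) (val uint32, steps []uint32) {
	for n, sec := range buf {
		// Keep the 7 payload bits and move them to position 7*n.
		val |= uint32(sec&0x7F) << uint(7*n)
		steps = append(steps, val)

		if sec&0x80 == 0 {
			break // continuation bit is clear: this was the last byte
		}
	}
	return val, steps
}

func main() {
	val, steps := decodeTrace([]byte{0xdd, 0xc7, 0x01})
	for i, s := range steps {
		fmt.Printf("iteration %d: val = %d (%b)\n", i+1, s, s)
	}
	fmt.Println("result:", val)
}
```

The intermediate values it prints should match the ones worked out by hand: 93, 9181, and 25565.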

Implementation for VarLong

As I mentioned earlier, the implementation for this type is almost the same:

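// ReadFrom decodes a VarLong from r and returns the number of bytes read.
func (v *VarLong) ReadFrom(r io.Reader) (n int64, err error) {
    var val uint64

    // sec starts with the continuation bit set so that the loop runs at least once.
    for sec := continueBit; sec&continueBit != 0; n++ {
        // A VarLong has at most MaxVarLongLen bytes; stop before reading one more.
        if n >= MaxVarLongLen {
            return 0, ErrTooBig
        }

        sec, err = readByte(r)
        if err != nil {
            return 0, errors.Wrap(err, "failed to read a byte")
        }

        // Place the 7 payload bits at their position in the result.
        val |= uint64(sec&segmentBits) << uint64(7*n)
    }

    *v = VarLong(val)

    return n, nil
}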

Negative numbers

How are negative numbers obtained? Writing out such a long example in full would be cumbersome, so here is a brief explanation using the number -1:
in the last iteration we get 11111111 11111111 11111111 11111111. This is the binary form of uint32 = 4294967295. We then convert it to the desired type, int32, and as a result get the number -1.
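
The conversion step is easy to try in isolation. The sketch below assumes the same mask-and-shift loop as above, packed into a hypothetical helper decode that works on a slice, and then reinterprets the accumulated 32 bits as a two's-complement int32.

```go
package main

import "fmt"

// decode is a hypothetical helper: it accumulates the payload bits into a
// uint32 and converts the result to int32 at the very end.
func decode(buf []byte) int32 {
	var val uint32
	for n, sec := range buf {
		val |= uint32(sec&0x7F) << uint(7*n)
		if sec&0x80 == 0 {
			break
		}
	}
	// The bit pattern stays the same; only its interpretation changes.
	return int32(val)
}

func main() {
	fmt.Println(decode([]byte{0xff, 0xff, 0xff, 0xff, 0x0f})) // all 32 bits set
	fmt.Println(decode([]byte{0x80, 0x80, 0x80, 0x80, 0x08})) // only the sign bit set
	fmt.Println(decode([]byte{0xff, 0xff, 0xff, 0xff, 0x07})) // everything but the sign bit
}
```

The first call yields -1, the second the smallest int32, and the third the largest.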

Unit tests

We cannot do without them. We will cover only the cases that interest us here, which means we will skip the cases that return errors. The test cases are taken from the examples in the documentation.

package packet

import (
	"bytes"
	"strconv"
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestVarInt_ReadFrom(t *testing.T) {
    // assert.New must be called with t to obtain an *assert.Assertions.
    a := assert.New(t)

    type testCase struct {
        Bytes    []byte
        Expected VarInt
    }

    testCases := []testCase{
        {
          Bytes:    []byte{0x00},
          Expected: 0,
        },
        {
          Bytes:    []byte{0x01},
          Expected: 1,
        },
        {
          Bytes:    []byte{0x10},
          Expected: 16,
        },
        {
          Bytes:    []byte{0x7f},
          Expected: 127,
        },
        {
          Bytes:    []byte{0xac, 0x02},
          Expected: 300,
        },
        {
          Bytes:    []byte{0xdd, 0xc7, 0x01},
          Expected: 25565,
        },
        {
          Bytes:    []byte{0xff, 0xff, 0xff, 0xff, 0x07},
          Expected: 2147483647,
        },
        {
          Bytes:    []byte{0x80, 0x80, 0x80, 0x80, 0x08},
          Expected: -2147483648,
        },
        {
          Bytes:    []byte{0xff, 0xff, 0xff, 0xff, 0x0f},
          Expected: -1,
        },
    }

    for _, tc := range testCases {
        t.Run(strconv.FormatInt(int64(tc.Expected), 10), func(t *testing.T) {
          var varInt VarInt
          n, err := varInt.ReadFrom(bytes.NewReader(tc.Bytes))

          // No error should be here.
          a.NoError(err)

          // Length of the VarInt must be equal to the bytes size.
          a.EqualValues(len(tc.Bytes), n)

          // Asserting to the expected VarInt value.
          a.EqualValues(tc.Expected, varInt)
        })
    }
}
Unit test result for VarInt

func TestVarLong_ReadFrom(t *testing.T) {
    // assert.New must be called with t to obtain an *assert.Assertions.
    a := assert.New(t)

    type testCase struct {
      Bytes    []byte
      Expected VarLong
    }

    testCases := []testCase{
        {
          Bytes:    []byte{0x00},
          Expected: 0,
        },
        {
          Bytes:    []byte{0x01},
          Expected: 1,
        },
        {
          Bytes:    []byte{0x10},
          Expected: 16,
        },
        {
          Bytes:    []byte{0x7f},
          Expected: 127,
        },
        {
          Bytes:    []byte{0xac, 0x02},
          Expected: 300,
        },
        {
          Bytes:    []byte{0xdd, 0xc7, 0x01},
          Expected: 25565,
        },
        {
          Bytes:    []byte{0xff, 0xff, 0xff, 0xff, 0x07},
          Expected: 2147483647,
        },
        {
          Bytes:    []byte{0x80, 0x80, 0x80, 0x80, 0xf8, 0xff, 0xff, 0xff, 0xff, 0x01},
          Expected: -2147483648,
        },
        {
          Bytes:    []byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x01},
          Expected: -1,
        },
        {
          Bytes:    []byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f},
          Expected: 9223372036854775807,
        },
        {
          Bytes:    []byte{0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x01},
          Expected: -9223372036854775808,
        },
    }

    for _, tc := range testCases {
        t.Run(strconv.FormatInt(int64(tc.Expected), 10), func(t *testing.T) {
          var varLong VarLong
          n, err := varLong.ReadFrom(bytes.NewReader(tc.Bytes))

          // No error should be here.
          a.NoError(err)

          // Length of the VarLong must be equal to the bytes size.
          a.EqualValues(len(tc.Bytes), n)

          // Asserting to the expected VarLong value.
          a.EqualValues(tc.Expected, varLong)
        })
    }
}
Unit test results for VarLong

Conclusion

This article turned out to be quite long and perhaps difficult to follow, but I tried to explain everything in as much detail as possible. I am thinking about writing a second part about these two data types, in which I will analyze in detail how VarInt and VarLong are converted back into a sequence of bytes. I will be happy to read your opinions and answer your questions in the comments!
