Golang RPC and everything…

Disclaimer: this is not yet another gRPC hate article… Oh, wait…

Let’s start from afar: have you ever wondered why golang has such a wide variety of libraries for some commonly used components, say, HTTP routers (I forgot fasthttprouter, as pointed out in the comments) or caches?

With RPC the choice seems simple: gRPC is our everything (by the way, did you know that the g here suddenly doesn’t stand for Google?). But no such luck…

Everyone is just crazy about gRPC (no).

Let’s start with the fact that golang originally shipped net/rpc with its gob serializer. So: there’s a need, and golang has an out-of-the-box solution (the same story as with the HTTP router – one exists, but everyone uses third-party solutions because of parameterized request paths). And here’s the catch: this RPC can only be used between golang applications. Then Google rolled out gRPC and wrapped everything up… In short, gRPC uses http/2 as the transport and protobuf for serialization (remember, RPC is an exchange protocol plus a serializer). Moreover, gRPC implementations exist for many languages, so there is no constraint on what the client and server parts are written in. So far, so good…
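To make this concrete, here is a minimal net/rpc sketch (the Echo type and the address are illustrative, not from the article): the server registers a type whose exported methods have the func (t *T) Method(args A, reply *R) error shape, the client addresses them by "Type.Method" name, and gob encoding is used by default:

package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

// Echo is an illustrative service; net/rpc methods must look like
// func (t *T) Method(args A, reply *R) error.
type Echo struct{}

func (e *Echo) Say(msg string, reply *string) error {
	*reply = "echo: " + msg
	return nil
}

func main() {
	// Server: register the type and serve gob-encoded requests over TCP.
	if err := rpc.Register(new(Echo)); err != nil {
		log.Fatal(err)
	}
	ln, err := net.Listen("tcp", "127.0.0.1:4321")
	if err != nil {
		log.Fatal(err)
	}
	go rpc.Accept(ln)

	// Client: methods are addressed by "Type.Method" name.
	client, err := rpc.Dial("tcp", "127.0.0.1:4321")
	if err != nil {
		log.Fatal(err)
	}
	var reply string
	if err := client.Call("Echo.Say", "hi", &reply); err != nil {
		log.Fatal(err)
	}
	fmt.Println(reply) // echo: hi
}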

However, not everything is so smooth… It is understandable that Google strives to cover every possible case, but! Over time, a lot of questions piled up about the original gRPC implementation. How else do you explain that a bunch of companies started rolling their own RPC implementations (and/or serializers)? Also, it suddenly turned out that the requirements for RPC inside the cloud (read: between microservices) and for RPC between clients outside the cloud/data center and services inside it (behind the ingress/proxy/load balancer – call it what you like) are “slightly” different? And as for the choice of http/2 as the transport – well, who knew that its implementation would not go as smoothly (or as quickly) as expected.

Let’s start with serializers. The generally accepted favorite is gogo/protobuf (a fork of golang/protobuf): it generates faster serialization code through memory reuse, rejection of reflection and pointers, and other optimizations. But wait – it is now deprecated (and looking for new ownership)? That is because after Google’s transition to the protobuf API v2, the gogo developers chose to abandon the project (which is unfortunate) rather than rewrite its code almost entirely. Here is an example of what migrating from gogo to API v2 looks like – Things Learned From Trying to Migrate To Protobuf V2 API from gogoprotobuf (So Far).

But there are still enthusiasts – check out vtprotobuf. The folks from Vitess took the trouble and wrote their own serializer on top of the protobuf API v2; see the reasons and numbers in the article A new Protocol Buffers generator for Go.
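Roughly, vtprotobuf ships as an extra protoc plugin (protoc-gen-go-vtproto) that generates optimized MarshalVT/UnmarshalVT methods next to the regular API v2 code; a sketch of what usage looks like, assuming a generated pb package with a Request message (names are illustrative):

// Generated with something like:
//   protoc --go_out=. --go-vtproto_out=. api.proto
// (assumes the generated pb package is imported)
func roundtrip(req *pb.Request) (*pb.Request, error) {
	// MarshalVT is the generated, reflection-free counterpart of proto.Marshal.
	data, err := req.MarshalVT()
	if err != nil {
		return nil, err
	}
	out := &pb.Request{}
	// UnmarshalVT likewise replaces proto.Unmarshal on the hot path.
	if err := out.UnmarshalVT(data); err != nil {
		return nil, err
	}
	return out, nil
}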

By the way, man does not live by protobuf alone, as they say: for example, the same Google once came up with the flatbuffers serializer. Interestingly, gRPC actually supports custom serializers, not just protobuf, out of the box. An example is the dgraph project (which started with net/rpc, with flatbuffers instead of gob) and then switched to gRPC, but again with flatbuffers – Custom encoding: Go implementation in net/rpc vs grpc and why we switched.
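A minimal sketch of how a custom serializer is plugged into grpc-go through the encoding.Codec interface (the gobCodec here is an illustrative stand-in – dgraph used flatbuffers):

package main

import (
	"bytes"
	"encoding/gob"

	"google.golang.org/grpc/encoding"
)

// gobCodec is an illustrative codec; any serializer can sit behind
// Marshal/Unmarshal.
type gobCodec struct{}

func (gobCodec) Marshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(v)
	return buf.Bytes(), err
}

func (gobCodec) Unmarshal(data []byte, v interface{}) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode(v)
}

// Name is the content-subtype announced in the grpc content-type header.
func (gobCodec) Name() string { return "gob" }

func init() {
	// Register on both client and server; the client then selects the codec
	// per call with the grpc.CallContentSubtype("gob") call option.
	encoding.RegisterCodec(gobCodec{})
}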

In general, as mentioned earlier, there are 100500 different implementations of every individual component (this is probably not even a golang-specific problem). There is a github repo which compares the performance of all (probably) existing golang serializers, although its current results look rather strange (gob slower than JSON – how does that even happen?) when compared over the years:

2022/09/05 – Go 1.16.5 linux/amd64 i7-3630QM

| benchmark | iter | time/iter (ns) | bytes/op | allocs/op |
|---|---|---|---|---|
| Json_Marshal-8 | 189709 | 6090 | 151 | 208 |
| Json_Unmarshal-8 | 92833 | 12751 | 151 | 383 |
| Gob_Marshal-8 | 71692 | 16463 | 163 | 1616 |
| Gob_Unmarshal-8 | 14772 | 84385 | 163 | 7688 |
| Goprotobuf_Marshal-8 | 1405010 | 854 | 53 | 64 |
| Goprotobuf_Unmarshal-8 | 973688 | 1255 | 53 | 168 |
| Gogoprotobuf_Marshal-8 | 3359550 | 354 | 53 | 64 |
| Gogoprotobuf_Unmarshal-8 | 1908633 | 619 | 53 | 96 |
| Musgo_Marshal-8 | 4294477 | 280 | 46 | 48 |
| Musgo_Unmarshal-8 | 2498404 | 480 | 46 | 96 |

2021/06/21 – Go 1.16.5 linux/amd64 i7-3630QM

| benchmark | iter | time/iter (ns) | bytes/op | allocs/op |
|---|---|---|---|---|
| Json_Marshal-8 | 501478 | 2538 | 151 | 208 |
| Json_Unmarshal-8 | 226456 | 5023 | 151 | 383 |
| Gob_Marshal-8 | 1320562 | 882 | 63 | 40 |
| Gob_Unmarshal-8 | 1000000 | 1041 | 63 | 112 |
| Goprotobuf_Marshal-8 | 3247056 | 378 | 53 | 64 |
| Goprotobuf_Unmarshal-8 | 1839267 | 651 | 53 | 168 |
| Gogoprotobuf_Marshal-8 | 5886194 | 204 | 53 | 64 |
| Gogoprotobuf_Unmarshal-8 | 3464098 | 345 | 53 | 96 |
| Musgo_Marshal-8 | 12882543 | 86 | 0 | 0 |
| Musgo_Unmarshal-8 | 3381966 | 343 | 96 | 96 |

Elsewhere I found more “relevant” results:

2022/03/19 – Go 1.17.8 Darwin/arm64 Apple M1 Max

| benchmark | iter | time/iter (ns) | bytes/op | allocs/op |
|---|---|---|---|---|
| Json_Marshal-8 | 1440837 | 822 | 148 | 208 |
| Json_Unmarshal-8 | 653754 | 1817 | 148 | 399 |
| Gob_Marshal-8 | 2750721 | 440 | 63 | 40 |
| Gob_Unmarshal-8 | 2918254 | 410 | 63 | 112 |
| Goprotobuf_Marshal-8 | 6831308 | 176 | 53 | 64 |
| Goprotobuf_Unmarshal-8 | 5746256 | 210 | 53 | 168 |
| Gogoprotobuf_Marshal-8 | 16528346 | 72 | 53 | 64 |
| Gogoprotobuf_Unmarshal-8 | 12764978 | 94 | 53 | 96 |
| Musgo_Marshal-8 | 22535546 | 53 | 48 | 0 |
| Musgo_Unmarshal-8 | 12952696 | 90 | 48 | 96 |

In general, gogo is about twice as fast as Google’s implementation. By the way, in the table you can also see a certain musgo, which shows itself very well (because codegen). It would probably also have been worth including the well-known msgpack in the table – an opensource community project that still hasn’t taken off the way it should (though there seems to be some progress). For additional reading – Zoo in a Golang MSA. Protobuf, MessagePack, Gob – what to choose?
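For completeness, here is a minimal sketch of msgpack usage via the vmihailenco/msgpack package (a common Go implementation; the Item struct is illustrative):

package main

import (
	"fmt"
	"log"

	"github.com/vmihailenco/msgpack/v5"
)

// Item is an illustrative payload; msgpack, like gob, works off
// struct tags and reflection rather than generated code.
type Item struct {
	ID   int    `msgpack:"id"`
	Name string `msgpack:"name"`
}

func main() {
	data, err := msgpack.Marshal(&Item{ID: 1, Name: "foo"})
	if err != nil {
		log.Fatal(err)
	}
	var out Item
	if err := msgpack.Unmarshal(data, &out); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", out) // {ID:1 Name:foo}
}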

Moving on. More and more often developers ask: why is golang gRPC so monstrous in terms of dependency overhead? And why does it carry its own implementation of the http/2 stack under the hood instead of reusing the “golang.org/x/net/http2” package (well, yes, it uses types and configs from it, but no more than that)? And in general, not everything is smooth with forwarding http/2 through load balancers.

To solve the two problems just mentioned – the code dependencies (read: the constant war with bugs and breaking changes, which at Google is apparently a “normal” phenomenon) and http 1.1 support – Twitch knocked together their own framework, Twirp (by the way, http/2 is also supported, via the golang standard library) – Twirp: a sweet new RPC framework for Go; it was also covered on Habr – Twirp vs gRPC. Is it worth it?
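The key design point of Twirp is that the generated server is a plain http.Handler, so it runs on whatever net/http stack you already have, over HTTP 1.1 if need be. A rough sketch with hypothetical generated names (the pb package, Echo service, and Msg message are my assumptions, not from the article):

package main

import (
	"context"
	"log"
	"net/http"

	// Hypothetical package generated by protoc-gen-twirp.
	pb "example.com/project/rpc/echo"
)

// echoService implements the generated service interface.
type echoService struct{}

func (echoService) Say(ctx context.Context, req *pb.Msg) (*pb.Msg, error) {
	return &pb.Msg{Text: "echo: " + req.Text}, nil
}

func main() {
	// The generated constructor wraps the implementation in a plain
	// http.Handler – no custom transport required.
	handler := pb.NewEchoServer(echoService{})
	log.Fatal(http.ListenAndServe(":8080", handler))
}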

For the same reasons, Storj also developed their own alternative to gRPC – DRPC (see the article Introducing DRPC: Our Replacement for gRPC). They considered Twirp as a possible solution, but it lacked a feature they needed – streaming (as in gRPC) – so it was implemented in DRPC.

Wait a minute – so far all the talk has been about RPC between, relatively speaking, the cloud and PC/mobile clients. Why all these bells and whistles for communication between microservices? Why not plain TCP (or even UDP, as they sometimes do in network games)? Oh right – there is net/rpc (what else do you need, Google seems to ask).

We need more performance and features! That’s how the valyala/gorpc library first appeared, followed by valyala/fastrpc, from Alexander Valyalkin, the author of fasthttp (read about it here on Habr – Sins of performance optimization).
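As shown in the gorpc README, usage boils down to a single handler function on the server and Call on the client; treat the exact API in this sketch as an assumption and check the repository:

package main

import (
	"fmt"
	"log"

	"github.com/valyala/gorpc"
)

func main() {
	// Server: one handler instead of per-method registration.
	s := &gorpc.Server{
		Addr: "127.0.0.1:12345",
		Handler: func(clientAddr string, request interface{}) interface{} {
			return request // echo the request back
		},
	}
	go func() {
		if err := s.Serve(); err != nil {
			log.Fatal(err)
		}
	}()

	// Client: multiplexes many calls over a small number of connections.
	c := &gorpc.Client{Addr: "127.0.0.1:12345"}
	c.Start()
	defer c.Stop()

	resp, err := c.Call("ping")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp) // ping
}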

Upon closer examination, it turns out that there are in fact many RPC implementations (for example rpcx, kitex, arpc; a performance comparison against gRPC and net/rpc is in 2022 Go Ecosystem rpc Framework Benchmark), but gRPC is the one everyone knows, as a kind of “silver bullet”.

As for UDP-based RPC – there is the Hprose project (High Performance Remote Object Service Engine) from our Chinese comrades; it supports many languages, including a golang implementation, and yes – it has UDP support. Also, the aforementioned rpcx supports TCP, HTTP, QUIC (which runs on top of UDP) and KCP (the Chinese take on QUIC, so to speak, also over UDP).

And finally, to the question of how gRPC works under the hood… It turns out there is an easy way to speed it up. Some slowpokes only wrote about it in 2022 (The Mysterious Gotcha of gRPC Stream Performance), while we have been using this trick in PROD for 4 years already. As you know, gRPC has simple (unary) calls and streaming ones; if you make a pool of streams instead of unary calls, everything works about twice as fast (with sequential or concurrent requests – it doesn’t matter). An abstract example:

api.proto

syntax = "proto3";

package pb;

message Request {}
message Response {}

service Service {
  rpc Unary (Request) returns (Response);
  rpc Stream (stream Request) returns (stream Response);
}

server.go

// grpcServer implements pb.ServiceServer; imports needed: context, io,
// and the generated pb package.
func (s *grpcServer) Unary(ctx context.Context, req *pb.Request) (*pb.Response, error) {
	return &pb.Response{}, nil
}

func (s *grpcServer) Stream(stream pb.Service_StreamServer) error {
	ctx := stream.Context()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
		}

		req, err := stream.Recv()
		if err == io.EOF {
			break
		}

		if err != nil {
			return err
		}

		// Reuse the unary handler for every message received on the stream.
		resp, err := s.Unary(ctx, req)
		if err != nil {
			return err
		}
		if err := stream.Send(resp); err != nil {
			return err
		}
	}
	return nil
}

client.go

func (c *grpcClient) Call(ctx context.Context, req *pb.Request) (*pb.Response, error) {
	if !c.streams {
		return c.client.Unary(ctx, req)
	}

	// Take a ready stream from the pool instead of opening one per call.
	stream := c.getStreamFromPool()
	if stream == nil {
		return nil, fmt.Errorf("no stream available")
	}

	if err := stream.Send(req); err != nil {
		// The pooled stream is broken (connection reset, server restart) –
		// open a fresh one and retry the send.
		var errStream error
		if stream, errStream = c.client.Stream(ctx); errStream != nil {
			return nil, errStream
		}
		if err := stream.Send(req); err != nil {
			return nil, err
		}
	}

	// Return the stream to the pool once the response has been read.
	defer c.putStreamToPool(stream)
	return stream.Recv()
}
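The getStreamFromPool / putStreamToPool helpers are left out above; here is a minimal sketch of such a pool on top of a buffered channel (the grpcClient fields and the constructor are my assumptions, not code from the article):

pool.go

type grpcClient struct {
	client  pb.ServiceClient
	streams bool
	// pool holds pre-opened bidirectional streams ready for reuse.
	pool chan pb.Service_StreamClient
}

func newGrpcClient(ctx context.Context, client pb.ServiceClient, size int) (*grpcClient, error) {
	c := &grpcClient{
		client:  client,
		streams: true,
		pool:    make(chan pb.Service_StreamClient, size),
	}
	// Warm the pool up front so Call never has to open a stream itself.
	for i := 0; i < size; i++ {
		stream, err := client.Stream(ctx)
		if err != nil {
			return nil, err
		}
		c.pool <- stream
	}
	return c, nil
}

func (c *grpcClient) getStreamFromPool() pb.Service_StreamClient {
	select {
	case s := <-c.pool:
		return s
	default:
		return nil // pool exhausted – Call reports an error
	}
}

func (c *grpcClient) putStreamToPool(s pb.Service_StreamClient) {
	select {
	case c.pool <- s:
	default: // pool already full – drop the extra stream
	}
}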

But no one knows when this will break (it’s a hack, whatever you say) – otherwise, I think, tweets like that one would not exist.