Tracing in Go is easy

In the era of fast-growing services, it is important to be able to monitor the state of the system at any time. One of the tools to achieve this are logs and metrics that help us keep track of many parameters such as requests per second (RPS), memory consumption, percentage of calls cached and so on. In other words, logs and metrics add to our system such an important characteristic as observability (observability)

observability allows us to easily fix bugs and solve new problems by answering the question “Why is this happening?“.

Imagine that after months of work, the average response time of the service has increased by 2 times, which we learned from the metrics. You can immediately assume that the problem is on the database side, but in complex projects with a large number of integrated technologies, the reason is no longer so obvious. What if several parts of our service started to slow down? In such situations, the total query execution time does not indicate to us either the problem area or the direction where we need to dig. Another tool comes to the rescue called tracing (tracing).

In this article, we will look at the basics of integrating tracing into applications written in Go language with the help of tools opentelemetry and Jaeger. Most of it is taken from the official documentation, so you can always explore this topic in more detail.

opentelemetry

opentelemetry (OTel) is a standard for collecting data on the operation of applications, which consists of ideas from already existing implementations, such as OpenCensus and OpenTracing. The main task is to provide a set of standardized SDKs, APIs and tools for preparing and sending data to services (back-end observability) to monitor systems.

OTel includes libraries for many popular languages ​​(C++, Java, Python, .NET, PHP, Rust full list), as well as ready-made community tools for tracing popular libraries and frameworks. Before we start writing code, I suggest that you familiarize yourself with the main components in OpenTelemetry that we will use in the future. The first important component of OpenTelemetry is span.

Span

Span is a unit of work (execution) of some operation. It keeps track of specific events (events) that the request is performing, painting a picture of what happened during the time during which this operation was performed. It contains a name, time data, structured messages, and other metadata (attributes) to provide information about the monitored operation.

An example of the information that the http request span contains:

Metadata contains information about the request, process + name and version of the library.
Metadata contains information about the request, process + name and version of the library.

They are divided into two types: Parent (root) span and child span. Child spans are created by nested function calls that are inherited from a previously created parent span. In the implementation for the Go language, they are passed via context.Context.

The rule is simple: if there is already a span in the passed context, then we create a child from it. Otherwise, we create the parent (root) span.

Each span has span_id and trace_id. It is important to know that from the beginning of the creation of the parent span to the end of the child spans trace_id does not changewhile span_id for each span unique. Such properties allow nesting and branching of multiple calls to be combined:

For example, in a microservice architecture, we can pass trace_id between services to get the complete call tree at the end. Similarly, by sending the same identifier from the client, you can get data on all its calls for analytics (and not only).

Example of tracing client requests with one trace_id
Example of tracing client requests with one trace_id

A status can have one of two statuses: error and Ok . As the name suggests, status error indicates that an error occurred during the execution of operations, and Ok vice versa – all operations were completed successfully

Jaeger

Earlier it was said that OpenTelemetry only prepares (does not store!) and sends data to back-end observability. One example of such a backend is Jaeger.

At the time of writing, the latest version is 1.41.0. Learn more about installation in documentation

Installation

Docker
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 14268:14268 \
  jaegertracing/all-in-one:1.41
docker-compose
version: '3.8'

services:
  jaeger:
    image: jaegertracing/all-in-one
    ports:
      - "14268:14268"
      - "16686:16686"
docker compose up -d
Local installation

Download and run the finished binary https://www.jaegertracing.io/download/

When opening http://localhost:16686 you should see something like this:

Writing an application

For example, I wrote a small service that will save and return notes from Redis.

Structure:

Structure
├── storage
│   └── redis.go
├── server
│   └── http.go
├── models
│   └── note.go
├── main.go
├── go.sum
├── go.mod
├── docker-compose.yaml
└── cmd
    └── main.go
models/note.go
package models

import (
	"errors"
	"time"
)
import "github.com/google/uuid"

var ErrNotFound = errors.New("note not found")

type Note struct {
	NoteID  uuid.UUID `json:"note_id"`
	Title   string    `json:"title"`
	Content string    `json:"content"`
	Created time.Time `json:"created"`
}
storage/redis.go
package storage

import (
	"context"
	"encoding/json"
	"fmt"
	"github.com/go-redis/redis/v9"
	"github.com/google/uuid"
	"trace-example/models"
)

type NotesStorage struct {
	client redis.UniversalClient
}

func NewNotesStorage(client redis.UniversalClient) NotesStorage {
	return NotesStorage{client: client}
}

func (s NotesStorage) Store(ctx context.Context, note models.Note) error {
	data, err := json.Marshal(note)
	if err != nil {
		return fmt.Errorf("marshal note: %w", err)
	}

	if err = s.client.Set(ctx, note.NoteID.String(), data, -1).Err(); err != nil {
		return fmt.Errorf("redis set: %w", err)
	}

	return nil
}

func (s NotesStorage) Get(ctx context.Context, noteID uuid.UUID) (*models.Note, error) {
	data, err := s.client.Get(ctx, noteID.String()).Bytes()
	if err != nil {
		if err == redis.Nil {
			return nil, models.ErrNotFound
		}
		return nil, fmt.Errorf("redis get: %w", err)
	}

	var note models.Note
	if err = json.Unmarshal(data, &note); err != nil {
		return nil, fmt.Errorf("unmarshal note: %w", err)
	}

	return &note, nil
}
server/http.go
package server

import (
	"errors"
	"github.com/gofiber/fiber/v2"
	"github.com/google/uuid"
	"time"
	"trace-example/models"
	"trace-example/storage"
)

type FiberHandler struct {
	notesStorage storage.NotesStorage
}

func NewFiberHandler(notesStorage storage.NotesStorage) FiberHandler {
	return FiberHandler{notesStorage: notesStorage}
}

func (h FiberHandler) CreateNote(fiberctx *fiber.Ctx) error {
	input := struct {
		Title   string `json:"title"`
		Content string `json:"content"`
	}{}

	if err := fiberctx.BodyParser(&input); err != nil {
		return err
	}

	noteID := uuid.New()
	err := h.notesStorage.Store(fiberctx.UserContext(), models.Note{
		NoteID:  noteID,
		Title:   input.Title,
		Content: input.Content,
		Created: time.Now(),
	})
	if err != nil {
		return fiber.NewError(fiber.StatusInternalServerError, err.Error())
	}

	return fiberctx.JSON(map[string]any{
		"note_id": noteID,
	})
}

func (h FiberHandler) GetNote(fiberctx *fiber.Ctx) error {
	noteID, err := uuid.Parse(fiberctx.Query("note_id"))
	if err != nil {
		return fiber.NewError(fiber.StatusBadRequest, err.Error())
	}

	note, err := h.notesStorage.Get(fiberctx.UserContext(), noteID)
	if err != nil {
		if errors.Is(err, models.ErrNotFound) {
			return fiber.NewError(fiber.StatusNotFound, err.Error())
		}
		return fiber.NewError(fiber.StatusInternalServerError, err.Error())
	}

	return fiberctx.JSON(note)
}
cmd/main.go
package main

import (
	"context"
	"github.com/go-redis/redis/v9"
	"github.com/gofiber/fiber/v2"
	"log"
	"trace-example/server"
	"trace-example/storage"
)

func main() {
	app := fiber.New()

	// Подключаемся к Redis
	client := redis.NewClient(&redis.Options{
		Addr: "localhost:6379",
	})
	if err := client.Ping(context.TODO()).Err(); err != nil {
		log.Fatal("create redis client", err)
	}

	// Настраиваем роутер
	handler := server.NewFiberHandler(storage.NewNotesStorage(client))
	app.Post("/create", handler.CreateNote)
	app.Get("/get", handler.GetNote)

	log.Fatal(app.Listen(":8080"))
}
docker-compose.yaml
version: '3.8'

services:

  jaeger:
    image: jaegertracing/all-in-one
    ports:
      - "14268:14268"
      - "16686:16686"


  redis:
    restart: on-failure

    image: "redis:latest"

    command: redis-server --port 6380

    ports:
      - "6379:6379"

    environment:
      REDIS_REPLICATION_MODE: master

    volumes:
      - redis-data:/var/lib/redis

volumes:
  redis-data:

Raise containers with Redis and Jaeger

docker compose up -d

And we start our service

go run cmd/main.go

After the start, we can add a note:

And get it back:

curl requests

Adding a note

curl --location --request POST 'localhost:8080/create' \
--header 'Content-Type: application/json' \
--data-raw '{
    "title":"Something interesting",
    "content": "Lorem ipsum..."
}'

Getting a note

curl --location --request GET 'localhost:8080/get?note_id=7411ff79-fd1d-46ab-b9f8-21105cd770ce'

Let’s start integrating OpenTelemetry by creating exporter which sends (exports) data to back-end observability and has a simple interface

type SpanExporter interface {
    ExportSpans(ctx context.Context, spans []ReadOnlySpan) error
    Shutdown(ctx context.Context) error
}

Here we see another type of spans – ReadOnlySpan. This is not a separate type, but just an interface for getting information about the span.

Now let’s create exporter for Jaeger:

package tracer

import (
	"go.opentelemetry.io/otel/exporters/jaeger"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
)

// NewJaegerExporter creates new jaeger exporter
//
//	url example - http://localhost:14268/api/traces
func NewJaegerExporter(url string) (tracesdk.SpanExporter, error) {
	return jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(url)))
}

Now let’s create provider (provider), which will create tracers

package tracer

import (
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/sdk/resource"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.12.0"
)

func NewTraceProvider(exp tracesdk.SpanExporter, ServiceName string) (*tracesdk.TracerProvider, error) {
	// Ensure default SDK resources and the required service name are set.
	r, err := resource.Merge(
		resource.Default(),
		resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String(ServiceName),
		),
	)
	if err != nil {
		return nil, err
	}

	return tracesdk.NewTracerProvider(
		tracesdk.WithBatcher(exp),
		tracesdk.WithResource(r),
	), nil
}

Let’s analyze the code:

return tracesdk.NewTracerProvider(
		tracesdk.WithBatcher(exp),
		tracesdk.WithResource(r),
), nil

When creating a provider, we add options:

  • With Batcher – sends data to the exporter in batches, increasing performance

  • WithResource – adds data (resources) to each trace

resource.Merge(
	resource.Default(),
	resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceNameKey.String(ServiceName),
	),
)

resource.Merge combines multiple resources. Together with the standard ones, we add the version of the library and the name of the service to add the ability to filter the traces of each service.

Now it remains only to create a tracer:

package tracer

import (
	"fmt"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/sdk/resource"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.12.0"
	"go.opentelemetry.io/otel/trace"
)

func InitTracer(jaegerURL string, serviceName string) (trace.Tracer, error) {
	exporter, err := NewJaegerExporter(jaegerURL)
	if err != nil {
		return nil, fmt.Errorf("initialize exporter: %w", err)
	}

	tp, err := NewTraceProvider(exporter, serviceName)
	if err != nil {
		return nil, fmt.Errorf("initialize provider: %w", err)
	}

	otel.SetTracerProvider(tp) // !!!!!!!!!!!

	return tp.Tracer("main tracer"), nil
}

otel.SetTracerProvider(tp) is a very important line. It defines a global provider so that other instrumented libraries can start their traces.

Add a call to main.go InitTracer :

func main() {
tracer, err := trace.InitTracer("http://localhost:14268/api/traces", "Note Service")
if err != nil {
	log.Fatal("init tracer", err)
}
........

We are looking for ready-made tools:

Before you write your bike, try to find it ready on the Internet

In total, there are three ways to search for ready-made tools

  1. View in https://github.com/open-telemetry/opentelemetry-go-contrib/tree/main/instrumentation

  2. Find the package in the repository of the required library otelXxx (Xxx is the name of the library)

  3. To Google 🙂 golang название_библиотеки OpenTelemetry

For example, let’s find a toolkit for github.com/gofiber/fiber:

We see that the third-party repository is archived, but its code has been added to official

Many libraries do not have readme.md file, so it’s always best to look in example folder. Finding a string app.Use(otelfiber.Middleware("my-server")) . Let’s add it to our code!

import "github.com/gofiber/contrib/otelfiber"

func main() {
    _, err := trace.InitTracer("http://localhost:14268/api/traces", "Note Service")
    .......
	app := fiber.New()
	app.Use(otelfiber.Middleware("my-server")) // <-------
    .......

After the restart, we will execute a few queries and return to Jaeger:

Our service is visible in the list!
Our service is visible in the list!
Click Find Traces!
Click Find Traces!

After clicking on Find Traces, we will see a list of our traces:

To go to the trace we need, just click on its name.

How it works?

Earlier, we created a middleware for the fiber framework that intercepts an incoming request, creates a parent span, and passes it in the context. Note that without the line otel.SetTracerProvider this trace will not be recorded anywhere.

In the span’s metadata, he included request and response data that might be important when fixing problems.

Now let’s add tracing for radish: using the second search method, we will find out that the required package is already in the repository go-redishttps://github.com/go-redis/redis/tree/master/extra/redisotel. Let’s add it to the code:

import "github.com/go-redis/redis/extra/redisotel/v9"

func main() {
    .........
	// Подключаемся к Redis
	client := redis.NewClient(&redis.Options{
		Addr: "localhost:6379",
	})
	if err := client.Ping(context.TODO()).Err(); err != nil {
		log.Fatal("create redis client", err)
	}
	if err := redisotel.InstrumentTracing(client); err != nil { // <------
		log.Fatal("enable instrument tracing", err)
	}
    .........

Let’s run the queries again and look at the traces. Don’t forget to press the button Find Traces again (or reload the page)

Now we can see not only fiber data, but also queries in the database!

How it works?

redisotel adds a hook for our client to create another span with the required data when calling any operation.

In our example, we only looked at two libraries, but you can find several hundred for popular libraries. Examples:


Manual tracing

It would seem that with such tools we are already increasing the observability our service. But often there are situations when you need to create your own spans with more detailed information. Let’s take a look at the tracer interface:

type Tracer interface {
    Start(ctx context.Context, spanName string, opts ...SpanStartOption) (context.Context, Span)
}

Let’s add it to the structure FiberHandlers

type FiberHandler struct {
	notesStorage storage.NotesStorage
	tracer       trace.Tracer // <-------
}

func NewFiberHandler(notesStorage storage.NotesStorage, tracer trace.Tracer) FiberHandler {
	return FiberHandler{notesStorage: notesStorage, tracer: tracer}
}

Now let’s create our span:

ctx, span := h.tracer.Start(fiberctx.UserContext(), "GetNote")
defer span.End()
....
note, err := h.notesStorage.Get(ctx, noteID)
....

Important points:

  • In all subsequent functions, you need to pass the context obtained from tracer.Start

  • defer span.End() ends the span. Span time = call time span.End()call time tracer.Start()

To add your information to the span, you need to use the function trace.WithAttributes()

ctx, span := h.tracer.Start(fiberctx.UserContext(), "GetNote", trace.WithAttributes(
	attribute.String("MyKey", "MyValue")),
)

Now let’s look at the new trace:

Events:

Often our functions make multiple calls to others. To measure the time of each, it is possible to divide the span into events using span.AddEvent(). You can also set attributes in the parameters:

func (h FiberHandler) GetNote(fiberctx *fiber.Ctx) error {
	ctx, span := h.tracer.Start(fiberctx.UserContext(), "GetNote")
	defer span.End()

	span.AddEvent("parse note_id") // <------
	noteID, err := uuid.Parse(fiberctx.Query("note_id"))
	if err != nil {
		return fiber.NewError(fiber.StatusBadRequest, err.Error())
	}

	span.AddEvent("call redis storage", trace.WithAttributes( // <----
      
		attribute.String("MyEventKey", "MyEventValue")),
	)
	note, err := h.notesStorage.Get(ctx, noteID)
	if err != nil {
		if errors.Is(err, models.ErrNotFound) {
			return fiber.NewError(fiber.StatusNotFound, err.Error())
		}
		return fiber.NewError(fiber.StatusInternalServerError, err.Error())
	}

	span.AddEvent("write note in json") // <------
	return fiberctx.JSON(note)
}

If we open the details of the new span, we will see the following

The numbers here are not execution time, but the amount of time elapsed since the beginning of the span (in other words, offset). For example, real query execution time in redis = sleep time – previous shift = 914 – 43 = 871 microseconds.

Spawn states and errors

Stop Redis and try to execute the query:

$ docker ps --format '{{.ID}} {{.Names}}'
3d69134be67f trace-example-redis-1
830c81a3b071 trace-example-jaeger-1
$ docker stop 3d69134be67f
3d69134be67f
Trace got errors
Trace got errors

From the trace, you can understand that the radish client tried to restore the connection 4 times, but returned an error at the end. We can see the details in the parent span:

But why do we see the type *fiber.Error? Previously connected middleware from otelfiber checks for an error in our handler. If err != nil, then it writes the error to the span using the method span.RecordError(err) . Let’s add this line to our code:

import "go.opentelemetry.io/otel/codes"
....
    err := h.notesStorage.Store(ctx, ......)
	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
      
		return fiber.NewError(fiber.StatusInternalServerError, err.Error())
	}
....

Pay attention to the extra challenge span.SetStatus . From the RecordError description:

An additional call to SetStatus is required if the Status of the Span should be set to Error, as this method does not change the Span status

Earlier it was said that a span has one of two statuses: Ok and Error . In the code above, we additionally set this status (although Jeager considers the trace with an error even without it).

You can also add attributes to the error record, as in the span and its events:

span.RecordError(err, trace.WithAttributes(
    attribute.String("SomeErrorInfo", "FATAL!!!!")),
)

Now the error will be recorded not only through the toolkit, but also in our code. If you raise all errors to the top, then call span.RecordError only possible at the transport or service level. Otherwise (if errors are logged in place), adding an entry at each stage can be useful.

Conclusion

The code: https://github.com/illiafox/tracing-example

In this article I wanted to show that a terrible word tracing – only half an hour of reading the documentation. By adding just a few lines to the clients of the products used, we can easily track any execution with the smallest details, and with manual spans, in general, find out all the necessary information without getting into the code.

A lot of information was taken from the official documentation:

In the next article, we will look at tracing between microservices, which (spoiler) is also implemented in one line.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *