Tracing in Go

Hello everyone, this is my first post on this platform, please love and favor it.

Tracing is an important tool for monitoring and diagnosing microservices. It allows you to understand how requests flow through the system, where bottlenecks occur, and how different components of the application interact. In this article, I will share my experience integrating tracing into a Go service using GORM.

1. Basics of Tracing with OpenTelemetry

OpenTelemetry is a popular platform for collecting, processing and exporting metrics, logs and traces. An example of OpenTelemetry setup:

exp, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(traceConfig.Url)))
if err != nil {
   ...
}

tp := tracesdk.NewTracerProvider(
    tracesdk.WithBatcher(exp),
    tracesdk.WithResource(resource.NewWithAttributes(
        semconv.SchemaURL,
        semconv.ServiceNameKey.String(traceConfig.ServiceName),
        attribute.String("environment", "production"),
        attribute.Int64("ID", 1),
    )),
)

2. Integration with GORM

To integrate tracing with GORM, it is important to intercept events before and after SQL queries are executed. This allows you to collect information about query execution time, the number of affected rows, and possible errors.

Example of a plugin for GORM:


func MiddleWareGormTrace() gorm.Plugin {
	return &GormTracing{}
}

type GormTracing struct {
}

func (g *GormTracing) Name() string {
	return ""
}

func (p *GormTracing) Initialize(db *gorm.DB) error {
	tracer := tracer.TraceClient

	if tracer == nil || !tracer.IsEnabled {
		return nil
	}
	db.Callback().Create().Before("gorm:before_create").Register("gormotel:before_create", p.before(tracer))
	db.Callback().Query().Before("gorm:before_query").Register("gormotel:before_query", p.before(tracer))
	db.Callback().Delete().Before("gorm:before_delete").Register("gormotel:before_delete", p.before(tracer))
	db.Callback().Update().Before("gorm:before_update").Register("gormotel:before_update", p.before(tracer))
	db.Callback().Row().Before("gorm:before_row").Register("gormotel:before_row", p.before(tracer))
	db.Callback().Raw().Before("gorm:before_raw").Register("gormotel:before_raw", p.before(tracer))
	db.Callback().Create().After("gorm:after_create").Register("gormotel:after_create", p.after)
	db.Callback().Query().After("gorm:after_query").Register("gormotel:after_query", p.after)
	db.Callback().Delete().After("gorm:after_delete").Register("gormotel:after_delete", p.after)
	db.Callback().Update().After("gorm:after_update").Register("gormotel:after_update", p.after)
	db.Callback().Row().After("gorm:after_row").Register("gormotel:after_row", p.after)
	db.Callback().Raw().After("gorm:after_raw").Register("gormotel:after_raw", p.after)

	return nil
}


func (p *PluginTrace) before(tracer *tracer.Tracer) func(*gorm.DB) {
    return func(db *gorm.DB) {
        ctx, span := tracer.CreateSpan(db.Statement.Context, "[DB]")
        db.InstanceSet("otel:span", span)
        db.Statement.Context = ctx
    }
}

func (p *PluginTrace) after(db *gorm.DB) {
    if spanVal, ok := db.InstanceGet("otel:span"); ok {
        if span, ok := spanVal.(trace.Span); ok {
            defer span.End()

            span.SetAttributes(
                attribute.String(span2.AttributeDBStatement, db.Statement.SQL.String()),
                attribute.String(span2.AttributeDBTable, db.Statement.Table),
                attribute.Int64(span2.AttributeDbRowsAffected, db.RowsAffected),
            )

            if db.Error != nil {
                span.RecordError(db.Error)
                span.SetStatus(trace2.StatusCodeError, db.Error.Error())
            }
        }
    }
}

Here is an example of how to implement it in GORM:

	dbClient, err := database.GetGormConnection(
		database.DbConfig{
			Driver:             database.MySql,
			Host:               app.dbConfig.Host,
			User:               app.dbConfig.User,
			Password:           app.dbConfig.Password,
			Db:                 app.dbConfig.Db,
			Port:               app.dbConfig.Port,
			SslMode:            false,
			Logging:            app.dbConfig.Logging,
			MaxOpenConnections: app.dbConfig.MaxOpenConnections,
			MaxIdleConnections: app.dbConfig.MaxIdleConnections,
		},
	)


	if err != nil {

		return err
	}
	dbClient.Use(gormtracing.MiddleWareGormTrace())

3. Processing HTTP requests

HTTP request tracing allows you to track the path of a request through all layers of the application. To do this, it is important to use middleware that will create a span for each incoming request and record important metadata. However, you should not use this middleware for all requests; in my bitter experience, there were services that recorded health checkers.

Example of middleware for Gin:

func (t *Tracer) MiddleWareTrace() gin.HandlerFunc {
    return func(c *gin.Context) {
        if t == nil || !t.cfg.IsTraceEnabled {
            c.Next()
            return
        }

        parentCtx, span := t.CreateSpan(c.Request.Context(), "["+c.Request.Method+"] "+c.FullPath())
        defer span.End()

        c.Request = c.Request.WithContext(parentCtx)
        c.Next()

        // Обработка ошибок для сервисов использующих sdk
        excep := c.Keys["exception"]
        switch v := excep.(type) {
        case *exception.AppException:
            span.SetAttributes(attribute.Int(span2.AttributeRespHttpCode, v.Code))
            if v.Error != nil {
                span.SetAttributes(attribute.String(span2.AttributeRespErrMsg, v.Error.Error()))
            }
        default:
            span.SetAttributes(attribute.Int(span2.AttributeRespHttpCode, c.Writer.Status()))
        }
    }
}

Here is an example of how to implement MW

	v1 := router.Group("/banner/v1")
	v1.Use(tracer.MiddleWareTrace())

Here is the full code for the tracing client:

I pulled this variable var TraceClient *Tracer into the global only because there is an implementation of HTTP-Builder, where I create my own span for each request. We have a layered architecture implemented in Go services, and we would have to pass this client to each layer to trace http requests to third-party services.

package tracer

import (
	"bytes"
	"context"
	"fmt"
	"github.com/gin-gonic/gin"
	"github.com/gookit/goutil/netutil/httpctype"
	"github.com/gookit/goutil/netutil/httpheader"
	"gitlab.almanit.kz/jmart/gosdk/pkg/config"
	"gitlab.almanit.kz/jmart/gosdk/pkg/exception"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.7.0"
	trace2 "go.opentelemetry.io/otel/trace"
	"go.opentelemetry.io/otel/trace/noop"
	"io"
	"net/http"
	"strings"
)

var TraceClient *Tracer

const AttributeReqBody = "request.body"

const (
	AttributeRespHttpCode = "http.status_code"
	AttributeRespErrMsg   = "error.message"
)

type TraceConfig struct {
	IsTraceEnabled    bool   `mapstructure:"TRACE_IS_ENABLED"`
	Url               string `mapstructure:"TRACE_URL"`
	ServiceName       string `mapstructure:"TRACE_SERVICE_NAME"`
	IsHttpBodyEnabled bool   `mapstructure:"TRACE_IS_HTTP_BODY_ENABLED"`
}

type Tracer struct {
	tp          *tracesdk.TracerProvider
	cfg         *TraceConfig
	IsEnabled   bool
	ServiceName string
}

// InitTraceClient - создание клиента трассировки
func InitTraceClient() (*Tracer, error) {
	t := &Tracer{}
	// config init
	if err := t.initTraceConfig(); err != nil {
		return nil, err
	}

	if !t.cfg.IsTraceEnabled {
		return t, nil
	}

	// Create the Jaeger exporter
	exp, err := jaeger.New(
		jaeger.WithCollectorEndpoint(
			jaeger.WithEndpoint(t.cfg.Url),
		),
	)

	if err != nil {
		return nil, err
	}

	tp := tracesdk.NewTracerProvider(
		//tracesdk.WithSampler(),
		// Always be sure to batch in production.
		tracesdk.WithBatcher(exp),
		// Record information about this application in a Resource.
		tracesdk.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String(t.cfg.ServiceName),
			//attribute.String("environment", "development"),
			//attribute.Int64("ID", 1),
		)),
	)

	otel.SetTracerProvider(tp)
	otel.SetErrorHandler(otel.ErrorHandlerFunc(func(err error) {
		//	error handler
	}))

	t.tp = tp
	TraceClient = t

	return t, nil
}

// Shutdown -
func (t *Tracer) Shutdown(ctx context.Context) error {
	fmt.Println("shutdown")
	return t.tp.Shutdown(ctx)
}

// InjectHttpTraceId -  записывает  trace id  в запрос, требует  *http.Request
func (t *Tracer) InjectHttpTraceId(ctx context.Context, req *http.Request) {
	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
}

// MiddleWareTrace -  мидлвар который записывает трассировку
// при этом контекст спана записывается в 	c.Request . В хэндлере рекомендуется передавать ctx.Request.Context() в слой ниже, или другую функцию
func (t *Tracer) MiddleWareTrace() gin.HandlerFunc {
	return func(c *gin.Context) {
		if t == nil || !t.cfg.IsTraceEnabled {
			c.Next()

			return
		}

		parentCtx, span := t.CreateSpan(c.Request.Context(), "["+c.Request.Method+"] "+c.FullPath(), "middleware")
		defer span.End()

		// парсинг body
		if t.cfg.IsHttpBodyEnabled {
			// нет смысла копировать тело запроса при наличии файла
			if !strings.HasPrefix(c.GetHeader(httpheader.ContentType), httpctype.MIMEDataForm) {
				bodyBytes, _ := io.ReadAll(c.Request.Body)
				c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))

				span.SetAttributes(attribute.String(AttributeReqBody, string(bodyBytes)))
			}
		}

		c.Request = c.Request.WithContext(parentCtx)
		c.Next()

		// парсинг ошибок
		{
			excep := c.Keys["exception"]

			switch v := excep.(type) {
			case *exception.AppException:
				span.SetAttributes(attribute.Int(AttributeRespHttpCode, v.Code))
				if v.Error != nil {
					span.SetAttributes(attribute.String(AttributeRespErrMsg, v.Error.Error()))
				}
			default:
				span.SetAttributes(attribute.Int(AttributeRespHttpCode, c.Writer.Status()))
			}
		}
	}
}

// CreateSpan - Создает родительский спан,и возвращает контекст, этот контекст нужен для дочернего спана.
// В случае если в ctx нет контекста родителя то создается контекст родителя
// Не забыть вызывать span.End()
func (t *Tracer) CreateSpan(ctx context.Context, name string, fun string) (context.Context, trace2.Span) {
	if t == nil || t.tp == nil {
		return context.Background(), noop.Span{}
	}

	return t.tp.Tracer(t.ServiceName).Start(ctx, name)
}

// CreateSpanWithCustomTraceId -  экспериментальный метод, создаем спан на основе кастомного трайс айди
func (t *Tracer) CreateSpanWithCustomTraceId(ctx context.Context, traceId, name string) (context.Context, trace2.Span, error) {
	tId, err := trace2.TraceIDFromHex(traceId)

	if err != nil {
		return nil, noop.Span{}, err
	}

	spanContext := trace2.NewSpanContext(trace2.SpanContextConfig{
		TraceID: tId,
	})

	ctx1 := trace2.ContextWithSpanContext(ctx, spanContext)
	ctx1, span := t.tp.Tracer(t.ServiceName).Start(ctx1, name)

	return ctx1, span, nil
}

// initTraceConfig -  инициализирует конфиг трассировки, читает  из файла  .env переменки
func (t *Tracer) initTraceConfig() error {
	if err := config.ReadEnv(); err != nil {
		return err
	}

	traceCfg := &TraceConfig{}
	err := config.InitConfig(traceCfg)

	if err != nil {
		return err
	}

	t.cfg = traceCfg
	t.ServiceName = traceCfg.ServiceName
	t.IsEnabled = traceCfg.IsTraceEnabled

	return nil
}

4. What best practices for the production environment have I identified:

Once tracing is integrated and working, it is important to consider the following points for use in a production environment:

  • Minimize the load: Use batching and asynchronous data sending to minimize the impact on application performance.

  • Maintain data privacy: Do not write sensitive data to attributes or trace logs.

  • Regular monitoring: Monitor the amount of data sent to tracing to avoid generating excess data and overloading the monitoring system.

  • Set data limits: Limit the depth and number of traces, especially for highly loaded services. In the production environment, we implemented 1-of-5 trace recording. That is, only 1 trace out of 5 will be written to the database, the rest are ignored.

Conclusion

Tracing is a powerful tool for monitoring microservices. This tool has increased the observability of our services, but also made it easier to diagnose and troubleshoot.

P.S. The main thing in creating spans is to correctly transfer contexts, otherwise in the admin panel you will not see an accurate picture of your traces, there will be no nesting. Unfortunately, I spent a lot of time to identify this.

This post is written for the exchange of experience, my experience is only one of the possible approaches, and I invite you to a constructive discussion.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *