Advanced Logging in Linux

journalctl – Working with structured logs

The systemd journal is a component of systemd that captures syslog messages, kernel logs, messages from the initial RAM disk and early boot, and the standard output and standard error of all services, indexes them, and provides a user-friendly interface for searching and filtering logs. The journal can be used alongside or instead of syslog or syslog-ng.

Compared to traditional UNIX logging tools (tail, grep, sed, awk), the journalctl command-line utility offers considerably more features.

Let’s look at the main features that the systemd journal provides and how to use them.

The systemd journal is saved to disk, so all logs survive a system reboot. View the list of available boots:

journalctl --list-boots | head -n2

journalctl --list-boots | tail -n2

The command journalctl -b shows all logs for the current boot. If you need the logs of a particular earlier boot, add its offset as an argument:

journalctl -b -1

To get all logs for the current boot, with explanatory message catalog entries and without a pager (add -r for reverse chronological order):

journalctl -b --all --catalog --no-pager

All logs for all time, merged from all available journal files:

journalctl --all --catalog --merge --no-pager

Current boot, kernel logs only:

journalctl -b -k --no-pager

To follow the logs in real time (similar to tail -f):

journalctl -f

View how much space systemd journals take up on disk (/var/log/journal/):

journalctl --disk-usage

Get logs and metadata:

journalctl --output verbose

Export logs to a file:

journalctl --output export > export.log

Log filtering

You can filter logs by priority (RFC 5424 6.2.1):

journalctl -f -p emerg
journalctl -f -p alert
journalctl -f -p crit
journalctl -f -p err
journalctl -f -p warning
journalctl -f -p notice
journalctl -f -p info
journalctl -f -p debug

Display only logs with priorities err, crit, and alert:

journalctl -p err..alert

Logs for a specific syslog identifier only:

journalctl -t NetworkManager

journalctl can be used along with standard command line tools – grep, awk:

journalctl -b | grep -i selinux

To save time, it is better to use the built-in -g or --grep flag:

journalctl -g nginx

journalctl -b -g kube

journalctl -g fail --case-sensitive=true

Like grep, --grep understands regular expressions:

journalctl --grep '(Started|Stopping)'

journalctl can also filter logs by timestamp without grep, awk, or sed, so there is no need to memorize complex regular expressions:

journalctl --since "20 min ago"

If you have a geo-distributed infrastructure spanning time zones, journalctl can work with different time zones directly:

journalctl --since "2023-06-21 14:24 Pacific/Auckland" --until "2023-06-21 14:30 Europe/Amsterdam"

The systemd journal saves logs in a structured format:

journalctl -o verbose --no-pager

Sat 2023-07-22 17:17:40.468870 MSK [s=8e997e4278d4420da4ee36deb1bcb537;i=48abc;b=200a318f51b04680a1207f58ed5aaf88;m=2199c4e719;t=601140b4770a0;x=59dabeebe46afe51]
    _TRANSPORT=syslog
    PRIORITY=6
    SYSLOG_IDENTIFIER=sshd
    _UID=0
    _GID=0
    _COMM=sshd
    _EXE=/usr/sbin/sshd
    _CAP_EFFECTIVE=1ffffffffff
    _SELINUX_CONTEXT=system_u:system_r:sshd_t:s0-s0:c0.c1023
    _SYSTEMD_CGROUP=/system.slice/sshd.service
    _SYSTEMD_UNIT=sshd.service
    _SYSTEMD_SLICE=system.slice
    _SYSTEMD_INVOCATION_ID=2c28dd046868493ca6c4ae9325b237b9
    _BOOT_ID=200a318f51b04680a1207f58ed5aaf88
    _MACHINE_ID=b0900a09b82b4ecca86af861e99e64c5
    _HOSTNAME=Pythagoras
    _RUNTIME_SCOPE=system
    SYSLOG_FACILITY=10
    _CMDLINE="sshd: unknown [priv]"
    SYSLOG_PID=1339341
    _PID=1339341
    SYSLOG_TIMESTAMP=Jul 22 17:17:40
    MESSAGE=Connection closed by invalid user ubuntu 45.95.147.231 port 53778 [preauth]

You can filter by any of these metadata fields, for example by process ID:

journalctl _PID=1339341

Or, for example, look at the logs of all processes whose command name is python:

journalctl _COMM=python

JSON – new opportunities for automation

journalctl can output logs in JSON format, which lets you use the jq utility to filter messages:

journalctl --since "1500 min ago" -u kubelet.service -o json | jq '._CMDLINE'

"/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=<IP_ADDRESS> --node-labels=<LABEL>/cluster=true, --pod-infra-container-image=registry.k8s.io/pause:3.9"
"/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=<IP_ADDRESS> --node-labels=<LABEL>/cluster=true, --pod-infra-container-image=registry.k8s.io/pause:3.9"
"/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=<IP_ADDRESS> --node-labels=<LABEL>/cluster=true, --pod-infra-container-image=registry.k8s.io/pause:3.9"

Structured logs open up wide possibilities for automation. No more “hacking” trying to glue sed, grep and awk in scripts. Any high-level programming language contains a module for working with JSON.

As an illustrative example, consider the following scenario: application X writes messages to the journal. A simple program running as a Linux daemon watches for events in the systemd journal. If a certain message appears in the log, the program, following its internal logic, sends a message to a data bus (Kafka, RabbitMQ, NATS) or performs certain operations on the system:

package main

import (
        "bufio"
        "fmt"
        "os/exec"
)

func main() {
        // Watch messages for the "events_topic" identifier
        cmd := exec.Command("journalctl", "-t", "events_topic", "-f")

        stdout, err := cmd.StdoutPipe()
        if err != nil {
                fmt.Println(err)
                return
        }
        if err := cmd.Start(); err != nil {
                fmt.Println(err)
                return
        }

        // Read the journal stream line by line and stop after three events
        scanner := bufio.NewScanner(stdout)
        num := 0
        for scanner.Scan() {
                fmt.Println(scanner.Text())
                num++
                if num >= 3 {
                        break
                }
        }
        cmd.Process.Kill()
}

Now, in one terminal session, we run a program that tracks events in the log:

go run watch.go
Jul 22 15:49:25 Pythagoras events_topic[1293240]: NEW EVENT
Jul 22 15:49:28 Pythagoras events_topic[1293250]: NEW EVENT

And in another terminal session, we emulate application X writing messages to the journal using the built-in systemd-cat utility:

echo "NEW EVENT" | systemd-cat -t events_topic
echo "NEW EVENT" | systemd-cat -t events_topic

In this case, the os/exec package is used to work with the journal (that is, we execute system commands), but you can also use one of the existing libraries:
https://github.com/coreos/go-systemd

This is a simple example of using systemd's structured logs. The capabilities of systemd are much broader; many of them are used, for example, by cloud providers to build fault-tolerant infrastructure.

sos report – when you need to find a needle in a haystack

https://github.com/sosreport/sos

On production infrastructure, it is not always possible to analyze logs and the output of various diagnostic commands in a calm, unhurried setting. In some cases, finding the root cause requires deep analysis, cross-referencing logs and metrics retrospectively, and sometimes writing automation to parse logs.

sos report exists for exactly such scenarios. It is a unified tool for collecting logs and diagnostic information. With sos report, you can create an archive with all the necessary information about the system; this data is then used offline to analyze and find problems.

Installing sosreport:

yum install sos

apt install sosreport

The program is written in Python and is easily extensible. Dozens of plugins have been written for sos report:

sos report --list-plugins

alternatives         System alternatives
anaconda             Anaconda installer
anacron              Anacron job scheduling service
ata                  ATA and IDE information
auditd               Audit daemon information
block                Block device information
boot                 Bootloader information
cgroups              Control groups subsystem
chrony               Chrony clock (for Network time protocol)
cockpit              Cockpit Web Service
conntrack            conntrack - netfilter connection tracking
 ...

profiles:

sos report --list-profiles

boot            boot, devices, dracut, grub2, services, systemd, udev
cluster         conntrack
container       cgroups, containers_common, podman, selinux
 ...

and presets:

sos report --list-presets
       name: none
description: Do not load a preset
       note: Use to disable automatically loaded presets

       name: minimal
description: Small and quick report that reduces sos report resource consumption
       note: May be useful for low-resource systems, but may not provide sufficient data for analysis

All this can be customized for a specific case.

Report generation example:

sos report --all-logs --batch --tmp-dir=/var

The created archive with logs can be taken for further analysis:

Your sosreport has been generated and saved in:
	/var/tmp/<FILENAME>.tar.xz

Size	20.25MiB
Owner	root
sha256	db71d185550428b5c3a12010fc76206b58b159671c6b74c2fce575254e388030

Depending on the selected profile and plugins, the archive may contain detailed diagnostic information with the outputs of various utilities and tools:

tar -xf <FILENAME>.tar.xz

boot  dmidecode    free            ip_addr   lib    lspci    proc    root-symlinks  sos_logs     uname   var
date  environment  hostname        ip_route  lsmod  mount    ps      run            sos_reports  uptime  version.txt
df    etc          installed-rpms  last      lsof   netstat  pstree  sos_commands   sys          usr     vgdisplay

When generating a report, the --clean, --domains, and --keywords flags let you obfuscate (scrub) the resulting archive, removing confidential information.

