Advanced Logging in Linux
journalctl – Working with structured logs
The event log is a component of systemd that captures syslog messages, kernel logs, all system initialization events (RAM, disk, boot, STDOUT/STDERR for all services), indexes them, and then provides a user-friendly interface for searching and filtering logs. The journal (systemd journal) can be used with or instead of syslog or syslog-ng.
The journalctl command line utility, when compared to traditional UNIX logging tools (tail, grep, sed, awk), has more features.
Let’s look at the main features that the systemd journal provides and how to use them.
The systemd log is saved to disk, so all logs “survive” the system reboot. View a list of available downloads:
journalctl --list-boots |head -n2
journalctl --list-boots |tail -n2
Team journalctl –b
will show all logs for the current download. If logs of a certain period are needed, then you need to add the download number as an argument:
journalctl -b -1
To get all logs for the current download in reverse chronological order:
journalctl -b --all --catalog --no-pager
All logs for all time in one file:
journalctl --all --catalog --merge --no-pager
Current boot, kernel logs only:
journalctl -b -k --no-pager
To be able to track the logs in real time (similar to tail -f):
journalctl -f
View how much space systemd journals take up on disk (/var/log/journal/):
journalctl --disk-usage
Get logs and metadata:
journalctl --output verbose
Export logs to a file:
journalctl --output export > export.log
Log filtering
You can filter logs by priority (RFC 5424 6.2.1):
journalctl -f -p emerg
journalctl -f -p alert
journalctl -f -p crit
journalctl -f -p err
journalctl -f -p warning
journalctl -f -p notice
journalctl -f -p info
journalctl -f -p debug
Display only logs with priority error, critical and alert:
journalctl -p err..alert
Logs for specific id only:
journalctl -t NetworkManager
journalctl can be used along with standard command line tools – grep, awk:
journalctl -b | grep -i selinux
In order to reduce the time it is better to use the flag -g
or --grep
:
journalctl -g nginx
journalctl -b -g kube
journalctl -g fail --case-sensitive=true
Like grep --grep
“understands” regular expressions:
journalctl --grep '(Started|Stopping)'
Allows you to filter logs by timestamps, without grep, awk and sed. No need to memorize complex regular expressions:
journalctl --since "20 min ago"
If you have a geo-distributed infrastructure in different time zones, then journalctl will help with different time zones:
journalctl --since "2023-06-21 14:24 Pacific/Auckland" --until "2023-06-21 14:30 Europe/Amsterdam"
The systemd log saves logs in a structured format:
journalctl -o verbose --no-pager
Sat 2023-07-22 17:17:40.468870 MSK [s=8e997e4278d4420da4ee36deb1bcb537;i=48abc;b=200a318f51b04680a1207f58ed5aaf88;m=2199c4e719;t=601140b4770a0;x=59dabeebe46afe51]
_TRANSPORT=syslog
PRIORITY=6
SYSLOG_IDENTIFIER=sshd
_UID=0
_GID=0
_COMM=sshd
_EXE=/usr/sbin/sshd
_CAP_EFFECTIVE=1ffffffffff
_SELINUX_CONTEXT=system_u:system_r:sshd_t:s0-s0:c0.c1023
_SYSTEMD_CGROUP=/system.slice/sshd.service
_SYSTEMD_UNIT=sshd.service
_SYSTEMD_SLICE=system.slice
_SYSTEMD_INVOCATION_ID=2c28dd046868493ca6c4ae9325b237b9
_BOOT_ID=200a318f51b04680a1207f58ed5aaf88
_MACHINE_ID=b0900a09b82b4ecca86af861e99e64c5
_HOSTNAME=Pythagoras
_RUNTIME_SCOPE=system
SYSLOG_FACILITY=10
_CMDLINE="sshd: unknown [priv]"
SYSLOG_PID=1339341
_PID=1339341
SYSLOG_TIMESTAMP=Jul 22 17:17:40
MESSAGE=Connection closed by invalid user ubuntu 45.95.147.231 port 53778 [preauth]
The systemd log saves logs in a structured format:
journalctl _PID=1339341
Or, for example, look at the logs of all programs written in python:
journalctl _COMM=python
JSON – new opportunities for automation
journalctl can output logs in json format – you can use the jq utility to filter messages:
journalctl --since "1500 min ago" -u kubelet.service -o json | jq .""_CMDLINE""
"/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=<IP_ADDRESS> --node-labels=<LABEL>/cluster=true, --pod-infra-container-image=registry.k8s.io/pause:3.9"
"/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=<IP_ADDRESS> --node-labels=<LABEL>/cluster=true, --pod-infra-container-image=registry.k8s.io/pause:3.9"
"/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=<IP_ADDRESS> --node-labels=<LABEL>/cluster=true, --pod-infra-container-image=registry.k8s.io/pause:3.9"
Structured logs open up wide possibilities for automation. No more “hacking” trying to glue sed, grep and awk in scripts. Any high-level programming language contains a module for working with JSON.
As an illustrative example, you can imagine a passing scenario: Application X writes some messages to the event log. A simple program running in Linux daemon mode keeps track of events in the systemd log. If a certain message appears in the log, the program, following its internal logic, sends a message to the data bus (kafka, rabbitmq, NATS) or performs certain operations in the system:
Code
package main
import (
"bufio"
"fmt"
"os"
"os/exec"
"strings"
)
func main() {
// Отслеживаем сообщения для идентификатора «events_topic»
cmdName := "journalctl -t events_topic -f"
cmdArgs := strings.Fields(cmdName)
cmd := exec.Command(cmdArgs[0], cmdArgs[1:len(cmdArgs)]...)
stdout, _ := cmd.StdoutPipe()
cmd.Start()
oneByte := make([]byte, 100)
num := 1
for {
_, err := stdout.Read(oneByte)
if err != nil {
fmt.Printf(err.Error())
break
}
r := bufio.NewReader(stdout)
line, _, _ := r.ReadLine()
fmt.Println(string(line))
num = num + 1
if num > 3 {
os.Exit(0)
}
}
}
Now, in one terminal session, we run a program that tracks events in the log:
go run watch.go
Jul 22 15:49:25 Pythagoras events_topic[1293240]: NEW EVENT
Jul 22 15:49:28 Pythagoras events_topic[1293250]: NEW EVENT
And in another terminal session, we emulate the application writing X messages to the log using the built-in systemd-cat utility:
echo "NEW EVENT" | systemd-cat -t events_topic
echo "NEW EVENT" | systemd-cat -t events_topic
In this case, the “os / exec” package is used to work with the log – that is, we execute system commands, but you can use one of the existing libraries:
https://github.com/coreos/go-systemd
This is a simple example of using systemd’s structured logs. The capabilities of systemd are much wider – many of them are used, for example, by cloud providers to build a fault-tolerant infrastructure.
sos report – when you need to find a needle in a haystack
https://github.com/sosreport/sos
It is not always possible to analyze the logs and outputs of various diagnostic commands on an industrial infrastructure in a calm, unhurried environment. In some cases, to find the root cause, you need a deep analysis, reconciliation of the log and metrics in retrospect, sometimes writing automation for parsing logs.
Sos report is just for such scenarios. This is a unified tool for collecting logs and diagnostic information. With sos report, you can create an archive with all the necessary information about the system – this data is used to analyze and find problems offline.
Installing sosreport:
yum install sos
apt install sosreport
The program is written in Python. Expands easily. Dozens of plugins have been written for sos report:
sos report --list-plugins
alternatives System alternatives
anaconda Anaconda installer
anacron Anacron job scheduling service
ata ATA and IDE information
auditd Audit daemon information
block Block device information
boot Bootloader information
cgroups Control groups subsystem
chrony Chrony clock (for Network time protocol)
cockpit Cockpit Web Service
conntrack conntrack - netfilter connection tracking
...
profiles:
sos report --list-profiles
boot boot, devices, dracut, grub2, services, systemd, udev
cluster conntrack
container cgroups, containers_common, podman, selinux
...
and presets:
sos report --list-presets
name: none
description: Do not load a preset
note: Use to disable automatically loaded presets
name: minimal
description: Small and quick report that reduces sos report resource consumption
note: May be useful for low-resource systems, but may not provide sufficient data for analysis
All this can be customized for a specific case.
Report generation example:
sos report --all-logs --batch --tmp-dir=/var
The created archive with logs can be taken for further analysis:
Your sosreport has been generated and saved in:
/var/tmp/<FILENAME>.tar.xz
Size 20.25MiB
Owner root
sha256 db71d185550428b5c3a12010fc76206b58b159671c6b74c2fce575254e388030
Depending on the selected profile and plugins, the archive may contain detailed diagnostic information with the outputs of various utilities and tools:
tar -xf <FILENAME>.tar.xz
boot dmidecode free ip_addr lib lspci proc root-symlinks sos_logs uname var
date environment hostname ip_route lsmod mount ps run sos_reports uptime version.txt
df etc installed-rpms last lsof netstat pstree sos_commands sys usr vgdisplay
When generating a report, using flags --clean
, --domains
And --keywords
you can obfuscate (clean up) the resulting archive by deleting all confidential information.