use cannot be invented

Formulation of the problem

Let's imagine that we are testing a system with several components. During testing, the automation system works with one or more components on one physical server. For most functional tests, it is sufficient to simply carefully start and stop component processes and interact with it via the API. However, to expand test coverage, log analysis may be required to detect errors, artifacts of background tasks, debugging messages, and other things.

The components output information to standard output (stdout) and error (stderr) streams, which corresponds to the principles The Twelve-Factor App. At the same time, we can work with several components, including an automation system that can also log internal states. Our goal is to collect all logs into one file for later analysis, similar to the pass-through identifier X-Request-ID used to combine all contexts on a request. At the same time, we want to create tests based on analysis of logs during testing: if the line “Segmentation fault” appears in the log, then the test is considered failed.

Requests for file formatting are quite simple:

loose ordering by time (from old to new);
component prefix (so that logs can be conveniently grated).

Simple Component

The role of a conditional server or mini-application will be performed by a regular Python script (server.py), which writes every second to stdout And stderr using the function print and a preconfigured logger:

# server.py

import time
import sys
import logging

from datetime import datetime

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stderr)
handler.setLevel(logging.DEBUG)

formatter = logging.Formatter("%(asctime)s [%(levelname)s]> %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)

while True:
    # YYYY-MM-DD HH:MM:SS,000
    now = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
    print(f"{now} [PRINTER]> Today was a good day!")
    logger.info("I can't believe today was a good day")
    time.sleep(1)

Having launched our server, we will see the expected output:

2024-05-24 11:23:12,988 [PRINTER]> Today was a good day!
2024-05-24 15:23:12,988 [INFO]> I can't believe today was a good day
2024-05-24 11:23:13,990 [PRINTER]> Today was a good day!
2024-05-24 15:23:13,990 [INFO]> I can't believe today was a good day
2024-05-24 11:23:14,991 [PRINTER]> Today was a good day!
2024-05-24 15:23:14,991 [INFO]> I can't believe today was a good day

Easily run a process and read the output

Let's return to our testing system. When starting a process, in addition to specifying subprocess.PIPE For stdout And stderrlet's not forget the flag -u (or environment variable PYTHONUNBUFFERED) to disable buffering of output streams:

# reader.py

import sys
import os
import subprocess

p = subprocess.Popen(
    args=[sys.executable, '-u', 'server.py'],
    stderr = subprocess.PIPE,
    stdout = subprocess.PIPE
)

PS For different languages and systems, buffering is disabled in different ways, it is worth remembering this.

Now we read no more buffer_size byte from the standard output file descriptor. Convert to format utf-8 and split by symbol LF ('\n'). We print it out by adding some prefix:

# reader.py
...
buffer_size = 1024
try:
    while True:
        stdout_chunks = os.read(p.stdout.fileno(), buffer_size)
        formatted_chunks = stdout_chunks.decode('utf-8').split('\n')
        for line in formatted_chunks:
            if line:
                print(f"server | {line}")
        time.sleep(2)
except KeyboardInterrupt:
    p.terminate()

# output:
# server | 2024-05-24 12:36:45,336 [PRINTER]> Today was a good day!
# server | 2024-05-24 12:36:46,337 [PRINTER]> Today was a good day!
# server | 2024-05-24 12:36:47,338 [PRINTER]> Today was a good day!
# server | 2024-05-24 12:36:48,343 [PRINTER]> Today was a good day!
# server | 2024-05-24 12:36:49,347 [PRINTER]> Today was a good day!
# server | 2024-05-24 12:36:50,351 [PRINTER]> Today was a good day!

We're great – now everyone stdout from our server we can watch in the terminal. The only thing that is stopping us now is that the main execution process is blocked. This is where the module comes to the rescue threading and class Thread.

class Reader(Thread):
    def __init__(self, fd: int, name: str):
        super().__init__(self)
        self.fd = fd
        self.name = name
        self.daemon = True
        self.running = True
        self._buffer_size = 1024
        self._output_delay = 1

    def run(self):
        while self.running:
            try:
                stdout_chunks = os.read(self.fd, self._buffer_size)
                formatted_chunks = stdout_chunks.decode('utf-8').split('\n')
                for line in formatted_chunks:
                    if line:
                        print(f"{self.name} | {line}")
            except Exception as e:
                print(f'Reader[{self.name}] failed: ' + str(e))

    def join(self):
        self.running = False
        return super().join()

We add several specific nuances, such as stopping the endless loop using a flag self.running, demon mode and thread name.

Now when running two threads (for stdout And stderr) we will explicitly write a message from the main process to see that there is no blocking. To stop correctly, we will add exception handling KeyboardInterrupt — when it appears, you must carefully terminate the process and two threads.

try:
    stdout_reader = Reader(p.stdout.fileno(), 'server_stdout')
    stderr_reader = Reader(p.stderr.fileno(), 'server_stderr')

    stdout_reader.start()
    stderr_reader.start()
    while True:
        print('main | not blocked')
        time.sleep(2)
except KeyboardInterrupt:
    stdout_reader.join()
    stderr_reader.join()
    p.terminate()

# output:
# main | not blocked
# server_stdout | 2024-05-24 13:10:53,851 [PRINTER]> Today was a good day!
# server_stderr | 2024-05-24 17:10:53,851 [INFO]> I can't believe today was a good day
# server_stdout | 2024-05-24 13:10:54,853 [PRINTER]> Today was a good day!
# server_stderr | 2024-05-24 17:10:54,853 [INFO]> I can't believe today was a good day
# main | not blocked
# server_stdout | 2024-05-24 13:10:55,854 [PRINTER]> Today was a good day!
# server_stderr | 2024-05-24 17:10:55,854 [INFO]> I can't believe today was a good day

We read the logs of the main process and play tricks with file descriptors

Start the process and just read it stdout/stderr – this is a rather trivial task. However, we still want to see an end-to-end log file with automation system logs. Launch Reader for reading sys.stdout And sys.stderr will not work. The same ones come to the rescue pipes.

Using a logger from a file server.pyadding a little trick:

r, w = os.pipe()
os.dup2(w, sys.stderr.fileno())

We add stderr reading from the main process and write something to it:

# reader.py

r, w = os.pipe()
os.dup2(w, sys.stderr.fileno())

try:
    stdout_reader = Reader(p.stdout.fileno(), 'server_stdout')
    stderr_reader = Reader(p.stderr.fileno(), 'server_stderr')
    stderr_reader_main = Reader(r, 'main_stderr')

    stdout_reader.start()
    stderr_reader.start()
    stderr_reader_main.start()

    while True:
        print('main | not blocked')
        logger.info('yep, not blocked')
        time.sleep(2)

# output:
# main | not blocked?
# main_stderr | 2024-05-24 17:24:01,031 [INFO]> yep, not blocked
# server_stdout | 2024-05-24 13:24:01,047 [PRINTER]> Today was a good day!
# server_stderr | 2024-05-24 17:24:01,047 [INFO]> I can't believe today was a good day
# server_stdout | 2024-05-24 13:24:02,047 [PRINTER]> Today was a good day!
# server_stderr | 2024-05-24 17:24:02,047 [INFO]> I can't believe today was a good day
# main | not blocked?
# main_stderr | 2024-05-24 17:24:03,031 [INFO]> yep, not blocked
# server_stdout | 2024-05-24 13:24:03,049 [PRINTER]> Today was a good day!
# server_stderr | 2024-05-24 17:24:03,049 [INFO]> I can't believe today was a good day
# server_stdout | 2024-05-24 13:24:04,051 [PRINTER]> Today was a good day!
# server_stderr | 2024-05-24 17:24:04,051 [INFO]> I can't believe today was a good day

To get the same thing, but in a file, you just need next to print specify a file and write to it:

def run(self):
    while self.running:
        try:
          ...
          if line:
              print(f"{self.name} | {line}")
              file.write(f"{self.name} | {line}\n") # <--
        except Exception as e:
          ...

Now we have full-fledged capture of logs, external components and our own system.

There are logs, but what to do with them?

You can record everything in a file as a testing artifact, search directly during the test, explore the complex behavior of systems during a specific test with a simple grep, and much more. Moreover, this solution is portable, as it relies on the basic mechanisms of interprocess communication (pipes) and preemptive multitasking (threads).

For example, a developer needed to get some of the logs about RAFT (you can read interesting things about replication in article my colleague). Previously, it needed to run throughout the entire file system: ls, cat, grep, find. It's probably fun to do this a couple of times for a demo, but not every day when debugging tests. Now this is one file where the status of connections, test data, and background operations can also be logged (and we have them).

End-to-end log file from three components (bob, frank, john - Tarantool instances) and a simple grep by keyword — End-to-end log file from three components (bob, frank, john – Tarantool instances) and a simple grep by keyword

Finally

ELK is one of the main tools for working with logs, but I was never able to solve the following problems:

excessive dependency: most functions are not used, and support and administration require a lot of time and resources;
processing (delivery) speed: autotests must be not only fast, but also stable. Network delays and problems with intermediate buffers led to unstable tests.

That's why I decided to write my solution.

Maybe I just cooked it wrong ELK And rsyslog in the context of test automation and was too biased, since I don’t like dragging something huge for the sake of a couple of functions, so I would be glad if you shared your approaches to working with logs in tests.