Where do the logs come from? Veeam log diving

We continue our dive into the exciting world of fortune-tel… troubleshooting by logs. In the previous article, we agreed on the meaning of the basic terms and took a quick look at the general structure of Veeam as a single application. The task for this one is to figure out how log files are formed, what kind of information ends up in them, and why they look the way they do.

What do you think these “logs” are? Most people believe the logs of any application should play the role of an omnipotent entity that spends most of its time vegetating somewhere in the backyard, but at the right moment appears out of nowhere in shining armor and saves everyone. That is, they should contain everything, from the smallest errors in each component to individual database transactions. And right after each error there should be a note on how to fix it. And all of this should fit in a couple of megabytes, no more. It’s just text! Text files cannot take up tens of gigabytes, I heard that somewhere!

So the logs

In the real world, logs are just an archive of diagnostic information. What to store there, where to get the information, and how detailed it should be is up to the developers. Some follow the path of minimalism and keep records at the ON/OFF level, while others diligently rake in everything they can reach. There is also an intermediate option: a selectable Logging Level, where you decide how much detail you want to store and how much spare disk space you have =) VBR has six such levels, by the way. And, believe me, you don’t want to see what the most detailed logging does to the free space on your disk.

Okay. We have a rough idea of what we want to save, but a legitimate question arises: where does this information come from? Some of the logged events, of course, are generated by our own internal processes. But what about interaction with the external environment? To avoid sliding into a hell of crutches and reinvented wheels, Veeam tends not to invent what has already been invented. Whenever there is a ready-made API, a function built into the system, a library, and so on, we prefer the ready-made option over building our own ingenious solutions. Although there is no shortage of the latter either. Therefore, when analyzing logs, it is important to understand that the lion’s share of errors comes from third-party APIs, system calls and other libraries. In that case the role of VBR is reduced to forwarding these errors to the log files as is. And the main task of the user is to learn which line comes from whom, and what this “who” is responsible for. So if an error code from a VBR log leads you to an MSDN page, that is normal and correct.

As we agreed earlier, Veeam is a so-called SQL-based application. This means that all settings, all information and in general everything needed for normal functioning is stored in its database. Hence the simple truth: what is not in the logs is most likely in the database. But this is not a silver bullet either: some things are neither in the local logs of Veeam components nor in its database. Therefore, you need to learn how to study the host logs, the local machine logs and the logs of everything else involved in the backup and restore process. And sometimes the necessary information is not available anywhere. This is the way.

Some examples of such APIs

This list does not claim to be complete, so there is no need to look for the ultimate truth in it. Its only purpose is to show the most common third-party APIs and technologies used in our products.

Let’s start with VMware

First on the list is the vSphere API. It is used for authentication, reading the hierarchy, creating and deleting snapshots, requesting information about machines, and much (very much) more. The functionality is very broad, so I can recommend the VMware vSphere API Reference for versions 5.5 and 6.0 to everyone. For more recent versions, everything is easy to find with a search engine.

VIX API. The black magic of the hypervisor, for which there is a separate error list. A VMware API for working with files in machines on a host without connecting to those machines over the network. It is the option of last resort, when you need to put a file into a machine and there is no better communication channel to it. It is pain and suffering if the file is large and the host is loaded. But here the rule is that even 56.6 Kb/s is better than 0 Kb/s. In Hyper-V, this is called PowerShell Direct. But that is how things used to be.

vSphere Web Services API. Starting with vSphere 6.0 (approximately, since this API was first introduced in version 5.5), it is used to work with guest machines and has replaced VIX almost everywhere. In fact, it is another API for managing vSphere. For those who are interested, I can recommend studying a great manual.

VDDK (Virtual Disk Development Kit). The library that was partially mentioned in this article. It is used to read virtual disks. Once upon a time it was part of VIX, but over time it was moved into a separate product. As an heir, it uses the same error codes as VIX. For some reason, though, the SDK itself does not describe these errors. So it was found experimentally that VDDK errors with other, unfamiliar codes are just a binary value printed in decimal. The value consists of two parts: the first half is undocumented information about the context, and the second half is the traditional VIX / VDDK error. For example, if we see:

VDDK error: 21036749815809.Unknown error

Then we can safely convert it to hex and get 132200000001. We simply discard the uninformative beginning, 132200, and the remainder is our error code (VDDK 1: Unknown error). There was recently a separate article about the most common VDDK errors.
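The conversion described above can be sketched in a few lines of Python. This is only an illustration: the helper name split_vddk_error is made up here, and the exact boundary between the "context" part and the error code is not documented, so the width of the code field (16 bits by default) is an assumption you may need to adjust.

```python
def split_vddk_error(raw, code_bits=16):
    """Split a decimal VDDK error number into (context, code).

    code_bits is an ASSUMPTION: how many low bits hold the classic
    VIX / VDDK error code. The upper part is undocumented context.
    """
    mask = (1 << code_bits) - 1
    return raw >> code_bits, raw & mask


raw = 21036749815809  # the number from the log line above
print(hex(raw))  # 0x132200000001

context, code = split_vddk_error(raw)
print(f"context=0x{context:x} code={code}")  # context=0x13220000 code=1
```

The resulting code is then looked up in the usual VIX / VDDK error list; the context part can be ignored.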

Now let’s look at Windows

Here everything we need most can be found in the standard Event Viewer. But there is one catch: by long-standing tradition, Windows does not log the full text of an error, only its number. For example, error 5 is “Access denied”, 1722 is “The RPC server is unavailable”, and 10060 is “Connection timed out”. Of course, it is great if you remember the most famous ones, but what about those you have never seen before?

And so that life does not seem too sweet, errors are also stored in hexadecimal form, with the prefix 0x8007. For example, 0x8007000e is actually 14, Out of Memory. Why and for whom this was done is a mystery shrouded in darkness. However, the complete list of errors can be downloaded for free and without SMS from the Dev Center.

By the way, sometimes there are other prefixes, not just 0x8007. In such a sad situation, to understand the HRESULT (“result handle”), you need to dig even deeper into the documentation for developers. I do not advise doing this in ordinary life, but if you suddenly find yourself backed against the wall or are just curious, now you know what to do.
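To make the prefix less mysterious, here is a minimal Python sketch of how an HRESULT breaks down. The bit layout (top bit for failure, an 11-bit facility, a 16-bit code) and facility 7 meaning "Win32" come from the public HRESULT definition; the function names are made up for this example.

```python
FACILITY_WIN32 = 7  # the "7" in the 0x8007 prefix


def decode_hresult(hr):
    """Split a 32-bit HRESULT into its main fields."""
    hr &= 0xFFFFFFFF  # tolerate negative (signed) representations
    return {
        "failure": bool(hr >> 31),        # severity bit: 1 means failure
        "facility": (hr >> 16) & 0x7FF,   # which subsystem produced the error
        "code": hr & 0xFFFF,              # the error code itself
    }


def win32_code(hr):
    """Return the plain Windows error number, or None for other facilities."""
    fields = decode_hresult(hr)
    if fields["facility"] == FACILITY_WIN32:
        return fields["code"]
    return None


print(win32_code(0x8007000E))  # 14, i.e. Out of Memory
print(win32_code(0x80004005))  # None: not a wrapped Windows error
```

So 0x8007 simply says "failure, facility Win32", and the last four hex digits are the ordinary error number. Any other prefix means a different facility, and the code has to be looked up in that facility's own documentation.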

But the comrades at Microsoft took a little pity on us and gave the world the ERR utility. It is a small piece of console happiness that can translate error codes into human language without Google. It works like this.

C:\Users\root\Desktop>err.exe 0x54f
# for hex 0x54f / decimal 1359
  ERROR_INTERNAL_ERROR                                           winerror.h
# An internal error occurred.
# as an HRESULT: Severity: SUCCESS (0), FACILITY_NULL (0x0), Code 0x54f
# for hex 0x54f / decimal 1359
  ERROR_INTERNAL_ERROR                                           winerror.h
# An internal error occurred.
# 2 matches found for "0x54f"

A legitimate question arises: why don’t we write the decoded text to the logs right away instead of leaving these mysterious codes? The answer lies in third-party applications. When you make a WinAPI call yourself, it is not difficult to decode its response, because there is even a special WinAPI call for that. But as already mentioned, everything that comes to us in responses goes into our logs as is. To decode it, we would have to constantly monitor this stream of consciousness, pick out the pieces with Windows errors, decode them and insert them back. Let’s be honest, not the most exciting thing to do.
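If you do want to do this decoding yourself while reading a log, a rough Python sketch might look like the one below. Everything in it is illustrative: the sample log line is made up and is not a real VBR log format, and the lookup table contains only the codes mentioned in this article. On a Windows machine, the table could presumably be replaced with ctypes.FormatError(code), which asks the system for the message text.

```python
import re

# Only the codes mentioned in this article; extend as needed.
KNOWN_ERRORS = {
    5: "Access denied",
    14: "Out of memory",
    1359: "An internal error occurred",
    1722: "The RPC server is unavailable",
    1726: "RPC function call failed",
    1753: "There are no more endpoints available from the endpoint mapper",
    10060: "Connection timed out",
}

# Matches 0x8007XXXX, i.e. a Windows error wrapped into an HRESULT.
WIN32_HRESULT = re.compile(r"0x8007([0-9a-fA-F]{4})\b")


def annotate(line):
    """Append the decimal code and text after every known 0x8007XXXX value."""
    def add_text(match):
        code = int(match.group(1), 16)
        text = KNOWN_ERRORS.get(code)
        if text is None:
            return match.group(0)  # unknown code: leave the line untouched
        return f"{match.group(0)} ({code}: {text})"

    return WIN32_HRESULT.sub(add_text, line)


# Hypothetical log line, for illustration only.
sample = "Failed to connect to guest agent. Error code: 0x800706ba"
print(annotate(sample))
```

For the sample line, this appends "(1722: The RPC server is unavailable)" after the hex value and leaves everything else as it was.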

The Windows File Management API is used in every possible way when working with files: creating and deleting files, opening them for writing, working with attributes, and so on.

PowerShell Direct, mentioned above, is the analogue of the VIX API in the Hyper-V world. Unfortunately, it is not as flexible: it has a lot of functional restrictions and does not work with every host version or with every guest.

RPC (Remote Procedure Call). I don’t think there is a single person who has worked with Windows and has not seen RPC-related errors. Contrary to a popular misconception, it is not a single protocol but any client-server protocol that satisfies a number of requirements. However, if there is an RPC error in our logs, in 90% of cases it comes from Microsoft RPC, which is part of DCOM (Distributed Component Object Model). There is a huge amount of documentation on this topic on the Internet, but the lion’s share of it is rather outdated. If you are keen to study the topic, I can recommend the articles What is RPC? and How RPC Works, and a long list of RPC errors.

The main cause of RPC errors in our logs is failed communication between VBR components (server to proxy, for example), most often due to connectivity problems.

The chart-topper among all the tops is the error The RPC server is unavailable (1722). Put simply, the client was unable to establish a connection to the server. There is no single answer as to how and why, but usually it is a problem with authentication or with network access to port 135. The latter is typical for infrastructures with dynamic port assignment. There is even a separate HF on this. And Microsoft has a voluminous guide to finding the cause of the failure.
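A quick first check for the network half of this problem is to see whether TCP port 135 is reachable at all from the machine that reports the error. A minimal Python sketch, with a made-up function name and host name, could be:

```python
import socket


def tcp_port_open(host, port=135, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Usage (the host name is hypothetical):
#   tcp_port_open("guest-vm.example.local")
```

Keep in mind what this does and does not prove. A True result only says that something accepts connections on port 135. It says nothing about authentication or about the dynamically assigned ports used for the rest of the conversation, so a firewall can still block those.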

The second most common error: There are no more endpoints available from the endpoint mapper (1753). The RPC client or server failed to assign itself a port. It usually occurs when the server (in our case, the guest machine) has been configured to allocate ports dynamically from a narrow range that has run out. Seen from the client side (in our case, the VBR server), it means that our VeeamVssAgent either did not start or was not registered as an RPC interface. There is a separate HF on this topic as well.

And to complete the top 3 RPC errors, let’s remember RPC function call failed (1726). It appears if the connection has been established but RPC requests are not being processed. For example, we request the VSS status (what if a shadow copy is being made there right now while we are trying to barge in), and all we get in response is silence.

The Windows Tape Backup API is needed to work with tape libraries and drives. As I mentioned at the beginning, we take no pleasure in writing our own drivers and then suffering with support for every device. Therefore, Veeam has no drivers of its own. Everything goes through the standard API, support for which is implemented by the hardware vendors themselves. Much more logical, right?

SMB / CIFS. Everyone writes them side by side out of habit, although not everyone remembers that CIFS (Common Internet File System) is just one particular version of SMB (Server Message Block). So there is nothing wrong with treating these terms as one. Samba is the Linux / Unix implementation, and it has its own peculiarities, but I digress. What is important here: when Veeam asks to write something to a UNC path (\\server\directory), the server uses the file system driver hierarchy, including mup and mrxsmb, to write to the share. Accordingly, these drivers will also generate errors.

There is no way to do without the Winsock API. If something needs to be done over the network, VBR works through the Windows Sockets API, popularly known as Winsock. So if we see an IP:Port pair in the log, this is it. The official documentation has a nice list of possible errors.

The previously mentioned WMI (Windows Management Instrumentation) is a kind of all-powerful API for managing anything and everything in the Windows world. For example, when working with Hyper-V, almost all requests to the host go through it. In short, it is absolutely irreplaceable and very powerful. The built-in tool WBEMtest.exe is a great help in figuring out what broke and where.

And last on the list, but by no means least: VSS (Volume Shadow Copy Service). The topic is as inexhaustible and mysterious as the documentation on it is voluminous. A shadow copy is easiest to understand as a special type of snapshot, which in fact it is. Thanks to it, you can make application-consistent backups in VMware, and almost everything in Hyper-V. I plan to write a separate article with a digest of VSS, but for now you can try reading this description. Just be careful, because trying to understand VSS in one sitting can lead to brain injury.

On this, perhaps, we can stop. I consider the task of explaining the most basic things complete, so in the next chapter we will finally look at the logs themselves. But if you still have questions, do not hesitate to ask them in the comments.
