Exploring the code quality of the Zephyr operating system

5 min


PVS-Studio and Zephyr

We recently said that the PVS-Studio code analyzer began to integrate with PlatformIO. Naturally, at the same time, the PVS-Studio development team communicated with the PlatformIO team and they suggested, for the sake of interest, checking the code of the Zephyr real-time operating system. Why not, we thought, and here is an article about such a study.

PVS-Studio

PVS-Studio is still little known in the world of embedded systems, so just in case I will make an introduction for new readers who are not yet familiar with this tool. Our regular readers can skip right to the next section.

PVS-Studio – This is a static code analyzer that allows you to identify errors and potential vulnerabilities in the code of programs written in C, C ++, C # and Java. If we talk only about C and C ++, then the following compilers are supported:

  • Windows Visual Studio 2010-2019 C, C ++, C ++ / CLI, C ++ / CX (WinRT)
  • Windows IAR Embedded Workbench, C / C ++ Compiler for ARM C, C ++
  • Windows QNX Momentics, QCC C, C ++
  • Windows / Linux Keil µVision, DS-MDK, ARM Compiler 5/6 C, C ++
  • Windows / Linux Texas Instruments Code Composer Studio, ARM Code Generation Tools C, C ++
  • Windows / Linux / macOS. GNU Arm Embedded Toolchain, Arm Embedded GCC compiler, C, C ++
  • Windows / Linux / macOS. Clang C, C ++
  • Linux / macOS. GCC C, C ++
  • Windows MinGW C, C ++

The analyzer has its own classification system. warningsbut, if necessary, you can enable the display of alerts according to coding standards CWE, SEI CERT, Misra.

You can quickly start regularly using PVS-Studio even in a large legacy project. For this, a special mechanism of mass warning suppression. All current warnings are considered technical debt and are hidden, which allows you to focus on warnings that apply only to new or modified code. This allows the team to immediately start using the analyzer daily in their work, and you can come back to the technical duty from time to time and improve the code.

There are many other scenarios for using PVS-Studio. For example, you can use it as a plugin to SonarQube. Integration with systems such as Travis CI, CircleCI, GitLab CI / CD, etc. is possible. A more detailed description of PVS-Studio is beyond the scope of this article. Therefore, I propose to read the article, which has many useful links, and in which answers to many questions are given: “Reasons to introduce PVS-Studio static code analyzer into the development process

Zephyr

Working on PVS-Studio integration in PlatformIO, our teams talked, and we were asked to check out a project from the embedded world, namely Zephyr. We liked the idea, which was the reason for writing this article.

Zephyr – A lightweight real-time operating system designed to work on devices with limited resources of various architectures. The code is distributed under the open source Apache 2.0 license. Works on the following platforms: ARM (Cortex-M0, Cortex-M3, Cortex-M4, Cortex-M23, Cortex-M33, Cortex-R4, Cortex-R5, Cortex-A53), x86, x86-64, ARC, RISC- V, Nios II, Xtensa.

Some features:

  • Unified address space. The specific application code in combination with the custom kernel creates a monolithic image that is executed on the device.
  • Great customization options. The application receives only those features that he needs and when he needs them.
  • Resources are determined at compile time. This reduces code size and increases performance.
  • Minimal error control. Serves for the same. At the same time, during testing, it is possible to receive full debugging information.
  • A rich set of features for the developer: multithreading, interrupt control, in-stream synchronization, tools for working with memory, power management and much more.

Of the interesting moments for us, the company takes part in the development of the operating system Synopsys. In 2014, Synopsys acquired Coverity, which produced the static code analyzer of the same name.

It’s quite natural that from the very beginning Zephyr used an analyzer Coverity. The analyzer is a market leader and this could not but affect the quality of the operating system code for the better.

Zephyr Code Quality

In my opinion, the Zephyr operating system code is quality. Here is what gives me reason to think so:

  • The PVS-Studio analyzer generated 122 general-purpose warnings of the High level and 367 warnings of the Medium level. This is a bit, considering that 560 C / C ++ files were checked. The kernel is checked by checking the samples. In total, in the project I counted 7,810 C / C ++ files and 10,075 header files. It turns out that only part of the project has been verified. However, the task of checking everything was not set, and the results obtained still indicate a low density of warnings.
  • Many of the warnings issued are false or semi-false. What is meant by “semi-false” warnings, I will explain below.
  • Utility SourcemonitorHaving analyzed the source code, it produced statistics that 48% of the code is comments. This is a lot and in my experience indicates a high concern for the quality of the code, its comprehensibility for other developers.
  • When developing a project, a Coverity static code analyzer is used. Most likely, due to this fact, the PVS-Studio analyzer, although it found errors in the project, could not show itself vividly, as it sometimes happens when analyzing other projects.

Based on this, I believe that the authors of the project care about the quality and reliability of the code. Let’s now look at some warnings issued by the PVS-Studio analyzer (version 7.06).

Semi-false warnings

The project code, due to its low level, is written quite specifically and with a lot of conditional compilation (#ifdef). This generates a large number of warnings that do not indicate a real mistake, but they can not be called simply false. It will be easiest to clarify this with a few examples.

An example of a “semi-false” actuation N1

static struct char_framebuffer char_fb;

int cfb_framebuffer_invert(struct device *dev)
{
  struct char_framebuffer *fb = &char_fb;

  if (!fb || !fb->buf) {
    return -1;
  }

  fb->inverted = !fb->inverted;

  return 0;
}

PVS-Studio warning: V560 A part of conditional expression is always false:! Fb. cfb.c 188

When taking the address of a static variable, a non-zero pointer is always obtained. Therefore the pointer fb always non-zero and checking it does not make sense.

However, it is clear that this is not an error at all, but simply an excessive check, which does no harm. Moreover, when building the Release version, the compiler will throw it away, so this will not even cause a slowdown.

A similar case, in my understanding, falls under the concept of “semi-false” analyzer triggering. Formally, the analyzer is absolutely right. And it’s better to remove the extra pointless check from the code. However, all this is petty and such warnings are not even interesting to consider in the framework of the article.

An example of a “semi-false” actuation N2

int hex2char(u8_t x, char *c)
{
  if (x <= 9) {
    *c = x + '0';
  } else if (x >= 10 && x <= 15) {
    *c = x - 10 + 'a';
  } else {
    return -EINVAL;
  }
  return 0;
}

PVS-Studio warning: V560 A part of conditional expression is always true: x> = 10. hex.c 31

The analyzer is again formally right in asserting that part of the condition is always true. If the variable x not less than / equal to 9, it turns out that it is always greater than / equal to 10. And the code can be simplified:

} else if (x <= 15) {

Once again, it’s clear that there is no real harmful error here, and the extra comparison is written just for the beauty of the code.

Now let's look at a more complex example of N3

First, let's see how a macro can be implemented. CHECKIF.

#if defined(CONFIG_ASSERT_ON_ERRORS)
#define CHECKIF(expr) 
  __ASSERT_NO_MSG(!(expr));   
  if (0)
#elif defined(CONFIG_NO_RUNTIME_CHECKS)
#define CHECKIF(...) 
  if (0)
#else
#define CHECKIF(expr) 
  if (expr)
#endif

Depending on the compilation mode of the project, the check may be either performed or skipped. In our case, when checking the code using PVS-Studio, this macro implementation was selected:

#define CHECKIF(expr) 
  if (expr)

Now let's see what this leads to.

int k_queue_append_list(struct k_queue *queue, void *head, void *tail)
{
  CHECKIF(head == NULL || tail == NULL) {
    return -EINVAL;
  }

  k_spinlock_key_t key = k_spin_lock(&queue->lock);
  struct k_thread *thread = NULL;
  if (head != NULL) {
    thread = z_unpend_first_thread(&queue->wait_q);
  }
  ....
}

PVS-Studio warning: V547 [CWE-571] Expression 'head! = NULL' is always true. queue.c 244

The analyzer believes that the check (head! = NULL) always gives the truth. And indeed it is. If the pointer head was equal to NULL, then the function would cease to work due to a check at the beginning of the function:

CHECKIF(head == NULL || tail == NULL) {
  return -EINVAL;
}

Recall that here the macro expands as follows:

if (head == NULL || tail == NULL) {
  return -EINVAL;
}

So, the PVS-Studio analyzer is right from its point of view and issues a correct warning. However, this penetration cannot be removed. She is needed. In another scenario, the macro will open like this:

if (0) {
  return -EINVAL;
}

And then re-checking the pointer is needed. Of course, the analyzer will not generate a warning in this type of code compilation. However, it gives a warning for the debug version of the compilation.

I hope that now it’s clear to readers where the “semi-false” warnings come from. However, there is nothing wrong with them. The PVS-Studio analyzer provides various mechanisms for suppressing false alerts, which can be found in the documentation.

Case Warnings

But did you find something interesting after all? It was a success, and now we look at various errors. At the same time, I want to immediately note two points:

  1. The essence of static analysis is not in such one-time checks. Correct methodology: regular launch of the analyzer, as, in fact, Coverity is used in the project now. Therefore, adding PVS-Studio or some other analyzer to the development process, you can find even more errors at an early stage and thereby reduce the cost of fixing them.
  2. When writing an article, I did not have the goal of finding as many errors as possible. Therefore, many errors have gone unnoticed by me or I have vainly attributed them to the category of “semi-false” responses and have not included them in the article. If the authors of the project are interested in this publication, I recommend that they independently conduct an analysis and study the received report. Since the project is open and located on GitHub, you can use free licensing option PVS-Studio.

Fragment N1, typo

static void gen_prov_ack(struct prov_rx *rx, struct net_buf_simple *buf)
{
  ....
  if (link.tx.cb && link.tx.cb) {
    link.tx.cb(0, link.tx.cb_data);
  }
  ....
}

PVS-Studio warning: V501 [CWE-571] There are identical sub-expressions to the left and to the right of the '&&' operator: link.tx.cb && link.tx.cb pb_adv.c 377

The same variable is checked twice link.tx.cb. Apparently, this is a typo, and the second variable to be checked should be link.tx.cb_data.

Fragment N2, buffer overflow

Consider the function net_hostname_getto be used next.

#if defined(CONFIG_NET_HOSTNAME_ENABLE)
const char *net_hostname_get(void);
#else
static inline const char *net_hostname_get(void)
{
  return "zephyr";
}
#endif

In my case, when preprocessing, the option related to the branch was selected #else. That is, in a preprocessed file, the function is implemented like this:

static inline const char *net_hostname_get(void)
{
  return "zephyr";
}

The function returns a pointer to an array of 7 bytes (we take into account the terminal zero at the end of the line).

Now consider the code leading to the exit of the array boundary.

static int do_net_init(void)
{
  ....
  (void)memcpy(hostname, net_hostname_get(), MAX_HOSTNAME_LEN);
  ....
}

PVS-Studio warning: V512 [CWE-119] A call of the 'memcpy' function will lead to the 'net_hostname_get ()' buffer becoming out of range. log_backend_net.c 114

After preprocessing MAX_HOSTNAME_LEN disclosed as follows:

(void)memcpy(hostname, net_hostname_get(),
    sizeof("xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"));

Accordingly, when copying data, an outflow of the string literal occurs. It is difficult to predict how this will affect the execution of the program, as this leads to undefined behavior.

Fragment N3, potential buffer overrun

int do_write_op_json(struct lwm2m_message *msg)
{
  u8_t value[TOKEN_BUF_LEN];
  u8_t base_name[MAX_RESOURCE_LEN];
  u8_t full_name[MAX_RESOURCE_LEN];
  ....
  /* combine base_name + name */
  snprintf(full_name, TOKEN_BUF_LEN, "%s%s", base_name, value);
  ....
}

PVS-Studio warning: V512 [CWE-119] A call of the 'snprintf' function will lead to overflow of the buffer 'full_name'. lwm2m_rw_json.c 826

If we substitute the values ​​of the macros, then the picture of what is happening is as follows:

u8_t value[64];
u8_t base_name[20];
u8_t full_name[20];
....
snprintf(full_name, 64, "%s%s", base_name, value);

Under buffer full_name, in which the string is formed, only 20 bytes are allocated. At the same time, the parts from which the string is formed are stored in buffers of 20 and 64 bytes in size. Moreover, the constant 64 passed to the function snprintf and designed to prevent the array from crossing the border, it’s obviously too big!

This code does not necessarily lead to a buffer overflow. Perhaps always lucky, and substrings are always very small. However, in general, this code is in no way protected from overflow and contains a classic security flaw CWE-119.

Fragment N4, expression is always true

static int keys_set(const char *name, size_t len_rd, settings_read_cb read_cb,
                    void *cb_arg)
{
  ....
  size_t len;
  ....
  len = read_cb(cb_arg, val, sizeof(val));
  if (len < 0) {
    BT_ERR("Failed to read value (err %zu)", len);
    return -EINVAL;
  }
  ....
}

PVS-Studio warning: V547 [CWE-570] Expression 'len <0' is always false. Unsigned type value is never <0. keys.c 312 Variable len has an unsigned type and, therefore, cannot be less than 0. Accordingly, the error status is not processed in any way. In other places to store the result of the function read_cb type used int or ssize_t. Example:

static inline int mesh_x_set(....)
{
 ssize_t len;
 len = read_cb(cb_arg, out, read_len);
 if (len < 0) {
 ....
}

Note. It seems with function read_cb in general, everything is bad. The fact is that it is declared like this:

static u8_t read_cb(const struct bt_gatt_attr *attr, void *user_data)

A type u8_t this is unsigned char.

The function always returns only positive numbers of type unsigned char. If you put this value in a signed variable of type int or ssize_t, anyway, the value will always be positive. Therefore, in other places, checking for error status does not work either. But I did not delve into the study of this issue.

Fragment N5, something very strange

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    ((u8_t *)mntpt)[strlen(mntpt)] = '';
    memcpy(cpy_mntpt, mntpt, strlen(mntpt));
  }
  return cpy_mntpt;
}

PVS-Studio warning: V575 [CWE-628] The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. shell.c 427

Strange code

Someone tried to make an analog function strdupbut he didn’t succeed.

Let's start with the analyzer warning. He reports that the function memcpy copies the line, but does not copy the terminal zero, and this is very suspicious.

It seems that this terminal 0 is copied here:

((u8_t *)mntpt)[strlen(mntpt)] = '';

But no! Here's a typo, because of which the terminal zero is copied to itself! Note that writing to the array mntptnot in cpy_mntpt. As a result, the function mntpt_prepare returns a string incomplete with terminal zero.

In fact, the programmer wanted to write like this:

((u8_t *)cpy_mntpt)[strlen(mntpt)] = '';

However, it is still not clear why it was so complicated! This code can be simplified to the following option:

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    strcpy(cpy_mntpt, mntpt);
  }
  return cpy_mntpt;
}

Fragment N6, dereferencing a pointer before validation

int bt_mesh_model_publish(struct bt_mesh_model *model)
{
  ....
  struct bt_mesh_model_pub *pub = model->pub;
  ....
  struct bt_mesh_msg_ctx ctx = {
    .send_rel = pub->send_rel,
  };
  ....
  if (!pub) {
    return -ENOTSUP;
  }
  ....
}

PVS-Studio warning: V595 [CWE-476] The 'pub' pointer was utilized before it was verified against nullptr. Check lines: 708, 719. access.c 708

Very common error pattern. First, the pointer is dereferenced to initialize a member of the structure:

.send_rel = pub->send_rel,

And only then follows a check that this pointer can be null.

Fragment N7-N9, pointer dereferencing before validation

int net_tcp_accept(struct net_context *context, net_tcp_accept_cb_t cb,
                   void *user_data)
{
  ....
  struct tcp *conn = context->tcp;
  ....
  conn->accept_cb = cb;

  if (!conn || conn->state != TCP_LISTEN) {
    return -EINVAL;
  }
  ....
}

PVS-Studio warning: V595 [CWE-476] The 'conn' pointer was utilized before it was verified against nullptr. Check lines: 1071, 1073. tcp2.c 1071

The same as in the previous case. Explanation is not required here.

Two more such errors can be seen here:

  • V595 [CWE-476] The 'context-> tcp' pointer was utilized before it was verified against nullptr. Check lines: 1512, 1518. tcp.c 1512
  • V595 [CWE-476] The 'fsm' pointer was utilized before it was verified against nullptr. Check lines: 365, 382. fsm.c 365

Fragment N10, erroneous check

static int x509_get_subject_alt_name( unsigned char **p,
                                      const unsigned char *end,
                                      mbedtls_x509_sequence *subject_alt_name)
{
  ....
    while( *p < end )
    {
        if( ( end - *p ) < 1 )
            return( MBEDTLS_ERR_X509_INVALID_EXTENSIONS +
                    MBEDTLS_ERR_ASN1_OUT_OF_DATA );
    ....
  }
  ....
}

PVS-Studio warning: V547 [CWE-570] Expression '(end - * p) <1' ​​is always false. x509_crt.c 635 Look carefully at the conditions:

  • * p
  • (end - * p) <1

They contradict each other.

If (* p Fragment N11, unreachable code

uint32_t lv_disp_get_inactive_time(const lv_disp_t * disp)
{
    if(!disp) disp = lv_disp_get_default();
    if(!disp) {
        LV_LOG_WARN("lv_disp_get_inactive_time: no display registered");
        return 0;
    }

    if(disp) return lv_tick_elaps(disp->last_activity_time);

    lv_disp_t * d;
    uint32_t t = UINT32_MAX;
    d          = lv_disp_get_next(NULL);
    while(d) {
        t = LV_MATH_MIN(t, lv_tick_elaps(d->last_activity_time));
        d = lv_disp_get_next(d);
    }

    return t;
}

PVS-Studio warning: V547 [CWE-571] Expression 'disp' is always true. lv_disp.c 148

The function stops working if disp is a null pointer. Then, on the contrary, it is checked that the pointer disp not zero (and this is always the case), and the function again finishes its work.

As a result, part of the code in a function will never get control at all.

Fragment N12, strange return value

static size_t put_end_tlv(struct lwm2m_output_context *out, u16_t mark_pos,
        u8_t *writer_flags, u8_t writer_flag,
        int tlv_type, int tlv_id)
{
  struct tlv_out_formatter_data *fd;
  struct oma_tlv tlv;
  u32_t len = 0U;

  fd = engine_get_out_user_data(out);
  if (!fd) {
    return 0;
  }

  *writer_flags &= ~writer_flag;

  len = out->out_cpkt->offset - mark_pos;

  /* use stored location */
  fd->mark_pos = mark_pos;

  /* set instance length */
  tlv_setup(&tlv, tlv_type, tlv_id, len);
  len = oma_tlv_put(&tlv, out, NULL, true) - tlv.length;
  return 0;
}

PVS-Studio warning: V1001 The 'len' variable is assigned but is not used by the end of the function. lwm2m_rw_oma_tlv.c 338

The function contains two operators. returnwhich both return 0. It is strange that the function always returns 0. It is also strange that the variable len after assignment is no longer used. I have a big suspicion that it should actually be written like this:

  len = oma_tlv_put(&tlv, out, NULL, true) - tlv.length;
  return len;
}

Fragment N13-N16, synchronization error

static int nvs_startup(struct nvs_fs *fs)
{
  ....
  k_mutex_lock(&fs->nvs_lock, K_FOREVER);
  ....
  if (fs->ate_wra == fs->data_wra && last_ate.len) {
    return -ESPIPE;
  }
  ....
end:
  k_mutex_unlock(&fs->nvs_lock);
  return rc;
}

PVS-Studio warning: V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 620, 549. nvs.c 620

There is a situation when a function completes its work without unlocking the mutex. As I understand it, it would be correct to write like this:

static int nvs_startup(struct nvs_fs *fs)
{
  ....
  k_mutex_lock(&fs->nvs_lock, K_FOREVER);
  ....
  if (fs->ate_wra == fs->data_wra && last_ate.len) {
    rc = -ESPIPE;
    goto end;
  }
  ....
end:
  k_mutex_unlock(&fs->nvs_lock);
  return rc;
}

Three more such errors:

  • V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 574, 549. nvs.c 574
  • V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 908, 890. net_context.c 908
  • V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 1194, 1189. shell.c 1194

Conclusion

I hope you enjoyed it. Visit us at the blog read about checks of other projects and other interesting publications.

Use static analyzers in your work to reduce the number of errors and potential vulnerabilities even at the stage of writing code. Especially early error detection is relevant for embedded systems, updating programs in which is often a time-consuming and expensive process.

I also suggest not postponing and trying to check your projects using the PVS-Studio analyzer. See article: How to quickly see interesting warnings generated by the PVS-Studio analyzer for C and C ++ code?

If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. Checking the Code of Zephyr Operating System.


0 Comments

Leave a Reply