Fuzz testing: practical use

A little about the team

The content team on the supplier portal is responsible for reliably storing the product card data of every seller on Wildberries.

One of our main tasks is developing and maintaining an API service for working with cards. Card content is all the text you see on a product page.

This is a product card

The team's second important task is processing text content, namely validating and normalizing it. This has to be fast, so that suppliers do not wait long for their product data to be updated.

The third task is transmitting product data. We are a master data system: other teams receive product information from us, so we have to deliver it to the services and data consumers that depend on it.

Why Fuzzing?

We came to fuzzing our code for several reasons.

  1. Our repositories have accumulated a large number of methods with complex logic.

  2. Our unit tests contain a lot of cases, roughly 1000 lines of code. When a bug appeared, a number of these test cases would break; developers spent a lot of time dealing with this, and the company spent a lot of money.

  3. Requests may contain unexpected data. We serve suppliers from different CIS countries: Armenia, Kazakhstan and many others, and they can write whatever they want on a card. It often happens that someone copies data out of a PDF or Word file and pastes it in, and all the stray characters migrate to us, causing errors.

  4. Checking internal integrations. After we normalize and validate product card data, we pass it to our consumers (data-consuming services), which receive the data, process it and act on it. Sometimes one of our fields becomes deprecated; we remove it, ship the change to our consumers, and then, for example, the logistics team says: “Damn, all our logic depended on that field, now we're down.” That's why we need to check everything before changing the systems.

But before we move on to practical examples, let's start where we once started ourselves: with the theory.

What is Fuzzing?

Illustration from the article "Where to look for bugs using fuzzing and where did this method come from?"

Fuzzing is a testing technique in which a program is fed generated random values. The technique is widespread and is used, among other things, for testing high-load applications.

Fuzzing works best on complex code, where it is physically very hard to cover all possible input variations with ordinary unit tests. Usually, when writing a unit test, a developer adds a handful of cases for the places where they expect the program to go wrong, but never all of them. Fuzzing lets us push test coverage across much more of the code.

Fuzzing Go

Go ships with a coverage-guided fuzzer out of the box starting with version 1.18. Fuzzing was available in Go before that too, but only as a separate library, gofuzz.

A fuzz test is a function whose name starts with the mandatory Fuzz prefix and that takes a *testing.F parameter. testing.F offers largely the same methods as testing.T, plus two new ones: f.Add and f.Fuzz.

What do these functions do? With f.Add we add values to the fuzzer's seed corpus, and with f.Fuzz we register the fuzz target: a function for which the fuzzer generates the values we asked for and runs our test. The values the fuzzer generates are called fuzzing arguments.
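Here is a minimal sketch of what such a test looks like. Normalize is just a stand-in for whatever function you actually want to fuzz:

```go
package content

import (
	"strings"
	"testing"
	"unicode/utf8"
)

// Normalize is a stand-in for the function under test.
func Normalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// FuzzNormalize shows both new methods: f.Add seeds the initial corpus,
// f.Fuzz registers the fuzz target that receives the generated arguments.
func FuzzNormalize(f *testing.F) {
	f.Add("Hello")    // seed corpus values
	f.Add(" Привет ") // the fuzzer mutates these

	f.Fuzz(func(t *testing.T, s string) {
		out := Normalize(s)
		// The property under test: normalization must not break UTF-8
		// validity if the input was valid UTF-8.
		if utf8.ValidString(s) && !utf8.ValidString(out) {
			t.Errorf("Normalize(%q) produced invalid UTF-8: %q", s, out)
		}
	})
}
```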

How do you run these tests? Use the go test command with the -fuzz flag and the name of the fuzz function, for example: go test -fuzz=FuzzTestFuncName.

The command has other flags:

  • -fuzztime – how long the fuzzer should run, or how many iterations it should perform;

  • -fuzzminimizetime – how much time or how many iterations to spend minimizing a value before it is written to the corpus;

  • -parallel – how many workers the fuzzer runs in parallel.

Code Coverage with Fuzzing

Once again: Go uses a coverage-guided fuzzer. Its trick is that when it runs the test with generated values, it tracks which parts of the code were reached. If a new argument touches a new code branch, the fuzzer remembers that value and puts it into the generated corpus. It then mutates that value further, trying to cover as much of the code as possible.

In this example, the function is initially called without seeds, since we did not add any values to the seed corpus. The fuzzer mutates an empty string by adding random values, and keeps doing so until it reaches the next condition. At that point the corpus already contains the value AAAA, followed by FAAA, FUAA, FUZA, until we reach a panic with the value FUZZ.
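Roughly, the example looks like this (a toy reconstruction, not the original slide code):

```go
package content

import "testing"

// process panics only on the exact input "FUZZ" – a bug hidden behind
// nested conditions. Each matched prefix opens a new code branch, the
// fuzzer stores the input that reached it in the generated corpus and
// keeps mutating it, which is how it works its way up to the panic.
func process(s string) {
	if len(s) == 4 && s[0] == 'F' {
		if s[1] == 'U' {
			if s[2] == 'Z' {
				if s[3] == 'Z' {
					panic("reached the magic input")
				}
			}
		}
	}
}

func FuzzProcess(f *testing.F) {
	// No f.Add calls on purpose: the fuzzer starts from an empty corpus.
	f.Fuzz(func(t *testing.T, s string) {
		process(s)
	})
}
```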

The fuzzer could also have arrived at FUZZ inside some huge string full of characters we don't need: periods, slashes and so on. This is where minimization comes in: it strips all the unnecessary bytes, and we end up with the minimal value that still triggers the panic.

Where is the corpus data?

The generated corpus lives in the Go build cache, under $GOCACHE/fuzz. The files there record the corpus format version and the value itself. There is also the seed corpus: its data sits inside your project, next to the fuzz test, in testdata/fuzz/ plus the name of your fuzz function.

Fuzz testing result

  1. Here you can see the baseline coverage records. To gather baseline coverage, fuzzing runs the tests with the seed corpus and the generated corpus, making sure there are no errors and establishing code coverage at the start of the run. We fill the seed corpus ourselves with the Add method, and the generated corpus fills up as the fuzzer works. The output also shows inputs that reached new lines of code.

  2. This line shows how many workers the fuzzer ran with – 10 in this case.

  3. This line shows how long fuzzing has been running, that in this time about 1,411,000 tests with various inputs were executed, and how many new seeds were recorded during these 3 seconds of fuzzing – 0 in our case, although the corpus already holds 8 seeds in total.

Fuzzing and http handler

Let's move on to examples. The first thing our team applied fuzzing to was an HTTP handler. Take this problem: a product card has a description field, and you cannot put more than three emoji into it.

To solve this, we wrote a handler that accepts a request containing a description field. We pass the description to the checkEmoji method, which extracts all the emoji, and then we check how many it found. If there are more than three, we return an error; otherwise we return OK.
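A simplified sketch of such a handler might look like this (the real emoji check is more thorough; the single rune-range check below is an approximation):

```go
package handler

import (
	"encoding/json"
	"net/http"
)

type cardRequest struct {
	Description string `json:"description"`
}

// checkEmoji counts emoji-like runes in the string. The real service uses
// a proper emoji table; this single-range check is a simplification.
func checkEmoji(s string) int {
	count := 0
	for _, r := range s {
		if r >= 0x1F300 && r <= 0x1FAFF { // rough emoji block
			count++
		}
	}
	return count
}

// UpdateDescription rejects descriptions that contain more than three emoji.
func UpdateDescription(w http.ResponseWriter, r *http.Request) {
	var req cardRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid json", http.StatusBadRequest)
		return
	}
	if checkEmoji(req.Description) > 3 {
		http.Error(w, "too many emoji in description", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}
```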

To test this functionality we can write two kinds of fuzz targets: irrational and rational. Let's look at both.

  1. Irrational fuzz target

Here we describe a fuzz target in which we ask the fuzzer to generate a byte slice for us. We then validate it: check that it is valid JSON, unmarshal it, and make sure description is not an empty string. Only then do we pass it to the handler under test.
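Schematically, the irrational target looks something like this (a sketch built around the simplified handler above):

```go
package handler

import (
	"bytes"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
)

func FuzzUpdateDescriptionIrrational(f *testing.F) {
	f.Fuzz(func(t *testing.T, body []byte) {
		// Filtering inside the target: most generated inputs are thrown
		// away here, so in practice the fuzzer explores json.Valid and
		// json.Unmarshal instead of the handler itself.
		if !json.Valid(body) {
			t.Skip()
		}
		var req cardRequest
		if err := json.Unmarshal(body, &req); err != nil || req.Description == "" {
			t.Skip()
		}

		r := httptest.NewRequest(http.MethodPost, "/description", bytes.NewReader(body))
		w := httptest.NewRecorder()
		UpdateDescription(w, r)

		if w.Code != http.StatusOK && w.Code != http.StatusBadRequest {
			t.Errorf("unexpected status %d for body %q", w.Code, body)
		}
	})
}
```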

What's the problem here? By validating, we skip test cases, and because of the Valid and Unmarshal calls we end up fuzzing them rather than the handler we actually want to test. All the seeds added to the generated corpus are chosen to explore the Valid and Unmarshal methods, not the handler function, so finding a bug takes much longer.

  2. Rational fuzz target

Here we ask the fuzzer to generate a string, immediately put it into the model, marshal it, and hand the ready-made JSON to the handler under test. This way the fuzzer works specifically on the handler, we find the bug quickly and happily go off to fix it.
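The rational variant, as a sketch in the same test file:

```go
func FuzzUpdateDescriptionRational(f *testing.F) {
	f.Add("plain text")
	f.Add("🔥🔥🔥🔥")

	f.Fuzz(func(t *testing.T, description string) {
		// Build valid JSON from the fuzzed string ourselves, so every
		// generated value actually exercises the handler.
		body, err := json.Marshal(cardRequest{Description: description})
		if err != nil {
			t.Fatalf("marshal: %v", err)
		}

		r := httptest.NewRequest(http.MethodPost, "/description", bytes.NewReader(body))
		w := httptest.NewRecorder()
		UpdateDescription(w, r)

		if w.Code != http.StatusOK && w.Code != http.StatusBadRequest {
			t.Errorf("unexpected status %d for description %q", w.Code, description)
		}
	})
}
```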

Comparing the two options, we see that fuzzing takes significantly longer with the irrational approach than with the rational one.

The rational approach produces fewer seeds, but they are more precisely aimed at finding an error in your code. The number of executed tests also differs significantly: with the rational approach there are fewer of them, so fuzzing such an implementation consumes fewer machine resources.

Fuzzing and unit testing

Let's move on to the next case: using fuzzing and unit testing side by side. This works when you need to derive your own random arguments from the fuzzing arguments in order to test a function. Here is the function we will be testing.

This is the Sum function. It takes a slice of numbers and returns their sum. A mistake is planted here deliberately: values divisible by 100,000 are ignored, so there is an error for the fuzzer to find.
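A sketch of the function with the planted bug:

```go
// Sum adds up the values in the slice. The check against 100_000 is the
// deliberately planted bug: multiples of 100 000 are silently skipped.
func Sum(values []int64) int64 {
	var total int64
	for _, v := range values {
		if v%100_000 == 0 {
			continue // the intentional mistake
		}
		total += v
	}
	return total
}
```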

Next, we describe a fuzz test in which we add 10 to the seed corpus so that the fuzzer starts generating from it. We also cap the length of the generated slice at 10, using the mod operation. We loop from 0 to this n, generating each value with rand.Int63, fill our test slice, calculate the sum we expect to get, and prepare the message that will report the error and the arguments that led to it.
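Put together, the fuzz test might look roughly like this (a sketch, not the exact code from the talk):

```go
package content

import (
	"math/rand"
	"testing"
)

func FuzzSum(f *testing.F) {
	f.Add(int64(10)) // seed value the fuzzer starts mutating from

	f.Fuzz(func(t *testing.T, seed int64) {
		// Derive the slice length (at most 10) and the values from the
		// fuzzed seed – these are our own "random" arguments.
		n := int(seed % 10)
		if n < 0 {
			n = -n
		}
		n++

		rnd := rand.New(rand.NewSource(seed))
		values := make([]int64, 0, n)
		var expected int64
		for i := 0; i < n; i++ {
			v := rnd.Int63()
			values = append(values, v)
			expected += v
		}

		if got := Sum(values); got != expected {
			// Printing the arguments lets us copy them into a unit test.
			t.Errorf("Sum(%v) = %d, expected %d", values, got, expected)
		}
	})
}
```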

Then we run it all, and the check reports that the actual and expected sums differ. In other words, we have found the error.

This case can be transferred to a unit test so that it keeps running later. If we simply ran fuzzing again, it would generate new values and we would not necessarily hit the same error.

As a result, we get the following algorithm:

  • generate test data;

  • output the “interesting” values – those that led to an error;

  • save the “interesting” values as unit test cases;

  • fix the code;

  • run the unit tests;

  • repeat from the beginning.

Even if there are no more errors, we start over again, because the fuzzer may still find something new.

We use this approach only for small functions. For large functions the check is misleading, because coverage guidance does not really work here and we cannot cover all the code with tests. In effect it becomes a black box rather than a gray box.

Generating content in a foreign language

Our users are hundreds of thousands of sellers from different CIS countries, and sometimes cards are filled in neither in Latin nor in Cyrillic script but, for example, in Armenian.

Such text could make our code throw an error, because we did not know in advance what the characters might be. Problems can also show up in the database or in the code, especially when normalizing strings to search for, say, prohibited words.

For such cases, we made the following fuzz test that generates text in a foreign language:

When a new country with a new language is added, we take the lower and upper bounds of its alphabet from the unicode package, plus a string length.

Next, using the mod operation, we derive the length of the string we want to generate.

After that we generate a string in the language we need: we go through the Unicode range, generate character codes and concatenate them together. Then we feed the result to the function under test.
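A sketch of such a generator for Armenian text; Normalize again stands in for the real function under test, and the character range is taken from the unicode package tables:

```go
package content

import (
	"math/rand"
	"strings"
	"testing"
	"unicode"
	"unicode/utf8"
)

func FuzzArmenianText(f *testing.F) {
	// Take the bounds of the main Armenian letter range from the unicode
	// package tables; the real test walks over the whole block.
	armenian := unicode.Armenian.R16[0]
	lo, hi := rune(armenian.Lo), rune(armenian.Hi)

	f.Add(int64(1))

	f.Fuzz(func(t *testing.T, seed int64) {
		// String length via mod, as described above.
		n := int(seed % 64)
		if n < 0 {
			n = -n
		}

		rnd := rand.New(rand.NewSource(seed))
		var b strings.Builder
		for i := 0; i < n; i++ {
			// Generate a character code inside the range and append it.
			b.WriteRune(lo + rune(rnd.Int63n(int64(hi-lo+1))))
		}

		s := b.String()
		// Normalize stands in for the real function under test.
		if out := Normalize(s); !utf8.ValidString(out) {
			t.Errorf("Normalize(%q) produced invalid UTF-8: %q", s, out)
		}
	})
}
```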

As a result, we get an error. In this example it was caused by the round symbol – the Armenian eternity sign – and after the test we fixed it.

Fuzzing and SQL

How to fuzz SQL queries? Let's look at a few examples.

Method for writing to a database table

For example, let's take the CreateTestRow function. It runs an SQL query that inserts a row into our foo table and returns the identifier of the newly created row. We ignore ErrNoRows and return any other error.
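Schematically, the method looks something like this (table and column names are illustrative):

```go
package repo

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

type Repo struct {
	db *sql.DB
}

// CreateTestRow inserts a row into the foo table and returns its id.
func (r *Repo) CreateTestRow(ctx context.Context, bar string) (int64, error) {
	const q = `INSERT INTO foo (bar) VALUES ($1) RETURNING id`

	var id int64
	err := r.db.QueryRowContext(ctx, q, bar).Scan(&id)
	if err != nil && !errors.Is(err, sql.ErrNoRows) {
		return 0, fmt.Errorf("create test row: %w", err)
	}
	return id, nil
}
```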

To check this, we wrote a fuzz test in which we initialize a test container.

Why a container? We first tried a virtual machine and ran into problems: when fuzzing stops, a kill signal is sent to the fuzzing process, and we would have had to catch that signal somehow. We never got that to work, so we could not clean the data out of the table. That is why we went with test containers: we start the container, the fuzzer writes its values there, the fuzz test finishes, we disconnect from the container, everything is wiped, and we can start again.

Once the test container is initialized and the initial migrations have been applied, we describe the fuzz target itself and ask it to generate a string for us. We pass that string to the method that creates a row in the database and run the test.
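A sketch of such a fuzz test, continuing the CreateTestRow example above; to keep it short, the connection string is taken from an environment variable here, while in our setup it comes from the test container:

```go
package repo

import (
	"context"
	"database/sql"
	"os"
	"testing"

	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" driver
)

func FuzzCreateTestRow(f *testing.F) {
	// In our setup the DSN points at a disposable database started by a
	// test container; an environment variable keeps the sketch short.
	dsn := os.Getenv("TEST_PG_DSN")
	if dsn == "" {
		f.Skip("TEST_PG_DSN is not set")
	}

	db, err := sql.Open("pgx", dsn)
	if err != nil {
		f.Fatalf("open db: %v", err)
	}
	f.Cleanup(func() { db.Close() })

	// Run the migrations that create the foo table here (omitted).

	repo := &Repo{db: db}

	f.Add("regular value")
	f.Fuzz(func(t *testing.T, bar string) {
		if _, err := repo.CreateTestRow(context.Background(), bar); err != nil {
			t.Errorf("CreateTestRow(%q): %v", bar, err)
		}
	})
}
```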

As a result we find an error: a certain character cannot be written to the database, and we have to fix that.

Reading from a database

Let's move on to the next example: reading from a database.

We have a query that fetches identifiers from the foo table by the bar value using the ILIKE operator. Again we ignore ErrNoRows, handle the timeout case, and return any other error along with the identifiers.
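A sketch of the vulnerable version, sitting next to CreateTestRow, with the query deliberately built by string concatenation:

```go
// FindTestRows looks up ids in the foo table whose bar matches the pattern.
// The query is built by string concatenation on purpose – this is the
// vulnerable version that fuzzing catches.
func (r *Repo) FindTestRows(ctx context.Context, bar string) ([]int64, error) {
	q := "SELECT id FROM foo WHERE bar ILIKE '%" + bar + "%'"

	rows, err := r.db.QueryContext(ctx, q)
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			return nil, nil
		}
		if errors.Is(err, context.DeadlineExceeded) {
			return nil, fmt.Errorf("find test rows: timeout: %w", err)
		}
		return nil, fmt.Errorf("find test rows: %w", err)
	}
	defer rows.Close()

	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			return nil, fmt.Errorf("scan row: %w", err)
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}
```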

We describe the test container initialization, run the migrations, describe the fuzz target and call our FindTestRows method inside it.

And then we find an error: an SQL injection. We need to fix it, for example by using placeholders in the query.
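The fixed variant, as a sketch, passes the pattern as a bind parameter:

```go
// FindTestRowsSafe is the fixed variant: the search pattern is passed as a
// bind parameter, so fuzzed input can no longer alter the query text.
func (r *Repo) FindTestRowsSafe(ctx context.Context, bar string) ([]int64, error) {
	const q = `SELECT id FROM foo WHERE bar ILIKE '%' || $1 || '%'`

	rows, err := r.db.QueryContext(ctx, q, bar)
	if err != nil {
		return nil, fmt.Errorf("find test rows: %w", err)
	}
	defer rows.Close()

	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			return nil, fmt.Errorf("scan row: %w", err)
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}
```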

What bugs could we find using fuzzing?

  1. Panic when working with indexes

This applies to methods that work with slices, Go channels, pointers or reflection. You can also catch division-by-zero errors, for example.

  2. Wrong status codes returned by handlers

For example, on an authorization error we returned a 500 instead of the 403 we should have. Fuzzing helped detect such problems.

  3. An unpleasant bug involving the characters й and ё

In the text validation method, specifically in the prohibited-word search, we need to catch banned words or phrases in the card's text content and show them to the seller, so they understand why their data was not saved to our table.

During the search we normalize the string, so the prohibited word we find is in normalized form. To show it to the client we need to map it back to its original, unnormalized form, and that is where the bug was hiding.

To illustrate: in a certain category, say electronics, the word “vape” is prohibited. Instead of “vape” we found and displayed something like “vein”, where one of the letters was actually a Latin look-alike. All of this was caught and repaired, and fuzzing is what helped us find such bugs.

How to use fuzzing for load testing

What do we want to get?

  1. Not break anything with a new release

When we make changes – add new fields, remove old ones, or change how existing fields are produced – nothing should break either for our integrators or for ourselves.

  2. A load test for us and our consumers

First, we need to load-test the new logic of our producer that writes to the broker; second, we need to know whether our consumers (especially new ones) can cope with the volume of data we push into the broker.

What decision did we make?

  • Initially we wrote a fuzz test.

  • Next, we described methods for generating product card events, using byte shuffling among other things, and built generation of events for categories, cards, sizes and characteristics.

  • Then we sent these events to a message broker – Kafka in our case (a sketch of the generator and producer is shown after this list).

  • We handed the broker connection details to our consumers so they could test their integration with us.

  • We built a bot that collected events from Grafana. If a service threw an error, the bot immediately posted an alert; we watched the bot, tagged whoever was responsible for that service, and they went off to fix it.

  • We set up monitoring of these alerts while fuzz tests run in the staging environment, and now we notify our consumers when we are launching integration tests. We wait about an hour after the launch; if there are no alerts, everything is fine and we can move on.

  • We configured resource monitoring. If any of our services or our consumers' services starts throttling or restarting because of memory pressure, we also get notified. This lets new consumer services test the load when integrating with us before going to production.
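A rough sketch of the event-generating fuzz test described above; the event schema, broker address and topic name are illustrative, and the Kafka client here is segmentio/kafka-go:

```go
package loadtest

import (
	"context"
	"encoding/json"
	"math/rand"
	"testing"

	"github.com/segmentio/kafka-go"
)

// cardEvent is a simplified stand-in for our real card event schema.
type cardEvent struct {
	CardID      int64  `json:"card_id"`
	Category    string `json:"category"`
	Description string `json:"description"`
}

// shuffleBytes mutates a template value – the "byte shuffle" mentioned above.
func shuffleBytes(rnd *rand.Rand, b []byte) []byte {
	out := append([]byte(nil), b...)
	rnd.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

// FuzzProduceCardEvents uses the fuzzer as a load generator: every generated
// argument becomes an event published to the staging broker.
func FuzzProduceCardEvents(f *testing.F) {
	writer := &kafka.Writer{
		Addr:  kafka.TCP("staging-kafka:9092"),
		Topic: "content.card-events",
	}
	f.Cleanup(func() { writer.Close() })

	f.Add(int64(1), "Test description")

	f.Fuzz(func(t *testing.T, seed int64, description string) {
		rnd := rand.New(rand.NewSource(seed))

		event := cardEvent{
			CardID:      rnd.Int63(),
			Category:    "electronics",
			Description: string(shuffleBytes(rnd, []byte(description))),
		}
		payload, err := json.Marshal(event)
		if err != nil {
			t.Fatalf("marshal: %v", err)
		}

		if err := writer.WriteMessages(context.Background(), kafka.Message{Value: payload}); err != nil {
			t.Errorf("write to kafka: %v", err)
		}
	})
}
```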

As a result, we get fewer bugs in production, fewer nerves burned during releases, and more confidence for new consumers going to production.

Conclusions

  • Fuzzing in Go is awesome. It can be used almost anywhere, but not everywhere.

  • Don't skip test cases inside the fuzz target because of the generated arguments – otherwise you end up fuzzing the validation code rather than the methods you actually need to test.

  • If you derive your own data from the fuzzing arguments, print the generated values whenever there is an error, so you can turn them into a unit test case and keep testing your code with them. This works well for small functions where you do not want to write the test cases yourself and prefer to trust the fuzzer.

  • When fuzzing SQL, use a test container so that fuzz values do not pile up in real tables. Fuzzing in Go is also good for testing methods that run SQL queries, for example to catch SQL injections.

  • Fuzzing can be used for load testing – that’s exactly what we did in our team.

Fuzzing is easy to use, yet it helps catch hard-to-reproduce and very unpleasant bugs. Feel free to share your own fuzzing cases in the comments)

The material was prepared based on my talk at Golang Conf.
