Do you have Detection as Code in your SIEM? No? You will soon

Hi! My name is Kermen, and I'm a second-line SOC analyst. Our team examines data from Ozon's infrastructure and services to identify illegitimate activity: anything from violations of information security policies to targeted attacks.

We receive millions of events every minute — reviewing them manually and in real time is impossible. That’s why we automate some of our work with correlation rules — scheduled queries that alert us when a condition is met: an indicator of compromise has been found, a sequence of actions has occurred, or a threshold in the number of events has been reached.

Sometimes the information in the events themselves is not enough, so we enrich them with additional features using auxiliary objects: tables, macros, and machine learning models.

Together with the correlation rules, these objects make up the unique content of our SOC. Today they number in the hundreds, and keeping track of each one is hard: you need to know when a rule was created and last changed, who its author was, whether it was tested, and whether it is still relevant.

We had neither the process nor the tools to answer these questions. As a result, things descended into a chaos we couldn't control.

We wanted to put things in order, so we borrowed from developers' experience: we designed a new storage format, added more metadata, set up CI/CD, and now our objects have their own life cycle with stages for development, testing, promotion to production, review, and retirement.

This is how our framework for developing and managing content was born: catzone or, as we more often call it, kotozone. In this article, I'll tell you how we arrived at it, what it consists of, and how it now makes our lives easier.

By the way, our manager has already told the story of our tradition of naming SOC services after cats.

Living the old way was no longer an option

Remember the millions of events? Great! They are stored in our SIEM, a system that helps us collect, store, and process data. In most cases a SIEM is a proprietary solution where users play by the vendor's rules: getting changes made to the system is difficult or impossible.

When I joined as an intern, our SIEM configuration was already stored in a version control system, and any change had to go through code review.

In addition to writing the query itself, a content developer had to:

  • add the mandatory fields to the query with their own paws: the rule's metadata (unique identifier, status, type, criticality, and so on);

  • work around the limitations of the configuration files: escape newlines, copy parameters that never change but are still required;

  • resolve merge conflicts because the rules were stored in one large file;

  • create a playbook page on Confluence;

  • be extremely careful, because no one except the author and the reviewer would notice a mistake.

On top of that, we lacked metadata about our objects kept in one place rather than scattered across tickets, threads, and merge requests: who created an object, when and why, when it was last reviewed, when it was deemed unnecessary and under which ticket. We also wanted to store alert-trigger metrics (which a colleague of mine recently wrote about) in the rule itself, to monitor its quality.

Yes, our hearts demanded change! In theory, we could have asked for the SIEM to be improved to match our wishes, but there were two buts:

  • the inconvenience of working with configuration files would remain;

  • modifying a proprietary solution is expensive and scary.

That might seem to be the end of it, but you have to fight for your dream! A workaround? A workaround!

Living the new way

We looked at several open-source projects and were inspired by one idea: describe the rules in YAML and generate the configuration files from Jinja templates. This principle became the basis of our own solution.

All that remained was to decide what goes into the YAML files and how to store them, learn to generate the configs, set up a pipeline, and add a couple more automations. The plan sounded simple and reliable; all that was left was to do it ^^ Easier said than done!
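To make the idea more concrete, here is a minimal sketch of the YAML-plus-Jinja approach. It is not our real template or schema: the field names, the rendered stanza format, and the escaping rule are assumptions for illustration.

```python
# A minimal sketch of the YAML + Jinja idea.
# The field names and the rendered stanza format are assumptions, not our real schema.
import yaml
from jinja2 import Template

rule_yaml = """
name: suspicious_admin_login
version: 1
author: kermen
created: "2024-01-15"
status: test
criticality: medium
query: |
  index=auth action=login user=admin*
  | stats count by user, src
"""

# A hypothetical template for one saved-search stanza in the SIEM config.
config_template = Template("""\
[{{ rule.name }}]
search = {{ rule.query }}
disabled = {{ '0' if rule.status == 'prod' else '1' }}
# mandatory metadata that used to be typed in by hand:
action.meta.rule_id = {{ rule.name }}_v{{ rule.version }}
action.meta.criticality = {{ rule.criticality }}
""")

rule = yaml.safe_load(rule_yaml)
# escape newlines here instead of making the analyst do it in the config by hand
rule["query"] = rule["query"].strip().replace("\n", " \\\n  ")
print(config_template.render(rule=rule))
```

The same mechanism is what later lets the pipeline inject the mandatory fields and the never-changing parameters automatically instead of having them copy-pasted by hand.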

YAML is a user-friendly format, so there was no doubt about using it. Deciding on the structure was harder: what should be there now, and what might be useful in the future?

It took us about a week of thinking, after which we identified two groups of fields: fields mandatory for every object, and fields unique to a particular object type.

The following fields have become mandatory:

  • object name;

  • version;

  • author;

  • dates of creation, last revision and, where applicable, closure;

  • description;

  • tags: history, related ticket, status.

Some of them are used for further analysis:

  • we count how many objects we have created and how many we have closed;

  • we send a notification when it is time to review an object (see the sketch after this list);

  • we initiate promotion to production after evaluating the quality of the rule's work and its testing period.
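As an illustration of how that metadata can be put to work, here is a minimal housekeeping sketch. The field names (`created`, `last_review`, `closed`) and the review interval are assumptions, not our real schema.

```python
# A sketch of metadata-driven housekeeping over the rule YAMLs.
# Field names and the review interval are assumptions for illustration.
from datetime import date, timedelta
from pathlib import Path
import yaml

REVIEW_EVERY = timedelta(days=180)  # hypothetical review interval

created = closed = 0
needs_review = []

for path in Path("rules").rglob("*.yaml"):
    obj = yaml.safe_load(path.read_text())
    created += 1
    if obj.get("closed"):
        closed += 1
        continue
    last_seen = date.fromisoformat(str(obj.get("last_review") or obj["created"]))
    if date.today() - last_seen > REVIEW_EVERY:
        needs_review.append(obj["name"])

print(f"created: {created}, closed: {closed}")
print("time to review:", ", ".join(needs_review) or "nothing")
```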

We have many object types, and it is impossible to cover every field in one article, so let's look at some of the unique fields using detections as an example.

We never spelled this out explicitly, but we noticed that our saved queries fall into three groups:

  • detections – queries that will turn into alerts;

  • baselines – queries that prepare data for alerts, reports, and dashboards;

  • reports – queries that produce a data export or a report.

Detections have a mode field that defines how they operate:

  • observer – a watch-only mode that the first line does not need to respond to. These are often rules with low criticality and a large share of legitimate activity;

  • responder – a mode in which we expect the first line to respond. Yes, alerts need to be triaged!

Several observer rules can feed into one responder rule: this lets us look for a chain of events and reduce noise. Such multi-stage rules have their own type, which we call meta. There are also the TTP, anomaly, and mitigation types.

Of course, we also add the Kill Chain phase, the tactic, technique, and procedure from the MITRE ATT&CK matrix, and the criticality.

We also have confidence: an indicator of how sound the logic and the implementation of the rule's query are. It is an integer value, set by analyzing how the rule's alerts fire and the tags the first line assigns to them. We try to revise rules that turn out to be noisy.
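We don't publish the exact formula here, so the snippet below is only one hypothetical way such an integer could be derived from first-line triage tags; the tag names and thresholds are made up.

```python
# One hypothetical way to turn first-line triage tags into an integer confidence value.
# The tag names and the bucket thresholds are assumptions, not our real scoring.
def confidence(tags: list[str]) -> int:
    verdicts = [t for t in tags if t in ("true_positive", "false_positive")]
    if not verdicts:
        return 1  # a brand-new rule with no feedback yet
    precision = verdicts.count("true_positive") / len(verdicts)
    if precision >= 0.9:
        return 3  # alerts are almost always legitimate findings
    if precision >= 0.5:
        return 2  # usable, but the query is worth reviewing
    return 1      # noisy: a candidate for revision or exceptions

print(confidence(["true_positive", "false_positive", "true_positive"]))  # -> 2
```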

That said, this indicator may itself be revisited, because we are moving towards a new idea of ours: dynamic exceptions.

Each object is stored in its own YAML file in its own directory (sometimes a subdirectory). This is convenient because it lets us manage code owners more flexibly: one of the department's teams now maintains its own rule exceptions independently.

Speaking of permissions: the content is used not only by the Information Security Department but also by other colleagues. We are not ready to expose everything to them, though, so we list who is granted access right in the object's YAML. This is more convenient than before: previously, creating a table, for example, meant touching settings in several different configs, and now everything lives in one file.

A pipeline? Is that something about pen, pineapple, apple, pen?

The repository is created and we have example YAMLs, so it's time to put everything together in three stages.

Before the merge request is created

There are two ways to create a new YAML:

  • if the rule already exists, you can ease the migration to the new format with a parser that produces a ready-made YAML in which you only need to fill in the empty fields (a sketch of the idea follows this list);

  • if you need to create a rule from scratch, you can start from a ready-made template.
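The legacy config format is specific to our SIEM and isn't shown in this article, so the parser sketch below simply assumes an INI-style saved-search file; the section and option names are illustrative.

```python
# A sketch of the migration parser: legacy INI-style saved searches -> YAML skeletons.
# The legacy format and option names are assumptions, not our real SIEM config.
import configparser
from pathlib import Path
import yaml

legacy = configparser.ConfigParser()
legacy.read("savedsearches_legacy.conf")

out_dir = Path("detections")
out_dir.mkdir(exist_ok=True)

for name in legacy.sections():
    section = legacy[name]
    skeleton = {
        "name": name,
        "version": 1,
        "author": "",       # left empty: to be filled in by the analyst
        "created": "",      # left empty: to be filled in by the analyst
        "status": "test",
        "description": "",
        "query": section.get("search", ""),
    }
    (out_dir / f"{name}.yaml").write_text(
        yaml.safe_dump(skeleton, sort_keys=False, allow_unicode=True)
    )
```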

Once the file is ready, it needs to be committed, and here a pre-commit hook comes to the rescue to validate fields and values. For example, it checks that a rule of the report type lives in the report directory, that dates match the expected format, and that the status field contains one of only two values: prod or test.

Each object type has its own validation, and it has saved me more than once from situations where I would be ashamed to look the reviewer in the eye. Checking at this stage is a way to help the author avoid careless mistakes.
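Here is a minimal sketch of the kind of checks the hook runs. The directory layout, field names, and date format are assumptions; the real validation is per object type and richer.

```python
# A sketch of pre-commit validation for rule YAMLs (layout and field names are assumptions).
import re
import sys
from pathlib import Path
import yaml

ALLOWED_STATUS = {"prod", "test"}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def validate(path: Path) -> list[str]:
    obj = yaml.safe_load(path.read_text())
    errors = []
    # a rule of the "report" type must live under the report/ directory
    if obj.get("type") == "report" and path.parts[0] != "report":
        errors.append(f"{path}: report rules must be stored under report/")
    if not DATE_RE.match(str(obj.get("created", ""))):
        errors.append(f"{path}: 'created' must look like YYYY-MM-DD")
    if obj.get("status") not in ALLOWED_STATUS:
        errors.append(f"{path}: 'status' must be 'prod' or 'test'")
    return errors

if __name__ == "__main__":
    # pre-commit passes the staged file paths as arguments
    problems = [e for arg in sys.argv[1:] for e in validate(Path(arg))]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)  # a non-zero exit code blocks the commit
```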

After the merge request is opened

When the author has done everything they can and is ready to present their creation, they open a merge request. Honestly, code review is a sore subject for us because it eats up the lion's share of an analyst's working time. We are still looking for a solution! For now, we try to reduce the load with additional automatic checks.

Standardization

Some fields are best left to automation, so at this stage the created or modified YAMLs are normalized: the object identifier is added, along with the identifiers of the macros and tables used in the query, the modification date is updated, and the query itself is formatted to our code style.
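A rough sketch of what that normalization might look like is below. The backtick convention for macros, the ID scheme, and the "formatting" step are placeholders, not our actual standardizer.

```python
# A sketch of the standardization step. The macro convention, ID scheme,
# and "formatting" are placeholders for what the real job does.
import re
import uuid
from datetime import date
from pathlib import Path
import yaml

MACRO_RE = re.compile(r"`([a-z0-9_]+)`")  # assume macros are referenced in backticks

def standardize(path: Path) -> None:
    obj = yaml.safe_load(path.read_text())
    # add a stable identifier if the object does not have one yet
    obj.setdefault("id", str(uuid.uuid4()))
    # record which macros the query uses, so dependencies are visible in the metadata
    obj["macros"] = sorted(set(MACRO_RE.findall(obj.get("query", ""))))
    # refresh the modification date
    obj["modified"] = date.today().isoformat()
    # stand-in for the real query formatter: strip stray whitespace
    obj["query"] = obj.get("query", "").strip() + "\n"
    path.write_text(yaml.safe_dump(obj, sort_keys=False, allow_unicode=True))

for p in Path("detections").rglob("*.yaml"):
    standardize(p)
```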

Earlier we also tried to calculate a rule's criticality automatically from the existing fields, but that approach caused more arguments than expert judgement did. You have to give something up in life, so subjectivity it is.

Checking that the configs build correctly

Even if the YAML is clumsy, the query is wrong, and the reviewer is in too good a mood, it is still impossible to break the configs. At this stage we run a check that the final file is well-formed and that nothing will break when the changes are merged. If the check fails, the changes cannot be merged.
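In spirit, the check is simply "render everything and make sure the result still parses". The sketch below assumes the rendered output is an INI-style file, which may differ from our real SIEM format.

```python
# A sketch of the "does the final config still build?" gate.
# Assumes the rendered output is INI-style; the real SIEM format may differ.
import configparser
import sys
from pathlib import Path

def builds_cleanly(rendered: Path) -> bool:
    parser = configparser.ConfigParser(strict=True)  # duplicate stanzas raise an error
    try:
        parser.read_string(rendered.read_text())
    except configparser.Error as exc:
        print(f"{rendered}: {exc}")
        return False
    return True

results = [builds_cleanly(f) for f in Path("build").glob("*.conf")]
if not all(results):
    sys.exit(1)  # a failing check blocks the merge
```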

After the merge request is merged

Creating configs

Remember our pain from the very beginning? Yes, I'm talking about the mandatory fields in the query! Now they are tags in the YAML, so there is no need to add them to the query by hand anymore. We still need them when sending alerts to the IRP, but now the fields are silently and seamlessly inserted into the query when the configuration files are generated (much like in the Jinja sketch earlier).

This is also where we substitute the parameters that have never changed; I suspect half the team doesn't even know they exist. Is that good or bad?

After the configs are generated and the tables and models are moved to the right directories, the correctness check runs again. Only if it passes do we move on!

Release

Just in case, we create releases and keep them: you never know what might happen! They also help us track our progress and count how many updates we ship.

Deploy

We save the built content bundle as pipeline artifacts and then deploy it to the main repository. From there, the catzone content reaches the SIEM.

Our pains and mistakes

We had a hard time maintaining discipline: since everything was in the hands of one person at the initial stage, changes were made quickly and haphazardly, and the documentation was updated late.

At the same time, testing was minimal, which also led to last-minute fixes.

We have since learned to release new versions of the framework so that they are tested, the changes are documented, and the team is familiar with them. But the first testers of kotozone suffered a lot!

So we must not forget three important principles:

  • test, test and test again;

  • documentation saves you from a flood of questions;

  • don't trust the user, even if they are your colleague: someone will definitely not read the documentation.

We look boldly to the future

We've been living with kotozone for a year now; he has settled in and become one of the SOC's pets. He keeps growing and improving; what more could you dream of?

Okay, I'm not being entirely honest: we dream of many things:

  • migrate data sources, processes and rule exceptions to the new format;

  • add validation and testing as a full stage of the pipeline;

  • create a linter and much, much more.

The framework is now supported by engineers, so we can believe in our bright future! All that's left is to ping them and make eyes like the cat from Shrek.

That's how the story turned out. I hope our experience was useful in some way and gave you a few ideas of your own. Don't forget to pull changes from the main branch, stay fashionable and youthful, and follow the kotozone trends! See you!
