Why it is never a shame to redo everything

Where did we start building SOCs?

This story is somewhat personal. Many years ago, on one of my previous teams, where I was just an engineer, we started building SOCs for customers without really understanding how to do it correctly.
We used everything we had: international standards, common sense and the desire to “do well.” Looking back, the centers of today and the centers of that time are obviously very different things. But even the SOC we assembled over 1-2 years ended up strikingly different from what it was at the very beginning, from ideology to direct execution.
Several times in each project the framework itself was reworked: incident types, playbooks, analyst teams. Roles, classifications, and the very approach to investigating incidents changed.
From the start, we built spiral development into each project: a “Done? Redo it” loop for every link in the chain. That is, an integral part of any playbook was the step “Look back over all the steps and decide how to change them for further work.” In general, if you set up a playbook from the start as a set of specific steps, you can improve them iteratively at a comfortable pace. Changing the concept itself on the fly, however, is difficult and expensive (at least in terms of time) and, most importantly, brings no visible return to the project, not to mention that every step must be documented. You seem to have done something, but that version is never final, never the best, and sometimes it is not even certain that it works.
We were torn between helping the customer set up processes and closing the project with our obligations fulfilled, and these two goals rarely mean the same thing. Staying on schedule, keeping a team that juggles several projects from burning out, and still delivering a good result is a balance that is rarely achieved, especially the first time.
So at some point the result was declared final, although in fact everything was just beginning.

How all the cool ideas found their place

During development (or construction), several people can come up with countless good, innovative ideas; it is a pity that there are not always resources to test these hypotheses.
It is also a pity when the receiving party does not light up with these ideas, which is understandable when you work with a team that grabs at everything in sight in search of the ideal option. Something clicks when everything is done correctly: you look at the result of your work and don’t want to change anything, because everything is in its place and done well. We got the greatest return from working with visualization and performance metrics. When you start interacting with people from the SOC team, understand their pains and needs and take their place for a while, it becomes much easier to understand what exactly you would want to see at a given moment. Later, at Security Vision, this evolved into the idea of full-fledged personal workspaces for both the customer and individual analysts, with each element selected based on our own experience of building a SOC or working in one. That is, the tools do not come from nowhere and are not copied from decrees and orders; they are mostly experience.
Many things had to be postponed for lack of time, but some of them got their turn years later, for example our desire not to get hung up on a hard-coded playbook and to create something dynamic in real time 🙂

How many times have we redone everything

Then
Honestly, a lot. Or, more accurately, how many times have we started from scratch? As many times as we built it. All jokes aside, that is how it happened. Every time we came to a new customer and started to “do SOC,” we tried again to find that same “formula for success.” Today SOCs, including commercial ones and SOCaaS, are quite widespread and have acquired their own public and private standards, but back then, 8 years ago, it was difficult.
Did the experience of foreign colleagues help? Partially. But having worked for a while in an international company, I can say that processes vary greatly, from the choice of tools to the approach to building a workflow. All of this flows from the top down, from business to information security, and different needs require different implementations.

And now
Now our entire Security Vision team, of which I am a part, develops ready-made boxed solutions for internal processes. And we have lost count of how many times we have all worked through the hypothesis and concept of a single box together.
Sometimes it’s fun to compare before and after in retrospect, putting the process plan from the very beginning side by side with the final version already demonstrated to the customer. Most often the difference is large, and the changes affect many different parts of the product.
For example, we worked several times on how to implement response actions in the system in the most understandable way, and eventually arrived at an object-oriented approach. It resembles (but is not identical to) one that a foreign vendor had offered for several years, which did not take off here because of the specifics of how our analysts work.

Why do alterations occur?

Hypotheses
We are always coming up with something new. You always want to find a silver bullet for everything: unique, new, unlike anything else, and to be the first to do it. This applies to UI/UX problems, ordinary playbooks, workplaces, and the overall approach to both incidents and assets. Sometimes working out a hypothesis requires stepping into processes adjacent to information security and working with the IT department and the business.
Hypotheses arise both from experience and from related areas. For example, from BI we took the now traditional approach to planning and setting up dashboards, and from some open community tools we took the direction in which we began to build expertise.
Expertise
Analysts do not stand still. Technology makes it possible to automate more and more routine tasks, freeing up time for non-standard work. Then that non-standard work, too, has to be standardized and automated, again and again.
What previously looked like manual labor and a set of complex rules, classification for instance, is now implemented through dynamic execution, sometimes even using ML.
A higher level of expertise also raises the requirements for the product. For example, it used to be a good result simply to be able to write a connector in the system; then it was having a marketplace with a set of connectors; now it is a pre-configured library of integrations that work out of the box.
Changing world
Other vendors and specialists are not standing still either. Beyond technical approaches to incident management, new investigation best practices keep emerging: where to look for evidence, and how to work with TI to obtain a complete picture and the attacker’s markers in a seemingly simple story. Not only do new classes of solutions appear that require integration, but the logic and expertise embedded in them also open up more possibilities.
Trends
For example, working with network forensics and new tools.
New attacker methods for obtaining information, which are now classified differently than before.
Not to mention AI, which is used on both sides of the fence.

How to do it well?

The most important thing is to understand that you will never be able to do something perfectly the first time.
Optimization
Both the processes and the fine details of the solution get optimized, be it the operation of a specific connector or even the algorithm for filling in a data table on an incident card.
The chain of steps an analyst follows during an investigation should be reviewed each time in terms of how many attempts a person makes within one algorithm. Something turns out to be redundant, something is missing over and over again, and that is what needs to change: combine and cut whatever lets you reach the goal faster and more clearly.
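As a rough illustration of reviewing a chain of steps by attempts, the sketch below tallies how often each playbook step was repeated, skipped, or supplemented by an ad-hoc step across closed incidents. The step names, the log format and the `step_stats` helper are all hypothetical; real data would come from whatever incident system is in use.

```python
from collections import Counter

# Hypothetical playbook: the step names are made up for illustration.
PLAYBOOK = ["triage", "enrich", "contain", "report"]


def step_stats(incident_logs, playbook=PLAYBOOK):
    """Count in how many incidents each step was repeated, skipped,
    or performed although it is not part of the playbook."""
    repeated = Counter()
    skipped = Counter()
    extra = Counter()
    for steps in incident_logs:
        counts = Counter(steps)
        for step in playbook:
            if counts[step] > 1:
                repeated[step] += 1   # several attempts were needed
            elif counts[step] == 0:
                skipped[step] += 1    # never used here: possibly redundant
        for step in counts:
            if step not in playbook:
                extra[step] += 1      # done ad hoc: possibly missing
    return repeated, skipped, extra


# Hypothetical logs: the ordered steps each analyst actually executed.
logs = [
    ["triage", "enrich", "enrich", "ask_it", "contain", "report"],
    ["triage", "enrich", "enrich", "ask_it", "report"],
    ["triage", "contain", "report"],
]
repeated, skipped, extra = step_stats(logs)
print(repeated, skipped, extra)
```

Steps that show up often in `repeated` or `skipped` are candidates for merging or removal, and frequent entries in `extra` hint at a step the playbook lacks. The counts only point at where to look; the decision still belongs to a person.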
Hear the customer's needs
Sometimes it’s not worth redoing something that’s already established just because it’s not optimal. Habits can work more effectively than innovations even over a long period of time.
On the other hand, you should never reject new ideas, even if you personally think they are untenable or simply stupid. Once tested, a seemingly trivial wish to move a button or to set up communication differently can significantly speed up the work.
More automation
But only if it is more effective than manual labor over time.
You shouldn’t automate absolutely everything without a clear plan and a proven implementation algorithm. Sometimes automation of expert work is not possible (yet), and the time is better spent on something else.
Nevertheless, given the same trends, we can conclude that any habitual action of an analyst should turn into at least a button, rather than wandering through the labyrinths of another system.
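The condition that automation must beat manual labor over time can be made concrete with a rough break-even estimate. This is a minimal sketch: the function name, its parameters and the numbers are illustrative assumptions, not measurements from any project.

```python
def automation_break_even(build_hours, upkeep_hours_per_month,
                          manual_minutes_per_run, runs_per_month):
    """Return the number of months until automation pays off,
    or None if upkeep eats all the time it saves."""
    saved_per_month = manual_minutes_per_run * runs_per_month / 60
    net_per_month = saved_per_month - upkeep_hours_per_month
    if net_per_month <= 0:
        return None  # manual work stays cheaper indefinitely
    return build_hours / net_per_month


# Hypothetical action: 6 minutes by hand, 120 times a month,
# 40 hours to automate and 2 hours of upkeep per month.
months = automation_break_even(build_hours=40, upkeep_hours_per_month=2,
                               manual_minutes_per_run=6, runs_per_month=120)
print(months)
```

With these made-up inputs the automation pays for itself after four months. A rarely used action with high upkeep returns `None`, which is the case where the time is better spent elsewhere.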
Review the process regularly
And this is not only about a specific playbook, but about the incident management process as a whole.
How people communicate, how a manager accepts and evaluates work – all this also requires regular review, in addition to technical settings. Sometimes communications are much more important than the engine under the hood, and a clear reward system (hello, gamification!) motivates much better than dry numbers on a dashboard.
You cannot have a mechanism for analyzing incidents that is both good and fast if your processes live in an untouchable paper volume of printed flowcharts lying under a layer of dust.

Lessons learned (LL) in the context of an incident

In fact, this whole article could have been about this alone.
What is it, this last step, the final phase of the NIST incident response lifecycle, and for some the depressing need to return to the very beginning? Traditionally, reviewing mistakes is an analyst’s job; often an L3 analyst combines investigation with this difficult process.
We are working to delegate this process to automation (and there have even been attempts), but for now a person has to look at the results of their work: take, for example, several hundred closed incidents of the same type and decide at which point something was missing or superfluous, and what the investigation kept running into that cost it speed and substance. Some changes can be applied immediately, while others require time and a rework of the entire concept. Let's take a closer look:
Rules
This is probably the biggest problem and pain: correctly configuring the collection of events and their correlation. Optimizing them properly is a huge task that takes more than a year, and most SOCs use complex multi-level rules, with “if there was not” constructs in addition to “if there was,” and so on. It is not just a matter of handling false positives, although that matters too, but also of correctly applying the results of investigations: it may be useful to change the logic, approach and aggregation of rules, or to change something in how events are received from the source.
False positives are not the only possible problem with rules: they can also affect performance and the likelihood of getting the expected result. We lost count of how many times we rewrote the rules for our first SOC: adapting to the features of the infrastructure and to related processes in the company (for example, policy management), finding ways to get the same result faster, and the like.
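To make the “if there was not” construct more tangible, here is a minimal sketch of an absence rule: it flags a trigger event that is not followed by an expected event on the same host within a time window. The event fields, the event type names and the `find_missing_followups` function are hypothetical, and a real SIEM would express this in its own rule language over a stream rather than a list.

```python
from datetime import datetime, timedelta


def find_missing_followups(events, trigger, expected, window):
    """Return trigger events that have no `expected` event for the
    same host within `window` after them (an "if there was not" rule)."""
    alerts = []
    for ev in events:
        if ev["type"] != trigger:
            continue
        deadline = ev["time"] + window
        followed = any(
            other["type"] == expected
            and other["host"] == ev["host"]
            and ev["time"] <= other["time"] <= deadline
            for other in events
        )
        if not followed:
            alerts.append(ev)
    return alerts


# Hypothetical events: an admin login should be followed by a ticket.
t0 = datetime(2024, 1, 1, 12, 0)
events = [
    {"type": "admin_login", "host": "srv1", "time": t0},
    {"type": "ticket_opened", "host": "srv1",
     "time": t0 + timedelta(minutes=5)},
    {"type": "admin_login", "host": "srv2", "time": t0},
]
alerts = find_missing_followups(events, "admin_login", "ticket_opened",
                                timedelta(minutes=15))
print([a["host"] for a in alerts])
```

Only the login on `srv2` is flagged, since `srv1` has a matching ticket within the window. The nested scan is quadratic in the number of events, which is one small example of how rule logic can affect performance as well as accuracy.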
Infrastructure
This is a real nightmare for a SOC builder: you do everything close to ideal, make friends with the IT and IS teams, test all the integrations, and then find that the usual scheme scales with great difficulty once the customer goes through internal infrastructure changes. One information security system is replaced by another with a slightly different operating principle, network segments move in all directions, and the old buttons, approaches and communications all stop working.
On the other hand, it will never work without this, because the post-incident stage and hardening are precisely about changing this infrastructure, configuring it differently and protecting it more reliably.
After an incident has been investigated, especially a real one, the changes can be quite dramatic, up to the customer starting to develop its own detection solution.
Playbooks
Something that changes regularly.
At first on paper, when the real system was still far off and the company’s processes were just being built. Then both the logic of the work and the approach to description changed: sometimes these were UML diagrams with actors, sometimes we used BPMN notation (like some vendors), and sometimes just numbered text. An uninitiated person looking at such a document might think it was a strange role-playing gamebook: “if you get a virus, turn to page 211.” There was something of the world of games in it.
Afterwards, in the system itself, once all the necessary milestones were determined (which policies work, where the areas of competence lie), came the testing phase. Hypotheses were discarded, and deliberately untrained people were made to go through all the stages as an analyst would. That is how we learned how accessible the product is to a person without experience, because by the end of the tests our own eyes had lost their freshness.
When that was over, the final version was given to trained people, and we observed the analysts, what they looked for and where they looked, in order to adjust the visuals and make them more convenient.
And finally, after finishing the work, we reviewed the fruits of our labor, only to confirm that everything (or almost everything) had been done wrong.

Conclusions

There were many heated arguments along the way.
What the typical Russian SOC represents today is a technology stack plus the experience of many analyst teams who, through time and perseverance, have developed a certain standard for the incident management process.
Solutions and products try to adapt to this standard in order to catch up with that experience and open the way to something more. The main thing to remember is that redoing things is neither shameful nor sad; it is a very important step towards developing something truly useful.
