Saint HighLoad++ 2024. Traveler's notes

High loads in all their glory!

“It works? Don't touch it!” But not in HighLoad! Here you have to keep growing, changing and redoing everything. But how? With which practices? Or maybe “it'll do as is”? I went looking for answers at Saint HighLoad++.


The beginning of a short journey.

Midnight, the night express, conversations about life in English. I am lucky with my fellow travelers. This time the luck took the form of a foreign tourist who had visited St. Petersburg and Moscow and was heading back to the northern capital for his flight home.

The future space engineer, who is about to defend his master's thesis in Italy, talked about his trip, Italian food and his native Turkey.

Opening of the conference

Incendiary start.

Upon arrival, I checked into the hotel early in the morning and then headed to the venue, DESIGN DISTRICT DAA in St. Petersburg, where the two-day conference on high loads was to take place. The reports in the program looked varied and interesting. The AI theme was present too.

To keep the guests from getting bored before the opening, which was to take place in the huge “Tower” hall with a 48-meter ceiling, drummers came out on stage. The musicians beat out a lively rhythm, setting the pace for the entire event.

By this point I had already started on one of the conference's 3 main activities: meeting people.

Reports

What to choose?

After the opening, I had to decide which reports to attend. Since many tracks run at the same time, you have to pick whichever is most interesting to you at that moment. Only at the end of the conference did I figure out how to catch everything at once, thanks to the secret bunker. But more on that later.

Report No. 1. Running scheduled tasks on the backend

The scheduler is like a regular service that needs all the

The story went from simple to complex. You can start with cron on a single node. Then, as the number of tasks grows and geographic distribution increases, you move on to dedicated solutions such as Cluster Scheduled tasks, Quartz Clustering, Dkron and JobRunr. Have you come across any of these?

I have come across small, well-functioning cron scripts running on just one node. So I was curious why such large-scale solutions are needed and what pitfalls there might be. There turned out to be quite a few.

First, time zones and the switch between summer and winter time, when a task can run several times or not at all. For example, a task that calculates interest once an hour may produce incorrect results.
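To make this pitfall concrete, here is a minimal sketch of my own (it is not from the report). It assumes a hypothetical zone that is UTC+2 in summer and UTC+1 in winter, with the clocks going back at 01:00 UTC on 27 October 2024; the names `SWITCH_UTC`, `local_wall_clock` and `hourly_ticks` are made up for the illustration. The hourly triggers are computed on the UTC timeline and then labelled with the local wall-clock time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zone for illustration only: UTC+2 in summer, UTC+1 in winter,
# with clocks going back at 01:00 UTC on 27 October 2024.
SWITCH_UTC = datetime(2024, 10, 27, 1, 0, tzinfo=timezone.utc)


def local_wall_clock(utc_instant):
    """Return the naive local wall-clock time for a UTC instant."""
    hours = 2 if utc_instant < SWITCH_UTC else 1
    return (utc_instant + timedelta(hours=hours)).replace(tzinfo=None)


def hourly_ticks(start_utc, count):
    """Hourly trigger times computed on the UTC timeline."""
    return [start_utc + timedelta(hours=i) for i in range(count)]


ticks = hourly_ticks(datetime(2024, 10, 26, 23, 30, tzinfo=timezone.utc), 4)
labels = [local_wall_clock(t).strftime("%H:%M") for t in ticks]
print(labels)
```

The four UTC instants are all distinct, but the local label 02:30 appears twice. A job keyed on local time would therefore run twice that night (and would be skipped in spring, when the same hour does not exist), which is one argument for scheduling on UTC and converting only for display.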

Error handling. If a task fails, what should you do? Run it again? For stateless, idempotent operations you can. For stateful, run-once tasks you need to persist state in case the scheduler crashes, so that it can recover correctly and understand what has already been done and must not be repeated.

An example of such a task is ordering plastic cards for the employees of a bank's corporate clients. Such orders are built from incoming lists and passed to another system periodically. Ordering the same cards twice costs the bank extra money. If an order is missed, the client may be very upset.

Testing the scheduler itself, both on a single node and in a distributed setup, was also touched upon.

This report reminded me of a reflection on the topic “what is working code?”. Is it just code that runs in production? Or is there something more: the code itself, the tests for it, the architectural patterns used, the monitoring and the tools around it?
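One way to picture the stored state for a run-once task is the sketch below. It is my own assumption, not the speakers' implementation: an in-memory SQLite database stands in for whatever durable store a real scheduler would use, and the names `open_store`, `run_once` and the run key format are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the run journal; ':memory:' is used here only for the demo."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "run_key TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def run_once(conn, run_key, action):
    """Run `action` at most once per run_key, journaling progress."""
    row = conn.execute(
        "SELECT status FROM runs WHERE run_key = ?", (run_key,)
    ).fetchone()
    if row is not None:
        # 'done': nothing to do. 'started': an earlier attempt died midway,
        # so the outcome is unknown and a blind retry could order cards twice.
        return "skipped" if row[0] == "done" else "needs_review"
    conn.execute(
        "INSERT INTO runs (run_key, status) VALUES (?, 'started')", (run_key,)
    )
    conn.commit()
    action()
    conn.execute("UPDATE runs SET status = 'done' WHERE run_key = ?", (run_key,))
    conn.commit()
    return "executed"


orders = []
conn = open_store()
first = run_once(conn, "cards:2024-06-24", lambda: orders.append("batch"))
second = run_once(conn, "cards:2024-06-24", lambda: orders.append("batch"))
print(first, second, len(orders))
```

The journal entry is written before the action on purpose: after a crash the scheduler sees "started" and hands the case over for reconciliation instead of repeating a costly order. For a stateless, idempotent task this bookkeeping would be unnecessary and a plain retry would do.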

The elements of the scheduler were also covered:

  1. Instances

  2. Schedulers

  3. Tasks

  4. Triggers

The report helped me to draw the following conclusions:

  • The scheduler can be treated as a regular service, with all the standard pains of distributed operation and communication with other services.

  • Relying on an off-the-shelf solution with the attitude “It just works, and it’ll do just fine” is reasonable only after a good study of the specifics of the scheduler itself and consideration of the various ways it will be used.

My question after the report was about the transition from a scheduler on one node to a distributed solution. The recommendation I got was not to let things reach the point where you have to move suddenly and all at once. It is better to prepare the architecture gradually and debug the use cases at scale. In other words, cushion your fall in advance.

Report No. 2. Experience in converting a banking product to realtime

A story about the peculiarities of creating a project from scratch with strict deadlines. It was a combo report with 2 speakers – the product owner and the chief developer.

The product story as told:

Gathered requirements -> Assembled a team -> Evaluated technologies -> Chose an architecture -> Built an MVP -> Requirements changed 180 degrees -> Redid everything or almost everything -> Hit bugs -> Debugged -> Everything works

I told this story in under a minute; the speakers took 30 minutes. The work itself took 9 months: building from scratch, with no team at the start, no experience in choosing HighLoad solutions for realtime tasks of this kind, the need to satisfy all of the company's DevSecOps requirements, and integration into adjacent environments.

I noticed that the reports at the conference were delivered at a lively pace and mostly fit into the allotted time, so there was room for plenty of questions at the end.

How do you feel when a presentation drags on? Does it feel like being trapped, as if class time is over and your rightful break has already begun? Or, if you are really interested, can you listen for the full 50, 70 or 120 minutes?

In parallel with the general product story, the topic of assessing the applicability of possible technologies was covered:

  • Redis or Tarantool as in-memory storage under high load?

  • Unfamiliar Lua and Scala in addition to familiar Java. Is it worth going multi-language? Lua is an ecosystem of its own and has to be set up separately from the Java stack the bank actively uses.

  • Apache Flink. It offers the necessary scalability out of the box, and the system does need to scale up to 1 million rps. But the team lacks expertise in it.

They used Tarantool's crud module to search across shards in a cluster. At some point they realized that an algorithm relying on it was not working correctly. The analysis led them to the module itself. They contacted the developer, and it was fixed.

My question was about the load. Right now it is hundreds of rps. Load testing at hundreds of thousands of rps per service is being prepared in line with all the regulations. For Tarantool they cite 1 million rps.

The story is also interesting because both the management and the team were flexible enough to change current solutions on the fly to meet new requirements. Agile helped them with this. It seems that there is no place for rigid people in an active market.

Is your love for agile mutual?

Report No. 3. Redis – so simple and so complex!

Base, tuning, comparison with Dragonfly

Who would have thought that 300 lines of code would grow over time into a world-famous product! Redis started out as a fast cache, with data served at RAM speed.

Now, when talking about Redis, we can already talk about:

  • Fault Tolerance – Replication, Sentinel

  • Security – Access control, Data encryption

  • Administration – Configuration, Monitoring

And about typical usage scenarios: caching, session handling, distributed locking. You can even build a rate limiter!
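As an illustration of the rate limiter idea, here is a minimal fixed-window sketch of my own, not something shown in the report. It relies only on two commands, INCR and EXPIRE. To keep it runnable without a server, `FakeRedis` is a hypothetical in-memory stand-in; the `allow` function and the key format are assumptions as well. A real client object with `incr` and `expire` methods, such as the one in redis-py, could be passed in instead.

```python
class FakeRedis:
    """In-memory stand-in exposing only the two calls used below."""

    def __init__(self):
        self.counters = {}
        self.ttls = {}

    def incr(self, name):
        self.counters[name] = self.counters.get(name, 0) + 1
        return self.counters[name]

    def expire(self, name, time):
        self.ttls[name] = time  # recorded only; the fake never evicts keys
        return True


def allow(client, user_id, now, limit=3, window_seconds=60):
    """Fixed-window limiter: at most `limit` requests per window per user."""
    window = int(now // window_seconds)
    key = f"rate:{user_id}:{window}"
    count = client.incr(key)
    if count == 1:
        # First hit in this window: let the counter clean itself up later.
        client.expire(key, window_seconds)
    return count <= limit


client = FakeRedis()
results = [allow(client, "u1", now=1000.0) for _ in range(4)]
next_window = allow(client, "u1", now=1060.0)
print(results, next_window)
```

The fourth request inside the same window is rejected, and the first request of the next window is accepted again. Note that INCR and EXPIRE are two separate calls here; if the caller dies between them, the key is left without a TTL, so in practice the pair is usually wrapped in a transaction or a server-side script.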

You can deploy a single node, but that sounds a bit unreliable. A cluster with masters and replicas is better. Replication is asynchronous, and the cluster nodes exchange state over a gossip protocol.

Persistence can be provided either by an Append-Only File (AOF) or by RDB snapshots (Redis Database). Each method has its own trade-offs. Remember that Redis is single-threaded (!) when it comes to working with data. At least, it was until recently.
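The difference between the two approaches can be shown with a toy model. This is only my sketch of the general idea (a log of write commands versus a point-in-time dump) and says nothing about how Redis implements either one; `TinyStore` and the restore functions are hypothetical names.

```python
import json


class TinyStore:
    """Toy key-value store that keeps a log of every write command."""

    def __init__(self):
        self.data = {}
        self.aof = []  # append-only log, one JSON-encoded command per entry

    def set(self, key, value):
        self.data[key] = value
        self.aof.append(json.dumps(["SET", key, value]))

    def delete(self, key):
        self.data.pop(key, None)
        self.aof.append(json.dumps(["DEL", key]))

    def snapshot(self):
        """Point-in-time dump of the whole dataset."""
        return json.dumps(self.data)


def restore_from_aof(lines):
    """Rebuild the dataset by replaying the logged commands in order."""
    data = {}
    for line in lines:
        command = json.loads(line)
        if command[0] == "SET":
            data[command[1]] = command[2]
        elif command[0] == "DEL":
            data.pop(command[1], None)
    return data


def restore_from_snapshot(blob):
    """Load the dataset exactly as it was when the snapshot was taken."""
    return json.loads(blob)


store = TinyStore()
store.set("a", 1)
snap = store.snapshot()  # taken before the next two writes
store.set("b", 2)
store.delete("a")
print(restore_from_aof(store.aof), restore_from_snapshot(snap))
```

Replaying the log reproduces the latest state, while the snapshot restores only what existed when it was taken, so the writes made after it are lost. The price of the log is that it grows with every write and takes longer to replay.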

Closer to the middle of the report, the story turned to tuning: which OS parameters can be tweaked. Then came a comparison with the self-styled “Redis killer”, Dragonfly. At least, that is how someone wanted it to look. But the Redis developers ran their own tests and showed that it is too early to retire the old REmote DIctionary Server. Their results came out ahead.

I liked the report for its good overview of this database's main capabilities, the examples and the comparisons with alternatives. Perhaps this is the off-the-shelf HighLoad solution, the silver bullet: deploy it and everything works by itself?! However, the tuning needed for high loads and the specifics of ensuring persistence bring me back down to earth…

How have you used Redis? What loads did a single instance or a cluster handle?

Now let's play hide and seek…

Secret place

In one black, black room... No.  There is enough light here.  What is this?

The reports are the 2nd of the conference's 3 activities. At some point I realized that I wanted to take in 2 reports at the same time. The broadcast from the main hall is available online to everyone! So I can listen to it while sitting at another report in a different room or tent, and split my intake of information into 2 streams! Surely the brain can handle that?!

Already closer to the end of the conference, I remembered the words about a real bunker on the conference grounds. It turned out that it had equipment installed to broadcast all (well, almost) reports! Snacks, blankets, headphones – everything at your disposal!

Games, merch, discussions

Here everyone will find an activity to their liking.

This time I didn't pay much attention to the 3rd activity, collecting merch. I was more interested in the architecture challenges. I solved them at three companies' booths, earned bonuses, exchanged them for merch, and that was enough. For those who enjoy this part of the conference, there were plenty of tasks and contests. It's fun, after all. Sometimes there is an element of competition, when you need to put together a team and beat the other guys.

In my opinion, the conference is also interesting because during the day or at the after-party you can discuss with experts such topics as “What's better, Redis or Tarantool? Why did Kotlin take off? Will Go win or is it just hype?”. Which is exactly what we did :) By the way, what is your take on these questions?

Goodbye our affectionate…

Good weather is another pleasant bonus.

The city on the Neva received the guests warmly and good-naturedly. The conference organizers tried to create the best possible conditions for participants to stay on site, organize reports, meals, and various activities.

I think that if they had been guided by the principle “it'll do as is”, such ease and level of organization could not have been achieved. The same goes for the systems the speakers described: they talked through the challenges, details and pains they had to face, as well as the joy of each victory and the readiness to handle ever greater loads.

PS On my tm channel, dedicated to the architecture of high-load applications, I conduct architectural tutorials with colleagues, write relevant posts, and conduct System Design Interviews. If interested, welcome to system_design_world 🙂
