Blazor Interactive SSR Performance Study Report under Multiplayer Game Load

When I started my career about 15 years ago, I occasionally built websites in PHP, Perl, RoR, and ASP. But web standards changed so often that I burned out and moved entirely to backend and desktop development, where things stay more or less stable for years, if not decades.

However, with demand for cross-platform software growing, I increasingly think about returning to the Web. Since .NET is what I use most, I decided to take a closer look at Blazor, or more precisely, at its Interactive Server-Side Rendering (Interactive SSR) mode.

I quickly learned how it works, but one question remained unanswered: what about performance? What happens when many people press buttons at the same time? What load does that put on the servers?

Having failed to find detailed analyses in open sources, I spent my evenings over one week building a toy game as a Telegram Mini App, and on Friday, July 26, 2024, I gathered people from my own and friendly communities to study the load and any incidents. This article presents the collected data in detail.

Context: Rules of the game and solution architecture

To follow the results, you first need to know the rules of the game and how the solution is built.

The game is “Guess the number”, which many know as “Bulls and Cows”. In the original game, one player thinks of a number made of unique digits, and the second player must guess it. To help the guesser, after each attempt the player who set the number must say how many “cows” (correct digits in the wrong position) and “bulls” (correct digits in the correct position) the attempt contains.
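The scoring rule can be sketched in a few lines. This is a minimal illustration in Python, not the game's actual C# code; the function name score_guess is hypothetical, and the sketch assumes that the secret consists of unique digits and that both strings have the same length.

```python
def score_guess(secret: str, guess: str) -> tuple[int, int]:
    """Return (bulls, cows) for a guess against a secret made of unique digits."""
    if len(secret) != len(guess):
        raise ValueError("secret and guess must have the same length")
    # Bulls: the same digit in the same position.
    bulls = sum(s == g for s, g in zip(secret, guess))
    # Digits present in both numbers, regardless of position.
    common = len(set(secret) & set(guess))
    # Cows: correct digits that are not already counted as bulls.
    cows = common - bulls
    return bulls, cows


if __name__ == "__main__":
    print(score_guess("1234", "1243"))
```

In the interface described in this article, bulls would map to green lights and cows to yellow ones.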

My implementation differs from the original in that the server picks the number, and it is guessed not by one player but by all players at once. To simplify registration, the game was adapted to run as a Telegram Mini App, which gave me ready-made user IDs at minimal cost, without spending time and effort on my own registration system.

The game interface consists of a 9-digit keypad, 4 displays showing the entered number, 4 hint “lights” for the guess, and several displays with statistics. Each press of a keypad button enters one digit, and once four digits have been entered, the guess is checked and the result is displayed (“cows” as yellow lights, “bulls” as green).

Game interface, captured from the browser during development

The competitive aspect is that each round gives only 120 seconds to guess the number. The sooner a player guesses it, the more points they get. A player who has guessed waits for the round to end and for the server to pick a new number; those who have not guessed by the end of the round get no points and start over. To track the time, a timer fires every second and counts down; when it reaches zero, it requests a statistics update and resets the previously entered data.

Everything described above was implemented strictly in C# and, under Server-Side Rendering, is processed on the server. Since comparing the entered number with the secret one, processing statistics, and resolving conflicts in a multiplayer game can add load of their own, the project was split into two parts running on different servers:

Part / Server 1 (Front): an ASP.NET application with Blazor Interactive Server-Side Rendering enabled, which handles only interface changes and sends API requests to the second server.

Part / Server 2 (Back): an ASP.NET application with Quartz for periodically regenerating the hidden number, a Web API for processing requests from Blazor, and PostgreSQL for storing statistics. All game data is kept in RAM, and no SELECT queries are issued against the database, to keep things fast.
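The “state in RAM, write-only database” idea on the back server can be sketched roughly as follows. This is an illustration in Python, not the game's actual C# code: the class InMemoryGame, its method names, the list insert_log standing in for INSERT statements sent to PostgreSQL, and the digit set 1 to 9 are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class InMemoryGame:
    # Stand-in for INSERT-only writes to the database (hypothetical).
    insert_log: list = field(default_factory=list)
    # Live game state is kept in memory and never read back from the database.
    secret: str = ""
    round_no: int = 0

    def start_round(self, rng: random.Random) -> None:
        """Pick a new secret of 4 unique digits; a scheduler would call this every round."""
        self.secret = "".join(rng.sample("123456789", 4))
        self.round_no += 1
        self.insert_log.append(("code", self.round_no, self.secret))

    def record_attempt(self, user_id: int, guess: str) -> bool:
        """Persist the attempt with a single write and answer from memory."""
        self.insert_log.append(("attempt", self.round_no, user_id, guess))
        return guess == self.secret


if __name__ == "__main__":
    game = InMemoryGame()
    game.start_round(random.Random())
    print(game.record_attempt(user_id=1, guess="1234"))
```

The point of the design is that every request is answered from memory, while the database only receives appends for later statistics.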

Schematic representation of architecture

First look at the results

The game was played by 37 different players during the day. They played 445 rounds and made 7250 attempts to guess the number.

The game was launched at 8 am, and the number of simultaneous players grew until about 11 am. During that time, monitoring showed that the load on the servers was quite small:

Server 1 – (Front / Blazor Server-Side Rendering)

Server 2 – (Back/Quartz + WebApi + PostgreSQL)

Then, for almost the entire day, the load stayed at about the same level, except for two surges, which are analyzed below:

Server 1 – (Front / Blazor Server-Side Rendering)

Server 2 – (Back/Quartz + WebApi + PostgreSQL)

Analysis of the incident that happened at lunchtime

During the lunch break, my colleagues and I were quietly playing the game when it suddenly stopped offering new numbers to guess. My first thought was that the backend application that does the calculations had crashed.

Since I had not spent much time on development, I had accepted that such crashes might happen and planned to handle them by simply restarting the application. But imagine my surprise when I could not reach the server at all, neither via RDP nor via SSH. In the end the server had to be rebooted completely, after which the backend application started successfully and the game came back to life.

On the monitoring page, this situation on the backend looked like this:

The increase in load is visible in two places:

1 – Network activity rose to 100 kBps, which is not a large figure in itself, but it stands out from the overall picture.

2 – Disk load rose to 30 MBps, which raised questions. One could assume the front end was spamming requests, but a guess is sent from the front end only after four digits have been entered, and each guess is recorded with a single INSERT, so that much disk traffic would have to come with a much higher network load on the server.

Moreover, if the spam hypothesis were correct, the table of user attempts would contain many records at the moment the hang began. But in the database, the second-to-last attempt before the hang was at 12:51:11 and the last one at 12:53:19. That is not a serious load; it was like this almost the whole day:

Result of query attempts of all users since 12:50

The next suspect was Quartz: what if it went haywire and started firing more often than once every 2 minutes? Each time Quartz fires, a new number is generated and written to the database, and statistics for the round and for the hour are collected and recorded as well. But analysis of the database disproved this hypothesis too: the penultimate code was generated at 12:50:57 and the last one before the freeze at 12:52:57, exactly 2 minutes later. The next one came at 13:20:38, after the reboot:

Result of request for generated codes from 12:50

Thus, neither the game engine nor the DBMS appears to be the cause of the failure. Perhaps something went wrong at the operating-system level or in the cloud provider's infrastructure. The fact that the server was unreachable even via SSH is especially strange.

At this point, one could dust off one's hands and close the incident as unrelated to the application, but the CPU load and network graphs of the front server raised a question:

Here I saw what looked like a major strike against Blazor, the load on the CPU, but analyzing the situation made me change my mind.

I quickly realized that the high CPU and network consumption was caused by the countdown timer. Upon reaching 0 seconds, it requests current statistics from the backend and resets the countdown to 120 seconds only after receiving them. Since the timer fires every second, while the backend is down a new timer handler thread is created every second; each one sends a request to the backend, waits until the request times out, and consumes CPU time until then. With a timeout of about a minute, about 60 threads accumulate within a minute for each player (that is, for each browser page open with the Blazor application).
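The accumulation can be reproduced with a small deterministic simulation. This is a sketch in Python, not the game's C# timer code: the function simulate_in_flight and its parameters are hypothetical, the backend is assumed never to answer, and the guard flag models one possible mitigation (skipping a tick while a previous request is still pending), which is an assumption and not something the game implemented.

```python
def simulate_in_flight(seconds: int, timeout_s: int, guard: bool) -> int:
    """Return the peak number of simultaneously pending requests for one open page."""
    pending_deadlines: list[int] = []  # second at which each pending request times out
    peak = 0
    for now in range(seconds):
        # Requests whose timeout has expired are finished (with an error).
        pending_deadlines = [d for d in pending_deadlines if d > now]
        # The countdown is stuck at zero, so every tick wants to query the backend.
        # With the guard enabled, a tick is skipped while a request is still pending.
        if not (guard and pending_deadlines):
            pending_deadlines.append(now + timeout_s)
        peak = max(peak, len(pending_deadlines))
    return peak


if __name__ == "__main__":
    print("no guard:", simulate_in_flight(180, 60, guard=False))
    print("guarded: ", simulate_in_flight(180, 60, guard=True))
```

With a one-second tick and a 60-second timeout, the unguarded variant levels off at 60 pending requests per page, which matches the estimate above, while the guarded variant never has more than one.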

And here we could dust off our hands again and say, “So that's the reason!”, but another question arises: why didn't the timer stop once the players, seeing that the game was not working, had left it? Why weren't the Interactive SSR sessions shut down? I could not find the answer. Perhaps Blazor experts know how to handle such cases; I invite them to answer in the comments.

Stress Testing Analysis

At the end of the day, I concluded that the graphs looked too good. Analysis of the database showed that at no point during the day had more than 4 people played in the same round, which is not much of a test. So I put out a call in several communities and gathered 9 volunteers who, from 19:00 to 19:05, simply clicked the game's buttons at a fast pace without trying to guess the number.

The result was about 4500 requests in 5 minutes, or about 15 requests per second. Of course, this is nowhere near the scale of a service like Yandex, but my server capacity is not very large either. Here is what monitoring showed:
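For reference, the rate follows directly from the figures quoted above. The short Python calculation below only restates that arithmetic; the per-volunteer figure additionally assumes that the load came from the 9 volunteers alone and was spread evenly among them, which the collected data does not confirm.

```python
requests_total = 4500   # approximate number of requests during the stress test
duration_s = 5 * 60     # 19:00 to 19:05
volunteers = 9

# Overall throughput seen by the servers.
requests_per_second = requests_total / duration_s
# Hypothetical even split across the volunteers.
per_volunteer_per_second = requests_per_second / volunteers

print(f"{requests_per_second:.1f} requests/s overall")
print(f"{per_volunteer_per_second:.2f} requests/s per volunteer (if spread evenly)")
```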

Server 1 – (Front / Blazor Server-Side Rendering)

Server 2 – (Back/Quartz + WebApi + PostgreSQL)

The load spikes are visible, but with a front-server CPU peak of 3.5% and 60 kBps of network traffic, Blazor looks like a viable option for production workloads. Of course, there is always a “but”.

During stress testing, several players complained about the same problem: sometimes button presses did not register. It showed up on sequential presses: for example, a player pressed 1 2 3 4, but only 1 2 4 was entered (the 3 appeared not to have been pressed).

Talking to the players showed that this did not happen to everyone at the same moment. I also reproduced the problem myself after the stress test, when the load on the server was lower. From this I conclude that Blazor Interactive SSR (or rather its SignalR connection) cannot keep up with that many messages between the server and a single client, and drops some of them.

How much of a problem this is depends on your application. What happens if a nervous user clicks quickly many times, some checkbox does not get ticked, and the next button they press then does not work properly?

Besides the lost presses under rapid clicking, I found another downside. While I was riding a train, the game screen suddenly turned gray and a loading animation appeared:

Unpleasant gray screen when connection to server is lost

As it turns out, this is normal behavior when the browser loses its SignalR connection to the Blazor server. On the one hand, this is correct: without a connection the commands will not work, and the user should know about it. On the other hand, what if the user just wants to read the information on the page? The loading overlay will get in the way. Perhaps Blazor professionals know how to solve this; I invite you to write in the comments as well.

Let's sum up the results of the study

Based on this study, I put together the pros and cons so that other professionals choosing a technology for their projects can make an informed decision about whether it fits:

Pros:

  • Easy entry into the technology for a developer who knows C# / .NET. Back-end developers can take part in front-end development even if they do not understand the intricacies of JS/TS and the architecture of frameworks like AngularJS or Vue.js. On projects that do not need a particularly polished interface, you can most likely do without front-end specialists altogether (for small integrators, this can mean a real financial benefit);

  • It solves the problem of transferring complex objects between the browser and the server. I have had a lot of trouble passing objects with dozens of fields between a C#/Java backend and browser JS/TS, and I know this pain very well. The fact that Blazor lets you take a native C# class from the API and immediately bind it to data on the web page looks like a very significant feature;

Cons:

  • There is no lightweight middle ground between Interactive Server-Side Rendering and Client-Side Rendering (WebAssembly). As the game showed, doing a whole class of actions through server rendering (for example, counting down a timer, which is just subtracting one from the int variable timeToNextRound) is an extremely unhealthy approach. On the other hand, moving to client-side rendering makes the client's browser download a multi-megabyte runtime, which is also a bad idea (and there is no guarantee it will run well on weak devices). So some intermediate option seems to be missing. It would be great if Microsoft made some kind of C#-to-JS translator for methods with simple calculations and changes to primitives. Until that happens, scenarios with constant data changes that do not actually depend on the back server look deadly not only for front-server performance but also for the network;

  • Opaque SignalR debugging. From time to time during development something failed: buttons pressed in the browser appeared to work, but the breakpoint in Visual Studio was never hit, and it was unclear why. The “Network” panel of the browser developer tools showed no requests being sent or responses received either. Rebuilding and restarting the project usually fixed it, but it cost me time, which I definitely count as a negative;

  • Lost calls when handlers are triggered more than 2-3 times per second. This only happens when a user frantically presses buttons that have handlers behind them, which outside of games is unlikely, but it is definitely worth keeping in mind.

Personally, I have concluded that despite all the disadvantages listed, the technology has its place in the corporate world and in closed environments, and I will definitely recommend Blazor to most teams that write in C#. What conclusions do you draw? Share in the comments!
