Harbingers of the storm in small and medium-sized businesses and in medium-sized government agencies

Part 2 – quality of personnel. For many years, schools and institutes have been teaching “the wrong thing.” The reasons are known (the article explaining why is linked at the very beginning of the text), and what to do if you really want to get into IT is also known.
In addition, as I have also written before, IT has been going archaic for many years: boomers / oldies / cheap grandfathers say “go read a book and do your homework” without specifying what exactly to read in a 1500-page book in English, only “we read it, so you read it too.” Or, vice versa, they ask “why did you do THIS according to the old manual, when there is a new one that says the exact opposite?” Snobbery and toxicity are in full bloom in the Russian-speaking community. Add to this the “westward” washout of everyone who picked up speed at the transition from junior to middle and left, and we get a sharp shortage of “middles who are ready to share their experience in Russian.” Instead, there are seniors who write strange things, and some technical writers who write in the style of “three hundred devices.”
As a result, as in a recent attempt on a corporate blog to explain that NVMe is cool, it turns out that mechanical 7200 rpm hard drives sometimes outperform NVMe SSDs. The author of the article shows no self-criticism and no understanding that, regardless of the interface used to connect a local disk, Storage I/O Control (SIOC) can be enabled in the cloud, and that replication, because of the weakness of the solution, can be much slower than the author himself thinks. The author's testing method in effect checks the cache size in the system.
And the same blog already had “something is wrong somewhere” articles back in 2021, but you do have to read them.
Example articles:
Everyone lies: the epic with NVMe servers and Hi-CPUs
Everyone lies-2: how the detective about slow NVMe and the inability to make RAID HDD, SSD or NVMe ended: what to choose for a virtual server (tests inside)

Similarly, a seemingly serious company, Yandex, writes an article about hybrid mail, but in terms that are not found anywhere except in that article, and without even attempting to bring the article into a uniform shape: it mixes GUI and PowerShell and states “we also have an API, but there will be no example of working with it.”
That is, there is a systemic problem with the quality of employees and candidates, and the phrase
9 out of 10 Russian employers are experiencing a dire shortage of highly qualified low-paid workers
can be continued as “but no matter how much money you give to the guys, they will still remain guys”

Juniors
There are many juniors. Some of them have already learned that you can measure the speed of the disk subsystem not only by copying files, but also by running CrystalDiskMark, or even ATTO Disk Benchmark (which has a problem with the size of the test file). Maybe they will soon master DISKSPD, and then fio and vbench will do, or even XRay, or they will even discover that performance counters exist not only in Windows, but also in the host systems.
All this despite the fact that recommendations on “how to do things better” were written long ago: in the article Meaningful load testing of the disk subsystem, in the article Testing block storage: nuances and features of practice, and in the review Why is hyperconvergence needed? Review and tests of Cisco HyperFlex (there, however, you need to scroll to the middle of the text).

Everything has been written, but few can repeat it and few want to. As a result, we get storage system testing in the form of “we connected two whole servers to the storage system, nothing crashed right away, so we can take it.” And this is testing from the vendor.
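The test-file-size problem mentioned above can be reduced to a quick sanity check before any benchmark is run. This is only a minimal sketch, not a method taken from the articles listed: the function name and the factor of 2 are my assumptions for illustration. It flags a test file that is too small compared to the cache sitting in front of the disks, in which case the benchmark mostly measures the cache.

```python
def benchmark_measures_cache(test_size_gib: float,
                             cache_size_gib: float,
                             min_ratio: float = 2.0) -> bool:
    """Return True if the test file is small enough to be served mostly from cache.

    min_ratio is an assumed rule of thumb (test file should be at least
    min_ratio times larger than the cache); tune it for your own system.
    """
    if test_size_gib <= 0:
        raise ValueError("test_size_gib must be positive")
    if cache_size_gib < 0:
        raise ValueError("cache_size_gib must not be negative")
    return test_size_gib < cache_size_gib * min_ratio


if __name__ == "__main__":
    # A 1 GiB test file against a 4 GiB controller cache: the result says
    # little about the disks themselves.
    print(benchmark_measures_cache(test_size_gib=1, cache_size_gib=4))
    # A 64 GiB test file against the same cache is a more honest test.
    print(benchmark_measures_cache(test_size_gib=64, cache_size_gib=4))
```

The check does not replace a proper test plan (block size, queue depth, read/write mix, duration); it only catches the most common mistake.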

Increasing complexity of systems
For the last week or two, two dramas have been unfolding in parallel in one Telegram group, clearly demonstrating the compound problem of inexperienced personnel, outdated equipment, outdated software, and the desire to save money in the wrong place by applying a simple solution.
People do not know that complex systems often have no simple solution, and so the problem described in the article The educational gap or a coupon for one help shows itself.

Outdated hardware, outdated software and lack of support
On March 5, 2024, VMware released updates for critical vulnerabilities (CVE-2024-22252):
For 6.5 – build 23084120
For 6.7 – build 23084122
But. These patches are available only to Extended Support users, which means it is impossible to download them “just like that”, and this makes it unsafe to operate any system below ESXi 7.0 P08 Update 3p (23307199) or ESXi 8.0 U2 P03 (23305546).
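To see quickly which hosts fall below these builds, the comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official check: the build numbers are the ones quoted in this text (not re-verified against the VMware advisory), the function and table names are made up, and the sketch knows only one patched build per release line, so hosts on other update branches need a manual look.

```python
# Minimum patched builds per release line, exactly as quoted in the text above.
# Assumption: within one release line a higher build number means a newer patch level.
PATCHED_BUILDS = {
    "6.5": 23084120,
    "6.7": 23084122,
    "7.0": 23307199,
    "8.0": 23305546,
}


def is_patched(version: str, build: int) -> bool:
    """Return True if the host build is at or above the patched build for its line."""
    release_line = ".".join(version.split(".")[:2])  # "7.0.3" -> "7.0"
    if release_line not in PATCHED_BUILDS:
        raise ValueError(f"no patched build known for ESXi {version}")
    return build >= PATCHED_BUILDS[release_line]


if __name__ == "__main__":
    # Hypothetical inventory: (host name, version, build).
    inventory = [
        ("esx-old-01", "6.7", 17700523),
        ("esx-new-01", "7.0.3", 23307199),
    ]
    for name, version, build in inventory:
        state = "ok" if is_patched(version, build) else "NEEDS PATCH"
        print(f"{name}: ESXi {version} build {build}: {state}")
```

The version and build of a host can be taken from the vSphere client or from the host itself; how the inventory is collected is left out on purpose.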

The size of this hole and the threat it poses are comparable to EternalBlue (CVE-2017-0144, from 2017) or to CVE-2023-2163. Back in 2017, many people did not bother with the EternalBlue patch either.

Plus, SUDDENLY, an old problem with Lighttpd has surfaced in Supermicro and Lenovo servers, especially on older models. You can read about it here: Intel and Lenovo servers impacted by 6-year-old BMC flaw. “According to Binarly Transparency Platform data, we see multiple products from Intel, Lenovo, and Supermicro are impacted,” Binarly told BleepingComputer. While I was writing this article, the problem was even covered on Kaspersky’s blog.

Given the generally indifferent attitude towards installing updates on time, you can get an excellent combo at any moment: failure of the entire system, plus encrypted data. You should update, BUT.
When moving from versions 6.5 and 6.7 to 7.0, two obstacles arise:
1. Lack of support for older processors, or rather, refusal to support them.
2. Refusal of vendors to support old peripherals. More precisely, in ESXi 7.0 it is no longer possible to install Linux drivers. Vendors of controllers such as LSI have not released drivers for ancient RAID controllers. This also applies to network cards, old FC adapters, and so on. Sometimes there are workarounds (read here), but it is better to read the documentation.
You should also renew the server fleet and read the HCL\HCG, BUT.
Upgrading to HPE\Dell\Lenovo\Supermicro\Huawei has become more expensive. Seriously more expensive, both in rubles and in dollars.
Upgrading to import-substitution hardware is even more expensive. And it is not only the price of the servers themselves and the warranty terms, but also the disgusting delivery times and conditions.

And so two dramas play out.
The first drama: to replace storage systems that are dying of old age, you can assemble SDS. But. Building either S2D or vSAN from consumer-grade components, on SSDs from the nearest DNS store because they are cheap, and building it on MS Server 2016 or ESXi 6.5\6.7, leads to the loss of the entire cluster together with its data at the first careless move or power failure. There are no backups, and there is no way to extract data from the resulting mess: modern SDS is not a classic RAID whose disks can be taken to a data recovery company. That is, you can take the disks there, but they still will not be able to reassemble the data, for a number of reasons. Make backups and verify them; what else can I recommend?
The second drama is that for some reason some people have forgotten how to read and how to plan the design they are building. Openly available materials on system design are plentiful, recommended settings are published, there is a working and not very toxic community, but no: with enviable tenacity, people still configure systems exactly the opposite of what is written.
Then they suffer, because they thought they could put it together from wood chips and acorns and get the same speed as on a single NVMe Optane, but instead they run into latency on an unmanaged 10G switch and do not get the performance they wanted. Again, it would seem simple: open the BASE in Russian to understand how data is distributed, and look at esxtop in terms of DAVG\KAVG following the manual, either in the DataLine blog or in the original, Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions) (1008205). But no, they immediately drag in CrystalDiskMark with a test size of a quarter of the cache and ask why it doesn’t work the way they want.
That is, the system of wood chips and acorns works like acorns and has the stability of wood chips.
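For those who do open esxtop, reading DAVG\KAVG comes down to simple arithmetic: GAVG (latency as seen by the guest) is the sum of DAVG (device, i.e. everything below the hypervisor: HBA, switch, array) and KAVG (time spent in the VMkernel). A minimal sketch of that reasoning follows. The function name is made up, and the thresholds of 25 ms for DAVG and 2 ms for KAVG are commonly quoted rules of thumb that I am assuming here; check them against the KB article for your version before relying on them.

```python
def classify_latency(davg_ms: float, kavg_ms: float,
                     davg_limit_ms: float = 25.0,
                     kavg_limit_ms: float = 2.0) -> dict:
    """Split guest-visible latency into device and kernel parts and name suspects.

    The default limits are assumed rule-of-thumb values, not authoritative ones.
    """
    if davg_ms < 0 or kavg_ms < 0:
        raise ValueError("latencies must not be negative")
    suspects = []
    if davg_ms > davg_limit_ms:
        # Slow below the hypervisor: array, fabric, switch, HBA.
        suspects.append("device/fabric")
    if kavg_ms > kavg_limit_ms:
        # Commands wait inside the VMkernel, typically because of queueing.
        suspects.append("kernel/queueing")
    return {"gavg_ms": davg_ms + kavg_ms, "suspects": suspects}


if __name__ == "__main__":
    # Hypothetical sample: high device latency, healthy kernel latency.
    print(classify_latency(davg_ms=30.0, kavg_ms=0.5))
```

A high DAVG with a low KAVG points away from the hypervisor and towards the hardware path, which is exactly where an unmanaged switch or home-grade SSD would show up.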

KII (critical information infrastructure) subjects and everyone else caught under the steamroller of blanket import substitution will face another surprise. It is not only the quality of the servers, not only the quality of the IT product running on them, and not only the quality of articles in the style of “an armful of firewood and the cluster is ready,” but also the quality of support. It is possible to take OpenStack | OpenNebula and fork it. And you can even sell it. And you can even send a person to install it, because the development organization cannot build an ISO installer; instead, a bunch of manual steps is required. 95% of VMware, yeah. Certainly. Under the motto “we couldn't even manage basic bash\ansible”. But. That is not the important part. The real problems are that:
1) The ideology and complexity of these products are designed for building a large infrastructure, not for 3 old servers.
2) To run them you need at least two mid-level Linux specialists, because the article zVirt Hosted Engine: deployment practice in pictures is painful to read, and trying to find the HCL through links from the comments on the developer’s website (for example https://www.orionsoft.ru/programs/server-kraftway) causes persistent bewilderment. For example, how should one understand this quote:

Version zVirt 4.0 is compatible with the following versions of Kraftway – Kraftway Trusted TS2000.

Which server components it is compatible with, which versions of microcode, drivers and host OS kernel: nothing. The section and the site are not just made “so that people leave us alone”; they are deliberately crooked. Everything is thrown into one pile: backup systems, Dallas, hardware, and guest OS. Not to mention that the site itself works incorrectly: the selection field defaults to zVirt 4.1, but the compatibility list for 4.0 is loaded.
By the way, 4.1 claims compatibility with Windows XP 64-bit Edition at the “guest tool support” level. Wondering what that means? What do the native KVM tools do? These questions, and not only these, have to be worked through first in a test segment, then in a pilot project, while fending off marketers and their opinions along the way. A quote from a similar discussion:

I indicated in the article that the figure of 95% is taken not from all the capabilities of VMware, but from the real needs of customers for the functionality of the platform

Hushing up problems
Take the MTS cloud segment that was down for a couple of days (“a cloudless sky above the whole of MTS”): it does not appear in any statistics. Whether data was lost in the end or not, MTS has not officially and openly confirmed anything. VK, when it went down, at least wrote something: what fell, how it fell and what came back up (VK services have been restored), and sooner or later, I hope, there will be an analysis of “what happened and how to prevent it from happening again.” VK are great on the technical side; they are one of the few remaining authors who still write something openly. But globally there are still no statistics or analytics on the real availability of clouds. In such conditions, moving to any cloud is a leap of faith… or it means keeping backups on your own premises and clogging the channel to the cloud all night.

Summarizing:
Personnel with the qualifications required for planning and migration are expensive, 30-50% more expensive than the customer is willing to pay.
Personnel from the next generation, the supposedly young ones, are not ready to work for the amounts offered to “beginning specialists”, because even a courier in Moscow can earn more here and now, and there are not that many qualified people.
It is no longer possible to avoid replacing the equipment, and replacing it is expensive. And there are issues with licensing.
For KII\IZ, replacement and migration are complicated by the lack of public expertise for assessing the pros and cons of the zoo (30+ secondary products); hiring your own expertise is not only expensive, but also pointless.
At the main integrators (Croc, Jet, Aiteko, Lanit, T1, IBS; the list goes on) and in the integration and IT departments of large customers (Greenatom, Sberbank (Sbertech), and other integrators among the subsidiaries and sub-subsidiaries of large customers), everything is fine: the expertise has been developed, the laboratories have been assembled, and there is something to talk about. And they write great articles.
For example:
12/19/2022 Load testing of storage systems and features of generating test data from Dell’s experience – about Dell EMC Power Max
02/27/2023 Testing block storage: nuances and features of practice
06/25/2023 World. Work. Maipu. Or how we tested the Chinese storage system (Maipu)
08/16/2023 Review of Infortrend GS 2024U – budget storage system with a claim to something more
10/10/2023 Seamless software upgrade for a data storage system: how to organize and improve (Yadro)

Only for small and medium businesses and for budget-funded organizations, hiring an integrator will cost an extra five hundred dollars and will be accompanied by cries of “too expensive, you are cutting me without a knife,” clutching at the heart, and other elements of bargaining. Already, some organizations in Moscow are looking for an “expert architect to carry out import substitution” for a salary slightly higher than a courier's. It’s scary to think what will happen to their budget committee when it sees a commercial proposal from an integrator.

What else can be done in such a situation?
The only option left is “try to wait it out, maybe it suddenly gets better.” But this is a game with risks. What if systems fall into cascading failure before things get better?

Alternatives?
You can listen to a good song about IT alternatives.

Summarizing.
Nothing particularly scary or terrible is happening “right now and for everyone at once”; the headline is wrong. There are some companies for which the quality of published articles and the content of the site are not important; their sales work through a different mechanism. Other questions, such as “it doesn’t work” or “where can we find a worker,” can still be solved with money. Moreover, with the same money as a year or two ago, but there is a nuance. You just need to (spoiler):
convert the ruble figure that those “crying” have in mind into USD at the rate of 65 rubles/USD, and then multiply by the current selling rate for cash USD at a savings bank. At the time of writing, the rate was 97.6 rubles/USD.
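The recalculation from the spoiler is plain arithmetic and can be written down in a few lines. This is a sketch only: the function name is made up, and the two default rates are the ones given in the text (65 and 97.6 rubles/USD); substitute the current rate when you try it.

```python
def adjust_for_rate(amount_rub: float,
                    old_rate: float = 65.0,
                    new_rate: float = 97.6) -> float:
    """Convert an old ruble figure to USD at old_rate, then back to rubles at new_rate."""
    if old_rate <= 0 or new_rate <= 0:
        raise ValueError("exchange rates must be positive")
    return amount_rub / old_rate * new_rate


if __name__ == "__main__":
    # Hypothetical example: an offer of 65,000 rubles from the 65 rubles/USD days.
    print(round(adjust_for_rate(65_000)))
```

In other words, an offer that was 65,000 rubles at 65 rubles/USD corresponds to roughly 97,600 rubles at 97.6 rubles/USD, an increase of about 50%.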
Those who cry have neither cash gaps nor a lack of budgets. But for now the situation is at the stage where it is already possible and necessary to show grief in public, while actually paying money is still out of the question. In any industry.
This is what the statistics show, quote:

At the same time, as Shirov reported, the share of labor compensation in the cost structure of enterprises over the past 10 years has not increased, but, on the contrary, has decreased: from 28% of output in 2013 to 22% in 2022. Which leads to the conclusion that enterprises, let’s say, are increasingly underpaying workers for their work.
Another indicator, which is given in the materials of the Institute of Economic Forecasting of the Russian Academy of Sciences, is the share of wages in the country’s GDP. In 2013 it was 46%, then increased to 48% in 2015, but after 2017 this share decreased, and in 2022 it was already about 39%.
The share of wages in the GDP structure has fallen to a record level
In 2022, the share of wages in GDP was 38.5%, and in 2023 – already 40.7%. Source

But that's a completely different story.