The Yandex source leak as the biggest boost for Russian IT


I’ll skip the long preamble and get straight to the point. Hello, I’m mobilz, and I once “leaked” some sources myself, Yandex’s among them, after warning them in advance, of course. I have nothing to do with the current events, but I have thoughts I want to share.

First, it’s a disaster. This is not the first leak, but probably the largest. If this happened to my projects, I would sit in the corner, hugging my knees, and cry for a long time.
Second, this is the best thing that will happen to Russian IT this year. We will not see growth like this year’s again for a long time.


Undoubtedly, Yandex, as a commercial company, will lose profit in the short term. A lot of time will go into fixing bottlenecks, processes, and architecture. But in the long run everyone will benefit, including Yandex itself, which will gain valuable personnel. Setting sentiment aside, Yandex databases (delivery, taxi) have leaked more than once, and Yandex got off with a fine. But now everything has changed.

Prerequisites.

Monorepositories are very convenient. Over the past 5 years we have seen many teams switch to monorepos, and Yandex was once one of the pioneers of this approach. However, it can also backfire badly.

The database leaks I mentioned above were wake-up calls. The problem is impunity: a fine of 60,000 rubles for the leaked personal data of Yandex.Food customers is proof of this. This does not mean that the developers are stupid or do not care, just that there is no accountability.

The political situation in the country. Segalovich, eternal memory, and now Volozh and the top managers who left Russia: this is “negative growth”. Of course, Kudrin is not to blame for this. To destroy what has been built over decades, he would have to try very hard. But the date the data was last modified hints that not everyone agrees. You can argue at length that McDonald’s did not leave but sold its business, that Ikea earned billions from us and fled with its tail between its legs, that the auto giants are just trying to spite us and everyone around us is a fool. However, big IT, and not only IT, is “growing negatively” in the country, alas.

Who did this?

Two main theories, as heard by word of mouth:

  1. A hack made possible by the prerequisites (the section above).

  2. A “rat” on the inside who had access and simply leaked it.

I personally do not believe in the second option. I could be wrong, but the reasons are:

  • January 26, 11 months later? Why not right away? Why not after Bucha? And why not after September 21 or 30? Where were you for 8 years?

  • A lot is missing from the leak. In some places entire directories, in others specific files. This looks less like a “rat” and more like calculation.

  • The monorepo architecture allows an outsider to do this under certain conditions.

  • The unpreparedness for this, to me personally, points to how exceptional the situation was. Obviously, Yandex had protection against such internal leaks, but it ran into something else (remember the chaos in closed Telegram chats on January 26).

I’m not saying it was definitely a hack. No one is immune to an employee who slowly, systematically, and quietly, so that no one noticed, siphoned off the sources and then, sipping a Long Island in Cuba, decided which of them to publish and which not.

What for?

This is the shortest section. For me, the answer is in mtime, the last modification time of the files. You can argue for a long time, you can say it is just a smokescreen, and so on. The fact is that this could have happened earlier. The fact is that it did not.
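For anyone who wants to look at modification times themselves, here is a minimal sketch in Python. The function name `mtime_histogram` and the idea of grouping files by calendar day are my own assumptions for illustration, not something taken from the leaked archives. It walks a directory tree and counts files by the UTC date of their mtime. Keep in mind that mtime is ordinary metadata that anyone can set to any value, so a histogram like this is a hint, not proof.

```python
import os
import sys
from collections import Counter
from datetime import datetime, timezone


def mtime_histogram(root):
    """Count files under `root` by the UTC calendar date of their mtime.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: read the entry itself, do not follow symlinks
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # unreadable or vanished entry, skip it
            day = datetime.fromtimestamp(mtime, tz=timezone.utc).date()
            counts[day] += 1
    return counts


if __name__ == "__main__":
    # Usage: python mtime_hist.py <directory>
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for day, n in sorted(mtime_histogram(root).items()):
        print(day.isoformat(), n)
```

If nearly all files in a tree share one date, that says the timestamps were produced (or set) in one go; it does not say by whom or why.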

What’s good?

I glanced through the sources. The first thing I noted for myself is that I need to brush up on Go, Python and, of course, C++. Returning to the title of the article: this looks like a gold mine of best practices for C++/Python/Go/TypeScript/Docker and so on, at the very least. In README.MD files I often saw something like this: “Ready-made libs are either in C, where writing from scratch is cheaper than making a wrapper for C++, or in C++, where people don’t understand how to make architecturally sound libs.” In React apps, dependencies are minimal. Where I expected to see reanimated or react-navigation, there is an in-house implementation. This does not mean it is better or cooler, just that the “guys” work in isolation. You don’t have to look far for an example: ClickHouse, which was opened to the public and has earned its place at major companies such as Cloudflare and Bloomberg, as well as at lesser-known but no less ambitious B2C companies in advertising.

In the sources we can find super simple scripts, such as JSON validation, base64/pbf encode/decode/script_functions, archiving functions, geodata handling, and so on, as well as quite complex data analysis algorithms.
There are plenty of great C/C++ projects with sources on GitHub, but where else can you find the whole spectrum, from simple to complex? On top of that, a lot of the code is documented. It is one thing to learn C++ from ClickHouse, where the team thinks over every line, trying to optimize it and avoid an unnecessary memcpy. It is another to start small, with something like URL validation. The modern world is overflowing with Skillboxes and Yandex.Practicums, which often impose “their” technologies and dirty templates. BEM, for instance, I did not often encounter in these sources.

Obviously, the “Yandex” sources attract attention. Even if it turns out that many of the projects were already publicly available, which they are, it is the “leaked” sources that will interest many people. They will be studied. Given that Yandex was and still is the largest and highest-quality IT team in the country, everyone who got the sources and tries to measure up to them will most likely come out ahead.

Obviously, many of the sources will interest pentesters as well. In our country, the law on illegal access to computer information has long been interpreted loosely. Previously, Department “K” could break into your house and seize your monitor (it has all the information on it, right?). Today, imperfect laws are compensated for very simply: if you are bold, you are either with us, or we will find something to put you away for; nobody has repealed Article 228. The article is “presidential”, so you will get the maximum. This is also why we see reports that “Russian” hackers are mostly caught abroad. Infosec doesn’t count.
This does not mean that Yandex can rub its hands and wait for new employees plus free security analysis; of course not. Obviously, Yandex will pay a lot of money for a pentest with access to the source code. But it is no accident that they announced “increased rewards” for “bug hunting”. It is strange how short the window is. I predict it will be extended. And everyone will win. The developers themselves will say “thank you” if vulnerabilities are found in their code.

“Grown-up” companies have received a great example of how to do things. Yandex’s infrastructure, logic, and implementation have always seemed like something “magical”. Now we can all see for ourselves that these were not fairy tales; that is how things really are. Everything is structured: documentation for each project, a code style for each project, and almost every document has a separate item: “if something is not clear, do not be shy, ask.” Internal CRM, Arcadia, wikis, communities, and chats. An excellent occasion to look at how you “personally” run things. There is someone to look up to.

What’s wrong with that?

In addition to the obvious commercial losses, Yandex has suffered reputational losses. Not that this is new to them, but the source code of the site crawler and of the anti-robot captcha opens the door to further commercial losses, at the very least.

Disgruntled market partners who believe they are being penalized: here is the source code, go look. You are not first in the search results: here are the sources (yes, the robot simply collects data and the analysis happens later, but access to the robot is still there). Hidden functions and settings of Yandex Navigator: here they are, read the code. A taxi driver is not given the order right next to him: well, your speed is above the limit (as far as I understand, something above 55 km/h) or below the limit, i.e. you are standing still… Obviously, the exposed internal algorithms will hurt Yandex, and many will want to take advantage of them, some a little earlier, some a little later. Yandex would do well to look for new metrics for its algorithms, but that will not happen quickly. Nor will the architecture be rebuilt quickly. In other words, Yandex will obviously have to spend a lot of unplanned money to compensate for this leak. And, obviously, this will affect us, the users of Yandex (Taxi, Food, Search, Kinopoisk, auto.ru, and so on).

And let’s not forget about new database leaks. If we have seen them before, now they will become routine. Unless, of course, the geniuses at Yandex quickly solve this problem. Alas, I doubt that Kudrin is a “genius from Yandex.” He is a genius, but definitely not when it comes to Yandex and a modern IT company.

Yuri Synodov (if you don’t know him, he is the “Dud” of IT journalism: webplaneta, roem, etc.) said that the charm of the few geniuses who left with Volozh Sr. is that they will do in a week what Yandex employees have spent decades doing. Of course this is not the case, and of course Yuri meant new projects built on the old ones. Recreating the infrastructure of today’s Yandex will definitely not work in a week, and it will not work in half a year either. Even the “geniuses” who left with Volozh will spend a lot of time understanding what happened and how to avoid it in the future, instead of building further. And they will have to think about it too, because they are also involved in it.

Instead of an afterword.

Many of Yandex’s internal projects carry the MIT license “by default” in their sources; this is stated explicitly in the metafiles. Other sources are not marked in any way, which, as I read it, also allows them to be copied, used for commercial purposes, modified, and so on. No matter how much I personally “disliked” Yandex (to me they have long been an evil corporation), I sincerely send good vibes to all the employees who remain in the Russian Federation. You are very cool and you have done great work; what you have created and maintain is a national treasure. Even from a cursory look, I am convinced that you are great professionals. Be patient and never give up. I repeat, this leak is relevant only “here and now”; it is up to you to make it “irrelevant” half a year from now. Do not curse the white hats for pentesting based on your sources. It just happened this way. And thank you very much for documenting the code.

This is not your fault.
