Big retrospective of RBK.money's participation in The Standoff 2020

… or how hackers broke our open-source payment processing in a cyber city.

Hello! We recently took an active part, with the RBK.money processing, in The Standoff cyber range. This is an event where the organizers build a virtual model of an entire metropolis with all its infrastructure: power plants, shops, and other important elements. Then they let blue teams (6 this year) and red teams (29 this year) into this digital twin of the city. The former defend the infrastructure, and the latter actively try to break something.

(Image: a still from the movie “Blade Runner 2049”)

Of course, we brought our processing to the event; you can read about it in previous posts on our blog. The processing was part of the bank: it provided financial and payment services to residents of the cyber city, handled payment card issuing, and made it possible to sell goods and services.

In this post, I want to tell you how hackers broke us (spoiler alert: they didn’t), and how we shot ourselves in the foot a couple of times while preparing and deploying our solution. And yes, the main thing is that for us it was a win-win from the start. If they do not break us, then our confidence in the processing, which we have released as open source and now hand over to hackers, is justified. If they do break it, great: we will see where the weak spots were and make our clients even more secure.

We entered the cyber city in a noticeable hurry: this was one of our first deployments on Kubernetes (before that we deployed everything with Salt states), and we had to use new installation approaches. The consequences of this rush were not long in coming.

Tips for hackers

Even before rolling the processing out to the cyber city, we deliberately left two rather weak spots in it.

The first was a vulnerability in the card tokenization methods. They were susceptible to replay attacks: a card could be successfully tokenized in one store, and the resulting token could then be taken to another store and reused there. We timidly hoped that the vulnerability would be found, but alas.
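For readers who want to see what closing this kind of hole might look like, here is a minimal sketch that binds a token to the store it was issued for. It is not how our processing implements tokenization; the names `SERVER_KEY`, `issue_token`, and `verify_token` are hypothetical, and the sketch assumes that card references and shop IDs are opaque strings without colons.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real system would keep it in a key store.
SERVER_KEY = secrets.token_bytes(32)


def issue_token(card_ref: str, shop_id: str) -> str:
    """Return a token that is only valid for the shop it was issued to."""
    nonce = secrets.token_hex(8)
    payload = f"{card_ref}:{shop_id}:{nonce}"
    mac = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{mac}"


def verify_token(token: str, shop_id: str) -> bool:
    """Accept the token only if it is authentic and was issued to shop_id."""
    try:
        payload, mac = token.rsplit(":", 1)
        _card_ref, token_shop, _nonce = payload.split(":")
    except ValueError:
        return False  # malformed token
    expected = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # The MAC check rejects forged tokens; the shop check rejects cross-store reuse.
    return hmac.compare_digest(mac, expected) and token_shop == shop_id
```

With this check, a token issued for one shop is rejected when it is presented at another, which is exactly the reuse described above.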

The second was a simple account for the main merchant. We created only one oligarch merchant, a legal entity that owned the online stores in the cyber city. And this moneybag had fairly simple credentials: a password like Parolec0 could, in principle, have been guessed. But that did not work out either.

Our own blunders, however, did surface.

In a hurry, you can’t protect everything

An attentive reader will conclude from the point about the main merchant: wait a minute, you have a single oligarch who owns all the online stores, so it is enough to hack one of those stores to get access to the rest. Well, yes, in our hurry we did not think this through.

And indeed, after hacking this merchant, it was possible to get its API key for our processing and fully manage the processing with it. This is exactly what happened: the attackers hacked a third-party solution, an online entertainment store in the city, took our merchant’s API key from it, came to our processing with that key, and called a method that turns an online store on or off. And since one legal entity owned all the retail outlets in the city, they all went offline.

In principle, this is fair: if you are a big greedy oligarch who has grabbed everything for himself, suffer. We drew conclusions and promptly dispossessed the moneybag by creating 5 more independent legal entities, each with its own login/password pair and API key. I think that in the next competitions of this kind we will try to make the business side even closer to reality.

We also took a hit from the peculiarities of Kubernetes.

In Kubernetes, the main store of cluster state is etcd, a useful distributed system on which you can build very, very reliable things. But it is very sensitive to hard drive latency.

As I wrote, we deployed the processing in a virtual cyber city environment. The objects next to us were under quite active attack, and at one point one of these “noisy” objects, the infrastructure of one of the participants, which was being broken long and persistently, was moved onto our datastore. De facto we were not the target, and at that moment nobody was breaking the processing, which kept working normally. But the cluster itself began to slow down wildly, because the hard drives simply could not cope (we noticed that the output of ls -l took about 19 seconds). The result was a kind of DoS, and for one night we answered every request with our standard cats.
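A slowdown like this can be spotted before etcd starts to suffer by timing how long small synchronous writes take on the disk in question. Below is a rough sketch of such a probe; the function name `fsync_latencies`, the sample count, and the block size are my own assumptions, and a purpose-built benchmarking tool would give more trustworthy numbers.

```python
import os
import tempfile
import time


def fsync_latencies(directory: str, samples: int = 20, size: int = 4096) -> list[float]:
    """Write small blocks to a temp file in `directory` and time each fsync (seconds)."""
    block = b"\0" * size
    results = []
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(samples):
            f.write(block)
            f.flush()  # push Python's buffer to the OS
            start = time.perf_counter()
            os.fsync(f.fileno())  # force the OS to persist the data to disk
            results.append(time.perf_counter() - start)
    return results


if __name__ == "__main__":
    latencies = fsync_latencies(tempfile.gettempdir())
    print(f"worst fsync: {max(latencies) * 1000:.1f} ms")
```

Point `directory` at the volume that holds the etcd data. If the worst fsync climbs from milliseconds toward seconds, the disk is the likely culprit.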

After this, the organizers of The Standoff moved us to other disks: they shut down one virtual machine and started another in a different location. After that, our distributed DBMS happily caught a split-brain: half of the nodes held one set of data, the other half another, and they could not agree among themselves. In production, of course, we would have taken the migration more seriously and would not have allowed this. In a test environment it was much easier to tear down everything we had and reinstall it, which we did, spending 2 hours on it, by the way. I emphasize this because we deployed a full-fledged workflow with all the components in two hours, and you can do the same with our processing in production at your company. Classic processing usually takes about 3 months to deploy in a company.

So, about the split-brain: it was all the rush. In the heat of the moment we wiped /tmp entirely on one of the nodes. Who knew that the CSI LVM module, which hands out local volumes from the hardware to pods, covertly (!) mounts a Kubernetes persistent volume under /tmp. So with our own hands we wiped the data out from under the DBMS running on it. Moreover, even though we had wiped the storage for some of the nodes in the database clusters, everything kept working for us until the first restart of the virtual machine, which happened when they began moving us to the new storage.

The blue team is slowly turning off its defenses…

One day the blue team decided to turn off external protection (firewall, etc.). That is, for the first couple of days hackers tried to break systems with this protection enabled, and then without it. We also had a third-party WAF, which was loaded as a module into the nginx ingress and protected our traffic.

And then the day came: we went to turn off the WAF and realized that it was already off. And had been for two days. Because we are great and were in a hurry (yes, yes), we had set up the Kubernetes ingress with a WAF instance in it. Everything would have been fine, but the WAF simply could not reach its control server, could not download its license file from it, shrugged, and quietly switched itself off to be on the safe side. So for those two days of “breaking with protection enabled” we were in fact sitting without that protection.
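One cheap way to catch a protection layer that has silently switched itself off is to send it a request it ought to reject, then check that it really does. Here is a small sketch of that idea; the canary path, the set of "blocked" status codes, and the `send` callable are all assumptions for illustration, not the behavior of any particular WAF.

```python
from typing import Callable

# Hypothetical canary: a request that a WAF with default rules would be expected to reject.
CANARY_PATH = "/?q=%3Cscript%3Ealert(1)%3C/script%3E"


def waf_is_active(
    send: Callable[[str], int],
    blocked_statuses: frozenset[int] = frozenset({403, 406}),
) -> bool:
    """Return True if the canary request is answered with a 'blocked' status code.

    `send` takes a request path and returns the HTTP status code; injecting it
    keeps this check independent of any particular HTTP client.
    """
    return send(CANARY_PATH) in blocked_statuses


if __name__ == "__main__":
    # Stand-ins for a real HTTP call: a WAF that blocks, and one that fails open.
    print(waf_is_active(lambda path: 403))  # True
    print(waf_is_active(lambda path: 200))  # False
```

Run on a schedule and wired into monitoring, a check like this would have flagged the missing protection on day one rather than two days later.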

But still we were not broken.

Another confirmation that hurrying is harmful (in case the previous ones were not enough) is what happened with our antifraud. I described it in previous blog posts: it is a magic box with a set of rules. Antifraud protects against brute-forcing of bank cards, attempts to pay at one point of sale from different locations, IPs, or emails, and other unfriendly actions. We told the defense team that we would thoughtfully set up all these rules ourselves.

And we did: we carefully set up all the antifraud rules. On our RBK.money production server instead of the installation for The Standoff. We simply mixed up the UI URLs in the browser address bar. So the antifraud at the event was, all this time, a box with a silent, mysterious void.

This became one of the successful vectors for the red teams.
For example, they had previously hacked a third-party Internet bank, stealing a card’s PAN (Primary Account Number, the card number itself) and the cardholder’s name, and working out the expiry date. After that, already in our processing, they began brute-forcing CVVs for this PAN. And everything would have been fine: after 3 attempts the card would have been blocked by our antifraud. If only… see above.
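To illustrate the kind of rule that was missing, here is a minimal in-memory sketch of a guard that blocks a card after 3 failed CVV attempts. It is not our antifraud; the class and method names are hypothetical, and a real system would key on a hashed or tokenized PAN and keep the counters in shared storage with an expiry.

```python
MAX_FAILED_CVV = 3  # the limit mentioned in the text


class CvvAttemptGuard:
    """Count failed CVV checks per card and block the card once the limit is hit."""

    def __init__(self, limit: int = MAX_FAILED_CVV) -> None:
        self.limit = limit
        self._failed: dict[str, int] = {}

    def is_blocked(self, pan: str) -> bool:
        return self._failed.get(pan, 0) >= self.limit

    def register(self, pan: str, cvv_ok: bool) -> bool:
        """Record one attempt; return True if the payment may proceed."""
        if self.is_blocked(pan):
            return False  # blocked cards are refused even with the right CVV
        if cvv_ok:
            self._failed.pop(pan, None)  # a success resets the counter
            return True
        self._failed[pan] = self._failed.get(pan, 0) + 1
        return False
```

With a limit of 3, an attacker gets three guesses out of 1000 possible three-digit CVVs before the card is locked, instead of being free to walk through all of them.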

In general, there were many funny stories like this. At some point, our services stopped resolving and time on the nodes stopped synchronizing, and somewhat randomly, only from some hosts.

Of course, the first thing we did was blame a misconfiguration and the cluster’s incomprehensible behavior.

The DNS problem was quickly solved by moving the CoreDNS pods directly to the nodes where this traffic was not being firewalled. With NTP we were lucky: we did not catch a large clock skew, and the nodes had managed to synchronize when the cluster was created.

It turned out that at some point outgoing NTP and DNS requests had been blocked at the firewall level for some of the nodes. Apparently, someone tightened the filtering screws too much.

Other attacks

Attacks on other nearby cyber city targets were also successful at times, including targets that, like us, were part of the cyber city’s financial system.

It’s good that we did not mix up the URLs of the alerts on top of Elastic and in the monitoring, so we saw them accurately and quickly.

Take the case of the hacked oligarch and the stolen API key. It was hacked at 22:17 Moscow time. At 22:22 we noticed this on our side and immediately reported it to the team of defenders and the organizers. We noticed it, by the way, thanks to clumsy use of the API: the hackers passed a strange Content-Type header in their first request, called a rarely used method of our API, and there were some other nuances. Together these triggered the alert.

When the system works normally and automatically, all of this rarely coincides. If it does, it means someone is playing with the API by hand. That is what the alert caught.
If we are talking not about a cyber city but about real situations of this kind, everything happens even faster. In such cases the system automatically blocks the merchant’s actions so that nothing is withdrawn from their account and nothing harms their work and business, raises the alarm, and brings in live employees.
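
The idea of alerting on a coincidence of individually harmless oddities can be sketched roughly as follows. This is an illustration only, not our actual alerting rules: the field names, the method name `suspend_shop`, the expected content type, and the threshold of 2 signals are all assumptions.

```python
from dataclasses import dataclass

EXPECTED_CONTENT_TYPES = {"application/json"}  # assumption about normal clients
RARE_METHODS = {"suspend_shop"}  # hypothetical name for a rarely called method


@dataclass
class ApiRequest:
    merchant_id: str
    method: str
    content_type: str
    first_request_from_client: bool


def anomaly_signals(req: ApiRequest) -> list[str]:
    """Collect the individually weak signals that a request exhibits."""
    signals = []
    media_type = req.content_type.split(";")[0].strip().lower()
    if media_type not in EXPECTED_CONTENT_TYPES:
        signals.append("unusual content type")
    if req.method in RARE_METHODS:
        signals.append("rare API method")
    if req.first_request_from_client:
        signals.append("previously unseen client")
    return signals


def should_alert(req: ApiRequest, threshold: int = 2) -> bool:
    """Alert only when several weak signals coincide in one request."""
    return len(anomaly_signals(req)) >= threshold
```

A single signal on its own stays below the threshold, so a legitimate integration that occasionally calls a rare method does not wake anyone up; several at once look like a person poking at the API by hand.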

For the future

The cyber city hacking format is, no joke, the future of information security. We will definitely come again, take all the mistakes into account, and think through the infrastructure and business schemes so that they are as close to reality as possible. Ideally, I would like people from the Central Bank or State Services to come to the same conclusion: test their infrastructure in battle, give others a chance to find vulnerabilities, and become even better and more reliable for users and businesses.

And it was actually very cool and fun. In effect, we got a large set of Chaos Monkey cases that would not be reproduced in production, and we tested the processing against attacks, receiving more attacks in 2 days than we normally receive in six months.

We learned a lot, and most importantly, we saw that our product, which we write for ourselves and for you, is popular among the cyber city participants. For our IT team this was a strong motivation: after all, it is nice when people use the result of your work, even if, in this case, for very specific purposes.

We really liked it and want MORE.

In the next Standoff, expect us with even more interesting schemes, more payment functions, and new attack vectors!