How we migrate to the cloud with a change of hypervisor and without downtime

There is a myth that cloud migration is a simple procedure: you just move systems from one platform to another. But in complex cases, where a large number of services must be moved with minimal downtime, things are far less rosy. And if the move also involves a change of hypervisor, even more so. For a long time this was a serious limitation for us and for our clients, until a tool appeared that makes such migrations quick and easy. More on it later.

Moving from one cloud platform to another is a common situation. Companies often change providers because they are not satisfied with the quality of services: either the real parameters do not correspond to those stated in the contract, or failures begin, and technical support does not respond for several days. Another common reason is a change in pricing policy.

The migration usually follows the same scenario: the company holds a tender, selects a provider based on the range of services, specifications and prices, and then starts moving. This is the stage where many technical problems arise.

What you have to face

It is good if the target cloud uses the same virtualization platform as the source. When migrating from VMware to VMware, it is enough to take a snapshot and upload it to the new cloud. This is simple and, as a rule, the transfer goes without surprises. For the system itself, nothing changes except the IP addresses and the hardware on which virtualization is running.
But when switching from VMware to the KVM hypervisor, the task becomes more complicated, because KVM uses a different virtual machine format. To overcome the incompatibility, you have to dump the disks and convert them. Along the way, you also have to install drivers in the virtual machine that allow it to start on the new hypervisor.

As practice shows, these difficulties can be overcome, but for a long time moving from one hypervisor to another meant a lot of work for engineers and a headache for customers. Such a migration cannot be completed in a few days, and if the infrastructure is large, it can take more than a week. To maintain data consistency under such conditions, you have to accept a long downtime.
And yet even a few days without a working cloud infrastructure means huge financial and reputational losses. A week of downtime doesn't just sound scary; for many clients it is a disaster.

Therefore, ideally, migration should take place as quickly as possible and, if possible, without stopping services. One option is Carbonite Migrate, but we weighed the pros and cons and found a more suitable solution for cloud-to-cloud migration: Hystax Acura.

How Hystax Acura Works: Theory

This tool works with VMware, KVM and Hyper-V, supports migration from physical servers, and is inexpensive. We therefore use it for automated migration to the CROC Cloud, as well as for backup and disaster recovery of customer infrastructure. In all these cases, the architecture and operating principle of Hystax Acura are the same.
On one side there is the source environment: a source cloud or a physical server where the user's infrastructure is running. On the other side is the CROC Cloud. The task is to transfer data from source to target.

One of three replication agents is installed in the source environment.
Two of them are for Linux and Windows respectively. They are installed inside a virtual machine and run as operating-system services. The third is designed specifically for migration from VMware. It is deployed at the ESXi host level and replicates all the virtual machines on that host at once.
Replication agents send data to the target cloud at the block level. Replication runs in the background, so there is no need to stop virtual machines in the source cloud.

The Hystax Acura receiver service receives the data and writes it, block by block, to a newly created volume in the CROC Cloud. At the end of each replication, we take a snapshot. These snapshots form a chain of restore points. Data consistency for Windows machines is provided by the standard VSS (Volume Shadow Copy Service). For the Linux agent, the Hystax developers wrote their own proprietary driver that replicates the VSS functionality. For VMware, the CBT (Changed Block Tracking) API is used.
Once a snapshot has been created, all that remains is to issue a command to start the replicated virtual machine in its new location.
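To make the idea of a restore point chain more concrete, here is a minimal sketch in Python. It is not Acura's actual storage format, which the article does not describe: the `RestorePoint` class, the `materialize` function and the tiny 4-byte block size are all hypothetical and chosen only for illustration. The sketch assumes that the first restore point contains every block and that each later one stores only the blocks written since the previous snapshot.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical, tiny on purpose; real volumes use far larger blocks


@dataclass
class RestorePoint:
    """One snapshot in the chain: the blocks written since the previous one."""
    name: str
    changed_blocks: dict[int, bytes] = field(default_factory=dict)


def materialize(chain: list[RestorePoint], upto: str, num_blocks: int) -> bytes:
    """Rebuild the volume contents as of the restore point called `upto`."""
    # Start from an all-zero volume and replay the chain in order.
    volume = [bytes(BLOCK_SIZE)] * num_blocks
    for point in chain:
        for index, data in point.changed_blocks.items():
            volume[index] = data
        if point.name == upto:
            return b"".join(volume)
    raise KeyError(f"restore point {upto!r} is not in the chain")


# Assumed example data: one full replication followed by two incremental ones.
chain = [
    RestorePoint("initial", {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}),
    RestorePoint("incr-1", {1: b"bbbb"}),
    RestorePoint("incr-2", {2: b"cccc"}),
]
```

With this layout, starting a machine from any restore point only requires replaying the chain up to that point, which is why later replications can stay small.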

Acura performs an automatic V2V conversion on the target cloud just before the machine starts. This is one of the key steps that make it possible for the machine to start after a change of hypervisor.

Acura injects new drivers into the virtual machine, removes old ones if necessary, and adjusts the bootloader so that the virtual machine starts on KVM.
The duration of this process depends on the size of the machine. V2V conversion takes minutes and runs in parallel when several virtual machines are launched at the same time.

A group of virtual machines is launched according to a predefined DR plan. It includes descriptions of the virtual machines, network settings, and orchestration, that is, the correct start order for interdependent services.
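The orchestration part of such a plan boils down to a dependency-ordering problem. The sketch below shows one way to derive a start order with Python's standard `graphlib` module. The plan structure and its field names (`devices`, `flavor`, `depends_on`, `subnets`) are assumptions made for illustration and are not the real Hystax Acura DR plan schema.

```python
from graphlib import TopologicalSorter

# Hypothetical DR plan: field names are illustrative, not Acura's actual schema.
dr_plan = {
    "subnets": {"main": {"cidr": "10.0.0.0/24"}},
    "devices": {
        "web": {"flavor": "small", "depends_on": ["app"]},
        "app": {"flavor": "medium", "depends_on": ["db"]},
        "db": {"flavor": "large", "depends_on": []},
    },
}


def launch_order(plan: dict) -> list[str]:
    """Return machine names so that every machine follows its dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {
        name: set(spec.get("depends_on", []))
        for name, spec in plan["devices"].items()
    }
    # Raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


print(launch_order(dr_plan))
```

A circular dependency in the plan surfaces as a `graphlib.CycleError` before anything is launched, which is a convenient sanity check during test migrations.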

Hystax Acura limitations

This approach allows you to migrate to CROC Cloud from almost any technology, but Hystax Acura also has limitations.

For example, in the case of Linux, whether replication and P2V/V2V conversion work at all depends on the kernel version.

The list of supported kernels is fairly long, but if the required kernel is not on it, you will have to wait until Hystax prepares custom drivers. In addition, replication agents are not yet able to migrate Windows systems deployed entirely on dynamic disks.

Migration with Hystax Acura in practice

Hystax Acura makes migration to a new cloud noticeably easier. So much so that some cloud providers invite customers to carry out the move themselves.
In principle, the graphical interface of this solution is simple enough to figure out without much preparation, but we still do everything ourselves and, as a rule, through the API. We develop a migration or DR plan in advance, deploy replication agents, and conduct test migrations.

Of course, no transfer is instant, but the bottleneck is no longer the work of porting virtual machines from one hypervisor to another, which used to take unpredictably long; it is now the network bandwidth.
The most time-consuming data transfer happens during the first replication, but it is a predictable process, and the time required can be estimated with a simple calculator.
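Such a calculator can be as simple as dividing the data volume by the usable bandwidth. The function below is a rough sketch rather than the tool we actually use: the name `initial_replication_hours` and the `efficiency` factor (the share of the link that replication can realistically use) are assumptions, and decimal units are used throughout (1 GB = 8,000 megabits).

```python
def initial_replication_hours(data_gb: float, link_mbps: float,
                              efficiency: float = 0.8) -> float:
    """Estimate the duration of the first full replication, in hours.

    data_gb    -- amount of data to transfer, in gigabytes (decimal)
    link_mbps  -- nominal link speed, in megabits per second
    efficiency -- assumed usable share of the link, between 0 and 1
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input values")
    data_megabits = data_gb * 8 * 1000        # gigabytes -> megabits
    seconds = data_megabits / (link_mbps * efficiency)
    return seconds / 3600


# Assumed example: 500 GB over a 100 Mbit/s link at 80% efficiency.
print(f"{initial_replication_hours(500, 100):.1f} h")  # prints "13.9 h"
```

The estimate ignores compression and the rate at which data changes on the source while replication is running, so it is best treated as an approximate planning figure.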

Small services are replicated in a few hours, even over a fairly modest link, while transferring the infrastructure of a large marketplace may take several days. Fortunately, the services keep running all this time, so we can afford not to rush.

During subsequent incremental replications, Acura sends only deltas (the changes made to the virtual machines), so replication is much faster. As a result, the RTO (recovery time objective), the time the system is unavailable during the move, effectively equals the boot time of the machines on the target side.
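The following sketch illustrates why deltas are so much cheaper than a full copy. It is a deliberate simplification: real agents learn which blocks changed from the driver or from the hypervisor (VSS, the Linux driver, CBT) instead of rereading and hashing the whole disk. Here, changed blocks are found by comparing SHA-256 digests of two images, and the function names and the 4-byte block size are hypothetical.

```python
import hashlib


def block_digests(image: bytes, block_size: int) -> list[bytes]:
    """Split an image into fixed-size blocks and hash each block."""
    return [
        hashlib.sha256(image[offset:offset + block_size]).digest()
        for offset in range(0, len(image), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = 4) -> dict[int, bytes]:
    """Return {block index: new data} for blocks that differ from `previous`."""
    old = block_digests(previous, block_size)
    delta = {}
    for index, offset in enumerate(range(0, len(current), block_size)):
        block = current[offset:offset + block_size]
        # A block is sent if it is new or if its digest has changed.
        if index >= len(old) or hashlib.sha256(block).digest() != old[index]:
            delta[index] = block
    return delta


# Only the middle block differs, so only that block has to be sent.
print(changed_blocks(b"AAAABBBBCCCC", b"AAAAbbbbCCCC"))  # {1: b'bbbb'}
```

The size of the delta depends on how much data changed since the last replication, not on the size of the disk, which is why the final synchronization before cutover is short.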

Screenshot of DR plan (json)

Before the final move, we do a test run and measure how long the machines take to boot. This tells us how long services on the source side have to be stopped so that not a single byte of data is lost. About ten minutes of downtime, and even a giant infrastructure is ready to receive traffic in our cloud.

As for disaster recovery, according to our calculations, the RPO (recovery point objective, the maximum period for which data may be lost) is three minutes for Linux virtual machines and five minutes for Windows. These are impressive figures, and now you know which technical solutions make them achievable in practice.

If you still have questions about Hystax Acura or the CROC Cloud in general, feel free to leave them in the comments or email them to RSayfutdinov@croc.ru
