How a hyperconverged platform works

Hyperconverged solutions on today's market fall into two types: “full-fledged” and integrated. The main difference between them is the hardware that can be used to build the infrastructure. Full-fledged hyperconvergence (Nutanix HCI and VMware HCI) is not tied to any hardware manufacturer and works readily with other vendors' hardware, whereas integrated solutions run only on their own vendor's hardware.

Our vStack platform is also a full-fledged hyperconverged solution. Below, I will describe in detail how its compute (SDC), storage (SDS) and network (SDN) functions are implemented, and what opportunities this gives infrastructure owners.

If you are interested in the history of vStack, I recommend reading the previous article.

What is HCI

If you are already familiar with the basic concepts, feel free to skip ahead to the next section. Here we look at how hyperconvergence differs from convergence and what its main advantages are.

Hyper Converged Infrastructure (HCI) is a software-defined infrastructure that combines compute (SDC), data storage (SDS), and software-defined networking (SDN) functions. Unlike classical infrastructure, where these functions are performed by servers with dedicated roles or separate entities like storage systems and routers, in HCI they are assigned to each server participating in the infrastructure.

Pay attention to the image below.

On the right is a classic corporate infrastructure layout: a router, a pair of HA core switches, a pair of general-purpose network switches, the servers themselves (including backup hosts), a pair of SAN switches, and a storage system with redundant controllers (possibly with additional disk shelves). Mixing equipment from different vendors here (for example, Cisco network equipment and NetApp storage) is perfectly normal practice. At the same time, each segment (compute / network / storage) needs its own specialist who knows how to operate it, how to work with the vendor, and the specifics of the equipment as a whole. And the NAS is still off to the right, outside the picture 🤷‍♂️

On the left is a hyperconverged solution. Here, all functions (SDS, SDC, SDN) are performed by a cluster of unified x86 servers with disks.

Looks much more compact, doesn't it? And, all other things being equal, both options provide the same level of availability.

Key Benefits of HCI:

A hyperconverged infrastructure is far less discrete, even in the example above, and that is without counting the NAS component of the converged storage layer, which simply did not fit in the picture. Discreteness directly affects operational complexity because it requires many narrow specialists, and their number grows with the size of the infrastructure, since both the infrastructure and its management are discrete. Each element (network equipment, storage, SAN, NAS, servers) has its own management interfaces, concepts, APIs, and so on. This leads to a simple rule: changes in a converged infrastructure can drag on for months.

In our hyperconverged solution, there is no such discreteness: management has a single endpoint, interface, and API.

Because all elements of a hyperconverged infrastructure are homogeneous, when something fails you can (if you wish) skip spending precious time on pinpointing the fault. It is enough to move the disks from the failed server into a spare one from the spare parts kit and continue working at the original (reference) level of redundancy.

Any hardware available on the market will do, which matters more now than ever. Even consumer-grade hardware.

You can buy consumer-grade hardware and save money in non-critical environments, such as test environments, or drastically cut costs where you run horizontally scalable solutions that do not need a high degree of redundancy for any single element. In addition, a failed consumer device is somewhat easier to replace than an enterprise one.

It can be serviced by just one experienced specialist – you do not have to hire separate employees for each segment.

vStack Architecture

The platform creates a single cluster space from servers that each perform three functions at once: Software Defined Storage, Software Defined Networking, and Software Defined Computing (storage, network, and compute). It combines three traditionally separate components into one software-defined solution.

Software Defined Computing (SDC)

The SDC layer is built on a type 2 hypervisor, bhyve.

The virtio specification is supported for network ports, disks, and other peripherals (rnd, balloon, etc.). Guest OS customization in cloud images is handled by cloud-init for Unix systems and Cloudbase-Init for Windows.

Capabilities:

  • Real-time limits on network port throughput (MBps) and disk performance (IOPS, MBps).

  • Real-time vCPU capping, including autonomous distribution of CPU quanta based on the degree of overcommit, node utilization, and the load of each VM; thanks to this, vCPU overcommit in production clusters sometimes reaches 900%.

  • VM snapshots that include the VM's configuration (network ports with their IP addresses, as well as the MAC addresses on those ports).
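The article does not say how vStack enforces these real-time limits. Purely as an illustration, here is a minimal token-bucket sketch of the kind commonly used for IOPS and MBps caps; the class and parameter names are hypothetical, and time is passed in explicitly so the example is deterministic. For an IOPS limit each request costs one token; for an MBps limit the cost would be the request size in megabytes.

```python
class TokenBucket:
    """Hypothetical rate limiter: `rate` tokens/s, bursts of up to `burst` tokens."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate        # e.g. the IOPS limit; can be reassigned at any time
        self.capacity = burst   # largest burst allowed after an idle period
        self.tokens = burst     # the bucket starts full
        self.last = 0.0         # timestamp of the previous call, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: the caller queues or delays the I/O


disk = TokenBucket(rate=100, burst=10)               # 100 IOPS, burst of 10
burst = [disk.allow(now=0.0) for _ in range(11)]
print(burst.count(True))                             # 10: the 11th request is throttled
later = [disk.allow(now=0.05) for _ in range(6)]
print(later.count(True))                             # 5 tokens refilled in 50 ms
```

Because `rate` is a plain attribute, lowering or raising the limit takes effect on the very next `allow()` call, which is the "real time" property described above.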

The autonomous mechanism for budgeting vCPU quanta deserves special mention: it is what makes cost efficiency (overcommit) of up to 900% possible in the SDC layer. Not every solution can boast such a figure.
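The budgeting algorithm itself is not described in the article, so the sketch below is only a stand-in for the idea, assuming a max-min fair ("water-filling") split of a node's CPU time per scheduling period: lightly loaded VMs receive exactly what they ask for, and the leftover goes to busy ones, which is what makes high overcommit ratios workable. The function names, the quantum unit, and the example numbers are hypothetical.

```python
def overcommit_pct(total_vcpus: int, physical_cores: int) -> float:
    """vCPUs allocated relative to physical cores, in percent."""
    return 100.0 * total_vcpus / physical_cores


def distribute_quanta(wants: dict[str, float], budget: float) -> dict[str, float]:
    """Max-min fair split of `budget` CPU quanta among VMs.

    `wants` is each VM's demand for the period, already clipped to its
    vCPU count and any real-time cap.
    """
    alloc = {vm: 0.0 for vm in wants}
    remaining = dict(wants)
    left = budget
    while remaining and left > 0:
        share = left / len(remaining)
        # VMs asking for no more than an equal share are fully satisfied...
        small = {vm: w for vm, w in remaining.items() if w <= share}
        if not small:
            # ...otherwise every VM still waiting gets the same equal share.
            for vm in remaining:
                alloc[vm] = share
            break
        for vm, w in small.items():
            alloc[vm] = w
            left -= w
            del remaining[vm]
    return alloc


# A hypothetical node: 8 physical cores, 72 vCPUs in total across its VMs.
print(overcommit_pct(72, 8))                              # 900.0
print(distribute_quanta({"a": 2, "b": 5, "c": 10}, 12))   # {'a': 2, 'b': 5, 'c': 5.0}
```

Here VM `c` wants more than is left, so it is capped at the remaining share while `a` and `b` are untouched; a real scheduler would also weigh node utilization and per-VM caps, as the text describes.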

When creating vStack, we focused first and foremost on keeping it lightweight. Low CPU overhead (the drop in performance of a virtualized server relative to a physical one) is one example of the effect of this approach: under some workloads KVM overhead reaches 15%, while in vStack it is only 2-5%. These low values were achieved by taking advantage of the unique features of the bhyve hypervisor.
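The overhead figure is simply the relative performance loss on the same benchmark. A minimal helper makes the definition explicit; the benchmark scores below are made up (higher is better) and serve only to reproduce the percentages quoted above.

```python
def cpu_overhead_pct(bare_metal_score: float, vm_score: float) -> float:
    """Relative performance loss of a VM vs. the same physical server, in percent.

    Both scores come from the same throughput benchmark, where higher is better.
    """
    if bare_metal_score <= 0:
        raise ValueError("bare-metal score must be positive")
    return 100.0 * (bare_metal_score - vm_score) / bare_metal_score


print(cpu_overhead_pct(1000, 850))  # 15.0, the KVM worst case quoted above
print(cpu_overhead_pct(1000, 970))  # 3.0, within the 2-5% range quoted for vStack
```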

Software-defined storage (SDS)

The technological basis of SDS is ZFS.

Capabilities:

  • compression and deduplication;

  • internal data integrity;

  • clones, snapshots;

  • self-healing data;

  • transactional integrity.
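To make "internal data integrity" and "self-healing data" concrete: ZFS stores a checksum for every block separately from the block itself (in the parent block pointer), verifies it on every read, and, when a redundant copy exists, rewrites a bad copy from a good one. The toy model below mimics that behaviour for a two-way mirror. It is a simplified illustration, not ZFS code; the class name is hypothetical and SHA-256 stands in for whatever checksum the pool is configured with.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 stands in for the pool's configured checksum algorithm.
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Toy mirrored block whose checksum is stored apart from the data."""

    def __init__(self, data: bytes, copies: int = 2) -> None:
        self.copies = [bytearray(data) for _ in range(copies)]
        self.expected = checksum(data)  # kept "in the parent", not next to the data

    def read(self) -> bytes:
        # Return the first copy whose checksum matches.
        good = next((bytes(c) for c in self.copies
                     if checksum(bytes(c)) == self.expected), None)
        if good is None:
            raise OSError("checksum mismatch on every copy: unrecoverable")
        # Self-healing: rewrite every copy that failed verification.
        for i, c in enumerate(self.copies):
            if checksum(bytes(c)) != self.expected:
                self.copies[i] = bytearray(good)
        return good


block = MirroredBlock(b"vm disk block")
block.copies[0][0] ^= 0xFF          # silent corruption on one disk
print(block.read())                 # b'vm disk block'
print(bytes(block.copies[0]))       # b'vm disk block' (repaired)
```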

Limits (pool/filesystem):

  • size: 1 zettabyte;

  • unlimited number of file systems;

  • unlimited number of block devices.

This is what a five-node cluster looks like. Vertical containers are pools, horizontal containers are cluster nodes. Each pool contains a disk from each node. Pool redundancy is always equal to cluster redundancy.

For all industrial installations, we recommend N+2 or higher redundancy (the platform supports N+1, N+2, and N+3).

What happens if a failure occurs?

In the picture below, the yellow host has failed and been excluded from the cluster by the fencing mechanism, and every pool has lost one disk. The cluster automatically failed over this host's resources, and the yellow pool became available on the blue host. Redundancy has decreased, but the pool keeps working without any loss of performance, and all VMs that were running on the yellow host continued running on the blue one.
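The layout and failover described above can be modelled in a few lines. This is a hypothetical sketch, not vStack code: node names are made up, "N+k" is taken to mean each pool tolerates up to k missing disks, and the rule for choosing which node takes over an orphaned pool (the least-loaded survivor) is an assumption; in the scenario above, the cluster chose the blue host.

```python
from collections import Counter


def build_layout(nodes: list[str]) -> dict[str, dict]:
    # One pool per node; every pool holds one disk on every node.
    return {f"pool-{n}": {"owner": n, "disks": list(nodes)} for n in nodes}


def fail_over(layout: dict[str, dict], failed: set[str], redundancy: int) -> dict[str, dict]:
    """Treat `failed` nodes as fenced off and re-home the pools they served."""
    all_nodes = {n for pool in layout.values() for n in pool["disks"]}
    survivors = sorted(all_nodes - failed)
    # Each pool loses one disk per failed node; N+k tolerates up to k losses.
    for name, pool in layout.items():
        lost = sum(1 for disk in pool["disks"] if disk in failed)
        if lost > redundancy:
            raise RuntimeError(f"{name}: {lost} disks lost, exceeds N+{redundancy}")
    # Hand each orphaned pool to the surviving node that serves the fewest pools.
    for name in sorted(layout):
        pool = layout[name]
        if pool["owner"] in failed:
            load = Counter(p["owner"] for p in layout.values() if p["owner"] not in failed)
            pool["owner"] = min(survivors, key=lambda n: (load[n], n))
    return layout


cluster = build_layout(["n1", "n2", "n3", "n4", "n5"])   # five-node cluster
fail_over(cluster, failed={"n4"}, redundancy=2)          # N+2, one host fenced
print(cluster["pool-n4"]["owner"])                       # n1 now serves pool-n4
```

The redundancy check runs before any pool is moved, so a failure that exceeds N+k leaves the layout untouched and reports the problem instead.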

Software Defined Networking (SDN)

There are three options for provisioning virtual networks:

Regardless of how it is provisioned, a network can be isolated or routed. Each virtual network instance is based on a distributed switch (shown as a dark blue rectangle in the picture). The virtual network implementation itself, the switch built in the netgraph ecosystem, and the modifications to the network stack are our own developments; they have passed rigorous testing and have been in commercial operation on a large number of clusters for a long time (since September 2020).

Capabilities:

  • a separate MTU for each network instance;

  • routed/isolated networks;

  • jumbo frame support;

  • TSO/GSO support;

  • out-of-the-box Path MTU Discovery support.
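A little arithmetic shows why per-network MTU, jumbo frames, TSO/GSO and Path MTU Discovery go together. The path MTU is the smallest MTU of any link on the path (PMTUD discovers it at run time by sending packets with the Don't Fragment bit set and reacting to ICMP "Fragmentation Needed" or ICMPv6 "Packet Too Big" replies); the TCP MSS follows from it; and TSO/GSO let the stack pass one large buffer down so that it is cut into MSS-sized segments only at the driver or NIC. The sketch assumes IPv4 and TCP without options; the function names are hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options


def path_mtu(link_mtus: list[int]) -> int:
    # The largest packet that crosses every hop without fragmentation.
    return min(link_mtus)


def tcp_mss(mtu: int) -> int:
    # Payload that fits into one IPv4/TCP packet of the given MTU.
    return mtu - IPV4_HEADER - TCP_HEADER


def gso_segment(payload: bytes, mss: int) -> list[bytes]:
    # What GSO/TSO do with one large send: cut it into MSS-sized segments.
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


data = bytes(64_000)
for links in ([9000, 9000, 9000], [9000, 1500, 9000]):
    mss = tcp_mss(path_mtu(links))
    segments = gso_segment(data, mss)
    assert len(segments) == math.ceil(len(data) / mss)
    print(path_mtu(links), mss, len(segments))
# 9000 8960 8
# 1500 1460 44
```

A single 1500-byte hop on an otherwise jumbo path multiplies the packet count more than fivefold, which is why detecting the real path MTU matters.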

Limits:

How we develop vStack

The following features will appear in the new vStack v.2.1 release:

  • Live migration of running virtual machines, similar to vMotion in VMware. It lets you move active workloads from one server to another without downtime, so users keep uninterrupted access to the systems they need.

  • Full cloning of virtual machines: instantly create a clone of a virtual machine of any size, limited only by cluster resources.

  • Support for GENEVE-based networks. GENEVE is an open, standardized technology (RFC 8926).

  • Support for EDGE functionality in the GUI. An EDGE router is a software router through which virtual machines on an internal network reach the outside world, and vice versa.

  • Support for VM memory ballooning, a mechanism that optimizes how virtual machines use RAM by letting the host reclaim memory that the guest OS once allocated and later freed.

  • QinQ (802.1ad) support.

  • Increased platform performance in the storage and compute layer.
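Since GENEVE is on the roadmap, here is what its encapsulation looks like on the wire according to RFC 8926: an 8-byte base header (version, option length in 4-byte words, O and C flags, protocol type, 24-bit Virtual Network Identifier) carried over UDP destination port 6081. The sketch builds that header and computes the resulting MTU for a tenant network, assuming an IPv4 underlay and no GENEVE options. It says nothing about how vStack will implement the feature, and the function names are hypothetical.

```python
import struct

GENEVE_UDP_PORT = 6081    # IANA-assigned destination port
ETH_P_TEB = 0x6558        # protocol type: the payload is an Ethernet frame


def geneve_header(vni: int, opt_len_words: int = 0,
                  oam: bool = False, critical: bool = False) -> bytes:
    """Pack the 8-byte GENEVE base header (version 0)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= opt_len_words < 1 << 6:
        raise ValueError("option length must fit in 6 bits")
    ver_optlen = (0 << 6) | opt_len_words      # Ver (2 bits) | Opt Len (6 bits)
    flags = (oam << 7) | (critical << 6)       # O | C | 6 reserved bits
    # The VNI fills the top 24 bits of the last word; the low 8 bits are reserved.
    return struct.pack("!BBHI", ver_optlen, flags, ETH_P_TEB, vni << 8)


# Bytes the overlay adds inside the underlay packet (IPv4, no GENEVE options).
OVERHEAD = 20 + 8 + 8 + 14   # outer IPv4 + UDP + GENEVE + inner Ethernet


def tenant_mtu(underlay_mtu: int, opt_len_words: int = 0) -> int:
    # Each option word adds 4 more bytes of overhead.
    return underlay_mtu - OVERHEAD - 4 * opt_len_words


print(geneve_header(0x123456).hex())        # 0000655812345600
print(tenant_mtu(1500), tenant_mtu(9000))   # 1450 8950
```

The 50-byte overhead is one practical reason the platform's per-network MTU and jumbo frame support matter for overlays: with a 9000-byte underlay, tenant networks can keep a full 1500-byte MTU with room to spare.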

A few words about licensing

vStack is provided in two models:

Paying for license use: monthly billing, charged in points.

Buying licenses outright: a one-time payment plus an annual support fee. In this model, overcommit is available: you can use 33% more resources than the contract allocates.

With either model you can add Managed vStack, remote administration by the vendor. If you do not have an employee who can take on the administrator role, you can delegate this task to us.

You can try out vStack through Serverspace, the platform's general ambassador. If you still have questions, ask them in the comments; I will try to answer everyone, regardless of eye color! 🙂
