A thing of beauty: power from Uncle Huang, an NVIDIA HGX H100 in the Netherlands

The holidays have come to us early

* The treasured parcel has arrived at the Dutch office.

A weighty box of surprises on a wooden pallet: a sight to gladden the eyes of ITGLOBAL.COM's system administrators and everyone else involved in purchasing and delivering the equipment. The agonizing wait for its arrival is over, all the formalities are complete, and you just want to start unpacking, while colleagues who suddenly catch you in the act only tease you. But you have to restrain yourself and hold back: marketing still needs its photos, and where else would the pictures for this article come from?

And here the first milestone is passed, yet the tension only grows. It is not the coveted new server that greets us yet, but its accessories: rails for rack mounting, cables, and other useful but not especially exciting trifles, unlike the main course that awaits us. So we keep diving deeper into the cardboard and foam that separate us from the goal.

Well, at last, here it is!

Here is our handsome fellow, the Dell PowerEdge XE9680, built on the NVIDIA HGX H100 platform (not to be confused with the DGX H100, although the internals are similar). The difference between the two platforms is roughly the same as between NVIDIA's reference Founders Edition graphics cards and those from partner manufacturers such as ASUS, MSI, and Gigabyte: the DGX is a ready-made reference platform from NVIDIA itself, while the HGX is a platform supplied to OEMs, in our case Dell, who build their own servers around it.

The purpose of both HGX and DGX, however, is the same: data centers focused on accelerating AI workloads, that is, training neural networks, running inference, and driving the loss function down along the gradient. Business as usual, in other words. Nothing stops you from running any other workload that benefits from GPU acceleration, and there is plenty of muscle for that here: eight NVIDIA H100s. But everything else, from 3D graphics to Cloud Gaming (which is either alive or dead, nobody is quite sure), is somehow out of fashion. Crypto has moved on to PoS (Proof of Stake) or spins on its dedicated ASICs, finally leaving gamers alone: they no longer have to fight miners to the death for graphics cards, or for storage drives either.

By the way, remember those dark times when graphics cards seemed to be in short supply for everyone because of the mining boom, back when PoW (Proof of Work) with GPU-friendly algorithms was in fashion? The boom ended, and suddenly warehouses turned out to be full of cards with more supply than demand. But never mind. What is more interesting is that in May 2020, Uncle Jensen Huang showed off our server's freshly baked older brother, the NVIDIA DGX A100, the largest graphics card in the world at the time. Or, to be more precise, eight NVIDIA A100 graphics cards.
How far has progress come in just four years? The answer is obvious, but it is no less fascinating to watch former leviathans of seemingly titanic performance so quickly look like pygmies in the shadow of newer models.

For a clearer comparison, let's first pull out the GPU module. Please keep sensitive viewers away from the screen: contemplating this many CUDA cores and this much VRAM in one place can be hazardous to your health.

Oh… a sight to behold, isn't it? And a truly terrifying one once you think about the price: each H100 costs around 30 thousand euros, and there are eight of them here. All told, this modest wooden pallet is worth almost as much as a small house on the Mediterranean coast. Impressive. But back to the point: we wanted to see what has changed over these four years, and the impressions there are no less striking.

Up to 4.5X faster AI performance than A100!
True, the price has risen too: the recommended price of the A100 at launch was about 18 thousand dollars for the 40 GB model, which is roughly $22,000 adjusted for inflation. That makes the H100 about 36% more expensive than its 2020 predecessor.
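The price arithmetic above can be checked with a quick sketch; the figures are the rough ones quoted in the text ($18k at launch, ~$22k inflation-adjusted, ~$30k per H100), not official list prices:

```python
# Rough price comparison using the ballpark figures quoted in the article.
a100_launch_usd = 18_000           # A100 40 GB recommended price at 2020 launch
a100_adjusted_usd = 22_000         # roughly the same price in today's dollars
h100_street_usd = 30_000           # ballpark price of a single H100

premium_pct = (h100_street_usd / a100_adjusted_usd - 1) * 100
print(f"H100 premium over inflation-adjusted A100: {premium_pct:.0f}%")  # ~36%

# And eight of them sit in one XE9680 chassis:
print(f"GPUs alone: ${8 * h100_street_usd:,}")  # $240,000
```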

Returning from economics to hardware: A100 chips are produced on TSMC's 7-nanometer N7 FinFET process, and the H100 on a 4-nanometer one. The A100 packs 54.2 billion transistors into 826 square millimeters, while the H100 fits 80 billion into a nearly identical area (814 mm²). Moore's Law may be dead, but silicon semiconductors still refuse to give up and will be with us for the foreseeable future.

In architecture, all this transistor abundance shows up as 6912 CUDA cores, 432 third-generation Tensor Cores, and 80 gigabytes of HBM2e memory on the A100, versus 16896 CUDA cores, 528 fourth-generation Tensor Cores, and 80 GB of HBM3 on the H100 (the SXM5 variant used in HGX boards).
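To put the two generations side by side, here is a small comparison sketch; the H100 numbers assumed here are for the SXM5 variant found in HGX systems (the PCIe card has fewer cores):

```python
# Generation-over-generation comparison, figures as commonly published.
specs = {
    "A100": {"transistors_b": 54.2, "die_mm2": 826, "cuda_cores": 6912},
    "H100": {"transistors_b": 80.0, "die_mm2": 814, "cuda_cores": 16896},
}

for name, s in specs.items():
    # million transistors per square millimeter
    density = s["transistors_b"] * 1e3 / s["die_mm2"]
    print(f"{name}: {density:.1f} Mtr/mm^2, {s['cuda_cores']} CUDA cores")

ratio = specs["H100"]["cuda_cores"] / specs["A100"]["cuda_cores"]
print(f"CUDA core ratio: {ratio:.2f}x")  # ~2.44x in one generation
```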

Mighty, and such might demands an adequate power supply.

The PowerEdge XE9680's maximum power consumption is 11.5 kW, but as you can see from the six 2800 W power supplies totaling 16.8 kW, Dell made sure the power subsystem has headroom. On top of that, everything is modular and can be replaced quickly and easily if needed.
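One way to see the headroom Dell built in is to count how many of the six 2800 W supplies can fail while the chassis stays within its 11.5 kW maximum draw. A minimal sketch using the figures above:

```python
# PSU redundancy check for the XE9680 figures quoted above.
PSU_WATTS = 2800
PSU_COUNT = 6
MAX_DRAW_WATTS = 11_500

total = PSU_COUNT * PSU_WATTS
print(f"Installed capacity: {total / 1000:.1f} kW")  # 16.8 kW

# How many supplies can drop out while capacity still covers peak draw?
tolerable_failures = 0
while (PSU_COUNT - tolerable_failures - 1) * PSU_WATTS >= MAX_DRAW_WATTS:
    tolerable_failures += 1
print(f"PSU failures survivable at full load: {tolerable_failures}")  # 1
```

So even at its absolute peak the server can lose one supply; at more typical draw the margin is wider still.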

What won't you do to train and run neural networks? Watching the appetites of the machine-learning industry rapidly outgrow even those of the miners, who sometimes latch onto the power grid like ticks, you start to suspect that training the next language model with 100500 trillion parameters may require a dedicated nuclear power plant, with the waste heat recuperated through a steam turbine or, as Equinix does in Paris, used to heat houses, Olympic swimming pools, and other infrastructure.

The same goes for the pair of NVMe drives with PCIe 5.0 support, 960 gigabytes each. Everything is within reach and easily replaced without extra effort; in a word, a system administrator's joy.

The two 48-core Intel Xeon Platinum 8468 processors are harder to replace as quickly as the other components, but the need for that rarely arises. The 2 terabytes of DDR5 memory are, in essence, quick to swap, but you can't reach them on the fly: you'll have to work a screwdriver and open up the server case.

SuperPOD architecture

One of the key advantages of NVIDIA HGX is its modular, scalable architecture: SuperPOD. A DGX SuperPOD consists of modular Scalable Units (SUs), which makes it easy and quick to deploy clusters of various sizes depending on need.
An attentive reader will ask: what does DGX have to do with it, when the section above was about the HGX H100? Even more questions arise if you open the official NVIDIA documentation, where DGX is everywhere. Whether this was intentional or not is unclear, but the SuperPOD architecture applies both to the GPU modules of OEM HGX systems and to NVIDIA's own reference DGX line.

DGX H100 systems are equipped with eight NVIDIA H100 GPUs, which deliver enormous computing power. Using NVIDIA NVLink and NVSwitch, these GPUs are connected into a single high-bandwidth system, so each GPU can communicate with the others at up to 900 GB/s. And yes, if you want to train giant language models or generate impressive images, eight H100s can handle any such task. Want a model that writes a novel or paints a masterpiece? They can do that.

NVIDIA InfiniBand NDR (up to 400 Gbps) delivers high throughput and low latency between nodes. The technology supports adaptive routing and a self-healing network, making it well suited to building large, high-performance clusters.
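A back-of-envelope comparison makes the intra-node versus inter-node gap concrete. Assuming the peak figures above (900 GB/s NVLink, 400 Gbit/s NDR) and, purely for illustration, a payload of 80 GB, i.e. one H100's worth of memory:

```python
# Back-of-envelope: moving 80 GB of model weights within a node (NVLink)
# versus between nodes (InfiniBand NDR), at the peak rates quoted above.
PAYLOAD_GB = 80            # e.g. one full H100 memory's worth of weights
NVLINK_GB_S = 900          # GB/s per GPU over NVLink/NVSwitch
NDR_GB_S = 400 / 8         # 400 Gbit/s per link -> 50 GB/s

print(f"NVLink transfer: {PAYLOAD_GB / NVLINK_GB_S:.2f} s")  # ~0.09 s
print(f"IB NDR transfer: {PAYLOAD_GB / NDR_GB_S:.2f} s")     # ~1.60 s
print(f"NVLink is {NVLINK_GB_S / NDR_GB_S:.0f}x faster inside the node")
```

Which is exactly why SuperPOD designs keep the chattiest traffic (GPU-to-GPU) on NVLink inside a node and use InfiniBand for the coarser node-to-node exchange.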

* Illustration from NVIDIA documentation for SuperPOD.

Ease and speed of deployment

The DGX SuperPOD architecture significantly reduces deployment time: using modular SUs, a cluster can be deployed in weeks rather than months. This is possible because all components are pre-configured and tested, eliminating lengthy configuration and debugging phases.

These advantages make the DGX SuperPOD an ideal choice for companies that want to quickly scale up their computing power for their workloads. Thanks to its modular architecture, you can start with a small cluster and gradually grow it by adding new SUs as needed.

ITGLOBAL.COM

ITGLOBAL.COM clients can take advantage of all the benefits of the HGX H100 architecture as part of our AI Cloud service.
Want your neural network to predict exchange rates or write movie scripts? ITGLOBAL.COM will help you with that!

We will be happy to provide our clients with the best solutions for machine learning tasks using advanced technologies.
