A small company has unveiled a processor with 900,000 cores. What is this marvel of technology?

The manufacturer claims it is the newest, largest, and most powerful AI chip to date, called the Wafer Scale Engine 3 (WSE-3). Its cores have already been mentioned; beyond that, the chip contains 4 trillion transistors and delivers 125 petaflops (PFLOPS) of AI compute. The processor is a single solid silicon wafer with a total area of 46,225 mm², roughly 57 times the die area of an Nvidia H100.

As always with Cerebras, this is not a concept or even a prototype, but a finished processor built on a 5nm process. It was fabricated by TSMC, the largest contract chipmaker, which has been working with Cerebras for five years or more.

The company, by the way, appeared in 2016 and is not only still afloat but actively growing. Over its lifetime it has released three of the largest AI chips of their time, and it also builds supercomputers around these processors in a relatively compact 15U form factor.

What is the purpose of the processor?

It makes it possible to train the world's largest AI models relatively quickly, so the chip may well interest corporations such as Google, Apple, and others. In any case, no one else produces processors like this at the moment; the solutions that do exist are less powerful.

For example, the WSE-3 boasts a memory bandwidth of 21 PB per second, about 7,000 times that of Nvidia's H100. As for fabric bandwidth, at 214 Pb/s the new chip's figure is more than three thousand times higher than the H100's. The new product also carries as much as 44 GB of on-chip memory.
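
To see where the roughly 7,000-times figure comes from, here is a quick back-of-envelope check; the H100 memory bandwidth of about 3 TB/s used below is an assumption for illustration, not a number from the article.

```python
# Back-of-envelope check of the bandwidth comparison quoted above.
wse3_mem_bw = 21e15   # 21 PB/s of on-chip memory bandwidth (from the article)
h100_mem_bw = 3e12    # ~3 TB/s of HBM bandwidth for an H100 (assumed for illustration)

print(f"memory bandwidth ratio: ~{wse3_mem_bw / h100_mem_bw:,.0f}x")
# -> memory bandwidth ratio: ~7,000x
```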

The processor can work with external memory of very large capacities: 1.5 TB, 12 TB, or 1.2 PB. This is close to ideal for training AI models, since they no longer have to be partitioned across many devices. According to company representatives, a single chip can train an AI model with as many as 24 trillion parameters.
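
A rough sizing exercise shows why petabyte-scale external memory matters for a model of that size; the bytes-per-parameter figures below are common rules of thumb, not Cerebras numbers.

```python
# Rough sizing of a 24-trillion-parameter model (illustrative assumptions only).
params = 24e12                # 24 trillion parameters

weights_fp16 = params * 2     # 2 bytes per weight in FP16
training_state = params * 16  # ~16 bytes/parameter with gradients and optimizer state
                              # (a common mixed-precision rule of thumb, assumed here)

print(f"weights alone (FP16): ~{weights_fp16 / 1e12:.0f} TB")   # ~48 TB
print(f"full training state:  ~{training_state / 1e15:.2f} PB") # ~0.38 PB
```

Under these assumptions, even the full training state of such a model would fit within the 1.2 PB external-memory tier without being split across machines.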

It is worth noting that the performance of Cerebras systems depends on the sparsity coefficient of the operations. According to outside estimates, the company's new system will be slightly less productive at FP16 than a pair of Nvidia DGX H100 servers with the same power consumption and footprint: roughly 15 PFLOPS versus 15.8 PFLOPS for Nvidia.
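
A small sketch of why the sparsity coefficient matters: hardware that skips zero-valued operands does proportionally less work, so the same dense FP16 rate translates into much higher effective throughput at high sparsity. The 87.5% figure is an illustrative assumption, not a Cerebras specification.

```python
# Relationship between dense throughput and "effective" throughput when
# zero-valued operations are skipped (sparsity values are illustrative).
def effective_pflops(dense_pflops: float, sparsity: float) -> float:
    """Dense-equivalent throughput if a fraction `sparsity` of operations is skipped."""
    return dense_pflops / (1.0 - sparsity)

dense_fp16 = 15.0  # ~15 PFLOPS of dense FP16, as estimated in the text
for s in (0.0, 0.5, 0.875):
    print(f"sparsity {s:.1%}: ~{effective_pflops(dense_fp16, s):.0f} PFLOPS effective")
# 0.0% -> 15, 50.0% -> 30, 87.5% -> 120 (close to the 125 PFLOPS headline figure)
```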

As for the new supercomputer built around the chip, it can be scaled into clusters of up to 2,048 systems, which makes it possible to train models of 70 billion parameters in a single day. That is an excellent opportunity for companies developing artificial intelligence. The system supports frameworks such as PyTorch and TensorFlow.
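
For context, here is what a plain PyTorch training step looks like. Nothing in it is Cerebras-specific API; the point is that model code stays ordinary PyTorch (or TensorFlow) while the vendor's software stack maps it onto the hardware.

```python
# A standard PyTorch training step of the kind such frameworks accept.
# No Cerebras-specific calls are used; this is ordinary, hardware-agnostic model code.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

x = torch.randn(8, 1024)           # toy input batch
target = torch.randn(8, 1024)      # toy targets

optimizer.zero_grad()
loss = loss_fn(model(x), target)   # forward pass
loss.backward()                    # backward pass
optimizer.step()                   # weight update
print(f"loss: {loss.item():.4f}")
```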

Not all specifications of the new supercomputer are known yet, but the previous model, the CS-2, consumed 17 kW of power, while the CS-1 required 19 kW.

What about supercomputers?

Cerebras is already busy deploying these systems in its Condor Galaxy AI supercluster, which is designed to tackle very large-scale problems with artificial intelligence. The cluster will eventually comprise nine supercomputers located in different regions.

This year the cluster is slated to gain a CG-3 system in Dallas, Texas, built from multiple CS-3 machines with a combined AI performance of 8 exaflops. As a result, the supercluster's total performance will reach approximately 64 exaflops.

But that's not all: the maker of the giant processor is already working with Qualcomm. The partners intend to develop optimized models for Qualcomm's Arm-based AI accelerators.

Specifically, the partners plan to optimize models for the Cloud AI 100 Ultra, taking advantage of techniques such as sparsity, speculative decoding, MX6, and network architecture search.

“As we have already shown, sparsity, when implemented correctly, can significantly improve the performance of accelerators. Speculative decoding is designed to improve the efficiency of a model during deployment by using a small, lightweight model to generate an initial response, and then using a larger model to check the accuracy of that response,” noted Cerebras CEO Andrew Feldman.
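
To make the speculative decoding idea in the quote concrete, here is a minimal sketch: a cheap draft model proposes a few tokens, and the expensive model verifies them in one pass, keeping the accepted prefix. The models below are stand-in stubs, not real LLMs, and in practice the large model also supplies a corrected token when it rejects a proposal.

```python
# Minimal illustration of speculative decoding with stubbed-out models.
import random

def draft_model(prefix, k=4):
    """Small, cheap model: propose k candidate tokens."""
    return [random.randint(0, 99) for _ in range(k)]

def target_accepts(prefix, token):
    """Large model's check of one proposed token (stubbed as a coin flip)."""
    return random.random() < 0.7   # assumed ~70% acceptance rate, for illustration

def speculative_step(prefix):
    accepted = []
    for tok in draft_model(prefix):
        if target_accepts(prefix + accepted, tok):
            accepted.append(tok)   # proposal confirmed by the large model
        else:
            break                  # first rejection ends this speculation window
    return prefix + accepted

sequence = []
for _ in range(5):                 # five verification passes of the large model
    sequence = speculative_step(sequence)
print(f"generated {len(sequence)} tokens in 5 large-model passes")
```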

Of course, a Cerebras chip cannot be cheap. Its cost will most likely be far higher than the price of Nvidia's H100 accelerators, which sell for about $30,000.

In the near future, the company will reveal more information about the chip and the supercomputer, including the cost of both.
