RISC-V based Snitch processor boasts 6x faster performance

Two new RISC-V ISA extensions enable the Snitch processor to run up to 6.45x faster and more efficiently than comparable processors.

A team of researchers from ETH Zurich has presented a new RISC-V processor named Snitch. According to the developers, the new CPU delivers up to 6x the performance and almost 4x the energy efficiency on multi-core workloads. But do not rush to conclusions: things are not as straightforward as they seem at first glance. Below is an analysis of the new processor’s main features, its actual performance figures, and information about the developers.

A new addition to the RISC-V family

Briefly about RISC-V: it is a free, open instruction set architecture based on the RISC concept.

Some interesting facts:

  • The base RISC-V specification contains about 50 standard instructions (extensions add 53 more, and the compressed C format defines a further 34).

  • The first microcontrollers and processors based on RISC-V were launched in 2017.

  • The RISC-V Foundation was founded in 2015 and is headquartered in Zurich; since 2018 it has been actively collaborating with The Linux Foundation.

Snitch is also built on RISC-V. It pairs a compact control core with a double-precision FPU and adds (in the developers’ words) two tiny, minimally intrusive extensions. The first extension relieves the processor of some explicit memory instructions. The second lets floating-point calculations run without occupying the control core, which is then free to handle other tasks.
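To see why removing explicit memory instructions and unloading the control core matters, here is a toy instruction-count model. It is only a sketch under stated assumptions: a single-issue core running a dot-product loop, with two loads and two bookkeeping instructions per multiply-add in the baseline. The function name `fpu_utilization` and all per-iteration counts are hypothetical and are not taken from the Snitch developers.

```python
def fpu_utilization(loads_per_fma, bookkeeping_per_fma):
    """Share of issued instructions that are useful multiply-adds.

    Toy model: one instruction is issued per cycle, and every multiply-add
    is accompanied by some loads and some loop bookkeeping instructions.
    """
    return 1 / (1 + loads_per_fma + bookkeeping_per_fma)


# Assumed baseline: 2 loads plus 2 bookkeeping instructions per multiply-add.
baseline = fpu_utilization(loads_per_fma=2, bookkeeping_per_fma=2)

# Assumption: the first extension makes the explicit loads unnecessary.
no_explicit_loads = fpu_utilization(loads_per_fma=0, bookkeeping_per_fma=2)

# Assumption: the second extension also takes bookkeeping off the FPU's path.
both_extensions = fpu_utilization(loads_per_fma=0, bookkeeping_per_fma=0)

print(f"baseline:          {baseline:.2f}")
print(f"no explicit loads: {no_explicit_loads:.2f}")
print(f"both extensions:   {both_extensions:.2f}")
```

The absolute numbers depend entirely on the assumed counts; the point is only that each instruction removed from the loop leaves a larger share of issue slots for floating-point work.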

“Parallel computing applications, such as data analytics, machine learning and scientific computing, demand ever more floating-point operations per second,” the team explains. “And as the degree of integration increases, energy efficiency is becoming one of the highest priorities in processor design. Specialized accelerators are already quite efficient, but their field of application is still too narrow, and they are very difficult to adapt if the algorithm they were tailored to has to change. We, in turn, are offering a new architectural concept that achieves extreme energy efficiency while retaining at least the flexibility of other general-purpose computing devices.”

Maintain maximum utilization of computing resources and increase energy efficiency

The problem of keeping compute resources, and the FPU in particular, highly utilized has been the subject of many architectural studies. The best-known and most widely used approaches are superscalar architectures, vector architectures (e.g. Cray), and general-purpose computing on GPUs. Despite their impressive performance, energy efficiency was by no means the main goal of their creators.

The ETH Zurich researchers combined several cores into a cluster and distributed the work evenly among them. Synchronization between cores relies on the RISC-V atomic extension, with atomic operations supported on the TCDM (tightly coupled data memory) and on AXI through the AXI5 atomic extension and an atomic adapter.
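The idea of spreading work evenly across cores can be sketched in a few lines of Python. This is an illustration only: Python threads stand in for hardware cores, the helper names (`split_rows`, `matmul_rows`, `parallel_matmul`) are hypothetical, and nothing here reflects how the Snitch cluster actually schedules or synchronizes work.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CORES = 8  # cluster size from the article; threads only stand in for cores


def split_rows(num_rows, num_workers):
    """Return (start, stop) row ranges whose sizes differ by at most one."""
    base, extra = divmod(num_rows, num_workers)
    ranges, start = [], 0
    for worker in range(num_workers):
        stop = start + base + (1 if worker < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def matmul_rows(a, b, start, stop):
    """Compute rows start..stop-1 of the product of a and b (lists of lists)."""
    cols = range(len(b[0]))
    inner = range(len(b))
    return [[sum(a[i][k] * b[k][j] for k in inner) for j in cols]
            for i in range(start, stop)]


def parallel_matmul(a, b, num_workers=NUM_CORES):
    """Multiply a by b, giving each worker one contiguous block of rows."""
    ranges = split_rows(len(a), num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        blocks = pool.map(lambda r: matmul_rows(a, b, *r), ranges)
        # Collecting the blocks in order waits for every worker to finish,
        # which plays the role of a final barrier between the "cores".
        return [row for block in blocks for row in block]


if __name__ == "__main__":
    print(split_rows(10, NUM_CORES))
    print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each worker writes only its own block of output rows, so the only synchronization point in this sketch is the wait at the end.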

Depending on the benchmark, parallelization yields a 3x to 6x speedup. The measurements were taken on a dedicated eight-core cluster, and the results were compared with the single-core version. The largest gains are achieved on workloads such as matrix multiplication, 2D convolution, kNN, and Monte Carlo calculations.
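To put the 3x to 6x range in perspective, the speedup can be divided by the core count to get parallel efficiency. The sketch below also inverts Amdahl’s law to estimate what serial fraction would produce such a speedup. Amdahl’s law is a simplifying model chosen here for illustration, not something the developers report using, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup, cores):
    """Serial fraction s that solves speedup = 1 / (s + (1 - s) / cores)."""
    return (cores / speedup - 1) / (cores - 1)


CORES = 8  # eight-core cluster from the article

for speedup in (3.0, 6.0):  # range quoted in the article
    eff = parallel_efficiency(speedup, CORES)
    serial = amdahl_serial_fraction(speedup, CORES)
    print(f"speedup {speedup:.1f}x: efficiency {eff:.1%}, "
          f"implied serial fraction {serial:.1%}")
```

Under this model, a 6x speedup on eight cores corresponds to 75% parallel efficiency, while 3x corresponds to 37.5%.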

Snitch with its integrated FPU can do roughly twice the work at an overhead of only 3.2%. At the same time, it is significantly more flexible than contemporary vector processors and twice as energy efficient. To gauge absolute energy efficiency, the developers estimated the achievable peak energy efficiency on a 22 nm process. According to the test results, Snitch reaches 79% of the theoretical peak efficiency.

The final evaluation, for which the eight-core Snitch cluster mentioned above was built on a 22 nm process, produced impressive figures: the cluster turned out to be 6.45 times faster and 3.5 times more energy efficient than its single-core “competitor”.
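The two headline figures can be combined into a rough estimate of relative power draw. The sketch assumes that “efficiency” means work done per unit of energy, which the article does not state explicitly; the function names are hypothetical.

```python
def relative_energy(efficiency_gain):
    """Energy per task relative to the baseline, if efficiency = work / energy."""
    return 1 / efficiency_gain


def relative_power(speedup, efficiency_gain):
    """Average power relative to the baseline: (energy ratio) / (time ratio)."""
    return relative_energy(efficiency_gain) / (1 / speedup)


SPEEDUP = 6.45          # from the article
EFFICIENCY_GAIN = 3.5   # from the article

print(f"energy per task: {relative_energy(EFFICIENCY_GAIN):.2f}x the baseline")
print(f"average power:   {relative_power(SPEEDUP, EFFICIENCY_GAIN):.2f}x the baseline")
```

Under that assumption, the eight-core cluster finishes the same task with well under a third of the energy while drawing somewhat less than twice the power of the single core.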

Developers of the new processor and their plans for the future

To close, a few words about each member of the Snitch development team.

Florian Zaruba earned his bachelor’s and then master’s degree from the Swiss Federal Institute of Technology in 2017. He is currently pursuing a Ph.D. at the Integrated Systems Laboratory. His research interests focus primarily on the design of high-performance computer architectures.

Fabian Schuiki earned his Bachelor of Science in Electrical Engineering and his Master of Science in 2014 and 2016, respectively. He is currently working on his Ph.D. in Digital Circuits and Systems under the supervision of Luca Benini, who is also a co-author of Snitch. His research interests include computer architecture, high-precision computing, and in-memory data processing.

Torsten Hoefler is a professor of computer science at ETH Zürich, Switzerland. He is also one of the key members of the Message Passing Interface (MPI) Forum, where he leads the Collective Operations and Topologies working group. His research centers on performance-oriented systems design and includes scalable networks, parallel programming techniques, and performance modeling. Hoefler has received best paper awards at ACM/IEEE Supercomputing SC10, SC13 and SC14, EuroMPI’13, HPDC’15, HPDC’16, IPDPS’15 and other similar events. He has also published many peer-reviewed papers at major conferences and in leading journals.

Luca Benini is head of the Department of Digital Circuits and Systems at ETH Zürich and also a professor at the University of Bologna. Benini’s research focuses primarily on the design of energy-efficient computing systems. He has authored over 1000 peer-reviewed articles and five full-length books. Benini is a member of the ACM and Academia Europaea, as well as the winner of the 2016 IEEE CAS Mac Van Valkenburg Award.

As for future plans, Florian Zaruba has said in interviews and press releases that developing software for the new processor will be somewhat harder, but that programmers will certainly appreciate the versatility, speed and efficiency of the new CPU. In an interview with the IEEE, he also mentioned that Snitch-based designs could be scaled to thousands of cores, with tasks spread across multiple chiplets.

The project source code was published in the PULP Platform repository on GitHub and is available to everyone under the Apache 2.0 license.
